Extrapolation Methods for Accelerating PageRank Computations

Doğu Gül, Boğaziçi University, 1/12/2003

Page 1: Extrapolation Methods for Accelerating PageRank Computations

Extrapolation Methods for Accelerating PageRank Computations

Doğu Gül

Boğaziçi University

1/12/2003

Page 2: Extrapolation Methods for Accelerating PageRank Computations

Introduction

A fast method is proposed for computing PageRank, a hyperlink-based estimate of the “importance” of Web pages.

The Web link graph is represented by a “Markov matrix”.

The PageRank algorithm uses the “Power Method” to compute the principal eigenvector of the Markov matrix.

Empirically, extrapolation methods are shown to speed up PageRank computation by 25-300%.

Page 3: Extrapolation Methods for Accelerating PageRank Computations

Definitions

A link from a page u to a page v can be viewed as evidence that v is an “important” page.

The importance that a link from page u confers on page v is proportional to the importance of u and inversely proportional to the number of pages u points to.

The PageRank of a page i is defined as the probability that, at some particular time step, a random surfer is at page i.

Page 4: Extrapolation Methods for Accelerating PageRank Computations

Adopting Markov Matrix

The problem can be defined as a random walk on a directed Web graph.

Assume there exists an edge from u to v.

deg(u) is the outdegree of page u in a Web graph G.

Consider a random surfer visiting page u at time k. In the next time step, the surfer chooses a node v_i from among u’s out-neighbors uniformly at random.

The transition matrix describing the transition from i to j is given by P, with P_ij = 1 / deg(i) if there is an edge from i to j and P_ij = 0 otherwise.
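The definition of P can be sketched in a few lines of Python. The function name `transition_matrix` and the adjacency-list input (a dict mapping each page index to the pages it links to) are assumptions made for illustration and do not come from the slides.

```python
def transition_matrix(out_links, n):
    """Build the n x n transition matrix P (row i = page i) from an adjacency list.

    out_links maps a page index i to the page indices that i links to
    (a hypothetical input format chosen for this sketch).
    """
    P = [[0.0] * n for _ in range(n)]
    for i, targets in out_links.items():
        targets = set(targets)              # ignore duplicate links
        for j in targets:
            P[i][j] = 1.0 / len(targets)    # P_ij = 1 / deg(i)
    return P


# Page 0 links to pages 1 and 2, page 1 links to page 2, page 2 has no out-links.
P = transition_matrix({0: [1, 2], 1: [2], 2: []}, 3)
```

In this small example the row for page 2 is all zeros, because page 2 has outdegree 0.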

Page 5: Extrapolation Methods for Accelerating PageRank Computations

Conversion to a Valid Transition Matrix

For P to be a valid transition matrix, P should have no rows consisting of all zeros.

A new transition matrix P’ is introduced which has no all-zero rows.

Let d be the n-dimensional column vector identifying the nodes with outdegree 0: d_i = 1 if deg(i) = 0, and d_i = 0 otherwise.

Page 6: Extrapolation Methods for Accelerating PageRank Computations

Conversion to a Valid Transition Matrix (cont.)

Then P’ is constructed as follows:

P’’ is constructed as follows:
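A minimal sketch of one standard way to carry out both steps. It assumes a probability vector v (taken uniform here) and a damping parameter c, neither of which is defined in the text at this point. Under that assumption, P’ = P + d v^T replaces every all-zero row of P by v^T, and P’’ = c P’ + (1 - c) 1 v^T lets the surfer jump according to v with probability 1 - c. The function names are hypothetical.

```python
def dangling_vector(P):
    """d_i = 1 if row i of P is all zeros (outdegree 0), else 0."""
    return [0.0 if any(row) else 1.0 for row in P]


def fix_dangling(P, v):
    """Assumed construction P' = P + d v^T: all-zero rows become v^T."""
    d = dangling_vector(P)
    return [[p_ij + d_i * v_j for p_ij, v_j in zip(row, v)]
            for row, d_i in zip(P, d)]


def add_teleport(P1, v, c):
    """Assumed construction P'' = c P' + (1 - c) 1 v^T."""
    return [[c * p_ij + (1.0 - c) * v_j for p_ij, v_j in zip(row, v)]
            for row in P1]


# Three-page example in which page 2 has no out-links.
P = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
v = [1.0 / 3] * 3                    # assumed uniform jump distribution
P1 = fix_dangling(P, v)
P2 = add_teleport(P1, v, c=0.85)     # c = 0.85 is only an illustrative value
```

After both steps every row of the matrix sums to 1 and has no zero entries, so the random walk can always continue.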

Page 7: Extrapolation Methods for Accelerating PageRank Computations

Power Method

The matrix A = (P’’)^T is used in the formulation of the “Power Method”.

x(k) = A x(k-1)

x(0) can be written in terms of the eigenvectors u_1, ..., u_m of A as follows:

x(0) = u_1 + α_2 u_2 + ... + α_m u_m
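Applying A repeatedly to this expansion shows why the iterates converge to u_1. A short derivation follows, assuming (this is not stated on the slide) that the u_i are eigenvectors of A with eigenvalues λ_i ordered so that 1 = λ_1 > |λ_2| ≥ ... ≥ |λ_m|:

```latex
x^{(k)} = A^{k} x^{(0)}
        = u_1 + \alpha_2 \lambda_2^{k} u_2 + \dots + \alpha_m \lambda_m^{k} u_m
```

Since |λ_i| < 1 for i ≥ 2, every term except u_1 shrinks as k grows, and the error decays roughly like |λ_2|^k. Extrapolation aims to estimate and remove the slowest of these decaying terms instead of waiting for them to vanish.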

Page 8: Extrapolation Methods for Accelerating PageRank Computations

Power Method Algorithm

The power method algorithm:

PowerMethod() {
    x(0) = v
    k = 1
    repeat
        x(k) = A x(k-1)
        a = |x(k) - x(k-1)|
        k = k + 1
    until a < ε
}
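The pseudocode above can be sketched as runnable Python. This sketch makes several assumptions that are not on the slide: the residual is measured in the L1 norm, a `max_iter` safeguard is added, and the 3 x 3 column-stochastic matrix is a hand-made example, not Web data.

```python
def mat_vec(A, x):
    """Dense matrix-vector product A x for a list-of-rows matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def power_method(A, v, eps=1e-10, max_iter=1000):
    """Iterate x(k) = A x(k-1) from x(0) = v until the residual drops below eps."""
    x = list(v)
    for k in range(1, max_iter + 1):
        x_next = mat_vec(A, x)
        # Residual |x(k) - x(k-1)|, taken here as the L1 norm (an assumption).
        delta = sum(abs(p - q) for p, q in zip(x_next, x))
        x = x_next
        if delta < eps:
            return x, k
    return x, max_iter


# Small example: each column of A sums to 1, as for A = (P'')^T.
A = [[0.5, 0.2, 0.1],
     [0.3, 0.7, 0.2],
     [0.2, 0.1, 0.7]]
x, iterations = power_method(A, [1.0 / 3] * 3)
```

For this matrix the stationary vector is proportional to (7, 13, 9), so `x` ends up close to (7/29, 13/29, 9/29).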

Page 9: Extrapolation Methods for Accelerating PageRank Computations

Aitken Extrapolation

It is assumed that x(k-2) can be expressed as a linear combination of the first two eigenvectors.

x(k-2) = u_1 + α_2 u_2

x(k-1) = A x(k-2)

x(k) = A x(k-1)
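Under this two-eigenvector assumption, with eigenvalue 1 for u_1, the three iterates can be solved for u_1 component by component. This gives the classical Aitken delta-squared formula u_1 ≈ x(k-2) - g / h, with g_i = (x_i(k-1) - x_i(k-2))^2 and h_i = x_i(k) - 2 x_i(k-1) + x_i(k-2). A minimal sketch follows; the function name, the `eps` guard, and the fallback to x(k) when h_i is numerically zero are assumptions made for this example.

```python
def aitken_extrapolate(x_km2, x_km1, x_k, eps=1e-15):
    """Componentwise Aitken delta-squared estimate of u_1 from three iterates."""
    result = []
    for a, b, c in zip(x_km2, x_km1, x_k):
        g = (b - a) ** 2
        h = c - 2.0 * b + a
        # Where h is numerically zero the component has already settled,
        # so keep the latest iterate (assumed fallback).
        result.append(a - g / h if abs(h) > eps else c)
    return result


# Synthetic check: iterates built from exactly two eigenvectors,
# with eigenvalues 1 (for u1) and lam (for u2).
u1 = [0.5, 0.3, 0.2]
u2 = [1.0, -1.0, 0.0]
alpha, lam = 0.1, 0.8
x_km2 = [p + alpha * q for p, q in zip(u1, u2)]
x_km1 = [p + alpha * lam * q for p, q in zip(u1, u2)]
x_k = [p + alpha * lam * lam * q for p, q in zip(u1, u2)]
estimate = aitken_extrapolate(x_km2, x_km1, x_k)
```

When the assumption holds exactly, as in this synthetic case, the estimate recovers u_1 up to rounding. On real iterates it is only an approximation, because the remaining eigenvectors also contribute.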

Page 10: Extrapolation Methods for Accelerating PageRank Computations

Aitken Extrapolation Results

Comparison of convergence rates of the unaccelerated Power Method and Aitken Extrapolation for c = 0.99.

Extrapolation was applied at the 10th iteration.

Page 11: Extrapolation Methods for Accelerating PageRank Computations

Quadratic Extrapolation

It is assumed that the Markov matrix A has only three eigenvectors and that x(k-3) can be expressed as a linear combination of these three eigenvectors.

x(k-3) = u_1 + α_2 u_2 + α_3 u_3

x(k-2) = A x(k-3)

x(k-1) = A x(k-2)

x(k) = A x(k-1)
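Under the three-eigenvector assumption, the four iterates satisfy γ_0 x(k-3) + γ_1 x(k-2) + γ_2 x(k-1) + γ_3 x(k) = 0, where the γ_i are the coefficients of a cubic with root 1, so γ_0 = -(γ_1 + γ_2 + γ_3). Fixing γ_3 = 1 and solving for γ_1, γ_2 in the least-squares sense gives u_1 up to scale as β_0 x(k-2) + β_1 x(k-1) + β_2 x(k), with β_0 = γ_1 + γ_2 + γ_3, β_1 = γ_2 + γ_3, β_2 = γ_3. A minimal sketch of this derivation follows. The function names are hypothetical, the least-squares problem is solved through 2 x 2 normal equations for brevity (a QR factorization would be the more robust choice), and the final rescaling to unit sum is an assumption.

```python
def mat_vec(A, x):
    """Dense matrix-vector product A x for a list-of-rows matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def quadratic_extrapolate(x_km3, x_km2, x_km1, x_k):
    """Estimate u_1 from four successive iterates (three-eigenvector model)."""
    # Differences relative to the oldest iterate.
    y1 = [a - z for a, z in zip(x_km2, x_km3)]
    y2 = [a - z for a, z in zip(x_km1, x_km3)]
    y3 = [a - z for a, z in zip(x_k, x_km3)]

    # Least squares for g1*y1 + g2*y2 = -y3 (gamma_3 fixed to 1),
    # solved via the 2 x 2 normal equations and Cramer's rule.
    a11, a12, a22 = dot(y1, y1), dot(y1, y2), dot(y2, y2)
    rhs1, rhs2 = -dot(y1, y3), -dot(y2, y3)
    det = a11 * a22 - a12 * a12
    if det == 0.0:
        return list(x_k)                 # degenerate case: skip extrapolation
    g1 = (rhs1 * a22 - a12 * rhs2) / det
    g2 = (a11 * rhs2 - a12 * rhs1) / det
    g3 = 1.0

    beta0, beta1, beta2 = g1 + g2 + g3, g2 + g3, g3
    x = [beta0 * p + beta1 * q + beta2 * r
         for p, q, r in zip(x_km2, x_km1, x_k)]
    total = sum(x)
    if total == 0.0:
        return list(x_k)
    return [x_i / total for x_i in x]    # rescale so the entries sum to 1


# A 3 x 3 matrix has exactly three eigenvalues, so the model is exact here.
A = [[0.5, 0.2, 0.1],
     [0.3, 0.7, 0.2],
     [0.2, 0.1, 0.7]]
x0 = [1.0, 0.0, 0.0]
x1 = mat_vec(A, x0)
x2 = mat_vec(A, x1)
x3 = mat_vec(A, x2)
estimate = quadratic_extrapolate(x0, x1, x2, x3)
```

In this toy case three power iterations plus one extrapolation step already give the stationary vector (7/29, 13/29, 9/29) up to rounding. On a real Web matrix the step only removes an approximation of the second and third eigenvector components.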

Page 12: Extrapolation Methods for Accelerating PageRank Computations

Quadratic Extrapolation Results

Comparison of convergence rates for Power Method and Quadratic Extrapolation on LARGEWEB for c = 0.90.

Page 13: Extrapolation Methods for Accelerating PageRank Computations

Quadratic Extrapolation Results

Comparison of times taken by Power Method and Quadratic Extrapolation on LARGEWEB for c = {0.90, 0.95, 0.99}.

The residual tolerance is set to 0.001 for c = {0.90, 0.95} and 0.01 for c = 0.99.

Page 14: Extrapolation Methods for Accelerating PageRank Computations

Comparison of Convergence Rates for Three Methods

Comparison of convergence rates for Power Method, Aitken Extrapolation and Quadratic Extrapolation for c = 0.99.

Page 15: Extrapolation Methods for Accelerating PageRank Computations

Conclusion

Although PageRank is an offline computation, it has become increasingly desirable to speed up this computation.

The extrapolation step need only be applied periodically, not at every step.

Quadratic and Aitken extrapolation are simple techniques that require little additional infrastructure to integrate into the standard Power Method.
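How little infrastructure is needed can be illustrated with a sketch of a power iteration that applies an Aitken step every `period` iterations. All names, the default `period=10`, the L1 residual, and the renormalization after each extrapolation are assumptions made for this example. It is not the implementation behind the reported results.

```python
def mat_vec(A, x):
    """Dense matrix-vector product A x for a list-of-rows matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def aitken_extrapolate(x_km2, x_km1, x_k, eps=1e-15):
    """Componentwise Aitken delta-squared step; keeps x(k) where h is ~0."""
    result = []
    for a, b, c in zip(x_km2, x_km1, x_k):
        g = (b - a) ** 2
        h = c - 2.0 * b + a
        result.append(a - g / h if abs(h) > eps else c)
    return result


def power_method_with_aitken(A, v, period=10, eps=1e-10, max_iter=1000):
    """Power method that extrapolates only on every `period`-th iteration."""
    prev, cur = None, list(v)
    for k in range(1, max_iter + 1):
        nxt = mat_vec(A, cur)
        if prev is not None and k % period == 0:
            candidate = aitken_extrapolate(prev, cur, nxt)
            total = sum(candidate)
            if total > 0.0:
                # Rescale so the entries sum to 1 again (assumed step).
                nxt = [c_i / total for c_i in candidate]
        delta = sum(abs(p - q) for p, q in zip(nxt, cur))
        prev, cur = cur, nxt
        if delta < eps:
            return cur, k
    return cur, max_iter


# Same hand-made column-stochastic example as a plain power iteration would use.
A = [[0.5, 0.2, 0.1],
     [0.3, 0.7, 0.2],
     [0.2, 0.1, 0.7]]
x, iterations = power_method_with_aitken(A, [1.0 / 3] * 3)
```

The only additions to the plain power loop are the two saved iterates and the periodic call, which matches the point that extrapolation does not have to run at every step.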