1 page rank uintuition: solve the recursive equation: “a page is important if important pages link...

32
1 Page Rank Intuition: solve the recursive equation: “a page is important if important pages link to it.” In technical terms: compute the principal eigenvector of the stochastic matrix of the Web. A few fixups needed.

Upload: frederick-green

Post on 01-Jan-2016

223 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

1

Page Rank

Intuition: solve the recursive equation: “a page is important if important pages link to it.”

In technical terms: compute the principal eigenvector of the stochastic matrix of the Web. A few fixups needed.

Page 2: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

2

Stochastic Matrix of the Web

Enumerate pages. Page i corresponds to row and column

i. M[i,j] = 1/n if page j links to n pages,

including page i; 0 if j does not link to i. Seems backwards, but allows

multiplication by M on the left to represent “follow a link.”

Page 3: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

3

Example

i

j

Suppose page j links to 3 pages, including i

1/3

Page 4: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

4

Random Walks on the Web

Suppose v is a vector whose i-th component is the probability that we are at page i at a certain time.

If we follow a link from i at random, the probability distribution of the page we are then at is given by the vector Mv.

Page 5: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

5

The multiplicationp11 p12 p13 p1

p21 p22 p23 X p2

p31 p32 p33 p3

If the probability that we are in page i is pi, then in the next iteration p1 will be the probability we are in page 1 and will stay there + the probability we are in page 2 times the probability of moving from 2 to 1 + the probability that we are in page 3 times the probability of moving from 3 to 1:

p11 x p1 + p12 x p2+ p13 x p3

Page 6: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

6

Random Walks 2

Starting from any vector v, the limit M(M(…M(Mv)…)) is the distribution of page visits during a random walk.

Intuition: pages are important in proportion to how often a random walker would visit them.

The math: limiting distribution = principal eigenvector of M = PageRank.

Page 7: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

7

Example: The Web in 1839

Yahoo

M’softAmazon

y 1/2 1/2 0a 1/2 0 1m 0 1/2 0

y a m

Page 8: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

8

Simulating a Random Walk

Start with the vector v = [1,1,…,1] representing the idea that each Web page is given one unit of “importance.”

Repeatedly apply the matrix M to v, allowing the importance to flow like a random walk.

Limit exists, but about 50 iterations is sufficient to estimate final distribution.

Page 9: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

9

Example

Equations v = Mv: y = y/2 + a/2 a = y/2 + m m = a/2

ya =m

111

13/21/2

5/4 13/4

9/811/81/2

6/56/53/5

. . .

Page 10: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

10

Solving The Equations

These 3 equations in 3 unknowns do not have a unique solution.

Add in the fact that y+a+m=3 to solve.

In Web-sized examples, we cannot solve by Gaussian elimination (we need to use other solution (relaxation = iterative solution).

Page 11: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

11

Real-World Problems

Some pages are “dead ends” (have no links out). Such a page causes importance to

leak out. Other (groups of) pages are spider

traps (all out-links are within the group). Eventually spider traps absorb all

importance.

Page 12: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

12

Microsoft Becomes Dead EndYahoo

M’softAmazon

y 1/2 1/2 0a 1/2 0 0m 0 1/2 0

y a m

Page 13: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

13

Example

Equations v = Mv: y = y/2 + a/2 a = y/2 m = a/2

ya =m

111

11/21/2

3/41/21/4

5/83/81/4

000

. . .

Page 14: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

14

M’soft Becomes Spider Trap

Yahoo

M’softAmazon

y 1/2 1/2 0a 1/2 0 0m 0 1/2 1

y a m

Page 15: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

15

Example

Equations v = Mv: y = y/2 + a/2 a = y/2 m = a/2 + m

ya =m

111

11/23/2

3/41/27/4

5/83/82

003

. . .

Page 16: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

16

Google Solution to Traps, Etc.

“Tax” each page a fixed percentage at each iteration. This percentage is also called “damping factor”.

Add the same constant to all pages. Models a random walk in which surfer

has a fixed probability of abandoning search and going to a random page next.

Page 17: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

17

Ex: Previous with 20% Tax

Equations v = 0.8(Mv) + 0.2: y = 0.8(y/2 + a/2) + 0.2 a = 0.8(y/2) + 0.2 m = 0.8(a/2 + m) + 0.2

ya =m

111

1.000.601.40

0.840.601.56

0.7760.5361.688

7/11 5/1121/11

. . .

Page 18: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

18

Solving the Equations

We can expect to solve small examples by Gaussian elimination.

Web-sized examples still need to be solved by more complex (relaxation) methods.

Page 19: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

19

Search-Engine Architecture

All search engines, including Google, select pages that have the words of your query.

Give more weight to the word appearing in the title, header, etc.

Inverted indexes speed the discovery of pages with given words.

Page 20: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

20

Google Anti-Spam Devices

Early search engines relied on the words on a page to tell what it is about. Led to “tricks” in which pages attracted

attention by placing false words in the background color on their page.

Google trusts the words in anchor text Relies on others telling the truth about

your page, rather than relying on you.

Page 21: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

21

Use of Page Rank

Pages are ordered by many criteria, including the PageRank and the appearance of query words. “Important” pages more likely to be

what you want. PageRank is also an antispam device.

Creating bogus links to yourself doesn’t help if you are not an important page.

Page 22: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

22

Discussion

Dealing with incentives Several types of links Page ranking as voting

Page 23: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

23

Hubs and Authorities

Distinguishing Two Roles for Pages

Page 24: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

24

Hubs and Authorities

Mutually recursive definition: A hub links to many authorities; An authority is linked to by many hubs.

Authorities turn out to be places where information can be found. Example: information about how to use a

programming language Hubs tell who the authorities are.

Example: a catalogue of sources about programming languages

Page 25: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

25

Transition Matrix A

H&A uses a matrix A[i,j] = 1 if page i links to page j, 0 if not.

A’, the transpose of A, is similar to the PageRank matrix M, but A’ has 1’s where M has fractions.

Page 26: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

26

Example

Yahoo

M’softAmazon

y 1 1 1a 1 0 1m 0 1 0

y a m

A =

Page 27: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

27

Using Matrix A for H&A

Let h and a be vectors measuring the “hubbiness” and authority of each page.

Equations: h = Aa; a = A’ h. Hubbiness = scaled sum of

authorities of linked pages. Authority = scaled sum of hubbiness

of linked predecessors.

Page 28: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

28

Consequences of Basic Equations

From h = Aa; a = A’ h we can derive: h = AA’ h a = A’Aa

Compute h and a by iteration, assuming initially each page has one unit of hubbiness and one unit of authority.

There are different normalization techniques (after each iteration in an iterative procedure; other implementation is “normalization at end”).

Page 29: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

29

The multiplication

1 1 1 a1 h1

1 0 1 x a2 = h2

0 1 0 a3 h3

In order to know the hubbiness of page 2, h2, we need to add up the level of authority of the pages it points to (1 and 3).

Page 30: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

30

The multiplication

1 1 0 h1 a1

1 0 1 x h2 = a2

1 1 0 h3 a3

In order to know the level authority of page 3, a3, we need to add up the amount of hubbiness of the pages that point to it (1 and 2).

Page 31: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

31

Example

1 1 1A = 1 0 1 0 1 0

1 1 0A’ = 1 0 1 1 1 0

3 2 1AA’= 2 2 0 1 0 1

2 1 2A’A= 1 2 1 2 1 2

a(yahoo)a(amazon)a(m’soft)

===

111

545

241824

114 84114

. . .

. . .

. . .

1+sqrt(3)21+sqrt(3)

h(yahoo) = 1h(amazon) = 1h(m’soft) = 1

642

132 96 36

. . .

. . .

. . .

1.0000.7350.268

2820 8

Page 32: 1 Page Rank uIntuition: solve the recursive equation: “a page is important if important pages link to it.” uIn technical terms: compute the principal eigenvector

32

Solving the Equations

Solution of even small examples is tricky.

As for PageRank, we need to solve big examples by relaxation.