
Roshnika Fernando PAGERANK

Upload: lilian-hopkins

Post on 12-Jan-2016


Page 1: Roshnika Fernando P AGE R ANK. W HY P AGE R ANK ?  The internet is a global system of networks linking to smaller networks.  This system keeps growing,

Roshnika Fernando

PAGERANK


WHY PAGERANK?

The internet is a global system of networks linking to smaller networks.

This system keeps growing, so there must be a way to sort through all the information available.

PageRank is the algorithm used by the search engine Google to sort through internet webpages.

A webpage's rank determines the order in which it appears when a keyword search is performed on Google.

Fun Fact: PageRank is named after Larry Page, one of the founders of Google, not after webpages


POPULARITY CONTEST

Rank, at its simplest, is the probability that a webpage will be visited

The ranks of all pages sum to 1

The rank of a page is affected by the ranks of the pages linked to it

Initially, every page has rank = 1/(total # of pages available), which is ≈ 0 for the internet
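The starting point above can be sketched in a few lines. This is only an illustration: the helper name `initial_ranks` is hypothetical, and the 7-page size is chosen to match the small example later in the slides.

```python
def initial_ranks(num_pages):
    """Give every page the same starting rank, 1 / num_pages."""
    if num_pages <= 0:
        raise ValueError("num_pages must be positive")
    return [1.0 / num_pages] * num_pages


ranks = initial_ranks(7)
print(ranks[0])    # starting rank of each of the 7 pages
print(sum(ranks))  # the ranks sum to 1 (up to floating-point rounding)

# With billions of pages, the starting rank of a single page is close to 0.
print(1 / 25_000_000_000)
```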


DETERMINING RANK

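Let $P$ be an $n \times n$ stochastic matrix, where $n$ is the number of webpages and $p_{i,j}$ is the probability of going to webpage $i$ from webpage $j$ (so that each column of $P$ sums to 1).

\[ p_{i,j} = \frac{\text{\# of links to page } i \text{ from page } j}{\text{\# of links on page } j} \]

Note: $i$ and $j$ are positive integers (page indices from 1 to $n$)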

Note: With around 25 billion webpages on the internet, $P$ has around 25 billion rows and columns
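As a rough sketch of how such a matrix could be built from link data, the snippet below assumes the column convention used on the later slides (each column sums to 1), so entry (i, j) holds the probability of moving from page j to page i. The function name `build_transition_matrix` and the 3-page link structure are hypothetical.

```python
def build_transition_matrix(links, num_pages):
    """Build a column-stochastic matrix from outgoing links.

    links maps a page number j (1-based) to the list of pages it links to.
    P[i - 1][j - 1] is (# of links to page i from page j) / (# of links on page j).
    """
    P = [[0.0] * num_pages for _ in range(num_pages)]
    for j, targets in links.items():
        for i in targets:
            # Each link on page j carries an equal share of the probability.
            P[i - 1][j - 1] += 1.0 / len(targets)
    return P


# Hypothetical example: page 1 links to 2 and 3, page 2 to 3, page 3 to 1.
links = {1: [2, 3], 2: [3], 3: [1]}
P = build_transition_matrix(links, 3)
for row in P:
    print(row)
```

A page that appears in `links` with an empty list, or not at all, is left as an all-zero column here.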


LONG TERM PROBABILITY

After a very long time, what is the probability that web surfers will be at a certain website?

Let $x$ be the stationary distribution vector, where $x_k$ is the probability of being at state $k$.

Since stochastic matrices have eigenvalue $\lambda = 1$,

\[ Px = \lambda x = x \quad \Longrightarrow \quad (P - I)x = 0 \]

Solve for $x$ to determine the long term probability of being at each webpage (aka the rank)
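One way to see the long-term behaviour is to apply $P$ repeatedly to the uniform starting vector until the result stops changing. The sketch below does this for a hypothetical 3-page matrix (not taken from the slides); the function name, tolerance and iteration cap are assumptions, and repeated multiplication is a swapped-in numerical method rather than the eigenvector solve the slides describe.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Repeatedly apply x <- P x, starting from the uniform vector.

    P is a column-stochastic matrix given as a list of rows.
    """
    n = len(P)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        new_x = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x


# Hypothetical 3-page web: page 1 links to 2 and 3, page 2 to 3, page 3 to 1.
P = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
x = stationary_distribution(P)
print(x)  # approaches (0.4, 0.2, 0.4), which satisfies P x = x
```

Solving $Px = x$ by hand for this matrix gives $x_2 = x_1/2$ and $x_3 = x_1$, which together with $x_1 + x_2 + x_3 = 1$ yields the same vector.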


SMALL SCALE EXAMPLE

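7 pages linked to one another

\[
P = \begin{pmatrix}
0 & 1 & 0.5 & 0 & 0.333 & 0.5 & 0 \\
0.2 & 0 & 0.5 & 0.5 & 0 & 0 & 0 \\
0.2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.2 & 0 & 0 & 0 & 0.333 & 0 & 0 \\
0.2 & 0 & 0 & 0.5 & 0 & 0.5 & 1 \\
0 & 0 & 0 & 0 & 0.333 & 0 & 0 \\
0.2 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix},
\qquad
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{pmatrix}
\]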


LINEAR PROGRAM

With $P$ the $7 \times 7$ matrix of the small scale example:

\[
x = (x_1, x_2, x_3, x_4, x_5, x_6, x_7)^T = (0.304,\ 0.166,\ 0.141,\ 0.105,\ 0.179,\ 0.045,\ 0.061)^T
\]

Solve for the vector $x$ using $(P - I)x = 0$ to obtain the PageRank

The vector $x$ is the eigenvector of $P$ for eigenvalue $\lambda = 1$, scaled so that its entries sum to 1
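The solve step can be sketched without any libraries by replacing one (redundant) equation of $(P - I)x = 0$ with the condition that the ranks sum to 1, then running Gaussian elimination. Everything here is an assumption for illustration: the function names are hypothetical, the matrix is one reading of the 7-page matrix on the slides (with .333 taken as 1/3), and the ranks it prints need not match the figures quoted on the slides if those were produced from a different matrix or method.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def page_rank(P):
    """Solve (P - I) x = 0 together with sum(x) = 1."""
    n = len(P)
    A = [[P[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    A[-1] = [1.0] * n  # replace the last equation with x_1 + ... + x_n = 1
    b = [0.0] * (n - 1) + [1.0]
    return solve_linear_system(A, b)


T = 1.0 / 3.0  # assumed exact value behind the slides' .333
P = [
    [0.0, 1.0, 0.5, 0.0, T,   0.5, 0.0],
    [0.2, 0.0, 0.5, 0.5, 0.0, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.0, T,   0.0, 0.0],
    [0.2, 0.0, 0.0, 0.5, 0.0, 0.5, 1.0],
    [0.0, 0.0, 0.0, 0.0, T,   0.0, 0.0],
    [0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
x = page_rank(P)
for k, rank in enumerate(x, start=1):
    print(f"x{k} = {rank:.3f}")
```

Replacing an equation works because the columns of $P - I$ each sum to 0, so any one of its rows is determined by the others.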


SMALL SCALE SOLUTION

•As t → ∞

•$p_{i,j}$ given

•PageRank:

x1 = .304

x2 = .166

x3 = .141

x4 = .105

x5 = .179

x6 = .045

x7 = .061


SENSITIVITY ANALYSIS

What if a page has no links? What happens to the probability matrix P?

$P$ is stochastic, meaning the entries in each column must sum to 1.

If a page $j$ has no links leading out, column $j$ of $P$ would be all zeros. Instead, the probability for that column is distributed evenly over all of its rows (each $p_{i,j} = 1/n$ for $n$ pages) so that

\[ \sum_{i=1}^{n} p_{i,j} = 1 \]

This assumes that when someone reaches a dead end, the page he/she goes to next is chosen entirely at random
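The dead-end rule above can be sketched as a small repair step on the matrix. The function name `fix_dead_ends` and the 3-page example are hypothetical, and the sketch assumes the column convention in which column j holds the probabilities of leaving page j.

```python
def fix_dead_ends(P):
    """Return a copy of P in which every all-zero column becomes uniform.

    A zero column j means page j has no outgoing links; each entry of that
    column is set to 1 / n so the column sums to 1 again.
    """
    n = len(P)
    fixed = [row[:] for row in P]
    for j in range(n):
        if all(P[i][j] == 0 for i in range(n)):
            for i in range(n):
                fixed[i][j] = 1.0 / n
    return fixed


# Hypothetical 3-page web in which page 3 has no outgoing links.
P = [
    [0.0, 0.5, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.5, 0.0],
]
fixed = fix_dead_ends(P)
for row in fixed:
    print(row)
```

Columns that already sum to 1 are copied unchanged, so the repair only touches pages with no outgoing links.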


PROBABILITY AND RANK

The stationary distribution vector $x$ contains the rank of each webpage, which determines the order in which it appears when a keyword search is performed.

Each entry of $x$ is the probability that a person will be at that page, out of the billions of pages available online.

This takes several powerful computers to compute.


QUESTIONS?


CITATIONS

Austin, David. "How Google Finds Your Needle in the Web's Haystack." AMS.org. American Mathematical Society. Web. 09 Nov. 2009. <http://www.ams.org/featurecolumn/archive/pagerank.html>.

"PageRank." Wikipedia, the free encyclopedia. Web. 09 Nov. 2009. <http://en.wikipedia.org/wiki/PageRank#False_or_spoofed_PageRank>.

Photograph. PageRanks-Example. Wikipedia, 8 July 2009. Web. 09 Nov. 2009. <http://upload.wikimedia.org/wikipedia/commons/f/fb/PageRanks-Example.svg>.

"Stochastic matrix." Wikipedia, the free encyclopedia. Web. 09 Nov. 2009. <http://en.wikipedia.org/wiki/Stochastic_matrix>.