Roshnika Fernando
PAGERANK
WHY PAGERANK?
The internet is a global system of networks linking to smaller networks.
This system keeps growing, so there must be a way to sort through all the information available.
PageRank is the algorithm used by the search engine Google to sort through internet webpages
A webpage’s rank determines the order in which it appears when a keyword search is performed on Google
Fun Fact: PageRank is named after Larry Page, one of the founders of Google, not after webpages
POPULARITY CONTEST
Rank, at its simplest, is the probability that a webpage will be visited
The ranks of all pages sum to 1
The rank of the pages that link to a page affects that page's rank
Initially, rank = 1/(total # of pages available) ≈ 0 for internet
DETERMINING RANK
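Let P be an n x n stochastic matrix, where n is the total number of webpages and $p_{i,j}$ is the probability of going to webpage i from webpage j.
$p_{i,j} = \dfrac{\text{\# of links to page } i \text{ from page } j}{\text{\# of links on page } j}$
Note: i and j are positive integers (page indices), so each column j of P sums to 1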
Note: There are around 25 billion webpages on the internet, so P has around 25 billion rows and columns
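The construction of P can be sketched in a few lines of Python. This is only an illustration: the function name `build_transition_matrix` and the 3-page web are hypothetical, and the code assumes the column convention used on the later slides (each column sums to 1, so entry [i][j] is the probability of going to page i from page j).

```python
def build_transition_matrix(links, n):
    """Return the n x n column-stochastic matrix P as a list of rows.

    links maps a page index j (0-based) to the list of pages it links to.
    P[i][j] = (# of links to page i from page j) / (# of links on page j).
    """
    P = [[0.0] * n for _ in range(n)]
    for j, targets in links.items():
        for i in targets:
            P[i][j] += 1.0 / len(targets)
    return P


# Hypothetical 3-page web (not from the slides):
# page 0 links to pages 1 and 2, page 1 links to page 2, page 2 links to page 0.
links = {0: [1, 2], 1: [2], 2: [0]}
P = build_transition_matrix(links, 3)
for row in P:
    print(row)
```

Each column of the printed matrix sums to 1 because every page in this toy web has at least one outgoing link.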
LONG TERM PROBABILITY
After a very long time, what is the probability that web surfers will be at a certain website?
Let $x$ be the stationary distribution vector, where $x_k$ is the probability of being at state k.
Since stochastic matrices have eigenvalue λ = 1,
$$Px = \lambda x \quad\Longrightarrow\quad Px = x \quad\Longrightarrow\quad (P - I)x = 0$$
Solve for $x$ to determine the long term probability of being at each webpage (aka the rank)
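One way to approximate the long term probabilities numerically is to apply P repeatedly to the initial uniform rank vector (power iteration). This is a swapped-in technique for illustration, not necessarily how the slides' values were computed. The function name `stationary_distribution` and the 3-page matrix are hypothetical, and the sketch assumes P is column-stochastic and that the iteration converges for the given web.

```python
def stationary_distribution(P, iterations=100):
    """Approximate x with Px = x by repeatedly multiplying by P."""
    n = len(P)
    x = [1.0 / n] * n  # initial rank: 1 / (total # of pages)
    for _ in range(iterations):
        # One step of the random surfer: new x = P x
        x = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


# Hypothetical 3-page web (columns sum to 1):
# page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0.
P = [[0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0]]
x = stationary_distribution(P)
print([round(v, 3) for v in x])
```

For this toy web the exact solution of (P - I)x = 0 with entries summing to 1 is x = (0.4, 0.2, 0.4), which the iteration approaches.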
SMALL SCALE EXAMPLE
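7 pages linked to one another
$$P = \begin{pmatrix}
0 & 1 & .5 & 0 & .333 & .5 & 0 \\
.2 & 0 & .5 & .5 & 0 & 0 & 0 \\
.2 & 0 & 0 & 0 & 0 & 0 & 0 \\
.2 & 0 & 0 & 0 & .333 & 0 & 0 \\
.2 & 0 & 0 & .5 & 0 & .5 & 1 \\
0 & 0 & 0 & 0 & .333 & 0 & 0 \\
.2 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}, \qquad
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{pmatrix}$$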
LINEAR PROGRAM
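With the 7 x 7 matrix P and the vector x from the small scale example:
$$(P - I)x = 0, \qquad
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{pmatrix}
= \begin{pmatrix} .304 \\ .166 \\ .141 \\ .105 \\ .179 \\ .045 \\ .061 \end{pmatrix}$$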
Solve (P - I)x = 0 for the vector x to obtain the PageRank
The vector x is the eigenvector of P for eigenvalue λ = 1, scaled so that its entries sum to 1
SMALL SCALE SOLUTION
•As t → ∞
•pi,j given
•PageRank:
x1 = .304
x2 = .166
x3 = .141
x4 = .105
x5 = .179
x6 = .045
x7 = .061
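As a quick check of the values above, the short sketch below confirms that the seven ranks sum to approximately 1 and lists the pages in the order their ranks imply. The variable names are illustrative only; the numbers are the ones listed on this slide.

```python
# PageRank values from the small scale solution (page number -> rank)
ranks = {1: 0.304, 2: 0.166, 3: 0.141, 4: 0.105,
         5: 0.179, 6: 0.045, 7: 0.061}

# The ranks are probabilities, so they should sum to 1 up to rounding.
total = sum(ranks.values())
print(round(total, 3))

# Pages ordered from highest to lowest rank.
order = sorted(ranks, key=ranks.get, reverse=True)
print(order)  # [1, 5, 2, 3, 4, 7, 6]
```

Page 1 comes first and page 6 last; the small deviation of the total from 1 comes from rounding each rank to three decimals.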
SENSITIVITY ANALYSIS
What if a page has no links? What happens to the probability matrix P?
P is stochastic, meaning each column must sum to 1.
If page j has no links leading out, column j would be all zeros, so its probability is distributed evenly over all n rows (n = total number of pages): $p_{i,j} = 1/n$ for every i, so that
$$\sum_{i=1}^{n} p_{i,j} = 1$$
This assumes that when someone reaches a dead end, the next page he/she visits is chosen entirely at random
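The dead-end fix described above can be sketched as follows. The function name `fix_dangling_columns` and the 3-page matrix are hypothetical; the sketch assumes P is stored as a list of rows and that a page with no outgoing links shows up as an all-zero column.

```python
def fix_dangling_columns(P):
    """Return a copy of P where every all-zero column is replaced by 1/n."""
    n = len(P)
    fixed = [row[:] for row in P]  # leave the input matrix untouched
    for j in range(n):
        if all(P[i][j] == 0 for i in range(n)):
            # Page j is a dead end: jump to any page with equal probability.
            for i in range(n):
                fixed[i][j] = 1.0 / n
    return fixed


# Hypothetical 3-page web where page 2 has no outgoing links.
P = [[0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0]]
fixed = fix_dangling_columns(P)
for row in fixed:
    print(row)
```

After the fix every column sums to 1 again, so the matrix is stochastic and (P - I)x = 0 can be solved as before.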
PROBABILITY AND RANK
The stationary distribution vector $x$ contains the rank of each webpage, which determines the order in which it appears when a keyword search is performed
Each rank is the probability that a person will be at that page, out of the billions of pages available online.
This takes several powerful computers to compute.
QUESTIONS?
CITATIONS
Austin, David. "How Google Finds Your Needle in the Web's Haystack." AMS.org. American Mathematical Society. Web. 09 Nov. 2009. <http://www.ams.org/featurecolumn/archive/pagerank.html>.
"PageRank." Wikipedia, the free encyclopedia. Web. 09 Nov. 2009. <http://en.wikipedia.org/wiki/PageRank#False_or_spoofed_PageRank>.
Photograph. PageRanks-Example. Wikipedia, 8 July 2009. Web. 9 Nov. 2009. <http://upload.wikimedia.org/wikipedia/commons/f/fb/PageRanks-Example.svg>.
"Stochastic matrix." Wikipedia, the free encyclopedia. Web. 09 Nov. 2009. <http://en.wikipedia.org/wiki/Stochastic_matrix>.