PageRank and Markov chains
A brief introduction to the methodology used by PageRank to rank web pages.
Markov Chains as the methodology used by PageRank to rank web pages on the Internet.
Sergio S. Guirreri - www.guirreri.host22.com
Google Technology User Group (GTUG) of Palermo.
5th March 2010
Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Internet.5th March 2010 1 / 14
Overview
1 Concepts on Markov-Chains.
2 The idea of the PageRank algorithm.
3 The PageRank algorithm.
4 Solving the PageRank algorithm.
5 Conclusions.
6 Bibliography.
7 Internet web sites.
Concepts on Markov-Chains.
Stochastic Process and Markov-Chains.
Consider the stochastic process
$$\{X_n;\ n = 0, 1, 2, \dots\}$$
with values in a set $E$, called the state space, whose elements are called the states of the process.
Assume that the set $E$ is finite or countable.
Definition. A Markov chain is a stochastic process $X_n$ with the following property:
$$\mathrm{Prob}\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0\} = \mathrm{Prob}\{X_{n+1} = j \mid X_n = i\} = p_{ij}(n)$$
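where $E$ is the state space and $j, i, i_{n-1}, \dots, i_0 \in E$, $n \in \mathbb{N}$.
When the transition probabilities do not depend on $n$ (a time-homogeneous chain), the transition probability matrix $P$ of the process $X_n$ is composed of the entries $p_{ij}$, $\forall i, j \in E$.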
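To make the definition concrete, here is a minimal Python/NumPy sketch that simulates a few steps of a time-homogeneous Markov chain. The 3-state transition matrix and the function name `simulate_chain` are hypothetical, chosen only for illustration. Rows are indexed by the current state i and columns by the next state j, matching $p_{ij}$ above, so each row sums to one.

```python
import numpy as np

def simulate_chain(P, x0, n_steps, rng):
    """Simulate X_0, ..., X_n of a time-homogeneous Markov chain.

    P[i, j] = Prob{X_{n+1} = j | X_n = i}, so every row of P sums to 1.
    """
    P = np.asarray(P, dtype=float)
    assert np.allclose(P.sum(axis=1), 1.0), "rows of P must sum to 1"
    path = [x0]
    for _ in range(n_steps):
        # The next state is drawn using only the current state (Markov property),
        # never the earlier history stored in `path`.
        current = path[-1]
        path.append(int(rng.choice(len(P), p=P[current])))
    return path

# Hypothetical 3-state chain with E = {0, 1, 2}, for illustration only.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
rng = np.random.default_rng(seed=0)
path = simulate_chain(P, x0=0, n_steps=10, rng=rng)
print(path)
```

Because each draw looks only at `path[-1]`, the simulated process has the Markov property by construction; with this matrix a jump from state 0 straight to state 2 (or back) can never appear, since those entries are zero.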
The idea of the PageRank algorithm.
PageRank’s idea.
The idea behind the PageRank algorithm is similar to the idea of the impact factor used to rank journals [Page et al.(1999)] [Brin and Page(1998)] [Langville et al.(2008)].
PageRank, the impact factor of the Internet.
The impact factor of a journal is defined as the average number of citations per recently published paper in that journal.
By regarding each web page as a journal, this idea was then extended to measure the importance of a web page in the PageRank algorithm.
The idea of the PageRank algorithm.
Elements of the PageRank.
To illustrate the PageRank algorithm, I define the following variables [Ching and Ng(2006)]:
let $N$ be the total number of web pages in the web;
let $k$ be the number of outgoing links of web page $j$;
let $Q$ be the so-called hyperlink matrix with elements:
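$$Q_{ij} = \begin{cases} \dfrac{1}{k} & \text{if web page } i \text{ is an outgoing link of web page } j, \\ 0 & \text{otherwise,} \end{cases} \qquad Q_{ii} > 0 \ \ \forall i. \tag{1}$$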
The hyperlink matrix $Q$ can be regarded as the transition probability matrix of a Markov chain. One may regard a surfer on the net as a random walker and the web pages as the states of the Markov chain.
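A minimal NumPy sketch of equation (1) for a small hypothetical web. Two assumptions are made that the slide leaves open: every page gets a self-link, as one reading of the condition $Q_{ii} > 0$, and that self-link is counted among the $k$ outgoing links. Under these assumptions each column of $Q$ sums to one, so $Q$ is column-stochastic and acts on column vectors of page scores. The `links` dictionary and the name `hyperlink_matrix` are illustrative only.

```python
import numpy as np

def hyperlink_matrix(links, N):
    """Build the hyperlink matrix Q of equation (1).

    links[j] lists the pages that page j points to. Assumption: each page
    also links to itself (Q_ii > 0), and that self-link counts in k.
    """
    Q = np.zeros((N, N))
    for j in range(N):
        targets = set(links.get(j, [])) | {j}  # add the self-link
        k = len(targets)
        for i in targets:
            Q[i, j] = 1.0 / k  # column j spreads probability 1 over its k links
    return Q

# Hypothetical 4-page web: 0 -> 1, 2; 1 -> 2; 2 -> 0; 3 -> 0, 2.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
Q = hyperlink_matrix(links, N=4)
print(Q)
print(Q.sum(axis=0))  # each column sums to 1
```

Note that in this example no other page links to page 3, which matters for the irreducibility discussion that follows.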
The PageRank algorithm.
The PageRank with irreducible Markov Chain.
Assuming that the Markov chain is irreducible (a) and aperiodic (b), the steady-state probability distribution $(p_1, p_2, \dots, p_N)^T$ of the states (web pages) exists and is unique.
(a) A Markov chain is irreducible if all states communicate with each other.
(b) A state $s$ is periodic if there exists $k > 1$ such that the number of steps between two visits to $s$ is always a multiple of $k$. A chain is therefore aperiodic if no state is periodic, i.e. every state has period $k = 1$.
The PageRank
Each $p_i$ is the long-run proportion of time that the surfer spends visiting web page $i$. The higher the value of $p_i$, the more important web page $i$ is. The PageRank of web page $i$ is then defined as $p_i$.
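Both assumptions can be checked, and the steady-state distribution computed, for a small example. The sketch below (NumPy, hypothetical hyperlink matrices built as in (1) with self-links) relies on Wielandt's theorem: a nonnegative $N \times N$ matrix is primitive, i.e. irreducible and aperiodic, if and only if its $((N-1)^2+1)$-th power is entrywise positive. It then solves $p = Qp$ with $\sum_i p_i = 1$ by least squares. This is an illustrative check, not a step of the original method; the function names are hypothetical.

```python
import numpy as np

def is_primitive(M):
    """Irreducible and aperiodic <=> M^((N-1)^2 + 1) > 0 entrywise (Wielandt)."""
    N = M.shape[0]
    A = (M > 0).astype(int)
    B = A.copy()
    for _ in range((N - 1) ** 2):
        B = np.minimum(B @ A, 1)  # track only the zero/nonzero pattern
    return bool(np.all(B > 0))

def stationary_distribution(Q):
    """Solve p = Q p together with sum(p) = 1 via least squares."""
    N = Q.shape[0]
    A = np.vstack([Q - np.eye(N), np.ones((1, N))])
    b = np.zeros(N + 1)
    b[-1] = 1.0
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

# Hypothetical 3-page web (0 -> 1, 2; 1 -> 2; 2 -> 0) plus self-links.
Q = np.array([[1/3, 0.0, 1/2],
              [1/3, 1/2, 0.0],
              [1/3, 1/2, 1/2]])
print(is_primitive(Q))   # True
p = stationary_distribution(Q)
print(p)                 # about [0.333, 0.222, 0.444]

# A 4th page that links to pages 0 and 2 but receives no links
# (apart from its self-link) makes the chain reducible.
Q4 = np.array([[1/3, 0.0, 1/2, 1/3],
               [1/3, 1/2, 0.0, 0.0],
               [1/3, 1/2, 1/2, 1/3],
               [0.0, 0.0, 0.0, 1/3]])
print(is_primitive(Q4))  # False
```

Page 2, which receives links from every page (including itself), gets the highest score. The 4-page variant fails the check, which is the reducible situation handled by the modified matrix $P$.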
The PageRank algorithm.
The PageRank with reducible Markov Chain
Since the matrix $Q$ may be reducible, the following matrix $P$ is considered instead, to ensure that the steady-state probability distribution exists and is unique:
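$$P = \alpha \begin{pmatrix} Q_{11} & Q_{12} & \cdots & Q_{1N} \\ Q_{21} & Q_{22} & \cdots & Q_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{N1} & Q_{N2} & \cdots & Q_{NN} \end{pmatrix} + \frac{1-\alpha}{N} \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix} \tag{2}$$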
where $0 < \alpha < 1$; the most popular values of $\alpha$ are 0.85 and $1 - 1/N$.
Interpretation of PageRank
The idea of the PageRank (2) is that, for a network of $N$ web pages, each web page has an inherent importance of $(1-\alpha)/N$. If a page $P_i$ has an importance of $p_i$, then it contributes an importance of $\alpha p_i$, which is shared among the web pages that it points to.
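Equation (2) translates directly into NumPy. This sketch uses a hypothetical 4-page hyperlink matrix (columns sum to one) in which page 3 receives no links from other pages, so $Q$ alone is reducible; $\alpha = 0.85$ is the popular value quoted above. The name `google_matrix` is illustrative.

```python
import numpy as np

def google_matrix(Q, alpha=0.85):
    """P = alpha * Q + (1 - alpha)/N * (N x N matrix of ones), as in equation (2)."""
    N = Q.shape[0]
    return alpha * Q + (1.0 - alpha) / N * np.ones((N, N))

# Hypothetical hyperlink matrix: columns sum to 1, page 3 has no in-links
# apart from its own self-link.
Q = np.array([[1/3, 0.0, 1/2, 1/3],
              [1/3, 1/2, 0.0, 0.0],
              [1/3, 1/2, 1/2, 1/3],
              [0.0, 0.0, 0.0, 1/3]])
P = google_matrix(Q, alpha=0.85)
print(P.sum(axis=0))  # columns still sum to 1
print(P.min() > 0)    # True: every entry is strictly positive
```

Because every entry of $P$ is strictly positive, the corresponding chain is irreducible and aperiodic, which is what the $(1-\alpha)/N$ term is there to guarantee.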
The PageRank algorithm.
The PageRank with reducible Markov Chain
Solving the following linear system of equations, subject to the normalization constraint, gives the importance of each web page $P_i$:
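$$\begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{pmatrix} = \alpha \begin{pmatrix} Q_{11} & Q_{12} & \cdots & Q_{1N} \\ Q_{21} & Q_{22} & \cdots & Q_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{N1} & Q_{N2} & \cdots & Q_{NN} \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{pmatrix} + \frac{1-\alpha}{N} \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \tag{3}$$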
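Since $\sum_{i=1}^{N} p_i = 1$, the last term of (3) can be written as $\frac{1-\alpha}{N}(1, 1, \dots, 1)^T \sum_{i=1}^{N} p_i$, which is the second term of (2) applied to $(p_1, \dots, p_N)^T$. Hence (3) can be rewritten as
$$(p_1, p_2, \dots, p_N)^T = P\,(p_1, p_2, \dots, p_N)^T,$$
that is, the PageRank vector is the eigenvector of $P$ associated with the eigenvalue 1.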
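For a small example the linear system (3) can also be solved directly: with $Q$ column-stochastic and $0 < \alpha < 1$, the spectral radius of $\alpha Q$ is $\alpha < 1$, so $I - \alpha Q$ is invertible. A sketch with NumPy on the same hypothetical 4-page web; a dense direct solve like this is only practical for small matrices.

```python
import numpy as np

alpha = 0.85
# Hypothetical hyperlink matrix (columns sum to 1); page 3 has no in-links
# apart from its own self-link.
Q = np.array([[1/3, 0.0, 1/2, 1/3],
              [1/3, 1/2, 0.0, 0.0],
              [1/3, 1/2, 1/2, 1/3],
              [0.0, 0.0, 0.0, 1/3]])
N = Q.shape[0]

# Equation (3) rearranged: (I - alpha * Q) p = (1 - alpha)/N * (1, ..., 1)^T
p = np.linalg.solve(np.eye(N) - alpha * Q, (1 - alpha) / N * np.ones(N))

# The same p is a fixed point of P from equation (2): p = P p.
P = alpha * Q + (1 - alpha) / N * np.ones((N, N))
print(p)                      # about [0.316, 0.221, 0.410, 0.052]
print(p.sum())                # 1 (up to rounding)
print(np.allclose(P @ p, p))  # True
```

The solution sums to one without imposing the constraint explicitly, because the columns of $Q$ sum to one; page 3, which no other page links to, gets the smallest score.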
Solving the PageRank algorithm.
The power method.
The power method is an iterative method for computing the dominant eigenvalue of a matrix and its corresponding eigenvector.
Given an $n \times n$ matrix $A$, the hypotheses of the power method are:
there is a single dominant eigenvalue, i.e. the eigenvalues can be sorted as
$$|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \dots \ge |\lambda_n|;$$
there is a linearly independent set of $n$ eigenvectors
$$\{u^{(1)}, u^{(2)}, \dots, u^{(n)}\}$$
such that $A u^{(i)} = \lambda_i u^{(i)}$, $i = 1, \dots, n$.
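For a small matrix, both hypotheses can be checked numerically from its eigen-decomposition. A rough sketch with `numpy.linalg.eig`; the rank test on the eigenvector matrix is a floating-point heuristic, and both tolerances are arbitrary choices. It is applied here to the PageRank matrix $P$ of a hypothetical 4-page web with $\alpha = 0.85$.

```python
import numpy as np

def check_power_method_hypotheses(A, gap_tol=1e-10, rank_tol=1e-8):
    """Heuristic check of the two power-method hypotheses for a square matrix A.

    Returns the eigenvalue moduli in decreasing order, whether
    |lambda_1| > |lambda_2|, and whether the eigenvectors are independent.
    """
    eigvals, eigvecs = np.linalg.eig(A)
    mods = np.sort(np.abs(eigvals))[::-1]  # |lambda_1| >= |lambda_2| >= ...
    single_dominant = bool(mods[0] > mods[1] + gap_tol)
    # n independent eigenvectors <=> the eigenvector matrix has full rank
    independent = bool(np.linalg.matrix_rank(eigvecs, tol=rank_tol) == A.shape[0])
    return mods, single_dominant, independent

alpha = 0.85
Q = np.array([[1/3, 0.0, 1/2, 1/3],
              [1/3, 1/2, 0.0, 0.0],
              [1/3, 1/2, 1/2, 1/3],
              [0.0, 0.0, 0.0, 1/3]])
N = Q.shape[0]
P = alpha * Q + (1 - alpha) / N * np.ones((N, N))

mods, single_dominant, independent = check_power_method_hypotheses(P)
print(mods)                          # about [1.0, 0.283, 0.245, 0.245]
print(single_dominant, independent)  # True True
```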
Solving the PageRank algorithm.
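The power method. The initial vector $x^{(0)}$ can be written as
$$x^{(0)} = a_1 u^{(1)} + a_2 u^{(2)} + \dots + a_n u^{(n)}.$$
Multiplying the initial vector $k$ times by the matrix $A$:
$$A^k x^{(0)} = a_1 A^k u^{(1)} + a_2 A^k u^{(2)} + \dots + a_n A^k u^{(n)} = a_1 \lambda_1^k u^{(1)} + a_2 \lambda_2^k u^{(2)} + \dots + a_n \lambda_n^k u^{(n)}.$$
Dividing by $\lambda_1^k$:
$$\frac{A^k x^{(0)}}{\lambda_1^k} = a_1 u^{(1)} + a_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k u^{(2)} + \dots + a_n \left(\frac{\lambda_n}{\lambda_1}\right)^k u^{(n)}.$$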
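Since $|\lambda_i| / |\lambda_1| < 1$ for $i = 2, \dots, n$,
$$\lim_{k \to \infty} \frac{|\lambda_i|^k}{|\lambda_1|^k} = 0 \quad \Longrightarrow \quad A^k x^{(0)} \approx a_1 \lambda_1^k u^{(1)},$$
so, provided $a_1 \neq 0$, the iterates line up with the dominant eigenvector $u^{(1)}$.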
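A minimal power-method sketch in NumPy for the PageRank matrix $P$. Instead of dividing by $\lambda_1^k$ (which equals 1 for a stochastic $P$), each iterate is rescaled so that its entries sum to one, a common variant that keeps the vector a probability distribution. The uniform starting vector has $a_1 \neq 0$: the row vector of ones is a left eigenvector of $P$ for the eigenvalue 1, so it annihilates $u^{(2)}, \dots, u^{(n)}$, while its product with $x^{(0)}$ is 1. The tolerance, iteration cap and the 4-page matrix are hypothetical choices.

```python
import numpy as np

def power_method(P, tol=1e-10, max_iter=1000):
    """Iterate x <- P x from a uniform start until successive iterates agree.

    For a column-stochastic P and x summing to 1, P x already sums to 1, so
    the rescaling below only guards against rounding drift.
    Returns the approximate dominant eigenvector and the iterations used.
    """
    N = P.shape[0]
    x = np.full(N, 1.0 / N)                # x^(0): uniform distribution
    for k in range(1, max_iter + 1):
        x_new = P @ x
        x_new = x_new / x_new.sum()        # keep entries summing to 1
        if np.abs(x_new - x).sum() < tol:  # 1-norm of the change
            return x_new, k
        x = x_new
    return x, max_iter

alpha = 0.85
Q = np.array([[1/3, 0.0, 1/2, 1/3],
              [1/3, 1/2, 0.0, 0.0],
              [1/3, 1/2, 1/2, 1/3],
              [0.0, 0.0, 0.0, 1/3]])
N = Q.shape[0]
P = alpha * Q + (1 - alpha) / N * np.ones((N, N))

x, iters = power_method(P)
# Compare with a direct solve of (I - alpha*Q) p = (1 - alpha)/N * 1.
p_direct = np.linalg.solve(np.eye(N) - alpha * Q, (1 - alpha) / N * np.ones(N))
print(iters)
print(np.allclose(x, p_direct))  # True
```

Each step costs one matrix-vector product, which is why this iteration, rather than a direct solve, is the usual choice for very large and sparse link matrices.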
Conclusions.
The power method and PageRank.
Results. The matrix $P$ of the PageRank algorithm is a stochastic matrix, therefore its largest eigenvalue is $\lambda_1 = 1$.
The convergence rate of the power method depends on the ratio $|\lambda_2| / |\lambda_1|$.
It has been shown by [Haveliwala and Kamvar(2003)] that for the second largest eigenvalue of $P$ we have
$$|\lambda_2| \le \alpha, \qquad 0 \le \alpha \le 1.$$
Since $\lambda_1 = 1$, the convergence rate depends on $\alpha$. The most popular value for $\alpha$ is 0.85; with this value, it has been shown that the power method applied to a web data set of over 80 million pages converges in about 50 iterations.
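A back-of-the-envelope check of these figures: if the error of the power method shrinks roughly like $|\lambda_2|^k \le \alpha^k$, the number of iterations needed to reach a tolerance `tol` is about $\lceil \log(\text{tol}) / \log \alpha \rceil$. This rough estimate ignores constants that depend on the starting vector; the function name `iterations_needed` is illustrative.

```python
import math

def iterations_needed(alpha, tol):
    """Smallest k with alpha**k <= tol, taking alpha as the bound on |lambda_2|."""
    return math.ceil(math.log(tol) / math.log(alpha))

print(0.85 ** 50)                      # roughly 3e-4
print(iterations_needed(0.85, 1e-3))   # 43
```

With $\alpha = 0.85$, fifty iterations shrink the factor $\alpha^k$ to roughly $3 \times 10^{-4}$; a smaller $\alpha$ converges faster but gives more weight to the uniform term $(1-\alpha)/N$.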
Conclusions.
Many thanks to GTUG Palermo, and see you at the next meeting!
Bibliography.
Brin, S. and Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7), 107–117.
Ching, W. and Ng, M. (2006). Markov Chains: Models, Algorithms and Applications. Springer Science + Business Media, Inc.
Haveliwala, T. and Kamvar, S. (2003). The second eigenvalue of the Google matrix. Technical report, Stanford University.
Langville, A., Meyer, C., and Fernández, P. (2008). Google’s PageRank and beyond: the science of search engine rankings. The Mathematical Intelligencer, 30(1), 68–69.
Page, L., Brin, S., Motwani, R., and Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order to the Web.
Internet web sites.
Jon Atle Gulla (2007) - From Google Search to Semantic Exploration - Norwegian University of Science and Technology - www.slideshare.net/sveino/semantics-and-search?type=presentation
Steven Levy (2010) - Exclusive: How Google’s Algorithm Rules the Web - Wired Magazine - www.wired.com/magazine/2010/02/ff_google_algorithm/
Ann Smarty (2009) - Let’s Try to Find All 200 Parameters in Google Algorithm - Search Engine Journal - www.searchenginejournal.com/200-parameters-in-google-algorithm/15457/