8/14/2019 2007 Csam Outlinks and PR Presentation2
Google's PageRank and the Choice of the Outlinks
Laure Ninove
Joint work with Cristobald de Kerchove and Paul Van Dooren
CESAME, Université catholique de Louvain, Belgium
CESAME Seminar, February 27, 2007
Laure Ninove (CESAME) Outlinks and PR 1 / 27
Google's power
Google's search engine guides web surfers in their visits.
A good ranking is vital for a webpage to be read.
How to improve your Google rank?
Outline
1 Preliminaries: What is under Google's PageRank?
  A brief history
  A story of links
  PageRank equations
2 How to improve your PageRank?
  Add inlinks
  Choose outlinks
3 Optimal outlink structure
  For a single node
  For a set of nodes
A brief history of the Web search engine Google
1996: a research project by L. Page and S. Brin
1998: Google Inc. company, 25 million webpages indexed
2005: 8 billion webpages indexed
2006: "to google" added to the Oxford English Dictionary
"The primary goal is to provide high quality search results over a rapidly growing World Wide Web. Google employs a number of techniques to improve search quality including page rank, anchor text, and proximity information."
Brin & Page, 1998, The anatomy of a large-scale hypertextual web search engine
Google's PageRank: a story of links
A hyperlink from i to j is a vote of confidence in j.
A page j has a high PageRank $\pi_j$ if it is pointed to by many pages with
a high PageRank,
few outlinks.
Votes of confidence
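Example
(figure: a four-node graph, nodes 1 to 4, with the weighted votes 2/11 and 1/11 labelled on the links and the score of node 1 marked "?")
$\pi_1 = \frac{1}{2}\pi_2 + \frac{1}{1}\pi_4 = \frac{3}{11}$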
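PageRank equations: vote of confidence
$\pi_j = c \sum_{i \to j} \frac{\pi_i}{d_i} + (1 - c)\, z_j$ (sum of the parents' weighted scores, plus damping with a personalization score)
$\sum_j \pi_j = 1$ (normalization of the PageRanks)
In matrix form: $\pi^T = c\, \pi^T D^{-1} A + (1 - c)\, z^T$, $\pi^T e = 1$, where
$A \in \{0, 1\}^{n \times n}$: the webgraph's adjacency matrix (zero diagonal, no zero row)
$D = \mathrm{diag}(Ae)$: the outdegree matrix
$c \in \,]0, 1[$: the damping factor
$z > 0$, $z^T e = 1$: the personalization vector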
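As a minimal sketch (not the authors' code), the equations above can be solved by iterating $\pi_j \leftarrow c \sum_{i \to j} \pi_i / d_i + (1 - c) z_j$ until it settles. The function name `pagerank`, the 4-node graph and the default c = 0.85 are illustrative assumptions; the graph must satisfy the slide's conditions (no self-loop, every node has at least one outlink).

```python
def pagerank(out_links, c=0.85, z=None, iters=1000):
    """Fixed-point iteration for pi_j = c * sum_{i->j} pi_i / d_i + (1 - c) * z_j.

    out_links[i] lists the children of node i; each list must be non-empty
    and must not contain i (zero diagonal, no zero row).
    """
    n = len(out_links)
    if z is None:
        z = [1.0 / n] * n                          # uniform personalization vector (assumption)
    pi = [1.0 / n] * n                             # start from the uniform distribution
    for _ in range(iters):
        new = [(1 - c) * z[j] for j in range(n)]   # damping with personalization score
        for i, children in enumerate(out_links):
            share = c * pi[i] / len(children)      # parent i splits its vote over d_i outlinks
            for j in children:
                new[j] += share
        pi = new                                   # sum(pi) stays equal to 1
    return pi

# Usage on a hypothetical 4-node graph.
graph = [[1, 2], [0], [0, 1, 3], [0]]
pi = pagerank(graph)
print([round(p, 3) for p in pi], sum(pi))
```

Each sweep is a contraction with factor c, so 1000 sweeps are far more than needed here; a practical version would stop once two successive iterates agree to a tolerance.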
PageRank equations: random walk
Google matrix: $G = c\, D^{-1} A + (1 - c)\, e z^T$
Irreducible, stochastic matrix: a transition probability matrix.
Random walk on the webgraph: $P(i \to j) = G_{ij}$, with P(follow a hyperlink) = $c$ and P(zap according to $z$) = $1 - c$.
PageRank vector $\pi$: the stationary distribution of this Markov chain, $\pi^T G = \pi^T$, $\pi^T e = 1$.
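The random-surfer reading can be checked empirically. The sketch below (hypothetical graph, c = 0.85, uniform z and a fixed seed, all assumptions) builds G row by row, computes its stationary distribution by power iteration $\pi^T \leftarrow \pi^T G$, and compares it with the visit frequencies of a simulated walk that follows a random outlink with probability c and zaps according to z with probability 1 - c.

```python
import random

def google_matrix(out_links, c, z):
    """Dense G with G_ij = c * A_ij / d_i + (1 - c) * z_j (row-stochastic)."""
    n = len(out_links)
    G = [[(1 - c) * z[j] for j in range(n)] for _ in range(n)]
    for i, children in enumerate(out_links):
        for j in children:
            G[i][j] += c / len(children)
    return G

def stationary(G, iters=1000):
    """Power iteration pi^T <- pi^T G from the uniform vector."""
    n = len(G)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
    return pi

def simulate_walk(out_links, c, z, steps, seed=0):
    """Fraction of steps a random surfer spends on each node."""
    rng = random.Random(seed)
    n = len(out_links)
    counts = [0] * n
    node = 0
    for _ in range(steps):
        if rng.random() < c:
            node = rng.choice(out_links[node])             # follow a hyperlink
        else:
            node = rng.choices(range(n), weights=z)[0]     # zap according to z
        counts[node] += 1
    return [k / steps for k in counts]

graph = [[1, 2], [0], [0, 1, 3], [0]]    # hypothetical 4-node graph
c, z = 0.85, [0.25] * 4
G = google_matrix(graph, c, z)
pi = stationary(G)
freq = simulate_walk(graph, c, z, steps=200_000)
print([round(p, 3) for p in pi])
print([round(f, 3) for f in freq])
```

The two printed vectors should agree to about two decimals; the gap shrinks as the number of simulated steps grows.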
Damping with a personalization score
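Example
(figure: a four-node graph, nodes 1 to 4, with uniform personalization; node 1 receives $c \cdot 0.095$ from node 2, $c \cdot 0.19$ from node 4, and $(1 - c) \cdot 0.25$ from zapping)
$\pi_1 = c\left(\frac{1}{2}\pi_2 + \pi_4\right) + (1 - c)\, z_1$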
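The edge labels $c \cdot 0.095$ and $c \cdot 0.19$ are consistent with $\pi_2 / d_2 = 0.095$ and $\pi_4 / d_4 = 0.19$. This short sketch assumes $\pi_2 = \pi_4 = 0.19$, $d_2 = 2$, $d_4 = 1$ and $z_1 = 0.25$, and picks c = 0.85 because the slide leaves c symbolic. It adds up the damped contributions to node 1. The helper `damped_score` is hypothetical.

```python
def damped_score(parent_scores, parent_outdegrees, z_j, c):
    """pi_j = c * (sum over parents i of pi_i / d_i) + (1 - c) * z_j."""
    votes = sum(p / d for p, d in zip(parent_scores, parent_outdegrees))
    return c * votes + (1 - c) * z_j

# Assumed reading of the figure: parents 2 and 4 of node 1 both score 0.19,
# with two and one outlinks respectively; c = 0.85 is an assumed value.
pi_1 = damped_score([0.19, 0.19], [2, 1], z_j=0.25, c=0.85)
print(round(pi_1, 5))
```

With these assumed numbers, $\pi_1 = 0.85 \times 0.285 + 0.15 \times 0.25 \approx 0.280$.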
How to improve your PageRank?
How to improve your PageRank? Add inlinks
Add inlinks?
$\pi_j = c \sum_{i \to j} \frac{\pi_i}{d_i} + (1 - c)\, z_j$
Always increases your PR
Ipsen & Wills, 2006
Mathematical properties and analysis of Googles PageRank
How to improve your PageRank? Add inlinks
Example (figure: a graph, and the same graph with one extra inlink to node 1)
$\pi_1 = 0.196 < \pi_1^{(\text{inlink})} = 0.245$
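The claim that an extra inlink always raises your PageRank (Ipsen & Wills, 2006) can be checked on a toy case. This sketch uses a hypothetical 4-node graph, not the one in the figure, with c = 0.85 and uniform z. It compares the score of node 0 before and after node 2 adds a link to it.

```python
def pagerank(out_links, c=0.85, iters=1000):
    """Power iteration for pi_j = c * sum_{i->j} pi_i / d_i + (1 - c) / n (uniform z)."""
    n = len(out_links)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - c) / n] * n
        for i, children in enumerate(out_links):
            for j in children:
                new[j] += c * pi[i] / len(children)
        pi = new
    return pi

before = [[2], [0], [3], [2]]       # hypothetical graph; node 0 is "your" page
after = [[2], [0], [3, 0], [2]]     # node 2 adds the inlink 2 -> 0
pi0_before = pagerank(before)[0]
pi0_after = pagerank(after)[0]
print(f"{pi0_before:.3f} -> {pi0_after:.3f}")
```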
But you have no control over your inlinks.
How to improve your PageRank? Choose outlinks
Choose outlinks?
You control them.
Constraints:
at least one outlink
no loop (no link to yourself)
Impact not obvious: adding outlinks can increase or decrease your PR.
Sydow, 2005
Can one out-link change your PageRank?
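How to improve your PageRank? Choose outlinks
Example (figure: a graph, and two variants in which node 1 adds outlink a or outlink b)
$\pi_1^{(\text{outlink } a)} = 0.182 < \pi_1 = 0.196 < \pi_1^{(\text{outlink } b)} = 0.211$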
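Whether a new outlink helps depends on where it points. The sketch below uses a hypothetical 5-node graph, not the one in the figure, with c = 0.85 and uniform z. Node 0 currently links only to node 4 and then adds one outlink. Pointing to its parent, node 1, raises $\pi_0$. Pointing into the closed pair {2, 3}, which the surfer can only leave by zapping, lowers it.

```python
def pagerank(out_links, c=0.85, iters=1000):
    """Power iteration with uniform personalization z = e / n."""
    n = len(out_links)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - c) / n] * n
        for i, children in enumerate(out_links):
            for j in children:
                new[j] += c * pi[i] / len(children)
        pi = new
    return pi

# Hypothetical links of nodes 1..4: 1 -> 0, 2 <-> 3 (closed pair), 4 -> 1.
others = [[0], [3], [2], [1]]

def pi0(children_of_0):
    """PageRank of node 0 for a given choice of its outlinks."""
    return pagerank([list(children_of_0)] + others)[0]

base = pi0([4])              # node 0 links only to node 4
with_parent = pi0([4, 1])    # extra outlink to its parent, node 1
with_trap = pi0([4, 2])      # extra outlink into the pair {2, 3}
print(f"{with_trap:.3f} < {base:.3f} < {with_parent:.3f}")
```

This mirrors the ordering on the slide: one extra outlink can move $\pi_0$ either way.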
Notation
Let I be the considered set of nodes.
Up to a permutation of the indices,
$A = \begin{pmatrix} A_I & A_{out(I)} \\ A_{in(I)} & A_{\bar I} \end{pmatrix}$.
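Optimal outlink structure for a single node
Suppose I = {1}.
We want to maximize $\pi_1(A_{out(\{1\})})$,
with $A_{out(\{1\})} = e_L^T$, where $L = \{\text{children of } 1\} \neq \emptyset$.
Proposition
$\pi_1(e_L^T)$ is maximal $\Longrightarrow L \subseteq \mathcal{L} = \arg\min_i e_i^T (I - G_{\bar I})^{-1} e$.
Proof.
$\pi_1(e_L^T) = \dfrac{1}{c\, \dfrac{\sum_{i \in L} e_i^T (I - G_{\bar I})^{-1} e}{|L|} + \text{constant}}$.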
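For I = {1}, the quantity $e_i^T (I - G_{\bar I})^{-1} e$ is read here with $G_{\bar I}$ taken as G restricted to the nodes other than 1. Under that reading, it is the expected number of steps a surfer starting at i needs to reach node 1, its mean first-passage time. The sketch below computes these times by iterating $t = e + G_{\bar I} t$ and takes the minimizers as children of the target. It then checks this choice against a brute-force search over all non-empty outlink sets. Node 0 plays the role of node 1. The graph, c = 0.85, uniform z and all helper names are assumptions.

```python
from itertools import combinations

def pagerank(out_links, c=0.85, iters=1000):
    """Power iteration with uniform personalization z = e / n."""
    n = len(out_links)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - c) / n] * n
        for i, children in enumerate(out_links):
            for j in children:
                new[j] += c * pi[i] / len(children)
        pi = new
    return pi

def hitting_times(out_links, target, c=0.85, iters=5000):
    """t_i = expected steps to reach `target` from i under G (uniform z).

    Iterates t = e + G_sub t, where G_sub drops the target's row and column
    (t[target] is pinned to 0), so the target's own outlinks never matter.
    """
    n = len(out_links)
    t = [0.0] * n
    for _ in range(iters):
        zap = sum(t) / n                         # z^T t with uniform z; t[target] == 0
        t = [0.0 if i == target else
             1 + c * sum(t[j] for j in out_links[i]) / len(out_links[i]) + (1 - c) * zap
             for i in range(n)]
    return t

# Hypothetical children of nodes 1..4; nodes 1 and 4 are the parents of node 0.
others = [[0, 2], [3], [4], [0, 2]]
parents = {i + 1 for i, ch in enumerate(others) if 0 in ch}

def pi0(children_of_0):
    return pagerank([list(children_of_0)] + others)[0]

t = hitting_times([[1]] + others, target=0)      # node 0's own row is irrelevant here
t_min = min(t[1:])
best_children = [i for i in range(1, 5) if abs(t[i] - t_min) < 1e-9]
brute_max = max(pi0(L) for k in range(1, 5) for L in combinations(range(1, 5), k))
print(best_children, pi0(best_children), brute_max)
```

Nodes 1 and 4 have identical outlinks, so they tie for the minimum. Any non-empty subset of them is optimal, and both are parents of node 0, as the next proposition predicts.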
Proposition
Suppose that 1 has some parents. Then
$\pi_1(e_L^T)$ is maximal $\Longrightarrow L \subseteq \{\text{parents of } 1\}$.
But the converse does not hold: linking to a parent of 1 is not always the best choice.
Optimal outlink structure for a single node: example
(figure: a small graph containing node 1, its parent node 2 and its grandparent node 3)
In order to maximize its PageRank, node 1 should link to some of its parents.
But it is better for node 1 to link to node 3 (a grandparent) rather than to node 2 (a parent).
Optimal outlink structure for a set of nodes
Consider now a set I of nodes.
Internal link structure $A_I$: given, with no zero row.
External outlink structure $A_{out(I)}$: to be determined.
Goal: maximize the sum of PageRanks,
$\max_{A_{out(I)}} \sum_{i \in I} \pi_i(A_{out(I)})$.
Proposition
Under the assumption that I has at least m external outlinks,
$\sum_{i \in I} \pi_i(A_{out(I)})$ is maximal $\Longrightarrow$ I has exactly m external outlinks.
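On a small graph the proposition can be tested by brute force. In this sketch, I = {0, 1} has the fixed internal links 0 <-> 1, so $A_I$ has no zero row. The rest of the graph is hypothetical and fixed, with c = 0.85 and uniform z. The search runs over every set of external outlinks from I with at least m links and reports which set maximizes $\sum_{i \in I} \pi_i$. The helpers `build`, `pr_sum` and `best_structure` are hypothetical.

```python
from itertools import combinations

def pagerank(out_links, c=0.85, iters=300):
    """Power iteration with uniform personalization z = e / n."""
    n = len(out_links)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - c) / n] * n
        for i, children in enumerate(out_links):
            for j in children:
                new[j] += c * pi[i] / len(children)
        pi = new
    return pi

I = (0, 1)
internal = [[1], [0]]                    # A_I: fixed, no zero row
outside = [[3, 0], [4], [2, 1]]          # fixed links of nodes 2, 3, 4 (hypothetical)
candidates = [(k, l) for k in I for l in (2, 3, 4)]   # all possible external outlinks

def build(ext_links):
    """Full graph for a given external outlink structure A_out(I)."""
    graph = [list(row) for row in internal] + outside
    for k, l in ext_links:
        graph[k].append(l)
    return graph

def pr_sum(ext_links):
    pi = pagerank(build(ext_links))
    return sum(pi[i] for i in I)

def best_structure(m):
    """Brute force over all external outlink sets with at least m links."""
    structures = (s for r in range(m, len(candidates) + 1)
                  for s in combinations(candidates, r))
    return max(structures, key=pr_sum)

best = best_structure(1)
print(best, pr_sum(best))
```

The brute force only scales to toy graphs (2^6 structures here). Its role is to make the statement concrete, not to optimize.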
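Optimal outlink structure for a set of nodes
Proof.
Removing a link $i \to j$ from the graph is a rank-one perturbation:
$G^{(i,j)} = c\,(D^{-1}A + e_i \delta_{(i,j)}^T) + (1 - c)\, e z^T$, with $\delta_{(i,j)}^T = \frac{e_i^T D^{-1}A - e_j^T}{d_i - 1}$.
Difference between the new and old PageRank sums:
$\sum_{s \in I} \pi_s^{(i,j)} - \sum_{s \in I} \pi_s = \dfrac{c\, \pi_i\, \delta_{(i,j)}^T (I - cD^{-1}A)^{-1} e_I}{1 - c\, \delta_{(i,j)}^T (I - cD^{-1}A)^{-1} e_i}$.
For every link $i \to j$, $c\, \delta_{(i,j)}^T (I - cD^{-1}A)^{-1} e_i < 1$.
There exists an external outlink $k \to \ell$ with $k \in I$, $\ell \notin I$, such that $\delta_{(k,\ell)}^T (I - cD^{-1}A)^{-1} e_I > 0$; removing it strictly increases the PageRank sum of I.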
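The update formula follows from the Sherman-Morrison identity and can be checked numerically. The sketch below uses a hypothetical graph, c = 0.85 and uniform z. A small Gaussian-elimination solver stands in for the matrix inverse. The code removes one external link $i \to j$ and recomputes the PageRank directly. It then compares the change in $\sum_{s \in I} \pi_s$ with the right-hand side above. The helpers `solve`, `transition` and `pagerank` are hypothetical names.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    a = [row[:] + [b[r]] for r, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x

def transition(out_links):
    """Dense P = D^{-1} A."""
    n = len(out_links)
    P = [[0.0] * n for _ in range(n)]
    for i, children in enumerate(out_links):
        for j in children:
            P[i][j] = 1.0 / len(children)
    return P

def pagerank(out_links, c, z):
    """Solve pi^T (I - cP) = (1 - c) z^T, i.e. (I - cP)^T pi = (1 - c) z."""
    n = len(out_links)
    P = transition(out_links)
    Mt = [[float(r == s) - c * P[s][r] for s in range(n)] for r in range(n)]
    return solve(Mt, [(1 - c) * zj for zj in z])

graph = [[1, 2], [0, 3], [0, 3], [1, 2]]   # hypothetical 4-node graph
I = {0, 1}
c, n = 0.85, 4
z = [1.0 / n] * n
i, j = 1, 3                                # external link 1 -> 3 to remove (d_i = 2)

pi = pagerank(graph, c, z)
reduced = [list(ch) for ch in graph]
reduced[i].remove(j)
pi_new = pagerank(reduced, c, z)
direct = sum(pi_new[s] for s in I) - sum(pi[s] for s in I)

P = transition(graph)
M = [[float(r == s) - c * P[r][s] for s in range(n)] for r in range(n)]   # I - cP
delta = [(P[i][s] - float(s == j)) / (len(graph[i]) - 1) for s in range(n)]
x_I = solve(M, [1.0 if s in I else 0.0 for s in range(n)])    # (I - cP)^{-1} e_I
x_i = solve(M, [1.0 if s == i else 0.0 for s in range(n)])    # (I - cP)^{-1} e_i
dot = lambda u, v: sum(a * b for a, b in zip(u, v))
formula = c * pi[i] * dot(delta, x_I) / (1 - c * dot(delta, x_i))
print(direct, formula)
```

The two printed numbers should agree up to rounding. The denominator stays positive, as the slide states.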
Example
Sometimes, removing an outlink of I may decrease the PageRank sum for I.
(figure: three variants of a graph on nodes 1 to 5, compared by their PageRank sums $\sum_{i \in I} \pi_i$)