2007 csam outlinks and pr presentation2

Upload: pascal-van-hecke

Post on 31-May-2018


  • 8/14/2019 2007 Csam Outlinks and PR Presentation2

    1/36

    Google's PageRank and the Choice of the Outlinks

    Laure Ninove

    Joint work with Cristobald de Kerchove and Paul Van Dooren

    CESAME, Université catholique de Louvain, Belgium

    CESAME Seminar, February 27, 2007

    Laure Ninove (CESAME) Outlinks and PR 1 / 27

    Google's power

    Google's search engine guides web surfers in their visits.

    A good ranking is vital for a webpage to be read.

    How to improve your Google rank?
    Outline

    1 Preliminaries: What is under Google's PageRank?
        A brief history
        A story of links
        PageRank equations

    2 How to improve your PageRank?
        Add inlinks
        Choose outlinks

    3 Optimal outlink structure
        For a single node
        For a set of nodes
    A brief history of the Web search engine Google

    1996: a research project, by L. Page and S. Brin
    1998: Google Inc. company, 25 million webpages indexed
    2005: 8 billion webpages indexed
    2006: "to google" added to the Oxford English Dictionary

    "The primary goal is to provide high quality search results over a rapidly growing World Wide Web. Google employs a number of techniques to improve search quality including page rank, anchor text, and proximity information."

    Brin & Page, 1998
    The anatomy of a large-scale hypertextual web search engine
    Google's PageRank: a story of links

    A hyperlink from i to j is a vote of confidence in j.

    A page j has a high PageRank $\pi_j$ if it is pointed to by many pages with
      - a high PageRank,
      - few outlinks.
    Votes of confidence

    Example

    [Figure: graph on nodes 1, 2, 3, 4. Scores and votes shown: 2/11, 2/11, 1/11, 1/11, 2/11; the score of node 1 is marked "?".]

    $$\pi_1 = \tfrac{1}{2}\,\pi_2 + 1\cdot\pi_4 = \tfrac{3}{11}$$
    PageRank equations: Vote of confidence

    $$\pi_j = c \sum_{i \to j} \frac{\pi_i}{d_i} + (1 - c)\, z_j, \qquad \sum_j \pi_j = 1$$

      - sum of the parents' weighted scores
      - normalization of the PageRanks
      - damping with a personalization score

    $$\pi^T = c\, \pi^T D^{-1} A + (1 - c)\, z^T, \qquad \pi^T e = 1$$

    $A \in \{0, 1\}^{n \times n}$: webgraph's adjacency matrix (zero diagonal, no zero row)
    $D = \operatorname{diag}(Ae)$: outdegrees matrix
    $c \in \,]0, 1[$: damping factor
    $z > 0$, $z^T e = 1$: personalization vector
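The matrix form above is a linear system in the PageRank vector, so for a small graph it can be solved directly. What follows is a minimal Python sketch, assuming NumPy; the 4-node adjacency matrix, the value c = 0.85 and the uniform personalization vector are illustrative choices that do not come from the slides.

```python
import numpy as np

def pagerank(A, c=0.85, z=None):
    """Solve pi^T = c pi^T D^{-1} A + (1 - c) z^T, which implies pi^T e = 1."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Uniform personalization vector unless one is supplied (assumption).
    z = np.full(n, 1.0 / n) if z is None else np.asarray(z, dtype=float)
    P = A / A.sum(axis=1, keepdims=True)      # D^{-1} A, needs no zero row
    # Rearranged and transposed: (I - c P)^T pi = (1 - c) z
    return np.linalg.solve((np.eye(n) - c * P).T, (1 - c) * z)

# Hypothetical webgraph: zero diagonal, no zero row.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 1],
              [1, 0, 0, 0]])
pi = pagerank(A)
print(pi, pi.sum())
```

The normalization does not have to be imposed separately here: because the rows of D^{-1}A sum to 1 and z sums to 1, the solution of the linear system already sums to 1.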
    PageRank equations: Random walk

    Google matrix:
    $$G = c\, D^{-1} A + (1 - c)\, e z^T$$

    Irreducible, stochastic matrix $\to$ transition probability matrix

    Random walk on the webgraph:
    $P(i \to j) = G_{ij}$, with
    $P(\text{follow hyperlinks}) = c$,
    $P(\text{zap according to } z) = 1 - c$.

    PageRank vector $\pi$: stationary distribution of this Markov chain
    $$\pi^T G = \pi^T, \qquad \pi^T e = 1$$
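The random-walk view suggests computing the stationary distribution by repeatedly applying the Google matrix. The sketch below is one way to do that in Python with NumPy; the example graph, c = 0.85, the uniform z, and the function names `google_matrix` and `stationary` are assumptions made for illustration.

```python
import numpy as np

def google_matrix(A, c=0.85, z=None):
    """G = c D^{-1} A + (1 - c) e z^T, a row-stochastic matrix."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    z = np.full(n, 1.0 / n) if z is None else np.asarray(z, dtype=float)
    P = A / A.sum(axis=1, keepdims=True)
    return c * P + (1 - c) * np.outer(np.ones(n), z)

def stationary(G, tol=1e-13, max_iter=10_000):
    """Iterate pi^T <- pi^T G, starting from the uniform distribution."""
    pi = np.full(G.shape[0], 1.0 / G.shape[0])
    for _ in range(max_iter):
        nxt = pi @ G
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical webgraph: zero diagonal, no zero row.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 1],
              [1, 0, 0, 0]])
G = google_matrix(A)
pi = stationary(G)
print(pi)
```

Each step of the loop is one step of the surfer: with probability c a hyperlink is followed, and with probability 1 - c the surfer zaps according to z.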
    Damping with a personalization score

    Example

    [Figure: graph on nodes 1, 2, 3, 4. Scores 0.19 and 0.19 shown, the score of node 1 marked "?"; votes c*0.095 and c*0.19; zapping contribution from z: (1-c)*0.25.]

    $$\pi_1 = c \left( \tfrac{1}{2}\,\pi_2 + \pi_4 \right) + (1 - c)\, z_1$$
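The damped score of a single page can be evaluated by hand once a damping factor is fixed. The short Python sketch below does this for the example, reading the figure labels as two parents with score 0.19 (one with two outlinks, one with a single outlink) and z_1 = 0.25. The slides do not state a value for c; 0.85 is an assumption, and the helper `damped_score` is a hypothetical name.

```python
def damped_score(parents, z_j, c):
    """parents: list of (parent_score, parent_outdegree) pairs."""
    votes = sum(score / outdegree for score, outdegree in parents)
    return c * votes + (1 - c) * z_j

# Parents of node 1 as read from the figure labels; c = 0.85 is assumed.
pi_1 = damped_score([(0.19, 2), (0.19, 1)], z_j=0.25, c=0.85)
print(pi_1)
```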

    How to improve your PageRank?

    How to improve your PageRank? Add inlinks

    Add inlinks?

    $$\pi_j = c \sum_{i \to j} \frac{\pi_i}{d_i} + (1 - c)\, z_j$$

    Always increases your PR

    Ipsen & Wills, 2006
    Mathematical properties and analysis of Google's PageRank
    How to improve your PageRank? Add inlinks

    Example

    [Figure: the example graph before and after adding an inlink to node 1.]

    $$\pi_1 = 0.196 \;<\; \pi_1^{(\text{inlink})} = 0.245$$
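The effect of an extra inlink can be reproduced on any small graph. The Python sketch below uses a hypothetical 4-node graph (not the one in the figure), c = 0.85 and a uniform z, all of which are assumptions; it compares the PageRank of node 0 before and after node 2 adds a link to it.

```python
import numpy as np

def pagerank(A, c=0.85):
    """PageRank with uniform personalization vector (assumption)."""
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)
    z = np.full(n, 1.0 / n)
    return np.linalg.solve((np.eye(n) - c * P).T, (1 - c) * z)

# Hypothetical webgraph: zero diagonal, no zero row.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)
before = pagerank(A)

A_more = A.copy()
A_more[2, 0] = 1          # node 2 now also links to node 0
after = pagerank(A_more)

print(before[0], after[0])
```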
    How to improve your PageRank? Add inlinks

    Adding inlinks always increases your PR, but you have no control over your inlinks.
    How to improve your PageRank? Choose outlinks

    Choose outlinks?

    You control them.

    Constraints:
      - at least one outlink
      - no loop

    Impact not obvious: adding outlinks can increase or decrease your PR.

    Sydow, 2005
    Can one out-link change your PageRank?
    How to improve your PageRank? Choose outlinks

    Example

    [Figure: the example graph in three versions: with outlink a added, unchanged, and with outlink b added.]

    $$\pi_1^{(\text{outlink a})} = 0.182 \;<\; \pi_1 = 0.196 \;<\; \pi_1^{(\text{outlink b})} = 0.211$$
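That an added outlink can move a page's PageRank in either direction can be seen on a very small graph. The Python sketch below is an illustration on a hypothetical 3-node graph (not the one in the figure), with c = 0.85 and a uniform z as assumptions: the links 1 -> 0 and 2 -> 1 are fixed, and only the outlinks of node 0 vary.

```python
import numpy as np

def pagerank(A, c=0.85):
    """PageRank with uniform personalization vector (assumption)."""
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)
    z = np.full(n, 1.0 / n)
    return np.linalg.solve((np.eye(n) - c * P).T, (1 - c) * z)

def pagerank_of_node0(children):
    """PageRank of node 0 when it links exactly to `children`."""
    A = np.zeros((3, 3))
    A[1, 0] = 1               # fixed link 1 -> 0
    A[2, 1] = 1               # fixed link 2 -> 1
    A[0, list(children)] = 1  # outlinks chosen by node 0
    return pagerank(A)[0]

only_1 = pagerank_of_node0([1])
only_2 = pagerank_of_node0([2])
both = pagerank_of_node0([1, 2])
print(only_2, both, only_1)
```

Starting from the single outlink 0 -> 2, adding 0 -> 1 raises the PageRank of node 0; starting from the single outlink 0 -> 1, adding 0 -> 2 lowers it.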
    Notation

    Let $\mathcal{I}$ be the considered set of nodes.

    Up to a permutation of the indices,
    $$A = \begin{pmatrix} A_{\mathcal{I}} & A_{\mathrm{out}(\mathcal{I})} \\ A_{\mathrm{in}(\mathcal{I})} & A_{\bar{\mathcal{I}}} \end{pmatrix}.$$
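The block notation corresponds to slicing the adjacency matrix by the index set. A minimal Python sketch with NumPy follows; the graph and the choice of the set (nodes 0 and 1) are hypothetical, and the variable names are only illustrative.

```python
import numpy as np

# Hypothetical webgraph: zero diagonal, no zero row.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 1],
              [1, 0, 0, 0]])
I_set = [0, 1]                                        # the considered set of nodes
rest = [k for k in range(A.shape[0]) if k not in I_set]

A_I = A[np.ix_(I_set, I_set)]      # links inside the set
A_out = A[np.ix_(I_set, rest)]     # external outlinks of the set
A_in = A[np.ix_(rest, I_set)]      # external inlinks of the set
A_rest = A[np.ix_(rest, rest)]     # links among the other nodes

# Reassembling the blocks gives A with the nodes of the set listed first.
order = I_set + rest
permuted = np.block([[A_I, A_out], [A_in, A_rest]])
print(permuted)
```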
    Optimal outlink structure for a single node

    Suppose $\mathcal{I} = \{1\}$.

    We want to maximize $\pi_1(A_{\mathrm{out}(\{1\})})$,
    with $A_{\mathrm{out}(\{1\})} = e_L^T$, where $L = \{\text{children of } 1\} \neq \emptyset$.

    Proposition
    $$\pi_1(e_L^T) \text{ is maximal} \implies L \subseteq \mathcal{L} = \arg\min_i\, e_i^T (I - G_{\bar{\mathcal{I}}})^{-1} e.$$

    Proof.
    $$\pi_1(e_L^T) = \frac{1}{c\, \dfrac{\sum_{i \in L} e_i^T (I - G_{\bar{\mathcal{I}}})^{-1} e}{|L|} + \text{constant}}.$$
    Proposition

    Suppose that 1 has some parents. Then
    $$\pi_1(e_L^T) \text{ is maximal} \implies L \subseteq \{\text{parents of } 1\}.$$
    Optimal outlink structure for a single node: Example

    [Figure: example graph on nodes 1, 2, 3, with some nodes marked *.]

    In order to maximize its PageRank, node 1 should link to some * node(s) (parents).

    But it is better for 1 to link to node 3 (grand-parent) rather than to node 2 (parent).
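The single-node proposition can be checked numerically by brute force on a small graph. The Python sketch below is such a check under stated assumptions: the 4-node graph is hypothetical (it is not the graph of the example), c = 0.85, z is uniform, and index 0 plays the role of node 1 of the slides. It computes the vector with entries e_i^T (I - G_rest)^{-1} e over the other nodes, and compares its argmin with the best outlink set found by trying every non-empty choice.

```python
import itertools
import numpy as np

C = 0.85  # assumed damping factor

def transition(A):
    return A / A.sum(axis=1, keepdims=True)

def pagerank(A, c=C):
    n = A.shape[0]
    z = np.full(n, 1.0 / n)   # uniform personalization (assumption)
    return np.linalg.solve((np.eye(n) - c * transition(A)).T, (1 - c) * z)

# Row 0 is a placeholder that is overwritten for each candidate set L.
base = np.array([[0, 1, 0, 0],
                 [1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [1, 0, 1, 0]], dtype=float)
others = [1, 2, 3]

# v_i = e_i^T (I - G_rest)^{-1} e, with G restricted to the other nodes.
n = base.shape[0]
G = C * transition(base) + (1 - C) / n
G_rest = G[np.ix_(others, others)]
v = np.linalg.solve(np.eye(len(others)) - G_rest, np.ones(len(others)))
argmin_nodes = {others[k] for k in np.flatnonzero(np.isclose(v, v.min()))}

def score(L):
    """PageRank of node 0 when its outlinks are exactly L."""
    A = base.copy()
    A[0, :] = 0
    A[0, list(L)] = 1
    return pagerank(A)[0]

candidates = [L for r in range(1, len(others) + 1)
              for L in itertools.combinations(others, r)]
best_L = max(candidates, key=score)
print(argmin_nodes, best_L)
```

The rows of G for the other nodes do not depend on the outlinks of node 0, which is why the vector v can be computed once, before any outlink set is chosen.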
    Optimal outlink structure for a set of nodes

    Consider now a set $\mathcal{I}$ of nodes.
    Internal link structure $A_{\mathcal{I}}$ given, where $A_{\mathcal{I}}$ has no zero row.
    External outlink structure $A_{\mathrm{out}(\mathcal{I})}$ to be determined.

    Goal: to maximize the sum of PageRanks:
    $$\max_{A_{\mathrm{out}(\mathcal{I})}} \sum_{i \in \mathcal{I}} \pi_i(A_{\mathrm{out}(\mathcal{I})}).$$
    Proposition

    Under the assumption that $\mathcal{I}$ has at least $m$ external outlinks,
    $$\sum_{i \in \mathcal{I}} \pi_i(A_{\mathrm{out}(\mathcal{I})}) \text{ is maximal} \implies \mathcal{I} \text{ has exactly } m \text{ external outlinks.}$$
    Proof.

    Removing a link $i \to j$ from the graph $\leftrightarrow$ perturbation:
    $$\tilde{G}_{(i,j)} = c\,(D^{-1}A + e_i\, \delta_{(i,j)}^T) + (1 - c)\, e z^T,$$
    where the row vector $\delta_{(i,j)}^T$ is the change in row $i$ of $D^{-1}A$.

    Difference between new and old PageRank sums:
    $$\sum_{s \in \mathcal{I}} \tilde{\pi}^{(i,j)}_s - \sum_{s \in \mathcal{I}} \pi_s = \frac{c\, \pi_i\, \delta_{(i,j)}^T (I - cD^{-1}A)^{-1} e_{\mathcal{I}}}{1 - c\, \delta_{(i,j)}^T (I - cD^{-1}A)^{-1} e_i}.$$

    For every link $i \to j$, $c\, \delta_{(i,j)}^T (I - cD^{-1}A)^{-1} e_i < 1$.

    There exists an external outlink $k \to \ell$ with $k \in \mathcal{I}$, $\ell \notin \mathcal{I}$, such that
    $$\delta_{(k,\ell)}^T (I - cD^{-1}A)^{-1} e_{\mathcal{I}} > 0.$$
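The rank-one update formula for the difference of PageRank sums can be compared with a direct recomputation. The Python sketch below does this on a hypothetical 4-node graph with the set made of nodes 0 and 1, removing the external outlink 0 -> 2; the graph, c = 0.85, the uniform z, and the name `delta` for the change in row i of D^{-1}A are assumptions made for illustration.

```python
import numpy as np

c = 0.85                                   # assumed damping factor
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)  # hypothetical webgraph
n = A.shape[0]
z = np.full(n, 1.0 / n)                    # uniform personalization (assumption)
I_set = [0, 1]
i, j = 0, 2                                # link i -> j to remove (node i keeps 0 -> 1)

def transition(M):
    return M / M.sum(axis=1, keepdims=True)

def pagerank_from(P):
    return np.linalg.solve((np.eye(n) - c * P).T, (1 - c) * z)

P = transition(A)
A_new = A.copy()
A_new[i, j] = 0
P_new = transition(A_new)

delta = P_new[i] - P[i]                    # change in row i of D^{-1} A
pi = pagerank_from(P)
pi_new = pagerank_from(P_new)

Minv = np.linalg.inv(np.eye(n) - c * P)    # (I - c D^{-1} A)^{-1}
e_I = np.zeros(n)
e_I[I_set] = 1.0

denominator = 1 - c * (delta @ Minv[:, i])
predicted = c * pi[i] * (delta @ Minv @ e_I) / denominator
actual = pi_new[I_set].sum() - pi[I_set].sum()
print(predicted, actual)
```

The two printed numbers should agree up to rounding, since the formula is the Sherman-Morrison update of the inverse applied to a change in a single row.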
    Example

    Sometimes, removing an outlink for $\mathcal{I}$ may decrease the PageRank sum for $\mathcal{I}$.

    [Figure: three versions of a graph on nodes 1, 2, 3, 4, 5.]

    iI

    i(I 3)