inside page ranking

Upload: arjun-c-chandrathil

Post on 30-May-2018


  • 8/14/2019 inside page ranking

    INSIDE PAGERANK

    Bejo Thampi

    S7 CSE

    Roll No: 14

    SNGCE

    On: 26/08/2009

    Guided by: Mr. Sujith Kumar

    CONTENTS

    Introduction

    PageRank

    PageRank calculation

    Optimization of PageRank

    Removal of Dangling Pages

    Conclusion

    References

    Introduction

    PageRank represents how important a page is on the web.

    It is shown as a numeric value between 0 and 10.

    Older algorithms considered only the content of the page, which led to web spamming.

    Google's ranking considers both content and anchor text (backlinks).

    PageRank

    PageRank is a link analysis algorithm.

    It determines a page's ranking in the search results.

    The name is a trademark of Google; the patent is assigned to Stanford University.

    PageRank (continued)

    Quoting from the original Google paper, PageRank:

    interprets a link from page A to page B as a vote, by page A, for page B.

    also analyzes the page that casts the vote.

    Calculation of PageRank

    Two methods:

    Simplified Algorithm and Damping Algorithm

    Both are based on a probability distribution.

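    Simplified Algorithm

    In the general case, the PageRank value for any page u can be expressed as:

    PR(u) = sum over all pages v in B(u) of PR(v) / L(v)

    where B(u) is the set of pages linking to u and L(v) is the number of outgoing links on page v.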
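
As a rough illustration of the simplified algorithm, the sketch below applies one update in which every page v hands PR(v) / L(v) to each page it links to, with L(v) taken as the number of outgoing links on v. The four-page graph, the page names and the helper name `simplified_step` are assumptions made for this example and do not come from the slides.

```python
def simplified_step(links, ranks):
    """One update of the simplified (undamped) PageRank formula.

    links: dict mapping each page to the list of pages it links to.
    ranks: dict mapping each page to its current PageRank value.
    """
    new_ranks = {page: 0.0 for page in links}
    for v, targets in links.items():
        if not targets:
            continue  # a page with no outgoing links passes on nothing
        share = ranks[v] / len(targets)  # PR(v) / L(v)
        for u in targets:
            new_ranks[u] += share
    return new_ranks


# Hypothetical four-page web; every page starts with rank 0.25.
links = {"A": ["B"], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}
ranks = {page: 0.25 for page in links}
result = simplified_step(links, ranks)
print(result)
```

Page D receives no links in this toy graph, so its value drops to zero after one step, while the total over all pages stays at 1.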

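    Damping Algorithm

    PR(V) = (1 - d) + d (PR(V1)/L(V1) + ... + PR(Vn)/L(Vn))

    where V1 ... Vn are the pages linking to V and L(Vi) is the number of outgoing links on Vi

    e.g. with a single incoming link from a page V2 that has one outgoing link: PR(V) = (1 - d) + d (PR(V2)/1)

    d (damping factor) = 0.85

    Calculated using a simple iterative method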
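
A minimal sketch of the simple iterative method, assuming the damped update PR(V) = (1 - d) + d * (sum of PR(Vi) / L(Vi) over the pages Vi linking to V). The function name `pagerank`, the stopping tolerance, the iteration cap and the three-page graph are illustrative assumptions. Each round here is computed from the previous round's values.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iter=1000):
    """Iterate the damped formula until the values stop changing.

    links: dict mapping each page to the list of pages it links to.
    Every page is assumed to have at least one outgoing link.
    """
    ranks = {page: 1.0 for page in links}  # initial guess
    for _ in range(max_iter):
        new_ranks = {}
        for page in links:
            # Sum PR(v) / L(v) over every page v that links to this page.
            incoming = sum(
                ranks[v] / len(targets)
                for v, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - d) + d * incoming
        change = max(abs(new_ranks[p] - ranks[p]) for p in links)
        ranks = new_ranks
        if change < tol:
            break
    return ranks


# Hypothetical three-page site: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print({page: round(value, 4) for page, value in ranks.items()})
```

With this form of the formula the values of the three pages add up to about 3, i.e. an average of 1.0 per page.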

    Example

    Each page has one outgoing link

    i.e. hA = 1 and hB = 1, where hX is the number of outgoing links on page X

    [Figure: Page A and Page B, each linking to the other]

    Guess 1

    Start with a guess of 1.0

    PR(A) = (1 - d) + d(PR(B)/1)

    PR(B) = (1 - d) + d(PR(A)/1)

    i.e. PR(A) = 0.15 + 0.85 * 1 = 1

    PR(B) = 0.15 + 0.85 * 1 = 1

    The numbers are not changing, so we consider this a good guess.

    Guess 2

    Guess 0 instead and recalculate:

    PR(A) = 0.15 + 0.85 * 0 = 0.15

    PR(B) = 0.15 + 0.85 * 0.15 = 0.2775

    And again:

    PR(A) = 0.15 + 0.85 * 0.2775 = 0.385875

    PR(B) = 0.15 + 0.85 * 0.385875 = 0.47799375

    Guess 2 (continued)

    On the 39th iteration, PR(A) = 0.999999 and PR(B) = 1.0000000

    On the 40th iteration,

    PR(A) = 1.000 and PR(B) = 1.000

    and the average PageRank = 1.00000

    Principle: the normalized probability distribution (the average PageRank for all pages) will be 1.0.
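
The two-page example can be replayed with a few lines of Python. This is only a sketch: the function name `iterate_two_pages` and the choice of 40 rounds are assumptions, and the update reuses the freshly computed PR(A) when computing PR(B), which is what the hand calculation for the guess of 0 appears to do.

```python
def iterate_two_pages(guess, d=0.85, rounds=40):
    """Repeat the two-page update, reusing the new PR(A) when computing PR(B)."""
    pr_a = pr_b = guess
    history = []
    for _ in range(rounds):
        pr_a = (1 - d) + d * pr_b / 1  # B has one outgoing link
        pr_b = (1 - d) + d * pr_a / 1  # A has one outgoing link
        history.append((pr_a, pr_b))
    return history


history = iterate_two_pages(guess=0.0)
for n in (1, 2, 40):
    pr_a, pr_b = history[n - 1]
    print(f"iteration {n}: PR(A) = {pr_a:.6f}, PR(B) = {pr_b:.6f}")
```

Starting from a guess of 1.0 instead leaves both values at 1.0 from the first round, while the guess of 0 climbs towards 1.0 with every round.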

    Variations

    Google Toolbar Rank

    The Google Toolbar's PageRank feature displays a visited page's PageRank as a whole number between 0 and 10.

    SERP Rank

    The ranking of a page in the results returned by a search engine in response to a keyword query.

    Optimization of PageRank

    Three fundamental areas to look at when trying to optimize the PageRank for a site:

    1. The links you choose to have link to you, i.e., which ones you choose, and how much effort you put into getting them.

    Optimization of PageRank (continued)

    2. Who you choose to link out to from your site

    Maximising PageRank feedback and minimising PageRank leakage

    3. The internal navigational structure and linkage of your pages, which distribute PageRank within your site.
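
To make PageRank leakage concrete, the sketch below compares a site whose two pages link only to each other with the same site after one page also links out to an external page. It assumes the damped update PR(V) = (1 - d) + d * (sum of PR(Vi) / L(Vi)) with d = 0.85; the helper name `damped_ranks`, the page names, the fixed 200 rounds, and the treatment of the external page as one that passes nothing on are all assumptions made for illustration.

```python
def damped_ranks(links, d=0.85, rounds=200):
    """Run a fixed number of rounds of the damped PageRank update."""
    ranks = {page: 1.0 for page in links}
    for _ in range(rounds):
        # Each round is computed from the previous round's values.
        ranks = {
            page: (1 - d) + d * sum(
                ranks[v] / len(targets)
                for v, targets in links.items()
                if page in targets
            )
            for page in links
        }
    return ranks


# Hypothetical site with two pages that link only to each other.
closed_site = {"home": ["about"], "about": ["home"]}
# Same site, but "home" also links out to an external page.
leaky_site = {"home": ["about", "external"], "about": ["home"], "external": []}

for name, links in (("closed", closed_site), ("leaky", leaky_site)):
    ranks = damped_ranks(links)
    site_total = ranks["home"] + ranks["about"]
    print(f"{name}: site total = {site_total:.3f}")
```

In this toy model the closed site keeps a combined value of 2.0 for its two pages, while the variant with the outgoing link ends up with a noticeably lower site total.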

    Internal Linking

    Hierarchical

    Courtesy: http://www-db.Stanford.edu/~backrub/google.html

    Internal Linking (continued)

    Looping

    Courtesy: http://www-db.Stanford.edu/~backrub/google.html

    Removal of Dangling Pages

    Dangling links are simply links that point to any page with no outgoing links.

    These are often simply pages that have not been downloaded yet.

    They affect the ranking.

    They need to be removed.

    http://www.iprcom.com/papers/pagerank/index.html

    Removal of Dangling Pages (continued)

    Dangling pages are eliminated before the PageRank calculation.

    This is done by introducing a dummy page.

    The dummy page links to itself and is pointed to by every dangling page.
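
The dummy-page idea can be sketched as a small preprocessing step on the link graph. The function name `add_dummy_page`, the page label `"DUMMY"` and the example graph are hypothetical and chosen only for illustration.

```python
def add_dummy_page(links, dummy="DUMMY"):
    """Return a copy of the link graph in which no page is dangling.

    Every page without outgoing links is pointed at a dummy page,
    and the dummy page links only to itself.
    """
    patched = {page: list(targets) for page, targets in links.items()}
    dangling = [page for page, targets in patched.items() if not targets]
    if dangling:
        for page in dangling:
            patched[page] = [dummy]
        patched[dummy] = [dummy]
    return patched


# Hypothetical graph: C has no outgoing links, so it is a dangling page.
links = {"A": ["B", "C"], "B": ["C"], "C": []}
patched = add_dummy_page(links)
print(patched)
```

After this step every page has at least one outgoing link, so an iterative calculation never has to divide by an outgoing-link count of zero. The original graph is left unchanged.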

    Removal of Dangling Pages (continued)

    Courtesy: http://www.iprcom.com/papers/pagerank/index.html

    Conclusion

    PageRank is a parameter involved in Google's ranking of the answers to a given query.

    Optimal computation of PageRank

    Optimization of PageRank

    References

    1. The Google Story - David A. Vise

    2. http://www.iprcom.com/papers/pagerank/index.html

    3. http://en.wikipedia.org/wiki/PageRank

    4. http://www.siteall.com/guide/

    THANK YOU

    QUESTIONS?