PRESENTED BY ASHISH CHAWLA AND VINIT ASHER
The PageRank Citation Ranking: Bringing Order to the Web
Lawrence Page and Sergey Brin, Stanford University. 1998.
Agenda
Introduction
Background
Link Structure
Propagation of Ranking
Simplified Page Rank Calculation
Problems in Ranking
Page Rank Definition
Computing Page Rank
Mathematical Basics
Implementation Details
Convergence
Searching with PageRank
Introduction
Challenges in information retrieval on the Web:
  Large number of documents
  Heterogeneous and unstructured
The WWW is hypertext, which provides auxiliary information (other than the text of the web pages): its link structure.
Objective: take advantage of this link structure.
Background
Academic citations:
  Link to other well-known papers
  Are peer reviewed
  Have quality control
Web pages:
  Not homogeneous: they vary widely in quality, usage, citations, and length
  Quality is subjective to the user
  The importance of a page is not a quantity that is intuitively easy to capture
What does a user want?
  The most applicable documents first
What is the job of a retrieval system?
  Present the more relevant documents up front
Notion: quality/importance of web pages
  Difficult to classify (depends on the user)
  We deal with the overall importance of a page, rather than individual sections of the page.
Link Structure
Forward links
Back links
The Web has 150 million pages and 1.7 billion links (probably more now)
Use the concept of citation analysis: highly linked pages are more "important" than pages with few links
Propagation of Ranking
PageRank: a page has a high rank if the sum of the ranks of its backlinks is high.
Some notation:
  $u$: a web page
  $F_u$: set of pages $u$ points to (forward links)
  $B_u$: set of pages that point to $u$ (backlinks)
  $N_u = |F_u|$: number of links from $u$
  $c$: normalization factor
Simple ranking function:
$$R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v}$$
Simplified Page Rank Calculation
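The slide's figure is not part of this transcript, so the following is only a minimal sketch of how the simple ranking function could be iterated to a steady state. The three-page graph (`A`, `B`, `C`) and the helper name `simple_rank_step` are assumptions made for illustration, not taken from the slides.

```python
def simple_rank_step(links, rank):
    """Apply R(u) = c * sum over backlinks v of R(v) / N_v once."""
    new_rank = {u: 0.0 for u in links}
    for v, forward in links.items():
        for u in forward:                      # v is a backlink of u
            new_rank[u] += rank[v] / len(forward)
    c = 1.0 / sum(new_rank.values())           # normalization: ranks sum to 1
    return {u: c * r for u, r in new_rank.items()}


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
rank = {u: 1.0 / len(links) for u in links}    # uniform starting ranks
for _ in range(100):
    rank = simple_rank_step(links, rank)

print(rank)   # approaches A = 0.4, B = 0.2, C = 0.4
```

In this toy graph every page has at least one forward link, so no rank is lost between steps and the normalization factor `c` stays at 1.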
Problem in Ranking?
Rank sink: two web pages that point to each other but to no other page, plus a third page that points to one of them. During iteration, the loop will accumulate rank but never distribute it (since there are no outedges).
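A rough way to see the rank sink in action is to propagate rank on a small graph that contains such a loop. The four-page graph below (`P`, `Q` as the rest of the web, `A` and `B` as the loop) is a hypothetical example, and `propagate` is an illustrative helper name.

```python
def propagate(links, rank):
    """One step of rank propagation without any escape term."""
    new_rank = {u: 0.0 for u in links}
    for v, forward in links.items():
        for u in forward:
            new_rank[u] += rank[v] / len(forward)
    return new_rank


# Hypothetical graph: P and Q link to each other, Q also links into the loop A <-> B.
links = {"P": ["Q"], "Q": ["P", "A"], "A": ["B"], "B": ["A"]}
rank = {u: 0.25 for u in links}

loop_share = []                    # fraction of total rank held by the loop A, B
for _ in range(50):
    rank = propagate(links, rank)
    loop_share.append(rank["A"] + rank["B"])

print(loop_share[0], loop_share[-1])   # the loop's share only ever grows
```

Rank flows into the loop through `Q` on every step, but nothing ever flows back out, so the loop ends up holding nearly all of the rank.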
Page Rank Definition
Let E(u) be some vector over the Web pages that corresponds to a source of rank. Then, the PageRank of a set of Web pages is an assignment, R', to the Web pages which satisfies
$$R'(u) = c \sum_{v \in B_u} \frac{R'(v)}{N_v} + c\,E(u)$$
such that c is maximized and $\|R'\|_1 = 1$ ($\|R'\|_1$ denotes the L1 norm of R').
Computing Page Rank
$$
\begin{aligned}
& R_0 \leftarrow S && \text{initialize vector over web pages} \\
& \textbf{loop:} \\
& \quad R_{i+1} \leftarrow A R_i && \text{new ranks: sum of normalized backlink ranks} \\
& \quad d \leftarrow \|R_i\|_1 - \|R_{i+1}\|_1 && \text{compute normalizing factor} \\
& \quad R_{i+1} \leftarrow R_{i+1} + dE && \text{add escape term} \\
& \quad \delta \leftarrow \|R_{i+1} - R_i\|_1 && \text{control parameter} \\
& \textbf{while } \delta > \epsilon && \text{stop when converged}
\end{aligned}
$$
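The loop above might be sketched in Python as follows. Several choices here are assumptions rather than anything stated on the slide: the link matrix is scaled by a damping factor `c = 0.85`, the escape vector `E` is uniform with L1 norm 1, the start vector `S` equals `E`, and the function and parameter names are illustrative.

```python
def pagerank(links, c=0.85, eps=1e-10, max_iter=1000):
    """Iterate R <- A R + d E until the L1 change drops to eps or below."""
    pages = sorted(set(links) | {u for fs in links.values() for u in fs})
    E = {u: 1.0 / len(pages) for u in pages}   # assumed: uniform escape vector
    R = dict(E)                                # R_0 <- S, assumed S = E
    for _ in range(max_iter):
        # R_{i+1} <- A R_i, with A scaled by the assumed damping factor c
        new_R = {u: 0.0 for u in pages}
        for v in pages:
            forward = links.get(v, [])
            for u in forward:
                new_R[u] += c * R[v] / len(forward)
        # d <- ||R_i||_1 - ||R_{i+1}||_1 (rank lost to damping and dangling pages)
        d = sum(R.values()) - sum(new_R.values())
        # R_{i+1} <- R_{i+1} + d E (add escape term)
        for u in pages:
            new_R[u] += d * E[u]
        # delta <- ||R_{i+1} - R_i||_1 (control parameter)
        delta = sum(abs(new_R[u] - R[u]) for u in pages)
        R = new_R
        if delta <= eps:                       # stop when converged
            break
    return R


# Hypothetical graph containing a rank sink (the loop A <-> B).
ranks = pagerank({"P": ["Q"], "Q": ["P", "A"], "A": ["B"], "B": ["A"]})
print(ranks)
```

Because `d` returns all lost rank through `E`, the L1 norm of `R` stays at 1, and the pages outside the loop keep a nonzero rank.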
Random Surfer Model
Random surfer: clicks on successive links at random.
The "surfer" periodically gets bored.
Solution to Random Surfer Model
Escape term: E(u) can be thought of as modeling a random surfer who periodically gets bored and jumps to a different page, rather than staying in a loop forever.
E is a vector over all the web pages that gives the probability of the surfer jumping to each page; it is a user-defined parameter.
$$R'(u) = c \sum_{v \in B_u} \frac{R'(v)}{N_v} + c\,E(u)$$
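The random surfer picture can also be simulated directly. This is only an illustrative sketch: the boredom probability of 0.15, the uniform jump distribution, the fixed seed, and the four-page graph are all assumptions, and `random_surfer` is a made-up helper name.

```python
import random


def random_surfer(links, steps=100_000, bored=0.15, seed=0):
    """Estimate visit frequencies of a surfer who follows links at random."""
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {u: 0 for u in pages}
    page = rng.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        forward = links[page]
        if not forward or rng.random() < bored:
            page = rng.choice(pages)       # bored: jump to a page drawn from uniform E
        else:
            page = rng.choice(forward)     # otherwise follow a random forward link
    return {u: n / steps for u, n in visits.items()}


# Hypothetical graph with a loop A <-> B that would otherwise trap the surfer.
freq = random_surfer({"P": ["Q"], "Q": ["P", "A"], "A": ["B"], "B": ["A"]})
print(freq)
```

The loop pages are still visited most often, but the occasional jump means the surfer keeps returning to `P` and `Q` instead of staying in the loop forever.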
Another Problem – Dangling Links
What are dangling links?
  Links that point to a page with no outgoing links
  Often these are pages that have not been downloaded yet
Why is this a problem?
  We don't know how their weight should be distributed
What do we do?
  Remove them from the system until the PageRanks are calculated, then add them back
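One possible way to carry out the removal step is sketched below. Removing a dangling page can leave other pages without outgoing links, so the sketch repeats until nothing changes. The function name `remove_dangling` and the example graph are assumptions for illustration.

```python
def remove_dangling(links):
    """Repeatedly drop pages with no outgoing links, and the links pointing to them."""
    links = {u: list(fs) for u, fs in links.items()}
    while True:
        pages = set(links) | {u for fs in links.values() for u in fs}
        dangling = {u for u in pages if not links.get(u)}
        if not dangling:
            return links
        links = {u: [w for w in fs if w not in dangling]
                 for u, fs in links.items() if u not in dangling}


# Hypothetical crawl: D has no outgoing links and E was never downloaded.
crawl = {"A": ["B", "C"], "B": ["A", "D"], "C": ["E"], "D": []}
core = remove_dangling(crawl)
print(core)   # {'A': ['B'], 'B': ['A']}
```

In this example the first pass removes `D` and `E`, which leaves `C` without outgoing links, so a second pass removes `C` as well.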
Mathematical Basics
What are eigenvectors and eigenvalues?
Given a vector v in an n-dimensional vector space, we can linearly transform it to another vector using a transformation matrix A. The transformed vector is Av.
An eigenvector is a nonzero vector that is only scaled by the linear transformation, not turned to a different direction. The scaling factor is the eigenvalue. A matrix can have several eigenvalues, and an eigenvector is determined only up to scaling. We can compute them from $Ax = \lambda x$, where $\lambda$ is an eigenvalue of A and x is the corresponding eigenvector.
An eigenvector is a vector that 'points' in the same direction (has invariant direction cosines) under some transform. The eigenvalue is a number that describes how the magnitude of the eigenvector is scaled by the transform.
Mathematical Basics
A is a square matrix whose rows and columns correspond to web pages (indexed by u and v).
Given the matrix A and a vector R over all the web pages, the dominant eigenvector is the one associated with the maximal eigenvalue.
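As a small illustration of a dominant eigenvector, the sketch below runs power iteration on a 2x2 matrix. The matrix is a made-up example (its eigenvalues are 3 and 1), not the link matrix from the slides, and the helper names are illustrative.

```python
def mat_vec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def power_iteration(A, x, iterations=50):
    """Repeatedly apply A and renormalize to L1 norm 1."""
    eigenvalue = 0.0
    for _ in range(iterations):
        y = mat_vec(A, x)
        eigenvalue = sum(abs(v) for v in y)    # ||A x||_1, given ||x||_1 = 1
        x = [v / eigenvalue for v in y]
    return eigenvalue, x


A = [[2.0, 1.0], [1.0, 2.0]]                   # hypothetical example matrix
eigenvalue, vector = power_iteration(A, [1.0, 0.0])
print(eigenvalue, vector)                      # close to 3 and [0.5, 0.5]
```

The result can be checked against the definition: applying `A` to `vector` gives the same vector scaled by `eigenvalue`.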
Example
$A^T =$ (matrix shown on the original slide)
Example (contd.)
$R = cAR = MR$
R: eigenvector of A (since $AR = \tfrac{1}{c}R$); c: normalization factor
A = (matrix shown on the original slide)
R = (vector shown on the original slide), normalized = (vector shown on the original slide)
$Ax = \lambda x \;\Rightarrow\; (A - \lambda I)x = 0$, which has a nonzero solution x only when $|A - \lambda I| = 0$
Implementation
The web crawler keeps a database of URLs so that it can discover all URLs on the web.
To implement PageRank, the web crawler builds an index of links as it crawls.
Problems:
  Infinitely large sites
  Incorrect HTML
  Sites that are down
  The web is always changing
PageRank Implementation
Convert each URL into a unique integer ID
Sort the link structure by ID
Remove dangling links
Make an initial assignment of ranks and iterate until convergence
Add the dangling links back
Iterate the process again to assign weights to all dangling links
The rank weights are kept in RAM; the link database, A, is sorted and read linearly, so it can be kept on disk
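The first two steps (integer IDs and a sorted link structure) could look roughly like this. The function name `build_link_database`, the in-memory representation, and the `example` URLs are assumptions for illustration; a real crawler would use on-disk structures.

```python
def build_link_database(url_links):
    """Map URLs to integer IDs and return the links as sorted (source, target) ID pairs."""
    ids = {}

    def url_id(url):
        if url not in ids:
            ids[url] = len(ids)        # next unused integer ID
        return ids[url]

    edges = [(url_id(source), url_id(target)) for source, target in url_links]
    edges.sort()                       # sorted by source ID, then target ID
    return ids, edges


# Hypothetical crawl output as (source URL, target URL) pairs.
ids, edges = build_link_database([
    ("http://b.example/", "http://a.example/"),
    ("http://a.example/", "http://c.example/"),
    ("http://a.example/", "http://b.example/"),
])
print(ids)
print(edges)
```

Sorting by source ID means each page's forward links sit next to each other, so one linear pass over `edges` is enough for each iteration.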
Convergence Properties
PageRank scales very well to large collections: the number of iterations needed to converge grows roughly linearly in log n, where n is the size of the link database.
Convergence Properties
Here we interpret the web as an expander-like graph.
A graph is said to be an expander if every (not too large) subset of nodes S has a neighborhood that is larger than some factor α times |S|.
Equivalently, a graph has a good expansion factor if and only if its largest eigenvalue is sufficiently larger than its second-largest eigenvalue.
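To show why the eigenvalue gap matters for convergence, here is a toy sketch using a two-state chain with transition matrix [[1-a, a], [a, 1-a]], whose eigenvalues are 1 and 1 - 2a. The chain, the tolerance, and the function name are illustrative assumptions; this is not a model of the web graph.

```python
def iterations_to_converge(a, eps=1e-8):
    """Count power-iteration steps until the L1 change falls below eps."""
    x = [1.0, 0.0]
    for k in range(1, 10_000):
        y = [(1 - a) * x[0] + a * x[1], a * x[0] + (1 - a) * x[1]]
        delta = abs(y[0] - x[0]) + abs(y[1] - x[1])
        x = y
        if delta < eps:
            return k
    raise RuntimeError("did not converge")


fast = iterations_to_converge(0.4)     # second eigenvalue 0.2: large gap
slow = iterations_to_converge(0.05)    # second eigenvalue 0.9: small gap
print(fast, slow)
```

The error shrinks by a factor equal to the second eigenvalue on each step, so the chain with the larger gap converges in far fewer iterations.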
Searching with PageRank
Two search engines were implemented using PageRank.
Title-based search engine:
  Matches the titles of web pages against the given query
  Ranks the results using PageRank
  Works well for general queries having a large result set
Full-text search engine (Google):
  Scans the entire document for a match with the given query
  Performs rank merging
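A title-based search of the kind described above might be sketched as follows. The matching rule (every query word must appear in the title), the function name `title_search`, and the sample data are assumptions for illustration; the slides do not specify these details.

```python
def title_search(query, titles, pagerank):
    """Return URLs whose title contains every query word, highest PageRank first."""
    words = query.lower().split()
    hits = [url for url, title in titles.items()
            if all(w in title.lower().split() for w in words)]
    return sorted(hits, key=lambda url: pagerank[url], reverse=True)


# Hypothetical titles and precomputed PageRank values.
titles = {
    "http://a.example/": "Stanford University",
    "http://b.example/": "Stanford University Libraries",
    "http://c.example/": "Cooking at home",
}
pagerank = {"http://a.example/": 0.5, "http://b.example/": 0.2, "http://c.example/": 0.3}

results = title_search("stanford university", titles, pagerank)
print(results)   # ['http://a.example/', 'http://b.example/']
```

The text match only filters the candidates; the ordering comes entirely from the precomputed PageRank values.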
Types of Results
Information-based result:
  Finds a site which contains a great deal of information
  Propagates the textual matching score through the link structure
Common-case result:
  The most commonly used site (often commercial) relevant to the search query
  PageRank results in a good representation of the common case
Personalized PageRank
E vector:
  Corresponds to a distribution over web pages
  Provides flexibility in adjusting PageRanks
A uniform E causes highly linked web pages to achieve a very high ranking
A single-page E (for example, a homepage) causes important pages unrelated to that page to receive a low PageRank
An E consisting of the root-level pages of all web servers is a good compromise between a uniform E and a single-page E
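The effect of the choice of E can be seen on a small example. This is a sketch under assumptions: the damping factor `c = 0.85`, the fixed iteration count, the four-page graph, and the function name `personalized_pagerank` are all illustrative.

```python
def personalized_pagerank(links, E, c=0.85, iterations=100):
    """Power iteration in which rank lost at each step is returned through E."""
    pages = sorted(links)
    R = dict(E)
    for _ in range(iterations):
        new_R = {u: 0.0 for u in pages}
        for v in pages:
            for u in links[v]:
                new_R[u] += c * R[v] / len(links[v])
        d = sum(R.values()) - sum(new_R.values())   # rank to redistribute
        R = {u: new_R[u] + d * E[u] for u in pages}
    return R


# Hypothetical graph: a personal "home" page plus a well-linked portal and news site.
links = {"home": ["hobby", "portal"], "hobby": ["home"],
         "portal": ["news"], "news": ["portal"]}

uniform_E = {u: 0.25 for u in links}
single_E = {"home": 1.0, "hobby": 0.0, "portal": 0.0, "news": 0.0}

uniform = personalized_pagerank(links, uniform_E)
single = personalized_pagerank(links, single_E)
print(uniform["home"], single["home"])
```

Concentrating E on `home` raises the rank of `home` and the pages near it, compared with the uniform E.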
Applications
Estimating web traffic:
  By looking at the differences between PageRank and actual usage statistics, it is possible to find things that people often look at but do not want to link to from their web pages
Backlink predictor:
  Citation counting tends to get stuck in local collections of web pages
  Using the random surfer model, PageRank quickly finds the site homepage and gives preference to its children, resulting in an efficient, broad search
  Hence PageRank potentially acts as a better backlink predictor, since it builds up information about the entire website faster
Other Applications
Spam detection and prevention
Sort the backlinks based on their importance
Issues
Users are not random walkers.
Starting point distribution (actual usage data as starting vector).
Bias towards main pages.
Linkage spam.
No query-specific rank.
Conclusion
PageRank is a global ranking of all web pages, regardless of their content, based solely on their location in the Web's graph structure
PageRank can be used to separate out a small set of commonly used documents
The full database is consulted only when the small database is not adequate to answer a query
Personalized PageRank can be used to create a view of the Web from a particular user's perspective
Google Architecture