search engine page rank demystification


Upload: raja-r

Post on 24-Apr-2015


Category:

Technology



DESCRIPTION

Hi all, this presentation covers how a search engine works and what happens inside it. The second half explains Page Rank in depth: how it is calculated, how the displayed value differs from the actual Google PR, and how frequently Google updates PR values, with calculations and a few examples.

TRANSCRIPT

Page 1: Search engine page rank demystification
Page 2: Search engine page rank demystification

By Rajanagan R, Web Analyst

Search Engines

Page 3: Search engine page rank demystification

What is a Search Engine?

A search engine is an information retrieval system designed to help find information stored on a computer system, such as on the World Wide Web.

It is a web search tool that automatically visits websites (using crawlers), records and indexes them within its database, and generates results based on a user's search criteria.

Unlike web directories, which are maintained by human editors, search engines operate algorithmically or with a mixture of algorithmic and human input.

Page 4: Search engine page rank demystification

History of Search Engines

1990: First tool for searching on the Internet – Archie. Alan Emtage, student at McGill University in Montreal. Objective: index FTP archives, allowing people to find specific files.

1993: First web robot – World Wide Web Wanderer. Matthew Gray, physics student at MIT. Objective: track all pages on the web to monitor the growth of the web.

1994: First search engine – WebCrawler. Brian Pinkerton, CS student at the University of Washington. Objective: download web pages and store their links in a keyword-searchable database.

1994: Jerry’s Guide to the Internet. Jerry Yang and David Filo, Stanford University. Objective: crawl for web pages and organize them by content into hierarchies. Became Yahoo (Yet Another Hierarchical Officious Oracle).

1994-97: Infoseek, AltaVista, Excite, Lycos, LookSmart (meta engine). Ranking based on content and structure.

1998: Google (Sergey Brin and Larry Page, CS students, Stanford University). Ranking based on content, structure and value.

Page 5: Search engine page rank demystification

How Does a Search Engine Work?

Page 6: Search engine page rank demystification

Step 1: Crawling


Page 7: Search engine page rank demystification

Crawler Looks @ Example

Page 8: Search engine page rank demystification

This is what I look at in a website!

Page 9: Search engine page rank demystification

Step 2 : Indexing

Page 10: Search engine page rank demystification

Indexed Database


Page 12: Search engine page rank demystification

Step 3 : Processing Query

Page 13: Search engine page rank demystification

Step 4 : Ranking

Page 14: Search engine page rank demystification

Overall Functioning of Search Engines

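[Diagram: Your browser sends the query "Eggs?" to the search engine. The crawler fetches pages from the Web (URL1, URL2, URL3, URL4), the indexer stores them in the search engine database, and the engine scores the candidate matches (Eggs 90%, Eggo 81%, Ego 40%, Huh? 10%) and returns "All About Eggs" in a fraction of a second.]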
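The crawl, index, query and rank steps can be illustrated with a toy example. This is only a minimal sketch: the page texts are made-up stand-ins for what a crawler would fetch, the function names `build_index` and `search` are hypothetical, and ranking by simple word count is an assumption for illustration, not how any real search engine scores results.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text (stand-ins for crawler output).
pages = {
    "URL1": "all about eggs and how to cook eggs",
    "URL2": "eggo waffles for breakfast",
    "URL3": "the ego and the id",
    "URL4": "fresh eggs from the farm",
}

def build_index(pages):
    """Indexing: map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, pages, query):
    """Query processing and ranking: score each matching URL by word count."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url in index.get(word, ()):
            scores[url] += pages[url].lower().split().count(word)
    # Highest score first; ties broken by URL for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

index = build_index(pages)
results = search(index, pages, "eggs")
print(results)
```

Only exact word matches are found here, so "eggo" and "ego" do not match the query "eggs"; the partial-match percentages in the diagram would need a fuzzier scoring scheme.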

Page 15: Search engine page rank demystification

SERP

Page Rank???

Page 16: Search engine page rank demystification

Google Page Rank Algorithm

The backbone of Google's technology, developed by Larry Page and Sergey Brin in 1998.

Ranks pages based on the number of other pages that link to them.

Calculated from the nature and the number of backlinks, and used in producing the SERP listing.

The Google Toolbar shows the Page Rank as a scale value from 0 to 10; you can find it at www.toolbar.google.com. But it is just a rough guide, not the actual PR. Nevertheless, it can be a good indication for SEO practitioners of whether the website is moving in the right (or wrong) direction.

Page 17: Search engine page rank demystification

Definition of Page Rank

Page Rank is proposed in order to measure the relative importance of web pages. It is a method for computing a ranking for every web page based on the graph (links) of the web.

We assume:

T1...Tn - Pages which link to page A (i.e., are citations).

d - Damping factor, which can be set between 0 and 1; usually d = 0.85.

C(A) - Number of links going out of page A, i.e. its outgoing links (likewise C(T1) for page T1, and so on).

The Page Rank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))

Note: With this form of the formula, the Page Ranks of all pages sum to the number of pages (provided every page has at least one outgoing link), so the average Page Rank over all web pages is one. Dividing each value by the number of pages gives a probability distribution over web pages.
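To make the formula concrete, here is a small sketch that evaluates it once for a single page A. The helper name `page_rank` and the input values (two inbound pages, each with an assumed PR of 1.0) are made up for illustration.

```python
def page_rank(inbound, d=0.85):
    """Evaluate PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) once.

    inbound: list of (PR(Ti), C(Ti)) pairs, one per page Ti that links to A.
    """
    return (1 - d) + d * sum(pr / out_links for pr, out_links in inbound)

# Hypothetical case: T1 has PR 1.0 and 2 outgoing links, T2 has PR 1.0 and 1.
pr_a = page_rank([(1.0, 2), (1.0, 1)])
print(round(pr_a, 3))  # 0.15 + 0.85 * (0.5 + 1.0) = 1.425
```

T1 splits its vote across two links, so it passes on only half as much as T2 does.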

Page 18: Search engine page rank demystification

Calculating Page Rank

The PR of each page depends on the PR of the pages pointing to it. But we won’t know the PR of those pages until the pages pointing to them have had their PR calculated, and so on.

Calculating PR seems impossible! But there is a solution.

Page Rank can be calculated using a simple iterative algorithm, and it corresponds to the principal eigenvector of the normalized link matrix of the web.

This means we can calculate a page’s PR without knowing the final value of the PR of the other pages. What we need to do:

Remember each value we calculate.

Repeat the calculations many times, until the numbers stop changing much.
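The two steps above (remember each value, repeat until the numbers stop changing much) can be sketched as follows. This is an illustrative sketch, not Google's implementation: the function name `pagerank`, the stopping tolerance `tol`, and the three-page graph are assumptions. Every page must appear as a key in `links`.

```python
def pagerank(links, d=0.85, tol=1e-6, max_iter=1000):
    """Iterate the PR formula until no value changes by more than tol.

    links: dict mapping each page to the list of pages it links to.
    """
    pr = {page: 1.0 for page in links}  # initial guess for every page
    for _ in range(max_iter):
        new_pr = {}
        for page in links:
            # Sum PR(T)/C(T) over every page T that links to this page.
            inbound = sum(
                pr[src] / len(out) for src, out in links.items() if page in out
            )
            new_pr[page] = (1 - d) + d * inbound
        change = max(abs(new_pr[p] - pr[p]) for p in links)
        pr = new_pr  # remember the values we just calculated
        if change < tol:
            break
    return pr

# Hypothetical three-page graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print({page: round(value, 3) for page, value in ranks.items()})
```

Because every page in this graph has at least one outgoing link, the values sum to 3 and the average PR is 1.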

Page 19: Search engine page rank demystification

Simple hierarchy

Each page has one outgoing link, i.e. C(A) = 1 and C(B) = 1.

We don’t know the PR of the pages, so let's assume each has PR = 1.00, with d = 0.85.

PR(A) = (1 – d) + d(PR(B)/1)

PR(B) = (1 – d) + d(PR(A)/1)

i.e.

PR(A) = 0.15 + 0.85 * 1 = 1

PR(B) = 0.15 + 0.85 * 1 = 1

We started out with a lucky guess! The numbers aren't changing at all.
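The guess does not actually need to be lucky. As a quick check, the sketch below repeats the same two equations from a few different starting values (0, 1 and 40, chosen arbitrarily for illustration); each run settles on the same answer.

```python
d = 0.85
finals = []

for start in (0.0, 1.0, 40.0):
    pr_a = pr_b = start  # initial guess for both pages
    for _ in range(200):
        # Each page's new PR is computed from the other page's previous PR.
        pr_a, pr_b = (1 - d) + d * pr_b, (1 - d) + d * pr_a
    finals.append((pr_a, pr_b))
    print(start, round(pr_a, 4), round(pr_b, 4))
```

Each pass shrinks the distance from 1 by a factor of d = 0.85, which is why the starting value stops mattering after enough repetitions.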

Page 20: Search engine page rank demystification

Complex Hierarchy

Average PR: 0.378

PR Loss: 8 – (.92+.41+.41+.41+.22+.22+.22+.22) = 8 – 3.03 = 4.97


Page 21: Search engine page rank demystification

Complex Hierarchy with Avg PR = 1.0000

Average PR: 1.0000

PR Loss: 8 – (3.35+1.1+1.1+1.1+.34+.34+.34+.34) ≈ 0.0000 (the page values shown are rounded)

Page 22: Search engine page rank demystification

Finally, Observations:

It doesn't matter how many pages you have in your site, your average PR will always be 1.0 at best. But a hierarchical layout can strongly concentrate votes and therefore the PR.

Page Rank is, in fact, very simple (apart from one scary looking formula). But when a simple calculation is applied hundreds (or billions) of times over the results can seem complicated.

Page Rank is also only part of the story about which results get displayed high up in a Google listing. Google also pays attention to the text in a link's anchor when deciding the relevance of a target page, perhaps more than to the page's PR.

Page Rank is still part of the listings story though, so it's worth your while as a good designer to make sure you understand it correctly.
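The first observation can be checked on a small made-up site. The sketch below is illustrative only: both four-page graphs are hypothetical and are not the eight-page hierarchies from the earlier slides. It compares a hierarchy whose sub-pages link back to the home page with one whose sub-pages have no outgoing links.

```python
def pagerank(links, d=0.85, iterations=100):
    """Repeat the PR formula a fixed number of times, starting from PR = 1."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[src] / len(out) for src, out in links.items() if page in out
            )
            for page in links
        }
    return pr

def average(pr):
    return sum(pr.values()) / len(pr)

# Hypothetical site: Home links to three sub-pages, and each links back to Home.
looped = {"Home": ["A", "B", "C"], "A": ["Home"], "B": ["Home"], "C": ["Home"]}
# Same hierarchy, but the sub-pages have no outgoing links at all.
dead_ends = {"Home": ["A", "B", "C"], "A": [], "B": [], "C": []}

looped_pr = pagerank(looped)
dead_pr = pagerank(dead_ends)
print(round(average(looped_pr), 3), round(looped_pr["Home"], 3))
print(round(average(dead_pr), 3), round(dead_pr["Home"], 3))
```

With the links back to Home, the average stays at 1.0 while Home collects well above 1; with dead ends, the votes passed to the sub-pages go nowhere and the average drops far below 1.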



Page 25: Search engine page rank demystification

Thank You!

Queries, if any, please reach me @ [email protected]
