
Using ODP Metadata to Personalize Search

Presented by Lan Nie

09/21/2005, Lehigh University

Introduction: ODP Metadata

4 million sites and 590,000 categories, organized as a tree:
Categories are inner nodes.
Pages are leaf nodes: high quality and representative.

4 Billion vs. 4 Million

Using ODP metadata for personalized search

Is biasing possible in the ODP context? The goal is to extend ODP classifications automatically, by biasing, from the 4 million sites currently covered to the whole Web of about 4 billion pages.

Using ODP Metadata For Personalized Search

User profile: several ODP topics selected by the user.

Personalized search:
Send query Q to a search engine S (e.g., Google, ODP Search)
Res = URLs returned by S
For i = 1 to size(Res):
    Dist[i] = Distance(Res[i], Prof)
Re-sort Res in ascending order of Dist
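The re-ranking loop above can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the function name `personalize` and passing the distance as a callable are illustrative assumptions, and a smaller distance is taken to mean a closer match to the profile.

```python
from typing import Callable, List, Set


def personalize(results: List[str],
                profile: Set[str],
                distance: Callable[[str, Set[str]], float]) -> List[str]:
    """Re-sort search results by their distance to the user profile.

    `distance` is a hypothetical callable; smaller values mean the URL
    is closer to the profile.
    """
    dist = [distance(url, profile) for url in results]
    # sorted() is stable, so URLs with equal distance keep the search
    # engine's original relative order.
    order = sorted(range(len(results)), key=lambda i: dist[i])
    return [results[i] for i in order]


# Usage with a toy distance table in place of a real directory lookup.
toy = {"a.com": 2.0, "b.com": 0.0, "c.com": 1.0}
print(personalize(["a.com", "b.com", "c.com"], {"Top/Arts"},
                  lambda url, prof: toy[url]))  # ['b.com', 'c.com', 'a.com']
```

Keeping the sort stable means the engine's own ordering still breaks ties among equally distant URLs.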

Representation: both the user profile and a URL (50% of URLs appear in the Google directory) can be represented as a set of nodes in the directory tree.

Distance(Profile, URL): the minimum distance between the two sets of nodes.
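A minimal sketch of the set-to-set distance, assuming the caller supplies a node-level distance function (the name `set_distance` is hypothetical). Returning infinity when either set is empty, as for a URL with no directory annotation, is an assumption of this sketch so that such URLs fall to the end of a ranking by ascending distance.

```python
from itertools import product
from typing import Callable, Iterable, TypeVar

Node = TypeVar("Node")


def set_distance(profile_nodes: Iterable[Node],
                 url_nodes: Iterable[Node],
                 node_dist: Callable[[Node, Node], float]) -> float:
    """Minimum distance over all (profile node, URL node) pairs.

    Returns infinity when either set is empty, e.g. for a URL that has
    no directory annotation (an assumption of this sketch).
    """
    return min((node_dist(p, u) for p, u in product(profile_nodes, url_nodes)),
               default=float("inf"))


# Toy usage: nodes are integers and the node distance is |a - b|.
print(set_distance({1, 10}, {4, 12}, lambda a, b: abs(a - b)))  # 2
```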

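Naïve distances
Minimum tree distance: uses intra-topic (tree) links only; the path passes through the subsumer, the deepest common ancestor of the two nodes
Graph shortest path: also follows inter-topic links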
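The two naïve distances can be illustrated on ODP category paths such as `Top/Arts/Music`. In this hypothetical sketch, `tree_distance` counts the edges from each node up to their subsumer, while `graph_distance` runs a breadth-first search over the tree edges plus extra inter-topic links; treating those links as undirected is an assumption.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def tree_distance(a: str, b: str) -> int:
    """Edges on the tree path a -> subsumer -> b, for '/'-separated paths."""
    pa, pb = a.split("/"), b.split("/")
    shared = 0  # number of leading path components the two nodes share
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return (len(pa) - shared) + (len(pb) - shared)


def graph_distance(a: str, b: str, categories: Iterable[str],
                   links: Iterable[Tuple[str, str]]) -> float:
    """Breadth-first shortest path over tree edges plus inter-topic links."""
    adj: Dict[str, Set[str]] = {}

    def connect(u: str, v: str) -> None:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for cat in categories:  # parent-child edges along each category path
        parts = cat.split("/")
        for i in range(1, len(parts)):
            connect("/".join(parts[:i]), "/".join(parts[:i + 1]))
    for u, v in links:  # inter-topic links, treated as undirected here
        connect(u, v)

    dist = {a: 0}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        if u == b:
            return dist[u]
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return float("inf")  # b is unreachable from a


music, audio = "Top/Arts/Music", "Top/Computers/Software/Audio"
print(tree_distance(music, audio))                                     # 5
print(graph_distance(music, audio, [music, audio], []))                # 5
print(graph_distance(music, audio, [music, audio], [(music, audio)]))  # 1
```

Without inter-topic links the graph distance equals the tree distance; a single cross link can shorten it drastically.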

[Formulas: complex distance between nodes a and b via their subsumer; combination of the distance with PageRank(b).]

Combining with Google PageRank: some Google results are not annotated in the ODP.
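One hypothetical way to combine a profile distance with PageRank is a linear mix of a closeness term and normalized PageRank, in which results without an ODP annotation get zero closeness and are ranked by PageRank alone. The weight `gamma`, the `1 / (1 + d)` closeness transform, and normalizing by the largest PageRank in the result list are all assumptions of this sketch, not the formula from the slides.

```python
from typing import Dict, List, Optional


def combined_rank(results: List[str],
                  dist: Dict[str, Optional[float]],
                  pagerank: Dict[str, float],
                  gamma: float = 0.5) -> List[str]:
    """Rank results by a hypothetical mix of profile closeness and PageRank.

    dist[url] is the profile distance, or None (or absent) for a URL with
    no ODP annotation; gamma weights closeness against PageRank.
    """
    # Normalize PageRank by the largest value in this result list (guard against 0).
    max_pr = max((pagerank[u] for u in results), default=0.0) or 1.0

    def score(url: str) -> float:
        d = dist.get(url)
        closeness = 0.0 if d is None else 1.0 / (1.0 + d)
        return gamma * closeness + (1.0 - gamma) * pagerank[url] / max_pr

    return sorted(results, key=score, reverse=True)


dist = {"a": 3.0, "b": 0.0, "c": None}
pagerank = {"a": 0.9, "b": 0.1, "c": 1.0}
print(combined_rank(["a", "b", "c"], dist, pagerank, gamma=0.9))  # ['b', 'a', 'c']
print(combined_rank(["a", "b", "c"], dist, pagerank, gamma=0.5))  # ['a', 'b', 'c']
```

A larger `gamma` lets the profile dominate; a smaller one falls back toward plain PageRank order.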

Complex distance: the deeper the subsumer, the more related the nodes.
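The intuition that a deeper subsumer means more closely related nodes can be captured by making edges cheaper the deeper they sit in the tree. The sketch below uses an illustrative weighting, not the presentation's formula: the edge from a node at depth d to its child costs 0.5**d, with `Top` at depth 0, so two siblings directly under the root end up farther apart than two siblings deep in the hierarchy, even though both pairs have tree distance 2.

```python
def complex_distance(a: str, b: str) -> float:
    """Depth-weighted tree distance between two '/'-separated ODP paths.

    Hypothetical weighting: the edge from a node at depth d to its child
    costs 0.5 ** d, with the root ('Top') at depth 0. Paths that meet at a
    deeper subsumer therefore get a smaller distance.
    """
    pa, pb = a.split("/"), b.split("/")
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    if shared == 0:
        raise ValueError("paths must share the directory root")
    subsumer_depth = shared - 1

    def cost_down(path_len: int) -> float:
        # Edges from the subsumer down to the node at depth path_len - 1.
        return sum(0.5 ** d for d in range(subsumer_depth, path_len - 1))

    return cost_down(len(pa)) + cost_down(len(pb))


print(complex_distance("Top/Arts", "Top/Science"))                     # 2.0
print(complex_distance("Top/Arts/Music/Jazz", "Top/Arts/Music/Blues"))  # 0.5
```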

Experimental Results

Extending ODP Annotations To The Web

Manual annotation of the whole Web is infeasible. Biasing is an implicit way to extend annotations to the Web. Is biasing possible in the ODP context?

Are ODP entries good biasing sets, i.e., do they generate rankings that differ enough from the non-biased ranking to yield relevant results?

When does biasing make a difference?

Find the characteristics a biasing set must exhibit to produce relevant results.

Compare the top 100 non-biased PageRank results with the biased results.
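Biasing is understood here as PageRank whose random-jump (teleport) vector is restricted to the biasing set, as in topic-sensitive PageRank. A minimal power-iteration sketch follows; the function name, the damping factor 0.85, the fixed iteration count, and sending the mass of pages without out-links through the teleport vector are assumptions chosen for illustration.

```python
from typing import Dict, List, Set


def biased_pagerank(links: Dict[str, List[str]], bias: Set[str],
                    damping: float = 0.85, iters: int = 100) -> Dict[str, float]:
    """Power iteration for PageRank whose teleport vector covers only `bias`.

    An empty (or disjoint) biasing set falls back to a uniform teleport
    vector, i.e. ordinary non-biased PageRank.
    """
    nodes = set(links) | {v for outs in links.values() for v in outs}
    bias = (bias & nodes) or nodes
    teleport = {n: (1.0 / len(bias) if n in bias else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        # Mass held by pages without out-links is redistributed via teleport.
        dangling = sum(rank[n] for n in nodes if not links.get(n))
        new = {n: (1.0 - damping + damping * dangling) * teleport[n] for n in nodes}
        for u, outs in links.items():
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
        rank = new
    return rank


web = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
plain = biased_pagerank(web, set())
biased = biased_pagerank(web, {"d"})
print(round(plain["d"], 4), round(biased["d"], 4))  # 0.0375 0.15
```

Sorting both score dictionaries and taking the first 100 keys gives the two top-100 lists to compare.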

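Similarity measures
OSim: degree of overlap between the top n elements of two rank lists τ1 and τ2:

$$\mathrm{OSim}(\tau_1, \tau_2) = \frac{|\mathrm{Top}_n(\tau_1) \cap \mathrm{Top}_n(\tau_2)|}{n}$$

KSim: degree of agreement on ordering between the two rank lists:

$$\mathrm{KSim}(\tau_1, \tau_2) = \frac{|\{(u, v) : \tau_1', \tau_2' \text{ agree on order of } (u, v),\ u \neq v\}|}{|U| \cdot (|U| - 1)}$$

where U is the union of the top n elements of τ1 and τ2, and τ1' (resp. τ2') is τ1 (resp. τ2) extended by appending at its end the elements of U it does not contain.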
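The two similarity measures translate directly into code. In this sketch, elements of U missing from a list are appended at its end tied with one another, and an ordered pair counts as agreeing when both lists order it the same way or both leave it tied; the slide does not say how ties are handled, so that convention (and returning 1.0 when there are no pairs) is an assumption.

```python
from itertools import permutations
from typing import Sequence


def osim(t1: Sequence[str], t2: Sequence[str], n: int) -> float:
    """Fraction of shared elements among the top n of two rank lists."""
    return len(set(t1[:n]) & set(t2[:n])) / n


def ksim(t1: Sequence[str], t2: Sequence[str], n: int) -> float:
    """Fraction of ordered pairs of U on whose order the extended lists agree."""
    top1, top2 = list(t1[:n]), list(t2[:n])
    union = set(top1) | set(top2)
    if len(union) < 2:
        return 1.0  # no pairs to compare (convention of this sketch)

    def extended_ranks(top: list) -> dict:
        # Elements of U missing from `top` are appended at the end, tied.
        pos = {u: i for i, u in enumerate(top)}
        return {u: pos.get(u, len(top)) for u in union}

    r1, r2 = extended_ranks(top1), extended_ranks(top2)

    def sign(x: int) -> int:
        return (x > 0) - (x < 0)

    agree = sum(1 for u, v in permutations(union, 2)
                if sign(r1[u] - r1[v]) == sign(r2[u] - r2[v]))
    return agree / (len(union) * (len(union) - 1))


print(osim(["a", "b", "c"], ["a", "c", "b"], 3))  # 1.0
print(ksim(["a", "b", "c"], ["a", "c", "b"], 3))  # two thirds of the pairs agree
```

The example shows why both measures are needed: the two lists contain the same elements (OSim = 1), yet one swapped pair lowers KSim.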

Experimental Setup

Choice of biasing sets: top [0-10]% PageRank pages, top [0-2]% PageRank pages, randomly selected pages, and low-PageRank pages.

The sum of scores within the biasing set (TOT) was varied between 0.000005% and 10% of the total sum over all pages.

Experiments were first run on a crawl of 3 million pages and then on the Stanford WebBase crawl.

Biasing set consisting of good pages

Biasing set consisting of randomly selected pages

According to the random model of biasing, every set with TOT below 0.015% is good for biasing.
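The TOT criterion amounts to a simple check: the biasing set's share of the total score, as a percentage, compared with the 0.015% threshold stated above. The helper names and the toy uniform scores below are hypothetical.

```python
from typing import Dict, Iterable

# Threshold from the slide: under the random model of biasing, sets with
# TOT below 0.015% are good for biasing.
TOT_THRESHOLD_PERCENT = 0.015


def tot_percent(pagerank: Dict[str, float], biasing_set: Iterable[str]) -> float:
    """Share of the total score held by the biasing set, in percent."""
    total = sum(pagerank.values())
    return 100.0 * sum(pagerank[p] for p in set(biasing_set)) / total


def good_for_biasing(pagerank: Dict[str, float], biasing_set: Iterable[str]) -> bool:
    return tot_percent(pagerank, biasing_set) < TOT_THRESHOLD_PERCENT


scores = {f"p{i}": 1.0 for i in range(10_000)}  # toy uniform scores
print(good_for_biasing(scores, ["p0"]), good_for_biasing(scores, ["p0", "p1"]))  # True False
```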

Results are not influenced by crawl size (3-million-page crawl vs. 120-million-page WebBase crawl). ODP entries have TOT below 0.015%, so biasing is possible in the ODP context.

Conclusions

A personalized search algorithm that ranks URLs by the distance between the user profile and each URL in the ODP taxonomy.

Biasing on ODP entries does make a difference, so extending the manual ODP classification to the Web is feasible.
