
Kshitij: A Search and Page Recommendation System for Wikipedia

Center for E-Business Technology
Seoul National University
Seoul, Korea

Nam, Kwang-hyun
Intelligent Database Systems Lab
School of Computer Science & Engineering
Seoul National University, Seoul, Korea

Phanikumar Bhamidipati, Kamalakar Karlapalem
Center for Data Engineering
International Institute of Information Technology, Hyderabad, India

COMAD 2008

Copyright 2009 by CEBT

Contents

Motivation
Problem statement
Kshitij
  Overview
  Graph Model
  Architecture
  Algorithms
    – CBR, LBR, YBR, AR
Results
Conclusion & Future Work
Discussion

IDS Lab Seminar - 2

Motivation

New paradigms in search
  Increased interest after the PageRank and HITS (Hyperlink-Induced Topic Search) algorithms

Wikipedia
  Powerful online collaborative encyclopedia
  Vast knowledge, available in a structured format
  The links in each page represent some kind of relation with the base page
    – Both the semantics and the data can be mined from Wikipedia

Need for systems that leverage Wikipedia knowledge in recommendations

Kshitij

A generic recommendation system based on Wikipedia semantics

Provides two services
  Search Recommendations
  Page Recommendations

Uses Yago as the stored knowledge base
  Extracts additional knowledge dynamically from the Wiki pages

Search Recommendations

[Figure: search recommendation screenshot. Labels: Keyword as input; Result from Search Engine; Kshitij Recommendations]

Page Recommendations

When the user visits a page, its identifier is sent as input to the algorithms to obtain recommendations

The most relevant aggregated results are displayed as hyperlinks

Kshitij - Overview

Leverages the structured model powered by Wikis
  Categories
  Links

YAGO
  An ontology compiled from Wikipedia
  The static source of knowledge

The Graph Structure

[Figure: graph of Wikipedia pages around a search. Node labels: Search, Jaguar, Jaguar Cars, Atari Jaguar, Atari Jaguar II, Atari 7800, Felidae, Black Panther, Mammal, William Lyons, Automobile]

Kshitij - Architecture

Kshitij - Algorithms

Three individual recommendation algorithms that explore different semantics
  CBR
  LBR
  YBR

A link based aggregator (AR)
  Combines the three into a single set of recommendations

Category Based Recommendations (CBR)

Key idea
  If two pages belong to multiple categories together, the probability that they belong to the same topic increases
    – London and Berlin both appear in Capitals in Europe and Host cities of the Summer Olympic Games

Algorithm
  Starts with a set of pages (search output)
  Explores the category structure to obtain candidate pages
  Prunes the list based on similarity values calculated from shared categories, using thresholds T1 and T2
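
The CBR steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides do not define the similarity measure or the roles of T1 and T2, so the sketch assumes Jaccard similarity over category sets, treats t1 as a minimum count of shared categories, and treats t2 as a minimum similarity. The function name, parameters, and toy data are hypothetical.

```python
from collections import defaultdict

def cbr(seed_pages, page_categories, t1=2, t2=0.3):
    """Category-based candidates for seed_pages.

    Assumptions (not from the slides): t1 is a minimum number of shared
    categories, t2 a minimum Jaccard similarity between category sets.
    """
    # Invert page -> categories into category -> member pages.
    members = defaultdict(set)
    for page, cats in page_categories.items():
        for cat in cats:
            members[cat].add(page)

    scores = {}
    for seed in seed_pages:
        seed_cats = page_categories.get(seed, set())
        # Candidates share at least one category with the seed page.
        candidates = set().union(*(members[c] for c in seed_cats)) - set(seed_pages)
        for cand in candidates:
            cand_cats = page_categories[cand]
            shared = seed_cats & cand_cats
            sim = len(shared) / len(seed_cats | cand_cats)
            # Prune on both thresholds; keep the best score per candidate.
            if len(shared) >= t1 and sim >= t2:
                scores[cand] = max(scores.get(cand, 0.0), sim)
    return sorted(scores, key=lambda p: (-scores[p], p))

# Toy data built around the London/Berlin example; category sets are illustrative.
page_categories = {
    "London": {"Capitals in Europe", "Host cities of the Summer Olympic Games", "Cities in England"},
    "Berlin": {"Capitals in Europe", "Host cities of the Summer Olympic Games", "Cities in Germany"},
    "Vienna": {"Capitals in Europe", "Cities in Austria"},
    "Manchester": {"Cities in England"},
}
cbr_result = cbr(["London"], page_categories)
```

With these toy inputs only Berlin shares two categories with London, so it is the only candidate that survives pruning.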

Link Based Recommendations (LBR)

Key idea
  If two pages are referred to together from the same set of pages, they can be considered related
    – Competing sportspersons, countries in the same alliance

Algorithm
  Start with the search results and the output of CBR
  Identify frequent item sets
  Support from search results is weighted higher than support from CBR output
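
A minimal sketch of the co-reference idea behind LBR, under stated assumptions: each referring page is treated as one transaction (its set of outgoing links), only 2-item sets are counted, and co-occurrence with a search result counts more than co-occurrence with a CBR page. The weights, the support threshold, and all names are hypothetical; the slides do not give them.

```python
from collections import Counter

def lbr(search_results, cbr_output, outlinks, min_support=2.0,
        search_weight=1.0, cbr_weight=0.5):
    """Pages frequently linked together with the seed pages.

    Assumptions (not from the slides): pairs only, one transaction per
    referring page, and fixed weights for search results vs. CBR output.
    """
    weights = {p: cbr_weight for p in cbr_output}
    weights.update({p: search_weight for p in search_results})

    support = Counter()
    for linked in outlinks.values():  # one transaction per referring page
        seed_weight = sum(weights[p] for p in linked if p in weights)
        for other in linked:
            if other not in weights:
                # Weighted support of the pair (seed, other) in this transaction.
                support[other] += seed_weight
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, s in ranked if s >= min_support]

# Toy link data; page names are illustrative only.
outlinks = {
    "Tennis rivalries": {"Roger Federer", "Rafael Nadal"},
    "2008 Wimbledon": {"Roger Federer", "Rafael Nadal", "London"},
    "Switzerland": {"Roger Federer", "Bern"},
}
lbr_result = lbr(["Roger Federer"], [], outlinks)
```

Here the two competing players are referred to together from two pages, so the pair clears the support threshold while the single co-occurrences do not.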

Yago Based Recommendations (YBR)

Yago is a set of facts in triple form <E1, R, E2>
  <New Delhi, Is Capital Of, India>
  Prune the relation types

Key idea
  Find a prioritized set of entities that are related to a given set of Wikipedia pages

Algorithm
  Start with the search output
  Retrieve entities related to these pages based on the weight measure
  Merge the lists and identify the related pages
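
The YBR steps can be illustrated with a small sketch over <E1, R, E2> triples. The weight measure is not given in the slides, so this assumes a fixed weight per relation type, treats any relation type without a weight as pruned, and merges per-seed results by summing weights. The relation names other than "Is Capital Of", the weights, and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical relation weights; relation types not listed here are pruned.
RELATION_WEIGHTS = {"Is Capital Of": 1.0, "Is Located In": 0.6}

def ybr(seed_pages, facts, relation_weights=RELATION_WEIGHTS, top_k=5):
    """Rank entities related to seed_pages through weighted facts."""
    seeds = set(seed_pages)
    scores = defaultdict(float)
    for e1, rel, e2 in facts:
        weight = relation_weights.get(rel)
        if weight is None:
            continue  # pruned relation type
        # A fact relates a seed page to the entity on its other side.
        if e1 in seeds and e2 not in seeds:
            scores[e2] += weight
        elif e2 in seeds and e1 not in seeds:
            scores[e1] += weight
    # Merging the per-seed lists amounts to summing scores, then ranking.
    return sorted(scores, key=lambda e: (-scores[e], e))[:top_k]

# Toy facts; only the first triple comes from the slide.
facts = [
    ("New Delhi", "Is Capital Of", "India"),
    ("New Delhi", "Is Located In", "Delhi"),
    ("New Delhi", "Type", "City"),
]
ybr_result = ybr(["New Delhi"], facts)
```

The "Type" fact is dropped because its relation type has no weight, which stands in for the pruning step.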

Diversity of the algorithms

Each explores a different knowledge space
  The graph is explored along edges of a specific color

The recommendations of the individual algorithms differ

Need for aggregation
  Combines and prioritizes the results
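
Exploring a graph along edges of one color can be sketched as below. This is an illustration only: it assumes the colors correspond to edge types such as category, link, and Yago relations, which the slides do not state explicitly, and the edges in the toy graph are hypothetical assignments of node labels from the graph figure.

```python
from collections import defaultdict

class ColoredGraph:
    """Directed graph whose edges carry a color (an edge type)."""

    def __init__(self):
        self._out = defaultdict(set)  # (node, color) -> set of neighbor nodes

    def add_edge(self, src, dst, color):
        self._out[(src, color)].add(dst)

    def reachable(self, start, color, max_hops=2):
        """Nodes reachable from start within max_hops, following one color only."""
        frontier, seen = {start}, {start}
        for _ in range(max_hops):
            frontier = {n for node in frontier
                        for n in self._out.get((node, color), ())} - seen
            seen |= frontier
        return sorted(seen - {start})

# Hypothetical edges; the color names are assumptions.
g = ColoredGraph()
g.add_edge("Jaguar", "Felidae", "category")
g.add_edge("Felidae", "Mammal", "category")
g.add_edge("Jaguar", "Black Panther", "link")
g.add_edge("Jaguar Cars", "William Lyons", "yago")
```

Calling `g.reachable("Jaguar", "category")` and `g.reachable("Jaguar", "link")` returns different page sets from the same start node, which is the reason the individual recommendations differ and need aggregating.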

Aggregated Recommendations (AR)

Groups the recommendations by the topic each result belongs to

A link based approach

Algorithm
  Start with the search results and an aggregated list of CBR, LBR and YBR output (the Cumulative List, CL)
  Explore the neighborhood of each search result to find how many pages in CL are reachable
  Apply a threshold T on the nearness value to filter the related pages
  Represent each result page as a point in k-dimensional space (one dimension per page in CL)
  Run Agglomerative Nesting (AGNES, a hierarchical clustering algorithm) to obtain clusters of result pages
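
The aggregation steps can be sketched end to end as follows. The slides do not define the nearness value, the neighborhood depth, or the linkage used by AGNES, so this sketch assumes nearness = 1 / (1 + hop distance) within two hops and average linkage with Euclidean distance. All function names and the toy graph are hypothetical.

```python
from collections import deque
from itertools import combinations
from math import dist

def nearness(graph, source, target, max_hops=2):
    """Assumed measure: 1 / (1 + hops); 0.0 if target is beyond max_hops."""
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return 1.0 / (1 + hops)
        if hops < max_hops:
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
    return 0.0

def vectors(graph, results, cumulative_list, t=0.4):
    """One point per result page, one dimension per CL page; values below t become 0."""
    out = {}
    for r in results:
        row = [nearness(graph, r, c) for c in cumulative_list]
        out[r] = [v if v >= t else 0.0 for v in row]
    return out

def agnes(points, n_clusters):
    """Bottom-up clustering: merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in sorted(points)]

    def gap(a, b):  # average pairwise distance between two clusters
        return sum(dist(points[x], points[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: gap(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters

# Toy link graph; edges are illustrative only.
graph = {
    "Jaguar": ["Felidae"],
    "Jaguar Cars": ["Automaker", "William Lyons"],
    "Jaguar X-Type": ["Automaker", "Jaguar Cars"],
    "Atari Jaguar": ["Atari 7800"],
}
results = ["Jaguar", "Jaguar Cars", "Jaguar X-Type", "Atari Jaguar"]
cl = ["Felidae", "Automaker", "William Lyons", "Atari 7800"]
points = vectors(graph, results, cl)
clusters = agnes(points, n_clusters=3)
```

On this toy input the two car pages end up in one cluster, while the animal and the game console each stay in their own.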

Results: Evaluation

Mean Absolute Error (MAE)
  Used to evaluate the effectiveness of a recommendation system

  N: the total number of result pages
  K: the total number of recommendations
  r_ij: the actual relevance of a given recommendation
  [symbol missing]_ij: the relevance given by the system
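
The MAE formula itself is not reproduced in this transcript. A minimal sketch, assuming the standard definition (the absolute difference between actual and system-given relevance, averaged over every recommendation of every result page); the exact normalization used by the authors is an assumption, and the function name and toy values are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """actual[i][j] and predicted[i][j]: relevance of recommendation j for result page i.

    Assumption: the error is averaged over all (i, j) pairs.
    """
    total, count = 0.0, 0
    for row_actual, row_predicted in zip(actual, predicted):
        for a, p in zip(row_actual, row_predicted):
            total += abs(a - p)
            count += 1
    return total / count

# Toy relevance values for two result pages with two recommendations each.
actual = [[1.0, 0.0], [1.0, 1.0]]
predicted = [[1.0, 1.0], [0.5, 1.0]]
mae = mean_absolute_error(actual, predicted)
```

A lower value means the system's relevance scores sit closer to the actual relevance judgments.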

Results: Search Recommendations

A value of 0.4 for the threshold T balances fetching a moderate number of recommendations against keeping their quality good

Results: Search Recommendations

Keyword: jaguar

Result from Search Engine      | Kshitij Recommendations
-------------------------------+----------------------------------------------
Jaguar                         | Felidae, Animal, Big cat, Black panther
Jaguar Cars                    | Browns Lane Plant, Automaker, William Lyons
SEPECAT Jaguar                 | Flight altitude record, Flight airspeed record, Aircraft manufacturer, Aviation
HMS Jaguar (F34)               | HMS Kelvin (F37)
Atari Jaguar, Atari Jaguar CD  | Atari 7800, Atari Jaguar II
Jaguar X-Type, Jaguar XK       | Jaguar XJS, Car classification

Results: Search Recommendations

Keyword: amazon

Result from Search Engine                     | Kshitij Recommendations
----------------------------------------------+----------------------------------------------
Amazon.com                                    | Public company, Industry, NASDAQ
Amazon Rainforest, Amazon River, Amazon Basin | Brazil, Peru, Colombia, Bird, South America
Survivor, The Amazon                          | Survivor: All-Stars, Brazil, Survivor: Africa, Survivor: Pearl Islands
HMS Amazon, HMS Amazon (F169)                 | Royal Navy, HMS Alacrity (F174), HMS Ambuscade (F172)
Volvo Amazon                                  | Car classification, Automaker, Car body style

Results: Page Recommendations

Page            | Recommendations                                                                                  | MAE
----------------+--------------------------------------------------------------------------------------------------+-----
Hyderabad State | Kolhapur, Delhi Sultanate, List of Indian Princely States, British India                         | 0.17
DAX             | Stock market index, List of stock market indices, Germany, BMW, Allianz                          | 0.19
Godavari River  | Krishna River, Kaveri River, Beas River, Eastern Ghats, Ganges, Bay of Bengal, Chilka lake       | 0.2
Salzburg        | Vienna, Archbishopric of Salzburg, Augsburg, Austria                                             | 0.17
Horlicks        | Ovaltine, Hot chocolate, Nestle Milo, Maxim's, World War II, Malted milk, GlaxoSmithKline        | 0.18

Conclusion & Future Work

Good quality recommendations can be obtained from annotated knowledge bases using only semantic information

Exploit more Wikipedia structures
  Templates, References, Info-Boxes, History

Currently, the system calculates recommendations on demand
  Plan to develop a strategy that pre-calculates and stores recommendation sets

Discussion

Pros
  Presents a generic recommendation system that utilizes both stored and dynamically extracted semantics from Wikipedia
  Good examples

Cons
  The figures and tables are not placed in sequential order
  No comparison with other recommendation systems
    – However, the authors mention that there is no existing recommendation system with which they can directly compare theirs
