Information Retrieval in Context


Page 1: Information Retrieval in Context

04/22/23 Xuehua Shen @CS, UIUC 1

Information Retrieval in Context
Presenter: Xuehua Shen


Presentation Layout
• Problem Description
• Terminology
• Challenges
• IntelliZap System [WWW2001]
• Concerns


Problem
• Search engines have become a key source of information
  • 1998 [GVU WWW Study]: 85% of people use search engines to locate information
  • Now [Craig’s Talk]: 500 million searches on the Internet per day; 150 million searches at Google per day
• Efforts on coverage and relevance


Web Search Facts
• 3-5 billion web pages on the Web: huge and diverse information provided by the Web
• On average 1.7 words per query [Eric Brewer, CACM 09/2002]: little information provided by users
• Can a search engine retrieve web pages very well?


Context
• Context may provide extra information that helps improve the relevance of search results
• An example: searching "flowers" [DirectHit 1999]
  • Men typically want sites that let them send flowers
  • Women often want sites that let them order flower seeds or plants for gardening
• What context information is useful?


Terminology
• Ephemeral context: within a single search session
  • Category [Inquirus2], document being viewed [Watson], feedback
• Persistent context: accumulates over time and is used in subsequent sessions
  • User profile [My Yahoo!], query history and clickthrough data [Google]


Terminology cont.
• Personalization: the search engine uses context information to provide different search results for different users
• Customization: users manually configure their preferences


Challenges
• How to capture and store useful information?
• SearchPad [WWW2001]:
  • Server-proxy-client architecture
  • Users explicitly mark relevant pages
  • Any shortcomings? Better ways?


Challenges cont.
• There are many retrieval models and also many user models, but how do we merge them?
• Croft uses a language model to represent context
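The slide leaves open how a retrieval model and a context model should be merged. One common approach in the language-modeling framework is to interpolate a query language model with a context language model and rank documents by how well their smoothed models match the mixture. The sketch below assumes Jelinek-Mercer interpolation with a hypothetical weight `lam`, Dirichlet smoothing with `mu`, and negative cross-entropy scoring; it illustrates the general idea and is not Croft's exact formulation.

```python
import math
from collections import Counter


def mle(tokens: list[str]) -> dict[str, float]:
    """Maximum-likelihood unigram language model of a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(query_lm: dict[str, float], context_lm: dict[str, float],
                lam: float) -> dict[str, float]:
    """Jelinek-Mercer mix: lam * P(w|query) + (1 - lam) * P(w|context).

    `lam` is a hypothetical weight, not a value taken from the slides.
    """
    vocab = set(query_lm) | set(context_lm)
    return {w: lam * query_lm.get(w, 0.0) + (1 - lam) * context_lm.get(w, 0.0)
            for w in vocab}


def score(doc: list[str], model: dict[str, float],
          collection: dict[str, float], mu: float = 2000.0) -> float:
    """Negative cross-entropy of `model` against a Dirichlet-smoothed
    document model; higher means a better match."""
    counts = Counter(doc)
    n = len(doc)
    total = 0.0
    for w, p in model.items():
        # Tiny floor keeps words unseen in the collection from giving log(0).
        p_doc = (counts[w] + mu * collection.get(w, 1e-9)) / (n + mu)
        total += p * math.log(p_doc)
    return total


# Toy example: the query "flowers" plus a gardening context.
send = "send flowers delivery gift".split()
garden = "flowers seeds garden plants".split()
collection = mle(send + garden)
mixed = interpolate(mle(["flowers"]), mle("seeds garden plants".split()), lam=0.5)
ranked = sorted([("send", send), ("garden", garden)],
                key=lambda item: score(item[1], mixed, collection), reverse=True)
```

With `lam=1.0` the context is ignored and both toy documents tie on the single query word; lowering `lam` lets the gardening context break the tie, which is the effect the "flowers" example on the Context slide calls for.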


Challenges
• How to build such a system?
  • Architecture: server side or client side? User interface?
  • Server side: scalability, privacy
  • Client side: communicating context information to the server


Challenges
• How to evaluate such work?
• Metrics?
• The TREC HARD (High Accuracy Retrieval from Documents) track, added this year, leverages additional information about the searcher and/or the search context
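The slide asks which metrics to use. As an illustration only, two standard ranked-retrieval measures, precision at k and non-interpolated average precision, can be computed from a ranked result list and a set of documents judged relevant. The document IDs below are hypothetical.

```python
def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k


def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Sum of precision@i at each rank i holding a relevant document,
    divided by the number of relevant documents (missed ones count as 0)."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


# Hypothetical judged run: d1 and d2 are retrieved, d9 is never retrieved.
ranking = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2", "d9"}
p2 = precision_at_k(ranking, relevant, 2)
ap = average_precision(ranking, relevant)
```

Here `p2` is 0.5 and `ap` is (1/2 + 2/4) / 3 = 1/3; the unretrieved d9 pulls average precision down, which precision at a fixed cutoff does not capture.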


IntelliZap: General Description
• Assumption: a large fraction of searches originate while users are reading documents on their computers
• Standpoint: context is the body of words surrounding a user-selected phrase
• IntelliZap system: a meta search engine with context-based query augmentation, search engine selection, and result reranking


Walkthrough of IntelliZap



How to Use Context
• Augment the query before sending it to the search engines
• Rerank the results returned by the search engines


How to Collect the Right Amount of Context
• Don’t include the whole document, as the Watson system does
• Heuristic 1: establish the optimal context length as a function of the length of the text phrase and the individual word frequencies
• Heuristic 2: relative weighting of the text and the context in the augmented query
  • Emphasize the marked text phrase
  • Weight of a context word: a monotonic function of its proximity to the text
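Heuristic 2 can be illustrated with a small weighting function. The sketch assumes the marked phrase gets a fixed boost and that each context word's weight is 1 / (1 + d), where d is its token distance from the phrase, so weight rises with proximity. The boost value, the decay shape, and the fixed `window` are hypothetical: the slides only say the weight is a monotonic function of proximity, and Heuristic 1's length rule (based on phrase length and word frequencies) is not specified, so a plain window stands in for it here.

```python
def augmented_query_weights(tokens: list[str], start: int, end: int,
                            window: int = 25, phrase_boost: float = 3.0
                            ) -> dict[str, float]:
    """Weight terms for an augmented query.

    tokens[start:end] is the user-marked phrase. `window` and
    `phrase_boost` are hypothetical parameters, not values from IntelliZap.
    """
    weights: dict[str, float] = {}
    # Emphasize the marked phrase with a fixed (assumed) boost.
    for w in tokens[start:end]:
        weights[w] = weights.get(w, 0.0) + phrase_boost

    lo = max(0, start - window)
    hi = min(len(tokens), end + window)
    for i in range(lo, hi):
        if start <= i < end:
            continue  # phrase words are already weighted
        # Distance 1 means directly adjacent to the phrase.
        distance = start - i if i < start else i - end + 1
        w = tokens[i]
        # A repeated word keeps the weight of its closest occurrence.
        weights[w] = max(weights.get(w, 0.0), 1.0 / (1.0 + distance))
    return weights


tokens = "the garden needs new flowers and seeds this spring".split()
weights = augmented_query_weights(tokens, 4, 5, window=2)
```

For the marked word "flowers", the adjacent words "new" and "and" get 0.5, words two positions away get 1/3, and "garden" falls outside the two-word window. A real system would also drop stopwords such as "and" before building the query.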


Algorithm Overview


Step 0: Semantic Network
• Build the semantic network offline: a statistics-based semantic network
• A linear combination of a vector-based correlation metric and a WordNet-based metric


Semantic Network cont.
• Vector-based correlation metric:
  • 27 knowledge domains (computer, business, etc.)
  • 10,000 sample documents from the Internet
  • Each word: a 27-dimensional vector
  • Correlation is used to measure distance
• WordNet: captures semantic relations between words (hypernymy, hyponymy, meronymy, and holonymy)
• WordNet: http://www.cogsci.princeton.edu/~wn/
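The semantic network described on the two slides above can be sketched as follows: each word is a vector over the knowledge domains, the vector-based metric is the Pearson correlation of two such vectors (1 - r can serve as a distance), and the final score linearly combines it with a WordNet-based score. The mixing weight `alpha`, the toy 3-domain vectors (the slides use 27 domains), and the WordNet score passed in as a plain number are all assumptions; the slides give neither the weights nor the exact WordNet metric.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length domain vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector carries no domain preference
    return cov / (sx * sy)


def combined_similarity(vec1: list[float], vec2: list[float],
                        wordnet_score: float, alpha: float = 0.5) -> float:
    """alpha * correlation + (1 - alpha) * WordNet-based score.

    `alpha` is a hypothetical weight; `wordnet_score` stands in for
    whatever WordNet-based metric is used and is supplied by the caller.
    """
    return alpha * pearson(vec1, vec2) + (1 - alpha) * wordnet_score


# Toy domain vectors over (health, general, finance); hypothetical numbers.
doctor = [0.9, 0.1, 0.0]
nurse = [0.8, 0.2, 0.0]
stock = [0.0, 0.1, 0.9]
```

Words concentrated in the same domains correlate strongly ("doctor" and "nurse" give r of about 0.99), while words from different domains correlate negatively ("doctor" and "stock" give about -0.66).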


Step 1: Query Augmentation
• Extract keywords from the context surrounding the user-selected text, using the semantic network
• Typical context: about 50 words
• Use a clustering algorithm to construct several queries on different topics
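Step 1 can be sketched as: take roughly 50 words around the selection, score each context word by its semantic-network similarity to the selected text, keep the top keywords, and group them by topic so that each group yields one query. In this sketch the similarity function, `top_n`, the grouping `threshold`, and the greedy single-pass grouping are all assumptions. The grouping is a simple stand-in for IntelliZap's own clustering algorithm, which the backup slides describe only loosely. The toy topic labels are hypothetical.

```python
from typing import Callable

# Semantic-network similarity between two words, assumed to lie in [0, 1].
Similarity = Callable[[str, str], float]


def context_window(tokens: list[str], start: int, end: int,
                   size: int = 50) -> list[str]:
    """About `size` words around tokens[start:end], excluding the selection."""
    half = size // 2
    return tokens[max(0, start - half):start] + tokens[end:end + half]


def augment_queries(selection: list[str], context: list[str], sim: Similarity,
                    top_n: int = 6, threshold: float = 0.5) -> list[list[str]]:
    """Return one augmented query (selection + keywords) per topic group."""
    def relevance(word: str) -> float:
        # Strongest link between a context word and any selected word.
        return max(sim(word, s) for s in selection)

    # Rank candidates; ties are broken alphabetically for determinism.
    candidates = sorted(set(context) - set(selection),
                        key=lambda w: (-relevance(w), w))
    keywords = [w for w in candidates[:top_n] if relevance(w) > 0]

    # Greedy grouping: a keyword joins the first group whose seed word it
    # is similar enough to; otherwise it seeds a new group.
    clusters: list[list[str]] = []
    for kw in keywords:
        for cluster in clusters:
            if sim(kw, cluster[0]) >= threshold:
                cluster.append(kw)
                break
        else:
            clusters.append([kw])
    return [selection + cluster for cluster in clusters]


# Toy similarity: Jaccard overlap of hand-assigned topic labels (hypothetical).
TOPICS = {"jaguar": {"auto", "animal"}, "car": {"auto"}, "engine": {"auto"},
          "cat": {"animal"}, "jungle": {"animal"}}


def toy_sim(a: str, b: str) -> float:
    ta, tb = TOPICS.get(a, set()), TOPICS.get(b, set())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


text = "the new jaguar car has a big engine unlike the cat in the jungle".split()
i = text.index("jaguar")
queries = augment_queries(["jaguar"], context_window(text, i, i + 1), toy_sim, top_n=4)
```

The ambiguous selection "jaguar" yields two queries, one about cars and one about animals, mirroring the slide's idea of building several queries on different topics.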


Step 2: Search Engine Selection
• IntelliZap is a meta search engine
• Several general search engines (such as Google, AltaVista)
• For several domains, specific search engines (such as WebMD, FindLaw) are assigned a priori
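Step 2 can be pictured as a lookup: every query goes to the general engines, and a query whose topic falls in a covered domain also goes to the specialty engines assigned to that domain in advance. The domain labels and the mapping below are hypothetical. The slides name the engines but not how a query's domain is detected, so the detected domains are taken as input here.

```python
GENERAL_ENGINES = ["Google", "AltaVista"]

# Hypothetical a priori assignment of specialty engines to domains.
DOMAIN_ENGINES = {
    "health": ["WebMD"],
    "law": ["FindLaw"],
}


def select_engines(domains: set[str]) -> list[str]:
    """General engines plus the specialty engines of each detected domain."""
    selected = list(GENERAL_ENGINES)
    for domain in sorted(domains):  # sorted for a stable engine order
        for engine in DOMAIN_ENGINES.get(domain, []):
            if engine not in selected:
                selected.append(engine)
    return selected
```

Domains without an assigned engine (for example a hypothetical "sports" domain) simply fall back to the general engines.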


Step 3: Results Reranking
• Several search engines return several lists of results
• Use the semantic network to calculate the distance between result titles/summaries and the text/context
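Step 3 merges the per-engine result lists and reorders them by closeness to the user's text and context. As a stand-in for the semantic network, the sketch below uses cosine similarity between bag-of-words vectors; that substitution, the `context_weight` parameter, the `Result` record, and the sample results are assumptions for illustration. A URL returned by several engines is kept once.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    title: str
    summary: str


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rerank(result_lists: list[list[Result]], text: str, context: str,
           context_weight: float = 0.5) -> list[Result]:
    """Merge engine result lists and sort by similarity to text and context."""
    text_vec = Counter(text.lower().split())
    ctx_vec = Counter(context.lower().split())
    merged: dict[str, Result] = {}
    for results in result_lists:
        for r in results:
            merged.setdefault(r.url, r)  # keep the first copy of each URL

    def score(r: Result) -> float:
        doc = Counter(f"{r.title} {r.summary}".lower().split())
        return cosine(doc, text_vec) + context_weight * cosine(doc, ctx_vec)

    return sorted(merged.values(), key=score, reverse=True)


engine_a = [Result("a.com/send", "Send flowers today", "Flower delivery and gifts"),
            Result("x.com/seeds", "Flower seeds", "Buy seeds for your garden")]
engine_b = [Result("x.com/seeds", "Flower seeds", "Buy seeds for your garden"),
            Result("b.com/bulbs", "Garden bulbs", "Plants and bulbs for spring gardening")]
ranked = rerank([engine_a, engine_b], "flower", "garden seeds plants")
```

With the gardening context, the seed page moves ahead of the delivery page even though both match the selected word "flower".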


Evaluation Method
• State of the art: lacks a benchmark
• Subjects are recruited by an external agency
• Subjects don’t know the objective of the experiments; they are just asked to search and evaluate the results


Experiment Results


Experiment Results cont.


Concerns
• Privacy and security
  • My Yahoo! holds a database of information on millions of users
  • Users can be monitored through the queries they send!
• Relevance consistency
• Communication problem


End

Thank you!


Backup Slides


Web Statistics
• Accessibility of Information on the Web, Steve Lawrence, Nature 1999


Semantic Relations
• Hypernymy: the semantic relation of being superordinate or belonging to a higher rank or class. Synonym: superordination
• Hyponymy: the semantic relation of being subordinate or belonging to a lower rank or class. Synonym: subordination
• Meronymy: the semantic relation that holds between a part and the whole. Synonym: part-to-whole relation
• Holonymy: the semantic relation that holds between a whole and its parts. Synonym: whole-to-part relation
More at http://dictionary.metor.com/wnet/


Clustering Algorithm
• Traditional clustering algorithms don’t work because of the large amount of noise and the small amount of information available: 50 context words represented in a 27-dimensional space
• A special clustering algorithm: high-dimensional clustering
  • Perform recurrent clustering analysis (averaging over iterations)
  • Refine the results statistically
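The slide describes recurrent clustering only at a high level. One common way to average clustering over iterations is consensus (co-association) clustering: run a randomized clusterer many times, count how often each pair of points lands in the same cluster, and link the pairs that are together in most runs. The sketch below applies that technique with a small k-means. The number of runs, `k`, the 0.5 agreement threshold, and the 2-D toy points (the slide's setting would be 50 word vectors in 27 dimensions) are assumptions, and this is not necessarily the algorithm IntelliZap used.

```python
import random


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points: list[list[float]], k: int, rng: random.Random,
                  iters: int = 20) -> list[int]:
    """One randomly initialized run of Lloyd's k-means; returns labels."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_clusters(points: list[list[float]], k: int, runs: int = 30,
                       agree: float = 0.5, seed: int = 0) -> list[list[int]]:
    """Average many k-means runs through a co-association matrix, then link
    pairs that share a cluster in more than `agree` of the runs."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    for _ in range(runs):
        labels = kmeans_labels(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    together[i][j] += 1

    # Union-find over the "usually together" graph gives the final groups.
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if together[i][j] / runs > agree:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
groups = consensus_clusters(points, k=2)
```

Averaging over runs makes the final grouping less sensitive to any single unlucky initialization, which is the kind of robustness a noisy 50-word context needs.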


Limitations of the Web
• Freshness
• Coverage (only the publicly indexable web)
• Bias (sites are not indexed equally)


Several Systems (1)
• Inquirus2: meta search engine
• Watson Project (Jay Budzik, NWU): uses the contents of full documents being edited in MS Word or viewed in Internet Explorer
• Remembrance Agent (Bradley Rhodes, MIT): a software agent for just-in-time information retrieval


Several Systems (2)
• Outride (renamed in 2001), formerly GroupFire (spun off from Xerox PARC in 2000)


Reference
[1] Graphics, Visualization and Usability Center, GVU’s 10th WWW User Survey, 1998