Survey on Long Queries in Keyword Search: Phrase-based IR
Sungchan Park
2008. 08. 07


Copyright 2008 by CEBT

Survey So Far…

Jaehui
– Term Proximity Scoring

Jung-Yeon
– Semantic Query

Jongheum
– Index Structure Optimized for Multi-keyword Query

My Topic: Phrase-based IR

Why?

The presence of phrases is one significant difference between single-word queries and multi-word queries.

Identifying phrases is also important for understanding the real meaning of a sentence.
– Ex) “hot dog” (the phrase names a food, which neither word conveys on its own)

Thus, how to identify and use phrases in queries is important in devising a processing strategy for multi-word queries.

Focus of Survey

Using phrases (judging relevance)
– Work on identifying phrases is skipped

Early Research on Phrase-based IR

Using fixed proximity constraints (window size)
– “The Use of Phrase and Structural Queries in Information Retrieval” (1991)
– “Evaluation of Syntactic Phrase Indexing” (1996)
– …

[Figure: the query phrase “word#1 word#2 word#3” and a relevant document in which word#1, word#2, and word#3 all fall inside one window]
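The fixed-window criterion can be sketched as follows. This is a minimal illustration, not code from the cited papers: the function name `phrase_in_window`, the whitespace tokenization, and the convention that the window is counted in tokens from the first to the last matched word (inclusive, word order ignored) are all assumptions.

```python
from itertools import product


def phrase_in_window(doc_tokens, phrase_words, window):
    """Return True if every phrase word occurs inside some span of at most
    `window` consecutive tokens (assumed convention; word order is ignored)."""
    lowered = [t.lower() for t in doc_tokens]
    # Positions at which each phrase word occurs in the document.
    position_lists = [
        [i for i, t in enumerate(lowered) if t == w.lower()]
        for w in phrase_words
    ]
    if any(not positions for positions in position_lists):
        return False  # at least one phrase word never occurs
    # Try every way of picking one occurrence per phrase word.
    return any(
        max(choice) - min(choice) + 1 <= window
        for choice in product(*position_lists)
    )


# Hypothetical document: the three words span 7 tokens in total.
doc = "word#1 appears here and word#2 then word#3".split()
phrase = ["word#1", "word#2", "word#3"]
print(phrase_in_window(doc, phrase, window=8))
print(phrase_in_window(doc, phrase, window=5))
```

Enumerating every combination of occurrences is only reasonable for short phrases and short documents; a real system would work on a positional index instead.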

Progress #1: Structural Proximity

“Phrase-based Information Retrieval”, A.T. Arampatzis et al., 1998
– Identifies noun phrases in documents and uses the noun phrases as the criterion of “nearness”

[Figure: the query phrase “radio programs BBC” and a relevant document whose text (“The studios for later BBC …”) contains “BBC”, “radio”, and “programs” inside a noun phrase identified by an NLP engine]
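A rough sketch of using noun phrases as the unit of nearness is shown below. It is not the method of the cited paper: it assumes the noun-phrase boundaries have already been produced by some NLP engine and are passed in as `(start, end)` token spans, and the function name `matches_within_noun_phrase` and the sample sentences are made up for illustration.

```python
def matches_within_noun_phrase(doc_tokens, np_spans, query_words):
    """Return True if all query words occur inside one noun phrase.

    `np_spans` is a list of (start, end) token index pairs, end exclusive,
    assumed to come from an external noun-phrase chunker.
    """
    wanted = {w.lower() for w in query_words}
    for start, end in np_spans:
        span_words = {t.lower() for t in doc_tokens[start:end]}
        if wanted <= span_words:  # every query word is in this noun phrase
            return True
    return False


query = ["radio", "programs", "BBC"]

# Hypothetical document 1: one noun phrase covers tokens 0..3.
doc_a = ["the", "BBC", "radio", "programs", "were", "popular"]
print(matches_within_noun_phrase(doc_a, [(0, 4)], query))

# Hypothetical document 2: the query words are split over two noun phrases.
doc_b = ["BBC", "staff", "listen", "to", "radio", "programs"]
print(matches_within_noun_phrase(doc_b, [(0, 2), (4, 6)], query))
```

The second document contains every query word, yet it is rejected because the words do not share a noun phrase. This strictness is the kind of behavior that favors precision over recall.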

Progress #1: Structural Proximity, Experiment

Experiment Result

Gained high precision, but lost recall
– The authors wrote that the loss can be addressed by taking linguistic variation and anaphora into account.

Progress #2: Varied Window Size

“An Effective Approach to Document Retrieval via Utilizing WordNet and Recognizing Phrases”, Shuang Liu et al., 2004
– Their follow-up work was published in 2007

Classifying phrases into four types
– Proper name
– Dictionary phrase
– Simple phrase
– Complex phrase
– The proximity constraint is different for each type!

Progress #2: Varied Window Size, Example

Query Phrase #1: “Sungchan Park”
– NOT relevant document: “Sungchan” and “Park” appear separately

Query Phrase #2: “mental illness”
– Relevant document: “Recently … was hospitalized for mental problem … and had been on lithium for his illness”
– “mental” and “illness” appear separately

Progress #2: Varied Window Size, Solution

Solution

Learning the window size for each phrase type
– Result by decision tree
– Proper name: 0
– Dictionary phrase: 16
– Simple phrase: 48
– Complex phrase: 78
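A per-type window check could look roughly like the sketch below. The four sizes are the ones listed above; everything else is an assumption made for illustration, not taken from the paper. In particular, the window is read as the maximum number of extra tokens allowed between the phrase words (so 0 means the words must be adjacent), and the names `WINDOW_BY_TYPE` and `phrase_matches` are hypothetical.

```python
from itertools import product

# Window sizes from the slide, keyed by phrase type.
WINDOW_BY_TYPE = {
    "proper_name": 0,
    "dictionary_phrase": 16,
    "simple_phrase": 48,
    "complex_phrase": 78,
}


def phrase_matches(doc_tokens, phrase_words, phrase_type):
    """Return True if the phrase words occur with at most the allowed number
    of extra tokens between them (assumed reading of "window size")."""
    max_extra = WINDOW_BY_TYPE[phrase_type]
    lowered = [t.lower() for t in doc_tokens]
    position_lists = [
        [i for i, t in enumerate(lowered) if t == w.lower()]
        for w in phrase_words
    ]
    if any(not positions for positions in position_lists):
        return False  # at least one phrase word never occurs
    for choice in product(*position_lists):
        if len(set(choice)) < len(choice):
            continue  # the same token may not serve two phrase words
        extra = (max(choice) - min(choice) + 1) - len(choice)
        if extra <= max_extra:
            return True
    return False


# Document text taken from the slide example, with the elisions dropped.
doc = ("was hospitalized for mental problem and had been "
       "on lithium for his illness").split()
print(phrase_matches(doc, ["mental", "illness"], "simple_phrase"))

# Hypothetical sentence in which the two name parts are not adjacent.
name_doc = "Sungchan met Mr Park yesterday".split()
print(phrase_matches(name_doc, ["Sungchan", "Park"], "proper_name"))
```

Under this reading, a loose phrase such as “mental illness” still matches when its words are several tokens apart, while a proper name matches only when its parts are adjacent.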

Progress #2: Varied Window Size, Experiment

Experiment Result

The authors did not compare their approach with a naïve approach.

From the viewpoint of this survey, the reported result only shows that phrase-based IR can improve the performance of an IR system.

Conclusion

Phrase-based relevance models have been studied by only a few researchers.

However, the progress so far is interesting
– Determining nearness via sentence structure
– Varying proximity constraints according to the type of query phrase

References

The Use of Phrase and Structural Queries in Information Retrieval, 1991
Evaluation of Syntactic Phrase Indexing, 1996
Phrase-based Information Retrieval, 1998
Phrase Recognition and Expansion for Short, Precision-biased Queries based on a Query log, 1999
The Use of Phrases from Query Texts in Information Retrieval, 2000
An Effective Approach to Document Retrieval via Utilizing WordNet and Recognizing Phrases, 2004
The Role of Multi-word Units in Interactive Information Retrieval, 2005
Recognition and Classification of Noun Phrases in Queries for Effective Retrieval, 2007
