Survey on Long Queries in Keyword Search: Phrase-based IR
Sungchan Park
2008. 08. 07.
Copyright 2008 by CEBT
Survey So Far…
Jaehui
– Term Proximity Scoring
Jung-Yeon
– Semantic Query
Jongheum
– Index Structure Optimized for Multi-keyword Query
My Topic: Phrase-based IR
Why?
The presence of phrases is one significant difference between single-word queries and multi-word queries.
Identifying phrases is important for understanding the real meaning of a sentence.
– Ex) “hot dog”
Thus, how to identify and use phrases in queries is important when devising a processing strategy for multi-word queries.
Focus of Survey
Using phrases (judging relevance)
– Skipped the content about identifying phrases
Early Research on Phrase-based IR
Using fixed proximity constraints (window size)
– “The Use of Phrase and Structural Queries in Information Retrieval” (1991)
– “Evaluation of Syntactic Phrase Indexing” (1996)
– …
[Figure: a query phrase (word#1 word#2 word#3) and a relevant document in which word#1, word#2, and word#3 all occur inside one window]
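The fixed-window constraint can be sketched in a few lines of Python. This is only an illustration of the general idea, not the method of either cited paper: the function name `phrase_in_window`, whitespace tokenization, exact string matching, and the reading of "window" as a span of at most `window` tokens are all assumptions made here.

```python
from itertools import product


def phrase_in_window(query_terms, doc_tokens, window):
    """Return True if every query term occurs inside one span of at most
    `window` consecutive tokens of the document (word order is ignored)."""
    positions = []
    for term in query_terms:
        hits = [i for i, tok in enumerate(doc_tokens) if tok == term]
        if not hits:
            return False  # a query term is missing from the document
        positions.append(hits)
    # Try one position per term; brute force is fine for short queries.
    # A span from min to max covers (max - min + 1) tokens.
    return any(max(c) - min(c) + 1 <= window for c in product(*positions))
```

For the query phrase “hot dog”, a document containing “a hot dog stand” passes with a window of 2, while “hot weather made the dog tired” needs a window of at least 5.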
Progress #1: Structural Proximity
“Phrase-based Information Retrieval”, A. T. Arampatzis et al., 1998
Identify noun phrases in documents and use them as the criterion of “nearness”
[Figure: the query phrase “radio programs BBC” and a relevant document (fragments: “The studios for later BBC”, “on”, “radio”, “programs”) in which the query words fall inside a noun phrase identified by an NLP engine]
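A minimal sketch of structural proximity, assuming the noun phrases have already been found: query words count as "near" each other when they all fall inside the same noun phrase. The function name `terms_share_noun_phrase` and the `(start, end)` span format are hypothetical, and a real system would obtain the spans from an NLP chunker rather than by hand. The slides do not describe how the authors score a match, so this only shows the yes/no test.

```python
def terms_share_noun_phrase(query_terms, doc_tokens, noun_phrase_spans):
    """Return True if a single noun phrase contains every query term.

    noun_phrase_spans holds (start, end) token indices, end exclusive.
    The spans are assumed to come from an external NLP engine.
    """
    wanted = {term.lower() for term in query_terms}
    for start, end in noun_phrase_spans:
        inside = {tok.lower() for tok in doc_tokens[start:end]}
        if wanted <= inside:  # all query terms lie inside this noun phrase
            return True
    return False


# Invented sentence and hand-written spans, for illustration only.
tokens = ["the", "studios", "for", "BBC", "radio", "programs", "were", "rebuilt"]
one_phrase = [(0, 2), (3, 6)]           # "the studios", "BBC radio programs"
split_phrases = [(0, 2), (3, 4), (4, 6)]  # "BBC" and "radio programs" apart
```

With `one_phrase` the query “radio programs BBC” matches regardless of word order; with `split_phrases` it does not, even though the words are adjacent in the text.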
Progress #1: Structural Proximity, Experiment
Experiment Result
Gained high precision
But lost recall
– The authors wrote that this can be addressed by taking linguistic variation and anaphora into account.
Progress #2: Varied Window Size
“An Effective Approach to Document Retrieval via Utilizing Wordnet and Recognizing Phrases”, Shuang Liu et al., 2004
– Their subsequent work was published in 2007
Classifying phrases into four types
– Proper name
– Dictionary phrase
– Simple phrase
– Complex phrase
– The proximity constraint differs for each type!
Progress #2: Varied Window Size, Example
Query Phrase #1: “Sungchan Park”
– NOT relevant document: “Sungchan” and “Park” occur apart from each other
Query Phrase #2: “mental illness”
– Relevant document: “… was hospitalized for mental problem … and had been on lithium for his illness. Recently …”
Progress #2: Varied Window Size, Solution
Solution
Learning the window size for each phrase type
– Result by decision tree
Proper name: 0
Dictionary phrase: 16
Simple phrase: 48
Complex phrase: 78
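The learned window sizes can be plugged into a proximity check that depends on the phrase type. The sketch below is not the authors' implementation. It assumes that a window size counts the extra tokens allowed between the query words, so that 0 means the words must be adjacent; the slide does not define the unit. The names `WINDOW_BY_TYPE` and `phrase_matches` are hypothetical, and word order is ignored.

```python
from itertools import product

# Window sizes from the slide. Interpreting them as "number of extra
# tokens allowed between the query words" is an assumption.
WINDOW_BY_TYPE = {
    "proper_name": 0,
    "dictionary": 16,
    "simple": 48,
    "complex": 78,
}


def phrase_matches(query_terms, doc_tokens, phrase_type):
    """Return True if all query terms occur close enough together for the
    given phrase type (order of the words is not checked)."""
    allowed_gap = WINDOW_BY_TYPE[phrase_type]
    positions = []
    for term in query_terms:
        hits = [i for i, tok in enumerate(doc_tokens) if tok == term]
        if not hits:
            return False  # a query term is missing from the document
        positions.append(hits)
    own_length = len(query_terms) - 1  # span of the phrase with no gaps
    return any(
        max(c) - min(c) - own_length <= allowed_gap
        for c in product(*positions)
    )
```

Under this reading, “Sungchan Park” treated as a proper name matches only when the two words are adjacent, while “mental illness”, treated as a dictionary phrase purely for illustration, still matches a text where “mental” and “illness” are several words apart.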
Progress #2: Varied Window Size, Experiment
Experiment Result
The authors did not compare their approach with a naïve approach.
In my view, the reported result only shows that phrase-based IR can improve the performance of an IR system.
Conclusion
Phrase-based relevance models have been studied by only a few researchers
However, the progress so far is interesting
– Determining nearness via sentence structure
– Varying proximity constraints according to the type of query phrase
References
The Use of Phrase and Structural Queries in Information Retrieval, 1991
Evaluation of Syntactic Phrase Indexing, 1996
Phrase-based Information Retrieval, 1998
Phrase Recognition and Expansion for Short, Precision-biased Queries based on a Query log, 1999
The Use of Phrases from Query Texts in Information Retrieval, 2000
An Effective Approach to Document Retrieval via Utilizing Wordnet and Recognizing Phrases, 2004
The Role of Multi-word Units in Interactive Information Retrieval, 2005
Recognition and Classification of Noun Phrases in Queries for Effective Retrieval, 2007