Exploring Sentence Level Query Expansion in Language Modeling Based Information Retrieval
Debasis Ganguly, Johannes Leveling, Gareth Jones
Outline
Standard blind relevance feedback
Sentence based query expansion
Does it fit into LM?
Evaluation on FIRE Bengali and English ad-hoc topics
Comparison with term based query expansion
Conclusions
Standard Blind Relevance Feedback (BRF)
Assume the top R documents from the initial retrieval are relevant.
Extract feedback terms from these documents:
Choose terms occurring in the largest number of pseudo-relevant documents (e.g. VSM)
Choose terms with the highest RSV scores (e.g. BM25)
Choose terms with the highest LM scores (e.g. LM)
Expand the query with these terms and perform the final retrieval
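The term-selection step above (the VSM-style variant: pick terms occurring in the most pseudo-relevant documents) can be sketched as follows; the function and variable names are illustrative, not from the talk:

```python
from collections import Counter

def brf_expand(query_terms, ranked_docs, R=10, T=10):
    """Standard blind relevance feedback: treat the top R documents as
    relevant and add the T terms occurring in the most of them.
    Sketch only; docs are token lists."""
    pseudo_relevant = ranked_docs[:R]
    # Document frequency within the pseudo-relevant set, excluding
    # terms already in the query.
    df = Counter()
    for doc in pseudo_relevant:
        df.update(set(doc) - set(query_terms))
    feedback_terms = [t for t, _ in df.most_common(T)]
    return query_terms + feedback_terms
```
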
What standard BRF assumes (wrongly)
The whole document is relevant
All R feedback documents are equally relevant
[Diagram: query and expansion terms t1, t2]
Ideal scenario
The whole document is relevant.
[Diagram: query and expansion terms t1, t2]
Restrict the choice of feedback terms to the relevant segments of the documents
Can we get closer to the ideal?
Extract the sentences most similar to the query, assuming these constitute the relevant text chunks.
(The truly relevant segments cannot be known exactly, so this is an approximation.)
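Sentence selection by query similarity can be sketched as below; plain term overlap is used as the similarity measure for illustration, and may differ from the measure actually used in the talk:

```python
def top_sentences(query_terms, sentences, m=2):
    """Rank a document's sentences by term overlap with the query and
    keep the top m as the assumed relevant segment (illustrative sketch)."""
    q = set(query_terms)
    # Higher overlap with the query = assumed more relevant.
    scored = sorted(sentences,
                    key=lambda s: len(q & set(s.split())),
                    reverse=True)
    return scored[:m]
```
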
Query
Sentence selection using rank
Vary the number of sentences added from a document with its retrieval rank: higher-ranked documents contribute more sentences.
Not all documents are equally relevant
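The talk does not give the exact formula for the rank-dependent sentence count; one plausible reading of "proportional to rank" is a linear decay from m sentences for the top-ranked document down to 1 for the last pseudo-relevant document:

```python
def sentences_for_rank(rank, R, m):
    """Number of sentences to take from the document at a given rank
    (rank 1 = top). Linear decay from m down to 1 is an assumption;
    the talk only states that the count depends on the rank."""
    return max(1, round(m * (R - rank + 1) / R))
```
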
In short
Documents are often composed of a few main topics and a series of short, sometimes densely discussed subtopics.
Feedback terms chosen from a whole document might introduce a topic shift.
Good expansion terms might exist in a particular subtopic.
Terms with close proximity to the query terms might be useful for feedback.
Does this fit into LM?
Noisy channel view: each document D1, D2, …, Dn can generate the query.
Add a part of D1 to Q; add a part of D2 to Q.
As a result, the expanded query Qexp starts to look like D1 and D2, which increases the likelihood of these documents generating Qexp.
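The generation likelihood can be sketched with a standard query-likelihood score; Jelinek-Mercer smoothing and lambda = 0.7 are illustrative choices here, not settings reported in the talk:

```python
import math
from collections import Counter

def lm_score(query_terms, doc_terms, collection_tf, collection_len, lam=0.7):
    """log P(Q|D) under a query-likelihood language model with
    Jelinek-Mercer smoothing (illustrative sketch)."""
    tf = Counter(doc_terms)
    dlen = len(doc_terms)
    score = 0.0
    for t in query_terms:
        p_doc = tf[t] / dlen if dlen else 0.0          # P(t|D)
        p_coll = collection_tf.get(t, 0) / collection_len  # P(t|C)
        # Small epsilon guards against log(0) for unseen terms.
        score += math.log(lam * p_doc + (1 - lam) * p_coll + 1e-12)
    return score
```

A document sharing terms with the (expanded) query receives a higher generation score, which is the effect described above.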
Tools
The FIRE collection comprises newspaper articles from different genres (sports, business, etc.) in several Indian languages.
The MorphAdorner package was used for sentence demarcation.
Stopword lists:
Standard SMART stopword list for English
Default stopword list provided by the FIRE organizers for Bengali
Stemmers:
A rule-based stemmer for Bengali
Porter's stemmer for English
The LM implementation in SMART was used for indexing and retrieval.
Setup
The baseline is standard BRF using terms occurring in the largest number of pseudo-relevant documents.
Two variants of sentence-based expansion were tried:
BRFcns: a constant number of sentences from each document
BRFvns: a variable number of sentences (dependent on retrieval rank)
Parameter Settings
R: number of documents assumed to be relevant, varied in [10, 40]
T: number of terms to add, varied in [10, 40]
m: number of sentences to add from the top-ranked document, varied in [2, 10]
Best MAPs
BRF
Topics    R   T   MAP
EN-2008  10  10  0.5682
EN-2010  10  30  0.4953
BN-2008  20  40  0.3885
BN-2010  10  30  0.4537

BRFcns
Topics    R   m   MAP
EN-2008  30   5  0.5964
EN-2010  20   4  0.5032
BN-2008  20   4  0.4226
BN-2010  10   5  0.4467

BRFvns
Topics    R   m   MAP
EN-2008  30  10  0.6015
EN-2010  20   8  0.5102
BN-2008  30  10  0.4302
BN-2010  10   8  0.4581
Query drift analysis
Adding too many expansion terms can drift the query away from the original information need.
Drift is measured through per-query changes in precision.
An easy query is one for which P@20 of the initial retrieval is high.
Queries are categorized into groups by initial-retrieval P@20.
A good feedback algorithm improves many (ideally hard) queries and hurts the performance of only a few (ideally easy) queries.
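The grouping and counting described above can be sketched as follows; the 0.5 easy/hard threshold and the names are illustrative assumptions, not values from the talk:

```python
def drift_summary(p20_initial, p20_feedback, easy_threshold=0.5):
    """Group queries by initial P@20 ('easy' vs 'hard') and count how
    many in each group feedback improved or hurt. Sketch only."""
    summary = {"easy": {"improved": 0, "hurt": 0},
               "hard": {"improved": 0, "hurt": 0}}
    for qid, before in p20_initial.items():
        after = p20_feedback[qid]
        group = "easy" if before >= easy_threshold else "hard"
        if after > before:
            summary[group]["improved"] += 1
        elif after < before:
            summary[group]["hurt"] += 1
    return summary
```
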
[Charts: per-query precision changes for BRF, BRFcns, and BRFvns]
Comparison to True Relevance Feedback
The best possible average precision is obtained with true relevance feedback (TRF).
A BRF method should come as close as possible to this oracle.
Topic  |TRF|  o(|TBRF|)  o(|Tvns|)
EN08    937      743        912
EN10    433      407        432
BN08    979      744        955
BN10    991      728        933
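Interpreting o(·) in the table as the size of the intersection between a method's feedback term set and the TRF term set (an assumption about the notation), the overlap can be computed as:

```python
def trf_overlap(trf_terms, method_terms):
    """Overlap between the feedback terms chosen by a BRF method and
    the terms chosen by true relevance feedback (the oracle)."""
    return len(set(trf_terms) & set(method_terms))
```
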
Conclusions
The new approach improves over standard BRF by:
using sentences instead of whole documents
distinguishing between documents by their degree of pseudo-relevance
It significantly improves MAP over standard BRF on four ad-hoc topic sets in two languages.
It adds more truly relevant terms than standard BRF.
Queries?