Discovering Key Concepts in Verbose Queries
Michael Bendersky and W. Bruce Croft
University of Massachusetts
SIGIR 2008
Objective
• “Discovering Key Concepts in Verbose Queries”
• <num> Number 829
<title> Spanish Civil War support
<desc> Provide information on all kinds of material international support provided to either side in the Spanish Civil War
Objective
• “Discovering Key Concepts in Verbose Queries”
• Use of key concepts?
• Combine with current IR model
Retrieval Model
• Conventional Language Model:
$\mathrm{score}(q,d) = p(q \mid d) = \frac{p(q,d)}{p(d)}$
• New Model:
$\mathrm{score}(q,d) = p(q \mid d) = \frac{p(q,d)}{p(d)} = \frac{\sum_{c_i} p(q,d,c_i)}{p(d)}$
Final Retrieval Function
$\mathrm{score}(q,d) = \lambda\, p(q \mid d) + (1-\lambda) \sum_{c_i} p(c_i \mid q)\, p(c_i \mid d)$
• Language Model: the $\lambda\, p(q \mid d)$ term
• Key Concepts: the $(1-\lambda) \sum_{c_i} p(c_i \mid q)\, p(c_i \mid d)$ term
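The final retrieval function can be read as an interpolation between the usual query-likelihood score p(q|d) and a sum over concepts weighted by p(ci|q). A minimal sketch follows; the names `score`, `p_q_given_d`, `concept_weights`, `concept_doc_probs`, and `lam` are hypothetical, and all probabilities are made-up numbers rather than values from the talk. How each probability is estimated is left out.

```python
def score(p_q_given_d, concept_weights, concept_doc_probs, lam):
    """Interpolate the language-model score with the key-concept score.

    p_q_given_d       -- p(q|d), the conventional query-likelihood score
    concept_weights   -- dict: concept -> p(c_i|q)
    concept_doc_probs -- dict: concept -> p(c_i|d)
    lam               -- mixing weight on the language-model term
    """
    concept_term = sum(
        weight * concept_doc_probs.get(concept, 0.0)
        for concept, weight in concept_weights.items()
    )
    return lam * p_q_given_d + (1.0 - lam) * concept_term


# Made-up numbers for one query/document pair (illustration only).
weights = {"spanish civil war": 0.6, "material international support": 0.4}
doc_probs = {"spanish civil war": 0.02, "material international support": 0.01}
example = score(0.05, weights, doc_probs, lam=0.8)
```

With `lam=1.0` the function falls back to the conventional language model, which makes it easy to compare the two settings on the same data.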
What is a Concept?
• Noun phrase in a query
• <num> Number 829
<title> Spanish Civil War support
<desc> Provide information on all kinds of material international support provided to either side in the Spanish Civil War
Finding ‘Key’ Concepts
• Rank concepts by p(ci|q)
• Compute p(ci|q) by frequency?
• <num> Number 829
<title> Spanish Civil War support
<desc> Provide information on all kinds of material international support provided to either side in the Spanish Civil War
Finding ‘Key’ Concepts
• Approximate p(ci|q) by machine learning
• h(ci) is ci’s query-independent importance score
• $p(c_i \mid q) = h(c_i) / \sum_{c_i \in q} h(c_i)$
• $c_i$ → AdaBoost.M1 → $h(c_i)$
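The normalization step can be sketched as follows, assuming the classifier has already produced a score h(ci) for every concept in the query. The function name `concept_probabilities` and the scores are hypothetical; the classifier itself is not shown.

```python
def concept_probabilities(h_scores):
    """Turn query-independent scores h(c_i) into p(c_i|q) by normalizing
    over the concepts that occur in one query."""
    total = sum(h_scores.values())
    if total <= 0:
        raise ValueError("at least one concept needs a positive score")
    return {concept: h / total for concept, h in h_scores.items()}


# Hypothetical classifier outputs for the concepts of one query.
h = {
    "spanish civil war": 0.9,
    "material international support": 0.45,
    "either side": 0.15,
}
p = concept_probabilities(h)
# Concepts ordered from most to least important for this query.
ranked = sorted(p, key=p.get, reverse=True)
```

Because every score is divided by the same total, the ranking by p(ci|q) is the same as the ranking by h(ci); the normalization only matters when the values are used as weights.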
Features of a Concept
• is_cap : is capitalized
• tf : term frequency in corpus
• idf : inverse document frequency in corpus
• ridf : residual idf; idf modified by Poisson model
• wig : weighted information gain; change in entropy from corpus to retrieved data
• g_tf : Google term frequency
• qp : number of times the concept appears as a part of a query in MSN Live
• qe : number of times the concept appears as an exact query in MSN Live
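Three of these features are simple enough to sketch from corpus counts alone. The code below uses the textbook definitions of idf and residual idf with base-2 logarithms; the log base, the reading of "is capitalized", and the function signatures are assumptions, and the talk may compute them differently.

```python
import math


def is_cap(concept):
    """True if every word of the concept starts with an uppercase letter
    (one possible reading of 'is capitalized')."""
    words = concept.split()
    return bool(words) and all(word[:1].isupper() for word in words)


def idf(doc_freq, num_docs):
    """Inverse document frequency, base-2 log (the base is an assumption)."""
    return math.log2(num_docs / doc_freq)


def ridf(doc_freq, coll_freq, num_docs):
    """Residual idf: observed idf minus the idf a Poisson model would
    predict from the collection frequency."""
    expected_idf = -math.log2(1.0 - math.exp(-coll_freq / num_docs))
    return idf(doc_freq, num_docs) - expected_idf
```

A concept whose occurrences cluster in few documents (collection frequency well above document frequency) gets a higher ridf than one spread evenly across the corpus.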
TREC Corpus
Exp 1: Identifying Key Concept
• Cross-validation on corpus
• Each fold has 50 queries
• Check whether the top concept is a key concept
• Assume 1 key concept per query during annotation
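The check described above amounts to a top-1 accuracy over the queries of a fold. A minimal sketch, with the hypothetical name `top1_accuracy` and toy data standing in for the annotated queries:

```python
def top1_accuracy(ranked_concepts, key_concepts):
    """Fraction of queries whose top-ranked concept is the annotated key concept.

    ranked_concepts -- dict: query id -> list of concepts, best first
    key_concepts    -- dict: query id -> the single annotated key concept
    """
    hits = sum(
        1
        for qid, key in key_concepts.items()
        if ranked_concepts.get(qid) and ranked_concepts[qid][0] == key
    )
    return hits / len(key_concepts)


# Toy fold with three made-up queries (illustration only).
ranked = {1: ["a", "b"], 2: ["c", "d"], 3: ["e"]}
keys = {1: "a", 2: "d", 3: "e"}
accuracy = top1_accuracy(ranked, keys)
```

Averaging this value over the cross-validation folds gives one number per ranking method, which is how a learned ranking and an idf ranking could be compared.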
Exp 1: Identifying Key Concept
• Better than idf ranking
Exp 2: Information Retrieval
$\mathrm{score}(q,d) = \lambda\, p(q \mid d) + (1-\lambda) \sum_{c_i} p(c_i \mid q)\, p(c_i \mid d)$
• Use only the top 2 concepts for each query
• q is the entire <desc> section
• $\lambda = 0.8$
Exp 2: Information Retrieval
• KeyConcept[2]<desc> : the authors’ method
• SeqDep<desc> : sequential dependence model; includes all bigrams in the query
What to take home?
• Singling out key concepts improves retrieval