![Page 1: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/1.jpg)
Advanced Information Retrieval
Winter Semester 2007/08, Part 7
Question Answering
Uwe Quasthoff
Universität Leipzig
Institut für [email protected]
Using material by Giuseppe Attardi, Dipartimento di Informatica, Università di Pisa
![Page 2: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/2.jpg)
Question Answering
IR: find documents relevant to a query
– Query: Boolean combination of keywords
QA: find the answer to a question
– Question: expressed in natural language
– Answer: short phrase (< 50 bytes)
![Page 3: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/3.jpg)
TREC-9 Q&A track
693 fact-based, short-answer questions
– either short (50 B) or long (250 B) answer
~3 GB newspaper/newswire text (AP, WSJ, SJMN, FT, LAT, FBIS)
Score: MRR (penalizes second answer)
Resources: top 50 (no answer for 130 q)
Questions: 186 (Encarta), 314 (seeds from Excite logs), 193 (syntactic variants of 54 originals)
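The MRR score mentioned above can be sketched as follows. This is a minimal illustration, not the official TREC scorer: the function name, the sample data, and the `max_rank=5` cutoff are assumptions made for the example. Each question contributes the reciprocal of the rank of its first correct answer, or 0 if no returned answer is correct.

```python
def mean_reciprocal_rank(ranked_answers, gold, max_rank=5):
    """ranked_answers: one ranked answer list per question.
    gold: one set of acceptable answers per question (same order).
    max_rank is an assumed cutoff on how many answers are scored."""
    total = 0.0
    for answers, correct in zip(ranked_answers, gold):
        for rank, answer in enumerate(answers[:max_rank], start=1):
            if answer in correct:
                total += 1.0 / rank   # only the first correct answer counts
                break
    return total / len(ranked_answers)

# Hypothetical system output for three questions.
runs = [["John Wilkes Booth", "Lee Harvey Oswald"], ["259", "270"], ["Peru"]]
gold = [{"John Wilkes Booth"}, {"270"}, {"Brazil"}]
score = mean_reciprocal_rank(runs, gold)  # (1 + 1/2 + 0) / 3 = 0.5
```

A correct answer in second place earns only half the credit of one in first place, which is the penalty the slide refers to.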
![Page 4: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/4.jpg)
Commonalities
Approaches:
– question classification
– finding entailed answer type
– use of WordNet
High-quality document search helpful (e.g. Queens College)
![Page 5: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/5.jpg)
Sample Questions
Q: Who shot President Abraham Lincoln?
A: John Wilkes Booth
Q: How many lives were lost in the Pan Am crash in Lockerbie?
A: 270
Q: How long does it take to travel from London to Paris through the Channel?
A: three hours 45 minutes
Q: Which Atlantic hurricane had the highest recorded wind speed?
A: Gilbert (200 mph)
Q: Which country has the largest part of the rain forest?
A: Brazil (60%)
![Page 6: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/6.jpg)
Question Types
Class 1
– Answer: single datum or list of items
– C: who, when, where, how (old, much, large)
Class 2
– A: multi-sentence
– C: extract from multiple sentences
Class 3
– A: across several texts
– C: comparative/contrastive
Class 4
– A: an analysis of retrieved information
– C: synthesized coherently from several retrieved fragments
Class 5
– A: result of reasoning
– C: world/domain knowledge and common sense reasoning
![Page 7: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/7.jpg)
Question subtypes
Class 1.A About subjects, objects, manner, time or location
Class 1.B About properties or attributes
Class 1.C Taxonomic nature
![Page 8: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/8.jpg)
Results (long)
[Bar chart: MRR per system on a scale from 0 to 0.8; systems: SMU, Queens, Waterloo, IBM, LIMSI, NTT, IC, Pisa; series: MRR, Unofficial]
![Page 9: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/9.jpg)
Falcon: Architecture
[Block diagram with three stages]
Question Processing: Question, Collins Parser + NE Extraction, Question Semantic Form, Question Logical Form, Question Taxonomy, Expected Answer Type, Question Expansion (WordNet)
Paragraph Processing: Paragraph Index, Paragraph filtering, Answer Paragraphs
Answer Processing: Collins Parser + NE Extraction, Answer Semantic Form, Answer Logical Form, Coreference Resolution, Abduction Filter, Answer
![Page 10: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/10.jpg)
Question parse
Who/WP was/VBD the/DT first/JJ Russian/NNP astronaut/NP to/TO walk/VB in/IN space/NN
[Parse tree: constituent labels NP, NP, PP, VP, S, VP, S]
![Page 11: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/11.jpg)
Question semantic form
[Graph: nodes first, Russian, astronaut, walk, space; answer type: PERSON]
Question logic form:
first(x) ∧ astronaut(x) ∧ Russian(x) ∧ space(z) ∧ walk(y, z, x) ∧ PERSON(x)
![Page 12: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/12.jpg)
Expected Answer Type
Question: What is the size of Argentina?
[Diagram: size (of Argentina) → dimension → QUANTITY, using WordNet]
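The lookup in the diagram can be sketched as a walk up a hypernym table until a known answer type is reached. The table below is a hand-written, hypothetical stand-in for WordNet that contains only the links shown on the slide; `HYPERNYMS`, `ANSWER_TYPES`, and `expected_answer_type` are names invented for this example.

```python
# Toy stand-in for WordNet: each word maps to its parent concept (assumed links).
HYPERNYMS = {"size": "dimension", "dimension": "QUANTITY"}
ANSWER_TYPES = {"QUANTITY", "PERSON", "LOCATION"}

def expected_answer_type(word):
    """Follow hypernym links from the question word until an answer type is found."""
    seen = set()
    while word not in ANSWER_TYPES:
        if word in seen or word not in HYPERNYMS:
            return None            # no path to a known answer type
        seen.add(word)
        word = HYPERNYMS[word]
    return word

answer_type = expected_answer_type("size")  # "QUANTITY"
```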
![Page 13: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/13.jpg)
Questions about definitions
Special patterns:
– What {is|are} …?
– What is the definition of …?
– Who {is|was|are|were} …?
Answer patterns:
– … {is|are}
– …, {a|an|the}
– … -
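One way to read these patterns is as regular expressions: the question pattern isolates the term being defined, and the answer patterns look for that term followed by a copula, an appositive, or a dash. The sketch below is an assumption about how such matching could be wired up; the function names and the exact regular expressions are illustrative only.

```python
import re

# Question patterns from the slide; the most specific one is tried first.
QUESTION_PATTERNS = [
    re.compile(r"^What is the definition of (?P<term>.+?)\?$", re.I),
    re.compile(r"^What (?:is|are) (?P<term>.+?)\?$", re.I),
    re.compile(r"^Who (?:is|was|are|were) (?P<term>.+?)\?$", re.I),
]

def definition_term(question):
    """Return the term a definition question asks about, or None."""
    for pattern in QUESTION_PATTERNS:
        match = pattern.match(question.strip())
        if match:
            return match.group("term")
    return None

def answer_patterns(term):
    """Answer patterns from the slide, instantiated for one term."""
    t = re.escape(term)
    return [
        re.compile(t + r"\s+(?:is|are)\s+(?P<definition>[^.]+)", re.I),
        re.compile(t + r",\s+(?:a|an|the)\s+(?P<definition>[^.,]+)", re.I),
        re.compile(t + r"\s+-\s+(?P<definition>[^.]+)", re.I),
    ]

def find_definition(question, text):
    term = definition_term(question)
    if term is None:
        return None
    for pattern in answer_patterns(term):
        match = pattern.search(text)
        if match:
            return match.group("definition").strip()
    return None
```

For example, `find_definition("Who was Galileo?", "Galileo, an Italian astronomer, was born in Pisa.")` matches the appositive pattern.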
![Page 14: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/14.jpg)
Question Taxonomy
[Taxonomy tree rooted at Question; node labels: Reason, Number, Manner, Location, Organization, Product, Language, Mammal, Currency, Nationality, Game, Reptile, Country, City, Province, Continent, Speed, Degree, Dimension, Rate, Duration, Percentage, Count]
![Page 15: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/15.jpg)
Question expansion
Morphological variants
– invented → inventor
Lexical variants
– killer → assassin
– far → distance
Semantic variants
– like → prefer
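Expansion of this kind can be sketched as a table lookup that appends variants to the original keywords. The table below holds only the examples from the slide and is a hypothetical stand-in for a morphological analyzer and WordNet; `VARIANTS` and `expand_keywords` are names chosen for the example.

```python
# Variant table built from the slide's examples only (illustrative, not a real lexicon).
VARIANTS = {
    "invented": {"inventor"},   # morphological variant
    "killer": {"assassin"},     # lexical variant
    "far": {"distance"},        # lexical variant
    "like": {"prefer"},         # semantic variant
}

def expand_keywords(keywords):
    """Return the keywords with their known variants inserted after each one."""
    expanded = []
    for word in keywords:
        expanded.append(word)
        for variant in sorted(VARIANTS.get(word.lower(), ())):
            if variant not in expanded:
                expanded.append(variant)
    return expanded

query = expand_keywords(["killer", "Lincoln"])  # ["killer", "assassin", "Lincoln"]
```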
![Page 16: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/16.jpg)
Indexing for Q/A
Alternatives:
– IR techniques
– Parse texts and derive conceptual indexes
Falcon uses paragraph indexing:
– Vector-Space plus proximity
– Returns weights used for abduction
![Page 17: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/17.jpg)
Abduction to justify answers
Backchaining proofs from questions
Axioms:
– Logical form of answer
– World knowledge (WordNet)
– Coreference resolution in answer text
Effectiveness:
– 14% improvement
– Filters 121 erroneous answers (of 692)
– Requires 60% of question processing time
![Page 18: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/18.jpg)
TREC 13 QA
Several subtasks:
– Factoid questions
– Definition questions
– List questions
– Context questions
LCC still best performance, but different architecture
![Page 19: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/19.jpg)
LCC Block Architecture
[Block diagram: Q → Question Processing → Passage Retrieval → Answer Processing → A]
Question Processing: Question Parse, Semantic Transformation, Recognition of Expected Answer Type, Keyword Extraction (resources: WordNet, NER). Captures the semantics of the question; selects keywords for PR. Outputs: Question Semantics, Keywords.
Passage Retrieval: Document Retrieval. Extracts and ranks passages using surface-text techniques. Output: Passages.
Answer Processing: Answer Extraction, Answer Justification, Answer Reranking, Theorem Prover, Axiomatic Knowledge Base (resources: WordNet, NER). Extracts and ranks answers using NL techniques.
![Page 20: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/20.jpg)
Question Processing
Two main tasks
– Determining the type of the answer
– Extracting keywords from the question and formulating a query
![Page 21: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/21.jpg)
Answer Types
Factoid questions…
– Who, where, when, how many…
– The answers fall into a limited and somewhat predictable set of categories
• Who questions are going to be answered by…
• Where questions…
– Generally, systems select answer types from a set of Named Entities, augmented with other types that are relatively easy to extract
![Page 22: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/22.jpg)
Answer Types
Of course, it isn’t that easy…
– Who questions can have organizations as answers
• Who sells the most hybrid cars?
– Which questions can have people as answers
• Which president went to war with Mexico?
![Page 23: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/23.jpg)
Answer Type Taxonomy
Contains ~9000 concepts reflecting expected answer types
Merges named entities with the WordNet hierarchy
![Page 24: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/24.jpg)
Answer Type Detection
Most systems use a combination of hand-crafted rules and supervised machine learning to determine the right answer type for a question.
Not worthwhile to do something complex here if it can’t also be done in candidate answer passages.
![Page 25: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/25.jpg)
Keyword Selection
Answer Type indicates what the question is looking for:
– It can be mapped to a NE type and used for search in an enhanced index
Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations, provide the required context.
![Page 26: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/26.jpg)
Keyword Extraction
Questions approximated by sets of unrelated keywords
Question (from TREC QA track) | Keywords
Q002: What was the monetary value of the Nobel Peace Prize in 1989? | monetary, value, Nobel, Peace, Prize
Q003: What does the Peugeot company manufacture? | Peugeot, company, manufacture
Q004: How much did Mercury spend on advertising in 1993? | Mercury, spend, advertising, 1993
Q005: What is the name of the managing director of Apricot Computer? | name, managing, director, Apricot, Computer
![Page 27: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/27.jpg)
Keyword Selection Algorithm
1. Select all non-stopwords in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select the answer type word
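The ordered heuristics above can be sketched as a priority assignment over pre-annotated tokens. Everything here is an assumption made for illustration: the `Token` fields stand in for the output of a tagger, NE recognizer, and parser that are not shown, the stopword list is a tiny sample, stopwords are filtered for every heuristic, and each token is credited to the first heuristic that selects it.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    pos: str                          # Penn Treebank tag, e.g. "NNP", "NN", "VB"
    in_quotes: bool = False           # hypothetical annotations from earlier stages
    in_named_entity: bool = False
    in_complex_nominal: bool = False
    has_adj_modifier: bool = False
    is_answer_type_word: bool = False

STOPWORDS = {"the", "of", "a", "an", "in", "what", "is", "was", "does", "did"}

def heuristic_rank(tok):
    """Number (1-8) of the first heuristic that selects the token, else None."""
    if tok.text.lower() in STOPWORDS:
        return None
    is_noun = tok.pos.startswith("NN")
    checks = [
        tok.in_quotes,                                       # 1
        tok.pos.startswith("NNP") and tok.in_named_entity,   # 2
        tok.in_complex_nominal and tok.has_adj_modifier,     # 3
        tok.in_complex_nominal,                              # 4
        is_noun and tok.has_adj_modifier,                    # 5
        is_noun,                                             # 6
        tok.pos.startswith("VB"),                            # 7
        tok.is_answer_type_word,                             # 8
    ]
    for number, selected in enumerate(checks, start=1):
        if selected:
            return number
    return None

def select_keywords(tokens, max_heuristic=8):
    """Keywords selected by heuristics 1..max_heuristic, highest priority first."""
    ranked = [(heuristic_rank(t), i, t.text) for i, t in enumerate(tokens)]
    kept = [item for item in ranked if item[0] is not None and item[0] <= max_heuristic]
    return [text for _, _, text in sorted(kept)]

q003 = [Token("What", "WP"), Token("does", "VBZ"), Token("the", "DT"),
        Token("Peugeot", "NNP", in_named_entity=True),
        Token("company", "NN"), Token("manufacture", "VB")]
keywords = select_keywords(q003)  # ["Peugeot", "company", "manufacture"]
```

Limiting `max_heuristic` gives a stricter or looser keyword set, which is useful when a query has to be relaxed or tightened.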
![Page 28: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/28.jpg)
Passage Retrieval
Extracts and ranks passages using surface-text techniques
[Block diagram: LCC block architecture, as on the "LCC Block Architecture" slide]
![Page 29: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/29.jpg)
Passage Extraction Loop
Passage Extraction Component
– Extracts passages that contain all selected keywords
– Passage size dynamic
– Start position dynamic
Passage quality and keyword adjustment
– In the first iteration use the first 6 keyword selection heuristics
– If the number of passages is lower than a threshold ⇒ query is too strict ⇒ drop a keyword
– If the number of passages is higher than a threshold ⇒ query is too relaxed ⇒ add a keyword
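The adjustment loop above can be sketched as follows, assuming the keywords arrive already ordered by priority and the passages are plain strings. The thresholds `low` and `high`, the iteration cap, and the function names are hypothetical; the slide does not give concrete values.

```python
import re

def contains_all(passage, keywords):
    """True if every keyword occurs as a word in the passage (case-insensitive)."""
    words = set(re.findall(r"\w+", passage.lower()))
    return all(k.lower() in words for k in keywords)

def extraction_loop(passages, ranked_keywords, initial, low=2, high=3, max_iter=10):
    """ranked_keywords: keywords in priority order; initial: how many to start with.
    low/high are assumed thresholds on the number of retrieved passages."""
    n = initial
    active, hits = [], []
    for _ in range(max_iter):
        active = ranked_keywords[:n]
        hits = [p for p in passages if contains_all(p, active)]
        if len(hits) < low and n > 1:
            n -= 1      # query too strict: drop the lowest-priority keyword
        elif len(hits) > high and n < len(ranked_keywords):
            n += 1      # query too relaxed: add the next keyword
        else:
            break
    return active, hits

passages = [
    "Peugeot is a French company that will manufacture cars.",
    "The Peugeot company reported profits.",
    "A company in Ohio.",
]
active, hits = extraction_loop(passages, ["Peugeot", "company", "manufacture"], initial=3)
# active == ["Peugeot", "company"]; two passages are returned
```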
![Page 30: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/30.jpg)
Passage Scoring
Passages are scored based on keyword windows
– For example, if a question has a set of keywords {k1, k2, k3, k4}, and in a passage k1 and k2 are matched twice, k3 is matched once, and k4 is not matched, the following windows are built:
[Figure: the matched keyword sequence k1 k2 k3 k2 k1, shown four times with a different span marked as Window 1, Window 2, Window 3, and Window 4]
![Page 31: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/31.jpg)
Passage Scoring
Passage ordering is performed using a sort that involves three scores:
– The number of words from the question that are recognized in the same sequence in the window
– The number of words that separate the most distant keywords in the window
– The number of unmatched keywords in the window
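The three scores can be sketched for a single window as below. This is one possible reading, not the system's actual scoring: the same-sequence score is interpreted as the longest common subsequence between the question keywords and the keywords matched in the window, and the sort order (more in-sequence words first, then smaller spread, then fewer unmatched keywords) is an assumption.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def window_scores(keywords, window):
    """Return (same_sequence, distance, unmatched) for one window of tokens."""
    positions = [i for i, w in enumerate(window) if w in keywords]
    matched = [window[i] for i in positions]
    same_sequence = lcs_length(keywords, matched)
    # words lying between the two most distant matched keywords
    distance = positions[-1] - positions[0] - 1 if len(positions) > 1 else 0
    unmatched = len(set(keywords) - set(matched))
    return same_sequence, distance, unmatched

def rank_windows(keywords, windows):
    """Assumed ordering: high same_sequence, then low distance, then low unmatched."""
    def key(window):
        same_sequence, distance, unmatched = window_scores(keywords, window)
        return (-same_sequence, distance, unmatched)
    return sorted(windows, key=key)

keywords = ["k1", "k2", "k3", "k4"]
w1 = ["k1", "x", "k2", "k3"]
w2 = ["k2", "k1", "y", "y", "k3"]
best = rank_windows(keywords, [w2, w1])[0]  # w1
```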
![Page 32: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/32.jpg)
Answer Extraction
Extracts and ranks answers using NL techniques
[Block diagram: LCC block architecture, as on the "LCC Block Architecture" slide]
![Page 33: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/33.jpg)
Ranking Candidate Answers
Q066: Name the first private citizen to fly in space.
■ Answer type: Person
■ Text passage:
“Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in “Raiders of the Lost Ark”, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith...”
![Page 34: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/34.jpg)
Ranking Candidate Answers
■ Best candidate answer: Christa McAuliffe
![Page 35: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/35.jpg)
Features for Answer Ranking
– Number of question terms matched in the answer passage
– Number of question terms matched in the same phrase as the candidate answer
– Number of question terms matched in the same sentence as the candidate answer
– Flag set to 1 if the candidate answer is followed by a punctuation sign
– Number of question terms matched, separated from the candidate answer by at most three words and one comma
– Number of terms occurring in the same order in the answer passage as in the question
– Average distance from candidate answer to question term matches
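A few of these features can be sketched directly on a tokenized passage. The code below computes four of the seven (passage matches, same-sentence matches, the punctuation flag, and average distance); the tokenization, the sentence-boundary rule, and the feature names are assumptions made for the example, and the phrase-level features are left out because they need a parser.

```python
import re

def ranking_features(question_terms, passage, candidate):
    """Compute a subset of the ranking features for one candidate answer.
    The candidate must occur in the passage; distances are measured in tokens."""
    terms = {t.lower() for t in question_terms}
    tokens = re.findall(r"\w+|[^\w\s]", passage)
    lowered = [t.lower() for t in tokens]
    cand = [t.lower() for t in re.findall(r"\w+|[^\w\s]", candidate)]
    start = next(i for i in range(len(lowered)) if lowered[i:i + len(cand)] == cand)
    end = start + len(cand)                      # index just past the candidate
    matches = [i for i, t in enumerate(lowered) if t in terms]
    enders = {".", "!", "?"}                     # assumed sentence delimiters
    left = max((i for i in range(start) if tokens[i] in enders), default=-1)
    right = min((i for i in range(end, len(tokens)) if tokens[i] in enders),
                default=len(tokens))
    distances = [min(abs(i - start), abs(i - (end - 1))) for i in matches]
    return {
        "terms_in_passage": len(matches),        # counts occurrences
        "terms_in_sentence": sum(1 for i in matches if left < i < right),
        "followed_by_punctuation": int(
            end < len(tokens) and re.fullmatch(r"[^\w\s]", tokens[end]) is not None),
        "average_distance": sum(distances) / len(distances) if distances else None,
    }

passage = ('Among them was Christa McAuliffe, the first private citizen to fly in space. '
           'Karen Allen, best known for her starring role in "Raiders of the Lost Ark", '
           'plays McAuliffe.')
terms = ["first", "private", "citizen", "fly", "space"]
good = ranking_features(terms, passage, "Christa McAuliffe")
other = ranking_features(terms, passage, "Karen Allen")
```

On this passage the same-sentence feature separates the two candidates (5 matches for Christa McAuliffe, 0 for Karen Allen), while average distance alone does not, which suggests why several features are combined.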
![Page 36: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/36.jpg)
Lexical Chains
Question: When was the internal combustion engine invented?
Answer: The first internal combustion engine was built in 1867.
Lexical chains:
(1) invent:v#1 → HYPERNYM → create_by_mental_act:v#1 → HYPERNYM → create:v#1 → HYPONYM → build:v#1
Question: How many chromosomes does a human zygote have?
Answer: 46 chromosomes lie in the nucleus of every normal human cell.
Lexical chains:
(1) zygote:n#1 → HYPERNYM → cell:n#1 → HAS.PART → nucleus:n#1
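Finding such a chain amounts to a shortest-path search over lexical relations. The sketch below uses breadth-first search over a hand-written relation table that contains only the links from the two examples; it is a hypothetical stand-in for WordNet, and the names `RELATIONS`, `lexical_chain`, and `max_links` are invented for illustration.

```python
from collections import deque

# Toy relation graph holding only the links shown on the slide.
RELATIONS = {
    "invent:v#1": [("HYPERNYM", "create_by_mental_act:v#1")],
    "create_by_mental_act:v#1": [("HYPERNYM", "create:v#1")],
    "create:v#1": [("HYPONYM", "build:v#1")],
    "zygote:n#1": [("HYPERNYM", "cell:n#1")],
    "cell:n#1": [("HAS.PART", "nucleus:n#1")],
}

def lexical_chain(source, target, max_links=4):
    """Breadth-first search for the shortest chain of relations from source to target.
    Returns [sense, relation, sense, ...] or None if no chain is short enough."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        chain = queue.popleft()
        node = chain[-1]
        if node == target:
            return chain
        if (len(chain) - 1) // 2 >= max_links:   # chain already has max_links relations
            continue
        for relation, neighbor in RELATIONS.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(chain + [relation, neighbor])
    return None

chain = lexical_chain("invent:v#1", "build:v#1")
print(" → ".join(chain))
```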
![Page 37: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/37.jpg)
Theorem Prover
Q: What is the age of the solar system?
QLF: quantity_at(x2) & age_nn(x2) & of_in(x2,x3) & solar_jj(x3) & system_nn(x3)
Question Axiom: (exists x1 x2 x3 (quantity_at(x2) & age_nn(x2) & of_in(x2,x3) & solar_jj(x3) & system_nn(x3)))
Answer: The solar system is 4.6 billion years old.
WordNet Gloss: old_jj(x6) ↔ live_vb(e2,x6,x2) & for_in(e2,x1) & relatively_jj(x1) & long_jj(x1) & time_nn(x1) & or_cc(e5,e2,e3) & attain_vb(e3,x6,x2) & specific_jj(x2) & age_nn(x2)
Linguistic Axiom: all x1 (quantity_at(x1) & solar_jj(x1) & system_nn(x1) → of_in(x1,x1))
Proof: ¬quantity_at(x2) | ¬age_nn(x2) | ¬of_in(x2,x3) | ¬solar_jj(x3) | ¬system_nn(x3)
Refutation assigns value to x2
![Page 38: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/38.jpg)
Is the Web Different?
In TREC (and most commercial applications), retrieval is performed against a smallish closed collection of texts.
The diversity/creativity in how people express themselves necessitates all that work to bring the question and the answer texts together.
But…
![Page 39: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/39.jpg)
The Web is Different
On the Web popular factoids are likely to be expressed in a gazillion different ways.
At least a few of which will likely match the way the question was asked.
So why not just grep (or agrep) the Web using all or pieces of the original question?
![Page 40: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/40.jpg)
AskMSR
Process the question by…
– Forming a search engine query from the original question
– Detecting the answer type
Get some results
Extract answers of the right type based on
– How often they occur
![Page 41: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/41.jpg)
Step 1: Rewrite the questions
Intuition: The user’s question is often syntactically quite close to sentences that contain the answer
– Where is the Louvre Museum located?
• The Louvre Museum is located in Paris
– Who created the character of Scrooge?
• Charles Dickens created the character of Scrooge.
![Page 42: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/42.jpg)
Query rewriting
Classify question into seven categories
– Who is/was/are/were…?
– When is/did/will/are/were …?
– Where is/are/were …?
a. Hand-crafted category-specific transformation rules
e.g.: For where questions, move ‘is’ to all possible locations
Look to the right of the query terms for the answer.
“Where is the Louvre Museum located?”
→ “is the Louvre Museum located”
→ “the is Louvre Museum located”
→ “the Louvre is Museum located”
→ “the Louvre Museum is located”
→ “the Louvre Museum located is”
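The "move ‘is’ to all possible locations" rule can be sketched in a few lines. This covers only the where-is case from the slide; the function name is hypothetical, and a fuller system would need one such rule per question category.

```python
def where_is_rewrites(question):
    """Generate rewrites of a 'Where is ...?' question by dropping 'Where'
    and inserting 'is' at every position in the remaining words."""
    words = question.rstrip("?").split()
    if len(words) < 3 or words[0].lower() != "where" or words[1].lower() != "is":
        return []                      # rule does not apply to this question
    rest = words[2:]
    return [" ".join(rest[:i] + ["is"] + rest[i:]) for i in range(len(rest) + 1)]

for rewrite in where_is_rewrites("Where is the Louvre Museum located?"):
    print(rewrite)
```

Most of the generated strings are ungrammatical and will match little or nothing; the design relies on the search engine to sort that out rather than on a parser to pick the one correct position.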
![Page 43: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/43.jpg)
Step 2: Query search engine
Send all rewrites to a Web search engine
Retrieve top N answers (100-200)
For speed, rely just on search engine’s “snippets”, not the full text of the actual document
![Page 44: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/44.jpg)
Step 3: Gathering N-Grams
Enumerate all N-grams (N=1,2,3) in all retrieved snippets
Weight of an n-gram: occurrence count, each weighted by “reliability” (weight) of rewrite rule that fetched the document
– Example: “Who created the character of Scrooge?”
Dickens 117
Christmas Carol 78
Charles Dickens 75
Disney 72
Carl Banks 54
A Christmas 41
Christmas Carol 45
Uncle 31
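The counting step can be sketched as follows, assuming each snippet comes paired with the weight of the rewrite rule that retrieved it. The function name, the tokenization, and the sample snippets and weights are made up for the example.

```python
import re
from collections import Counter

def gather_ngrams(snippets, max_n=3):
    """snippets: iterable of (text, rewrite_weight) pairs.
    Each occurrence of an n-gram adds the weight of the rewrite that fetched it."""
    scores = Counter()
    for text, weight in snippets:
        tokens = re.findall(r"\w+", text)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                scores[" ".join(tokens[i:i + n])] += weight
    return scores

# Hypothetical snippets: the first came from a more reliable rewrite (weight 5).
snippets = [("Charles Dickens created Scrooge", 5), ("Dickens wrote it", 1)]
scores = gather_ngrams(snippets)
# scores["Dickens"] == 6, scores["Charles Dickens"] == 5
```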
![Page 45: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/45.jpg)
Step 4: Filtering N-Grams
Each question type is associated with one or more “data-type filters” = regular expressions for answer types
Boost score of n-grams that match the expected answer type.
Lower score of n-grams that don’t match.
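A data-type filter of this kind can be sketched as a regular expression per question type plus a multiplicative adjustment. The specific expressions, the boost and penalty factors, and the names below are all assumptions; the slide does not specify them.

```python
import re

# Hypothetical data-type filters, one regular expression per question type.
FILTERS = {
    "who": re.compile(r"^[A-Z][a-z]+(?: [A-Z][a-z]+)*$"),   # capitalized, name-like
    "when": re.compile(r"^\d{4}$"),                          # four-digit year
    "how many": re.compile(r"^\d+$"),                        # plain number
}

def filter_ngrams(scores, question_type, boost=2.0, penalty=0.5):
    """Boost n-grams matching the expected answer type, lower the rest.
    boost and penalty are assumed factors chosen for illustration."""
    pattern = FILTERS[question_type]
    return {gram: score * (boost if pattern.match(gram) else penalty)
            for gram, score in scores.items()}

filtered = filter_ngrams({"Charles Dickens": 75, "created the": 10, "1843": 4}, "who")
# {"Charles Dickens": 150.0, "created the": 5.0, "1843": 2.0}
```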
![Page 46: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/46.jpg)
Step 5: Tiling the Answers
[Figure: the overlapping n-grams Dickens, Charles Dickens, and Mr Charles (scores 20, 15, 10) are merged into Mr Charles Dickens (score 45); the old n-grams are discarded]
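Tiling can be sketched as a greedy merge of overlapping n-grams whose scores are added. The merge rule below (containment, or a suffix of one n-gram equal to a prefix of the other) and the greedy highest-score-first order are assumptions for illustration; the n-gram scores reuse the numbers from the figure without claiming which score belongs to which n-gram.

```python
def tile(a, b):
    """Merge word lists a and b if one contains the other or they overlap
    end-to-start; return the merged list, or None if they do not tile."""
    for big, small in ((a, b), (b, a)):
        for i in range(len(big) - len(small) + 1):
            if big[i:i + len(small)] == small:
                return big
    for left, right in ((a, b), (b, a)):
        for k in range(min(len(left), len(right)) - 1, 0, -1):
            if left[-k:] == right[:k]:
                return left + right[k:]
    return None

def tile_answers(scores):
    """Repeatedly merge tileable n-grams, summing scores and discarding the old ones."""
    items = [(gram.split(), score) for gram, score in scores.items()]
    merged = True
    while merged:
        merged = False
        items.sort(key=lambda item: -item[1])    # start from the best-scoring n-gram
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                tiled = tile(items[i][0], items[j][0])
                if tiled is not None:
                    new = (tiled, items[i][1] + items[j][1])
                    items = [it for k, it in enumerate(items) if k not in (i, j)]
                    items.append(new)
                    merged = True
                    break
            if merged:
                break
    return {" ".join(words): score for words, score in items}

tiled = tile_answers({"Dickens": 20, "Charles Dickens": 15, "Mr Charles": 10})
# {"Mr Charles Dickens": 45}
```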
![Page 47: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/47.jpg)
Results
Standard TREC contest test-bed (TREC 2001): 1M documents; 900 questions
– Technique does OK, not great (would have placed in top 9 of ~30 participants)
– But with access to the Web… they do much better, would have come in second on TREC 2001
![Page 48: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/48.jpg)
Harder Questions
Factoid question answering is really pretty silly.
A more interesting task is one where the answers are fluid and depend on the fusion of material from disparate texts over time.
– Who is Condoleezza Rice?
– Who is Mahmoud Abbas?
– Why was Arafat flown to Paris?
![Page 49: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/49.jpg)
Components
![Page 50: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/50.jpg)
Language Processing Tools
Maximum Entropy classifier
Sentence Splitter
Multi-language POS Tagger
Multi-language NE Tagger
Conceptual clustering
![Page 51: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/51.jpg)
MaxEntropy: application
Sentence Splitting
Not all punctuation marks are sentence boundaries:
– U.S.A.
– St. Helen
– 3.14
Use features like:
– Capitalization (previous, next word)
– Present in abbreviation list
– Suffix/prefix digits
– Suffix/prefix long
Precision: > 95%
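The feature side of such a splitter can be sketched as a function that describes one punctuation mark in context. Only feature extraction is shown; the maximum entropy classifier that would consume these features is not implemented here. The abbreviation list, the length threshold for "long", and the feature names are hypothetical.

```python
ABBREVIATIONS = {"st", "mr", "dr", "u.s.a"}   # hypothetical abbreviation list

def boundary_features(text, i):
    """Features for the punctuation mark at text[i], for use by a classifier."""
    before = text[:i].split()
    after = text[i + 1:].split()
    prev_word = before[-1] if before else ""
    next_word = after[0] if after else ""
    return {
        "prev_capitalized": prev_word[:1].isupper(),
        "next_capitalized": next_word[:1].isupper(),
        "prev_in_abbreviation_list": prev_word.lower() in ABBREVIATIONS,
        "prefix_is_digit": i > 0 and text[i - 1].isdigit(),
        "suffix_is_digit": i + 1 < len(text) and text[i + 1].isdigit(),
        "prefix_long": len(prev_word) > 3,    # assumed threshold for "long"
        "suffix_long": len(next_word) > 3,
    }

text = "Pi is 3.14 today. It rains."
decimal_point = boundary_features(text, text.index("."))    # the dot inside 3.14
sentence_end = boundary_features(text, text.index(". "))    # the dot after "today"
```

The decimal point has digits on both sides, while the true boundary has a capitalized next word and no adjacent digits, which is the kind of contrast the classifier learns from.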
![Page 52: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/52.jpg)
Part of Speech Tagging
TreeTagger: statistical package based on HMM and decision trees
Trained on manually tagged text
Full language lexicon (with all inflections: 140,000 words for Italian)
![Page 53: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/53.jpg)
Training Corpus
Il DET:def:*:*:masc:sg _il
presidente NOM:*:*:*:masc:sg _presidente
della PRE:det:*:*:femi:sg _del
Repubblica NOM:*:*:*:femi:sg _repubblica
francese ADJ:*:*:*:femi:sg _francese
Francois NPR:*:*:*:*:* _Francois
Mitterrand NPR:*:*:*:*:* _Mitterrand
ha VER:aux:pres:3:*:sg _avere
proposto VER:*:pper:*:masc:sg _proporre
…
![Page 54: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/54.jpg)
Named Entity Tagger
Uses MaxEntropy
NE categories:
– Top level: NAME, ORGANIZATION, LOCATION, QUANTITY, TIME, EVENT, PRODUCT
– Second level: 30-100. E.g. QUANTITY:
• MONEY, CARDINAL, PERCENT, MEASURE, VOLUME, AGE, WEIGHT, SPEED, TEMPERATURE, etc.
See resources at CoNLL (cnts.uia.ac.be/connl2004)
![Page 55: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/55.jpg)
NE Features
Feature types:
– word-level (e.g. capitalization, digits, etc.)
– punctuation
– POS tag
– Category designator (Mr, Av.)
– Category suffix (center, museum, street, etc.)
– Lowercase intermediate terms (of, de, in)
– presence in controlled dictionaries (locations, people, organizations)
Context: words in position -1, 0, +1
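The feature types above can be sketched as a function that builds a feature dictionary for one token and its neighbours at positions -1, 0, +1. This is only an illustration: the word lists, the feature names and the function `word_features` are hypothetical, and the actual tagger's feature set is not specified on the slide.

```python
import string

# Hypothetical word lists; the slide only names the feature types.
DESIGNATORS = {"mr", "mrs", "dr", "av."}
SUFFIXES = {"center", "museum", "street"}
INTERMEDIATE = {"of", "de", "in"}
DICTIONARY = {"london": "LOCATION", "florida": "LOCATION"}


def word_features(tokens, pos_tags, i):
    """Feature dict for token i, using a -1/0/+1 context window."""
    feats = {}
    for offset in (-1, 0, 1):
        j = i + offset
        if j < 0 or j >= len(tokens):
            feats[f"{offset}:pad"] = True  # position falls outside the sentence
            continue
        word = tokens[j]
        low = word.lower()
        feats[f"{offset}:word"] = low
        feats[f"{offset}:cap"] = word[:1].isupper()
        feats[f"{offset}:digit"] = any(c.isdigit() for c in word)
        feats[f"{offset}:punct"] = all(c in string.punctuation for c in word)
        feats[f"{offset}:pos"] = pos_tags[j]
        feats[f"{offset}:designator"] = low in DESIGNATORS
        feats[f"{offset}:suffix"] = low in SUFFIXES
        feats[f"{offset}:intermediate"] = low in INTERMEDIATE
        feats[f"{offset}:dict"] = DICTIONARY.get(low, "NONE")
    return feats


tokens = ["in", "London", "'s", "Regent", "street"]
tags = ["IN", "NNP", "POS", "NNP", "NN"]
print(word_features(tokens, tags, 1)["0:dict"])
```

A maximum entropy classifier would receive one such dictionary per token; which features actually help is an empirical question.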
![Page 56: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/56.jpg)
Sample training document
<TEXT>
Today the <ENAMEX TYPE='ORGANIZATION'>Dow Jones</ENAMEX> industrial average gained <NUMEX TYPE='MONEY'>thirtyeight and three quarter points</NUMEX>.
When the first American style burger joint opened in <ENAMEX TYPE='LOCATION'>London</ENAMEX>'s fashionable <ENAMEX TYPE='LOCATION'>Regent street</ENAMEX> some <TIMEX TYPE='DURATION'>twenty years</TIMEX> ago, it was mobbed.
Now it's <ENAMEX TYPE='LOCATION'>Asia</ENAMEX>'s turn.
</TEXT>
<TEXT>
The temperatures hover in the <NUMEX TYPE='MEASURE'>nineties</NUMEX>, the heat index climbs into the <NUMEX TYPE='MEASURE'>hundreds</NUMEX>.
And that's continued bad news for <ENAMEX TYPE='LOCATION'>Florida</ENAMEX> where wildfires have charred nearly <NUMEX TYPE='MEASURE'>three hundred square miles</NUMEX> in the last <TIMEX TYPE='DURATION'>month</TIMEX> and destroyed more than a <NUMEX TYPE='CARDINAL'>hundred</NUMEX> homes.
</TEXT>
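To turn markup like the sample above into training labels, the annotated spans have to be pulled out first. The following is a small sketch using a regular expression; the function name `extract_entities` is made up for illustration, and it assumes the three element names (ENAMEX, NUMEX, TIMEX) and single-quoted TYPE attributes seen in the sample.

```python
import re

# Matches <ENAMEX TYPE='X'>...</ENAMEX> and the NUMEX/TIMEX equivalents.
# \s+ also tolerates a line break between the element name and TYPE.
TAG_RE = re.compile(r"<(ENAMEX|NUMEX|TIMEX)\s+TYPE='([^']+)'>(.*?)</\1>", re.S)


def extract_entities(text):
    """Return (category, surface text) pairs in document order."""
    return [
        (m.group(2), " ".join(m.group(3).split()))  # normalise inner whitespace
        for m in TAG_RE.finditer(text)
    ]


sample = (
    "Today the <ENAMEX TYPE='ORGANIZATION'>Dow Jones</ENAMEX> industrial "
    "average gained <NUMEX TYPE='MONEY'>thirtyeight and three quarter "
    "points</NUMEX>."
)
print(extract_entities(sample))
```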
![Page 57: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/57.jpg)
Clustering
Classification: assign an item to one among a given set of classes
Clustering: find groupings of similar items (i.e. generate the classes)
![Page 58: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/58.jpg)
Conceptual Clustering of results
Similar to Vivisimo
– Built on the fly rather than from predefined categories (Northern Light)
Generalized suffix tree of snippets
Stemming
Stop words (articulated, essential)
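The idea of grouping result snippets by the phrases they share can be illustrated without a real suffix tree. The sketch below swaps in a much simpler technique, plain word bigrams after stop-word removal, and omits stemming; the stop-word list and the function names are illustrative assumptions only.

```python
from collections import defaultdict

STOP_WORDS = {"the", "a", "of", "in", "and", "to", "is"}  # tiny illustrative list


def phrases(snippet, n=2):
    """Word n-grams of a snippet after lowercasing and stop-word removal."""
    words = [w for w in snippet.lower().split() if w not in STOP_WORDS]
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def cluster_snippets(snippets, n=2, min_size=2):
    """Map each shared phrase to the indices of the snippets containing it."""
    groups = defaultdict(set)
    for idx, snippet in enumerate(snippets):
        for phrase in phrases(snippet, n):
            groups[phrase].add(idx)
    return {p: sorted(ids) for p, ids in groups.items() if len(ids) >= min_size}


snippets = [
    "the jaguar car price",
    "jaguar car dealers in town",
    "jaguar habitat in the rainforest",
]
print(cluster_snippets(snippets))
```

A generalized suffix tree finds shared phrases of any length in linear time, which matters when clusters are built on the fly for every query.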
![Page 59: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/59.jpg)
PiQASso: Pisa Question Answering System
“Computers are useless, they can only give answers”
Pablo Picasso
![Page 60: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/60.jpg)
PiQASso Architecture
[Figure: PiQASso architecture diagram. Components shown: Document collection, Sentence Splitter, Indexer; Question analysis (MiniPar, Question Classification, WNSense, WordNet, Query Formulation/Expansion); Answer analysis (MiniPar, Type Matching, Relation Matching, Answer Pars, Answer Scoring, Popularity Ranking); decision "Answer found?"; output: Answer.]
![Page 61: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/61.jpg)
Linguistic tools
WNSense
• extracts lexical knowledge from WordNet
• classifies words according to WordNet top-level categories, weighting its senses
• computes distance between words based on is-a links
• suggests word alternatives for query expansion
Example: Theatre
– Categorization: artifact 0.60, communication 0.40
– Synonyms: dramaturgy, theater, house, dramatics

Minipar [D. Lin]
• Identifies dependency relations between words (e.g. subject, object, modifiers)
• Provides POS tagging
• Detects semantic types of words (e.g. location, person, organization)
• Extensible: we integrated a Maximum Entropy based Named Entity Tagger
Example: What metal has the highest melting point? (dependency labels shown: subj, obj, lex-mod, mod)
![Page 62: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/62.jpg)
Question Analysis
What metal has the highest melting point?
1. Parsing: NL question is parsed (dependency labels shown: subj, obj, lex-mod, mod)
2. Keyword extraction: POS tags are used to select search keywords
   metal, highest, melting, point
3. Answer type detection: expected answer type is determined applying heuristic rules to the dependency tree
   SUBSTANCE
4. Relation extraction: additional relations are inferred and the answer entity is identified
   <SUBSTANCE, has, subj> <point, has, obj> <melting, point, lex-mod> <highest, point, mod>
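The question analysis steps can be sketched on the slide's example. The code below is a simplification under stated assumptions: the POS tags are written by hand in Penn Treebank style instead of coming from a parser, the answer type is guessed from the word sequence rather than from heuristic rules over the dependency tree, and the `CATEGORY` table and all function names are hypothetical.

```python
CONTENT_TAGS = {"NN", "NNS", "NNP", "JJ", "JJS", "VBG"}  # assumed keyword tags
CATEGORY = {"metal": "SUBSTANCE", "city": "LOCATION", "party": "ORGANIZATION"}
WH_DEFAULTS = {"who": "PERSON", "where": "LOCATION", "when": "TIME"}


def extract_keywords(tagged):
    """Step 2: keep words whose POS tag marks them as content words."""
    return [word for word, tag in tagged if tag in CONTENT_TAGS]


def answer_type(tagged):
    """Step 3 (simplified): 'What <noun>' takes the noun's category."""
    for (word, tag), (nxt, nxt_tag) in zip(tagged, tagged[1:]):
        if tag in {"WDT", "WP"} and nxt_tag.startswith("NN"):
            return CATEGORY.get(nxt.lower(), "UNKNOWN"), nxt
    return WH_DEFAULTS.get(tagged[0][0].lower(), "UNKNOWN"), None


def rewrite_relations(relations, focus, atype):
    """Step 4: replace the question focus by the expected answer type."""
    return [(atype if a == focus else a, b, rel) for a, b, rel in relations]


question = [("What", "WDT"), ("metal", "NN"), ("has", "VBZ"), ("the", "DT"),
            ("highest", "JJS"), ("melting", "VBG"), ("point", "NN"), ("?", ".")]
parsed = [("metal", "has", "subj"), ("point", "has", "obj"),
          ("melting", "point", "lex-mod"), ("highest", "point", "mod")]

atype, focus = answer_type(question)
print(extract_keywords(question))
print(rewrite_relations(parsed, focus, atype))
```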
![Page 63: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/63.jpg)
Answer Analysis
Tungsten is a very dense material and has the highest melting point of any metal.
1. Parsing: parse retrieved paragraphs
2. Answer type check: paragraphs not containing an entity of the expected type (here SUBSTANCE) are discarded
3. Relation extraction: dependency relations are extracted from Minipar output
   <tungsten, material, pred> <tungsten, has, subj> <point, has, obj> …
4. Matching distance: matching distance between word relations in question and answer is computed
5. Distance filtering: too distant paragraphs are filtered out
6. Popularity ranking: popularity rank used to weight distances
ANSWER: Tungsten
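The filtering and ranking stages of answer analysis (type check, distance filtering, popularity ranking) can be sketched as follows. The candidate records, the threshold `max_distance`, and the choice to divide the distance by the number of times an answer occurs are all assumptions made for illustration; the slide does not say how popularity weights the distances.

```python
from collections import Counter


def rank_answers(candidates, expected_type, max_distance=1.0):
    """Return (answer, score) pairs, best first; a lower score is better."""
    # Type check and distance filtering.
    kept = [c for c in candidates
            if c["type"] == expected_type and c["distance"] <= max_distance]
    # Popularity: how many surviving paragraphs propose the same answer.
    popularity = Counter(c["answer"].lower() for c in kept)
    best = {}
    for c in kept:
        key = c["answer"].lower()
        score = c["distance"] / popularity[key]  # assumed weighting scheme
        if key not in best or score < best[key]:
            best[key] = score
    return sorted(best.items(), key=lambda item: item[1])


candidates = [
    {"answer": "Tungsten", "type": "SUBSTANCE", "distance": 0.4},
    {"answer": "tungsten", "type": "SUBSTANCE", "distance": 0.6},
    {"answer": "Carbon", "type": "SUBSTANCE", "distance": 0.3},
    {"answer": "Paris", "type": "LOCATION", "distance": 0.1},
    {"answer": "Iron", "type": "SUBSTANCE", "distance": 2.0},
]
print(rank_answers(candidates, "SUBSTANCE"))
```

In this toy data "Carbon" has the smallest raw distance, but "Tungsten" wins because two paragraphs support it.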
![Page 64: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/64.jpg)
Match Distance between Question and Answer
Analyze relations between corresponding words considering:
– number of matching words in question and in answer
– distance between words, e.g. moon matching with satellite
– relation types, e.g. words in the question are related by subj while the matching words in the answer are related by pred
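One way to combine these three factors into a single number is sketched below. It is not PiQASso's actual formula: the word distances are a hard-coded table standing in for WordNet is-a distances, and the relation mismatch penalty and the averaging over question relations are assumptions.

```python
# Hypothetical word distances; a real system would derive them from WordNet.
WORD_DIST = {("moon", "satellite"): 0.2}
REL_MISMATCH = 0.5  # assumed penalty when relation types differ (subj vs pred)


def word_distance(a, b):
    """0 for identical words, a table value for related ones, 1 otherwise."""
    if a == b:
        return 0.0
    return WORD_DIST.get((a, b), WORD_DIST.get((b, a), 1.0))


def relation_distance(q_rel, a_rel):
    """Distance between two (word, word, relation) triples."""
    (q1, q2, q_type), (a1, a2, a_type) = q_rel, a_rel
    dist = word_distance(q1, a1) + word_distance(q2, a2)
    if q_type != a_type:
        dist += REL_MISMATCH
    return dist


def match_distance(q_rels, a_rels):
    """Average, over question relations, of the closest answer relation."""
    if not q_rels or not a_rels:
        return float("inf")
    total = sum(min(relation_distance(q, a) for a in a_rels) for q in q_rels)
    return total / len(q_rels)


q = [("moon", "orbits", "subj")]
a = [("satellite", "orbits", "pred")]
print(match_distance(q, a))
```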
![Page 65: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/65.jpg)
Improving PiQASso
![Page 66: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/66.jpg)
More NLP
NLP techniques largely unsuccessful at information retrieval
– Document retrieval as primary measure of information retrieval success
  • Document retrieval reduces the need for NLP techniques
    – Discourse factors can be ignored
    – Query words perform word-sense disambiguation
– Lack of robustness:
  • NLP techniques are typically not as robust as word indexing
![Page 67: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/67.jpg)
How do these technologies help?
Question Analysis:
– The tag of the predicted category is added to the query
Named-Entity Detection:
– The NE categories found in text are included as tags in the index
What party is John Kerry in? (ORGANIZATION)
John Kerry defeated John Edwards in the primaries for the Democratic Party.
Tags: PERSON, ORGANIZATION
![Page 68: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/68.jpg)
NLP Technologies
Coreference Relations:
– Interpretation of a paragraph may depend on the context in which it occurs
Description Extraction:
– Appositive and predicate nominative constructions provide descriptive terms about entities
![Page 69: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/69.jpg)
Coreference Relations
Represented as annotations associated to words, i.e. words in the same position as the reference
How long was Margaret Thatcher the prime minister? (DURATION)
The truth, which has been added to over each of her 11 1/2 years in power, is that they don't make many like her anymore.
Tags: DURATION
Colocated: her, MARGARET THATCHER
![Page 70: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/70.jpg)
Description Extraction
Identifies DESCRIPTION category
Allows descriptive terms to be used in term expansion
Famed architect Frank Gehry…
Tags: DESCRIPTION, PERSON, LOCATION
Buildings he designed include the Guggenheim Museum in Bilbao.
Colocation: he, FRANK GEHRY
Who is Frank Gehry? (DESCRIPTION)
What architect designed the Guggenheim Museum in Bilbao? (PERSON)
![Page 71: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/71.jpg)
NLP Technologies
Question Analysis:
– identify the semantic type of the expected answer implicit in the query
Named-Entity Detection:
– determine the semantic type of proper nouns and numeric amounts in text
![Page 72: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/72.jpg)
Will it work?
Will these semantic relations improve paragraph retrieval?
– Are the implementations robust enough to see a benefit across large document collections and question sets?
– Are there enough questions where these relationships are required to find an answer?
Hopefully yes!
![Page 73: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/73.jpg)
Preprocessing
Paragraph Detection
Sentence Detection
Tokenization
POS Tagging
NP-Chunking
![Page 74: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/74.jpg)
Queries to a NE enhanced index
text matches bush
text matches PERSON:bush
text matches LOCATION:* & PERSON:bin-laden
text matches DURATION:* PERSON:margaret-thatcher prime-minister
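Queries of this kind can be served by an inverted index that stores, next to each plain word, a tagged term such as `PERSON:bush` and a wildcard term such as `PERSON:*`. The sketch below is a toy in-memory version; the class `TaggedIndex` and its methods are hypothetical and say nothing about how IXE implements this.

```python
from collections import defaultdict


class TaggedIndex:
    """Toy inverted index with named-entity tags as extra index terms."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, tokens):
        """tokens: list of (word, NE tag or None) pairs."""
        for word, tag in tokens:
            low = word.lower()
            self.postings[low].add(doc_id)
            if tag:
                self.postings[f"{tag}:{low}"].add(doc_id)  # e.g. PERSON:bush
                self.postings[f"{tag}:*"].add(doc_id)      # e.g. PERSON:*

    def matches(self, query):
        """AND together all terms; '&' and whitespace both separate terms."""
        terms = query.replace("&", " ").split()
        result = None
        for term in terms:
            if ":" in term:
                tag, word = term.split(":", 1)
                key = f"{tag}:{word.lower()}"
            else:
                key = term.lower()
            docs = self.postings.get(key, set())
            result = set(docs) if result is None else result & docs
        return result or set()


index = TaggedIndex()
index.add(1, [("Bush", "PERSON"), ("visited", None), ("London", "LOCATION")])
index.add(2, [("the", None), ("bush", None), ("burned", None)])
index.add(3, [("bin-laden", "PERSON"), ("in", None), ("Afghanistan", "LOCATION")])
print(index.matches("PERSON:bush"))
print(index.matches("LOCATION:* & PERSON:bin-laden"))
```

The plain query `bush` matches both the person and the shrub, while `PERSON:bush` keeps only the document where the word was tagged as a person.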
![Page 75: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/75.jpg)
Coreference
Task:
– Determine space of entity extents:
  • Basal noun phrases:
    – Named entities consisting of multiple basal noun phrases are treated as a single entity
  • Pre-nominal proper nouns
  • Possessive pronouns
– Determine which extents refer to the same entity in the world
![Page 76: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/76.jpg)
Paragraph Retrieval
Indexing:
– add NE tags for each NE category present in the text
– add coreference relationships
– Use syntactically-based categorical relations to create a DESCRIPTION category for term expansion
– Use IXE passage indexer
![Page 77: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/77.jpg)
High Composability
[Figure: class diagram. DocInfo (name, date, size) and PassageDoc (text, boundaries) are held in Collection<DocInfo> and Collection<PassageDoc>; Cursor, QueryCursor and PassageQueryCursor each provide next().]
![Page 78: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/78.jpg)
Tagged Documents
Classes: QueryCursor, QueryCursorWord, QueryCursorTaggedWord
select documents where
– text matches bush
– text matches PERSON:bush
– text matches osama & LOCATION:*
![Page 79: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/79.jpg)
Combination
Searching passages on a collection of tagged documents
PassageQueryCursor<Collection<TaggedDoc>>
QueryCursor<Collection>
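The composition idea, a passage cursor layered on top of any document cursor that offers `next()`, can be sketched in a few lines. The slide's names suggest a template-based design; the Python classes below only mimic that layering, and their constructors, the predicate arguments and the blank-line passage splitter are assumptions.

```python
class Cursor:
    """Iterates over every item of a collection; next() returns None at the end."""

    def __init__(self, collection):
        self._items = iter(collection)

    def next(self):
        return next(self._items, None)


class QueryCursor(Cursor):
    """Skips items that do not satisfy the query predicate."""

    def __init__(self, collection, predicate):
        super().__init__(collection)
        self._predicate = predicate

    def next(self):
        while True:
            item = super().next()
            if item is None or self._predicate(item):
                return item


class PassageQueryCursor:
    """Wraps any document cursor and yields only the matching passages."""

    def __init__(self, doc_cursor, predicate, split=lambda doc: doc.split("\n\n")):
        self._docs = doc_cursor
        self._predicate = predicate
        self._split = split
        self._pending = []

    def next(self):
        while True:
            if self._pending:
                return self._pending.pop(0)
            doc = self._docs.next()
            if doc is None:
                return None
            self._pending = [p for p in self._split(doc) if self._predicate(p)]


docs = ["bush spoke\n\nweather report", "sports news", "rain\n\nbush again"]
has_bush = lambda text: "bush" in text
cursor = PassageQueryCursor(QueryCursor(docs, has_bush), has_bush)
print(cursor.next())
print(cursor.next())
```

Because the passage cursor depends only on `next()`, it works unchanged over plain or tagged document collections.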
![Page 80: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/80.jpg)
Paragraph Retrieval
Retrieval:
– Use question analysis component to predict answer category and append it to the question
– Evaluate using TREC questions and answer patterns
  • 500 questions
![Page 81: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/81.jpg)
System Overview
[Figure: system overview diagram. Indexing side: Documents, Tokenization, Paragr. Splitter, Sent. Splitter, POS tagger, NE Recognizer, Coreference Resolution, Description Extraction, IXE indexer. Retrieval side: Question, Question Analysis, IXE Search, Paragraphs, Paragraphs+.]
![Page 82: Advanced Information Retrieval - uni-leipzig.deasv.informatik.uni-leipzig.de/document/file_link/23/AdvIR-7.pdf · Advanced Information Retrieval Wintersemester 2007/08 Teil 7 Question](https://reader034.vdocuments.net/reader034/viewer/2022042100/5e7c9d98b654a52680639430/html5/thumbnails/82.jpg)
Conclusion
QA is a challenging task
Involves state of the art techniques in various fields:
– IR
– NLP
– AI
– Managing large data sets
– Advanced Software Technologies