AQUAINT
BBN’s AQUA Project
Ana Licuanan, Jonathan May, Scott Miller, Ralph Weischedel, Jinxi Xu
3 December 2002
BBN’s Approach to QA
• Theme: Use document retrieval, entity recognition, & proposition recognition
• Analyze the question
– Reduce question to propositions and a bag of words
– Predict the type of the answer
• Rank candidate answers using passage retrieval from the primary corpus (the AQUAINT corpus)
• Optionally use other knowledge sources (e.g., the Web) to re-rank answers
• Re-rank candidates based on propositions
• Estimate confidence for answers
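The ranking steps above can be sketched as a short Python pipeline. This is a minimal illustration under stated assumptions, not BBN’s implementation: the `Candidate` fields, the `answer` function, and the choice to apply each re-ranking step as a stable sort (so earlier evidence only breaks ties) are all hypothetical. Question analysis is assumed to have already produced the candidates and their proposition flags.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    text: str
    passage_score: float           # from passage retrieval over the primary (AQUAINT) corpus
    web_score: float = 0.0         # optional evidence from another source, e.g. the Web
    satisfies_props: bool = False  # does the answer satisfy the question's propositions?
    confidence: float = 0.0


def answer(candidates: List[Candidate], use_web: bool,
           estimate_confidence: Callable[[Candidate], float]) -> Optional[Candidate]:
    if not candidates:
        return None
    # 1. Rank candidates by passage-retrieval score.
    ranked = sorted(candidates, key=lambda c: c.passage_score, reverse=True)
    # 2. Optionally re-rank with another knowledge source. Python's sort is
    #    stable, so the passage ranking breaks ties.
    if use_web:
        ranked = sorted(ranked, key=lambda c: c.web_score, reverse=True)
    # 3. Re-rank on propositions: candidates that satisfy them move to the front.
    ranked = sorted(ranked, key=lambda c: c.satisfies_props, reverse=True)
    # 4. Attach a confidence estimate to the top answer.
    best = ranked[0]
    best.confidence = estimate_confidence(best)
    return best
```

For example, with candidates `Candidate("Compaq", 0.9)` and `Candidate("Dell", 0.8, satisfies_props=True)`, the Dell candidate is returned: it scores lower on passage retrieval but is the only one that satisfies the propositions.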
System Diagram
[Figure: system diagram. Input: Question; output: Answer & Confidence Score. Components: Question Classification, Web Search, Document Retrieval, Passage Retrieval, Name Extraction, Name Annotation, NP Labeling, Treebank, Parsing, Description Classification, Proposition Finding, Regularization, Proposition Bank, Confidence Estimation.]
Question Classification
Question Classification
• A hybrid approach based on rules, statistical parsing, and question templates
– Match question templates against statistical parses
– Back off to statistical bag-of-words classification
• Example features used for classification
– The type of WHNP starting the question (e.g., “Who”, “What”, “When”, …)
– The headword of the core NP
– The WordNet definition of the headword
– Bag of words
– Main verb of the question
• Performance
– Trained on TREC-8 and TREC-9 questions
– ~85% correct when tested on TREC-10
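The hybrid scheme can be sketched as follows, under simplifying assumptions: the templates are hypothetical regular expressions over the question string (the slide says templates are matched against statistical parses), and multinomial naive Bayes with add-one smoothing stands in for the statistical bag-of-words classifier, whose exact model the slide does not specify. `TEMPLATES`, `BagOfWordsNB`, and `classify` are illustrative names.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple

# Hypothetical templates: (pattern over the question string, answer type).
TEMPLATES = [
    (re.compile(r"^where\b", re.I), "LOCATION"),
    (re.compile(r"^(when|what year)\b", re.I), "DATE"),
    (re.compile(r"^how many\b", re.I), "CARDINAL"),
    (re.compile(r"^who\b", re.I), "PERSON"),
]


def tokens(question: str) -> List[str]:
    return re.findall(r"[a-z]+", question.lower())


class BagOfWordsNB:
    """Multinomial naive Bayes over question words, with add-one smoothing."""

    def __init__(self, examples: Iterable[Tuple[str, str]]) -> None:
        self.type_counts: Counter = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for question, qtype in examples:
            self.type_counts[qtype] += 1
            for w in tokens(question):
                self.word_counts[qtype][w] += 1
                self.vocab.add(w)

    def classify(self, question: str) -> str:
        total = sum(self.type_counts.values())
        best, best_score = "OTHER", -math.inf
        for qtype, count in self.type_counts.items():
            words = self.word_counts[qtype]
            denom = sum(words.values()) + len(self.vocab)
            score = math.log(count / total)  # log prior of the type
            for w in tokens(question):
                score += math.log((words[w] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best, best_score = qtype, score
        return best


def classify(question: str, backoff: BagOfWordsNB) -> str:
    # Try the hand-written templates first ...
    for pattern, qtype in TEMPLATES:
        if pattern.search(question):
            return qtype
    # ... and back off to the statistical classifier when none matches.
    return backoff.classify(question)
```

A template hit short-circuits the statistical model, so the bag-of-words classifier only sees questions the rules cannot handle.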
Examples of Question Analysis
• Where is the Taj Mahal?
– WHNP=where
– Answer type: Location or GPE
• Which pianist won the last International Tchaikovsky Competition?
– Headword of core NP=pianist
– WordNet definition=person
– Answer type: Person
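The two examples suggest a simple lookup: apply a WHNP rule when the wh-word alone fixes the type, and otherwise walk up the headword’s hypernyms until reaching a concept that maps to an answer type. The sketch below uses a tiny hand-written hypernym table (`HYPERNYMS`) as a stand-in for WordNet; the chain, the `WH_TO_TYPES` rules, and the concept-to-type mapping are assumptions for illustration. With NLTK installed, `wordnet.synsets(word)` and `Synset.hypernym_paths()` could supply real chains.

```python
from typing import List, Optional

# Assumed WHNP rules: wh-words that determine the answer type on their own.
WH_TO_TYPES = {"where": ["LOCATION", "GPE"], "when": ["DATE", "TIME"]}

# Toy stand-in for WordNet hypernym links (word -> more general concept).
HYPERNYMS = {
    "pianist": "musician",
    "musician": "performer",
    "performer": "entertainer",
    "entertainer": "person",
}

# Assumed mapping from general concepts to answer types.
CONCEPT_TO_TYPE = {"person": "PERSON", "location": "LOCATION", "company": "ORGANIZATION"}


def answer_types(whnp: str, headword: Optional[str] = None) -> List[str]:
    whnp = whnp.lower()
    if whnp in WH_TO_TYPES:
        return WH_TO_TYPES[whnp]
    # Otherwise ("which", "what", ...), climb from the core NP's headword.
    node = headword
    while node is not None:
        if node in CONCEPT_TO_TYPE:
            return [CONCEPT_TO_TYPE[node]]
        node = HYPERNYMS.get(node)
    return ["OTHER"]
```

`answer_types("Where")` reproduces the first example, and `answer_types("which", "pianist")` climbs pianist → musician → performer → entertainer → person to reach PERSON.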
Question-Answer Types
Type: Subtypes
ORGANIZATION: CORPORATION, EDUCATIONAL, GOVERNMENT, HOSPITAL, HOTEL, MUSEUM, OTHER, POLITICAL, RELIGIOUS
LOCATION: CONTINENT, LAKE_SEA_OCEAN, OTHER, REGION, RIVER, BORDER
FAC: AIRPORT, ATTRACTION, BRIDGE, BUILDING, HIGHWAY_STREET, OTHER
GAME
PRODUCT: DRUG, OTHER, VEHICLE, WEAPON
NATIONALITY: NATIONALITY, OTHER, POLITICAL, RELIGION
LANGUAGE
FAC_DESC: AIRPORT, ATTRACTION, BRIDGE, BUILDING, HIGHWAY_STREET, OTHER
MONEY
GPE_DESC: CITY, COUNTRY, OTHER, STATE_PROVINCE
ORG_DESC: CORPORATION, EDUCATIONAL, GOVERNMENT, HOSPITAL, HOTEL, MUSEUM, OTHER, POLITICAL, RELIGIOUS
CONTACT_INFO: ADDRESS, OTHER, PHONE
WORK_OF_ART: BOOK, OTHER, PAINTING, PLAY, SONG
*Thanks to USC/ISI and IBM groups for sharing the conclusions of their analyses.
Question-Answer Types (cont’d)
PRODUCT_DESC: OTHER, VEHICLE, WEAPON
PERSON
EVENT: HURRICANE, OTHER, WAR
SUBSTANCE: CHEMICAL, DRUG, FOOD, OTHER
PER_DESC
PRODUCT: OTHER
ORDINAL
ANIMAL
QUANTITY: 1D, 1D_SPACE, 2D, 2D_SPACE, 3D, 3D_SPACE, ENERGY, OTHER, SPEED, WEIGHT, TEMPERATURE
GPE: CITY, COUNTRY, OTHER, STATE_PROVINCE
DISEASE
CARDINAL
AGE
TIME
PLANT
PERCENT
LAW
DATE: AGE, DATE, DURATION, OTHER
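Because the inventory has two levels, a tagged entity can be checked against a predicted answer type either at the type level (any GPE satisfies “Location or GPE” in the Taj Mahal example) or at the subtype level. The sketch below encodes a fragment of the table and performs that check; the `TYPE:SUBTYPE` label format and the `parse_label` and `compatible` functions are hypothetical, not part of the slides.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Fragment of the two-level inventory above: TYPE -> allowed subtypes.
TAXONOMY: Dict[str, Set[str]] = {
    "GPE": {"CITY", "COUNTRY", "OTHER", "STATE_PROVINCE"},
    "LOCATION": {"CONTINENT", "LAKE_SEA_OCEAN", "OTHER", "REGION", "RIVER", "BORDER"},
    "PERSON": set(),
}


def parse_label(label: str) -> Tuple[str, Optional[str]]:
    """Split a 'TYPE:SUBTYPE' (or bare 'TYPE') label and validate it."""
    qtype, _, subtype = label.partition(":")
    if qtype not in TAXONOMY:
        raise ValueError(f"unknown type: {qtype}")
    if subtype and subtype not in TAXONOMY[qtype]:
        raise ValueError(f"unknown subtype {subtype!r} for {qtype}")
    return qtype, subtype or None


def compatible(entity_label: str, expected: Iterable[str]) -> bool:
    """True if a tagged entity satisfies any expected TYPE or TYPE:SUBTYPE."""
    etype, esub = parse_label(entity_label)
    for label in expected:
        xtype, xsub = parse_label(label)
        # A bare expected type accepts every subtype; a subtype must match exactly.
        if etype == xtype and (xsub is None or xsub == esub):
            return True
    return False
```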
Frequency of Q Types
[Chart: number of questions of each type in TREC 8, 9, and 10 (y-axis 0 to 250). Types: Person, Quantity, Money, Percent, Organization, Organization-Desc, Product-Name, Product-Desc, Facility, Disease, Reason, GPE, GPE-Desc, Work-of-Art, Date, Event, Time, Language, Nationality, Location-Name, Definition, Use, Other, Cardinal, Ordinal, Game, Contact Info, Animal, Plant, Bio, Cause-Effect-Influence, Law.]
Interpretation
IdentiFinder™ Status
• Current IdentiFinder performance on the answer types, at the category and subcategory levels
• IdentiFinder easily trainable for other languages, e.g., Arabic and Chinese
[Chart: Recall, Precision, and F (y-axis 0 to 100) for Category and Subcategory; data labels read 88, 89, 88.4, 87, 88, 87.3.]
Proposition Indexing
• A shallow semantic representation
– Deeper than bags of words
– But broad enough to cover all the text
• Characterizes documents by
– The entities they contain
– Propositions involving those entities
• Resolves all references to entities
– Whether named, described, or pronominal
• Represents all propositions that are directly stated in the text
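One way to picture such an index is as inverted maps: from each entity (after reference resolution, so names, descriptions, and pronouns share one id) to the propositions it takes part in, and from each predicate to its propositions. The Python sketch below assumes that shape; `Proposition`, `PropositionIndex`, and the document and entity ids are illustrative, and reference resolution is assumed to have happened upstream.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Proposition:
    predicate: str
    args: Tuple[Tuple[str, str], ...]  # (role, entity id) pairs, e.g. ("subj", "e1")


class PropositionIndex:
    def __init__(self) -> None:
        self.mentions = defaultdict(set)      # (doc, entity) -> surface strings
        self.by_entity = defaultdict(set)     # (doc, entity) -> propositions
        self.by_predicate = defaultdict(set)  # predicate -> {(doc, proposition)}

    def add_mention(self, doc: str, entity: str, text: str) -> None:
        # Names, descriptions, and pronouns resolved to one entity share its id.
        self.mentions[(doc, entity)].add(text.lower())

    def add_proposition(self, doc: str, prop: Proposition) -> None:
        self.by_predicate[prop.predicate].add((doc, prop))
        for _role, entity in prop.args:
            self.by_entity[(doc, entity)].add(prop)

    def propositions_mentioning(self, text: str) -> Set[Proposition]:
        """All propositions involving any entity referred to by `text`."""
        found: Set[Proposition] = set()
        for key, texts in self.mentions.items():
            if text.lower() in texts:
                found |= self.by_entity.get(key, set())
        return found
```

Once “Dell” and a later “it” are both recorded under the same entity id, looking up either mention returns the same propositions, which is what resolving all references buys.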
Proposition Finding Example
Propositions
• (e1: “Dell”)
• (e2: “Compaq”)
• (e3: “the most PCs”)
• (e4: “2001”)
• (sold subj:e1, obj:e3, in:e4)
• (beating subj:e1, obj:e2)
• Question: Which company sold the most PCs in 2001?
• Text: Dell, beating Compaq, sold the most PCs in 2001.
• Passage retrieval alone would select the wrong answer
• Answer: e1 (“Dell”), the subject of the “sold” proposition that matches the question
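The example can be reproduced with a small matcher: the question becomes a proposition with one open role (marked `?`), and the answer is the entity filling that role in a text proposition whose predicate and remaining arguments match. This sketch assumes exact string matching on predicates and argument mentions; lemmatization and reference resolution are left out, and `find_answers` is a hypothetical name.

```python
from typing import Dict, List, Tuple

ENTITIES = {"e1": "Dell", "e2": "Compaq", "e3": "the most PCs", "e4": "2001"}
TEXT_PROPS = [
    ("sold", {"subj": "e1", "obj": "e3", "in": "e4"}),
    ("beating", {"subj": "e1", "obj": "e2"}),
]
# "Which company sold the most PCs in 2001?"; "?" marks the role the answer fills.
QUESTION_PROP = ("sold", {"subj": "?", "obj": "the most PCs", "in": "2001"})


def find_answers(question_prop: Tuple[str, Dict[str, str]],
                 text_props: List[Tuple[str, Dict[str, str]]],
                 entities: Dict[str, str]) -> List[str]:
    qpred, qargs = question_prop
    answers = []
    for pred, args in text_props:
        if pred != qpred:
            continue
        answer_role, ok = None, True
        for role, value in qargs.items():
            if value == "?":
                answer_role = role
            elif entities.get(args.get(role)) != value:
                ok = False  # a constrained argument does not match
                break
        if ok and answer_role in args:
            answers.append(entities[args[answer_role]])
    return answers
```

Compaq never fills the subject role of “sold”, so it cannot be returned for this question, even though it appears right before “sold” in the sentence.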
Proposition Recognition Strategy
• Start with a lexicalized probabilistic context-free grammar (LPCFG) parsing model
• Distinguish names by replacing NP labels with NPP
• Currently, rules normalize the parse tree to produce propositions
• At a later date, extend the statistical model to
– Predict argument labels for clauses
– Resolve references to entities
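The NP-to-NPP step can be pictured as a tree transformation applied wherever a noun phrase exactly spans a recognized name, for instance when preparing training trees from parses plus name annotation. The sketch below assumes trees as nested tuples `(label, child, ...)` with word leaves and a set of name token spans; `leaves` and `mark_names` are hypothetical helpers, and the slide does not say how the relabeling is implemented.

```python
from typing import Set, Tuple, Union

Tree = Union[str, Tuple]  # a leaf word, or (label, child, child, ...)


def leaves(tree: Tree) -> Tuple[str, ...]:
    """The words under a node, left to right."""
    if isinstance(tree, str):
        return (tree,)
    return tuple(w for child in tree[1:] for w in leaves(child))


def mark_names(tree: Tree, names: Set[Tuple[str, ...]]) -> Tree:
    """Relabel NP nodes whose words exactly match a recognized name as NPP."""
    if isinstance(tree, str):
        return tree
    label, children = tree[0], tree[1:]
    new_children = tuple(mark_names(child, names) for child in children)
    if label == "NP" and leaves(tree) in names:
        label = "NPP"
    return (label,) + new_children
```

For example, with `names={("Dell",)}`, the subject NP of “Dell sold the most PCs” becomes NPP while the object NP keeps its label.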
Confidence Estimation
• Compute the probability $P(\text{correct} \mid Q, A)$ from the following features:
$P(\text{correct} \mid Q, A) \approx P(\text{correct} \mid \text{type}(Q), \langle m, n \rangle, \text{PropSat})$
– type(Q): question type
– m: question length
– n: number of matched question words in the answer context
– PropSat: whether the answer satisfies the propositions in the question
• Confidence for answers found on the Web:
$P(\text{correct} \mid Q, A) \approx P(\text{correct} \mid \text{Freq}, \text{InTrec})$
– Freq: number of Web hits, using Google
– InTrec: whether A was also a top answer from the AQUAINT corpus
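The slides do not say how these conditional probabilities are estimated. One simple possibility, sketched below, is relative frequency over held-out questions whose answers have been judged, smoothed toward the overall accuracy so that rare feature combinations are not over-trusted. The feature tuples, the smoothing weight `alpha`, and the frequency bucketing in `web_features` are all assumptions.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Tuple


def train(examples: Iterable[Tuple[Hashable, bool]],
          alpha: float = 2.0) -> Callable[[Hashable], float]:
    """Estimate P(correct | features) by smoothed relative frequency.

    `examples` are (features, was_correct) pairs from judged answers, where
    `features` is e.g. (qtype, m, n, prop_sat) or a web_features(...) tuple.
    """
    seen, right = Counter(), Counter()
    for feats, correct in examples:
        seen[feats] += 1
        right[feats] += int(correct)
    if not seen:
        raise ValueError("need at least one judged example")
    prior = sum(right.values()) / sum(seen.values())  # overall fraction correct

    def p_correct(feats: Hashable) -> float:
        # Unseen feature tuples fall back to the prior; frequent ones approach their own rate.
        return (right[feats] + alpha * prior) / (seen[feats] + alpha)

    return p_correct


def web_features(freq: int, in_trec: bool, bucket: int = 25) -> Tuple[int, bool]:
    """Bucket the Web hit count so that nearby frequencies share statistics."""
    return (min(freq // bucket, 6), in_trec)
```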
Dependence of Answer Correctness on Question Type
[Chart: P(correct|Type) for each question type (y-axis 0 to 0.5).]
Dependence on Proposition Satisfaction
[Chart: P(correct|PropSat) for PropSat=True and PropSat=False (y-axis 0 to 0.6).]
Dependence on Number of Matched Words
[Chart: p(correct) vs. number of matched words (0 to 6), one curve each for question length 3, 4, and 5 (y-axis 0 to 0.5).]
Dependence of Answer Correctness on Web Frequency
[Chart: P(correct|F, INTREC) vs. frequency of answer in Google summaries (0 to 150), one curve each for INTREC true and INTREC false (y-axis 0 to 1).]
Official Results of TREC 2002 QA
Run tag     Unranked Average Precision   Ranked Average Precision   Upper Bound
BBN2002A    0.186                        0.257                      0.498
BBN2002B    0.288                        0.468                      0.646
BBN2002C    0.284                        0.499                      0.641
• BBN2002A did not use the Web
• BBN2002B and BBN2002C used the Web
• Unranked average precision = fraction of questions for which the first answer is correct
• Ranked average precision = confidence-weighted score, the official metric for TREC 2002
• Upper bound = confidence-weighted score given perfect confidence estimation
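The confidence-weighted score can be computed from its definition: with the $Q$ answers sorted from most to least confident, the score is $\frac{1}{Q}\sum_{i=1}^{Q} \frac{\text{number correct among the first } i}{i}$. Perfect confidence estimation means sorting every correct answer ahead of every incorrect one, which yields the upper bound. The function names below are illustrative.

```python
from typing import Sequence


def confidence_weighted_score(correct_by_confidence: Sequence[bool]) -> float:
    """Confidence-weighted score.

    `correct_by_confidence` holds one judgment per question, ordered from the
    most to the least confident answer.
    """
    q = len(correct_by_confidence)
    if q == 0:
        raise ValueError("no questions")
    total, n_correct = 0.0, 0
    for i, ok in enumerate(correct_by_confidence, start=1):
        n_correct += int(ok)
        total += n_correct / i  # precision over the first i answers
    return total / q


def upper_bound(correct: Sequence[bool]) -> float:
    """Score under perfect confidence estimation: all correct answers ranked first."""
    return confidence_weighted_score(sorted(correct, reverse=True))
```

With the 500 questions of the TREC 2002 main task, 93 correct first answers (0.186 × 500, BBN2002A’s unranked score) give an upper bound of about 0.498, and 144 correct answers (0.288 × 500, BBN2002B) give about 0.646, matching the Upper Bound column.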
Recent Progress
• In the last six months, we have:
– Retrained our name tagger (IdentiFinder™) for roughly 29 question types
– Distributed the retrained English version of IdentiFinder to other sites
– Participated in the Question Answering track of TREC 2002
– Participated in a pilot evaluation of automatically answering definitional/biographical questions
– Developed a demonstration of our question answering system, AQUA, running against streaming news