PIQUANT
DESCRIPTION
PIQUANT: Practical Intelligent QUestion ANswering Technology. A Question Answering system integrating Information Retrieval, Natural Language Processing and Knowledge Representation. Prime Contractor: IBM T.J. Watson Research Center, 30 Saw Mill River Road, Hawthorne, NY 10532.
TRANSCRIPT
PIQUANT at AQUAINT Kick-Off Dec 3-5 2001
PIQUANT: Practical Intelligent QUestion ANswering Technology
A Question Answering system integrating Information Retrieval, Natural Language Processing and Knowledge Representation
Prime Contractor: IBM T.J. Watson Research Center, 30 Saw Mill River Road, Hawthorne, NY 10532
Subcontractor: Cycorp
IBM & Cycorp: Bringing Complementary Strengths to QA
• IBM
– Information Retrieval
– Natural Language Processing
– Scalable System Architectures
– Business Applications Architectures
• Cycorp
– Structured Knowledge Representation
– Rich Common Sense Knowledge Bases
– Deep Inferencing
– Ontologies
Together: both symbolic and statistical approaches
Experience from TREC8-10
• End-to-end system that has performed well
• Invaluable experience in learning where the problems are:
– Coverage
– Engineering
– Understanding
IBM’s PIQUANT Principal Extensions
• Integration of IR/NLP with Structured KBs and Deep Inference
– Knowledge System to assist in decomposing and answering questions
– Provide justification and/or invalidation of candidate answers
• Parallel Solution Paths and Pervasive Confidence Analysis
– Multiple parallel solution approaches to problem/subproblem
– Pervasive use of confidences to mediate management of alternatives
– Extensive reinforcement of symbolic approaches by statistical data
• Well-Defined Component Architecture
– Modular
– Defined interfaces between NLP, IR, KS and Statistical Components
– Declarative representation of question answering plans
Where Knowledge-Systems Help
The heuristic of finding short passages containing all the query terms/semantic classes is good but not sufficient, e.g. from TREC9:
Q: How much folic acid should an expectant mother take daily?
A: 360 tons
Q: What is the diameter of the Earth?
A: 14 ft.
Q: How many states have a lottery?
A: 3,312
We will investigate the use of a sophisticated inference engine and knowledge-base (Cyc) to eliminate such answers.
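To make the idea concrete, here is a minimal sketch of a KB-backed sanity check; the range table, units and function name are illustrative assumptions, not PIQUANT's actual interface:

```python
# Hypothetical sketch: reject candidate answers whose quantity falls outside
# a KB-derived plausible range for the question's answer type.
PLAUSIBLE_RANGES = {
    # answer type -> (min, max, unit); illustrative values only
    "daily_vitamin_dose": (1e-6, 1e-2, "grams"),
    "planet_diameter":    (1e3, 1e6, "kilometers"),
    "us_state_count":     (1, 50, "states"),
}

def plausible_answer(answer_type: str, value: float, unit: str) -> bool:
    """True if the candidate quantity lies within the KB's plausible range."""
    lo, hi, expected_unit = PLAUSIBLE_RANGES[answer_type]
    if unit != expected_unit:
        return False  # a real system would first convert units via the KB
    return lo <= value <= hi

# "360 tons" of folic acid is roughly 3.27e8 grams, far outside the range:
print(plausible_answer("daily_vitamin_dose", 3.27e8, "grams"))  # False
```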
Question Complexity
“Simple” questions are not a solved problem:
• Complex questions can be decomposed into simpler components.
• If simpler questions cannot be handled successfully, there’s no hope for more complex ones.
BUT:
Areas (intentionally) not explored by TREC to date:
• spelling errors
• grammatical errors
• syntactic precision, e.g. the significance of articles
• not, only, just …
Is there such a thing as a “simple” question?
A: How many members are there in the Cabinet?
B: What is the meaning of life?
Which is more complex?
A: suppose there is no text that gives the answer explicitly.
B: 42 (from HGTTG).
“simple” -> “simple to state”
Complexity is a function of the question and the data source
Different Solution Approaches: What is the largest city in England?
• Text Match
– Find text that says “London is the largest city in England” (or a paraphrase).
– Confidence is (confidence of NL parser) × (confidence of source). Find multiple instances and the confidence of source -> 1.
• “Superlative” Search
– Find a table of English cities and their populations, and sort.
– Find a list of the 10 largest cities in the world, and see which are in England. Uses logic: if L > all objects in set R, then L > all objects in any set E ⊆ R.
– Find the population of as many individual English cities as possible, and choose the largest.
• Heuristics
– London is the capital of England. (Not guaranteed to imply it is the largest city, but quite likely.)
• Complex Inference
– E.g. “Birmingham is England’s second-largest city”; “Paris is larger than Birmingham”; “London is larger than Paris”; “London is in England”.
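A toy sketch of running such strategies in parallel and reinforcing agreeing answers; the strategy functions and confidence values are invented for illustration:

```python
# Illustrative sketch: each strategy returns (answer, confidence); answers
# proposed by multiple independent strategies reinforce each other.
from collections import defaultdict

def text_match(question):         return ("London", 0.70)  # parsed assertion
def superlative_search(question): return ("London", 0.80)  # sorted population table
def capital_heuristic(question):  return ("London", 0.50)  # capital is usually largest

def answer(question, strategies):
    scores = defaultdict(float)
    for strategy in strategies:
        candidate, conf = strategy(question)
        # noisy-OR style reinforcement: independent evidence raises confidence
        scores[candidate] = 1 - (1 - scores[candidate]) * (1 - conf)
    return max(scores.items(), key=lambda kv: kv[1])

print(answer("What is the largest city in England?",
             [text_match, superlative_search, capital_heuristic]))
# ('London', ~0.97)
```

The noisy-OR combination reflects the slide's point: several moderately confident, independent sources of the same answer yield high overall confidence.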
Parallel Confidence Propagation
[Diagram: question classifications, each with a confidence, produce QFRAMES; QPLANS express goals (logical forms) with boolean connectives, sequencing and recombination information; validation and sanity checks eliminate some answers and adjust confidences, turning candidate answers into selected answers.]
Probability Management
• Associated with every data element
• A priori probabilities associated with every processing module; given default values at first, then learned as experience is gained
• Bayesian, Dempster-Shafer, …
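As a hedged illustration of the “default values first, then learned” idea, a module's a priori probability could be maintained with Beta-style pseudo-counts; this is one plausible scheme, not necessarily the one PIQUANT uses:

```python
# Sketch (an assumption, not the actual design): each module carries an
# a priori probability of being correct, updated as experience accumulates.
class ModuleConfidence:
    def __init__(self, prior=0.5, strength=10):
        # Beta-distribution style pseudo-counts, seeded with the default prior
        self.correct = prior * strength
        self.total = strength

    def record(self, was_correct: bool):
        """Update from one observed outcome (e.g. a judged TREC answer)."""
        self.correct += 1.0 if was_correct else 0.0
        self.total += 1.0

    @property
    def probability(self) -> float:
        return self.correct / self.total

parser = ModuleConfidence(prior=0.5)
for outcome in [True, True, False, True]:
    parser.record(outcome)
print(round(parser.probability, 3))  # 0.571: drifts from the 0.5 default
```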
QA-Manager Internals
[Architecture diagram: NLP components perform linguistic question analysis, producing a QFRAME; question classification and plan generation turn the QFRAME into QPLANS containing QGOALs; the QPLAN execution engine draws on knowledge representation, reasoning services, and ontology & data services (IR, WordNet, databases, the Cyc KB); answer resolution ranks the answer candidates and passes answers to answer presentation.]
Question Classification “Daemons”
• Definition – What is OPEC?
• Comparative & Superlative – Does Kuwait export more oil than Venezuela? Which country exports the most uranium?
• Profile – Who is Rabbani?
• Relationship – Which countries are allies of Qatar?
• Chronology – Was OPEC formed before Nixon became president?
• Enumeration – How many oil refineries are in the U.S.?
• Cause & Effect – Why did Iraq invade Kuwait?
• Combination – Which countries are Qatar’s most powerful allies?
Classifiers act as “daemons”; they perform recognition and sub-plan generation
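A minimal sketch of the daemon idea; the trigger patterns, sub-plan names and confidences are hypothetical stand-ins for the real recognizers:

```python
# Hypothetical sketch: each classification daemon inspects the question and,
# if it fires, contributes a typed sub-plan with an a priori confidence.
import re

DAEMONS = [
    # (question type, trigger pattern, sub-plan name, a priori confidence)
    ("Definition",  re.compile(r"^what is\b", re.I),         "lookup-gloss",    0.8),
    ("Profile",     re.compile(r"^who is\b", re.I),          "build-profile",   0.8),
    ("Enumeration", re.compile(r"^how many\b", re.I),        "count-instances", 0.7),
    ("Chronology",  re.compile(r"\b(before|after)\b", re.I), "compare-dates",   0.6),
]

def classify(question: str):
    """Return (type, sub-plan, confidence) for every daemon that recognizes the question."""
    return [(qtype, plan, conf)
            for qtype, pattern, plan, conf in DAEMONS
            if pattern.search(question)]

print(classify("What is OPEC?"))   # [('Definition', 'lookup-gloss', 0.8)]
```

Because every daemon runs, a question can receive more than one classification, matching the Combination category above.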
Architectural Features
• Modularity
– Self-contained components with well-defined functions and interfaces
– Ease of development, experimentation and maintenance
• Robustness
– If a “Knowledge Source” fails, the system will continue to operate with (minor) degradation
– Exploit redundancy to find the best answer
• Reinforcement
– Multiple sources of evidence for the same answer are synergistic
• Transparency
– Explicit plans permit ready generation of explanations and symbolic analysis
Implementation Highlights
• Predictive Annotation
– Shift computational burden from NLP towards IR
– Index semantic labels along with text
– Beat the precision-recall tradeoff by boosting precision at little cost to recall
• Virtual Annotation
– Answer definitional (“What is”) questions by a combination of linguistic, ontological and statistical techniques
– Find the hypernyms in e.g. WordNet that have the best combination of closeness and co-occurrence
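A rough sketch of the virtual-annotation idea using NLTK's WordNet interface; the scoring formula and the corpus co-occurrence counter are assumptions made purely for illustration:

```python
# Rough sketch: for a definitional question, score WordNet hypernyms of the
# focus term by combining closeness (few hops up) with corpus co-occurrence.
from nltk.corpus import wordnet as wn

def cooccurrence_count(term_a: str, term_b: str) -> int:
    """Hypothetical corpus statistic: number of sentences containing both terms."""
    counts = {("opec", "organization"): 120, ("opec", "group"): 15}
    return counts.get((term_a.lower(), term_b.lower()), 1)

def best_hypernym(focus: str, max_hops: int = 3):
    candidates = []
    for synset in wn.synsets(focus, pos=wn.NOUN):
        frontier, hops = [synset], 0
        while frontier and hops < max_hops:
            hops += 1
            frontier = [h for s in frontier for h in s.hypernyms()]
            for hyp in frontier:
                name = hyp.lemma_names()[0].replace("_", " ")
                # nearer hypernyms and frequent co-occurrence both score higher
                candidates.append((cooccurrence_count(focus, name) / hops, name))
    return max(candidates)[1] if candidates else None

# e.g. best_hypernym("OPEC") -- the result depends on the WordNet version
```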
Predictive Annotation (1)
• Annotate the entire corpus and index semantic labels along with the text
• Identify the sought-after label(s) in questions and include them in queries
• Example: the question is “Where is Belize?”
– “Where” can map to CONTINENT$, COUNTRY$, STATE$, CITY$, CAPITAL$, PLACE$.
– Knowing Belize is a country: “Where is Belize?” → {CONTINENT$ Belize}
(assume CONTINENT$ covers continents plus sub-continental regions)
• Suppose the text is “… including Belize in central America …”; it is annotated as
… including {COUNTRY$ PLACE$ Belize} in {CONTINENT$ PLACE$ central America} …
Predictive Annotation (2)
Increased precision of the enhanced bag-of-words:
– “Where is Belize?” → {CONTINENT$ Belize}
– Belize occurs 704 times in the TREC corpus
– Belize and CONTINENT$ co-occur in only 22 sentences
• Note: the data structure is equally appropriate for “Name a country in Central America”, which maps to {COUNTRY$ Central America}
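A toy sketch of the predictive-annotation data structure: semantic labels are indexed like terms, so the label required by the question simply joins the query. The annotator and index here are simplified stand-ins:

```python
# Toy sketch: index semantic labels alongside tokens, then require the
# sought-after label to co-occur with the question's keywords.
from collections import defaultdict

def annotate(sentence):
    """Stand-in annotator: returns (token-span, labels) pairs."""
    return [("Belize", {"COUNTRY$", "PLACE$"}),
            ("central America", {"CONTINENT$", "PLACE$"})]

index = defaultdict(set)          # term or label -> sentence ids

def index_sentence(sid, sentence):
    for span, labels in annotate(sentence):
        index[span.lower()].add(sid)
        for label in labels:
            index[label].add(sid)

index_sentence(0, "... including Belize in central America ...")

# "Where is Belize?" -> {CONTINENT$ Belize}: intersect the posting lists
hits = index["CONTINENT$"] & index["belize"]
print(hits)  # {0} -- only sentences where the label co-occurs with "Belize"
```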
Summary
• Leverage existing technology base
• Parallel approaches to finding the answer, exploiting redundancy
• Declarative plan representation
• Associate confidences with each component and each intermediate and final result
• Cyc’s knowledge base and inference engine to solve sub-problems and eliminate nonsensical answer candidates
High-Level 1st Year Development Plan
• Finalize design of data-structures:
– QFRAME: question and derived attributes
– QPLAN: script for tackling solution
– QGOAL: logical-form like structure representing predicate for instantiation or verification
• Build several recognizers and QPLAN executor (many pieces already exist)
• Run on many examples to fine-tune and to develop a priori component confidence values
• Build answer resolution module
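Purely as a speculative sketch, the three data structures might look as follows; the fields are inferred from the one-line descriptions above, not from the real design:

```python
# Speculative sketch of the three data structures named on this slide; the
# fields are guessed from the one-line descriptions, not the actual design.
from dataclasses import dataclass, field

@dataclass
class QGoal:
    predicate: str                 # logical-form-like predicate to instantiate/verify
    arguments: list
    confidence: float = 0.0

@dataclass
class QPlan:
    question_type: str             # e.g. "Definition", "Enumeration"
    steps: list = field(default_factory=list)   # KB queries, text searches, ...
    goals: list = field(default_factory=list)   # QGoal instances

@dataclass
class QFrame:
    question: str
    derived_attributes: dict = field(default_factory=dict)  # focus, answer type, ...
    plans: list = field(default_factory=list)               # candidate QPlans
```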
Statistical Features
• Co-occurrences to support definition answers
• Machine Learning to evaluate search engine results
• Machine Learning to assist in answer selection
• Learn probable confidence of question recognizers
QPLAN
• Multiple per question type
• Declarative representation of a solution
– Independent of the knowledge sources’ details
• Executed by a planning engine
• Sequence of solution steps:
– structured-knowledge queries
– text search queries
– statistical queries, etc.
• Confidences learned over time
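For illustration only, a declarative QPLAN could be plain data interpreted by a generic engine; the step vocabulary and handler functions below are invented:

```python
# Invented example of a declarative QPLAN: pure data, so the planning engine
# can execute, reorder or explain it without knowing knowledge-source details.
DEFINITION_QPLAN = {
    "question_type": "Definition",
    "steps": [
        {"kind": "text_search",   "query": "{focus} is a",       "confidence": 0.6},
        {"kind": "structured_kb", "query": "(#$isa {focus} ?X)", "confidence": 0.8},
        {"kind": "statistical",   "query": "hypernyms({focus})", "confidence": 0.5},
    ],
}

def execute(plan, focus, handlers):
    """Generic engine: dispatch each step to a handler keyed by its kind."""
    return [(handlers[s["kind"]](s["query"].format(focus=focus)), s["confidence"])
            for s in plan["steps"]]

handlers = {"text_search":   lambda q: "an oil cartel",
            "structured_kb": lambda q: "#$InternationalOrganization",
            "statistical":   lambda q: "organization"}
print(execute(DEFINITION_QPLAN, "OPEC", handlers))
```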
High-level View of Solution Steps
1. Question is processed by linguistic tools.
2. Question is classified into 1 or more types.
3. Parallel solution plan is generated and executed.
4. Responses are gathered and examined.
5. If necessary, plan is revised and steps 3-5 revisited.
6. Candidate answers are checked for sanity, merged, sorted and presented.
Note:
a. Dialog manager functions are not considered here.
b. All data-structures are assigned confidences and all selections of next steps are mediated by probabilistic computations.
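Tying the six steps together, a skeletal control loop might look like this; every component here is a trivial stub standing in for the real module:

```python
# Skeletal control loop for steps 1-6; all components are trivial stubs.
def linguistic_analysis(q):  return {"question": q, "focus": q.split()[-1].rstrip("?")}
def classify(qframe):        return ["Definition"]
def generate_plan(qf, ts):   return {"types": ts, "attempt": 0}
def execute_parallel(plan):  return [("a multinational oil cartel", 0.7)]
def sanity_check(cands):     return [c for c in cands if c[1] > 0.5]

def answer_question(question):
    qframe = linguistic_analysis(question)          # 1. linguistic tools
    qtypes = classify(qframe)                       # 2. one or more types
    plan = generate_plan(qframe, qtypes)            # 3. generate parallel plan
    while True:
        candidates = execute_parallel(plan)         # 3. ...and execute it
        if candidates or plan["attempt"] >= 2:      # 4. gather and examine
            break
        plan["attempt"] += 1                        # 5. revise and retry
    survivors = sanity_check(candidates)            # 6. sanity check, merge,
    return sorted(survivors, key=lambda c: -c[1])   #    sort and present

print(answer_question("What is OPEC?"))
```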