

PIQUANT at AQUAINT Kick-Off Dec 3-5 2001

PIQUANT: Practical Intelligent QUestion ANswering Technology

A Question Answering system integrating Information Retrieval, Natural Language Processing and Knowledge Representation

Prime Contractor: IBM T.J. Watson Research Center

30 Saw Mill River Road, Hawthorne, NY 10532

Subcontractor: Cycorp


IBM & Cycorp: Bringing Complementary Strengths to QA

• IBM

– Information Retrieval

– Natural Language Processing

– Scalable System Architectures

– Business Applications Architectures

• Cycorp

– Structured Knowledge Representation

– Rich Common Sense Knowledge Bases

– Deep Inferencing

– Ontologies

Both symbolic and statistical


Experience from TREC8-10

• End-to-end system that has performed well

• Invaluable experience in learning where the problems are:

– Coverage

– Engineering

– Understanding


IBM’s PIQUANT Principal Extensions

• Integration of IR/NLP with Structured KBs and Deep Inference

– Knowledge System to assist in decomposing and answering questions

– Provide justification and/or invalidation of candidate answers

• Parallel Solution Paths and Pervasive Confidence Analysis

– Multiple parallel solution approaches to problem/subproblem

– Pervasive use of confidences to mediate management of alternatives

– Extensive reinforcement of symbolic approaches by statistical data

• Well-Defined Component Architecture

– Modular

– Defined interfaces between NLP, IR, KS and Statistical Components

– Declarative representation of question answering plans


Where Knowledge-Systems Help

The heuristic of finding short passages that contain all the query terms and semantic classes is good but not sufficient. E.g., from TREC9:

Q: How much folic acid should an expectant mother take daily?

A: 360 tons

Q: What is the diameter of the Earth?

A: 14 ft.

Q: How many states have a lottery?

A: 3,312

We will investigate the use of a sophisticated inference engine and knowledge-base (Cyc) to eliminate such answers.
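The kind of check such a knowledge base could support can be sketched as below. This is only an illustration under stated assumptions, not the PIQUANT or Cyc mechanism: the `PLAUSIBLE_RANGES` table, the unit factors, the quantity names and the function `is_plausible` are all hypothetical, and the bounds are deliberately loose.

```python
# Hypothetical sanity filter: reject candidate answers whose magnitude is
# implausible for the kind of quantity the question asks about.

# Conversion factors to a base unit per dimension (grams, metres, plain count).
UNIT_TO_BASE = {
    "mass": {"mg": 1e-3, "g": 1.0, "kg": 1e3, "tons": 1e6},  # metric tons
    "length": {"ft": 0.3048, "m": 1.0, "km": 1e3},
    "count": {"": 1.0},
}

# Loose, illustrative bounds in base units (assumptions, not Cyc content).
PLAUSIBLE_RANGES = {
    "daily_vitamin_dose": ("mass", 1e-6, 10.0),   # micrograms up to a few grams
    "planet_diameter": ("length", 1e6, 1e9),      # 1,000 km up to 1,000,000 km
    "number_of_us_states": ("count", 0, 50),      # cannot exceed 50 states
}

def is_plausible(quantity_kind, value, unit):
    """Return True if the value, converted to base units, lies within bounds."""
    dimension, low, high = PLAUSIBLE_RANGES[quantity_kind]
    factors = UNIT_TO_BASE[dimension]
    if unit not in factors:
        return False  # unknown unit or wrong dimension for this quantity
    return low <= value * factors[unit] <= high

# The three TREC9 answers quoted above, tagged with a hypothetical quantity kind.
candidates = [
    ("daily_vitamin_dose", 360, "tons"),
    ("planet_diameter", 14, "ft"),
    ("number_of_us_states", 3312, ""),
]
rejected = [c for c in candidates if not is_plausible(*c)]
```

All three candidates fall outside their bounds and are rejected, while a value such as 12,742 km for a planet diameter passes.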


Question Complexity

“Simple” questions are not a solved problem:

• Complex questions can be decomposed into simpler components.

• If simpler questions cannot be handled successfully, there’s no hope for more complex ones.


Areas not explored (intentionally) by TREC to date:

• spelling errors

• grammatical errors

• syntactic precision, e.g. significance of articles

• words such as not, only, just …


Is there such a thing as a “simple” question?

A: How many members are there in the Cabinet? (Suppose there is no text that gives the answer explicitly.)

B: What is the meaning of life? (42, from HGTTG.)

Which is more complex?

“simple” -> “simple to state”

Complexity is a function of both the question and the data source


Different Solution Approaches

What is the largest city in England?

• Text Match

– Find text that says “London is the largest city in England” (or paraphrase).

– Confidence is confidence of NL parser * confidence of source. Find multiple instances and confidence of source -> 1.

• “Superlative” Search

– Find a table of English cities and their populations, and sort.

– Find a list of the 10 largest cities in the world, and see which are in England. Uses logic: if L > all objects in set R then L > all objects in set E ⊂ R.

– Find the population of as many individual English cities as possible, and choose the largest.

• Heuristics

– London is the capital of England. (Not guaranteed to imply it is the largest city, but quite likely.)

• Complex Inference

– E.g. “Birmingham is England’s second-largest city”; “Paris is larger than Birmingham”; “London is larger than Paris”; “London is in England”.
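The complex-inference path can be made concrete with a small sketch. It uses only the four statements from the example; the set-based representation and the function names `is_larger` and `largest_city_in_england` are hypothetical, and a real inference engine such as Cyc would work quite differently.

```python
# Facts from the slide's example; the representation is an assumption.
LARGER_THAN = {("Paris", "Birmingham"), ("London", "Paris")}
IN_ENGLAND = {"London", "Birmingham"}
SECOND_LARGEST_IN_ENGLAND = "Birmingham"

def is_larger(a, b, facts, seen=None):
    """True if 'a is larger than b' follows from the facts by transitivity."""
    seen = seen or set()
    if (a, b) in facts:
        return True
    for x, y in facts:
        # Follow a > y, then try to show y > b, avoiding cycles.
        if x == a and y not in seen:
            if is_larger(y, b, facts, seen | {y}):
                return True
    return False

def largest_city_in_england():
    # Only one English city can be larger than the second-largest one,
    # so any English city proven larger than it must be the largest.
    for city in sorted(IN_ENGLAND):
        if is_larger(city, SECOND_LARGEST_IN_ENGLAND, LARGER_THAN):
            return city
    return None

answer = largest_city_in_england()
```

Here `answer` is "London": London > Paris and Paris > Birmingham give London > Birmingham, and London is in England.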


Parallel Confidence Propagation

[Diagram; recoverable labels: Question; Classifications; Confidences; Goals (logical forms) with boolean connectives, sequencing and recombination information; Candidate Answers; Validation and Sanity Checks (eliminate some answers and adjust confidences); Selected Answers]


Probability Management

• Associated with every data element

• A priori probabilities associated with every processing module. Given default values at first, then learned as experience is gained

• Bayesian, Dempster-Shafer, …
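One simple Bayesian reading of "default values at first, then learned as experience is gained" is a success/failure count with prior pseudo-counts. The sketch below assumes that reading; the class `ModuleConfidence` and its fields are hypothetical, and Dempster-Shafer combination is not shown.

```python
from dataclasses import dataclass

@dataclass
class ModuleConfidence:
    """A priori reliability of one processing module (Beta-style estimate)."""
    successes: float = 1.0  # default pseudo-counts give an initial value of 0.5
    failures: float = 1.0

    @property
    def probability(self) -> float:
        return self.successes / (self.successes + self.failures)

    def record(self, was_correct: bool) -> None:
        """Update the estimate after observing one judged result."""
        if was_correct:
            self.successes += 1
        else:
            self.failures += 1

conf = ModuleConfidence()
prior = conf.probability            # default before any experience
for outcome in [True, True, True, False]:
    conf.record(outcome)
learned = conf.probability          # (1 + 3) / (2 + 4)
```

The default pseudo-counts act as the initial value; each judged result moves the estimate toward the module's observed accuracy.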



High-Level Architecture


IBM PIQUANT Block Diagram




QA-Manager Internals

[Diagram; recoverable labels: Question; Ontology & Data Services; NLP Components; Linguistic Question Analysis; Plan Generation; QPLAN Execution Engine; Answer Candidates]


Question Classification “Daemons”

• Definition

– What is OPEC?

• Comparative & Superlative

– Does Kuwait export more oil than Venezuela?

– Which country exports the most uranium?

• Profile

– Who is Rabbani?

• Relationship

– Which countries are allies of Qatar?

• Chronology

– Was OPEC formed before Nixon became president?

• Enumeration

– How many oil refineries are in the U.S.?

• Cause & Effect

– Why did Iraq invade Kuwait?

• Combination

– Which countries are Qatar’s most powerful allies?

Classifiers act as “daemons”; perform recognition and sub-plan generation
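The daemon idea, where each classifier independently recognizes its question type and proposes a sub-plan, might look like the sketch below. The regular expressions are crude stand-ins for real linguistic analysis, and the confidence values, sub-plan step names and function names are all hypothetical.

```python
import re
from typing import Callable, List, Optional, Tuple

# Each daemon returns (question_type, confidence, sub_plan) or None.
Recognition = Tuple[str, float, List[str]]
Daemon = Callable[[str], Optional[Recognition]]

def definition_daemon(q: str) -> Optional[Recognition]:
    if re.match(r"^what is \w+\?$", q, re.IGNORECASE):
        return ("Definition", 0.8, ["lookup_hypernyms", "search_text"])
    return None

def superlative_daemon(q: str) -> Optional[Recognition]:
    # Crude cue: "most" or a word ending in "-est".
    if re.search(r"\b(most|\w+est)\b", q, re.IGNORECASE):
        return ("Comparative & Superlative", 0.7, ["enumerate_members", "rank"])
    return None

def relationship_daemon(q: str) -> Optional[Recognition]:
    if re.search(r"\ballies\b", q, re.IGNORECASE):
        return ("Relationship", 0.6, ["query_kb_relation"])
    return None

DAEMONS: List[Daemon] = [definition_daemon, superlative_daemon, relationship_daemon]

def classify(question: str) -> List[Recognition]:
    """Run every daemon; a question may fire several (a 'Combination')."""
    hits = []
    for daemon in DAEMONS:
        result = daemon(question)
        if result is not None:
            hits.append(result)
    return sorted(hits, key=lambda r: -r[1])

combo = classify("Which countries are Qatar’s most powerful allies?")
combo_types = [r[0] for r in combo]
```

For the Combination example, both the superlative and relationship daemons fire, so the question carries two types, each with its own confidence and sub-plan.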


Architectural Features

• Modularity

– Self-contained components with well-defined functions and interfaces

– Ease of development, experimentation and maintenance

• Robustness

– If a “Knowledge Source” fails, the system will continue to operate with (minor) degradation

– Exploit redundancy to find best answer

• Reinforcement

– Multiple sources of evidence for same answer are synergistic

• Transparency

– Explicit plans permit ready generation of explanations and symbolic analysis
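Robustness and reinforcement can be illustrated together. The sketch below assumes a noisy-OR combination, which treats the sources as independent; that combination rule, the stub knowledge sources and their confidence values are assumptions made for the example, not something the slides specify.

```python
from collections import defaultdict

def noisy_or(confidences):
    """Probability that at least one independent piece of evidence is right."""
    remaining_doubt = 1.0
    for c in confidences:
        remaining_doubt *= (1.0 - c)
    return 1.0 - remaining_doubt

def gather(sources, question):
    """Query every source; a failing source is skipped rather than fatal."""
    evidence = defaultdict(list)
    for source in sources:
        try:
            for answer, confidence in source(question):
                evidence[answer].append(confidence)
        except Exception:
            continue  # degrade gracefully when a knowledge source fails
    merged = {a: noisy_or(cs) for a, cs in evidence.items()}
    return sorted(merged.items(), key=lambda item: -item[1])

# Hypothetical knowledge sources with made-up confidences.
def text_search(q):
    return [("London", 0.6), ("Birmingham", 0.3)]

def table_lookup(q):
    return [("London", 0.5)]

def broken_source(q):
    raise RuntimeError("source unavailable")

ranked = gather([text_search, table_lookup, broken_source],
                "What is the largest city in England?")
```

Two moderately confident votes for the same answer combine into a higher confidence than either alone, and the failing source only removes its own contribution.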



Implementation Highlights


Implementation Highlights

• Predictive Annotation

– Shift computational burden from NLP towards IR

– Index semantic labels along with text

– Beat the Precision-Recall tradeoff by boosting precision at little cost to recall

• Virtual Annotation

– Answer definitional (“What is”) questions by combination of linguistic, ontological and statistical techniques

– Find the hypernyms in e.g. WordNet that have the best combination of closeness and co-occurrence


Predictive Annotation (1)

• Annotate entire corpus and index semantic labels along with text

• Identify sought-after label(s) in questions and include in queries

• Example: Question is “Where is Belize?”

– “Where” can map to CONTINENT$, COUNTRY$, STATE$, CITY$, CAPITAL$, PLACE$.

– Knowing Belize is a country: “Where is Belize?” -> {CONTINENT$ Belize}

(assume CONTINENT$ = continents plus sub-continental regions)

• Suppose text is “… including Belize in central America … ”

– Indexed with labels: including Belize [COUNTRY$] in central America [CONTINENT$]


Predictive Annotation (2)

Increased precision of enhanced bag-of-words:

– “Where is Belize” -> {CONTINENT$ Belize}

– Belize occurs 704 times in TREC corpus

– Belize and CONTINENT$ co-occur in only 22 sentences

• Note: data structure equally appropriate for “Name a country in Central America”, which maps to {COUNTRY$ Central America}
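A toy version of this index shows why adding a semantic label to the query narrows the result set. The two-sentence corpus, the `LABELS` lookup (standing in for a real named-entity annotator) and the function names are hypothetical; the real system indexed the TREC corpus.

```python
from collections import defaultdict

# Hypothetical toy annotator: phrase -> semantic label.
LABELS = {"belize": "COUNTRY$", "central america": "CONTINENT$"}

def annotate(sentence):
    """Return the sentence's index terms: its words plus semantic labels."""
    text = sentence.lower()
    terms = set(text.replace(",", " ").split())
    for phrase, label in LABELS.items():
        if phrase in text:
            terms.add(label)
    return terms

def build_index(sentences):
    """Inverted index from term (word or label) to sentence numbers."""
    index = defaultdict(set)
    for i, sentence in enumerate(sentences):
        for term in annotate(sentence):
            index[term].add(i)
    return index

def search(index, query_terms):
    """Sentences containing every query term (words lower-cased)."""
    postings = [index.get(t, set()) for t in query_terms]
    return set.intersection(*postings) if postings else set()

# Made-up sentences for illustration only.
corpus = [
    "Several countries, including Belize in central America, took part.",
    "The report mentions Belize twice.",
]
index = build_index(corpus)
plain_hits = search(index, ["belize"])                  # bag-of-words only
labeled_hits = search(index, ["CONTINENT$", "belize"])  # {CONTINENT$ Belize}
reverse_hits = search(index, ["COUNTRY$", "central", "america"])
```

The plain query matches both sentences, while the labeled query keeps only the one that also mentions a CONTINENT$ entity. The same index answers the reverse query {COUNTRY$ Central America}.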




• Leverage existing technology base

• Parallel approach to find answer, exploiting redundancy

• Declarative plan representation

• Associate confidences with each component and each intermediate and final result

• Cyc’s knowledge base and inference engine to solve sub-problems and eliminate nonsensical answer candidates


High-Level 1st Year Development Plan

• Finalize design of data-structures:

– QFRAME: question and derived attributes

– QPLAN: script for tackling solution

– QGOAL: logical-form-like structure representing predicate for instantiation or verification

• Build several recognizers and QPLAN executor (many pieces already exist)

• Run on many examples to fine-tune and to develop a priori component confidence values

• Build answer resolution module
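Since the design of these structures was still to be finalized, the following is only a guess at their rough shape. Every field name, the `?x` variable convention and the predicate names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class QGoal:
    """Logical-form-like predicate to instantiate or verify."""
    predicate: str
    arguments: List[str]        # a leading "?" marks a variable to instantiate
    confidence: float = 1.0

    def is_verification(self) -> bool:
        # No variables left: the goal only needs to be checked, not solved.
        return not any(a.startswith("?") for a in self.arguments)

@dataclass
class QFrame:
    """The question plus attributes derived from analysing it."""
    text: str
    attributes: Dict[str, str] = field(default_factory=dict)
    confidence: float = 1.0

@dataclass
class QPlan:
    """Script for tackling one question type: an ordered list of goals."""
    question_type: str
    steps: List[QGoal] = field(default_factory=list)
    confidence: float = 1.0

frame = QFrame("What is the largest city in England?",
               {"question_type": "Superlative", "answer_type": "CITY$"})
plan = QPlan("Superlative", [
    QGoal("largestCityIn", ["?x", "England"]),    # instantiate ?x
    QGoal("locatedIn", ["London", "England"]),    # verify a candidate
])
```

Each structure carries its own confidence, in line with the aim of associating confidences with every intermediate result.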



Backup Slides


Statistical Features

• Co-occurrences to support definition answers

• Machine Learning to evaluate search engine results

• Machine Learning to assist in answer selection

• Learn probable confidence of question recognizers


QPLAN

• Multiple per question type

• Declarative representation of a solution

– Independent of knowledge source’s details

• Executed by planning engine

• Sequence of solution steps

– structured knowledge queries

– text search queries

– statistical queries etc.

• Confidences learned over time
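A declarative plan of this kind can be pictured as plain data that names what each step asks for, with a separate engine deciding how to run it. In this sketch the step kinds, the query strings, the handler stubs and the confidence numbers are all hypothetical.

```python
# Hypothetical declarative plan: data naming step kinds, not how they run.
PLAN = [
    {"kind": "structured_knowledge", "query": "capitalOf(?x, England)"},
    {"kind": "text_search", "query": "largest city in England"},
    {"kind": "statistical", "query": "co-occurrence(city, England)"},
]

def run_plan(plan, handlers, prior_confidence):
    """Execute steps in order; scale each result by its step kind's confidence."""
    results = []
    for step in plan:
        handler = handlers.get(step["kind"])
        if handler is None:
            continue  # no knowledge source registered for this kind of step
        weight = prior_confidence.get(step["kind"], 0.5)
        for answer, confidence in handler(step["query"]):
            results.append((answer, confidence * weight, step["kind"]))
    return results

# Stub knowledge sources; the plan does not depend on their internals.
handlers = {
    "structured_knowledge": lambda q: [("London", 0.9)],
    "text_search": lambda q: [("London", 0.7)],
}
# Per-kind confidences that would be learned over time (made-up values).
prior = {"structured_knowledge": 0.8, "text_search": 0.6}

results = run_plan(PLAN, handlers, prior)
```

Swapping a handler changes how a step is answered without touching the plan, which is the sense in which the plan is independent of knowledge-source details.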


High-level View of Solution Steps

1. Question is processed by linguistic tools.

2. Question is classified into 1 or more types.

3. Parallel solution plan is generated and executed.

4. Responses are gathered and examined.

5. If necessary, plan is revised and steps 3-5 revisited.

6. Candidate answers are checked for sanity, merged, sorted and presented.


a. Dialog manager functions are not considered here.

b. All data-structures are assigned confidences and all selections of next steps are mediated by probabilistic computations.
