PIQUANT at AQUAINT Kick-Off Dec 3-5 2001

PIQUANT: Practical Intelligent QUestion ANswering Technology

A Question Answering system integrating Information Retrieval, Natural Language Processing and Knowledge Representation

Prime Contractor: IBM T.J. Watson Research Center

30 Saw Mill River Road, Hawthorne, NY 10532

Subcontractor: Cycorp

IBM & Cycorp: Bringing Complementary Strengths to QA

• IBM

– Information Retrieval

– Natural Language Processing

– Scalable System Architectures

– Business Applications Architectures

• Cycorp

– Structured Knowledge Representation

– Rich Common Sense Knowledge Bases

– Deep Inferencing

– Ontologies

Both symbolic and statistical

Experience from TREC8-10

• End-to-end system that has performed well

• Invaluable experience in learning where the problems are:

– Coverage

– Engineering

– Understanding

IBM’s PIQUANT Principal Extensions

• Integration of IR/NLP with Structured KBs and Deep Inference

– Knowledge System to assist in decomposing and answering questions

– Provide justification and/or invalidation of candidate answers

• Parallel Solution Paths and Pervasive Confidence Analysis

– Multiple parallel solution approaches to problem/subproblem

– Pervasive use of confidences to mediate management of alternatives

– Extensive reinforcement of symbolic approaches by statistical data

• Well-Defined Component Architecture

– Modular

– Defined interfaces between NLP, IR, KS and Statistical Components

– Declarative representation of question answering plans

Where Knowledge-Systems Help

Heuristic of finding short passages with all the query terms/semantic classes is good but not sufficient. E.g. from TREC9:

Q: How much folic acid should an expectant mother take daily?

A: 360 tons

Q: What is the diameter of the Earth?

A: 14 ft.

Q: How many states have a lottery?

A: 3,312

We will investigate the use of a sophisticated inference engine and knowledge-base (Cyc) to eliminate such answers.
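A minimal sketch of the kind of sanity check meant here. It uses a hand-written table of magnitude bounds as a stand-in for Cyc; the slides do not show Cyc's actual interface, so the `Bound` type, the `BOUNDS` table, the unit factors and the answer-type names are all illustrative assumptions.

```python
from dataclasses import dataclass

# Conversion to a base unit per dimension (grams, metres, plain counts).
UNIT_TO_BASE = {
    "tons": ("mass", 1_000_000.0),   # metric tons to grams
    "micrograms": ("mass", 1e-6),
    "ft": ("length", 0.3048),        # feet to metres
    "km": ("length", 1000.0),
    "count": ("count", 1.0),
}


@dataclass(frozen=True)
class Bound:
    dimension: str
    low: float
    high: float


# Deliberately loose, hand-picked plausibility bounds (assumptions).
BOUNDS = {
    "daily_vitamin_dose": Bound("mass", 1e-6, 1.0),     # 1 microgram to 1 gram
    "planet_diameter": Bound("length", 1e6, 1e9),       # 1,000 km to 1,000,000 km
    "number_of_us_states": Bound("count", 0.0, 50.0),   # there are 50 states
}


def is_plausible(answer_type, value, unit):
    """Reject answers with the wrong dimension or an absurd magnitude."""
    dimension, factor = UNIT_TO_BASE[unit]
    bound = BOUNDS[answer_type]
    if dimension != bound.dimension:
        return False
    return bound.low <= value * factor <= bound.high


# The three TREC9 answers quoted above are all rejected.
print(is_plausible("daily_vitamin_dose", 360, "tons"))
print(is_plausible("planet_diameter", 14, "ft"))
print(is_plausible("number_of_us_states", 3312, "count"))
```

A real knowledge base would derive such constraints by inference instead of a fixed table; the sketch only shows where the check sits relative to a candidate answer.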

Question Complexity

“Simple” questions are not a solved problem:

• Complex questions can be decomposed into simpler components.

• If simpler questions cannot be handled successfully, there’s no hope for more complex ones.

BUT:

Areas not explored (intentionally) by TREC to date:

• spelling errors

• grammatical errors

• syntactic precision e.g. significance of articles

• not, only, just …

Is there such a thing as a “simple” question?

Which is more complex?

A: How many members are there in the Cabinet? (Suppose there is no text that gives the answer explicitly.)

B: What is the meaning of life? (42, from HGTTG.)

“simple” -> “simple to state”

Complexity is a function of both the question and the data source.

Different Solution Approaches

What is the largest city in England?

• Text Match

– Find text that says “London is the largest city in England” (or paraphrase).

– Confidence is confidence of NL parser * confidence of source. Find multiple instances and confidence of source -> 1.

• “Superlative” Search

– Find a table of English cities and their populations, and sort.

– Find a list of the 10 largest cities in the world, and see which are in England. Uses logic: if L > all objects in set R then L > all objects in set E ⊆ R.

– Find the population of as many individual English cities as possible, and choose the largest.

• Heuristics

– London is the capital of England. (Not guaranteed to imply it is the largest city, but quite likely.)

• Complex Inference

– E.g. “Birmingham is England’s second-largest city”; “Paris is larger than Birmingham”; “London is larger than Paris”; “London is in England”.
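Two of these approaches can be sketched in a few lines. The slide only says that source confidence tends to 1 as more instances are found; combining instances with a noisy-OR (which treats them as independent) is an assumption made here, and the function names are hypothetical.

```python
from math import prod


def source_confidence(instance_confidences):
    """Noisy-OR: chance that at least one independent instance is right.

    Approaches 1 as more supporting instances are found.
    """
    return 1.0 - prod(1.0 - c for c in instance_confidences)


def text_match_confidence(parser_confidence, instance_confidences):
    """Confidence of NL parser * confidence of source, as on the slide."""
    return parser_confidence * source_confidence(instance_confidences)


def largest_in_subset(ranked_superset, subset):
    """Superlative search over a ranked list of a superset R.

    ranked_superset is ordered largest first. The first entry that also
    belongs to subset E is larger than every other member of E in the list.
    """
    for item in ranked_superset:
        if item in subset:
            return item
    return None


print(source_confidence([0.6]))
print(source_confidence([0.6, 0.6, 0.6]))
print(largest_in_subset(["A", "B", "C"], {"B", "C"}))
```

`largest_in_subset` returns `None` when no member of the subset appears in the ranking, which is the case where this approach yields no answer and another path has to supply one.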

Parallel Confidence Propagation

[Diagram; labels: Question Classifications with Confidences; QFRAMES; QPLANS; Goals (logical forms) with boolean connectives, sequencing and recombination information; Candidate Answers; Validation and Sanity Checks (eliminate some answers and adjust confidences); Selected Answers]

Probability Management

• Associated with every data element

• A priori probabilities associated with every processing module. Given default values at first, then learned as experience is gained

• Bayesian, Dempster-Shafer, …
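As an illustration of one of the frameworks named above, here is a minimal sketch of Dempster's rule of combination for two evidence sources. The slides do not say which framework was adopted or how masses would be assigned; the hypotheses and mass values below are made up for the example.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass, and the
    masses of each function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise over the non-conflicting mass.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


LONDON = frozenset({"London"})
BIRMINGHAM = frozenset({"Birmingham"})
EITHER = LONDON | BIRMINGHAM  # mass left on "don't know"

text_match = {LONDON: 0.6, EITHER: 0.4}
heuristic = {LONDON: 0.5, BIRMINGHAM: 0.2, EITHER: 0.3}

for hypothesis, mass in combine(text_match, heuristic).items():
    print(sorted(hypothesis), round(mass, 4))
```

Unlike a single probability per answer, a mass function can leave part of its weight on the whole set of hypotheses, which is one way a module can express that it does not know.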

IBM PIQUANT

High-Level Architecture

IBM PIQUANT Block Diagram

QA-Manager Internals

[Block diagram; labels: Linguistic Question Analysis; NLP Components; Question Classification; QFRAME; QFRAMES; Plan Generation; QPLANS; QGOAL; QPLAN Execution Engine; IR; WN; DB; CYC; KB; Knowledge Representation; Reasoning Services; Ontology & Data Services; Answer Candidates; Answer Resolution; Answer Presentation; Answers]

Question Classification “Daemons”

• Definition

– What is OPEC?

• Comparative & Superlative

– Does Kuwait export more oil than Venezuela?

– Which country exports the most uranium?

• Profile

– Who is Rabbani?

• Relationship

– Which countries are allies of Qatar?

• Chronology

– Was OPEC formed before Nixon became president?

• Enumeration

– How many oil refineries are in the U.S.?

• Cause & Effect

– Why did Iraq invade Kuwait?

• Combination

– Which countries are Qatar’s most powerful allies?

Classifiers act as “daemons”; perform recognition and sub-plan generation
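A minimal sketch of the daemon idea: every recognizer sees every question, and each one that fires contributes a confidence and a sub-plan. The regular expressions, confidence values and sub-plan step names are hypothetical stand-ins; the slides do not describe how recognition is actually implemented.

```python
import re
from dataclasses import dataclass


@dataclass
class Recognition:
    question_type: str
    confidence: float
    sub_plan: list


# (type, pattern, confidence, sub-plan steps); all values are illustrative.
DAEMONS = [
    ("Definition", r"^what is \S+\?$", 0.8, ["find_hypernyms"]),
    ("Comparative & Superlative", r"\bmore\b.*\bthan\b|\b(most|largest)\b",
     0.7, ["collect_candidates", "rank"]),
    ("Profile", r"^who is ", 0.8, ["search_biography"]),
    ("Relationship", r"\ballies\b", 0.6, ["query_relations"]),
    ("Chronology", r"\b(before|after)\b", 0.7, ["date_events", "compare_dates"]),
    ("Enumeration", r"^how many ", 0.9, ["collect_instances", "count"]),
    ("Cause & Effect", r"^why ", 0.8, ["search_causes"]),
]


def classify(question):
    """Run every daemon on the question and collect those that fire."""
    return [
        Recognition(qtype, confidence, list(steps))
        for qtype, pattern, confidence, steps in DAEMONS
        if re.search(pattern, question, re.IGNORECASE)
    ]


# A "Combination" question is one on which several daemons fire.
for hit in classify("Which countries are Qatar's most powerful allies?"):
    print(hit.question_type, hit.confidence, hit.sub_plan)
```

In this sketch the Combination category needs no recognizer of its own: it falls out whenever more than one daemon matches, and the sub-plans are left for a later planning stage to merge.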

Architectural Features

• Modularity

– Self-contained components with well-defined functions and interfaces

– Ease of development, experimentation and maintenance

• Robustness

– If a “Knowledge Source” fails the system will continue to operate with (minor) degradation

– Exploit redundancy to find best answer

• Reinforcement

– Multiple sources of evidence for same answer are synergistic

• Transparency

– Explicit plans permit ready generation of explanations and symbolic analysis

IBM PIQUANT

Implementation Highlights

Implementation Highlights

• Predictive Annotation

– Shift computational burden from NLP towards IR

– Index semantic labels along with text

– Beat the Precision-Recall tradeoff by boosting precision at little cost to recall

• Virtual Annotation

– Answer definitional (“What is”) questions by combination of linguistic, ontological and statistical techniques

– Find the hypernyms in e.g. WordNet that have the best combination of closeness and co-occurrence

Predictive Annotation (1)

• Annotate entire corpus and index semantic labels along with text

• Identify sought-after label(s) in questions and include in queries

• Example: Question is “Where is Belize?”

– “Where” can map to CONTINENT$, COUNTRY$, STATE$, CITY$, CAPITAL$, PLACE$.

– Knowing Belize is a country: “Where is Belize?” -> {CONTINENT$ Belize} (assume CONTINENT$ covers continents plus sub-continental regions)

• Suppose text is “… including Belize in central America … ”

– Indexed annotations: “Belize” is labeled COUNTRY$ and PLACE$; “central America” is labeled CONTINENT$ and PLACE$

Predictive Annotation (2)

Increased precision of enhanced bag-of-words:

– “Where is Belize” -> {CONTINENT$ Belize}

– Belize occurs 704 times in TREC corpus

– Belize and CONTINENT$ co-occur in only 22 sentences

• Note: data structure equally appropriate for “Name a country in Central America”, which maps to {COUNTRY$ Central America}

– Indexed annotations for “… including Belize in central America …”: “Belize” is labeled COUNTRY$ and PLACE$; “central America” is labeled CONTINENT$ and PLACE$
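The indexing idea can be sketched with a toy sentence-level inverted index in which semantic labels are stored as ordinary index terms next to the words. The `LABELS` lookup stands in for the real annotator, and the second sentence is invented for contrast; neither comes from the slides.

```python
from collections import defaultdict

# Hypothetical annotator output: phrase -> semantic labels.
LABELS = {
    "belize": ["COUNTRY$", "PLACE$"],
    "central america": ["CONTINENT$", "PLACE$"],
}


def annotate(sentence):
    """Return the index terms for a sentence: its words plus any labels."""
    text = sentence.lower()
    terms = set(text.replace(",", " ").replace(".", " ").split())
    for phrase, labels in LABELS.items():
        if phrase in text:
            terms.add(phrase)
            terms.update(labels)
    return terms


def build_index(sentences):
    """Map each term or label to the ids of the sentences containing it."""
    index = defaultdict(set)
    for sentence_id, sentence in enumerate(sentences):
        for term in annotate(sentence):
            index[term].add(sentence_id)
    return index


def search(index, query_terms):
    """Return ids of sentences that contain every query term."""
    postings = [index.get(term, set()) for term in query_terms]
    return set.intersection(*postings) if postings else set()


sentences = [
    "... including Belize in central America ...",
    "Belize exports sugar.",  # invented sentence, for contrast only
]
index = build_index(sentences)

print(search(index, ["belize"]))                       # plain bag-of-words
print(search(index, ["CONTINENT$", "belize"]))         # "Where is Belize?"
print(search(index, ["COUNTRY$", "central america"]))  # "Name a country in Central America"
```

Adding the label to the query narrows the hits to sentences that mention Belize together with something of the sought-after type, which is the precision gain the 704 versus 22 figures describe.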

Summary

• Leverage existing technology base

• Parallel approach to find answer, exploiting redundancy

• Declarative plan representation

• Associate confidences with each component and each intermediate and final result

• CYC’s knowledge-base and inference engine to solve sub-problems and eliminate nonsensical answer candidates

High-Level 1st Year Development Plan

• Finalize design of data-structures:

– QFRAME: question and derived attributes

– QPLAN: script for tackling solution

– QGOAL: logical-form-like structure representing predicate for instantiation or verification

• Build several recognizers and QPLAN executor (many pieces already exist)

• Run on many examples to fine-tune and to develop a priori component confidence values

• Build answer resolution module
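One possible shape for the three data structures, sketched as Python dataclasses. The slides state that the design was still to be finalized, so every field name below is an assumption; only the one-line role of each structure comes from the plan above.

```python
from dataclasses import dataclass, field


@dataclass
class QGoal:
    """Logical-form-like predicate; None marks an argument to be filled in."""
    predicate: str
    args: tuple
    confidence: float = 1.0

    def is_verification(self):
        # Fully bound goals are verified; goals with gaps are instantiated.
        return all(arg is not None for arg in self.args)


@dataclass
class QPlan:
    """Script for tackling the solution: an ordered list of goals."""
    question_type: str
    goals: list = field(default_factory=list)
    confidence: float = 1.0


@dataclass
class QFrame:
    """The question plus attributes derived from analysing it."""
    question: str
    attributes: dict = field(default_factory=dict)
    plans: list = field(default_factory=list)


find = QGoal("largestCityIn", (None, "England"))
check = QGoal("largestCityIn", ("London", "England"))

frame = QFrame(
    question="What is the largest city in England?",
    attributes={"answer_type": "CITY$"},
    plans=[QPlan("Superlative", goals=[find, check])],
)
print(find.is_verification(), check.is_verification())
```

Keeping a confidence on both the goal and the plan mirrors the stated aim of attaching confidences to every component and every intermediate result.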

IBM PIQUANT

Backup Slides

Statistical Features

• Co-occurrences to support definition answers

• Machine Learning to evaluate search engine results

• Machine Learning to assist in answer selection

• Learn probable confidence of question recognizers

QPLAN

• Multiple per question type

• Declarative representation of a solution

– Independent of knowledge source’s details

• Executed by planning engine

• Sequence of solution steps

– structured knowledge queries

– text search queries

– statistical queries etc.

• Confidences learned over time
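A minimal sketch of a declarative plan and an engine that executes it. The plan is plain data that names knowledge sources without knowing their internals; the engine submits all steps concurrently and skips any source that fails. The plan layout, the source names and the confidence values are assumptions, and taking the best score per answer is just one simple way to merge results.

```python
from concurrent.futures import ThreadPoolExecutor


def execute(plan, sources):
    """Run every step of the plan and rank the answers that come back.

    sources maps a source name to a callable taking a query and returning
    a list of (answer, confidence) pairs.
    """
    best = {}
    with ThreadPoolExecutor() as pool:
        submitted = [
            (step, pool.submit(sources[step["source"]], step["query"]))
            for step in plan["steps"]
        ]
        for step, future in submitted:
            try:
                results = future.result()
            except Exception:
                continue  # a failed knowledge source only degrades the result
            for answer, confidence in results:
                score = step["confidence"] * confidence
                best[answer] = max(best.get(answer, 0.0), score)
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical knowledge sources for the example.
def text_search(query):
    return [("London", 0.9)]


def knowledge_base(query):
    raise RuntimeError("knowledge base unavailable")


def table_lookup(query):
    return [("London", 0.7), ("Birmingham", 0.2)]


plan = {
    "question_type": "Superlative",
    "steps": [
        {"source": "text", "query": "largest city in England", "confidence": 0.8},
        {"source": "kb", "query": "(largestCityIn ?x England)", "confidence": 0.9},
        {"source": "table", "query": "English cities by population", "confidence": 0.5},
    ],
}
sources = {"text": text_search, "kb": knowledge_base, "table": table_lookup}

for answer, score in execute(plan, sources):
    print(answer, round(score, 2))
```

Because the step confidences live in the plan data and not in the engine, they can be updated from experience without touching the code, in line with the last bullet above.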

High-level View of Solution Steps

1. Question is processed by linguistic tools.

2. Question is classified into 1 or more types.

3. Parallel solution plan is generated and executed.

4. Responses are gathered and examined.

5. If necessary, plan is revised and steps 3-5 revisited.

6. Candidate answers are checked for sanity, merged, sorted and presented.

Note:

a. Dialog manager functions are not considered here.

b. All data-structures are assigned confidences and all selections of next steps are mediated by probabilistic computations.