javelin project briefing 1 aquaint year i review language technologies institute carnegie mellon...

AQUAINT Year I Review
JAVELIN Project Briefing

Language Technologies Institute
Carnegie Mellon University

Status Update for Year 1 Program Review

December 3-5, 2002


Outline
• Background / Overview
• Project Status Update
• Brief Component Updates
• Main Research Goals (Y1 into Y2)
  – Deeper Planning (complex questions)
  – Deeper NL Understanding
  – Multilingual Support
  – Interactive Dialog with Analyst


Javelin Overview
• AQUAINT Dimensions Selected:
  – Full system
  – Multilingual
• Research Objectives:
  – QA as Planning
  – QA and Auditability
  – Utility-based Information Fusion


Javelin Architecture

[Architecture diagram. Components: JAVELIN GUI, Question Analyzer, Retrieval Strategist, Information Extractor, Answer Generator, Execution Manager, Planner, Domain Model (operator (action) models), Data Repository (process history and results), search engines & document collections.]


Project Status Summary
• Started in November 2001
• Attended LREC ’02 workshop on question answering “road map”
• Initial end-to-end system built
• Participated in TREC 2002 QA track
• Detailed analysis of TREC performance
• Planner now integrated


TREC 2002 QA Track
• Accelerated work on architecture for TREC
  – Original proposal: first integrated system Q4-Q5
• System snapshot from mid-July
  – Planner not fully integrated (not included)
  – 2 classifiers in Information Extractor: KNN, DT
  – Limited to top 15 docs from Retrieval Strategist
  – Two runs submitted to TREC:
    • DT classifier, 15 docs maximum
    • KNN classifier, 15 docs maximum


TREC 2002 Results

CMUJAV000495 (DT classifier, 15 docs): more confidence, less correct
  Number wrong (W): 402
  Number unsupported (U): 10
  Number inexact (X): 13
  Number right (R): 75
  Confidence-weighted score: 0.251
  Precision of recog. no answer (12 / 79): 0.152
  Recall of recog. no answer (12 / 46): 0.261

CMUJAV000501 (KNN classifier, 15 docs): more correct, less confidence
  Number wrong (W): 394
  Number unsupported (U): 8
  Number inexact (X): 12
  Number right (R): 86
  Confidence-weighted score: 0.209
  Precision of recog. no answer (10 / 61): 0.164
  Recall of recog. no answer (10 / 46): 0.217
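The reported figures can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the `RunCounts` class and its field names are hypothetical, and it assumes the usual reading of the "no answer" fractions (correct NIL responses over NIL responses returned for precision, and over questions with no answer for recall). The confidence-weighted score is not recomputed, since it depends on the per-question ranking, which the slide does not give.

```python
from dataclasses import dataclass


@dataclass
class RunCounts:
    """Judgment counts for one TREC 2002 run (field names are illustrative)."""
    wrong: int
    unsupported: int
    inexact: int
    right: int
    nil_correct: int    # "no answer" responses that were correct
    nil_returned: int   # "no answer" responses the run returned
    nil_questions: int  # questions that truly have no answer

    @property
    def total(self) -> int:
        return self.wrong + self.unsupported + self.inexact + self.right

    @property
    def accuracy(self) -> float:
        # Fraction of questions judged right (R).
        return self.right / self.total

    @property
    def nil_precision(self) -> float:
        return self.nil_correct / self.nil_returned

    @property
    def nil_recall(self) -> float:
        return self.nil_correct / self.nil_questions


# Counts copied from the results slide.
dt_run = RunCounts(wrong=402, unsupported=10, inexact=13, right=75,
                   nil_correct=12, nil_returned=79, nil_questions=46)
knn_run = RunCounts(wrong=394, unsupported=8, inexact=12, right=86,
                    nil_correct=10, nil_returned=61, nil_questions=46)

for name, run in (("DT", dt_run), ("KNN", knn_run)):
    print(name, run.total, round(run.accuracy, 3),
          round(run.nil_precision, 3), round(run.nil_recall, 3))
```

Both runs sum to 500 judged questions, so the KNN run answers more questions correctly (86 vs. 75) even though its confidence-weighted score is lower.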


Lessons Learned from TREC
• Does JAVELIN need to process more candidate docs? Or more intelligence?
  – In some cases, document(s) containing the answer were not found
  – In some cases, the correct answer was found, but not selected
• Overall, TREC was a worthwhile experience
  – Couldn’t field our complete system, but we learned a lot about integration and the system became much more robust as a result


Question Analyzer for TREC 2002
• Taxonomy of question-answer types and type-specific constraints
• Knowledge integration
• Pattern matching approach for this year’s evaluation

[Flow diagram. Labels: Question input (XML format); Tokenizer; Token information extraction (WordNet, KANTOO lexicon, Brill tagger, BBN Identifier, KANTOO lexifier); Token string input; Parser (KANTOO grammars); Get FR? (Yes / No branches); FR; Event/entity template filler; Pattern matching; Request object builder; QA taxonomy + type-specific constraints; Request object + system result (XML format).]


Retrieval Strategist (RS): TREC Results Analysis
• Success: % of questions where at least 1 answer document was found
• TREC 2002 success rate:
  – @ 30 docs: ~80%
  – @ 60 docs: ~85%
  – @ 120 docs: ~86%
• Reasonable performance for a simple method, but room for improvement
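The success measure defined on this slide can be written down directly. This is a minimal sketch on toy data: the function name, the dictionary layout, and the document IDs are all assumptions made for illustration, not part of the JAVELIN code.

```python
def success_at_k(ranked_docs, answer_docs, k):
    """Fraction of questions with at least one answer document in the top k.

    ranked_docs: question id -> ranked list of retrieved document ids
    answer_docs: question id -> set of document ids known to contain an answer
    """
    hits = 0
    for qid, ranked in ranked_docs.items():
        relevant = answer_docs.get(qid, set())
        if any(doc in relevant for doc in ranked[:k]):
            hits += 1
    return hits / len(ranked_docs)


# Toy data (hypothetical ids): q1 has its answer document at rank 2,
# q2 has no answer document in its list at all.
ranked = {"q1": ["d3", "d7", "d1"], "q2": ["d2", "d9", "d4"]}
answers = {"q1": {"d7"}, "q2": {"d5"}}

for k in (1, 2, 3):
    print(k, success_at_k(ranked, answers, k))
```

Because the measure only asks for one answer document per question, it flattens out as k grows, which matches the small gain between 60 and 120 documents reported above.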


RS: Ongoing Improvements
• Improved incremental relaxation strategy
  – Searching for complete keyword set too restrictive
    • Use subsets prioritized by discriminative ability
  – Remove likely duplicate documents from results
    • Don’t waste valuable list space
  – 15% fewer failures (229 test questions)
    • Overall success rate: @ 30 docs 83% (was 80%); @ 60 docs 87% (was 85%)
• Larger improvements unlikely without additional techniques, such as constrained query expansion
• Investigate constrained query expansion
  – WordNet, Statistical methods
• Switch IR engine from Inquery to Lemur
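One way to picture the incremental relaxation idea is sketched below. It is not the Retrieval Strategist's actual algorithm: the per-term weights stand in for "discriminative ability" (an IDF-like score is one plausible choice), the `search` callable and the toy index are hypothetical, and duplicates are removed by exact document id only, which is simpler than detecting likely duplicate content.

```python
from itertools import combinations


def relaxation_order(weights, min_size=1):
    """Yield keyword subsets from most to least restrictive.

    Larger subsets come first; within one size, subsets with the highest
    total weight (a stand-in for discriminative ability) come first.
    """
    terms = sorted(weights, key=weights.get, reverse=True)
    for size in range(len(terms), min_size - 1, -1):
        subsets = combinations(terms, size)
        yield from sorted(subsets,
                          key=lambda s: sum(weights[t] for t in s),
                          reverse=True)


def retrieve_with_relaxation(weights, search, wanted=30):
    """Relax the query step by step until `wanted` distinct docs are found."""
    seen, results = set(), []
    for subset in relaxation_order(weights):
        for doc_id in search(subset):
            if doc_id not in seen:  # skip documents already in the list
                seen.add(doc_id)
                results.append(doc_id)
            if len(results) >= wanted:
                return results
    return results


# Toy inverted index and weights (hypothetical values).
index = {"invented": {"d1", "d2", "d3"},
         "traffic": {"d2", "d4"},
         "cone": {"d2", "d4", "d5"}}
weights = {"invented": 1.0, "traffic": 2.0, "cone": 3.0}


def search(subset):
    # Conjunctive match: documents containing every term in the subset.
    return sorted(set.intersection(*(index[t] for t in subset)))


print(retrieve_with_relaxation(weights, search, wanted=4))
```

The "15% fewer failures" figure is consistent with the @ 30 docs numbers: failures fall from 20% to 17% of questions, a relative reduction of 15%.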


Information Extractor (IX): TREC Analysis

If the answer is in the doc set returned by the Retrieval Strategist, does the IX module identify it as an answer candidate with a high confidence score?

            Inputs   Answer in top 5   Answer in docset
  TREC 8    200      71                189
  TREC 9    693      218               424
  TREC 10   500      119               313


IX: Current & Future Work
• Enrich feature space beyond surface patterns & surface statistics
• Perform AType specific learning
• Perform adaptive semantic expansion
• Enhance training data quantity/quality
• Tune objective function


Answer Generator (AG): Work for TREC 2002
• Normalization of location names and some constraint matching
  – Used TIPSTER gazetteer and CIA World Factbook
• Normalization of numeric expressions and unit/currency conversion
• Normalized input confidence scores to [0,1]
  – Human readability
  – Final score for clusters computed as probability that at least one member of the cluster was correct
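The cluster score described above has a simple closed form if the member confidences are treated as independent probabilities of being correct. That independence assumption is ours; the slide does not say how the probability was computed. The function name is illustrative.

```python
import math


def cluster_score(confidences):
    """P(at least one member correct) = 1 - prod(1 - p_i).

    Assumes each confidence p_i in [0, 1] is the probability that the
    member is correct and that members are independent (an assumption).
    """
    return 1.0 - math.prod(1.0 - p for p in confidences)


print(cluster_score([0.5, 0.5]))                # two moderately confident members
print(round(cluster_score([0.01] * 100), 3))    # many very weak members
```

Under this formula a large cluster of weak candidates can still reach a high score: one hundred members at 0.01 each already combine to roughly 0.63, which is one way many low-confidence candidates can outweigh a single good answer.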


AG: Specific Issues
• Large number of candidate answers
  – ~5-10% produce over 100 unique candidates
  – 1-2% produce over 600 unique candidates
    • Mostly questions with an unknown answer type
  – Ex. “What did Sherlock Holmes call the street gang that helped him crack cases?” produced 717 candidates
  – Causes incorrect, low-confidence answers to get enough support to displace correct answers


AG: Ongoing Improvements
• Fix candidate answer confidence scores (done)
  – Confidences normalized to a standard normal in [-2, 2], with outliers saturating at 0 or 1
  – Extend range to [-2, 3]
• Answers from the same document (in progress)
  – Currently only best answer from each doc
• Producing list-type answer (in progress)
  – Multiple, close answers from same document beneficial
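The confidence fix can be read as a z-score followed by clipping and rescaling. The sketch below is one plausible interpretation, not the Answer Generator's code: it assumes a linear map from the clipped z-score range onto [0, 1], and the function name and the `lo`/`hi` parameters are illustrative.

```python
from statistics import mean, pstdev


def normalize_confidences(scores, lo=-2.0, hi=2.0):
    """Standardize scores, clip z-scores to [lo, hi], and map to [0, 1].

    Scores beyond the range saturate at 0 or 1. The linear rescaling is
    an assumption; the slide only names the range and the saturation.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: no spread to standardize against.
        return [0.5 for _ in scores]
    normalized = []
    for s in scores:
        z = (s - mu) / sigma
        z = min(max(z, lo), hi)              # saturate outliers
        normalized.append((z - lo) / (hi - lo))
    return normalized


print(normalize_confidences([1.0, 2.0, 3.0]))
print(normalize_confidences([0.0] * 9 + [100.0])[-1])    # outlier saturates
print(normalize_confidences([1.0, 2.0, 3.0], hi=3.0))    # extended range
```

Extending the upper bound from 2 to 3, as the second sub-bullet proposes, leaves more room to separate unusually strong candidates before they saturate.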


AG: Future Work
• Combining multiple candidate answer sources
• Altering confidence based on constraints and outside knowledge
  – Ex. “What is the most populated country in the world?” produced 261 answers, most were not locations
  – Any non-country answer could be demoted or removed.
    • Can easily be done with locations, book and movie titles, actors, directors, etc.
    • Some success using Google and simple patterns
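Demoting or removing answers that violate a type constraint could look roughly like this. Everything here is illustrative: the tiny `KNOWN_COUNTRIES` set stands in for a real gazetteer, the candidate list and confidences are made-up toy values, and the multiplicative `penalty` is just one possible way to demote.

```python
# Toy stand-in for a gazetteer of country names (hypothetical contents).
KNOWN_COUNTRIES = {"China", "India", "Russia"}


def apply_type_constraint(candidates, allowed, penalty=0.1, drop=False):
    """Demote (or drop) candidates that are not in the allowed set.

    candidates: list of (answer, confidence) pairs
    allowed:    set of answers that satisfy the expected answer type
    """
    adjusted = []
    for answer, conf in candidates:
        if answer in allowed:
            adjusted.append((answer, conf))
        elif not drop:
            adjusted.append((answer, conf * penalty))  # demote, keep in list
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


# Toy candidate list; the confidences are invented for the example.
candidates = [("dollars", 0.58), ("China", 0.40), ("Moscow", 0.30)]

print(apply_type_constraint(candidates, KNOWN_COUNTRIES))
print(apply_type_constraint(candidates, KNOWN_COUNTRIES, drop=True))
```

Demotion keeps the other candidates available as a fallback when the gazetteer is incomplete, while dropping gives a shorter list.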


Repository and Answer Justification

• Tables added
  – Planning
  – UtilityFunction
  – BeliefState
  – ExecutionOutcome
  – CandidateAction
  – PlanningStep
  – State
  – BeliefStateRelation
  – Metric
  – MetricStateRelation


Repository/AJ: Ongoing Work
• Creation of an Interactive Answer Justification Mode
  – Collaborative analyst-driven Answer Justification
    • Mixed-initiative between system and analyst
  – GUI interaction
    • At runtime being able to see planner reasoning
  – Runtime Answer Justification
    • Using this runtime justification to stop run and rerun with different parameters.
• Answer Type Based Justification
  – Different answer types require different justifications
    • Numeric answer-type questions should have different justifications than location answer-type questions.


Planning in JAVELIN
• Enable generation of new question-answering strategies at run-time

• Improve ability to recover from bad decisions as information is collected

• Gain insight into when different QA components are most useful


Planner Integration

[Integration diagram. Components: JAVELIN GUI, Planner, Domain Model (JAVELIN operator (action) models), Execution Manager, modules A, E, F, ..., Data Repository (process history and data). Message labels: question, answer, ack, dialog, response, exe A / exe E / exe F, results, store.]


Planning Approach

Builds on INSPIRE planning and execution architecture

Represent QA process steps as operators and model features of the information state
• Abstract away syntactic and lexical details of individual requests

Utility-based forward-chaining planning algorithm
• Select action sequence maximizing expected utility of information

Explicitly model state and action uncertainty

Interleave planning and execution control of individual JAVELIN QA modules to manage uncertainty
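To make "select the action sequence maximizing expected utility" concrete, here is a small depth-limited forward search over actions with probabilistic outcomes. It is a generic expectimax-style sketch, not the INSPIRE or JAVELIN planner: the two toy actions, their probabilities, and the utility function are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    name: str
    applicable: Callable[[dict], bool]
    # Each outcome is (probability, function mapping a state to its successor).
    outcomes: List[Tuple[float, Callable[[dict], dict]]]


def best_plan(state, actions, utility, depth) -> Tuple[float, Optional[str]]:
    """Return (expected utility, first action) of the best plan of <= depth steps."""
    best = (utility(state), None)  # option: stop and keep the current state
    if depth == 0:
        return best
    for act in actions:
        if not act.applicable(state):
            continue
        value = sum(p * best_plan(nxt(state), actions, utility, depth - 1)[0]
                    for p, nxt in act.outcomes)
        if value > best[0]:
            best = (value, act.name)
    return best


def tick(state, **changes):
    # Every action costs one time unit; `changes` are its other effects.
    return {**state, "time": state["time"] + 1, **changes}


# Hypothetical two-step QA domain with made-up probabilities.
retrieve = Action("retrieve", lambda s: not s["docs"],
                  [(0.8, lambda s: tick(s, docs=True)), (0.2, lambda s: tick(s))])
extract = Action("extract", lambda s: s["docs"] and not s["answer"],
                 [(0.5, lambda s: tick(s, answer=True)), (0.5, lambda s: tick(s))])


def utility(s):
    # Reward for having an answer, minus a small cost per time unit.
    return (1.0 if s["answer"] else 0.0) - 0.05 * s["time"]


start = {"docs": False, "answer": False, "time": 0}
value, first_action = best_plan(start, [retrieve, extract], utility, depth=2)
print(first_action, round(value, 2))
```

In an interleaved setting only the first action would be executed; the planner would then observe the real outcome and search again from the updated state.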


Planner Server Implementation

[Server diagram. Components: JAVELIN GUI, Server, Planner, PlannerOutput, EMInterface, Execution Manager, Object Database, ObjectWithFeatures. Labels: Question XML, Answer XML, Execute XML, Results XML; Domain, Operators; State, Action State; BeliefState & State plan representation; QA domain model updates; problem session storage for numeric & symbolic features of state objects.]

• Server translates GUI request to planning problem

• Planning & execution algorithm runs until it terminates with success or failure

• EMInterface translates between QA module data and internal state representation


Current Domain Operators

RESPOND_TO_USER
  pre: (and (interactive_session) (request ?q ?ro) (ranked_answers ?ans ?ro ?fills) (> (max_ans_score ?ans) 0.1) (> answer_quality 0))

ASK_USER_FOR_ANSWER_TYPE
  pre: (and (interactive_session) (request ?q ?ro) (or (and (ranked_answers ?ans ?ro ?fills) (< (max_ans_score ?ans) 0.1)) (no_docs_found ?ro) (no_fills_found ?ro ?docs)))

ASK_USER_FOR_MORE_KEYWORDS
  pre: (and (interactive_session) (request ?q ?ro) (or (and (ranked_answers ?ans ?ro ?fills) (< (max_ans_score ?ans) 0.1)) (no_docs_found ?ro) (no_fills_found ?ro ?docs)))

RETRIEVE_DOCUMENTS
  pre: (and (request ?q ?ro) (> (extracted_terms ?ro) 0) (> request_quality 0))

EXTRACT_DT_CANDIDATE_FILLS
  pre: (and (retrieved_docs ?docs ?ro) (== (expected_atype ?ro) location_t) (> docset_quality 0.3))

EXTRACT_KNN_CANDIDATE_FILLS
  pre: (and (retrieved_docs ?docs ?ro) (!= (expected_atype ?ro) location_t) (> docset_quality 0.3))

RANK_CANDIDATES
  pre: (and (candidate_fills ?fills ?ro ?docs) (> fillset_quality 0))

• QuestionAnalyzer module called as a precursor to planning
• Demonstrates generation of multiple search paths, feedback loops
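The preconditions above can be mirrored in a few lines to see which operators are applicable in a given state. This is a simplified sketch: the parameterized predicates (?q, ?ro, ?docs, ...) are flattened into plain dictionary flags, the function name is hypothetical, and only the thresholds (0.1 and 0.3) and the `location_t` test are taken from the slide.

```python
def applicable_operators(state):
    """Return the names of operators whose preconditions hold in `state`."""
    ops = []
    has_request = state.get("request", False)
    interactive = state.get("interactive_session", False)

    if (has_request and state.get("extracted_terms", 0) > 0
            and state.get("request_quality", 0) > 0):
        ops.append("RETRIEVE_DOCUMENTS")

    if state.get("retrieved_docs") and state.get("docset_quality", 0) > 0.3:
        # DT extractor for location questions, KNN extractor otherwise.
        if state.get("expected_atype") == "location_t":
            ops.append("EXTRACT_DT_CANDIDATE_FILLS")
        else:
            ops.append("EXTRACT_KNN_CANDIDATE_FILLS")

    if state.get("candidate_fills") and state.get("fillset_quality", 0) > 0:
        ops.append("RANK_CANDIDATES")

    ranked = state.get("ranked_answers", False)
    score = state.get("max_ans_score", 0.0)
    if (interactive and has_request and ranked and score > 0.1
            and state.get("answer_quality", 0) > 0):
        ops.append("RESPOND_TO_USER")

    weak = ranked and score < 0.1
    if interactive and has_request and (
            weak or state.get("no_docs_found") or state.get("no_fills_found")):
        ops += ["ASK_USER_FOR_ANSWER_TYPE", "ASK_USER_FOR_MORE_KEYWORDS"]
    return ops


# A weakly answered question triggers the feedback loop; a confident one
# can be answered. answer_quality values are made up for the example.
weak_state = {"interactive_session": True, "request": True,
              "ranked_answers": True, "max_ans_score": 0.01825,
              "answer_quality": 1.0}
strong_state = dict(weak_state, max_ans_score=0.58728)

print(applicable_operators(weak_state))
print(applicable_operators(strong_state))
```

The 0.1 threshold on the best answer score is what separates responding to the user from asking for a new answer type or more keywords.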


Current Domain Operators (more detailed operator view)

RETRIEVE_DOCUMENTS (?q - question ?ro - qtype)
  pre: (and (request ?q ?ro) (> (extracted_terms ?ro) 0) (> request_quality 0))
  dbind: (?docs (genDocsetID)
          ?dur (estTimeRS (expected_atype ?ro))
          ?pnone (probNoDocs ?ro)
          ?pgood (probDocsHaveAns ?ro)
          ?dqual (estDocsetQual ?ro))
  effects: (?pnone ((no_docs_found ?ro) (scale-down request_quality 2) (assign docset_quality 0) (increase system_time ?dur))
            ?pgood ((retrieved_docs ?docs ?ro) (assign docset_quality ?dqual) (increase system_time ?dur))
            (1-?pgood-?pnone) ((retrieved_docs ?docs ?ro) (scale-down request_quality 2) (assign docset_quality 0) (increase system_time ?dur)))
  execute: (RetrievalStrategist ?docs ?ro 10 15 300)
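The three-way probabilistic effect of RETRIEVE_DOCUMENTS can be spelled out as a small function that returns each outcome with its probability. This is a sketch under assumptions: `scale-down request_quality 2` is read as division by 2 (the PDDL-style meaning), states are plain dictionaries, and the numeric inputs in the example are made up rather than taken from the estimator functions named in `dbind`.

```python
def retrieve_documents_outcomes(state, p_none, p_good, d_qual, duration):
    """Enumerate (probability, successor state) pairs for RETRIEVE_DOCUMENTS.

    Outcomes: no documents found, documents that contain the answer,
    and documents that do not (probability 1 - p_good - p_none).
    """
    p_bad = 1.0 - p_good - p_none
    if min(p_none, p_good, p_bad) < 0:
        raise ValueError("probabilities must be non-negative and sum to at most 1")

    later = state["system_time"] + duration
    halved = state["request_quality"] / 2  # assumed meaning of scale-down ... 2

    no_docs = {**state, "no_docs_found": True, "request_quality": halved,
               "docset_quality": 0.0, "system_time": later}
    good_docs = {**state, "retrieved_docs": True,
                 "docset_quality": d_qual, "system_time": later}
    bad_docs = {**state, "retrieved_docs": True, "request_quality": halved,
                "docset_quality": 0.0, "system_time": later}
    return [(p_none, no_docs), (p_good, good_docs), (p_bad, bad_docs)]


# Hypothetical parameter values, chosen only to exercise the function.
start = {"request_quality": 1.0, "docset_quality": 0.0, "system_time": 0.0}
outcomes = retrieve_documents_outcomes(start, p_none=0.1, p_good=0.6,
                                       d_qual=0.8, duration=5.0)

expected_quality = sum(p * s["docset_quality"] for p, s in outcomes)
print(round(expected_quality, 2))
```

A planner can weigh such outcome lists against time cost when deciding whether another retrieval attempt is worth making.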


Illustrative Examples
Where is bile produced?

• Overcomes current limitations of system “location” knowledge
• Uses answer candidate confidence scores to trigger feedback loop

1st iter:
<RETRIEVE_DOCUMENTS RetrievalStrategist DS2216 RO2262 10 15 300>
<EXTRACT_DT_CANDIDATE_FILLS DTRequestFiller FS2216 RO2262 DS2216 900>
<RANK_CANDIDATES AnswerGenerator AL2196 RO2262 FS2216 180>
<ASK_USER_FOR_ANSWER_TYPE AskUserForAtype Q74050 RO2262>
<ASK_USER_FOR_MORE_KEYWORDS AskUserForKeywords Q74050 RO2262>

2nd iter:
<RETRIEVE_DOCUMENTS RetrievalStrategist DS2217 RO2263 10 15 300>
<EXTRACT_KNN_CANDIDATE_FILLS KNNRequestFiller FS2217 RO2263 DS2217 900>
<RANK_CANDIDATES AnswerGenerator AL2197 RO2263 FS2217 180>
<RESPOND_TO_USER RespondToUser A2204 AL2197 Q74050 RANKED>

Top 3 answers found during initial pass (with “location” answer type)
1: Moscow (Conf: 0.01825)
2: China (Conf: 0.01817)
3: Guangdong Province (Conf: 0.01817)

Top 3 answers displayed (with user-specified “object” answer type; ‘liver’ ranked 6th)
1: gallbladder (Conf: 0.58728)
2: dollars (Conf: 0.58235)
3: stores (Conf: 0.58147)


Illustrative Examples
Who invented the road traffic cone?

• Overcomes current inability to relax phrases during document retrieval
• Uses answer candidate confidence scores to trigger feedback loop

1st iter:
<RETRIEVE_DOCUMENTS RetrievalStrategist DS2221 RO2268 10 15 300>
<EXTRACT_KNN_CANDIDATE_FILLS KNNRequestFiller FS2221 RO2268 DS2221 900>
<RANK_CANDIDATES AnswerGenerator AL2201 RO2268 FS2221 180>
<ASK_USER_FOR_ANSWER_TYPE AskUserForAtype Q74053 RO2268>
<ASK_USER_FOR_MORE_KEYWORDS AskUserForKeywords Q74053 RO2268>

2nd iter:
<RETRIEVE_DOCUMENTS RetrievalStrategist DS2222 RO2269 10 15 300>
<EXTRACT_KNN_CANDIDATE_FILLS KNNRequestFiller FS2222 RO2269 DS2222 900>
<RANK_CANDIDATES AnswerGenerator AL2202 RO2269 FS2222 180>
<RESPOND_TO_USER RespondToUser A2207 AL2202 Q74053 RANKED>

Top 3 answers found during initial pass (using terms ‘invented’ and ‘road traffic cone’)
1: Colvin (Conf: 0.0176)
2: Vladimir Zworykin (Conf: 0.0162)
3: Angela Alioto (Conf: 0.01483)

Top 3 answers displayed (with additional user-specified term ‘traffic cone’; correct answer is ‘David Morgan’)
1: Morgan (Conf: 0.4203)
2: Colvin (Conf: 0.0176)
3: Angela Alioto (Conf: 0.01483)


Y2 Planner Goals
• Improve operator preconditions and parameter estimates

• Enable user-specified time limits

• Provide GUI with planner status updates

• Improve user dialogs for request modification and clarification

• Evaluate performance of the revised domain model on TREC question sets

• Continue operator refinements as new modules become available

• Evaluate different utility functions, sensitivity to operator parameter values

• Explore different execution and replanning strategies

• Support context questions, question decomposition, merging answers

• Add feedback loop for learning operator parameters


NLP for Information Extraction

• Simple statistical classifiers are not sufficient on their own

• Need to supplement statistical approach with natural language processing to handle more complex queries


Example of IX error

• Question: “When was Wendy’s founded?”
• Passage candidate:
  – “The renowned Murano glassmaking industry, on an island in the Venetian lagoon, has gone through several reincarnations since it was founded in 1291. Three exhibitions of 20th-century Murano glass are coming up in New York. By Wendy Moonan.”
• IX generates: 20th Century


Passage Analyzer
• Employ multiple parsers over passages returned by Information Retrieval module
• Transform resultant constituent structures (parse trees) into functional structures
  – Requires unique sub-module for each parser that does not already output f-structure
• Transform f-structures into argument structures (predicates)
  – Requires only one sub-module for all parsers (given proper transformation into f-structure)
• Compare and unify resultant a-structures from passage with a-structure from the question
  – Benefits Answer Generation module by lending supporting evidence to results


Example question
• Question: “When was Wendy’s founded?”
• Question Analyzer extended output:
  – { temporal(?x), found(*, Wendy’s) }
• Passage discovered by Information Retrieval module:
  – “R. David Thomas founded Wendy’s in 1969, …”
• Conversion to predicate form by Passage Analyzer:
  – { founded(R. David Thomas, Wendy’s), DATE(1969), … }
• Unification of QA literals against PA literals:
  – Equiv(found(*,Wendy’s), founded(R. David Thomas, Wendy’s))
  – Equiv(temporal(?x), DATE(1969))
  – ?x := 1969
• Answer: 1969
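The unification step in this example can be imitated with a few dozen lines. The sketch is deliberately naive and is not the Passage Analyzer's implementation: literals are (predicate, arguments) tuples, the `STEMS` and `TYPE_MATCHES` tables are toy stand-ins for a verb stemmer and an answer-type mapping, matching is greedy with no backtracking, and nothing checks that the date actually attaches to the founding event.

```python
# Toy lookup tables (hypothetical): a real system would use a stemmer
# and a richer answer-type mapping.
STEMS = {"founded": "found"}
TYPE_MATCHES = {"temporal": {"DATE"}}


def same_predicate(q_pred, p_pred):
    """Predicates match if their stems agree or the types are compatible."""
    if STEMS.get(q_pred, q_pred) == STEMS.get(p_pred, p_pred):
        return True
    return p_pred in TYPE_MATCHES.get(q_pred, set())


def unify(q_lit, p_lit, bindings):
    """Try to unify one question literal with one passage literal."""
    (q_pred, q_args), (p_pred, p_args) = q_lit, p_lit
    if not same_predicate(q_pred, p_pred) or len(q_args) != len(p_args):
        return None
    new = dict(bindings)
    for q_arg, p_arg in zip(q_args, p_args):
        if q_arg == "*":                      # wildcard matches anything
            continue
        if q_arg.startswith("?"):             # variable: bind or check binding
            if new.setdefault(q_arg, p_arg) != p_arg:
                return None
        elif q_arg != p_arg:                  # constants must be identical
            return None
    return new


def answer(question, passage):
    """Greedily unify every question literal; return bindings or None."""
    bindings = {}
    for q_lit in question:
        for p_lit in passage:
            result = unify(q_lit, p_lit, bindings)
            if result is not None:
                bindings = result
                break
        else:
            return None  # some question literal has no support in the passage
    return bindings


question = [("temporal", ("?x",)), ("found", ("*", "Wendy's"))]
passage = [("founded", ("R. David Thomas", "Wendy's")), ("DATE", ("1969",))]

print(answer(question, passage))
```

Requiring support for every question literal is what filters out passages that contain a date but no statement about Wendy’s being founded.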


Multiple IX Modules

[Diagram. Passages and the Request Object feed multiple IX modules (IXi, IXj); each produces answer candidates, which go to the AG to produce the Answer.]


Module prototype

[Diagram. Inputs: Passages, Request Object. Components: Parsers, NE Tagger, Verb stemmer, Passage Analyzer, Predicate unification, WordNet. Intermediate data: Predicates. Output: Answer Candidates.]


NLP IX: Future Work
• Clean & enhance current extraction rules
  – Formalize distinction between transition from c- to f-structure vs. f- to a-structure
• Make use of multiple parsers/grammars to take advantage of individual strengths of each
  – Tradeoff: Depth in specific domain vs. breadth of coverage
• Move target representation from a-structure to semantic structure
  – Take advantage of cutting-edge work in semantic role identification, FrameNet, PropBank, etc.
  – Leads into future effort towards answering non-factoid questions
    • Reasoning about events, concept mappings, etc.


Multilingual Question Answering

• Goals
  – English questions
  – Multilingual information sources (Jpn/Chi)
  – English/Multilingual Answers
• Extensions to existing JAVELIN modules
  – Question Analyzer
  – Retrieval Strategist
  – Information Extractor
  – Answer Generator


Multilingual Architecture

[Architecture diagram. Questions (?’s) enter the Question Analyzer; Answers leave the Answer Generator. Other components: RS, Bilingual Dictionary Module, Machine Translation, Encoding Converter; English, Japanese, Chinese, and other indexes; English, Japanese, Chinese, and other-language corpora; Information Extractor 1 (English), 2 (Japanese), 3 (Chinese), 4 (other lang).]


Japanese Language Resources
• Mainichi Shimbun Corpus
  – Full corpus for 1998 and 1999 of a major Japanese newspaper.
    • About 240,000 articles
• Bilingual Dictionaries
  – EDICT
    • (100,000 general entries, 200,000 Japanese personal names, 87,000 Japanese place names, 14,000 scientific terms)
  – EIJIRO
    • English word to Japanese phrase – harder to use, but has 1,080,000 entries.


Chinese Language Resources
• Corpora
  – Xinhua News corpora
    • Xinhua News from 1991-2001
  – Foreign Broadcast Information Service (FBIS)
    • Mandarin-English Parallel corpus
• Preprocessing (tools from RADD-MT project)
  – ASCII character and digit normalization
  – Segmentation
  – Named entity tagging
• Bilingual Dictionaries
  – LDC
    • Bilingual word-to-word dictionary
    • Bilingual phrase-to-phrase dictionary
  – ABC Dictionary
    • Contains POS


Areas for Future Exploration
(outside the scope of the two-year Javelin project, but interesting)

• Machine Translation of Questions
• Use of Web-based Translation Resources
• Multilingual Answer Combining & Selection
  – When multiple corpora from multiple languages return answers, how do we select the best one?
  – For list-type answers, we may want to combine answers from several languages to get a more complete answer.
• Multilingual NLP Grammars for Information Extraction
  – NLP Grammar already being worked on for English – could increase the quality of answers extracted from the corpus, but needs to be developed separately for each language.
• Additional Languages


Overall JAVELIN Goals for End of Year 1

• Evaluate post-TREC improvements to JAVELIN modules
• First end-to-end system with Japanese
• Distribute system documentation
• Install in MITRE testbed environment


JAVELIN Goals for Year 2
• Investigate complex questions
  – Question/answer decomposition
  – Context questions
• Interactive dialog with analyst
  – Query refinement
  – Multi-question dialogs
• Multiple data sources
  – Multilingual (Japanese, Chinese)
  – Multi-source (e.g. CNS corpus)
