
Dec. 3-5, 2002 AQUAINT 12-Month Workshop

HITIQA: High-Quality Interactive Question Answering

12-Month Review

University at Albany, SUNY; Rutgers University


HITIQA Team

• SUNY Albany:
– Prof. Tomek Strzalkowski, PI/PM
– Prof. Rong Tang
– Prof. Boris Yamrom, consultant
– Ms. Sharon Small, Research Scientist
– Mr. Ting Liu, Graduate Student
– Mr. Nobuyuki Shimizu, Graduate Student
– Mr. Tom Palen, summer intern
– Mr. Peter LaMonica, summer intern/AFRL

• Rutgers:
– Prof. Paul Kantor, co-PI
– Prof. K.B. Ng
– Prof. Nina Wacholder
– Mr. Robert Rittman, Graduate Student
– Ms. Ying Sun, Graduate Student
– Mr. Peng Song, Graduate Student


HITIQA Concept

Question: What recent disasters occurred in tunnels used for transportation?

Possible category axes seen: Vehicle type (auto, train, other); Losses/Cost; Location

Clarification Dialogue:
S: Are you interested in train accidents, automobile accidents, or others?
U: Any that involved loss of life or a major disruption in communication. Must identify losses.

Semantics: what the question “means”
• to the system
• to the user

[Diagram of HITIQA processing stages: User Profile; Task Context; Question NL Processing; Semantic Processing; Template Selection; Search & Categorize; KB; Quality Assessment; Focused Information Need; Fuse & Summarize; Answer Generation; Answer & Justification]


Key Research Issues

• Question Semantics – how the system “understands” user requests

• Human-Computer Dialogue – how the user and the system negotiate this understanding

• Information Quality Metrics – how some information is better than others

• Information Fusion – how to assemble an answer that fits the user’s needs


[HITIQA architecture diagram. Input: question; output: answer. Modules: Question Processor, Document Retrieval, Segment/Filter, Cluster Segments, Build Frames, Process Frames, Dialogue Manager, Query Refinement, Answer Generator, Visualization. Resources: WordNet, DB, GATE. Legend: Completed Work, Current Focus.]


Data-Driven NL Semantics

• What does the question mean to the user?
– The speech act
– The focus
– User’s task, intention, goal
– User’s background knowledge

• What does the question mean to the system?
– Available information
– Information that can be retrieved
– The dimensions of the retrieved information


Data-Driven Semantics

• What’s available?
– Assemble potentially relevant information
– Greedy retrieval to maximize recall

• How does it break down?
– Break the retrieved set into topics and facets
– Passage-level clustering using dynamic n-grams

• What does it mean?
– Frame each facet, determine attributes
– Specialized information extraction routines

• What is the answer?
– Match fact frames against the question frames
– Consider full matches and near misses
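The “How does it break down?” step groups retrieved passages by shared n-grams. The slides do not say how the dynamic n-grams are chosen, so the sketch below is a simplification under stated assumptions: a fixed n, Jaccard overlap of word n-gram sets, and a single-pass greedy assignment with a hypothetical threshold. The names `ngrams`, `jaccard`, and `cluster_passages` are illustrative, not HITIQA code.

```python
import re


def ngrams(text: str, n: int = 2) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_passages(passages: list[str], n: int = 2,
                     threshold: float = 0.2) -> list[list[int]]:
    """Hypothetical greedy single-pass clustering (fixed n, not 'dynamic').

    Each passage joins the first cluster whose accumulated n-gram profile
    it overlaps by at least `threshold`; otherwise it starts a new cluster.
    Returns clusters as lists of passage indices.
    """
    clusters: list[list[int]] = []
    profiles: list[set] = []  # union of n-grams seen in each cluster
    for i, passage in enumerate(passages):
        grams = ngrams(passage, n)
        for c, profile in enumerate(profiles):
            if jaccard(grams, profile) >= threshold:
                clusters[c].append(i)
                profile |= grams  # grow the cluster profile in place
                break
        else:
            clusters.append([i])
            profiles.append(set(grams))
    return clusters


passages = [
    "Iraq biological weapons program remains unverified",
    "inspectors say the Iraq biological weapons program is not credible",
    "Black Sea pollution threatens the fishing industry",
]
print(cluster_passages(passages))  # [[0, 1], [2]]
```

The first two passages share three bigrams out of eleven (overlap about 0.27), so they form one topical cluster; the Black Sea passage shares none and starts its own.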


• Because of Iraq's defiance, ``the council may need to consider, at some stage, that the effect of these actions by Iraq may prove that the commission is obliged to conclude that it is unable to provide 100 percent verification,'' that Iraq has destroyed all its banned weapons, the inspectors said.

• They repeated previous statements that they were close to declaring that Iraq had complied with resolutions regarding its chemical weapons and missiles, but that questions remained as to Iraq's biological weapons program.

• The report cites the biological problem as the reason why Iraq and not inspectors should still be responsible for making disclosures about banned weapons.

• For nearly four years, Iraq failed to tell inspectors that it had a biological weapons program, the inspectors said. Only when forced did Baghdad disclose it, but its reports since then have been ``neither credible nor verifiable.'' How then could inspectors be asked to prove what Iraq has refused to divulge, the inspectors asked in their report. Iraq should continue to be responsible for providing all information about its banned weapons programs, as called for by U.N. resolutions, the inspectors argued.

Framing a Topical Cluster

TextFrame
GroupId: 0
Target: weapons
subTarget: biological weapons
locations: Iraq, Iraq, Iraq
organizations: U.N.

Because of Iraq's defiance, ``the council may need to consider, at some stage, that the effect of these actions by Iraq may prove that the commission is obliged to conclude that it is unable to provide 100 percent verification,'' that Iraq has destroyed all its banned weapons, the inspectors said.

They repeated previous statements that they were close to declaring that Iraq had complied with resolutions regarding its chemical weapons and missiles, but that questions remained as to Iraq's biological weapons program.

Relevance: Matches on all elements found in GoalFrame = {location, target}

GoalFrame
Target: possessing, weapons, mass destruction, nuclear weapons, biological weapons
locations: Iraq
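The relevance test on this slide, a match on every element found in the GoalFrame, can be sketched as a set comparison. This assumes frames are plain dictionaries mapping attribute names to sets of extracted values, and that an attribute matches when the goal and text values share at least one term. `matched_attributes` and `is_relevant` are hypothetical helpers, and the frame values are taken loosely from the slide for illustration.

```python
Frame = dict[str, set[str]]


def matched_attributes(goal: Frame, frame: Frame) -> set[str]:
    """Return the GoalFrame attributes whose values overlap the text frame's values."""
    return {attr for attr, wanted in goal.items()
            if wanted & frame.get(attr, set())}


def is_relevant(goal: Frame, frame: Frame) -> bool:
    """Hypothetical rule: relevant only if every GoalFrame attribute matches."""
    return matched_attributes(goal, frame) == set(goal)


goal_frame: Frame = {
    "target": {"possessing", "weapons", "mass destruction",
               "nuclear weapons", "biological weapons"},
    "locations": {"Iraq"},
}
text_frame: Frame = {
    "target": {"weapons"},
    "subtarget": {"biological weapons"},
    "locations": {"Iraq"},
    "organizations": {"U.N."},
}
near: Frame = {"target": {"chemical weapons"}, "locations": {"Iraq"}}

print(is_relevant(goal_frame, text_frame))   # True
print(matched_attributes(goal_frame, near))  # {'locations'}
```

Attributes that appear only in the text frame (here `subtarget` and `organizations`) do not affect the test; they are still available for answer generation or for the clarification dialogue.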


Answer Space Topology

[Diagram: the answer space as regions. KERNEL (question match); NEAR MISSES, ALTERNATIVE INTERPRETATIONS; ALL RETRIEVED FRAMES]
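The three regions can be read as a partition of the retrieved frames by how many GoalFrame attributes each one matches. The slides do not define the near-miss boundary precisely, so this sketch assumes a near miss is a frame that matches some, but not all, goal attributes. `partition_answer_space` is a hypothetical name, and frames are assumed to be dictionaries of value sets.

```python
Frame = dict[str, set[str]]


def partition_answer_space(goal: Frame, frames: list[Frame]) -> dict[str, list[int]]:
    """Split frame indices into kernel matches, near misses, and the rest.

    Assumption: 'kernel' = all goal attributes overlap, 'near_miss' = at
    least one but not all overlap, 'other' = none overlap.
    """
    regions: dict[str, list[int]] = {"kernel": [], "near_miss": [], "other": []}
    for i, frame in enumerate(frames):
        hits = sum(1 for attr, wanted in goal.items()
                   if wanted & frame.get(attr, set()))
        if hits == len(goal):
            regions["kernel"].append(i)
        elif hits > 0:
            regions["near_miss"].append(i)
        else:
            regions["other"].append(i)
    return regions


goal: Frame = {"target": {"biological weapons"}, "locations": {"Iraq"}}
frames: list[Frame] = [
    {"target": {"biological weapons"}, "locations": {"Iraq"}},
    {"target": {"chemical weapons"}, "locations": {"Iraq"}},
    {"target": {"pollution"}, "locations": {"Black Sea"}},
]
print(partition_answer_space(goal, frames))
# {'kernel': [0], 'near_miss': [1], 'other': [2]}
```

Under this reading, the near-miss region is where alternative interpretations come from, which makes it a natural source of clarification questions.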


Data-Driven Interaction

• What does the question mean to the user?
– The speech act
– The focus
– User’s task/intention/goal
– User’s background knowledge

• What does the question mean to the system?
– Available information
– Information that can be retrieved
– The dimensions of the retrieved information

• Shared Understanding
– Semantic gaps drive the dialogue:
   • to negotiate between user’s meaning and system’s meaning
   • to fill the gaps in the expected answer
   • to resolve ambiguities in the data
   • to reduce dimensionality of the answer space


Dialogue with the System

• Dialogue arises from:
– System’s need to clarify before proceeding
– Analyst’s need to clarify to keep system on target

• Dialogue Strategies:
– Alternative interpretations: narrowing
   • SYSTEM: Ask user to differentiate answers from non-answers
   • USER: confirm, deny, offer extra cues, …
– Off-target interpretations: expanding
   • SYSTEM: Ask user to modify the question
   • USER: confirm, deny, extra cues, new question, …
– More details please: information seeking
   • USER: Ask linked questions to follow through
   • USER: Specify answer characteristics: shape, quality, …

• A dialogue is unique to each analyst-data pair
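One way to read these strategies is as a rule driven by the shape of the answer space. The sketch below is a hypothetical decision rule, not HITIQA’s Dialogue Manager. It assumes the system expands (asks the user to modify the question) when neither kernel matches nor near misses were found, and narrows by offering near-miss topics one at a time, using the phrasing of the system turns in the sample dialogue that follows. `next_system_turn` and its parameters are illustrative names.

```python
def next_system_turn(kernel_count: int, near_miss_topics: list[str],
                     focus: str) -> str:
    """Pick the next clarification move from answer-space counts.

    Hypothetical rule: expand when nothing matched at all; otherwise
    narrow by offering the first near-miss topic; otherwise report success.
    """
    if kernel_count == 0 and not near_miss_topics:
        # Off-target interpretation: expanding
        return ("We found nothing matching your question. "
                "Could you rephrase or modify it?")
    if near_miss_topics:
        # Alternative interpretations: narrowing, one near-miss topic at a time
        topic = near_miss_topics[0]
        return (f"Are you interested in seeing information about {topic} "
                f"as it is related to {focus}?")
    return "We have found relevant information for your query."


print(next_system_turn(3, ["The Black Sea convention"], "Black Sea pollution"))
```

A “yes” from the analyst would move the offered topic into the answer; a “no” would drop it. Either way, the next call sees a shorter list of near-miss topics.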


A Sample Dialogue (actual)

Analyst: “need information on pollution and the fishing industry in regard to the Black Sea. also need related items to pollution and international cooperation.”

HITIQA: <Work in progress messages sent>

HITIQA: “We have found relevant information for your query. (You may ask for answer generation at any point now.) We have also found other information that may be relevant. Are you interested in seeing information about The Black Sea convention as it is related to Black Sea pollution?”

Analyst: “yes”

HITIQA: “Thank you. Are you interested in seeing information about The International Waters Assessment team as it is related to Black Sea pollution?”

Analyst: “Yes particularly about fishing if possible”


Dialogue continues…

HITIQA: “We have more information like this. Would you like us to continue?”

Analyst: “yes”

HITIQA: The environment, wildlife and people linked with the Black Sea are also under threat from large discharges of raw sewage, damaging levels of coastal erosion and the suffocating impact of dumping sludge and mud dredged from ports…


AQUAINT DIALOGUE EVALUATIONS

ALL PARTICIPANTS

• 10 scenarios plus 1 training scenario

• 15 minutes per scenario

• Chat interface

• Wizard control allowed

HITIQA

• No scenario filtering of data - 3 Gigabytes of newswire

• 13% Wizard interruption of system responses


[Chart: dialogue utterances by scenario (training scenario, then scenarios 9, 4, 6, 2, 8, 1, 10, 3, 7, 5); y-axis 0 to 35; series: Total, Analyst, System, Wizard]

Breakdown of dialogue utterances, Analyst Two:
• Analyst 34%
• System 58%
• Wizard 8%


Information Quality

Quality Criteria

• CONTENT
– Accuracy and Objectivity
– Completeness; uniqueness
– Importance; Verifiability

• AUTHORITY
– Reliability; credibility

• PRESENTATION
– Clarity and Un-ambiguity
– Style and Gravitas
– Orientation and Level
– Readability and Usability

• TIMELINESS
– Recency
– Currency

Measurable Quality Indicators

• IN/OUT-DEGREE MEASURE
– Number of cites or links to/from
– Credibility of these cites/links

• DOCUMENT SIZE

• STYLISTIC FEATURES
– Typical sentence length
– Use of pronouns, punctuation

• LINGUISTIC FEATURES
– Sentence forms, verbs
– References to names, amounts

• STRUCTURAL FEATURES
– Organization of sections
– Use of section titles, etc.

• COLLECTION FEATURES
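Several of these indicators can be computed directly from raw text. The sketch below covers document size and two stylistic features (typical sentence length, and pronoun and punctuation rates). The punctuation-based sentence splitter, the short pronoun list, and the feature names are simplifying assumptions, not the feature set used in the HITIQA experiments. A production system would use a tokenizer and a part-of-speech tagger instead.

```python
import re
import string

# Illustrative pronoun list (an assumption; a POS tagger would be more complete).
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they",
            "me", "him", "her", "us", "them"}


def stylistic_features(text: str) -> dict[str, float]:
    """Compute simple size and style indicators for one document."""
    # Naive sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    punct = sum(1 for ch in text if ch in string.punctuation)
    pronouns = sum(1 for w in words if w.lower() in PRONOUNS)
    return {
        "size_words": float(n_words),
        "mean_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "pronoun_rate": pronouns / n_words if n_words else 0.0,
        "punctuation_per_word": punct / n_words if n_words else 0.0,
    }


print(stylistic_features("They said it was safe. We doubt it!"))
# {'size_words': 8.0, 'mean_sentence_length': 4.0,
#  'pronoun_rate': 0.5, 'punctuation_per_word': 0.25}
```

Vectors like this, one per document, are the kind of input the linear quality models later in this section combine.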


Information Quality Research

[Flowchart of research stages: Document Selection; Focus Group Study; Implement Experimental Systems; Pretests; Quality Judgment Experiment; Textual Feature Extraction; Automated Document Quality Prediction. Markers: “Start” and “We’re here”.]


Quality Judgments

• Focus Group:
– Sessions conducted: March-April 2002
– Results: nine quality aspects generated

• Expert Sessions:
– Sessions conducted: May-June 2002
– Results: 100 documents scored twice along 9 quality aspects

• Student Sessions:
– Training and testing sessions: June-July 2002
   • 10 documents judged by experts used for training/testing
– Actual judgment sessions: June-August 2002
   • Qualified students evaluated 10 documents per session
– Results: 900 documents scored twice along 9 quality aspects


Quality Assessment GUI


Factor Analysis of 9 Quality Features

[Figure: factor plot; factor labels: Appearance, Content]


Modeling Quality of Text

• Kitchen sink approach
– 160 “independent” variables
– Part-of-speech, vocabulary, stylistics, named entities, …

• Statistical pruning
– Statistically significant variables
– May be nonsensical to humans

• Human pruning
– Only “sensible” variables retained for each quality

• Pruning improves performance
– Kitchen sink overfits
– Statistical and human pruning are close in performance
– More work needed to understand the relationship


Quality Prediction by Linear Combination of Textual Features (5 to 17 variables); split half for training and testing.

Quality Factor             Prediction Rate
Depth                      67%
Author Credential          55%
Accuracy                   69%
Source                     57%
Objectivity                64%
Grammar                    79%
One Side vs. Multi View    70%
Verbosity                  63%
Readability                76%

Performance of models
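The prediction rates above come from linear combinations of textual features evaluated with a split-half design. Here is a minimal sketch of that evaluation loop. The slides do not specify binary high/low quality labels, a least-squares linear scorer fit by plain gradient descent, a 0.5 decision threshold, or an even/odd split, so all four are assumptions. `fit_linear`, `split_half_rate`, and the toy data are hypothetical.

```python
def fit_linear(X: list[list[float]], y: list[float],
               lr: float = 0.1, epochs: int = 500) -> tuple[list[float], float]:
    """Fit weights w and bias b minimizing the mean squared error of w·x + b."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Batch gradient step (constant factor 2 folded into the learning rate).
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def split_half_rate(X: list[list[float]], y: list[int],
                    threshold: float = 0.5) -> float:
    """Train on even-indexed documents, test on odd-indexed ones, and return
    the fraction of test documents whose high/low label is predicted correctly."""
    w, b = fit_linear(X[0::2], [float(v) for v in y[0::2]])
    test = range(1, len(X), 2)
    correct = 0
    for i in test:
        score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
        correct += int((score >= threshold) == (y[i] == 1))
    return correct / len(test)


# Toy data: a single feature that perfectly separates high (1) from low (0) quality.
X = [[0.0], [0.0], [1.0], [1.0]] * 5
y = [0, 0, 1, 1] * 5
print(split_half_rate(X, y))  # 1.0
```

On real judgment data the held-out rate is what corresponds to the percentages in the table. The pruning discussion above amounts to choosing which columns of X to keep, so that the model does not overfit the training half.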


Data Fusion

• Use multiple methods to assess the relevance of documents or passages
– For a given question, dialogue, or cluster
– Each method assigns a “score”

• Candidates → points in a “score space”

• Seek patterns to localize the most relevant documents or passages in this “score space”

• Developed an interactive data analysis tool
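The score-space idea can be illustrated with a standard fusion baseline. This sketch substitutes CombSUM and CombMNZ, classic score-combination rules from information retrieval, for the HITIQA interactive tool. It first min-max normalizes each method's scores. Each candidate becomes a point whose coordinates are its normalized scores. The method names (`keyword`, `frame_match`) and the scores are made up for illustration.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one method's scores to [0, 1]; a constant method maps to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(runs: dict[str, dict[str, float]], mnz: bool = True) -> list[tuple[str, float]]:
    """Combine per-method scores into one ranking.

    Each candidate is a point in score space (one coordinate per method;
    a method that did not score a candidate contributes 0). CombSUM adds
    the coordinates; CombMNZ also multiplies by the number of methods that
    gave the candidate a non-zero normalized score.
    """
    normalized = {m: min_max(s) for m, s in runs.items()}
    docs = {doc for s in runs.values() for doc in s}
    fused: dict[str, float] = {}
    for doc in docs:
        point = [normalized[m].get(doc, 0.0) for m in runs]
        total = sum(point)
        hits = sum(1 for v in point if v > 0)
        fused[doc] = total * hits if mnz else total
    # Highest fused score first; ties broken by document id for stable output.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


runs = {
    "keyword": {"d1": 3.0, "d2": 1.0, "d3": 2.0},
    "frame_match": {"d1": 0.9, "d2": 0.1, "d3": 0.1},
}
print(fuse(runs))             # [('d1', 4.0), ('d3', 0.5), ('d2', 0.0)]
print(fuse(runs, mnz=False))  # [('d1', 2.0), ('d3', 0.5), ('d2', 0.0)]
```

CombMNZ rewards candidates that several methods agree on. Fixed formulas like these are only a baseline, and the slides' point is to look for patterns in the score space itself rather than commit to one formula in advance.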


Non-linear “iso-relevance”


Information Visualization

• Supports Evidence Fusion
– Dimensional displays

• Supports Information Quality Decisions
– User interfaces

• Supports Clarification Dialogue
– Multi-media dialogue: “picture = Kilo-word”

• Navigation through information space
– Multiple views and orientation


Visual Dialog in HITIQA

• Display space
– Multi-dimensional
– Non-homogeneous
– Non-structured

• Mapping
– Documents → Frames → Visuals
– Navigation through changing dimensions

• Selection
– Use of color and shape


HITIQA Visual Panel: cluster view


Current Status Summary

• HITIQA 1st Prototype complete
• Data-driven semantics for questions
• Framing and Dialogue
• Good results of pilot evaluation
• Information Quality Experiments
• User studies phase I completed
• 2-D visualization developed
• Information fusion work started


Plans for the next 6 months

• Refine the prototype
– Typed, specialized frames
– More informative dialogue
– Handle series of questions

• Second round of quality experiments
• Answer generation
• Information fusion
• Tests and evaluations
