Dec. 3-5, 2002 AQUAINT 12-Month Workshop 1
HITIQA: High-Quality Interactive Question Answering
12-Month Review
University at Albany, SUNY; Rutgers University
HITIQA Team
• SUNY Albany:
  – Prof. Tomek Strzalkowski, PI/PM
  – Prof. Rong Tang
  – Prof. Boris Yamrom, consultant
  – Ms. Sharon Small, Research Scientist
  – Mr. Ting Liu, Graduate Student
  – Mr. Nobuyuki Shimizu, Graduate Student
  – Mr. Tom Palen, summer intern
  – Mr. Peter LaMonica, summer intern/AFRL
• Rutgers:
  – Prof. Paul Kantor, co-PI
  – Prof. K.B. Ng
  – Prof. Nina Wacholder
  – Mr. Robert Rittman, Graduate Student
  – Ms. Ying Sun, Graduate Student
  – Mr. Peng Song, Graduate Student
HITIQA Concept
Question: What recent disasters occurred in tunnels used for transportation?
[Diagram: HITIQA concept]
• Possible Category Axes Seen: Vehicle type; Losses/Cost; location (values shown: auto, train, other)
• USER PROFILE; TASK CONTEXT
• QUESTION NL PROCESSING
• Clarification Dialogue:
  S: Are you interested in train accidents, automobile accidents or others?
  U: Any that involved lost life or a major disruption in communication. Must identify losses.
• Semantics: What the question “means”:
  – to the system
  – to the user
• SEMANTIC PROC
• FUSE & SUMMARIZE
• Answer & Justification
• ANSWER GENER.
• SEARCH & CATEGORIZE
• KB
• TEMPLATE SELECTION
• Focused Information Need
• QUALITY ASSESSMENT
Key Research Issues
• Question Semantics – how the system “understands” user requests
• Human-Computer Dialogue – how the user and the system negotiate this understanding
• Information Quality Metrics – what makes some information better than other information
• Information Fusion – how to assemble an answer that fits the user’s needs
System Architecture
[Diagram: HITIQA system architecture; input: question, output: answer]
Components: Question Processor (with WordNet), Document Retrieval, Segment/Filter, Cluster Segments, Build Frames, Process Frames, Dialogue Manager, Query Refinement, Answer Generator, Visualization, DB, GATE
Legend: Completed Work; Current Focus
Data-Driven NL Semantics
What does the question mean to the user?
– The speech act
– The focus
– User’s task, intention, goal
– User’s background knowledge
What does the question mean to the system?
– Available information
– Information that can be retrieved
– The dimensions of the retrieved information
Data-Driven Semantics
• What’s available?
  – Assemble potentially relevant information
  – Greedy retrieval to maximize recall
• How does it break down?
  – Break the retrieved set into topics and facets
  – Passage-level clustering using dynamic n-grams
• What does it mean?
  – Frame each facet, determine attributes
  – Specialized information extraction routines
• What is the answer?
  – Match fact frames against the question frames
  – Consider full matches and near misses
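The slide names passage-level clustering with “dynamic n-grams” but does not define the term. As a rough illustration only, the sketch below clusters passages by fixed-length word n-gram overlap (Jaccard similarity) and merges every pair above a threshold (single-link clustering via union-find). The n-gram length, the threshold, and all function names are assumptions, not HITIQA’s actual method.

```python
import re
from itertools import combinations


def word_ngrams(text: str, n: int = 2) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in a passage."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two n-gram sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_passages(passages: list[str], n: int = 2,
                     threshold: float = 0.2) -> list[list[int]]:
    """Single-link clustering: any two passages whose n-gram overlap
    reaches the threshold end up in the same cluster."""
    parent = list(range(len(passages)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    grams = [word_ngrams(p, n) for p in passages]
    for i, j in combinations(range(len(passages)), 2):
        if jaccard(grams[i], grams[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters: dict[int, list[int]] = {}
    for i in range(len(passages)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Single-link merging is used here only because it is the simplest threshold-based scheme; a real system would likely need something less prone to chaining unrelated passages together.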
Framing a Topical Cluster
• Because of Iraq's defiance, “the council may need to consider, at some stage, that the effect of these actions by Iraq may prove that the commission is obliged to conclude that it is unable to provide 100 percent verification,” that Iraq has destroyed all its banned weapons, the inspectors said.
• They repeated previous statements that they were close to declaring that Iraq had complied with resolutions regarding its chemical weapons and missiles, but that questions remained as to Iraq's biological weapons program.
• The report cites the biological problem as the reason why Iraq and not inspectors should still be responsible for making disclosures about banned weapons.
• For nearly four years, Iraq failed to tell inspectors that it had a biological weapons program, the inspectors said. Only when forced did Baghdad disclose it, but its reports since then have been “neither credible nor verifiable.” How then could inspectors be asked to prove what Iraq has refused to divulge, the inspectors asked in their report. Iraq should continue to be responsible for providing all information about its banned weapons programs, as called for by U.N. resolutions, the inspectors argued.
TextFrame
  GroupId: 0
  Target: weapons
  subTarget: biological weapons
  locations: Iraq, Iraq, Iraq
  organizations: U.N.
  Text: the first two passages above (“Because of Iraq's defiance…” and “They repeated previous statements…”)
Relevance: Matches on all elements found in GoalFrame = {location, target}
GoalFrame
  Target: possessing, weapons, mass destruction, nuclear weapons, biological weapons
  locations: Iraq
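To make the matching step concrete, here is a minimal sketch that treats a frame as a mapping from attribute names to sets of values, and counts a text frame as a full match when it shares at least one value with every GoalFrame attribute, following the “matches on all elements found in GoalFrame” rule above. The attribute names, the lowercasing, and the second (Libya) frame are hypothetical, and the sketch ignores subTarget when matching Target; HITIQA’s real frame structures are not specified in these slides.

```python
# A frame maps attribute names (hypothetical: "target", "locations", ...)
# to sets of values extracted from the text.
Frame = dict[str, set[str]]


def make_frame(**attrs: list[str]) -> Frame:
    """Build a frame, lowercasing values so matching is case-insensitive."""
    return {name: {v.lower() for v in values} for name, values in attrs.items()}


def matched_attributes(goal: Frame, text: Frame) -> set[str]:
    """Goal-frame attributes for which the text frame shares at least one value."""
    return {name for name, values in goal.items() if values & text.get(name, set())}


def is_full_match(goal: Frame, text: Frame) -> bool:
    """True when every attribute of the goal frame is matched."""
    return matched_attributes(goal, text) == set(goal)


goal = make_frame(
    target=["possessing", "weapons", "mass destruction",
            "nuclear weapons", "biological weapons"],
    locations=["Iraq"],
)
iraq_frame = make_frame(target=["weapons"], subtarget=["biological weapons"],
                        locations=["Iraq"], organizations=["U.N."])
# Hypothetical frame that matches the target but not the location.
other_frame = make_frame(target=["biological weapons"], locations=["Libya"])
```

Here `iraq_frame` is a full match, while `other_frame` matches only `target` and would be a near miss.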
Answer Space Topology
[Diagram: nested regions of the answer space]
• Kernel question match (center)
• Near misses, alternative interpretations
• All retrieved frames (outer region)
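One way to read the nested regions is as tiers defined by how many GoalFrame attributes a retrieved frame fails to match. The sketch below assumes, hypothetically, that a frame missing at most `max_misses` attributes counts as a near miss; alternative interpretations are not modeled, and the names are illustrative only.

```python
from typing import NamedTuple


class ScoredFrame(NamedTuple):
    frame_id: str
    matched: int    # number of goal-frame attributes this frame satisfies
    goal_size: int  # number of attributes in the goal frame


def partition_answer_space(frames: list[ScoredFrame], max_misses: int = 1):
    """Split retrieved frames into kernel matches, near misses, and the rest."""
    kernel, near, rest = [], [], []
    for f in frames:
        misses = f.goal_size - f.matched
        if misses == 0:
            kernel.append(f.frame_id)   # matches every goal attribute
        elif misses <= max_misses:
            near.append(f.frame_id)     # close enough to offer the user
        else:
            rest.append(f.frame_id)     # retrieved but off target
    return kernel, near, rest
```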
Data-Driven Interaction
What does the question mean to the user?
– The speech act
– The focus
– User’s task/intention/goal
– User’s background knowledge
What does the question mean to the system?
– Available information
– Information that can be retrieved
– The dimensions of the retrieved information
Shared Understanding
– Semantic gaps drive the dialogue:
  • to negotiate between the user’s meaning and the system’s meaning
  • to fill the gaps in the expected answer
  • to resolve ambiguities in the data
  • to reduce the dimensionality of the answer space
Dialogue with the System
• Dialogue arises from:
  – The system’s need to clarify before proceeding
  – The analyst’s need to clarify to keep the system on target
• Dialogue Strategies:
  – Alternative interpretations: narrowing
    • SYSTEM: Ask the user to differentiate answers from non-answers
    • USER: confirm, deny, offer extra cues, …
  – Off-target interpretations: expanding
    • SYSTEM: Ask the user to modify the question
    • USER: confirm, deny, extra cues, new question, …
  – More details please: information seeking
    • USER: Ask linked questions to follow through
    • USER: Specify answer characteristics: shape, quality, …
• A dialogue is unique to each analyst-data pair
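The narrowing and expanding strategies could plausibly be driven by the answer-space tiers. The toy rule below is an assumption for illustration, not HITIQA’s dialogue manager: with no kernel matches it asks the user to modify the question (expanding); with kernel matches plus near misses it asks about a near-miss topic, phrased like the “Are you interested in seeing information about …” prompts in the sample dialogue (narrowing); otherwise it offers the answer. The function name and the exact message wording are hypothetical.

```python
def next_system_turn(kernel_topics: list[str], near_miss_topics: list[str],
                     focus: str) -> tuple[str, str]:
    """Pick a dialogue strategy and the system utterance that goes with it."""
    if not kernel_topics:
        # Off-target interpretation: ask the user to reformulate (expanding).
        return ("expanding",
                "We could not find information matching your question. "
                "Could you modify or rephrase it?")
    if near_miss_topics:
        # Alternative interpretations: let the user accept or reject one (narrowing).
        topic = near_miss_topics[0]
        return ("narrowing",
                f"Are you interested in seeing information about {topic} "
                f"as it is related to {focus}?")
    return ("answer", "We have found relevant information for your query.")
```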
A Sample Dialogue (actual)
Analyst: “need information on pollution and the fishing industry in regard to the Black Sea. also need related items to pollution and international cooperation.”
HITIQA: <Work in progress messages sent>
HITIQA: “We have found relevant information for your query. (You may ask for answer generation at any point now.) We have also found other information that may be relevant. Are you interested in seeing information about The Black Sea convention as it is related to Black Sea pollution?”
Analyst: “yes”
HITIQA: “Thank you. Are you interested in seeing information about The International Waters Assessment team as it is related to Black Sea pollution?”
Analyst: “Yes particularly about fishing if possible”
Dialogue continues…
HITIQA: “We have more information like this. Would you like us to continue?”
Analyst: “yes”
HITIQA: “The environment, wildlife and people linked with the Black Sea are also under threat from large discharges of raw sewage, damaging levels of coastal erosion and the suffocating impact of dumping sludge and mud dredged from ports…”
AQUAINT DIALOGUE EVALUATIONS
ALL PARTICIPANTS
• 10 scenarios plus 1 training scenario
• 15 minutes per scenario
• Chat interface
• Wizard control allowed
HITIQA
• No scenario filtering of data: 3 gigabytes of newswire
• Wizard interrupted 13% of system responses
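[Bar chart: dialogue utterances per scenario, scenarios in the order Tr (training), 9, 4, 6, 2, 8, 1, 10, 3, 7, 5; series: Total, Analyst, System, Wizard; y-axis 0 to 35]
Breakdown of dialogue utterances, Analyst Two:
• Analyst: 34%
• System: 58%
• Wizard: 8%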
Information Quality
Quality Criteria
• CONTENT
  – Accuracy and Objectivity
  – Completeness; uniqueness
  – Importance; Verifiability
• AUTHORITY
  – Reliability; credibility
• PRESENTATION
  – Clarity and Unambiguity
  – Style and Gravitas
  – Orientation and Level
  – Readability and Usability
• TIMELINESS
  – Recency
  – Currency
Measurable Quality Indicators
• IN/OUT-DEGREE MEASURE
  – Number of cites or links to/from
  – Credibility of these cites/links
• DOCUMENT SIZE
• STYLISTIC FEATURES
  – Typical sentence length
  – Use of pronouns, punctuation
• LINGUISTIC FEATURES
  – Sentence forms, verbs
  – References to names, amounts
• STRUCTURAL FEATURES
  – Organization of sections
  – Use of section titles, etc.
• COLLECTION FEATURES
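Several of the measurable indicators above reduce to simple counts. The sketch below computes document size, typical (mean) sentence length, pronoun rate, and punctuation rate for a plain-text document. The regex tokenization, the naive sentence splitter, and the short pronoun list are simplifying assumptions, not the feature extractors used in these experiments.

```python
import re

# Hypothetical, deliberately small pronoun list.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your", "he", "him",
            "his", "she", "her", "it", "its", "they", "them", "their"}


def stylistic_features(text: str) -> dict[str, float]:
    """Return a few surface features of a document."""
    # Naive sentence split on runs of ., ! or ? (abbreviations will confuse it).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    # Every non-word, non-space character counts as punctuation (apostrophes included).
    n_punct = len(re.findall(r"[^\w\s]", text))
    return {
        "document_size_words": float(n_words),
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "pronoun_rate": (sum(w.lower() in PRONOUNS for w in words) / n_words
                         if n_words else 0.0),
        "punctuation_per_word": n_punct / n_words if n_words else 0.0,
    }
```

For example, `stylistic_features("They said it was late. We left.")` gives 7 words in 2 sentences (mean length 3.5), 3 pronouns, and 2 punctuation marks.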
Information Quality Research
[Flowchart: research stages, starting at “Start”, with a “We’re here” marker]
• Document Selection
• Focus Group Study
• Implement Experimental Systems
• Pretests
• Quality Judgment Experiment
• Textual Feature Extraction
• Automated Document Quality Prediction
Quality Judgments
• Focus Group:
  – Sessions conducted: March-April, 2002
  – Results: Nine quality aspects generated
• Expert Sessions:
  – Sessions conducted: May-June, 2002
  – Results: 100 documents scored twice along 9 quality aspects
• Student Sessions:
  – Training and testing sessions: June-July, 2002
    • 10 documents judged by experts used for training/testing
  – Actual judgment sessions: June-August, 2002
    • Qualified students evaluated 10 documents per session
  – Results: 900 documents scored twice along 9 quality aspects
Quality Assessment GUI
Factor Analysis of 9 Quality Features
[Figure: factor analysis plot; factor labels: Appearance, Content]
Modeling Quality of Text
• Kitchen-sink approach
  – 160 “independent” variables
  – Part-of-speech, vocabulary, stylistics, named entities, …
• Statistical pruning
  – Statistically significant variables
  – May be nonsensical to humans
• Human pruning
  – Only “sensible” variables retained for each quality
• Pruning improves performance
  – The kitchen sink overfits
  – Statistical and human pruning are close in performance
  – More work is needed to understand the relationship
Quality Prediction by Linear Combination of Textual Features (from 5 to 17 variables). Split half for training and testing.

Quality Factor              Prediction Rate
Depth                       67%
Author Credential           55%
Accuracy                    69%
Source                      57%
Objectivity                 64%
Grammar                     79%
One Side vs. Multi View     70%
Verbosity                   63%
Readability                 76%
Performance of models
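The table reports prediction rates for linear combinations of textual features under a split-half design. A minimal sketch of that kind of evaluation follows. It assumes binary high/low quality labels, an ordinary least-squares fit with a 0.5 decision threshold, and “prediction rate” read as the fraction of held-out documents classified correctly; all three are assumptions, and the sketch needs NumPy.

```python
import numpy as np


def split_half_prediction_rate(X: np.ndarray, y: np.ndarray, seed: int = 0) -> float:
    """Fit a linear model on a random half of the documents and return the
    fraction of the other half whose 0/1 quality label it predicts correctly."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    train, test = idx[:half], idx[half:]

    def with_bias(A: np.ndarray) -> np.ndarray:
        # Append a constant column so the model has an intercept.
        return np.column_stack([A, np.ones(len(A))])

    # Least-squares weights for the linear combination of features.
    w, *_ = np.linalg.lstsq(with_bias(X[train]), y[train], rcond=None)
    predicted = (with_bias(X[test]) @ w >= 0.5).astype(int)
    return float((predicted == y[test]).mean())
```

With hand-pruned feature sets of 5 to 17 variables, `X` would hold one column per retained textual feature and `y` the judged label for one quality factor.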
Data Fusion
• Use multiple methods to assess the relevance of documents or passages
  – For a given question, dialogue, or cluster
  – Each method assigns a “score”
• Candidates → points in a “score space”
• Seek patterns to localize the most relevant documents or passages in this “score space”
• Developed an interactive data analysis tool
Non-linear “iso-relevance”
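The slides describe candidates as points in a score space and refer to non-linear iso-relevance contours without giving a formula. As a hypothetical illustration, the sketch min-max normalizes each method’s scores, then compares a linear sum (straight iso-relevance lines) with the negative Euclidean distance to the ideal corner (1, …, 1), whose iso-relevance contours are curved. Neither fusion rule is claimed to be the one used in HITIQA, and all names are illustrative.

```python
import math


def minmax(column: list[float]) -> list[float]:
    """Rescale one method's scores to [0, 1]; a constant column maps to 1.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [1.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def score_space(scores_by_method: dict[str, list[float]]) -> list[tuple[float, ...]]:
    """Turn per-method score lists (same candidate order) into points in score space."""
    normalized = [minmax(col) for col in scores_by_method.values()]
    return [tuple(point) for point in zip(*normalized)]


def linear_fusion(point: tuple[float, ...]) -> float:
    """Sum of normalized scores: iso-relevance sets are straight lines/planes."""
    return sum(point)


def distance_fusion(point: tuple[float, ...]) -> float:
    """Closeness to the ideal corner: iso-relevance sets are spheres around it."""
    return -math.dist(point, [1.0] * len(point))


def rank(names: list[str], points, fuse) -> list[str]:
    """Order candidates by fused score, best first (ties keep input order)."""
    order = sorted(zip(names, points), key=lambda pair: fuse(pair[1]), reverse=True)
    return [name for name, _ in order]
```

For two methods that disagree completely, e.g. raw scores `[10, 5, 0]` and `[0, 5, 10]` for candidates A, B, C, the linear sum ties all three, while the distance rule ranks the balanced candidate B first.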
Information Visualization
• Supports Evidence Fusion
  – Dimensional displays
• Supports Information Quality Decisions
  – User interfaces
• Supports Clarification Dialogue
  – Multi-media dialogue: “picture = Kilo-word”
• Navigation through information space
  – Multiple views and orientation
Visual Dialog in HITIQA
• Display space
  – Multi-dimensional
  – Non-homogeneous
  – Non-structured
• Mapping
  – Documents → Frames → Visuals
  – Navigation through changing dimensions
• Selection
  – Use of color and shape
HITIQA Visual Panel: cluster view
Current Status Summary
• HITIQA 1st prototype complete
• Data-driven semantics for questions
• Framing and dialogue
• Good results of pilot evaluation
• Information quality experiments
• User studies phase I completed
• 2-D visualization developed
• Information fusion work started
Plans for the next 6 months
• Refine the prototype
  – Typed, specialized frames
  – More informative dialogue
  – Handle series of questions
• Second round of quality experiments
• Answer generation
• Information fusion
• Tests and evaluations