AQUAINT PI meeting, Dec. 3-6, 2002
AQUAINT Dialogue Experiment
Jean Scholtz
Information Access Division
National Institute of Standards and Technology
[email protected]
Purpose of the Experiment
• To investigate “dialogue” between a system and an analyst
  – to learn what types of dialogue analysts expect to engage in
  – to learn how analysts react to different types of system responses
Wizard of Oz Experiment
• Used a web-based text chat to eliminate any possible confounds with usability of user interfaces
• Experiment design
  – 5 systems participated
  – 2 analysts used each system
  – 10 scenarios were used; order was randomized for each system
  – analysts were given 15 minutes to explore each scenario
• Data collection
  – logs of dialogues
  – rating questionnaires filled out by the analyst after each scenario
  – observation notes
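A minimal sketch of how a per-system randomized scenario order could be generated. The slides do not say how the randomization was actually done; the scenario labels and the system letters A to E are taken from the results tables, while the function name `scenario_orders` and the `seed` parameter are assumptions made for illustration.

```python
import random

# Scenario labels and system letters as they appear in the results tables.
SCENARIOS = [
    "AIDS", "Black Sea", "FARC", "Indonesia", "Ivory Coast",
    "Joint Venture", "Microsoft", "Opium", "Robot", "Sanchez",
]
SYSTEMS = ["A", "B", "C", "D", "E"]


def scenario_orders(systems, scenarios, seed=None):
    """Return an independently shuffled scenario order for each system.

    Hypothetical helper: the seed only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    orders = {}
    for system in systems:
        order = list(scenarios)  # copy so the master list is not modified
        rng.shuffle(order)       # in-place uniform shuffle
        orders[system] = order
    return orders


if __name__ == "__main__":
    for system, order in scenario_orders(SYSTEMS, SCENARIOS, seed=2002).items():
        print(system, order)
```

Each system receives all 10 scenarios exactly once; only the presentation order differs between systems.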
Results - Systems
• Overall, the systems were rated reasonably high by the analysts
Subject Rating of Answer Quality
System  AIDS  Black Sea  FARC  Indonesia  Ivory Coast  Joint Venture  Microsoft  Opium  Robot  Sanchez  Mean
A       6.5   7          7     7          2            6              7          6.5    5.5    7        6.15
B       5     6.5        5     5          3            4.5            6.5        6.5    5.5    7        5.45
C       5     6.5        7     4          5.5          5.5            6          5.5    6.5    6.5      5.80
D       3     6.5        5.5   3.5        1.5          5.5            5          4      5.5    4.5      4.45
E       3     2          6.5   6.5        3.5          2              1.5        3.5    2.5    7        3.80
Mean    4.5   5.7        6.2   5.2        3.1          4.7            5.2        5.2    5.1    6.4      5.13
Subject Rating of Dialogue
System  AIDS  Black Sea  FARC  Indonesia  Ivory Coast  Joint Venture  Microsoft  Opium  Robot  Sanchez  Mean
A       6.5   7          7     6          4            6              7          7      6.5    7        6.40
B       5     7          --    6          --           7              7          7      7      --       6.57
C       5     6.5        6     4.5        6.5          6.5            6          5.5    6      6.5      5.90
D       5.5   6          6     5.5        4.5          6              5.5        5.5    6      7        5.75
E       5.5   4          5.5   6          5.5          3.5            4          4.5    4.5    6.5      4.95
Mean    5.5   6.1        6.1   5.6        5.1          5.8            5.9        5.9    6.0    6.8      5.91
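The slides do not state how missing ratings (shown as --) enter the Mean column. The sketch below assumes they are simply left out of the average; under that assumption, System B's dialogue row reproduces the reported 6.57. The function name `mean_rating` is hypothetical.

```python
def mean_rating(ratings):
    """Average the available ratings, skipping missing ones (None).

    Assumption: missing entries are excluded rather than counted as zero.
    Returns None when no rating is available.
    """
    present = [r for r in ratings if r is not None]
    if not present:
        return None
    return sum(present) / len(present)


# System B, Subject Rating of Dialogue; None marks the "--" entries.
system_b_dialogue = [5, 7, None, 6, None, 7, 7, 7, 7, None]

print(round(mean_rating(system_b_dialogue), 2))  # 6.57
```

Counting the three missing entries as zero would give 4.6 instead, which does not match the table.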
Results - Systems
• Analysts also judged most scenarios as “successful”
Number of Successes by System and by Scenario
System  AIDS  Black Sea  FARC  Indonesia  Ivory Coast  Joint Venture  Microsoft  Opium  Robot  Sanchez  Total
A       2     2          2     2          0            2              2          2      2      2        18
B       2     2          2     1          1            2              2          2      2      2        18
C       2     2          2     2          2            2              2          2      2      2        20
D       1     2          2     1          0            2              2          1      2      1        14
E       1     0          2     2          0            0              0          1      0      2        8
Mean    1.6   1.6        2.0   1.6        0.6          1.6            1.6        1.6    1.6    1.8      15.6
Results - Analysts
• Initial queries and dialogues were extremely varied.
  – Most initial queries were phrased as questions, but analysts also used statements such as
    • “I need”; “please provide information on”; “looking for background information on….”
  – Analysts at times provided context in the initial query
    • Analyst: “Subject is effect of pollution on black sea fishing industry, What are sources of pollution, trends in reducing pollution, and international cooperation in reducing pollution?”
• Analysts did not always take turns. They asked questions as they occurred to them.
• Analysts posed multipart questions. If the system did not understand, they broke these down into separate parts.
• Analysts posed general questions. If the system did not understand, they asked more specific questions.
Results - Analysts
• Analysts expect the system to remember context
  – Example:
    – Analyst: “good info. pls describe the how question.”
  – Example:
    – Wizard: “the answer is 90 billion dollars”
    – Analyst: “The same for 2000, please.”
  – Example:
    – Wizard: “I have no further information for the year 1998.”
    – Analyst: “OK on your anwser for 1998 can you do the same for 2000 and skip 1999”
Results - Analysts
• Self-clarification
  – Analysts interrupt to clarify their own questions or even to cancel them.
    • Example:
      – Analyst: “iwould should read i would”
    • Example:
      – Analyst: “Can you tell me which of these was the first joint venture?”
      – Analyst: “forget that last question”
• Analysts don’t just say no, or yes for that matter. When asked questions by the system that can be answered as yes or no, analysts often add constraints.
  • Example:
    – Wizard: “Would you like more figures?”
    – Analyst: “Keep going but would like timeline on figures if possible.”
    – Analyst: “also do u have any projections?”
  • Example:
    – Wizard: “Thank you. Are you interested in seeing information about The International Waters Assessment Team as it is related to Black Sea pollution?”
    – Analyst: “Yes particularly about fishing if possible.”
Results - Analysts
• Responding to system questions is not always straightforward
  – Example:
    – Wizard: “We still have more, would you like to see?”
    – Analyst: “can you give me info on when they were formed and the year they were formed”
  – Example:
    – Wizard: “There is information about inflation rate, GDP, exchange rate, and per capita income. Which indicator would you like?”
    – Analyst: “i would like to see all indicators”
  – Example:
    – Analyst: “give me allinformation concerning the economic situation in indonesia since the the year 1998”
    – Wizard: “I did not understand the word "allinformation". Please try using a different word”
    – Analyst: “actually two words namely all information”
    – Wizard: “I did not understand the word "namely". Please try using a different word.”
    – Analyst: “i would like all available information you have concerning the economic situation in indonesia”
Results - Analysts
• Analysts have high expectations – ask about content
  • Example:
    – Wizard: “The cultivated opium production for 2001 is 7606.”
    – Analyst: “Is this metric tons?”
    – Wizard: “The cultivated opium production for 2001 is 7606 hectares”
  • Example:
    – Wizard: “However, both the da Vinci and ZEUS systems must receive governmental approval for each procedure that a surgeon plans to use it for.”
    – Wizard: “On July 11, 2000, the U.S. Food and Drug Administration (FDA) approved the da Vinci Surgical System, making it the first robotic system allowed to be used in American operating rooms.”
    – Analyst: “When you say governmental approval do you mean federal or something else?”
    – Wizard: “i think this refers to US federal approval”
    – Analyst: “for each surgery?”
Next Steps
• Issues to be addressed for next experiment
  – Scenarios
    • need longer, more involved scenarios
  – Database
    • need a common database
  – User Interface
    • need to move to system supplied user interfaces
  – Metrics
    • handling of different classifications of dialogues/responses