
AQUAINT PI meeting
Dec. 3-6, 2002

AQUAINT Dialogue Experiment

Jean Scholtz
Information Access Division
National Institute of Standards and Technology
jean.scholtz@nist.gov


Purpose of the Experiment

• To investigate “dialogue” between a system and an analyst
  – to learn what types of dialogue analysts expect to engage in
  – to learn how analysts react to different types of system responses


Wizard of Oz Experiment

• Used a web-based text chat to eliminate any possible confounds with usability of user interfaces

• Experiment design
  – 5 systems participated
  – 2 analysts used each system
  – 10 scenarios were used; order was randomized for each system
  – analysts were given 15 minutes to explore each scenario

• Data collection
  – logs of dialogues
  – rating questionnaires filled out by the analyst after each scenario
  – observation notes
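
The slides do not say how the scenario order was randomized. As a minimal sketch, assuming an independent shuffle of the ten scenarios for each of the five systems, it could look like the following. The function name `scenario_orders` and the `seed` parameter are hypothetical; the system labels and scenario names are the ones used in the results tables.

```python
import random

# System labels and scenario names as they appear in the results tables.
SYSTEMS = ["A", "B", "C", "D", "E"]
SCENARIOS = [
    "AIDS", "Black Sea", "FARC", "Indonesia", "Ivory Coast",
    "Joint Venture", "Microsoft", "Opium", "Robot", "Sanchez",
]


def scenario_orders(seed=0):
    """Return a shuffled scenario order for every system.

    Hypothetical helper: the seed only makes the sketch reproducible
    and is not something the experiment is known to have used.
    """
    rng = random.Random(seed)
    orders = {}
    for system in SYSTEMS:
        order = list(SCENARIOS)  # copy so the master list stays untouched
        rng.shuffle(order)
        orders[system] = order
    return orders


orders = scenario_orders()
for system, order in orders.items():
    print(system, order)
```

Whether both analysts on a system saw the same order is not stated in the slides; the sketch gives one order per system.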


Results - Systems

• Overall the systems were rated reasonably high by the analysts

Subject Rating of Answer Quality

System  AIDS  Black Sea  FARC  Indonesia  Ivory Coast  Joint Venture  Microsoft  Opium  Robot  Sanchez  Mean
A       6.5   7          7     7          2            6              7          6.5    5.5    7        6.15
B       5     6.5        5     5          3            4.5            6.5        6.5    5.5    7        5.45
C       5     6.5        7     4          5.5          5.5            6          5.5    6.5    6.5      5.80
D       3     6.5        5.5   3.5        1.5          5.5            5          4      5.5    4.5      4.45
E       3     2          6.5   6.5        3.5          2              1.5        3.5    2.5    7        3.80
Mean    4.5   5.7        6.2   5.2        3.1          4.7            5.2        5.2    5.1    6.4      5.13

Subject Rating of Dialogue

System  AIDS  Black Sea  FARC  Indonesia  Ivory Coast  Joint Venture  Microsoft  Opium  Robot  Sanchez  Mean
A       6.5   7          7     6          4            6              7          7      6.5    7        6.40
B       5     7          --    6          --           7              7          7      7      --       6.57
C       5     6.5        6     4.5        6.5          6.5            6          5.5    6      6.5      5.90
D       5.5   6          6     5.5        4.5          6              5.5        5.5    6      7        5.75
E       5.5   4          5.5   6          5.5          3.5            4          4.5    4.5    6.5      4.95
Mean    5.5   6.1        6.1   5.6        5.1          5.8            5.9        5.9    6.0    6.8      5.91
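
The "--" cells in the dialogue table are missing ratings. The printed row mean for system B (6.57) is consistent with averaging only the ratings that are present, which the following sketch illustrates. Treating "--" as "exclude from the mean" is an assumption drawn from the arithmetic, not something the slides state, and the helper name `mean_ignoring_missing` is hypothetical.

```python
from statistics import mean

# Dialogue ratings copied from the table; None stands for a "--" cell.
dialogue_a = [6.5, 7, 7, 6, 4, 6, 7, 7, 6.5, 7]
dialogue_b = [5, 7, None, 6, None, 7, 7, 7, 7, None]


def mean_ignoring_missing(ratings):
    """Average the ratings that are present, rounded to two decimals."""
    present = [r for r in ratings if r is not None]
    return round(mean(present), 2)


print(mean_ignoring_missing(dialogue_a))  # matches the 6.40 in the table
print(mean_ignoring_missing(dialogue_b))  # matches the 6.57 in the table
```

Because system B's mean rests on seven scenarios rather than ten, it is not strictly comparable with the other rows.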


Results - Systems

• Analysts also judged most scenarios as “successful”

Number of Successes by System and by Scenario

System  AIDS  Black Sea  FARC  Indonesia  Ivory Coast  Joint Venture  Microsoft  Opium  Robot  Sanchez  Total
A       2     2          2     2          0            2              2          2      2      2        18
B       2     2          2     1          1            2              2          2      2      2        18
C       2     2          2     2          2            2              2          2      2      2        20
D       1     2          2     1          0            2              2          1      2      1        14
E       1     0          2     2          0            0              0          1      0      2        8
Mean    1.6   1.6        2.0   1.6        0.6          1.6            1.6        1.6    1.6    1.8      15.6
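
To put the success counts in proportion, the sketch below converts them to rates. It assumes each cell counts how many of the two analysts on that system judged the scenario successful, so every system has 2 x 10 = 20 sessions; that reading follows from the experiment design but is not spelled out on the slide.

```python
# Success counts per scenario, copied from the table
# (scenario order: AIDS ... Sanchez).
successes = {
    "A": [2, 2, 2, 2, 0, 2, 2, 2, 2, 2],
    "B": [2, 2, 2, 1, 1, 2, 2, 2, 2, 2],
    "C": [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    "D": [1, 2, 2, 1, 0, 2, 2, 1, 2, 1],
    "E": [1, 0, 2, 2, 0, 0, 0, 1, 0, 2],
}

# Assumption: 2 analysts per system, 10 scenarios each.
SESSIONS_PER_SYSTEM = 2 * 10

totals = {system: sum(counts) for system, counts in successes.items()}
rates = {system: total / SESSIONS_PER_SYSTEM for system, total in totals.items()}
overall_rate = sum(totals.values()) / (len(successes) * SESSIONS_PER_SYSTEM)

for system in successes:
    print(f"{system}: {totals[system]}/{SESSIONS_PER_SYSTEM} = {rates[system]:.0%}")
print(f"overall: {overall_rate:.0%}")
```

Under that assumption, 78 of 100 sessions were judged successful, with system rates ranging from 40% (E) to 100% (C).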


Results - Analysts

• Initial queries and dialogues were extremely varied.
  – Most initial queries were phrased as questions, but analysts also used statements such as
    • “I need”; “please provide information on”; “looking for background information on….”
  – Analysts at times provided context in the initial query
    • Analyst: "Subject is effect of pollution on black sea fishing industry, What are sources of pollution, trends in reducing pollution, and international cooperation in reducing pollution?"

• Analysts did not always take turns. They asked questions as they occurred to them.

• Analysts posed multipart questions. If the system did not understand, they broke these down into separate parts.

• Analysts posed general questions. If the system did not understand, they asked more specific questions.


Results - Analysts

• Analysts expect the system to remember context
  – Example:
    – Analyst: "good info. pls describe the how question."
  – Example:
    – Wizard: "the answer is 90 billion dollars"
    – Analyst: "The same for 2000, please."
  – Example:
    – Wizard: "I have no further information for the year 1998."
    – Analyst: "OK on your anwser for 1998 can you do the same for 2000 and skip 1999"


Results - Analysts

• Self-clarification
  – Analysts interrupt to clarify their own questions or even to cancel them.
    • Example:
      – Analyst: "iwould should read i would"
    • Example:
      – Analyst: "Can you tell me which of these was the first joint venture?"
      – Analyst: "forget that last question"

• Analysts don’t just say no, or yes for that matter. When asked questions by the system that can be answered as yes or no, analysts often add constraints.
    • Example:
      – Wizard: "Would you like more figures?"
      – Analyst: "Keep going but would like timeline on figures if possible."
      – Analyst: "also do u have any projections?"
    • Example:
      – Wizard: "Thank you. Are you interested in seeing information about The International Waters Assessment Team as it is related to Black Sea pollution?"
      – Analyst: "Yes particularly about fishing if possible."


Results - Analysts

• Responding to system questions is not always straightforward
  – Example:
    – Wizard: "We still have more, would you like to see?"
    – Analyst: "can you give me info on when they were formed and the year they were formed"
  – Example:
    – Wizard: "There is information about inflation rate, GDP, exchange rate, and per capita income. Which indicator would you like?"
    – Analyst: "i would like to see all indicators"
  – Example:
    – Analyst: "give me allinformation concerning the economic situation in indonesia since the the year 1998"
    – Wizard: "I did not understand the word "allinformation". Please try using a different word"
    – Analyst: "actually two words namely all information"
    – Wizard: "I did not understand the word "namely". Please try using a different word."
    – Analyst: "i would like all available information you have concerning the economic situation in indonesia"


Results - Analysts

• Analysts have high expectations – ask about content
    • Example:
      – Wizard: "The cultivated opium production for 2001 is 7606."
      – Analyst: "Is this metric tons?"
      – Wizard: "The cultivated opium production for 2001 is 7606 hectares"
    • Example:
      – Wizard: "However, both the da Vinci and ZEUS systems must receive governmental approval for each procedure that a surgeon plans to use it for."
      – Wizard: "On July 11, 2000, the U.S. Food and Drug Administration (FDA) approved the da Vinci Surgical System, making it the first robotic system allowed to be used in American operating rooms."
      – Analyst: "When you say governmental approval do you mean federal or something else?"
      – Wizard: "i think this refers to US federal approval"
      – Analyst: "for each surgery?"


Next Steps

• Issues to be addressed for next experiment
  – Scenarios
    • need longer, more involved scenarios
  – Database
    • need a common database
  – User Interface
    • need to move to system-supplied user interfaces
  – Metrics
    • handling of different classifications of dialogues/responses