sorry, I didn’t catch that! – an investigation of non-understandings and recovery strategies

sorry, I didn’t catch that! – an investigation of non-understandings and recovery strategies
Dan Bohus, www.cs.cmu.edu/~dbohus
Alexander I. Rudnicky, www.cs.cmu.edu/~air
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, 15213


TRANSCRIPT

Page 1: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

sorry, I didn’t catch that! – an investigation of non-understandings and recovery strategies

Dan Bohus www.cs.cmu.edu/~dbohus
Alexander I. Rudnicky www.cs.cmu.edu/~air

Computer Science Department
Carnegie Mellon University
Pittsburgh, PA, 15213

Page 2: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

2

systems often do not understand correctly

non-understandings and misunderstandings

NON-understanding: the system cannot extract any meaningful information from the user’s turn
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]

MIS-understanding: the system extracts incorrect information from the user’s turn
S: What city are you leaving from?
U: Birmingham [BERLIN PM]

Page 3: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

3

systems often do not understand correctly

NON-understanding: the system cannot extract any meaningful information from the user’s turn
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]

detection: typically trivial, although diagnosis is not

strategies: large space of strategies; tradeoffs between them not well understood

policy (knowing how to engage the strategies): simple heuristics (“incremental prompting”)

Page 4: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

4

questions under investigation

what are the main causes of non-understandings?

how large is their impact on performance?

how do various recovery strategies compare to each other?

what are the relationships between strategies and user behaviors?

can we improve global dialog performance by using a smarter policy?

if yes, can we learn a better policy from data?

data

Page 5: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

5

data collection

Roomline: a phone-based, mixed-initiative system for conference room reservations

experimental design
control group: uninformed recovery policy
wizard group: recovery policy implemented by a wizard

46 participants, all first-time users

tasks & experimental procedure: up to 10 scenario-driven interactions

Page 6: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

6

non-understanding recovery strategies

S: For when do you need the conference room?

1. ASK REPEAT: Could you please repeat that?
2. ASK REPHRASE: Could you please try to rephrase that?
3. NOTIFY (NTFY): Sorry, I didn’t catch that ...
4. YIELD TURN (YLD): …
5. REPROMPT (RP): For when do you need the conference room?
6. DETAILED REPROMPT (DRP): Right now I need to know the date and time for when you need the reservation …
7. MOVE-ON: Sorry, I didn’t catch that. For which day you need the room?
8. YOU CAN SAY (YCS): Sorry, I didn’t catch that. For when do you need the conference room? You can say something like tomorrow at 10 am …
9. TERSE YOU CAN SAY (TYCS): Sorry, I didn’t catch that. You can say something like tomorrow at 10 am …
10. FULL HELP (HELP): Sorry, I didn’t catch that. I am currently trying to make a conference room reservation for you. Right now I need to know the date and time for when you need the reservation. You can say something like tomorrow at 10 am …

Page 7: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

7

corpus statistics

449 sessions

8278 user turns

utterances transcribed and checked

manual annotations: misunderstandings; correct concept values at each turn; sources of understanding errors; user response-types to recovery strategies

Page 8: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

8

questions under investigation

data

what are the main causes of non-understandings?

how large is their impact on performance?

how do various recovery strategies compare to each other?

what are the relationships between strategies and user behaviors?

Page 9: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

9

causes of non-understandings

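[Diagram: communication between user and system at four levels (conversation level, intention level, signal level, channel level). Representations shown: Goal, Semantics, Text, Audio. Processing stages shown: Interpretation, Parsing, Recognition, End-pointing.]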

Page 10: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

10

causes of non-understandings

conversation level: out-of-application, 16%

intention level: out-of-grammar, 16%

signal level: ASR error, 62%

channel level: endpointer error

Page 11: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

11

questions under investigation

data

what are the main causes of non-understandings?

how large is their impact on performance?

how do various recovery strategies compare to each other?

what are the relationships between strategies and user behaviors?

data : causes of non-understandings : impact on performance : strategy comparison : user behaviors

Page 12: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

12

modeling impact on performance

logistic regression:

$$P(\text{Task Success}) = \frac{1}{1 + e^{-(\alpha + \beta \cdot \text{FNON})}}$$

[Plot: P(Task Success = 1), from 0 to 1, against % Nonunderstandings (FNON), from 0 to 50%]
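As a rough illustration of the logistic model on this slide, the sketch below evaluates P(Task Success) for a few non-understanding rates. The function name and the coefficient values are assumptions made for illustration only; they are not the values fitted in the study.

```python
import math


def p_task_success(fnon, alpha, beta):
    """Logistic model: 1 / (1 + exp(-(alpha + beta * fnon))).

    fnon is the fraction of non-understood turns in a session (0.0 to 1.0).
    """
    return 1.0 / (1.0 + math.exp(-(alpha + beta * fnon)))


# Hypothetical coefficients, chosen only to produce a readable curve.
ALPHA, BETA = 2.0, -8.0

for fnon in (0.0, 0.1, 0.25, 0.5):
    p = p_task_success(fnon, ALPHA, BETA)
    print(f"FNON={fnon:.0%}  P(Task Success)={p:.2f}")
```

With a negative beta, as assumed here, the predicted probability of task success falls as the share of non-understandings grows.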

Page 13: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

13

questions under investigation

data

what are the main causes of non-understandings?

how large is their impact on performance?

how do various recovery strategies compare to each other?

what are the relationships between strategies and user behaviors?

data : causes of non-understandings : impact on performance : strategy comparison : user behaviors

Page 14: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

14

strategy performance – recovery rate

overall logistic ANOVA: significant differences in mean recovery rates

all pairs comparison (corrected using FDR)

[Bar chart: recovery rate (0% to 80%) by strategy, in the order MoveOn, Help, TerseYouCanSay, RePrompt, YouCanSay, AskRephrase, DetailedReprompt, Notify, AskRepeat, Yield]
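The slide says only that the all-pairs comparison was corrected using FDR. One common way to control the false discovery rate is the Benjamini-Hochberg step-up procedure, sketched below. Whether the study used this particular procedure is an assumption, and the p-values in the example are made up.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the test is rejected at FDR level q."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k such that p_(k) <= (k / m) * q.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for six strategy pairs (not from the study).
pvals = [0.001, 0.008, 0.039, 0.041, 0.27, 0.6]
print(benjamini_hochberg(pvals))
```

Because the procedure is step-up, a p-value that misses its own threshold is still rejected when a larger p-value further down the sorted list meets its threshold.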

Page 15: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

15

questions under investigation

data

what are the main causes of non-understandings?

how large is their impact on performance?

how do various recovery strategies compare to each other?

what are the relationships between strategies and user behaviors?

data : causes of non-understandings : impact on performance : strategy comparison : user behaviors

Page 16: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

16

user response types

tagging scheme by Shin; also used by Choularton, Raux

5 categories: repeat, rephrase, contradict, change, other

Page 17: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

17

response types after non-understanding

[Bar chart: share of user responses (0% to 50%) in each category (rephrase, repeat, contradict, change, other) for three corpora: Pizza (Choularton & Dale), Communicator (Shin et al.), Roomline (this study)]

Page 18: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

18

user response types by strategy

[Chart: distribution of user response types (Rephrase, Change, Repeat, Other; 0% to 100%) for each strategy: MoveOn, Help, TerseYouCanSay, RePrompt, YouCanSay, AskRephrase, DetailedReprompt, Notify, AskRepeat, Yield]

Page 19: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

19

summary

sources of non-understandings: ASR, but also “language” errors → more shaping strategies …

impact on performance: regression model allows better quantitative assessment

strategy comparison: help, “move-on” → further investigate “move-on”

user responses: margin for improving control over user responses

can we improve global dialog performance by using a smarter policy? yes

can we learn a better policy from data? preliminary results promising …

Page 20: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

20

thank you! questions …

Page 21: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

21

rejections

Figure 3. Misunderstandings and non-understandings before and after rejections

[Chart: breakdown of turns (0 to 100%) into misunderstandings, non-understandings, correct understandings, false rejections and correct rejections, before and after the rejection mechanism]

Page 22: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

22

strategy performance assessment

recovery rate

recovery utility
weighted sum of correctly and incorrectly acquired concepts
weights are determined in a data-driven fashion

recovery efficiency
also takes time to recovery into account
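A minimal sketch of the "weighted sum" idea behind recovery utility follows. The function name and the weight values are hypothetical. The slide states that the actual weights are determined from data, and it does not give them.

```python
def recovery_utility(n_correct, n_incorrect, w_correct, w_incorrect):
    """Weighted sum of correctly and incorrectly acquired concepts."""
    return w_correct * n_correct + w_incorrect * n_incorrect


# Hypothetical weights: reward a correct concept, penalize an incorrect one more.
W_CORRECT, W_INCORRECT = 1.0, -2.0

# A recovery turn that acquired two correct concepts and one incorrect one.
print(recovery_utility(2, 1, W_CORRECT, W_INCORRECT))
```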

Page 23: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

23

experimental design: scenarios

10 scenarios, fixed order

presented graphically (explained during briefing)

Page 24: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

24

strategy pair-wise comparison

recovery performance ranked list, based on pair-wise t-tests:

      RNK         MOVE  HELP  TYCS  RP    YCS   ARPH  DRP   NTFY  AREP  YLD
MOVE  1    MOVE:  -     -     -     1.31  1.33  1.35  1.71  1.8   1.91  2.06
HELP  2    HELP:  -     -     -     -     -     -     1.55  1.64  1.73  1.87
HELP  3    TYCS:  -     -     -     -     -     -     1.5   1.58  1.68  1.81
SIG   4    RP:    -     -     -     -     -     -     -     -     1.46  1.58
HELP  5    YCS:   -     -     -     -     -     -     -     -     1.44  1.55
SIG   6    ARPH:  -     -     -     -     -     -     -     -     1.42  1.53
SIG   ?    DRP:   -     -     -     -     -     -     -     -     -     -
SIG   ?    NTFY:  -     -     -     -     -     -     -     -     -     -
SIG   ?    AREP:  -     -     -     -     -     -     -     -     -     -
SIG   ?    YLD:   -     -     -     -     -     -     -     -     -     -

CER evaluation shows similar results

Page 25: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

25

recovery for various response-types

[Bar chart: recovery rate (0 to 80%) by user response type: Repeat, Rephrase, Change, Other]

Page 26: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

26

Page 27: sorry, I didn’t catch that! –    an investigation of non-understandings and recovery strategies

27

impact of recovery rate on performance

recovery = next turn is correctly understood

$$P(\text{Task Success}) = \frac{1}{1 + e^{-(\alpha + \beta \cdot \text{RecoveryRate})}}$$

[Plot: P(Task Success = 1), from 0 to 1, against non-understanding recovery rate, from 0 to 100%]
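Using the definition on this slide (a non-understanding counts as recovered when the next turn is correctly understood), a per-strategy recovery rate could be computed roughly as follows. The event format, function name and sample log are assumptions for illustration, not the study's data or tooling.

```python
from collections import defaultdict


def recovery_rates(events):
    """Compute the recovery rate for each strategy.

    events: iterable of (strategy, next_turn_understood) pairs, one pair per
    non-understanding; next_turn_understood is True when the user turn that
    followed the strategy was correctly understood.
    """
    totals = defaultdict(int)
    recovered = defaultdict(int)
    for strategy, understood in events:
        totals[strategy] += 1
        if understood:
            recovered[strategy] += 1
    return {s: recovered[s] / totals[s] for s in totals}


# Hypothetical log of non-understandings (not data from the study).
log = [
    ("MOVE", True), ("MOVE", True), ("MOVE", False),
    ("YLD", False), ("YLD", True), ("YLD", False), ("YLD", False),
]
for strategy, rate in recovery_rates(log).items():
    print(f"{strategy}: {rate:.0%}")
```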