04/19/23 1
Dialogue Acts
Julia Hirschberg
LSA07 353
Today
• Recognizing structural information: Dialogue Acts vs. Discourse Structure
• Speech Acts and Dialogue Acts
  – Coding schemes (DAMSL)
  – Practical goals
• Identifying DAs
  – Direct and indirect DAs: experimental results
  – Corpus studies of DA disambiguation
  – Automatic DA identification
  – More corpus studies
Speech Acts
• Wittgenstein ’53, Austin ’62 and Searle ’75
• Contributions to dialogue are actions performed by speakers:
  – I promise to make you very very sorry for that.
  – Performative verbs
• Locutionary act: the act of conveying the ‘meaning’ of the sentence uttered (e.g. committing the speaker to making the hearer sorry)
• Illocutionary act: the act associated with the verb uttered (e.g. promising)
• Perlocutionary act: the act of producing an effect on the hearer (e.g. threatening)
Searle’s Classification Scheme
• Assertives: commit the speaker (S) to the truth of X (e.g. The world is flat)
• Directives: attempt by S to get the hearer (H) to do X (e.g. Open the window please)
• Commissives: commit S to do X (e.g. I’ll do it tomorrow)
• Expressives: S’s description of his/her own feelings about X (e.g. I’m sorry I screamed)
• Declarations: S brings about a change in the world by virtue of uttering X (e.g. I divorce you, I resign)
Dialogue Acts
• Roughly correspond to illocutionary acts
  – Motivation: modeling spoken dialogue
  – Many coding schemes (e.g. DAMSL)
  – Many-to-many mapping between DAs and words
• The Agreement DA can be realized by Okay, Um, Right, Yeah, …
• But each of these can express multiple DAs, e.g.
S: You should take the 10pm flight.
U: Okay
…that sounds perfect.
…but I’d prefer an earlier flight.
…(I’m listening)
A Possible Coding Scheme for ‘ok’
• Ritualistic?
  – Closing
  – You're welcome
  – Other
  – No
• 3rd-Turn-Receipt?
  – Yes
  – No
• If Ritualistic==No, code all of these as well:
• Task Management:
  – I'm done
  – I'm not done yet
  – None
• Topic Management:
  – Starting new topic
  – Finished old topic
  – Pivot: finishing and starting
• Turn Management:
  – Still your turn (=traditional backchannel)
  – Still my turn (=stalling for time)
  – I'm done, it is now your turn
  – None
• Belief Management:
  – I accept your proposition
  – I entertain your proposition
  – I reject your proposition
  – Do you accept my proposition? (=ynq)
  – None
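The conditional structure of this scheme (two dimensions always coded, four more only when Ritualistic is No) can be expressed as a small annotation check. The sketch below is a hypothetical Python helper, not part of any published annotation tool; the dimension names and values are copied from the slides, with the parenthetical glosses dropped.

```python
from typing import Dict, List

# Dimensions coded for every 'ok' token, with the values listed on the slide.
ALWAYS_CODED: Dict[str, List[str]] = {
    "Ritualistic": ["Closing", "You're welcome", "Other", "No"],
    "3rd-Turn-Receipt": ["Yes", "No"],
}

# Dimensions that must also be coded when Ritualistic == "No".
CODED_IF_NOT_RITUALISTIC: Dict[str, List[str]] = {
    "Task Management": ["I'm done", "I'm not done yet", "None"],
    "Topic Management": ["Starting new topic", "Finished old topic",
                         "Pivot: finishing and starting"],
    "Turn Management": ["Still your turn", "Still my turn",
                        "I'm done, it is now your turn", "None"],
    "Belief Management": ["I accept your proposition",
                          "I entertain your proposition",
                          "I reject your proposition",
                          "Do you accept my proposition?", "None"],
}


def validate_ok_annotation(label: Dict[str, str]) -> List[str]:
    """Return a list of problems with one annotation (empty list = valid)."""
    problems: List[str] = []
    required = dict(ALWAYS_CODED)
    if label.get("Ritualistic") == "No":
        required.update(CODED_IF_NOT_RITUALISTIC)
    # Every required dimension must be present with an allowed value.
    for dim, allowed in required.items():
        if dim not in label:
            problems.append(f"missing {dim}")
        elif label[dim] not in allowed:
            problems.append(f"bad value for {dim}: {label[dim]!r}")
    # Flag dimension names that belong to neither group (e.g. typos).
    known = {**ALWAYS_CODED, **CODED_IF_NOT_RITUALISTIC}
    for dim in label:
        if dim not in known:
            problems.append(f"unknown dimension {dim}")
    return problems
```

An annotation with Ritualistic set to Closing needs only the two top-level dimensions, while Ritualistic = No without the four management dimensions yields four "missing" problems.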
Practical Goals
• In Spoken Dialogue Systems
  – Disambiguate current DA
    • Represent user input correctly
    • Respond appropriately
  – Predict next DA
    • Switch Language Models for ASR
    • Switch states in semantic processing
  – Produce DA for next system turn appropriately
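The slide does not say how the next DA would be predicted. One standard option, assumed here, is a DA n-gram model estimated from labeled dialogues; the predicted DA could then choose the ASR language model or semantic-processing state. A minimal bigram sketch in Python, where the DA labels and toy dialogues are hypothetical:

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

# Hypothetical labeled dialogues: each is the sequence of DAs in order.
TOY_DIALOGUES: List[List[str]] = [
    ["Opening", "Info-request", "Answer", "Acknowledgment", "Closing"],
    ["Opening", "Info-request", "Answer", "Info-request", "Answer", "Closing"],
]


def train_da_bigrams(dialogues: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each DA follows each previous DA."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for das in dialogues:
        prev = "<start>"  # pseudo-DA marking the beginning of a dialogue
        for da in das:
            counts[prev][da] += 1
            prev = da
    return counts


def predict_next_da(counts: Dict[str, Counter], prev_da: str) -> Optional[str]:
    """Most frequent successor of prev_da, or None if prev_da was never seen."""
    followers = counts.get(prev_da)  # .get does not create entries in a defaultdict
    if not followers:
        return None
    return followers.most_common(1)[0][0]


counts = train_da_bigrams(TOY_DIALOGUES)
print(predict_next_da(counts, "Info-request"))  # Answer
```

A real system would smooth these counts and would likely condition on more than one previous DA; the bigram keeps the idea visible.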
Disambiguating Ambiguous DAs Intonationally
• Modal (can / would / would … be willing) questions
  – Can you move the piano?
  – Would you move the piano?
  – Would you be willing to move the piano?
• Nickerson & Chu-Carroll ’99: Can info-requests be disambiguated reliably from action-requests?
  – By prosodic information?
  – Role of politeness
Production Studies
• Design
  – Subjects read ambiguous questions in disambiguating contexts
  – Control for given/new and contrastiveness
  – Polite/neutral/impolite readings
  – ToBI-style labeling
• Problems:
  – Cells imbalanced; little data
  – No pretesting
  – No distractors
  – Same speaker reads both contexts
  – No perception checks
Results
• Indirect requests (e.g. for action)
  – If L% (low boundary tone), more likely (73%) to be indirect
  – If H% (high boundary tone), 46% were indirect: differences in height of boundary tone?
  – Politeness: ‘can’ questions differ in impolite (higher rise) vs. neutral cases
  – Speaker variability
• Some production differences
  – Limited utility in production of indirect DAs
  – Beware too steep a rise
Corpus Studies: Jurafsky et al. ’98
• Can we distinguish different DA functions for affirmative words?
  – Lexical, acoustic/prosodic/syntactic differentiators for yeah, ok, uhuh, mhmm, um…
  – Functional categories to distinguish
    • Continuers: Mhmm (not taking floor)
    • Assessments: Mhmm (tasty)
    • Agreements: Mhmm (I agree)
    • Yes answers: Mhmm (That’s right)
    • Incipient speakership: Mhmm (taking floor)
Questions
• Are these terms important cues to dialogue structure?
• Does prosodic variation help to disambiguate them?
• Is there any difference in syntactic realization of certain DAs, compared to others?
Corpus
• Switchboard telephone conversation corpus
  – Hand segmented and labeled with DA information (initially from text) using the SWBD-DAMSL dialogue tagset
    • ~60 labels that could be combined in different dimensions
  – 84% inter-labeler agreement on tags
  – Tagset reduced to 42
• 7 CU-Boulder linguistics grad students labeling Switchboard conversations of human-to-human interaction
  – Relabeling from speech only: 2% of labels changed (114/5757)
    • 43/987 continuers → agreements
    • Why?
      – Shorter duration, lower F0, lower energy, longer preceding pause
  – DAs analyzed for
    • Lexical realization
    • F0 and intensity features
    • Syntactic patterns
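The relabeling figures work out as follows; the last ratio shows that continuer-to-agreement changes account for more than a third of all changed labels:

$$
\frac{114}{5757} \approx 0.020 \;(\approx 2\%\ \text{of all labels}), \qquad
\frac{43}{987} \approx 0.044 \;(\approx 4.4\%\ \text{of continuers}), \qquad
\frac{43}{114} \approx 0.38 \;(\approx 38\%\ \text{of all changes})
$$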
Results: Lexical Differences
(Percentages: share of each DA's tokens realized by the given word)
• Agreements: yeah (36%), right (11%), …
• Continuers: uhuh (45%), yeah (27%), …
• Incipient speaker: yeah (59%), uhuh (17%), right (7%), …
• Yes-answer: yeah (56%), yes (17%), uhuh (14%), …
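Reading these percentages as the share of each DA's tokens realized by a given word (an interpretation, i.e. P(word | DA)), a word-only classifier can be sketched. The Python below ranks DAs for a word assuming uniform priors over DAs, which is a simplification since real DA frequencies differ widely; it uses only the figures listed on the slide.

```python
from typing import Dict, List, Tuple

# P(word | DA) for the words listed on the slide; "…" remainders are omitted.
P_WORD_GIVEN_DA: Dict[str, Dict[str, float]] = {
    "Agreement":         {"yeah": 0.36, "right": 0.11},
    "Continuer":         {"uhuh": 0.45, "yeah": 0.27},
    "Incipient speaker": {"yeah": 0.59, "uhuh": 0.17, "right": 0.07},
    "Yes-answer":        {"yeah": 0.56, "yes": 0.17, "uhuh": 0.14},
}


def rank_das_for_word(word: str) -> List[Tuple[str, float]]:
    """Rank DAs by P(word | DA): a naive Bayes score under the (assumed)
    uniform prior over DAs. DAs with zero probability are dropped."""
    scores = [(da, probs.get(word, 0.0)) for da, probs in P_WORD_GIVEN_DA.items()]
    scores = [s for s in scores if s[1] > 0]
    return sorted(scores, key=lambda s: s[1], reverse=True)


print(rank_das_for_word("yeah"))
# [('Incipient speaker', 0.59), ('Yes-answer', 0.56), ('Agreement', 0.36), ('Continuer', 0.27)]
```

For yeah, Incipient speaker (0.59) and Yes-answer (0.56) score almost identically, which illustrates why lexical identity alone leaves these short DAs ambiguous.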
Prosodic and Lexico/Syntactic Cues
• Over all DAs, duration is the best differentiator
  – Highly correlated with DA length in words
• Assessments:
  – Pro Term + Copula + (Intensifier) + Assessment Adjective
  – That’s X (good, great, fine, …)
Observations
• Yeah (and variations) ambiguous
  – agreement at 36%
  – incipient speaker at 59%
  – Yes-answer at 86%
• Uh-huh (with its variations):
  – a continuer at 45% (vs. yeah at 27%)
• Continuers (compared to agreements) are:
  – shorter in duration
  – less intonationally ‘marked’
  – preceded by longer pauses
Hypothesis
• Prosodic information may be particularly helpful in distinguishing DAs with less lexical content
Automatic DA Detection
• Rosset & Lamel ’04: Can we detect DAs automatically w/ minimal reliance on lexical content?
  – Lexicons are domain-dependent
  – ASR output is errorful
• Corpora (3912 utts total)
  – Agent/client dialogues in a French bank call center, in a French web-based stock exchange customer service center, in an English bank call center
• DA tags (44) similar to DAMSL
  – Conventional (openings, closings)
  – Information level (items related to the semantic content of the task)
  – Forward Looking Function:
    • statement (e.g. assert, commit, explanation)
    • influence on Hearer (e.g. confirmation, offer, request)
  – Backward Looking Function:
    • Agreement (e.g. accept, reject)
    • Understanding (e.g. backchannel, correction)
  – Communicative Status (e.g. self-talk, change-mind)
  – NB: each utt could receive a tag for each class, so utts represented as vectors
    • But…only 197 combinations observed
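Representing each utterance as one slot per tag class, and then treating each observed combination as a single compound label, is one way (assumed here, not necessarily the authors' choice) to exploit the fact that only 197 combinations occur. A Python sketch with hypothetical class names and toy utterances:

```python
from typing import Dict, List, Optional, Tuple

# One slot per tag class on the slide; the names here are illustrative.
CLASSES = ["Conventional", "Information level", "Forward looking",
           "Backward looking", "Communicative status"]

TagVector = Tuple[Optional[str], ...]


def to_vector(tags: Dict[str, str]) -> TagVector:
    """One slot per class, None where the utterance has no tag in that class."""
    return tuple(tags.get(c) for c in CLASSES)


def compound_label_ids(utterances: List[Dict[str, str]]) -> Dict[TagVector, int]:
    """Assign an integer id to each distinct tag combination actually observed."""
    ids: Dict[TagVector, int] = {}
    for tags in utterances:
        vec = to_vector(tags)
        if vec not in ids:
            ids[vec] = len(ids)
    return ids


# Hypothetical toy utterances.
utts = [
    {"Forward looking": "assert", "Information level": "task"},
    {"Backward looking": "accept"},
    {"Information level": "task", "Forward looking": "assert"},  # same combination as the first
]
ids = compound_label_ids(utts)
print(len(ids))  # 2 distinct combinations
```

Because dictionary key order does not matter in `to_vector`, the first and third utterances map to the same vector and share one compound label.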
  – Method: Memory-based learning (TIMBL)
    • Uses all examples for classification
    • Useful for sparse data
  – Features
    • Speaker identity
    • First 2 words of each turn
    • # utts in turn
    • Previously proposed DA tags for utts in turn
  – Results
    • With true utt boundaries:
      – ~83% accuracy on test data from same domain
      – ~75% accuracy on test data from different domain
    • On automatically identified utt units: 3.3% insertions, 6.6% deletions, 13.5% substitutions
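Memory-based learning stores every training example and classifies a new one by its nearest stored neighbours. The Python below is a simplified k-nearest-neighbour sketch with an unweighted overlap metric over symbolic features like those listed above; it is not TIMBL itself, which adds feature weighting and other options. The feature encoding and toy examples are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical encoding: (speaker, first word, second word, # utts in turn, previous DA tag)
Features = Tuple[str, ...]


def overlap_distance(a: Features, b: Features) -> int:
    """Number of feature positions whose symbolic values differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def knn_classify(memory: List[Tuple[Features, str]], query: Features, k: int = 1) -> str:
    """Label the query by majority vote among its k nearest stored examples.
    sorted() is stable, so ties keep the order of the memory."""
    ranked = sorted(memory, key=lambda ex: overlap_distance(ex[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical stored examples: all training data is kept ("uses all examples").
memory: List[Tuple[Features, str]] = [
    (("client", "hello", "i", "1", "<none>"), "Opening"),
    (("agent", "yes", "okay", "2", "<none>"), "Accept"),
    (("agent", "so", "you", "3", "Accept"), "Assert"),
    (("client", "mm", "hm", "1", "<none>"), "Backchannel"),
]

query = ("agent", "yes", "right", "2", "<none>")
print(knn_classify(memory, query))  # Accept (distance 1 to the second example)
```

Keeping every example rather than abstracting rules is what makes this family of methods attractive for sparse data, as the slide notes: rare but informative feature combinations are never smoothed away.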
• Which DAs are easiest/hardest to detect? Accuracy per DA on each corpus:

  DA            GE.fr     CAP.fr    GE.eng
  Resp-to       52.0%     33.0%     55.7%
  Backch        75.0%     72.0%     89.2%
  Accept        41.7%     26.0%     30.3%
  Assert        66.0%     56.3%     50.5%
  Expression    89.0%     69.3%     56.2%
  Comm-mgt      86.8%     70.7%     59.2%
  Task          85.4%     81.4%     78.8%
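A quick way to read the table is to average each DA's accuracy over the three corpora and rank the DAs. The sketch below does this in Python using the table's figures; the unweighted mean is a simplification that ignores how frequent each DA is in each corpus.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Per-DA detection accuracy (%) from the table: (GE.fr, CAP.fr, GE.eng).
ACCURACY: Dict[str, Tuple[float, float, float]] = {
    "Resp-to":    (52.0, 33.0, 55.7),
    "Backch":     (75.0, 72.0, 89.2),
    "Accept":     (41.7, 26.0, 30.3),
    "Assert":     (66.0, 56.3, 50.5),
    "Expression": (89.0, 69.3, 56.2),
    "Comm-mgt":   (86.8, 70.7, 59.2),
    "Task":       (85.4, 81.4, 78.8),
}


def rank_by_mean(acc: Dict[str, Tuple[float, ...]]) -> List[Tuple[str, float]]:
    """(DA, mean accuracy rounded to 0.1) pairs, easiest first."""
    means = [(da, round(mean(vals), 1)) for da, vals in acc.items()]
    return sorted(means, key=lambda p: p[1], reverse=True)


for da, m in rank_by_mean(ACCURACY):
    print(f"{da:<12}{m:5.1f}%")
```

This gives Task (about 81.9%) and Backch (about 78.7%) as the easiest DAs and Accept (about 32.7%) as the hardest.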
• Conclusions
  – Strong ‘grammar’ of DAs in Spoken Dialogue systems
  – A few initial words perform as well as more
Phonetic, Prosodic, and Lexical Context Cues to DA Disambiguation
• Hypothesis: Prosodic information may be important for disambiguating shorter DAs
• Observation: ASR errors suggest it would be useful to limit the role of lexical content in DA disambiguation as much as possible …and that this is feasible
• Experiment:
  – Can people distinguish one (short) DA from another purely from phonetic/acoustic/prosodic cues?
  – Are they better with lexical context?
The Columbia Games Corpus: Collection
• 12 spontaneous task-oriented dyadic conversations in Standard American English.
• 2 subjects playing a computer game, no eye contact.
[Figure: game screens of the Describer and the Follower]
The Columbia Games Corpus: Affirmative Cue Words
Cue Words
– alright
– gotcha
– huh
– mm-hm
– okay
– right
– uh-huh
– yeah
– yep
– yes
– yup
Functions
– Acknowledgment / Agreement
– Backchannel
– Cue beginning discourse segment
– Cue ending discourse segment
– Check with the interlocutor
– Stall / Filler
– Back from a task
– Literal modifier
– Pivot beginning
– Pivot ending
Most frequent words (count):
  1. the     4565
  2. of      1534
  3. okay    1151
  4. and      886
  5. like     753
  …
Perception Study: Selection of Materials
– okay, in three functions:
  – Cue beginning discourse segment
  – Backchannel
  – Acknowledgment / Agreement
– Sample contexts:
  Speaker 1: but it's gonna be below the onion
  Speaker 2: okay

  Speaker 1: okay alright I'll try it okay
  Speaker 2: okay the owl is blinking

  Speaker 1: yeah um there's like there's some space there's
  Speaker 2: okay I think I got it
Perception Study: Experiment Design
• 54 instances of ‘okay’ (18 for each function).
• 2 tokens for each ‘okay’:
  – Isolated condition: only the word ‘okay’.
  – Contextualized condition: 2 full speaker turns:
    • the turn containing the target ‘okay’; and
    • the previous turn by the other speaker.
[Figure: isolated ‘okay’ vs. contextualized ‘okay’ within both speakers' turns]
Perception Study: Experiment Design
• Two conditions:
  – Part 1: 54 isolated tokens
  – Part 2: 54 contextualized tokens
• Subjects asked to classify each token of ‘okay’ as:
  – Acknowledgment / Agreement, or
  – Backchannel, or
  – Cue beginning discourse segment.
Perception Study: Definitions Given to the Subjects
• Acknowledgment / Agreement:
  – The function of okay that indicates “I believe what you said” and/or “I agree with what you say”.
• Backchannel:
  – The function of okay in response to another speaker's utterance that indicates only “I’m still here” or “I hear you and please continue”.
• Cue beginning discourse segment:
  – The function of okay that marks a new segment of a discourse or a new topic. This use of okay could be replaced by now.
Perception Study: Subjects and Procedure
• Subjects:
  – 20 paid subjects (10 female, 10 male).
  – Ages between 20 and 60.
  – Native speakers of English.
  – No hearing problems.
• GUI on a laboratory workstation with headphones.
Results: Inter-Subject Agreement
• Kappa measure of agreement with respect to chance (Fleiss ’71)

                              Isolated    Contextualized
  Overall                       .120          .294
  Ack / Agree vs. Other         .089          .227
  Backchannel vs. Other         .118          .164
  Cue beginning vs. Other       .157          .497
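Fleiss' kappa compares the observed agreement among a fixed number of raters with the agreement expected from the overall category proportions: κ = (P̄ − P̄e) / (1 − P̄e). A minimal Python implementation follows; the `one_vs_other` helper shows one way per-function figures like "Backchannel vs. Other" can be obtained by collapsing categories (an assumption about how the table was computed). The toy ratings are hypothetical.

```python
from typing import List, Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N items, each rated by the same number n of raters.

    counts[i][j] = number of raters who assigned item i to category j.
    """
    N = len(counts)
    n = sum(counts[0])
    if any(sum(row) != n for row in counts):
        raise ValueError("every item must have the same number of ratings")
    if n < 2:
        raise ValueError("need at least two raters per item")
    k = len(counts[0])
    # Observed agreement: mean over items of the fraction of agreeing rater pairs.
    p_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts) / N
    # Chance agreement from the overall proportion of each category.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa undefined: all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


def one_vs_other(counts: Sequence[Sequence[int]], j: int) -> List[List[int]]:
    """Collapse all categories except j into a single 'Other' column."""
    return [[row[j], sum(row) - row[j]] for row in counts]


# Hypothetical: 3 tokens, each classified by 4 subjects into
# (Ack/Agree, Backchannel, Cue beginning).
ratings = [[4, 0, 0], [1, 2, 1], [0, 0, 4]]
print(round(fleiss_kappa(ratings), 3))  # 0.556 (exactly 5/9)
print(round(fleiss_kappa(one_vs_other(ratings, 0)), 3))
```

In the study itself, the matrix would have one row per ‘okay’ token (54 per condition), three columns, and row sums of 20, one rating per subject.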
Results: Cues to Interpretation
• Phonetic transcription of okay:
• Isolated Condition
  – Strong correlation for realization of initial vowel
    • one set of variants: Backchannel
    • other variants: Ack/Agree, Cue Beginning
• Contextualized Condition
  – No strong correlations found for phonetic variants.
Results: Cues to Interpretation
• Ack / Agree
  – Isolated condition: shorter /k/
  – Contextualized condition: shorter latency between turns; shorter pause before okay
• Backchannel
  – Isolated condition: higher final pitch slope; longer 2nd syllable; lower intensity
  – Contextualized condition: higher final pitch slope; more words by S2 before okay; fewer words by S1 after okay
• Cue beginning
  – Isolated condition: lower final pitch slope; lower overall pitch slope
  – Contextualized condition: lower final pitch slope; longer latency between turns; more words by S1 after okay
S1 = Utterer of the target ‘okay’. S2 = The other speaker.
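The contextual cues in this table can be computed from time-aligned turns. The Python sketch below derives three of them (latency between turns, words by S2 before okay, words by S1 after okay); the `Turn` record, its fields, and the timings in the example are hypothetical, and the study's exact feature definitions may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Turn:
    speaker: str       # hypothetical speaker id, e.g. "A" or "B"
    start: float       # turn start time in seconds
    end: float         # turn end time in seconds
    words: List[str]


def contextual_features(prev_turn: Turn, target_turn: Turn, okay_index: int) -> Dict[str, float]:
    """Features for the 'okay' at words[okay_index] of target_turn (spoken by S1),
    given the preceding turn by the other speaker (S2)."""
    if target_turn.words[okay_index].lower() != "okay":
        raise ValueError("okay_index must point at the target 'okay'")
    return {
        # Gap between S2's turn end and S1's turn start (negative if overlapping).
        "latency": target_turn.start - prev_turn.end,
        "s2_words_before_okay": len(prev_turn.words),
        "s1_words_after_okay": len(target_turn.words) - okay_index - 1,
    }


# Hypothetical timings for one of the sample contexts above.
prev = Turn("B", 0.0, 2.1, "but it's gonna be below the onion".split())
target = Turn("A", 2.4, 2.8, ["okay"])
print(contextual_features(prev, target, 0))
```

An isolated ‘okay’ with no words after it and a short gap fits the contextualized Backchannel/Ack profile in the table, whereas a turn-initial ‘okay’ followed by many words fits Cue beginning.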
Conclusions
• Agreement:
  – Availability of context improves inter-subject agreement.
  – Cue beginnings easier to disambiguate than the other two functions.
• Cues to interpretation:
  – Contextual features override word features
  – Exception: Final pitch slope of okay in both conditions.
• Guide to generation…
Summary: Dialogue Act Modeling for SDS
• DA identification
  – Looks potentially feasible, even when transcription is errorful
  – Prosodic and lexical cues useful
• DA generation
  – Descriptive results may be more useful for generation than for recognition, ironically
  – Choice of DA realization, lexical and prosodic
Next Class
• J&M 22.5
• Hirschberg et al ’04
• Goldberg et al ’03
• Krahmer et al ’01