morphology phonology discourse semantics human-computer ...skopp/lehre/mmi_ss09/mmi-vl10.pdf ·...

10
MMI / SS09 Human-Computer Interaction Session 10 Natural Dialog Interaction (continued) MMI / SS09 Classical SLDS structure Text-to- Speech Speech Recognition Syntactic analysis and Semantic Interpretation Response Generation Dialogue Management Discourse Interpretation U s e r Phonetics Phonology Morphology Syntax Semantics Pragmatics Discourse MMI / SS09 Voice Command Current automotive speech technology at BMW ! Artikel auf Spiegel Online vom 25.6.2009 25 MMI / SS09 Voice Command 26 Automotive voice command (BMW)

Upload: others

Post on 03-Aug-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Morphology Phonology Discourse Semantics Human-Computer ...skopp/Lehre/MMI_SS09/MMI-VL10.pdf · Human-Computer Interaction Session 10 Natural Dialog Interaction (continued) MMI

MMI / SS09

Human-Computer Interaction

Session 10

Natural Dialog Interaction (continued)

MMI / SS09

Classical SLDS structure

Text-to-Speech

Speech

Recognition

Syntactic analysis and Semantic Interpretation

Response

Generation

Dialogue

Management

Discourse

InterpretationU

s

e

r

PhoneticsPhonology

MorphologySyntaxSemantics

PragmaticsDiscourse

MMI / SS09

Voice Command

Current automotivespeech technology at BMW! Artikel auf

Spiegel Online

vom 25.6.2009

25 MMI / SS09

Voice Command

26

Automotive voice

command (BMW)

Page 2: Morphology Phonology Discourse Semantics Human-Computer ...skopp/Lehre/MMI_SS09/MMI-VL10.pdf · Human-Computer Interaction Session 10 Natural Dialog Interaction (continued) MMI

MMI / SS09

Spoken Language Dialogue Systems (SLDS)

A system that allows a user to speak his queries in natural language and receive useful spoken responses from it

Provides an interface between the user and a computer-based application that permits spoken interaction in a “relatively natural manner”

MMI / SS09

What is a dialogue?

" multiple participants exchange information

" all participants pursue (ideally) the same goal

" discourse develops over the dialogue

" some conventions and protocols exist

" general structure! Dialogue = [episodes]+ (topic changes)

! Episodes = [turn]+ (speaker changes)

! Turn = [utterance]+ (function changes)

MMI / SS09 29 MMI / SS09

A lot to be handled...

" in both monologue and dialogue! information status: what is given, what is new?

! coherence: how do the utterances fit together?

! references: what is being referred to?

! speech acts: what is the intention of the speaker?

! implicature: what can be inferred from it?

" +only in dialogue! turn-taking: who has the the right to speak?

! initiative: who is seizing control of the dialogue?

! grounding: what info is settled between the speakers?

! repair: how to detect and repair misunderstandings?

Page 3: Morphology Phonology Discourse Semantics Human-Computer ...skopp/Lehre/MMI_SS09/MMI-VL10.pdf · Human-Computer Interaction Session 10 Natural Dialog Interaction (continued) MMI

MMI / SS09

Resolve references

" Ellipsis! People often utter partial phrases to avoid repetition

A: At what time is “Titanic” playing?

B: 8pm

A: And “The 5th Element”?

! Necessary to keep track of the conversation to

complete such phrases

" Some words are only interpretable in conext! Anaphora: “I’ll take it”, he said.

! Temporal/spatial: “The man behind me will be dead

tomorrow.”

MMI / SS09

Handle information structure

Distinguish two parts of one utterance

" Theme:

Part of a proposition that repeats known information to

create cohesive connection to previous propositions

(„discourse cohesion“)

" Rheme:

Part of a proposition that contributes new information

Example: Who is he? He is a student.

" There can be purely rhematic/thematic utterances

Theme Rheme

(Bolinger; Halliday, 1960‘s)

MMI / SS09

Understand speech acts

" Every utterance is an action performed by the speaker in a real speech situation

" Obvious in performative sentences: „I name this ship titanic.“, „I bet you 5 bugs.“

" Any sentence in a speech situation constitutes three kinds of acts:

! Locutionary act: the utterance of the sentence „I‘m cold.“

! Illocutionary act: the action in uttering it (asking, answering, commanding, …) ! informing that I‘m cold.

! Perlocutionary act: the production of effects upon the addressee and ultimately the world ! get window closed

" speech act explicates the illocutionary act

Austin (1962), Searle (1975)

MMI / SS09

Understand indirect meaning

S: „What day in May do you want to travel?“

U: „Uh, I need to be there for a meeting that‘s from the 12th the 15th.“

" U does not answer the question, expects hearer to draw certain inferences

" Theory of conversational implicature: hearer can draw inferences because they assume the conversants follows four maxims (Grice, 1975):

! Maxim of Quantity: Be exactly as informative as required

! Maxim of Quality: Make your contribution one that is true

! Maxim of Relevance: Be relevant.

! Maxim of Manner: Be understandable, unambiguous, brief, and orderly

! Maxim of Relevance allows S to know that U wants to travel by the 12th.

Page 4: Morphology Phonology Discourse Semantics Human-Computer ...skopp/Lehre/MMI_SS09/MMI-VL10.pdf · Human-Computer Interaction Session 10 Natural Dialog Interaction (continued) MMI

MMI / SS09

Understand grounding

" Interlocutors are trying to establish common ground, a set of mutual beliefs

" Listener must ground a speaker‘s contribution by acknowledging it, signaling understanding or agreement

" Various ways to do this:

S: „I can upgrade you to an SUV at that rate.“

! Continued attention/permission to proceed - U gazes appreciatively at S

! Relevant next contribution - U: „Do you have an Explorer available?“

! Acknowledgement, “backchanneling” - U: „Ok/Mhm/Great!“

! Display/repetition - U: „You can upgrade me to an SUV at the same rate?“

! Request for repair- U: „Huh?“

Allwood, 1976;Clark & Shaefer, 1989

MMI / SS09

Manage initiative

Control - the ability/license to bring up new topics, to start tasks, to pose questions, etc.

" System-initiative: system always has control, user only responds to system questions

" User-initiative: user always has control, system passivelyanswers user questions

" Mixed-initiative:control switches between system and user either using fixed rules or dynamically based on participant roles, dialogue history, etc.

MMI / SS09

Initiative strategies

" System initiative (spoken “form filling”)S: Please give me your arrival city name.

U: Baltimore.

S: Please give me your departure city name

U: Boston

S:…

" User initiativeU: When do flights to Boston leave?

S: At 8:30 AM and 3:45 PM.

U: How much are they?

S:…

" Mixed initiativeS: Where are you traveling to?

U: I want to go to Boston.

S: At time do you want to fly?

U: Are there any cheap flights?

requires good NLP, users must be aware of possible words

natural, open, unpredictable, hard to model, requires NLP and complex dialogue manag.

Rigid, restricted vocabulary, rigid, NLP easy and more accurat,

MMI / SS09

Understand turns and utterances

" Turn = [utterance]+

" But what is an utterance?! Not a syntactic sentence (may span several turns)

A: We've got you on USAir flight 99

B: Yep

A: leaving on December 1.

! Not a turn (multiple utterances may occur in one turn)A: We've got you on USAir flight 99 leaving on December. Do you

need a rental car?

" Dialogue is characterized by turn-taking! Who should talk next?

! When should they talk?

Page 5: Morphology Phonology Discourse Semantics Human-Computer ...skopp/Lehre/MMI_SS09/MMI-VL10.pdf · Human-Computer Interaction Session 10 Natural Dialog Interaction (continued) MMI

MMI / SS09

Manage turn-taking

" People know well when they can take the turn! Only little speaker overlap (~5% in English)

! But little silence between turns either, a few of 1/10 s

" Less than needed to plan motor routines for speaking

" Speakers usually start motor planning before previous speaker has finished talking !!

" How do we know? ! Schegloff (1968): Adjacency pairs set up speaker

expectations and give rise to discourse obligations

" QUESTION ! ANSWER, REQUEST ! GRANT, ...

" Silence inbetween is dispreferred ! pauses disturb users!

! Sacks et al. (1974): transition-relevance places and rules that govern turn-taking, e.g.

" If current speaker does not select next speaker, any other speaker may take next turn

MMI / SS09 40

Usual structure of HCI dialogues

MMI / SS09

Measuring dialogue efficiency

Highly significant loss of dialogical efficiency in HCI vs. HHI using the PARADISE metric: Walker et al (2001) - dialogue turns / dialogue length

Robert Porzel, Uni Bremen

English Data German Data

MMI / SS09

Page 6: Morphology Phonology Discourse Semantics Human-Computer ...skopp/Lehre/MMI_SS09/MMI-VL10.pdf · Human-Computer Interaction Session 10 Natural Dialog Interaction (continued) MMI

MMI / SS09

Classical SLDS structure

Text-to-Speech

Speech

Recognition

Syntactic analysis and Semantic Interpretation

Response

Generation

Dialogue

Management

Discourse

InterpretationU

s

e

r

Phonetics,Phonology

Morphol.,SyntaxSemantics

Pragmatics,discourse

MMI / SS09

The heart of the system...

Jurafsky & Martin, 2000

(DM)

MMI / SS09

Finite state machine DM

" Graph specifies all legal dialogues (dialogue grammar)! Nodes: system’s questions

! Transitions: possible paths through the network

! Each state represents a stage in the dialogue (“now”), rarely with complete dialogue history

" System has initiative

" Context is fixed by the question being asked

" Used widely in commercial applications

MMI / SS09

Finite state machine DM

Do-it-yourself example: CSLU Toolkithttp://cslu.cse.ogi.edu/toolkit/

(Jurafsky & Martin, 2000)

Page 7: Morphology Phonology Discourse Semantics Human-Computer ...skopp/Lehre/MMI_SS09/MMI-VL10.pdf · Human-Computer Interaction Session 10 Natural Dialog Interaction (continued) MMI

MMI / SS09

Frame-based DM

47

!

!"#$%&'#(%)*+,#-./0%*12(3%$(*456

!"#$%&'*78%"%*#9)*:8%9*).*2.0*:#93*3.*3"#;%-<

(")$$)"'*=,9>03*.?*)%>#"30"%*#9)*#"",;#-*@,32A*)#3%*#9)*3,$%B

+,-%'*C-%#(%*(>%@,?2*38%*)%>#"30"%*#9)*#"",;#-*@,32A*)#3%*#9)*3,$%

./--,0'*1DEDFG*H*!IJK*@.99%@3,.9(*7LDID*)%>#"30"%*-,M%*N!IJK!*

OP+*)%(3,9#3,.9*-,M%*NGJ!"#$%*3,$%*-,M%"!&'($!

!"#$

!"#$%&'*!".$*:8,@8*@,32*#"%*2.0*-%#;,9/<

(")$$)"'*=,9>03*.?*#*@,32B

+,-%'*G%--*$%*38%*9#$%*.?*38%*@,32*2.0*:#93*3.*-%#;%*?".$

%#

!"#$%&'*G.*:8,@8*@,32*).*2.0*:#93*3.*3"#;%-<

(")$$)"'*=,9>03*.?*#*@,32B

+,-%'*G%--*$%*38%*9#$%*.?*38%*@,32*2.0*:#93*3.*3"#;%-*3.

&'()

!"#$%&'*78%9*).*2.0*:#93*3.*3"#;%-<

(")$$)"'*=,9>03*.?*)#3%*#9)*3,$%B

+,-%'*C-%#(%*(>%@,?2*)#3%*#9)*3,$%*.?*2.0"*Q.0"9%2

MMI / SS09

Frame-based DM

" template (frame) containing slots to be filled

" destination: London, date: unknown, time of departure: 9

" questions to fill slots, conditions under which they can be asked

! condition: unknown(origin) & unknown(destination)

question: “Which route do you want to travel?”

! condition: unknown(destination)

question: “Where do you want to travel to?”

" system loops and decides next question based on what information has been elicited and what not yet

" system has initiative, dialogue more flexible, develops based on the current state of the system

" Commercially used, parts of standards: VoiceXML, SALT

! bad for negotiation, planning, mixed-initiative

MMI / SS09

Intention-/plan-based DM

" Idea: dialogue arises from the collaboration of two

or more agents in solving a task! there are goals to be reached

! plans are made to reach those goals

! the goals and plans of the other participants must be inferred or predicted

! goals may involve changing the beliefs of others

! models of the mental state of participants are used

" draws on methods from Artificial Intelligence

" permits more complex interaction between user, system, and underlying application

" allows for mixed-initiative dialogue

MMI / SS09

Example: TRAINS (Traum, Allen, 1996)

" Design system as agent

with own mental states

(Bratman, 1987)

! Beliefs: world model

! Desires: goals

! Intentions: plans to pursue

Reasoning: derive new beliefs

Deliberation: decide actions

Page 8: Morphology Phonology Discourse Semantics Human-Computer ...skopp/Lehre/MMI_SS09/MMI-VL10.pdf · Human-Computer Interaction Session 10 Natural Dialog Interaction (continued) MMI

MMI / SS09

„Conversational Agency“ (David Traum)

" Extending BDI to social attitudes that link one agent to

others in dialogue

! about the conversational partner, including mutual beliefs about the other‘s mental state

! hearer thinks that

speaker wants him

to do an act

! about what the agent should do, but not necessarily wants

to: discourse obligations that inform deliberation

MMI / SS09

Early example: TRAINS (Traum, Allen, 1996)

MMI / SS09

TRAINS dialogue manager

" Explicit representation of conversational state! private and mutual beliefs, beliefs about user beliefs

! proposals (to represent insincere or tentative acts)

! domain plans (goals+actions+objects+constraints), either private, proposed or shared

! discourse goals, represented as scripts specifying goals in different phases of conversation

! obligations implied by received dialogue acts

! intended acts to be generated

! local initiative (who is expected to speak next)

! stack of accessible discourse units

! discourse structure information

MMI / SS09

TRAINS dialogue manager

" Reactive: system will deliberate as little as possible until it can act, running in cycles

" No long-range plans, one step at a time

" Prioritized list of sources for deliberations

1. Discourse obligations

2. Weak obligation: don‘t interrupt user‘s turn

3. Intended speech act (! NLG + state update)

4. Weak obligation: grounding (acknowledge, repair)

5. Discourse goals: proposal negotiation

6. High-level discourse goals (domain reasoning)

Page 9: Morphology Phonology Discourse Semantics Human-Computer ...skopp/Lehre/MMI_SS09/MMI-VL10.pdf · Human-Computer Interaction Session 10 Natural Dialog Interaction (continued) MMI

MMI / SS09

Information State approach

" Central data structure(s) to define conversational state

! employed in deciding on next actions

! updated in effect of dialogue acts by either speaker

" operational semantics of plans stated as update rules

" dialogue manager = definition of the contents of the IS +

description of update processes

(Traum & Larsson, 2003) MMI / SS09

SLDS architectures

" Pipeline structure with message passing! + Incrementality (D. Schlangen, Uni Potsdam)

" Blackboard! System = distributed, collaborating agents

! Dialogue manager hosts central data structures (IS)

! Rationale: Importance of context/discourse for all

stages

Dialogue

Manager

Speech

Recognition

Response

GenerationExternal

Communication

Language

Understanding

Speech

Output

MMI / SS09

" Incremental dialogue processing

! G. Skantze (KTH Stockholm), D. Schlangen (Uni Potsdam)

57 MMI / SS09

Summary

Features/

dialogue

control

State-based Frame-based Intention-based

Input Single words or

phrases

NL with concept

spotting

Unrestricted NL

Verification Explicit confirmation

of each turn or at

end

Explicit & implicit

confirmation

Grounding

Dialogue

Context

Implicitly in dialogue

states

Explicitly represented

Control represented

with algorithm

Model of

System’s BDI +

dialogue history

User Model Simple model of user

characteristics /

preferences

Simple model of user

characteristics /

preferences

Model of User’s

BDI

Page 10: Morphology Phonology Discourse Semantics Human-Computer ...skopp/Lehre/MMI_SS09/MMI-VL10.pdf · Human-Computer Interaction Session 10 Natural Dialog Interaction (continued) MMI

MMI / SS09

Next session: multimodal interaction

Turn it like

that!

speech recognition

& lip reading