Université Paris 8
Multimodal Expressive Embodied Conversational Agents
Catherine Pelachaud, Elisabetta Bevacqua, Nicolas Ech Chafai (FT), Maurizio Mancini, Magalie Ochs (FT), Christopher Peters, Radek Niewiadomski
ECAs Capabilities
Anthropomorphic autonomous figures
A new form of human-machine interaction
Study of human communication and human-human interaction
ECAs ought to be endowed with dialogic and expressive capabilities
Perception: an ECA must be able to pay attention to, and perceive, the user and the context she is placed in.
ECAs Capabilities
Interaction:
– speaker and addressee emit signals
– speaker perceives feedback from addressee
– speaker may decide to adapt to addressee's feedback
– consider social context
Generation: expressive synchronized visual and acoustic behaviors
– produce expressive behaviours: words, voice, intonation, gaze, facial expression, gesture, body movements, body posture
Synchrony Tool – BEAT
Cassell et al., Media Lab MIT
Decomposition of text into theme and rheme
Linked to WordNet
Computation of:
– intonation
– gaze
– gesture
Virtual Training Environments – MRE
(J. Gratch, L. Johnson, S. Marsella…, USC)
Interactive System
Real estate agent
Gesture synchronized with speech and intonation
Small talk
Dialog partner
MAX, S. Kopp, U. of Bielefeld
Gesture understanding and imitation
Gilbert and George at the Bank (UPenn, 1994)
Greta
Problem to Be Solved
Human communication is endowed with three devices to express communicative intention:
– Verbs and formulas
– Intonation and paralinguistics
– Facial expression, gaze, gesture, body movement, posture…
Problem: for any communicative act, the Speaker has to decide:
– Which nonverbal behaviors to show
– How to execute them
Verbal and Nonverbal Communication
Suppose I want to advise a friend to take her umbrella because it is raining.
Which signals do I use?
Verbal signal: use of a syntactically complex sentence:
"Take your umbrella because it is raining."
Verbal + nonverbal signals:
"Take your umbrella" + point out of the window to show the rain by a gesture or by gaze
Multimodal Signals
The whole body communicates by using:
– Verbal acts (words and sentences)
– Prosody, intonation (nonverbal vocal signals)
– Gesture (hand and arm movements)
– Facial action (smile, frown)
– Gaze (eye and head movements)
– Body orientation and posture (trunk and leg movements)
All these systems of signals have to cooperate in expressing the overall meaning of the communicative act.
Multimodal Signals
Accompany the flow of speech
Synchronized at the verbal level
Punctuate accented phonemic segments and pauses
Substitute for word(s)
Emphasize what is being said
Regulate the exchange of speaking turns
Synchronization
There exists an isomorphism between patterns of speech, intonation and facial actions
Different levels of synchrony:
– Phoneme level (blink)
– Word level (eyebrow)
– Phrase level (hand gesture)
Interactional synchrony: synchrony between speaker and addressee
Taxonomy of Communicative Functions (I. Poggi)
The speaker may provide three broad types of information:
– Information about the world: deictic, iconic (adjectival), …
– Information about the speaker's mind:
  belief (certainty, adjectival)
  goal (performative, rheme/theme, turn-system, belief relation)
  emotion
  meta-cognitive
– Information about the speaker's identity (sex, culture, age…)
Multimodal Signals (Isabella Poggi)
Multimodal signals are characterized by their placement with respect to the linguistic utterance and their significance in transmitting information. E.g.:
– A raised eyebrow may signal surprise, emphasis, a question mark, a suggestion…
– A smile may express happiness, be a polite greeting, be a backchannel signal…
Two pieces of information are needed to characterize multimodal signals:
– Their meaning
– Their visual action
Lexicon = (meaning, signal)
Expression meaning:
– deictic: this, that, here, there
– adjectival: small, difficult
– certainty: certain, uncertain…
– performative: greet, request
– topic comment: emphasis
– belief relation: contrast, …
– turn allocation: take/give turn
– affective: anger, fear, happy-for, sorry-for, envy, relief, …
Expression signal:
– Deictic: gaze direction
– Certainty: Certain: palm-up open hand; Uncertain: raised eyebrow
– Adjectival: small eye aperture
– Belief relation: Contrast: raised eyebrow
– Performative: Suggest: small raised eyebrow, head aside; Assert: horizontal ring
– Emotion: Sorry-for: head aside, inner eyebrow up; Joy: raising fist up
– Emphasis: raised eyebrows, head nod, beat
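Such a lexicon can be sketched as a plain lookup table. This is an illustrative data structure, not the actual Greta implementation; entries are taken from the examples above, and the function name is hypothetical.

```python
# Hypothetical sketch of the (meaning, signal) lexicon as a dictionary
# keyed by (communicative function, meaning); values are signal labels.
LEXICON = {
    ("deictic", "this"): ["gaze_direction"],
    ("certainty", "certain"): ["palm_up_open_hand"],
    ("certainty", "uncertain"): ["raised_eyebrow"],
    ("belief_relation", "contrast"): ["raised_eyebrow"],
    ("performative", "suggest"): ["small_raised_eyebrow", "head_aside"],
    ("performative", "assert"): ["horizontal_ring"],
    ("affective", "sorry_for"): ["head_aside", "inner_eyebrow_up"],
    ("emphasis", "comment"): ["raised_eyebrows", "head_nod", "beat"],
}

def signals_for(function, meaning):
    """Look up the signals that express a given (function, meaning) pair."""
    return LEXICON.get((function, meaning), [])
```

The same table can be read in both directions, which matters because one signal (e.g. a raised eyebrow) appears under several meanings.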
Representation Language
Affective Presentation Markup Language – APML
– describes the communicative functions
– works at the meaning level, not the signal level

<APML>
  <turn-allocation type="take turn">
    <performative type="greet">
      Good Morning, Angela.
    </performative>
    <affective type="happy">
      It is so
      <topic-comment type="comment">wonderful</topic-comment>
      to see you again.
    </affective>
    <certainty type="certain">
      I was
      <topic-comment type="comment">sure</topic-comment>
      we would do so, one day!
    </certainty>
  </turn-allocation>
</APML>
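Since APML is plain XML, a fragment like the one above can be walked with a standard XML parser. This is a minimal sketch (not the actual Greta parser); the function name is illustrative.

```python
import xml.etree.ElementTree as ET

# An abbreviated APML fragment, following the example above.
apml = """<APML>
  <turn-allocation type="take turn">
    <performative type="greet">Good Morning, Angela.</performative>
    <affective type="happy">It is so
      <topic-comment type="comment">wonderful</topic-comment>
      to see you again.</affective>
  </turn-allocation>
</APML>"""

def communicative_functions(xml_text):
    """Collect each communicative-function tag with its 'type' attribute,
    in document order."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.get("type")) for el in root.iter() if el is not root]

print(communicative_functions(apml))
```

Each (tag, type) pair is exactly the meaning-level information that the mapping library resolves into signals.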
Facial Description Language
Facial expressions defined as (meaning, signal) pairs stored in a library
Hierarchical set of classes:
– Facial basis (FB) class: basic facial movement
– An FB may be represented as a set of MPEG-4 compliant FAPs or, recursively, as a combination of other FBs using the `+' operator:
  FB = {fap3=v1, …, fap69=vk};
  FB' = c1*FB1 + c2*FB2;
  where c1 and c2 are constants and FB1 and FB2 can be:
  – previously defined FBs
  – FBs of the form {fap3=v1, …, fap69=vk}
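The FB algebra can be sketched in a few lines if an FB is represented as a mapping from FAP number to value; `+` merges (summing values on shared FAPs) and scalar multiplication scales. Function names and FAP values here are illustrative assumptions, not the actual implementation.

```python
# Sketch of the FB formalism: an FB is a dict {fap_name: value}.
def fb_scale(fb, c):
    """c * FB: scale every FAP value by the constant c."""
    return {fap: c * v for fap, v in fb.items()}

def fb_add(*fbs):
    """FB1 + FB2 + ...: merge FBs, summing values on shared FAPs."""
    out = {}
    for fb in fbs:
        for fap, v in fb.items():
            out[fap] = out.get(fap, 0) + v
    return out

# Illustrative FAP values (not calibrated MPEG-4 numbers).
raise_eyebrow = {"fap31": 80, "fap32": 80}
raise_lid     = {"fap19": 40, "fap20": 40}
open_mouth    = {"fap3": 120}

# surprise = raise_eyebrow + raise_lid + open_mouth
surprise = fb_add(raise_eyebrow, raise_lid, open_mouth)
# FB' = c1*FB1 + c2*FB2
blend = fb_add(fb_scale(raise_eyebrow, 0.5), fb_scale(open_mouth, 0.25))
```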
Facial Basis Class
Examples of facial basis classes:
– Eyebrow: small_frown, left_raise, right_raise
– Eyelid: upper_lid_raise
– Mouth: left_corner_stretch, left_corner_raise
Facial Displays
Every facial display (FD) is made up of one or more FBs:
– FD = FB1 + FB2 + FB3 + … + FBn;
– surprise = raise_eyebrow + raise_lid + open_mouth;
– worried = (surprise*0.7) + sadness;
Facial Displays
Probabilistic mapping between the tags and signals:
– E.g.: happy_for = (smile*0.5, 0.3) + (smile*0.25) + (smile*2 + raised_eyebrow, 0.35) + (nothing, 0.1)
Definition of a function class for addressee association (meaning, signal)
Class communicative function:
– Certainty
– Adjectival
– Performative
– Affective
– …
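The probabilistic mapping amounts to sampling one (signal, probability) alternative per tag instantiation. A sketch under stated assumptions: the second `happy_for` entry on the slide lacks an explicit probability, so 0.25 is assumed here to make the weights sum to 1; names are illustrative.

```python
import random

# (signal, probability) alternatives for one tag; 'nothing' = no signal shown.
HAPPY_FOR = [
    ("smile*0.5", 0.3),
    ("smile*0.25", 0.25),            # probability assumed (not on the slide)
    ("smile*2 + raised_eyebrow", 0.35),
    ("nothing", 0.1),
]

def sample_signal(mapping, rng=random):
    """Draw one signal alternative according to its probability weight."""
    signals, weights = zip(*mapping)
    return rng.choices(signals, weights=weights, k=1)[0]
```

Sampling rather than always picking the most likely alternative keeps the agent from producing the identical expression every time the same tag occurs.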
Facial Temporal Course
Gestural Lexicon
Certainty:
– Certain: palm-up open hand
– Uncertain: showing empty hands while lowering forearms
Belief-relation:
– List of items of same class: numbering on fingers
– Temporal relation: fist with extended hand moves back and forth behind one's shoulder
Turn-taking:
– Hold the floor: raise hand, palm toward hearer
Performative:
– Assert: horizontal ring
– Reproach: extended index, palm to left, rotating up & down on wrist
Emphasis: beat
Gesture Specification Language
Scripting language for hand-arm gestures, based on formational parameters [Stokoe]:
– Hand shape specified using HamNoSys [Prillwitz et al.]
– Arm position: concentric squares in front of the agent [McNeill]
– Wrist orientation: palm and finger-base orientation
Gestures are defined by a sequence of timed key poses: gesture frames
Gestures are broken down temporally into distinct (optional) phases:
– Gesture phases: preparation, stroke, hold, retraction
– Change of formational components over time
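The structure described above — timed key poses grouped into optional phases — can be sketched as plain data classes. Field names and values are illustrative assumptions, not the actual file format of the gesture language.

```python
from dataclasses import dataclass

@dataclass
class GestureFrame:
    time: float            # seconds from gesture start (timed key pose)
    hand_shape: str        # HamNoSys-style hand-shape label
    arm_position: str      # sector of McNeill's concentric squares
    palm_orientation: str  # wrist orientation component

@dataclass
class GesturePhase:
    name: str              # preparation | stroke | hold | retraction
    frames: list           # key poses making up this phase

# Illustrative specification loosely following the 'Certain' gesture
# (palm-up open hand); timings and labels are invented for the sketch.
certain = [
    GesturePhase("preparation", [GestureFrame(0.0, "open_hand", "center", "palm_up")]),
    GesturePhase("stroke",      [GestureFrame(0.3, "open_hand", "center_low", "palm_up")]),
    GesturePhase("retraction",  [GestureFrame(0.8, "relaxed", "rest", "palm_down")]),
]
```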
Gesture specification example: Certain
Gesture Temporal Course
rest position → preparation → stroke start – stroke end → retraction → rest position
ECA Architecture
Input to the system: APML annotated text
Output of the system: animation files and a WAV file for the audio
The system:
– Interprets APML-tagged dialogs, i.e. all communicative functions
– Looks up in a library the mapping between the meaning (specified by the XML tag) and signals
– Decides which signals to convey on which modalities
– Synchronizes the signals with speech at different levels (word, phoneme or utterance)
Behavioral Engine
Modules
APML Parser: XML parser
TTS Festival: manages the speech synthesis and gives us the list of phonemes and phoneme durations
Expr2Signal Converter: given a communicative function and its meaning, returns the list of facial signals
Conflicts Resolver: resolves the conflicts that may happen when more than one facial signal should be activated on the same facial parts
Face Generator: converts the facial signals into MPEG-4 FAP values
Viseme Generator: converts each phoneme, given by Festival, into a set of FAPs
MPEG4 FAP Decoder: an MPEG-4 compliant facial animation engine
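The module chain above can be sketched as a simple pipeline. Every stage here is a stub standing in for the real component (Festival, the MPEG-4 decoder, etc. are external); all names and return values are illustrative.

```python
def apml_parse(apml_text):
    """APML Parser stub: pretend we extracted one tag and the plain text."""
    return [("performative", "greet")], "Good Morning"

def tts_festival(text):
    """TTS Festival stub: (phoneme, duration in seconds) pairs."""
    return [("g", 0.08), ("u", 0.10)]

def expr2signal(tags):
    """Expr2Signal Converter stub: meaning -> facial signals."""
    return ["smile"] if ("performative", "greet") in tags else []

def resolve_conflicts(signals):
    """Conflicts Resolver stub: here, just de-duplicate, keeping order."""
    return list(dict.fromkeys(signals))

def face_generator(signals):
    return [f"FAP({s})" for s in signals]

def viseme_generator(phonemes):
    return [f"VISEME({p})" for p, _ in phonemes]

def run_pipeline(apml_text):
    tags, text = apml_parse(apml_text)
    phonemes = tts_festival(text)
    signals = resolve_conflicts(expr2signal(tags))
    # The combined FAP stream would be fed to the MPEG-4 FAP decoder.
    return face_generator(signals) + viseme_generator(phonemes)
```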
TTS Festival
Drives the synchronization of facial expressions
Synchronization implemented at the word level
– Timing of facial expressions connected to the text embedded between the markers
Use of Festival's tree structure to compute expression durations
Expr2Signal Converter
Instantiation of APML tags: meaning of a given communicative function
Converts markers into facial signals
Uses a library containing a lexicon of (meaning, facial expressions) pairs
Gaze Model
Based on Isabella Poggi's model of communicative functions
The model predicts what the value of gaze should be in order to convey a given meaning in a given conversational context. For example:
– if the agent wants to emphasize a given word, the model outputs that the agent should gaze at her conversant.
Gaze Model
Very deterministic behavior model: to every communicative function associated with a meaning corresponds the same signal (with probabilistic changes)
Event-driven model: the associated signals are computed only when a communicative function is specified
Only when a communicative function is specified may the corresponding behavior vary
Gaze Model
Several drawbacks, as there is no temporal consideration:
– No consideration of past and current gaze behavior when computing the new one
– No consideration of how long the current gaze state of S (speaker) and L (listener) has lasted
Gaze Algorithm
Two steps:
1. Communicative prediction:
– apply the communicative function model to compute the gaze behavior needed to convey a given meaning for S and L
2. Statistical prediction:
– the communicative gaze model is probabilistically modified by a statistical model defined with constraints:
  – what the communicative gaze behavior of S and L is
  – which gaze behavior S and L were in
  – the duration of the current state of S and L
Temporal Gaze Parameters
Gaze behaviors depend on the communicative functions, the general purpose of the conversation (persuasion discourse, teaching...), personality, cultural roots, social relations...
A very (indeed too) complex model
We propose parameters that control the overall gaze behavior:
– T_max(S=1,L=1): maximum duration the mutual gaze state may remain active
– T_max(S=1): maximum duration of gaze state S=1
– T_max(L=1): maximum duration of gaze state L=1
– T_max(S=0): maximum duration of gaze state S=0
– T_max(L=0): maximum duration of gaze state L=0
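One way the statistical step could use these caps is sketched below: keep the communicatively predicted gaze state unless the current state has exceeded its maximum duration, in which case force a switch. The threshold values and the flipping rule are assumptions for illustration, not the published model.

```python
# Illustrative maximum durations in seconds (values are assumptions).
# Keys are gaze-state tuples; ("S=1", "L=1") is the mutual gaze state.
T_MAX = {
    ("S=1", "L=1"): 2.0,
    ("S=1",): 4.0,
    ("L=1",): 4.0,
    ("S=0",): 3.0,
    ("L=0",): 3.0,
}

def constrain_gaze(predicted_state, current_state, elapsed):
    """Keep the communicative prediction unless the current state
    has already lasted longer than its cap; then flip each gaze bit."""
    if predicted_state == current_state and elapsed >= T_MAX.get(current_state, float("inf")):
        return tuple(
            p.replace("=1", "=x").replace("=0", "=1").replace("=x", "=0")
            for p in current_state
        )
    return predicted_state
```

Usage: if mutual gaze has lasted 2.5 s, `constrain_gaze(("S=1", "L=1"), ("S=1", "L=1"), 2.5)` breaks it by averting both parties' gaze.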
Mutual Gaze
Gaze Aversion
Gesture Planner
Adaptive instantiation:
– Preparation and retraction phase adjustments
– Transition key and rest gesture insertion
– Joint-chain follow-through: forward time shifting of children joints
Stroke of gesture on stressed word
Stroke expansion:
– During the planning phase, identify rheme clauses with closely repeated emphases/pitch accents
– Indicate secondary accents by repeating the stroke of the primary gesture with decreasing amplitude
Gesture Planner
Determination of gesture:
– Look up in dictionary
Selection of gesture:
– Gestures associated with the most embedded tags have priority (except beat): adjectival, deictic
Duration of gesture:
– Coarticulation between successive gestures close in time
– Hold for gestures belonging to tags higher up the hierarchy (e.g. performative, belief-relation)
– Otherwise go to rest position
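The selection rule can be sketched in one function: among candidate gestures coming from nested APML tags, the most deeply embedded tag wins, except that beats always lose priority. The representation of candidates as (depth, name) pairs is an assumption for illustration.

```python
def select_gesture(candidates):
    """candidates: list of (tag_depth, gesture_name) pairs.
    Deeper (more embedded) tags win; beats are only chosen
    when no other gesture is available."""
    if not candidates:
        return None
    # Sort key: non-beats before beats, then by embedding depth.
    return max(candidates, key=lambda c: (c[1] != "beat", c[0]))[1]
```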
Behavior Expressivity
Behavior is related to (Wallbott, 1998):
– the quality of the mental state (e.g. emotion) it refers to
– the quantity (somehow linked to the intensity of the mental state)
Behaviors encode:
– content information (the 'what is communicated')
– expressive information (the 'how it is communicated')
Behavior expressivity refers to the manner of execution of the behavior
Expressivity Dimensions
Spatial: amplitude of movement
Temporal: duration of movement
Power: dynamic properties of movement
Fluidity: smoothness and continuity of movement
Repetitiveness: tendency toward rhythmic repeats
Overall Activation: quantity of movement across modalities
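The six dimensions can be sketched as a single parameter set. The [-1, 1] range matches the presets shown later (abrupt, vigorous); the clamping behavior and field names are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Expressivity:
    """The six expressivity dimensions, each nominally in [-1, 1]."""
    overall_activation: float = 0.0
    spatial: float = 0.0
    temporal: float = 0.0
    fluidity: float = 0.0
    power: float = 0.0
    repetitiveness: float = 0.0

    def clamp(self):
        """Clamp every dimension into [-1, 1] (assumed behavior)."""
        for f in ("overall_activation", "spatial", "temporal",
                  "fluidity", "power", "repetitiveness"):
            setattr(self, f, max(-1.0, min(1.0, getattr(self, f))))
        return self

# The 'abrupt' preset shown later in the deck.
abrupt = Expressivity(0.6, 0.0, 1.0, -1.0, 1.0, -1.0)
```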
Overall Activation
• Threshold filter on atomic behaviors during APML tag matching
• Determines the number of nonverbal signals to be executed
Spatial Parameter
• Amplitude of movement controlled through asymmetric scaling of the reach space that is used to find IK goal positions
• Expand or condense the entire space in front of the agent
Temporal Parameter
• Stroke shift / velocity control of a beat gesture
• Determines the speed of the arm movement of a gesture's meaning-carrying stroke phase
• Modifies the speed of the stroke
[Figure: Y position of wrist w.r.t. shoulder [cm] vs. frame #]
Fluidity
• Continuity control of TCB interpolation splines and gesture-to-gesture coarticulation
• Continuity of arms' trajectory paths
• Controls the velocity profiles of an action
[Figure: X position of wrist w.r.t. shoulder [cm] vs. frame #]
Power
• Tension and bias control of TCB splines
• Overshoot reduction
• Acceleration and deceleration of limbs
• Hand shape control for gestures that do not need a hand configuration to convey their meaning (beats)
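TCB splines are the Kochanek-Bartels formulation, where each keyframe carries tension, continuity and bias parameters; Fluidity maps onto continuity, Power onto tension and bias. A minimal sketch of the standard outgoing-tangent formula (scalar case, for one animation channel):

```python
def tcb_tangent(p_prev, p, p_next, tension=0.0, continuity=0.0, bias=0.0):
    """Outgoing (source) tangent at keyframe p, per the standard
    Kochanek-Bartels (TCB) formula. All parameters in [-1, 1]:
    tension tightens the curve, continuity allows corners,
    bias skews toward the previous or next keyframe."""
    a = (1 - tension) * (1 + continuity) * (1 + bias) / 2.0
    b = (1 - tension) * (1 - continuity) * (1 - bias) / 2.0
    return a * (p - p_prev) + b * (p_next - p)
```

With tension = 1 the tangent collapses to zero (sharp, abrupt motion); negative continuity flattens the velocity profile across the keyframe, which is how a fluidity-like smoothing can be obtained.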
Repetitivity
• Technique of stroke expansion: consecutive emphases are realized gesturally by repeating the stroke of the first gesture
Multiple Modality Example: Abrupt
Overall Activity = 0.6
Spatial = 0
Temporal = 1
Fluidity = -1
Power = 1
Repetition = -1
Multiple Modality Example: Vigorous
Overall Activity = 1
Spatial = 1
Temporal = 1
Fluidity = 1
Power = 0
Repetition = 1
Evaluation of Expressive Gesture
(H1) The chosen implementation for mapping single dimensions of expressivity onto animation parameters is appropriate: a change in a single dimension can be recognized and correctly attributed by users.
(H2) Combining parameters in such a way that they reflect a given communicative intent results in a more believable overall impression of the agent.
106 subjects, from 17 to 26 years old
Perceptual Test Studies
Evaluation of the adequacy of the implementation of each parameter:
– check whether subjects could perceive and distinguish the six expressivity parameters and indicate their direction of change
– Result: good recognition for the spatial and temporal parameters; lower recognition for the fluidity and power parameters, as they are inter-dependent
Evaluation task: does setting appropriate values for the expressivity parameters create behaviors that are judged as exhibiting the corresponding expressivity?
– 3 different types of behaviors: abrupt, sluggish, vigorous
– users prefer the coherent performance for vigorous and abrupt
Interaction
Interaction: two or more parties exchange messages
Interaction is by no means a one-way communication channel between parties
Within an interaction, parties take turns playing the roles of speaker and addressee
Interaction
Speaker and addressee adapt their behaviors to each other:
– the speaker monitors the addressee's attention and interest in what he has to say
– the addressee selects feedback behaviors to show the speaker that he is paying attention
Interaction
Speaker:
– Pointless for a speaker to engage in an act of communication if the addressee does not pay, or intend to pay, attention
– Important for the speaker to assess the addressee's engagement:
  – when starting an interaction: assess the possibility of engagement in interaction (establish phase)
  – when the interaction is going on: check whether engagement is lasting and sustaining the conversation (maintain phase)
Interaction
Addressee:
– attention: pay attention to the signals produced by the speaker to perceive, process and memorize them
– perception: of signals
– comprehension: understand the meaning attached to signals
– internal reaction: the comprehension of the meaning may create a cognitive and emotional reaction
– decision: whether or not to communicate the internal reaction
– generation: display behaviors
Backchannel
Types of backchannels (I. Poggi):
– attention
– comprehension
– belief
– interest
– agreement
Each positive or negative, and any combination of the above: pay attention but not understand; understand but not believe, etc.
Backchannel
Depending on the type of speech act they respond to, a signal will or will not be interpreted as a backchannel:
– backchannel: a signal of agreement/disagreement that follows the expression of opinions, evaluations, planning
– not a backchannel: a signal of comprehension/incomprehension after an explicit question "Did you understand?"
Backchannel
Polysemy of backchannel signals:
– a signal may provide different types of information
– a frown: negative feedback for understanding, believing and agreeing
Backchannel Signals of Gaze
Gaze:
– shows the direction of attention
– informs on the level of engagement or on the intention to maintain engagement
– indicates the degree of intimacy
but also
– monitors the gaze behavior of others to establish their intention to engage or stay engaged
A shared-attention situation involves mutual gaze at each other or mutual gaze at a same object
6464
Backchannel modellingBackchannel modelling
Reactive model
– generates instinctive feedback without reasoning
– simple backchannel or mimicry
– spontaneous – sincere
Cognitive model
– conscious decision to provide backchannel to provoke a particular effect on the speaker or to reach a specific goal
– deliberate – possibly pretended
– can shift to automatic (e.g. when listening to a bore)
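The two models above could be contrasted in code roughly as follows; the trigger logic, probabilities, signal names and goals are invented for illustration, not the slides' actual model:

```python
import random

def reactive_backchannel(addressee_signal):
    """Instinctive, no reasoning: mimic the addressee or nod spontaneously."""
    if addressee_signal in {"smile", "head_nod"}:
        return addressee_signal              # mimicry
    if random.random() < 0.3:                # spontaneous simple backchannel
        return "head_nod"
    return None

def cognitive_backchannel(goal):
    """Deliberate, possibly pretended: signal chosen to affect the speaker."""
    return {"encourage": "head_nod",
            "express_doubt": "frown",
            "feign_interest": "raise_eyebrows"}.get(goal, "head_nod")

print(reactive_backchannel("smile"))            # smile (mimicry)
print(cognitive_backchannel("feign_interest"))  # raise_eyebrows
```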
65
Backchannel Demo
66
A reactive backchannel
Currently, our model is reactive in nature
– dependent on perception
Speaker interprets addressee's behavior
Speaker generates or alters its own behavior
– our focus: interest and attention on a signal level (not on a cognitive level)
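A minimal sketch of the reactive loop just described: the speaker interprets the addressee's signals at the signal level and alters its own behavior accordingly. The cue names, weights and thresholds below are invented for illustration:

```python
# Signal-level attention cues with hand-picked (illustrative) weights.
ATTENTION_CUES = {"gaze_at_speaker": 1.0, "head_nod": 0.8,
                  "gaze_away": -1.0, "yawn": -0.8}

def estimate_attention(signals):
    """Aggregate observed cues into a rough attention score in [-1, 1]."""
    if not signals:
        return 0.0
    score = sum(ATTENTION_CUES.get(s, 0.0) for s in signals) / len(signals)
    return max(-1.0, min(1.0, score))

def adapt_behavior(signals):
    """Alter the speaker's behavior when the addressee seems to disengage."""
    if estimate_attention(signals) < 0.0:
        return "raise_voice_and_gesture"     # try to regain attention
    return "continue_normally"

print(adapt_behavior(["gaze_away", "yawn"]))            # raise_voice_and_gesture
print(adapt_behavior(["gaze_at_speaker", "head_nod"]))  # continue_normally
```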
67
Organization of the communication
Attraction of attention
Communicative agents: the agents provide information to the user and should ensure that the user pays attention
Animation expressivity: principle of “staging”, so that a single idea is clearly expressed at each instant of time
Animation specificity: animators’ creativity, no realistic constraints for animators
What types of gesture properties could guarantee the user’s attention?
France Telecom
68
Organization of the communication
Attraction of attention
Corpus: videos from traditional animation that illustrate different types of conversational interaction
The modulations of gesture expressivity over time play a role in managing communication, thus serving as a pragmatic tool
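The claim above can be sketched with expressivity parameters of the kind used for expressive ECA gestures (e.g. spatial extent, temporal extent, power); the scaling rule below is an invented illustration, not the system's actual mechanism:

```python
def modulate_expressivity(base, emphasis):
    """Scale expressivity parameters, clamping each value to [-1, 1].

    `base` maps parameter names to neutral values; raising `emphasis`
    over time amplifies a gesture, e.g. to draw the user's attention
    back to the current point of the discourse.
    """
    return {name: max(-1.0, min(1.0, value * emphasis))
            for name, value in base.items()}

neutral = {"spatial_extent": 0.3, "temporal_extent": 0.2, "power": 0.4}
print(modulate_expressivity(neutral, 2.0))
# {'spatial_extent': 0.6, 'temporal_extent': 0.4, 'power': 0.8}
```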
69
Emotion
elicited by the evaluation of events, objects, actions
integration of emotions in a dialog system (Artimis, FT)
identify under which circumstances a dialog agent should express emotions
70
Emotion
BDI representation based on the OCC model: appraisal variables [Ortony et al. 1988]:
– Desirability/Undesirability: achievement or threatening of the agent's choice
– Degree of realization: degree of certainty of the choice's achievement
– Probability of an event: probability of feasibility of an event
– Agency: the agent who is the actor of the event
[Diagram: an emotional mental state links a set of appraisal variables to a configuration of mental attitudes; appraisal variables are represented by mental attitudes]
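A hedged sketch of how appraisal variables of the kind listed above can select an emotion label; these simplified rules are an illustration of OCC-style appraisal, not the actual Artimis/BDI formalization:

```python
def appraise(desirable, realized, other_agency):
    """Map appraisal variables to a coarse OCC-style emotion label.

    desirable    : does the event serve the agent's choice?
    realized     : has the event actually occurred (degree of realization)?
    other_agency : is another agent the actor of the event?
    """
    if not realized:                        # prospect-based emotions
        return "hope" if desirable else "fear"
    if desirable:
        return "gratitude" if other_agency else "joy"
    return "anger" if other_agency else "distress"

print(appraise(desirable=True, realized=False, other_agency=False))   # hope
print(appraise(desirable=False, realized=True, other_agency=True))    # anger
```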
71
Emotion
Complex emotions:
– superposition of 2 emotions: the evaluation of an event can happen from different angles
– masking an emotion by another one: consideration of the social context
joy + disappointment = masking
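Masking as a complex facial expression can be sketched as a region-by-region composition: the displayed emotion (joy) covers the felt one (disappointment), but cues of the felt emotion may leak in some face regions. The region split and labels below are an invented simplification of this kind of model:

```python
FACE_REGIONS = ["eyebrows", "eyes", "mouth"]

def mask(felt, displayed, leaky=("eyes",)):
    """Compose a masked expression: leaky regions keep the felt emotion's
    cues, the other regions show the masking emotion."""
    return {r: felt[r] if r in leaky else displayed[r] for r in FACE_REGIONS}

disappointment = {"eyebrows": "inner_raise", "eyes": "lid_droop",
                  "mouth": "corners_down"}
joy = {"eyebrows": "neutral", "eyes": "cheek_raise", "mouth": "smile"}

print(mask(disappointment, joy))
# {'eyebrows': 'neutral', 'eyes': 'lid_droop', 'mouth': 'smile'}
```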
72
Video: Masking of Disappointment by Joy
73
Conclusion
Creation of a virtual agent able to
– communicate nonverbally
– show emotions
– use expressive gestures
– perceive and be attentive
– maintain attention
Two studies on expressivity
– from manual annotation of a video corpus
– from mimicry of movement analysis