KI3

Communicating Virtual Agents

Nicole Krämer

University of Cologne, Germany

Stefan Kopp

University of Bielefeld, Germany

Part 2: Bases of Multimodal Communication

Kopp & Krämer

KI3: Communicating virtual agents

Overview

I. Introduction
  - Motivation, history, recent developments
  - Evaluation

II. Bases of multimodal communication
  - Channels and functions of multimodal communication
  - Synthetic communicative behaviors, e.g., facial & gestural animation, speech synthesis

III. Modeling conversational behavior
  - Underlying models & architecture
  - Top-down vs. bottom-up
  - Outlook & discussion

We need...

...knowledge about communication when implementing virtual agents that communicate in a human-like fashion

• Conversational behavior is highly complex. Since the agent is supposed to behave „autonomously“, we need to know some rules.

• In order to build agents that are accepted and efficient, we need to know about the effects of specific behaviors.

→ Communication research has to provide bases and rules of communication (fundamental research) as well as evaluate the effects of the agents (applied research).

Problem: Most of the relevant bases and rules are not known yet!

Channels of communication behavior (I)

Communication is enormously complex, mainly because of the variety of channels and their interdependency.

• Verbal and nonverbal communication (Scherer & Wallbott, 1979), vocal and nonvocal channels (Laver & Hutcheson, 1972)

• „Basic triple structure“ of communication: language, paralanguage and kinesics (Poyatos, 1983)

• Studies show that nonverbal behavior in particular is of crucial importance for communication and person perception (Mehrabian & Ferris, 1967; „snap judgements“, Schneider, Hastorf & Ellsworth, 1979).

Channels of communication behavior (II)

Nonverbal behavior channels (according to Wallbott, 1994)

• vocal
  - Time dependent aspects
  - Voice dependent aspects
  - Continuity dependent aspects

• nonvocal
  - Motor channels: facial expression, gestures, gaze, posture
  - Physio-chemical channels: olfactory, tactile, thermal
  - Ecological channels: territory, interpersonal distance, appearance

Further important features

• Dimensional complexity – interdependence with respect to the effects (dependence on various contexts: other channels, interaction partners, situational context)

• Sequential complexity - time structure is very important (turn taking, gestures, lip synch)

• Importance of movements and activity (cf. Grammer et al., 1999)

• Subliminal reception and judging as well as producing nonverbal behaviors („communication between limbic systems“, Buck, 1994)

→ So far it remains an open question whether rules can be found that allow reliable production of the „correct“ behavior

Functions of nonverbal behavior (I)

Modeling functions

Discourse functions

Dialogue functions

Relational functions

All these functions are used in face-to-face (FTF) communication and are therefore expected when a humanoid agent appears on the screen. So they have to be modeled!

Mehrabian (1970), Exline et al., (1975), Frey (1999)

Security presentations in airplanes

Bandura (1977)

Bolinger (1983), McNeill (1992), Chovil (1991)

Duncan (1972)

Cassell et al. (1994), Nagao & Takeuchi (1994)

Cassell et al. (1999); Thórisson (1996)

Functions of nonverbal behavior (II)

• Discourse functions

  - Nonverbal behaviors that are closely related to verbal behavior and can work as complements, supplements, or substitutes of speech

  - Especially gestures, but also facial movements such as eyebrow raising (Chovil, 1991), can serve this function

  - Concerning gesture, Ekman & Friesen (1979; see Efron, 1941) differentiate Illustrators and Emblems (as well as Adaptors, which do not seem to have a discourse function)

  - McNeill (1992) distinguishes iconics, metaphorics, deictics, and beats as different types of spontaneous gestures (→ KW1)

Coverbal gesture

• Coverbal gestures are closely related to speech flow (semantic, pragmatic, and temporal synchrony; McNeill, 1992)

• Speech-gesture synchronization on various levels

  - Gestures co-occur with the rheme (Cassell, 2000)

  - Stroke onset precedes or co-occurs with the most contrastively stressed syllable in speech and covaries with it in time (De Ruiter, 1999; McNeill, 1992; Kendon, 1986)

→ Characteristic spatiotemporal features and kinematic properties
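The stroke-timing rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `GesturePlan` and `schedule_gesture`, the fixed preparation duration, and the `lead` parameter are hypothetical and do not come from the slides or the cited work.

```python
from dataclasses import dataclass


@dataclass
class GesturePlan:
    prep_start: float    # seconds: when the hand starts moving into position
    stroke_onset: float  # seconds: when the meaningful stroke phase begins


def schedule_gesture(stressed_syllable_onset: float,
                     prep_duration: float,
                     lead: float = 0.0) -> GesturePlan:
    """Place the stroke onset `lead` seconds before the stressed syllable.

    lead == 0 means the stroke co-occurs with the syllable; lead > 0 means
    it precedes it. A negative lead would violate the rule, so it is rejected.
    """
    if lead < 0:
        raise ValueError("stroke onset must not follow the stressed syllable")
    stroke_onset = max(0.0, stressed_syllable_onset - lead)
    # The preparation phase has to finish before the stroke starts.
    prep_start = max(0.0, stroke_onset - prep_duration)
    return GesturePlan(prep_start=prep_start, stroke_onset=stroke_onset)


plan = schedule_gesture(stressed_syllable_onset=1.2, prep_duration=0.4, lead=0.1)
print(plan)
```

In a running system the syllable onset would come from the speech synthesizer's phoneme timing, which is why the TTS component has to expose durations.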

Functions of nonverbal behavior (III)

• Dialogue functions

  - Consist of turn-taking and backchannel signals
  - Serve to guarantee the smooth flow of interaction when exchanging speaker and listener roles
  - Sacks, Schegloff & Jefferson (1974) list verbal and paraverbal regulators; Duncan (1972) finds important nonverbal cues
  - Controversy about the importance of nonverbal cues (Rimé, 1983 vs. Rutter et al., 1979)

Functions of nonverbal behavior (IV)

• Turn-taking signals (cf. Duncan, 1972)

  - Turn yielding signal: extension of the last syllable or last stressed syllable, terminal clause, termination of gestures, sociocentric sentences, looking at interaction partner

  - Speaker state signal: starting gesticulation, audible breath, rotating the head away, (over)loudness

  - Backchannel signal (Yngve, 1970): nods, paraverbal feedback, short questions, repetitions, sentence completion

  - Turn keeping signal: gesture (negates turn yielding signals), increased head movement activity (Donaghy & Goldberg, 1991)

Functions of nonverbal behavior (V)

• Relational functions

  - Socio-emotional effects, definition of the relationship, regulation of emotional climate, impression management

  - Mehrabian (1970; cf. Osgood, 1966) differentiates
      • Evaluation (immediacy cues)
      • Dominance (relaxation cues)
      • Activity, responsiveness

  - Mehrabian tried to find cues for all different dimensions of nonverbal communication...

Functions of nonverbal behavior (VI)

• Relational functions: Findings

  - Evaluation: gaze, smile, touch, forward lean, head tilt, low distance, activity (e.g. facial expressiveness)

  - Dominance: turning away, more expansive gestures, leaning backwards, nonreciprocal touch, relaxation cues?

  - Activity/responsiveness: synchrony, relation to increased evaluation

Example of multifunctionality: Eye gaze

• Signals search for information

• Helps to regulate flow of conversation (cf. Duncan, 1972; Kendon, 1967)

• Establishes intimacy (cf. Argyle & Dean, 1967)

• Indicates personality characteristics (social status, culture, etc.) (cf. Exline et al., 1975)

• How to generate communicative behaviors automatically?

  - Verbal behavior, also known as speech

  - Facial animation for creating facial displays and lip synch speech

  - Skeletal animation for synthetic gesture

Verbal behaviors

• Spoken utterances with natural intonation contour (crucial for intelligibility and believability)

→ Text-to-speech synthesis

• Lexical stress and sentence stress determined by word class, syntactic constituency, surface position

• Emphatic stress determined by information structure (rheme vs. theme, Halliday, 1967)

• Contrastive stress or focus, e.g. „I like blue tiles more than green tiles.“ vs. „I like blue tiles better than blue wallpaper.“

Emphatic & contrastive stress (= primary stress) → main synchronization points for nonverbal behaviors! (de Ruiter, 1999)

TTS for multimodality

• TXT2PHO (IKP) and MBROLA (TCTS)
• SABLE tags for additional intonation commands

[Figure: TTS pipeline. Stages: Initialization, Planning, Phonation. Components: Parse tags, TXT2PHO, Manipulation (driven by external commands), MBROLA. Data passed along: phonetic text, phonetic text+, speech.]

Example input:

„<SABLE> Drehe <EMPH> die Leiste </EMPH> quer zu <EMPH> der Leiste </EMPH>. </SABLE>“
(German: "Turn the bar crosswise to the bar.")

Phonetic text (IPA/XSAMPA):

S 105 18 ...
P 90 8 153
a: 104 4 ...
s 71 28 ...
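To make the phonetic text concrete, here is a minimal Python sketch of a reader for it. It assumes the usual MBROLA `.pho` conventions: each line holds a phoneme symbol, a duration in milliseconds, and optional pairs of (position in percent of the duration, pitch in Hz), and lines starting with `;` are comments. The function name `parse_pho` and the sample input are hypothetical. The elided lines from the slide are deliberately not filled in.

```python
def parse_pho(text):
    """Parse MBROLA-style phonetic text into (phoneme, duration_ms, targets).

    `targets` is a list of (position_percent, pitch_hz) pairs and may be empty.
    """
    segments = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        phoneme = fields[0]
        duration_ms = float(fields[1])
        numbers = [float(value) for value in fields[2:]]
        if len(numbers) % 2 != 0:
            raise ValueError(f"unpaired pitch target in line: {line!r}")
        targets = list(zip(numbers[0::2], numbers[1::2]))
        segments.append((phoneme, duration_ms, targets))
    return segments


# "P 90 8 153" is the one complete line on the slide; "_ 50" is an assumed pause.
sample = "; hypothetical sample\nP 90 8 153\n_ 50\n"
for phoneme, duration_ms, targets in parse_pho(sample):
    print(phoneme, duration_ms, targets)
```

Summing the durations up to a stressed phoneme gives its onset time, which is the kind of synchronization point the previous slide asks for.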

Nonverbal behaviors

• Generation requires...
  - High-level way of specifying movements
  - Accuracy w.r.t. both spatial and temporal features
  - Reproduction of naturalness, lifelikeness, even subtleties of emotive and individual (personal) expression

→ Computer animation: Illusion of movement by displaying slightly altered pictures in rapid succession

→ Translation of behaviors into positions and orientations of visual objects for each frame

Computer animation

→ Critical issue due to the high complexity of both object and movement in nonverbal behaviors
→ Motion control on different levels of abstraction...

• Direct specification of all motion parameters (e.g., human body > 240 DOFs)

• Abstract description of movement & automatic generation of low-level parameters

→ Control level hierarchies

[Diagram axes: simplicity of motion spec; naturalness of animation]

Computer animation = modeling + motion control + rendering

Representational animations

• The object's representation is subject to the animation

• Soft object animation
  - Animated deformations
  - Facial animation, „cloth animation“, etc.

• Skeletal animation
  - Hierarchical structure of rotational joints connected by rigid links
  - Animation by alteration of joint angles
  - Additional control methods (tissue simulation, cloth animation, etc.) based on underlying kinematic skeleton

Facial Animation

• Requires control hierarchy for deforming the highly complex facial geometry

[Control hierarchy, from high to low level: Action encoding, Face muscle simulation, Vertex displacements]

Action encoding: high-level specification of actions performable on the human face:

  - FACS (Ekman & Friesen, 1978): Visible facial actions (emotional or conversational) described at muscle level in terms of action units

  - MPA (Kalra et al., 1998): Visible features of both facial expressions and visemes (65 MPAs)

Face muscles

• Eleven muscles responsible for facial animation; four major groups: Jaw (A), mouth (B-G), eye (H,I), brow/neck (J,K)

• Fixed mapping from muscle contractions to vertex displacements

• Examples: Levator labii superioris (B), Zygomaticus major (C)

(Fleming & Dobbs, 1999)

Vertex displacement

• Movement generation by interpolating target positions (morphing)
• Targets given by, e.g., a set of muscle contractions or visual phonemes
• Straight, weighted, or segmented morphing

(Fleming & Dobbs, 1999)
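Morphing by interpolating target positions can be sketched as follows. This is one common reading of "straight" and "weighted" morphing, not necessarily the exact formulation in the cited book: straight morphing blends linearly from one target to another, while weighted morphing adds a weighted offset per target to a neutral face. The function names and the one-vertex toy meshes are hypothetical.

```python
def straight_morph(source, target, t):
    """Linearly interpolate every vertex from `source` (t=0) to `target` (t=1)."""
    return [
        tuple(a + t * (b - a) for a, b in zip(p, q))
        for p, q in zip(source, target)
    ]


def weighted_morph(neutral, targets, weights):
    """Add the weighted displacement of each target to the neutral mesh."""
    result = []
    for i, base in enumerate(neutral):
        moved = list(base)
        for target, weight in zip(targets, weights):
            for axis in range(len(base)):
                moved[axis] += weight * (target[i][axis] - base[axis])
        result.append(tuple(moved))
    return result


neutral = [(0.0, 0.0, 0.0)]        # toy mesh with a single vertex
smile = [(1.0, 0.0, 0.0)]          # assumed target shape
open_mouth = [(0.0, 2.0, 0.0)]     # assumed target shape

print(straight_morph(neutral, smile, 0.5))
print(weighted_morph(neutral, [smile, open_mouth], [0.5, 0.5]))
```

Segmented morphing would apply the same blend separately to regions of the face (for example mouth and brows), so that each region can follow its own target.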

Speech animation

• Visual phonemes (visemes): mouth positions representing the sounds we hear in speech

• 16 visual phonemes, but reduced sets may be adequate for lip synch

• Heard „ba“ & seen „ga“ → perceived „da“ (McGurk & MacDonald, 1976)

Speech animation

• Creating lip synch speech
  - Determine phonemes and assign visemes
  - Animate visemes based on articulation of phonemes
  - Coarticulation, e.g., drop phonemes to increase smoothness

• Speech animation + TTS = talking heads
  - Baldi (Massaro et al., 2000)
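The lip synch steps above can be sketched as a small Python function. Everything specific here is an assumption made for illustration: the phoneme-to-viseme table `VISEME_OF` is a toy mapping (not the 16-viseme set mentioned earlier), and the coarticulation rule, folding very short phonemes into the previous viseme and merging identical neighbors, is just one simple way to "drop phonemes to increase smoothness".

```python
# Toy phoneme-to-viseme table; a real system would use a full viseme set.
VISEME_OF = {
    "p": "closed", "b": "closed", "m": "closed",
    "f": "lip-teeth", "v": "lip-teeth",
    "a": "open", "o": "round", "u": "round",
}


def visemes_for(phonemes, min_ms=0):
    """Turn (phoneme, duration_ms) pairs into a (viseme, duration_ms) track.

    Phonemes shorter than `min_ms` are dropped and their time is given to the
    previous viseme; consecutive identical visemes are merged.
    """
    track = []
    for phoneme, duration in phonemes:
        viseme = VISEME_OF.get(phoneme, "rest")
        too_short = duration < min_ms
        if track and (too_short or track[-1][0] == viseme):
            previous_viseme, previous_duration = track[-1]
            track[-1] = (previous_viseme, previous_duration + duration)
        else:
            track.append((viseme, duration))
    return track


print(visemes_for([("m", 60), ("a", 120), ("p", 50), ("b", 40)]))
print(visemes_for([("m", 60), ("t", 20), ("a", 120)], min_ms=30))
```

The phoneme durations would normally come from the TTS component, so the viseme track stays aligned with the audio without extra bookkeeping.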

Skeletal animation

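• Hierarchy of rotational joints connected by rigid links

• Anthropometric modeling, joint limits
• Redundancy (→ DOF problem, IK problem)

→ Various motion control variables (Cartesian, joint angles, elbow swivel, etc.)

[Diagram: forward kinematics (FK) maps joint space R^n to Cartesian space R^3; inverse kinematics (IK) maps R^3 back to R^n]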
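The FK/IK relation and the redundancy problem can be illustrated with a planar two-link arm. This is a deliberately reduced sketch (two joints in 2D rather than a full skeleton in 3D); the function names, link lengths, and the `elbow` switch are assumptions for the example. Even in this small case IK has two solutions for most reachable targets, which is the redundancy the slide refers to.

```python
import math


def fk(theta1, theta2, l1=1.0, l2=1.0):
    """Forward kinematics: joint angles (radians) -> end-effector position."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def ik(x, y, l1=1.0, l2=1.0, elbow=1):
    """Inverse kinematics: position -> joint angles.

    `elbow` is +1 or -1 and selects one of the two mirror-image solutions.
    """
    cos_theta2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= cos_theta2 <= 1.0:
        raise ValueError("target is out of reach")
    theta2 = elbow * math.acos(cos_theta2)
    theta1 = math.atan2(y, x) - math.atan2(
        l2 * math.sin(theta2), l1 + l2 * math.cos(theta2)
    )
    return theta1, theta2


for elbow in (1, -1):
    angles = ik(1.2, 0.5, elbow=elbow)
    print(elbow, angles, fk(*angles))
```

Both angle pairs reach the same hand position. Extra control variables such as the elbow swivel mentioned above are one way to choose among such solutions.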

Keyframing

• Parametric keyframing: Automatic generation of intermediate frames for a given set of keyframes by interpolating joint angles

• Quality of movements depends on number of keyframes

• Still tedious work to define keyframes in low-level control parameters
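Parametric keyframing can be sketched as interpolation over a sorted list of keyframes. The example below assumes linear interpolation and a hypothetical data layout (a list of `(time, {joint_name: angle})` pairs); production systems typically use splines for smoother motion, and the joint names here are made up.

```python
from bisect import bisect_right


def interpolate_pose(keyframes, t):
    """Return joint angles at time `t` by linear interpolation.

    `keyframes` is a list of (time, {joint: angle}) pairs sorted by time.
    Times outside the keyframe range are clamped to the first or last pose.
    """
    times = [time for time, _ in keyframes]
    if t <= times[0]:
        return dict(keyframes[0][1])
    if t >= times[-1]:
        return dict(keyframes[-1][1])
    i = bisect_right(times, t)            # index of the first keyframe after t
    (t0, pose0), (t1, pose1) = keyframes[i - 1], keyframes[i]
    u = (t - t0) / (t1 - t0)              # normalized position between the keys
    return {joint: pose0[joint] + u * (pose1[joint] - pose0[joint])
            for joint in pose0}


keys = [
    (0.0, {"shoulder": 0.0, "elbow": 0.0}),
    (1.0, {"shoulder": 40.0, "elbow": 90.0}),
]
print(interpolate_pose(keys, 0.25))
```

With only two keys the elbow moves at constant speed, which tends to look mechanical. This is the sense in which the quality of movement depends on the number of keyframes.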

Performance animation

• Motion capture: Measuring and recording direct movements of an actor for immediate or delayed analysis and playback

• Capture data and map to digital character
  - Mechanical: joystick, mouse, data gloves, etc.
  - Optical: at least two cameras, reflecting markers
  - Electromagnetic: sensors for tracking keypoints

• High degree of naturalness, but lack of generality & flexibility

Procedural animation

• Motion algorithmically described; calculation of control parameters for a given point in time

• Physics-based animation
  - Non-constraint (Newton, Lagrange, etc.) vs. constraint-based methods (constraint forces, spacetime constraints)
  - Forward & inverse dynamics
  - Generation of secondary movements

• Model-based animations
  - Detailed knowledge about targeted movement
  - Frequently applied for locomotion
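As a small illustration of "calculation of control parameters for a given point in time", the sketch below describes an arm swing during walking as a closed-form function of time. The sinusoidal model, the function name `arm_swing_angle`, and all parameter values are assumptions chosen for the example, not a model from the slides.

```python
import math


def arm_swing_angle(t, amplitude_deg=20.0, step_hz=1.0, phase=0.0):
    """Shoulder swing angle in degrees at time `t` (seconds).

    No keyframes are stored: the control parameter is computed directly
    from the time, which is the defining trait of procedural animation.
    """
    return amplitude_deg * math.sin(2.0 * math.pi * step_hz * t + phase)


# The two arms swing in opposite phase during walking.
for t in (0.0, 0.25, 0.5):
    left = arm_swing_angle(t)
    right = arm_swing_angle(t, phase=math.pi)
    print(t, left, right)
```

Changing `step_hz` or `amplitude_deg` adapts the same motion to a different walking speed, which is the flexibility that captured motion lacks.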

Real-time requirements

Frame-by-frame vs. real-time animation (Magnenat-Thalmann & Thalmann, 1998)

• Surface modeling
  - Frame-by-frame: No limitations on complexity
  - Real-time: Limitations on the number of polygons

• Deformations
  - Frame-by-frame: May be calculated using metaballs, FFD, splines
  - Real-time: Requires fast transformations, e.g., based on cross-sections

• Skeletal animation
  - Frame-by-frame: Any method may be used
  - Real-time: Real-time processing may prevent using expensive methods based on inverse dynamics or control theory

• Locomotion
  - Frame-by-frame: Any model/method may be used: motion capture, kinematics, dynamics, biomechanics
  - Real-time: Dynamic models may be too CPU intensive

• Facial animation
  - Frame-by-frame: Complex models may be used, including muscles with finite elements
  - Real-time: Simplified models should be used; limitations on the facial deformations

• Clothes
  - Frame-by-frame: Calculated using mechanical models
  - Real-time: Texture mapping

• Skin
  - Frame-by-frame: Model with wrinkles
  - Real-time: Texture mapping

• Hair
  - Frame-by-frame: Individual hairs possible
  - Real-time: Only a polygonal shape with possible texture may be applied

Gesture animation

• Flexibility, accuracy, and naturalness!

• Two approaches to skeleton motion control:
  - Motion drawn from a database of predefined motions
  - Motion dynamically calculated on demand

• Integration of several motion generators vital for designing complex motions!
  - hand vs. arm movement
  - gesture stroke vs. retraction
  - emblematic vs. iconic gestures

• In terms of Laban Movement Analysis: „Gestures [...] exist because they have some distinctiveness in their Effort and Shape parameter.“ (Costa et al., 2000)

Gesture animation

• Start from high-level, parametrizable gesture representations
  - Script-based animations, e.g., PaT-Nets (Badler et al., 1993)
  - Feature-based descriptions based on some gesture/movement notation system (Calvert et al., 1982; Lebourque & Gibet, 1999; Kopp & Wachsmuth, 2000)

Trajectory formation...

...and modulation

Tomorrow...

I. Introduction
  - Motivation, history, recent developments
  - Evaluation

II. Bases of multimodal communication
  - Channels and functions of multimodal communication
  - Synthetic communicative behaviors, e.g., facial & gestural animation, speech synthesis

III. Modeling conversational behavior
  - Underlying models & architecture
  - Top-down vs. bottom-up
  - Outlook & discussion

• Questions? Otherwise....