LIMSI-CNRS WP5 - Belfast September, 2004

Multimodal Annotation of Emotions in TV Interviews

S. Abrilian, L. Devillers, J.C Martin, S. Buisine

LIMSI – CNRS, France

HUMAINE - WP5 Summer school - Belfast

Content

• Emotion pervades human communication
  – feelings are conveyed in faces, voices, and gestures, and people judge others by the way they respond to those signals

• On-going research on
  – modeling relations between emotion and multimodal behavior
  – blended and subtle real-life emotions
  – detection / synthesis (ECAs)

• EmoTV exploratory corpus
  – Annotations: segmentation, emotion, multimodal

• Difficult issues

• Hands-on session

EmoTV-1 Corpus

51 video clips recorded from French TV channels. The interviews cover a range of topics: politics, sport, law, religion, etc.

Number of interviewees: 48
Number of topics: 24
Total duration: 12 min
Total number of words: ~2500
Number of emotional segments: 281
Min: 4 s, Max: 43 s
Number of distinct words: ~800

Video selection criteria

• TV interviews
• Realistic situation
• Presence of emotion (full-blown / subtle / blended / …)
• Speaker's face and upper body (close shot)
• Multimodal cues: speech, gesture, gaze, …
• French language
• Focus on one person (unknown person)
• Audio quality

Annotation Protocol with natural corpora

• Designing an annotation scheme is difficult, even more so for blended / subtle / masked / sequential emotions

• Emotion and multimodal annotation proceeds in iterations:
  – phase of defining categories
  – annotation phase
  – validation phase with inter-annotator agreement, perceptual tests and statistical analysis
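The validation phase mentions inter-annotator agreement without naming a measure. As a minimal sketch, assuming Cohen's kappa is an acceptable choice for two annotators assigning one category label per segment (the slides do not say which statistic was used), agreement could be computed as follows. The label lists are hypothetical illustration data, not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same segments."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Proportion of segments on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six segments (illustration only).
annotator_1 = ["anger", "anger", "sadness", "joy", "neutral", "anger"]
annotator_2 = ["anger", "irritation", "sadness", "joy", "neutral", "anger"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Kappa corrects raw agreement for chance, which matters when a few labels dominate the annotations.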

Emotion labelling phase

• Emotion segmentation: different strategies

• Emotion labelling with:
  – abstract dimensions: valence / activation (Cowie 2001), on a scale of 7 levels
  – category labels

• Pragmatic decision:
  – converging on a smaller set of basic categories
  – combining those categories in order to define complex emotions
  – « palette theory » (Cowie 2001), « Plutchik wheel » (Plutchik 1980)

• Example:
  – Disappointment = sadness + surprise
  – Contempt = anger + disgust
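The idea of defining complex emotions as combinations of basic categories can be sketched as a lookup over unordered label pairs. This is an illustrative sketch only: the table holds just the two combinations given in the example above, and the names `BLENDS` and `complex_emotion` are hypothetical.

```python
# Unordered pairs of basic categories mapped to a complex emotion.
# Only the two combinations from the example are included.
BLENDS = {
    frozenset({"sadness", "surprise"}): "disappointment",
    frozenset({"anger", "disgust"}): "contempt",
}


def complex_emotion(first, second):
    """Return the complex label for two basic labels, or None if undefined."""
    return BLENDS.get(frozenset({first, second}))


blend = complex_emotion("surprise", "sadness")
```

Using `frozenset` keys makes the lookup independent of the order in which the two basic labels are given.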

Emotion labelling: choice of labels

• Annotation protocol: 2 annotators, free choice, then elaboration of emotion categories

• 176 labels -> preliminary list of 18 emotion categories:
  anger, despair, disappointment, disgust, doubt, embarrassment, exaltation, fear, irritation, joy, neutral, pleased, pride, sadness, serenity, shame, surprise, worry

• Example of emotion annotation scheme (ANVIL). Each segment is labelled with:
  – Primary label: sadness
  – Secondary label (optional): disgust
  – Valence: 2 (negative)
  – Intensity: 6 (very high)
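The per-segment annotation above can be sketched as a small validated record. This is not the ANVIL file format; it only mirrors the four fields of the example. The class name, the time fields, and the assumption that both scales run from 1 to 7 are hypothetical (the slides give 7 levels but not the end points).

```python
from dataclasses import dataclass
from typing import Optional

# The preliminary list of 18 emotion categories.
CATEGORIES = frozenset({
    "anger", "despair", "disappointment", "disgust", "doubt",
    "embarrassment", "exaltation", "fear", "irritation", "joy",
    "neutral", "pleased", "pride", "sadness", "serenity", "shame",
    "surprise", "worry",
})

# Assumption: the 7-level scales are numbered 1..7.
SCALE_MIN, SCALE_MAX = 1, 7


@dataclass(frozen=True)
class EmotionSegment:
    start_s: float            # hypothetical field: segment start, seconds
    end_s: float              # hypothetical field: segment end, seconds
    primary: str
    secondary: Optional[str]  # None when no secondary label is given
    valence: int
    intensity: int

    def __post_init__(self):
        if self.end_s <= self.start_s:
            raise ValueError("segment must have positive duration")
        if self.primary not in CATEGORIES:
            raise ValueError(f"unknown primary label: {self.primary}")
        if self.secondary is not None and self.secondary not in CATEGORIES:
            raise ValueError(f"unknown secondary label: {self.secondary}")
        for name in ("valence", "intensity"):
            value = getattr(self, name)
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(f"{name} out of range: {value}")


# The example segment; the time stamps are made up for illustration.
segment = EmotionSegment(0.0, 4.0, "sadness", "disgust", valence=2, intensity=6)
```

Rejecting unknown labels at construction time keeps free-text variants from slipping back in once the category list has been fixed.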

Annotation of multimodal behavior

• Multimodal corpora and tools
  – LREC 2002 & 2004 workshops
  – I. Poggi coding scheme

• Annotation of multimodal behavior
  – McNeill 1982, Kipp 2004

• Emotion and multimodal behavior
  – Emotional expression: Collier 1985
  – Facial expression: Ekman 2003, Pandzic 2002
  – Expressivity of gesture: Pelachaud 2004

Coding scheme design

• Requirements
  – « Fast » annotation by a single annotator for all modalities
  – Specific requirements for the multimodal (mm) coding scheme for TV interviews

• Coding scheme design
  – Behavior observed in the videos
  – Suggested by the literature (prototypical emotions)

Audio tracks

• Required
  – Prosodic cues: rhythm (speech rate), melody (F0), energy, voice quality
  – Non-verbal events: laughter, cry, throat clearing, …

• Tracks
  – Energy
  – Transliteration: French / English

Posture group

• Pose track
  – Body orientation: up, down, left, right, front, back, packed, seat

• Shift track
  – Activity: whole body, upper body, legs
  – Speed: fast, moderate, slow
  – Action: walk, jump, duck, run, stand, sit, turn over, back, move back, come closer

Communicative gesture classes

• Several typologies (Efron 41, Ekman & Friesen 69, McNeill 92, Kipp 04)

[Diagram: tree of hand/arm movement classes with the nodes non-communicative, adaptor, communicative, emblem, deictic, illustrative, iconic, metaphoric, beat]

Which gesture classes for emotional ecological corpora?

• Criteria: corpus + ease of annotation
  – Adaptor
  – Beats
  – Gesticulation: free form, spontaneous
  – Deictics, emblems, iconics, metaphorics

Phase gesture group (Kipp 1991)

• Phase

– Type (Kendon and McNeill 1992): preparation, stroke, beats, hold, retract

– Speed: fast, moderate, slow

– Energy: high, normal, low

– Handedness

– Spatial region: up, head, chest, down, periphery

– Hand shape: open, closed

– Direction: horizontal, vertical
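The phase attributes listed above form a closed vocabulary, which makes annotations easy to check automatically. The following is a minimal sketch, assuming each annotation is a plain attribute-to-value mapping; the attribute keys and the function name are hypothetical. Handedness is left out because the slide does not list its values.

```python
# Allowed values per attribute, copied from the phase gesture group.
# Handedness is omitted: its values are not given on the slide.
PHASE_SCHEME = {
    "type": {"preparation", "stroke", "beats", "hold", "retract"},
    "speed": {"fast", "moderate", "slow"},
    "energy": {"high", "normal", "low"},
    "spatial_region": {"up", "head", "chest", "down", "periphery"},
    "hand_shape": {"open", "closed"},
    "direction": {"horizontal", "vertical"},
}


def validate_phase(annotation):
    """Return a list of problems in one phase annotation (empty if valid)."""
    problems = []
    for attribute, value in annotation.items():
        allowed = PHASE_SCHEME.get(attribute)
        if allowed is None:
            problems.append(f"unknown attribute: {attribute}")
        elif value not in allowed:
            problems.append(f"{attribute}: {value!r} not in {sorted(allowed)}")
    return problems


ok = validate_phase({"type": "stroke", "speed": "fast", "hand_shape": "open"})
bad = validate_phase({"type": "swing", "colour": "red"})
```

A check of this kind is cheap to run over a whole corpus and catches typing errors before agreement statistics are computed.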

Facial expressions

• MPEG-4 Facial Animation Parameters

• FACS Action units
  – Chin, Lids, Brows, Cheeks, Head, Lips, Nose, Mouth, Eyes

Problematic issues

• Time consuming
• Subjectivity in the segmentation and annotation of emotion labels and some modalities
• Separating emotionally significant events from non-emotional ones
• Requires expertise in annotating all modalities (gesture type, FAPs)
• Limitations of TV samples: image resolution, mostly upper body, external events/objects out of camera scope (which elicit gaze)
• Annotation is corpus-dependent (e.g. gesture speed), …
• But it enables the exploration of complex natural emotions

Next steps

• Corpus annotations / analysis of results
  – Inter-annotator agreement
  – Signal/transcription alignment
  – Statistics: relation between mm behavior and emotion

• Improve multimodal coding scheme

• Update of coders' documentation

Results will be presented at WP5 WS

Future direction

• Typology of natural mm complex emotion

• Unsupervised classification of mm annotations

• Correlation of mm annotations and emotions
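A possible first step toward relating multimodal annotations to emotions is to count which gesture classes overlap in time with which emotion segments. The sketch below is an assumption about how such a table might be built, not the authors' method. Segments are represented as hypothetical `(start, end, label)` tuples in seconds, and the example data is invented.

```python
from collections import Counter


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def cooccurrence(emotion_segments, gesture_segments):
    """Count (emotion, gesture) pairs whose time intervals overlap."""
    counts = Counter()
    for e_start, e_end, emotion in emotion_segments:
        for g_start, g_end, gesture in gesture_segments:
            if overlap(e_start, e_end, g_start, g_end) > 0:
                counts[(emotion, gesture)] += 1
    return counts


# Invented annotations for one clip (illustration only).
emotions = [(0.0, 5.0, "anger"), (5.0, 9.0, "sadness")]
gestures = [(1.0, 2.0, "beat"), (4.5, 6.0, "adaptor"), (7.0, 8.0, "beat")]
table = cooccurrence(emotions, gestures)
```

The resulting counts form a contingency table on which an association test or a clustering step could then be run.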

Summer school protocol

• Annotate emotion and multimodal behavior on videos (individual and collective steps)

• Provided
  – Anvil short documentation (Kipp 1991)
  – Coding scheme
  – Emotional segments
  – Speech transcription: French, English