LIMSI-CNRS WP5 - Belfast September, 2004
Multimodal Annotation of Emotions in TV Interviews
S. Abrilian, L. Devillers, J.C Martin, S. Buisine
LIMSI – CNRS, France
HUMAINE - WP5 Summer school - Belfast
Content
• Emotion pervades human communication: feelings are conveyed in faces, voices, and gestures, and people judge others by the way they respond to those signals.
• Ongoing research on:
– modeling the relations between emotion and multimodal behavior
– blended and subtle real-life emotions
– detection / synthesis (ECAs)
• EmoTV exploratory corpus
– Annotations: segmentation, emotion, multimodal
• Difficult issues
• Hands-on session
EmoTV-1 Corpus
51 video clips recorded from French TV channels. The interviews cover a range of topics: politics, sport, law, religion, etc.
# interviewees: 48
# topics: 24
# total duration: 12 min
# total words: ~2500
# emotion segments: 281
# clip duration: min 4 s, max 43 s
# distinct words: ~800
Video selection criteria
• TV interviews
• Realistic situation
• Presence of emotion (full-blown / subtle / blended / …)
• Speaker face and upper body (close shot)
• Multimodal cues: speech, gesture, gaze…
• French language
• Focus on one person (unknown person)
• Audio quality
Annotation Protocol with natural corpora
• Designing an annotation scheme is difficult, even more so for blended / subtle / masked / sequential emotions
• Emotion and multimodal annotation proceeds in iterations:
– defining-categories phase
– annotating phase
– validating phase, with inter-annotator agreement, perceptual tests and statistical analysis
Emotion labelling phase
• Emotion segmentation: different strategies
• Emotion labelling with:
– abstract dimensions: valence / activation (Cowie 2001), on a 7-level scale
– category labels, a pragmatic decision:
  - converging on a smaller set of basic categories
  - combining those categories in order to define complex emotions, following the « palette theory » (Cowie 2001) and the « Plutchik wheel » (Plutchik 1980)
  - examples: Disappointment = sadness + surprise; Contempt = anger + disgust
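The palette idea can be sketched as a lookup from a complex label to the pair of basic categories it combines. This is a minimal illustration, not the LIMSI annotation tooling: the table holds only the two combinations given above, and the names COMPLEX_EMOTIONS, decompose and compose are hypothetical.

```python
from typing import Optional

# Hypothetical table: only the two combinations given on the slide.
COMPLEX_EMOTIONS = {
    "disappointment": frozenset({"sadness", "surprise"}),
    "contempt": frozenset({"anger", "disgust"}),
}


def decompose(label: str) -> frozenset:
    """Return the basic categories that make up a complex label.

    A label missing from the table is treated as basic and returned alone.
    """
    return COMPLEX_EMOTIONS.get(label, frozenset({label}))


def compose(first: str, second: str) -> Optional[str]:
    """Return the complex label made of first + second (order-insensitive), or None."""
    pair = frozenset({first, second})
    for name, components in COMPLEX_EMOTIONS.items():
        if components == pair:
            return name
    return None
```

For example, compose("surprise", "sadness") returns "disappointment", and decompose("joy") returns frozenset({"joy"}). Using frozensets makes the combination order-insensitive, matching the "+" notation on the slide.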
Emotion labelling: choice of labels
• Annotation protocol: 2 annotators, free choice of labels, then elaboration of emotion categories
• 176 labels reduced to a preliminary list of 18 emotion categories:
anger, despair, disappointment, disgust, doubt, embarrassment, exaltation, fear, irritation, joy, neutral, pleased, pride, sadness, serenity, shame, surprise, worry
• Example of emotion annotation scheme (ANVIL): each segment is labelled with
– Primary label: sadness
– Secondary label (optional): disgust
– Valence: 2 (negative)
– Intensity: 6 (very high)
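A segment annotation of this kind maps naturally onto a small record type. The sketch below is hypothetical and is not the ANVIL file format: it assumes the 7-level scales are coded as integers 1 to 7, adds illustrative start/end times in seconds, and rejects labels outside the 18 preliminary categories.

```python
from dataclasses import dataclass
from typing import Optional

# The 18 preliminary categories listed above.
EMOTION_CATEGORIES = frozenset({
    "anger", "despair", "disappointment", "disgust", "doubt", "embarrassment",
    "exaltation", "fear", "irritation", "joy", "neutral", "pleased", "pride",
    "sadness", "serenity", "shame", "surprise", "worry",
})

SCALE_MIN, SCALE_MAX = 1, 7  # assumed integer coding of the 7-level scales


@dataclass(frozen=True)
class EmotionSegment:
    start_s: float            # hypothetical field: segment start, in seconds
    end_s: float              # hypothetical field: segment end, in seconds
    primary: str
    secondary: Optional[str]  # None when no secondary label is given
    valence: int
    intensity: int

    def __post_init__(self):
        if not 0 <= self.start_s < self.end_s:
            raise ValueError("segment must have 0 <= start < end")
        for label in (self.primary, self.secondary):
            if label is not None and label not in EMOTION_CATEGORIES:
                raise ValueError(f"unknown emotion label: {label!r}")
        if self.secondary == self.primary:
            raise ValueError("secondary label must differ from primary")
        for name in ("valence", "intensity"):
            value = getattr(self, name)
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(f"{name} must be in {SCALE_MIN}..{SCALE_MAX}")
```

With illustrative times, EmotionSegment(0.0, 4.2, "sadness", "disgust", 2, 6) encodes the example above, while an intensity of 8 or an unlisted label raises ValueError.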
Annotation of multimodal behavior
• Multimodal corpora and tools
– LREC 2002 & 2004 workshops
– I. Poggi coding scheme
• Annotation of multimodal behavior
– McNeill 1982, Kipp 2004
• Emotion and multimodal behavior
– Emotional expression: Collier 1985
– Facial expression: Ekman 2003, Pandzic 2002
– Expressivity of gesture: Pelachaud 2004
Coding scheme design
• Requirements
– « Fast » annotation of all modalities by a single annotator
– Specific requirements for the multimodal coding scheme for TV interviews
• Coding scheme design
– Behavior observed in the videos
– Behavior suggested by the literature (prototypical emotions)
Audio tracks
• Required
– Prosodic cues: rhythm (speech rate), melody (F0), energy, voice quality
– Non-verbal events: laughter, cry, throat clearing, …
• Tracks
– Energy
– Transliteration: French / English
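An energy track is typically derived from the waveform frame by frame. The sketch below computes root-mean-square (RMS) energy per frame from a list of mono samples. The frame length and hop size (25 ms frames every 10 ms, assuming 16 kHz audio) are common defaults assumed here, not values taken from the slides, and rms_energy is a hypothetical name.

```python
import math
from typing import List, Sequence


def rms_energy(samples: Sequence[float], frame_len: int = 400,
               hop: int = 160) -> List[float]:
    """Frame-level RMS energy of a mono signal.

    Defaults assume 16 kHz audio: 400 samples = 25 ms, 160 samples = 10 ms.
    Only complete frames are used; a signal shorter than one frame gives [].
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Mean of squared samples, then square root.
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies
```

A constant signal of 800 samples at 1.0 yields three complete frames, each with RMS 1.0. A log (dB) scale is often applied afterwards for display; that step is left out here.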
Posture group
• Pose track
– Body orientation: up, down, left, right, front, back, packed, seat
• Shift track
– Activity: whole body, upper body, legs
– Speed: fast, moderate, slow
– Action: walk, jump, duck, run, stand, sit, turn over, back, move back, come closer
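One way to keep single-annotator coding consistent is to check each posture annotation against the closed vocabularies of the pose and shift tracks. A minimal sketch, assuming annotations are stored as attribute-to-value dictionaries; POSTURE_TRACKS and invalid_attributes are hypothetical names, and the value lists are copied from the slide.

```python
from typing import Dict, List, Tuple

# Closed vocabularies of the posture group, copied from the slide.
POSTURE_TRACKS = {
    "pose": {
        "orientation": {"up", "down", "left", "right", "front", "back",
                        "packed", "seat"},
    },
    "shift": {
        "activity": {"whole body", "upper body", "legs"},
        "speed": {"fast", "moderate", "slow"},
        "action": {"walk", "jump", "duck", "run", "stand", "sit",
                   "turn over", "back", "move back", "come closer"},
    },
}


def invalid_attributes(track: str,
                       annotation: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (attribute, value) pairs not allowed by the track's vocabulary.

    Unknown attributes are reported too; missing attributes are not checked.
    """
    schema = POSTURE_TRACKS[track]
    errors = []
    for attribute, value in annotation.items():
        allowed = schema.get(attribute)
        if allowed is None or value not in allowed:
            errors.append((attribute, value))
    return errors
```

An empty result means the annotation uses only values from the scheme; for example, invalid_attributes("shift", {"speed": "very fast"}) reports [("speed", "very fast")].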
Communicative gesture classes
• Several typologies (Efron 1941, Ekman & Friesen 1969, McNeill 1992, Kipp 2004)
• Gesture taxonomy (diagram):
hand/arm movement
– non-communicative: adaptor
– communicative: emblem, deictic, illustrative
  - illustrative: iconic, metaphoric, beat
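The gesture taxonomy can be stored as a tree so that an annotated class can be traced back to its parent categories. This sketch assumes one reading of the diagram: adaptors are the non-communicative class; emblems, deictics and illustratives are communicative; and iconic, metaphoric and beat gestures are kinds of illustrative. GESTURE_TAXONOMY, path_to and is_communicative are hypothetical names.

```python
from typing import List, Optional

# Assumed hierarchy: each node maps to its children; leaves map to {}.
GESTURE_TAXONOMY = {
    "hand/arm movement": {
        "non-communicative": {"adaptor": {}},
        "communicative": {
            "emblem": {},
            "deictic": {},
            "illustrative": {"iconic": {}, "metaphoric": {}, "beat": {}},
        },
    },
}


def path_to(label: str, tree: dict = GESTURE_TAXONOMY) -> Optional[List[str]]:
    """Depth-first search for label; return the root-to-label path, or None."""
    for node, children in tree.items():
        if node == label:
            return [node]
        sub = path_to(label, children)
        if sub is not None:
            return [node] + sub
    return None


def is_communicative(label: str) -> bool:
    """True if label sits under the 'communicative' branch (exact node match)."""
    path = path_to(label)
    return path is not None and "communicative" in path
```

With this tree, path_to("beat") returns ["hand/arm movement", "communicative", "illustrative", "beat"], and is_communicative("adaptor") is False. Classes outside the diagram, such as free gesticulation, return None and would need their own node.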
Which gesture classes for emotional ecological corpora?
• Criteria: corpus + ease of annotation
– Adaptors
– Beats
– Gesticulation: free form, spontaneous
– Deictics, emblems, iconics, metaphorics
Phase gesture group (Kipp 1991)
• Phase
– Type (Kendon and McNeill 1992): preparation, stroke, beats, hold, retract
– Speed: fast, moderate, slow
– Energy: high, normal, low
– Handedness
– Spatial region: up, head, chest, down, periphery
– Hand shape: open, closed
– Direction: horizontal, vertical
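The attributes of the phase group can be encoded as enumerations, so that a value outside the scheme is rejected when a record is built. A hedged sketch: the slide lists the handedness attribute without its values, so left/right/both is an assumption, as are the timing fields and all class names.

```python
from dataclasses import dataclass
from enum import Enum


class PhaseType(Enum):
    PREPARATION = "preparation"
    STROKE = "stroke"
    BEATS = "beats"
    HOLD = "hold"
    RETRACT = "retract"


class Speed(Enum):
    FAST = "fast"
    MODERATE = "moderate"
    SLOW = "slow"


class Energy(Enum):
    HIGH = "high"
    NORMAL = "normal"
    LOW = "low"


class Handedness(Enum):  # assumed values: the slide names the attribute only
    LEFT = "left"
    RIGHT = "right"
    BOTH = "both"


class Region(Enum):
    UP = "up"
    HEAD = "head"
    CHEST = "chest"
    DOWN = "down"
    PERIPHERY = "periphery"


class HandShape(Enum):
    OPEN = "open"
    CLOSED = "closed"


class Direction(Enum):
    HORIZONTAL = "horizontal"
    VERTICAL = "vertical"


@dataclass(frozen=True)
class GesturePhase:
    start_s: float  # hypothetical timing fields, in seconds
    end_s: float
    phase: PhaseType
    speed: Speed
    energy: Energy
    handedness: Handedness
    region: Region
    shape: HandShape
    direction: Direction

    def __post_init__(self):
        if not 0 <= self.start_s < self.end_s:
            raise ValueError("phase must have 0 <= start < end")

    @classmethod
    def from_strings(cls, start_s: float, end_s: float, phase: str, speed: str,
                     energy: str, handedness: str, region: str, shape: str,
                     direction: str) -> "GesturePhase":
        # Calling an Enum with a value looks the member up by value and
        # raises ValueError when the value is not in the vocabulary.
        return cls(start_s, end_s, PhaseType(phase), Speed(speed),
                   Energy(energy), Handedness(handedness), Region(region),
                   HandShape(shape), Direction(direction))
```

Enumerations keep the vocabulary in one place; a typo such as "strok" fails at load time rather than surfacing later in the statistics.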
Facial expressions
• MPEG-4 Facial Animation Parameters (FAPs)
• FACS Action Units
– Chin, Lids, Brows, Cheeks, Head, Lips, Nose, Mouth, Eyes
Problematic issues
• Time consuming
• Subjectivity in the segmentation and annotation of emotion labels and of some modalities
• Separating emotionally significant events from non-emotional ones
• Requires expertise in annotating all modalities (gesture type, FAPs)
• Limitations of TV samples: image resolution, mostly upper body, external events/objects out of camera scope (which elicit gaze)
• Annotation is corpus-dependent (e.g. gesture speed)…
• But natural corpora enable the exploration of complex natural emotions
Next steps
• Corpus annotations / analysis of results
– Inter-annotator agreement
– Signal/transcription alignment
– Statistics: relation between multimodal behavior and emotion
• Improve the multimodal coding scheme
• Update the coders' documentation
Results will be presented at the WP5 workshop.
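Agreement between the two annotators on categorical labels can be quantified with Cohen's kappa, which corrects raw agreement for chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected from each annotator's label frequencies. The slides do not say which agreement measure is used; the function below is a generic sketch of Cohen's kappa for two annotators labelling the same segments, and cohen_kappa is a hypothetical name.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items in order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined (0/0), and 1.0 is returned here by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

For example, cohen_kappa(["joy", "joy", "anger", "anger"], ["joy", "anger", "anger", "anger"]) returns 0.5: observed agreement is 0.75 and chance agreement is 0.5. With blended emotions and secondary labels, a weighted or multi-label measure may be more appropriate than plain kappa on the primary label.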
Future direction
• Typology of natural complex multimodal emotions
• Unsupervised classification of multimodal annotations
• Correlation of multimodal annotations and emotions
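A first step toward correlating multimodal annotations with emotions is to count which behavior labels co-occur in time with which emotion labels. The sketch below builds such a contingency table from time-stamped intervals. It assumes half-open [start, end) intervals in seconds and counts every overlapping pair once. The names are hypothetical, and a statistical test on the resulting table (for example chi-square) would be a further step not shown here.

```python
from collections import Counter
from typing import Iterable, Tuple

Interval = Tuple[float, float, str]  # (start_s, end_s, label)


def overlaps(a: Interval, b: Interval) -> bool:
    """True when the half-open intervals [start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def cooccurrence(emotions: Iterable[Interval],
                 behaviours: Iterable[Interval]) -> Counter:
    """Count (emotion label, behaviour label) pairs whose intervals overlap."""
    behaviours = list(behaviours)  # iterated once per emotion segment
    counts = Counter()
    for emo in emotions:
        for beh in behaviours:
            if overlaps(emo, beh):
                counts[(emo[2], beh[2])] += 1
    return counts
```

A gesture that straddles a boundary between two emotion segments is counted for both, which is one simple design choice among several (another would weight each pair by overlap duration).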