
Page 1: Acoustic Cues to Emotional Speech

Acoustic Cues to Emotional Speech

Julia Hirschberg

(joint work with Jennifer Venditti and Jackson Liscombe)

Columbia University

26 June 2003

Page 2: Acoustic Cues to Emotional Speech

Motivation

• A speaker’s emotional state conveys important and potentially useful information
  – To recognize (e.g. spoken dialogue systems, tutoring systems)
  – To generate (e.g. games)
  – If we know what emotion is and what aspects of production convey different types

• Defining emotion in multidimensional space (sketched below)
  – Valence: happy vs. sad
  – Activation: sad vs. despairing
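To make the two dimensions concrete, here is a minimal Python sketch that places a few emotions in a valence/activation plane; the coordinates are illustrative assumptions, not values from the talk.

```python
# A minimal sketch of the valence/activation view of emotion.
# Coordinates are illustrative placements, roughly in [-1, 1].
EMOTION_SPACE = {
    #             (valence, activation)
    "happy":      (+0.8, +0.5),
    "sad":        (-0.6, -0.4),
    "despairing": (-0.8, +0.6),  # like sad in valence, much higher activation
    "bored":      (-0.3, -0.8),
    "angry":      (-0.7, +0.8),
}

def contrast(a: str, b: str) -> str:
    """Report which dimension separates two emotions more."""
    (va, aa), (vb, ab) = EMOTION_SPACE[a], EMOTION_SPACE[b]
    dim = "valence" if abs(va - vb) > abs(aa - ab) else "activation"
    return f"{a} vs. {b}: mainly differ in {dim}"

print(contrast("happy", "sad"))       # a valence contrast
print(contrast("sad", "despairing"))  # an activation contrast
```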

Page 3: Acoustic Cues to Emotional Speech

• Features that might convey emotion
  – Acoustic and prosodic
  – Lexical and syntactic
  – Facial and gestural

Page 4: Acoustic Cues to Emotional Speech

Previous Research

• Emotion detection in corpus studies
  – Batliner, Noeth, et al.; Ang et al.: anger/frustration in dialogue systems
  – Lee et al.: positive/negative emotion in call center data
  – Ringel & Hirschberg: voicemail

• … in laboratory studies
  – Forced choice among 10-12 emotion categories
  – Sometimes with confidence rating

Page 5: Acoustic Cues to Emotional Speech

Problems

• Hard to identify emotions reliably
  – Variation in ‘emotional’ utterances: production and perception
  – How can we obtain better training data?

• Easier to detect variation in activation than in valence
  – Variation in ‘emotional’ utterances
  – Large space of potential features
  – Which are necessary and sufficient?

Page 6: Acoustic Cues to Emotional Speech

New methods for eliciting judgments

• Hypothesis: Utterances in natural speech may evoke multiple emotions
  – Elicit judgments on multiple scales
  – Tokens from LDC Emotional Prosody Speech and Transcripts Corpus
    • Professional actors reading 4-syllable dates and numbers
    • disgust, panic, anxiety, hot anger, cold anger, despair, sadness, elation, happiness, interest, boredom, shame, pride, contempt, neutrality

Page 7: Acoustic Cues to Emotional Speech

• Modified category set:
  – Positive: confident, encouraging, friendly, happy, interested
  – Negative: angry, anxious, bored, frustrated, sad
  – Neutral

• For study: 1 token of each from each of 4 voices plus practice tokens

• Subjects participated over the internet

Page 8: Acoustic Cues to Emotional Speech

– 40 native speakers of standard American English with no reported hearing impairment

– 17 female, 23 male, all 18+
– 4 random orders rotated among subjects

Page 9: Acoustic Cues to Emotional Speech

Correlations between Judgments

             ang   bor   fru   anx   fri   con   hap   int   enc
sad          .06   .44   .26   .22  -.27  -.32  -.42  -.32  -.33
angry              .05   .70   .21  -.41   .02   .37  -.09  -.32
bored                    .14  -.14  -.28  -.17  -.32  -.42  -.27
frustrated                     .32  -.43  -.09  -.47  -.16  -.39
anxious                             -.14  -.25  -.17   .07  -.14
friendly                                   .44   .77   .59   .75
confident                                        .45   .51   .53
happy                                                  .58   .73
interested                                                   .62
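A table like this can be reproduced from raw judgments in a few lines of pandas. The sketch below assumes one row per (subject, token) with one column per emotion scale; random data stands in for the real ratings, which are not available here.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: one row per (subject, token), one column per emotion
# scale holding that listener's rating. Random integers stand in for the
# real judgments.
rng = np.random.default_rng(0)
scales = ["sad", "angry", "bored", "frustrated", "anxious",
          "friendly", "confident", "happy", "interested", "encouraging"]
ratings = pd.DataFrame(rng.integers(1, 6, size=(1760, len(scales))),
                       columns=scales)

# Pearson correlation between every pair of scales, as in the table above.
corr = ratings[scales].corr(method="pearson")

# Keep only the strict upper triangle, mirroring the slide's layout.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
print(corr.where(mask).round(2))
```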

Page 10: Acoustic Cues to Emotional Speech

What acoustic features correlate with which emotion categories?

– F0: min, max, mean, ‘range’, stdev
– RMS: min, max, mean, range, stdev
– Voiced samples/all samples (VCD)
– Mean syllable length
– TILT: spectral tilt (2nd minus 1st harmonic over a 30 ms window) of the highest-amplitude vowel and of the nuclear-stressed vowel
– Type of nuclear accent, contour, phrasal ending
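As a rough illustration, the global F0, RMS, and voicing features could be computed with librosa's pYIN tracker as below. The talk does not name a toolkit, the file name is hypothetical, and the spectral-tilt, syllable-length, and accent-type features are omitted from this sketch.

```python
import numpy as np
import librosa

y, sr = librosa.load("token.wav", sr=None)  # hypothetical token file

# F0 track via pYIN; NaN frames are unvoiced.
f0, voiced_flag, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)
f0 = f0[~np.isnan(f0)]

# Frame-level RMS energy.
rms = librosa.feature.rms(y=y)[0]

features = {
    "f0_min": f0.min(), "f0_max": f0.max(), "f0_mean": f0.mean(),
    "f0_range": f0.max() - f0.min(), "f0_stdev": f0.std(),
    "rms_min": rms.min(), "rms_max": rms.max(), "rms_mean": rms.mean(),
    "rms_range": rms.max() - rms.min(), "rms_stdev": rms.std(),
    "vcd": float(np.mean(voiced_flag)),  # fraction of voiced frames
}
print(features)
```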

Page 11: Acoustic Cues to Emotional Speech

Results

• F0, RMS, and rate distinguish emotion categories by activation (act)
  – +act correlates with higher F0 and RMS, and with faster rate
  – These features do not distinguish valence (val)

• Tilt of highest amplitude vowel groups +act emotions with different val into different categories (e.g. friendly, happy, encouraging vs. angry, frustrated)

• Phrase accent/boundary tone also separates +val from -val

Page 12: Acoustic Cues to Emotional Speech

– H-L% positively correlated with -val and negatively with +val

– +val positively correlated with L-L%; -val is not

Page 13: Acoustic Cues to Emotional Speech

Predicting Emotion Categories Automatically

• 1760 judgment/token datapoints (90%/10% training/test)
  – Collapse 2-5 ratings to one

• Ripper machine learning algorithm (a stand-in sketch follows this slide)
  – Baseline: choose most frequent ranking
  – Mean performance over all emotions: 75% (22% improvement over baseline)
  – Individual emotion categories
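Ripper itself is not part of the standard Python ML stack, so the sketch below swaps in scikit-learn's decision tree as another rule-style learner, with a most-frequent-class dummy model as the baseline; the feature matrix and labels are synthetic stand-ins for the real data.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-ins: 1760 judgment/token datapoints, one row of acoustic
# features each, with a binary label for a single emotion scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(1760, 12))
y = rng.integers(0, 2, size=1760)

# 90%/10% training/test split, as in the talk.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0)

# Baseline: always predict the most frequent class in the training data.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
# Decision tree as a stand-in for the Ripper rule learner.
model = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)

print("baseline accuracy:", baseline.score(X_test, y_test))
print("model accuracy:   ", model.score(X_test, y_test))
```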

Page 14: Acoustic Cues to Emotional Speech

– Happy, encouraging, sad, and anxious predicted well

– Confident and interested show little improvement

– Which features best predict which emotion categories?

Page 15: Acoustic Cues to Emotional Speech

Best Performing Features

Emotion       Feature                  Accuracy
Angry         F0*, RMS*, TILT*, VCD    77.3/69.3%
Confident     F0_range, F0_mean        76.1/75.0%
Happy         F0_min                   81.3/57.4%
Interested    F0_stdev                 75.6/69.9%
Encouraging   VCD                      73.9/52.3%

Page 16: Acoustic Cues to Emotional Speech

Sad           F0_max                   81.3/61.9%
Anxious       Tilt_RMS                 78.4/55.7%
Bored         Tilt_RMS                 80.1/66.5%
Friendly      Tilt_stress              75.0/59.1%
Frustrated    F0_max                   75.0/59.1%

Page 17: Acoustic Cues to Emotional Speech

Conclusions

• New features to distinguish valence: spectral tilt and prosodic endings

• New understanding of relations among emotion categories
  – Judgments
  – Features

Page 18: Acoustic Cues to Emotional Speech

Current/Future Work

• Use ML to rank rather than classify (RankBoost)

• Eye-tracking task, matching tokens to ‘emotional’ pictures
  – Web survey to ‘norm’ pictures
  – Layout issues
