Emotion CS 3710 / ISSP 3565


LSA.303 Introduction to Computational Linguistics

(Slides modified from D. Jurafsky)

Emotion

CS 3710 / ISSP 3565

Scherer's typology of affective states
Emotion: relatively brief episode of synchronized response of all or most organismic subsystems in response to the evaluation of an external or internal event as being of major significance (angry, sad, joyful, fearful, ashamed, proud, desperate)
Mood: diffuse affect state, most pronounced as a change in subjective feeling, of low intensity but relatively long duration, often without apparent cause (cheerful, gloomy, irritable, listless, depressed, buoyant)
Interpersonal stance: affective stance taken toward another person in a specific interaction, coloring the interpersonal exchange in that situation (distant, cold, warm, supportive, contemptuous)
Attitudes: relatively enduring, affectively colored beliefs, preferences, and predispositions towards objects or persons (liking, loving, hating, valuing, desiring)
Personality traits: emotionally laden, stable personality dispositions and behavior tendencies, typical for a person (nervous, anxious, reckless, morose, hostile, envious, jealous)

Extracting social/interactional meaning
Emotion: annoyance in talking to dialog systems; uncertainty of students in tutoring
Mood: detecting trauma or depression
Interpersonal stance: romantic interest, flirtation, friendliness; alignment/accommodation/entrainment
Attitudes = sentiment (positive or negative); won't cover
Personality traits: open, conscientious, extroverted, anxious

Outline
Theoretical background on emotion
Extracting emotion from speech and text: case studies

Ekman's 6 basic emotions: surprise, happiness, anger, fear, disgust, sadness

Ekman and colleagues
Hypothesis: certain basic emotions are universally recognized across cultures; emotions are evolutionarily adaptive and unlearned.
Ekman, Friesen, and Tomkins showed facial expressions of emotion to observers in 5 different countries (Argentina, US, Brazil, Chile, and Japan) and asked the observers to label each expression. Participants from all five countries showed widespread agreement on the emotion each picture depicted.
Ekman, Sorenson, and Friesen conducted a similar study with preliterate tribes of New Guinea (subjects selected a story that best described the facial expression). The tribesmen correctly labeled the emotion even though they had no prior experience with print media.
Ekman and colleagues then asked tribesmen to show on their faces what they would look like if they experienced the different emotions. They took photos and showed them to Americans who had never seen a tribesman, and had them label the emotion. The Americans correctly labeled the emotions of the tribesmen.
Ekman and Friesen conducted a study in the US and Japan asking subjects to view highly stressful stimuli while their facial reactions were secretly videotaped. Subjects in both countries showed the same types of facial expressions at the same points in time, and these expressions corresponded to the expressions considered universal in the judgment research.

Dimensional approach (Russell, 1980, 2003)
Two dimensions: arousal (vertical axis) and valence (horizontal axis). The four quadrants:
High arousal, displeasure (e.g., anger)
High arousal, high pleasure (e.g., excitement)
Low arousal, displeasure (e.g., sadness)
Low arousal, high pleasure (e.g., relaxation)
Slide from Julia Braverman; image from Russell 1997
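As a small illustration of the dimensional view, the sketch below places a few emotion labels in a valence-arousal plane and looks up the nearest label for an arbitrary point. The coordinates are illustrative assumptions for this example, not values from Russell.

```python
import math

# Illustrative (assumed) valence/arousal coordinates in [-1, 1]; not from Russell.
EMOTION_COORDS = {
    "anger":      (-0.7,  0.8),   # displeasure, high arousal
    "excitement": ( 0.7,  0.8),   # pleasure, high arousal
    "sadness":    (-0.7, -0.6),   # displeasure, low arousal
    "relaxation": ( 0.7, -0.6),   # pleasure, low arousal
}

def nearest_emotion(valence: float, arousal: float) -> str:
    """Return the label whose (valence, arousal) point is closest."""
    return min(EMOTION_COORDS,
               key=lambda e: math.dist((valence, arousal), EMOTION_COORDS[e]))

print(nearest_emotion(0.5, 0.9))   # -> excitement
```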

Engagement
Learning (memory, problem-solving, attention)
Motivation

Distinctive vs. dimensional approach to emotion
Distinctive approach:
Emotions are units.
Limited number of basic emotions.
Basic emotions are innate and universal.
Methodological advantage: useful in analyzing traits of personality.
Dimensional approach:
Emotions are dimensions.
Limited number of labels but unlimited number of emotions.
Emotions are culturally learned.
Methodological advantage: easier to obtain reliable measures.
Slide from Julia Braverman

Emotional communication
The encoder's expressed emotion (expressed anger?) is carried by cues; the decoder makes an emotional attribution (perception of anger?).
Cues: vocal cues, facial cues, gestures, other cues; e.g., loud voice, high pitch, frown, clenched fists, shaking.

Example: slide from Tanja Baenziger

Implications
Expressed emotion, cues, emotional attribution: the relation of the cues to the expressed emotion, the relation of the cues to the perceived emotion, and the match between them.
Recognition (extraction systems) depends on the relation of the cues to the expressed emotion; important for extraction.
Generation (conversational agents) depends on the relation of the cues to the perceived emotion; important for agent generation.
Slide from Tanja Baenziger

Four theoretical approaches to emotion
Darwinian (natural selection).
Jamesian: emotion is bodily experience. We feel sorry because we cry, afraid because we tremble; our feeling of the changes as they occur IS the emotion.
Cognitive appraisal: an emotion is produced by appraising (extracting) elements of the situation (Scherer). Fear is produced by the appraisal of an event or situation as obstructive to one's central needs and goals, requiring urgent action, being difficult to control through human agency, and lacking sufficient power or coping potential to deal with it.
Social constructivism: emotions are cultural products (Averill). Explains gender and social group differences. Anger is elicited by the appraisal that one has been wronged intentionally and unjustifiably by another person: you don't get angry if someone yanks your arm accidentally, or if a doctor does it to reset a bone; only if they do it on purpose.

Why emotion detection from speech or text?
Detecting frustration of callers to a help line
Detecting stress in drivers or pilots
Detecting (dis)interest, (un)certainty in on-line tutors: pacing, content, feedback
Hot spots in meeting browsers
Synthesis/generation: on-line literacy tutors in the children's storybook domain; computer games

Hard questions in emotion recognition
How do we know what emotional speech is? Acted speech vs. natural (hand-labeled) corpora.
What can we classify? Distinguish among multiple classic emotions, or distinguish valence (is it positive or negative?) and activation (how strongly is it felt? e.g., sad vs. despair).
What features best predict emotions?
What classification techniques work best?
Slide from Julia Hirschberg

Accuracy of facial versus vocal cues to emotion (Scherer 2001)

Data and tasks for emotion detection
Scripted speech: acted emotions, often using the 6 basic emotions; controls for words, so the focus is on acoustic/prosodic differences. Features: F0/pitch, energy, speaking rate.
Spontaneous speech: more natural, harder to control.
Dialogue: kinds of emotion focused on include frustration, annoyance, certainty/uncertainty, and activation/hot spots.

Quick case studies
Acted speech: LDC's EPSaT
Uncertainty in natural speech: Pitt ITSPOKE
Annoyance/frustration in natural speech: Ang et al. (assigned reading)

Example 1: acted speech; Emotional Prosody Speech and Transcripts corpus (EPSaT)
Recordings from the LDC: http://www.ldc.upenn.edu/Catalog/LDC2002S28.html
8 actors read short dates and numbers in 15 emotional styles.

Slide from Jackson Liscombe

EPSaT examples: happy, sad, angry, confident, frustrated, friendly, interested, etc.
Slide from Jackson Liscombe

Liscombe et al. 2003 (detection)
Automatic acoustic-prosodic features: global characterization of pitch, loudness, and speaking rate. (A small feature-extraction sketch follows below.)
Slide from Jackson Liscombe
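A minimal sketch of this kind of global characterization, using librosa as a stand-in (an assumption; the original work did not use it): F0 statistics from pYIN, loudness approximated by RMS energy, and a crude speaking rate from a caller-supplied word count.

```python
import numpy as np
import librosa

def global_prosodic_features(wav_path: str, n_words: int) -> dict:
    """Utterance-level pitch, loudness, and speaking-rate features (rough sketch)."""
    y, sr = librosa.load(wav_path, sr=None)

    # F0 contour via pYIN; unvoiced frames come back as NaN.
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C6"), sr=sr)
    voiced = f0[~np.isnan(f0)]

    # Loudness proxy: frame-level RMS energy.
    rms = librosa.feature.rms(y=y)[0]

    duration = len(y) / sr
    return {
        "f0_mean": float(np.mean(voiced)) if voiced.size else 0.0,
        "f0_max": float(np.max(voiced)) if voiced.size else 0.0,
        "f0_range": float(np.ptp(voiced)) if voiced.size else 0.0,
        "rms_mean": float(np.mean(rms)),
        "rms_max": float(np.max(rms)),
        "speaking_rate": n_words / duration if duration > 0 else 0.0,  # words/sec
    }
```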

Global pitch statistics: different valence / different activation

Slide from Jackson Liscombe

Global pitch statistics: different valence / same activation

Slide from Jackson Liscombe

Liscombe et al. features: automatic acoustic-prosodic features, ToBI contours, spectral tilt

Slide from Jackson Liscombe

Liscombe et al. experiments
Binary classification for each emotion: RIPPER rule learner, 90/10 train/test split. (A rough sketch of this setup follows below.)
Results: 62% average baseline, 75% average accuracy.
Most useful features:
Slide from Jackson Liscombe
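A hedged sketch of the experimental setup: one binary classifier per emotion on a 90/10 split. The original used the RIPPER rule learner; here a scikit-learn decision tree stands in, since RIPPER is not in scikit-learn. The feature matrix and label list are assumed to be precomputed.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def per_emotion_accuracy(X: np.ndarray, labels: list[str], emotions: list[str]) -> dict:
    """Train one binary (emotion vs. not-emotion) classifier per emotion; report accuracy."""
    results = {}
    y_all = np.array(labels)
    for emotion in emotions:
        y = (y_all == emotion).astype(int)            # binary target for this emotion
        X_tr, X_te, y_tr, y_te = train_test_split(    # 90/10 split, as in the slides
            X, y, test_size=0.1, random_state=0, stratify=y)
        clf = DecisionTreeClassifier(random_state=0)  # stand-in for RIPPER
        clf.fit(X_tr, y_tr)
        results[emotion] = clf.score(X_te, y_te)
    return results
```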


Slide from Jackson Liscombe

[pr01_sess00_prob58]

Task 1
Negative: confused, bored, frustrated, uncertain
Positive: confident, interested, encouraged
Neutral


Task 2: Uncertainty
"um I don't even think I have an idea here ...... now .. mass isn't weight ...... mass is ................ the .......... space that an object takes up ........ is that mass?"
Slide from Jackson Liscombe [71-67-1:92-113]

One "." corresponds to 0.25 seconds.

Liscombe et al.: ITSpoke experiment
Human-human corpus; AdaBoost(C4.5) with a 90/10 split in WEKA. (A scikit-learn approximation appears below.)
Classes: uncertain vs. certain vs. neutral.
Results: baseline features, 66% accuracy; acoustic-prosodic features, 75% accuracy.
Slide from Jackson Liscombe

Current ITSpoke experiment(s)
Human-computer corpus; (binary) uncertainty and (binary) disengagement.
Wizard-of-Oz and fully automated versions.
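The experiment above used AdaBoost over C4.5 trees in WEKA. Below is a hedged scikit-learn equivalent (AdaBoost over CART trees, which are not identical to C4.5) with the same 90/10 split and three classes; X and y are assumed to hold acoustic-prosodic features and uncertain/certain/neutral labels.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

def train_uncertainty_classifier(X, y):
    """AdaBoost over shallow decision trees, approximating WEKA's AdaBoost(C4.5)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1,
                                              random_state=0, stratify=y)
    # Older scikit-learn versions name this parameter `base_estimator`.
    clf = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=3),
                             n_estimators=100, random_state=0)
    clf.fit(X_tr, y_tr)
    print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
    return clf
```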

Be a pilot subject!

Scherer summaries re: prosodic features

Juslin and Laukka metastudy

Ang et al. 2002: prosody-based detection of annoyance/frustration in human-computer dialog
DARPA Communicator Project travel planning data:
NIST June 2000 collection: 392 dialogs, 7515 utterances
CMU 1/2001-8/2001 data: 205 dialogs, 5619 utterances
CU 11/1999-6/2001 data: 240 dialogs, 8765 utterances
Considers contributions of prosody, language model, and speaking style.
Questions:
How frequent is annoyance and frustration in Communicator dialogs?
How reliably can humans label it?
How well can machines detect it?
What prosodic or other features are useful?
Slide from Shriberg, Ang, Stolcke

Data annotation
5 undergraduates with different backgrounds.
Each dialog labeled by 2+ people independently.
A second consensus pass for all disagreements, by two of the same labelers. (A small agreement-computation sketch follows the emotion samples below.)
Slide from Shriberg, Ang, Stolcke

Data labeling
Emotion: neutral, annoyed, frustrated, tired/disappointed, amused/surprised, no-speech/NA
Speaking style: hyperarticulation, perceived pausing between words or syllables, raised voice
Repeats and corrections: repeat/rephrase, repeat/rephrase with correction, correction only
Miscellaneous useful events: self-talk, noise, non-native speaker, speaker switches, etc.
Slide from Shriberg, Ang, Stolcke

Emotion samples (HYP marks hyperarticulation):
Neutral: "July 30", "Yes"
Disappointed/tired: "No"
Amused/surprised: "No"
Annoyed: "Yes", "Late morning" (HYP)
Frustrated: "Yes", "No", "No, I am" (HYP), "There is no Manila..."
Slide from Shriberg, Ang, Stolcke
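To make "how reliably can humans label it" concrete, here is a small sketch computing raw pairwise agreement and Cohen's kappa between two labelers' emotion tags; the label lists are hypothetical, not from the Communicator data.

```python
from sklearn.metrics import cohen_kappa_score

labeler_a = ["neutral", "annoyed", "neutral", "frustrated", "neutral"]
labeler_b = ["neutral", "neutral", "neutral", "frustrated", "annoyed"]

# Raw percent agreement (the figures reported later in the slides are of this kind).
agreement = sum(a == b for a, b in zip(labeler_a, labeler_b)) / len(labeler_a)

# Chance-corrected agreement.
kappa = cohen_kappa_score(labeler_a, labeler_b)

print(f"raw agreement: {agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```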

Emotion class distribution (see the table at the end of this transcript)

Slide from Shriberg, Ang, Stolcke

To get enough data, annoyed and frustrated were grouped together versus everything else (with speech).

Prosodic model
Classifier: CART-style decision trees.
Downsampled to equal class priors.
Automatically extracted prosodic features based on recognizer word alignments.
3/4 of the data used for training, 1/4 for testing, with no call overlap.
Slide from Shriberg, Ang, Stolcke

Prosodic features
Duration and speaking rate features: duration of phones, vowels, and syllables; normalized by phone/vowel means in the training data; normalized by speaker (all utterances, or the first 5 only); speaking rate (vowels per unit time).
Pause features: duration and count of utterance-internal pauses at various threshold durations; ratio of speech frames to total utterance-internal frames.
Slide from Shriberg, Ang, Stolcke
(A sketch of the duration and pause features follows below.)
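A hedged sketch of the duration and pause features listed above, computed from recognizer word alignments. The alignment format (a list of (phone, start, end) tuples) and the phone-mean table are assumptions made for this illustration; it approximates the speech-frame ratio with time rather than frame counts.

```python
import numpy as np

def duration_pause_features(phones, phone_means, pause_thresholds=(0.25, 0.5, 1.0)):
    """phones: list of (label, start_sec, end_sec); phone_means: mean duration per label."""
    feats = {}
    # Phone durations normalized by the training-set mean for that phone.
    durs = [(end - start) / phone_means.get(lab, end - start)
            for lab, start, end in phones]
    feats["max_norm_phone_dur"] = max(durs)
    feats["mean_norm_phone_dur"] = float(np.mean(durs))

    # Utterance-internal pauses = gaps between consecutive phones.
    gaps = [phones[i + 1][1] - phones[i][2] for i in range(len(phones) - 1)]
    for t in pause_thresholds:
        feats[f"n_pauses_ge_{t}s"] = sum(g >= t for g in gaps)

    # Time-based stand-in for the ratio of speech frames to utterance-internal frames.
    speech_time = sum(end - start for _, start, end in phones)
    total_time = phones[-1][2] - phones[0][1]
    feats["speech_time_ratio"] = speech_time / total_time if total_time > 0 else 1.0
    return feats
```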

Prosodic features (cont.)
Pitch features: an F0-fitting approach developed at SRI (Sönmez); an LTM model of F0 estimates the speaker's F0 range.
Many features capture pitch range, contour shape and size, slopes, and locations of interest; normalized using LTM parameters per speaker, computed over all utterances in a call or only the first 5 utterances.

Slide from Shriberg, Ang, Stolcke
(Figure: LTM fit to the log F0 contour over time)

Features (cont.)
Spectral tilt features, extracted from the longest normalized vowel region: average of the 1st cepstral coefficient; average slope of a linear fit to the magnitude spectrum; difference in log energies between high and low bands. (A numpy sketch follows below.)
Slide from Shriberg, Ang, Stolcke
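A rough numpy sketch of spectral tilt features for a single vowel segment: the slope of a linear fit to the spectrum and the difference in log energies between high and low bands. The slide fits the magnitude spectrum directly; fitting in dB and the 1 kHz band split are assumptions here, and the cepstral-coefficient feature is omitted.

```python
import numpy as np

def spectral_tilt_features(segment: np.ndarray, sr: int, split_hz: float = 1000.0) -> dict:
    """Spectral tilt features for one vowel segment (rough sketch)."""
    spec = np.abs(np.fft.rfft(segment * np.hanning(len(segment))))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sr)
    log_spec = 20 * np.log10(spec + 1e-10)

    # Tilt: slope of a linear fit to the (log-)magnitude spectrum, in dB per Hz.
    slope = np.polyfit(freqs, log_spec, 1)[0]

    # Difference in log energies between high and low bands.
    low = np.sum(spec[freqs < split_hz] ** 2)
    high = np.sum(spec[freqs >= split_hz] ** 2)
    band_diff = np.log10(high + 1e-10) - np.log10(low + 1e-10)

    return {"tilt_slope_db_per_hz": float(slope),
            "high_low_log_energy_diff": float(band_diff)}
```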

Language model features
Train two 3-gram class-based LMs: one on frustration, one on everything else.
Given a test utterance, choose the class with the higher LM likelihood (assumes equal priors).
In the prosodic decision tree, use the sign of the likelihood difference as an input feature. (A sketch of this feature appears after the conclusions below.)
Slide from Shriberg, Ang, Stolcke

Results (cont.)
Human-human labels agree 72% of the time.
Human labels agree 84% with the consensus (a biased comparison).
The tree model agrees 76% with the consensus, better than the original labelers agree with each other.
Language model features alone (64%) are not good predictors.
Slide from Shriberg, Ang, Stolcke

Prosodic predictors of annoyed/frustrated
Pitch: high maximum fitted F0 in the longest normalized vowel; high speaker-normalized (first 5 utterances) ratio of F0 rises to falls; maximum F0 close to the speaker's estimated F0 topline; minimum fitted F0 late in the utterance (no question intonation).
Duration and speaking rate: long maximum phone-normalized phone duration; long maximum phone- and speaker-normalized (first 5 utterances) vowel duration; low syllable rate (slower speech).
Slide from Shriberg, Ang, Stolcke

Ang et al. 2002 conclusions
Emotion labeling is a complex task.
Useful prosodic features: duration and stylized pitch.
Speaker normalizations help.
The language model is not a good feature.
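A hedged sketch of the language-model feature: score an utterance under two class LMs and feed the sign of the log-likelihood difference to the prosodic tree. For brevity this uses add-one unigram LMs rather than the class-based 3-grams in the paper, and the tiny training corpora are hypothetical.

```python
import math
from collections import Counter

def train_unigram(sentences):
    """Add-one-smoothed unigram LM (stand-in for the paper's class-based 3-grams)."""
    counts = Counter(w for s in sentences for w in s.split())
    total, vocab = sum(counts.values()), len(counts) + 1
    return lambda w: math.log((counts[w] + 1) / (total + vocab))

def lm_sign_feature(utterance, frustrated_lm, other_lm):
    """+1 if the frustration LM scores the utterance higher, else -1."""
    words = utterance.split()
    diff = sum(frustrated_lm(w) for w in words) - sum(other_lm(w) for w in words)
    return 1 if diff > 0 else -1

frustrated_lm = train_unigram(["no I said baltimore", "that is wrong start over"])
other_lm = train_unigram(["I would like a flight to boston", "yes please"])
print(lm_sign_feature("no that is wrong", frustrated_lm, other_lm))  # -> 1
```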


Emotion class distribution:

             Count    Proportion
Neutral      17994    .831
Annoyed       1794    .083
No-speech     1437    .066
Frustrated     176    .008
Amused         127    .006
Tired          125    .006
TOTAL        21653