angry emotion detection from real-life conversational speech by leveraging content structure
ANGRY EMOTION DETECTION FROM REAL-LIFE CONVERSATIONAL SPEECH BY LEVERAGING CONTENT STRUCTURE
Wooil Kim and John H. L. Hansen
Chun-Yu Chen
Outline
• Real conversational speech corpus
• TEO-CB-AUTO-ENV
• Emotional language model score
• Experimental results
Real conversational speech corpus
• Neutral speech
  • digits, alphabets, and other words (First, July, August)
  • specific information
• Angry speech
  • negative words (not, no, can’t, even, how)
  • complaints
  • others (that, this, here)
TEO-CB-AUTO-ENV
• one of the acoustic features for angry speech detection
• designed to represent nonlinear characteristics of the voiced sound production (e.g., vowels)
• The resulting vector of area coefficients has been shown to be large for neutral speech
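The "TEO" in the feature name is the Teager Energy Operator, whose discrete form is psi[n] = x[n]^2 - x[n-1] * x[n+1]. A minimal sketch of that operator, followed by a normalized autocorrelation area over one frame, might look like the following. The critical-band filterbank stage is omitted, and the helper names, the toy tone, and the lag range are assumptions for illustration, not details taken from the slides.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def normalized_autocorr_area(frame, max_lag):
    """Sum of the autocorrelation over lags 0..max_lag, normalized by lag 0.

    Hypothetical helper: a stand-in for the "area coefficient" of one band.
    """
    r0 = sum(v * v for v in frame)
    if r0 == 0.0:
        return 0.0
    area = 0.0
    for lag in range(max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        area += r / r0
    return area


# Toy input (assumed): a pure tone standing in for one band-passed voiced frame.
omega = 0.2
tone = [math.sin(omega * n) for n in range(200)]
profile = teager_energy(tone)
area = normalized_autocorr_area(profile, max_lag=10)
```

For a pure sinusoid the TEO profile is constant, so its autocorrelation decays slowly and the area is close to its maximum; an irregular profile gives a smaller area.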
Emotional language model score
• two types of combination methods
  1. feature combination
     MFCC feature vector is appended to the TEO-CB-Auto-Env feature vector
  2. classifier combination
     combining the likelihood scores from both classifiers with a scale factor
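The two combination methods can be sketched roughly as below. The slide only says that scores are combined "with a scale factor", so the weighted-sum form, the function names, and the example numbers are assumptions rather than the authors' exact formulation.

```python
def combine_features(mfcc_vec, teo_vec):
    """Feature combination: append the MFCC vector to the TEO-CB-Auto-Env vector."""
    return list(teo_vec) + list(mfcc_vec)


def combine_scores(score_a, score_b, scale):
    """Classifier combination (assumed form): score_a + scale * score_b."""
    return score_a + scale * score_b


def classify(neutral_scores, angry_scores, scale):
    """Pick the class with the higher combined log-likelihood.

    Each argument is a pair (classifier A score, classifier B score).
    """
    neutral = combine_scores(neutral_scores[0], neutral_scores[1], scale)
    angry = combine_scores(angry_scores[0], angry_scores[1], scale)
    return "angry" if angry > neutral else "neutral"


# Hypothetical log-likelihood scores for one test segment.
joint_vector = combine_features([1.0, 2.0], [3.0])
decision_low_scale = classify((-10.0, -4.0), (-9.0, -6.0), scale=0.2)
decision_high_scale = classify((-10.0, -4.0), (-9.0, -6.0), scale=1.0)
```

With these made-up scores the decision flips as the scale factor grows, which is why the scale factor has to be tuned when the two classifiers disagree.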
Emotional language model score
• “Emotional” language models
  • based on an initial language model with a large vocabulary (HUB4)
  • using the transcripts of neutral and angry speech
  • using HTK and the CMU-Cambridge SLM toolkit to adapt the initial language model
  • formulate a 2-dimensional feature vector for a “lexical” feature
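As a rough illustration of the 2-dimensional "lexical" feature, the sketch below scores an utterance under a neutral and an angry language model and returns the pair of average log-probabilities. It swaps in toy add-one-smoothed unigram models in place of the HUB4-based models adapted with HTK and the CMU-Cambridge SLM toolkit; the transcripts, function names, and the floor value for unseen words are all invented for the example.

```python
import math
from collections import Counter


def train_unigram(transcripts, vocab):
    """Add-one-smoothed unigram model over a fixed vocabulary (toy stand-in)."""
    counts = Counter(w for line in transcripts for w in line.lower().split())
    total = sum(counts[w] for w in vocab) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def avg_log_prob(model, words, floor):
    """Average per-word log-probability; out-of-vocabulary words get `floor`."""
    return sum(math.log(model.get(w, floor)) for w in words) / len(words)


def lexical_feature(utterance, neutral_lm, angry_lm, floor=1e-6):
    """Return (neutral LM score, angry LM score) for one transcript."""
    words = utterance.lower().split()
    if not words:
        raise ValueError("utterance has no words")
    return (avg_log_prob(neutral_lm, words, floor),
            avg_log_prob(angry_lm, words, floor))


# Invented toy transcripts, loosely following the word types listed on the corpus slide.
neutral_text = ["one two three", "july first", "august two"]
angry_text = ["no this is not right", "i can't even", "how is this not here"]
vocab = {w for line in neutral_text + angry_text for w in line.split()}

neutral_lm = train_unigram(neutral_text, vocab)
angry_lm = train_unigram(angry_text, vocab)

complaint = lexical_feature("this is not right", neutral_lm, angry_lm)
date_info = lexical_feature("july first", neutral_lm, angry_lm)
```

The resulting pair can be fed to a classifier or combined with the acoustic scores; higher-order n-grams would replace the unigram models in a realistic setup.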
Experimental results
• Collected data
  • 15 female and 13 male speakers
  • 136 segments for neutral speech and 124 segments for angry speech
  • Each segment is 3-6 sec long
Experimental results
• Two types of test setup
  1. Open-speaker
     • The model is trained on all data except the test speaker’s
  2. Closed-speaker
     • The data are split into two parts
     • The test speaker’s utterances come only from part A
     • The model is trained on part B
     • Performance improves when more data are included
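The open-speaker setup amounts to holding out one speaker at a time. A minimal sketch of such a split is shown below; the segment records, field names, and speaker IDs are hypothetical, and the closed-speaker split is left out because the slide does not specify how the two parts are drawn.

```python
def open_speaker_splits(segments):
    """Yield (held_out_speaker, train, test), holding out each speaker in turn."""
    speakers = sorted({seg["speaker"] for seg in segments})
    for held_out in speakers:
        train = [seg for seg in segments if seg["speaker"] != held_out]
        test = [seg for seg in segments if seg["speaker"] == held_out]
        yield held_out, train, test


# Hypothetical segment list; real entries would carry features, not just labels.
segments = [
    {"speaker": "f01", "label": "neutral"},
    {"speaker": "f01", "label": "angry"},
    {"speaker": "m01", "label": "neutral"},
    {"speaker": "m02", "label": "angry"},
]

splits = list(open_speaker_splits(segments))
```

Each test speaker never appears in the corresponding training set, so the reported accuracy reflects performance on unseen speakers.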
Experimental results
• Without EMLS
  • MFCC-EDZ is the best single feature
Experimental results
• With EMLS