Music Affect Recognition:
The State-of-the-art and
Lessons Learned
Xiao Hu, Ph.D. (The University of Hong Kong) and Yi-Hsuan Eric Yang, Ph.D. (Academia Sinica, Taiwan)
ISMIR 2012 Tutorial 2, 10/5/2012
Speaker
The Audience
• Do you believe that music is powerful?
• Why do you think so?
• Have you searched for music by affect?
• Have you searched for other things (photos, videos) by affect?
• Have you questioned the difference between emotion and mood?
• Is your research related to affect?
Music Affect: (example slides, images only)
Agenda
• Grand challenges on music affect
• Music affect taxonomy and annotation
• Automatic music affect analysis
  • Categorical approach
  • Multimodal approach
  • Dimensional approach
  • Temporal approach
• Beyond music
• Conclusion
Emotion or Mood ?
• Mood: “relatively permanent and stable”
• Emotion: “temporary and evanescent”
• “Most of the supposed [psychological] studies of emotion in music are actually concerned with mood and association.”
Meyer, L. B. (1956). Emotion and Meaning in Music. Chicago: University of Chicago Press.
Expressed or Induced
• Designated/indicated/expressed by a music piece
• Induced/evoked/felt by a listener
• Both are studied in MIR
• They mainly differ in the way labels are collected:
  • “indicate how you feel when listening to the music”
  • “indicate the mood conveyed by the music”
Which Moods? 1/2
• Different websites / studies use different terms
  • Thayer’s stress-energy model gives 4 clusters
  • Farnsworth’s 10 adjective groups
  • Tellegen-Watson-Clark model
Which Moods? 2/2
• Lack of a general theory of emotions
• Ekman’s 6 basic emotions: anger, joy, surprise, disgust, sadness, fear
• Verbalization of emotional states is often a “distortion” (Meyer, 1956)
  • “unspeakable feelings”
  • “a restful feeling throughout ... like one of going downstream while swimming”
Sources of Music Emotion
• Intrinsic (structural characteristics of the music)
  • e.g., modality -> happy vs. sad
  • What about melody?
• Extrinsic emotion (semantic context related to, but outside, the music)
• Lee et al. (2012) identified a range of factors in people’s assessment of music mood
  • lyrics, tempo, instrumentation, genre, delivery, and even cultural context
• Little is known about the mapping of these factors to music mood
Lee, J. H., Hill, T., & Work, L. (2012). What does music mood mean for real users? Proceedings of the iConference.
Let’s ask the users… (Lee et al., 2012)
Data, data, data!
• An extremely scarce resource
  • annotations are time consuming
  • consistency is low across annotators
• Existing public datasets on mood:
  • MoodSwings Turk dataset: 240 30-sec clips; arousal-valence scores
  • MIREX mood classification task: 600 30-sec clips in 5 mood clusters
  • MIREX tag classification task (mood sub-task): 3,469 30-sec clips in 18 mood-related tag groups
  • Yang’s emotion regression dataset: 193 25-sec clips on an 11-level arousal-valence scale
Suboptimal Performance
• MIREX Mood Classification (2012)
  • accuracy: 46% - 68%
• MIREX Tag Classification, mood subtask (2011)
Newer Challenges
• Cross-cultural applicability
  • existing efforts focus on Western music
  • OS1 @ ISMIR 2012 (tomorrow): Yang & Hu, Cross-cultural Music Mood Classification: A Comparison on English and Chinese Songs
• Personalization
  • the ultimate solution to the subjectivity problem
• Contextualization
  • even the same person’s emotional responses change at different times, locations, and occasions
  • PS1 @ ISMIR 2012 (tomorrow): Watson & Mandryk, Modeling Musical Mood From Audio Features and Listening Context on an In-Situ Data Set
Summary of Challenges
• Terminology
• Models and categories
  • no consensus
• Sources and factors
  • no clear mapping between sources and affects
• Data scarcity
• Suboptimal performances
• Newer issues
  • cross-cultural, personalization, contextualization, ...
Agenda
• Grand challenges on music affect
• Music affect taxonomy and annotation
• Automatic music affect analysis
  • Categorical approach
  • Multimodal approach
  • Dimensional approach
  • Temporal approach
• Beyond music
• Conclusion
Music Affect Taxonomy and Annotation
• Background
  • What are taxonomies?
  • Taxonomy vs. Folksonomy
• Developing music mood taxonomies
  • taxonomy from editorial labels
  • taxonomies from social tags
• Annotations
  • experts
  • crowdsourcing (e.g., MTurk, games)
  • subjects
  • derived from online services
Taxonomy
• Domain-oriented controlled vocabulary
• Contains labels (metadata)
• Commonly used on websites
  • pick lists, browsable directories, etc.
Taxonomy vs. Folksonomy
• Taxonomy
  • controlled, structured vocabulary
  • often requires expert knowledge
  • top-down and bottom-up approaches
• Folksonomy
  • uncontrolled, unstructured vocabulary
  • social tags freely applied by users
  • commonality emerges across large numbers of tags
Models in Music Psychology 1/2
• Categorical
  • Hevner’s adjective circle (1936)
Hevner, K. (1936). Experimental studies of the elements of expression in music. American Journal of Psychology, 48.
Models in Music Psychology 2/2
• Dimensional
  • Russell’s circumplex model (1980)
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39: 1161-1178.
Borrow from Psychology to MIR
• Thayer’s stress-energy model gives 4 clusters
• Farnsworth’s 10 adjective groups
• Tellegen-Watson-Clark model
• Grounded in music perception research, but lacking the social context of music listening (Juslin & Laukka, 2004)
Juslin, P. N. and Laukka, P. (2004). Expression, perception, and induction of musical emotions: a review and a questionnaire study of everyday listening. JNMR.
Taxonomy Built from Editorial Labels
• allmusic.com: “the most comprehensive music reference source on the planet”
• 288 mood labels created and assigned to music works
• Editorial labels:
  - given by professional editors of online repositories
  - have a certain level of control
  - rooted in realistic social contexts
Mood Label Clustering
(clustering of mood labels for albums and for songs into clusters C1-C5)
Hu, X., & Downie, J. S. (2007). Exploring Mood Metadata: Relationships with Genre, Artist and Usage Metadata. In Proceedings of ISMIR.
A Taxonomy of 5 Mood Clusters
Cluster 1: passionate, rousing, confident, boisterous, rowdy
Cluster 2: rollicking, cheerful, fun, sweet, amiable/good-natured
Cluster 3: literate, poignant, wistful, bittersweet, autumnal, brooding
Cluster 4: humorous, silly, campy, quirky, whimsical, witty, wry
Cluster 5: aggressive, fiery, tense/anxious, intense, volatile, visceral
Taxonomy from Social Tags
• Social tags
  • Pros: users’ perspectives; large quantity
  • Cons: non-standardized; ambiguous
• Filtered with linguistic resources and human expertise
• last.fm: “the largest music tagging site for Western music”
Hu, X. (2010). Music and Mood: Where Theory and Reality Meet. In Proceedings of the 5th iConference (Best Student Paper).
The Method
• 1,586 terms in WordNet-Affect (a lexicon of affective words)
• − 202 evaluation terms in General Inquirer (“good”, “great”, “poor”, etc.)
• − 135 non-affect / ambiguous terms identified by experts (“cold”, “chill”, “beat”, etc.)
• = 1,249 terms
• 476 of these terms are last.fm tags
• Grouping the tags by WordNet-Affect and experts => 36 categories
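The term filtering above is essentially set arithmetic; a toy sketch with tiny made-up stand-ins for the real word lists (which contain 1,586, 202, and 135 entries respectively):

```python
# Toy version of the term-filtering pipeline: start from an affect
# lexicon, remove evaluation and non-affect/ambiguous terms, then keep
# only those that actually occur as last.fm tags. All lists are tiny
# made-up stand-ins for the real resources.
wordnet_affect = {"happy", "sad", "angry", "calm", "good", "cold", "chill"}
gi_evaluation = {"good", "great", "poor"}       # General Inquirer evaluation terms
non_affect = {"cold", "chill", "beat"}          # expert-flagged ambiguous terms
lastfm_tags = {"happy", "sad", "calm", "rock", "pop"}

candidate_terms = wordnet_affect - gi_evaluation - non_affect
mood_tags = candidate_terms & lastfm_tags
print(sorted(mood_tags))
```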
2-D Mood Taxonomy: 2-Dimensional Representation
Comparison to Russell’s 2-D Model
Our Taxonomy (plotted on the valence and arousal axes)
Laurier et al. (2009): Taxonomy from Social Tags 1/2
• Manually compiled 120 mood words from the literature
• Crawled 6.8M social tags from last.fm
• 107 unique tags matched mood words
• 80 tags with more than 100 occurrences
Most used: sad, fun, melancholy, happy
Least used: rollicking, solemn, rowdy, tense
Laurier et al. (2009). Music mood representations from social tags, ISMIR.
Laurier et al. (2009): Taxonomy from Social Tags 2/2
• Used LSA to project the tag-track matrix to a space of 100 dimensions
• Clustering trials with varying numbers of clusters
Cluster 1 (+A −V): angry, aggressive, visceral, rousing, intense, confident, anger
Cluster 2 (−A −V): sad, bittersweet, sentimental, tragic, depressing, sadness, spooky
Cluster 3 (−A +V): tender, soothing, sleepy, tranquil, quiet, calm, serene
Cluster 4 (+A +V): happy, joyous, bright, cheerful, humorous, gay, amiable
Laurier et al. (2009). Music mood representations from social tags, ISMIR.
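A toy illustration of the grouping step: assign each tag, represented here by a made-up 2-D (valence, arousal) vector, to the nearest quadrant prototype. The actual work clustered tags in the 100-dimensional LSA space; the coordinates and prototypes below are illustrative assumptions only.

```python
# Toy tag grouping in the spirit of Laurier et al.'s clustering:
# each tag gets a made-up 2-D (valence, arousal) coordinate and is
# assigned to the nearest quadrant prototype.
PROTOTYPES = {  # (valence, arousal) prototype per cluster
    "cluster 1 (+A -V)": (-1, 1),
    "cluster 2 (-A -V)": (-1, -1),
    "cluster 3 (-A +V)": (1, -1),
    "cluster 4 (+A +V)": (1, 1),
}
TAGS = {  # made-up coordinates for four example tags
    "angry": (-0.8, 0.9), "sad": (-0.7, -0.6),
    "tender": (0.6, -0.8), "happy": (0.8, 0.7),
}

def nearest(point, prototypes):
    """Name of the prototype closest to `point` (squared distance)."""
    return min(prototypes, key=lambda k: (point[0] - prototypes[k][0]) ** 2
                                         + (point[1] - prototypes[k][1]) ** 2)

clusters = {tag: nearest(xy, PROTOTYPES) for tag, xy in TAGS.items()}
print(clusters)
```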
Agreement between Laurier’s clusters and the 5-cluster taxonomy
• Based on Laurier’s 100-dimensional space
Inter-cluster dissimilarity / intra-cluster similarity:
      C1    C2    C3    C4    C5
C1    0    .74   .13   .20   .11
C2          0    .86   .82   .88
C3                0    .32   .27
C4                      0    .53
C5                            0
Laurier et al. (2009). Music mood representations from social tags, ISMIR.
Summary on Taxonomy
• What are taxonomies?
• Taxonomy vs. Folksonomy
• Developing music mood taxonomies
  • from editorial labels
  • from social tags
Mood Annotations
• All annotation needs three things:
  • a taxonomy, music, and people
• People
  • experts
  • subjects
  • crowdsourcing (e.g., MTurk, games)
  • annotations derived from online services
Expert Annotation
• The MIREX Audio Mood Classification (AMC) task
  • the 5-cluster taxonomy
  • 1,250 tracks selected from the APM libraries
  • a Web-based annotation system called E6K (Evalutron 6000)
Hu, X., Downie, J. S., Laurier, C., Bay, M., & Ehmann, A. (2008). The 2007 MIREX Audio Mood Classification Task: Lessons Learned. In ISMIR.
Expert Annotation: MIREX AMC
• 2,468 judgments collected (3,750 planned)
  • each clip had 2 or 3 judgments
  • avg. Cohen’s kappa: 0.5
• Each expert had 250 clips
• 8 of 21 experts finished all assignments
Agreements      C1   C2   C3   C4   C5   Total
3 of 3 judges   21   24   56   21   31   153
2 of 3 judges   41   35   18   26   14   134
2 of 2 judges   58   61   46   73   75   313
Total          120  120  120  120  120   600
Accuracy: 0.59 / 0.38 / 0.54
Lessons:
1. Missed judgments -> low accuracy
2. Need more motivated annotators
The dataset was built from agreements among experts.
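Agreement figures like the average Cohen’s kappa of 0.5 reported above take only a few lines to compute; a minimal sketch with hypothetical cluster labels from two annotators:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa between two annotators' label sequences."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n       # raw agreement
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb.get(k, 0) for k in ca) / n ** 2  # chance agreement
    return (observed - expected) / (1 - expected)

# Hypothetical mood-cluster labels from two annotators (8 clips)
ann1 = ["C1", "C2", "C2", "C3", "C5", "C1", "C4", "C3"]
ann2 = ["C1", "C2", "C3", "C3", "C5", "C2", "C4", "C3"]
print(cohens_kappa(ann1, ann2))  # 0.68 for these made-up labels
```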
Crowdsourcing: Amazon Mechanical Turk
• Lee & Hu (2012): compare expert and MTurk annotations
  • the same 1,250 music clips as in MIREX AMC
  • the same 5 clusters
• Annotators: “Turkers” who work on human intelligence tasks for very low payment
• Advantages of MTurk: plenty of labor
• Disadvantages of MTurk: quality control
Lee, J. H. & Hu, X. (2012). Generating Ground Truth for Music Mood Classification Using Mechanical Turk. In Proceedings of the Joint Conference on Digital Libraries.
Annotation: Amazon Mechanical Turk
• Human Intelligence Task (HIT)
  • each HIT had 27 clips, including 2 duplicates for consistency checking
  • each clip had 2 judges
  • paid 0.55 USD per HIT
• Qualification test before proceeding to the task
• 186 HITs collected; 100 HITs accepted
• Avg. Cohen’s kappa: 0.48
Comparison: Stats on Collecting Data (Evalutron 6000 vs. MTurk)
Number of judgments collected: 2,468 (incomplete) vs. 2,500 (complete)
Total time for collecting all judgments: 38 days (+ additional in-house assessment) vs. 19 days
Cost for collecting all judgments: $0 vs. $60.50
Average time spent on each music clip: 21.54 seconds vs. 17.46 seconds
Comparison: Agreement Rates (% of clips with agreements)
         E6K      MTurk
C1      40.2%    39.6%
C2      60.2%    48.9%
C3      70.5%    69.5%
C4      39.6%    46.3%
C5      70.8%    60.0%
Other   16.9%    21.3%
Comparison: Confusions among Clusters (Evalutron 6000 vs. MTurk)
Clusters                 Disagreed in E6K   Disagreed in MTurk
Cluster 1 & Cluster 2          20                 95
Cluster 2 & Cluster 4          31                 86
Cluster 1 & Cluster 5          13                 74
⁞                               ⁞                  ⁞
Cluster 3 & Cluster 4           6                 27
Cluster 2 & Cluster 5           1                 22
Cluster 3 & Cluster 5           1                 20
Total                         253                595
Confusions Shown in Russell’s Model
(clusters 1-5 plotted in the valence-arousal plane)
Comparison: System Performances (MIREX 2007)
Crowdsourcing: Games
• MoodSwings (Kim et al., 2008)
  • a 2-player Web-based game to collect annotations of music pieces in the arousal-valence space
  • time-varying annotations are collected at a rate of 1 sample per second
  • players “score” points for agreement with their competitor
Kim, Y. E., Schmidt, E., and Emelle, L. (2008). MoodSwings: a collaborative game for music mood label collection, ISMIR.
MoodSwings: Challenges
• Needs a pair of players -> simulated AI player
  • randomly following the real player -> less challenging
  • based on a prediction model -> needs training data
• Attracting players (true for all games)
  • must be challenging and fun
  • music: more recent and entertaining
  • game interface: sleek, aesthetic
• Research value
  • variety of music and mood
B. G. Morton, J. A. Speck, E. M. Schmidt, and Y. E. Kim (2010). Improving music emotion labeling using human computation. In HCOMP.
MoodSwings: MTurk Version
• Single-person game
• No competition, no scores
• Monetary reward (0.25 USD / 11 pieces)
• Consistency checks:
  - 2 identical pieces whose labels must be within the experts’ decision boundary
  - must not label all clips the same way
Speck, J. A., Schmidt, E. M., Morton, B. G., and Kim, Y. E. (2011). A comparative study of collaborative vs. traditional music mood annotation, ISMIR.
MoodSwings: Comparison of the 2 Versions
• Label correlation: valence 0.71, arousal 0.85
Speck, J. A., Schmidt, E. M., Morton, B. G., and Kim, Y. E. (2011). A comparative study of collaborative vs. traditional music mood annotation, ISMIR.
Subject Annotation
• Does not require music expertise
• Easier to recruit than experts
• Arguably more authentic to MIR usage situations
• Subjects can be trained for the annotation task
  • higher data quality than MTurk
  • still needs verification/evaluation
• Often with payments
  • rates much higher than MTurk
Derive Annotations from Online Services
• Harness the power of Music 2.0
• Based on editorial labels and noisy user tags
  • e.g., the MSD (Million Song Dataset)
  • e.g., the MIREX Audio Tag Classification mood dataset
MIREX Mood Tag Classification
MIREX Mood Tag Classification Dataset: Positive Examples in Each Category
• Based on the top 100 tags provided by the last.fm API
• Select songs tagged heavily with terms in a category
MIREX Mood Tag Classification Dataset: An Example
Annotation Derived from Music 2.0
PROS
• Grounded in real-life usage
• Larger datasets, supporting multi-label classification
• No manual annotation required
CONS
• Needs mood-related social tags
• Needs clever ways to filter out noise
• May be culturally dependent
Cross-Cultural Issue in Annotation
• A survey of 30 clips with American and Chinese listeners
  • C1: passionate; C2: cheerful; C3: bittersweet; C4: humorous; C5: aggressive
  • example clip: “Got to Get You into My Life” by The Beatles
Hu, X. & Lee, J. H. (2012). A Cross-cultural Study of Music Mood Perception between American and Chinese Listeners, ISMIR (PS3 - Thursday!).
Summary on Annotation
• Expert annotation for small datasets
• Crowdsourcing with careful designs
• Music 2.0 for super-size datasets
• ??
Agenda
• Grand challenges on music affect
• Music affect taxonomy and annotation
• Automatic music affect analysis
  • Categorical approach
  • Multimodal approach
  • Dimensional approach
  • Temporal approach
• Beyond music
• Conclusion
Automatic Approaches
• Categorical vs. Dimensional
Categorical
  • Pros: intuitive; natural language
  • Cons: terms are ambiguous; difficult to offer fine-grained differentiation
Dimensional
  • Pros: continuous affective scales; good user interfaces
  • Cons: less intuitive; difficult to annotate
Categorical and Multimodal Approaches
• Classification problem and framework
• Audio features and classification models
• Existing experiments
• Multimodal classification
• Cross-cultural classification
Automatic Classification (supervised learning)
Training examples:
  “Here comes the sun” -> Happy
  “I will be back” -> Sad
  “Down with the sickness” -> Angry
  Song X -> Happy
  Song Y -> Sad
  ...
Training: a classifier is learned from the labeled examples.
Testing: the classifier predicts a label (Happy / Angry / Sad) for each new example.
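The training/testing loop above can be sketched with a toy 1-nearest-neighbour classifier; the two-dimensional features (say, normalized tempo and major-mode strength), their values, and the songs are illustrative only, not any actual MIREX system:

```python
# Toy supervised mood classifier: 1-nearest-neighbour on made-up
# 2-D features (e.g., normalized tempo, major-mode strength).

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict(train, x):
    """Return the label of the closest training example."""
    return min(train, key=lambda ex: euclidean(ex[0], x))[1]

train = [
    ((0.9, 0.8), "Happy"),   # fast tempo, strongly major
    ((0.2, 0.3), "Sad"),     # slow tempo, minor-leaning
    ((0.95, 0.1), "Angry"),  # fast tempo, minor mode
]

print(predict(train, (0.8, 0.7)))  # nearest to the "Happy" example
```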
A Framework for Multimodal Mood Classification
• Dataset construction: MP3s, lyrics, social tags, ...
• Feature generation and selection
  • audio feature extraction: timbral, tempo, ...
  • textual feature extraction: linguistic, stylistic, ...
  • feature selection: F-score, language modeling, PCA, ...
• Classification and multimodal combination
  • classification: SVM, kNN, ...
  • hybrid methods: feature concatenation, late fusion, ...
• Evaluation and analysis: performance comparison, learning curves, feature comparison
Audio Features
Type           | Description                                                        | Tool
Energy         | Mean and standard deviation of root-mean-square energy             | Marsyas, MIR Toolbox
Rhythm         | Fluctuation pattern and tempo                                      | MIR Toolbox, PsySound
Pitch          | Pitch class profile: the intensity of the 12 semitones of the musical octave in the Western twelve-tone scale | MIR Toolbox, PsySound
Tonal          | Key clarity, musical mode (major/minor), and harmonic change (e.g., chord change) | MIR Toolbox
Timbre         | Mean and standard deviation of the first 13 MFCCs, delta MFCCs, and delta-delta MFCCs | Marsyas, MIR Toolbox
Psychoacoustic | Perceptual loudness, volume, sharpness (dull/sharp), timbre width (flat/rough), spectral and tonal dissonance (dissonant/consonant) | PsySound
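The Energy row is the simplest to compute by hand; a minimal pure-Python sketch of frame-wise RMS energy plus its mean and standard deviation (real systems would use Marsyas or the MIR Toolbox, with overlapping windows; the test signal is made up):

```python
import math

# Frame-wise root-mean-square (RMS) energy of a signal, followed by
# the mean and standard deviation over frames, as in the Energy row.
def frame_rms(signal, frame_len):
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return [math.sqrt(sum(s * s for s in f) / frame_len) for f in frames]

def mean_std(values):
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

# Made-up test signal: a quiet half followed by a loud half
signal = ([0.1 * math.sin(0.3 * n) for n in range(512)]
          + [0.8 * math.sin(0.3 * n) for n in range(512)])
rms = frame_rms(signal, 256)
mean_rms, std_rms = mean_std(rms)
```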
Classification Models
• Generic supervised learning algorithms
  • neural networks, k-nearest neighbor (k-NN), maximum likelihood, decision trees, support vector machines (SVM), Gaussian mixture models (GMM), etc.
• Tools: generic machine learning packages
  • Weka, RapidMiner, LibSVM, SVMLight
• SVM seems superior (MIREX AMC 2007 results)
The Audio Signal’s “Glass Ceiling”
• Aucouturier & Pachet (2004): a “semantic gap” between low-level music features and high-level human perception
• MIREX AMC performance (5 classes):
Year | Top 3 accuracies
2007 | 61.50%, 60.50%, 59.67%
2008 | 63.67%, 58.20%, 56.00%
2009 | 65.67%, 65.50%, 63.67%
2010 | 63.83%, 63.50%, 63.17%
2011 | 69.50%, 67.17%, 66.67%
2012 | 67.83%, 67.67%, 67.17%
Aucouturier, J.-J., & Pachet, F. (2004). Improving timbre similarity: How high is the sky? Journal of Negative Results in Speech and Audio Sciences, 1(1).
Multimodal Classification
• Improving classification performance by combining multiple independent sources: audio, lyrics, social tags, metadata
  • Bischoff et al., 2009; Yang & Lee, 2004; Laurier et al., 2009; Hu & Downie, 2010; Schuller et al., 2011
Lyric Features
• Basic features: content words, part-of-speech, function words
• Lexicon features: words in WordNet-Affect
• Psycholinguistic features:
  • psychological categories in GI (General Inquirer)
  • scores in ANEW (Affective Norms for English Words)
• Stylistic features: punctuation marks; interjection words
• Statistics: e.g., number of words per minute
Hu, X. & Downie, J. S. (2010). Improving Mood Classification in Music Digital Libraries by Combining Lyrics and Audio, JCDL.
ANEW examples:
Word    Valence  Arousal  Dominance
Happy     8.21     6.49     6.63
Sad       1.61     4.13     3.45
Thrill    8.05     8.02     6.54
Kiss      8.26     7.32     6.93
Dead      1.94     5.73     2.84
Dream     6.73     4.53     5.53
Angry     2.85     7.17     5.55
Fear      2.76     6.96     3.22
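ANEW scores plug directly into a bag-of-words lyric feature: average the valence/arousal/dominance of the lyric words found in the lexicon. A sketch using only the example entries from the slide (the lyric line itself is made up):

```python
# Toy ANEW-style lyric scoring: average the (valence, arousal,
# dominance) of the lyric words present in the lexicon. The lexicon
# entries are the ANEW examples from the slide.
ANEW = {  # word: (valence, arousal, dominance)
    "happy": (8.21, 6.49, 6.63), "sad": (1.61, 4.13, 3.45),
    "thrill": (8.05, 8.02, 6.54), "kiss": (8.26, 7.32, 6.93),
    "dead": (1.94, 5.73, 2.84), "dream": (6.73, 4.53, 5.53),
    "angry": (2.85, 7.17, 5.55), "fear": (2.76, 6.96, 3.22),
}

def vad_scores(lyric):
    """Mean VAD over the in-lexicon words; None if no word matches."""
    hits = [ANEW[w] for w in lyric.lower().split() if w in ANEW]
    if not hits:
        return None
    return tuple(sum(dim) / len(hits) for dim in zip(*hits))

v, a, d = vad_scores("a kiss and a dream but no fear")
print(round(v, 2), round(a, 2), round(d, 2))
```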
Lyric Feature Example: top General Inquirer (GI) features in the category “Aggressive”
GI Feature | Description | Example
WlbPhys | words connoting the physical aspects of well-being, including its absence | blood, dead, drunk, pain
Perceiv | words referring to the perceptual process of recognizing or identifying something by means of the senses | dazzle, fantasy, hear, look, make, tell, view
Exert | action words | hit, kick, drag, upset
TIME | words indicating time | noon, night, midnight
COLL | words referring to all human collectivities | people, gang, party
WlbLoss | words related to a loss in a state of well-being, including being upset | burn, die, hurt, mad
Lyric Classification Results
• No significant difference between the top feature combinations
• Distribution of feature “!”
• Distribution of feature “hey”
• Distribution of feature “number of words per minute”
Combine with an Audio-based Classifier
• A leading system in MIREX AMC 2007 and 2008: Marsyas
  • Music Analysis, Retrieval and Synthesis for Audio Signals
  • led by Prof. Tzanetakis at the University of Victoria
  • uses audio spectral features
  • marsyas.info
  • finalist in the SourceForge Community Choice Awards 2009
Hybrid Methods
• Feature concatenation (early fusion): concatenate lyric and audio features and train a single classifier -> prediction
• Late fusion: train separate lyric and audio classifiers, then combine their predictions into a final prediction
  • dominant due to its clarity and its avoidance of the “curse of dimensionality”
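Late fusion can be as simple as averaging the per-class probability estimates of the two classifiers and taking the argmax; a toy sketch (the probability vectors and the equal weighting are illustrative assumptions):

```python
# Toy late fusion: weighted average of the class-probability estimates
# of two independent classifiers (one on lyrics, one on audio).
MOODS = ["happy", "sad", "angry"]

def late_fusion(p_lyrics, p_audio, alpha=0.5):
    """Fuse two probability vectors and return the winning mood."""
    fused = [alpha * l + (1 - alpha) * a for l, a in zip(p_lyrics, p_audio)]
    return MOODS[max(range(len(fused)), key=fused.__getitem__)]

# Lyrics lean "sad", audio leans "happy"; fusion decides.
print(late_fusion([0.2, 0.7, 0.1], [0.5, 0.4, 0.1]))
```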
Effectiveness: hybrid (early fusion) vs. lyrics vs. audio vs. hybrid (late fusion)
Learning Curves
Audio vs. Lyrics
Hu & Downie (2010). When Lyrics Outperform Audio for Music Mood Classification: A Feature Analysis, ISMIR.
Top Lyric Features
Top Lyric Features in “Calm”
Top Affective Words
Other Textual Features Used in Music Mood Classification
• Based on SentiWordNet
  • assigns each WordNet synset three sentiment scores: positivity, negativity, objectivity
• Simple syntactic structures
  • negation, modifiers
• Lyric rhyme patterns (inspired by poetry)
• Contextual features (beyond lyrics)
  • social tags, blogs, playlists, etc.
Cross-cultural Mood Classification
• Tomorrow, Oral Session 1: Yang & Hu (2012). Cross-cultural Music Mood Classification: A Comparison on English and Chinese Songs, ISMIR
• Cross-cultural model applicability:
  - 23 mood categories based on AllMusic.com
  - train on songs in one culture and classify songs in the other
Summary of Categorical and Multimodal Approaches
• Natural language labels are intuitive to end users
• Based on supervised learning techniques
• Studies mostly focus on feature engineering
• Multimodal approaches improve performance
  • effectiveness and efficiency
• Cross-cultural mood classification: just started
• Challenges
  • ambiguity inherent in terms (Meyer’s “distortion”)
  • hierarchy of mood categories
  • connections between features and mood categories
Agenda
• Grand challenges on music affect
• Music affect taxonomy and annotation
• Automatic music affect analysis
  • Categorical approach
  • Multimodal approach
  • Dimensional approach
  • Temporal approach
• Beyond music
• Conclusion
Dimensional Approach
• What is the dimensional model, and why use it
• Computational model for dimensional music emotion recognition
• Issues
  • difficulty of emotion rating
  • subjectivity of emotion perception
  • context of music listening
  • usability of the UI
Categorical Approach: audio spectrum -> Hevner’s model (1936)
Dimensional Approach: audio spectrum -> the circumplex model (Russell, 1980)
What is the Dimensional Model?
• An alternative conceptualization of emotions, based on their placement along broad affective dimensions
• Obtained by analyzing “similarity ratings” of emotion words or facial expressions with factor analysis or multi-dimensional scaling
  • for example, Russell (1980) asked 343 subjects to describe their emotional states using 28 emotion words, and used four different methods to analyze the correlations between the emotion ratings
• Many studies identify similar dimensions
The Valence-Arousal (VA) Emotion Model
• Activation-Arousal: energy or neurophysiological stimulation level
• Evaluation-Valence: pleasantness; positive and negative affective states
(Russell, 1980)
More Dimensions
• The world of emotions is not 2-D (Fontaine et al., 2007)
• 3rd dimension: potency-control
  • feeling of power/weakness; dominance/submission
  • anger ↔ fear; pride ↔ shame; interest ↔ disappointment
• 4th dimension: predictability
  • surprise; stress ↔ fear; contempt ↔ disgust
• However, the 2-D model seems to work fine for music emotion
Why the Dimensional Model 1/3
• Free of emotion words
  • emotion words are not always precise and consistent
    • we often cannot find proper words to express our feelings
    • different people have different understandings of the words
  • emotion words are difficult to translate and might not exist with the exact same meaning in different languages (Russell, 1991)
• Semantic overlap between emotion categories
  • cheerful, happy, joyous, party/celebratory
  • melancholy, gloomy, sad, sorrowful
• Difficult to determine how many and which categories to use in a mood classification system
No Consensus on Mood Taxonomy in MIR
Work | # | Emotion description
Katayose et al. [icpr98] | 4 | gloomy, urbane, pathetic, serious
Feng et al. [sigir03] | 4 | happy, angry, fear, sad
Li et al. [ismir03], Wieczorkowska et al. [imtci04] | 13 | happy, light, graceful, dreamy, longing, dark, sacred, dramatic, agitated, frustrated, mysterious, passionate, bluesy
Wang et al. [icsp04] | 6 | joyous, robust, restless, lyrical, sober, gloomy
Tolos et al. [ccnc05] | 3 | happy, aggressive, melancholic+calm
Lu et al. [taslp06] | 4 | exuberant, anxious/frantic, depressed, content
Yang et al. [mm06] | 4 | happy, angry, sad, relaxed
Skowronek et al. [ismir07] | 12 | arousing, angry, calming, carefree, cheerful, emotional, loving, peaceful, powerful, sad, restless, tender
Wu et al. [mmm08] | 8 | happy, light, easy, touching, sad, sublime, grand, exciting
Hu et al. [ismir08] | 5 | passionate, cheerful, bittersweet, witty, aggressive
Trohidis et al. [ismir08] | 6 | surprised, happy, relaxed, quiet, sad, angry
Why the Dimensional Model 2/3
• Reliable and economical
  • only two variables (valence, arousal), instead of tens or hundreds of mood tags
  • easy to compare the performance of different systems
• Suitable for continuous measurements
  • emotions may change over time: a trajectory in the VA plane as time unfolds
• Emotion intensity
  • more precise and intuitive than emotion words (e.g., neutral -> angry -> very angry along increasing arousal)
Why the Dimensional Model 3/3
• A ready canvas for user interaction
  • emotion-based retrieval
  • song collection navigation
  • (example UI with three dimensions: valence, arousal, synthetic/acoustic)
Mapping Songs to the VA Space
• Assumption
  • view the VA space as a continuous, Euclidean space
  • view each point as an emotional state
• Goal
  • given a short music clip (e.g., 10 to 30 seconds)
  • automatically compute the pair of valence and arousal (VA) values that best quantifies (summarizes) the expressed emotion of the overall clip
• Research on time-dependent, second-by-second emotion recognition (emotion tracking) will be introduced in the next session
How to Predict Emotion Values 1/3
• Sol (A): divide the emotion space into several mood classes
  • for example, into 16 classes
• Pros
  • a standard classification problem: y = f(x), where x is a feature vector and y is a discrete label (1-16)
• Cons
  • poor granularity of the emotion space (not really VA values)
How to Predict Emotion Values 2/3
• Sol (B): further exploit the “geographic information” (Yang et al., 2006)
  • for example, perform binary classification for each quadrant
  • apply arithmetic operations to the probability estimates (u denotes the likelihood of each quadrant):
    Valence = u1 + u4 − u2 − u3
    Arousal = u1 + u2 − u3 − u4
• Pros: easy to compute
• Cons: lacks a theoretical foundation
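The arithmetic of Sol (B) in code, with made-up quadrant likelihoods (quadrant 1 = +V +A, 2 = −V +A, 3 = −V −A, 4 = +V −A):

```python
# Sketch of Sol (B): turn per-quadrant likelihoods into VA values.
# u1..u4 are (made-up) probability estimates for quadrants
# 1 (+V,+A), 2 (-V,+A), 3 (-V,-A), 4 (+V,-A).
def va_from_quadrants(u1, u2, u3, u4):
    valence = u1 + u4 - u2 - u3
    arousal = u1 + u2 - u3 - u4
    return valence, arousal

# A clip judged mostly quadrant 1 (happy/excited):
v, a = va_from_quadrants(0.6, 0.2, 0.1, 0.1)
print(v, a)
```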
How to Predict Emotion Values 3/3
• Sol (C): by means of regression (Yang et al., 2007, 2008; MacDorman et al., 2007; Eerola et al., 2009)
  • given features, predict a numerical value: one regressor for valence, one for arousal
    yv = fv(x), ya = fa(x), where x is a feature vector and yv, ya are both numerical values
• Pros
  • regression analysis is theoretically sound and well developed
  • many good off-the-shelf regression algorithms
• Cons
  • requires ground-truth “emotion values”
  • human subjects must be asked to “rate” the emotion values of songs
Linear Regression: Example
• Linear regression: f(x) = w^T x + b
• Possible (hypothesized) w for valence and arousal:
  • positive valence = consonant harmony & major mode
  • high arousal = loud loudness & fast tempo & high pitch
          loudness     tempo        pitch level  harmony                mode
          (loud/soft)  (fast/slow)  (high/low)   (consonant/dissonant)  (major/minor)
valence   0            0            0            1                      1
arousal   1            1            1            0                      0
• Nonlinear regression functions can also be used
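The hypothesized weight vectors above can be applied directly; a sketch with made-up feature values scaled to [0, 1] (the weights are the slide's illustrative 0/1 pattern, not learned coefficients):

```python
# Sketch of the hypothesized linear model f(x) = w.x + b from the
# slide: valence weights harmony and mode; arousal weights loudness,
# tempo, and pitch. Feature values are made up, in [0, 1].
FEATURES = ["loudness", "tempo", "pitch", "harmony", "mode"]
W_VALENCE = [0, 0, 0, 1, 1]   # consonant harmony & major mode
W_AROUSAL = [1, 1, 1, 0, 0]   # loud, fast, high-pitched

def linreg(w, x, b=0.0):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# A loud, fast, dissonant, minor-mode clip:
x = [0.9, 0.8, 0.7, 0.2, 0.1]
print(linreg(W_VALENCE, x), linreg(W_AROUSAL, x))  # low valence, high arousal
```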
Computational Framework
• Emotion annotation: obtain y for the training data
• Feature extraction: obtain x
• Regression model training: obtain w
• Automatic prediction: obtain y for the test data
(Training data -> emotion annotation (y) and feature extraction (x) -> regressor training (w); test data -> feature extraction (x) -> regressor -> predicted emotion value (y))
Feature Extraction: Get x
Extractor | Language | Features
Marsyas-0.2 | C | MFCC, LPCC, spectral properties (centroid, moment, flatness, crest factor)
MIR Toolbox | Matlab | spectral features, rhythm features, pitch, key clarity, harmonic change, mode
MA Toolbox | Matlab | MFCC, spectral histogram, periodic histogram, fluctuation pattern
PsySound | Matlab | psychoacoustic-model-based features (loudness, sharpness, roughness, virtual pitch, volume, timbre width, dissonance)
Rhythm pattern extractor | Matlab | rhythm pattern, beat histogram, tempo
EchoNest API | Python | timbre, pitch, loudness, key, mode, tempo
MPEG-7 audio encoder | Java | spectral properties, harmonic ratio, noise level, fundamental frequency type
Relevant Features
• Sound intensity, tempo, rhythm, pitch range, mode (major/minor), consonance
[Gomez and Danuser, 2007]
Example Matlab Code for Extracting MFCC Using the MA Toolbox
(the code takes 20 coefficients, discards the DC value, and takes the mean and standard deviation along time)
Emotion Annotation: Get y
• Rate the VA values of each song
  • ordinal rating scale
  • scroll bar
• Only the y for the training data needs to be annotated; the y for the test data can be automatically predicted by the regression model
Example System
• Data set (Yang et al., 2008)
  • 195 pop songs (Chinese, Japanese, and English)
  • each song is rated by 10+ subjects
  • ground truth is set by averaging
• Marsyas and PsySound are used to extract features
• Model learning (get w)
  • linear regression
  • AdaBoost.RT (nonlinear)
  • support vector regression (SVR) (nonlinear)
Y.-H. Yang, Y.-C. Lin, Y.-F. Su, and H.-H. Chen (2008). A regression approach to music emotion recognition, IEEE TASLP 16(2).
Performance Evaluation
• Evaluation metric: the R² statistic
  • squared correlation between the estimate and the ground truth
  • the higher the better
  • R² = 1 -> perfect fit; R² = 0 -> random guessing
• 10-fold cross validation
  • 9/10 of the data for training and 1/10 for testing
  • repeat 20 times and average the results
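Taking R² as the squared correlation between predicted and ground-truth values, as defined above, it is easy to compute; a minimal sketch with made-up value pairs:

```python
import math

# R^2 as the squared Pearson correlation between predicted and
# ground-truth emotion values (the definition used on the slide).
def r_squared(pred, truth):
    n = len(pred)
    mp, mt = sum(pred) / n, sum(truth) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, truth))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in truth))
    return (cov / (sp * st)) ** 2

# Made-up valence ratings and two sets of predictions
truth = [0.1, 0.4, 0.5, 0.9]
print(r_squared(truth, truth))                  # perfect fit
print(r_squared([0.2, 0.3, 0.6, 0.8], truth))   # a decent regressor
```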
Quantitative Result
• Result: SVR (nonlinear) performs the best
  • Feature selection with the RReliefF algorithm offers a further gain (valence: 0.254; arousal: 0.609)
• Valence is more difficult to model (it is more subjective)
  • Valence: 0.25 – 0.35
  • Arousal: 0.60 – 0.85

Method                                     R² of valence   R² of arousal
Linear regression                          0.109           0.568
Adaboost.RT [ijcnn04]                      0.117           0.553
SVR (support vector regression) [sc04]     0.222           0.570
SVR + RReliefF (feature selection) [ml03]  0.254           0.609
Qualitative Result
No No No Part 2 - Beyonce
All Of Me - 50 Cent
New York Giants - Big Pun
Why Do I Have To Choose - Willie Nelson
The Last Resort - The Eagles
Mammas Don't Let Your Babies Grow Up To Be Cowboys - Willie Nelson
Live For The One I Love - Celine Dion
If Only In The Heaven's Eyes - NSYNC
I've Got To See You Again - Norah Jones
Bodies - Sex Pistols
You're Crazy - Guns N' Roses
Out Ta Get Me - Guns N' Roses
Music Retrieval in VA Space
• Provides a simple means for a 2D user interface
  • Pick a point
  • Draw a trajectory
• Useful for mobile devices with small display space
⊳ Demo
[Figure: songs plotted over the valence and arousal axes]
Y.-H. Yang, Y.-C. Lin, H.-T. Cheng, and H.-H. Chen (2008) Mr. Emo: Music retrieval in the emotion plane, Proc. ACM Multimedia
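The "pick a point" interaction reduces to a nearest-neighbor query in the VA plane. A toy sketch (the song names and VA estimates are made up; a real system would use the regressor's predictions):

```python
import numpy as np

# Hypothetical (valence, arousal) estimates for a small collection
songs = {
    'song_a': (0.8, 0.6), 'song_b': (-0.5, 0.7),
    'song_c': (0.7, 0.5), 'song_d': (-0.6, -0.4),
}

def query_point(v, a, k=2):
    """Return the k songs whose estimated VA point is closest to (v, a)."""
    dist = {name: np.hypot(va[0] - v, va[1] - a) for name, va in songs.items()}
    return sorted(dist, key=dist.get)[:k]

print(query_point(0.8, 0.6))   # ['song_a', 'song_c']
```

Drawing a trajectory is the same query repeated at each point along the drawn path.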
How to Further Improve the Accuracy
• Larger dataset
• Use higher-level features
  • Articulation, pitch contour, melody direction, tonality (Gabrielsson and Lindström, 2010)
  • High-level music concepts (tags)
  • Lyrics
• Better understanding of human perception of valence
• Consider the correlation between arousal and valence
  • Output-associative relevance vector machine (Nicolaou et al., 2012)
• Exploit the temporal information
  • Long short-term memory neural networks (Weninger et al., 2011), or HMMs (hidden Markov models)
Issue 1: Difficulty of Emotion Annotation
• Rating emotion is difficult
  • User fatigue
  • Uniform ratings
• Low quality of the ground truth
• Difficult to create a large-scale dataset
[Figure: the same train/test pipeline, with the manual annotation step highlighted: annotations come from different users (user A, user B)]
AnnoEmo: GUI for Emotion Rating
• Encourages differentiation
• Click to listen again; drag & drop to modify an annotation
⊳ Demo
Y.-H. Yang, Y.-F. Su, Y.-C. Lin, and H.-H. Chen (2007) Music emotion recognition: The role of individuality, Proc. Int. Workshop on Human-Centered Multimedia
Sol: Ranking Instead of Rating
• Determines the position of a song by its relative ranking with respect to other songs
• Strengths
  • Ranking is easier than rating
  • Encourages differentiation
  • Avoids inconsistency
  • Enhances the quality of the ground truth
Oh Happy Day
I Want to Hold Your Hand by Beatles
I Feel Good by James Brown
What a Wonderful World by Louis Armstrong
Into the Woods by My Morning Jacket
The Christmas Song
C'est La Vie
Labita by Lisa One
Just the Way You Are by Billy Joel
Perfect Day by Lou Reed
When a Man Loves a Woman by Michael Bolton
Smells Like Teen Spirit by Nirvana
(relative ranking)
Ranking-Based Emotion Annotation
• Emotion tournament: requires only n–1 pairwise comparisons
  • The global ordering can later be approximated by a greedy algorithm [jair99]
• Use machine learning ("learning-to-rank") to train a model that ranks songs according to emotion
[Figure: tournament bracket over songs a, b, c, d, e, f, g, h: "Which song is more positive?"]
Y.-H. Yang and H.-H. Chen (2011) Ranking-based emotion recognition for music organization and retrieval, IEEE TASLP 19(4)
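The n–1 count comes from single-elimination: every comparison eliminates exactly one song. A toy sketch (the valence table stands in for an annotator's judgments):

```python
# "Emotion tournament" sketch: finding the most positive of n songs takes
# exactly n-1 pairwise comparisons. `more_positive` stands in for an annotator.
comparisons = 0

def more_positive(a, b, valence):
    global comparisons
    comparisons += 1
    return a if valence[a] >= valence[b] else b

def tournament(songs, valence):
    round_ = list(songs)
    while len(round_) > 1:
        nxt = []
        for i in range(0, len(round_) - 1, 2):
            nxt.append(more_positive(round_[i], round_[i + 1], valence))
        if len(round_) % 2:            # odd song out gets a bye
            nxt.append(round_[-1])
        round_ = nxt
    return round_[0]

valence = {'a': .1, 'b': .9, 'c': .3, 'd': .5, 'e': .2, 'f': .8, 'g': .4, 'h': .6}
winner = tournament('abcdefgh', valence)
print(winner, comparisons)   # b 7 -> n-1 comparisons for n = 8
```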
Issue 2: Emotion Perception is Subjective
• A song can be perceived differently by two people, especially for emotions in the 3rd and 4th quadrants
• This also explains why valence prediction is more challenging
[Figure examples: (a) Smells Like Teen Spirit, (b) A Whole New World, (c) The Rose, (d) Tell Laura I Love Her]
Subjectivity of Emotion Perception
Each circle represents the emotion annotation for a music piece by a subject
Sol: Personalized MER
• Some emotion annotations from the target user are needed to train a personalized model
• Not well studied so far
[Figure: the train/test pipeline extended with a personalization step: the regressor is adapted using the target user's interaction and feedback]
User Feedback: Personal Annotation
• Green cross: "universal" annotation
  • Obtained from a group of annotators
  • The blue ellipse shows the STD
• Red cross: "personal" annotation
Evaluation of Personalized Prediction

Method                  |Ф|=5    |Ф|=10   |Ф|=20   |Ф|=30
General regressor       0.1630   0.1635   0.1645   0.1639
Personalized regressor  0.1632   0.1671   0.1768*  0.1839*

* Significant improvement over the general method (p < 0.01)
Y.-H. Yang et al. (2009) Personalized music emotion recognition, ACM SIGIR
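One simple way to realize personalization is to refit the regressor with the target user's few annotations upweighted; this is a hedged sketch of the idea, not the exact method of the SIGIR'09 paper, and all data below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(200, 10)), np.ones((200, 1))])  # + intercept
y_general = X @ rng.normal(size=11)          # averaged "universal" annotations
user_idx = np.arange(20)                     # |Phi| = 20 songs rated by the user
y_user = y_general[user_idx] + 0.5           # this user rates uniformly higher

w_general, *_ = np.linalg.lstsq(X, y_general, rcond=None)

# Weighted least squares: the user's own ratings count `weight` times more
weight = 5.0
Xw = np.vstack([X, np.sqrt(weight) * X[user_idx]])
yw = np.concatenate([y_general, np.sqrt(weight) * y_user])
w_personal, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

# On the user's songs, the personalized model fits that user better
err_general = np.abs(X[user_idx] @ w_general - y_user).mean()
err_personal = np.abs(X[user_idx] @ w_personal - y_user).mean()
print(err_personal < err_general)   # True
```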
Issue 3: Context of Music Listening
• Listening mood/context
• Familiarity / associated memory
• Preference for the singer/performer/song
• Social context (alone, with friends, with strangers)
Issue 4: Usability of UI 1/2
• A new user may not be familiar with the meaning of valence and arousal => display mood tags
J.-C. Wang et al. (2012) Exploring the relationship between categorical and dimensional emotion semantics of music, Proc. MIRUM
Issue 4: Usability of UI 2/2
• Issues
  • How to automatically map emotion words to the VA space
  • How to "personalize" the above mapping: different people may have different interpretations of words
  • Let the user determine which emotion word is displayed: the word can serve as a shortcut for organizing the user's music collection over the VA space
• Some preliminary studies have been reported (Wang et al., 2012)
J.-C. Wang, Y.-H. Yang, K.-C. Chang, H.-M. Wang, and S.-K. Jeng (2012) Exploring the relationship between categorical and dimensional emotion semantics of music, Proc. MIRUM
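One naive way to place mood words in the VA plane is to average the VA estimates of the songs carrying each tag; this is an illustrative sketch with made-up data, not the method of Wang et al.:

```python
# Hypothetical VA estimates and tag assignments
song_va = {'s1': (0.8, 0.7), 's2': (0.6, 0.5), 's3': (-0.7, -0.3), 's4': (-0.5, -0.5)}
tag_songs = {'happy': ['s1', 's2'], 'sad': ['s3', 's4']}

# Each tag is placed at the mean VA point of its songs
tag_va = {tag: (round(sum(song_va[s][0] for s in ss) / len(ss), 2),
                round(sum(song_va[s][1] for s in ss) / len(ss), 2))
          for tag, ss in tag_songs.items()}
print(tag_va)   # {'happy': (0.7, 0.6), 'sad': (-0.6, -0.4)}
```

Personalizing the mapping would amount to recomputing these averages from the individual user's own annotations.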
Summary of Dimensional Approach
• The dimensional approach as an alternative to the categorical approach
  • Free of the ambiguity and granularity issues
  • A ready canvas for visualization, retrieval, and browsing
  • Easy to track emotion variation within a music piece
• Regression-based computational method
• Issues
  • Replace the difficult task of emotion rating by ranking
  • Address individual differences by personalization
  • Need to take user context into account
  • Enhance usability by displaying mood labels
Agenda
• Grand challenges on music affect
• Music affect taxonomy and annotation
• Automatic music affect analysis
  • Categorical approach
  • Multimodal approach
  • Dimensional approach
  • Temporal approach
• Beyond music
• Conclusion
Temporal Emotion Variation
• Emotion may change as a musical piece unfolds
  • The chorus is usually of greater emotional intensity than the verse
• Failing to account for the dynamic (time-varying) nature of music emotion may limit the performance of MER
• The combination of 'anticipation' and 'surprise' is important for our enjoyment of music (Huron, 2006)
Temporal Aspect is Usually Neglected
• A 10–30 second segment is often used to represent the whole song
  • To reduce the emotion variation within the segment
  • To lessen the burden of emotion annotation on the subjects
• The segment is selected
  • Manually, by picking the most representative part
  • By identifying the chorus section automatically
  • By selecting the middle 30 seconds
  • By selecting the [30, 60] segment
• Short-time features are pooled (e.g., by taking the mean and STD) over the whole segment, leading to a segment-level feature vector
Comparison
• Music emotion recognition
  • Emotion annotation: segment-level (10–30 sec)
  • Feature extraction: segment-level vector or bag-of-frames
  • Model training: segment-level
  • Prediction: a segment-level, static estimate that summarizes the emotion of the whole song
• Music emotion tracking
  • Emotion annotation: second-level (1–3 sec)
  • Feature extraction: second-level vector
  • Model training: second-level; makes use of the temporal relationship
  • Prediction: moment-to-moment "continuous" emotion variation within a song
[Figure: the regression-based train/test pipeline as before (annotation y, features x, regressor w)]
Continuous Response
• Continuous-response measurement captures moment-to-moment responses during listening
• In contrast, post-performance response assumes that music emotion can be understood by collecting affective responses after a musical stimulus has been sounded
• Duration neglect: post-performance (remembered) rating ≠ averaged continuous-response data (Duke and Colprit, 2001)
  • Post-performance ratings were generally higher (close to the "peak" or "end" experience, rather than the average)
  • Post-performance experience is information-poor; listeners are forced to "compress" their response into an overall impression
Issues
• The emotion of popular music may not change very much or very often (Schmidt and Kim, 2008)
• Collecting moment-to-moment responses is interruptive and labor-intensive
  • Gathering responses along 2 or more scales in the same pass may place excessive cognitive load on the participant
• Lag structure
  • There is a variable "reaction lag" between musical events and subject responses (Schubert, 2001)
  • Arousal responses follow the path of loudness with a delay of 2 to 4 seconds (Schubert, 2004)
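One simple way to estimate such a reaction lag is the peak of the cross-correlation between the stimulus and response series. A toy sketch with synthetic data (1 sample = 1 second, delay fixed at 3 s for illustration):

```python
import numpy as np

# If arousal follows loudness with a delay (Schubert reports 2-4 s), the
# delay shows up as the peak of the lagged correlation between the series.
rng = np.random.default_rng(0)
loudness = rng.normal(size=300)
lag_true = 3
arousal = np.roll(loudness, lag_true)        # arousal = loudness delayed by 3 s
arousal[:lag_true] = 0

corrs = [np.corrcoef(loudness[:-lag or None], arousal[lag:])[0, 1]
         for lag in range(10)]
print(int(np.argmax(corrs)))   # 3 -> the recovered delay in seconds
```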
Annotation Examples
• MoodSwings Turk Dataset (Schmidt and Kim, 2011)
  • 240 15-second clips annotated with Mechanical Turk
[Figure: example annotations for "Waitress" by Live and "Captain Jack" by Billy Joel]
Computational Approaches 1/2
• The VA approach: predict the emotion values every one or two seconds
• Correspondingly, participants are asked to rate the VA values every one or two seconds
  • It is easier to track the continuous changes of emotional expression by rating VA values than by labeling mood classes (Gabrielsson, 2002)
• Algorithms
  • Regression (neglects temporal info)
  • Time-series analysis (makes use of temporal info) (Schubert, 1999) (Korhonen et al., 2006) (Schmidt and Kim, 2010)
Time-Series Analysis
• Autoregression with extra inputs (ARX): learn the coefficient matrices A_k and B_k of
  y(t) = Σ_k A_k y(t−k) + Σ_k B_k u(t−k) + e(t)
  where y(t) is the valence (arousal) at time t, u(t) holds the music features (including lagged inputs), and e(t) is noise
• The A_k model the correlation between the output values (V & A); the B_k model the relationship between V, A and the input values u (the music features)
M. D. Korhonen, D. A. Clausi, and M. E. Jernigan (2006) Modeling emotional content of music using system identification, IEEE TSMC 36(3)
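A minimal single-output, single-feature ARX fit can be sketched as a least-squares problem; the system, its coefficients, and the lag order 2 below are all made up for illustration:

```python
import numpy as np

# ARX sketch: y(t) = a1*y(t-1) + a2*y(t-2) + b1*u(t-1) + b2*u(t-2) + noise,
# with the coefficients recovered by least squares on lagged data.
rng = np.random.default_rng(1)
T = 400
u = rng.normal(size=T)                       # input feature, e.g. loudness
y = np.zeros(T)
for t in range(2, T):                        # simulate a ground-truth system
    y[t] = 0.6 * y[t-1] - 0.2 * y[t-2] + 0.8 * u[t-1] + 0.3 * u[t-2] \
           + 0.01 * rng.normal()

# Build the lagged regression matrix and solve for [a1, a2, b1, b2]
Phi = np.column_stack([y[1:-1], y[:-2], u[1:-1], u[:-2]])
theta, *_ = np.linalg.lstsq(Phi, y[2:], rcond=None)
print(np.round(theta, 2))   # close to [0.6, -0.2, 0.8, 0.3]
```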
Block-wise Prediction
• Make a prediction for each short-time segment
  • Fixed-length sliding window (Yang et al., 2006)
  • Adaptive window (Lu et al., 2006)
[Figure: pipeline: detect emotion-change boundaries, segment, then make a prediction for each segment]
• Each segment contains a constant mood
• The minimum segment length is set to 16 seconds for classical music (Lu et al., 2006)
Subjective Issue
• Listener ratings collected continuously in response to music are notoriously diverse (Upham and McAdams, 2009)
[Figure: thirty audience members' ratings, in colour, of Experienced Emotional Intensity during a live performance of Mozart's Overture to Le nozze di Figaro, with the average rating for this population in black (Upham and McAdams, 2009)]
Computational Approaches 2/2
• The VA-Gaussian approach: model the emotion expression of each time instance as a Gaussian distribution, to take the subjectivity issue into account
  • Predict the time-dependent distribution parameters N(μ, Σ)
• Algorithms
  • Regression on the five parameters (Yang and Chen, 2011)
  • Kalman filter (Schmidt and Kim, 2010)
  • Acoustic Emotion Gaussians (Wang et al., 2012)
[Figure: tracked emotion distribution for "American Pie" (Don McLean)]
Wang et al. (2012) “The Acoustic Emotion Gaussians model for emotion-based music annotation and retrieval,” Proc. ACM MM
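The "five parameters" are concrete: two means, two variances, and one covariance per time step. A sketch of summarizing per-subject annotations at one time instance as such a Gaussian (the annotations are made-up toy data, not the AEG model itself):

```python
import numpy as np

# At each time instance, summarize the subjects' (valence, arousal)
# annotations as a 2-D Gaussian N(mu, Sigma)
annotations = np.array([            # one (valence, arousal) pair per subject
    [0.30, 0.70], [0.10, 0.60], [0.25, 0.80], [0.40, 0.65], [0.20, 0.75],
])

mu = annotations.mean(axis=0)                  # [mu_v, mu_a]
Sigma = np.cov(annotations.T)                  # 2x2: variances + covariance

# The five parameters a regressor would be trained to predict per time step
params = [mu[0], mu[1], Sigma[0, 0], Sigma[1, 1], Sigma[0, 1]]
print(np.round(params, 3))
```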
Application of Emotion Tracking
• Specify a trajectory to indicate the desired emotion variation within a musical piece
A. Hanjalic (2006) Extracting moods from pictures and sounds: Towards truly personalized TV, IEEE Signal Processing Magazine
Summary of Temporal Approach
• The emotion of music changes as time unfolds
• Post-performance rating ≠ averaged continuous response
• Computational methods for
  • Tracking the second-by-second emotion variation as a trajectory
  • Modeling emotion as a stochastic Gaussian distribution
• Issues
  • Difficulty of gathering reliable annotations (ranking is not useful here)
  • Lag structure in emotion perception
  • The effects of expectation and surprise
  • Subjective differences
[Book: Sweet Anticipation by David Huron]
Agenda
• Grand challenges on music affect
• Music affect taxonomy and annotation
• Automatic music affect analysis
  • Categorical approach
  • Multimodal approach
  • Dimensional approach
  • Temporal approach
• Beyond music
• Conclusion
Emotion in Images
• The International Affective Picture System (IAPS)
Emotions in Videos
• Need to depict the emotion variation along time (Zhang et al., 2009)
Emotion Recognition in Image/Video
• Image emotion recognition
  • Categorical approach
  • Dimensional approach
• Video emotion recognition
  • Temporal approach
  • Makes use of dynamic attributes such as motion intensity and shot change rate
Visual Features
Categorical Approach
• The "categorical" approach
  • Horror event detection in videos (Moncrieff et al., 2001)
  • Affect detection in movies (Kang, 2002)
• The colors 'yellow', 'orange', and 'red' correspond to 'fear' & 'anger'
• The colors 'blue', 'violet', and 'green' are found when the spectator feels 'high valence' and 'low arousal'

Class      startle   apprehension   surprise   climax
Precision  63%       89%            93%        80%

Class      fear   sadness   joy
Precision  81%    77%       78%
Dimensional/Temporal Approach
• Valence
  • Lighting key
  • Color (saturation, color energy)
  • Rhythm regularity
  • Pitch
• Arousal
  • Shot change rate
  • Motion intensity
  • Sound energy
  • Rhythm-based features
  • Tempo and beat strength
Application: Affective Video Recommendation
• iMTV (Zhang et al., 2010)
Application: Video Content Representation
(Hanjalic and Xu, 2005)
Explicit (Self-Report) vs. Implicit Tagging
• Focus on felt emotion
Bio-Sensors and Features
• Facial expression
• Blood volume pulse
• Respiration pattern
• Skin temperature
• Skin conductance
• EEG (electroencephalogram)
• EMG (electromyogram)
The DEAP Dataset 1/2
[Figure: music video recommendation system: the user's bodily responses (EEG/peripheral signals) reveal emotion and the user's taste, which drive the recommended music videos]
Koelstra et al. (2012) “DEAP: A Database for Emotion Analysis using Physiological Signals,” IEEE Trans. Affective Computing
• Both explicit and implicit tagging
The DEAP Dataset 2/2
[Figure: 40 one-minute music videos × 32 participants; recorded signals include EEG, peripheral physiological signals, and face video; each video is rated for valence, arousal, dominance, liking, and familiarity; multimedia content, EEG, and physiological features are used for correlation analysis and classification]
Correlation of EEG and Rating
Koelstra, “DEAP: A Database for Emotion Analysis using Physiological Signals”
[Figure: correlations between subjective ratings and EEG power in the theta, alpha, beta, and gamma bands]
Summary of Visual Emotion Recognition
• Commonalities
  • Similar emotion models and computational methods (categorical, dimensional, temporal)
  • Similar audio features
  • Similar results (valence is more difficult than arousal)
• Differences
  • Prefers the temporal approach
  • More focus on highlight extraction
  • More focus on the "implicit" tagging of emotion, especially the study of physiological signals
Agenda
• Grand challenges on music affect
• Affect categories and labels
• Music affect analysis
  • Categorical approach
  • Multimodal approach
  • Dimensional approach
  • Temporal approach
• Beyond music
• Conclusion
Related Books
Sweet Anticipation: Music and the Psychology of Expectation
The Oxford Handbook of Music Psychology
Music Emotion Recognition
Handbook of Music and Emotion: Theory, Research, Applications
Recent Surveys
• Y. E. Kim et al. (2010) "Music emotion recognition: A state of the art review," Proc. ISMIR
• Y.-H. Yang and H.-H. Chen (2012) "Machine recognition of music emotion: A review," ACM Trans. Intelligent Systems & Technology, 3(4)
• M. Barthet et al. (2012) "Multidisciplinary perspectives on music emotion recognition: Implications for content and context-based models," Proc. CMMR
• Z. Zeng et al. (2009) "A survey of affect recognition methods: Audio, visual, and spontaneous expressions," IEEE Trans. Pattern Analysis & Machine Intelligence, 31(1)
Conclusion