
Page 1

Emotion in Music: Task Overview

Anna Aljanaki¹, Mohammad Soleymani², Yi-Hsuan Yang³

¹Utrecht University, Netherlands  ²University of Geneva, Switzerland  ³Academia Sinica, Taiwan

16-17 October, MediaEval 2014

Page 2

Task definition

Description

- A benchmark for music emotion recognition systems (similar to, but distinct from, MIREX)

- Focusing on audio analysis (optionally, metadata)

Two subtasks

- Dynamic task (required): predict arousal and valence values for a song every 0.5 s.

- Feature design task: design new or rework existing audio features to estimate emotion for the whole 45 s musical excerpt or dynamically.

Page 3

Ground truth

Development set

- Collected for the Emotion in Music brave new task in 2013.
- 744 files.
- 10 annotators per file.

Test set

- Additional data collected in 2014.
- 1000 files.
- 10 annotators per file.

Page 4

Ground truth. Music

- 1744 musical excerpts of 45 seconds (randomly sampled) from the Free Music Archive (freemusicarchive.org).

- Curated music licensed under Creative Commons.
- Manually checked for quality.
- 10 genres: Rock, Pop, Electronic, Hip-Hop, Classical, Soul and RnB, Country, Folk, International, Jazz.

Page 5

Ground truth. Annotations.

Collecting annotations.

- Amazon Mechanical Turk (mturk.com).
- 10 Mechanical Turk workers annotated each song.
- We averaged the 10 annotations and provided to participants:
  - Continuous annotations of valence and arousal (1 label every 1/2 second).
  - Static annotations of valence and arousal for each file (independent from the continuous ones).
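As an illustration of the data layout, here is a minimal sketch of the averaging step, assuming each worker's continuous annotation has already been aligned to the common 2 Hz grid (the variable and function names are ours, not from the task):

```python
import numpy as np

# worker_curves: (10, n_frames) array, one valence (or arousal) value
# per worker per 0.5 s frame, aligned to a common 2 Hz grid
def released_ground_truth(worker_curves: np.ndarray) -> np.ndarray:
    """Average the 10 continuous worker annotations into one curve."""
    return worker_curves.mean(axis=0)
```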

Page 6

Ground truth. Annotations.

Worker Instructions on the Valence-Arousal Space

The workers were given the following instructions to introduce the valence-arousal space to them.

- Valence refers to the degree of positive or negative emotions one experiences from a given piece of music.
  - Positive valence: happiness, joy, excitement.
  - Negative valence: sadness, fear, anxiety, anger.
- Arousal refers to the intensity of the music clip.
  - High arousal: loud, energetic, emotionally engaging.
  - Low arousal: quiet, peaceful, repetitive.

Page 7

Ground truth. Annotations.

Annotation Interface (screenshot not reproduced in this transcript)

Page 8

Ground truth. Annotations.

Some statistics

- 250 out of 424 workers (59%) passed the qualification test.
- It took annotators 10.5 minutes on average to complete the task (3 songs), and we paid $0.40 per task.
- 99% of the time the song was unfamiliar to the annotator.
- In general, the annotators enjoyed the music (on a scale from 1 to 5, mean liking = 3.32 ± 1.22, median = 4).

Page 9

Ground truth. Annotations.

Static annotations.

A measure of inter-annotator agreement, Krippendorff's alpha:

- Valence: 0.22
- Arousal: 0.37
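For reference, a minimal sketch of Krippendorff's alpha for interval data, assuming a complete raters × items matrix with no missing values (the full coefficient also handles missing data, which this sketch does not):

```python
import numpy as np

def krippendorff_alpha_interval(ratings: np.ndarray) -> float:
    """Krippendorff's alpha for interval data, complete data assumed.

    ratings: (n_raters, n_items) matrix of static valence or arousal labels.
    """
    n_raters, n_items = ratings.shape
    # observed disagreement: squared differences between rater pairs within items
    d_obs = 0.0
    for j in range(n_items):
        col = ratings[:, j]
        d_obs += ((col[:, None] - col[None, :]) ** 2).sum() / (n_raters * (n_raters - 1))
    d_obs /= n_items
    # expected disagreement: squared differences between all values, pooled
    vals = ratings.ravel()
    n = vals.size
    d_exp = ((vals[:, None] - vals[None, :]) ** 2).sum() / (n * (n - 1))
    return 1.0 - d_obs / d_exp
```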

Page 10

Ground truth. Annotations.

Dynamic annotations.

A measure of inter-annotator agreement, Kendall's W, after discarding the first 15 seconds:

- Valence: 0.16 ± 0.11
- Arousal: 0.20 ± 0.13
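A minimal sketch of Kendall's W for one song, assuming the dynamic annotations form an annotators × frames matrix at 2 Hz and ignoring the correction for ties (names are ours):

```python
import numpy as np
from scipy.stats import rankdata

def kendalls_w(annotations: np.ndarray) -> float:
    """Kendall's coefficient of concordance for one song.

    annotations: (n_annotators, n_frames) continuous valence or arousal curves.
    """
    m, n = annotations.shape
    # rank each annotator's curve over time, then sum the ranks per frame
    ranks = np.apply_along_axis(rankdata, 1, annotations)
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

# discard the first 15 s (30 frames at 2 Hz) before computing, as in the task:
# w = kendalls_w(annotations[:, 30:])
```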

Page 11

Evaluation

Dynamic subtask evaluation

We use Pearson's correlation coefficient and RMSE as metrics, in the following steps:

1. Calculate Pearson's rho between predictions and ground truth for each song separately.

2. Average across songs separately for valence and for arousal.

3. Rank all submissions for each dimension based on the averaged rho.

4. In case the difference based on the one-sided Wilcoxon test is not significant (p > 0.05), we use RMSE to break the tie.

5. If the ranking changed, we run the significance test between neighbouring pairs again (as in bubble sort).

Feature design subtask evaluation

Same procedure, but Pearson's rho is calculated for all the songs in the test set at once.
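A minimal sketch of the core of this procedure for one pair of submissions, assuming predictions and ground truth are dictionaries mapping song IDs to per-frame arrays (the helper names are ours):

```python
import numpy as np
from scipy.stats import pearsonr, wilcoxon

def per_song_scores(preds: dict, truth: dict):
    """Step 1: Pearson's rho and RMSE per song."""
    rhos, rmses = [], []
    for sid in sorted(truth):
        p, t = np.asarray(preds[sid]), np.asarray(truth[sid])
        rhos.append(pearsonr(p, t)[0])
        rmses.append(np.sqrt(np.mean((p - t) ** 2)))
    return np.array(rhos), np.array(rmses)

def a_beats_b(rhos_a: np.ndarray, rhos_b: np.ndarray, alpha: float = 0.05) -> bool:
    """Steps 3-4: is A's per-song rho significantly higher than B's?

    When this returns False, the caller falls back to RMSE to break the tie.
    """
    _, p = wilcoxon(rhos_a, rhos_b, alternative='greater')
    return p < alpha
```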

Page 12

Baseline

The organizers decided not to submit runs and instead provide a simple baseline that participants should beat.

- Five features: spectral flux, HCDF (harmonic change detection function), loudness, roughness and zero crossing rate.

- Linear regression
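A minimal sketch of such a baseline using librosa and scikit-learn; it covers three of the five features (spectral flux via onset strength, loudness approximated by RMS, and zero crossing rate), since HCDF and roughness have no one-line librosa equivalent:

```python
import numpy as np
import librosa
from sklearn.linear_model import LinearRegression

def frame_features(path: str, hop_s: float = 0.5, sr: int = 22050) -> np.ndarray:
    """Per-0.5 s features: spectral flux, loudness (RMS), zero crossing rate."""
    y, sr = librosa.load(path, sr=sr)
    hop = int(hop_s * sr)
    flux = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop)
    rms = librosa.feature.rms(y=y, hop_length=hop)[0]
    zcr = librosa.feature.zero_crossing_rate(y, hop_length=hop)[0]
    n = min(len(flux), len(rms), len(zcr))
    return np.stack([flux[:n], rms[:n], zcr[:n]], axis=1)

# X: frame features stacked over the training songs,
# y: the corresponding per-frame arousal (or valence) labels
# model = LinearRegression().fit(X, y)
```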

Page 13

Results - Arousal

7 teams crossed the finish line; 6 teams beat the baseline (at least for arousal).

Dynamic task

Rank  Team              Arousal ρ     RMSE
1     TUMMISP           0.35 ± 0.45   0.10 ± 0.05
2     SAIL              0.28 ± 0.50   0.13 ± 0.07
3     UoA               0.21 ± 0.57   0.08 ± 0.05
4     Beatsens          0.23 ± 0.56   0.12 ± 0.05
5     Rainbow           0.18 ± 0.60   0.12 ± 0.07
6     THUHCSIL          0.17 ± 0.41   0.12 ± 0.05
7     Baseline          0.18 ± 0.36   0.14 ± 0.06
8     Average baseline  0             0.39 ± 0.03

Page 14

Results - Valence

Dynamic task

The teams highlighted in bold beat the baseline; the other teams are in the same rank as it.

Rank  Team              Valence ρ     RMSE
1     TUMMISP           0.20 ± 0.49   0.08 ± 0.05
2     Beatsens          0.12 ± 0.55   0.09 ± 0.05
3     SAIL              0.15 ± 0.50   0.10 ± 0.06
4     UoA               0.17 ± 0.50   0.14 ± 0.07
5     THUHCSIL          0.10 ± 0.37   0.09 ± 0.05
5     Rainbow           0.07 ± 0.29   0.10 ± 0.06
5     Baseline          0.11 ± 0.34   0.10 ± 0.06
6     Average baseline  0             0.34 ± 0.03

Page 15

Results

Only one team designed new features.

Feature design - static evaluation

       Arousal        Valence
       ρ²    RMSE     ρ²    RMSE
SAIL   0.53  0.32     0.28  0.27

Feature design - dynamic evaluation

       Arousal        Valence
       ρ     RMSE     ρ     RMSE
SAIL   0.22  0.12     0.11  0.09

Page 16

Results

Dynamic runs - Arousal (plot not reproduced in this transcript).

Page 17

Results

Dynamic runs - Valence (plot not reproduced in this transcript).

Page 18

Approaches

Beatsens

- 54 features from MIRToolbox.
- Annotations are modeled as a continuous conditional random field (CCRF) process.
- SVR is used as the base learner.
- Best performance is achieved by a combination of spectral, dynamic and rhythmic features, of which the most important were MFCCs.

Page 19

Approaches

SAIL

They designed three types of new features:

1. Compressibility features
2. Median Spectral Band Energy
3. Spectral Centre of Mass

They use Partial Least Squares Regression in combination with Haar coefficients to predict the dynamic ratings based on features from the whole song.
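A sketch of that general idea, assuming song-level feature vectors X and fixed-length 2 Hz rating curves Y, and using PyWavelets for the Haar transform (the function names are ours, not SAIL's):

```python
import numpy as np
import pywt
from sklearn.cross_decomposition import PLSRegression

def fit_pls_haar(X, Y, n_components: int = 8, level: int = 3):
    """Fit PLS from song-level features to Haar coefficients of rating curves."""
    coeff_lists = [pywt.wavedec(y, 'haar', level=level) for y in Y]
    sizes = [c.size for c in coeff_lists[0]]
    C = np.array([np.concatenate(c) for c in coeff_lists])
    pls = PLSRegression(n_components=n_components).fit(X, C)
    return pls, sizes

def predict_curves(pls, sizes, X_new):
    """Predict Haar coefficients, then invert the transform per song."""
    C_hat = pls.predict(X_new)
    curves = []
    for row in C_hat:
        parts, i = [], 0
        for s in sizes:
            parts.append(row[i:i + s])
            i += s
        curves.append(pywt.waverec(parts, 'haar'))
    return np.array(curves)
```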

Page 20

Acknowledgments