TRANSCRIPT
Emotionally-Controlled Music Synthesis
António Pedro Oliveira, Amílcar Cardoso
University of Coimbra, Portugal
12/12/2008
Outline
- Introduction
- Computational Model
- Feature Extraction
- Regression Models
- Conclusion
Introduction
Music is accepted as a language of emotional expression.
To control this expression automatically, we are developing a computational model that establishes relations between emotions and musical features.
Emotions are defined in 2 dimensions:
- Valence: degree of happiness (from very sad to very happy music)
- Arousal: degree of activation (from very relaxing to very activating music)
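As a minimal illustration of this two-dimensional representation, an emotional target could be expressed as a point in the valence-arousal plane. The class and value ranges below are hypothetical, not taken from the slides:

```python
from dataclasses import dataclass

@dataclass
class EmotionTarget:
    """A point in the 2-D affective plane (hypothetical representation)."""
    valence: float  # degree of happiness: -1.0 (very sad) .. 1.0 (very happy)
    arousal: float  # degree of activation: -1.0 (very relaxing) .. 1.0 (very activating)

# e.g. music that should sound happy but calm
target = EmotionTarget(valence=0.8, arousal=-0.4)
```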
Computational Model
- Use a database of MIDI music labelled with symbolic and audio features
- Model relations between emotions and music features with regression models
- Use these models to control the affective content of synthesized music
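Read as a pipeline, these three steps could be sketched as follows. Every name and signature here is hypothetical, for illustration only, not the authors' implementation:

```python
from typing import Callable, Dict, List, Tuple

FeatureVector = Dict[str, float]                    # e.g. {"loudness": 0.7, "tempo": 0.5}
LabelledPiece = Tuple[FeatureVector, float, float]  # (features, valence, arousal)

def train_regressors(database: List[LabelledPiece], fit: Callable):
    """Step 2: fit one regression model per affective dimension."""
    X = [features for features, _, _ in database]
    valence_model = fit(X, [v for _, v, _ in database])
    arousal_model = fit(X, [a for _, _, a in database])
    return valence_model, arousal_model

def select_features(target_valence: float, target_arousal: float,
                    candidates: List[FeatureVector], valence_model, arousal_model):
    """Step 3 (hypothetical strategy): pick the candidate feature set whose
    predicted emotion is closest to the target, to drive synthesis."""
    def distance(f: FeatureVector) -> float:
        return (abs(valence_model.predict(f) - target_valence)
                + abs(arousal_model.predict(f) - target_arousal))
    return min(candidates, key=distance)
```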
Computational Model – Experiments
- 96 MIDI pieces of film music, each lasting between 20 and 90 seconds
- 80 listeners labelled each affective dimension online with integer values between 0 and 10
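One plausible way to aggregate the 0-10 listener ratings into a single label per piece is shown below; the averaging and rescaling scheme is an assumption, not stated on the slide, and the rating values are synthetic:

```python
import statistics

# Hypothetical ratings for one piece (80 listeners in the study; truncated here).
valence_ratings = [7, 8, 6, 9, 7]   # synthetic example, not real data
arousal_ratings = [3, 2, 4, 3, 2]

# One common choice is the mean rating, rescaled from 0..10 to [0, 1].
valence_label = statistics.mean(valence_ratings) / 10.0
arousal_label = statistics.mean(arousal_ratings) / 10.0
print(valence_label, arousal_label)  # 0.74 0.28
```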
Feature Extraction
Build a music database of MIDI pieces labelled with symbolic and audio features
Feature Extraction – Correlation between audio features and valence
- Sharpness – ratio of high to bass frequencies
- Loudness – total energy
- Flatness – spectral distribution of energy
- Dissonance – perceptual interference of sinusoids
Feature Extraction – Correlation between audio features and arousal
- Similarity – temporal spectral correlation of energy distribution by frequency bands
- Dissonance – perceptual interference of sinusoids
- Sharpness – ratio of high to bass frequencies
- Energy – total energy
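As a rough illustration of how two of the audio features listed above might be computed (loudness as total spectral energy, sharpness as a high-to-low band energy ratio), here is a NumPy sketch. The 1.5 kHz band split and the exact formulas are assumptions, not the authors' definitions:

```python
import numpy as np

def loudness_and_sharpness(signal: np.ndarray, sr: int, split_hz: float = 1500.0):
    """Crude spectral loudness/sharpness estimates (illustrative formulas only)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2          # power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    loudness = spectrum.sum()                            # "total energy"
    low = spectrum[freqs < split_hz].sum()
    high = spectrum[freqs >= split_hz].sum()
    sharpness = high / (low + 1e-12)                     # high/bass energy ratio
    return loudness, sharpness

# Usage: one second of a 440 Hz tone has almost no energy above 1.5 kHz,
# so its sharpness estimate is near zero.
sr = 22050
t = np.arange(sr) / sr
print(loudness_and_sharpness(np.sin(2 * np.pi * 440 * t), sr))
```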
Feature Extraction – Correlation between audio and symbolic features
Bridge the gap between the audio and symbolic domains:
- Spectral similarity vs. note duration, inter-onset interval
- Spectral dissonance vs. prevalence of percussion instruments
Regression Models
- Establish weighted relations between emotions and musical features
- Use non-linear regression models
- Model with both symbolic and audio features
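The slides do not name a specific regressor. As one hedged example, a non-linear model such as scikit-learn's SVR with an RBF kernel could map feature vectors to a valence score; the data below is synthetic, for illustration only:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((96, 8))   # 96 pieces x 8 musical features (synthetic)
y = rng.random(96)        # valence labels in [0, 1] (synthetic)

# RBF-kernel SVR is one common non-linear regression choice; the authors'
# actual model family is not specified on this slide.
model = SVR(kernel="rbf", C=1.0).fit(X, y)
predicted_valence = model.predict(X[:1])
```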
Regression Models – Correlation between models and valence
- Best hybrid (audio and symbolic features) non-linear regression model – 84%
- Best symbolic linear regression model – 75%
- Best audio non-linear regression model – 61%
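These percentages read as correlation coefficients between model predictions and listener labels. Such a score could be computed as below, assuming Pearson correlation, which the slide does not state explicitly:

```python
import numpy as np

def correlation_pct(predicted: np.ndarray, labelled: np.ndarray) -> float:
    """Pearson correlation between model output and listener labels, as a %."""
    r = np.corrcoef(predicted, labelled)[0, 1]
    return 100.0 * r

# e.g. correlation_pct(model.predict(X_test), y_test) -> 84.0 for the best
# hybrid valence model (value from the slide, not computed here).
```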
Regression Models – Best audio and symbolic features for valence
Regression Models – Correlation between models and arousal
- Best hybrid (audio and symbolic features) non-linear regression model – 90%
- Best symbolic linear regression model – 84%
- Best audio non-linear regression model – 75%
Regression Models – Best audio and symbolic features for arousal
Conclusion
- Hybrid non-linear regression models outperformed symbolic linear regression models
- Non-linear models seem more appropriate than linear models
- Using features from both the audio and symbolic domains is more appropriate than using features from only one domain
- Timbre/sound can be used to control/influence the emotional expression