elen e4896 music signal processing lecture 11: chroma …dpwe/e4896/lectures/e4896-l11.pdf ·...
TRANSCRIPT
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
Lecture 11:Chroma and Chords
Dan EllisDept. Electrical Engineering, Columbia University
[email protected] http://www.ee.columbia.edu/~dpwe/e4896/1
1. Features for Music Audio2. Chroma Features3. Chord Recognition
ELEN E4896 MUSIC SIGNAL PROCESSING
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
1. Features for Music Audio
• Challenges of large music databaseshow to find “what we want”...
2
• Euclidean metaphormusic tracks as points in space
• What are the dimensions?“sound” - timbre, instruments → MFCCmelody, chords→ Chromarhythm, tempo→ Rhythmic bases
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
MFCCs• The standard feature for speech recognition
3
Logan 2000
0.25 0.255 0.26 0.265 0.27 time / s!0.5
0
0.5
0 1000 2000 3000 freq / Hz05
10
0 5 10 15 freq / Mel012
x 104
0 5 10 15 freq / Mel
050
100
0 10 20 30 quefrency!200
0
200
FFT X[k]
Mel scalefreq. warp
log |X[k]|
IFFT
Truncate
MFCCs
Sound
spectra
audspec
cepstra
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
MFCC Example• Resynthesize by imposing spectrum on noise
MFCCs capture instruments, not notes
4
freq
/ Hz
coef
ficie
ntfre
q / H
z
Let It Be - log-freq specgram (LIB-1)
300
1400
6000
MFCCs
2468
1012
time / sec
Noise excited MFCC resynthesis (LIB-2)
0 5 10 15 20 25
300
1400
6000
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
MFCC Artist Classification• 20 Artists x 6 albums each
train models on 5 albums, classify tracks from last
• Model as MFCC mean + covarianceper artist“single Gaussian” model20 (mean) + 10 x 19 (covariance) parameters55% correct(guessing ~5%)
5
Confusion: MFCCs (acc 55.13%)
ae be cr cu da de flga gr lem
am
e pr qu ra ro st su to u2
aerosmithbeatles
creedence_c_rcure
dave_matthews_bdepeche_modefleetwood_mac
garth_brooksgreen_day
led_zeppelinmadonnametallica
princequeen
radioheadroxette
steely_dansuzanne_vega
tori_amosu2
true
Ellis 2007
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
2. Chroma Features• What about modeling tonal content (notes)?
melody spottingchord recognitioncover songs...
• MFCCs exclude tonal content
• Polyphonic transcription is too harde.g. sinusoidal tracking: confused by harmonics
• Chroma features as solution...6
MID
I not
e nu
mbe
r
40
45
50
55
60
65
70
75
MID
I not
e nu
mbe
r
time / s22 24 26 28 30 32 34 36
40
45
50
55
60
65
70
75
RecognizedTrue
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
Chroma Features• Idea: Project all energy onto 12 semitones
regardless of octavemaintains main “musical” distinctioninvariant to musical equivalenceno need to worry about harmonics?
W(k) is weighting, B(b) selects every ~ mod127
C(b) =NM�
k=0
B(12 log2(k/k0)� b)W (k)|X[k]|
50 100 150 fft bin 2 4 6 8 time / sec 50 100 150 200 250 time / frame
freq
/ kHz
01234
chro
ma
A
CD
FG
chro
ma
A
CD
FG
Fujishima 1999
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
Better Chroma• Problems:
blurring of bins close to edgeslimitation of FFT bin resolution
• Solutions:peak picking - only keep energy at center of peaks
Instantaneous Frequency - high-resolution estimatesadapt tuning center based on histogram of pitches
8
2 4 6 8 time / sec 50 100 150 200 time / frame
freq /
kHz
01234
chro
ma
A
CD
FG
chro
ma
A
CD
FG
0 2000 freq / Hz
( )
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
Chroma Resynthesis• Chroma describes the notes in an octave
... but not the octave
• Can resynthesize by presenting all octaves... with a smooth envelope“Shepard tones” - octave is ambiguous
endless sequence illusion
9
0 500 1000 1500 2000 2500 freq / Hz-60-50-40-30-20-10
0
2 4 6 8 10 time / sec
freq
/ kH
z
leve
l / d
B
0
1
2
3
4Shepard tone resynth12 Shepard tone spectra
yb(t) =M�
o=1
W (o +b
12) cos 2o+ b
12 w0t
Ellis & Poliner 2007
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
Chroma Example• Simple Shepard tone resynthesis
can also reimpose broad spectrum from MFCCs
10
Let It Be - log-freq specgram (LIB-1)
Chroma features
CDEGAB
Shepard tone resynthesis of chroma (LIB-3)
MFCC-filtered shepard tones (LIB-4)
freq
/ Hz
300
1400
6000
freq
/ Hz
chro
ma
bin
300
1400
6000
freq
/ Hz
300
1400
6000
time / sec5202510150
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
Beat-Synchronous Chroma• Drastically reduce data size
by recording one chroma frame per beat
11
Let It Be - log-freq specgram (LIB-1)
Onset envelope + beat times
Beat-synchronous chroma
Beat-synchronous chroma + Shepard resynthesis (LIB-6)
freq
/ Hz
30014006000
freq
/ Hz
30014006000
CDEGAB
chro
ma
bin
time / sec0 5 10 15 20 25
Bartsch & Wakefield 2001
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
3. Chord Recognition• Beat synchronous chroma look like chords
can we transcribe them?
• Two approachesmanual templates (prior knowledge)learned models (from training data)
12
ACDEG
chro
ma
bin
time / sec0 5 10 15 20
C-E-
G
B-D
-G
A-C-
E
A-C-
D-F ...
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
Chord Recognition System• Analogous to speech recognition
Gaussian models of features for each chordHidden Markov Models for chord transitions
13
Audio
Labels
Beat track
Resample
Chroma100-1600 HzBPF
Chroma25-400 HzBPF
Root normalize
HMMViterbi
Counttransitions
Gaussian Unnormalize
beat-synchronouschroma features
chordlabels
24x24transition
matrix
24Gaussmodels
traintest
C D E G A BCDE
GAB
CDE
GAB
C D E G A B
C maj
c min
C D E F G A B c d e f g a bC
D
EF
G
A
Bc
d
ef
g
a
b
Sheh & Ellis 2003
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
HMMs• Hidden Markov Models are good for
inferring hidden statesunderlying Markov “generative model”each state has emission distribution
observationstell us somethingabout state...
infer smoothedstate sequence
14
1
2
3
0
0.2
0.4
0.6
0.8
0 10 20 300
0
1
2
3
0 1 2 3 40
0.2
0.4
0.6
0.8
observation x time step n
State sequenceEmission distributions
Observation sequence
x nx n
p(x|q)
p(x|q)
q = A q = B q = C
q = A q = B q = CAAAAAAAABBBBBBBBBBBCCCCBBBBBBBC
AS
EC
Bp(qn+1|qn)
S A B C E
0 1 0 0 0
0 0 0 0 1
0 .8 .1 .1 00 .1 .8 .1 00 .1 .1 .7 .1
S A B C E
qn
qn+1.8 .8
.7
.1
.1
.1.1.1
.1
.1
S A A A A A A A A B B B B B B B B B C C C C B B B B B B C E
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
HMM Inference• HMM defines emission distribution
and transition probabilities
• Likelihood of observed given state sequence:
15
p(x|q)p(qn|qn�1)
p({xn}|{qn}) =�
n
p(xn|qn)p(qn|qn�1)
q0 q1 q2 q3 q4
S A A A ES A A B ES A B B ES B B B E
.9 x .7 x .7 x .1 = 0.0441
.9 x .7 x .2 x .2 = 0.0252
.9 x .2 x .8 x .2 = 0.0288
.1 x .8 x .8 x .2 = 0.0128Σ = 0.1109 Σ = p(X | M) = 0.4020
2.5 x 0.2 x 0.1 = 0.052.5 x 0.2 x 2.3 = 1.152.5 x 2.2 x 2.3 = 12.650.1 x 2.2 x 2.3 = 0.506
0.00220.02900.36430.0065
S A B E
AS B E0.9• 0.1 • 0.7• 0.2 0.1• • 0.8 0.2• • • 1
All possible 3-emission paths Qk from S to Ep(Q | M) = Πn p(qn|qn-1) p(X | Q,M) = Πn p(xn|qn) p(X,Q | M)
Observation likelihoods
x1 x2 x3
AB
p(x|q)
q{ 2.5 0.2 0.10.1 2.2 2.3
A B ES0.9
0.1
0.7
0.2
0.1
0.8
0.2
Model M1 xn
n1
p(x|B)p(x|A)
2 3
Observations x1, x2, x3
S0 1 2 3 4
ABE
Stat
es
time n
0.1
0.9
0.80.2
0.7
0.8 0.2
0.2
0.70.1
Paths • By dynamic programming, we can also identify the best state sequence given just the observations
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
Key Normalization• Chord transitions depend on key of piece
dominant, relative minor, etc...
16
Aligned Global model
Taxman Eleanor Rigby I'm Only Sleeping
She Said She Said Good Day Sunshine And Your Bird Can Sing
Love You To Yellow Submarine
alig
ned
chro
ma
A
CD
FG
A C D F GA
CD
FG
A C D F GA
CD
FG
A C D F GA
CD
FG
A C D F GA
CD
FG
A C D F GA
CD
FG
A C D F GA
CD
FG
A C D F GA
CD
FG
A C D F GA
CD
FG
aligned chroma
• Chord transition probabilities should be key-relativeestimate main key of piecerotate all chroma features learn models
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
Chord Recognition• Often works:
• But only about 60% of the time
17
freq
/ Hz
Let It Be/06-Let It Be
C G A:min
A:m
in/b7
F:m
aj7F:
maj6 C G F C C G A:min
A:m
in/b7
F:m
aj7
240
761
2416
0 2 4 6 8 10 12 14 16 18
ABCDEFG
C G a F C G F C G a
Groundtruthchord
Audio
Recognized
Beat-synchronous
chroma
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
Summary• Music Audio Features
capture information useful for classification
• Chroma Features12 bins to robustly summarize notes
• Chord RecognitionSometimes easy, sometimes subtle
18
E4896 Music Signal Processing (Dan Ellis) 2013-04-08 - /18
References• B. Logan, “Mel frequency cepstral coefficients for music modeling,” in Proc. Int.
Symp. Music Inf. Retrieval ISMIR, Plymouth, September 2000.
• D. Ellis, “Classifying Music Audio with Timbral and Chroma Features,” in Proc. Int. Symp. Music Inf. Retrieval ISMIR-07, pp. 339-340, Vienna, October 2007.
• T. Fujishima, “Realtime chord recognition of musical sound: A system using common lisp music,” In Proc. Int. Comp. Music Conf., pp. 464–467, Beijing, 1999.
• D. Ellis and G. Poliner, “Identifying Cover Songs With Chroma Features and Dynamic Programming Beat Tracking,” Proc. ICASSP-07, pp. IV-1429-1432, Hawai'i, April 2007.
• M. A. Bartsch and G. H. Wakefield, “To catch a chorus: Using chroma-based representations for audio thumbnailing,” in Proc. IEEE WASPAA, Mohonk, October 2001.
• A. Sheh and D. Ellis, “Chord Segmentation and Recognition using EM-Trained Hidden Markov Models,” Int. Symp. Music Inf. Retrieval ISMIR-03, pp. 185-191, Baltimore, October 2003.
19