Hierarchical Bayesian Models for Audio and Music Signal Processing
A. Taylan Cemgil
Signal Processing and Communications Lab
8 December 2007, NIPS 07 Workshop on Music
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007
Collaborators
• Onur Dikmen, Bogazici, Istanbul
• Paul Peeling, Cambridge
• Nick Whiteley, Cambridge
• Simon Godsill, Cambridge
• Cedric Fevotte, ENST Paris Telecom
• David Barber, UCL London
• Bert Kappen, Nijmegen, The Netherlands
Statistical Approaches
• Probabilistic
• Hierarchical signal models that incorporate prior knowledge/inspiration from various sources
– Physics (acoustics, physical models, ...)
– Studies of human cognition and perception (masking, psychoacoustics, ...)
– Musicology (musical constructs: harmony, tempo, form, ...)
• Consistent framework for developing inference algorithms
• Contrast to traditional/procedural approaches, where there is no clear distinction between “what” and “how”
• Need to overcome computational obstacles (time, memory)
Generative Models for audition
• Computer audition ⇔ inverse synthesis via Bayesian inference
$$p(\text{Structure} \mid \text{Observations}) \propto p(\text{Observations} \mid \text{Structure})\, p(\text{Structure})$$
Goal: developing flexible prior structures for modelling nonstationary sources
∗ source separation, transcription
∗ restoration, interpolation, localisation, identification
∗ coding, compression, resynthesis, cross synthesis
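Bayesian Source Separation
• Joint estimation of sources given observations
[Figure: graphical model. Source model: sources $s_{k,1}, \ldots, s_{k,n}, \ldots, s_{k,N}$ with source-prior parameters $v$. Observation model: observations $x_{k,1}, \ldots, x_{k,M}$, $k = 1, \ldots, K$, with channel, noise and mixing-system parameters $\lambda$.]
$$p(\text{Src} \mid \text{Obs}) \propto \int d\lambda\, dv\; p(\text{Obs} \mid \text{Src}, \lambda)\, p(\text{Src} \mid v)\, p(v)$$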
Audio Restoration/Interpolation
• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis
[Figure: audio signal with missing samples, samples 0 to 500]
Polyphonic Music Transcription
• from sound
[Figure: spectrogram of the recording; time t (sec) vs. frequency f (Hz), 0 to 5000 Hz]
• to score
Modelling and Computational issues
• Hierarchical
– Signal level: pitch, onsets, timbre
– Symbolic level: melody, motives, harmony, chords, tonality, rhythm, beat, tempo, articulation, instrumentation, voice
– Cognitive level: expression, genre, form, style, mood, emotion
• Uncertainty
– Parameter learning: which pitch, rhythm, tempo, meter, time signature?
– Model selection: how many notes, harmonics, onsets, sections?
Generative Models for Music
Generative Models for Music
[Figure: generative hierarchy with the levels Score, Expression, Piano-Roll and Signal]
Hierarchical Modeling of Music
[Figure: hierarchical graphical model unrolled over time slices $1, 2, \ldots, t$, with chains of variables $v$, $k$, $h$, $m$, $g_j$, $r_j$, $n_j$, $x_j$, $y_j$ and observations $y_1, \ldots, y_t$]
Modelling levels
• Physical - acoustical
• Time domain – state space dynamical models
• Transform domain – Fourier representations, Generalised Linear model
• Feature Based
Research Questions
What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?
Signal Models for Audio
• Time domain – state space dynamical models
– Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA), switching state space models
– Flexible, physically realistic
– Analysis down to sample precision; computationally quite heavy
• Transform domain – Fourier representations, Generalised Linear model
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT, ...)
– Inherent limitations (analysis windows, frequency resolution)
Sinusoidal Modeling
• Sound is primarily about oscillations and resonance
• Cascade of second-order systems
• Audio signals can often be compactly represented by sinusoids
(real) $$y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$$
(complex) $$y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n$$
$$y = F(\gamma_{1:p}, \omega_{1:p})\, c$$
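Once the decay rates $\gamma_{1:p}$ and frequencies $\omega_{1:p}$ are fixed, $y = F(\gamma_{1:p}, \omega_{1:p})\,c$ is linear in the amplitudes, so $c$ can be found by least squares. The NumPy sketch below is illustrative only, and its function names are hypothetical rather than taken from the talk. A real signal needs each pole together with its complex conjugate, so the sketch augments $F$ with the mirrored frequencies $-\omega_k$. The recovered amplitudes should then equal $\alpha_k e^{j\phi_k}/2$.

```python
import numpy as np

def damped_sinusoids(alpha, gamma, omega, phi, n):
    """Real form: y_n = sum_k alpha_k exp(-gamma_k n) cos(omega_k n + phi_k)."""
    n = np.asarray(n, dtype=float)[:, None]               # column vector, broadcasts over k
    return np.sum(alpha * np.exp(-gamma * n) * np.cos(omega * n + phi), axis=1)

def basis_matrix(gamma, omega, n):
    """F(gamma, omega): column k is the sequence (exp(-gamma_k + j omega_k))^n."""
    n = np.asarray(n, dtype=float)[:, None]
    return np.exp((-gamma + 1j * omega) * n)

def fit_amplitudes(y, gamma, omega, n):
    """Least-squares amplitudes for fixed gamma and omega.

    A real signal needs every pole together with its conjugate, so the basis
    is augmented with the mirrored frequencies -omega.
    """
    F = np.hstack([basis_matrix(gamma, omega, n), basis_matrix(gamma, -omega, n)])
    c, *_ = np.linalg.lstsq(F, np.asarray(y, dtype=complex), rcond=None)
    p = len(gamma)
    return c[:p], c[p:]                                    # amplitudes at +omega and at -omega

n = np.arange(200)
alpha, phi = np.array([1.0, 0.5]), np.array([0.3, -1.2])
gamma, omega = np.array([0.01, 0.005]), np.array([0.3, 0.7])
y = damped_sinusoids(alpha, gamma, omega, phi, n)
c_pos, c_neg = fit_amplitudes(y, gamma, omega, n)
print(np.allclose(c_pos, alpha * np.exp(1j * phi) / 2))   # True: c_k = alpha_k e^{j phi_k} / 2
print(np.allclose(c_neg, np.conj(c_pos)))                  # True: conjugate-symmetric amplitudes
```

Estimating $\gamma$ and $\omega$ themselves remains a nonlinear problem. The linearity in $c$ is what makes a Gaussian prior on the amplitudes convenient.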
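State space Parametrisation
$$x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix}$$
$$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}}_{C} x_n$$
[Figure: Markov chain $x_0 \to x_1 \to \cdots \to x_{k-1} \to x_k \to \cdots \to x_K$ with observations $y_1, \ldots, y_{k-1}, y_k, \ldots, y_K$]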
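One quick check of the state space form is to run the recursion and compare it with the direct sum $\sum_k c_k (e^{-\gamma_k + j\omega_k})^n$. The sketch below does this with NumPy, and its names are hypothetical. Because $A$ is diagonal, $y_n = C A^n x_0$ reproduces the complex sinusoidal model exactly. Adding process and observation noise to the same recursion would give a model that a Kalman filter can handle.

```python
import numpy as np

def simulate_state_space(c, gamma, omega, num_samples):
    """Run x_{n+1} = A x_n, y_n = C x_n with A = diag(exp(-gamma_k + j omega_k)), C = (1, ..., 1)."""
    A = np.diag(np.exp(-gamma + 1j * omega))
    C = np.ones(len(c))
    x = np.asarray(c, dtype=complex)          # x_0 = (c_1, ..., c_p)
    y = np.empty(num_samples, dtype=complex)
    for n in range(num_samples):
        y[n] = C @ x                          # y_n = C x_n
        x = A @ x                             # x_{n+1} = A x_n
    return y

c = np.array([1.0 + 0.5j, 0.3 - 0.2j])
gamma, omega = np.array([0.02, 0.01]), np.array([0.4, 1.1])
n = np.arange(50)
y_ss = simulate_state_space(c, gamma, omega, len(n))
y_direct = np.exp((-gamma + 1j * omega) * n[:, None]) @ c    # sum_k c_k (exp(-gamma_k + j omega_k))^n
print(np.allclose(y_ss, y_direct))                            # True
```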
State Space Parametrisation
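Audio Interpolation
$$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\; p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$$
$H \equiv$ (parameters, hidden states)
[Figure: $H$ is the common parent of the missing samples $x_{\neg\kappa}$ and the observed samples $x_\kappa$; example signal over samples 0 to 500]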
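With $H$ held fixed and a Gaussian prior on the signal, the integral above reduces to Gaussian conditioning. The posterior mean of the missing block is then $K_{\neg\kappa,\kappa} K_{\kappa\kappa}^{-1} x_\kappa$. For illustration only, the sketch below assumes a covariance $\rho^{|n-m|}\cos(\omega(n-m))$, which is the covariance of a single damped resonance, with known $\rho$ and $\omega$ and a small observation noise. In the full model these quantities would belong to $H$ and be integrated out. The function names are hypothetical.

```python
import numpy as np

def resonance_cov(num_samples, rho, omega, variance=1.0):
    """Assumed prior covariance K[n, m] = variance * rho^|n-m| * cos(omega (n - m))."""
    lag = np.subtract.outer(np.arange(num_samples), np.arange(num_samples))
    return variance * rho ** np.abs(lag) * np.cos(omega * lag)

def interpolate(x, observed, K, noise_var=1e-2):
    """Posterior mean of the missing samples given the observed ones, with H held fixed."""
    obs, mis = np.flatnonzero(observed), np.flatnonzero(~observed)
    K_oo = K[np.ix_(obs, obs)] + noise_var * np.eye(len(obs))   # observed block plus noise
    K_mo = K[np.ix_(mis, obs)]                                  # missing-observed cross-covariance
    x_hat = np.array(x, dtype=float)
    x_hat[mis] = K_mo @ np.linalg.solve(K_oo, x_hat[obs])
    return x_hat

n = np.arange(200)
omega = 0.2
x_true = np.cos(omega * n)
observed = np.ones(len(n), dtype=bool)
observed[80:100] = False                                        # a gap of 20 missing samples
x_hat = interpolate(np.where(observed, x_true, 0.0), observed, resonance_cov(len(n), 0.99, omega))
err_interp = np.sum((x_hat - x_true)[~observed] ** 2)
err_zero = np.sum(x_true[~observed] ** 2)                       # error if the gap is left at zero
print(err_interp, err_zero)
```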
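Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
[Figure: for each band $\nu = 0, \ldots, W - 1$, a chain $s^\nu_0 \cdots s^\nu_k \cdots s^\nu_{K-1}$ with parameters $A^\nu$, $Q^\nu$; the bands jointly generate the observations $x_0, \ldots, x_k, \ldots, x_{K-1}$]
$$s^\nu_k \sim \mathcal{N}(s^\nu_k; A^\nu s^\nu_{k-1}, Q^\nu), \qquad A^\nu \sim \mathcal{N}\left(A^\nu; \begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix}, \Psi\right)$$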
Inference: Structured Variational Bayes
[Figure: structured variational approximation with factors $q(A^\alpha)$, $q(Q^\alpha)$ and $\prod_k q(s^\alpha_k \mid s^\alpha_{k-1})$ for each band $\alpha \in C$, and $q(x_k)$ for the signal]
• Intuitive algorithm
– Subtract from the observed signal $x$ the prediction of the frequency bands in $\neg\alpha$ (all bands other than $\alpha$)
– Compute a fit for $\alpha$ to this residual and iterate
• For fixed $A$, $Q$ this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
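The subtract-and-refit loop can be illustrated on a static stand-in. Each band contributes $C_\alpha s_\alpha$ with a Gaussian prior $s_\alpha \sim \mathcal{N}(0, \lambda^{-1} I)$, and the observation adds Gaussian noise of variance $r$. To keep the sketch short, the dynamics are dropped, which is an assumption. Under this setup, refitting one band to the residual is exactly one block step of Gauss-Seidel on the joint normal equations, so the loop should converge to the joint posterior mean. The bases and names below are hypothetical.

```python
import numpy as np

def block_matrix(B, noise_var, prior_prec):
    """Posterior precision of one block: B^T B / r + lambda I."""
    return B.T @ B / noise_var + prior_prec * np.eye(B.shape[1])

def joint_posterior_mean(blocks, x, noise_var, prior_prec):
    """Solve the joint normal equations for all bands at once."""
    C = np.hstack(blocks)
    return np.linalg.solve(block_matrix(C, noise_var, prior_prec), C.T @ x / noise_var)

def backfit(blocks, x, noise_var, prior_prec, num_sweeps=200):
    """Subtract the other bands' prediction, refit band alpha to the residual, repeat.

    Each refit solves one block row of the joint system with the other blocks
    held at their latest values, i.e. one block Gauss-Seidel step.
    """
    s = [np.zeros(B.shape[1]) for B in blocks]
    for _ in range(num_sweeps):
        for a, B in enumerate(blocks):
            others = sum(Bb @ sb for b, (Bb, sb) in enumerate(zip(blocks, s)) if b != a)
            residual = x - others
            s[a] = np.linalg.solve(block_matrix(B, noise_var, prior_prec), B.T @ residual / noise_var)
    return np.concatenate(s)

rng = np.random.default_rng(0)
blocks = [rng.standard_normal((50, 3)) for _ in range(3)]       # hypothetical per-band bases C_alpha
x = sum(B @ rng.standard_normal(3) for B in blocks) + 0.3 * rng.standard_normal(50)
s_joint = joint_posterior_mean(blocks, x, noise_var=0.1, prior_prec=1.0)
s_bf = backfit(blocks, x, noise_var=0.1, prior_prec=1.0)
print(np.max(np.abs(s_bf - s_joint)))                           # close to zero after convergence
```

The joint system is symmetric positive definite, which is what guarantees convergence of the sweeps. The speed of convergence depends on how strongly the bands overlap.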
Restoration
• Piano
– Signal with missing samples (37)
– Reconstruction, 7.68 dB improvement
– Original
• Trumpet
– Signal with missing samples (37)
– Reconstruction, 7.10 dB improvement
– Original
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
[Figure: for each $\nu = 1, \ldots, W$, an indicator chain $r^\nu_0 \cdots r^\nu_k \cdots r^\nu_K$ driving a state chain $\theta^\nu_0 \cdots \theta^\nu_k \cdots \theta^\nu_K$; all components project onto the observations $y_k, \ldots, y_K$]
• Generalises source-filter models
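Harmonic model with changepoints
$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$
$$\theta_k \mid \theta_{k-1}, r_k \sim [r_k = 0]\, \underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\, \underbrace{\mathcal{N}(0, S)}_{\text{new}}$$
$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$
$$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^{N}, \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}$$
with damping factor $0 < \rho_k < 1$, frame length $N$, and a damped sinusoidal basis matrix $C$ of size $N \times 2H$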
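The NumPy sketch below draws frames from the harmonic changepoint model. Several details are assumptions that the slide does not fix:

- The changepoint probability does not depend on $r_{k-1}$, which simplifies $p(r_k \mid r_{k-1})$.
- $S$, $Q$ and $R$ are isotropic.
- Row $n$ of $C$ is the first row of $G_\omega^{hn}$ for each harmonic $h$, so that $C\theta_k$ is a damped sum of $H$ harmonics.

With this choice, $A$ advances every harmonic by exactly $N$ samples. Frames without a changepoint therefore join up into one continuous damped harmonic signal. All function names are hypothetical.

```python
import numpy as np

def damped_rotation(omega, rho):
    """G_omega = rho * [[cos w, -sin w], [sin w, cos w]]."""
    c, s = np.cos(omega), np.sin(omega)
    return rho * np.array([[c, -s], [s, c]])

def transition_matrix(omega, rho, H, N):
    """A = blkdiag(G_omega, G_omega^2, ..., G_omega^H)^N: advance harmonic h by N samples."""
    A = np.zeros((2 * H, 2 * H))
    for h in range(1, H + 1):
        A[2*h-2:2*h, 2*h-2:2*h] = np.linalg.matrix_power(damped_rotation(omega, rho), h * N)
    return A

def basis_matrix(omega, rho, H, N):
    """Assumed N x 2H basis: row n holds the first row of G_omega^(h n) for each harmonic h."""
    C = np.zeros((N, 2 * H))
    for n in range(N):
        for h in range(1, H + 1):
            C[n, 2*h-2:2*h] = np.linalg.matrix_power(damped_rotation(omega, rho), h * n)[0]
    return C

def sample_frames(K, omega, rho, H, N, p_change, S_std, Q_std, R_std, rng):
    """Draw r_{1:K} and frames y_{1:K}; r_1 = 1 so the first state comes from the 'new' branch."""
    A, C = transition_matrix(omega, rho, H, N), basis_matrix(omega, rho, H, N)
    theta, r, y = np.zeros(2 * H), np.zeros(K, dtype=int), np.zeros((K, N))
    for k in range(K):
        r[k] = 1 if k == 0 else int(rng.random() < p_change)
        if r[k] == 1:
            theta = S_std * rng.standard_normal(2 * H)               # new: N(0, S), S = S_std^2 I
        else:
            theta = A @ theta + Q_std * rng.standard_normal(2 * H)   # reg: N(A theta, Q), Q = Q_std^2 I
        y[k] = C @ theta + R_std * rng.standard_normal(N)            # y_k ~ N(C theta_k, R), R = R_std^2 I
    return r, y

omega, rho, H, N, K = 0.3, 0.999, 3, 32, 4
r, y = sample_frames(K, omega, rho, H, N, p_change=0.2, S_std=1.0, Q_std=0.01, R_std=0.05,
                     rng=np.random.default_rng(0))
print(r, y.shape)
```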
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
– Conditional Gaussians are not closed under marginalization
⇒ Unlike in HMMs or KFMs (Kalman filter models), summing over $r_k$ does not simplify the filtering density
⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time/space
– Intuition: when a changepoint occurs, the state vector $\theta$ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, O Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)
[Figure: example with $r_1 = 1$, $r_2 = 0$, $r_3 = 0$, $r_4 = 1$, $r_5 = 0$; states $\theta_0, \ldots, \theta_5$; observations $y_1, \ldots, y_5$]
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
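The polynomial-size exact filter can be sketched for a scalar stand-in of the model:

- $\theta_k = a\theta_{k-1} + \text{noise}$ when $r_k = 0$.
- $\theta_k \sim \mathcal{N}(0, s)$ when $r_k = 1$.
- $y_k \sim \mathcal{N}(\theta_k, R)$.

For brevity, the sketch assumes a constant changepoint probability (the hazard) and forces $r_1 = 1$. Because a changepoint resets the state, the filter keeps one Gaussian kernel per possible time of the last changepoint. The brute-force function enumerates all $2^{K-1}$ indicator sequences and should return the same evidence $\log p(y_{1:K})$. The names and parameters are hypothetical.

```python
import numpy as np
from itertools import product

def log_normal(y, mean, var):
    """log N(y; mean, var), elementwise."""
    return -0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)

def logsumexp(v):
    m = np.max(v)
    return m + np.log(np.sum(np.exp(v - m)))

def changepoint_filter(y, a, q, s, R, hazard):
    """Exact filtering with one Gaussian kernel per possible time of the last changepoint.

    Returns log p(y_{1:K}) and the final number of kernels (K, rather than 2^(K-1)).
    """
    logw, mean, var = np.array([0.0]), np.array([0.0]), np.array([s])   # r_1 = 1: theta_1 ~ N(0, s)
    for k, yk in enumerate(y):
        if k > 0:
            reset = np.log(hazard) + logsumexp(logw)           # r_k = 1: all histories merge into one kernel
            logw = np.append(logw + np.log1p(-hazard), reset)  # r_k = 0: every kernel continues
            mean = np.append(a * mean, 0.0)
            var = np.append(a * a * var + q, s)
        logw = logw + log_normal(yk, mean, var + R)             # weight by the predictive density of y_k
        gain = var / (var + R)                                  # Kalman update of every kernel
        mean, var = mean + gain * (yk - mean), (1 - gain) * var
    return logsumexp(logw), len(logw)

def brute_force(y, a, q, s, R, hazard):
    """Enumerate all 2^(K-1) indicator sequences, one Kalman filter per sequence."""
    terms = []
    for tail in product([0, 1], repeat=len(y) - 1):
        logp = sum(np.log(hazard) if rk else np.log1p(-hazard) for rk in tail)
        m, v = 0.0, s
        for k, yk in enumerate(y):
            if k > 0:
                m, v = (0.0, s) if tail[k - 1] else (a * m, a * a * v + q)
            logp += log_normal(yk, m, v + R)
            g = v / (v + R)
            m, v = m + g * (yk - m), (1 - g) * v
        terms.append(logp)
    return logsumexp(np.array(terms))

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(2.0, 0.3, 4), rng.normal(-1.0, 0.3, 4)])   # one jump in the mean
params = dict(a=1.0, q=0.01, s=4.0, R=0.1, hazard=0.2)
loglik, num_kernels = changepoint_filter(y, **params)
print(loglik, brute_force(y, **params), num_kernels)    # the two evidences agree; 8 kernels for K = 8
```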
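Monophonic model (Cemgil et al. 2006)
• We introduce a pitch label indicator $m$
• At each time $k$ the process can be in one of the $\{\text{mute}, \text{sound}\} \times M$ states
[Figure: graphical model with indicator chain $r_0, \ldots, r_T$, pitch labels $m_0, \ldots, m_T$, states $s_0, \ldots, s_T$ and observations $y_1, \ldots, y_T$]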
Monophonic Pitch Tracking
Monophonic pitch tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$
[Figure: example signal and filtering result over 1000 samples]
• If the pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
[Figure: transcription obtained by exact inference]
Tracking Pitch Variations
• Allow $m$ to change with $k$
• Intractable: we need to resort to approximate inference (Mixture Kalman Filter / Rao-Blackwellized Particle Filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: latent piano roll over frequency index $\nu$ and time $k$, and the generated signal $x_k$]
• Each latent changepoint process $\nu = 1, \ldots, W$ corresponds to a “piano key”; the indicators $r_{1:W, 1:K}$ encode a latent “piano roll”
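Single time slice - Bayesian Variable Selection
$$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$$
$$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$$
$$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R), \qquad C \equiv [\, C_1 \; \cdots \; C_i \; \cdots \; C_W \,]$$
[Figure: indicators $r_1, \ldots, r_W$ select the coefficients $s_1, \ldots, s_W$, which generate $x$]
• Generalized linear model: the columns of $C$ are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When $W$ is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman, Attias, ...)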
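For small $W$ the exact posterior over on/off patterns can be computed by enumeration. After integrating out $s$, each pattern $r$ gives $x \mid r \sim \mathcal{N}(0, R + \sum_{i\,\text{on}} C_i \Sigma C_i^\top)$. The sketch below assumes one scalar coefficient per column ($\Sigma = \sigma^2$) and a shared prior probability $\pi_{\text{on}}$; the basis and data are hypothetical. The loop over $2^W$ patterns is precisely the computation that becomes intractable for large $W$.

```python
import numpy as np
from itertools import product

def log_gauss(x, cov):
    """log N(x; 0, cov)."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(cov, x))

def indicator_posterior(x, C, sigma2, R, pi_on):
    """Exact p(r_{1:W} | x) by enumerating all 2^W on/off patterns (s integrated out)."""
    W = C.shape[1]
    configs = list(product([0, 1], repeat=W))
    logp = np.empty(len(configs))
    for i, r in enumerate(configs):
        on = np.flatnonzero(r)
        cov = R + sigma2 * C[:, on] @ C[:, on].T           # x | r ~ N(0, R + sigma2 sum_{on} C_i C_i^T)
        num_on = len(on)
        logp[i] = log_gauss(x, cov) + num_on * np.log(pi_on) + (W - num_on) * np.log1p(-pi_on)
    post = np.exp(logp - logp.max())
    return configs, post / post.sum()

C = np.eye(8)[:, :4]                                        # hypothetical basis: 4 orthonormal columns
x = C @ np.array([3.0, 0.0, -3.0, 0.0])                     # columns 1 and 3 active
configs, post = indicator_posterior(x, C, sigma2=4.0, R=0.01 * np.eye(8), pi_on=0.5)
print(len(configs), configs[int(np.argmax(post))])          # 16 (1, 0, 1, 0)
```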
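Factorial Switching State space model
$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu}), \qquad \theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$$
$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu})) \quad \text{(changepoint indicator)}$$
$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_{k,\nu})\, \theta_{k-1,\nu}, Q_\nu(r_{k,\nu})) \quad \text{(latent state)}$$
$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R) \quad \text{(observation)}$$
[Figure: for each $\nu = 1, \ldots, W$, an indicator chain $r^\nu_0 \cdots r^\nu_k \cdots r^\nu_K$ driving a state chain $s^\nu_0 \cdots s^\nu_k \cdots s^\nu_K$; all chains generate the observations $y_k, \ldots, y_K$]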
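Synthetic Data
[Figure: synthetic example showing the latent indicators per frequency index $\nu$ over time $k$ and the generated signal $x$]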
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not numerically stable for computations with large matrices
– Need advanced techniques from linear algebra
– Interesting links to subspace methods
• Hyperparameter learning is necessary
Spectrogram
• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau)$ = (frequency, time), such as STFT or MDCT
$$x(t) = \sum_k s_k \phi_k(t)$$
[Figure: spectrograms of a speech recording and of a piano recording; time t (sec) vs. frequency f (Hz), 0 to 10000 Hz]
• The spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of the STFT)
Models for time-frequency Energy distributions
• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)
$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$
Spectrogram = Spectral Templates × Excitations
– however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)
$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$
Spectrogram = Mask × Source0 + (1 − Mask) × Source1
– however, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms
$$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$$
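One channel source separation: Gaussian source model
[Figure: for each $k = 1, \ldots, K$, variances $v_{k1}, \ldots, v_{kN}$ generate sources $s_{k1}, \ldots, s_{kN}$, which sum to the observation $x_k$]
$$s_{kn} \mid v_{kn} \sim \mathcal{N}(s_{kn}; 0, v_{kn}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{kn}$$
• Straightforward application of Bayes' theorem yields
$$p(s_{kn} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{kn}; \kappa_{kn} x_k, v_{kn}(1 - \kappa_{kn})), \qquad \kappa_{kn} = \frac{v_{kn}}{\sum_{n'} v_{kn'}} \quad \text{(responsibilities)}$$
• Each source coefficient $s_{kn}$ gets a fraction $\kappa_{kn}$ of the observation $x_k$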
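The responsibility formula can be checked against generic Gaussian conditioning. Since $\mathrm{Cov}(s_{kn}, x_k) = v_{kn}$ and $\mathrm{Var}(x_k) = \sum_n v_{kn}$, the posterior mean is $v_{kn} x_k / \sum_{n'} v_{kn'}$ and the posterior variance is $v_{kn} - v_{kn}^2 / \sum_{n'} v_{kn'}$. A small NumPy sketch for a single time-frequency atom follows; the function names are hypothetical.

```python
import numpy as np

def separate_gaussian(x, v):
    """Posterior means/variances of s_n given x = sum_n s_n, s_n ~ N(0, v_n)."""
    kappa = v / np.sum(v)                       # responsibilities
    return kappa * x, v * (1 - kappa)

def separate_by_conditioning(x, v):
    """Same posterior from the joint Gaussian of (s, x)."""
    cov_sx = v                                  # Cov(s_n, x) = v_n
    var_x = np.sum(v)                           # Var(x) = sum_n v_n
    return cov_sx / var_x * x, v - cov_sx ** 2 / var_x

v = np.array([4.0, 1.0, 0.5])
x = 2.0
mean, var = separate_gaussian(x, v)
print(mean, var)
print(np.isclose(mean.sum(), x))                # True: the fractions kappa_n add up to one
```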
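One channel source separation: Poisson source model
[Figure: for each $k = 1, \ldots, K$, intensities $v_{k1}, \ldots, v_{kN}$ generate sources $s_{k1}, \ldots, s_{kN}$, which sum to the observation $x_k$]
$$s_{kn} \mid v_{kn} \sim \mathcal{PO}(s_{kn}; v_{kn}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{kn}$$
• This is the generative model for NMF when we write
$$v_{k(\nu,\tau),n} = t_{\nu,n}\, e_{\tau,n} \quad (\text{template} \times \text{excitation})$$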
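In the Poisson model, the sources conditioned on their sum $x_k$ are multinomial with probabilities $\kappa_{kn} = v_{kn} / \sum_{n'} v_{kn'}$. Hence $\langle s_{kn} \rangle = \kappa_{kn} x_k$, just as in the Gaussian case. Using this as the E-step and maximising over templates and excitations as the M-step gives the familiar multiplicative updates for NMF with the KL (I-divergence) cost. The sketch below tracks the Poisson log-likelihood up to a constant, which EM should never decrease. The function name, random initialisation and synthetic data are hypothetical.

```python
import numpy as np

def poisson_nmf(X, num_components, num_iters=200, seed=0):
    """EM for x_{nu,tau} = sum_n s_{nu,tau,n} with s_{nu,tau,n} ~ PO(t_{nu,n} e_{tau,n}).

    E-step: <s_{nu,tau,n}> = kappa_n x_{nu,tau}. M-step: closed-form templates and
    excitations. Together they give the multiplicative KL-NMF updates.
    """
    rng = np.random.default_rng(seed)
    F, T = X.shape
    tmpl = rng.gamma(1.0, 1.0, size=(F, num_components))      # templates t_{nu,n}
    exc = rng.gamma(1.0, 1.0, size=(T, num_components))       # excitations e_{tau,n}
    loglik = []
    for _ in range(num_iters):
        V = tmpl @ exc.T                                      # intensities sum_n t_{nu,n} e_{tau,n}
        tmpl *= ((X / V) @ exc) / exc.sum(axis=0)             # t = sum_tau <s> / sum_tau e
        V = tmpl @ exc.T
        exc *= ((X / V).T @ tmpl) / tmpl.sum(axis=0)          # e = sum_nu <s> / sum_nu t
        V = tmpl @ exc.T
        loglik.append(np.sum(X * np.log(V) - V))              # Poisson log-likelihood up to a constant
    return tmpl, exc, np.array(loglik)

rng = np.random.default_rng(1)
X = rng.poisson(rng.gamma(2.0, 2.0, size=(30, 3)) @ rng.gamma(2.0, 2.0, size=(3, 20)))
tmpl, exc, loglik = poisson_nmf(X, num_components=3)
print(loglik[0], loglik[-1])
```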
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: example densities $p(x)$ on $0 \le x \le 5$ for $(a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1)$ and for $(a, b) = (1, 1), (1, 0.5), (2, 1)$]
$$\mathcal{G}(x; a, z) \equiv \exp\left((a - 1)\log x - z^{-1} x + a \log z^{-1} - \log\Gamma(a)\right)$$
$$\mathcal{IG}(x; a, z) \equiv \exp\left((a + 1)\log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log\Gamma(a)\right)$$
• Gamma: conjugate prior for the Gaussian precision, the Poisson intensity and the inverse Gamma scale
• Inverse Gamma: conjugate prior for the Gaussian variance and the Gamma scale
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \ldots, K$ as follows:
$$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a), \qquad z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$$
[Figure: chain $z_1 \cdots v_{k-1} - z_k - v_k - z_{k+1} \cdots$ with edge couplings $a_z$, $a$, $a_z$]
• The variance variables $v$ are the prior variances of the sources
• Auxiliary variables $z$ are needed for conjugacy and positive correlation
• The shape parameters $a$ and $a_z$ describe the coupling strength and drift of the chain
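Typical draws of the chain can be generated by ancestral sampling. Under the $\mathcal{IG}$ definition above, $x \sim \mathcal{IG}(x; a, z)$ means that $1/x$ is Gamma distributed with shape $a$ and scale $z$, so each step is a single Gamma draw. The sketch works in the log domain for the following reason. A short calculation shows that $\log v_{k+1} - \log v_k = \log(G'/a_z) - \log(G/a)$, with $G \sim \text{Gamma}(a, 1)$ and $G' \sim \text{Gamma}(a_z, 1)$. So $\log v_k$ is a random walk whose increments have zero mean when $a = a_z$. Otherwise the two shape parameters set both the drift and the step size. The initial value $z_1$ and the function name are hypothetical.

```python
import numpy as np

def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Ancestral sample of log v_{1:K} from the inverse Gamma-Markov chain.

    v_k | z_k ~ IG(v_k; a, z_k / a)            =>  1/v_k = (z_k / a) G,        G ~ Gamma(a, 1)
    z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z) =>  1/z_{k+1} = (v_k / a_z) G', G' ~ Gamma(a_z, 1)
    Working with logs avoids overflow on long chains; z1 is a hypothetical initial value.
    """
    rng = np.random.default_rng(seed)
    log_v = np.empty(K)
    log_z = np.log(z1)
    for k in range(K):
        log_v[k] = -(log_z - np.log(a) + np.log(rng.gamma(a)))
        log_z = -(log_v[k] - np.log(a_z) + np.log(rng.gamma(a_z)))
    return log_v

for a_z in (10.0, 4.0, 40.0):                      # the settings shown in the typical-draws plots
    log_v = sample_igmc(1000, a=10.0, a_z=a_z)
    print(a_z, log_v.min(), log_v.max())
```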
Gamma Chains: typical draws
[Figure: three sample paths of $\log v_k$, $k = 1, \ldots, 1000$, for $(a, a_z) = (10, 10), (10, 4), (10, 40)$]
Gamma Chains with changepoints: typical draws
[Figure: three sample paths of $\log v_k$, $k = 1, \ldots, 1000$, for $(a, a_z) = (10, 10), (10, 4), (10, 40)$]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$$\psi_{k,k} = \exp\left(-a\, z_k^{-1} v_k^{-1}\right) \quad \text{(pairwise)}$$
$$\phi_{z_k} = \exp\left((a_z + a + 1)\log z_k^{-1}\right) \quad \text{(singletons)}$$
[Figure: chain $z_1 \cdots v_{k-1} - z_k - v_k - z_{k+1} \cdots$ with edge couplings $a_z$, $a$, $a_z$]
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$$\psi_{ij} = \exp\left(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}\right) \quad \text{(pairwise)}$$
$$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right)\log \xi_i^{-1}\right) \quad \text{(singletons)}$$
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov Chain Monte Carlo: Gibbs sampler
– Sequential Monte Carlo: Particle Filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps
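VB or Gibbs
[Figure: factor graph $\psi_{0,1} - z_1 - \psi_{1,1} - v_1 - \psi_{1,2} - z_2 - \psi_{2,2} - v_2 \cdots$, with likelihood factors $p(y_1 \mid v_1)$ and $p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \left\langle \log\psi_{k,k} + \log\psi_{k,k+1} \right\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$$
• Gibbs
$$v^{(\tau)}_k \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z^{(\tau)}_k)\, \psi_{k,k+1}(z^{(\tau)}_{k+1})$$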
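VB or Gibbs
[Figure: factor graph $\psi_{0,1} - z_1 - \psi_{1,1} - v_1 - \psi_{1,2} - z_2 - \psi_{2,2} - v_2 \cdots$, with likelihood factors $p(y_1 \mid v_1)$ and $p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \left\langle \log\psi_{k-1,k} + \log\psi_{k,k} \right\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$$
• Gibbs
$$z^{(\tau)}_k \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v^{(\tau)}_{k-1})\, \psi_{k,k}(v^{(\tau)}_k)$$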
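Because $v$ and $z$ alternate along the chain, all $v_k$ are conditionally independent given $z$, and vice versa. A Gibbs sampler can therefore update each block in one step. Multiplying the relevant densities gives inverse Gamma full conditionals:

- $1/z_k \sim \text{Gamma}(a_z + a, \text{rate } a_z/v_{k-1} + a/v_k)$.
- With $y_k \sim \mathcal{N}(0, v_k)$, $1/v_k \sim \text{Gamma}(a + a_z + \tfrac12, \text{rate } y_k^2/2 + a/z_k + a_z/z_{k+1})$.

The sketch below assumes this simple observation model, fixed $a$ and $a_z$, and a hypothetical fixed $v_0$ that anchors $z_1$. Even with blocked updates, a strongly coupled chain can mix slowly, so the burn-in here is only illustrative.

```python
import numpy as np

def gibbs_igmc(y, a, a_z, v0=1.0, num_sweeps=500, seed=0):
    """Blocked Gibbs sampler for y_k ~ N(0, v_k) under the inverse Gamma-Markov chain prior.

    v0 is a hypothetical fixed variance that plays the role of v_0 for z_1.
    Returns the sampled v_{1:K} of every sweep.
    """
    rng = np.random.default_rng(seed)
    K = len(y)
    v = y ** 2 + 1e-6                                   # crude per-sample initialisation
    samples = np.empty((num_sweeps, K))
    for it in range(num_sweeps):
        # z_k | v_{k-1}, v_k:  1/z_k ~ Gamma(a_z + a, rate a_z / v_{k-1} + a / v_k)
        v_prev = np.concatenate(([v0], v[:-1]))
        z = 1.0 / rng.gamma(a_z + a, 1.0 / (a_z / v_prev + a / v))
        # v_k | z_k, z_{k+1}, y_k:  1/v_k ~ Gamma(a + a_z + 1/2, rate y_k^2/2 + a/z_k + a_z/z_{k+1});
        # v_K has no z_{K+1}, so its a_z terms drop out.
        shape = np.full(K, a + a_z + 0.5)
        shape[-1] = a + 0.5
        rate = 0.5 * y ** 2 + a / z + np.append(a_z / z[1:], 0.0)
        v = 1.0 / rng.gamma(shape, 1.0 / rate)
        samples[it] = v
    return samples

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 10.0, 50), rng.normal(0.0, 0.1, 50)])   # loud, then quiet
samples = gibbs_igmc(y, a=10.0, a_z=10.0, num_sweeps=300)
log_v = np.log(samples[100:])                                              # discard burn-in
print(log_v[:, :50].mean(), log_v[:, 50:].mean())
```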
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: log-magnitude time-frequency plots of the noisy (X) and original ($X_{org}$) speech and of four reconstructions: $X_h$ (SNR 19.98), $X_v$ (SNR 20.79), $X_b$ (SNR 19.68), $X_g$ (SNR 19.97)]
Denoising – Music
[Figure: log-magnitude time-frequency plots of the original, the noisy signal and the reconstructions by PF (SNR 8.53), Gibbs (SNR 8.66) and VB (SNR 2.08)]
“Tristram (Matt Uelmen)” + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: Horizontal – tie across time; harmonic continuity
• Source 2: Vertical – tie across frequency; transients, percussive sounds
[Figure: time (τ) vs. frequency bin (ν) plots of the original $X_{org}$ and of the estimated components $S_{hor}$ and $S_{ver}$]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

                    s1                     s2
Method         SDR    SIR    SAR      SDR    SIR    SAR
VB            -474   -328    567     -158   1546   -137
Gibbs          -45   -262    457      105   1246    161
GibbsEM       -423   -242    482      134   1313    185
Pre-trained   -404   -315    813      356   1144    464
Oracle         614   1716    658     1266   1995    136

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

                    s1                     s2
Method         SDR    SIR    SAR      SDR    SIR    SAR
VB             -78   -622    453     -235    184   -225
Gibbs         -846   -753    693     -404   1459   -383
GibbsEM       -774   -619    462     -114   1662   -097
Pre-trained    -64   -539    695       38   1639    414
Oracle         121    329   1214     2113   3389   2137
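Harmonic-Transient Decomposition
[Figure: time (τ) vs. frequency bin (ν) plots of the original $X_{org}$ and of the horizontal ($S_{hor}$) and vertical ($S_{ver}$) components, with audio examples (Original), (Hor), (Vert) for two recordings]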
Chord Detection - Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (Hz), 0 to 4000 Hz]
Chord Detection
[Figure: MDCT of a piano chord (MIDI notes 41, 48, 51, 56), frequency (Hz) vs. time τ (s); and $\log \sum_\nu v_{\nu j \tau}$ for MIDI notes $j$ from 40 to 65 vs. time τ (s)]
Multichannel Source Separation
• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)
$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$$
$$v_{kn} \sim \mathcal{IG}(v_{kn}; \nu/2, 2/(\nu\lambda_n))$$
$$s_{kn} \sim \mathcal{N}(s_{kn}; 0, v_{kn})$$
$$x_{km} \sim \mathcal{N}(x_{km}; a_m^\top s_{k,1:N}, r_m)$$
$$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$$
for $n = 1, \ldots, N$, $m = 1, \ldots, M$ and $k = 1, \ldots, K$
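Ancestral sampling from this prior is a direct way to see what it implies. With the $\mathcal{G}$/$\mathcal{IG}$ conventions defined earlier, $1/v_{kn}$ is Gamma distributed with shape $\nu/2$ and scale $2/(\nu\lambda_n)$. Given $\lambda_n$, each $s_{kn}$ is therefore marginally Student-t with $\nu$ degrees of freedom and scale $\sqrt{\lambda_n}$, so its variance is $\lambda_n \nu/(\nu - 2)$ for $\nu > 2$. This is one way to read $\lambda_n$ as a volume. The slide leaves the mixing and noise hyperparameters unspecified, so the sketch assumes a standard normal prior on the mixing vectors and a common noise variance. The function name is hypothetical.

```python
import numpy as np

def sample_mixture(K, N, M, nu, a_lam, b_lam, noise_var, seed=0):
    """Ancestral sample from the hierarchical prior of the multichannel model.

    Hypothetical choices: a_m ~ N(0, I) and r_m = noise_var for every channel.
    G(x; a, b) is read with shape a and scale b, as in the definitions above.
    """
    rng = np.random.default_rng(seed)
    lam = rng.gamma(a_lam, b_lam, size=N)                           # lambda_n ~ G(a_lam, b_lam)
    v = 1.0 / rng.gamma(nu / 2, 2.0 / (nu * lam), size=(K, N))      # v_kn ~ IG(nu/2, 2/(nu lambda_n))
    s = rng.normal(0.0, np.sqrt(v))                                 # s_kn ~ N(0, v_kn)
    A = rng.normal(size=(M, N))                                     # row m is a_m^T
    x = s @ A.T + rng.normal(0.0, np.sqrt(noise_var), size=(K, M))  # x_km ~ N(a_m^T s_k, r_m)
    return lam, s, x

lam, s, x = sample_mixture(K=200_000, N=2, M=3, nu=10.0, a_lam=2.0, b_lam=1.0, noise_var=0.01)
print(lam, s.var(axis=0), lam * 10.0 / 8.0)    # empirical vs. lambda_n nu / (nu - 2)
```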
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$
Source Separation
[Figure: spectrograms (time t in sec vs. frequency f in Hz) of the three sources (Speech, Piano, Guitar) and of the mixture (Mix)]
Reconstructions
[Figure: spectrograms of the reconstructed sources (Speech, Piano, Guitar); time t (sec) vs. frequency f (Hz)]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: traces of $a$, $\lambda$ and $r$ over 2000 epochs]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features)
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– ...
• Online/Realtime or Offline/Batch
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: state lattice with bar position $m_k = 1, \ldots, 8$ horizontally and tempo level $n_k = 1, 2, 3$ vertically; bar lengths for 3/4 and 4/4 time are marked]
• Each dot denotes a state $x = (m, n)$ (score position, tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three panels showing $p(x_2 \mid x_1)$ over bar position (1 to 8) and tempo level (1 to 5)]
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: $p(x_3)$ over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: $p(x_4)$ over bar position and tempo level]
Bar position Pointer - k = 5
[Figure: $p(x_5 \mid y_{1:5})$ over bar position and tempo level, after observing $y_5 = 0$]
Bar position Pointer - k = 10
[Figure: $p(x_{10})$ over bar position and tempo level]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson intensity
[Figure: Poisson intensity $\mu_k$ as a function of bar position $m_k$ for a triplet rhythm and for a duplet rhythm]
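The bar pointer is a finite-state HMM, so filtering is the standard forward recursion. The sketch below makes several hypothetical modelling choices that the slides do not fix:

- The position advances by $n_k$ cells per frame, modulo the bar length.
- The tempo level moves one step up or down with a small probability, reflecting at the edges.
- The prior $p(x_1)$ is uniform.
- $y_k \mid m_k$ is Poisson with an intensity $\mu(m_k)$ that peaks on the beats.

The function names and the synthetic performance are also hypothetical.

```python
import numpy as np
from math import lgamma

def tempo_kernel(num_tempi, p_change):
    """Tempo transitions: stay, or step one level up/down; steps off the edge stay put."""
    T = np.zeros((num_tempi, num_tempi))
    for j in range(num_tempi):
        T[j, j] += 1.0 - p_change
        for step in (-1, 1):
            T[j, min(max(j + step, 0), num_tempi - 1)] += 0.5 * p_change
    return T

def bar_pointer_filter(y, mu, num_tempi, p_change=0.1):
    """Forward filter returning p(m_k, n_k | y_{1:k}) for k = 1..K as an array (K, M, num_tempi).

    Hypothetical model: m_{k+1} = (m_k + n_k) mod M with n_k in {1, ..., num_tempi},
    y_k | m_k ~ Poisson(mu[m_k]) and a uniform p(x_1).
    """
    M = len(mu)
    T = tempo_kernel(num_tempi, p_change)
    belief = np.full((M, num_tempi), 1.0 / (M * num_tempi))
    out = []
    for yk in y:
        loglik = yk * np.log(mu) - mu - lgamma(yk + 1)              # log Poisson(y_k; mu[m])
        belief = belief * np.exp(loglik - loglik.max())[:, None]    # update with y_k
        belief /= belief.sum()
        out.append(belief.copy())
        shifted = np.stack([np.roll(belief[:, j], j + 1) for j in range(num_tempi)], axis=1)
        belief = shifted @ T                                        # predict x_{k+1}
    return np.array(out)

rng = np.random.default_rng(1)
mu = np.full(16, 0.1)
mu[[0, 8]] = 4.0                                 # hypothetical intensity: two beats per bar of 16 cells
m_true = (2 * np.arange(60)) % 16                # a performance at constant tempo level n = 2
y = rng.poisson(mu[m_true])
post = bar_pointer_filter(y, mu, num_tempi=3)
print(post[-1].sum(axis=0))                      # p(n_K | y_{1:K}) over tempo levels 1, 2, 3
```

A constant intensity makes the observations uninformative. Because these transitions are doubly stochastic, the filtered distribution should then stay uniform, which gives a simple sanity check.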
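Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Figure: dynamic Bayesian network with tempo chain $n_{0:3}$, time-signature chain $\theta_{0:3}$, bar-position chain $m_{0:3}$, rhythm chain $r_{0:3}$, intensities $\lambda_{1:3}$ and observations $y_{1:3}$]
• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator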
Filtering
[Figure: observed data $y_k$ and, against frame index $k$, the filtering distributions $\log p(m_k \mid y_{1:k})$ over bar position, $\log p(n_k \mid y_{1:k})$ over tempo (quarter notes per minute) and $p(r_k \mid y_{1:k})$ over rhythm (triplets vs. duplets)]
Smoothing
[Figure: observed data $y_k$ and, against frame index $k$, the smoothing distributions $\log p(m_k \mid y_{1:K})$ over bar position, $\log p(n_k \mid y_{1:K})$ over tempo (quarter notes per minute) and $p(r_k \mid y_{1:K})$ over rhythm (triplets vs. duplets)]
Time Signature
[Figure: observed audio samples over about 12 s and, against frame index $k$, the smoothing distributions $\log p(m_k \mid z_{1:K})$ over bar position, $\log p(n_k \mid z_{1:K})$ over tempo (quarter notes per minute) and $p(\theta_k \mid z_{1:K})$ over the time signatures 4/4 and 3/4]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt aligned with the audio signal $x_t$]
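Score-Performance matching - Graphical Model
[Figure: graphical model with score-position chain $t_1, \ldots, t_K$, indicators $r_1, \ldots, r_K$, intensities $\lambda_1, \ldots, \lambda_K$, and for each $\nu = 1, \ldots, W$ the variances $v^\nu_{1:K}$ and coefficients $s^\nu_{1:K}$; inset: score events 1 to 8 with the current position $r_k$]
$$v_{\nu\tau} \sim \mathcal{IG}\left(v_{\nu\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau))\right)$$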
Score-Performance matching - Signal model
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (Hz), 0 to 4000 Hz]
Score-Performance matching
[Figure: spectrogram data (frequency 0 to 4000 Hz against time 0 to 14 s) and MIDI data (MIDI note 55 to 85 against score position)]
Online (filtering) or Offline (smoothing) processing is possible
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ as a piano roll (MIDI note number 60 to 80 against time, 1 to 4 s); the estimate $\sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$ on a $\log \lambda$ axis against time; MDCT of the audio (source: Daniel-Ben Pienaar), frequency 0 to 4000 Hz against time]
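The middle panel plots $\sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$, a weighted particle average. As a generic sketch (the sampler producing the particles is not shown, and the particle values and log-weights below are hypothetical), the weights are normalised with the log-sum-exp trick before forming the self-normalised estimate:

```python
import math


def normalise_log_weights(log_w):
    # Subtract the maximum before exponentiating to avoid underflow/overflow.
    top = max(log_w)
    w = [math.exp(l - top) for l in log_w]
    total = sum(w)
    return [x / total for x in w]


def particle_mean(values, log_w):
    # Self-normalised estimate: sum_i w^(i) * value^(i).
    return sum(w * v for w, v in zip(normalise_log_weights(log_w), values))


# Hypothetical particles for lambda at one frame; exp(-1000) alone would underflow to 0.
lam_particles = [0.5, 1.0, 2.0]
log_weights = [-1000.0, -1000.0 + math.log(3.0), -1000.0]
estimate = particle_mean(lam_particles, log_weights)  # weights 0.2, 0.6, 0.2
```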
Summary
• "Time Domain": Switching State Space Models
  - State space modeling
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  - Analysis down to sample precision (if required)
  - Computationally quite heavy
• "Transform Domain": Gamma Fields
  - Models on (orthogonal) transform coefficients; energy compaction
  - Practical: can make use of fast transforms (FFT, MDCT)
  - Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
  - time-frequency energy distributions
• Ongoing work
  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    * Chord detection, polyphonic transcription
    * Musical-score-guided source separation
  - Prior structures for other observation models (e.g. NMF)
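Typical draws like those on the Gamma Chains slides can be produced by ancestral sampling. The sketch reads the scale parameters on that slide as $v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$ and $z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$, the reading consistent with the pairwise potential $\exp(-a z_k^{-1} v_k^{-1})$ given there; the initial value `z1 = 1.0` and the seed are assumptions.

```python
import math
import random


def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Ancestral sample v_1..v_K of an inverse Gamma-Markov chain (sketch).

    Uses IG(x; shape, scale) <=> 1/x ~ Gamma(shape, scale):
      v_k     | z_k ~ IG(a,   z_k / a)
      z_{k+1} | v_k ~ IG(a_z, v_k / a_z)
    """
    rng = random.Random(seed)
    v, z = [], z1
    for _ in range(K):
        v_k = 1.0 / rng.gammavariate(a, z / a)
        v.append(v_k)
        z = 1.0 / rng.gammavariate(a_z, v_k / a_z)
    return v


log_v = [math.log(x) for x in sample_igmc(1000, a=10.0, a_z=10.0)]
```

Under this reading, $\log v_{k+1} - \log v_k$ has mean $\psi(a_z) - \log a_z + \log a - \psi(a)$, so with $a = a_z$ the chain behaves like a driftless random walk in $\log v_k$, while unequal shapes add a drift, in line with the remark that $a$ and $a_z$ set coupling strength and drift.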
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 2
Colaborators
bull Onur Dikmen Bogazici Istanbul
bull Paul Peeling Cambridge
bull Nick Whiteley Cambridge
bull Simon Godsill Cambridge
bull Cedric Fevotte ENST Paris Telecom
bull David Barber UCL London
bull Bert Kappen Nijmegen The Netherlands
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 1
Statistical Approaches
bull Probabilistic
bull Hierarchical signal models to incorporate prior knowledgeinspiration
from various sources
ndash Physics (acoustics physical models )
ndash Studies of human cognition and perception (masking psychoacoustics )
ndash Musicology (musical constructs harmony tempo form )
bull Consistent framework for developing inference algorithms
bull Contrast to TraditionalProcedural approaches ndash where no clear
distinction between ldquowhatrdquo and ldquohowrdquo
bull Need to overcome computational obstacles (time memory)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 2
Generative Models for audition
bull Computer audition hArr inverse synthesis via Bayesian inference
p(Structure|Observations) prop p(Observations|Structure)p(Structure)
Goal Developing flexible prior structures for modelling nonstationary
sources
lowast source separation transcription
lowast restoration interpolation localisation identification
lowast coding compression resynthesis cross synthesis
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 3
Bayesian Source Separation
bull Joint estimation of Sources given Observations
Source Model v Parameters of Source prior
sk1 skn skN v
xk1 xkM
k = 1 K
λ
Observation Model λ Channel noise mixing system
p(Src|Obs) prop
int
dλdvp(Obs|Src λ)p(Src|v)p(v)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 4
Audio RestorationInterpolation
bull Estimate missing samples given observed ones
bull Restoration concatenative expressive speech synthesis
0 50 100 150 200 250 300 350 400 450 500
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 5
Polyphonic Music Transcription
bull from sound
tsec
fHz
0 1 2 3 4 5 6 7 80
1000
2000
3000
4000
5000
0
10
20
(S)
bull to score
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 6
Decimated_chopinwav
Media File (audiowav)
Modelling and Computational issues
bull Hierarchical
ndash Signal levelpitch onsets timbre
ndash Symbolic levelmelody motives harmony chords tonality rhythm beat tempo articulationinstrumentation voice
ndash Cognitive levelexpression genre form style mood emotion
bull Uncertainty
ndash Parameter LearningWhich pitch rhythm tempo meter time signature
ndash Model SelectionHow many notes harmonics onsets sections
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 7
Generative Models for Music
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 8
Generative Models for Music
Score Expression
Piano-Roll
Signal
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 9
Hierarchical Modeling of Music
M
1 2 tv1 v2 vtk1 k2 kth1 h2 ht1 2 tm1 m2 mtgj1 gj2 gjtrj1 rj2 rjtnj1 nj2 njtxj1 xj2 xjtyj1 yj2 yjty1 y2 yt
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 10
Modelling levels
bull Physical - acoustical
bull Time domain ndash state space dynamical models
bull Transform domain ndash Fourier representations Generalised Linear
model
bull Feature Based
Research Questions
What kinds of prior knowledge and modelling techniques are usefulHow can we do efficient inference
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 11
Signal Models for Audio
bull Time domain ndash state space dynamical models
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA) switching state space models
ndash Flexible Physically realistic
ndash Analysis down to sample precision Computationally quite heavy
bull Transform domain ndash Fourier representations Generalised Linear
model
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 12
Sinusoidal Modeling
bull Sound is primarily about oscillations and resonance
bull Cascade of second order sytems
bull Audio signals can often be compactly represented by sinusoidals
(real) yn =
psum
k=1
αkeminusγkn cos(ωkn+ φk)
(complex) yn =
psum
k=1
ck(eminusγk+jωk)n
y = F (γ1p ω1p)c
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 13
State space Parametrisation
xn+1 =
eminusγ1+jω1
eminusγp+jωp
︸ ︷︷ ︸
A
xn x0 =
c1c2cp
yn =(
1 1 1 1)
︸ ︷︷ ︸
C
xn
x0 x1 xkminus1 xk xK
y1 ykminus1 yk yK
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 14
State Space Parametrisation
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 15
Audio RestorationInterpolation
bull Estimate missing samples given observed ones
bull Restoration concatenative expressive speech synthesis
0 50 100 150 200 250 300 350 400 450 500
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 16
Audio Interpolation
p(xnotκ|xκ) prop
int
dHp(xnotκ|H)p(xκ|H)p(H)
H equiv (parameters hidden states)
H
xnotκ xκ
Missing Observed
0 50 100 150 200 250 300 350 400 450 500
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 17
Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
Aν Qν
sν0 middot middot middot sν
k middot middot middot sνKminus1
ν = 0 W minus 1
x0 xk xKminus1
sνk sim N (sν
kAνsνkminus1 Qν) Aν sim N
(
Aν
(cos(ων) minus sin(ων)sin(ων) cos(ων)
)
Ψ
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 18
Inference Structured Variational Bayes
Aα q(Aα) Qα q(Qα)
middot middot middot sαkminus1 sα
ksα
k+1 middot middot middot
α isin C
prod
k q(sαk |s
αkminus1)
xk q(xk)
bull Intuitive algorithm
ndash Substract from the observed signal x the prediction of the frequency bands in notα
ndash Compute a fit for α to this residual and iterate
bull For fixed A Q this is equivalent to Gauss-Seidel an iterative method for solving linear systems of
equations
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 19
Restoration
bull Piano
ndash Signal with missing samples (37)
ndash Reconstruction 768 dB improvement
ndash Original
bull Trumpet
ndash Signal with missing samples (37)
ndash Reconstruction 710 dB improvement
ndash Original
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 20
piano_missingwav
Media File (audiowav)
piano_kalmanwav
Media File (audiowav)
piano_cleanwav
Media File (audiowav)
trumpet_missingwav
Media File (audiowav)
trumpet_kalmanwav
Media File (audiowav)
trumpet_cleanwav
Media File (audiowav)
Hierarchical Factorial Models
bull Each component models a latent process
bull The observations are projections
rν0 middot middot middot rν
k middot middot middot rνK
θν0 middot middot middot θ
νk middot middot middot θ
νK
ν = 1 W
yk yK
bull Generalises Source-filter models
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 21
Harmonic model with changepoints
rk|rkminus1 sim p(rk|rkminus1) rk isin 0 1
θk|θkminus1 rk sim [rk = 0]N (Aθkminus1 Q)︸ ︷︷ ︸
reg
+ [rk = 1]N (0 S)︸ ︷︷ ︸
new
yk|θk sim N (Cθk R)
A =
Gω
G2ω
GH
ω
N
Gω = ρk
(cos(ω) minus sin(ω)sin(ω) cos(ω)
)
damping factor 0 lt ρk lt 1 framelength N and damped sinusoidal basis matrix C of size N times 2H
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 22
Exact Inference in switching state space models is intractable
bull In general exact inference is NP hard
ndash Conditional Gaussians are not closed under marginalization
rArr Unlike HMMrsquos or KFMrsquos summing over rk does not simplify the filteringdensity
rArr Number of Gaussian kernels to represent exact filtering density p(rk θk|y1k)increases exponentially
minus7903666343
076292
minus103422
minus101982minus2393
minus27957
minus04593
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 23
Exact Inference for Changepoint detection
bull Exact inference is achievable in polynomial timespace
ndash Intuition When a changepoint occurs the state vector θ is reinitializedrArr Number of Gaussians kernels grows only polynomially (See eg Barry and Hartigan
1992 Digalakis et al 1993 O Ruanaidh and Fitzgerald 1996 Gustaffson 2000 Fearnhead 2003 Zoeter and
Heskes 2006)
r1 = 1 r2 = 0 r3 = 0 r4 = 1 r5 = 0
θ0 θ1 θ2 θ3 θ4 θ5
y1 y2 y3 y4 y5
bull The same structure can be exploited for the MMAP problem arg maxr1kp(r1k|y1k)
rArr Trajectories of r(i)1k which are dominated in terms of conditional evidence
p(y1k r(i)1k) can be discarded without destroying optimality
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 24
Monophonic model (Cemgil et al 2006)
bull We introduce a pitch label indicator m
bull At each time k the process can be in one of the ldquomuterdquo ldquosoundrdquo timesM states
r0 r1 rT
m0 m1 mT
s0 s1 sT
y1 yT
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 25
Monophonic Pitch Tracking
Monophonic Pitch Tracking = Online estimation (filtering) of p(rkmk|y1k)
100 200 300 400 500 600 700 800 900 1000minus100
minus50
0
50
100 200 300 400 500 600 700 800 900 1000
5
10
15
bull If pitch is constant exact inference is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 26
Transcription
bull Detecting onsets offsets and pitch to sample precision (Cemgil et al 2006 IEEE
TSALP)
500 1000 1500 2000 2500 3000 3500
Exact inference (S)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 27
d1wav
Media File (audiowav)
Tracking Pitch Variations
bull Allow m to change with k
50 100 150 200 250 300 350 400 450 500
bull Intractable need to resort to approximate inference (Mixture Kalman Filter -Rao-Blackwellized Particle Filter)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 28
Factorial Generative models for Analysis of Polyphonic Audio
νfr
eque
ncy
k
x k
bull Each latent changepoint process ν = 1 W corresponds to a ldquopiano keyrdquoIndicators r1W1K encode a latent ldquopiano rollrdquo (S1) (S2) (S3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 29
montuno1wav
Media File (audiowav)
montuno2wav
Media File (audiowav)
montuno3wav
Media File (audiowav)
Single time slice - Bayesian Variable Selection
ri sim C(ri πon πoff)
si|ri sim [ri = on]N (si 0 Σ) + [ri 6= on]δ(si)
x|s1W sim N (x Cs1W R)
C equiv [ C1 Ci CW ]
r1 rW
s1 sW
x
bull Generalized Linear Model ndash Columnrsquos of C are the basis vectors
bull The exact posterior is a mixture of 2W Gaussians
bull When W is large computation of posterior features becomes intractable
bull Sparsity by construction (Olshausen and Millman Attias )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 30
Factorial Switching State space model
r0ν sim C(r0ν π0ν)
θ0ν sim N (θ0ν microν Pν)
rkν|rkminus1ν sim C(rkν πν(rtminus1ν)) Changepoint indicator
θkν|θkminus1ν sim N (θkν Aν(rk)θkminus1ν Qν(rk)) Latent state
yk|θk1W sim N (yk Ckθk1W R) Observation
rν0 middot middot middot rν
k middot middot middot rνK
sν0 middot middot middot sν
k middot middot middot sνK
ν = 1 W
yk yK
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 31
Synthetic Data
νx
freq
ν
ν
k
(S)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 32
audio_examplewav
Media File (audiowav)
Technical Difficulties
bull Inference is quite heavy
bull Vanilla Kalman filtering methods are not stable ndash computations with
large matrices
ndash Need advance techniques from linear algebra
ndash Interesting links to subspace methods
bull Hyperparameter learning is necessary
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 33
Modelling levels
bull Physical - acoustical
bull Time domain ndash state space dynamical models
bull Transform domain ndash Fourier representations Generalised Linear
model
bull Feature Based
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 34
Spectrogram
bull Basis functions φk(t) centered around time-frequency atom k = k(ν τ) =(Frequency Time ) such as STFT or MDCT
x(t) =sum
k
skφk(t)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
bull Spectrogram displays log |sk| or |sk|2 (of STFT)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 35
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
Models for time-frequency Energy distributions
bull Non-Negative Matrix factorisation (Sha Saul Lee 2002 Smaragdis Brown 2003
Virtanen 2003 Abdallah Plumbley 2004 )
Xντ = WνjSjτ
Spectrogram = Spectral Templatestimes Excitations
= times
ndash however spectrograms are not additive (a2 + b2 6= (a+ b)2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 36
Models for time-frequency Energy distributions
bull Mask models (Roweis 2001 Reyes-Gomez Jojic Ellis 2005 )
Xντ = [rντ = 0]S(0)ντ + [rντ = 1]S(1)
ντ
Spectrogram = Masktimes Source0 + (1minusMask)times Source1
= + +
ndash however sources do overlap in time and frequency
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 37
Prior structures on time-frequency Energy distributions
bull Main Idea Spectrogram is a point estimate of the energy at a
time-frequency atom k(ν τ)
bull We place a suitable prior on the variance of transform coefficients sk
and tie the prior variances across harmonically and temporally related
time-frequency atoms
p(s|v)p(v) =
(prod
k
p(sk|vk)
)
p(v)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 38
One channel source separation Gaussian source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim N (skn 0 vkn)
xk|sk1N =sumN
n=1 skn
bull Straightforward application of Bayesrsquo theorem yields
p(skn|vk1N xk) = N (skn κknxk vkn(1minus κkn))
κkn = vknsum
nprime
vknprime (Responsibilities)
bull Each source coefficient sn gets a fraction κn of the observation x
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 39
One channel source separation Poisson source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim PO(skn vkn)
xk|sk1N =sumN
n=1 skn
bull This is the generative model for the NMF when we write
vk(ντ)n = tνn times eτn (Templatetimes Excitation)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40
Gamma G(x a b) and Inverse Gamma IG(x a b)
0 1 2 3 4 50
02
04
06
08
1
12a = 09 b =1
a = 1 b =1
a = 13 b =1
a = 2 b =1
x
p(x)
0 1 2 3 4 50
02
04
06
08
1
12
14
a=1 b=1
a=1 b=05
a=2 b=1
G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))
IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))
bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale
bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1 K as follows
vk|zk sim IG(vk a zka)
zk+1|vk sim IG(zk+1 az vkaz)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
bull Variance variables v are priors for sources
bull Auxillary variables z are needed for conjugacy and positive correlation
bull Shape parameters a and az describe coupling strength and drift of the chain
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42
Gamma Chains typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43
Gamma Chains with changepoints typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44
Gamma Chains
bull The joint can be written as product of singleton and pairwise
potentials of form
ψkk = exp(minusazminus1k vminus1
k ) (Pairwise)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
φzk = exp((az + a+ 1) log zminus1
k ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45
Gamma Fields
bull The joint can be written as product of singleton and pairwise
potentials
ψij = exp(minusaijξminus1i ξminus1
j ) (Pairwise)
φi = exp((sum
j
aij + 1) log ξminus1i ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46
Possible Model Topologies
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47
Approximate Inference
bull Stochastic
ndash Markov Chain Monte Carlo Gibbs sampler
ndash Sequential Monte Carlo Particle Filtering
bull Deterministic
ndash Variational Bayes
In all these conjugacy helps
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))
bull Gibbs
v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z
(τ)k )ψkk+1(z
(τ)k+1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
bull Inference Variational Bayes
Noisy Original
X
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xorg
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xh SNR1998
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xv SNR2079
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xb SNR1968
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xg SNR1997
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51
Denoising ndash MusicOriginal
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Noisy
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
PF SNR853
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Gibbs SNR866
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
VB SNR208
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52
Single Channel Source Separation (with Onur Dikmen)
bull Source 1 Horizontal Tie across time harmonic continuity
bull Source 2 Vertical Tie across frequency transients percussive sounds
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53
Single Channel Source Separation with IGMCs
E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Preminus trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
bull Oracle We use the square of the source coefficient as the latent variance estimate
bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
[Figure: observed data $y_k$, and smoothed posteriors against frame index $k$: $\log p(m_k \mid y_{1:K})$ over bar position $m_k$, $\log p(n_k \mid y_{1:K})$ over tempo (quarter notes per minute), and $p(r_k \mid y_{1:K})$ for triplets vs duplets]
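Smoothing replaces the filtered posteriors by $p(x_k \mid y_{1:K})$, which also use later observations. A minimal sketch of the scaled forward-backward recursion for a generic discrete HMM follows; the two-state example (for instance duplet vs triplet, as in the $r_k$ panel) and all of its numbers are hypothetical, and the bar-pointer model would use the joint state $(m_k, n_k, r_k)$ instead.

```python
def forward_backward(prior, trans, lik):
    """Smoothed marginals p(x_k | y_{1:K}) for a discrete HMM.

    prior[i]    : p(x_1 = i)
    trans[i][j] : p(x_{k+1} = j | x_k = i)
    lik[k][i]   : p(y_k | x_k = i)
    Forward messages are normalised at every step to avoid underflow.
    """
    K, S = len(lik), len(prior)
    alpha, scale = [], []
    for k in range(K):
        if k == 0:
            a = [prior[i] * lik[0][i] for i in range(S)]
        else:
            a = [sum(alpha[-1][i] * trans[i][j] for i in range(S)) * lik[k][j]
                 for j in range(S)]
        c = sum(a)
        alpha.append([x / c for x in a])   # filtered p(x_k | y_{1:k})
        scale.append(c)
    post = [None] * K
    beta = [1.0] * S
    for k in range(K - 1, -1, -1):
        g = [alpha[k][i] * beta[i] for i in range(S)]
        z = sum(g)
        post[k] = [x / z for x in g]       # smoothed p(x_k | y_{1:K})
        # backward message for step k-1, rescaled by the same constant as alpha_k
        beta = [sum(trans[i][j] * lik[k][j] * beta[j] for j in range(S)) / scale[k]
                for i in range(S)]
    return post
```

Filtering is the forward half alone; the backward pass is what lets the smoothed panels revise early frames in the light of later evidence.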
Time Signature
[Figure: observed audio (sample value against time in s), and smoothed posteriors against frame index $k$: $\log p(m_k \mid z_{1:K})$ over bar position $m_k$, $\log p(n_k \mid z_{1:K})$ over tempo (quarter notes per minute), and $p(\theta_k \mid z_{1:K})$ for 4/4 vs 3/4 time]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score and audio signal $x_t$ against time $t$]
Score-Performance matching - Graphical Model
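[Figure: graphical model with nodes $t_{1:K}$, $r_{1:K}$, $\lambda_{1:K}$, $v_{\nu,1:K}$ and $s_{\nu,1:K}$, with a plate over $\nu = 1, \dots, W$; inset with score events numbered 1 to 8 and indicator $r_k$]
$v_{\nu\tau} \sim \mathcal{IG}\left(v_{\nu\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau))\right)$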
Score-Performance matching - Signal model
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (Hz, 0 to 4000)]
Score-Performance matching
[Figure: spectrogram data (frequency in Hz against time in s) and MIDI data (MIDI note against score position)]
• Online (filtering) or offline (smoothing) processing is possible
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ as MIDI note number against time in s; $\log \sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ against time in s; MDCT of the audio, frequency in Hz against time in s (source: Daniel-Ben Pienaar)]
Summary
• “Time Domain”: Switching State Space Models
- State space modeling
- Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
- Analysis down to sample precision (if required)
- Computationally quite heavy
• “Transform Domain”: Gamma Fields
- Models on (orthogonal) transform coefficients; energy compaction
- Practical: can make use of fast transforms (FFT, MDCT)
- Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
- Time-frequency energy distributions
• Ongoing work
- Comparison of inference methods (VB, MCMC, SMC)
- Learning
- Applications
* Chord detection, polyphonic transcription
* Musical-score-guided source separation
- Prior structures for other observation models (NMF)
Bayesian Source Separation
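• Joint estimation of sources given observations
[Figure: graphical model with sources $s_{k,1}, \dots, s_{k,n}, \dots, s_{k,N}$ and observations $x_{k,1}, \dots, x_{k,M}$ in a plate over $k = 1, \dots, K$. Source model: $v$, parameters of the source prior. Observation model: $\lambda$, channel noise and mixing system]
$p(\mathrm{Src} \mid \mathrm{Obs}) \propto \int d\lambda\, dv\; p(\mathrm{Obs} \mid \mathrm{Src}, \lambda)\, p(\mathrm{Src} \mid v)\, p(v)$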
Audio Restoration/Interpolation
• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis
[Figure: signal with missing samples, sample index 0 to 500]
Polyphonic Music Transcription
• from sound
[Figure: spectrogram, frequency (Hz, 0 to 5000) against time (s, 0 to 8); audio example]
• to score
Modelling and Computational issues
• Hierarchical
- Signal level: pitch, onsets, timbre
- Symbolic level: melody, motives, harmony, chords, tonality, rhythm, beat, tempo, articulation, instrumentation, voice
- Cognitive level: expression, genre, form, style, mood, emotion
• Uncertainty
- Parameter learning: which pitch, rhythm, tempo, meter, time signature?
- Model selection: how many notes, harmonics, onsets, sections?
Generative Models for Music
Generative Models for Music
[Figure: levels Score, Expression, Piano-Roll and Signal]
Hierarchical Modeling of Music
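[Figure: hierarchical graphical model over time slices $1, \dots, t$ with nodes $M$, $v_t$, $k_t$, $h_t$, $m_t$, the $j$-indexed nodes $g_{j,t}$, $r_{j,t}$, $n_{j,t}$, $x_{j,t}$, $y_{j,t}$, and observations $y_t$]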
Modelling levels
• Physical - acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, generalised linear model
• Feature based
Research Questions
• What kinds of prior knowledge and modelling techniques are useful?
• How can we do efficient inference?
Signal Models for Audio
• Time domain: state space dynamical models
- Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA), switching state space models
- Flexible, physically realistic
- Analysis down to sample precision; computationally quite heavy
• Transform domain: Fourier representations, generalised linear model
- Models on (orthogonal) transform coefficients; energy compaction
- Practical: can make use of fast transforms (FFT, MDCT)
- Inherent limitations (analysis windows, frequency resolution)
Sinusoidal Modeling
• Sound is primarily about oscillations and resonance
• Cascade of second-order systems
• Audio signals can often be compactly represented by sinusoids
(real) $y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$
(complex) $y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n$
$y = F(\gamma_{1:p}, \omega_{1:p})\, c$
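The real and complex forms agree when each damped cosine is written as a conjugate pair of complex terms, $\alpha e^{-\gamma n}\cos(\omega n + \phi) = \tfrac{\alpha}{2}e^{j\phi}(e^{-\gamma + j\omega})^n + \tfrac{\alpha}{2}e^{-j\phi}(e^{-\gamma - j\omega})^n$. The sketch below checks this numerically and builds $F(\gamma_{1:p}, \omega_{1:p})$ column by column; the parameter values are arbitrary, and fitting $c$ (for example by least squares, since the model is linear in $c$ for fixed $\gamma, \omega$) is not shown.

```python
import cmath
import math


def real_form(n, params):
    """(real) y_n = sum_k alpha_k exp(-gamma_k n) cos(omega_k n + phi_k)."""
    return sum(a * math.exp(-g * n) * math.cos(w * n + p) for a, g, w, p in params)


def poles_and_coefficients(params):
    """Each real component becomes a conjugate pair:
    c = (alpha/2) e^{+-j phi} and pole = e^{-gamma +- j omega}."""
    poles, coeffs = [], []
    for a, g, w, p in params:
        for sign in (+1, -1):
            poles.append(cmath.exp(-g + 1j * sign * w))
            coeffs.append(0.5 * a * cmath.exp(1j * sign * p))
    return poles, coeffs


def design_matrix(poles, length):
    """F: column k holds the sequence pole_k ** n for n = 0..length-1."""
    return [[z ** n for z in poles] for n in range(length)]


def synthesize(F, c):
    """y = F c (complex result; imaginary part cancels for conjugate pairs)."""
    return [sum(f * ck for f, ck in zip(row, c)) for row in F]
```

Each tuple in `params` is `(alpha, gamma, omega, phi)`; `synthesize(design_matrix(poles, 40), c)` should match `real_form` up to rounding.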
State space Parametrisation
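$x_{n+1} = A x_n$, with $A = \mathrm{diag}\left(e^{-\gamma_1 + j\omega_1}, \dots, e^{-\gamma_p + j\omega_p}\right)$ and $x_0 = (c_1, c_2, \dots, c_p)^\top$
$y_n = C x_n$, with $C = (1\ 1\ \cdots\ 1)$
[Figure: state chain $x_0, x_1, \dots, x_{k-1}, x_k, \dots, x_K$ with observations $y_1, \dots, y_{k-1}, y_k, \dots, y_K$]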
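Because $A$ is diagonal, running the recursion $x_{n+1} = A x_n$ from $x_0 = c$ and reading out $y_n = C x_n$ reproduces the complex sinusoidal model $y_n = \sum_k c_k z_k^n$ with $z_k = e^{-\gamma_k + j\omega_k}$. A small check, with arbitrary poles and coefficients chosen for illustration:

```python
import cmath


def pole(gamma, omega):
    """z = exp(-gamma + j omega): one diagonal entry of A."""
    return cmath.exp(-gamma + 1j * omega)


def simulate_state_space(poles, c, K):
    """x_{n+1} = A x_n with A = diag(poles), x_0 = c; y_n = C x_n with C = (1, ..., 1)."""
    x = list(c)
    ys = []
    for _ in range(K):
        ys.append(sum(x))                        # y_n = C x_n
        x = [z * xi for z, xi in zip(poles, x)]  # diagonal A acts elementwise
    return ys


def closed_form(poles, c, K):
    """y_n = sum_k c_k z_k^n, the complex sinusoidal model."""
    return [sum(ck * z ** n for z, ck in zip(poles, c)) for n in range(K)]
```

The state space form costs $O(p)$ per sample and is what allows the model to be extended with process noise and switching, as in the following slides.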
State Space Parametrisation
Audio Interpolation
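$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\, p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$, where $H \equiv$ (parameters, hidden states)
[Figure: $H$ with children $x_{\neg\kappa}$ (missing) and $x_\kappa$ (observed); signal with missing samples over sample index 0 to 500]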
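For a fixed $H$ and a Gaussian model, the integrand is Gaussian and the missing samples can be filled in by conditioning. The sketch below assumes, purely for illustration, a zero-mean stationary AR(1) prior with coefficient `rho` in place of $p(x \mid H)$ and noise-free observations. It returns $E[x_{\neg\kappa} \mid x_\kappa]$ and does not integrate over $H$ as the full method does.

```python
def ar1_cov(n, rho, var=1.0):
    """Stationary AR(1) covariance var * rho^|i-j| (hypothetical fixed H)."""
    return [[var * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def interpolate(x, missing, rho):
    """Posterior mean of x[missing] given the other samples, under the AR(1) prior.

    Uses E[x_m | x_o] = K_mo K_oo^{-1} x_o; the values of x at missing indices are ignored.
    """
    n = len(x)
    K = ar1_cov(n, rho)
    obs = [i for i in range(n) if i not in missing]
    K_oo = [[K[i][j] for j in obs] for i in obs]
    w = solve(K_oo, [x[i] for i in obs])          # K_oo^{-1} x_obs
    return {m: sum(K[m][j] * wj for j, wj in zip(obs, w)) for m in missing}
```

For an AR(1) prior, the estimate for a single gap depends only on its two neighbours, $\rho(x_{t-1} + x_{t+1})/(1 + \rho^2)$, which is a convenient sanity check.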
Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
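[Figure: for each band $\nu = 0, \dots, W-1$, a chain $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_{K-1}$ with parameters $A_\nu$, $Q_\nu$; observations $x_0, \dots, x_k, \dots, x_{K-1}$]
$s^\nu_k \sim \mathcal{N}(s^\nu_k; A_\nu s^\nu_{k-1}, Q_\nu)$, $\quad A_\nu \sim \mathcal{N}\left(A_\nu; \begin{pmatrix} \cos\omega_\nu & -\sin\omega_\nu \\ \sin\omega_\nu & \cos\omega_\nu \end{pmatrix}, \Psi\right)$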
Inference: Structured Variational Bayes
[Figure: structured approximation with factors $q(A_\alpha)$, $q(Q_\alpha)$ and $\prod_k q(s^\alpha_k \mid s^\alpha_{k-1})$ for each band $\alpha \in C$, and $q(x_k)$]
• Intuitive algorithm
- Subtract from the observed signal $x$ the prediction of the frequency bands in $\neg\alpha$
- Compute a fit for $\alpha$ to this residual and iterate
• For fixed $A$, $Q$ this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
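The analogy with Gauss-Seidel can be seen on a plain linear system $Ax = b$: each sweep re-solves one unknown at a time after subtracting the contribution of all the others, much as each band $\alpha$ is fitted to the residual left by the bands in $\neg\alpha$. This is a generic scalar sketch, not the variational algorithm itself; convergence is assumed via a symmetric positive definite (or diagonally dominant) $A$.

```python
def gauss_seidel(A, b, iters=100, tol=1e-12):
    """Solve A x = b by Gauss-Seidel sweeps, using the newest values of the other unknowns."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        max_change = 0.0
        for i in range(n):
            # residual of equation i after removing the other unknowns' contributions
            r = b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)
            new = r / A[i][i]              # fit unknown i to that residual
            max_change = max(max_change, abs(new - x[i]))
            x[i] = new
        if max_change < tol:
            break
    return x
```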
Restoration
• Piano
- Signal with missing samples (37)
- Reconstruction: 7.68 dB improvement
- Original
• Trumpet
- Signal with missing samples (37)
- Reconstruction: 7.10 dB improvement
- Original
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
[Figure: for each $\nu = 1, \dots, W$, an indicator chain $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and a latent chain $\theta^\nu_0, \dots, \theta^\nu_k, \dots, \theta^\nu_K$; observations $y_k, \dots, y_K$]
• Generalises source-filter models
Harmonic model with changepoints
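$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1})$, $\quad r_k \in \{0, 1\}$
$\theta_k \mid \theta_{k-1}, r_k \sim [r_k = 0]\, \underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\, \underbrace{\mathcal{N}(0, S)}_{\text{new}}$
$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$
$A = \begin{pmatrix} G_\omega & & & \\ & G_{2\omega} & & \\ & & \ddots & \\ & & & G_{H\omega} \end{pmatrix}^N$, $\quad G_\omega = \rho_k \begin{pmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{pmatrix}$
Damping factor $0 < \rho_k < 1$, frame length $N$, and damped sinusoidal basis matrix $C$ of size $N \times 2H$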
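Building the transition matrix makes the role of $N$ concrete: each harmonic contributes one $2 \times 2$ damped rotation block, and raising the block-diagonal matrix to the power $N$ advances the state by one frame of $N$ samples. The sketch assumes $G_{h\omega} = \rho_k R(h\omega)$, with the same damping for every harmonic (an assumption, since the slide only defines $G_\omega$); each block then equals $\rho_k^N R(N h \omega)$, which the check confirms.

```python
import math


def rot(theta, rho=1.0):
    """rho * [[cos, -sin], [sin, cos]]: a damped rotation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[rho * c, -rho * s], [rho * s, rho * c]]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def matpow(P, N):
    """P^N by repeated multiplication (fine for small N)."""
    R = [[float(i == j) for j in range(len(P))] for i in range(len(P))]
    for _ in range(N):
        R = matmul(R, P)
    return R


def harmonic_transition(omega, rho, H, N):
    """A = diag(G_w, G_2w, ..., G_Hw)^N, with G_hw = rho * rot(h * omega) (assumed)."""
    A = [[0.0] * (2 * H) for _ in range(2 * H)]
    for h in range(1, H + 1):
        block = matpow(rot(h * omega, rho), N)   # powers of a block-diagonal matrix act blockwise
        i = 2 * (h - 1)
        for r in range(2):
            for c in range(2):
                A[i + r][i + c] = block[r][c]
    return A
```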
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
- Conditional Gaussians are not closed under marginalization
⇒ Unlike in HMMs or KFMs, summing over $r_k$ does not simplify the filtering density
⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially
[Figure: numerical example of the growing number of Gaussian kernels]
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time/space
- Intuition: when a changepoint occurs, the state vector $\theta$ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, Ó Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)
[Figure: example changepoint sequence $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$ with states $\theta_0, \dots, \theta_5$ and observations $y_1, \dots, y_5$]
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
Monophonic model (Cemgil et al. 2006)
• We introduce a pitch label indicator $m$
• At each time $k$ the process can be in one of the {“mute”, “sound”} × $M$ states
[Figure: chains $r_{0:T}$, $m_{0:T}$, $s_{0:T}$ with observations $y_{1:T}$]
Monophonic Pitch Tracking
Monophonic pitch tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$
[Figure: audio signal and pitch label estimates over samples 100 to 1000]
• If pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
[Figure: exact inference result over samples 500 to 3500; audio example]
Tracking Pitch Variations
• Allow $m$ to change with $k$
• Intractable: need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: frequency $\nu$ against time $k$, and the signal $x_k$]
• Each latent changepoint process $\nu = 1, \dots, W$ corresponds to a “piano key”; the indicators $r_{1:W, 1:K}$ encode a latent “piano roll”
Single time slice - Bayesian Variable Selection
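$r_i \sim \mathcal{C}(r_i; \pi_{\mathrm{on}}, \pi_{\mathrm{off}})$
$s_i \mid r_i \sim [r_i = \mathrm{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \mathrm{on}]\, \delta(s_i)$
$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$, with $C \equiv [\, C_1 \cdots C_i \cdots C_W \,]$
[Figure: indicators $r_1, \dots, r_W$, coefficients $s_1, \dots, s_W$ and observation $x$]
• Generalized linear model: the columns of $C$ are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When $W$ is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman; Attias)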
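For a small number of sources, the $2^W$-component posterior over indicators can be enumerated directly. The sketch below simplifies to a scalar observation $x = \sum_i c_i s_i + \epsilon$ with $\epsilon \sim \mathcal{N}(0, R)$ and $\Sigma = \sigma^2$; these are assumptions made so that the marginal likelihood needs no matrix determinant. Integrating out $s$ gives $x \mid r \sim \mathcal{N}(0, R + \sigma^2 \sum_{i \in \mathrm{on}} c_i^2)$. The loop over all configurations is exactly the part that becomes intractable for large $W$.

```python
import itertools
import math


def log_normal(x, var):
    return -0.5 * (math.log(2 * math.pi * var) + x * x / var)


def indicator_posterior(x, c, sigma2, R, pi_on):
    """Exact p(r | x) over all r in {0, 1}^W for the scalar model above (1 = on)."""
    logp = {}
    for r in itertools.product((0, 1), repeat=len(c)):
        var = R + sigma2 * sum(ci * ci for ci, ri in zip(c, r) if ri)
        log_prior = sum(math.log(pi_on if ri else 1 - pi_on) for ri in r)
        logp[r] = log_prior + log_normal(x, var)
    top = max(logp.values())                     # subtract max for numerical stability
    z = sum(math.exp(v - top) for v in logp.values())
    return {r: math.exp(v - top) / z for r, v in logp.items()}
```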
Factorial Switching State space model
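$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$
$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$
$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$ (Changepoint indicator)
$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\, \theta_{k-1,\nu}, Q_\nu(r_k))$ (Latent state)
$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R)$ (Observation)
[Figure: for each $\nu = 1, \dots, W$, chains $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_K$; observations $y_k, \dots, y_K$]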
Synthetic Data
[Figure: synthetic signal $x$ and frequency $\nu$ against time $k$; audio example]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable: computations with large matrices
- Need advanced techniques from linear algebra
- Interesting links to subspace methods
• Hyperparameter learning is necessary
Spectrogram
• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau)$ = (frequency, time), such as STFT or MDCT
$x(t) = \sum_k s_k \phi_k(t)$
[Figure: spectrograms of speech and piano, frequency (Hz, 0 to 10000) against time; audio examples]
• The spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of the STFT)
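As an illustration of the STFT case, the sketch below computes $|s_k|^2$ on a $(\nu, \tau)$ grid with a Hann window and a naive DFT. Window length, hop size and the periodic Hann window are arbitrary choices for the example; a practical implementation would use an FFT, and an MDCT would be another valid choice of $\phi_k$.

```python
import cmath
import math


def stft_power(x, win_len=64, hop=32):
    """|s_k|^2 for atoms k = (nu, tau): Hann-windowed frames and a naive O(N^2) DFT.

    Returns frames[tau][nu] for the non-negative frequency bins nu = 0..win_len/2.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len) for n in range(win_len)]
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(win_len)]
        spec = []
        for nu in range(win_len // 2 + 1):
            s = sum(seg[n] * cmath.exp(-2j * math.pi * nu * n / win_len)
                    for n in range(win_len))
            spec.append(abs(s) ** 2)
        frames.append(spec)
    return frames
```

A log-spectrogram display is then `math.log` of these values (plus a small floor to avoid `log(0)`).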
Models for time-frequency Energy distributions
• Non-negative matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004)
$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$
Spectrogram = Spectral Templates × Excitations
- however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
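One standard way to fit $X \approx WS$ is Lee and Seung's multiplicative updates for the generalised KL divergence, sketched below in plain Python. The choice of cost is an assumption here (it corresponds to the Poisson source model later in the deck) and is not claimed to be the method of any of the cited papers; matrix sizes and the random initialisation are arbitrary.

```python
import math
import random


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def kl(X, V):
    """Generalised KL divergence sum(X log(X/V) - X + V) over all entries."""
    return sum((x * math.log(x / v) if x > 0 else 0.0) - x + v
               for xr, vr in zip(X, V) for x, v in zip(xr, vr))


def nmf_kl(X, J, iters=200, seed=0):
    """Multiplicative updates for X ~ W S (templates W, excitations S) under KL."""
    rng = random.Random(seed)
    F, T = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(J)] for _ in range(F)]
    S = [[rng.random() + 0.1 for _ in range(T)] for _ in range(J)]
    history = []
    for _ in range(iters):
        V = matmul(W, S)
        for j in range(J):  # S_jt <- S_jt * sum_f W_fj X_ft / V_ft / sum_f W_fj
            wsum = sum(W[f][j] for f in range(F))
            for t in range(T):
                S[j][t] *= sum(W[f][j] * X[f][t] / V[f][t] for f in range(F)) / wsum
        V = matmul(W, S)
        for j in range(J):  # W_fj <- W_fj * sum_t S_jt X_ft / V_ft / sum_t S_jt
            ssum = sum(S[j][t] for t in range(T))
            for f in range(F):
                W[f][j] *= sum(S[j][t] * X[f][t] / V[f][t] for t in range(T)) / ssum
        history.append(kl(X, matmul(W, S)))
    return W, S, history
```

The updates keep every entry non-negative and do not increase the divergence, which is why `history` is expected to be monotone.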
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005)
$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$
Spectrogram = Mask × Source0 + (1 - Mask) × Source1
- however, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms
$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$
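One-channel source separation: Gaussian source model
[Figure: graphical model with variances $v_{k,1}, \dots, v_{k,N}$, sources $s_{k,1}, \dots, s_{k,N}$ and observation $x_k$ in a plate over $k = 1, \dots, K$]
$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• Straightforward application of Bayes' theorem yields
$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\left(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})\right)$
$\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$ (Responsibilities)
• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$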
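The posterior above amounts to a per-coefficient Wiener filter. A direct transcription of the two formulas follows (the function name is illustrative), with a check that the posterior means add back up to the observation, since $\sum_n \kappa_{k,n} = 1$:

```python
def gaussian_separation(x, v):
    """Posterior of each source s_n given x = sum_n s_n and s_n ~ N(0, v_n).

    kappa_n = v_n / sum(v); mean_n = kappa_n * x; var_n = v_n * (1 - kappa_n).
    """
    total = sum(v)
    kappa = [vn / total for vn in v]
    means = [k * x for k in kappa]
    variances = [vn * (1 - k) for vn, k in zip(v, kappa)]
    return means, variances
```

For two sources the posterior variance $v_1 v_2 / (v_1 + v_2)$ is the same for both, because knowing one source fixes the other.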
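One-channel source separation: Poisson source model
[Figure: graphical model with intensities $v_{k,1}, \dots, v_{k,N}$, sources $s_{k,1}, \dots, s_{k,N}$ and observation $x_k$ in a plate over $k = 1, \dots, K$]
$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• This is the generative model for NMF when we write $v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (Template × Excitation)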
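Conditioned on $x_k$, independent Poisson sources are multinomially distributed, $s_{k,1:N} \mid x_k \sim \mathcal{M}(x_k; \kappa_{k,1:N})$, with the same responsibilities $\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$ as in the Gaussian case. The sketch draws such a split with the standard library and computes its mean $\kappa_{k,n} x_k$; the function names are illustrative.

```python
import random


def poisson_split(x, v, rng):
    """Draw s_{1:N} ~ p(s | x, v): allocate each of the x counts to source n w.p. kappa_n."""
    counts = [0] * len(v)
    for n in rng.choices(range(len(v)), weights=v, k=x):
        counts[n] += 1
    return counts


def expected_split(x, v):
    """Posterior mean E[s_n | x] = kappa_n * x."""
    total = sum(v)
    return [x * vn / total for vn in v]
```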
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: densities $p(x)$ for $a = 0.9, 1, 1.3, 2$ with $b = 1$ (first panel) and for $(a, b) = (1, 1), (1, 0.5), (2, 1)$ (second panel)]
$\mathcal{G}(x; a, z) \equiv \exp\left((a - 1)\log x - z^{-1} x + a \log z^{-1} - \log\Gamma(a)\right)$
$\mathcal{IG}(x; a, z) \equiv \exp\left((a + 1)\log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log\Gamma(a)\right)$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity, inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
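The two definitions can be transcribed directly. Under this parametrisation $z$ is the scale of the Gamma, and $x \sim \mathcal{IG}(a, z)$ exactly when $1/x \sim \mathcal{G}(a, z)$. The sketch checks normalisation numerically on a log-spaced grid (grid limits and step count are arbitrary choices) and checks the change-of-variables identity $\mathcal{IG}(x; a, z) = \mathcal{G}(1/x; a, z)/x^2$.

```python
import math


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a-1) log x - x/z + a log(1/z) - log Gamma(a)."""
    return (a - 1) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = (a+1) log(1/x) - 1/(z x) + a log(1/z) - log Gamma(a)."""
    return -(a + 1) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)


def integrate(logpdf, lo=-12.0, hi=12.0, steps=24000):
    """Midpoint rule for the integral of exp(logpdf(x)) over x > 0, with x = e^u."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        u = lo + (i + 0.5) * h
        total += math.exp(logpdf(math.exp(u)) + u) * h   # dx = e^u du
    return total
```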
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:
$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$
$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$
[Figure: chain $z_1, \dots, v_{k-1}, z_k, v_k, z_{k+1}, \dots$ with couplings $a_z$, $a$, $a_z$]
• Variance variables $v$ are priors for sources
• Auxiliary variables $z$ are needed for conjugacy and positive correlation
• Shape parameters $a$ and $a_z$ describe the coupling strength and drift of the chain
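Draws like those on the following slides can be generated with the standard library, using the fact that $x \sim \mathcal{IG}(x; a, z)$ means $1/x$ is Gamma with shape $a$ and scale $z$, which is what Python's `random.gammavariate(a, z)` samples. The initial value $z_1 = 1$ and the coupling diagnostic `mean_abs_log_step` are illustrative choices, not from the slides.

```python
import math
import random


def sample_ig(rng, a, z):
    """Draw x ~ IG(x; a, z) as defined above: 1/x ~ Gamma(shape a, scale z)."""
    return 1.0 / rng.gammavariate(a, z)


def sample_gamma_chain(K, a, a_z, z1=1.0, seed=0):
    """v_k | z_k ~ IG(a, z_k/a) and z_{k+1} | v_k ~ IG(a_z, v_k/a_z), for k = 1..K."""
    rng = random.Random(seed)
    z, vs = z1, []
    for _ in range(K):
        v = sample_ig(rng, a, z / a)
        vs.append(v)
        z = sample_ig(rng, a_z, v / a_z)
    return vs


def mean_abs_log_step(vs):
    """Average |log v_{k+1} - log v_k|; small values indicate strong coupling."""
    return sum(abs(math.log(cur / prev)) for prev, cur in zip(vs, vs[1:])) / (len(vs) - 1)
```

Large shape parameters give a smooth, slowly varying $\log v_k$; small ones give a rough chain, in line with the panels below.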
Gamma Chains typical draws
[Figure: three typical draws of $\log v_k$ for $k = 1, \dots, 1000$, with $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains with changepoints typical draws
[Figure: three typical draws of $\log v_k$ with changepoints for $k = 1, \dots, 1000$, with $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (Pairwise)
[Figure: chain $z_1, \dots, v_{k-1}, z_k, v_k, z_{k+1}, \dots$ with couplings $a_z$, $a$, $a_z$]
$\phi_{z_k} = \exp\left((a_z + a + 1) \log z_k^{-1}\right)$ (Singletons)
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$ (Pairwise)
$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right) \log \xi_i^{-1}\right)$ (Singletons)
Possible Model Topologies
Approximate Inference
• Stochastic
- Markov Chain Monte Carlo: Gibbs sampler
- Sequential Monte Carlo: particle filtering
• Deterministic
- Variational Bayes
In all of these, conjugacy helps
VB or Gibbs
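[Figure: factor graph chain $\psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, \dots$ with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$
• Gibbs
$v^{(\tau)}_k \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z^{(\tau)}_k)\, \psi_{k,k+1}(z^{(\tau)}_{k+1})$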
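Assuming $y_k \mid v_k \sim \mathcal{N}(y_k; 0, v_k)$ (the source coefficient observed directly, an assumption for this sketch), collecting the terms of the joint that involve $v_k$ gives a closed-form inverse Gamma full conditional with shape $a + a_z + \tfrac12$ and rate $a/z_k + a_z/z_{k+1} + y_k^2/2$. The sketch computes these parameters and draws from the conditional; the check confirms that the unnormalised log joint and the inverse Gamma log density differ only by a constant in $v_k$. All function names are illustrative.

```python
import math
import random


def log_ig(x, a, z):
    """log IG(x; a, z) in the slides' parametrisation (1/x ~ Gamma(shape a, scale z))."""
    return -(a + 1) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)


def log_joint_terms(v, z_k, z_next, y_k, a, a_z):
    """All factors of the joint that involve v_k (unnormalised log full conditional)."""
    log_lik = -0.5 * math.log(2 * math.pi * v) - y_k * y_k / (2 * v)
    return log_lik + log_ig(v, a, z_k / a) + log_ig(z_next, a_z, v / a_z)


def v_conditional(z_k, z_next, y_k, a, a_z):
    """(shape, rate) of the inverse Gamma p(v_k | z_k, z_{k+1}, y_k)."""
    shape = a + a_z + 0.5
    rate = a / z_k + a_z / z_next + 0.5 * y_k * y_k
    return shape, rate


def gibbs_draw_v(rng, z_k, z_next, y_k, a, a_z):
    """One Gibbs draw of v_k: 1/v_k ~ Gamma(shape, scale = 1/rate)."""
    shape, rate = v_conditional(z_k, z_next, y_k, a, a_z)
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)
```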
VB or Gibbs
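[Figure: factor graph chain $\psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, \dots$ with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$
• Gibbs
$z^{(\tau)}_k \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v^{(\tau)}_{k-1})\, \psi_{k,k}(v^{(\tau)}_k)$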
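The same bookkeeping for $z_k$, whose neighbours are $v_{k-1}$ and $v_k$, gives an inverse Gamma with shape $a_z + a$ and rate $a_z/v_{k-1} + a/v_k$; the shape matches the singleton exponent $(a_z + a + 1)$ on $\log z_k^{-1}$ in the potential form of the Gamma chain. A minimal sketch with the same style of check (names illustrative):

```python
import math
import random


def log_ig(x, a, z):
    """log IG(x; a, z) in the slides' parametrisation (1/x ~ Gamma(shape a, scale z))."""
    return -(a + 1) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)


def log_joint_terms_z(z, v_prev, v_k, a, a_z):
    """Factors involving z_k: IG(z_k; a_z, v_{k-1}/a_z) and IG(v_k; a, z_k/a)."""
    return log_ig(z, a_z, v_prev / a_z) + log_ig(v_k, a, z / a)


def z_conditional(v_prev, v_k, a, a_z):
    """(shape, rate) of the inverse Gamma p(z_k | v_{k-1}, v_k)."""
    return a_z + a, a_z / v_prev + a / v_k


def gibbs_draw_z(rng, v_prev, v_k, a, a_z):
    """One Gibbs draw of z_k: 1/z_k ~ Gamma(shape, scale = 1/rate)."""
    shape, rate = z_conditional(v_prev, v_k, a, a_z)
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)
```

Alternating `gibbs_draw_v` and `gibbs_draw_z`-style updates along the chain gives a full Gibbs sweep; the VB updates instead replace the neighbouring variables by expectations under $q$.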
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: spectrograms of the noisy signal (X) and the original (Xorg), and of the reconstructions Xh (SNR 19.98), Xv (SNR 20.79), Xb (SNR 19.68) and Xg (SNR 19.97)]
Denoising - Music
[Figure: spectrograms of the original, the noisy signal, and the reconstructions PF (SNR 8.53), Gibbs (SNR 8.66) and VB]
“Tristram (Matt Uelmen)” + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal, tie across time (harmonic continuity)
• Source 2: vertical, tie across frequency (transients, percussive sounds)
[Figure: frequency bin ($\nu$) against time ($\tau$) for the original Xorg and the two estimated sources Shor and Sver]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

              s1                     s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB            -474   -328   567      -158   1546   -137
Gibbs         -45    -262   457      105    1246   161
GibbsEM       -423   -242   482      134    1313   185
Pre-trained   -404   -315   813      356    1144   464
Oracle        614    1716   658      1266   1995   136

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

              s1                     s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB            -78    -622   453      -235   184    -225
Gibbs         -846   -753   693      -404   1459   -383
GibbsEM       -774   -619   462      -114   1662   -097
Pre-trained   -64    -539   695      38     1639   414
Oracle        121    329    1214     2113   3389   2137
Harmonic-Transient Decomposition
[Figure: frequency bin ($\nu$) against time ($\tau$) for Xorg, Shor and Sver; audio examples for two excerpts: original, horizontal and vertical components]
Chord Detection - Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (Hz, 0 to 4000)]
Chord Detection
[Figure: MDCT of the piano chord 41, 48, 51, 56 (frequency in Hz against time $\tau$ in s), and $\log \sum_\nu v_{\nu j \tau}$ for MIDI notes $j$ from 40 to 65 against time $\tau$ in s]
Multichannel Source Separation
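• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$, $\quad n = 1, \dots, N$
$v_{k,n} \sim \mathcal{IG}\left(v_{k,n}; \nu/2, 2/(\nu\lambda_n)\right)$
$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$, $\quad k = 1, \dots, K$, $\; m = 1, \dots, M$
$a_m \sim \mathcal{N}(a_m; \cdots)$, $\quad r_m \sim \mathcal{IG}(r_m; \cdots)$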
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$
Source Separation
[Figure: spectrograms (frequency in Hz, 0 to 10000, against time) of the speech, piano and guitar sources and of the mix]
Reconstructions
[Figure: spectrograms (frequency in Hz, 0 to 10000, against time) of the reconstructed speech, piano and guitar sources]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
[Figure: traces of the parameters $a$, $\lambda$ and $r$ against epoch (0 to 2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features)
- Determine the position of a performance on a score
- Determine where a human listener would clap her hands
- Create a quantized, human-readable score
• Online/real-time or offline/batch
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: lattice of states with tempo level $n_k \in \{1, 2, 3\}$ (rows) and bar position $m_k \in \{1, \dots, 8\}$ (columns), shown for 3/4 time and 4/4 time]
• Each dot denotes a state $x = (m, n)$ (score position, tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three examples of the transition distribution $p(x_2 \mid x_1)$ over bar position (1 to 8) and tempo level (1 to 5)]
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over bar position (horizontal) and tempo level (vertical)]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over bar position (horizontal) and tempo level (vertical)]
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar Position Pointer: Observation Model (Poisson)
• Observation model p(y_k | x_k): Poisson with intensity μ_k
(Figure: intensity μ_k against bar position m_k (0–1000) for a triplet rhythm and a duplet rhythm)
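The Poisson observation model scores an onset count y_k against an intensity that depends on the bar position. In the sketch below, the intensity profile is a made-up stand-in for the learned triplet and duplet profiles in the figure. It uses Gaussian bumps at assumed beat positions on a bar of length M = 1000, plus a small floor. The names `poisson_logpmf`, `intensity`, `TRIPLET` and `DUPLET` are hypothetical.

```python
import math


def poisson_logpmf(y, mu):
    """log PO(y; mu) for a non-negative integer count y and intensity mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def intensity(m, M, beats, peak=3.0, floor=0.05, width=10.0):
    """Hypothetical onset intensity mu(m) at bar position m (0 <= m < M)."""
    mu = floor
    for b in beats:
        d = min(abs(m - b), M - abs(m - b))  # circular distance within the bar
        mu += peak * math.exp(-0.5 * (d / width) ** 2)
    return mu


M = 1000
TRIPLET = [0, 333, 667]  # hypothetical beat positions for a triplet rhythm
DUPLET = [0, 500]        # hypothetical beat positions for a duplet rhythm
```

With y_k = 0 (no onset), the likelihood exp(−μ) favours positions where little activity is expected.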
Tempo, Rhythm, Meter Analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
(Figure: graphical model with chains n_0, …, n_3, θ_0, …, θ_3, m_0, …, m_3 and r_0, …, r_3, intensities λ_1, …, λ_3 and observations y_1, …, y_3)
• θ: time-signature indicator (e.g. 3/4, 4/4); r: rhythmic-pattern indicator
Filtering
(Figure: observed data y_k; log p(m_k | y_{1:k}) over bar position (200–800); log p(n_k | y_{1:k}) over tempo in quarter notes per minute (60–180); p(r_k | y_{1:k}) for triplets vs duplets; horizontal axis: frame index k, 0–450)
Smoothing
(Figure: observed data y_k; log p(m_k | y_{1:K}) over bar position (200–800); log p(n_k | y_{1:K}) over tempo in quarter notes per minute (60–180); p(r_k | y_{1:K}) for triplets vs duplets; horizontal axis: frame index k, 0–450)
Time Signature
(Figure: observed audio, sample value against time in s (0–12); log p(m_k | z_{1:K}) over bar position (200–800); log p(n_k | z_{1:K}) over tempo in quarter notes per minute (52–155); p(θ_k | z_{1:K}) for 4/4 vs 3/4; horizontal axis: frame index k, 100–500)
Score-Performance Matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
(Figure: score excerpt and audio signal x_t against time t)
Score-Performance Matching: Graphical Model
(Figure: chains t_1, …, t_K and score-position indicators r_1, …, r_K with λ_1, …, λ_K; for each frequency index ν = 1, …, W, variances v_{ν,1}, …, v_{ν,K} and coefficients s_{ν,1}, …, s_{ν,K}; example score with note positions 1–8)
$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau};\ a,\ 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$
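The inverse Gamma prior is conjugate to the Gaussian, so the variance v_{ν,τ} can be integrated out. Under the parametrisation on the Gamma/inverse-Gamma slide, 1/v is Gamma with shape a and scale 1/(aλσ_ν). The coefficient s_{ν,τ} is then Student-t with 2a degrees of freedom and squared scale λσ_ν(r_τ). The sketch scores a vector of coefficients against one template this way. Three points are assumptions rather than the exact likelihood used in the paper: the coefficients are zero-mean Gaussian given v, bins are independent, and the names `coeff_log_marginal` and `template_loglik` are illustrative.

```python
import math


def coeff_log_marginal(s, a, lam, sigma):
    """log of integral N(s; 0, v) IG(v; a, 1/(a*lam*sigma)) dv.

    The result is a Student-t log-density with 2a degrees of freedom and squared scale lam*sigma.
    """
    dof = 2.0 * a
    scale2 = lam * sigma
    return (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
            - 0.5 * math.log(dof * math.pi * scale2)
            - (dof + 1) / 2 * math.log1p(s * s / (dof * scale2)))


def template_loglik(coeffs, a, lam, sigmas):
    """Sum over frequency bins of log p(s_nu | template value sigma_nu), bins assumed independent."""
    return sum(coeff_log_marginal(s, a, lam, sg) for s, sg in zip(coeffs, sigmas))
```

Comparing `template_loglik` across the templates σ(r) for candidate score positions r gives the frame-wise evidence that an alignment HMM would consume.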
Score-Performance Matching: Signal Model
(Figure: spectral template log σ_ν against frequency ν in Hz, 0–4000)
Score-Performance Matching
(Figure: spectrogram data, frequency in Hz (0–4000) against time in s (0–14); MIDI data, MIDI note (55–85) against score position)
Online (filtering) or offline (smoothing) processing is possible
Transcription
(Figure: MDCT of audio (source: Daniel-Ben Pienaar), frequency in Hz (0–4000) against time in s (1–4); log p(r_τ | s_τ) over MIDI note number (60–80) against time in s; log Σ_i w^{(i)}_τ λ^{(i)}_τ against time in s)
Summary
• “Time domain”: switching state space models
– State space modelling
– Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform domain”: Gamma fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical-score-guided source separation
– Prior structures for other observation models (NMF)
Generative Models for Audition
• Computer audition ⇔ inverse synthesis via Bayesian inference
$p(\text{Structure} \mid \text{Observations}) \propto p(\text{Observations} \mid \text{Structure})\, p(\text{Structure})$
Goal: developing flexible prior structures for modelling nonstationary sources, for
∗ source separation, transcription
∗ restoration, interpolation, localisation, identification
∗ coding, compression, resynthesis, cross-synthesis
Bayesian Source Separation
• Joint estimation of sources given observations
(Figure: graphical model. Source model: sources s_{k,1}, …, s_{k,n}, …, s_{k,N} with source-prior parameters v. Observation model: observations x_{k,1}, …, x_{k,M}, k = 1, …, K, with parameters λ (channel, noise, mixing system).)
$p(\text{Src} \mid \text{Obs}) \propto \int d\lambda\, dv\; p(\text{Obs} \mid \text{Src}, \lambda)\, p(\text{Src} \mid v)\, p(v)$
Audio Restoration/Interpolation
• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis
(Figure: example signal, sample index 0–500)
Polyphonic Music Transcription
• from sound
(Figure: spectrogram, frequency f in Hz (0–5000) against time t in s (0–8); audio example (S))
• to score
Modelling and Computational Issues
• Hierarchical
– Signal level: pitch, onsets, timbre
– Symbolic level: melody, motives, harmony, chords, tonality, rhythm, beat, tempo, articulation, instrumentation, voice
– Cognitive level: expression, genre, form, style, mood, emotion
• Uncertainty
– Parameter learning: which pitch, rhythm, tempo, meter, time signature?
– Model selection: how many notes, harmonics, onsets, sections?
Generative Models for Music
(Figure: score and expression, piano-roll, signal)
Hierarchical Modeling of Music
(Figure: hierarchical graphical model over time steps 1, …, t, with chains M, v, k, h and m, per-voice chains g_j, r_j, n_j, x_j and y_j, and observations y_1, …, y_t)
Modelling Levels
• Physical / acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, generalised linear model
• Feature based
Research Questions
What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?
Signal Models for Audio
• Time domain: state space dynamical models
– Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA), switching state space models
– Flexible, physically realistic
– Analysis down to sample precision; computationally quite heavy
• Transform domain: Fourier representations, generalised linear model
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
Sinusoidal Modeling
• Sound is primarily about oscillations and resonance
• Cascade of second-order systems
• Audio signals can often be compactly represented by sinusoids
(real) $y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$
(complex) $y_n = \sum_{k=1}^{p} c_k \big(e^{-\gamma_k + j\omega_k}\big)^n$
$y = F(\gamma_{1:p}, \omega_{1:p})\, c$
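The real and complex forms agree once c_k = α_k e^{jφ_k}, because α_k e^{−γ_k n} cos(ω_k n + φ_k) = Re(c_k (e^{−γ_k + jω_k})^n). The sketch checks this numerically. It also builds the matrix F(γ, ω) column by column as damped complex exponentials, so that y = F c. The function names and parameter values are illustrative.

```python
import cmath
import math


def real_form(n, alpha, gamma, omega, phi):
    """y_n = sum_k alpha_k exp(-gamma_k n) cos(omega_k n + phi_k)."""
    return sum(a * math.exp(-g * n) * math.cos(w * n + p)
               for a, g, w, p in zip(alpha, gamma, omega, phi))


def design_matrix(N, gamma, omega):
    """F[n][k] = (exp(-gamma_k + j omega_k))^n for n = 0..N-1, so that y = F c."""
    return [[cmath.exp(complex(-g, w) * n) for g, w in zip(gamma, omega)] for n in range(N)]


def complex_form(F, c):
    """y = F c."""
    return [sum(f * ck for f, ck in zip(row, c)) for row in F]


alpha, gamma, omega, phi = [1.0, 0.5], [0.01, 0.05], [0.3, 1.1], [0.2, -0.7]
c = [a * cmath.exp(1j * p) for a, p in zip(alpha, phi)]  # c_k = alpha_k exp(j phi_k)
y_complex = complex_form(design_matrix(50, gamma, omega), c)
```

Because y is linear in c for fixed (γ, ω), the amplitudes and phases can be estimated by least squares once the poles are known.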
State Space Parametrisation
$x_{n+1} = \underbrace{\operatorname{diag}\big(e^{-\gamma_1 + j\omega_1}, \ldots, e^{-\gamma_p + j\omega_p}\big)}_{A}\, x_n, \qquad x_0 = (c_1, c_2, \ldots, c_p)^\top$
$y_n = \underbrace{(1\ 1\ \cdots\ 1)}_{C}\, x_n$
(Figure: state space chain x_0, x_1, …, x_{k−1}, x_k, …, x_K with observations y_1, …, y_{k−1}, y_k, …, y_K)
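Iterating the diagonal recursion reproduces the closed form, since x_n = A^n x_0 and therefore y_n = C x_n = Σ_k c_k (e^{−γ_k + jω_k})^n. Below is a minimal check in plain Python; the function name and the parameter values are hypothetical.

```python
import cmath


def simulate_state_space(c, gamma, omega, K):
    """Run x_{n+1} = A x_n, y_n = C x_n with A = diag(exp(-gamma_k + j omega_k)), C = (1, ..., 1)."""
    poles = [cmath.exp(complex(-g, w)) for g, w in zip(gamma, omega)]
    x = list(c)  # x_0 = (c_1, ..., c_p)
    ys = []
    for _ in range(K):
        x = [p * xi for p, xi in zip(poles, x)]  # x_n = A x_{n-1}
        ys.append(sum(x))                        # y_n = C x_n
    return ys  # ys[n - 1] holds y_n for n = 1..K


c, gamma, omega = [1.0 + 0.5j, 0.3 - 0.2j], [0.02, 0.1], [0.4, 2.0]
ys = simulate_state_space(c, gamma, omega, 30)
```

The state space form is what makes Kalman filtering and smoothing, and hence the interpolation and changepoint models that follow, applicable.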
Audio Interpolation
$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\; p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$, where $H \equiv$ (parameters, hidden states)
(Figure: graphical model with H as parent of the missing samples x_{¬κ} and the observed samples x_κ; example signal, sample index 0–500)
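When H is fixed rather than integrated out, interpolation reduces to Gaussian conditioning: E[x_{¬κ} | x_κ] = Σ_{¬κ,κ} Σ_{κ,κ}^{−1} x_κ. For illustration only, the sketch assumes a zero-mean, unit-variance AR(1) prior with covariance ρ^{|i−j|} in place of the hierarchical model on the slides. The helpers `solve`, `ar1_cov` and `interpolate` are hypothetical.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def ar1_cov(i, j, rho):
    """Covariance of a unit-variance AR(1) process: rho^|i - j|."""
    return rho ** abs(i - j)


def interpolate(x_obs, obs_idx, miss_idx, rho):
    """E[x_missing | x_observed] = Sigma_mo Sigma_oo^{-1} x_obs under the AR(1) prior."""
    K_oo = [[ar1_cov(i, j, rho) for j in obs_idx] for i in obs_idx]
    w = solve(K_oo, x_obs)
    return [sum(ar1_cov(m, o, rho) * wo for o, wo in zip(obs_idx, w)) for m in miss_idx]
```

Because an AR(1) process is Markov, only the two samples bordering a gap influence the estimate. The hierarchical models on these slides replace the fixed ρ with inferred parameters and hidden states H.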
Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
(Figure: graphical model; for each band ν = 0, …, W − 1, parameters A_ν, Q_ν and a chain s^ν_0, …, s^ν_k, …, s^ν_{K−1}; observations x_0, …, x_k, …, x_{K−1})
$s^\nu_k \sim \mathcal{N}(s^\nu_k;\ A_\nu s^\nu_{k-1},\ Q_\nu), \qquad A_\nu \sim \mathcal{N}\Big(A_\nu;\ \begin{pmatrix} \cos\omega_\nu & -\sin\omega_\nu \\ \sin\omega_\nu & \cos\omega_\nu \end{pmatrix},\ \Psi\Big)$
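Fixing A_ν at its prior mean, a pure rotation by ω_ν, shows why each band behaves like a noisy oscillator. This is an assumption: the slide treats A_ν as random. With Q_ν = 0 the state simply rotates, so its first component traces cos(kω_ν). Below is a sketch with a hypothetical helper `simulate_band`, taking Q_ν = q·I.

```python
import math
import random


def simulate_band(omega, q, K, s0=(1.0, 0.0), seed=0):
    """s_k = R(omega) s_{k-1} + e_k with e_k ~ N(0, q I), for one band with a 2-D state."""
    rng = random.Random(seed)
    c, s = math.cos(omega), math.sin(omega)
    sd = math.sqrt(q)
    u, w = s0
    out = []
    for _ in range(K):
        u, w = (c * u - s * w + rng.gauss(0.0, sd),
                s * u + c * w + rng.gauss(0.0, sd))
        out.append((u, w))
    return out
```

Increasing q lets the amplitude and phase of the band drift from frame to frame, which is the nonstationarity the model is meant to capture.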
Inference: Structured Variational Bayes
(Figure: factorised approximation with q(A_α), q(Q_α), a chain ∏_k q(s^α_k | s^α_{k−1}) for each band α ∈ C, and q(x_k))
• Intuitive algorithm
– Subtract from the observed signal x the prediction of the frequency bands in ¬α
– Compute a fit for α to this residual and iterate
• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
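The update "fit one block to the residual left by all the others" is a Gauss-Seidel sweep. Below is a sketch with scalar blocks for a generic linear system A x = b. The matrix in the usage check is unrelated to the audio model and only illustrates the iteration.

```python
def gauss_seidel(A, b, sweeps=100):
    """Solve A x = b by Gauss-Seidel sweeps (converges e.g. for symmetric positive definite A)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Residual of equation i after subtracting the current fit of all other unknowns
            r = b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = r / A[i][i]  # best fit of unknown i to that residual
    return x
```

Each inner step uses the freshest values of the other unknowns. This mirrors how the band α is refitted against the residual of the already-updated bands.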
Restoration
• Piano
– Signal with missing samples (37)
– Reconstruction: 768 dB improvement
– Original
• Trumpet
– Signal with missing samples (37)
– Reconstruction: 710 dB improvement
– Original
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
(Figure: for each ν = 1, …, W, an indicator chain r^ν_0, …, r^ν_k, …, r^ν_K and a state chain θ^ν_0, …, θ^ν_k, …, θ^ν_K; observations y_k, …, y_K)
• Generalises source-filter models
Harmonic Model with Changepoints
$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$
$\theta_k \mid \theta_{k-1}, r_k \sim [r_k = 0]\, \underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\, \underbrace{\mathcal{N}(0, S)}_{\text{new}}$
$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$
$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^N, \qquad G_\omega = \rho_k \begin{pmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{pmatrix}$
with damping factor 0 < ρ_k < 1, frame length N, and damped sinusoidal basis matrix C of size N × 2H
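Since G_ω^h = ρ_k^h R(hω), where R(·) is the 2 × 2 rotation, each diagonal block of A is simply ρ_k^{hN} R(hNω). In other words, the h-th harmonic is advanced by N samples per frame. Below is a sketch that builds A this way, treating ρ_k as a constant ρ; the function name is hypothetical.

```python
import math


def harmonic_transition(omega, rho, H, N):
    """A = blockdiag(G, G^2, ..., G^H)^N with G = rho * R(omega), as a 2H x 2H nested list.

    Block h equals rho^(h N) R(h N omega), because G^h = rho^h R(h omega).
    """
    A = [[0.0] * (2 * H) for _ in range(2 * H)]
    for h in range(1, H + 1):
        g = rho ** (h * N)
        ang = h * N * omega
        i = 2 * (h - 1)
        A[i][i], A[i][i + 1] = g * math.cos(ang), -g * math.sin(ang)
        A[i + 1][i], A[i + 1][i + 1] = g * math.sin(ang), g * math.cos(ang)
    return A
```

Higher harmonics decay faster (ρ^{hN}), which is one simple way the model encodes damping of upper partials.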
Exact Inference in Switching State Space Models Is Intractable
• In general, exact inference is NP-hard
– Conditional Gaussians are not closed under marginalisation
⇒ Unlike in HMMs or KFMs, summing over r_k does not simplify the filtering density
⇒ The number of Gaussian kernels needed to represent the exact filtering density p(r_k, θ_k | y_{1:k}) increases exponentially
Exact Inference for Changepoint Detection
• Exact inference is achievable in polynomial time/space
– Intuition: when a changepoint occurs, the state vector θ is reinitialised ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; Ó Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)
(Figure: example with r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0; states θ_0, …, θ_5; observations y_1, …, y_5)
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
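The reset argument can be seen in a minimal scalar instance. After a changepoint the state no longer depends on the past, so all reset paths collapse into one kernel. The filter then carries one Gaussian per possible time of the last changepoint: k + 1 kernels after k observations instead of 2^k. The sketch below is not the algorithm of the cited papers. Its assumptions are an independent changepoint probability `hazard` (instead of a Markov p(r_k | r_{k−1})), a scalar random-walk state, and the hypothetical function name `changepoint_filter`.

```python
import math


def changepoint_filter(ys, hazard, q, r, s):
    """Exact filtering for a scalar reset model (illustrative special case).

    r_k = 1 with probability `hazard`, independently over k;
    theta_k = theta_{k-1} + N(0, q) if r_k = 0, theta_k ~ N(0, s) if r_k = 1;
    y_k ~ N(theta_k, r).
    Returns, for each k, a list of (probability, mean, variance) kernels.
    """
    kernels = [(0.0, 0.0, s)]  # (log weight, mean, var); theta_0 ~ N(0, s)
    history = []
    for y in ys:
        # Each kernel either continues as a random walk ...
        pred = [(lw + math.log(1 - hazard), m, v + q) for lw, m, v in kernels]
        # ... or is reset; all reset paths collapse into one kernel (weights sum to 1)
        pred.append((math.log(hazard), 0.0, s))
        new = []
        for lw, m, v in pred:  # scalar Kalman update of every kernel
            sv = v + r
            loglik = -0.5 * ((y - m) ** 2 / sv + math.log(2 * math.pi * sv))
            gain = v / sv
            new.append((lw + loglik, m + gain * (y - m), (1 - gain) * v))
        mx = max(lw for lw, _, _ in new)
        log_z = mx + math.log(sum(math.exp(lw - mx) for lw, _, _ in new))
        kernels = [(lw - log_z, m, v) for lw, m, v in new]
        history.append([(math.exp(lw), m, v) for lw, m, v in kernels])
    return history
```

Pruning low-weight kernels, as in the MMAP argument above, would keep the per-step cost bounded in practice.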
Monophonic Model (Cemgil et al. 2006)
• We introduce a pitch label indicator m
• At each time k the process can be in one of the {“mute”, “sound”} × M states
(Figure: graphical model with chains r_0, …, r_T, m_0, …, m_T and s_0, …, s_T, and observations y_1, …, y_T)
Monophonic Pitch Tracking
Monophonic pitch tracking = online estimation (filtering) of p(r_k, m_k | y_{1:k})
(Figure: two panels over time index 100–1000)
• If pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
(Figure: exact inference result over samples 500–3500; audio example (S))
Tracking Pitch Variations
• Allow m to change with k
(Figure: time index 50–500)
• Intractable; need to resort to approximate inference (mixture Kalman filter, Rao-Blackwellised particle filter)
Factorial Generative Models for Analysis of Polyphonic Audio
(Figure: latent indicators over frequency index ν and time k, and the signal x_k)
• Each latent changepoint process ν = 1, …, W corresponds to a “piano key”. The indicators r_{1:W, 1:K} encode a latent “piano roll”. Audio examples: (S1) (S2) (S3)
Single Time Slice: Bayesian Variable Selection
$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$
$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$
$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R), \qquad C \equiv [\, C_1 \ \cdots \ C_i \ \cdots \ C_W \,]$
(Figure: graphical model with indicators r_1, …, r_W, coefficients s_1, …, s_W and observation x)
• Generalised linear model: the columns of C are the basis vectors
• The exact posterior is a mixture of 2^W Gaussians
• When W is large, computing posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman, Attias)
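For small W the 2^W-component posterior can be computed by brute force. Integrating out s gives x | r ∼ N(0, R + Σ_{i on} C_i Σ C_i^⊤), so each configuration only needs one Gaussian evaluation. The sketch assumes a scalar prior variance Σ = σ² for every active coefficient and R = r·I. The helper names are hypothetical.

```python
import itertools
import math


def cholesky(S):
    """Lower-triangular L with L L^T = S for a small symmetric positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    return L


def gauss_logpdf_zero_mean(x, S):
    """log N(x; 0, S) via a Cholesky factorisation."""
    L = cholesky(S)
    u = []
    for i in range(len(x)):  # forward substitution L u = x, so x^T S^{-1} x = u^T u
        u.append((x[i] - sum(L[i][k] * u[k] for k in range(i))) / L[i][i])
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(len(x)))
    return -0.5 * (sum(ui * ui for ui in u) + logdet + len(x) * math.log(2 * math.pi))


def indicator_posterior(x, columns, sigma2, r, pi_on):
    """Exact p(r_{1:W} | x) by enumerating all 2^W on/off configurations."""
    D = len(x)
    logp = {}
    for config in itertools.product((0, 1), repeat=len(columns)):
        S = [[r if a == b else 0.0 for b in range(D)] for a in range(D)]  # R = r I
        for on, col in zip(config, columns):
            if on:  # add sigma2 * C_i C_i^T for every active basis vector
                for a in range(D):
                    for b in range(D):
                        S[a][b] += sigma2 * col[a] * col[b]
        log_prior = sum(math.log(pi_on if on else 1.0 - pi_on) for on in config)
        logp[config] = log_prior + gauss_logpdf_zero_mean(x, S)
    mx = max(logp.values())
    z = sum(math.exp(lp - mx) for lp in logp.values())
    return {cfg: math.exp(lp - mx) / z for cfg, lp in logp.items()}
```

The loop over `itertools.product` is exactly the exponential cost noted above. It is feasible for a handful of basis vectors but not for a full piano keyboard.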
Factorial Switching State Space Model
$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$
$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$
$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}\big(r_{k,\nu}; \pi_\nu(r_{k-1,\nu})\big)$ (changepoint indicator)
$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}\big(\theta_{k,\nu}; A_\nu(r_k)\, \theta_{k-1,\nu}, Q_\nu(r_k)\big)$ (latent state)
$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k\, \theta_{k,1:W}, R)$ (observation)
(Figure: for each ν = 1, …, W, an indicator chain r^ν_0, …, r^ν_k, …, r^ν_K and a state chain s^ν_0, …, s^ν_k, …, s^ν_K; observations y_k, …, y_K)
Synthetic Data
(Figure: latent indicators and signal x over frequency index ν and time k; audio example (S))
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable: computations with large matrices
– Need advanced techniques from linear algebra
– Interesting links to subspace methods
• Hyperparameter learning is necessary
Spectrogram
• Basis functions φ_k(t) centred around time-frequency atom k = k(ν, τ) = (frequency, time), such as STFT or MDCT
$x(t) = \sum_k s_k \phi_k(t)$
(Figure: spectrograms of speech and piano, frequency f in Hz (0–10000) against time t; audio examples)
• The spectrogram displays log |s_k| or |s_k|^2 (of the STFT)
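The spectrogram is the squared magnitude of the coefficients s_k on a grid of time-frequency atoms. The sketch uses a direct DFT of windowed frames; the periodic Hann window and the function name `stft_power` are assumed choices, and clarity is favoured over speed.

```python
import cmath
import math


def stft_power(x, frame_len, hop):
    """|s_k|^2 on the STFT grid k = (nu, tau); returns power[tau][nu] for nu = 0..frame_len/2."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    power = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * win[n] for n in range(frame_len)]
        row = []
        for nu in range(frame_len // 2 + 1):
            s_k = sum(seg[n] * cmath.exp(-2j * math.pi * nu * n / frame_len) for n in range(frame_len))
            row.append(abs(s_k) ** 2)
        power.append(row)
    return power


# A cosine exactly on bin 8 of a 64-sample frame
signal = [math.cos(2 * math.pi * 8 * n / 64) for n in range(512)]
P = stft_power(signal, 64, 32)
```

Taking the log of each entry gives the displayed image. The models that follow instead treat each |s_k|² as a noisy estimate of an underlying variance.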
Models for Time-Frequency Energy Distributions
• Non-negative matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004)
$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$
Spectrogram = Spectral templates × Excitations
– however, spectrograms are not additive ($a^2 + b^2 \neq (a+b)^2$)
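A common way to fit X ≈ W S is with the Lee-Seung multiplicative updates for the generalised KL divergence. Minimising this divergence is maximum likelihood under a Poisson observation model with intensity (WS)_{ντ}. The sketch is a plain-Python version of those standard updates and is not code from the talk. The names `matmul`, `kl_divergence` and `nmf_kl` are hypothetical.

```python
import math
import random


def matmul(A, B):
    return [[sum(a * B[k][j] for k, a in enumerate(row)) for j in range(len(B[0]))] for row in A]


def kl_divergence(X, V):
    """Generalised KL divergence D(X || V) = sum x log(x / v) - x + v."""
    total = 0.0
    for row_x, row_v in zip(X, V):
        for x, v in zip(row_x, row_v):
            total += (x * math.log(x / v) - x + v) if x > 0 else v
    return total


def nmf_kl(X, J, iters=200, seed=0):
    """Lee-Seung multiplicative updates for X ~ W S (W: F x J templates, S: J x T excitations)."""
    rng = random.Random(seed)
    F, T = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(J)] for _ in range(F)]
    S = [[rng.random() + 0.1 for _ in range(T)] for _ in range(J)]
    divs = []
    for _ in range(iters):
        V = matmul(W, S)
        S = [[S[j][t] * sum(W[f][j] * X[f][t] / V[f][t] for f in range(F))
              / sum(W[f][j] for f in range(F)) for t in range(T)] for j in range(J)]
        V = matmul(W, S)
        W = [[W[f][j] * sum(S[j][t] * X[f][t] / V[f][t] for t in range(T))
              / sum(S[j][t] for t in range(T)) for j in range(J)] for f in range(F)]
        divs.append(kl_divergence(X, matmul(W, S)))
    return W, S, divs
```

The divergence is non-increasing under these updates. The additivity caveat above still applies, because the model adds energies rather than complex coefficients.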
Models for Time-Frequency Energy Distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005)
$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$
Spectrogram = Mask × Source_0 + (1 − Mask) × Source_1
– however, sources do overlap in time and frequency
Prior Structures on Time-Frequency Energy Distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)
• We place a suitable prior on the variance of the transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms
$p(s \mid v)\, p(v) = \Big(\prod_k p(s_k \mid v_k)\Big)\, p(v)$
One Channel Source Separation: Gaussian Source Model
(Figure: graphical model v_{k,1}, …, v_{k,N} → s_{k,1}, …, s_{k,N} → x_k, for k = 1, …, K)
$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• A straightforward application of Bayes' theorem yields
$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n};\ \kappa_{k,n} x_k,\ v_{k,n}(1 - \kappa_{k,n})\big), \qquad \kappa_{k,n} = v_{k,n} \Big/ \sum_{n'} v_{k,n'}$ (responsibilities)
• Each source coefficient s_{k,n} gets a fraction κ_{k,n} of the observation x_k
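Below is a direct transcription of the posterior above for one atom k (a Wiener-style split). The function name is hypothetical. Two consequences can be read off the code: the posterior means always add back up to x, and the posterior variances shrink as a source's share κ grows.

```python
def gaussian_source_posterior(x, v):
    """Posterior of s_n given x = sum_n s_n with s_n ~ N(0, v_n): responsibilities, means, variances."""
    total = sum(v)
    kappa = [vn / total for vn in v]                              # kappa_n = v_n / sum_n' v_n'
    means = [kn * x for kn in kappa]                              # E[s_n | x] = kappa_n x
    variances = [vn * (1.0 - kn) for vn, kn in zip(v, kappa)]     # Var[s_n | x] = v_n (1 - kappa_n)
    return kappa, means, variances
```

In the full model the variances v_{k,n} are themselves unknown. This is where the priors on v (Gamma chains and fields) come in.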
One Channel Source Separation: Poisson Source Model
(Figure: graphical model v_{k,1}, …, v_{k,N} → s_{k,1}, …, s_{k,N} → x_k, for k = 1, …, K)
$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• This is the generative model for NMF when we write $v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (template × excitation)
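The Poisson model has the same responsibilities as the Gaussian one. Given the total x, the split of counts is multinomial with probabilities κ_n = v_n / Σ_{n'} v_{n'}. For two sources it is binomial, so E[s_1 | x] = κ_1 x. The sketch checks this by enumeration; the helper names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def split_posterior_enumerated(x, v1, v2):
    """p(s_1 = j | x) for s_n ~ PO(v_n) and x = s_1 + s_2, by direct enumeration."""
    w = [poisson_pmf(j, v1) * poisson_pmf(x - j, v2) for j in range(x + 1)]
    z = sum(w)
    return [wj / z for wj in w]


def split_posterior_binomial(x, v1, v2):
    """The same posterior in closed form: Binomial(x, kappa) with kappa = v1 / (v1 + v2)."""
    kappa = v1 / (v1 + v2)
    return [math.comb(x, j) * kappa ** j * (1 - kappa) ** (x - j) for j in range(x + 1)]
```

This expected split is the E-step that turns the NMF multiplicative updates into an EM algorithm for the Poisson model.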
Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)
(Figure: example Gamma and inverse Gamma densities p(x) for several values of a and b, x in 0–5)
$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$
$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity, inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
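With this parametrisation z is a scale parameter, and x ∼ IG(a, z) exactly when 1/x ∼ G(a, z). Python's `random.gammavariate(alpha, beta)` also uses shape and scale, so inverse Gamma draws are reciprocals of Gamma draws. The sketch writes both log densities as on the slide; the function names are illustrative.

```python
import math
import random


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a - 1) log x - x / z + a log(1/z) - log Gamma(a)   (shape a, scale z)."""
    return (a - 1) * math.log(x) - x / z + a * math.log(1 / z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = (a + 1) log(1/x) - 1/(z x) + a log(1/z) - log Gamma(a)."""
    return (a + 1) * math.log(1 / x) - 1 / (z * x) + a * math.log(1 / z) - math.lgamma(a)


def sample_inv_gamma(a, z, rng):
    """x ~ IG(a, z) via 1/x ~ G(a, z); random.gammavariate(alpha, beta) has scale beta."""
    return 1.0 / rng.gammavariate(a, z)
```

The change of variables x ↦ 1/x contributes the Jacobian x^{−2}. This is why the exponent of log x^{−1} is a + 1 rather than a − 1.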
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1, …, K as follows:
$v_k \mid z_k \sim \mathcal{IG}(v_k;\ a,\ z_k/a)$
$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1};\ a_z,\ v_k/a_z)$
(Figure: chain z_1, …, v_{k−1}, z_k, v_k, z_{k+1}, … with coupling parameters a_z, a, a_z)
• The variance variables v are the prior variances of the sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• Shape parameters a and a_z describe the coupling strength and drift of the chain
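Ancestral sampling from the chain needs only Gamma draws. The sketch reads the chain as IG(v_k; a, z_k/a) and IG(z_{k+1}; a_z, v_k/a_z), which matches the pairwise potential exp(−a z_k^{−1} v_k^{−1}). Under that reading E[1/v_k | z_k] = z_k and E[1/z_{k+1} | v_k] = v_k, so consecutive variances stay close when a and a_z are large. The function name and the starting value z_1 = 1 are assumptions.

```python
import random


def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Draw v_1..v_K from v_k | z_k ~ IG(a, z_k / a), z_{k+1} | v_k ~ IG(a_z, v_k / a_z).

    x ~ IG(shape, scale) is drawn as 1 / Gamma(shape, scale).
    """
    rng = random.Random(seed)
    z = z1
    v = []
    for _ in range(K):
        vk = 1.0 / rng.gammavariate(a, z / a)       # E[1/v_k | z_k] = z_k
        v.append(vk)
        z = 1.0 / rng.gammavariate(a_z, vk / a_z)   # E[1/z_{k+1} | v_k] = v_k
    return v
```

Plotting log v_k for different (a, a_z) reproduces the qualitative behaviour in the typical-draw figures. Larger shapes give smoother variance trajectories.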
Gamma Chains: Typical Draws
(Figure: log v_k for k = 1–1000, with a = 10 and a_z = 10, 4, 40)
Gamma Chains with Changepoints: Typical Draws
(Figure: log v_k for k = 1–1000, with a = 10 and a_z = 10, 4, 40)
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$\psi_{k,k} = \exp\big(-a\, z_k^{-1} v_k^{-1}\big)$ (pairwise)
$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big)$ (singletons)
(Figure: chain z_1, …, v_{k−1}, z_k, v_k, z_{k+1}, … with coupling parameters a_z, a, a_z)
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$\psi_{ij} = \exp\big(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}\big)$ (pairwise)
$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big)$ (singletons)
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov chain Monte Carlo: Gibbs sampler
– Sequential Monte Carlo: particle filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps
VB or Gibbs
(Figure: factor graph ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ⋯ with likelihood factors p(y_1 | v_1), p(y_2 | v_2))
• VB
$q^{(\tau)}(v_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\big)$
• Gibbs
$v^{(\tau)}_k \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z^{(\tau)}_k)\, \psi_{k,k+1}(z^{(\tau)}_{k+1})$
VB or Gibbs
(Figure: factor graph ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ⋯ with likelihood factors p(y_1 | v_1), p(y_2 | v_2))
• VB
$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\big)$
• Gibbs
$z^{(\tau)}_k \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v^{(\tau)}_{k-1})\, \psi_{k,k}(v^{(\tau)}_k)$
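With a Gaussian likelihood y_k ∼ N(0, v_k), both full conditionals of the inverse Gamma-Markov chain are inverse Gamma, so a Gibbs sweep needs only Gamma draws. The sketch derives the shapes and rates from the chain as written on the Gamma Chains slide, reading the scales as z_k/a and v_k/a_z. Several choices are assumptions, not the authors' code: the boundary value v_0 = 1, initialising v at the observed energies, the sweep and burn-in counts, and the name `gibbs_igmc`.

```python
import random


def gibbs_igmc(y, a, a_z, sweeps=300, burn_in=100, seed=0):
    """Gibbs sampler for y_k ~ N(0, v_k) under the inverse Gamma-Markov chain prior.

    Full conditionals (each written for the reciprocal, which is Gamma(shape, rate)):
      1/z_k | v_{k-1}, v_k      : shape a_z + a,       rate a_z / v_{k-1} + a / v_k
      1/v_k | z_k, z_{k+1}, y_k : shape a + a_z + 1/2, rate a / z_k + a_z / z_{k+1} + y_k^2 / 2
    v_0 = 1 is assumed, and the last v_K has no z_{K+1} term.
    Returns the posterior mean of each v_k after burn-in.
    """
    rng = random.Random(seed)
    K = len(y)
    v = [max(yk * yk, 1e-6) for yk in y]  # initialise at the observed energies
    z = [1.0 / vk for vk in v]             # z[k] is the auxiliary variable feeding v[k]
    acc = [0.0] * K
    for sweep in range(sweeps):
        for k in range(K):  # the z's are conditionally independent given the v's
            v_prev = v[k - 1] if k > 0 else 1.0
            rate = a_z / v_prev + a / v[k]
            z[k] = 1.0 / rng.gammavariate(a_z + a, 1.0 / rate)
        for k in range(K):  # the v's are conditionally independent given the z's
            shape, rate = a + 0.5, a / z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        if sweep >= burn_in:
            for k in range(K):
                acc[k] += v[k]
    return [s / (sweeps - burn_in) for s in acc]
```

Because the chain is bipartite, all z's can be updated in one block and all v's in another. The VB updates above have the same structure, with expectations in place of samples.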
Denoising: Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: variational Bayes
(Figure: time-frequency images (frames 20–120, bins 50–500) of the noisy signal X, the original X_org, and the estimates X_h (SNR 1998), X_v (SNR 2079), X_b (SNR 1968) and X_g (SNR 1997))
Denoising: Music
(Figure: time-frequency images (frames 50–250, bins 50–500) of the original, the noisy signal, and the estimates by PF (SNR 853), Gibbs (SNR 866) and VB (SNR 208))
“Tristram (Matt Uelmen)” + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal tie across time (harmonic continuity)
• Source 2: vertical tie across frequency (transients, percussive sounds)
(Figure: frequency bin ν against time τ for X_org, S_hor and S_ver)
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + drums “Territory (Sepultura)” = mix

               s1                       s2
Method         SDR     SIR     SAR      SDR     SIR     SAR
VB             -474    -328    567      -158    1546    -137
Gibbs          -45     -262    457      105     1246    161
Gibbs-EM       -423    -242    482      134     1313    185
Pre-trained    -404    -315    813      356     1144    464
Oracle         614     1716    658      1266    1995    136

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Änglagård)” + “Moby Dick (Led Zeppelin)” = mix

               s1                       s2
Method         SDR     SIR     SAR      SDR     SIR     SAR
VB             -78     -622    453      -235    184     -225
Gibbs          -846    -753    693      -404    1459    -383
Gibbs-EM       -774    -619    462      -114    1662    -097
Pre-trained    -64     -539    695      38      1639    414
Oracle         121     329     1214     2113    3389    2137
Harmonic-Transient Decomposition
(Figure: frequency bin ν against time τ for X_org, S_hor and S_ver)
Audio examples for two excerpts: (Original) (Hor) (Vert)
Chord Detection: Signal Model (with Paul Peeling)
(Figure: spectral template log σ_ν against frequency ν in Hz, 0–4000)
Chord Detection
(Figure: MDCT of piano chord 41485156, frequency in Hz (0–4000) against time τ in s (0–2.5); log Σ_ν v_{ν,j,τ} for MIDI note j (40–65) against time τ in s)
Multichannel Source Separation
• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$
$v_{k,n} \sim \mathcal{IG}\big(v_{k,n};\ \nu/2,\ 2/(\nu\lambda_n)\big)$
$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$
$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$
(Figure: graphical model with λ_1, …, λ_N; v_{k,1}, …, v_{k,N}; s_{k,1}, …, s_{k,N}; x_{k,1}, …, x_{k,M} for k = 1, …, K; mixing parameters a_1, r_1, …, a_M, r_M)
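Ancestral sampling from the hierarchical prior makes the roles of the variables concrete. The slide elides the priors on a_m and r_m, so the sketch assumes, hypothetically, standard normal mixing coefficients and a fixed noise variance r shared by all channels. Reading the inverse Gamma scale as 2/(νλ_n) gives E[1/v_{k,n}] = 1/λ_n, so v_{k,n} is of the order of λ_n. The name `sample_multichannel` is illustrative.

```python
import math
import random


def sample_multichannel(K, N, M, a_lam, b_lam, nu, r, seed=0):
    """Draw (lambda, A, s, x) from the hierarchical prior (hypothetical a_m and r_m priors)."""
    rng = random.Random(seed)
    lam = [rng.gammavariate(a_lam, b_lam) for _ in range(N)]        # lambda_n ~ G(a_lam, b_lam), scale b_lam
    A = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]  # rows are a_m (assumed N(0, 1) entries)
    s = [[0.0] * N for _ in range(K)]
    x = [[0.0] * M for _ in range(K)]
    for k in range(K):
        for n in range(N):
            v = 1.0 / rng.gammavariate(nu / 2, 2 / (nu * lam[n]))    # v_kn ~ IG(nu/2, 2/(nu lambda_n))
            s[k][n] = rng.gauss(0.0, math.sqrt(v))                   # s_kn ~ N(0, v_kn)
        for m in range(M):
            mean = sum(A[m][n] * s[k][n] for n in range(N))
            x[k][m] = rng.gauss(mean, math.sqrt(r))                  # x_km ~ N(a_m^T s_k, r)
    return lam, A, s, x
```

Marginally each s_{k,n} is heavy-tailed (Student-t), which suits sparse transform coefficients. Choosing M < N produces the underdetermined, multimodal setting discussed next.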
Equivalent Gamma MRF
• A tree for each source
• λ_n can be interpreted as the overall “volume” of source n
Source Separation
(Figure: spectrograms, frequency f in Hz (0–10000) against time t, of the speech, piano and guitar sources and of the mix; audio examples)
Reconstructions
(Figure: spectrograms, frequency f in Hz (0–10000) against time t, of the reconstructed speech, piano and guitar sources; audio examples)
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
(Figure: traces of a, λ and r against epoch, 0–2000)
Tempo Tracking and Score-Performance Matching
• Given expressive music data (onsets, detections, spectral features)
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantised, human-readable score
• Online (real-time) or offline (batch) processing
• All of these problems can be mapped to inference problems in an HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar Position Pointer: $k = 3$
[Figure: $p(x_3)$ over bar position and tempo level]
Bar Position Pointer: $k = 4$
[Figure: $p(x_4)$ over bar position and tempo level]
Bar Position Pointer: $k = 5$
[Figure: after observing $y_5 = 0$, the posterior $p(x_5 \mid y_{1:5})$ over bar position and tempo level]
Bar Position Pointer: $k = 10$
[Figure: $p(x_{10})$ over bar position and tempo level]
Bar Position Pointer: observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson with intensity $\mu_k$, a function of the bar position $m_k$
[Figure: intensity $\mu_k$ against bar position $m_k$ (0 to 1000) for a triplet rhythm (top) and a duplet rhythm (bottom)]
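The prediction/update cycle behind the bar-pointer posteriors can be illustrated with a toy forward filter. This is a minimal Python/NumPy sketch under stated assumptions, not the Whiteley, Cemgil and Godsill model itself: the function name `bar_pointer_filter`, the deterministic advance $m_{k+1} = (m_k + n_k) \bmod M$, the neighbour-only tempo moves with probability `p_tempo`, and the two-level Poisson intensity (`mu_beat` at each quarter of the bar, `mu_off` elsewhere) are all hypothetical choices.

```python
import numpy as np
from math import lgamma


def bar_pointer_filter(y, M=16, N=3, p_tempo=0.1, mu_beat=3.0, mu_off=0.2):
    """Forward filtering p(m_k, n_k | y_{1:k}) for a toy bar-pointer HMM (hypothetical)."""
    S = M * N  # state index s = m * N + (n - 1), tempo levels n = 1..N
    T = np.zeros((S, S))  # T[s, s'] = p(x_{k+1} = s' | x_k = s)
    for m in range(M):
        for n in range(1, N + 1):
            m_next = (m + n) % M  # bar position advances by the tempo level
            moves = {n: 1.0 - p_tempo}
            for n2 in (n - 1, n + 1):
                # move to a neighbouring tempo level; stay put at the boundaries
                target = n2 if 1 <= n2 <= N else n
                moves[target] = moves.get(target, 0.0) + p_tempo / 2
            for n2, p in moves.items():
                T[m * N + (n - 1), m_next * N + (n2 - 1)] += p
    # Poisson intensity per bar position, repeated over the tempo levels
    mu_m = np.where(np.arange(M) % (M // 4) == 0, mu_beat, mu_off)
    mu = np.repeat(mu_m, N)
    alpha = np.full(S, 1.0 / S)  # uniform p(x_0)
    out = np.empty((len(y), S))
    for k, yk in enumerate(y):
        pred = alpha @ T  # predict: p(x_k | y_{1:k-1})
        loglik = yk * np.log(mu) - mu - lgamma(yk + 1)  # log Poisson(y_k; mu)
        w = pred * np.exp(loglik - loglik.max())  # update, rescaled for stability
        alpha = w / w.sum()
        out[k] = alpha
    return out.reshape(len(y), M, N)
```

Summing the result over its last axis gives the filtered bar-position marginal $p(m_k \mid y_{1:k})$, and over its middle axis the tempo marginal $p(n_k \mid y_{1:k})$.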
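Tempo, Rhythm and Meter Analysis
Bar pointer model (Whiteley, Cemgil, Godsill 2006)
[Figure: graphical model with chains $n_0, \dots, n_3$ (tempo), $\theta_0, \dots, \theta_3$, $m_0, \dots, m_3$ (bar position), $r_0, \dots, r_3$, intensities $\lambda_1, \dots, \lambda_3$ and observations $y_1, \dots, y_3$]
• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator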
Filtering
[Figure: observed data $y_k$; $\log p(m_k \mid y_{1:k})$ over bar position $m_k$; $\log p(n_k \mid y_{1:k})$ over tempo (quarter notes per minute, 60 to 180); $p(r_k \mid y_{1:k})$ for triplets and duplets; all against frame index $k$ (0 to 450)]
Smoothing
[Figure: observed data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position $m_k$; $\log p(n_k \mid y_{1:K})$ over tempo (quarter notes per minute, 60 to 180); $p(r_k \mid y_{1:K})$ for triplets and duplets; all against frame index $k$ (0 to 450)]
Time Signature
[Figure: observed audio samples against time (0 to 12 s); $\log p(m_k \mid z_{1:K})$ over bar position $m_k$; $\log p(n_k \mid z_{1:K})$ over tempo (quarter notes per minute, 52 to 155); $p(\theta_k \mid z_{1:K})$ for 4/4 and 3/4; all against frame index $k$]
Score-Performance Matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt and the audio signal $x_t$]
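Score-Performance Matching: Graphical Model
[Figure: chains $t_1, \dots, t_K$, score positions $r_1, \dots, r_K$ and $\lambda_1, \dots, \lambda_K$; for each frequency $\nu = 1, \dots, W$, variances $v_{\nu,1}, \dots, v_{\nu,K}$ and coefficients $s_{\nu,1}, \dots, s_{\nu,K}$]
$v_{\nu,\tau} \sim \mathcal{IG}\left(v_{\nu,\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau))\right)$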
Score-Performance Matching: Signal Model
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (Hz, 0 to 4000)]
Score-Performance Matching
[Figure: spectrogram data (frequency 0 to 4000 Hz against time 0 to 14 s) and MIDI data (MIDI note 55 to 85 against score position)]
• Online (filtering) or offline (smoothing) processing is possible
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80) against time (1 to 4 s); $\log \sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ against time; MDCT of the audio (frequency 0 to 4000 Hz against time; source: Daniel-Ben Pienaar)]
Summary
• "Time domain": switching state space models
  – State space modelling
  – Conditionally linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
  – Analysis down to sample precision (if required)
  – Computationally quite heavy
• "Transform domain": Gamma fields
  – Models on (orthogonal) transform coefficients; energy compaction
  – Practical: can make use of fast transforms (FFT, MDCT, …)
  – Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
  – Time-frequency energy distributions
• Ongoing work
  – Comparison of inference methods (VB, MCMC, SMC)
  – Learning
  – Applications
    ∗ Chord detection, polyphonic transcription
    ∗ Musical-score-guided source separation
  – Prior structures for other observation models (NMF)
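Bayesian Source Separation
• Joint estimation of sources given observations
[Figure: graphical model. Source model: sources $s_{k,1}, \dots, s_{k,n}, \dots, s_{k,N}$ with source-prior parameters $v$. Observation model: observations $x_{k,1}, \dots, x_{k,M}$ with channel, noise and mixing-system parameters $\lambda$. Plate $k = 1, \dots, K$]
$p(\mathrm{Src} \mid \mathrm{Obs}) \propto \int d\lambda\, dv\; p(\mathrm{Obs} \mid \mathrm{Src}, \lambda)\, p(\lambda)\, p(\mathrm{Src} \mid v)\, p(v)$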
Audio Restoration/Interpolation
• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis
[Figure: signal with missing samples, sample index 0 to 500]
Polyphonic Music Transcription
• from sound
[Figure: spectrogram, frequency $f$ (Hz, 0 to 5000) against time $t$ (s, 0 to 8)]
• to score
Modelling and Computational Issues
• Hierarchical
  – Signal level: pitch, onsets, timbre
  – Symbolic level: melody, motives, harmony, chords, tonality, rhythm, beat, tempo, articulation, instrumentation, voice
  – Cognitive level: expression, genre, form, style, mood, emotion
• Uncertainty
  – Parameter learning: which pitch, rhythm, tempo, meter, time signature?
  – Model selection: how many notes, harmonics, onsets, sections?
Generative Models for Music
Generative Models for Music
[Figure: generative hierarchy from score and expression to piano-roll to signal]
Hierarchical Modeling of Music
[Figure: hierarchical dynamic model over time slices $1, \dots, t$ with chains of $v$, $k$, $h$ and $m$, per-voice variables $g_j$, $r_j$, $n_j$, $x_j$, $y_j$ (plate $M$) and observations $y_1, \dots, y_t$]
Modelling levels
• Physical / acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, generalised linear models
• Feature based
Research Questions
• What kinds of prior knowledge and modelling techniques are useful?
• How can we do efficient inference?
Signal Models for Audio
• Time domain: state space dynamical models
  – Conditionally linear dynamical systems, Gaussian processes (e.g. AR, ARMA), switching state space models
  – Flexible, physically realistic
  – Analysis down to sample precision; computationally quite heavy
• Transform domain: Fourier representations, generalised linear models
  – Models on (orthogonal) transform coefficients; energy compaction
  – Practical: can make use of fast transforms (FFT, MDCT, …)
  – Inherent limitations (analysis windows, frequency resolution)
Sinusoidal Modeling
• Sound is primarily about oscillations and resonance
• Cascade of second-order systems
• Audio signals can often be compactly represented by sinusoids
(real) $y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$
(complex) $y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n$
$y = F(\gamma_{1:p}, \omega_{1:p})\, c$
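State Space Parametrisation
$x_{n+1} = \underbrace{\operatorname{diag}\left(e^{-\gamma_1 + j\omega_1}, \dots, e^{-\gamma_p + j\omega_p}\right)}_{A} x_n, \qquad x_0 = (c_1, c_2, \dots, c_p)^\top$
$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}}_{C} x_n$
[Figure: state-space chain $x_0, x_1, \dots, x_{k-1}, x_k, \dots, x_K$ with observations $y_1, \dots, y_{k-1}, y_k, \dots, y_K$]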
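As a numerical check of this parametrisation, the sketch below (Python/NumPy; function names are illustrative) runs the diagonal recursion $x_{n+1} = A x_n$, $y_n = C x_n$ and compares it with the direct complex sum. With $c_k = \alpha_k e^{j\phi_k}$, the real part of $y_n$ should equal the real damped-sinusoid form, because $\mathrm{Re}\left(\alpha e^{j\phi} e^{(-\gamma + j\omega)n}\right) = \alpha e^{-\gamma n}\cos(\omega n + \phi)$.

```python
import numpy as np


def sinusoids_direct(c, gamma, omega, n_samples):
    """y_n = sum_k c_k (exp(-gamma_k + j omega_k))^n, evaluated directly."""
    n = np.arange(n_samples)[:, None]  # column of time indices
    poles = np.exp(-gamma + 1j * omega)
    return (c * poles ** n).sum(axis=1)


def sinusoids_state_space(c, gamma, omega, n_samples):
    """Same signal from x_{n+1} = A x_n, y_n = C x_n with diagonal A and C = (1 ... 1)."""
    A = np.diag(np.exp(-gamma + 1j * omega))
    C = np.ones(len(c))
    x = np.asarray(c, dtype=complex)  # x_0 = (c_1, ..., c_p)
    y = np.empty(n_samples, dtype=complex)
    for n in range(n_samples):
        y[n] = C @ x
        x = A @ x
    return y
```

The state-space form costs $O(p)$ per sample with a diagonal $A$, which is what makes it a convenient building block for the Kalman-filter-based models later in the talk.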
State Space Parametrisation
Audio Restoration/Interpolation
• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis
[Figure: signal with missing samples, sample index 0 to 500]
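Audio Interpolation
$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\, p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$, where $H \equiv$ (parameters, hidden states)
[Figure: graphical model with $H$ as the parent of the missing samples $x_{\neg\kappa}$ and the observed samples $x_\kappa$; example signal with missing samples, sample index 0 to 500]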
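If $H$ is held fixed and the signal prior is Gaussian, $p(x_{\neg\kappa} \mid x_\kappa)$ is Gaussian and its mean follows from standard conditioning, $\mathbb{E}[x_{\neg\kappa} \mid x_\kappa] = K_{\neg\kappa,\kappa} K_{\kappa,\kappa}^{-1} x_\kappa$. The sketch below (Python/NumPy) assumes a hypothetical covariance $K_{ij} = \rho^{|i-j|}\cos(\omega(i-j))$, which is the stationary covariance of the first component of $s_k = \rho R(\omega) s_{k-1} + \epsilon_k$ with isotropic noise, with known $\rho$ and $\omega$. Unlike the slide, it does not integrate over $H$; the function name and defaults are illustrative.

```python
import numpy as np


def gaussian_interpolate(x, observed, rho=0.995, omega=0.2, jitter=1e-6):
    """Posterior mean of the missing samples under a zero-mean Gaussian prior (hypothetical kernel)."""
    n = len(x)
    d = np.subtract.outer(np.arange(n), np.arange(n))  # lag matrix i - j
    K = rho ** np.abs(d) * np.cos(omega * d) + jitter * np.eye(n)
    obs = np.flatnonzero(observed)
    mis = np.flatnonzero(~observed)
    x_hat = x.astype(float)  # astype returns a copy
    # E[x_mis | x_obs] = K_mo K_oo^{-1} x_obs
    x_hat[mis] = K[np.ix_(mis, obs)] @ np.linalg.solve(K[np.ix_(obs, obs)], x[obs])
    return x_hat
```

Observed samples are returned unchanged; only the entries where `observed` is `False` are replaced by their conditional mean.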
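Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
[Figure: for each band $\nu = 0, \dots, W - 1$, a chain $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_{K-1}$ with parameters $A^\nu$, $Q^\nu$; observations $x_0, \dots, x_k, \dots, x_{K-1}$]
$s^\nu_k \sim \mathcal{N}(s^\nu_k; A^\nu s^\nu_{k-1}, Q^\nu), \qquad A^\nu \sim \mathcal{N}\left(A^\nu; \begin{pmatrix} \cos\omega_\nu & -\sin\omega_\nu \\ \sin\omega_\nu & \cos\omega_\nu \end{pmatrix}, \Psi\right)$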
Inference: Structured Variational Bayes
[Figure: structured approximation with factors $q(A^\alpha)$, $q(Q^\alpha)$ and $\prod_k q(s^\alpha_k \mid s^\alpha_{k-1})$ for each band $\alpha \in \mathcal{C}$, and $q(x_k)$]
• Intuitive algorithm:
  – Subtract from the observed signal $x$ the prediction of the frequency bands in $\neg\alpha$
  – Compute a fit for $\alpha$ to this residual and iterate
• For fixed $A$ and $Q$ this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
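To make the Gauss-Seidel analogy concrete, here is the plain scalar Gauss-Seidel iteration for a symmetric positive definite system $Kx = b$ (Python/NumPy). Each coordinate is refit to the residual left by all the others, using the latest values, which mirrors the intuitive algorithm above at the level of single unknowns rather than whole frequency bands; the function name and test system are illustrative.

```python
import numpy as np


def gauss_seidel(K, b, n_sweeps=200):
    """Solve K x = b by Gauss-Seidel sweeps (converges for symmetric positive definite K)."""
    x = np.zeros(len(b))
    for _ in range(n_sweeps):
        for i in range(len(b)):
            # residual after removing the contribution of every coordinate except i
            r = b[i] - K[i] @ x + K[i, i] * x[i]
            x[i] = r / K[i, i]  # refit coordinate i to that residual
    return x
```

Because each update uses the coordinates already refreshed in the current sweep, no separate "old" copy of `x` is kept; that in-place ordering is what distinguishes Gauss-Seidel from the Jacobi iteration.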
Restoration
• Piano
  – Signal with missing samples (37)
  – Reconstruction: 7.68 dB improvement
  – Original
• Trumpet
  – Signal with missing samples (37)
  – Reconstruction: 7.10 dB improvement
  – Original
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
[Figure: for each $\nu = 1, \dots, W$, a changepoint chain $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and a state chain $\theta^\nu_0, \dots, \theta^\nu_k, \dots, \theta^\nu_K$; observations $y_k, \dots, y_K$]
• Generalises source-filter models
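Harmonic model with changepoints
$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$
$\theta_k \mid \theta_{k-1}, r_k \sim [r_k = 0]\,\underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\,\underbrace{\mathcal{N}(0, S)}_{\text{new}}$
$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$
$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^N, \qquad G_\omega = \rho_k \begin{pmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{pmatrix}$
Damping factor $0 < \rho_k < 1$, frame length $N$, and damped sinusoidal basis matrix $C$ of size $N \times 2H$.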
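A small sketch of how this transition matrix can be assembled (Python/NumPy). It reads the superscript $N$ as advancing the harmonic state by one frame of $N$ samples (our interpretation) and uses a single constant damping factor `rho` in place of $\rho_k$; the function names are hypothetical. Since $G_\omega = \rho R(\omega)$ with $R$ a rotation, $G_\omega^h = \rho^h R(h\omega)$: the $h$-th block rotates at the $h$-th harmonic and decays faster for higher harmonics.

```python
import numpy as np


def rotation(theta):
    """2x2 rotation matrix R(theta)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])


def harmonic_transition(omega, rho, H, N):
    """A = blkdiag(G, G^2, ..., G^H)^N with G = rho * R(omega) (constant rho assumed)."""
    G = rho * rotation(omega)
    A1 = np.zeros((2 * H, 2 * H))  # one-sample transition
    for h in range(1, H + 1):
        A1[2 * (h - 1):2 * h, 2 * (h - 1):2 * h] = np.linalg.matrix_power(G, h)
    return np.linalg.matrix_power(A1, N)  # advance by one frame of N samples
```

Because $A$ is block diagonal, the $h$-th block of $A$ is simply $\rho^{hN} R(hN\omega)$, so in practice it could be written down in closed form instead of via matrix powers.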
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
  – Conditional Gaussians are not closed under marginalization
  ⇒ Unlike in HMMs or KFMs, summing over $r_k$ does not simplify the filtering density
  ⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time/space
  – Intuition: when a changepoint occurs, the state vector $\theta$ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, Ó Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)
[Figure: example with $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$ on the chain $\theta_0, \dots, \theta_5$ with observations $y_1, \dots, y_5$]
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
  ⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of the conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
Monophonic model (Cemgil et al. 2006)
• We introduce a pitch label indicator $m$
• At each time $k$ the process can be in one of the $\{\text{mute}, \text{sound}\} \times M$ states
[Figure: chains $r_0, \dots, r_T$, $m_0, \dots, m_T$ and $s_0, \dots, s_T$ with observations $y_1, \dots, y_T$]
Monophonic Pitch Tracking
Monophonic pitch tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$
[Figure: signal and pitch-tracking result, index 100 to 1000]
• If the pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
[Figure: exact inference result, sample index 500 to 3500]
Tracking Pitch Variations
• Allow $m$ to change with $k$
[Figure: example, index 50 to 500]
• Intractable: need to resort to approximate inference (mixture Kalman filter, Rao-Blackwellized particle filter)
Factorial Generative Models for Analysis of Polyphonic Audio
[Figure: latent indicators per frequency $\nu$ over frame $k$, and the signal $x_k$]
• Each latent changepoint process $\nu = 1, \dots, W$ corresponds to a "piano key". The indicators $r_{1:W, 1:K}$ encode a latent "piano roll"
Single time slice: Bayesian Variable Selection
$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$
$s_i \mid r_i \sim [r_i = \text{on}]\,\mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\,\delta(s_i)$
$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$
$C \equiv [\, C_1 \ \cdots \ C_i \ \cdots \ C_W \,]$
[Figure: indicators $r_1, \dots, r_W$, coefficients $s_1, \dots, s_W$ and observation $x$]
• Generalized linear model: the columns of $C$ are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When $W$ is large, computing posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman, Attias, …)
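For small $W$ the mixture of $2^W$ Gaussians can be enumerated exactly: integrating out $s$ gives $x \mid r \sim \mathcal{N}(0, R + C_r \Sigma C_r^\top)$, where $C_r$ keeps the columns with $r_i = \text{on}$. The sketch below (Python/NumPy) assumes $\Sigma = \sigma^2 I$ and a shared $\pi_{\text{on}}$ for all $i$ (both hypothetical simplifications) and returns the posterior probability of every on/off configuration; the function names are illustrative.

```python
import itertools
import numpy as np


def log_normal_zero_mean(x, cov):
    """log N(x; 0, cov) for a dense covariance matrix."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(cov, x))


def variable_selection_posterior(x, C, sigma2, R, pi_on):
    """Exact posterior over all 2^W on/off configurations r (Sigma = sigma2 * I assumed)."""
    W = C.shape[1]
    configs = list(itertools.product([0, 1], repeat=W))
    log_post = np.empty(len(configs))
    for idx, r in enumerate(configs):
        on = np.array(r, dtype=bool)
        C_on = C[:, on]  # columns of the active basis vectors
        # s integrated out: x | r ~ N(0, R + sigma2 * C_r C_r^T)
        cov = R + sigma2 * C_on @ C_on.T
        n_on = on.sum()
        log_prior = n_on * np.log(pi_on) + (W - n_on) * np.log(1.0 - pi_on)
        log_post[idx] = log_prior + log_normal_zero_mean(x, cov)
    post = np.exp(log_post - log_post.max())
    return configs, post / post.sum()
```

The loop visits all $2^W$ configurations, which is exactly the cost that makes the exact posterior intractable for large $W$.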
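Factorial Switching State Space Model
$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$
$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$
$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$ (changepoint indicator)
$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\,\theta_{k-1,\nu}, Q_\nu(r_k))$ (latent state)
$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R)$ (observation)
[Figure: for each $\nu = 1, \dots, W$, chains $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_K$; observations $y_k, \dots, y_K$]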
Synthetic Data
[Figure: synthetic example showing per-frequency latent processes over frame $k$ and the observed signal $x$]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not numerically stable: computations with large matrices
  – Need advanced techniques from linear algebra
  – Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical / acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, generalised linear models
• Feature based
Spectrogram
• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau) =$ (frequency, time), such as STFT or MDCT
$x(t) = \sum_k s_k \phi_k(t)$
[Figure: spectrograms of speech and piano, frequency $f$ (Hz, 0 to 10000) against time $t$]
• The spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of the STFT)
Models for time-frequency Energy distributions
• Non-negative matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; …)
$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$
Spectrogram = spectral templates × excitations
  – However, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; …)
$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$
Spectrogram = mask × source 0 + (1 − mask) × source 1
  – However, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variances of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms
$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$
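One channel source separation: Gaussian source model
[Figure: variances $v_{k,1}, \dots, v_{k,N}$, sources $s_{k,1}, \dots, s_{k,N}$ and mixture $x_k$; plate $k = 1, \dots, K$]
$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• A straightforward application of Bayes' theorem yields
$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\left(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})\right), \qquad \kappa_{k,n} = v_{k,n} \Big/ \sum_{n'} v_{k,n'}$ (responsibilities)
• Each source coefficient $s_{k,n}$ gets a fraction $\kappa_{k,n}$ of the observation $x_k$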
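The posterior above depends only on the responsibilities, so the separation step for known prior variances is a few array operations. A minimal sketch (Python/NumPy; the function name is illustrative) returning $\kappa_{k,n}$, the posterior means $\kappa_{k,n} x_k$ and the posterior variances $v_{k,n}(1 - \kappa_{k,n})$:

```python
import numpy as np


def gaussian_posterior(x, v):
    """Posterior of each source s_{k,n} given x_k = sum_n s_{k,n}.

    x: mixture coefficients, shape (K,); v: prior variances, shape (K, N).
    """
    kappa = v / v.sum(axis=1, keepdims=True)  # responsibilities, sum to 1 over n
    mean = kappa * x[:, None]  # each source takes a fraction kappa of x_k
    var = v * (1.0 - kappa)  # posterior variance v (1 - kappa)
    return kappa, mean, var
```

Because the $\kappa_{k,n}$ sum to one over $n$, the posterior means always add back up to the mixture $x_k$.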
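One channel source separation: Poisson source model
[Figure: variances $v_{k,1}, \dots, v_{k,N}$, sources $s_{k,1}, \dots, s_{k,N}$ and mixture $x_k$; plate $k = 1, \dots, K$]
$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• This is the generative model for NMF when we write $v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (template × excitation)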
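Under this Poisson model, maximum-likelihood fitting of templates and excitations to a non-negative array $X$ is KL-divergence NMF. The sketch below (Python/NumPy) uses the standard multiplicative updates of Lee and Seung as a stand-in: it is a point estimate swapped in for illustration, not the Bayesian inference discussed in this talk, and the names (`kl_nmf`, `templates`, `excitations`) and random initialisation are hypothetical.

```python
import numpy as np


def kl_nmf(X, n_components, n_iter=200, seed=0, eps=1e-12):
    """Fit X ~ templates @ excitations by minimising the KL divergence (Lee-Seung updates)."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    templates = rng.uniform(0.5, 1.5, size=(F, n_components))    # t_{nu,n}
    excitations = rng.uniform(0.5, 1.5, size=(n_components, T))  # e_{tau,n}
    for _ in range(n_iter):
        V = templates @ excitations + eps  # Poisson intensities
        templates *= ((X / V) @ excitations.T) / (excitations.sum(axis=1) + eps)
        V = templates @ excitations + eps
        excitations *= (templates.T @ (X / V)) / (templates.sum(axis=0)[:, None] + eps)
    return templates, excitations
```

Given fitted intensities, the sources conditioned on $x_k$ are multinomial with probabilities $\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$, so $\mathbb{E}[s_{k,n} \mid x_k] = \kappa_{k,n} x_k$, the Poisson counterpart of the Gaussian responsibilities.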
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: left, densities $p(x)$ for $(a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1)$; right, densities for $(a, b) = (1, 1), (1, 0.5), (2, 1)$; $x$ from 0 to 5]
$\mathcal{G}(x; a, z) \equiv \exp\left((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\right)$
$\mathcal{IG}(x; a, z) \equiv \exp\left((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\right)$
• Gamma: conjugate prior for a Gaussian precision, a Poisson intensity and an inverse Gamma scale
• Inverse Gamma: conjugate prior for a Gaussian variance and a Gamma scale
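The two definitions translate directly into log-densities. A minimal sketch (Python/NumPy with `math.lgamma`; function names illustrative). In this parametrisation $a$ is the shape and $z$ the scale of the Gamma, and $x \sim \mathcal{IG}(x; a, z)$ exactly when $1/x \sim \mathcal{G}(\cdot\,; a, z)$, which the change-of-variables identity $\log\mathcal{IG}(x; a, z) = \log\mathcal{G}(1/x; a, z) - 2\log x$ confirms.

```python
import numpy as np
from math import lgamma


def log_gamma_pdf(x, a, z):
    """log G(x; a, z): shape a, scale z, as defined on the slide."""
    return (a - 1) * np.log(x) - x / z - a * np.log(z) - lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z): the density of 1/y when y ~ G(y; a, z)."""
    return -(a + 1) * np.log(x) - 1.0 / (z * x) - a * np.log(z) - lgamma(a)
```

The same convention explains the sampling recipe used with these priors: draw a Gamma variate with shape $a$ and scale $z$ and take its reciprocal.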
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:
$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a), \qquad z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$
[Figure: chain $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$ with couplings $a_z$, $a$, $a_z$]
• The variance variables $v$ are the prior variances of the sources
• The auxiliary variables $z$ are needed for conjugacy and positive correlation
• The shape parameters $a$ and $a_z$ describe the coupling strength and drift of the chain
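Typical draws from the chain can be generated by ancestral sampling. The sketch below (Python/NumPy) assumes the second argument of $\mathcal{IG}$ is the scale $z$ from the definition above, so $v \sim \mathcal{IG}(v; a, z)$ is sampled as $1/y$ with $y \sim \mathcal{G}(y; a, z)$; the starting value `z1` and the function name are hypothetical. Since $\mathbb{E}[1/v_k \mid z_k] = a \cdot z_k/a = z_k$ and $\mathbb{E}[1/z_{k+1} \mid v_k] = v_k$, consecutive variances stay close, and larger $a$ and $a_z$ make the chain smoother.

```python
import numpy as np


def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Ancestral sample v_{1:K} from the inverse Gamma-Markov chain.

    v_k | z_k ~ IG(v_k; a, z_k / a) and z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z),
    where IG(a, z) is drawn as 1 / Gamma(shape a, scale z).
    """
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = z1  # hypothetical initial value of the auxiliary variable
    for k in range(K):
        v[k] = 1.0 / rng.gamma(a, z / a)  # E[1/v_k | z_k] = z_k
        z = 1.0 / rng.gamma(a_z, v[k] / a_z)  # E[1/z_{k+1} | v_k] = v_k
    return v
```

Plotting `np.log(sample_igmc(1000, 10.0, a_z))` for a few values of `a_z` gives traces comparable in spirit to the typical draws of $\log v_k$.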
Gamma Chains: typical draws
[Figure: three draws of $\log v_k$ for $k = 1, \dots, 1000$, with $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains with changepoints: typical draws
[Figure: three draws of $\log v_k$ for $k = 1, \dots, 1000$, with $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$\psi_{k,k} = \exp\left(-a\, z_k^{-1} v_k^{-1}\right)$ (pairwise)
$\phi_{z_k} = \exp\left((a_z + a + 1)\log z_k^{-1}\right)$ (singletons)
[Figure: chain $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$ with couplings $a_z$, $a$, $a_z$]
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$\psi_{ij} = \exp\left(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}\right)$ (pairwise)
$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right)\log \xi_i^{-1}\right)$ (singletons)
Possible Model Topologies
Approximate Inference
• Stochastic
  – Markov chain Monte Carlo, Gibbs sampler
  – Sequential Monte Carlo, particle filtering
• Deterministic
  – Variational Bayes
In all of these, conjugacy helps
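VB or Gibbs
[Figure: factor graph $\psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, \cdots$ with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$
• Gibbs
$v^{(\tau)}_k \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z^{(\tau)}_k)\, \psi_{k,k+1}(z^{(\tau)}_{k+1})$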
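VB or Gibbs
[Figure: factor graph $\psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, \cdots$ with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$
• Gibbs
$z^{(\tau)}_k \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v^{(\tau)}_{k-1})\, \psi_{k,k}(v^{(\tau)}_k)$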
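With a Gaussian likelihood $y_k \sim \mathcal{N}(0, v_k)$ (an assumption; the slides leave $p(y_k \mid v_k)$ generic), every full conditional is inverse Gamma. Collecting powers and exponents from the chain definition gives $1/z_k \sim \mathcal{G}$ with shape $a_z + a$ and rate $a_z/v_{k-1} + a/v_k$, and $1/v_k \sim \mathcal{G}$ with shape $a + a_z + 1/2$ and rate $a/z_k + a_z/z_{k+1} + y_k^2/2$ (shape $a + 1/2$ for the last $v_K$, which has no child $z_{K+1}$). Given all $v$ the $z$ are conditionally independent, and vice versa, so each half can be drawn as one block. A sketch in Python/NumPy; the boundary value $v_0 := 1$, the initialisation and the function name are hypothetical choices, and NumPy's `gamma` takes a scale, i.e. one over the rate.

```python
import numpy as np


def gibbs_igmc(y, a, a_z, n_sweeps=200, seed=0):
    """Blocked Gibbs sampler for an inverse Gamma-Markov chain with y_k ~ N(0, v_k)."""
    rng = np.random.default_rng(seed)
    K = len(y)
    v = np.maximum(y ** 2, 1e-3)  # initialise the variances from the data
    samples = np.empty((n_sweeps, K))
    for it in range(n_sweeps):
        # z-step: 1/z_k ~ Gamma(shape a_z + a, rate a_z / v_{k-1} + a / v_k), v_0 := 1
        v_prev = np.concatenate(([1.0], v[:-1]))
        rate_z = a_z / v_prev + a / v
        z = 1.0 / rng.gamma(a_z + a, 1.0 / rate_z)
        # v-step: 1/v_k ~ Gamma(shape a + a_z + 1/2, rate a / z_k + a_z / z_{k+1} + y_k^2 / 2)
        shape_v = np.full(K, a + a_z + 0.5)
        shape_v[-1] = a + 0.5  # v_K has no child z_{K+1}
        rate_v = a / z + y ** 2 / 2
        rate_v[:-1] += a_z / z[1:]
        v = 1.0 / rng.gamma(shape_v, 1.0 / rate_v)
        samples[it] = v
    return samples
```

Averaging the rows of `samples` after a burn-in gives a Monte Carlo estimate of the posterior mean variances, which can then feed the responsibilities of the Gaussian source model.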
Denoising – Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: variational Bayes
[Figure: spectrograms of the noisy signal $X$, the original $X_{\text{org}}$ and the estimates $X_h$, $X_v$, $X_b$ and $X_g$, each estimate labelled with its SNR]
Denoising – Music
[Figure: spectrograms of the original and noisy signals and of the PF, Gibbs and VB reconstructions, each reconstruction labelled with its SNR]
"Tristram (Matt Uelmen)" + ∼0 dB white noise
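Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal tie across time (harmonic continuity)
• Source 2: vertical tie across frequency (transients, percussive sounds)
[Figure: original spectrogram $X_{\text{org}}$ and the horizontal ($S_{\text{hor}}$) and vertical ($S_{\text{ver}}$) sources; time $\tau$ against frequency bin $\nu$]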
Single Channel Source Separation with IGMCs
E-guitar "Matte Kudasai (King Crimson)" + drums "Territory (Sepultura)" = mix
Method        s1 SDR  s1 SIR  s1 SAR  s2 SDR  s2 SIR  s2 SAR
VB            -474    -328    567     -158    1546    -137
Gibbs         -45     -262    457     105     1246    161
GibbsEM       -423    -242    482     134     1313    185
Pre-trained   -404    -315    813     356     1144    464
Oracle        614     1716    658     1266    1995    136
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
Single Channel Source Separation with IGMCs
"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = mix
Method        s1 SDR  s1 SIR  s1 SAR  s2 SDR  s2 SIR  s2 SAR
VB            -78     -622    453     -235    184     -225
Gibbs         -846    -753    693     -404    1459    -383
GibbsEM       -774    -619    462     -114    1662    -097
Pre-trained   -64     -539    695     38      1639    414
Oracle        121     329     1214    2113    3389    2137
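Harmonic-Transient Decomposition
[Figure: two examples, each showing the original spectrogram $X_{\text{org}}$ and its horizontal ($S_{\text{hor}}$) and vertical ($S_{\text{ver}}$) components; time $\tau$ against frequency bin $\nu$]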
Chord Detection: Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (Hz, 0 to 4000)]
Chord Detection
[Figure: MDCT of the piano chord 41, 48, 51, 56 (frequency 0 to 4000 Hz against time $\tau$, 0.5 to 2.5 s); $\log \sum_\nu v_{\nu,j,\tau}$ for MIDI notes $j$ = 40 to 65 against time $\tau$]
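Multichannel Source Separation
• Hierarchical prior model (Févotte and Godsill 2005; Cemgil et al. 2006)
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$
$v_{k,n} \sim \mathcal{IG}\left(v_{k,n}; \nu/2, 2/(\nu\lambda_n)\right)$
$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$
$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$
[Figure: graphical model with $\lambda_1, \dots, \lambda_N$, variances $v_{k,n}$, sources $s_{k,n}$, observations $x_{k,1}, \dots, x_{k,M}$, mixing vectors $a_1, \dots, a_M$ and noise variances $r_1, \dots, r_M$; plate $k = 1, \dots, K$]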
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall "volume" of source $n$
Source Separation
[Figure: spectrograms of the speech, piano and guitar sources and of the mix, frequency $f$ (Hz, 0 to 10000) against time $t$]
Reconstructions
[Figure: spectrograms of the reconstructed speech, piano and guitar sources, frequency $f$ (Hz, 0 to 10000) against time $t$]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position pointer: k = 1
[Figure: $p(x_1)$ over bar position and tempo level]
Bar position pointer: k = 2
[Figure: $p(x_2)$ over bar position and tempo level]
Bar position pointer: k = 3
[Figure: $p(x_3)$ over bar position and tempo level]
Bar position pointer: k = 4
[Figure: $p(x_4)$ over bar position and tempo level]
Bar position pointer: k = 5
[Figure: $p(x_5 \mid y_{1:5})$ over bar position and tempo level, after observing $y_5 = 0$]
Bar position pointer: k = 10
[Figure: $p(x_{10})$ over bar position and tempo level]
Bar position pointer: observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson with an intensity $\mu_k$ that depends on the bar position $m_k$
[Figure: intensity $\mu_k$ against bar position $m_k$ (0 to 1000) for a triplet rhythm (top) and a duplet rhythm (bottom)]
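Combining the lattice with the Poisson observation model gives a discrete HMM, so online tracking of position and tempo is a forward (filtering) recursion. A minimal pure-Python sketch; the bar length `M`, the tempo-change probability, the beat intensities in `intensity` and the test counts are all hypothetical choices, not values from the talk.

```python
import math
from itertools import product

M, N, P_CHANGE = 16, 3, 0.1   # hypothetical: 16 bar positions, 3 tempo levels

def intensity(m):
    # Hypothetical Poisson intensity: high on the four beats (m = 1, 5, 9, 13), low elsewhere.
    return 3.0 if (m - 1) % 4 == 0 else 0.2

def poisson_logpmf(y, mu):
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def successors(m, n):
    # Assumed dynamics: advance n positions (wrapping); tempo moves to a neighbour w.p. P_CHANGE.
    m_next = (m + n - 1) % M + 1
    nbrs = [n2 for n2 in (n - 1, n + 1) if 1 <= n2 <= N]
    yield (m_next, n), 1.0 - P_CHANGE
    for n2 in nbrs:
        yield (m_next, n2), P_CHANGE / len(nbrs)

def forward_filter(ys):
    """Return the filtering distributions p(x_k | y_{1:k}) as dicts over states (m, n)."""
    states = list(product(range(1, M + 1), range(1, N + 1)))
    alpha = {x: 1.0 / len(states) for x in states}        # uniform p(x_0)
    history = []
    for y in ys:
        pred = dict.fromkeys(states, 0.0)                 # p(x_k | y_{1:k-1})
        for (m, n), p in alpha.items():
            for x_next, t in successors(m, n):
                pred[x_next] += p * t
        # Weight by the Poisson likelihood p(y_k | x_k) and normalise.
        alpha = {x: p * math.exp(poisson_logpmf(y, intensity(x[0]))) for x, p in pred.items()}
        total = sum(alpha.values())
        alpha = {x: p / total for x, p in alpha.items()}
        history.append(alpha)
    return history

# Counts from a pointer at tempo level 2 that starts on a beat: 3, 0, 3, 0, ...
ys = [3 if k % 2 == 0 else 0 for k in range(20)]
post = forward_filter(ys)[-1]
p_tempo = {n: sum(p for (_, nn), p in post.items() if nn == n) for n in range(1, N + 1)}
```

The filtered tempo marginal concentrates on level 2, though not completely: a tempo change at the last frame has not yet produced any evidence.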
Tempo, rhythm, meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Figure: graphical model with chains $n_0, \dots, n_3$, $\theta_0, \dots, \theta_3$, $m_0, \dots, m_3$ and $r_0, \dots, r_3$, intensities $\lambda_1, \dots, \lambda_3$ and observations $y_1, \dots, y_3$]
• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator
Filtering
[Figure: observed data $y_k$; $\log p(m_k \mid y_{1:k})$ over bar position $m_k$; $\log p(n_k \mid y_{1:k})$ over tempo (quarter notes per minute, 60 to 180); $p(r_k \mid y_{1:k})$ for triplets and duplets; all against frame index $k$]
Smoothing
[Figure: observed data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position $m_k$; $\log p(n_k \mid y_{1:K})$ over tempo (quarter notes per minute, 60 to 180); $p(r_k \mid y_{1:K})$ for triplets and duplets; all against frame index $k$]
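Filtering conditions on $y_{1:k}$, smoothing on all of $y_{1:K}$. The difference can be sketched with a generic forward-backward pass over a discrete HMM. The two-state example (which could stand for a duplet/triplet indicator $r_k$), its transition matrix and its likelihood values are hypothetical.

```python
def forward_backward(prior, T, lik):
    """Filtering and smoothing marginals for a discrete HMM.

    prior[i] = p(x_1 = i), T[i][j] = p(x_{k+1} = j | x_k = i),
    lik[k][i] = p(y_k | x_k = i).
    Returns lists of p(x_k | y_{1:k}) (filtered) and p(x_k | y_{1:K}) (smoothed).
    """
    S, K = len(prior), len(lik)
    filtered = []
    for k in range(K):
        if k == 0:
            alpha = [prior[i] * lik[0][i] for i in range(S)]
        else:
            alpha = [sum(filtered[-1][i] * T[i][j] for i in range(S)) * lik[k][j]
                     for j in range(S)]
        z = sum(alpha)
        filtered.append([a / z for a in alpha])

    smoothed = [None] * K
    beta = [1.0] * S                  # proportional to p(y_{k+1:K} | x_k = i)
    for k in range(K - 1, -1, -1):
        post = [filtered[k][i] * beta[i] for i in range(S)]
        z = sum(post)
        smoothed[k] = [p / z for p in post]
        beta = [sum(T[i][j] * lik[k][j] * beta[j] for j in range(S)) for i in range(S)]
        s = sum(beta)
        beta = [b / s for b in beta]  # rescale to avoid underflow
    return filtered, smoothed

# Hypothetical two-state example (state 0 = duplets, state 1 = triplets).
T = [[0.95, 0.05], [0.05, 0.95]]
lik = [[0.8, 0.2]] * 5 + [[0.5, 0.5]] + [[0.8, 0.2]] * 5   # frame 5 is uninformative
filtered, smoothed = forward_backward([0.5, 0.5], T, lik)
```

At the uninformative frame the smoothed marginal favours state 0 more strongly than the filtered one, because smoothing also uses the later frames; at the last frame the two coincide.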
Time Signature
[Figure: observed audio (sample value against time, 0 to 12 s); $\log p(m_k \mid z_{1:K})$ over bar position; $\log p(n_k \mid z_{1:K})$ over tempo (quarter notes per minute, 52 to 155); $p(\theta_k \mid z_{1:K})$ for 4/4 and 3/4; all against frame index $k$]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt and audio signal $x_t$ against time $t$]
Score-Performance matching: graphical model
[Figure: graphical model over frames $k = 1, \dots, K$ with variables $t_1, \dots, t_K$, $r_1, \dots, r_K$, $\lambda_1, \dots, \lambda_K$ and, for each $\nu = 1, \dots, W$, variances $v_{\nu,1}, \dots, v_{\nu,K}$ and coefficients $s_{\nu,1}, \dots, s_{\nu,K}$]
$v_{\nu\tau} \sim \mathcal{IG}\left(v_{\nu\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau))\right)$
Score-Performance matching: signal model
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (Hz, 0 to 4000)]
Score-Performance matching
[Figure: spectrogram data (frequency in Hz, 0 to 4000, against time in s, 0 to 14) and MIDI data (MIDI note, 55 to 85, against score position)]
• Online (filtering) or offline (smoothing) processing is possible
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80) against time (s); $\log \lambda$ for $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ against time (s); MDCT of the audio, frequency (Hz, 0 to 4000) against time (s) (source: Daniel-Ben Pienaar)]
Summary
• “Time domain”: switching state space models
– State space modeling
– Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform domain”: Gamma fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical-score-guided source separation
– Prior structures for other observation models, NMF
Audio Restoration/Interpolation
• Estimate missing samples given observed ones
• Restoration, concatenative/expressive speech synthesis
[Figure: signal with missing samples, sample index 0 to 500]
Polyphonic Music Transcription
• from sound
[Figure: spectrogram, frequency f (Hz, 0 to 5000) against time t (s, 0 to 8); audio example (S)]
• to score
Modelling and Computational Issues
• Hierarchical
– Signal level: pitch, onsets, timbre
– Symbolic level: melody, motives, harmony, chords, tonality, rhythm, beat, tempo, articulation, instrumentation, voice
– Cognitive level: expression, genre, form, style, mood, emotion
• Uncertainty
– Parameter learning: which pitch, rhythm, tempo, meter, time signature?
– Model selection: how many notes, harmonics, onsets, sections?
Generative Models for Music
Generative Models for Music
[Figure: diagram with the levels Score, Expression, Piano-roll and Signal]
Hierarchical Modeling of Music
[Figure: hierarchical graphical model over time $t = 1, \dots, T$ with chains $v_t$, $k_t$, $h_t$, $m_t$, per-voice chains $g_{j,t}$, $r_{j,t}$, $n_{j,t}$, $x_{j,t}$, $y_{j,t}$, and observations $y_t$]
Modelling levels
• Physical/acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, generalised linear model
• Feature based
Research Questions
What kinds of prior knowledge and modelling techniques are useful?
How can we do efficient inference?
Signal Models for Audio
• Time domain: state space dynamical models
– Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA), switching state space models
– Flexible, physically realistic
– Analysis down to sample precision; computationally quite heavy
• Transform domain: Fourier representations, generalised linear model
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
Sinusoidal Modeling
• Sound is primarily about oscillations and resonance
• Cascade of second-order systems
• Audio signals can often be compactly represented by sinusoids
(real) $y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$
(complex) $y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n$
$y = F(\gamma_{1:p}, \omega_{1:p})\, c$
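The two parametrisations agree when $c_k = \alpha_k e^{j\phi_k}$: the real signal is then the real part of the complex one. Because $y$ is linear in $c$ once $\gamma_{1:p}$ and $\omega_{1:p}$ are fixed, $c$ can be recovered by least squares. A minimal NumPy sketch with hypothetical parameter values; it fits the complex signal, so no conjugate-pair bookkeeping is needed.

```python
import numpy as np

# Hypothetical example parameters: p = 2 damped sinusoids.
gamma = np.array([0.01, 0.03])      # damping gamma_k
omega = np.array([0.20, 0.55])      # angular frequency omega_k (rad/sample)
alpha = np.array([1.0, 0.5])        # amplitude alpha_k
phi = np.array([0.3, -1.2])         # phase phi_k
n = np.arange(200)

# Real form: sum_k alpha_k exp(-gamma_k n) cos(omega_k n + phi_k)
y_real = sum(a * np.exp(-g * n) * np.cos(w * n + p)
             for a, g, w, p in zip(alpha, gamma, omega, phi))

# Complex form: F[n, k] = (exp(-gamma_k + j omega_k))**n, so y = F c
F = np.exp(np.outer(n, -gamma + 1j * omega))
c_true = alpha * np.exp(1j * phi)   # c_k = alpha_k exp(j phi_k)
y_complex = F @ c_true
agree = np.allclose(y_real, y_complex.real)   # real signal = real part of complex one

# Given gamma and omega the model is linear in c: recover c by least squares.
c_hat, *_ = np.linalg.lstsq(F, y_complex, rcond=None)
```

Estimating $\gamma_{1:p}$ and $\omega_{1:p}$ is the nonlinear, hard part; the linear structure in $c$ is what lets those amplitudes be integrated out or solved for in closed form.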
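State space Parametrisation
$x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix}$
$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}}_{C} x_n$
[Figure: chain $x_0, x_1, \dots, x_{k-1}, x_k, \dots, x_K$ with observations $y_1, \dots, y_{k-1}, y_k, \dots, y_K$]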
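Running the recursion $x_{n+1} = A x_n$, $y_n = C x_n$ from $x_0 = c$ reproduces $y_n = \sum_k c_k (e^{-\gamma_k + j\omega_k})^n$, because $A$ is diagonal and $C$ simply sums the states. A minimal pure-Python check with hypothetical parameters:

```python
import cmath

gamma = [0.01, 0.03]                 # hypothetical damping factors gamma_k
omega = [0.20, 0.55]                 # hypothetical frequencies omega_k (rad/sample)
c = [1.0 + 0.5j, 0.3 - 0.2j]         # hypothetical initial state x_0 = (c_1, ..., c_p)

poles = [cmath.exp(-g + 1j * w) for g, w in zip(gamma, omega)]   # diagonal of A

# Run x_{n+1} = A x_n and read out y_n = C x_n with C = (1, ..., 1).
x = list(c)
y_state_space = []
for n in range(100):
    y_state_space.append(sum(x))
    x = [p * xi for p, xi in zip(poles, x)]

# Direct evaluation of y_n = sum_k c_k (exp(-gamma_k + j omega_k))^n.
y_direct = [sum(ck * pk ** n for ck, pk in zip(c, poles)) for n in range(100)]
max_err = max(abs(a - b) for a, b in zip(y_state_space, y_direct))
```

The state space form is what allows process noise, changepoints and Kalman filtering to be layered on top of the sinusoidal model in the following slides.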
State Space Parametrisation
Audio Restoration/Interpolation
• Estimate missing samples given observed ones
• Restoration, concatenative/expressive speech synthesis
[Figure: signal with missing samples, sample index 0 to 500]
Audio Interpolation
$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\, p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$
$H \equiv$ (parameters, hidden states)
[Figure: graphical model with $H$ as parent of the missing samples $x_{\neg\kappa}$ and the observed samples $x_\kappa$; signal with missing samples, sample index 0 to 500]
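For a fixed $H$ under which the signal is jointly Gaussian, the conditional $p(x_{\neg\kappa} \mid x_\kappa, H)$ is itself Gaussian and its mean is the interpolant. A minimal NumPy sketch, assuming a hypothetical damped-cosine covariance $\rho^{|n-m|}\cos(\omega(n-m))$ (a single noisy resonance) and a hypothetical 20-sample gap; the integral over $H$ on the slide is not attempted here.

```python
import numpy as np

def kernel(n, m, rho=0.99, omega=0.3):
    # Hypothetical covariance: cov(x_n, x_m) = rho^|n-m| cos(omega (n - m)).
    d = np.subtract.outer(n, m)
    return rho ** np.abs(d) * np.cos(omega * d)

N = 200
idx = np.arange(N)
missing = (idx >= 90) & (idx < 110)           # the index set "not kappa": a 20-sample gap
obs, mis = idx[~missing], idx[missing]

x = np.cos(0.3 * idx)                         # test signal; only x[obs] is used below
K_oo = kernel(obs, obs) + 1e-6 * np.eye(len(obs))   # small jitter for numerical stability
K_mo = kernel(mis, obs)
K_mm = kernel(mis, mis)

# Gaussian conditioning: mean K_mo K_oo^{-1} x_obs, covariance K_mm - K_mo K_oo^{-1} K_om.
x_hat = K_mo @ np.linalg.solve(K_oo, x[obs])
post_cov = K_mm - K_mo @ np.linalg.solve(K_oo, K_mo.T)
rel_err = np.linalg.norm(x_hat - x[mis]) / np.linalg.norm(x[mis])
```

The posterior variances on the diagonal of `post_cov` grow towards the middle of the gap, which is where a reconstruction is least certain.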
Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
[Figure: graphical model; for each band $\nu = 0, \dots, W-1$, parameters $A_\nu$, $Q_\nu$ and a chain $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_{K-1}$; observations $x_0, \dots, x_k, \dots, x_{K-1}$]
$s^\nu_k \sim \mathcal{N}(s^\nu_k; A_\nu s^\nu_{k-1}, Q_\nu), \qquad A_\nu \sim \mathcal{N}\left(A_\nu; \begin{pmatrix} \cos\omega_\nu & -\sin\omega_\nu \\ \sin\omega_\nu & \cos\omega_\nu \end{pmatrix}, \Psi\right)$
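A minimal simulation sketch of the band dynamics. It assumes $A_\nu$ is fixed at the mean of its prior (an exact rotation by $\omega_\nu$, i.e. $\Psi \to 0$), isotropic process noise of variance `q`, and an observation $x_k$ equal to the sum of the first components of the band states plus noise of variance `r`. The slide does not spell out the observation equation, so that part is an assumption, as are the function names. With `q = r = 0` and initial state $(1, 0)$, each band is a pure oscillator and $x_k = \sum_\nu \cos(k\omega_\nu)$.

```python
import math
import random

def rotation(omega):
    c, s = math.cos(omega), math.sin(omega)
    return [[c, -s], [s, c]]

def noise(rng, var):
    return rng.gauss(0.0, math.sqrt(var)) if var > 0 else 0.0

def simulate_vocoder(K, omegas, q=0.0, r=0.0, seed=0):
    """Simulate s^nu_k = A_nu s^nu_{k-1} + noise and an assumed x_k = sum_nu s^nu_k[0] + noise."""
    rng = random.Random(seed)
    states = [[1.0, 0.0] for _ in omegas]      # hypothetical initial state for every band
    A = [rotation(w) for w in omegas]          # A_nu at its prior mean (Psi -> 0)
    xs = []
    for _ in range(K):
        states = [[Av[0][0] * s0 + Av[0][1] * s1 + noise(rng, q),
                   Av[1][0] * s0 + Av[1][1] * s1 + noise(rng, q)]
                  for (s0, s1), Av in zip(states, A)]
        xs.append(sum(s[0] for s in states) + noise(rng, r))
    return xs

xs = simulate_vocoder(50, [0.2, 0.7])          # noise-free: sum of two cosines
noisy = simulate_vocoder(50, [0.2, 0.7], q=1e-3, r=1e-2)
```

With `q > 0` the amplitude and phase of each band wander slowly, which is the "vocoder" behaviour the model is designed to capture.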
Inference: Structured Variational Bayes
[Figure: factorised approximation with $q(A_\alpha)$, $q(Q_\alpha)$ and $\prod_k q(s^\alpha_k \mid s^\alpha_{k-1})$ for each $\alpha \in C$, and $q(x_k)$]
• Intuitive algorithm
– Subtract from the observed signal $x$ the prediction of the frequency bands in $\neg\alpha$
– Compute a fit for $\alpha$ to this residual and iterate
• For fixed $A$, $Q$ this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
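Gauss-Seidel solves $Mx = b$ by sweeping over the unknowns and solving for each one with all the others held at their latest values. This mirrors the "fit one band to the residual of the others" loop above. A minimal pure-Python sketch; the function name and the test system are illustrative.

```python
def gauss_seidel(M, b, iters=100):
    """Solve M x = b by Gauss-Seidel sweeps (converges e.g. for symmetric positive definite M)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            # Residual of row i with all other unknowns at their latest values ...
            s = sum(M[i][j] * x[j] for j in range(n) if j != i)
            # ... then fit unknown i to that residual.
            x[i] = (b[i] - s) / M[i][i]
    return x

M = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 2.0]]
b = [1.0, 2.0, 3.0]
x = gauss_seidel(M, b)
residual = max(abs(sum(M[i][j] * x[j] for j in range(3)) - b[i]) for i in range(3))
```

In the vocoder setting each "unknown" is a whole band trajectory, and the per-band fit is a Kalman smoother rather than a scalar division.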
Restoration
• Piano
– Signal with missing samples (37)
– Reconstruction: 7.68 dB improvement
– Original
• Trumpet
– Signal with missing samples (37)
– Reconstruction: 7.10 dB improvement
– Original
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
[Figure: for each $\nu = 1, \dots, W$, an indicator chain $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and a state chain $\theta^\nu_0, \dots, \theta^\nu_k, \dots, \theta^\nu_K$; observations $y_k, \dots, y_K$]
• Generalises source-filter models
Harmonic model with changepoints
$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \quad r_k \in \{0, 1\}$
$\theta_k \mid \theta_{k-1}, r_k \sim [r_k = 0]\, \underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\, \underbrace{\mathcal{N}(0, S)}_{\text{new}}$
$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$
$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^N, \qquad G_\omega = \rho_k \begin{pmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{pmatrix}$
damping factor $0 < \rho_k < 1$, frame length $N$ and damped sinusoidal basis matrix $C$ of size $N \times 2H$
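A minimal NumPy sketch of the transition matrix $A$ and the basis $C$ for one fundamental $\omega$, treating the damping as a constant $\rho$. The construction of $C$ is an assumption: row $n$ holds the first row of $(G_\omega^h)^n$ for each harmonic $h$, so that $C\theta_k$ gives the $N$ samples of frame $k$ and $A$ advances the state by one frame. The function names and parameter values are hypothetical.

```python
import numpy as np

def G(omega, rho):
    # Damped rotation G_omega = rho * [[cos, -sin], [sin, cos]].
    c, s = np.cos(omega), np.sin(omega)
    return rho * np.array([[c, -s], [s, c]])

def harmonic_transition(omega, rho, H, N):
    """A = blockdiag(G_omega, G_omega^2, ..., G_omega^H)^N: advance all H harmonics by N samples."""
    D = np.zeros((2 * H, 2 * H))
    for h in range(1, H + 1):
        D[2 * (h - 1):2 * h, 2 * (h - 1):2 * h] = np.linalg.matrix_power(G(omega, rho), h)
    return np.linalg.matrix_power(D, N)

def harmonic_basis(omega, rho, H, N):
    """Assumed N x 2H damped sinusoidal basis: row n = first row of (G_omega^h)^n per harmonic."""
    C = np.zeros((N, 2 * H))
    for h in range(1, H + 1):
        Gh = np.linalg.matrix_power(G(omega, rho), h)
        for n in range(N):
            C[n, 2 * (h - 1):2 * h] = np.linalg.matrix_power(Gh, n)[0]
    return C

omega, rho, H, N = 0.1, 0.999, 3, 16       # hypothetical values
A = harmonic_transition(omega, rho, H, N)
C = harmonic_basis(omega, rho, H, N)
```

Since $G_\omega^h = \rho^h R(h\omega)$ for a rotation $R$, block $h$ of $A$ equals $\rho^{hN} R(hN\omega)$; the whole harmonic series is tied to the single parameter $\omega$.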
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
– Conditional Gaussians are not closed under marginalization
⇒ Unlike in HMMs or KFMs, summing over $r_k$ does not simplify the filtering density
⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially
[Figure: branching tree of Gaussian kernels, each annotated with a numerical value]
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time/space
– Intuition: when a changepoint occurs, the state vector $\theta$ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, O Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)
[Figure: example with $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$; states $\theta_0, \dots, \theta_5$; observations $y_1, \dots, y_5$]
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
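The reinitialisation argument can be made concrete for a scalar special case of the harmonic model, assuming $A = 1$ and $Q = 0$ (piecewise-constant $\theta$), a constant hypothetical changepoint probability `hazard` $= p(r_k = 1)$, and hypothetical values for $S$ and $R$. Indexing each Gaussian kernel by the start of the current segment gives exact filtering with $k$ kernels after $k$ observations instead of $2^k$. A minimal pure-Python sketch:

```python
import math

def log_normal_pdf(y, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def changepoint_filter(ys, hazard=0.05, S=4.0, R=0.25):
    """Exact filtering for theta_k = theta_{k-1} (r_k = 0) or theta_k ~ N(0, S) (r_k = 1),
    with y_k ~ N(theta_k, R). Each kernel is (log weight, segment start, mean, variance)."""
    comps, sizes = [], []
    for k, y in enumerate(ys):
        # r_k = 0: every existing segment continues unchanged ...
        grown = [(lw + math.log(1 - hazard), s, m, v) for lw, s, m, v in comps]
        # ... or r_k = 1: a new segment starts at k and theta is reinitialised.
        lw_new = math.log(hazard) + logsumexp([c[0] for c in comps]) if comps else 0.0
        prior = grown + [(lw_new, k, 0.0, S)]
        # Condition every Gaussian kernel on y_k (scalar Kalman update).
        comps = []
        for lw, s, m, v in prior:
            lw += log_normal_pdf(y, m, v + R)
            gain = v / (v + R)
            comps.append((lw, s, m + gain * (y - m), (1 - gain) * v))
        z = logsumexp([c[0] for c in comps])      # normalise: p(segment start | y_{1:k})
        comps = [(lw - z, s, m, v) for lw, s, m, v in comps]
        sizes.append(len(comps))
    return comps, sizes

ys = [0.0] * 30 + [3.0] * 10                      # hypothetical data with a jump at k = 30
comps, sizes = changepoint_filter(ys)
best_start = max(comps, key=lambda c: c[0])[1]    # most probable start of the current segment
```

The kernel count grows by one per frame (`sizes` is 1, 2, 3, ...), and the most probable segment start sits at the jump. Pruning dominated kernels, as in the MMAP bullet, would keep it bounded.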
Monophonic model (Cemgil et al. 2006)
• We introduce a pitch label indicator $m$
• At each time $k$ the process can be in one of the $\{\text{mute}, \text{sound}\} \times M$ states
[Figure: chains $r_0, \dots, r_T$; $m_0, \dots, m_T$; $s_0, \dots, s_T$; observations $y_1, \dots, y_T$]
Monophonic Pitch Tracking
Monophonic pitch tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$
[Figure: two panels against time index (100 to 1000)]
• If the pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
[Figure: exact inference result over sample index 500 to 3500; audio example (S)]
Tracking Pitch Variations
• Allow $m$ to change with $k$
[Figure: example over time index 50 to 500]
• Intractable: need to resort to approximate inference (mixture Kalman filter, Rao-Blackwellized particle filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: latent indicators over frequency index $\nu$ and time $k$, and the signal $x_k$]
• Each latent changepoint process $\nu = 1, \dots, W$ corresponds to a “piano key”. Indicators $r_{1:W, 1:K}$ encode a latent “piano roll” (S1) (S2) (S3)
Single time slice: Bayesian Variable Selection
$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$
$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$
$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$
$C \equiv [\, C_1 \; \cdots \; C_i \; \cdots \; C_W \,]$
[Figure: graphical model with indicators $r_1, \dots, r_W$, coefficients $s_1, \dots, s_W$ and observation $x$]
• Generalized linear model: the columns of $C$ are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When $W$ is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman, Attias, ...)
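For small $W$ the $2^W$ mixture can simply be enumerated. Integrating out $s$ gives $p(x \mid r) = \mathcal{N}(x; 0, \sigma^2 C_r C_r^\top + R)$, where $C_r$ keeps the columns that are on. A minimal NumPy sketch, assuming each $C_i$ is a single column, $\Sigma = \sigma^2$ per coefficient, and a hypothetical random basis and signal:

```python
import itertools
import numpy as np

def log_gauss_zero_mean(x, K):
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(K, x))

def exact_indicator_posterior(x, C, sigma2, R, pi_on):
    """Enumerate all 2^W indicator vectors r and return p(r | x) over them."""
    W = C.shape[1]
    configs, logp = [], []
    for r in itertools.product([0, 1], repeat=W):
        on = np.array(r, dtype=bool)
        K = R + sigma2 * C[:, on] @ C[:, on].T        # marginal covariance of x given r
        n_on = int(on.sum())
        lp = log_gauss_zero_mean(x, K) + n_on * np.log(pi_on) + (W - n_on) * np.log(1 - pi_on)
        configs.append(r)
        logp.append(lp)
    logp = np.array(logp)
    post = np.exp(logp - logp.max())
    return configs, post / post.sum()

rng = np.random.default_rng(1)
C = rng.standard_normal((8, 4))                       # hypothetical basis: N = 8, W = 4
x = 1.5 * C[:, 0] - 2.0 * C[:, 2] + 0.1 * rng.standard_normal(8)
configs, post = exact_indicator_posterior(x, C, sigma2=1.0, R=0.01 * np.eye(8), pi_on=0.2)
best = configs[int(np.argmax(post))]
```

The loop is exponential in $W$, which is exactly why the slide calls exact computation intractable for a full keyboard of basis vectors.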
Factorial Switching State space model
$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$
$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$
$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$   (changepoint indicator)
$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\theta_{k-1,\nu}, Q_\nu(r_k))$   (latent state)
$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R)$   (observation)
[Figure: for each $\nu = 1, \dots, W$, chains $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_K$; observations $y_k, \dots, y_K$]
Synthetic Data
[Figure: synthetic example over frequency index $\nu$ and time $k$, with the signal $x$; audio example (S)]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable: computations with large matrices
– Need advanced techniques from linear algebra
– Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical/acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, generalised linear model
• Feature based
Spectrogram
• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau)$ = (frequency, time), such as STFT or MDCT
$x(t) = \sum_k s_k \phi_k(t)$
[Figure: spectrograms of speech (left) and piano (right), frequency f (Hz, 0 to 10000) against time]
• The spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of the STFT)
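A minimal NumPy STFT sketch that computes the coefficients $s_k$, $k = k(\nu, \tau)$, with a Hann-windowed FFT. Frame length, hop size, sampling rate and the test tone are hypothetical choices; plotting `10 * log10(|s_k|^2)` over $(\tau, \nu)$ gives the usual spectrogram.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Coefficients s_k, k = (nu, tau), of a Hann-windowed STFT; returns shape (bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[t * hop: t * hop + frame_len] * window for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

fs = 8000                                   # hypothetical sampling rate (Hz)
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)            # 1 kHz test tone, 1 s long
S = stft(x)
spectrogram_db = 10 * np.log10(np.abs(S) ** 2 + 1e-12)   # what a spectrogram plot displays
peak_hz = np.argmax(np.abs(S[:, 10])) * fs / 256
```

The tone lands exactly on bin 32 (1000 Hz at 31.25 Hz per bin). The frame length fixes this frequency resolution, one of the "inherent limitations" of transform-domain models.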
Models for time-frequency Energy distributions
• Non-negative matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004)
$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$
Spectrogram = Spectral templates × Excitations
– however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
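One standard way to fit $X \approx WS$ is the Lee and Seung multiplicative update for the generalised KL divergence. That objective is maximum likelihood under a Poisson model $X_{\nu\tau} \sim \mathcal{PO}(\sum_j W_{\nu j} S_{j\tau})$. A minimal NumPy sketch; the data, rank and iteration count are hypothetical, and this is not claimed to be the method used in the works cited.

```python
import numpy as np

def kl_divergence(X, Xhat, eps=1e-12):
    # Generalised KL divergence D(X || Xhat), the objective of the KL multiplicative updates.
    return float(np.sum(X * np.log((X + eps) / (Xhat + eps)) - X + Xhat))

def nmf_kl(X, J, iters=200, seed=0):
    """Fit X ≈ W @ S with nonnegative W (spectral templates) and S (excitations)."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.random((F, J)) + 0.1
    S = rng.random((J, T)) + 0.1
    history = []
    for _ in range(iters):
        S *= (W.T @ (X / (W @ S))) / W.sum(axis=0)[:, None]
        W *= ((X / (W @ S)) @ S.T) / S.sum(axis=1)[None, :]
        history.append(kl_divergence(X, W @ S))
    return W, S, history

rng = np.random.default_rng(1)
X = rng.random((20, 2)) @ rng.random((2, 30)) + 1e-3   # hypothetical rank-2 "spectrogram"
W, S, history = nmf_kl(X, J=2)
```

The updates keep $W$ and $S$ nonnegative automatically and never increase the divergence. They still treat the spectrogram as additive, which is the limitation noted on the slide.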
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005)
$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$
Spectrogram = Mask × Source 0 + (1 - Mask) × Source 1
– however, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms
$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$
One channel source separation: Gaussian source model
[Figure: graphical model for $k = 1, \dots, K$: variances $v_{k,1}, \dots, v_{k,N}$, sources $s_{k,1}, \dots, s_{k,N}$, observation $x_k$]
$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• Straightforward application of Bayes' theorem yields
$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))$
$\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$   (responsibilities)
• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$
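The posterior follows because $s_{k,n}$ and $x_k$ are jointly Gaussian with $\mathrm{Cov}(s_{k,n}, x_k) = v_{k,n}$ and $\mathrm{Var}(x_k) = \sum_{n'} v_{k,n'}$. The posterior mean is therefore $v_{k,n} x_k / \sum_{n'} v_{k,n'}$ and the variance is $v_{k,n} - v_{k,n}^2 / \sum_{n'} v_{k,n'}$. A minimal sketch for a single time-frequency atom; the function name and numbers are illustrative.

```python
def gaussian_separate(x, v):
    """Posterior of each source s_n given x = sum_n s_n with s_n ~ N(0, v_n)."""
    total = sum(v)
    kappa = [vn / total for vn in v]                       # responsibilities
    means = [k * x for k in kappa]                         # kappa_n x
    variances = [vn * (1 - k) for vn, k in zip(v, kappa)]  # v_n (1 - kappa_n)
    return kappa, means, variances

kappa, means, variances = gaussian_separate(2.0, [3.0, 1.0])
```

The posterior means always add up to $x$. This is a Wiener filter per atom, so separation quality depends entirely on how well the variances $v$ are estimated, which is what the priors on $v$ are for.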
One channel source separation: Poisson source model
[Figure: graphical model for $k = 1, \dots, K$: intensities $v_{k,1}, \dots, v_{k,N}$, sources $s_{k,1}, \dots, s_{k,N}$, observation $x_k$]
$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• This is the generative model for NMF when we write $v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (template × excitation)
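In the Poisson model, independent Poisson sources conditioned on their sum are multinomially distributed, with probabilities $\kappa_n = v_n / \sum_{n'} v_{n'}$. These are the same responsibilities as in the Gaussian model, and the posterior mean is again $\kappa_n x$. For two sources the multinomial reduces to a binomial, which a minimal pure-Python check can confirm by brute-force enumeration (the numbers are illustrative):

```python
import math

def poisson_pmf(s, v):
    return math.exp(s * math.log(v) - v - math.lgamma(s + 1))

def posterior_by_enumeration(x, v1, v2):
    # With x = s1 + s2, p(s1 = j | x) ∝ PO(j; v1) PO(x - j; v2).
    w = [poisson_pmf(j, v1) * poisson_pmf(x - j, v2) for j in range(x + 1)]
    total = sum(w)
    return [wj / total for wj in w]

x, v1, v2 = 4, 1.5, 0.5
kappa = v1 / (v1 + v2)                                    # responsibility of source 1
enumerated = posterior_by_enumeration(x, v1, v2)
binomial = [math.comb(x, j) * kappa ** j * (1 - kappa) ** (x - j) for j in range(x + 1)]
posterior_mean = sum(j * p for j, p in enumerate(enumerated))   # equals kappa * x
```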
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: densities $p(x)$ for $x$ from 0 to 5; left: $a = 0.9, 1, 1.3, 2$ with $b = 1$; right: $(a, b) = (1, 1), (1, 0.5), (2, 1)$]
$\mathcal{G}(x; a, z) \equiv \exp\left((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\right)$
$\mathcal{IG}(x; a, z) \equiv \exp\left((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\right)$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity, inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
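The two definitions can be checked numerically. In this parameterisation $z$ is a scale for $\mathcal{G}$, the Gamma density should integrate to one, and $\mathcal{IG}(x; a, z)$ should be the density of $1/u$ for $u \sim \mathcal{G}(u; a, z)$, i.e. $\mathcal{IG}(x; a, z) = \mathcal{G}(1/x; a, z)\, x^{-2}$. A minimal pure-Python sketch with illustrative parameter values:

```python
import math

def log_gamma_pdf(x, a, z):
    # log G(x; a, z) = (a - 1) log x - x / z + a log z^{-1} - log Gamma(a)   (z is a scale)
    return (a - 1) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    # log IG(x; a, z) = (a + 1) log x^{-1} - 1 / (z x) + a log z^{-1} - log Gamma(a)
    return -(a + 1) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)

# Midpoint-rule check that G(x; a = 2, z = 1) integrates to ~1 on [0, 60].
h = 1e-3
area = h * sum(math.exp(log_gamma_pdf((i + 0.5) * h, 2.0, 1.0)) for i in range(60000))

# Change of variables: if u ~ G(u; a, z), then x = 1/u has density G(1/x; a, z) / x^2 = IG(x; a, z).
identity_gap = max(
    abs(log_inv_gamma_pdf(x, 1.3, 0.5) - (log_gamma_pdf(1.0 / x, 1.3, 0.5) - 2.0 * math.log(x)))
    for x in (0.1, 0.7, 2.0, 9.0)
)
```

The second identity is what makes inverse Gamma variables easy to sample: draw a Gamma variable and take its reciprocal.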
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:
$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$
$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$
[Figure: chain $z_1, \dots, v_{k-1}, z_k, v_k, z_{k+1}, \dots$ with couplings $a_z$, $a$, $a_z$]
• Variance variables $v$ are the prior variances of the sources
• Auxiliary variables $z$ are needed for conjugacy and positive correlation
• Shape parameters $a$ and $a_z$ describe the coupling strength and drift of the chain
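Since $\mathcal{IG}(x; a, z)$ is the law of $1/u$ with $u \sim \mathcal{G}(u; a, z)$, the chain can be sampled with Gamma draws. Here the second arguments are read as $z_k/a$ and $v_k/a_z$, consistent with the pairwise potential $\exp(-a z_k^{-1} v_k^{-1})$ given for this chain. One step then gives $\log v_k = \log v_{k-1} + \log(a/a_z) + \log u_{a_z} - \log u_a$ with $u_b \sim \mathcal{G}(b, 1)$, so larger shapes mean smaller log-increments. A minimal sketch; the initial value `z1`, the seed and the shape values are hypothetical.

```python
import math
import random

def sample_gamma_chain(K, a, a_z, z1=1.0, seed=0):
    """Draw v_1..v_K from the inverse Gamma-Markov chain.

    Uses IG(x; a, z) <=> 1/x ~ Gamma(shape a, scale z); random.gammavariate(alpha, beta)
    takes beta as the scale parameter.
    """
    rng = random.Random(seed)
    v, z = [], z1
    for _ in range(K):
        vk = 1.0 / rng.gammavariate(a, z / a)      # v_k | z_k ~ IG(v_k; a, z_k / a)
        v.append(vk)
        z = 1.0 / rng.gammavariate(a_z, vk / a_z)  # z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z)
    return v

def mean_sq_log_increment(v):
    return sum((math.log(b) - math.log(a)) ** 2 for a, b in zip(v, v[1:])) / (len(v) - 1)

smooth = mean_sq_log_increment(sample_gamma_chain(2000, a=40.0, a_z=40.0))
rough = mean_sq_log_increment(sample_gamma_chain(2000, a=2.0, a_z=2.0))
```

With $a = a_z$ the log-variance performs a driftless random walk whose step variance shrinks as the shapes grow. Unequal shapes add a drift, which matches the roles of $a$ and $a_z$ listed above.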
Gamma Chains: typical draws
[Figure: typical draws of $\log v_k$ for $k = 1, \dots, 1000$, with $a = 10$ and $a_z = 10$, $4$ and $40$]
Gamma Chains with changepoints: typical draws
[Figure: typical draws of $\log v_k$ for $k = 1, \dots, 1000$, with $a = 10$ and $a_z = 10$, $4$ and $40$]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$   (pairwise)
$\phi_{z_k} = \exp\left((a_z + a + 1)\log z_k^{-1}\right)$   (singletons)
[Figure: chain $z_1, \dots, v_{k-1}, z_k, v_k, z_{k+1}, \dots$ with couplings $a_z$, $a$, $a_z$]
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$   (pairwise)
$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right)\log \xi_i^{-1}\right)$   (singletons)
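The potentials fix every full conditional, which is what makes Gibbs sampling and VB straightforward here. Collecting the factors that involve $\xi_i$ gives a short derivation in the $\mathcal{IG}(x; a, z)$ parameterisation defined with the Gamma and inverse Gamma densities:

```latex
p(\xi_i \mid \xi_{-i}) \propto \phi_i \prod_{j} \psi_{ij}
  = \exp\Big( \big(\textstyle\sum_j a_{ij} + 1\big) \log \xi_i^{-1}
              - \xi_i^{-1} \textstyle\sum_j a_{ij}\, \xi_j^{-1} \Big)
\;\Rightarrow\;
\xi_i \mid \xi_{-i} \sim \mathcal{IG}\Big(\xi_i;\; \textstyle\sum_j a_{ij},\;
  \big(\textstyle\sum_j a_{ij}\, \xi_j^{-1}\big)^{-1}\Big)
```

For the chain this reproduces, for example, $z_k \mid v_{k-1}, v_k \sim \mathcal{IG}\left(z_k; a_z + a, (a_z v_{k-1}^{-1} + a v_k^{-1})^{-1}\right)$, consistent with the singleton exponent $a_z + a + 1$.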
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov chain Monte Carlo, Gibbs sampler
– Sequential Monte Carlo, particle filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps
VB or Gibbs
[Figure: factor graph with potentials $\psi_{0,1}, \psi_{1,1}, \psi_{1,2}, \psi_{2,2}, \dots$ linking $z_1, v_1, z_2, v_2, \dots$, and likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$
• Gibbs
$v^{(\tau)}_k \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z^{(\tau)}_k)\, \psi_{k,k+1}(z^{(\tau)}_{k+1})$
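Assuming the Gaussian observation model $y_k \mid v_k \sim \mathcal{N}(0, v_k)$ (as in the one-channel source model), both VB factors stay inverse Gamma. Here "rate" $\beta$ means a density proportional to $x^{-(\alpha+1)} e^{-\beta/x}$, so $\langle x^{-1} \rangle = \alpha/\beta$. From the chain's potentials, $q(v_k)$ has shape $a + a_z + 1/2$ and rate $a\langle z_k^{-1}\rangle + a_z\langle z_{k+1}^{-1}\rangle + y_k^2/2$. $q(z_k)$ has shape $a + a_z$ and rate $a_z\langle v_{k-1}^{-1}\rangle + a\langle v_k^{-1}\rangle$. A minimal coordinate-ascent sketch; the boundary handling (a fixed hypothetical $v_0$ and no $z_{K+1}$) and the data are assumptions.

```python
def vb_gamma_chain(y, a, a_z, v0=1.0, iters=50):
    """Mean-field VB for z_1, v_1, ..., z_K, v_K with y_k ~ N(0, v_k).

    Tracks only <1/v_k> and <1/z_k> (= shape / rate of each inverse Gamma factor).
    v0, the parent of z_1, is a fixed hypothetical value.
    """
    K = len(y)
    inv_v = [1.0] * K      # <1/v_k>
    inv_z = [1.0] * K      # <1/z_k>
    for _ in range(iters):
        for k in range(K):
            last = k == K - 1                      # v_K has no child z_{K+1}
            shape = a + 0.5 + (0.0 if last else a_z)
            rate = a * inv_z[k] + 0.5 * y[k] ** 2 + (0.0 if last else a_z * inv_z[k + 1])
            inv_v[k] = shape / rate
        for k in range(K):
            prev = 1.0 / v0 if k == 0 else inv_v[k - 1]
            inv_z[k] = (a + a_z) / (a_z * prev + a * inv_v[k])
    return inv_v, inv_z

y = [0.1] * 50 + [3.0] * 50          # hypothetical quiet-then-loud coefficients
inv_v, inv_z = vb_gamma_chain(y, a=10.0, a_z=10.0)
v_hat = [1.0 / iv for iv in inv_v]   # a simple point summary of each variance
quiet, loud = sum(v_hat[:50]) / 50, sum(v_hat[50:]) / 50
```

Each update only needs the expectations from its neighbours, which is the conjugacy advantage the previous slide mentions.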
VB or Gibbs
[Figure: factor graph with potentials $\psi_{0,1}, \psi_{1,1}, \psi_{1,2}, \psi_{2,2}, \dots$ linking $z_1, v_1, z_2, v_2, \dots$, and likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k-1} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$
• Gibbs
$z^{(\tau)}_k \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k,k-1}(v^{(\tau)}_{k-1})\, \psi_{k,k}(v^{(\tau)}_k)$
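The Gibbs counterpart draws from the same inverse Gamma full conditionals instead of averaging over them. Because all $z_k$ are conditionally independent given the $v$'s and vice versa, each half-sweep can update a whole block. A minimal sketch under the same assumptions as a VB treatment would need: $y_k \mid v_k \sim \mathcal{N}(0, v_k)$, a fixed hypothetical $v_0$, no $z_{K+1}$, and hypothetical data, shapes and sweep counts.

```python
import random

def gibbs_gamma_chain(y, a, a_z, v0=1.0, sweeps=400, seed=0):
    """Blocked Gibbs sampler for z_1, v_1, ..., z_K, v_K with y_k ~ N(0, v_k).

    Full conditionals (inverse Gamma, i.e. 1/x ~ Gamma(shape, rate)):
      z_k | v_{k-1}, v_k      : shape a_z + a,        rate a_z / v_{k-1} + a / v_k
      v_k | z_k, z_{k+1}, y_k : shape a + a_z + 1/2,  rate a / z_k + a_z / z_{k+1} + y_k^2 / 2
    v0 (the parent of z_1) is a fixed hypothetical value; v_K has no z_{K+1} term.
    """
    rng = random.Random(seed)
    K = len(y)
    v, z = [1.0] * K, [1.0] * K
    samples = []
    for sweep in range(sweeps):
        for k in range(K):                   # all z_k are independent given v
            prev = v0 if k == 0 else v[k - 1]
            rate = a_z / prev + a / v[k]
            z[k] = 1.0 / rng.gammavariate(a + a_z, 1.0 / rate)   # gammavariate takes a scale
        for k in range(K):                   # all v_k are independent given z and y
            last = k == K - 1
            shape = a + 0.5 + (0.0 if last else a_z)
            rate = a / z[k] + 0.5 * y[k] ** 2 + (0.0 if last else a_z / z[k + 1])
            v[k] = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        if sweep >= sweeps // 2:             # discard the first half as burn-in
            samples.append(list(v))
    return samples

y = [0.1] * 20 + [3.0] * 20
samples = gibbs_gamma_chain(y, a=10.0, a_z=10.0)
post_mean = [sum(s[k] for s in samples) / len(samples) for k in range(len(y))]
```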
Denoising: Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: variational Bayes
[Figure: log-magnitude spectrograms (0 to -14) of the noisy signal X, the original X_org and the reconstructions X_h, X_v, X_b and X_g, each reconstruction labelled with its SNR]
Denoising: Music
[Figure: log-magnitude spectrograms (0 to -18) of the original, the noisy signal and the PF, Gibbs and VB reconstructions, each reconstruction labelled with its SNR]
“Tristram” (Matt Uelmen) + ~0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal tie across time (harmonic continuity)
• Source 2: vertical tie across frequency (transients, percussive sounds)
[Figure: time ($\tau$) by frequency bin ($\nu$) panels $X_{\text{org}}$, $S_{\text{hor}}$, $S_{\text{ver}}$]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai” (King Crimson) + Drums “Territory” (Sepultura) = Mix
                   s1                         s2
              SDR     SIR     SAR        SDR     SIR     SAR
VB           -4.74   -3.28    5.67      -1.58   15.46   -1.37
Gibbs        -4.5    -2.62    4.57       1.05   12.46    1.61
GibbsEM      -4.23   -2.42    4.82       1.34   13.13    1.85
Pre-trained  -4.04   -3.15    8.13       3.56   11.44    4.64
Oracle        6.14   17.16    6.58      12.66   19.95   13.6
(all values in dB)
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet” (Anglagard) + “Moby Dick” (Led Zeppelin) = Mix
                   s1                         s2
              SDR     SIR     SAR        SDR     SIR     SAR
VB           -7.8    -6.22    4.53      -2.35   18.4    -2.25
Gibbs        -8.46   -7.53    6.93      -4.04   14.59   -3.83
GibbsEM      -7.74   -6.19    4.62      -1.14   16.62   -0.97
Pre-trained  -6.4    -5.39    6.95       3.8    16.39    4.14
Oracle        1.21    3.29   12.14      21.13   33.89   21.37
(all values in dB)
Harmonic-Transient Decomposition
[Figure: time ($\tau$) by frequency bin ($\nu$) panels $X_{\text{org}}$, $S_{\text{hor}}$, $S_{\text{ver}}$]
[Audio examples for two excerpts: original, horizontal component, vertical component]
Chord Detection: Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (Hz, 0 to 4000)]
Chord Detection
[Figure: MDCT of a piano chord (41 48 51 56), frequency (Hz, 0 to 4000) against time $\tau$ (s); $\log \sum_\nu v_{\nu j \tau}$ for MIDI notes $j$ from 40 to 65 against time $\tau$ (s)]
Multichannel Source Separation
• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$
$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$
$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$
$a_m \sim \mathcal{N}(a_m; \cdots)$, $\quad r_m \sim \mathcal{IG}(r_m; \cdots)$
[Figure: graphical model with $\lambda_1, \dots, \lambda_N$, variances $v_{k,1}, \dots, v_{k,N}$, sources $s_{k,1}, \dots, s_{k,N}$ and observations $x_{k,1}, \dots, x_{k,M}$ for $k = 1, \dots, K$, with mixing parameters $a_1, r_1, \dots, a_M, r_M$]
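Sampling from the hierarchy makes its structure concrete. Marginalising $v_{k,n}$ turns $s_{k,n}$ into a Student-t variable with $\nu$ degrees of freedom and squared scale $\lambda_n$, which is why $\lambda_n$ behaves like a volume. A minimal NumPy sketch; the hyperparameters and the fixed mixing vectors $a_m$ and noise variances $r_m$ are hypothetical, since the slide leaves their priors unspecified. $b_\lambda$ is read as a scale, as in $\mathcal{G}(x; a, z)$.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N, M = 20000, 3, 2                 # frames, sources, channels (underdetermined: M < N)
nu, a_lam, b_lam = 5.0, 2.0, 1.0      # hypothetical hyperparameters

# lambda_n ~ G(lambda_n; a_lambda, b_lambda), with b_lambda a scale.
lam = rng.gamma(a_lam, b_lam, size=N)
# v_{k,n} ~ IG(v; nu/2, 2/(nu lambda_n))  <=>  1/v_{k,n} ~ Gamma(nu/2, scale 2/(nu lambda_n)).
v = 1.0 / rng.gamma(nu / 2, 2.0 / (nu * lam), size=(K, N))
s = rng.standard_normal((K, N)) * np.sqrt(v)          # s_{k,n} ~ N(0, v_{k,n})

# Fixed illustrative mixing vectors a_m (rows of A) and noise variances r_m.
A = rng.standard_normal((M, N))
r = np.array([0.01, 0.02])
x = s @ A.T + rng.standard_normal((K, M)) * np.sqrt(r)

# Student-t sources are heavier-tailed than Gaussians (kurtosis above 3).
kurtosis = np.mean(s[:, 0] ** 4) / np.mean(s[:, 0] ** 2) ** 2
```

With fewer channels than sources, many source configurations explain the same mixture, which leads directly to the multimodality discussed next.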
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$
Source Separation
[Figure: spectrograms, frequency f (Hz, 0 to 10000) against time, of the speech, piano and guitar sources and of the mix]
Reconstructions
[Figure: spectrograms of the reconstructed speech, piano and guitar sources]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar Position Pointer - k = 1
[Figure: p(x_1) over bar position and tempo level]
Bar Position Pointer - k = 2
[Figure: p(x_2) over bar position and tempo level]
Bar Position Pointer - k = 3
[Figure: p(x_3) over bar position and tempo level]
Bar Position Pointer - k = 4
[Figure: p(x_4) over bar position and tempo level]
Bar Position Pointer - k = 5
[Figure: p(x_5 | y_{1:5}) over bar position and tempo level, after observing y_5 = 0]
Bar Position Pointer - k = 10
[Figure: p(x_10) over bar position and tempo level]
Bar Position Pointer - observation model (Poisson)
• Observation model p(y_k | x_k): Poisson with intensity μ_k
[Figure: intensity μ_k against bar position m_k (0 to 1000), for a triplet rhythm and for a duplet rhythm]
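To make the filtering recursion concrete, here is a minimal sketch of HMM forward filtering p(x_k | y_{1:k}) for the bar pointer with Poisson observations. The bar-pointer dynamics (position advances by the tempo level, tempo occasionally moves by one level), the intensity function `rhythm_intensity` (a hypothetical pattern with high rates at beat positions) and all numeric values are illustrative assumptions, not the settings used in the talk.

```python
import numpy as np
from math import lgamma


def bar_pointer_transition(M, N, p_change):
    # Hypothetical bar-pointer dynamics; state (m, n) is flattened to n * M + m.
    T = np.zeros((M * N, M * N))
    for n in range(N):
        for m in range(M):
            m_next = (m + n + 1) % M
            for n_next, p in ((n, 1.0 - p_change),
                              (max(n - 1, 0), p_change / 2),
                              (min(n + 1, N - 1), p_change / 2)):
                T[n * M + m, n_next * M + m_next] += p
    return T


def rhythm_intensity(M, beats, high=3.0, low=0.1):
    # Hypothetical Poisson intensity mu(m): high at the given beat positions, low elsewhere.
    mu = np.full(M, low)
    mu[list(beats)] = high
    return mu


def forward_filter(ys, T, mu_state, prior):
    """Filtering distributions p(x_k | y_{1:k}) for Poisson counts y_k, as a (K, S) array."""
    alpha = prior.copy()
    out = []
    for k, y in enumerate(ys):
        if k > 0:
            alpha = alpha @ T                                      # predict: p(x_k | y_{1:k-1})
        loglik = y * np.log(mu_state) - mu_state - lgamma(y + 1)   # log Poisson(y; mu(x))
        alpha = alpha * np.exp(loglik - loglik.max())              # update, rescaled for stability
        alpha /= alpha.sum()
        out.append(alpha)
    return np.array(out)


M, N = 8, 3
T = bar_pointer_transition(M, N, p_change=0.05)
mu_state = np.tile(rhythm_intensity(M, beats=(0, 4)), N)  # intensity depends on m only
ys = [3, 0, 3, 0, 3, 0, 3, 0]                               # toy onset counts per frame
F = forward_filter(ys, T, mu_state, prior=np.full(M * N, 1.0 / (M * N)))
tempo_marginal = F[-1].reshape(N, M).sum(axis=1)           # p(n_K | y_{1:K})
print(tempo_marginal.argmax())                             # 1, i.e. tempo level 2
```

With this toy data, only paths that keep tempo level 2 explain every strong and weak frame, so the filtered tempo marginal concentrates there; smoothing would add a backward pass over the same transition matrix.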
Tempo, Rhythm, Meter Analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Figure: graphical model with tempo n_0 to n_3, time signature θ_0 to θ_3, bar position m_0 to m_3, rhythm r_0 to r_3, intensities λ_1 to λ_3 and observations y_1 to y_3]
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator
Filtering
[Figure: observed data y_k, and filtering densities log p(m_k | y_{1:k}) over bar position, log p(n_k | y_{1:k}) over tempo (quarter notes per minute, 60 to 180) and p(r_k | y_{1:k}) for triplets vs duplets, against frame index k]
Smoothing
[Figure: observed data y_k, and smoothed densities log p(m_k | y_{1:K}) over bar position, log p(n_k | y_{1:K}) over tempo (quarter notes per minute, 60 to 180) and p(r_k | y_{1:K}) for triplets vs duplets, against frame index k]
Time Signature
[Figure: observed audio (sample value against time in s) and smoothed densities log p(m_k | z_{1:K}) over bar position, log p(n_k | z_{1:K}) over tempo (quarter notes per minute) and p(θ_k | z_{1:K}) for 4/4 vs 3/4, against frame index k]
Score-Performance Matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: musical score and audio signal x_t against time t]
Score-Performance Matching - Graphical Model
[Figure: graphical model with nodes t_1 to t_K, score positions r_1 to r_K, intensities λ_1 to λ_K, variances v_{ν1} to v_{νK} and coefficients s_{ν1} to s_{νK} for ν = 1, …, W; inset: score positions r_k numbered 1 to 8]
$v_{\nu\tau} \sim \mathcal{IG}(v_{\nu\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau)))$
Score-Performance Matching - Signal Model
[Figure: spectral template log σ_ν against frequency ν (0 to 4000 Hz)]
Score-Performance Matching
[Figure: spectrogram data (frequency 0 to 4000 Hz against time 0 to 14 s) and MIDI data (MIDI note 55 to 85 against score position)]
Online (filtering) or offline (smoothing) processing is possible.
Transcription
[Figure: log p(r_τ | s_τ) over MIDI note number (60 to 80) against time (1 to 4 s); log Σ_i w_τ^{(i)} λ_τ^{(i)} against time; MDCT of the audio (source: Daniel-Ben Pienaar), frequency 0 to 4000 Hz against time]
Summary
• "Time Domain": switching state space models
  – State space modeling
  – Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
  – Analysis down to sample precision (if required)
  – Computationally quite heavy
• "Transform Domain": Gamma fields
  – Models on (orthogonal) transform coefficients; energy compaction
  – Practical: can make use of fast transforms (FFT, MDCT)
  – Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
  – Time-frequency energy distributions
• Ongoing work
  – Comparison of inference methods (VB, MCMC, SMC)
  – Learning
  – Applications
    ∗ Chord detection, polyphonic transcription
    ∗ Musical-score-guided source separation
  – Prior structures for other observation models (NMF)
Polyphonic Music Transcription
• from sound
[Figure: spectrogram, frequency 0 to 5000 Hz against time 0 to 8 s (S)]
• to score
Modelling and Computational Issues
• Hierarchical
  – Signal level: pitch, onsets, timbre
  – Symbolic level: melody, motives, harmony, chords, tonality, rhythm, beat, tempo, articulation, instrumentation, voice
  – Cognitive level: expression, genre, form, style, mood, emotion
• Uncertainty
  – Parameter learning: which pitch, rhythm, tempo, meter, time signature?
  – Model selection: how many notes, harmonics, onsets, sections?
Generative Models for Music
Generative Models for Music
[Figure: levels Score, Expression, Piano-Roll and Signal]
Hierarchical Modeling of Music
[Figure: hierarchical graphical model with a global node M over time-indexed chains v_t, k_t, h_t, m_t, chains g_{j,t}, r_{j,t}, n_{j,t}, x_{j,t}, y_{j,t} indexed by j, and observations y_t]
Modelling Levels
• Physical - acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, generalised linear model
• Feature based
Research Questions
What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?
Signal Models for Audio
• Time domain: state space dynamical models
  – Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA), switching state space models
  – Flexible, physically realistic
  – Analysis down to sample precision; computationally quite heavy
• Transform domain: Fourier representations, generalised linear model
  – Models on (orthogonal) transform coefficients; energy compaction
  – Practical: can make use of fast transforms (FFT, MDCT)
  – Inherent limitations (analysis windows, frequency resolution)
Sinusoidal Modeling
• Sound is primarily about oscillations and resonance
• Cascade of second-order systems
• Audio signals can often be compactly represented by sinusoids
(real) $y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$
(complex) $y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n$
$y = F(\gamma_{1:p}, \omega_{1:p})\, c$
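State Space Parametrisation
$x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix}$
$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}}_{C} x_n$
[Figure: chain of hidden states x_0, x_1, …, x_{k−1}, x_k, …, x_K emitting observations y_1, …, y_K]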
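The equivalence between the sum-of-damped-sinusoids form and the diagonal state space form can be checked numerically. The sketch below (function names and parameter values are illustrative, and y is indexed from n = 0) generates the same signal both ways, and also confirms that a conjugate pole pair reproduces the real form α e^{−γn} cos(ωn + φ) with c = (α/2) e^{jφ}.

```python
import numpy as np


def sinusoids_direct(c, gamma, omega, n_samples):
    """y_n = sum_k c_k z_k^n with poles z_k = exp(-gamma_k + j omega_k); also returns F."""
    z = np.exp(-gamma + 1j * omega)
    n = np.arange(n_samples)[:, None]
    F = z[None, :] ** n              # F[n, k] = z_k^n, so that y = F c
    return F @ c, F


def sinusoids_state_space(c, gamma, omega, n_samples):
    """Same signal from x_{n+1} = A x_n, y_n = C x_n with x_0 = c."""
    A = np.diag(np.exp(-gamma + 1j * omega))   # diagonal transition matrix
    C = np.ones(len(c))                        # observation row (1 1 ... 1)
    x = np.asarray(c, dtype=complex)
    y = np.empty(n_samples, dtype=complex)
    for n in range(n_samples):
        y[n] = C @ x
        x = A @ x
    return y


gamma = np.array([0.01, 0.05])
omega = np.array([0.3, 1.1])
c = np.array([1.0 + 0.5j, 0.2 - 0.1j])
y_direct, F = sinusoids_direct(c, gamma, omega, 64)
y_ss = sinusoids_state_space(c, gamma, omega, 64)
print(np.allclose(y_direct, y_ss))   # True: both parametrisations give the same signal

# A conjugate pole pair reproduces the real form alpha e^{-gamma n} cos(omega n + phi)
alpha, phi = 2.0, 0.4
c_pair = np.array([alpha / 2 * np.exp(1j * phi), alpha / 2 * np.exp(-1j * phi)])
y_pair = sinusoids_state_space(c_pair, np.array([0.02, 0.02]), np.array([0.5, -0.5]), 64)
n = np.arange(64)
print(np.allclose(y_pair.real, alpha * np.exp(-0.02 * n) * np.cos(0.5 * n + phi)))
print(np.allclose(y_pair.imag, 0.0))
```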
State Space Parametrisation
Audio Restoration/Interpolation
• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis
[Figure: audio signal with a gap of missing samples, sample index 0 to 500]
Audio Interpolation
$p(x_{\neg\kappa} | x_\kappa) \propto \int dH\, p(x_{\neg\kappa} | H)\, p(x_\kappa | H)\, p(H)$, with H ≡ (parameters, hidden states)
[Figure: graphical model in which H is the parent of the missing samples x_{¬κ} and the observed samples x_κ; example signal over sample index 0 to 500]
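When H is held fixed and the prior is Gaussian, the integral collapses and p(x_{¬κ} | x_κ) is the usual conditional Gaussian. The sketch below assumes, purely for illustration, a zero-mean stationary covariance k(τ) = e^{−γ|τ|} cos(ωτ) (a damped-oscillation prior in the spirit of the sinusoidal state space model); the function name and parameter values are hypothetical.

```python
import numpy as np


def interpolate_gaussian(x, observed, cov):
    """Posterior mean and covariance of the missing samples x[~observed] given x[observed]."""
    obs = np.flatnonzero(observed)
    mis = np.flatnonzero(~observed)
    S_oo = cov[np.ix_(obs, obs)]
    S_mo = cov[np.ix_(mis, obs)]
    S_mm = cov[np.ix_(mis, mis)]
    gain = np.linalg.solve(S_oo, S_mo.T).T    # S_mo S_oo^{-1} (S_oo is symmetric)
    mean = gain @ x[obs]
    post_cov = S_mm - gain @ S_mo.T
    return mis, mean, post_cov


# Hypothetical prior: damped oscillation covariance k(tau) = exp(-gamma |tau|) cos(omega tau)
n = 200
tau = np.subtract.outer(np.arange(n), np.arange(n))
cov = np.exp(-0.01 * np.abs(tau)) * np.cos(0.2 * tau) + 1e-6 * np.eye(n)

rng = np.random.default_rng(0)
x = np.linalg.cholesky(cov) @ rng.standard_normal(n)   # one draw from the prior
observed = np.ones(n, dtype=bool)
observed[90:110] = False                               # a gap of 20 missing samples

mis, mean, post_cov = interpolate_gaussian(x, observed, cov)
print(np.sqrt(np.mean((mean - x[mis]) ** 2)))                  # RMS interpolation error
print(np.trace(post_cov) / np.trace(cov[np.ix_(mis, mis)]))    # fraction of prior variance left
```

With a real recording H would itself be uncertain (model coefficients, noise levels), which is what the integral over H accounts for.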
Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
[Figure: graphical model; for each band ν = 0, …, W − 1, parameters A_ν, Q_ν and a chain s^ν_0, …, s^ν_k, …, s^ν_{K−1}; observations x_0, …, x_k, …, x_{K−1}]
$s^\nu_k \sim \mathcal{N}(s^\nu_k; A_\nu s^\nu_{k-1}, Q_\nu), \qquad A_\nu \sim \mathcal{N}\left(A_\nu; \begin{pmatrix} \cos\omega_\nu & -\sin\omega_\nu \\ \sin\omega_\nu & \cos\omega_\nu \end{pmatrix}, \Psi\right)$
Inference: Structured Variational Bayes
[Figure: factor graph; for each band α ∈ C, parameters A_α, Q_α with factors q(A_α), q(Q_α) and a chain s^α_{k−1}, s^α_k, s^α_{k+1} with factor ∏_k q(s^α_k | s^α_{k−1}); observations x_k with q(x_k)]
• Intuitive algorithm
  – Subtract from the observed signal x the prediction of the frequency bands in ¬α
  – Compute a fit for α to this residual and iterate
• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
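The "subtract the other bands, refit this one, iterate" idea can be illustrated on a simplified, hypothetical problem: point estimates for a ridge-regularised linear model y ≈ Σ_α C_α s_α rather than the full phase vocoder. Each refit is one block Gauss-Seidel step on the joint normal equations (CᵀC + λI)s = Cᵀy, so the iteration converges to the joint solution. Names, sizes and λ are illustrative.

```python
import numpy as np


def backfit(y, blocks, lam, n_sweeps=200):
    """Block Gauss-Seidel for min ||y - sum_a C_a s_a||^2 + lam * sum_a ||s_a||^2."""
    s = [np.zeros(C.shape[1]) for C in blocks]
    for _ in range(n_sweeps):
        for a, C in enumerate(blocks):
            # subtract the current prediction of all other components from the observation
            others = sum(blocks[b] @ s[b] for b in range(len(blocks)) if b != a)
            residual = y - others
            # refit component a alone to the residual
            s[a] = np.linalg.solve(C.T @ C + lam * np.eye(C.shape[1]), C.T @ residual)
    return np.concatenate(s)


rng = np.random.default_rng(0)
blocks = [rng.standard_normal((50, 2)) for _ in range(3)]   # three hypothetical 2-D components
y = rng.standard_normal(50)
lam = 0.1

s_backfit = backfit(y, blocks, lam)
C = np.hstack(blocks)
s_joint = np.linalg.solve(C.T @ C + lam * np.eye(C.shape[1]), C.T @ y)
print(np.allclose(s_backfit, s_joint))   # the iteration converges to the joint solution
```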
Restoration
• Piano
  – Signal with missing samples (37)
  – Reconstruction: 7.68 dB improvement
  – Original
• Trumpet
  – Signal with missing samples (37)
  – Reconstruction: 7.10 dB improvement
  – Original
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
[Figure: factorial model; for ν = 1, …, W, an indicator chain r^ν_0, …, r^ν_k, …, r^ν_K and a state chain θ^ν_0, …, θ^ν_k, …, θ^ν_K; observations y_k, …, y_K]
• Generalises source-filter models
Harmonic Model with Changepoints
$r_k | r_{k-1} \sim p(r_k | r_{k-1}), \quad r_k \in \{0, 1\}$
$\theta_k | \theta_{k-1}, r_k \sim [r_k = 0]\, \underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\, \underbrace{\mathcal{N}(0, S)}_{\text{new}}$
$y_k | \theta_k \sim \mathcal{N}(C\theta_k, R)$
$A = \left(\mathrm{blockdiag}(G_\omega, G_\omega^2, \ldots, G_\omega^H)\right)^N, \qquad G_\omega = \rho_k \begin{pmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{pmatrix}$
with damping factor 0 < ρ_k < 1, frame length N and damped sinusoidal basis matrix C of size N × 2H.
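A small sketch of how such a transition could be assembled, assuming A = blockdiag(G_ω, G_ω^2, …, G_ω^H)^N, and assuming (hypothetically) that row n of C reads out the first coordinate of every harmonic after n samples, i.e. C_n = eᵀG^n for the per-sample transition G. The check confirms that the frame predicted by A continues the previous frame sample by sample. All names and values are illustrative.

```python
import numpy as np


def rotation(w):
    return np.array([[np.cos(w), -np.sin(w)],
                     [np.sin(w), np.cos(w)]])


def harmonic_model(omega, rho, H, N):
    """Per-sample transition G, frame transition A = G^N and basis C (N x 2H)."""
    G_omega = rho * rotation(omega)
    G = np.zeros((2 * H, 2 * H))
    for h in range(1, H + 1):
        # harmonic h evolves with G_omega^h = rho^h R(h omega)
        G[2 * h - 2:2 * h, 2 * h - 2:2 * h] = np.linalg.matrix_power(G_omega, h)
    A = np.linalg.matrix_power(G, N)          # advance every harmonic by one frame of N samples
    readout = np.zeros(2 * H)
    readout[0::2] = 1.0                       # sum of the first coordinates of all harmonics
    C = np.array([readout @ np.linalg.matrix_power(G, n) for n in range(N)])
    return G, A, C, readout


omega, rho, H, N = 0.3, 0.999, 3, 16
G, A, C, readout = harmonic_model(omega, rho, H, N)
theta = np.random.default_rng(0).standard_normal(2 * H)

frames = np.concatenate([C @ theta, C @ (A @ theta)])     # two consecutive frames
samples = np.array([readout @ np.linalg.matrix_power(G, n) @ theta for n in range(2 * N)])
print(np.allclose(frames, samples))                        # frame model continues sample by sample
print(np.allclose(A[:2, :2], rho ** N * rotation(N * omega)))
```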
Exact Inference in Switching State Space Models is Intractable
• In general, exact inference is NP-hard
  – Conditional Gaussians are not closed under marginalization
  ⇒ Unlike HMMs or KFMs, summing over r_k does not simplify the filtering density
  ⇒ The number of Gaussian kernels needed to represent the exact filtering density p(r_k, θ_k | y_{1:k}) increases exponentially
[Figure: tree of Gaussian kernels, each labelled with its log-weight]
Exact Inference for Changepoint Detection
• Exact inference is achievable in polynomial time/space
  – Intuition: when a changepoint occurs, the state vector θ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, Ó Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)
[Figure: example with r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0; states θ_0, …, θ_5 and observations y_1, …, y_5]
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} | y_{1:k})$
  ⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
Monophonic Model (Cemgil et al. 2006)
• We introduce a pitch label indicator m
• At each time k the process can be in one of the {"mute", "sound"} × M states
[Figure: graphical model with indicators r_0, …, r_T, pitch labels m_0, …, m_T, states s_0, …, s_T and observations y_1, …, y_T]
Monophonic Pitch Tracking
Monophonic pitch tracking = online estimation (filtering) of p(r_k, m_k | y_{1:k})
[Figure: signal (amplitude −100 to 100) and filtered pitch label (1 to 15) against time index 100 to 1000]
• If pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TASLP)
[Figure: exact inference result over sample index 500 to 3500 (S)]
Tracking Pitch Variations
• Allow m to change with k
[Figure: pitch track against time index 50 to 500]
• Intractable: need to resort to approximate inference (mixture Kalman filter, Rao-Blackwellized particle filter)
Factorial Generative Models for Analysis of Polyphonic Audio
[Figure: latent piano roll over frequency index ν and time k, and the resulting signal x_k]
• Each latent changepoint process ν = 1, …, W corresponds to a "piano key". Indicators r_{1:W, 1:K} encode a latent "piano roll" (S1) (S2) (S3)
Single Time Slice - Bayesian Variable Selection
$r_i \sim \mathcal{C}(r_i; \pi_{\mathrm{on}}, \pi_{\mathrm{off}})$
$s_i | r_i \sim [r_i = \mathrm{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \mathrm{on}]\, \delta(s_i)$
$x | s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$
$C \equiv [\, C_1 \ \cdots \ C_i \ \cdots \ C_W \,]$
[Figure: graphical model with indicators r_1, …, r_W, coefficients s_1, …, s_W and observation x]
• Generalized linear model: the columns of C are the basis vectors
• The exact posterior is a mixture of 2^W Gaussians
• When W is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman; Attias)
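For small W the 2^W-component posterior can be computed exactly by enumeration, which also shows why this breaks down for large W. The sketch assumes scalar coefficients with Σ = σ² (`sigma2`), R = `noise_var`·I and π_on = 0.5; integrating out the active coefficients gives x | r ∼ N(0, σ² C_on C_onᵀ + R). Function names and data are illustrative.

```python
import itertools
import numpy as np


def log_gauss_zero_mean(x, K):
    # log N(x; 0, K) computed through a Cholesky factor
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, x)
    return -0.5 * alpha @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2 * np.pi)


def indicator_posterior(x, C, sigma2, noise_var, pi_on=0.5):
    """Exact p(r_{1:W} | x) by enumerating all 2^W on/off configurations."""
    D, W = C.shape
    configs = list(itertools.product([0, 1], repeat=W))
    logp = np.empty(len(configs))
    for i, r in enumerate(configs):
        on = np.flatnonzero(r)
        C_on = C[:, on]
        # integrating out s_on ~ N(0, sigma2 I) gives x | r ~ N(0, sigma2 C_on C_on^T + noise_var I)
        K = sigma2 * C_on @ C_on.T + noise_var * np.eye(D)
        log_prior = len(on) * np.log(pi_on) + (W - len(on)) * np.log(1.0 - pi_on)
        logp[i] = log_prior + log_gauss_zero_mean(x, K)
    post = np.exp(logp - logp.max())
    return configs, post / post.sum()


rng = np.random.default_rng(1)
C, _ = np.linalg.qr(rng.standard_normal((16, 4)))   # four orthonormal basis vectors
x = 2.0 * C[:, 0] + 2.0 * C[:, 2]                   # components 1 and 3 are "on"
configs, post = indicator_posterior(x, C, sigma2=4.0, noise_var=0.01)
print(configs[int(post.argmax())])                  # (1, 0, 1, 0)
```

Switching on an unused basis vector only enlarges the covariance determinant, so the Occam factor in the marginal likelihood favours the sparse configuration.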
Factorial Switching State Space Model
$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$
$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$
$r_{k,\nu} | r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$ (changepoint indicator)
$\theta_{k,\nu} | \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_{k,\nu})\, \theta_{k-1,\nu}, Q_\nu(r_{k,\nu}))$ (latent state)
$y_k | \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R)$ (observation)
[Figure: factorial graphical model; for ν = 1, …, W, chains r^ν_0, …, r^ν_K and s^ν_0, …, s^ν_K; observations y_k, …, y_K]
Synthetic Data
[Figure: synthetic piano roll over frequency index ν and time k, and the generated signal x (S)]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable: computations with large matrices
  – Need advanced techniques from linear algebra
  – Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling Levels
• Physical - acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, generalised linear model
• Feature based
Spectrogram
• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (frequency, time), such as STFT or MDCT
$x(t) = \sum_k s_k \phi_k(t)$
[Figure: spectrograms of speech and piano, frequency (0 to 10000 Hz) against time]
• The spectrogram displays log |s_k| or |s_k|^2 (of the STFT)
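A minimal NumPy sketch of computing |s_k|² for STFT atoms k = (ν, τ), assuming a Hann window, frame length 256 and hop 128 (illustrative choices, not those used for the figures). A 1 kHz tone sampled at 8 kHz falls exactly on bin 32, which every frame should pick out.

```python
import numpy as np


def stft_power(x, frame_len=256, hop=128):
    """|s_k|^2 for STFT atoms k = (nu, tau): rows are frequency bins, columns are frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[t * hop:t * hop + frame_len] * window for t in range(n_frames)], axis=1)
    S = np.fft.rfft(frames, axis=0)       # transform coefficients s_k
    return np.abs(S) ** 2


fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)          # 1 kHz test tone, 1 s long
P = stft_power(x)
log_spec = 10 * np.log10(P + 1e-12)       # log-energy, as shown in a spectrogram display (dB)
print(P.shape)                            # (129, 61)
print(np.unique(P.argmax(axis=0)) * fs / 256)   # peak frequency in Hz: [1000.]
```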
Models for Time-Frequency Energy Distributions
• Non-negative matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004)
$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$
Spectrogram = Spectral templates × Excitations
[Figure: spectrogram shown as the product of spectral templates and excitations]
  – However, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
Models for Time-Frequency Energy Distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005)
$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$
Spectrogram = Mask × Source_0 + (1 − Mask) × Source_1
[Figure: spectrogram decomposed into two masked sources]
  – However, sources do overlap in time and frequency
Prior Structures on Time-Frequency Energy Distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)
• We place a suitable prior on the variance of transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms
$p(s | v)\, p(v) = \left(\prod_k p(s_k | v_k)\right) p(v)$
One Channel Source Separation: Gaussian Source Model
[Figure: graphical model for k = 1, …, K with variances v_{k1}, …, v_{kN}, sources s_{k1}, …, s_{kN} and observation x_k]
$s_{kn} | v_{kn} \sim \mathcal{N}(s_{kn}; 0, v_{kn}), \qquad x_k | s_{k,1:N} = \sum_{n=1}^{N} s_{kn}$
• A straightforward application of Bayes' theorem yields
$p(s_{kn} | v_{k,1:N}, x_k) = \mathcal{N}(s_{kn}; \kappa_{kn} x_k, v_{kn}(1 - \kappa_{kn})), \qquad \kappa_{kn} = v_{kn} \Big/ \sum_{n'} v_{kn'}$ (responsibilities)
• Each source coefficient s_n gets a fraction κ_n of the observation x
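The posterior above is cheap to evaluate for all atoms at once. A minimal sketch (hypothetical function name, toy numbers): the posterior means are the responsibilities times the mixture coefficient, so they always add back up to x_k, and the variances shrink by the factor 1 − κ_kn.

```python
import numpy as np


def separate_gaussian(x, v):
    """Posterior of each source given the mixture, per atom k.

    x: (K,) observed coefficients x_k; v: (K, N) prior variances v_kn.
    Returns posterior means kappa_kn x_k and variances v_kn (1 - kappa_kn).
    """
    v = np.asarray(v, dtype=float)
    kappa = v / v.sum(axis=1, keepdims=True)           # responsibilities
    mean = kappa * np.asarray(x, dtype=float)[:, None]
    var = v * (1.0 - kappa)
    return mean, var


x = np.array([2.0, -1.0])
v = np.array([[1.0, 3.0],
              [4.0, 4.0]])
mean, var = separate_gaussian(x, v)
print(mean)   # rows [0.5, 1.5] and [-0.5, -0.5]
print(var)    # rows [0.75, 0.75] and [2.0, 2.0]
```

Given the variances, the posterior mean is the familiar Wiener-filter estimate of each source; the hierarchical priors on v discussed next are what make the variances themselves estimable.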
One Channel Source Separation: Poisson Source Model
[Figure: graphical model for k = 1, …, K with intensities v_{k1}, …, v_{kN}, sources s_{k1}, …, s_{kN} and observation x_k]
$s_{kn} | v_{kn} \sim \mathcal{PO}(s_{kn}; v_{kn}), \qquad x_k | s_{k,1:N} = \sum_{n=1}^{N} s_{kn}$
• This is the generative model for NMF when we write
$v_{k(\nu,\tau),n} = t_{\nu n} \times e_{\tau n}$ (template × excitation)
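Because independent Poisson variables conditioned on their sum are multinomial, E[s_kn | x_k] = κ_kn x_k with the same responsibilities as in the Gaussian case. Plugging this E-step into maximum-likelihood updates for t and e gives the multiplicative updates commonly used for NMF with the generalised Kullback-Leibler divergence. The sketch below is one such EM loop; function names, the random initialisation and the synthetic data are illustrative assumptions.

```python
import numpy as np


def kl_div(X, V):
    """Generalised KL divergence D(X || V), with 0 * log 0 taken as 0."""
    ratio = np.where(X > 0, X / V, 1.0)
    return float(np.sum(X * np.log(ratio) - X + V))


def poisson_nmf_em(X, n_comp, n_iter=100, seed=0):
    """EM for x_{nu tau} = sum_n s_{nu tau n}, s_{nu tau n} ~ PO(t_{nu n} e_{tau n})."""
    rng = np.random.default_rng(seed)
    F, K = X.shape
    T = rng.uniform(0.5, 1.5, (F, n_comp))   # spectral templates t_{nu n}
    E = rng.uniform(0.5, 1.5, (K, n_comp))   # excitations e_{tau n}
    history = []
    for _ in range(n_iter):
        # E-step: E[s_{nu tau n} | x] = x_{nu tau} t_{nu n} e_{tau n} / V_{nu tau};
        # the M-step for T sums these expectations over tau.
        V = T @ E.T
        T *= ((X / V) @ E) / E.sum(axis=0)
        # Same E-step with the updated templates, then the M-step for E.
        V = T @ E.T
        E *= ((X / V).T @ T) / T.sum(axis=0)
        history.append(kl_div(X, T @ E.T))
    return T, E, history


rng = np.random.default_rng(1)
X = rng.poisson(rng.gamma(2.0, 1.0, (20, 3)) @ rng.gamma(2.0, 1.0, (30, 3)).T)
T, E, history = poisson_nmf_em(X, n_comp=3)
print(history[0], history[-1])   # the divergence decreases over the iterations
```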
Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)
[Figure: example densities p(x) over x from 0 to 5, for (a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1) in one panel and (1, 1), (1, 0.5), (2, 1) in the other]
$\mathcal{G}(x; a, z) \equiv \exp((a - 1)\log x - z^{-1} x + a \log z^{-1} - \log\Gamma(a))$
$\mathcal{IG}(x; a, z) \equiv \exp((a + 1)\log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log\Gamma(a))$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity, inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
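In this parametrisation z acts as a scale: G(x; a, z) has mean a·z, and IG(x; a, z) is the distribution of 1/g for g ∼ G(a, z), with mean 1/(z(a − 1)) for a > 1. The sketch below (illustrative parameter values) checks normalisation and means numerically and draws inverse Gamma samples through NumPy's (shape, scale) Gamma sampler.

```python
import numpy as np
from math import lgamma


def log_gamma_pdf(x, a, z):
    # log G(x; a, z) = (a-1) log x - x/z + a log(1/z) - log Gamma(a); z acts as a scale
    return (a - 1) * np.log(x) - x / z - a * np.log(z) - lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    # log IG(x; a, z) = (a+1) log(1/x) - 1/(z x) + a log(1/z) - log Gamma(a)
    return -(a + 1) * np.log(x) - 1.0 / (z * x) - a * np.log(z) - lgamma(a)


def trapezoid(f, x):
    # simple trapezoidal rule, to avoid depending on a particular NumPy version
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))


x = np.linspace(1e-6, 200.0, 400001)
g = np.exp(log_gamma_pdf(x, 2.0, 1.5))
ig = np.exp(log_inv_gamma_pdf(x, 4.0, 0.5))
print(trapezoid(g, x), trapezoid(x * g, x))    # about 1 and a z = 3
print(trapezoid(ig, x), trapezoid(x * ig, x))  # about 1 and 1 / (z (a - 1)) = 2/3

# If g ~ G(a, z) then 1/g ~ IG(a, z); NumPy's gamma sampler takes (shape, scale)
rng = np.random.default_rng(0)
samples = 1.0 / rng.gamma(shape=4.0, scale=0.5, size=200_000)
print(samples.mean())                          # close to 2/3
```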
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1, …, K as follows:
$v_k | z_k \sim \mathcal{IG}(v_k; a, z_k/a)$
$z_{k+1} | v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$
[Figure: chain z_1, …, v_{k−1}, z_k, v_k, z_{k+1}, … with coupling parameters a_z, a, a_z]
• Variance variables v are priors for sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• Shape parameters a and a_z describe the coupling strength and drift of the chain
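Drawing from the chain only needs Gamma samplers. This sketch reads the chain as v_k | z_k ∼ IG(v_k; a, z_k/a) and z_{k+1} | v_k ∼ IG(z_{k+1}; a_z, v_k/a_z) in the IG(x; a, z) parametrisation of the previous slide (the reading consistent with the pairwise potential exp(−a z_k^{-1} v_k^{-1})). Under it an IG(a, z) draw is the reciprocal of a Gamma(shape a, scale z) draw. The starting value z_1 = 1 and the function name are hypothetical.

```python
import numpy as np


def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Draw v_1..v_K from the inverse Gamma-Markov chain, starting from z_1 = z1."""
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = z1
    for k in range(K):
        # v_k | z_k ~ IG(v_k; a, z_k / a): the reciprocal of a Gamma(shape a, scale z_k / a) draw
        v[k] = 1.0 / rng.gamma(a, z / a)
        # z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z)
        z = 1.0 / rng.gamma(a_z, v[k] / a_z)
    return v


# Larger a_z couples neighbouring variances more tightly (smaller log-increments)
spread = {a_z: np.diff(np.log(sample_igmc(1000, a=10.0, a_z=a_z))).std()
          for a_z in (4.0, 10.0, 40.0)}
print(spread)
```

The increments log v_{k+1} − log v_k are sums of two independent log-Gamma terms with shapes a and a_z, so their spread shrinks as either shape parameter grows, matching the smoother draws for larger a_z.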
Gamma Chains: Typical Draws
[Figure: sample paths of log v_k against k (1 to 1000) for a = 10 and a_z = 10, 4 and 40]
Gamma Chains with Changepoints: Typical Draws
[Figure: sample paths of log v_k against k (1 to 1000) for a = 10 and a_z = 10, 4 and 40]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (pairwise)
$\phi_{z_k} = \exp((a_z + a + 1) \log z_k^{-1})$ (singletons)
[Figure: chain z_1, …, v_{k−1}, z_k, v_k, z_{k+1}, … with couplings a_z, a, a_z]
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$ (pairwise)
$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right) \log \xi_i^{-1}\right)$ (singletons)
Possible Model Topologies
Approximate Inference
• Stochastic
  – Markov chain Monte Carlo: Gibbs sampler
  – Sequential Monte Carlo: particle filtering
• Deterministic
  – Variational Bayes
In all of these, conjugacy helps.
VB or Gibbs
[Figure: factor graph ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, … with likelihood factors p(y_1 | v_1), p(y_2 | v_2)]
• VB
$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$
• Gibbs
$v^{(\tau)}_k \sim p(v_k | z_k, z_{k+1}, y_k) \propto p(y_k | v_k)\, \psi_{k,k}(z^{(\tau)}_k)\, \psi_{k,k+1}(z^{(\tau)}_{k+1})$
VB or Gibbs
[Figure: factor graph ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, … with likelihood factors p(y_1 | v_1), p(y_2 | v_2)]
• VB
$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$
• Gibbs
$z^{(\tau)}_k \sim p(z_k | v_{k-1}, v_k) \propto \psi_{k-1,k}(v^{(\tau)}_{k-1})\, \psi_{k,k}(v^{(\tau)}_k)$
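With conjugacy, both Gibbs conditionals are inverse Gamma. Under the reading v_k | z_k ∼ IG(v_k; a, z_k/a), z_{k+1} | v_k ∼ IG(z_{k+1}; a_z, v_k/a_z), and assuming Gaussian observations y_k ∼ N(0, v_k) as in the one-channel Gaussian source model, z_k | v_{k−1}, v_k has density ∝ z^{−(a+a_z+1)} exp(−(a_z/v_{k−1} + a/v_k)/z), and v_k | z_k, z_{k+1}, y_k has density ∝ v^{−(a+a_z+3/2)} exp(−(a/z_k + a_z/z_{k+1} + y_k²/2)/v). The sketch below alternates these blocks; the boundary choice for z_1, the function names and the toy data are hypothetical.

```python
import numpy as np


def inv_gamma(rng, shape, scale):
    # inverse Gamma draw with density proportional to x^{-(shape+1)} exp(-scale / x)
    return 1.0 / rng.gamma(shape, 1.0 / scale)


def gibbs_igmc(y, a, a_z, n_sweeps=300, seed=0):
    """Blocked Gibbs sampler for y_k ~ N(0, v_k) under the inverse Gamma-Markov chain prior.

    Hypothetical boundary choice: z_1 is given the prior IG(z_1; a_z, 1/a_z), i.e. a dummy v_0 = 1.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    K = len(y)
    v = np.maximum(y ** 2, 1e-3)          # crude initialisation of the variances
    shape_v = np.full(K, a + a_z + 0.5)
    shape_v[-1] = a + 0.5                 # v_K has no child z_{K+1}
    samples = []
    for _ in range(n_sweeps):
        # z_k | v_{k-1}, v_k: all z are conditionally independent given v
        v_prev = np.concatenate(([1.0], v[:-1]))
        z = inv_gamma(rng, a + a_z, a_z / v_prev + a / v)
        # v_k | z_k, z_{k+1}, y_k: all v are conditionally independent given z
        child = np.concatenate((a_z / z[1:], [0.0]))
        v = inv_gamma(rng, shape_v, a / z + child + 0.5 * y ** 2)
        samples.append(v)
    return np.array(samples)


rng = np.random.default_rng(1)
v_true = np.concatenate((np.full(100, 1.0), np.full(100, 100.0)))  # quiet, then loud
y = np.sqrt(v_true) * rng.standard_normal(200)
samples = gibbs_igmc(y, a=10.0, a_z=10.0)
v_hat = samples[50:].mean(axis=0)                                   # discard burn-in
print(v_hat[:100].mean(), v_hat[100:].mean())
```

Because the z's are conditionally independent given all v's (and vice versa), each half-sweep can be drawn in one vectorised call; a VB version would replace the draws with the corresponding expectations.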
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: log-magnitude time-frequency plots (500 frequency bins × 120 frames) of the noisy (X) and original (X_org) speech, and of the estimates X_h (SNR 19.98), X_v (SNR 20.79), X_b (SNR 19.68) and X_g (SNR 19.97)]
Denoising - Music
[Figure: log-magnitude time-frequency plots (500 frequency bins × 250 frames) of the original and noisy music, and of the estimates obtained with particle filtering (PF), Gibbs sampling and VB, each labelled with its SNR]
"Tristram" (Matt Uelmen) + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal tie across time (harmonic continuity)
• Source 2: vertical tie across frequency (transients, percussive sounds)
[Figure: original time-frequency plot X_org and its horizontal (S_hor) and vertical (S_ver) components, frequency bin ν against time τ]
Single Channel Source Separation with IGMCs
E-guitar "Matte Kudasai" (King Crimson) + drums "Territory" (Sepultura) = mix

              s1                      s2
              SDR     SIR     SAR     SDR     SIR     SAR
VB           -4.74   -3.28    5.67   -1.58   15.46   -1.37
Gibbs        -4.50   -2.62    4.57    1.05   12.46    1.61
GibbsEM      -4.23   -2.42    4.82    1.34   13.13    1.85
Pre-trained  -4.04   -3.15    8.13    3.56   11.44    4.64
Oracle        6.14   17.16    6.58   12.66   19.95   13.60
(all values in dB)

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources
Single Channel Source Separation with IGMCs
"Vandringar I Vilsenhet" (Änglagård) + "Moby Dick" (Led Zeppelin) = mix

              s1                      s2
              SDR     SIR     SAR     SDR     SIR     SAR
VB           -7.80   -6.22    4.53   -2.35   18.40   -2.25
Gibbs        -8.46   -7.53    6.93   -4.04   14.59   -3.83
GibbsEM      -7.74   -6.19    4.62   -1.14   16.62   -0.97
Pre-trained  -6.40   -5.39    6.95    3.80   16.39    4.14
Oracle       12.10   32.90   12.14   21.13   33.89   21.37
(all values in dB)
Harmonic-Transient Decomposition
[Figure: original time-frequency plot X_org and its horizontal (S_hor) and vertical (S_ver) components, frequency bin ν against time τ]
Audio examples for two excerpts: (Original) (Hor) (Vert)
Chord Detection - Signal Model (with Paul Peeling)
[Figure: spectral template log σ_ν against frequency ν (0 to 4000 Hz)]
Chord Detection
[Figure: MDCT of a piano chord with MIDI notes 41, 48, 51, 56 (frequency 0 to 4000 Hz against time τ, 0 to 2.5 s), and log Σ_ν v_{νjτ} for MIDI notes j = 40 to 65 against time τ]
Multichannel Source Separation
• Hierarchical prior model (Févotte and Godsill 2005; Cemgil et al. 2006)
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$
$v_{kn} \sim \mathcal{IG}(v_{kn}; \nu/2, 2/(\nu\lambda_n))$
$s_{kn} \sim \mathcal{N}(s_{kn}; 0, v_{kn})$
$x_{km} \sim \mathcal{N}(x_{km}; a_m^\top s_{k,1:N}, r_m)$
$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$
[Figure: graphical model with λ_1, …, λ_N, variances v_{k1}, …, v_{kN}, sources s_{k1}, …, s_{kN} and observations x_{k1}, …, x_{kM} for k = 1, …, K; mixing vectors a_1, …, a_M and noise variances r_1, …, r_M]
Equivalent Gamma MRF
• A tree for each source
• λ_n can be interpreted as the overall "volume" of source n
Source Separation
[Figure: spectrograms of the speech, piano and guitar sources and of the mix, frequency (0 to 10000 Hz) against time]
Reconstructions
[Figure: spectrograms of the reconstructed speech, piano and guitar sources, frequency (0 to 10000 Hz) against time]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
[Figure: traces of the parameters a, λ and r against epoch (0 to 2000)]
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson with intensity $\mu_k$, which varies with bar position $m_k$ and rhythm
[Figure: intensity $\mu_k$ against bar position $m_k$ (0 to 1000) for a triplet rhythm and a duplet rhythm]
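This minimal sketch shows how a Poisson observation model enters one filtering step. It assumes a hypothetical intensity profile `mu` over 8 bar positions, with peaks on positions 0 and 4. `poisson_loglik` and `measurement_update` are illustrative names and do not come from the slides.

```python
import math
import numpy as np

def poisson_loglik(y, mu):
    """log PO(y; mu) = y log mu - mu - log y!  (vectorised over mu > 0)."""
    mu = np.asarray(mu, dtype=float)
    return y * np.log(mu) - mu - math.lgamma(y + 1)

def measurement_update(prior, y, mu_of_state):
    """p(x_k | y_{1:k}) proportional to p(y_k | x_k) p(x_k | y_{1:k-1})."""
    log_post = np.log(prior) + poisson_loglik(y, mu_of_state)
    log_post -= log_post.max()          # avoid underflow before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Hypothetical duplet-like intensity over 8 bar positions: peaks on positions 0 and 4
mu = np.where(np.arange(8) % 4 == 0, 3.0, 0.1)
post = measurement_update(np.full(8, 1.0 / 8), 2, mu)
```

Observing a count of 2 shifts the posterior mass toward the positions where the assumed intensity is high.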
Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Figure: graphical model with chains $n_{0:3}$, $\theta_{0:3}$, $m_{0:3}$, $r_{0:3}$, intensities $\lambda_{1:3}$ and observations $y_{1:3}$]
• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator
Filtering
[Figure: observed data $y_k$; filtering densities $\log p(m_k \mid y_{1:k})$ (bar position), $\log p(n_k \mid y_{1:k})$ (tempo, quarter notes per min) and $p(r_k \mid y_{1:k})$ (triplets vs duplets), against frame index $k$]
Smoothing
[Figure: observed data $y_k$; smoothing densities $\log p(m_k \mid y_{1:K})$ (bar position), $\log p(n_k \mid y_{1:K})$ (tempo, quarter notes per min) and $p(r_k \mid y_{1:K})$ (triplets vs duplets), against frame index $k$]
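For a discrete HMM such as the bar-pointer model, filtering and smoothing follow from the standard forward-backward recursions. This minimal sketch assumes a dense transition matrix `T` and a precomputed likelihood table with `lik[k, j]` $= p(y_k \mid x_k = j)$. The names are hypothetical.

```python
import numpy as np

def forward_backward(T, lik, p0):
    """Filtering and smoothing marginals for a discrete HMM.

    T[i, j] = p(x_k = j | x_{k-1} = i); lik[k, j] = p(y_k | x_k = j);
    p0 = p(x_1). Returns (filtered, smoothed), each of shape (K, S).
    """
    K, S = lik.shape
    alpha = np.zeros((K, S))
    pred = p0
    for k in range(K):
        a = pred * lik[k]
        alpha[k] = a / a.sum()           # p(x_k | y_{1:k})
        pred = alpha[k] @ T              # p(x_{k+1} | y_{1:k})
    beta = np.ones((K, S))
    for k in range(K - 2, -1, -1):
        b = T @ (lik[k + 1] * beta[k + 1])
        beta[k] = b / b.sum()            # rescaled p(y_{k+1:K} | x_k)
    gamma = alpha * beta                 # proportional to p(x_k | y_{1:K})
    return alpha, gamma / gamma.sum(axis=1, keepdims=True)

T2 = np.array([[0.9, 0.1], [0.1, 0.9]])
lik = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
filt, smooth = forward_backward(T2, lik, np.array([0.5, 0.5]))
```

Only the forward pass is needed online. The backward pass is what distinguishes the offline (smoothing) results.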
Time Signature
[Figure: observed audio (sample value against time in s); smoothing densities $\log p(m_k \mid z_{1:K})$, $\log p(n_k \mid z_{1:K})$ (tempo, quarter notes per min) and $p(\theta_k \mid z_{1:K})$ (4/4 vs 3/4), against frame index $k$]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt and audio signal $x_t$ against time $t$]
Score-Performance matching - Graphical Model
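[Figure: graphical model with chains $t_{1:K}$, $r_{1:K}$, $\lambda_{1:K}$ and, for each frequency bin $\nu = 1, \dots, W$, variances $v_{\nu,1:K}$ and coefficients $s_{\nu,1:K}$; inset shows score positions $r_k$ numbered 1 to 8]
$v_{\nu\tau} \sim \mathcal{IG}(v_{\nu\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau)))$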
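Sampling the variances makes the role of the template explicit. The sketch uses the inverse Gamma convention of these slides, where $\mathcal{IG}(x; a, z)$ has rate $1/z$. Under that convention $\mathbb{E}[v_{\nu\tau}] = a\lambda\sigma_\nu(r_\tau)/(a-1)$, which is close to $\lambda\sigma_\nu(r_\tau)$ for large $a$. The template `sigma` below is hypothetical and stands for one score position. `lam` is treated as a scalar, and `sample_variances` is an illustrative name.

```python
import numpy as np

def sample_variances(sigma, lam, a, rng):
    """Draw v_nu ~ IG(v; a, z) with z = 1 / (a * lam * sigma_nu).

    With IG(x; a, z) proportional to x^{-(a+1)} exp(-1/(z x)), the rate is
    1/z = a * lam * sigma_nu, and a draw is 1 / Gamma(shape=a, rate=1/z).
    """
    rate = a * lam * np.asarray(sigma, dtype=float)
    g = rng.gamma(shape=a, scale=1.0 / rate)   # g ~ Gamma(a, rate)
    return 1.0 / g

rng = np.random.default_rng(0)
sigma = np.array([1.0, 0.5, 0.25, 0.01])   # hypothetical template sigma_nu(r) for one score position
v = sample_variances(sigma, lam=2.0, a=10.0, rng=rng)
```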
Score-Performance matching - Signal model
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (0 to 4000 Hz)]
Score-Performance matching
[Figure: spectrogram data (frequency 0 to 4000 Hz against time 0 to 14 s) and MIDI data (MIDI note 55 to 85 against score position)]
Online (filtering) or offline (smoothing) processing is possible
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80) against time (s); $\log \sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ against time (s); MDCT of the audio (source: Daniel-Ben Pienaar), frequency 0 to 4000 Hz against time (s)]
Summary
• "Time Domain": Switching State Space Models
– State space modeling
– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• "Transform Domain": Gamma Fields
– Models on (orthogonal) transform coefficients, energy compaction
– Practical: can make use of fast transforms (FFT, MDCT, ...)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical score guided source separation
– Prior structures for other observation models, NMF
Modelling and Computational issues
• Hierarchical
– Signal level: pitch, onsets, timbre
– Symbolic level: melody, motives, harmony, chords, tonality, rhythm, beat, tempo, articulation, instrumentation, voice
– Cognitive level: expression, genre, form, style, mood, emotion
• Uncertainty
– Parameter learning: which pitch, rhythm, tempo, meter, time signature?
– Model selection: how many notes, harmonics, onsets, sections?
Generative Models for Music
Generative Models for Music
[Figure: levels of the generative model: Score, Expression, Piano-Roll, Signal]
Hierarchical Modeling of Music
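[Figure: hierarchical graphical model with chains $v_{1:t}$, $k_{1:t}$, $h_{1:t}$, $m_{1:t}$ and, for each $j$, chains $g_{j,1:t}$, $r_{j,1:t}$, $n_{j,1:t}$, $x_{j,1:t}$, $y_{j,1:t}$, together with observations $y_{1:t}$]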
Modelling levels
• Physical - acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, Generalised Linear model
• Feature based
Research Questions
What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?
Signal Models for Audio
• Time domain: state space dynamical models
– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA), switching state space models
– Flexible, physically realistic
– Analysis down to sample precision; computationally quite heavy
• Transform domain: Fourier representations, Generalised Linear model
– Models on (orthogonal) transform coefficients, energy compaction
– Practical: can make use of fast transforms (FFT, MDCT, ...)
– Inherent limitations (analysis windows, frequency resolution)
Sinusoidal Modeling
• Sound is primarily about oscillations and resonance
• Cascade of second order systems
• Audio signals can often be compactly represented by sinusoids
(real) $y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$
(complex) $y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n$
$y = F(\gamma_{1:p}, \omega_{1:p})\, c$
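This sketch shows the linear-in-$c$ structure $y = F(\gamma_{1:p}, \omega_{1:p})c$. For known damping and frequencies, the amplitudes follow from least squares. It also checks that one real damped cosine equals a conjugate pair of complex terms with $c = (\alpha/2)e^{\pm j\phi}$. `damped_basis` is a hypothetical helper, and the time index is assumed to start at $n = 0$.

```python
import numpy as np

def damped_basis(gamma, omega, n_samples):
    """F[n, k] = (exp(-gamma_k + 1j * omega_k)) ** n for n = 0, ..., n_samples - 1."""
    n = np.arange(n_samples)[:, None]
    poles = -np.asarray(gamma, dtype=float) + 1j * np.asarray(omega, dtype=float)
    return np.exp(n * poles[None, :])

# One real damped sinusoid = a conjugate pair of complex terms
alpha, g, w, phi = 1.5, 0.01, 0.3, 0.7
n = np.arange(64)
y_real = alpha * np.exp(-g * n) * np.cos(w * n + phi)
F = damped_basis([g, g], [w, -w], 64)
c = np.array([alpha / 2 * np.exp(1j * phi), alpha / 2 * np.exp(-1j * phi)])
c_hat = np.linalg.lstsq(F, y_real.astype(complex), rcond=None)[0]   # amplitudes given (gamma, omega)
```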
State space Parametrisation
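$x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix}$
$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}}_{C} x_n$
[Figure: state chain $x_0 \to x_1 \to \cdots \to x_{k-1} \to x_k \to \cdots \to x_K$ with observations $y_1, \dots, y_{k-1}, y_k, \dots, y_K$]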
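This short numerical check confirms that the diagonal state space recursion reproduces the complex sinusoidal model $y_n = \sum_k c_k (e^{-\gamma_k + j\omega_k})^n$. It assumes a time index starting at $n = 0$, and `simulate_state_space` is a hypothetical name.

```python
import numpy as np

def simulate_state_space(gamma, omega, c, n_samples):
    """Run x_{n+1} = A x_n, y_n = C x_n with A = diag(exp(-gamma + j omega)), C = (1 ... 1)."""
    A = np.diag(np.exp(-np.asarray(gamma, dtype=float) + 1j * np.asarray(omega, dtype=float)))
    C = np.ones(len(c))
    x = np.asarray(c, dtype=complex)   # x_0 = (c_1, ..., c_p)
    y = np.empty(n_samples, dtype=complex)
    for n in range(n_samples):
        y[n] = C @ x                   # y_n = C x_n
        x = A @ x                      # x_{n+1} = A x_n
    return y

gam = np.array([0.01, 0.05])
om = np.array([0.3, 1.1])
cc = np.array([1.0 + 0.5j, -0.3j])
y = simulate_state_space(gam, om, cc, 50)
```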
State Space Parametrisation
Audio Restoration/Interpolation
• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis
[Figure: audio signal with missing samples, sample index 0 to 500]
Audio Interpolation
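$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\, p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$
$H \equiv$ (parameters, hidden states)
[Figure: graphical model with $H$ as parent of the missing samples $x_{\neg\kappa}$ and the observed samples $x_\kappa$; signal with a missing segment, sample index 0 to 500]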
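When the parameters $H$ are held fixed and the signal prior is Gaussian, the integrand reduces to Gaussian conditioning. This minimal sketch assumes a zero-mean Gaussian prior with known covariance. It uses an AR(1) covariance, chosen for illustration rather than taken from the slide's model. For an AR(1) process with one missing sample, the posterior mean is $\rho(x_{t-1} + x_{t+1})/(1 + \rho^2)$, and the example reproduces this. `interpolate_gaussian` is a hypothetical name.

```python
import numpy as np

def interpolate_gaussian(x, missing, cov):
    """Posterior mean and covariance of the missing samples given the observed
    ones, for a zero-mean Gaussian prior x ~ N(0, cov) with H held fixed."""
    obs = ~missing
    S_mo = cov[np.ix_(missing, obs)]
    S_oo = cov[np.ix_(obs, obs)]
    S_mm = cov[np.ix_(missing, missing)]
    mean = S_mo @ np.linalg.solve(S_oo, x[obs])
    covm = S_mm - S_mo @ np.linalg.solve(S_oo, S_mo.T)
    return mean, covm

# Stationary AR(1) prior with unit innovation variance (illustrative choice)
rho = 0.9
idx = np.arange(5)
cov = rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho ** 2)
x = np.array([1.0, 2.0, 0.0, 4.0, 5.0])          # x[2] is a placeholder for the missing sample
missing = np.array([False, False, True, False, False])
mean, covm = interpolate_gaussian(x, missing, cov)
```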
Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
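[Figure: for each frequency band $\nu = 0, \dots, W-1$, a chain $s^\nu_0 \to \cdots \to s^\nu_k \to \cdots \to s^\nu_{K-1}$ with parameters $A_\nu$, $Q_\nu$; observations $x_0, \dots, x_k, \dots, x_{K-1}$]
$s^\nu_k \sim \mathcal{N}(s^\nu_k; A_\nu s^\nu_{k-1}, Q_\nu), \qquad A_\nu \sim \mathcal{N}\left(A_\nu; \begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix}, \Psi\right)$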
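This sketch draws from the generative model for a single band. It assumes isotropic covariances $\Psi = \psi I$ and $Q_\nu = qI$, where the slide leaves both general. It omits the observation model, which the slide does not spell out. `simulate_band` is a hypothetical name. With $\psi = q = 0$ the state simply rotates by $\omega_\nu$ per frame.

```python
import numpy as np

def rotation(omega):
    return np.array([[np.cos(omega), -np.sin(omega)],
                     [np.sin(omega),  np.cos(omega)]])

def simulate_band(omega, K, q, psi, rng, s0=(1.0, 0.0)):
    """Sample A ~ N(rotation(omega), psi * I) elementwise (assumed isotropic Psi),
    then run s_k = A s_{k-1} + e_k with e_k ~ N(0, q * I) (assumed isotropic Q).
    Returns s_1, ..., s_K as rows."""
    A = rotation(omega) + np.sqrt(psi) * rng.standard_normal((2, 2))
    s = np.empty((K, 2))
    s_prev = np.asarray(s0, dtype=float)
    for k in range(K):
        s_prev = A @ s_prev + np.sqrt(q) * rng.standard_normal(2)
        s[k] = s_prev
    return s

rng = np.random.default_rng(0)
s = simulate_band(0.2, 10, q=0.0, psi=0.0, rng=rng)   # noiseless: pure rotation
```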
Inference: Structured Variational Bayes
[Figure: factorised approximation with $q(A_\alpha)$, $q(Q_\alpha)$ and $\prod_k q(s^\alpha_k \mid s^\alpha_{k-1})$ for each band $\alpha \in C$, and $q(x_k)$]
• Intuitive algorithm
– Subtract from the observed signal $x$ the prediction of the frequency bands in $\neg\alpha$
– Compute a fit for $\alpha$ to this residual and iterate
• For fixed $A$, $Q$, this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
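For reference, here is plain Gauss-Seidel. Each unknown is solved in turn using the latest values of all the others, which mirrors fitting one band to the residual left by the other bands. This is a generic sketch, not the variational algorithm itself. It converges for symmetric positive definite systems, among others.

```python
import numpy as np

def gauss_seidel(A, b, n_iter=100, x0=None):
    """Solve A x = b by Gauss-Seidel sweeps: unknown i is updated from
    b_i - sum_{j != i} A_ij x_j using the most recent values of x_j."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    for _ in range(n_iter):
        for i in range(n):
            residual = b[i] - A[i] @ x + A[i, i] * x[i]   # exclude the diagonal term
            x[i] = residual / A[i, i]
    return x

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 2.0, 3.0])
x_gs = gauss_seidel(A, b, n_iter=200)
```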
Restoration
• Piano
– Signal with missing samples (37)
– Reconstruction: 768 dB improvement
– Original
• Trumpet
– Signal with missing samples (37)
– Reconstruction: 710 dB improvement
– Original
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
[Figure: for each $\nu = 1, \dots, W$, indicator chain $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and state chain $\theta^\nu_0, \dots, \theta^\nu_k, \dots, \theta^\nu_K$; observations $y_k, \dots, y_K$]
• Generalises source-filter models
Harmonic model with changepoints
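$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$
$\theta_k \mid \theta_{k-1}, r_k \sim [r_k = 0]\,\underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\,\underbrace{\mathcal{N}(0, S)}_{\text{new}}$
$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$
$A = \begin{pmatrix} G_\omega & & & \\ & G^2_\omega & & \\ & & \ddots & \\ & & & G^H_\omega \end{pmatrix}^N, \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}$
with damping factor $0 < \rho_k < 1$, frame length $N$ and a damped sinusoidal basis matrix $C$ of size $N \times 2H$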
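This sketch constructs $A$ for given $\omega$, $\rho$, $H$ and $N$. The slide defines $C$ only by its size, so the choice here is an assumption: row $n$ reads the first component of each harmonic block after $n$ within-frame steps, $(G^h_\omega)^n$. With that choice, $CA\theta$ continues $C\theta$ seamlessly into the next frame, which the test checks. All names are hypothetical.

```python
import numpy as np

def G(omega, rho):
    """Damped rotation G_omega = rho * R(omega)."""
    return rho * np.array([[np.cos(omega), -np.sin(omega)],
                           [np.sin(omega),  np.cos(omega)]])

def harmonic_model(omega, rho, H, N):
    """Return (A, C) for H harmonics of fundamental omega and frame length N.

    A = blockdiag(G, G^2, ..., G^H)^N advances the state by one frame.
    C (N x 2H) is an assumed damped sinusoidal basis: row n holds, for each
    harmonic block h, the first row of (G^h)^n.
    """
    B = np.zeros((2 * H, 2 * H))
    for h in range(1, H + 1):
        B[2 * h - 2:2 * h, 2 * h - 2:2 * h] = np.linalg.matrix_power(G(omega, rho), h)
    A = np.linalg.matrix_power(B, N)
    C = np.zeros((N, 2 * H))
    for n in range(N):
        C[n] = np.linalg.matrix_power(B, n)[0::2].sum(axis=0)  # first row of each block
    return A, C

omega, rho, H, N = 0.3, 0.99, 3, 16
A, C = harmonic_model(omega, rho, H, N)
```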
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
– Conditional Gaussians are not closed under marginalization
⇒ Unlike in HMMs or KFMs, summing over $r_k$ does not simplify the filtering density
⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially
[Figure: growing tree of Gaussian kernels]
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time/space
– Intuition: when a changepoint occurs, the state vector $\theta$ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; Ó Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)
[Figure: example with $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$; states $\theta_0, \dots, \theta_5$ and observations $y_1, \dots, y_5$]
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
Monophonic model (Cemgil et al. 2006)
• We introduce a pitch label indicator $m$
• At each time $k$ the process can be in one of the {"mute", "sound"} × M states
[Figure: graphical model with chains $r_{0:T}$, $m_{0:T}$, $s_{0:T}$ and observations $y_{1:T}$]
Monophonic Pitch Tracking
Monophonic Pitch Tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$
[Figure: audio signal (top) and filtered pitch posterior (bottom), sample index 0 to 1000]
• If pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
[Figure: transcription of an audio example by exact inference, sample index 0 to 3500]
Tracking Pitch Variations
• Allow $m$ to change with $k$
[Figure: pitch trajectory, sample index 0 to 500]
• Intractable: need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: latent indicators per key $\nu$ and frequency against time $k$, and the audio signal $x_k$]
• Each latent changepoint process $\nu = 1, \dots, W$ corresponds to a "piano key". The indicators $r_{1:W, 1:K}$ encode a latent "piano roll"
Single time slice - Bayesian Variable Selection
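$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$
$s_i \mid r_i \sim [r_i = \text{on}]\,\mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\,\delta(s_i)$
$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R), \qquad C \equiv [\, C_1 \ \cdots \ C_i \ \cdots \ C_W \,]$
[Figure: indicators $r_1, \dots, r_W$, coefficients $s_1, \dots, s_W$ and observation $x$]
• Generalized Linear Model: the columns of $C$ are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When $W$ is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman; Attias)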
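For small $W$, the $2^W$-component posterior can be computed exactly by enumeration. This minimal sketch assumes scalar coefficients $s_i$ with one basis column $C_i$ each, $\Sigma = \sigma^2$, a shared prior $\pi_{\text{on}}$, and the off state encoded as 0. Integrating out $s$ gives $x \mid r \sim \mathcal{N}(0, \sigma^2 C_r C_r^\top + R)$, where $C_r$ keeps the active columns. The function names are hypothetical.

```python
import itertools
import numpy as np

def log_gauss(x, cov):
    """log N(x; 0, cov)."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(cov, x))

def indicator_posterior(x, C, pi_on, sigma2, R):
    """Exact p(r_{1:W} | x) by enumerating all 2^W on/off configurations.

    Assumes scalar s_i (basis column C[:, i]) with s_i ~ N(0, sigma2) when on
    and s_i = 0 when off, so x | r ~ N(0, sigma2 * C_r C_r^T + R).
    """
    W = C.shape[1]
    configs = list(itertools.product([0, 1], repeat=W))
    logp = []
    for r in configs:
        on = np.array(r, dtype=bool)
        cov = R + sigma2 * C[:, on] @ C[:, on].T
        log_prior = np.sum(np.where(on, np.log(pi_on), np.log(1 - pi_on)))
        logp.append(log_prior + log_gauss(x, cov))
    logp = np.array(logp)
    p = np.exp(logp - logp.max())
    return configs, p / p.sum()

x = np.array([5.0, 0.0, 0.0])
configs, p = indicator_posterior(x, np.eye(3), pi_on=0.5, sigma2=10.0, R=0.1 * np.eye(3))
```

The loop visits all $2^W$ configurations, which is exactly why this route becomes intractable for large $W$.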
Factorial Switching State space model
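$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$
$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$
$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$ (changepoint indicator)
$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_{k,\nu})\theta_{k-1,\nu}, Q_\nu(r_{k,\nu}))$ (latent state)
$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R)$ (observation)
[Figure: for each $\nu = 1, \dots, W$, indicator chain $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and state chain $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_K$; observations $y_k, \dots, y_K$]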
Synthetic Data
[Figure: synthetic data: per-key indicators $\nu$ against time $k$, frequency content, and signal $x$]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable: computations with large matrices
– Need advanced techniques from linear algebra
– Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical - acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, Generalised Linear model
• Feature based
Spectrogram
• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau)$ = (frequency, time), such as STFT or MDCT
$x(t) = \sum_k s_k \phi_k(t)$
[Figure: spectrograms of speech and piano, frequency $f$ (0 to 10000 Hz) against time $t$]
• The spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of the STFT)
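This minimal STFT sketch produces coefficients $s_k$ on a grid of atoms $k = (\nu, \tau)$, together with the log power spectrogram. The Hann window, frame length 512 and hop 256 are illustrative choices. `stft` and `log_spectrogram` are hypothetical helpers built on `numpy.fft.rfft`.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Coefficients s_k for atoms k = (nu, tau): Hann-windowed frames, then rfft.
    Returns an array of shape (frequency bins nu, frames tau)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[t * hop:t * hop + frame_len] * window for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def log_spectrogram(x, **kwargs):
    """log |s_k|^2, with a small floor to avoid log(0)."""
    return np.log(np.abs(stft(x, **kwargs)) ** 2 + 1e-12)

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)   # 1 kHz tone, 1 s
S = stft(x)
L = log_spectrogram(x)
```

With these settings, the 1 kHz tone falls exactly on bin $1000 \cdot 512 / 8000 = 64$.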
Models for time-frequency Energy distributions
• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004)
$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$
Spectrogram = Spectral Templates × Excitations
[Figure: spectrogram factorised as templates × excitations]
– however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
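This sketch fits the factorisation $X \approx WS$ with the Lee-Seung multiplicative updates for the generalised KL divergence, one common way to apply NMF to spectrograms. The slide does not name a fitting method, and the iteration count and random initialisation are arbitrary choices. `nmf_kl` is a hypothetical name.

```python
import numpy as np

def nmf_kl(X, J, n_iter=200, rng=None, eps=1e-12):
    """X ~= W @ S with Lee-Seung multiplicative updates for the KL divergence.
    W: (F, J) spectral templates, S: (J, T) excitations, both non-negative."""
    rng = np.random.default_rng(0) if rng is None else rng
    F, T = X.shape
    W = rng.random((F, J)) + eps
    S = rng.random((J, T)) + eps
    for _ in range(n_iter):
        S *= (W.T @ (X / (W @ S + eps))) / (W.sum(axis=0)[:, None] + eps)
        W *= ((X / (W @ S + eps)) @ S.T) / (S.sum(axis=1)[None, :] + eps)
    return W, S

rng0 = np.random.default_rng(1)
X = rng0.random((6, 2)) @ rng0.random((2, 8))   # synthetic rank-2 "spectrogram"
W1, S1 = nmf_kl(X, 2, n_iter=1)
W2, S2 = nmf_kl(X, 2, n_iter=300)
```

The updates keep both factors non-negative and do not increase the KL divergence from one iteration to the next.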
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005)
$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$
Spectrogram = Mask × Source0 + (1 − Mask) × Source1
[Figure: spectrogram as a mask-weighted sum of two sources]
– however, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms
$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$
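One channel source separation: Gaussian source model
[Figure: for each $k = 1, \dots, K$, variances $v_{k,1}, \dots, v_{k,N}$, sources $s_{k,1}, \dots, s_{k,N}$ and observation $x_k$]
$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• Straightforward application of Bayes' theorem yields
$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})), \qquad \kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$ (responsibilities)
• Each source coefficient $s_{k,n}$ gets a fraction $\kappa_{k,n}$ of the observation $x_k$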
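The posterior transcribes directly into code for a single atom $k$: the responsibilities, then the Gaussian posterior means and variances. `gaussian_split` is a hypothetical name.

```python
import numpy as np

def gaussian_split(x, v):
    """Posterior of each s_n given x = sum_n s_n with independent s_n ~ N(0, v_n).
    Returns (posterior means, posterior variances, responsibilities)."""
    v = np.asarray(v, dtype=float)
    kappa = v / v.sum()                  # responsibilities
    return kappa * x, v * (1 - kappa), kappa

mean, var, kappa = gaussian_split(3.0, [1.0, 2.0])
```

The posterior means always add up to the observation, so the split is conservative.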
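One channel source separation: Poisson source model
[Figure: for each $k = 1, \dots, K$, variances $v_{k,1}, \dots, v_{k,N}$, sources $s_{k,1}, \dots, s_{k,N}$ and observation $x_k$]
$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• This is the generative model for NMF when we write $v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (Template × Excitation)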
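Under the Poisson model, conditioning on $x_k$ makes $s_{k,1:N}$ multinomial with probabilities $\kappa_{k,n} = v_{k,n}/\sum_{n'} v_{k,n'}$. The posterior mean is therefore again $\kappa_{k,n}x_k$. This sketch covers the conditional sampler and the expected sources under the NMF parameterisation, with arbitrary numbers. All names are hypothetical.

```python
import numpy as np

def poisson_split_sample(x, v, rng):
    """Draw s_{1:N} | x ~ Multinomial(x; kappa), kappa_n = v_n / sum_n' v_n'."""
    v = np.asarray(v, dtype=float)
    return rng.multinomial(x, v / v.sum())

def expected_sources(X, Tmpl, Exc):
    """E[s_{nu,tau,n} | X] = X_{nu,tau} * v_{nu,tau,n} / sum_n' v_{nu,tau,n'},
    with the NMF parameterisation v_{nu,tau,n} = Tmpl[nu, n] * Exc[tau, n]."""
    V = Tmpl[:, None, :] * Exc[None, :, :]        # shape (F, T, N)
    return X[:, :, None] * V / V.sum(axis=2, keepdims=True)

rng = np.random.default_rng(0)
draws = np.array([poisson_split_sample(10, [1.0, 3.0], rng) for _ in range(5000)])
Tm = np.array([[1.0, 2.0], [0.5, 0.5], [3.0, 1.0]])   # templates t_{nu,n}
Ex = np.array([[1.0, 1.0], [2.0, 0.5]])               # excitations e_{tau,n}
X = np.array([[4.0, 1.0], [2.0, 3.0], [0.0, 6.0]])    # observed counts x_{nu,tau}
E = expected_sources(X, Tm, Ex)
```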
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: example Gamma and inverse Gamma densities $p(x)$ on $x \in [0, 5]$ for several shape and scale values]
$\mathcal{G}(x; a, z) \equiv \exp((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a))$
$\mathcal{IG}(x; a, z) \equiv \exp((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a))$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity, inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
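Both densities are written below in the $(a, z)$ convention above, with $z$ as a scale. One consequence worth checking: if $x \sim \mathcal{G}(x; a, z)$, then $1/x \sim \mathcal{IG}(1/x; a, z)$, so $\mathcal{IG}(y; a, z) = \mathcal{G}(1/y; a, z)/y^2$. The function names are hypothetical.

```python
import math
import numpy as np

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a-1) log x - x/z + a log(1/z) - log Gamma(a)   (z = scale)."""
    x = np.asarray(x, dtype=float)
    return (a - 1) * np.log(x) - x / z - a * np.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = (a+1) log(1/x) - 1/(z x) + a log(1/z) - log Gamma(a)."""
    x = np.asarray(x, dtype=float)
    return -(a + 1) * np.log(x) - 1.0 / (z * x) - a * np.log(z) - math.lgamma(a)

xs = np.linspace(0.01, 5.0, 7)
lhs = log_inv_gamma_pdf(xs, 2.0, 0.5)
rhs = log_gamma_pdf(1.0 / xs, 2.0, 0.5) - 2 * np.log(xs)   # change of variables y = 1/x
```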
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:
$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$
$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$
[Figure: chain $z_1 \cdots v_{k-1},\, z_k,\, v_k,\, z_{k+1} \cdots$ with coupling parameters $a_z$, $a$, $a_z$]
• Variance variables $v$ are priors for sources
• Auxiliary variables $z$ are needed for conjugacy and positive correlation
• Shape parameters $a$ and $a_z$ describe coupling strength and drift of the chain
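This sketch draws from the chain using the $\mathcal{IG}$ convention above, where the rate is $1/z$ and an IG draw is the reciprocal of a Gamma draw with scale $z$. The slide gives no prior for $z_1$, so the fixed starting value `z1` is an assumption. With this parameterisation $\mathbb{E}[v_k^{-1} \mid z_k] = z_k$ and $\mathbb{E}[z_{k+1}^{-1} \mid v_k] = v_k$, and that is where the positive correlation between $v_k$ and $v_{k+1}$ comes from. `sample_igmc` is a hypothetical name.

```python
import numpy as np

def sample_igmc(K, a, a_z, rng, z1=1.0):
    """Draw v_{1:K} from the inverse Gamma-Markov chain
        v_k | z_k ~ IG(a, z_k / a),   z_{k+1} | v_k ~ IG(a_z, v_k / a_z),
    where IG(x; a, z) has rate 1/z, i.e. x = 1 / Gamma(shape=a, scale=z)."""
    v = np.empty(K)
    z = z1
    for k in range(K):
        v[k] = 1.0 / rng.gamma(a, z / a)        # E[1 / v_k] = z_k
        z = 1.0 / rng.gamma(a_z, v[k] / a_z)    # E[1 / z_{k+1}] = v_k
    return v

rng = np.random.default_rng(0)
v = sample_igmc(200, a=1e4, a_z=1e4, rng=rng)   # large shapes: slowly drifting chain
```

Smaller shape parameters give rougher, more volatile draws of $\log v_k$.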
Gamma Chains: typical draws
[Figure: three typical draws of $\log v_k$ against $k = 1, \dots, 1000$, with $a$ fixed and three different values of $a_z$]
Gamma Chains with changepoints: typical draws
[Figure: three typical draws of $\log v_k$ against $k = 1, \dots, 1000$, with $a$ fixed and three different values of $a_z$]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (pairwise)
[Figure: chain $z_1 \cdots v_{k-1},\, z_k,\, v_k,\, z_{k+1} \cdots$ with coupling parameters $a_z$, $a$, $a_z$]
$\phi_{z_k} = \exp((a_z + a + 1)\log z_k^{-1})$ (singletons)
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$\psi_{i,j} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$ (pairwise)
$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right)\log \xi_i^{-1}\right)$ (singletons)
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov Chain Monte Carlo: Gibbs sampler
– Sequential Monte Carlo: Particle Filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps
VB or Gibbs
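[Figure: factor graph $\psi_{0,1} - z_1 - \psi_{1,1} - v_1 - \psi_{1,2} - z_2 - \psi_{2,2} - v_2 \cdots$ with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$q^{(\tau)}(v_k) \leftarrow \exp\left(\log\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$
• Gibbs
$v^{(\tau)}_k \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z^{(\tau)}_k)\, \psi_{k,k+1}(z^{(\tau)}_{k+1})$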
VB or Gibbs
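[Figure: factor graph $\psi_{0,1} - z_1 - \psi_{1,1} - v_1 - \psi_{1,2} - z_2 - \psi_{2,2} - v_2 \cdots$ with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$q^{(\tau)}(z_k) \leftarrow \exp\left(\log\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$
• Gibbs
$z^{(\tau)}_k \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v^{(\tau)}_{k-1})\, \psi_{k,k}(v^{(\tau)}_k)$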
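This is a minimal Gibbs sampler for the chain with Gaussian observations $y_k \sim \mathcal{N}(0, v_k)$. The observation model and the fixed $z_1$ are assumptions for illustration. Collecting powers of $v_k^{-1}$ and $z_k^{-1}$ from the potentials gives inverse Gamma full conditionals, written here in shape/rate form. For $v_k$ the shape is $a + a_z + 1/2$ and the rate is $a/z_k + a_z/z_{k+1} + y_k^2/2$. For $z_k$ the shape is $a_z + a$, matching the singleton exponent $a_z + a + 1$, and the rate is $a_z/v_{k-1} + a/v_k$. `gibbs_igmc` is a hypothetical name.

```python
import numpy as np

def gibbs_igmc(y, a, a_z, n_sweeps, rng, z1=1.0):
    """Gibbs sampler for y_k ~ N(0, v_k) under the inverse Gamma-Markov chain prior.

    IG(shape, rate) draws are taken as 1 / Gamma(shape, scale=1/rate).
    z_1 is held fixed (an assumption); v_K has no child z_{K+1}.
    Returns the v-samples of every sweep, shape (n_sweeps, K).
    """
    y = np.asarray(y, dtype=float)
    K = len(y)
    v = np.maximum(y ** 2, 1e-6)           # crude initialisation
    z = np.full(K, z1)
    samples = np.empty((n_sweeps, K))
    for t in range(n_sweeps):
        for k in range(K):                 # v_k | z_k, z_{k+1}, y_k
            shape = a + 0.5
            rate = a / z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:                  # contribution of the child z_{k+1}
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = 1.0 / rng.gamma(shape, 1.0 / rate)
        for k in range(1, K):              # z_k | v_{k-1}, v_k
            z[k] = 1.0 / rng.gamma(a_z + a, 1.0 / (a_z / v[k - 1] + a / v[k]))
        samples[t] = v
    return samples

rng = np.random.default_rng(0)
y = np.concatenate([0.1 * rng.standard_normal(25), 10.0 * rng.standard_normal(25)])
samples = gibbs_igmc(y, a=10.0, a_z=10.0, n_sweeps=200, rng=rng)
```

A VB variant would replace each draw with an update of the corresponding IG factor, using expected reciprocals of the neighbours.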
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: log spectrograms of the noisy signal $X$ and the original $X_{\text{org}}$, and of four reconstructions $X_h$, $X_v$, $X_b$, $X_g$, each labelled with its SNR; frequency bin (0 to 500) against frame (0 to 120)]
Denoising - Music
[Figure: log spectrograms of the original, the noisy signal, and reconstructions by particle filter (PF), Gibbs sampler and VB, each labelled with its SNR]
"Tristram" (Matt Uelmen) + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
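• Source 1: horizontal tie across time (harmonic continuity)
• Source 2: vertical tie across frequency (transients, percussive sounds)
[Figure: original spectrogram $X_{\text{org}}$ and estimated sources $S_{\text{hor}}$, $S_{\text{ver}}$; frequency bin $\nu$ against time $\tau$]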
Single Channel Source Separation with IGMCs
E-guitar "Matte Kudasai" (King Crimson) + Drums "Territory" (Sepultura) = Mix
              s1                  s2
Method        SDR   SIR   SAR     SDR   SIR   SAR
VB            -474  -328  567     -158  1546  -137
Gibbs         -45   -262  457     105   1246  161
GibbsEM       -423  -242  482     134   1313  185
Pre-trained   -404  -315  813     356   1144  464
Oracle        614   1716  658     1266  1995  136
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
Single Channel Source Separation with IGMCs
"Vandringar I Vilsenhet" (Anglagard) + "Moby Dick" (Led Zeppelin) = Mix
              s1                  s2
Method        SDR   SIR   SAR     SDR   SIR   SAR
VB            -78   -622  453     -235  184   -225
Gibbs         -846  -753  693     -404  1459  -383
GibbsEM       -774  -619  462     -114  1662  -097
Pre-trained   -64   -539  695     38    1639  414
Oracle        121   329   1214    2113  3389  2137
Harmonic-Transient Decomposition
[Figure: original spectrogram $X_{\text{org}}$ and its horizontal ($S_{\text{hor}}$) and vertical ($S_{\text{ver}}$) components; frequency bin $\nu$ against time $\tau$]
Chord Detection - Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (0 to 4000 Hz)]
Chord Detection
[Figure: MDCT of a piano chord (MIDI notes 41, 48, 51, 56), frequency 0 to 4000 Hz against time $\tau$ (s); $\log \sum_\nu v_{\nu j\tau}$ for MIDI note $j$ (40 to 65) against time $\tau$ (s)]
Multichannel Source Separation
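• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$
$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$
$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$
$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$
[Figure: graphical model with $\lambda_1, \dots, \lambda_N$; variances $v_{k,1:N}$, sources $s_{k,1:N}$ and observations $x_{k,1:M}$ inside a plate $k = 1, \dots, K$; mixing vectors $a_1, \dots, a_M$ and noise variances $r_1, \dots, r_M$]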
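This sketch draws from the hierarchical prior. It uses the $(a, z)$ scale convention of the Gamma and inverse Gamma definitions for $\lambda_n$ and $v_{k,n}$. The priors on $a_m$ and $r_m$ are elided on the slide, so the mixing matrix and noise variances are taken as given. Given $\lambda_n$, each $s_{k,n}$ is marginally Student-t with $\nu$ degrees of freedom and scale $\sqrt{\lambda_n}$. `sample_mixture` and its arguments are hypothetical.

```python
import numpy as np

def sample_mixture(K, A, r, nu, a_lam, b_lam, rng):
    """Draw (lambda, v, s, x) from the hierarchical prior for M channels, N sources.

    A: (M, N) mixing matrix with rows a_m^T; r: (M,) noise variances.
    Both are taken as given here (their priors are not specified).
    """
    M, N = A.shape
    lam = rng.gamma(a_lam, b_lam, size=N)                        # lambda_n ~ G(a_lam, b_lam), b = scale
    v = 1.0 / rng.gamma(nu / 2, 2.0 / (nu * lam), size=(K, N))   # v_kn ~ IG(nu/2, 2/(nu lambda_n))
    s = rng.standard_normal((K, N)) * np.sqrt(v)                 # s_kn ~ N(0, v_kn)
    x = s @ A.T + rng.standard_normal((K, M)) * np.sqrt(r)       # x_km ~ N(a_m^T s_k, r_m)
    return lam, v, s, x

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5, 0.2], [0.3, 1.0, 0.7]])   # 2 channels, 3 sources (underdetermined)
lam, v, s, x = sample_mixture(100, A, np.zeros(2), nu=4.0, a_lam=2.0, b_lam=1.0, rng=rng)
```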
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall "volume" of source $n$
Source Separation
[Figure: spectrograms of the speech, piano and guitar sources and of the mix, frequency $f$ (0 to 10000 Hz) against time $t$]
Reconstructions
[Figure: reconstructed spectrograms of the speech, piano and guitar sources, frequency $f$ (0 to 10000 Hz) against time $t$]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: sampler trajectories of $a$, $\lambda$ and $r$ against epoch (0 to 2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features)
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– ...
• Online-realtime or offline-batch
• All of these problems can be mapped to inference problems in an HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
[Figure: observed data y_k; smoothing densities log p(m_k | y_{1:K}) over bar position, log p(n_k | y_{1:K}) over tempo (quarter notes per min) and p(r_k | y_{1:K}) over rhythm (triplets vs duplets), against frame index k]
Time Signature
[Figure: observed data (sample value vs time, s); smoothing densities log p(m_k | z_{1:K}) over bar position, log p(n_k | z_{1:K}) over tempo (quarter notes per min) and p(θ_k | z_{1:K}) over time signature (4/4 vs 3/4), against frame index k]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt aligned with the audio signal x_t]
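Score-Performance matching - Graphical Model
[Graphical model: for frames 1…K, nodes t_k, r_k, λ_k and, in a plate over ν = 1…W, variances v_{ν,k} and coefficients s_{ν,k}; inset: score positions r_k numbered 1–8]
$$v_{\nu\tau} \sim \mathcal{IG}\big(v_{\nu\tau};\, a,\, 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$$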
Score-Performance matching - Signal model
[Figure: spectral template log σ_ν versus frequency ν (Hz), 0–4000 Hz]
Score-Performance matching
[Figure: spectrogram data (frequency, Hz, vs time, s) and MIDI data (MIDI note vs score position)]
Online (filtering) or offline (smoothing) processing is possible
Transcription
[Figure: log p(r_τ | s_τ) over MIDI note number vs time (s); log Σ_i w^{(i)}_τ λ^{(i)}_τ vs time (s); MDCT of the audio (source: Daniel-Ben Pienaar), frequency (Hz) vs time (s)]
Summary
• “Time Domain”: Switching State Space Models
– State space modeling
– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform Domain”: Gamma Fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT, …)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical score guided source separation
– Prior structures for other observation models: NMF
Generative Models for Music
Generative Models for Music
[Diagram: Score, Expression, Piano-Roll, Signal]
Hierarchical Modeling of Music
[Graphical model over time slices 1…t with latent chains v_t, k_t, h_t, m_t and, for each voice j, g_{j,t}, r_{j,t}, n_{j,t}, x_{j,t}, y_{j,t}, generating the observations y_t]
Modelling levels
• Physical - acoustical
• Time domain – state space dynamical models
• Transform domain – Fourier representations, Generalised Linear model
• Feature based
Research Questions
What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?
Signal Models for Audio
• Time domain – state space dynamical models
– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA), switching state space models
– Flexible, physically realistic
– Analysis down to sample precision; computationally quite heavy
• Transform domain – Fourier representations, Generalised Linear model
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT, …)
– Inherent limitations (analysis windows, frequency resolution)
Sinusoidal Modeling
• Sound is primarily about oscillations and resonance
• Cascade of second-order systems
• Audio signals can often be compactly represented by sinusoids
$$\text{(real)} \quad y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$$
$$\text{(complex)} \quad y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n$$
$$y = F(\gamma_{1:p}, \omega_{1:p})\, c$$
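State space Parametrisation
$$x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix}$$
$$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}}_{C} x_n$$
[Graphical model: Markov chain x_0 → x_1 → ⋯ → x_{k−1} → x_k → ⋯ → x_K, with observations y_1, …, y_{k−1}, y_k, …, y_K]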
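The state-space form is a recursive way of evaluating the complex sinusoidal sum: with A diagonal and x_0 = c, y_n = C A^n x_0. The NumPy sketch below checks this numerically, and also that the real form is the real part of the complex form when c_k = α_k e^{jφ_k}. The parameter values are arbitrary illustrations.

```python
import numpy as np


def damped_sinusoids_direct(c, gamma, omega, n_samples):
    """Complex form: y_n = sum_k c_k (exp(-gamma_k + j omega_k))^n."""
    n = np.arange(n_samples)[:, None]
    poles = np.exp(-gamma + 1j * omega)
    return (c * poles ** n).sum(axis=1)


def damped_sinusoids_state_space(c, gamma, omega, n_samples):
    """Same signal from x_{n+1} = A x_n, y_n = C x_n with x_0 = c."""
    A = np.diag(np.exp(-gamma + 1j * omega))
    C = np.ones(len(c))
    x = np.asarray(c, dtype=complex)
    y = np.empty(n_samples, dtype=complex)
    for t in range(n_samples):
        y[t] = C @ x
        x = A @ x
    return y


# Illustrative parameters for p = 2 damped sinusoids
alpha = np.array([1.0, 0.5])
phi = np.array([0.3, -1.0])
gamma = np.array([0.01, 0.05])
omega = np.array([0.2, 0.9])
c = alpha * np.exp(1j * phi)  # complex amplitudes c_k = alpha_k e^{j phi_k}

n = np.arange(200)[:, None]
y_real = (alpha * np.exp(-gamma * n) * np.cos(omega * n + phi)).sum(axis=1)
y_direct = damped_sinusoids_direct(c, gamma, omega, 200)
y_ss = damped_sinusoids_state_space(c, gamma, omega, 200)
```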
State Space Parametrisation
Audio Restoration/Interpolation
• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis
[Figure: audio signal with missing samples]
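Audio Interpolation
$$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\; p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H), \qquad H \equiv (\text{parameters, hidden states})$$
[Graphical model: H with children x_¬κ (missing) and x_κ (observed)]
[Figure: audio signal with missing samples]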
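If H is held fixed rather than integrated out, and the model is jointly Gaussian given H, then p(x_¬κ | x_κ, H) is Gaussian with mean Σ_{¬κ,κ} Σ_{κ,κ}^{-1} x_κ. The sketch below illustrates only this inner conditional, assuming a stationary AR(2) signal with known, hypothetical coefficients so that its covariance Σ is available in closed form; the slide's method additionally averages over H.

```python
import numpy as np


def ar2_autocov(a1, a2, sigma2, n_lags):
    """Autocovariance g[0..n_lags-1] of a stationary AR(2) process
    x_t = a1 x_{t-1} + a2 x_{t-2} + e_t with e_t ~ N(0, sigma2)."""
    g = np.empty(n_lags)
    g[0] = sigma2 * (1 - a2) / ((1 + a2) * ((1 - a2) ** 2 - a1 ** 2))
    g[1] = a1 * g[0] / (1 - a2)
    for k in range(2, n_lags):
        g[k] = a1 * g[k - 1] + a2 * g[k - 2]
    return g


def interpolate_missing(x, missing, cov):
    """Fill x[missing] with E[x_missing | x_observed] under N(0, cov)."""
    observed = ~missing
    S_mo = cov[np.ix_(missing, observed)]
    S_oo = cov[np.ix_(observed, observed)]
    x_hat = x.copy()
    x_hat[missing] = S_mo @ np.linalg.solve(S_oo, x[observed])
    return x_hat


# Hypothetical resonant AR(2): pole radius r and angle w (a damped oscillator)
rng = np.random.default_rng(0)
r, w, sigma2, L, burn_in = 0.95, 0.3, 1.0, 300, 200
a1, a2 = 2 * r * np.cos(w), -r ** 2
x = np.zeros(L + burn_in)
for t in range(2, L + burn_in):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + rng.normal(scale=np.sqrt(sigma2))
x = x[burn_in:]  # drop the start-up transient

g = ar2_autocov(a1, a2, sigma2, L)
cov = g[np.abs(np.subtract.outer(np.arange(L), np.arange(L)))]
missing = np.zeros(L, dtype=bool)
for start in range(20, L - 20, 40):  # several short gaps of 6 samples
    missing[start:start + 6] = True
x_hat = interpolate_missing(x, missing, cov)
mse_interp = np.mean((x_hat[missing] - x[missing]) ** 2)
mse_zero = np.mean(x[missing] ** 2)  # baseline: fill the gaps with zeros
```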
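Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
[Graphical model: for each band ν = 0…W−1, a chain s^ν_0 ⋯ s^ν_k ⋯ s^ν_{K−1} with parameters A_ν, Q_ν; observations x_0 ⋯ x_k ⋯ x_{K−1}]
$$s^\nu_k \sim \mathcal{N}(s^\nu_k;\, A_\nu s^\nu_{k-1},\, Q_\nu), \qquad A_\nu \sim \mathcal{N}\left(A_\nu;\, \begin{pmatrix} \cos\omega_\nu & -\sin\omega_\nu \\ \sin\omega_\nu & \cos\omega_\nu \end{pmatrix},\, \Psi\right)$$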
Inference: Structured Variational Bayes
[Factor graph: q(A_α), q(Q_α) and the chain ⋯ s^α_{k−1}, s^α_k, s^α_{k+1} ⋯ with factors ∏_k q(s^α_k | s^α_{k−1}) for α ∈ C; observations x_k with q(x_k)]
• Intuitive algorithm
– Subtract from the observed signal x the prediction of the frequency bands in ¬α
– Compute a fit for α to this residual and iterate
• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
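The Gauss-Seidel analogy can be made concrete: updating one unknown at a time, using the latest values of all the others, is exactly "subtract the other components' prediction, then fit the current component to the residual". The sketch below uses scalar unknowns and an arbitrary symmetric positive definite matrix for illustration; in the structured VB setting each "unknown" would be a whole frequency band.

```python
import numpy as np


def gauss_seidel(A, b, n_sweeps=200):
    """Solve A x = b by updating one unknown at a time (Gauss-Seidel)."""
    x = np.zeros(len(b))
    for _ in range(n_sweeps):
        for i in range(len(b)):
            # residual of b after removing the current fit of all other unknowns
            residual = b[i] - A[i] @ x + A[i, i] * x[i]
            x[i] = residual / A[i, i]  # refit unknown i to that residual
    return x


rng = np.random.default_rng(1)
B = rng.normal(size=(6, 6))
A = B @ B.T + 6 * np.eye(6)  # symmetric positive definite, so the sweeps converge
b = rng.normal(size=6)
x = gauss_seidel(A, b)
```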
Restoration
• Piano
– Signal with missing samples (37)
– Reconstruction: 768 dB improvement
– Original
• Trumpet
– Signal with missing samples (37)
– Reconstruction: 710 dB improvement
– Original
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
[Graphical model: for each ν = 1…W, a changepoint chain r^ν_0 ⋯ r^ν_k ⋯ r^ν_K driving a latent chain θ^ν_0 ⋯ θ^ν_k ⋯ θ^ν_K; observations y_k ⋯ y_K]
• Generalises source-filter models
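Harmonic model with changepoints
$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$
$$\theta_k \mid \theta_{k-1}, r_k \sim [r_k = 0]\,\underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\,\underbrace{\mathcal{N}(0, S)}_{\text{new}}$$
$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$
$$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^N, \qquad G_\omega = \rho_k \begin{pmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{pmatrix}$$
with damping factor 0 < ρ_k < 1, frame length N and damped sinusoidal basis matrix C of size N × 2H.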
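A small NumPy sketch of the frame-to-frame transition A = blockdiag(G_ω, G_ω², …, G_ω^H)^N. The slide does not spell out the rows of C, so the basis below is a hypothetical choice: row n of a frame reads the first coordinate of each harmonic's 2-vector after n single-sample steps. With that choice, advancing the state by A is consistent with simply extending the basis by N more samples, which the test checks.

```python
import numpy as np


def G(omega, rho):
    """Damped rotation G_w = rho * [[cos w, -sin w], [sin w, cos w]]."""
    c, s = np.cos(omega), np.sin(omega)
    return rho * np.array([[c, -s], [s, c]])


def harmonic_A(omega, rho, H, N):
    """Frame-to-frame transition A = blockdiag(G_w, G_w^2, ..., G_w^H)^N."""
    D = np.zeros((2 * H, 2 * H))
    for h in range(1, H + 1):
        D[2 * h - 2:2 * h, 2 * h - 2:2 * h] = np.linalg.matrix_power(G(omega, rho), h)
    return np.linalg.matrix_power(D, N)


def harmonic_C(omega, rho, H, N):
    """Hypothetical N x 2H damped sinusoidal basis: row n holds the first row
    of (G_w^h)^n for each harmonic h."""
    C = np.zeros((N, 2 * H))
    for h in range(1, H + 1):
        Gh = np.linalg.matrix_power(G(omega, rho), h)
        P = np.eye(2)
        for n in range(N):
            C[n, 2 * h - 2:2 * h] = P[0]
            P = P @ Gh
    return C


omega, rho, H, N = 0.2, 0.999, 3, 16  # illustrative values
A = harmonic_A(omega, rho, H, N)
C = harmonic_C(omega, rho, H, N)
C_two_frames = harmonic_C(omega, rho, H, 2 * N)
```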
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
– Conditional Gaussians are not closed under marginalization
⇒ Unlike HMMs or KFMs, summing over r_k does not simplify the filtering density
⇒ The number of Gaussian kernels needed to represent the exact filtering density p(r_k, θ_k | y_{1:k}) increases exponentially
[Figure: growing tree of Gaussian kernels]
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time/space
– Intuition: when a changepoint occurs, the state vector θ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; Ó Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)
[Graphical model: r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0; states θ_0 … θ_5; observations y_1 … y_5]
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
⇒ Trajectories r^{(i)}_{1:k} that are dominated in terms of conditional evidence p(y_{1:k}, r^{(i)}_{1:k}) can be discarded without destroying optimality
Monophonic model (Cemgil et al. 2006)
• We introduce a pitch label indicator m
• At each time k the process can be in one of the {“mute”, “sound”} × M states
[Graphical model: chains r_0 … r_T, m_0 … m_T, s_0 … s_T with observations y_1 … y_T]
Monophonic Pitch Tracking
Monophonic pitch tracking = online estimation (filtering) of p(r_k, m_k | y_{1:k})
[Figure: audio signal and pitch-label posterior over time]
• If pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
[Figure: transcription of an audio example by exact inference (S)]
Tracking Pitch Variations
• Allow m to change with k
[Figure: tracked pitch variations]
• Intractable: need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: latent piano roll over frequency ν and time k, and the signal x_k]
• Each latent changepoint process ν = 1…W corresponds to a “piano key”; indicators r_{1:W,1:K} encode a latent “piano roll” (S1) (S2) (S3)
Single time slice - Bayesian Variable Selection
$$r_i \sim \mathcal{C}(r_i;\, \pi_{\text{on}}, \pi_{\text{off}})$$
$$s_i \mid r_i \sim [r_i = \text{on}]\,\mathcal{N}(s_i;\, 0, \Sigma) + [r_i \neq \text{on}]\,\delta(s_i)$$
$$x \mid s_{1:W} \sim \mathcal{N}(x;\, C s_{1:W}, R), \qquad C \equiv [\, C_1 \ \cdots \ C_i \ \cdots \ C_W \,]$$
[Graphical model: r_1 … r_W → s_1 … s_W → x]
• Generalized linear model: the columns of C are the basis vectors
• The exact posterior is a mixture of 2^W Gaussians
• When W is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman, Attias, …)
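For small W the exact posterior over indicators can be computed by brute force, which makes the 2^W cost explicit. The sketch assumes scalar s_i (one column C_i per component), Σ = σ_s² and R = σ_x² I, all hypothetical choices; integrating out s gives x | r ∼ N(0, R + σ_s² C_r C_rᵀ), where C_r keeps only the "on" columns.

```python
import itertools

import numpy as np


def log_normal_zero_mean(x, S):
    """log N(x; 0, S)."""
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(S, x))


def exact_indicator_posterior(x, C, pi_on, sigma_s2, sigma_x2):
    """Enumerate all 2^W on/off configurations r and return p(r | x)."""
    D, W = C.shape
    configs = list(itertools.product([0, 1], repeat=W))
    log_post = np.empty(len(configs))
    for j, r in enumerate(configs):
        on = np.array(r, dtype=bool)
        S = sigma_x2 * np.eye(D) + sigma_s2 * C[:, on] @ C[:, on].T
        log_prior = np.sum(np.where(on, np.log(pi_on), np.log(1 - pi_on)))
        log_post[j] = log_normal_zero_mean(x, S) + log_prior
    post = np.exp(log_post - log_post.max())
    return configs, post / post.sum()


rng = np.random.default_rng(0)
D, W = 32, 5
C = rng.normal(size=(D, W))                    # hypothetical basis vectors
s_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])  # components 1 and 3 are "on"
x = C @ s_true + rng.normal(scale=0.1, size=D)
configs, post = exact_indicator_posterior(x, C, pi_on=0.3, sigma_s2=1.0, sigma_x2=0.01)
print(configs[int(np.argmax(post))])
```

Each extra component doubles the number of configurations (and of Gaussian components in the posterior), which is why this enumeration is only viable for small W.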
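Factorial Switching State space model
$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu};\, \pi_{0,\nu}), \qquad \theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu};\, \mu_\nu, P_\nu)$$
$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu};\, \pi_\nu(r_{k-1,\nu})) \qquad \text{(changepoint indicator)}$$
$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu};\, A_\nu(r_k)\,\theta_{k-1,\nu},\, Q_\nu(r_k)) \qquad \text{(latent state)}$$
$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k;\, C_k\,\theta_{k,1:W},\, R) \qquad \text{(observation)}$$
[Graphical model: for each ν = 1…W, a changepoint chain r^ν_0 ⋯ r^ν_k ⋯ r^ν_K and a state chain s^ν_0 ⋯ s^ν_k ⋯ s^ν_K; observations y_k ⋯ y_K]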
Synthetic Data
[Figure: synthetic example: latent indicators over ν and k, frequency content, and the signal x (S)]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable: computations with large matrices
– Need advanced techniques from linear algebra
– Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical - acoustical
• Time domain – state space dynamical models
• Transform domain – Fourier representations, Generalised Linear model
• Feature based
Spectrogram
• Basis functions φ_k(t) centred around time-frequency atom k = k(ν, τ) = (frequency, time), such as STFT or MDCT
$$x(t) = \sum_k s_k \phi_k(t)$$
[Figure: spectrograms (f in Hz vs t in sec) of speech and of piano]
• The spectrogram displays log |s_k| or |s_k|^2 (of the STFT)
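A minimal NumPy sketch of the STFT coefficients s_k = s(ν, τ) and the log-power spectrogram, assuming a Hann window, frame length 512 and hop 256. These are illustrative settings, not those used for the figures.

```python
import numpy as np


def stft(x, frame_len=512, hop=256):
    """Hann-windowed STFT; returns coefficients s(nu, tau) as (bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[t * hop:t * hop + frame_len] * window for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


fs = 8000                      # illustrative sample rate (Hz)
t = np.arange(fs) / fs         # one second of audio
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
S = stft(x)
log_spectrogram = 10 * np.log10(np.abs(S) ** 2 + 1e-12)  # |s_k|^2 in dB
peak_hz = np.argmax(np.abs(S[:, 10])) * fs / 512          # strongest bin in frame 10
```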
Models for time-frequency Energy distributions
• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; …)
$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$
Spectrogram = Spectral Templates × Excitations
[Figure: spectrogram = templates × excitations]
– however, spectrograms are not additive (a^2 + b^2 ≠ (a + b)^2)
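One widely used way to fit X ≈ WS is the Lee-Seung multiplicative update for the generalized KL divergence, which also gives the maximum-likelihood fit of the Poisson model discussed below. The sketch is a generic NMF implementation, not the specific method of any paper cited on the slide; the rank J, iteration count and synthetic data are illustrative.

```python
import numpy as np


def kl_nmf(X, J, n_iter=500, eps=1e-12, seed=0):
    """Fit X ~ W S (W: F x J templates, S: J x T excitations) with the
    Lee-Seung multiplicative updates for the generalized KL divergence."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.random((F, J)) + eps
    S = rng.random((J, T)) + eps
    for _ in range(n_iter):
        V = W @ S + eps
        W *= ((X / V) @ S.T) / (S.sum(axis=1) + eps)
        V = W @ S + eps
        S *= (W.T @ (X / V)) / (W.sum(axis=0)[:, None] + eps)
    return W, S


def kl_divergence(X, V, eps=1e-12):
    """Generalized KL divergence D(X || V) for nonnegative arrays."""
    return np.sum(X * np.log((X + eps) / (V + eps)) - X + V)


rng = np.random.default_rng(1)
X = rng.random((20, 2)) @ rng.random((2, 30))  # synthetic rank-2 "spectrogram"
W1, S1 = kl_nmf(X, J=2, n_iter=1)
W, S = kl_nmf(X, J=2, n_iter=500)
```

The updates keep W and S nonnegative by construction and do not increase the divergence, so the longer run ends at least as close to X as the single-iteration run from the same initialization.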
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; …)
$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$
Spectrogram = Mask × Source_0 + (1 − Mask) × Source_1
[Figure: spectrogram = masked source 0 + masked source 1]
– however, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)
• We place a suitable prior on the variance of the transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms
$$p(s \mid v)\, p(v) = \Big(\prod_k p(s_k \mid v_k)\Big)\, p(v)$$
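One channel source separation: Gaussian source model
[Graphical model: for k = 1…K, variances v_{k,1} … v_{k,N} → sources s_{k,1} … s_{k,N} → observation x_k]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n};\, 0, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$
• Straightforward application of Bayes' theorem yields
$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n};\, \kappa_{k,n} x_k,\, v_{k,n}(1 - \kappa_{k,n})\big), \qquad \kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \quad \text{(responsibilities)}$$
• Each source coefficient s_{k,n} gets a fraction κ_{k,n} of the observation x_k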
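The responsibilities formula translates directly into a vectorized NumPy helper, sketched below for K atoms and N sources. The function name and the example values are illustrative. The posterior mean κ x is the familiar Wiener-filter gain applied to each atom.

```python
import numpy as np


def gaussian_split(x, v):
    """Posterior mean and variance of each s_{k,n} given x_k = sum_n s_{k,n},
    with independent s_{k,n} ~ N(0, v_{k,n}).  x: shape (K,), v: shape (K, N)."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    kappa = v / v.sum(axis=-1, keepdims=True)  # responsibilities
    return kappa * x[..., None], v * (1.0 - kappa)


x = np.array([3.0, -1.0])
v = np.array([[1.0, 2.0], [4.0, 4.0]])
mean, var = gaussian_split(x, v)
```

The posterior means of each atom add up to x_k, so the separated sources always reconstruct the mixture exactly.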
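One channel source separation: Poisson source model
[Graphical model: for k = 1…K, intensities v_{k,1} … v_{k,N} → sources s_{k,1} … s_{k,N} → observation x_k]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n};\, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$
• This is the generative model for NMF when we write v_{k(ν,τ),n} = t_{ν,n} × e_{τ,n} (template × excitation)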
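The Poisson model has the same responsibilities as the Gaussian one: given x_k, the sources s_{k,1:N} are multinomial with probabilities κ_{k,n} = v_{k,n} / Σ_{n'} v_{k,n'}. The sketch below checks this for N = 2 by applying Bayes' rule to the Poisson pmfs and comparing with a binomial pmf, then draws from the multinomial. The helper names are illustrative.

```python
from math import comb, factorial

import numpy as np


def poisson_split_sample(x, v, rng):
    """Draw s_{1:N} | x ~ Multinomial(x, v / sum(v)) for s_n ~ PO(v_n), x = sum_n s_n."""
    v = np.asarray(v, dtype=float)
    return rng.multinomial(x, v / v.sum())


def conditional_by_bayes(x, v1, v2):
    """p(s_1 = s | s_1 + s_2 = x) from the Poisson pmfs (the exp(-v) factors cancel)."""
    w = [v1 ** s / factorial(s) * v2 ** (x - s) / factorial(x - s) for s in range(x + 1)]
    return np.array(w) / sum(w)


def binomial_pmf(x, p):
    return np.array([comb(x, s) * p ** s * (1 - p) ** (x - s) for s in range(x + 1)])


rng = np.random.default_rng(0)
draws = np.array([poisson_split_sample(10, [1.0, 3.0], rng) for _ in range(20000)])
```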
Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)
[Figure: Gamma densities p(x) for (a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1); inverse Gamma densities for (a, b) = (1, 1), (1, 0.5), (2, 1)]
$$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1)\log x - z^{-1} x + a \log z^{-1} - \log\Gamma(a)\big)$$
$$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log\Gamma(a)\big)$$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity and inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
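The two log-densities can be coded exactly as written on the slide. In this parameterization G(x; a, z) has shape a and scale z, and IG(x; a, z) is the law of 1/y for y ∼ G(y; a, z), so IG samples can be drawn by inverting Gamma draws. The sketch checks normalization numerically and compares sample means with a z and 1/(z(a − 1)). The helper names and test values are illustrative.

```python
import math

import numpy as np


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) as defined on the slide (shape a, scale z)."""
    return (a - 1) * np.log(x) - x / z + a * np.log(1 / z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) as defined on the slide: the law of 1/y for y ~ G(y; a, z)."""
    return (a + 1) * np.log(1 / x) - 1 / (z * x) + a * np.log(1 / z) - math.lgamma(a)


def riemann(f, lo, hi, n=400_001):
    """Simple Riemann-sum approximation of the integral of f over [lo, hi]."""
    grid = np.linspace(lo, hi, n)
    return np.sum(f(grid)) * (grid[1] - grid[0])


a, z = 3.0, 0.5
mass_g = riemann(lambda x: np.exp(log_gamma_pdf(x, a, z)), 1e-9, 50.0)
mass_ig = riemann(lambda x: np.exp(log_inv_gamma_pdf(x, a, z)), 1e-6, 200.0)

rng = np.random.default_rng(0)
y = rng.gamma(shape=a, scale=z, size=200_000)  # y ~ G(y; a, z)
mean_g, mean_ig = y.mean(), (1 / y).mean()     # compare with a*z and 1/(z*(a-1))
```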
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1…K as follows:
$$v_k \mid z_k \sim \mathcal{IG}(v_k;\, a,\, z_k/a), \qquad z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1};\, a_z,\, v_k/a_z)$$
[Graphical model: chain z_1 ⋯ v_{k−1}, z_k, v_k, z_{k+1} ⋯, with couplings labelled a_z, a, a_z]
• Variance variables v are the prior variances of the sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• Shape parameters a and a_z describe the coupling strength and drift of the chain
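Draws like those on the next slides can be generated by ancestral sampling. With the slide's parameterization, IG(x; a, z) is the law of 1/y for y ∼ G(y; a, z) (shape a, scale z), so each step draws a Gamma variate and inverts it. Then E[1/v_k | z_k] = z_k and E[1/z_{k+1} | v_k] = v_k, which is where the positive correlation between successive v_k comes from. The starting value z1 is a hypothetical choice.

```python
import numpy as np


def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Draw (v_{1:K}, z_{1:K}) from the inverse Gamma-Markov chain
    v_k | z_k ~ IG(v_k; a, z_k / a),  z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z).
    z1 is a hypothetical fixed starting value."""
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = np.empty(K)
    z[0] = z1
    for k in range(K):
        v[k] = 1.0 / rng.gamma(shape=a, scale=z[k] / a)
        if k + 1 < K:
            z[k + 1] = 1.0 / rng.gamma(shape=a_z, scale=v[k] / a_z)
    return v, z


v, z = sample_igmc(1000, a=10.0, a_z=10.0)
log_v = np.log(v)
lag1 = np.corrcoef(log_v[:-1], log_v[1:])[0, 1]  # strong positive correlation
```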
Gamma Chains: typical draws
[Figure: typical draws of log v_k for k = 1…1000, with a = 10, a_z = 10; a = 10, a_z = 4; a = 10, a_z = 40]
Gamma Chains with changepoints: typical draws
[Figure: typical draws of log v_k for k = 1…1000, with a = 10, a_z = 10; a = 10, a_z = 4; a = 10, a_z = 40]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$$\psi_{k,k} = \exp\big(-a\, z_k^{-1} v_k^{-1}\big) \quad \text{(pairwise)}, \qquad \phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \quad \text{(singletons)}$$
[Graphical model: chain z_1 ⋯ v_{k−1}, z_k, v_k, z_{k+1} ⋯, with couplings labelled a_z, a, a_z]
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$$\psi_{ij} = \exp\big(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}\big) \quad \text{(pairwise)}$$
$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big) \log \xi_i^{-1}\Big) \quad \text{(singletons)}$$
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov Chain Monte Carlo: Gibbs sampler
– Sequential Monte Carlo: particle filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps
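VB or Gibbs
[Factor graph: ψ_{0,1} – z_1 – ψ_{1,1} – v_1 – ψ_{1,2} – z_2 – ψ_{2,2} – v_2 ⋯, with likelihood factors p(y_1 | v_1), p(y_2 | v_2)]
• VB
$$q^{(\tau)}(v_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k,k} + \log\psi_{k,k+1} \big\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\Big)$$
• Gibbs
$$v^{(\tau)}_k \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z^{(\tau)}_k)\, \psi_{k,k+1}(z^{(\tau)}_{k+1})$$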
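VB or Gibbs
[Factor graph: ψ_{0,1} – z_1 – ψ_{1,1} – v_1 – ψ_{1,2} – z_2 – ψ_{2,2} – v_2 ⋯, with likelihood factors p(y_1 | v_1), p(y_2 | v_2)]
• VB
$$q^{(\tau)}(z_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k-1,k} + \log\psi_{k,k} \big\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\Big)$$
• Gibbs
$$z^{(\tau)}_k \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v^{(\tau)}_{k-1})\, \psi_{k,k}(v^{(\tau)}_k)$$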
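For Gaussian observations y_k ∼ N(0, v_k), both Gibbs conditionals on these slides are inverse Gamma, which is the conjugacy the previous slide refers to. Working them out from the chain definition gives z_k | v_{k−1}, v_k with shape a_z + a and rate a_z/v_{k−1} + a/v_k, and v_k | z_k, z_{k+1}, y_k with shape a + a_z + 1/2 and rate a/z_k + a_z/z_{k+1} + y_k²/2. The sketch below is a blocked Gibbs sampler built on those conditionals. The prior on z_1, the initialization and the synthetic data are hypothetical choices for illustration.

```python
import numpy as np


def inv_gamma(rng, shape, rate):
    """Draw from the density proportional to x^-(shape + 1) exp(-rate / x)."""
    return rate / rng.gamma(shape)


def gibbs_igmc(y, a, a_z, b=1.0, n_sweeps=200, seed=0):
    """Blocked Gibbs sampler for y_k ~ N(0, v_k) under the IGMC prior.
    z_1 gets a hypothetical prior IG(z_1; a_z, b) (rate 1 / b); v_K has no z_{K+1} terms."""
    rng = np.random.default_rng(seed)
    K = len(y)
    v = np.maximum(y ** 2, 1e-6)  # crude initialization
    z = np.empty(K)
    samples = np.empty((n_sweeps, K))
    for it in range(n_sweeps):
        for k in range(K):  # z's are conditionally independent given v
            rate = (1.0 / b if k == 0 else a_z / v[k - 1]) + a / v[k]
            z[k] = inv_gamma(rng, a_z + a, rate)
        for k in range(K):  # v's are conditionally independent given z
            if k + 1 < K:
                shape = a + a_z + 0.5
                rate = a / z[k] + a_z / z[k + 1] + 0.5 * y[k] ** 2
            else:
                shape = a + 0.5
                rate = a / z[k] + 0.5 * y[k] ** 2
            v[k] = inv_gamma(rng, shape, rate)
        samples[it] = v
    return samples


rng = np.random.default_rng(1)
true_var = np.r_[np.full(100, 1.0), np.full(100, 100.0)]  # a jump in variance
y = rng.normal(scale=np.sqrt(true_var))
samples = gibbs_igmc(y, a=10.0, a_z=10.0)
v_hat = samples[100:].mean(axis=0)  # posterior mean after burn-in
```

Because the v's are conditionally independent given the z's (and vice versa), each half-sweep could be vectorized; the explicit loops are kept to mirror the per-variable updates on the slides.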
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: log-magnitude spectrograms (frequency bin vs frame) of the noisy (X) and original (X_org) speech, and reconstructions X_h (SNR1998), X_v (SNR2079), X_b (SNR1968) and X_g (SNR1997)]
Denoising – Music
[Figure: log-magnitude spectrograms of the original and noisy signals, and reconstructions by PF (SNR853), Gibbs (SNR866) and VB (SNR208)]
“Tristram (Matt Uelmen)” + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal tie across time (harmonic continuity)
• Source 2: vertical tie across frequency (transients, percussive sounds)
[Figure: original spectrogram X_org and the estimated components S_hor and S_ver, frequency bin (ν) vs time (τ)]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix
              s1                      s2
              SDR    SIR    SAR       SDR    SIR    SAR
VB            -474   -328   567       -158   1546   -137
Gibbs         -45    -262   457       105    1246   161
GibbsEM       -423   -242   482       134    1313   185
Pre-trained   -404   -315   813       356    1144   464
Oracle        614    1716   658       1266   1995   136
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix
              s1                      s2
              SDR    SIR    SAR       SDR    SIR    SAR
VB            -78    -622   453       -235   184    -225
Gibbs         -846   -753   693       -404   1459   -383
GibbsEM       -774   -619   462       -114   1662   -097
Pre-trained   -64    -539   695       38     1639   414
Oracle        121    329    1214      2113   3389   2137
Harmonic-Transient Decomposition
[Figure: original spectrogram X_org and its horizontal (S_hor) and vertical (S_ver) components, frequency bin (ν) vs time (τ), for two examples]
Chord Detection - Signal model (with Paul Peeling)
[Figure: spectral template log σ_ν versus frequency ν (Hz), 0–4000 Hz]
Chord Detection
[Figure: MDCT of piano chord 41 48 51 56 (frequency, Hz, vs time τ, s); log Σ_ν v_{ν,j,τ} for MIDI notes j = 40–65 vs time τ (s)]
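Multichannel Source Separation
• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda, b_\lambda)$$
$$v_{k,n} \sim \mathcal{IG}\big(v_{k,n};\, \nu/2,\, 2/(\nu\lambda_n)\big)$$
$$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0, v_{k,n})$$
$$x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m)$$
$$a_m \sim \mathcal{N}(a_m;\, \cdots), \qquad r_m \sim \mathcal{IG}(r_m;\, \cdots)$$
[Graphical model: λ_1 … λ_N → v_{k,1} … v_{k,N} → s_{k,1} … s_{k,N} → x_{k,1} … x_{k,M} for k = 1…K, with mixing vectors a_1 … a_M and noise variances r_1 … r_M]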
Equivalent Gamma MRF
• A tree for each source
• λ_n can be interpreted as the overall “volume” of source n
Source Separation
[Figure: spectrograms (f in Hz vs t in sec) of the speech, piano and guitar sources and of the mix]
Reconstructions
[Figure: spectrograms (f in Hz vs t in sec) of the reconstructed speech, piano and guitar sources]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: sampler traces of a, λ and r versus epoch, 0–2000]
Tempo tracking and score performance matching
• Given expressive music data (onsets/detections/spectral features)
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– …
• Online-Realtime or Offline-Batch
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: grid of states with tempo level n_k = 1…3 and bar position m_k = 1…8, with bar boundaries marked for 3/4 time and 4/4 time]
• Each dot denotes a state x = (m, n) (score position, tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three panels showing p(x_2 | x_1) over the grid of bar positions (1–8) and tempo levels (1–5)]
Bar position Pointer - k = 1
[Figure: p(x_1) over the grid of bar positions and tempo levels]
Bar position Pointer - k = 2
[Figure: p(x_2) over the grid of bar positions and tempo levels]
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
[Figure: $\log \sigma_\nu$ versus frequency $\nu$ (Hz), 0 to 4000 Hz]
Score-Performance matching
[Figure: spectrogram data (frequency in Hz vs. time in s) and MIDI data (MIDI note vs. score position)]
Online (filtering) or offline (smoothing) processing is possible
Transcription
[Figure: transcription results. Panels: $\log p(r_\tau \mid s_\tau)$ (MIDI note number vs. time in s); $\log \sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ vs. time in s; MDCT of the audio (frequency in Hz vs. time in s; source: Daniel-Ben Pienaar)]
Summary
• “Time Domain”: Switching State Space Models
  - State space modeling
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  - Analysis down to sample precision (if required)
  - Computationally quite heavy
• “Transform Domain”: Gamma Fields
  - Models on (orthogonal) transform coefficients; energy compaction
  - Practical: can make use of fast transforms (FFT, MDCT)
  - Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
  - Time-Frequency Energy distributions
• Ongoing Work
  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    * Chord detection, polyphonic transcription
    * Musical score guided source separation
  - Prior structures for other observation models: NMF
Generative Models for Music
[Figure: generative hierarchy from Score and Expression to Piano-Roll to Signal]
Hierarchical Modeling of Music
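[Figure: hierarchical dynamic model over time slices $1, 2, \dots, t$ with chains $v$, $k$, $h$, $m$, per-component chains (index $j$) $g_j$, $r_j$, $n_j$, $x_j$, $y_j$, and observations $y_1, y_2, \dots, y_t$]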
Modelling levels
• Physical - acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, Generalised Linear Model
• Feature based
Research Questions
What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?
Signal Models for Audio
• Time domain: state space dynamical models
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA), switching state space models
  - Flexible, physically realistic
  - Analysis down to sample precision; computationally quite heavy
• Transform domain: Fourier representations, Generalised Linear Model
  - Models on (orthogonal) transform coefficients; energy compaction
  - Practical: can make use of fast transforms (FFT, MDCT)
  - Inherent limitations (analysis windows, frequency resolution)
Sinusoidal Modeling
• Sound is primarily about oscillations and resonance
• Cascade of second-order systems
• Audio signals can often be compactly represented by sinusoids
(real) $y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$
(complex) $y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n$
$y = F(\gamma_{1:p}, \omega_{1:p})\, c$
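The following is a quick numerical check of the two forms above. It is a sketch with hypothetical parameter values, not code from the talk. With $c_k = \alpha_k e^{j\phi_k}$, the real model is the real part of the complex one. Once $\gamma_{1:p}$ and $\omega_{1:p}$ are fixed, $y = F c$ is linear in $c$, so the amplitudes can be recovered by least squares. The helper name `damped_sinusoid_basis` is illustrative.

```python
import numpy as np

def damped_sinusoid_basis(gamma, omega, N):
    """F[n, k] = (exp(-gamma_k + j*omega_k))**n for n = 0, ..., N-1."""
    n = np.arange(N)[:, None]
    poles = np.exp(-np.asarray(gamma) + 1j * np.asarray(omega))
    return poles[None, :] ** n

# Hypothetical parameters for p = 2 components
gamma = np.array([0.01, 0.03])
omega = np.array([0.2, 0.7])
alpha = np.array([1.0, 0.5])
phi = np.array([0.3, -1.0])
N = 64

F = damped_sinusoid_basis(gamma, omega, N)
c = alpha * np.exp(1j * phi)  # complex amplitudes c_k = alpha_k * exp(j * phi_k)
n = np.arange(N)
y_real = sum(a * np.exp(-g * n) * np.cos(w * n + p)
             for a, g, w, p in zip(alpha, gamma, omega, phi))
y_complex = F @ c

# With gamma and omega fixed, the model is linear in c, so least squares recovers c
c_hat = np.linalg.lstsq(F, y_complex, rcond=None)[0]
```

Estimating $\gamma$ and $\omega$ themselves is the nonlinear part of the problem. This check only covers the linear step.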
State space Parametrisation
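$x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix}$
$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}}_{C}\, x_n$
[Figure: state space model with hidden chain $x_0, x_1, \dots, x_{k-1}, x_k, \dots, x_K$ and observations $y_1, \dots, y_{k-1}, y_k, \dots, y_K$]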
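The state space form generates the same signal as the sum of complex exponentials. This holds because $x_n = A^n x_0$ and $C$ simply sums the components. The sketch below uses hypothetical values, and `simulate_state_space` is an illustrative helper. It runs the recursion and compares the result with the closed form.

```python
import numpy as np

def simulate_state_space(gamma, omega, c, N):
    """Run x_{n+1} = A x_n, y_n = C x_n with A = diag(exp(-gamma + j*omega)), C = (1, ..., 1)."""
    A = np.diag(np.exp(-np.asarray(gamma) + 1j * np.asarray(omega)))
    C = np.ones(len(c))
    x = np.asarray(c, dtype=complex)   # x_0 = (c_1, ..., c_p)
    y = np.empty(N, dtype=complex)
    for n in range(N):
        y[n] = C @ x                   # observation: sum of the p oscillator states
        x = A @ x                      # each state rotates and decays independently
    return y

gamma = np.array([0.01, 0.03])
omega = np.array([0.2, 0.7])
c = np.array([1.0 + 0.5j, -0.3j])
y = simulate_state_space(gamma, omega, c, 32)

# Closed form: y_n = sum_k c_k (exp(-gamma_k + j omega_k))^n
n = np.arange(32)[:, None]
y_closed = (np.exp(-gamma + 1j * omega)[None, :] ** n) @ c
```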
State Space Parametrisation
Audio Restoration/Interpolation
• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis
[Figure: example audio signal, sample index 0 to 500]
Audio Interpolation
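$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\, p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$
$H \equiv$ (parameters, hidden states)
[Figure: $H$ is the parent of both the missing samples $x_{\neg\kappa}$ and the observed samples $x_\kappa$; example signal, sample index 0 to 500]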
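Suppose $H$ is held fixed and the signal model is Gaussian (for example, an AR model). Then the missing samples given the observed ones are Gaussian and follow from standard conditioning. The sketch below assumes a hypothetical zero-mean prior with covariance $k(d) = \rho^{|d|}\cos(\omega d)$ in place of a fitted model. It does not perform the integral over $H$. The function name `conditional_gaussian` is illustrative.

```python
import numpy as np

def conditional_gaussian(K, obs_idx, x_obs):
    """Mean and covariance of the missing entries of x ~ N(0, K) given x[obs_idx] = x_obs."""
    D = K.shape[0]
    obs_idx = np.asarray(obs_idx)
    mis_idx = np.setdiff1d(np.arange(D), obs_idx)
    K_oo = K[np.ix_(obs_idx, obs_idx)]
    K_mo = K[np.ix_(mis_idx, obs_idx)]
    K_mm = K[np.ix_(mis_idx, mis_idx)]
    mean = K_mo @ np.linalg.solve(K_oo, x_obs)          # E[x_miss | x_obs]
    cov = K_mm - K_mo @ np.linalg.solve(K_oo, K_mo.T)   # posterior covariance
    return mis_idx, mean, cov

# Hypothetical stationary prior: a damped-oscillation covariance plus a small jitter
D, rho, omega = 200, 0.98, 0.15
d = np.subtract.outer(np.arange(D), np.arange(D))
K = rho ** np.abs(d) * np.cos(omega * d) + 1e-6 * np.eye(D)
rng = np.random.default_rng(0)
x = rng.multivariate_normal(np.zeros(D), K)
obs = np.r_[0:90, 110:200]            # samples 90..109 are treated as missing
mis, x_hat, P = conditional_gaussian(K, obs, x[obs])
```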
Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
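[Figure: graphical model; for each band $\nu = 0, \dots, W-1$, a state chain $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_{K-1}$ with parameters $A_\nu$, $Q_\nu$; observations $x_0, \dots, x_k, \dots, x_{K-1}$]
$s^\nu_k \sim \mathcal{N}(s^\nu_k; A_\nu s^\nu_{k-1}, Q_\nu), \qquad A_\nu \sim \mathcal{N}\left(A_\nu; \begin{pmatrix} \cos\omega_\nu & -\sin\omega_\nu \\ \sin\omega_\nu & \cos\omega_\nu \end{pmatrix}, \Psi\right)$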
Inference: Structured Variational Bayes
[Figure: structured variational approximation; for each band $\alpha \in C$, factors $q(A_\alpha)$, $q(Q_\alpha)$ and $\prod_k q(s^\alpha_k \mid s^\alpha_{k-1})$ over the chain $\dots, s^\alpha_{k-1}, s^\alpha_k, s^\alpha_{k+1}, \dots$, and $q(x_k)$ for the observations $x_k$]
• Intuitive algorithm
  - Subtract from the observed signal $x$ the prediction of the frequency bands in $\neg\alpha$
  - Compute a fit for $\alpha$ to this residual, and iterate
• For fixed $A$, $Q$ this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
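To make the Gauss-Seidel analogy concrete, here is a generic Gauss-Seidel solver for a small symmetric positive definite system $Mx = b$. It is a hypothetical example, not the phase vocoder itself. Each inner step removes the contribution of all other coordinates from $b$ and fits one coordinate to the residual, which mirrors the band-by-band update described above.

```python
import numpy as np

def gauss_seidel(M, b, n_iter=100):
    """Solve M x = b by Gauss-Seidel sweeps (converges e.g. for symmetric positive definite M)."""
    x = np.zeros(len(b))
    for _ in range(n_iter):
        for i in range(len(b)):
            # residual of b after removing the contribution of all other coordinates
            r = b[i] - M[i, :i] @ x[:i] - M[i, i + 1:] @ x[i + 1:]
            x[i] = r / M[i, i]  # fit coordinate i to that residual
    return x

M = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x = gauss_seidel(M, b)
```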
Restoration
• Piano
  - Signal with missing samples (37)
  - Reconstruction: 7.68 dB improvement
  - Original
• Trumpet
  - Signal with missing samples (37)
  - Reconstruction: 7.10 dB improvement
  - Original
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
[Figure: for each $\nu = 1, \dots, W$, an indicator chain $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and a state chain $\theta^\nu_0, \dots, \theta^\nu_k, \dots, \theta^\nu_K$, all projecting onto the observations $y_1, \dots, y_K$]
• Generalises source-filter models
Harmonic model with changepoints
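$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \quad r_k \in \{0, 1\}$
$\theta_k \mid \theta_{k-1}, r_k \sim [r_k = 0]\, \underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\, \underbrace{\mathcal{N}(0, S)}_{\text{new}}$
$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$
$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^N, \qquad G_\omega = \rho_k \begin{pmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{pmatrix}$
with damping factor $0 < \rho_k < 1$, frame length $N$, and damped sinusoidal basis matrix $C$ of size $N \times 2H$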
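The slide leaves the exact form of $C$ open. A natural choice, assumed here, is to let row $n$ of $C$ read the first coordinate of each harmonic's state advanced by $n$ samples. With that choice, the transition $A$ (blocks $(G_\omega^h)^N$) advances the state by exactly one frame. Consecutive frames under the "reg" branch then join into one continuous damped harmonic signal. All function names are illustrative, and the sketch leaves out the noise terms and the changepoint branch.

```python
import numpy as np

def rotation(omega, rho):
    """G_omega = rho * [[cos w, -sin w], [sin w, cos w]]."""
    c, s = np.cos(omega), np.sin(omega)
    return rho * np.array([[c, -s], [s, c]])

def harmonic_transition(omega, rho, H, N):
    """Block-diagonal A with blocks (G_omega^h)^N, h = 1..H (one frame = N samples)."""
    A = np.zeros((2 * H, 2 * H))
    for h in range(1, H + 1):
        Gh = np.linalg.matrix_power(rotation(omega, rho), h)
        A[2 * h - 2:2 * h, 2 * h - 2:2 * h] = np.linalg.matrix_power(Gh, N)
    return A

def harmonic_basis(omega, rho, H, N):
    """Hypothetical N x 2H basis: row n holds the first row of (G_omega^h)^n for each h."""
    C = np.zeros((N, 2 * H))
    for h in range(1, H + 1):
        Gh = np.linalg.matrix_power(rotation(omega, rho), h)
        P = np.eye(2)
        for n in range(N):
            C[n, 2 * h - 2:2 * h] = P[0]
            P = Gh @ P
    return C

omega, rho, H, N = 0.3, 0.999, 3, 16
A = harmonic_transition(omega, rho, H, N)
C = harmonic_basis(omega, rho, H, N)
theta = np.random.default_rng(0).normal(size=2 * H)
# Two consecutive frames under the "reg" branch (no changepoint, no process noise)
y_two_frames = np.concatenate([C @ theta, C @ (A @ theta)])
```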
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
  - Conditional Gaussians are not closed under marginalization
  ⇒ Unlike HMMs or KFMs, summing over $r_k$ does not simplify the filtering density
  ⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time/space
  - Intuition: when a changepoint occurs, the state vector $\theta$ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; Ó Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)
[Figure: example with $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$, state chain $\theta_0, \dots, \theta_5$ and observations $y_1, \dots, y_5$]
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
  ⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
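The polynomial growth can be seen in a deliberately simplified setting. The sketch assumes a scalar $\theta$, $A = 1$, independent changepoints with probability $\pi$ instead of a Markov prior, and hypothetical noise levels. Because a changepoint reinitialises $\theta$, all histories ending in a changepoint at time $k$ collapse into a single Gaussian kernel. After $k$ steps, the exact filtering density is therefore a mixture of only $k + 1$ kernels, each labelled by the time of the last changepoint. This is a sketch of the idea, not the algorithm of any of the cited papers.

```python
import numpy as np

def changepoint_filter(y, pi=0.05, q=0.0, S=100.0, R=0.1):
    """Exact filtering for a hypothetical scalar changepoint model:
    r_k ~ Bernoulli(pi) independently (a simplification of the Markov prior),
    theta_k = theta_{k-1} + N(0, q) if r_k = 0,  theta_k ~ N(0, S) if r_k = 1,
    y_k ~ N(theta_k, R).
    Each Gaussian kernel is labelled by the time of the last changepoint."""
    starts = [0]                 # theta_0 ~ N(0, S) counts as a "changepoint" at time 0
    logw = np.array([0.0])
    mean = np.array([0.0])
    var = np.array([S])
    for k, yk in enumerate(y, start=1):
        # Predict: every kernel either continues (prob 1 - pi) ...
        log_total = np.logaddexp.reduce(logw)
        logw = np.append(logw + np.log1p(-pi), np.log(pi) + log_total)
        # ... or all histories merge into ONE reinitialised kernel (prob pi)
        starts.append(k)
        mean = np.append(mean, 0.0)
        var = np.append(var + q, S)
        # Update every kernel with y_k (scalar Kalman update) and reweight by its evidence
        pred_var = var + R
        logw += -0.5 * (np.log(2 * np.pi * pred_var) + (yk - mean) ** 2 / pred_var)
        gain = var / pred_var
        mean = mean + gain * (yk - mean)
        var = (1.0 - gain) * var
        logw -= np.logaddexp.reduce(logw)
    return starts, np.exp(logw), mean, var

y = np.r_[np.zeros(10), 10.0 * np.ones(10)]   # a jump at time step 11
starts, w, m, v = changepoint_filter(y)
best = int(np.argmax(w))
```

The most probable kernel is the one that starts at the jump, and its mean sits at the new level.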
Monophonic model (Cemgil et al. 2006)
• We introduce a pitch label indicator $m$
• At each time $k$ the process can be in one of the {“mute”, “sound”} × $M$ states
[Figure: graphical model with chains $r_0, \dots, r_T$, $m_0, \dots, m_T$, $s_0, \dots, s_T$ and observations $y_1, \dots, y_T$]
Monophonic Pitch Tracking
Monophonic pitch tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$
[Figure: observed signal and estimated pitch labels versus time, 0 to 1000]
• If the pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
[Figure: transcription obtained by exact inference]
Tracking Pitch Variations
• Allow $m$ to change with $k$
• Intractable; need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: latent indicators over frequency $\nu$ and time $k$, and the generated signal $x_k$]
• Each latent changepoint process $\nu = 1, \dots, W$ corresponds to a “piano key”. The indicators $r_{1:W, 1:K}$ encode a latent “piano roll”
Single time slice - Bayesian Variable Selection
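$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$
$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$
$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$
$C \equiv [\, C_1 \ \cdots \ C_i \ \cdots \ C_W \,]$
[Figure: indicators $r_1, \dots, r_W$ select coefficients $s_1, \dots, s_W$, which generate $x$]
• Generalized linear model: the columns of $C$ are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When $W$ is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman; Attias)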
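For small $W$, the $2^W$-component posterior can simply be enumerated, which shows why this becomes intractable as $W$ grows. The sketch makes several assumptions:

- each coefficient $s_i$ is scalar, with prior variance `sigma2` playing the role of $\Sigma$;
- the noise is isotropic, $R I$;
- $\pi_{\text{on}}$ takes a hypothetical value.

Integrating out $s$ gives $x \mid r \sim \mathcal{N}(0, \sigma^2 C_r C_r^\top + R I)$, where $C_r$ keeps the columns that are on.

```python
import itertools
import numpy as np

def log_gauss(x, cov):
    """log N(x; 0, cov)."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(cov, x))

def posterior_over_indicators(x, C, sigma2=4.0, R=0.01, pi_on=0.3):
    """Enumerate all 2^W indicator vectors r and return p(r | x)."""
    D, W = C.shape
    configs, logp = [], []
    for r in itertools.product([0, 1], repeat=W):
        on = np.array(r, dtype=bool)
        C_on = C[:, on]
        cov = sigma2 * C_on @ C_on.T + R * np.eye(D)   # s integrated out
        log_prior = np.sum(np.where(on, np.log(pi_on), np.log(1 - pi_on)))
        configs.append(r)
        logp.append(log_prior + log_gauss(x, cov))
    logp = np.array(logp)
    return configs, np.exp(logp - np.logaddexp.reduce(logp))

C = np.eye(4)[:, :3]                 # hypothetical basis: 3 orthonormal columns in R^4
x = np.array([0.0, 5.0, 0.0, 0.0])   # generated by the second column only
configs, p = posterior_over_indicators(x, C)
```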
Factorial Switching State space model
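$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$
$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$
$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$ (changepoint indicator)
$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\, \theta_{k-1,\nu}, Q_\nu(r_k))$ (latent state)
$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R)$ (observation)
[Figure: for each $\nu = 1, \dots, W$, an indicator chain $r^\nu_0, \dots, r^\nu_K$ and a state chain $s^\nu_0, \dots, s^\nu_K$, jointly generating the observations $y_1, \dots, y_K$]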
Synthetic Data
[Figure: synthetic example; indicators over frequency $\nu$ and time $k$, and the generated signal $x$]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable: computations with large matrices
  - Need advanced techniques from linear algebra
  - Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical - acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, Generalised Linear Model
• Feature based
Spectrogram
• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau)$ = (frequency, time), such as STFT or MDCT
$x(t) = \sum_k s_k \phi_k(t)$
[Figure: spectrograms of (Speech) and (Piano), frequency $f$ (Hz) versus time $t$]
• A spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of the STFT)
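Below is a minimal way to compute the quantities a spectrogram displays. It assumes a Hann-windowed STFT with a hypothetical frame length, hop and sampling rate. An MDCT would need a different transform.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Hann-windowed STFT; returns coefficients s[nu, tau] (frequency bin x frame)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[t * hop:t * hop + frame_len] * window for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

fs = 8000                                  # hypothetical sampling rate (Hz)
t = np.arange(fs) / fs                     # one second of audio
x = np.sin(2 * np.pi * 1000 * t)           # 1 kHz test tone
s = stft(x)
log_spectrogram = np.log(np.abs(s) + 1e-12)   # "log |s_k|" display
power_spectrogram = np.abs(s) ** 2             # "|s_k|^2" display
```

With 256-sample frames at 8 kHz the bin spacing is 31.25 Hz, so the 1 kHz tone peaks in bin 32.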
Models for time-frequency Energy distributions
• Non-Negative Matrix Factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004)
$X_{\nu,\tau} = \sum_j W_{\nu,j} S_{j,\tau}$
Spectrogram = Spectral Templates × Excitations
  - however, spectrograms are not additive ($a^2 + b^2 \neq (a+b)^2$)
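A common way to fit $X \approx W S$ is the Lee and Seung multiplicative update for the generalised KL divergence. This is also the maximum likelihood update under a Poisson observation model. The sketch below applies that standard update to a synthetic nonnegative matrix. Names and sizes are illustrative, and the slide does not say which NMF cost the cited works use.

```python
import numpy as np

def kl_divergence(X, V, eps=1e-12):
    """Generalised KL divergence D(X || V) = sum X log(X/V) - X + V."""
    return float(np.sum(X * np.log((X + eps) / (V + eps)) - X + V))

def kl_nmf(X, J, n_iter=200, seed=0, eps=1e-12):
    """Multiplicative updates (Lee and Seung) for X ~ W S with W >= 0, S >= 0."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.random((F, J)) + eps   # spectral templates
    S = rng.random((J, T)) + eps   # excitations
    for _ in range(n_iter):
        V = W @ S + eps
        W *= ((X / V) @ S.T) / (S.sum(axis=1) + eps)
        V = W @ S + eps
        S *= (W.T @ (X / V)) / (W.sum(axis=0)[:, None] + eps)
    return W, S

rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 8))   # synthetic nonnegative rank-2 "spectrogram"
W1, S1 = kl_nmf(X, J=2, n_iter=1)
W, S = kl_nmf(X, J=2, n_iter=500)
```

The updates never increase the divergence, so running longer from the same initialisation can only improve the fit.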
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005)
$X_{\nu,\tau} = [r_{\nu,\tau} = 0]\, S^{(0)}_{\nu,\tau} + [r_{\nu,\tau} = 1]\, S^{(1)}_{\nu,\tau}$
Spectrogram = Mask × Source 0 + (1 − Mask) × Source 1
  - however, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms
$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$
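One channel source separation: Gaussian source model
[Figure: for $k = 1, \dots, K$, variances $v_{k,1}, \dots, v_{k,N}$ generate sources $s_{k,1}, \dots, s_{k,N}$, which sum to the observation $x_k$]
$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• A straightforward application of Bayes' theorem yields
$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})\big), \qquad \kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$ (responsibilities)
• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$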
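The responsibility formula amounts to a Wiener-style split of each observed coefficient. Here is a minimal sketch for $K$ coefficients and $N$ sources, with illustrative values; `separate_gaussian` is a hypothetical helper name.

```python
import numpy as np

def separate_gaussian(x, v):
    """Posterior of s_{k,n} given x_k = sum_n s_{k,n}, with s_{k,n} ~ N(0, v_{k,n}).
    x: (K,) observed coefficients; v: (K, N) source variances."""
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities kappa_{k,n}
    mean = kappa * x[:, None]                  # kappa_{k,n} x_k
    var = v * (1.0 - kappa)                    # v_{k,n} (1 - kappa_{k,n})
    return mean, var, kappa

x = np.array([2.0, -1.0])
v = np.array([[1.0, 3.0],
              [2.0, 2.0]])
mean, var, kappa = separate_gaussian(x, v)
```

The posterior means always add back up to $x_k$, so the mixture is split rather than re-synthesised.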
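One channel source separation: Poisson source model
[Figure: same structure as the Gaussian model; variances $v_{k,1:N}$ generate sources $s_{k,1:N}$, which sum to $x_k$, for $k = 1, \dots, K$]
$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• This is the generative model for NMF when we write $v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (Template × Excitation)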
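Under the Poisson model the same responsibilities appear. Conditioned on their sum, independent Poisson counts are multinomially distributed: $s_{k,1:N} \mid x_k \sim \text{Multinomial}(x_k; \kappa_{k,1:N})$ with $\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$. The sketch uses hypothetical intensities. It checks the result by brute-force enumeration for two sources and draws one sample for three sources.

```python
import math
import numpy as np

def poisson_posterior_two_sources(x, v1, v2):
    """p(s_1 = s | s_1 + s_2 = x) for s_n ~ Poisson(v_n), by direct enumeration over s = 0..x."""
    w = np.array([v1 ** s / math.factorial(s) * v2 ** (x - s) / math.factorial(x - s)
                  for s in range(x + 1)])
    return w / w.sum()

def binomial_pmf(x, p):
    return np.array([math.comb(x, s) * p ** s * (1 - p) ** (x - s) for s in range(x + 1)])

def sample_sources(x, v, rng):
    """Draw (s_1, ..., s_N) | x ~ Multinomial(x, kappa), kappa_n = v_n / sum(v)."""
    kappa = np.asarray(v, dtype=float) / np.sum(v)
    return rng.multinomial(x, kappa)

post = poisson_posterior_two_sources(5, 1.0, 3.0)
ref = binomial_pmf(5, 0.25)        # kappa_1 = 1 / (1 + 3)
s = sample_sources(10, [1.0, 2.0, 7.0], np.random.default_rng(0))
```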
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: example densities $p(x)$ on $0 \le x \le 5$; one panel with $a = 0.9, 1, 1.3, 2$ and $b = 1$, another with $(a, b) = (1, 1), (1, 0.5), (2, 1)$]
$\mathcal{G}(x; a, z) \equiv \exp\big((a-1)\log x - z^{-1} x + a \log z^{-1} - \log\Gamma(a)\big)$
$\mathcal{IG}(x; a, z) \equiv \exp\big((a+1)\log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log\Gamma(a)\big)$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
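With these definitions, $z$ is a scale parameter. $\mathcal{G}(x; a, z)$ is the usual Gamma density with shape $a$ and scale $z$, which is what NumPy's `Generator.gamma(shape, scale)` samples. Moreover, $u \sim \mathcal{G}(a, z)$ implies $1/u \sim \mathcal{IG}(a, z)$. The sketch checks the normalisation and this change of variables numerically; the helper names are illustrative.

```python
import math
import numpy as np

def log_gamma_pdf(x, a, z):
    """log G(x; a, z): shape a, scale z, as defined on the slide."""
    return (a - 1) * np.log(x) - x / z + a * np.log(1.0 / z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z), as defined on the slide."""
    return (a + 1) * np.log(1.0 / x) - 1.0 / (z * x) + a * np.log(1.0 / z) - math.lgamma(a)

def trapezoid(f, x):
    """Trapezoidal rule for samples f on grid x."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

x = np.linspace(1e-6, 60.0, 200001)
mass_gamma = trapezoid(np.exp(log_gamma_pdf(x, 2.0, 1.0)), x)

# If u ~ G(a, z) then 1/u ~ IG(a, z): by change of variables IG(y) = G(1/y) / y^2
y = np.linspace(0.1, 5.0, 50)
lhs = np.exp(log_inv_gamma_pdf(y, 2.0, 0.5))
rhs = np.exp(log_gamma_pdf(1.0 / y, 2.0, 0.5)) / y ** 2
```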
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:
$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$
$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$
[Figure: chain $z_1, \dots, v_{k-1}, z_k, v_k, z_{k+1}, \dots$ with coupling parameters $a_z$, $a$, $a_z$ on successive edges]
• Variance variables $v$ are the prior variances of the sources
• Auxiliary variables $z$ are needed for conjugacy and positive correlation
• Shape parameters $a$ and $a_z$ describe the coupling strength and drift of the chain
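Ancestral sampling from the chain only needs Gamma draws. Under the parametrisation used here, $v \sim \mathcal{IG}(v; a, z)$ exactly when $1/v$ follows a Gamma distribution with shape $a$ and scale $z$. The sketch below assumes a hypothetical starting value $z_1 = 1$. Larger $a$ and $a_z$ give smoother trajectories of $\log v_k$, and unequal $a$ and $a_z$ introduce a drift, in line with the bullets above.

```python
import numpy as np

def sample_igmc(K, a=10.0, az=10.0, z1=1.0, seed=0):
    """Draw v_1..v_K from the inverse Gamma-Markov chain
    v_k | z_k ~ IG(v_k; a, z_k / a),  z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z),
    using IG(x; a, z) <=> 1/x ~ Gamma(shape a, scale z). z1 is a hypothetical start value."""
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = z1
    for k in range(K):
        v[k] = 1.0 / rng.gamma(a, z / a)         # E[1/v_k | z_k] = z_k
        z = 1.0 / rng.gamma(az, v[k] / az)       # E[1/z_{k+1} | v_k] = v_k
    return v

# Draws comparable to the "typical draws" settings: a = 10 with a_z = 10, 4, 40
log_v = {az: np.log(sample_igmc(1000, a=10.0, az=az)) for az in (10.0, 4.0, 40.0)}
```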
Gamma Chains: typical draws
[Figure: three typical draws of $\log v_k$, $k = 1, \dots, 1000$, for $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains with changepoints: typical draws
[Figure: three typical draws of $\log v_k$, $k = 1, \dots, 1000$, for $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (pairwise)
$\phi_{z_k} = \exp\big((a_z + a + 1) \log z_k^{-1}\big)$ (singletons)
[Figure: chain $z_1, \dots, v_{k-1}, z_k, v_k, z_{k+1}, \dots$ with couplings $a_z$, $a$, $a_z$]
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$\psi_{i,j} = \exp(-a_{i,j}\, \xi_i^{-1} \xi_j^{-1})$ (pairwise)
$\phi_i = \exp\big((\textstyle\sum_j a_{i,j} + 1) \log \xi_i^{-1}\big)$ (singletons)
Possible Model Topologies
Approximate Inference
• Stochastic
  - Markov Chain Monte Carlo: Gibbs sampler
  - Sequential Monte Carlo: Particle Filtering
• Deterministic
  - Variational Bayes
In all of these, conjugacy helps
VB or Gibbs
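[Figure: factor chain $\psi_{0,1}$, $z_1$, $\psi_{1,1}$, $v_1$, $\psi_{1,2}$, $z_2$, $\psi_{2,2}$, $v_2$, ..., with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$q^{(\tau)}(v_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\big)$
• Gibbs
$v^{(\tau)}_k \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z^{(\tau)}_k)\, \psi_{k,k+1}(z^{(\tau)}_{k+1})$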
VB or Gibbs
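[Figure: factor chain $\psi_{0,1}$, $z_1$, $\psi_{1,1}$, $v_1$, $\psi_{1,2}$, $z_2$, $\psi_{2,2}$, $v_2$, ..., with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\big)$
• Gibbs
$z^{(\tau)}_k \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v^{(\tau)}_{k-1})\, \psi_{k,k}(v^{(\tau)}_k)$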
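Collecting the potentials that touch each variable gives conjugate full conditionals. The sketch assumes Gaussian observations $y_k \sim \mathcal{N}(0, v_k)$. Here "rate" means $z^{-1}$ in the slide's $\mathcal{IG}(x; a, z)$ notation. The full conditionals are then:

- $z_k \mid v_{k-1}, v_k$ is inverse Gamma with shape $a_z + a$ and rate $a_z/v_{k-1} + a/v_k$;
- $v_k \mid z_k, z_{k+1}, y_k$ is inverse Gamma with shape $a + a_z + 1/2$ and rate $a/z_k + a_z/z_{k+1} + y_k^2/2$.

The $z$'s are conditionally independent given the $v$'s, and vice versa, so each half-sweep can be drawn in one vectorised step. This is a minimal sketch; `v0` is a hypothetical fixed value standing in for $v_0$.

```python
import numpy as np

def inv_gamma(rng, shape, rate):
    """Draw from the inverse Gamma density proportional to x^-(shape+1) exp(-rate/x)."""
    return 1.0 / rng.gamma(shape, 1.0 / rate)

def gibbs_igmc(y, a=10.0, az=10.0, n_sweeps=200, v0=1.0, seed=0):
    """Blocked Gibbs sampler for z_1, v_1, z_2, v_2, ... with y_k ~ N(0, v_k)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    K = len(y)
    v = np.maximum(y ** 2, 1e-6)               # crude initialisation
    samples = np.empty((n_sweeps, K))
    for it in range(n_sweeps):
        # z_k | v_{k-1}, v_k ~ IG(shape a_z + a, rate a_z / v_{k-1} + a / v_k)
        v_left = np.r_[v0, v[:-1]]
        z = inv_gamma(rng, az + a, az / v_left + a / v)
        # v_k | z_k, z_{k+1}, y_k ~ IG(shape a + a_z + 1/2, rate a / z_k + a_z / z_{k+1} + y_k^2 / 2)
        shape = np.full(K, a + az + 0.5)
        shape[-1] = a + 0.5                    # v_K has no right neighbour z_{K+1}
        rate = a / z + 0.5 * y ** 2
        rate[:-1] += az / z[1:]
        v = inv_gamma(rng, shape, rate)
        samples[it] = v
    return samples
```

A mean-field VB scheme would use the same shapes and replace the sampled neighbours in the rates by their expectations.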
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: spectrograms of the Noisy ($X$) and Original ($X_{org}$) signals and of the estimates $X_h$ (SNR 19.98), $X_v$ (SNR 20.79), $X_b$ (SNR 19.68) and $X_g$ (SNR 19.97)]
Denoising - Music
[Figure: spectrograms of the Original and Noisy signals and of the reconstructions PF (SNR 8.53), Gibbs (SNR 8.66) and VB]
“Tristram” (Matt Uelmen) + ~0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal tie across time (harmonic continuity)
• Source 2: vertical tie across frequency (transients, percussive sounds)
[Figure: mixture $X_{org}$ and estimated sources $S_{hor}$ and $S_{ver}$, frequency bin ($\nu$) versus time ($\tau$)]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai” (King Crimson) + Drums “Territory” (Sepultura) = Mix
                 s1                        s2
             SDR     SIR     SAR      SDR     SIR     SAR
VB          -4.74   -3.28    5.67    -1.58   15.46   -1.37
Gibbs       -4.5    -2.62    4.57     1.05   12.46    1.61
GibbsEM     -4.23   -2.42    4.82     1.34   13.13    1.85
Pre-trained -4.04   -3.15    8.13     3.56   11.44    4.64
Oracle       6.14   17.16    6.58    12.66   19.95   13.6
(SDR, SIR and SAR in dB)
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet” (Änglagård) + “Moby Dick” (Led Zeppelin) = Mix
                 s1                        s2
             SDR     SIR     SAR      SDR     SIR     SAR
VB          -7.8    -6.22    4.53    -2.35   18.4    -2.25
Gibbs       -8.46   -7.53    6.93    -4.04   14.59   -3.83
GibbsEM     -7.74   -6.19    4.62    -1.14   16.62   -0.97
Pre-trained -6.4    -5.39    6.95     3.8    16.39    4.14
Oracle      12.1    32.9    12.14    21.13   33.89   21.37
(SDR, SIR and SAR in dB)
Harmonic-Transient Decomposition
[Figure: mixture $X_{org}$ and the horizontal ($S_{hor}$) and vertical ($S_{ver}$) components, frequency bin ($\nu$) versus time ($\tau$)]
Chord Detection - Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ versus frequency $\nu$ (Hz), 0 to 4000 Hz]
Chord Detection
[Figure: MDCT of a piano chord with MIDI notes 41, 48, 51, 56 (frequency in Hz vs. time $\tau$ in s), and $\log \sum_\nu v_{\nu,j,\tau}$ (MIDI note $j$ vs. time $\tau$ in s)]
Multichannel Source Separation
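• Hierarchical prior model (Févotte and Godsill 2005; Cemgil et al. 2006)
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$
$v_{k,n} \sim \mathcal{IG}\big(v_{k,n}; \nu/2, 2/(\nu\lambda_n)\big)$
$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m), \quad k = 1, \dots, K$
$a_m \sim \mathcal{N}(a_m; \cdots), \quad r_m \sim \mathcal{IG}(r_m; \cdots), \quad m = 1, \dots, M$
[Figure: graphical model with $\lambda_1, \dots, \lambda_N$, variances $v_{k,1:N}$, sources $s_{k,1:N}$ and mixtures $x_{k,1:M}$ in a plate over $k = 1, \dots, K$; channel parameters $a_1, r_1, \dots, a_M, r_M$]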
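Ancestral sampling from the prior makes the role of $\lambda_n$ visible. With the slide's parametrisation, $E[1/v_{k,n}] = 1/\lambda_n$, so $\lambda_n$ sets the typical variance of source $n$. Integrating out $v_{k,n}$ gives a heavy-tailed Student-t marginal for $s_{k,n}$. The priors on the mixing vectors and noise variances are elided on the slide, so the sketch fixes them to hypothetical values. Here $\lambda_n$ is drawn as a Gamma variable with $b_\lambda$ as the scale.

```python
import numpy as np

def sample_multichannel_prior(K=20000, N=3, M=2, a_lam=2.0, b_lam=1.0, nu=4.0, r=0.01, seed=0):
    """Ancestral sampling from the hierarchical prior; A and r are hypothetical fixed values."""
    rng = np.random.default_rng(seed)
    lam = rng.gamma(a_lam, b_lam, size=N)                         # lambda_n ~ G(a_lam, b_lam)
    v = 1.0 / rng.gamma(nu / 2, 2.0 / (nu * lam), size=(K, N))    # v_{k,n} ~ IG(nu/2, 2/(nu*lambda_n))
    s = rng.normal(0.0, np.sqrt(v))                               # s_{k,n} ~ N(0, v_{k,n})
    A = rng.normal(size=(M, N))                                   # rows play the role of a_m
    x = s @ A.T + rng.normal(0.0, np.sqrt(r), size=(K, M))        # x_{k,m} ~ N(a_m^T s_k, r)
    return lam, v, s, A, x

lam, v, s, A, x = sample_multichannel_prior()
```

With $M = 2$ channels and $N = 3$ sources the mixture is underdetermined, which is the typical situation discussed later in the talk.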
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$
Source Separation
[Figure: spectrograms (frequency $f$ in Hz vs. time $t$) of the sources (Speech), (Piano), (Guitar) and of the (Mix)]
Reconstructions
[Figure: spectrograms of the reconstructed (Speech), (Piano) and (Guitar) sources, frequency $f$ in Hz vs. time $t$]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: traces of $a$, $\lambda$ and $r$ versus epoch, 0 to 2000]
Tempo tracking and score performance matching
• Given expressive music data (onsets/detections/spectral features)
  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hands
  - Create a quantized/human-readable score
  - ...
• Online/realtime or offline/batch
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: state grid with tempo level $n_k \in \{1, 2, 3\}$ (rows) and bar position $m_k \in \{1, \dots, 8\}$ (columns), with bar divisions for 3/4 and 4/4 time]
• Each dot denotes a state $x = (m, n)$ (score position, tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three examples of the transition density $p(x_2 \mid x_1)$ over bar position (1 to 8) and tempo level (1 to 5)]
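Below is a sketch of a transition matrix over the states $x = (m, n)$. It assumes dynamics that match the description of the pointer: the bar position advances by the tempo level (modulo the bar length), and the tempo level occasionally moves by one step. The bar length, number of tempo levels and `p_tempo` are hypothetical. The actual model of Whiteley, Cemgil and Godsill may differ in detail.

```python
import numpy as np

def bar_pointer_transition(M=8, N=5, p_tempo=0.1):
    """Transition matrix over states x = (m, n): bar position m in 1..M, tempo level n in 1..N.
    Assumed dynamics: m advances by n positions (mod M); n moves to n-1 or n+1 with
    probability p_tempo/2 each, and a move past the boundary leaves n unchanged."""
    def idx(m, n):
        return (n - 1) * M + (m - 1)

    T = np.zeros((M * N, M * N))
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            m_next = (m - 1 + n) % M + 1
            for n_next, p in ((n - 1, p_tempo / 2), (n, 1.0 - p_tempo), (n + 1, p_tempo / 2)):
                n_next = min(max(n_next, 1), N)   # stay at the boundary tempo instead
                T[idx(m, n), idx(m_next, n_next)] += p
    return T

T = bar_pointer_transition()
T_fixed_tempo = bar_pointer_transition(p_tempo=0.0)
```

With `p_tempo=0` each row has a single nonzero entry, which is the deterministic pointer motion drawn as arcs in the state grid.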
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: $p(x_3)$ over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: $p(x_4)$ over bar position and tempo level]
Bar position Pointer - k = 5
[Figure: filtering density $p(x_5 \mid y_{1:5})$ over bar position and tempo level, after observing $y_5 = 0$]
Bar position Pointer - k = 10
[Figure: $p(x_{10})$ over bar position and tempo level]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson intensity
[Figure: Poisson intensity $\mu_k$ versus bar position $m_k$ (0 to 1000), for a triplet rhythm and a duplet rhythm]
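Filtering in this HMM is the standard normalised forward recursion. The sketch makes the following assumptions, all of them illustrative (sizes and intensities included):

- bar-pointer dynamics: the position advances by the tempo level, and the tempo drifts by one level with small probability;
- a Poisson intensity profile that is high on two beats of the bar;
- as input, an onset on every second frame.

```python
import math
import numpy as np

def transition(M, N, p_tempo):
    """Assumed bar-pointer dynamics: position m advances by (tempo index + 1), tempo drifts by +-1."""
    T = np.zeros((M * N, M * N))
    for n in range(N):              # 0-based tempo index; state index = n * M + m
        for m in range(M):
            for n2, p in ((n - 1, p_tempo / 2), (n, 1.0 - p_tempo), (n + 1, p_tempo / 2)):
                n2 = min(max(n2, 0), N - 1)
                T[n * M + m, n2 * M + (m + n + 1) % M] += p
    return T

def poisson_loglik(y, mu):
    """log PO(y; mu) for a count y and an array of intensities mu."""
    return y * np.log(mu) - mu - math.lgamma(y + 1)

def forward_filter(y, T, mu_state):
    """Normalised forward recursion: returns p(x_k | y_{1:k}) for every k."""
    alpha = np.full(T.shape[0], 1.0 / T.shape[0])              # uniform prior over states
    filtered = []
    for yk in y:
        alpha = alpha @ T                                      # predict
        alpha = alpha * np.exp(poisson_loglik(yk, mu_state))   # correct with p(y_k | x_k)
        alpha /= alpha.sum()
        filtered.append(alpha)
    return np.array(filtered)

M, N = 8, 5
mu_bar = np.where(np.arange(M) % 4 == 0, 3.0, 0.1)   # hypothetical intensity: high on two beats
mu_state = np.tile(mu_bar, N)                        # same profile for every tempo level
T = transition(M, N, p_tempo=0.05)
y = np.tile([3, 0], 10)                              # an onset on every second frame
post = forward_filter(y, T, mu_state)
tempo_marginal = post[-1].reshape(N, M).sum(axis=1)
```

With 0-based indexing, tempo index 1 advances the pointer by two positions per frame. It is the only tempo that places an onset on every second frame, so the filtered tempo marginal concentrates there. Running a backward pass over the same quantities would give the smoothed densities.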
Tempo Rhythm Meter analysis
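Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Figure: dynamic Bayesian network with chains $n_0, \dots, n_3$, $\theta_0, \dots, \theta_3$, $m_0, \dots, m_3$, $r_0, \dots, r_3$, intensities $\lambda_1, \dots, \lambda_3$ and observations $y_1, \dots, y_3$]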
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
  - Time-Frequency Energy distributions
• Ongoing Work
  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    * Chord detection, Polyphonic transcription
    * Musical Score guided source separation
  - Prior structures for other observation models, NMF
Hierarchical Modeling of Music
[Figure: hierarchical graphical model of music with chains over time t for the variables v, k, h, m, g_j, r_j, n_j, x_j, y_j and observations y_1, y_2, ..., y_t]
Modelling levels
• Physical - acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, Generalised Linear model
• Feature Based
Research Questions
What kinds of prior knowledge and modelling techniques are useful?
How can we do efficient inference?
Signal Models for Audio
• Time domain: state space dynamical models
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA), switching state space models
  - Flexible, physically realistic
  - Analysis down to sample precision; computationally quite heavy
• Transform domain: Fourier representations, Generalised Linear model
  - Models on (orthogonal) transform coefficients; energy compaction
  - Practical: can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)
Sinusoidal Modeling
• Sound is primarily about oscillations and resonance
• Cascade of second-order systems
• Audio signals can often be compactly represented by sinusoids
$$\text{(real)}\quad y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$$
$$\text{(complex)}\quad y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n$$
$$y = F(\gamma_{1:p}, \omega_{1:p})\, c$$
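As a quick numerical check of the two forms above, the sketch below builds the same damped partials both ways. It assumes complex amplitudes c_k = α_k e^{jφ_k}, so the real signal is the real part of the complex form (equivalently, each complex term paired with its conjugate at half amplitude). The function names and parameter values are illustrative only.

```python
import cmath
import math


def real_form(alphas, gammas, omegas, phis, n):
    """(real) y_n = sum_k alpha_k exp(-gamma_k n) cos(omega_k n + phi_k)."""
    return sum(
        a * math.exp(-g * n) * math.cos(w * n + p)
        for a, g, w, p in zip(alphas, gammas, omegas, phis)
    )


def complex_form(cs, gammas, omegas, n):
    """(complex) y_n = sum_k c_k (exp(-gamma_k + j omega_k))^n."""
    return sum(c * cmath.exp(-g + 1j * w) ** n for c, g, w in zip(cs, gammas, omegas))


# Illustrative parameters for p = 2 damped partials.
alphas = [1.0, 0.5]
gammas = [0.01, 0.05]
omegas = [0.3, 1.1]
phis = [0.2, -0.7]
# Assumed mapping between the two forms: c_k = alpha_k exp(j phi_k).
cs = [a * cmath.exp(1j * p) for a, p in zip(alphas, phis)]

for n in range(4):
    print(n, round(real_form(alphas, gammas, omegas, phis, n), 6),
          round(complex_form(cs, gammas, omegas, n).real, 6))
```

The two printed columns agree, which is what makes the linear-in-c representation y = F(γ, ω) c possible.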
State space Parametrisation
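$$x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix}$$
$$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}}_{C} x_n$$
[Figure: Markov chain x_0 → x_1 → ... → x_{k-1} → x_k → ... → x_K with observations y_1, ..., y_{k-1}, y_k, ..., y_K]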
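The state-space form above can be sketched directly: iterate x_{n+1} = A x_n with a diagonal A and read out y_n as the sum of the state entries. The sketch below, with illustrative values, confirms that the recursion reproduces the closed form y_n = Σ_k c_k z_k^n, where z_k are the diagonal entries of A.

```python
import cmath


def poles(gammas, omegas):
    """Diagonal entries of A: exp(-gamma_k + j omega_k)."""
    return [cmath.exp(-g + 1j * w) for g, w in zip(gammas, omegas)]


def run_state_space(cs, gammas, omegas, num_samples):
    """Iterate x_{n+1} = A x_n from x_0 = c and read out y_n = C x_n, C = (1, ..., 1)."""
    diag = poles(gammas, omegas)
    x = list(cs)
    ys = []
    for _ in range(num_samples):
        ys.append(sum(x))                          # y_n = C x_n
        x = [d * xk for d, xk in zip(diag, x)]     # x_{n+1} = A x_n (A is diagonal)
    return ys


def closed_form(cs, gammas, omegas, n):
    """y_n = sum_k c_k z_k^n with z_k the diagonal entries of A."""
    return sum(c * z ** n for c, z in zip(cs, poles(gammas, omegas)))


cs = [1.0 + 0.5j, 0.3 - 0.2j]
gammas = [0.02, 0.1]
omegas = [0.4, 2.0]
ys = run_state_space(cs, gammas, omegas, 20)
```

Because A is diagonal, each step costs O(p); the same recursion is what a Kalman filter would propagate once process and observation noise are added.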
State Space Parametrisation
Audio Restoration/Interpolation
• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis
[Figure: example waveform with missing samples, sample index 0 to 500]
Audio Interpolation
$$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\, p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$$
$H \equiv$ (parameters, hidden states)
[Figure: graphical model with H as parent of the missing samples x_{¬κ} and the observed samples x_κ; example waveform, sample index 0 to 500]
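A minimal worked case of the interpolation posterior above, assuming H is fully known so the integral collapses: a stationary AR(1) prior x_n = a x_{n-1} + e_n with e_n ~ N(0, q) and a single missing sample between two observed neighbours. The AR(1) prior and the function names are hypothetical choices for illustration. The closed form is checked against generic Gaussian conditioning on the joint covariance q a^{|i-j|}/(1 - a^2).

```python
def ar1_interpolate(x_prev, x_next, a, q):
    """Posterior mean and variance of one missing sample x_n given x_{n-1}, x_{n+1}
    under a stationary AR(1) prior x_n = a x_{n-1} + e_n, e_n ~ N(0, q)."""
    mean = a * (x_prev + x_next) / (1.0 + a * a)
    var = q / (1.0 + a * a)
    return mean, var


def gaussian_condition_check(x_prev, x_next, a, q):
    """The same quantity via generic Gaussian conditioning on the joint covariance."""
    s = q / (1.0 - a * a)                 # stationary variance
    # Observed pair (x_{n-1}, x_{n+1}) has covariance s * [[1, a^2], [a^2, 1]].
    c11, c12 = s, s * a * a
    det = c11 * c11 - c12 * c12
    inv = [[c11 / det, -c12 / det], [-c12 / det, c11 / det]]
    cross = [s * a, s * a]                # Cov(x_n, x_{n-1}), Cov(x_n, x_{n+1})
    gain = [cross[0] * inv[0][0] + cross[1] * inv[1][0],
            cross[0] * inv[0][1] + cross[1] * inv[1][1]]
    mean = gain[0] * x_prev + gain[1] * x_next
    var = s - (gain[0] * cross[0] + gain[1] * cross[1])
    return mean, var


print(ar1_interpolate(1.2, -0.3, 0.9, 0.5))
print(gaussian_condition_check(1.2, -0.3, 0.9, 0.5))
```

With unknown parameters, the integral over H is what the talk's models approximate; the fixed-H case only shows the Gaussian conditioning step.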
Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
[Figure: graphical model with parameters A_ν, Q_ν, state chain s_0^ν, ..., s_k^ν, ..., s_{K-1}^ν for ν = 0, ..., W-1, and observations x_0, ..., x_k, ..., x_{K-1}]
$$s_k^\nu \sim \mathcal{N}(s_k^\nu;\, A_\nu s_{k-1}^\nu,\, Q_\nu), \qquad A_\nu \sim \mathcal{N}\left(A_\nu;\, \begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix},\, \Psi\right)$$
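To see what one band of the phase vocoder prior does, the sketch below draws s_k ~ N(A s_{k-1}, q I) for a single band. Two simplifications are assumptions, not part of the model above: A_ν is fixed at its prior mean (as if Ψ → 0), and Q_ν = q I. With q = 0 the state simply rotates by ω per step, so its first coordinate traces a cosine at frequency ω.

```python
import math
import random


def rotation(omega):
    """2x2 rotation matrix, the prior mean of A_nu."""
    return [[math.cos(omega), -math.sin(omega)],
            [math.sin(omega), math.cos(omega)]]


def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def simulate_band(omega, q, num_steps, s0=(1.0, 0.0), seed=0):
    """Draw s_k ~ N(A s_{k-1}, q I) with A fixed at the rotation by omega."""
    rng = random.Random(seed)
    a = rotation(omega)
    s = list(s0)
    states = [tuple(s)]
    for _ in range(num_steps):
        mean = matvec(a, s)
        s = [m + rng.gauss(0.0, math.sqrt(q)) for m in mean]
        states.append(tuple(s))
    return states


noisy = simulate_band(omega=0.3, q=0.01, num_steps=100)
```

Positive q lets amplitude and phase wander slowly, which is the point of a probabilistic rather than deterministic oscillator.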
Inference: Structured Variational Bayes
[Figure: variational factorisation with q(A_α), q(Q_α), a chain factor ∏_k q(s_k^α | s_{k-1}^α) over ..., s_{k-1}^α, s_k^α, s_{k+1}^α, ... for each α ∈ C, and q(x_k)]
• Intuitive algorithm
  - Subtract from the observed signal x the prediction of the frequency bands in ¬α (all bands other than α)
  - Compute a fit for α to this residual and iterate
• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
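Gauss-Seidel sweeps through the unknowns, solving for one coordinate at a time while the others stay at their most recent values. Reading each coordinate i as a "band" α, the inner update is exactly the intuitive algorithm above: subtract the other bands' contribution and fit the remaining one to the residual. A minimal sketch on a small symmetric positive definite system (illustrative values):

```python
def gauss_seidel(a, b, num_iters=100, x0=None):
    """Solve a x = b by Gauss-Seidel sweeps (converges for SPD a)."""
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    for _ in range(num_iters):
        for i in range(n):
            # Residual after removing the current fit of every other coordinate.
            residual = b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)
            # Refit coordinate i to that residual.
            x[i] = residual / a[i][i]
    return x


A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = gauss_seidel(A, b)
```

The exact solution of this system is (2/9, 1/9, 13/9).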
Restoration
• Piano
  - Signal with missing samples (37)
  - Reconstruction: 7.68 dB improvement
  - Original
• Trumpet
  - Signal with missing samples (37)
  - Reconstruction: 7.10 dB improvement
  - Original
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
[Figure: for each ν = 1, ..., W, an indicator chain r_0^ν, ..., r_k^ν, ..., r_K^ν driving a state chain θ_0^ν, ..., θ_k^ν, ..., θ_K^ν; all chains project onto the observations y_k, ..., y_K]
• Generalises source-filter models
Harmonic model with changepoints
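$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$
$$\theta_k \mid \theta_{k-1}, r_k \sim [r_k = 0]\, \underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\, \underbrace{\mathcal{N}(0, S)}_{\text{new}}$$
$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$
$$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^{N}, \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}$$
with damping factor $0 < \rho_k < 1$, frame length $N$, and damped sinusoidal basis matrix $C$ of size $N \times 2H$.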
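Reading A as block diagonal with H blocks of size 2×2, raised to the frame length N, block h equals ρ^{hN} times a rotation by hNω, so harmonic h advances h times as fast as the fundamental over one frame. The sketch below checks this numerically; treating ρ_k as a constant ρ and the helper names are assumptions for illustration.

```python
import math


def g_matrix(omega, rho):
    """Damped rotation G_omega = rho * [[cos w, -sin w], [sin w, cos w]]."""
    c, s = math.cos(omega), math.sin(omega)
    return [[rho * c, -rho * s], [rho * s, rho * c]]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matpow(m, n):
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = matmul(result, m)
    return result


def harmonic_transition_blocks(omega, rho, num_harmonics, frame_length):
    """Diagonal blocks of A = blockdiag(G, G^2, ..., G^H)^N."""
    g = g_matrix(omega, rho)
    return [matpow(matpow(g, h), frame_length) for h in range(1, num_harmonics + 1)]


blocks = harmonic_transition_blocks(omega=0.05, rho=0.999, num_harmonics=3, frame_length=16)
```

Storing only the blocks keeps the cost linear in H instead of working with the full 2H × 2H matrix.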
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
  - Conditional Gaussians are not closed under marginalization
  ⇒ Unlike HMMs or KFMs, summing over r_k does not simplify the filtering density
  ⇒ The number of Gaussian kernels needed to represent the exact filtering density p(r_k, θ_k | y_{1:k}) increases exponentially
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time/space
  - Intuition: when a changepoint occurs, the state vector θ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, Ó Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)
[Figure: example with r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0 over the state chain θ_0, ..., θ_5 and observations y_1, ..., y_5]
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
  ⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
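The polynomial-growth argument above can be made concrete by counting. If a changepoint reinitializes θ, the Gaussian component attached to a trajectory r_{1:k} depends only on when the last changepoint happened. The sketch below, a counting illustration only that assumes one kernel per distinct last-changepoint time, compares 2^k trajectories with the k + 1 distinct values that actually matter.

```python
from itertools import product


def count_kernels(k):
    """Return (#indicator trajectories r_{1:k}, #distinct last-changepoint times)."""
    trajectories = list(product((0, 1), repeat=k))
    last_change = set()
    for r in trajectories:
        # 1-based index of the most recent r_j = 1, or 0 if no changepoint yet.
        last = max((j + 1 for j, rj in enumerate(r) if rj == 1), default=0)
        last_change.add(last)
    return len(trajectories), len(last_change)


for k in (1, 5, 10):
    print(k, count_kernels(k))
```

Each filtering step therefore adds one kernel instead of doubling the count, giving O(k) work per step and O(k^2) overall.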
Monophonic model (Cemgil et al. 2006)
• We introduce a pitch label indicator m
• At each time k, the process can be in one of the {“mute”, “sound”} × M states
[Figure: graphical model with chains r_0, r_1, ..., r_T; m_0, m_1, ..., m_T; s_0, s_1, ..., s_T and observations y_1, ..., y_T]
Monophonic Pitch Tracking
Monophonic Pitch Tracking = Online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$
[Figure: audio signal and filtered pitch-label posterior, sample index 0 to 1000]
• If the pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
[Figure: exact inference result, sample index 0 to 3500]
Tracking Pitch Variations
• Allow m to change with k
[Figure: example with varying pitch, sample index 0 to 500]
• Intractable: need to resort to approximate inference (Mixture Kalman Filter / Rao-Blackwellized Particle Filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: latent indicators over frequency ν and time k, with the observed signal x_k]
• Each latent changepoint process ν = 1, ..., W corresponds to a “piano key”; the indicators r_{1:W, 1:K} encode a latent “piano roll”
Single time slice - Bayesian Variable Selection
$$r_i \sim \mathcal{C}(r_i;\, \pi_{\text{on}}, \pi_{\text{off}})$$
$$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i;\, 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$$
$$x \mid s_{1:W} \sim \mathcal{N}(x;\, C s_{1:W}, R), \qquad C \equiv [\, C_1 \, \cdots \, C_i \, \cdots \, C_W \,]$$
[Figure: indicators r_1, ..., r_W, coefficients s_1, ..., s_W and observation x]
• Generalized Linear Model: the columns of C are the basis vectors
• The exact posterior is a mixture of 2^W Gaussians
• When W is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman, Attias, ...)
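For small W the 2^W-component posterior above can be computed by brute force. The sketch below makes a simplifying assumption: x and each basis entry C_i are scalars, so integrating out s for a fixed configuration r gives x | r ~ N(0, R + Σ · Σ_{i on} C_i^2). It also assumes π_off = 1 - π_on. The returned dictionary holds the mixture weights p(r | x), one per Gaussian component.

```python
import math
from itertools import product


def log_normal(x, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


def indicator_posterior(x, basis, pi_on, sigma, noise_var):
    """Exact p(r_{1:W} | x) by enumerating all 2^W on/off configurations
    (scalar x and scalar basis entries, an illustrative simplification)."""
    configs = list(product((0, 1), repeat=len(basis)))
    log_w = []
    for r in configs:
        # Marginal variance of x once the active coefficients s_i are integrated out.
        var = noise_var + sigma * sum(c * c for c, ri in zip(basis, r) if ri)
        log_prior = sum(math.log(pi_on if ri else 1.0 - pi_on) for ri in r)
        log_w.append(log_prior + log_normal(x, var))
    top = max(log_w)                      # subtract the max for numerical stability
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return {r: wi / total for r, wi in zip(configs, w)}


post = indicator_posterior(x=2.5, basis=[1.0, 0.5, 2.0], pi_on=0.3, sigma=1.0, noise_var=0.1)
best = max(post, key=post.get)
```

The loop runs 2^W times, which is why this exact approach stops being usable as W grows.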
Factorial Switching State space model
$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu};\, \pi_{0,\nu})$$
$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu};\, \mu_\nu, P_\nu)$$
$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu};\, \pi_\nu(r_{k-1,\nu})) \qquad \text{Changepoint indicator}$$
$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu};\, A_\nu(r_k)\theta_{k-1,\nu},\, Q_\nu(r_k)) \qquad \text{Latent state}$$
$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k;\, C_k \theta_{k,1:W},\, R) \qquad \text{Observation}$$
[Figure: for each ν = 1, ..., W, an indicator chain r_0^ν, ..., r_K^ν and a latent state chain; all chains feed the observations y_k, ..., y_K]
Synthetic Data
[Figure: synthetic data: indicators over frequency ν and time k, with the signal x_k]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable: computations with large matrices
  - Need advanced techniques from linear algebra
  - Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical - acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, Generalised Linear model
• Feature Based
Spectrogram
• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT
$$x(t) = \sum_k s_k \phi_k(t)$$
[Figure: spectrograms (frequency f in Hz versus time) of (Speech) and (Piano)]
• Spectrogram displays $\log|s_k|$ or $|s_k|^2$ (of STFT)
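As one concrete choice of the basis functions above, the sketch computes STFT coefficients s_k for atoms k = (ν, τ) and returns |s_k|^2. The Hann window, frame length and hop are illustrative assumptions, and a direct O(N^2) DFT is used for readability instead of an FFT.

```python
import cmath
import math


def stft_power(x, frame_len, hop):
    """Return |s_k|^2 for atoms k = (nu, tau) of a Hann-windowed STFT.

    The result is indexed as power[tau][nu], with nu = 0 .. frame_len // 2.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    power = []
    for start in range(0, len(x) - frame_len + 1, hop):
        segment = [x[start + n] * window[n] for n in range(frame_len)]
        row = []
        for nu in range(frame_len // 2 + 1):
            s = sum(segment[n] * cmath.exp(-2j * math.pi * nu * n / frame_len)
                    for n in range(frame_len))
            row.append(abs(s) ** 2)
        power.append(row)
    return power


# A pure tone that falls exactly on DFT bin 8 of a 64-sample frame.
signal = [math.cos(2 * math.pi * 8 * n / 64) for n in range(256)]
power = stft_power(signal, frame_len=64, hop=32)
log_spec = [[10 * math.log10(p + 1e-12) for p in row] for row in power]  # dB display
```

Every frame peaks at bin 8, with the Hann window spreading some energy into bins 7 and 9.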
Models for time-frequency Energy distributions
• Non-Negative Matrix Factorisation (Sha, Saul and Lee 2002; Smaragdis and Brown 2003; Virtanen 2003; Abdallah and Plumbley 2004; ...)
$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$
Spectrogram = Spectral Templates × Excitations
  - however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic and Ellis 2005; ...)
$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$
Spectrogram = Mask × Source_0 + (1 − Mask) × Source_1
  - however, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main Idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)
• We place a suitable prior on the variance of the transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms
$$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$$
One channel source separation: Gaussian source model
[Figure: variances v_{k,1}, ..., v_{k,N}, source coefficients s_{k,1}, ..., s_{k,N} and observation x_k, plated over k = 1, ..., K]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n};\, 0, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$
• Straightforward application of Bayes' theorem yields
$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n};\, \kappa_{k,n} x_k,\, v_{k,n}(1 - \kappa_{k,n})\big), \qquad \kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \quad \text{(Responsibilities)}$$
• Each source coefficient s_{k,n} gets a fraction κ_{k,n} of the observation x_k
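The Gaussian posterior above is the familiar Wiener-filter split. A minimal sketch for a single atom k (the function name is illustrative): compute the responsibilities, then the posterior mean and variance of each source coefficient.

```python
def gaussian_split(x, variances):
    """Posterior of each s_n given x = sum_n s_n with independent priors s_n ~ N(0, v_n)."""
    total = sum(variances)
    kappas = [v / total for v in variances]                          # responsibilities
    means = [kappa * x for kappa in kappas]                          # E[s_n | x] = kappa_n x
    post_vars = [v * (1.0 - kappa) for v, kappa in zip(variances, kappas)]
    return kappas, means, post_vars


kappas, means, post_vars = gaussian_split(3.0, [1.0, 2.0])
```

Since the responsibilities sum to one, the posterior means always add back up to the observation x; here they are 1.0 and 2.0, with posterior variance 2/3 each.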
One channel source separation: Poisson source model
[Figure: variances v_{k,1}, ..., v_{k,N}, source counts s_{k,1}, ..., s_{k,N} and observation x_k, plated over k = 1, ..., K]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n};\, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$
• This is the generative model for NMF when we write
$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \quad \text{(Template × Excitation)}$$
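Under this Poisson model, x_{ν,τ} ~ PO(Σ_n t_{ν,n} e_{τ,n}), and maximum-likelihood fitting of templates and excitations is equivalent to minimizing the generalized Kullback-Leibler divergence. A standard way to do that is the Lee and Seung multiplicative update. The sketch below uses it as a swapped-in point-estimate technique, not as the Bayesian inference discussed in this talk; matrix sizes, initialization and iteration count are illustrative.

```python
import math
import random


def reconstruct(templates, excitations):
    """lam[nu][tau] = sum_n t[nu][n] * e[tau][n] (Poisson intensity of x[nu][tau])."""
    return [[sum(tn * en for tn, en in zip(t_row, e_row)) for e_row in excitations]
            for t_row in templates]


def kl_divergence(x, lam):
    """Generalized KL divergence D(x || lam); equals the Poisson negative
    log-likelihood of x up to terms that do not depend on lam."""
    d = 0.0
    for x_row, lam_row in zip(x, lam):
        for xv, lv in zip(x_row, lam_row):
            d += (xv * math.log(xv / lv) if xv > 0 else 0.0) - xv + lv
    return d


def nmf_kl(x, num_components, num_iters=200, seed=0):
    """Lee-Seung multiplicative updates for x[nu][tau] ~ PO(sum_n t[nu][n] e[tau][n])."""
    rng = random.Random(seed)
    num_freq, num_frames = len(x), len(x[0])
    t = [[rng.uniform(0.5, 1.5) for _ in range(num_components)] for _ in range(num_freq)]
    e = [[rng.uniform(0.5, 1.5) for _ in range(num_components)] for _ in range(num_frames)]
    for _ in range(num_iters):
        lam = reconstruct(t, e)
        # Excitations: e[tau][n] *= sum_nu t[nu][n] x / lam  /  sum_nu t[nu][n]
        for tau in range(num_frames):
            for n in range(num_components):
                num = sum(t[nu][n] * x[nu][tau] / lam[nu][tau] for nu in range(num_freq))
                den = sum(t[nu][n] for nu in range(num_freq))
                e[tau][n] *= num / den
        lam = reconstruct(t, e)
        # Templates: t[nu][n] *= sum_tau e[tau][n] x / lam  /  sum_tau e[tau][n]
        for nu in range(num_freq):
            for n in range(num_components):
                num = sum(e[tau][n] * x[nu][tau] / lam[nu][tau] for tau in range(num_frames))
                den = sum(e[tau][n] for tau in range(num_frames))
                t[nu][n] *= num / den
    return t, e


x = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 4.0, 6.0, 8.0],
     [3.0, 1.0, 0.5, 2.0],
     [1.0, 1.0, 1.0, 1.0]]
t_hat, e_hat = nmf_kl(x, num_components=2)
```

Each update does not increase the divergence, and positive initial values stay positive, so the factors remain valid Poisson intensities.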
Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)
[Figure: two panels of densities p(x) for 0 ≤ x ≤ 5; first panel (a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1); second panel (a, b) = (1, 1), (1, 0.5), (2, 1)]
$$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$$
$$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
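The two log densities above translate directly into code. The sketch below implements them as written and checks numerically that the Gamma density integrates to one with mean a z, and that IG(x; a, z) is the density of 1/g for g ~ G(a, z), i.e. G(1/x; a, z)/x^2. That identity is how parameters such as z_k/a in the chains that follow should be read: E[1/x] = a z.

```python
import math


def gamma_logpdf(x, a, z):
    """log G(x; a, z) = (a-1) log x - x/z + a log(1/z) - log Gamma(a)."""
    return (a - 1.0) * math.log(x) - x / z + a * math.log(1.0 / z) - math.lgamma(a)


def inv_gamma_logpdf(x, a, z):
    """log IG(x; a, z) = (a+1) log(1/x) - 1/(z x) + a log(1/z) - log Gamma(a)."""
    return ((a + 1.0) * math.log(1.0 / x) - 1.0 / (z * x)
            + a * math.log(1.0 / z) - math.lgamma(a))


# Midpoint-rule check that G(x; 2, 1) integrates to about 1.
h = 1e-3
total = sum(math.exp(gamma_logpdf((i + 0.5) * h, 2.0, 1.0)) for i in range(50000)) * h
```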
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1, ..., K as follows:
$$v_k \mid z_k \sim \mathcal{IG}(v_k;\, a,\, z_k/a), \qquad z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1};\, a_z,\, v_k/a_z)$$
[Figure: chain z_1, ..., v_{k-1}, z_k, v_k, z_{k+1}, ... with couplings a_z, a, a_z]
• Variance variables v are the prior variances of the sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• Shape parameters a and a_z describe the coupling strength and drift of the chain
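Draws like the ones shown next can be produced by ancestral sampling of the chain above. With the IG convention defined earlier, x ~ IG(a, z) is the reciprocal of a Gamma draw with shape a and scale z; Python's `random.gammavariate(alpha, beta)` takes shape alpha and scale beta. The starting value z_1 = 1 and the function names are hypothetical choices for illustration.

```python
import math
import random


def sample_inv_gamma(shape, scale, rng):
    """Draw from IG(x; shape, scale) in the convention above, i.e. 1/x ~ G(shape, scale)."""
    return 1.0 / rng.gammavariate(shape, scale)


def sample_gamma_chain(K, a, a_z, z1=1.0, seed=0):
    """Ancestral sampling of v_1..v_K from the inverse Gamma-Markov chain
    v_k | z_k ~ IG(a, z_k / a),  z_{k+1} | v_k ~ IG(a_z, v_k / a_z)."""
    rng = random.Random(seed)
    z = z1                                  # hypothetical starting value for z_1
    v = []
    for _ in range(K):
        v_k = sample_inv_gamma(a, z / a, rng)
        v.append(v_k)
        z = sample_inv_gamma(a_z, v_k / a_z, rng)
    return v


log_v = [math.log(x) for x in sample_gamma_chain(1000, a=10.0, a_z=4.0)]
```

Each v_k is close to 1/z_k and each z_{k+1} is close to 1/v_k, so consecutive variances are positively correlated even though every individual link is an inversion.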
Gamma Chains: typical draws
[Figure: three typical draws of log v_k for k = 1, ..., 1000, with (a, a_z) = (10, 10), (10, 4) and (10, 40)]
Gamma Chains with changepoints: typical draws
[Figure: three typical draws of log v_k with changepoints for k = 1, ..., 1000, with (a, a_z) = (10, 10), (10, 4) and (10, 40)]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$$\psi_{k,k} = \exp\big(-a\, z_k^{-1} v_k^{-1}\big) \quad \text{(Pairwise)}$$
$$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \quad \text{(Singletons)}$$
[Figure: chain z_1, ..., v_{k-1}, z_k, v_k, z_{k+1}, ... with couplings a_z, a, a_z]
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$$\psi_{ij} = \exp\big(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}\big) \quad \text{(Pairwise)}$$
$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \quad \text{(Singletons)}$$
Possible Model Topologies
Approximate Inference
• Stochastic
  - Markov Chain Monte Carlo, Gibbs sampler
  - Sequential Monte Carlo, Particle Filtering
• Deterministic
  - Variational Bayes
In all of these, conjugacy helps.
VB or Gibbs
[Figure: factor graph ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ..., with likelihood factors p(y_1 | v_1), p(y_2 | v_2)]
• VB
$$q^{(\tau)}(v_k) \propto p(y_k \mid v_k)\, \phi_{v_k} \exp\Big(\big\langle \log\psi_{k,k} + \log\psi_{k,k+1} \big\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\Big)$$
• Gibbs
$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \phi_{v_k}\, \psi_{k,k}\big(z_k^{(\tau)}\big)\, \psi_{k,k+1}\big(z_{k+1}^{(\tau)}\big)$$
VB or Gibbs
[Figure: factor graph ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ..., with likelihood factors p(y_1 | v_1), p(y_2 | v_2)]
• VB
$$q^{(\tau)}(z_k) \propto \phi_{z_k} \exp\Big(\big\langle \log\psi_{k-1,k} + \log\psi_{k,k} \big\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\Big)$$
• Gibbs
$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \phi_{z_k}\, \psi_{k-1,k}\big(v_{k-1}^{(\tau)}\big)\, \psi_{k,k}\big(v_k^{(\tau)}\big)$$
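Because of conjugacy, both Gibbs conditionals above are again inverse Gamma. Collecting the exponents of the chain potentials, and assuming a Gaussian source model y_k | v_k ~ N(0, v_k), gives v_k | z_k, z_{k+1}, y_k ~ IG with shape a + a_z + 1/2 and coefficient a/z_k + a_z/z_{k+1} + y_k^2/2 on 1/v_k; and z_k | v_{k-1}, v_k ~ IG with shape a + a_z and coefficient a_z/v_{k-1} + a/v_k on 1/z_k. The sketch below is one blocked Gibbs sweep under these assumptions, with z_0 held fixed as a hypothetical hyperparameter and the last v having no right-hand neighbour.

```python
import math
import random


def sample_inv_gamma(shape, rate, rng):
    """Draw x with density proportional to x^-(shape + 1) exp(-rate / x),
    i.e. 1/x ~ Gamma(shape, scale = 1 / rate)."""
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)


def gibbs_sweep(y, v, z, a, a_z, rng):
    """One blocked Gibbs sweep for the chain z_0, v_0, z_1, v_1, ..., z_{K-1}, v_{K-1}.

    Given all z the v's are conditionally independent (and vice versa),
    so each block is updated in a single pass. z[0] is never resampled.
    """
    K = len(y)
    for k in range(K):
        shape = a + 0.5                     # IG(v_k | z_k) and the Gaussian likelihood
        rate = a / z[k] + 0.5 * y[k] ** 2
        if k + 1 < K:                       # v_k also sets the scale of z_{k+1}
            shape += a_z
            rate += a_z / z[k + 1]
        v[k] = sample_inv_gamma(shape, rate, rng)
    for k in range(1, K):
        # z_k couples to v_{k-1} (strength a_z) and to v_k (strength a).
        z[k] = sample_inv_gamma(a + a_z, a_z / v[k - 1] + a / v[k], rng)
    return v, z


rng = random.Random(0)
y = [math.sin(0.1 * k) * (1 + k % 5) for k in range(50)]
v, z = [1.0] * 50, [1.0] * 50
for _ in range(100):
    v, z = gibbs_sweep(y, v, z, a=10.0, a_z=4.0, rng=rng)
```

A VB version would replace each draw by an update of the IG parameters, using expectations of 1/z and 1/v in place of the sampled values.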
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: log-spectrogram panels: Noisy (X), Original (X_org), and estimates X_h (SNR 19.98), X_v (SNR 20.79), X_b (SNR 19.68), X_g (SNR 19.97)]
Denoising - Music
[Figure: log-spectrogram panels: Original, Noisy, PF (SNR 8.53), Gibbs (SNR 8.66), VB (SNR 2.08)]
“Tristram” (Matt Uelmen) + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: Horizontal; tie across time (harmonic continuity)
• Source 2: Vertical; tie across frequency (transients, percussive sounds)
[Figure: frequency bin (ν) versus time (τ) for X_org, S_hor and S_ver]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai” (King Crimson) + Drums “Territory” (Sepultura) = Mix

                     s1                      s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB           -4.74  -3.28   5.67    -1.58  15.46  -1.37
Gibbs        -4.5   -2.62   4.57     1.05  12.46   1.61
GibbsEM      -4.23  -2.42   4.82     1.34  13.13   1.85
Pre-trained  -4.04  -3.15   8.13     3.56  11.44   4.64
Oracle        6.14  17.16   6.58    12.66  19.95  13.6

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet” (Anglagard) + “Moby Dick” (Led Zeppelin) = Mix

                     s1                      s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB           -7.8   -6.22   4.53    -2.35  18.4   -2.25
Gibbs        -8.46  -7.53   6.93    -4.04  14.59  -3.83
GibbsEM      -7.74  -6.19   4.62    -1.14  16.62  -0.97
Pre-trained  -6.4   -5.39   6.95     3.8   16.39   4.14
Oracle        1.21   3.29  12.14    21.13  33.89  21.37
Harmonic-Transient Decomposition
[Figure: frequency bin (ν) versus time (τ) for X_org, S_hor and S_ver; audio examples (Original), (Hor), (Vert) for two excerpts]
Chord Detection - Signal model (with Paul Peeling)
[Figure: log σ_ν versus frequency ν (Hz), 0 to 4000 Hz]
Chord Detection
[Figure: MDCT of piano chord 41, 48, 51, 56 (frequency in Hz versus time τ in s); log Σ_ν v_{ν,j,τ} (MIDI note j versus time τ in s)]
Multichannel Source Separation
• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda, b_\lambda)$$
$$v_{k,n} \sim \mathcal{IG}\big(v_{k,n};\, \nu/2,\, 2/(\nu\lambda_n)\big)$$
$$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0, v_{k,n})$$
$$x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m)$$
$$a_m \sim \mathcal{N}(a_m;\, \cdots), \qquad r_m \sim \mathcal{IG}(r_m;\, \cdots)$$
[Figure: λ_1, ..., λ_n, ..., λ_N above v_{k,1}, ..., v_{k,N}, s_{k,1}, ..., s_{k,N} and observations x_{k,1}, ..., x_{k,M}, plated over k = 1, ..., K; mixing parameters a_1, r_1, ..., a_M, r_M]
Equivalent Gamma MRF
• A tree for each source
• λ_n can be interpreted as the overall “volume” of source n
Source Separation
[Figure: spectrograms (frequency f in Hz versus time) of the sources (Speech), (Piano), (Guitar) and of the (Mix)]
Reconstructions
[Figure: spectrograms (frequency f in Hz versus time) of the reconstructed (Speech), (Piano) and (Guitar)]
Multimodality
• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: sampler traces of a, λ and r over 2000 epochs]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features):
  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hands
  - Create a quantized, human-readable score
  - ...
• Online/real-time or offline/batch
• All of these problems can be mapped to inference problems in an HMM
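Once a problem is cast as a discrete HMM, online processing amounts to the forward (filtering) recursion. The sketch below is a generic version, not the specific models of the talk: a state space such as the bar pointer states x = (m, n) would be enumerated as indices 0 .. S-1, and the probabilities in the usage example are made up.

```python
import math


def forward_filter(prior, transition, likelihoods):
    """Online filtering p(x_k | y_{1:k}) for a discrete HMM.

    prior[i]          = p(x_1 = i)
    transition[i][j]  = p(x_{k+1} = j | x_k = i)
    likelihoods[k][i] = p(y_k | x_k = i)
    Returns the filtered distributions and log p(y_{1:K}).
    """
    n = len(prior)
    filtered = []
    log_evidence = 0.0
    predicted = list(prior)
    for lik in likelihoods:
        unnorm = [predicted[i] * lik[i] for i in range(n)]   # correct with y_k
        norm = sum(unnorm)
        log_evidence += math.log(norm)
        alpha = [u / norm for u in unnorm]
        filtered.append(alpha)
        # Predict the next state: sum_i p(x_k = i | y_{1:k}) p(x_{k+1} = j | x_k = i).
        predicted = [sum(alpha[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return filtered, log_evidence


prior = [0.5, 0.5]
transition = [[0.9, 0.1], [0.2, 0.8]]
likelihoods = [[0.7, 0.2], [0.7, 0.2]]
filtered, log_ev = forward_filter(prior, transition, likelihoods)
```

Offline (smoothing) processing adds a backward pass over the same quantities; the per-step cost is O(S^2) for a dense transition matrix and less when transitions are sparse.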
Bar position Pointer (Whiteley, Cemgil and Godsill 2006)
[Figure: grid of states with bar position m_k = 1, ..., 8 horizontally and tempo level n_k = 1, 2, 3 vertically, with bar divisions for 3/4 time and 4/4 time]
• Each dot denotes a state x = (m, n) (Score Position, Tempo level)
• Directed arcs denote state transitions with positive probability
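One way to build a sparse transition model over the grid of states x = (m, n) described above is sketched below. The dynamics are a hypothetical simplification, not a statement of the published model: the bar position advances by the tempo level each frame and wraps at the end of the bar, while the tempo level stays put with probability 1 - p_change or moves one level up or down otherwise.

```python
def bar_pointer_transitions(num_positions, num_tempi, p_change):
    """Sparse p(x_{k+1} | x_k) over states x = (m, n), stored as nested dicts.

    Hypothetical dynamics: m advances by n (wrapping at the bar end);
    n keeps its value w.p. 1 - p_change, otherwise steps by -1 or +1.
    At the slowest or fastest level the blocked step keeps n unchanged.
    """
    trans = {}
    for m in range(1, num_positions + 1):
        for n in range(1, num_tempi + 1):
            next_m = (m + n - 1) % num_positions + 1
            moves = {n: 1.0 - p_change}
            for step in (-1, 1):
                n2 = n + step
                if 1 <= n2 <= num_tempi:
                    moves[n2] = moves.get(n2, 0.0) + p_change / 2
                else:
                    moves[n] += p_change / 2
            trans[(m, n)] = {(next_m, n2): p for n2, p in moves.items()}
    return trans


trans = bar_pointer_transitions(num_positions=8, num_tempi=3, p_change=0.2)
```

Each state has at most three successors, so a forward recursion over this model costs O(M N) per frame rather than O((M N)^2).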
Bar position Pointer - transition model
[Figure: three panels of transition probabilities p(x_2 | x_1) over bar position (1 to 8) and tempo level (1 to 5)]
Bar position Pointer - k = 1
[Figure: p(x_1) over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: p(x_2) over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: p(x_3) over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: p(x_4) over bar position and tempo level]
Bar position Pointer - k = 5
[Figure: filtered density p(x_5 | y_{1:5}) over bar position and tempo level, after observing y_5 = 0]
Bar position Pointer - k = 10
[Figure: p(x_10) over bar position and tempo level]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson intensity
[Figure: Poisson intensity μ_k versus bar position m_k (0 to 1000) for a triplet rhythm and a duplet rhythm]
Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil and Godsill 2006)
[Figure: graphical model with chains n_0, ..., n_3; θ_0, ..., θ_3; m_0, ..., m_3; r_0, ..., r_3; intensities λ_1, λ_2, λ_3 and observations y_1, y_2, y_3]
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator
Filtering
[Figure: observed data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}) in quarter notes per minute (60 to 180); p(r_k | y_{1:k}) for triplets versus duplets; frame index k from 0 to 450]
Smoothing
[Figure: observed data y_k; log p(m_k | y_{1:K}); log p(n_k | y_{1:K}) in quarter notes per minute (60 to 180); p(r_k | y_{1:K}) for triplets versus duplets; frame index k from 0 to 450]
Time Signature
[Figure: observed data (sample value versus time, 0 to 12 s); log p(m_k | z_{1:K}); log p(n_k | z_{1:K}) in quarter notes per minute (52 to 155); p(θ_k | z_{1:K}) for 4/4 versus 3/4; frame index k up to 500]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: musical score aligned with the audio signal x_t]
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 12
Modelling levels
bull Physical - acoustical
bull Time domain ndash state space dynamical models
bull Transform domain ndash Fourier representations Generalised Linear
model
bull Feature Based
Research Questions
What kinds of prior knowledge and modelling techniques are usefulHow can we do efficient inference
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 11
Signal Models for Audio
bull Time domain ndash state space dynamical models
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA) switching state space models
ndash Flexible Physically realistic
ndash Analysis down to sample precision Computationally quite heavy
bull Transform domain ndash Fourier representations Generalised Linear
model
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 12
Sinusoidal Modeling
• Sound is primarily about oscillations and resonance
• Cascade of second-order systems
• Audio signals can often be represented compactly as a sum of p damped sinusoids:
(real) $y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$
(complex) $y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n$
$y = F(\gamma_{1:p}, \omega_{1:p})\, c$
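The real and complex forms describe the same signal once each real partial is split into a conjugate pair of complex exponentials with amplitudes $(\alpha_k/2)e^{\pm j\phi_k}$. The NumPy sketch below (parameter values are arbitrary, chosen only for illustration) builds F column by column from the poles $e^{-\gamma_k \pm j\omega_k}$ and checks that $F c$ reproduces the real-valued sum.

```python
import numpy as np

def real_form(alpha, gamma, omega, phi, n_samples):
    """y_n = sum_k alpha_k exp(-gamma_k n) cos(omega_k n + phi_k)."""
    n = np.arange(n_samples)[:, None]                 # time indices as a column
    return (alpha * np.exp(-gamma * n) * np.cos(omega * n + phi)).sum(axis=1)

def pole_matrix(gamma, omega, n_samples):
    """F[n, k] = (exp(-gamma_k + j omega_k))^n, so that y = F c."""
    poles = np.exp(-gamma + 1j * omega)
    n = np.arange(n_samples)[:, None]
    return poles[None, :] ** n

# Arbitrary illustrative parameters for p = 2 real partials.
alpha = np.array([1.0, 0.5])
gamma = np.array([0.01, 0.02])
omega = np.array([0.3, 0.9])
phi = np.array([0.0, 1.0])
N = 200

# Each real partial becomes a conjugate pair of complex exponentials.
F = pole_matrix(np.concatenate([gamma, gamma]), np.concatenate([omega, -omega]), N)
c = np.concatenate([0.5 * alpha * np.exp(1j * phi), 0.5 * alpha * np.exp(-1j * phi)])
y_complex = F @ c
y_real = real_form(alpha, gamma, omega, phi, N)
print(F.shape, np.allclose(y_complex, y_real))   # (200, 4) True
```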
State space Parametrisation
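$x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix}$
$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}}_{C} x_n$
[Graphical model: state chain $x_0 \to x_1 \to \cdots \to x_{k-1} \to x_k \to \cdots \to x_K$, with an observation $y_k$ emitted from each state $x_k$]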
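A minimal sketch of this recursion, assuming NumPy and arbitrary illustrative parameters: iterating $x_{n+1} = A x_n$ from $x_0 = c$ and reading out $y_n = C x_n$ reproduces the complex sinusoidal sum exactly, because the diagonal A simply raises each pole to the power n.

```python
import numpy as np

def simulate_state_space(gamma, omega, c, n_samples):
    """Run x_{n+1} = A x_n, y_n = C x_n with A = diag(exp(-gamma_k + j omega_k))."""
    A = np.diag(np.exp(-gamma + 1j * omega))    # diagonal transition matrix
    C = np.ones(len(c))                          # row vector (1 1 ... 1)
    x = np.asarray(c, dtype=complex)             # initial state x_0 = (c_1, ..., c_p)
    y = np.empty(n_samples, dtype=complex)
    for n in range(n_samples):
        y[n] = C @ x                             # y_n = C x_n
        x = A @ x                                # x_{n+1} = A x_n
    return y

gamma = np.array([0.01, 0.05])
omega = np.array([0.4, 1.2])
c = np.array([1.0 + 0.5j, 0.3 - 0.2j])
y_ss = simulate_state_space(gamma, omega, c, 100)

# Closed form: y_n = sum_k c_k (exp(-gamma_k + j omega_k))^n
n = np.arange(100)[:, None]
y_direct = (c * np.exp(-gamma + 1j * omega) ** n).sum(axis=1)
print(np.allclose(y_ss, y_direct))   # True
```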
State Space Parametrisation
Audio Restoration/Interpolation
• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis
[Figure: signal with missing samples, sample index 0 to 500]
Audio Interpolation
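$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\, p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$, with $H \equiv$ (parameters, hidden states)
[Graphical model: H is the common parent of the missing samples $x_{\neg\kappa}$ and the observed samples $x_\kappa$]
[Figure: missing and observed samples, sample index 0 to 500]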
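For a fixed H under which the signal is zero-mean Gaussian with covariance Σ, the integrand reduces to ordinary Gaussian conditioning: $E[x_{\neg\kappa} \mid x_\kappa] = \Sigma_{\neg\kappa\kappa}\Sigma_{\kappa\kappa}^{-1}x_\kappa$ with covariance $\Sigma_{\neg\kappa\neg\kappa} - \Sigma_{\neg\kappa\kappa}\Sigma_{\kappa\kappa}^{-1}\Sigma_{\kappa\neg\kappa}$. The sketch below shows only this inner step and does not integrate over H. The damped-cosine covariance is a hypothetical stand-in for a resonant source.

```python
import numpy as np

def gaussian_interpolate(x, observed, cov):
    """Posterior mean and covariance of the missing samples given the observed ones,
    for a zero-mean Gaussian prior x ~ N(0, cov) (H held fixed, no integral over H)."""
    missing = ~observed
    S_oo = cov[np.ix_(observed, observed)]
    S_mo = cov[np.ix_(missing, observed)]
    S_mm = cov[np.ix_(missing, missing)]
    gain = np.linalg.solve(S_oo, S_mo.T).T       # S_mo S_oo^{-1}
    mean = gain @ x[observed]
    post_cov = S_mm - gain @ S_mo.T
    return mean, post_cov

# Hypothetical prior: damped-cosine covariance (a resonance) plus a small noise floor.
K = 200
lags = np.abs(np.subtract.outer(np.arange(K), np.arange(K)))
cov = 0.99 ** lags * np.cos(0.2 * lags) + 1e-4 * np.eye(K)

rng = np.random.default_rng(0)
x_true = rng.multivariate_normal(np.zeros(K), cov)
observed = np.ones(K, dtype=bool)
observed[90:110] = False                          # a gap of 20 missing samples

mean, post_cov = gaussian_interpolate(x_true, observed, cov)
mse_interp = np.mean((mean - x_true[~observed]) ** 2)
mse_zero = np.mean(x_true[~observed] ** 2)        # baseline: fill the gap with zeros
print(mse_interp, mse_zero)
```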
Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
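[Graphical model: for each band $\nu = 0, \ldots, W-1$, parameters $A_\nu, Q_\nu$ govern a chain $s^\nu_0 \to \cdots \to s^\nu_k \to \cdots \to s^\nu_{K-1}$; all bands contribute to the observations $x_0, \ldots, x_k, \ldots, x_{K-1}$]
$s^\nu_k \sim \mathcal{N}(s^\nu_k; A_\nu s^\nu_{k-1}, Q_\nu), \qquad A_\nu \sim \mathcal{N}\left(A_\nu; \begin{pmatrix} \cos\omega_\nu & -\sin\omega_\nu \\ \sin\omega_\nu & \cos\omega_\nu \end{pmatrix}, \Psi\right)$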
Inference: Structured Variational Bayes
[Factor graph: for each band $\alpha \in C$, factors $q(A_\alpha)$, $q(Q_\alpha)$ and a chain $\cdots \to s^\alpha_{k-1} \to s^\alpha_k \to s^\alpha_{k+1} \to \cdots$ with $\prod_k q(s^\alpha_k \mid s^\alpha_{k-1})$; observations $x_k$ with $q(x_k)$]
• Intuitive algorithm:
  – Subtract from the observed signal x the predictions of the frequency bands in ¬α
  – Compute a fit for α to this residual, and iterate
• For fixed A and Q, this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
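The "subtract the other bands, refit the residual" loop can be checked on a toy problem. The sketch below assumes NumPy, a ridge-regularised least-squares fit per band (a deterministic stand-in for the variational updates, not the author's code), and hypothetical band frequencies. Under these assumptions each refit is one block row of Gauss-Seidel on $(B^\top B + \lambda I)s = B^\top x$, so the loop converges to the direct solution.

```python
import numpy as np

def block_gauss_seidel(x, bases, lam=0.1, n_sweeps=200):
    """Fit x ~ sum_a B_a s_a with a ridge penalty lam * ||s||^2, one band at a time.
    Refitting band a to the residual left by the other bands is one block row of
    Gauss-Seidel on the normal equations (B^T B + lam I) s = B^T x."""
    coeffs = [np.zeros(B.shape[1]) for B in bases]
    for _ in range(n_sweeps):
        for a, B in enumerate(bases):
            # prediction of all bands except a, using their latest estimates
            others = sum(Bb @ sb for b, (Bb, sb) in enumerate(zip(bases, coeffs)) if b != a)
            residual = x - others
            coeffs[a] = np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T @ residual)
    return coeffs

K = 128
n = np.arange(K)
bands = [0.2, 0.5, 0.9]                          # hypothetical band frequencies (rad/sample)
bases = [np.column_stack([np.cos(w * n), np.sin(w * n)]) for w in bands]
rng = np.random.default_rng(1)
x = bases[0] @ np.array([1.0, 0.5]) + bases[2] @ np.array([0.0, -0.8]) + 0.05 * rng.standard_normal(K)

coeffs = block_gauss_seidel(x, bases)
B_all = np.hstack(bases)
direct = np.linalg.solve(B_all.T @ B_all + 0.1 * np.eye(B_all.shape[1]), B_all.T @ x)
print(np.allclose(np.concatenate(coeffs), direct, atol=1e-8))   # True
```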
Restoration
• Piano
  – Signal with missing samples (37)
  – Reconstruction: 7.68 dB improvement
  – Original
• Trumpet
  – Signal with missing samples (37)
  – Reconstruction: 7.10 dB improvement
  – Original
Hierarchical Factorial Models
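• Each component models a latent process
• The observations are projections
[Graphical model: for each component $\nu = 1, \ldots, W$, an indicator chain $r^\nu_0 \to \cdots \to r^\nu_k \to \cdots \to r^\nu_K$ drives a state chain $\theta^\nu_0 \to \cdots \to \theta^\nu_k \to \cdots \to \theta^\nu_K$; the observations $y_k, \ldots, y_K$ depend on all components]
• Generalises source-filter models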
Harmonic model with changepoints
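$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$
$\theta_k \mid \theta_{k-1}, r_k \sim [r_k = 0]\, \underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\, \underbrace{\mathcal{N}(0, S)}_{\text{new}}$
$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$
$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^N, \qquad G_\omega = \rho_k \begin{pmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{pmatrix}$
with damping factor $0 < \rho_k < 1$, frame length N, and damped sinusoidal basis matrix C of size $N \times 2H$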
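To make the block structure concrete, the sketch below builds A for H harmonics together with one possible damped sinusoidal basis C. The convention that row n of C reads out the first coordinate of every 2×2 block after n steps is an assumption of this sketch (the slide does not spell C out). With it, the second frame $C(A\theta)$ continues the first frame $C\theta$ seamlessly, which is why the block-diagonal matrix is raised to the power N.

```python
import numpy as np

def rotation(w):
    """2x2 rotation by angle w."""
    return np.array([[np.cos(w), -np.sin(w)], [np.sin(w), np.cos(w)]])

def harmonic_blocks(omega, rho, H):
    """One-sample transition M = blockdiag(G, G^2, ..., G^H) with G = rho * rotation(omega)."""
    G = rho * rotation(omega)
    M = np.zeros((2 * H, 2 * H))
    for h in range(1, H + 1):
        M[2 * h - 2:2 * h, 2 * h - 2:2 * h] = np.linalg.matrix_power(G, h)
    return M

def harmonic_basis(M, N):
    """N x 2H basis: row n sums the first row of each 2x2 block of M^n (assumed read-out)."""
    C = np.zeros((N, M.shape[0]))
    P = np.eye(M.shape[0])
    for n in range(N):
        C[n] = P[0::2].sum(axis=0)   # first coordinate of every block, summed over harmonics
        P = P @ M
    return C

omega, rho, H, N = 0.05, 0.999, 3, 64    # hypothetical fundamental, damping, harmonics, frame length
M = harmonic_blocks(omega, rho, H)
A = np.linalg.matrix_power(M, N)          # frame-to-frame transition
C = harmonic_basis(M, N)

theta = np.random.default_rng(2).standard_normal(2 * H)
two_frames = np.concatenate([C @ theta, C @ (A @ theta)])
print(C.shape, np.allclose(two_frames, harmonic_basis(M, 2 * N) @ theta))   # (64, 6) True
```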
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
  – Conditional Gaussians are not closed under marginalization
  ⇒ Unlike in HMMs or KFMs (Kalman filter models), summing over $r_k$ does not simplify the filtering density
  ⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time/space
  – Intuition: when a changepoint occurs, the state vector θ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; Ó Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)
[Graphical model: example segmentation $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$ over the state chain $\theta_0 \to \cdots \to \theta_5$ with observations $y_1, \ldots, y_5$]
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
  ⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
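Monophonic model (Cemgil et al. 2006)
• We introduce a pitch label indicator m
• At each time k, the process can be in one of the {"mute", "sound"} × M states
[Graphical model: indicator chain $r_0 \to r_1 \to \cdots \to r_T$ and pitch chain $m_0 \to m_1 \to \cdots \to m_T$ drive the signal state $s_0 \to s_1 \to \cdots \to s_T$, with observations $y_1, \ldots, y_T$]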
Monophonic Pitch Tracking
Monophonic pitch tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$
[Figure: observed signal and pitch-label estimates, time index 0 to 1000]
• If the pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets, and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
[Figure: exact inference result, time index 0 to 3500]
Tracking Pitch Variations
• Allow m to change with k
• Intractable: we need to resort to approximate inference (mixture Kalman filter, i.e. Rao-Blackwellized particle filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: latent processes per frequency ν over time k, and the observed signal $x_k$]
• Each latent changepoint process ν = 1, ..., W corresponds to a "piano key". The indicators $r_{1:W, 1:K}$ encode a latent "piano roll".
Single time slice - Bayesian Variable Selection
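$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$
$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$
$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R), \qquad C \equiv [\, C_1 \cdots C_i \cdots C_W \,]$
[Graphical model: indicators $r_1, \ldots, r_W$ gate the coefficients $s_1, \ldots, s_W$, which jointly generate x]
• Generalized linear model: the columns of C are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When W is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman; Attias)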
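For small W, the $2^W$-component posterior can simply be enumerated. The sketch below assumes NumPy, scalar coefficients with $\Sigma = \sigma^2$, and isotropic noise $R = \sigma_n^2 I$ (simplifications made here for illustration). For each on/off pattern it integrates out the active $s_i$ to get $p(x \mid r) = \mathcal{N}(x; 0, \sigma^2 C_r C_r^\top + R)$, then normalises over all patterns. The loop over `itertools.product` is exactly the exponential cost that makes large W intractable.

```python
import itertools
import numpy as np

def log_normal_zero_mean(x, cov):
    """log N(x; 0, cov) via a Cholesky factor."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, x)
    return -0.5 * z @ z - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2 * np.pi)

def indicator_posterior(x, C, pi_on, sigma2, noise_var):
    """Exact p(r | x) by enumerating all 2^W on/off patterns.
    Assumes s_i ~ N(0, sigma2) when on, s_i = 0 when off, and R = noise_var * I."""
    D, W = C.shape
    patterns = list(itertools.product([0, 1], repeat=W))
    log_w = np.empty(len(patterns))
    for idx, r in enumerate(patterns):
        on = np.array(r, dtype=bool)
        Cr = C[:, on]
        cov = sigma2 * Cr @ Cr.T + noise_var * np.eye(D)      # active s_i integrated out
        log_prior = on.sum() * np.log(pi_on) + (W - on.sum()) * np.log(1 - pi_on)
        log_w[idx] = log_prior + log_normal_zero_mean(x, cov)
    post = np.exp(log_w - log_w.max())                       # stabilised normalisation
    return patterns, post / post.sum()

rng = np.random.default_rng(3)
C = rng.standard_normal((16, 6))                             # hypothetical basis, W = 6
x = 2.0 * C[:, 1] - 1.5 * C[:, 4] + 0.1 * rng.standard_normal(16)
patterns, post = indicator_posterior(x, C, pi_on=0.2, sigma2=4.0, noise_var=0.01)
best = patterns[int(np.argmax(post))]
print(len(patterns), best)
```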
Factorial Switching State space model
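$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$
$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$
$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$ (changepoint indicator)
$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\, \theta_{k-1,\nu}, Q_\nu(r_k))$ (latent state)
$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R)$ (observation)
[Graphical model: for each ν = 1, ..., W, an indicator chain $r^\nu_0 \to \cdots \to r^\nu_k \to \cdots \to r^\nu_K$ and a state chain $s^\nu_0 \to \cdots \to s^\nu_k \to \cdots \to s^\nu_K$; observations $y_k, \ldots, y_K$]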
Synthetic Data
[Figure: synthetic data; latent processes per frequency ν over time k, and the generated signal x]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not numerically stable: computations involve large matrices
  – Need advanced techniques from linear algebra
  – Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical / acoustical
• Time domain: state-space dynamical models
• Transform domain: Fourier representations, generalised linear model
• Feature-based
Spectrogram
• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau)$ = (frequency, time), such as STFT or MDCT:
$x(t) = \sum_k s_k \phi_k(t)$
[Figure: spectrograms of (Speech) and (Piano), f in Hz (0 to 10000) versus t]
• A spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of the STFT)
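A minimal sketch of these quantities, assuming NumPy, a Hann window, and hypothetical frame and hop sizes: the STFT coefficients $s_k$ are indexed by (frequency bin ν, frame τ), and the spectrogram is their log power. For a 1 kHz tone sampled at 8 kHz with 512-sample frames, the bin spacing is 8000/512 = 15.625 Hz, so the peak falls exactly in bin 64.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Hann-windowed short-time Fourier transform; returns s[nu, tau]."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[t * hop:t * hop + frame_len] * window for t in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)               # (frame_len // 2 + 1, n_frames)

fs = 8000                                            # hypothetical sample rate in Hz
t = np.arange(fs) / fs                               # one second of signal
x = np.sin(2 * np.pi * 1000.0 * t)                   # 1 kHz test tone
S = stft(x)
log_power = np.log(np.abs(S) ** 2 + 1e-12)           # what a spectrogram image shows
peak_bin = int(np.argmax(np.abs(S[:, 0])))
print(S.shape, peak_bin, peak_bin * fs / 512)        # (257, 30) 64 1000.0
```

An MDCT would give a real-valued, critically sampled alternative to the STFT used here.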
Models for time-frequency Energy distributions
• Non-negative matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004)
$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$
Spectrogram = Spectral templates × Excitations
  – However, spectrograms are not additive: $a^2 + b^2 \neq (a + b)^2$
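One standard way to fit $X \approx WS$ is the Lee and Seung multiplicative update for the generalised Kullback-Leibler divergence, sketched below with NumPy. The choice of divergence, the random initialisation, and the iteration counts are assumptions made for illustration. The updates keep W and S non-negative and do not increase the divergence.

```python
import numpy as np

def kl_nmf(X, n_components, n_iter=200, seed=0, eps=1e-12):
    """Fit X ~ W @ S (W: spectral templates, S: excitations) by the Lee-Seung
    multiplicative updates for the generalised Kullback-Leibler divergence."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.random((F, n_components)) + eps
    S = rng.random((n_components, T)) + eps
    for _ in range(n_iter):
        W *= ((X / (W @ S + eps)) @ S.T) / (S.sum(axis=1) + eps)
        S *= (W.T @ (X / (W @ S + eps))) / (W.sum(axis=0)[:, None] + eps)
    return W, S

def kl_divergence(X, Y, eps=1e-12):
    """Generalised KL divergence D(X || Y) = sum X log(X / Y) - X + Y."""
    return np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y)

rng = np.random.default_rng(4)
X = rng.random((40, 3)) @ rng.random((3, 60))       # hypothetical rank-3 "spectrogram"
W1, S1 = kl_nmf(X, 3, n_iter=1)
W, S = kl_nmf(X, 3, n_iter=500)
print(kl_divergence(X, W1 @ S1), kl_divergence(X, W @ S))   # the second should be much smaller
```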
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005)
$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$
Spectrogram = Mask × Source0 + (1 − Mask) × Source1
  – However, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)
• We place a suitable prior on the variances of the transform coefficients $s_k$, and tie the prior variances across harmonically and temporally related time-frequency atoms:
$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$
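One-channel source separation: Gaussian source model
[Graphical model: for each k = 1, ..., K, variances $v_{k,1}, \ldots, v_{k,N}$ generate sources $s_{k,1}, \ldots, s_{k,N}$, which sum to the observation $x_k$]
$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• A straightforward application of Bayes' theorem yields
$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})), \qquad \kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}}$ (responsibilities)
• Each source coefficient $s_{k,n}$ gets a fraction $\kappa_{k,n}$ of the observation $x_k$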
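The responsibilities act as a per-atom Wiener gain. The sketch below, assuming NumPy and made-up variances, evaluates the posterior means $\kappa_n x$ and variances $v_n(1-\kappa_n)$ for a single atom. Because the $\kappa_n$ sum to one, the posterior means always add back up to x.

```python
import numpy as np

def gaussian_split(x, v):
    """Posterior of independent sources s_n ~ N(0, v_n) given their sum x:
    means kappa_n * x and variances v_n * (1 - kappa_n), with kappa_n = v_n / sum(v)."""
    v = np.asarray(v, dtype=float)
    kappa = v / v.sum()                      # responsibilities
    return kappa * x, v * (1.0 - kappa)

# Hypothetical prior variances of three sources at one time-frequency atom.
mean, var = gaussian_split(3.0, [1.0, 2.0, 3.0])
print(mean, mean.sum())   # means 0.5, 1.0, 1.5, which add back up to x = 3.0
```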
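One-channel source separation: Poisson source model
[Graphical model: as for the Gaussian source model, variances $v_{k,1}, \ldots, v_{k,N}$ generate sources $s_{k,1}, \ldots, s_{k,N}$, which sum to $x_k$, for k = 1, ..., K]
$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• This is the generative model for NMF when we write
$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (template × excitation)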
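Under the Poisson model, the sources given their sum are multinomial, $s_{k,1:N} \mid x_k \sim \mathcal{M}(x_k; \kappa_{k,1:N})$, with the same responsibilities $\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$, so $E[s_{k,n} \mid x_k] = \kappa_{k,n} x_k$. The sketch below, assuming NumPy and hypothetical random templates T and excitations E, computes these expected counts for every time-frequency atom with $v_{k(\nu,\tau),n} = t_{\nu,n} e_{\tau,n}$. This split can be read as the E-step of an EM view of KL-divergence NMF.

```python
import numpy as np

def source_posterior(X, T, E):
    """Poisson source model with v_{(nu,tau),n} = T[nu, n] * E[tau, n].
    X: (F, K) observed counts, T: (F, N) templates, E: (K, N) excitations.
    Returns E[s | x] with shape (F, K, N) and the responsibilities kappa."""
    V = T[:, None, :] * E[None, :, :]            # V[nu, tau, n] = t_{nu,n} e_{tau,n}
    kappa = V / V.sum(axis=2, keepdims=True)     # responsibilities per atom
    return X[:, :, None] * kappa, kappa          # multinomial means x * kappa

rng = np.random.default_rng(5)
T = 20.0 * rng.random((4, 2))    # hypothetical templates: 4 frequencies, 2 sources
E = rng.random((6, 2))           # hypothetical excitations: 6 frames, 2 sources
X = rng.poisson(T @ E.T)         # x_{nu,tau} ~ PO(sum_n v_{nu,tau,n})
S_hat, kappa = source_posterior(X, T, E)
one_draw = rng.multinomial(X[0, 0], kappa[0, 0])   # a posterior sample for one atom
print(np.allclose(S_hat.sum(axis=2), X), one_draw.sum() == X[0, 0])   # True True
```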
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: densities p(x) for 0 ≤ x ≤ 5; left panel: a = 0.9, 1, 1.3, 2 with b = 1; right panel: (a, b) = (1, 1), (1, 0.5), (2, 1)]
$\mathcal{G}(x; a, z) \equiv \exp((a - 1)\log x - z^{-1} x + a \log z^{-1} - \log\Gamma(a))$
$\mathcal{IG}(x; a, z) \equiv \exp((a + 1)\log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log\Gamma(a))$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity, inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
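In this parametrisation a is the shape and z the scale, and $x \sim \mathcal{IG}(a, z)$ exactly when $1/x \sim \mathcal{G}(a, z)$. The sketch below (NumPy plus the standard-library `math.lgamma`) codes both log densities term by term from the definitions. It then checks numerically that they integrate to about one, and checks the change-of-variables identity $\log\mathcal{IG}(x) = \log\mathcal{G}(1/x) - 2\log x$. NumPy's `Generator.gamma(shape, scale)` uses the same shape/scale convention, so an inverse Gamma draw is `1 / rng.gamma(a, z)`.

```python
import math
import numpy as np

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a - 1) log x - x / z + a log(1/z) - log Gamma(a)."""
    return (a - 1) * np.log(x) - x / z - a * np.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = (a + 1) log(1/x) - 1 / (z x) + a log(1/z) - log Gamma(a)."""
    return -(a + 1) * np.log(x) - 1.0 / (z * x) - a * np.log(z) - math.lgamma(a)

# Riemann-sum check of normalisation on a fine grid (tails beyond 60 are ignored).
x = np.linspace(1e-4, 60.0, 600_001)
dx = x[1] - x[0]
mass_gamma = np.exp(log_gamma_pdf(x, 2.0, 1.0)).sum() * dx
mass_inv_gamma = np.exp(log_inv_gamma_pdf(x, 2.0, 1.0)).sum() * dx

# Change of variables: if 1/v ~ G(a, z) then v ~ IG(a, z), with Jacobian 1/v^2.
xs = np.array([0.2, 1.0, 3.5])
identity_holds = np.allclose(log_inv_gamma_pdf(xs, 3.0, 0.5),
                             log_gamma_pdf(1.0 / xs, 3.0, 0.5) - 2.0 * np.log(xs))
print(round(mass_gamma, 3), round(mass_inv_gamma, 3), identity_holds)
```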
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1, ..., K as follows:
$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$
$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$
[Graphical model: $z_1 \to \cdots \to v_{k-1} \to z_k \to v_k \to z_{k+1} \to \cdots$, with edges into v labelled a and edges into z labelled $a_z$]
• Variance variables v are priors for sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• Shape parameters a and $a_z$ describe the coupling strength and drift of the chain
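Ancestral sampling from the chain only needs Gamma draws, since $x \sim \mathcal{IG}(\alpha, b)$ iff $1/x \sim \mathcal{G}(\alpha, b)$ (shape α, scale b). The sketch below assumes NumPy and a hypothetical starting value $z_1 = 1$. With this scaling, $E[1/v_k \mid z_k] = z_k$ and $E[1/z_{k+1} \mid v_k] = v_k$, so large shape parameters give small steps in $\log v_k$ (tight coupling) and small ones give large jumps.

```python
import numpy as np

def sample_gamma_chain(K, a, a_z, z1=1.0, seed=0):
    """Draw v_{1:K} from v_k | z_k ~ IG(a, z_k / a), z_{k+1} | v_k ~ IG(a_z, v_k / a_z),
    using x ~ IG(shape, scale) iff 1/x ~ Gamma(shape, scale). z1 is a hypothetical start."""
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = z1
    for k in range(K):
        v[k] = 1.0 / rng.gamma(a, z / a)         # E[1/v_k] = z_k
        z = 1.0 / rng.gamma(a_z, v[k] / a_z)     # E[1/z_{k+1}] = v_k
    return v

v_tight = sample_gamma_chain(1000, 1000.0, 1000.0, seed=1)   # strong coupling
v_loose = sample_gamma_chain(1000, 2.0, 2.0, seed=1)         # weak coupling
print(np.std(np.diff(np.log(v_tight))), np.std(np.diff(np.log(v_loose))))
```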
Gamma Chains: Typical Draws
[Figure: typical draws of $\log v_k$ for k = 1, ..., 1000; panels (a, $a_z$) = (10, 10), (10, 4), (10, 40)]
Gamma Chains with Changepoints: Typical Draws
[Figure: typical draws of $\log v_k$ for k = 1, ..., 1000; panels (a, $a_z$) = (10, 10), (10, 4), (10, 40)]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (pairwise)
$\phi_{z_k} = \exp((a_z + a + 1) \log z_k^{-1})$ (singletons)
[Graphical model: $z_1 \to \cdots \to v_{k-1} \to z_k \to v_k \to z_{k+1} \to \cdots$, with edge parameters $a_z, a, a_z$]
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$ (pairwise)
$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right) \log \xi_i^{-1}\right)$ (singletons)
Possible Model Topologies
Approximate Inference
• Stochastic
  – Markov chain Monte Carlo: Gibbs sampler
  – Sequential Monte Carlo: particle filtering
• Deterministic
  – Variational Bayes
In all of these, conjugacy helps.
VB or Gibbs
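[Factor graph: chain $z_1, v_1, z_2, v_2, \ldots$ with pairwise factors $\psi_{0,1}, \psi_{1,1}, \psi_{1,2}, \psi_{2,2}, \ldots$ and observation factors $p(y_1 \mid v_1), p(y_2 \mid v_2), \ldots$]
• VB
$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \log p(y_k \mid v_k) + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$
• Gibbs
$v^{(\tau)}_k \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z^{(\tau)}_k)\, \psi_{k,k+1}(z^{(\tau)}_{k+1})$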
VB or Gibbs
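[Factor graph: chain $z_1, v_1, z_2, v_2, \ldots$ with pairwise factors $\psi_{0,1}, \psi_{1,1}, \psi_{1,2}, \psi_{2,2}, \ldots$ and observation factors $p(y_1 \mid v_1), p(y_2 \mid v_2), \ldots$]
• VB
$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$
• Gibbs
$z^{(\tau)}_k \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v^{(\tau)}_{k-1})\, \psi_{k,k}(v^{(\tau)}_k)$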
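Because every $v_k$ touches only $z_k$ and $z_{k+1}$, and every $z_k$ touches only $v_{k-1}$ and $v_k$, all v's can be resampled jointly given the z's, and vice versa. Collecting the exponents of $\log x^{-1}$ and $x^{-1}$ from the chain, with Gaussian observations $y_k \sim \mathcal{N}(0, v_k)$ (an assumption of this sketch), gives $v_k \mid z_k, z_{k+1}, y_k \sim \mathcal{IG}\left(a + a_z + \tfrac12, (a/z_k + a_z/z_{k+1} + y_k^2/2)^{-1}\right)$ and $z_k \mid v_{k-1}, v_k \sim \mathcal{IG}\left(a_z + a, (a_z/v_{k-1} + a/v_k)^{-1}\right)$. The NumPy sketch below implements these two blocked updates. The boundary handling ($v_0 = 1$, no $z_{K+1}$) is a hypothetical choice.

```python
import numpy as np

def gibbs_gamma_chain(y, a, a_z, n_sweeps=500, burn_in=100, seed=0):
    """Blocked Gibbs sampler for y_k ~ N(0, v_k) under the inverse Gamma-Markov chain
        v_k | z_k ~ IG(a, z_k / a),   z_{k+1} | v_k ~ IG(a_z, v_k / a_z),
    with hypothetical boundaries v_0 = 1 (so z_1 ~ IG(a_z, 1 / a_z)) and no z_{K+1}.
    IG(alpha, b) is drawn as 1 / Gamma(alpha, scale=b). Returns E[log v_k | y] estimates."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    K = len(y)
    v = np.ones(K)                                  # v[k] holds v_{k+1} (0-based storage)
    log_v_sum = np.zeros(K)
    shape_v = np.full(K, a + a_z + 0.5)
    shape_v[-1] = a + 0.5                           # the last v has no child z
    for sweep in range(n_sweeps):
        # all z given all v:  z_k ~ IG(a_z + a, 1 / (a_z / v_{k-1} + a / v_k))
        v_prev = np.concatenate(([1.0], v[:-1]))
        z = 1.0 / rng.gamma(a_z + a, 1.0 / (a_z / v_prev + a / v))
        # all v given all z:  v_k ~ IG(shape, 1 / (a / z_k + a_z / z_{k+1} + y_k^2 / 2))
        rate = a / z + 0.5 * y ** 2
        rate[:-1] += a_z / z[1:]
        v = 1.0 / rng.gamma(shape_v, 1.0 / rate)
        if sweep >= burn_in:
            log_v_sum += np.log(v)
    return log_v_sum / (n_sweeps - burn_in)

rng = np.random.default_rng(6)
y = np.concatenate([10.0 * rng.standard_normal(100), 0.1 * rng.standard_normal(100)])
log_v_hat = gibbs_gamma_chain(y, a=10.0, a_z=10.0)
print(log_v_hat[:100].mean(), log_v_hat[100:].mean())   # loud segment should be far higher
```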
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: variational Bayes
[Figure: spectrogram panels (axes 0 to 500 and 0 to 120, colour scale −14 to 0): Noisy (X), Original (Xorg), and reconstructions Xh (SNR 19.98), Xv (SNR 20.79), Xb (SNR 19.68), Xg (SNR 19.97)]
Denoising - Music
[Figure: spectrogram panels (axes 0 to 500 and 0 to 250, colour scale −18 to 0): Original, Noisy, PF (SNR 8.53), Gibbs (SNR 8.66), VB (SNR 2.08)]
"Tristram" (Matt Uelmen) + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal tie across time (harmonic continuity)
• Source 2: vertical tie across frequency (transients, percussive sounds)
[Figure: frequency bin (ν) versus time (τ) for the mixture (Xorg) and the two estimated sources (Shor, Sver)]
Single Channel Source Separation with IGMCs
E-guitar "Matte Kudasai" (King Crimson) + Drums "Territory" (Sepultura) = Mix
(all values in dB)
                 s1                      s2
Method           SDR     SIR     SAR     SDR     SIR     SAR
VB               -4.74   -3.28    5.67   -1.58   15.46   -1.37
Gibbs            -4.5    -2.62    4.57    1.05   12.46    1.61
GibbsEM          -4.23   -2.42    4.82    1.34   13.13    1.85
Pre-trained      -4.04   -3.15    8.13    3.56   11.44    4.64
Oracle            6.14   17.16    6.58   12.66   19.95   13.6
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and a, trained on isolated sources
Single Channel Source Separation with IGMCs
"Vandringar I Vilsenhet" (Änglagård) + "Moby Dick" (Led Zeppelin) = Mix
(all values in dB)
                 s1                      s2
Method           SDR     SIR     SAR     SDR     SIR     SAR
VB               -7.8    -6.22    4.53   -2.35   18.4    -2.25
Gibbs            -8.46   -7.53    6.93   -4.04   14.59   -3.83
GibbsEM          -7.74   -6.19    4.62   -1.14   16.62   -0.97
Pre-trained      -6.4    -5.39    6.95    3.8    16.39    4.14
Oracle           12.1    32.9    12.14   21.13   33.89   21.37
Harmonic-Transient Decomposition
[Figure: frequency bin (ν) versus time (τ) for the original (Xorg) and the horizontal (Shor) and vertical (Sver) components]
Chord Detection - Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ versus frequency ν (Hz), 0 to 4000 Hz]
Chord Detection
[Figure: MDCT of the piano chord 41, 48, 51, 56 (frequency in Hz, 0 to 4000, versus time τ in s, 0 to 2.5), and $\log \sum_\nu v_{\nu j \tau}$ for MIDI note j (40 to 65) versus time τ in s]
Multichannel Source Separation
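• Hierarchical prior model (Févotte and Godsill 2005; Cemgil et al. 2006)
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$
$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$
$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$
$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$
[Graphical model: for n = 1, ..., N and k = 1, ..., K, $\lambda_n \to v_{k,n} \to s_{k,n}$; the sources are mixed into the observations $x_{k,1}, \ldots, x_{k,M}$, with mixing vectors $a_1, \ldots, a_M$ and noise variances $r_1, \ldots, r_M$]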
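Ancestral sampling makes the role of each level visible. In the sketch below (NumPy; Gamma and inverse Gamma in the shape/scale convention used above; all numerical settings hypothetical), $1/v_{k,n}$ has mean $1/\lambda_n$. Integrating out $v_{k,n}$ makes $s_{k,n}$ a Student-t variable with ν degrees of freedom and squared scale $\lambda_n$, so its variance is $\lambda_n \nu/(\nu - 2)$ for ν > 2.

```python
import numpy as np

def sample_hierarchical_prior(K, A, nu, a_lam, b_lam, noise_var, seed=0):
    """Ancestral sample: lambda_n ~ G(a_lam, b_lam), v_kn ~ IG(nu/2, 2/(nu lambda_n)),
    s_kn ~ N(0, v_kn), x_km ~ N(a_m^T s_k, r_m).
    A: (M, N) mixing matrix with rows a_m^T; noise_var: length-M array of r_m."""
    rng = np.random.default_rng(seed)
    M, N = A.shape
    lam = rng.gamma(a_lam, b_lam, size=N)                         # source "volumes"
    v = 1.0 / rng.gamma(nu / 2, 2.0 / (nu * lam), size=(K, N))    # IG draw = 1 / Gamma draw
    s = rng.standard_normal((K, N)) * np.sqrt(v)                  # source coefficients
    x = s @ A.T + rng.standard_normal((K, M)) * np.sqrt(noise_var)
    return lam, v, s, x

# Hypothetical underdetermined setting: M = 2 channels, N = 3 sources.
A = np.array([[1.0, 0.6, 0.2], [0.3, 0.8, 1.0]])
lam, v, s, x = sample_hierarchical_prior(50000, A, nu=8.0, a_lam=2.0, b_lam=1.0,
                                         noise_var=np.array([1e-3, 1e-3]))
print(x.shape)                                  # (50000, 2)
print(s.var(axis=0) / (lam * 8.0 / 6.0))        # each ratio should be close to 1
```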
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall "volume" of source n
Source Separation
[Figure: spectrograms, f in Hz (0 to 10000) versus t, of the sources (Speech), (Piano), (Guitar) and the (Mix)]
Reconstructions
[Figure: spectrograms, f in Hz (0 to 10000) versus t, of the reconstructed (Speech), (Piano), and (Guitar) sources]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
[Figure: sampler traces of a, λ, and r over 2000 epochs]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features):
  – Determine the position of a performance on a score
  – Determine where a human listener would clap her hands
  – Create a quantized, human-readable score
  – ...
• Online/real-time or offline/batch
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: state grid for 3/4 time and 4/4 time, tempo level $n_k \in \{1, 2, 3\}$ versus bar position $m_k \in \{1, \ldots, 8\}$]
• Each dot denotes a state x = (m, n) (score position, tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: transition model $p(x_2 \mid x_1)$, three panels of tempo level (1 to 5) versus bar position (1 to 8)]
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over tempo level versus bar position]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over tempo level versus bar position]
Bar position Pointer - k = 3
[Figure: $p(x_3)$ over tempo level versus bar position]
Bar position Pointer - k = 4
[Figure: $p(x_4)$ over tempo level versus bar position]
Bar position Pointer - k = 5
[Figure: $p(x_5 \mid y_{1:5})$ after observing $y_5 = 0$, over tempo level versus bar position]
Bar position Pointer - k = 10
[Figure: $p(x_{10})$ over tempo level versus bar position]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson intensity
[Figure: Poisson intensity $\mu_k$ versus bar position $m_k$ (0 to 1000) for a triplet rhythm and a duplet rhythm]
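Filtering in this model is the standard HMM forward recursion over the joint state x = (m, n). The sketch below is a deliberately small, hypothetical version: 16 bar positions, three integer tempo levels, a position update $m_{k+1} = (m_k + n_k) \bmod M$, a random walk on tempo levels with switch probability `p_change`, and Poisson observations whose intensity peaks on the four beats. It is not the exact transition or observation model of Whiteley, Cemgil and Godsill (2006), only an instance of the same structure.

```python
import math
import numpy as np

def tempo_transition(n_levels, p_change):
    """Tempo level stays with prob 1 - p_change, else moves to a neighbouring level."""
    T = np.zeros((n_levels, n_levels))
    for i in range(n_levels):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n_levels]
        T[i, i] = 1.0 - p_change if nbrs else 1.0
        for j in nbrs:
            T[i, j] = p_change / len(nbrs)
    return T

def bar_pointer_filter(y, n_positions, tempi, p_change, intensity):
    """Forward recursion for p(m_k, n_k | y_{1:k}) with m_{k+1} = (m_k + n_k) mod M,
    a tempo random walk, and y_k | m_k ~ Poisson(intensity[m_k])."""
    T = tempo_transition(len(tempi), p_change)
    log_int = np.log(intensity)
    alpha = np.full((n_positions, len(tempi)), 1.0 / (n_positions * len(tempi)))
    out = []
    for k, yk in enumerate(y):
        if k > 0:
            moved = np.empty_like(alpha)
            for i, step in enumerate(tempi):
                moved[:, i] = np.roll(alpha[:, i], step)    # m -> (m + n) mod M
            alpha = moved @ T                                # then the tempo may change
        ll = yk * log_int - intensity - math.lgamma(yk + 1)  # Poisson log-likelihood per m
        alpha = alpha * np.exp(ll - ll.max())[:, None]
        alpha /= alpha.sum()
        out.append(alpha.copy())
    return np.array(out)

M, tempi = 16, [1, 2, 3]
intensity = np.where(np.arange(M) % 4 == 0, 3.0, 0.05)     # hypothetical: peaks on the 4 beats
rng = np.random.default_rng(7)
true_m = (2 * np.arange(64)) % M                            # constant tempo level 2 from m = 0
y = rng.poisson(intensity[true_m])
post = bar_pointer_filter(y, M, tempi, p_change=0.05, intensity=intensity)
tempo_marginal = post[-1].sum(axis=0)                       # p(n_K | y_{1:K})
print(tempo_marginal.round(3))
```

Smoothing would add the matching backward pass over the same transition structure.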
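Tempo, Rhythm, Meter Analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Graphical model: chains $n_0 \to \cdots \to n_3$, $\theta_0 \to \cdots \to \theta_3$, $m_0 \to \cdots \to m_3$, and $r_0 \to \cdots \to r_3$; intensities $\lambda_1, \lambda_2, \lambda_3$ generate the observations $y_1, y_2, y_3$]
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator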
Filtering
[Figure: filtering results over frame index k (0 to 450): observed data $y_k$; $\log p(m_k \mid y_{1:k})$ over bar position; $\log p(n_k \mid y_{1:k})$ over tempo (quarter notes per minute, 60 to 180); $p(r_k \mid y_{1:k})$ for triplets versus duplets]
Smoothing
[Figure: smoothing results over frame index k (0 to 450): observed data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position; $\log p(n_k \mid y_{1:K})$ over tempo (quarter notes per minute, 60 to 180); $p(r_k \mid y_{1:K})$ for triplets versus duplets]
Time Signature
[Figure: observed data (sample value versus time in s, 0 to 12); $\log p(m_k \mid z_{1:K})$ over bar position; $\log p(n_k \mid z_{1:K})$ over tempo (quarter notes per minute, 52 to 155); $p(\theta_k \mid z_{1:K})$ for 4/4 versus 3/4, over frame index k (0 to 500)]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt and audio signal $x_t$ over time t]
Score-Performance matching - Graphical Model
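[Graphical model: chain $t_1 \to t_2 \to \cdots \to t_K$ with $r_1, \ldots, r_K$ and $\lambda_1, \ldots, \lambda_K$; for each ν = 1, ..., W, variances $v_{\nu,1}, \ldots, v_{\nu,K}$ generate coefficients $s_{\nu,1}, \ldots, s_{\nu,K}$]
$v_{\nu\tau} \sim \mathcal{IG}(v_{\nu\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau)))$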
Score-Performance matching - Signal model
[Figure: $\log \sigma_\nu$ versus frequency ν (Hz), 0 to 4000 Hz]
Score-Performance matching
[Figure: spectrogram data (frequency in Hz, 0 to 4000, versus time in s, 0 to 14) and MIDI data (MIDI note, 55 to 85, versus score position)]
Online (filtering) or offline (smoothing) processing is possible.
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ by MIDI note number (60 to 80) versus time (s); the $\log\lambda$ estimate $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ versus time (s); MDCT of the audio (source: Daniel-Ben Pienaar), frequency in Hz (0 to 4000) versus time (s)]
Summary
• "Time domain": switching state-space models
  – State-space modelling
  – Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
  – Analysis down to sample precision (if required)
  – Computationally quite heavy
• "Transform domain": Gamma fields
  – Models on (orthogonal) transform coefficients; energy compaction
  – Practical: can make use of fast transforms (FFT, MDCT)
  – Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
  – Time-frequency energy distributions
• Ongoing work
  – Comparison of inference methods (VB, MCMC, SMC)
  – Learning
  – Applications
    ∗ Chord detection, polyphonic transcription
    ∗ Musical-score-guided source separation
  – Prior structures for other observation models (NMF)
Page 13
Signal Models for Audio
bull Time domain ndash state space dynamical models
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA) switching state space models
ndash Flexible Physically realistic
ndash Analysis down to sample precision Computationally quite heavy
bull Transform domain ndash Fourier representations Generalised Linear
model
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 12
Sinusoidal Modeling
bull Sound is primarily about oscillations and resonance
bull Cascade of second order sytems
bull Audio signals can often be compactly represented by sinusoidals
(real) yn =
psum
k=1
αkeminusγkn cos(ωkn+ φk)
(complex) yn =
psum
k=1
ck(eminusγk+jωk)n
y = F (γ1p ω1p)c
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 13
State space Parametrisation
xn+1 =
eminusγ1+jω1
eminusγp+jωp
︸ ︷︷ ︸
A
xn x0 =
c1c2cp
yn =(
1 1 1 1)
︸ ︷︷ ︸
C
xn
x0 x1 xkminus1 xk xK
y1 ykminus1 yk yK
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 14
State Space Parametrisation
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 15
Audio RestorationInterpolation
bull Estimate missing samples given observed ones
bull Restoration concatenative expressive speech synthesis
0 50 100 150 200 250 300 350 400 450 500
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 16
Audio Interpolation
p(xnotκ|xκ) prop
int
dHp(xnotκ|H)p(xκ|H)p(H)
H equiv (parameters hidden states)
H
xnotκ xκ
Missing Observed
0 50 100 150 200 250 300 350 400 450 500
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 17
Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
Aν Qν
sν0 middot middot middot sν
k middot middot middot sνKminus1
ν = 0 W minus 1
x0 xk xKminus1
sνk sim N (sν
kAνsνkminus1 Qν) Aν sim N
(
Aν
(cos(ων) minus sin(ων)sin(ων) cos(ων)
)
Ψ
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 18
Inference Structured Variational Bayes
Aα q(Aα) Qα q(Qα)
middot middot middot sαkminus1 sα
ksα
k+1 middot middot middot
α isin C
prod
k q(sαk |s
αkminus1)
xk q(xk)
bull Intuitive algorithm
ndash Substract from the observed signal x the prediction of the frequency bands in notα
ndash Compute a fit for α to this residual and iterate
bull For fixed A Q this is equivalent to Gauss-Seidel an iterative method for solving linear systems of
equations
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 19
Restoration
• Piano
- Signal with missing samples (37)
- Reconstruction: 7.68 dB improvement
- Original
• Trumpet
- Signal with missing samples (37)
- Reconstruction: 7.10 dB improvement
- Original
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
(Graphical model: for each $\nu = 1, \dots, W$, indicators $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and states $\theta^\nu_0, \dots, \theta^\nu_k, \dots, \theta^\nu_K$; observations $y_k, \dots, y_K$)
• Generalises source-filter models
Harmonic model with changepoints
$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \quad r_k \in \{0, 1\}$
$\theta_k \mid \theta_{k-1}, r_k \sim [r_k = 0]\, \underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\, \underbrace{\mathcal{N}(0, S)}_{\text{new}}$
$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$
$A = \mathrm{blkdiag}\left(G_\omega, G_\omega^2, \dots, G_\omega^H\right)^N, \qquad G_\omega = \rho_k \begin{pmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{pmatrix}$
with damping factor $0 < \rho_k < 1$, frame length $N$, and damped sinusoidal basis matrix $C$ of size $N \times 2H$
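Reading the transition matrix as block-diagonal (one $2 \times 2$ block per harmonic $h = 1, \dots, H$, advanced over a frame of $N$ samples), it can be assembled as below. Block $h$ of $A$ is then $\rho^{hN}$ times a rotation by $hN\omega$, so higher harmonics decay faster. A NumPy sketch with illustrative parameter values; the basis matrix $C$ is not constructed here.

```python
import numpy as np


def G(omega, rho):
    """Damped rotation rho * [[cos w, -sin w], [sin w, cos w]]."""
    c, s = np.cos(omega), np.sin(omega)
    return rho * np.array([[c, -s], [s, c]])


def harmonic_transition(omega, rho, H, N):
    """A = blkdiag(G, G^2, ..., G^H)^N for fundamental omega and damping rho."""
    Gw = G(omega, rho)
    blocks = np.zeros((2 * H, 2 * H))
    for h in range(1, H + 1):
        i = 2 * (h - 1)
        blocks[i:i + 2, i:i + 2] = np.linalg.matrix_power(Gw, h)
    return np.linalg.matrix_power(blocks, N)


omega, rho, H, N = 0.05, 0.999, 4, 16
A = harmonic_transition(omega, rho, H, N)
```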
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
- Conditional Gaussians are not closed under marginalization
⇒ Unlike in HMMs or KFMs, summing over $r_k$ does not simplify the filtering density
⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time/space
- Intuition: when a changepoint occurs, the state vector $\theta$ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; Ó Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)
(Example: $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$, with states $\theta_0, \dots, \theta_5$ and observations $y_1, \dots, y_5$)
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
⇒ Trajectories $r^{(i)}_{1:k}$ which are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
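A minimal sketch of why reinitialisation keeps exact inference polynomial: for a scalar level with $A = 1$, an i.i.d. changepoint probability (a simplification of the Markov $p(r_k \mid r_{k-1})$) and a reset prior $\mathcal{N}(0, S)$, the exact filtering density is a mixture indexed by the time of the most recent changepoint. It therefore holds $k + 1$ Gaussians after $k$ observations rather than $2^k$. All names and parameter values are illustrative.

```python
import math


def normal_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def changepoint_filter(ys, p_change, S, Q, R):
    """Exact filtering for theta_k = theta_{k-1} + N(0, Q) if r_k = 0,
    theta_k ~ N(0, S) if r_k = 1, and y_k ~ N(theta_k, R).
    Returns, per step, a list of (weight, mean, var); entry j (j >= 1)
    is the component whose most recent changepoint is at step j."""
    comps = [(1.0, 0.0, S)]  # theta_0 ~ N(0, S)
    history = []
    for y in ys:
        total = sum(w for w, _, _ in comps)
        # Predict: every component continues (r_k = 0) ...
        pred = [(w * (1 - p_change), m, P + Q) for w, m, P in comps]
        # ... or one fresh component starts (r_k = 1), pooling mass from all of them.
        pred.append((total * p_change, 0.0, S))
        # Update each Gaussian with y and reweight by its predictive likelihood.
        updated = []
        for w, m, P in pred:
            K = P / (P + R)
            updated.append((w * normal_pdf(y, m, P + R), m + K * (y - m), (1 - K) * P))
        z = sum(w for w, _, _ in updated)
        comps = [(w / z, m, P) for w, m, P in updated]
        history.append(comps)
    return history


ys = [0.1, -0.2, 0.0, 5.1, 4.9, 5.2]
history = changepoint_filter(ys, p_change=0.05, S=25.0, Q=0.0, R=0.1)
final = history[-1]
most_likely_start = max(range(len(final)), key=lambda j: final[j][0])  # 4
```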
Monophonic model (Cemgil et al. 2006)
• We introduce a pitch label indicator $m$
• At each time $k$, the process can be in one of the {"mute", "sound"} × $M$ states
(Graphical model: chains $r_0, r_1, \dots, r_T$; $m_0, m_1, \dots, m_T$; $s_0, s_1, \dots, s_T$; observations $y_1, \dots, y_T$)
Monophonic Pitch Tracking
Monophonic pitch tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$
[Figure: two panels against time index 1 to 1000]
• If pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
[Figure: exact inference result over sample index 500 to 3500]
Tracking Pitch Variations
• Allow $m$ to change with $k$
• Intractable: need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: axes frequency index $\nu$ and time $k$, with the signal $x_k$]
• Each latent changepoint process $\nu = 1, \dots, W$ corresponds to a "piano key". Indicators $r_{1:W, 1:K}$ encode a latent "piano roll"
Single time slice: Bayesian Variable Selection
$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$
$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$
$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R), \qquad C \equiv [\, C_1 \ \dots \ C_i \ \dots \ C_W \,]$
(Graphical model: $r_1, \dots, r_W \to s_1, \dots, s_W \to x$)
• Generalized linear model: the columns of $C$ are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When $W$ is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman, Attias, …)
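For small $W$ the $2^W$-component posterior can be computed by brute force: integrating out $s$ gives $x \mid r \sim \mathcal{N}(0, \sum_{i: r_i = \text{on}} C_i \Sigma C_i^\top + R)$, which is then weighted by the indicator prior. The sketch assumes a shared $\Sigma$ for all components, $\pi_{\text{off}} = 1 - \pi_{\text{on}}$, and hypothetical sinusoidal basis blocks; its cost doubles with every extra component, which is the intractability noted above.

```python
import itertools

import numpy as np


def indicator_posterior(x, C_blocks, Sigma, R, pi_on):
    """Exact p(r | x) over all 2^W on/off patterns (shared Sigma, pi_off = 1 - pi_on)."""
    configs, log_post = [], []
    for r in itertools.product((0, 1), repeat=len(C_blocks)):
        cov = R.copy()
        for C_i, on in zip(C_blocks, r):
            if on:
                cov += C_i @ Sigma @ C_i.T          # s_i integrated out
        _, logdet = np.linalg.slogdet(cov)
        log_lik = -0.5 * (x @ np.linalg.solve(cov, x) + logdet + len(x) * np.log(2 * np.pi))
        log_prior = sum(np.log(pi_on) if on else np.log(1 - pi_on) for on in r)
        configs.append(r)
        log_post.append(log_lik + log_prior)
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())
    return configs, post / post.sum()


n = np.arange(16)
C_blocks = [np.column_stack([np.cos(2 * np.pi * f * n / 16), np.sin(2 * np.pi * f * n / 16)])
            for f in (1, 2, 3)]                       # hypothetical sinusoidal basis blocks
x = C_blocks[1] @ np.array([1.0, 0.5])               # only the second component is active
configs, post = indicator_posterior(x, C_blocks, np.eye(2), 0.01 * np.eye(16), pi_on=0.1)
best = configs[int(np.argmax(post))]                 # (0, 1, 0)
```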
Factorial Switching State space model
$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$
$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$
$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$ (changepoint indicator)
$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\theta_{k-1,\nu}, Q_\nu(r_k))$ (latent state)
$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R)$ (observation)
(Graphical model: for each $\nu = 1, \dots, W$, indicators $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and states $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_K$; observations $y_k, \dots, y_K$)
Synthetic Data
[Figure: synthetic example; signal $x$ and frequency index $\nu$ against time $k$]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable: computations with large matrices
- Need advanced techniques from linear algebra
- Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical-acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, generalised linear model
• Feature based
Spectrogram
• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau) =$ (frequency, time), such as STFT or MDCT
$x(t) = \sum_k s_k \phi_k(t)$
[Figure: spectrograms of (Speech) and (Piano); axes $t$/sec and $f$/Hz, 0 to 10000 Hz]
• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)
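A minimal NumPy sketch of the STFT coefficients $s_k$, indexed by atom $k = (\nu, \tau)$, and the log power that a spectrogram displays. The Hann window, frame length and hop size are illustrative choices, not values from the talk.

```python
import numpy as np


def stft(x, frame_len=256, hop=128):
    """STFT coefficients s[tau, nu] using a Hann window."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[t * hop:t * hop + frame_len] * window for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def log_power_spectrogram(x, frame_len=256, hop=128, eps=1e-12):
    """log |s_k|^2 for every time-frequency atom k = (nu, tau)."""
    return np.log(np.abs(stft(x, frame_len, hop)) ** 2 + eps)


fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)        # 1 kHz test tone, 1 second
spec = log_power_spectrogram(tone)         # shape (frames, frequency bins)
peak_bins = np.argmax(spec, axis=1)        # 1000 Hz falls in bin 1000 * 256 / 8000 = 32
```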
Models for time-frequency Energy distributions
• Non-negative matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004, …)
$X_{\nu,\tau} = \sum_j W_{\nu,j} S_{j,\tau}$
Spectrogram = Spectral templates × Excitations
- However, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
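The non-additivity is a cross term: for complex transform coefficients $a$ and $b$ at the same atom, $|a + b|^2 = |a|^2 + |b|^2 + 2\,\mathrm{Re}(a \bar{b})$, and the cross term only vanishes on average, e.g. when the relative phase is uniformly random. A short check with an arbitrary, hypothetical relative phase:

```python
import cmath

a = 1.0 + 0.0j                  # coefficient of source 1 at one atom
b = cmath.exp(2.0j)             # unit-magnitude source 2, relative phase 2 rad (arbitrary)
mix_power = abs(a + b) ** 2
sum_of_powers = abs(a) ** 2 + abs(b) ** 2
cross_term = 2 * (a * b.conjugate()).real
```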
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005, …)
$X_{\nu,\tau} = [r_{\nu,\tau} = 0]\, S^{(0)}_{\nu,\tau} + [r_{\nu,\tau} = 1]\, S^{(1)}_{\nu,\tau}$
Spectrogram = Mask × Source 0 + (1 - Mask) × Source 1
- However, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms
$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$
One channel source separation: Gaussian source model
(Graphical model: for $k = 1, \dots, K$, variances $v_{k,1}, \dots, v_{k,N}$ → sources $s_{k,1}, \dots, s_{k,N}$ → observation $x_k$)
$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^N s_{k,n}$
• Straightforward application of Bayes' theorem yields
$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})), \qquad \kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$ (responsibilities)
• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$
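The posterior above is cheap to compute per atom: the responsibilities $\kappa_{k,n}$ act as Wiener-style gains that split $x_k$ among the sources. A pure-Python sketch for a single atom, with made-up variances:

```python
def gaussian_split(x, variances):
    """Posterior of s_n given x = sum_n s_n with s_n ~ N(0, v_n) independently."""
    total = sum(variances)
    kappa = [v / total for v in variances]                        # responsibilities
    means = [k * x for k in kappa]                                # kappa_n * x
    post_vars = [v * (1 - k) for v, k in zip(variances, kappa)]   # v_n * (1 - kappa_n)
    return kappa, means, post_vars


kappa, means, post_vars = gaussian_split(3.0, [1.0, 2.0])
# kappa is about [1/3, 2/3], means about [1.0, 2.0], post_vars about [2/3, 2/3]
```

The posterior means always sum back to the observation, so the split is lossless.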
One channel source separation: Poisson source model
(Graphical model: for $k = 1, \dots, K$, intensities $v_{k,1}, \dots, v_{k,N}$ → sources $s_{k,1}, \dots, s_{k,N}$ → observation $x_k$)
$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^N s_{k,n}$
• This is the generative model for NMF when we write $v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (template × excitation)
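With $v = t \times e$, EM for this Poisson model is known to reduce to the multiplicative updates of KL-divergence NMF (Lee and Seung), where the E-step's multinomial split of each $x_k$ in proportion to $v_{k,n}$ is folded into the update. A NumPy sketch under that reading; the synthetic data, rank and iteration count are illustrative:

```python
import numpy as np


def kl_nmf(X, n_components, n_iter=200, seed=0, eps=1e-12):
    """Fit X ~ Poisson(T @ E) (templates T, excitations E) by multiplicative updates."""
    rng = np.random.default_rng(seed)
    F, N = X.shape
    T = rng.random((F, n_components)) + 0.1    # templates t_{nu, n}
    E = rng.random((n_components, N)) + 0.1    # excitations e_{n, tau}
    costs = []
    for _ in range(n_iter):
        V = T @ E + eps
        T *= ((X / V) @ E.T) / (E.sum(axis=1) + eps)
        V = T @ E + eps
        E *= (T.T @ (X / V)) / (T.sum(axis=0)[:, None] + eps)
        V = T @ E + eps
        # Generalised KL divergence = Poisson negative log-likelihood up to a constant.
        costs.append(float(np.sum(X * np.log((X + eps) / V) - X + V)))
    return T, E, costs


rng = np.random.default_rng(1)
true_rate = 10 * rng.random((20, 3)) @ rng.random((3, 30))
X = rng.poisson(true_rate).astype(float)
T, E, costs = kl_nmf(X, n_components=3)
```

These updates never increase the divergence, which is why the recorded costs form a non-increasing sequence.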
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: densities $p(x)$ for $0 \le x \le 5$; left panel $a = 0.9, 1, 1.3, 2$ with $b = 1$; right panel $(a, b) = (1, 1), (1, 0.5), (2, 1)$]
$\mathcal{G}(x; a, z) \equiv \exp\left((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\right)$
$\mathcal{IG}(x; a, z) \equiv \exp\left((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\right)$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity, inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
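The two log densities above translate directly into code. In this parametrisation $z$ is a scale: $\mathcal{G}(x; a, z)$ has mean $az$, and $\mathcal{IG}(x; a, z)$ is the law of $1/y$ when $y \sim \mathcal{G}(y; a, z)$. The numerical check (a midpoint rule on a truncated range, chosen for illustration) confirms that both normalise to one.

```python
import math


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a-1) log x - x/z + a log(1/z) - log Gamma(a)."""
    return (a - 1) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = (a+1) log(1/x) - 1/(z x) + a log(1/z) - log Gamma(a)."""
    return -(a + 1) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)


def integrate(f, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


mass_gamma = integrate(lambda x: math.exp(log_gamma_pdf(x, 2.0, 1.0)), 0.0, 60.0)
mean_gamma = integrate(lambda x: x * math.exp(log_gamma_pdf(x, 2.0, 1.0)), 0.0, 60.0)  # about a*z = 2
mass_inv_gamma = integrate(lambda x: math.exp(log_inv_gamma_pdf(x, 3.0, 0.5)), 0.0, 200.0)
```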
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:
$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$
$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$
(Graphical model: $z_1 \cdots v_{k-1}\ z_k\ v_k\ z_{k+1} \cdots$, with couplings $a_z$, $a$, $a_z$ along the chain)
• Variance variables $v$ are the prior variances of the sources
• Auxiliary variables $z$ are needed for conjugacy and positive correlation
• Shape parameters $a$ and $a_z$ describe coupling strength and drift of the chain
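Ancestral sampling from this chain needs only Gamma draws, since $\mathcal{IG}(x; a, z)$ in this parametrisation is the law of $1/y$ with $y$ Gamma-distributed with shape $a$ and scale $z$. In the sketch, `z1` (the value of $z_1$) and the seed are illustrative assumptions; larger $a$ and $a_z$ give smoother paths of $\log v_k$.

```python
import math
import random


def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Draw v_1..v_K from the inverse Gamma-Markov chain
    v_k | z_k ~ IG(a, z_k / a),  z_{k+1} | v_k ~ IG(a_z, v_k / a_z)."""
    rng = random.Random(seed)
    z = z1
    v = []
    for _ in range(K):
        v_k = 1.0 / rng.gammavariate(a, z / a)        # E[1 / v_k] = z_k
        v.append(v_k)
        z = 1.0 / rng.gammavariate(a_z, v_k / a_z)    # E[1 / z_{k+1}] = v_k
    return v


log_v = [math.log(v) for v in sample_igmc(K=1000, a=10.0, a_z=10.0)]
smooth = sample_igmc(K=100, a=1000.0, a_z=1000.0)
```

Because $z_{k+1}$ is close to $1/v_k$ and $v_{k+1}$ is close to $1/z_{k+1}$, consecutive variances are positively correlated even though each $v$-$z$ link is "inverting".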
Gamma Chains: typical draws
[Figure: three sample paths of $\log v_k$ for $k = 1, \dots, 1000$, with $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains with changepoints: typical draws
[Figure: three sample paths of $\log v_k$ for $k = 1, \dots, 1000$, with $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (pairwise)
$\phi_{z_k} = \exp\left((a_z + a + 1)\log z_k^{-1}\right)$ (singletons)
(Graphical model: $z_1 \cdots v_{k-1}\ z_k\ v_k\ z_{k+1} \cdots$, with couplings $a_z$, $a$, $a_z$)
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$\psi_{i,j} = \exp(-a_{i,j}\, \xi_i^{-1} \xi_j^{-1})$ (pairwise)
$\phi_i = \exp\left(\left(\sum_j a_{i,j} + 1\right)\log \xi_i^{-1}\right)$ (singletons)
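Multiplying $\phi_i$ by the pairwise potentials that touch node $i$ gives $\xi_i^{-(\sum_j a_{i,j} + 1)} \exp(-\xi_i^{-1} \sum_j a_{i,j} \xi_j^{-1})$, an inverse Gamma kernel; so $1/\xi_i$ given its neighbours is Gamma with shape $\sum_j a_{i,j}$ and rate $\sum_j a_{i,j}/\xi_j$. This is what makes Gibbs sampling straightforward on any topology. A sketch for nodes without observation factors; the adjacency-list format and the small grid are assumptions for illustration.

```python
import random


def resample_node(i, xi, neighbours, rng):
    """Draw xi[i] from its full conditional in a Gamma field.
    neighbours[i] is a list of (j, a_ij) pairs; node i has no observation factor."""
    shape = sum(a_ij for _, a_ij in neighbours[i])
    rate = sum(a_ij / xi[j] for j, a_ij in neighbours[i])
    xi[i] = 1.0 / rng.gammavariate(shape, 1.0 / rate)   # 1/xi_i ~ Gamma(shape, rate)


# Hypothetical 2x2 grid field (a 4-cycle 0-1-3-2-0) with all couplings equal to 5.
neighbours = {0: [(1, 5.0), (2, 5.0)], 1: [(0, 5.0), (3, 5.0)],
              2: [(0, 5.0), (3, 5.0)], 3: [(1, 5.0), (2, 5.0)]}
rng = random.Random(0)
xi = [1.0, 1.0, 1.0, 1.0]
for sweep in range(100):
    for i in neighbours:
        resample_node(i, xi, neighbours, rng)
```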
Possible Model Topologies
Approximate Inference
• Stochastic
- Markov chain Monte Carlo: Gibbs sampler
- Sequential Monte Carlo: particle filtering
• Deterministic
- Variational Bayes
In all of these, conjugacy helps
VB or Gibbs
(Factor graph: chain $\psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, \dots$ with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$)
• VB
$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$
• Gibbs
$v^{(\tau)}_k \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z^{(\tau)}_k)\, \psi_{k,k+1}(z^{(\tau)}_{k+1})$
VB or Gibbs
(Factor graph: chain $\psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, \dots$ with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$)
• VB
$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$
• Gibbs
$z^{(\tau)}_k \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v^{(\tau)}_{k-1})\, \psi_{k,k}(v^{(\tau)}_k)$
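Both Gibbs steps have closed forms because of conjugacy. Collecting the $v_k$ terms from $\mathcal{N}(y_k; 0, v_k)$, $\mathcal{IG}(v_k; a, z_k/a)$ and $\mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$ gives $1/v_k \sim$ Gamma with shape $a + a_z + 1/2$ and rate $y_k^2/2 + a/z_k + a_z/z_{k+1}$; likewise $1/z_k \sim$ Gamma with shape $a_z + a$ and rate $a_z/v_{k-1} + a/v_k$. The sketch assumes a noise-free zero-mean observation $y_k \sim \mathcal{N}(0, v_k)$, a fixed hypothetical $v_0$ for the first $z$, and an extra terminal $z_{K+1}$; it is illustrative, not the sampler behind the results below.

```python
import random


def igmc_gibbs(y, a, a_z, n_sweeps=200, v0=1.0, seed=0):
    """Gibbs sampler for an inverse Gamma-Markov chain with y_k ~ N(0, v_k).
    Python index k holds v_{k+1} in v and z_{k+1} in z (z has one extra entry)."""
    rng = random.Random(seed)
    K = len(y)
    v = [1.0] * K
    z = [1.0] * (K + 1)
    for _ in range(n_sweeps):
        for k in range(K + 1):
            v_prev = v[k - 1] if k > 0 else v0      # parent v of this z (v_0 is fixed)
            shape, rate = a_z, a_z / v_prev
            if k < K:                               # this z is also the parent of v[k]
                shape += a
                rate += a / v[k]
            z[k] = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        for k in range(K):
            shape = a + a_z + 0.5
            rate = 0.5 * y[k] ** 2 + a / z[k] + a_z / z[k + 1]
            v[k] = 1.0 / rng.gammavariate(shape, 1.0 / rate)
    return v, z


y = [0.1] * 20 + [3.0] * 20          # quiet segment followed by a loud one
v, z = igmc_gibbs(y, a=10.0, a_z=10.0)
```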
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: log-magnitude time-frequency panels (colour scale -14 to 0): Noisy ($X$), Original ($X_{org}$), and estimates $X_h$ (SNR1998), $X_v$ (SNR2079), $X_b$ (SNR1968), $X_g$ (SNR1997)]
Denoising - Music
[Figure: log-magnitude time-frequency panels (colour scale -18 to 0): Original, Noisy, PF (SNR853), Gibbs (SNR866), VB (SNR208)]
"Tristram (Matt Uelmen)" + ~0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal tie across time (harmonic continuity)
• Source 2: vertical tie across frequency (transients, percussive sounds)
[Figure: time ($\tau$) by frequency bin ($\nu$) panels: $X_{org}$, $S_{hor}$, $S_{ver}$]
Single Channel Source Separation with IGMCs
E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix
                 s1                      s2
Method           SDR    SIR    SAR       SDR    SIR    SAR
VB               -474   -328   567       -158   1546   -137
Gibbs            -45    -262   457       105    1246   161
GibbsEM          -423   -242   482       134    1313   185
Pre-trained      -404   -315   813       356    1144   464
Oracle           614    1716   658       1266   1995   136
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$ trained on isolated sources
Single Channel Source Separation with IGMCs
"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix
                 s1                      s2
Method           SDR    SIR    SAR       SDR    SIR    SAR
VB               -78    -622   453       -235   184    -225
Gibbs            -846   -753   693       -404   1459   -383
GibbsEM          -774   -619   462       -114   1662   -097
Pre-trained      -64    -539   695       38     1639   414
Oracle           121    329    1214      2113   3389   2137
Harmonic-Transient Decomposition
[Figure: time ($\tau$) by frequency bin ($\nu$) panels: $X_{org}$, $S_{hor}$, $S_{ver}$]
Chord Detection - Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (Hz), 0 to 4000 Hz]
Chord Detection
[Figure: MDCT of piano chord 41, 48, 51, 56 (frequency in Hz, 0 to 4000, against time $\tau$ in s); and $\log \sum_\nu v_{\nu,j,\tau}$ for MIDI note $j$ (40 to 65) against time $\tau$ in s]
Multichannel Source Separation
• Hierarchical prior model (Févotte and Godsill 2005; Cemgil et al. 2006)
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$
$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$
$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$
$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$
for $k = 1, \dots, K$, sources $n = 1, \dots, N$ and channels $m = 1, \dots, M$
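Ancestral sampling from this hierarchical prior shows how it generates an underdetermined mixture. Because the slide leaves the priors on $a_m$ and $r_m$ unspecified, the sketch fixes them to hypothetical standard-normal mixing vectors and a common noise variance; $b_\lambda$ is read as a scale parameter, matching the $\mathcal{G}(x; a, z)$ definition given earlier. All numbers are illustrative.

```python
import numpy as np


def sample_mixture(K, N, M, nu=4.0, a_lam=2.0, b_lam=1.0, noise_var=0.01, seed=0):
    """Draw lambda, v, s and an M-channel mixture x from the hierarchical prior."""
    rng = np.random.default_rng(seed)
    lam = rng.gamma(shape=a_lam, scale=b_lam, size=N)         # lambda_n ~ G(a_lam, b_lam)
    # v_{k,n} ~ IG(nu/2, 2/(nu lambda_n)), i.e. 1/v ~ Gamma(shape nu/2, scale 2/(nu lambda_n))
    v = 1.0 / rng.gamma(shape=nu / 2, scale=2.0 / (nu * lam), size=(K, N))
    s = np.sqrt(v) * rng.standard_normal((K, N))               # s_{k,n} ~ N(0, v_{k,n})
    A = rng.standard_normal((M, N))                            # hypothetical a_m as rows of A
    x = s @ A.T + np.sqrt(noise_var) * rng.standard_normal((K, M))  # x_{k,m} ~ N(a_m^T s_k, r_m)
    return lam, v, s, A, x


lam, v, s, A, x = sample_mixture(K=1000, N=3, M=2)           # 3 sources, 2 channels
```

Under this reading $E[1/v_{k,n}] = 1/\lambda_n$, so $\lambda_n$ sets the typical variance, i.e. the overall "volume", of source $n$.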
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall "volume" of source $n$
Source Separation
[Figure: spectrograms of (Speech), (Piano), (Guitar) and (Mix); axes $t$/sec and $f$/Hz]
Reconstructions
[Figure: reconstructed spectrograms of (Speech), (Piano) and (Guitar); axes $t$/sec and $f$/Hz]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
[Figure: traces of $a$, $\lambda$ and $r$ against epoch, 0 to 2000]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features)
- Determine the position of a performance on a score
- Determine where a human listener would clap her hands
- Create a quantized, human-readable score
- …
• Online/real-time or offline/batch
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: state lattice of bar position $m_k = 1, \dots, 8$ by tempo level $n_k = 1, 2, 3$, with bar lengths for 3/4 and 4/4 time marked]
• Each dot denotes a state $x = (m, n)$ (score position, tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three panels of $p(x_2 \mid x_1)$ over bar position (1 to 8) and tempo level (1 to 5)]
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: $p(x_3)$ over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: $p(x_4)$ over bar position and tempo level]
Bar position Pointer - k = 5
$y_5 = 0$
[Figure: $p(x_5 \mid y_{1:5})$ over bar position and tempo level]
Bar position Pointer - k = 10
[Figure: $p(x_{10})$ over bar position and tempo level]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson intensity
[Figure: Poisson intensity $\mu_k$ against bar position $m_k$ (0 to 1000) for a triplet rhythm and a duplet rhythm]
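Because the state $x = (m, n)$ is discrete, filtering in the bar-pointer model is a standard HMM forward recursion. In the sketch the position advances by the tempo level each frame, $m_{k+1} = (m_k + n_k) \bmod M$, the tempo level moves up or down by one with small probability, and the Poisson intensity is a hypothetical high value on four beat positions per bar; none of these numbers come from the talk.

```python
import math

import numpy as np


def bar_pointer_transition(M, n_tempi, p_change=0.05):
    """Transition matrix over states (m, n), flattened as m * n_tempi + (n - 1)."""
    S = M * n_tempi
    T = np.zeros((S, S))
    for m in range(M):
        for n in range(1, n_tempi + 1):
            i = m * n_tempi + (n - 1)
            m_next = (m + n) % M                         # advance by the tempo level
            for dn, p in ((-1, p_change), (0, 1 - 2 * p_change), (1, p_change)):
                n_next = min(max(n + dn, 1), n_tempi)    # clip at slowest/fastest level
                T[i, m_next * n_tempi + (n_next - 1)] += p
    return T


def forward_filter(ys, T, mu):
    """p(x_k | y_{1:k}) for Poisson observations with state-dependent intensity mu."""
    alpha = np.full(T.shape[0], 1.0 / T.shape[0])
    log_mu = np.log(mu)
    posteriors = []
    for y in ys:
        pred = alpha @ T                                 # p(x_k | y_{1:k-1})
        lik = np.exp(y * log_mu - mu - math.lgamma(y + 1))
        alpha = pred * lik
        alpha /= alpha.sum()
        posteriors.append(alpha)
    return np.array(posteriors)


M, n_tempi = 16, 3
T = bar_pointer_transition(M, n_tempi)
mu_pos = np.where(np.arange(M) % 4 == 0, 2.0, 0.05)     # hypothetical beat-position intensity
mu = np.repeat(mu_pos, n_tempi)                          # one value per (m, n) state
ys = [1 if k % 2 == 0 else 0 for k in range(60)]         # onsets on every other frame
post = forward_filter(ys, T, mu)
tempo_marginal = post[-1].reshape(M, n_tempi).sum(axis=0)
```

With onsets on every other frame, the filtered tempo marginal concentrates on level 2, the only constant tempo that lands on a beat position exactly when an onset is observed.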
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
(Graphical model: chains $n_0, \dots, n_3$; $\theta_0, \dots, \theta_3$; $m_0, \dots, m_3$; $r_0, \dots, r_3$; intensities $\lambda_1, \dots, \lambda_3$; observations $y_1, \dots, y_3$)
• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator
Filtering
[Figure: against frame index $k$: observed data $y_k$; $\log p(m_k \mid y_{1:k})$; $\log p(n_k \mid y_{1:k})$ in quarter notes per minute; $p(r_k \mid y_{1:k})$ for triplets vs duplets]
Smoothing
[Figure: against frame index $k$: observed data $y_k$; $\log p(m_k \mid y_{1:K})$; $\log p(n_k \mid y_{1:K})$ in quarter notes per minute; $p(r_k \mid y_{1:K})$ for triplets vs duplets]
Time Signature
[Figure: observed data (sample value against time in s); against frame index $k$: $\log p(m_k \mid z_{1:K})$, $\log p(n_k \mid z_{1:K})$ in quarter notes per minute, and $p(\theta_k \mid z_{1:K})$ for 4/4 vs 3/4]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
Score-Performance matching - Graphical Model
(Graphical model: for each $\nu = 1, \dots, W$, variances $v_{\nu,1}, \dots, v_{\nu,K}$ and coefficients $s_{\nu,1}, \dots, s_{\nu,K}$; chains $t_1, \dots, t_K$, $r_1, \dots, r_K$ and $\lambda_1, \dots, \lambda_K$)
$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau)))$
Score-Performance matching - Signal model
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (Hz), 0 to 4000 Hz]
Score-Performance matching
[Figure: spectrogram data (frequency in Hz against time in s) and MIDI data (MIDI note against score position)]
Online (filtering) or offline (smoothing) processing is possible
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ by MIDI note number against time (s); $\log \sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ against time (s); MDCT of audio (source: Daniel-Ben Pienaar), frequency (Hz) against time (s)]
Summary
• "Time domain": switching state space models
- State space modeling
- Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
- Analysis down to sample precision (if required)
- Computationally quite heavy
• "Transform domain": Gamma fields
- Models on (orthogonal) transform coefficients: energy compaction
- Practical: can make use of fast transforms (FFT, MDCT, …)
- Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
- Time-frequency energy distributions
• Ongoing work
- Comparison of inference methods (VB, MCMC, SMC)
- Learning
- Applications
* Chord detection, polyphonic transcription
* Musical score guided source separation
- Prior structures for other observation models: NMF
x0 x1 xkminus1 xk xK
y1 ykminus1 yk yK
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 14
State Space Parametrisation
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 15
Audio RestorationInterpolation
bull Estimate missing samples given observed ones
bull Restoration concatenative expressive speech synthesis
0 50 100 150 200 250 300 350 400 450 500
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 16
Audio Interpolation
p(xnotκ|xκ) prop
int
dHp(xnotκ|H)p(xκ|H)p(H)
H equiv (parameters hidden states)
H
xnotκ xκ
Missing Observed
0 50 100 150 200 250 300 350 400 450 500
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 17
Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
Aν Qν
sν0 middot middot middot sν
k middot middot middot sνKminus1
ν = 0 W minus 1
x0 xk xKminus1
sνk sim N (sν
kAνsνkminus1 Qν) Aν sim N
(
Aν
(cos(ων) minus sin(ων)sin(ων) cos(ων)
)
Ψ
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 18
Inference Structured Variational Bayes
Aα q(Aα) Qα q(Qα)
middot middot middot sαkminus1 sα
ksα
k+1 middot middot middot
α isin C
prod
k q(sαk |s
αkminus1)
xk q(xk)
bull Intuitive algorithm
ndash Substract from the observed signal x the prediction of the frequency bands in notα
ndash Compute a fit for α to this residual and iterate
bull For fixed A Q this is equivalent to Gauss-Seidel an iterative method for solving linear systems of
equations
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 19
Restoration
bull Piano
ndash Signal with missing samples (37)
ndash Reconstruction 768 dB improvement
ndash Original
bull Trumpet
ndash Signal with missing samples (37)
ndash Reconstruction 710 dB improvement
ndash Original
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 20
piano_missingwav
Media File (audiowav)
piano_kalmanwav
Media File (audiowav)
piano_cleanwav
Media File (audiowav)
trumpet_missingwav
Media File (audiowav)
trumpet_kalmanwav
Media File (audiowav)
trumpet_cleanwav
Media File (audiowav)
Hierarchical Factorial Models
bull Each component models a latent process
bull The observations are projections
rν0 middot middot middot rν
k middot middot middot rνK
θν0 middot middot middot θ
νk middot middot middot θ
νK
ν = 1 W
yk yK
bull Generalises Source-filter models
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 21
Harmonic model with changepoints
rk|rkminus1 sim p(rk|rkminus1) rk isin 0 1
θk|θkminus1 rk sim [rk = 0]N (Aθkminus1 Q)︸ ︷︷ ︸
reg
+ [rk = 1]N (0 S)︸ ︷︷ ︸
new
yk|θk sim N (Cθk R)
A =
Gω
G2ω
GH
ω
N
Gω = ρk
(cos(ω) minus sin(ω)sin(ω) cos(ω)
)
damping factor 0 lt ρk lt 1 framelength N and damped sinusoidal basis matrix C of size N times 2H
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 22
Exact Inference in switching state space models is intractable
bull In general exact inference is NP hard
ndash Conditional Gaussians are not closed under marginalization
rArr Unlike HMMrsquos or KFMrsquos summing over rk does not simplify the filteringdensity
rArr Number of Gaussian kernels to represent exact filtering density p(rk θk|y1k)increases exponentially
minus7903666343
076292
minus103422
minus101982minus2393
minus27957
minus04593
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 23
Exact Inference for Changepoint detection
bull Exact inference is achievable in polynomial timespace
ndash Intuition When a changepoint occurs the state vector θ is reinitializedrArr Number of Gaussians kernels grows only polynomially (See eg Barry and Hartigan
1992 Digalakis et al 1993 O Ruanaidh and Fitzgerald 1996 Gustaffson 2000 Fearnhead 2003 Zoeter and
Heskes 2006)
r1 = 1 r2 = 0 r3 = 0 r4 = 1 r5 = 0
θ0 θ1 θ2 θ3 θ4 θ5
y1 y2 y3 y4 y5
bull The same structure can be exploited for the MMAP problem arg maxr1kp(r1k|y1k)
rArr Trajectories of r(i)1k which are dominated in terms of conditional evidence
p(y1k r(i)1k) can be discarded without destroying optimality
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 24
Monophonic model (Cemgil et al 2006)
bull We introduce a pitch label indicator m
bull At each time k the process can be in one of the ldquomuterdquo ldquosoundrdquo timesM states
r0 r1 rT
m0 m1 mT
s0 s1 sT
y1 yT
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 25
Monophonic Pitch Tracking
Monophonic pitch tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$
[Figure: input signal and pitch-tracking result over time steps 1 to 1000]
• If the pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TASLP)
[Figure: transcription by exact inference, time index up to 3500]
Tracking Pitch Variations
• Allow $m$ to change with $k$
[Figure: pitch-tracking result over time steps 1 to 500]
• This is intractable, so we need to resort to approximate inference (mixture Kalman filter, i.e. Rao-Blackwellized particle filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: latent piano roll (frequency index $\nu$ versus time $k$) and observed signal $x_k$]
• Each latent changepoint process $\nu = 1, \dots, W$ corresponds to a “piano key”. The indicators $r_{1:W, 1:K}$ encode a latent “piano roll”
Single time slice - Bayesian Variable Selection
$$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$$
$$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$$
$$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R), \qquad C \equiv [\, C_1 \ \cdots \ C_i \ \cdots \ C_W \,]$$
[Figure: graphical model with indicators $r_1, \dots, r_W$, coefficients $s_1, \dots, s_W$ and observation $x$]
• Generalized linear model: the columns of $C$ are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When $W$ is large, computing posterior quantities becomes intractable
• Sparsity by construction (Olshausen and Millman; Attias)
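For small $W$, the $2^W$-component posterior can be enumerated directly. This minimal sketch makes several simplifying assumptions:

- the coefficients $s_i$ are scalars with $\Sigma = \sigma^2$;
- the observation $x$ is two-dimensional;
- the noise is isotropic, $R = \text{noise} \cdot I$.

Integrating out $s$ then gives $x \mid r_{1:W} \sim \mathcal{N}(0, R + \sigma^2 \sum_{i:\, r_i = \text{on}} C_i C_i^\top)$, so each configuration is scored in closed form. The function names and the toy basis are hypothetical.

```python
import math
from itertools import product


def log_gauss2(x, cov):
    """Log density of a zero-mean 2-D Gaussian with 2x2 covariance cov at x."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # x^T cov^{-1} x via the explicit inverse [[d, -b], [-c, a]] / det
    quad = (d * x[0] ** 2 - (b + c) * x[0] * x[1] + a * x[1] ** 2) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def configuration_posterior(x, C, sigma2, noise, pi_on):
    """Exact posterior p(r_{1:W} | x) over all 2^W on/off configurations.

    Integrating out s gives x | r ~ N(0, noise * I + sigma2 * sum_{i on} C_i C_i^T).
    """
    W = len(C)
    log_w = {}
    for r in product((0, 1), repeat=W):
        cov = [[noise, 0.0], [0.0, noise]]
        for on, c in zip(r, C):
            if on:
                for p in range(2):
                    for q in range(2):
                        cov[p][q] += sigma2 * c[p] * c[q]
        n_on = sum(r)
        log_prior = n_on * math.log(pi_on) + (W - n_on) * math.log(1.0 - pi_on)
        log_w[r] = log_prior + log_gauss2(x, cov)
    top = max(log_w.values())  # subtract the maximum before exponentiating
    norm = sum(math.exp(lw - top) for lw in log_w.values())
    return {r: math.exp(lw - top) / norm for r, lw in log_w.items()}


C = [[1.0, 0.0], [0.0, 1.0], [0.7071, 0.7071]]  # hypothetical basis vectors C_1, C_2, C_3
post = configuration_posterior([3.0, 0.1], C, sigma2=4.0, noise=0.1, pi_on=0.3)
best = max(post, key=post.get)
print(len(post), best)
```

The loop is exponential in $W$, which is exactly why this enumeration stops being an option for realistic numbers of basis vectors.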
Factorial Switching State space model
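$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$$
$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$$
$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu})) \quad \text{(changepoint indicator)}$$
$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_{k,\nu})\, \theta_{k-1,\nu}, Q_\nu(r_{k,\nu})) \quad \text{(latent state)}$$
$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k\, \theta_{k,1:W}, R) \quad \text{(observation)}$$
[Figure: for each $\nu = 1, \dots, W$, a chain of indicators $r^\nu_0, \dots, r^\nu_K$ and states $s^\nu_0, \dots, s^\nu_K$; all chains feed the observations $y_k, \dots, y_K$]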
Synthetic Data
[Figure: synthetic piano roll (frequency index $\nu$ versus time $k$) and the resulting signal $x$]
Technical Difficulties
• Inference is computationally heavy
• Vanilla Kalman filtering methods are not numerically stable: the computations involve large matrices
– Advanced techniques from linear algebra are needed
– Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical (acoustical)
• Time domain: state-space dynamical models
• Transform domain: Fourier representations, generalised linear model
• Feature based
Spectrogram
• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau) = (\text{frequency}, \text{time})$, such as those of the STFT or MDCT
$$x(t) = \sum_k s_k \phi_k(t)$$
[Figure: spectrograms, frequency $f$ (Hz) versus time $t$, of (Speech) and (Piano) examples]
• A spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of the STFT)
Models for time-frequency Energy distributions
• Non-negative matrix factorisation (Sha, Saul and Lee 2002; Smaragdis and Brown 2003; Virtanen 2003; Abdallah and Plumbley 2004)
$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$
Spectrogram = Spectral templates × Excitations
– However, spectrograms are not additive: $a^2 + b^2 \neq (a + b)^2$
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic and Ellis 2005)
$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$
Spectrogram = Mask × Source0 + (1 − Mask) × Source1
– However, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms
$$p(s \mid v)\, p(v) = \Big(\prod_k p(s_k \mid v_k)\Big)\, p(v)$$
One-channel source separation: Gaussian source model
[Figure: graphical model with variances $v_{k,1}, \dots, v_{k,N}$, sources $s_{k,1}, \dots, s_{k,N}$ and observation $x_k$, for $k = 1, \dots, K$]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$
• A straightforward application of Bayes’ theorem yields
$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})\big), \qquad \kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \quad \text{(responsibilities)}$$
• Each source coefficient $s_{k,n}$ gets a fraction $\kappa_{k,n}$ of the observation $x_k$
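The responsibilities are easy to check numerically. This sketch (plain Python, hypothetical function name) computes the posterior mean $\kappa_{k,n} x_k$ and variance $v_{k,n}(1 - \kappa_{k,n})$ for a single time-frequency atom. The posterior means always add up to $x_k$, so the observation is split exactly among the sources.

```python
def gaussian_split(x, v):
    """Posterior of each source s_n given the mixture x = sum_n s_n.

    Sources are independent N(0, v_n); returns the means and variances of
    p(s_n | v, x) = N(kappa_n x, v_n (1 - kappa_n)) with kappa_n = v_n / sum(v).
    """
    total = sum(v)
    kappa = [v_n / total for v_n in v]  # responsibilities
    means = [k * x for k in kappa]
    variances = [v_n * (1.0 - k) for v_n, k in zip(v, kappa)]
    return means, variances


means, variances = gaussian_split(2.0, [1.0, 3.0])
print(means, variances)  # [0.5, 1.5] [0.75, 0.75]
```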
One-channel source separation: Poisson source model
[Figure: graphical model with variances $v_{k,1}, \dots, v_{k,N}$, sources $s_{k,1}, \dots, s_{k,N}$ and observation $x_k$, for $k = 1, \dots, K$]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$
• This is the generative model for NMF when we write $v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (template × excitation)
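Under the Poisson model, $s_{k,1:N} \mid x_k$ is multinomial with probabilities $\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$, so $\langle s_{k,n} \rangle = \kappa_{k,n} x_k$. Plugging this E-step into closed-form M-steps for the templates and the excitations yields the multiplicative updates of KL-divergence NMF (as in Lee and Seung).

The pure-Python sketch below uses toy random counts and illustrative names. It alternates the two updates and records the generalised KL divergence, which is the Poisson negative log-likelihood up to a constant and should not increase.

```python
import math
import random


def reconstruct(T, E):
    """vhat[f][k] = sum_n T[f][n] * E[k][n], the Poisson intensities of the model."""
    return [[sum(tf[n] * ek[n] for n in range(len(tf))) for ek in E] for tf in T]


def kl_divergence(X, T, E):
    """Generalised KL divergence D(X || vhat): Poisson negative log-likelihood up to a constant."""
    d = 0.0
    for x_row, v_row in zip(X, reconstruct(T, E)):
        for x, v in zip(x_row, v_row):
            d += (x * math.log(x / v) if x > 0 else 0.0) - x + v
    return d


def nmf_em_step(X, T, E):
    """One EM sweep: update templates T (E fixed), then excitations E (T fixed).

    The E-step expectation E[s_{f,k,n} | x_{f,k}] = x_{f,k} T[f][n] E[k][n] / vhat[f][k]
    is folded into each closed-form M-step, giving the KL multiplicative updates.
    """
    F, K, N = len(X), len(X[0]), len(T[0])
    V = reconstruct(T, E)
    T = [[T[f][n] * sum(X[f][k] / V[f][k] * E[k][n] for k in range(K))
          / sum(E[k][n] for k in range(K)) for n in range(N)] for f in range(F)]
    V = reconstruct(T, E)
    E = [[E[k][n] * sum(X[f][k] / V[f][k] * T[f][n] for f in range(F))
          / sum(T[f][n] for f in range(F)) for n in range(N)] for k in range(K)]
    return T, E


rng = random.Random(0)
X = [[rng.randint(1, 9) for _ in range(6)] for _ in range(5)]  # toy count "spectrogram"
T = [[rng.random() + 0.1 for _ in range(2)] for _ in range(5)]  # templates t_{nu,n}
E = [[rng.random() + 0.1 for _ in range(2)] for _ in range(6)]  # excitations e_{tau,n}
history = [kl_divergence(X, T, E)]
for _ in range(50):
    T, E = nmf_em_step(X, T, E)
    history.append(kl_divergence(X, T, E))
print(history[0], history[-1])
```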
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: two panels of densities $p(x)$ for $0 \le x \le 5$; left legend $(a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1)$; right legend $(a, b) = (1, 1), (1, 0.5), (2, 1)$]
$$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1)\log x - z^{-1} x + a \log z^{-1} - \log\Gamma(a)\big)$$
$$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log\Gamma(a)\big)$$
• Gamma: conjugate prior for the Gaussian precision, the Poisson intensity and the inverse Gamma scale
• Inverse Gamma: conjugate prior for the Gaussian variance and the Gamma scale
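A quick numerical check of the two definitions. In this sketch $z$ acts as a scale parameter, so $\mathbb{E}[x] = az$ under $\mathcal{G}(x; a, z)$; the helper names are illustrative. The code checks three things:

- $1/x \sim \mathcal{G}(a, z)$ when $x \sim \mathcal{IG}(a, z)$, since the two log densities differ by the Jacobian term $-2\log x$;
- both densities integrate to approximately one;
- $\mathcal{G}(x; 1, 1) = e^{-x}$.

```python
import math


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a - 1) log x - x / z + a log(1/z) - log Gamma(a), with scale z."""
    return (a - 1) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = (a + 1) log(1/x) - 1/(z x) + a log(1/z) - log Gamma(a)."""
    return -(a + 1) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)


def total_mass(logpdf, a, z, hi, n=200_000):
    """Midpoint-rule approximation of the integral of exp(logpdf) over (0, hi]."""
    h = hi / n
    return h * sum(math.exp(logpdf((i + 0.5) * h, a, z)) for i in range(n))


# If y ~ G(a, z) then x = 1/y ~ IG(a, z): the log densities differ by -2 log x.
x, a, z = 0.7, 2.0, 0.5
jacobian_gap = log_inv_gamma_pdf(x, a, z) - (log_gamma_pdf(1.0 / x, a, z) - 2.0 * math.log(x))
print(abs(jacobian_gap) < 1e-12)
print(total_mass(log_gamma_pdf, 2.0, 1.0, 50.0), total_mass(log_inv_gamma_pdf, 2.0, 1.0, 200.0))
```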
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:
$$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a), \qquad z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$$
[Figure: chain $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$ with coupling parameters $a_z, a, a_z$ on successive links]
• The variance variables $v$ are the prior variances of the sources
• The auxiliary variables $z$ are needed for conjugacy and for positive correlation
• The shape parameters $a$ and $a_z$ describe the coupling strength and the drift of the chain
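The chain can be simulated by ancestral sampling. This sketch uses only the standard library. `random.gammavariate(alpha, beta)` takes `beta` as a scale, which matches the scale convention of $\mathcal{IG}$ above: an $\mathcal{IG}(a, z)$ draw is the reciprocal of a $\mathcal{G}(a, z)$ draw.

Under this parametrisation, $v_{k+1} = v_k\, a\, g / (a_z\, g')$ with independent $g \sim \mathcal{G}(a_z, 1)$ and $g' \sim \mathcal{G}(a, 1)$. Larger shapes therefore give smaller steps in $\log v_k$, and the comparison at the end checks this. The starting value $z_1$ and the seeds are arbitrary choices.

```python
import math
import random


def sample_igmc(K, a, a_z, z1=1.0, rng=None):
    """Draw v_1, ..., v_K from the inverse Gamma-Markov chain.

    v_k | z_k ~ IG(a, z_k / a) and z_{k+1} | v_k ~ IG(a_z, v_k / a_z), where
    IG(x; a, z) means 1/x ~ G(a, z) with scale z. random.gammavariate(alpha, beta)
    takes beta as a scale, so an IG(a, z) draw is 1 / gammavariate(a, z).
    The starting value z1 is an arbitrary choice for this sketch.
    """
    rng = rng or random.Random()
    v, z = [], z1
    for _ in range(K):
        v_k = 1.0 / rng.gammavariate(a, z / a)  # v_k | z_k ~ IG(a, z_k / a)
        v.append(v_k)
        z = 1.0 / rng.gammavariate(a_z, v_k / a_z)  # z_{k+1} | v_k ~ IG(a_z, v_k / a_z)
    return v


def mean_abs_log_step(v):
    """Average |log v_{k+1} - log v_k|; small values indicate strong coupling."""
    return sum(abs(math.log(nxt / prev)) for prev, nxt in zip(v, v[1:])) / (len(v) - 1)


rng = random.Random(1)
tight = sample_igmc(2000, a=100.0, a_z=100.0, rng=rng)
loose = sample_igmc(2000, a=2.0, a_z=2.0, rng=rng)
print(mean_abs_log_step(tight), mean_abs_log_step(loose))
```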
Gamma Chains: typical draws
[Figure: three typical draws of $\log v_k$ versus $k = 1, \dots, 1000$, for $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains with changepoints: typical draws
[Figure: three typical draws of $\log v_k$ versus $k = 1, \dots, 1000$, for $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains
• The joint density can be written as a product of singleton and pairwise potentials of the form
$$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1}) \quad \text{(pairwise)}, \qquad \phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \quad \text{(singletons)}$$
[Figure: chain $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$ with coupling parameters $a_z, a, a_z$]
Gamma Fields
• The joint density can be written as a product of singleton and pairwise potentials
$$\psi_{i,j} = \exp(-a_{i,j}\, \xi_i^{-1} \xi_j^{-1}) \quad \text{(pairwise)}, \qquad \phi_i = \exp\Big(\big(\textstyle\sum_j a_{i,j} + 1\big)\log \xi_i^{-1}\Big) \quad \text{(singletons)}$$
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov chain Monte Carlo: Gibbs sampler
– Sequential Monte Carlo: particle filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps
VB or Gibbs
[Figure: factor graph with $\psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, \cdots$ and observation factors $p(y_1 \mid v_1), p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(v_k) \leftarrow \exp\big(\log\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\big)$$
• Gibbs
$$v^{(\tau)}_k \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z^{(\tau)}_k)\, \psi_{k,k+1}(z^{(\tau)}_{k+1})$$
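Because of conjugacy, both Gibbs conditionals are inverse Gamma. Here "rate $b$" means $1/x \sim \mathcal{G}(\text{shape}, 1/b)$.

- Collecting the factors that involve $v_k$ ($\mathcal{N}(y_k; 0, v_k)$, $\mathcal{IG}(v_k; a, z_k/a)$ and $\mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$) gives $v_k \mid z_k, z_{k+1}, y_k$ with shape $a + a_z + \tfrac12$ and rate $a/z_k + a_z/z_{k+1} + y_k^2/2$.
- Likewise, $z_k \mid v_{k-1}, v_k$ has shape $a_z + a$ and rate $a_z/v_{k-1} + a/v_k$.

The sketch below is a toy illustration, not the sampler behind the results reported in this talk. It assumes Gaussian observations $y_k \sim \mathcal{N}(0, v_k)$ and a hypothetical fixed $v_0$ for the first link.

```python
import random


def draw_ig(shape, rate, rng):
    """IG draw in the scale convention: 1/x ~ G(shape, scale = 1/rate)."""
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)


def gibbs_igmc(y, a, a_z, n_sweeps=400, v0=1.0, seed=0):
    """Blocked Gibbs sampler for y_k ~ N(0, v_k) under an inverse Gamma-Markov chain prior.

    Given all v the z's are conditionally independent (and vice versa), so each block
    is drawn exactly:
      z_k | v_{k-1}, v_k      ~ IG(shape a_z + a,       rate a_z / v_{k-1} + a / v_k)
      v_k | z_k, z_{k+1}, y_k ~ IG(shape a + a_z + 1/2, rate a / z_k + a_z / z_{k+1} + y_k^2 / 2)
    The last v_K has no z_{K+1} child. v0 is a hypothetical fixed variance for the
    first link. Returns Monte Carlo posterior means of v_k over the second half of the run.
    """
    rng = random.Random(seed)
    K = len(y)
    v = [1.0] * K
    z = [1.0] * K  # z[k] sits between v[k-1] (or v0) and v[k]
    totals = [0.0] * K
    kept = 0
    for sweep in range(n_sweeps):
        for k in range(K):
            v_prev = v[k - 1] if k > 0 else v0
            z[k] = draw_ig(a_z + a, a_z / v_prev + a / v[k], rng)
        for k in range(K):
            shape, rate = a + 0.5, a / z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = draw_ig(shape, rate, rng)
        if sweep >= n_sweeps // 2:
            kept += 1
            totals = [t + v_k for t, v_k in zip(totals, v)]
    return [t / kept for t in totals]


rng = random.Random(42)
y = [rng.gauss(0.0, 0.1) for _ in range(50)] + [rng.gauss(0.0, 3.0) for _ in range(50)]
v_hat = gibbs_igmc(y, a=5.0, a_z=5.0)
print(sum(v_hat[:50]) / 50, sum(v_hat[50:]) / 50)
```

Sampling all $z$ first and then all $v$ is valid because the chain is bipartite: each block is conditionally independent given the other.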
VB or Gibbs
[Figure: factor graph with $\psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, \cdots$ and observation factors $p(y_1 \mid v_1), p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(z_k) \leftarrow \exp\big(\log\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\big)$$
• Gibbs
$$z^{(\tau)}_k \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v^{(\tau)}_{k-1})\, \psi_{k,k}(v^{(\tau)}_k)$$
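The VB updates have the same form as the Gibbs conditionals, with the neighbouring values replaced by expectations. Both factors stay inverse Gamma:

- $\langle 1/z_k \rangle = (a_z + a) / (a_z \langle 1/v_{k-1} \rangle + a \langle 1/v_k \rangle)$;
- $q(v_k)$ has shape $a + a_z + \tfrac12$ and rate $a \langle 1/z_k \rangle + a_z \langle 1/z_{k+1} \rangle + y_k^2/2$.

This is a sketch assuming Gaussian observations $y_k \sim \mathcal{N}(0, v_k)$ and a hypothetical fixed $v_0$ for the first link. The names and the toy data are illustrative.

```python
import random


def vb_igmc(y, a, a_z, n_iter=200, v0=1.0):
    """Mean-field VB for y_k ~ N(0, v_k) under an inverse Gamma-Markov chain prior.

    By conjugacy every factor q(v_k), q(z_k) is inverse Gamma, stored as (shape, rate)
    with 1/x ~ G(shape, scale 1/rate), so <1/x> = shape / rate. v0 is a hypothetical
    fixed variance for the first link; a > 1/2 is assumed so that each posterior
    mean <v_k> = rate / (shape - 1) is finite.
    """
    K = len(y)
    inv_v = [1.0] * K  # <1/v_k>
    inv_z = [1.0] * K  # <1/z_k>
    q_v = []
    for _ in range(n_iter):
        for k in range(K):
            prev = inv_v[k - 1] if k > 0 else 1.0 / v0
            inv_z[k] = (a_z + a) / (a_z * prev + a * inv_v[k])
        q_v = []
        for k in range(K):
            shape, rate = a + 0.5, a * inv_z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:  # the last v_K has no z_{K+1} neighbour
                shape += a_z
                rate += a_z * inv_z[k + 1]
            q_v.append((shape, rate))
            inv_v[k] = shape / rate
    return [rate / (shape - 1.0) for shape, rate in q_v]


rng = random.Random(42)
y = [rng.gauss(0.0, 0.1) for _ in range(50)] + [rng.gauss(0.0, 3.0) for _ in range(50)]
v_mean = vb_igmc(y, a=5.0, a_z=5.0)
print(sum(v_mean[:50]) / 50, sum(v_mean[50:]) / 50)
```

Unlike a sampler, these updates are deterministic: repeated runs return identical estimates, at the price of the usual mean-field approximation.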
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
• Inference: variational Bayes
[Figure: log-magnitude spectrograms (frequency bin 1 to 500 versus frame 1 to 120) of the Noisy input ($X$), the Original ($X_{\text{org}}$) and four estimates $X_h$, $X_v$, $X_b$, $X_g$, each labelled with its SNR]
Denoising – Music
[Figure: log-magnitude spectrograms (frequency bin 1 to 500 versus frame 1 to 250) of the Original, the Noisy input, and the PF, Gibbs and VB estimates, each estimate labelled with its SNR]
“Tristram” (Matt Uelmen) + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal; tie across time (harmonic continuity)
• Source 2: vertical; tie across frequency (transients, percussive sounds)
[Figure: original spectrogram $X_{\text{org}}$ and estimated components $S_{\text{hor}}$ and $S_{\text{ver}}$; frequency bin $\nu$ versus time $\tau$]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai” (King Crimson) + drums “Territory” (Sepultura) = mix
Separation performance (dB):
Method        s1 SDR   s1 SIR   s1 SAR   s2 SDR   s2 SIR   s2 SAR
VB            -4.74    -3.28     5.67    -1.58    15.46    -1.37
Gibbs         -4.5     -2.62     4.57     1.05    12.46     1.61
GibbsEM       -4.23    -2.42     4.82     1.34    13.13     1.85
Pre-trained   -4.04    -3.15     8.13     3.56    11.44     4.64
Oracle         6.14    17.16     6.58    12.66    19.95    13.6
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet” (Anglagard) + “Moby Dick” (Led Zeppelin) = mix
Separation performance (dB):
Method        s1 SDR   s1 SIR   s1 SAR   s2 SDR   s2 SIR   s2 SAR
VB            -7.8     -6.22     4.53    -2.35    18.4     -2.25
Gibbs         -8.46    -7.53     6.93    -4.04    14.59    -3.83
GibbsEM       -7.74    -6.19     4.62    -1.14    16.62    -0.97
Pre-trained   -6.4     -5.39     6.95     3.8     16.39     4.14
Oracle        12.1     32.9     12.14    21.13    33.89    21.37
Harmonic-Transient Decomposition
[Figure: original spectrogram $X_{\text{org}}$ and its horizontal ($S_{\text{hor}}$) and vertical ($S_{\text{ver}}$) components; frequency bin $\nu$ versus time $\tau$. Audio examples: (Original) (Hor) (Vert)]
Chord Detection - Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ versus frequency $\nu$ (Hz), 0 to 4000 Hz]
Chord Detection
[Figure: MDCT of a piano chord (MIDI notes 41, 48, 51, 56), frequency (Hz, 0 to 4000) versus time $\tau$ (s); $\log \sum_\nu v_{\nu,j,\tau}$ per MIDI note $j$ (40 to 65) versus time $\tau$ (s)]
Multichannel Source Separation
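• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$$
$$v_{k,n} \sim \mathcal{IG}\big(v_{k,n}; \nu/2, 2/(\nu\lambda_n)\big)$$
$$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$
$$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m), \qquad k = 1, \dots, K$$
$$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$$
[Figure: graphical model with volumes $\lambda_1, \dots, \lambda_N$, variances $v_{k,n}$, sources $s_{k,n}$, observations $x_{k,1}, \dots, x_{k,M}$ for $k = 1, \dots, K$, mixing vectors $a_1, \dots, a_M$ and noise variances $r_1, \dots, r_M$]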
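Ancestral sampling from this hierarchy shows the role of each layer. This toy sketch uses two channels and three sources, an underdetermined case. The slide leaves the priors of $a_m$ and $r_m$ unspecified, so $a_m \sim \mathcal{N}(0, I)$ and a fixed $r_m$ are assumptions here, as are all default hyperparameter values.

Under the scale convention for $\mathcal{G}$ and $\mathcal{IG}$ used earlier, $\mathbb{E}[1/v_{k,n}] = 1/\lambda_n$. So $\lambda_n$ sets the typical variance of source $n$, and each $s_{k,n}$ is marginally Student-t with $\nu$ degrees of freedom. Here $\nu$ is the degrees-of-freedom parameter, not a frequency index.

```python
import random


def sample_multichannel(K, N, M, nu=4.0, a_lam=2.0, b_lam=1.0, noise=0.01, seed=0):
    """Draw one toy data set from the hierarchical multichannel prior.

    lambda_n ~ G(a_lam, b_lam)            (scale b_lam)
    v_kn     ~ IG(nu/2, 2/(nu lambda_n))  i.e. 1/v_kn ~ G(nu/2, scale 2/(nu lambda_n))
    s_kn     ~ N(0, v_kn)
    x_km     ~ N(a_m^T s_k, r_m)
    Assumptions of this sketch: a_m ~ N(0, I) and r_m = noise for every channel.
    """
    rng = random.Random(seed)
    lam = [rng.gammavariate(a_lam, b_lam) for _ in range(N)]
    A = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]  # row m is a_m
    V, S, X = [], [], []
    for _ in range(K):
        v = [1.0 / rng.gammavariate(nu / 2, 2.0 / (nu * lam[n])) for n in range(N)]
        s = [rng.gauss(0.0, v_n ** 0.5) for v_n in v]
        x = [rng.gauss(sum(a_m[n] * s[n] for n in range(N)), noise ** 0.5) for a_m in A]
        V.append(v)
        S.append(s)
        X.append(x)
    return lam, A, V, S, X


lam, A, V, S, X = sample_multichannel(K=5000, N=3, M=2)
# E[1/v_kn] = (nu/2) * 2/(nu lambda_n) = 1/lambda_n, so this ratio should be close to 1.
for n in range(3):
    print(n, lam[n], sum(1.0 / v[n] for v in V) / len(V) * lam[n])
```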
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$
Source Separation
[Figure: spectrograms, frequency $f$ (Hz) versus time $t$, of the three sources (Speech), (Piano), (Guitar) and of the (Mix)]
Reconstructions
[Figure: spectrograms, frequency $f$ (Hz) versus time $t$, of the reconstructed (Speech), (Piano) and (Guitar) sources]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
[Figure: traces of $a$, $\lambda$ and $r$ versus epoch (0 to 2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features)
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– …
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: lattice of states, bar position $m_k = 1, \dots, 8$ (horizontal) by tempo level $n_k = 1, 2, 3$ (vertical), shown for 3/4 and 4/4 time]
• Each dot denotes a state $x = (m, n)$ (score position, tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three examples of the transition model $p(x_2 \mid x_1)$ over bar position (1 to 8) and tempo level (1 to 5)]
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: $p(x_3)$ over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: $p(x_4)$ over bar position and tempo level]
Bar position Pointer - k = 5
[Figure: $p(x_5 \mid y_{1:5})$ after observing $y_5 = 0$, over bar position and tempo level]
Bar position Pointer - k = 10
[Figure: $p(x_{10})$ over bar position and tempo level]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson intensity
[Figure: Poisson intensity $\mu_k$ versus bar position $m_k$ (0 to 1000) for a triplet rhythm and a duplet rhythm]
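Filtering in the bar-pointer model is an HMM forward recursion over the joint state $x_k = (m_k, n_k)$. The sketch below is a simplified toy version, not the published model, and all names and numbers in it are hypothetical. It assumes:

- a bar of $M$ positions;
- tempo levels that advance the pointer by $n$ positions per frame;
- a ±1 random walk on the tempo level;
- a hand-made Poisson intensity with peaks at two beat positions.

```python
import math


def poisson_logpmf(y, mu):
    """log PO(y; mu) for a non-negative integer count y."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def bar_pointer_filter(ys, M, n_levels, mu, p_change=0.1):
    """Forward filtering of p(m_k, n_k | y_{1:k}) in a toy bar-pointer HMM.

    Assumptions of this sketch: the bar position m in {0, ..., M-1} advances by the
    tempo level n in {1, ..., n_levels} each frame and wraps at the end of the bar;
    the tempo keeps its level with probability 1 - p_change and otherwise moves one
    level up or down (staying put if that would leave the range); y_k ~ PO(mu[m_k]).
    """
    states = [(m, n) for m in range(M) for n in range(1, n_levels + 1)]
    alpha = {s: 1.0 / len(states) for s in states}  # uniform p(x_0)
    posteriors = []
    for y in ys:
        pred = dict.fromkeys(states, 0.0)  # one-step prediction p(x_k | y_{1:k-1})
        for (m, n), w in alpha.items():
            moves = {n: 1.0 - p_change}
            for n_new in (n - 1, n + 1):
                target = n_new if 1 <= n_new <= n_levels else n
                moves[target] = moves.get(target, 0.0) + p_change / 2
            for n_next, p in moves.items():
                pred[((m + n) % M, n_next)] += w * p
        # Correct with the Poisson likelihood and renormalise.
        alpha = {s: w * math.exp(poisson_logpmf(y, mu[s[0]])) for s, w in pred.items()}
        total = sum(alpha.values())
        alpha = {s: w / total for s, w in alpha.items()}
        posteriors.append(alpha)
    return posteriors


M = 8
mu = [5.0 if m % 4 == 0 else 0.2 for m in range(M)]  # beats at bar positions 0 and 4
# Hand-made observations from tempo level 2 starting at m_0 = 0, so m_k = 2k mod 8.
ys = [5 if (2 * k) % M in (0, 4) else 0 for k in range(1, 13)]
last = bar_pointer_filter(ys, M, n_levels=2, mu=mu)[-1]
print(sum(w for (m, n), w in last.items() if n == 2))
```

Replacing the forward pass with a forward-backward pass would give the smoothing densities instead of the filtering ones.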
Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Figure: graphical model with tempo $n_0, \dots, n_3$, time signature $\theta_0, \dots, \theta_3$, bar position $m_0, \dots, m_3$, rhythm $r_0, \dots, r_3$, intensities $\lambda_1, \dots, \lambda_3$ and observations $y_1, \dots, y_3$]
• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator
Filtering
[Figure: observed data $y_k$ and filtering densities versus frame index $k$: $\log p(m_k \mid y_{1:k})$ (bar position), $\log p(n_k \mid y_{1:k})$ (tempo in quarter notes per minute, 60 to 180) and $p(r_k \mid y_{1:k})$ (triplets versus duplets)]
Smoothing
[Figure: observed data $y_k$ and smoothing densities versus frame index $k$: $\log p(m_k \mid y_{1:K})$ (bar position), $\log p(n_k \mid y_{1:K})$ (tempo in quarter notes per minute, 60 to 180) and $p(r_k \mid y_{1:K})$ (triplets versus duplets)]
Time Signature
[Figure: observed data (sample value versus time, 0 to 12 s) and smoothing densities versus frame index $k$: $\log p(m_k \mid z_{1:K})$ (bar position), $\log p(n_k \mid z_{1:K})$ (tempo in quarter notes per minute) and $p(\theta_k \mid z_{1:K})$ (time signature 3/4 versus 4/4)]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: musical score and audio signal $x_t$ versus time $t$]
Score-Performance matching - Graphical Model
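[Figure: graphical model with score positions $t_1, \dots, t_K$, indicators $r_1, \dots, r_K$, intensities $\lambda_1, \dots, \lambda_K$ and, for each $\nu = 1, \dots, W$, variances $v_{\nu,1}, \dots, v_{\nu,K}$ and coefficients $s_{\nu,1}, \dots, s_{\nu,K}$; inset: score positions 1 to 8 with indicator $r_k$]
$$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau))\big)$$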
Score-Performance matching - Signal model
[Figure: $\log \sigma_\nu$ versus frequency $\nu$ (Hz), 0 to 4000 Hz]
Score-Performance matching
[Figure: spectrogram data, frequency (Hz, 0 to 4000) versus time (s, 0 to 14); MIDI data, MIDI note (55 to 85) versus score position]
Online (filtering) or offline (smoothing) processing is possible
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ per MIDI note number (60 to 80) versus time (s); $\log$ of $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ versus time (s); MDCT of the audio (source: Daniel-Ben Pienaar), frequency (Hz, 0 to 4000) versus time (s)]
Summary
• “Time domain”: switching state-space models
– State-space modelling
– Conditionally linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform domain”: Gamma fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical-score-guided source separation
– Prior structures for other observation models (NMF)
State space Parametrisation
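$$x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix}$$
$$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}}_{C} x_n$$
[Figure: chain $x_0, x_1, \dots, x_{k-1}, x_k, \dots, x_K$ with observations $y_1, \dots, y_{k-1}, y_k, \dots, y_K$]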
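Because $A$ is diagonal, each state component evolves independently as a damped complex exponential, and $y_n = \sum_{l=1}^{p} c_l\, e^{(-\gamma_l + j\omega_l) n}$. The sketch iterates the recursion with Python's `cmath` and compares the result with this closed form. The parameter values are arbitrary, and a real-valued signal would pair each mode with its complex conjugate.

```python
import cmath


def simulate_damped_modes(gammas, omegas, c, n_steps):
    """Iterate x_n = A x_{n-1}, y_n = C x_n for n = 1..n_steps.

    A = diag(exp(-gamma_l + i omega_l)) is diagonal, so each state component is one
    damped complex exponential; C = (1, ..., 1) sums them; x_0 = c.
    """
    poles = [cmath.exp(complex(-g, w)) for g, w in zip(gammas, omegas)]
    x = list(c)
    ys = []
    for _ in range(n_steps):
        x = [p * x_l for p, x_l in zip(poles, x)]  # x_n = A x_{n-1}
        ys.append(sum(x))  # y_n = C x_n
    return ys


gammas, omegas, c = [0.01, 0.05], [0.3, 1.1], [1.0 + 0.0j, 0.5j]
ys = simulate_damped_modes(gammas, omegas, c, 20)
# Closed form: y_n = sum_l c_l exp((-gamma_l + i omega_l) n)
closed = [sum(c_l * cmath.exp(complex(-g, w) * n) for c_l, g, w in zip(c, gammas, omegas))
          for n in range(1, 21)]
print(max(abs(u - v) for u, v in zip(ys, closed)))
```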
State Space Parametrisation
Audio Restoration/Interpolation
• Estimate missing samples given observed ones
• Restoration, concatenative/expressive speech synthesis
[Figure: audio signal with missing samples, sample index 0 to 500]
Audio Interpolation
$$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\, p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H), \qquad H \equiv (\text{parameters, hidden states})$$
[Figure: graphical model in which $H$ generates the missing samples $x_{\neg\kappa}$ and the observed samples $x_\kappa$; example signal over sample index 0 to 500]
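With $H$ held fixed and a Gaussian model, the integrand is Gaussian and the missing samples follow by conditioning: $\mathbb{E}[x_{\neg\kappa} \mid x_\kappa] = \Sigma_{\neg\kappa,\kappa} \Sigma_{\kappa,\kappa}^{-1} x_\kappa$. The sketch assumes a stationary AR(1) prior with a known coefficient $\rho$, a hypothetical stand-in for the richer models in this talk, and it does not integrate over $H$. For a single interior gap, the result reduces to $\rho(x_{t-1} + x_{t+1}) / (1 + \rho^2)$, which serves as a check.

```python
def ar1_cov(n, rho):
    """Stationary AR(1) covariance with unit innovation variance: rho^|i-j| / (1 - rho^2)."""
    return [[rho ** abs(i - j) / (1 - rho ** 2) for j in range(n)] for i in range(n)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def interpolate(x, missing, rho):
    """Posterior mean E[x_missing | x_observed] under the AR(1) prior with fixed rho."""
    n = len(x)
    obs = [i for i in range(n) if i not in missing]
    S = ar1_cov(n, rho)
    S_oo = [[S[i][j] for j in obs] for i in obs]
    w = solve(S_oo, [x[i] for i in obs])  # Sigma_oo^{-1} x_o
    return {m: sum(S[m][j] * w_j for j, w_j in zip(obs, w)) for m in missing}


x = [0.3, -0.2, 0.0, 0.5, 0.1, -0.4]  # x[2] is treated as missing; its value is ignored
est = interpolate(x, {2}, rho=0.9)
print(est[2], 0.9 * (x[1] + x[3]) / (1 + 0.9 ** 2))
```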
Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
[Figure: for each $\nu = 0, \dots, W - 1$, parameters $A_\nu, Q_\nu$ and a chain $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_{K-1}$; observations $x_0, \dots, x_k, \dots, x_{K-1}$]
$$s^\nu_k \sim \mathcal{N}(s^\nu_k; A_\nu s^\nu_{k-1}, Q_\nu), \qquad A_\nu \sim \mathcal{N}\left(A_\nu; \begin{pmatrix} \cos\omega_\nu & -\sin\omega_\nu \\ \sin\omega_\nu & \cos\omega_\nu \end{pmatrix}, \Psi\right)$$
Inference: Structured Variational Bayes
[Figure: structured factorisation with $q(A_\alpha)$, $q(Q_\alpha)$ and $\prod_k q(s^\alpha_k \mid s^\alpha_{k-1})$ for each band $\alpha \in C$, and $q(x_k)$]
• Intuitive algorithm
– Subtract from the observed signal $x$ the prediction of the frequency bands in $\neg\alpha$
– Compute a fit for $\alpha$ to this residual and iterate
• For fixed $A$ and $Q$, this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
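Gauss-Seidel solves $Ax = b$ by updating one unknown at a time against the residual left by the others. This mirrors refitting band α to the residual of the bands in ¬α. Here is a minimal sketch on an arbitrary symmetric positive definite system, for which the method is guaranteed to converge:

```python
def gauss_seidel(A, b, n_iter=100):
    """Solve A x = b by Gauss-Seidel sweeps, always using the latest values of the other unknowns."""
    n = len(b)
    x = [0.0] * n
    for _ in range(n_iter):
        for i in range(n):
            # Residual of equation i once the other unknowns' contributions are removed,
            # analogous to fitting one band to the residual of all other bands.
            r = b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = r / A[i][i]
    return x


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = gauss_seidel(A, b)
print(x)  # approaches the exact solution (2/9, 1/9, 13/9)
```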
Restoration
• Piano
– Signal with missing samples (37)
– Reconstruction: 7.68 dB improvement
– Original
• Trumpet
– Signal with missing samples (37)
– Reconstruction: 7.10 dB improvement
– Original
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
[Figure: for each $\nu = 1, \dots, W$, a chain of indicators $r^\nu_0, \dots, r^\nu_K$ and latent states $\theta^\nu_0, \dots, \theta^\nu_K$; observations $y_k, \dots, y_K$]
• Generalises source-filter models
Harmonic model with changepoints
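$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$
$$\theta_k \mid \theta_{k-1}, r_k \sim [r_k = 0]\, \underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\, \underbrace{\mathcal{N}(0, S)}_{\text{new}}$$
$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$
$$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^N, \qquad G_\omega = \rho_k \begin{pmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{pmatrix}$$
damping factor $0 < \rho_k < 1$, frame length $N$, and damped sinusoidal basis matrix $C$ of size $N \times 2H$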
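Because $A$ is block diagonal, its $N$-th power acts block by block: $(G_\omega^h)^N = \rho_k^{hN} R(hN\omega)$, where $R(\cdot)$ is a 2×2 rotation. So harmonic $h$ rotates by $h\omega$ and decays by a factor $\rho_k^h$ per sample, and one frame transition advances all harmonics by $N$ samples. The sketch checks this identity numerically with plain nested lists; the helper names and parameter values are illustrative.

```python
import math


def rot(theta, scale=1.0):
    """scale * [[cos, -sin], [sin, cos]] as nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[scale * c, -scale * s], [scale * s, scale * c]]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def matpow(P, e):
    R = [[1.0 if i == j else 0.0 for j in range(len(P))] for i in range(len(P))]
    for _ in range(e):
        R = matmul(R, P)
    return R


def frame_transition_blocks(omega, rho, H, N):
    """2x2 diagonal blocks of A = blockdiag(G, G^2, ..., G^H)^N with G = rho * R(omega)."""
    G = rot(omega, rho)
    return [matpow(matpow(G, h), N) for h in range(1, H + 1)]


omega, rho, H, N = 0.2, 0.99, 3, 16
blocks = frame_transition_blocks(omega, rho, H, N)
for h, B in enumerate(blocks, start=1):
    target = rot(h * N * omega, rho ** (h * N))  # rho^(hN) R(h N omega)
    print(h, max(abs(B[i][j] - target[i][j]) for i in range(2) for j in range(2)))
```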
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 22
Exact Inference in switching state space models is intractable
bull In general exact inference is NP hard
ndash Conditional Gaussians are not closed under marginalization
rArr Unlike HMMrsquos or KFMrsquos summing over rk does not simplify the filteringdensity
rArr Number of Gaussian kernels to represent exact filtering density p(rk θk|y1k)increases exponentially
minus7903666343
076292
minus103422
minus101982minus2393
minus27957
minus04593
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 23
Exact Inference for Changepoint detection
bull Exact inference is achievable in polynomial timespace
ndash Intuition When a changepoint occurs the state vector θ is reinitializedrArr Number of Gaussians kernels grows only polynomially (See eg Barry and Hartigan
1992 Digalakis et al 1993 O Ruanaidh and Fitzgerald 1996 Gustaffson 2000 Fearnhead 2003 Zoeter and
Heskes 2006)
r1 = 1 r2 = 0 r3 = 0 r4 = 1 r5 = 0
θ0 θ1 θ2 θ3 θ4 θ5
y1 y2 y3 y4 y5
bull The same structure can be exploited for the MMAP problem arg maxr1kp(r1k|y1k)
rArr Trajectories of r(i)1k which are dominated in terms of conditional evidence
p(y1k r(i)1k) can be discarded without destroying optimality
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 24
Monophonic model (Cemgil et al 2006)
bull We introduce a pitch label indicator m
bull At each time k the process can be in one of the ldquomuterdquo ldquosoundrdquo timesM states
r0 r1 rT
m0 m1 mT
s0 s1 sT
y1 yT
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 25
Monophonic Pitch Tracking
Monophonic Pitch Tracking = Online estimation (filtering) of p(rkmk|y1k)
100 200 300 400 500 600 700 800 900 1000minus100
minus50
0
50
100 200 300 400 500 600 700 800 900 1000
5
10
15
bull If pitch is constant exact inference is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 26
Transcription
bull Detecting onsets offsets and pitch to sample precision (Cemgil et al 2006 IEEE
TSALP)
500 1000 1500 2000 2500 3000 3500
Exact inference (S)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 27
d1wav
Media File (audiowav)
Tracking Pitch Variations
bull Allow m to change with k
50 100 150 200 250 300 350 400 450 500
bull Intractable need to resort to approximate inference (Mixture Kalman Filter -Rao-Blackwellized Particle Filter)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 28
Factorial Generative models for Analysis of Polyphonic Audio
νfr
eque
ncy
k
x k
bull Each latent changepoint process ν = 1 W corresponds to a ldquopiano keyrdquoIndicators r1W1K encode a latent ldquopiano rollrdquo (S1) (S2) (S3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 29
montuno1wav
Media File (audiowav)
montuno2wav
Media File (audiowav)
montuno3wav
Media File (audiowav)
Single time slice - Bayesian Variable Selection
ri sim C(ri πon πoff)
si|ri sim [ri = on]N (si 0 Σ) + [ri 6= on]δ(si)
x|s1W sim N (x Cs1W R)
C equiv [ C1 Ci CW ]
r1 rW
s1 sW
x
bull Generalized Linear Model ndash Columnrsquos of C are the basis vectors
bull The exact posterior is a mixture of 2W Gaussians
bull When W is large computation of posterior features becomes intractable
bull Sparsity by construction (Olshausen and Millman Attias )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 30
Factorial Switching State space model
r0ν sim C(r0ν π0ν)
θ0ν sim N (θ0ν microν Pν)
rkν|rkminus1ν sim C(rkν πν(rtminus1ν)) Changepoint indicator
θkν|θkminus1ν sim N (θkν Aν(rk)θkminus1ν Qν(rk)) Latent state
yk|θk1W sim N (yk Ckθk1W R) Observation
rν0 middot middot middot rν
k middot middot middot rνK
sν0 middot middot middot sν
k middot middot middot sνK
ν = 1 W
yk yK
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 31
Synthetic Data
νx
freq
ν
ν
k
(S)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 32
audio_examplewav
Media File (audiowav)
Technical Difficulties
bull Inference is quite heavy
bull Vanilla Kalman filtering methods are not stable ndash computations with
large matrices
ndash Need advance techniques from linear algebra
ndash Interesting links to subspace methods
bull Hyperparameter learning is necessary
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 33
Modelling levels
bull Physical - acoustical
bull Time domain ndash state space dynamical models
bull Transform domain ndash Fourier representations Generalised Linear
model
bull Feature Based
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 34
Spectrogram
• Basis functions φ_k(t) centred around a time-frequency atom k = k(ν, τ) = (frequency, time), such as the STFT or MDCT
$x(t) = \sum_k s_k\,\phi_k(t)$
(Figure: spectrograms of speech and piano, time t against frequency f (Hz))
• The spectrogram displays log |s_k| or |s_k|² (of the STFT)
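To illustrate the atoms k = (ν, τ), the sketch below computes STFT coefficients s_k with a Hann window using only NumPy, together with the log-energy a spectrogram displays. The frame length, hop size and window are illustrative choices and do not come from the slides. An MDCT would use a different basis: a real one, critically sampled.

```python
import numpy as np


def stft(x, frame_len=512, hop=256):
    """Complex STFT coefficients s[nu, tau] (frequency bin nu, frame tau)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[tau * hop: tau * hop + frame_len] * window for tau in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)  # shape (frame_len // 2 + 1, n_frames)


def log_spectrogram(x, frame_len=512, hop=256, eps=1e-12):
    """log |s_k|^2 for every time-frequency atom k = (nu, tau)."""
    s = stft(x, frame_len, hop)
    return np.log(np.abs(s) ** 2 + eps)


# Usage: one second of a 1 kHz tone sampled at 8 kHz
fs = 8000
t = np.arange(fs) / fs
S = log_spectrogram(np.sin(2 * np.pi * 1000 * t))
peak_bin = int(np.argmax(S.mean(axis=1)))   # 1000 Hz * 512 / 8000 Hz = bin 64
```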
Models for time-frequency Energy distributions
• Non-negative matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004 )
$X_{\nu,\tau} = \sum_j W_{\nu,j} S_{j,\tau}$
Spectrogram = Spectral Templates × Excitations
– however, spectrograms are not additive ($a^2 + b^2 \neq (a+b)^2$)
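One standard way to fit X ≈ WS is Lee and Seung's multiplicative update rule for the generalised Kullback-Leibler divergence. The slide does not specify a cost, so this choice is ours. A minimal NumPy sketch with illustrative names:

```python
import numpy as np


def kl_divergence(X, Y, eps=1e-12):
    """Generalised KL divergence D(X || Y) between non-negative arrays."""
    return float(np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y))


def nmf_kl(X, n_templates, n_iter=200, seed=0, eps=1e-12):
    """Factorise X[nu, tau] ~ sum_j W[nu, j] S[j, tau] with KL multiplicative updates."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.random((F, n_templates)) + eps   # spectral templates
    S = rng.random((n_templates, T)) + eps   # excitations
    for _ in range(n_iter):
        S *= (W.T @ (X / (W @ S + eps))) / (W.sum(axis=0)[:, None] + eps)
        W *= ((X / (W @ S + eps)) @ S.T) / (S.sum(axis=1)[None, :] + eps)
    return W, S


# Usage: an exactly rank-2 non-negative "spectrogram"
rng = np.random.default_rng(1)
X = rng.random((20, 2)) @ rng.random((2, 30))
W5, S5 = nmf_kl(X, 2, n_iter=5)
W, S = nmf_kl(X, 2, n_iter=300)
```

These updates never increase the divergence, so running the same initialisation for longer can only improve the fit.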
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005 )
$X_{\nu,\tau} = [r_{\nu,\tau} = 0]\,S^{(0)}_{\nu,\tau} + [r_{\nu,\tau} = 1]\,S^{(1)}_{\nu,\tau}$
Spectrogram = Mask × Source 0 + (1 − Mask) × Source 1
– however, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)
• We place a suitable prior on the variances of the transform coefficients s_k, and tie the prior variances across harmonically and temporally related time-frequency atoms:
$p(s|v)\,p(v) = \left(\prod_k p(s_k|v_k)\right) p(v)$
One channel source separation: Gaussian source model
(Figure: graphical model v_{k,1:N} → s_{k,1:N} → x_k, plate k = 1, …, K)
$s_{k,n} | v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_k = \sum_{n=1}^{N} s_{k,n}$
• A straightforward application of Bayes' theorem yields
$p(s_{k,n} | v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))$
$\kappa_{k,n} = v_{k,n} \big/ \sum_{n'} v_{k,n'}$ (responsibilities)
• Each source coefficient s_{k,n} receives a fraction κ_{k,n} of the observation x_k
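The posterior above is a per-coefficient Wiener filter. Here is a short NumPy sketch. The array layout is our assumption: x has shape (K,) and v has shape (K, N).

```python
import numpy as np


def gaussian_source_posterior(x, v):
    """Posterior mean and variance of s[k, n] given x[k] = sum_n s[k, n].

    x: (K,) observed coefficients; v: (K, N) prior variances v[k, n].
    """
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities kappa[k, n]
    mean = kappa * x[:, None]                  # each source gets a fraction of x[k]
    var = v * (1.0 - kappa)
    return mean, var, kappa


# Usage: one coefficient x = 2 shared by sources with prior variances 1 and 3
mean, var, kappa = gaussian_source_posterior(np.array([2.0]), np.array([[1.0, 3.0]]))
# mean -> [[0.5, 1.5]], var -> [[0.75, 0.75]]
```

The posterior means always add up to x_k, so the separated sources reconstruct the mixture exactly.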
One channel source separation: Poisson source model
(Figure: graphical model v_{k,1:N} → s_{k,1:N} → x_k, plate k = 1, …, K)
$s_{k,n} | v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$
$x_k = \sum_{n=1}^{N} s_{k,n}$
• This is the generative model for NMF when we write
$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (Template × Excitation)
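A useful consequence of the Poisson model follows from a standard property of independent Poisson variables. Conditioned on their sum x_k, the sources are multinomial with probabilities κ_{k,n} = v_{k,n}/Σ_{n'} v_{k,n'}. Hence E[s_{k,n} | x_k] = κ_{k,n} x_k, which is the same responsibility as in the Gaussian case. The sketch below assumes the NMF parameterisation v = t × e; the array shapes and function names are illustrative.

```python
import numpy as np


def sample_poisson_model(t, e, rng):
    """t: (F, N) templates, e: (T, N) excitations; v[nu, tau, n] = t[nu, n] * e[tau, n]."""
    v = t[:, None, :] * e[None, :, :]
    s = rng.poisson(v)          # s[nu, tau, n] ~ PO(v[nu, tau, n])
    x = s.sum(axis=-1)          # observed x[nu, tau] = sum_n s[nu, tau, n]
    return v, s, x


def poisson_source_posterior(x, v, rng):
    """Posterior mean of s given x, plus one draw from Multinomial(x, kappa)."""
    kappa = v / v.sum(axis=-1, keepdims=True)
    mean = x[..., None] * kappa
    flat_x = x.reshape(-1)
    flat_kappa = kappa.reshape(-1, kappa.shape[-1])
    draw = np.stack([rng.multinomial(n, p) for n, p in zip(flat_x, flat_kappa)])
    return mean, draw.reshape(v.shape)


# Usage with 2 templates over 8 frequency bins and 5 frames
rng = np.random.default_rng(0)
t = rng.gamma(2.0, 1.0, size=(8, 2))
e = rng.gamma(2.0, 1.0, size=(5, 2))
v, s, x = sample_poisson_model(t, e, rng)
mean, draw = poisson_source_posterior(x, v, rng)
```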
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
(Figure: example densities p(x) for 0 ≤ x ≤ 5; left panel: a = 0.9, 1, 1.3, 2 with b = 1; right panel: (a, b) = (1, 1), (1, 0.5), (2, 1))
$\mathcal{G}(x; a, z) \equiv \exp\left((a-1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\right)$
$\mathcal{IG}(x; a, z) \equiv \exp\left((a+1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\right)$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity and inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
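The two definitions translate directly into code. In this parameterisation z is the scale parameter of the Gamma. IG(x; a, z) is exactly the density of x = 1/y when y ~ G(y; a, z). The sketch below checks both facts numerically; the grid and the parameter values are arbitrary.

```python
import math
import numpy as np


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a-1) log x - x/z + a log(1/z) - log Gamma(a)."""
    return (a - 1) * np.log(x) - x / z - a * np.log(z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = (a+1) log(1/x) - 1/(z x) + a log(1/z) - log Gamma(a)."""
    return -(a + 1) * np.log(x) - 1.0 / (z * x) - a * np.log(z) - math.lgamma(a)


def trapezoid(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


grid = np.linspace(1e-4, 200.0, 400001)
mass_g = trapezoid(np.exp(log_gamma_pdf(grid, 2.0, 1.0)), grid)
mass_ig = trapezoid(np.exp(log_inv_gamma_pdf(grid, 3.0, 1.0)), grid)
# Change of variables x = 1/y: IG(x; a, z) = G(1/x; a, z) / x**2
x = np.array([0.2, 1.0, 3.5])
lhs = log_inv_gamma_pdf(x, 2.5, 0.7)
rhs = log_gamma_pdf(1.0 / x, 2.5, 0.7) - 2.0 * np.log(x)
```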
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1, …, K as follows:
$v_k | z_k \sim \mathcal{IG}(v_k; a, z_k/a)$
$z_{k+1} | v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$
(Figure: chain z_1 ··· v_{k−1} – z_k – v_k – z_{k+1} ···, with edge couplings a_z, a, a_z)
• The variance variables v are the prior variances of the sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• The shape parameters a and a_z describe the coupling strength and drift of the chain
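IG(x; a, z) is the law of 1/y with y ~ G(y; a, z) (shape a, scale z), so the chain can be sampled with NumPy's Gamma generator. In this sketch E[1/v_k | z_k] = z_k and E[1/z_{k+1} | v_k] = v_k, which is what produces the positive correlation. The initial value z_1 = 1 and the function name are our own choices.

```python
import numpy as np


def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Draw v_1..v_K and z_1..z_{K+1} from the inverse Gamma-Markov chain.

    v_k | z_k ~ IG(v_k; a, z_k / a),  z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z),
    using IG(x; a, z)  <=>  1/x ~ G(a, z) with shape a and scale z.
    """
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = np.empty(K + 1)        # z[k] holds z_{k+1} (0-based storage)
    z[0] = z1
    for k in range(K):
        v[k] = 1.0 / rng.gamma(a, z[k] / a)            # E[1/v_k | z_k] = z_k
        z[k + 1] = 1.0 / rng.gamma(a_z, v[k] / a_z)    # E[1/z_{k+1} | v_k] = v_k
    return v, z


# Usage: strong versus weak coupling
v_strong, _ = sample_igmc(1000, a=1000.0, a_z=1000.0)
v_weak, _ = sample_igmc(1000, a=2.0, a_z=2.0)
step_strong = np.std(np.diff(np.log(v_strong)))
step_weak = np.std(np.diff(np.log(v_weak)))
```

With large shape parameters the log-variance changes only slightly from one step to the next. With small ones it wanders far, as in the typical draws.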
Gamma Chains typical draws
(Figure: three draws of log v_k for k = 1, …, 1000 (vertical range −20 to 20), with (a, a_z) = (10, 10), (10, 4) and (10, 40))
Gamma Chains with changepoints typical draws
(Figure: three draws of log v_k with changepoints for k = 1, …, 1000 (vertical range −20 to 20), with (a, a_z) = (10, 10), (10, 4) and (10, 40))
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (pairwise)
$\phi_{z_k} = \exp\left((a_z + a + 1)\log z_k^{-1}\right)$ (singletons)
(Figure: chain z_1 ··· v_{k−1} – z_k – v_k – z_{k+1} ···, with edge couplings a_z, a, a_z)
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$\psi_{i,j} = \exp(-a_{ij}\,\xi_i^{-1}\xi_j^{-1})$ (pairwise)
$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right)\log \xi_i^{-1}\right)$ (singletons)
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov chain Monte Carlo: Gibbs sampler
– Sequential Monte Carlo: particle filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps.
VB or Gibbs
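(Figure: factor graph ψ_{0,1} – z_1 – ψ_{1,1} – v_1 – ψ_{1,2} – z_2 – ψ_{2,2} – v_2 ···, with likelihood factors p(y_1|v_1), p(y_2|v_2))
• VB
$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_{v_k} + \log p(y_k|v_k) + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\,q^{(\tau)}(z_{k+1})}\right)$
• Gibbs
$v_k^{(\tau)} \sim p(v_k | z_k, z_{k+1}, y_k) \propto p(y_k|v_k)\,\phi_{v_k}\,\psi_{k,k}(z_k^{(\tau)})\,\psi_{k,k+1}(z_{k+1}^{(\tau)})$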
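For a concrete instance, take Gaussian observations y_k ~ N(y_k; 0, v_k). Collecting powers of v_k^{-1} and z_k^{-1} in the potentials shows that every VB factor stays inverse Gamma. Here "rate" β means a density ∝ x^{−shape−1} e^{−β/x}. With that convention, q(v_k) has shape a + a_z + 1/2 and rate a⟨z_k^{-1}⟩ + a_z⟨z_{k+1}^{-1}⟩ + y_k²/2. Likewise, q(z_k) has shape a_z + a and rate a_z⟨v_{k−1}^{-1}⟩ + a⟨v_k^{-1}⟩, where ⟨x^{-1}⟩ = shape/rate. The sketch below runs these coordinate updates. Three things in it are assumptions made for illustration: the Gaussian likelihood, a fixed boundary value v_0 that anchors z_1, and the function name. The z's are conditionally independent given the v's, and vice versa, so each block can be updated in parallel.

```python
import numpy as np


def vb_igmc(y, a, a_z, v0=1.0, n_iter=50):
    """Mean-field VB for the IGMC with y_k ~ N(0, v_k); returns E[v_k] under q.

    Each factor is inverse Gamma, written as (shape, rate) with
    density proportional to x**(-shape - 1) * exp(-rate / x).
    """
    K = len(y)
    E_inv_v = np.ones(K)                                           # <1/v_k>, initialised to 1
    v_shape = a + 0.5 + np.where(np.arange(K) < K - 1, a_z, 0.0)  # v_K has no z_{K+1}
    for _ in range(n_iter):
        # q(z_k): shape a_z + a, rate a_z <1/v_{k-1}> + a <1/v_k>  (v_0 is a fixed boundary value)
        prev = np.concatenate(([1.0 / v0], E_inv_v[:-1]))
        E_inv_z = (a_z + a) / (a_z * prev + a * E_inv_v)
        # q(v_k): shape a + a_z + 1/2, rate a <1/z_k> + a_z <1/z_{k+1}> + y_k^2 / 2
        nxt = np.concatenate((E_inv_z[1:], [0.0]))
        v_rate = a * E_inv_z + a_z * nxt + 0.5 * y ** 2
        E_inv_v = v_shape / v_rate
    return v_rate / (v_shape - 1.0)                                # inverse Gamma mean, needs shape > 1


# Usage: a quiet segment followed by a loud one
rng = np.random.default_rng(0)
y = np.concatenate((0.1 * rng.standard_normal(500), 10.0 * rng.standard_normal(500)))
v_hat = vb_igmc(y, a=2.0, a_z=2.0)
```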
VB or Gibbs
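(Figure: factor graph ψ_{0,1} – z_1 – ψ_{1,1} – v_1 – ψ_{1,2} – z_2 – ψ_{2,2} – v_2 ···, with likelihood factors p(y_1|v_1), p(y_2|v_2))
• VB
$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_{z_k} + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\,q^{(\tau)}(v_k)}\right)$
• Gibbs
$z_k^{(\tau)} \sim p(z_k | v_{k-1}, v_k) \propto \phi_{z_k}\,\psi_{k-1,k}(v_{k-1}^{(\tau)})\,\psi_{k,k}(v_k^{(\tau)})$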
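With Gaussian observations y_k ~ N(y_k; 0, v_k), all full conditionals are inverse Gamma. Here rate β means a density ∝ x^{−shape−1} e^{−β/x}. Under that convention, z_k | v_{k−1}, v_k has shape a_z + a and rate a_z/v_{k−1} + a/v_k. Similarly, v_k | z_k, z_{k+1}, y_k has shape a + a_z + 1/2 and rate a/z_k + a_z/z_{k+1} + y_k²/2. The minimal Gibbs sketch below alternates the two blocks and returns a Monte Carlo estimate of E[v_k | y]. The Gaussian likelihood, the fixed boundary value v_0 for z_1, the burn-in length and the function name are all assumptions.

```python
import numpy as np


def gibbs_igmc(y, a, a_z, v0=1.0, n_sweeps=300, burn_in=100, seed=0):
    """Gibbs sampler for the IGMC with y_k ~ N(0, v_k); returns the posterior mean of v."""
    rng = np.random.default_rng(seed)
    K = len(y)
    v = np.maximum(y ** 2, 1e-6)                                   # crude initialisation
    v_shape = a + 0.5 + np.where(np.arange(K) < K - 1, a_z, 0.0)  # v_K has no z_{K+1}
    v_sum = np.zeros(K)
    for sweep in range(n_sweeps):
        # z_k | v_{k-1}, v_k ~ IG(shape a_z + a, rate a_z / v_{k-1} + a / v_k)
        v_prev = np.concatenate(([v0], v[:-1]))
        z = 1.0 / rng.gamma(a_z + a, 1.0 / (a_z / v_prev + a / v))
        # v_k | z_k, z_{k+1}, y_k ~ IG(shape a + a_z + 1/2, rate a/z_k + a_z/z_{k+1} + y_k^2/2)
        inv_z_next = np.concatenate((1.0 / z[1:], [0.0]))
        v_rate = a / z + a_z * inv_z_next + 0.5 * y ** 2
        v = 1.0 / rng.gamma(v_shape, 1.0 / v_rate)
        if sweep >= burn_in:
            v_sum += v
    return v_sum / (n_sweeps - burn_in)


# Usage: a quiet segment followed by a loud one
rng = np.random.default_rng(0)
y = np.concatenate((0.1 * rng.standard_normal(500), 10.0 * rng.standard_normal(500)))
v_hat = gibbs_igmc(y, a=2.0, a_z=2.0)
```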
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: variational Bayes
(Figure: log-magnitude time-frequency plots (colour scale −14 to 0) of the noisy input X, the original X_org, and the reconstructions X_h (SNR 1998), X_v (SNR 2079), X_b (SNR 1968) and X_g (SNR 1997))
Denoising – Music
(Figure: log-magnitude time-frequency plots (colour scale −18 to 0) of the original, the noisy input, and reconstructions by particle filtering (PF, SNR 853), Gibbs sampling (SNR 866) and VB (SNR 208))
“Tristram” (Matt Uelmen) + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal tie across time – harmonic continuity
• Source 2: vertical tie across frequency – transients, percussive sounds
(Figure: time τ against frequency bin ν for the original X_org and the estimated components S_hor and S_ver)
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai” (King Crimson) + drums “Territory” (Sepultura) = mix

                 s1                        s2
                 SDR     SIR     SAR       SDR     SIR     SAR
VB               -474    -328    567       -158    1546    -137
Gibbs            -45     -262    457       105     1246    161
GibbsEM          -423    -242    482       134     1313    185
Pre-trained      -404    -315    813       356     1144    464
Oracle           614     1716    658       1266    1995    136

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet” (Anglagard) + “Moby Dick” (Led Zeppelin) = mix

                 s1                        s2
                 SDR     SIR     SAR       SDR     SIR     SAR
VB               -78     -622    453       -235    184     -225
Gibbs            -846    -753    693       -404    1459    -383
GibbsEM          -774    -619    462       -114    1662    -097
Pre-trained      -64     -539    695       38      1639    414
Oracle           121     329     1214      2113    3389    2137
Harmonic-Transient Decomposition
(Figure: time τ against frequency bin ν for X_org, S_hor and S_ver; audio examples of the original, horizontal and vertical components for two excerpts)
Chord Detection – Signal Model (with Paul Peeling)
(Figure: log σ_ν against frequency ν (Hz), 0–4000 Hz)
Chord Detection
(Figure: left, MDCT of a piano chord (MIDI notes 41, 48, 51, 56), time τ (s) against frequency (Hz); right, log Σ_ν v_{ν,j,τ} for each MIDI note j over time τ (s))
Multichannel Source Separation
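• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$
$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$
$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$
$a_m \sim \mathcal{N}(a_m; \cdots)$, $r_m \sim \mathcal{IG}(r_m; \cdots)$
(Figure: graphical model with source volumes λ_1 … λ_N, variances v_{k,n}, sources s_{k,n} and observations x_{k,1} … x_{k,M} in a plate k = 1, …, K, and channel parameters a_1, r_1, …, a_M, r_M)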
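Ancestral sampling from this hierarchy is straightforward. Since 1/v_{k,n} ~ G(ν/2, 2/(νλ_n)), each s_{k,n} is marginally Student-t with ν degrees of freedom and scale √λ_n, which makes the source prior heavy-tailed. In the sketch below, the mixing vectors a_m and noise variances r_m are passed in as fixed values rather than drawn, because their priors are elided on the slide. Following the G(x; a, z) definition, b_λ is treated as a scale parameter. All names are illustrative.

```python
import numpy as np


def sample_multichannel_prior(K, nu, a_lam, b_lam, A, r, seed=0):
    """Ancestral sample from the hierarchical multichannel model.

    A: (M, N) mixing matrix whose rows are a_m^T; r: (M,) noise variances r_m.
    Gamma distributions use the G(x; shape, scale) convention of the slides.
    """
    rng = np.random.default_rng(seed)
    M, N = A.shape
    lam = rng.gamma(a_lam, b_lam, size=N)                          # lambda_n ~ G(a_lam, b_lam)
    v = 1.0 / rng.gamma(nu / 2.0, 2.0 / (nu * lam), size=(K, N))   # v_kn ~ IG(nu/2, 2/(nu lambda_n))
    s = rng.normal(0.0, np.sqrt(v))                                # s_kn ~ N(0, v_kn)
    x = s @ A.T + rng.normal(0.0, np.sqrt(r), size=(K, M))         # x_km ~ N(a_m^T s_k, r_m)
    return lam, v, s, x


# Usage: N = 3 sources observed through M = 2 channels (underdetermined)
A = np.array([[1.0, 0.5, 0.2], [0.3, 1.0, 0.8]])
lam, v, s, x = sample_multichannel_prior(
    K=4096, nu=2.0, a_lam=2.0, b_lam=1.0, A=A, r=np.array([0.01, 0.01])
)
```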
Equivalent Gamma MRF
• A tree for each source
• λ_n can be interpreted as the overall “volume” of source n
Source Separation
(Figure: spectrograms of the speech, piano and guitar sources and of their mix, time t against frequency f (Hz))
Reconstructions
(Figure: reconstructed speech, piano and guitar spectrograms, time t against frequency f (Hz))
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
(Figure: sampler traces of the parameters a, λ and r over 2000 epochs)
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features):
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– …
• Online/real-time or offline/batch
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
(Figure: state lattice with bar position m_k = 1, …, 8 on the horizontal axis and tempo level n_k = 1, 2, 3 on the vertical axis, for 3/4 and 4/4 time)
• Each dot denotes a state x = (m, n) (score position, tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
(Figure: three examples of the transition density p(x_2 | x_1) over bar position (1–8) and tempo level (1–5))
Bar position Pointer - k = 1
(Figure: p(x_1) over bar position and tempo level)
Bar position Pointer - k = 2
(Figure: p(x_2) over bar position and tempo level)
Bar position Pointer - k = 3
(Figure: p(x_3) over bar position and tempo level)
Bar position Pointer - k = 4
(Figure: p(x_4) over bar position and tempo level)
Bar position Pointer - k = 5
(Figure: filtering density p(x_5 | y_{1:5}) over bar position and tempo level, after observing y_5 = 0)
Bar position Pointer - k = 10
(Figure: p(x_10) over bar position and tempo level)
Bar position Pointer – observation model (Poisson)
• Observation model p(y_k|x_k): Poisson intensity
(Figure: Poisson intensity μ_k as a function of bar position m_k, for a triplet rhythm and for a duplet rhythm)
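The state x = (m, n) is discrete, so filtering in the bar pointer model is the standard HMM forward recursion. The sketch below is a simplified, hypothetical version of the model with three simplifications. The position advances by n bar positions per frame. The tempo level moves up or down by one, with probability p_change/2 in each direction. The Poisson intensity μ(m) depends only on the bar position. The bar length, tempo range and intensities are invented for illustration.

```python
import math
import numpy as np


def bar_pointer_transition(M, n_tempi, p_change):
    """T[i, j] = p(x_{k+1} = j | x_k = i) for states x = (m, n), flattened as m * n_tempi + (n - 1)."""
    S = M * n_tempi
    T = np.zeros((S, S))
    for m in range(M):
        for n in range(1, n_tempi + 1):
            i = m * n_tempi + (n - 1)
            m_next = (m + n) % M                           # advance by the current tempo level
            for step, p in ((-1, p_change / 2), (0, 1.0 - p_change), (1, p_change / 2)):
                n_next = min(max(n + step, 1), n_tempi)    # clip at the slowest/fastest tempo
                T[i, m_next * n_tempi + (n_next - 1)] += p
    return T


def forward_filter(y, T, mu):
    """Return p(x_k | y_{1:k}) for each frame; mu[j] is the Poisson intensity in state j."""
    alpha = np.full(T.shape[0], 1.0 / T.shape[0])          # uniform p(x_0)
    filtered = []
    for yk in y:
        loglik = yk * np.log(mu) - mu - math.lgamma(yk + 1)
        alpha = (alpha @ T) * np.exp(loglik - loglik.max())
        alpha /= alpha.sum()
        filtered.append(alpha)
    return np.array(filtered)


# Usage: 16 positions per bar, beats on every 4th position, true tempo level 2
M, n_tempi = 16, 3
T = bar_pointer_transition(M, n_tempi, p_change=0.05)
mu = np.repeat(np.where(np.arange(M) % 4 == 0, 3.0, 0.05), n_tempi)
m_true = (2 * np.arange(40)) % M
y = np.where(m_true % 4 == 0, 3, 0)
filt = forward_filter(y, T, mu)
tempo_post = filt[-1].reshape(M, n_tempi).sum(axis=0)     # marginal over tempo levels
```

Smoothing would add the matching backward pass over the same transition matrix.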
Tempo, Rhythm, Meter Analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
(Figure: dynamic Bayesian network over frames 0–3 with tempo n_k, time signature θ_k, bar position m_k, rhythmic pattern r_k, intensity λ_k and observation y_k)
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator
Filtering
(Figure: observed data y_k; filtered log p(m_k | y_{1:k}) over bar position; filtered log p(n_k | y_{1:k}) in quarter notes per minute (60–180); filtered p(r_k | y_{1:k}) for triplets and duplets; all against frame index k)
Smoothing
(Figure: observed data y_k; smoothed log p(m_k | y_{1:K}) over bar position; smoothed log p(n_k | y_{1:K}) in quarter notes per minute (60–180); smoothed p(r_k | y_{1:K}) for triplets and duplets; all against frame index k)
Time Signature
(Figure: observed audio samples over 0–12 s; log p(m_k | z_{1:K}) over bar position; log p(n_k | z_{1:K}) in quarter notes per minute; p(θ_k | z_{1:K}) for 4/4 and 3/4; all against frame index k)
Score-Performance Matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
(Figure: score excerpt and the corresponding audio signal x_t over time t)
Score-Performance matching - Graphical Model
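(Figure: graphical model with score-position indicators r_1 … r_K, variables t_1 … t_K, intensities λ_1 … λ_K and, for each ν = 1, …, W, variances v_{ν,1} … v_{ν,K} and coefficients s_{ν,1} … s_{ν,K}; inset: score positions 1–8 indexed by r_k)
$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau)))$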
Score-Performance matching - Signal model
(Figure: log σ_ν against frequency ν (Hz), 0–4000 Hz)
Score-Performance matching
(Figure: spectrogram data, time (s) against frequency (Hz), and MIDI data, score position against MIDI note)
Online (filtering) or offline (smoothing) processing is possible.
Transcription
(Figure: log p(r_τ | s_τ) for MIDI note numbers 60–80 over time (s); Σ_i w_τ^{(i)} λ_τ^{(i)} on a log λ scale over time (s); MDCT of the audio (source: Daniel-Ben Pienaar), time (s) against frequency (Hz))
Summary
• “Time domain” – switching state-space models
– State-space modelling
– Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform domain” – Gamma fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical-score-guided source separation
– Prior structures for other observation models (NMF)
State Space Parametrisation
Audio Restoration/Interpolation
• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis
(Figure: audio signal with missing samples, sample index 0–500)
Audio Interpolation
$p(x_{\neg\kappa} | x_\kappa) \propto \int dH\; p(x_{\neg\kappa} | H)\, p(x_\kappa | H)\, p(H)$
H ≡ (parameters, hidden states)
(Figure: graphical model H → x_¬κ (missing), x_κ (observed), and a signal with missing samples, sample index 0–500)
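For a fixed H, the integrand reduces to Gaussian conditioning. The sketch below holds H fixed instead of integrating it out. It assumes a zero-mean Gaussian process with covariance ρ^{|τ|} cos(ωτ), which is proportional to the stationary covariance of a damped-rotation state-space model of the phase-vocoder type. That kernel is our modelling choice. The missing samples x_{¬κ} are then predicted by their posterior mean K_{¬κ,κ} K_{κ,κ}^{-1} x_κ.

```python
import numpy as np


def damped_cosine_kernel(t1, t2, rho, omega):
    """Covariance rho**|t1 - t2| * cos(omega * (t1 - t2))."""
    lag = t1[:, None] - t2[None, :]
    return rho ** np.abs(lag) * np.cos(omega * lag)


def interpolate_missing(x, missing, rho, omega, noise_var=1e-4):
    """Posterior mean and covariance of x[missing] given x[~missing], for fixed H = (rho, omega, noise_var)."""
    t = np.arange(len(x), dtype=float)
    obs = ~missing
    K_oo = damped_cosine_kernel(t[obs], t[obs], rho, omega) + noise_var * np.eye(obs.sum())
    K_mo = damped_cosine_kernel(t[missing], t[obs], rho, omega)
    K_mm = damped_cosine_kernel(t[missing], t[missing], rho, omega)
    gain = np.linalg.solve(K_oo, K_mo.T).T          # K_mo K_oo^{-1} (K_oo is symmetric)
    mean = gain @ x[obs]
    cov = K_mm - gain @ K_mo.T
    return mean, cov


# Usage: a 40-sample gap in a sinusoid
t = np.arange(200)
x = np.sin(0.2 * t)
missing = (t >= 80) & (t < 120)
mean, cov = interpolate_missing(x, missing, rho=0.99, omega=0.2)
rmse = np.sqrt(np.mean((mean - x[missing]) ** 2))
```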
Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
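(Figure: graphical model with parameters A_ν, Q_ν and a state chain s^ν_0 ··· s^ν_k ··· s^ν_{K−1} for each band ν = 0, …, W − 1, all bands jointly emitting x_0 ··· x_k ··· x_{K−1})
$s^\nu_k \sim \mathcal{N}(s^\nu_k; A_\nu s^\nu_{k-1}, Q_\nu)$
$A_\nu \sim \mathcal{N}\left(A_\nu; \begin{pmatrix}\cos\omega_\nu & -\sin\omega_\nu \\ \sin\omega_\nu & \cos\omega_\nu\end{pmatrix}, \Psi\right)$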
Inference: Structured Variational Bayes
(Figure: structured factorisation with factors q(A_α), q(Q_α) and ∏_k q(s^α_k | s^α_{k−1}) for each band α ∈ C, and q(x_k) for the signal samples)
• Intuitive algorithm
– Subtract from the observed signal x the prediction of the frequency bands in ¬α
– Compute a fit for α to this residual, and iterate
• For fixed A and Q, this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
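For reference, here is the generic Gauss-Seidel iteration for A x = b. Each unknown is refitted to the residual left by the current values of all the others. This is the same "subtract the other bands, refit this band" pattern described above, with scalar unknowns in place of frequency bands. The NumPy sketch converges when, for example, A is symmetric positive definite.

```python
import numpy as np


def gauss_seidel(A, b, n_iter=200, x0=None):
    """Solve A x = b by Gauss-Seidel sweeps (in-place coordinate updates)."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    for _ in range(n_iter):
        for i in range(n):
            # residual left for unknown i after removing the current fit of all the others
            residual = b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]
            x[i] = residual / A[i, i]
    return x


# Usage on a random symmetric positive definite system
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5.0 * np.eye(5)
b = rng.standard_normal(5)
x = gauss_seidel(A, b)
```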
Restoration
• Piano
– Signal with missing samples (37)
– Reconstruction: 768 dB improvement
– Original
• Trumpet
– Signal with missing samples (37)
– Reconstruction: 710 dB improvement
– Original
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
(Figure: for each ν = 1, …, W, a changepoint chain r^ν_0 ··· r^ν_K and a state chain θ^ν_0 ··· θ^ν_K, jointly emitting the observations y_{1:K})
• Generalises source-filter models
Harmonic model with changepoints
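$r_k | r_{k-1} \sim p(r_k | r_{k-1}),\quad r_k \in \{0, 1\}$
$\theta_k | \theta_{k-1}, r_k \sim [r_k = 0]\,\underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\,\underbrace{\mathcal{N}(0, S)}_{\text{new}}$
$y_k | \theta_k \sim \mathcal{N}(C\theta_k, R)$
$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^N, \qquad G_\omega = \rho_k \begin{pmatrix}\cos\omega & -\sin\omega \\ \sin\omega & \cos\omega\end{pmatrix}$
with damping factor 0 < ρ_k < 1, frame length N, and damped sinusoidal basis matrix C of size N × 2H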
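The transition matrix and the basis C can be built explicitly. The sketch below assumes a constant damping ρ, whereas the slide allows ρ_k to vary over time. It also takes row n of C to be the first row of each block of diag(G_ω, …, G_ω^H)^n, i.e. the damped harmonics h·ω evaluated at sample n of the frame. With that choice, A = diag(…)^N advances the state by exactly one frame. This reading of C is our assumption.

```python
import numpy as np


def rotation(omega):
    return np.array([[np.cos(omega), -np.sin(omega)], [np.sin(omega), np.cos(omega)]])


def harmonic_matrices(omega, H, N, rho):
    """Return A = diag(G, G^2, ..., G^H)^N and the N x 2H basis C, with G = rho * R(omega)."""
    G = rho * rotation(omega)
    B = np.zeros((2 * H, 2 * H))
    for h in range(1, H + 1):
        B[2 * (h - 1):2 * h, 2 * (h - 1):2 * h] = np.linalg.matrix_power(G, h)   # block G^h
    A = np.linalg.matrix_power(B, N)           # one frame = N samples
    C = np.zeros((N, 2 * H))
    Bn = np.eye(2 * H)
    for n in range(N):
        C[n] = Bn[0::2].sum(axis=0)            # first row of every block of B^n
        Bn = B @ Bn
    return A, C


# Usage: fundamental omega = 0.05 rad/sample, H = 3 harmonics, frames of N = 64 samples
A, C = harmonic_matrices(0.05, H=3, N=64, rho=0.999)
theta = np.random.default_rng(0).standard_normal(6)
frame0, frame1 = C @ theta, C @ (A @ theta)    # noise-free "reg" regime: theta_1 = A theta_0
_, C_long = harmonic_matrices(0.05, H=3, N=128, rho=0.999)
```

The two consecutive frames join seamlessly. Their concatenation equals a single 128-sample frame synthesised from the same state.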
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
– Conditional Gaussians are not closed under marginalisation
⇒ Unlike HMMs or KFMs, summing over r_k does not simplify the filtering density
⇒ The number of Gaussian kernels needed to represent the exact filtering density p(r_k, θ_k | y_{1:k}) increases exponentially
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time and space
– Intuition: when a changepoint occurs, the state vector θ is reinitialised ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; O Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)
(Figure: chain θ_0 … θ_5 with observations y_1 … y_5 and changepoint indicators r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0)
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} | y_{1:k})$
⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of the conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
Monophonic model (Cemgil et al. 2006)
• We introduce a pitch label indicator m
• At each time k, the process can be in one of the {“mute”, “sound”} × M states
(Figure: graphical model with chains r_0 … r_T, m_0 … m_T and s_0 … s_T, and observations y_1 … y_T)
Monophonic Pitch Tracking
Monophonic pitch tracking = online estimation (filtering) of p(r_k, m_k | y_{1:k})
(Figure: audio signal and the tracked pitch index over time)
• If pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
(Figure: exact inference result over samples 500–3500)
Tracking Pitch Variations
• Allow m to change with k
• Intractable; need to resort to approximate inference (mixture Kalman filter – Rao-Blackwellized particle filter)
Factorial Generative models for Analysis of Polyphonic Audio
(Figure: latent indicators over frequency ν and time k, and the observed signal x_k)
• Each latent changepoint process ν = 1, …, W corresponds to a “piano key”; the indicators r_{1:W,1:K} encode a latent “piano roll”
Single time slice - Bayesian Variable Selection
ri sim C(ri πon πoff)
si|ri sim [ri = on]N (si 0 Σ) + [ri 6= on]δ(si)
x|s1W sim N (x Cs1W R)
C equiv [ C1 Ci CW ]
r1 rW
s1 sW
x
bull Generalized Linear Model ndash Columnrsquos of C are the basis vectors
bull The exact posterior is a mixture of 2W Gaussians
bull When W is large computation of posterior features becomes intractable
bull Sparsity by construction (Olshausen and Millman Attias )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 30
Factorial Switching State space model
r0ν sim C(r0ν π0ν)
θ0ν sim N (θ0ν microν Pν)
rkν|rkminus1ν sim C(rkν πν(rtminus1ν)) Changepoint indicator
θkν|θkminus1ν sim N (θkν Aν(rk)θkminus1ν Qν(rk)) Latent state
yk|θk1W sim N (yk Ckθk1W R) Observation
rν0 middot middot middot rν
k middot middot middot rνK
sν0 middot middot middot sν
k middot middot middot sνK
ν = 1 W
yk yK
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 31
Synthetic Data
νx
freq
ν
ν
k
(S)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 32
audio_examplewav
Media File (audiowav)
Technical Difficulties
bull Inference is quite heavy
bull Vanilla Kalman filtering methods are not stable ndash computations with
large matrices
ndash Need advance techniques from linear algebra
ndash Interesting links to subspace methods
bull Hyperparameter learning is necessary
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 33
Modelling levels
bull Physical - acoustical
bull Time domain ndash state space dynamical models
bull Transform domain ndash Fourier representations Generalised Linear
model
bull Feature Based
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 34
Spectrogram
bull Basis functions φk(t) centered around time-frequency atom k = k(ν τ) =(Frequency Time ) such as STFT or MDCT
x(t) =sum
k
skφk(t)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
bull Spectrogram displays log |sk| or |sk|2 (of STFT)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 35
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
Models for time-frequency Energy distributions
bull Non-Negative Matrix factorisation (Sha Saul Lee 2002 Smaragdis Brown 2003
Virtanen 2003 Abdallah Plumbley 2004 )
Xντ = WνjSjτ
Spectrogram = Spectral Templatestimes Excitations
= times
ndash however spectrograms are not additive (a2 + b2 6= (a+ b)2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 36
Models for time-frequency Energy distributions
bull Mask models (Roweis 2001 Reyes-Gomez Jojic Ellis 2005 )
Xντ = [rντ = 0]S(0)ντ + [rντ = 1]S(1)
ντ
Spectrogram = Masktimes Source0 + (1minusMask)times Source1
= + +
ndash however sources do overlap in time and frequency
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 37
Prior structures on time-frequency Energy distributions
bull Main Idea Spectrogram is a point estimate of the energy at a
time-frequency atom k(ν τ)
bull We place a suitable prior on the variance of transform coefficients sk
and tie the prior variances across harmonically and temporally related
time-frequency atoms
p(s|v)p(v) =
(prod
k
p(sk|vk)
)
p(v)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 38
One channel source separation Gaussian source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim N (skn 0 vkn)
xk|sk1N =sumN
n=1 skn
bull Straightforward application of Bayesrsquo theorem yields
p(skn|vk1N xk) = N (skn κknxk vkn(1minus κkn))
κkn = vknsum
nprime
vknprime (Responsibilities)
bull Each source coefficient sn gets a fraction κn of the observation x
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 39
One channel source separation Poisson source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim PO(skn vkn)
xk|sk1N =sumN
n=1 skn
bull This is the generative model for the NMF when we write
vk(ντ)n = tνn times eτn (Templatetimes Excitation)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40
Gamma G(x a b) and Inverse Gamma IG(x a b)
0 1 2 3 4 50
02
04
06
08
1
12a = 09 b =1
a = 1 b =1
a = 13 b =1
a = 2 b =1
x
p(x)
0 1 2 3 4 50
02
04
06
08
1
12
14
a=1 b=1
a=1 b=05
a=2 b=1
G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))
IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))
bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale
bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1 K as follows
vk|zk sim IG(vk a zka)
zk+1|vk sim IG(zk+1 az vkaz)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
bull Variance variables v are priors for sources
bull Auxillary variables z are needed for conjugacy and positive correlation
bull Shape parameters a and az describe coupling strength and drift of the chain
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42
Gamma Chains: typical draws
[Figure: three sample paths of $\log v_k$ for $k = 1, \dots, 1000$ (vertical range about $-20$ to $20$), with $a = 10, a_z = 10$; $a = 10, a_z = 4$; and $a = 10, a_z = 40$.]
Gamma Chains with changepoints: typical draws
[Figure: three sample paths of $\log v_k$ for $k = 1, \dots, 1000$ (vertical range about $-20$ to $20$), with $a = 10, a_z = 10$; $a = 10, a_z = 4$; and $a = 10, a_z = 40$.]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (Pairwise)
$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big)$ (Singletons)
(Chain: $z_1 \cdots v_{k-1} - z_k - v_k - z_{k+1} \cdots$, with couplings $a_z$, $a$, $a_z$)
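A short derivation of where the singleton exponent comes from, assuming the chain is read as $v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$ and $z_k \mid v_{k-1} \sim \mathcal{IG}(z_k; a_z, v_{k-1}/a_z)$: collect the factors of the joint that contain an interior $z_k$ and view them as a function of $z_k$.

```latex
\mathcal{IG}\!\left(z_k; a_z, v_{k-1}/a_z\right)\,\mathcal{IG}\!\left(v_k; a, z_k/a\right)
\;\propto_{z_k}\; z_k^{-(a_z+1)}\, e^{-a_z z_k^{-1} v_{k-1}^{-1}} \cdot z_k^{-a}\, e^{-a z_k^{-1} v_k^{-1}}
= \underbrace{\exp\big((a_z + a + 1)\log z_k^{-1}\big)}_{\phi_{z_k}}
  \underbrace{\exp\big(-a_z z_k^{-1} v_{k-1}^{-1}\big)}_{\psi_{k-1,k}}
  \underbrace{\exp\big(-a z_k^{-1} v_k^{-1}\big)}_{\psi_{k,k}}
```

The same bookkeeping for an interior $v_k$ (a variable in one IG factor and a scale in the next) gives $\phi_{v_k} = \exp\big((a + a_z + 1)\log v_k^{-1}\big)$.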
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$\psi_{i,j} = \exp(-a_{i,j}\, \xi_i^{-1} \xi_j^{-1})$ (Pairwise)
$\phi_i = \exp\big((\sum_j a_{i,j} + 1)\log \xi_i^{-1}\big)$ (Singletons)
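Because every potential is conjugate, the full conditional of one node is again inverse Gamma: collecting $\phi_i \prod_j \psi_{i,j}$ gives shape $\sum_j a_{i,j}$ and $z^{-1} = \sum_j a_{i,j}/\xi_j$. A sketch of a single-site Gibbs sweep over a field given as a symmetric coupling matrix (zero diagonal, every node with at least one neighbour). The function names are hypothetical and no observation terms are included.

```python
import numpy as np

def full_conditional(A, xi, i):
    """Shape and 'rate' (z^{-1}) of p(xi_i | rest) ∝ phi_i * prod_j psi_ij.

    A: symmetric (n, n) coupling matrix a_ij with zero diagonal; xi: current values.
    """
    A = np.asarray(A, dtype=float)
    xi = np.asarray(xi, dtype=float)
    shape = float(A[i].sum())            # phi_i gives xi_i^{-(sum_j a_ij + 1)}, i.e. IG shape sum_j a_ij
    rate = float(A[i] @ (1.0 / xi))      # sum_j a_ij / xi_j (the diagonal term is zero)
    return shape, rate

def gibbs_sweep(A, xi, rng):
    xi = np.array(xi, dtype=float)
    for i in range(len(xi)):
        shape, rate = full_conditional(A, xi, i)
        xi[i] = 1.0 / rng.gamma(shape, 1.0 / rate)  # 1/xi_i ~ Gamma(shape, scale=1/rate)
    return xi

# chain z_1 - v_1 - z_2 with couplings a = 10 (z_1, v_1) and a_z = 4 (v_1, z_2)
A = np.array([[0.0, 10.0, 0.0],
              [10.0, 0.0, 4.0],
              [0.0, 4.0, 0.0]])
print(full_conditional(A, [2.0, 5.0, 4.0], 1))  # (14.0, 6.0)
```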
Possible Model Topologies
Approximate Inference
• Stochastic
  - Markov Chain Monte Carlo: Gibbs sampler
  - Sequential Monte Carlo: Particle Filtering
• Deterministic
  - Variational Bayes
In all of these, conjugacy helps.
VB or Gibbs
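(Factor graph: $\psi_{0,1} - z_1 - \psi_{1,1} - v_1 - \psi_{1,2} - z_2 - \psi_{2,2} - v_2 \cdots$, with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$)
• VB
$q^{(\tau)}(v_k) \leftarrow \exp\big(\log p(y_k \mid v_k) + \phi_{v_k} + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\big)$
• Gibbs
$v_k^{(\tau)} \sim p(v_k \mid z_k^{(\tau)}, z_{k+1}^{(\tau)}, y_k) \propto p(y_k \mid v_k)\, \phi_{v_k}\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$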
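Combining these factors with a zero-mean Gaussian observation $y_k \sim \mathcal{N}(0, v_k)$ (an assumption made for this sketch) keeps every full conditional inverse Gamma, so a Gibbs sweep only needs Gamma draws. A minimal sketch that holds $z_1$ fixed; the function names are hypothetical.

```python
import numpy as np

def sample_ig(rng, shape, rate):
    # density ∝ x^-(shape+1) exp(-rate / x), i.e. 1/x ~ Gamma(shape, scale=1/rate)
    return 1.0 / rng.gamma(shape, 1.0 / rate)

def gibbs_igmc(y, a, a_z, n_iter=100, z1=1.0, seed=0):
    """Gibbs sampler for y_k ~ N(0, v_k) under the inverse Gamma-Markov chain prior.

    Chain order z_1 - v_1 - z_2 - v_2 - ... - z_K - v_K with coupling a on (z_k, v_k)
    and a_z on (v_k, z_{k+1}); z_1 is held fixed. Returns the last sample of v.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    K = len(y)
    v = np.ones(K)
    z = np.ones(K)
    z[0] = z1
    for _ in range(n_iter):
        for k in range(K):
            # p(y_k | v_k) phi_{v_k} psi_{k,k}(z_k) psi_{k,k+1}(z_{k+1})
            shape = a + 0.5
            rate = a / z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:                      # the last v_k has no right-hand z neighbour
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = sample_ig(rng, shape, rate)
        for k in range(1, K):
            # phi_{z_k} psi_{k-1,k}(v_{k-1}) psi_{k,k}(v_k)
            z[k] = sample_ig(rng, a + a_z, a_z / v[k - 1] + a / v[k])
    return v
```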
VB or Gibbs
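(Factor graph: $\psi_{0,1} - z_1 - \psi_{1,1} - v_1 - \psi_{1,2} - z_2 - \psi_{2,2} - v_2 \cdots$, with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$)
• VB
$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_{z_k} + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\big)$
• Gibbs
$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}^{(\tau)}, v_k^{(\tau)}) \propto \phi_{z_k}\, \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$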
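Under a fully factorised $q$, the same conjugacy makes every factor inverse Gamma, and the updates only need $\langle x^{-1} \rangle = \text{shape}/\text{rate}$. A sketch assuming Gaussian observations $y_k \sim \mathcal{N}(0, v_k)$ and a fixed $z_1$; updating all $v_k$ and then all $z_k$ is a valid block coordinate ascent because, given the $z$ factors, the $v$ updates do not interact. The function name is hypothetical.

```python
import numpy as np

def vb_igmc(y, a, a_z, n_iter=100, z1=1.0):
    """Mean-field VB for y_k ~ N(0, v_k) under the inverse Gamma-Markov chain prior.

    Every factor is inverse Gamma with density ∝ x^-(shape+1) exp(-rate/x), so
    <1/x> = shape / rate. z_1 is held fixed. Returns shape and rate arrays of q(v_k).
    """
    y = np.asarray(y, dtype=float)
    K = len(y)
    inv_z = np.ones(K)                       # <1/z_k>
    inv_z[0] = 1.0 / z1
    shape_v = np.full(K, a + a_z + 0.5)      # phi_{v_k} plus the Gaussian likelihood
    shape_v[-1] = a + 0.5                    # the last v_k has no right-hand z neighbour
    for _ in range(n_iter):
        # q(v_k): log p(y_k | v_k) + <log psi_{k,k} + log psi_{k,k+1}> under q(z_k) q(z_{k+1})
        rate_v = a * inv_z + 0.5 * y ** 2
        rate_v[:-1] += a_z * inv_z[1:]
        inv_v = shape_v / rate_v
        # q(z_k), k >= 2: <log psi_{k-1,k} + log psi_{k,k}> under q(v_{k-1}) q(v_k)
        inv_z[1:] = (a + a_z) / (a_z * inv_v[:-1] + a * inv_v[1:])
    return shape_v, rate_v
```

A variance estimate per coefficient is then, for example, `rate_v / shape_v`, i.e. $1/\langle v_k^{-1} \rangle$.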
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: six time-frequency panels (500 × 120 grids, colour scale about $-14$ to $0$): Noisy ($X$), Original ($X_{org}$), and reconstructions $X_h$ (SNR 19.98), $X_v$ (SNR 20.79), $X_b$ (SNR 19.68), $X_g$ (SNR 19.97).]
Denoising - Music
[Figure: five time-frequency panels (500 × 250 grids, colour scale about $-18$ to $0$): Original, Noisy, and reconstructions by PF (SNR 8.53), Gibbs (SNR 8.66) and VB.]
“Tristram (Matt Uelmen)” + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: Horizontal; tie across time (harmonic continuity)
• Source 2: Vertical; tie across frequency (transients, percussive sounds)
[Figure: frequency bin ($\nu$) versus time ($\tau$) panels: $X_{org}$, $S_{hor}$, $S_{ver}$.]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix
Results in dB:
                        s1                       s2
              SDR     SIR     SAR      SDR     SIR     SAR
VB           -4.74   -3.28    5.67    -1.58   15.46   -1.37
Gibbs        -4.5    -2.62    4.57     1.05   12.46    1.61
GibbsEM      -4.23   -2.42    4.82     1.34   13.13    1.85
Pre-trained  -4.04   -3.15    8.13     3.56   11.44    4.64
Oracle        6.14   17.16    6.58    12.66   19.95   13.6
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix
Results in dB:
                        s1                       s2
              SDR     SIR     SAR      SDR     SIR     SAR
VB           -7.8    -6.22    4.53    -2.35   18.4    -2.25
Gibbs        -8.46   -7.53    6.93    -4.04   14.59   -3.83
GibbsEM      -7.74   -6.19    4.62    -1.14   16.62   -0.97
Pre-trained  -6.4    -5.39    6.95     3.8    16.39    4.14
Oracle       12.1    32.9    12.14    21.13   33.89   21.37
Harmonic-Transient Decomposition
[Figure: frequency bin ($\nu$) versus time ($\tau$) panels: $X_{org}$, $S_{hor}$, $S_{ver}$; audio examples (Original), (Hor), (Vert) for two excerpts.]
Chord Detection - Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ (about $-12$ to $0$) versus frequency $\nu$ (Hz, 0 to 4000).]
Chord Detection
[Figure: left, MDCT of a piano chord (MIDI notes 41, 48, 51, 56), frequency (Hz, 0 to 4000) versus time $\tau$ (s, 0 to 2.5); right, $\log \sum_\nu v_{\nu,j,\tau}$ for MIDI note $j$ (40 to 65) versus time $\tau$ (s).]
Multichannel Source Separation
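• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$
$v_{k,n} \sim \mathcal{IG}\big(v_{k,n}; \nu/2, 2/(\nu\lambda_n)\big)$
$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$
$a_m \sim \mathcal{N}(a_m; \cdots)$, $r_m \sim \mathcal{IG}(r_m; \cdots)$
(Plates: sources $n = 1, \dots, N$; frames $k = 1, \dots, K$; channels $m = 1, \dots, M$)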
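Ancestral sampling makes the hierarchy concrete. A NumPy sketch in which $b_\lambda$ is treated as a scale (matching the $\mathcal{G}(x; a, z)$ convention) and the elided priors on $a_m$ and $r_m$ are replaced by hypothetical fixed choices ($a_m \sim \mathcal{N}(0, I)$, $r_m = 0.01$). Under this parameterisation $\langle v_{k,n}^{-1} \rangle = 1/\lambda_n$, one way to see $\lambda_n$ as a per-source level.

```python
import numpy as np

def sample_multichannel(K, N, M, a_lam=2.0, b_lam=1.0, nu=4.0, seed=0):
    """Draw one data set from the hierarchical multichannel prior (sketch).

    lambda_n ~ G(a_lam, b_lam)               (b_lam used as a scale)
    v_kn     ~ IG(nu/2, 2/(nu lambda_n))     => 1/v_kn ~ Gamma(nu/2, scale=2/(nu lambda_n))
    s_kn     ~ N(0, v_kn)
    x_km     ~ N(a_m^T s_k, r_m)             (a_m ~ N(0, I), r_m = 0.01: placeholder assumptions)
    """
    rng = np.random.default_rng(seed)
    lam = rng.gamma(a_lam, b_lam, size=N)
    v = 1.0 / rng.gamma(nu / 2, 2.0 / (nu * lam), size=(K, N))  # scale broadcasts over sources
    s = rng.normal(0.0, np.sqrt(v))
    A = rng.normal(size=(M, N))           # row m is the mixing vector a_m
    r = np.full(M, 0.01)
    x = s @ A.T + rng.normal(size=(K, M)) * np.sqrt(r)
    return x, s, A, v, lam
```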
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$
Source Separation
[Figure: spectrograms, frequency $f$ (Hz, 0 to 10000) versus time $t$, of the three sources (Speech), (Piano), (Guitar) and of the (Mix).]
Reconstructions
[Figure: reconstructed spectrograms, frequency $f$ (Hz, 0 to 10000) versus time $t$: (Speech), (Piano), (Guitar).]
Multimodality
• Typically underdetermined (Channels < Sources) ⇒ multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: traces of the parameters $a$, $\lambda$ and $r$ over 2000 epochs.]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features)
  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hands
  - Create a quantized, human-readable score
  - ...
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: grid of states with tempo level $n_k \in \{1, 2, 3\}$ (vertical) and bar position $m_k \in \{1, \dots, 8\}$ (horizontal), shown for 3/4 time and 4/4 time.]
• Each dot denotes a state $x = (m, n)$ (score position, tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three panels of the transition probability $p(x_2 \mid x_1)$ over tempo level (1 to 5) and bar position (1 to 8).]
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over tempo level and bar position.]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over tempo level and bar position.]
Bar position Pointer - k = 3
[Figure: $p(x_3)$ over tempo level and bar position.]
Bar position Pointer - k = 4
[Figure: $p(x_4)$ over tempo level and bar position.]
Bar position Pointer - k = 5
[Figure: after observing $y_5 = 0$, the filtering density $p(x_5 \mid y_{1:5})$ over tempo level and bar position.]
Bar position Pointer - k = 10
[Figure: $p(x_{10})$ over tempo level and bar position.]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson intensity
[Figure: Poisson intensity $\mu_k$ (0 to 4) versus bar position $m_k$ (0 to 1000) for a Triplet Rhythm and a Duplet Rhythm.]
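Filtering in the bar-pointer model is an HMM forward recursion over the discrete states $x_k = (m_k, n_k)$. A sketch under simplifying assumptions that are not taken from the slides: bar positions wrap modulo $M$, tempo level $n$ advances the pointer by $n$ positions per frame, the tempo moves by $\pm 1$ level with probability `p_change`, and $y_k$ is Poisson with an intensity $\mu(m_k)$ that depends only on bar position. All names and parameter values are hypothetical.

```python
import math
import numpy as np

def bar_pointer_filter(y, mu, n_tempo, p_change=0.05):
    """Forward recursion for p(m_k, n_k | y_{1:k}) in a simplified bar-pointer HMM.

    y: observed counts, one per frame
    mu: (M,) Poisson intensity at each bar position (a rhythm template)
    n_tempo: tempo levels 1..n_tempo; level n advances the pointer n positions per frame
    Returns a list of (n_tempo, M) filtering distributions, one per frame.
    """
    mu = np.asarray(mu, dtype=float)
    M = len(mu)
    # tempo-level transitions: stay, or move one level up/down with probability p_change/2 each
    T = np.zeros((n_tempo, n_tempo))
    for i in range(n_tempo):
        for j in (i - 1, i + 1):
            if 0 <= j < n_tempo:
                T[i, j] = p_change / 2
        T[i, i] = 1.0 - T[i].sum()
    alpha = np.full((n_tempo, M), 1.0 / (n_tempo * M))  # uniform p(x_0)
    log_mu = np.log(mu)
    posteriors = []
    for yk in y:
        # position update with the previous tempo: m_k = (m_{k-1} + n_{k-1}) mod M
        shifted = np.stack([np.roll(alpha[i], i + 1) for i in range(n_tempo)])
        predicted = T.T @ shifted                          # then the tempo level may change
        log_lik = yk * log_mu - mu - math.lgamma(yk + 1)   # Poisson log p(y_k | m_k)
        alpha = predicted * np.exp(log_lik)[None, :]
        alpha /= alpha.sum()
        posteriors.append(alpha.copy())
    return posteriors

mu = np.full(16, 0.05)
mu[[0, 4, 8, 12]] = 3.0      # hypothetical template: strong onsets on four beats
y = [0, 3] * 20              # an onset every second frame
post = bar_pointer_filter(y, mu, n_tempo=4)[-1]
print(post.sum(axis=1).argmax() + 1)  # most probable tempo level: 2
```

Smoothing would add the matching backward pass; the forward pass alone is what an online tracker needs.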
Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
(Graphical model: chains $n_{0:3}$, $\theta_{0:3}$, $m_{0:3}$, $r_{0:3}$, with intensities $\lambda_{1:3}$ and observations $y_{1:3}$)
• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator
Filtering
[Figure, frame index $k$ = 0 to 450: observed data $y_k$; $\log p(m_k \mid y_{1:k})$ over bar position $m_k$ (0 to 800); $\log p(n_k \mid y_{1:k})$ over tempo in quarter notes per minute (60 to 180); $p(r_k \mid y_{1:k})$ for Triplets and Duplets.]
Smoothing
[Figure, frame index $k$ = 0 to 450: observed data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position $m_k$ (0 to 800); $\log p(n_k \mid y_{1:K})$ over tempo in quarter notes per minute (60 to 180); $p(r_k \mid y_{1:K})$ for Triplets and Duplets.]
Time Signature
[Figure: observed data, sample value ($-1$ to $1$) versus time (s, 0 to 12); then, for frame index $k$ = 0 to 500: $\log p(m_k \mid z_{1:K})$ over bar position $m_k$; $\log p(n_k \mid z_{1:K})$ over tempo in quarter notes per minute (about 52 to 155); $p(\theta_k \mid z_{1:K})$ for 4/4 and 3/4.]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt and audio signal $x_t$ versus time $t$.]
Score-Performance matching - Graphical Model
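(Graphical model: chains $t_{1:K}$, $r_{1:K}$, $\lambda_{1:K}$, and for each frequency $\nu = 1, \dots, W$ the variances $v_{\nu,1:K}$ and coefficients $s_{\nu,1:K}$; inset: score positions indexed by $r_k$)
$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau))\big)$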
Score-Performance matching - Signal model
[Figure: $\log \sigma_\nu$ (about $-12$ to $0$) versus frequency $\nu$ (Hz, 0 to 4000).]
Score-Performance matching
[Figure: left, spectrogram data, frequency (Hz, 0 to 4000) versus time (s, 0 to 14); right, MIDI data, MIDI note (55 to 85) versus score position (50 to 450).]
Online (filtering) or offline (smoothing) processing is possible.
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80) versus time (s, 1 to 4); $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ on a $\log\lambda$ axis versus time (s); MDCT of the audio (source: Daniel-Ben Pienaar), frequency (Hz, 0 to 4000) versus time (s).]
Summary
• “Time Domain”: Switching State Space Models
  - State space modeling
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  - Analysis down to sample precision (if required)
  - Computationally quite heavy
• “Transform Domain”: Gamma Fields
  - Models on (orthogonal) transform coefficients; energy compaction
  - Practical; can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
  - Time-frequency energy distributions
• Ongoing work
  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    * Chord detection, polyphonic transcription
    * Musical score guided source separation
  - Prior structures for other observation models, NMF
Audio Restoration/Interpolation
• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis
[Figure: audio waveform with missing samples, sample index 0 to 500.]
Audio Interpolation
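$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\, p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$
$H \equiv$ (parameters, hidden states)
(Graphical model: $H$ is the parent of both $x_{\neg\kappa}$ (Missing) and $x_\kappa$ (Observed))
[Figure: audio waveform, sample index 0 to 500, with missing and observed segments.]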
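For a fixed $H$ and a Gaussian model the integrand is Gaussian, and the missing samples have a closed-form conditional mean. A sketch that fixes $H$ to a hypothetical damped-cosine covariance $k(d) = \rho^{|d|}\cos(\omega d)$ instead of integrating over it, so it illustrates only the conditioning step; the function name and parameter values are assumptions.

```python
import numpy as np

def interpolate_missing(x, observed, omega, rho=0.99, jitter=1e-6):
    """E[x_missing | x_observed] under a zero-mean Gaussian prior with the fixed
    covariance k(d) = rho^|d| cos(omega d); this fixes H instead of integrating over it."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = np.subtract.outer(np.arange(n), np.arange(n))
    K = rho ** np.abs(d) * np.cos(omega * d)           # PSD: Schur product of PSD kernels
    o = np.flatnonzero(observed)
    m = np.flatnonzero(~observed)
    K_oo = K[np.ix_(o, o)] + jitter * np.eye(len(o))   # small jitter for numerical stability
    K_mo = K[np.ix_(m, o)]
    x_hat = x.copy()
    x_hat[m] = K_mo @ np.linalg.solve(K_oo, x[o])
    return x_hat

n, omega = 500, 2 * np.pi / 50
x = np.cos(omega * np.arange(n))
observed = np.random.default_rng(0).uniform(size=n) > 0.37   # drop roughly 37% of samples
x_hat = interpolate_missing(x, observed, omega)
rmse = np.sqrt(np.mean((x_hat - x)[~observed] ** 2))
```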
Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
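(Graphical model: $A_\nu, Q_\nu \rightarrow s^\nu_0 \cdots s^\nu_k \cdots s^\nu_{K-1}$, plate $\nu = 0, \dots, W-1$; observations $x_0, \dots, x_k, \dots, x_{K-1}$)
$s^\nu_k \sim \mathcal{N}(s^\nu_k; A_\nu s^\nu_{k-1}, Q_\nu)$
$A_\nu \sim \mathcal{N}\left(A_\nu; \begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix}, \Psi\right)$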
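A forward simulation shows what the prior encodes: each band is a two-dimensional state rotated by $\omega_\nu$ per step, i.e. a slowly varying phasor. A sketch that fixes $A_\nu$ at the mean of its prior ($\Psi \to 0$), uses $Q_\nu = qI$, and assumes the observation is the noiseless sum of the first state components, $x_k = \sum_\nu s^\nu_k[0]$; these choices and the names are hypothetical.

```python
import numpy as np

def rotation(omega):
    c, s = np.cos(omega), np.sin(omega)
    return np.array([[c, -s], [s, c]])

def simulate_phase_vocoder(omegas, K, q=1e-3, seed=0):
    """Simulate s^nu_k = A_nu s^nu_{k-1} + w_k, w_k ~ N(0, q I), one 2-D state per band nu.

    A_nu is the rotation by omega_nu (prior mean, Psi -> 0); x_k = sum_nu s^nu_k[0].
    """
    rng = np.random.default_rng(seed)
    W = len(omegas)
    A = np.stack([rotation(w) for w in omegas])   # (W, 2, 2)
    s = np.zeros((K, W, 2))
    s[0] = rng.normal(size=(W, 2))
    for k in range(1, K):
        s[k] = np.einsum("wij,wj->wi", A, s[k - 1]) + np.sqrt(q) * rng.normal(size=(W, 2))
    x = s[:, :, 0].sum(axis=1)
    return x, s
```

With `q=0` each band is a pure sinusoid whose amplitude (the norm of its state) never changes; `q > 0` lets amplitude and phase drift slowly.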
Inference: Structured Variational Bayes
(Factorised approximation: $q(A_\alpha)\, q(Q_\alpha) \prod_k q(s^\alpha_k \mid s^\alpha_{k-1})$ for each $\alpha \in C$, and $q(x_k)$)
• Intuitive algorithm
  - Subtract from the observed signal $x$ the prediction of the frequency bands in $\neg\alpha$
  - Compute a fit for $\alpha$ to this residual and iterate
• For fixed $A$, $Q$ this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
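The analogy can be made concrete with a plain Gauss-Seidel solver for $Ax = b$: each coordinate is refit to what remains of $b$ after subtracting the current contribution of all the others, just as each band $\alpha$ is refit to the residual left by the bands in $\neg\alpha$. A generic sketch, not the structured VB code itself.

```python
import numpy as np

def gauss_seidel(A, b, n_iter=100, x0=None):
    """Solve A x = b by Gauss-Seidel sweeps, always using the newest values of x."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros(len(b)) if x0 is None else np.array(x0, dtype=float)
    for _ in range(n_iter):
        for i in range(len(b)):
            # residual for coordinate i: b_i minus the contribution of all other coordinates
            residual = b[i] - A[i] @ x + A[i, i] * x[i]
            x[i] = residual / A[i, i]
    return x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x = gauss_seidel(A, b)
```

Convergence is guaranteed for symmetric positive definite systems such as this one, which is the setting a Gaussian posterior produces.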
Restoration
• Piano
  - Signal with missing samples (37%)
  - Reconstruction, 7.68 dB improvement
  - Original
• Trumpet
  - Signal with missing samples (37%)
  - Reconstruction, 7.10 dB improvement
  - Original
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
(Graphical model: for each $\nu = 1, \dots, W$, chains $r^\nu_0 \cdots r^\nu_k \cdots r^\nu_K$ and $\theta^\nu_0 \cdots \theta^\nu_k \cdots \theta^\nu_K$; observations $y_k, \dots, y_K$)
• Generalises source-filter models
Harmonic model with changepoints
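$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1})$, $r_k \in \{0, 1\}$
$\theta_k \mid \theta_{k-1}, r_k \sim [r_k = 0]\,\underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\,\underbrace{\mathcal{N}(0, S)}_{\text{new}}$
$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$
$A = \mathrm{blkdiag}\big(G_\omega, G_\omega^2, \dots, G_\omega^H\big)^N$, $\quad G_\omega = \rho_k \begin{pmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{pmatrix}$
Damping factor $0 < \rho_k < 1$, frame length $N$, and damped sinusoidal basis matrix $C$ of size $N \times 2H$.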
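Reading $A$ as the $N$-th power of $\mathrm{blkdiag}(G_\omega, G_\omega^2, \dots, G_\omega^H)$, harmonic $h$ rotates by $h\omega$ and decays by $\rho^h$ per sample, so one frame of $N$ samples scales its amplitude by $\rho^{hN}$. A NumPy sketch that builds $A$ and checks this through its eigenvalues, with a single constant $\rho$ standing in for $\rho_k$; the function names are illustrative.

```python
import numpy as np

def damped_rotation(omega, rho):
    c, s = np.cos(omega), np.sin(omega)
    return rho * np.array([[c, -s], [s, c]])

def harmonic_transition(omega, rho, H, N):
    """A = blkdiag(G, G^2, ..., G^H)^N with G = rho * R(omega).

    Block h rotates by h*omega and decays by rho^h per sample; the power N
    advances the state by one frame of N samples.
    """
    G = damped_rotation(omega, rho)
    B = np.zeros((2 * H, 2 * H))
    for h in range(1, H + 1):
        B[2 * h - 2:2 * h, 2 * h - 2:2 * h] = np.linalg.matrix_power(G, h)
    return np.linalg.matrix_power(B, N)

rho, H, N = 0.999, 3, 10
A = harmonic_transition(0.2, rho, H, N)
moduli = np.sort(np.abs(np.linalg.eigvals(A)))   # rho^{hN}, each twice
```

Higher harmonics therefore decay faster within a frame, which suits plucked and struck instruments.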
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
  - Conditional Gaussians are not closed under marginalization
  ⇒ Unlike in HMMs or KFMs, summing over $r_k$ does not simplify the filtering density
  ⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time/space
  - Intuition: when a changepoint occurs, the state vector $\theta$ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; Ó Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)
(Example: $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$; chain $\theta_0 \rightarrow \theta_1 \rightarrow \cdots \rightarrow \theta_5$ with observations $y_1, \dots, y_5$)
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
  ⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of the conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
Monophonic model (Cemgil et al 2006)
• We introduce a pitch label indicator $m$
• At each time $k$ the process can be in one of the {“mute”, “sound”} × $M$ states
(Graphical model: chains $r_{0:T}$, $m_{0:T}$, $s_{0:T}$ with observations $y_{1:T}$)
Monophonic Pitch Tracking
Monophonic Pitch Tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$
[Figure: audio signal (top) and estimated pitch labels (bottom), samples 100 to 1000.]
• If the pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
[Figure: exact inference (S), samples 500 to 3500.]
Tracking Pitch Variations
• Allow $m$ to change with $k$
[Figure: pitch track, samples 50 to 500.]
• Intractable; need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: latent processes over frequency index $\nu$ and time $k$, and the signal $x_k$.]
• Each latent changepoint process $\nu = 1, \dots, W$ corresponds to a “piano key”. Indicators $r_{1:W,1:K}$ encode a latent “piano roll” (S1) (S2) (S3)
Single time slice - Bayesian Variable Selection
$r_i \sim \mathcal{C}(r_i; \pi_{on}, \pi_{off})$
$s_i \mid r_i \sim [r_i = \text{on}]\,\mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\,\delta(s_i)$
$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$
$C \equiv [\,C_1 \cdots C_i \cdots C_W\,]$
(Graphical model: $r_i \rightarrow s_i \rightarrow x$ for $i = 1, \dots, W$)
• Generalized Linear Model: the columns of $C$ are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When $W$ is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman; Attias; ...)
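For small $W$ the mixture can be handled exactly by enumeration: integrating out $s$ under an on/off pattern $r$ gives $p(x \mid r) = \mathcal{N}(x; 0, \sigma^2 C_r C_r^\top + R)$, where $C_r$ keeps the active columns. A sketch assuming scalar coefficients with $\Sigma = \sigma^2$ and $\pi_{off} = 1 - \pi_{on}$; the loop over all $2^W$ patterns is exactly the part that becomes intractable for large $W$. Function names are hypothetical.

```python
import itertools
import numpy as np

def log_gauss_zero_mean(x, S):
    # log N(x; 0, S) via a Cholesky factorisation
    L = np.linalg.cholesky(S)
    alpha = np.linalg.solve(L, x)
    return -0.5 * alpha @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2 * np.pi)

def exact_indicator_posterior(x, C, sigma2, R, pi_on):
    """Enumerate all 2^W on/off patterns r and return them with p(r | x)."""
    W = C.shape[1]
    patterns = list(itertools.product([0, 1], repeat=W))
    logp = []
    for r in patterns:
        on = np.array(r, dtype=bool)
        Cr = C[:, on]                                   # active basis vectors only
        S = sigma2 * Cr @ Cr.T + R                      # covariance of x after integrating out s
        log_prior = on.sum() * np.log(pi_on) + (~on).sum() * np.log(1 - pi_on)
        logp.append(log_gauss_zero_mean(x, S) + log_prior)
    logp = np.array(logp)
    post = np.exp(logp - logp.max())                    # normalise in a numerically safe way
    return patterns, post / post.sum()

rng = np.random.default_rng(0)
C = rng.normal(size=(16, 4))
x = C @ np.array([2.0, 0.0, -1.5, 0.0])                 # components 1 and 3 are active
patterns, post = exact_indicator_posterior(x, C, 1.0, 0.01 * np.eye(16), 0.5)
```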
Factorial Switching State space model
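$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$
$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$
$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$ (Changepoint indicator)
$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\theta_{k-1,\nu}, Q_\nu(r_k))$ (Latent state)
$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k\theta_{k,1:W}, R)$ (Observation)
(Graphical model: for each $\nu = 1, \dots, W$, chains $r^\nu_0 \cdots r^\nu_k \cdots r^\nu_K$ and $s^\nu_0 \cdots s^\nu_k \cdots s^\nu_K$; observations $y_k, \dots, y_K$)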
Synthetic Data
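[Figure: synthetic example over time $k$ and frequency index $\nu$, showing the latent processes and the resulting signal $x$ (S).]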
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable; computations involve large matrices
  - Need advanced techniques from linear algebra
  - Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical: acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, Generalised Linear model
• Feature based
Spectrogram
• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau) =$ (Frequency, Time), such as STFT or MDCT
$x(t) = \sum_k s_k \phi_k(t)$
[Figure: spectrograms, frequency $f$ (Hz, 0 to 10000) versus time $t$, of (Speech) and (Piano).]
• Spectrogram displays $\log|s_k|$ or $|s_k|^2$ (of STFT)
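As one concrete choice of $\phi_k(t)$, the MDCT with a sine window gives a critically sampled transform with perfect reconstruction: windowed frames of length $2N$ with hop $N$, where overlap-adding the inverse cancels the time-domain aliasing. A NumPy sketch in direct matrix form (slow but transparent); the factor $2/N$ in the inverse is what makes the sine-windowed version reconstruct exactly. Function names are illustrative.

```python
import numpy as np

def _mdct_setup(N):
    n = np.arange(2 * N)
    k = np.arange(N)
    # basis cos[pi/N (n + 1/2 + N/2)(k + 1/2)], shape (N, 2N)
    B = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    w = np.sin(np.pi * (n + 0.5) / (2 * N))   # sine window: w_n^2 + w_{n+N}^2 = 1
    return B, w

def mdct(x, N):
    """Coefficients s of shape (N, n_frames) from frames of length 2N with hop N."""
    B, w = _mdct_setup(N)
    n_frames = len(x) // N - 1
    return np.stack([B @ (w * x[t * N:t * N + 2 * N]) for t in range(n_frames)], axis=1)

def imdct(S, N):
    """Windowed inverse with overlap-add (time-domain aliasing cancellation)."""
    B, w = _mdct_setup(N)
    n_frames = S.shape[1]
    y = np.zeros((n_frames + 1) * N)
    for t in range(n_frames):
        y[t * N:t * N + 2 * N] += w * (2.0 / N) * (B.T @ S[:, t])
    return y

N = 16
x = np.random.default_rng(0).normal(size=12 * N)
S = mdct(x, N)
y = imdct(S, N)                        # equals x except in the first and last N samples
log_spec = np.log(np.abs(S) + 1e-12)   # a log-magnitude display of the coefficients
```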
Models for time-frequency Energy distributions
• Non-Negative Matrix Factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)
$X_{\nu,\tau} = \sum_j W_{\nu,j} S_{j,\tau}$
Spectrogram = Spectral Templates × Excitations
  - however, spectrograms are not additive ($a^2 + b^2 \neq (a+b)^2$)
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)
$X_{\nu,\tau} = [r_{\nu,\tau} = 0]\, S^{(0)}_{\nu,\tau} + [r_{\nu,\tau} = 1]\, S^{(1)}_{\nu,\tau}$
Spectrogram = Mask × Source0 + (1 − Mask) × Source1
  - however, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms
$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 38
One channel source separation Gaussian source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim N (skn 0 vkn)
xk|sk1N =sumN
n=1 skn
bull Straightforward application of Bayesrsquo theorem yields
p(skn|vk1N xk) = N (skn κknxk vkn(1minus κkn))
κkn = vknsum
nprime
vknprime (Responsibilities)
bull Each source coefficient sn gets a fraction κn of the observation x
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 39
One channel source separation Poisson source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim PO(skn vkn)
xk|sk1N =sumN
n=1 skn
bull This is the generative model for the NMF when we write
vk(ντ)n = tνn times eτn (Templatetimes Excitation)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40
Gamma G(x a b) and Inverse Gamma IG(x a b)
0 1 2 3 4 50
02
04
06
08
1
12a = 09 b =1
a = 1 b =1
a = 13 b =1
a = 2 b =1
x
p(x)
0 1 2 3 4 50
02
04
06
08
1
12
14
a=1 b=1
a=1 b=05
a=2 b=1
G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))
IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))
bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale
bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1 K as follows
vk|zk sim IG(vk a zka)
zk+1|vk sim IG(zk+1 az vkaz)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
bull Variance variables v are priors for sources
bull Auxillary variables z are needed for conjugacy and positive correlation
bull Shape parameters a and az describe coupling strength and drift of the chain
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42
Gamma Chains typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43
Gamma Chains with changepoints typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44
Gamma Chains
bull The joint can be written as product of singleton and pairwise
potentials of form
ψkk = exp(minusazminus1k vminus1
k ) (Pairwise)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
φzk = exp((az + a+ 1) log zminus1
k ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45
Gamma Fields
bull The joint can be written as product of singleton and pairwise
potentials
ψij = exp(minusaijξminus1i ξminus1
j ) (Pairwise)
φi = exp((sum
j
aij + 1) log ξminus1i ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46
Possible Model Topologies
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47
Approximate Inference
bull Stochastic
ndash Markov Chain Monte Carlo Gibbs sampler
ndash Sequential Monte Carlo Particle Filtering
bull Deterministic
ndash Variational Bayes
In all these conjugacy helps
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))
bull Gibbs
v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z
(τ)k )ψkk+1(z
(τ)k+1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
bull Inference Variational Bayes
Noisy Original
X
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xorg
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xh SNR1998
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xv SNR2079
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xb SNR1968
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xg SNR1997
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51
Denoising - Music
[Figure: spectrograms, colour scale -18 to 0; panels Original, Noisy, PF SNR853, Gibbs SNR866, VB SNR208]
"Tristram" (Matt Uelmen) + ~0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal; tie across time (harmonic continuity)
• Source 2: vertical; tie across frequency (transients, percussive sounds)
[Figure: time (τ) vs. frequency bin (ν); panels Xorg, Shor, Sver]
Single Channel Source Separation with IGMCs
E-guitar "Matte Kudasai" (King Crimson) + Drums "Territory" (Sepultura) = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Pre-trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources
Single Channel Source Separation with IGMCs
"Vandringar I Vilsenhet" (Anglagard) + "Moby Dick" (Led Zeppelin) = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Pre-trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Harmonic-Transient Decomposition
[Figure: time (τ) vs. frequency bin (ν); panels Xorg, Shor, Sver; audio examples (Original) (Hor) (Vert) for two excerpts]
Chord Detection - Signal model (with Paul Peeling)
[Figure: log σ_ν vs. frequency ν (Hz), 0 to 4000 Hz]
Chord Detection
[Figure: left, MDCT of piano chord 41 48 51 56 (frequency in Hz, 0 to 4000, vs. time τ in s); right, $\log \sum_\nu v_{\nu,j,\tau}$ for MIDI note j (40 to 65) vs. time τ in s]
Multichannel Source Separation
• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006), for sources n = 1, ..., N, channels m = 1, ..., M and atoms k = 1, ..., K:
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$
$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$
$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$
$a_m \sim \mathcal{N}(a_m; \cdots), \quad r_m \sim \mathcal{IG}(r_m; \cdots)$
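To make the parametrisations concrete, the hierarchy can be sampled ancestrally. The sketch below is a minimal illustration under stated assumptions: G(x; a, b) is read as shape a and scale b, IG(x; a, z) as "1/x is Gamma with shape a and scale z" (matching the Gamma and Inverse Gamma definitions used in this deck), the unspecified prior on a_m is replaced by N(0, I), the noise variance is shared across channels, and the name `sample_mixture` and all hyperparameter values are hypothetical.

```python
import numpy as np

def sample_mixture(K=1000, N=3, M=2, nu=4.0, a_lam=2.0, b_lam=1.0, r=0.01, seed=0):
    """Ancestral sample from the hierarchical multichannel prior (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    lam = rng.gamma(a_lam, b_lam, size=N)                            # source "volumes"
    # v_{k,n} ~ IG(nu/2, 2/(nu lam_n)), i.e. 1/v ~ Gamma(shape nu/2, scale 2/(nu lam_n))
    v = 1.0 / rng.gamma(nu / 2.0, 2.0 / (nu * lam), size=(K, N))
    s = rng.normal(0.0, np.sqrt(v))                                  # s_{k,n} ~ N(0, v_{k,n})
    A = rng.normal(size=(M, N))                                      # rows are a_m (assumed N(0, I))
    x = s @ A.T + rng.normal(0.0, np.sqrt(r), size=(K, M))          # x_{k,m} ~ N(a_m^T s_k, r)
    return lam, v, s, A, x
```

With these choices E[1/v_{k,n}] = 1/λ_n, so λ_n sets the typical variance of source n, which fits its reading as an overall "volume".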
Equivalent Gamma MRF
• A tree for each source
• λ_n can be interpreted as the overall "volume" of source n
Source Separation
[Figure: spectrograms, time t (sec) vs. frequency f (Hz, 0 to 10000); panels (Speech), (Piano), (Guitar), (Mix)]
Reconstructions
[Figure: reconstructed spectrograms, time t (sec) vs. frequency f (Hz, 0 to 10000); panels (Speech), (Piano), (Guitar)]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
[Figure: traces of a, λ and r vs. epoch (0 to 2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features):
  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hands
  - Create a quantized, human-readable score
  - ...
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Diagram: states on a grid of bar position m_k = 1, ..., 8 (horizontal) and tempo level n_k = 1, 2, 3 (vertical); bar lengths for 3/4 time and 4/4 time marked]
• Each dot denotes a state x = (m, n) (score position, tempo level)
• Directed arcs denote state transitions with positive probability
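A state space of this kind can be encoded as a transition matrix over the flattened pairs (m, n). In the sketch below the dynamics are an assumption for illustration, loosely following the diagram: the bar position advances by the tempo level (mod the bar length), and the tempo moves up or down by at most one level. `bar_pointer_transition` and its defaults are hypothetical.

```python
import numpy as np

def bar_pointer_transition(M=8, n_tempi=3, p_change=0.1):
    """Transition matrix over states x = (m, n): bar position m = 0..M-1 and
    tempo level n = 1..n_tempi, flattened as index (n - 1) * M + m.
    Assumed dynamics: m advances by n (mod M); n moves one level up or down
    with probability p_change each, if that level exists."""
    T = np.zeros((M * n_tempi, M * n_tempi))

    def index(m, n):
        return (n - 1) * M + m

    for n in range(1, n_tempi + 1):
        neighbours = [n2 for n2 in (n - 1, n + 1) if 1 <= n2 <= n_tempi]
        for m in range(M):
            m_next = (m + n) % M
            T[index(m, n), index(m_next, n)] = 1.0 - p_change * len(neighbours)
            for n2 in neighbours:
                T[index(m, n), index(m_next, n2)] = p_change
    return T
```

Propagating a distribution p(x_k) one step is then `p @ T`, which produces pictures like the p(x_k) panels that follow.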
Bar position Pointer - transition model
[Figure, three panels: transition probabilities p(x_2 | x_1) on a grid of bar position (1 to 8) and tempo level (1 to 5)]
Bar position Pointer - k = 1
[Figure: p(x_1) over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: p(x_2) over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: p(x_3) over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: p(x_4) over bar position and tempo level]
Bar position Pointer - k = 5
[Figure: p(x_5 | y_{1:5}) over bar position and tempo level, after observing y_5 = 0]
Bar position Pointer - k = 10
[Figure: p(x_10) over bar position and tempo level]
Bar position Pointer - observation model (Poisson)
• Observation model p(y_k | x_k): Poisson intensity
[Figure: Poisson intensity μ_k vs. bar position m_k (0 to 1000) for a triplet rhythm and a duplet rhythm]
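Because the state space is discrete, filtering in this HMM is the standard forward recursion: predict with the transition matrix, then weight each state by the Poisson likelihood of the observed count y_k. The sketch assumes a given transition matrix T and a per-state intensity vector mu; the names are hypothetical, and the usage check fixes the tempo so that only the bar position is uncertain.

```python
import numpy as np
from math import lgamma

def poisson_forward_filter(y, T, mu, prior):
    """HMM forward filtering with Poisson observations.
    T[i, j] = p(x_k = j | x_{k-1} = i), mu[j] = Poisson intensity in state j,
    prior = p(x_0). Row k-1 of the result is p(x_k | y_{1:k})."""
    alpha = np.asarray(prior, dtype=float)
    mu = np.asarray(mu, dtype=float)
    log_mu = np.log(mu)
    out = []
    for yk in y:
        pred = alpha @ T                                  # p(x_k | y_{1:k-1})
        loglik = yk * log_mu - mu - lgamma(yk + 1)        # log Poisson(y_k; mu)
        post = pred * np.exp(loglik - loglik.max())       # rescaled for numerical stability
        alpha = post / post.sum()
        out.append(alpha)
    return np.array(out)
```

In the check, onsets are expected at bar position 0 and the true phase starts at 3; the filter locks onto the correct position after two bars.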
Tempo, Rhythm, Meter analysis
Bar pointer model (Whiteley, Cemgil, Godsill 2006)
[Graphical model: chains n_0 ... n_3, θ_0 ... θ_3, m_0 ... m_3, r_0 ... r_3; intensities λ_1 ... λ_3; observations y_1 ... y_3]
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator
Filtering
[Figure vs. frame index k: observed data y_k; $\log p(m_k \mid y_{1:k})$ over bar position; $\log p(n_k \mid y_{1:k})$ over tempo (quarter notes per min, 60 to 180); $p(r_k \mid y_{1:k})$ for triplets vs. duplets]
Smoothing
[Figure vs. frame index k: observed data y_k; $\log p(m_k \mid y_{1:K})$ over bar position; $\log p(n_k \mid y_{1:K})$ over tempo (quarter notes per min, 60 to 180); $p(r_k \mid y_{1:K})$ for triplets vs. duplets]
Time Signature
[Figure vs. frame index k: observed audio (sample value vs. time in s, 0 to 12); $\log p(m_k \mid z_{1:K})$ over bar position; $\log p(n_k \mid z_{1:K})$ over tempo (quarter notes per min); $p(\theta_k \mid z_{1:K})$ for 4/4 vs. 3/4]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: musical score aligned with the audio signal x_t]
Score-Performance matching - Graphical Model
[Graphical model: for ν = 1, ..., W and k = 1, ..., K, score pointers r_k, times t_k, intensities λ_k, variances v_{ν,k} and coefficients s_{ν,k}]
$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau)))$
Score-Performance matching - Signal model
[Figure: log σ_ν vs. frequency ν (Hz), 0 to 4000 Hz]
Score-Performance matching
[Figure: spectrogram data (frequency in Hz, 0 to 4000, vs. time in s, 0 to 14) and MIDI data (MIDI note 55 to 85 vs. score position)]
Online (filtering) or offline (smoothing) processing is possible
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ (MIDI note number 60 to 80 vs. time in s); $\log \sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$ vs. time in s; MDCT of audio (source: Daniel-Ben Pienaar), frequency in Hz (0 to 4000) vs. time in s]
Summary
• "Time domain": switching state space models
  - State space modeling
  - Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
  - Analysis down to sample precision (if required)
  - Computationally quite heavy
• "Transform domain": Gamma fields
  - Models on (orthogonal) transform coefficients; energy compaction
  - Practical: can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
  - Time-frequency energy distributions
• Ongoing work
  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    * Chord detection, polyphonic transcription
    * Musical score guided source separation
  - Prior structures for other observation models: NMF
Audio Interpolation
$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\, p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$, with $H \equiv$ (parameters, hidden states)
[Figure: graphical model with H as parent of x_{¬κ} (missing) and x_κ (observed); example signal with a gap of missing samples]
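For a fixed H and a Gaussian signal model, p(x_¬κ | x_κ, H) has a closed form given by the standard Gaussian conditioning formulas. The sketch below assumes a stationary AR(1) covariance purely for illustration; it does not integrate over H, and `ar1_cov` and `interpolate_gaussian` are hypothetical names.

```python
import numpy as np

def ar1_cov(n, rho=0.95, sigma=1.0):
    """Covariance of a stationary AR(1) process x_t = rho x_{t-1} + sigma e_t."""
    idx = np.arange(n)
    return sigma ** 2 / (1.0 - rho ** 2) * rho ** np.abs(idx[:, None] - idx[None, :])

def interpolate_gaussian(x, missing, cov):
    """Mean and covariance of p(x_missing | x_observed) for a zero-mean
    Gaussian signal with known covariance, i.e. with H held fixed."""
    x = np.asarray(x, dtype=float)
    miss = np.asarray(missing, dtype=bool)
    obs = ~miss
    S_mo = cov[np.ix_(miss, obs)]
    S_oo = cov[np.ix_(obs, obs)]
    S_mm = cov[np.ix_(miss, miss)]
    gain = np.linalg.solve(S_oo, S_mo.T).T      # S_mo S_oo^{-1} without an explicit inverse
    mean = gain @ x[obs]
    post_cov = S_mm - gain @ S_mo.T
    return mean, post_cov
```

For an AR(1) model a single missing sample depends only on its two neighbours, with conditional mean ρ(x_{t-1} + x_{t+1}) / (1 + ρ²).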
Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
[Graphical model: for each band ν = 0, ..., W-1, a chain s_{ν,0} ... s_{ν,k} ... s_{ν,K-1} with parameters A_ν, Q_ν; observations x_0 ... x_k ... x_{K-1}]
$s_{\nu,k} \sim \mathcal{N}(s_{\nu,k}; A_\nu s_{\nu,k-1}, Q_\nu), \qquad A_\nu \sim \mathcal{N}\left(A_\nu; \begin{pmatrix} \cos\omega_\nu & -\sin\omega_\nu \\ \sin\omega_\nu & \cos\omega_\nu \end{pmatrix}, \Psi\right)$
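With A_ν fixed at its prior mean (a rotation by ω_ν) and Q_ν = 0, each band's state simply rotates, so its first component traces a sinusoid at frequency ω_ν. The sketch assumes, for illustration only, that the observation x_k is the sum of the first state components across bands; the slide does not state the observation model, and the function names are hypothetical.

```python
import numpy as np

def rotation(omega):
    """2x2 rotation by omega radians: the prior mean of A_nu."""
    return np.array([[np.cos(omega), -np.sin(omega)],
                     [np.sin(omega),  np.cos(omega)]])

def simulate_bands(omegas, K, q=0.0, seed=0):
    """Simulate s_{nu,k} ~ N(A_nu s_{nu,k-1}, q I) with A_nu = rotation(omega_nu)
    and s_{nu,0} = (1, 0). Returns states (W, K, 2) and the assumed observation
    x_k = sum over bands of the first state component."""
    rng = np.random.default_rng(seed)
    W = len(omegas)
    s = np.zeros((W, K, 2))
    s[:, 0, 0] = 1.0
    for nu, omega in enumerate(omegas):
        A = rotation(omega)
        for k in range(1, K):
            s[nu, k] = A @ s[nu, k - 1] + np.sqrt(q) * rng.standard_normal(2)
    x = s[:, :, 0].sum(axis=0)
    return s, x
```

A positive q lets amplitude and phase drift slowly, which is what allows such a model to follow non-stationary partials.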
Inference: Structured Variational Bayes
[Graphical model: for each band α ∈ C, factors q(A_α), q(Q_α) and a chain $\prod_k q(s_{\alpha,k} \mid s_{\alpha,k-1})$; observations x_k with q(x_k)]
• Intuitive algorithm
  - Subtract from the observed signal x the prediction of the frequency bands in ¬α
  - Compute a fit for α to this residual, and iterate
• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
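The "subtract, refit, iterate" loop can be sketched for a static linear Gaussian model x = Σ_α C_α s_α + e, with e ~ N(0, rI) and s_α ~ N(0, P_α). Each refit solves the normal equations for one block given the residual, which is block Gauss-Seidel, and the iteration converges to the joint posterior mean. This is a simplified stand-in for the dynamic model (no Kalman smoothing inside each band); the function name is hypothetical.

```python
import numpy as np

def backfit(x, Cs, Ps, r, n_sweeps=200):
    """Block Gauss-Seidel for the posterior mean of x = sum_a C_a s_a + e,
    e ~ N(0, r I), s_a ~ N(0, P_a)."""
    s = [np.zeros(C.shape[1]) for C in Cs]
    for _ in range(n_sweeps):
        for a, (C, P) in enumerate(zip(Cs, Ps)):
            # residual: observed signal minus the prediction of all other bands
            others = sum(Cb @ sb for b, (Cb, sb) in enumerate(zip(Cs, s)) if b != a)
            resid = x - others
            prec = C.T @ C / r + np.linalg.inv(P)
            s[a] = np.linalg.solve(prec, C.T @ resid / r)
    return s
```

The check compares the result with a direct solve of the full normal equations.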
Restoration
• Piano
  - Signal with missing samples (37)
  - Reconstruction: 768 dB improvement
  - Original
• Trumpet
  - Signal with missing samples (37)
  - Reconstruction: 710 dB improvement
  - Original
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
[Graphical model: for each ν = 1, ..., W, chains r_{ν,0} ... r_{ν,k} ... r_{ν,K} and θ_{ν,0} ... θ_{ν,k} ... θ_{ν,K}; observations y_k ... y_K]
• Generalises source-filter models
Harmonic model with changepoints
$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \quad r_k \in \{0, 1\}$
$\theta_k \mid \theta_{k-1}, r_k \sim [r_k = 0]\,\underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\,\underbrace{\mathcal{N}(0, S)}_{\text{new}}$
$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$
$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^N, \qquad G_\omega = \rho_k \begin{pmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{pmatrix}$
with damping factor $0 < \rho_k < 1$, frame length N, and a damped sinusoidal basis matrix C of size $N \times 2H$
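One way to build A and C consistently is to let sample n of a frame be Σ_h [1 0] (G_ω^h)^n θ_h, so that A, the N-th power of the block-diagonal matrix of the G_ω^h, advances the state by exactly one frame. This construction of C is an assumption for illustration (the slide only gives its size); `harmonic_matrices` is a hypothetical name. The check confirms that two consecutive frames match a single frame of length 2N.

```python
import numpy as np

def G(omega, rho):
    """Damped rotation rho * R(omega)."""
    return rho * np.array([[np.cos(omega), -np.sin(omega)],
                           [np.sin(omega),  np.cos(omega)]])

def harmonic_matrices(omega, rho, H, N):
    """Transition A (2H x 2H) and basis C (N x 2H) for H harmonics of omega.
    Assumed construction: sample n of a frame is sum_h [1, 0] (G_omega^h)^n theta_h."""
    D = np.zeros((2 * H, 2 * H))
    for h in range(H):
        D[2*h:2*h+2, 2*h:2*h+2] = np.linalg.matrix_power(G(omega, rho), h + 1)
    A = np.linalg.matrix_power(D, N)            # advance by one frame of N samples
    C = np.zeros((N, 2 * H))
    for n in range(N):
        Dn = np.linalg.matrix_power(D, n)
        # first row of each 2x2 block of D^n, i.e. [1, 0] (G_omega^h)^n
        C[n] = np.concatenate([Dn[2*h, 2*h:2*h+2] for h in range(H)])
    return A, C
```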
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
  - Conditional Gaussians are not closed under marginalization
  ⇒ Unlike in HMMs or KFMs, summing over r_k does not simplify the filtering density
  ⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially
[Figure: growing set of Gaussian kernels in the filtering density]
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time/space
  - Intuition: when a changepoint occurs, the state vector θ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; Ó Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)
[Example: r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0, with states θ_0, ..., θ_5 and observations y_1, ..., y_5]
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
  ⇒ Trajectories $r_{1:k}^{(i)}$ that are dominated in terms of conditional evidence $p(y_{1:k}, r_{1:k}^{(i)})$ can be discarded without destroying optimality
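For a scalar level that stays constant between changepoints (A = 1, Q = 0) and is redrawn from N(0, S) at a changepoint, the exact filtering density is a Gaussian mixture indexed by the time since the last changepoint, so it has only k + 1 kernels after k steps. The sketch below assumes independent changepoints with probability p_change and Gaussian observation noise with variance R; the names and defaults are hypothetical.

```python
import numpy as np

def changepoint_filter(y, p_change=0.05, S=10.0, R=1.0):
    """Exact filtering for a piecewise-constant level with resets.
    Kernel i has run length i (time since the last changepoint).
    Returns posterior means of theta_k, kernel counts, and final run-length weights."""
    means = np.array([0.0])
    variances = np.array([S])
    logw = np.array([0.0])
    post_means, n_components = [], []
    for yk in y:
        # predict: either no change (keep every kernel) or a reset to N(0, S)
        log_new = np.logaddexp.reduce(logw) + np.log(p_change)
        logw = np.concatenate(([log_new], logw + np.log1p(-p_change)))
        means = np.concatenate(([0.0], means))
        variances = np.concatenate(([S], variances))
        # update: scalar Kalman step for every kernel, weighted by its evidence
        pred_var = variances + R
        logw = logw - 0.5 * (np.log(2 * np.pi * pred_var) + (yk - means) ** 2 / pred_var)
        gain = variances / pred_var
        means = means + gain * (yk - means)
        variances = (1.0 - gain) * variances
        logw = logw - np.logaddexp.reduce(logw)
        post_means.append(np.exp(logw) @ means)
        n_components.append(logw.size)
    return np.array(post_means), n_components, np.exp(logw)
```

The total cost is quadratic in the sequence length, compared with the exponential growth of the general switching model.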
Monophonic model (Cemgil et al 2006)
• We introduce a pitch label indicator m
• At each time k, the process can be in one of the "mute" or "sound" × M states
[Graphical model: chains r_0 ... r_T, m_0 ... m_T, s_0 ... s_T; observations y_1 ... y_T]
Monophonic Pitch Tracking
Monophonic Pitch Tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$
[Figure: example signal over samples 100 to 1000 and the corresponding filtered pitch estimates]
• If pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
[Figure: transcription with exact inference, samples 500 to 3500]
Tracking Pitch Variations
• Allow m to change with k
[Figure: example over samples 50 to 500]
• Intractable: need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: frequency ν vs. time k, with the signal x_k]
• Each latent changepoint process ν = 1, ..., W corresponds to a "piano key". Indicators $r_{1:W,1:K}$ encode a latent "piano roll"
Single time slice - Bayesian Variable Selection
$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$
$s_i \mid r_i \sim [r_i = \text{on}]\,\mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\,\delta(s_i)$
$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R), \qquad C \equiv [C_1 \cdots C_i \cdots C_W]$
[Graphical model: indicators r_1 ... r_W, coefficients s_1 ... s_W, observation x]
• Generalized linear model: the columns of C are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When W is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman; Attias; ...)
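For small W, the 2^W-component posterior can be computed exactly by enumeration: for each indicator configuration r, x is Gaussian with covariance C_r Σ C_r^T + R. The sketch assumes scalar coefficients (Σ = sigma2), R = r I and independent indicators with prior pi_on; `exact_posterior` is a hypothetical name. Each extra basis vector doubles the cost, which is the intractability noted above.

```python
import itertools
import numpy as np

def log_gauss(x, cov):
    """log N(x; 0, cov)."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (x.size * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(cov, x))

def exact_posterior(x, C, pi_on=0.1, sigma2=1.0, r=0.01):
    """Posterior over all 2^W indicator configurations r_{1:W}."""
    N, W = C.shape
    configs, logps = [], []
    for bits in itertools.product([0, 1], repeat=W):
        on = np.array(bits, dtype=bool)
        Cr = C[:, on]                                   # active basis vectors only
        cov = sigma2 * Cr @ Cr.T + r * np.eye(N)        # s marginalised out
        n_on = int(on.sum())
        log_prior = n_on * np.log(pi_on) + (W - n_on) * np.log(1 - pi_on)
        configs.append(bits)
        logps.append(log_prior + log_gauss(x, cov))
    logps = np.array(logps)
    return configs, np.exp(logps - np.logaddexp.reduce(logps))
```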
Factorial Switching State space model
$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$
$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$
$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$ (changepoint indicator)
$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\theta_{k-1,\nu}, Q_\nu(r_k))$ (latent state)
$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k\theta_{k,1:W}, R)$ (observation)
[Graphical model: for each ν = 1, ..., W, chains r_{ν,0} ... r_{ν,K} and s_{ν,0} ... s_{ν,K}; observations y_k ... y_K]
Synthetic Data
[Figure: synthetic data x_k and latent indicators per frequency band ν over time k]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable: computations with large matrices
  - Need advanced techniques from linear algebra
  - Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical: acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, generalised linear model
• Feature based
Spectrogram
• Basis functions φ_k(t) centered around a time-frequency atom k = k(ν, τ) = (frequency, time), such as STFT or MDCT
$x(t) = \sum_k s_k \phi_k(t)$
[Figure: spectrograms, time t (sec) vs. frequency f (Hz, 0 to 10000); panels (Speech), (Piano)]
• The spectrogram displays log |s_k| or |s_k|^2 (of the STFT)
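A minimal STFT makes the coefficients s_k = s_(ν,τ) concrete: cut the signal into frames, apply a window, and take an FFT per frame. The Hann window, frame length and hop size below are assumptions, not the settings used for the figures; the function names are hypothetical.

```python
import numpy as np

def stft(x, win_len=256, hop=128):
    """Hann-windowed frames with one rfft per frame.
    Returns complex coefficients of shape (win_len // 2 + 1, n_frames),
    indexed by (frequency bin nu, frame tau)."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[t * hop: t * hop + win_len] * window for t in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)

def log_spectrogram(x, **kwargs):
    """log |s_k|^2, with a small floor to avoid log(0)."""
    return np.log(np.abs(stft(x, **kwargs)) ** 2 + 1e-12)
```

For a 1000 Hz sinusoid sampled at 8 kHz with 256-sample frames, the energy peaks in bin 1000 / 8000 × 256 = 32 of every frame.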
Models for time-frequency Energy distributions
• Non-negative matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)
$X_{\nu,\tau} = \sum_j W_{\nu,j} S_{j,\tau}$
Spectrogram = Spectral templates × Excitations
  - however, spectrograms are not additive ($a^2 + b^2 \neq (a+b)^2$)
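For reference, the factorisation can be fitted with the standard multiplicative updates of Lee and Seung for the generalised KL divergence, which corresponds to maximum likelihood under a Poisson observation model. This is a generic NMF sketch, not a method proposed in these slides; the names and defaults are hypothetical.

```python
import numpy as np

def kl_div(X, V, eps=1e-12):
    """Generalised KL divergence D(X || V)."""
    return float(np.sum(X * np.log((X + eps) / V) - X + V))

def nmf_kl(X, J, n_iter=200, seed=0, eps=1e-12):
    """Fit X ~= W @ S (templates W, excitations S) with KL multiplicative updates.
    Returns W, S and the divergence after each iteration."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.random((F, J)) + eps
    S = rng.random((J, T)) + eps
    history = []
    for _ in range(n_iter):
        V = W @ S + eps
        S *= (W.T @ (X / V)) / (W.sum(axis=0)[:, None] + eps)   # excitation update
        V = W @ S + eps
        W *= ((X / V) @ S.T) / (S.sum(axis=1)[None, :] + eps)   # template update
        history.append(kl_div(X, W @ S + eps))
    return W, S, history
```

The updates keep W and S non-negative by construction, and each one does not increase the divergence.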
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)
$X_{\nu,\tau} = [r_{\nu,\tau} = 0]\,S^{(0)}_{\nu,\tau} + [r_{\nu,\tau} = 1]\,S^{(1)}_{\nu,\tau}$
Spectrogram = Mask × Source0 + (1 - Mask) × Source1
  - however, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)
• We place a suitable prior on the variance of the transform coefficients s_k, and tie the prior variances across harmonically and temporally related time-frequency atoms
$p(s \mid v)\,p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$
One channel source separation: Gaussian source model
[Graphical model: for k = 1, ..., K, variances v_{k,1} ... v_{k,N}, sources s_{k,1} ... s_{k,N}, observation x_k]
$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^N s_{k,n}$
• Straightforward application of Bayes' theorem yields
$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})), \qquad \kappa_{k,n} = v_{k,n} \big/ \sum_{n'} v_{k,n'}$ (responsibilities)
• Each source coefficient s_n gets a fraction κ_n of the observation x
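The responsibilities act like a soft Wiener-filter mask per time-frequency atom. A minimal vectorised sketch (hypothetical function name), with x of shape (K,) and v of shape (K, N):

```python
import numpy as np

def gaussian_split(x, v):
    """Posterior marginals of s_{k,n} given x_k = sum_n s_{k,n}, s_{k,n} ~ N(0, v_{k,n}).
    Returns means (K, N), variances (K, N) and responsibilities (K, N)."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    kappa = v / v.sum(axis=1, keepdims=True)     # responsibilities
    mean = kappa * x[:, None]                    # each source takes a fraction of x_k
    var = v * (1.0 - kappa)
    return mean, var, kappa
```

The posterior means always add back up to x_k, so the separated coefficients reconstruct the mixture exactly.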
One channel source separation: Poisson source model
[Graphical model: as above, for k = 1, ..., K]
$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^N s_{k,n}$
• This is the generative model for NMF when we write
$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (template × excitation)
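Conditioned on the observed sum, independent Poisson sources are multinomially distributed, so E[s_n | x] = x v_n / Σ_n' v_n', the same proportional split as in the Gaussian case. This is the E-step that links the model to KL-NMF. The check below compares the formula with brute-force conditioning by rejection; the function names are hypothetical.

```python
import numpy as np

def poisson_split_mean(x, v):
    """E[s_n | x] when s_n ~ Poisson(v_n) independently and x = sum_n s_n."""
    v = np.asarray(v, dtype=float)
    return x * v / v.sum()

def sample_split(x, v, rng):
    """Draw s | x ~ Multinomial(x, v / sum(v))."""
    v = np.asarray(v, dtype=float)
    return rng.multinomial(x, v / v.sum())
```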
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: example densities p(x) for x from 0 to 5, with (a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1) in one panel and (1, 1), (1, 0.5), (2, 1) in the other]
$\mathcal{G}(x; a, z) \equiv \exp\left((a-1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\right)$
$\mathcal{IG}(x; a, z) \equiv \exp\left((a+1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\right)$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity, and Inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
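The two definitions translate directly into log-density functions, with z acting as a scale parameter; in particular x ~ IG(x; a, z) exactly when 1/x ~ G(1/x; a, z). A minimal sketch (function names hypothetical), checked by numerical integration:

```python
import numpy as np
from math import lgamma

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a - 1) log x - x / z + a log(1/z) - log Gamma(a)."""
    x = np.asarray(x, dtype=float)
    return (a - 1) * np.log(x) - x / z + a * np.log(1.0 / z) - lgamma(a)

def log_invgamma_pdf(x, a, z):
    """log IG(x; a, z) = (a + 1) log(1/x) - 1 / (z x) + a log(1/z) - log Gamma(a)."""
    x = np.asarray(x, dtype=float)
    return (a + 1) * np.log(1.0 / x) - 1.0 / (z * x) + a * np.log(1.0 / z) - lgamma(a)
```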
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1, ..., K as follows:
$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$
$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$
[Chain: z_1, ..., v_{k-1}, z_k, v_k, z_{k+1}, ..., with links weighted by a_z, a, a_z]
• Variance variables v are priors for sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• Shape parameters a and a_z describe the coupling strength and drift of the chain
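Typical draws can be generated ancestrally, alternating z_{k+1} | v_k and v_k | z_k and using the fact that 1/x is Gamma with shape a and scale z when x ~ IG(x; a, z). The starting value v_0 = 1 and the function name are assumptions.

```python
import numpy as np

def sample_igmc(K, a=10.0, a_z=10.0, v0=1.0, seed=0):
    """Ancestral sample v_1..v_K of the inverse Gamma-Markov chain:
    z_{k+1} | v_k ~ IG(a_z, v_k / a_z) and v_k | z_k ~ IG(a, z_k / a)."""
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    prev_v = v0                                  # hypothetical starting value
    for k in range(K):
        z = 1.0 / rng.gamma(a_z, prev_v / a_z)   # E[1/z] = prev_v
        v[k] = 1.0 / rng.gamma(a, z / a)         # E[1/v] = z
        prev_v = v[k]
    return v
```

In this sketch larger a_z gives smaller log-increments (smoother chains), and unequal a and a_z introduce a drift in log v_k.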
Gamma Chains: typical draws
[Figure: $\log v_k$ for k = 1, ..., 1000; panels (a, a_z) = (10, 10), (10, 4), (10, 40)]
Gamma Chains with changepoints: typical draws
[Figure: $\log v_k$ for k = 1, ..., 1000; panels (a, a_z) = (10, 10), (10, 4), (10, 40)]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (pairwise)
[Chain: z_1, ..., v_{k-1}, z_k, v_k, z_{k+1}, ..., with links weighted by a_z, a, a_z]
$\phi_{z_k} = \exp\left((a_z + a + 1)\log z_k^{-1}\right)$ (singletons)
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$ (pairwise)
$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right)\log \xi_i^{-1}\right)$ (singletons)
Possible Model Topologies
Approximate Inference
• Stochastic
  - Markov Chain Monte Carlo: Gibbs sampler
  - Sequential Monte Carlo: particle filtering
• Deterministic
  - Variational Bayes
In all of these, conjugacy helps
VB or Gibbs
[Factor graph: ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ... in a chain, with likelihood factors p(y_1 | v_1), p(y_2 | v_2)]
• VB
$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\,q^{(\tau)}(z_{k+1})}\right)$
• Gibbs
$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\,\psi_{k,k}(z_k^{(\tau)})\,\psi_{k,k+1}(z_{k+1}^{(\tau)})$
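Because the pairwise potentials have the form exp(-a z_k^{-1} v_k^{-1}), the mean-field VB updates only need the expectations <1/v_k> and <1/z_k>, and every q factor stays inverse Gamma. The sketch below is a minimal coordinate-ascent illustration under assumptions not in the slides (observations y_k ~ N(0, v_k), a fixed v_0 = 1, the chain ending at v_K); `vb_igmc` and its defaults are hypothetical.

```python
import numpy as np

def vb_igmc(y, a=10.0, a_z=10.0, n_iter=100):
    """Mean-field VB for an IGMC prior with y_k ~ N(0, v_k). Returns E_q[v_k]."""
    y = np.asarray(y, dtype=float)
    K = y.size
    E_inv_v = 1.0 / np.maximum(y ** 2, 1e-3)   # initial <1/v_k>
    shape_v = np.full(K, a + a_z + 0.5)
    shape_v[-1] = a + 0.5                      # v_K has no child z_{K+1}
    for _ in range(n_iter):
        # q(z_k) = IG(shape a_z + a, rate a_z <1/v_{k-1}> + a <1/v_k>), with v_0 = 1
        prev = np.concatenate(([1.0], E_inv_v[:-1]))
        E_inv_z = (a_z + a) / (a_z * prev + a * E_inv_v)
        # q(v_k) = IG(shape_v, rate a <1/z_k> + a_z <1/z_{k+1}> + y_k^2 / 2)
        rate_v = a * E_inv_z + 0.5 * y ** 2
        rate_v[:-1] += a_z * E_inv_z[1:]
        E_inv_v = shape_v / rate_v
    return rate_v / (shape_v - 1.0)            # mean of an inverse Gamma
```

Compared with the Gibbs sampler, each sweep is deterministic; the price is that the factorised q ignores posterior correlations along the chain.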
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
bull Inference Variational Bayes
Noisy Original
X
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xorg
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xh SNR1998
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xv SNR2079
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xb SNR1968
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xg SNR1997
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51
Denoising ndash MusicOriginal
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Noisy
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
PF SNR853
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Gibbs SNR866
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
VB SNR208
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52
Single Channel Source Separation (with Onur Dikmen)
bull Source 1 Horizontal Tie across time harmonic continuity
bull Source 2 Vertical Tie across frequency transients percussive sounds
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53
Single Channel Source Separation with IGMCs
E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Preminus trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
bull Oracle We use the square of the source coefficient as the latent variance estimate
bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
[Figure: reconstructed spectrograms, frequency f (Hz, 0–10000) versus time t, for (Speech), (Piano) and (Guitar).]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
[Figure: traces of a, λ and r versus epoch (0–2000).]
Tempo tracking and score performance matching
• Given expressive music data (onsets/detections/spectral features):
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– …
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: lattice of states with bar position m_k = 1, …, 8 (horizontal) and tempo level n_k = 1, 2, 3 (vertical); examples for 3/4 time and 4/4 time.]
• Each dot denotes a state x = (m, n) (score position, tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three panels of the transition probabilities p(x_2 | x_1) over bar position (1–8) and tempo level (1–5).]
Bar position Pointer - k = 1
[Figure: p(x_1) over bar position and tempo level.]
Bar position Pointer - k = 2
[Figure: p(x_2) over bar position and tempo level.]
Bar position Pointer - k = 3
[Figure: p(x_3) over bar position and tempo level.]
Bar position Pointer - k = 4
[Figure: p(x_4) over bar position and tempo level.]
Bar position Pointer - k = 5
[Figure: p(x_5 | y_{1:5}) over bar position and tempo level, after observing y_5 = 0.]
Bar position Pointer - k = 10
[Figure: p(x_10) over bar position and tempo level.]
Bar position Pointer - observation model (Poisson)
• Observation model p(y_k | x_k): Poisson, with intensity μ_k
[Figure: intensity μ_k versus bar position m_k (0–1000) for a triplet rhythm and for a duplet rhythm.]
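Because the bar pointer model is an HMM over x_k = (m_k, n_k), filtering reduces to the standard forward recursion. The following is a toy sketch, not the model of Whiteley, Cemgil and Godsill: the bar is discretised into M positions, tempo level n advances the pointer by a fixed number of positions per frame, the tempo changes only to a neighbouring level with an assumed probability, and the counts y_k are Poisson with a rhythm-template intensity μ(m_k). The function name, its parameters and this transition structure are illustrative assumptions.

```python
import numpy as np

def bar_pointer_filter(y, mu, steps, p_stay=0.9):
    """Forward filtering for a toy bar-pointer HMM (illustrative sketch).

    y      : observed counts y_1..y_K (e.g. onset detections per frame)
    mu     : Poisson intensity mu(m) for each of the M bar positions
    steps  : bar positions advanced per frame at each tempo level n
    p_stay : assumed probability of keeping the current tempo level
    Returns the filtered marginals p(m_k | y_1:k) and p(n_k | y_1:k).
    """
    mu = np.asarray(mu, dtype=float)
    M, L = len(mu), len(steps)
    # Tempo transition T[n_prev, n]: stay, or move to a neighbouring level.
    T = np.zeros((L, L))
    for i in range(L):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < L]
        T[i, i] = p_stay if nbrs else 1.0
        for j in nbrs:
            T[i, j] = (1.0 - p_stay) / len(nbrs)
    alpha = np.full((M, L), 1.0 / (M * L))  # uniform initial p(m, n)
    pm, pn = [], []
    for yk in y:
        # The pointer moves (mod M) by the step of its current tempo level ...
        pred = np.stack([np.roll(alpha[:, j], steps[j]) for j in range(L)], axis=1)
        # ... and then the tempo level may change.
        pred = pred @ T
        # Poisson log-likelihood y*log(mu) - mu; log(y!) is constant over states.
        loglik = yk * np.log(mu) - mu
        alpha = pred * np.exp(loglik - loglik.max())[:, None]
        alpha /= alpha.sum()
        pm.append(alpha.sum(axis=1))
        pn.append(alpha.sum(axis=0))
    return np.array(pm), np.array(pn)
```

For example, with an onset template at every fourth position and alternating counts 1, 0, 1, 0, …, the tempo level that advances two positions per frame explains the data best, so it should dominate p(n_k | y_{1:k}).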
Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Figure: graphical model with tempo level n_k, time signature θ_k, bar position m_k, rhythm r_k, intensity λ_k and observation y_k, for k = 0, …, 3.]
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator
Filtering
[Figure: observed data y_k; filtering distributions log p(m_k | y_{1:k}) over bar position m_k, log p(n_k | y_{1:k}) over tempo (quarter notes per minute, 60–180) and p(r_k | y_{1:k}) over rhythm (triplets, duplets); frame index k = 0–450.]
Smoothing
[Figure: observed data y_k; smoothing distributions log p(m_k | y_{1:K}) over bar position m_k, log p(n_k | y_{1:K}) over tempo (quarter notes per minute, 60–180) and p(r_k | y_{1:K}) over rhythm (triplets, duplets); frame index k = 0–450.]
Time Signature
[Figure: observed data (sample value versus time, 0–12 s); smoothing distributions log p(m_k | z_{1:K}) over bar position m_k, log p(n_k | z_{1:K}) over tempo (quarter notes per minute) and p(θ_k | z_{1:K}) over time signature (4/4, 3/4); frame index k = 0–500.]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt and audio signal x_t versus time t.]
Score-Performance matching - Graphical Model
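[Figure: graphical model with score positions r_1, …, r_K, times t_1, …, t_K and intensities λ_1, …, λ_K; for each frequency ν = 1, …, W, variances v_{ν1}, …, v_{νK} and coefficients s_{ν1}, …, s_{νK}; inset: a score with note events numbered 1–8, indexed by r_k.]
$$v_{\nu\tau} \sim \mathcal{IG}\big(v_{\nu\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau))\big)$$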
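Because the inverse Gamma prior is conjugate, the variances v_ντ can be integrated out, which gives a closed-form score for every candidate score position r_τ. The sketch below assumes the frequency bins are independent given r_τ and treats λ as a known scalar; the function name and the (R, W) layout of the templates σ_ν(r) are hypothetical. In the slides' convention 1/v_ντ is Gamma with shape a and rate b = aλσ_ν(r_τ), and the Gaussian–Gamma integral yields a Student-t density.

```python
import numpy as np
from math import lgamma

def score_position_loglik(s, templates, lam=1.0, a=2.0):
    """log p(s_tau | r_tau = r) for every candidate score position r (sketch).

    s         : coefficients s_{nu,tau} of one frame, shape (W,)
    templates : sigma_nu(r) for each candidate r, shape (R, W)
    1/v ~ Gamma(shape a, rate b) with b = a*lam*sigma_nu(r); integrating v out
    gives p(s) = Gamma(a+1/2) / (Gamma(a) sqrt(2 pi b)) * (1 + s^2/(2b))^-(a+1/2).
    Frequency bins are treated as independent given r.
    """
    b = a * lam * np.asarray(templates, dtype=float)
    log_norm = lgamma(a + 0.5) - lgamma(a) - 0.5 * np.log(2.0 * np.pi * b)
    logp = log_norm - (a + 0.5) * np.log1p(s[None, :] ** 2 / (2.0 * b))
    return logp.sum(axis=1)  # sum over frequency bins nu
```

In a score follower, these per-frame scores would play the role of the observation model inside the forward (filtering) or forward-backward (smoothing) recursion over r_τ.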
Score-Performance matching - Signal model
[Figure: log σ_ν versus frequency ν (Hz), 0–4000 Hz.]
Score-Performance matching
[Figure: spectrogram data, frequency (Hz, 0–4000) versus time (s, 0–14); MIDI data, MIDI note (55–85) versus score position (50–450).]
Online (filtering) or offline (smoothing) processing is possible
Transcription
[Figure: log p(r_τ | s_τ) over MIDI note number (60–80) versus time (s, 1–4); log Σ_i w^{(i)}_τ λ^{(i)}_τ versus time (s); MDCT of audio (source: Daniel-Ben Pienaar), frequency (Hz, 0–4000) versus time (s, 1–4).]
Summary
• "Time domain": switching state-space models
– State-space modelling
– Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• "Transform domain": Gamma fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT, …)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical-score-guided source separation
– Prior structures for other observation models: NMF
Probabilistic Phase Vocoder (Cemgil and Godsill 2005)
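[Figure: graphical model; for each band ν = 0, …, W − 1, parameters A_ν, Q_ν and a state chain s^ν_0, …, s^ν_k, …, s^ν_{K−1}; observations x_0, …, x_k, …, x_{K−1}.]
$$s^\nu_k \sim \mathcal{N}(s^\nu_k; A_\nu s^\nu_{k-1}, Q_\nu), \qquad A_\nu \sim \mathcal{N}\left(A_\nu; \begin{pmatrix} \cos\omega_\nu & -\sin\omega_\nu \\ \sin\omega_\nu & \cos\omega_\nu \end{pmatrix}, \Psi\right)$$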
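To see why a rotation matrix is a natural prior mean for A_ν, the sketch below simulates one band with A_ν fixed at that mean and isotropic noise Q_ν = qI. Both are simplifying assumptions (the model itself draws A_ν around the rotation with covariance Ψ), and reading out the first state component as the band's signal is also an assumption made here for illustration. With q = 0 the output is an exact cosine at frequency ω_ν; a small q lets amplitude and phase drift slowly, which is the intuition behind a probabilistic phase vocoder.

```python
import numpy as np

def rotation(omega):
    """2x2 rotation by omega: the prior mean of A_nu on the slide."""
    return np.array([[np.cos(omega), -np.sin(omega)],
                     [np.sin(omega), np.cos(omega)]])

def simulate_band(omega, K, q=1e-4, seed=0):
    """Simulate s_k = A s_{k-1} + e_k, e_k ~ N(0, q I), for one band (sketch).

    A is fixed at rotation(omega) instead of being drawn around it with
    covariance Psi, and the band output is taken to be the first state
    component; both are simplifying assumptions.
    """
    rng = np.random.default_rng(seed)
    A = rotation(omega)
    s = np.array([1.0, 0.0])  # assumed initial state s_0
    out = np.empty(K)
    for k in range(K):
        s = A @ s + rng.normal(0.0, np.sqrt(q), size=2)
        out[k] = s[0]
    return out
```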
Inference: Structured Variational Bayes
[Figure: structured variational approximation with factors q(A_α), q(Q_α) and ∏_k q(s^α_k | s^α_{k−1}) for each α ∈ C, and q(x_k) for the signal x_k.]
• Intuitive algorithm
– Subtract from the observed signal x the prediction of the frequency bands not in α
– Compute a fit for α to this residual, and iterate
• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
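The residual-fitting loop can be illustrated on a static analogue of the problem in which the dynamics are dropped: x ≈ Σ_α C_α s_α with fixed basis blocks C_α, and a ridge penalty standing in for a Gaussian prior on s_α. Cycling over the blocks and refitting each one to the residual is block Gauss-Seidel on the normal equations (CᵀC + λI)s = Cᵀx, so it converges to the joint solution. The function name and the ridge parameter are illustrative assumptions, not the slides' algorithm.

```python
import numpy as np

def backfit(x, blocks, lam=1e-2, n_iter=200):
    """Fit x ~ sum_alpha C_alpha s_alpha one block at a time (sketch).

    For each block: subtract the other blocks' predictions from x, then
    refit that block to the residual by ridge regression. This is block
    Gauss-Seidel on (C^T C + lam I) s = C^T x with C = [C_1 ... C_A].
    """
    s = [np.zeros(C.shape[1]) for C in blocks]
    for _ in range(n_iter):
        for i, Ci in enumerate(blocks):
            others = sum(Cj @ sj for j, (Cj, sj) in enumerate(zip(blocks, s)) if j != i)
            resid = x - others
            G = Ci.T @ Ci + lam * np.eye(Ci.shape[1])
            s[i] = np.linalg.solve(G, Ci.T @ resid)
    return s
```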
Restoration
• Piano
– Signal with missing samples (37%)
– Reconstruction: 7.68 dB improvement
– Original
• Trumpet
– Signal with missing samples (37%)
– Reconstruction: 7.10 dB improvement
– Original
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
[Figure: for each ν = 1, …, W, an indicator chain r^ν_0, …, r^ν_k, …, r^ν_K and a latent state chain θ^ν_0, …, θ^ν_k, …, θ^ν_K; observations y_k, …, y_K.]
• Generalises source-filter models
Harmonic model with changepoints
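$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$
$$\theta_k \mid \theta_{k-1}, r_k \sim [r_k = 0]\,\underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\,\underbrace{\mathcal{N}(0, S)}_{\text{new}}$$
$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$
$$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^{N}, \qquad G_\omega = \rho_k \begin{pmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{pmatrix}$$
with damping factor 0 < ρ_k < 1, frame length N and damped sinusoidal basis matrix C of size N × 2H.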
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
– Conditional Gaussians are not closed under marginalization
⇒ Unlike for HMMs or KFMs, summing over r_k does not simplify the filtering density
⇒ The number of Gaussian kernels needed to represent the exact filtering density p(r_k, θ_k | y_{1:k}) increases exponentially
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time/space
– Intuition: when a changepoint occurs, the state vector θ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; Ó Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)
[Figure: example with r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0; states θ_0, …, θ_5 and observations y_1, …, y_5.]
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of the conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
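A minimal sketch of why changepoints keep exact filtering tractable, for a hypothetical scalar model that simplifies the slides' setting: r_k is i.i.d. Bernoulli(h) rather than Markov, θ_k is held constant between changepoints (A = 1, Q = 0) and redrawn from N(0, S) at a changepoint, and y_k ~ N(θ_k, R). Each Gaussian kernel is labelled by the time of the most recent changepoint, so after k steps the filter carries k + 1 kernels instead of 2^k. The function name and defaults are illustrative.

```python
import numpy as np

def _logsumexp(a):
    mx = a.max()
    return mx + np.log(np.exp(a - mx).sum())

def changepoint_filter(y, h=0.05, S=10.0, R=0.1):
    """Exact filtering for a toy scalar changepoint model (sketch).

    Kernel j carries N(m[j], P[j]) over theta_k and log-weight logw[j];
    kernel 0 starts from theta_0 ~ N(0, S), kernel j > 0 from a changepoint
    at step j (1-based). Returns the kernel weights after every step.
    """
    m, P, logw = np.array([0.0]), np.array([S]), np.array([0.0])
    history = []
    for yk in y:
        # Predict: each kernel survives w.p. 1 - h; a new N(0, S) kernel starts w.p. h.
        logw = np.append(logw + np.log1p(-h), np.log(h) + _logsumexp(logw))
        m, P = np.append(m, 0.0), np.append(P, S)
        # Update: weight by the predictive density N(y_k; m, P + R), then a Kalman step.
        var = P + R
        logw = logw - 0.5 * ((yk - m) ** 2 / var + np.log(2 * np.pi * var))
        gain = P / var
        m, P = m + gain * (yk - m), P * (1.0 - gain)
        logw -= _logsumexp(logw)
        history.append(np.exp(logw))
    return history
```

For a level shift after 20 samples, the kernel started at step 21 should carry almost all of the final weight.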
Monophonic model (Cemgil et al. 2006)
• We introduce a pitch label indicator m
• At each time k, the process can be in one of the {"mute", "sound"} × M states
[Figure: graphical model with indicator chain r_0, …, r_T, pitch labels m_0, …, m_T, states s_0, …, s_T and observations y_1, …, y_T.]
Monophonic Pitch Tracking
Monophonic pitch tracking = online estimation (filtering) of p(r_k, m_k | y_{1:k})
[Figure: observed signal and filtered pitch estimates, k = 100–1000.]
• If the pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TASLP)
[Figure: exact inference result (S), k = 500–3500.]
Tracking Pitch Variations
• Allow m to change with k
• Intractable: need to resort to approximate inference (mixture Kalman filter / Rao-Blackwellized particle filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: latent processes over frequency ν and time k, with the observed signal x_k.]
• Each latent changepoint process ν = 1, …, W corresponds to a "piano key". The indicators r_{1:W,1:K} encode a latent "piano roll" (S1) (S2) (S3)
Single time slice - Bayesian Variable Selection
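$$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$$
$$s_i \mid r_i \sim [r_i = \text{on}]\,\mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\,\delta(s_i)$$
$$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R), \qquad C \equiv [\, C_1 \ \cdots \ C_i \ \cdots \ C_W \,]$$
[Figure: graphical model with indicators r_1, …, r_W, coefficients s_1, …, s_W and observation x.]
• Generalized linear model: the columns of C are the basis vectors
• The exact posterior is a mixture of 2^W Gaussians
• When W is large, computing posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman; Attias; …)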
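For small W the posterior over the indicators can be computed exactly by enumerating all 2^W on/off patterns, which makes the exponential cost concrete. The sketch assumes one scalar coefficient per column (Σ = σ²), isotropic noise R = ρI and a common prior π_on; integrating out the active coefficients gives x | r ~ N(0, σ² C_r C_rᵀ + ρI), where C_r keeps the active columns. Names and defaults are illustrative.

```python
import itertools
import numpy as np

def log_gauss(x, cov):
    """log N(x; 0, cov) using slogdet and a linear solve."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (x @ np.linalg.solve(cov, x) + logdet + len(x) * np.log(2 * np.pi))

def selection_posterior(x, C, pi_on=0.1, sig2=1.0, rho=0.01):
    """Exact p(r_1:W | x) by enumerating all 2**W on/off patterns (sketch)."""
    W = C.shape[1]
    configs, logp = [], []
    for r in itertools.product([0, 1], repeat=W):
        on = np.array(r, dtype=bool)
        Cr = C[:, on]                                   # active basis vectors
        cov = sig2 * Cr @ Cr.T + rho * np.eye(len(x))   # s_on integrated out
        n_on = int(on.sum())
        log_prior = n_on * np.log(pi_on) + (W - n_on) * np.log1p(-pi_on)
        configs.append(r)
        logp.append(log_prior + log_gauss(x, cov))
    logp = np.array(logp)
    post = np.exp(logp - logp.max())
    return configs, post / post.sum()
```

Doubling W squares the number of terms, which is why approximate inference is needed for realistic numbers of basis vectors.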
Factorial Switching State space model
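$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$$
$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$$
$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}\big(r_{k,\nu}; \pi_\nu(r_{k-1,\nu})\big) \qquad \text{(changepoint indicator)}$$
$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}\big(\theta_{k,\nu}; A_\nu(r_{k,\nu})\,\theta_{k-1,\nu}, Q_\nu(r_{k,\nu})\big) \qquad \text{(latent state)}$$
$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R) \qquad \text{(observation)}$$
[Figure: for each ν = 1, …, W, an indicator chain r^ν_0, …, r^ν_K and a state chain s^ν_0, …, s^ν_K; observations y_k, …, y_K.]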
Synthetic Data
[Figure: synthetic data x and latent indicators over frequency index ν and time k (S).]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable: computations with large matrices
– Need advanced techniques from linear algebra
– Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical – acoustical
• Time domain: state-space dynamical models
• Transform domain: Fourier representations, generalised linear model
• Feature-based
Spectrogram
• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (frequency, time), such as the STFT or MDCT:
$$x(t) = \sum_k s_k \phi_k(t)$$
[Figure: spectrograms, frequency f (Hz, 0–10000) versus time t, of (Speech) and (Piano).]
• The spectrogram displays log |s_k| or |s_k|^2 (of the STFT)
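As a concrete instance of the basis expansion, STFT coefficients s_k can be computed with windowed FFTs; the sketch below returns |s_k|^2 on a (frequency ν, frame τ) grid. The Hann window, window length and hop size are arbitrary illustrative choices, and this plain STFT is not an orthogonal transform like the MDCT.

```python
import numpy as np

def spectrogram(x, win_len=256, hop=128):
    """|s_k|^2 of a Hann-windowed STFT, with k = (nu, tau) (sketch).

    Frame tau covers x[tau*hop : tau*hop + win_len]; rows of the result are
    frequency bins nu, columns are frames tau.
    """
    w = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[t * hop: t * hop + win_len] * w for t in range(n_frames)])
    S = np.fft.rfft(frames, axis=1)   # complex coefficients s_{nu,tau}
    return (np.abs(S) ** 2).T
```

For a 1000 Hz sine sampled at 8000 Hz with a 256-sample window, the bin spacing is 8000/256 = 31.25 Hz, so the energy peaks in bin 32 of every frame.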
Models for time-frequency Energy distributions
• Non-negative matrix factorisation (Sha, Saul and Lee 2002; Smaragdis and Brown 2003; Virtanen 2003; Abdallah and Plumbley 2004; …)
$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$
Spectrogram = Spectral templates × Excitations
– however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic and Ellis 2005; …)
$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$
Spectrogram = Mask × Source0 + (1 − Mask) × Source1
– however, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)
• We place a suitable prior on the variance of the transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms
$$p(s \mid v)\, p(v) = \left( \prod_k p(s_k \mid v_k) \right) p(v)$$
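One-channel source separation: Gaussian source model
[Figure: graphical model with variances v_{k1}, …, v_{kN}, sources s_{k1}, …, s_{kN} and observation x_k, k = 1, …, K.]
$$s_{kn} \mid v_{kn} \sim \mathcal{N}(s_{kn}; 0, v_{kn}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{kn}$$
• A straightforward application of Bayes' theorem yields
$$p(s_{kn} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{kn}; \kappa_{kn} x_k, v_{kn}(1 - \kappa_{kn})\big), \qquad \kappa_{kn} = \frac{v_{kn}}{\sum_{n'} v_{kn'}} \quad \text{(responsibilities)}$$
• Each source coefficient s_n gets a fraction κ_n of the observation x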
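The posterior above is the familiar Wiener filter. A minimal NumPy version for one channel, where x has shape (K,) and v holds the prior variances v_kn with shape (K, N); the function name is hypothetical:

```python
import numpy as np

def gaussian_source_posterior(x, v):
    """Posterior means and variances of s_kn given x_k = sum_n s_kn (sketch).

    x : observations x_k, shape (K,)
    v : prior variances v_kn, shape (K, N)
    """
    kappa = v / v.sum(axis=1, keepdims=True)  # responsibilities kappa_kn
    mean = kappa * x[:, None]                 # kappa_kn * x_k
    var = v * (1.0 - kappa)                   # v_kn * (1 - kappa_kn)
    return mean, var
```

Because the κ_kn sum to one over n, the posterior means add back up to x_k exactly.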
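One-channel source separation: Poisson source model
[Figure: graphical model with intensities v_{k1}, …, v_{kN}, sources s_{k1}, …, s_{kN} and observation x_k, k = 1, …, K.]
$$s_{kn} \mid v_{kn} \sim \mathcal{PO}(s_{kn}; v_{kn}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{kn}$$
• This is the generative model for NMF when we write $v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (Template × Excitation)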
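For this Poisson model, EM over the templates and excitations leads to the well-known multiplicative updates for NMF under the generalised Kullback-Leibler divergence (Lee and Seung): the E-step splits each count x_k among the sources in proportion to v_kn, just as the responsibilities do in the Gaussian case. The sketch below alternates those updates; the initialisation, iteration count and the small eps guard are arbitrary choices.

```python
import numpy as np

def kl_nmf(X, n_comp, n_iter=200, seed=0, eps=1e-12):
    """Multiplicative KL-NMF updates for X ~ Poisson(T @ E) (sketch).

    T[nu, n] plays the role of the templates t_{nu,n} and E[n, tau] the
    excitations e_{tau,n}. Each update is an EM step for the Poisson model:
    the ratio X / (T @ E) redistributes each count over the components.
    """
    rng = np.random.default_rng(seed)
    F, N = X.shape
    T = rng.random((F, n_comp)) + 0.1
    E = rng.random((n_comp, N)) + 0.1
    for _ in range(n_iter):
        T *= ((X / (T @ E + eps)) @ E.T) / (E.sum(axis=1)[None, :] + eps)
        E *= (T.T @ (X / (T @ E + eps))) / (T.sum(axis=0)[:, None] + eps)
    return T, E
```

Each update does not increase the divergence, so running longer from the same initialisation should not make the fit worse.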
Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)
[Figure: densities p(x) for 0 ≤ x ≤ 5. Left: a = 0.9, 1, 1.3, 2 with b = 1. Right: (a, b) = (1, 1), (1, 0.5), (2, 1).]
$$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1)\log x - z^{-1}x + a \log z^{-1} - \log\Gamma(a)\big)$$
$$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1}x^{-1} + a \log z^{-1} - \log\Gamma(a)\big)$$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
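The two definitions translate directly into code. The sketch below implements both log-densities exactly as written, with shape a and scale z; note that x ~ IG(a, z) is equivalent to 1/x ~ G(a, z). The function names are illustrative.

```python
import numpy as np
from math import lgamma

def log_gamma_pdf(x, a, z):
    """log G(x; a, z): shape a, scale z, as defined on the slide."""
    return (a - 1) * np.log(x) - x / z + a * np.log(1.0 / z) - lgamma(a)

def log_invgamma_pdf(x, a, z):
    """log IG(x; a, z); x ~ IG(a, z) exactly when 1/x ~ G(a, z)."""
    return (a + 1) * np.log(1.0 / x) - 1.0 / (z * x) + a * np.log(1.0 / z) - lgamma(a)
```

A quick numerical check: both densities integrate to one on a fine grid, and the Gamma mean comes out as a·z.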
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1, …, K as follows:
$$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$$
$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$$
[Figure: chain z_1 ··· v_{k−1} – z_k – v_k – z_{k+1} ···, with couplings a_z (v_{k−1}, z_k), a (z_k, v_k) and a_z (v_k, z_{k+1}).]
• Variance variables v are priors for the sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• Shape parameters a and a_z describe the coupling strength and drift of the chain
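With the convention x ~ IG(a, z) ⇔ 1/x ~ G(a, z), each step of the chain is a single Gamma draw, so typical draws such as those for (a, a_z) = (10, 10), (10, 4) and (10, 40) can be generated as sketched below. The starting value z_1 and the function name are assumptions.

```python
import numpy as np

def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Draw v_1..v_K from the inverse Gamma-Markov chain (sketch).

    v_k | z_k ~ IG(a, z_k / a) and z_{k+1} | v_k ~ IG(a_z, v_k / a_z), where
    x ~ IG(shape, scale) is sampled as 1 / Gamma(shape, scale).
    The starting value z1 is an assumption.
    """
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = z1
    for k in range(K):
        v[k] = 1.0 / rng.gamma(a, z / a)
        z = 1.0 / rng.gamma(a_z, v[k] / a_z)
    return v
```

Here E[1/v_k | z_k] = z_k and E[1/z_{k+1} | v_k] = v_k, so v_{k+1} tends to stay close to v_k; larger a and a_z make log v_k drift more slowly.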
Gamma Chains: typical draws
[Figure: typical draws of log v_k, k = 1–1000, for a = 10 with a_z = 10, 4 and 40.]
Gamma Chains with changepoints: typical draws
[Figure: typical draws of log v_k, k = 1–1000, for a = 10 with a_z = 10, 4 and 40.]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1}) \quad \text{(pairwise)}$$
$$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \quad \text{(singletons)}$$
[Figure: chain z_1 ··· v_{k−1} – z_k – v_k – z_{k+1} ··· with couplings a_z, a, a_z.]
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}) \quad \text{(pairwise)}$$
$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \quad \text{(singletons)}$$
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov chain Monte Carlo: Gibbs sampler
– Sequential Monte Carlo: particle filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps.
VB or Gibbs
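[Figure: factor graph z_1 – ψ_{1,1} – v_1 – ψ_{1,2} – z_2 – ψ_{2,2} – v_2 ···, with factor ψ_{0,1} attached to z_1 and likelihood factors p(y_1 | v_1), p(y_2 | v_2).]
• VB
$$q^{(\tau)}(v_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\big)$$
• Gibbs
$$v^{(\tau)}_k \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z^{(\tau)}_k)\, \psi_{k,k+1}(z^{(\tau)}_{k+1})$$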
VB or Gibbs
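[Figure: factor graph z_1 – ψ_{1,1} – v_1 – ψ_{1,2} – z_2 – ψ_{2,2} – v_2 ···, with factor ψ_{0,1} attached to z_1 and likelihood factors p(y_1 | v_1), p(y_2 | v_2).]
• VB
$$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k-1} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\big)$$
• Gibbs
$$z^{(\tau)}_k \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k,k-1}(v^{(\tau)}_{k-1})\, \psi_{k,k}(v^{(\tau)}_k)$$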
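Combining the pairwise and singleton potentials with a Gaussian likelihood y_k ~ N(0, v_k) gives Gamma full conditionals for the precisions 1/v_k and 1/z_k. Since the v's are conditionally independent given the z's and vice versa, each block can be drawn in one vectorised step. The sketch below assumes this observation model and closes the chain by fixing v_0 = 1 (there is no z_{K+1}); both choices, the function name and the defaults are illustrative assumptions.

```python
import numpy as np

def gibbs_igmc(y, a=10.0, a_z=10.0, n_sweeps=1000, burn_in=300, seed=0):
    """Blocked Gibbs sampler for y_k ~ N(0, v_k) under the IGMC prior (sketch).

    Full conditionals, written for the precisions (Gamma with shape, rate):
      1/z_k | v_{k-1}, v_k      ~ Gamma(a + a_z,       a_z / v_{k-1} + a / v_k)
      1/v_k | z_k, z_{k+1}, y_k ~ Gamma(a + a_z + 1/2, a / z_k + a_z / z_{k+1} + y_k**2 / 2)
    Boundary (assumed): v_0 = 1, and v_K has no z_{K+1} term.
    Returns the Monte Carlo estimate of E[v_k | y] from the retained sweeps.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    K = len(y)
    v = np.ones(K)
    shape_v = np.full(K, a + a_z + 0.5)
    shape_v[-1] = a + 0.5
    total = np.zeros(K)
    for sweep in range(n_sweeps):
        v_prev = np.r_[1.0, v[:-1]]
        # NumPy's gamma takes a scale, i.e. 1 / rate.
        z = 1.0 / rng.gamma(a + a_z, 1.0 / (a_z / v_prev + a / v))
        coupling_next = np.r_[a_z / z[1:], 0.0]
        v = 1.0 / rng.gamma(shape_v, 1.0 / (a / z + coupling_next + 0.5 * y ** 2))
        if sweep >= burn_in:
            total += v
    return total / (n_sweeps - burn_in)
```

Replacing each Gamma draw by an expectation under the current q factors would give the corresponding VB updates.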
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: variational Bayes
[Figure: log-spectrograms (colour scale −14 to 0) of the noisy (X) and original (X_org) signals and of the estimates X_h (SNR 19.98), X_v (SNR 20.79), X_b (SNR 19.68) and X_g (SNR 19.97).]
Denoising – Music
[Figure: log-spectrograms (colour scale −18 to 0) of the Original and Noisy signals and of the reconstructions by PF (SNR 8.53), Gibbs (SNR 8.66) and VB (SNR 2.08).]
"Tristram (Matt Uelmen)" + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal; tie across time (harmonic continuity)
• Source 2: vertical; tie across frequency (transients, percussive sounds)
[Figure: X_org, S_hor and S_ver, frequency bin ν versus time τ.]
Single Channel Source Separation with IGMCs
E-guitar "Matte Kudasai (King Crimson)" + drums "Territory (Sepultura)" = Mix
SDR, SIR and SAR in dB:
              s1                      s2
              SDR     SIR     SAR     SDR     SIR     SAR
VB            -4.74   -3.28    5.67   -1.58   15.46   -1.37
Gibbs         -4.5    -2.62    4.57    1.05   12.46    1.61
GibbsEM       -4.23   -2.42    4.82    1.34   13.13    1.85
Pre-trained   -4.04   -3.15    8.13    3.56   11.44    4.64
Oracle         6.14   17.16    6.58   12.66   19.95   13.6
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources
Single Channel Source Separation with IGMCs
"Vandringar I Vilsenhet (Änglagård)" + "Moby Dick (Led Zeppelin)" = Mix
SDR, SIR and SAR in dB:
              s1                      s2
              SDR     SIR     SAR     SDR     SIR     SAR
VB            -7.8    -6.22    4.53   -2.35   18.4    -2.25
Gibbs         -8.46   -7.53    6.93   -4.04   14.59   -3.83
GibbsEM       -7.74   -6.19    4.62   -1.14   16.62   -0.97
Pre-trained   -6.4    -5.39    6.95    3.8    16.39    4.14
Oracle        12.1    32.9    12.14   21.13   33.89   21.37
Harmonic-Transient Decomposition
[Figure: X_org, S_hor and S_ver, frequency bin ν versus time τ.]
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
[Figure: smoothing results vs. frame index $k$. Panels: observed data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position $m_k$; $\log p(n_k \mid y_{1:K})$ over tempo (quarter notes per minute); $p(r_k \mid y_{1:K})$ for Triplets vs. Duplets]
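Filtering and smoothing on the bar-pointer grid are the standard forward and forward-backward recursions of a discrete HMM. The sketch below is a minimal illustration under assumed dynamics: the bar position advances by the tempo level (n = j + 1 positions per frame, modulo M), the tempo level follows a hypothetical random walk `tempo_matrix` with switching probability `p_switch`, and each count $y_k$ is Poisson with a per-position intensity `mu[m]`. All names and parameter values are illustrative, not the settings used by Whiteley, Cemgil and Godsill (2006).

```python
import numpy as np
from math import lgamma


def tempo_matrix(N, p_switch):
    """Hypothetical tempo random walk over levels j = 0..N-1 (tempo n = j + 1)."""
    P = np.zeros((N, N))
    for j in range(N):
        nbrs = [i for i in (j - 1, j + 1) if 0 <= i < N]
        P[j, j] = 1.0 - p_switch if nbrs else 1.0
        for i in nbrs:
            P[j, i] = p_switch / len(nbrs)
    return P


def poisson_lik(y, mu):
    """p(y | m) for every bar position m, assuming y ~ Poisson(mu[m])."""
    return np.exp(y * np.log(mu) - mu - lgamma(y + 1))


def predict(alpha, P):
    """Advance each bar position by its tempo (j + 1 positions), then move the tempo level."""
    N = alpha.shape[1]
    shifted = np.stack([np.roll(alpha[:, j], j + 1) for j in range(N)], axis=1)
    return shifted @ P


def forward_backward(ys, mu, P):
    """Filtering p(x_k | y_1:k) and smoothing p(x_k | y_1:K), each of shape (K, M, N)."""
    M, N = len(mu), P.shape[0]
    alpha = np.full((M, N), 1.0 / (M * N))           # uniform prior over (m, n)
    filt = []
    for k, y in enumerate(ys):
        if k > 0:
            alpha = predict(alpha, P)
        alpha = alpha * poisson_lik(y, mu)[:, None]
        alpha = alpha / alpha.sum()
        filt.append(alpha)
    beta = np.ones((M, N))
    smooth = [None] * len(ys)
    smooth[-1] = filt[-1]
    for k in range(len(ys) - 1, 0, -1):
        g = poisson_lik(ys[k], mu)[:, None] * beta   # p(y_k | x_k) * beta_k(x_k)
        h = g @ P.T                                  # sum over the next tempo level
        # undo the position shift: beta_{k-1}(m, j) = h(m + j + 1, j)
        beta = np.stack([np.roll(h[:, j], -(j + 1)) for j in range(N)], axis=1)
        beta = beta / beta.sum()
        post = filt[k - 1] * beta
        smooth[k - 1] = post / post.sum()
    return np.array(filt), np.array(smooth)


def simulate(K, mu, P, rng):
    """Draw a state path and Poisson counts from the same hypothetical model."""
    M, N = len(mu), P.shape[0]
    m, j = int(rng.integers(M)), int(rng.integers(N))
    states, ys = [], []
    for k in range(K):
        if k > 0:
            m = (m + j + 1) % M                      # position moves by the previous tempo
            j = int(rng.choice(N, p=P[j]))
        states.append((m, j))
        ys.append(int(rng.poisson(mu[m])))
    return states, np.array(ys)
```

Marginals such as $p(m_k \mid y_{1:k})$ follow by summing the filtering array over its tempo axis. Because the transition only shifts positions and mixes tempo levels, each prediction step costs $O(MN^2)$ rather than $O((MN)^2)$.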
Time Signature
[Figure: time-signature analysis vs. frame index $k$. Panels: observed data (sample value vs. time in s); $\log p(m_k \mid z_{1:K})$ over bar position $m_k$; $\log p(n_k \mid z_{1:K})$ over tempo (quarter notes per minute); $p(\theta_k \mid z_{1:K})$ for 4/4 vs. 3/4]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt aligned with the audio signal $x_t$ over time $t$]
Score-Performance matching - Graphical Model
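[Figure: graphical model with nodes $t_{1:K}$, $r_{1:K}$, $\lambda_{1:K}$ and, in a plate over $\nu = 1, \dots, W$, variances $v_{\nu,1:K}$ and coefficients $s_{\nu,1:K}$]
$$v_{\nu\tau} \sim \mathcal{IG}\big(v_{\nu\tau};\ a,\ 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$$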
Score-Performance matching - Signal model
[Figure: spectral template $\log \sigma_\nu$ vs. frequency $\nu$ (Hz), 0–4000 Hz]
Score-Performance matching
[Figure: Spectrogram Data (frequency in Hz, 0–4000, vs. time in s) and MIDI Data (MIDI note, 55–85, vs. score position)]
Online (filtering) or Offline (smoothing) processing is possible
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ by MIDI note number (60–80) vs. time (s); $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ on a $\log \lambda$ scale vs. time (s); MDCT of audio (source: Daniel-Ben Pienaar), frequency (Hz, 0–4000) vs. time (s)]
Summary
• “Time Domain”: Switching State Space Models
– State space modeling
– Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform Domain”: Gamma Fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical-score-guided source separation
– Prior structures for other observation models (NMF)
Inference: Structured Variational Bayes
[Figure: structured factorisation with, for each $\alpha \in C$, factors $q(A_\alpha)$, $q(Q_\alpha)$ and a chain $\prod_k q(s^\alpha_k \mid s^\alpha_{k-1})$ over $\dots, s^\alpha_{k-1}, s^\alpha_k, s^\alpha_{k+1}, \dots$; observations $x_k$ with $q(x_k)$]
• Intuitive algorithm
– Subtract from the observed signal $x$ the prediction of the frequency bands in $\neg\alpha$
– Compute a fit for $\alpha$ to this residual and iterate
• For fixed $A$, $Q$ this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
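The Gauss-Seidel side of this analogy can be made concrete. The sketch below is generic block Gauss-Seidel for a symmetric positive definite system $Ax = b$, not the VB code itself: each index block (a hypothetical stand-in for one group of frequency bands $\alpha$) is refitted to the residual left after subtracting the current prediction of all other blocks, mirroring the subtract-then-fit loop described above.

```python
import numpy as np


def block_gauss_seidel(A, b, blocks, n_sweeps=200):
    """Solve A x = b (A symmetric positive definite) by block Gauss-Seidel sweeps.

    `blocks` is a list of index arrays; in the slide's analogy each block plays
    the role of one set of frequency bands alpha.
    """
    x = np.zeros(len(b))
    for _ in range(n_sweeps):
        for idx in blocks:
            A_aa = A[np.ix_(idx, idx)]
            # residual of block alpha after subtracting the prediction of all other blocks
            r = b[idx] - A[idx, :] @ x + A_aa @ x[idx]
            # fit block alpha to that residual while the other blocks stay fixed
            x[idx] = np.linalg.solve(A_aa, r)
    return x
```

For symmetric positive definite $A$ these sweeps converge to the exact solution from any starting point, which is the property the structured VB iteration inherits when $A$ and $Q$ are held fixed.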
Restoration
• Piano
– Signal with missing samples (37)
– Reconstruction, 7.68 dB improvement
– Original
• Trumpet
– Signal with missing samples (37)
– Reconstruction, 7.10 dB improvement
– Original
Hierarchical Factorial Models
• Each component models a latent process
• The observations are projections
[Figure: for each $\nu = 1, \dots, W$, a chain of indicators $r_{\nu,0} \cdots r_{\nu,k} \cdots r_{\nu,K}$ and states $\theta_{\nu,0} \cdots \theta_{\nu,k} \cdots \theta_{\nu,K}$; observations $y_k \cdots y_K$]
• Generalises source-filter models
Harmonic model with changepoints
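$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$
$$\theta_k \mid \theta_{k-1}, r_k \sim [r_k = 0]\,\underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\,\underbrace{\mathcal{N}(0, S)}_{\text{new}}$$
$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$
$$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^{N}, \qquad G_\omega = \rho_k \begin{pmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{pmatrix}$$
with damping factor $0 < \rho_k < 1$, frame length $N$, and damped sinusoidal basis matrix $C$ of size $N \times 2H$.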
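The block structure of $A$ can be built directly. A minimal NumPy sketch, reading block $h$ as the $h$-th power $G_\omega^h = \rho_k^h R(h\omega)$ as the notation above suggests, and assuming that row $n$ of $C$ reads the first coordinate of each oscillator after $n$ samples. The slide only states that $C$ is a damped sinusoidal basis of size $N \times 2H$, so this particular $C$ is our illustrative choice; with it, $C A \theta$ is exactly the continuation of the frame $C\theta$ into the next $N$ samples.

```python
import numpy as np


def rotation(w):
    """2 x 2 rotation matrix R(w)."""
    return np.array([[np.cos(w), -np.sin(w)],
                     [np.sin(w), np.cos(w)]])


def harmonic_model(omega, rho, H, N):
    """Return (A, C) for H damped harmonics of fundamental omega and frame length N.

    Block h of the one-sample dynamics D is G_omega**h = rho**h * R(h * omega);
    A = D**N advances the state by one frame of N samples.
    Assumed basis: row n of C is the first coordinate of each oscillator after
    n samples, i.e. rho**(h n) * [cos(h omega n), -sin(h omega n)] for h = 1..H.
    """
    G = rho * rotation(omega)
    D = np.zeros((2 * H, 2 * H))
    for harm in range(1, H + 1):
        i = 2 * (harm - 1)
        D[i:i + 2, i:i + 2] = np.linalg.matrix_power(G, harm)
    A = np.linalg.matrix_power(D, N)
    n = np.arange(N)[:, None]
    h = np.arange(1, H + 1)[None, :]
    damp = rho ** (h * n)
    C = np.empty((N, 2 * H))
    C[:, 0::2] = damp * np.cos(h * omega * n)
    C[:, 1::2] = -damp * np.sin(h * omega * n)
    return A, C
```

Under this construction the eigenvalues of $A$ have modulus $\rho_k^{hN}$, so higher harmonics decay faster from frame to frame.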
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
– Conditional Gaussians are not closed under marginalization
⇒ Unlike HMMs or KFMs, summing over $r_k$ does not simplify the filtering density
⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time/space
– Intuition: when a changepoint occurs, the state vector $\theta$ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; Ó Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)
[Figure: example chain with $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$ over states $\theta_0, \dots, \theta_5$ and observations $y_1, \dots, y_5$]
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
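The polynomial growth is easiest to see in a scalar toy version. The sketch below assumes i.i.d. changepoints $r_k \sim$ Bernoulli(`pi`) instead of a Markov chain, a scalar state with $A = a$, and illustrative values for `q`, `s` and `R`. The filtering density is kept as a mixture indexed by the time of the last changepoint: each step adds one freshly reset kernel and updates every kernel with one Kalman step, so after $k$ observations there are $k + 1$ kernels rather than $2^k$.

```python
import numpy as np


def log_normal(y, mean, var):
    """log N(y; mean, var), vectorised over mean and var."""
    return -0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)


def changepoint_filter(ys, a=1.0, q=0.01, s=100.0, R=1.0, pi=0.01):
    """Exact filtering for a hypothetical scalar changepoint model:
    r_k ~ Bernoulli(pi) i.i.d.; theta_k = a theta_{k-1} + N(0, q) if r_k = 0,
    theta_k ~ N(0, s) if r_k = 1; y_k ~ N(theta_k, R).
    Returns p(r_k = 1 | y_1:k) for every k and the final number of kernels.
    """
    means, vars_, logw = np.array([0.0]), np.array([s]), np.array([0.0])
    p_change = []
    for y in ys:
        # predict: every kernel continues, and one new kernel is reset to N(0, s)
        log_total = np.logaddexp.reduce(logw)
        means = np.concatenate([[0.0], a * means])
        vars_ = np.concatenate([[s], a * a * vars_ + q])
        logw = np.concatenate([[np.log(pi) + log_total], np.log1p(-pi) + logw])
        # update: weight by the predictive likelihood, then one Kalman step per kernel
        logw = logw + log_normal(y, means, vars_ + R)
        gain = vars_ / (vars_ + R)
        means = means + gain * (y - means)
        vars_ = (1.0 - gain) * vars_
        logw = logw - np.logaddexp.reduce(logw)
        p_change.append(np.exp(logw[0]))    # weight of the kernel reset at time k
    return np.array(p_change), len(means)
```

Pruning kernels whose weights are dominated, as described for the MMAP problem, would keep the cost bounded in long signals; the sketch keeps all of them for clarity.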
Monophonic model (Cemgil et al. 2006)
• We introduce a pitch label indicator $m$
• At each time $k$, the process can be in one of the {“mute”, “sound”} × M states
[Figure: graphical model with chains $r_{0:T}$, $m_{0:T}$, $s_{0:T}$ and observations $y_{1:T}$]
Monophonic Pitch Tracking
Monophonic Pitch Tracking = Online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$
[Figure: two panels of filtering results plotted against $k$ = 100–1000]
• If the pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
[Figure: Exact inference, samples 500–3500]
Tracking Pitch Variations
• Allow $m$ to change with $k$
[Figure: pitch tracking over $k$ = 50–500]
• Intractable: need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: latent piano roll over frequency $\nu$ and time $k$, with the audio signal $x_k$]
• Each latent changepoint process $\nu = 1, \dots, W$ corresponds to a “piano key”. Indicators $r_{1:W,1:K}$ encode a latent “piano roll”
Single time slice - Bayesian Variable Selection
$$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$$
$$s_i \mid r_i \sim [r_i = \text{on}]\,\mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\,\delta(s_i)$$
$$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R), \qquad C \equiv [\,C_1 \ \cdots\ C_i \ \cdots\ C_W\,]$$
[Figure: graphical model $r_{1:W} \to s_{1:W} \to x$]
• Generalized linear model: the columns of $C$ are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When $W$ is large, computing posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman; Attias)
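For small $W$ the exact posterior can simply be enumerated. A minimal sketch, assuming $\pi_{\text{off}} = 1 - \pi_{\text{on}}$ and $\Sigma = \sigma_s^2 I$ for every block $C_i$: given a configuration $r$, $x$ is zero-mean Gaussian with covariance $\sum_{i:\,r_i = \text{on}} \sigma_s^2 C_i C_i^\top + R$, so $p(r \mid x)$ needs $2^W$ Gaussian evaluations. The function names are ours, and the loop over `itertools.product` is exactly where the cost explodes for large $W$.

```python
import itertools

import numpy as np


def log_gauss(x, cov):
    """log N(x; 0, cov)."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(cov, x))


def exact_posterior(x, C_blocks, sigma2_s, R, pi_on):
    """Enumerate all 2**W on/off configurations r and return them with p(r | x).

    Each active block C_i (shape D x d_i) contributes sigma2_s * C_i C_i^T to the
    covariance of x; the posterior over s is then a 2**W-component Gaussian mixture.
    """
    W = len(C_blocks)
    configs = list(itertools.product([0, 1], repeat=W))
    logp = []
    for r in configs:
        cov = R.copy()
        for on, Ci in zip(r, C_blocks):
            if on:
                cov = cov + sigma2_s * Ci @ Ci.T
        n_on = sum(r)
        log_prior = n_on * np.log(pi_on) + (W - n_on) * np.log1p(-pi_on)
        logp.append(log_prior + log_gauss(x, cov))
    logp = np.array(logp)
    post = np.exp(logp - logp.max())        # subtract the max for numerical stability
    return configs, post / post.sum()
```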
Factorial Switching State space model
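$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$$
$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$$
$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}\big(r_{k,\nu}; \pi_\nu(r_{k-1,\nu})\big) \qquad \text{(changepoint indicator)}$$
$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}\big(\theta_{k,\nu}; A_\nu(r_{k,\nu})\,\theta_{k-1,\nu}, Q_\nu(r_{k,\nu})\big) \qquad \text{(latent state)}$$
$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k\,\theta_{k,1:W}, R) \qquad \text{(observation)}$$
[Figure: for each $\nu = 1, \dots, W$, chains $r_{\nu,0} \cdots r_{\nu,k} \cdots r_{\nu,K}$ and $s_{\nu,0} \cdots s_{\nu,k} \cdots s_{\nu,K}$; observations $y_k \cdots y_K$]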
Synthetic Data
[Figure: synthetic data over frequency $\nu$ and time $k$, with signal $x$]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable: computations with large matrices
– Need advanced techniques from linear algebra
– Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical (acoustical)
• Time domain: state space dynamical models
• Transform domain: Fourier representations, generalised linear model
• Feature based
Spectrogram
• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau)$ = (frequency, time), such as STFT or MDCT
$$x(t) = \sum_k s_k \phi_k(t)$$
[Figure: spectrograms, $t$ (sec) vs. $f$ (Hz, 0–10000): (Speech), (Piano)]
• A spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of the STFT)
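A minimal NumPy sketch of the STFT coefficients $s_k$ and the log spectrogram, assuming a Hann window, a frame of 256 samples and a hop of 128 (both illustrative). Rows index frequency bins $\nu$ and columns index frames $\tau$, so each entry is one time-frequency atom $k = k(\nu, \tau)$.

```python
import numpy as np


def stft(x, frame=256, hop=128):
    """STFT coefficients: rows are frequency bins nu, columns are frames tau."""
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[t * hop:t * hop + frame] * window for t in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)


def log_spectrogram(x, frame=256, hop=128, eps=1e-12):
    """log |s_k|^2, the usual spectrogram display (eps avoids log 0)."""
    return np.log(np.abs(stft(x, frame, hop)) ** 2 + eps)
```

For a 1000 Hz sinusoid sampled at 8000 Hz, the bin spacing is 8000 / 256 = 31.25 Hz, so the energy concentrates in bin 1000 / 31.25 = 32.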
Models for time-frequency Energy distributions
• Non-negative matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004)
$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$
Spectrogram = Spectral Templates × Excitations
– however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
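A common way to fit $X \approx WS$ is the multiplicative-update algorithm of Lee and Seung for the generalised Kullback-Leibler divergence, which corresponds to maximum likelihood under a Poisson observation model. The sketch assumes a strictly positive $X$ (frequency × time), $J$ templates and a random uniform initialisation; it illustrates the technique and is not the method of any of the works cited above.

```python
import numpy as np


def kl_divergence(X, WS):
    """Generalised KL divergence D(X || WS) for positive arrays."""
    return np.sum(X * np.log(X / WS) - X + WS)


def nmf_kl(X, J, n_iter=200, seed=0):
    """Factorise positive X (nu x tau) as W @ S using Lee-Seung KL multiplicative updates."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.uniform(0.5, 1.5, size=(F, J))    # spectral templates
    S = rng.uniform(0.5, 1.5, size=(J, T))    # excitations
    history = []
    for _ in range(n_iter):
        S *= (W.T @ (X / (W @ S))) / W.sum(axis=0)[:, None]
        W *= ((X / (W @ S)) @ S.T) / S.sum(axis=1)[None, :]
        history.append(kl_divergence(X, W @ S))
    return W, S, np.array(history)
```

Each update keeps the factors non-negative and never increases the divergence, which is why no step size is needed.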
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005)
$$X_{\nu\tau} = [r_{\nu\tau} = 0]\,S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\,S^{(1)}_{\nu\tau}$$
Spectrogram = Mask × Source0 + (1 − Mask) × Source1
– however, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms
$$p(s \mid v)\,p(v) = \Big(\prod_k p(s_k \mid v_k)\Big)\,p(v)$$
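One channel source separation: Gaussian source model
[Figure: graphical model $v_{k,1:N} \to s_{k,1:N} \to x_k$ in a plate over $k = 1, \dots, K$]
$$s_{kn} \mid v_{kn} \sim \mathcal{N}(s_{kn}; 0, v_{kn}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{kn}$$
• Straightforward application of Bayes' theorem yields
$$p(s_{kn} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{kn};\ \kappa_{kn} x_k,\ v_{kn}(1 - \kappa_{kn})\big), \qquad \kappa_{kn} = \frac{v_{kn}}{\sum_{n'} v_{kn'}} \quad \text{(responsibilities)}$$
• Each source coefficient $s_{kn}$ gets a fraction $\kappa_{kn}$ of the observation $x_k$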
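The responsibilities give a closed-form split of each observed coefficient, the familiar Wiener-filter form. A minimal sketch (the function name `gaussian_split` is ours), vectorised so that `x` can hold one coefficient per $k$ and `v` the matching variances $v_{k,1:N}$ along its last axis:

```python
import numpy as np


def gaussian_split(x, v):
    """Posterior mean and variance of s_n given x = sum_n s_n, with s_n ~ N(0, v_n) independent."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    kappa = v / v.sum(axis=-1, keepdims=True)     # responsibilities
    return kappa * x[..., None], v * (1.0 - kappa)
```

Since the responsibilities sum to one, the posterior means always add back up to $x_k$.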
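One channel source separation: Poisson source model
[Figure: graphical model $v_{k,1:N} \to s_{k,1:N} \to x_k$ in a plate over $k = 1, \dots, K$]
$$s_{kn} \mid v_{kn} \sim \mathcal{PO}(s_{kn}; v_{kn}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{kn}$$
• This is the generative model for NMF when we write $v_{k(\nu,\tau),n} = t_{\nu n} \times e_{\tau n}$ (Template × Excitation)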
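Under the Poisson model the split given $x_k$ is also closed form: for independent $s_{kn} \sim \mathcal{PO}(v_{kn})$, the vector $s_{k,1:N}$ conditioned on its sum $x_k$ is multinomial with probabilities $\kappa_{kn} = v_{kn} / \sum_{n'} v_{kn'}$. A minimal sketch that builds $v$ from hypothetical templates `T` ($t_{\nu n}$) and excitations `E` ($e_{\tau n}$) and returns the posterior mean and one posterior sample:

```python
import numpy as np


def poisson_split(X, T, E, rng=None):
    """Split integer counts X (F x Tm) into N Poisson sources with v[f, t, n] = T[f, n] * E[t, n].

    Given x = sum_n s_n, the sources are Multinomial(x, kappa) with kappa_n = v_n / sum v.
    Returns the posterior mean and one posterior sample, both of shape (F, Tm, N).
    """
    rng = np.random.default_rng() if rng is None else rng
    V = T[:, None, :] * E[None, :, :]                  # template x excitation
    kappa = V / V.sum(axis=2, keepdims=True)
    mean = kappa * X[:, :, None]
    sample = np.zeros(V.shape, dtype=int)
    for f in range(X.shape[0]):
        for t in range(X.shape[1]):
            sample[f, t] = rng.multinomial(X[f, t], kappa[f, t])
    return mean, sample
```

The posterior mean has the same responsibility form as in the Gaussian case; only the posterior spread differs.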
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: densities $p(x)$ for $0 \le x \le 5$; one panel with $(a, b)$ = (0.9, 1), (1, 1), (1.3, 1), (2, 1) and one with $(a, b)$ = (1, 1), (1, 0.5), (2, 1)]
$$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$$
$$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity, and inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
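A small sketch of the two log densities exactly as parametrised above ($z$ acts as a scale), plus an inverse Gamma sampler. It relies on NumPy's `Generator.gamma(shape, scale)`, which uses the same shape/scale convention as $\mathcal{G}(x; a, z)$; the function names are ours.

```python
import numpy as np
from math import lgamma


def log_gamma_pdf(x, a, z):
    """log G(x; a, z): shape a, scale z (mean a * z)."""
    return (a - 1) * np.log(x) - x / z + a * np.log(1 / z) - lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z): the density of 1/y when y ~ G(y; a, z)."""
    return (a + 1) * np.log(1 / x) - 1 / (z * x) + a * np.log(1 / z) - lgamma(a)


def sample_inv_gamma(a, z, size, rng):
    """Draw from IG(a, z) by inverting Gamma(shape a, scale z) draws."""
    return 1.0 / rng.gamma(a, z, size)
```

Under this parametrisation $\mathbb{E}[x] = az$ for the Gamma and $\mathbb{E}[x] = 1/(z(a - 1))$ for the inverse Gamma (when $a > 1$).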
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:
$$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$$
$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$$
[Figure: chain $z_1 \cdots v_{k-1}\ z_k\ v_k\ z_{k+1} \cdots$, with edge parameters $a_z$, $a$, $a_z$]
• Variance variables $v$ are priors for the sources
• Auxiliary variables $z$ are needed for conjugacy and positive correlation
• Shape parameters $a$ and $a_z$ describe the coupling strength and drift of the chain
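Typical draws from this chain can be generated by alternating the two conditionals. The sketch below uses $x \sim \mathcal{IG}(a, z) \Leftrightarrow 1/x \sim \mathcal{G}(a, z)$ (shape/scale); the starting value `z1` is a hypothetical choice, since the chain's initialisation is not specified here.

```python
import numpy as np


def sample_igmc(K, a, a_z, z1=1.0, rng=None):
    """Draw v_1..v_K from the inverse Gamma-Markov chain
        v_k | z_k ~ IG(a, z_k / a),   z_{k+1} | v_k ~ IG(a_z, v_k / a_z),
    where IG(a, z) is sampled as 1 / Gamma(shape a, scale z).
    """
    rng = np.random.default_rng() if rng is None else rng
    v = np.empty(K)
    z = z1
    for k in range(K):
        v[k] = 1.0 / rng.gamma(a, z / a)          # E[1 / v_k] = z_k
        z = 1.0 / rng.gamma(a_z, v[k] / a_z)      # E[1 / z_{k+1}] = v_k
    return v
```

Under this parametrisation $\mathbb{E}[\log v_{k+1} - \log v_k] = (\psi(a_z) - \log a_z) - (\psi(a) - \log a)$, which vanishes when $a = a_z$; unequal shapes make $\log v_k$ drift, and larger shapes give smoother trajectories.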
Gamma Chains: typical draws
[Figure: three typical draws of $\log v_k$ for $k$ up to 1000, with $(a, a_z)$ = (10, 10), (10, 4), (10, 40)]
Gamma Chains with changepoints: typical draws
[Figure: three typical draws of $\log v_k$ for $k$ up to 1000, with $(a, a_z)$ = (10, 10), (10, 4), (10, 40)]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$$\psi_{k,k} = \exp\big(-a\, z_k^{-1} v_k^{-1}\big) \qquad \text{(pairwise)}$$
[Figure: chain $z_1 \cdots v_{k-1}\ z_k\ v_k\ z_{k+1} \cdots$, with edge parameters $a_z$, $a$, $a_z$]
$$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \qquad \text{(singletons)}$$
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$$\psi_{ij} = \exp\big(-a_{ij}\,\xi_i^{-1}\xi_j^{-1}\big) \qquad \text{(pairwise)}$$
$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \qquad \text{(singletons)}$$
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov Chain Monte Carlo, Gibbs sampler
– Sequential Monte Carlo, particle filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps
VB or Gibbs
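[Figure: factor graph $\psi_{0,1}$ – $z_1$ – $\psi_{1,1}$ – $v_1$ – $\psi_{1,2}$ – $z_2$ – $\psi_{2,2}$ – $v_2$ – $\cdots$, with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(v_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k,k} + \log\psi_{k,k+1} \big\rangle_{q^{(\tau)}(z_k)\,q^{(\tau)}(z_{k+1})}\Big)$$
• Gibbs
$$v^{(\tau)}_k \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\,\psi_{k,k}(z^{(\tau)}_k)\,\psi_{k,k+1}(z^{(\tau)}_{k+1})$$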
VB or Gibbs
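[Figure: factor graph $\psi_{0,1}$ – $z_1$ – $\psi_{1,1}$ – $v_1$ – $\psi_{1,2}$ – $z_2$ – $\psi_{2,2}$ – $v_2$ – $\cdots$, with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(z_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k-1,k} + \log\psi_{k,k} \big\rangle_{q^{(\tau)}(v_{k-1})\,q^{(\tau)}(v_k)}\Big)$$
• Gibbs
$$z^{(\tau)}_k \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v^{(\tau)}_{k-1})\,\psi_{k,k}(v^{(\tau)}_k)$$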
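When the observations are $y_k \sim \mathcal{N}(0, v_k)$, both Gibbs conditionals are again inverse Gamma by conjugacy. Collecting the factors that involve $v_k$ gives shape $a + a_z + 1/2$ and coefficient $a/z_k + a_z/z_{k+1} + y_k^2/2$ on $v_k^{-1}$ (the $z^{-1}$ of the slide's parametrisation); collecting those that involve $z_k$ gives shape $a_z + a$ and coefficient $a_z/v_{k-1} + a/v_k$ on $z_k^{-1}$. Because the chain alternates $v$ and $z$, all $v$'s can be drawn as one block given the $z$'s, and vice versa. A minimal sketch, assuming Gaussian observations, a fixed $v_0 = 1$ for the prior of $z_1$, and a heuristic initialisation; it is not the authors' implementation.

```python
import numpy as np


def gibbs_igmc(y, a, a_z, n_sweeps=300, rng=None):
    """Blocked Gibbs sampler for y_k ~ N(0, v_k) under the inverse Gamma-Markov chain
    v_k | z_k ~ IG(a, z_k / a), z_{k+1} | v_k ~ IG(a_z, v_k / a_z).

    Assumes v_0 = 1 fixes the prior of z_1. Returns the posterior mean of v
    estimated from the second half of the sweeps.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    K = len(y)
    # heuristic start: 1/z near a locally smoothed energy, to shorten burn-in
    energy = np.convolve(y ** 2, np.ones(9) / 9, mode="same") + 1e-8
    z = 1.0 / energy
    samples = []
    for sweep in range(n_sweeps):
        # v_k | z_k, z_{k+1}, y_k: shape a + a_z + 1/2, coefficient a/z_k + a_z/z_{k+1} + y_k^2/2
        shape_v = np.full(K, a + a_z + 0.5)
        shape_v[-1] = a + 0.5                      # v_K has no child z_{K+1}
        rate_v = a / z + 0.5 * y ** 2
        rate_v[:-1] += a_z / z[1:]
        v = 1.0 / rng.gamma(shape_v, 1.0 / rate_v)
        # z_k | v_{k-1}, v_k: shape a_z + a, coefficient a_z/v_{k-1} + a/v_k
        v_prev = np.concatenate([[1.0], v[:-1]])
        z = 1.0 / rng.gamma(a_z + a, 1.0 / (a_z / v_prev + a / v))
        if sweep >= n_sweeps // 2:
            samples.append(v)
    return np.mean(samples, axis=0)
```

A VB version would replace each random draw by the expectations $\langle v_k^{-1} \rangle$ and $\langle z_k^{-1} \rangle$ of the same inverse Gamma factors.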
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: time-frequency panels (colour scale −14 to 0): Noisy ($X$), Original ($X_{org}$), $X_h$ (SNR 19.98), $X_v$ (SNR 20.79), $X_b$ (SNR 19.68), $X_g$ (SNR 19.97)]
Denoising – Music
[Figure: time-frequency panels (colour scale −18 to 0): Original, Noisy, PF (SNR 8.53), Gibbs (SNR 8.66), VB]
“Tristram” (Matt Uelmen) + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal; ties across time (harmonic continuity)
• Source 2: vertical; ties across frequency (transients, percussive sounds)
[Figure: time ($\tau$) vs. frequency bin ($\nu$) panels: $X_{org}$, $S_{hor}$, $S_{ver}$]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai” (King Crimson) + Drums “Territory” (Sepultura) = Mix
(all values in dB)
Method        s1 SDR   s1 SIR   s1 SAR   s2 SDR   s2 SIR   s2 SAR
VB             -4.74    -3.28     5.67    -1.58    15.46    -1.37
Gibbs          -4.5     -2.62     4.57     1.05    12.46     1.61
GibbsEM        -4.23    -2.42     4.82     1.34    13.13     1.85
Pre-trained    -4.04    -3.15     8.13     3.56    11.44     4.64
Oracle          6.14    17.16     6.58    12.66    19.95    13.6
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet” (Anglagard) + “Moby Dick” (Led Zeppelin) = Mix
(all values in dB)
Method        s1 SDR   s1 SIR   s1 SAR   s2 SDR   s2 SIR   s2 SAR
VB             -7.8     -6.22     4.53    -2.35    18.4     -2.25
Gibbs          -8.46    -7.53     6.93    -4.04    14.59    -3.83
GibbsEM        -7.74    -6.19     4.62    -1.14    16.62    -0.97
Pre-trained    -6.4     -5.39     6.95     3.8     16.39     4.14
Oracle         12.1     32.9     12.14    21.13    33.89    21.37
Harmonic-Transient Decomposition
[Figure: time ($\tau$) vs. frequency bin ($\nu$) panels: $X_{org}$, $S_{hor}$, $S_{ver}$]
Chord Detection - Signal model (with Paul Peeling)
[Figure: spectral template $\log \sigma_\nu$ vs. frequency $\nu$ (Hz), 0–4000 Hz]
Chord Detection
[Figure: MDCT of piano chord 41, 48, 51, 56 (frequency in Hz, 0–4000, vs. time $\tau$ in s); $\log \sum_\nu v_{\nu j \tau}$ by MIDI note $j$ (40–65) vs. time $\tau$ (s)]
Multichannel Source Separation
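• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$$
$$v_{kn} \sim \mathcal{IG}\big(v_{kn}; \nu/2, 2/(\nu\lambda_n)\big)$$
$$s_{kn} \sim \mathcal{N}(s_{kn}; 0, v_{kn})$$
$$x_{km} \sim \mathcal{N}(x_{km}; a_m^\top s_{k,1:N}, r_m)$$
$$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$$
[Figure: graphical model with $\lambda_{1:N}$, then $v_{k,1:N}$, $s_{k,1:N}$ and $x_{k,1:M}$ in a plate over $k = 1, \dots, K$; mixing vectors $a_{1:M}$ and noise variances $r_{1:M}$]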
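Integrating out $v_{kn}$ turns each $s_{kn}$ into a Student-t variable with $\nu$ degrees of freedom and scale $\sqrt{\lambda_n}$, which is why $\lambda_n$ behaves like a per-source volume. A sampling sketch of the source part of the hierarchy, reading $\mathcal{G}$ and $\mathcal{IG}$ with the shape/scale convention of the Gamma and inverse Gamma definitions; the mixing and noise priors are omitted because their parameters are elided on the slide, and the function name is ours.

```python
import numpy as np


def sample_prior(K, N, nu, a_lam, b_lam, rng, lam=None):
    """Draw s (K x N) from lambda_n ~ G(a_lam, b_lam), v_kn ~ IG(nu/2, 2/(nu lambda_n)),
    s_kn ~ N(0, v_kn); G(a, z) is shape/scale and IG(a, z) is the law of 1/y, y ~ G(a, z).
    Pass `lam` to fix the per-source volumes instead of drawing them.
    """
    if lam is None:
        lam = rng.gamma(a_lam, b_lam, size=N)                    # per-source "volume"
    v = 1.0 / rng.gamma(nu / 2, 2.0 / (nu * lam), size=(K, N))   # per-coefficient variances
    s = rng.normal(0.0, np.sqrt(v))
    return lam, v, s
```

For $\nu > 2$ the variance of $s_{kn}$ given $\lambda_n$ is $\lambda_n \nu / (\nu - 2)$, and the tails are heavier than Gaussian for every finite $\nu$.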
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$
Source Separation
[Figure: spectrograms, $t$ (sec) vs. $f$ (Hz, 0–10000): (Speech), (Piano), (Guitar), (Mix)]
Reconstructions
[Figure: reconstructed spectrograms, $t$ (sec) vs. $f$ (Hz, 0–10000): (Speech), (Piano), (Guitar)]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: sampled values of $a$, $\lambda$ and $r$ vs. epoch (0–2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features)
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: state grid of tempo level $n_k \in \{1, 2, 3\}$ vs. bar position $m_k \in \{1, \dots, 8\}$, for 3/4 time and 4/4 time]
• Each dot denotes a state $x = (m, n)$ (score position, tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three panels of the transition probability $p(x_2 \mid x_1)$ over bar position (1–8) and tempo level (1–5)]
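The transition model can also be written as one explicit stochastic matrix over the flattened states $x = (m, n)$. A minimal sketch on an 8-position, 5-level grid like the one plotted, assuming (hypothetically) that the position advances by the tempo level and that the tempo level then stays, or moves to a neighbouring level with total probability `p_switch`; repeated multiplication gives predictive densities $p(x_k)$ that spread out over the grid, as in the animation frames.

```python
import numpy as np


def bar_pointer_transition(M, N, p_switch):
    """Dense transition matrix over states (m, j), flattened as index m * N + j.

    Hypothetical dynamics: position m moves to (m + j + 1) mod M (tempo n = j + 1);
    the tempo level stays with probability 1 - p_switch or moves to a neighbour.
    """
    T = np.zeros((M * N, M * N))
    for m in range(M):
        for j in range(N):
            m_next = (m + j + 1) % M
            nbrs = [i for i in (j - 1, j + 1) if 0 <= i < N]
            moves = {j: 1.0 - p_switch} if nbrs else {j: 1.0}
            for i in nbrs:
                moves[i] = p_switch / len(nbrs)
            for j_next, p in moves.items():
                T[m * N + j, m_next * N + j_next] += p
    return T


def propagate(p1, T, steps):
    """Predictive densities p(x_1), ..., p(x_{steps+1}) with no observations."""
    ps = [p1]
    for _ in range(steps):
        ps.append(ps[-1] @ T)
    return ps
```

The dense matrix is only practical for small grids; the shift-and-mix structure visible in the loop is what makes large grids tractable.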
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over the (bar position, tempo level) grid]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over the (bar position, tempo level) grid]
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 21
Restoration
bull Piano
ndash Signal with missing samples (37)
ndash Reconstruction 768 dB improvement
ndash Original
bull Trumpet
ndash Signal with missing samples (37)
ndash Reconstruction 710 dB improvement
ndash Original
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 20
piano_missingwav
Media File (audiowav)
piano_kalmanwav
Media File (audiowav)
piano_cleanwav
Media File (audiowav)
trumpet_missingwav
Media File (audiowav)
trumpet_kalmanwav
Media File (audiowav)
trumpet_cleanwav
Media File (audiowav)
Hierarchical Factorial Models
bull Each component models a latent process
bull The observations are projections
rν0 middot middot middot rν
k middot middot middot rνK
θν0 middot middot middot θ
νk middot middot middot θ
νK
ν = 1 W
yk yK
bull Generalises Source-filter models
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 21
Harmonic model with changepoints
rk|rkminus1 sim p(rk|rkminus1) rk isin 0 1
θk|θkminus1 rk sim [rk = 0]N (Aθkminus1 Q)︸ ︷︷ ︸
reg
+ [rk = 1]N (0 S)︸ ︷︷ ︸
new
yk|θk sim N (Cθk R)
A =
Gω
G2ω
GH
ω
N
Gω = ρk
(cos(ω) minus sin(ω)sin(ω) cos(ω)
)
damping factor 0 lt ρk lt 1 framelength N and damped sinusoidal basis matrix C of size N times 2H
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 22
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
  – Conditional Gaussians are not closed under marginalization
  ⇒ Unlike in HMMs or Kalman filter models (KFMs), summing over $r_k$ does not simplify the filtering density
  ⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time/space
  – Intuition: when a changepoint occurs, the state vector $\theta$ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see, e.g., Barry and Hartigan 1992; Digalakis et al. 1993; Ó Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)
[Graphical model: an example indicator sequence $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$ over states $\theta_0, \dots, \theta_5$ and observations $y_1, \dots, y_5$]
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
  ⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of the conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
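The counting argument behind the two previous slides can be checked by brute force. This small sketch assumes that, in a generic switching model, every regime history yields its own Gaussian kernel, while in the changepoint model a reset discards the past so that only the time of the last changepoint matters; for a single chain this gives linear growth, $k + 1$ kernels versus $2^k$.

```python
from itertools import product

def last_changepoint(r):
    """1-based index of the most recent r_j = 1, or 0 if no changepoint has occurred."""
    last = 0
    for j, r_j in enumerate(r, start=1):
        if r_j == 1:
            last = j
    return last

def kernel_counts(k):
    """Number of Gaussian kernels in the exact filtering density after k steps."""
    histories = list(product((0, 1), repeat=k))
    # Generic switching model: every regime history keeps its own kernel.
    switching = len(histories)
    # Changepoint model: kernels that share the last reset time coincide.
    changepoint = len({last_changepoint(r) for r in histories})
    return switching, changepoint

for k in range(1, 9):
    print(k, kernel_counts(k))   # (2**k, k + 1)
```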
Monophonic model (Cemgil et al. 2006)
• We introduce a pitch label indicator $m$
• At each time $k$, the process can be in one of the {"mute", "sound"} × $M$ states
[Graphical model: chains $r_0, r_1, \dots, r_T$; $m_0, m_1, \dots, m_T$; $s_0, s_1, \dots, s_T$; observations $y_1, \dots, y_T$]
Monophonic Pitch Tracking
Monophonic pitch tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$
• If the pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TASLP)
[Figure: exact inference]
Tracking Pitch Variations
• Allow $m$ to change with $k$
• Intractable: need to resort to approximate inference (mixture Kalman filter, Rao-Blackwellized particle filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: frequency $\nu$ versus time $k$, with the observed signal $x_k$]
• Each latent changepoint process $\nu = 1, \dots, W$ corresponds to a "piano key". The indicators $r_{1:W, 1:K}$ encode a latent "piano roll"
Single time slice - Bayesian Variable Selection
$$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$$
$$s_i \mid r_i \sim [r_i = \text{on}]\,\mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\,\delta(s_i)$$
$$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R), \qquad C \equiv [\,C_1 \;\cdots\; C_i \;\cdots\; C_W\,]$$
[Graphical model: indicators $r_1, \dots, r_W$ → coefficients $s_1, \dots, s_W$ → observation $x$]
• Generalized linear model: the columns of $C$ are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When $W$ is large, computing posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman; Attias)
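For small $W$ the $2^W$ mixture can be enumerated directly, which makes the intractability for large $W$ concrete. This sketch assumes scalar coefficients with $\Sigma = \sigma^2$, so that integrating out $s$ gives $x \mid r \sim \mathcal{N}(0, \sigma^2 C_r C_r^\top + R)$, where $C_r$ keeps only the "on" columns; the helper names and the toy data are hypothetical.

```python
import itertools
import numpy as np

def log_gauss(x, cov):
    """log N(x; 0, cov)."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (x @ np.linalg.solve(cov, x) + logdet + len(x) * np.log(2 * np.pi))

def indicator_posterior(x, C, sigma2, R, pi_on):
    """Exact p(r_{1:W} | x), i.e. the weights of the 2^W Gaussian components."""
    W = C.shape[1]
    configs = list(itertools.product((0, 1), repeat=W))
    log_post = np.empty(len(configs))
    for j, r in enumerate(configs):
        on = np.array(r, dtype=bool)
        C_on = C[:, on]                       # only the active basis vectors
        n_on = int(on.sum())
        log_prior = n_on * np.log(pi_on) + (W - n_on) * np.log(1 - pi_on)
        log_post[j] = log_prior + log_gauss(x, sigma2 * C_on @ C_on.T + R)
    post = np.exp(log_post - log_post.max())  # normalise in a numerically safe way
    return configs, post / post.sum()

rng = np.random.default_rng(1)
C = rng.normal(size=(8, 4))
x = C @ np.array([2.0, 0.0, -1.5, 0.0])       # columns 1 and 3 active
configs, post = indicator_posterior(x, C, sigma2=1.0, R=1e-3 * np.eye(8), pi_on=0.5)
best = configs[int(np.argmax(post))]
print(best)
```

The loop cost doubles with every extra column, which is exactly why the slide resorts to approximations when $W$ is large.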
Factorial Switching State space model
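$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$$
$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$$
$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu})) \qquad \text{(changepoint indicator)}$$
$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_{k,\nu})\,\theta_{k-1,\nu}, Q_\nu(r_{k,\nu})) \qquad \text{(latent state)}$$
$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k\,\theta_{k,1:W}, R) \qquad \text{(observation)}$$
[Graphical model: for each $\nu = 1, \dots, W$, an indicator chain $r^\nu_0 \cdots r^\nu_k \cdots r^\nu_K$ and a state chain $s^\nu_0 \cdots s^\nu_k \cdots s^\nu_K$; observations $y_k, \dots, y_K$]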
Synthetic Data
[Figure: synthetic data over frequency $\nu$ and time $k$, with the observed signal $x$]
Technical Difficulties
• Inference is computationally heavy
• Vanilla Kalman filtering methods are numerically unstable: the computations involve large matrices
  – Need advanced techniques from linear algebra
  – Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical / acoustical
• Time domain: state-space dynamical models
• Transform domain: Fourier representations, generalised linear model
• Feature based
Spectrogram
• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau) = (\text{frequency}, \text{time})$, such as the STFT or MDCT
$$x(t) = \sum_k s_k \phi_k(t)$$
[Figure: spectrograms of speech and piano, time versus frequency (0 to 10000 Hz)]
• The spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of the STFT)
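As a concrete instance of the transform coefficients $s_k$, the sketch below computes a Hann-windowed STFT with NumPy and displays $\log |s_k|^2$ per atom $k = (\nu, \tau)$. The window length, hop size and test tone are arbitrary illustrative choices; the MDCT mentioned above would be an equally valid transform.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; returns complex coefficients s[nu, tau] (frequency bin, frame)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def log_power_spectrogram(x, n_fft=512, hop=128, eps=1e-12):
    """log |s_k|^2 for every time-frequency atom k = (nu, tau); eps avoids log(0)."""
    return np.log(np.abs(stft(x, n_fft, hop)) ** 2 + eps)

fs = 8000
t = np.arange(fs) / fs
X = log_power_spectrogram(np.sin(2 * np.pi * 1000 * t))
# A 1 kHz tone falls in bin 1000 / (8000 / 512) = 64 of every frame.
print(X.shape, X.argmax(axis=0)[:3])
```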
Models for time-frequency Energy distributions
• Non-negative matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004)
$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$
Spectrogram = Spectral templates × Excitations
  – however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005)
$$X_{\nu\tau} = [r_{\nu\tau} = 0]\,S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\,S^{(1)}_{\nu\tau}$$
Spectrogram = Mask × Source0 + (1 − Mask) × Source1
  – however, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms
$$p(s \mid v)\,p(v) = \Big(\prod_k p(s_k \mid v_k)\Big)\,p(v)$$
One channel source separation: Gaussian source model
[Graphical model: variances $v_{k,1}, \dots, v_{k,N}$ → sources $s_{k,1}, \dots, s_{k,N}$ → observation $x_k$, for $k = 1, \dots, K$]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$
• A straightforward application of Bayes' theorem yields
$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})\big), \qquad \kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \quad \text{(responsibilities)}$$
• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$
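The posterior above is a per-atom Wiener filter. A short sketch, with hypothetical function names, computes it from the responsibilities and cross-checks it against generic Gaussian conditioning, using $\mathrm{Cov}(s_n, x) = v_n$ and $\mathrm{Var}(x) = \sum_n v_n$.

```python
import numpy as np

def separate_gaussian(x, v):
    """Posterior mean and variance of s_n given x = sum_n s_n, with s_n ~ N(0, v_n) independent."""
    v = np.asarray(v, dtype=float)
    kappa = v / v.sum()                      # responsibilities
    return kappa * x, v * (1.0 - kappa)

def condition_on_sum(x, v):
    """The same quantities from the generic formula for conditioning a joint Gaussian."""
    v = np.asarray(v, dtype=float)
    cov_sx, var_x = v, v.sum()               # Cov(s_n, x) = v_n, Var(x) = sum_n v_n
    mean = cov_sx / var_x * x
    cov = np.diag(v) - np.outer(cov_sx, cov_sx) / var_x
    return mean, np.diag(cov)

x, v = 3.0, [1.0, 4.0, 0.5]
mean, var = separate_gaussian(x, v)
print(mean, var, mean.sum())                 # the posterior means add back up to x
```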
One channel source separation: Poisson source model
[Graphical model: intensities $v_{k,1}, \dots, v_{k,N}$ → sources $s_{k,1}, \dots, s_{k,N}$ → observation $x_k$, for $k = 1, \dots, K$]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$
• This is the generative model for NMF when we write
$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \qquad \text{(template × excitation)}$$
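One well-known way to fit this model is the Lee and Seung multiplicative update for the generalized KL divergence, which is the Poisson negative log-likelihood up to a constant. Given $x_k$, the sources are multinomial with probabilities $\kappa_{k,n}$, so the updates can be read as EM steps. The sketch below is an illustration under that reading, not the talk's code; the array names `T` (templates $t_{\nu,n}$) and `E` (excitations $e_{\tau,n}$) and the toy data are assumptions.

```python
import numpy as np

def generalized_kl(X, L, eps=1e-12):
    """D(X || L) = sum X log(X / L) - X + L (Poisson negative log-likelihood up to a constant)."""
    return float(np.sum(X * np.log((X + eps) / L) - X + L))

def kl_nmf(X, n_components, n_iter=200, seed=0, eps=1e-12):
    """Multiplicative updates for X ~ Poisson(T E^T), templates T[nu, n], excitations E[tau, n]."""
    rng = np.random.default_rng(seed)
    F, n_frames = X.shape
    T = rng.random((F, n_components)) + 0.1
    E = rng.random((n_frames, n_components)) + 0.1
    history = []
    for _ in range(n_iter):
        # Each factor is rescaled by the ratio of observed to modelled counts.
        T *= ((X / (T @ E.T + eps)) @ E) / (E.sum(axis=0) + eps)
        E *= ((X / (T @ E.T + eps)).T @ T) / (T.sum(axis=0) + eps)
        history.append(generalized_kl(X, T @ E.T + eps))
    return T, E, history

rng = np.random.default_rng(1)
T_true = rng.gamma(1.0, 1.0, size=(20, 3))
E_true = rng.gamma(1.0, 1.0, size=(30, 3))
X = rng.poisson(5.0 * T_true @ E_true.T).astype(float)
T, E, history = kl_nmf(X, n_components=3)
print(history[0], history[-1])   # the divergence never increases
```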
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: densities $p(x)$ for $0 \le x \le 5$; left panel: $a \in \{0.9, 1, 1.3, 2\}$ with $b = 1$; right panel: $(a, b) \in \{(1, 1), (1, 0.5), (2, 1)\}$]
$$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$$
$$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$$
• Gamma: conjugate prior for the Gaussian precision, the Poisson intensity, and the inverse Gamma scale
• Inverse Gamma: conjugate prior for the Gaussian variance and the Gamma scale
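The two definitions translate directly into log-densities. In this parametrization $z$ is a scale, and $x \sim \mathcal{G}(a, z)$ implies $1/x \sim \mathcal{IG}(a, z)$, which is the same shape/scale convention as NumPy's gamma sampler. The sketch checks numerically that both densities integrate to one and that the inverse Gamma mean is $1/(z(a-1))$; the grid and parameter values are arbitrary choices.

```python
import math
import numpy as np

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a - 1) log x - x / z - a log z - log Gamma(a); z is a scale."""
    return (a - 1) * np.log(x) - x / z - a * np.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = (a + 1) log(1/x) - 1 / (z x) - a log z - log Gamma(a)."""
    return -(a + 1) * np.log(x) - 1.0 / (z * x) - a * np.log(z) - math.lgamma(a)

def trapezoid(f, x):
    """Plain trapezoidal rule."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

grid = np.linspace(1e-6, 200.0, 400_001)
print(trapezoid(np.exp(log_gamma_pdf(grid, 2.0, 1.0)), grid))      # close to 1
print(trapezoid(np.exp(log_inv_gamma_pdf(grid, 2.0, 1.0)), grid))  # close to 1

# 1/y with y ~ G(3, 0.5) is IG(3, 0.5), whose mean is 1 / (0.5 * (3 - 1)) = 1.
rng = np.random.default_rng(0)
samples = 1.0 / rng.gamma(shape=3.0, scale=0.5, size=200_000)
print(samples.mean())
```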
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:
$$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$$
$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$$
[Chain: $z_1 \cdots v_{k-1} - z_k - v_k - z_{k+1} \cdots$, with couplings $a_z$, $a$, $a_z$]
• The variance variables $v$ are the prior variances of the sources
• Auxiliary variables $z$ are needed for conjugacy and positive correlation
• The shape parameters $a$ and $a_z$ describe the coupling strength and drift of the chain
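Draws like the ones on the following slides can be produced by ancestral sampling. The sketch relies on $\mathcal{IG}(x; a, z) \Leftrightarrow 1/x \sim \mathcal{G}(a, z)$ under the definitions above; the starting value `z_first` and the seed are hypothetical. With $a = a_z$ the chain behaves like a driftless random walk in $\log v_k$, while unequal shapes introduce a drift.

```python
import numpy as np

def sample_igmc(K, a, a_z, z_first=1.0, seed=0):
    """Ancestral sampling: v_k | z_k ~ IG(a, z_k / a), z_{k+1} | v_k ~ IG(a_z, v_k / a_z).

    Uses IG(x; a, z) <=> 1/x ~ Gamma(shape a, scale z).
    """
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = z_first
    for k in range(K):
        v[k] = 1.0 / rng.gamma(a, z / a)
        z = 1.0 / rng.gamma(a_z, v[k] / a_z)
    return v

for a_z in (10.0, 4.0, 40.0):
    log_v = np.log(sample_igmc(1000, a=10.0, a_z=a_z))
    print(a_z, log_v[0], log_v[-1])
```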
Gamma Chains: typical draws
[Figure: three sample paths of $\log v_k$, $k = 1, \dots, 1000$, for $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains with changepoints: typical draws
[Figure: three sample paths of $\log v_k$ with changepoints, $k = 1, \dots, 1000$, for $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1}), \qquad \psi_{k,k+1} = \exp(-a_z\, v_k^{-1} z_{k+1}^{-1}) \qquad \text{(pairwise)}$$
$$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \qquad \text{(singletons)}$$
[Chain: $z_1 \cdots v_{k-1} - z_k - v_k - z_{k+1} \cdots$, with couplings $a_z$, $a$, $a_z$]
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}) \qquad \text{(pairwise)}$$
$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \qquad \text{(singletons)}$$
Possible Model Topologies
Approximate Inference
• Stochastic
  – Markov chain Monte Carlo: Gibbs sampler
  – Sequential Monte Carlo: particle filtering
• Deterministic
  – Variational Bayes
In all of these, conjugacy helps.
VB or Gibbs
[Factor graph: $\psi_{0,1} - z_1 - \psi_{1,1} - v_1 - \psi_{1,2} - z_2 - \psi_{2,2} - v_2 \cdots$, with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(v_k) \leftarrow \exp\big(\phi_k + \log p(y_k \mid v_k) + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\,q^{(\tau)}(z_{k+1})}\big)$$
• Gibbs
$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\,\psi_{k,k}(z_k^{(\tau)})\,\psi_{k,k+1}(z_{k+1}^{(\tau)})$$
VB or Gibbs
[Factor graph: $\psi_{0,1} - z_1 - \psi_{1,1} - v_1 - \psi_{1,2} - z_2 - \psi_{2,2} - v_2 \cdots$, with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\,q^{(\tau)}(v_k)}\big)$$
• Gibbs
$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\,\psi_{k,k}(v_k^{(\tau)})$$
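The Gibbs updates on the two previous slides can be written in closed form for Gaussian observations $y_k \sim \mathcal{N}(0, v_k)$. Collecting the potentials gives $v_k \mid z_k, z_{k+1}, y_k \sim \mathcal{IG}\big(a + a_z + \tfrac12,\ (a/z_k + a_z/z_{k+1} + y_k^2/2)^{-1}\big)$ and $z_k \mid v_{k-1}, v_k \sim \mathcal{IG}\big(a + a_z,\ (a_z/v_{k-1} + a/v_k)^{-1}\big)$. Because the chain alternates $z, v, z, v, \dots$, all $z$ are conditionally independent given $v$ and vice versa, so each half-sweep can be vectorised. This is a sketch under those derivations; the boundary prior $z_0 \sim \mathcal{IG}(a_z, s_0)$, the initialisation and the toy data are assumptions.

```python
import numpy as np

def gibbs_igmc(y, a, a_z, n_sweeps=500, s0=1.0, seed=0):
    """Blocked Gibbs sampler for y_k ~ N(0, v_k) under the inverse Gamma-Markov chain prior.

    Hypothetical boundary choices: z_0 ~ IG(a_z, s0), and the last v has no child z.
    Returns the sampled variances, shape (n_sweeps, K).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    K = len(y)
    v = np.maximum(y ** 2, 1e-3)               # crude initialisation
    shape_v = np.full(K, a + a_z + 0.5)
    shape_v[-1] = a + 0.5                      # last v_k has no z_{k+1} child
    samples = np.empty((n_sweeps, K))
    for t in range(n_sweeps):
        # z_k | v_{k-1}, v_k ~ IG(a + a_z, 1/rate), rate = a_z / v_{k-1} + a / v_k
        rate_z = a / v
        rate_z[0] += 1.0 / s0
        rate_z[1:] += a_z / v[:-1]
        z = 1.0 / rng.gamma(a + a_z, 1.0 / rate_z)
        # v_k | z_k, z_{k+1}, y_k ~ IG(shape, 1/rate), rate = a / z_k + a_z / z_{k+1} + y_k^2 / 2
        rate_v = a / z + 0.5 * y ** 2
        rate_v[:-1] += a_z / z[1:]
        v = 1.0 / rng.gamma(shape_v, 1.0 / rate_v)
        samples[t] = v
    return samples

rng = np.random.default_rng(1)
true_v = np.r_[np.ones(100), 100.0 * np.ones(100)]   # quiet half, then loud half
y = rng.normal(0.0, np.sqrt(true_v))
samples = gibbs_igmc(y, a=10.0, a_z=10.0)
post_mean = samples[100:].mean(axis=0)               # discard burn-in
print(post_mean[:100].mean(), post_mean[100:].mean())
```

The VB updates have the same structure: each $q(v_k)$ and $q(z_k)$ stays inverse Gamma, with the sampled neighbours replaced by expectations of their inverses.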
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: log-magnitude time-frequency plots of the noisy signal (X), the original (Xorg), and four reconstructions: Xh (SNR 19.98), Xv (SNR 20.79), Xb (SNR 19.68) and Xg (SNR 19.97)]
Denoising – Music
[Figure: log-magnitude time-frequency plots of the original and noisy excerpts and of the reconstructions by PF, Gibbs and VB, each annotated with its output SNR]
"Tristram" (Matt Uelmen) + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal tie across time (harmonic continuity)
• Source 2: vertical tie across frequency (transients, percussive sounds)
[Figure: time ($\tau$) versus frequency bin ($\nu$) plots of the mixture $X_{\text{org}}$ and of the two components $S_{\text{hor}}$ and $S_{\text{ver}}$]
Single Channel Source Separation with IGMCs
E-guitar "Matte Kudasai" (King Crimson) + drums "Territory" (Sepultura) = mix

                    s1                        s2
               SDR     SIR     SAR      SDR     SIR     SAR
VB            -4.74   -3.28    5.67    -1.58   15.46   -1.37
Gibbs         -4.5    -2.62    4.57     1.05   12.46    1.61
GibbsEM       -4.23   -2.42    4.82     1.34   13.13    1.85
Pre-trained   -4.04   -3.15    8.13     3.56   11.44    4.64
Oracle         6.14   17.16    6.58    12.66   19.95   13.6

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
Single Channel Source Separation with IGMCs
"Vandringar I Vilsenhet" (Anglagard) + "Moby Dick" (Led Zeppelin) = mix

                    s1                        s2
               SDR     SIR     SAR      SDR     SIR     SAR
VB            -7.8    -6.22    4.53    -2.35   18.4    -2.25
Gibbs         -8.46   -7.53    6.93    -4.04   14.59   -3.83
GibbsEM       -7.74   -6.19    4.62    -1.14   16.62   -0.97
Pre-trained   -6.4    -5.39    6.95     3.8    16.39    4.14
Oracle         1.21    3.29   12.14    21.13   33.89   21.37
Harmonic-Transient Decomposition
[Figure: time ($\tau$) versus frequency bin ($\nu$) plots of the original $X_{\text{org}}$ and of the horizontal and vertical components $S_{\text{hor}}$ and $S_{\text{ver}}$, for two examples]
Chord Detection - Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ versus frequency $\nu$ (Hz), 0 to 4000 Hz]
Chord Detection
[Figure: left, MDCT of a piano chord (MIDI notes 41, 48, 51, 56), frequency (Hz) versus time $\tau$ (s); right, $\log \sum_\nu v_{\nu j \tau}$ for each MIDI note $j$ (40 to 65) versus time $\tau$ (s)]
Multichannel Source Separation
• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$$
$$v_{k,n} \sim \mathcal{IG}\big(v_{k,n}; \nu/2, 2/(\nu\lambda_n)\big)$$
$$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$
$$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$$
$$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$$
[Graphical model: $\lambda_1, \dots, \lambda_N$ → $v_{k,1}, \dots, v_{k,N}$ → $s_{k,1}, \dots, s_{k,N}$ → $x_{k,1}, \dots, x_{k,M}$ for $k = 1, \dots, K$, with channel parameters $(a_1, r_1), \dots, (a_M, r_M)$]
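Ancestral sampling makes the role of $\lambda_n$ visible: since $1/v_{k,n} \sim \mathcal{G}(\nu/2, 2/(\nu\lambda_n))$, we get $\mathbb{E}[1/v_{k,n}] = 1/\lambda_n$, so $\lambda_n$ sets the typical variance of source $n$, and $s_{k,n}$ is marginally Student-t with $\nu$ degrees of freedom. The slide leaves the mixing priors unspecified, so $a_m \sim \mathcal{N}(0, I)$ and a common noise variance `r` below are hypothetical choices, as are all default parameter values.

```python
import numpy as np

def sample_hierarchical_prior(K, N, M, a_lam=2.0, b_lam=1.0, nu=4.0, r=1e-2, seed=0):
    """Ancestral sampling of the hierarchical source prior and an instantaneous mixture.

    Hypothetical: a_m ~ N(0, I) and r_m = r for all channels.
    """
    rng = np.random.default_rng(seed)
    lam = rng.gamma(a_lam, b_lam, size=N)                        # lambda_n ~ G(a_lam, b_lam), b a scale
    v = 1.0 / rng.gamma(nu / 2, 2.0 / (nu * lam), size=(K, N))   # v_kn ~ IG(nu/2, 2/(nu lambda_n))
    s = rng.normal(0.0, np.sqrt(v))                              # s_kn ~ N(0, v_kn)
    A = rng.normal(size=(M, N))                                  # row m is a_m^T
    x = s @ A.T + rng.normal(0.0, np.sqrt(r), size=(K, M))      # x_km ~ N(a_m^T s_k, r)
    return lam, v, s, A, x

lam, v, s, A, x = sample_hierarchical_prior(K=100_000, N=3, M=2)
print(lam * (1.0 / v).mean(axis=0))   # each entry close to 1
```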
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall "volume" of source $n$
Source Separation
[Figure: spectrograms (time versus frequency, 0 to 10000 Hz) of the three sources (speech, piano, guitar) and of the mix]
Reconstructions
[Figure: spectrograms (time versus frequency, 0 to 10000 Hz) of the reconstructed speech, piano and guitar sources]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
[Figure: traces of $a$, $\lambda$ and $r$ versus epoch (0 to 2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features)
  – Determine the position of a performance on a score
  – Determine where a human listener would clap her hands
  – Create a quantized, human-readable score
  – …
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: lattice of states with tempo level $n_k \in \{1, 2, 3\}$ (vertical) and bar position $m_k \in \{1, \dots, 8\}$ (horizontal), shown for 3/4 and 4/4 time]
• Each dot denotes a state $x = (m, n)$ (score position, tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three panels of $p(x_2 \mid x_1)$ over bar position (1 to 8) and tempo level (1 to 5)]
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: $p(x_3)$ over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: $p(x_4)$ over bar position and tempo level]
Bar position Pointer - k = 5
[Figure: $p(x_5 \mid y_{1:5})$ over bar position and tempo level, after observing $y_5 = 0$]
Bar position Pointer - k = 10
[Figure: $p(x_{10})$ over bar position and tempo level]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson intensity
[Figure: Poisson intensity $\mu_k$ as a function of bar position $m_k$ (0 to 1000), for a triplet rhythm and a duplet rhythm]
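A toy version of the bar pointer HMM, filtered with the standard forward recursion, shows how the transition lattice and the Poisson observation model combine into $p(x_k \mid y_{1:k})$. Everything here is a simplified, hypothetical instance: a bar of 16 cells, three tempo levels that advance the pointer by 1, 2 or 3 cells per frame, tempo changes only to neighbouring levels, and an intensity that is high on every fourth cell. It is not the model of Whiteley, Cemgil and Godsill.

```python
import numpy as np
from math import lgamma

def transition_matrix(M, n_tempi, p_change=0.1):
    """p(x_{k+1} | x_k) on states x = (m, n), flattened as n * M + m.

    Tempo level n (0-based) advances the bar position by n + 1 cells per frame;
    with probability p_change the level moves to a neighbour (staying put at the edges).
    """
    T = np.zeros((M * n_tempi, M * n_tempi))
    for n in range(n_tempi):
        moves = {n: 1.0 - p_change}
        for n2 in (n - 1, n + 1):
            target = n2 if 0 <= n2 < n_tempi else n
            moves[target] = moves.get(target, 0.0) + p_change / 2
        for m in range(M):
            m_next = (m + n + 1) % M
            for n2, p in moves.items():
                T[n * M + m, n2 * M + m_next] += p
    return T

def forward_filter(y, mu, n_tempi, p_change=0.1):
    """Filtering densities p(x_k | y_{1:k}) for counts y_k ~ Poisson(mu[m_k])."""
    M = len(mu)
    T = transition_matrix(M, n_tempi, p_change)
    rate = np.tile(mu, n_tempi)                    # intensity of each flattened state
    alpha = np.full(M * n_tempi, 1.0 / (M * n_tempi))
    filtered = []
    for y_k in y:
        log_lik = y_k * np.log(rate) - rate - lgamma(y_k + 1)
        alpha = (alpha @ T) * np.exp(log_lik)      # predict, then correct
        alpha /= alpha.sum()
        filtered.append(alpha.reshape(n_tempi, M))
    return np.array(filtered)

rng = np.random.default_rng(0)
M, n_tempi = 16, 3
mu = np.where(np.arange(M) % 4 == 0, 3.0, 0.05)  # onsets expected on every fourth cell
m, true_level = 0, 1                              # true tempo: 2 cells per frame
y = []
for _ in range(60):
    m = (m + true_level + 1) % M
    y.append(rng.poisson(mu[m]))
post = forward_filter(y, mu, n_tempi)
print(post[-1].sum(axis=1))                       # posterior over tempo levels at the last frame
```

Smoothing, as on the following slides, would add the matching backward pass over the same transition matrix.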
Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Graphical model: chains $n_0, \dots, n_3$; $\theta_0, \dots, \theta_3$; $m_0, \dots, m_3$; $r_0, \dots, r_3$; intensities $\lambda_1, \dots, \lambda_3$; observations $y_1, \dots, y_3$]
• $\theta$: time signature indicator (e.g., 3/4, 4/4); $r$: rhythmic pattern indicator
Filtering
[Figure: observed data $y_k$ and the filtering posteriors $\log p(m_k \mid y_{1:k})$ (bar position), $\log p(n_k \mid y_{1:k})$ (tempo, quarter notes per minute, 60 to 180) and $p(r_k \mid y_{1:k})$ (triplets vs duplets) versus frame index $k$]
Smoothing
[Figure: observed data $y_k$ and the smoothing posteriors $\log p(m_k \mid y_{1:K})$ (bar position), $\log p(n_k \mid y_{1:K})$ (tempo, quarter notes per minute, 60 to 180) and $p(r_k \mid y_{1:K})$ (triplets vs duplets) versus frame index $k$]
Time Signature
[Figure: observed audio samples versus time (0 to 12 s), and the smoothing posteriors $\log p(m_k \mid z_{1:K})$ (bar position), $\log p(n_k \mid z_{1:K})$ (quarter notes per minute) and $p(\theta_k \mid z_{1:K})$ (4/4 vs 3/4) versus frame index $k$]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt and audio signal $x_t$ versus time $t$]
Score-Performance matching - Graphical Model
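[Graphical model: $t_1, t_2, \dots, t_K$; indicators $r_1, r_2, \dots, r_K$; intensities $\lambda_1, \lambda_2, \dots, \lambda_K$; for $\nu = 1, \dots, W$, variances $v_{\nu 1}, \dots, v_{\nu K}$ and coefficients $s_{\nu 1}, \dots, s_{\nu K}$; $r_k$ indexes score events numbered 1 to 8]
$$v_{\nu\tau} \sim \mathcal{IG}\big(v_{\nu\tau}; a, 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$$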
Score-Performance matching - Signal model
[Figure: $\log \sigma_\nu$ versus frequency $\nu$ (Hz), 0 to 4000 Hz]
Score-Performance matching
[Figure: left, spectrogram data, frequency (0 to 4000 Hz) versus time (0 to 14 s); right, MIDI data, MIDI note (55 to 85) versus score position]
Online (filtering) or offline (smoothing) processing is possible
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ per MIDI note number (60 to 80) versus time (1 to 4 s); $\log \lambda$ estimate $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ versus time; MDCT of the audio (source: Daniel-Ben Pienaar), frequency (0 to 4000 Hz) versus time]
Summary
• "Time domain": switching state-space models
  – State-space modeling
  – Conditional linear dynamical systems, Gaussian processes (e.g., AR, ARMA)
  – Analysis down to sample precision (if required)
  – Computationally quite heavy
• "Transform domain": Gamma fields
  – Models on (orthogonal) transform coefficients: energy compaction
  – Practical: can make use of fast transforms (FFT, MDCT)
  – Inherent limitations (analysis windows, frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 22
Hierarchical Factorial Models
bull Each component models a latent process
bull The observations are projections
rν0 middot middot middot rν
k middot middot middot rνK
θν0 middot middot middot θ
νk middot middot middot θ
νK
ν = 1 W
yk yK
bull Generalises Source-filter models
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 21
Harmonic model with changepoints
rk|rkminus1 sim p(rk|rkminus1) rk isin 0 1
θk|θkminus1 rk sim [rk = 0]N (Aθkminus1 Q)︸ ︷︷ ︸
reg
+ [rk = 1]N (0 S)︸ ︷︷ ︸
new
yk|θk sim N (Cθk R)
A =
Gω
G2ω
GH
ω
N
Gω = ρk
(cos(ω) minus sin(ω)sin(ω) cos(ω)
)
damping factor 0 lt ρk lt 1 framelength N and damped sinusoidal basis matrix C of size N times 2H
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 22
Exact Inference in switching state space models is intractable
bull In general exact inference is NP hard
ndash Conditional Gaussians are not closed under marginalization
rArr Unlike HMMrsquos or KFMrsquos summing over rk does not simplify the filteringdensity
rArr Number of Gaussian kernels to represent exact filtering density p(rk θk|y1k)increases exponentially
minus7903666343
076292
minus103422
minus101982minus2393
minus27957
minus04593
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 23
Exact Inference for Changepoint detection
bull Exact inference is achievable in polynomial timespace
ndash Intuition When a changepoint occurs the state vector θ is reinitializedrArr Number of Gaussians kernels grows only polynomially (See eg Barry and Hartigan
1992 Digalakis et al 1993 O Ruanaidh and Fitzgerald 1996 Gustaffson 2000 Fearnhead 2003 Zoeter and
Heskes 2006)
r1 = 1 r2 = 0 r3 = 0 r4 = 1 r5 = 0
θ0 θ1 θ2 θ3 θ4 θ5
y1 y2 y3 y4 y5
bull The same structure can be exploited for the MMAP problem arg maxr1kp(r1k|y1k)
rArr Trajectories of r(i)1k which are dominated in terms of conditional evidence
p(y1k r(i)1k) can be discarded without destroying optimality
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 24
Monophonic model (Cemgil et al 2006)
bull We introduce a pitch label indicator m
bull At each time k the process can be in one of the ldquomuterdquo ldquosoundrdquo timesM states
r0 r1 rT
m0 m1 mT
s0 s1 sT
y1 yT
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 25
Monophonic Pitch Tracking
Monophonic Pitch Tracking = Online estimation (filtering) of p(rkmk|y1k)
100 200 300 400 500 600 700 800 900 1000minus100
minus50
0
50
100 200 300 400 500 600 700 800 900 1000
5
10
15
bull If pitch is constant exact inference is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 26
Transcription
bull Detecting onsets offsets and pitch to sample precision (Cemgil et al 2006 IEEE
TSALP)
500 1000 1500 2000 2500 3000 3500
Exact inference (S)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 27
d1wav
Media File (audiowav)
Tracking Pitch Variations
bull Allow m to change with k
50 100 150 200 250 300 350 400 450 500
bull Intractable need to resort to approximate inference (Mixture Kalman Filter -Rao-Blackwellized Particle Filter)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 28
Factorial Generative models for Analysis of Polyphonic Audio
νfr
eque
ncy
k
x k
bull Each latent changepoint process ν = 1 W corresponds to a ldquopiano keyrdquoIndicators r1W1K encode a latent ldquopiano rollrdquo (S1) (S2) (S3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 29
montuno1wav
Media File (audiowav)
montuno2wav
Media File (audiowav)
montuno3wav
Media File (audiowav)
Single time slice - Bayesian Variable Selection
ri sim C(ri πon πoff)
si|ri sim [ri = on]N (si 0 Σ) + [ri 6= on]δ(si)
x|s1W sim N (x Cs1W R)
C equiv [ C1 Ci CW ]
r1 rW
s1 sW
x
bull Generalized Linear Model ndash Columnrsquos of C are the basis vectors
bull The exact posterior is a mixture of 2W Gaussians
bull When W is large computation of posterior features becomes intractable
bull Sparsity by construction (Olshausen and Millman Attias )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 30
Factorial Switching State space model
r0ν sim C(r0ν π0ν)
θ0ν sim N (θ0ν microν Pν)
rkν|rkminus1ν sim C(rkν πν(rtminus1ν)) Changepoint indicator
θkν|θkminus1ν sim N (θkν Aν(rk)θkminus1ν Qν(rk)) Latent state
yk|θk1W sim N (yk Ckθk1W R) Observation
rν0 middot middot middot rν
k middot middot middot rνK
sν0 middot middot middot sν
k middot middot middot sνK
ν = 1 W
yk yK
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 31
Synthetic Data
νx
freq
ν
ν
k
(S)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 32
audio_examplewav
Media File (audiowav)
Technical Difficulties
bull Inference is quite heavy
bull Vanilla Kalman filtering methods are not stable ndash computations with
large matrices
ndash Need advance techniques from linear algebra
ndash Interesting links to subspace methods
bull Hyperparameter learning is necessary
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 33
Modelling levels
bull Physical - acoustical
bull Time domain ndash state space dynamical models
bull Transform domain ndash Fourier representations Generalised Linear
model
bull Feature Based
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 34
Spectrogram
bull Basis functions φk(t) centered around time-frequency atom k = k(ν τ) =(Frequency Time ) such as STFT or MDCT
x(t) =sum
k
skφk(t)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
bull Spectrogram displays log |sk| or |sk|2 (of STFT)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 35
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
Models for time-frequency Energy distributions
bull Non-Negative Matrix factorisation (Sha Saul Lee 2002 Smaragdis Brown 2003
Virtanen 2003 Abdallah Plumbley 2004 )
Xντ = WνjSjτ
Spectrogram = Spectral Templatestimes Excitations
= times
ndash however spectrograms are not additive (a2 + b2 6= (a+ b)2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 36
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005)
$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$
Spectrogram = Mask × Source0 + (1 − Mask) × Source1
– However, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms:
$$p(s \mid v)\, p(v) = \Big(\prod_k p(s_k \mid v_k)\Big)\, p(v)$$
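One-channel source separation: Gaussian source model
[Graphical model: $v_{k,1}, \dots, v_{k,N} \to s_{k,1}, \dots, s_{k,N} \to x_k$, plate $k = 1, \dots, K$]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n}), \qquad x_k = \sum_{n=1}^{N} s_{k,n}$$
• A straightforward application of Bayes' theorem yields
$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})\big), \qquad \kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \quad \text{(responsibilities)}$$
• Each source coefficient $s_{k,n}$ receives a fraction $\kappa_{k,n}$ of the observation $x_k$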
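The posterior above is a per-coefficient Wiener filter. A minimal vectorised sketch, assuming the variances $v_{k,n}$ are known and stored as a K × N array (the function name `gaussian_posterior` is hypothetical):

```python
import numpy as np

def gaussian_posterior(x, v):
    """Posterior mean and variance of s_{k,n} given x_k = sum_n s_{k,n}.

    x: shape (K,), observed coefficients x_k
    v: shape (K, N), prior variances v_{k,n}
    """
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities kappa_{k,n}
    mean = kappa * x[:, None]                  # kappa_{k,n} x_k
    var = v * (1.0 - kappa)                    # v_{k,n} (1 - kappa_{k,n})
    return mean, var

x = np.array([2.0, -1.0])
v = np.array([[1.0, 3.0], [4.0, 4.0]])
mean, var = gaussian_posterior(x, v)
# first row: kappa = (0.25, 0.75), so the means are 0.5 and 1.5
```

Because the responsibilities sum to one over $n$, the posterior means add back up to $x_k$ exactly.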
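One-channel source separation: Poisson source model
[Graphical model: $v_{k,1}, \dots, v_{k,N} \to s_{k,1}, \dots, s_{k,N} \to x_k$, plate $k = 1, \dots, K$]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n}), \qquad x_k = \sum_{n=1}^{N} s_{k,n}$$
• This is the generative model for NMF when we write $v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (Template × Excitation)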
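One common way to fit this Poisson model, not taken from the slides, is the Lee-Seung multiplicative update for the generalised Kullback-Leibler divergence, whose minimiser is the Poisson maximum-likelihood estimate of the templates and excitations. A minimal NumPy sketch; the names `kl_nmf` and `kl_div`, the random initialisation and the iteration count are all assumptions:

```python
import numpy as np

def kl_nmf(X, n_components, n_iter=200, seed=0, eps=1e-12):
    """Fit X ≈ T @ E (T: templates t_{nu,n}, E: excitations e_{n,tau}) with
    multiplicative updates for the generalised KL divergence (Lee and Seung)."""
    rng = np.random.default_rng(seed)
    F, N = X.shape
    T = rng.random((F, n_components)) + eps
    E = rng.random((n_components, N)) + eps
    ones = np.ones_like(X)
    for _ in range(n_iter):
        T *= ((X / (T @ E + eps)) @ E.T) / (ones @ E.T + eps)
        E *= (T.T @ (X / (T @ E + eps))) / (T.T @ ones + eps)
    return T, E

def kl_div(X, V, eps=1e-12):
    """Generalised KL divergence D(X || V); equals -log Poisson likelihood up to a constant."""
    return float(np.sum(X * np.log((X + eps) / (V + eps)) - X + V))

rng = np.random.default_rng(1)
X = rng.poisson(rng.gamma(2.0, 1.0, (20, 3)) @ rng.gamma(2.0, 1.0, (3, 50))).astype(float)
T1, E1 = kl_nmf(X, 3, n_iter=1)
T, E = kl_nmf(X, 3, n_iter=200)
print(kl_div(X, T1 @ E1), kl_div(X, T @ E))   # the second value should be much smaller
```

The updates keep both factors non-negative and never increase the divergence, which is why they are a popular choice for this model.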
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: example densities $p(x)$ for $x \in [0, 5]$; left panel: $a = 0.9, 1, 1.3, 2$ with $b = 1$; right panel: $(a, b) = (1, 1), (1, 0.5), (2, 1)$]
$$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1)\log x - z^{-1}x + a \log z^{-1} - \log\Gamma(a)\big)$$
$$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1}x^{-1} + a \log z^{-1} - \log\Gamma(a)\big)$$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity and Inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
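To make the parameterisation concrete, here is a sketch of both log-densities exactly as written above: $z$ is the scale of $\mathcal{G}$, and $x \sim \mathcal{IG}(a, z)$ exactly when $1/x \sim \mathcal{G}(a, z)$. The function names and the crude trapezoid normalisation check are illustrative only.

```python
import math
import numpy as np

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a-1) log x - x/z + a log(1/z) - log Gamma(a)."""
    x = np.asarray(x, dtype=float)
    return (a - 1) * np.log(x) - x / z - a * math.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = (a+1) log(1/x) - 1/(z x) + a log(1/z) - log Gamma(a)."""
    x = np.asarray(x, dtype=float)
    return -(a + 1) * np.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)

def trapezoid(f, x):
    """Plain trapezoid rule, to check that each density integrates to about 1."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

grid = np.linspace(1e-6, 200.0, 2_000_001)
print(trapezoid(np.exp(log_gamma_pdf(grid, 2.0, 1.0)), grid))      # close to 1
print(trapezoid(np.exp(log_inv_gamma_pdf(grid, 2.0, 1.0)), grid))  # close to 1
```

The change of variables $w = 1/x$ links the two: $\log\mathcal{IG}(x; a, z) = \log\mathcal{G}(1/x; a, z) - 2\log x$.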
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:
$$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a), \qquad z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$$
[Chain: $z_1 \cdots v_{k-1} \to z_k \to v_k \to z_{k+1} \cdots$, with coupling parameters $a_z$, $a$, $a_z$ on successive edges]
• The variance variables $v$ act as priors for the sources
• The auxiliary variables $z$ are needed for conjugacy and positive correlation
• The shape parameters $a$ and $a_z$ describe the coupling strength and drift of the chain
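Since $x \sim \mathcal{IG}(a, z)$ exactly when $1/x \sim \mathcal{G}(a, z)$ under the densities defined earlier, the chain can be sampled with an ordinary Gamma generator. The sketch below assumes $z_1 = 1$ (the slides do not say how the chain starts) and uses a hypothetical function name.

```python
import numpy as np

def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Draw v_1..v_K from v_k | z_k ~ IG(v_k; a, z_k/a), z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k/a_z).

    Uses x ~ IG(a, z)  <=>  1/x ~ G(a, z) (Gamma with shape a and scale z).
    """
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = z1                                                  # assumed starting value of z_1
    for k in range(K):
        v[k] = 1.0 / rng.gamma(shape=a, scale=z / a)        # v_k | z_k, so E[1/v_k] = z_k
        z = 1.0 / rng.gamma(shape=a_z, scale=v[k] / a_z)    # z_{k+1} | v_k, so E[1/z_{k+1}] = v_k
    return v

for a, a_z in [(10.0, 10.0), (10.0, 4.0), (10.0, 40.0)]:
    v = sample_igmc(1000, a, a_z)
    print(a, a_z, np.round(np.log(v[:5]), 2))
```

Since $1/v_k$ tracks $z_k$ and $1/z_{k+1}$ tracks $v_k$, consecutive variances stay positively correlated; larger shape parameters give smoother draws of $\log v_k$.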
Gamma Chains: typical draws
[Figure: three typical draws of $\log v_k$ against $k = 1, \dots, 1000$ for $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains with changepoints: typical draws
[Figure: three typical draws of $\log v_k$ against $k = 1, \dots, 1000$ for $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1}) \quad \text{(pairwise)}, \qquad \phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \quad \text{(singletons)}$$
[Chain: $z_1 \cdots v_{k-1} \to z_k \to v_k \to z_{k+1} \cdots$, with coupling parameters $a_z$, $a$, $a_z$ on successive edges]
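As a check on where these potentials come from, expanding the two conditionals of the chain that involve $z_k$ with the $\mathcal{IG}$ definition gives the following (a derivation sketch; the companion pairwise term $\exp(-a_z v_{k-1}^{-1} z_k^{-1})$ falls out of the same expansion):

```latex
\log \mathcal{IG}(z_k; a_z, v_{k-1}/a_z) + \log \mathcal{IG}(v_k; a, z_k/a)
  = -(a_z + 1)\log z_k - \frac{a_z}{v_{k-1} z_k} + a_z \log\frac{a_z}{v_{k-1}} - \log\Gamma(a_z)
    - (a + 1)\log v_k - \frac{a}{z_k v_k} + a \log\frac{a}{z_k} - \log\Gamma(a)
```

Collecting the terms in $z_k$: $-(a_z + 1)\log z_k - a\log z_k = (a_z + a + 1)\log z_k^{-1}$ is the singleton $\phi_{z_k}$, $-a z_k^{-1} v_k^{-1}$ is the pairwise $\psi_{k,k}$, and $-a_z v_{k-1}^{-1} z_k^{-1}$ is the pairwise term linking $v_{k-1}$ and $z_k$.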
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}) \quad \text{(pairwise)}, \qquad \phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \quad \text{(singletons)}$$
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov chain Monte Carlo, Gibbs sampler
– Sequential Monte Carlo, particle filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps.
VB or Gibbs
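[Factor graph: $\psi_{0,1} - z_1 - \psi_{1,1} - v_1 - \psi_{1,2} - z_2 - \psi_{2,2} - v_2 \cdots$, with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(v_k) \leftarrow \exp\Big(\log p(y_k \mid v_k) + \phi_k + \big\langle \log\psi_{k,k} + \log\psi_{k,k+1} \big\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\Big)$$
• Gibbs
$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$$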
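A vectorised sketch of these mean-field updates, assuming Gaussian observations $y_k \sim \mathcal{N}(0, v_k)$ as in the denoising examples. Each $z_k$ only neighbours $v$ variables and vice versa, so all $q(z_k)$ can be updated at once, then all $q(v_k)$. The prior on $z_1$ (as if $v_0$ = `v0`), the initialisation and the function name are assumptions.

```python
import numpy as np

def vb_igmc(y, a, a_z, n_iter=200, v0=1.0):
    """Mean-field VB for y_k ~ N(0, v_k) under the inverse Gamma-Markov chain prior.

    Each factor is inverse Gamma; (shape, rate) is kept so that E_q[1/x] = shape / rate.
    z_1 gets the hypothetical prior IG(z_1; a_z, v0 / a_z).
    Returns (shape_v, rate_v) of q(v_k).
    """
    y = np.asarray(y, dtype=float)
    K = len(y)
    shape_z = np.full(K, a_z + a)
    shape_v = np.full(K, a + a_z + 0.5)
    shape_v[-1] = a + 0.5                  # v_K has no z_{K+1} neighbour
    inv_v = np.ones(K)                     # E_q[1/v_k], initialised at 1
    for _ in range(n_iter):
        # q(z_k): rate = a_z E[1/v_{k-1}] + a E[1/v_k]
        inv_v_prev = np.concatenate(([1.0 / v0], inv_v[:-1]))
        inv_z = shape_z / (a_z * inv_v_prev + a * inv_v)
        # q(v_k): rate = a E[1/z_k] + a_z E[1/z_{k+1}] + y_k^2 / 2
        inv_z_next = np.concatenate((inv_z[1:], [0.0]))
        rate_v = a * inv_z + a_z * inv_z_next + 0.5 * y ** 2
        inv_v = shape_v / rate_v
    return shape_v, rate_v

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 10, 100), rng.normal(0, 0.1, 100)])
shape_v, rate_v = vb_igmc(y, a=10.0, a_z=10.0)
mean_v = rate_v / (shape_v - 1)            # E_q[v_k] for an inverse Gamma with shape > 1
```

The posterior mean variance follows the loud first half and the quiet second half of this toy signal.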
VB or Gibbs
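[Factor graph: $\psi_{0,1} - z_1 - \psi_{1,1} - v_1 - \psi_{1,2} - z_2 - \psi_{2,2} - v_2 \cdots$, with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(z_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k-1,k} + \log\psi_{k,k} \big\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\Big)$$
• Gibbs
$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$$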
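A blocked Gibbs sampler for the same chain, again assuming $y_k \sim \mathcal{N}(0, v_k)$. Given all $v$ the $z_k$ are conditionally independent (and vice versa), so each half-sweep is one vectorised draw. The prior on $z_1$, the sweep counts and the function name are assumptions of this sketch.

```python
import numpy as np

def gibbs_igmc(y, a, a_z, n_sweeps=1000, burn_in=300, v0=1.0, seed=0):
    """Blocked Gibbs sampler for y_k ~ N(0, v_k) under the inverse Gamma-Markov chain prior.

    z_1 uses the hypothetical prior IG(z_1; a_z, v0 / a_z).
    Returns the post-burn-in samples of v, shape (n_sweeps - burn_in, K).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    K = len(y)
    shape_v = np.full(K, a + a_z + 0.5)
    shape_v[-1] = a + 0.5                               # v_K has no z_{K+1} neighbour
    v = np.ones(K)
    samples = []
    for sweep in range(n_sweeps):
        # z_k | v_{k-1}, v_k: 1/z_k ~ Gamma(a_z + a, rate a_z / v_{k-1} + a / v_k)
        v_prev = np.concatenate(([v0], v[:-1]))
        rate_z = a_z / v_prev + a / v
        z = 1.0 / rng.gamma(shape=a_z + a, scale=1.0 / rate_z)
        # v_k | z_k, z_{k+1}, y_k: 1/v_k ~ Gamma(shape_v, rate a / z_k + a_z / z_{k+1} + y_k^2 / 2)
        inv_z_next = np.concatenate((1.0 / z[1:], [0.0]))
        rate_v = a / z + a_z * inv_z_next + 0.5 * y ** 2
        v = 1.0 / rng.gamma(shape=shape_v, scale=1.0 / rate_v)
        if sweep >= burn_in:
            samples.append(v)
    return np.array(samples)

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 10, 100), rng.normal(0, 0.1, 100)])
samples = gibbs_igmc(y, a=10.0, a_z=10.0)
post_mean = samples.mean(axis=0)                        # Monte Carlo estimate of E[v_k | y]
```

Both conditionals are inverse Gamma because of conjugacy, which is what keeps each sweep this cheap.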
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: log-spectrograms of the noisy input (X), the original (Xorg), and four reconstructions: Xh (SNR 19.98), Xv (SNR 20.79), Xb (SNR 19.68), Xg (SNR 19.97)]
Denoising - Music
[Figure: log-spectrograms of the original, the noisy input, and reconstructions by particle filtering (PF, SNR 8.53), Gibbs sampling (SNR 8.66) and VB (SNR 2.08)]
“Tristram (Matt Uelmen)” + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: Horizontal; tie across time (harmonic continuity)
• Source 2: Vertical; tie across frequency (transients, percussive sounds)
[Figure: time (τ) against frequency bin (ν) for the original (Xorg), the horizontal component (Shor) and the vertical component (Sver)]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix
(SDR, SIR and SAR in dB)
              s1                      s2
              SDR     SIR     SAR     SDR     SIR     SAR
VB            -4.74   -3.28    5.67   -1.58   15.46   -1.37
Gibbs         -4.5    -2.62    4.57    1.05   12.46    1.61
Gibbs/EM      -4.23   -2.42    4.82    1.34   13.13    1.85
Pre-trained   -4.04   -3.15    8.13    3.56   11.44    4.64
Oracle         6.14   17.16    6.58   12.66   19.95   13.6
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix
(SDR, SIR and SAR in dB)
              s1                      s2
              SDR     SIR     SAR     SDR     SIR     SAR
VB            -7.8    -6.22    4.53   -2.35   18.4    -2.25
Gibbs         -8.46   -7.53    6.93   -4.04   14.59   -3.83
Gibbs/EM      -7.74   -6.19    4.62   -1.14   16.62   -0.97
Pre-trained   -6.4    -5.39    6.95    3.8    16.39    4.14
Oracle        12.1    32.9    12.14   21.13   33.89   21.37
Harmonic-Transient Decomposition
[Figure: time (τ) against frequency bin (ν) for the original (Xorg) and its horizontal (Shor) and vertical (Sver) components; audio examples: (Original), (Hor), (Vert)]
Chord Detection - Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ against frequency ν (Hz), from 0 to 4000 Hz]
Chord Detection
[Figure: MDCT of a piano chord (MIDI notes 41, 48, 51, 56), frequency (Hz) against time τ (s); and $\log \sum_\nu v_{\nu j \tau}$ for each MIDI note $j$ against time τ (s)]
Multichannel Source Separation
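• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda), \qquad n = 1, \dots, N$$
$$v_{k,n} \sim \mathcal{IG}\big(v_{k,n}; \nu/2, 2/(\nu\lambda_n)\big)$$
$$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$
$$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m), \qquad m = 1, \dots, M, \quad k = 1, \dots, K$$
$$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$$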
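Sampling from this hierarchy is a quick way to see what it implies. The sketch fixes hypothetical hyperparameter values, draws a standard-normal mixing matrix once and uses one noise variance for all channels (the slide leaves the priors on $a_m$ and $r_m$ unspecified, so they are not sampled here). Since $1/v_{k,n} \sim \mathcal{G}(\nu/2, 2/(\nu\lambda_n))$, each $s_{k,n}$ is marginally Student-t with $\nu$ degrees of freedom and variance $\nu\lambda_n/(\nu - 2)$ for $\nu > 2$.

```python
import numpy as np

def sample_hierarchical_prior(K, N, M, nu=10.0, a_lam=2.0, b_lam=1.0, r=0.01, seed=0):
    """Draw (lambda, v, s, A, x) from the hierarchical prior (hypothetical defaults)."""
    rng = np.random.default_rng(seed)
    lam = rng.gamma(shape=a_lam, scale=b_lam, size=N)            # lambda_n ~ G(a_lam, b_lam)
    # v_{k,n} ~ IG(nu/2, 2/(nu lambda_n))  <=>  1/v_{k,n} ~ G(nu/2, 2/(nu lambda_n))
    v = 1.0 / rng.gamma(shape=nu / 2, scale=2.0 / (nu * lam), size=(K, N))
    s = rng.normal(0.0, np.sqrt(v))                               # s_{k,n} ~ N(0, v_{k,n})
    A = rng.normal(size=(M, N))                                   # rows a_m^T, fixed here
    x = s @ A.T + rng.normal(0.0, np.sqrt(r), size=(K, M))        # x_{k,m} ~ N(a_m^T s_k, r)
    return lam, v, s, A, x

lam, v, s, A, x = sample_hierarchical_prior(K=200_000, N=2, M=3)
print(s.var(axis=0), lam * 10.0 / 8.0)   # empirical vs. theoretical variance per source
```

Larger $\lambda_n$ scales up every variance $v_{k,n}$ of source $n$, which is why it reads as an overall volume.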
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$
Source Separation
[Figure: spectrograms of the (Speech), (Piano) and (Guitar) sources and of the (Mix); time t (sec) against frequency f (Hz)]
Reconstructions
[Figure: spectrograms of the reconstructed (Speech), (Piano) and (Guitar) sources; time t (sec) against frequency f (Hz)]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
[Figure: traces of $a$, $\lambda$ and $r$ against epoch (0 to 2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets/detections/spectral features)
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– …
• Online/real-time or offline/batch
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: lattice of states with bar position $m_k = 1, \dots, 8$ and tempo level $n_k = 1, 2, 3$, shown for 3/4 and 4/4 time]
• Each dot denotes a state $x = (m, n)$ (score position, tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three panels of the transition probability $p(x_2 \mid x_1)$ over bar position (1 to 8) and tempo level (1 to 5)]
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: $p(x_3)$ over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: $p(x_4)$ over bar position and tempo level]
Bar position Pointer - k = 5
[Figure: $p(x_5 \mid y_{1:5})$ over bar position and tempo level, after observing $y_5 = 0$]
Bar position Pointer - k = 10
[Figure: $p(x_{10})$ over bar position and tempo level]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson intensity
[Figure: Poisson intensity $\mu_k$ against bar position $m_k$ (0 to 1000) for a triplet rhythm and a duplet rhythm]
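A toy version of the bar-pointer HMM makes the filtering recursion concrete. Everything numeric here is an assumption: $M = 16$ bar positions, tempo levels 1 to 4 (positions advanced per frame), a 2% chance of moving to a neighbouring tempo level, and a Poisson intensity that is high only at the start of the bar. The function name and the uniform initial state are hypothetical too.

```python
import numpy as np

def bar_pointer_filter(y, M=16, tempi=(1, 2, 3, 4), p_change=0.02, mu=None):
    """Forward filtering p(m_k, n_k | y_{1:k}) on a toy bar-pointer lattice.

    Bar position m in {0, ..., M-1}; tempo level n advances m by n positions per frame;
    y_k ~ Poisson(mu[m_k]). Returns an array of shape (K, M, len(tempi)).
    """
    if mu is None:
        mu = np.full(M, 0.01)
        mu[0] = 5.0                                # onsets expected at the start of the bar
    mu = np.asarray(mu, dtype=float)
    T = len(tempi)
    Pn = np.zeros((T, T))                          # Pn[i, j] = p(n_k = tempi[j] | n_{k-1} = tempi[i])
    for i in range(T):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < T]
        Pn[i, i] = 1.0 - p_change
        for j in nbrs:
            Pn[i, j] = p_change / len(nbrs)
    alpha = np.full((M, T), 1.0 / (M * T))         # uniform p(x_0)
    out = []
    for yk in y:
        alpha = alpha @ Pn                         # tempo transition
        pred = np.empty_like(alpha)
        for j, n in enumerate(tempi):
            pred[:, j] = np.roll(alpha[:, j], n)   # m_k = (m_{k-1} + n_k) mod M
        loglik = yk * np.log(mu) - mu              # Poisson log-pmf up to a constant in y_k
        alpha = pred * np.exp(loglik - loglik.max())[:, None]
        alpha /= alpha.sum()
        out.append(alpha.copy())
    return np.array(out)

y = np.array([3 if k % 8 == 0 else 0 for k in range(64)])   # an onset every 8 frames
post = bar_pointer_filter(y)
tempo_post = post[-1].sum(axis=0)    # p(n_K | y_{1:K}); should favour tempo level 2
```

With onsets every 8 frames on a 16-position bar, only tempo level 2 explains the data, so the filtered mass concentrates there.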
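Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Graphical model: chains $n_0, \dots, n_3$; $\theta_0, \dots, \theta_3$; $m_0, \dots, m_3$; $r_0, \dots, r_3$; intensities $\lambda_1, \dots, \lambda_3$; observations $y_1, \dots, y_3$]
• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator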
Filtering
[Figure: observed data $y_k$; $\log p(m_k \mid y_{1:k})$ over bar position; $\log p(n_k \mid y_{1:k})$ over tempo (quarter notes per minute); $p(r_k \mid y_{1:k})$ for triplets vs. duplets; all against frame index $k$]
Smoothing
[Figure: observed data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position; $\log p(n_k \mid y_{1:K})$ over tempo (quarter notes per minute); $p(r_k \mid y_{1:K})$ for triplets vs. duplets; all against frame index $k$]
Time Signature
[Figure: observed data (sample value against time in s); $\log p(m_k \mid z_{1:K})$ over bar position; $\log p(n_k \mid z_{1:K})$ over tempo (quarter notes per minute); $p(\theta_k \mid z_{1:K})$ for 4/4 vs. 3/4; all against frame index $k$]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt aligned with the audio signal $x_t$ over time $t$]
Score-Performance matching - Graphical Model
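[Graphical model over $t_{1:K}$, $r_{1:K}$, $\lambda_{1:K}$, $v_{\nu,1:K}$ and $s_{\nu,1:K}$, with plate $\nu = 1, \dots, W$]
$$v_{\nu\tau} \sim \mathcal{IG}\big(v_{\nu\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau))\big)$$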
Score-Performance matching - Signal model
[Figure: $\log \sigma_\nu$ against frequency ν (Hz), from 0 to 4000 Hz]
Score-Performance matching
[Figure: spectrogram data (frequency in Hz against time in s) and MIDI data (MIDI note against score position)]
Online (filtering) or offline (smoothing) processing is possible.
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ per MIDI note number against time (s); $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ shown as $\log\lambda$ against time (s); MDCT of the audio (source: Daniel-Ben Pienaar), frequency (Hz) against time (s)]
Summary
• “Time domain” – switching state space models
– State space modeling
– Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform domain” – Gamma fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical-score-guided source separation
– Prior structures for other observation models (NMF)
Harmonic model with changepoints
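$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$
$$\theta_k \mid \theta_{k-1}, r_k \sim [r_k = 0]\, \underbrace{\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + [r_k = 1]\, \underbrace{\mathcal{N}(0, S)}_{\text{new}}$$
$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$
$$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^{N}, \qquad G_\omega = \rho_k \begin{pmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{pmatrix}$$
Damping factor $0 < \rho_k < 1$, frame length $N$, and damped sinusoidal basis matrix $C$ of size $N \times 2H$.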
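The block structure of $A$ can be checked numerically. This sketch assumes a constant damping $\rho$, a sample-level transition $B = \mathrm{blkdiag}(G_\omega, G_\omega^2, \dots, G_\omega^H)$ with $A = B^N$, and builds a damped sinusoidal basis whose rows are $c^\top B^n$, $n = 0, \dots, N-1$, where $c$ reads the first coordinate of each harmonic. That construction of $C$, the i.i.d. changepoints standing in for $p(r_k \mid r_{k-1})$, $S = I$ and all function names are assumptions, not taken from the slides.

```python
import numpy as np

def harmonic_matrices(omega, rho, H, N):
    """Frame transition A = B^N and an N x 2H damped sinusoidal basis C (rows c^T B^n)."""
    G = rho * np.array([[np.cos(omega), -np.sin(omega)],
                        [np.sin(omega),  np.cos(omega)]])
    B = np.zeros((2 * H, 2 * H))
    for h in range(1, H + 1):                      # h-th harmonic block: G^h
        B[2 * h - 2:2 * h, 2 * h - 2:2 * h] = np.linalg.matrix_power(G, h)
    A = np.linalg.matrix_power(B, N)
    c = np.tile([1.0, 0.0], H)                     # reads the first coordinate of each block
    C = np.empty((N, 2 * H))
    row = c
    for n in range(N):
        C[n] = row
        row = row @ B                              # c^T B^(n+1)
    return A, C

def simulate(K, omega, rho, H, N, p_new=0.1, q=1e-4, r=1e-3, seed=0):
    """Draw r_k, y_k from the switching model with i.i.d. changepoints and S = I."""
    rng = np.random.default_rng(seed)
    A, C = harmonic_matrices(omega, rho, H, N)
    theta = rng.normal(size=2 * H)
    rs, ys = [], []
    for _ in range(K):
        r_k = rng.random() < p_new
        if r_k:                                    # "new": theta_k ~ N(0, S)
            theta = rng.normal(size=2 * H)
        else:                                      # "reg": theta_k ~ N(A theta_{k-1}, q I)
            theta = A @ theta + np.sqrt(q) * rng.normal(size=2 * H)
        ys.append(C @ theta + np.sqrt(r) * rng.normal(size=N))
        rs.append(int(r_k))
    return np.array(rs), np.array(ys)

rs, ys = simulate(K=20, omega=0.3, rho=0.99, H=3, N=8)
print(rs.shape, ys.shape)   # (20,) (20, 8)
```

With this choice of $C$, noise-free consecutive frames join into one continuous damped harmonic signal, since frame $k+1$ starts where $B^N$ leaves frame $k$.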
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
– Conditional Gaussians are not closed under marginalization
⇒ Unlike HMMs or KFMs, summing over $r_k$ does not simplify the filtering density
⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time/space
– Intuition: when a changepoint occurs, the state vector θ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; Ó Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)
[Graphical model: example with $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$; states $\theta_0, \dots, \theta_5$; observations $y_1, \dots, y_5$]
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
Monophonic model (Cemgil et al. 2006)
• We introduce a pitch label indicator $m$
• At each time $k$ the process can be in one of the {“mute”, “sound”} × M states
[Graphical model: chains $r_0, \dots, r_T$; $m_0, \dots, m_T$; $s_0, \dots, s_T$; observations $y_1, \dots, y_T$]
Monophonic Pitch Tracking
Monophonic pitch tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$
[Figure: signal and tracked pitch against time index (0 to 1000)]
• If the pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
[Figure: transcription obtained with exact inference (audio example)]
Tracking Pitch Variations
• Allow $m$ to change with $k$
• Intractable; need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)
Factorial Generative models for Analysis of Polyphonic Audio
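[Figure: latent indicators over key ν, frequency content and observed signal $x_k$ against time $k$]
• Each latent changepoint process ν = 1, …, W corresponds to a “piano key”. The indicators $r_{1:W, 1:K}$ encode a latent “piano roll”. Audio examples: (S1), (S2), (S3)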
Single time slice - Bayesian Variable Selection
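$$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$$
$$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$$
$$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R), \qquad C \equiv [\,C_1 \cdots C_i \cdots C_W\,]$$
[Graphical model: $r_1, \dots, r_W \to s_1, \dots, s_W \to x$]
• Generalized linear model – the columns of $C$ are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When $W$ is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman; Attias)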
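For small $W$ the exact posterior over indicators can simply be enumerated: integrating out $s$ gives $x \mid r \sim \mathcal{N}(0, \sigma^2 C_r C_r^\top + R)$, where $C_r$ keeps the active columns. The sketch assumes one scalar coefficient per column ($\Sigma = \sigma^2$), isotropic $R$, and hypothetical function names. Its cost doubles with every extra column, which is the intractability noted above.

```python
import itertools
import numpy as np

def log_gauss(x, cov):
    """log N(x; 0, cov)."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(cov, x))

def exact_posterior(x, C, sigma2, R, pi_on):
    """Enumerate all 2^W indicator vectors r and return (configs, p(r | x))."""
    N, W = C.shape
    configs, logp = [], []
    for r in itertools.product((0, 1), repeat=W):
        on = np.array(r, dtype=bool)
        Cr = C[:, on]                                   # active basis vectors
        cov = sigma2 * Cr @ Cr.T + R * np.eye(N)        # marginal covariance of x given r
        n_on = int(on.sum())
        logp.append(log_gauss(x, cov) + n_on * np.log(pi_on) + (W - n_on) * np.log(1 - pi_on))
        configs.append(r)
    logp = np.array(logp)
    post = np.exp(logp - logp.max())
    return configs, post / post.sum()

rng = np.random.default_rng(0)
C = rng.normal(size=(8, 3))
x = 5.0 * C[:, 0] + 5.0 * C[:, 2] + 0.01 * rng.normal(size=8)
configs, post = exact_posterior(x, C, sigma2=25.0, R=0.01, pi_on=0.5)
print(configs[int(np.argmax(post))])   # should pick columns 0 and 2
```

Switching on an unneeded column is penalised through the determinant of the marginal covariance, so sparsity comes from the model itself.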
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 30
Factorial Switching State space model
r0ν sim C(r0ν π0ν)
θ0ν sim N (θ0ν microν Pν)
rkν|rkminus1ν sim C(rkν πν(rtminus1ν)) Changepoint indicator
θkν|θkminus1ν sim N (θkν Aν(rk)θkminus1ν Qν(rk)) Latent state
yk|θk1W sim N (yk Ckθk1W R) Observation
rν0 middot middot middot rν
k middot middot middot rνK
sν0 middot middot middot sν
k middot middot middot sνK
ν = 1 W
yk yK
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 31
Synthetic Data
νx
freq
ν
ν
k
(S)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 32
audio_examplewav
Media File (audiowav)
Technical Difficulties
bull Inference is quite heavy
bull Vanilla Kalman filtering methods are not stable ndash computations with
large matrices
ndash Need advance techniques from linear algebra
ndash Interesting links to subspace methods
bull Hyperparameter learning is necessary
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 33
Modelling levels
bull Physical - acoustical
bull Time domain ndash state space dynamical models
bull Transform domain ndash Fourier representations Generalised Linear
model
bull Feature Based
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 34
Spectrogram
bull Basis functions φk(t) centered around time-frequency atom k = k(ν τ) =(Frequency Time ) such as STFT or MDCT
x(t) =sum
k
skφk(t)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
bull Spectrogram displays log |sk| or |sk|2 (of STFT)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 35
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
Models for time-frequency Energy distributions
bull Non-Negative Matrix factorisation (Sha Saul Lee 2002 Smaragdis Brown 2003
Virtanen 2003 Abdallah Plumbley 2004 )
Xντ = WνjSjτ
Spectrogram = Spectral Templatestimes Excitations
= times
ndash however spectrograms are not additive (a2 + b2 6= (a+ b)2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 36
Models for time-frequency Energy distributions
bull Mask models (Roweis 2001 Reyes-Gomez Jojic Ellis 2005 )
Xντ = [rντ = 0]S(0)ντ + [rντ = 1]S(1)
ντ
Spectrogram = Masktimes Source0 + (1minusMask)times Source1
= + +
ndash however sources do overlap in time and frequency
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 37
Prior structures on time-frequency Energy distributions
bull Main Idea Spectrogram is a point estimate of the energy at a
time-frequency atom k(ν τ)
bull We place a suitable prior on the variance of transform coefficients sk
and tie the prior variances across harmonically and temporally related
time-frequency atoms
p(s|v)p(v) =
(prod
k
p(sk|vk)
)
p(v)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 38
One channel source separation Gaussian source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim N (skn 0 vkn)
xk|sk1N =sumN
n=1 skn
bull Straightforward application of Bayesrsquo theorem yields
p(skn|vk1N xk) = N (skn κknxk vkn(1minus κkn))
κkn = vknsum
nprime
vknprime (Responsibilities)
bull Each source coefficient sn gets a fraction κn of the observation x
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 39
One channel source separation Poisson source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim PO(skn vkn)
xk|sk1N =sumN
n=1 skn
bull This is the generative model for the NMF when we write
vk(ντ)n = tνn times eτn (Templatetimes Excitation)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40
Gamma G(x a b) and Inverse Gamma IG(x a b)
0 1 2 3 4 50
02
04
06
08
1
12a = 09 b =1
a = 1 b =1
a = 13 b =1
a = 2 b =1
x
p(x)
0 1 2 3 4 50
02
04
06
08
1
12
14
a=1 b=1
a=1 b=05
a=2 b=1
G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))
IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))
bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale
bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1 K as follows
vk|zk sim IG(vk a zka)
zk+1|vk sim IG(zk+1 az vkaz)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
bull Variance variables v are priors for sources
bull Auxillary variables z are needed for conjugacy and positive correlation
bull Shape parameters a and az describe coupling strength and drift of the chain
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42
Gamma Chains typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43
Gamma Chains with changepoints typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44
Gamma Chains
bull The joint can be written as product of singleton and pairwise
potentials of form
ψkk = exp(minusazminus1k vminus1
k ) (Pairwise)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
φzk = exp((az + a+ 1) log zminus1
k ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45
Gamma Fields
bull The joint can be written as product of singleton and pairwise
potentials
ψij = exp(minusaijξminus1i ξminus1
j ) (Pairwise)
φi = exp((sum
j
aij + 1) log ξminus1i ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46
Possible Model Topologies
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47
Approximate Inference
bull Stochastic
ndash Markov Chain Monte Carlo Gibbs sampler
ndash Sequential Monte Carlo Particle Filtering
bull Deterministic
ndash Variational Bayes
In all these conjugacy helps
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))
bull Gibbs
v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z
(τ)k )ψkk+1(z
(τ)k+1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
bull Inference Variational Bayes
Noisy Original
X
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xorg
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xh SNR1998
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xv SNR2079
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xb SNR1968
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xg SNR1997
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51
Denoising - Music
[Figure: log-magnitude time-frequency images (color scale -18 to 0): Original, Noisy, PF (SNR853), Gibbs (SNR866), VB (SNR208)]
“Tristram (Matt Uelmen)” + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: “horizontal”, tied across time (harmonic continuity)
• Source 2: “vertical”, tied across frequency (transients, percussive sounds)
[Figure: Xorg, Shor and Sver, shown as frequency bin (ν) versus time (τ)]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

              s1                      s2
              SDR    SIR    SAR       SDR    SIR    SAR
VB            -474   -328   567       -158   1546   -137
Gibbs         -45    -262   457       105    1246   161
GibbsEM       -423   -242   482       134    1313   185
Pre-trained   -404   -315   813       356    1144   464
Oracle        614    1716   658       1266   1995   136

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

              s1                      s2
              SDR    SIR    SAR       SDR    SIR    SAR
VB            -78    -622   453       -235   184    -225
Gibbs         -846   -753   693       -404   1459   -383
GibbsEM       -774   -619   462       -114   1662   -097
Pre-trained   -64    -539   695       38     1639   414
Oracle        121    329    1214      2113   3389   2137
Harmonic-Transient Decomposition
[Figure: Xorg, Shor and Sver, shown as frequency bin (ν) versus time (τ); audio examples (Original), (Hor), (Vert)]
Chord Detection - Signal model (with Paul Peeling)
[Figure: log σ_ν versus frequency ν (Hz), 0 to 4000 Hz]
Chord Detection
[Figure: MDCT of a piano chord (MIDI notes 41, 48, 51, 56), frequency (Hz) versus time τ (s); and log Σ_ν v_{ν,j,τ} for MIDI notes j = 40 to 65 versus time τ (s)]
Multichannel Source Separation
• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
  $\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$, n = 1, ..., N
  $v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu \lambda_n))$
  $s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
  $x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$, m = 1, ..., M, k = 1, ..., K
  $a_m \sim \mathcal{N}(a_m; \cdots)$, $r_m \sim \mathcal{IG}(r_m; \cdots)$
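Sampling from this hierarchy top-down (ancestral sampling) shows how the levels fit together. A minimal sketch, assuming the deck's conventions $\mathcal{G}(x; a, z)$ (shape a, scale z) and $\mathcal{IG}(x; a, z)$, which is the law of 1/g when g ∼ $\mathcal{G}(g; a, z)$. Because the priors on a_m and r_m are elided, the mixing matrix `A` and the noise variances `r` are passed in as fixed, hypothetical values; the function name is also illustrative.

```python
import math
import random


def sample_multichannel_prior(K, A, r, a_lam, b_lam, nu, seed=0):
    """Ancestral sample from the hierarchical multichannel source model.

    A: M x N mixing matrix (list of rows, hypothetical fixed values)
    r: list of M observation noise variances (hypothetical fixed values)
    Returns (lam, S, X) with S[k][n] = s_{k,n} and X[k][m] = x_{k,m}.
    """
    rng = random.Random(seed)
    M, N = len(A), len(A[0])
    # lambda_n ~ G(a_lam, b_lam): overall "volume" of source n
    lam = [rng.gammavariate(a_lam, b_lam) for _ in range(N)]
    S, X = [], []
    for _ in range(K):
        # v_{k,n} ~ IG(nu/2, 2/(nu lambda_n)), i.e. 1 / Gamma(shape nu/2, scale 2/(nu lambda_n))
        v = [1.0 / rng.gammavariate(nu / 2.0, 2.0 / (nu * lam[n])) for n in range(N)]
        # s_{k,n} ~ N(0, v_{k,n}); random.gauss takes a standard deviation
        s = [rng.gauss(0.0, math.sqrt(v[n])) for n in range(N)]
        # x_{k,m} ~ N(a_m^T s_k, r_m)
        x = [sum(A[m][n] * s[n] for n in range(N)) + rng.gauss(0.0, math.sqrt(r[m]))
             for m in range(M)]
        S.append(s)
        X.append(x)
    return lam, S, X


# Usage: two channels, three sources (underdetermined)
lam, S, X = sample_multichannel_prior(
    K=100, A=[[1.0, 0.5, 0.2], [0.3, 1.0, 0.7]], r=[0.01, 0.01],
    a_lam=2.0, b_lam=1.0, nu=4.0)
print(lam, X[0])
```

Marginally over v_{k,n}, each s_{k,n} is Student-t distributed, which is what gives the sources their heavy tails under this prior.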
Equivalent Gamma MRF
• A tree for each source
• λ_n can be interpreted as the overall “volume” of source n
Source Separation
[Figure: spectrograms (t in sec, f in Hz) of the sources (Speech), (Piano), (Guitar) and of the (Mix)]
Reconstructions
[Figure: spectrograms (t in sec, f in Hz) of the reconstructed (Speech), (Piano) and (Guitar)]
Multimodality
• The problem is typically underdetermined (fewer channels than sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
[Figure: sample traces of a, λ and r versus epoch (0 to 2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features):
  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hands
  - Create a quantized, human-readable score
  - ...
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
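Once a problem is cast as an HMM, online inference is the forward (filtering) recursion $p(x_k \mid y_{1:k}) \propto p(y_k \mid x_k) \sum_{x_{k-1}} p(x_k \mid x_{k-1})\, p(x_{k-1} \mid y_{1:k-1})$. A minimal generic sketch for a discrete state space follows; the function name and the list-of-lists representation are assumptions, and offline (batch) smoothing would add a backward pass on top of it.

```python
def hmm_filter(prior, trans, lik):
    """Forward filtering for a discrete HMM.

    prior[i]    = p(x_1 = i)
    trans[i][j] = p(x_{k+1} = j | x_k = i)
    lik[k][i]   = p(y_k | x_k = i)
    Returns a list whose k-th entry is the filtered distribution p(x_k | y_{1:k}).
    """
    S = len(prior)
    filtered = []
    pred = list(prior)
    for k, l in enumerate(lik):
        if k > 0:
            prev = filtered[-1]
            # prediction: sum over the previous state
            pred = [sum(prev[i] * trans[i][j] for i in range(S)) for j in range(S)]
        # correction: multiply by the likelihood and normalise
        post = [pred[j] * l[j] for j in range(S)]
        Z = sum(post)
        filtered.append([p / Z for p in post])
    return filtered


# Usage: a sticky two-state chain, one informative and one uninformative frame
f = hmm_filter([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.5, 0.5]])
print(f)  # first frame [0.9, 0.1]; second frame is just the prediction
```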
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: lattice of states with bar position m_k = 1, ..., 8 (horizontal) and tempo level n_k = 1, 2, 3 (vertical), with markers for 3/4 time and 4/4 time]
• Each dot denotes a state x = (m, n) (score position, tempo level)
• Directed arcs denote state transitions with positive probability
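The state space above can be turned into an explicit transition matrix. This is a minimal sketch under assumed dynamics, not necessarily the exact model of Whiteley, Cemgil and Godsill: the bar position m (0-based here) advances by the current tempo level n and wraps at the end of the bar, while n stays put with probability 1 - `p_change` and otherwise moves to a neighbouring level. `p_change` and the function name are hypothetical.

```python
def bar_pointer_transitions(M, N, p_change=0.1):
    """Transition probabilities over bar-pointer states x = (m, n).

    Assumed dynamics (hypothetical sketch):
      m_{k+1} = (m_k + n_k) mod M, with m in 0..M-1
      n_{k+1} = n_k with prob 1 - p_change, else a uniformly chosen neighbour in 1..N
    Returns (states, T) with T[i][j] = p(x_{k+1} = states[j] | x_k = states[i]).
    """
    states = [(m, n) for m in range(M) for n in range(1, N + 1)]
    index = {s: i for i, s in enumerate(states)}
    T = [[0.0] * len(states) for _ in states]
    for m, n in states:
        m_next = (m + n) % M  # advance by the tempo, wrap at the end of the bar
        nbrs = [n2 for n2 in (n - 1, n + 1) if 1 <= n2 <= N]
        moves = {n: 1.0 - p_change if nbrs else 1.0}
        for n2 in nbrs:
            moves[n2] = p_change / len(nbrs)
        for n2, p in moves.items():
            T[index[(m, n)]][index[(m_next, n2)]] += p
    return states, T


# Usage: 8 bar positions and 3 tempo levels, as in the figure
states, T = bar_pointer_transitions(8, 3)
print(len(states), sum(T[0]))
```

Because every row of T has only a few non-zero entries, a practical implementation would store it sparsely; the dense form is used here only for readability.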
Bar position Pointer - transition model
[Figure: three panels of the transition distribution p(x_2 | x_1) over bar position (1 to 8) and tempo level (1 to 5)]
Bar position Pointer - k = 1
[Figure: p(x_1) over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: p(x_2) over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: p(x_3) over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: p(x_4) over bar position and tempo level]
Bar position Pointer - k = 5
[Figure: with y_5 = 0, the filtered distribution p(x_5 | y_{1:5}) over bar position and tempo level]
Bar position Pointer - k = 10
[Figure: p(x_10) over bar position and tempo level]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson, with an intensity $\mu_k$ that depends on the bar position $m_k$
[Figure: intensity μ_k versus bar position m_k (0 to 1000) for a triplet rhythm and for a duplet rhythm]
Tempo, Rhythm, Meter Analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Figure: graphical model with nodes n_k, θ_k, m_k, r_k (k = 0, ..., 3), λ_k and observations y_k (k = 1, ..., 3)]
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator
Filtering
[Figure: observed data y_k; filtered log p(m_k | y_{1:k}) over bar position m_k; filtered log p(n_k | y_{1:k}) over tempo (quarter notes per min, 60 to 180); p(r_k | y_{1:k}) for triplets versus duplets; all plotted against frame index k]
Smoothing
[Figure: observed data y_k; smoothed log p(m_k | y_{1:K}) over bar position m_k; smoothed log p(n_k | y_{1:K}) over tempo (quarter notes per min, 60 to 180); p(r_k | y_{1:K}) for triplets versus duplets; all plotted against frame index k]
Time Signature
[Figure: observed audio (sample value versus time in s); log p(m_k | z_{1:K}) over bar position m_k; log p(n_k | z_{1:K}) over tempo (quarter notes per min); p(θ_k | z_{1:K}) for 4/4 versus 3/4; all plotted against frame index k]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate its note events with the audio
[Figure: musical score and the audio signal x_t versus time t]
Score-Performance matching - Graphical Model
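[Figure: graphical model with nodes t_k, r_k, λ_k (k = 1, ..., K) and, for each ν = 1, ..., W, variances v_{ν,k} and coefficients s_{ν,k}]
$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau)))$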
Score-Performance matching - Signal model
[Figure: log σ_ν versus frequency ν (Hz), 0 to 4000 Hz]
Score-Performance matching
[Figure: spectrogram data, frequency (Hz) versus time (s); MIDI data, MIDI note (55 to 85) versus score position]
Online (filtering) or offline (smoothing) processing is possible.
Transcription
[Figure: log p(r_τ | s_τ) by MIDI note number (60 to 80) versus time (s); log of Σ_i w^{(i)}_τ λ^{(i)}_τ versus time (s); MDCT of the audio (source: Daniel-Ben Pienaar), frequency (Hz) versus time (s)]
Summary
• “Time domain”: switching state space models
  - State space modeling
  - Conditionally linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
  - Analysis down to sample precision (if required)
  - Computationally quite heavy
• “Transform domain”: Gamma fields
  - Models on (orthogonal) transform coefficients; energy compaction
  - Practical: can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
  - time-frequency energy distributions
• Ongoing work
  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    * Chord detection, polyphonic transcription
    * Musical-score-guided source separation
  - Prior structures for other observation models (e.g. NMF)
Exact Inference in switching state space models is intractable
• In general, exact inference is NP-hard
  - Conditional Gaussians are not closed under marginalization
  ⇒ Unlike in HMMs or KFMs, summing over r_k does not simplify the filtering density
  ⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially
Exact Inference for Changepoint detection
• Exact inference is achievable in polynomial time/space
  - Intuition: when a changepoint occurs, the state vector θ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; Ó Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)
[Figure: changepoint chain with r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0, states θ_0, ..., θ_5 and observations y_1, ..., y_5]
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
  ⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of the conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
Monophonic model (Cemgil et al. 2006)
• We introduce a pitch label indicator m
• At each time k, the process can be in one of the {“mute”, “sound”} × M states
[Figure: graphical model with indicators r_0, ..., r_T, pitch labels m_0, ..., m_T, states s_0, ..., s_T and observations y_1, ..., y_T]
Monophonic Pitch Tracking
Monophonic pitch tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$
[Figure: signal over k = 1 to 1000 and the filtered pitch label over the same range]
• If the pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
[Figure: exact inference result over samples 500 to 3500]
Tracking Pitch Variations
• Allow m to change with k
• Intractable: we need to resort to approximate inference (mixture Kalman filter, Rao-Blackwellized particle filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: latent indicators over frequency ν and time k, and the observed signal x_k]
• Each latent changepoint process ν = 1, ..., W corresponds to a “piano key”. The indicators r_{1:W,1:K} encode a latent “piano roll”.
Single time slice - Bayesian Variable Selection
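$r_i \sim \mathcal{C}(r_i; \pi_{\mathrm{on}}, \pi_{\mathrm{off}})$
$s_i \mid r_i \sim [r_i = \mathrm{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \mathrm{on}]\, \delta(s_i)$
$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$
$C \equiv [\, C_1 \ \cdots \ C_i \ \cdots \ C_W \,]$
[Figure: indicators r_1, ..., r_W, sources s_1, ..., s_W and the observation x]
• Generalized linear model: the columns of C are the basis vectors
• The exact posterior is a mixture of 2^W Gaussians
• When W is large, computing posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman; Attias; ...)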
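For small W the 2^W-component posterior can be enumerated directly, which also makes the exponential cost visible. A minimal sketch, assuming a scalar observation $x = \sum_i c_i s_i + \epsilon$ with scalar weights c_i, a scalar prior variance σ² in place of Σ, and noise variance R. Conditioned on r, x is zero-mean Gaussian with variance $R + \sigma^2 \sum_{i: r_i = \mathrm{on}} c_i^2$, so $p(r \mid x) \propto p(r)\, \mathcal{N}(x; 0, \cdot)$. The function names are illustrative.

```python
import itertools
import math


def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def indicator_posterior(x, c, sigma2, R, pi_on):
    """Exact p(r_{1:W} | x) by enumerating all 2**W on/off configurations.

    Scalar-observation simplification (assumed): x = sum_i c[i] s_i + noise,
    s_i ~ N(0, sigma2) if r_i = 1 (on), s_i = 0 otherwise, noise ~ N(0, R).
    """
    weights = {}
    for r in itertools.product((0, 1), repeat=len(c)):
        prior = 1.0
        for ri in r:
            prior *= pi_on if ri else 1.0 - pi_on
        # marginal variance of x given r: only "on" sources contribute
        var = R + sigma2 * sum(ci * ci for ci, ri in zip(c, r) if ri)
        weights[r] = prior * normal_pdf(x, var)
    Z = sum(weights.values())
    return {r: w / Z for r, w in weights.items()}


# Usage: a large observation makes the all-off configuration implausible
post = indicator_posterior(5.0, [1.0, 1.0, 1.0], sigma2=1.0, R=0.01, pi_on=0.5)
print(max(post, key=post.get))
```

Given each configuration r, the posterior over s is Gaussian, so the full posterior over s is the mixture of 2^W Gaussians weighted by these probabilities.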
Factorial Switching State space model
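$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$
$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$
$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$  (changepoint indicator)
$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_{k,\nu})\, \theta_{k-1,\nu}, Q_\nu(r_{k,\nu}))$  (latent state)
$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k\, \theta_{k,1:W}, R)$  (observation)
[Figure: for each ν = 1, ..., W, a chain of indicators r^ν_0, ..., r^ν_k, ..., r^ν_K and states s^ν_0, ..., s^ν_k, ..., s^ν_K, jointly generating the observations y_k, ..., y_K]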
Synthetic Data
[Figure: synthetic example, latent indicators over frequency ν and time k, and the generated signal x]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable: computations with large matrices
  - Need advanced techniques from linear algebra
  - Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical: acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, generalised linear model
• Feature based
Spectrogram
• Basis functions φ_k(t) centered around a time-frequency atom k = k(ν, τ) = (frequency, time), as in the STFT or MDCT:
  $x(t) = \sum_k s_k \phi_k(t)$
[Figure: spectrograms (t in sec, f in Hz) of (Speech) and (Piano)]
• The spectrogram displays log|s_k| or |s_k|^2 (of the STFT)
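The squared STFT coefficients |s_k|^2 on a (frequency, time) grid can be computed directly. A minimal sketch, assuming a periodic Hann window and a naive O(N²) DFT per frame for clarity (a real implementation would use an FFT); it covers the STFT only, not the MDCT, and the function name is illustrative.

```python
import cmath
import math


def stft_power(x, win_len, hop):
    """Power spectrogram |s_k|^2 of a real signal via a windowed DFT.

    Returns a list of frames; each frame holds win_len // 2 + 1 non-negative
    frequency bins. Uses a periodic Hann window and a naive DFT (assumptions).
    """
    w = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / win_len) for n in range(win_len)]
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + n] * w[n] for n in range(win_len)]
        spec = []
        for f in range(win_len // 2 + 1):
            s = sum(seg[n] * cmath.exp(-2j * math.pi * f * n / win_len)
                    for n in range(win_len))
            spec.append(abs(s) ** 2)
        frames.append(spec)
    return frames


# Usage: a 1 kHz tone sampled at 8 kHz peaks at bin 1000 / 8000 * 64 = 8
fs = 8000
x = [math.sin(2.0 * math.pi * 1000.0 * t / fs) for t in range(1024)]
P = stft_power(x, win_len=64, hop=32)
print(len(P), max(range(len(P[0])), key=P[0].__getitem__))
```

Taking `math.log` of each entry (plus a small floor) gives the log-magnitude display used in the figures.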
Models for time-frequency Energy distributions
• Non-negative matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)
  $X_{\nu,\tau} = \sum_j W_{\nu,j} S_{j,\tau}$
  Spectrogram = Spectral templates × Excitations
  - However, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
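A common way to fit X ≈ WS is with the multiplicative updates of Lee and Seung that monotonically decrease the generalised Kullback-Leibler divergence. The plain-Python sketch below is one standard NMF fitting rule with hypothetical function names, not necessarily the procedure used in the works cited above; it assumes a strictly positive X so that no division by zero occurs.

```python
import math
import random


def matmul(W, S):
    """Product of an F x J and a J x T matrix stored as lists of rows."""
    return [[sum(w_fj * S[j][t] for j, w_fj in enumerate(row)) for t in range(len(S[0]))]
            for row in W]


def kl_divergence(X, V):
    """Generalised KL divergence D(X || V) = sum X log(X / V) - X + V."""
    total = 0.0
    for x_row, v_row in zip(X, V):
        for x, v in zip(x_row, v_row):
            total += (x * math.log(x / v) if x > 0 else 0.0) - x + v
    return total


def nmf_kl(X, J, n_iter=100, seed=0):
    """Multiplicative KL-NMF: X (F x T) ~ W (F x J) S (J x T), all entries positive."""
    rng = random.Random(seed)
    F, T = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(J)] for _ in range(F)]
    S = [[rng.random() + 0.1 for _ in range(T)] for _ in range(J)]
    for _ in range(n_iter):
        V = matmul(W, S)
        # S_jt <- S_jt * (sum_f W_fj X_ft / V_ft) / (sum_f W_fj)
        for j in range(J):
            w_sum = sum(W[f][j] for f in range(F))
            for t in range(T):
                S[j][t] *= sum(W[f][j] * X[f][t] / V[f][t] for f in range(F)) / w_sum
        V = matmul(W, S)
        # W_fj <- W_fj * (sum_t S_jt X_ft / V_ft) / (sum_t S_jt)
        for j in range(J):
            s_sum = sum(S[j][t] for t in range(T))
            for f in range(F):
                W[f][j] *= sum(S[j][t] * X[f][t] / V[f][t] for t in range(T)) / s_sum
    return W, S


# Usage: recover a rank-2 "spectrogram" built from two templates
W_true = [[1.0, 0.2], [0.5, 1.0], [0.1, 2.0], [1.5, 0.3]]
S_true = [[1.0, 2.0, 0.5, 1.0, 3.0], [0.3, 0.1, 2.0, 1.0, 0.5]]
X = matmul(W_true, S_true)
W, S = nmf_kl(X, J=2)
print(kl_divergence(X, matmul(W, S)))
```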
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)
  $X_{\nu,\tau} = [r_{\nu,\tau} = 0]\, S^{(0)}_{\nu,\tau} + [r_{\nu,\tau} = 1]\, S^{(1)}_{\nu,\tau}$
  Spectrogram = Mask × Source0 + (1 - Mask) × Source1
  - However, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)
• We place a suitable prior on the variance of the transform coefficients s_k, and tie the prior variances across harmonically and temporally related time-frequency atoms:
  $p(s \mid v)\, p(v) = \left( \prod_k p(s_k \mid v_k) \right) p(v)$
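One-channel source separation: Gaussian source model
[Figure: for k = 1, ..., K, variances v_{k,1}, ..., v_{k,N} generate sources s_{k,1}, ..., s_{k,N}, which sum to the observation x_k]
$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• A straightforward application of Bayes' theorem yields
  $p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))$
  $\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$  (responsibilities)
• Each source coefficient s_n gets a fraction κ_n of the observation x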
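For a single atom k, the posterior above is a Wiener-style split of x_k among the sources. A minimal sketch (the function name is illustrative) that returns the posterior means κ_n x and variances v_n(1 - κ_n) for given variances v_{k,1:N}:

```python
def gaussian_split(x, v):
    """Posterior of each source given x = sum_n s_n with s_n ~ N(0, v[n]).

    Returns (means, variances): mean_n = kappa_n * x and
    var_n = v[n] * (1 - kappa_n), with kappa_n = v[n] / sum(v).
    """
    total = sum(v)
    kappa = [vn / total for vn in v]  # responsibilities
    means = [k * x for k in kappa]
    variances = [vn * (1.0 - k) for vn, k in zip(v, kappa)]
    return means, variances


# Usage: the louder source receives the larger share of the observation
means, variances = gaussian_split(3.0, [1.0, 2.0])
print(means, variances)
```

Since the responsibilities sum to one, the posterior means always add back up to x, i.e. the reconstruction is exact.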
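One-channel source separation: Poisson source model
[Figure: for k = 1, ..., K, intensities v_{k,1}, ..., v_{k,N} generate sources s_{k,1}, ..., s_{k,N}, which sum to the observation x_k]
$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• This is the generative model for NMF when we write
  $v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$  (template × excitation)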
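In the Poisson model, conditioning on the total x_k makes the sources multinomial, $s_{k,1:N} \mid x_k \sim \mathcal{M}(x_k; \kappa_{k,1:N})$ with the same responsibilities κ_{k,n} = v_{k,n} / Σ_{n'} v_{k,n'} as in the Gaussian case, so E[s_{k,n} | x_k] = κ_{k,n} x_k. This split can be read as the expectation step behind KL-NMF. A minimal sketch with illustrative function names:

```python
import random
from collections import Counter


def poisson_split_sample(x, v, rng):
    """Sample s_{1:N} | x for s_n ~ Poisson(v[n]) conditioned on sum(s) = x.

    Each of the x counts is assigned to source n with probability v[n] / sum(v).
    """
    draws = rng.choices(range(len(v)), weights=v, k=x)
    counts = Counter(draws)
    return [counts.get(n, 0) for n in range(len(v))]


def poisson_split_mean(x, v):
    """Posterior mean E[s_n | x] = x * v[n] / sum(v)."""
    total = sum(v)
    return [x * vn / total for vn in v]


# Usage
rng = random.Random(1)
print(poisson_split_sample(10, [1.0, 3.0], rng), poisson_split_mean(8, [1.0, 3.0]))
```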
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: densities p(x) for x from 0 to 5: Gamma with (a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1); inverse Gamma with (a, b) = (1, 1), (1, 0.5), (2, 1)]
$\mathcal{G}(x; a, z) \equiv \exp((a - 1)\log x - z^{-1} x + a \log z^{-1} - \log\Gamma(a))$
$\mathcal{IG}(x; a, z) \equiv \exp((a + 1)\log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log\Gamma(a))$
• Gamma: conjugate prior for the Gaussian precision, the Poisson intensity and the inverse Gamma scale
• Inverse Gamma: conjugate prior for the Gaussian variance and the Gamma scale
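The two definitions above translate directly into log-densities. In this convention z is a scale parameter, and $\mathcal{IG}(x; a, z)$ is the density of 1/g when g ∼ $\mathcal{G}(g; a, z)$, which the change-of-variables identity $\mathcal{IG}(x; a, z) = \mathcal{G}(1/x; a, z)/x^2$ makes checkable. A minimal sketch with illustrative function names:

```python
import math


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a-1) log x - x/z + a log(1/z) - log Gamma(a); z is a scale."""
    return (a - 1.0) * math.log(x) - x / z + a * math.log(1.0 / z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = (a+1) log(1/x) - 1/(z x) + a log(1/z) - log Gamma(a)."""
    return (a + 1.0) * math.log(1.0 / x) - 1.0 / (z * x) + a * math.log(1.0 / z) - math.lgamma(a)


# Usage: IG(x; a, z) equals G(1/x; a, z) times the Jacobian 1/x^2
x, a, z = 2.0, 3.0, 0.5
print(log_inv_gamma_pdf(x, a, z), log_gamma_pdf(1.0 / x, a, z) - 2.0 * math.log(x))
```

Under this parameterisation the Gamma mean is a·z, and the inverse Gamma mean is 1/(z(a - 1)) for a > 1.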
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1, ..., K as follows:
  $v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k / a)$
  $z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k / a_z)$
[Figure: chain z_1 ··· v_{k-1}, z_k, v_k, z_{k+1} ···, with coupling a_z between v_{k-1} and z_k, a between z_k and v_k, and a_z between v_k and z_{k+1}]
• The variance variables v parameterise the source priors
• The auxiliary variables z are needed for conjugacy and positive correlation
• The shape parameters a and a_z describe the coupling strength and the drift of the chain
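Typical draws of the chain can be generated by ancestral sampling. A minimal sketch, assuming the deck's inverse Gamma convention, under which $\mathcal{IG}(x; a, z)$ is the law of 1/g for g ∼ $\mathcal{G}(g; a, z)$ (shape a, scale z), and a hypothetical starting value `z1`. With this parameterisation E[1/v_k | z_k] = z_k and E[1/z_{k+1} | v_k] = v_k, so v_{k+1} tends to stay close to v_k, which is the positive correlation the auxiliary variables provide.

```python
import random


def sample_igmc(K, a, az, z1=1.0, seed=0):
    """Ancestral sample of an inverse Gamma-Markov chain.

    v_k | z_k       ~ IG(v_k; a, z_k / a)       -> 1 / Gamma(shape a, scale z_k / a)
    z_{k+1} | v_k   ~ IG(z_{k+1}; az, v_k / az) -> 1 / Gamma(shape az, scale v_k / az)
    z1 is a hypothetical fixed starting value. Returns (v_{1:K}, z_{1:K}).
    """
    rng = random.Random(seed)
    v, z = [], [z1]
    for k in range(K):
        v.append(1.0 / rng.gammavariate(a, z[k] / a))
        z.append(1.0 / rng.gammavariate(az, v[k] / az))
    return v, z[:K]


# Usage: the coupling settings shown in the figures below
for az in (10.0, 4.0, 40.0):
    v, _ = sample_igmc(1000, a=10.0, az=az)
    print(az, min(v), max(v))
```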
Gamma Chains: typical draws
[Figure: log v_k versus k = 1, ..., 1000 for (a, a_z) = (10, 10), (10, 4) and (10, 40)]
Gamma Chains with changepoints: typical draws
[Figure: log v_k versus k = 1, ..., 1000 for (a, a_z) = (10, 10), (10, 4) and (10, 40)]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
  $\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$  (pairwise)
  $\phi_{z_k} = \exp((a_z + a + 1) \log z_k^{-1})$  (singletons)
[Figure: chain z_1 ··· v_{k-1}, z_k, v_k, z_{k+1} ···, with couplings a_z, a, a_z]
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
  $\psi_{i,j} = \exp(-a_{i,j}\, \xi_i^{-1} \xi_j^{-1})$  (pairwise)
  $\phi_i = \exp\left( \left( \sum_j a_{i,j} + 1 \right) \log \xi_i^{-1} \right)$  (singletons)
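The unnormalised log-density of a Gamma field is the sum of these log-potentials over edges and nodes. A minimal sketch, assuming each undirected edge (i, j) is listed once with its coupling a_{i,j}; the function name and the dict-based graph encoding are illustrative. For an interior z_k of a chain, with couplings a to v_k and a_z to v_{k-1}, the singleton exponent Σ_j a_{i,j} + 1 reduces to a_z + a + 1, matching the chain potentials.

```python
import math


def gamma_field_log_potential(xi, edges):
    """Unnormalised log-density of a Gamma field.

    xi:    dict node -> positive value xi_i
    edges: dict (i, j) -> coupling a_ij, each undirected edge listed once
    Sum of log psi_ij = -a_ij / (xi_i xi_j) over edges and
    log phi_i = (sum_j a_ij + 1) * log(1 / xi_i) over nodes.
    """
    coupling = {i: 0.0 for i in xi}
    total = 0.0
    for (i, j), a_ij in edges.items():
        total += -a_ij / (xi[i] * xi[j])  # pairwise potential
        coupling[i] += a_ij
        coupling[j] += a_ij
    for i, x in xi.items():
        total += (coupling[i] + 1.0) * math.log(1.0 / x)  # singleton potential
    return total


# Usage: a three-node chain v1 - z2 - v2 with couplings az and a (hypothetical values)
print(gamma_field_log_potential({"v1": 1.0, "z2": 0.5, "v2": 2.0},
                                {("v1", "z2"): 4.0, ("z2", "v2"): 10.0}))
```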
Possible Model Topologies
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47
Approximate Inference
bull Stochastic
ndash Markov Chain Monte Carlo Gibbs sampler
ndash Sequential Monte Carlo Particle Filtering
bull Deterministic
ndash Variational Bayes
In all these conjugacy helps
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))
bull Gibbs
v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z
(τ)k )ψkk+1(z
(τ)k+1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
bull Inference Variational Bayes
Noisy Original
X
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xorg
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xh SNR1998
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xv SNR2079
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xb SNR1968
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xg SNR1997
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51
Denoising ndash MusicOriginal
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Noisy
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
PF SNR853
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Gibbs SNR866
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
VB SNR208
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52
Single Channel Source Separation (with Onur Dikmen)
bull Source 1 Horizontal Tie across time harmonic continuity
bull Source 2 Vertical Tie across frequency transients percussive sounds
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53
Single Channel Source Separation with IGMCs
E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Preminus trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
bull Oracle We use the square of the source coefficient as the latent variance estimate
bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar Position Pointer: Transition Model
[Figure: three panels of the transition density $p(x_2 \mid x_1)$ over bar position (1 to 8) and tempo level (1 to 5)]
Bar Position Pointer: k = 1
[Figure: $p(x_1)$ over bar position and tempo level]
Bar Position Pointer: k = 2
[Figure: $p(x_2)$ over bar position and tempo level]
Bar Position Pointer: k = 3
[Figure: $p(x_3)$ over bar position and tempo level]
Bar Position Pointer: k = 4
[Figure: $p(x_4)$ over bar position and tempo level]
Bar Position Pointer: k = 5
[Figure: filtering density $p(x_5 \mid y_{1:5})$ over bar position and tempo level, after observing $y_5 = 0$]
Bar Position Pointer: k = 10
[Figure: $p(x_{10})$ over bar position and tempo level]
Bar Position Pointer: Observation Model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson with an intensity $\mu_k$ that depends on the bar position $m_k$
[Figure: intensity $\mu_k$ versus bar position $m_k$ for a triplet rhythm and for a duplet rhythm]
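With a transition matrix and a Poisson observation model, filtering $p(x_k \mid y_{1:k})$ is the standard HMM forward recursion. A minimal Python/NumPy sketch, assuming one count $y_k$ per frame and a hypothetical intensity vector `mu` that gives the Poisson rate for each state (in the bar-pointer model the rate would depend on the bar position and the rhythm pattern):

```python
import numpy as np
from math import lgamma

def poisson_loglik(y, mu):
    """log Poisson(y; mu) for one observed count y and a vector of state-dependent intensities mu."""
    mu = np.asarray(mu, dtype=float)
    return y * np.log(mu) - mu - lgamma(y + 1)

def forward_filter(prior, A, ys, mu):
    """Normalised forward recursion: row k of the result is p(x_k | y_{1:k}).

    prior: p(x_1), shape (S,); A[i, j] = p(x_{k+1} = j | x_k = i);
    mu[s]: Poisson intensity of y_k when x_k = s.
    """
    filtered = []
    pred = np.asarray(prior, dtype=float)
    for k, y in enumerate(ys):
        if k > 0:
            pred = filtered[-1] @ A                                  # predict step
        log_post = np.log(pred + 1e-300) + poisson_loglik(y, mu)     # update step
        post = np.exp(log_post - log_post.max())
        filtered.append(post / post.sum())
    return np.array(filtered)

# Toy example: four bar positions visited cyclically, counts likely only at position 1.
A = np.roll(np.eye(4), 1, axis=1)
alphas = forward_filter(np.full(4, 0.25), A, [3, 0], [2.0, 0.1, 0.5, 0.1])
```

Working in the log domain and renormalising at every step keeps the recursion stable over long sequences.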
Tempo, Rhythm, Meter Analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Figure: graphical model with chains $n_{0:3}$, $\theta_{0:3}$, $m_{0:3}$, $r_{0:3}$, intensities $\lambda_{1:3}$ and observations $y_{1:3}$]
• θ: time signature indicator (e.g., 3/4, 4/4); r: rhythmic pattern indicator
Filtering
[Figure: observed data $y_k$; filtering densities $\log p(m_k \mid y_{1:k})$ and $\log p(n_k \mid y_{1:k})$ (tempo in quarter notes per minute), and $p(r_k \mid y_{1:k})$ for triplets versus duplets; all against frame index k]
Smoothing
[Figure: observed data $y_k$; smoothing densities $\log p(m_k \mid y_{1:K})$ and $\log p(n_k \mid y_{1:K})$ (tempo in quarter notes per minute), and $p(r_k \mid y_{1:K})$ for triplets versus duplets; all against frame index k]
Time Signature
[Figure: observed audio (sample value versus time in s); smoothing densities $\log p(m_k \mid z_{1:K})$ and $\log p(n_k \mid z_{1:K})$ (quarter notes per minute), and $p(\theta_k \mid z_{1:K})$ for 4/4 versus 3/4; all against frame index k]
Score-Performance Matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt and audio signal $x_t$ versus time t]
Score-Performance Matching: Graphical Model
[Figure: graphical model with plate ν = 1, …, W; chains $t_{1:K}$, $r_{1:K}$, $\lambda_{1:K}$; variances $v_{\nu,1:K}$ and coefficients $s_{\nu,1:K}$; $r_k$ indexes score positions 1 to 8]
$v_{\nu\tau} \sim \mathcal{IG}(v_{\nu\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau)))$
Score-Performance Matching: Signal Model
[Figure: spectral template $\log \sigma_\nu$ versus frequency ν (Hz), 0 to 4000 Hz]
Score-Performance matching
[Figure: spectrogram data (frequency in Hz versus time in s) and MIDI data (MIDI note versus score position)]
Online (filtering) or offline (smoothing) processing is possible.
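Offline (smoothing) processing adds a backward pass to the filter. A minimal Python/NumPy sketch of the textbook forward-backward algorithm for a generic discrete-state HMM, assuming the observation likelihoods have already been evaluated into a (K, S) array `lik`; it illustrates the recursion and does not reproduce the ISMIR 2007 implementation:

```python
import numpy as np

def forward_backward(prior, A, lik):
    """Filtered p(x_k | y_{1:k}) and smoothed p(x_k | y_{1:K}) marginals of a discrete HMM.

    prior: p(x_1), shape (S,); A[i, j] = p(x_{k+1} = j | x_k = i);
    lik[k, s] = p(y_k | x_k = s), shape (K, S) (any positive per-row scaling is fine).
    """
    K, S = lik.shape
    alpha = np.zeros((K, S))
    a = prior * lik[0]
    alpha[0] = a / a.sum()
    for k in range(1, K):                         # forward pass (filtering)
        a = (alpha[k - 1] @ A) * lik[k]
        alpha[k] = a / a.sum()
    beta = np.ones((K, S))
    for k in range(K - 2, -1, -1):                # backward pass: beta[k] ~ p(y_{k+1:K} | x_k)
        b = A @ (lik[k + 1] * beta[k + 1])
        beta[k] = b / b.sum()
    gamma = alpha * beta
    return alpha, gamma / gamma.sum(axis=1, keepdims=True)

# Toy check: with a static state (A = I) every smoothed marginal equals the final filtered one.
prior = np.array([0.5, 0.5])
lik = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
filt, smooth = forward_backward(prior, np.eye(2), lik)
```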
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ over MIDI note number versus time (s); $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ on a log λ axis versus time (s); MDCT of audio (source: Daniel-Ben Pienaar), frequency (Hz) versus time (s)]
Summary
• "Time domain": switching state space models
  – State space modeling
  – Conditionally linear dynamical systems, Gaussian processes (e.g., AR, ARMA)
  – Analysis down to sample precision (if required)
  – Computationally quite heavy
• "Transform domain": Gamma fields
  – Models on (orthogonal) transform coefficients; energy compaction
  – Practical: can make use of fast transforms (FFT, MDCT, …)
  – Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
  – time-frequency energy distributions
• Ongoing work
  – Comparison of inference methods (VB, MCMC, SMC)
  – Learning
  – Applications
    ∗ Chord detection, polyphonic transcription
    ∗ Musical-score-guided source separation
  – Prior structures for other observation models: NMF
Exact Inference for Changepoint Detection
• Exact inference is achievable in polynomial time/space
  – Intuition: when a changepoint occurs, the state vector θ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see, e.g., Barry and Hartigan 1992; Digalakis et al. 1993; Ó Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)
[Figure: changepoint model with indicators $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$, states $\theta_{0:5}$ and observations $y_{1:5}$]
• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
  ⇒ trajectories $r^{(i)}_{1:k}$ that are dominated in terms of the conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
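To make the intuition concrete, here is a minimal Python/NumPy sketch of exact filtering in the simplest such model, under illustrative assumptions not taken from the slide: a scalar level θ that is redrawn from $\mathcal{N}(\mu_0, P_0)$ whenever $r_k = 1$ (probability `pi`) and otherwise stays fixed, observed as $y_k \sim \mathcal{N}(\theta_k, R)$. Each mixture kernel corresponds to one possible time of the last changepoint, so after k observations there are k + 1 kernels rather than $2^k$.

```python
import numpy as np

def changepoint_filter(ys, pi=0.05, mu0=0.0, P0=10.0, R=0.1):
    """Exact filtering for a scalar level that is reinitialised at changepoints.

    Assumed model: r_k = 1 with probability pi; then theta_k ~ N(mu0, P0),
    otherwise theta_k = theta_{k-1}; observations y_k ~ N(theta_k, R).
    Returns p(r_k = 1 | y_{1:k}) for every k and the final number of kernels.
    """
    logw = np.array([0.0])                  # log weights, one per "time of last changepoint"
    m, P = np.array([mu0]), np.array([P0])  # Gaussian posterior of theta within each kernel
    cp_prob = []
    for y in ys:
        # r_k = 0 keeps every kernel; r_k = 1 opens a single new kernel from the prior
        total = np.logaddexp.reduce(logw)
        logw = np.append(logw + np.log1p(-pi), total + np.log(pi))
        m, P = np.append(m, mu0), np.append(P, P0)
        # Kalman update of every kernel; weights pick up the predictive density of y_k
        S = P + R
        logw = logw - 0.5 * (np.log(2 * np.pi * S) + (y - m) ** 2 / S)
        gain = P / S
        m, P = m + gain * (y - m), (1.0 - gain) * P
        logw = logw - np.logaddexp.reduce(logw)
        cp_prob.append(np.exp(logw[-1]))    # newest kernel = "changepoint at time k"
    return np.array(cp_prob), len(logw)

ys = np.concatenate([np.zeros(20), 5.0 * np.ones(10)])
cp, n_kernels = changepoint_filter(ys)
```

The kernel count grows linearly with k; in practice kernels with negligible weight can additionally be pruned.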
Monophonic Model (Cemgil et al. 2006)
• We introduce a pitch label indicator m
• At each time k, the process can be in one of the {"mute", "sound"} × M states
[Figure: graphical model with indicators $r_{0:T}$, pitch labels $m_{0:T}$, states $s_{0:T}$ and observations $y_{1:T}$]
Monophonic Pitch Tracking
Monophonic pitch tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$
[Figure: audio signal (top) and pitch label posterior (bottom) over samples 100 to 1000]
• If the pitch is constant, exact inference is possible
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE Transactions on Audio, Speech, and Language Processing)
[Figure: exact inference result over samples 500 to 3500]
Tracking Pitch Variations
• Allow m to change with k
[Figure: tracking a time-varying pitch (horizontal axis 50 to 500)]
• Intractable; need to resort to approximate inference (mixture Kalman filter / Rao-Blackwellized particle filter)
Factorial Generative Models for Analysis of Polyphonic Audio
[Figure: latent processes indexed by frequency ν and time k, generating the observed signal $x_k$]
• Each latent changepoint process ν = 1, …, W corresponds to a "piano key". The indicators $r_{1:W, 1:K}$ encode a latent "piano roll"
Single Time Slice: Bayesian Variable Selection
$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$
$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$
$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$
$C \equiv [\, C_1 \ \cdots \ C_i \ \cdots \ C_W \,]$
[Figure: graphical model $r_{1:W} \to s_{1:W} \to x$]
• Generalized linear model: the columns of C are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When W is large, computing posterior quantities becomes intractable
• Sparsity by construction (Olshausen and Millman; Attias; …)
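The $2^W$ blow-up is visible in a brute-force Python/NumPy sketch that enumerates every on/off configuration and scores it with the Gaussian marginal likelihood $p(x \mid r) = \mathcal{N}(x; 0, \sigma^2 C_r C_r^\top + \rho I)$, where $C_r$ keeps only the active columns. For illustration it assumes scalar coefficients ($\Sigma = \sigma^2$), isotropic noise ($R = \rho I$) and a shared prior probability `pi_on`; the function names and defaults are hypothetical.

```python
import itertools
import numpy as np

def log_gauss0(x, cov):
    """log N(x; 0, cov), computed through a Cholesky factor for stability."""
    L = np.linalg.cholesky(cov)
    u = np.linalg.solve(L, x)
    return -0.5 * u @ u - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2 * np.pi)

def indicator_posterior(x, C, sigma2=1.0, noise_var=0.01, pi_on=0.5):
    """Exact p(r_{1:W} | x) by enumerating all 2^W on/off configurations.

    Assumes s_i ~ N(0, sigma2) when r_i is on (s_i = 0 otherwise),
    x = C s + N(0, noise_var * I), and one basis column C[:, i] per indicator.
    """
    D, W = C.shape
    configs = list(itertools.product([0, 1], repeat=W))
    logp = np.empty(len(configs))
    for idx, r in enumerate(configs):            # 2^W iterations
        on = np.flatnonzero(r)
        Cr = C[:, on]
        cov = sigma2 * Cr @ Cr.T + noise_var * np.eye(D)   # covariance of x given r
        log_prior = len(on) * np.log(pi_on) + (W - len(on)) * np.log1p(-pi_on)
        logp[idx] = log_prior + log_gauss0(x, cov)
    post = np.exp(logp - logp.max())
    return configs, post / post.sum()

configs, post = indicator_posterior(np.array([2.0, 0.0, 0.0]), np.eye(3))
```

Each configuration's conditional posterior over the active $s_i$ is Gaussian, which is why the full posterior is a $2^W$-component mixture.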
Factorial Switching State Space Model
$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$
$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$
$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$ (changepoint indicator)
$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_{k,\nu})\, \theta_{k-1,\nu}, Q_\nu(r_{k,\nu}))$ (latent state)
$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R)$ (observation)
[Figure: graphical model with chains $r^\nu_{0:K}$ and $s^\nu_{0:K}$ for ν = 1, …, W, and observations $y_{1:K}$]
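A minimal generative sketch of this model in Python/NumPy, under illustrative assumptions that are not from the slide: two regimes per chain (0 = mute, 1 = sound) with a shared 2×2 transition matrix `trans`, a 2-D state $\theta_{k,\nu}$ that rotates at a hypothetical angular frequency `omegas[nu]` and decays with a regime-dependent factor `rho[r]`, $Q_\nu(r) = q_r I$, and $C_k$ summing the first coordinate of every chain. Each chain starts from $\theta_{0,\nu} = 0$ in the mute regime.

```python
import numpy as np

def sample_fssm(K, omegas, trans, rho=(0.9, 0.999), q=(1e-4, 1e-2), noise_var=1e-3, seed=0):
    """Draw indicators r[k, nu] and observations y[k] from a factorial switching SSM.

    Illustrative choices: regime r in {0: mute, 1: sound}; A_nu(r) = rho[r] * rotation(omegas[nu]);
    Q_nu(r) = q[r] * I; C_k sums the first coordinate of every chain's 2-D state.
    """
    rng = np.random.default_rng(seed)
    W = len(omegas)
    r = np.zeros((K, W), dtype=int)
    theta = np.zeros((W, 2))                      # theta_{0, nu} = 0 for every chain
    y = np.zeros(K)
    for k in range(K):
        for nu, w in enumerate(omegas):
            prev = r[k - 1, nu] if k > 0 else 0   # start from the mute regime
            r[k, nu] = rng.choice(2, p=trans[prev])
            c, s = np.cos(w), np.sin(w)
            A = rho[r[k, nu]] * np.array([[c, -s], [s, c]])
            theta[nu] = A @ theta[nu] + np.sqrt(q[r[k, nu]]) * rng.standard_normal(2)
        y[k] = theta[:, 0].sum() + np.sqrt(noise_var) * rng.standard_normal()
    return r, y

trans = np.array([[0.95, 0.05], [0.05, 0.95]])
r, y = sample_fssm(200, [0.1, 0.3], trans)
```

Inference reverses this process: given only y, recover the indicator matrix r (the "piano roll"), which is where the combinatorial cost arises.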
Synthetic Data
[Figure: synthetic data: latent indicators (index ν) over time k, and the generated signal x]
Technical Difficulties
• Inference is computationally heavy
• Vanilla Kalman filtering methods are not numerically stable, as the computations involve large matrices
  – Need advanced techniques from linear algebra
  – Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical / acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, generalised linear models
• Feature based
Spectrogram
• Basis functions $\phi_k(t)$ centered around time-frequency atoms $k = k(\nu, \tau)$ = (frequency, time), such as the STFT or MDCT
$x(t) = \sum_k s_k \phi_k(t)$
[Figure: spectrograms, t (sec) versus f (Hz), of speech and of piano]
• The spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of the STFT)
Models for time-frequency Energy distributions
• Non-negative matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; …)
$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$
Spectrogram = Spectral templates × Excitations
  – However, spectrograms are not additive: $a^2 + b^2 \neq (a + b)^2$
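A minimal Python/NumPy NMF sketch for $X \approx WS$. It uses the Lee-Seung multiplicative updates for the generalised Kullback-Leibler divergence, chosen here because that divergence equals the Poisson negative log-likelihood up to a constant (cf. the Poisson source model); it is a standard algorithm, not necessarily the one used in the works cited above. The number of templates `J`, the iteration count and the random initialisation are illustrative.

```python
import numpy as np

def kl_div(X, Xhat):
    """Generalised KL divergence D(X || Xhat); the Poisson negative log-likelihood up to a constant."""
    return np.sum(X * np.log((X + 1e-12) / Xhat) - X + Xhat)

def nmf_kl(X, J, n_iter=200, seed=0):
    """Factor a non-negative (F x T) matrix as W @ S (spectral templates x excitations)."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.random((F, J)) + 0.1
    S = rng.random((J, T)) + 0.1
    history = []
    for _ in range(n_iter):
        S *= (W.T @ (X / (W @ S))) / W.sum(axis=0)[:, None]   # excitation update
        W *= ((X / (W @ S)) @ S.T) / S.sum(axis=1)[None, :]   # template update
        history.append(kl_div(X, W @ S))
    return W, S, np.array(history)

rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 8))
W, S, h = nmf_kl(X, 2)
```

The multiplicative form keeps W and S non-negative automatically, and the divergence is non-increasing from one iteration to the next.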
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; …)
$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$
Spectrogram = Mask × Source0 + (1 − Mask) × Source1
  – However, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variances of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms:
$p(s \mid v)\, p(v) = \left( \prod_k p(s_k \mid v_k) \right) p(v)$
One-Channel Source Separation: Gaussian Source Model
[Figure: graphical model with variances $v_{k,1:N}$, sources $s_{k,1:N}$ and observation $x_k$, for k = 1, …, K]
$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• A straightforward application of Bayes' theorem yields
$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))$
$\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$ (responsibilities)
• Each source coefficient $s_{k,n}$ receives a fraction $\kappa_{k,n}$ of the observation $x_k$
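The posterior above is a per-coefficient Wiener-style split. A minimal Python/NumPy sketch, assuming the variances $v_{k,n}$ are known and stored as an array of shape (K, N) (the function name is hypothetical):

```python
import numpy as np

def gaussian_posterior_split(x, v):
    """Posterior mean and variance of each s_{k,n} given x_k = sum_n s_{k,n}, s_{k,n} ~ N(0, v_{k,n}).

    x: observed coefficients, shape (K,); v: prior variances, shape (K, N).
    """
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities
    mean = kappa * x[:, None]                  # kappa_{k,n} x_k
    var = v * (1.0 - kappa)                    # v_{k,n} (1 - kappa_{k,n})
    return mean, var

x = np.array([3.0, -1.0])
v = np.array([[1.0, 2.0], [4.0, 4.0]])
mean, var = gaussian_posterior_split(x, v)
```

Because the responsibilities sum to one, the posterior means of the sources always add back up to $x_k$.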
One-Channel Source Separation: Poisson Source Model
[Figure: graphical model with intensities $v_{k,1:N}$, sources $s_{k,1:N}$ and observation $x_k$, for k = 1, …, K]
$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• This is the generative model for NMF when we write $v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (template × excitation)
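Conditioned on their sum $x_k$, independent Poisson variables with intensities $v_{k,n}$ are jointly multinomial with $x_k$ trials and probabilities $\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$, so the same responsibilities as in the Gaussian case appear. A minimal Python/NumPy sketch, assuming known intensities (the function name is hypothetical):

```python
import numpy as np

def poisson_posterior_split(x, v, rng=None):
    """Posterior of s_{k,1:N} given x_k = sum_n s_{k,n}, with s_{k,n} ~ Poisson(v_{k,n}).

    x: observed counts, shape (K,); v: intensities, shape (K, N).
    The conditional law is Multinomial(x_k, kappa_k) with kappa_k = v_k / sum_n v_{k,n}.
    Returns the posterior mean and, if an rng is supplied, one posterior draw.
    """
    kappa = v / v.sum(axis=1, keepdims=True)
    mean = kappa * np.asarray(x)[:, None]
    draw = None
    if rng is not None:
        draw = np.array([rng.multinomial(int(xk), pk) for xk, pk in zip(x, kappa)])
    return mean, draw

x = np.array([10, 0])
v = np.array([[1.0, 4.0], [2.0, 2.0]])
mean, draw = poisson_posterior_split(x, v, np.random.default_rng(0))
```

Using the mean $\kappa_{k,n} x_k$ in place of a draw is the E-step of EM for this model; with $v_{k(\nu,\tau),n} = t_{\nu,n} e_{\tau,n}$ the resulting EM iteration is known to coincide with the KL multiplicative NMF updates.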
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: Gamma densities p(x) for a = 0.9, 1, 1.3, 2 with b = 1; inverse Gamma densities for (a, b) = (1, 1), (1, 0.5), (2, 1)]
$\mathcal{G}(x; a, z) \equiv \exp\left((a - 1)\log x - z^{-1} x + a \log z^{-1} - \log\Gamma(a)\right)$
$\mathcal{IG}(x; a, z) \equiv \exp\left((a + 1)\log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log\Gamma(a)\right)$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity, and inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
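Under these definitions z is the scale of $\mathcal{G}$ (mean $az$) and $z^{-1}$ is the scale of $\mathcal{IG}$; in particular, if $y \sim \mathcal{G}(y; a, z)$ then $1/y$ follows $\mathcal{IG}(\cdot; a, z)$. A small Python/NumPy sketch that evaluates both log densities exactly as written and checks them numerically (grid range and sample sizes are arbitrary):

```python
import math
import numpy as np

def log_gamma_pdf(x, a, z):
    """log G(x; a, z): shape a, scale z (mean a * z)."""
    return (a - 1) * np.log(x) - x / z + a * np.log(1.0 / z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z): shape a, scale 1/z (the law of 1/y when y ~ G(a, z))."""
    return (a + 1) * np.log(1.0 / x) - 1.0 / (z * x) + a * np.log(1.0 / z) - math.lgamma(a)

# Both densities should integrate to (approximately) one on a fine grid ...
x = np.linspace(1e-6, 80.0, 800_001)
dx = x[1] - x[0]
print(np.exp(log_gamma_pdf(x, 2.0, 1.0)).sum() * dx)
print(np.exp(log_inv_gamma_pdf(x, 2.0, 1.0)).sum() * dx)   # tail beyond 80 is tiny
# ... and reciprocals of Gamma draws should have the inverse Gamma mean 1 / (z * (a - 1)).
rng = np.random.default_rng(0)
y = rng.gamma(shape=3.0, scale=0.5, size=200_000)
print((1.0 / y).mean(), 1.0 / (0.5 * (3.0 - 1.0)))
```

The reciprocal-Gamma relation is also the simplest way to draw inverse Gamma samples with NumPy.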
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1, …, K as follows:
$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k / a)$
$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k / a_z)$
[Figure: chain $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$ with couplings $a_z$, a, $a_z$]
• The variance variables v are the prior variances of the sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• The shape parameters a and $a_z$ determine the coupling strength and drift of the chain
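Because $\mathcal{IG}(x; a, z)$ has scale $z^{-1}$, the conditional means are $\mathbb{E}[v_k \mid z_k] = a / (z_k (a - 1)) \approx 1/z_k$ and $\mathbb{E}[z_{k+1} \mid v_k] \approx 1/v_k$, so consecutive variances are positively correlated. A minimal Python/NumPy sampler for drawing typical paths; the starting value `z1` and the seed are arbitrary choices:

```python
import numpy as np

def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Draw v_{1:K} from the inverse Gamma-Markov chain
    v_k | z_k ~ IG(v_k; a, z_k / a),  z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z).

    IG(x; alpha, z) has shape alpha and scale 1/z, so a draw is 1 / Gamma(shape=alpha, scale=z).
    """
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = z1
    for k in range(K):
        v[k] = 1.0 / rng.gamma(a, z / a)        # v_k | z_k
        z = 1.0 / rng.gamma(a_z, v[k] / a_z)    # z_{k+1} | v_k
    return v

v = sample_igmc(1000, a=10.0, a_z=10.0)
```

Plotting `np.log(v)` for different (a, a_z) pairs gives paths of the kind shown on the next slide; with a = a_z the log-variance performs a driftless random walk.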
Gamma Chains: Typical Draws
[Figure: typical draws of $\log v_k$ for k = 1, …, 1000, with (a, a_z) = (10, 10), (10, 4) and (10, 40)]
Gamma Chains with Changepoints: Typical Draws
[Figure: typical draws of $\log v_k$ with changepoints for k = 1, …, 1000, with (a, a_z) = (10, 10), (10, 4) and (10, 40)]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (pairwise)
[Figure: chain $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$ with couplings $a_z$, a, $a_z$]
$\phi_{z_k} = \exp\left((a_z + a + 1) \log z_k^{-1}\right)$ (singletons)
and analogously $\psi_{k,k+1} = \exp(-a_z\, v_k^{-1} z_{k+1}^{-1})$ and $\phi_{v_k} = \exp\left((a + a_z + 1) \log v_k^{-1}\right)$
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$ (pairwise)
$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right) \log \xi_i^{-1}\right)$ (singletons)
Possible Model Topologies
Approximate Inference
• Stochastic
  – Markov chain Monte Carlo: Gibbs sampler
  – Sequential Monte Carlo: particle filtering
• Deterministic
  – Variational Bayes
In all of these, conjugacy helps.
VB or Gibbs
[Factor graph: $\psi_{0,1}$, $z_1$, $\psi_{1,1}$, $v_1$, $\psi_{1,2}$, $z_2$, $\psi_{2,2}$, $v_2$, …, with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$q^{(\tau)}(v_k) \leftarrow \exp\left(\log p(y_k \mid v_k) + \log\phi_{v_k} + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$
• Gibbs
$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \phi_{v_k}\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$
VB or Gibbs
[Factor graph: $\psi_{0,1}$, $z_1$, $\psi_{1,1}$, $v_1$, $\psi_{1,2}$, $z_2$, $\psi_{2,2}$, $v_2$, …, with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$q^{(\tau)}(z_k) \leftarrow \exp\left(\log\phi_{z_k} + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$
• Gibbs
$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \phi_{z_k}\, \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$
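With Gaussian observations $y_k \sim \mathcal{N}(0, v_k)$, multiplying out these potentials gives closed-form inverse Gamma full conditionals (in the $\mathcal{IG}(x; \text{shape}, z)$ convention above): $z_k \mid v_{k-1}, v_k \sim \mathcal{IG}\big(a_z + a,\ (a_z/v_{k-1} + a/v_k)^{-1}\big)$ and $v_k \mid z_k, z_{k+1}, y_k \sim \mathcal{IG}\big(a + a_z + \tfrac{1}{2},\ (a/z_k + a_z/z_{k+1} + y_k^2/2)^{-1}\big)$. A minimal Python/NumPy Gibbs sampler under these assumptions; the boundary handling (an improper prior $p(z_1) \propto 1/z_1$ and a trailing $z_{K+1}$) and the initialisation are illustrative choices, not taken from the slides:

```python
import numpy as np

def gibbs_igmc(y, a, a_z, n_sweeps=300, seed=0):
    """Blocked Gibbs sampler for y_k ~ N(0, v_k) with an inverse Gamma-Markov chain prior.

    z[k], k = 0..K, sits between v[k-1] and v[k]: z[0] gets the improper prior 1/z and
    z[K] has only the parent v[K-1]. Given v, all z are conditionally independent, and
    given z, all v are, so each block is drawn in one vectorised step.
    Returns the sampled variance paths, shape (n_sweeps, K).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    K = len(y)
    v = np.full(K, y.var() + 1e-12)        # crude initialisation
    shape_z = np.zeros(K + 1)
    shape_z[1:] += a_z                     # psi_{k-1,k}: link to the preceding v
    shape_z[:-1] += a                      # psi_{k,k}: link to the following v
    draws = np.empty((n_sweeps, K))
    for t in range(n_sweeps):
        rate_z = np.zeros(K + 1)
        rate_z[1:] += a_z / v
        rate_z[:-1] += a / v
        z = 1.0 / rng.gamma(shape_z, 1.0 / rate_z)        # IG draws via reciprocal Gamma
        rate_v = a / z[:-1] + a_z / z[1:] + 0.5 * y ** 2
        v = 1.0 / rng.gamma(a + a_z + 0.5, 1.0 / rate_v)
        draws[t] = v
    return draws

rng = np.random.default_rng(1)
v_true = np.concatenate([np.ones(50), 100.0 * np.ones(50)])
y = rng.standard_normal(100) * np.sqrt(v_true)
draws = gibbs_igmc(y, a=5.0, a_z=5.0)
post = draws[100:].mean(axis=0)            # discard the first 100 sweeps as burn-in
```

The VB updates have the same structure, with the sampled neighbours replaced by expectations such as $\langle v_k^{-1} \rangle$ and $\langle z_k^{-1} \rangle$.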
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: variational Bayes
[Figure: log-spectrogram panels: Noisy (X), Original (Xorg), and the estimates Xh, Xv, Xb and Xg, each labelled with its SNR]
Denoising - Music
[Figure: spectrogram panels: Original, Noisy, and the PF, Gibbs and VB estimates, each labelled with its SNR]
"Tristram" (Matt Uelmen) + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1 (horizontal): ties across time, for harmonic continuity
• Source 2 (vertical): ties across frequency, for transients and percussive sounds
[Figure: frequency bin (ν) versus time (τ) panels for the original (Xorg), the horizontal source (Shor) and the vertical source (Sver)]
Single Channel Source Separation with IGMCs
E-guitar "Matte Kudasai" (King Crimson) + drums "Territory" (Sepultura) = mix
                      s1                          s2
              SDR     SIR     SAR        SDR     SIR     SAR
VB            -474    -328    567        -158    1546    -137
Gibbs         -45     -262    457        105     1246    161
GibbsEM       -423    -242    482        134     1313    185
Pre-trained   -404    -315    813        356     1144    464
Oracle        614     1716    658        1266    1995    136
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and a, trained on isolated sources
Single Channel Source Separation with IGMCs
"Vandringar I Vilsenhet" (Anglagard) + "Moby Dick" (Led Zeppelin) = mix
                      s1                          s2
              SDR     SIR     SAR        SDR     SIR     SAR
VB            -78     -622    453        -235    184     -225
Gibbs         -846    -753    693        -404    1459    -383
GibbsEM       -774    -619    462        -114    1662    -097
Pre-trained   -64     -539    695        38      1639    414
Oracle        121     329     1214       2113    3389    2137
Harmonic-Transient Decomposition
[Figure: frequency bin (ν) versus time (τ) panels Xorg, Shor and Sver for two examples, each shown as (Original), (Hor), (Vert)]
Chord Detection: Signal Model (with Paul Peeling)
[Figure: spectral template $\log \sigma_\nu$ versus frequency ν (Hz), 0 to 4000 Hz]
Chord Detection
[Figure: MDCT of a piano chord (MIDI notes 41, 48, 51, 56), frequency (Hz) versus time τ (s); $\log \sum_\nu v_{\nu j \tau}$ over MIDI note j versus time τ (s)]
Multichannel Source Separation
• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$
$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$
$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$, for n = 1, …, N, m = 1, …, M, k = 1, …, K
$a_m \sim \mathcal{N}(a_m; \cdots)$, $r_m \sim \mathcal{IG}(r_m; \cdots)$
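Integrating out $v_{k,n}$ in this prior gives a Student-t source with ν degrees of freedom and squared scale $\lambda_n$ (since $\mathcal{IG}(v; \nu/2, 2/(\nu\lambda_n))$ has shape ν/2 and scale $\nu\lambda_n/2$), so each source is heavy-tailed with variance $\lambda_n \nu / (\nu - 2)$ for ν > 2. A minimal forward-sampling sketch in Python/NumPy; the Gamma prior on $\lambda_n$ is read as shape $a_\lambda$, scale $b_\lambda$ following the $\mathcal{G}(x; a, z)$ definition, and the $\mathcal{N}(0, I)$ mixing vectors and the shared noise variance are hypothetical stand-ins for the unspecified priors on $a_m$ and $r_m$:

```python
import numpy as np

def sample_multichannel_prior(K, N, M, nu=6.0, a_lam=2.0, b_lam=1.0, noise_var=1e-2, seed=0):
    """Forward-sample sources s (K x N) and an M-channel mixture x (K x M) from the hierarchical prior."""
    rng = np.random.default_rng(seed)
    lam = rng.gamma(a_lam, b_lam, size=N)                           # lambda_n ~ G(a_lam, b_lam)
    v = 1.0 / rng.gamma(nu / 2, 2.0 / (nu * lam), size=(K, N))      # v_kn ~ IG(nu/2, 2/(nu lambda_n))
    s = np.sqrt(v) * rng.standard_normal((K, N))                    # s_kn ~ N(0, v_kn)
    A = rng.standard_normal((M, N))                                 # hypothetical: row m is a_m^T ~ N(0, I)
    x = s @ A.T + np.sqrt(noise_var) * rng.standard_normal((K, M))  # hypothetical shared r_m
    return lam, s, x

lam, s, x = sample_multichannel_prior(K=200_000, N=2, M=3)
print(s.var(axis=0), lam * 6.0 / 4.0)   # empirical vs. Student-t variance lambda * nu / (nu - 2)
```

Larger $\lambda_n$ scales every coefficient of source n, which is why $\lambda_n$ acts as the overall "volume" of that source.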
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall "volume" of source n
Source Separation
[Figure: spectrograms, t (sec) versus f (Hz), of the three sources (Speech, Piano, Guitar) and their mixture (Mix)]
Reconstructions
[Figure: spectrograms, t (sec) versus f (Hz), of the reconstructed sources (Speech, Piano, Guitar)]
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 26
Monophonic model (Cemgil et al 2006)
bull We introduce a pitch label indicator m
bull At each time k the process can be in one of the ldquomuterdquo ldquosoundrdquo timesM states
r0 r1 rT
m0 m1 mT
s0 s1 sT
y1 yT
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 25
Monophonic Pitch Tracking
Monophonic Pitch Tracking = Online estimation (filtering) of p(rkmk|y1k)
100 200 300 400 500 600 700 800 900 1000minus100
minus50
0
50
100 200 300 400 500 600 700 800 900 1000
5
10
15
bull If pitch is constant exact inference is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 26
Transcription
bull Detecting onsets offsets and pitch to sample precision (Cemgil et al 2006 IEEE
TSALP)
500 1000 1500 2000 2500 3000 3500
Exact inference (S)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 27
d1wav
Media File (audiowav)
Tracking Pitch Variations
bull Allow m to change with k
50 100 150 200 250 300 350 400 450 500
bull Intractable need to resort to approximate inference (Mixture Kalman Filter -Rao-Blackwellized Particle Filter)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 28
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: latent piano roll over frequency index ν and time k, and the observed signal $x_k$]
• Each latent changepoint process $\nu = 1, \ldots, W$ corresponds to a “piano key”. The indicators $r_{1:W,1:K}$ encode a latent “piano roll”.
Single time slice – Bayesian Variable Selection
$$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$$
$$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$$
$$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$$
$$C \equiv [\, C_1 \; \cdots \; C_i \; \cdots \; C_W \,]$$
[Graphical model: indicators $r_1, \ldots, r_W$ → coefficients $s_1, \ldots, s_W$ → observation $x$]
• Generalized Linear Model – the columns of C are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When W is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman, Attias, …)
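The $2^W$-component posterior can be made concrete by brute-force enumeration. The sketch below rests on simplifying assumptions that are not from the slides: every active coefficient has scalar prior variance `sigma2` (so Σ = `sigma2`·I), the noise covariance R is known, and all indicators share the same π_on. Integrating out the active coefficients gives $p(x \mid r) = \mathcal{N}(x; 0, \sigma^2 C_r C_r^\top + R)$, where $C_r$ keeps only the columns that are on; weighting the Gaussian posteriors $p(s \mid r, x)$ by $p(r \mid x)$ yields the mixture. The loop over all $2^W$ configurations is exactly what becomes intractable for large W.

```python
import itertools
import numpy as np

def log_gauss(x, cov):
    """log N(x; 0, cov), computed through a Cholesky factor for numerical stability."""
    L = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(L, x)
    return -0.5 * alpha @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2.0 * np.pi)

def bvs_posterior(x, C, sigma2, R, pi_on):
    """Exact posterior p(r | x) over on/off indicators by enumerating all 2**W configurations."""
    W = C.shape[1]
    configs, log_w = [], []
    for r in itertools.product([0, 1], repeat=W):
        on = np.array(r, dtype=bool)
        C_on = C[:, on]                                  # basis vectors of the active coefficients
        cov = sigma2 * C_on @ C_on.T + R                 # active coefficients integrated out
        log_prior = on.sum() * np.log(pi_on) + (~on).sum() * np.log(1.0 - pi_on)
        configs.append(r)
        log_w.append(log_gauss(x, cov) + log_prior)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())                      # subtract the max before exponentiating
    return configs, w / w.sum()

# Usage: hypothetical data in which only the first basis vector is active
rng = np.random.default_rng(0)
C = rng.normal(size=(6, 3))
x = 3.0 * C[:, 0] + 0.1 * rng.normal(size=6)
configs, w = bvs_posterior(x, C, sigma2=1.0, R=0.01 * np.eye(6), pi_on=0.5)
```

With W = 3 this visits 8 configurations; each extra basis vector doubles the work.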
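Factorial Switching State space model
$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$$
$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$$
$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}\big(r_{k,\nu}; \pi_\nu(r_{k-1,\nu})\big) \qquad \text{Changepoint indicator}$$
$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}\big(\theta_{k,\nu}; A_\nu(r_k)\, \theta_{k-1,\nu}, Q_\nu(r_k)\big) \qquad \text{Latent state}$$
$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R) \qquad \text{Observation}$$
[Graphical model: for each $\nu = 1, \ldots, W$, an indicator chain $r^\nu_0 \cdots r^\nu_k \cdots r^\nu_K$ and a state chain $s^\nu_0 \cdots s^\nu_k \cdots s^\nu_K$; all chains feed the observations $y_k, \ldots, y_K$]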
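To make the factorial switching state-space model concrete, the sketch below draws a synthetic signal from it. All concrete choices are illustrative assumptions rather than the published model: each chain ν is a damped two-dimensional rotation (a decaying sinusoid of angular frequency ω_ν) whose damping factor and process-noise variance depend on a binary mute/sound indicator, every chain starts muted, $\theta_{0,\nu} \sim \mathcal{N}(0, I)$, and $C_k$ simply sums the first coordinate of every chain.

```python
import numpy as np

def rotation(omega, rho):
    """Damped 2-D rotation: a sinusoid of angular frequency omega that decays by rho per sample."""
    c, s = np.cos(omega), np.sin(omega)
    return rho * np.array([[c, -s], [s, c]])

def simulate_fssm(K, omegas, trans, rho=(0.9, 0.999), q=(1e-6, 1e-4), r_obs=1e-4, seed=0):
    """Sample a factorial switching state-space model with W = len(omegas) chains.
    r[nu, k] = 0 is "mute", 1 is "sound"; rho and q hold the (mute, sound) damping factors and
    process-noise variances, playing the roles of A_nu(r) and Q_nu(r)."""
    rng = np.random.default_rng(seed)
    W = len(omegas)
    r = np.zeros((W, K + 1), dtype=int)              # r_{0,nu} = mute for every chain (assumption)
    theta = np.zeros((W, K + 1, 2))
    theta[:, 0] = rng.normal(size=(W, 2))            # theta_{0,nu} ~ N(0, I) (assumption)
    y = np.zeros(K)
    for k in range(1, K + 1):
        for nu in range(W):
            r[nu, k] = rng.choice(2, p=trans[r[nu, k - 1]])          # changepoint indicator
            A = rotation(omegas[nu], rho[r[nu, k]])
            noise = rng.normal(scale=np.sqrt(q[r[nu, k]]), size=2)
            theta[nu, k] = A @ theta[nu, k - 1] + noise              # latent state
        y[k - 1] = theta[:, k, 0].sum() + rng.normal(scale=np.sqrt(r_obs))  # observation
    return r, theta, y

# Usage: two hypothetical "piano keys" with sticky on/off switching
trans = np.array([[0.98, 0.02], [0.02, 0.98]])       # trans[previous state, next state]
r, theta, y = simulate_fssm(500, omegas=[0.1, 0.3], trans=trans)
```

Inference would have to invert this process, tracking $2^W$ indicator combinations per time step, which is where the computational burden discussed next comes from.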
Synthetic Data
[Figure: synthetic piano roll over frequency index ν and time k, with the generated signal x]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not numerically stable – computations with large matrices
– Need advanced techniques from linear algebra
– Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical – acoustical
• Time domain – state space dynamical models
• Transform domain – Fourier representations, Generalised Linear model
• Feature based
Spectrogram
• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau)$ = (Frequency, Time), such as STFT or MDCT
$$x(t) = \sum_k s_k \phi_k(t)$$
[Figure: spectrograms, f (Hz) against t (sec): (Speech), (Piano)]
• The spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of the STFT)
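A minimal numpy sketch of the coefficients $s_k$ behind such a display, assuming a Hann-windowed STFT with an illustrative frame length of 512 samples and hop of 128. Rows index the frequency bin ν and columns the frame τ, so `S[nu, tau]` plays the role of $s_{k(\nu,\tau)}$. Overlapping STFT atoms form a redundant frame rather than a basis; an MDCT would give an orthogonal basis instead.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed short-time Fourier transform; returns S[nu, tau] (frequency bin, frame)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)                  # shape (n_fft // 2 + 1, n_frames)

# Usage: one second of a two-partial tone sampled at 8 kHz
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
S = stft(x)
log_spec = np.log(np.abs(S) ** 2 + 1e-12)             # log |s_k|^2, the usual display
peak_bin = int(np.argmax(np.abs(S[:, 0])))            # strongest bin in the first frame
```

The bin spacing is fs / n_fft = 15.625 Hz here, which illustrates the frequency-resolution limit of transform-domain models.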
Models for time-frequency Energy distributions
• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; …)
$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$
Spectrogram = Spectral Templates × Excitations
– however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
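For reference, here is a sketch of NMF with the classic Lee–Seung multiplicative updates for the generalised Kullback–Leibler divergence. This is the divergence whose minimiser coincides with maximum likelihood under the Poisson source model of the later slide. Matrix names follow $X_{\nu\tau} \approx \sum_j W_{\nu j} S_{j\tau}$; the random initialisation, iteration count and `eps` guard are illustrative choices, not taken from the cited papers.

```python
import numpy as np

def nmf_kl(X, J, n_iter=200, seed=0, eps=1e-12):
    """Factorise a non-negative X (F x T) as W (F x J) @ S (J x T) with KL multiplicative updates."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.random((F, J)) + eps                          # spectral templates
    S = rng.random((J, T)) + eps                          # excitations
    for _ in range(n_iter):
        V = W @ S + eps
        W *= ((X / V) @ S.T) / (S.sum(axis=1)[None, :] + eps)
        V = W @ S + eps
        S *= (W.T @ (X / V)) / (W.sum(axis=0)[:, None] + eps)
    return W, S

def kl_div(X, V, eps=1e-12):
    """Generalised KL divergence D(X || V)."""
    return np.sum(X * np.log((X + eps) / (V + eps)) - X + V)

# Usage: a hypothetical rank-2 non-negative "spectrogram"
rng = np.random.default_rng(42)
X = rng.random((20, 2)) @ rng.random((2, 30))
W1, S1 = nmf_kl(X, J=2, n_iter=1)
W2, S2 = nmf_kl(X, J=2, n_iter=300)
```

Multiplicative updates keep W and S non-negative automatically, which is why they are a common default for this model.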
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; …)
$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$
Spectrogram = Mask × Source0 + (1 − Mask) × Source1
– however, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms
$$p(s \mid v)\, p(v) = \left( \prod_k p(s_k \mid v_k) \right) p(v)$$
One channel source separation: Gaussian source model
[Graphical model: variances $v_{k,1}, \ldots, v_{k,N}$ → sources $s_{k,1}, \ldots, s_{k,N}$ → observation $x_k$, for $k = 1, \ldots, K$]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n}) \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$
• Straightforward application of Bayes' theorem yields
$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})\big)$$
$$\kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \qquad \text{(Responsibilities)}$$
• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$
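This posterior is a Wiener-style split of each observed coefficient. A minimal sketch, assuming the variances $v_{k,n}$ are already known (in the full model they are themselves inferred): the posterior means always add back up to $x_k$, because the responsibilities sum to one.

```python
import numpy as np

def gaussian_posterior(x, v):
    """Posterior means and variances of s_{k,n} given x_k = sum_n s_{k,n}, s_{k,n} ~ N(0, v_{k,n}).
    x has shape (K,) or is a scalar; v has shape (K, N) or (N,)."""
    v = np.asarray(v, dtype=float)
    kappa = v / v.sum(axis=-1, keepdims=True)             # responsibilities kappa_{k,n}
    mean = kappa * np.asarray(x, dtype=float)[..., None]  # kappa_{k,n} x_k
    var = v * (1.0 - kappa)                               # v_{k,n} (1 - kappa_{k,n})
    return mean, var

# Usage: one atom with two sources, then two atoms at once
m, s2 = gaussian_posterior(3.0, [1.0, 2.0])
M, S2 = gaussian_posterior(np.array([1.0, -2.0]), np.array([[1.0, 1.0], [3.0, 1.0]]))
```

For the single atom, the source with twice the prior variance receives twice the share of x = 3, and both posterior variances equal 2/3.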
One channel source separation: Poisson source model
[Graphical model: intensities $v_{k,1}, \ldots, v_{k,N}$ → sources $s_{k,1}, \ldots, s_{k,N}$ → observation $x_k$, for $k = 1, \ldots, K$]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n}) \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$
• This is the generative model for NMF when we write
$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \qquad \text{(Template × Excitation)}$$
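Under the Poisson model the split of an observation is also available in closed form: independent Poisson variables conditioned on their sum are multinomially distributed, so $s_{k,1:N} \mid x_k \sim \text{Multinomial}(x_k, \kappa_{k,1:N})$ with the same responsibilities $\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$ as in the Gaussian case. The sketch below uses hypothetical random templates and excitations to show both this split and the NMF structure $\mathbb{E}[x_{\nu\tau}] = \sum_n t_{\nu,n}\, e_{\tau,n}$.

```python
import numpy as np

def poisson_split(x, v, rng):
    """Draw s_{1:N} ~ p(s | x, v): independent Poissons conditioned on their sum are Multinomial(x, kappa)."""
    v = np.asarray(v, dtype=float)
    return rng.multinomial(x, v / v.sum())

rng = np.random.default_rng(0)
t = rng.random((4, 2))                  # hypothetical spectral templates t[nu, n]
e = rng.random((5, 2))                  # hypothetical excitations e[tau, n]
v = t[:, None, :] * e[None, :, :]       # intensities v[nu, tau, n] = t[nu, n] * e[tau, n]
X_mean = v.sum(axis=-1)                 # E[x_{nu,tau}] = sum_n t[nu, n] e[tau, n]
s = rng.poisson(v)                      # source counts s_{k,n}
x = s.sum(axis=-1)                      # observed counts x_k

# Usage: split x = 10 between two sources with intensities 1 and 3
draws = np.array([poisson_split(10, [1.0, 3.0], rng) for _ in range(2000)])
```

The posterior mean of the first source is 10 × 1/4 = 2.5, mirroring the Gaussian responsibilities.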
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: two panels of densities $p(x)$ for $x \in [0, 5]$, with legends $(a, b)$ = (0.9, 1), (1, 1), (1.3, 1), (2, 1) and $(a, b)$ = (1, 1), (1, 0.5), (2, 1)]
$$\mathcal{G}(x; a, z) \equiv \exp\left((a - 1)\log x - z^{-1} x + a \log z^{-1} - \log\Gamma(a)\right)$$
$$\mathcal{IG}(x; a, z) \equiv \exp\left((a + 1)\log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log\Gamma(a)\right)$$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
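The two log-densities translate directly into code. This sketch follows the parameterisation above, in which z acts as the scale of the Gamma and $\mathcal{IG}(x; a, z)$ is the density of x when $1/x \sim \mathcal{G}(a, z)$. The usage lines check the normalisation of one Gamma density numerically; the change-of-variables relation between the two densities can be checked the same way.

```python
import math

def log_gamma_pdf(x, a, z):
    """log G(x; a, z): shape a, scale z, as defined on the slide."""
    return (a - 1.0) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z): the density of x when 1/x ~ G(a, z)."""
    return -(a + 1.0) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)

# Usage: midpoint-rule check that G(x; 2, 1) integrates to one over (0, 60)
h = 1e-3
total = sum(math.exp(log_gamma_pdf((i + 0.5) * h, 2.0, 1.0)) * h for i in range(60000))
```

Working with log-densities avoids overflow for large shape parameters, which matters once many such factors are multiplied along a chain.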
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \ldots, K$ as follows:
$$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k / a)$$
$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k / a_z)$$
[Graphical model: chain $z_1 \cdots v_{k-1} - z_k - v_k - z_{k+1} \cdots$ with couplings $a_z$, $a$, $a_z$]
• The variance variables v are priors for the sources
• The auxiliary variables z are needed for conjugacy and positive correlation
• The shape parameters $a$ and $a_z$ describe the coupling strength and drift of the chain
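Sampling from the chain is a simple alternation between the two conditionals. The sketch assumes the slide's parameterisation, so $\mathcal{IG}(x; a, z)$ is drawn as $x = 1/g$ with $g \sim$ Gamma(shape a, scale z). It also assumes a fixed starting value for $z_1$, which the slide leaves unspecified.

```python
import numpy as np

def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Draw v_1..v_K from the inverse Gamma-Markov chain.
    IG(x; shape, z) is sampled as x = 1 / Gamma(shape, scale=z), matching the slide's parameterisation."""
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = z1                                          # z_1 held fixed (assumption)
    for k in range(K):
        v[k] = 1.0 / rng.gamma(a, z / a)            # v_k | z_k ~ IG(v_k; a, z_k / a)
        z = 1.0 / rng.gamma(a_z, v[k] / a_z)        # z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z)
    return v

# Usage: strong versus weak coupling
v_smooth = sample_igmc(1000, a=100.0, a_z=100.0)
v_rough = sample_igmc(1000, a=2.0, a_z=2.0)
```

By our calculation under this parameterisation, $\mathbb{E}[\log v_{k+1} - \log v_k] = (\log a - \psi(a)) - (\log a_z - \psi(a_z))$, where ψ is the digamma function. Equal shapes therefore give a driftless log-variance, $a \neq a_z$ introduces a drift, and larger shapes shrink the step size.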
Gamma Chains: typical draws
[Figure: three sample paths of $\log v_k$ for $k = 1, \ldots, 1000$, with $(a, a_z)$ = (10, 10), (10, 4) and (10, 40)]
Gamma Chains with changepoints: typical draws
[Figure: three sample paths of $\log v_k$ for $k = 1, \ldots, 1000$, with $(a, a_z)$ = (10, 10), (10, 4) and (10, 40)]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$$\psi_{k,k} = \exp\left(-a\, z_k^{-1} v_k^{-1}\right) \qquad \text{(Pairwise)}$$
$$\phi_{z_k} = \exp\left((a_z + a + 1)\log z_k^{-1}\right) \qquad \text{(Singletons)}$$
[Graphical model: chain $z_1 \cdots v_{k-1} - z_k - v_k - z_{k+1} \cdots$ with couplings $a_z$, $a$, $a_z$]
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$$\psi_{i,j} = \exp\left(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}\right) \qquad \text{(Pairwise)}$$
$$\phi_i = \exp\left(\Big(\sum_j a_{ij} + 1\Big)\log \xi_i^{-1}\right) \qquad \text{(Singletons)}$$
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov Chain Monte Carlo, Gibbs sampler
– Sequential Monte Carlo, Particle Filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps.
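VB or Gibbs
[Factor graph: $\psi_{0,1} - z_1 - \psi_{1,1} - v_1 - \psi_{1,2} - z_2 - \psi_{2,2} - v_2 \cdots$, with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(v_k) \leftarrow \exp\left(\log p(y_k \mid v_k) + \log\phi_{v_k} + \left\langle \log\psi_{k,k} + \log\psi_{k,k+1} \right\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$$
• Gibbs
$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \phi_{v_k}(v_k)\, \psi_{k,k}(z_k^{(\tau)}, v_k)\, \psi_{k,k+1}(v_k, z_{k+1}^{(\tau)})$$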
VB or Gibbs
[Factor graph: $\psi_{0,1} - z_1 - \psi_{1,1} - v_1 - \psi_{1,2} - z_2 - \psi_{2,2} - v_2 \cdots$, with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(z_k) \leftarrow \exp\left(\log\phi_{z_k} + \left\langle \log\psi_{k-1,k} + \log\psi_{k,k} \right\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$$
• Gibbs
$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \phi_{z_k}(z_k)\, \psi_{k-1,k}(v_{k-1}^{(\tau)}, z_k)\, \psi_{k,k}(z_k, v_k^{(\tau)})$$
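Because every full conditional in the chain is inverse Gamma, the two schemes differ only in what gets plugged in: VB uses expectations $\langle x^{-1} \rangle$, Gibbs uses fresh samples. The sketch below makes two simplifying assumptions for illustration. The observation model is noise-free, $y_k \sim \mathcal{N}(0, v_k)$, so it infers the variances of directly observed coefficients rather than denoising. The first auxiliary variable is also held fixed. Under these assumptions, $1/v_k$ has shape $a + a_z + 1/2$ and rate $a z_k^{-1} + a_z z_{k+1}^{-1} + y_k^2/2$ (the $a_z$ terms are absent for the last $v_K$), and $1/z_k$ has shape $a_z + a$ and rate $a_z v_{k-1}^{-1} + a v_k^{-1}$.

```python
import numpy as np

def igmc_infer(y, a, a_z, n_iter=200, mode="vb", z_first=1.0, seed=0):
    """Posterior variances for y_k ~ N(0, v_k) under the inverse Gamma-Markov chain.
    All full conditionals have the form 1/x ~ Gamma(shape, rate).
    mode="vb":    mean-field VB, plugging in <1/x> = shape / rate; returns E_q[v_k].
    mode="gibbs": blocked Gibbs sampling; returns the Monte Carlo mean of v_k after burn-in."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    K = len(y)
    inv_v = np.ones(K)                 # <1/v_k> (VB) or the current sample of 1/v_k (Gibbs)
    inv_z = np.ones(K)                 # same for z; inv_z[k] belongs to the parent of v[k]
    inv_z[0] = 1.0 / z_first           # first auxiliary variable held fixed (assumption)
    shape_v, rate_v = np.empty(K), np.empty(K)
    v_sum, n_kept = np.zeros(K), 0

    def update(shape, rate):
        if mode == "vb":
            return shape / rate                    # expectation of 1/x under q
        return rng.gamma(shape, 1.0 / rate)        # sample of 1/x (numpy uses scale = 1/rate)

    for it in range(n_iter):
        for k in range(K):                         # v_k given z_k, z_{k+1} and y_k
            shape = a + 0.5
            rate = a * inv_z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:
                shape += a_z
                rate += a_z * inv_z[k + 1]
            shape_v[k], rate_v[k] = shape, rate
            inv_v[k] = update(shape, rate)
        for k in range(1, K):                      # z_k given v_{k-1} and v_k
            inv_z[k] = update(a_z + a, a_z * inv_v[k - 1] + a * inv_v[k])
        if mode == "gibbs" and it >= n_iter // 2:
            v_sum += 1.0 / inv_v
            n_kept += 1
    if mode == "vb":
        return rate_v / (shape_v - 1.0)            # E_q[v_k]; requires shape > 1
    return v_sum / n_kept

# Usage: a hypothetical signal that is quiet for 100 samples, then loud
rng = np.random.default_rng(1)
true_v = np.where(np.arange(200) < 100, 0.01, 100.0)
y = rng.normal(0.0, np.sqrt(true_v))
v_vb = igmc_infer(y, a=10.0, a_z=10.0, mode="vb")
v_gibbs = igmc_infer(y, a=10.0, a_z=10.0, mode="gibbs")
```

Since the v's are conditionally independent given the z's (and vice versa), sweeping all of one set before the other is a valid blocked update for both schemes.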
Denoising – Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: log-spectrogram panels X (noisy) and Xorg (original), followed by the reconstructions Xh, Xv, Xb and Xg, each annotated with its SNR]
Denoising – Music
[Figure: log-spectrogram panels Original and Noisy, followed by reconstructions by PF, Gibbs and VB, each annotated with its SNR]
“Tristram (Matt Uelmen)” + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal tie across time – harmonic continuity
• Source 2: vertical tie across frequency – transients, percussive sounds
[Figure: Xorg, Shor and Sver, frequency bin (ν) against time (τ)]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix
             |          s1          |          s2
Method       |  SDR    SIR    SAR   |  SDR    SIR    SAR
VB           | -474   -328    567   | -158   1546   -137
Gibbs        |  -45   -262    457   |  105   1246    161
GibbsEM      | -423   -242    482   |  134   1313    185
Pre-trained  | -404   -315    813   |  356   1144    464
Oracle       |  614   1716    658   | 1266   1995    136
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix
             |          s1          |          s2
Method       |  SDR    SIR    SAR   |  SDR    SIR    SAR
VB           |  -78   -622    453   | -235    184   -225
Gibbs        | -846   -753    693   | -404   1459   -383
GibbsEM      | -774   -619    462   | -114   1662   -097
Pre-trained  |  -64   -539    695   |   38   1639    414
Oracle       |  121    329   1214   | 2113   3389   2137
Harmonic-Transient Decomposition
[Figure: Xorg (Original), Shor (Hor) and Sver (Vert), frequency bin (ν) against time (τ)]
Chord Detection – Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (Hz), 0 to 4000 Hz]
Chord Detection
[Figure: MDCT of piano chord 41, 48, 51, 56 (frequency in Hz against time τ in s), and $\log \sum_\nu v_{\nu,j,\tau}$ for each MIDI note $j$ against time τ (s)]
Multichannel Source Separation
• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al 2006)
$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$$
$$v_{k,n} \sim \mathcal{IG}\big(v_{k,n}; \nu/2, 2/(\nu\lambda_n)\big)$$
$$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$
$$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$$
$$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$$
[Graphical model: $\lambda_1, \ldots, \lambda_N$ → $v_{k,1}, \ldots, v_{k,N}$ → $s_{k,1}, \ldots, s_{k,N}$ → $x_{k,1}, \ldots, x_{k,M}$ for $k = 1, \ldots, K$, with mixing parameters $a_1, r_1, \ldots, a_M, r_M$]
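A forward sample from this hierarchy shows what the prior implies. In the inverse Gamma parameterisation used on these slides, $1/v_{k,n} \sim$ Gamma(shape ν/2, scale $2/(\nu\lambda_n)$), so $\mathbb{E}[1/v_{k,n}] = 1/\lambda_n$ and λ_n acts as a typical variance. Integrating out $v_{k,n}$ gives a Student-t marginal for $s_{k,n}$ with ν degrees of freedom and squared scale λ_n, hence variance $\lambda_n \nu / (\nu - 2)$ for ν > 2. The mixing matrix, λ values, noise variances and ν below are hypothetical, and λ_n is fixed rather than drawn from its Gamma prior.

```python
import numpy as np

def sample_mixture(K, A, lam, nu, r, seed=0):
    """Draw K frames from the hierarchical source prior and the linear mixing model (a sketch).
    A is the M x N mixing matrix whose rows are a_m^T; r holds the M noise variances r_m."""
    rng = np.random.default_rng(seed)
    M, N = A.shape
    lam = np.asarray(lam, dtype=float)
    # v_{k,n} ~ IG(nu/2, 2/(nu*lam_n)), i.e. 1/v ~ Gamma(shape nu/2, scale 2/(nu*lam_n))
    v = 1.0 / rng.gamma(nu / 2.0, 2.0 / (nu * lam), size=(K, N))
    s = rng.normal(0.0, np.sqrt(v))                                   # s_{k,n} ~ N(0, v_{k,n})
    x = s @ A.T + rng.normal(0.0, np.sqrt(r), size=(K, M))            # x_{k,m} ~ N(a_m^T s_k, r_m)
    return v, s, x

# Usage: 3 hypothetical sources mixed into 2 channels (an underdetermined setting)
A = np.array([[1.0, 0.5, 0.2], [0.3, 1.0, 0.7]])
v, s, x = sample_mixture(200_000, A, lam=[2.0, 2.0, 2.0], nu=10.0, r=[0.01, 0.01])
```

With λ = 2 and ν = 10, the empirical variance of each source should be close to 2 × 10 / 8 = 2.5.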
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source n
Source Separation
[Figure: spectrograms, f (Hz) against t (sec): (Speech), (Piano), (Guitar), (Mix)]
Reconstructions
[Figure: reconstructed spectrograms, f (Hz) against t (sec): (Speech), (Piano), (Guitar)]
Multimodality
• Typically underdetermined (Channels < Sources) ⇒ multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: traces of a, λ and r against epoch, 0 to 2000]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features)
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– …
• Online-Realtime or Offline-Batch
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: grid of states with bar position $m_k = 1, \ldots, 8$ (horizontal) and tempo level $n_k = 1, 2, 3$ (vertical), shown for 3/4 time and 4/4 time]
• Each dot denotes a state $x = (m, n)$ (Score Position – Tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer – transition model
[Figure: three panels of the transition model $p(x_2 \mid x_1)$ over bar position (1 to 8) and tempo level (1 to 5)]
Bar position Pointer – k = 1
[Figure: $p(x_1)$ over bar position and tempo level]
Bar position Pointer – k = 2
[Figure: $p(x_2)$ over bar position and tempo level]
Bar position Pointer – k = 3
[Figure: $p(x_3)$ over bar position and tempo level]
Bar position Pointer – k = 4
[Figure: $p(x_4)$ over bar position and tempo level]
Bar position Pointer – k = 5
[Figure: $p(x_5 \mid y_{1:5})$ over bar position and tempo level, after observing $y_5 = 0$]
Bar position Pointer – k = 10
[Figure: $p(x_{10})$ over bar position and tempo level]
Bar position Pointer – observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson intensity
[Figure: Poisson intensity $\mu_k$ against bar position $m_k$ for a Triplet Rhythm and a Duplet Rhythm]
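A toy forward filter for the bar-pointer HMM computes $p(m_k, n_k \mid y_{1:k})$ on a discrete grid. The transition details are assumptions for illustration rather than the published model. Tempo level n (0-based here) advances the bar position by n + 1 cells per frame, wrapping at the end of the bar. The tempo moves to a neighbouring level with probability `p_change`. $y_k$ is Poisson with a hypothetical rhythm-template intensity $\mu(m_k)$, and the $y_k!$ term is dropped because it cancels on normalisation.

```python
import numpy as np

def bar_pointer_filter(y, mu, n_tempi, p_change=0.05):
    """Forward filtering for a toy bar-pointer HMM; returns posts[k, n, m] = p(n_k, m_k | y_{1:k})."""
    mu = np.asarray(mu, dtype=float)
    M = len(mu)
    T = np.eye(n_tempi) * (1.0 - p_change)             # tempo transitions T[n_prev, n_new]
    for n in range(n_tempi):
        for j in (n - 1, n + 1):
            if 0 <= j < n_tempi:
                T[n, j] = p_change / 2.0
    T /= T.sum(axis=1, keepdims=True)                  # renormalise rows at the edges
    alpha = np.full((n_tempi, M), 1.0 / (n_tempi * M)) # uniform p(x_0)
    posts = []
    for yk in y:
        # bar position advances by the previous tempo level, then the tempo may change
        shifted = np.stack([np.roll(alpha[n], n + 1) for n in range(n_tempi)])
        pred = T.T @ shifted                           # p(x_k | y_{1:k-1})
        alpha = pred * np.exp(yk * np.log(mu) - mu)[None, :]   # Poisson likelihood, up to 1/y_k!
        alpha /= alpha.sum()
        posts.append(alpha)
    return np.array(posts)

# Usage: hypothetical onsets generated at bar positions m_k = k mod 8 with the slowest tempo
mu = np.array([4.0, 0.1, 2.0, 0.1, 0.1, 0.1, 0.1, 0.1])   # hypothetical rhythm template
y = [3 if k % 8 == 0 else 1 if k % 8 == 2 else 0 for k in range(1, 41)]
posts = bar_pointer_filter(y, mu, n_tempi=3)
tempo_marginal = posts[-1].sum(axis=1)
position_marginal = posts[-1].sum(axis=0)
```

Smoothing, as on the following slides, would add a backward pass over the same grid.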
Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Graphical model: chains $n_0, \ldots, n_3$, $\theta_0, \ldots, \theta_3$, $m_0, \ldots, m_3$ and $r_0, \ldots, r_3$, with intensities $\lambda_1, \ldots, \lambda_3$ and observations $y_1, \ldots, y_3$]
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator
Filtering
[Figure: observed data $y_k$; filtered posteriors $\log p(m_k \mid y_{1:k})$ over bar position, $\log p(n_k \mid y_{1:k})$ over tempo (quarter notes per min) and $p(r_k \mid y_{1:k})$ over rhythm (Triplets vs. Duplets), against frame index k]
Smoothing
[Figure: observed data $y_k$; smoothed posteriors $\log p(m_k \mid y_{1:K})$ over bar position, $\log p(n_k \mid y_{1:K})$ over tempo (quarter notes per min) and $p(r_k \mid y_{1:K})$ over rhythm (Triplets vs. Duplets), against frame index k]
Time Signature
[Figure: observed audio (sample value against time in s); smoothed posteriors $\log p(m_k \mid z_{1:K})$ over bar position, $\log p(n_k \mid z_{1:K})$ over tempo (quarter notes per min) and $p(\theta_k \mid z_{1:K})$ over time signature (4/4 vs. 3/4), against frame index k]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt aligned with the audio signal $x_t$]
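Score-Performance matching – Graphical Model
[Graphical model: chains $t_1, \ldots, t_K$ and $r_1, \ldots, r_K$ with $\lambda_1, \ldots, \lambda_K$; for each $\nu = 1, \ldots, W$, variances $v_{\nu,1}, \ldots, v_{\nu,K}$ and coefficients $s_{\nu,1}, \ldots, s_{\nu,K}$; score excerpt with note events numbered 1 to 8]
$$v_{\nu,\tau} \sim \mathcal{IG}\left(v_{\nu,\tau}; a, \frac{1}{a \lambda \sigma_\nu(r_\tau)}\right)$$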
Score-Performance matching – Signal model
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (Hz), 0 to 4000 Hz]
Score-Performance matching
[Figure: spectrogram data (frequency in Hz against time in s) and MIDI data (MIDI note against score position)]
Online (filtering) or offline (smoothing) processing is possible
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ as MIDI note number against time (s); $\log \lambda$ with $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ against time (s); MDCT of audio (source: Daniel-Ben Pienaar), frequency (Hz) against time (s)]
Summary
• “Time Domain” – Switching State Space Models
– State space modeling
– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform Domain” – Gamma Fields
– Models on (orthogonal) transform coefficients, energy compaction
– Practical: can make use of fast transforms (FFT, MDCT, …)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-Frequency Energy distributions
• Ongoing Work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, Polyphonic transcription
∗ Musical Score guided source separation
– Prior structures for other observation models, NMF
Page 27
Monophonic Pitch Tracking
Monophonic Pitch Tracking = Online estimation (filtering) of p(rkmk|y1k)
100 200 300 400 500 600 700 800 900 1000minus100
minus50
0
50
100 200 300 400 500 600 700 800 900 1000
5
10
15
bull If pitch is constant exact inference is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 26
Transcription
bull Detecting onsets offsets and pitch to sample precision (Cemgil et al 2006 IEEE
TSALP)
500 1000 1500 2000 2500 3000 3500
Exact inference (S)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 27
d1wav
Media File (audiowav)
Tracking Pitch Variations
bull Allow m to change with k
50 100 150 200 250 300 350 400 450 500
bull Intractable need to resort to approximate inference (Mixture Kalman Filter -Rao-Blackwellized Particle Filter)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 28
Factorial Generative models for Analysis of Polyphonic Audio
νfr
eque
ncy
k
x k
bull Each latent changepoint process ν = 1 W corresponds to a ldquopiano keyrdquoIndicators r1W1K encode a latent ldquopiano rollrdquo (S1) (S2) (S3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 29
montuno1wav
Media File (audiowav)
montuno2wav
Media File (audiowav)
montuno3wav
Media File (audiowav)
Single time slice - Bayesian Variable Selection
ri sim C(ri πon πoff)
si|ri sim [ri = on]N (si 0 Σ) + [ri 6= on]δ(si)
x|s1W sim N (x Cs1W R)
C equiv [ C1 Ci CW ]
r1 rW
s1 sW
x
bull Generalized Linear Model ndash Columnrsquos of C are the basis vectors
bull The exact posterior is a mixture of 2W Gaussians
bull When W is large computation of posterior features becomes intractable
bull Sparsity by construction (Olshausen and Millman Attias )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 30
Factorial Switching State space model
r0ν sim C(r0ν π0ν)
θ0ν sim N (θ0ν microν Pν)
rkν|rkminus1ν sim C(rkν πν(rtminus1ν)) Changepoint indicator
θkν|θkminus1ν sim N (θkν Aν(rk)θkminus1ν Qν(rk)) Latent state
yk|θk1W sim N (yk Ckθk1W R) Observation
rν0 middot middot middot rν
k middot middot middot rνK
sν0 middot middot middot sν
k middot middot middot sνK
ν = 1 W
yk yK
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 31
Synthetic Data
νx
freq
ν
ν
k
(S)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 32
audio_examplewav
Media File (audiowav)
Technical Difficulties
bull Inference is quite heavy
bull Vanilla Kalman filtering methods are not stable ndash computations with
large matrices
ndash Need advance techniques from linear algebra
ndash Interesting links to subspace methods
bull Hyperparameter learning is necessary
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 33
Modelling levels
bull Physical - acoustical
bull Time domain ndash state space dynamical models
bull Transform domain ndash Fourier representations Generalised Linear
model
bull Feature Based
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 34
Spectrogram
bull Basis functions φk(t) centered around time-frequency atom k = k(ν τ) =(Frequency Time ) such as STFT or MDCT
x(t) =sum
k
skφk(t)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
bull Spectrogram displays log |sk| or |sk|2 (of STFT)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 35
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
Models for time-frequency Energy distributions
bull Non-Negative Matrix factorisation (Sha Saul Lee 2002 Smaragdis Brown 2003
Virtanen 2003 Abdallah Plumbley 2004 )
Xντ = WνjSjτ
Spectrogram = Spectral Templatestimes Excitations
= times
ndash however spectrograms are not additive (a2 + b2 6= (a+ b)2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 36
Models for time-frequency Energy distributions
bull Mask models (Roweis 2001 Reyes-Gomez Jojic Ellis 2005 )
Xντ = [rντ = 0]S(0)ντ + [rντ = 1]S(1)
ντ
Spectrogram = Masktimes Source0 + (1minusMask)times Source1
= + +
ndash however sources do overlap in time and frequency
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 37
Prior structures on time-frequency Energy distributions
bull Main Idea Spectrogram is a point estimate of the energy at a
time-frequency atom k(ν τ)
bull We place a suitable prior on the variance of transform coefficients sk
and tie the prior variances across harmonically and temporally related
time-frequency atoms
p(s|v)p(v) =
(prod
k
p(sk|vk)
)
p(v)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 38
One channel source separation Gaussian source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim N (skn 0 vkn)
xk|sk1N =sumN
n=1 skn
bull Straightforward application of Bayesrsquo theorem yields
p(skn|vk1N xk) = N (skn κknxk vkn(1minus κkn))
κkn = vknsum
nprime
vknprime (Responsibilities)
bull Each source coefficient sn gets a fraction κn of the observation x
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 39
One channel source separation Poisson source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim PO(skn vkn)
xk|sk1N =sumN
n=1 skn
bull This is the generative model for the NMF when we write
vk(ντ)n = tνn times eτn (Templatetimes Excitation)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40
Gamma G(x a b) and Inverse Gamma IG(x a b)
0 1 2 3 4 50
02
04
06
08
1
12a = 09 b =1
a = 1 b =1
a = 13 b =1
a = 2 b =1
x
p(x)
0 1 2 3 4 50
02
04
06
08
1
12
14
a=1 b=1
a=1 b=05
a=2 b=1
G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))
IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))
bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale
bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1 K as follows
vk|zk sim IG(vk a zka)
zk+1|vk sim IG(zk+1 az vkaz)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
bull Variance variables v are priors for sources
bull Auxillary variables z are needed for conjugacy and positive correlation
bull Shape parameters a and az describe coupling strength and drift of the chain
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42
Gamma Chains typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43
Gamma Chains with changepoints typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44
Gamma Chains
bull The joint can be written as product of singleton and pairwise
potentials of form
ψkk = exp(minusazminus1k vminus1
k ) (Pairwise)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
φzk = exp((az + a+ 1) log zminus1
k ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45
Gamma Fields
bull The joint can be written as product of singleton and pairwise
potentials
ψij = exp(minusaijξminus1i ξminus1
j ) (Pairwise)
φi = exp((sum
j
aij + 1) log ξminus1i ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46
Possible Model Topologies
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47
Approximate Inference
bull Stochastic
ndash Markov Chain Monte Carlo Gibbs sampler
ndash Sequential Monte Carlo Particle Filtering
bull Deterministic
ndash Variational Bayes
In all these conjugacy helps
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))
bull Gibbs
v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z
(τ)k )ψkk+1(z
(τ)k+1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: spectrograms; panels: Noisy (X), Original (X_org), X_h (SNR = 19.98), X_v (SNR = 20.79), X_b (SNR = 19.68), X_g (SNR = 19.97)]
Denoising – Music
[Figure: spectrograms; panels: Original, Noisy, PF (SNR = 8.53), Gibbs (SNR = 8.66), VB (SNR = 2.08)]
“Tristram” (Matt Uelmen) + ~0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: Horizontal. Tie across time (harmonic continuity)
• Source 2: Vertical. Tie across frequency (transients, percussive sounds)
[Figure: time (τ) vs. frequency bin (ν); panels X_org, S_hor, S_ver]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

                       s1                        s2
Method          SDR     SIR     SAR      SDR     SIR     SAR
VB             -4.74   -3.28    5.67    -1.58   15.46   -1.37
Gibbs          -4.5    -2.62    4.57     1.05   12.46    1.61
GibbsEM        -4.23   -2.42    4.82     1.34   13.13    1.85
Pre-trained    -4.04   -3.15    8.13     3.56   11.44    4.64
Oracle          6.14   17.16    6.58    12.66   19.95   13.6

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

                       s1                        s2
Method          SDR     SIR     SAR      SDR     SIR     SAR
VB             -7.8    -6.22    4.53    -2.35   18.4    -2.25
Gibbs          -8.46   -7.53    6.93    -4.04   14.59   -3.83
GibbsEM        -7.74   -6.19    4.62    -1.14   16.62   -0.97
Pre-trained    -6.4    -5.39    6.95     3.8    16.39    4.14
Oracle         12.1    32.9    12.14    21.13   33.89   21.37
Harmonic-Transient Decomposition
[Figure: time (τ) vs. frequency bin (ν); panels X_org (Original), S_hor (Hor), S_ver (Vert)]
Chord Detection - Signal model (with Paul Peeling)
[Figure: log σ_ν against frequency ν (Hz), 0–4000 Hz]
Chord Detection
[Figure: left, MDCT of piano chord (MIDI notes 41, 48, 51, 56), frequency (Hz) vs. time τ (s); right, log Σ_ν v_{ν,j,τ} for MIDI note j vs. time τ (s)]
Multichannel Source Separation
• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)
  $\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$
  $v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$
  $s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
  $x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$
  $a_m \sim \mathcal{N}(a_m; \cdots)$, $r_m \sim \mathcal{IG}(r_m; \cdots)$
  for n = 1…N, m = 1…M, k = 1…K
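The hierarchical prior can be simulated ancestrally, which is a quick way to see what kind of sources it favours. In this sketch the mixing vectors a_m and noise variances r_m are fixed inputs, because their priors are left unspecified (···) above; the default values of ν, a_λ and b_λ, and the name `sample_mixture`, are hypothetical. G and IG are read with the scale parameterisation of the Gamma and inverse Gamma definitions given in this deck, so 1/v_{k,n} ~ Gamma(ν/2, scale 2/(νλ_n)).

```python
import numpy as np

def sample_mixture(K, A, r, nu=4.0, a_lam=2.0, b_lam=1.0, seed=0):
    """Draw (lambda, v, s, x) from the hierarchical prior (sketch).

    A: (M, N) mixing matrix whose rows are a_m^T; r: (M,) noise variances.
    """
    rng = np.random.default_rng(seed)
    M, N = A.shape
    lam = rng.gamma(a_lam, b_lam, size=N)                    # lambda_n ~ G(a_lam, scale b_lam)
    # v_kn ~ IG(nu/2, scale 2/(nu lambda_n))  <=>  1/v_kn ~ Gamma(nu/2, scale 2/(nu lambda_n))
    v = 1.0 / rng.gamma(nu / 2.0, 2.0 / (nu * lam), size=(K, N))
    s = rng.normal(0.0, np.sqrt(v))                          # s_kn ~ N(0, v_kn)
    noise = rng.normal(0.0, np.sqrt(r), size=(K, M))
    x = s @ A.T + noise                                      # x_km ~ N(a_m^T s_k, r_m)
    return lam, v, s, x
```

Under this reading E[1/v_{k,n}] = 1/λ_n, so a larger λ_n yields larger typical variances, in line with reading λ_n as the overall “volume” of source n.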
Equivalent Gamma MRF
• A tree for each source
• λ_n can be interpreted as the overall “volume” of source n
Source Separation
[Figure: spectrograms, frequency f (Hz) vs. time t (sec); panels: Speech, Piano, Guitar, Mix]
Reconstructions
[Figure: reconstructed spectrograms, frequency f (Hz) vs. time t (sec); panels: Speech, Piano, Guitar]
Multimodality
• Typically underdetermined (Channels < Sources) ⇒ multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: traces of a, λ and r against epoch (0–2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features)
  – Determine the position of a performance on a score
  – Determine where a human listener would clap her hands
  – Create a quantized, human-readable score
  – …
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Diagram: lattice of states, tempo level n_k ∈ {1, 2, 3} against bar position m_k ∈ {1, …, 8}, shown for 3/4 and 4/4 time]
• Each dot denotes a state x = (m, n) (score position, tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three panels of p(x_2 | x_1) over bar position (1–8) and tempo level (1–5)]
Bar position Pointer - k = 1
[Figure: p(x_1) over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: p(x_2) over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: p(x_3) over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: p(x_4) over bar position and tempo level]
Bar position Pointer - k = 5
[Figure: with y_5 = 0, p(x_5 | y_{1:5}) over bar position and tempo level]
Bar position Pointer - k = 10
[Figure: p(x_10) over bar position and tempo level]
Bar position Pointer - observation model (Poisson)
• Observation model p(y_k | x_k): Poisson intensity
[Figure: intensity μ_k against bar position m_k (0–1000), for a Triplet Rhythm and a Duplet Rhythm]
Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Graphical model: chains n_{0:3}, θ_{0:3}, m_{0:3}, r_{0:3}; intensities λ_{1:3}; observations y_{1:3}]
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator
Filtering
[Figure: against frame index k: observed data y_k; log p(m_k | y_{1:k}) over bar position m_k; log p(n_k | y_{1:k}) over tempo (quarter notes per min); p(r_k | y_{1:k}) for Triplets vs. Duplets]
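Because the bar-pointer state x = (m, n) is discrete, filtering is the standard HMM forward recursion: predict with the transition matrix, multiply by the Poisson likelihood of the count y_k, and renormalise. The sketch below is an illustration under stated assumptions: the tempo-change probability `p_change`, the rule that the position advances by n + 1 steps per frame, and the per-position intensities μ are hypothetical stand-ins, not the parameters of Whiteley, Cemgil and Godsill (2006).

```python
import numpy as np
from math import lgamma

def bar_pointer_transition(M, N, p_change=0.1):
    """Transition matrix over states x = (m, n), flattened as i = m * N + n.

    Assumed dynamics (illustrative): bar position m in 0..M-1 advances by n + 1
    positions (mod M); tempo level n in 0..N-1 stays with probability
    1 - p_change, otherwise moves to a neighbouring level.
    """
    T = np.zeros((M * N, M * N))
    for m in range(M):
        for n in range(N):
            m_next = (m + n + 1) % M
            nbrs = [q for q in (n - 1, n + 1) if 0 <= q < N]
            moves = {n: 1.0 - p_change if nbrs else 1.0}
            for q in nbrs:
                moves[q] = p_change / len(nbrs)
            for q, p in moves.items():
                T[m * N + n, m_next * N + q] = p
    return T

def forward_filter(y, T, mu_state, prior):
    """Filtered marginals p(x_k | y_{1:k}) for counts y_k ~ PO(mu(x_k))."""
    filt = np.zeros((len(y), T.shape[0]))
    belief = np.asarray(prior, dtype=float)   # p(x_1)
    log_mu = np.log(mu_state)
    for k, yk in enumerate(y):
        if k > 0:
            belief = belief @ T               # predict: p(x_k | y_{1:k-1})
        loglik = yk * log_mu - mu_state - lgamma(yk + 1)
        belief = belief * np.exp(loglik - loglik.max())   # update, rescaled for stability
        belief /= belief.sum()
        filt[k] = belief
    return filt
```

Usage: with `mu_state = np.repeat(mu_per_position, N)` the intensity depends only on the bar position, as in the observation model above; marginals over m or n follow from `filt[k].reshape(M, N)`.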
Smoothing
[Figure: against frame index k: observed data y_k; log p(m_k | y_{1:K}) over bar position m_k; log p(n_k | y_{1:K}) over tempo (quarter notes per min); p(r_k | y_{1:K}) for Triplets vs. Duplets]
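Smoothing reuses the filtered marginals in a backward pass: p(x_k | y_{1:K}) = p(x_k | y_{1:k}) Σ_{x_{k+1}} p(x_{k+1} | x_k) p(x_{k+1} | y_{1:K}) / p(x_{k+1} | y_{1:k}). The generic discrete-state sketch below (the name `forward_backward` is hypothetical) takes the transition matrix and per-frame likelihoods as arrays, so it applies to any finite state space, such as the bar-pointer lattice.

```python
import numpy as np

def forward_backward(T, lik, prior):
    """Filtered and smoothed marginals of a discrete HMM (sketch).

    T[i, j] = p(x_{k+1}=j | x_k=i); lik[k, j] = p(y_k | x_k=j); prior[j] = p(x_1=j).
    Returns (filt, smooth): filt[k] = p(x_k | y_{1:k}), smooth[k] = p(x_k | y_{1:K}).
    """
    K, S = lik.shape
    filt = np.zeros((K, S))
    pred = np.zeros((K, S))                    # pred[k] = p(x_k | y_{1:k-1})
    pred[0] = prior
    for k in range(K):
        if k > 0:
            pred[k] = filt[k - 1] @ T
        unnorm = pred[k] * lik[k]
        filt[k] = unnorm / unnorm.sum()
    smooth = np.zeros_like(filt)
    smooth[-1] = filt[-1]
    for k in range(K - 2, -1, -1):
        # ratio_j = p(x_{k+1}=j | y_{1:K}) / p(x_{k+1}=j | y_{1:k}), zero where pred is zero
        ratio = np.divide(smooth[k + 1], pred[k + 1],
                          out=np.zeros(S), where=pred[k + 1] > 0)
        smooth[k] = filt[k] * (T @ ratio)
        smooth[k] /= smooth[k].sum()           # guards against round-off
    return filt, smooth
```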
Time Signature
[Figure: observed data (sample value vs. time, s); against frame index k: log p(m_k | z_{1:K}); log p(n_k | z_{1:K}) (quarter notes per min); p(θ_k | z_{1:K}) for 4/4 vs. 3/4]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
Score-Performance matching - Graphical Model
[Graphical model: score positions t_1 … t_K, indicators r_1 … r_K, gains λ_1 … λ_K, variances v_{ν,1} … v_{ν,K} and coefficients s_{ν,1} … s_{ν,K}, for ν = 1…W]
  $v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau)))$
Score-Performance matching - Signal model
[Figure: log σ_ν against frequency ν (Hz), 0–4000 Hz]
Score-Performance matching
[Figure: left, spectrogram data, frequency (Hz) vs. time (s); right, MIDI data, MIDI note vs. score position]
Online (filtering) or offline (smoothing) processing is possible.
Transcription
[Figure: log p(r_τ | s_τ), MIDI note number vs. time (s); Σ_i w_τ^{(i)} λ_τ^{(i)} on a log λ axis vs. time (s); MDCT of audio (source: Daniel-Ben Pienaar), frequency (Hz) vs. time (s)]
Summary
• “Time Domain” – Switching State Space Models
  – State space modeling
  – Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
  – Analysis down to sample precision (if required)
  – Computationally quite heavy
• “Transform Domain” – Gamma Fields
  – Models on (orthogonal) transform coefficients; energy compaction
  – Practical: can make use of fast transforms (FFT, MDCT, …)
  – Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
  – Time-frequency energy distributions
• Ongoing work
  – Comparison of inference methods (VB, MCMC, SMC)
  – Learning
  – Applications
    ∗ Chord detection, polyphonic transcription
    ∗ Musical score guided source separation
  – Prior structures for other observation models (NMF)
Transcription
• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)
[Figure: exact inference result over samples 500–3500]
Tracking Pitch Variations
• Allow m to change with k
• Intractable: need to resort to approximate inference (Mixture Kalman Filter – Rao-Blackwellized Particle Filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: latent processes over frequency ν and time k generating x_k]
• Each latent changepoint process ν = 1…W corresponds to a “piano key”. Indicators r_{1:W,1:K} encode a latent “piano roll”.
Single time slice - Bayesian Variable Selection
  $r_i \sim \mathcal{C}(r_i; \pi_{on}, \pi_{off})$
  $s_i \mid r_i \sim [r_i = \mathrm{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \mathrm{on}]\, \delta(s_i)$
  $x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$
  $C \equiv [\, C_1 \ \cdots \ C_i \ \cdots \ C_W \,]$
[Graphical model: indicators r_1 … r_W, coefficients s_1 … s_W, observation x]
• Generalized Linear Model – the columns of C are the basis vectors
• The exact posterior is a mixture of 2^W Gaussians
• When W is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman, Attias, …)
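For small W the 2^W-component posterior can be computed exactly by enumeration: integrating out s for each on/off pattern r gives x | r ~ N(0, R + Σ_{i on} C_i Σ C_i^T). The sketch assumes each C_i is a single column with scalar prior variance σ² standing in for Σ; the names `indicator_posterior` and `log_gauss` are hypothetical. The loop over all 2^W patterns is exactly the step that becomes intractable as W grows.

```python
import itertools
import numpy as np

def log_gauss(x, cov):
    """log N(x; 0, cov), computed through a Cholesky factor."""
    L = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(L, x)
    return (-0.5 * alpha @ alpha - np.log(np.diag(L)).sum()
            - 0.5 * x.size * np.log(2.0 * np.pi))

def indicator_posterior(x, C, sigma2, R, pi_on):
    """Exact p(r_{1:W} | x) by enumerating every on/off pattern (sketch)."""
    W = C.shape[1]
    configs = list(itertools.product((0, 1), repeat=W))
    logp = np.empty(len(configs))
    for idx, r in enumerate(configs):
        on = np.array(r, dtype=bool)
        C_on = C[:, on]
        # Integrating out s: x | r ~ N(0, R + sigma2 * C_on C_on^T)
        cov = R + sigma2 * C_on @ C_on.T
        log_prior = on.sum() * np.log(pi_on) + (W - on.sum()) * np.log(1.0 - pi_on)
        logp[idx] = log_gauss(x, cov) + log_prior
    post = np.exp(logp - logp.max())
    return configs, post / post.sum()
```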
Factorial Switching State space model
  $r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$
  $\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$
  $r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$ (Changepoint indicator)
  $\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_{k,\nu})\, \theta_{k-1,\nu}, Q_\nu(r_{k,\nu}))$ (Latent state)
  $y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R)$ (Observation)
[Graphical model: for ν = 1…W, chains r_{ν,0} ··· r_{ν,k} ··· r_{ν,K} and s_{ν,0} ··· s_{ν,k} ··· s_{ν,K}, emitting y_k … y_K]
Synthetic Data
[Figure: synthetic data, frequency index ν against time k]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable – computations with large matrices
  – Need advanced techniques from linear algebra
  – Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical – acoustical
• Time domain – state space dynamical models
• Transform domain – Fourier representations, Generalised Linear model
• Feature based
Spectrogram
• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (frequency, time), such as STFT or MDCT
  $x(t) = \sum_k s_k \phi_k(t)$
[Figure: spectrograms, frequency f (Hz) vs. time t (sec); panels: Speech, Piano]
• Spectrogram displays log |s_k| or |s_k|^2 (of STFT)
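As a concrete instance of x(t) = Σ_k s_k φ_k(t), the sketch below uses a blockwise orthonormal DCT-II: atom k = (ν, τ) is a cosine at frequency index ν supported on frame τ. This is a simpler substitute for the STFT or MDCT named above (no overlap, no windowing), chosen only because its basis is easy to write down; the function names are hypothetical.

```python
import numpy as np

def dct_basis(L):
    """Orthonormal DCT-II basis: column nu is a cosine at frequency index nu."""
    n = np.arange(L)
    Phi = np.cos(np.pi * (n[:, None] + 0.5) * n[None, :] / L) * np.sqrt(2.0 / L)
    Phi[:, 0] /= np.sqrt(2.0)
    return Phi                                   # Phi.T @ Phi = I

def analyse(x, L):
    """s[nu, tau] = <x, phi_{nu,tau}> for non-overlapping frames (len(x) divisible by L)."""
    frames = np.asarray(x, dtype=float).reshape(-1, L).T   # column tau = frame tau
    return dct_basis(L).T @ frames

def synthesise(s):
    """x(t) = sum_k s_k phi_k(t), with k = (nu, tau)."""
    L = s.shape[0]
    return (dct_basis(L) @ s).T.reshape(-1)
```

Because the basis is orthonormal, the energy Σ_k s_k² equals Σ_t x(t)², and a log-magnitude display in the spirit of a spectrogram is `np.log(np.abs(s) + 1e-12)`.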
Models for time-frequency Energy distributions
• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; …)
  $X_{\nu,\tau} = \sum_j W_{\nu,j} S_{j,\tau}$
  Spectrogram = Spectral Templates × Excitations
  – however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; …)
  $X_{\nu,\tau} = [r_{\nu,\tau} = 0]\, S^{(0)}_{\nu,\tau} + [r_{\nu,\tau} = 1]\, S^{(1)}_{\nu,\tau}$
  Spectrogram = Mask × Source0 + (1 − Mask) × Source1
  – however, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)
• We place a suitable prior on the variance of the transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms
  $p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$
One channel source separation: Gaussian source model
[Graphical model: for k = 1…K, variances v_{k,1} … v_{k,N} → sources s_{k,1} … s_{k,N} → observation x_k]
  $s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
  $x_k = \sum_{n=1}^{N} s_{k,n}$
• A straightforward application of Bayes' theorem yields
  $p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))$
  $\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$ (Responsibilities)
• Each source coefficient s_n gets a fraction κ_n of the observation x
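The responsibilities translate directly into a Wiener-style estimate of each source given its prior variances. A short sketch, assuming the variances v_{k,n} are already known (the name `gaussian_source_posterior` is hypothetical):

```python
import numpy as np

def gaussian_source_posterior(x, v):
    """Posterior of each s_kn in x_k = sum_n s_kn, with s_kn ~ N(0, v_kn) (sketch).

    x: (K,) observations; v: (K, N) prior variances.
    Returns posterior means kappa_kn * x_k and variances v_kn * (1 - kappa_kn).
    """
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    kappa = v / v.sum(axis=1, keepdims=True)    # responsibilities, sum to 1 over n
    mean = kappa * x[:, None]
    var = v * (1.0 - kappa)
    return mean, var

mean, var = gaussian_source_posterior([2.0], [[1.0, 3.0]])
# mean -> [[0.5, 1.5]], var -> [[0.75, 0.75]]
```

Since the κ_{k,n} sum to one over n, the posterior means always add back up to the observation x_k.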
One channel source separation: Poisson source model
[Graphical model: for k = 1…K, variances v_{k,1} … v_{k,N} → sources s_{k,1} … s_{k,N} → observation x_k]
  $s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$
  $x_k = \sum_{n=1}^{N} s_{k,n}$
• This is the generative model for NMF when we write
  $v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (Template × Excitation)
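For this Poisson model, the well-known multiplicative updates of Lee and Seung for KL-divergence NMF can be read as an EM algorithm that treats the source counts s_{k,n} as missing data. The sketch below uses those standard updates (not necessarily the inference scheme used in these slides) with random initialisation; names and defaults are illustrative.

```python
import numpy as np

def kl_divergence(X, Lam):
    """Generalised KL divergence D(X || Lam); equals -log PO(X; Lam) up to a term constant in Lam."""
    pos = X > 0
    return np.sum(X[pos] * np.log(X[pos] / Lam[pos])) - X.sum() + Lam.sum()

def poisson_nmf(X, n_components, n_iter=200, seed=0, eps=1e-12):
    """Fit X[nu, tau] ~ PO(sum_n t[nu, n] * e[tau, n]) with KL multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    t = rng.uniform(0.5, 1.5, size=(n_freq, n_components))    # templates t_{nu,n}
    e = rng.uniform(0.5, 1.5, size=(n_frames, n_components))  # excitations e_{tau,n}
    history = []
    for _ in range(n_iter):
        t *= ((X / (t @ e.T + eps)) @ e) / e.sum(axis=0)
        e *= ((X / (t @ e.T + eps)).T @ t) / t.sum(axis=0)
        history.append(kl_divergence(X, t @ e.T + eps))
    return t, e, np.array(history)
```

Each update does not increase the divergence, so `history` should be non-increasing up to round-off.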
Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)
[Figure: Gamma and inverse Gamma densities p(x) for 0 ≤ x ≤ 5; parameter settings (a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1) in one panel and (1, 1), (1, 0.5), (2, 1) in the other]
  $\mathcal{G}(x; a, z) \equiv \exp((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a))$
  $\mathcal{IG}(x; a, z) \equiv \exp((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a))$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity and inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
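The two definitions translate directly into log-densities. Note that z is a scale parameter here, so NumPy's `Generator.gamma(a, z)` draws from G(x; a, z), and if x ~ G(x; a, z) then 1/x ~ IG(1/x; a, z). The function names are hypothetical.

```python
import numpy as np
from math import lgamma

def log_gamma_pdf(x, a, z):
    # log G(x; a, z) = (a - 1) log x - x / z + a log(1/z) - log Gamma(a), z a scale
    return (a - 1.0) * np.log(x) - x / z - a * np.log(z) - lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    # log IG(x; a, z) = (a + 1) log(1/x) - 1 / (z x) + a log(1/z) - log Gamma(a)
    return -(a + 1.0) * np.log(x) - 1.0 / (z * x) - a * np.log(z) - lgamma(a)
```

A quick sanity check is that both densities integrate to one for the parameter settings plotted above.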
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1…K as follows:
  $v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$
  $z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$
[Graphical model: chain z_1 ··· v_{k−1} z_k v_k z_{k+1} ···, with couplings a_z, a, a_z]
• The variance variables v are the prior variances of the sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• The shape parameters a and a_z describe the coupling strength and drift of the chain
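Draws like those in the typical-draws plots can be generated by ancestral sampling of the chain, drawing each inverse Gamma variable as the reciprocal of a Gamma variable with the same shape and scale. The starting value z_1 = 1 and the name `sample_igmc` are assumptions for illustration.

```python
import numpy as np

def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Ancestral sample v_1..v_K of the inverse Gamma-Markov chain (sketch).

    IG(x; shape, scale s) is drawn as 1 / Gamma(shape, scale s).
    """
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = z1                                       # hypothetical initial value for z_1
    for k in range(K):
        v[k] = 1.0 / rng.gamma(a, z / a)         # v_k | z_k ~ IG(v_k; a, z_k / a)
        z = 1.0 / rng.gamma(a_z, v[k] / a_z)     # z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z)
    return v
```

Under this parameterisation E[log v_{k+1} − log v_k] = ψ(a_z) − log a_z − ψ(a) + log a (ψ the digamma function), which is nonzero when a ≠ a_z; this is one way to see the drift mentioned above, while larger shapes make the increments of log v_k smaller.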
Gamma Chains: typical draws
[Figure: typical draws of log v_k for k = 1…1000; three panels with a = 10 and a_z = 10, 4, 40]
Gamma Chains with changepoints: typical draws
[Figure: typical draws of log v_k for k = 1…1000; three panels with a = 10 and a_z = 10, 4, 40]
Gamma Chains
bull The joint can be written as product of singleton and pairwise
potentials of form
ψkk = exp(minusazminus1k vminus1
k ) (Pairwise)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
φzk = exp((az + a+ 1) log zminus1
k ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45
Gamma Fields
bull The joint can be written as product of singleton and pairwise
potentials
ψij = exp(minusaijξminus1i ξminus1
j ) (Pairwise)
φi = exp((sum
j
aij + 1) log ξminus1i ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46
Possible Model Topologies
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47
Approximate Inference
bull Stochastic
ndash Markov Chain Monte Carlo Gibbs sampler
ndash Sequential Monte Carlo Particle Filtering
bull Deterministic
ndash Variational Bayes
In all these conjugacy helps
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))
bull Gibbs
v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z
(τ)k )ψkk+1(z
(τ)k+1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
bull Inference Variational Bayes
Noisy Original
X
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xorg
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xh SNR1998
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xv SNR2079
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xb SNR1968
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xg SNR1997
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51
Denoising ndash MusicOriginal
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Noisy
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
PF SNR853
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Gibbs SNR866
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
VB SNR208
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52
Single Channel Source Separation (with Onur Dikmen)
bull Source 1 Horizontal Tie across time harmonic continuity
bull Source 2 Vertical Tie across frequency transients percussive sounds
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53
Single Channel Source Separation with IGMCs
E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Preminus trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
bull Oracle We use the square of the source coefficient as the latent variance estimate
bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
[Figure: p(x_1) over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: p(x_2) over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: p(x_3) over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: p(x_4) over bar position and tempo level]
Bar position Pointer - k = 5
[Figure: p(x_5 | y_{1:5}) after observing y_5 = 0, over bar position and tempo level]
Bar position Pointer - k = 10
[Figure: p(x_10) over bar position and tempo level]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson with intensity $\mu_k$, which depends on the bar position $m_k$
[Figure: intensity μ_k against bar position m_k (0 to 1000), for a triplet rhythm and for a duplet rhythm]
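Combining the lattice dynamics with a Poisson onset likelihood gives a discrete HMM whose filtering distribution p(x_k | y_{1:k}) can be computed exactly by a forward recursion. The sketch below is an illustration under stated assumptions, not the published model: the intensity profile `mu` (one value per 0-indexed bar position), the flat prior, the tempo-change probability and the name `forward_filter` are all hypothetical.

```python
import math


def poisson_pmf(y, mu):
    """Poisson probability of observing y onsets given intensity mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def forward_filter(ys, M, N, mu, p_change=0.1):
    """Return the filtered distributions p(x_k | y_{1:k}) as dicts over (m, n).

    mu[m] is a hypothetical onset intensity for bar position m; the transition
    (position advances by tempo n, tempo occasionally moves by +/-1) is an
    assumed toy version of the bar-pointer dynamics.
    """
    states = [(m, n) for m in range(M) for n in range(1, N + 1)]
    alpha = {x: 1.0 / len(states) for x in states}  # flat prior p(x_1)
    history = []
    for k, y in enumerate(ys):
        if k > 0:  # prediction step through the lattice transitions
            pred = dict.fromkeys(states, 0.0)
            for (m, n), w in alpha.items():
                m_next = (m + n) % M
                for dn, p in ((-1, p_change / 2), (0, 1.0 - p_change), (1, p_change / 2)):
                    pred[(m_next, min(max(n + dn, 1), N))] += w * p
            alpha = pred
        # Update step: multiply by the Poisson likelihood p(y_k | x_k), then normalise
        alpha = {x: w * poisson_pmf(y, mu[x[0]]) for x, w in alpha.items()}
        total = sum(alpha.values())
        alpha = {x: w / total for x, w in alpha.items()}
        history.append(alpha)
    return history
```

With one onset per bar at position 0 (`mu[0] = 4`, `mu[m] = 0.05` elsewhere, M = 8, N = 2) and onset counts `([4] + [0] * 7) * 2 + [4]`, the filtered mode at the last frame is (0, 1): back at the bar start with tempo level 1. Normalising at every frame keeps the recursion free of underflow.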
Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Figure: graphical model with chains n_k, θ_k, m_k, r_k (k = 0, ..., 3), intensities λ_k and observations y_k (k = 1, 2, 3)]
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator
Filtering
[Figure: observed data y_k and filtered posteriors against frame index k: log p(m_k | y_{1:k}), log p(n_k | y_{1:k}) with tempo in quarter notes per minute (60 to 180), and p(r_k | y_{1:k}) for triplets versus duplets]
Smoothing
[Figure: observed data y_k and smoothed posteriors against frame index k: log p(m_k | y_{1:K}), log p(n_k | y_{1:K}) with tempo in quarter notes per minute (60 to 180), and p(r_k | y_{1:K}) for triplets versus duplets]
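The smoothing panels replace p(x_k | y_{1:k}) by p(x_k | y_{1:K}). On a discrete state space such as the bar-pointer lattice this is the standard forward-backward recursion. A generic sketch (the state encoding and the two-state toy model in the usage note are illustrative assumptions):

```python
def forward_backward(prior, A, lik):
    """Filtered and smoothed marginals of a discrete HMM.

    prior[i]  = p(x_1 = i)
    A[i][j]   = p(x_{k+1} = j | x_k = i)
    lik[k][i] = p(y_k | x_k = i)
    Returns (filtered, smoothed), each a list of K probability vectors.
    """
    K, S = len(lik), len(prior)

    def normalise(vec):
        total = sum(vec)
        return [v / total for v in vec]

    # Forward pass: filtered[k][i] = p(x_k = i | y_{1:k})
    filtered = [normalise([prior[i] * lik[0][i] for i in range(S)])]
    for k in range(1, K):
        pred = [sum(filtered[-1][i] * A[i][j] for i in range(S)) for j in range(S)]
        filtered.append(normalise([lik[k][j] * pred[j] for j in range(S)]))

    # Backward pass: beta[i] is proportional to p(y_{k+1:K} | x_k = i)
    beta = [1.0] * S
    smoothed = [None] * K
    for k in range(K - 1, -1, -1):
        smoothed[k] = normalise([filtered[k][i] * beta[i] for i in range(S)])
        if k > 0:
            beta = normalise([sum(A[i][j] * lik[k][j] * beta[j] for j in range(S))
                              for i in range(S)])
    return filtered, smoothed
```

For a sticky two-state chain `A = [[0.9, 0.1], [0.1, 0.9]]` with a flat prior and likelihoods `[[0.5, 0.5], [0.9, 0.1], [0.9, 0.1]]`, the first frame stays at (0.5, 0.5) when filtering, but smoothing pulls it to about 0.88 for state 0 because later frames favour that state. At the last frame the two estimates coincide, which is why offline processing mostly changes the early part of a performance.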
Time Signature
[Figure: observed audio (sample value against time, 0 to 12 s) and smoothed posteriors against frame index k: log p(m_k | z_{1:K}), log p(n_k | z_{1:K}) with tempo in quarter notes per minute, and p(θ_k | z_{1:K}) for 4/4 versus 3/4]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: musical score and audio signal x_t against time t]
Score-Performance matching - Graphical Model
[Figure: graphical model with chains t_k, r_k, λ_k and, inside a plate over ν = 1, ..., W, v_{ν,k} and s_{ν,k}, for k = 1, ..., K]
$$v_{\nu\tau} \sim \mathcal{IG}\big(v_{\nu\tau};\, a,\, 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$$
Score-Performance matching - Signal model
[Figure: log σ_ν against frequency ν (0 to 4000 Hz)]
Score-Performance matching
[Figure: spectrogram data (frequency 0 to 4000 Hz against time 0 to 14 s) and MIDI data (MIDI note 55 to 85 against score position)]
Online (filtering) or offline (smoothing) processing is possible.
Transcription
[Figure: log p(r_τ | s_τ) for MIDI note numbers 60 to 80 against time (1 to 4 s); the estimate Σ_i w_τ^{(i)} λ_τ^{(i)} on a log λ scale against time; MDCT of the audio (source: Daniel-Ben Pienaar), frequency 0 to 4000 Hz against time]
Summary
• “Time Domain”: Switching State Space Models
  – State space modeling
  – Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
  – Analysis down to sample precision (if required)
  – Computationally quite heavy
• “Transform Domain”: Gamma Fields
  – Models on (orthogonal) transform coefficients; energy compaction
  – Practical: can make use of fast transforms (FFT, MDCT, ...)
  – Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
  – time-frequency energy distributions
• Ongoing work
  – Comparison of inference methods (VB, MCMC, SMC)
  – Learning
  – Applications
    ∗ Chord detection, polyphonic transcription
    ∗ Musical-score-guided source separation
  – Prior structures for other observation models (NMF)
Page 29
Tracking Pitch Variations
• Allow m to change with k
• Exact inference becomes intractable, so we need to resort to approximate inference (mixture Kalman filter / Rao-Blackwellized particle filter)
Factorial Generative models for Analysis of Polyphonic Audio
[Figure: indicators over frequency index ν and frame k, and the resulting signal x_k]
• Each latent changepoint process ν = 1, ..., W corresponds to a “piano key”. The indicators r_{1:W,1:K} encode a latent “piano roll”.
Single time slice - Bayesian Variable Selection
$$r_i \sim \mathcal{C}(r_i; \pi_{on}, \pi_{off})$$
$$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$$
$$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$$
$$C \equiv [\, C_1 \ \cdots \ C_i \ \cdots \ C_W \,]$$
[Figure: graphical model r_1, ..., r_W → s_1, ..., s_W → x]
• Generalized linear model: the columns of C are the basis vectors
• The exact posterior is a mixture of 2^W Gaussians
• When W is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman; Attias; ...)
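For small W the 2^W-component mixture can be enumerated exactly. The sketch below makes simplifying assumptions that are not on the slide: x is a scalar, so each column C_i reduces to one coefficient `c[i]`; Σ = `sigma2` and R are scalars; and π_off = 1 − π_on. Integrating out s_{1:W} leaves x ∼ N(0, R + sigma2 · Σ_{i on} c_i²) for each on/off configuration, which gives the mixture weights. The function name is hypothetical.

```python
import itertools
import math


def gauss_logpdf(x, var):
    return -0.5 * (math.log(2 * math.pi * var) + x * x / var)


def indicator_posterior(x, c, pi_on, sigma2, R):
    """Exact posterior over all 2**W on/off configurations (scalar x)."""
    log_w = {}
    for r in itertools.product((0, 1), repeat=len(c)):
        # Marginal variance of x once the active coefficients s_i are integrated out
        var = R + sigma2 * sum(ci * ci for ci, ri in zip(c, r) if ri)
        log_prior = sum(math.log(pi_on) if ri else math.log(1.0 - pi_on) for ri in r)
        log_w[r] = log_prior + gauss_logpdf(x, var)
    # Normalise in a numerically stable way (log-sum-exp)
    top = max(log_w.values())
    Z = sum(math.exp(v - top) for v in log_w.values())
    return {r: math.exp(v - top) / Z for r, v in log_w.items()}
```

With `x = 10`, `c = [1, 1, 1]`, `sigma2 = 1`, `R = 0.01` and `pi_on = 0.5` the most probable configuration is (1, 1, 1); with `x = 0` it is (0, 0, 0). The loop visits every configuration, so its cost doubles with each extra basis vector, which is the intractability noted above.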
Factorial Switching State space model
$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$$
$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$$
$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}\big(r_{k,\nu}; \pi_\nu(r_{k-1,\nu})\big) \quad \text{(changepoint indicator)}$$
$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}\big(\theta_{k,\nu}; A_\nu(r_k)\, \theta_{k-1,\nu}, Q_\nu(r_k)\big) \quad \text{(latent state)}$$
$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k\, \theta_{k,1:W}, R) \quad \text{(observation)}$$
[Figure: graphical model with, for each ν = 1, ..., W, an indicator chain r_{ν,0}, ..., r_{ν,K} and a state chain s_{ν,0}, ..., s_{ν,K}, feeding observations y_1, ..., y_K]
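The model is conditionally linear-Gaussian: once the indicator path is fixed, a Kalman filter gives the exact posterior of θ. The sketch below is a deliberately reduced version (W = 1, scalar state, hypothetical values for A(r), Q(r), R and the stay probability; the talk's models use richer state matrices). Summing over all unknown indicator paths is what makes exact inference intractable.

```python
import math
import random


def simulate(K, A, Q, R, p_stay, seed=0):
    """Draw an on/off indicator path r_k and observations y_k for one process."""
    rng = random.Random(seed)
    r, theta = 0, 0.0
    rs, ys = [], []
    for _ in range(K):
        if rng.random() > p_stay:  # changepoint: switch the indicator
            r = 1 - r
        theta = A[r] * theta + rng.gauss(0.0, math.sqrt(Q[r]))  # latent state
        rs.append(r)
        ys.append(theta + rng.gauss(0.0, math.sqrt(R)))          # observation
    return rs, ys


def kalman_given_r(ys, rs, A, Q, R, m0=0.0, P0=1.0):
    """Exact filtering of theta_k when the indicator path rs is known."""
    m, P = m0, P0
    means, variances = [], []
    for y, r in zip(ys, rs):
        m_pred, P_pred = A[r] * m, A[r] ** 2 * P + Q[r]  # predict
        gain = P_pred / (P_pred + R)                       # Kalman gain
        m = m_pred + gain * (y - m_pred)                   # update mean
        P = (1.0 - gain) * P_pred                          # update variance
        means.append(m)
        variances.append(P)
    return means, variances
```

Because each filtered variance equals P_pred · R / (P_pred + R), it is always below the observation noise R. A Rao-Blackwellized particle filter keeps one such Kalman filter per sampled indicator path.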
Synthetic Data
[Figure: synthetic example indexed by frequency ν and frame k, with the generated signal x]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not numerically stable: computations with large matrices
  – Need advanced techniques from linear algebra
  – Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical, acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, generalised linear model
• Feature based
Spectrogram
• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (frequency, time), such as the STFT or MDCT
$$x(t) = \sum_k s_k \phi_k(t)$$
[Figure: spectrograms of speech and piano, frequency f (0 to 10000 Hz) against time t]
• A spectrogram displays log |s_k| or |s_k|² (of the STFT)
Models for time-frequency Energy distributions
• Non-negative matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)
$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$
Spectrogram = Spectral Templates × Excitations
  – however, spectrograms are not additive ($a^2 + b^2 \neq (a+b)^2$)
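One common way to fit X ≈ W S is the multiplicative update rule of Lee and Seung for the generalized Kullback-Leibler divergence, which never increases the divergence. A pure-Python sketch; the matrix sizes, random initialisation and small `eps` guard are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math
import random


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def kl_div(X, V):
    """Generalized KL divergence D(X || V) for strictly positive matrices."""
    return sum(x * math.log(x / v) - x + v
               for xr, vr in zip(X, V) for x, v in zip(xr, vr))


def kl_nmf(X, J, n_iter=100, seed=0, eps=1e-12):
    """Factor X (F x T) into templates W (F x J) and excitations S (J x T)."""
    rng = random.Random(seed)
    F, T = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(J)] for _ in range(F)]
    S = [[rng.random() + 0.1 for _ in range(T)] for _ in range(J)]
    for _ in range(n_iter):
        V = matmul(W, S)
        # Excitation update (uses the old S on the right-hand side)
        S = [[S[j][t] * sum(W[f][j] * X[f][t] / (V[f][t] + eps) for f in range(F))
              / (sum(W[f][j] for f in range(F)) + eps)
              for t in range(T)] for j in range(J)]
        V = matmul(W, S)
        # Template update with the refreshed reconstruction
        W = [[W[f][j] * sum(S[j][t] * X[f][t] / (V[f][t] + eps) for t in range(T))
              / (sum(S[j][t] for t in range(T)) + eps)
              for j in range(J)] for f in range(F)]
    return W, S
```

Because the updates only rescale positive entries, W and S stay non-negative; with a fixed seed, running more iterations gives a divergence no larger than running fewer.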
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)
$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$
Spectrogram = Mask × Source_0 + (1 − Mask) × Source_1
  – however, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)
• We place a suitable prior on the variances of the transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms
$$p(s \mid v)\, p(v) = \Big(\prod_k p(s_k \mid v_k)\Big)\, p(v)$$
One channel source separation: Gaussian source model
[Figure: graphical model with variances v_{k,1}, ..., v_{k,N}, sources s_{k,1}, ..., s_{k,N} and observation x_k, for k = 1, ..., K]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n}), \qquad x_k = \sum_{n=1}^{N} s_{k,n}$$
• Straightforward application of Bayes' theorem yields
$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n}; \kappa_{k,n} x_k,\, v_{k,n}(1 - \kappa_{k,n})\big), \qquad \kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \quad \text{(responsibilities)}$$
• Each source coefficient s_{k,n} gets a fraction κ_{k,n} of the observation x_k
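The posterior above follows from conditioning a jointly Gaussian vector: Cov(s_{k,n}, x_k) = v_{k,n} and Var(x_k) = Σ_{n'} v_{k,n'}, so E[s_{k,n} | x_k] = κ_{k,n} x_k and Var[s_{k,n} | x_k] = v_{k,n} − v_{k,n}² / Σ_{n'} v_{k,n'} = v_{k,n}(1 − κ_{k,n}). A direct transcription for one atom (the function name is illustrative):

```python
def gaussian_source_posterior(x, v):
    """Posterior means and variances of s_{k,1:N} given x_k = sum_n s_{k,n}."""
    total = sum(v)
    kappa = [vn / total for vn in v]                            # responsibilities
    means = [kn * x for kn in kappa]                            # kappa_n * x_k
    variances = [vn * (1.0 - kn) for vn, kn in zip(v, kappa)]   # v_n (1 - kappa_n)
    return kappa, means, variances
```

For `x = 2` and `v = [1, 3]` the responsibilities are (0.25, 0.75), the posterior means (0.5, 1.5) add back up to x, and both posterior variances equal 0.75.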
One channel source separation: Poisson source model
[Figure: graphical model with variances v_{k,1:N}, sources s_{k,1:N} and observation x_k, for k = 1, ..., K]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n}), \qquad x_k = \sum_{n=1}^{N} s_{k,n}$$
• This is the generative model for NMF when we write $v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (template × excitation)
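With Poisson sources the same conditioning gives a multinomial: given x_k, (s_{k,1}, ..., s_{k,N}) ∼ Multinomial(x_k; κ_{k,1}, ..., κ_{k,N}) with the same κ_{k,n} = v_{k,n} / Σ_{n'} v_{k,n'}, so E[s_{k,n} | x_k] = κ_{k,n} x_k. A small check by brute-force enumeration for N = 2 (helper names are illustrative):

```python
import math


def poisson_split_mean(x, v):
    """E[s_n | x] when s_n ~ PO(v_n) independently and x = sum_n s_n."""
    total = sum(v)
    return [x * vn / total for vn in v]


def brute_force_mean_two(x, v1, v2):
    """Same quantity for N = 2 by enumerating every split s1 + s2 = x."""
    def pmf(s, mu):
        return math.exp(-mu) * mu ** s / math.factorial(s)
    weights = [pmf(s1, v1) * pmf(x - s1, v2) for s1 in range(x + 1)]
    total = sum(weights)
    return sum(s1 * w for s1, w in enumerate(weights)) / total
```

For x = 5 and v = (1, 3) both give E[s_1 | x] = 1.25. In the NMF reading, `v` would be built as `[t[nu][n] * e[tau][n] for n in range(N)]` for each atom (ν, τ).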
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: densities p(x) for 0 ≤ x ≤ 5; one panel with (a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1), the other with (a, b) = (1, 1), (1, 0.5), (2, 1)]
$$\mathcal{G}(x; a, z) \equiv \exp\big((a-1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$$
$$\mathcal{IG}(x; a, z) \equiv \exp\big((a+1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity and inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
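The two definitions translate directly into log-densities. In this parameterisation z is a scale: G(x; a, z) has mean a z, and if y ∼ G(y; a, z) then 1/y ∼ IG(a, z), whose mean is 1/(z(a − 1)) for a > 1. A minimal sketch (function names are illustrative; Python's `random.gammavariate(alpha, beta)` takes shape and scale):

```python
import math
import random


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a-1) log x - x/z + a log(1/z) - log Gamma(a)."""
    return (a - 1) * math.log(x) - x / z + a * math.log(1 / z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = (a+1) log(1/x) - 1/(z x) + a log(1/z) - log Gamma(a)."""
    return (a + 1) * math.log(1 / x) - 1 / (z * x) + a * math.log(1 / z) - math.lgamma(a)


def sample_inv_gamma(rng, a, z):
    """If y ~ G(a, z) (shape a, scale z) then 1/y ~ IG(a, z)."""
    return 1.0 / rng.gammavariate(a, z)
```

Both densities integrate to one numerically, and draws of `sample_inv_gamma(rng, 3.0, 1.0)` average close to 1/(1 · 2) = 0.5.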
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1, ..., K as follows:
$$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$$
$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$$
[Figure: chain z_1, ..., v_{k-1}, z_k, v_k, z_{k+1}, ... with couplings a_z, a, a_z on consecutive edges]
• Variance variables v are the prior variances of the sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• Shape parameters a and a_z describe the coupling strength and drift of the chain
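Ancestral sampling from the chain needs only inverse Gamma draws. This sketch assumes the scale parameters z_k/a and v_k/a_z, which agree with the pairwise potential exp(−a z_k^{-1} v_k^{-1}) and the singleton exponent a_z + a + 1 of the Gamma chain factorisation; the starting value `z1` and the function names are hypothetical.

```python
import math
import random


def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Draw v_1..v_K from the inverse Gamma-Markov chain.

    Assumes v_k | z_k ~ IG(a, z_k / a) and z_{k+1} | v_k ~ IG(a_z, v_k / a_z),
    with IG(x; a, z) such that 1/x ~ Gamma(shape a, scale z).
    """
    rng = random.Random(seed)
    z, v = z1, []
    for _ in range(K):
        vk = 1.0 / rng.gammavariate(a, z / a)       # E[v_k | z_k] = a / ((a - 1) z_k)
        v.append(vk)
        z = 1.0 / rng.gammavariate(a_z, vk / a_z)   # E[1 / z_{k+1} | v_k] = v_k
    return v


def roughness(v):
    """Mean absolute increment of log v_k: small for strongly coupled chains."""
    logs = [math.log(x) for x in v]
    return sum(abs(b - a) for a, b in zip(logs, logs[1:])) / (len(logs) - 1)
```

Each v_k is roughly inversely proportional to z_k, and z_{k+1} roughly inversely proportional to v_k, so the two inversions give positive correlation between neighbouring variances. Larger shape parameters give smoother log-variance paths, and E[log v_{k+1} | v_k] = log v_k + (log a − ψ(a)) − (log a_z − ψ(a_z)), with ψ the digamma function, so the chain drifts in log scale only when a ≠ a_z.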
Gamma Chains: typical draws
[Figure: three draws of log v_k for k = 1 to 1000, with (a, a_z) = (10, 10), (10, 4) and (10, 40)]
Gamma Chains with changepoints: typical draws
[Figure: three draws of log v_k for k = 1 to 1000, with (a, a_z) = (10, 10), (10, 4) and (10, 40)]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1}) \quad \text{(pairwise)}$$
$$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \quad \text{(singletons)}$$
[Figure: chain z_1, ..., v_{k-1}, z_k, v_k, z_{k+1}, ... with couplings a_z, a, a_z on consecutive edges]
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}) \quad \text{(pairwise)}$$
$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \quad \text{(singletons)}$$
Possible Model Topologies
Approximate Inference
• Stochastic
  – Markov chain Monte Carlo, Gibbs sampler
  – Sequential Monte Carlo, particle filtering
• Deterministic
  – Variational Bayes
In all of these, conjugacy helps.
VB or Gibbs
[Figure: factor graph ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ..., with likelihood factors p(y_1 | v_1), p(y_2 | v_2)]
• VB
$$q^{(\tau)}(v_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\big)$$
• Gibbs
$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$$
VB or Gibbs
[Figure: factor graph ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ..., with likelihood factors p(y_1 | v_1), p(y_2 | v_2)]
• VB
$$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\big)$$
• Gibbs
$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$$
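In a hypothetical noise-free setting where the source coefficients are observed directly, y_k ∼ N(y_k; 0, v_k), both full conditionals are inverse Gamma. Multiplying the Gaussian likelihood with the chain factors (assuming the scales z_k/a and v_k/a_z of the Gamma chain definition) gives

$$v_k \mid z_k, z_{k+1}, y_k \sim \mathcal{IG}\Big(v_k;\; a + a_z + \tfrac12,\; \big(a/z_k + a_z/z_{k+1} + y_k^2/2\big)^{-1}\Big)$$
$$z_k \mid v_{k-1}, v_k \sim \mathcal{IG}\Big(z_k;\; a_z + a,\; \big(a_z/v_{k-1} + a/v_k\big)^{-1}\Big)$$

Because the z's are conditionally independent given the v's (and vice versa), each half-sweep can update a whole block. The sketch fixes v_0 = 1, drops the z_{K+1} factor at the chain end, and averages the second half of the sweeps; all of these, and the function name, are illustrative choices.

```python
import random


def gibbs_igmc(y, a, a_z, n_sweeps=400, v0=1.0, seed=0):
    """Posterior-mean estimate of v_{1:K} given y_k ~ N(0, v_k) under an IGMC prior."""
    rng = random.Random(seed)
    K = len(y)
    v = [1.0] * K
    z = [1.0] * K
    burn_in = n_sweeps // 2
    v_sum = [0.0] * K

    def draw_inv_gamma(shape, rate):
        # 1/x ~ Gamma(shape, scale = 1/rate)  =>  x is inverse Gamma with that shape and rate
        return 1.0 / rng.gammavariate(shape, 1.0 / rate)

    for sweep in range(n_sweeps):
        # Block 1: every z_k given its neighbours v_{k-1} and v_k
        for k in range(K):
            v_prev = v[k - 1] if k > 0 else v0
            z[k] = draw_inv_gamma(a_z + a, a_z / v_prev + a / v[k])
        # Block 2: every v_k given z_k, z_{k+1} and the observation y_k
        for k in range(K):
            shape, rate = a + 0.5, a / z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:  # the last v_K has no z_{K+1} neighbour in this sketch
                shape, rate = shape + a_z, rate + a_z / z[k + 1]
            v[k] = draw_inv_gamma(shape, rate)
        if sweep >= burn_in:
            v_sum = [s + vk for s, vk in zip(v_sum, v)]
    return [s / (n_sweeps - burn_in) for s in v_sum]
```

A VB version would replace each random draw by the corresponding expectation update, keeping the same shape and using expected inverse rates of the neighbours.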
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: variational Bayes
[Figure: log-scale time-frequency panels of the noisy input X, the original X_org, and the estimates X_h (SNR 19.98), X_v (SNR 20.79), X_b (SNR 19.68) and X_g (SNR 19.97)]
Denoising - Music
[Figure: log-scale time-frequency panels of the original, the noisy input, and the estimates by PF (SNR 8.53), Gibbs (SNR 8.66) and VB]
“Tristram (Matt Uelmen)” + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal tie across time (harmonic continuity)
• Source 2: vertical tie across frequency (transients, percussive sounds)
[Figure: panels X_org, S_hor and S_ver, frequency bin (ν) against time (τ)]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + drums “Territory (Sepultura)” = mix

SDR, SIR, SAR in dB
                     s1                        s2
             SDR     SIR     SAR       SDR     SIR     SAR
VB          -4.74   -3.28    5.67     -1.58   15.46   -1.37
Gibbs       -4.50   -2.62    4.57      1.05   12.46    1.61
GibbsEM     -4.23   -2.42    4.82      1.34   13.13    1.85
Pre-trained -4.04   -3.15    8.13      3.56   11.44    4.64
Oracle       6.14   17.16    6.58     12.66   19.95   13.60

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = mix

SDR, SIR, SAR in dB
                     s1                        s2
             SDR     SIR     SAR       SDR     SIR     SAR
VB          -7.80   -6.22    4.53     -2.35   18.40   -2.25
Gibbs       -8.46   -7.53    6.93     -4.04   14.59   -3.83
GibbsEM     -7.74   -6.19    4.62     -1.14   16.62   -0.97
Pre-trained -6.40   -5.39    6.95      3.80   16.39    4.14
Oracle      12.10   32.90   12.14     21.13   33.89   21.37
Harmonic-Transient Decomposition
[Figure: panels X_org, S_hor and S_ver, frequency bin (ν) against time (τ)]
Chord Detection - Signal model (with Paul Peeling)
[Figure: log σ_ν against frequency ν (0 to 4000 Hz)]
Chord Detection
[Figure: MDCT of a piano chord with MIDI notes 41, 48, 51, 56 (frequency 0 to 4000 Hz against time τ, 0 to 2.5 s); log Σ_ν v_{ν,j,τ} for MIDI note j (40 to 65) against time τ]
Multichannel Source Separation
• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda), \qquad n = 1, \ldots, N$$
$$v_{k,n} \sim \mathcal{IG}\big(v_{k,n}; \nu/2, 2/(\nu\lambda_n)\big)$$
$$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$
$$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m), \qquad m = 1, \ldots, M,\; k = 1, \ldots, K$$
$$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$$
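Integrating out v_{k,n} gives each s_{k,n} a Student-t marginal with ν degrees of freedom and squared scale λ_n (the standard Gaussian scale-mixture result), so Var(s_{k,n}) = ν λ_n / (ν − 2) for ν > 2. A forward-sampling sketch with a fixed mixing matrix `A` (rows a_m) and fixed noise variances `r`; these values, and leaving out the priors on a_m and r_m, are illustrative assumptions, as are the function names.

```python
import random


def sample_sources(K, nu, lams, rng):
    """s_{k,n} ~ N(0, v_{k,n}) with v_{k,n} ~ IG(nu/2, 2/(nu * lam_n)) (scale-z form)."""
    S = []
    for lam in lams:
        row = []
        for _ in range(K):
            v = 1.0 / rng.gammavariate(nu / 2, 2.0 / (nu * lam))  # 1/v ~ Gamma(shape, scale)
            row.append(rng.gauss(0.0, v ** 0.5))
        S.append(row)
    return S  # S[n][k]


def sample_mixture(K, A, r, nu, lams, seed=0):
    """x_{k,m} ~ N(a_m^T s_{k,1:N}, r_m) for fixed mixing rows A[m] and noise variances r[m]."""
    rng = random.Random(seed)
    S = sample_sources(K, nu, lams, rng)
    X = [[sum(A[m][n] * S[n][k] for n in range(len(lams))) + rng.gauss(0.0, r[m] ** 0.5)
          for k in range(K)]
         for m in range(len(A))]
    return S, X
```

With ν = 10 and λ_1 = 2 the sample variance of the first source should be near 10 · 2 / 8 = 2.5, which is why λ_n acts as an overall “volume”. Calling `sample_mixture` with one mixing row and two sources produces the underdetermined case (channels < sources).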
Equivalent Gamma MRF
• A tree for each source
• λ_n can be interpreted as the overall “volume” of source n
Source Separation
[Figure: spectrograms of the speech, piano and guitar sources and of the mix, frequency f (0 to 10000 Hz) against time t]
Reconstructions
[Figure: reconstructed speech, piano and guitar spectrograms, frequency f (0 to 10000 Hz) against time t]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
[Figure: sampler traces of a, λ and r against epoch (0 to 2000)]
Tempo tracking and score-performance matching
• Given expressive music data (onsets, detections, spectral features)
  – Determine the position of a performance on a score
  – Determine where a human listener would clap her hands
  – Create a quantized, human-readable score
  – ...
• Online-realtime or offline-batch
• All of these problems can be mapped to inference problems in an HMM
Page 30
Factorial Generative models for Analysis of Polyphonic Audio
νfr
eque
ncy
k
x k
bull Each latent changepoint process ν = 1 W corresponds to a ldquopiano keyrdquoIndicators r1W1K encode a latent ldquopiano rollrdquo (S1) (S2) (S3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 29
montuno1wav
Media File (audiowav)
montuno2wav
Media File (audiowav)
montuno3wav
Media File (audiowav)
Single time slice - Bayesian Variable Selection
ri sim C(ri πon πoff)
si|ri sim [ri = on]N (si 0 Σ) + [ri 6= on]δ(si)
x|s1W sim N (x Cs1W R)
C equiv [ C1 Ci CW ]
r1 rW
s1 sW
x
bull Generalized Linear Model ndash Columnrsquos of C are the basis vectors
bull The exact posterior is a mixture of 2W Gaussians
bull When W is large computation of posterior features becomes intractable
bull Sparsity by construction (Olshausen and Millman Attias )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 30
Factorial Switching State space model
r0ν sim C(r0ν π0ν)
θ0ν sim N (θ0ν microν Pν)
rkν|rkminus1ν sim C(rkν πν(rtminus1ν)) Changepoint indicator
θkν|θkminus1ν sim N (θkν Aν(rk)θkminus1ν Qν(rk)) Latent state
yk|θk1W sim N (yk Ckθk1W R) Observation
rν0 middot middot middot rν
k middot middot middot rνK
sν0 middot middot middot sν
k middot middot middot sνK
ν = 1 W
yk yK
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 31
Synthetic Data
νx
freq
ν
ν
k
(S)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 32
audio_examplewav
Media File (audiowav)
Technical Difficulties
bull Inference is quite heavy
bull Vanilla Kalman filtering methods are not stable ndash computations with
large matrices
ndash Need advance techniques from linear algebra
ndash Interesting links to subspace methods
bull Hyperparameter learning is necessary
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 33
Modelling levels
bull Physical - acoustical
bull Time domain ndash state space dynamical models
bull Transform domain ndash Fourier representations Generalised Linear
model
bull Feature Based
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 34
Spectrogram
bull Basis functions φk(t) centered around time-frequency atom k = k(ν τ) =(Frequency Time ) such as STFT or MDCT
x(t) =sum
k
skφk(t)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
bull Spectrogram displays log |sk| or |sk|2 (of STFT)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 35
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
Models for time-frequency Energy distributions
bull Non-Negative Matrix factorisation (Sha Saul Lee 2002 Smaragdis Brown 2003
Virtanen 2003 Abdallah Plumbley 2004 )
Xντ = WνjSjτ
Spectrogram = Spectral Templatestimes Excitations
= times
ndash however spectrograms are not additive (a2 + b2 6= (a+ b)2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 36
Models for time-frequency Energy distributions
bull Mask models (Roweis 2001 Reyes-Gomez Jojic Ellis 2005 )
Xντ = [rντ = 0]S(0)ντ + [rντ = 1]S(1)
ντ
Spectrogram = Masktimes Source0 + (1minusMask)times Source1
= + +
ndash however sources do overlap in time and frequency
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 37
Prior structures on time-frequency Energy distributions
bull Main Idea Spectrogram is a point estimate of the energy at a
time-frequency atom k(ν τ)
bull We place a suitable prior on the variance of transform coefficients sk
and tie the prior variances across harmonically and temporally related
time-frequency atoms
p(s|v)p(v) =
(prod
k
p(sk|vk)
)
p(v)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 38
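One channel source separation: Gaussian source model
[Graphical model: variances $v_{k,1}, \dots, v_{k,N}$ → sources $s_{k,1}, \dots, s_{k,N}$ → observation $x_k$, repeated for $k = 1, \dots, K$]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$
• Straightforward application of Bayes' theorem yields
$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n}; \kappa_{k,n} x_k,\ v_{k,n}(1 - \kappa_{k,n})\big), \qquad \kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \quad \text{(Responsibilities)}$$
• Each source coefficient $s_{k,n}$ gets a fraction $\kappa_{k,n}$ of the observation $x_k$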
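The responsibilities act as a per-atom Wiener filter. A minimal sketch for a single atom $k$ (the function name is hypothetical):

```python
def gaussian_posterior(x, v):
    """Posterior of s_n given x = sum_n s_n and independent s_n ~ N(0, v_n).

    Returns (kappa, means, variances) with kappa_n = v_n / sum(v),
    mean_n = kappa_n * x and variance_n = v_n * (1 - kappa_n).
    """
    total = sum(v)
    kappa = [vn / total for vn in v]
    means = [k * x for k in kappa]
    variances = [vn * (1.0 - k) for vn, k in zip(v, kappa)]
    return kappa, means, variances


kappa, means, variances = gaussian_posterior(2.0, [1.0, 3.0])
# kappa == [0.25, 0.75], means == [0.5, 1.5], variances == [0.75, 0.75]
```

The posterior means always sum to $x$, so the separated coefficients add back up to the observation.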
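One channel source separation: Poisson source model
[Graphical model: variances $v_{k,1}, \dots, v_{k,N}$ → sources $s_{k,1}, \dots, s_{k,N}$ → observation $x_k$, repeated for $k = 1, \dots, K$]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$
• This is the generative model for NMF when we write
$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \quad \text{(Template × Excitation)}$$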
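In the Poisson case the posterior is discrete: conditioned on their sum $x_k$, independent Poisson counts are multinomially distributed with probabilities $\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$, so the posterior mean is again $\kappa_{k,n} x_k$. The sketch below (hypothetical function names, Python standard library) samples that split and builds NMF-style intensities $v = t \times e$.

```python
import random


def poisson_split(x, v, rng):
    """Sample s_{1:N} given sum_n s_n = x, where s_n ~ PO(v_n) independently.

    The conditional law is Multinomial(x; kappa) with kappa_n = v_n / sum(v),
    so each of the x counts is assigned to source n with probability kappa_n.
    """
    total = sum(v)
    kappa = [vn / total for vn in v]
    counts = [0] * len(v)
    for n in rng.choices(range(len(v)), weights=kappa, k=x):
        counts[n] += 1
    return counts


def nmf_intensities(t, e):
    """v[nu][tau][n] = t[nu][n] * e[tau][n] (template times excitation)."""
    return [[[t_nu[n] * e_tau[n] for n in range(len(t_nu))] for e_tau in e] for t_nu in t]


rng = random.Random(1)
v = nmf_intensities([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sample = poisson_split(10, v[1][2], rng)  # split 10 counts of atom (nu=1, tau=2)
```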
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: example densities p(x) for 0 ≤ x ≤ 5; left panel a = 0.9, 1, 1.3, 2 with b = 1; right panel (a, b) = (1, 1), (1, 0.5), (2, 1)]
$$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$$
$$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
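Both log-densities follow directly from the definitions above. In this parametrisation $z$ acts as a scale, and if $y \sim \mathcal{G}(y; a, z)$ then $1/y \sim \mathcal{IG}(\cdot\,; a, z)$, so the sketch samples the Inverse Gamma by inverting a Gamma draw. Function names are ours; Python's `random.gammavariate(alpha, beta)` takes a shape and a scale.

```python
import math
import random


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a - 1) log x - x / z - a log z - log Gamma(a)."""
    return (a - 1) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = -(a + 1) log x - 1 / (z x) - a log z - log Gamma(a)."""
    return -(a + 1) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)


def sample_inv_gamma(a, z, rng):
    """Draw from IG(x; a, z) as the reciprocal of a G(y; a, z) draw (z is a scale)."""
    return 1.0 / rng.gammavariate(a, z)


# For a > 1 the mean of IG(x; a, z) is 1 / (z (a - 1)); here 1 / (0.5 * 2) = 1.
rng = random.Random(0)
draws = [sample_inv_gamma(3.0, 0.5, rng) for _ in range(20000)]
mc_mean = sum(draws) / len(draws)
```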
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows
$$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a), \qquad z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$$
[Graphical model: $z_1 \to \cdots \to v_{k-1} \to z_k \to v_k \to z_{k+1} \to \cdots$, with shape $a_z$ on the links into $z$ and $a$ on the links into $v$]
• Variance variables $v$ are the prior variances of the sources
• Auxiliary variables $z$ are needed for conjugacy and positive correlation
• Shape parameters $a$ and $a_z$ describe the coupling strength and drift of the chain
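Typical draws like those shown in this section can be generated by ancestral sampling along the chain $v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$, $z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$. A minimal sketch, assuming a fixed starting value $z_1$ (a hypothetical choice; how the chain is initialised is not specified here) and reading $x \sim \mathcal{IG}(x; a, z)$ as $1/x \sim \mathcal{G}(a, z)$ with $z$ a scale, as in the density definitions:

```python
import math
import random


def sample_gamma_chain(K, a, a_z, z1=1.0, seed=0):
    """Ancestral sample of v_{1:K} from the inverse Gamma-Markov chain

        v_k | z_k     ~ IG(v_k; a, z_k / a)
        z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z)

    where IG(x; a, z) means 1/x ~ Gamma(shape a, scale z).
    The starting value z1 is a hypothetical choice.
    """
    rng = random.Random(seed)
    z = z1
    v = []
    for _ in range(K):
        v_k = 1.0 / rng.gammavariate(a, z / a)      # E[1/v_k | z_k] = z_k
        v.append(v_k)
        z = 1.0 / rng.gammavariate(a_z, v_k / a_z)  # E[1/z_{k+1} | v_k] = v_k
    return v


log_v = [math.log(vk) for vk in sample_gamma_chain(1000, a=10.0, a_z=4.0)]
```

Because $E[1/v_k \mid z_k] = z_k$ and $E[1/z_{k+1} \mid v_k] = v_k$, each $v_{k+1}$ tends to stay near $v_k$. Larger shape parameters shrink the random steps in $\log v_k$, which is the sense in which $a$ and $a_z$ set the coupling strength.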
Gamma Chains typical draws
[Figure: three typical draws of $\log v_k$ against $k$ (1 to 1000), for $a = 10$ and $a_z = 10$, $4$ and $40$]
Gamma Chains with changepoints typical draws
[Figure: three typical draws of $\log v_k$ against $k$ (1 to 1000) from chains with changepoints, for $a = 10$ and $a_z = 10$, $4$ and $40$]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1}) \quad \text{(Pairwise)}$$
$$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \quad \text{(Singletons)}$$
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$$\psi_{i,j} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}) \quad \text{(Pairwise)}$$
$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \quad \text{(Singletons)}$$
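Because every potential is conjugate, each full conditional of a Gamma field is again Inverse Gamma: collecting the terms in $\xi_i$ from $\phi_i$ and the $\psi_{i,j}$ gives $1/\xi_i \mid \xi_{-i}$ Gamma distributed with shape $\sum_j a_{ij}$ and rate $\sum_j a_{ij}/\xi_j$. The sketch below (the graph encoding and names are our own assumptions) evaluates the unnormalised log joint and performs one Gibbs sweep with that conditional; it assumes every node has at least one neighbour.

```python
import math
import random


def log_field(xi, couplings):
    """Unnormalised log joint: sum_i log phi_i + sum_{(i,j)} log psi_ij.

    xi maps node -> positive value; couplings maps (i, j) -> a_ij > 0.
    """
    degree = {i: 0.0 for i in xi}
    total = 0.0
    for (i, j), a_ij in couplings.items():
        total -= a_ij / (xi[i] * xi[j])              # log psi_ij
        degree[i] += a_ij
        degree[j] += a_ij
    for i, d in degree.items():
        total += (d + 1.0) * math.log(1.0 / xi[i])   # log phi_i
    return total


def gibbs_sweep(xi, couplings, rng):
    """Resample each node: 1/xi_i ~ Gamma(shape sum_j a_ij, rate sum_j a_ij / xi_j)."""
    neighbours = {i: [] for i in xi}
    for (i, j), a_ij in couplings.items():
        neighbours[i].append((j, a_ij))
        neighbours[j].append((i, a_ij))
    for i in xi:
        shape = sum(a_ij for _, a_ij in neighbours[i])
        rate = sum(a_ij / xi[j] for j, a_ij in neighbours[i])
        xi[i] = 1.0 / rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes a scale
    return xi


# A short chain z1 - v1 - z2 - v2 written as a field (couplings a and a_z alternate).
couplings = {("z1", "v1"): 10.0, ("v1", "z2"): 4.0, ("z2", "v2"): 10.0}
xi = {"z1": 1.0, "v1": 1.0, "z2": 1.0, "v2": 1.0}
xi = gibbs_sweep(xi, couplings, random.Random(0))
```

Gamma chains are the special case of a path graph, so the same conditional covers both the $z$ and the $v$ updates of a chain without observations.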
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov Chain Monte Carlo: Gibbs sampler
– Sequential Monte Carlo: Particle Filtering
• Deterministic
– Variational Bayes
In all these, conjugacy helps: it keeps the Gibbs full conditionals and the VB updates in closed form.
VB or Gibbs
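[Factor graph: $\cdots\ \psi_{0,1}\ z_1\ \psi_{1,1}\ v_1\ \psi_{1,2}\ z_2\ \psi_{2,2}\ v_2 \cdots$, with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$, …]
• VB
$$q^{(\tau)}(v_k) \propto \phi_{v_k}(v_k)\, p(y_k \mid v_k)\, \exp\big(\langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\big)$$
• Gibbs
$$v_k^{(\tau)} \sim p(v_k \mid z_k^{(\tau)}, z_{k+1}^{(\tau)}, y_k) \propto p(y_k \mid v_k)\, \phi_{v_k}(v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$$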
VB or Gibbs
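[Factor graph: $\cdots\ \psi_{0,1}\ z_1\ \psi_{1,1}\ v_1\ \psi_{1,2}\ z_2\ \psi_{2,2}\ v_2 \cdots$, with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$, …]
• VB
$$q^{(\tau)}(z_k) \propto \phi_{z_k}(z_k)\, \exp\big(\langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\big)$$
• Gibbs
$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}^{(\tau)}, v_k^{(\tau)}) \propto \phi_{z_k}(z_k)\, \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$$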
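Combining the two Gibbs updates gives a complete sampler for one chain $v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$, $z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$. For illustration only, this sketch assumes $p(y_k \mid v_k) = \mathcal{N}(y_k; 0, v_k)$ and treats the boundary variable $v_0$ feeding $z_1$ through $\psi_{0,1}$ as a fixed constant. Collecting terms, $1/z_k$ is Gamma distributed with shape $a_z + a$ and rate $a_z/v_{k-1} + a/v_k$, and $1/v_k$ is Gamma distributed with shape $a + a_z + 1/2$ and rate $a/z_k + a_z/z_{k+1} + y_k^2/2$ (the $a_z$ terms are absent for the last $v_K$). Since the $z$'s are conditionally independent given the $v$'s and vice versa, each half-sweep updates a whole block.

```python
import math
import random


def gibbs_gamma_chain(y, a, a_z, v0=1.0, sweeps=200, seed=0):
    """Blocked Gibbs sampler for an inverse Gamma-Markov chain with y_k ~ N(0, v_k).

    v0 is a fixed boundary value (an assumption) linked to z_1 through psi_{0,1}.
    Returns posterior-mean estimates of v_k from the second half of the sweeps.
    """
    rng = random.Random(seed)
    K = len(y)
    v = [1.0] * K
    z = [1.0] * K                      # z[k] sits between v[k-1] (or v0) and v[k]
    sums, kept = [0.0] * K, 0
    for sweep in range(sweeps):
        for k in range(K):             # z | v: 1/z_k ~ Gamma(a_z + a, rate a_z/v_{k-1} + a/v_k)
            v_prev = v[k - 1] if k > 0 else v0
            rate = a_z / v_prev + a / v[k]
            z[k] = 1.0 / rng.gammavariate(a_z + a, 1.0 / rate)  # gammavariate takes a scale
        for k in range(K):             # v | z, y: the Gaussian likelihood adds 1/2 and y_k^2/2
            shape = a + 0.5
            rate = a / z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:              # the last v has no successor z
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        if sweep >= sweeps // 2:       # discard the first half as burn-in
            kept += 1
            for k in range(K):
                sums[k] += v[k]
    return [s / kept for s in sums]


# A quiet segment followed by a loud one: the estimated variances should follow it.
data_rng = random.Random(1)
y = [data_rng.gauss(0.0, 0.1) for _ in range(50)] + [data_rng.gauss(0.0, 3.0) for _ in range(50)]
v_hat = gibbs_gamma_chain(y, a=10.0, a_z=10.0)
```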
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: log-energy maps of the noisy input $X$ and the original $X_{\mathrm{org}}$, and of the reconstructions $X_h$ (SNR 19.98), $X_v$ (SNR 20.79), $X_b$ (SNR 19.68) and $X_g$ (SNR 19.97)]
Denoising – Music
[Figure: log-energy maps of the Original, the Noisy input, and the reconstructions by particle filtering (PF, SNR 8.53), Gibbs sampling (SNR 8.66) and VB (SNR 2.08)]
“Tristram (Matt Uelmen)” + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal tie across time (harmonic continuity)
• Source 2: vertical tie across frequency (transients, percussive sounds)
[Figure: $X_{\mathrm{org}}$, $S_{\mathrm{hor}}$ and $S_{\mathrm{ver}}$ over time ($\tau$) and frequency bin ($\nu$)]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix
(SDR, SIR and SAR in dB for sources s1 and s2)
Method         s1 SDR  s1 SIR  s1 SAR  s2 SDR  s2 SIR  s2 SAR
VB              -4.74   -3.28    5.67   -1.58   15.46   -1.37
Gibbs           -4.5    -2.62    4.57    1.05   12.46    1.61
GibbsEM         -4.23   -2.42    4.82    1.34   13.13    1.85
Pre-trained     -4.04   -3.15    8.13    3.56   11.44    4.64
Oracle           6.14   17.16    6.58   12.66   19.95   13.6
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix
(SDR, SIR and SAR in dB for sources s1 and s2)
Method         s1 SDR  s1 SIR  s1 SAR  s2 SDR  s2 SIR  s2 SAR
VB              -7.8    -6.22    4.53   -2.35   18.4    -2.25
Gibbs           -8.46   -7.53    6.93   -4.04   14.59   -3.83
GibbsEM         -7.74   -6.19    4.62   -1.14   16.62   -0.97
Pre-trained     -6.4    -5.39    6.95    3.8    16.39    4.14
Oracle          12.1    32.9    12.14   21.13   33.89   21.37
Harmonic-Transient Decomposition
[Figure: $X_{\mathrm{org}}$, $S_{\mathrm{hor}}$ and $S_{\mathrm{ver}}$ over time ($\tau$) and frequency bin ($\nu$); audio for two examples: (Original) (Hor) (Vert)]
Chord Detection - Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (Hz), 0 to 4000 Hz]
Chord Detection
[Figure: MDCT of piano chord 41, 48, 51, 56 (frequency in Hz against time $\tau$ in s); below, $\log \sum_\nu v_{\nu,j,\tau}$ for MIDI note $j$ (40 to 65) against time $\tau$ in s]
Multichannel Source Separation
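• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)
$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda), \quad n = 1, \dots, N$$
$$v_{k,n} \sim \mathcal{IG}\big(v_{k,n}; \nu/2,\ 2/(\nu\lambda_n)\big)$$
$$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$
$$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m), \quad k = 1, \dots, K$$
$$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots), \quad m = 1, \dots, M$$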
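Ancestral sampling from this hierarchy is a useful sanity check on the model. A minimal sketch, assuming the slides' $\mathcal{G}$ and $\mathcal{IG}$ parametrisation (second argument a scale, so $1/v_{k,n} \sim \mathcal{G}(\nu/2, 2/(\nu\lambda_n))$) and treating the mixing vectors $a_m$ and noise variances $r_m$ as given inputs, since their priors are left unspecified; the default hyperparameter values are arbitrary. Under this reading $E[1/v_{k,n}] = 1/\lambda_n$, so $\lambda_n$ sets the scale of the source variances, and marginally each $s_{k,n}$ is Student-t with $\nu$ degrees of freedom and squared scale $\lambda_n$.

```python
import math
import random


def sample_mixture_prior(K, A, r, nu=4.0, a_lam=2.0, b_lam=1.0, seed=0):
    """Draw (lambda, s, x) from the hierarchical prior for K atoms.

    A[m][n] is the (given) mixing gain of source n in channel m, r[m] the
    (given) noise variance of channel m. Returns lambda (N), s (K x N), x (K x M).
    """
    rng = random.Random(seed)
    M, N = len(A), len(A[0])
    lam = [rng.gammavariate(a_lam, b_lam) for _ in range(N)]       # G(lambda_n; a, b), b a scale
    s = [[0.0] * N for _ in range(K)]
    x = [[0.0] * M for _ in range(K)]
    for k in range(K):
        for n in range(N):
            v = 1.0 / rng.gammavariate(nu / 2.0, 2.0 / (nu * lam[n]))  # IG(v; nu/2, 2/(nu lambda_n))
            s[k][n] = rng.gauss(0.0, math.sqrt(v))
        for m in range(M):
            mean = sum(A[m][n] * s[k][n] for n in range(N))            # a_m^T s_{k,1:N}
            x[k][m] = rng.gauss(mean, math.sqrt(r[m]))
    return lam, s, x


# Two channels, three sources: an underdetermined mixture.
lam, s, x = sample_mixture_prior(K=5, A=[[1.0, 0.5, 0.2], [0.3, 1.0, 0.8]], r=[1e-4, 1e-4])
```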
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$
Source Separation
[Figure: spectrograms (t/sec against f/Hz) of the three sources (Speech), (Piano), (Guitar) and of the (Mix)]
Reconstructions
[Figure: spectrograms (t/sec against f/Hz) of the reconstructed (Speech), (Piano) and (Guitar) sources]
Multimodality
• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: traces of $a$, $\lambda$ and $r$ against epoch (0 to 2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets/detections/spectral features)
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized/human-readable score
– …
• Online/real-time or offline/batch
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley Cemgil Godsill 2006)
[Figure: state lattice with bar position $m_k = 1, \dots, 8$ (horizontal) and tempo level $n_k = 1, 2, 3$ (vertical); bar lengths for 3/4 and 4/4 time are marked]
• Each dot denotes a state $x = (m, n)$ (Score Position, Tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three examples of the transition model $p(x_2 \mid x_1)$ over Bar Position (1 to 8) and Tempo Level (1 to 5)]
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over Bar Position and Tempo Level]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over Bar Position and Tempo Level]
Bar position Pointer - k = 3
[Figure: $p(x_3)$ over Bar Position and Tempo Level]
Bar position Pointer - k = 4
[Figure: $p(x_4)$ over Bar Position and Tempo Level]
Bar position Pointer - k = 5
[Figure: $p(x_5 \mid y_{1:5})$ over Bar Position and Tempo Level, with $y_5 = 0$]
Bar position Pointer - k = 10
[Figure: $p(x_{10})$ over Bar Position and Tempo Level]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson intensity
[Figure: intensity $\mu_k$ against bar position $m_k$ (0 to 1000) for a Triplet Rhythm and a Duplet Rhythm]
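Because the state space is a finite grid, filtering is the standard HMM forward recursion $p(x_k \mid y_{1:k}) \propto p(y_k \mid x_k) \sum_{x_{k-1}} p(x_k \mid x_{k-1})\, p(x_{k-1} \mid y_{1:k-1})$. The sketch below is a simplified, hypothetical instance rather than the model of Whiteley, Cemgil and Godsill: the bar position advances by the tempo level each frame (modulo the bar length), the tempo level stays put or moves to a neighbouring level with a small probability, there is no meter or rhythm indicator, and the intensity profile `mu` is made up.

```python
import math


def poisson_pmf(y, mu):
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def bar_pointer_transitions(M, n_levels, p_change):
    """Hypothetical transitions: m' = (m + n) mod M; n' = n, or n +/- 1 w.p. p_change."""
    T = {}
    for m in range(M):
        for n in range(1, n_levels + 1):
            nbrs = [q for q in (n - 1, n + 1) if 1 <= q <= n_levels]
            moves = [(n, 1.0 - p_change if nbrs else 1.0)]
            moves += [(q, p_change / len(nbrs)) for q in nbrs]
            T[(m, n)] = [(((m + n) % M, q), p) for q, p in moves]
    return T


def forward_filter(ys, mu, n_levels=3, p_change=0.05):
    """Return p(x_k | y_{1:k}) for every frame k, with x = (bar position, tempo level)."""
    T = bar_pointer_transitions(len(mu), n_levels, p_change)
    belief = {x: 1.0 / len(T) for x in T}             # uniform p(x_1)
    history = []
    for k, y in enumerate(ys):
        if k > 0:                                      # predict: sum over x_{k-1}
            predicted = {x: 0.0 for x in T}
            for x_prev, p_prev in belief.items():
                for x_next, p in T[x_prev]:
                    predicted[x_next] += p_prev * p
            belief = predicted
        # update with the Poisson observation model p(y_k | x_k) = PO(y_k; mu[m_k])
        belief = {x: b * poisson_pmf(y, mu[x[0]]) for x, b in belief.items()}
        norm = sum(belief.values())
        belief = {x: b / norm for x, b in belief.items()}
        history.append(belief)
    return history


# Made-up intensity: onsets are likely only at position 0 of an 8-position bar.
mu = [4.0] + [0.2] * 7
ys = [4, 0, 0, 0] * 4                                  # one onset every 4 frames
final = forward_filter(ys, mu)[-1]
best = max(final, key=final.get)                       # most probable (position, tempo)
```

With one onset every four frames the filtered mass settles on tempo level 2 (two positions per frame), and the most probable final state is `(6, 2)`. Smoothing would add the matching backward pass.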
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Graphical model with nodes $n_k$, $\theta_k$, $m_k$, $r_k$ ($k = 0, \dots, 3$), $\lambda_k$ and $y_k$ ($k = 1, 2, 3$)]
• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator
Filtering
[Figure: observed data $y_k$ and the filtering distributions $\log p(m_k \mid y_{1:k})$, $\log p(n_k \mid y_{1:k})$ (tempo in quarter notes per min) and $p(r_k \mid y_{1:k})$ (Triplets vs. Duplets) against frame index $k$]
Smoothing
[Figure: observed data $y_k$ and the smoothing distributions $\log p(m_k \mid y_{1:K})$, $\log p(n_k \mid y_{1:K})$ (tempo in quarter notes per min) and $p(r_k \mid y_{1:K})$ (Triplets vs. Duplets) against frame index $k$]
Time Signature
[Figure: observed audio (sample value against time in s) and $\log p(m_k \mid z_{1:K})$, $\log p(n_k \mid z_{1:K})$ (quarter notes per min) and $p(\theta_k \mid z_{1:K})$ (4/4 vs. 3/4) against frame index $k$]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
Score-Performance matching - Graphical Model
[Graphical model with nodes $t_k$, $r_k$, $\lambda_k$ for $k = 1, \dots, K$, and $v_{\nu,k}$, $s_{\nu,k}$ inside a plate over $\nu = 1, \dots, W$]
$$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau))\big)$$
Score-Performance matching - Signal model
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (Hz), 0 to 4000 Hz]
Score-Performance matching
[Figure: Spectrogram Data (frequency in Hz against time in s) and MIDI Data (MIDI note against score position)]
Online (filtering) or offline (smoothing) processing is possible
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ (MIDI note number 60 to 80 against time in s); $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ ($\log\lambda$ against time in s); MDCT of audio, frequency in Hz against time in s (source: Daniel-Ben Pienaar)]
Summary
• “Time Domain” – Switching State Space Models
– State space modeling
– Conditionally linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform Domain” – Gamma Fields
– Models on (orthogonal) transform coefficients: energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-Frequency Energy distributions
• Ongoing Work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, Polyphonic transcription
∗ Musical Score guided source separation
– Prior structures for other observation models: NMF
Single time slice - Bayesian Variable Selection
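$$r_i \sim \mathcal{C}(r_i; \pi_{\mathrm{on}}, \pi_{\mathrm{off}})$$
$$s_i \mid r_i \sim [r_i = \mathrm{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \mathrm{on}]\, \delta(s_i)$$
$$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R), \qquad C \equiv [\, C_1\ \cdots\ C_i\ \cdots\ C_W \,]$$
[Graphical model: $r_1, \dots, r_W$ → $s_1, \dots, s_W$ → $x$]
• Generalized Linear Model: the columns of $C$ are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When $W$ is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman, Attias)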
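For small $W$ the mixture can be computed exactly by enumerating all $2^W$ on/off patterns: given $r$, the active coefficients are Gaussian, so $x \mid r \sim \mathcal{N}(0, C_r \Sigma C_r^\top + R)$ and $p(r \mid x) \propto p(x \mid r) \prod_i p(r_i)$. The sketch below assumes scalar covariances $\Sigma = \sigma^2 I$ and $R = \rho^2 I$ (our simplification) and uses plain Gaussian elimination for the log-determinant and the linear solve; the loop over patterns is exactly the cost that makes large $W$ intractable.

```python
import itertools
import math


def log_normal_zero_mean(x, S):
    """log N(x; 0, S) for a symmetric positive-definite S, via Gaussian elimination."""
    n = len(x)
    A = [row[:] + [xi] for row, xi in zip(S, x)]          # augmented system [S | x]
    logdet = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(A[i][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        logdet += math.log(abs(A[c][c]))
        for i in range(c + 1, n):
            f = A[i][c] / A[c][c]
            for j in range(c, n + 1):
                A[i][j] -= f * A[c][j]
    y = [0.0] * n                                          # back substitution: y = S^{-1} x
    for i in range(n - 1, -1, -1):
        y[i] = (A[i][n] - sum(A[i][j] * y[j] for j in range(i + 1, n))) / A[i][i]
    quad = sum(xi * yi for xi, yi in zip(x, y))
    return -0.5 * (quad + logdet + n * math.log(2.0 * math.pi))


def pattern_posterior(x, C, sigma2, rho2, pi_on):
    """Exact p(r | x) over all r in {0, 1}^W (1 = on), assuming Sigma = sigma2 I, R = rho2 I."""
    M, W = len(C), len(C[0])
    log_post = {}
    for r in itertools.product((0, 1), repeat=W):
        active = [i for i in range(W) if r[i]]
        S = [[sigma2 * sum(C[a][i] * C[b][i] for i in active) + (rho2 if a == b else 0.0)
              for b in range(M)] for a in range(M)]        # C_r Sigma C_r^T + R
        log_prior = sum(math.log(pi_on) if ri else math.log(1.0 - pi_on) for ri in r)
        log_post[r] = log_prior + log_normal_zero_mean(x, S)
    top = max(log_post.values())
    Z = sum(math.exp(l - top) for l in log_post.values())
    return {r: math.exp(l - top) / Z for r, l in log_post.items()}


C = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]                                      # columns are the basis vectors C_i
post = pattern_posterior([3.0, 0.0], C, sigma2=4.0, rho2=0.01, pi_on=0.3)
best = max(post, key=post.get)
```

Here `best` is `(1, 0, 0)`: the first basis vector alone explains $x = (3, 0)$, while switching on the third one would also predict energy in the second coordinate.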
Factorial Switching State space model
$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$$
$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$$
$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}\big(r_{k,\nu}; \pi_\nu(r_{k-1,\nu})\big) \quad \text{(Changepoint indicator)}$$
$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}\big(\theta_{k,\nu}; A_\nu(r_{k,\nu})\,\theta_{k-1,\nu}, Q_\nu(r_{k,\nu})\big) \quad \text{(Latent state)}$$
$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R) \quad \text{(Observation)}$$
[Graphical model: for each $\nu = 1, \dots, W$, an indicator chain and a state chain over $k = 0, \dots, K$, coupled through the observations $y_k$]
Synthetic Data
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not numerically stable: computations involve large matrices
– Need advanced techniques from linear algebra
– Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical – acoustical
• Time domain – state space dynamical models
• Transform domain – Fourier representations, Generalised Linear model
• Feature Based
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 34
Spectrogram
bull Basis functions φk(t) centered around time-frequency atom k = k(ν τ) =(Frequency Time ) such as STFT or MDCT
x(t) =sum
k
skφk(t)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
bull Spectrogram displays log |sk| or |sk|2 (of STFT)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 35
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
Models for time-frequency Energy distributions
bull Non-Negative Matrix factorisation (Sha Saul Lee 2002 Smaragdis Brown 2003
Virtanen 2003 Abdallah Plumbley 2004 )
Xντ = WνjSjτ
Spectrogram = Spectral Templatestimes Excitations
= times
ndash however spectrograms are not additive (a2 + b2 6= (a+ b)2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 36
Models for time-frequency Energy distributions
bull Mask models (Roweis 2001 Reyes-Gomez Jojic Ellis 2005 )
Xντ = [rντ = 0]S(0)ντ + [rντ = 1]S(1)
ντ
Spectrogram = Masktimes Source0 + (1minusMask)times Source1
= + +
ndash however sources do overlap in time and frequency
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 37
Prior structures on time-frequency Energy distributions
bull Main Idea Spectrogram is a point estimate of the energy at a
time-frequency atom k(ν τ)
bull We place a suitable prior on the variance of transform coefficients sk
and tie the prior variances across harmonically and temporally related
time-frequency atoms
p(s|v)p(v) =
(prod
k
p(sk|vk)
)
p(v)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 38
One channel source separation Gaussian source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim N (skn 0 vkn)
xk|sk1N =sumN
n=1 skn
bull Straightforward application of Bayesrsquo theorem yields
p(skn|vk1N xk) = N (skn κknxk vkn(1minus κkn))
κkn = vknsum
nprime
vknprime (Responsibilities)
bull Each source coefficient sn gets a fraction κn of the observation x
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 39
One channel source separation Poisson source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim PO(skn vkn)
xk|sk1N =sumN
n=1 skn
bull This is the generative model for the NMF when we write
vk(ντ)n = tνn times eτn (Templatetimes Excitation)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40
Gamma G(x a b) and Inverse Gamma IG(x a b)
0 1 2 3 4 50
02
04
06
08
1
12a = 09 b =1
a = 1 b =1
a = 13 b =1
a = 2 b =1
x
p(x)
0 1 2 3 4 50
02
04
06
08
1
12
14
a=1 b=1
a=1 b=05
a=2 b=1
G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))
IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))
bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale
bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1 K as follows
vk|zk sim IG(vk a zka)
zk+1|vk sim IG(zk+1 az vkaz)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
bull Variance variables v are priors for sources
bull Auxillary variables z are needed for conjugacy and positive correlation
bull Shape parameters a and az describe coupling strength and drift of the chain
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42
Gamma Chains typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43
Gamma Chains with changepoints typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44
Gamma Chains
bull The joint can be written as product of singleton and pairwise
potentials of form
ψkk = exp(minusazminus1k vminus1
k ) (Pairwise)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
φzk = exp((az + a+ 1) log zminus1
k ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45
Gamma Fields
bull The joint can be written as product of singleton and pairwise
potentials
ψij = exp(minusaijξminus1i ξminus1
j ) (Pairwise)
φi = exp((sum
j
aij + 1) log ξminus1i ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46
Possible Model Topologies
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47
Approximate Inference
bull Stochastic
ndash Markov Chain Monte Carlo Gibbs sampler
ndash Sequential Monte Carlo Particle Filtering
bull Deterministic
ndash Variational Bayes
In all these conjugacy helps
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))
bull Gibbs
v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z
(τ)k )ψkk+1(z
(τ)k+1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
bull Inference Variational Bayes
Noisy Original
X
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xorg
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xh SNR1998
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xv SNR2079
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xb SNR1968
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xg SNR1997
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51
Denoising ndash MusicOriginal
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Noisy
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
PF SNR853
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Gibbs SNR866
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
VB SNR208
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52
Single Channel Source Separation (with Onur Dikmen)
bull Source 1 Horizontal Tie across time harmonic continuity
bull Source 2 Vertical Tie across frequency transients percussive sounds
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53
Single Channel Source Separation with IGMCs
E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Preminus trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
bull Oracle We use the square of the source coefficient as the latent variance estimate
bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
Reconstructions
[Figure: spectrograms, time t (sec) versus frequency f (Hz, 0 to 10000), of the reconstructed (Speech), (Piano) and (Guitar) sources]
Multimodality
• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: sampled traces of $a$, $\lambda$ and $r$ versus epoch, 0 to 2000]
Tempo tracking and score-performance matching
• Given expressive music data (onsets, detections, spectral features):
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– …
• Online-Realtime or Offline-Batch
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: grid of states, tempo level $n_k \in \{1, 2, 3\}$ versus bar position $m_k \in \{1, \dots, 8\}$, with bar lengths for 3/4 time and 4/4 time marked]
• Each dot denotes a state $x = (m, n)$ (Score Position – Tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three examples of the transition distribution $p(x_2 \mid x_1)$ over tempo level (1 to 5) and bar position (1 to 8)]
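The bar-pointer state space is small enough to write its transition matrix out explicitly. The sketch below is a hypothetical version: the rule that the bar position advances by the tempo level each frame and the random-walk probability `p_change` for the tempo level are assumptions chosen to mimic the arcs in the state diagram, not the exact dynamics of Whiteley, Cemgil and Godsill (2006). `predict` propagates a prior forward, the operation behind a sequence of predictive distributions $p(x_k)$.

```python
import numpy as np

def bar_pointer_transition(M=8, n_levels=3, p_change=0.1):
    """Transition matrix T[i, j] = p(x_{k+1} = j | x_k = i) over states x = (m, n).

    Assumed dynamics, for illustration only: the bar position advances by the
    current tempo level, m' = (m + n) mod M (0-based m, n in 1..n_levels), and the
    tempo level moves to each neighbouring level with probability p_change.
    State (m, n) is flattened to index (n - 1) * M + m.
    """
    S = M * n_levels
    T = np.zeros((S, S))
    for n in range(1, n_levels + 1):
        # distribution over the next tempo level (no move past the edges)
        moves = {n: 1.0}
        for n_next in (n - 1, n + 1):
            if 1 <= n_next <= n_levels:
                moves[n_next] = p_change
                moves[n] -= p_change
        for m in range(M):
            m_next = (m + n) % M
            for n_next, p in moves.items():
                T[(n - 1) * M + m, (n_next - 1) * M + m_next] += p
    return T

def predict(p0, T, steps):
    """Propagate p(x_1) forward: p(x_{k+1}) = sum_i p(x_k = i) T[i, :]."""
    p = np.asarray(p0, dtype=float)
    for _ in range(steps):
        p = p @ T
    return p

M, n_levels = 8, 3
T = bar_pointer_transition(M, n_levels)
p1 = np.zeros(M * n_levels)
p1[[n * M for n in range(n_levels)]] = 1.0 / n_levels   # start of bar, unknown tempo
p10 = predict(p1, T, 9).reshape(n_levels, M)             # rows: tempo level, cols: bar position
```

Reshaping the flattened vector as `(n_levels, M)` gives the tempo-level by bar-position grid used in the plots.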
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over tempo level and bar position]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over tempo level and bar position]
Bar position Pointer - k = 3
[Figure: $p(x_3)$ over tempo level and bar position]
Bar position Pointer - k = 4
[Figure: $p(x_4)$ over tempo level and bar position]
Bar position Pointer - k = 5
[Figure: $p(x_5 \mid y_{1:5})$ over tempo level and bar position, after observing $y_5 = 0$]
Bar position Pointer - k = 10
[Figure: $p(x_{10})$ over tempo level and bar position]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson with intensity $\mu_k$
[Figure: intensity $\mu_k$ versus bar position $m_k$ (0 to 1000) for a Triplet Rhythm and for a Duplet Rhythm]
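A minimal sketch of the Poisson observation model, assuming $y_k$ is a per-frame event count whose intensity depends only on the bar position $m_k$ through a rhythm-specific profile. The profile shape (`peak` at onset positions, `floor` elsewhere) and the helper `rhythm_intensity` are hypothetical; the slide only shows that triplet and duplet rhythms have different intensity profiles.

```python
import numpy as np
from math import lgamma

def poisson_loglik(y, mu):
    """log p(y | m) = y log mu[m] - mu[m] - log y!, for every bar position m."""
    mu = np.asarray(mu, dtype=float)
    return y * np.log(mu) - mu - lgamma(y + 1)

def rhythm_intensity(M, onsets, peak=2.0, floor=0.05):
    """Hypothetical intensity profile: `peak` at the given bar positions, `floor` elsewhere."""
    mu = np.full(M, floor)
    mu[list(onsets)] = peak
    return mu

M = 12
duplet = rhythm_intensity(M, onsets=[0, 6])
triplet = rhythm_intensity(M, onsets=[0, 4, 8])
ll_quiet = poisson_loglik(0, duplet)   # y_k = 0 favours positions between onsets
ll_event = poisson_loglik(1, duplet)   # y_k = 1 favours the onset positions
# the likelihood depends only on m, so every tempo level shares it
# (state index (n - 1) * M + m, here with 3 tempo levels)
ll_states = np.tile(ll_event, 3)
```

An observation $y_k = 0$ is not uninformative here: it shifts probability away from positions where an onset was expected.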
Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Graphical model: chains $n_{0:3}$, $\theta_{0:3}$, $m_{0:3}$ and $r_{0:3}$, with intensities $\lambda_{1:3}$ and observations $y_{1:3}$]
• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator
Filtering
[Figure, against frame index $k$ (0 to 450): observed data $y_k$; $\log p(m_k \mid y_{1:k})$ over bar position $m_k$; $\log p(n_k \mid y_{1:k})$ over tempo (quarter notes per min, 60 to 180); $p(r_k \mid y_{1:k})$ for Triplets versus Duplets]
Smoothing
[Figure, against frame index $k$ (0 to 450): observed data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position $m_k$; $\log p(n_k \mid y_{1:K})$ over tempo (quarter notes per min, 60 to 180); $p(r_k \mid y_{1:K})$ for Triplets versus Duplets]
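Filtering and smoothing in the bar-pointer model are standard discrete HMM recursions once all discrete variables are flattened into one state index. The scaled forward-backward sketch below is a generic textbook implementation, not code from the talk; it assumes a single flattened chain and takes a transition matrix plus per-frame log-likelihoods of every state (for example Poisson log-likelihoods).

```python
import numpy as np

def forward_backward(p0, T, loglik):
    """Filtered p(x_k | y_{1:k}) and smoothed p(x_k | y_{1:K}) marginals of a discrete HMM.

    p0:     (S,) prior p(x_1)
    T:      (S, S) transition matrix, T[i, j] = p(x_{k+1} = j | x_k = i)
    loglik: (K, S) log p(y_k | x_k = s)
    """
    K, S = loglik.shape
    # subtract the per-frame maximum before exponentiating; constants cancel after normalising
    lik = np.exp(loglik - loglik.max(axis=1, keepdims=True))
    filtered = np.zeros((K, S))
    pred = np.asarray(p0, dtype=float)
    for k in range(K):
        post = pred * lik[k]
        filtered[k] = post / post.sum()
        pred = filtered[k] @ T                    # p(x_{k+1} | y_{1:k})
    beta = np.ones((K, S))                        # proportional to p(y_{k+1:K} | x_k)
    for k in range(K - 2, -1, -1):
        b = T @ (lik[k + 1] * beta[k + 1])
        beta[k] = b / b.sum()
    smoothed = filtered * beta
    smoothed /= smoothed.sum(axis=1, keepdims=True)
    return filtered, smoothed

# tiny example: two states, sticky dynamics, three observations
p0 = np.array([0.5, 0.5])
T = np.array([[0.95, 0.05], [0.05, 0.95]])
loglik = np.log(np.array([[0.9, 0.1], [0.2, 0.8], [0.9, 0.1]]))
filtered, smoothed = forward_backward(p0, T, loglik)
```

The filtered marginals can be computed online (real time); the smoothed ones need the whole sequence, which is the offline-batch setting.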
Time Signature
[Figure: observed data (sample value versus time, 0 to 12 s); then, against frame index $k$: $\log p(m_k \mid z_{1:K})$ over bar position; $\log p(n_k \mid z_{1:K})$ over tempo in quarter notes per min; $p(\theta_k \mid z_{1:K})$ for 4/4 versus 3/4]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt and audio signal $x_t$ versus time $t$]
Score-Performance matching - Graphical Model
[Graphical model: chains $t_{1:K}$, $r_{1:K}$ and $\lambda_{1:K}$ over frames, and for each frequency bin $\nu = 1, \dots, W$ variances $v_{\nu,1:K}$ and coefficients $s_{\nu,1:K}$; inset: score positions $r_k$ numbered 1 to 8]
$$v_{\nu\tau} \sim \mathcal{IG}\big(v_{\nu\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau))\big)$$
Score-Performance matching - Signal model
[Figure: spectral template $\log \sigma_\nu$ versus frequency $\nu$ (Hz), 0 to 4000 Hz]
Score-Performance matching
[Figure: spectrogram data, frequency (0 to 4000 Hz) versus time (0 to 14 s), aligned with the MIDI data, MIDI note (55 to 85) versus score position]
Online (filtering) or Offline (smoothing) processing is possible
Transcription
[Figure, against time (1 to 4 s): $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80); $\log \sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$; MDCT of audio, frequency 0 to 4000 Hz (source: Daniel-Ben Pienaar)]
Summary
• "Time Domain" – Switching State Space Models
– State space modeling
– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• "Transform Domain" – Gamma Fields
– Models on (orthogonal) transform coefficients: energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-Frequency Energy distributions
• Ongoing Work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, Polyphonic transcription
∗ Musical Score guided source separation
– Prior structures for other observation models: NMF
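Factorial Switching State space model
$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$$
$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$$
$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}\big(r_{k,\nu}; \pi_\nu(r_{k-1,\nu})\big) \qquad \text{Changepoint indicator}$$
$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}\big(\theta_{k,\nu}; A_\nu(r_k)\,\theta_{k-1,\nu}, Q_\nu(r_k)\big) \qquad \text{Latent state}$$
$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R) \qquad \text{Observation}$$
[Graphical model: for each $\nu = 1, \dots, W$, indicators $r_{\nu,0}, \dots, r_{\nu,k}, \dots, r_{\nu,K}$ driving states $s_{\nu,0}, \dots, s_{\nu,k}, \dots, s_{\nu,K}$; all $W$ chains feed the observations $y_k, \dots, y_K$]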
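The generative process above can be simulated directly. The sketch assumes binary indicators ($r = 1$ marks an onset), a 2-dimensional damped-rotation state per voice (a common way to model a decaying sinusoid), $A_\nu(1) = 0$ so that an onset restarts the state, and an observation matrix that sums the first coordinate of every voice. All of these settings are illustrative assumptions; the slide does not specify $A_\nu$, $Q_\nu$ or $C_k$.

```python
import numpy as np

def sample_fssm(K, omegas, pi, rho=0.995, q_onset=1.0, R=1e-4, seed=0):
    """Sample from a factorial switching state space model (illustrative settings).

    Voice nu has a 2-d latent state theta_{k,nu} and a binary changepoint indicator
    r_{k,nu} with transition matrix pi[i, j] = p(r_k = j | r_{k-1} = i).
    Assumed switching dynamics:
      r = 0: theta_k = A_nu(0) theta_{k-1} + tiny noise, A_nu(0) a damped rotation
      r = 1: theta_k ~ N(0, q_onset I), i.e. A_nu(1) = 0 (a fresh onset)
    Observation: y_k = sum_nu theta_{k,nu}[0] + N(0, R), i.e. C = [1, 0, 1, 0, ...].
    """
    rng = np.random.default_rng(seed)
    W = len(omegas)
    r = np.zeros((K, W), dtype=int)
    theta = np.zeros((K, W, 2))
    for nu, w in enumerate(omegas):
        A = {0: rho * np.array([[np.cos(w), -np.sin(w)],
                                [np.sin(w), np.cos(w)]]),
             1: np.zeros((2, 2))}
        Q = {0: 1e-8, 1: q_onset}
        prev_r, prev_theta = 0, np.zeros(2)      # r_0 = 0, theta_0 = 0 (assumed)
        for k in range(K):
            r[k, nu] = rng.choice(2, p=pi[prev_r])
            noise = rng.normal(0.0, np.sqrt(Q[r[k, nu]]), size=2)
            theta[k, nu] = A[r[k, nu]] @ prev_theta + noise
            prev_r, prev_theta = r[k, nu], theta[k, nu]
    y = theta[:, :, 0].sum(axis=1) + rng.normal(0.0, np.sqrt(R), size=K)
    return r, theta, y

# rows of pi must sum to one; onsets are rare and rarely repeat
pi = np.array([[0.99, 0.01], [0.9, 0.1]])
r, theta, y = sample_fssm(400, omegas=[0.1, 0.3], pi=pi)
```

Inference inverts this process, which is why it is heavy: the indicators of all $W$ voices jointly select which Kalman filter applies at each step.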
Synthetic Data
[Figure: synthetic data over frequency $\nu$ and time index $k$]
Technical Difficulties
• Inference is quite heavy
• Vanilla Kalman filtering methods are not numerically stable – computations with large matrices
– Need advanced techniques from linear algebra
– Interesting links to subspace methods
• Hyperparameter learning is necessary
Modelling levels
• Physical - acoustical
• Time domain – state space dynamical models
• Transform domain – Fourier representations, Generalised Linear model
• Feature Based
Spectrogram
• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau) = (\text{Frequency}, \text{Time})$, such as STFT or MDCT
$$x(t) = \sum_k s_k \phi_k(t)$$
[Figure: spectrograms, time t (sec) versus frequency f (Hz, 0 to 10000), of (Speech) and (Piano)]
• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)
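The expansion $x(t) = \sum_k s_k \phi_k(t)$ is easy to check numerically with any orthonormal time-frequency basis. As a simplification, the sketch uses non-overlapping frames of an orthonormal DCT-II instead of the lapped MDCT, so it has block-edge effects the MDCT avoids; the function names are hypothetical. Analysis computes $s_k = \langle x, \phi_k \rangle$ and synthesis sums the atoms back.

```python
import numpy as np

def dct_basis(L):
    """Orthonormal DCT-II basis; row nu is the frequency-nu atom over one frame."""
    t = np.arange(L)
    nu = np.arange(L)[:, None]
    D = np.sqrt(2.0 / L) * np.cos(np.pi * (t + 0.5) * nu / L)
    D[0] /= np.sqrt(2.0)                 # rescale the constant atom to unit norm
    return D

def analyse(x, L):
    """s[tau, nu] = <x, phi_{nu,tau}> for non-overlapping frames of length L."""
    frames = x[: len(x) // L * L].reshape(-1, L)
    return frames @ dct_basis(L).T

def synthesise(s):
    """x(t) = sum_k s_k phi_k(t), with k = (nu, tau) ranging over all atoms."""
    L = s.shape[1]
    return (s @ dct_basis(L)).ravel()

x = np.sin(2 * np.pi * 0.05 * np.arange(1000))
s = analyse(x, 64)                       # 15 frames x 64 frequency atoms
log_energy = np.log(s**2 + 1e-12)        # spectrogram-style display of |s_k|^2
x_rec = synthesise(s)                    # equals the first 960 samples of x
```

Because the basis is orthonormal, $\sum_k |s_k|^2$ equals the signal energy, which is what makes $|s_k|^2$ a sensible per-atom energy estimate.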
Models for time-frequency Energy distributions
• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004)
$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$
Spectrogram = Spectral Templates × Excitations
– however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
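For reference, a minimal NMF sketch using the Lee-Seung multiplicative updates for the generalised Kullback-Leibler divergence, a standard choice that is also the maximum-likelihood fit under a Poisson observation model. It is not claimed to be the variant used in the works cited above; the random initialisation and iteration count are arbitrary.

```python
import numpy as np

def nmf_kl(X, J, n_iter=1000, seed=0, eps=1e-12):
    """Approximate a non-negative F x T matrix X by W @ S with J templates,
    minimising the generalised KL divergence by multiplicative updates."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.uniform(0.1, 1.0, size=(F, J))     # spectral templates
    S = rng.uniform(0.1, 1.0, size=(J, T))     # excitations
    for _ in range(n_iter):
        V = W @ S + eps
        S *= (W.T @ (X / V)) / (W.sum(axis=0)[:, None] + eps)
        V = W @ S + eps
        W *= ((X / V) @ S.T) / (S.sum(axis=1)[None, :] + eps)
    return W, S

rng = np.random.default_rng(3)
X = rng.uniform(size=(20, 2)) @ rng.uniform(size=(2, 30))   # exact rank-2 toy "spectrogram"
W, S = nmf_kl(X, J=2)
rel_err = np.linalg.norm(X - W @ S) / np.linalg.norm(X)
```

Multiplicative updates keep every entry non-negative automatically, which is why they are a common default despite slow convergence.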
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005)
$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$
Spectrogram = Mask × Source0 + (1 − Mask) × Source1
– however, sources do overlap in time and frequency
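A small numerical illustration of the mask model's limitation, using random non-negative arrays as hypothetical source energies: if the mask selects the dominant source at each atom, the model reproduces $\max(S^{(0)}, S^{(1)})$ and loses the weaker source's energy wherever both sources are active.

```python
import numpy as np

def mask_mixture(S0, S1, r):
    """X = [r = 0] * S0 + [r = 1] * S1: each atom is explained by exactly one source."""
    return np.where(r == 0, S0, S1)

rng = np.random.default_rng(0)
S0 = rng.gamma(1.0, 1.0, size=(4, 6))      # hypothetical source energies
S1 = rng.gamma(1.0, 1.0, size=(4, 6))
r = (S1 > S0).astype(int)                  # "dominant source" mask
X = mask_mixture(S0, S1, r)                # equals max(S0, S1) ...
gap = (S0 + S1) - X                        # ... and misses the weaker source's energy
```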
Prior structures on time-frequency Energy distributions
• Main Idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variances of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms
$$p(s \mid v)\, p(v) = \Big(\prod_k p(s_k \mid v_k)\Big)\, p(v)$$
One channel source separation: Gaussian source model
[Graphical model: variances $v_{k,1}, \dots, v_{k,N}$ generate sources $s_{k,1}, \dots, s_{k,N}$, which sum to the observation $x_k$; plate $k = 1, \dots, K$]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n}), \qquad x_k = \sum_{n=1}^{N} s_{k,n}$$
• Straightforward application of Bayes' theorem yields
$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})\big)$$
$$\kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \qquad \text{(Responsibilities)}$$
• Each source coefficient $s_{k,n}$ gets a fraction $\kappa_{k,n}$ of the observation $x_k$
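The posterior above is the familiar Wiener-filter split. The sketch computes the marginal posterior means $\kappa_{k,n} x_k$ and variances $v_{k,n}(1 - \kappa_{k,n})$ for one or many atoms; the function name is hypothetical. The joint posterior also couples the sources (they must sum to $x_k$), but only the marginals are returned here.

```python
import numpy as np

def gaussian_posterior(x, v):
    """Marginal posteriors p(s_n | v, x) for x = sum_n s_n with s_n ~ N(0, v_n).

    x: scalar or (K,) observations; v: (N,) or (K, N) prior variances.
    Returns (means, variances, kappa) with means = kappa * x,
    variances = v * (1 - kappa) and kappa = v / sum_n v (the responsibilities).
    """
    v = np.asarray(v, dtype=float)
    kappa = v / v.sum(axis=-1, keepdims=True)
    means = kappa * np.asarray(x, dtype=float)[..., None]
    return means, v * (1.0 - kappa), kappa

# one atom, two sources with prior variances 1 and 3, observation x = 2
means, variances, kappa = gaussian_posterior(2.0, [1.0, 3.0])
```

Here $\kappa = (0.25, 0.75)$, so the means are $(0.5, 1.5)$ and both posterior variances equal $v_1 v_2/(v_1 + v_2) = 0.75$.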
One channel source separation: Poisson source model
[Graphical model: variances $v_{k,1:N}$, sources $s_{k,1:N}$ and observation $x_k$; plate $k = 1, \dots, K$]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n}), \qquad x_k = \sum_{n=1}^{N} s_{k,n}$$
• This is the generative model for NMF when we write
$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \qquad \text{(Template × Excitation)}$$
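Conditioning independent Poisson counts on their sum gives a multinomial, so under this model the same responsibilities $\kappa_{k,n} = v_{k,n}/\sum_{n'} v_{k,n'}$ split each observed count, with $\mathbb{E}[s_{k,n} \mid x_k] = \kappa_{k,n} x_k$. The sketch builds NMF-structured intensities $v_{k(\nu,\tau),n} = t_{\nu,n} e_{\tau,n}$ from hypothetical random templates and excitations, simulates counts, and computes that posterior split.

```python
import numpy as np

def poisson_nmf_split(X, t, e):
    """Intensities v[nu, tau, n] = t[nu, n] * e[tau, n] and the posterior mean split
    E[s_n | x] = kappa_n * x at every atom (nu, tau), kappa_n = v_n / sum_n' v_n'."""
    v = t[:, None, :] * e[None, :, :]            # (F, T, N) intensities
    kappa = v / v.sum(axis=-1, keepdims=True)    # responsibilities per atom
    return v, kappa * X[..., None]

rng = np.random.default_rng(0)
t = rng.gamma(2.0, 1.0, size=(5, 2))     # hypothetical spectral templates (F x N)
e = rng.gamma(2.0, 1.0, size=(7, 2))     # hypothetical excitations (T x N)
v, _ = poisson_nmf_split(np.zeros((5, 7)), t, e)
X = rng.poisson(v).sum(axis=-1)          # x = sum_n s_n with s_n ~ PO(v_n)
v, s_mean = poisson_nmf_split(X, t, e)
# given x, (s_1, ..., s_N) is Multinomial(x, kappa); one draw for atom (0, 0):
s_draw = rng.multinomial(X[0, 0], v[0, 0] / v[0, 0].sum())
```

Summing the intensities over sources recovers the NMF model $\sum_n t_{\nu,n} e_{\tau,n}$ for the expected observed count.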
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: two panels of densities $p(x)$ for $x$ from 0 to 5, with parameter settings $(a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1)$ and $(a, b) = (1, 1), (1, 0.5), (2, 1)$]
$$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$$
$$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity, and Inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
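The two log-densities, written in the same scale parameterisation as the definitions above, with a crude numerical check that they integrate to one. A consequence worth keeping in mind when sampling: $x \sim \mathcal{IG}(x; a, z)$ exactly when $1/x \sim \mathcal{G}(a, z)$, so $\mathbb{E}[1/x] = az$ and an inverse Gamma draw is `1 / rng.gamma(a, z)` in NumPy. The integration grid is an arbitrary choice.

```python
import numpy as np
from math import lgamma

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a - 1) log x - x / z + a log(1/z) - log Gamma(a); z is a scale."""
    x = np.asarray(x, dtype=float)
    return (a - 1.0) * np.log(x) - x / z - a * np.log(z) - lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = (a + 1) log(1/x) - 1 / (z x) + a log(1/z) - log Gamma(a)."""
    x = np.asarray(x, dtype=float)
    return -(a + 1.0) * np.log(x) - 1.0 / (z * x) - a * np.log(z) - lgamma(a)

# midpoint-rule check on (0, 200)
dx = 1e-4
x = (np.arange(2_000_000) + 0.5) * dx
mass_g = np.exp(log_gamma_pdf(x, 2.0, 1.5)).sum() * dx
ig = np.exp(log_inv_gamma_pdf(x, 3.0, 0.5))
mass_ig = ig.sum() * dx
mean_inv_ig = (ig / x).sum() * dx        # should be close to a * z = 1.5
```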
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:
$$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$$
$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$$
[Graphical model: chain $z_1 \cdots v_{k-1}\, z_k\, v_k\, z_{k+1} \cdots$, with coupling $a_z$ on the links from $v$ to the next $z$ and $a$ on the links from $z$ to $v$]
• Variance variables $v$ are the prior variances of the sources
• Auxiliary variables $z$ are needed for conjugacy and positive correlation
• Shape parameters $a$ and $a_z$ describe the coupling strength and drift of the chain
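Typical draws of such a chain can be generated by ancestral sampling. The sketch follows the two conditionals of the chain, reading $\mathcal{IG}(x; a, z)$ in the scale parameterisation, so that $1/v_k \sim \mathcal{G}(a, z_k/a)$ and $\mathbb{E}[1/v_k \mid z_k] = z_k$; the initial value `z1` is an assumption.

```python
import numpy as np

def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Ancestral sample v_1..v_K from the inverse Gamma-Markov chain
        v_k | z_k ~ IG(v_k; a, z_k / a),   z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z),
    with IG(x; a, z) in scale form, i.e. 1/x ~ Gamma(shape a, scale z)."""
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = z1                                     # initial z_1 (assumed)
    for k in range(K):
        v[k] = 1.0 / rng.gamma(a, z / a)       # E[1/v_k | z_k] = z_k
        z = 1.0 / rng.gamma(a_z, v[k] / a_z)   # E[1/z_{k+1} | v_k] = v_k
    return v

# the three coupling settings used in the plots of typical draws
draws = {(a, a_z): np.log(sample_igmc(1000, a, a_z))
         for (a, a_z) in [(10, 10), (10, 4), (10, 40)]}
```

Each step multiplies $v_k$ by a random factor whose distribution does not depend on $v_k$, so $\log v_k$ is a random walk with expected increment $(\log a - \psi(a)) - (\log a_z - \psi(a_z))$, where $\psi$ is the digamma function. The increment vanishes only when $a = a_z$, which is one way to see the drift controlled by the shape parameters.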
Gamma Chains: typical draws
[Figure: three sample paths of $\log v_k$ for $k = 1, \dots, 1000$, with $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains with changepoints: typical draws
[Figure: three sample paths of $\log v_k$ for $k = 1, \dots, 1000$, with $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$$\psi_{k,k} = \exp\big(-a\, z_k^{-1} v_k^{-1}\big) \qquad \text{(Pairwise)}$$
$$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \qquad \text{(Singletons)}$$
[Graphical model: chain $z_1 \cdots v_{k-1}\, z_k\, v_k\, z_{k+1} \cdots$ with couplings $a_z$, $a$, $a_z$]
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$$\psi_{ij} = \exp\big(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}\big) \qquad \text{(Pairwise)}$$
$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \qquad \text{(Singletons)}$$
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov Chain Monte Carlo, Gibbs sampler
– Sequential Monte Carlo, Particle Filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps
VB or Gibbs
[Factor graph: $\psi_{0,1}$, $z_1$, $\psi_{1,1}$, $v_1$, $\psi_{1,2}$, $z_2$, $\psi_{2,2}$, $v_2$, $\cdots$ in a chain, with observation factors $p(y_1 \mid v_1)$ and $p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(v_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k,k} + \log\psi_{k,k+1} \big\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\Big)$$
• Gibbs
$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}\big(z_k^{(\tau)}\big)\, \psi_{k,k+1}\big(z_{k+1}^{(\tau)}\big)$$
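As one concrete instance of the VB updates, the sketch below assumes Gaussian observations $y_k \sim \mathcal{N}(0, v_k)$ under the inverse Gamma-Markov chain prior. Every factor is conjugate, so each $q$ stays inverse Gamma: $q(v_k)$ has shape $a + a_z + 1/2$ and rate $a\langle 1/z_k \rangle + a_z\langle 1/z_{k+1} \rangle + y_k^2/2$, and $q(z_k)$ has shape $a_z + a$ and rate $a_z\langle 1/v_{k-1} \rangle + a\langle 1/v_k \rangle$. The boundary handling (a virtual $v_0 = 1$, and no $z_{K+1}$) and the fixed iteration count are assumptions.

```python
import numpy as np

def vb_igmc(y, a, a_z, n_iter=200):
    """Mean-field VB for y_k ~ N(0, v_k) under the IGMC prior; returns E_q[v_k].

    q(v_k) and q(z_k) are inverse Gamma in (shape, rate) form, so <1/x> = shape/rate.
    Boundary assumption: a virtual v_0 = 1 feeds z_1; the last v_K has no z_{K+1}.
    """
    y = np.asarray(y, dtype=float)
    K = len(y)
    Ez_inv = np.ones(K)                                    # <1/z_k>
    # v_k has a right neighbour z_{k+1} for every k except the last one
    sh_v = a + 0.5 + np.r_[np.full(K - 1, a_z), 0.0]
    sh_z = a_z + a
    for _ in range(n_iter):
        rt_v = a * Ez_inv + np.r_[a_z * Ez_inv[1:], 0.0] + 0.5 * y**2
        Ev_inv = sh_v / rt_v                               # <1/v_k>
        rt_z = a * Ev_inv + a_z * np.r_[1.0, Ev_inv[:-1]]  # left neighbour v_{k-1}
        Ez_inv = sh_z / rt_z
    # posterior mean of an inverse Gamma(shape, rate) is rate / (shape - 1)
    return rt_v / (sh_v - 1.0)

rng = np.random.default_rng(0)
y = np.r_[0.1 * rng.normal(size=200), 3.0 * rng.normal(size=200)]   # quiet, then loud
v_mean = vb_igmc(y, a=2.0, a_z=2.0)
```

Larger $a$ and $a_z$ smooth the estimate more strongly but also slow the coordinate-ascent iterations down, since information moves only one link per sweep.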
VB or Gibbs
[Factor graph: $\psi_{0,1}$, $z_1$, $\psi_{1,1}$, $v_1$, $\psi_{1,2}$, $z_2$, $\psi_{2,2}$, $v_2$, $\cdots$ in a chain, with observation factors $p(y_1 \mid v_1)$ and $p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(z_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k-1,k} + \log\psi_{k,k} \big\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\Big)$$
• Gibbs
$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}\big(v_{k-1}^{(\tau)}\big)\, \psi_{k,k}\big(v_k^{(\tau)}\big)$$
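A matching Gibbs sampler for the same assumed setting ($y_k \sim \mathcal{N}(0, v_k)$ under the inverse Gamma-Markov chain prior, virtual $v_0 = 1$). Because the chain alternates $z$ and $v$, all $z_k$ are conditionally independent given the $v$'s and vice versa, so each half-sweep is one vectorised block draw. The full conditionals are inverse Gamma: $z_k$ with shape $a_z + a$ and rate $a_z/v_{k-1} + a/v_k$; $v_k$ with shape $a + a_z + 1/2$ and rate $a/z_k + a_z/z_{k+1} + y_k^2/2$. Initialisation and burn-in length are arbitrary choices.

```python
import numpy as np

def gibbs_igmc(y, a, a_z, n_sweeps=400, burn_in=100, seed=0):
    """Block Gibbs sampler for y_k ~ N(0, v_k) under the IGMC prior; returns the
    posterior-mean estimate of v. Boundary assumption: virtual v_0 = 1 for z_1.
    An inverse Gamma(shape, rate) draw is 1 / Gamma(shape, scale = 1 / rate)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    K = len(y)
    v = np.maximum(y**2, 1e-6)                 # crude initialisation
    sh_v = a + 0.5 + np.r_[np.full(K - 1, a_z), 0.0]
    v_sum = np.zeros(K)
    for sweep in range(n_sweeps):
        # all z_k given v: shape a_z + a, rate a_z / v_{k-1} + a / v_k
        v_left = np.r_[1.0, v[:-1]]
        z = 1.0 / rng.gamma(a_z + a, 1.0 / (a_z / v_left + a / v))
        # all v_k given z and y: shape sh_v, rate a / z_k + a_z / z_{k+1} + y_k^2 / 2
        rt_v = a / z + np.r_[a_z / z[1:], 0.0] + 0.5 * y**2
        v = 1.0 / rng.gamma(sh_v, 1.0 / rt_v)
        if sweep >= burn_in:
            v_sum += v
    return v_sum / (n_sweeps - burn_in)

rng = np.random.default_rng(0)
y = np.r_[0.1 * rng.normal(size=200), 3.0 * rng.normal(size=200)]   # quiet, then loud
v_hat = gibbs_igmc(y, a=2.0, a_z=2.0)
```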
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: spectrograms (log colour scale −14 to 0) of the noisy observation $X$, the original $X_{org}$, and four reconstructions: $X_h$ (SNR 19.98), $X_v$ (SNR 20.79), $X_b$ (SNR 19.68), $X_g$ (SNR 19.97)]
Denoising – Music
[Figure: spectrograms (log colour scale −18 to 0) of the Original, the Noisy observation, and reconstructions by PF (SNR 8.53), Gibbs (SNR 8.66) and VB (SNR 2.08)]
"Tristram (Matt Uelmen)" + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal tie across time (harmonic continuity)
• Source 2: vertical tie across frequency (transients, percussive sounds)
[Figure: frequency bin ($\nu$) versus time ($\tau$) for $X_{org}$, $S_{hor}$ and $S_{ver}$]
Single Channel Source Separation with IGMCs
E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

                 s1 (dB)                     s2 (dB)
Method           SDR      SIR      SAR       SDR      SIR      SAR
VB               -4.74    -3.28    5.67      -1.58    15.46    -1.37
Gibbs            -4.5     -2.62    4.57      1.05     12.46    1.61
GibbsEM          -4.23    -2.42    4.82      1.34     13.13    1.85
Pre-trained      -4.04    -3.15    8.13      3.56     11.44    4.64
Oracle           6.14     17.16    6.58      12.66    19.95    13.6

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
Single Channel Source Separation with IGMCs
"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

                 s1 (dB)                     s2 (dB)
Method           SDR      SIR      SAR       SDR      SIR      SAR
VB               -7.8     -6.22    4.53      -2.35    18.4     -2.25
Gibbs            -8.46    -7.53    6.93      -4.04    14.59    -3.83
GibbsEM          -7.74    -6.19    4.62      -1.14    16.62    -0.97
Pre-trained      -6.4     -5.39    6.95      3.8      16.39    4.14
Oracle           12.1     32.9     12.14     21.13    33.89    21.37
Harmonic-Transient Decomposition
[Figure: frequency bin ($\nu$) versus time ($\tau$) for $X_{org}$, $S_{hor}$ and $S_{ver}$; two sets of audio examples, each (Original), (Hor), (Vert)]
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 33
Synthetic Data
νx
freq
ν
ν
k
(S)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 32
audio_examplewav
Media File (audiowav)
Technical Difficulties
bull Inference is quite heavy
bull Vanilla Kalman filtering methods are not stable ndash computations with
large matrices
ndash Need advance techniques from linear algebra
ndash Interesting links to subspace methods
bull Hyperparameter learning is necessary
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 33
Modelling levels
bull Physical - acoustical
bull Time domain ndash state space dynamical models
bull Transform domain ndash Fourier representations Generalised Linear
model
bull Feature Based
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 34
Spectrogram
bull Basis functions φk(t) centered around time-frequency atom k = k(ν τ) =(Frequency Time ) such as STFT or MDCT
x(t) =sum
k
skφk(t)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
bull Spectrogram displays log |sk| or |sk|2 (of STFT)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 35
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
Models for time-frequency Energy distributions
bull Non-Negative Matrix factorisation (Sha Saul Lee 2002 Smaragdis Brown 2003
Virtanen 2003 Abdallah Plumbley 2004 )
Xντ = WνjSjτ
Spectrogram = Spectral Templatestimes Excitations
= times
ndash however spectrograms are not additive (a2 + b2 6= (a+ b)2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 36
Models for time-frequency Energy distributions
bull Mask models (Roweis 2001 Reyes-Gomez Jojic Ellis 2005 )
Xντ = [rντ = 0]S(0)ντ + [rντ = 1]S(1)
ντ
Spectrogram = Masktimes Source0 + (1minusMask)times Source1
= + +
ndash however sources do overlap in time and frequency
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 37
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variances of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms:
$$p(s \mid v)\, p(v) = \Big(\prod_k p(s_k \mid v_k)\Big)\, p(v)$$
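One channel source separation: Gaussian source model
[Graphical model: variances $v_{k,1}, \dots, v_{k,N}$ → sources $s_{k,1}, \dots, s_{k,N}$ → observation $x_k$; plate over $k = 1, \dots, K$]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n}), \qquad x_k = \sum_{n=1}^{N} s_{k,n}$$
• A straightforward application of Bayes' theorem yields
$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n};\ \kappa_{k,n} x_k,\ v_{k,n}(1 - \kappa_{k,n})\big), \qquad \kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \quad \text{(responsibilities)}$$
• Each source coefficient $s_{k,n}$ gets a fraction $\kappa_{k,n}$ of the observation $x_k$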
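The posterior above is a per-coefficient Wiener filter. A short NumPy sketch, assuming the variances $v_{k,n}$ are known and stored as a $K \times N$ array (the function name is illustrative):

```python
import numpy as np

def gaussian_source_posterior(x, v):
    """Posterior of s_{k,n} given x_k = sum_n s_{k,n}, with s_{k,n} ~ N(0, v_{k,n}).

    x: shape (K,), v: shape (K, N). Returns posterior means and variances, shape (K, N).
    """
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities kappa_{k,n}
    mean = kappa * x[:, None]                  # each source gets a fraction of x_k
    var = v * (1.0 - kappa)                    # posterior variance v_{k,n} (1 - kappa_{k,n})
    return mean, var

x = np.array([4.0, -2.0])
v = np.array([[1.0, 3.0],
              [2.0, 2.0]])
mean, var = gaussian_source_posterior(x, v)
```

Because the responsibilities sum to one over $n$, the posterior means always add back up to $x_k$.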
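One channel source separation: Poisson source model
[Graphical model: as for the Gaussian model, variances $v_{k,n}$ → sources $s_{k,n}$ → observation $x_k$; plate over $k = 1, \dots, K$]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n}), \qquad x_k = \sum_{n=1}^{N} s_{k,n}$$
• This is the generative model for NMF when we write
$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \quad \text{(Template × Excitation)}$$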
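Under the Poisson model, conditioning on $x_k$ splits the observed count multinomially with the same responsibilities $\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$ as in the Gaussian case (a standard property of independent Poisson variables conditioned on their sum). The sketch below samples from the generative model with $v_{\nu\tau n} = t_{\nu n} e_{\tau n}$ and computes the posterior mean of each source; the sizes and the Gamma-distributed templates and excitations are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
F, T, N = 16, 40, 3
t = rng.gamma(2.0, 1.0, size=(F, N))     # templates t_{nu,n} (illustrative)
e = rng.gamma(2.0, 1.0, size=(T, N))     # excitations e_{tau,n} (illustrative)
v = t[:, None, :] * e[None, :, :]        # intensities v_{nu,tau,n}, shape (F, T, N)

s = rng.poisson(v)                       # s_{nu,tau,n} ~ PO(v_{nu,tau,n})
x = s.sum(axis=-1)                       # observed counts x_{nu,tau}

# Posterior mean of each source given x: E[s_n | x] = x * v_n / sum_n' v_n'
kappa = v / v.sum(axis=-1, keepdims=True)
s_mean = x[..., None] * kappa

# The mixture itself is Poisson with intensity (t e^T)_{nu,tau}: the NMF model
V = t @ e.T
```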
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: example densities $p(x)$ for $0 \le x \le 5$; left panel: $a = 0.9, 1, 1.3, 2$ with $b = 1$; right panel: $(a, b) = (1, 1), (1, 0.5), (2, 1)$]
$$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1)\log x - z^{-1} x + a \log z^{-1} - \log \Gamma(a)\big)$$
$$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log \Gamma(a)\big)$$
• Gamma: conjugate prior for a Gaussian precision, a Poisson intensity and an inverse Gamma scale
• Inverse Gamma: conjugate prior for a Gaussian variance and a Gamma scale
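A sketch of the two log-densities in exactly the parameterisation above (third argument $z$, so the Gamma has shape $a$ and scale $z$), together with the conjugate update for a Gaussian variance under an inverse Gamma prior. The helper names and the example data are illustrative.

```python
import math
import numpy as np

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a-1) log x - x/z + a log(1/z) - log Gamma(a)."""
    return (a - 1) * np.log(x) - x / z - a * np.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = (a+1) log(1/x) - 1/(z x) + a log(1/z) - log Gamma(a)."""
    return -(a + 1) * np.log(x) - 1.0 / (z * x) - a * np.log(z) - math.lgamma(a)

def ig_variance_posterior(a, z, y):
    """Prior v ~ IG(v; a, z) and data y_i | v ~ N(0, v).

    The posterior is IG(v; a + n/2, z_post) with 1/z_post = 1/z + sum(y**2)/2.
    """
    y = np.asarray(y, dtype=float)
    return a + 0.5 * y.size, 1.0 / (1.0 / z + 0.5 * np.sum(y ** 2))

y = np.array([0.5, -1.2, 2.0])           # hypothetical observations
a_post, z_post = ig_variance_posterior(2.0, 1.0, y)
```

Multiplying the prior by the Gaussian likelihood only adds $n/2$ to the exponent of $v^{-1}$ and $\sum_i y_i^2/2$ to the coefficient of $v^{-1}$, which is why the posterior stays in the same family.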
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:
$$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k / a), \qquad z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k / a_z)$$
[Graphical model: chain $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$ with coupling parameters $a_z, a, a_z$ on successive edges]
• The variance variables $v$ are the prior variances of the sources
• The auxiliary variables $z$ are needed for conjugacy and positive correlation
• The shape parameters $a$ and $a_z$ describe the coupling strength and drift of the chain
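Ancestral sampling from the chain follows directly from the two conditionals. In the parameterisation above, $x \sim \mathcal{IG}(x; a, z)$ exactly when $1/x \sim \mathcal{G}(\cdot\,; a, z)$, so $1/v_k$ is a Gamma draw with shape $a$ and scale $z_k / a$, and $1/z_{k+1}$ a Gamma draw with shape $a_z$ and scale $v_k / a_z$. The starting value $z_1 = 1$ and the function name are assumptions for this sketch.

```python
import numpy as np

def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Ancestral sample of the inverse Gamma-Markov chain z_1, v_1, z_2, v_2, ..."""
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = np.empty(K + 1)
    z[0] = z1                                   # assumed starting value z_1
    for k in range(K):
        # v_k | z_k ~ IG(v_k; a, z_k / a)  <=>  1 / v_k ~ G(shape a, scale z_k / a)
        v[k] = 1.0 / rng.gamma(a, z[k] / a)
        # z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z)
        z[k + 1] = 1.0 / rng.gamma(a_z, v[k] / a_z)
    return v, z

# log v_k paths for the parameter settings used in the typical draws of this deck
draws = {a_z: np.log(sample_igmc(1000, a=10.0, a_z=a_z)[0]) for a_z in (10.0, 4.0, 40.0)}
```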
Gamma Chains: typical draws
[Figure: three sample paths of $\log v_k$ for $k = 1, \dots, 1000$, with $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains with changepoints: typical draws
[Figure: three sample paths of $\log v_k$ for $k = 1, \dots, 1000$ from chains with changepoints, with $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$$\psi_{k,k} = \exp\big(-a\, z_k^{-1} v_k^{-1}\big) \quad \text{(pairwise)}$$
$$\phi_{z_k} = \exp\big((a_z + a + 1) \log z_k^{-1}\big) \quad \text{(singletons)}$$
[Graphical model: chain $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$ with coupling parameters $a_z, a, a_z$]
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$$\psi_{ij} = \exp\big(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}\big) \quad \text{(pairwise)}$$
$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big) \log \xi_i^{-1}\Big) \quad \text{(singletons)}$$
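A sketch of the unnormalised log-density of a Gamma field, assuming a symmetric, non-negative coupling matrix $A = (a_{ij})$ with zero diagonal, obtained by summing the pairwise and singleton terms above. Encoding the chain as a field, with $a_{ij} = a$ between $z_k$ and $v_k$ and $a_{ij} = a_z$ between $v_k$ and $z_{k+1}$, recovers the chain singleton exponent $a_z + a + 1$ for interior nodes. Both function names are illustrative.

```python
import numpy as np

def gamma_field_log_potential(xi, A):
    """sum_{i<j} -a_ij / (xi_i xi_j) + sum_i (sum_j a_ij + 1) log(1 / xi_i)."""
    inv = 1.0 / xi
    pairwise = -0.5 * inv @ A @ inv            # symmetric A: each pair counted once
    singleton = np.sum((A.sum(axis=1) + 1.0) * np.log(inv))
    return pairwise + singleton

def chain_coupling(K, a, a_z):
    """Coupling matrix for the node ordering z_1, v_1, z_2, v_2, ..., z_K, v_K."""
    A = np.zeros((2 * K, 2 * K))
    for k in range(K):
        zi, vi = 2 * k, 2 * k + 1
        A[zi, vi] = A[vi, zi] = a                  # psi_{k,k}: z_k -- v_k
        if k + 1 < K:
            A[vi, zi + 2] = A[zi + 2, vi] = a_z    # v_k -- z_{k+1}
    return A

A = chain_coupling(4, a=10.0, a_z=4.0)
exponents = A.sum(axis=1) + 1.0   # singleton exponents; interior nodes get a + a_z + 1
```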
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov chain Monte Carlo: Gibbs sampler
– Sequential Monte Carlo: particle filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps.
VB or Gibbs
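[Factor graph: $\psi_{0,1}$ – $z_1$ – $\psi_{1,1}$ – $v_1$ – $\psi_{1,2}$ – $z_2$ – $\psi_{2,2}$ – $v_2 \cdots$, with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(v_k) \leftarrow \exp\Big(\phi_k + \big\langle \log \psi_{k,k} + \log \psi_{k,k+1} \big\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\Big)$$
• Gibbs
$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}\big(z_k^{(\tau)}\big)\, \psi_{k,k+1}\big(z_{k+1}^{(\tau)}\big)$$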
VB or Gibbs
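[Factor graph: $\psi_{0,1}$ – $z_1$ – $\psi_{1,1}$ – $v_1$ – $\psi_{1,2}$ – $z_2$ – $\psi_{2,2}$ – $v_2 \cdots$, with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(z_k) \leftarrow \exp\Big(\phi_k + \big\langle \log \psi_{k-1,k} + \log \psi_{k,k} \big\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\Big)$$
• Gibbs
$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}\big(v_{k-1}^{(\tau)}\big)\, \psi_{k,k}\big(v_k^{(\tau)}\big)$$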
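For Gaussian observations $y_k \mid v_k \sim \mathcal{N}(y_k; 0, v_k)$, both Gibbs conditionals are inverse Gamma, because every factor that touches $v_k$ or $z_k$ has inverse Gamma form. Write $\mathcal{IG}(x; \alpha, 1/\beta)$ for the density proportional to $x^{-(\alpha+1)} e^{-\beta/x}$. Collecting exponents from the chain definition gives $v_k \mid z_k, z_{k+1}, y_k \sim \mathcal{IG}(v_k; a + a_z + \tfrac12, 1/\beta_k)$ with $\beta_k = a/z_k + a_z/z_{k+1} + y_k^2/2$, and $z_k \mid v_{k-1}, v_k \sim \mathcal{IG}(z_k; a + a_z, 1/\gamma_k)$ with $\gamma_k = a_z/v_{k-1} + a/v_k$. The sketch below alternates the two blocks. The observation model, the fixed $z_1$, the burn-in rule and the test signal are assumptions for illustration; the last variance $v_K$ simply lacks the $z_{K+1}$ term.

```python
import numpy as np

def sample_ig(rng, shape, rate):
    """Draw x with density proportional to x**-(shape + 1) * exp(-rate / x)."""
    return 1.0 / rng.gamma(shape, 1.0 / rate)

def igmc_gibbs(y, a=10.0, a_z=10.0, n_iter=1000, z1=1.0, seed=0):
    """Gibbs sampler for the IGMC prior with y_k | v_k ~ N(0, v_k).

    z[k] precedes v[k] in the chain; z[0] is held fixed at z1 (an assumption).
    Returns the posterior mean of v estimated from the second half of the run.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    K = y.size
    z = np.full(K, z1)
    v_sum = np.zeros(K)
    burn_in = n_iter // 2
    for it in range(n_iter):
        # v_k | z_k, z_{k+1}, y_k  (all k at once: the v's are conditionally independent)
        shape_v = np.full(K, a + 0.5)
        rate_v = a / z + 0.5 * y ** 2
        shape_v[:-1] += a_z
        rate_v[:-1] += a_z / z[1:]
        v = sample_ig(rng, shape_v, rate_v)
        # z_k | v_{k-1}, v_k for k = 2, ..., K
        z[1:] = sample_ig(rng, a + a_z, a_z / v[:-1] + a / v[1:])
        if it >= burn_in:
            v_sum += v
    return v_sum / (n_iter - burn_in)

# Hypothetical test signal: a quiet segment followed by a loud one
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 0.1, 200), rng.normal(0.0, 10.0, 200)])
v_hat = igmc_gibbs(y)
```

Because the $v$'s are conditionally independent given the $z$'s (and vice versa), each block is sampled in one vectorised step. With strong coupling the sampler can still move slowly across scales, so a generous burn-in is a sensible default.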
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: variational Bayes
[Figure: log-magnitude spectrograms: Noisy (X), Original (X_org), and the reconstructions X_h (SNR 19.98), X_v (SNR 20.79), X_b (SNR 19.68) and X_g (SNR 19.97)]
Denoising – Music
[Figure: log-magnitude spectrograms: Original, Noisy, and the reconstructions by PF (SNR 8.53), Gibbs (SNR 8.66) and VB (SNR 2.08)]
“Tristram (Matt Uelmen)” + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal; tie across time (harmonic continuity)
• Source 2: vertical; tie across frequency (transients, percussive sounds)
[Figure: time (τ) against frequency bin (ν) diagrams of $X_{org}$, $S_{hor}$ and $S_{ver}$]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix
Method (dB)  | s1 SDR | s1 SIR | s1 SAR | s2 SDR | s2 SIR | s2 SAR
VB           | -4.74  | -3.28  | 5.67   | -1.58  | 15.46  | -1.37
Gibbs        | -4.5   | -2.62  | 4.57   | 1.05   | 12.46  | 1.61
GibbsEM      | -4.23  | -2.42  | 4.82   | 1.34   | 13.13  | 1.85
Pre-trained  | -4.04  | -3.15  | 8.13   | 3.56   | 11.44  | 4.64
Oracle       | 6.14   | 17.16  | 6.58   | 12.66  | 19.95  | 13.6
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix
Method (dB)  | s1 SDR | s1 SIR | s1 SAR | s2 SDR | s2 SIR | s2 SAR
VB           | -7.8   | -6.22  | 4.53   | -2.35  | 18.4   | -2.25
Gibbs        | -8.46  | -7.53  | 6.93   | -4.04  | 14.59  | -3.83
GibbsEM      | -7.74  | -6.19  | 4.62   | -1.14  | 16.62  | -0.97
Pre-trained  | -6.4   | -5.39  | 6.95   | 3.8    | 16.39  | 4.14
Oracle       | 12.1   | 32.9   | 12.14  | 21.13  | 33.89  | 21.37
Harmonic-Transient Decomposition
[Figure: time (τ) against frequency bin (ν) diagrams of $X_{org}$, $S_{hor}$ and $S_{ver}$; audio examples (Original), (Hor), (Vert) for two excerpts]
Chord Detection - Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ against frequency ν (Hz), 0 to 4000 Hz]
Chord Detection
[Figure: MDCT of a piano chord (MIDI notes 41, 48, 51, 56), frequency (Hz) against time τ (s); and $\log \sum_\nu v_{\nu j \tau}$ for each MIDI note $j$ against time τ (s)]
Multichannel Source Separation
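• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda), \quad n = 1, \dots, N$$
$$v_{k,n} \sim \mathcal{IG}\big(v_{k,n}; \nu/2, 2/(\nu \lambda_n)\big)$$
$$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$
$$x_{k,m} \sim \mathcal{N}\big(x_{k,m}; a_m^\top s_{k,1:N}, r_m\big), \quad k = 1, \dots, K, \quad m = 1, \dots, M$$
$$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$$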
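Ancestral sampling from this hierarchy is straightforward, and the sketch below draws volumes, variances, sources and an $M$-channel mixture. Integrating out $v_{k,n}$ makes each $s_{k,n}$ Student-t with $\nu$ degrees of freedom and squared scale $\lambda_n$, which is consistent with reading $\lambda_n$ as the overall volume of source $n$. The priors on $a_m$ and $r_m$ are elided on the slide, so this sketch draws the mixing matrix from a standard normal and fixes $r_m$; all numerical values and the function name are illustrative assumptions.

```python
import numpy as np

def sample_multichannel_prior(K, N, M, nu=4.0, a_lam=2.0, b_lam=1.0, seed=0):
    """Draw (lambda, v, s, x) from the hierarchical prior for K coefficients, N sources, M channels."""
    rng = np.random.default_rng(seed)
    # lambda_n ~ G(lambda_n; a_lam, b_lam): shape a_lam, scale b_lam
    lam = rng.gamma(a_lam, b_lam, size=N)
    # v_{k,n} ~ IG(nu/2, 2/(nu lambda_n))  <=>  1/v_{k,n} ~ G(shape nu/2, scale 2/(nu lambda_n))
    v = 1.0 / rng.gamma(nu / 2.0, 2.0 / (nu * lam), size=(K, N))
    # s_{k,n} ~ N(0, v_{k,n}); marginally Student-t with nu d.o.f. and scale sqrt(lambda_n)
    s = rng.normal(0.0, np.sqrt(v))
    # Mixing vectors a_m (rows of A) and noise variances r_m: illustrative choices
    A = rng.normal(size=(M, N))
    r = np.full(M, 1e-2)
    # x_{k,m} ~ N(a_m^T s_{k,1:N}, r_m)
    x = s @ A.T + rng.normal(0.0, np.sqrt(r), size=(K, M))
    return lam, v, s, x

# Underdetermined example: 2 channels, 3 sources
lam, v, s, x = sample_multichannel_prior(K=5000, N=3, M=2)
```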
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$
Source Separation
[Figure: spectrograms (axes t (sec) and f (Hz)) of the three sources (Speech, Piano, Guitar) and of the Mix]
Reconstructions
[Figure: spectrograms (axes t (sec) and f (Hz)) of the reconstructed Speech, Piano and Guitar sources]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
[Figure: traces of the parameters $a$, $\lambda$ and $r$ against epoch (0 to 2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features):
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– …
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
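Once a problem is cast as an HMM, online (real-time) inference is the normalised forward recursion $p(x_k \mid y_{1:k}) \propto p(y_k \mid x_k) \sum_{x_{k-1}} p(x_k \mid x_{k-1})\, p(x_{k-1} \mid y_{1:k-1})$. A generic sketch for a discrete state space, assuming the transition matrix and the per-frame likelihoods are already available as arrays (the toy numbers are illustrative):

```python
import numpy as np

def forward_filter(prior, trans, lik):
    """Normalised forward recursion for a discrete HMM.

    prior: p(x_1), shape (S,)
    trans: trans[i, j] = p(x_k = j | x_{k-1} = i), shape (S, S)
    lik:   lik[k, j] = p(y_k | x_k = j), shape (K, S)
    Returns the filtered marginals p(x_k | y_{1:k}), shape (K, S), and log p(y_{1:K}).
    """
    K, S = lik.shape
    alpha = np.empty((K, S))
    log_evidence = 0.0
    pred = prior
    for k in range(K):
        unnorm = pred * lik[k]          # combine the prediction with the likelihood
        norm = unnorm.sum()             # p(y_k | y_{1:k-1})
        alpha[k] = unnorm / norm
        log_evidence += np.log(norm)
        pred = alpha[k] @ trans         # predict the next state
    return alpha, log_evidence

# Toy example: two states, sticky transitions
prior = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
lik = np.array([[0.7, 0.2],
                [0.1, 0.6],
                [0.5, 0.5]])
alpha, log_evidence = forward_filter(prior, trans, lik)
```

Offline (batch) smoothing adds a matching backward pass over the same quantities.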
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: lattice of states with tempo level $n_k \in \{1, 2, 3\}$ against bar position $m_k \in \{1, \dots, 8\}$, with bar lines for 3/4 time and 4/4 time]
• Each dot denotes a state $x = (m, n)$ (score position, tempo level)
• Directed arcs denote state transitions with positive probability
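One way to build such a transition matrix, as a sketch under simplifying assumptions that are ours rather than the slide's: the bar position advances deterministically by the tempo level, $m_{k+1} = ((m_k + n_k - 1) \bmod M) + 1$, and the tempo level moves up or down by one with probability $p/2$ each, with moves that would leave $\{1, \dots, N\}$ rejected. States $x = (m, n)$ are flattened to one index so the matrix plugs into any discrete HMM recursion.

```python
import numpy as np

def bar_pointer_transition(M, N, p_change=0.2):
    """Transition matrix over states x = (m, n): m in 1..M (bar position), n in 1..N (tempo level).

    Hypothetical dynamics: m advances by n, wrapping at the bar line; n changes by +-1
    with probability p_change / 2 each, and moves that would leave 1..N keep n unchanged.
    """
    def idx(m, n):                      # flatten (m, n), both 1-based
        return (m - 1) * N + (n - 1)

    T = np.zeros((M * N, M * N))
    for m in range(1, M + 1):
        for n in range(1, N + 1):
            m_next = (m + n - 1) % M + 1
            for dn, prob in ((-1, p_change / 2), (0, 1 - p_change), (1, p_change / 2)):
                n_next = n + dn if 1 <= n + dn <= N else n
                T[idx(m, n), idx(m_next, n_next)] += prob
    return T

T = bar_pointer_transition(M=8, N=3)    # 8 bar positions, 3 tempo levels, as in the lattice
```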
Bar position Pointer - transition model
[Figure: three example panels of the transition model $p(x_2 \mid x_1)$, tempo level (1 to 5) against bar position (1 to 8)]
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over tempo level and bar position]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over tempo level and bar position]
Bar position Pointer - k = 3
[Figure: $p(x_3)$ over tempo level and bar position]
Bar position Pointer - k = 4
[Figure: $p(x_4)$ over tempo level and bar position]
Bar position Pointer - k = 5
[Figure: $p(x_5 \mid y_{1:5})$ over tempo level and bar position, after observing $y_5 = 0$]
Bar position Pointer - k = 10
[Figure: $p(x_{10})$ over tempo level and bar position]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson, with an intensity $\mu_k$ that depends on the bar position $m_k$
[Figure: intensity $\mu_k$ against bar position $m_k$ (0 to 1000) for a triplet rhythm and a duplet rhythm]
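Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Graphical model: chains $n_0, \dots, n_3$, $\theta_0, \dots, \theta_3$, $m_0, \dots, m_3$ and $r_0, \dots, r_3$, with intensities $\lambda_1, \dots, \lambda_3$ and observations $y_1, \dots, y_3$]
• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator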
Filtering
[Figure: observed data $y_k$ and the filtering distributions $\log p(m_k \mid y_{1:k})$ (bar position), $\log p(n_k \mid y_{1:k})$ (tempo, quarter notes per min) and $p(r_k \mid y_{1:k})$ (triplets vs. duplets), against frame index $k$]
Smoothing
[Figure: observed data $y_k$ and the smoothing distributions $\log p(m_k \mid y_{1:K})$ (bar position), $\log p(n_k \mid y_{1:K})$ (tempo, quarter notes per min) and $p(r_k \mid y_{1:K})$ (triplets vs. duplets), against frame index $k$]
Time Signature
[Figure: observed audio (sample value against time in s) and the smoothing distributions $\log p(m_k \mid z_{1:K})$, $\log p(n_k \mid z_{1:K})$ (quarter notes per min) and $p(\theta_k \mid z_{1:K})$ (4/4 vs. 3/4), against frame index $k$]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt aligned with the audio signal $x_t$]
Score-Performance matching - Graphical Model
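[Graphical model: chains $t_1, \dots, t_K$, $r_1, \dots, r_K$ and $\lambda_1, \dots, \lambda_K$; for each frequency bin $\nu = 1, \dots, W$, variances $v_{\nu,1}, \dots, v_{\nu,K}$ and coefficients $s_{\nu,1}, \dots, s_{\nu,K}$; inset: score positions 1 to 8 indexed by $r_k$]
$$v_{\nu\tau} \sim \mathcal{IG}\big(v_{\nu\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau))\big)$$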
Score-Performance matching - Signal model
[Figure: $\log \sigma_\nu$ against frequency ν (Hz), 0 to 4000 Hz]
Score-Performance matching
[Figure: spectrogram data (frequency in Hz against time in s) and MIDI data (MIDI note against score position)]
Online (filtering) or offline (smoothing) processing is possible.
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ per MIDI note number against time (s); $\log \sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ against time (s); MDCT of the audio (source: Daniel-Ben Pienaar), frequency (Hz) against time (s)]
Summary
• “Time domain” – switching state space models
– State space modelling
– Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform domain” – Gamma fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical-score-guided source separation
– Prior structures for other observation models (NMF)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 34
Technical Difficulties
bull Inference is quite heavy
bull Vanilla Kalman filtering methods are not stable ndash computations with
large matrices
ndash Need advance techniques from linear algebra
ndash Interesting links to subspace methods
bull Hyperparameter learning is necessary
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 33
Modelling levels
bull Physical - acoustical
bull Time domain ndash state space dynamical models
bull Transform domain ndash Fourier representations Generalised Linear
model
bull Feature Based
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 34
Spectrogram
bull Basis functions φk(t) centered around time-frequency atom k = k(ν τ) =(Frequency Time ) such as STFT or MDCT
x(t) =sum
k
skφk(t)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
bull Spectrogram displays log |sk| or |sk|2 (of STFT)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 35
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
Models for time-frequency Energy distributions
bull Non-Negative Matrix factorisation (Sha Saul Lee 2002 Smaragdis Brown 2003
Virtanen 2003 Abdallah Plumbley 2004 )
Xντ = WνjSjτ
Spectrogram = Spectral Templatestimes Excitations
= times
ndash however spectrograms are not additive (a2 + b2 6= (a+ b)2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 36
Models for time-frequency Energy distributions
bull Mask models (Roweis 2001 Reyes-Gomez Jojic Ellis 2005 )
Xντ = [rντ = 0]S(0)ντ + [rντ = 1]S(1)
ντ
Spectrogram = Masktimes Source0 + (1minusMask)times Source1
= + +
ndash however sources do overlap in time and frequency
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 37
Prior structures on time-frequency Energy distributions
bull Main Idea Spectrogram is a point estimate of the energy at a
time-frequency atom k(ν τ)
bull We place a suitable prior on the variance of transform coefficients sk
and tie the prior variances across harmonically and temporally related
time-frequency atoms
p(s|v)p(v) =
(prod
k
p(sk|vk)
)
p(v)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 38
One channel source separation Gaussian source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim N (skn 0 vkn)
xk|sk1N =sumN
n=1 skn
bull Straightforward application of Bayesrsquo theorem yields
p(skn|vk1N xk) = N (skn κknxk vkn(1minus κkn))
κkn = vknsum
nprime
vknprime (Responsibilities)
bull Each source coefficient sn gets a fraction κn of the observation x
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 39
One channel source separation Poisson source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim PO(skn vkn)
xk|sk1N =sumN
n=1 skn
bull This is the generative model for the NMF when we write
vk(ντ)n = tνn times eτn (Templatetimes Excitation)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40
Gamma G(x a b) and Inverse Gamma IG(x a b)
0 1 2 3 4 50
02
04
06
08
1
12a = 09 b =1
a = 1 b =1
a = 13 b =1
a = 2 b =1
x
p(x)
0 1 2 3 4 50
02
04
06
08
1
12
14
a=1 b=1
a=1 b=05
a=2 b=1
G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))
IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))
bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale
bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1 K as follows
vk|zk sim IG(vk a zka)
zk+1|vk sim IG(zk+1 az vkaz)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
bull Variance variables v are priors for sources
bull Auxillary variables z are needed for conjugacy and positive correlation
bull Shape parameters a and az describe coupling strength and drift of the chain
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42
Gamma Chains typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43
Gamma Chains with changepoints typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44
Gamma Chains
bull The joint can be written as product of singleton and pairwise
potentials of form
ψkk = exp(minusazminus1k vminus1
k ) (Pairwise)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
φzk = exp((az + a+ 1) log zminus1
k ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45
Gamma Fields
bull The joint can be written as product of singleton and pairwise
potentials
ψij = exp(minusaijξminus1i ξminus1
j ) (Pairwise)
φi = exp((sum
j
aij + 1) log ξminus1i ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46
Possible Model Topologies
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47
Approximate Inference
bull Stochastic
ndash Markov Chain Monte Carlo Gibbs sampler
ndash Sequential Monte Carlo Particle Filtering
bull Deterministic
ndash Variational Bayes
In all these conjugacy helps
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))
bull Gibbs
v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z
(τ)k )ψkk+1(z
(τ)k+1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
bull Inference Variational Bayes
Noisy Original
X
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xorg
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xh SNR1998
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xv SNR2079
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xb SNR1968
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xg SNR1997
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51
Denoising ndash MusicOriginal
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Noisy
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
PF SNR853
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Gibbs SNR866
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
VB SNR208
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52
Single Channel Source Separation (with Onur Dikmen)
bull Source 1 Horizontal Tie across time harmonic continuity
bull Source 2 Vertical Tie across frequency transients percussive sounds
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53
Single Channel Source Separation with IGMCs
E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Preminus trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
bull Oracle We use the square of the source coefficient as the latent variance estimate
bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
[Figure: sampler traces of the parameters a, λ and r versus epoch (0 to 2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features):
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– …
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
Bar Position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: state lattice with bar position m_k = 1, ..., 8 (horizontal) and tempo level n_k = 1, 2, 3 (vertical); bar boundaries marked for 3/4 and 4/4 time]
• Each dot denotes a state x = (m, n) (score position, tempo level)
• Directed arcs denote state transitions with positive probability
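A toy version of such a transition model can be written as a transition matrix over the states x = (m, n). The sketch below is illustrative only: the rule that the position advances by the tempo level, the wrap-around at the end of the bar and the tempo-change probability `p_change` are hypothetical assumptions, not the exact model of Whiteley, Cemgil and Godsill (2006).

```python
import numpy as np


def bar_pointer_transition(M=8, N=3, p_change=0.2):
    """Transition matrix T[i, j] = p(x_{k+1} = j | x_k = i) over states x = (m, n).

    Hypothetical toy dynamics:
      * bar position advances by the tempo level: m' = (m + n) mod M (0-based m),
      * tempo level n in 1..N stays with prob. 1 - p_change, otherwise moves to a
        neighbouring level (split evenly between the available neighbours).
    """
    S = M * N
    T = np.zeros((S, S))

    def idx(m, n):  # m in 0..M-1, n in 1..N  ->  flat state index
        return (n - 1) * M + m

    for n in range(1, N + 1):
        neighbours = [q for q in (n - 1, n + 1) if 1 <= q <= N]
        stay = 1.0 - p_change if neighbours else 1.0
        for m in range(M):
            m_next = (m + n) % M  # wrap around at the end of the bar
            T[idx(m, n), idx(m_next, n)] += stay
            for q in neighbours:
                T[idx(m, n), idx(m_next, q)] += p_change / len(neighbours)
    return T
```

Each row has at most three non-zero entries, which mirrors the sparse arc structure in the lattice: every state can only reach the next bar position at the same or an adjacent tempo level.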
Bar Position Pointer - transition model
[Figure: three panels of the transition probability p(x_2 | x_1) over tempo level (1 to 5) and bar position (1 to 8)]
Bar Position Pointer - k = 1
[Figure: p(x_1) over tempo level and bar position]
Bar Position Pointer - k = 2
[Figure: p(x_2) over tempo level and bar position]
Bar Position Pointer - k = 3
[Figure: p(x_3) over tempo level and bar position]
Bar Position Pointer - k = 4
[Figure: p(x_4) over tempo level and bar position]
Bar Position Pointer - k = 5
[Figure: p(x_5 | y_{1:5}) over tempo level and bar position, after observing y_5 = 0]
Bar Position Pointer - k = 10
[Figure: p(x_10) over tempo level and bar position]
Bar Position Pointer - observation model (Poisson)
• Observation model p(y_k | x_k): Poisson with intensity μ_k
[Figure: intensity μ_k as a function of bar position m_k (0 to 1000) for a triplet rhythm and a duplet rhythm]
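A Poisson observation model only needs an intensity μ(m) for each bar position. The sketch below assumes, hypothetically, that μ is a small floor plus Gaussian bumps at the onset positions of a rhythmic pattern; the onset lists, widths and heights are made-up values for illustration, not the intensity profiles shown in the figure.

```python
import math

import numpy as np


def poisson_intensity(M=1000, onsets=(0, 250, 500, 750), peak=3.0,
                      width=20.0, floor=0.05):
    """Hypothetical intensity mu(m): a floor plus bumps at the pattern's onsets."""
    m = np.arange(M)
    mu = np.full(M, floor)
    for c in onsets:
        # circular distance, since the bar position wraps around
        d = np.minimum(np.abs(m - c), M - np.abs(m - c))
        mu += peak * np.exp(-0.5 * (d / width) ** 2)
    return mu


def poisson_loglik(y, mu):
    """log p(y | x) = y log mu - mu - log y!, for every bar position at once."""
    return y * np.log(mu) - mu - math.lgamma(y + 1)
```

For example, `poisson_intensity(onsets=(0, 250, 500, 750))` and `poisson_intensity(onsets=(0, 167, 333, 500, 667, 833))` give duplet-like and triplet-like profiles. An observation y_k = 0 then favours positions between onsets, while a large count favours positions near an onset.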
Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Graphical model: tempo n_k, time signature θ_k, bar position m_k, rhythm r_k, intensity λ_k and observation y_k, shown for k = 0, ..., 3]
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator
Filtering
[Figure: observed data y_k and the filtered posteriors log p(m_k | y_{1:k}) (bar position), log p(n_k | y_{1:k}) (tempo in quarter notes per minute) and p(r_k | y_{1:k}) (triplets vs. duplets), plotted against frame index k]
Smoothing
[Figure: observed data y_k and the smoothed posteriors log p(m_k | y_{1:K}) (bar position), log p(n_k | y_{1:K}) (tempo in quarter notes per minute) and p(r_k | y_{1:K}) (triplets vs. duplets), plotted against frame index k]
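Both the filtering and the smoothing plots are posterior marginals of the same discrete HMM. The sketch below is a generic normalised forward pass followed by a backward smoothing pass. It assumes a transition matrix and a per-frame likelihood matrix are already given, and it is a textbook algorithm, not the original implementation.

```python
import numpy as np


def forward_backward(prior, T, lik):
    """Filtering and smoothing marginals for a discrete HMM.

    prior: (S,) p(x_1); T: (S, S) with T[i, j] = p(x_{k+1} = j | x_k = i);
    lik: (K, S) with lik[k, j] = p(y_k | x_k = j).
    Returns (filtered, smoothed), each (K, S):
    filtered[k] = p(x_k | y_{1:k}) and smoothed[k] = p(x_k | y_{1:K}).
    """
    K, S = lik.shape
    filtered = np.zeros((K, S))
    pred = prior
    for k in range(K):
        alpha = pred * lik[k]
        filtered[k] = alpha / alpha.sum()  # normalise to avoid underflow
        pred = filtered[k] @ T             # prediction p(x_{k+1} | y_{1:k})
    smoothed = np.zeros((K, S))
    smoothed[-1] = filtered[-1]
    for k in range(K - 2, -1, -1):
        pred = filtered[k] @ T
        # ratio[j] = p(x_{k+1}=j | y_{1:K}) / p(x_{k+1}=j | y_{1:k})
        ratio = np.divide(smoothed[k + 1], pred, out=np.zeros(S), where=pred > 0)
        smoothed[k] = filtered[k] * (T @ ratio)
        smoothed[k] /= smoothed[k].sum()
    return filtered, smoothed
```

The forward pass alone gives the online (filtering) answer, usable in real time; the backward pass needs the whole sequence and gives the offline (smoothing) answer.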
Time Signature
[Figure: observed audio (sample value vs. time, 0 to 12 s) and the smoothed posteriors log p(m_k | z_{1:K}) (bar position), log p(n_k | z_{1:K}) (tempo in quarter notes per minute) and p(θ_k | z_{1:K}) (4/4 vs. 3/4), plotted against frame index k]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt aligned with the audio signal x_t]
Score-Performance matching - Graphical Model
[Graphical model: chain over frames with nodes t_1, ..., t_K, r_1, ..., r_K and λ_1, ..., λ_K; variances v_{ν,1}, ..., v_{ν,K} and coefficients s_{ν,1}, ..., s_{ν,K} on a plate over frequency bins ν = 1, ..., W; an inset marks score positions 1 to 8 indexed by r_k]
$$v_{\nu\tau} \sim \mathcal{IG}\big(v_{\nu\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau))\big)$$
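Because the variance prior is conjugate, v_{ν,τ} can be integrated out analytically. With the deck's inverse-Gamma convention, 1/v_{ν,τ} is Gamma distributed with shape a and rate aλσ_ν(r_τ), so a coefficient s_{ν,τ} ∼ N(0, v_{ν,τ}) marginally follows a Student-t distribution with 2a degrees of freedom and squared scale λσ_ν(r_τ). The sketch below scores one frame against a set of candidate templates under that marginal. It assumes independent frequency bins; the function name, the template array and the values of λ and a are hypothetical.

```python
import math

import numpy as np


def frame_loglik(s, templates, lam=1.0, a=2.0):
    """log p(s_tau | r_tau = r) for every candidate score position r.

    s: (W,) transform coefficients of one frame.
    templates: (R, W) array with templates[r, nu] = sigma_nu(r) > 0.
    Marginalising v ~ IG(a, 1/(a * lam * sigma)) out of s ~ N(0, v) gives a
    Student-t with dof = 2a and squared scale lam * sigma_nu(r).
    """
    dof = 2.0 * a
    scale2 = lam * templates                              # (R, W)
    const = math.lgamma(a + 0.5) - math.lgamma(a)
    ll = (const
          - 0.5 * np.log(dof * math.pi * scale2)
          - (a + 0.5) * np.log1p(s ** 2 / (dof * scale2)))
    return ll.sum(axis=1)                                 # independent bins
```

The heavy Student-t tails make the score robust to occasional large coefficients that a plug-in Gaussian with v = λσ_ν(r) would penalise heavily.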
Score-Performance matching - Signal model
[Figure: spectral template log σ_ν versus frequency ν (0 to 4000 Hz)]
Score-Performance matching
[Figure: spectrogram data (time 0 to 14 s, frequency 0 to 4000 Hz) and MIDI data (MIDI note 55 to 85 vs. score position)]
Online (filtering) or Offline (smoothing) processing is possible
Transcription
[Figure: log p(r_τ | s_τ) over MIDI note number (60 to 80) vs. time (1 to 4 s); log of Σ_i w_τ^{(i)} λ_τ^{(i)} vs. time; MDCT of the audio (source: Daniel-Ben Pienaar), frequency 0 to 4000 Hz vs. time]
Summary
• "Time Domain" – Switching State Space Models
– State space modeling
– Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• "Transform Domain" – Gamma Fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT, ...)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical score guided source separation
– Prior structures for other observation models, NMF
Modelling levels
• Physical - acoustical
• Time domain – state space dynamical models
• Transform domain – Fourier representations, generalised linear model
• Feature based
Spectrogram
• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (frequency, time), such as STFT or MDCT
$$x(t) = \sum_k s_k \phi_k(t)$$
[Figure: spectrograms (time in s vs. frequency in Hz) of (Speech) and (Piano)]
• Spectrogram displays log |s_k| or |s_k|^2 (of STFT)
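A spectrogram can be computed directly from the STFT coefficients s_k. The sketch below uses a Hann window with hypothetical frame and hop lengths and assumes the signal is at least one frame long; it is a plain NumPy illustration, not the analysis used for the figures.

```python
import numpy as np


def stft(x, frame_len=512, hop=256):
    """Short-time Fourier transform with a Hann window.

    Returns a complex array S of shape (frame_len // 2 + 1, n_frames): rows are
    frequency bins nu and columns are frames tau, so S[nu, tau] plays the role
    of the coefficient s_k of the atom k = (nu, tau).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[t * hop: t * hop + frame_len] * window
                       for t in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)


def spectrogram_db(x, **kwargs):
    """Log-energy 10 log10 |s_k|^2 of the STFT (a tiny floor avoids log 0)."""
    S = stft(x, **kwargs)
    return 10.0 * np.log10(np.abs(S) ** 2 + 1e-12)
```

For a 1000 Hz tone sampled at 8000 Hz with 512-sample frames, the bin spacing is 8000/512 = 15.625 Hz, so the energy peaks in bin 1000/15.625 = 64 of every frame.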
Models for time-frequency Energy distributions
• Non-negative matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)
$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$
Spectrogram = Spectral Templates × Excitations
– however, spectrograms are not additive: $a^2 + b^2 \neq (a + b)^2$
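One standard way to fit X ≈ WS is the Lee-Seung multiplicative-update algorithm for the generalised Kullback-Leibler divergence, whose minimiser is the maximum-likelihood fit under a Poisson model. The sketch below implements that textbook variant. It is not necessarily the algorithm used in each of the works cited above, and the random initialisation and iteration count are arbitrary choices.

```python
import numpy as np


def nmf_kl(X, J, n_iter=200, seed=0, eps=1e-12):
    """Factorise a non-negative (nu x tau) matrix X ~= W @ S with J templates.

    Lee-Seung multiplicative updates for the generalised KL divergence
    D(X || WS) = sum X log(X / WS) - X + WS; each update keeps W, S >= 0.
    """
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.uniform(0.1, 1.0, size=(F, J))   # spectral templates
    S = rng.uniform(0.1, 1.0, size=(J, T))   # excitations
    for _ in range(n_iter):
        WS = W @ S + eps
        S *= (W.T @ (X / WS)) / (W.sum(axis=0)[:, None] + eps)
        WS = W @ S + eps
        W *= ((X / WS) @ S.T) / (S.sum(axis=1)[None, :] + eps)
    return W, S
```

Each update does not increase the divergence, so running more iterations from the same initialisation never worsens the fit. The non-additivity caveat above still applies when X is a power spectrogram of a mixture.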
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)
$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$
Spectrogram = Mask × Source0 + (1 − Mask) × Source1
– however, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)
• We place a suitable prior on the variance of the transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms:
$$p(s|v)\,p(v) = \Big(\prod_k p(s_k|v_k)\Big)\, p(v)$$
One channel source separation: Gaussian source model
[Graphical model: v_{k,1}, ..., v_{k,N} → s_{k,1}, ..., s_{k,N} → x_k, plate over k = 1, ..., K]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$
• A straightforward application of Bayes' theorem yields
$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})\big), \qquad \kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \quad \text{(responsibilities)}$$
• Each source coefficient s_n gets a fraction κ_n of the observation x
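The posterior above is a per-coefficient Wiener filter. The short sketch below computes the responsibilities κ_{k,n} and the posterior means and variances for a batch of coefficients, assuming the prior variances v_{k,n} are known; the function name is hypothetical.

```python
import numpy as np


def gaussian_source_posterior(x, v):
    """Posterior of each source coefficient given the mixture x_k = sum_n s_kn.

    x: (K,) observed mixture coefficients; v: (K, N) prior variances v_kn.
    Returns (mean, var) with mean = kappa * x and var = v * (1 - kappa).
    """
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities kappa_kn
    mean = kappa * x[:, None]
    var = v * (1.0 - kappa)
    return mean, var
```

The posterior means always add back up to the observation, because the responsibilities in each row sum to one.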
One channel source separation: Poisson source model
[Graphical model: v_{k,1}, ..., v_{k,N} → s_{k,1}, ..., s_{k,N} → x_k, plate over k = 1, ..., K]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$
• This is the generative model for NMF when we write
$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \quad \text{(Template × Excitation)}$$
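For the Poisson model the analogous result is that, conditioned on the total x_k, the sources are multinomially distributed with the same responsibilities κ_{k,n} = v_{k,n} / Σ_{n'} v_{k,n'}. The sketch below draws one such split and returns the posterior mean κ_{k,n} x_k. It assumes integer counts, and the function name is hypothetical.

```python
import numpy as np


def poisson_source_split(x, v, seed=0):
    """Split integer counts x_k among N Poisson sources with intensities v_kn.

    With independent s_kn ~ PO(v_kn) and x_k = sum_n s_kn, the conditional
    p(s_k1, ..., s_kN | x_k) is Multinomial(x_k, kappa_k) with
    kappa_kn = v_kn / sum_n' v_kn'.
    Returns one posterior draw and the posterior mean kappa_kn * x_k.
    """
    rng = np.random.default_rng(seed)
    kappa = v / v.sum(axis=1, keepdims=True)
    draw = np.stack([rng.multinomial(int(xk), pk) for xk, pk in zip(x, kappa)])
    return draw, kappa * x[:, None]
```

With v_{k(ν,τ),n} = t_{ν,n} e_{τ,n}, these expected counts are the quantities an EM treatment of NMF would compute in its E-step.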
Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)
[Figure: example densities p(x) for 0 ≤ x ≤ 5; one panel with a = 0.9, 1, 1.3, 2 (b = 1), the other with (a, b) = (1, 1), (1, 0.5), (2, 1)]
$$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$$
$$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
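The two definitions can be checked numerically. The sketch below transcribes the log-densities exactly as written above, with the second argument z acting as a scale. The accompanying check relies on the change-of-variables identity IG(x; a, z) = G(1/x; a, z) / x², which is why 1/x ∼ G(a, z) exactly when x ∼ IG(a, z). The function names are hypothetical.

```python
import math

import numpy as np


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a - 1) log x - x / z + a log(1/z) - log Gamma(a)."""
    return (a - 1.0) * np.log(x) - x / z - a * math.log(z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = (a + 1) log(1/x) - 1/(z x) + a log(1/z) - log Gamma(a)."""
    return -(a + 1.0) * np.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)
```

With this parametrisation the Gamma mean is a·z, so a scale of z/a (as used in the Gamma chains) keeps the mean at z while a controls the spread.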
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1, ..., K as follows:
$$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a), \qquad z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$$
[Graphical model: z_1 ··· v_{k−1} → z_k → v_k → z_{k+1} ···, with edge parameters a_z, a, a_z]
• Variance variables v are priors for the sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• Shape parameters a and a_z describe the coupling strength and drift of the chain
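Under the same convention, sampling the chain only requires drawing reciprocals of Gamma variates. The sketch below generates one trajectory v_1, ..., v_K. The starting value z_1 = 1, the default shapes and the function name are hypothetical. The scale parameters z_k/a and v_k/a_z follow the chain as written above and are consistent with the pairwise potential exp(−a z_k^{-1} v_k^{-1}) given for Gamma chains.

```python
import numpy as np


def sample_igmc(K, a=10.0, a_z=10.0, z1=1.0, seed=0):
    """Draw v_1..v_K from an inverse Gamma-Markov chain.

    Convention (assumed): x ~ IG(x; a, z)  <=>  1/x ~ G(a, z), z a scale.
        v_k     | z_k ~ IG(v_k; a, z_k / a)
        z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z)
    """
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = z1
    for k in range(K):
        v[k] = 1.0 / rng.gamma(a, z / a)      # variance given auxiliary z_k
        z = 1.0 / rng.gamma(a_z, v[k] / a_z)  # next auxiliary given v_k
    return v
```

Writing one step of the chain out shows that log v_k performs a random walk whose increments have variance of roughly 1/a + 1/a_z, so larger shape parameters give smoother trajectories.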
Gamma Chains: typical draws
[Figure: three draws of log v_k for k = 1, ..., 1000 with (a, a_z) = (10, 10), (10, 4) and (10, 40)]
Gamma Chains with changepoints: typical draws
[Figure: three draws of log v_k for k = 1, ..., 1000 with (a, a_z) = (10, 10), (10, 4) and (10, 40)]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$$\psi_{k,k} = \exp\big(-a\, z_k^{-1} v_k^{-1}\big) \quad \text{(pairwise)}$$
[Graphical model: z_1 ··· v_{k−1} → z_k → v_k → z_{k+1} ···, with edge parameters a_z, a, a_z]
$$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \quad \text{(singletons)}$$
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$$\psi_{ij} = \exp\big(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}\big) \quad \text{(pairwise)}$$
$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \quad \text{(singletons)}$$
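Because each ξ_i enters the potentials only through ξ_i^{-1}, its full conditional is again inverse Gamma. Collecting φ_i and the ψ_{ij} terms gives shape Σ_j a_{ij} and rate Σ_j a_{ij}/ξ_j, in the standard x^{-shape-1} e^{-rate/x} form. The sketch below performs one sequential Gibbs sweep using that conditional. It assumes a symmetric coupling matrix with zero diagonal in which every node has at least one neighbour; the function name is hypothetical.

```python
import numpy as np


def gamma_field_gibbs_sweep(xi, A, rng):
    """One sequential Gibbs sweep over an inverse-Gamma field.

    A: symmetric (n, n) non-negative couplings a_ij with zero diagonal.
    xi: (n,) current positive values; updated in place and returned.
    Full conditional: xi_i ~ IG(shape = sum_j a_ij, rate = sum_j a_ij / xi_j),
    sampled as 1 / Gamma(shape, scale = 1 / rate).
    """
    for i in range(len(xi)):
        shape = A[i].sum()
        rate = A[i] @ (1.0 / xi)  # A[i, i] == 0, so xi_i itself does not enter
        xi[i] = 1.0 / rng.gamma(shape, 1.0 / rate)
    return xi
```

A chain, a grid, or any of the other topologies only changes the coupling matrix A; the update itself stays the same.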
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov Chain Monte Carlo: Gibbs sampler
– Sequential Monte Carlo: particle filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps.
VB or Gibbs
[Factor graph: ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ···, with observation factors p(y_1 | v_1) and p(y_2 | v_2)]
• VB
$$q^{(\tau)}(v_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1}\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\big)$$
• Gibbs
$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}\big(z_k^{(\tau)}\big)\, \psi_{k,k+1}\big(z_{k+1}^{(\tau)}\big)$$
VB or Gibbs
[Factor graph: ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ···, with observation factors p(y_1 | v_1) and p(y_2 | v_2)]
• VB
$$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k-1} + \log\psi_{k,k}\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\big)$$
• Gibbs
$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k,k-1}\big(v_{k-1}^{(\tau)}\big)\, \psi_{k,k}\big(v_k^{(\tau)}\big)$$
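The two Gibbs updates above can be combined into a complete sampler. The sketch below assumes, hypothetically, Gaussian observations y_k ∼ N(y_k; 0, v_k) and a fixed value v_0 that anchors z_1. With the chain written as v_k | z_k ∼ IG(v_k; a, z_k/a) and z_{k+1} | v_k ∼ IG(z_{k+1}; a_z, v_k/a_z), both full conditionals are inverse Gamma, as spelled out in the docstring. The function name, defaults and initialisation are illustrative choices.

```python
import numpy as np


def igmc_gibbs(y, a=10.0, a_z=10.0, v0=1.0, n_sweeps=200, seed=0):
    """Gibbs sampler for y_k ~ N(0, v_k) with an inverse Gamma-Markov chain on v.

    Hypothetical set-up: z_1 ~ IG(z_1; a_z, v0 / a_z) with fixed v0.
    Full conditionals as IG(shape, rate), density x^(-shape-1) exp(-rate / x):
        v_k | z_k, z_{k+1}, y_k ~ IG(a + a_z + 1/2, a/z_k + a_z/z_{k+1} + y_k^2/2)
        z_k | v_{k-1}, v_k      ~ IG(a_z + a,       a_z/v_{k-1} + a/v_k)
    The z_{K+1} terms are dropped for the last variance v_K.
    Returns the posterior mean of v from the second half of the sweeps.
    """
    rng = np.random.default_rng(seed)
    K = len(y)
    v = np.maximum(y ** 2, 1e-6)  # crude initialisation
    z = 1.0 / v                   # with this chain, z_k is of order 1 / v_k
    samples = []
    for sweep in range(n_sweeps):
        for k in range(K):        # z_k: neighbours v_{k-1} (or v0) and v_k
            v_prev = v0 if k == 0 else v[k - 1]
            rate = a_z / v_prev + a / v[k]
            z[k] = 1.0 / rng.gamma(a_z + a, 1.0 / rate)
        for k in range(K):        # v_k: neighbours z_k, z_{k+1} and y_k
            shape = a + 0.5
            rate = a / z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = 1.0 / rng.gamma(shape, 1.0 / rate)
        if sweep >= n_sweeps // 2:
            samples.append(v.copy())
    return np.mean(samples, axis=0)
```

Updating all z first and then all v is valid here because, given the v's, the z's are conditionally independent (and vice versa), so each half-sweep is an exact block update.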
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: log-magnitude spectrograms (frame vs. frequency bin) of the noisy input X, the original X_org and four reconstructions X_h, X_v, X_b and X_g, each reconstruction labeled with its SNR]
Denoising – Music
[Figure: log-magnitude spectrograms (frame vs. frequency bin) of the original, the noisy input and the reconstructions by PF, Gibbs and VB, each reconstruction labeled with its SNR]
"Tristram" (Matt Uelmen) + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal tie across time (harmonic continuity)
• Source 2: vertical tie across frequency (transients, percussive sounds)
[Figure: time (τ) vs. frequency bin (ν) images of the mixture X_org and of the horizontal and vertical components S_hor and S_ver]
Single Channel Source Separation with IGMCs
E-guitar "Matte Kudasai" (King Crimson) + Drums "Territory" (Sepultura) = Mix
Method | s1 SDR | s1 SIR | s1 SAR | s2 SDR | s2 SIR | s2 SAR
VB | -474 | -328 | 567 | -158 | 1546 | -137
Gibbs | -45 | -262 | 457 | 105 | 1246 | 161
GibbsEM | -423 | -242 | 482 | 134 | 1313 | 185
Pre-trained | -404 | -315 | 813 | 356 | 1144 | 464
Oracle | 614 | 1716 | 658 | 1266 | 1995 | 136
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources
Single Channel Source Separation with IGMCs
"Vandringar I Vilsenhet" (Anglagard) + "Moby Dick" (Led Zeppelin) = Mix
Method | s1 SDR | s1 SIR | s1 SAR | s2 SDR | s2 SIR | s2 SAR
VB | -78 | -622 | 453 | -235 | 184 | -225
Gibbs | -846 | -753 | 693 | -404 | 1459 | -383
GibbsEM | -774 | -619 | 462 | -114 | 1662 | -097
Pre-trained | -64 | -539 | 695 | 38 | 1639 | 414
Oracle | 121 | 329 | 1214 | 2113 | 3389 | 2137
Harmonic-Transient Decomposition
[Figure: two examples, each showing the original spectrogram X_org (Original) and its horizontal S_hor (Hor) and vertical S_ver (Vert) components; time (τ) vs. frequency bin (ν)]
Chord Detection - Signal model (with Paul Peeling)
[Figure: spectral template log σ_ν versus frequency ν (0 to 4000 Hz)]
Chord Detection
[Figure: MDCT of a piano chord (MIDI notes 41, 48, 51, 56), frequency 0 to 4000 Hz vs. time τ (0 to 2.5 s); and log Σ_ν v_{ν,j,τ} for MIDI notes j = 40 to 65 vs. time τ]
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 36
Spectrogram
bull Basis functions φk(t) centered around time-frequency atom k = k(ν τ) =(Frequency Time ) such as STFT or MDCT
x(t) =sum
k
skφk(t)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
bull Spectrogram displays log |sk| or |sk|2 (of STFT)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 35
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
Models for time-frequency Energy distributions
bull Non-Negative Matrix factorisation (Sha Saul Lee 2002 Smaragdis Brown 2003
Virtanen 2003 Abdallah Plumbley 2004 )
Xντ = WνjSjτ
Spectrogram = Spectral Templatestimes Excitations
= times
ndash however spectrograms are not additive (a2 + b2 6= (a+ b)2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 36
Models for time-frequency Energy distributions
bull Mask models (Roweis 2001 Reyes-Gomez Jojic Ellis 2005 )
Xντ = [rντ = 0]S(0)ντ + [rντ = 1]S(1)
ντ
Spectrogram = Masktimes Source0 + (1minusMask)times Source1
= + +
ndash however sources do overlap in time and frequency
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 37
Prior structures on time-frequency Energy distributions
bull Main Idea Spectrogram is a point estimate of the energy at a
time-frequency atom k(ν τ)
bull We place a suitable prior on the variance of transform coefficients sk
and tie the prior variances across harmonically and temporally related
time-frequency atoms
p(s|v)p(v) =
(prod
k
p(sk|vk)
)
p(v)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 38
One channel source separation Gaussian source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim N (skn 0 vkn)
xk|sk1N =sumN
n=1 skn
bull Straightforward application of Bayesrsquo theorem yields
p(skn|vk1N xk) = N (skn κknxk vkn(1minus κkn))
κkn = vknsum
nprime
vknprime (Responsibilities)
bull Each source coefficient sn gets a fraction κn of the observation x
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 39
One channel source separation Poisson source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim PO(skn vkn)
xk|sk1N =sumN
n=1 skn
bull This is the generative model for the NMF when we write
vk(ντ)n = tνn times eτn (Templatetimes Excitation)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40
Gamma G(x a b) and Inverse Gamma IG(x a b)
0 1 2 3 4 50
02
04
06
08
1
12a = 09 b =1
a = 1 b =1
a = 13 b =1
a = 2 b =1
x
p(x)
0 1 2 3 4 50
02
04
06
08
1
12
14
a=1 b=1
a=1 b=05
a=2 b=1
G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))
IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))
bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale
bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1 K as follows
vk|zk sim IG(vk a zka)
zk+1|vk sim IG(zk+1 az vkaz)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
bull Variance variables v are priors for sources
bull Auxillary variables z are needed for conjugacy and positive correlation
bull Shape parameters a and az describe coupling strength and drift of the chain
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42
Gamma Chains typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43
Gamma Chains with changepoints: typical draws
[Figure: three draws of $\log v_k$ for $k = 1, \ldots, 1000$ (vertical range −20 to 20), with $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains
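• The joint can be written as a product of singleton and pairwise potentials of the form
$$\psi_{k,k} = \exp\big(-a\, z_k^{-1} v_k^{-1}\big) \quad \text{(pairwise)}, \qquad \phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \quad \text{(singletons)}$$
and analogously $\psi_{k,k+1} = \exp\big(-a_z\, v_k^{-1} z_{k+1}^{-1}\big)$ couples $v_k$ and $z_{k+1}$.
[Chain: $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$, with couplings $a_z$, $a$, $a_z$]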
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$$\psi_{i,j} = \exp\big(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}\big) \quad \text{(pairwise)}, \qquad \phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \quad \text{(singletons)}$$
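The potentials above can be evaluated directly from an edge list. The sketch below computes the unnormalised log density of a Gamma field. It also builds the edge list of the inverse Gamma-Markov chain:

- nodes are ordered $z_1, v_1, \ldots, z_K, v_K$;
- edge $(z_k, v_k)$ has strength $a$;
- edge $(v_k, z_{k+1})$ has strength $a_z$.

The edge-list representation and the function names are assumptions of this example. The field's log potential and the chain's sum of log conditional densities should differ only by a constant plus a $\log z_1$ term from the chain's start. The check holds $z_1$ fixed, so that term drops out.

```python
import math

def gamma_field_log_potential(xi, edges):
    """Unnormalised log density of a Gamma field.
    xi: positive node values; edges: (i, j, a_ij) with a_ij > 0, each undirected edge listed once."""
    weight = [0.0] * len(xi)          # sum_j a_ij for every node i
    log_p = 0.0
    for i, j, a_ij in edges:
        weight[i] += a_ij
        weight[j] += a_ij
        log_p -= a_ij / (xi[i] * xi[j])             # log psi_ij = -a_ij / (xi_i xi_j)
    for i, x in enumerate(xi):
        log_p -= (weight[i] + 1.0) * math.log(x)    # log phi_i = (sum_j a_ij + 1) log(1/xi_i)
    return log_p

def igmc_as_gamma_field(K, a, a_z):
    """Edge list for nodes ordered z_1, v_1, z_2, v_2, ..., z_K, v_K."""
    edges = []
    for k in range(K):
        z_idx, v_idx = 2 * k, 2 * k + 1
        edges.append((z_idx, v_idx, a))             # z_k -- v_k, strength a
        if k + 1 < K:
            edges.append((v_idx, z_idx + 2, a_z))   # v_k -- z_{k+1}, strength a_z
    return edges
```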
Possible Model Topologies
Approximate Inference
• Stochastic
  – Markov chain Monte Carlo: Gibbs sampler
  – Sequential Monte Carlo: particle filtering
• Deterministic
  – Variational Bayes
In all of these, conjugacy helps.
VB or Gibbs
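[Factor graph: $z_1, v_1, z_2, v_2, \ldots$ linked by the factors $\psi_{0,1}, \psi_{1,1}, \psi_{1,2}, \psi_{2,2}, \ldots$; each $v_k$ also has an observation factor $p(y_k \mid v_k)$]
• VB
$$q^{(\tau)}(v_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k,k} + \log\psi_{k,k+1} \big\rangle_{q^{(\tau)}(z_k)\,q^{(\tau)}(z_{k+1})}\Big)$$
• Gibbs
$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\,\psi_{k,k}\big(z_k^{(\tau)}\big)\,\psi_{k,k+1}\big(z_{k+1}^{(\tau)}\big)$$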
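Because every potential is of the form $\exp(-c\, x^{-1} y^{-1})$, the VB expectations only involve $\langle x^{-1}\rangle$. Every factor $q(\cdot)$ then stays inverse Gamma, with $\langle x^{-1}\rangle = \text{shape}/\text{rate}$. The sketch below alternates the $q(v_k)$ update above with the analogous update for $q(z_k)$.

It rests on three assumptions not stated on the slide:
- an observation model $y_k \mid v_k \sim \mathcal{N}(0, v_k)$, whose log term is folded into the $q(v_k)$ update;
- a hypothetical prior $1/z_1 \sim \mathcal{G}(a_z)$ with rate `b0`, standing in for $\psi_{0,1}$;
- illustrative names throughout.

```python
def vb_igmc(y, a, a_z, b0=1.0, n_iter=50):
    """Mean-field VB for v_k | z_k ~ IG(a, z_k/a), z_{k+1} | v_k ~ IG(a_z, v_k/a_z),
    with y_k | v_k ~ N(0, v_k) and 1/z_1 ~ Gamma(shape a_z, rate b0) (both assumed).
    Each q(.) is inverse Gamma in (shape, rate) form; only E[1/x] = shape/rate is needed.
    Returns the posterior means E_q[v_k] (requires a > 0.5 and n_iter >= 1)."""
    K = len(y)
    inv_v = [1.0 / max(yk * yk, 1e-8) for yk in y]   # initial guesses for E[1/v_k]
    inv_z = [1.0] * K
    shape_v = [a + a_z + 0.5] * (K - 1) + [a + 0.5]  # v_K has no z_{K+1} neighbour
    shape_z = a + a_z
    rate_v = [1.0] * K
    for _ in range(n_iter):
        for k in range(K):
            # q(z_k): expectations of log psi_{k-1,k} and log psi_{k,k}
            rate = a * inv_v[k] + (a_z * inv_v[k - 1] if k > 0 else b0)
            inv_z[k] = shape_z / rate
        for k in range(K):
            # q(v_k): expectations of log psi_{k,k}, log psi_{k,k+1}, plus log N(y_k; 0, v_k)
            rate_v[k] = a * inv_z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:
                rate_v[k] += a_z * inv_z[k + 1]
            inv_v[k] = shape_v[k] / rate_v[k]
    # mean of an inverse Gamma (shape, rate) is rate / (shape - 1)
    return [rate_v[k] / (shape_v[k] - 1.0) for k in range(K)]
```

On a signal with a quiet half and a loud half, the returned variances follow the local energy. The chain coupling smooths them near the change.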
VB or Gibbs
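[Factor graph: $z_1, v_1, z_2, v_2, \ldots$ linked by the factors $\psi_{0,1}, \psi_{1,1}, \psi_{1,2}, \psi_{2,2}, \ldots$; each $v_k$ also has an observation factor $p(y_k \mid v_k)$]
• VB
$$q^{(\tau)}(z_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k-1,k} + \log\psi_{k,k} \big\rangle_{q^{(\tau)}(v_{k-1})\,q^{(\tau)}(v_k)}\Big)$$
• Gibbs
$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}\big(v_{k-1}^{(\tau)}\big)\,\psi_{k,k}\big(v_k^{(\tau)}\big)$$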
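Both Gibbs conditionals are conjugate, so a sweep reduces to inverse Gamma draws. In (shape, rate) form, with the rate being the reciprocal of the third $\mathcal{IG}$ argument used above:

- $z_k \mid v_{k-1}, v_k$ is inverse Gamma with shape $a + a_z$ and rate $a_z/v_{k-1} + a/v_k$;
- $v_k \mid z_k, z_{k+1}, y_k$ is inverse Gamma with shape $a + a_z + \tfrac12$ and rate $a/z_k + a_z/z_{k+1} + y_k^2/2$, dropping the $a_z$ terms for $k = K$.

The sketch assumes $y_k \mid v_k \sim \mathcal{N}(0, v_k)$ and a hypothetical prior $1/z_1 \sim \mathcal{G}(a_z)$ with rate `b0`. All names are illustrative.

```python
import random

def gibbs_igmc(y, a, a_z, b0=1.0, n_sweeps=500, burn_in=100, seed=0):
    """Gibbs sampler for v_k | z_k ~ IG(a, z_k/a), z_{k+1} | v_k ~ IG(a_z, v_k/a_z),
    y_k | v_k ~ N(0, v_k) and 1/z_1 ~ Gamma(shape a_z, rate b0) (the last two assumed).
    Returns Monte Carlo estimates of E[v_k | y] (requires n_sweeps > burn_in)."""
    rng = random.Random(seed)
    K = len(y)
    v = [max(yk * yk, 1e-8) for yk in y]
    z = [1.0] * K
    v_sum = [0.0] * K
    for sweep in range(n_sweeps):
        for k in range(K):
            # z_k | v_{k-1}, v_k: inverse Gamma (shape, rate); 1/z_k ~ Gamma(shape, scale 1/rate)
            rate = a / v[k] + (a_z / v[k - 1] if k > 0 else b0)
            z[k] = 1.0 / rng.gammavariate(a + a_z, 1.0 / rate)
        for k in range(K):
            # v_k | z_k, z_{k+1}, y_k: the Gaussian likelihood adds 1/2 to the shape and y_k^2/2 to the rate
            shape = a + 0.5 + (a_z if k + 1 < K else 0.0)
            rate = a / z[k] + 0.5 * y[k] ** 2 + (a_z / z[k + 1] if k + 1 < K else 0.0)
            v[k] = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        if sweep >= burn_in:
            for k in range(K):
                v_sum[k] += v[k]
    return [s / (n_sweeps - burn_in) for s in v_sum]
```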
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: log-magnitude time-frequency plots (colour scale −14 to 0; horizontal axis 20 to 120, vertical axis 50 to 500) of the noisy input X ("Noisy"), the original Xorg ("Original"), and four reconstructions labelled Xh (SNR1998), Xv (SNR2079), Xb (SNR1968) and Xg (SNR1997)]
Denoising - Music
[Figure: log-magnitude time-frequency plots (colour scale −18 to 0; horizontal axis 50 to 250, vertical axis 50 to 500) of the Original, the Noisy input, and reconstructions labelled PF (SNR853), Gibbs (SNR866) and VB (SNR208)]
“Tristram” (Matt Uelmen) + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal; ties variances across time (harmonic continuity)
• Source 2: vertical; ties variances across frequency (transients, percussive sounds)
[Figure: time (τ) by frequency bin (ν) panels Xorg, Shor, Sver]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai” (King Crimson) + Drums “Territory” (Sepultura) = Mix

                      s1                       s2
               SDR    SIR    SAR       SDR    SIR    SAR
VB             -474   -328   567       -158   1546   -137
Gibbs          -45    -262   457       105    1246   161
GibbsEM        -423   -242   482       134    1313   185
Pre-trained    -404   -315   813       356    1144   464
Oracle         614    1716   658       1266   1995   136

• Oracle: the square of the source coefficient is used as the latent variance estimate
• Pre-trained: the coupling parameters $a_z$ and $a$ are set to the best values trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet” (Anglagard) + “Moby Dick” (Led Zeppelin) = Mix

                      s1                       s2
               SDR    SIR    SAR       SDR    SIR    SAR
VB             -78    -622   453       -235   184    -225
Gibbs          -846   -753   693       -404   1459   -383
GibbsEM        -774   -619   462       -114   1662   -097
Pre-trained    -64    -539   695       38     1639   414
Oracle         121    329    1214      2113   3389   2137
Harmonic-Transient Decomposition
[Figure: time (τ) by frequency bin (ν) panels Xorg, Shor, Sver]
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Chord Detection - Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (0 to 4000 Hz)]
Chord Detection
[Figure: left, MDCT of a piano chord (MIDI notes 41, 48, 51, 56), time τ (s) from 0 to 2.5 against frequency 0 to 4000 Hz; right, $\log \sum_\nu v_{\nu,j,\tau}$ for MIDI notes $j$ from 40 to 65 over time τ (s)]
Multichannel Source Separation
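• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006), for sources $n = 1, \ldots, N$, channels $m = 1, \ldots, M$ and coefficients $k = 1, \ldots, K$:
$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$$
$$v_{k,n} \sim \mathcal{IG}\big(v_{k,n}; \nu/2, 2/(\nu\lambda_n)\big)$$
$$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$
$$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$$
$$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$$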
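A forward simulation makes the role of each level concrete. In the $\mathcal{IG}$ parametrisation above, $1/v_{k,n} \sim \mathcal{G}(\nu/2, 2/(\nu\lambda_n))$ has mean $1/\lambda_n$. So $\lambda_n$ sets the typical variance ("volume") of source $n$, and each $s_{k,n}$ is heavy-tailed.

The sketch below is plain Python. The mixing rows $a_m$ and noise variances $r_m$ are passed in as fixed inputs, because their priors are left unspecified above. All names are hypothetical.

```python
import math
import random

def sample_multichannel(K, A, r, a_lam, b_lam, nu, rng):
    """A[m][n]: mixing matrix with rows a_m^T; r[m]: channel noise variances (fixed inputs).
    lambda_n ~ G(a_lam, scale b_lam); v_kn ~ IG(nu/2, 2/(nu*lambda_n));
    s_kn ~ N(0, v_kn); x_km ~ N(a_m^T s_k, r_m). Returns (lambda, S, X)."""
    M, N = len(A), len(A[0])
    lam = [rng.gammavariate(a_lam, b_lam) for _ in range(N)]
    S, X = [], []
    for _ in range(K):
        # 1/v_kn ~ Gamma(nu/2, scale 2/(nu*lambda_n)), so E[1/v_kn] = 1/lambda_n
        v = [1.0 / rng.gammavariate(nu / 2, 2.0 / (nu * lam[n])) for n in range(N)]
        s = [rng.gauss(0.0, math.sqrt(v[n])) for n in range(N)]
        x = [rng.gauss(sum(A[m][n] * s[n] for n in range(N)), math.sqrt(r[m]))
             for m in range(M)]
        S.append(s)
        X.append(x)
    return lam, S, X
```

With $M < N$ (as in the test below), the mixture is underdetermined, which is the setting discussed under multimodality.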
Equivalent Gamma MRF
bull A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$
Source Separation
[Figure: spectrograms (time t in sec against frequency f in Hz, 0 to 10000) of the sources (Speech), (Piano), (Guitar) and of the (Mix)]
Reconstructions
[Figure: reconstructed spectrograms (time t in sec against frequency f in Hz, 0 to 10000) of (Speech), (Piano) and (Guitar)]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: sample traces of $a$, $\lambda$ and $r$ against epoch (0 to 2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets/detections/spectral features):
  – Determine the position of a performance on a score
  – Determine where a human listener would clap her hands
  – Create a quantized, human-readable score
  – …
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: lattice of states with bar position $m_k = 1, \ldots, 8$ (horizontal) and tempo level $n_k = 1, 2, 3$ (vertical); bar divisions marked for 3/4 time and 4/4 time]
• Each dot denotes a state $x = (m, n)$ (score position, tempo level)
• Directed arcs denote state transitions with positive probability
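The lattice and arcs above can be written down as a sparse transition table. The dynamics in this sketch are an assumption for illustration, not taken from the slide:

- in each frame the bar position advances by $n$ cells modulo $M$;
- the tempo level moves up or down by one with total probability `p_change`;
- a move that would leave $1, \ldots, n_{\max}$ is reflected back onto the current level.

```python
def bar_pointer_transition(M, n_tempi, p_change):
    """Map each state (m, n) to a list of ((m', n'), probability) pairs.
    Assumed dynamics: m' = (m + n - 1) % M + 1 (positions 1..M);
    n' = n +/- 1 with probability p_change / 2 each, otherwise n' = n."""
    trans = {}
    for m in range(1, M + 1):
        for n in range(1, n_tempi + 1):
            m_next = (m + n - 1) % M + 1
            moves = {n: 1.0 - p_change}
            for n_next in (n - 1, n + 1):
                if 1 <= n_next <= n_tempi:
                    moves[n_next] = moves.get(n_next, 0.0) + p_change / 2
                else:
                    moves[n] += p_change / 2   # reflect at the tempo boundaries
            trans[(m, n)] = [((m_next, n2), p) for n2, p in moves.items()]
    return trans
```

Each state has at most three successors, so one prediction step costs $O(M \cdot n_{\max})$ rather than $O((M \cdot n_{\max})^2)$.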
Bar position Pointer - transition model
[Figure: three examples of the transition distribution $p(x_2 \mid x_1)$ over bar position (1 to 8) and tempo level (1 to 5)]
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: $p(x_3)$ over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: $p(x_4)$ over bar position and tempo level]
Bar position Pointer - k = 5
[Figure: $p(x_5 \mid y_{1:5})$ over bar position and tempo level, after observing $y_5 = 0$]
Bar position Pointer - k = 10
[Figure: $p(x_{10})$ over bar position and tempo level]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson intensity
[Figure: intensity $\mu_k$ as a function of bar position $m_k$ (0 to 1000), for a Triplet Rhythm and for a Duplet Rhythm]
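Filtering in this HMM is the usual forward recursion:
$$p(x_k \mid y_{1:k}) \propto p(y_k \mid x_k) \sum_{x_{k-1}} p(x_k \mid x_{k-1})\, p(x_{k-1} \mid y_{1:k-1})$$

The sketch below is self-contained plain Python. It rests on hypothetical choices:
- a per-position intensity template `mu` playing the role of the rhythm curves above;
- position dynamics that advance by the tempo level;
- a tempo level that moves by at most one per frame, clipped to $1, \ldots, n_{\max}$.

```python
import math

def poisson_logpmf(y, mu):
    """log Poisson(y; mu) for integer y >= 0 and mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def bar_pointer_filter(ys, M, n_tempi, p_change, mu):
    """Forward filtering p(x_k | y_{1:k}) for states x = (m, n), m in 1..M, n in 1..n_tempi.
    Assumed: m' = (m + n - 1) % M + 1; n' in {n - 1, n, n + 1} with probabilities
    p_change/2, 1 - p_change, p_change/2 (clipped); y_k | m_k ~ Poisson(mu[m_k - 1]).
    Returns one dict {state: probability} per frame."""
    states = [(m, n) for m in range(1, M + 1) for n in range(1, n_tempi + 1)]
    belief = {s: 1.0 / len(states) for s in states}  # uniform p(x_0)
    history = []
    for y in ys:
        # prediction: push the current belief through the transition model
        predicted = dict.fromkeys(states, 0.0)
        for (m, n), p in belief.items():
            m_next = (m + n - 1) % M + 1
            for n_next, q in ((n, 1.0 - p_change), (n - 1, p_change / 2), (n + 1, p_change / 2)):
                n_clip = min(max(n_next, 1), n_tempi)
                predicted[(m_next, n_clip)] += p * q
        # update: weight by the Poisson likelihood and renormalise
        post = {s: pr * math.exp(poisson_logpmf(y, mu[s[0] - 1])) for s, pr in predicted.items()}
        total = sum(post.values())
        belief = {s: v / total for s, v in post.items()}
        history.append(belief)
    return history
```

Running the same recursion backwards and combining the two passes would give the smoothed posteriors $p(x_k \mid y_{1:K})$.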
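Tempo, Rhythm, Meter Analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Graphical model: tempo $n_k$, time signature $\theta_k$, bar position $m_k$ and rhythmic pattern $r_k$ for $k = 0, \ldots, 3$; intensities $\lambda_k$ and observations $y_k$ for $k = 1, 2, 3$]
• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator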
Filtering
[Figure: against frame index $k$ (up to 450): observed data $y_k$; $\log p(m_k \mid y_{1:k})$ over bar position $m_k$; $\log p(n_k \mid y_{1:k})$ over tempo (60 to 180 quarter notes per minute); $p(r_k \mid y_{1:k})$ for Triplets and Duplets]
Smoothing
[Figure: against frame index $k$ (up to 450): observed data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position $m_k$; $\log p(n_k \mid y_{1:K})$ over tempo (60 to 180 quarter notes per minute); $p(r_k \mid y_{1:K})$ for Triplets and Duplets]
Time Signature
[Figure: observed data (sample value against time in s, 0 to 12); against frame index $k$ (up to 500): $\log p(m_k \mid z_{1:K})$ over bar position $m_k$; $\log p(n_k \mid z_{1:K})$ over tempo (quarter notes per minute); $p(\theta_k \mid z_{1:K})$ for 4/4 and 3/4]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt and the audio signal $x_t$ over time $t$]
Score-Performance matching - Graphical Model
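[Graphical model: nodes $t_1, \ldots, t_K$ and $r_1, \ldots, r_K$, intensities $\lambda_1, \ldots, \lambda_K$, variances $v_{\nu,1}, \ldots, v_{\nu,K}$ and coefficients $s_{\nu,1}, \ldots, s_{\nu,K}$, with frequency bins $\nu = 1, \ldots, W$; inset: score positions 1 to 8 indexed by $r_k$]
$$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau};\, a,\, 1/(a\lambda\sigma_\nu(r_\tau))\big)$$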
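Because the inverse Gamma prior is conjugate to a Gaussian variance, $v_{\nu,\tau}$ can be integrated out in closed form. Write $\beta = a\lambda\sigma_\nu(r_\tau)$ for the rate, the reciprocal of the third $\mathcal{IG}$ argument. Then a coefficient $s \sim \mathcal{N}(0, v)$ is marginally Student-t with $2a$ degrees of freedom and squared scale $\lambda\sigma_\nu(r_\tau)$:
$$p(s) = \frac{\Gamma(a + \tfrac12)}{\Gamma(a)\sqrt{2\pi\beta}}\Big(1 + \frac{s^2}{2\beta}\Big)^{-(a + 1/2)}$$

The sketch below scores one MDCT frame against candidate score positions using this marginal. The spectral templates $\sigma_\nu(r)$, the default values of `a` and `lam`, and the lack of any temporal prior are assumptions made for illustration.

```python
import math

def log_marginal_coeff(s, a, lam, sigma):
    """log p(s) with v ~ IG(shape a, rate a*lam*sigma) integrated out of s | v ~ N(0, v)."""
    beta = a * lam * sigma
    return (math.lgamma(a + 0.5) - math.lgamma(a) - 0.5 * math.log(2 * math.pi * beta)
            - (a + 0.5) * math.log1p(s * s / (2 * beta)))

def frame_log_likelihood(frame, template, a, lam):
    """Sum over frequency bins of log p(s_nu | r) for one frame and one template sigma_nu(r)."""
    return sum(log_marginal_coeff(s, a, lam, sig) for s, sig in zip(frame, template))

def best_score_position(frame, templates, a=2.0, lam=1.0):
    """Return the candidate r whose (hypothetical) template best explains the frame."""
    scores = {r: frame_log_likelihood(frame, t, a, lam) for r, t in templates.items()}
    return max(scores, key=scores.get)
```

In a full matcher these per-frame scores would serve as the observation term of an HMM over score positions, rather than being maximised frame by frame.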
Score-Performance matching - Signal model
[Figure: $\log \sigma_\nu$ against frequency $\nu$ (0 to 4000 Hz)]
Score-Performance matching
[Figure: spectrogram data (time 0 to 14 s, frequency 0 to 4000 Hz) and the corresponding MIDI data (MIDI note 55 to 85 against score position)]
Online (filtering) or offline (smoothing) processing is possible.
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ per MIDI note number (60 to 80) over time (1 to 4 s); a panel labelled $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ showing $\log\lambda$ over time; MDCT of the audio (source: Daniel-Ben Pienaar), frequency 0 to 4000 Hz over time]
Summary
• “Time domain”: switching state space models
  – State space modeling
  – Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
  – Analysis down to sample precision (if required)
  – Computationally quite heavy
• “Transform domain”: Gamma fields
  – Models on (orthogonal) transform coefficients; energy compaction
  – Practical: can make use of fast transforms (FFT, MDCT, …)
  – Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
  – time-frequency energy distributions
• Ongoing work
  – Comparison of inference methods (VB, MCMC, SMC)
  – Learning
  – Applications
    ∗ Chord detection, polyphonic transcription
    ∗ Musical score guided source separation
  – Prior structures for other observation models: NMF
Models for time-frequency energy distributions
• Non-negative matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; …)
$$X_{\nu,\tau} = \sum_j W_{\nu,j} S_{j,\tau}$$
Spectrogram = Spectral Templates × Excitations
  – however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
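Under a Poisson observation model, maximum-likelihood NMF is equivalent to minimising the generalised Kullback-Leibler divergence $D(X \,\|\, WS)$. The sketch below uses the standard Lee-Seung multiplicative updates for that divergence. This is a well-known technique, not necessarily the procedure used in the works cited above.

The matrix layout (lists of rows), the random initialisation and the names are assumptions of this example. All-zero rows or columns in $X$ can drive a normaliser to zero and are not handled.

```python
import random

def nmf_kl(X, J, n_iter=200, seed=0, eps=1e-12):
    """Approximate non-negative X (F x T, list of rows) by W S with W (F x J) and S (J x T)
    non-negative, using multiplicative updates for the generalised KL divergence."""
    rng = random.Random(seed)
    F, T = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(J)] for _ in range(F)]
    S = [[rng.random() + 0.1 for _ in range(T)] for _ in range(J)]

    def reconstruct():
        # current model V = W S (eps guards against division by zero)
        return [[sum(W[f][j] * S[j][t] for j in range(J)) + eps for t in range(T)]
                for f in range(F)]

    for _ in range(n_iter):
        V = reconstruct()
        for j in range(J):  # S_jt <- S_jt * (sum_f W_fj X_ft / V_ft) / (sum_f W_fj)
            w_sum = sum(W[f][j] for f in range(F))
            for t in range(T):
                S[j][t] *= sum(W[f][j] * X[f][t] / V[f][t] for f in range(F)) / w_sum
        V = reconstruct()
        for j in range(J):  # W_fj <- W_fj * (sum_t S_jt X_ft / V_ft) / (sum_t S_jt)
            s_sum = sum(S[j][t] for t in range(T))
            for f in range(F):
                W[f][j] *= sum(S[j][t] * X[f][t] / V[f][t] for t in range(T)) / s_sum
    return W, S
```

Each multiplicative step keeps the factors non-negative and does not increase the divergence.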
Models for time-frequency energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; …)
$$X_{\nu,\tau} = [r_{\nu,\tau} = 0]\, S^{(0)}_{\nu,\tau} + [r_{\nu,\tau} = 1]\, S^{(1)}_{\nu,\tau}$$
Spectrogram = Mask × Source 0 + (1 − Mask) × Source 1
  – however, sources do overlap in time and frequency
Prior structures on time-frequency energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k = k(\nu, \tau)$
• We place a suitable prior on the variances of the transform coefficients $s_k$, and tie the prior variances across harmonically and temporally related time-frequency atoms:
$$p(s \mid v)\,p(v) = \Big(\prod_k p(s_k \mid v_k)\Big)\, p(v)$$
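One channel source separation: Gaussian source model
[Graphical model: $v_{k,1}, \ldots, v_{k,N} \to s_{k,1}, \ldots, s_{k,N} \to x_k$, for $k = 1, \ldots, K$]
$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n}), \qquad x_k = \sum_{n=1}^{N} s_{k,n}$$
• A straightforward application of Bayes' theorem yields
$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n};\, \kappa_{k,n} x_k,\, v_{k,n}(1 - \kappa_{k,n})\big), \qquad \kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \quad \text{(responsibilities)}$$
• Each source coefficient $s_{k,n}$ receives a fraction $\kappa_{k,n}$ of the observation $x_k$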
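The posterior above is cheap to evaluate coefficient by coefficient. It is the familiar Wiener-filter split, and because the responsibilities sum to one, the posterior means add back up to $x_k$. A minimal sketch follows; the function name is illustrative.

```python
def gaussian_split(x_k, v_k):
    """Posterior of each s_kn given x_k = sum_n s_kn with s_kn ~ N(0, v_kn):
    mean kappa_kn * x_k and variance v_kn * (1 - kappa_kn)."""
    total = sum(v_k)
    kappa = [v / total for v in v_k]                           # responsibilities
    means = [k * x_k for k in kappa]
    variances = [v * (1.0 - k) for v, k in zip(v_k, kappa)]
    return kappa, means, variances
```

For two sources the posterior variance of either one is $v_1 v_2/(v_1 + v_2)$. For example, with $x_k = 3$ and $v_{k,1:2} = (1, 2)$, the means are $(1, 2)$ and both variances equal $2/3$.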
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 39
One channel source separation Poisson source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim PO(skn vkn)
xk|sk1N =sumN
n=1 skn
bull This is the generative model for the NMF when we write
vk(ντ)n = tνn times eτn (Templatetimes Excitation)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40
Gamma G(x a b) and Inverse Gamma IG(x a b)
0 1 2 3 4 50
02
04
06
08
1
12a = 09 b =1
a = 1 b =1
a = 13 b =1
a = 2 b =1
x
p(x)
0 1 2 3 4 50
02
04
06
08
1
12
14
a=1 b=1
a=1 b=05
a=2 b=1
G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))
IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))
bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale
bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1 K as follows
vk|zk sim IG(vk a zka)
zk+1|vk sim IG(zk+1 az vkaz)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
bull Variance variables v are priors for sources
bull Auxillary variables z are needed for conjugacy and positive correlation
bull Shape parameters a and az describe coupling strength and drift of the chain
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42
Gamma Chains typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43
Gamma Chains with changepoints typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44
Gamma Chains
bull The joint can be written as product of singleton and pairwise
potentials of form
ψkk = exp(minusazminus1k vminus1
k ) (Pairwise)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
φzk = exp((az + a+ 1) log zminus1
k ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45
Gamma Fields
bull The joint can be written as product of singleton and pairwise
potentials
ψij = exp(minusaijξminus1i ξminus1
j ) (Pairwise)
φi = exp((sum
j
aij + 1) log ξminus1i ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46
Possible Model Topologies
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47
Approximate Inference
bull Stochastic
ndash Markov Chain Monte Carlo Gibbs sampler
ndash Sequential Monte Carlo Particle Filtering
bull Deterministic
ndash Variational Bayes
In all these conjugacy helps
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))
bull Gibbs
v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z
(τ)k )ψkk+1(z
(τ)k+1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
bull Inference Variational Bayes
Noisy Original
X
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xorg
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xh SNR1998
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xv SNR2079
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xb SNR1968
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xg SNR1997
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51
Denoising ndash MusicOriginal
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Noisy
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
PF SNR853
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Gibbs SNR866
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
VB SNR208
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52
Single Channel Source Separation (with Onur Dikmen)
bull Source 1 Horizontal Tie across time harmonic continuity
bull Source 2 Vertical Tie across frequency transients percussive sounds
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53
Single Channel Source Separation with IGMCs
E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Preminus trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
bull Oracle We use the square of the source coefficient as the latent variance estimate
bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
[Figure: smoothing results versus frame index $k$ (0 to 450). Panels: observed data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position $m_k$; $\log p(n_k \mid y_{1:K})$ over tempo (quarter notes per minute, 60 to 180); $p(r_k \mid y_{1:K})$ for Triplets versus Duplets]
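Smoothing conditions on the whole sequence $y_{1:K}$ by adding a backward pass to the filter. The scaled forward-backward sketch below (hypothetical helper `forward_backward`) works for any discrete HMM given per-frame likelihoods; it is the standard textbook construction, shown for illustration rather than as the authors' implementation.

```python
import numpy as np


def forward_backward(lik, T, prior):
    """Filtered and smoothed marginals of a discrete HMM.

    lik   : lik[k, s] proportional to p(y_k | x_k = s), shape (K, S)
    T     : T[i, j] = p(x_{k+1} = j | x_k = i), shape (S, S)
    prior : p(x_1), shape (S,)
    Returns (filtered, smoothed): p(x_k | y_{1:k}) and p(x_k | y_{1:K}).
    """
    K, S = lik.shape
    alpha = np.zeros((K, S))
    c = np.zeros(K)                          # per-frame normalisers
    a = prior * lik[0]
    c[0] = a.sum()
    alpha[0] = a / c[0]
    for k in range(1, K):                    # forward (filtering) pass
        a = (alpha[k - 1] @ T) * lik[k]
        c[k] = a.sum()
        alpha[k] = a / c[k]
    beta = np.ones((K, S))
    for k in range(K - 2, -1, -1):           # backward pass with the same scaling
        beta[k] = T @ (lik[k + 1] * beta[k + 1]) / c[k + 1]
    gamma = alpha * beta
    return alpha, gamma / gamma.sum(axis=1, keepdims=True)
```

The last smoothed marginal always equals the last filtered one; the difference shows up earlier in the sequence, where later evidence can revise, for example, a triplet/duplet decision.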
Time Signature
[Figure: time signature inference. Panels: observed audio (sample value versus time, 0 to 12 s); $\log p(m_k \mid z_{1:K})$ over bar position $m_k$; $\log p(n_k \mid z_{1:K})$ over tempo (quarter notes per minute, 52 to 155); $p(\theta_k \mid z_{1:K})$ for 4/4 versus 3/4; horizontal axis frame index $k$ (0 to 500)]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt and audio signal $x_t$ versus time $t$]
Score-Performance matching - Graphical Model
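[Figure: graphical model with nodes $t_1, \dots, t_K$, score positions $r_1, \dots, r_K$, volumes $\lambda_1, \dots, \lambda_K$, variances $v_{\nu 1}, \dots, v_{\nu K}$ and coefficients $s_{\nu 1}, \dots, s_{\nu K}$, in a plate over frequency bins $\nu = 1, \dots, W$; inset: score notes labelled 1 to 8 and the pointer $r_k$]
$$v_{\nu\tau} \sim \mathcal{IG}\big(v_{\nu\tau};\ a,\ 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$$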
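Because the inverse Gamma prior is conjugate, each variance $v_{\nu\tau}$ can be integrated out, and a frame can be scored against a candidate score position in closed form (each coefficient becomes Student-t). Under the slide's convention, $\mathcal{IG}(v; a, 1/(a\lambda\sigma_\nu))$ means the precision $1/v$ is Gamma with shape $a$ and rate $\beta_\nu = a\lambda\sigma_\nu(r)$, so $\mathbb{E}[1/v] = 1/(\lambda\sigma_\nu(r))$. A sketch, assuming the templates $\sigma_\nu(r)$ are available as arrays; `frame_log_marginal` and `best_score_position` are hypothetical names.

```python
import numpy as np
from math import lgamma, log, pi


def frame_log_marginal(s, sigma, lam, a):
    """log p(s_{1:W} | r) for one MDCT frame with the variances integrated out.

    s_nu | v_nu ~ N(0, v_nu) and 1/v_nu ~ Gamma(shape a, rate a * lam * sigma_nu(r)).
    s, sigma : coefficients of the frame and template of position r, shape (W,).
    """
    beta = a * lam * np.asarray(sigma, dtype=float)
    s = np.asarray(s, dtype=float)
    const = lgamma(a + 0.5) - lgamma(a) - 0.5 * log(2.0 * pi)
    return float(np.sum(const + a * np.log(beta) - (a + 0.5) * np.log(beta + 0.5 * s ** 2)))


def best_score_position(s, templates, lam, a):
    """Index r maximising p(s | r) over candidate templates (rows of `templates`)."""
    scores = [frame_log_marginal(s, sig, lam, a) for sig in templates]
    return int(np.argmax(scores)), scores
```

In a full matcher these per-frame scores would serve as the observation likelihoods of an HMM over score positions, rather than being maximised frame by frame.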
Score-Performance matching - Signal model
[Figure: spectral template $\log \sigma_\nu$ (−12 to 0) versus frequency $\nu$ (0 to 4000 Hz)]
Score-Performance matching
[Figure: Spectrogram Data (frequency 0 to 4000 Hz versus time 0 to 14 s) and MIDI Data (MIDI note 55 to 85 versus score position)]
Online (filtering) or offline (smoothing) processing is possible.
Transcription
[Figure: transcription panels versus time (1 to 4 s): $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80); $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ on a $\log\lambda$ axis (−10 to 10); MDCT of the audio (source: Daniel-Ben Pienaar), frequency 0 to 4000 Hz]
Summary
• “Time Domain”: Switching State Space Models
– State space modeling
– Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform Domain”: Gamma Fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical-score-guided source separation
– Prior structures for other observation models: NMF
Models for time-frequency Energy distributions
• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005)
$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$
Spectrogram = Mask × Source0 + (1 − Mask) × Source1
– However, sources do overlap in time and frequency
Prior structures on time-frequency Energy distributions
• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k = (\nu, \tau)$
• We place a suitable prior on the variances of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms:
$$p(s \mid v)\, p(v) = \Big(\prod_k p(s_k \mid v_k)\Big)\, p(v)$$
One channel source separation: Gaussian source model
[Figure: graphical model; variances $v_{k1}, \dots, v_{kN}$ → sources $s_{k1}, \dots, s_{kN}$ → observation $x_k$, in a plate over $k = 1, \dots, K$]
$$s_{kn} \mid v_{kn} \sim \mathcal{N}(s_{kn}; 0, v_{kn}), \qquad x_k = \sum_{n=1}^{N} s_{kn}$$
• Straightforward application of Bayes’ theorem yields
$$p(s_{kn} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{kn};\ \kappa_{kn} x_k,\ v_{kn}(1 - \kappa_{kn})\big), \qquad \kappa_{kn} = \frac{v_{kn}}{\sum_{n'} v_{kn'}} \quad \text{(responsibilities)}$$
• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$
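Given the variances, the posterior above is a Wiener-style soft mask: the observation is shared out in proportion to each source's prior variance, and the shares always add back up to $x_k$. A minimal sketch for K atoms and N sources; the helper name is hypothetical.

```python
import numpy as np


def gaussian_source_posterior(x, v):
    """Exact posterior of s_{kn} ~ N(0, v_{kn}) given x_k = sum_n s_{kn}.

    x : observations, shape (K,);  v : prior variances, shape (K, N).
    Returns posterior means kappa * x, posterior variances v * (1 - kappa), and kappa.
    """
    kappa = v / v.sum(axis=1, keepdims=True)      # responsibilities, each row sums to 1
    mean = kappa * x[:, None]                     # each source takes its share of x_k
    var = v * (1.0 - kappa)
    return mean, var, kappa
```

With $v = (1, 3)$ and $x = 1$, for example, the two sources receive posterior means 0.25 and 0.75, each with posterior variance 0.75.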
One channel source separation: Poisson source model
[Figure: graphical model; variances $v_{k1}, \dots, v_{kN}$ → sources $s_{k1}, \dots, s_{kN}$ → observation $x_k$, in a plate over $k = 1, \dots, K$]
$$s_{kn} \mid v_{kn} \sim \mathcal{PO}(s_{kn}; v_{kn}), \qquad x_k = \sum_{n=1}^{N} s_{kn}$$
• This is the generative model for NMF when we write $v_{k(\nu,\tau),n} = t_{\nu n} \times e_{\tau n}$ (Template × Excitation)
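For this Poisson model, maximum-likelihood templates and excitations can be obtained with the standard multiplicative updates for the generalised KL divergence (Lee and Seung), which coincide with EM for this model. Given a fit, the conditional mean of each source again splits the observation by responsibilities, $\mathbb{E}[s_{kn} \mid x_k] = \kappa_{kn} x_k$. The sketch below swaps in that standard algorithm for illustration; function names, initialisation and iteration counts are assumptions.

```python
import numpy as np


def kl_nmf(X, N, n_iter=200, seed=0, eps=1e-12):
    """Fit X[nu, tau] ~ sum_n t[nu, n] * e[tau, n] with multiplicative updates
    for the generalised KL divergence (ML under x ~ Poisson(sum_n v_n))."""
    rng = np.random.default_rng(seed)
    F, Tau = X.shape
    t = rng.uniform(0.5, 1.5, size=(F, N))      # templates t_{nu n}
    e = rng.uniform(0.5, 1.5, size=(Tau, N))    # excitations e_{tau n}
    for _ in range(n_iter):
        V = t @ e.T + eps
        t *= ((X / V) @ e) / (e.sum(axis=0) + eps)
        V = t @ e.T + eps
        e *= ((X / V).T @ t) / (t.sum(axis=0) + eps)
    return t, e


def kl_div(X, V):
    """Generalised KL divergence D(X || V), with 0 * log 0 taken as 0."""
    ratio = np.where(X > 0, X, 1.0) / V
    return float(np.sum(X * np.log(ratio) - X + V))


def expected_sources(X, t, e):
    """E[s_{nu tau n} | x_{nu tau}] = kappa_{nu tau n} * x_{nu tau} (Poisson split)."""
    V = t @ e.T
    kappa = t[:, None, :] * e[None, :, :] / V[:, :, None]
    return kappa * X[:, :, None]
```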
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: densities $p(x)$ for $x$ from 0 to 5; left panel: $a = 0.9, 1, 1.3, 2$ with $b = 1$; right panel: $(a, b) = (1, 1), (1, 0.5), (2, 1)$]
$$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$$
$$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$$
• Gamma: conjugate prior for Gaussian precision, Poisson intensity and inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
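In these definitions $z$ is a scale parameter: $\mathcal{G}(x; a, z)$ has mean $az$, and $\mathcal{IG}(x; a, z)$ is the density of $x$ when $1/x \sim \mathcal{G}(a, z)$. The sketch below writes the two log-densities exactly as above using only NumPy and `math.lgamma`; the function names are illustrative.

```python
import numpy as np
from math import lgamma


def log_gamma_pdf(x, a, z):
    """log G(x; a, z): shape a, scale z (so the mean is a * z)."""
    x = np.asarray(x, dtype=float)
    return (a - 1) * np.log(x) - x / z + a * np.log(1.0 / z) - lgamma(a)


def log_invgamma_pdf(x, a, z):
    """log IG(x; a, z): the density of x when 1/x ~ G(a, z)."""
    x = np.asarray(x, dtype=float)
    return (a + 1) * np.log(1.0 / x) - 1.0 / (z * x) + a * np.log(1.0 / z) - lgamma(a)
```

The change of variables $x \mapsto 1/x$ contributes the Jacobian $x^{-2}$, which is exactly why the exponent on $\log x^{-1}$ is $a + 1$ in $\mathcal{IG}$ but $-(a - 1)$ in $\mathcal{G}$.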
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:
$$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a), \qquad z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$$
[Figure: chain $z_1, \dots, v_{k-1}, z_k, v_k, z_{k+1}, \dots$ with coupling $a_z$ between $v_{k-1}$ and $z_k$, $a$ between $z_k$ and $v_k$, and $a_z$ between $v_k$ and $z_{k+1}$]
• Variance variables $v$ are the prior variances of the sources
• Auxiliary variables $z$ are needed for conjugacy and positive correlation
• Shape parameters $a$ and $a_z$ describe the coupling strength and drift of the chain
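Sample paths like the typical draws shown in this section can be produced by ancestral sampling. Under the definition above, $\mathbb{E}[1/v_k \mid z_k] = z_k$ and $\mathbb{E}[1/z_{k+1} \mid v_k] = v_k$, which is what makes neighbouring variances positively correlated. This is a minimal sketch; the fixed starting value $z_1 = 1$ and the function name are assumptions.

```python
import numpy as np


def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Ancestral sampling of the inverse Gamma-Markov chain
        v_k | z_k ~ IG(v_k; a, z_k / a),   z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z),
    where IG(x; a, z) means 1/x ~ Gamma(shape a, scale z).
    z1 is an assumed fixed starting value for z_1."""
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = np.empty(K + 1)      # z[0] holds z_1, ..., z[K] holds z_{K+1}
    z[0] = z1
    for k in range(K):
        v[k] = 1.0 / rng.gamma(shape=a, scale=z[k] / a)
        z[k + 1] = 1.0 / rng.gamma(shape=a_z, scale=v[k] / a_z)
    return v, z
```

For example, `v, _ = sample_igmc(1000, a=10.0, a_z=40.0)` gives a visibly smoother $\log v_k$ path than `a_z=4.0`, because a larger shape parameter makes each step of the chain less noisy.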
Gamma Chains: typical draws
[Figure: three sample paths of $\log v_k$ (−20 to 20) for $k = 1, \dots, 1000$, with $a = 10$ and $a_z = 10$, $4$ and $40$]
Gamma Chains with changepoints: typical draws
[Figure: three sample paths of $\log v_k$ (−20 to 20) for $k = 1, \dots, 1000$, with $a = 10$ and $a_z = 10$, $4$ and $40$]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1}) \quad \text{(pairwise)}$$
[Figure: chain $z_1, \dots, v_{k-1}, z_k, v_k, z_{k+1}, \dots$ with couplings $a_z$, $a$, $a_z$]
$$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \quad \text{(singletons)}$$
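Where these potentials come from can be checked by expanding the two conditionals that involve $z_k$, with $\mathcal{IG}(x; a, z)$ as defined on the Gamma slide, and collecting terms (constants dropped). A worked derivation:

```latex
\begin{aligned}
\log \mathcal{IG}(z_k; a_z, v_{k-1}/a_z)
  &= (a_z + 1)\log z_k^{-1} - a_z\, v_{k-1}^{-1} z_k^{-1} + a_z \log v_{k-1}^{-1} + \text{const} \\
\log \mathcal{IG}(v_k; a, z_k/a)
  &= (a + 1)\log v_k^{-1} - a\, z_k^{-1} v_k^{-1} + a \log z_k^{-1} + \text{const} \\
\Rightarrow\quad \log \phi_{z_k} &= (a_z + a + 1)\log z_k^{-1}, \qquad
\log \psi_{k-1,k} = -a_z\, v_{k-1}^{-1} z_k^{-1}, \qquad
\log \psi_{k,k} = -a\, z_k^{-1} v_k^{-1}
\end{aligned}
```

The same bookkeeping for $v_k$ gives $\phi_{v_k} = \exp\big((a + a_z + 1)\log v_k^{-1}\big)$, so every variable in the chain has a singleton of the same form.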
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}) \quad \text{(pairwise)}$$
$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \quad \text{(singletons)}$$
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov Chain Monte Carlo: Gibbs sampler
– Sequential Monte Carlo: particle filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps.
VB or Gibbs
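[Figure: factor graph along the chain $\psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, \dots$ with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(v_k) \propto \exp\Big(\log\phi_{v_k}(v_k) + \log p(y_k \mid v_k) + \big\langle \log\psi_{k,k} + \log\psi_{k,k+1} \big\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\Big)$$
• Gibbs
$$v_k^{(\tau)} \sim p(v_k \mid z_k^{(\tau)}, z_{k+1}^{(\tau)}, y_k) \propto p(y_k \mid v_k)\, \phi_{v_k}(v_k)\, \psi_{k,k}(z_k^{(\tau)}, v_k)\, \psi_{k,k+1}(v_k, z_{k+1}^{(\tau)})$$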
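With a concrete observation model these updates have closed form. Assuming, for illustration, $y_k \mid v_k \sim \mathcal{N}(y_k; 0, v_k)$ (the slide leaves $p(y_k \mid v_k)$ generic) and $\phi_{v_k} = \exp\big((a + a_z + 1)\log v_k^{-1}\big)$, collecting the terms in $v_k$ gives an inverse Gamma full conditional:

```latex
\begin{aligned}
\log p(v_k \mid z_k, z_{k+1}, y_k)
  &= \underbrace{\tfrac12 \log v_k^{-1} - \tfrac{y_k^2}{2}\, v_k^{-1}}_{\log p(y_k \mid v_k)}
   + \underbrace{(a + a_z + 1)\log v_k^{-1}}_{\log \phi_{v_k}}
   - a\, z_k^{-1} v_k^{-1} - a_z\, z_{k+1}^{-1} v_k^{-1} + \text{const} \\
\Rightarrow\quad v_k \mid z_k, z_{k+1}, y_k
  &\sim \mathcal{IG}\Big(v_k;\ a + a_z + \tfrac12,\ \big(a\, z_k^{-1} + a_z\, z_{k+1}^{-1} + \tfrac{y_k^2}{2}\big)^{-1}\Big)
\end{aligned}
```

The VB factor $q(v_k)$ has the same form with $z_k^{-1}$ and $z_{k+1}^{-1}$ replaced by their expectations; for $q(z) = \mathcal{IG}(z; \alpha, \zeta)$ one has $\langle z^{-1} \rangle = \alpha\zeta$.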
VB or Gibbs
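[Figure: factor graph along the chain $\psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, \dots$ with likelihood factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$$q^{(\tau)}(z_k) \propto \exp\Big(\log\phi_{z_k}(z_k) + \big\langle \log\psi_{k-1,k} + \log\psi_{k,k} \big\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\Big)$$
• Gibbs
$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}^{(\tau)}, v_k^{(\tau)}) \propto \phi_{z_k}(z_k)\, \psi_{k-1,k}(v_{k-1}^{(\tau)}, z_k)\, \psi_{k,k}(z_k, v_k^{(\tau)})$$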
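Alternating the two full conditionals gives a single-site Gibbs sampler for the whole chain. This is a sketch under an assumed Gaussian observation model $y_k \mid v_k \sim \mathcal{N}(0, v_k)$, with an assumed $p(z_1) \propto 1/z_1$ at the left end; the name `gibbs_igmc`, the initialisation and the sweep counts are illustrative choices, not the authors' settings.

```python
import numpy as np


def gibbs_igmc(y, a, a_z, n_sweeps=300, burn_in=100, seed=0):
    """Gibbs sampler for an inverse Gamma-Markov chain of variances with an
    assumed observation model y_k | v_k ~ N(0, v_k).

    Full conditionals (IG(shape, rate) here means 1/x ~ Gamma(shape, rate)):
      z_k | v_{k-1}, v_k    ~ IG(a_z + a,        a_z / v_{k-1} + a / v_k)
      v_k | z_k, z_{k+1}, y ~ IG(a + a_z + 1/2,  a / z_k + a_z / z_{k+1} + y_k^2 / 2)
    Terms of a missing neighbour are dropped at the ends (for z_1 this
    corresponds to the assumed prior p(z_1) proportional to 1/z_1).
    Returns the posterior mean of v estimated from the sweeps after burn-in.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    K = y.size
    # Start from a locally smoothed y^2 (an implementation choice, not part of the model).
    w = min(9, K)
    v = np.convolve(y ** 2, np.ones(w) / w, mode="same") + 1e-8
    z = 1.0 / v
    v_sum = np.zeros(K)
    for sweep in range(n_sweeps):
        for k in range(K):                      # z-updates
            shape, rate = a, a / v[k]
            if k > 0:
                shape, rate = shape + a_z, rate + a_z / v[k - 1]
            z[k] = 1.0 / rng.gamma(shape, 1.0 / rate)
        for k in range(K):                      # v-updates
            shape, rate = a + 0.5, a / z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:
                shape, rate = shape + a_z, rate + a_z / z[k + 1]
            v[k] = 1.0 / rng.gamma(shape, 1.0 / rate)
        if sweep >= burn_in:
            v_sum += v
    return v_sum / (n_sweeps - burn_in)
```

A VB version would have the same structure, replacing each sampled $1/z$ or $1/v$ by its current expectation under $q$.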
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: log spectrograms (color scale −14 to 0): Noisy ($X$), Original ($X_{org}$), and estimates $X_h$ (SNR 19.98), $X_v$ (SNR 20.79), $X_b$ (SNR 19.68), $X_g$ (SNR 19.97)]
Denoising – Music
[Figure: spectrograms (color scale −18 to 0): Original, Noisy, and reconstructions by PF (SNR 8.53), Gibbs (SNR 8.66) and VB (SNR 2.08)]
“Tristram (Matt Uelmen)” + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal tie across time (harmonic continuity)
• Source 2: vertical tie across frequency (transients, percussive sounds)
[Figure: original spectrogram $X_{org}$ and the two estimated components $S_{hor}$ and $S_{ver}$; axes Time ($\tau$) and Frequency Bin ($\nu$)]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix (SDR, SIR, SAR in dB)

              |          s1           |          s2
Method        |  SDR     SIR     SAR  |  SDR     SIR     SAR
VB            | -4.74   -3.28    5.67 | -1.58   15.46   -1.37
Gibbs         | -4.5    -2.62    4.57 |  1.05   12.46    1.61
GibbsEM       | -4.23   -2.42    4.82 |  1.34   13.13    1.85
Pre-trained   | -4.04   -3.15    8.13 |  3.56   11.44    4.64
Oracle        |  6.14   17.16    6.58 | 12.66   19.95   13.6

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix (SDR, SIR, SAR in dB)

              |          s1           |          s2
Method        |  SDR     SIR     SAR  |  SDR     SIR     SAR
VB            | -7.8    -6.22    4.53 | -2.35   18.4    -2.25
Gibbs         | -8.46   -7.53    6.93 | -4.04   14.59   -3.83
GibbsEM       | -7.74   -6.19    4.62 | -1.14   16.62   -0.97
Pre-trained   | -6.4    -5.39    6.95 |  3.8    16.39    4.14
Oracle        | 12.1    32.9    12.14 | 21.13   33.89   21.37
Harmonic-Transient Decomposition
[Figure: original spectrogram $X_{org}$ with its horizontal ($S_{hor}$) and vertical ($S_{ver}$) components; axes Time ($\tau$) and Frequency Bin ($\nu$)]
Chord Detection - Signal model (with Paul Peeling)
[Figure: spectral template $\log \sigma_\nu$ (−12 to 0) versus frequency $\nu$ (0 to 4000 Hz)]
Chord Detection
[Figure: MDCT of a piano chord (MIDI notes 41, 48, 51, 56), frequency 0 to 4000 Hz versus time $\tau$ (0.5 to 2.5 s); estimated $\log \sum_\nu v_{\nu j \tau}$ per MIDI note $j$ (40 to 65) versus time $\tau$]
Multichannel Source Separation
• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$$
$$v_{kn} \sim \mathcal{IG}\big(v_{kn}; \nu/2, 2/(\nu\lambda_n)\big)$$
$$s_{kn} \sim \mathcal{N}(s_{kn}; 0, v_{kn})$$
$$x_{km} \sim \mathcal{N}(x_{km}; a_m^\top s_{k,1:N}, r_m)$$
$$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$$
[Figure: graphical model over sources $n = 1, \dots, N$, channels $m = 1, \dots, M$, and a plate over $k = 1, \dots, K$]
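Ancestral sampling from this hierarchy makes the role of $\lambda_n$ concrete: since $\mathbb{E}[1/v_{kn} \mid \lambda_n] = (\nu/2)\cdot 2/(\nu\lambda_n) = 1/\lambda_n$, the parameter $\lambda_n$ sets the typical variance, i.e. the overall volume, of source $n$. The priors on $a_m$ and $r_m$ are elided on the slide, so the sketch below substitutes placeholders ($a_m \sim \mathcal{N}(0, I)$, fixed $r_m$); the function name and all defaults are assumptions.

```python
import numpy as np


def sample_multichannel(K, N, M, nu=4.0, a_lam=2.0, b_lam=1.0, noise_var=1e-3, seed=0):
    """Ancestral sample from the hierarchical prior.

    lambda_n ~ G(a_lam, b_lam)               (b_lam used as a scale)
    v_{kn}   ~ IG(nu/2, 2/(nu*lambda_n))    (1/v is Gamma with that shape and scale)
    s_{kn}   ~ N(0, v_{kn})
    x_{km}   ~ N(a_m^T s_{k,1:N}, r_m)
    Placeholder assumptions: a_m ~ N(0, I) and r_m = noise_var for every channel.
    """
    rng = np.random.default_rng(seed)
    lam = rng.gamma(shape=a_lam, scale=b_lam, size=N)
    v = 1.0 / rng.gamma(shape=nu / 2.0, scale=2.0 / (nu * lam), size=(K, N))
    s = rng.normal(0.0, np.sqrt(v))
    A = rng.normal(size=(M, N))                       # row m is the mixing vector a_m
    r = np.full(M, noise_var)
    x = s @ A.T + rng.normal(0.0, np.sqrt(r), size=(K, M))
    return x, s, v, lam, A
```

With $M < N$ (for example two channels and three sources) the mixture is underdetermined, which is the setting in which the posterior over sources becomes multimodal.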
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$
Source Separation
[Figure: spectrograms, frequency $f$ (0 to 10000 Hz) versus time: (Speech), (Piano), (Guitar), (Mix)]
Reconstructions
[Figure: reconstructed spectrograms, frequency $f$ (0 to 10000 Hz) versus time: (Speech), (Piano), (Guitar)]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: sample traces of $a$, $\lambda$ and $r$ versus epoch (0 to 2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features):
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– …
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: grid of states, tempo level $n_k \in \{1, 2, 3\}$ versus bar position $m_k \in \{1, \dots, 8\}$, with bar boundaries marked for 3/4 time and 4/4 time]
• Each dot denotes a state $x = (m, n)$ (score position, tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three examples of the transition distribution $p(x_2 \mid x_1)$ over bar position (1 to 8) and tempo level (1 to 5)]
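A transition matrix with this structure can be assembled directly from the state grid. The dynamics below (the position advances by the tempo level, and the tempo takes a random-walk step of at most one level) are an illustrative assumption in the spirit of the figure, not the published model's exact parameterisation. Positions are 0-based here, whereas the figure counts from 1, and `bar_pointer_transition` is a hypothetical name.

```python
import numpy as np


def bar_pointer_transition(M, N, p_change=0.1):
    """Transition matrix over discretised bar-pointer states x = (m, n).

    m in {0, ..., M-1} is the position in the bar, n in {1, ..., N} the tempo level.
    Assumed dynamics (illustrative only): m_{k+1} = (m_k + n_k) mod M, and the tempo
    level moves down or up by one with probability p_change each, clipped to [1, N].
    State (m, n) is stored at index m * N + (n - 1).
    """
    def index(m, n):
        return m * N + (n - 1)

    T = np.zeros((M * N, M * N))
    for m in range(M):
        for n in range(1, N + 1):
            m_next = (m + n) % M
            for dn, p in ((-1, p_change), (0, 1.0 - 2.0 * p_change), (1, p_change)):
                n_next = min(max(n + dn, 1), N)
                T[index(m, n), index(m_next, n_next)] += p
    return T


# One-step distribution p(x_2 | x_1) from x_1 = (m=0, n=2), as a (bar position, tempo) grid.
T = bar_pointer_transition(M=8, N=5)
p1 = np.zeros(8 * 5)
p1[0 * 5 + (2 - 1)] = 1.0
p2 = (p1 @ T).reshape(8, 5)
```

Propagating `p1` repeatedly through `T` produces the spreading predictive densities $p(x_k)$ shown in the following panels; the sparsity of `T` is what keeps exact filtering cheap.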
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over the state space; horizontal axis Bar Position, vertical axis Tempo Level]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over the state space; horizontal axis Bar Position, vertical axis Tempo Level]
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 39
Prior structures on time-frequency Energy distributions
bull Main Idea Spectrogram is a point estimate of the energy at a
time-frequency atom k(ν τ)
bull We place a suitable prior on the variance of transform coefficients sk
and tie the prior variances across harmonically and temporally related
time-frequency atoms
p(s|v)p(v) =
(prod
k
p(sk|vk)
)
p(v)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 38
One channel source separation Gaussian source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim N (skn 0 vkn)
xk|sk1N =sumN
n=1 skn
bull Straightforward application of Bayesrsquo theorem yields
p(skn|vk1N xk) = N (skn κknxk vkn(1minus κkn))
κkn = vknsum
nprime
vknprime (Responsibilities)
bull Each source coefficient sn gets a fraction κn of the observation x
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 39
One channel source separation Poisson source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim PO(skn vkn)
xk|sk1N =sumN
n=1 skn
bull This is the generative model for the NMF when we write
vk(ντ)n = tνn times eτn (Templatetimes Excitation)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40
Gamma G(x a b) and Inverse Gamma IG(x a b)
0 1 2 3 4 50
02
04
06
08
1
12a = 09 b =1
a = 1 b =1
a = 13 b =1
a = 2 b =1
x
p(x)
0 1 2 3 4 50
02
04
06
08
1
12
14
a=1 b=1
a=1 b=05
a=2 b=1
G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))
IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))
bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale
bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1 K as follows
vk|zk sim IG(vk a zka)
zk+1|vk sim IG(zk+1 az vkaz)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
bull Variance variables v are priors for sources
bull Auxillary variables z are needed for conjugacy and positive correlation
bull Shape parameters a and az describe coupling strength and drift of the chain
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42
Gamma Chains typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43
Gamma Chains with changepoints typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44
Gamma Chains
bull The joint can be written as product of singleton and pairwise
potentials of form
ψkk = exp(minusazminus1k vminus1
k ) (Pairwise)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
φzk = exp((az + a+ 1) log zminus1
k ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45
Gamma Fields
bull The joint can be written as product of singleton and pairwise
potentials
ψij = exp(minusaijξminus1i ξminus1
j ) (Pairwise)
φi = exp((sum
j
aij + 1) log ξminus1i ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46
Possible Model Topologies
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47
Approximate Inference
bull Stochastic
ndash Markov Chain Monte Carlo Gibbs sampler
ndash Sequential Monte Carlo Particle Filtering
bull Deterministic
ndash Variational Bayes
In all these conjugacy helps
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))
bull Gibbs
v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z
(τ)k )ψkk+1(z
(τ)k+1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
bull Inference Variational Bayes
Noisy Original
X
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xorg
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xh SNR1998
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xv SNR2079
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xb SNR1968
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xg SNR1997
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51
Denoising ndash MusicOriginal
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Noisy
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
PF SNR853
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Gibbs SNR866
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
VB SNR208
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52
Single Channel Source Separation (with Onur Dikmen)
bull Source 1 Horizontal Tie across time harmonic continuity
bull Source 2 Vertical Tie across frequency transients percussive sounds
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53
Single Channel Source Separation with IGMCs
E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Preminus trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
bull Oracle We use the square of the source coefficient as the latent variance estimate
bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
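[Graphical model: λ_n → v_{k,n} → s_{k,n} → x_{k,m}, with mixing vectors a_m and noise variances r_m; plate over k = 1, …, K]
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$
$v_{k,n} \sim \mathcal{IG}\big(v_{k,n}; \nu/2, 2/(\nu\lambda_n)\big)$
$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$
$a_m \sim \mathcal{N}(a_m; \cdots)$, $r_m \sim \mathcal{IG}(r_m; \cdots)$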
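The sketch below draws from this hierarchy by ancestral sampling, using the slides' parameterisation: G(x; a, z) has shape a and scale z, and x ~ IG(x; a, z) exactly when 1/x ~ G(a, z). Under this convention E[v_{k,n}] = νλ_n/(ν − 2) for ν > 2. Integrating out v_{k,n} makes each s_{k,n} Student-t with ν degrees of freedom and squared scale λ_n, which is one way to read λ_n as a source “volume”. The priors on a_m and r_m are elided on the slide. The i.i.d. N(0, 1) mixing entries, the fixed noise variance and all default values below are assumptions.

```python
import numpy as np


def sample_multichannel_prior(K, N, M, nu=4.0, a_lam=2.0, b_lam=1.0,
                              noise_var=1e-3, seed=0):
    """Draw volumes, variances, sources and an M-channel mixture from the prior.

    G(x; a, z): shape a, scale z.  x ~ IG(x; a, z) means 1/x ~ G(a, z).
    Hypothetical choices: A has i.i.d. N(0, 1) entries and r_m = noise_var.
    """
    rng = np.random.default_rng(seed)
    lam = rng.gamma(a_lam, b_lam, size=N)                        # lambda_n: source "volumes"
    v = 1.0 / rng.gamma(nu / 2, 2.0 / (nu * lam), size=(K, N))   # v_kn ~ IG(nu/2, 2/(nu lambda_n))
    s = rng.normal(0.0, np.sqrt(v))                              # s_kn ~ N(0, v_kn)
    A = rng.normal(size=(M, N))                                  # row m holds a_m^T
    r = np.full(M, noise_var)                                    # r_m (fixed here)
    x = s @ A.T + rng.normal(0.0, np.sqrt(r), size=(K, M))       # x_km ~ N(a_m^T s_k, r_m)
    return lam, v, s, A, x


if __name__ == "__main__":
    lam, v, s, A, x = sample_multichannel_prior(K=1024, N=3, M=2)
    print(x.shape)  # (1024, 2): two channels, three sources (underdetermined)
```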
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$
Source Separation
[Figure: time–frequency panels (time t vs. frequency f in Hz, 0–10000) for the sources (Speech), (Piano), (Guitar) and the (Mix)]
Reconstructions
[Figure: reconstructed time–frequency panels (time t vs. frequency f in Hz, 0–10000) for (Speech), (Piano) and (Guitar)]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: sampler traces of a, λ and r against epoch (0–2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features):
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– …
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: lattice of states, bar position m_k = 1, …, 8 (horizontal) by tempo level n_k = 1, 2, 3 (vertical), for 3/4 and 4/4 time]
• Each dot denotes a state x = (m, n) (score position, tempo level)
• Directed arcs denote state transitions with positive probability
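The lattice can be turned into an explicit transition matrix. The sketch below assumes dynamics in the spirit of the bar pointer model, not the exact published ones. Each frame the position advances by the tempo level and wraps at the end of the bar. The tempo level steps up or down by one with a small probability `p_change`, a hypothetical parameter. Repeatedly multiplying a prior by this matrix gives predictive distributions of the kind shown in the p(x_k) panels.

```python
import numpy as np


def bar_pointer_transition(M=8, N=3, p_change=0.1):
    """Transition matrix over states x = (m, n), flattened as index n * M + m.

    Hypothetical dynamics: the bar position advances by the tempo level and
    wraps at the end of the bar; the tempo level moves up or down by one with
    probability p_change each, staying put at the boundaries.
    """
    S = M * N
    T = np.zeros((S, S))
    for n in range(N):                  # 0-based tempo level, speed n + 1
        for m in range(M):              # 0-based bar position
            i = n * M + m
            m_next = (m + n + 1) % M
            T[i, n * M + m_next] += 1.0 - 2.0 * p_change
            for n_next in (n - 1, n + 1):
                n_next = min(max(n_next, 0), N - 1)   # reflect at the boundary
                T[i, n_next * M + m_next] += p_change
    return T


def propagate(p1, T, steps):
    """Predictive distributions p(x_k) for k = 1, ..., steps + 1 (no observations)."""
    out = [p1]
    for _ in range(steps):
        out.append(out[-1] @ T)
    return np.array(out)


if __name__ == "__main__":
    M, N = 8, 3
    T = bar_pointer_transition(M, N)
    p1 = np.zeros(M * N)
    p1[0:M * N:M] = 1.0 / N             # start of the bar, uniform over tempo levels
    p = propagate(p1, T, steps=9)       # p[k - 1] is p(x_k); the last row is k = 10
    print(p[-1].reshape(N, M).round(3))  # rows: tempo level, columns: bar position
```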
Bar position Pointer - transition model
[Figure: three examples of the transition distribution p(x_2 | x_1) over bar position (1–8) and tempo level (1–5)]
Bar position Pointer - k = 1
[Figure: p(x_1) over bar position (horizontal) and tempo level (vertical)]
Bar position Pointer - k = 2
[Figure: p(x_2) over bar position (horizontal) and tempo level (vertical)]
Bar position Pointer - k = 3
[Figure: p(x_3) over bar position (horizontal) and tempo level (vertical)]
Bar position Pointer - k = 4
[Figure: p(x_4) over bar position (horizontal) and tempo level (vertical)]
Bar position Pointer - k = 5
[Figure: p(x_5 | y_{1:5}) over bar position (horizontal) and tempo level (vertical), after observing y_5 = 0]
Bar position Pointer - k = 10
[Figure: p(x_10) over bar position (horizontal) and tempo level (vertical)]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson, with intensity $\mu_k$ depending on the bar position $m_k$
[Figure: Poisson intensity μ_k as a function of bar position m_k (0–1000) for a triplet rhythm and a duplet rhythm]
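The sketch below makes the observation model concrete. It builds two hypothetical intensity profiles μ(m) over a bar of 1000 positions: bumps at the 8 onsets of a duplet rhythm or the 12 onsets of a triplet rhythm, assuming a 4/4 bar. The base level, bump height and bump width are invented. Count data y_k are then scored with the Poisson log-likelihood log p(y_k | x_k) = y_k log μ(m_k) − μ(m_k) − log y_k!.

```python
import numpy as np
from math import lgamma

M = 1000  # number of bar positions (hypothetical resolution)


def intensity(pattern, base=0.05, height=3.0, width=8.0):
    """Hypothetical Poisson intensity mu(m) with a bump at every onset of a rhythm.

    Assumes a 4/4 bar: a duplet rhythm has 8 onsets per bar, a triplet rhythm 12.
    """
    n_onsets = {"duplet": 8, "triplet": 12}[pattern]
    m = np.arange(M)
    onsets = np.arange(n_onsets) * M / n_onsets
    d = np.abs(m[:, None] - onsets[None, :])
    d = np.minimum(d, M - d).min(axis=1)        # circular distance to the nearest onset
    return base + height * np.exp(-0.5 * (d / width) ** 2)


def poisson_loglik(y, mu):
    """sum_k log p(y_k | mu_k) for independent Poisson counts y_k."""
    y = np.asarray(y)
    log_fact = np.array([lgamma(c + 1) for c in y])
    return float(np.sum(y * np.log(mu) - mu - log_fact))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m_k = (7 * np.arange(500)) % M              # pointer moving at a constant speed
    y = rng.poisson(intensity("triplet")[m_k])  # counts generated by the triplet pattern
    for pattern in ("triplet", "duplet"):
        print(pattern, poisson_loglik(y, intensity(pattern)[m_k]))
```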
Tempo, rhythm and meter analysis
Bar pointer model (Whiteley, Cemgil, Godsill 2006)
[Graphical model with nodes n_0 … n_3, θ_0 … θ_3, m_0 … m_3, r_0 … r_3, λ_1 … λ_3 and observations y_1 … y_3]
• θ: time-signature indicator (e.g. 3/4, 4/4); r: rhythmic-pattern indicator
Filtering
[Figure: filtering results over frame index k (0–450): observed data y_k; log p(m_k | y_{1:k}) over bar position; log p(n_k | y_{1:k}) over tempo (quarter notes per minute, 60–180); p(r_k | y_{1:k}) for triplets vs. duplets]
Smoothing
[Figure: smoothing results over frame index k (0–450): observed data y_k; log p(m_k | y_{1:K}) over bar position; log p(n_k | y_{1:K}) over tempo (quarter notes per minute, 60–180); p(r_k | y_{1:K}) for triplets vs. duplets]
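Because (m_k, n_k, r_k) is a discrete state, the filtering and smoothing panels are marginals that a standard forward-backward recursion can compute on the flattened state space. The following is a generic, scaled forward-backward sketch. The slides do not say which implementation was used. For large state spaces, a sparse transition matrix or a particle method may be preferable.

```python
import numpy as np


def forward_backward(p1, T, lik):
    """Filtered and smoothed marginals of a discrete HMM.

    p1:  (S,) prior p(x_1)
    T:   (S, S) transition matrix, T[i, j] = p(x_{k+1} = j | x_k = i)
    lik: (K, S) likelihoods, lik[k, s] = p(y_k | x_k = s)
    Returns (filtered, smoothed, log_evidence) with filtered[k] = p(x_k | y_{1:k}),
    smoothed[k] = p(x_k | y_{1:K}) and log_evidence = log p(y_{1:K}).
    """
    K, S = lik.shape
    filtered = np.zeros((K, S))
    c = np.zeros(K)                     # c[k] = p(y_k | y_{1:k-1})
    pred = p1
    for k in range(K):                  # forward pass (filtering)
        joint = pred * lik[k]
        c[k] = joint.sum()
        filtered[k] = joint / c[k]
        pred = filtered[k] @ T
    beta = np.ones((K, S))
    for k in range(K - 2, -1, -1):      # backward pass, scaled by the same c[k]
        beta[k] = T @ (lik[k + 1] * beta[k + 1]) / c[k + 1]
    smoothed = filtered * beta
    smoothed /= smoothed.sum(axis=1, keepdims=True)
    return filtered, smoothed, float(np.log(c).sum())
```

For the bar pointer model, `p1`, `T` and `lik` would be built over the flattened (m, n, r) states, with `lik[k, s]` the Poisson probability of y_k under the intensity attached to state s.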
Time Signature
[Figure: observed audio (sample value vs. time, 0–12 s); smoothed log p(m_k | z_{1:K}) over bar position; log p(n_k | z_{1:K}) over tempo (quarter notes per minute, 52–155); p(θ_k | z_{1:K}) for 4/4 vs. 3/4; frame index k from 0 to 500]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate its note events with the audio
Score-Performance matching - Graphical Model
[Graphical model with nodes t_1 … t_K, r_1 … r_K, λ_1 … λ_K, v_{ν1} … v_{νK} and s_{ν1} … s_{νK}, replicated over frequency bins ν = 1, …, W; inset: state diagram of r_k over states 1–8]
$v_{\nu\tau} \sim \mathcal{IG}\big(v_{\nu\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau))\big)$
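Integrating out the variance gives the likelihood of a frame under each candidate score state in closed form. Let b = aλσ_ν(r) be the rate, i.e. the reciprocal of the second IG argument above. Then ∫ N(s; 0, v) IG(v; a, 1/b) dv = Γ(a + ½) b^a / (Γ(a) √(2π) (b + s²/2)^{a+½}), which is a Student-t density. The sketch treats λ as known and uses random, hypothetical templates σ_ν(r). In the full model these per-frame scores would be combined with the transition structure over r_k.

```python
import numpy as np
from math import lgamma, log, pi


def frame_loglik(s, sigma, a=3.0, lam=1.0):
    """log p(s_{1:W} | r) for every candidate score state r, with v integrated out.

    s:     (W,) MDCT coefficients of one frame
    sigma: (R, W) spectral templates; row r holds sigma_nu(r) for nu = 1..W
    v ~ IG(v; a, 1/(a lam sigma)) has inverse-gamma rate b = a lam sigma;
    with s ~ N(0, v) the integral over v is a Student-t density.
    """
    b = a * lam * sigma
    const = lgamma(a + 0.5) - lgamma(a) - 0.5 * log(2.0 * pi)
    per_bin = const + a * np.log(b) - (a + 0.5) * np.log(b + 0.5 * s**2)
    return per_bin.sum(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R, W, a, lam = 4, 256, 3.0, 1.0
    sigma = np.exp(rng.normal(0.0, 1.5, size=(R, W)))   # hypothetical templates
    r_true = 2
    v = 1.0 / rng.gamma(a, 1.0 / (a * lam * sigma[r_true]))
    s = rng.normal(0.0, np.sqrt(v))
    print(np.argmax(frame_loglik(s, sigma, a, lam)), "vs. true state", r_true)
```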
Score-Performance matching - Signal model
[Figure: log σ_ν versus frequency ν (Hz), 0–4000 Hz]
Score-Performance matching
[Figure: spectrogram data, time (0–14 s) vs. frequency (0–4000 Hz); MIDI data, score position (50–450) vs. MIDI note (55–85)]
Online (filtering) or offline (smoothing) processing is possible
Transcription
[Figure: log p(r_τ | s_τ) over MIDI note number (60–80) and time (s); the estimate Σ_i w_τ^{(i)} λ_τ^{(i)} on a log λ axis over time (s); MDCT of the audio (source: Daniel-Ben Pienaar), time (s) vs. frequency (Hz, 0–4000)]
Summary
• “Time domain”: switching state space models
– State space modeling
– Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform domain”: Gamma fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical-score-guided source separation
– Prior structures for other observation models: NMF
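One-channel source separation: Gaussian source model
[Graphical model: v_{k1}, …, v_{kN} → s_{k1}, …, s_{kN} → x_k, plate over k = 1, …, K]
$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• Straightforward application of Bayes’ theorem yields
$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})\big)$
$\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$ (responsibilities)
• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$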
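The posterior is easy to compute coefficient-wise. Applied to every time-frequency coefficient of a mixture, the posterior mean κ_{k,n} x_k acts as a Wiener-style soft mask. A minimal sketch follows; the function name is illustrative.

```python
import numpy as np


def gaussian_posterior(x, v):
    """Posterior of s_{k,1:N} given x_k = sum_n s_{k,n}, with s_{k,n} ~ N(0, v_{k,n}).

    x: (K,) mixture coefficients, v: (K, N) source variances.
    Returns posterior means, posterior variances and responsibilities kappa.
    """
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities; each row sums to one
    mean = kappa * x[:, None]                  # source n receives the fraction kappa_{k,n} of x_k
    var = v * (1.0 - kappa)                    # v_{k,n} (1 - kappa_{k,n})
    return mean, var, kappa


if __name__ == "__main__":
    x = np.array([2.0, -1.0])
    v = np.array([[3.0, 1.0],
                  [0.5, 0.5]])
    mean, var, kappa = gaussian_posterior(x, v)
    # posterior means: [[1.5, 0.5], [-0.5, -0.5]]; variances: [[0.75, 0.75], [0.25, 0.25]]
    print(mean)
    print(var)
```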
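One-channel source separation: Poisson source model
[Graphical model: v_{k1}, …, v_{kN} → s_{k1}, …, s_{kN} → x_k, plate over k = 1, …, K]
$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$
$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$
• This is the generative model for NMF when we write
$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (template × excitation)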
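With v_{k(ν,τ),n} = t_{ν,n} e_{τ,n}, maximising the Poisson likelihood of X over templates and excitations is equivalent to minimising the generalised Kullback-Leibler divergence between X and its reconstruction. EM on the latent sources s reproduces the familiar multiplicative NMF updates. In that EM, the E-step splits each x_k among the sources in proportion to κ_{k,n} = v_{k,n}/Σ_{n'} v_{k,n'}. The sketch stores the excitations as a components × time matrix `E`, the transpose of e_{τ,n}. The initialisation, iteration count and `eps` guard are arbitrary choices.

```python
import numpy as np


def kl_nmf(X, n_components, n_iter=200, seed=0, eps=1e-12):
    """Fit X ~ T @ E by maximising the Poisson likelihood (generalised-KL NMF).

    X: (F, L) nonnegative data, e.g. frequency x time.
    T: (F, C) templates t_{nu,n}; E: (C, L) excitations, the transpose of e_{tau,n}.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    T = rng.random((n_freq, n_components)) + eps
    E = rng.random((n_components, n_frames)) + eps
    for _ in range(n_iter):
        R = X / (T @ E + eps)                               # x / v, elementwise
        T *= (R @ E.T) / (E.sum(axis=1) + eps)              # template update
        R = X / (T @ E + eps)
        E *= (T.T @ R) / (T.sum(axis=0)[:, None] + eps)     # excitation update
    return T, E


def kl_divergence(X, V, eps=1e-12):
    """Generalised KL divergence D(X || V) = sum(x log(x / v) - x + v)."""
    return float(np.sum(X * np.log((X + eps) / (V + eps)) - X + V))
```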
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: left, Gamma densities p(x) on 0 ≤ x ≤ 5 for a = 0.9, 1, 1.3, 2 with b = 1; right, Inverse Gamma densities for (a, b) = (1, 1), (1, 0.5), (2, 1)]
$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$
$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$
• Gamma: conjugate prior for the Gaussian precision, the Poisson intensity and the Inverse Gamma scale
• Inverse Gamma: conjugate prior for the Gaussian variance and the Gamma scale
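The two densities translate directly into code, written exactly as above with shape a and scale z. Two results follow from the same expressions. First, the reciprocal relation: y ~ G(a, z) implies 1/y ~ IG(a, z). Second, the conjugate update for a Gaussian variance: x_i ~ N(0, v) and v ~ IG(v; a, z) give v | x ~ IG(v; a + n/2, 1/(1/z + Σ_i x_i²/2)). A minimal sketch with illustrative function names:

```python
from math import lgamma, log


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) with shape a and scale z."""
    return (a - 1) * log(x) - x / z + a * log(1 / z) - lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z); if y ~ G(y; a, z) then x = 1/y ~ IG(x; a, z)."""
    return (a + 1) * log(1 / x) - 1 / (z * x) + a * log(1 / z) - lgamma(a)


def ig_posterior_for_variance(a, z, data):
    """Conjugate update: x_i ~ N(0, v), v ~ IG(v; a, z)  ->  v | data ~ IG(v; a', z')."""
    a_post = a + len(data) / 2
    rate_post = 1 / z + sum(d * d for d in data) / 2
    return a_post, 1 / rate_post


if __name__ == "__main__":
    print(ig_posterior_for_variance(2.0, 1.0, [1.0, 1.0]))  # (3.0, 0.5)
```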
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:
$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$
$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$
[Chain: z_1 ⋯ v_{k−1} – z_k – v_k – z_{k+1} ⋯, with edge weights a_z, a, a_z]
• The variance variables v serve as priors for the sources
• The auxiliary variables z are needed for conjugacy and positive correlation
• The shape parameters a and a_z describe the coupling strength and drift of the chain
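Ancestral sampling of the chain only needs Gamma draws, since x ~ IG(x; a, z) exactly when 1/x ~ G(x; a, z). The sketch below takes the initial value z_1 = 1 as an assumption. In this sketch log v_k performs a random walk whose step size shrinks as a and a_z grow. Choosing a ≠ a_z adds a systematic drift: the expected step of log v_k is (log a − ψ(a)) − (log a_z − ψ(a_z)), with ψ the digamma function. This is in line with the remark that the shapes control coupling and drift.

```python
import numpy as np


def sample_ig_chain(K, a, a_z, z1=1.0, seed=0):
    """Draw v_1..v_K from the inverse Gamma-Markov chain.

    v_k | z_k ~ IG(v_k; a, z_k / a) and z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z),
    where IG(x; a, z) means 1/x ~ Gamma(shape a, scale z). z1 is an assumed start.
    """
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = z1
    for k in range(K):
        v[k] = 1.0 / rng.gamma(a, z / a)       # E[1/v_k] = z_k, so v_k is close to 1/z_k
        z = 1.0 / rng.gamma(a_z, v[k] / a_z)   # E[1/z_{k+1}] = v_k, so z_{k+1} is close to 1/v_k
    return v


if __name__ == "__main__":
    for a, a_z in ((10.0, 10.0), (10.0, 4.0), (10.0, 40.0)):
        log_v = np.log(sample_ig_chain(1000, a, a_z))
        print(a, a_z, log_v[::200].round(2))
```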
Gamma chains: typical draws
[Figure: three draws of log v_k for k = 1, …, 1000, with (a, a_z) = (10, 10), (10, 4) and (10, 40)]
Gamma chains with changepoints: typical draws
[Figure: three draws of log v_k for k = 1, …, 1000, with (a, a_z) = (10, 10), (10, 4) and (10, 40)]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (pairwise)
[Chain: z_1 ⋯ v_{k−1} – z_k – v_k – z_{k+1} ⋯, with edge weights a_z, a, a_z]
$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big)$ (singletons)
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$ (pairwise)
$\phi_i = \exp\big((\textstyle\sum_j a_{ij} + 1)\log \xi_i^{-1}\big)$ (singletons)
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov chain Monte Carlo: Gibbs sampler
– Sequential Monte Carlo: particle filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps.
VB or Gibbs
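[Factor graph: ψ_{0,1} – z_1 – ψ_{1,1} – v_1 – ψ_{1,2} – z_2 – ψ_{2,2} – v_2 – ⋯, with observation factors p(y_1 | v_1), p(y_2 | v_2), …]
• VB
$q^{(\tau)}(v_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\,q^{(\tau)}(z_{k+1})}\big)$
• Gibbs
$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\,\psi_{k,k}(z_k^{(\tau)})\,\psi_{k,k+1}(z_{k+1}^{(\tau)})$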
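Suppose the chain is observed through y_k ~ N(0, v_k). The potentials then make every factor conjugate:
- q(v_k) is inverse Gamma with shape a + a_z + ½ and rate a⟨z_k^{-1}⟩ + a_z⟨z_{k+1}^{-1}⟩ + y_k²/2.
- q(z_k) is inverse Gamma with shape a_z + a and rate a_z⟨v_{k−1}^{-1}⟩ + a⟨v_k^{-1}⟩.
- In both, ⟨x^{-1}⟩ = shape/rate.

The sketch below iterates these mean-field updates. The Gaussian observation model and the boundary treatment (a fixed v_0 feeding z_1, no z_{K+1}) are assumptions made for illustration.

```python
import numpy as np


def vb_gamma_chain(y, a, a_z, n_iter=100, v0=1.0):
    """Mean-field VB for y_k ~ N(0, v_k) under the inverse Gamma-Markov chain prior.

    q(v_k) = IG(alpha_k, rate beta_k) and q(z_k) = IG(gamma_k, rate delta_k);
    for an IG(shape, rate) factor, <1/x> = shape / rate.
    Assumed boundary: z_1 ~ IG(a_z, v0 / a_z), and there is no z_{K+1}.
    Returns the posterior means E_q[v_k].
    """
    K = len(y)
    alpha = np.full(K, a + a_z + 0.5)
    alpha[-1] = a + 0.5                      # v_K has no right neighbour z_{K+1}
    gamma = np.full(K, a_z + a)
    inv_v = 1.0 / np.maximum(y**2, 1e-6)     # initial <1/v_k>
    for _ in range(n_iter):
        # q(z_k): rate a_z <1/v_{k-1}> + a <1/v_k>, with <1/v_0> = 1 / v0
        inv_v_prev = np.concatenate(([1.0 / v0], inv_v[:-1]))
        delta = a_z * inv_v_prev + a * inv_v
        inv_z = gamma / delta
        # q(v_k): rate a <1/z_k> + a_z <1/z_{k+1}> + y_k^2 / 2
        right = np.concatenate((a_z * inv_z[1:], [0.0]))
        beta = a * inv_z + right + 0.5 * y**2
        inv_v = alpha / beta
    return beta / (alpha - 1.0)
```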
VB or Gibbs
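[Factor graph: ψ_{0,1} – z_1 – ψ_{1,1} – v_1 – ψ_{1,2} – z_2 – ψ_{2,2} – v_2 – ⋯, with observation factors p(y_1 | v_1), p(y_2 | v_2), …]
• VB
$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k-1} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\,q^{(\tau)}(v_k)}\big)$
• Gibbs
$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k,k-1}(v_{k-1}^{(\tau)})\,\psi_{k,k}(v_k^{(\tau)})$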
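The same conjugacy gives exact Gibbs conditionals:
- v_k | z_k, z_{k+1}, y_k is inverse Gamma with shape a + a_z + ½ and rate a/z_k + a_z/z_{k+1} + y_k²/2.
- z_k | v_{k−1}, v_k is inverse Gamma with shape a_z + a and rate a_z/v_{k−1} + a/v_k.

Because the graph alternates z and v, all z_k are conditionally independent given v, and vice versa. Each half-sweep can therefore draw a whole block at once. The Gaussian observation model y_k ~ N(0, v_k), the fixed v_0 feeding z_1 and the absence of z_{K+1} are illustrative assumptions.

```python
import numpy as np


def sample_ig(rng, shape, rate):
    """Draw from IG(shape, rate) as the reciprocal of a Gamma(shape, scale=1/rate) draw."""
    return 1.0 / rng.gamma(shape, 1.0 / rate)


def gibbs_gamma_chain(y, a, a_z, n_sweeps=500, burn_in=100, v0=1.0, seed=0):
    """Blocked Gibbs sampler for y_k ~ N(0, v_k) under the inverse Gamma-Markov chain.

    Assumed boundary: z_1 ~ IG(a_z, v0 / a_z), and there is no z_{K+1}.
    Returns the Monte Carlo estimate of the posterior mean of log v_k.
    """
    rng = np.random.default_rng(seed)
    K = len(y)
    v = np.maximum(y**2, 1e-6)               # crude initialisation
    shape_v = np.full(K, a + a_z + 0.5)
    shape_v[-1] = a + 0.5                    # v_K has no right neighbour
    log_v_sum = np.zeros(K)
    for sweep in range(n_sweeps):
        # z_k | v_{k-1}, v_k ~ IG(a_z + a, a_z / v_{k-1} + a / v_k), all k at once
        v_prev = np.concatenate(([v0], v[:-1]))
        z = sample_ig(rng, a_z + a, a_z / v_prev + a / v)
        # v_k | z_k, z_{k+1}, y_k ~ IG(shape_v, a / z_k + a_z / z_{k+1} + y_k^2 / 2)
        right = np.concatenate((a_z / z[1:], [0.0]))
        v = sample_ig(rng, shape_v, a / z + right + 0.5 * y**2)
        if sweep >= burn_in:
            log_v_sum += np.log(v)
    return log_v_sum / (n_sweeps - burn_in)
```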
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: variational Bayes
[Figure: log-magnitude time–frequency maps of the noisy input X, the original X_org, and the reconstructions X_h, X_v, X_b and X_g, each annotated with its SNR]
Denoising – Music
[Figure: log-magnitude time–frequency maps of the original, the noisy input, and the reconstructions obtained with particle filtering (PF), Gibbs sampling and VB, each annotated with its SNR]
“Tristram” (Matt Uelmen) + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal; ties across time (harmonic continuity)
• Source 2: vertical; ties across frequency (transients, percussive sounds)
[Figure: time (τ) vs. frequency bin (ν) maps of the original X_org and the two separated components S_hor and S_ver]
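One way to realise the horizontal/vertical split with inverse Gamma-Markov chains has three steps. Run one chain along time for every frequency bin to get the horizontal source. Run one chain along frequency for every frame to get the vertical source. Then add the two Gaussian sources. This is a hypothetical construction for illustration; the coupling used in the actual experiments may differ, and all parameter values are invented.

```python
import numpy as np


def sample_chains(rng, n_chains, K, a, a_z):
    """n_chains independent inverse Gamma-Markov chains of length K (v only, z_1 = 1)."""
    v = np.empty((n_chains, K))
    z = np.ones(n_chains)
    for k in range(K):
        v[:, k] = 1.0 / rng.gamma(a, z / a)           # v_k | z_k ~ IG(a, z_k / a)
        z = 1.0 / rng.gamma(a_z, v[:, k] / a_z)       # z_{k+1} | v_k ~ IG(a_z, v_k / a_z)
    return v


def sample_hor_ver_mixture(W=64, T=128, a=20.0, a_z=20.0, seed=0):
    """Hypothetical two-source prior on a W x T (frequency x time) grid."""
    rng = np.random.default_rng(seed)
    v_hor = sample_chains(rng, W, T, a, a_z)          # tied across time: one chain per bin
    v_ver = sample_chains(rng, T, W, a, a_z).T        # tied across frequency: one chain per frame
    s_hor = rng.normal(0.0, np.sqrt(v_hor))
    s_ver = rng.normal(0.0, np.sqrt(v_ver))
    return v_hor, v_ver, s_hor + s_ver


if __name__ == "__main__":
    v_hor, v_ver, x = sample_hor_ver_mixture()
    print(x.shape)  # (64, 128): frequency bins x frames
```

Separation would then invert this model: given x, infer v_hor and v_ver and split each coefficient with the Gaussian responsibilities κ.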
Single Channel Source Separation with IGMCs
E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Preminus trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
bull Oracle We use the square of the source coefficient as the latent variance estimate
bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 41
One channel source separation Poisson source model
vk1 vkN
sk1 skN
xk
k = 1 K
skn|vkn sim PO(skn vkn)
xk|sk1N =sumN
n=1 skn
bull This is the generative model for the NMF when we write
vk(ντ)n = tνn times eτn (Templatetimes Excitation)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40
Gamma G(x a b) and Inverse Gamma IG(x a b)
0 1 2 3 4 50
02
04
06
08
1
12a = 09 b =1
a = 1 b =1
a = 13 b =1
a = 2 b =1
x
p(x)
0 1 2 3 4 50
02
04
06
08
1
12
14
a=1 b=1
a=1 b=05
a=2 b=1
G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))
IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))
bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale
bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1 K as follows
vk|zk sim IG(vk a zka)
zk+1|vk sim IG(zk+1 az vkaz)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
bull Variance variables v are priors for sources
bull Auxillary variables z are needed for conjugacy and positive correlation
bull Shape parameters a and az describe coupling strength and drift of the chain
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42
Gamma Chains typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43
Gamma Chains with changepoints typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44
Gamma Chains
bull The joint can be written as product of singleton and pairwise
potentials of form
ψkk = exp(minusazminus1k vminus1
k ) (Pairwise)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
φzk = exp((az + a+ 1) log zminus1
k ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45
Gamma Fields
bull The joint can be written as product of singleton and pairwise
potentials
ψij = exp(minusaijξminus1i ξminus1
j ) (Pairwise)
φi = exp((sum
j
aij + 1) log ξminus1i ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46
Possible Model Topologies
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47
Approximate Inference
bull Stochastic
ndash Markov Chain Monte Carlo Gibbs sampler
ndash Sequential Monte Carlo Particle Filtering
bull Deterministic
ndash Variational Bayes
In all these conjugacy helps
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))
bull Gibbs
v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z
(τ)k )ψkk+1(z
(τ)k+1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: variational Bayes
[Figure: image panels (horizontal axis 20 to 120, vertical axis 50 to 500, colour scale −14 to 0): Noisy (X), Original (Xorg), Xh (SNR1998), Xv (SNR2079), Xb (SNR1968), Xg (SNR1997)]
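The panels report an SNR for each reconstruction. The slides do not give the formula; assuming the usual definition, energy of the clean signal over energy of the reconstruction error in dB, a minimal sketch is:

```python
import math

def snr_db(reference, estimate):
    """SNR in dB: 10 log10( sum s^2 / sum (s - s_hat)^2 )."""
    signal = sum(s * s for s in reference)
    error = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error == 0.0:
        return math.inf                      # perfect reconstruction
    return 10.0 * math.log10(signal / error)

print(snr_db([1.0, 2.0, 3.0], [1.1, 1.9, 3.0]))  # about 28.45 dB
```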
Denoising – Music
[Figure: image panels (horizontal axis 50 to 250, vertical axis 50 to 500, colour scale −18 to 0): Original, Noisy, PF (SNR853), Gibbs (SNR866), VB (SNR208)]
“Tristram (Matt Uelmen)” + ∼0 dB white noise
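The music test signal is the recording plus white noise at roughly 0 dB. The exact procedure is not stated; a minimal sketch, assuming Gaussian white noise rescaled so that the sample SNR hits the target exactly (`add_white_noise` is a hypothetical helper), is:

```python
import math
import random

def add_white_noise(signal, target_snr_db, seed=0):
    """Add white Gaussian noise scaled so the sample SNR equals target_snr_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # After scaling, noise power = p_signal / 10^(target/10).
    scale = math.sqrt(p_signal / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    return [s + scale * n for s, n in zip(signal, noise)]

clean = [math.sin(0.05 * t) for t in range(2000)]
noisy = add_white_noise(clean, 0.0)   # 0 dB: noise power equals signal power
```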
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal; tie across time (harmonic continuity)
• Source 2: vertical; tie across frequency (transients, percussive sounds)
[Figure: time (τ) vs. frequency bin (ν) panels: Xorg, Shor, Sver]
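One way to encode the two tying patterns is to reuse the chain structure on a time-frequency grid: for the horizontal source each frequency bin is an inverse Gamma-Markov chain over time, and for the vertical source each frame is a chain over frequency. The sketch below only builds the coupling dictionary (edge to coupling strength); the node naming and the function `grid_couplings` are hypothetical.

```python
def grid_couplings(n_freq, n_time, a, a_z, direction):
    """Edges of an inverse Gamma-Markov field on a frequency x time grid.

    direction="horizontal": each frequency bin nu is a chain over time tau.
    direction="vertical":   each time frame tau is a chain over frequency nu.
    Nodes are ("v", nu, tau) variances and ("z", nu, tau) auxiliaries, linked
    as z_k --(a)-- v_k --(a_z)-- z_{k+1} along the chosen direction.
    """
    if direction not in ("horizontal", "vertical"):
        raise ValueError("direction must be 'horizontal' or 'vertical'")
    edges = {}
    for nu in range(n_freq):
        for tau in range(n_time):
            v = ("v", nu, tau)
            edges[(("z", nu, tau), v)] = a                  # z_k with v_k
            if direction == "horizontal" and tau + 1 < n_time:
                edges[(v, ("z", nu, tau + 1))] = a_z        # next frame
            elif direction == "vertical" and nu + 1 < n_freq:
                edges[(v, ("z", nu + 1, tau))] = a_z        # next bin
    return edges

horizontal = grid_couplings(3, 4, 10.0, 10.0, "horizontal")
vertical = grid_couplings(3, 4, 10.0, 10.0, "vertical")
```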
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

                 s1                     s2
                 SDR    SIR    SAR      SDR    SIR    SAR
VB               -474   -328   567      -158   1546   -137
Gibbs            -45    -262   457      105    1246   161
GibbsEM          -423   -242   482      134    1313   185
Pre-trained      -404   -315   813      356    1144   464
Oracle           614    1716   658      1266   1995   136

• Oracle: we use the square of the source coefficient as the latent variance estimate.
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources.
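For intuition about the Oracle row: if the two sources are zero-mean Gaussian with known variances $v_1, v_2$ and $x = s_1 + s_2$, the posterior mean of $s_1$ is the Wiener gain $v_1/(v_1 + v_2)$ times $x$. The sketch below applies that rule with squared true coefficients as variances; it is an assumption that the reported oracle scores were produced exactly this way.

```python
def wiener_separate(x, v1, v2, eps=1e-12):
    """Per-coefficient posterior means of s1, s2 given x = s1 + s2, s_i ~ N(0, v_i)."""
    s1, s2 = [], []
    for xk, a, b in zip(x, v1, v2):
        g = a / (a + b + eps)                # Wiener gain for source 1
        s1.append(g * xk)
        s2.append((1.0 - g) * xk)
    return s1, s2

# Oracle variance estimate: squares of the true source coefficients.
src1 = [0.5, -2.0, 0.0, 1.0]
src2 = [1.0, 0.1, -3.0, 0.0]
mix = [p + q for p, q in zip(src1, src2)]
est1, est2 = wiener_separate(mix, [s * s for s in src1], [s * s for s in src2])
```

The two estimates always sum back to the mixture; the gains only decide how each coefficient is shared.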
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

                 s1                     s2
                 SDR    SIR    SAR      SDR    SIR    SAR
VB               -78    -622   453      -235   184    -225
Gibbs            -846   -753   693      -404   1459   -383
GibbsEM          -774   -619   462      -114   1662   -097
Pre-trained      -64    -539   695      38     1639   414
Oracle           121    329    1214     2113   3389   2137
Harmonic-Transient Decomposition
[Figure: time (τ) vs. frequency bin (ν) panels: Xorg (Original), Shor (Hor), Sver (Vert)]
Chord Detection - Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ versus frequency ν/Hz (0 to 4000 Hz); vertical range −12 to 0]
Chord Detection
[Figure: MDCT of piano chord 41, 48, 51, 56 (frequency 0 to 4000 Hz vs. time τ/s, 0.5 to 2.5 s); $\log \sum_\nu v_{\nu,j,\tau}$ for MIDI note $j$ (40 to 65) vs. time τ/s]
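The chord and transcription examples work on MDCT coefficients, but the slides do not state the window or frame length. As a generic sketch (naive $O(N^2)$ transform, sine window, 50% overlap; all names hypothetical), the following shows how sine-windowed MDCT frames reconstruct the signal by overlap-add through time-domain aliasing cancellation. The inverse is scaled by $2/N$ so that, with the window applied at analysis and synthesis, the overlapped halves sum exactly to the input.

```python
import math

def _kernel(n, k, half):
    # cos[pi/N (n + 1/2 + N/2)(k + 1/2)] with N = half
    return math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))

def mdct(frame):
    """Naive MDCT: 2N samples -> N coefficients."""
    half = len(frame) // 2
    return [sum(frame[n] * _kernel(n, k, half) for n in range(2 * half))
            for k in range(half)]

def imdct(coeffs):
    """Inverse MDCT scaled by 2/N (suits a Princen-Bradley window such as the sine)."""
    half = len(coeffs)
    return [2.0 / half * sum(coeffs[k] * _kernel(n, k, half) for k in range(half))
            for n in range(2 * half)]

def sine_window(length):
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def analyse(signal, half):
    """Sine-windowed MDCT frames with hop size `half` (50% overlap)."""
    w = sine_window(2 * half)
    return [mdct([wi * si for wi, si in zip(w, signal[s:s + 2 * half])])
            for s in range(0, len(signal) - 2 * half + 1, half)]

def synthesise(frames, half):
    """Window each inverse MDCT again and overlap-add; aliasing cancels."""
    w = sine_window(2 * half)
    out = [0.0] * (half * (len(frames) + 1))
    for i, coeffs in enumerate(frames):
        for n, yn in enumerate(imdct(coeffs)):
            out[i * half + n] += w[n] * yn
    return out
```

Only samples covered by two frames are reconstructed exactly; the first and last half-frames lack an overlapping partner.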
Multichannel Source Separation
• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda), \qquad n = 1, \dots, N$$
$$v_{k,n} \sim \mathcal{IG}\big(v_{k,n}; \nu/2, 2/(\nu\lambda_n)\big)$$
$$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$
$$x_{k,m} \sim \mathcal{N}\big(x_{k,m}; a_m^\top s_{k,1:N}, r_m\big), \qquad k = 1, \dots, K, \quad m = 1, \dots, M$$
$$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$$
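Ancestral sampling makes the hierarchy concrete. Because the priors on $a_m$ and $r_m$ are elided on the slide, this sketch takes the mixing matrix and noise variances as fixed inputs (an assumption), and it reads $\mathcal{G}$ and $\mathcal{IG}$ with the third argument as a scale, so $1/v_{k,n} \sim \mathcal{G}(\nu/2, 2/(\nu\lambda_n))$. Marginally each $s_{k,n}$ is then Student-t with $\nu$ degrees of freedom. `sample_mixture` is a hypothetical name.

```python
import math
import random

def sample_mixture(K, A, r, a_lam, b_lam, nu, seed=0):
    """Draw (lam, v, s, x) from the hierarchical prior for fixed A (M x N) and r (length M)."""
    rng = random.Random(seed)
    M, N = len(A), len(A[0])
    lam = [rng.gammavariate(a_lam, b_lam) for _ in range(N)]          # source "volumes"
    v = [[1.0 / rng.gammavariate(nu / 2.0, 2.0 / (nu * lam[n])) for n in range(N)]
         for _ in range(K)]                                           # coefficient variances
    s = [[rng.gauss(0.0, math.sqrt(v[k][n])) for n in range(N)] for k in range(K)]
    x = [[sum(A[m][n] * s[k][n] for n in range(N)) + rng.gauss(0.0, math.sqrt(r[m]))
          for m in range(M)] for k in range(K)]                       # noisy mixtures
    return lam, v, s, x

A_demo = [[1.0, 0.5, 0.2], [0.3, 1.0, 0.8]]   # 2 channels, 3 sources (underdetermined)
lam, v, s, x = sample_mixture(50, A_demo, [0.01, 0.01], a_lam=2.0, b_lam=1.0, nu=4.0)
```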
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$
Source Separation
[Figure: spectrograms (t/sec vs. f/Hz, 0 to 10000 Hz) of the sources and their mixture: (Speech), (Piano), (Guitar), (Mix)]
Reconstructions
[Figure: reconstructed spectrograms (t/sec vs. f/Hz, 0 to 10000 Hz): (Speech), (Piano), (Guitar)]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
[Figure: sampler traces of $a$, $\lambda$ and $r$ versus epoch (0 to 2000)]
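The slide lists tempering among the tools for multimodal posteriors. As a generic illustration only (a toy one-dimensional bimodal target, not the source-separation posterior), a minimal parallel-tempering sketch runs random-walk Metropolis on $p(x)^\beta$ for several inverse temperatures $\beta$ and occasionally swaps neighbouring chains, which lets the $\beta = 1$ chain jump between modes. The target, temperature ladder and step size are assumptions chosen for the example.

```python
import math
import random

def log_target(x):
    """Log of an unnormalised equal mixture of N(-4, 1) and N(4, 1), computed stably."""
    a, b = -0.5 * (x + 4.0) ** 2, -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def parallel_tempering(n_iter=20000, betas=(1.0, 0.3, 0.1, 0.03), step=1.0, seed=0):
    """Metropolis on p^beta per temperature plus neighbour swaps; returns the beta=1 chain."""
    rng = random.Random(seed)
    xs = [4.0] * len(betas)                  # every chain starts in the right-hand mode
    cold = []
    for _ in range(n_iter):
        for i, beta in enumerate(betas):     # within-temperature moves
            prop = xs[i] + rng.gauss(0.0, step / math.sqrt(beta))
            if math.log(1.0 - rng.random()) < beta * (log_target(prop) - log_target(xs[i])):
                xs[i] = prop
        i = rng.randrange(len(betas) - 1)    # propose swapping chains i and i+1
        log_r = (betas[i] - betas[i + 1]) * (log_target(xs[i + 1]) - log_target(xs[i]))
        if math.log(1.0 - rng.random()) < log_r:
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
        cold.append(xs[0])
    return cold

samples = parallel_tempering()
```

A plain Metropolis chain with the same step size, started at $x = 4$, would essentially never reach the left-hand mode; the hot chains carry the state across the barrier.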
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features):
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– …
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: grid of states, tempo level $n_k \in \{1, 2, 3\}$ (rows) by bar position $m_k \in \{1, \dots, 8\}$ (columns), with bar lines marked for 3/4 and 4/4 time]
• Each dot denotes a state $x = (m, n)$ (score position, tempo level)
• Directed arcs denote state transitions with positive probability
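A toy version of this state space can be written down directly. The dynamics below are an assumption for illustration, not the published model: the position advances by the current tempo level and wraps at the bar length $M$, and the tempo keeps its level with probability $1 - p$ or moves one level up or down with probability $p/2$ each, with moves that would leave $1..N$ kept at the current level. `bar_pointer_transitions` is a hypothetical helper.

```python
def bar_pointer_transitions(M, N, p_change):
    """Transition probabilities {(m, n): {(m', n'): prob}} of a toy bar-pointer HMM."""
    trans = {}
    for m in range(1, M + 1):
        for n in range(1, N + 1):
            m_next = (m - 1 + n) % M + 1          # advance by tempo n, wrap at bar end
            row = {}
            for dn, prob in ((-1, p_change / 2), (0, 1.0 - p_change), (1, p_change / 2)):
                n_next = min(max(n + dn, 1), N)  # out-of-range moves stay at n
                key = (m_next, n_next)
                row[key] = row.get(key, 0.0) + prob
            trans[(m, n)] = row
    return trans

T = bar_pointer_transitions(M=8, N=3, p_change=0.2)
```

Each state has at most three successors, so the transition matrix is sparse, which keeps exact filtering and smoothing cheap even for fine position grids.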
Bar position Pointer - transition model
[Figure: three panels of transition probabilities $p(x_2 \mid x_1)$, tempo level (1 to 5) vs. bar position (1 to 8)]
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over tempo level vs. bar position]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over tempo level vs. bar position]
Bar position Pointer - k = 3
[Figure: $p(x_3)$ over tempo level vs. bar position]
Bar position Pointer - k = 4
[Figure: $p(x_4)$ over tempo level vs. bar position]
Bar position Pointer - k = 5
[Figure: $p(x_5 \mid y_{1:5})$ over tempo level vs. bar position, after observing $y_5 = 0$]
Bar position Pointer - k = 10
[Figure: $p(x_{10})$ over tempo level vs. bar position]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson intensity
[Figure: Poisson intensity $\mu_k$ versus bar position $m_k$ (0 to 1000) for a triplet rhythm and a duplet rhythm]
Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Figure: dynamic Bayesian network with nodes $n_k$, $\theta_k$, $m_k$, $r_k$ ($k = 0, \dots, 3$), intensities $\lambda_k$ and observations $y_k$ ($k = 1, \dots, 3$)]
• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator
Filtering
[Figure: observed data $y_k$; filtered $\log p(m_k \mid y_{1:k})$ over bar position $m_k$; $\log p(n_k \mid y_{1:k})$ over tempo (quarter notes per minute, 60 to 180); $p(r_k \mid y_{1:k})$ for triplets vs. duplets; frame index $k$ from 0 to 450]
Smoothing
[Figure: observed data $y_k$; smoothed $\log p(m_k \mid y_{1:K})$ over bar position $m_k$; $\log p(n_k \mid y_{1:K})$ over tempo (quarter notes per minute, 60 to 180); $p(r_k \mid y_{1:K})$ for triplets vs. duplets; frame index $k$ from 0 to 450]
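Filtering and smoothing in such a model are the standard HMM forward and forward-backward recursions. The sketch below assumes a toy version of the model (position advances by the tempo level modulo $M$, tempo moves by at most one level, and $y_k \sim \text{Poisson}(\mu_{m_k})$ with a user-supplied, hypothetical intensity template `mu`); it returns both $p(x_k \mid y_{1:k})$ and $p(x_k \mid y_{1:K})$. It is not the published implementation, which also includes rhythm and meter variables.

```python
import math

def forward_backward(y, M, N, mu, p_change, prior=None):
    """Filtered p(x_k | y_1:k) and smoothed p(x_k | y_1:K) for a toy bar-pointer HMM.

    Assumed model (illustrative): state x = (m, n); m advances by n modulo M;
    n moves one level up or down with probability p_change / 2 each (clipped
    to 1..N); y_k ~ Poisson(mu[m_k - 1]) with all mu > 0.
    """
    states = [(m, n) for m in range(1, M + 1) for n in range(1, N + 1)]
    index = {s: i for i, s in enumerate(states)}
    S = len(states)
    succ = [[] for _ in range(S)]                  # succ[i] = [(j, p(j | i)), ...]
    for i, (m, n) in enumerate(states):
        m_next = (m - 1 + n) % M + 1
        for dn, prob in ((-1, p_change / 2), (0, 1.0 - p_change), (1, p_change / 2)):
            succ[i].append((index[(m_next, min(max(n + dn, 1), N))], prob))

    def lik(yk):
        # Poisson likelihood p(y_k | x) for every state x
        return [math.exp(yk * math.log(mu[m - 1]) - mu[m - 1] - math.lgamma(yk + 1))
                for (m, _) in states]

    def normalise(p):
        total = sum(p)
        return [q / total for q in p]

    filtered = []                                  # forward pass
    pred = list(prior) if prior is not None else [1.0 / S] * S
    for yk in y:
        post = normalise([p * lk for p, lk in zip(pred, lik(yk))])
        filtered.append(post)
        pred = [0.0] * S
        for i, pi in enumerate(post):
            for j, prob in succ[i]:
                pred[j] += pi * prob

    beta = [1.0] * S                               # backward pass
    smoothed = [None] * len(y)
    for k in range(len(y) - 1, -1, -1):
        smoothed[k] = normalise([f * b for f, b in zip(filtered[k], beta)])
        if k > 0:
            lk = lik(y[k])
            beta = normalise([sum(prob * lk[j] * beta[j] for j, prob in succ[i])
                              for i in range(S)])
    return states, filtered, smoothed
```

The filtered distributions can be computed online, one frame at a time; smoothing needs the whole sequence, which matches the online/offline distinction made for score following.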
Time Signature
[Figure: observed data (sample value vs. time/s, 0 to 12 s); $\log p(m_k \mid z_{1:K})$ over bar position $m_k$; $\log p(n_k \mid z_{1:K})$ over tempo (quarter notes per minute, 52 to 155); $p(\theta_k \mid z_{1:K})$ for 4/4 vs. 3/4; frame index $k$ from 0 to 500]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt and audio waveform $x_t$]
Score-Performance matching - Graphical Model
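[Figure: graphical model with chains $t_1, \dots, t_K$; $r_1, \dots, r_K$; $\lambda_1, \dots, \lambda_K$; and, in a plate over $\nu = 1, \dots, W$, variances $v_{\nu,1}, \dots, v_{\nu,K}$ and coefficients $s_{\nu,1}, \dots, s_{\nu,K}$; inset: $r_k$ over score positions 1 to 8]
$$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau}; a, 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$$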
Score-Performance matching - Signal model
[Figure: $\log \sigma_\nu$ versus frequency ν/Hz (0 to 4000 Hz); vertical range −12 to 0]
Score-Performance matching
[Figure: spectrogram data (frequency 0 to 4000 Hz vs. time 0 to 14 s) and MIDI data (MIDI note 55 to 85 vs. score position)]
Online (filtering) or offline (smoothing) processing is possible.
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80) vs. time/s (1 to 4); $\sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$ on a $\log\lambda$ axis vs. time/s; MDCT of audio (frequency 0 to 4000 Hz vs. time/s; source: Daniel-Ben Pienaar)]
Summary
• “Time domain”: switching state space models
– State space modeling
– Conditionally linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform domain”: Gamma fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical-score-guided source separation
– Prior structures for other observation models: NMF
Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$
[Figure: densities $p(x)$ for $0 \le x \le 5$; first panel $(a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1)$; second panel $(a, b) = (1, 1), (1, 0.5), (2, 1)$]
$$\mathcal{G}(x; a, z) \equiv \exp\big((a-1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$$
$$\mathcal{IG}(x; a, z) \equiv \exp\big((a+1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$$
• Gamma: conjugate prior for a Gaussian precision, a Poisson intensity and an inverse Gamma scale
• Inverse Gamma: conjugate prior for a Gaussian variance and a Gamma scale
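In this parameterisation $z$ is a scale for $\mathcal{G}$, and $x \sim \mathcal{IG}(x; a, z)$ exactly when $1/x \sim \mathcal{G}(a, z)$. A minimal sketch of the two log-densities as written above, with samplers built on Python's `random.gammavariate(alpha, beta)` (shape `alpha`, scale `beta`); the function names are illustrative.

```python
import math
import random

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a-1) log x - x/z + a log(1/z) - log Gamma(a)."""
    return (a - 1) * math.log(x) - x / z + a * math.log(1.0 / z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = (a+1) log(1/x) - 1/(z x) + a log(1/z) - log Gamma(a)."""
    return (a + 1) * math.log(1.0 / x) - 1.0 / (z * x) + a * math.log(1.0 / z) - math.lgamma(a)

def sample_gamma(a, z, rng):
    return rng.gammavariate(a, z)            # z acts as a scale parameter

def sample_inv_gamma(a, z, rng):
    return 1.0 / rng.gammavariate(a, z)      # 1/x ~ G(a, z)

# Moments under this parameterisation: E[x] = a z for G, E[x] = 1 / (z (a - 1)) for IG (a > 1).
rng = random.Random(0)
mean_ig = sum(sample_inv_gamma(3.0, 0.5, rng) for _ in range(20000)) / 20000
```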
Gamma Chains
We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:
$$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$$
$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$$
[Figure: chain $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$ with alternating couplings $a_z, a, a_z$]
• Variance variables $v$ are the prior variances of the source coefficients
• Auxiliary variables $z$ are needed for conjugacy and positive correlation
• Shape parameters $a$ and $a_z$ describe the coupling strength and drift of the chain
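Typical draws like those on the following slides can be produced by ancestral sampling. This sketch reads $\mathcal{IG}(x; s, c)$ as "$1/x \sim \mathcal{G}(s, c)$ with scale $c$", consistent with the density definitions above; the starting value `z1` is a hypothetical choice. Each step composes two inversions, so $\log v_k$ performs a random walk: $\log v_{k+1} = \log v_k + \log(a/a_z) + \log g_2 - \log g_1$ with $g_1 \sim \mathcal{G}(a, 1)$, $g_2 \sim \mathcal{G}(a_z, 1)$, which is the source of the positive correlation.

```python
import math
import random

def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Ancestral sample v_1..v_K of the inverse Gamma-Markov chain.

    v_k | z_k ~ IG(v_k; a, z_k / a) and z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z),
    where x ~ IG(x; s, c) means 1/x ~ Gamma(shape s, scale c).
    """
    rng = random.Random(seed)
    z, v = z1, []
    for _ in range(K):
        vk = 1.0 / rng.gammavariate(a, z / a)
        v.append(vk)
        z = 1.0 / rng.gammavariate(a_z, vk / a_z)
    return v

log_v = [math.log(x) for x in sample_igmc(1000, a=10.0, a_z=4.0)]
```

With $a = a_z$ the increments have zero mean; otherwise the chain drifts, in line with the slide's remark that the shape parameters control coupling and drift.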
Gamma Chains: typical draws
[Figure: three sample paths of $\log v_k$ for $k = 1$ to 1000 (vertical range −20 to 20), with $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains with changepoints: typical draws
[Figure: three sample paths of $\log v_k$ for $k = 1$ to 1000 (vertical range −20 to 20), with $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$]
Gamma Chains
bull The joint can be written as product of singleton and pairwise
potentials of form
ψkk = exp(minusazminus1k vminus1
k ) (Pairwise)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
φzk = exp((az + a+ 1) log zminus1
k ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45
Gamma Fields
bull The joint can be written as product of singleton and pairwise
potentials
ψij = exp(minusaijξminus1i ξminus1
j ) (Pairwise)
φi = exp((sum
j
aij + 1) log ξminus1i ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46
Possible Model Topologies
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47
Approximate Inference
bull Stochastic
ndash Markov Chain Monte Carlo Gibbs sampler
ndash Sequential Monte Carlo Particle Filtering
bull Deterministic
ndash Variational Bayes
In all these conjugacy helps
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))
bull Gibbs
v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z
(τ)k )ψkk+1(z
(τ)k+1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
bull Inference Variational Bayes
Noisy Original
X
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xorg
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xh SNR1998
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xv SNR2079
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xb SNR1968
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xg SNR1997
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51
Denoising ndash MusicOriginal
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Noisy
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
PF SNR853
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Gibbs SNR866
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
VB SNR208
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52
Single Channel Source Separation (with Onur Dikmen)
bull Source 1 Horizontal Tie across time harmonic continuity
bull Source 2 Vertical Tie across frequency transients percussive sounds
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53
Single Channel Source Separation with IGMCs
E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Preminus trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
bull Oracle We use the square of the source coefficient as the latent variance estimate
bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 43
Gamma Chains
We define an inverse Gamma-Markov chain for k = 1 K as follows
vk|zk sim IG(vk a zka)
zk+1|vk sim IG(zk+1 az vkaz)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
bull Variance variables v are priors for sources
bull Auxillary variables z are needed for conjugacy and positive correlation
bull Shape parameters a and az describe coupling strength and drift of the chain
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42
Gamma Chains typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43
Gamma Chains with changepoints: typical draws
[Figure: three sample paths of $\log v_k$ for $k = 1, \dots, 1000$, with $a = 10, a_z = 10$; $a = 10, a_z = 4$; and $a = 10, a_z = 40$.]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$$\psi_{k,k} = \exp\left(-a\, z_k^{-1} v_k^{-1}\right) \quad \text{(pairwise)}$$
$$\phi_{z_k} = \exp\left((a_z + a + 1) \log z_k^{-1}\right) \quad \text{(singletons)}$$
(Graphical model: $z_1 \to \cdots \to v_{k-1} \to z_k \to v_k \to z_{k+1} \to \cdots$, with couplings $a_z$, $a$, $a_z$ on successive edges.)
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$$\psi_{i,j} = \exp\left(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}\right) \quad \text{(pairwise)}$$
$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big) \log \xi_i^{-1}\Big) \quad \text{(singletons)}$$
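The two potentials translate directly into code. The sketch below (function names are hypothetical) evaluates the unnormalised log-density for a symmetric, non-negative coupling matrix `A` with zero diagonal, and reads off the full conditional of one node. Collecting the terms in $\xi_i$ gives $p(\xi_i \mid \xi_{\setminus i}) \propto \xi_i^{-(\sum_j a_{ij} + 1)} \exp\big(-\xi_i^{-1} \sum_j a_{ij} \xi_j^{-1}\big)$, i.e. $1/\xi_i$ is Gamma with shape $\sum_j a_{ij}$ and rate $\sum_j a_{ij} \xi_j^{-1}$; this conditional conjugacy is what Gibbs and VB updates can exploit.

```python
import numpy as np


def gamma_mrf_log_potential(xi, A):
    """Unnormalised log-density of a Gamma MRF (illustrative sketch).

    xi : positive variables, shape (n,)
    A  : symmetric non-negative couplings a_ij with zero diagonal, shape (n, n)
    Returns sum_{i<j} -a_ij / (xi_i xi_j) + sum_i (sum_j a_ij + 1) log(1 / xi_i).
    """
    inv = 1.0 / np.asarray(xi, dtype=float)
    pairwise = -0.5 * (inv @ A @ inv)  # 0.5 counts each unordered pair once
    singleton = np.sum((A.sum(axis=1) + 1.0) * np.log(inv))
    return pairwise + singleton


def full_conditional(i, xi, A):
    """Shape and rate of the Gamma full conditional of 1/xi_i."""
    inv = 1.0 / np.asarray(xi, dtype=float)
    return A[i].sum(), A[i] @ inv  # A[i, i] == 0, so xi_i itself drops out


# The chain z_1 - v_1 - z_2 - v_2 as a special case (node order z1, v1, z2, v2):
# couplings alternate a on (z_k, v_k) and a_z on (v_k, z_{k+1}).
a, a_z = 10.0, 4.0
A = np.zeros((4, 4))
for (i, j), w in {(0, 1): a, (1, 2): a_z, (2, 3): a}.items():
    A[i, j] = A[j, i] = w
shape_z2, rate_z2 = full_conditional(2, [1.0, 2.0, 0.5, 4.0], A)  # z_2 | v_1, v_2
```

For an interior $z_k$ the shape is $a_z + a$, matching the exponent $a_z + a + 1$ of the chain singleton $\phi_{z_k}$.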
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov Chain Monte Carlo: Gibbs sampler
– Sequential Monte Carlo: particle filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps: the required full conditionals (Gibbs) or expectations (VB) are available in closed form.
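VB or Gibbs
(Factor graph, left to right: $\psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, \cdots$, with observation factors $p(y_1 \mid v_1)$ and $p(y_2 \mid v_2)$.)
• VB
$$q^{(\tau)}(v_k) \leftarrow \exp\Big(\phi_{v_k} + \log p(y_k \mid v_k) + \big\langle \log\psi_{k,k} + \log\psi_{k,k+1} \big\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\Big)$$
• Gibbs
$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \phi_{v_k}\, \psi_{k,k}\big(z_k^{(\tau)}\big)\, \psi_{k,k+1}\big(z_{k+1}^{(\tau)}\big)$$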
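A mean-field sketch of these coordinate updates, assuming the observation model $y_k \sim \mathcal{N}(0, v_k)$ (the chain acts directly as the variance prior of observed coefficients, with no additive noise), the scale convention in which $1/v_k$ and $1/z_k$ are Gamma, and Gamma-field singletons at the two ends of the chain (shape = sum of couplings to the neighbours that exist). Because every potential is conjugate, each factor $q$ stays inverse Gamma and the updates only exchange the expectations $\langle 1/v_k \rangle$ and $\langle 1/z_k \rangle$. The function name and initialisation are illustrative.

```python
import numpy as np


def vb_igmc(y, a, a_z, n_iter=1000):
    """Mean-field VB for an IGMC variance prior with y_k ~ N(0, v_k) (sketch).

    Chain z_1 - v_1 - z_2 - v_2 - ... - z_K - v_K, coupling a on (z_k, v_k)
    and a_z on (v_k, z_{k+1}). Each q is inverse Gamma: 1/x ~ Gamma(shape, rate).
    Returns the variational posterior means E_q[v_k].
    """
    y = np.asarray(y, dtype=float)
    K = y.size
    e_inv_v = np.full(K, 1.0 / (y.var() + 1e-12))  # <1/v_k>, crude initialisation

    # Shapes do not depend on the other factors, so they are fixed up front.
    shape_z = np.full(K, float(a))
    shape_z[1:] += a_z             # z_k (k > 1) also touches v_{k-1}
    shape_v = np.full(K, a + 0.5)  # +1/2 from the Gaussian likelihood
    shape_v[:-1] += a_z            # v_k (k < K) also touches z_{k+1}

    for _ in range(n_iter):
        # q(z_k): rate from <1/v_k> (coupling a) and <1/v_{k-1}> (coupling a_z).
        rate_z = a * e_inv_v
        rate_z[1:] += a_z * e_inv_v[:-1]
        e_inv_z = shape_z / rate_z
        # q(v_k): rate from <1/z_k>, <1/z_{k+1}> and the data term y_k^2 / 2.
        rate_v = a * e_inv_z + 0.5 * y**2
        rate_v[:-1] += a_z * e_inv_z[1:]
        e_inv_v = shape_v / rate_v
    return rate_v / (shape_v - 1.0)  # inverse-Gamma mean; needs shape_v > 1 (a > 1/2)


# Toy signal: loud first half, quiet second half.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 10.0, 200), rng.normal(0.0, 0.1, 200)])
v_hat = vb_igmc(y, a=4.0, a_z=4.0)
```

With strong coupling each sweep moves the estimates only slightly, because every node sees a single observation and information travels one link per iteration, so `n_iter` may need to be large.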
VB or Gibbs
(Factor graph, left to right: $\psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, \cdots$, with observation factors $p(y_1 \mid v_1)$ and $p(y_2 \mid v_2)$.)
• VB
$$q^{(\tau)}(z_k) \leftarrow \exp\Big(\phi_{z_k} + \big\langle \log\psi_{k,k-1} + \log\psi_{k,k} \big\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\Big)$$
• Gibbs
$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \phi_{z_k}\, \psi_{k,k-1}\big(v_{k-1}^{(\tau)}\big)\, \psi_{k,k}\big(v_k^{(\tau)}\big)$$
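A Gibbs sampler sketch for the chain, assuming $y_k \sim \mathcal{N}(0, v_k)$, the scale convention in which $1/v_k$ and $1/z_k$ are Gamma, and Gamma-field singletons at the chain ends. Collecting the potentials gives $1/z_k \sim \mathcal{G}(a + a_z,\ \text{rate } a/v_k + a_z/v_{k-1})$ and $1/v_k \sim \mathcal{G}(a + a_z + 1/2,\ \text{rate } a/z_k + a_z/z_{k+1} + y_k^2/2)$. Since the $z$'s are conditionally independent given the $v$'s (and vice versa), each half-sweep draws all of them in one vectorised call. Names and defaults are illustrative.

```python
import numpy as np


def gibbs_igmc(y, a, a_z, n_sweeps=600, burn_in=300, seed=0):
    """Blocked Gibbs sampler for an IGMC variance prior with y_k ~ N(0, v_k) (sketch).

    Chain z_1 - v_1 - z_2 - v_2 - ... - z_K - v_K, coupling a on (z_k, v_k)
    and a_z on (v_k, z_{k+1}). Returns the Monte Carlo estimate of E[v_k | y].
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    K = y.size
    v = np.full(K, y.var() + 1e-12)  # crude initialisation
    v_sum = np.zeros(K)

    shape_z = np.full(K, float(a))
    shape_z[1:] += a_z
    shape_v = np.full(K, a + 0.5)
    shape_v[:-1] += a_z

    for t in range(n_sweeps):
        # z_k | v_{k-1}, v_k: 1/z_k ~ Gamma(shape_z, rate = a/v_k + a_z/v_{k-1}).
        rate_z = a / v
        rate_z[1:] += a_z / v[:-1]
        z = 1.0 / rng.gamma(shape_z, 1.0 / rate_z)  # NumPy takes scale = 1/rate
        # v_k | z_k, z_{k+1}, y_k: 1/v_k ~ Gamma(shape_v, a/z_k + a_z/z_{k+1} + y_k^2/2).
        rate_v = a / z + 0.5 * y**2
        rate_v[:-1] += a_z / z[1:]
        v = 1.0 / rng.gamma(shape_v, 1.0 / rate_v)
        if t >= burn_in:
            v_sum += v
    return v_sum / (n_sweeps - burn_in)


rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 10.0, 200), rng.normal(0.0, 0.1, 200)])
v_hat = gibbs_igmc(y, a=4.0, a_z=4.0)
```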
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: spectrograms of the noisy input (X), the original (Xorg) and four reconstructions: Xh (SNR 19.98 dB), Xv (SNR 20.79 dB), Xb (SNR 19.68 dB) and Xg (SNR 19.97 dB).]
Denoising – Music
[Figure: spectrograms of the original, the noisy input, and the PF (SNR 8.53 dB), Gibbs (SNR 8.66 dB) and VB estimates.]
“Tristram” (Matt Uelmen) + ~0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal, tied across time (harmonic continuity)
• Source 2: vertical, tied across frequency (transients, percussive sounds)
[Figure: time (τ) by frequency bin (ν) panels for the original (Xorg) and the horizontal (Shor) and vertical (Sver) components.]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix
                  s1                        s2
Method            SDR     SIR     SAR       SDR     SIR     SAR
VB                -4.74   -3.28    5.67     -1.58   15.46   -1.37
Gibbs             -4.5    -2.62    4.57      1.05   12.46    1.61
GibbsEM           -4.23   -2.42    4.82      1.34   13.13    1.85
Pre-trained       -4.04   -3.15    8.13      3.56   11.44    4.64
Oracle             6.14   17.16    6.58     12.66   19.95   13.6
(All values in dB.)
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
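Under a Gaussian source model ($s_i \sim \mathcal{N}(0, v_i)$ per coefficient, independent, mixture $x = s_1 + s_2$), the posterior mean of each source given variance estimates is the Wiener-style gain $v_i / \sum_j v_j$ applied to $x$. The sketch below shows how the Oracle row's variance estimate ($v_i = s_i^2$) could be turned into source estimates under that assumption; it scores them with a plain signal-to-error ratio, which is simpler than the SDR/SIR/SAR decomposition reported in the tables. Function names and the toy data are hypothetical.

```python
import numpy as np


def posterior_mean_sources(x, variances, eps=1e-12):
    """E[s_i | x] = v_i / sum_j v_j * x for x = sum_i s_i, s_i ~ N(0, v_i) independent."""
    v = np.asarray(variances, dtype=float) + eps  # eps avoids 0/0 in silent bins
    return v / v.sum(axis=0) * x


def snr_db(reference, estimate):
    """Plain signal-to-error ratio in dB (not the full BSS Eval SDR)."""
    err = reference - estimate
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(err**2))


rng = np.random.default_rng(0)
s = rng.normal(size=(2, 1000)) * np.array([[1.0], [0.3]])  # two toy sources
x = s.sum(axis=0)                                          # single-channel mix
oracle = posterior_mean_sources(x, s**2)                   # Oracle: v_i = s_i^2
scores = [snr_db(s[i], oracle[i]) for i in range(2)]
```

Because the gains sum to one in every bin, the estimated sources always add back up to the mixture.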
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix
                  s1                        s2
Method            SDR     SIR     SAR       SDR     SIR     SAR
VB                -7.8    -6.22    4.53     -2.35   18.4    -2.25
Gibbs             -8.46   -7.53    6.93     -4.04   14.59   -3.83
GibbsEM           -7.74   -6.19    4.62     -1.14   16.62   -0.97
Pre-trained       -6.4    -5.39    6.95      3.8    16.39    4.14
Oracle            12.1    32.9    12.14     21.13   33.89   21.37
(All values in dB.)
Harmonic-Transient Decomposition
[Figure: time (τ) by frequency bin (ν) panels for the original (Xorg), the horizontal component (Shor) and the vertical component (Sver).]
Chord Detection - Signal model (with Paul Peeling)
[Figure: $\log \sigma_\nu$ versus frequency $\nu$ (Hz), 0 to 4000 Hz.]
Chord Detection
[Figure: MDCT of the piano chord 41, 48, 51, 56 (frequency in Hz versus time τ in s), and $\log \sum_\nu v_{\nu j \tau}$ for each MIDI note $j$ (40 to 65) versus time τ (s).]
Multichannel Source Separation
• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$$
$$v_{k,n} \sim \mathcal{IG}\big(v_{k,n}; \nu/2, 2/(\nu\lambda_n)\big)$$
$$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$
$$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$$
$$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$$
for sources $n = 1, \dots, N$, channels $m = 1, \dots, M$ and coefficients $k = 1, \dots, K$.
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$
Source Separation
[Figure: spectrograms (f in Hz versus t in sec) of the three sources (Speech, Piano, Guitar) and the mixture (Mix).]
Reconstructions
[Figure: spectrograms (f in Hz versus t in sec) of the reconstructed Speech, Piano and Guitar sources.]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
[Figure: trajectories of $a$, $\lambda$ and $r$ versus epoch (0 to 2000).]
Tempo tracking and score performance matching
• Given expressive music data (onsets/detections/spectral features)
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– …
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: lattice of states with bar position $m_k = 1, \dots, 8$ on the horizontal axis and tempo level $n_k = 1, 2, 3$ on the vertical axis; bar divisions marked for 3/4 time and 4/4 time.]
• Each dot denotes a state $x = (m, n)$ (score position – tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three panels of $p(x_2 \mid x_1)$ over bar position (1 to 8) and tempo level (1 to 5).]
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over bar position and tempo level.]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over bar position and tempo level.]
Bar position Pointer - k = 3
[Figure: $p(x_3)$ over bar position and tempo level.]
Bar position Pointer - k = 4
[Figure: $p(x_4)$ over bar position and tempo level.]
Bar position Pointer - k = 5
[Figure: $p(x_5 \mid y_{1:5})$ over bar position and tempo level, after observing $y_5 = 0$.]
Bar position Pointer - k = 10
[Figure: $p(x_{10})$ over bar position and tempo level.]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson intensity
[Figure: Poisson intensity $\mu_k$ versus bar position $m_k$ (0 to 1000) for a triplet rhythm and for a duplet rhythm.]
Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Figure: dynamic Bayesian network with nodes $n_k$, $\theta_k$, $m_k$ and $r_k$ for $k = 0, \dots, 3$, and nodes $\lambda_k$ with observations $y_k$ for $k = 1, 2, 3$.]
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator
Filtering
[Figure: observed data $y_k$; filtering distributions $\log p(m_k \mid y_{1:k})$ over bar position, $\log p(n_k \mid y_{1:k})$ over tempo (quarter notes per min, 60 to 180) and $p(r_k \mid y_{1:k})$ for triplets versus duplets, all against frame index $k$.]
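A toy version of the bar-pointer HMM and its forward (filtering) recursion. Only the overall structure comes from the slides (states $x = (m, n)$, sparse transitions, Poisson observations, filtering); the specific dynamics are assumptions: the position advances by the tempo level each frame, $m_{k+1} = (m_k + n_k - 1) \bmod M + 1$, the tempo level moves up or down by one with small probability `p_change`, and the intensity profile `mu_pos` is hypothetical.

```python
import numpy as np


def bar_pointer_transition(M, N, p_change=0.1):
    """Transition matrix over states x = (m, n), flattened to index m * N + n.

    m = 0..M-1 is the bar position and n = 0..N-1 the tempo level (0-based).
    Assumed dynamics: the position advances by n + 1 per frame, wrapping around
    the bar; the tempo moves to n - 1 or n + 1 with total probability p_change.
    """
    T = np.zeros((M * N, M * N))
    for m in range(M):
        for n in range(N):
            m_next = (m + n + 1) % M
            moves = {n: 1.0 - p_change}
            for n_next in (n - 1, n + 1):
                if 0 <= n_next < N:
                    moves[n_next] = p_change / 2
            total = sum(moves.values())  # renormalise at the lowest/highest tempo
            for n_next, p in moves.items():
                T[m * N + n, m_next * N + n_next] = p / total
    return T


def forward_filter(y_seq, T, prior, mu):
    """Filtering distributions p(x_k | y_{1:k}) for Poisson counts y_k.

    mu : (M * N,) Poisson intensity per state. The log(y!) term is identical
    for every state, so it cancels in the normalisation and is omitted.
    """
    alpha = np.asarray(prior, dtype=float)
    out = []
    for k, y in enumerate(y_seq):
        if k > 0:
            alpha = alpha @ T                          # predict
        loglik = y * np.log(mu) - mu                   # Poisson log-likelihood + const
        alpha = alpha * np.exp(loglik - loglik.max())  # update, numerically stabilised
        alpha /= alpha.sum()
        out.append(alpha)
    return np.array(out)


M, N = 8, 5
T = bar_pointer_transition(M, N)
mu_pos = np.where(np.arange(M) % 2 == 0, 2.0, 0.1)  # hypothetical onset profile
mu = np.repeat(mu_pos, N)                            # same intensity at every tempo
prior = np.full(M * N, 1.0 / (M * N))
post = forward_filter([0, 2, 0, 1, 3], T, prior, mu)
```

A dense matrix keeps the sketch short; for realistic numbers of positions and tempo levels a sparse representation of `T` would be the natural choice, since each state has at most three successors.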
Smoothing
[Figure: observed data $y_k$; smoothing distributions $\log p(m_k \mid y_{1:K})$ over bar position, $\log p(n_k \mid y_{1:K})$ over tempo (quarter notes per min, 60 to 180) and $p(r_k \mid y_{1:K})$ for triplets versus duplets, all against frame index $k$.]
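Smoothing adds a backward pass to filtering. The sketch below is the standard scaled forward-backward recursion for a generic discrete HMM, given a transition matrix, an initial distribution and per-frame likelihood vectors; it is not specific to the bar-pointer model, and the function name is illustrative.

```python
import numpy as np


def forward_backward(T, prior, lik):
    """Smoothed marginals p(x_k | y_{1:K}) of a discrete HMM.

    T     : (S, S) transition matrix, T[i, j] = p(x_{k+1} = j | x_k = i)
    prior : (S,) distribution of x_1
    lik   : (K, S) likelihoods p(y_k | x_k = s)
    """
    K, S = lik.shape
    alpha = np.empty((K, S))
    c = np.empty(K)                     # per-frame normalisers p(y_k | y_{1:k-1})
    a = prior * lik[0]
    c[0] = a.sum()
    alpha[0] = a / c[0]
    for k in range(1, K):               # forward pass: filtering
        a = (alpha[k - 1] @ T) * lik[k]
        c[k] = a.sum()
        alpha[k] = a / c[k]
    beta = np.ones((K, S))
    for k in range(K - 2, -1, -1):      # backward pass, scaled by the same normalisers
        beta[k] = T @ (lik[k + 1] * beta[k + 1]) / c[k + 1]
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```

Scaling by the forward normalisers keeps both passes in a safe numerical range for long sequences, which matters for frame counts in the hundreds as in the plots.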
Time Signature
[Figure: observed audio (sample value versus time in s, 0 to 12 s); $\log p(m_k \mid z_{1:K})$ over bar position, $\log p(n_k \mid z_{1:K})$ over tempo (quarter notes per min) and $p(\theta_k \mid z_{1:K})$ for 4/4 versus 3/4, all against frame index $k$.]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score and audio signal $x_t$ versus time $t$.]
Score-Performance matching - Graphical Model
[Figure: graphical model with nodes $t_k$, $r_k$ and $\lambda_k$ for $k = 1, \dots, K$, and, for each frequency $\nu = 1, \dots, W$, variances $v_{\nu k}$ and coefficients $s_{\nu k}$.]
$$v_{\nu\tau} \sim \mathcal{IG}\big(v_{\nu\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau))\big)$$
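Integrating out $v_{\nu\tau}$ gives a closed-form frame likelihood. Assuming the scale convention ($1/v_{\nu\tau}$ is Gamma with shape $a$ and scale $1/(a\lambda\sigma_\nu(r_\tau))$) and $s_{\nu\tau} \sim \mathcal{N}(0, v_{\nu\tau})$ as in the other models, each coefficient is marginally Student-t with $2a$ degrees of freedom and squared scale $\lambda\sigma_\nu(r_\tau)$. The sketch scores one frame against a set of spectral templates $\sigma(r)$ with a flat prior over $r$; the comb-shaped templates and the value of λ are hypothetical inputs.

```python
import numpy as np
from math import lgamma, log, pi


def frame_log_likelihood(s, sigma, lam, a):
    """log p(s_{1:W} | template sigma) with v_nu integrated out (Student-t marginals).

    s     : (W,) MDCT coefficients of one frame
    sigma : (W,) spectral template sigma_nu(r) for one note or chord r
    """
    dof = 2.0 * a
    scale2 = lam * np.asarray(sigma, dtype=float)
    const = lgamma((dof + 1) / 2) - lgamma(dof / 2) - 0.5 * log(dof * pi)
    return np.sum(const - 0.5 * np.log(scale2)
                  - (dof + 1) / 2 * np.log1p(s**2 / (dof * scale2)))


def best_template(s, templates, lam, a):
    """Index r maximising the frame likelihood (flat prior over templates)."""
    scores = [frame_log_likelihood(s, sig, lam, a) for sig in templates]
    return int(np.argmax(scores)), scores


W = 256
bins = np.arange(W)
templates = [np.where(bins % 20 == 0, 1.0, 1e-3),   # hypothetical harmonic combs
             np.where(bins % 27 == 0, 1.0, 1e-3)]
rng = np.random.default_rng(0)
s = rng.normal(0.0, np.sqrt(templates[0]))           # frame drawn with template 0's variances
r_hat, scores = best_template(s, templates, lam=1.0, a=2.0)
```

In a matching or transcription system these per-frame scores would serve as the observation term of an HMM over score positions or notes.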
Score-Performance matching - Signal model
[Figure: $\log \sigma_\nu$ versus frequency $\nu$ (Hz), 0 to 4000 Hz.]
Score-Performance matching
[Figure: spectrogram data (frequency in Hz versus time in s, 0 to 14 s) and MIDI data (MIDI note 55 to 85 versus score position).]
Online (filtering) or offline (smoothing) processing is possible.
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ (MIDI note number 60 to 80 versus time in s); $\log\lambda$ with $\sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$ versus time in s; MDCT of the audio (frequency in Hz versus time in s), source: Daniel-Ben Pienaar.]
Summary
• “Time Domain”: Switching State Space Models
– State space modeling
– Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform Domain”: Gamma Fields
– Models on (orthogonal) transform coefficients: energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical score guided source separation
– Prior structures for other observation models: NMF
Page 44
Gamma Chains typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43
Gamma Chains with changepoints typical draws
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 10
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 4
100 200 300 400 500 600 700 800 900 1000
minus20
0
20
log
v k
a = 10 az = 40
k
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44
Gamma Chains
bull The joint can be written as product of singleton and pairwise
potentials of form
ψkk = exp(minusazminus1k vminus1
k ) (Pairwise)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
φzk = exp((az + a+ 1) log zminus1
k ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45
Gamma Fields
bull The joint can be written as product of singleton and pairwise
potentials
ψij = exp(minusaijξminus1i ξminus1
j ) (Pairwise)
φi = exp((sum
j
aij + 1) log ξminus1i ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46
Possible Model Topologies
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47
Approximate Inference
bull Stochastic
ndash Markov Chain Monte Carlo Gibbs sampler
ndash Sequential Monte Carlo Particle Filtering
bull Deterministic
ndash Variational Bayes
In all these conjugacy helps
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))
bull Gibbs
v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z
(τ)k )ψkk+1(z
(τ)k+1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
bull Inference Variational Bayes
Noisy Original
X
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xorg
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xh SNR1998
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xv SNR2079
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xb SNR1968
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xg SNR1997
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51
Denoising ndash MusicOriginal
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Noisy
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
PF SNR853
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Gibbs SNR866
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
VB SNR208
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52
Single Channel Source Separation (with Onur Dikmen)
bull Source 1 Horizontal Tie across time harmonic continuity
bull Source 2 Vertical Tie across frequency transients percussive sounds
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53
Single Channel Source Separation with IGMCs
E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Preminus trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
bull Oracle We use the square of the source coefficient as the latent variance estimate
bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 45
Gamma Chains with changepoints: typical draws
[Figure: three sample paths of $\log v_k$ for $k = 1, \dots, 1000$ (vertical range -20 to 20), with $a = 10, a_z = 10$; $a = 10, a_z = 4$; and $a = 10, a_z = 40$]
Gamma Chains
• The joint can be written as a product of singleton and pairwise potentials of the form
$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (Pairwise)
$\phi_{z_k} = \exp\big((a_z + a + 1) \log z_k^{-1}\big)$ (Singletons)
[Figure: chain $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$ with coupling $a_z$ between $v_{k-1}$ and $z_k$, $a$ between $z_k$ and $v_k$, and $a_z$ between $v_k$ and $z_{k+1}$]
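A minimal sketch of drawing one sample path from an inverse-Gamma Markov chain. It assumes the directed form $z_k \mid v_{k-1} \sim \mathcal{IG}(a_z, a_z/v_{k-1})$ and $v_k \mid z_k \sim \mathcal{IG}(a, a/z_k)$, where $\mathcal{IG}(\text{shape}, \text{scale})$ has density proportional to $x^{-\text{shape}-1} \exp(-\text{scale}/x)$. Multiplying these conditionals reproduces the pairwise terms $\exp(-a_z v_{k-1}^{-1} z_k^{-1})$ and $\exp(-a z_k^{-1} v_k^{-1})$ and the singleton exponent $a_z + a + 1$ above, up to boundary terms. The function names and the initial value `v0` are hypothetical, and no changepoints are modelled.

```python
import math
import random


def sample_inverse_gamma(rng, shape, scale):
    """Draw x ~ IG(shape, scale), whose density is proportional to x**(-shape - 1) * exp(-scale / x).

    If x ~ IG(shape, scale) then 1/x ~ Gamma(shape, rate=scale); random.gammavariate
    takes a *scale* parameter, so the rate is passed as 1/scale.
    """
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


def sample_igmc(num_steps, a, a_z, v0=1.0, seed=None):
    """One sample path v_1..v_K of an inverse-Gamma Markov chain (assumed directed form).

    z_k | v_{k-1} ~ IG(a_z, a_z / v_{k-1})   coupling a_z between v_{k-1} and z_k
    v_k | z_k     ~ IG(a,   a   / z_k)       coupling a   between z_k and v_k
    Larger a and a_z mean stronger coupling, hence a smoother log v_k.
    """
    rng = random.Random(seed)
    v_prev = v0
    path = []
    for _ in range(num_steps):
        z = sample_inverse_gamma(rng, a_z, a_z / v_prev)
        v = sample_inverse_gamma(rng, a, a / z)
        path.append(v)
        v_prev = v
    return path


if __name__ == "__main__":
    for a_z in (10.0, 4.0, 40.0):
        logs = [math.log(v) for v in sample_igmc(1000, a=10.0, a_z=a_z, seed=0)]
        print(f"a = 10, a_z = {a_z}: log v_k ranges over [{min(logs):.2f}, {max(logs):.2f}]")
```

Since $E[1/z_k \mid v_{k-1}] = v_{k-1}$ and $E[1/v_k \mid z_k] = z_k$ under this form, $v_k$ stays near $v_{k-1}$ on average, which is what makes the chain a smooth random walk in $\log v_k$.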
Gamma Fields
• The joint can be written as a product of singleton and pairwise potentials
$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$ (Pairwise)
$\phi_i = \exp\big((\sum_j a_{ij} + 1) \log \xi_i^{-1}\big)$ (Singletons)
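The two potentials above can be evaluated directly to get the unnormalised log density of a Gamma field. A minimal sketch, assuming the undirected edges are passed as a dictionary `{(i, j): a_ij}` with each edge listed once (a hypothetical input format). The Gamma chain is the special case with nodes $z_1, v_1, z_2, v_2, \dots$ and couplings alternating between $a$ ($z_k$ to $v_k$) and $a_z$ ($v_k$ to $z_{k+1}$), which gives $z_k$ the singleton exponent $a_z + a + 1$.

```python
import math


def gamma_field_log_potential(xi, couplings):
    """Unnormalised log density of a Gamma field: sum of log singleton and pairwise potentials.

    xi        : list of positive variables xi_i
    couplings : dict {(i, j): a_ij} over undirected edges, each listed once (i != j)

    log psi_ij = -a_ij / (xi_i * xi_j)                 (pairwise)
    log phi_i  = (sum_j a_ij + 1) * log(1 / xi_i)      (singleton)
    """
    degree = [0.0] * len(xi)  # accumulates sum_j a_ij for every node i
    log_p = 0.0
    for (i, j), a_ij in couplings.items():
        degree[i] += a_ij
        degree[j] += a_ij
        log_p -= a_ij / (xi[i] * xi[j])
    for i, x in enumerate(xi):
        log_p += (degree[i] + 1.0) * -math.log(x)
    return log_p


# Gamma chain z_1 -a- v_1 -a_z- z_2 -a- v_2 written as a field (node order z1, v1, z2, v2)
a, a_z = 10.0, 4.0
chain_edges = {(0, 1): a, (1, 2): a_z, (2, 3): a}
print(gamma_field_log_potential([1.0, 0.5, 2.0, 0.5], chain_edges))
```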
Possible Model Topologies
Approximate Inference
• Stochastic
– Markov Chain Monte Carlo: Gibbs sampler
– Sequential Monte Carlo: particle filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps.
VB or Gibbs
[Figure: chain $\psi_{0,1} - z_1 - \psi_{1,1} - v_1 - \psi_{1,2} - z_2 - \psi_{2,2} - v_2 \cdots$ with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$q^{(\tau)}(v_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\big)$
• Gibbs
$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$
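The VB updates above can be sketched for a concrete observation model. This assumes $y_k \mid v_k \sim \mathcal{N}(0, v_k)$ (the slide only writes $p(y_k \mid v_k)$), a fixed known $v_0$ at the left boundary, and the chain $v_0 - z_1 - v_1 - \dots - z_K - v_K$ with couplings $a_z$ and $a$ as on the Gamma Chains slide. By conjugacy every factor $q(z_k)$ and $q(v_k)$ stays inverse-Gamma, so each update only needs the expectations $E[1/z]$ and $E[1/v] = \text{shape}/\text{scale}$. The function name and defaults are hypothetical; it requires $a > 1/2$ so that the posterior mean of the last $v_K$ exists.

```python
def vb_igmc(y, a, a_z, v0=1.0, num_iters=100):
    """Mean-field VB for an inverse-Gamma Markov chain with y_k ~ N(0, v_k) (assumed model).

    Chain: v_0 (fixed) -a_z- z_1 -a- v_1 -a_z- z_2 -a- v_2 ... z_K -a- v_K.
    q(z_k) and q(v_k) are IG(shape, scale); E[1/x] = shape / scale under IG.
    Returns the posterior means E_q[v_k] = scale / (shape - 1).
    """
    K = len(y)
    e_inv_z = [1.0] * K  # E_q[1/z_k]
    e_inv_v = [1.0] * K  # E_q[1/v_k]
    v_shape = [0.0] * K
    v_scale = [0.0] * K
    for _ in range(num_iters):
        # q(z_k): neighbours v_{k-1} (coupling a_z) and v_k (coupling a)
        for k in range(K):
            inv_prev = 1.0 / v0 if k == 0 else e_inv_v[k - 1]
            shape = a_z + a
            scale = a_z * inv_prev + a * e_inv_v[k]
            e_inv_z[k] = shape / scale
        # q(v_k): neighbour z_k (a), neighbour z_{k+1} (a_z, absent for the last v_K),
        # and the Gaussian likelihood, which adds 1/2 to the shape and y_k^2 / 2 to the scale
        for k in range(K):
            shape = a + 0.5
            scale = a * e_inv_z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:
                shape += a_z
                scale += a_z * e_inv_z[k + 1]
            v_shape[k], v_scale[k] = shape, scale
            e_inv_v[k] = shape / scale
    return [scale / (shape - 1.0) for shape, scale in zip(v_shape, v_scale)]


quiet_then_loud = [0.1] * 10 + [10.0] * 10
print([round(v, 3) for v in vb_igmc(quiet_then_loud, a=1.0, a_z=1.0)])
```

The updates are done as alternating sweeps over all $z_k$ and then all $v_k$; any order that visits every factor converges to a local optimum of the mean-field objective.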
VB or Gibbs
[Figure: chain $\psi_{0,1} - z_1 - \psi_{1,1} - v_1 - \psi_{1,2} - z_2 - \psi_{2,2} - v_2 \cdots$ with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]
• VB
$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\big)$
• Gibbs
$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$
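The corresponding Gibbs sampler alternates the two full conditionals, both inverse-Gamma by conjugacy. A minimal sketch under the same assumptions as a VB treatment of this chain: $y_k \mid v_k \sim \mathcal{N}(0, v_k)$, a fixed $v_0$, and the chain $v_0 - z_1 - v_1 - \dots - z_K - v_K$. The full conditionals used below include the singleton factors from the Gamma Chains slide, which is what makes them proper inverse-Gamma densities. The function name, sweep counts and initial values are hypothetical.

```python
import random


def gibbs_igmc(y, a, a_z, v0=1.0, num_sweeps=500, burn_in=100, seed=0):
    """Gibbs sampler for an inverse-Gamma Markov chain with y_k ~ N(0, v_k) (assumed model).

    Full conditionals (IG(shape, scale), density prop. to x**(-shape-1) exp(-scale/x)):
      z_k | v_{k-1}, v_k       ~ IG(a_z + a,       a_z / v_{k-1} + a / v_k)
      v_k | z_k, z_{k+1}, y_k  ~ IG(a + a_z + 1/2, a / z_k + a_z / z_{k+1} + y_k**2 / 2)
    The a_z terms are dropped for the last v_K. Returns Monte Carlo estimates of E[v_k | y].
    """
    rng = random.Random(seed)

    def draw_ig(shape, scale):
        # 1/x ~ Gamma(shape, rate=scale); gammavariate expects a scale, hence 1/scale
        return 1.0 / rng.gammavariate(shape, 1.0 / scale)

    K = len(y)
    v = [1.0] * K
    z = [1.0] * K
    totals = [0.0] * K
    for sweep in range(num_sweeps):
        for k in range(K):
            v_prev = v0 if k == 0 else v[k - 1]
            z[k] = draw_ig(a_z + a, a_z / v_prev + a / v[k])
        for k in range(K):
            shape = a + 0.5
            scale = a / z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:
                shape += a_z
                scale += a_z / z[k + 1]
            v[k] = draw_ig(shape, scale)
        if sweep >= burn_in:  # accumulate only post-burn-in samples
            for k in range(K):
                totals[k] += v[k]
    kept = num_sweeps - burn_in
    return [t / kept for t in totals]


print([round(v, 3) for v in gibbs_igmc([0.1] * 10 + [10.0] * 10, a=1.0, a_z=1.0)])
```

Because the chain is bipartite (all $z$ are conditionally independent given all $v$, and vice versa), each half-sweep could also be done in parallel.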
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: six time-frequency panels (log-magnitude colour scale -14 to 0): Noisy ($X$), Original ($X_{org}$), and four reconstructions $X_h$ (SNR 19.98), $X_v$ (SNR 20.79), $X_b$ (SNR 19.68), $X_g$ (SNR 19.97)]
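Once variance estimates are available, the denoised coefficients and the reported SNR figures can be computed as follows. A minimal sketch, assuming the observation model $x_k = s_k + e_k$ with $s_k \sim \mathcal{N}(0, v_k)$ and $e_k \sim \mathcal{N}(0, r)$; standard Gaussian conditioning then gives the posterior mean $E[s_k \mid x_k] = v_k/(v_k + r)\, x_k$. The SNR uses the common definition $10 \log_{10}(\lVert s \rVert^2 / \lVert s - \hat{s} \rVert^2)$ in dB; whether the slide used exactly this definition is an assumption, and both function names are hypothetical.

```python
import math


def posterior_mean_denoise(x, v, r):
    """E[s_k | x_k] for x_k = s_k + e_k, s_k ~ N(0, v_k), e_k ~ N(0, r).

    Per-coefficient Wiener gain v_k / (v_k + r); in the hierarchical model v_k
    would be replaced by an estimate (for example a VB posterior mean).
    """
    return [vk / (vk + r) * xk for xk, vk in zip(x, v)]


def snr_db(reference, estimate):
    """Signal-to-noise ratio of an estimate in dB: 10 log10(||s||^2 / ||s - s_hat||^2)."""
    signal = sum(s * s for s in reference)
    error = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10(signal / error)


clean = [1.0, -2.0, 0.5, 0.0]
noisy = [1.3, -1.8, 0.2, 0.4]
denoised = posterior_mean_denoise(noisy, v=[1.0, 4.0, 0.25, 0.01], r=0.1)
print(snr_db(clean, noisy), snr_db(clean, denoised))
```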
Denoising – Music
[Figure: five time-frequency panels (log-magnitude colour scale -18 to 0): Original, Noisy, PF (SNR 8.53), Gibbs (SNR 8.66), VB]
“Tristram” (Matt Uelmen) + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: Horizontal, tied across time (harmonic continuity)
• Source 2: Vertical, tied across frequency (transients, percussive sounds)
[Figure: time-frequency panels (time τ vs. frequency bin ν): $X_{org}$, $S_{hor}$, $S_{ver}$]
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai” (King Crimson) + Drums “Territory” (Sepultura) = Mix

Method          s1 SDR  s1 SIR  s1 SAR  s2 SDR  s2 SIR  s2 SAR
VB               -4.74   -3.28    5.67   -1.58   15.46   -1.37
Gibbs             -4.5   -2.62    4.57    1.05   12.46    1.61
GibbsEM          -4.23   -2.42    4.82    1.34   13.13    1.85
Pre-trained      -4.04   -3.15    8.13    3.56   11.44    4.64
Oracle            6.14   17.16    6.58   12.66   19.95    13.6

• Oracle: we use the square of the source coefficient as the latent variance estimate.
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources.
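The Oracle row can be reproduced in spirit with a few lines. A minimal sketch, assuming a noiseless single-channel mix $x = s_1 + s_2$ of transform coefficients with independent zero-mean Gaussian sources. Setting each latent variance to the squared true coefficient, as the Oracle note describes, makes the posterior mean a soft mask $\hat{s}_i = v_i / \sum_j v_j \cdot x$. The function name and the `eps` guard against silent coefficients are hypothetical choices, not the slide's code.

```python
def oracle_wiener_separation(mix, sources, eps=1e-12):
    """Split a single-channel mix into per-source estimates with oracle variances.

    v_i = s_i**2 (oracle latent variance); estimate s_i_hat = v_i / sum_j v_j * x,
    the Gaussian posterior mean of s_i given x = sum_j s_j.
    eps avoids 0/0 where every source is silent.
    """
    estimates = [[0.0] * len(mix) for _ in sources]
    for k, x in enumerate(mix):
        variances = [s[k] ** 2 for s in sources]
        total = sum(variances) + eps
        for i, v in enumerate(variances):
            estimates[i][k] = v / total * x
    return estimates


guitar = [3.0, 0.0, 1.0]
drums = [0.0, 4.0, 1.0]
mix = [g + d for g, d in zip(guitar, drums)]
print(oracle_wiener_separation(mix, [guitar, drums]))
```

Where only one source is active the mask recovers it exactly; where both overlap, the mix is shared in proportion to the variances, which is why even the oracle scores are finite.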
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet” (Anglagard) + “Moby Dick” (Led Zeppelin) = Mix

Method          s1 SDR  s1 SIR  s1 SAR  s2 SDR  s2 SIR  s2 SAR
VB                -7.8   -6.22    4.53   -2.35    18.4   -2.25
Gibbs            -8.46   -7.53    6.93   -4.04   14.59   -3.83
GibbsEM          -7.74   -6.19    4.62   -1.14   16.62   -0.97
Pre-trained       -6.4   -5.39    6.95     3.8   16.39    4.14
Oracle            12.1    32.9   12.14   21.13   33.89   21.37
Harmonic-Transient Decomposition
[Figure: time-frequency panels (time τ vs. frequency bin ν): Original ($X_{org}$), Hor ($S_{hor}$), Vert ($S_{ver}$)]
Chord Detection - Signal model (with Paul Peeling)
[Figure: spectral template $\log \sigma_\nu$ against frequency ν (0-4000 Hz)]
Chord Detection
[Figure, two panels over time τ (0-2.5 s): MDCT of a piano chord (MIDI notes 41, 48, 51, 56), frequency 0-4000 Hz; $\log \sum_\nu v_{\nu,j,\tau}$ for MIDI note $j$ (40-65)]
Multichannel Source Separation
• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006), for $k = 1, \dots, K$:
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$
$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$
$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$
$a_m \sim \mathcal{N}(a_m; \cdots)$, $r_m \sim \mathcal{IG}(r_m; \cdots)$
with sources $n = 1, \dots, N$ and channels $m = 1, \dots, M$
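Forward-sampling the hierarchy above shows how the pieces fit together. A minimal sketch with two stated assumptions about parameterisation: $\mathcal{G}(\lambda; a, b)$ has shape $a$ and scale $b$, and $\mathcal{IG}(v; a, b)$ means $1/v \sim \mathcal{G}(a, b)$. Under these, $E[1/v_{k,n}] = 1/\lambda_n$ and $s_{k,n}$ is marginally Student-t with $\nu$ degrees of freedom and squared scale $\lambda_n$, consistent with reading $\lambda_n$ as the overall volume of source $n$. Because the priors on $a_m$ and $r_m$ are elided on the slide, the mixing matrix and noise variances are taken as inputs here; the function name is hypothetical.

```python
import math
import random


def sample_multichannel_mixture(K, A, r, a_lam, b_lam, nu, seed=0):
    """Forward-sample the hierarchical source model (parameterisations assumed, see lead-in).

    A : M x N mixing matrix (row m is a_m^T), given rather than sampled
    r : length-M observation noise variances r_m, given rather than sampled
    Returns (lam, v, s, x): lam has length N, v and s are K x N, x is K x M.
    """
    rng = random.Random(seed)
    M, N = len(A), len(A[0])
    lam = [rng.gammavariate(a_lam, b_lam) for _ in range(N)]  # shape a_lam, scale b_lam
    v, s, x = [], [], []
    for _ in range(K):
        # 1/v ~ Gamma(shape nu/2, scale 2/(nu * lam_n)), so E[1/v] = 1/lam_n
        v_k = [1.0 / rng.gammavariate(nu / 2.0, 2.0 / (nu * lam[n])) for n in range(N)]
        s_k = [rng.gauss(0.0, math.sqrt(v_kn)) for v_kn in v_k]
        x_k = [sum(A[m][n] * s_k[n] for n in range(N)) + rng.gauss(0.0, math.sqrt(r[m]))
               for m in range(M)]
        v.append(v_k)
        s.append(s_k)
        x.append(x_k)
    return lam, v, s, x


# Underdetermined example: M = 2 channels, N = 3 sources
lam, v, s, x = sample_multichannel_mixture(
    1000, [[1.0, 0.5, 0.2], [0.3, 1.0, 0.8]], [0.01, 0.01], a_lam=2.0, b_lam=1.0, nu=4.0)
print(lam, x[0])
```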
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$
Source Separation
[Figure: time-frequency plots (t in s vs. f in Hz, 0-10000 Hz) of the three sources (Speech), (Piano), (Guitar) and the (Mix)]
Reconstructions
[Figure: time-frequency plots (t in s vs. f in Hz, 0-10000 Hz) of the reconstructed (Speech), (Piano) and (Guitar) sources]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: traces of the parameters $a$, $\lambda$ and $r$ against epoch (0-2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features):
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– ...
• Online/real-time or offline/batch
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: grid of states with tempo level $n_k \in \{1, 2, 3\}$ (vertical) and bar position $m_k \in \{1, \dots, 8\}$ (horizontal), with bar boundaries marked for 3/4 time and 4/4 time]
• Each dot denotes a state $x = (m, n)$ (score position, tempo level)
• Directed arcs denote state transitions with positive probability
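The state space and its arcs can be written down as a sparse transition table. A minimal sketch whose dynamics are an assumption (our reading of the bar pointer idea, not necessarily the exact model of Whiteley, Cemgil and Godsill): the tempo level $n$ is the number of positions the pointer advances per frame, $m_{k+1} = ((m_k + n_k - 1) \bmod M) + 1$, and the tempo moves up or down one level with probability `p_change / 2` each, folding moves that would leave $\{1, \dots, N\}$ back into "stay". The function name and `p_change` are hypothetical.

```python
def bar_pointer_transitions(M, N, p_change):
    """Sparse transition model p(x_{k+1} | x_k) over bar-pointer states x = (m, n).

    Assumed dynamics:
      position m_{k+1} = ((m_k + n_k - 1) mod M) + 1   (advance n_k positions, wrap at bar end)
      tempo    n_{k+1} = n_k - 1 or n_k + 1 with probability p_change / 2 each, else n_k;
               moves outside {1, ..., N} are folded into "stay".
    Returns {state: {next_state: probability}} holding only arcs with positive probability.
    """
    trans = {}
    for m in range(1, M + 1):
        for n in range(1, N + 1):
            m_next = (m + n - 1) % M + 1
            row = {}
            for n_next in (n - 1, n + 1):
                target = n_next if 1 <= n_next <= N else n
                row[(m_next, target)] = row.get((m_next, target), 0.0) + p_change / 2
            row[(m_next, n)] = row.get((m_next, n), 0.0) + (1.0 - p_change)
            trans[(m, n)] = {s: p for s, p in row.items() if p > 0.0}
    return trans


T = bar_pointer_transitions(M=8, N=3, p_change=0.2)
print(T[(8, 2)])  # from the last bar position at tempo 2, the pointer wraps to position 2
```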
Bar position Pointer - transition model
[Figure: three panels of the transition distribution $p(x_2 \mid x_1)$ over bar position (1-8) and tempo level (1-5)]
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: $p(x_3)$ over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: $p(x_4)$ over bar position and tempo level]
Bar position Pointer - k = 5
[Figure: $p(x_5 \mid y_{1:5})$ over bar position and tempo level, after observing $y_5 = 0$]
Bar position Pointer - k = 10
[Figure: $p(x_{10})$ over bar position and tempo level]
Bar position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson intensity
[Figure: intensity $\mu_k$ as a function of bar position $m_k$ (0-1000) for a triplet rhythm and for a duplet rhythm]
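The Poisson observation model turns a count $y_k$ into a likelihood for every state. A minimal sketch, assuming the intensity depends only on the bar position $m$ through a rhythm template; the template `on_beat` and both function names are hypothetical, standing in for the triplet and duplet intensity curves in the figure.

```python
import math


def poisson_log_likelihood(y, mu):
    """log p(y | mu) = y log mu - mu - log(y!) for a Poisson count y with intensity mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def observation_log_likelihoods(y, states, intensity):
    """log p(y_k | x_k) for every bar-pointer state x = (m, n).

    intensity(m) maps bar position m to a Poisson rate (a hypothetical rhythm template),
    so states near expected onsets explain a burst of detections better.
    """
    return {x: poisson_log_likelihood(y, intensity(x[0])) for x in states}


def on_beat(m):
    # hypothetical duplet-like template on an 8-position bar: high rate on positions 1 and 5
    return 3.0 if m in (1, 5) else 0.1


states = [(m, n) for m in range(1, 9) for n in range(1, 4)]
loglik = observation_log_likelihoods(2, states, on_beat)
print(max(loglik, key=loglik.get))
```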
Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Figure: graphical model with chains $n_0, \dots, n_3$; $\theta_0, \dots, \theta_3$; $m_0, \dots, m_3$; $r_0, \dots, r_3$; intensities $\lambda_1, \dots, \lambda_3$; observations $y_1, \dots, y_3$]
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator
Filtering
[Figure, four panels over frame index k (0-450): observed data $y_k$; $\log p(m_k \mid y_{1:k})$ over bar position $m_k$; $\log p(n_k \mid y_{1:k})$ over tempo (quarter notes per minute, 60-180); $p(r_k \mid y_{1:k})$ for triplets vs. duplets]
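Filtering in a discrete HMM such as the bar pointer model is the normalised forward recursion: predict with the transition model, multiply by the likelihood, renormalise. A minimal sketch with dense list-of-lists inputs (a hypothetical interface; the bar pointer transitions would be flattened to a matrix over the states $x = (m, n)$ first).

```python
def forward_filter(prior, transition, likelihoods):
    """Normalised forward recursion for a discrete HMM (online processing).

    prior       : list, p(x_1 = i)
    transition  : square list of lists, transition[i][j] = p(x_{k+1} = j | x_k = i)
    likelihoods : one list per frame k, likelihoods[k][i] = p(y_k | x_k = i)
    Returns the filtered marginals p(x_k | y_{1:k}) for every frame k.
    """
    n = len(prior)
    filtered = []
    predicted = list(prior)
    for k, lik in enumerate(likelihoods):
        if k > 0:  # one-step prediction p(x_k | y_{1:k-1})
            prev = filtered[-1]
            predicted = [sum(prev[i] * transition[i][j] for i in range(n)) for j in range(n)]
        unnorm = [predicted[j] * lik[j] for j in range(n)]
        total = sum(unnorm)
        filtered.append([u / total for u in unnorm])
    return filtered


sticky = [[0.9, 0.1], [0.1, 0.9]]
print(forward_filter([0.5, 0.5], sticky, [[1.0, 3.0], [1.0, 3.0], [3.0, 1.0]]))
```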
Smoothing
[Figure, four panels over frame index k (0-450): observed data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position $m_k$; $\log p(n_k \mid y_{1:K})$ over tempo (quarter notes per minute, 60-180); $p(r_k \mid y_{1:K})$ for triplets vs. duplets]
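Smoothing (offline processing) can be done with a backward pass over stored filtered marginals. A minimal sketch of the discrete forward-filter/backward-smoother identity
$p(x_k = i \mid y_{1:K}) = p(x_k = i \mid y_{1:k}) \sum_j T_{ij}\, p(x_{k+1} = j \mid y_{1:K}) / p(x_{k+1} = j \mid y_{1:k})$,
where the denominator is the one-step prediction. The input format (filtered marginals as lists, a dense transition matrix) and the function name are hypothetical.

```python
def backward_smooth(filtered, transition):
    """Turn filtered marginals p(x_k | y_{1:k}) into smoothed marginals p(x_k | y_{1:K}).

    filtered   : list over frames of lists, filtered[k][i] = p(x_k = i | y_{1:k})
    transition : transition[i][j] = p(x_{k+1} = j | x_k = i)
    """
    n = len(filtered[0])
    smoothed = [None] * len(filtered)
    smoothed[-1] = list(filtered[-1])  # at the last frame, filtered == smoothed
    for k in range(len(filtered) - 2, -1, -1):
        f = filtered[k]
        pred = [sum(f[i] * transition[i][j] for i in range(n)) for j in range(n)]
        ratio = [smoothed[k + 1][j] / pred[j] if pred[j] > 0 else 0.0 for j in range(n)]
        smoothed[k] = [f[i] * sum(transition[i][j] * ratio[j] for j in range(n))
                       for i in range(n)]
    return smoothed


# With an identity transition the state never changes, so every frame inherits the final belief
print(backward_smooth([[0.25, 0.75], [0.1, 0.9]], [[1.0, 0.0], [0.0, 1.0]]))
```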
Time Signature
[Figure: observed data (sample value vs. time, 0-12 s); $\log p(m_k \mid z_{1:K})$ over bar position; $\log p(n_k \mid z_{1:K})$ over tempo (quarter notes per minute, 52-155); $p(\theta_k \mid z_{1:K})$ for 4/4 vs. 3/4, over frame index k (0-500)]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt aligned with the audio signal $x_t$ over time $t$]
Score-Performance matching - Graphical Model
[Figure: graphical model with nodes $t_1, \dots, t_K$; $r_1, \dots, r_K$; $\lambda_1, \dots, \lambda_K$; $v_{\nu,1}, \dots, v_{\nu,K}$; $s_{\nu,1}, \dots, s_{\nu,K}$, in a plate over $\nu = 1, \dots, W$]
$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau))\big)$
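Scoring candidate score positions against one frame reduces to integrating out the variances. A minimal sketch, assuming $\mathcal{IG}(v; a, b)$ means $1/v \sim \mathcal{G}(a, b)$ with scale $b$, so that $v_{\nu,\tau}$ is inverse-Gamma with shape $a$ and scale $\beta = a\lambda\sigma_\nu(r_\tau)$, and $s_{\nu,\tau} \mid v_{\nu,\tau} \sim \mathcal{N}(0, v_{\nu,\tau})$. The marginal of each coefficient is then Student-t with $2a$ degrees of freedom and squared scale $\lambda\sigma_\nu(r_\tau)$, with log density $\log\Gamma(a + \tfrac12) - \log\Gamma(a) + a\log\beta - \tfrac12\log 2\pi - (a + \tfrac12)\log(\beta + s^2/2)$. The template values and function names are hypothetical.

```python
import math


def log_marginal_coefficient(s, a, lam, sigma):
    """log p(s) for s ~ N(0, v), v ~ IG(v; a, 1 / (a * lam * sigma)), with v integrated out.

    beta = a * lam * sigma is the inverse-Gamma scale; the result is a Student-t
    log density with 2a degrees of freedom and squared scale lam * sigma.
    """
    beta = a * lam * sigma
    return (math.lgamma(a + 0.5) - math.lgamma(a) + a * math.log(beta)
            - 0.5 * math.log(2.0 * math.pi) - (a + 0.5) * math.log(beta + 0.5 * s * s))


def score_position_log_likelihoods(frame, templates, a, lam):
    """log p(s_{1:W, tau} | r_tau = r) for every candidate score position r.

    frame     : MDCT coefficients s_{nu, tau}, nu = 1..W, of one frame
    templates : {r: [sigma_nu(r) for nu = 1..W]} (hypothetical spectral templates)
    """
    return {r: sum(log_marginal_coefficient(s, a, lam, sig) for s, sig in zip(frame, sigma))
            for r, sigma in templates.items()}


templates = {0: [1.0, 0.01], 1: [0.01, 1.0]}  # energy in bin 0 vs. energy in bin 1
print(score_position_log_likelihoods([3.0, 0.01], templates, a=2.0, lam=1.0))
```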
Score-Performance matching - Signal model
[Figure: spectral template $\log \sigma_\nu$ against frequency ν (0-4000 Hz)]
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 46
Gamma Chains
bull The joint can be written as product of singleton and pairwise
potentials of form
ψkk = exp(minusazminus1k vminus1
k ) (Pairwise)
z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az
φzk = exp((az + a+ 1) log zminus1
k ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45
Gamma Fields
bull The joint can be written as product of singleton and pairwise
potentials
ψij = exp(minusaijξminus1i ξminus1
j ) (Pairwise)
φi = exp((sum
j
aij + 1) log ξminus1i ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46
Possible Model Topologies
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47
Approximate Inference
bull Stochastic
ndash Markov Chain Monte Carlo Gibbs sampler
ndash Sequential Monte Carlo Particle Filtering
bull Deterministic
ndash Variational Bayes
In all these conjugacy helps
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))
bull Gibbs
v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z
(τ)k )ψkk+1(z
(τ)k+1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
bull Inference Variational Bayes
Noisy Original
X
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xorg
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xh SNR1998
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xv SNR2079
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xb SNR1968
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xg SNR1997
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51
Denoising ndash MusicOriginal
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Noisy
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
PF SNR853
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Gibbs SNR866
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
VB SNR208
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52
Single Channel Source Separation (with Onur Dikmen)
bull Source 1 Horizontal Tie across time harmonic continuity
bull Source 2 Vertical Tie across frequency transients percussive sounds
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53
Single Channel Source Separation with IGMCs
E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Preminus trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
bull Oracle We use the square of the source coefficient as the latent variance estimate
bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 47
Gamma Fields
bull The joint can be written as product of singleton and pairwise
potentials
ψij = exp(minusaijξminus1i ξminus1
j ) (Pairwise)
φi = exp((sum
j
aij + 1) log ξminus1i ) (Singletons)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46
Possible Model Topologies
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47
Approximate Inference
• Stochastic
– Markov Chain Monte Carlo: Gibbs sampler
– Sequential Monte Carlo: particle filtering
• Deterministic
– Variational Bayes
In all of these, conjugacy helps: the full conditionals (Gibbs) and the optimal variational factors (VB) stay in the same family as the prior, so each update has a closed form.
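As a minimal illustration of why conjugacy helps, assume a single variance v with an inverse gamma prior IG(α, β) (shape α, scale β) and observations y_1, …, y_n ∼ N(0, v). The posterior is again inverse gamma, IG(α + n/2, β + Σ y_i²/2), so an update is just two additions. The function name is hypothetical.

```python
import numpy as np

def ig_gaussian_update(alpha, beta, y):
    """Posterior IG(shape, scale) of v given y_i ~ N(0, v) and the prior v ~ IG(alpha, beta)."""
    y = np.asarray(y, dtype=float)
    return alpha + 0.5 * y.size, beta + 0.5 * np.sum(y ** 2)
```

For example, ig_gaussian_update(2.0, 1.0, [1.0, -2.0]) gives shape 3.0 and scale 3.5.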
VB or Gibbs
[Figure: factor graph of the chain ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ···, with observation factors p(y_1|v_1), p(y_2|v_2)]
• VB
$q^{(\tau)}(v_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1}\rangle_{q^{(\tau)}(z_k)\,q^{(\tau)}(z_{k+1})}\big)$
• Gibbs
$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\,\psi_{k,k}\big(z_k^{(\tau)}\big)\,\psi_{k,k+1}\big(z_{k+1}^{(\tau)}\big)$
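The VB update for v_k, together with the matching update for z_k, can be sketched for an inverse gamma Markov chain (IGMC). Assumptions not stated on the slides: the chain is z_1, v_1, z_2, v_2, … with potentials exp(−a/(z_k v_k)) on the ψ_{k,k} edges and exp(−a_z/(v_k z_{k+1})) on the ψ_{k,k+1} edges, singletons as on the Gamma Fields slide, y_k ∼ N(0, v_k), and the prior factor ψ_{0,1} is dropped. Every factor is then IG(shape, scale), and the only expectation needed is ⟨1/x⟩ = shape/scale. The function name is hypothetical.

```python
import numpy as np

def vb_igmc(y, a=2.0, a_z=2.0, n_iter=50):
    """Mean-field VB for a hypothetical IGMC z_1, v_1, z_2, v_2, ... with y_k ~ N(0, v_k).

    Each factor is IG(shape, scale) and E_q[1/x] = shape / scale.
    Returns the shapes and scales of q(v_k).
    """
    y = np.asarray(y, dtype=float)
    K = y.size
    # q(v_k) shapes: couplings to z_k (a) and z_{k+1} (a_z), plus 1/2 from the observation y_k
    shape_v = np.full(K, a + a_z + 0.5)
    shape_v[-1] = a + 0.5                            # the last v has no z_{k+1}
    # q(z_k) shapes: couplings to v_k (a) and v_{k-1} (a_z)
    shape_z = np.full(K, a + a_z)
    shape_z[0] = a                                   # the first z has no v_{k-1}
    scale_v = shape_v * np.maximum(y ** 2, 1e-6)     # start with E[1/v_k] = 1 / y_k^2
    for _ in range(n_iter):
        e_inv_v = shape_v / scale_v                  # E_q[1 / v_k]
        scale_z = a * e_inv_v
        scale_z[1:] += a_z * e_inv_v[:-1]
        e_inv_z = shape_z / scale_z                  # E_q[1 / z_k]
        scale_v = a * e_inv_z + 0.5 * y ** 2
        scale_v[:-1] += a_z * e_inv_z[1:]
    return shape_v, scale_v
```

The posterior mean of each variance is scale/(shape − 1); on a signal with a quiet half and a loud half, the estimate follows the loud half while the harmonic coupling keeps the quiet half small.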
VB or Gibbs
[Figure: factor graph of the chain ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ···, with observation factors p(y_1|v_1), p(y_2|v_2)]
• VB
$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k}\rangle_{q^{(\tau)}(v_{k-1})\,q^{(\tau)}(v_k)}\big)$
• Gibbs
$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}\big(v_{k-1}^{(\tau)}\big)\,\psi_{k,k}\big(v_k^{(\tau)}\big)$
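A matching Gibbs sampler, as a minimal sketch under the same assumptions as a plain IGMC reading of the factor graph (potentials exp(−a/(z_k v_k)) and exp(−a_z/(v_k z_{k+1})), y_k ∼ N(0, v_k), ψ_{0,1} dropped). Given all v, the z_k are conditionally independent, and given all z and y, so are the v_k; so each half-sweep can draw a whole block at once. An IG(shape, scale) draw is the reciprocal of a Gamma(shape, rate = scale) draw. The function name is hypothetical.

```python
import numpy as np

def gibbs_igmc(y, a=2.0, a_z=2.0, n_sweeps=300, seed=0):
    """Gibbs sampler for a hypothetical IGMC z_1, v_1, z_2, v_2, ... with y_k ~ N(0, v_k).

    Returns the sampled variances v, shape (n_sweeps, K).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    K = y.size

    def sample_ig(shape, scale):
        # If g ~ Gamma(shape, rate=scale) then 1/g ~ IG(shape, scale)
        return 1.0 / rng.gamma(shape, 1.0 / scale)

    shape_z = np.full(K, a + a_z)
    shape_z[0] = a                               # the first z has no v_{k-1}
    shape_v = np.full(K, a + a_z + 0.5)
    shape_v[-1] = a + 0.5                        # the last v has no z_{k+1}
    v = np.maximum(y ** 2, 1e-6)
    samples = np.empty((n_sweeps, K))
    for t in range(n_sweeps):
        # z_k | v_{k-1}, v_k ~ IG(shape_z, a / v_k + a_z / v_{k-1})
        scale_z = a / v
        scale_z[1:] += a_z / v[:-1]
        z = sample_ig(shape_z, scale_z)
        # v_k | z_k, z_{k+1}, y_k ~ IG(shape_v, a / z_k + a_z / z_{k+1} + y_k^2 / 2)
        scale_v = a / z + 0.5 * y ** 2
        scale_v[:-1] += a_z / z[1:]
        v = sample_ig(shape_v, scale_v)
        samples[t] = v
    return samples
```

Discarding an initial burn-in and summarising the remaining samples (for example by their median) gives a variance estimate to compare with the VB one.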
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes
[Figure: six time–frequency panels, colour scale −14 to 0: Noisy (X), Original (Xorg), Xh (SNR 19.98), Xv (SNR 20.79), Xb (SNR 19.68), Xg (SNR 19.97)]
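Once latent variances v and a noise variance r have been estimated, a natural point estimate of each clean coefficient under y = s + n, with s ∼ N(0, v) and n ∼ N(0, r), is the posterior mean v/(v + r)·y. The sketch below shows only that final step plus an SNR measure in dB. It assumes v and r are already available and is not the full VB procedure on the slide. The function names are hypothetical.

```python
import numpy as np

def posterior_mean_denoise(y, v, r):
    """E[s | y] for y = s + n with s ~ N(0, v) and n ~ N(0, r), elementwise."""
    y, v = np.asarray(y, dtype=float), np.asarray(v, dtype=float)
    return v / (v + r) * y

def snr_db(reference, estimate):
    """10 log10(||reference||^2 / ||reference - estimate||^2)."""
    reference = np.asarray(reference, dtype=float)
    err = reference - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(err ** 2))
```

Coefficients whose estimated variance is large relative to r pass almost unchanged; the rest are shrunk toward zero.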
Denoising – Music
[Figure: five time–frequency panels, colour scale −18 to 0: Original, Noisy, PF (SNR 8.53), Gibbs (SNR 8.66), VB (SNR 2.08)]
“Tristram (Matt Uelmen)” + ∼0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1 (horizontal): tied across time; harmonic continuity
• Source 2 (vertical): tied across frequency; transients, percussive sounds
[Figure: time (τ) vs. frequency bin (ν) panels Xorg, Shor, Sver]
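A minimal sketch of the two topologies: the same inverse gamma Markov chain (mean-field VB, potentials exp(−a/(z v)) and exp(−a_z/(v z)), Gaussian coefficients) is laid either along time (horizontal, `axis=1`) or along frequency (vertical, `axis=0`) of a frequency × time coefficient grid. The couplings, the grid layout and the function name are assumptions; the slides do not give these details. In the usage below, a sustained tone in one frequency bin keeps its energy under the horizontal model but is shrunk toward its quiet neighbours under the vertical one.

```python
import numpy as np

def vb_igmc_grid(Y, axis, a=2.0, a_z=2.0, n_iter=50):
    """Posterior-mean variances of independent IGMCs along one axis of a (freq x time) grid Y.

    axis=1 ties across time (horizontal), axis=0 ties across frequency (vertical).
    Hypothetical helper; requires a > 0.5 so that every shape exceeds 1.
    """
    Y = np.moveaxis(np.asarray(Y, dtype=float), axis, -1)   # chains now run along the last axis
    shape_v = np.full(Y.shape, a + a_z + 0.5)
    shape_v[..., -1] = a + 0.5
    shape_z = np.full(Y.shape, a + a_z)
    shape_z[..., 0] = a
    scale_v = shape_v * np.maximum(Y ** 2, 1e-6)
    for _ in range(n_iter):
        e_inv_v = shape_v / scale_v                  # E_q[1 / v]
        scale_z = a * e_inv_v
        scale_z[..., 1:] += a_z * e_inv_v[..., :-1]
        e_inv_z = shape_z / scale_z                  # E_q[1 / z]
        scale_v = a * e_inv_z + 0.5 * Y ** 2
        scale_v[..., :-1] += a_z * e_inv_z[..., 1:]
    mean_v = scale_v / (shape_v - 1.0)               # posterior mean of each IG factor
    return np.moveaxis(mean_v, -1, axis)
```

Usage: with `Y = 0.1 * np.ones((8, 16))` and `Y[3, :] = 10.0`, `vb_igmc_grid(Y, axis=1)` keeps row 3 loud everywhere, while `vb_igmc_grid(Y, axis=0)` pulls that bin down because its frequency neighbours are quiet.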
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix
(SDR, SIR, SAR in dB)
              s1                      s2
              SDR    SIR    SAR       SDR    SIR    SAR
VB            -4.74  -3.28   5.67     -1.58  15.46  -1.37
Gibbs         -4.5   -2.62   4.57      1.05  12.46   1.61
GibbsEM       -4.23  -2.42   4.82      1.34  13.13   1.85
Pre-trained   -4.04  -3.15   8.13      3.56  11.44   4.64
Oracle         6.14  17.16   6.58     12.66  19.95  13.6
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources
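One way to read the Oracle row: given latent variances v_1 and v_2 for the two sources, the posterior mean of each coefficient under x = s_1 + s_2 with independent zero-mean Gaussian sources is a Wiener-type gain v_n/(v_1 + v_2) applied to x, and the oracle plugs in v_n = s_n². This minimal sketch assumes that reading; the function names are hypothetical and the BSS Eval metrics (SDR, SIR, SAR) are not reproduced here.

```python
import numpy as np

def wiener_split(x, v1, v2, eps=1e-12):
    """Posterior means E[s1 | x], E[s2 | x] for x = s1 + s2 with independent s_n ~ N(0, v_n)."""
    x = np.asarray(x, dtype=float)
    gain = v1 / (v1 + v2 + eps)     # eps guards against 0/0 where both variances vanish
    return gain * x, (1.0 - gain) * x

def oracle_split(s1, s2):
    """Oracle separation: use the squared true coefficients as the latent variances."""
    s1, s2 = np.asarray(s1, dtype=float), np.asarray(s2, dtype=float)
    return wiener_split(s1 + s2, s1 ** 2, s2 ** 2)
```

Where the two sources never overlap in the transform domain, the oracle recovers them exactly; where they overlap, the mixture is shared in proportion to the variances.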
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix
(SDR, SIR, SAR in dB)
              s1                      s2
              SDR    SIR    SAR       SDR    SIR    SAR
VB            -7.8   -6.22   4.53     -2.35  18.4   -2.25
Gibbs         -8.46  -7.53   6.93     -4.04  14.59  -3.83
GibbsEM       -7.74  -6.19   4.62     -1.14  16.62  -0.97
Pre-trained   -6.4   -5.39   6.95      3.8   16.39   4.14
Oracle        12.1   32.9   12.14     21.13  33.89  21.37
Harmonic-Transient Decomposition
[Figure: time (τ) vs. frequency bin (ν) panels Xorg (Original), Shor (Hor), Sver (Vert)]
Chord Detection - Signal model (with Paul Peeling)
[Figure: log σ_ν against frequency ν (Hz, 0–4000); vertical range −12 to 0]
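The plotted template log σ_ν is not given numerically on the slide. As a hypothetical stand-in, the sketch below builds a harmonic spectral template for one MIDI note: Gaussian bumps at integer multiples of the fundamental with geometrically decaying amplitude, on top of a small floor. The mapping f = 440 · 2^((m − 69)/12) Hz is the standard equal-tempered convention; the number of harmonics, decay, bump width and floor are made-up illustration parameters.

```python
import numpy as np

def midi_to_hz(m):
    """Equal-tempered frequency of MIDI note m (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((np.asarray(m, dtype=float) - 69.0) / 12.0)

def harmonic_template(midi_note, freqs, n_harmonics=10, decay=0.7, width=15.0, floor=1e-5):
    """Hypothetical spectral variance template sigma_nu for one note, evaluated at freqs (Hz)."""
    f0 = midi_to_hz(midi_note)
    freqs = np.asarray(freqs, dtype=float)
    sigma = np.full(freqs.shape, floor)
    for h in range(1, n_harmonics + 1):
        # bump at the h-th harmonic, amplitude decay^(h-1)
        sigma += decay ** (h - 1) * np.exp(-0.5 * ((freqs - h * f0) / width) ** 2)
    return sigma / sigma.max()      # normalise the peak to 1, so log sigma <= 0
```

Plotting `np.log(harmonic_template(69, np.linspace(0, 4000, 4001)))` gives peaks at 440 Hz, 880 Hz, and so on, qualitatively like the figure.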
Chord Detection
[Figure: left, MDCT of piano chord 41 48 51 56 (frequency in Hz, 0–4000, vs. time τ in s, 0–2.5); right, log Σ_ν v_{νjτ} for MIDI notes j = 40–65 vs. time τ in s]
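The right-hand panel plots, for each candidate note j and frame τ, the log of the total variance Σ_ν v_{νjτ} attributed to that note. Given a variance array of shape (notes, frequency bins, frames), the score is a single reduction; the sketch also picks the k most active notes per frame. The array layout, the top-k decision rule and the function names are assumptions for illustration, not the method on the slide.

```python
import numpy as np

def note_activity(v):
    """log sum_nu v[j, nu, tau] for a variance array v of shape (notes, bins, frames)."""
    return np.log(np.sum(v, axis=1))

def top_notes(v, midi_numbers, k):
    """MIDI numbers of the k most active notes in each frame (hypothetical decision rule)."""
    score = note_activity(v)                      # shape (notes, frames)
    order = np.argsort(-score, axis=0)[:k]        # indices of the k largest scores per frame
    return np.sort(np.asarray(midi_numbers)[order], axis=0)
```

With notes 40 to 65 and high variance only on notes 41, 48, 51 and 56, `top_notes(v, np.arange(40, 66), 4)` returns that chord in every frame.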
Multichannel Source Separation
• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$
$v_{kn} \sim \mathcal{IG}\big(v_{kn}; \nu/2, 2/(\nu\lambda_n)\big)$
$s_{kn} \sim \mathcal{N}(s_{kn}; 0, v_{kn})$
$x_{km} \sim \mathcal{N}\big(x_{km}; \mathbf{a}_m^\top s_{k,1:N}, r_m\big)$
$\mathbf{a}_m \sim \mathcal{N}(\mathbf{a}_m; \cdots)$, $r_m \sim \mathcal{IG}(r_m; \cdots)$
for sources n = 1, …, N, channels m = 1, …, M and coefficients k = 1, …, K
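A minimal sketch that draws synthetic data from this hierarchy, useful for testing an inference routine. Assumed conventions (not stated on the slide): G(x; a, b) has shape a and scale b, and IG(x; a, b) is the law of 1/g with g ∼ G(a, b). Under this reading E[v_kn | λ_n] = νλ_n/(ν − 2), which is consistent with λ_n acting as the overall volume of source n. The priors on a_m and r_m are left unspecified on the slide, so the sketch uses a_m ∼ N(0, I) and one fixed noise variance r as placeholders. The function name is hypothetical.

```python
import numpy as np

def sample_multichannel(K, N, M, nu=10.0, a_lam=2.0, b_lam=1.0, r=0.01, seed=0):
    """Draw (lambda, v, s, A, x) from the hierarchical prior (hypothetical helper).

    Assumes G(a, b) = Gamma(shape a, scale b) and IG(a, b) = law of 1/g with g ~ G(a, b);
    a_m ~ N(0, I) and a common noise variance r are placeholders.
    """
    rng = np.random.default_rng(seed)
    lam = rng.gamma(a_lam, b_lam, size=N)                          # source "volumes"
    v = 1.0 / rng.gamma(nu / 2.0, 2.0 / (nu * lam), size=(K, N))   # per-coefficient variances
    s = rng.normal(0.0, np.sqrt(v))                                # source coefficients
    A = rng.normal(size=(M, N))                                    # row m is a_m^T
    x = s @ A.T + rng.normal(0.0, np.sqrt(r), size=(K, M))         # observed mixtures
    return lam, v, s, A, x
```

Since s given v is Gaussian and v is inverse gamma, each s_kn is marginally Student-t with ν degrees of freedom, a heavy-tailed prior suited to sparse transform coefficients.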
Equivalent Gamma MRF
• A tree for each source
• λ_n can be interpreted as the overall “volume” of source n
Source Separation
[Figure: spectrograms, f (Hz, 0–10000) vs. t (sec), of the sources (Speech), (Piano), (Guitar) and of the (Mix)]
Reconstructions
[Figure: reconstructed spectrograms, f (Hz, 0–10000) vs. t (sec): (Speech), (Piano), (Guitar)]
Multimodality
• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: traces of a, λ and r against epoch (0–2000)]
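Tempering is one of the listed remedies. As a generic illustration on a toy one-dimensional bimodal target (not the source-separation posterior), the sketch below runs parallel tempering: random-walk Metropolis chains target p(x)^{1/T_i} at several temperatures, and a swap between neighbouring temperatures i and j is accepted with probability min(1, exp((1/T_i − 1/T_j)(log p(x_j) − log p(x_i)))). The target, temperature ladder and step sizes are assumptions.

```python
import numpy as np

def log_target(x):
    """Toy bimodal log density: equal mixture of N(-4, 0.5^2) and N(4, 0.5^2), unnormalised."""
    return np.logaddexp(-0.5 * ((x + 4.0) / 0.5) ** 2, -0.5 * ((x - 4.0) / 0.5) ** 2)

def parallel_tempering(n_iter=20000, temps=(1.0, 2.0, 4.0, 8.0, 16.0, 32.0), seed=0):
    """Return the samples of the T = 1 chain."""
    rng = np.random.default_rng(seed)
    temps = np.asarray(temps)
    x = np.full(temps.size, -4.0)            # every chain starts in the left mode
    trace = np.empty(n_iter)
    for t in range(n_iter):
        # Metropolis step for each chain on p(x)^(1/T); hotter chains take larger steps
        prop = x + 0.5 * np.sqrt(temps) * rng.standard_normal(temps.size)
        log_acc = (log_target(prop) - log_target(x)) / temps
        accept = np.log(rng.random(temps.size)) < log_acc
        x = np.where(accept, prop, x)
        # propose swapping the states of one random pair of neighbouring temperatures
        i = rng.integers(temps.size - 1)
        j = i + 1
        log_swap = (1.0 / temps[i] - 1.0 / temps[j]) * (log_target(x[j]) - log_target(x[i]))
        if np.log(rng.random()) < log_swap:
            x[i], x[j] = x[j], x[i]
        trace[t] = x[0]
    return trace
```

A plain Metropolis chain at T = 1 started at −4 would essentially never reach the mode at +4; here the hot chains cross the barrier and pass those states down through swaps.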
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features):
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– …
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: state lattice of bar positions m_k = 1, …, 8 (horizontal) and tempo levels n_k = 1, 2, 3 (vertical), with bar boundaries marked for 3/4 time and 4/4 time]
• Each dot denotes a state x = (m, n) (score position, tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three panels of p(x_2 | x_1) over bar position (1–8) and tempo level (1–5)]
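A minimal sketch of such a transition model, under assumptions the slides do not spell out: the position advances by the current tempo level each frame and wraps around at the end of the bar, m_{k+1} = ((m_k − 1 + n_k) mod M) + 1, and the tempo level moves up or down by one with probability p_change/2 each, staying put when it is already at the lowest or highest level. States are flattened to index (n − 1)·M + (m − 1). The function name is hypothetical.

```python
import numpy as np

def bar_pointer_transitions(M=8, N=5, p_change=0.1):
    """Transition matrix T[i, j] = p(x_{k+1} = j | x_k = i) over states x = (m, n)."""
    def idx(m, n):
        return (n - 1) * M + (m - 1)

    T = np.zeros((M * N, M * N))
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            m_next = (m - 1 + n) % M + 1          # advance by the tempo level, wrap at the bar end
            moves = {n: 1.0 - p_change}
            for n_next in (n - 1, n + 1):
                if 1 <= n_next <= N:
                    moves[n_next] = p_change / 2.0
                else:
                    moves[n] += p_change / 2.0    # no lower/higher level: stay put
            for n_next, p in moves.items():
                T[idx(m, n), idx(m_next, n_next)] += p
    return T
```

Every row sums to one, and with p_change = 0 the state (m = 1, n = 2) moves deterministically to (m = 3, n = 2).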
Bar position Pointer - k = 1
[Figure: p(x_1) over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: p(x_2) over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: p(x_3) over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: p(x_4) over bar position and tempo level]
Bar position Pointer - k = 5
[Figure: p(x_5 | y_{1:5}) over bar position and tempo level, after observing y_5 = 0]
Bar position Pointer - k = 10
[Figure: p(x_10) over bar position and tempo level]
Bar position Pointer - observation model (Poisson)
• Observation model p(y_k | x_k): Poisson with intensity μ_k
[Figure: intensity μ_k against bar position m_k (0–1000) for a triplet rhythm and a duplet rhythm]
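For onset counts y_k, a Poisson observation model gives log p(y_k | x_k) = y_k log μ(m_k) − μ(m_k) − log y_k!. The intensity templates below (bumps at evenly spaced onset positions in a bar of 1000 positions, assuming four beats per bar with two or three onsets per beat, plus a small baseline) are hypothetical stand-ins for the plotted μ_k curves.

```python
import numpy as np
from math import lgamma

def rhythm_intensity(n_positions=1000, subdivisions=2, peak=1.0, width=15.0, base=0.05):
    """Hypothetical intensity mu(m): Gaussian bumps at evenly spaced onset positions in the bar."""
    m = np.arange(n_positions)
    mu = np.full(n_positions, base)
    onsets = 4 * subdivisions                                   # assumed: four beats per bar
    for c in np.arange(onsets) * n_positions / onsets:
        d = np.minimum(np.abs(m - c), n_positions - np.abs(m - c))   # circular distance in the bar
        mu += peak * np.exp(-0.5 * (d / width) ** 2)
    return mu

def poisson_loglik(y, mu):
    """log p(y | mu) for a Poisson count y with intensity mu (elementwise in mu)."""
    return y * np.log(mu) - mu - lgamma(y + 1)
```

An onset at position 125 falls on the duplet grid but between triplet onsets, so `poisson_loglik(1, mu)` is higher under the duplet template there; this is what lets the rhythm indicator r be inferred.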
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Figure: graphical model with chains n_0 … n_3, θ_0 … θ_3, m_0 … m_3, r_0 … r_3, intensities λ_1 … λ_3 and observations y_1 … y_3]
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator
Filtering
[Figure, against frame index k (0–450): observed data y_k; log p(m_k | y_{1:k}) over bar position m_k; log p(n_k | y_{1:k}) over tempo (quarter notes per minute, 60–180); p(r_k | y_{1:k}) for triplets vs. duplets]
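Filtering on the bar-pointer lattice is the standard HMM forward recursion p(x_k | y_{1:k}) ∝ p(y_k | x_k) Σ_{x_{k−1}} p(x_k | x_{k−1}) p(x_{k−1} | y_{1:k−1}). Below is a minimal, generic sketch with per-step normalisation, which also yields log p(y_{1:K}). The inputs (initial distribution, row-stochastic transition matrix, likelihood table) are assumed given; the function name is hypothetical.

```python
import numpy as np

def hmm_filter(pi0, T, lik):
    """Filtered marginals p(x_k | y_{1:k}) and log p(y_{1:K}) for a discrete HMM.

    pi0: (S,) initial distribution; T: (S, S), T[i, j] = p(x_{k+1}=j | x_k=i);
    lik: (K, S), lik[k, s] = p(y_k | x_k = s).
    """
    lik = np.asarray(lik, dtype=float)
    T = np.asarray(T, dtype=float)
    K, S = lik.shape
    alpha = np.empty((K, S))
    loglik = 0.0
    pred = np.asarray(pi0, dtype=float)      # p(x_1) before seeing y_1
    for k in range(K):
        a = pred * lik[k]                    # correction with p(y_k | x_k)
        c = a.sum()                          # p(y_k | y_{1:k-1})
        alpha[k] = a / c
        loglik += np.log(c)
        pred = alpha[k] @ T                  # prediction p(x_{k+1} | y_{1:k})
    return alpha, loglik
```

Each step costs O(S²) with a dense matrix; for the bar pointer, most entries of T are zero, so a sparse matrix keeps it close to O(S).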
Smoothing
[Figure, against frame index k (0–450): observed data y_k; log p(m_k | y_{1:K}) over bar position m_k; log p(n_k | y_{1:K}) over tempo (quarter notes per minute, 60–180); p(r_k | y_{1:K}) for triplets vs. duplets]
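Smoothing conditions on the whole sequence. A minimal, generic forward-backward sketch: the backward message β_k(i) = Σ_j p(x_{k+1}=j | x_k=i) p(y_{k+1} | x_{k+1}=j) β_{k+1}(j) is rescaled at each step for numerical stability, and p(x_k | y_{1:K}) ∝ α_k β_k. Inputs and the function name are assumptions as in any discrete HMM.

```python
import numpy as np

def hmm_smooth(pi0, T, lik):
    """Posterior marginals p(x_k | y_{1:K}) by the forward-backward recursions.

    pi0: (S,) initial distribution; T: (S, S), T[i, j] = p(x_{k+1}=j | x_k=i);
    lik: (K, S), lik[k, s] = p(y_k | x_k = s).
    """
    lik = np.asarray(lik, dtype=float)
    T = np.asarray(T, dtype=float)
    K, S = lik.shape
    alpha = np.empty((K, S))
    pred = np.asarray(pi0, dtype=float)
    for k in range(K):                       # forward pass: filtered marginals
        a = pred * lik[k]
        alpha[k] = a / a.sum()
        pred = alpha[k] @ T
    beta = np.ones((K, S))
    for k in range(K - 2, -1, -1):           # backward pass, rescaled at each step
        b = T @ (lik[k + 1] * beta[k + 1])
        beta[k] = b / b.sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```

The last smoothed marginal equals the last filtered one; earlier ones also use future observations, which is why the smoothed plots are sharper than the filtered ones.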
Time Signature
[Figure: observed data (sample value vs. time in s, 0–12); then, against frame index k (100–500): log p(m_k | z_{1:K}) over bar position m_k; log p(n_k | z_{1:K}) over tempo (quarter notes per minute, 52–155); p(θ_k | z_{1:K}) for 4/4 vs. 3/4]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: audio signal x_t against time t]
Score-Performance matching - Graphical Model
[Figure: graphical model with score pointers t_1 … t_K, indicators r_1 … r_K, volumes λ_1 … λ_K, variances v_{ν1} … v_{νK} and coefficients s_{ν1} … s_{νK}, replicated over frequency bins ν = 1 … W]
$v_{\nu\tau} \sim \mathcal{IG}\big(v_{\nu\tau}; a, 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$
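Integrating out v_{ντ} gives a closed-form (Student-t) likelihood for each coefficient, so candidate score positions r can be compared frame by frame. For s ∼ N(0, v) and v ∼ IG(shape α, scale β), ∫ N(s; 0, v) IG(v; α, β) dv = Γ(α + ½) β^α / (Γ(α) √(2π) (β + s²/2)^{α+½}). The sketch assumes IG(v; a, 1/(aλσ)) means shape a and scale aλσ_ν(r), so E[v] = aλσ_ν(r)/(a − 1). This reading of the convention, the template array and the function names are assumptions.

```python
import numpy as np
from math import lgamma, log, pi

def log_marginal(s, a, beta):
    """log of the integral of N(s; 0, v) IG(v; a, scale=beta) over v (Student-t, 2a dof)."""
    s = np.asarray(s, dtype=float)
    return (lgamma(a + 0.5) - lgamma(a) + a * np.log(beta) - 0.5 * log(2.0 * pi)
            - (a + 0.5) * np.log(beta + 0.5 * s ** 2))

def score_templates(s_frame, sigma, a=2.0, lam=1.0):
    """sum_nu log p(s_nu | r) for each template row sigma[r, :] (hypothetical helper).

    Assumes the IG on the slide has shape a and scale a * lam * sigma_nu(r).
    """
    beta = a * lam * np.asarray(sigma, dtype=float)          # shape (R, W)
    s_frame = np.asarray(s_frame, dtype=float)
    return np.sum(log_marginal(s_frame[None, :], a, beta), axis=1)
```

The template whose high-variance bins coincide with the energetic coefficients of the frame gets the highest score; combined with the bar-pointer style transition model this gives log p(r_τ | s_τ) up to normalisation.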
Score-Performance matching - Signal model
[Figure: log σ_ν against frequency ν (Hz, 0–4000); vertical range −12 to 0]
Score-Performance matching
[Figure: spectrogram data (frequency in Hz, 0–4000, vs. time in s, 0–14) and MIDI data (MIDI note 55–85 vs. score position 50–450)]
Online (filtering) or offline (smoothing) processing is possible.
Transcription
[Figure, against time in s (1–4): log p(r_τ | s_τ) over MIDI note number (60–80); log λ for Σ_i w_τ^{(i)} λ_τ^{(i)} (range −10 to 10); MDCT of audio (source: Daniel-Ben Pienaar), frequency in Hz (0–4000)]
Summary
• “Time Domain”: Switching State Space Models
– State space modeling
– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform Domain”: Gamma Fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-frequency energy distributions
• Ongoing Work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical score guided source separation
– Prior structures for other observation models: NMF
Page 48
Possible Model Topologies
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47
Approximate Inference
bull Stochastic
ndash Markov Chain Monte Carlo Gibbs sampler
ndash Sequential Monte Carlo Particle Filtering
bull Deterministic
ndash Variational Bayes
In all these conjugacy helps
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))
bull Gibbs
v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z
(τ)k )ψkk+1(z
(τ)k+1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
bull Inference Variational Bayes
Noisy Original
X
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xorg
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xh SNR1998
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xv SNR2079
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xb SNR1968
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xg SNR1997
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51
Denoising ndash MusicOriginal
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Noisy
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
PF SNR853
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Gibbs SNR866
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
VB SNR208
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52
Single Channel Source Separation (with Onur Dikmen)
bull Source 1 Horizontal Tie across time harmonic continuity
bull Source 2 Vertical Tie across frequency transients percussive sounds
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53
Single Channel Source Separation with IGMCs
E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Preminus trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
bull Oracle We use the square of the source coefficient as the latent variance estimate
bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 49
Approximate Inference
bull Stochastic
ndash Markov Chain Monte Carlo Gibbs sampler
ndash Sequential Monte Carlo Particle Filtering
bull Deterministic
ndash Variational Bayes
In all these conjugacy helps
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))
bull Gibbs
v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z
(τ)k )ψkk+1(z
(τ)k+1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
bull Inference Variational Bayes
Noisy Original
X
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xorg
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xh SNR1998
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xv SNR2079
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xb SNR1968
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xg SNR1997
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51
Denoising – Music
[Figure: five panels on common axes (horizontal 50–250, vertical 50–500, colour scale −18 to 0): Original, Noisy, PF (SNR = 8.53), Gibbs (SNR = 8.66), VB (SNR = 2.08)]
“Tristram (Matt Uelmen)” + ∼0 dB white noise
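The music example corrupts the excerpt with roughly 0 dB white noise and reports an output SNR for each inference method. A minimal sketch of both steps, assuming the common definition SNR = 10 log10(Σ s² / Σ (s − ŝ)²) in dB (the talk does not spell out its exact definition); add_white_noise and snr_db are illustrative names:

```python
import math
import random


def snr_db(reference, estimate):
    """SNR of an estimate against a clean reference, in dB."""
    signal_power = sum(s * s for s in reference)
    error_power = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_power / error_power)


def add_white_noise(signal, target_snr_db, seed=0):
    """Add white Gaussian noise scaled so that snr_db(signal, result) equals target_snr_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_energy = sum(s * s for s in signal)
    noise_energy = sum(n * n for n in noise)
    # Choose the gain so that signal_energy / (gain**2 * noise_energy) = 10**(snr/10).
    gain = math.sqrt(signal_energy / (noise_energy * 10.0 ** (target_snr_db / 10.0)))
    return [s + gain * n for s, n in zip(signal, noise)]
```

Because the gain is computed from the realised noise energy rather than its expectation, the noisy signal hits the requested input SNR exactly, which keeps comparisons between methods on the same footing.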
Single Channel Source Separation (with Onur Dikmen)
• Source 1: Horizontal, tie across time (harmonic continuity)
• Source 2: Vertical, tie across frequency (transients, percussive sounds)
[Figure: time (τ) vs frequency bin (ν) panels: X_org, S_hor, S_ver]
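One way to see the difference between the two sources is to draw variance fields from a chain prior that is coupled either across time or across frequency. The sketch below uses a hypothetical alternating inverse-gamma chain (z_k | v_{k−1} ∼ IG(a_z, a_z/v_{k−1}), v_k | z_k ∼ IG(a, a/z_k), with IG(α, β) ∝ x^{−α−1} e^{−β/x}) as a stand-in for the talk's prior, and runs one chain per frequency bin for the horizontal tie or one chain per frame for the vertical tie; sample_variance_field and its defaults are illustrative.

```python
import random


def sample_ig(shape, scale, rng):
    """Draw from IG(shape, scale), density proportional to x**(-shape-1) * exp(-scale/x)."""
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


def sample_chain(length, a, a_z, v0, rng):
    """Ancestral sample of v_1..v_length from an assumed alternating IG chain."""
    v_prev, chain = v0, []
    for _ in range(length):
        z = sample_ig(a_z, a_z / v_prev, rng)  # auxiliary z_k | v_{k-1}
        v_prev = sample_ig(a, a / z, rng)      # variance v_k | z_k
        chain.append(v_prev)
    return chain


def sample_variance_field(n_freq, n_time, tie, a=50.0, a_z=50.0, seed=0):
    """Return an n_freq x n_time variance field (list of rows indexed by frequency).

    tie == "horizontal": one chain per frequency bin, coupled across time
    tie == "vertical":   one chain per time frame, coupled across frequency
    """
    rng = random.Random(seed)
    if tie == "horizontal":
        return [sample_chain(n_time, a, a_z, 1.0, rng) for _ in range(n_freq)]
    if tie == "vertical":
        columns = [sample_chain(n_freq, a, a_z, 1.0, rng) for _ in range(n_time)]
        return [[columns[t][f] for t in range(n_time)] for f in range(n_freq)]
    raise ValueError("tie must be 'horizontal' or 'vertical'")
```

Larger a and a_z make consecutive variances in a chain more similar, so the coupling direction becomes more visible in the sampled field.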
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix
                    s1 (dB)                   s2 (dB)
                 SDR     SIR     SAR      SDR     SIR     SAR
VB             -4.74   -3.28    5.67    -1.58   15.46   -1.37
Gibbs           -4.5   -2.62    4.57     1.05   12.46    1.61
Gibbs/EM       -4.23   -2.42    4.82     1.34   13.13    1.85
Pre-trained    -4.04   -3.15    8.13     3.56   11.44    4.64
Oracle          6.14   17.16    6.58    12.66   19.95    13.6
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources
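If SDR, SIR and SAR are the standard BSS Eval measures (an assumption; the slide does not name the toolbox), they are not independent. With mutually orthogonal target, interference and artifact components and no noise term, SIR = ‖s_t‖²/‖e_i‖², SAR = (‖s_t‖² + ‖e_i‖²)/‖e_a‖² and SDR = ‖s_t‖²/(‖e_i‖² + ‖e_a‖²). The sketch below predicts SDR from the other two; for the VB row of s1 (SIR −3.28 dB, SAR 5.67 dB) it gives about −4.74 dB, matching the reported SDR. The function name is illustrative.

```python
import math


def sdr_from_sir_sar(sir_db, sar_db):
    """Predict SDR (dB) from SIR and SAR (dB) under BSS Eval definitions without noise.

    Work in energies relative to the target energy |s_t|^2 = 1:
      SIR = 1 / I,  SAR = (1 + I) / A,  SDR = 1 / (I + A)
    where I and A are the interference and artifact energies.
    """
    interference = 10.0 ** (-sir_db / 10.0)
    artifacts = (1.0 + interference) / 10.0 ** (sar_db / 10.0)
    return -10.0 * math.log10(interference + artifacts)
```

This is a quick way to sanity-check a row of such a table: a large mismatch usually points to a transcription error rather than a property of the method.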
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix
                    s1 (dB)                   s2 (dB)
                 SDR     SIR     SAR      SDR     SIR     SAR
VB              -7.8   -6.22    4.53    -2.35    18.4   -2.25
Gibbs          -8.46   -7.53    6.93    -4.04   14.59   -3.83
Gibbs/EM       -7.74   -6.19    4.62    -1.14   16.62   -0.97
Pre-trained     -6.4   -5.39    6.95      3.8   16.39    4.14
Oracle          12.1    32.9   12.14    21.13   33.89   21.37
Harmonic-Transient Decomposition
[Figure: time (τ) vs frequency bin (ν) panels X_org, S_hor and S_ver, with examples labelled (Original), (Hor) and (Vert)]
Chord Detection - Signal model (with Paul Peeling)
[Figure: log σ_ν versus frequency ν (Hz), 0–4000 Hz, vertical range −12 to 0]
Chord Detection
[Figure: left, MDCT of piano chord 41, 48, 51, 56 (frequency in Hz, 0–4000, versus time τ in s, 0–2.5); right, log Σ_ν v_{ν,j,τ} for MIDI note j (40–65) versus time τ in s]
Multichannel Source Separation
• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$, n = 1, …, N
$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$
$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$
$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$, k = 1, …, K, m = 1, …, M
$a_m \sim \mathcal{N}(a_m; \cdots)$, $r_m \sim \mathcal{IG}(r_m; \cdots)$
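The hierarchical prior can be read generatively, top to bottom, so one sample can be drawn by ancestral sampling. The sketch below assumes G(x; a, b) is a gamma density with shape a and scale b, and that v ∼ IG(a, b) means 1/v ∼ G(a, b). Under that reading E[1/v_{k,n}] = 1/λ_n, and integrating out v_{k,n} makes each s_{k,n} Student-t with ν degrees of freedom and squared scale λ_n, which is one way to see λ_n as the overall volume of source n. The mixing vectors a_m and noise variances r_m use placeholder choices because their hyperparameters are elided on the slide; the function name and defaults are illustrative.

```python
import random


def sample_multichannel_prior(K, N, M, nu=4.0, a_lam=2.0, b_lam=1.0, noise_var=0.01, seed=0):
    """Ancestral sample (lam, A, s, x) from the hierarchical source-separation prior.

    Assumed conventions: G(a, b) has shape a and scale b; IG(a, b) is the law of 1/g
    with g ~ G(a, b). A (mixing matrix) and noise_var (r_m) are placeholders.
    """
    rng = random.Random(seed)
    lam = [rng.gammavariate(a_lam, b_lam) for _ in range(N)]  # lambda_n ~ G(a_lam, b_lam)
    A = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]  # row m is a_m^T
    s, x = [], []
    for _ in range(K):
        # v_kn ~ IG(nu/2, 2/(nu * lambda_n)), i.e. 1/v_kn ~ G(nu/2, 2/(nu * lambda_n))
        v_k = [1.0 / rng.gammavariate(nu / 2.0, 2.0 / (nu * lam[n])) for n in range(N)]
        s_k = [rng.gauss(0.0, v ** 0.5) for v in v_k]  # s_kn ~ N(0, v_kn)
        # x_km ~ N(a_m^T s_k, r_m)
        x_k = [rng.gauss(sum(A[m][n] * s_k[n] for n in range(N)), noise_var ** 0.5)
               for m in range(M)]
        s.append(s_k)
        x.append(x_k)
    return lam, A, s, x
```

Choosing M < N reproduces the underdetermined setting discussed under Multimodality.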
Equivalent Gamma MRF
• A tree for each source
• λ_n can be interpreted as the overall “volume” of source n
Source Separation
[Figure: spectrograms, frequency f (Hz, 0–10000) versus time t (sec): (Speech), (Piano), (Guitar), (Mix)]
Reconstructions
[Figure: reconstructed spectrograms, frequency f (Hz, 0–10000) versus time t (sec): (Speech), (Piano), (Guitar)]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: traces of a, λ and r versus epoch (0–2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets/detections/spectral features):
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– …
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Diagram: grid of states with tempo level n_k ∈ {1, 2, 3} (vertical) and bar position m_k ∈ {1, …, 8} (horizontal), with bar regions marked for 3/4 time and 4/4 time]
• Each dot denotes a state x = (m, n) (score position, tempo level)
• Directed arcs denote state transitions with positive probability
Bar position Pointer - transition model
[Figure: three panels of p(x_2 | x_1) over bar position (1–8) and tempo level (1–5)]
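The arcs in the bar-pointer state space can be written as a transition function. The sketch below assumes one simple choice of dynamics: the bar position advances deterministically by the current tempo level, m' = (m + n) mod M, and the tempo level takes a lazy random walk on 1, …, N with change probability p_change. These dynamics, the 0-based positions and the name bar_pointer_successors are illustrative assumptions, not necessarily the exact model of Whiteley, Cemgil and Godsill (2006).

```python
def bar_pointer_successors(m, n, M=8, N=3, p_change=0.1):
    """Return {(m_next, n_next): probability} for the state x = (m, n).

    m is a 0-based bar position in 0..M-1, n a tempo level in 1..N.
    """
    m_next = (m + n) % M  # position advances by the tempo level, wrapping at the bar line
    successors = {}
    for n_cand, p in ((n - 1, p_change / 2), (n, 1.0 - p_change), (n + 1, p_change / 2)):
        n_next = min(max(n_cand, 1), N)  # mass that would leave 1..N stays at the edge
        key = (m_next, n_next)
        successors[key] = successors.get(key, 0.0) + p
    return successors
```

Each state has at most three successors, so the transition matrix is very sparse; this is what keeps exact HMM recursions cheap for this model.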
Bar position Pointer - k = 1
[Figure: p(x_1) over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: p(x_2) over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: p(x_3) over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: p(x_4) over bar position and tempo level]
Bar position Pointer - k = 5
[Figure: p(x_5 | y_{1:5}) over bar position and tempo level, after observing y_5 = 0]
Bar position Pointer - k = 10
[Figure: p(x_10) over bar position and tempo level]
Bar position Pointer - observation model (Poisson)
• Observation model p(y_k | x_k): Poisson with intensity μ_k depending on the bar position m_k
[Figure: μ_k versus m_k (0–1000, vertical range 0–4) for a triplet rhythm and a duplet rhythm]
Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Graphical model: chains n_0 … n_3, θ_0 … θ_3, m_0 … m_3 and r_0 … r_3, with intensities λ_1 … λ_3 and observations y_1 … y_3]
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator
Filtering
[Figure, versus frame index k (0–450): observed data y_k; log p(m_k | y_{1:k}) over bar position m_k; log p(n_k | y_{1:k}) over tempo (quarter notes per min, 60–180); p(r_k | y_{1:k}) for triplets and duplets]
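Filtering computes p(x_k | y_{1:k}) recursively: predict through the transition model, then weight by the observation likelihood. A self-contained sketch for the bar pointer with Poisson counts y_k ∼ Poisson(μ(m_k)), assuming that the position advances by the tempo level, that the tempo takes a lazy random walk, and that the initial distribution is uniform; bar_pointer_filter and the intensity profile passed to it are hypothetical.

```python
import math


def poisson_logpmf(y, mu):
    """log Poisson(y; mu) for an integer count y >= 0 and rate mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def bar_pointer_filter(y, intensity, N=3, p_change=0.1):
    """Return the filtered distributions p(x_k | y_1:k) as a list of dicts over (m, n).

    intensity[m] is the assumed Poisson rate at 0-based bar position m (M = len(intensity)).
    """
    M = len(intensity)
    states = [(m, n) for m in range(M) for n in range(1, N + 1)]
    belief = {x: 1.0 / len(states) for x in states}  # uniform p(x_0)
    history = []
    for y_k in y:
        # Predict: p(x_k | y_1:k-1) = sum over x_{k-1} of p(x_k | x_{k-1}) p(x_{k-1} | y_1:k-1)
        pred = dict.fromkeys(states, 0.0)
        for (m, n), b in belief.items():
            m_next = (m + n) % M
            for n_cand, p in ((n - 1, p_change / 2), (n, 1.0 - p_change), (n + 1, p_change / 2)):
                pred[(m_next, min(max(n_cand, 1), N))] += b * p
        # Update: multiply by p(y_k | x_k), subtracting the max log-likelihood for stability
        loglik = {x: poisson_logpmf(y_k, intensity[x[0]]) for x in states}
        top = max(loglik.values())
        post = {x: pred[x] * math.exp(loglik[x] - top) for x in states}
        total = sum(post.values())
        belief = {x: p / total for x, p in post.items()}
        history.append(belief)
    return history
```

For example, feeding three bars of the counts [4, 0, 0, 2, 0, 0, 1, 0] with a matching intensity profile concentrates the final filtered mass on the last bar position at tempo level 1.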
Smoothing
[Figure, versus frame index k (0–450): observed data y_k; log p(m_k | y_{1:K}) over bar position m_k; log p(n_k | y_{1:K}) over tempo (quarter notes per min, 60–180); p(r_k | y_{1:K}) for triplets and duplets]
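Smoothing conditions every frame on the whole observation sequence y_{1:K}. Once the (m, n) pairs are enumerated as integer states, the bar pointer is an ordinary discrete HMM and the standard forward–backward recursion applies. A generic sketch with per-step rescaling; the function name and argument layout are illustrative:

```python
def forward_backward(prior, trans, lik):
    """Smoothed marginals p(x_k = i | y_1:K) for a discrete HMM.

    prior[i]    = p(x_1 = i)
    trans[i][j] = p(x_{k+1} = j | x_k = i)
    lik[k][i]   = p(y_k | x_k = i)
    Alpha and beta messages are rescaled at every step to avoid underflow.
    """
    K, S = len(lik), len(prior)
    alpha, scale = [], []
    for k in range(K):
        if k == 0:
            a = [prior[i] * lik[0][i] for i in range(S)]
        else:
            a = [sum(alpha[k - 1][i] * trans[i][j] for i in range(S)) * lik[k][j]
                 for j in range(S)]
        c = sum(a)
        alpha.append([x / c for x in a])  # alpha[k] = p(x_k | y_1:k)
        scale.append(c)                   # c = p(y_k | y_1:k-1)
    beta = [[1.0] * S for _ in range(K)]
    for k in range(K - 2, -1, -1):
        beta[k] = [sum(trans[i][j] * lik[k + 1][j] * beta[k + 1][j] for j in range(S))
                   / scale[k + 1] for i in range(S)]
    smoothed = []
    for k in range(K):
        g = [alpha[k][i] * beta[k][i] for i in range(S)]
        z = sum(g)
        smoothed.append([x / z for x in g])
    return smoothed
```

The forward pass alone gives the filtered distributions; the backward pass is what an offline (batch) analysis adds on top of it.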
Time Signature
[Figure: observed data (sample value versus time in s, 0–12); then, versus frame index k (0–500): log p(m_k | z_{1:K}); log p(n_k | z_{1:K}) over tempo (quarter notes per min); p(θ_k | z_{1:K}) for 4/4 and 3/4]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: musical score excerpt and audio signal x_t versus time t]
Score-Performance matching - Graphical Model
[Graphical model: chains t_1, t_2, …, t_K and r_1, r_2, …, r_K; loudness λ_1, λ_2, …, λ_K; variances v_{ν,1}, v_{ν,2}, …, v_{ν,K} and coefficients s_{ν,1}, s_{ν,2}, …, s_{ν,K}, replicated over ν = 1, …, W; inset: score positions 1–8 mapped to r_k]
$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau)))$
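The variance model above ties each coefficient's variance to a spectral template σ_ν(r_τ) for the current label r_τ, scaled by a loudness λ. A sketch that draws one frame of coefficients, assuming IG(v; a, b) means 1/v ∼ Gamma(shape a, scale b), so that E[1/v_{ν,τ}] = 1/(λσ_ν(r_τ)) and larger template values give larger variances; the template dictionary and the name sample_frame are hypothetical.

```python
import random


def sample_frame(template, r, lam, a=5.0, rng=None):
    """Draw coefficients s_{nu,tau} for one frame with label r.

    template[r][nu] stands in for sigma_nu(r); lam is the loudness.
    v_{nu,tau} ~ IG(a, 1/(a * lam * sigma)) is drawn as 1/g with
    g ~ Gamma(shape a, scale 1/(a * lam * sigma)); then s_{nu,tau} ~ N(0, v_{nu,tau}).
    """
    rng = rng or random.Random(0)
    frame = []
    for sigma in template[r]:
        v = 1.0 / rng.gammavariate(a, 1.0 / (a * lam * sigma))
        frame.append(rng.gauss(0.0, v ** 0.5))
    return frame
```

The shape a controls how tightly the variances follow the template: for a > 1, E[v_{ν,τ}] = aλσ_ν(r_τ)/(a − 1), which approaches λσ_ν(r_τ) as a grows.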
Score-Performance matching - Signal model
[Figure: log σ_ν versus frequency ν (Hz), 0–4000 Hz, vertical range −12 to 0]
Score-Performance matching
[Figure: spectrogram data (frequency in Hz, 0–4000, versus time in s, 0–14) and MIDI data (MIDI note 55–85 versus score position)]
Online (filtering) or offline (smoothing) processing is possible
Transcription
[Figure: log p(r_τ | s_τ) over MIDI note number (60–80) versus time in s (1–4); log λ versus time in s, with Σ_i w_τ^{(i)} λ_τ^{(i)}; MDCT of audio (source: Daniel-Ben Pienaar), frequency in Hz (0–4000) versus time in s]
Summary
• “Time domain”: switching state space models
– State space modeling
– Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform domain”: Gamma fields
– Models on (orthogonal) transform coefficients: energy compaction
– Practical: can make use of fast transforms (FFT, MDCT, …)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical score guided source separation
– Prior structures for other observation models: NMF
VB or Gibbs
[Factor graph: ψ_{0,1} – z_1 – ψ_{1,1} – v_1 – ψ_{1,2} – z_2 – ψ_{2,2} – v_2 – ···, with observation factors p(y_1|v_1), p(y_2|v_2), …]
• VB
$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\,q^{(\tau)}(z_{k+1})}\right)$
• Gibbs
$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\,\psi_{k,k}(z_k^{(\tau)})\,\psi_{k,k+1}(z_{k+1}^{(\tau)})$
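With conjugate potentials, the mean-field update for q(v_k) on this slide, together with the matching update for q(z_k), reduces to closed-form inverse-gamma parameters that depend only on expectations of reciprocals. The sketch below iterates both updates for a hypothetical parametrisation of the alternating chain (z_k | v_{k−1} ∼ IG(a_z, a_z/v_{k−1}), v_k | z_k ∼ IG(a, a/z_k), y_k | v_k ∼ N(0, v_k), with IG(α, β) ∝ x^{−α−1} e^{−β/x}). This parametrisation is an assumption rather than the talk's exact ψ; for IG(α, β), ⟨1/x⟩ = α/β. The name vb_igmc and the defaults are illustrative.

```python
def vb_igmc(y, a=10.0, a_z=10.0, v0=1.0, n_iter=100):
    """Mean-field VB for an assumed alternating IG chain; returns E_q[v_k].

    Updates (IG(shape, scale), with <1/x> = shape / scale):
      q(z_k) = IG(a_z + a, a_z <1/v_{k-1}> + a <1/v_k>)          (v_0 = v0 fixed)
      q(v_k) = IG(a + a_z + 1/2, a <1/z_k> + a_z <1/z_{k+1}> + y_k^2 / 2)
    """
    K = len(y)
    inv_v = [1.0] * K  # <1/v_k> under q(v_k)
    inv_z = [1.0] * K  # <1/z_k> under q(z_k)
    shapes, scales = [0.0] * K, [0.0] * K
    for _ in range(n_iter):
        for k in range(K):
            prev = 1.0 / v0 if k == 0 else inv_v[k - 1]
            inv_z[k] = (a_z + a) / (a_z * prev + a * inv_v[k])
        for k in range(K):
            shapes[k] = a + 0.5
            scales[k] = a * inv_z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:  # v_K has no child z_{K+1}
                shapes[k] += a_z
                scales[k] += a_z * inv_z[k + 1]
            inv_v[k] = shapes[k] / scales[k]
    # Mean of IG(shape, scale) is scale / (shape - 1) for shape > 1.
    return [sc / (sh - 1.0) for sh, sc in zip(shapes, scales)]
```

Unlike the Gibbs sampler, each iteration is deterministic and only propagates expectations, which is why VB is cheap but can under-represent posterior uncertainty.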
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
bull Inference Variational Bayes
Noisy Original
X
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xorg
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xh SNR1998
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xv SNR2079
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xb SNR1968
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xg SNR1997
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51
Denoising ndash MusicOriginal
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Noisy
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
PF SNR853
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Gibbs SNR866
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
VB SNR208
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52
Single Channel Source Separation (with Onur Dikmen)
bull Source 1 Horizontal Tie across time harmonic continuity
bull Source 2 Vertical Tie across frequency transients percussive sounds
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53
Single Channel Source Separation with IGMCs
E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Preminus trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
bull Oracle We use the square of the source coefficient as the latent variance estimate
bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 51
VB or Gibbs
ψ01
z1
ψ11
v1
ψ12
z2
ψ22
v2 middot middot middot
p(y1|v1) p(y2|v2)
bull VB
q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))
bull Gibbs
z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v
(τ)kminus1)ψkk(v
(τ)k )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50
Denoising - Speech (VB)
bull Additive Gaussian noise with unknown variance
bull Inference Variational Bayes
Noisy Original
X
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xorg
20 40 60 80 100 120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xh SNR1998
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xv SNR2079
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xb SNR1968
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Xg SNR1997
20406080100120
50
100
150
200
250
300
350
400
450
500
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51
Denoising ndash MusicOriginal
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Noisy
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
PF SNR853
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Gibbs SNR866
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
VB SNR208
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52
Single Channel Source Separation (with Onur Dikmen)
bull Source 1 Horizontal Tie across time harmonic continuity
bull Source 2 Vertical Tie across frequency transients percussive sounds
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53
Single Channel Source Separation with IGMCs
E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Preminus trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
bull Oracle We use the square of the source coefficient as the latent variance estimate
bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Source Separation
[Figure: spectrograms (frequency f in Hz versus time t) of the three sources (Speech, Piano, Guitar) and of their mixture (Mix)]
[Audio examples: s1.wav, s2.wav, s3.wav, x1.wav]
Reconstructions
[Figure: spectrograms (frequency f in Hz versus time t) of the reconstructed sources (Speech, Piano, Guitar)]
[Audio examples: var_se1.wav, var_se2.wav, var_se3.wav]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
[Figure: sampler traces of a, λ and r versus epoch (0 to 2000)]
Tempo tracking and score-performance matching
• Given expressive music data (onsets, detections, spectral features):
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– …
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
Bar position pointer (Whiteley, Cemgil, Godsill, 2006)
[Figure: lattice of states, tempo level n_k ∈ {1, 2, 3} versus bar position m_k ∈ {1, …, 8}, with bar lines marked for 3/4 time and 4/4 time]
• Each dot denotes a state x = (m, n) (score position, tempo level)
• Directed arcs denote state transitions with positive probability
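A minimal sketch of the bar pointer state space, assuming (as the lattice suggests) that at tempo level n the pointer advances n positions per frame and wraps around at the end of a bar of M positions. Positions are 1-based as in the figure; the function names are illustrative, not taken from the original work.

```python
def bar_pointer_states(M, N):
    """All states x = (m, n): bar position m in 1..M, tempo level n in 1..N."""
    return [(m, n) for n in range(1, N + 1) for m in range(1, M + 1)]

def advance(m, n, M):
    """Deterministic pointer update: move n positions, wrapping at the bar end."""
    return (m + n - 1) % M + 1

def trajectory(m0, n, M, steps):
    """Bar positions visited when the tempo level stays fixed at n."""
    path = [m0]
    for _ in range(steps):
        path.append(advance(path[-1], n, M))
    return path

print(trajectory(1, 3, 8, 4))  # [1, 4, 7, 2, 5]
```

In the example, the jump from position 7 to position 2 is the pointer crossing a bar line; a faster tempo level simply skips more lattice points per frame.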
Bar position pointer - transition model
[Figure: three panels showing p(x_2 | x_1) over bar position (1 to 8) and tempo level (1 to 5)]
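One way to write the transition model as a matrix, assuming the pointer first advances by the current tempo level and the tempo level then performs a small random walk. The change probability `p_change` and the clamping at levels 1 and N are illustrative choices, not values from the slides.

```python
import numpy as np

def transition_matrix(M, N, p_change=0.1):
    """Dense matrix of p(x_{k+1} | x_k) over bar pointer states x = (m, n).

    Illustrative assumptions: the pointer advances n positions and wraps at M;
    the tempo level then moves to n - 1 or n + 1 with probability p_change / 2
    each. Moves outside 1..N are clamped, so that mass stays on the edge level.
    State (m, n) is stored at row/column (n - 1) * M + (m - 1).
    """
    def index(m, n):
        return (n - 1) * M + (m - 1)

    T = np.zeros((M * N, M * N))
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            m_next = (m + n - 1) % M + 1          # deterministic pointer update
            for dn, p in ((-1, p_change / 2), (0, 1.0 - p_change), (1, p_change / 2)):
                n_next = min(max(n + dn, 1), N)   # clamp the tempo level to 1..N
                T[index(m, n), index(m_next, n_next)] += p
    return T

T = transition_matrix(M=8, N=5)  # the 8 x 5 lattice of the figure
print(T.shape)                   # (40, 40)
```

A dense matrix is used only for clarity; since each state has at most three successors, a sparse representation would scale better to finely discretized bars.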
Bar position pointer - k = 1
[Figure: p(x_1) over bar position and tempo level]
Bar position pointer - k = 2
[Figure: p(x_2) over bar position and tempo level]
Bar position pointer - k = 3
[Figure: p(x_3) over bar position and tempo level]
Bar position pointer - k = 4
[Figure: p(x_4) over bar position and tempo level]
Bar position pointer - k = 5
[Figure: p(x_5 | y_{1:5}) over bar position and tempo level, after observing y_5 = 0]
Bar position pointer - k = 10
[Figure: p(x_10) over bar position and tempo level]
Bar position pointer - observation model (Poisson)
• Observation model p(y_k | x_k): Poisson with intensity μ_k
[Figure: intensity μ_k versus bar position m_k (0 to 1000) for a triplet rhythm and for a duplet rhythm]
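A minimal sketch of a Poisson observation model, assuming y_k counts onsets in frame k and the intensity μ_k depends on the bar position through a rhythm-specific template. The Gaussian-bump template, its parameters and the pulse counts (12 per bar for triplets, 8 for duplets) are hypothetical choices made for illustration; the slide only shows the resulting intensity curves.

```python
import math

def poisson_loglik(y, mu):
    """log p(y | mu) for an onset count y ~ Poisson(mu), mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def intensity(m, M, pulses, base=0.05, peak=4.0, width=0.01):
    """Hypothetical intensity template mu(m) for bar position m in 0..M-1.

    Gaussian bumps of height `peak` sit at `pulses` equally spaced points of
    the bar (for instance 12 for triplets, 8 for duplets); `width` is a
    fraction of the bar and `base` keeps mu strictly positive.
    """
    phase = (m % M) / M
    # distance to the nearest pulse; j = pulses covers the wrap-around at 1.0
    d = min(abs(phase - j / pulses) for j in range(pulses + 1))
    return base + peak * math.exp(-0.5 * (d / width) ** 2)
```

For an onset (y = 1) at position 333 of a 1000-position bar, the triplet template scores higher than the duplet template, since 333/1000 lies next to the triplet pulse at 4/12 of the bar; a frame with no onset (y = 0) at that position would favour the duplet template instead.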
Tempo, rhythm and meter analysis
Bar pointer model (Whiteley, Cemgil, Godsill, 2006)
[Figure: graphical model with chains n_k (tempo), θ_k (time signature), m_k (bar position) and r_k (rhythm) for k = 0, …, 3, intensities λ_k and observations y_k for k = 1, …, 3]
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator
Filtering
[Figure: observed data y_k versus frame index k; filtering densities log p(m_k | y_{1:k}) over bar position m_k, log p(n_k | y_{1:k}) over tempo (quarter notes per minute, 60 to 180) and p(r_k | y_{1:k}) over rhythm (triplets, duplets), each versus frame index k (0 to 450)]
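Filtering computes p(x_k | y_{1:k}) recursively: predict with the transition model, then multiply by the observation likelihood and renormalise. A minimal sketch of this forward recursion for a generic discrete HMM, such as the bar pointer lattice flattened into S = M × N states; it assumes a dense row-stochastic transition matrix T and per-frame log-likelihoods, and the function name is hypothetical.

```python
import numpy as np

def forward_filter(T, prior, loglik):
    """Filtering densities p(x_k | y_{1:k}) of a discrete HMM.

    T[i, j] = p(x_{k+1} = j | x_k = i), prior[j] = p(x_1 = j),
    loglik[k, j] = log p(y_k | x_k = j).
    Returns (alpha, log_evidence): alpha[k] sums to one, and
    log_evidence = log p(y_{1:K}).
    """
    T = np.asarray(T, dtype=float)
    loglik = np.asarray(loglik, dtype=float)
    K, S = loglik.shape
    alpha = np.zeros((K, S))
    log_evidence = 0.0
    for k in range(K):
        pred = np.asarray(prior, dtype=float) if k == 0 else alpha[k - 1] @ T  # predict
        shift = loglik[k].max()                  # rescale to avoid underflow
        w = pred * np.exp(loglik[k] - shift)     # update with the observation
        c = w.sum()
        alpha[k] = w / c
        log_evidence += np.log(c) + shift
    return alpha, log_evidence
```

Reshaping alpha[k] to the (tempo level, bar position) lattice and summing over tempo levels would give a bar-position marginal like the one plotted; the accumulated log_evidence could also be used to compare competing models.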
Smoothing
[Figure: observed data y_k versus frame index k; smoothing densities log p(m_k | y_{1:K}) over bar position m_k, log p(n_k | y_{1:K}) over tempo (quarter notes per minute, 60 to 180) and p(r_k | y_{1:K}) over rhythm (triplets, duplets), each versus frame index k (0 to 450)]
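Smoothing adds a backward pass, so that each p(x_k | y_{1:K}) conditions on the whole observation sequence rather than only on past frames. The sketch below is the standard scaled forward-backward recursion for a discrete HMM, assuming a dense transition matrix T and per-frame log-likelihoods; it is an illustration, not the implementation used for the plots.

```python
import numpy as np

def forward_backward(T, prior, loglik):
    """Smoothing densities p(x_k | y_{1:K}) of a discrete HMM (scaled recursions).

    T[i, j] = p(x_{k+1} = j | x_k = i), prior[j] = p(x_1 = j),
    loglik[k, j] = log p(y_k | x_k = j).
    """
    T = np.asarray(T, dtype=float)
    loglik = np.asarray(loglik, dtype=float)
    K, S = loglik.shape
    # Per-frame rescaling of the likelihood leaves the posteriors unchanged.
    lik = np.exp(loglik - loglik.max(axis=1, keepdims=True))
    alpha = np.zeros((K, S))
    c = np.zeros(K)
    for k in range(K):                      # forward pass: p(x_k | y_{1:k})
        pred = np.asarray(prior, dtype=float) if k == 0 else alpha[k - 1] @ T
        w = pred * lik[k]
        c[k] = w.sum()
        alpha[k] = w / c[k]
    beta = np.ones((K, S))
    for k in range(K - 2, -1, -1):          # backward pass, scaled by the same c
        beta[k] = T @ (lik[k + 1] * beta[k + 1]) / c[k + 1]
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```

The last row of the result equals the last filtering density, while earlier rows are sharpened by later evidence, which is why the smoothed plots look less ambiguous than the filtered ones.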
Time Signature
[Figure: observed audio (sample value versus time, 0 to 12 s); smoothing densities log p(m_k | z_{1:K}) over bar position m_k, log p(n_k | z_{1:K}) over tempo (quarter notes per minute, 52 to 155) and p(θ_k | z_{1:K}) over time signature (4/4, 3/4), each versus frame index k (0 to 500)]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: a score excerpt and the audio signal x_t versus time t]
Score-Performance matching - Graphical Model
[Figure: graphical model with a plate over frequency bins ν = 1, …, W; nodes t_1, …, t_K, score positions r_1, …, r_K, scales λ_1, …, λ_K, variances v_{ν1}, …, v_{νK} and coefficients s_{ν1}, …, s_{νK}; inset showing the score positions r_k]
$v_{\nu\tau} \sim \mathcal{IG}(v_{\nu\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau)))$
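Under this prior, integrating out v_{ντ} turns each coefficient's likelihood into a Student-t density with 2a degrees of freedom and squared scale λσ_ν(r), so a frame can be scored against the spectral template of every score position. A minimal sketch, assuming IG(v; a, b) means 1/v ~ Gamma(shape a, scale b), that coefficients are zero-mean Gaussian given their variance, and that a template is simply a list of σ_ν values per frequency bin; the function names and the toy templates are hypothetical.

```python
import math

def log_marginal(s, a, b):
    """log of the integral over v of N(s; 0, v) IG(v; a, b), where 1/v ~ Gamma(shape a, scale b).

    Closed form: Gamma(a + 1/2) / (sqrt(2 pi) Gamma(a) b^a (s^2/2 + 1/b)^(a + 1/2)),
    i.e. a Student-t density with 2a degrees of freedom.
    """
    return (math.lgamma(a + 0.5) - math.lgamma(a) - 0.5 * math.log(2.0 * math.pi)
            - a * math.log(b) - (a + 0.5) * math.log(0.5 * s * s + 1.0 / b))

def frame_loglik(frame, template, lam, a=2.0):
    """log p(s_1..s_W | r, lam) for one frame; template[nu] stands for sigma_nu(r)."""
    return sum(log_marginal(s, a, 1.0 / (a * lam * sigma))
               for s, sigma in zip(frame, template))

def best_position(frame, templates, lam, a=2.0):
    """Index of the score position whose template best explains the frame."""
    scores = [frame_loglik(frame, t, lam, a) for t in templates]
    return max(range(len(scores)), key=lambda r: scores[r])

# Two hypothetical 3-bin templates; energy in the second bin points to position 1.
templates = [[1.0, 0.01, 0.01], [0.01, 1.0, 0.01]]
print(best_position([0.0, 2.0, 0.0], templates, lam=1.0))  # 1
```

In a full matcher these per-frame scores would serve as the observation log-likelihoods of an HMM over score positions rather than being maximised frame by frame.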
Score-Performance matching - Signal model
[Figure: log σ_ν versus frequency ν (Hz), 0 to 4000 Hz]
Score-Performance matching
[Figure: spectrogram data (frequency 0 to 4000 Hz versus time 0 to 14 s) and MIDI data (MIDI note 55 to 85 versus score position)]
Online (filtering) or offline (smoothing) processing is possible.
Transcription
[Figure: log p(r_τ | s_τ) over MIDI note number (60 to 80) versus time (s); the estimate Σ_i w_τ^(i) λ_τ^(i) on a log scale versus time (s); MDCT of the audio (source: Daniel-Ben Pienaar), frequency (0 to 4000 Hz) versus time (s)]
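The quantity Σ_i w_τ^(i) λ_τ^(i) in the middle panel reads as a weighted average over particles i, the kind of estimate a sequential Monte Carlo sampler produces. A minimal sketch of that estimator, assuming the w_τ^(i) are importance weights known up to normalisation and stored as log-weights; the effective sample size is a standard diagnostic added for illustration, and the function name is hypothetical.

```python
import numpy as np

def particle_estimate(log_w, values):
    """Weighted particle estimate sum_i w^(i) value^(i) and effective sample size.

    log_w holds unnormalised log importance weights, one per particle.
    """
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())    # subtract the max so exp() cannot overflow
    w /= w.sum()                       # normalised weights w^(i)
    estimate = float(np.dot(w, np.asarray(values, dtype=float)))
    ess = float(1.0 / np.sum(w ** 2))  # between 1 and the number of particles
    return estimate, ess
```

A low effective sample size signals that a few particles dominate, which is the usual trigger for resampling in SMC.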
Summary
• "Time domain": switching state space models
– State space modeling
– Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• "Transform domain": Gamma fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT, …)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical-score-guided source separation
– Prior structures for other observation models (NMF)
Denoising - Speech (VB)
• Additive Gaussian noise with unknown variance
• Inference: variational Bayes
[Figure: time-frequency plots of the noisy observation (X) and the original (Xorg), and reconstructions Xh (SNR 19.98), Xv (SNR 20.79), Xb (SNR 19.68) and Xg (SNR 19.97); color scale from -14 to 0]
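The SNR figures quoted for each reconstruction are presumably the usual ratio of clean-signal energy to error energy in dB; the slides do not state which variant is used, so that definition is an assumption here. The sketch computes it, together with the posterior mean of a coefficient under additive Gaussian noise once the variances are known. It treats the prior variance v and the noise variance r as given, whereas the slides infer them with VB; the function names are hypothetical.

```python
import numpy as np

def snr_db(clean, estimate):
    """Signal-to-noise ratio of an estimate in dB: 10 log10(||s||^2 / ||s - s_hat||^2)."""
    clean = np.asarray(clean, dtype=float)
    err = clean - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2))

def wiener_denoise(x, v, r):
    """Element-wise E[s | x] for x = s + n with s ~ N(0, v) and n ~ N(0, r).

    VB or Gibbs sampling would estimate v (and r); here both are taken as known.
    """
    v = np.asarray(v, dtype=float)
    return v / (v + r) * np.asarray(x, dtype=float)

print(wiener_denoise([2.0, 4.0], [1.0, 3.0], 1.0))  # [1. 3.]
```

Coefficients with a large prior variance relative to the noise pass almost unchanged, while low-variance coefficients are shrunk towards zero; this is why a good variance prior matters for denoising.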
Denoising - Music
[Figure: time-frequency plots of the original, the noisy observation, and reconstructions by PF (SNR 8.53), Gibbs (SNR 8.66) and VB (SNR 2.08); color scale from -18 to 0]
"Tristram" (Matt Uelmen) + ~0 dB white noise
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal, tied across time (harmonic continuity)
• Source 2: vertical, tied across frequency (transients, percussive sounds)
[Figure: original time-frequency representation (Xorg) and the estimated horizontal (Shor) and vertical (Sver) sources; axes: time (τ) and frequency bin (ν)]
Single Channel Source Separation with IGMCs
E-guitar "Matte Kudasai" (King Crimson) + drums "Territory" (Sepultura) = mix
SDR, SIR and SAR in dB
Method        s1 SDR  s1 SIR  s1 SAR  s2 SDR  s2 SIR  s2 SAR
VB             -4.74   -3.28    5.67   -1.58   15.46   -1.37
Gibbs           -4.5   -2.62    4.57    1.05   12.46    1.61
GibbsEM        -4.23   -2.42    4.82    1.34   13.13    1.85
Pre-trained    -4.04   -3.15    8.13    3.56   11.44    4.64
Oracle          6.14   17.16    6.58   12.66   19.95    13.6
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources
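Given the latent variances, each source coefficient has a Gaussian posterior whose mean is a soft, Wiener-style mask applied to the mixture. A minimal sketch of the oracle setting described above, assuming the mixture is the plain sum of the two sources' transform coefficients with no additive noise; the function names are hypothetical.

```python
import numpy as np

def wiener_separate(x, variances, eps=1e-12):
    """Posterior means E[s_i | x, v] when x = sum_i s_i and s_i ~ N(0, v_i) independently.

    x: mixture coefficients; variances: array of shape (n_sources,) + x.shape.
    Each estimate is the mixture scaled by that source's share of the variance.
    """
    v = np.asarray(variances, dtype=float)
    return v / (v.sum(axis=0) + eps) * np.asarray(x, dtype=float)

def oracle_separate(sources):
    """Oracle setting: the squared true coefficients serve as the latent variances."""
    s = np.asarray(sources, dtype=float)
    return wiener_separate(s.sum(axis=0), s ** 2)

print(oracle_separate([[1.0], [2.0]]).ravel())  # approximately [0.6 2.4]
```

Because the masks sum to one, the estimates always add back up to the mixture; with oracle variances, coefficients where only one source is active are recovered exactly, while overlapping coefficients are split in proportion to their energies.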
Single Channel Source Separation with IGMCs
"Vandringar I Vilsenhet" (Anglagard) + "Moby Dick" (Led Zeppelin) = mix
SDR, SIR and SAR in dB
Method        s1 SDR  s1 SIR  s1 SAR  s2 SDR  s2 SIR  s2 SAR
VB              -7.8   -6.22    4.53   -2.35    18.4   -2.25
Gibbs          -8.46   -7.53    6.93   -4.04   14.59   -3.83
GibbsEM        -7.74   -6.19    4.62   -1.14   16.62   -0.97
Pre-trained     -6.4   -5.39    6.95     3.8   16.39    4.14
Oracle          12.1    32.9   12.14   21.13   33.89   21.37
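SDR, SIR and SAR come from splitting an estimate into a target part, an interference part and an artifact part. The sketch below uses the simplest such split, orthogonal projections that allow only a gain distortion of the true source; it is a simplified stand-in, not the full BSS_EVAL toolbox, which also allows a short distortion filter. Because the three parts are orthogonal, SDR can be recovered from SIR and SAR; for example, the VB row for s1 in the first table (SIR -3.28 dB, SAR 5.67 dB) gives an SDR of about -4.74 dB. The function names are hypothetical.

```python
import numpy as np

def bss_metrics(estimate, sources, j):
    """(SDR, SIR, SAR) in dB for an estimate of source j, gain-only distortion.

    sources has shape (n_sources, n_samples). The estimate is split into
    orthogonal parts: s_target (projection onto s_j), e_interf (rest of the
    projection onto the span of all sources) and e_artif (the remainder).
    """
    S = np.asarray(sources, dtype=float)
    e = np.asarray(estimate, dtype=float)
    sj = S[j]
    s_target = (e @ sj) / (sj @ sj) * sj
    coeffs, *_ = np.linalg.lstsq(S.T, e, rcond=None)  # least-squares projection onto span(S)
    p_all = S.T @ coeffs
    e_interf = p_all - s_target
    e_artif = e - p_all

    def db(num, den):
        return 10.0 * np.log10(num / den)

    noise = e_interf + e_artif
    sdr = db(s_target @ s_target, noise @ noise)
    sir = db(s_target @ s_target, e_interf @ e_interf)
    sar = db((s_target + e_interf) @ (s_target + e_interf), e_artif @ e_artif)
    return sdr, sir, sar

def sdr_from_sir_sar(sir, sar):
    """SDR implied by SIR and SAR when the three parts are mutually orthogonal."""
    i = 10.0 ** (-sir / 10.0)               # |e_interf|^2 / |s_target|^2
    a = (1.0 + i) * 10.0 ** (-sar / 10.0)   # |e_artif|^2 / |s_target|^2
    return -10.0 * np.log10(i + a)

print(round(float(sdr_from_sir_sar(-3.28, 5.67)), 2))  # about -4.74
```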
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 53
Denoising ndash MusicOriginal
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Noisy
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
PF SNR853
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
Gibbs SNR866
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
VB SNR208
50 100 150 200 250
50
100
150
200
250
300
350
400
450
500
minus18
minus16
minus14
minus12
minus10
minus8
minus6
minus4
minus2
0
ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52
Single Channel Source Separation (with Onur Dikmen)
bull Source 1 Horizontal Tie across time harmonic continuity
bull Source 2 Vertical Tie across frequency transients percussive sounds
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53
Single Channel Source Separation with IGMCs
E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Preminus trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
bull Oracle We use the square of the source coefficient as the latent variance estimate
bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Time Signature
[Figure: observed data (sample value versus time, 0 to 12 s), and smoothed estimates versus frame index k (0 to 500): $\log p(m_k \mid z_{1:K})$ over bar position $m_k$; $\log p(n_k \mid z_{1:K})$ over tempo in quarter notes per minute (52 to 155); $p(\theta_k \mid z_{1:K})$ for 4/4 versus 3/4]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt aligned with the audio signal $x_t$ over time $t$]
Score-Performance matching - Graphical Model
[Graphical model: nodes $t_\tau$, $r_\tau$, $\lambda_\tau$ for frames τ = 1, ..., K, and $v_{\nu\tau}$, $s_{\nu\tau}$ for frequency bins ν = 1, ..., W; inset: score events numbered 1 to 8 with indicator $r_k$]
$v_{\nu\tau} \sim \mathcal{IG}\big(v_{\nu\tau};\ a,\ 1/(a \lambda_\tau \sigma_\nu(r_\tau))\big)$
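To see what this prior does, here is a minimal sketch that draws one frame's variances around a spectral template. It assumes the convention that $v \sim \mathcal{IG}(v; a, b)$ means $1/v \sim \mathcal{G}(a, b)$ with shape $a$ and scale $b$; under that assumption $E[1/v_{\nu\tau}] = 1/(\lambda_\tau \sigma_\nu(r_\tau))$. The function name and arguments are hypothetical.

```python
import numpy as np


def sample_frame_variances(template, lam, a, n_frames=1, rng=None):
    """Draw v_nu ~ IG(v_nu; a, 1/(a * lam * sigma_nu)) for each frequency bin nu (sketch).

    Assumed convention: v ~ IG(a, b) means 1/v ~ Gamma(shape=a, scale=b), so
    E[1/v_nu] = 1/(lam * sigma_nu) and, for a > 1, E[v_nu] = a * lam * sigma_nu / (a - 1).
    template: (W,) positive spectral template sigma_nu(r) of the active score event r.
    Returns an (n_frames, W) array of variances.
    """
    rng = np.random.default_rng() if rng is None else rng
    template = np.asarray(template, dtype=float)
    precision = rng.gamma(shape=a, scale=1.0 / (a * lam * template),
                          size=(n_frames, template.size))
    return 1.0 / precision
```

Larger $a$ keeps the variances close to the scaled template $\lambda_\tau \sigma_\nu(r_\tau)$; small $a$ lets individual bins deviate from it.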
Score-Performance matching - Signal model
[Figure: spectral template $\log \sigma_\nu$ versus frequency ν (0 to 4000 Hz)]
Score-Performance matching
[Figure: spectrogram data (frequency 0 to 4000 Hz versus time 0 to 14 s) and MIDI data (MIDI note 55 to 85 versus score position 0 to 450)]
Online (filtering) or offline (smoothing) processing is possible
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80) versus time (1 to 4 s); $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ on a log scale versus time; MDCT of audio (source: Daniel-Ben Pienaar), frequency 0 to 4000 Hz versus time]
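The quantity $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ has the form of a weighted average over particles $i$. Assuming the weights are only known up to a constant (as is common in sequential Monte Carlo), a numerically stable way to form such an estimate from log-weights is sketched below; the function name is hypothetical.

```python
import numpy as np


def weighted_particle_mean(log_w, values):
    """Self-normalised estimate sum_i w_i * values_i from unnormalised log-weights."""
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())   # subtract the max before exponentiating to avoid overflow
    w /= w.sum()                      # normalise so the weights sum to one
    return float(np.dot(w, values))
```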
Summary
• “Time Domain”: Switching State Space Models
– State space modeling
– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform Domain”: Gamma Fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-Frequency Energy distributions
• Ongoing Work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical score guided source separation
– Prior structures for other observation models (NMF)
Single Channel Source Separation (with Onur Dikmen)
• Source 1: horizontal ties across time (harmonic continuity)
• Source 2: vertical ties across frequency (transients, percussive sounds)
[Figure: frequency bin (ν) versus time (τ); panels $X_{org}$, $S_{hor}$, $S_{ver}$]
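The two sources differ only in the direction along which neighbouring variances are tied. A minimal sketch, assuming one common form of an inverse-gamma Markov chain (IGMC) with an auxiliary variable $z_k$ between successive variances and coupling (shape) parameters $a_z$ and $a$: $z_k \mid v_{k-1} \sim \mathcal{IG}(a_z, 1/(a_z v_{k-1}))$ and $v_k \mid z_k \sim \mathcal{IG}(a, 1/(a z_k))$, where $\mathcal{IG}(x; a, b)$ means $1/x \sim \mathcal{G}(a, b)$ in shape/scale form. These exact conditionals and the function names are assumptions for illustration.

```python
import numpy as np


def sample_igmc(length, a, a_z, v0=1.0, rng=None):
    """Sample an inverse-gamma Markov chain v_1, ..., v_length (illustrative sketch).

    Assumed coupling (IG(x; a, b) meaning 1/x ~ Gamma(shape=a, scale=b)):
        z_k | v_{k-1} ~ IG(a_z, 1 / (a_z * v_{k-1}))
        v_k | z_k     ~ IG(a,   1 / (a * z_k))
    Larger a and a_z tie neighbouring variances more tightly together.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = np.empty(length)
    prev = v0
    for k in range(length):
        z = 1.0 / rng.gamma(shape=a_z, scale=1.0 / (a_z * prev))
        v[k] = 1.0 / rng.gamma(shape=a, scale=1.0 / (a * z))
        prev = v[k]
    return v


def sample_variance_field(n_freq, n_time, a, a_z, direction, rng=None):
    """(n_freq, n_time) variances from independent IGMCs along time or along frequency."""
    rng = np.random.default_rng() if rng is None else rng
    if direction == "horizontal":   # tie across time: one chain per frequency bin
        return np.stack([sample_igmc(n_time, a, a_z, rng=rng) for _ in range(n_freq)])
    if direction == "vertical":     # tie across frequency: one chain per time frame
        return np.stack([sample_igmc(n_freq, a, a_z, rng=rng) for _ in range(n_time)], axis=1)
    raise ValueError("direction must be 'horizontal' or 'vertical'")
```

Chains along time give smooth variance tracks per frequency bin (harmonic, sustained content); chains along frequency give smooth spectral shapes per frame (broadband, transient content).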
Single Channel Source Separation with IGMCs
E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix; SDR, SIR and SAR in dB
Method       s1 SDR  s1 SIR  s1 SAR  s2 SDR  s2 SIR  s2 SAR
VB            -4.74   -3.28    5.67   -1.58   15.46   -1.37
Gibbs          -4.5   -2.62    4.57    1.05   12.46    1.61
GibbsEM       -4.23   -2.42    4.82    1.34   13.13    1.85
Pre-trained   -4.04   -3.15    8.13    3.56   11.44    4.64
Oracle         6.14   17.16    6.58   12.66   19.95    13.6
• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources
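Once latent variances are available (estimated, or taken from the oracle as squared source coefficients), separating a single-channel mixture reduces to Gaussian conditioning. A minimal sketch, assuming a noiseless mixture $x = \sum_n s_n$ of zero-mean Gaussian coefficients $s_n \sim \mathcal{N}(0, v_n)$ with known variances; this is the standard Wiener-style posterior mean, not necessarily the exact estimator behind the table.

```python
import numpy as np


def posterior_mean_sources(x, variances):
    """E[s_n | x, v] for a noiseless single-channel mix x = sum_n s_n with s_n ~ N(0, v_n).

    x:         (...,) mixture coefficients (e.g. MDCT bins)
    variances: (n_sources, ...) latent variances v_n for each coefficient
    """
    variances = np.asarray(variances, dtype=float)
    # Guard against bins where every variance is zero
    total = np.maximum(variances.sum(axis=0), np.finfo(float).tiny)
    # Gaussian conditioning: each source receives the share v_n / sum(v) of the mixture
    return variances / total * np.asarray(x, dtype=float)
```

The estimates always add back up to the mixture; a bin is recovered exactly when one source carries all of its variance, and is otherwise split in proportion to the variances.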
Single Channel Source Separation with IGMCs
“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix; SDR, SIR and SAR in dB
Method       s1 SDR  s1 SIR  s1 SAR  s2 SDR  s2 SIR  s2 SAR
VB             -7.8   -6.22    4.53   -2.35    18.4   -2.25
Gibbs         -8.46   -7.53    6.93   -4.04   14.59   -3.83
GibbsEM       -7.74   -6.19    4.62   -1.14   16.62   -0.97
Pre-trained    -6.4   -5.39    6.95     3.8   16.39    4.14
Oracle         12.1    32.9   12.14   21.13   33.89   21.37
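The three columns are not independent. Assuming the standard BSS Eval decomposition of an estimate into mutually orthogonal target, interference and artifact components (with no separate noise term), SDR is determined by SIR and SAR: with $I = 10^{-\mathrm{SIR}/10}$ and $A = 10^{\mathrm{SAR}/10}$, $\mathrm{SDR} = -10\log_{10}\big(I + (1 + I)/A\big)$. For example, SIR = -3.28 dB and SAR = 5.67 dB give SDR ≈ -4.74 dB. A small sketch for cross-checking such rows:

```python
import numpy as np


def sdr_from_sir_sar(sir_db, sar_db):
    """SDR (dB) implied by SIR and SAR when target, interference and artifact errors are orthogonal."""
    i = 10.0 ** (-np.asarray(sir_db, dtype=float) / 10.0)  # ||e_interf||^2 / ||s_target||^2
    a = 10.0 ** (np.asarray(sar_db, dtype=float) / 10.0)   # ||s_target + e_interf||^2 / ||e_artif||^2
    distortion = i + (1.0 + i) / a                         # ||e_interf + e_artif||^2 / ||s_target||^2
    return -10.0 * np.log10(distortion)
```

Because SDR penalises both interference and artifacts, it can never exceed the smaller of SIR and SAR under this decomposition.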
Harmonic-Transient Decomposition
[Figure: frequency bin (ν) versus time (τ); panels $X_{org}$ (Original), $S_{hor}$ (Hor), $S_{ver}$ (Vert)]
Chord Detection - Signal model (with Paul Peeling)
[Figure: spectral template $\log \sigma_\nu$ versus frequency ν (0 to 4000 Hz)]
Chord Detection
[Figure: MDCT of a piano chord (MIDI notes 41, 48, 51, 56), frequency 0 to 4000 Hz versus time τ (0 to 2.5 s); $\log \sum_\nu v_{\nu j \tau}$ over MIDI note j (40 to 65) versus time τ]
Multichannel Source Separation
• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$, n = 1, ..., N
$v_{kn} \sim \mathcal{IG}(v_{kn}; \nu/2, 2/(\nu \lambda_n))$
$s_{kn} \sim \mathcal{N}(s_{kn}; 0, v_{kn})$
$x_{km} \sim \mathcal{N}(x_{km}; a_m^\top s_{k,1:N}, r_m)$, k = 1, ..., K, m = 1, ..., M
$a_m \sim \mathcal{N}(a_m; \cdots)$, $r_m \sim \mathcal{IG}(r_m; \cdots)$
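To make the generative structure concrete, here is a minimal sketch that draws one synthetic data set from this prior. It assumes shape/scale conventions ($\mathcal{G}(x; a, b)$ with mean $ab$, and $v \sim \mathcal{IG}(v; a, b)$ meaning $1/v \sim \mathcal{G}(a, b)$), under which $E[1/v_{kn}] = 1/\lambda_n$, so $\lambda_n$ sets the typical variance of source n. Because the priors on $a_m$ and $r_m$ are elided, the sketch draws the mixing matrix from a standard normal and fixes a common noise variance; all default parameter values are hypothetical.

```python
import numpy as np


def sample_multichannel_prior(K, N, M, nu=4.0, a_lam=2.0, b_lam=0.5, noise_var=0.01, seed=0):
    """Draw (lambda, v, s, A, x) from the hierarchical prior (illustrative sketch).

    Assumed conventions: G(x; a, b) has shape a and scale b; v ~ IG(v; a, b) means 1/v ~ G(a, b).
    """
    rng = np.random.default_rng(seed)
    # Source "volumes": lambda_n ~ G(a_lam, b_lam)
    lam = rng.gamma(shape=a_lam, scale=b_lam, size=N)
    # Variances: 1/v_kn ~ G(nu/2, 2/(nu * lambda_n)), i.e. v_kn ~ IG(nu/2, 2/(nu * lambda_n))
    precision = rng.gamma(shape=nu / 2.0, scale=2.0 / (nu * lam), size=(K, N))
    v = 1.0 / precision
    # Source coefficients: s_kn ~ N(0, v_kn)
    s = rng.normal(0.0, np.sqrt(v))
    # Mixing vectors a_m and noise variances r_m (hypothetical choices; their priors are elided)
    A = rng.normal(size=(M, N))
    r = np.full(M, noise_var)
    # Observations: x_km ~ N(a_m^T s_k, r_m)
    x = s @ A.T + rng.normal(0.0, np.sqrt(r), size=(K, M))
    return lam, v, s, A, x
```

Calling it with M < N, e.g. `sample_multichannel_prior(K=100, N=3, M=2)`, produces the underdetermined setting (fewer channels than sources) discussed under Multimodality.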
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source n
Source Separation
[Figure: spectrograms, frequency f (Hz, 0 to 10000) versus time t (s); panels: (Speech), (Piano), (Guitar), (Mix)]
Reconstructions
[Figure: reconstructed spectrograms, frequency f (Hz, 0 to 10000) versus time t (s); panels: (Speech), (Piano), (Guitar)]
Multimodality
• Typically underdetermined (Channels < Sources) ⇒ multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: traces of $a$, $\lambda$ and $r$ versus epoch (0 to 2000)]
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features)
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Diagram: grid of states with bar position $m_k$ = 1, ..., 8 (horizontal) and tempo level $n_k$ = 1, 2, 3 (vertical); bar divisions marked for 3/4 time and 4/4 time]
• Each dot denotes a state x = (m, n) (score position, tempo level)
• Directed arcs denote state transitions with positive probability
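A toy transition matrix over the states x = (m, n) can be built in a few lines, here with M = 8 positions and N = 3 tempo levels as in the diagram. The dynamics are assumptions chosen in the spirit of the bar pointer: the position advances by the current tempo level, $m_{k+1} = (m_k + n_k) \bmod M$, and each neighbouring tempo level is chosen with probability `p_change / 2`; a move that would leave 1, ..., N is replaced by staying. Positions are 0-indexed and states are flattened as `m * N + (n - 1)`; the function name is hypothetical.

```python
import numpy as np


def bar_pointer_transition(M=8, N=3, p_change=0.2):
    """Transition matrix over states x = (m, n) for a toy bar-pointer HMM (sketch).

    Row i = m * N + (n - 1) holds p(x_{k+1} | x_k = (m, n)).
    """
    S = M * N
    T = np.zeros((S, S))
    for m in range(M):
        for n in range(1, N + 1):
            i = m * N + (n - 1)
            m_next = (m + n) % M                 # position advances by the tempo level
            moves = {n: 1.0 - p_change}          # keep the current tempo level
            for n_next in (n - 1, n + 1):
                if 1 <= n_next <= N:
                    moves[n_next] = p_change / 2
                else:
                    moves[n] += p_change / 2     # blocked move: stay at the boundary level
            for n_next, p in moves.items():
                T[i, m_next * N + (n_next - 1)] += p
    return T
```

Most entries are zero, matching the sparse set of directed arcs in the diagram; for long bars this is why practical implementations store only the non-zero transitions.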
Bar position Pointer - transition model
[Figure: three panels showing the transition distribution $p(x_2 \mid x_1)$ over bar position (1 to 8) and tempo level (1 to 5)]
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: $p(x_3)$ over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: $p(x_4)$ over bar position and tempo level]
Bar position Pointer - k = 5
[Figure: $p(x_5 \mid y_{1:5})$ over bar position and tempo level, after observing $y_5 = 0$]
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 55
Single Channel Source Separation with IGMCs
E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -474 -328 567 -158 1546 -137
Gibbs -45 -262 457 105 1246 161
GibbsEM -423 -242 482 134 1313 185
Preminus trained -404 -315 813 356 1144 464
Oracle 614 1716 658 1266 1995 136
bull Oracle We use the square of the source coefficient as the latent variance estimate
bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 56
Single Channel Source Separation with IGMCs
ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix
s1 s2
SDR SIR SAR SDR SIR SAR
VB -78 -622 453 -235 184 -225
Gibbs -846 -753 693 -404 1459 -383
GibbsEM -774 -619 462 -114 1662 -097
Preminus trained -64 -539 695 38 1639 414
Oracle 121 329 1214 2113 3389 2137
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55
Harmonic-Transient Decomposition
Time (τ)
Fre
quen
cy B
in (ν
)
Xorg
Shor
Sver
(Original) (Hor) (Vert)
(Original) (Hor) (Vert)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56
s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)
Chord Detection - Signal model (with Paul Peeling)
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
[Figure: p(x_1) over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: p(x_2) over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: p(x_3) over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: p(x_4) over bar position and tempo level]
Bar position Pointer - k = 5
[Figure: p(x_5 | y_{1:5}) over bar position and tempo level, after observing y_5 = 0]
Bar position Pointer - k = 10
[Figure: p(x_10) over bar position and tempo level]
Bar position Pointer - observation model (Poisson)
• Observation model p(y_k | x_k): Poisson with intensity μ_k
[Figure: intensity μ_k as a function of bar position m_k (0 to 1000), for a triplet rhythm and for a duplet rhythm]
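A sketch of a Poisson observation model in this spirit: the intensity depends on bar position through a rhythm-specific profile with peaks at the expected onset positions. The bump shape, the onset grids and all parameter values below are assumptions for illustration, not the profiles plotted on the slide; `rhythm_intensity` and `poisson_loglik` are hypothetical helpers.

```python
import math

import numpy as np


def rhythm_intensity(m, M, onsets, base=0.1, height=3.0, width=0.02):
    """Hypothetical Poisson intensity mu(m) over bar positions m in [0, M).

    Adds a Gaussian bump at each onset (given as a fraction of the bar),
    using circular distance so the profile wraps around the bar line.
    """
    m = np.asarray(m, dtype=float)
    mu = np.full(m.shape, base)
    for c in onsets:
        d = np.abs(m / M - c)
        d = np.minimum(d, 1.0 - d)
        mu += height * np.exp(-0.5 * (d / width) ** 2)
    return mu


def poisson_loglik(y, mu):
    """log p(y | mu) for a Poisson count y with intensity mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


M = 1000
duplet = [0.0, 0.25, 0.5, 0.75]        # assumed onset grid
triplet = [k / 6 for k in range(6)]    # assumed onset grid

m_k = M / 6  # a position where only the triplet grid expects an onset
mu_t = rhythm_intensity([m_k], M, triplet)[0]
mu_d = rhythm_intensity([m_k], M, duplet)[0]
print(poisson_loglik(1, mu_t) > poisson_loglik(1, mu_d))  # True
```

An onset (y_k = 1) at this position favours the triplet profile, while silence (y_k = 0) favours the duplet profile; this is how the rhythm indicator becomes identifiable from onset data.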
Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Figure: dynamic Bayesian network over time slices 0 to 3 with nodes n_k, θ_k, m_k, r_k, λ_k (k ≥ 1) and observations y_k (k ≥ 1)]
• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator
Filtering
[Figure: observed data y_k against frame index k (0 to 450), with the filtered posteriors log p(m_k | y_{1:k}) (bar position), log p(n_k | y_{1:k}) (tempo, in quarter notes per minute) and p(r_k | y_{1:k}) (triplets vs. duplets)]
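Filtering computes p(x_k | y_{1:k}) recursively, one frame at a time, which is what makes online (real-time) tracking possible. A generic sketch of the standard normalised forward recursion for a discrete-state HMM; it is not the authors' implementation, and the array conventions and the name `forward_filter` are assumptions.

```python
import numpy as np


def forward_filter(prior, A, lik):
    """Filtered marginals p(x_k | y_{1:k}) for a discrete-state HMM.

    prior: (S,) initial distribution p(x_1)
    A:     (S, S) transition matrix, A[j, i] = p(x_{k+1} = j | x_k = i)
    lik:   (K, S) likelihoods, lik[k, s] = p(y_k | x_k = s)
    Returns the (K, S) filtered marginals and log p(y_{1:K}).
    """
    K, S = lik.shape
    alpha = np.zeros((K, S))
    log_evidence = 0.0
    pred = prior
    for k in range(K):
        if k > 0:
            pred = A @ alpha[k - 1]      # predict: p(x_k | y_{1:k-1})
        unnorm = lik[k] * pred           # update with y_k
        c = unnorm.sum()                 # c = p(y_k | y_{1:k-1})
        alpha[k] = unnorm / c
        log_evidence += np.log(c)
    return alpha, log_evidence


prior = np.array([0.5, 0.5])
lik = np.array([[0.9, 0.1], [0.9, 0.1]])
alpha, log_ev = forward_filter(prior, np.eye(2), lik)
```

Normalising at every step keeps the messages in floating-point range over long sequences, and the accumulated normalisers give the log evidence, which can be used to compare models (for example, different rhythm or meter hypotheses).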
Smoothing
[Figure: observed data y_k against frame index k (0 to 450), with the smoothed posteriors log p(m_k | y_{1:K}) (bar position), log p(n_k | y_{1:K}) (tempo, in quarter notes per minute) and p(r_k | y_{1:K}) (triplets vs. duplets)]
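Smoothing conditions every frame on all K observations, p(x_k | y_{1:K}), so it is an offline (batch) computation. A generic sketch of the scaled forward-backward algorithm for a discrete-state HMM, assuming A[j, i] = p(x_{k+1} = j | x_k = i) and lik[k, s] = p(y_k | x_k = s); the function name is hypothetical and this is not the authors' code.

```python
import numpy as np


def forward_backward(prior, A, lik):
    """Smoothed marginals p(x_k | y_{1:K}) for a discrete-state HMM.

    prior: (S,) initial distribution p(x_1)
    A:     (S, S) transition matrix, A[j, i] = p(x_{k+1} = j | x_k = i)
    lik:   (K, S) likelihoods, lik[k, s] = p(y_k | x_k = s)
    """
    K, S = lik.shape
    alpha = np.zeros((K, S))   # alpha[k] = p(x_k | y_{1:k})
    c = np.zeros(K)            # c[k] = p(y_k | y_{1:k-1})
    pred = prior
    for k in range(K):
        if k > 0:
            pred = A @ alpha[k - 1]
        unnorm = lik[k] * pred
        c[k] = unnorm.sum()
        alpha[k] = unnorm / c[k]
    beta = np.ones((K, S))     # scaled backward messages
    for k in range(K - 2, -1, -1):
        # beta[k, i] = sum_j A[j, i] p(y_{k+1} | x_{k+1} = j) beta[k+1, j] / c[k+1]
        beta[k] = A.T @ (lik[k + 1] * beta[k + 1]) / c[k + 1]
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)


prior = np.array([0.5, 0.5])
lik = np.array([[0.5, 0.5], [0.9, 0.1]])   # y_1 is uninformative, y_2 is not
gamma = forward_backward(prior, np.eye(2), lik)
print(gamma[0])   # approximately [0.9 0.1]
```

With a constant state, the filtered estimate at the first frame stays at [0.5, 0.5], while the smoothed estimate borrows evidence from the later frame; this is why the smoothed tempo and rhythm posteriors look sharper than the filtered ones.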
Time Signature
[Figure: observed audio (sample value against time in s, 0 to 12), with the smoothed posteriors log p(m_k | z_{1:K}) (bar position), log p(n_k | z_{1:K}) (tempo, in quarter notes per minute) and p(θ_k | z_{1:K}) (4/4 vs. 3/4) against frame index k]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: musical score and the corresponding audio signal x_t against time t]
Score-Performance matching - Graphical Model
[Figure: graphical model with nodes t_1, …, t_K, r_1, …, r_K, λ_1, …, λ_K, v_{ν1}, …, v_{νK} and s_{ν1}, …, s_{νK}, replicated over ν = 1, …, W; inset: state diagram for r_k with states 1 to 8]
$v_{\nu\tau} \sim \mathcal{IG}\big(v_{\nu\tau};\ a,\ 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$
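The inverse-Gamma prior ties each variance v_{ντ} to the template value σ_ν(r_τ) scaled by λ. A quick numerical check, assuming the convention that IG(v; a, b) means 1/v ~ G(a, b) with shape a and scale b (an assumption about the notation): then E[1/v_{ντ}] = 1/(λ σ_ν(r_τ)), and the relative spread of 1/v is 1/√a, so a acts as a concentration parameter around the template. The helper `sample_v` and the values of a, λ and σ are hypothetical.

```python
import numpy as np


def sample_v(a, lam, sigma, size, rng):
    """Draw v ~ IG(v; a, 1/(a * lam * sigma)).

    Assumed convention: IG(v; a, b) means 1/v ~ Gamma(shape=a, scale=b).
    Then E[1/v] = a * b = 1/(lam * sigma), and for a > 1
    E[v] = 1/(b * (a - 1)) = a * lam * sigma / (a - 1).
    """
    b = 1.0 / (a * lam * sigma)
    return 1.0 / rng.gamma(shape=a, scale=b, size=size)


rng = np.random.default_rng(0)
sigma = np.array([1.0, 0.1, 0.01])  # hypothetical template values sigma_nu(r)
v = sample_v(a=10.0, lam=2.0, sigma=sigma[:, None], size=(3, 100_000), rng=rng)
print((1.0 / v).mean(axis=1))  # close to 1 / (2 * sigma) = [0.5, 5, 50]
```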
Score-Performance matching - Signal model
[Figure: log σ_ν against frequency ν in Hz (0 to 4000)]
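The signal-model panel plots a log spectral template log σ_ν. A sketch of one way to build such a template for a single pitch, assuming Gaussian-shaped partials at integer multiples of the fundamental with geometrically decaying amplitudes over a small noise floor; the shape, the name `harmonic_template` and every parameter value are assumptions, not taken from the slides.

```python
import numpy as np


def harmonic_template(freqs, f0, n_harmonics=10, decay=0.6, width=15.0, floor=1e-5):
    """Hypothetical spectral variance template sigma_nu for one pitch.

    Places a Gaussian bump of the given width (Hz) at each harmonic h * f0,
    with amplitude decay**(h - 1), on top of a small noise floor, and
    normalises so that the peak value is 1 (log template maximum = 0).
    """
    freqs = np.asarray(freqs, dtype=float)
    sigma = np.full(freqs.shape, floor)
    for h in range(1, n_harmonics + 1):
        sigma += decay ** (h - 1) * np.exp(-0.5 * ((freqs - h * f0) / width) ** 2)
    return sigma / sigma.max()


freqs = np.linspace(0.0, 4000.0, 1025)  # frequency grid nu in Hz
log_sigma = np.log(harmonic_template(freqs, f0=220.0))
```

Between partials the template falls to the noise floor, so a coefficient with large energy there is improbable under this pitch; this contrast is what lets a score position (or chord hypothesis) be scored against the observed spectrum.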
Score-Performance matching
[Figure: spectrogram data (frequency in Hz, 0 to 4000, against time in s, 0 to 14) and MIDI data (MIDI note, 55 to 85, against score position)]
Online (filtering) or offline (smoothing) processing is possible.
Transcription
[Figure: log p(r_τ | s_τ) (MIDI note number, 60 to 80, against time in s); Σ_i w_τ^(i) λ_τ^(i) on a log λ axis against time in s; MDCT of audio (source: Daniel-Ben Pienaar), frequency in Hz (0 to 4000) against time in s]
Summary
• "Time Domain": Switching State Space Models
  – State space modeling
  – Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
  – Analysis down to sample precision (if required)
  – Computationally quite heavy
• "Transform Domain": Gamma Fields
  – Models on (orthogonal) transform coefficients; energy compaction
  – Practical: can make use of fast transforms (FFT, MDCT)
  – Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
  – time-frequency energy distributions
• Ongoing work
  – Comparison of inference methods (VB, MCMC, SMC)
  – Learning
  – Applications
    * Chord detection, polyphonic transcription
    * Musical score guided source separation
  – Prior structures for other observation models (NMF)
Harmonic-Transient Decomposition
[Figure: time (τ) against frequency bin (ν) for X_org (original), S_hor (horizontal) and S_ver (vertical)]
Chord Detection - Signal model (with Paul Peeling)
[Figure: log σ_ν against frequency ν in Hz (0 to 4000)]
Chord Detection
[Figure: MDCT of a piano chord (MIDI notes 41, 48, 51, 56), frequency in Hz (0 to 4000) against time τ in s (0 to 2.5); log Σ_ν v_{νjτ} for MIDI note j (40 to 65) against time τ in s]
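The note axis is in MIDI note numbers while the MDCT axis is in Hz. Under standard equal temperament (A4 = MIDI note 69 = 440 Hz), note m has fundamental frequency 440 · 2^((m − 69)/12). Reading the digits in the slide title as the MIDI notes 41, 48, 51 and 56 (an interpretation of the garbled text), the chord's fundamentals are:

```python
def midi_to_hz(note, a4=440.0):
    """Equal-tempered frequency in Hz of a MIDI note number (A4 = note 69)."""
    return a4 * 2.0 ** ((note - 69) / 12.0)


chord = [41, 48, 51, 56]  # interpreted from the slide title
print([round(midi_to_hz(n), 2) for n in chord])  # [87.31, 130.81, 155.56, 207.65]
```

All four fundamentals lie in the lowest few hundred Hz, with their harmonics spread across the 0 to 4000 Hz range shown in the MDCT.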
Multichannel Source Separation
• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$, $n = 1, \dots, N$
$v_{kn} \sim \mathcal{IG}(v_{kn}; \nu/2, 2/(\nu\lambda_n))$
$s_{kn} \sim \mathcal{N}(s_{kn}; 0, v_{kn})$
$x_{km} \sim \mathcal{N}(x_{km}; a_m^\top s_{k,1:N}, r_m)$, $k = 1, \dots, K$, $m = 1, \dots, M$
$a_m \sim \mathcal{N}(a_m; \cdots)$, $r_m \sim \mathcal{IG}(r_m; \cdots)$
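Sampling from the hierarchical prior makes the generative story concrete: a volume per source, a variance per coefficient, the coefficients themselves, and finally the noisy instantaneous mixture. A minimal sketch, assuming G(x; a, b) has shape a and scale b and IG(v; a, b) means 1/v ~ G(a, b); here ν is the hyperparameter of the variance prior, not a frequency index. The slide elides the priors on a_m and r_m, so the code uses fixed illustrative choices (standard normal mixing coefficients, a small constant noise variance). `sample_mixture` is a hypothetical helper.

```python
import numpy as np


def sample_mixture(K, N, M, nu, a_lam, b_lam, rng):
    """Draw (lam, v, s, A, x) from the hierarchical prior (hypothetical helper).

    Assumed conventions: G(x; a, b) has shape a and scale b, and
    IG(v; a, b) means 1/v ~ G(a, b). The slide elides the priors on a_m
    and r_m, so fixed illustrative choices are used for them here.
    """
    lam = rng.gamma(shape=a_lam, scale=b_lam, size=N)           # lambda_n ~ G(a_lam, b_lam)
    v = 1.0 / rng.gamma(shape=nu / 2.0, scale=2.0 / (nu * lam), size=(K, N))  # v_kn ~ IG(nu/2, 2/(nu lambda_n))
    s = rng.normal(0.0, np.sqrt(v))                              # s_kn ~ N(0, v_kn)
    A = rng.normal(size=(M, N))                                  # row m is a_m^T (assumed N(0, 1) entries)
    r = np.full(M, 1e-3)                                         # noise variances r_m (assumed fixed)
    x = s @ A.T + rng.normal(0.0, np.sqrt(r), size=(K, M))       # x_km ~ N(a_m^T s_k, r_m)
    return lam, v, s, A, x


rng = np.random.default_rng(1)
lam, v, s, A, x = sample_mixture(K=4096, N=3, M=2, nu=4.0, a_lam=2.0, b_lam=1.0, rng=rng)
print(x.shape)  # (4096, 2): two channels, three sources
```

Under this convention E[1/v_kn] = 1/λ_n, consistent with reading λ_n as the overall "volume" of source n; with M = 2 channels and N = 3 sources the example is underdetermined, the setting in which the posterior becomes multimodal.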
Equivalent Gamma MRF
• A tree for each source
• λ_n can be interpreted as the overall "volume" of source n
Source Separation
[Figure: spectrograms (frequency f in Hz, 0 to 10000, against time t) of the sources (Speech), (Piano), (Guitar) and of the (Mix)]
Reconstructions
[Figure: spectrograms (frequency f in Hz, 0 to 10000, against time t) of the reconstructed (Speech), (Piano) and (Guitar) sources]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, Bridging, Overrelaxation, Tempering
[Figure: sampler traces of a, λ and r against epoch (0 to 2000)]
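The slide lists annealing, bridging, overrelaxation and tempering as remedies for a multimodal posterior. As an illustration of one of them, a sketch of parallel tempering on a toy one-dimensional bimodal target (not the source separation posterior): one random-walk Metropolis chain runs on p(x)^(1/T) for each temperature T, and adjacent chains propose to swap states. The target, temperature ladder, step size and function names are assumptions.

```python
import numpy as np


def log_target(x):
    """Toy bimodal log-density (unnormalised): equal mixture of N(-4, 0.5^2) and N(4, 0.5^2)."""
    return np.logaddexp(-0.5 * ((x + 4.0) / 0.5) ** 2, -0.5 * ((x - 4.0) / 0.5) ** 2)


def parallel_tempering(n_iter, temps, step, rng):
    """Random-walk Metropolis on p(x)^(1/T) for each temperature T, plus a
    proposed swap between one random adjacent pair per sweep.
    Returns the trajectory of the temps[0] (cold) chain."""
    temps = np.asarray(temps, dtype=float)
    L = len(temps)
    x = np.full(L, -4.0)                     # all chains start in the left mode
    cold = np.empty(n_iter)
    for it in range(n_iter):
        # Metropolis step in every chain; hotter chains take larger steps
        prop = x + step * np.sqrt(temps) * rng.normal(size=L)
        log_acc = (log_target(prop) - log_target(x)) / temps
        accept = np.log(rng.uniform(size=L)) < log_acc
        x = np.where(accept, prop, x)
        # swap acceptance: exp((1/T_i - 1/T_{i+1}) * (log p(x_{i+1}) - log p(x_i)))
        if L > 1:
            i = rng.integers(L - 1)
            log_swap = (1.0 / temps[i] - 1.0 / temps[i + 1]) * (log_target(x[i + 1]) - log_target(x[i]))
            if np.log(rng.uniform()) < log_swap:
                x[i], x[i + 1] = x[i + 1], x[i]
        cold[it] = x[0]
    return cold


cold = parallel_tempering(20_000, [1.0, 4.0, 16.0, 64.0], step=1.0, rng=np.random.default_rng(0))
plain = parallel_tempering(20_000, [1.0], step=1.0, rng=np.random.default_rng(0))
print(np.mean(cold > 0), np.mean(plain > 0))  # fraction of samples in the right-hand mode
```

A plain Metropolis chain (a single temperature T = 1) started in one mode essentially never crosses the low-density valley, whereas the tempered ladder lets the cold chain visit both modes; the hot chains see a flattened landscape and pass mode switches down through the swaps.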
Page 59
Chord Detection
Time τ s
Fre
quen
cy
Hz
MDCT of piano chord 41485156
05 1 15 2 250
500
1000
1500
2000
2500
3000
3500
4000
Time τ s
MID
Inot
ej
logsum
ν vνjτ
05 1 15 2 2540
45
50
55
60
65
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58
Multichannel Source Separation
bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)
λ1 λn λN sim G(λn aλ bλ)
vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))
sk1 skn skN sim N (skn 0 vkn)
xk1 xkM
k = 1 K
sim N (xkma⊤msk1N rm)
a1 r1 aM
sim N (am middot middot middot )
rM
sim IG(rm middot middot middot )
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59
Equivalent Gamma MRF
bull A tree for each source
bull λn can be interpreted as the overall ldquovolumerdquo of source n
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60
Source Separation
tsecfH
z
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
5
10
15
20
25
(Guitar)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
(Mix)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61
s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
[Figure: observed data (sample value versus time in s, 0 to 12 s); $\log p(m_k \mid z_{1:K})$ over bar position; $\log p(n_k \mid z_{1:K})$ (quarter notes per minute); $p(\theta_k \mid z_{1:K})$ for 4/4 versus 3/4; posteriors plotted against frame index $k$]
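If the time signature is treated as fixed over the whole excerpt (a simplification: in the model above $\theta_k$ is part of the state), one can run a separate filter per candidate $\theta$ and compare the accumulated log evidence $\log p(z_{1:K} \mid \theta)$. A sketch of the numerically stable normalisation step, with hypothetical log-evidence values:

```python
import math

def model_posterior(log_evidence, log_prior=None):
    """p(theta | z_1:K), proportional to p(z_1:K | theta) p(theta),
    normalised with the log-sum-exp trick to avoid underflow."""
    if log_prior is None:  # uniform prior over the candidates
        log_prior = {t: 0.0 for t in log_evidence}
    scores = {t: log_evidence[t] + log_prior[t] for t in log_evidence}
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {t: math.exp(s - top) / z for t, s in scores.items()}

# Hypothetical evidences: 4/4 explains the data three times better than 3/4.
print(model_posterior({"3/4": -1200.0, "4/4": -1200.0 + math.log(3.0)}))
```

Working in the log domain matters here because evidences of long recordings are far below the smallest representable float.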
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt above an audio signal $x_t$ plotted against time $t$]
Score-Performance matching - Graphical Model
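[Figure: graphical model with chains $t_1, \dots, t_K$, $r_1, \dots, r_K$ and $\lambda_1, \dots, \lambda_K$; inside a plate over $\nu = 1, \dots, W$, variances $v_{\nu,1}, \dots, v_{\nu,K}$ and coefficients $s_{\nu,1}, \dots, s_{\nu,K}$; inset illustrating $r_k$]
$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau};\ a,\ 1/(a \lambda \sigma_\nu(r_\tau))\big)$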
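Because the coefficients are Gaussian given $v_{\nu,\tau}$, the variance can be integrated out analytically, which gives a Student-t type likelihood per frequency bin. The sketch below assumes $s_{\nu,\tau} \sim \mathcal{N}(0, v_{\nu,\tau})$ and reads $\mathcal{IG}(v; a, b)$ as $1/v \sim \mathcal{G}(a, b)$ with shape $a$ and scale $b$, i.e. $p(v) = v^{-(a+1)} e^{-1/(bv)} / (\Gamma(a) b^a)$; the parameterisation, the two spectral templates and all numbers are assumptions for illustration. With $b = 1/(a\lambda\sigma_\nu(r))$ the integral evaluates to

$p(s) = \frac{\Gamma(a + 1/2)}{\Gamma(a)\sqrt{2\pi}}\, (a\lambda\sigma)^{a}\, \big(a\lambda\sigma + s^2/2\big)^{-(a+1/2)}$

```python
import math

def log_marginal(s, a, lam, sigma):
    """log p(s) with v integrated out: s ~ N(0, v), v ~ IG(a, 1/(a*lam*sigma))
    (assumed parameterisation: 1/v ~ Gamma(shape a, scale b))."""
    c = a * lam * sigma
    return (math.lgamma(a + 0.5) - math.lgamma(a) - 0.5 * math.log(2 * math.pi)
            + a * math.log(c) - (a + 0.5) * math.log(c + 0.5 * s * s))

def frame_loglik(frame, a, lam, template):
    """Sum over frequency bins nu of log p(s_nu | r), with template[nu] = sigma_nu(r)."""
    return sum(log_marginal(s, a, lam, sig) for s, sig in zip(frame, template))

# Hypothetical two-bin templates for two score positions r = 0 and r = 1.
templates = {0: [1.0, 1e-3], 1: [1e-3, 1.0]}
frame = [1.0, 0.0]  # energy only in the first bin
scores = {r: frame_loglik(frame, 2.0, 1.0, t) for r, t in templates.items()}
print(max(scores, key=scores.get))  # the template whose energy pattern matches the frame
```

Scoring every candidate score position this way gives the per-frame likelihood that a filter or smoother over the score position would consume.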
Score-Performance matching - Signal model
[Figure: spectral template $\log \sigma_\nu$ versus frequency $\nu$ (Hz), 0 to 4000 Hz]
Score-Performance matching
[Figure: spectrogram data (frequency in Hz versus time in s, 0 to 14 s) and MIDI data (MIDI note versus score position)]
Online (filtering) or Offline (smoothing) processing is possible
Transcription
[Figure: $\log p(r_\tau \mid s_\tau)$ (MIDI note number versus time in s); $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ ($\log \lambda$ versus time in s); MDCT of audio (source: Daniel-Ben Pienaar), frequency in Hz versus time in s]
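The label $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ reads as a particle (SMC) estimate: a weighted average over particles $i$ with normalised weights $w^{(i)}_\tau$. A small sketch of that estimate, assuming unnormalised log-weights are available per particle; it also reports the effective sample size, a common resampling trigger (names and numbers are hypothetical):

```python
import math

def normalise_log_weights(log_w):
    """Turn unnormalised log-weights into normalised weights w^(i)."""
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    z = sum(w)
    return [wi / z for wi in w]

def weighted_estimate(log_w, values):
    """Return sum_i w^(i) * value^(i) and the effective sample size 1 / sum_i (w^(i))^2."""
    w = normalise_log_weights(log_w)
    estimate = sum(wi * v for wi, v in zip(w, values))
    ess = 1.0 / sum(wi * wi for wi in w)
    return estimate, ess

# Two hypothetical particles; the second is three times as likely.
est, ess = weighted_estimate([0.0, math.log(3.0)], [1.0, 5.0])
print(est, ess)
```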
Summary
• “Time Domain” – Switching State Space Models
– State space modeling
– Conditional linear dynamical systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform Domain” – Gamma Fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
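The MDCT mentioned above is a lapped transform: frames of length $2N$ hop by $N$, each frame yields $N$ coefficients, and the time-domain aliasing of neighbouring frames cancels on overlap-add. A direct $O(N^2)$ sketch (a practical implementation would use an FFT), using the convention $X_k = \sum_{n=0}^{2N-1} x_n \cos\big[\frac{\pi}{N}(n + \frac12 + \frac N2)(k + \frac12)\big]$ with a $1/N$ factor in the inverse and a rectangular window; these conventions are assumptions, and other scalings and windows are common.

```python
import math

def mdct(frame):
    """MDCT of a frame of length 2N -> N coefficients (direct O(N^2) form)."""
    N = len(frame) // 2
    n0 = 0.5 + N / 2
    return [sum(x * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                for n, x in enumerate(frame)) for k in range(N)]

def imdct(coeffs):
    """Inverse MDCT with a 1/N factor: N coefficients -> 2N time-aliased samples."""
    N = len(coeffs)
    n0 = 0.5 + N / 2
    return [sum(X * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                for k, X in enumerate(coeffs)) / N for n in range(2 * N)]

def roundtrip(signal, N):
    """Analyse with hop N and resynthesise by overlap-add (rectangular window).

    The signal is padded with N zeros at both ends so that every original
    sample is covered by two frames and the aliasing terms cancel.
    """
    x = [0.0] * N + list(signal) + [0.0] * N
    x += [0.0] * (-len(x) % N)              # make the length a multiple of N
    y = [0.0] * len(x)
    for start in range(0, len(x) - N, N):   # frames x[start : start + 2N]
        out = imdct(mdct(x[start:start + 2 * N]))
        for i, v in enumerate(out):
            y[start + i] += v
    return y[N:N + len(signal)]

sig = [math.sin(0.3 * t) + 0.5 * math.cos(1.7 * t) for t in range(10)]
print(max(abs(a - b) for a, b in zip(sig, roundtrip(sig, 4))))
```

Each inverse frame returns $(A - A_R, B + B_R)/2$ for input halves $(A, B)$, where $A_R$ is $A$ reversed; the next frame contributes $(B - B_R)/2$, so the reversed terms cancel and $B$ is recovered exactly.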
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical-score-guided source separation
– Prior structures for other observation models: NMF
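To illustrate the stochastic volatility idea, here is a sketch of a simple Markov chain of variances in which each precision is gamma distributed around the previous one: $1/v_k \mid v_{k-1} \sim \mathcal{G}(a, 1/(a v_{k-1}))$ with shape $a$ and scale $1/(a v_{k-1})$, so $E[1/v_k \mid v_{k-1}] = 1/v_{k-1}$ and a larger $a$ gives a smoother energy envelope. This is a simplified chain chosen for illustration, not necessarily the Gamma chain construction used in the talk; the coupling parameter `a` and all numbers are hypothetical.

```python
import math
import random

def sample_variance_chain(K, a, v1=1.0, seed=0):
    """Sample v_1..v_K with 1/v_k | v_{k-1} ~ Gamma(shape a, scale 1/(a v_{k-1}))
    (a simplified, hypothetical chain), then coefficients s_k ~ N(0, v_k)."""
    rng = random.Random(seed)
    v = [v1]
    for _ in range(K - 1):
        precision = rng.gammavariate(a, 1.0 / (a * v[-1]))
        v.append(1.0 / precision)
    s = [rng.gauss(0.0, math.sqrt(vk)) for vk in v]
    return v, s

# Strong coupling (large a) versus weak coupling (small a).
v_smooth, s_smooth = sample_variance_chain(50, a=1000.0)
v_rough, s_rough = sample_variance_chain(50, a=2.0)
```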
Page 60
Multichannel Source Separation
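• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)
$\lambda_n \sim \mathcal{G}(\lambda_n;\ a_\lambda, b_\lambda)$
$v_{k,n} \sim \mathcal{IG}\big(v_{k,n};\ \nu/2,\ 2/(\nu \lambda_n)\big)$
$s_{k,n} \sim \mathcal{N}(s_{k,n};\ 0, v_{k,n})$
$x_{k,m} \sim \mathcal{N}(x_{k,m};\ a_m^\top s_{k,1:N},\ r_m)$, for $k = 1, \dots, K$
$a_m \sim \mathcal{N}(a_m;\ \cdots)$
$r_m \sim \mathcal{IG}(r_m;\ \cdots)$
with sources $n = 1, \dots, N$ and channels $m = 1, \dots, M$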
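Reading the model top-down gives an ancestral sampler. The sketch below assumes $\mathcal{G}(x; a, b)$ has shape $a$ and scale $b$ and that $\mathcal{IG}(x; a, b)$ means $1/x \sim \mathcal{G}(a, b)$; under that reading $E[1/v_{k,n}] = 1/\lambda_n$, so $\lambda_n$ sets the overall scale of source $n$. The mixing matrix and noise variances are passed in as fixed values rather than sampled, and all names are hypothetical.

```python
import random

def sample_prior(K, A, r, a_lam, b_lam, nu, seed=0):
    """Ancestral sample from the hierarchical source-separation prior.

    A[m][n]: mixing coefficient of source n into channel m (held fixed here).
    r[m]:    observation noise variance of channel m (held fixed here).
    Assumed parameterisation: G(x; a, b) has shape a and scale b,
    and IG(x; a, b) means 1/x ~ G(a, b).
    """
    rng = random.Random(seed)
    M, N = len(A), len(A[0])
    lam = [rng.gammavariate(a_lam, b_lam) for _ in range(N)]
    v = [[1.0 / rng.gammavariate(nu / 2, 2.0 / (nu * lam[n])) for n in range(N)]
         for _ in range(K)]
    s = [[rng.gauss(0.0, v[k][n] ** 0.5) for n in range(N)] for k in range(K)]
    x = [[sum(A[m][n] * s[k][n] for n in range(N)) + rng.gauss(0.0, r[m] ** 0.5)
          for m in range(M)] for k in range(K)]
    return lam, v, s, x

# Two channels, three sources (underdetermined), hypothetical values.
A = [[1.0, 0.5, 0.2], [0.3, 1.0, 0.7]]
lam, v, s, x = sample_prior(1000, A, [0.01, 0.01], a_lam=2.0, b_lam=1.0, nu=4.0)
```

Marginally each $s_{k,n}$ is then Student-t with $\nu$ degrees of freedom, so small $\nu$ yields the heavy-tailed, sparse coefficients that make separation of more sources than channels possible.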
Equivalent Gamma MRF
• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$
Source Separation
[Figure: spectrograms, f (Hz) versus t (sec), of the Speech, Piano and Guitar sources and of the Mix]
[Audio: s1.wav, s2.wav, s3.wav, x1.wav]
Reconstructions
[Figure: spectrograms, f (Hz) versus t (sec), of the reconstructed Speech, Piano and Guitar sources]
[Audio: var_se1.wav, var_se2.wav, var_se3.wav]
Multimodality
• Typically underdetermined (channels < sources) ⇒ multimodal posterior
Multimodality
Annealing, bridging, overrelaxation, tempering
[Figure: sampler traces of $a$, $\lambda$ and $r$ versus epoch (0 to 2000)]
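A minimal sketch of the tempering idea: run random-walk Metropolis on $p(x)^\beta$ and raise $\beta$ towards 1, so that early iterations see a flattened landscape and can cross between modes. A toy one-dimensional bimodal target stands in for the multimodal source-separation posterior; the target, step size and linear schedule are all hypothetical choices.

```python
import math
import random

def log_target(x):
    """Hypothetical bimodal log-density (unnormalised): modes at -3 and +3, sd 0.5."""
    a = -((x + 3.0) ** 2) / 0.5
    b = -((x - 3.0) ** 2) / 0.5
    top = max(a, b)
    return top + math.log(math.exp(a - top) + math.exp(b - top))

def tempered_metropolis(n_iter, beta_at, step=1.0, x0=-3.0, seed=0):
    """Random-walk Metropolis on p(x)^beta with beta = beta_at(t) at iteration t.
    beta < 1 flattens the barrier between modes; raising beta to 1 anneals
    the chain towards the true target."""
    rng = random.Random(seed)
    x, trace = x0, []
    for t in range(n_iter):
        beta = beta_at(t)
        proposal = x + rng.gauss(0.0, step)
        log_ratio = beta * (log_target(proposal) - log_target(x))
        if math.log(1.0 - rng.random()) < log_ratio:  # 1 - U lies in (0, 1]
            x = proposal
        trace.append(x)
    return trace

# Linear schedule from a very flat target (beta = 0.05) up to the target itself (beta = 1).
trace = tempered_metropolis(2000, lambda t: 0.05 + 0.95 * t / 1999)
```

At $\beta = 0.05$ the log-density barrier between the modes shrinks by a factor of 20, which is what lets the chain move between them early on.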
Tempo tracking and score performance matching
• Given expressive music data (onsets, detections, spectral features)
– Determine the position of a performance on a score
– Determine where a human listener would clap her hands
– Create a quantized, human-readable score
– …
• Online/real-time or offline/batch
• All of these problems can be mapped to inference problems in an HMM
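For a finite state space, online inference in such an HMM is the normalised forward (filtering) recursion $p(x_k \mid y_{1:k}) \propto p(y_k \mid x_k) \sum_{x_{k-1}} p(x_k \mid x_{k-1})\, p(x_{k-1} \mid y_{1:k-1})$. A plain-Python sketch with hypothetical argument names; the normalising constants also accumulate the log evidence $\log p(y_{1:K})$.

```python
import math

def forward_filter(prior, T, lik, ys):
    """Normalised forward (filtering) recursion for a discrete HMM.

    prior[i]  = p(x_1 = i)
    T[i][j]   = p(x_k = j | x_{k-1} = i)
    lik(y, j) = p(y | x = j)
    Returns the filtered distributions p(x_k | y_1:k) and log p(y_1:K).
    """
    S = len(prior)
    filtered, log_evidence = [], 0.0
    pred = list(prior)                    # p(x_1) before seeing y_1
    for k, y in enumerate(ys):
        if k > 0:                         # predict: sum over the previous state
            prev = filtered[-1]
            pred = [sum(prev[i] * T[i][j] for i in range(S)) for j in range(S)]
        unnorm = [pred[j] * lik(y, j) for j in range(S)]  # update with y_k
        z = sum(unnorm)                   # p(y_k | y_1:k-1)
        log_evidence += math.log(z)
        filtered.append([u / z for u in unnorm])
    return filtered, log_evidence

def noisy_sensor(y, j):
    """Hypothetical observation model: reports the true state with probability 0.9."""
    return 0.9 if y == j else 0.1

filtered, log_ev = forward_filter([0.5, 0.5], [[0.95, 0.05], [0.05, 0.95]],
                                  noisy_sensor, [0, 0, 1])
```

Normalising at every step keeps the numbers in range for long sequences; the bar pointer model is one instance, with states $x = (m, n)$.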
Bar position Pointer (Whiteley, Cemgil, Godsill 2006)
[Figure: lattice of states with bar position $m_k \in \{1, \dots, 8\}$ on the horizontal axis and tempo level $n_k \in \{1, 2, 3\}$ on the vertical axis; bar lines mark 3/4 and 4/4 time]
• Each dot denotes a state $x = (m, n)$ (score position, tempo level)
• Directed arcs denote state transitions with positive probability
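A sketch of a transition model of this shape, under assumed dynamics (the exact parameterisation in Whiteley, Cemgil and Godsill 2006 may differ): the bar position advances by the tempo level each frame and wraps at the bar length $M$, while the tempo level stays put with probability $1 - p$ or moves to a neighbouring level, staying in place at the boundaries. `p_change` is a hypothetical parameter.

```python
def bar_pointer_transition(M, N, p_change):
    """States x = (m, n) and a sparse transition table T[x][x'] = p(x' | x).

    Assumed dynamics (illustrative only):
      m' = ((m - 1 + n) mod M) + 1   position advances n cells and wraps at the bar end
      n' = n with prob. 1 - p_change, n - 1 or n + 1 with prob. p_change / 2 each;
           a move past level 1 or N is replaced by staying at the current level.
    """
    states = [(m, n) for n in range(1, N + 1) for m in range(1, M + 1)]
    T = {}
    for m, n in states:
        m_next = (m - 1 + n) % M + 1
        tempo = {n: 1.0 - p_change}
        for n_next in (n - 1, n + 1):
            if not 1 <= n_next <= N:
                n_next = n  # blocked at the boundary: keep the current tempo
            tempo[n_next] = tempo.get(n_next, 0.0) + p_change / 2
        T[(m, n)] = {(m_next, nn): p for nn, p in tempo.items()}
    return states, T

states, T = bar_pointer_transition(M=8, N=3, p_change=0.2)
print(T[(8, 2)])  # from the last cell at tempo 2 the pointer wraps round to m = 2
```

Storing only the non-zero arcs keeps the table small: each state has at most three successors, matching the sparse arcs in the lattice.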
Bar position Pointer - transition model
[Figure: three examples of the transition distribution $p(x_2 \mid x_1)$ over bar position (1 to 8) and tempo level (1 to 5)]
Bar position Pointer - k = 1
[Figure: $p(x_1)$ over bar position and tempo level]
Bar position Pointer - k = 2
[Figure: $p(x_2)$ over bar position and tempo level]
Bar position Pointer - k = 3
[Figure: $p(x_3)$ over bar position and tempo level]
Bar position Pointer - k = 4
[Figure: $p(x_4)$ over bar position and tempo level]
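Without observations, the sequence $p(x_1), p(x_2), \dots$ is obtained by repeatedly pushing the distribution through the transition model, $p(x_k) = \sum_{x_{k-1}} p(x_k \mid x_{k-1})\, p(x_{k-1})$, so probability mass drifts along the bar at a speed set by the tempo level. A minimal sketch with dictionary-based distributions and a hypothetical deterministic toy transition (tempo changes switched off):

```python
def propagate(p, T):
    """One prediction step p(x_k) = sum_{x'} p(x_{k-1} = x') p(x_k | x_{k-1} = x').

    p: dict state -> probability; T: dict state -> {next_state: probability}.
    """
    out = {}
    for x_prev, px in p.items():
        for x_next, t in T[x_prev].items():
            out[x_next] = out.get(x_next, 0.0) + px * t
    return out

# Hypothetical toy: 4 bar positions, 2 tempo levels, tempo held fixed.
M = 4
T = {(m, n): {((m - 1 + n) % M + 1, n): 1.0} for m in range(1, M + 1) for n in (1, 2)}
p = {(1, 1): 0.5, (1, 2): 0.5}  # p(x_1): start of the bar, tempo unknown
for k in range(2, 5):
    p = propagate(p, T)         # p(x_2), p(x_3), p(x_4)
    print(k, sorted(p.items()))
```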
Page 63
Reconstructions
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
10
15
20
25
30
(Speech)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
0
10
20
30
(Piano)
tsec
fHz
0 200 400 600 800 1000 12000
2000
4000
6000
8000
10000
5
10
15
20
25
(Guitar)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62
var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)
Multimodality
bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63
Multimodality
Annealing Bridging Overrelaxation Tempering
0 500 1000 1500 2000
minus08024
08295
20375
a
0 500 1000 1500 2000
72408
251398362295
λ
0 500 1000 1500 2000
0545118648
r
Epoch
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64
Tempo tracking and score performance matching
bull Given expressive music data (onsetsdetectionsspectral features)
ndash Determine the position of a performance on a score
ndash Determine where a human listener would clap her hand
ndash Create a quantizedhuman readable score
ndash
bull Online-Realtime or Offline-Batch
bull All of these problems can be mapped to inference problems in a HMM
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar Position Pointer - transition model
[Figure: three panels showing $p(x_2 \mid x_1)$ over bar position (1 to 8) and tempo level (1 to 5)]
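A minimal sketch of a bar-pointer transition matrix over flattened states $x = (m, n)$, followed by the prior propagation $p(x_{k+1}) = \sum_{x_k} p(x_{k+1} \mid x_k)\, p(x_k)$ of the kind shown in the $p(x_k)$ plots for k = 1 to 10. It assumes (hypothetically) that the pointer advances $n$ cells per frame and wraps at the bar line, and that the tempo level moves to a neighbouring level with total probability `p_change`; the bar length, the number of levels and `p_change` are illustrative, and the time signature θ is left out.

```python
import numpy as np


def state_index(m, n, M):
    """Flatten x = (m, n): bar position m in 0..M-1, tempo level n in 1..N."""
    return (n - 1) * M + m


def bar_pointer_transition(M=8, N=5, p_change=0.2):
    """A[i, j] = p(x_{k+1} = j | x_k = i) under the assumed dynamics."""
    A = np.zeros((M * N, M * N))
    for n in range(1, N + 1):
        # Next tempo level: stay with prob 1 - p_change, else step to a neighbour;
        # mass that would leave the range 1..N stays at the current level.
        nxt = {n: 1.0 - p_change}
        for n2 in (n - 1, n + 1):
            target = n2 if 1 <= n2 <= N else n
            nxt[target] = nxt.get(target, 0.0) + p_change / 2
        for m in range(M):
            m2 = (m + n) % M              # advance n cells, wrapping at the bar line
            for n2, p in nxt.items():
                A[state_index(m, n, M), state_index(m2, n2, M)] += p
    return A


def propagate(p0, A, steps):
    """Prior marginals p(x_1), ..., p(x_{steps+1}) by repeated prediction."""
    marginals = [np.asarray(p0, dtype=float)]
    for _ in range(steps):
        marginals.append(marginals[-1] @ A)
    return marginals


A = bar_pointer_transition()
p1 = np.zeros(A.shape[0])
p1[state_index(0, 3, 8)] = 1.0            # start at position 0, tempo level 3
marginals = propagate(p1, A, 9)           # p(x_1), ..., p(x_10)
grid = marginals[-1].reshape(5, 8)        # rows: tempo level, columns: bar position
```

Without observations the mass spreads along the tempo axis and smears around the bar, which is why an observation model is needed to pin the pointer down.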
Bar Position Pointer - k = 1
[Figure: $p(x_1)$ over bar position and tempo level]
Bar Position Pointer - k = 2
[Figure: $p(x_2)$ over bar position and tempo level]
Bar Position Pointer - k = 3
[Figure: $p(x_3)$ over bar position and tempo level]
Bar Position Pointer - k = 4
[Figure: $p(x_4)$ over bar position and tempo level]
Bar Position Pointer - k = 5
[Figure: $p(x_5 \mid y_{1:5})$ over bar position and tempo level after observing $y_5 = 0$]
Bar Position Pointer - k = 10
[Figure: $p(x_{10})$ over bar position and tempo level]
Bar Position Pointer - observation model (Poisson)
• Observation model $p(y_k \mid x_k)$: Poisson, with intensity $\mu_k$
[Figure: intensity $\mu_k$ against bar position $m_k$ (0 to 1000), two panels: Triplet Rhythm and Duplet Rhythm]
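A sketch of a Poisson observation model of this kind, assuming $y_k$ is an onset count and the intensity depends only on bar position through a rhythm template. The `intensity` function (a small baseline plus Gaussian bumps at 2 or 3 equally spaced onset positions, with the `peak`, `base` and `width` values) is a hypothetical stand-in for the templates in the figure.

```python
import numpy as np
from math import lgamma


def intensity(M=1000, subdivisions=3, peak=4.0, base=0.1, width=10.0):
    """Hypothetical Poisson intensity mu(m) over bar positions 0..M-1: a baseline
    plus Gaussian bumps at `subdivisions` equally spaced onset positions
    (e.g. 3 for a triplet rhythm, 2 for a duplet rhythm)."""
    m = np.arange(M)
    mu = np.full(M, base)
    for c in np.arange(subdivisions) * M / subdivisions:
        d = np.minimum(np.abs(m - c), M - np.abs(m - c))   # circular distance within the bar
        mu += peak * np.exp(-0.5 * (d / width) ** 2)
    return mu


def poisson_loglik(y, mu):
    """log p(y | m) = y log mu - mu - log y! for a scalar count y, at every bar position."""
    return y * np.log(mu) - mu - lgamma(y + 1)


triplet = intensity(subdivisions=3)
duplet = intensity(subdivisions=2)
```

At a triplet onset position such as m = 333, a count of 3 is far more likely under `triplet`, while an empty frame ($y_k = 0$) favours `duplet`; this is how an observation such as $y_5 = 0$ reshapes the posterior over $x_k$.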
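Tempo, Rhythm, Meter analysis
Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)
[Figure: graphical model with chains $n_0, \dots, n_3$, $\theta_0, \dots, \theta_3$, $m_0, \dots, m_3$, $r_0, \dots, r_3$, intensities $\lambda_1, \dots, \lambda_3$ and observations $y_1, \dots, y_3$]
• θ: time signature indicator (e.g., 3/4, 4/4); r: rhythmic pattern indicator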
Filtering
[Figure, four panels against frame index k (0 to 450): observed data $y_k$; $\log p(m_k \mid y_{1:k})$ over bar position $m_k$; $\log p(n_k \mid y_{1:k})$ over tempo (quarter notes per minute, 60 to 180); $p(r_k \mid y_{1:k})$ for triplets and duplets]
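Filtering in a discrete model like this is the standard HMM forward recursion, $p(x_k \mid y_{1:k}) \propto p(y_k \mid x_k) \sum_{x_{k-1}} p(x_k \mid x_{k-1})\, p(x_{k-1} \mid y_{1:k-1})$. The generic sketch below works on flattened states, normalizes at every frame to avoid underflow, and accumulates $\log p(y_{1:K})$; for the bar pointer model one would flatten the joint state (for example $(m, n, r)$) into a single index, which is an assumption about implementation rather than something stated on the slide.

```python
import numpy as np


def forward_filter(pi, A, loglik):
    """Filtered posteriors p(x_k | y_{1:k}) and log p(y_{1:K}) for a discrete HMM.

    pi:     (S,) initial distribution p(x_1)
    A:      (S, S) transition matrix, A[i, j] = p(x_{k+1} = j | x_k = i)
    loglik: (K, S) log-likelihoods, loglik[k, s] = log p(y_k | x_k = s)
    """
    K, S = loglik.shape
    alpha = np.empty((K, S))
    log_evidence = 0.0
    for k in range(K):
        # Predict: p(x_k | y_{1:k-1}) (just the prior at the first frame).
        pred = pi if k == 0 else alpha[k - 1] @ A
        # Update: multiply by the likelihood, shifted by the max log-likelihood
        # so the exponentials cannot all underflow to zero.
        c = loglik[k].max()
        unnorm = pred * np.exp(loglik[k] - c)
        z = unnorm.sum()
        alpha[k] = unnorm / z
        log_evidence += np.log(z) + c   # adds log p(y_k | y_{1:k-1})
    return alpha, log_evidence
```

Because each step only needs the previous filtered distribution, this runs online, one frame at a time.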
Smoothing
[Figure, four panels against frame index k (0 to 450): observed data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position $m_k$; $\log p(n_k \mid y_{1:K})$ over tempo (quarter notes per minute, 60 to 180); $p(r_k \mid y_{1:K})$ for triplets and duplets]
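Smoothed marginals $p(x_k \mid y_{1:K})$ add a backward pass over the same quantities. A scaled forward-backward sketch for a generic discrete HMM, with the usual conventions (initial distribution `pi`, row-stochastic `A`, per-frame log-likelihoods `loglik`):

```python
import numpy as np


def forward_backward(pi, A, loglik):
    """Smoothed posteriors p(x_k | y_{1:K}) for a discrete HMM (scaled forward-backward).

    pi (S,), A[i, j] = p(x_{k+1} = j | x_k = i), loglik[k, s] = log p(y_k | x_k = s).
    """
    K, S = loglik.shape
    # Rescaling each frame's likelihood by a constant leaves the posteriors unchanged.
    lik = np.exp(loglik - loglik.max(axis=1, keepdims=True))
    alpha = np.empty((K, S))
    scale = np.empty(K)
    for k in range(K):                       # forward pass: alpha[k] = p(x_k | y_{1:k})
        pred = pi if k == 0 else alpha[k - 1] @ A
        unnorm = pred * lik[k]
        scale[k] = unnorm.sum()
        alpha[k] = unnorm / scale[k]
    beta = np.ones((K, S))
    for k in range(K - 2, -1, -1):           # backward pass with the same scale factors
        beta[k] = A @ (lik[k + 1] * beta[k + 1]) / scale[k + 1]
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```

This is the offline counterpart of filtering: every frame's estimate uses the whole sequence, which is why smoothed posteriors are typically sharper and less erratic than filtered ones.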
Time Signature
[Figure: observed audio (sample value against time in s, 0 to 12); then, against frame index k: $\log p(m_k \mid z_{1:K})$ over bar position $m_k$; $\log p(n_k \mid z_{1:K})$ over tempo (quarter notes per minute, about 52 to 155); $p(\theta_k \mid z_{1:K})$ for 4/4 and 3/4]
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt shown against the audio signal $x_t$ over time t]
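Score-Performance matching - Graphical Model
[Figure: graphical model with chains $t_1, \dots, t_K$, $r_1, \dots, r_K$ and $\lambda_1, \dots, \lambda_K$, and, for each frequency bin $\nu = 1, \dots, W$, variances $v_{\nu,1}, \dots, v_{\nu,K}$ and coefficients $s_{\nu,1}, \dots, s_{\nu,K}$; an inset illustrates $r_k$]
$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau};\, a,\, 1/(a \lambda \sigma_\nu(r_\tau))\big)$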
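Read generatively, this prior ties each variance $v_{\nu,\tau}$ to the spectral template $\sigma_\nu(r_\tau)$ scaled by $\lambda$. The sketch below draws from it, assuming the parameterization $\mathcal{IG}(v; a, b) \propto v^{-(a+1)} e^{-1/(b v)}$ (so $1/v$ is gamma with shape $a$ and scale $b$) and a zero-mean Gaussian layer $s_{\nu,\tau} \mid v_{\nu,\tau} \sim \mathcal{N}(0, v_{\nu,\tau})$; both the convention and the Gaussian layer are assumptions, not read off the slide.

```python
import numpy as np


def sample_coefficients(sigma, lam, a, n_frames, rng):
    """Draw v ~ IG(a, 1/(a * lam * sigma_nu)) and s ~ N(0, v) for every bin and frame.

    With the assumed IG convention, 1/v ~ Gamma(shape=a, scale=1/(a*lam*sigma_nu)),
    so E[1/v] = 1/(lam * sigma_nu): the template sets the typical energy per bin."""
    sigma = np.asarray(sigma, dtype=float)[:, None]          # (W, 1) spectral template
    precision = rng.gamma(shape=a, scale=1.0 / (a * lam * sigma),
                          size=(sigma.shape[0], n_frames))
    v = 1.0 / precision                                       # inverse-gamma variances
    s = rng.normal(0.0, np.sqrt(v))                           # transform coefficients
    return v, s


rng = np.random.default_rng(1)
v, s = sample_coefficients([1.0, 0.01], lam=2.0, a=5.0, n_frames=20000, rng=rng)
```

The shape parameter $a$ controls how tightly the variances follow the template: large $a$ keeps $v_{\nu,\tau}$ close to $\lambda \sigma_\nu(r_\tau)$, small $a$ allows heavy-tailed deviations.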
Score-Performance matching - Signal model
[Figure: $\log \sigma_\nu$ (about −12 to 0) against frequency ν in Hz (0 to 4000)]
Score-Performance matching
[Figure, two panels: Spectrogram Data (frequency in Hz, 0 to 4000, against time in s, 0 to 14); MIDI Data (MIDI note, 55 to 85, against score position)]
Online (filtering) or offline (smoothing) processing is possible.
Transcription
[Figure, three panels against time in s (1 to 4): $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80); $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ on a log λ axis; MDCT of the audio (source: Daniel-Ben Pienaar), frequency 0 to 4000 Hz]
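A minimal sketch of how a frame-wise score such as $\log p(r_\tau \mid s_\tau)$ could be computed. Assuming $v_{\nu,\tau} \sim \mathcal{IG}(a, 1/(a\lambda\sigma_\nu(r)))$ (inverse-gamma with rate $a\lambda\sigma_\nu(r)$) and $s_{\nu,\tau} \mid v_{\nu,\tau} \sim \mathcal{N}(0, v_{\nu,\tau})$, integrating out each variance makes each coefficient Student-t, so candidate notes can be ranked by marginal likelihood. The harmonic template, the flat prior over single-note candidates and the values of `lam` and `a` are hypothetical simplifications, not necessarily the inference procedure behind the figure.

```python
import numpy as np
from math import lgamma, log, pi


def harmonic_template(midi, freqs, n_harm=8, width=15.0, floor=1e-3):
    """Hypothetical template sigma_nu(r) for one MIDI note r: Gaussian bumps
    (width in Hz) at the harmonics h * f0 with 1/h decay, plus a small floor."""
    f0 = 440.0 * 2.0 ** ((midi - 69) / 12.0)
    sig = np.full(len(freqs), floor)
    for h in range(1, n_harm + 1):
        sig += np.exp(-0.5 * ((freqs - h * f0) / width) ** 2) / h
    return sig


def log_marginal(s, sigma, lam, a):
    """log p(s | r) with every v_nu integrated out (Student-t per coefficient),
    using inverse-gamma rate beta_nu = a * lam * sigma_nu and s_nu | v_nu ~ N(0, v_nu)."""
    beta = a * lam * sigma
    per_bin = (lgamma(a + 0.5) - lgamma(a) - 0.5 * log(2.0 * pi)
               + a * np.log(beta) - (a + 0.5) * np.log(beta + 0.5 * s ** 2))
    return per_bin.sum()


def note_posterior(s, freqs, candidates, lam=1.0, a=3.0):
    """p(r | s) over single-note candidates, assuming a flat prior over them."""
    logp = np.array([log_marginal(s, harmonic_template(r, freqs), lam, a)
                     for r in candidates])
    p = np.exp(logp - logp.max())    # subtract the max before exponentiating
    return p / p.sum()
```

For one frame `s` of transform coefficients on a frequency grid `freqs`, `note_posterior(s, freqs, range(55, 81))` returns one probability per candidate note; stacking frames and taking logs gives a picture of the same kind as the $\log p(r_\tau \mid s_\tau)$ panel.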
Summary
• “Time Domain”: Switching State Space Models
– State space modeling
– Conditional linear dynamical systems, Gaussian processes (e.g., AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform Domain”: Gamma Fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT, …)
– Inherent limitations (analysis windows, frequency resolution)
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical-score-guided source separation
– Prior structures for other observation models: NMF
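A minimal sketch of one conjugate way to chain variances, assumed here for illustration rather than taken from the slides: alternate a gamma-distributed auxiliary variable $z_k$ with an inverse-gamma variance $v_k$, so that every full conditional stays gamma or inverse-gamma (convenient for Gibbs sampling or variational Bayes). The shapes $a_z$ and $a_v$ control how strongly neighbouring variances are tied; all names and values are hypothetical.

```python
import numpy as np


def gamma_chain(K, a_v=10.0, a_z=10.0, v0=1.0, seed=0):
    """Sample a positive volatility sequence v_1..v_K from an assumed chain
        z_k | v_{k-1} ~ Gamma(shape=a_z, scale=v_{k-1}/a_z)   (so E[z_k] = v_{k-1})
        v_k | z_k     ~ IG(a_v, 1/(a_v z_k))                   (so E[1/v_k] = 1/z_k)
    with IG(x; a, b) proportional to x^{-(a+1)} exp(-1/(b x))."""
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    prev = v0
    for k in range(K):
        z = rng.gamma(a_z, prev / a_z)                  # auxiliary coupling variable
        v[k] = 1.0 / rng.gamma(a_v, 1.0 / (a_v * z))    # inverse-gamma draw via 1/gamma
        prev = v[k]
    return v


smooth = gamma_chain(2000, a_v=100.0, a_z=100.0)   # strongly coupled: slowly varying
rough = gamma_chain(2000, a_v=2.0, a_z=2.0)        # weakly coupled: bursty
```

Placing such chains along time (or along both time and frequency, for a field) gives energy envelopes that vary smoothly but stay positive, which is the role the summary assigns to gamma chains and fields.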
Page 67
Bar position Pointer (Whiteley Cemgil Godsill 2006)
| | |
3 bull bull bull bull bull bull bull bull
nk 2 bull bull bull bull bull bull bull bull
1 bull bull bull bull bull bull bull bull
1 2 3 4 5 6 7 8
mk
34 time
44 time
bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)
bull Directed Arcs denote state transitions with positive probability
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66
Bar position Pointer - transition model
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Tem
po L
evel
Bar Position
p(x2| x
1)
1 2 3 4 5 6 7 8
1
2
3
4
5
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
Figure panels: observed data, sample value against time (0 to 12 s); then, over frame index $k$ from 1 to 500, $\log p(m_k \mid z_{1:K})$ over bar position $m_k$, $\log p(n_k \mid z_{1:K})$ over tempo in quarter notes per minute (52 to 155), and $p(\theta_k \mid z_{1:K})$ for 4/4 versus 3/4
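As a simplification of the per-frame $p(\theta_k \mid z_{1:K})$ above, suppose the time signature is held fixed over the whole excerpt. The hypotheses can then be compared through their marginal likelihoods $p(z_{1:K} \mid \theta)$, for instance the accumulated log normalisers of a forward filter run under each $\theta$. These log evidences are large and negative, so the posterior has to be formed with a log-sum-exp. The function name and the numbers are hypothetical.

```python
import math


def posterior_from_log_evidence(log_evidence, log_prior=None):
    """p(theta | z) from log p(z | theta) and log p(theta), via log-sum-exp.

    log_evidence: dict mapping each hypothesis to log p(z_{1:K} | theta).
    log_prior: optional dict of log p(theta); uniform if omitted.
    """
    keys = list(log_evidence)
    if log_prior is None:
        log_prior = {key: -math.log(len(keys)) for key in keys}
    scores = {key: log_evidence[key] + log_prior[key] for key in keys}
    top = max(scores.values())
    # subtract the maximum before exponentiating so nothing underflows to 0
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {key: math.exp(s - log_norm) for key, s in scores.items()}


# Hypothetical log evidences; exp(-10234.7) on its own would underflow to 0.0.
post = posterior_from_log_evidence({"4/4": -10234.7, "3/4": -10240.2})
print(post)
```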
Score-Performance matching (ISMIR 2007)
• Given a musical score, associate its note events with the audio
Figure: score excerpt and audio waveform $x_t$ against time $t$
Score-Performance matching - Graphical Model
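Graphical model: nodes $t_1, \dots, t_K$, $r_1, \dots, r_K$ and $\lambda_1, \dots, \lambda_K$; inside a plate over frequency bins $\nu = 1, \dots, W$, variances $v_{\nu,1}, \dots, v_{\nu,K}$ and coefficients $s_{\nu,1}, \dots, s_{\nu,K}$; an inset numbered 1 to 8 is labelled $r_k$
$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau};\ a,\ 1/(a\lambda\sigma_\nu(r_\tau))\big)$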
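The prior on the variances $v_{\nu,\tau}$ can be simulated directly. A minimal sketch, assuming the convention that $v \sim \mathcal{IG}(v; a, b)$ means $1/v$ is Gamma distributed with shape $a$ and scale $b$. Under that assumption, $b = 1/(a\lambda\sigma_\nu(r_\tau))$ gives $\mathrm{E}[1/v] = 1/(\lambda\sigma_\nu(r_\tau))$ and $\mathrm{E}[v] = a\lambda\sigma_\nu(r_\tau)/(a-1)$ for $a > 1$, so the template value $\lambda\sigma_\nu(r_\tau)$ sets the expected energy and $a$ controls how tightly the variances concentrate around it. The function name and the numbers are illustrative.

```python
import random


def sample_variance(a, lam, sigma, rng=random):
    """Draw v ~ IG(v; a, 1/(a * lam * sigma)), taking 1/v ~ Gamma(shape=a, scale=b)."""
    b = 1.0 / (a * lam * sigma)
    g = rng.gammavariate(a, b)  # Gamma(shape a, scale b): mean a*b = 1/(lam*sigma)
    return 1.0 / g


rng = random.Random(0)
a, lam, sigma = 200.0, 2.0, 0.5  # template value lam * sigma = 1.0
draws = [sample_variance(a, lam, sigma, rng) for _ in range(20000)]
print(sum(draws) / len(draws))  # close to a * lam * sigma / (a - 1)
```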
Score-Performance matching - Signal model
Figure: $\log \sigma_\nu$ (from −12 to 0) against frequency $\nu$ in Hz (0 to 4000)
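A hypothetical way to build a template like the plotted $\log \sigma_\nu$ for a score note $r$: the standard MIDI-to-frequency map, $440 \cdot 2^{(r-69)/12}$ Hz, gives the fundamental, and Gaussian bumps with geometrically decaying weights are placed at its harmonics above a small floor. The harmonic count, decay, width and floor are assumptions chosen only so that the values span roughly the same range (−12 to 0) as the plot; they are not the talk's fitted template.

```python
import math


def midi_to_hz(r):
    """Equal-temperament pitch: MIDI note 69 (A4) is 440 Hz."""
    return 440.0 * 2.0 ** ((r - 69) / 12.0)


def log_sigma(nu, r, n_harmonics=10, decay=0.8, width=15.0, floor=1e-5):
    """Hypothetical log spectral template log sigma_nu(r) at frequency nu (Hz) for note r."""
    f0 = midi_to_hz(r)
    s = floor  # keeps the log finite between harmonics
    for h in range(1, n_harmonics + 1):
        s += decay ** (h - 1) * math.exp(-0.5 * ((nu - h * f0) / width) ** 2)
    return math.log(s)


# Evaluate the template for A4 (MIDI 69) at the fundamental, between
# harmonics, and at the second harmonic.
for nu in (440.0, 660.0, 880.0):
    print(nu, round(log_sigma(nu, 69), 2))
```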
Score-Performance matching
Figure panels: spectrogram data, frequency (0 to 4000 Hz) against time (0 to 14 s); MIDI data, MIDI note (55 to 85) against score position
Online (filtering) or offline (smoothing) processing is possible
Transcription
Figure panels, time 1 to 4 s: $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80); $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ on a $\log \lambda$ axis; MDCT of the audio, frequency 0 to 4000 Hz (source: Daniel-Ben Pienaar)
Summary
• “Time Domain” – Switching State Space Models
– State space modeling
– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform Domain” – Gamma Fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
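The transform-domain points (orthogonal transform, energy compaction, MDCT) can be made concrete with a direct O(N²) MDCT, written for clarity rather than speed; a practical system would use an FFT-based implementation. This sketch uses a sine window, which satisfies the Princen-Bradley condition $w_n^2 + w_{n+N}^2 = 1$, and an inverse scaled by $2/N$ so that windowed overlap-add with hop $N$ reconstructs the signal exactly wherever two frames overlap (time-domain aliasing cancellation). For a sinusoid centred on a bin, most of a frame's energy lands in a few coefficients. Function names and sizes are illustrative.

```python
import math


def sine_window(length):
    """Sine window; for length 2N it satisfies w[n]**2 + w[n + N]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def _basis(n, k, N):
    # MDCT kernel cos[(pi/N) (n + 1/2 + N/2) (k + 1/2)]
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))


def mdct(frame):
    """Direct MDCT: 2N samples -> N coefficients."""
    N = len(frame) // 2
    return [sum(frame[n] * _basis(n, k, N) for n in range(2 * N)) for k in range(N)]


def imdct(X):
    """Inverse MDCT with 2/N scaling: N coefficients -> 2N time-aliased samples."""
    N = len(X)
    return [2.0 / N * sum(X[k] * _basis(n, k, N) for k in range(N)) for n in range(2 * N)]


def analyse_synthesise(x, N):
    """Windowed MDCT with hop N, then windowed IMDCT and overlap-add."""
    w = sine_window(2 * N)
    y = [0.0] * len(x)
    frames = []
    for start in range(0, len(x) - 2 * N + 1, N):
        X = mdct([w[n] * x[start + n] for n in range(2 * N)])
        frames.append(X)
        for n, v in enumerate(imdct(X)):
            y[start + n] += w[n] * v
    return frames, y


N, k0 = 32, 10
x = [math.cos(math.pi / N * (k0 + 0.5) * n) for n in range(6 * N)]
frames, y = analyse_synthesise(x, N)
energy = sorted((c * c for c in frames[0]), reverse=True)
print("energy in 4 largest of", N, "coefficients:", sum(energy[:4]) / sum(energy))
print("max reconstruction error:", max(abs(x[n] - y[n]) for n in range(N, len(x) - N)))
```

Only the first and last $N$ samples lack a second overlapping frame, which is why the error is measured on the interior.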
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for
– Time-frequency energy distributions
• Ongoing Work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical score guided source separation
– Prior structures for other observation models: NMF
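One simple way to obtain a positively correlated chain of variances, in the spirit of the gamma chains named above, is to alternate Gamma and inverse-Gamma draws through auxiliary variables $z_k$: $z_k$ is Gamma distributed with mean $v_{k-1}$, and $v_k$ is inverse-Gamma distributed with $\mathrm{E}[1/v_k] = 1/z_k$. The coefficients are then drawn as $s_k \sim \mathcal{N}(0, v_k)$. This is a minimal sketch under those assumptions; the exact construction and parametrisation used in the talk may differ, and the function name and shape parameters are hypothetical.

```python
import random


def sample_gamma_chain(K, v0=1.0, a_z=10.0, a_v=10.0, seed=0):
    """Hypothetical alternating Gamma / inverse-Gamma chain of variances v_1..v_K
    with Gaussian coefficients s_k ~ N(0, v_k)."""
    rng = random.Random(seed)
    v_prev, v, s = v0, [], []
    for _ in range(K):
        # auxiliary z_k ~ Gamma(shape a_z, scale v_prev / a_z), so E[z_k] = v_prev
        z = rng.gammavariate(a_z, v_prev / a_z)
        # v_k = 1/g with g ~ Gamma(shape a_v, scale 1/(a_v z_k)), so E[1/v_k] = 1/z_k
        v_k = 1.0 / rng.gammavariate(a_v, 1.0 / (a_v * z))
        v.append(v_k)
        s.append(rng.gauss(0.0, v_k ** 0.5))  # gauss takes a standard deviation
        v_prev = v_k
    return v, s


v, s = sample_gamma_chain(200)
print(min(v), max(v))
```

Larger shape parameters make each link tighter, so the variance trajectory varies more slowly; small shapes give bursty, volatile energy.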
Bar position Pointer - transition model
Figure (three panels): $p(x_2 \mid x_1)$ on a grid of bar positions 1 to 8 and tempo levels 1 to 5
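The prediction step that carries $p(x_1)$ forward to $p(x_2)$, $p(x_3)$ and $p(x_4)$ can be sketched on the same toy grid of 8 bar positions and 5 tempo levels. This assumes (our reading, not a statement of the talk's exact model) that the pointer advances by the current tempo level modulo the bar length, and that the tempo stays put or moves one level up or down with total probability `p_change`. The function names and the value of `p_change` are hypothetical, and positions are 0-based in the code.

```python
def build_transition(M=8, N=5, p_change=0.2):
    """Hypothetical bar pointer transition on a grid of M positions and N tempo levels.

    State x = (m, n): bar position m in 0..M-1, tempo level n in 1..N.
    The pointer advances by n positions (wrapping at the bar end); the tempo
    stays with probability 1 - p_change, otherwise moves one level up or down.
    Moves that would leave 1..N keep the current level instead.
    """
    T = {}
    for m in range(M):
        for n in range(1, N + 1):
            m_next = (m + n) % M
            tempo = {n: 1.0 - p_change}
            for step in (-1, 1):
                n2 = n + step if 1 <= n + step <= N else n
                tempo[n2] = tempo.get(n2, 0.0) + p_change / 2
            T[(m, n)] = [((m_next, n2), pr) for n2, pr in tempo.items()]
    return T


def predict(p, T):
    """One prediction step: p(x_{k+1}) = sum over x_k of p(x_{k+1} | x_k) p(x_k)."""
    q = {}
    for x, px in p.items():
        for x_next, t in T[x]:
            q[x_next] = q.get(x_next, 0.0) + px * t
    return q


# Start at bar position 0 with an unknown tempo level, then predict three steps.
T = build_transition()
p = {(0, n): 1 / 5 for n in range(1, 6)}
for k in range(2, 5):
    p = predict(p, T)
    print(f"p(x_{k}) support size:", sum(v > 0 for v in p.values()))
```

Without observations the mass only spreads out; conditioning on $y_k$, as in the $k = 5$ slide, is what concentrates it again.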
Bar position Pointer - k = 1
Figure: $p(x_1)$ over bar position and tempo level
Bar position Pointer - k = 2
Figure: $p(x_2)$ over bar position and tempo level
Bar position Pointer - k = 3
Figure: $p(x_3)$ over bar position and tempo level
Bar position Pointer - k = 4
Figure: $p(x_4)$ over bar position and tempo level
Bar position Pointer - k = 5
Figure: $p(x_5 \mid y_{1:5})$ over bar position and tempo level, after observing $y_5 = 0$
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 69
Bar position Pointer - k = 1
Tem
po L
evel
Bar Position
p(x1)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 70
Bar position Pointer - k = 2
Tem
po L
evel
Bar Position
p(x2)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 71
Bar position Pointer - k = 3
Tem
po L
evel
Bar Position
p(x3)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 72
Bar position Pointer - k = 4
Tem
po L
evel
Bar Position
p(x4)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1k)
50 100 150 200 250 300 350 400 450
800
600
400
200
minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1k)
50 100 150 200 250 300 350 400 450
180
120
60minus4
minus2
0
p(rk|y
1k)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets
002040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76
Smoothing
0 50 100 150 200 250 300 350 400 4500
1
2
y k
Observed Data
mk
log p(mk|y
1K)
50 100 150 200 250 300 350 400 450
800
600
400
200 minus10
minus5
0Q
uart
er n
otes
per
min
log p(nk|y
1K)
50 100 150 200 250 300 350 400 450
180
120
60minus10
minus5
0
p(rk|y
1K)
Frame Index k
50 100 150 200 250 300 350 400 450
Triplets
Duplets 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77
Time Signature
0 2 4 6 8 10 12
minus1
0
1
sam
ple
valu
e
time s
Observed Data
mk
log p(mk|z
1K)
100 200 300 400 500
800
600
400
200 minus10
minus5
0
Qua
rter
not
es p
er m
in log p(n
k|z
1K)
100 200 300 400 500
155
103
52minus10
minus5
0
p(θk|z
1K)
Frame Index k
100 200 300 400 500
44
34 02040608
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78
Score-Performance matching (ISMIR) 2007
bull Given a musical score associate note events with the audio
4
t
x t
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
log p(rτ |sτ )
MID
Inot
enum
ber
Time s1 2 3 4
60
65
70
75
80
1 2 3 4minus10
minus5
0
5
10
sum
i w(i)τ λ
(i)τ
Time s
logλ
MDCT of audio (source Daniel-Ben Pienaar)
Time s
Fre
quen
cy
Hz
1 2 3 40
500
1000
1500
2000
2500
3000
3500
4000
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83
Summary
bull ldquoTime Domainrdquo ndash Switching State Space Models
ndash State space modeling
ndash Conditional Linear Dynamical Systems Gaussian processes (eg
AR ARMA)
ndash Analysis down to sample precision (if required)
ndash Computationally quite heavy
bull ldquoTransform Domainrdquo ndash Gamma Fields
ndash Models on (orthogonal) transform coefficients Energy compaction
ndash Practical can make use of fast transforms (FFT MDCT )
ndash Inherent limitations (analysis windows frequency resolution)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84
Summary
bull Gamma chains and fields a flexible stochastic volatility prior for
ndash Time-Frequency Energy distributions
bull Ongoing Work
ndash Comparison of inference methods (VB MCMC SMC)
ndash Learning
ndash Applications
lowast Chord detection Polyphonic transcription
lowast Musical Score guided source separation
ndash Prior structures for other observation models NMF
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85
Page 73
Bar position Pointer - k = 5
Tem
po L
evel
Bar Position
y5 = 0 p(x
5| y
15)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72
Bar position Pointer - k = 10
Tem
po L
evel
Bar Position
p(x10
)
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73
Bar position Pointer - observation model (Poisson)
bull Observation model p(yk|xk) Poisson intensity
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Triplet Rhythm
0 100 200 300 400 500 600 700 800 900 10000
2
4
mk
micro k
Duplet Rhythm
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74
Tempo Rhythm Meter analysis
Bar Pointer Model (Whiteley Cemgil Godsill 2006)
n0 n1 n2 n3
θ0 θ1 θ2 θ3
m0 m1 m2 m3
r0 r1 r2 r3
λ1 λ2 λ3
y1 y2 y3
bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75
Filtering
[Figure, four panels versus frame index $k$ (0 to 450): observed data $y_k$; $\log p(m_k \mid y_{1:k})$ over bar position $m_k$ (0 to 800); $\log p(n_k \mid y_{1:k})$ over tempo $n_k$ (quarter notes per minute, 60 to 180); $p(r_k \mid y_{1:k})$ over rhythm (triplets vs. duplets).]
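Filtering densities such as $p(m_k \mid y_{1:k})$ can be computed exactly once the state is discretised, using the standard forward recursion of a hidden Markov model:

1. Predict with the transition.
2. Multiply by the observation likelihood.
3. Renormalise.

The sketch below is generic over a small discrete state space and uses a dense transition matrix for clarity. With the full $(m, n, r, \theta)$ state, one would instead exploit the sparse bar-pointer dynamics.

```python
def forward_filter(prior, transition, likelihoods):
    """Forward (filtering) recursion for a discrete hidden Markov model.

    prior:       p(x_1) over S states
    transition:  S x S nested list, transition[i][j] = p(x_{k+1}=j | x_k=i)
    likelihoods: K lists, likelihoods[k][j] = p(y_k | x_k=j)
    Returns the K filtering distributions p(x_k | y_{1:k}).
    """
    num_states = len(prior)
    filtered = []
    predicted = list(prior)
    for lik in likelihoods:
        # update: weight predicted probabilities by the observation likelihood
        unnorm = [predicted[j] * lik[j] for j in range(num_states)]
        total = sum(unnorm)
        if total == 0.0:
            raise ValueError("observation has zero probability under the model")
        post = [u / total for u in unnorm]
        filtered.append(post)
        # predict: propagate through the transition model
        predicted = [sum(post[i] * transition[i][j] for i in range(num_states))
                     for j in range(num_states)]
    return filtered
```

As a toy check, take a three-state cyclic transition (a tiny bar pointer with a deterministic advance). If the first observation favours state 0 with likelihoods (0.9, 0.05, 0.05) and the second is uninformative, the filtered distribution at frame 2 is (0.05, 0.9, 0.05).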
Smoothing
[Figure, four panels versus frame index $k$ (0 to 450): observed data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position $m_k$ (0 to 800); $\log p(n_k \mid y_{1:K})$ over tempo $n_k$ (quarter notes per minute, 60 to 180); $p(r_k \mid y_{1:K})$ over rhythm (triplets vs. duplets).]
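Smoothing conditions every frame on the whole sequence $y_{1:K}$. For a discrete state space this is the forward-backward algorithm. The sketch below combines a forward pass with a backward pass, each rescaled at every step to avoid underflow, for a generic discrete hidden Markov model. The function name is illustrative.

```python
def forward_backward(prior, transition, likelihoods):
    """Smoothed marginals p(x_k | y_{1:K}) for a discrete hidden Markov model.

    prior[j] = p(x_1=j); transition[i][j] = p(x_{k+1}=j | x_k=i);
    likelihoods[k][j] = p(y_k | x_k=j).
    """
    num_states, num_frames = len(prior), len(likelihoods)

    def normalise(v):
        total = sum(v)
        return [x / total for x in v]

    # forward pass: alpha[k] = p(x_k | y_{1:k})
    alpha = []
    predicted = list(prior)
    for lik in likelihoods:
        post = normalise([predicted[j] * lik[j] for j in range(num_states)])
        alpha.append(post)
        predicted = [sum(post[i] * transition[i][j] for i in range(num_states))
                     for j in range(num_states)]

    # backward pass: beta[k] proportional to p(y_{k+1:K} | x_k), rescaled each step
    beta = [[1.0] * num_states for _ in range(num_frames)]
    for k in range(num_frames - 2, -1, -1):
        nxt = [likelihoods[k + 1][j] * beta[k + 1][j] for j in range(num_states)]
        beta[k] = normalise([sum(transition[i][j] * nxt[j] for j in range(num_states))
                             for i in range(num_states)])

    # combine: smoothed marginal is proportional to alpha * beta
    return [normalise([alpha[k][j] * beta[k][j] for j in range(num_states)])
            for k in range(num_frames)]
```

In the cyclic three-state toy model, suppose the first observation is uninformative and the second favours state 0. The smoothed posterior at frame 1 then peaks at state 2, while the filtered one stays uniform. Later evidence sharpens earlier estimates, which is the effect the filtering and smoothing plots contrast.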
Time Signature
[Figure: observed audio (sample value -1 to 1) versus time (0 to 12 s); then, versus frame index $k$ (0 to 500): $\log p(m_k \mid z_{1:K})$ over bar position (0 to 800), $\log p(n_k \mid z_{1:K})$ over tempo (quarter notes per minute, 52 to 155), and $p(\theta_k \mid z_{1:K})$ over time signature (3/4 vs. 4/4).]
Score-Performance Matching (ISMIR 2007)
• Given a musical score, associate note events with the audio
[Figure: score excerpt and the audio signal $x_t$ versus time $t$.]
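Score-Performance Matching: Graphical Model
[Graphical model: chains $t_k$, $r_k$, $\lambda_k$ ($k = 1, \dots, K$); variances $v_{\nu k}$ and coefficients $s_{\nu k}$ inside a plate over frequency bins $\nu = 1, \dots, W$; an inset illustrates the score states $r_k$.]
$v_{\nu\tau} \sim \mathcal{IG}\big(v_{\nu\tau};\ a,\ 1/(a\,\lambda_\tau\,\sigma_\nu(r_\tau))\big)$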
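The inverse-gamma prior can be read generatively. Under the convention that v ~ IG(v; a, b) means 1/v ~ Gamma(shape a, scale b), the scale 1/(a λ_τ σ_ν(r_τ)) gives E[1/v_{ντ}] = 1/(λ_τ σ_ν(r_τ)). Each variance then fluctuates around the product of a frame gain λ_τ and a spectral template σ_ν for score state r_τ, with a controlling how tightly. The sketch assumes that convention and also assumes zero-mean Gaussian coefficients s_{ντ} with variance v_{ντ}. Both are assumptions for illustration, and the function name is hypothetical.

```python
import random

def sample_frame(gain, template, a=5.0, seed=0):
    """Draw variances v_nu and coefficients s_nu for one frame tau.

    gain:     lambda_tau > 0 (overall loudness of the frame)
    template: sigma_nu(r_tau) > 0, one value per frequency bin nu
    a:        inverse-gamma shape; larger a keeps v_nu closer to gain * sigma_nu
    Assumes 1/v ~ Gamma(a, scale=1/(a*gain*sigma)) and s ~ N(0, v).
    """
    rng = random.Random(seed)
    variances, coeffs = [], []
    for sigma in template:
        precision = rng.gammavariate(a, 1.0 / (a * gain * sigma))  # mean 1/(gain*sigma)
        v = 1.0 / precision
        variances.append(v)
        coeffs.append(rng.gauss(0.0, v ** 0.5))  # gauss takes a standard deviation
    return variances, coeffs
```

With a very large shape `a`, the sampled variances collapse onto `gain * template`. Small `a` allows heavy-tailed deviations from the template.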
Score-Performance Matching: Signal Model
[Figure: spectral template $\log \sigma_\nu$ (-12 to 0) versus frequency $\nu$ (0 to 4000 Hz).]
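A template like the plotted $\log \sigma_\nu$ could be built, for example, from harmonic peaks at multiples of the note's fundamental. The sketch below is hypothetical. It uses equal-tempered tuning (f = 440 · 2^((p − 69)/12) for MIDI note p, a standard convention) and places Gaussian bumps at harmonics below 4000 Hz, with geometrically decaying amplitudes plus a small floor. The decay, width and floor values are made up.

```python
import math

def midi_to_hz(pitch):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 -> 440 Hz)."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)

def harmonic_template(pitch, freqs, decay=0.6, width_hz=15.0, floor=1e-5, f_max=4000.0):
    """Hypothetical spectral template sigma_nu for a MIDI pitch.

    freqs: centre frequencies (Hz) of the bins nu.
    Sums Gaussian bumps at harmonics h * f0 (for h * f0 < f_max) with
    amplitude decay**(h - 1), plus a positive floor so every bin has variance.
    """
    f0 = midi_to_hz(pitch)
    template = []
    for f in freqs:
        value = floor
        h = 1
        while h * f0 < f_max:
            value += decay ** (h - 1) * math.exp(-0.5 * ((f - h * f0) / width_hz) ** 2)
            h += 1
        template.append(value)
    return template
```

Taking `math.log` of the result over a 0 to 4000 Hz grid gives a curve of the same qualitative kind as the figure: peaks at harmonics, a low floor in between.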
Score-Performance Matching
[Figure, two panels: spectrogram of the audio (frequency 0 to 4000 Hz versus time 0 to 14 s); MIDI data (MIDI note 55 to 85 versus score position 50 to 450).]
• Online (filtering) or offline (smoothing) processing is possible
Transcription
[Figure, three panels versus time (1 to 4 s): $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80); $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$, shown as $\log \lambda$ (-10 to 10); MDCT of the audio, 0 to 4000 Hz (source: Daniel-Ben Pienaar).]
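The quantity $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ is a weighted average of samples $\lambda^{(i)}_\tau$ with weights $w^{(i)}_\tau$, of the kind a particle method produces. The sketch below assumes that reading. It forms such an estimate from unnormalised log-weights, normalising them with the log-sum-exp shift for numerical stability. The function name is illustrative.

```python
import math

def weighted_estimate(values, log_weights):
    """Self-normalised weighted mean sum_i w_i * values_i.

    log_weights may be unnormalised; they are shifted by their maximum
    before exponentiating so very negative or large values do not under/overflow.
    """
    if len(values) != len(log_weights) or not values:
        raise ValueError("need equally many values and weights")
    shift = max(log_weights)
    weights = [math.exp(lw - shift) for lw in log_weights]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total
```

Shifting by the maximum leaves the normalised weights unchanged but keeps at least one exponent equal to 1. This matters when log-likelihoods of long frames reach magnitudes in the thousands.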
Page 81
Score-Performance matching - Graphical Model
ν = 1 W
t1 t2 tK
r1 r2 rK
λ1 λ2 λK
vν1 vν2 vνK
sν1 sν2 sνK
6 7 81 2 53 4
rk
vντ sim IG(vντ a 1(aλσν(rτ)))
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80
Score-Performance matching - Signal model
0 500 1000 1500 2000 2500 3000 3500 4000minus12
minus10
minus8
minus6
minus4
minus2
0
Frequency ν Hz
log
σ ν
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81
Score-Performance matching
Spectrogram Data
Time s
Fre
quen
cy
Hz
0 2 4 6 8 10 12 140
1000
2000
3000
4000
50 100 150 200 250 300 350 400 45055
60
65
70
75
80
85MIDI Data
Score position
MID
I not
e
Online (filtering) or Offline (smoothing) processing is possible
Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82
Transcription
[Figure, three panels against time (s): log p(r_τ | s_τ) over MIDI note number; $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ on a log λ axis; and "MDCT of audio (source: Daniel-Ben Pienaar)" over frequency (Hz)]
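The sketch below assumes s_{ν,τ} | v_{ν,τ} ~ N(0, v_{ν,τ}) together with the inverse-gamma prior of scale β = aλσ_ν(r). Under these assumptions v can be integrated out in closed form, giving a Student-t density with 2a degrees of freedom in each bin: $p(s) = \frac{\Gamma(a+1/2)}{\Gamma(a)\sqrt{2\pi\beta}}\left(1 + \frac{s^2}{2\beta}\right)^{-(a+1/2)}$. Summing over bins and normalising over a set of candidate rows r gives a frame-wise log p(r_τ | s_τ) under a flat prior. The last function is a generic self-normalised weighted average, which is one way to read a particle estimate such as $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$. The candidate templates and all names are hypothetical.

```python
import math

def log_student_t(s, a, lam_sigma):
    """log of the integral over v of N(s; 0, v) * IG(v; a, 1/(a*lam_sigma)).

    Closed form (Student-t with 2a degrees of freedom), beta = a * lam_sigma.
    """
    beta = a * lam_sigma
    return (math.lgamma(a + 0.5) - math.lgamma(a)
            - 0.5 * math.log(2.0 * math.pi * beta)
            - (a + 0.5) * math.log1p(s * s / (2.0 * beta)))

def log_frame_likelihood(frame, sigma, lam, a):
    """log p(s_tau | r): sum over bins nu, with sigma[nu] playing sigma_nu(r)."""
    return sum(log_student_t(s, a, lam * sig) for s, sig in zip(frame, sigma))

def log_posterior_over_rows(frame, templates, lam, a):
    """Normalise frame log-likelihoods over candidate rows (flat prior assumed)."""
    ll = {r: log_frame_likelihood(frame, sig, lam, a) for r, sig in templates.items()}
    m = max(ll.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in ll.values()))  # log-sum-exp
    return {r: v - log_z for r, v in ll.items()}

def weighted_estimate(log_w, values):
    """Self-normalised average sum_i w_i * values_i from unnormalised log-weights."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    return sum(wi * vi for wi, vi in zip(w, values)) / sum(w)

# Usage with two hypothetical 4-bin templates and one observed frame.
templates = {"A": [1.0, 1e-3, 1e-3, 1e-3], "B": [1e-3, 1e-3, 1e-3, 1.0]}
frame = [1.0, 0.01, 0.01, 0.0]
post = log_posterior_over_rows(frame, templates, lam=1.0, a=3.0)
print({r: math.exp(v) for r, v in post.items()})  # row "A" should dominate
```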
Summary
• “Time Domain”: Switching State Space Models
– State space modeling
– Conditional Linear Dynamical Systems, Gaussian processes (e.g., AR, ARMA)
– Analysis down to sample precision (if required)
– Computationally quite heavy
• “Transform Domain”: Gamma Fields
– Models on (orthogonal) transform coefficients; energy compaction
– Practical: can make use of fast transforms (FFT, MDCT)
– Inherent limitations (analysis windows, frequency resolution)
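The MDCT listed among the fast transforms can be sketched directly from its definition. This is an O(N²) illustration, not the fast FFT-based algorithm. It assumes a sine window of length 2N with hop N, which satisfies w[n]² + w[n+N]² = 1. It also applies a 2/N scaling on the inverse so that windowed overlap-add cancels the time-domain aliasing. Each frame of 2N samples yields N coefficients, so the transform is critically sampled. The function names are illustrative.

```python
import math
import random

def sine_window(N):
    """Length-2N sine window; w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley condition)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(frame, N):
    """Direct O(N^2) MDCT: 2N (already windowed) samples -> N coefficients."""
    n0 = 0.5 + N / 2
    return [sum(frame[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs, N):
    """Inverse MDCT with 2/N scaling, so windowed overlap-add cancels aliasing."""
    n0 = 0.5 + N / 2
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]

def analyze(x, N):
    """Frames of length 2N with hop N; len(x) is assumed to be a multiple of N."""
    w = sine_window(N)
    return [mdct([wi * xi for wi, xi in zip(w, x[start:start + 2 * N])], N)
            for start in range(0, len(x) - N, N)]

def synthesize(frames, N, length):
    """Window each inverse-transformed frame and overlap-add with hop N."""
    w = sine_window(N)
    y = [0.0] * length
    for j, coeffs in enumerate(frames):
        for n, value in enumerate(imdct(coeffs, N)):
            y[j * N + n] += w[n] * value
    return y

# Usage: pad with N zeros at both ends; samples covered by two frames are
# reconstructed up to rounding error.
N = 8
rng = random.Random(0)
x = [0.0] * N + [rng.uniform(-1, 1) for _ in range(6 * N)] + [0.0] * N
frames = analyze(x, N)
y = synthesize(frames, N, len(x))
print(max(abs(a - b) for a, b in zip(x[N:-N], y[N:-N])))
```

The window length 2N fixes the trade-off noted above: bins are f_s/(2N) apart, so finer frequency resolution costs time resolution.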
Summary
• Gamma chains and fields: a flexible stochastic volatility prior for time-frequency energy distributions
• Ongoing work
– Comparison of inference methods (VB, MCMC, SMC)
– Learning
– Applications
∗ Chord detection, polyphonic transcription
∗ Musical-score-guided source separation
– Prior structures for other observation models, such as NMF