Variational Bayesian Methods for Audio Indexing
![Page 1: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/1.jpg)
Variational Bayesian Methods for Audio Indexing
Fabio Valente, Christian Wellekens
Institut Eurecom
![Page 2: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/2.jpg)
Outline
Generalities on speaker clustering
Model selection/BIC
Variational learning
Variational model selection
Results
![Page 3: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/3.jpg)
Speaker clustering
Many applications (speaker indexing, speech recognition) require clustering segments with the same characteristics, e.g. speech from the same speaker.
Goal: group together the speech segments of the same speaker.
Fully connected (ergodic) HMM topology with a duration constraint. Each state represents a speaker.
When the number of speakers is not known, it must be estimated with a model selection criterion (e.g. BIC).
![Page 4: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/4.jpg)
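Model selection
Given data $Y$ and a model $m$, the optimal model maximizes the posterior:
$$p(m \mid Y) = \frac{p(Y \mid m)\, p(m)}{p(Y)}$$
If the prior $p(m)$ is uniform, the decision depends only on $p(Y \mid m)$ (a.k.a. the marginal likelihood).
This quantity is prohibitive to compute for some models (HMM, GMM).
Bayesian modeling assumes distributions over the parameters $\theta$: a prior $p(\theta \mid m)$ and the joint $p(Y, \theta \mid m)$.
The criterion is thus the marginal likelihood:
$$p(Y \mid m) = \int p(Y \mid \theta, m)\, p(\theta \mid m)\, d\theta$$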
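As a toy illustration of the marginal likelihood (not taken from the slides), the integral has a closed form for coin flips. The two models below are hypothetical: `m0` fixes the head probability at 0.5 and has no free parameter, while `m1` places a Beta(a, b) prior on it. With a uniform prior over the two models, the one with the larger evidence is selected.

```python
import math

def log_beta(a, b):
    """Log of the Beta function B(a, b), computed through log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_evidence_fixed(heads, tails, theta=0.5):
    """ln p(Y | m0): theta is fixed, so there is nothing to integrate over."""
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def log_evidence_beta(heads, tails, a=1.0, b=1.0):
    """ln p(Y | m1): integral of theta^h (1 - theta)^t against a Beta(a, b) prior."""
    return log_beta(heads + a, tails + b) - log_beta(a, b)

# Hypothetical observation: 9 heads and 1 tail.
heads, tails = 9, 1
log_ev_fixed = log_evidence_fixed(heads, tails)
log_ev_beta = log_evidence_beta(heads, tails)

# Uniform prior over models: the decision depends only on p(Y | m).
best_model = "m1" if log_ev_beta > log_ev_fixed else "m0"
print(log_ev_fixed, log_ev_beta, best_model)
```

For HMMs and GMMs no such closed form exists, which is what motivates the approximations on the following slides.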
![Page 5: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/5.jpg)
Bayesian information criterion (BIC)
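First-order approximation obtained from the Laplace approximation of the marginal likelihood (Schwarz, 1978):
$$BIC(Y, m) = \log p(Y \mid \hat{\theta}, m) - \frac{d}{2} \log n$$
where $\hat{\theta}$ is the maximum-likelihood estimate, $d$ the number of free parameters and $n$ the number of data points.
Generally, the penalty is multiplied by a constant $\lambda$ (threshold):
$$BIC(Y, m) = \log p(Y \mid \hat{\theta}, m) - \lambda\, \frac{d}{2} \log n$$
BIC does not depend on parameter distributions!
Asymptotically (large $n$), BIC converges to the log marginal likelihood.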
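A minimal sketch of BIC-based selection with a penalty threshold follows. The function names, the candidate models and their log-likelihood values are hypothetical and chosen only to show how the threshold changes the selected model.

```python
import math

def bic_score(log_likelihood, num_params, num_samples, penalty_weight=1.0):
    """BIC = ln p(Y | theta_hat, m) - penalty_weight * (d / 2) * ln n."""
    return log_likelihood - penalty_weight * 0.5 * num_params * math.log(num_samples)

def select_by_bic(candidates, num_samples, penalty_weight=1.0):
    """Return the name of the candidate with the highest BIC.

    candidates maps a model name to (maximized log-likelihood, free parameters).
    """
    return max(
        candidates,
        key=lambda name: bic_score(*candidates[name], num_samples, penalty_weight),
    )

# Hypothetical numbers, for illustration only.
candidates = {
    "2 speakers": (-5200.0, 50),
    "3 speakers": (-5100.0, 75),
    "4 speakers": (-5080.0, 100),
}
choice_default = select_by_bic(candidates, num_samples=2000)
choice_weak_penalty = select_by_bic(candidates, num_samples=2000, penalty_weight=0.1)
print(choice_default, "|", choice_weak_penalty)
```

With these numbers, lowering the threshold moves the choice to a larger model, which is why the threshold usually has to be tuned on held-out data.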
![Page 6: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/6.jpg)
Variational Learning
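Introduce an approximate variational distribution $q(\theta)$:
$$\ln p(Y \mid m) = \ln \int p(Y, \theta \mid m)\, d\theta = \ln \int q(\theta)\, \frac{p(Y, \theta \mid m)}{q(\theta)}\, d\theta$$
Applying Jensen's inequality:
$$\ln p(Y \mid m) \ge \int q(\theta) \ln \frac{p(Y, \theta \mid m)}{q(\theta)}\, d\theta = F_m$$
Maximization of $\ln p(Y \mid m)$ is then replaced by maximization of the free energy $F_m$.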
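The bound can be checked numerically on a small discrete example. The sketch below assumes a hypothetical model in which the parameter takes only three values, so the integral becomes a sum. It shows that the free energy never exceeds the log marginal likelihood and that the bound is tight when the variational distribution equals the true posterior.

```python
import math

# Hypothetical toy model: theta takes one of three values.
prior = [0.5, 0.3, 0.2]          # p(theta | m)
likelihood = [0.10, 0.40, 0.70]  # p(Y | theta, m) for the observed Y

joint = [p * l for p, l in zip(prior, likelihood)]  # p(Y, theta | m)
log_evidence = math.log(sum(joint))                 # ln p(Y | m)

def free_energy(q):
    """F_m(q) = sum over theta of q(theta) * ln[p(Y, theta | m) / q(theta)]."""
    return sum(qi * math.log(ji / qi) for qi, ji in zip(q, joint) if qi > 0.0)

uniform_q = [1.0 / 3.0] * 3
posterior_q = [j / sum(joint) for j in joint]  # exact posterior p(theta | Y, m)

print(log_evidence, free_energy(uniform_q), free_energy(posterior_q))
```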
![Page 7: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/7.jpg)
Variational Learning with hidden variables
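Sometimes model optimization needs the use of hidden variables (e.g. the state sequence in EM).
If $x$ is the hidden variable, we can write (conditioning on $m$ left implicit):
$$F_m = \iint q(x, \theta) \ln \frac{p(Y, x, \theta)}{q(x, \theta)}\, dx\, d\theta$$
Independence hypothesis: $q(x, \theta) = q(x)\, q(\theta)$, which gives
$$F_m = \iint q(\theta)\, q(x) \ln \frac{p(Y, x \mid \theta)}{q(x)}\, dx\, d\theta - KL\bigl(q(\theta)\,\|\,p(\theta)\bigr)$$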
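The form with the KL term follows from the general one in two steps. The derivation below is a sketch that assumes the joint factorizes as p(Y, x, θ) = p(Y, x | θ) p(θ), which the slide leaves implicit, and uses the fact that q(x) integrates to one.

```latex
\begin{align*}
F_m &= \iint q(x)\,q(\theta)\,
       \ln\frac{p(Y,x\mid\theta)\,p(\theta)}{q(x)\,q(\theta)}\,dx\,d\theta \\
    &= \iint q(\theta)\,q(x)\,\ln\frac{p(Y,x\mid\theta)}{q(x)}\,dx\,d\theta
       + \int q(\theta)\,\ln\frac{p(\theta)}{q(\theta)}\,d\theta \\
    &= \iint q(\theta)\,q(x)\,\ln\frac{p(Y,x\mid\theta)}{q(x)}\,dx\,d\theta
       - KL\bigl(q(\theta)\,\|\,p(\theta)\bigr)
\end{align*}
```

The KL term acts as a penalty that grows as the parameter posterior moves away from the prior.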
![Page 8: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/8.jpg)
EM-like algorithm
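Under the hypothesis $q(x, \theta \mid m) = q(\theta \mid m)\, q(x \mid m)$:
E-step: $$q(x \mid m) \propto \exp\left[\int q(\theta \mid m) \ln p(Y, x \mid \theta, m)\, d\theta\right]$$
M-step: $$q(\theta \mid m) \propto \exp\left[\int q(x \mid m) \ln p(Y, x \mid \theta, m)\, dx\right] p(\theta \mid m)$$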
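The two updates can be sketched on a deliberately simplified model, which is an assumption of this example and not the system described in the slides: a Gaussian mixture whose means and variance are known, so the only parameter is the weight vector with a symmetric Dirichlet prior, and the hidden variable is the component label of each sample. The E-step then computes responsibilities from the expected log weights, and the M-step adds the expected counts to the prior. The `digamma` helper is a standard recurrence plus asymptotic series, written out to keep the block dependency-free.

```python
import math
import random

def digamma(x):
    """Digamma function via upward recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def log_gauss(y, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)

def vb_em_weights(data, means, var, alpha0=1.0, iterations=50):
    """Return the Dirichlet parameters of q(weights) after VB-EM."""
    k = len(means)
    alpha = [alpha0] * k
    for _ in range(iterations):
        # E-step: q(x) uses the expectation of ln(weight) under q(theta).
        psi_total = digamma(sum(alpha))
        exp_log_w = [digamma(a) - psi_total for a in alpha]
        counts = [0.0] * k
        for y in data:
            logs = [exp_log_w[j] + log_gauss(y, means[j], var) for j in range(k)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            norm = sum(unnorm)
            for j in range(k):
                counts[j] += unnorm[j] / norm
        # M-step: q(theta) is the prior updated with the expected counts.
        alpha = [alpha0 + c for c in counts]
    return alpha

random.seed(0)
means, var = [-2.0, 2.0], 1.0
data = [random.gauss(means[0], 1.0) for _ in range(300)]
data += [random.gauss(means[1], 1.0) for _ in range(100)]
alpha = vb_em_weights(data, means, var)
weights = [a / sum(alpha) for a in alpha]  # posterior mean of the weights
print(weights)
```

The result is a distribution over the weights, not a point estimate. The posterior mean is printed here only as a summary.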
![Page 9: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/9.jpg)
VB Model selection
In the same way, an approximate posterior distribution over models $q(m)$ can be defined:
$$\ln p(Y) = \ln \sum_m p(m)\, p(Y \mid m) \ge \sum_m q(m) \left[ F_m + \ln \frac{p(m)}{q(m)} \right]$$
Maximizing w.r.t. $q(m)$ yields:
$$q(m) \propto \exp\{F_m\}\, p(m)$$
Model selection is based on $F_m$: the best model maximizes $q(m)$.
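A short sketch of this selection rule is given below. The function name and the free-energy values are hypothetical. The normalization is done in the log domain because free energies of real models are large negative numbers whose exponentials underflow.

```python
import math

def variational_model_posterior(free_energies, priors=None):
    """q(m) proportional to exp(F_m) * p(m); uniform p(m) if priors is None.

    free_energies and priors are dicts keyed by model (here: speaker count).
    """
    models = list(free_energies)
    if priors is None:
        priors = {m: 1.0 / len(models) for m in models}
    scores = {m: free_energies[m] + math.log(priors[m]) for m in models}
    top = max(scores.values())
    unnorm = {m: math.exp(s - top) for m, s in scores.items()}
    total = sum(unnorm.values())
    return {m: u / total for m, u in unnorm.items()}

# Hypothetical free energies for models with 12, 13 and 14 speakers.
free_energies = {12: -41250.0, 13: -41190.0, 14: -41205.0}
q = variational_model_posterior(free_energies)
best = max(q, key=q.get)
print(best, q)
```

With a uniform prior over models, maximizing q(m) is the same as picking the model with the largest free energy.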
![Page 10: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/10.jpg)
Experimental framework
BN-96 Hub4 evaluation data set.
Initialize a model with N speakers (states) and train the system using VB and ML (or VB and MAP with UBM).
Reduce the speaker number from N-1 to 1 and train using VB and ML (or MAP).
Score the N models with VB and BIC and choose the best one.
Three scores are reported: the best score, the selected score (with VB or BIC), and the score obtained with the known speaker number.
Results are given in terms of acp (average cluster purity), asp (average speaker purity) and $K = \sqrt{acp \cdot asp}$.
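The slides name the three measures without giving formulas for acp and asp. The sketch below assumes the frame-weighted definitions commonly used in the speaker clustering literature, computed from a count matrix whose entry (i, j) is the number of frames of speaker j assigned to cluster i. The function name and the example matrices are hypothetical.

```python
import math

def purity_scores(counts):
    """Return (acp, asp, K) for counts[i][j] = frames of speaker j in cluster i."""
    total = sum(sum(row) for row in counts)
    num_speakers = len(counts[0])

    # Average cluster purity: how dominated each cluster is by one speaker.
    acp = 0.0
    for row in counts:
        row_sum = sum(row)
        if row_sum > 0:
            acp += sum(n * n for n in row) / row_sum
    acp /= total

    # Average speaker purity: how concentrated each speaker is in one cluster.
    asp = 0.0
    for j in range(num_speakers):
        column = [row[j] for row in counts]
        col_sum = sum(column)
        if col_sum > 0:
            asp += sum(n * n for n in column) / col_sum
    asp /= total

    return acp, asp, math.sqrt(acp * asp)

perfect = purity_scores([[5, 0], [0, 3]])  # each cluster holds one speaker
merged = purity_scores([[2, 2]])           # two speakers in a single cluster
print(perfect, merged)
```

Merging speakers lowers acp and splitting a speaker lowers asp, so K penalizes both too few and too many clusters.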
![Page 11: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/11.jpg)
Experiments I
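File 1

| System | N | acp | asp | K |
| --- | --- | --- | --- | --- |
| ML-known | 8 | 0.60 | 0.84 | 0.71 |
| ML-best | 10 | 0.80 | 0.86 | 0.83 |
| ML/BIC | 13 | 0.80 | 0.86 | 0.83 |
| VB-known | 8 | 0.70 | 0.91 | 0.80 |
| VB-best | 12 | 0.85 | 0.89 | 0.87 |
| VB | 15 | 0.85 | 0.89 | 0.87 |

File 2

| System | N | acp | asp | K |
| --- | --- | --- | --- | --- |
| ML-known | 14 | 0.76 | 0.67 | 0.72 |
| ML-best | 9 | 0.72 | 0.77 | 0.74 |
| ML/BIC | 13 | 0.84 | 0.63 | 0.73 |
| VB-known | 14 | 0.75 | 0.82 | 0.78 |
| VB-best | 14 | 0.84 | 0.81 | 0.82 |
| VB | 14 | 0.84 | 0.81 | 0.82 |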
![Page 12: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/12.jpg)
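Experiments II

File 3

| System | N | acp | asp | K |
| --- | --- | --- | --- | --- |
| ML-known | 16 | 0.75 | 0.74 | 0.75 |
| ML-best | 15 | 0.77 | 0.83 | 0.80 |
| ML/BIC | 15 | 0.77 | 0.83 | 0.80 |
| VB-known | 16 | 0.68 | 0.86 | 0.76 |
| VB-best | 14 | 0.75 | 0.90 | 0.82 |
| VB | 14 | 0.75 | 0.90 | 0.82 |

File 4

| System | N | acp | asp | K |
| --- | --- | --- | --- | --- |
| ML-known | 21 | 0.72 | 0.65 | 0.68 |
| ML-best | 12 | 0.63 | 0.80 | 0.71 |
| ML/BIC | 21 | 0.76 | 0.60 | 0.68 |
| VB-known | 21 | 0.72 | 0.65 | 0.68 |
| VB-best | 13 | 0.63 | 0.80 | 0.71 |
| VB | 13 | 0.64 | 0.72 | 0.68 |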
![Page 13: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/13.jpg)
Dependence on threshold
Plots: K as a function of the threshold; number of speakers as a function of the threshold.
![Page 14: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/14.jpg)
Free Energy vs. BIC
![Page 15: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/15.jpg)
Experiments III
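File 1

| System | N | acp | asp | K |
| --- | --- | --- | --- | --- |
| MAP-known | 8 | 0.52 | 0.72 | 0.62 |
| MAP-best | 15 | 0.81 | 0.84 | 0.83 |
| MAP/BIC | 13 | 0.80 | 0.81 | 0.81 |
| VB-known | 8 | 0.68 | 0.88 | 0.77 |
| VB-best | 22 | 0.83 | 0.85 | 0.84 |
| VB | 22 | 0.83 | 0.85 | 0.84 |

File 2

| System | N | acp | asp | K |
| --- | --- | --- | --- | --- |
| MAP-known | 14 | 0.68 | 0.78 | 0.73 |
| MAP-best | 22 | 0.84 | 0.80 | 0.82 |
| MAP/BIC | 18 | 0.68 | 0.85 | 0.81 |
| VB-known | 14 | 0.69 | 0.80 | 0.74 |
| VB-best | 18 | 0.85 | 0.87 | 0.86 |
| VB | 19 | 0.87 | 0.80 | 0.83 |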
![Page 16: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/16.jpg)
Experiments IV
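File 3

| System | N | acp | asp | K |
| --- | --- | --- | --- | --- |
| MAP-known | 16 | 0.71 | 0.77 | 0.74 |
| MAP-best | 29 | 0.78 | 0.74 | 0.76 |
| MAP/BIC | 16 | 0.69 | 0.77 | 0.73 |
| VB-known | 16 | 0.74 | 0.83 | 0.78 |
| VB-best | 22 | 0.82 | 0.82 | 0.82 |
| VB | 16 | 0.78 | 0.79 | 0.79 |

File 4

| System | N | acp | asp | K |
| --- | --- | --- | --- | --- |
| MAP-known | 18 | 0.65 | 0.69 | 0.67 |
| MAP-best | 18 | 0.65 | 0.69 | 0.67 |
| MAP/BIC | 20 | 0.63 | 0.64 | 0.64 |
| VB-known | 21 | 0.67 | 0.73 | 0.70 |
| VB-best | 20 | 0.69 | 0.72 | 0.70 |
| VB | 19 | 0.67 | 0.73 | 0.70 |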
![Page 17: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/17.jpg)
Conclusions and Future Work
VB uses free energy for parameter learning and model selection.
VB generalizes both the ML and MAP learning frameworks.
VB outperforms ML/BIC on 3 of the 4 BN files.
VB outperforms MAP/BIC on 4 of the 4 BN files.
Future work: repeat the experiments on other databases (e.g. NIST speaker diarization).
![Page 18: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/18.jpg)
Thanks for your attention!
![Page 19: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/19.jpg)
Data vs. Gaussian components
Final number of Gaussian components as a function of the amount of data for each speaker
![Page 20: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/20.jpg)
Experiments (file 1)
| | Real | VB | ML/BIC |
| --- | --- | --- | --- |
| Speakers | 8 | 15 | 13 |
![Page 21: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/21.jpg)
Experiments (file 2)
| | Real | VB | ML/BIC |
| --- | --- | --- | --- |
| Speakers | 14 | 14 | 16 |
![Page 22: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/22.jpg)
Experiments (file 3)
| | Real | VB | ML/BIC |
| --- | --- | --- | --- |
| Speakers | 16 | 14 | 15 |
![Page 23: Variational Bayesian Methods for Audio Indexing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815a87550346895dc7f85d/html5/thumbnails/23.jpg)
Experiments (file 4)
| | Real | VB | ML/BIC |
| --- | --- | --- | --- |
| Speakers | 21 | 13 | 12 |