Lecture 20: Model Adaptation
Machine Learning
April 15, 2010
Today
• Adaptation of Gaussian Mixture Models
– Maximum A Posteriori (MAP)
– Maximum Likelihood Linear Regression (MLLR)
• Application: Speaker Recognition
– UBM-MAP + SVM
The Problem
• I have a little bit of labeled data and a lot of unlabeled data.
• I can model the training data fairly well.
• But we always fit training data better than testing data.
• Can we use the wealth of unlabeled data to do better?
Let’s use a GMM
• GMMs to model the labeled data.
• In the simplest form, one mixture component per class.
Labeled training of GMM
• MLE estimators of the parameters: class priors, means, and covariances.
• Or these can be used to seed EM.
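Concretely, with one Gaussian per class, the labeled MLE step is just per-class sample statistics. A minimal NumPy sketch (the array names `X` and `y` are my own, not the lecture's):

```python
import numpy as np

def mle_gmm_from_labels(X, y):
    """MLE of one Gaussian component per class, plus class priors."""
    classes = np.unique(y)
    priors, means, covs = [], [], []
    for c in classes:
        Xc = X[y == c]
        priors.append(len(Xc) / len(X))        # pi_c = n_c / n
        means.append(Xc.mean(axis=0))          # mu_c = class sample mean
        covs.append(np.cov(Xc, rowvar=False))  # Sigma_c = class sample covariance
    return np.array(priors), np.array(means), np.array(covs)
```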
Adapting the mixtures to new data
• Essentially, let EM start with the MLE parameters as seeds.
• Expand the available data for EM, and proceed until convergence.
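A sketch of that seeding with scikit-learn's EM implementation, reusing the helper above (the labeled/unlabeled variable names are hypothetical):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical inputs: labeled (X_lab, y_lab) and unlabeled X_unlab.
priors, means, covs = mle_gmm_from_labels(X_lab, y_lab)

gmm = GaussianMixture(
    n_components=len(means),
    weights_init=priors,
    means_init=means,
    precisions_init=np.linalg.inv(covs),  # sklearn seeds with precisions, not covariances
)
gmm.fit(np.vstack([X_lab, X_unlab]))      # EM on all the data, run to convergence
```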
Problem with EM adaptation
• The initial labeled seeds could contribute very little to the final model: with enough unlabeled data, EM is free to move the parameters arbitrarily far from where the labels put them.
MAP Adaptation
• Constrain the contribution of the unlabeled data.
• Let the alpha terms dictate how much weight to give to the new, unlabeled data compared to the existing estimates.
• The movement of the parameters is constrained.
MLLR adaptation
• Another idea: "Maximum Likelihood Linear Regression".
• Apply an affine transformation to the means.
• Don't change the covariance matrices.
MLLR adaptation
• The new means, mu' = A mu + b, are the maximum likelihood estimates of the means given the new data.
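To make that concrete: with the covariances fixed (and, in this simplified sketch, taken to be identity), maximizing the likelihood of the adaptation data reduces to responsibility-weighted least squares for W = [A | b] acting on extended means xi_k = [mu_k; 1]. Full MLLR additionally weights each term by the component's covariance; this sketch omits that:

```python
import numpy as np

def mllr_transform(X, resp, means):
    """Estimate one shared affine transform W = [A | b]; return A mu_k + b.

    Simplified to identity covariances, so the ML solution is the
    responsibility-weighted least-squares fit of W.
    """
    d = means.shape[1]
    xi = np.hstack([means, np.ones((len(means), 1))])  # extended means [mu_k; 1]
    G = np.zeros((d + 1, d + 1))  # sum_k n_k xi_k xi_k^T
    Z = np.zeros((d, d + 1))      # sum_{t,k} gamma_tk x_t xi_k^T
    for k in range(len(means)):
        n_k = resp[:, k].sum()
        G += n_k * np.outer(xi[k], xi[k])
        Z += np.outer(resp[:, k] @ X, xi[k])
    W = Z @ np.linalg.inv(G)      # normal equations: Z = W G
    return xi @ W.T               # adapted means A mu_k + b
```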
Why MLLR?
• We can tie the transformation matrices of mixture components (see the sketch after this list).
• For example:
– You know that the red and green classes are similar.
– Assumption: their transformations should be similar.
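Tying then just means estimating one transform per group of components rather than one per component; a sketch reusing `mllr_transform` above (the group indices and the responsibility array `resp` are hypothetical):

```python
# Components 0 and 1 (say, the "red" and "green" classes) share one
# transform; component 2 gets its own.
tied_groups = [[0, 1], [2]]
for group in tied_groups:
    means[group] = mllr_transform(X, resp[:, group], means[group])
```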
Application of Model Adaptation
• Speaker Recognition.
• Task: given speech from a known set of speakers, identify the speaker.
• Assume there is training data from each speaker.
• Approach:
– Model a generic speaker.
– Identify a speaker by its difference from the generic speaker.
– Measure this difference by adaptation parameters.
Speech Representation
• Extract a feature representation of speech.
• Sample every 10 ms.
• MFCC features, 16 dimensions.
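For example, with librosa (the file name and 16 kHz sampling rate are my assumptions):

```python
import librosa

# 16 MFCCs with a 10 ms hop, matching the slide's description.
y, sr = librosa.load("speech.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=16,
                            hop_length=int(0.010 * sr))  # 10 ms = 160 samples at 16 kHz
X = mfcc.T  # one 16-dim feature vector per frame
```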
Similarity of sounds
[Figure: speech frames plotted in MFCC space (MFCC1 vs. MFCC2); frames of the same phone (/s/, /b/, /o/, /u/) fall near each other.]
Universal Background Model
• If we had labeled phone information, that would be great.
• But it's expensive and time consuming.
• So just fit a GMM to the MFCC representation of all of the speech you have (a sketch follows below).
– Generally all but one example, but we'll come back to this.
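A sketch of that UBM fit with scikit-learn; the component count and the `mfcc_frames` / `training_utterances` helpers are placeholders, not the lecture's:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Pool MFCC frames from all training speech (minus the held-out example).
all_frames = np.vstack([mfcc_frames(utt) for utt in training_utterances])

# Diagonal covariances and a few hundred components are typical choices.
ubm = GaussianMixture(n_components=512, covariance_type="diag")
ubm.fit(all_frames)
```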
MFCC Scatter
[Figure: the pooled MFCC frames (MFCC1 vs. MFCC2); the /s/, /b/, /o/, /u/ regions are visible but carry no labels.]
UBM fitting
[Figure: the fitted UBM mixture components overlaid on the MFCC scatter (MFCC1 vs. MFCC2) around the /s/, /b/, /o/, /u/ regions.]
MAP adaptation
• When we have a segment of speech to evaluate:
– Generate MFCC features.
– Use MAP adaptation on the UBM Gaussian Mixture Model.
MAP Adaptation
[Figure: the UBM components in MFCC space (MFCC1 vs. MFCC2) shift toward the new segment's frames around /s/, /b/, /o/, /u/ as adaptation proceeds.]
UBM-MAP
• Claim: the differences between speakers can be represented by the movement of the mixture components of the UBM.
• How do we train this model?
UBM-MAP training
[Diagram: Training Data (holding out Speaker N) → UBM Training → MAP adaptation on Speaker N → Supervector]
• Supervector: the vector of adapted means of the Gaussian mixture components.
• Train a supervised model with these labeled vectors.
UBM-MAP training
[Diagram: the same pipeline repeated for all training data; the labeled supervectors feed Multiclass SVM Training.]
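Putting the pieces together with the earlier helpers (again, all data-handling names are hypothetical):

```python
import numpy as np
from sklearn.svm import SVC

def supervector(ubm, X_utt, r=16.0):
    """MAP-adapt the UBM means to one utterance and stack them into a vector."""
    covs = np.array([np.diag(v) for v in ubm.covariances_])  # diag -> full matrices
    adapted = map_adapt_means(X_utt, ubm.weights_, ubm.means_, covs, r=r)
    return adapted.ravel()  # (K, d) adapted means -> one K*d supervector

# One labeled supervector per training utterance, then a multiclass SVM.
S = np.array([supervector(ubm, mfcc_frames(u)) for u in training_utterances])
svm = SVC(kernel="linear")  # scikit-learn handles the multiclass reduction
svm.fit(S, speaker_labels)
```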
UBM-MAP Evaluation
[Diagram: Test Data → UBM → MAP → Supervector → Multiclass SVM → Prediction]
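At test time the same pipeline produces a supervector, which the SVM classifies:

```python
s_test = supervector(ubm, mfcc_frames(test_utterance))
print(svm.predict(s_test[None, :]))  # predicted speaker label
```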
Alternate View
• Do we need all this?
• What if we just train an SVM directly on labeled MFCC data?
[Diagram: Labeled Training Data → Multiclass SVM Training; Test Data → Multiclass SVM → Prediction]
Results
• UBM-MAP (with some variants) is the state of the art in Speaker Recognition.
– Current state-of-the-art performance is about 97% accuracy (~2.5% EER) with a few minutes of speech.
• Direct MFCC modeling performs about half as well (~5% EER).
Model Adaptation
• Adaptation allows GMMs to be seeded with labeled data.
• Incorporation of unlabeled data gives a more robust model.
• The adaptation process can be used to differentiate members of the population (UBM-MAP).
Next Time
• Spectral Clustering