speaker verification system using svm

Jun-Won Suh, Intelligent Electronic Systems

Human and Systems Engineering, Department of Electrical and Computer Engineering

Speaker Verification System using SVM

Research Progress: Jun-Won Suh

Outline: Summary of the Ph.D. Dissertation of Vincent Wan

• Speaker verification system

Extracting features

• Creating models of speakers

Generative models, discriminative models

Making generative models discriminative

• Developing speaker verification using SVMs

• My interests for improving our system


Speaker verification system

• Authenticate a person’s claimed identity

• Text dependent and independent

The system models the sound of the client’s voice, which is based on the physical characteristics of the client’s vocal tract.

A generic speaker verification system

• Feature extraction

• Enrolment

Creates a model of the client’s voice

• Pattern matching

• Decision theory


Extracting features

• Building models of speakers depends on frequency analysis of the speaker’s voice.

• Linear predictive coding (LPC)

LPC assumes that speech can be modelled as the output of a linear filter excited by periodic pulses (voiced speech) or random noise (unvoiced speech).

The LPC coefficients are obtained by minimizing the mean squared prediction error (MSE).

• Perceptual linear prediction (PLP)

PLP combines LPC analysis with psychophysics knowledge of the human auditory system.

Ex: The human ear has a higher frequency resolution at low frequencies.
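The LPC step above can be sketched in a few lines. This is a minimal illustration, assuming the autocorrelation method with the Levinson-Durbin recursion, which is one standard way to minimize the squared prediction error; the function names and the toy signal are hypothetical and not taken from the dissertation.

```python
def autocorrelation(signal, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of a list of samples."""
    n = len(signal)
    return [
        sum(signal[t] * signal[t - lag] for t in range(lag, n))
        for lag in range(max_lag + 1)
    ]


def lpc_coefficients(signal, order):
    """Levinson-Durbin recursion on the autocorrelation sequence.

    Returns (a, err) where a[0] == 1.0 and the predictor is
    x_hat[t] = -sum(a[j] * x[t - j] for j in 1..order);
    err is the squared prediction error left after fitting.
    """
    r = autocorrelation(signal, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # nothing left to predict (silent or perfectly fitted signal)
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err


# Toy signal (an assumption for illustration): a decaying exponential,
# which the first-order predictor x_hat[t] = 0.9 * x[t - 1] fits exactly.
toy_signal = [0.9 ** t for t in range(200)]
coeffs, residual = lpc_coefficients(toy_signal, order=2)
```

In a real front end the recursion would be applied to short windowed frames of speech rather than to a whole signal, and PLP would add the auditory warping before this step.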


Creating models of speakers

• Generative models

Gaussian Mixture Model (GMM), Hidden Markov Model (HMM)

Models are probability density estimators that attempt to capture all of the fluctuations and variations of the data.

• Discriminative models

Polynomial classifiers, Support Vector Machines (SVM)

Models are optimized to minimize the error on a set of training samples.

Models draw the boundary between classes and ignore the fluctuations within each class.

• Making generative models discriminative

Generative models are used to estimate the within-class probability densities and do not minimize a classification error.

Discriminative models tend to achieve the highest performance in classification tasks.


Making generative models discriminative

• GMM-LR/SVM combination

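GMM likelihood ratio

$$\frac{P(X \mid M)}{P(X \mid \Omega)}$$

where $M$ is the client model and $\Omega$ is the world model.

Bayes decision rule

$$S(X) = \log P(X \mid M) - \log P(X \mid \Omega)$$

Bengio proposed that the probability estimates are not perfect and a better version would be

$$S(X) = a \log P(X \mid M) + b \log P(X \mid \Omega) + c$$

The input to the SVM is the two-dimensional vector made up of the log likelihoods of the client and world models.

$$y = \left[ \log P(X \mid M), \; \log P(X \mid \Omega) \right]^{T}$$

A limitation of these approaches arises from frame-based discrimination.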
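The two scoring rules above can be compared with a small sketch. It assumes, purely for illustration, that the client and world models are single one-dimensional Gaussians and that frames are independent; the source uses GMMs, and the parameter values and function names here are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a 1-D Gaussian (a stand-in for a GMM)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def sequence_log_likelihood(frames, mean, var):
    """log P(X | model), assuming frames are independent given the model."""
    return sum(gaussian_log_pdf(x, mean, var) for x in frames)


def llr_score(frames, client, world):
    """Bayes decision rule: S(X) = log P(X|M) - log P(X|world)."""
    return (sequence_log_likelihood(frames, *client)
            - sequence_log_likelihood(frames, *world))


def linear_score(frames, client, world, a, b, c):
    """Generalized rule: S(X) = a log P(X|M) + b log P(X|world) + c."""
    return (a * sequence_log_likelihood(frames, *client)
            + b * sequence_log_likelihood(frames, *world)
            + c)


# Hypothetical (mean, variance) pairs and a toy utterance.
client_model = (1.0, 1.0)
world_model = (0.0, 4.0)
utterance = [0.9, 1.2, 1.1, 0.7]

bayes = llr_score(utterance, client_model, world_model)
same = linear_score(utterance, client_model, world_model, 1.0, -1.0, 0.0)
```

With a = 1, b = -1 and c = 0 the generalized rule reduces to the likelihood ratio. Learning a, b and c from the two-dimensional log likelihood vectors is what the SVM contributes in the GMM-LR/SVM combination.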


Importance of kernels

• Early SVMs using polynomial and RBF kernels

The optimization problems required computational resources that were unsustainable.

Clustering algorithms were employed to reduce the amount of training data, at the cost of accuracy.

Frame-level training inputs discard useful speaker classification information.

• SVMs using score-space kernels

Variable-length utterances can be classified at the sequence level.
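For reference, the polynomial and RBF kernels mentioned above can be written as follows. This is a minimal sketch; the default values for `degree`, `coef0` and `gamma` are arbitrary assumptions, and `gram_matrix` is a hypothetical helper included only to show why frame-level training becomes expensive.

```python
import math


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, degree=2, coef0=1.0):
    """K(u, v) = (u . v + coef0) ** degree"""
    return (dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=0.5):
    """K(u, v) = exp(-gamma * ||u - v||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def gram_matrix(frames, kernel):
    """One kernel evaluation per pair of frames: n frames give n * n entries."""
    return [[kernel(u, v) for v in frames] for u in frames]


frames = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]]
gram = gram_matrix(frames, rbf_kernel)
```

Because every frame is a training point, the Gram matrix grows quadratically with the total number of frames, which is the cost that sequence-level kernels avoid.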


Classifying sequences using score-space kernels

• The score-space kernel enables SVMs to classify whole sequences.

• A variable length sequence of input vectors is mapped explicitly onto a single point in a space of fixed dimension.

• The score-space is derived from the likelihood score.

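$$\psi_{\hat{F}}^{f}(X) = \varphi_{\hat{F}} \, f(\{p_k(X \mid M_k, \theta_k)\}), \qquad X = \{x_1, \ldots, x_N\}$$

where $f$ is a function of the likelihood scores of the generative models $M_k$ with parameters $\theta_k$, and $\varphi_{\hat{F}}$ is the score operator.

• The likelihood ratio score-space

$$f(\{p_k(X \mid M_k, \theta_k)\}) = \log \frac{P(X \mid M_1, \theta_1)}{P(X \mid M_2, \theta_2)}$$

$$\psi(X) = \nabla_{\theta} \log \frac{P(X \mid M_1, \theta_1)}{P(X \mid M_2, \theta_2)}$$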


Computing the score-space vectors

Define the global likelihood of a sequence $X = \{x_1, \ldots, x_{N_l}\}$
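A common way to write this likelihood, assuming the frames are conditionally independent given the model $M$ with parameters $\theta$ (the usual assumption for a GMM, though not stated explicitly here), is:

$$P(X \mid M, \theta) = \prod_{i=1}^{N_l} P(x_i \mid M, \theta), \qquad \log P(X \mid M, \theta) = \sum_{i=1}^{N_l} \log P(x_i \mid M, \theta)$$

Under that assumption the derivative of the global log likelihood with respect to any parameter is a sum of per-frame derivatives:

$$\nabla_{\theta} \log P(X \mid M, \theta) = \sum_{i=1}^{N_l} \nabla_{\theta} \log P(x_i \mid M, \theta)$$

so the resulting vector has one entry per model parameter, whatever the sequence length $N_l$.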


Computing the score-space vectors

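• The fixed length vectors of the likelihood ratio kernel can be expressed as

$$\psi(X) = \nabla_{\theta} \log P(X \mid M_1, \theta_1) - \nabla_{\theta} \log P(X \mid M_2, \theta_2)$$

• The final likelihood ratio kernel is

$$\psi(X) = \begin{bmatrix} \psi_1(X) \\ -\psi_2(X) \end{bmatrix}, \qquad \psi_k(X) = \nabla_{\theta_k} \log P(X \mid M_k, \theta_k)$$

• The dimensionality of the score-space is equal to the total number of parameters in the generative models. Hence the SVM can classify complete utterance sequences.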
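The mapping from a variable-length utterance to a fixed-length score vector can be illustrated with a small sketch. It assumes each generative model is a single one-dimensional Gaussian with two parameters (mean and variance) instead of a GMM, and that frames are independent; the model values and function names are hypothetical.

```python
def gaussian_score(frames, mean, var):
    """Gradient of the sequence log likelihood w.r.t. (mean, variance).

    With independent frames the gradient is a sum over frames, so the
    result always has two entries regardless of the sequence length.
    """
    d_mean = sum((x - mean) / var for x in frames)
    d_var = sum(-0.5 / var + 0.5 * (x - mean) ** 2 / var ** 2 for x in frames)
    return [d_mean, d_var]


def lr_score_vector(frames, client, world):
    """Likelihood ratio score-space: client gradient stacked on the
    negated world gradient (the gradient of log P1 - log P2)."""
    client_part = gaussian_score(frames, *client)
    world_part = gaussian_score(frames, *world)
    return client_part + [-g for g in world_part]


def lr_kernel(frames_a, frames_b, client, world):
    """Linear kernel between two utterances in the score-space."""
    psi_a = lr_score_vector(frames_a, client, world)
    psi_b = lr_score_vector(frames_b, client, world)
    return sum(p * q for p, q in zip(psi_a, psi_b))


# Hypothetical (mean, variance) pairs and two utterances of different length.
client_model = (1.0, 1.0)
world_model = (0.0, 4.0)
short_utterance = [0.9, 1.2, 1.1]
long_utterance = [0.2, -0.5, 1.4, 0.8, 1.0, 0.3, 1.7]

psi_short = lr_score_vector(short_utterance, client_model, world_model)
psi_long = lr_score_vector(long_utterance, client_model, world_model)
```

Both utterances map to four-dimensional vectors, one entry per parameter of the two models, so an SVM can compare whole utterances directly. A practical system would typically also normalize the score vectors before applying the kernel; that step is omitted here.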


Experiment Results on PolyVar

• The data is noisy.

• PolyVar has many more client tests than YOHO.


Conclusion

• Add the GMM-LR/SVM model to our verification system

• Add a score-space kernel to the SVM

Need to compare the computational requirements of the Fisher and LR kernels.


References

• V. Wan, Speaker Verification using Support Vector Machines, University of Sheffield, June 2003

• V. Wan, Building Sequence Kernels for Speaker Verification and Speech Recognition, University of Sheffield

• S. Bengio and J. Mariéthoz, Learning the Decision Function for Speaker Verification, IDIAP, 2001
