
TRANSCRIPT

Page 1: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

Approximating The Kullback-Leibler Divergence Between Gaussian Mixture Models
ICASSP 2007

John R. Hershey and Peder A. Olsen, IBM T. J. Watson Research Center

Speaker: 孝宗

Page 2: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

Outline

• Introduction
• Kullback-Leibler Divergence
• Methods
  • Monte Carlo Sampling
  • The Unscented Transformation
  • Gaussian Approximations
  • The Product of Gaussian Approximation
  • The Matched Bound Approximation
  • The Variational Approximation
  • The Variational Upper Bound

Page 3: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

Introduction

• Kullback-Leibler Divergence: relative entropy.
• The KLD between two PDFs $f$ and $g$ is
$$D(f\|g) = \int f(x)\,\log\frac{f(x)}{g(x)}\,dx.$$
• Three properties:
  1. Self similarity: $D(f\|f) = 0$.
  2. Self identification: $D(f\|g) = 0$ only if $f = g$.
  3. Positivity: $D(f\|g) \ge 0$ for all $f$, $g$.

Page 4: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

Introduction

• The KL divergence is used in many aspects of speech and image recognition, such as determining whether two acoustic models are similar [2], and measuring how confusable two words or HMMs are [3, 4, 5].
• For two Gaussians $f = \mathcal{N}(\mu_f, \Sigma_f)$ and $g = \mathcal{N}(\mu_g, \Sigma_g)$, the KLD has a closed-form expression (implemented in the sketch below):
$$D(f\|g) = \frac{1}{2}\left[\log\frac{|\Sigma_g|}{|\Sigma_f|} + \operatorname{Tr}\!\left(\Sigma_g^{-1}\Sigma_f\right) + (\mu_f - \mu_g)^\top \Sigma_g^{-1} (\mu_f - \mu_g) - d\right].$$
• Whereas for two GMMs no such closed-form expression exists.
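A minimal NumPy sketch of this closed-form expression. The function name `gauss_kl` and the array-based parameterization are mine, not from the paper; later sketches in this transcript reuse this helper.

```python
import numpy as np

def gauss_kl(mu_f, cov_f, mu_g, cov_g):
    """KL( N(mu_f, cov_f) || N(mu_g, cov_g) ) for full-covariance Gaussians."""
    d = mu_f.shape[0]
    _, logdet_f = np.linalg.slogdet(cov_f)
    _, logdet_g = np.linalg.slogdet(cov_g)
    cov_g_inv = np.linalg.inv(cov_g)
    diff = mu_f - mu_g
    return 0.5 * (logdet_g - logdet_f
                  + np.trace(cov_g_inv @ cov_f)
                  + diff @ cov_g_inv @ diff
                  - d)

# Quick check of the properties: identical Gaussians give 0 (self similarity),
# different Gaussians give a positive value.
mu, cov = np.zeros(3), np.eye(3)
print(gauss_kl(mu, cov, mu, cov))            # ~0.0
print(gauss_kl(mu, cov, mu + 1.0, 2 * cov))  # > 0
```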

Page 5: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

Monte Carlo Sampling

• The idea is to draw samples $x_i$ from the pdf $f(x)$, so that $E_f\!\left[\log\frac{f(x)}{g(x)}\right] = D(f\|g)$.
• Using $n$ i.i.d. (independent and identically distributed) samples we have
$$D_{MC}(f\|g) = \frac{1}{n}\sum_{i=1}^{n}\log\frac{f(x_i)}{g(x_i)} \;\longrightarrow\; D(f\|g) \quad \text{as } n \to \infty.$$
• To draw a sample from a GMM we first draw a discrete component index according to the probabilities $\pi_a$, and then draw a continuous sample from the resulting Gaussian component (see the sketch below).
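A hedged sketch of the Monte Carlo estimate $D_{MC}(f\|g)$. A GMM is represented here as a `(weights, means, covs)` tuple of NumPy arrays; `gmm_logpdf`, `gmm_sample`, and `kl_monte_carlo` are illustrative names, not the authors' code.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_logpdf(x, weights, means, covs):
    """log f(x) for a GMM; x has shape (n, d)."""
    comp = np.stack([multivariate_normal.logpdf(x, mean=m, cov=c)
                     for m, c in zip(means, covs)])            # (K, n)
    return logsumexp(np.log(weights)[:, None] + comp, axis=0)  # (n,)

def gmm_sample(n, weights, means, covs, rng):
    """Draw n i.i.d. samples: pick a component index, then sample that Gaussian."""
    idx = rng.choice(len(weights), size=n, p=weights)
    return np.stack([rng.multivariate_normal(means[i], covs[i]) for i in idx])

def kl_monte_carlo(f, g, n=100_000, seed=0):
    """D_MC(f||g) ~ (1/n) * sum_i log f(x_i)/g(x_i), with x_i drawn from f."""
    rng = np.random.default_rng(seed)
    x = gmm_sample(n, *f, rng)
    return np.mean(gmm_logpdf(x, *f) - gmm_logpdf(x, *g))
```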

Page 6: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

The Unscented Transformation

1. Choose sigma points.
2. Assign each sigma point a weight.
3. Apply the non-linear transform to each sigma point.
4. Compute a Gaussian from the weighted, transformed points.

(Illustration from http://ais.informatik.uni-freiburg.de/teaching/ws12/mapping/pdf/slam05-ukf.pdf)

Page 7: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

The Unscented Transformation
• An approach to estimate $E_{f_a}[h(x)]$ in such a way that the approximation is exact for all quadratic functions $h$.
• It is possible to pick $2d$ sigma points $x_{a,k}$ such that
$$\int f_a(x)\,h(x)\,dx \approx \frac{1}{2d}\sum_{k=1}^{2d} h(x_{a,k}).$$
• One possible choice of the sigma points is
$$x_{a,k} = \mu_a + \sqrt{d\,\lambda_{a,k}}\,e_{a,k}, \qquad x_{a,d+k} = \mu_a - \sqrt{d\,\lambda_{a,k}}\,e_{a,k}, \quad k = 1,\dots,d,$$
where $\lambda_{a,k}$ and $e_{a,k}$ are the eigenvalues and eigenvectors of $\Sigma_a$ (see the sketch below).
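A sketch of the resulting KL estimate under the eigenvector-based sigma points above, with the uniform $1/(2d)$ weighting from the slide. It reuses `gmm_logpdf` from the Monte Carlo sketch; `sigma_points` and `kl_unscented` are illustrative names.

```python
import numpy as np
# gmm_logpdf: GMM log-density helper from the Monte Carlo sketch above.

def sigma_points(mean, cov):
    """2d sigma points mu +/- sqrt(d * lambda_k) e_k from the eigendecomposition of cov."""
    d = mean.shape[0]
    lam, vec = np.linalg.eigh(cov)
    offsets = vec * np.sqrt(d * lam)          # column k is sqrt(d*lambda_k) * e_k
    return np.concatenate([mean + offsets.T, mean - offsets.T])   # (2d, d)

def kl_unscented(f, g):
    """Replace each expectation E_{f_a}[log f - log g] by an equally weighted
    average over that component's 2d sigma points, then mix with the weights pi_a."""
    weights, means, covs = f
    total = 0.0
    for pi_a, mu_a, cov_a in zip(weights, means, covs):
        pts = sigma_points(mu_a, cov_a)
        total += pi_a * np.mean(gmm_logpdf(pts, *f) - gmm_logpdf(pts, *g))
    return total
```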

Page 8: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

The Unscented Transformation

Page 9: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

Gaussian Approximations

• Replace $f$ and $g$ with single Gaussians $\hat f$ and $\hat g$ that match the mean and covariance of each mixture:
$$\mu_{\hat f} = \sum_a \pi_a \mu_a, \qquad \Sigma_{\hat f} = \sum_a \pi_a\left(\Sigma_a + (\mu_a - \mu_{\hat f})(\mu_a - \mu_{\hat f})^\top\right),$$
and take $D_{gaussian}(f\|g) = D(\hat f\|\hat g)$ using the closed-form Gaussian KLD (see the sketch below).

• Another method:
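A sketch of the moment-matching step, reusing `gauss_kl` from the earlier sketch; `moment_match` and `kl_gaussian_approx` are illustrative names.

```python
import numpy as np
# gauss_kl: closed-form Gaussian KL from the earlier sketch.

def moment_match(weights, means, covs):
    """Collapse a GMM to a single Gaussian with the same mean and covariance."""
    mu = weights @ means                                   # overall mean
    diff = means - mu
    cov = (np.einsum('a,aij->ij', weights, covs)
           + np.einsum('a,ai,aj->ij', weights, diff, diff))
    return mu, cov

def kl_gaussian_approx(f, g):
    """D_gaussian(f||g): KL between the moment-matched Gaussians of f and g."""
    return gauss_kl(*moment_match(*f), *moment_match(*g))
```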

Page 10: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

The Product of Gaussian Approximation

• The likelihood is defined by $L_f(g) \triangleq E_{f(x)}[\log g(x)]$, so that $D(f\|g) = L_f(f) - L_f(g)$.
• The integral of a product of two Gaussian components has a closed form:
$$z_{ab} \triangleq \int f_a(x)\,g_b(x)\,dx = \mathcal{N}\!\left(\mu_a;\ \mu_b,\ \Sigma_a + \Sigma_b\right).$$

Page 11: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

The Product of Gaussian Approximation

• By Jensen's inequality (because $\log$ is a concave function),
$$E_{f_a}[\log g(x)] \le \log E_{f_a}[g(x)] = \log \sum_b \omega_b \int f_a(x)\,g_b(x)\,dx = \log \sum_b \omega_b z_{ab},$$
and likewise for $E_{f_a}[\log f(x)]$.

Page 12: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

The Product of Gaussian Approximation

• Closed-form solution:
$$D_{product}(f\|g) = \sum_a \pi_a \log \frac{\sum_{a'} \pi_{a'} \int f_a(x)\,f_{a'}(x)\,dx}{\sum_b \omega_b \int f_a(x)\,g_b(x)\,dx}.$$
• Satisfies self-similarity, self-identification, and positivity.
• Tends to greatly underestimate $D(f\|g)$, a consequence of Jensen's inequality (see the sketch below).
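A sketch of this approximation under the same `(weights, means, covs)` representation; `product_integral` implements the Gaussian product formula above, and `z_matrix` and `kl_product` are illustrative names.

```python
import numpy as np
from scipy.stats import multivariate_normal

def product_integral(mu_a, cov_a, mu_b, cov_b):
    """z_ab = int N(x; mu_a, cov_a) N(x; mu_b, cov_b) dx = N(mu_a; mu_b, cov_a + cov_b)."""
    return multivariate_normal.pdf(mu_a, mean=mu_b, cov=cov_a + cov_b)

def z_matrix(f, g):
    """All pairwise product integrals between the components of f and g; shape (A, B)."""
    _, mf, cf = f
    _, mg, cg = g
    return np.array([[product_integral(ma, ca, mb, cb) for mb, cb in zip(mg, cg)]
                     for ma, ca in zip(mf, cf)])

def kl_product(f, g):
    """D_product(f||g) = sum_a pi_a log( sum_a' pi_a' z_aa'(f,f) / sum_b w_b z_ab(f,g) )."""
    pi = f[0]
    num = z_matrix(f, f) @ pi      # sum_a' pi_a' * int f_a f_a'
    den = z_matrix(f, g) @ g[0]    # sum_b  w_b   * int f_a g_b
    return pi @ np.log(num / den)
```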

Page 13: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

The Matched Bound Approximation
• If $f$ and $g$ have the same number of components, the log-sum inequality gives an upper bound:
$$
\begin{aligned}
D(f\|g) &\le \int \sum_i \pi_i f_i \log\frac{\pi_i f_i}{\omega_i g_i}\,dx \\
&= \int \sum_i \pi_i f_i \log\frac{\pi_i}{\omega_i}\,dx + \int \sum_i \pi_i f_i \log\frac{f_i}{g_i}\,dx \\
&= \sum_i \pi_i \log\frac{\pi_i}{\omega_i} \int f_i\,dx + \sum_i \pi_i \int f_i \log\frac{f_i}{g_i}\,dx \\
&= D(\pi\|\omega) + \sum_i \pi_i\,D(f_i\|g_i),
\end{aligned}
$$
since each $\int f_i\,dx = 1$; here $D(\pi\|\omega) = \sum_i \pi_i \log(\pi_i/\omega_i)$ is the discrete KLD between the mixture weights. (The first step is the log-sum inequality.)

Page 14: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

The Matched Bound Approximation
• Goldberger's approximate formula: define a match function
$$m(a) = \arg\min_b \left( D(f_a\|g_b) - \log\omega_b \right)$$
and approximate
$$D_{goldberger}(f\|g) = \sum_a \pi_a \left( D(f_a\|g_{m(a)}) + \log\frac{\pi_a}{\omega_{m(a)}} \right).$$
• Satisfies self-similarity, self-identification, and positivity (see the sketch below).
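A sketch of Goldberger's matching rule as reconstructed above, reusing `gauss_kl` from the earlier sketch; `component_kl_matrix` and `kl_goldberger` are illustrative names.

```python
import numpy as np
# gauss_kl: closed-form Gaussian KL from the earlier sketch.

def component_kl_matrix(f, g):
    """D(f_a || g_b) for every pair of components; shape (A, B)."""
    _, mf, cf = f
    _, mg, cg = g
    return np.array([[gauss_kl(ma, ca, mb, cb) for mb, cb in zip(mg, cg)]
                     for ma, ca in zip(mf, cf)])

def kl_goldberger(f, g):
    """Match each f_a to the component g_b minimising D(f_a||g_b) - log w_b,
    then sum the matched terms weighted by pi_a."""
    pi, w = f[0], g[0]
    D = component_kl_matrix(f, g)
    m = np.argmin(D - np.log(w)[None, :], axis=1)          # m(a)
    return pi @ (D[np.arange(len(pi)), m] + np.log(pi / w[m]))
```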

Page 15: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

The Variational Approximation
• Define variational parameters $\phi_{b|a} \ge 0$ such that $\sum_b \phi_{b|a} = 1$. With $L_f(g) \triangleq E_{f(x)}[\log g(x)]$,
$$
L_f(g) = \sum_a \pi_a\, E_{f_a}\!\left[\log \sum_b \phi_{b|a}\,\frac{\omega_b\,g_b(x)}{\phi_{b|a}}\right]
\;\ge\; \sum_a \pi_a \sum_b \phi_{b|a}\, E_{f_a}\!\left[\log \frac{\omega_b\,g_b(x)}{\phi_{b|a}}\right]
\;\triangleq\; \mathcal{L}_f(g,\phi),
$$
a lower bound on $L_f(g)$ by Jensen's inequality.
• We get the best bound by maximizing $\mathcal{L}_f(g,\phi)$ with respect to $\phi$; the maximum is attained at
$$\hat\phi_{b|a} = \frac{\omega_b\,e^{-D(f_a\|g_b)}}{\sum_{b'} \omega_{b'}\,e^{-D(f_a\|g_{b'})}}.$$

Page 16: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

The Variational Approximation
• Bounding $L_f(f)$ in the same way with variational parameters $\psi_{a'|a}$, the best bound is attained at
$$\hat\psi_{a'|a} = \frac{\pi_{a'}\,e^{-D(f_a\|f_{a'})}}{\sum_{a''} \pi_{a''}\,e^{-D(f_a\|f_{a''})}}.$$
• Define
$$
D_{variational}(f\|g) \triangleq \mathcal{L}_f(f,\hat\psi) - \mathcal{L}_f(g,\hat\phi)
= \sum_a \pi_a \log \frac{\sum_{a'} \pi_{a'}\,e^{-D(f_a\|f_{a'})}}{\sum_b \omega_b\,e^{-D(f_a\|g_b)}}.
$$
• For equal numbers of components, if we restrict $\psi$ and $\phi$ to have only one non-zero element for a given $a$, the formula reduces exactly to the chain-rule upper bound given in equation (13).
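A sketch of $D_{variational}$ computed from the pairwise component divergences, reusing `component_kl_matrix` from the Goldberger sketch; `logsumexp` keeps the exponentials numerically stable.

```python
import numpy as np
from scipy.special import logsumexp
# component_kl_matrix: pairwise Gaussian KLs, as in the Goldberger sketch above.

def kl_variational(f, g):
    """D_variational(f||g) = sum_a pi_a * log( sum_a' pi_a' exp(-D(f_a||f_a'))
                                             / sum_b  w_b   exp(-D(f_a||g_b)) )."""
    pi, w = f[0], g[0]
    num = logsumexp(np.log(pi)[None, :] - component_kl_matrix(f, f), axis=1)
    den = logsumexp(np.log(w)[None, :]  - component_kl_matrix(f, g), axis=1)
    return pi @ (num - den)
```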

Page 17: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

The Variational Upper Bound
• Define variational parameters $\phi_{ab} \ge 0$ and $\psi_{ab} \ge 0$ satisfying the constraints $\sum_a \phi_{ab} = \omega_b$ and $\sum_b \psi_{ab} = \pi_a$.
• With this notation we use Jensen's inequality to obtain an upper bound on the KL divergence as follows:
$$D(f\|g) \;\le\; \sum_{a,b} \psi_{ab}\left(\log\frac{\psi_{ab}}{\phi_{ab}} + D(f_a\|g_b)\right).$$

Page 18: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

The Variational Upper Bound
• The bound is tightened by finding the variational parameters $\phi$ and $\psi$ that minimize it.
• The problem is convex in $\phi$ as well as in $\psi$, so we can fix one and optimize for the other, alternating until convergence.
• Since any zero in $\phi$ and $\psi$ is fixed under the iteration, we recommend starting with strictly positive values in every entry (see the sketch below).
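A sketch of one plausible alternating scheme. The bound itself is the expression above; the two closed-form coordinate updates are my reconstruction from the stated convexity (each is the standard Lagrangian solution of its sub-problem), not code quoted from the paper.

```python
import numpy as np
# component_kl_matrix: pairwise Gaussian KLs, as in the Goldberger sketch above.

def kl_variational_upper_bound(f, g, n_iter=20):
    """Minimise  sum_{a,b} psi_ab * ( log(psi_ab/phi_ab) + D(f_a||g_b) )
    subject to  sum_a phi_ab = w_b  and  sum_b psi_ab = pi_a,
    by alternating the two convex sub-problems."""
    pi, w = f[0], g[0]
    D = component_kl_matrix(f, g)                    # (A, B)
    # Strictly positive start: zeros are fixed points of the updates.
    phi = np.outer(pi, w)
    psi = np.outer(pi, w)
    for _ in range(n_iter):
        # Optimal psi for fixed phi: per row a, psi_ab ~ phi_ab * exp(-D_ab), summing to pi_a.
        t = phi * np.exp(-(D - D.min(axis=1, keepdims=True)))   # row shift for stability
        psi = pi[:, None] * t / t.sum(axis=1, keepdims=True)
        # Optimal phi for fixed psi: per column b, phi_ab ~ psi_ab, summing to w_b.
        phi = w[None, :] * psi / psi.sum(axis=0, keepdims=True)
    return np.sum(psi * (np.log(psi / phi) + D))
```

Monitoring the bound value across iterations is an easy way to confirm it decreases monotonically under these updates.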

Page 19: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

Experiments

• The acoustic model consists of a total of 9,998 Gaussians belonging to 826 separate GMMs. The number of Gaussians per GMM varies from 1 to 76; 5 mixtures attained the lower bound of 1.
• The median number of Gaussians per GMM was 9.
• We used all combinations of these 826 GMMs to test the various approximations to the KL divergence.

Page 20: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

Experiments
• Each of the methods was compared to the reference approximation: the Monte Carlo method with one million samples, denoted $D_{MC}(1M)$.

• The vertical axis represents the probability derived from a histogram of the deviations taken across all pairs of GMMs.

Page 21: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

Experiments

Page 22: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

Experiments

Page 23: Approximating The Kullback- Leibler Divergence Between Gaussian Mixture Models ICASSP 2007 John R. Hershey and Peder A. Olsen IBM T. J. Watson Research

Conclusion

• If accuracy is the primary concern, then Monte Carlo sampling is clearly best.
• When computation time is an issue, or when gradients need to be evaluated, the proposed methods may be useful.
• Finally, some of the more popular methods should be avoided, since better alternatives exist.