
Accelerating the EM Algorithm

for Mixture Density Estimation

Homer Walker

Mathematical Sciences Department

Worcester Polytechnic Institute

Joint work with Josh Plasse (WPI/Imperial College).

Research supported in part by DOE Grant DE-SC0004880 and NSF Grant DMS-1337943.

Accelerating the EM Algorithm ICERM Workshop September 4, 2015 Slide 1/18


Mixture Densities

Consider a (finite) mixture density

p(x|Φ) = ∑_{i=1}^m α_i p_i(x|φ_i).

Problem: Estimate Φ = (α_1, …, α_m, φ_1, …, φ_m) using an "unlabeled" sample {x_k}_{k=1}^N on the mixture.

Maximum-Likelihood Estimate (MLE): Determine Φ = arg max L(Φ), where

L(Φ) ≡ ∑_{k=1}^N log p(x_k|Φ).
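As a concrete illustration (not part of the talk), the objective L(Φ) can be evaluated directly; a minimal numpy sketch for the univariate normal case, with illustrative names:

```python
import numpy as np

def mixture_loglik(x, alphas, mus, sigma2s):
    """L(Phi) = sum_k log p(x_k | Phi) for a univariate normal mixture
    with weights alphas, means mus, and variances sigma2s."""
    x = np.asarray(x)[:, None]                       # shape (N, 1)
    comp = (alphas / np.sqrt(2 * np.pi * sigma2s) *
            np.exp(-(x - mus) ** 2 / (2 * sigma2s)))  # shape (N, m)
    return np.log(comp.sum(axis=1)).sum()

# Example: a 3-point sample under a two-component mixture
x = np.array([0.0, 1.0, 2.0])
L = mixture_loglik(x, np.array([0.5, 0.5]),
                   np.array([0.0, 2.0]), np.array([1.0, 1.0]))
# L is the total log-likelihood of the sample
```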


The EM (Expectation-Maximization) Algorithm

The general formulation and name were given in . . .

A. P. Dempster, N. M. Laird, and D. B. Rubin (1977), Maximum likelihood from incomplete data via the EM algorithm, J. Royal Statist. Soc. Ser. B (Methodological), 39, pp. 1–38.

General idea: Determine the next approximate MLE to maximize theexpectation of the complete-data log-likelihood function, given theobserved incomplete data and the current approximate MLE.

Marvelous property: The log-likelihood function increases at eachiteration.


The EM Algorithm for Mixture Densities

For a mixture density, an EM iteration is . . .

α_i^+ = (1/N) ∑_{k=1}^N α_i^c p_i(x_k|φ_i^c) / p(x_k|Φ^c),

φ_i^+ = arg max_{φ_i} ∑_{k=1}^N log p_i(x_k|φ_i) · α_i^c p_i(x_k|φ_i^c) / p(x_k|Φ^c).

For a derivation, convergence analysis, history, etc., see . . .

R. A. Redner and HW (1984), Mixture densities, maximum-likelihood,and the EM algorithm, SIAM Review, 26, 195–239.


Particular Example: Normal (Gaussian) Mixtures

Assume (multivariate) normal densities. For each i, φ_i = (µ_i, Σ_i) and

p_i(x|φ_i) = (2π)^{−n/2} (det Σ_i)^{−1/2} exp(−(x − µ_i)^T Σ_i^{−1} (x − µ_i)/2).

EM iteration: For i = 1, …, m,

α_i^+ = (1/N) ∑_{k=1}^N α_i^c p_i(x_k|φ_i^c) / p(x_k|Φ^c),

µ_i^+ = {∑_{k=1}^N x_k α_i^c p_i(x_k|φ_i^c) / p(x_k|Φ^c)} / {∑_{k=1}^N α_i^c p_i(x_k|φ_i^c) / p(x_k|Φ^c)},

Σ_i^+ = {∑_{k=1}^N (x_k − µ_i^+)(x_k − µ_i^+)^T α_i^c p_i(x_k|φ_i^c) / p(x_k|Φ^c)} / {∑_{k=1}^N α_i^c p_i(x_k|φ_i^c) / p(x_k|Φ^c)}.
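A minimal Python sketch of one such iteration, transcribed directly from the updates above (an illustration, not the talk's MATLAB code; unoptimized):

```python
import numpy as np

def em_step(X, alphas, mus, Sigmas):
    """One EM iteration for an m-component multivariate normal mixture.
    X: (N, n) sample; alphas: (m,); mus: (m, n); Sigmas: (m, n, n)."""
    N, n = X.shape
    m = len(alphas)
    # E-step: dens[k, i] = alpha_i^c p_i(x_k | phi_i^c)
    dens = np.empty((N, m))
    for i in range(m):
        d = X - mus[i]
        Sinv = np.linalg.inv(Sigmas[i])
        quad = np.einsum('kj,jl,kl->k', d, Sinv, d)   # (x-mu)^T Sinv (x-mu)
        dens[:, i] = alphas[i] * np.exp(-0.5 * quad) / \
            np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigmas[i]))
    w = dens / dens.sum(axis=1, keepdims=True)        # divide by p(x_k | Phi^c)
    # M-step: the three updates above
    Nk = w.sum(axis=0)
    alphas_new = Nk / N
    mus_new = (w.T @ X) / Nk[:, None]
    Sigmas_new = np.empty_like(Sigmas)
    for i in range(m):
        d = X - mus_new[i]                            # uses the NEW mean mu_i^+
        Sigmas_new[i] = (w[:, i, None] * d).T @ d / Nk[i]
    return alphas_new, mus_new, Sigmas_new
```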


EM Iterations Demo

A Univariate Normal Mixture.

• p_i(x|φ_i) = (2πσ_i^2)^{−1/2} exp(−(x − µ_i)^2/(2σ_i^2)) for i = 1, …, 5.
• Sample of 100,000 observations:
  — [α_1, …, α_5] = [.2, .3, .3, .1, .1],
  — [µ_1, …, µ_5] = [0, 1, 2, 3, 4],
  — [σ_1^2, …, σ_5^2] = [.2, 2, .5, .1, .1].
• EM iterations on the means:
  µ_i^+ = {∑_{k=1}^N x_k α_i p_i(x_k|φ_i)/p(x_k|Φ)} / {∑_{k=1}^N α_i p_i(x_k|φ_i)/p(x_k|Φ)}.

[Figures: the 5-population mixture density at iteration 0 (EM with no acceleration) and the log residual norm vs. iteration number.]



Anderson Acceleration

Derived from a method of D. G. Anderson, Iterative procedures for nonlinear integralequations, J. Assoc. Comput. Machinery, 12 (1965), 547–560.

Consider a fixed-point iteration x_+ = g(x), g : R^n → R^n.

Anderson Acceleration: Given x_0 and mMax ≥ 1.

Set x_1 = g(x_0).
Iterate: For k = 1, 2, …
  Set m_k = min{mMax, k}.
  Set F_k = (f_{k−m_k}, …, f_k), where f_i = g(x_i) − x_i.
  Solve min_{α ∈ R^{m_k+1}} ‖F_k α‖_2 s.t. ∑_{i=0}^{m_k} α_i = 1.
  Set x_{k+1} = ∑_{i=0}^{m_k} α_i g(x_{k−m_k+i}).
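A compact Python sketch of this algorithm (illustrative, with an assumed interface; the constrained least-squares problem is solved in its standard equivalent unconstrained form using residual differences):

```python
import numpy as np

def anderson(g, x0, m_max=5, tol=1e-10, itmax=100):
    """Anderson acceleration for the fixed-point iteration x = g(x).
    The problem min ||F_k a||_2 s.t. sum(a) = 1 is recast as an
    unconstrained least-squares problem in the residual differences."""
    x = np.asarray(x0, dtype=float)
    gs, fs = [], []                  # histories of g(x_i) and f_i = g(x_i) - x_i
    for k in range(itmax):
        gx = g(x)
        f = gx - x
        gs.append(gx)
        fs.append(f)
        if np.linalg.norm(f) < tol:  # converged: g(x) ~= x
            return gx
        mk = min(m_max, k)
        if mk == 0:
            x = gx                   # first step: plain fixed-point update
            continue
        F = np.column_stack(fs[-(mk + 1):])
        dF = F[:, 1:] - F[:, :-1]    # columns f_{i+1} - f_i
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
        G = np.column_stack(gs[-(mk + 1):])
        x = gx - (G[:, 1:] - G[:, :-1]) @ gamma  # = sum_i a_i g(x_{k-mk+i})
    return x

# Usage: the fixed point of cos, x ~ 0.739085
x_star = anderson(np.cos, np.array([1.0]))
```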


EM Iterations Demo (cont.)

• p_i(x|φ_i) = (2πσ_i^2)^{−1/2} exp(−(x − µ_i)^2/(2σ_i^2)) for i = 1, …, 5.
• Sample of 100,000 observations:
  — [α_1, …, α_5] = [.2, .3, .3, .1, .1],
  — [µ_1, …, µ_5] = [0, 1, 2, 3, 4],
  — [σ_1^2, …, σ_5^2] = [.2, 2, .5, .1, .1].
• EM iterations on the means:
  µ_i^+ = {∑_{k=1}^N x_k α_i p_i(x_k|φ_i)/p(x_k|Φ)} / {∑_{k=1}^N α_i p_i(x_k|φ_i)/p(x_k|Φ)}.

[Figures: the mixture density at iteration 0 and the log residual norm vs. iteration number.]



EM Convergence and “Separation”

Redner–W (1984): For mixture densities, the convergence is linear and depends on the "separation" of the component populations:

"well-separated" (fast convergence) if, whenever i ≠ j,

  p_i(x|φ_i*)/p(x|Φ*) · p_j(x|φ_j*)/p(x|Φ*) ≈ 0 for all x ∈ R^n;

"poorly separated" (slow convergence) if, for some i ≠ j,

  p_i(x|φ_i*)/p(x|Φ*) ≈ p_j(x|φ_j*)/p(x|Φ*) for all x ∈ R^n.


Example: EM Convergence and “Separation”

A Univariate Normal Mixture.

• p_i(x|φ_i) = (2πσ_i^2)^{−1/2} exp(−(x − µ_i)^2/(2σ_i^2)) for i = 1, …, 3.
• EM iterations on the means:
  µ_i^+ = {∑_{k=1}^N x_k α_i p_i(x_k|φ_i)/p(x_k|Φ)} / {∑_{k=1}^N α_i p_i(x_k|φ_i)/p(x_k|Φ)}.
• Sample of 100,000 observations.
  — [α_1, α_2, α_3] = [.3, .3, .4], [σ_1^2, σ_2^2, σ_3^2] = [1, 1, 1].
  — [µ_1, µ_2, µ_3] = [0, 2, 4], [0, 1, 2], [0, .5, 1].

[Figure: log residual norms vs. iteration number for the three choices of means.]



Experiments with Multivariate Normal Mixtures

Experiment with Anderson acceleration applied to . . .

EM iteration: For i = 1, …, m,

α_i^+ = (1/N) ∑_{k=1}^N α_i^c p_i(x_k|φ_i^c) / p(x_k|Φ^c),

µ_i^+ = {∑_{k=1}^N x_k α_i^c p_i(x_k|φ_i^c) / p(x_k|Φ^c)} / {∑_{k=1}^N α_i^c p_i(x_k|φ_i^c) / p(x_k|Φ^c)},

Σ_i^+ = {∑_{k=1}^N (x_k − µ_i^+)(x_k − µ_i^+)^T α_i^c p_i(x_k|φ_i^c) / p(x_k|Φ^c)} / {∑_{k=1}^N α_i^c p_i(x_k|φ_i^c) / p(x_k|Φ^c)}.

Assume m is known.

Ultimate interest: very large N.


Experiments with Multivariate Normal Mixtures (cont.)

Two issues:

• Good initial guess? Use K-means.
  — Fast clustering algorithm; usually gives good results.
  — Apply it several times to random subsets of the sample; choose the clustering with the minimal sum of within-class distances.
  — Use the proportions, means, and covariance matrices of the clusters as the initial guess.
• Preserving constraints? Iterate on . . .
  — √α_i, i = 1, …, m;
  — the Cholesky factors of each Σ_i.
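A sketch of this reparameterization in Python (`pack`/`unpack` are hypothetical helper names, not from the talk). Iterating, and accelerating, on the packed vector keeps each α_i nonnegative (squares, then renormalized) and each Σ_i = L_i L_i^T symmetric positive semidefinite:

```python
import numpy as np

def pack(alphas, Ls):
    """Map (alphas, Cholesky factors) to an unconstrained vector:
    store sqrt(alpha_i) and the lower triangle of each L_i."""
    tri = np.tril_indices(Ls[0].shape[0])
    return np.concatenate([np.sqrt(alphas)] + [L[tri] for L in Ls])

def unpack(v, m, n):
    """Invert pack: recover alphas (renormalized to sum to 1) and
    SPD covariances Sigma_i = L_i L_i^T from the vector."""
    r = v[:m] ** 2
    alphas = r / r.sum()             # squares are >= 0; renormalize
    tri = np.tril_indices(n)
    step = n * (n + 1) // 2
    Sigmas, off = [], m
    for i in range(m):
        L = np.zeros((n, n))
        L[tri] = v[off:off + step]
        Sigmas.append(L @ L.T)
        off += step
    return alphas, Sigmas
```

An accelerated step can then be applied to `pack(...)` of the current parameters and mapped back with `unpack`, so intermediate combinations never leave the feasible set.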


Experiments with Generated Data

• All computing in MATLAB.
• Mixtures with m = 5 subpopulations.
• Generated data in R^d for d = 2, 5, 10, 15, 20:
  — For each d, randomly generated 100 "true" parameter sets {α_i, µ_i, Σ_i}_{i=1}^5.
  — For each {α_i, µ_i, Σ_i}_{i=1}^5, randomly generated a sample of size N = 1,000,000.
• Compared (unaccelerated) EM with EM+AA with mMax = 5, 10, 15, 20, 25, 30.
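The talk does not detail its MATLAB generation procedure; one standard way to draw such samples, as a numpy sketch, is to sample component labels first and then draw from the corresponding normals:

```python
import numpy as np

def sample_mixture(N, alphas, mus, Sigmas, rng=None):
    """Draw N observations from a multivariate normal mixture:
    sample labels with probabilities alphas, then sample each point
    from its component's normal distribution."""
    rng = np.random.default_rng() if rng is None else rng
    labels = rng.choice(len(alphas), size=N, p=alphas)
    X = np.empty((N, mus.shape[1]))
    for i in range(len(alphas)):
        idx = labels == i
        X[idx] = rng.multivariate_normal(mus[i], Sigmas[i], size=idx.sum())
    return X
```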


Experiments with Generated Data (cont.)

A look at failures.

mMax      0    5   10   15   20   25   30
*        75   66   52   52   51   51   51
**        0    4   19   23   28   29   29
Totals   75   70   71   75   79   80   80

* ⇒ failure to converge within 300 iterations. ** ⇒ ∑_{k=1}^N α_i p_i(x_k)/p(x_k) = 0 for some i.

There were . . .
— 49 trials in which all methods failed,
— 26 trials in which EM failed and EM+AA succeeded for at least one mMax,
— 15 trials in which EM failed and EM+AA succeeded for all mMax,
— 20 trials in which EM succeeded and EM+AA failed for all mMax,
— 21 trials in which EM succeeded and EM+AA failed for at least one mMax.


Experiments with Generated Data (cont.)

Performance profiles (Dolan–Moré, 2002) for (unaccelerated) EM and EM+AA with mMax = 5 over all trials:

[Figures: performance profiles of iteration numbers (left) and run times (right), comparing mMax = 0 and mMax = 5.]


An Experiment with Real Data

• Remotely sensed data from near Tollhouse, CA. (Thanks to Brett Bader, Digital Globe.)
• N = 3285 × 959 = 3,150,315 observations of 16-dimensional multispectral data.
• Modeled with a mixture of m = 3 multivariate normals.
• Applied (unaccelerated) EM and EM+AA with mMax = 5, 10, 15, 20, 25, 30.


An Experiment with Real Data (cont.)

[Figures: log residual norms vs. iteration number (left); Bayes classification of the data based on the MLE (right).]


In Conclusion . . .

• Anderson acceleration is a promising tool for accelerating the EM algorithm that may improve both robustness and efficiency.
• Future work:
  — Expand the generated-data experiments to include more trials, larger data sets, well-controlled "separation" experiments, "partially labeled" samples, and other parametric PDF forms.
  — Look for more data from real applications.
