Expectation Maximization and Gaussian Mixture Models
![Page 1: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/1.jpg)
Expectation Maximization and Mixture of Gaussians
![Page 2: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/2.jpg)
Recommend me some music!
Discover groups of similar songs…
My Music Collection:
- 君の知らない物語 (bpm 125)
- Bach Sonata #1 (bpm 60)
- Only my railgun (bpm 120)
- bpm 90!
![Page 3: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/3.jpg)
Recommend me some music!
Discover groups of similar songs…
My Music Collection:
- 君の知らない物語 (bpm 125)
- Bach Sonata #1 (bpm 60)
- Only my railgun (bpm 120)
Discovered groups: bpm 60, bpm 120
![Page 4: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/4.jpg)
An unsupervised classification method
![Page 5: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/5.jpg)
1. Initialize K “means” $\mu_k$, one for each class.
   E.g., use random starting points, or choose K random points from the set.
   Example: $K = 2$, giving means $\mu_1$ and $\mu_2$.
![Page 6: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/6.jpg)
2. Phase 1: Assign each point to the closest mean $\mu_k$ (a hard 1/0 assignment).
3. Phase 2: Update the means of the new clusters.
![Page 7: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/7.jpg)
![Page 8: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/8.jpg)
![Page 9: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/9.jpg)
![Page 10: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/10.jpg)
![Page 11: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/11.jpg)
![Page 12: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/12.jpg)
![Page 13: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/13.jpg)
![Page 14: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/14.jpg)
![Page 15: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/15.jpg)
4. When the means do not change anymore, clustering is DONE.
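To make the four steps above concrete, here is a minimal K-means sketch in Python with NumPy. The function name and the (N, D) data layout are our own choices for illustration, not from the slides.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-means. X: (N, D) array. Returns means (K, D) and labels (N,)."""
    rng = np.random.default_rng(seed)
    # 1. Initialize K means by choosing K random points from the set.
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # 2. Phase 1: assign each point to the closest mean.
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (N, K)
        labels = dists.argmin(axis=1)
        # 3. Phase 2: update the means of the new clusters.
        # (For brevity, this sketch assumes no cluster ends up empty.)
        new_mu = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        # 4. Stop when the means do not change anymore.
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, labels
```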
![Page 16: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/16.jpg)
In K-means, a point can have only one class. But what about points that lie in between groups? E.g., Jazz + Classical.
![Page 17: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/17.jpg)
The Famous “GMM”: Gaussian Mixture Model
![Page 18: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/18.jpg)
$p(X) = \mathcal{N}(X \mid \mu, \Sigma)$
Gaussian == “Normal” distribution
$\mu$: mean, $\Sigma$: variance
![Page 19: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/19.jpg)
$p(X) = \mathcal{N}(X \mid \mu, \Sigma) + \mathcal{N}(X \mid \mu, \Sigma)$
![Page 20: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/20.jpg)
$p(X) = \mathcal{N}(X \mid \mu_1, \Sigma_1) + \mathcal{N}(X \mid \mu_2, \Sigma_2)$
Example: two Gaussians, each with its own mean and variance.
![Page 21: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/21.jpg)
$p(X) = \pi_1 \mathcal{N}(X \mid \mu_1, \Sigma_1) + \pi_2 \mathcal{N}(X \mid \mu_2, \Sigma_2)$
Mixing coefficients: $\sum_{k=1}^{K} \pi_k = 1$
Example: $\pi_1 = 0.7$, $\pi_2 = 0.3$
![Page 22: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/22.jpg)
$p(X) = \sum_{k=1}^{K} \pi_k \mathcal{N}(X \mid \mu_k, \Sigma_k)$
Example: $K = 2$
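As an illustration, a minimal sketch of evaluating this mixture density in Python, using the earlier $K = 2$, $\pi_1 = 0.7$, $\pi_2 = 0.3$ example; the 1-D bpm-like means and variances are made-up values, not from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical K = 2 mixture over a 1-D feature (e.g., bpm).
pi = np.array([0.7, 0.3])        # mixing coefficients, sum to 1
mu = np.array([60.0, 120.0])     # means
cov = np.array([25.0, 100.0])    # variances

def gmm_density(x, pi, mu, cov):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(p * multivariate_normal.pdf(x, mean=m, cov=c)
               for p, m, c in zip(pi, mu, cov))

print(gmm_density(90.0, pi, mu, cov))  # density of a bpm-90 song under the mixture
```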
![Page 23: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/23.jpg)
K-means is a classifier.
A Mixture of Gaussians is a probability model.
We can USE it as a “soft” classifier.
![Page 24: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/24.jpg)
![Page 25: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/25.jpg)
K-means is a classifier.
Parameter to fit to data:
- Mean $\mu_k$

A Mixture of Gaussians is a probability model; we can USE it as a “soft” classifier.
Parameters to fit to data:
- Mean $\mu_k$
- Covariance $\Sigma_k$
- Mixing coefficient $\pi_k$
![Page 26: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/26.jpg)
EM for GMM
![Page 27: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/27.jpg)
1. Initialize means $\mu_k$.
2. E Step: Assign each point to a cluster (a hard 1/0 assignment).
3. M Step: Given clusters, refine the mean $\mu_k$ of each cluster k.
4. Stop when the change in means is small.
![Page 28: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/28.jpg)
1. Initialize Gaussian* parameters: means $\mu_k$, covariances $\Sigma_k$, and mixing coefficients $\pi_k$.
2. E Step: Assign each point $X_n$ an assignment score $\gamma(z_{nk})$ for each cluster k (a soft assignment, e.g. 0.5 / 0.5).
3. M Step: Given the scores, adjust $\mu_k$, $\Sigma_k$, $\pi_k$ for each cluster k.
4. Evaluate the likelihood. If the likelihood or parameters converge, stop.
*There are K Gaussians.
![Page 29: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/29.jpg)
1. Initialize $\mu_k$, $\Sigma_k$, $\pi_k$, one for each Gaussian k.
Tip! Use the K-means result to initialize:
$\mu_k \leftarrow \mu_k$ (the K-means mean)
$\Sigma_k \leftarrow \mathrm{cov}(\mathrm{cluster}(k))$
$\pi_k \leftarrow \frac{\text{number of points in cluster } k}{\text{total number of points}}$
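A sketch of this initialization tip, reusing the kmeans() helper sketched earlier; the small ridge added to each covariance is our own guard against singular matrices, not part of the slides.

```python
def init_from_kmeans(X, K):
    """Initialize GMM parameters from a K-means result."""
    mu, labels = kmeans(X, K)                      # helper sketched earlier
    # Sigma_k <- covariance of the points in cluster k (plus a tiny ridge).
    cov = np.array([np.cov(X[labels == k], rowvar=False) + 1e-6 * np.eye(X.shape[1])
                    for k in range(K)])
    # pi_k <- number of points in cluster k / total number of points.
    pi = np.array([(labels == k).mean() for k in range(K)])
    return mu, cov, pi
```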
![Page 30: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/30.jpg)
2. E Step: For each point $X_n$, determine its assignment score to each Gaussian k:
$\gamma(z_{nk}) = \frac{\pi_k \mathcal{N}(X_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(X_n \mid \mu_j, \Sigma_j)}$
$\gamma(z_{nk})$ is called a “responsibility”: how much is this Gaussian k responsible for this point $X_n$? ($z_{nk}$ is the latent variable; e.g., scores .7 / .3.)
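A minimal E-step sketch under the same setup; gamma is the (N, K) matrix of responsibilities, each row summing to 1.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, mu, cov, pi):
    """gamma(z_nk): how responsible Gaussian k is for point X_n."""
    K = len(pi)
    # Numerator: pi_k * N(X_n | mu_k, Sigma_k) for every point and cluster.
    gamma = np.column_stack([pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=cov[k])
                             for k in range(K)])
    # Normalize so each point's scores over the K Gaussians sum to 1.
    return gamma / gamma.sum(axis=1, keepdims=True)
```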
![Page 31: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/31.jpg)
3. M Step: For each Gaussian k, update the parameters using the new $\gamma(z_{nk})$.
Mean of Gaussian k:
$\mu_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, X_n$, where $N_k = \sum_{n=1}^{N} \gamma(z_{nk})$
Find the mean that “fits” the assignment scores best.
![Page 32: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/32.jpg)
3. M Step: For each Gaussian k, update the parameters using the new $\gamma(z_{nk})$.
Covariance matrix of Gaussian k:
$\Sigma_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (X_n - \mu_k^{\text{new}})(X_n - \mu_k^{\text{new}})^{T}$
(We just calculated $\mu_k^{\text{new}}$!)
![Page 33: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/33.jpg)
3. M Step: For each Gaussian k, update the parameters using the new $\gamma(z_{nk})$.
Mixing coefficient of Gaussian k:
$\pi_k^{\text{new}} = \frac{N_k}{N}$, where N is the total number of points. E.g., 105.6 / 200.
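The three updates above, sketched as one M-step function; Nk is the effective number of points assigned to each Gaussian, and the ridge term is again our own numerical-stability guard.

```python
def m_step(X, gamma):
    """Refit mu_k, Sigma_k, pi_k from the responsibilities gamma (N, K)."""
    N, K = gamma.shape
    Nk = gamma.sum(axis=0)                   # effective points per Gaussian (e.g., 105.6)
    mu = (gamma.T @ X) / Nk[:, None]         # responsibility-weighted means
    cov = np.empty((K, X.shape[1], X.shape[1]))
    for k in range(K):
        d = X - mu[k]                        # uses the mean we just calculated
        cov[k] = (gamma[:, k, None] * d).T @ d / Nk[k] + 1e-6 * np.eye(X.shape[1])
    pi = Nk / N                              # mixing coefficients, e.g., 105.6 / 200
    return mu, cov, pi
```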
![Page 34: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/34.jpg)
4. Evaluate the log likelihood. If the likelihood or parameters converge, stop. Else go to Step 2 (E step).
The likelihood is the probability that the data X was generated by the parameters you found, i.e., correctness!
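A sketch of this evaluation, using the standard form $\ln p(X) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \mathcal{N}(X_n \mid \mu_k, \Sigma_k)$:

```python
def log_likelihood(X, mu, cov, pi):
    """ln p(X) = sum_n ln( sum_k pi_k * N(X_n | mu_k, Sigma_k) )."""
    dens = sum(pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=cov[k])
               for k in range(len(pi)))
    return np.log(dens).sum()
```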
![Page 35: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/35.jpg)
![Page 36: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/36.jpg)
The general EM algorithm:
1. Initialize parameters $\theta^{\text{old}}$.
2. E Step: Evaluate $p(Z \mid X, \theta^{\text{old}})$.
3. M Step: Evaluate $\theta^{\text{new}} = \arg\max_{\theta} Q(\theta, \theta^{\text{old}})$, where $Q(\theta, \theta^{\text{old}}) = \sum_{Z} p(Z \mid X, \theta^{\text{old}}) \ln p(X, Z \mid \theta)$.
4. Evaluate the log likelihood. If the likelihood or parameters converge, stop. Else set $\theta^{\text{old}} \leftarrow \theta^{\text{new}}$ and go to the E Step.
($X$: observed variables; $Z$: hidden variables; $p(X, Z \mid \theta)$: likelihood.)
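Tying the sketches together into one EM loop for GMMs, assuming the init_from_kmeans, e_step, m_step, and log_likelihood helpers sketched above; the iteration cap and tolerance are arbitrary choices.

```python
def em_gmm(X, K, max_iters=200, tol=1e-6):
    """Full EM loop for a Gaussian mixture, wiring together the sketches above."""
    mu, cov, pi = init_from_kmeans(X, K)     # step 1: initialize (K-means tip)
    prev_ll = -np.inf
    for _ in range(max_iters):
        gamma = e_step(X, mu, cov, pi)       # step 2: E step
        mu, cov, pi = m_step(X, gamma)       # step 3: M step
        ll = log_likelihood(X, mu, cov, pi)  # step 4: evaluate log likelihood
        if abs(ll - prev_ll) < tol:          # stop when the likelihood converges
            break
        prev_ll = ll
    return mu, cov, pi, gamma
```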
![Page 37: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/37.jpg)
- K-means can be formulated as EM
- EM for Gaussian Mixtures
- EM for Bernoulli Mixtures
- EM for Bayesian Linear Regression
![Page 38: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/38.jpg)
“Expectation”: Calculate the fixed, data-dependent parameters of the function Q.
“Maximization”: Once the parameters of Q are known, Q is fully determined, so now we can maximize it.
![Page 39: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/39.jpg)
We learned how to cluster data in an unsupervised manner.
Gaussian Mixture Models are useful for modeling data with “soft” cluster assignments.
Expectation Maximization is a method used when we have a model with latent variables (values we don’t know, but estimate with each step).
![Page 40: Expectation Maximization and Gaussian Mixture Models](https://reader033.vdocuments.net/reader033/viewer/2022051013/54840d43b47959140d8b4abd/html5/thumbnails/40.jpg)
My question: What other applications could use EM? How about EM of GMMs?