13 unsupervised learning - penn engineering › ~cis519 › spring2019 › lectures › 13... ·...
![Page 1: 13 Unsupervised Learning - Penn Engineering › ~cis519 › spring2019 › lectures › 13... · 2019-04-16 · Unsupervised+Learning+ • Supervised+learning+used+labeled+datapairs+](https://reader034.vdocuments.net/reader034/viewer/2022042401/5f102d6b7e708231d447d3d1/html5/thumbnails/1.jpg)
Unsupervised Learning: K-Means & Gaussian Mixture Models
Unsupervised Learning
• Supervised learning used labeled data pairs (x, y) to learn a function f : X → Y
  – But what if we don't have labels?
• No labels = unsupervised learning
• Only some points are labeled = semi-supervised learning
  – Labels may be expensive to obtain, so we only get a few
• Clustering is the unsupervised grouping of data points. It can be used for knowledge discovery.
K-Means Clustering
Some material adapted from slides by Andrew Moore, CMU.
Visit http://www.autonlab.org/tutorials/ for Andrew's repository of Data Mining tutorials.
Clustering Data
K-Means Clustering
K-Means(k, X)
• Randomly choose k cluster center locations (centroids)
• Loop until convergence
  – Assign each point to the cluster of the closest centroid
  – Re-estimate the cluster centroids based on the data assigned to each cluster
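The K-Means(k, X) procedure above can be sketched in plain Python as follows. This is a minimal illustration, not the lecture's own implementation. The function names, the `max_iters` cap, the fixed `seed`, the choice to draw initial centroids from the data points, and the rule of keeping an old centroid when its cluster becomes empty are all assumptions made for the sketch.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(k, X, max_iters=100, seed=0):
    """Cluster the list of points X into k groups.

    Returns (centroids, assignments), where assignments[j] is the index
    of the centroid closest to X[j].
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct data points chosen at random.
    centroids = rng.sample(X, k)
    assignments = None
    for _ in range(max_iters):
        # Assign each point to the cluster of the closest centroid.
        new_assignments = [
            min(range(k), key=lambda i: squared_distance(x, centroids[i]))
            for x in X
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Re-estimate each centroid from the points assigned to it.
        for i in range(k):
            members = [x for x, a in zip(X, assignments) if a == i]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[i] = mean_point(members)
    return centroids, assignments
```

Calling `k_means(2, X)` on a list of coordinate tuples returns the final centroids and one cluster index per point.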
K-Means Animation
Example generated by Andrew Moore using Dan Pelleg’s super-duper fast K-means system: Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases 1999.
K-Means Objective Function
• K-means finds a local optimum of the following objective function:

$$\arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert_2^2$$

where $S = \{S_1, \ldots, S_k\}$ is a partitioning over $X = \{x_1, \ldots, x_n\}$ s.t. $X = \bigcup_{i=1}^{k} S_i$ and $\mu_i = \mathrm{mean}(S_i)$
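To make the objective concrete, the sketch below evaluates the within-cluster sum of squared distances for a given partition. The function name `k_means_objective` and the list-of-lists representation of the partition are assumptions for illustration. Empty clusters are skipped, which is also a choice of this sketch.

```python
def k_means_objective(clusters):
    """Sum over clusters S_i of the squared distances ||x - mu_i||^2.

    `clusters` is a list of clusters; each cluster is a list of points
    (tuples of numbers) and mu_i is the coordinate-wise mean of S_i.
    """
    total = 0.0
    for S_i in clusters:
        if not S_i:
            continue  # an empty cluster contributes nothing
        n = len(S_i)
        mu_i = [sum(coords) / n for coords in zip(*S_i)]
        for x in S_i:
            total += sum((xj - mj) ** 2 for xj, mj in zip(x, mu_i))
    return total
```

Comparing two candidate partitions of the same points with this function shows which one K-means would prefer: the one with the smaller value.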
Problems with K-Means
• Very sensitive to the initial points
  – Do many runs of K-Means, each with different initial centroids
  – Seed the centroids using a better method than randomly choosing the centroids
    • e.g., Farthest-first sampling
• Must manually choose k
  – Learn the optimal k for the clustering
    • Note that this requires a performance measure
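Farthest-first sampling, mentioned above as a seeding method, can be sketched as follows. The slides do not spell out the procedure, so this is one common reading: pick the first centroid at random, then repeatedly add the data point whose distance to its nearest already-chosen centroid is largest. The function names and the `seed` parameter are assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def farthest_first_centroids(k, X, seed=0):
    """Choose k initial centroids from the list of points X."""
    rng = random.Random(seed)
    centroids = [rng.choice(X)]  # the first centroid is a random data point
    # nearest[j] = squared distance from X[j] to its closest chosen centroid
    nearest = [squared_distance(x, centroids[0]) for x in X]
    while len(centroids) < k:
        # Take the point that is farthest from every centroid chosen so far.
        idx = max(range(len(X)), key=lambda j: nearest[j])
        centroids.append(X[idx])
        nearest = [
            min(d, squared_distance(x, X[idx])) for d, x in zip(nearest, X)
        ]
    return centroids
```

The result can replace the random initialization step of K-means. Because it always picks extreme points, this seeding tends to select outliers when they are present.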
Problems with K-Means
• How do you tell it which clustering you want?
  – Constrained clustering techniques (semi-supervised)
    • Same-cluster constraint (must-link)
    • Different-cluster constraint (cannot-link)
[Figure: k = 2]
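As a small illustration of the two constraint types, the sketch below counts how many must-link and cannot-link constraints a given clustering violates. It is only a checker, not a constrained clustering algorithm. The function name and the representation of constraints as pairs of point indices are assumptions.

```python
def count_violations(labels, must_link, cannot_link):
    """Count constraints violated by a clustering.

    labels[j] is the cluster index of point j.
    must_link and cannot_link are lists of (a, b) index pairs.
    """
    # A must-link pair is violated when its points land in different clusters.
    violated_must = sum(1 for a, b in must_link if labels[a] != labels[b])
    # A cannot-link pair is violated when its points share a cluster.
    violated_cannot = sum(1 for a, b in cannot_link if labels[a] == labels[b])
    return violated_must + violated_cannot
```

A count of zero means the clustering agrees with every piece of supervision supplied.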
Gaussian Mixture Models
• Recall the Gaussian distribution:

$$P(x \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d \, |\Sigma|}} \exp\left( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)$$
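The density above can be evaluated without forming $\Sigma^{-1}$ explicitly. The sketch below uses a Cholesky factor $L$ with $L L^\top = \Sigma$, which yields both the determinant and the quadratic form. The function names are assumptions, and the code assumes $\Sigma$ is symmetric positive definite, given as a list of rows.

```python
import math


def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive-definite A."""
    d = len(A)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def gaussian_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) at x, for d-dimensional x and mu."""
    d = len(x)
    L = cholesky(Sigma)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    # Forward substitution solves L z = diff, so that
    # z . z equals (x - mu)^T Sigma^{-1} (x - mu).
    z = []
    for i in range(d):
        z.append((diff[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    quad = sum(zi * zi for zi in z)
    # |Sigma| is the squared product of the diagonal of L.
    det = 1.0
    for i in range(d):
        det *= L[i][i] ** 2
    return math.exp(-0.5 * quad) / math.sqrt((2.0 * math.pi) ** d * det)
```

For a one-dimensional standard normal, `gaussian_pdf([0.0], [0.0], [[1.0]])` evaluates $1/\sqrt{2\pi}$.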
Clustering with Gaussian Mixtures: Slide 13 Copyright © 2001, 2004, Andrew W. Moore
The GMM assumption
• There are k components. The i'th component is called $\omega_i$
• Component $\omega_i$ has an associated mean vector $\mu_i$
• Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\sigma^2 I$
Assume that each datapoint is generated according to the following recipe:
1. Pick a component at random. Choose component i with probability $P(\omega_i)$.
2. Datapoint ~ $N(\mu_i, \sigma^2 I)$
[Figure: component mean $\mu_2$ and a datapoint $x$]
The General GMM assumption
• There are k components. The i'th component is called $\omega_i$
• Component $\omega_i$ has an associated mean vector $\mu_i$
• Each component generates data from a Gaussian with mean $\mu_i$ and covariance matrix $\Sigma_i$
Assume that each datapoint is generated according to the following recipe:
1. Pick a component at random. Choose component i with probability $P(\omega_i)$.
2. Datapoint ~ $N(\mu_i, \Sigma_i)$
[Figure: component means $\mu_1$, $\mu_2$, $\mu_3$]
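The two-step recipe above can be sketched directly as a sampler. This is an illustrative sketch: the function names, the `seed` parameter, and the use of a Cholesky factor to turn standard normal draws into draws from $N(\mu_i, \Sigma_i)$ are assumptions, and each $\Sigma_i$ is assumed symmetric positive definite. The spherical case is obtained by passing $\sigma^2 I$ for every covariance.

```python
import math
import random


def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive-definite A."""
    d = len(A)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def sample_gmm(n, priors, means, covariances, seed=0):
    """Draw n points from a Gaussian mixture.

    priors[i] plays the role of P(w_i); means[i] and covariances[i]
    are the parameters of component i.
    Returns (samples, components), where components[j] is the index of
    the component that generated samples[j].
    """
    rng = random.Random(seed)
    factors = [cholesky(S) for S in covariances]
    k = len(priors)
    samples, components = [], []
    for _ in range(n):
        # Step 1: pick component i with probability priors[i].
        i = rng.choices(range(k), weights=priors)[0]
        # Step 2: draw x ~ N(mu_i, Sigma_i) as mu_i + L z with z ~ N(0, I).
        d = len(means[i])
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        L = factors[i]
        x = tuple(
            means[i][r] + sum(L[r][c] * z[c] for c in range(r + 1))
            for r in range(d)
        )
        samples.append(x)
        components.append(i)
    return samples, components
```

The returned component indices are the hidden labels that a clustering method would try to recover from the samples alone.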
Fitting a Gaussian Mixture Model
(Optional)
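Expectation-Maximization for GMMs
Iterate until convergence. On the t'th iteration let our estimates be

$$\lambda_t = \{ \mu_1(t), \mu_2(t), \ldots, \mu_c(t) \}$$

E-step: Compute “expected” classes of all datapoints for each class

$$P(\omega_i \mid x_k, \lambda_t) = \frac{p(x_k \mid \omega_i, \lambda_t)\, P(\omega_i \mid \lambda_t)}{p(x_k \mid \lambda_t)} = \frac{p(x_k \mid \omega_i, \mu_i(t), \sigma^2 I)\, p_i(t)}{\sum_{j=1}^{c} p(x_k \mid \omega_j, \mu_j(t), \sigma^2 I)\, p_j(t)}$$

(To compute each $p(x_k \mid \omega_i, \mu_i(t), \sigma^2 I)$ term, just evaluate a Gaussian at $x_k$.)

M-step: Estimate µ given our data's class membership distributions

$$\mu_i(t+1) = \frac{\sum_k P(\omega_i \mid x_k, \lambda_t)\, x_k}{\sum_k P(\omega_i \mid x_k, \lambda_t)}$$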
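The E-step and M-step above can be sketched for the spherical case as follows. This sketch assumes that the component priors and the shared variance $\sigma^2$ are known and held fixed, so only the means are re-estimated. The function names and the fixed iteration count are also assumptions. Because every component shares the same $\sigma^2 I$, the Gaussian normalizing constant cancels in the E-step ratio and is omitted.

```python
import math


def e_step(X, means, priors, sigma2):
    """Responsibilities resp[k][i] = P(w_i | x_k, lambda_t)."""
    resp = []
    for x in X:
        # log( p(x | w_i) * p_i ), up to a constant shared by all components
        logs = [
            math.log(p)
            - sum((xj - mj) ** 2 for xj, mj in zip(x, mu)) / (2.0 * sigma2)
            for mu, p in zip(means, priors)
        ]
        top = max(logs)  # subtract the max before exponentiating, for stability
        weights = [math.exp(value - top) for value in logs]
        total = sum(weights)
        resp.append([w / total for w in weights])
    return resp


def m_step(X, resp, old_means):
    """New means: responsibility-weighted averages of the data."""
    d = len(X[0])
    new_means = []
    for i, old in enumerate(old_means):
        mass = sum(r[i] for r in resp)
        if mass == 0.0:
            new_means.append(old)  # assumption: keep a mean that has no support
            continue
        new_means.append(
            tuple(sum(r[i] * x[j] for r, x in zip(resp, X)) / mass for j in range(d))
        )
    return new_means


def em_means(X, means, priors, sigma2, iters=50):
    """Alternate E and M steps for a fixed number of iterations."""
    for _ in range(iters):
        resp = e_step(X, means, priors, sigma2)
        means = m_step(X, resp, means)
    return means
```

Unlike the hard assignments in K-means, each point here contributes to every mean in proportion to its responsibility.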
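E.M. for General GMMs
Iterate. On the t'th iteration let our estimates be

$$\lambda_t = \{ \mu_1(t), \mu_2(t), \ldots, \mu_c(t),\ \Sigma_1(t), \Sigma_2(t), \ldots, \Sigma_c(t),\ p_1(t), p_2(t), \ldots, p_c(t) \}$$

$p_i(t)$ is shorthand for estimate of $P(\omega_i)$ on t'th iteration

E-step: Compute “expected” clusters of all datapoints

$$P(\omega_i \mid x_k, \lambda_t) = \frac{p(x_k \mid \omega_i, \lambda_t)\, P(\omega_i \mid \lambda_t)}{p(x_k \mid \lambda_t)} = \frac{p(x_k \mid \omega_i, \mu_i(t), \Sigma_i(t))\, p_i(t)}{\sum_{j=1}^{c} p(x_k \mid \omega_j, \mu_j(t), \Sigma_j(t))\, p_j(t)}$$

(To compute each $p(x_k \mid \omega_i, \mu_i(t), \Sigma_i(t))$ term, just evaluate a Gaussian at $x_k$.)

M-step: Estimate µ, Σ given our data's class membership distributions

$$\mu_i(t+1) = \frac{\sum_k P(\omega_i \mid x_k, \lambda_t)\, x_k}{\sum_k P(\omega_i \mid x_k, \lambda_t)}$$

$$\Sigma_i(t+1) = \frac{\sum_k P(\omega_i \mid x_k, \lambda_t)\, [x_k - \mu_i(t+1)]\,[x_k - \mu_i(t+1)]^\top}{\sum_k P(\omega_i \mid x_k, \lambda_t)}$$

$$p_i(t+1) = \frac{\sum_k P(\omega_i \mid x_k, \lambda_t)}{R}, \qquad R = \#\text{records}$$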
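The three M-step updates for the general case can be sketched as one function that takes the responsibilities from the E-step as input. The function name and the data layout (`resp[k][i]` holding $P(\omega_i \mid x_k, \lambda_t)$) are assumptions, and the sketch assumes every component has nonzero total responsibility.

```python
def m_step_general(X, resp):
    """Re-estimate means, covariances and priors from responsibilities.

    X is a list of d-dimensional points; resp[k][i] = P(w_i | x_k, lambda_t).
    Returns (means, covariances, priors).
    Assumes each component's total responsibility is greater than zero.
    """
    R = len(X)        # number of records
    d = len(X[0])     # dimensionality of each point
    c = len(resp[0])  # number of components
    means, covariances, priors = [], [], []
    for i in range(c):
        mass = sum(r[i] for r in resp)
        # mu_i(t+1): responsibility-weighted mean of the data
        mu = [sum(r[i] * x[j] for r, x in zip(resp, X)) / mass for j in range(d)]
        # Sigma_i(t+1): weighted outer products around the new mean
        Sigma = [
            [
                sum(r[i] * (x[a] - mu[a]) * (x[b] - mu[b]) for r, x in zip(resp, X))
                / mass
                for b in range(d)
            ]
            for a in range(d)
        ]
        means.append(mu)
        covariances.append(Sigma)
        priors.append(mass / R)  # p_i(t+1)
    return means, covariances, priors
```

In practice a covariance estimated from very few points can be singular, so implementations often add a small value to the diagonal. That safeguard is not part of the slide's update and is left out here.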
(End optional section)
Gaussian Mixture Example: Start
Advance apologies: in Black and White this example will be incomprehensible
After first iteration
After 2nd iteration
After 3rd iteration
After 4th iteration
After 5th iteration
After 6th iteration
After 20th iteration
Some Bio Assay data
GMM clustering of the assay data
Resulting Density Estimator