clustering. what is cluster analysis k-means adaptive initialization em learning mixture gaussians...
Post on 21-Dec-2015
![Page 1: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/1.jpg)
Clustering
![Page 2: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/2.jpg)
What is Cluster Analysis
k-Means
Adaptive Initialization
EM
Learning Mixture Gaussians
E-step
M-step
k-Means vs Mixture of Gaussians
![Page 3: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/3.jpg)
k-Means Clustering
![Page 4: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/4.jpg)
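Sample
$$\{\vec{x}^{(1)}, \vec{x}^{(2)}, \ldots, \vec{x}^{(k)}, \ldots, \vec{x}^{(n)}\}$$
Feature space
$$\vec{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix} \in \mathbb{R}^d$$
$$\|\vec{x} - \vec{y}\| = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$$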
![Page 5: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/5.jpg)
Norm
$\|x\| \ge 0$, with equality only if $x = 0$
$\|\alpha x\| = |\alpha| \, \|x\|$
$\|x_1 + x_2\| \le \|x_1\| + \|x_2\|$
$l_p$ norm
$$\|\vec{x}\|_p = \left( \sum_{i=1}^{d} |x_i|^p \right)^{1/p}$$
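As a small illustration of the $l_p$ norm formula, the following Python sketch evaluates it for a vector given as a plain list. The function name `lp_norm` is an assumption made for this example and is not part of the slides.

```python
def lp_norm(x, p):
    """Return the l_p norm (sum_i |x_i|^p)^(1/p) of a vector x, for p >= 1."""
    if p < 1:
        raise ValueError("p must be >= 1 for the triangle inequality to hold")
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


x = [3.0, -4.0]
y = [1.0, 2.0]
print("l1 norm of x:", lp_norm(x, 1))
print("l2 norm of x:", lp_norm(x, 2))

# Numerical check of the triangle inequality ||x + y|| <= ||x|| + ||y|| for p = 2.
x_plus_y = [xi + yi for xi, yi in zip(x, y)]
print(lp_norm(x_plus_y, 2) <= lp_norm(x, 2) + lp_norm(y, 2))
```

With p = 2 this is the Euclidean length of the vector; with p = 1 it is the sum of the absolute coordinates.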
![Page 6: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/6.jpg)
Metric
$d(x,y) \ge 0$, with equality only if $x = y$
$d(x,y) = d(y,x)$
$d(x,y) \le d(x,z) + d(z,y)$
$$d_2(\vec{x}, \vec{z}) = \left( \sum_{i=1}^{d} (x_i - z_i)^2 \right)^{1/2}$$
![Page 7: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/7.jpg)
k-means Clustering
Cluster centers $c_1, c_2, \ldots, c_k$ with clusters $C_1, C_2, \ldots, C_k$
$$d_2(\vec{x}, \vec{z}) = \left( \sum_{i=1}^{d} (x_i - z_i)^2 \right)^{1/2}$$
![Page 8: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/8.jpg)
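Error
$$E = \sum_{j=1}^{k} \sum_{x \in C_j} d_2(x, c_j)^2$$
The error function has a local minimum if each cluster center is the centroid (mean) of the points in its cluster:
$$c_j = \frac{1}{|C_j|} \sum_{x \in C_j} x$$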
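The error E can be computed directly from a partition and its centers. The sketch below is only an illustration: the helper names and the toy data are assumptions. It also checks numerically that moving the centers away from the cluster means increases E for a fixed partition.

```python
def squared_distance(x, c):
    """Squared Euclidean distance d2(x, c)^2."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def clustering_error(clusters, centers):
    """E = sum over clusters j and points x in C_j of d2(x, c_j)^2."""
    return sum(squared_distance(x, c)
               for points, c in zip(clusters, centers)
               for x in points)


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


# Toy partition into two clusters (made-up data).
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0), (10.0, 12.0)]]
centroids = [centroid(points) for points in clusters]
shifted = [[c[0] + 0.5, c[1]] for c in centroids]

print("E at the centroids:     ", clustering_error(clusters, centroids))
print("E at the shifted centers:", clustering_error(clusters, shifted))
```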
![Page 9: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/9.jpg)
k-means Example (K=2)
Pick seeds
Reassign clusters
Compute centroids
Reassign clusters
Compute centroids
Reassign clusters
Converged!
![Page 10: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/10.jpg)
Algorithm
Random initialization of k cluster centers
do {
- assign to each $x_i$ in the dataset the nearest cluster center (centroid) $c_j$ according to $d_2$
- compute all new cluster centers
} until ( $|E_{new} - E_{old}| < \epsilon$ or number of iterations reaches max_iterations )
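The loop above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the initial centers are drawn from the data points themselves, an empty cluster keeps its old center, and the names `kmeans`, `eps` and `seed` are chosen for this example.

```python
import random


def d2_squared(x, c):
    """Squared Euclidean distance between point x and center c."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def kmeans(data, k, eps=1e-9, max_iterations=100, seed=0):
    rng = random.Random(seed)
    # Random initialization: pick k distinct data points as centers (assumption).
    centers = [list(p) for p in rng.sample(data, k)]
    clusters = [[] for _ in range(k)]
    e_old = float("inf")
    e_new = e_old
    for _ in range(max_iterations):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: d2_squared(x, centers[j]))
            clusters[nearest].append(x)
        # Recompute each center as the centroid of its cluster.
        for j, points in enumerate(clusters):
            if points:  # an empty cluster keeps its old center (assumption)
                centers[j] = [sum(col) / len(points) for col in zip(*points)]
        # Error E of the current partition; stop when it no longer changes.
        e_new = sum(d2_squared(x, centers[j])
                    for j, points in enumerate(clusters) for x in points)
        if abs(e_new - e_old) < eps:
            break
        e_old = e_new
    return centers, clusters, e_new


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters, error = kmeans(data, k=2)
print(centers)
print(error)
```

On this well-separated toy set every choice of initial centers leads to the same two clusters; on real data different seeds can give different partitions.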
![Page 11: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/11.jpg)
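Adaptive k-means learning (batch mode) for large datasets
Random initialization of cluster centers
do {
- choose $x_i$ from the dataset
- $c_{j^*}$ ← nearest cluster center (centroid) $c_j$ according to $d_2$
- move the nearest center towards $x_i$:
$$\vec{c}_{j^*}^{\,new} = \vec{c}_{j^*}^{\,old} + \frac{1}{|C_{j^*}^{old}| + 1} \left( \vec{x} - \vec{c}_{j^*}^{\,old} \right)$$
} until ( $|E_{new} - E_{old}| < \epsilon$ or number of iterations reaches max_iterations )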
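The incremental update rule can be sketched as below. The example makes several assumptions that are not in the slides: points arrive from an iterable called `stream` instead of being drawn at random, each initial center counts as the first member of its cluster (so the counts start at 1), and the loop simply runs once over the stream instead of testing the change in E.

```python
def d2_squared(x, c):
    """Squared Euclidean distance between point x and center c."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def adaptive_kmeans(stream, initial_centers):
    centers = [list(c) for c in initial_centers]
    # |C_j|: each initial center is treated as the first member (assumption).
    counts = [1] * len(centers)
    for x in stream:
        # Find the nearest center c_j* for the current point.
        j = min(range(len(centers)), key=lambda idx: d2_squared(x, centers[idx]))
        # c_new = c_old + (x - c_old) / (|C_old| + 1)
        rate = 1.0 / (counts[j] + 1)
        centers[j] = [cj + rate * (xi - cj) for xi, cj in zip(x, centers[j])]
        counts[j] += 1
    return centers, counts


centers, counts = adaptive_kmeans([[1.0], [2.0], [9.0]], [[0.0], [10.0]])
print(centers)
print(counts)
```

With this step size each center is the running mean of its initial value and all points assigned to it so far, so only one point has to be held in memory at a time.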
![Page 12: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/12.jpg)
![Page 13: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/13.jpg)
How to choose k?
You have to know your data!
Repeated runs of k-means clustering on the same data can lead to quite different partition results
Why? Because we use random initialization
![Page 14: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/14.jpg)
![Page 15: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/15.jpg)
Adaptive Initialization
Choose a maximum radius within which every data point should have a cluster seed after completion of the initialization phase
In a single sweep, go through the data and assign the cluster seeds according to the chosen radius
A data point becomes a new cluster seed if it is not covered by the spheres with the chosen radius of the other already assigned seeds
K-MAI clustering (Wichert et al. 2003)
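The single sweep described above might look like the following sketch. It follows only the verbal description on this slide and is not taken from the K-MAI publication; the function name `adaptive_seeds` and the toy data are assumptions.

```python
import math


def euclidean(x, y):
    """Euclidean distance d2 between two points."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def adaptive_seeds(data, radius):
    """One pass over the data: a point becomes a seed if no existing seed covers it."""
    seeds = []
    for x in data:
        covered = any(euclidean(x, s) <= radius for s in seeds)
        if not covered:
            seeds.append(x)
    return seeds


data = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1), (0.0, 0.9)]
seeds = adaptive_seeds(data, radius=1.0)
print(seeds)
print("k =", len(seeds))
```

The number of seeds, and therefore k, follows from the chosen radius, and the seeds can then serve as initial centers for k-means. The result depends on the order in which the data is visited.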
![Page 16: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/16.jpg)
EM
Expectation Maximization Clustering
![Page 17: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/17.jpg)
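Sample
$$\{\vec{x}^{(1)}, \vec{x}^{(2)}, \ldots, \vec{x}^{(k)}, \ldots, \vec{x}^{(n)}\}$$
Feature space
$$\vec{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix} \in \mathbb{R}^d$$
Mahalanobis distance
$$\|\vec{x} - \vec{\mu}\|_m = \sqrt{(\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})}$$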
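To make the Mahalanobis distance concrete, here is a small sketch restricted to two dimensions so that the inverse of the covariance matrix can be written out by hand. The restriction to d = 2 and the function name are assumptions for illustration; the sketch takes the square root of the quadratic form, which is the usual convention for a distance.

```python
import math


def mahalanobis_2d(x, mu, cov):
    """Mahalanobis distance sqrt((x - mu)^T cov^-1 (x - mu)) for a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(quad)


identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[4.0, 0.0], [0.0, 1.0]]
print(mahalanobis_2d((3.0, 4.0), (0.0, 0.0), identity))   # same as Euclidean distance
print(mahalanobis_2d((2.0, 0.0), (0.0, 0.0), stretched))  # scaled by the variance along x1
```

With the identity matrix as covariance the Mahalanobis distance reduces to the Euclidean distance; a larger variance along one axis makes displacements along that axis count less.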
![Page 18: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/18.jpg)
Bayes’s rule
After the evidence is obtained: posterior probability P(a|b)
The probability of a given that all we know is b
(Reverend Thomas Bayes 1702-1761)
$$P(b \mid a) = \frac{P(a \mid b) P(b)}{P(a)}$$
![Page 19: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/19.jpg)
Covariance
Measures the tendency of two features $x_i$ and $x_j$ to vary in the same direction
The covariance between features $x_i$ and $x_j$ is estimated for n patterns
$$c_{ij} = \frac{\sum_{k=1}^{n} \left( x_i^{(k)} - m_i \right) \left( x_j^{(k)} - m_j \right)}{n - 1}$$
![Page 20: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/20.jpg)
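$$\Sigma = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1d} \\ c_{21} & c_{22} & \cdots & c_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ c_{d1} & c_{d2} & \cdots & c_{dd} \end{bmatrix}$$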
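The covariance matrix can be estimated entry by entry from n patterns using the formula for $c_{ij}$ with the n - 1 denominator. The sketch below does this with plain lists; the function name and the toy patterns are assumptions.

```python
def covariance_matrix(patterns):
    """Estimate the d x d covariance matrix from n patterns (n >= 2)."""
    n = len(patterns)
    d = len(patterns[0])
    # m_i: mean of feature i over all patterns.
    means = [sum(p[i] for p in patterns) / n for i in range(d)]
    return [[sum((p[i] - means[i]) * (p[j] - means[j]) for p in patterns) / (n - 1)
             for j in range(d)]
            for i in range(d)]


patterns = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
for row in covariance_matrix(patterns):
    print(row)
```

In this toy set the second feature is always twice the first, so both features vary in the same direction and the off-diagonal entries are positive. The matrix is symmetric because $c_{ij} = c_{ji}$.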
![Page 21: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/21.jpg)
Learning Mixture Gaussians
What kind of probability distribution might have generated the data?
Clustering presumes that the data are generated from a mixture distribution P
![Page 22: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/22.jpg)
The Normal Density
Univariate density
Density which is analytically tractable
Continuous density
A lot of processes are asymptotically Gaussian
$$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right]$$
Where:
$\mu$ = mean (or expected value) of x
$\sigma^2$ = expected squared deviation or variance
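The univariate normal density can be evaluated directly from its mean and standard deviation. The following sketch (the name `normal_pdf` is an assumption) also checks with a simple Riemann sum that the density integrates to approximately one.

```python
import math


def normal_pdf(x, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma > 0."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


print(normal_pdf(0.0, 0.0, 1.0))  # height of the standard normal at its mean

# Crude numerical integration over [-8, 8] with step 0.01.
step = 0.01
area = sum(normal_pdf(-8.0 + i * step, 0.0, 1.0) * step for i in range(1600))
print(area)
```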
![Page 23: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/23.jpg)
![Page 24: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/24.jpg)
Example: Mixture of 2 Gaussians
![Page 25: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/25.jpg)
Multivariate density
Multivariate normal density in d dimensions is:
$$P(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu)^t \Sigma^{-1} (x - \mu) \right]$$
where:
$x = (x_1, x_2, \ldots, x_d)^t$ (t stands for the transpose vector form)
$\mu = (\mu_1, \mu_2, \ldots, \mu_d)^t$ mean vector
$\Sigma$ = d*d covariance matrix
$|\Sigma|$ and $\Sigma^{-1}$ are determinant and inverse respectively
![Page 26: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/26.jpg)
![Page 27: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/27.jpg)
Example: Mixture of 3 Gaussians
![Page 28: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/28.jpg)
A mixture distribution has k components, each of which is a distribution in its own right
A data point is generated by first choosing a component and then generating a sample from that component
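The two-stage generative process can be sketched as follows for one-dimensional Gaussian components. The restriction to one dimension, the function name `sample_mixture` and the parameter values are assumptions made for this illustration.

```python
import random


def sample_mixture(weights, means, sigmas, n, seed=0):
    """Draw n samples; returns a list of (component index, value) pairs."""
    rng = random.Random(seed)
    components = range(len(weights))
    samples = []
    for _ in range(n):
        # Step 1: choose a component according to the mixture weights.
        i = rng.choices(components, weights=weights)[0]
        # Step 2: generate a sample from the chosen Gaussian component.
        samples.append((i, rng.gauss(means[i], sigmas[i])))
    return samples


samples = sample_mixture([0.3, 0.7], [-50.0, 50.0], [1.0, 1.0], n=2000, seed=1)
share = sum(1 for i, _ in samples if i == 0) / len(samples)
print("share of samples from component 0:", share)
```

In clustering the component index of each sample is hidden; only the values are observed, and the task is to recover the components from them.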
![Page 29: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/29.jpg)
Let C denote the component with values 1,…,k
Mixture distribution is given by
$$P(x) = \sum_{i=1}^{k} P(C = i) P(x \mid C = i)$$
x refers to the data point
$w_i = P(C = i)$ the weight of each component
$\mu_i$ the mean (vector) of each component
$\Sigma_i$ the covariance (matrix) of each component
$$P(x \mid C = i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) \right]$$
$$\sum_{i=1}^{k} P(C = i) = 1$$
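The mixture density is a weighted sum of the component densities. The sketch below evaluates it for one-dimensional Gaussian components, which is an assumption made to keep the example free of matrix algebra; the function names and parameter values are also illustrative.

```python
import math


def normal_pdf(x, mu, sigma):
    """Univariate normal density, used here as the component density P(x | C = i)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def mixture_pdf(x, weights, means, sigmas):
    """P(x) = sum_i P(C = i) * P(x | C = i); the weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * normal_pdf(x, m, s)
               for w, m, s in zip(weights, means, sigmas))


weights, means, sigmas = [0.3, 0.7], [-2.0, 3.0], [1.0, 0.5]
print(mixture_pdf(0.0, weights, means, sigmas))

# The mixture is itself a density: a Riemann sum over [-12, 12] is close to 1.
step = 0.01
area = sum(mixture_pdf(-12.0 + i * step, weights, means, sigmas) * step
           for i in range(2400))
print(area)
```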
![Page 30: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/30.jpg)
If we knew which component generated each data point, then it would be easy to recover the component Gaussians
We could fit the parameters of a Gaussian to a data set
$$P(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu)^t \Sigma^{-1} (x - \mu) \right]$$
![Page 31: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/31.jpg)
Basic EM idea
Pretend that we know the parameters of the model
Infer the probability that each data point belongs to each component
Refit the components to the data, where each component is fitted to the entire data set
Each point is weighted by the probability that it belongs to that component
![Page 32: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/32.jpg)
Algorithm
We initialize the mixture parameters arbitrarily
E-step (expectation):
Compute the probabilities $p_{ij} = P(C = i \mid x_j)$, the probability that $x_j$ was generated by the component i
By Bayes’ rule $p_{ij} = \alpha P(x_j \mid C = i) P(C = i)$, where $\alpha$ normalizes so that $\sum_{i=1}^{k} p_{ij} = 1$
• $P(x_j \mid C = i)$ is just the probability at $x_j$ of the ith Gaussian
• $P(C = i)$ is just the weight parameter of the ith Gaussian
$$p_i = \sum_{j=1}^{n} p_{ij}$$
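A minimal sketch of the E-step for one-dimensional Gaussian components is given below. The one-dimensional setting, the function names and the toy data are assumptions. The products of weight and component density are normalized for each data point so that the probabilities over the components sum to one.

```python
import math


def normal_pdf(x, mu, sigma):
    """Univariate normal density, playing the role of P(x_j | C = i)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def e_step(data, weights, means, sigmas):
    """Return p[i][j] = P(C = i | x_j) and p_i = sum_j p[i][j]."""
    k = len(weights)
    p = [[0.0] * len(data) for _ in range(k)]
    for j, x in enumerate(data):
        # Bayes' rule numerator: P(x_j | C = i) * P(C = i) for every component.
        unnormalized = [weights[i] * normal_pdf(x, means[i], sigmas[i])
                        for i in range(k)]
        total = sum(unnormalized)  # normalization constant for this point
        for i in range(k):
            p[i][j] = unnormalized[i] / total
    p_sums = [sum(row) for row in p]
    return p, p_sums


data = [0.0, 0.1, 5.0]
p, p_sums = e_step(data, weights=[0.5, 0.5], means=[0.0, 5.0], sigmas=[1.0, 1.0])
for row in p:
    print(row)
print(p_sums)
```

Each $p_i$ can be read as the effective number of data points currently assigned to component i, so the $p_i$ add up to n.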
![Page 33: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/33.jpg)
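M-step (maximization):
$w_i = P(C = i)$
$$\vec{\mu}_i \leftarrow \frac{\sum_{j=1}^{n} p_{ij} \vec{x}_j}{p_i}$$
$$\Sigma_i \leftarrow \frac{\sum_{j=1}^{n} p_{ij} (\vec{x}_j - \vec{\mu}_i)(\vec{x}_j - \vec{\mu}_i)^T}{p_i}$$
$$w_i \leftarrow \frac{p_i}{n}$$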
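The M-step can be sketched for one-dimensional components as follows. Note the assumptions: the sketch uses the standard EM updates, in which the variance is computed around the updated mean and each weight is $p_i$ divided by the number of points n so that the weights sum to one. The function name and the toy responsibilities are made up for illustration.

```python
import math


def m_step(data, p):
    """Refit weights, means and standard deviations from responsibilities p[i][j]."""
    n = len(data)
    weights, means, sigmas = [], [], []
    for row in p:  # one row of responsibilities per component i
        p_i = sum(row)
        # Weighted mean of all data points.
        mu = sum(pij * x for pij, x in zip(row, data)) / p_i
        # Weighted variance around the updated mean.
        var = sum(pij * (x - mu) ** 2 for pij, x in zip(row, data)) / p_i
        weights.append(p_i / n)
        means.append(mu)
        sigmas.append(math.sqrt(var))
    return weights, means, sigmas


data = [0.0, 2.0, 10.0, 14.0]
# Hard 0/1 responsibilities: the first two points belong to component 0.
p = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
weights, means, sigmas = m_step(data, p)
print(weights, means, sigmas)
```

With hard 0/1 responsibilities, as in this example, the mean update is exactly the centroid computation of k-means. Alternating this step with the E-step until the parameters stop changing gives the EM loop.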
![Page 34: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/34.jpg)
![Page 35: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/35.jpg)
![Page 36: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/36.jpg)
![Page 37: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/37.jpg)
![Page 38: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/38.jpg)
![Page 39: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/39.jpg)
![Page 40: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/40.jpg)
![Page 41: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/41.jpg)
Problems
A Gaussian component shrinks so that it covers just a single point
Variance goes to zero, and likelihood will go to infinity
Two components can “merge”, acquiring identical means and variances and sharing their data points
Serious problems, especially in high dimensions
It helps to initialize the parameters with reasonable values
![Page 42: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/42.jpg)
k-Means vs Mixture of Gaussians
Both are iterative algorithms to assign points to clusters
K-Means: minimize
$$E = \sum_{j=1}^{k} \sum_{x \in C_j} d_2(x, c_j)^2$$
MixGaussian: maximize P(x|C=i)
$$P(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu)^t \Sigma^{-1} (x - \mu) \right]$$
Mixture of Gaussians is the more general formulation
Equivalent to k-Means when $\Sigma_i = I$,
$$P(C = i) = \begin{cases} \frac{1}{k} & C = i \\ 0 & \text{else} \end{cases}$$
![Page 43: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/43.jpg)
What is Cluster Analysis
k-Means
Adaptive Initialization
EM
Learning Mixture Gaussians
E-step
M-step
k-Means vs Mixture of Gaussians
![Page 44: Clustering. What is Cluster Analysis k-Means Adaptive Initialization EM Learning Mixture Gaussians E-step M-step k-Means vs Mixture of Gaussians](https://reader035.vdocuments.net/reader035/viewer/2022062216/56649d555503460f94a32646/html5/thumbnails/44.jpg)
Tree Clustering COBWEB