1
Machine Learning, Unit 5, Introduction to Artificial Intelligence, Stanford online course
Made by: Maor Levy, Temple University 2012
5
Choose a fixed number of clusters
Choose cluster centers and point-cluster allocations to minimize error
◦ Can't do this by exhaustive search, because there are too many possible allocations.
Algorithm:
◦ fix cluster centers; allocate points to closest cluster
◦ fix allocation; compute best cluster centers
x could be any set of features for which we can compute a distance (careful about scaling)
K-Means
$$\sum_{i \in \text{clusters}} \;\; \sum_{j \in \text{elements of } i\text{'th cluster}} \left\| x_j - \mu_i \right\|^2$$
* From Marc Pollefeys COMP 256 2003
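The two alternating steps can be sketched in a few lines of NumPy (an illustrative sketch, not code from the course; the simple first-k-points initialization is an assumption, and a real implementation would use k-means++ or several random restarts):

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Minimal Lloyd's algorithm: alternate the two fixes from the slide."""
    # Naive deterministic init (first k points); a real implementation
    # would use k-means++ or random restarts.
    centers = X[:k].astype(float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Fix cluster centers: allocate points to the closest cluster.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Fix allocation: compute the best (mean) cluster centers.
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return centers, labels
```

Each pass can only lower the error above, which is why the alternation converges, though only to a local minimum.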
8
K-means clustering using intensity alone and color alone
[Figure: input image; clusters on intensity; clusters on color]
* From Marc Pollefeys COMP 256 2003
Results of K-Means Clustering:
9
Is an approximation to EM
◦ Model (hypothesis space): mixture of N Gaussians
◦ Latent variables: correspondence of data and Gaussians
We notice:
◦ Given the mixture model, it's easy to calculate the correspondence
◦ Given the correspondence, it's easy to estimate the mixture models
K-Means
10
Data generated from mixture of Gaussians
Latent variables: Correspondence between Data Items and Gaussians
Expectation Maximization: Idea
14
Learning a Gaussian Mixture (with known covariance)

E-Step: estimate the expected value of each latent variable $z_{ij}$ (the probability that point $x_i$ was generated by Gaussian $j$):

$$E[z_{ij}] = \frac{p(x = x_i \mid \mu = \mu_j)}{\sum_{n=1}^{k} p(x = x_i \mid \mu = \mu_n)} = \frac{e^{-\frac{1}{2\sigma^2}(x_i - \mu_j)^2}}{\sum_{n=1}^{k} e^{-\frac{1}{2\sigma^2}(x_i - \mu_n)^2}}$$

M-Step: re-estimate each mean as the $E[z_{ij}]$-weighted average of the data:

$$\mu_j \leftarrow \frac{\sum_{i=1}^{m} E[z_{ij}]\, x_i}{\sum_{i=1}^{m} E[z_{ij}]}$$
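A minimal NumPy sketch of one E/M iteration for a one-dimensional mixture with known, shared variance and equal mixing weights (both simplifying assumptions of this illustration, not code from the course):

```python
import numpy as np

def em_step(x, mu, sigma2):
    """One EM iteration for a 1-D Gaussian mixture with known variance
    and equal mixing weights."""
    # E-step: E[z_ij] is proportional to exp(-(x_i - mu_j)^2 / (2 sigma^2)),
    # normalized over the k Gaussians.
    w = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * sigma2))
    Ez = w / w.sum(axis=1, keepdims=True)
    # M-step: each mean becomes the responsibility-weighted average.
    mu_new = (Ez * x[:, None]).sum(axis=0) / Ez.sum(axis=0)
    return mu_new, Ez
```

Iterating `em_step` until the means stop moving realizes the alternation described above.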
15
Converges! Proof [Neal/Hinton, McLachlan/Krishnan]:
◦ E/M step does not decrease data likelihood
◦ Converges at local minimum or saddle point
But subject to local minima
Expectation Maximization
17
Number of clusters unknown
Suffers (badly) from local minima
Algorithm:
◦ Start a new cluster center if many points are "unexplained"
◦ Kill a cluster center that doesn't contribute
◦ (Use AIC/BIC criterion for all this, if you want to be formal)
Practical EM
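The AIC/BIC idea can be sketched for choosing the number of clusters in 1-D data (a rough illustration with several assumptions not in the slides: hard assignments, a shared variance estimate, quantile initialization, and roughly 2k free parameters per model):

```python
import numpy as np

def fit_means(x, k, n_iter=100):
    """Hard-assignment clustering of 1-D data, quantile-initialized."""
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))
    for _ in range(n_iter):
        labels = np.abs(x[:, None] - mu[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                mu[j] = x[labels == j].mean()
    return mu, labels

def bic(x, k):
    """Rough BIC: spherical Gaussian per cluster with a shared variance;
    ~2k parameters (k means, k-1 weights, 1 variance) penalize extra centers."""
    mu, labels = fit_means(x, k)
    m = len(x)
    sigma2 = max(np.var(x - mu[labels]), 1e-9)     # guard against zero variance
    loglik = -0.5 * m * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * k * np.log(m) - 2 * loglik
```

Picking the k with the lowest BIC keeps a new center only if it improves the likelihood by more than the penalty, which is the formal version of "kill a center that doesn't contribute."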
20
Spectral Clustering: Overview
Data Similarities Block-Detection
* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
21
Block matrices have block eigenvectors:
Near-block matrices have near-block eigenvectors: [Ng et al., NIPS 02]
Eigenvectors and Blocks
    1 1 0 0
    1 1 0 0      → eigensolver →   e1 = (.71, .71, 0, 0)    λ1 = 2, λ2 = 2
    0 0 1 1                        e2 = (0, 0, .71, .71)    λ3 = 0, λ4 = 0
    0 0 1 1

    1   1  .2   0
    1   1   0 -.2   → eigensolver →   e1 = (.71, .69, .14, 0)    λ1 = 2.02, λ2 = 2.02
    .2  0   1   1                     e2 = (0, -.14, .69, .71)   λ3 = -0.02, λ4 = -0.02
    0 -.2   1   1
* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
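The block example above can be checked numerically (a small NumPy illustration, not from the slides; `eigh` is the symmetric eigensolver):

```python
import numpy as np

# The block matrix from the slide.
A = np.array([[1., 1., 0., 0.],
              [1., 1., 0., 0.],
              [0., 0., 1., 1.],
              [0., 0., 1., 1.]])

# eigh handles symmetric matrices; eigenvalues come back in ascending order.
vals, vecs = np.linalg.eigh(A)
# vals ~ [0, 0, 2, 2]; the eigenvalue-2 eigenspace is spanned by one unit
# vector per block, e.g. (.71, .71, 0, 0) and (0, 0, .71, .71).
```

Because the eigenvalue 2 is repeated, the solver may return any orthonormal basis of that eigenspace, but the eigenspace itself (its projector) is still block-structured.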
22
Can put items into blocks by eigenvectors:
Resulting clusters independent of row ordering:
Spectral Space
    1   1  .2   0      e1 = (.71, .69, .14, 0)
    1   1   0 -.2      e2 = (0, -.14, .69, .71)
    .2  0   1   1
    0 -.2   1   1

Same matrix with rows and columns reordered:

    1  .2   1   0      e1 = (.71, .14, .69, 0)
    .2  1   0   1      e2 = (0, .69, -.14, .71)
    1   0   1 -.2
    0   1 -.2   1
* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
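Both claims can be sketched by using row i of the top eigenvectors as the embedding of item i (NumPy illustration; the matrix entries are taken from the slide):

```python
import numpy as np

# Near-block affinity matrix from the slide.
A = np.array([[1., 1., .2, 0.],
              [1., 1., 0., -.2],
              [.2, 0., 1., 1.],
              [0., -.2, 1., 1.]])

def spectral_embedding(M, dims=2):
    """Embed item i as row i of the top-`dims` eigenvectors of M."""
    vals, vecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    return vecs[:, -dims:]

E = spectral_embedding(A)
# Rows 0 and 1 (and rows 2 and 3) sit close together in spectral space.
```

Distances between embedding rows do not depend on how the eigensolver orients a degenerate eigenspace, and permuting the items just permutes the rows, which is why the resulting clusters are independent of row ordering.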
23
The key advantage of spectral clustering is the spectral space representation.
The Spectral Advantage
* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
24
Measuring Affinity
Intensity: $\mathrm{aff}(x, y) = \exp\left(-\frac{1}{2\sigma_i^2}\,\left\| I(x) - I(y) \right\|^2\right)$

Distance: $\mathrm{aff}(x, y) = \exp\left(-\frac{1}{2\sigma_d^2}\,\left\| x - y \right\|^2\right)$

Texture: $\mathrm{aff}(x, y) = \exp\left(-\frac{1}{2\sigma_t^2}\,\left\| c(x) - c(y) \right\|^2\right)$
* From Marc Pollefeys COMP 256 2003
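All three affinities share one exponential template, which can be written once over any feature (a NumPy sketch; the σ values are free parameters chosen by the user, not given in the slides):

```python
import numpy as np

def affinity(fx, fy, sigma):
    """exp(-||f(x) - f(y)||^2 / (2 sigma^2)) for any feature vectors
    f(x), f(y): intensity I, position, or texture filter outputs c."""
    fx, fy = np.asarray(fx, float), np.asarray(fy, float)
    return np.exp(-np.sum((fx - fy) ** 2) / (2 * sigma ** 2))
```

Identical features give affinity 1, and the affinity decays smoothly toward 0 as the features move apart, with σ setting the decay scale.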
29
Fit multivariate Gaussian
Compute eigenvectors of Covariance
Project onto eigenvectors with largest eigenvalues
Linear: Principal Components
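The three steps above map directly onto NumPy (an illustrative sketch, not code from the course; retaining 2 components is an assumption):

```python
import numpy as np

def pca(X, n_components=2):
    """PCA: fit a Gaussian (mean + covariance), eigendecompose, project."""
    Xc = X - X.mean(axis=0)                  # fit the mean: center the data
    C = np.cov(Xc, rowvar=False)             # sample covariance matrix
    vals, vecs = np.linalg.eigh(C)           # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :n_components]    # eigenvectors w/ largest eigenvalues
    return Xc @ top                          # project onto them
```

The retained eigenvectors are the directions of greatest variance, so the projection is the best linear low-dimensional summary of the data in the least-squares sense.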
30
Other examples of unsupervised learning
Mean face (after alignment)
Slide credit: Santiago Serrano
32
Isomap, Locally Linear Embedding
Non-Linear Techniques
Isomap, Science, M. Balasubramanian and E. Schwartz
34
References:
◦ Sebastian Thrun and Peter Norvig, Artificial Intelligence, Stanford University: http://www.stanford.edu/class/cs221/notes/cs221-lecture6-fall11.pdf