Unsupervised Learning: Clustering Rong Jin

Post on 19-Dec-2015


Page 1: Unsupervised Learning: Clustering Rong Jin Outline  Unsupervised learning  K means for clustering  Expectation Maximization algorithm for clustering

Unsupervised Learning: Clustering

Rong Jin


Outline
- Unsupervised learning
- K-means for clustering
- Expectation Maximization algorithm for clustering


Unsupervised vs. Supervised Learning
- Supervised learning
  - Training data: $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$
  - Every training example is labeled
- Unsupervised learning
  - Training data: $D = \{(x_1), (x_2), \ldots, (x_n)\}$
  - No data is labeled
  - We can still discover the structure of the data
- Semi-supervised learning
  - Training data: $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n);\ (x_1), (x_2), \ldots, (x_n)\}$
  - A mixture of labeled and unlabeled data

Can you think of ways to use the unlabeled data to improve prediction?


Unsupervised Learning
- Clustering
- Visualization
- Density Estimation
- Outlier/Novelty Detection
- Data Compression


Clustering/Density Estimation [figure; axis labels: “$$$” and “age”]


Clustering for Visualization


Image Compression

http://www.ece.neu.edu/groups/rpl/kmeans/
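K-means can compress an image by vector quantization. You cluster the pixel colors, store a small palette of k cluster centers, and replace each pixel with the index of its cluster. The NumPy sketch below makes several assumptions: 24-bit RGB pixels, a fixed number of K-means iterations, and a storage estimate that ignores any further entropy coding. The function names are hypothetical, and this is not the code behind the linked demo.

```python
import numpy as np

def nearest(points, centers):
    """Index of the closest center (squared Euclidean distance) for every point."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def compress(pixels, k, n_iter=10, seed=0):
    """Vector-quantize an (N, 3) array of RGB pixels to k colors with plain K-means."""
    rng = np.random.default_rng(seed)
    palette = pixels[rng.choice(len(pixels), k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = nearest(pixels, palette)
        for j in range(k):
            if np.any(labels == j):  # an empty cluster keeps its old color
                palette[j] = pixels[labels == j].mean(axis=0)
    # Recompute labels so they match the final palette.
    return palette, nearest(pixels, palette)

def decompress(palette, labels):
    """Replace every pixel by the color of the cluster it belongs to."""
    return palette[labels]

def compressed_bits(n_pixels, k, bits_per_color=24):
    """Storage: one ceil(log2 k)-bit index per pixel plus the k palette colors."""
    return n_pixels * int(np.ceil(np.log2(k))) + k * bits_per_color
```

For a 64 × 64 image and k = 16, `compressed_bits(64 * 64, 16)` gives 4096 × 4 + 16 × 24 = 16768 bits. The uncompressed image takes 4096 × 24 = 98304 bits.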


K-means for Clustering
- Key for clustering
  - Find cluster centers
  - Determine appropriate clusters (very, very hard)
- K-means
  - Start with a random guess of cluster centers
  - Determine the membership of each data point
  - Adjust the cluster centers

K-means
1. Ask the user how many clusters they'd like (e.g., k = 5).
2. Randomly guess k cluster center locations.
3. Each data point finds out which center it is closest to. (Thus each center “owns” a set of data points.)
4. Each center finds the centroid of the points it owns.

Any computational problem?

Computational complexity: O(N) per iteration, where N is the number of points? More precisely, step 3 compares every point with every center, which costs O(Nkd) per iteration for k centers in d dimensions.
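Steps 1 to 4 can be sketched in NumPy as below; the caller supplies k, as in step 1. The sketch adds three assumptions that are not on the slides:
- The initial centers are k distinct data points.
- Steps 3 and 4 repeat until memberships stop changing or `max_iter` is reached.
- An empty cluster keeps its old center.

The N × k distance table built in step 3 is what makes each iteration cost O(Nkd) for d-dimensional data.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-means on an (N, d) array X; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Step 2: random initial centers (assumption: k distinct data points).
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Step 3: every point finds its closest center via an (N, k) table of
        # squared distances, which is O(N k d) work per iteration.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # memberships unchanged, so the centers would not move
        labels = new_labels
        # Step 4: each center moves to the centroid of the points it owns.
        for j in range(k):
            owned = X[labels == j]
            if len(owned) > 0:  # assumption: an empty cluster keeps its old center
                centers[j] = owned.mean(axis=0)
    return centers, labels
```

For example, on two well-separated Gaussian blobs `kmeans(X, 2)` should place one center near each blob mean.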


Improve K-means
- Group points by region
  - KD tree
  - SR tree
- Key difference
  - Find the closest center for each rectangle
  - Assign all the points within a rectangle to one cluster
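The slide names KD trees and SR trees without giving details. The sketch below assumes a KD tree with median splits along the widest dimension. It uses one simple test to decide when a whole rectangle belongs to a single center. A center is discarded for a rectangle if even its nearest possible distance to the rectangle exceeds another center's farthest possible distance. When only one center survives, every point in the rectangle is assigned to it at once. The function names are hypothetical, and the sketch is not the specific algorithm behind the slides.

```python
import numpy as np

def build_kdtree(points, idx, leaf_size=8):
    """Recursively split point indices at the median of the widest dimension."""
    pts = points[idx]
    node = {"idx": idx, "lo": pts.min(axis=0), "hi": pts.max(axis=0), "kids": None}
    if len(idx) > leaf_size:
        dim = int(np.argmax(node["hi"] - node["lo"]))
        order = idx[np.argsort(pts[:, dim])]
        half = len(order) // 2
        node["kids"] = (build_kdtree(points, order[:half], leaf_size),
                        build_kdtree(points, order[half:], leaf_size))
    return node

def box_dist2(lo, hi, centers):
    """Squared nearest and farthest distance from each center to the box [lo, hi]."""
    dmin = ((centers - np.clip(centers, lo, hi)) ** 2).sum(axis=1)
    far = np.where(np.abs(centers - lo) > np.abs(centers - hi), lo, hi)
    dmax = ((centers - far) ** 2).sum(axis=1)
    return dmin, dmax

def assign(node, points, centers, cand, labels):
    """Write the index of the closest center for every point under `node` into labels."""
    dmin, dmax = box_dist2(node["lo"], node["hi"], centers[cand])
    # Drop centers whose nearest possible distance to the box is larger than some
    # other center's farthest possible distance: they cannot win anywhere inside.
    cand = cand[dmin <= dmax.min()]
    if len(cand) == 1:
        labels[node["idx"]] = cand[0]  # the whole rectangle joins one cluster
    elif node["kids"] is None:
        # Small leaf with several surviving centers: compare them point by point.
        d2 = ((points[node["idx"]][:, None, :] - centers[cand][None, :, :]) ** 2).sum(axis=2)
        labels[node["idx"]] = cand[d2.argmin(axis=1)]
    else:
        for kid in node["kids"]:
            assign(kid, points, centers, cand, labels)
```

The tree depends only on the data, so it can be built once with `build_kdtree(X, np.arange(len(X)))` and reused while the centers move. Each K-means iteration then calls `assign(tree, X, centers, np.arange(k), labels)`. A fuller version would also cache each node's point count and coordinate sum, so the centroid update could be done per rectangle too. This sketch speeds up only the assignment step.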


Gaussian Mixture Model for Clustering
- Assume that the data are generated from a mixture of Gaussian distributions
- For each Gaussian distribution $j$
  - Center: $\mu_j$
  - Variance: $\sigma_j^2$ (ignored, i.e., treated as known)
- For each data point $x_i$
  - Determine its membership $z_{ij}$: $p(\mu_j \mid x_i)$


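Learning a Gaussian Mixture (with known covariance)
- Probability

$$p(x = x_i) = \sum_{\mu_j} p(x = x_i, \mu = \mu_j) = \sum_{\mu_j} p(\mu = \mu_j)\, p(x = x_i \mid \mu = \mu_j) = \sum_{\mu_j} p(\mu = \mu_j) \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\left(-\frac{\|x_i - \mu_j\|^2}{2\sigma^2}\right)$$

  where $d$ is the dimension of $x$ and $\sigma^2$ is the known variance shared by all components.
- Log-likelihood of unlabeled data

$$\sum_i \log p(x = x_i) = \sum_i \log \left[\sum_{\mu_j} p(\mu = \mu_j) \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\left(-\frac{\|x_i - \mu_j\|^2}{2\sigma^2}\right)\right]$$

- Find the optimal parameters $p(\mu_j), \mu_j$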
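The log-likelihood on this slide can be evaluated directly. The sketch below assumes a single known σ shared by all components. It computes the inner sum with the log-sum-exp trick, an implementation choice not taken from the slides, so that exp(−‖x_i − μ_j‖²/2σ²) does not underflow for points far from every center. The function name is hypothetical.

```python
import numpy as np

def gmm_log_likelihood(X, mu, prior, sigma):
    """sum_i log sum_j p(mu_j) N(x_i; mu_j, sigma^2 I).

    X: (m, d) data, mu: (k, d) centers, prior: (k,) mixing weights p(mu_j),
    sigma: known standard deviation shared by every component.
    """
    m, d = X.shape
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # ||x_i - mu_j||^2, shape (m, k)
    # log p(mu_j) + log p(x_i | mu_j) for every pair (i, j)
    log_joint = (np.log(prior)[None, :]
                 - 0.5 * d * np.log(2 * np.pi * sigma ** 2)
                 - sq / (2 * sigma ** 2))
    # Log-sum-exp over components: subtract the row maximum before exponentiating.
    top = log_joint.max(axis=1, keepdims=True)
    per_point = top[:, 0] + np.log(np.exp(log_joint - top).sum(axis=1))
    return float(per_point.sum())
```

As a check, one point at the origin with one unit-variance component at the origin gives −½ log(2π).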


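Learning a Gaussian Mixture (with known covariance)
- E-Step

$$E[z_{ij}] = \frac{p(x = x_i \mid \mu = \mu_j)\, p(\mu = \mu_j)}{\sum_{n=1}^{k} p(x = x_i \mid \mu = \mu_n)\, p(\mu = \mu_n)} = \frac{\exp\left(-\frac{1}{2\sigma^2}\|x_i - \mu_j\|^2\right) p(\mu_j)}{\sum_{n=1}^{k} \exp\left(-\frac{1}{2\sigma^2}\|x_i - \mu_n\|^2\right) p(\mu_n)}$$

- M-Step

$$\mu_j = \frac{\sum_{i=1}^{m} E[z_{ij}]\, x_i}{\sum_{i=1}^{m} E[z_{ij}]}, \qquad p(\mu_j) = \frac{1}{m} \sum_{i=1}^{m} E[z_{ij}]$$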
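The E- and M-steps on this slide translate almost line for line into array code. The sketch below rests on these assumptions:
- A single known σ is shared by all components, as in "with known covariance".
- The initial centers are random data points unless `mu_init` is given, and the initial weights p(μ_j) are uniform.
- The loop runs a fixed number of iterations instead of a convergence test.

The function name is hypothetical. The factor (2πσ²)^(−d/2) cancels in the E-step ratio, so the code omits it, just as the second form of the E-step does.

```python
import numpy as np

def em_gmm_known_sigma(X, k, sigma, n_iter=50, mu_init=None, seed=0):
    """EM for k spherical Gaussians with a known, shared sigma.

    Learns the centers mu_j and the mixing weights p(mu_j);
    returns (mu, prior, Ez) where Ez[i, j] = E[z_ij] from the last E-step.
    """
    m = X.shape[0]
    if mu_init is None:
        rng = np.random.default_rng(seed)
        mu = X[rng.choice(m, size=k, replace=False)].astype(float)
    else:
        mu = np.array(mu_init, dtype=float)
    prior = np.full(k, 1.0 / k)  # uniform initial p(mu_j)
    for _ in range(n_iter):
        # E-step: E[z_ij] is proportional to exp(-||x_i - mu_j||^2 / (2 sigma^2)) p(mu_j).
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logw = np.log(prior)[None, :] - sq / (2 * sigma ** 2)
        logw -= logw.max(axis=1, keepdims=True)  # stabilise before exponentiating
        Ez = np.exp(logw)
        Ez /= Ez.sum(axis=1, keepdims=True)  # normalise over components n = 1..k
        # M-step: responsibility-weighted means and average responsibilities.
        Nj = Ez.sum(axis=0)  # sum_i E[z_ij]
        keep = Nj > 0  # a component with no mass keeps its old center
        mu[keep] = (Ez.T @ X)[keep] / Nj[keep, None]
        prior = Nj / m
    return mu, prior, Ez
```

On two equal-sized, well-separated blobs, starting from one center near each blob, the centers should settle near the blob means and both weights near 0.5.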

Gaussian Mixture Example [figures: the fitted mixture at the start and after the 1st, 2nd, 3rd, 4th, 5th, 6th and 20th iterations]