
Page 1: Cs221 lecture6-fall11

CS 221: Artificial Intelligence

Lecture 6:

Advanced Machine Learning

Sebastian Thrun and Peter Norvig

Slide credit: Mark Pollefeys, Dan Klein, Chris Manning

Page 2: Cs221 lecture6-fall11

Outline

Clustering: K-Means, EM, Spectral Clustering

Dimensionality Reduction


Page 3: Cs221 lecture6-fall11

The unsupervised learning problem


Many data points, no labels

Page 4: Cs221 lecture6-fall11

Unsupervised Learning?

Google Street View


Page 5: Cs221 lecture6-fall11

K-Means


Many data points, no labels

Page 6: Cs221 lecture6-fall11

K-Means

Choose a fixed number of clusters

Choose cluster centers and point-cluster allocations to minimize error

Can’t do this by exhaustive search, because there are too many possible allocations.

Algorithm: alternate two steps (see the sketch after this slide):

fix cluster centers; allocate each point to its closest cluster

fix the allocation; compute the best cluster centers

x could be any set of features for which we can compute a distance (careful about scaling)

$$\sum_{i \,\in\, \text{clusters}} \;\; \sum_{j \,\in\, \text{elements of } i\text{'th cluster}} \left\| x_j - \mu_i \right\|^2$$

* From Marc Pollefeys COMP 256 2003
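A minimal NumPy sketch of this alternating procedure (not from the slides; the helper name and the random initialization are assumptions):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Alternate the two steps above: assign each point to the nearest
    center, then recompute each center as the mean of its points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init (assumption)
    for _ in range(n_iters):
        # Fix centers: allocate each point to the closest cluster
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Fix allocation: recompute the best (mean) center of each cluster
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Each pass cannot increase the objective above, so the loop stops at a local minimum of it.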

Page 7: Cs221 lecture6-fall11

K-Means

Page 8: Cs221 lecture6-fall11

K-Means

* From Marc Pollefeys COMP 256 2003

Page 9: Cs221 lecture6-fall11

Results of K-Means clustering: segmenting an image using intensity alone and color alone.

[Figure: original image, clusters on intensity, clusters on color]

* From Marc Pollefeys COMP 256 2003

Page 10: Cs221 lecture6-fall11

K-Means

K-Means is an approximation to EM.

Model (hypothesis space): mixture of N Gaussians

Latent variables: correspondence of data and Gaussians

We notice:

Given the mixture model, it’s easy to calculate the correspondence.

Given the correspondence, it’s easy to estimate the mixture model.

Page 11: Cs221 lecture6-fall11

Expectation Maximization: Idea

Data generated from mixture of Gaussians

Latent variables: Correspondence between Data Items and Gaussians

Page 12: Cs221 lecture6-fall11

Generalized K-Means (EM)

Page 13: Cs221 lecture6-fall11

Gaussians


Page 14: Cs221 lecture6-fall11

ML Fitting Gaussians


Page 15: Cs221 lecture6-fall11

Learning a Gaussian Mixture (with known covariance)

E-Step: calculate the expected value of each latent variable $z_{ij}$ (the probability that data point $x_i$ was generated by the $j$-th Gaussian):

$$E[z_{ij}] \;=\; \frac{p(x = x_i \mid \mu = \mu_j)}{\sum_{n=1}^{k} p(x = x_i \mid \mu = \mu_n)} \;=\; \frac{\exp\!\left(-\tfrac{1}{2\sigma^2}(x_i - \mu_j)^2\right)}{\sum_{n=1}^{k} \exp\!\left(-\tfrac{1}{2\sigma^2}(x_i - \mu_n)^2\right)}$$

M-Step: re-estimate each mean $\mu_j$ as the $E[z_{ij}]$-weighted average of the data:

$$\mu_j \;\leftarrow\; \frac{\sum_i E[z_{ij}]\, x_i}{\sum_i E[z_{ij}]}$$
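A small NumPy sketch of these two steps for one-dimensional data, with known shared variance σ² and equal mixing weights (the equal weights are implicit in the ratio above); the function and variable names are illustrative:

```python
import numpy as np

def em_gaussian_mixture(x, k, sigma2=1.0, n_iters=50, seed=0):
    """EM for a mixture of k 1-D Gaussians with known, equal variance sigma2."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)       # initial means (assumption)
    for _ in range(n_iters):
        # E-step: E[z_ij] proportional to exp(-(x_i - mu_j)^2 / (2 sigma^2))
        log_w = -((x[:, None] - mu[None, :]) ** 2) / (2.0 * sigma2)
        w = np.exp(log_w - log_w.max(axis=1, keepdims=True))   # subtract max for stability
        Ez = w / w.sum(axis=1, keepdims=True)       # responsibilities, shape (n, k)
        # M-step: each mean becomes the E[z_ij]-weighted average of the data
        mu = (Ez * x[:, None]).sum(axis=0) / Ez.sum(axis=0)
    return mu, Ez
```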

Page 16: Cs221 lecture6-fall11

Expectation Maximization

Converges! Proof [Neal/Hinton, McLachlan/Krishnan]:

Each E/M step does not decrease the data likelihood.

Converges to a local optimum of the likelihood (or a saddle point).

But subject to local optima.

Page 17: Cs221 lecture6-fall11

EM Clustering: Results

http://www.ece.neu.edu/groups/rpl/kmeans/

Page 18: Cs221 lecture6-fall11

Practical EM

Number of clusters unknown

Suffers (badly) from local minima

Algorithm:

Start a new cluster center if many points are “unexplained”

Kill a cluster center that doesn’t contribute

(Use an AIC/BIC criterion for all this, if you want to be formal)
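To make the AIC/BIC remark concrete, one possible sketch scores a few candidate cluster counts with BIC using scikit-learn's GaussianMixture (the library choice and the candidate range are assumptions, not from the slides):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def pick_k_by_bic(X, k_candidates=range(1, 11), seed=0):
    """Fit a Gaussian mixture for each candidate k and keep the lowest-BIC one
    (BIC trades data likelihood against the number of parameters)."""
    best_k, best_bic = None, np.inf
    for k in k_candidates:
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(X)
        bic = gmm.bic(X)
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k
```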

Page 19: Cs221 lecture6-fall11

Spectral Clustering


Page 20: Cs221 lecture6-fall11

Spectral Clustering


Page 21: Cs221 lecture6-fall11

The Two Spiral Problem


Page 22: Cs221 lecture6-fall11

Spectral Clustering: Overview

Data → Similarities → Block-Detection

* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University

Page 23: Cs221 lecture6-fall11

Eigenvectors and Blocks

Block matrices have block eigenvectors:

$$\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix} \;\xrightarrow{\text{eigensolver}}\; \begin{pmatrix} .71 \\ .71 \\ 0 \\ 0 \end{pmatrix},\; \begin{pmatrix} 0 \\ 0 \\ .71 \\ .71 \end{pmatrix} \qquad \lambda_1 = 2,\ \lambda_2 = 2,\ \lambda_3 = 0,\ \lambda_4 = 0$$

Near-block matrices have near-block eigenvectors: [Ng et al., NIPS 02]

$$\begin{pmatrix} 1 & 1 & .2 & 0 \\ 1 & 1 & 0 & -.2 \\ .2 & 0 & 1 & 1 \\ 0 & -.2 & 1 & 1 \end{pmatrix} \;\xrightarrow{\text{eigensolver}}\; \begin{pmatrix} .71 \\ .69 \\ .14 \\ 0 \end{pmatrix},\; \begin{pmatrix} 0 \\ -.14 \\ .69 \\ .71 \end{pmatrix} \qquad \lambda_1 = 2.02,\ \lambda_2 = 2.02,\ \lambda_3 = -0.02,\ \lambda_4 = -0.02$$

* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
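The numbers above can be checked with NumPy's symmetric eigensolver; a sketch (the two leading eigenvalues are essentially degenerate, so the individual eigenvectors are only defined up to a rotation within that subspace, but distances between rows of the embedding are not affected):

```python
import numpy as np

# Near-block matrix from the slide: 1s within blocks, small +/-0.2 leakage across
A = np.array([[ 1.0,  1.0,  0.2,  0.0],
              [ 1.0,  1.0,  0.0, -0.2],
              [ 0.2,  0.0,  1.0,  1.0],
              [ 0.0, -0.2,  1.0,  1.0]])

vals, vecs = np.linalg.eigh(A)            # symmetric eigensolver, ascending eigenvalues
print(np.round(vals[::-1], 2))            # [ 2.02  2.02 -0.02 -0.02], as on the slide

# Embed each item as its row in the matrix of the top-2 eigenvectors
embedding = vecs[:, -2:]
dists = np.linalg.norm(embedding[:, None, :] - embedding[None, :, :], axis=-1)
print(np.round(dists, 2))                 # items {0,1} and {2,3} land close together
```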

Page 24: Cs221 lecture6-fall11

Spectral Space

Can put items into blocks by eigenvectors:

$$\begin{pmatrix} 1 & 1 & .2 & 0 \\ 1 & 1 & 0 & -.2 \\ .2 & 0 & 1 & 1 \\ 0 & -.2 & 1 & 1 \end{pmatrix} \qquad e_1 = \begin{pmatrix} .71 \\ .69 \\ .14 \\ 0 \end{pmatrix},\quad e_2 = \begin{pmatrix} 0 \\ -.14 \\ .69 \\ .71 \end{pmatrix}$$

Resulting clusters are independent of row ordering:

$$\begin{pmatrix} 1 & .2 & 1 & 0 \\ .2 & 1 & 0 & 1 \\ 1 & 0 & 1 & -.2 \\ 0 & 1 & -.2 & 1 \end{pmatrix} \qquad e_1 = \begin{pmatrix} .71 \\ .14 \\ .69 \\ 0 \end{pmatrix},\quad e_2 = \begin{pmatrix} 0 \\ .69 \\ -.14 \\ .71 \end{pmatrix}$$

* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University

Page 25: Cs221 lecture6-fall11

The Spectral Advantage

The key advantage of spectral clustering is the spectral space representation:

* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University

Page 26: Cs221 lecture6-fall11

Measuring Affinity

Intensity:

$$\text{aff}(x, y) = \exp\!\left(-\frac{1}{2\sigma_i^2}\,\left\| I(x) - I(y) \right\|^2\right)$$

Texture:

$$\text{aff}(x, y) = \exp\!\left(-\frac{1}{2\sigma_t^2}\,\left\| c(x) - c(y) \right\|^2\right)$$

Distance:

$$\text{aff}(x, y) = \exp\!\left(-\frac{1}{2\sigma_d^2}\,\left\| x - y \right\|^2\right)$$

* From Marc Pollefeys COMP 256 2003
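A sketch of how the distance affinity might be computed for a set of points, with σ_d as a hand-chosen scale (the helper name is illustrative); the intensity and texture affinities follow the same pattern with I(x) or c(x) in place of the coordinates:

```python
import numpy as np

def distance_affinity(X, sigma_d=1.0):
    """Gaussian affinity: aff(x, y) = exp(-||x - y||^2 / (2 * sigma_d^2))."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise ||x - y||^2
    return np.exp(-sq_dists / (2.0 * sigma_d ** 2))
```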

Page 27: Cs221 lecture6-fall11

Scale affects affinity

* From Marc Pollefeys COMP 256 2003

Page 28: Cs221 lecture6-fall11

* From Marc Pollefeys COMP 256 2003

Page 29: Cs221 lecture6-fall11

Dimensionality Reduction


Page 30: Cs221 lecture6-fall11

The Space of Digits (in 2D)

M. Brand, MERL

Page 31: Cs221 lecture6-fall11

Dimensionality Reduction with PCA


Page 32: Cs221 lecture6-fall11

Linear: Principal Components

Fit a multivariate Gaussian

Compute eigenvectors of the covariance

Project onto the eigenvectors with the largest eigenvalues
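A minimal NumPy sketch of these three steps (the helper name and the choice to return the projected data are assumptions):

```python
import numpy as np

def pca(X, n_components=2):
    """Project X onto the top eigenvectors of its covariance matrix."""
    Xc = X - X.mean(axis=0)                        # center (the fitted Gaussian's mean)
    cov = np.cov(Xc, rowvar=False)                 # covariance of the fitted Gaussian
    vals, vecs = np.linalg.eigh(cov)               # eigenvectors of the covariance
    top = vecs[:, np.argsort(vals)[::-1][:n_components]]  # largest-eigenvalue directions
    return Xc @ top                                # project onto those eigenvectors
```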

Page 33: Cs221 lecture6-fall11

Other examples of unsupervised learning


Mean face (after alignment)

Slide credit: Santiago Serrano

Page 34: Cs221 lecture6-fall11

Eigenfaces


Slide credit: Santiago Serrano

Page 35: Cs221 lecture6-fall11

Non-Linear Techniques

Isomap, Locally Linear Embedding

Isomap: M. Balasubramanian and E. Schwartz, Science
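Both methods are available in scikit-learn; a usage sketch on a synthetic swiss-roll data set (the library, data set, and parameter values are assumptions, not from the slides):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)   # points on a curved 3-D manifold

# Isomap: preserve geodesic (along-the-manifold) distances
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

# LLE: preserve each point's local linear reconstruction weights
X_lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)
```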

Page 36: Cs221 lecture6-fall11

SCAPE (Drago Anguelov et al.)