
  • 1. CS 221: Artificial Intelligence. Lecture 6: Advanced Machine Learning. Sebastian Thrun and Peter Norvig. Slide credit: Mark Pollefeys, Dan Klein, Chris Manning

2. Outline

  • Clustering
    • K-Means
    • EM
    • Spectral Clustering
  • Dimensionality Reduction

3. The unsupervised learning problem: many data points, no labels
4. Unsupervised Learning?

  • Google Street View

5. K-Means: many data points, no labels
6. K-Means

  • Choose a fixed number of clusters
  • Choose cluster centers and point-cluster allocations to minimize error
  • We can't do this by exhaustive search, because there are too many possible allocations.
  • Algorithm: alternate between two steps (a sketch follows this list)
    • fix the cluster centers; allocate each point to the closest cluster
    • fix the allocations; compute the best cluster centers
  • x could be any set of features for which we can compute a distance (be careful about scaling)

* From Marc Pollefeys, COMP 256, 2003
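A minimal NumPy sketch of this alternation (the data X, the number of clusters k, and the random initialization are illustrative assumptions, not part of the slides):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Alternate between assigning points to the nearest center
    and recomputing each center as the mean of its points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Allocation step: assign each point to the closest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each center as the mean of its points.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Toy usage: two well-separated blobs.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
centers, labels = kmeans(X, k=2)
```

Each pass of the loop performs exactly the two fixes above, and the error (sum of squared distances to the assigned centers) never increases.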

7. K-Means [figure]
8. K-Means [figure] * From Marc Pollefeys, COMP 256, 2003
9. Results of K-Means Clustering: K-means clustering using intensity alone and color alone [figure panels: image, clusters on intensity, clusters on color] * From Marc Pollefeys, COMP 256, 2003
10. K-Means

  • Is an approximation to EM
    • Model (hypothesis space): mixture of N Gaussians
    • Latent variables: correspondence between data points and Gaussians
  • We notice:
    • Given the mixture model, it's easy to calculate the correspondence
    • Given the correspondence, it's easy to estimate the mixture model

11. Expectation Maximization: Idea

  • Data is generated from a mixture of Gaussians
  • Latent variables: correspondence between data items and Gaussians

12. Generalized K-Means (EM)
13. Gaussians
14. ML Fitting Gaussians
15. Learning a Gaussian Mixture (with known covariance): E-Step, M-Step
16. Expectation Maximization

  • Converges!
  • Proof [Neal/Hinton, McLachlan/Krishnan]:
    • Each E/M step does not decrease the data likelihood
    • Converges to a local optimum or saddle point of the likelihood
  • But it is subject to local optima
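Slide 15's E-step/M-step can be made concrete with a short sketch. It assumes spherical Gaussians with a known, shared variance (matching the "known covariance" setting); the data X, k, and sigma2 are illustrative:

```python
import numpy as np

def em_gmm(X, k, sigma2=1.0, n_iters=100, seed=0):
    """EM for a mixture of k spherical Gaussians with known, shared
    variance sigma2: only the means and mixing weights are learned."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)]
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iters):
        # E-step: soft correspondence (responsibility) of each Gaussian for each point.
        sq_dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_resp = np.log(weights) - 0.5 * sq_dists / sigma2
        log_resp -= log_resp.max(axis=1, keepdims=True)   # numerical stability
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate means and mixing weights from the responsibilities.
        nk = resp.sum(axis=0)
        means = (resp.T @ X) / nk[:, None]
        weights = nk / n
    return means, weights, resp

# Toy usage: two clusters; hard assignments behave like K-Means labels.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 4])
means, weights, resp = em_gmm(X, k=2)
labels = resp.argmax(axis=1)
```

In the limit sigma2 → 0 the responsibilities become hard 0/1 assignments and the M-step reduces to computing cluster means, which recovers K-Means (the "generalized K-Means" view of slide 12).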

17. EM Clustering: Results (http://www.ece.neu.edu/groups/rpl/kmeans/)
18. Practical EM

  • The number of clusters is unknown
  • Suffers (badly) from local minima
  • Algorithm:
    • Start a new cluster center if many points are unexplained
    • Kill a cluster center that doesn't contribute
    • (Use an AIC/BIC criterion for all of this if you want to be formal; a sketch follows this list)
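To make the "how many clusters" decision formal, as the last bullet suggests, each candidate model can be scored with an information criterion such as BIC = -2 log L + p ln n (p free parameters, n data points), preferring the lowest score. A minimal sketch; the fitting routine mentioned in the comment is a hypothetical placeholder, not something from the slides:

```python
import numpy as np

def bic(log_likelihood, n_params, n_points):
    """Bayesian Information Criterion: -2 log L + p ln n (lower is better)."""
    return -2.0 * log_likelihood + n_params * np.log(n_points)

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2 p - 2 log L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood

# Hypothetical usage: fit mixtures for several k (e.g. with the EM sketch above),
# compute the maximized log-likelihood of each fit, and keep the k with the
# lowest BIC; for a d-dimensional spherical mixture with known variance,
# the number of free parameters is k*d + (k - 1).
```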

19. Spectral Clustering
20. Spectral Clustering
21. The Two Spiral Problem
22. Spectral Clustering: Overview (Data → Similarities → Block-Detection) * Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group, Stanford University
23. Eigenvectors and Blocks

  • Block matrices have block eigenvectors:
  • Near-block matrices have near-block eigenvectors [Ng et al., NIPS '02]:

[Example: the block matrix with rows (1 1 0 0), (1 1 0 0), (0 0 1 1), (0 0 1 1) has eigenvalues λ1 = 2, λ2 = 2, λ3 = 0, λ4 = 0, with block eigenvectors (.71, .71, 0, 0) and (0, 0, .71, .71); the near-block matrix with rows (1 1 .2 0), (1 1 0 -.2), (.2 0 1 1), (0 -.2 1 1) has eigenvalues λ1 = 2.02, λ2 = 2.02, λ3 = -0.02, λ4 = -0.02, with near-block eigenvectors (.71, .69, .14, 0) and (0, -.14, .69, .71).]
* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group, Stanford University
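A quick numerical check of the example above; NumPy reproduces the eigenvalues quoted on the slide:

```python
import numpy as np

# Block-diagonal affinity matrix: two groups with no cross-affinity.
A_block = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 0],
                    [0, 0, 1, 1],
                    [0, 0, 1, 1]], dtype=float)

# Slightly perturbed ("near-block") version of the same matrix.
A_near = np.array([[1, 1, 0.2, 0.0],
                   [1, 1, 0.0, -0.2],
                   [0.2, 0.0, 1, 1],
                   [0.0, -0.2, 1, 1]])

for A in (A_block, A_near):
    vals, vecs = np.linalg.eigh(A)            # symmetric eigensolver
    order = np.argsort(vals)[::-1]            # sort by decreasing eigenvalue
    print(np.round(vals[order], 2))           # [2. 2. 0. 0.] and [ 2.02  2.02 -0.02 -0.02]
    print(np.round(vecs[:, order[:2]].T, 2))  # a basis for the top eigenspace
                                              # (block indicators, up to rotation within it)
```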

24. Spectral Space

  • Can put items into blocks by their eigenvector coordinates:
  • The resulting clusters are independent of row ordering:

[Figure: items plotted by their coordinates in the spectral space (e1, e2); the near-block matrix and a row-permuted version of it yield the same clusters, with the eigenvector entries permuted accordingly.]
* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group, Stanford University
25. The Spectral Advantage

  • The key advantage of spectral clustering is the spectral space representation (a sketch of the full pipeline follows):

* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group, Stanford University
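A minimal end-to-end sketch of the pipeline on slides 19-25: Gaussian affinities, a normalized affinity matrix in the style of Ng et al. (NIPS '02), the top eigenvectors as the spectral space, and K-Means in that space. The data, sigma, and the simple Lloyd loop are illustrative assumptions:

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, seed=0):
    """Cluster X into k groups: build a Gaussian affinity matrix, embed the
    points with the top eigenvectors of the normalized affinity, then run
    K-Means in that spectral space."""
    n = len(X)
    # Affinity: A[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Symmetric normalization D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1) + 1e-12)
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Spectral space: top-k eigenvectors, rows normalized to unit length.
    vals, vecs = np.linalg.eigh(L)
    Y = vecs[:, -k:]
    Y /= np.linalg.norm(Y, axis=1, keepdims=True) + 1e-12
    # K-Means in spectral space (simple Lloyd iterations).
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(n, size=k, replace=False)]
    for _ in range(100):
        labels = np.linalg.norm(Y[:, None] - centers[None], axis=2).argmin(axis=1)
        centers = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

# Toy usage: two concentric rings, which plain K-Means cannot separate.
t = np.linspace(0, 2 * np.pi, 100)
X = np.vstack([np.c_[np.cos(t), np.sin(t)], 3 * np.c_[np.cos(t), np.sin(t)]])
labels = spectral_clustering(X, k=2, sigma=0.5)
```

On the two-ring data, K-Means in the original coordinates typically splits each ring in half, while the spectral space separates the rings, which is the advantage the slide refers to.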

26. Measuring Affinity (intensity, texture, distance) * From Marc Pollefeys, COMP 256, 2003
27. Scale affects affinity * From Marc Pollefeys, COMP 256, 2003
28. [figure] * From Marc Pollefeys, COMP 256, 2003
29. Dimensionality Reduction
30. The Space of Digits (in 2D) M. Brand, MERL
31. Dimensionality Reduction with PCA
32. Linear: Principal Components

  • Fit a multivariate Gaussian to the data
  • Compute the eigenvectors of its covariance matrix
  • Project onto the eigenvectors with the largest eigenvalues (a sketch follows this list)
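A minimal NumPy sketch of these three steps (the data X and n_components are illustrative):

```python
import numpy as np

def pca(X, n_components):
    """Project X onto the top principal components of its covariance."""
    # Fit the Gaussian: mean and covariance of the data.
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (len(X) - 1)
    # Eigen-decompose the covariance (eigh returns eigenvalues in ascending order).
    vals, vecs = np.linalg.eigh(cov)
    # Keep the eigenvectors with the largest eigenvalues and project onto them.
    components = vecs[:, ::-1][:, :n_components]
    return Xc @ components, components, mean

# Toy usage: reduce 5-D data to 2-D.
X = np.random.randn(200, 5) @ np.random.randn(5, 5)
Z, components, mean = pca(X, n_components=2)
```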

33. Other examples of unsupervised learning: mean face (after alignment). Slide credit: Santiago Serrano
34. Eigenfaces. Slide credit: Santiago Serrano
35. Non-Linear Techniques

  • Isomap (figure from M. Balasubramanian and E. Schwartz, Science; a sketch follows this list)
  • Local Linear Embedding
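A compact sketch of the Isomap idea (k-nearest-neighbor graph, geodesic distances by shortest paths, then classical MDS), using brute-force neighbors and Floyd-Warshall for simplicity; X, n_neighbors, and n_components are illustrative, and the neighborhood graph is assumed to be connected:

```python
import numpy as np

def isomap(X, n_neighbors=8, n_components=2):
    """Embed X by preserving geodesic (graph shortest-path) distances."""
    n = len(X)
    # Pairwise Euclidean distances and a symmetric k-nearest-neighbor graph.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    graph = np.full((n, n), np.inf)
    knn = np.argsort(d, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        graph[i, knn[i]] = d[i, knn[i]]
    graph = np.minimum(graph, graph.T)   # symmetrize the neighbor graph
    np.fill_diagonal(graph, 0.0)
    # Geodesic distances via Floyd-Warshall (O(n^3); fine for a small sketch).
    for m in range(n):
        graph = np.minimum(graph, graph[:, m:m + 1] + graph[m:m + 1, :])
    # Classical MDS: double-center the squared geodesic distances and
    # take the top eigenvectors, scaled by the square roots of the eigenvalues.
    G2 = graph ** 2
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ G2 @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Toy usage: a noisy 1-D curve embedded in 3-D, flattened to 2-D.
t = np.linspace(0, 3 * np.pi, 150)
X = np.c_[np.cos(t), np.sin(t), t] + 0.05 * np.random.randn(150, 3)
Z = isomap(X)
```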

36. Scape (Drago Anguelov et al.)