
Page 1: Machine Learning Dimensionality Reduction

Jeff Howbert, Introduction to Machine Learning, Winter 2012

Machine Learning

Dimensionality Reduction

slides thanks to Xiaoli Fern (CS534, Oregon State Univ., 2011)

Page 2: Machine Learning Dimensionality Reduction

Dimensionality reduction

Many modern data domains involve huge numbers of features / dimensions:

– Documents: thousands of words, millions of bigrams

– Images: thousands to millions of pixels

– Genomics: thousands of genes, millions of DNA polymorphisms

Page 3: Machine Learning Dimensionality Reduction

Why reduce dimensions?

High dimensionality has many costs:

– Redundant and irrelevant features degrade the performance of some ML algorithms.

– Interpretation and visualization become difficult.

– Computation may become infeasible: what if your algorithm scales as O(n³)?

– Curse of dimensionality.


Page 11: Machine Learning Dimensionality Reduction

Repeat until m lines (principal directions) have been found.

Page 12: Machine Learning Dimensionality Reduction

Steps in principal component analysis

Mean-center the data.

Compute the covariance matrix $\Sigma$.

Calculate the eigenvalues and eigenvectors of $\Sigma$:

– The eigenvector with the largest eigenvalue $\lambda_1$ is the 1st principal component (PC).

– The eigenvector with the $k$th largest eigenvalue $\lambda_k$ is the $k$th PC.

– $\lambda_k / \sum_i \lambda_i$ is the proportion of variance captured by the $k$th PC.
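A minimal NumPy sketch of these steps (the toy data, sizes, and variable names are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))           # toy data: 200 samples, 5 features

# 1. Mean-center the data.
X_centered = X - X.mean(axis=0)

# 2. Compute the covariance matrix.
cov = np.cov(X_centered, rowvar=False)  # 5 x 5 symmetric matrix

# 3. Eigendecomposition; eigh suits symmetric matrices but returns
#    eigenvalues in ascending order, so sort descending.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Proportion of variance captured by the kth PC: lambda_k / sum_i lambda_i.
explained = eigvals / eigvals.sum()
print(explained)
```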

Page 13: Machine Learning Dimensionality Reduction

Applying principal component analysis

The full set of PCs comprises a new orthogonal basis for the feature space, whose axes are aligned with the maximum variances of the original data.

Projection of the original data onto the first k PCs gives a reduced-dimensionality representation of the data.

Transforming the reduced-dimensionality projection back into the original space gives a reduced-dimensionality reconstruction of the original data.

The reconstruction will have some error, but it can be small and is often acceptable given the other benefits of dimensionality reduction.
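Continuing the NumPy sketch above (W, k, and the error metric are illustrative choices), projection and reconstruction look like this:

```python
k = 2
W = eigvecs[:, :k]                      # first k PCs as columns (d x k)

Z = X_centered @ W                      # reduced-dimensionality representation
X_rec = Z @ W.T + X.mean(axis=0)        # reconstruction in the original space

# Reconstruction error shrinks as k grows; it is zero when k = d.
mse = np.mean((X - X_rec) ** 2)
print(f"mean squared reconstruction error: {mse:.4f}")
```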

Page 14: Machine Learning Dimensionality Reduction

PCA example

[Figure: original data; mean-centered data with PCs overlaid]

Page 15: Machine Learning Dimensionality Reduction

PCA example

[Figure: original data projected into full PC space; original data reconstructed using only a single PC]


Page 17: Machine Learning Dimensionality Reduction

Choosing the dimension k
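The slide illustrates this with a plot; one common heuristic (an assumption here, not stated in the transcript) is to keep the smallest k whose PCs capture a target fraction of the total variance, continuing the NumPy sketch from the PCA steps above:

```python
# Smallest k capturing at least 95% of total variance (threshold is arbitrary).
cumulative = np.cumsum(explained)
k = int(np.searchsorted(cumulative, 0.95)) + 1
print(f"k = {k} PCs capture {cumulative[k - 1]:.1%} of the variance")
```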


Page 22: Machine Learning Dimensionality Reduction

PCA: a useful preprocessing step

PCA helps reduce computational complexity, and it can help supervised learning (see the pipeline sketch below):

– Reduced dimension → simpler hypothesis space.

– Smaller VC dimension → less risk of overfitting.

PCA can also be seen as noise reduction. Caveats:

– PCA fails when the data consists of multiple separate clusters.

– The directions of greatest variance may not be the most informative (i.e., may not have the greatest classification power).
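A hedged scikit-learn sketch of PCA as a preprocessing step; the dataset, component count, and classifier are placeholder choices, not from the slides:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)     # 64 pixel features per image

# Standardize, project 64 features onto 20 PCs, then fit a linear classifier.
clf = make_pipeline(StandardScaler(),
                    PCA(n_components=20),
                    LogisticRegression(max_iter=1000))
print(cross_val_score(clf, X, y, cv=5).mean())
```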


Page 35: Machine Learning Dimensionality Reduction

Off-the-shelf classifiers

Per Tom Dietterich:

“Methods that can be applied directly to data without requiring a great deal of time-consuming data preprocessing or careful tuning of the learning procedure.”

Page 36: Machine Learning Dimensionality Reduction

Off-the-shelf criteria

slide thanks to Tom Dietterich (CS534, Oregon State Univ., 2005)

Page 37: Machine Learning Dimensionality Reduction

Practical advice on machine learning

from Andrew Ng at Stanford

slides: http://cs229.stanford.edu/materials/ML-advice.pdf

video: http://www.youtube.com/v/sQ8T9b-uGVE (starting at 24:56)