Dimensionality Reduction Part 2: Nonlinear Methods


Page 1: Dimensionality Reduction Part 2: Nonlinear Methods

CS685 : Special Topics in Data Mining, UKY

The UNIVERSITY of KENTUCKY

Dimensionality Reduction Part 2: Nonlinear Methods

Comp 790-090, Spring 2007

Page 2: Dimensionality Reduction Part 2: Nonlinear Methods


Why Dimensionality Reduction

• Two approaches to reduce the number of features
– Feature selection: select the salient features by some criteria
– Feature extraction: obtain a reduced set of features by a transformation of all features

• Data visualization and exploratory data analysis also need to reduce dimension
– Usually reduce to 2D or 3D

Page 3: Dimensionality Reduction Part 2: Nonlinear Methods


Deficiencies of Linear Methods

• Data may not be best summarized by a linear combination of features
– Example: PCA cannot discover the 1-D structure of a helix

(Figure: a 1-D helix curve embedded in 3-D)
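As an illustration (not from the original slides), a minimal numpy/scikit-learn sketch that generates such a helix and runs PCA on it; the variance spreads across components instead of collapsing onto the single intrinsic parameter t.

# Sketch (illustrative): PCA applied to a 1-D helix embedded in 3-D.
import numpy as np
from sklearn.decomposition import PCA

t = np.linspace(0, 4 * np.pi, 500)                             # the single intrinsic parameter
X = np.column_stack([np.cos(t), np.sin(t), t / (4 * np.pi)])   # helix in R^3

pca = PCA(n_components=3).fit(X)
print(pca.explained_variance_ratio_)
# No single component dominates: the 1-D structure of the curve is not
# recoverable by any linear projection.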

Page 4: Dimensionality Reduction Part 2: Nonlinear Methods


Intuition: how does your brain store these pictures?

Page 5: Dimensionality Reduction Part 2: Nonlinear Methods


Brain Representation

Page 6: Dimensionality Reduction Part 2: Nonlinear Methods


Brain Representation

• Every pixel?
• Or perceptually meaningful structure?
– Up-down pose
– Left-right pose
– Lighting direction

So, your brain successfully reduced the high-dimensional inputs to an intrinsically 3-dimensional manifold!

Page 7: Dimensionality Reduction Part 2: Nonlinear Methods


Manifold Learning

• Discover low dimensional representations (smooth manifold) for data in high dimension.

• Linear approaches (PCA, MDS)

• Non-linear approaches (ISOMAP, LLE, others)

Observed: x_i ∈ R^N (the high-dimensional data X); latent: y_i ∈ R^d (the low-dimensional representation Y), with d < N.

Page 8: Dimensionality Reduction Part 2: Nonlinear Methods


Linear Approach: PCA

• PCA finds linear projections of the input data onto a subspace.

Page 9: Dimensionality Reduction Part 2: Nonlinear Methods


Linear Method

• Linear methods for dimensionality reduction
– PCA: rotate the data so that the principal axes lie in the directions of maximum variance
– MDS: find coordinates that best preserve pairwise distances

(Figure: PCA)
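For concreteness, a small numpy sketch (not from the slides) of the PCA recipe just described: center the data, take the covariance, and project onto the eigenvectors of greatest variance.

import numpy as np

def pca(X, d):
    """Project rows of X (n_samples x n_features) onto the top-d principal axes."""
    Xc = X - X.mean(axis=0)                # mean-center the data
    C = np.cov(Xc, rowvar=False)           # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
    top = eigvecs[:, np.argsort(eigvals)[::-1][:d]]
    return Xc @ top                        # coordinates along the principal axes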

Page 10: Dimensionality Reduction Part 2: Nonlinear Methods


Motivation

• Linear dimensionality reduction doesn't always work

• Data violates the underlying "linear" assumptions
– Data is not accurately modeled by "affine" combinations of measurements
– The structure of the data, while apparent, is not simple
– In the end, linear methods do nothing more than "globally transform" (rotate, translate, and scale) all of the data; sometimes what's needed is to "unwrap" the data first

Page 11: Dimensionality Reduction Part 2: Nonlinear Methods


What does PCA Really Model?

• Principal Component Analysis assumptions
• Mean-centered distribution
– What if the mean itself is atypical?
• Eigenvectors of covariance
– Basis vectors aligned with successive directions of greatest variance
• Classic second-order statistical model
– Distribution is characterized by its mean and covariance (Gaussian hyperspheres)

Page 12: Dimensionality Reduction Part 2: Nonlinear Methods


Nonlinear Approaches: Isomap

• Construct the neighborhood graph G
• For each pair of points in G, compute the shortest-path distance, i.e., the geodesic distance
• Use classical MDS with the geodesic distances

(Figure: Euclidean distance vs. geodesic distance)

Josh Tenenbaum, Vin de Silva, John Langford, 2000

Page 13: Dimensionality Reduction Part 2: Nonlinear Methods


Sample points with Swiss Roll

• Altogether there are 20,000 points in the “Swiss roll” data set. We sample 1000 out of 20,000.

Page 14: Dimensionality Reduction Part 2: Nonlinear Methods


Construct neighborhood graph G
• K-nearest neighborhood (K = 7)
• D_G is a 1000 × 1000 matrix of Euclidean distances between neighboring points (figure A)
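A sketch of this step using scikit-learn (the helper names are that library's, not the slides'); it builds the K = 7 neighborhood graph over a 1000-point Swiss-roll sample.

from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

X, _ = make_swiss_roll(n_samples=1000, random_state=0)   # 1000 points sampled in R^3
G = kneighbors_graph(X, n_neighbors=7, mode='distance')  # sparse 1000 x 1000 graph
# G[i, j] holds the Euclidean distance when j is one of i's 7 nearest neighbors;
# absent entries mean "no edge" in the neighborhood graph.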

Page 15: Dimensionality Reduction Part 2: Nonlinear Methods


Compute all-pairs shortest paths in G

Now D_G is a 1000 × 1000 matrix of geodesic distances between arbitrary pairs of points along the manifold (figure B)
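Continuing the sketch above, scipy's shortest-path routine converts the sparse neighborhood graph into the dense matrix of approximate geodesic distances.

from scipy.sparse.csgraph import shortest_path

# G is the sparse k-NN distance graph from the previous sketch.
D_geo = shortest_path(G, method='D', directed=False)  # 'D' = Dijkstra
# D_geo[i, j] approximates the geodesic distance between points i and j,
# measured along the manifold rather than straight through the ambient space.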

Page 16: Dimensionality Reduction Part 2: Nonlinear Methods


Find a d-dimensional Euclidean space Y (figure C) that preserves the pairwise distances.

Use MDS to embed the graph in R^d
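A minimal classical-MDS sketch (the standard double-centering recipe; variable names are illustrative) that turns the geodesic distance matrix into d-dimensional coordinates.

import numpy as np

def classical_mds(D, d=2):
    """Return an n x d embedding whose Euclidean distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:d]    # d largest eigenvalues
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0))

Y = classical_mds(D_geo, d=2)                # 2-D Isomap coordinates (figure C)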

Page 17: Dimensionality Reduction Part 2: Nonlinear Methods


Isomap

• Key Observation:

On a manifold, distances are measured using geodesic distances rather than Euclidean distances.

(Figure: two points with a small Euclidean distance but a large geodesic distance)

Page 18: Dimensionality Reduction Part 2: Nonlinear Methods


Problem: How to Get Geodesics

• Without knowledge of the manifold, it is difficult to compute the geodesic distance between points
• It is difficult even if you know the manifold
• Solution
– Use a discrete approximation of the geodesic
– Apply a graph algorithm to approximate the geodesic distances

Page 19: Dimensionality Reduction Part 2: Nonlinear Methods


Dijkstra’s Algorithm

• Efficient solution to the shortest-path problem; running it from every source gives the all-pairs distances
• Greedy, breadth-first-style algorithm
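A plain-Python sketch of Dijkstra's algorithm (not taken from the slides); Isomap runs it from every source point to fill the geodesic distance matrix.

import heapq

def dijkstra(adj, source):
    """adj: {node: [(neighbor, edge_weight), ...]}. Returns shortest distances from source."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue                         # stale heap entry, skip
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd                 # greedily settle the closest frontier node
                heapq.heappush(heap, (nd, v))
    return dist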

Page 23: Dimensionality Reduction Part 2: Nonlinear Methods


Isomap algorithm (a usage sketch follows this list)

– Compute the neighborhood of points for each item
• Can be k nearest neighbors or an ε-ball
• Neighborhoods must be symmetric
• Test that the resulting graph is fully connected; if not, increase either K or ε

– Calculate pairwise Euclidean distances within each neighborhood

– Use Dijkstra’s algorithm to compute the shortest paths from each point to the non-neighboring points

– Run MDS on the resulting distance matrix
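In practice the whole pipeline above is packaged in scikit-learn; a usage sketch (parameter names are that library's, and the data set mirrors the earlier Swiss-roll example).

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, color = make_swiss_roll(n_samples=1000, random_state=0)
Y = Isomap(n_neighbors=7, n_components=2).fit_transform(X)   # 1000 x 2 embedding
# Internally: k-NN graph, shortest-path geodesic distances, then classical MDS.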

Page 24: Dimensionality Reduction Part 2: Nonlinear Methods


Isomap Results

• Find a 2D embedding of the 3D S-curve (also shown for LLE)
• Isomap does a good job of preserving metric structure (not surprising)
• The affine structure is also well preserved

Page 25: Dimensionality Reduction Part 2: Nonlinear Methods


Residual Fitting Error

Page 26: Dimensionality Reduction Part 2: Nonlinear Methods


Neighborhood Graph

Page 27: Dimensionality Reduction Part 2: Nonlinear Methods


More Isomap Results

Page 28: Dimensionality Reduction Part 2: Nonlinear Methods


More Isomap Results

Page 29: Dimensionality Reduction Part 2: Nonlinear Methods


Isomap Failures

• Isomap also has problems on closed manifolds of arbitrary topology

Page 30: Dimensionality Reduction Part 2: Nonlinear Methods


Local Linear Embeddings

• First Insight
– Locally, at a fine enough scale, everything looks linear

Page 31: Dimensionality Reduction Part 2: Nonlinear Methods


Local Linear Embeddings

• First Insight
– Find an affine combination of the “neighborhood” about a point that best approximates it

Page 32: Dimensionality Reduction Part 2: Nonlinear Methods


Finding a Good Neighborhood

• This is the remaining “art” aspect of nonlinear methods

• Common choices (a sketch of both follows this list)
• ε-ball: find all items that lie within an epsilon ball of the target item, as measured under some metric
– Best if the density of items is high and every point has a sufficient number of neighbors
• K-nearest neighbors: find the K closest neighbors of a point under some metric
– Guarantees all items are similarly represented; limits the dimension to K-1
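A small sketch of the two choices with scikit-learn's graph helpers (the radius value is an arbitrary illustration, not from the slides).

from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

X, _ = make_swiss_roll(n_samples=1000, random_state=0)
G_knn  = kneighbors_graph(X, n_neighbors=10, mode='distance')    # K-nearest neighbors
G_ball = radius_neighbors_graph(X, radius=5.0, mode='distance')  # epsilon-ball
# k-NN guarantees every point gets exactly K neighbors; the epsilon-ball can leave
# sparse regions with too few (or no) neighbors and disconnect the graph.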

Page 33: Dimensionality Reduction Part 2: Nonlinear Methods


Characteristics of a Manifold

Locally it is a linear patch

Key: how to combine all the local patches together?

(Figure: a manifold M in R^n, with local patches around x1 and x2 mapped to coordinate charts in R^2)

Page 34: Dimensionality Reduction Part 2: Nonlinear Methods


LLE: Intuition
• Assumption: the manifold is approximately “linear” when viewed locally, that is, in a small neighborhood
– The approximation error, ε(W), can be made small

• The local neighborhood is enforced by the constraint W_ij = 0 if z_i is not a neighbor of z_j

• A good projection should preserve this local geometric property as much as possible

Page 35: Dimensionality Reduction Part 2: Nonlinear Methods


LLE: Intuition

We expect each data point and its neighbors to lie on or close to a locally linear patch of the manifold.

Each point can be written as a linear combination of its neighbors. The weights are chosen to minimize the reconstruction error.

Page 36: Dimensionality Reduction Part 2: Nonlinear Methods


LLE: Intuition

• The weights that minimize the reconstruction errors are invariant to rotation, rescaling and translation of the data points.

– Invariance to translation is enforced by adding the constraint that the weights sum to one.

– The weights characterize the intrinsic geometric properties of each neighborhood.

• The same weights that reconstruct the data points in D dimensions should reconstruct them on the manifold in d dimensions.
– Local geometry is preserved

Page 37: Dimensionality Reduction Part 2: Nonlinear Methods


LLE: Intuition

Minimize the embedding cost Φ(Y) = Σ_i ‖Y_i − Σ_j W_ij Y_j‖², using the same weights W from the original space; Y is the low-dimensional embedding, and W_i (the i-th row of W) holds point i’s reconstruction weights.

Page 38: Dimensionality Reduction Part 2: Nonlinear Methods


Local Linear Embedding (LLE)
• Assumption: the manifold is approximately “linear” when viewed locally, that is, in a small neighborhood

• The approximation error, ε(W), can be made small

• Meaning of W: a linear representation of every data point by its neighbors
– This is an intrinsic geometric property of the manifold

• A good projection should preserve this geometric property as much as possible

Page 39: Dimensionality Reduction Part 2: Nonlinear Methods


Constrained Least-Squares Problem

Compute the optimal weights for each point individually:

minimize ε(w) = ‖x − Σ_j w_j z_j‖²  subject to  Σ_j w_j = 1,

where the z_j are the neighbors of x, and w_j = 0 for all non-neighbors of x.
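A numpy sketch (illustrative, not the original code) of this constrained least-squares solve for a single point, including the small regularization of the local covariance discussed on the later “Numerical Issues” slide.

import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """x: (D,) data point; neighbors: (k, D) its neighbors. Returns weights summing to 1."""
    Z = neighbors - x                          # shift so x sits at the origin
    C = Z @ Z.T                                # local k x k covariance (Gram) matrix
    C += reg * np.trace(C) * np.eye(len(C))    # regularize in case C is ill-conditioned
    w = np.linalg.solve(C, np.ones(len(C)))    # solve C w = 1
    return w / w.sum()                         # enforce the sum-to-one constraint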

Page 40: Dimensionality Reduction Part 2: Nonlinear Methods


Finding a Map to a Lower Dimensional Space

• Y_i ∈ R^d: the projected vector for X_i

• The geometric property is best preserved if the error Φ(Y) = Σ_i ‖Y_i − Σ_j W_ij Y_j‖² is small, using the same weights W computed above

• Y is given by the eigenvectors of the lowest d non-zero eigenvalues of the matrix M = (I − W)ᵀ(I − W)
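A numpy sketch of this embedding step, under the assumption that the full n x n weight matrix W has already been assembled from the per-point weights above.

import numpy as np

def lle_embedding(W, d=2):
    """W: (n, n) LLE weight matrix, rows sum to 1, zeros for non-neighbors."""
    n = W.shape[0]
    I = np.eye(n)
    M = (I - W).T @ (I - W)                # cost matrix whose smallest eigenvalues matter
    eigvals, eigvecs = np.linalg.eigh(M)   # ascending eigenvalues
    # Skip the first eigenvector (eigenvalue ~ 0, the constant vector);
    # the next d eigenvectors give the embedding coordinates Y.
    return eigvecs[:, 1:d + 1]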

Page 41: Dimensionality Reduction Part 2: Nonlinear Methods


Numerical Issues

• Numerical problems can arise in computing LLEs
• The least-squares covariance matrix that arises in the computation of the weight matrix W can be ill-conditioned
– Remedy: regularization (add a small multiple of the identity to the covariance matrix)
• Finding small singular (eigen)values is not as well conditioned as finding large ones; the small ones are subject to numerical precision errors and tend to get mixed
– Good (but slow) solvers exist; you have to use them

Page 42: Dimensionality Reduction Part 2: Nonlinear Methods


Results

• The resulting parameter vector, yi, gives the coordinates associated with the item xi

• The d-th embedding coordinate is formed from the orthogonal vector associated with the d-th singular value of A.

Page 43: Dimensionality Reduction Part 2: Nonlinear Methods


Reprojection

• Often, for data analysis, a parameterization is enough
• For interpolation and compression we might want to map points from the parameter space back to the “original” space

• No perfect solution, but a few approximations (a sketch of the kernel option follows this list)
– Delaunay-triangulate the points in the embedding space, find the triangle that the desired parameter setting falls into, compute its barycentric coordinates, and use them as weights

– Interpolate using a radially symmetric kernel centered about the desired parameter setting

– Either works, but the mapping might not be one-to-one
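As a rough illustration of the kernel option, a scipy sketch using a radial-basis-function interpolant from embedding coordinates back to the original space; the arrays here are placeholders standing in for a real (X, Y) pair from Isomap or LLE.

import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # placeholder: original high-dimensional points
Y = rng.normal(size=(200, 2))            # placeholder: their 2-D embedding coordinates

back_map = RBFInterpolator(Y, X)         # radially symmetric kernel interpolant
X_back = back_map(Y[:5])                 # map parameter settings back toward the original space
# As noted above, the map is only approximate and need not be one-to-one.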

Page 44: Dimensionality Reduction Part 2: Nonlinear Methods


LLE Example
• 3-D S-curve manifold with points color-coded
• Compute a 2-D embedding
– The local affine structure is well maintained
– The metric structure is okay locally, but can drift slowly over the domain (this causes the manifold to taper)
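This kind of experiment can be reproduced with scikit-learn's LLE on its built-in S-curve (a usage sketch; the neighbor count is an assumption, not taken from the slides).

from sklearn.datasets import make_s_curve
from sklearn.manifold import LocallyLinearEmbedding

X, color = make_s_curve(n_samples=1000, random_state=0)   # 3-D S-curve; color = intrinsic parameter
Y = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)
# Plotting Y colored by 'color' shows whether local neighborhoods survive the embedding.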

Page 45: Dimensionality Reduction Part 2: Nonlinear Methods


More LLE Examples

Page 46: Dimensionality Reduction Part 2: Nonlinear Methods


More LLE Examples

Page 47: Dimensionality Reduction Part 2: Nonlinear Methods


LLE Failures
• Does not work on closed manifolds
• Cannot recognize topology

Page 48: Dimensionality Reduction Part 2: Nonlinear Methods


Summary
• Non-linear dimensionality reduction methods
– These methods are considerably more powerful, and more temperamental, than linear methods
– Applications of these methods are a hot area of research

• Comparisons
– LLE is generally faster, but more brittle, than Isomap
– Isomap tends to work better on smaller data sets (i.e., less dense sampling)
– Isomap tends to be less sensitive to noise (perturbation of the input vectors)

• Issues
– Neither method handles closed manifolds and topological variations well