Recent developments in nonlinear dimensionality reduction
Josh Tenenbaum
MIT
Collaborators
• Vin de Silva
• John Langford
• Mira Bernstein
• Mark Steyvers
• Eric Berger
Outline
• The problem of nonlinear dimensionality reduction
• The Isomap algorithm
• Development #1: Curved manifolds
• Development #2: Sparse approximations
Learning an appearance map
• Given input: . . .
• Desired output:
  – Intrinsic dimensionality: 3
  – Low-dimensional representation:
Linear dimensionality reduction: PCA, MDS
• PCA dimensionality of faces:
• First two PCs:
• Linear manifold: PCA
• Nonlinear manifold: ?
Previous approaches to nonlinear dimensionality reduction
• Local methods seek a set of low-dimensional models, each valid over a limited range of data:
  – Local PCA
  – Mixture of factor analyzers
• Global methods seek a single low-dimensional model valid over the whole data set:
  – Autoencoder neural networks
  – Self-organizing map
  – Elastic net
  – Principal curves & surfaces
  – Generative topographic mapping
A generative model
• Latent space Y ⊂ R^d
• Latent data {yi} ⊂ Y generated from p(Y)
• Mapping f: Y → R^N for some N > d
• Observed data {xi = f(yi)} ⊂ R^N
Goal: given {xi}, recover f and {yi}.
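A minimal sketch of this generative setup (illustrative code, not from the talk; the swiss roll that appears later serves as f):

    import numpy as np

    rng = np.random.default_rng(0)

    # Latent data {yi}: n samples from p(Y), here uniform on a 2-D rectangle Y ⊂ R².
    n = 1000
    y = rng.uniform(low=[3.0, 0.0], high=[9.0, 10.0], size=(n, 2))

    # Mapping f: Y → R^N with N = 3 > d = 2 (the classic swiss roll embedding).
    def f(y):
        t, h = y[:, 0], y[:, 1]
        return np.column_stack([t * np.cos(t), h, t * np.sin(t)])

    x = f(y)  # observed data {xi = f(yi)} in R³; the learner sees only x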
Chicken-and-egg problem
• We know {xi} . . .
• . . . and if we knew {yi}, could estimate f.
• . . . or if we knew f, could estimate {yi}.
• So use EM, right? Wrong.
The problem of local minima
[Figure: GTM and SOM results illustrating local minima]
• Global nonlinear dimensionality reduction + local optimization = severe local minima
A different approach
• Attempt to infer {yi} directly from {xi}, without explicit reference to f.
• Closed-form, non-iterative, globally optimal solution for {yi}.
• Then can approximate f with a suitable interpolation algorithm (RBFs, local linear, ...).
• In other words, finding f becomes a supervised learning problem on pairs {yi ,xi}.
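For that last interpolation step, a sketch using radial basis functions with scipy (the recovered pairs here are simulated; RBFInterpolator is one off-the-shelf choice, not necessarily what the authors used):

    import numpy as np
    from scipy.interpolate import RBFInterpolator

    # Suppose the embedding step produced latent coordinates y_rec (n × d),
    # aligned with observations x (n × N); fake such a pair for illustration.
    rng = np.random.default_rng(0)
    y_rec = rng.uniform(size=(100, 2))
    x = np.column_stack([y_rec, (y_rec ** 2).sum(axis=1)])  # a toy f: R² → R³

    # Finding f is now supervised learning on pairs {(yi, xi)}.
    f_hat = RBFInterpolator(y_rec, x)  # thin-plate-spline kernel by default

    x_pred = f_hat(np.array([[0.3, 0.7]]))  # evaluate the fitted map at a new y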
When does this work?
• Only given some assumptions on the nature of f and the distribution of the {yi}.
• The trick: exploit some invariant of f, a property of the {yi} that is preserved in the {xi}, and that allows the {yi} to be read off uniquely*.
* up to some isomorphism (e.g., rotation).
The assumptions behind three algorithms
No free lunch: weaker assumptions on f ⇒ stronger assumptions on p(Y).

     Distribution p(Y)          Mapping f          Algorithm
i)   arbitrary                  linear isometric   Classical MDS
ii)  convex, dense              isometric          Isomap
iii) convex, uniformly dense    conformal          C-Isomap

[Figure: example data for cases i, ii, iii]
The assumptions behind three algorithms
(case i highlighted)

     Distribution p(Y)          Mapping f          Algorithm
i)   arbitrary                  linear isometric   Classical MDS
ii)  convex, dense              isometric          Isomap
iii) convex, uniformly dense    conformal          C-Isomap
Classical MDS
• Invariant: Euclidean distance
• Algorithm:
  – Calculate Euclidean distance matrix D.
  – Convert D to the canonical inner product matrix B by "double centering":
      b_ij = -1/2 ( d_ij² − (1/n) Σ_k d_ik² − (1/n) Σ_k d_kj² + (1/n²) Σ_{k,l} d_kl² )
  – Compute {yi} from the top eigenvectors of B.
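A compact numpy sketch of these three steps (an illustration, not the original code):

    import numpy as np

    def classical_mds(D, d=2):
        # Embed n points in R^d from an n × n Euclidean distance matrix D.
        n = D.shape[0]
        # Double centering: B = -1/2 · J D² J, with J = I - (1/n) 1 1ᵀ.
        J = np.eye(n) - np.ones((n, n)) / n
        B = -0.5 * J @ (D ** 2) @ J
        # Coordinates from the top d eigenvectors, scaled by √eigenvalues.
        w, V = np.linalg.eigh(B)
        idx = np.argsort(w)[::-1][:d]
        return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))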
The assumptions behind three algorithms
(case ii highlighted)

     Distribution p(Y)          Mapping f          Algorithm
i)   arbitrary                  linear isometric   Classical MDS
ii)  convex, dense              isometric          Isomap
iii) convex, uniformly dense    conformal          C-Isomap
Isomap
• Invariant: geodesic distance
The Isomap algorithm
• Construct neighborhood graph G.
  – ε method
  – K method
• Compute shortest paths in G, with edge ij weighted by the Euclidean distance |xi - xj|.
  – Floyd
  – Dijkstra (+ Fibonacci heaps)
• Reconstruct low-dimensional latent data {yi}.
  – Classical MDS on graph distances
  – Sparse MDS with landmarks
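As a rough illustration (not the authors' implementation), the three steps map onto standard scipy / scikit-learn calls; classical_mds is the sketch given earlier:

    import numpy as np
    from scipy.sparse.csgraph import shortest_path
    from sklearn.neighbors import kneighbors_graph

    def isomap(X, n_neighbors=10, d=2):
        # 1. Neighborhood graph G (K method), edge ij weighted by |xi - xj|.
        G = kneighbors_graph(X, n_neighbors, mode="distance")
        # 2. Geodesic distances: shortest paths in G (Dijkstra).
        D = shortest_path(G, method="D", directed=False)
        # 3. Classical MDS on the matrix of graph distances.
        return classical_mds(D, d)

With too few neighbors the graph can disconnect (infinite graph distances); with too many, edges short-circuit the manifold.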
Illustration on swiss roll
Discovering the dimensionality
• Measure residual variance in geodesic distances . . .
• . . . and find the elbow.
[Figure: residual variance vs. dimensionality for MDS/PCA and Isomap]
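The residual variance can be computed as 1 − R², where R is the linear correlation between the geodesic distances and the distances in the d-dimensional embedding (a sketch; variable names illustrative):

    import numpy as np
    from scipy.spatial.distance import pdist

    def residual_variance(D_geo, Y):
        # Correlate geodesic distances with embedding distances (upper triangle).
        iu = np.triu_indices_from(D_geo, k=1)
        r = np.corrcoef(D_geo[iu], pdist(Y))[0, 1]
        return 1.0 - r ** 2

    # Sweep the target dimensionality and look for the elbow:
    # for d in range(1, 6):
    #     print(d, residual_variance(D, classical_mds(D, d)))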
Theoretical analysis of asymptotic convergence
• Conditions for PAC-style asymptotic convergence
  – Geometric: mapping f is isometric to a subset of Euclidean space (i.e., zero intrinsic curvature).
  – Statistical: latent data {yi} are a "representative" sample* from a convex domain.

* Minimum distance from any point on the manifold to a sample point is small (e.g., as for a variable-density Poisson process).
Theoretical results on the rate of convergence
• Upper bound on the number of data points required.
• Rate of convergence depends on several geometric parameters of the manifold:
  – Intrinsic: dimensionality
  – Embedding-dependent: minimal radius of curvature, minimal branch separation
Face under varying pose and illumination
[Figure: estimated dimensionality; embeddings from MDS/PCA and Isomap]
Hand under nonrigid articulation
[Figure: estimated dimensionality; embeddings from MDS/PCA and Isomap]
Apparent motion
Digits
[Figure: estimated dimensionality; embeddings from MDS/PCA and Isomap]
Summary of Isomap
A framework for global nonlinear dimensionality reduction that preserves the crucial features of PCA and classical MDS:
• A noniterative, polynomial-time algorithm.
• Guaranteed to construct a globally optimal Euclidean embedding.
• Guaranteed to converge asymptotically for an important class of nonlinear manifolds.
Plus, good results on real and nontrivial synthetic data sets.
Outline
• The problem of nonlinear dimensionality reduction
• The Isomap algorithm
• Development #1: Curved manifolds
• Development #2: Sparse approximations
Locally Linear Embedding (LLE)
• Roweis and Saul (2000)
Comparing LLE and Isomap
• Both start with only local metric information.
• Isomap first estimates global metric structure, then finds an embedding that optimally preserves global structure.
• LLE finds an embedding that optimally preserves only local structure.
• LLE may be more efficient, but may also introduce unpredictable global distortions.
• No asymptotic convergence results for LLE.
[Figure: LLE and Isomap embeddings compared]
Outline
• The problem of nonlinear dimensionality reduction
• The Isomap algorithm
• Development #1: Curved manifolds
• Development #2: Sparse approximations
The assumptions behind three algorithms
(case iii highlighted)

     Distribution p(Y)          Mapping f          Algorithm
i)   arbitrary                  linear isometric   Classical MDS
ii)  convex, dense              isometric          Isomap
iii) convex, uniformly dense    conformal          C-Isomap
Isometric vs. conformal mapping
• Isometric map: preserves the Euclidean metric at each point y.
• Conformal map: preserves the Euclidean metric at each point y, up to an arbitrary scale factor s(y) > 0.
• Properties of conformal maps:
  – Angle-preserving.
  – Any subset topologically equivalent to a disk can be conformally mapped onto a disk.
C-Isomap
• Invariant: the rescaled distance |xi - xj| / √( M_X(i) M_X(j) ), where
    M_X(i) = mean of |xi - xj| over the neighbors j of point i (in the observed space X),
    M_Y(i) = mean of |yi - yj| over the neighbors j of point i (in the latent space Y).
• If {yi} is uniformly dense, M_Y(i) is asymptotically independent of i, and for a conformal f, M_X(i) ≈ s(yi) M_Y(i); dividing by √( M_X(i) M_X(j) ) therefore cancels the scale factor s(y) and recovers latent distances up to a constant.
The Isomap algorithm
• Construct neighborhood graph G.
  – ε method
  – K method
• Compute shortest paths in G, with edge ij weighted by the Euclidean distance |xi - xj|.
  – Floyd
  – Dijkstra (+ Fibonacci heaps)
• Reconstruct low-dimensional latent data {yi}.
  – Classical MDS on graph distances
  – Sparse MDS with landmarks
The C-Isomap algorithm
• Construct neighborhood graph G.
  – ε method
  – K method
• Compute shortest paths in G, with edge ij weighted by the rescaled distance
    |xi - xj| / √( M_X(i) M_X(j) )
  – Floyd
  – Dijkstra (+ Fibonacci heaps)
• Reconstruct low-dimensional latent data {yi}.
  – Classical MDS on graph distances
  – Sparse MDS with landmarks
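Relative to the Isomap sketch earlier, only the edge weights change; a minimal sketch of the rescaling (illustrative, reusing classical_mds from above):

    import numpy as np
    from scipy.sparse.csgraph import shortest_path
    from sklearn.neighbors import kneighbors_graph

    def c_isomap(X, n_neighbors=10, d=2):
        G = kneighbors_graph(X, n_neighbors, mode="distance")
        # M_X(i): mean distance from xi to its k nearest neighbors.
        M = np.asarray(G.sum(axis=1)).ravel() / n_neighbors
        # Rescale each edge |xi - xj| by sqrt(M(i) · M(j)).
        G = G.tocoo()
        G.data /= np.sqrt(M[G.row] * M[G.col])
        D = shortest_path(G.tocsr(), method="D", directed=False)
        return classical_mds(D, d)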
Conformal fishbowl
[Figure panels: Data, MDS, Isomap, C-Isomap, LLE, GTM]
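The slides do not give the data-generating code; one plausible construction (an assumption here, not from the talk) is to push uniform samples on a disk through the inverse stereographic projection, which is a conformal map onto a bowl-shaped piece of the sphere:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000

    # Uniform latent samples on the unit disk (√r trick for area uniformity).
    r = np.sqrt(rng.uniform(0, 1, n))
    theta = rng.uniform(0, 2 * np.pi, n)
    u, v = r * np.cos(theta), r * np.sin(theta)

    # Inverse stereographic projection onto the unit sphere: conformal, and
    # the disk maps to the lower hemisphere — a fishbowl, non-uniformly sampled.
    s = 1.0 + u ** 2 + v ** 2
    X = np.column_stack([2 * u / s, 2 * v / s, (u ** 2 + v ** 2 - 1) / s])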
Uniform fishbowl
[Figure panels: Data, MDS, Isomap, C-Isomap, LLE, GTM]
Conformal fishbowl, Gaussian density
[Figure panels: Latent data, C-Isomap, LLE]
Conformal fishbowl, offset Gaussian density
[Figure panels: Latent data, C-Isomap, LLE]
Wavelet
[Figure panels: Data, MDS, Isomap, C-Isomap, LLE, GTM]
Images of Tom’s face
• Two intrinsic degrees of freedom:
  – Translation: left/right
  – Zoom: in/out
• Scale variables (e.g., zoom) introduce conformal distortion.
. . .
Face under translation and zoom
[Figure panels: Data, MDS, Isomap, C-Isomap, LLE, GTM]
Curvature in LLE vs. Isomap
• LLE:
  +/− Approach: look only at local structure, ignoring global structure.
  −   Asymptotics: unknown.
  +   Nonconformal maps: good for some, but not all.
• Isomap:
  +/− Approach: explicitly estimate, and factor out, local metric distortion (assuming uniform density).
  +   Asymptotics: succeeds for all conformal mappings.
  +   Nonconformal maps: good for some, but not all.