dimensionality reduction part 2: nonlinear methods

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Dimensionality Reduction, Part 2: Nonlinear Methods

Comp 790-090, Spring 2007


Previously…

Linear Methods for Dimensionality Reduction

PCA: rotate data so that principal axes lie in the direction of maximum variance
MDS: find coordinates that best preserve pairwise distances
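Both linear methods fit in a few lines of NumPy. The following is an illustrative sketch, not code from the course. `classical_mds` and `pca` are hypothetical helper names, and the sketch assumes Euclidean distances. Under that assumption, classical MDS and PCA produce the same coordinates up to the sign of each axis.

```python
import numpy as np

def classical_mds(D, d):
    """Classical (Torgerson) MDS: d-D coordinates whose Euclidean distances best match D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix I - (1/n) 11^T
    B = -0.5 * J @ (D ** 2) @ J                  # Gram matrix of the centered points
    evals, evecs = np.linalg.eigh(B)             # B is symmetric; eigenvalues ascending
    top = np.argsort(evals)[::-1][:d]            # indices of the d largest eigenvalues
    return evecs[:, top] * np.sqrt(np.maximum(evals[top], 0.0))

def pca(X, d):
    """PCA: project mean-centered rows of X onto the d directions of greatest variance."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: covariance eigenvectors
    return Xc @ Vt[:d].T
```

Take the distance matrix of ten random points in the plane. `classical_mds(D, 2)` returns a configuration with exactly the same pairwise distances. `pca` on the points themselves returns a rotated copy of the centered data.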

[Figure: PCA]


Motivation
Linear dimensionality reduction doesn’t always work
Data violates underlying “linear” assumptions

Data is not accurately modeled by “affine” combinations of measurements
Structure of data, while apparent, is not simple
In the end, linear methods do nothing more than “globally transform” (rotate, translate, and scale) all of the data; sometimes what’s needed is to “unwrap” the data first


Stopgap Remedies

Local PCA
Compute PCA models for small overlapping item neighborhoods
Requires a clustering preprocess
Fast and simple, but results in no global parameterization

Neural Networks
Assumes a solution of a given dimension
Uses relaxation methods to deform given solution to find a better fit
Relaxation step is modeled as “layers” in a network where properties of future iterations are computed based on information from the current structure
Many successes, but a bit of an art
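The local PCA remedy can be sketched with NumPy. The sketch uses each point's k nearest neighbors as its neighborhood in place of the clustering preprocess mentioned above; that choice is an assumption. `local_pca` is a hypothetical name. Each local model is fit independently, which is exactly why no global parameterization results.

```python
import numpy as np

def local_pca(X, k, d):
    """Fit a d-dimensional PCA model to each point's k nearest neighbors.

    Returns (means, bases): means[i] is the neighborhood mean and bases[i] is an
    M x d matrix whose columns span that neighborhood's principal directions.
    Assumes distinct points, so each point is its own nearest neighbor.
    """
    X = np.asarray(X, dtype=float)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    means, bases = [], []
    for i in range(len(X)):
        nbrs = np.argsort(dists[i])[:k + 1]           # point i plus its k neighbors
        P = X[nbrs]
        mu = P.mean(axis=0)
        _, _, Vt = np.linalg.svd(P - mu, full_matrices=False)
        means.append(mu)
        bases.append(Vt[:d].T)                        # local principal directions
    return np.array(means), np.array(bases)
```

For points on a circle, each 1-D local model lines up with the tangent at that point. No single linear model could do that for the whole circle.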


Why Linear Modeling Fails

Suppose that your sample data lies on some low-dimensional surface embedded within the high-dimensional measurement space.
Linear models allow ALL affine combinations
Often, certain combinations are atypical of the actual data
Recognizing this is harder as dimensionality increases


What does PCA Really Model?

Principal Component Analysis assumptions:
Mean-centered distribution

What if the mean itself is atypical?

Eigenvectors of covariance

Basis vectors aligned with successive directions of greatest variance

Classic 1st-order statistical model

Distribution is characterized by its mean and covariance (Gaussian hyperellipsoids)


Non-Linear Dimensionality Reduction

Non-linear Manifold Learning
Instead of preserving global pairwise distances, non-linear dimensionality reduction tries to preserve only the geometric properties of local neighborhoods

Discover a lower-dimensional “embedding” manifold
Find a parameterization over that manifold

[Figure: a projection mapping from the original M-D space to the d-D linear embedding (parameter) space; the reverse mapping is called “reprojection,” “elevating,” or “lifting”]


Nonlinear DimRedux Steps

Discover a low-dimensional embedding manifold
Find a parameterization over the manifold
Project data into parameter space
Analyze, interpolate, and compress in embedding space

• Orient (by linear transformation) the parameter space to align axes with salient features
• Linear (affine) combinations are valid here
• In the case of interpolation and compression, use “lifting” to estimate M-D original data


Nonlinear Methods

Locally Linear Embedding (LLE) [Roweis 2000]
Isomap [Tenenbaum 2000]
These two papers ignited the field
Principled approach (asymptotically, as the amount of data goes to infinity, they have been proven to find the “real” manifold)
Widely applied
Hotly contested


Local Linear Embeddings

First Insight
Locally, at a fine enough scale, everything looks linear


Local Linear Embeddings

First Insight
Find an affine combination of the “neighborhood” about a point that best approximates it


Finding a Good Neighborhood

This is the remaining “Art” aspect of nonlinear methods
Common choices:
ε-ball: find all items that lie within an ε-ball of the target item as measured under some metric

Best if density of items is high and every point has a sufficient number of neighbors

K-nearest neighbors: find the k-closest neighbors to a point under some metric

Guarantees all items are similarly represented, limits dimension to K-1
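Both neighborhood rules can be sketched with NumPy, assuming a Euclidean metric. The function names are hypothetical. The sketch builds the full N × N distance matrix, which is fine for small examples but not for large data sets.

```python
import numpy as np

def pairwise_distances(X):
    """N x N Euclidean distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def epsilon_ball_neighbors(X, eps):
    """For each item, the indices of all other items within distance eps."""
    D = pairwise_distances(X)
    idx = np.arange(len(D))
    return [np.flatnonzero((D[i] <= eps) & (idx != i)) for i in range(len(D))]

def knn_neighbors(X, k):
    """For each item, the indices of its k closest other items (N x k array)."""
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)                  # an item is not its own neighbor
    return np.argsort(D, axis=1)[:, :k]
```

Take points at 0, 1, 2, and 10 on a line with ε = 1.5. The item at 10 gets no ε-ball neighbors, while k-nearest neighbors still assigns it one. This illustrates the density trade-off described above.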


Affine “Neighbor” Combinations
Within locally linear neighborhoods, each point can be considered as an affine combination of its neighbors:

$$x_i \approx \sum_{j \in \mathrm{Neighbors}(x_i)} w_{ij}\, x_j, \qquad \sum_j w_{ij} = 1$$

Weights should still be valid in lower-dimensional embedding space

Imagine cutting out patches from the manifold and placing them in the lower-dimensional space so that angles between points are preserved.


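Find Weights
Rewriting as a matrix for all x (w_ij = 0 unless x_j is a neighbor of x_i):

$$\underbrace{\begin{bmatrix} x_1 & x_2 & x_3 & \cdots & x_N \end{bmatrix}}_{M \times N}
\underbrace{\begin{bmatrix} 0 & w_{21} & w_{31} & \cdots & w_{N1} \\ w_{12} & 0 & w_{32} & \cdots & w_{N2} \\ w_{13} & w_{23} & 0 & \cdots & w_{N3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ w_{1N} & w_{2N} & w_{3N} & \cdots & 0 \end{bmatrix}}_{\text{unknown } W \text{ matrix},\ N \times N}
= \underbrace{\begin{bmatrix} \tilde{x}_1 & \tilde{x}_2 & \tilde{x}_3 & \cdots & \tilde{x}_N \end{bmatrix}}_{M \times N}$$

Reorganizing:

$$\varepsilon(W) = \sum_i \Big\| x_i - \sum_{j \in \mathrm{Neighbors}(x_i)} w_{ij}\, x_j \Big\|^2$$

Want to find W that minimizes ε(W) and satisfies the “sum-to-one” constraint, $\sum_j w_{ij} = 1$
Ends up as a constrained “least-squares” problem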
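The constrained least-squares solve can be sketched with NumPy. For each item, the local Gram (covariance) matrix C of neighbor offsets is solved against a vector of ones, and the result is rescaled to sum to one; this is the standard closed form used in LLE. Two conventions differ from the slide. First, items are passed as rows (N × M), the transpose of the slide's M × N X. Second, W uses the column convention of the matrix equation above, so `W.T @ X` gives the reconstructions. The trace-scaled `reg` default is an assumption; it implements the regularization described under numerical issues.

```python
import numpy as np

def lle_weights(X, neighbors, reg=1e-3):
    """LLE reconstruction weights, column convention: x_i ~= sum_j W[j, i] x_j.

    X: N x M array (one item per row); neighbors[i]: indices of x_i's neighbors.
    reg: regularization, scaled by trace(C), added to each local Gram matrix.
    """
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.asarray(neighbors[i])
        Z = X[nbrs] - X[i]                    # neighbor offsets, x_i moved to the origin
        C = Z @ Z.T                           # K x K local Gram matrix
        # C is singular when K > M or neighbors are degenerate, so regularize it
        C += np.eye(len(nbrs)) * reg * max(np.trace(C), 1e-12)
        w = np.linalg.solve(C, np.ones(len(nbrs)))
        W[nbrs, i] = w / w.sum()              # enforce the sum-to-one constraint
    return W
```

Consider a point at the origin with four neighbors at (±1, 0) and (0, ±1). Here K = 4 exceeds M = 2, so C is singular without regularization. With it, the solve gives each neighbor weight 1/4, which reconstructs the origin exactly.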


Find Linear Embedding Space

Now that we have the weight matrix W, find the embedding coordinates Y = [y_1 y_2 ⋯ y_N] that best satisfy the following:

$$Y = Y W$$

where W is N x N and Y is d x N. This can be found by finding the (left) null space of A = W − I:

$$Y A = Y (W - I) = 0$$

Classic problem: run SVD on A and find the orthogonal vectors associated with its smallest singular values. The smallest singular value will be zero; its vector is constant and represents the system’s invariance to translation. The vectors for the next d smallest singular values give the embedding coordinates.
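A minimal dense sketch of this step, assuming NumPy and the column convention for W from the weights slide. Rather than calling an SVD on A directly, it takes eigenvectors of AAᵀ. Their eigenvalues are the squared singular values of A, so the vectors come out in the same order. The code uses I − W instead of W − I, which has the same singular vectors. It skips the bottom (constant) vector and keeps the next d. A dense `eigh` is only reasonable for small N. When the bottom eigenvalues are nearly tied, the constant vector can get mixed with the others, which is one of the precision issues noted for this step.

```python
import numpy as np

def lle_embedding(W, d):
    """Embedding coordinates Y (d x N) from LLE weights W (column convention).

    Minimizes ||Y (I - W)||_F^2 subject to orthonormal rows: the rows of Y are
    eigenvectors of (I - W)(I - W)^T with the smallest eigenvalues, skipping the
    bottom one (the constant vector, i.e. the translation invariance).
    """
    N = W.shape[0]
    A = np.eye(N) - W
    evals, evecs = np.linalg.eigh(A @ A.T)    # symmetric PSD; eigenvalues ascending
    return evecs[:, 1:d + 1].T
```

For weights taken from evenly spaced points on a line, the returned row lies in the left null space of A. The data are reproduced exactly by their weights, so the embedding error is zero.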


Numerical Issues
Numerical problems can arise in computing LLEs:
The local covariance matrix that arises in the least-squares computation of the weighting matrix, W, can be ill-conditioned

Regularization (condition the problem by adding a small multiple of the identity to the covariance matrix)

Finding small singular (eigen) values is not as well conditioned as finding large ones. The small ones are subject to numerical precision errors, and their associated vectors can get mixed

Good (but slow) solvers exist; you have to use them


Results

The resulting parameter vector, y_i, gives the coordinates associated with the item x_i

The dth embedding coordinate is formed from the orthogonal vector associated with the (d+1)st smallest singular value of A.


Reprojection
Often, for data analysis, a parameterization is enough
For interpolation and compression we might want to map points from the parameter space back to the “original” space
No perfect solution, but a few approximations:

Delaunay triangulate the points in the embedding space, find the triangle that the desired parameter setting falls into, compute its barycentric coordinates, and use them as weights
Interpolate by using a radially symmetric kernel centered about the desired parameter setting
Works, but mappings might not be one-to-one
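The kernel option can be sketched with NumPy using a Gaussian kernel. The kernel shape, the `bandwidth` parameter, and the name `kernel_reproject` are assumptions, not part of the original method. The estimate is a normalized weighted average of the original items, with weights that fall off with distance from the query in parameter space.

```python
import numpy as np

def kernel_reproject(Y, X, y_query, bandwidth):
    """Lift a parameter-space point back to M-D by Gaussian-kernel weighting.

    Y: N x d embedding coordinates, X: N x M original items, y_query: length d.
    Returns a convex combination of the rows of X.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    sq = np.sum((Y - np.asarray(y_query, dtype=float)) ** 2, axis=1)
    # Radially symmetric weights; subtracting the minimum avoids underflow to all zeros
    w = np.exp(-(sq - sq.min()) / (2.0 * bandwidth ** 2))
    return (w / w.sum()) @ X
```

Halfway between two embedded items, the estimate is the midpoint of their original vectors. With a very small bandwidth, a query at an embedded item returns that item itself.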


LLE Example
3-D S-Curve manifold with points color-coded
Compute a 2-D embedding

The local affine structure is well maintained
The metric structure is okay locally, but can drift slowly over the domain (this causes the manifold to taper)


More LLE Examples


LLE Failures
Does not work on closed manifolds
Cannot recognize topology


Isomap
An alternative non-linear dimensionality reduction method that extends MDS
Key Observation:

On a manifold distances are measured using geodesic distances rather than Euclidean distances

[Figure: two points with a small Euclidean distance but a large geodesic distance along the manifold]


Problem: How to Get Geodesics

Without knowledge of the manifold it is difficult to compute the geodesic distance between points
It is difficult even if you know the manifold
Solution:

Use a discrete geodesic approximation
Apply a graph algorithm to approximate the geodesic distances


Dijkstra’s Algorithm

Efficient solution to the shortest-path problem from a single source; running it from every point solves the all-points problem
Greedy breadth-first algorithm
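A compact Dijkstra sketch that uses Python's `heapq` as the priority queue. The adjacency-dict input format and the function names are assumptions for illustration. Edge lengths must be non-negative. Running it once from every node gives the all-pairs distances Isomap needs.

```python
import heapq
import math

def dijkstra(adj, source):
    """Shortest-path distances from source; adj maps node -> [(neighbor, length), ...]."""
    dist = {node: math.inf for node in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)            # closest node not yet finalized
        if d > dist[u]:
            continue                          # stale entry; u was already settled
        for v, length in adj[u]:
            nd = d + length
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def all_pairs_shortest_paths(adj):
    """Run Dijkstra from every node."""
    return {s: dijkstra(adj, s) for s in adj}
```

In a triangle whose direct edge 0 to 2 has length 4, the path through node 1 (1 + 1) wins. Unreachable nodes keep distance `math.inf`, which signals a disconnected neighborhood graph.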


Isomap algorithm
Compute fully-connected neighborhood of points for each item
• Can be k nearest neighbors or ε-ball
• Neighborhoods must be symmetric
• Test that resulting graph is fully-connected; if not, increase either K or ε
Calculate pairwise Euclidean distances within each neighborhood
Use Dijkstra’s Algorithm to compute shortest path from each point to non-neighboring points
Run MDS on resulting distance matrix
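The steps above can be sketched end to end with NumPy. The sketch assumes a Euclidean metric, distinct input points, and a k-nearest-neighbor graph symmetrized by adding each edge in both directions. `isomap` is a hypothetical name. For brevity, the sketch swaps in the Floyd-Warshall algorithm for the repeated Dijkstra runs. Both give the same shortest-path distances, but Floyd-Warshall costs O(N³) regardless of how sparse the graph is.

```python
import numpy as np

def isomap(X, k, d):
    """Isomap sketch: symmetric k-NN graph, graph geodesics, then classical MDS.

    X: N x M data (one item per row, points assumed distinct). Returns N x d.
    """
    X = np.asarray(X, dtype=float)
    N = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # 1. Symmetric k-NN graph; np.inf marks "no edge"
    G = np.full((N, N), np.inf)
    np.fill_diagonal(G, 0.0)
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    for i in range(N):
        G[i, nbrs[i]] = D[i, nbrs[i]]
        G[nbrs[i], i] = D[i, nbrs[i]]          # make the neighborhood symmetric
    # 2. Graph shortest paths (Floyd-Warshall, standing in for Dijkstra from each point)
    for m in range(N):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is not connected; increase k")
    # 3. Classical MDS on the geodesic distance matrix
    J = np.eye(N) - 1.0 / N                    # centering matrix I - (1/N) 11^T
    B = -0.5 * J @ (G ** 2) @ J
    evals, evecs = np.linalg.eigh(B)
    top = np.argsort(evals)[::-1][:d]
    return evecs[:, top] * np.sqrt(np.maximum(evals[top], 0.0))
```

For points along a half circle, the 1-D result is almost exactly a linear function of the arc angle, so the arc is unrolled. Two well-separated clusters with k = 1 give a disconnected graph, and the function raises an error instead of returning infinite distances.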


Isomap Results
Find a 2D embedding of the 3D S-curve (also shown for LLE)
Isomap does a good job of preserving metric structure (not surprising)
The affine structure is also well preserved


Residual Fitting Error


Neighborhood Graph


More Isomap Results


Isomap Failures
Isomap also has problems on closed manifolds of arbitrary topology


Non-Linear Example
A Data-Driven Reflectance Model (Matusik et al., SIGGRAPH 2003)
Bidirectional Reflectance Distribution Functions (BRDFs)

Define the ratio of the reflected radiance in a particular direction to the incident irradiance from a given direction

Isotropic BRDF:

$$f(\theta_i, \phi_i, \theta_r, \phi_r) = f(\theta_i, \theta_r, \phi_r - \phi_i)$$


Modeling Bidirectional Reflectance Distribution Functions (BRDFs)

Measurement


A “fast” BRDF measurement device inspired by Marschner [1998]

Measurement


Measurement

20-80 million reflectance measurements per material
Each tabulated BRDF entails 90 × 90 × 180 × 3 = 4,374,000 measurement bins


Rendering from Tabulated BRDFs

Even without further analysis, our BRDFs are immediately useful
Renderings made with Henrik Wann Jensen’s Dali renderer

[Figure: renderings of Nickel, Hematite, Gold Paint, and Pink Felt]


BRDFs as Vectors in High-Dimensional Space

Each tabulated BRDF is a vector in 90 × 90 × 180 × 3 = 4,374,000-dimensional space

[Figure: the 90 × 90 × 180 table is unrolled into a single 4,374,000-element vector]
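Unrolling is just a reshape. The NumPy sketch below assumes a C-order (row-major) layout with the color channel last. Any fixed ordering works as long as every BRDF uses the same one. `BRDF_SHAPE`, `brdf_to_vector`, and `vector_to_brdf` are hypothetical names. Stacking the resulting vectors row by row gives the data matrix that the linear and non-linear analyses operate on.

```python
import numpy as np

# Table layout from the slides: 90 x 90 x 180 angular bins x 3 color channels
BRDF_SHAPE = (90, 90, 180, 3)

def brdf_to_vector(table):
    """Unroll one tabulated BRDF into a single 4,374,000-element vector."""
    table = np.asarray(table)
    if table.shape != BRDF_SHAPE:
        raise ValueError(f"expected shape {BRDF_SHAPE}, got {table.shape}")
    return table.reshape(-1)

def vector_to_brdf(vec):
    """Inverse of brdf_to_vector: fold a vector back into the 4-D table."""
    return np.asarray(vec).reshape(BRDF_SHAPE)
```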


Linear Analysis (PCA)

Find optimal “linear basis” for our data set
45 components needed to reduce residue to under measurement error

[Figure: eigenvalue magnitude vs. dimension, with reconstructions using the mean and 5, 10, 20, 30, 45, 60, and all components]
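One way to pick the number of components, sketched with NumPy, is to keep the fewest principal components whose RMS reconstruction residue falls at or below a tolerance. Here `tol` is a hypothetical stand-in for the measurement error, and `components_needed` is an illustrative name, not part of the original analysis.

```python
import numpy as np

def components_needed(X, tol):
    """Fewest principal components whose RMS reconstruction residue is <= tol.

    X: one data vector per row. tol: non-negative error threshold.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)          # singular values, descending
    energy = s ** 2                                   # proportional to covariance eigenvalues
    # residual[k] = energy discarded when keeping the first k components (k = 0..len(s))
    residual = np.concatenate([np.cumsum(energy[::-1])[::-1], [0.0]])
    rms = np.sqrt(residual / Xc.size)
    return int(np.argmax(rms <= tol))                 # first k meeting the threshold
```

For data that lie exactly on a 2-D affine subspace of a 5-D space, the function returns 2 at any small tolerance.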


Problems with Linear Subspace Modeling

Large number of basis vectors (45)
Some linear combinations yield invalid or unlikely BRDFs (outside convex hull)


Problems with Linear Subspace Modeling

Large number of basis vectors (45)
Some linear combinations yield invalid or unlikely BRDFs (inside convex hull)


Results of Non-Linear Manifold Learning

At 15 dimensions reconstruction error is less than 1%

Parameter count similar to analytical models

[Figure: reconstruction error vs. dimensionality (5, 10, 15)]


Non-Linear Advantages
15-dimensional parameter space
More robust than linear model

More extrapolations are plausible

[Figure: non-linear model extrapolation vs. linear model extrapolation]


Non-Linear Model Results


Representing Physical Processes

Steel Oxidation


Summary
Non-Linear Dimensionality Reduction Methods

These methods are considerably more powerful and temperamental than linear methods
Applications of these methods are a hot area of research

Comparisons
LLE is generally faster, but more brittle than Isomap
Isomap tends to work better on smaller data sets (i.e., less dense sampling)
Isomap tends to be less sensitive to noise (perturbation of the input vectors)

Issues
Neither method handles closed manifolds and topological variations well
