Manifold Learning in the Wild
A New Manifold Modeling and Learning Framework for Image Ensembles
Aswin C. Sankaranarayanan, Rice University
Richard G. Baraniuk, Chinmay Hegde, Sriram Nagaraj
Sensor Data Deluge
Concise Models
• Efficient processing / compression requires a concise representation
• Our interest in this talk: collections of images
Concise Models
• Our interest in this talk: collections of images parameterized by θ ∈ Θ
  – translations of an object; θ: x-offset and y-offset
  – rotations of a 3D object; θ: pitch, roll, yaw
  – wedgelets; θ: orientation and offset
• Image articulation manifold (IAM)
Image Articulation Manifold
• N-pixel images: I_θ ∈ R^N
• K-dimensional articulation space Θ
• Then M = { I_θ : θ ∈ Θ } is a K-dimensional manifold in the ambient space R^N
• Very concise model
[Figure: articulation parameter space Θ mapped to images on the IAM]
Smooth IAMs
• N-pixel images: I_θ ∈ R^N
• Local isometry: image distance ≈ articulation parameter space distance
• Linear tangent spaces are close approximations locally
• Low-dimensional articulation space
[Figure: articulation parameter space]
Ex: Manifold Learning
• LLE, ISOMAP, LE, HE, Diff. Geo, …
• K = 1: rotation
Ex: Manifold Learning
• K = 2: rotation and scale (a minimal embedding sketch follows below)
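A hedged sketch of this kind of experiment using synthetic data and standard scipy / scikit-learn tools (not the authors' code):

import numpy as np
from scipy import ndimage
from sklearn.manifold import Isomap

base = np.zeros((32, 32)); base[8:24, 12:20] = 1.0          # toy N-pixel image

def articulate(img, angle_deg, scale):
    # warp the image: rotate by angle_deg and scale by `scale` about the center
    t = np.deg2rad(angle_deg)
    A = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]]) / scale
    c = np.array(img.shape) / 2.0
    return ndimage.affine_transform(img, A, offset=c - A @ c, order=1)

# sample the K = 2 articulation space (rotation x scale) and stack images as vectors
params = [(a, s) for a in np.linspace(0, 60, 12) for s in np.linspace(0.8, 1.2, 12)]
X = np.array([articulate(base, a, s).ravel() for a, s in params])    # points on the IAM

emb = Isomap(n_neighbors=8, n_components=2).fit_transform(X)         # learned 2-D embedding
print(emb.shape)    # (144, 2); the embedding should organize by rotation and scale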
Theory/Practice Disconnect: Smoothness
• Practical image manifolds are not smooth!
• If images have sharp edges, then the manifold is everywhere non-differentiable [Donoho and Grimes]
• Tangent approximations?
Failure of Tangent Plane Approx.
• Ex: cross-fading when synthesizing / interpolating images that should lie on the manifold
[Figure: two input images; geodesic path vs. linear (cross-fading) path]
Theory/Practice Disconnect: Isometry
• Ex: translation manifold: all blue images are equidistant from the red image
• Local isometry is satisfied only when sampling is dense
[Plot: Euclidean distance vs. translation θ in px]
(a small numerical sketch of this effect follows below)
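A minimal sketch, assuming a synthetic 1-D pulse image, showing how Euclidean distance stops tracking the articulation once supports no longer overlap:

import numpy as np

N = 100
ref = np.zeros(N); ref[40:60] = 1.0              # "red" reference image: a 20-pixel pulse

for shift in [5, 10, 20, 40, 60, 80]:
    img = np.roll(ref, shift)                     # translated "blue" image
    print(shift, np.linalg.norm(img - ref))       # grows, then saturates once the supports no longer overlap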
Theory/Practice Disconnect: Nuisance Articulations
• Unsupervised data invariably has additional, undesired articulations
  – Illumination
  – Background clutter, occlusions, …
• Image ensemble is no longer low-dimensional
Image representations
• Conventional representation for an image: a vector of pixels. Inadequate!
[Figure: pixel image]
• Remainder of the talk: TWO novel image representations that alleviate the theoretical/practical challenges in manifold learning on image ensembles
Transport operators for image manifolds
The concept of Transport operators
• I_θ(x) = I_ref(f_θ(x)): each image on the manifold is the reference image warped by an operator f_θ
• Beyond a point cloud model for image manifolds: I_θ ↔ { I_ref, f_θ }
Example: Translation
• 2D translation manifold
• Set of all transport operators = R² (all 2D shifts)
• Beyond a point cloud model
  – Action of the articulation is more accurate and meaningful
[Figure: images I_0, I_1 and the transport operator space R²]
(a minimal translation-transport sketch follows below)
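A minimal sketch of a translation transport operator acting on a reference image (scipy-based, illustrative only):

import numpy as np
from scipy import ndimage

I_ref = np.zeros((64, 64)); I_ref[20:40, 20:40] = 1.0     # reference image

def transport(I, dx, dy):
    # translation transport: I_theta(x) = I_ref(x - theta), with theta = (dx, dy) in R^2
    return ndimage.shift(I, (dy, dx), order=1, mode='constant')

I_theta = transport(I_ref, dx=7.5, dy=-3.0)   # any theta in R^2 moves us along the manifold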
Optical Flow
• Generalizing this idea: pixel correspondences
• Idea: optical flow (OF) between two images is a natural and accurate transport operator
[Figures from Ce Liu's optical flow page: frames I1 and I2, and the OF from I1 to I2]
(a hedged flow-computation sketch follows below)
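A sketch of computing such a dense flow field; the talk points to Ce Liu's optical flow code, and OpenCV's Farneback estimator is used here only as a readily available stand-in (file names are placeholders):

import cv2

# placeholder file names; grayscale frames I1 and I2 from the ensemble
I1 = cv2.imread('frame1.png', cv2.IMREAD_GRAYSCALE)
I2 = cv2.imread('frame2.png', cv2.IMREAD_GRAYSCALE)

# dense optical flow from I1 to I2; positional arguments: pyr_scale, levels,
# winsize, iterations, poly_n, poly_sigma, flags
flow = cv2.calcOpticalFlowFarneback(I1, I2, None, 0.5, 3, 15, 3, 5, 1.2, 0)
print(flow.shape)   # (H, W, 2): one transport vector (vx, vy) per pixel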
Optical Flow Transport
[Figure: IAM with articulations θ0, θ1, θ2; images I_θ0, I_θ1, I_θ2; OFM at I_θ0]
• Consider a reference image I_θ0 and a K-dimensional articulation
• Collect optical flows from I_θ0 to all images reachable by a K-dimensional articulation
• Collection of OFs is a smooth, K-dimensional manifold (even if the IAM is not smooth) for a large class of articulations
(a sketch of assembling this optical-flow manifold follows below)
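A hedged sketch of how such an optical-flow manifold (OFM) could be assembled and then embedded, reusing the Farneback estimator above as a stand-in flow code:

import cv2
import numpy as np
from sklearn.manifold import Isomap

def optical_flow_manifold(images):
    # images: list of grayscale frames; images[0] plays the role of the reference I_theta0
    ref = images[0]
    flows = [cv2.calcOpticalFlowFarneback(ref, img, None, 0.5, 3, 15, 3, 5, 1.2, 0)
             for img in images]                     # one transport operator per image
    return np.array([f.ravel() for f in flows])     # points on the OFM, not the IAM

# manifold learning then runs on flows, which stay smooth even when the IAM does not:
# F = optical_flow_manifold(images)
# emb = Isomap(n_neighbors=6, n_components=2).fit_transform(F)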
OFM is Smooth (Rotation)
[Rotations θ = 0°, ±30°, ±60°]
[Plots vs. articulation θ in degrees: pixel intensity I(θ) at 3 points (non-smooth) and optical flow v(θ) in pixels (nearly linear)]
Main results
• Local model at each I_θ0
• Each point on the OFM defines a transport operator
  – Each transport operator maps I_θ0 to one of its neighbors
• For a large class of articulations, OFMs are smooth and locally isometric
  – Traditional manifold processing techniques work on OFMs
[Figure: IAM with articulations θ0, θ1, θ2; images I_θ1, I_θ2; OFM at I_θ0]
Linking it all together
[Figure: articulations θ0, θ1, θ2 on the IAM (I_θ0, I_θ1, I_θ2); OFM at I_θ0; nonlinear dim. reduction to an embedding e0, e1, e2]
• The non-differentiability does not disappear; it is embedded in the mapping from the OFM to the IAM.
• However, this is a known map: I_θ ↔ { I_ref, f_θ } such that I_θ(x) = I_ref(f_θ(x))
(a synthesis sketch using this map follows below)
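A minimal sketch of exercising this known map, i.e. synthesizing I_θ by warping I_ref with a flow field (scipy-based, illustrative only):

import numpy as np
from scipy import ndimage

def synthesize(I_ref, flow):
    # known map from the OFM back to the IAM: I_theta(x) = I_ref(f_theta(x)),
    # with f_theta(x) = x + flow(x) giving the per-pixel correspondence
    H, W = I_ref.shape
    yy, xx = np.mgrid[0:H, 0:W]
    coords = np.array([yy + flow[..., 1], xx + flow[..., 0]])   # (2, H, W) sample locations
    return ndimage.map_coordinates(I_ref, coords, order=1, mode='nearest')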
The Story So Far…
[Figure: articulations θ0, θ1, θ2; IAM with I_θ0, I_θ1, I_θ2; tangent space at I_θ0 vs. OFM at I_θ0]
[Figure: two input images; geodesic vs. linear path, on the IAM and on the OFM]
OFM Manifold Learning
• Data: 196 images of two bears moving linearly and independently
• Task: find a low-dimensional embedding
[Embeddings: IAM vs. OFM]
OFM ML + Parameter Estimation
• Data: 196 images of a cup moving on a plane
• Task 1: find a low-dimensional embedding
• Task 2: parameter estimation for new images (tracing an "R")
Karcher Mean
• Point on the manifold such that the sum of geodesic distances to every other point is minimized
• Important concept in nonlinear data modeling, compression, shape analysis [Srivastava et al.]
• Ex: 10 images from an IAM
[Figure: ground truth KM vs. OFM KM vs. linear KM]
(a generic Karcher-mean iteration sketch follows below)
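A generic Karcher-mean fixed-point iteration, assuming the caller supplies log/exp maps for the manifold at hand (a hedged sketch, not the authors' implementation):

import numpy as np

def karcher_mean(points, log_map, exp_map, n_iter=50, step=1.0):
    # move the current estimate along the average tangent vector to all samples
    mu = points[0]
    for _ in range(n_iter):
        tangent = np.mean([log_map(mu, p) for p in points], axis=0)
        mu = exp_map(mu, step * tangent)
    return mu

# On a (near-)flat flow space, log/exp reduce to subtraction/addition of flow fields:
# km_flow = karcher_mean(flows, lambda a, b: b - a, lambda a, v: a + v)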
Sparse keypoint-based image representation
Image representations
• Conventional representation for an image: a vector of pixels. Inadequate!
[Figure: pixel image]
Image representations
• Replace the vector of pixels with an abstract bag of features
  – Ex: SIFT (Scale Invariant Feature Transform) selects keypoint locations in an image and computes a keypoint descriptor for each keypoint
  – Feature descriptors are local; it is very easy to make them robust to nuisance imaging parameters
(a keypoint-extraction sketch follows below)
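A minimal keypoint-extraction sketch with OpenCV's SIFT implementation (requires OpenCV >= 4.4; the image path is a placeholder):

import cv2

img = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE)      # placeholder path
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

locations = [kp.pt for kp in keypoints]    # keypoint locations (x, y)
print(len(keypoints), descriptors.shape)   # M keypoints, (M, 128) descriptor matrix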
Loss of Geometrical Info
• Bag of features representations hide potentially useful image geometry
• Goal: make salient image geometrical info more explicit for exploitation
[Figure: image space vs. keypoint space]
Key idea
• Keypoint space can be endowed with a rich low-dimensional structure in many situations
• Mechanism: define kernels between keypoint locations and between keypoint descriptors

Keypoint Kernel
• Joint keypoint kernel between two images is given by combining the location and descriptor kernels (detailed later)
Keypoint Geometry
Theorem: Under the metric induced by the kernel, certain ensembles of articulating images form smooth, isometric manifolds
• In contrast: the conventional approach to image fusion via image articulation manifolds (IAMs) is fraught with non-differentiability (due to sharp image edges)
  – not smooth
  – not isometric
Application: Manifold Learning
• 2D translation
[Embeddings: IAM vs. KAM]
Manifold Learning in the Wild
• Rice University's Duncan Hall Lobby
  – 158 images
  – 360° panorama using a handheld camera
  – Varying brightness, clutter
• Ground truth using state-of-the-art structure-from-motion software
[Embeddings: ground truth vs. IAM vs. KAM]
Internet-scale imagery
• Notre Dame cathedral
  – 738 images
  – Collected from Flickr
  – Large variations in illumination (night/day/saturations), clutter (people, decorations), camera parameters (focal length, FOV, …)
  – Non-uniform sampling of the space
Organization
• k-nearest neighbors
• "Geodesics": 3D rotation, "walk closer", "zoom out"
Summary
• Need for novel image representations
  – Transport operators: enable differentiability and meaningful transport
  – Sparse features: robustness to outliers, nuisance articulations, etc.; learning in the wild on unsupervised imagery
• True power of manifold signal processing lies in fast algorithms that mainly use neighbor relationships
  – What are compelling applications where such methods can achieve state-of-the-art performance?
Summary
• IAMs are a useful concise model for many image processing problems involving image collections and multiple sensors/viewpoints
• But practical IAMs are non-differentiable
  – IAM-based algorithms have not lived up to their promise
• Optical flow manifolds (OFMs)
  – smooth even when the IAM is not
  – OFM ~ nonlinear tangent space
  – support accurate image synthesis, learning, charting, …
• Barely discussed here: OF enables the safe extension of differential geometry concepts
  – Log/Exp maps, Karcher mean, parallel transport, …
Summary: Rice U
• Bag of features representations are pervasive in image information fusion applications (ex: SIFT)
• Progress to date:
  – Bags of features can contain rich geometrical structure
  – Keypoint distance is very efficient to compute (contrast with the combinatorial complexity of feature correspondence algorithms)
  – Keypoint distance significantly outperforms the standard image Euclidean distance in learning and fusion applications
  – Punch line: KAM preferred over IAM
• Current/future work:
  – More exhaustive experiments with occlusion and clutter
  – Studying the geometrical structure of feature representations of other kinds of non-image data (ex: documents)
Open Questions
• Our treatment is specific to image manifolds under brightness constancy
• What are the natural transport operators for other data manifolds?
dsp.rice.edu
Related Work
• Analytic transport operators
  – transport operator has group structure [Xiao and Rao 07] [Culpepper and Olshausen 09] [Miller and Younes 01] [Tuzel et al 08]
  – non-linear analytics [Dollar et al 06]
  – spatio-temporal manifolds [Li and Chellappa 10]
  – shape manifolds [Klassen et al 04]
• Analytic approach limited to a small class of standard image transformations (ex: affine transformations, Lie groups)
• In contrast, the OFM approach works reliably with real-world image samples (point clouds) and a broader class of transformations
Limitations
• Brightness constancy violations
  – Optical flow is no longer meaningful
• Occlusion
  – Undefined pixel flow in theory, arbitrary flow estimates in practice
  – Heuristics to deal with it
• Changing backgrounds, etc.
  – Transport operator assumption too strict
  – Sparse correspondences?
Occlusion
• Detect occlusion using forward-backward flow reasoning (sketch below)
• Remove occluded pixels from the computations
• Heuristic: formal occlusion handling is hard
[Figure: occluded regions]
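One plausible forward-backward consistency check (an assumed heuristic; Farneback flow again stands in for whatever estimator is used):

import cv2
import numpy as np

def occlusion_mask(I1, I2, thresh=1.0):
    # a pixel is flagged as occluded when following the forward flow and then the
    # backward flow does not return (nearly) to where it started
    fwd = cv2.calcOpticalFlowFarneback(I1, I2, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    bwd = cv2.calcOpticalFlowFarneback(I2, I1, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    H, W = I1.shape
    yy, xx = np.mgrid[0:H, 0:W]
    # nearest-neighbor lookup of the backward flow at the forward-flow destination
    xd = np.clip(np.rint(xx + fwd[..., 0]), 0, W - 1).astype(int)
    yd = np.clip(np.rint(yy + fwd[..., 1]), 0, H - 1).astype(int)
    round_trip = fwd + bwd[yd, xd]                       # ~0 where the flow is consistent
    return np.linalg.norm(round_trip, axis=-1) > thresh  # True = treat pixel as occluded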
Open Questions
• Theorem: random measurements stably embed a K-dim manifold with high probability [Baraniuk, Wakin, FOCM '08]
• Q: Is there an analogous result for OFMs?
Tools for manifold processing
• Smooth differential manifold model (algebraic manifolds): geodesics, exponential maps, log-maps, Riemannian metrics, Karcher means, …
• Point cloud model (data manifolds): LLE, kNN graphs
Concise Models
• Efficient processing / compression requires a concise representation
• Sparsity of an individual image
[Figure: pixel image vs. large wavelet coefficients (blue = 0)]
History of Optical Flow
• Dark ages (<1985)
  – special cases solved
  – linearized brightness constancy: an under-determined set of linear equations (see below)
• Horn and Schunck (1985)
  – regularization term: smoothness prior on the flow
• Brox et al. (2005)
  – shows that linearization of brightness constancy (BC) is a bad assumption
  – develops an optimization framework to handle BC directly
• Brox et al. (2010), Black et al. (2010), Liu et al. (2010)
  – practical systems with reliable code
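For reference, the standard linearized brightness constancy step (not reproduced in the transcript): one equation per pixel in the two unknowns (u, v), hence under-determined without a smoothness prior:

I(x+u,\; y+v,\; t+1) \;=\; I(x, y, t)
\quad\xrightarrow{\text{linearize}}\quad
I_x\, u + I_y\, v + I_t \;=\; 0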
ISOMAP embedding error for OFM and IAM
• 2D rotations
[Figure: reference image]

Manifold Learning
[Plot: two-dimensional Isomap embedding (with neighborhood graph)]

Embedding of OFM
• 2D rotations
[Figure: reference image]
OFM Implementation details
[Figure: reference image]
[Plots: pairwise distances and 2D embedding; Isomap residual variance vs. dimensionality; flow embedding]
OFM Synthesis
• Goal: build a generative model for an entire IAM/OFM based on a small number of base images
• eLCheapo™ algorithm (sketched below):
  – choose a reference image randomly
  – find all images that can be generated from this image by OF
  – compute the Karcher (geodesic) mean of these images
  – compute OF from the Karcher mean image to the other images
  – repeat on the remaining images until no images remain
• Exact representation when there are no occlusions
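A Python sketch of this greedy loop; flow(), karcher_mean() and reachable() are hypothetical caller-supplied helpers, not functions from the authors' code:

import numpy as np

def elcheapo_chart(images, flow, karcher_mean, reachable, rng=np.random.default_rng(0)):
    # assumes reachable(ref, ref) is True, so each pass covers at least one image
    charts, remaining = [], list(range(len(images)))
    while remaining:
        ref = images[rng.choice(remaining)]                              # 1. random reference
        covered = [i for i in remaining if reachable(ref, images[i])]    # 2. images OF can generate
        base = karcher_mean([images[i] for i in covered])                # 3. Karcher (geodesic) mean
        charts.append((base, [flow(base, images[i]) for i in covered]))  # 4. OFs from the mean image
        remaining = [i for i in remaining if i not in covered]           # 5. repeat until none remain
    return charts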
Manifold Charting
• Goal: build a generative model for an entire IAM/OFM based on a small number of base images
• Ex: cube rotating about an axis. All cube images can be represented using 4 reference images + OFMs
• Many applications
  – selection of target templates for classification
  – "next-view" selection for adaptive sensing applications
[Figure: greedy charting vs. optimal charting of the IAM at θ = 0°, ±90°, ±180°]
SIFT Features
• Features (including SIFT) are ubiquitous in fusion and processing apps (15k+ citations for the 2 SIFT papers)
• Applications: building 3D models, part-based object recognition, organizing internet-scale databases, image stitching
• Figures courtesy Rob Fergus (NYU), the Phototourism website, Antonio Torralba (MIT), and Wei Lu
Many Possible Kernels
• Euclidean kernel
• Gaussian kernel
• Polynomial kernel
• Pyramid match kernel [Grauman et al. '07]
• Many others

Keypoint Kernel
• Joint keypoint kernel between two images is given by combining the location and descriptor kernels
• Using the Euclidean/Gaussian (E/G) combination yields the kernel used in the results below
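The exact kernel formula appears only as an image in the original slides; the following is a plausible E/G sum-match form consistent with the description (an assumption, not the verbatim definition), together with the induced metric discussed next:

import numpy as np

def keypoint_kernel(loc1, desc1, loc2, desc2, sigma=10.0):
    # assumed E/G combination: Gaussian kernel on keypoint locations times a
    # Euclidean inner-product kernel on descriptors, summed over all keypoint pairs
    d2 = ((loc1[:, None, :] - loc2[None, :, :]) ** 2).sum(-1)   # pairwise squared location distances
    K_loc = np.exp(-d2 / (2 * sigma ** 2))
    K_desc = desc1 @ desc2.T
    return float((K_loc * K_desc).sum())    # no combinatorial keypoint matching required

def keypoint_metric(img1, img2):
    # metric induced by any Mercer kernel: d^2 = K(1,1) + K(2,2) - 2 K(1,2)
    k11 = keypoint_kernel(*img1, *img1)
    k22 = keypoint_kernel(*img2, *img2)
    k12 = keypoint_kernel(*img1, *img2)
    return np.sqrt(max(k11 + k22 - 2.0 * k12, 0.0))

# img1 = (locations1, descriptors1) as numpy arrays, e.g. from the SIFT sketch earlier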
From Kernel to Metric
Theorem: The E/G keypoint kernel is a Mercer kernel
  – enables algorithms such as SVM
Theorem: The E/G keypoint kernel induces a metric on the space of images
  – alternative to the conventional L2 distance between images
  – keypoint metric robust to nuisance imaging parameters, occlusion, clutter, etc.

Keypoint Geometry
Theorem: The E/G keypoint kernel induces a metric on the space of images
NOTE: The metric requires no permutation-based matching of keypoints (combinatorial complexity in general)
Keypoint Geometry
Theorem: Under the metric induced by the kernel, certain ensembles of articulating images form smooth, isometric manifolds
• Keypoint representation compact, efficient, and …
• Robust to illumination variations, non-stationary backgrounds, clutter, occlusions
Manifold Learning in the Wild
• Viewing angle: 179 images
[Embeddings: IAM vs. KAM]
Manifold Learning in the Wild
• Rice University's Brochstein Pavilion
  – 400 outdoor images of a building
  – occlusions, movement in foreground, varying background
[Embeddings: IAM vs. KAM]