
Page 1: STRATEGIES FOR VISUAL RECOGNITION

Donald GEMAN
Dept. of Applied Mathematics and Statistics
Center for Imaging Science
Johns Hopkins University

Page 2: Outline

General Orientation within Imaging
Semantic Scene Interpretation
Three Paradigms, with Examples:
  Generative
  Predictive
  Hierarchical

Critique and Conclusions

Page 3: Sensors to Images

Constructing images from measured data.

Examples:
  Ordinary visible-light cameras
  Computed tomography (CT, SPECT, PET, MRI)
  Ultrasound, molecular imaging, etc.

Mathematical Tools:
  Harmonic analysis
  Partial differential equations
  Poisson processes

Page 4: Images to Images/Surfaces

Transform images to more compact or informative data structures.

Examples:
  Restoration (de-noising, de-blurring, inpainting)
  Compression
  Shape-from-shading

Mathematical Tools:
  Harmonic analysis
  Regularization theory and variational methods
  Bayesian inference, graphical models, MCMC

Page 5: Image to Words

Semantic and structural interpretations of images.

Examples:
  Selective attention, figure/ground separation
  Object detection and classification
  Scene categorization

Mathematical Tools:
  Distributions on grammars, graphs, transformations
  Computational learning theory
  Shape spaces
  Geometric and algebraic invariants

Page 6: Semantic Scene Interpretation

Understanding how brains interpret sensory data, or how computers might do so, is a major challenge.

Here: a single greyscale image. No cues from color, motion, or depth, although these are likely crucial to biological learning.

There is an objective reality Y(I), at least at the level of keywords.

Page 7: Dreaming

A description machine $f : I \to Y$, from an image $I$ to a description $Y$ of the underlying scene.

Better yet: a sequence of increasingly fine interpretations, perhaps "nested": $Y = (Y_1, Y_2, \ldots)$.

Page 8: More Dreaming

ACCURACY: $\hat{Y}(I) = Y(I)$ for most images.

LEARNING: There is an explicit set of instructions for building $\hat{Y}$, involving samples from a learning set $L = (I_1, Y_1), \ldots, (I_n, Y_n)$.

EXECUTION: There is an explicit set of instructions for evaluating $\hat{Y}(I)$ with as little computation as possible.

ANALYSIS: There is "supporting theory" which guides construction and predicts performance.

Page 9: Detecting Boats

Page 10: Where Are the Faces? Whose?

Page 11: Within-Class Variability

Page 12: How Many Samples Are Necessary?

Page 13: Recognizing Context

Page 14: Many Levels of Description

Page 15: Confounding Factors

Local (but not global) ambiguity
Complexity: there are so many things to look for!
Arbitrary views and lighting
Clutter: the alternative hypothesis is not "white noise"
Knowledge: somehow quantify
  Domination of clutter
  Invariance of object names under transforms
  Regularity of the physical world

Page 16: Confounding Factors (cont)

Scene interpretation is an infinite-dimensional classification problem.

Is segmentation/grouping performed before, during or after recognition?

No advances in computers or statistical learning will overcome the small-sample dilemma.

Some organizational framework is unavoidable.

Page 17: Small-Sample Computational Learning

$L = (x_1, y_1), \ldots, (x_n, y_n)$: training set for inductive learning.
$x_i \in X$: measurement or feature vector.
$y_i \in Y$: true label or explanation of $x_i$.

Examples:
  $X$: acoustic speech signals; $Y$: transcription into words.
  $X$: natural images; $Y$: semantic description.

Common property: $n$ is very small relative to the effective dimensions of $X$ and $Y$.

Page 18: Three Paradigms

Generative: Centered on a joint statistical model for features X and interpretations Y.

Predictive: Proceed (almost) directly from data to decision boundaries.

Hierarchical: Exploit shared features among objects and interpretations.

Page 19: Generative Modeling

The world is very special: not all explanations and observations are equally likely. Capture regularities with stochastic models.

Learning and decision-making are based on P(X,Y), derived from:
  A prior distribution P(Y) on interpretations, accounting for a priori knowledge and expectations.
  A conditional data model P(X|Y), accounting for visual appearance.

Inference principle: given X, choose the interpretation Y which maximizes P(Y|X).
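
As a toy illustration of this inference principle (not from the talk), the sketch below scores each interpretation by log P(X|Y) + log P(Y) and returns the maximizer. The two interpretations, their prior, and the Gaussian data models are invented placeholders.

```python
# Toy MAP inference: choose Y maximizing P(Y|X), i.e. P(X|Y) P(Y).
# All distributions here are invented placeholders, not the talk's models.
import numpy as np
from scipy.stats import multivariate_normal

prior = {"object": 0.1, "background": 0.9}            # P(Y)
models = {                                            # P(X|Y)
    "object": multivariate_normal(mean=[2.0, 2.0], cov=np.eye(2)),
    "background": multivariate_normal(mean=[0.0, 0.0], cov=4.0 * np.eye(2)),
}

def map_interpretation(x):
    scores = {y: models[y].logpdf(x) + np.log(p) for y, p in prior.items()}
    return max(scores, key=scores.get)

print(map_interpretation([1.8, 2.2]))    # -> object
print(map_interpretation([0.1, -0.3]))   # -> background
```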

Page 20: Generative Modeling: Examples

Deformable templates:
  Prior on transformations
  Template + noise data model

Hidden Markov models
Probabilities on grammars and production rules
Graphical models, e.g., Bayesian networks
LDA, etc.
Gaussian mixtures

Page 21: Gaussian Part/Appearance Model

Y: shape class, with prior p(y).
Z: locations of object "parts."
X = X(I): features whose components capture local topography (interest points, edges, wavelets).

Compound Gaussian model:
  p(z|y): multivariate normal with mean m(y) and covariance C(y).
  p(x|z,y): multivariate normal with mean m(z,y) and covariance C(z,y).

Estimate Y as arg max p(z,y|x) = arg max p(x|z,y) p(z|y) p(y).
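
A minimal sketch of MAP estimation in this compound model, maximizing log p(x|z,y) + log p(z|y) + log p(y) over a grid of candidate part locations. The classes, means, covariances, and grid below are invented for illustration.

```python
# MAP estimate of (z, y) in the compound Gaussian part/appearance model.
# Parameters below are invented placeholders.
import numpy as np
from scipy.stats import multivariate_normal

classes = ["face", "car"]
prior = {"face": 0.5, "car": 0.5}                         # p(y)
m_z = {"face": np.zeros(2), "car": np.array([3.0, 0.0])}  # p(z|y) ~ N(m(y), C(y))
C_z = {y: np.eye(2) for y in classes}

def appearance(z, y):
    # p(x|z,y) ~ N(m(z,y), C(z,y)); here m(z,y) = z for simplicity (assumption)
    return multivariate_normal(mean=z, cov=0.5 * np.eye(2))

def map_estimate(x, candidates):
    best = None
    for y in classes:
        p_z = multivariate_normal(mean=m_z[y], cov=C_z[y])
        for z in candidates:
            s = appearance(z, y).logpdf(x) + p_z.logpdf(z) + np.log(prior[y])
            if best is None or s > best[0]:
                best = (s, y, z)
    return best

grid = [np.array([i / 2, j / 2]) for i in range(-2, 9) for j in range(-4, 5)]
print(map_estimate(np.array([2.9, 0.1]), grid))  # -> ("car", z near (3, 0))
```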

Page 22: Generative Modeling: Critique

In principle, a very general framework. In practice:
  Diabolically hard to model P(Y).
  Intensive computation with P(Y|X).
  P(X|Y) alone amounts to "templates for everything," which lacks power and requires infinite computation.

Page 23: Predictive Learning

Do not solve a more difficult problem than is necessary; ultimately only a decision boundary is needed.

Representation and learning:
  Replace I by a fixed-length feature vector X.
  Quantize Y to a finite number of classes 1, 2, ..., C.
  Specify a family F of "classifiers" f(X).
  Induce f(X) directly from a training set L.

Often does require some modeling.

Page 24: Predictive Learning: Examples

Examples which, in effect, learn P(Y|X) directly and apply the Bayes rule:
  Artificial neural networks
  k-NN with smart metrics (e.g., "shape context")
  Decision trees
  Support vector machines (interpretation as the Bayes rule via logistic regression)
  Multiple classifiers (e.g., random forests)
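
A minimal sketch of the predictive route with one of the methods listed above, a random forest; the synthetic Gaussian features are stand-ins for real image measurements X.

```python
# Predictive learning in miniature: induce a classifier f(X) directly from
# a training set L, with no model of P(X,Y). Data are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 8)), rng.normal(1, 1, (100, 8))])
y = np.array([0] * 100 + [1] * 100)          # Y quantized to C = 2 classes

f = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
x_new = rng.normal(1, 1, (3, 8))
print(f.predict(x_new))                      # hard decisions
print(f.predict_proba(x_new))                # in effect, estimates of P(Y|X)
```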

Page 25: Support Vector Machines

Let $L = (x_1, y_1), \ldots, (x_n, y_n)$ be a training set generated i.i.d. according to P(X,Y), with labels $y_i \in \{-1, +1\}$.

A separating hyperplane $w^t x + b = 0$ has margin boundaries $w^t x + b = \pm 1$ and margin width $2 / \|w\|$.

Maximize the margin:
$$\min_{w,b} \tfrac{1}{2}\, w^t w \quad \text{s.t.} \quad y_i (w^t x_i + b) - 1 \ge 0 \ \ \forall i$$

Equivalently, in the dual:
$$\max_{\alpha} \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, \langle x_i, x_j \rangle \quad \text{s.t.} \quad \alpha_i \ge 0 \ \forall i \ \text{ and } \ \sum_i \alpha_i y_i = 0$$
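
A minimal sketch of this margin maximization, delegating the optimization to scikit-learn (a tooling choice, not part of the talk); $w = \sum_i \alpha_i y_i x_i$ is recovered from the fit and the geometric margin read off as $2/\|w\|$.

```python
# Hard-margin linear SVM on separable synthetic data; recover w, b and the
# margin 2/||w|| from the fitted model. Data and C value are placeholders.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

svm = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C ~ hard margin
w, b = svm.coef_[0], svm.intercept_[0]        # w = sum_i alpha_i y_i x_i
print("margin 2/||w|| =", 2 / np.linalg.norm(w))
print("all constraints satisfied:", np.all(y * (X @ w + b) >= 1 - 1e-6))
```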

Page 26: SVM (cont)

The classification function:
$$f(x) = w^t x + b = \sum_i \alpha_i y_i \langle x_i, x \rangle + b$$

Data in the input space are mapped into a higher-dimensional space, where linear separability holds: $x \mapsto \Phi(x)$.

Page 27: SVM (cont)

The optimization problem and the classification function are similar to the linear case; the scalar product is replaced by a kernel $K(x, x') = \langle \Phi(x), \Phi(x') \rangle$:
$$\max_{\alpha} \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, \langle \Phi(x_i), \Phi(x_j) \rangle \quad \text{s.t.} \quad \alpha_i \ge 0 \ \forall i \ \text{ and } \ \sum_i \alpha_i y_i = 0$$

$$f(x) = w^t \Phi(x) + b = \sum_i \alpha_i y_i \langle \Phi(x_i), \Phi(x) \rangle + b$$
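
A minimal sketch of the kernel trick in action: an RBF SVM is fit, and its decision value $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$ is recomputed by hand from the stored dual coefficients and support vectors, confirming that $\Phi$ never has to be formed explicitly. Data and kernel parameters are invented.

```python
# Kernel SVM: f(x) = sum_i alpha_i y_i K(x_i, x) + b, evaluated without
# ever constructing Phi(x). Data and parameters are placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(2)
X = rng.normal(0, 1, (200, 2))
y = np.where(np.linalg.norm(X, axis=1) < 1.0, 1, -1)   # non-linear boundary

svm = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X, y)

x_new = np.array([[0.2, 0.1]])
K = rbf_kernel(svm.support_vectors_, x_new, gamma=1.0)  # K(x_i, x)
f = svm.dual_coef_ @ K + svm.intercept_                 # sum_i alpha_i y_i K + b
print(f.item(), svm.decision_function(x_new).item())    # identical values
```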

Page 28: Predictive Learning: Critique

In principle, universal learning machines which could mimic natural processes and "learn" invariance from enough examples.

In practice, they lack a global organizing principle to confront:
  A very large number of classes (say 30,000)
  The small-sample dilemma
  The complexity of clutter
  Excessive computation

Page 29: Hierarchical Modeling

The world is very special: vision is only possible due to its hierarchical organization into common parts and sub-interpretations.

Determine common visual structure by:
  Clustering images;
  Information-theoretic criteria (e.g., mutual information) to select common patches (see the sketch below);
  Building classifiers (e.g., decision trees or multi-class boosting);
  Constructing grammars.
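
As the sketch referenced above: one concrete way to apply such information-theoretic criteria is to rank binary patch-presence features by mutual information with the class label and keep the top scorers as shared parts. Everything here (the data, the single informative patch) is synthetic.

```python
# Rank candidate patches by mutual information with the class label.
# The data are synthetic; patch 0 is built to track the label.
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(3)
labels = rng.integers(0, 2, 500)                    # object class per image
patches = rng.integers(0, 2, (500, 20))             # 20 "patch present" bits
patches[:, 0] = labels ^ (rng.random(500) < 0.1)    # patch 0: noisy copy of label

mi = [mutual_info_score(labels, patches[:, j]) for j in range(20)]
print("most informative patches:", np.argsort(mi)[::-1][:3])   # 0 ranks first
```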

Page 30: Hierarchical: Examples

Compositional vision: a "theory of reusable parts."
Hierarchies of image patches or fragments.
Algorithmic modeling: a coarse-to-fine representation of the computational process.

Page 31: Hierarchical Indexing

Coarse-to-fine modeling of both the interpretations and the computational process:
  Unites representation and processing.
  Proceed from broad scope with low power to narrow scope with high power.
  Concentrate processing on ambiguous areas.
  Evidence that coarse information is conveyed earlier than fine information in neural responses to visual stimuli.

Page 32: Hierarchical Indexing (cont)

Estimate $Y$ by exploring a family of binary tests $\{X_A : A \in H\}$, where $H$ is a hierarchy of nested partitions of $Y$, and $X_A$ is a binary test for $Y \in A$ vs. $Y \notin A$.

Index $D$: the explanations $y \in Y$ not ruled out by any test:
$$D = \{\, y \in Y : X_A = 1 \ \text{for every} \ A \in H \ \text{with} \ y \in A \,\}$$

Page 33: Hierarchical Indexing (cont)

A recursive partitioning of Y with four levels; there is a binary test for each of the 15 cells.

(A): Positive tests are shown in black.
(B): The index is the union of leaves 3 and 4.
(C): The "trace" of coarse-to-fine search.
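
A minimal sketch of this search on the same four-level, 15-cell example: a cell's subtree is explored only while its binary test is positive, and the index D collects the surviving leaves. The oracle test below is invented so that, as in panel (B), the index comes out as the union of leaves 3 and 4.

```python
# Coarse-to-fine search over a hierarchy of nested partitions: prune a whole
# subtree as soon as a cell's binary test is negative. The test is an
# invented oracle chosen so the index D = {3, 4}, matching panel (B) above.
def ctf_index(cell, children, test):
    if not test(cell):
        return set()                        # cell ruled out: skip its subtree
    kids = children(cell)
    if not kids:
        return set(cell)                    # surviving leaf joins the index D
    return set().union(*(ctf_index(k, children, test) for k in kids))

def children(cell):                         # split each cell in half; 4 levels
    c = sorted(cell)
    return [] if len(c) == 1 else [frozenset(c[:len(c)//2]), frozenset(c[len(c)//2:])]

truth = {3, 4}
test = lambda cell: bool(cell & truth)
print(ctf_index(frozenset(range(1, 9)), children, test))   # -> {3, 4}
```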

Page 34: When is CTF Optimal?

c(A) = cost, p(A) = power of the test for cell A in H.
c* = cost of a perfect test for a single hypothesis.

The mean cost of a sequential testing strategy T is
$$EC(T) = \sum_{A} c(A)\, q_A(T) + c^*\, E|D|$$
where $q_A(T)$ is the probability of performing test $X_A$.

THEOREM (G. Blanchard/DG): CTF is optimal if
$$\frac{c(A)}{p(A)} \le \frac{c(B)}{p(B)} \quad \text{for all } B \in C(A),$$
where C(A) = the direct children of A in H.
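
A Monte Carlo sketch of the mean-cost formula: the expected cost of CTF search on a complete binary hierarchy, under the simplifying (invented) assumptions that every test fires independently with probability p, every cell costs c, and each surviving hypothesis in D pays the perfect-test cost c*.

```python
# Estimate EC(T) = sum_A c(A) q_A(T) + c* E|D| by simulating CTF search.
# Independence of tests, and all numbers, are simplifying assumptions.
import random

def mean_cost(depth=3, c=1.0, c_star=5.0, p=0.3, trials=20000):
    def visit(level):
        # pay c to run this cell's test; a negative prunes the whole subtree
        if random.random() >= p:
            return c, 0
        if level == depth:
            return c, 1                          # surviving leaf enters D
        cl, dl = visit(level + 1)
        cr, dr = visit(level + 1)
        return c + cl + cr, dl + dr
    random.seed(0)
    runs = [visit(0) for _ in range(trials)]
    return sum(cost + c_star * d for cost, d in runs) / trials

print(mean_cost())   # cheap on average: most subtrees are pruned near the root
```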

Page 35: Density of Work

Figure panels: original image; spatial concentration of processing.

Page 36: Modeling vs. Learning: Variations on the Bias-Variance Dilemma

Reduce variance (dependence on L) by introducing the "right" biases (a priori structure), or by introducing more complexity?

Is dimensionality a "curse" or a "blessing"?

Hard-wiring vs. tabula rasa.

|L| small vs. large:
  "Credit" for learning with small L?
  Is the interesting limit |L| going to infinity or to zero?

Page 37: Conclusions

Automatic scene interpretation remains elusive.

However, there is growing success with particular object categories (e.g., vehicles and faces) and in many industrial applications (e.g., wafer inspection).

No dominant mathematical framework, and the “right” one is unclear.

Few theoretical results outside classification.

Page 38: Naïve Bayes

Map I to a feature vector X:
  Boolean edges
  Wavelet coefficients
  Interest points

Assume the components of X are conditionally independent given Y.

Learn the marginal distributions under object and background hypotheses from data.

Uniform prior P(Y).

Perform a likelihood-ratio test to detect objects against background.
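
A minimal sketch of this detector with Boolean edge features: the marginals are learned as empirical frequencies under each hypothesis, and the log-likelihood ratio decides, with threshold 0 matching the uniform prior. The training data and edge probabilities are invented.

```python
# Naive-Bayes detection: conditionally independent Boolean edges, marginals
# learned from data, then a likelihood-ratio test. All data are synthetic.
import numpy as np

rng = np.random.default_rng(4)
obj = rng.random((500, 32)) < 0.7                 # edges on object examples
bg = rng.random((500, 32)) < 0.2                  # edges on background

p_obj = obj.mean(axis=0).clip(1e-3, 1 - 1e-3)     # P(X_j = 1 | object)
p_bg = bg.mean(axis=0).clip(1e-3, 1 - 1e-3)       # P(X_j = 1 | background)

def log_lr(x):
    # sum over components of log P(x_j | object) - log P(x_j | background)
    return np.sum(np.where(x, np.log(p_obj / p_bg),
                              np.log((1 - p_obj) / (1 - p_bg))))

x = rng.random(32) < 0.7                          # an object-like test window
print("detect object:", log_lr(x) > 0)            # 0 = uniform-prior threshold
```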