8.0 object recognition - cs.cmu.edu16385/s17/slides/8.0_object_recognition.pdf · object...

52
Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and Davis. Shape recognition using hierarchical Constraint Analysis. 1979

Upload: others

Post on 15-Jul-2020

12 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Object Recognition16-385 Computer Vision (Kris Kitani)

Carnegie Mellon University

Henderson and Davis. Shape recognition using hierarchical

Constraint Analysis. 1979

Page 2: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

What do we mean by ‘object recognition’?

Page 3: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Is this a street light?(Verification / classification)

Page 4: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Where are the people?(Detection)

Page 5: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Is that Potala palace?(Identification)

Page 6: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

What’s in the scene?(semantic segmentation)

Building

Mountain

Trees

VendorsPeople

Ground

Sky

Page 7: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

What type of scene is it?(Scene categorization)

Outdoor

City

Marketplace

Page 8: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Challenges (Object Recognition)

Page 9: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Viewpoint variation

Page 10: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Illumination variation

Page 11: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Scale variation

Page 12: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Background clutter

Page 13: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Deformation

Page 14: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Occlusion

Page 15: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Intra-class variation

Page 16: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Common approaches

Page 17: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Spatial reasoning

Window classification

Feature Matching

Common approaches: object recognition

Page 18: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Feature matching

Page 19: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

What object do these parts belong to?

Page 20: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

a collection of local features (bag-of-features)

An object as

Some local feature are very informative

Are the positions of the parts important?

• deals well with occlusion • scale invariant • rotation invariant

Page 21: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Pros

• Simple

• Efficient algorithms

• Robust to deformations

Cons

• No spatial reasoning

Page 22: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Spatial reasoning

Window classification

Feature Matching

Common approaches: object recognition

Page 23: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Spatial reasoning

Page 24: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

The position of every part depends on the positions of all the other parts

positional dependence

Many parts, many dependencies!

Page 25: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

1. Extract features 2. Match features 3. Spatial verification

Page 26: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

1. Extract features 2. Match features 3. Spatial verification

Page 27: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

1. Extract features 2. Match features 3. Spatial verification

an old idea…

Page 28: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Fu and Booth. Grammatical Inference. 1975

Structural (grammatical) descriptionScene

Page 29: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and
Page 30: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

1972

Description for left edge of face

Page 31: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

vector of RVs: set of part locations L = {L1, L2, . . . , LM}

RV

A more modern probabilistic approach…

think of locations as random variables (RV)

RV RV

Page 32: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

vector of RVs: set of part locations L = {L1, L2, . . . , LM}

What are the dimensions of R.V. L?

How many possible combinations of part locations?

RV

A more modern probabilistic approach…

think of locations as random variables (RV)

RV RV

L1

L2

LM

image (N pixels)

Page 33: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

vector of RVs: set of part locations L = {L1, L2, . . . , LM}

What are the dimensions of R.V. L?

How many possible combinations of part locations?

RV

A more modern probabilistic approach…

think of locations as random variables (RV)

RV RV

L1

L2

LM

image

Lm = [ x y ]

Page 34: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

vector of RVs: set of part locations L = {L1, L2, . . . , LM}

What are the dimensions of R.V. L?

How many possible combinations of part locations?

NM

RV

A more modern probabilistic approach…

think of locations as random variables (RV)

RV RV

Lm = [ x y ]

L1

L2

LM

image

Page 35: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Most likely set of locations L is found by maximizing:

Likelihood: How likely it is to observe

image I given that the M parts are at locations L

(scaled output of a classifier)

Prior: spatial prior controls the

geometric configuration of the parts

What kind of prior can we formulate?

p(L|I) / p(I|L)p(L)part

locations image

Posterior

Page 36: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Given any collection of selfie images, where would you expect the nose to be?

P (Lnose

) =?

What would be an appropriate prior?

Page 37: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

A simple factorized model

Break up the joint probability into smaller (independent) terms

p(L) =Y

m

p(Lm)

Page 38: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Independent locations

Each feature is allowed to move independently

Does not model the relative location of parts at all

p(L) =Y

m

p(Lm)

Page 39: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Tree structure (star model)

Represent the location of all the parts relative to a single

reference part

Assumes that one reference part is defined

(who will decide this?)

Root (reference)

node

p(L) = p(Lroot

)M�1Y

m=1

p(Lm|Lroot

)

Page 40: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Fully connected (constellation model)

Explicitly represents the joint distribution of locations

Good model: Models relative location of parts

BUT Intractable for moderate number of parts

positional dependence

p(L) = p(l1, . . . , lN )

Page 41: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Pros

• Retains spatial constraints

• Robust to deformations

Cons

• Computationally expensive

• Generalization to large inter-class variation (e.g., modeling chairs)

Page 42: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Spatial reasoning

Window classification

Feature Matching

Page 43: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Window-based

Page 44: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

1. get image window 2. extract features 3. classify

When does this work and when does it fail?

How many templates do you need?

Template Matching

Page 45: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Per-exemplar

find the ‘nearest’ exemplar, inherit its label

exemplar template top hits from test data

Page 46: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

1. get image window(or region proposals)

2. extract features 3. compare to template

Template Matching

Do this part with one big classifier ‘end to end learning’

Page 47: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Convolutional Neural Networks

Convolution Pooling

Image patch (raw pixels values)

response of one ‘filter’

max/min response over

a region

A 96 x 96 image convolved with 400 filters (features) of size 8 x 8 generates about 3

million values (892x400)

Pooling aggregates statistics and lowers the dimension of convolution

Image patch (raw pixels values)

response of one ‘filter’

Page 48: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and
Page 49: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

630 million connections 60 millions parameters to learn

Krizhevsky, A., Sutskever, I. and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012.

224/4=56

96 ‘filters’

Page 50: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Pros

• Retains spatial constraints

• Efficient test time performance

Cons

• Many many possible windows to evaluate

• Requires large amounts of data

• Sometimes (very) slow to train

Page 51: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

How to write an effective CV resume

Page 52: 8.0 Object Recognition - cs.cmu.edu16385/s17/Slides/8.0_Object_Recognition.pdf · Object Recognition 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Henderson and

Deep Learning+1-DEEP-LEARNING deeplearning@deeplearning http://deeplearning

Summary: Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning

Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning

Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning

Deep Learning

Experience: Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning

Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning

Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning

Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning

Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning

Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning

Education

Deep Learning Deep Learning ?Deep Learning Deep Learning Deep Learning

Experience

Deep Learning Deep Learning .Deep Learning Deep Learning, Deep Learning

· Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning

· Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep

Learning Deep Learning Deep Learning

· Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep

Learning Deep Learning

Deep Learning in another country Deep LearningDeep Learning , Deep Learning , Deep Learning

· Deep Learning ... wait.. Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning

Deep Learning Deep Learning Deep Learning

· Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep

Learning Deep Learning

Deep Learning Deep LearningDeep Learning Deep Learning

· Very Deep Learning

Publications

1. Deep Learning in Deep Learning People who do Deep Learning things. Conference of Deep

Learning.

2. Shallow Learning... Nawww.. Deep Learning bruh Under submission while Deep Learning

Patent

1. System and Method for Deep Learning. Deep Learning, Deep Learning , Deep Learning , Deep

Learning

1