university of pennsylvania 1 grasp cis 580 machine perception fall 2004 jianbo shi cis580 object...

35
University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi http://www.seas.upenn.edu/~cis580 Object recognition

Upload: byron-hunt

Post on 17-Dec-2015

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

1GRASP

CIS 580 Machine Perception

Fall 2004

Jianbo Shi

http://www.seas.upenn.edu/~cis580

Object recognition

Page 2: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

2GRASP

Problem Definition

1. An Image is a set of 2D geometric features, along with positions.

2. An Object is a set of 2D/3D geometric features, along with positions.

3. A pose positions the object relative to the image. 2D Translation; 2D translation + rotation; 2D translation, rotation and scale; planar or 3D object positioned in 3D with perspective or scaled

orthorgraphic

4. The best pose places the object features nearest the image features

Page 3: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

3GRASP

Strategy

1. Build feature descriptions

2. Search possible poses. Can search space of poses Or search feature matches, which produce pose

3. Transform model by pose.

4. Compare transformed model and image.

5. Pick best pose.

Page 4: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

4GRASP

Overview

1. We will discuss feature selection in 2 lectures2. Pose error measurement3. search methods for 2D.4. discuss search for 3D objects.

Page 5: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

5GRASP

Feature example: edges

Page 6: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

6GRASP

Evaluate Pose

We look at this first, since it defines the problem.

Again, no perfect measure; Trade-offs between veracity of measure and computational considerations.

Page 7: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

7GRASP

Chamfer Matching

∑i

d

For every edge point in the transformed object, compute the distance to the nearest image edge point. Sum distances.

||),||||,,||||,,min(||21 1 mii

n

i iqpqpqp K∑

=

Page 8: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

8GRASP

Main Feature:

• Every model point matches an image point.

• An image point can match 0, 1, or more model points.

Page 9: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

9GRASP

Variations

Sum a different distance f(d) = d2 or Manhattan distance. f(d) = 1 if d < threshold, 0 otherwise.

This is called bounded error.

Use maximum distance instead of sum. This is called: directed Hausdorff distance.

Use other features Corners. Lines. Then position and angles of lines must be similar.

Model line may be subset of image line.

Page 10: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

10GRASP

Other comparisons

• Enforce each image feature can match only one model feature.

• Enforce continuity, ordering along curves.

These are more complex to optimize.

Page 11: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

11GRASP

Overview

1. We will discuss feature selection in 2 lectures2. Pose error measurement3. search methods for 2D.4. discuss search for 3D objects.

Page 12: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

12GRASP

Pose Search

Simplest approach is to try every pose.

Two problems: many poses, costly to evaluate each.

We can reduce the second problem with:

Page 13: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

13GRASP

Pose: Chamfer Matching with the Distance Transform

00

0 0000

11

1

11

1

11

1111 1

1

22

22

2222

22

22 33

3333

33 4

Example: Each pixel has (Manhattan) distance to nearest edge pixel.

Page 14: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

14GRASP

Computing Distance Transform

• It’s only done once, per problem, not once per pose.• Basically a shortest path problem.• Simple solution passing through image once for each

distance. First pass mark edges 0. Second, mark 1 anything next to 0, unless it’s already marked.

Etc….

• Actually, a more clever method requires 2 passes.

Page 15: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

15GRASP

Pose: Ransac

Match enough features in model to features in image to determine pose.

Examples: match a point and determine translation. match a corner and determine translation and rotation. Points and translation, rotation, scaling? Lines and rotation and translation?

Page 16: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

16GRASP

Page 17: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

17GRASP

Pose: Generalized Hough Transform

Like Hough Transform, but for general shapes.

Example: match one point to one point, and for

every rotation of the object its translation is

determined.

Page 18: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

18GRASP

Overview

1. We will discuss feature selection in 2 lectures2. Pose error measurement3. search methods for 2D objects.4. discuss search for 3D objects.

Page 19: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

19GRASP

Computing Pose: Points

1. Solve W= S*R. In Structure-from-Motion, we knew W.

In Recognition, we know R.

2. This is just set of linear equations Ok, maybe with some non-linear constraints.

Page 20: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

20GRASP

Linear Pose: 2D Translation

⎟⎟⎟

⎜⎜⎜

⎟⎟⎠

⎞⎜⎜⎝

⎛=⎟

⎞⎜⎝

111

...

10

01...21

21

21

21

n

n

y

x

n

n yyy

xxx

t

t

vvv

uuu

We know x,y,u,v, need to find translation.

For one point, u1 - x1 = tx ; v1 - x1 = ty

For more points we can solve a least squares problem.

Page 21: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

21GRASP

Linear Pose: 2D rotation, translation and scale

θθ

θθθθ

sin,coswith

111

...

111

...

cossin

sincos...

21

21

21

21

21

21

sbsa

yyy

xxx

tab

tba

yyy

xxx

t

ts

vvv

uuu

n

n

y

x

n

n

y

x

n

n

==

⎟⎟⎟

⎜⎜⎜

⎟⎟⎠

⎞⎜⎜⎝

⎛−

⎟⎟⎟

⎜⎜⎜

⎟⎟⎠

⎞⎜⎜⎝

⎛−

=⎟⎠

⎞⎜⎝

• Notice a and b can take on any values.

• Equations linear in a, b, translation.

• Solve exactly with 2 points, or overconstrained system with more.

sabas =+= θcos22

Page 22: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

22GRASP

Linear Pose: 3D Affine

⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜

⎟⎟⎠

⎞⎜⎜⎝

⎛=⎟

⎞⎜⎝

111

...

...

21

21

21

3,22,21,2

3,12,11,1

21

21

n

n

n

y

x

n

n

zzz

yyy

xxx

tsss

tsss

vvv

uuu

Page 23: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

23GRASP

Pose: Scaled Orthographic Projection of Planar points

⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜

⎟⎟⎠

⎞⎜⎜⎝

⎛=⎟

⎞⎜⎝

111

000

...

...21

21

3,22,21,2

3,12,11,1

21

21 n

n

y

x

n

nyyy

xxx

tsss

tsss

vvv

uuu

s1,3, s2,3 disappear. Non-linear constraints disappear with them.

Page 24: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

24GRASP

Non-linear pose

• A bit trickier. Some results:• 2D rotation and translation. Need 2 points.• 3D scaled orthographic. Need 3 points, give 2 solutions.

• 3D perspective, camera known. Need 3 points. Solve 4th

degree polynomial. 4 solutions.

Page 25: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

25GRASP

Transforming the Object

⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜

⎟⎠

⎞⎜⎝

⎛=⎟⎠

⎞⎜⎝

111

...

????

????

???

???

21

21

21

4321

4321

n

n

n

zzz

yyy

xxx

vvvv

uuuu

⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜

⎟⎟⎠

⎞⎜⎜⎝

⎛=⎟

⎞⎜⎝

111

...

???

???

21

21

21

3,22,21,2

3,12,11,1

4321

4321

n

n

n

y

x

zzz

yyy

xxx

tsss

tsss

vvvv

uuuu

We don’t really want to know pose, we want to know what the object looks like in that pose.

⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜

⎟⎟⎠

⎞⎜⎜⎝

⎛=⎟

⎞⎜⎝

111

...

...

21

21

21

3,22,21,2

3,12,11,1

21

21

n

n

n

y

x

n

n

zzz

yyy

xxx

tsss

tsss

vvv

uuu

We start with:

Solve for pose:

Project rest of points:

Page 26: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

26GRASP

Transforming object with Linear Combinations

⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜

22

2

2

1

22

2

2

1

11

2

1

1

11

2

1

1...

n

n

n

n

vvv

uuu

vvv

uuu

No 3D model, but we’ve seen object twice before.

⎟⎟⎟⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜⎜⎜⎜

??

??

.

.

.

.

3

4

3

3

3

2

3

1

3

4

3

3

3

2

3

1

22

4

2

3

2

2

2

1

21

4

2

3

2

2

2

1

11

4

1

3

1

2

1

1

11

4

1

3

1

2

1

1

vvvv

uuuu

vvvvv

uuuuu

vvvvv

uuuuu

n

n

n

nSee four points in third image, need to fill in location of other points.

Just use rank theorem.

Page 27: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

27GRASP

Recap: Recognition w/ RANSAC

1. Find features in model and image. Such as corners.

2. Match enough to determine pose. Such as 3 points for planar object, scaled orthographic

projection.

3. Determine pose.4. Project rest of object features into image.5. Look to see how many image features they match.

Example: with bounded error, count how many object features project near an image feature.

6. Repeat steps 2-5 a bunch of times.7. Pick pose that matches most features.

Page 28: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

28GRASP

Recognizing 3D Objects

Previous approach will work.

But slow. RANSAC considers n3m3 possible matches. About m3 correct.

Solutions: Grouping. Find features coming from single object. Viewpoint invariance. Match to small set of model features

that could produce them.

Page 29: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

29GRASP

Grouping: Continuity

Page 30: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

30GRASP

Connectivity

Connected lines likely to come from boundary of same object. Boundary of object often produces connected contours. Different objects more rarely do; only when overlapping.

Connected image lines match connected model lines. Disconnected model lines generally don’t appear connected.

Page 31: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

31GRASP

Other Viewpoint Invariants

Parallelism

Convexity

Common region

….

Page 32: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

32GRASP

Planar Invariants

⎟⎟⎟

⎜⎜⎜

⎟⎟⎠

⎞⎜⎜⎝

⎛=

⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜

⎟⎟⎠

⎞⎜⎜⎝

⎛=⎟

⎞⎜⎝

111

...

111

000

...

...

21

21

2,21,2

2,11,1

21

21

3,22,21,2

3,12,11,1

21

21

n

n

y

x

n

n

y

x

n

n

yyy

xxx

tss

tss

yyy

xxx

tsss

tsss

vvv

uuu

A t

Page 33: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

33GRASP

p1 p2

p3p4

p4 = p1 + a(p2-p1) + b(p3-p1)

A(p4)+ t = A(p1+a(p2-p1) + b(p3-p1)) + t

= A(p1)+t + a(A(p2)+t – A(p1)-t) + b(A(p3)+t – A(p1)-t)

Page 34: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

34GRASP

p1 p2

p3p4

p4 = p1 + a(p2-p1) + b(p3-p1)

A(p4)+ t = A(p1+a(p2-p1) + b(p3-p1)) + t

= A(p1)+t + a(A(p2)+t – A(p1)-t) + b(A(p3)+t – A(p1)-t)

p4 is linear combination of p1,p2,p3. Transformed p4 is same linear combination of transformed p1, p2, p3.

Page 35: University of Pennsylvania 1 GRASP CIS 580 Machine Perception Fall 2004 Jianbo Shi cis580 Object recognition

University of Pennsylvania

35GRASP

What we didn’t talk about

Smooth 3D objects.

Can we find the guaranteed optimal solution?

Indexing with invariants.

Error propagation.