
Introduction to Computer Vision for Robotics

Jan-Michael Frahm

Overview

• Camera model
• Multi-view geometry
• Camera pose estimation
• Feature tracking & matching
• Robust pose estimation


Homogeneous coordinates

Unified notation: include the origin o in the affine basis (e1, e2, e3, o). A point M with affine coordinates (a1, a2, a3) is then written as

$$
M = \underbrace{\begin{bmatrix} \vec{e}_1 & \vec{e}_2 & \vec{e}_3 & \vec{o} \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{\text{affine basis matrix}}
\underbrace{\begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ 1 \end{pmatrix}}_{\text{homogeneous coordinates of } M}
$$

Properties of affine transformations

• Parallelism is preserved
• Ratios of lengths, areas, and volumes are preserved
• Transformations can be concatenated: if M1 = T1 M and M2 = T2 M1, then M2 = T2 T1 M = T21 M

$$
T_{affine} =
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & t_x \\
a_{21} & a_{22} & a_{23} & t_y \\
a_{31} & a_{32} & a_{33} & t_z \\
0 & 0 & 0 & 1
\end{bmatrix},
\qquad
M' = T_{affine} M = \begin{bmatrix} A_{3\times 3} & \vec{t}_3 \\ \mathbf{0}^T & 1 \end{bmatrix} M
$$

The transformation T_affine combines a linear mapping and a coordinate shift in homogeneous coordinates:
− linear mapping with the 3×3 matrix A
− coordinate shift with the translation vector t3
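A minimal numpy sketch (not from the slides) of the concatenation property; the helper affine_T and the example transforms are hypothetical:

```python
import numpy as np

def affine_T(A, t):
    """Build a 4x4 homogeneous transform from a 3x3 linear map A and a translation t."""
    T = np.eye(4)
    T[:3, :3] = A
    T[:3, 3] = t
    return T

theta = np.deg2rad(30)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.],
               [np.sin(theta),  np.cos(theta), 0.],
               [0.,             0.,            1.]])
T1 = affine_T(Rz, np.array([1., 0., 0.]))          # rotate about Z, then shift
T2 = affine_T(np.eye(3), np.array([0., 2., 0.]))   # pure translation

M = np.array([1., 1., 1., 1.])                     # homogeneous point
assert np.allclose(T2 @ (T1 @ M), (T2 @ T1) @ M)   # concatenation: T21 = T2 T1
```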


Projective geometry

• Projective space P² is the space of rays emerging from O
− the viewpoint O forms the projection center for all rays
− rays v emerge from the viewpoint into the scene
− a ray g is called a projective point, defined as a scaled v: g = λv

$$
g = \lambda \begin{pmatrix} x \\ y \\ w \end{pmatrix} = \lambda v,
\qquad \lambda \in \mathbb{R},\; \lambda \neq 0
$$

Projective and homogeneous points

• Given: plane Π in R² embedded in P² at coordinates w = 1
− a viewing ray g intersects the plane at v (homogeneous coordinates)
− all points on ray g project onto the same homogeneous point v
− the projection of g onto Π is defined by scaling: v = g/λ = g/w

$$
g = \lambda v = \lambda \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix},
\qquad
v = \frac{1}{w}\, g = \begin{pmatrix} x/w \\ y/w \\ 1 \end{pmatrix}
$$
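A tiny numpy helper implementing this scaling (the name to_affine is hypothetical); note that it must fail for w = 0, which the next slide treats as a direction:

```python
import numpy as np

def to_affine(g):
    """Project a homogeneous point g = (x, y, w) onto the plane w = 1."""
    g = np.asarray(g, dtype=float)
    if np.isclose(g[-1], 0.0):
        raise ValueError("point at infinity (w = 0): a direction, not an affine point")
    return g / g[-1]

print(to_affine([4.0, 2.0, 2.0]))  # -> [2. 1. 1.]; every lambda*(4,2,2) gives the same result
```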


Finite and infinite points

• All rays g that are not parallel to Π intersect Π at an affine point v.
• The ray g(w=0) does not intersect Π; hence v∞ is not an affine point but a direction. Directions have the coordinates (x, y, z, 0)^T:

$$
g_\infty = \begin{pmatrix} x \\ y \\ 0 \end{pmatrix}
$$

• Projective space combines affine space with the infinite points (directions).

Affine and projective transformations

• An affine transformation leaves infinite points at infinity: M'∞ = T_affine M∞

$$
\begin{pmatrix} X' \\ Y' \\ Z' \\ 0 \end{pmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & t_x \\
a_{21} & a_{22} & a_{23} & t_y \\
a_{31} & a_{32} & a_{33} & t_z \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 0 \end{pmatrix}
$$

• A projective transformation moves infinite points into finite affine space: M' = T_projective M∞

$$
\lambda
\begin{pmatrix} x_p \\ y_p \\ z_p \\ w \end{pmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & t_x \\
a_{21} & a_{22} & a_{23} & t_y \\
a_{31} & a_{32} & a_{33} & t_z \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 0 \end{pmatrix}
$$

Example: parallel lines intersect at the horizon (the line of infinite points).

We can see this intersection due to perspective projection!
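A small numpy sketch of this effect (not from the slides): a simple projective map with non-affine last row (0, 0, 1, 0) sends a direction, the point at infinity shared by a family of parallel lines, to a finite vanishing point:

```python
import numpy as np

T = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.],
              [0., 0., 1., 0.],
              [0., 0., 1., 0.]])   # last row (0,0,1,0): projective, not affine

d = np.array([1., 0., 1., 0.])     # direction of a family of parallel 3D lines
v = T @ d
v = v / v[-1]                      # normalize: a finite point
print(v[:3])                       # -> [1. 0. 1.]: the vanishing point of that family
```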


Pinhole camera (camera obscura)

[Figures: camera obscura (France, 1830); interior of a camera obscura (Sunday Magazine, 1838)]

Pinhole camera model

[Figure: a lens with aperture projects the object along the view direction onto the image sensor; focal length f, image center (cx, cy)]


Perspective projection

• Perspective projection in P³ models the pinhole camera:
− scene geometry is the affine P³ space with coordinates M = (X, Y, Z, 1)^T
− the camera focal point is at O = (0, 0, 0, 1)^T; the camera viewing direction is along Z (the optical axis)
− the image plane (x, y) in Π(P²) is aligned with (X, Y) at Z = Z0
− a scene point M projects onto the point Mp on the plane surface

$$
M = \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\quad\Rightarrow\quad
M_p = \begin{pmatrix} Z_0 X / Z \\ Z_0 Y / Z \\ Z_0 \\ 1 \end{pmatrix},
\qquad
x = Z_0 \frac{X}{Z}, \quad y = Z_0 \frac{Y}{Z}
$$
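A small numpy sketch of this mapping (not from the slides; the function name and values are illustrative):

```python
import numpy as np

def project_pinhole(M, Z0=1.0):
    """Project a scene point M = (X, Y, Z) onto the image plane Z = Z0."""
    X, Y, Z = M
    return np.array([Z0 * X / Z, Z0 * Y / Z])

print(project_pinhole(np.array([2.0, 1.0, 4.0])))  # -> [0.5  0.25]
```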

Projective transformation

• A projective transformation maps M onto Mp within P³ space:

$$
\rho M_p = T_p M:
\qquad
\rho \begin{pmatrix} x_p \\ y_p \\ Z_0 \\ 1 \end{pmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & \tfrac{1}{Z_0} & 0
\end{bmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
\qquad
\rho = \frac{Z}{Z_0} = \text{projective scale factor}
$$

• The projective transformation linearizes the projection.


Perspective projection

Dimension reduction from P³ into P² by projection onto Π(P²): combining the projective transformation Tp with the dimension reduction D yields the perspective projection P0,

$$
\rho m_p = D\, T_p M = P_0 M:
\qquad
\rho \begin{pmatrix} x_p \\ y_p \\ 1 \end{pmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & \tfrac{1}{Z_0} & 0
\end{bmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
$$

P0 drops the constant third coordinate Z0 of Mp and keeps the homogeneous image point m_p = (x_p, y_p, 1)^T with projective scale ρ = Z/Z0.
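A hedged numpy check of this projection (illustrative values; with Z0 = 1, P0 reduces to [I | 0]):

```python
import numpy as np

Z0 = 1.0
P0 = np.array([[1., 0., 0., 0.],
               [0., 1., 0., 0.],
               [0., 0., 1. / Z0, 0.]])   # perspective projection P3 -> P2

M = np.array([2.0, 1.0, 4.0, 1.0])       # homogeneous scene point
m = P0 @ M                                # rho * (xp, yp, 1)
m = m / m[-1]                             # remove projective scale rho = Z/Z0
print(m)                                  # -> [0.5  0.25 1.  ]
```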

Image plane and image sensor

• A sensor with picture elements (pixels) is added onto the image plane.

Image-sensor mapping: m = K m_p, with

$$
K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
$$

• Pixel coordinates m = (x, y)^T are related to image coordinates by the affine transformation K with five parameters:
− the image center c = (cx, cy)^T at the optical axis
− the focal length Z0 and the pixel size determine the pixel scales fx, fy
− the image skew s models the angle between pixel rows and columns
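A minimal sketch of the sensor mapping (the numeric intrinsics are made up for illustration):

```python
import numpy as np

def K_matrix(fx, fy, cx, cy, s=0.0):
    """Intrinsic calibration matrix with the five parameters above."""
    return np.array([[fx, s,  cx],
                     [0., fy, cy],
                     [0., 0., 1.]])

K = K_matrix(fx=800., fy=800., cx=320., cy=240.)
m_p = np.array([0.5, 0.25, 1.0])   # homogeneous image-plane point
m = K @ m_p                         # pixel coordinates
print(m)                            # -> [720. 440.   1.]
```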


Projection in general pose

• In world coordinates the camera has rotation R and projection center C; the projection is ρ m_p = P M.

$$
T_{cam} = \begin{bmatrix} R & C \\ \mathbf{0}^T & 1 \end{bmatrix},
\qquad
T_{scene} = T_{cam}^{-1} = \begin{bmatrix} R^T & -R^T C \\ \mathbf{0}^T & 1 \end{bmatrix}
$$

Projection matrix P

• The camera projection matrix P combines:
− the inverse pose transformation T_scene = T_cam^{-1} from general pose to the origin
− the perspective projection P0 to the image plane at Z0 = 1
− the affine mapping K from image to sensor coordinates

$$
\rho m = P M \quad\Rightarrow\quad P = K P_0 T_{scene} = K \left[ R^T \mid -R^T C \right]
$$

$$
\text{projection: } P_0 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} = \left[ I_{3\times 3} \mid \mathbf{0} \right],
\qquad
\text{scene pose: } T_{scene} = \begin{bmatrix} R^T & -R^T C \\ \mathbf{0}^T & 1 \end{bmatrix},
\qquad
\text{sensor calibration: } K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
$$
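A minimal numpy sketch composing P as above (the intrinsics and pose are illustrative):

```python
import numpy as np

def projection_matrix(K, R, C):
    """P = K [R^T | -R^T C]: world point -> pixel, up to scale rho."""
    Rt = R.T
    return K @ np.hstack([Rt, (-Rt @ C).reshape(3, 1)])

K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])
R = np.eye(3)                       # camera aligned with the world axes
C = np.array([0., 0., -5.])         # camera 5 units behind the origin
P = projection_matrix(K, R, C)

M = np.array([0., 0., 0., 1.])      # world origin
m = P @ M
print(m / m[-1])                    # -> pixel (320, 240): the image center
```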


2-view geometry: F-matrix

Projection onto two views. Split the scene point into direction and origin parts, M = M∞ + O, with M∞ = (X, Y, Z, 0)^T and O = (0, 0, 0, 1)^T:

$$
\rho_0 m_0 = P_0 M, \qquad P_0 = K_0 R_0^{-1} \left[ I \mid 0 \right]
$$
$$
\rho_1 m_1 = P_1 M, \qquad P_1 = K_1 R_1^{-1} \left[ I \mid -C_1 \right]
$$

Since [I | 0] O = 0, camera 0 sees only the direction part: ρ0 m0 = K0 R0^{-1} [I | 0] M∞. Substituting [I | 0] M∞ = ρ0 R0 K0^{-1} m0 into the second view gives

$$
\rho_1 m_1 = \rho_0 \underbrace{K_1 R_1^{-1} R_0 K_0^{-1}}_{H_\infty} m_0 \; \underbrace{-\; K_1 R_1^{-1} C_1}_{e_1}
\quad\Rightarrow\quad
\rho_1 m_1 = \rho_0 H_\infty m_0 + e_1
$$

As the unknown depth ρ0 varies, m1 moves along the line through H∞ m0 and the epipole e1: the epipolar line.

The Fundamental Matrix F

• The projective points e1 and (H∞ m0) define a plane in camera 1 (the epipolar plane Πe).
• The epipolar plane intersects image plane 1 in a line (the epipolar line ue).
• The corresponding point m1 lies on that line: m1^T ue = 0.
• If the points e1, m1, (H∞ m0) are all collinear, the collinearity theorem applies: m1^T (e1 × H∞ m0) = 0.

$$
m_1^T \left( e_1 \times H_\infty m_0 \right) = 0,
\qquad
[e]_\times = \begin{bmatrix} 0 & -e_z & e_y \\ e_z & 0 & -e_x \\ -e_y & e_x & 0 \end{bmatrix}
$$

$$
m_1^T F m_0 = 0, \qquad F = [e_1]_\times H_\infty
$$

F is the 3×3 fundamental matrix; m1^T F m0 = 0 is the epipolar constraint.


Estimation of F from correspondences

• Given a set of corresponding points, solve linearly for the 9 elements of F in projective coordinates: m1,i^T F m0,i = 0.
• Since the epipolar constraint is homogeneous up to scale, only eight elements are independent.
• Since the operator [e]× and hence F have rank 2, F has only 7 independent parameters (all epipolar lines intersect at e):

$$
\det(F) = 0 \quad \text{(rank-2 constraint)}
$$

• Each correspondence gives one collinearity constraint
⇒ solve for F with a minimum of 7 correspondences.
• For N > 7 correspondences, minimize the point-line distance:

$$
\sum_{n=0}^{N} \left( m_{1,n}^T F m_{0,n} \right)^2 \rightarrow \min
$$
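A minimal sketch of this linear solve (the unnormalized 8-point variant; the function name is hypothetical). For numerical stability a production version would first normalize the coordinates [Hartley 2003]:

```python
import numpy as np

def estimate_F_8point(m0, m1):
    """Linear estimate of F from Nx2 pixel arrays m0, m1 (N >= 8 matches)."""
    N = len(m0)
    A = np.zeros((N, 9))
    for n in range(N):
        x0, y0 = m0[n]
        x1, y1 = m1[n]
        # each row encodes m1^T F m0 = 0 for the stacked 9-vector of F
        A[n] = [x1*x0, x1*y0, x1, y1*x0, y1*y0, y1, x0, y0, 1]
    _, _, Vt = np.linalg.svd(A)          # least-squares null vector of A
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)          # enforce the rank-2 constraint
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```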

The Essential Matrix E

• F is the most general constraint on an image pair. If the camera calibration matrix K is known, then more constraints are available.
• E holds the relative orientation of a calibrated camera pair. It has 5 degrees of freedom: 3 from the rotation matrix Rik, 2 from the direction of the translation e, the epipole.

For calibrated image coordinates m̃ = K^{-1} m, the epipolar constraint becomes

$$
m_1^T F m_0 = \tilde{m}_1^T \underbrace{\left( K_1^T F K_0 \right)}_{E} \tilde{m}_0 = 0
$$

$$
E = [e]_\times R
\qquad \text{with} \qquad
[e]_\times = \begin{bmatrix} 0 & -e_z & e_y \\ e_z & 0 & -e_x \\ -e_y & e_x & 0 \end{bmatrix}
$$
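Two small helpers following the relations above (the names are hypothetical):

```python
import numpy as np

def essential_from_fundamental(F, K0, K1):
    """E = K1^T F K0, from the constraint above."""
    return K1.T @ F @ K0

def calibrate_point(m, K):
    """Calibrated image coordinates: m_tilde = K^-1 m for a homogeneous pixel m."""
    return np.linalg.inv(K) @ m

# sanity check: a proper essential matrix has two equal singular values and one zero
```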


Estimation of P from E

• From E we can obtain a camera projection matrix pair via the SVD E = U diag(1,1,0) V^T.
• P0 = [I3×3 | 03×1], and there are four choices for P1:

P1 = [UWV^T | +u3] or P1 = [UWV^T | −u3] or P1 = [UW^TV^T | +u3] or P1 = [UW^TV^T | −u3]

$$
\text{with } W = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

Four possible configurations:

only one with the 3D points in front of both cameras
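A sketch of this decomposition (assuming numpy; the cheirality test that picks among the four candidates is the triangulation check noted above):

```python
import numpy as np

def camera_pairs_from_E(E):
    """Return the four candidate [R | t] blocks for P1 (with P0 = [I | 0])."""
    U, _, Vt = np.linalg.svd(E)
    # enforce proper rotations (det = +1)
    if np.linalg.det(U) < 0: U = -U
    if np.linalg.det(Vt) < 0: Vt = -Vt
    W = np.array([[0., -1., 0.],
                  [1.,  0., 0.],
                  [0.,  0., 1.]])
    u3 = U[:, 2]
    candidates = []
    for R in (U @ W @ Vt, U @ W.T @ Vt):
        for t in (u3, -u3):
            candidates.append(np.hstack([R, t.reshape(3, 1)]))
    return candidates  # keep the one that puts triangulated points in front of both cameras
```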

3D Feature Reconstruction

• A corresponding point pair (m0, m1) is the projection of a 3D feature point M.
• M is reconstructed from (m0, m1) by triangulation.
• In general the two viewing rays l0 and l1 do not intersect exactly; M is placed at the minimum distance d of intersection, d² → min, with the constraints

$$
l_0^T d = 0, \qquad l_1^T d = 0
$$

• Alternatively, minimize the reprojection error:

$$
\left( m_0 - P_0 M \right)^2 + \left( m_1 - P_1 M \right)^2 \rightarrow \min
$$


Multi-View Tracking

• 2D match: image correspondence (m1, mi)
• 3D match: correspondence transfer (mi, M) via P1
• 3D pose estimation of Pi from mi − Pi M → min
• Minimize the global reprojection error:

$$
\sum_{i=0}^{N} \sum_{k=0}^{K} \left( m_{k,i} - P_i M_k \right)^2 \rightarrow \min
$$

Correspondences: matching vs. tracking

SIFT matcher: extract features independently in each image, then match them by comparing descriptors [Lowe 2004] (see the sketch below).

KLT tracker: extract features in the first image and find the same features again in the next view [Lucas & Kanade 1981], [Shi & Tomasi 1994].
• Small differences between consecutive frames, but a potentially large difference overall.

Image-to-image correspondences are essential to 3D reconstruction.
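A minimal sketch of descriptor matching with the distance-ratio test from [Lowe 2004] (the 128-dimensional descriptors and the 0.8 threshold follow the paper; the function name is hypothetical):

```python
import numpy as np

def match_descriptors(desc0, desc1, ratio=0.8):
    """Match two sets of SIFT descriptors (N0x128 and N1x128 arrays)
    by nearest neighbors, keeping only distinctive matches."""
    matches = []
    for i, d in enumerate(desc0):
        dists = np.linalg.norm(desc1 - d, axis=1)
        j0, j1 = np.argsort(dists)[:2]     # two nearest neighbors
        if dists[j0] < ratio * dists[j1]:  # ratio test rejects ambiguous matches
            matches.append((i, j0))
    return matches
```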


SIFT detector

• Scale- and image-plane-rotation-invariant feature descriptor [Lowe 2004].
• Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters.


Difference of Gaussians for scale invariance

• A difference of Gaussians with a constant ratio of scales is a close approximation to Lindeberg's scale-normalized Laplacian [Lindeberg 1998].

[Figure: Gaussian pyramid; adjacent levels are subtracted to form the difference-of-Gaussian pyramid]
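A sketch of one difference-of-Gaussian octave (assuming scipy; σ0 = 1.6 and k = 2^(1/3) are common SIFT defaults, not values from the slides):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, sigma0=1.6, k=2 ** (1 / 3), levels=5):
    """Build one octave of a difference-of-Gaussian stack with constant scale ratio k."""
    blurred = [gaussian_filter(image.astype(float), sigma0 * k ** i)
               for i in range(levels)]
    return [b1 - b0 for b0, b1 in zip(blurred, blurred[1:])]

# extrema of this stack across space and scale are the keypoint candidates
```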


Keypoint localization

• Detect maxima and minima of the difference of Gaussians in scale space (blur, then subtract adjacent levels).
• Fit a quadratic to the surrounding values for sub-pixel and sub-scale interpolation (Brown & Lowe, 2002).
• A Taylor expansion around the sample point gives the offset of the extremum (use finite differences for the derivatives); see the expressions below.
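The expansion and offset referenced above are, following Lowe 2004,

$$
D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x} + \frac{1}{2}\, \mathbf{x}^{T} \frac{\partial^2 D}{\partial \mathbf{x}^2}\, \mathbf{x},
\qquad
\hat{\mathbf{x}} = -\left( \frac{\partial^2 D}{\partial \mathbf{x}^2} \right)^{-1} \frac{\partial D}{\partial \mathbf{x}},
\qquad
\mathbf{x} = (x, y, \sigma)^T
$$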

Orientation normalization

• A histogram of local gradient directions (0 to 2π) is computed at the selected scale.
• The principal orientation is assigned at the peak of the smoothed histogram.
• Each key now specifies stable 2D coordinates (x, y, scale, orientation).


Example of keypoint detection

Threshold on the value at the DoG peak and on the ratio of principal curvatures (Harris approach):
(a) 233×189 input image
(b) 832 DoG extrema
(c) 729 left after the peak-value threshold
(d) 536 left after testing the ratio of principal curvatures

(courtesy Lowe)

SIFT vector formation

• Thresholded image gradients are sampled over a 16×16 array of locations in scale space.
• An array of orientation histograms is created (the figure shows a 2×2 histogram array as an example).
• 8 orientations × 4×4 histogram array = 128 dimensions.

(© Lowe)

SIFT feature detector

[Figure: SIFT feature detector example]

Robust data selection: RANSAC

Example: estimation of a plane from point data.

1. Select m samples from the {potential plane points}.
2. Compute the n-parameter solution (a plane hypothesis).
3. Evaluate the hypothesis on the {potential plane points}.
4. Best solution so far? If yes, keep it.
5. Repeat until 1 − (1 − (#inliers / |{potential plane points}|)^n)^steps > 0.99.
6. Output: the best solution, {inliers}, {outliers}.
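A compact sketch of this loop for the plane example (assuming numpy; for brevity it runs a fixed number of iterations rather than the adaptive 0.99 confidence stop above):

```python
import numpy as np

def ransac_plane(points, n_iters=200, threshold=0.05, seed=0):
    """RANSAC plane fit for an Nx3 point array: sample 3 points, form a plane
    hypothesis, count inliers by point-plane distance, keep the best one."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:                      # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal @ sample[0]
        inliers = np.abs(points @ normal + d) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers                        # refit the plane on these inliers
```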


RANSAC: evaluate hypotheses

• Evaluate the cost function for every hypothesis: each potential point contributes a piecewise penalty of its residual c that is 0 for small residuals (inliers, c² ≤ λε²), grows in a transition band, and saturates at a constant for clear outliers.
• Tabulating the residuals ε of all n potential points against all plane hypotheses selects the hypothesis with the lowest total cost.

Robust pose estimation, calibrated camera

Bootstrap from 2D-2D correspondences:
1. {2D-2D correspondences} → E-RANSAC (5-point algorithm) → E
2. Estimate P0, P1 from E
3. Nonlinear refinement of P0, P1 with all inliers
4. Triangulate points → known 3D points

Pose estimation from 2D-3D correspondences:
1. {2D-3D correspondences} → RANSAC (3-point algorithm) → P
2. Nonlinear refinement of P with all inliers


References

[Lowe 2004] David Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 60(2), 2004, pp. 91-110.

[Lucas & Kanade 1981] Bruce D. Lucas and Takeo Kanade, "An Iterative Image Registration Technique with an Application to Stereo Vision", Proceedings of the International Joint Conference on Artificial Intelligence, 1981.

[Shi & Tomasi 1994] Jianbo Shi and Carlo Tomasi, "Good Features to Track", IEEE Conference on Computer Vision and Pattern Recognition, 1994.

[Baker & Matthews 2004] S. Baker and I. Matthews, "Lucas-Kanade 20 Years On: A Unifying Framework", International Journal of Computer Vision, 56(3):221-255, March 2004.

[Fischler & Bolles 1981] M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography", Communications of the ACM, 24(6), June 1981.

[Mikolajczyk 2003] K. Mikolajczyk and C. Schmid, "A Performance Evaluation of Local Descriptors", CVPR, 2003.

[Lindeberg 1998] T. Lindeberg, "Feature Detection with Automatic Scale Selection", International Journal of Computer Vision, 30(2), 1998.

[Hartley 2003] R. Hartley and A. Zisserman, "Multiple View Geometry in Computer Vision", 2nd edition, Cambridge University Press, 2003.