
Introduction to Computer Vision for Robotics

Jan-Michael Frahm

Overview

• Camera model
• Multi-view geometry
• Camera pose estimation
• Feature tracking & matching
• Robust pose estimation


Homogeneous coordinates

Unified notation: include the origin o in the affine basis (e1, e2, e3, o). A point M with affine coordinates (a1, a2, a3) is then written as

$$
M = \underbrace{\begin{bmatrix} \vec{e}_1 & \vec{e}_2 & \vec{e}_3 & \vec{o} \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{\text{affine basis matrix}}
\underbrace{\begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ 1 \end{pmatrix}}_{\text{homogeneous coordinates of } M}
$$

Properties of affine transformations

• Parallelism is preserved
• Ratios of lengths, areas, and volumes are preserved
• Transformations can be concatenated: if M1 = T1 M and M2 = T2 M1, then M2 = T2 T1 M = T21 M

$$
T_{affine} =
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & t_x \\
a_{21} & a_{22} & a_{23} & t_y \\
a_{31} & a_{32} & a_{33} & t_z \\
0 & 0 & 0 & 1
\end{bmatrix},
\qquad
M' = T_{affine} M = \begin{bmatrix} A_{3\times 3} & \vec{t}_3 \\ \mathbf{0}^T & 1 \end{bmatrix} M
$$

The transformation T_affine combines a linear mapping and a coordinate shift in homogeneous coordinates:
− linear mapping with the 3×3 matrix A
− coordinate shift with the translation vector t3
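A minimal numpy sketch (not from the slides) of the concatenation property; the helper affine_T and the example transforms are hypothetical:

```python
import numpy as np

def affine_T(A, t):
    """Build a 4x4 homogeneous transform from a 3x3 linear map A and a translation t."""
    T = np.eye(4)
    T[:3, :3] = A
    T[:3, 3] = t
    return T

theta = np.deg2rad(30)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.],
               [np.sin(theta),  np.cos(theta), 0.],
               [0.,             0.,            1.]])
T1 = affine_T(Rz, np.array([1., 0., 0.]))          # rotate about Z, then shift
T2 = affine_T(np.eye(3), np.array([0., 2., 0.]))   # pure translation

M = np.array([1., 1., 1., 1.])                     # homogeneous point
assert np.allclose(T2 @ (T1 @ M), (T2 @ T1) @ M)   # concatenation: T21 = T2 T1
```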


Projective geometry

• Projective space P² is the space of rays emerging from O
− the viewpoint O forms the projection center for all rays
− rays v emerge from the viewpoint into the scene
− a ray g is called a projective point, defined as a scaled v: g = λv

$$
g = \lambda \begin{pmatrix} x \\ y \\ w \end{pmatrix} = \lambda v,
\qquad \lambda \in \mathbb{R},\; \lambda \neq 0
$$

Projective and homogeneous points

• Given: plane Π in R² embedded in P² at coordinates w = 1
− a viewing ray g intersects the plane at v (homogeneous coordinates)
− all points on ray g project onto the same homogeneous point v
− the projection of g onto Π is defined by scaling: v = g/λ = g/w

$$
g = \lambda v = \lambda \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix},
\qquad
v = \frac{1}{w}\, g = \begin{pmatrix} x/w \\ y/w \\ 1 \end{pmatrix}
$$
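A tiny numpy helper implementing this scaling (the name to_affine is hypothetical); note that it must fail for w = 0, which the next slide treats as a direction:

```python
import numpy as np

def to_affine(g):
    """Project a homogeneous point g = (x, y, w) onto the plane w = 1."""
    g = np.asarray(g, dtype=float)
    if np.isclose(g[-1], 0.0):
        raise ValueError("point at infinity (w = 0): a direction, not an affine point")
    return g / g[-1]

print(to_affine([4.0, 2.0, 2.0]))  # -> [2. 1. 1.]; every lambda*(4,2,2) gives the same result
```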


Finite and infinite points

• All rays g that are not parallel to Π intersect Π at an affine point v.
• The ray g(w=0) does not intersect Π; hence v∞ is not an affine point but a direction. Directions have the coordinates (x, y, z, 0)^T:

$$
g_\infty = \begin{pmatrix} x \\ y \\ 0 \end{pmatrix}
$$

• Projective space combines affine space with the infinite points (directions).

Affine and projective transformations

• An affine transformation leaves infinite points at infinity: M'∞ = T_affine M∞

$$
\begin{pmatrix} X' \\ Y' \\ Z' \\ 0 \end{pmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & t_x \\
a_{21} & a_{22} & a_{23} & t_y \\
a_{31} & a_{32} & a_{33} & t_z \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 0 \end{pmatrix}
$$

• A projective transformation moves infinite points into finite affine space: M' = T_projective M∞

$$
\lambda
\begin{pmatrix} x_p \\ y_p \\ z_p \\ w \end{pmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & t_x \\
a_{21} & a_{22} & a_{23} & t_y \\
a_{31} & a_{32} & a_{33} & t_z \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 0 \end{pmatrix}
$$

Example: parallel lines intersect at the horizon (the line of infinite points).

We can see this intersection due to perspective projection!
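A small numpy sketch of this effect (not from the slides): a simple projective map with non-affine last row (0, 0, 1, 0) sends a direction, the point at infinity shared by a family of parallel lines, to a finite vanishing point:

```python
import numpy as np

T = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.],
              [0., 0., 1., 0.],
              [0., 0., 1., 0.]])   # last row (0,0,1,0): projective, not affine

d = np.array([1., 0., 1., 0.])     # direction of a family of parallel 3D lines
v = T @ d
v = v / v[-1]                      # normalize: a finite point
print(v[:3])                       # -> [1. 0. 1.]: the vanishing point of that family
```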


Pinhole camera (camera obscura)

[Figures: camera obscura (France, 1830); interior of a camera obscura (Sunday Magazine, 1838)]

Pinhole camera model

[Figure: a lens with aperture projects the object along the view direction onto the image sensor; focal length f, image center (cx, cy)]


Perspective projection

• Perspective projection in P³ models the pinhole camera:
− scene geometry is the affine P³ space with coordinates M = (X, Y, Z, 1)^T
− the camera focal point is at O = (0, 0, 0, 1)^T; the camera viewing direction is along Z (the optical axis)
− the image plane (x, y) in Π(P²) is aligned with (X, Y) at Z = Z0
− a scene point M projects onto the point Mp on the plane surface

$$
M = \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\quad\Rightarrow\quad
M_p = \begin{pmatrix} Z_0 X / Z \\ Z_0 Y / Z \\ Z_0 \\ 1 \end{pmatrix},
\qquad
x = Z_0 \frac{X}{Z}, \quad y = Z_0 \frac{Y}{Z}
$$
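A small numpy sketch of this mapping (not from the slides; the function name and values are illustrative):

```python
import numpy as np

def project_pinhole(M, Z0=1.0):
    """Project a scene point M = (X, Y, Z) onto the image plane Z = Z0."""
    X, Y, Z = M
    return np.array([Z0 * X / Z, Z0 * Y / Z])

print(project_pinhole(np.array([2.0, 1.0, 4.0])))  # -> [0.5  0.25]
```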

Projective transformation

• A projective transformation maps M onto Mp within P³ space:

$$
\rho M_p = T_p M:
\qquad
\rho \begin{pmatrix} x_p \\ y_p \\ Z_0 \\ 1 \end{pmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & \tfrac{1}{Z_0} & 0
\end{bmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
\qquad
\rho = \frac{Z}{Z_0} = \text{projective scale factor}
$$

• The projective transformation linearizes the projection.


Perspective projection

Dimension reduction from P³ into P² by projection onto Π(P²): combining the projective transformation Tp with the dimension reduction D yields the perspective projection P0,

$$
\rho m_p = D\, T_p M = P_0 M:
\qquad
\rho \begin{pmatrix} x_p \\ y_p \\ 1 \end{pmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & \tfrac{1}{Z_0} & 0
\end{bmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
$$

P0 drops the constant third coordinate Z0 of Mp and keeps the homogeneous image point m_p = (x_p, y_p, 1)^T with projective scale ρ = Z/Z0.
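A hedged numpy check of this projection (illustrative values; with Z0 = 1, P0 reduces to [I | 0]):

```python
import numpy as np

Z0 = 1.0
P0 = np.array([[1., 0., 0., 0.],
               [0., 1., 0., 0.],
               [0., 0., 1. / Z0, 0.]])   # perspective projection P3 -> P2

M = np.array([2.0, 1.0, 4.0, 1.0])       # homogeneous scene point
m = P0 @ M                                # rho * (xp, yp, 1)
m = m / m[-1]                             # remove projective scale rho = Z/Z0
print(m)                                  # -> [0.5  0.25 1.  ]
```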

Image plane and image sensor

• A sensor with picture elements (pixels) is added onto the image plane.

Image-sensor mapping: m = K m_p, with

$$
K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
$$

• Pixel coordinates m = (x, y)^T are related to image coordinates by the affine transformation K with five parameters:
− the image center c = (cx, cy)^T at the optical axis
− the focal length Z0 and the pixel size determine the pixel scales fx, fy
− the image skew s models the angle between pixel rows and columns
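A minimal sketch of the sensor mapping (the numeric intrinsics are made up for illustration):

```python
import numpy as np

def K_matrix(fx, fy, cx, cy, s=0.0):
    """Intrinsic calibration matrix with the five parameters above."""
    return np.array([[fx, s,  cx],
                     [0., fy, cy],
                     [0., 0., 1.]])

K = K_matrix(fx=800., fy=800., cx=320., cy=240.)
m_p = np.array([0.5, 0.25, 1.0])   # homogeneous image-plane point
m = K @ m_p                         # pixel coordinates
print(m)                            # -> [720. 440.   1.]
```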


Projection in general pose

• In world coordinates the camera has rotation R and projection center C; the projection is ρ m_p = P M.

$$
T_{cam} = \begin{bmatrix} R & C \\ \mathbf{0}^T & 1 \end{bmatrix},
\qquad
T_{scene} = T_{cam}^{-1} = \begin{bmatrix} R^T & -R^T C \\ \mathbf{0}^T & 1 \end{bmatrix}
$$

Projection matrix P

• The camera projection matrix P combines:
− the inverse pose transformation T_scene = T_cam^{-1} from general pose to the origin
− the perspective projection P0 to the image plane at Z0 = 1
− the affine mapping K from image to sensor coordinates

$$
\rho m = P M \quad\Rightarrow\quad P = K P_0 T_{scene} = K \left[ R^T \mid -R^T C \right]
$$

$$
\text{projection: } P_0 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} = \left[ I_{3\times 3} \mid \mathbf{0} \right],
\qquad
\text{scene pose: } T_{scene} = \begin{bmatrix} R^T & -R^T C \\ \mathbf{0}^T & 1 \end{bmatrix},
\qquad
\text{sensor calibration: } K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
$$
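A minimal numpy sketch composing P as above (the intrinsics and pose are illustrative):

```python
import numpy as np

def projection_matrix(K, R, C):
    """P = K [R^T | -R^T C]: world point -> pixel, up to scale rho."""
    Rt = R.T
    return K @ np.hstack([Rt, (-Rt @ C).reshape(3, 1)])

K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])
R = np.eye(3)                       # camera aligned with the world axes
C = np.array([0., 0., -5.])         # camera 5 units behind the origin
P = projection_matrix(K, R, C)

M = np.array([0., 0., 0., 1.])      # world origin
m = P @ M
print(m / m[-1])                    # -> pixel (320, 240): the image center
```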


2-view geometry: F-matrix

Projection onto two views. Split the scene point into direction and origin parts, M = M∞ + O, with M∞ = (X, Y, Z, 0)^T and O = (0, 0, 0, 1)^T:

$$
\rho_0 m_0 = P_0 M, \qquad P_0 = K_0 R_0^{-1} \left[ I \mid 0 \right]
$$
$$
\rho_1 m_1 = P_1 M, \qquad P_1 = K_1 R_1^{-1} \left[ I \mid -C_1 \right]
$$

Since [I | 0] O = 0, camera 0 sees only the direction part: ρ0 m0 = K0 R0^{-1} [I | 0] M∞. Substituting [I | 0] M∞ = ρ0 R0 K0^{-1} m0 into the second view gives

$$
\rho_1 m_1 = \rho_0 \underbrace{K_1 R_1^{-1} R_0 K_0^{-1}}_{H_\infty} m_0 \; \underbrace{-\; K_1 R_1^{-1} C_1}_{e_1}
\quad\Rightarrow\quad
\rho_1 m_1 = \rho_0 H_\infty m_0 + e_1
$$

As the unknown depth ρ0 varies, m1 moves along the line through H∞ m0 and the epipole e1: the epipolar line.

The Fundamental Matrix F

• The projective points e1 and (H∞ m0) define a plane in camera 1 (the epipolar plane Πe).
• The epipolar plane intersects image plane 1 in a line (the epipolar line ue).
• The corresponding point m1 lies on that line: m1^T ue = 0.
• If the points e1, m1, (H∞ m0) are all collinear, the collinearity theorem applies: m1^T (e1 × H∞ m0) = 0.

$$
m_1^T \left( e_1 \times H_\infty m_0 \right) = 0,
\qquad
[e]_\times = \begin{bmatrix} 0 & -e_z & e_y \\ e_z & 0 & -e_x \\ -e_y & e_x & 0 \end{bmatrix}
$$

$$
m_1^T F m_0 = 0, \qquad F = [e_1]_\times H_\infty
$$

F is the 3×3 fundamental matrix; m1^T F m0 = 0 is the epipolar constraint.


Estimation of F from correspondences

• Given a set of corresponding points, solve linearly for the 9 elements of F in projective coordinates: m1,i^T F m0,i = 0.
• Since the epipolar constraint is homogeneous up to scale, only eight elements are independent.
• Since the operator [e]× and hence F have rank 2, F has only 7 independent parameters (all epipolar lines intersect at e):

$$
\det(F) = 0 \quad \text{(rank-2 constraint)}
$$

• Each correspondence gives one collinearity constraint
⇒ solve for F with a minimum of 7 correspondences.
• For N > 7 correspondences, minimize the point-line distance:

$$
\sum_{n=0}^{N} \left( m_{1,n}^T F m_{0,n} \right)^2 \rightarrow \min
$$
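A minimal sketch of this linear solve (the unnormalized 8-point variant; the function name is hypothetical). For numerical stability a production version would first normalize the coordinates [Hartley 2003]:

```python
import numpy as np

def estimate_F_8point(m0, m1):
    """Linear estimate of F from Nx2 pixel arrays m0, m1 (N >= 8 matches)."""
    N = len(m0)
    A = np.zeros((N, 9))
    for n in range(N):
        x0, y0 = m0[n]
        x1, y1 = m1[n]
        # each row encodes m1^T F m0 = 0 for the stacked 9-vector of F
        A[n] = [x1*x0, x1*y0, x1, y1*x0, y1*y0, y1, x0, y0, 1]
    _, _, Vt = np.linalg.svd(A)          # least-squares null vector of A
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)          # enforce the rank-2 constraint
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```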

The Essential Matrix E

• F is the most general constraint on an image pair. If the camera calibration matrix K is known, then more constraints are available.
• E holds the relative orientation of a calibrated camera pair. It has 5 degrees of freedom: 3 from the rotation matrix Rik, 2 from the direction of the translation e, the epipole.

For calibrated image coordinates m̃ = K^{-1} m, the epipolar constraint becomes

$$
m_1^T F m_0 = \tilde{m}_1^T \underbrace{\left( K_1^T F K_0 \right)}_{E} \tilde{m}_0 = 0
$$

$$
E = [e]_\times R
\qquad \text{with} \qquad
[e]_\times = \begin{bmatrix} 0 & -e_z & e_y \\ e_z & 0 & -e_x \\ -e_y & e_x & 0 \end{bmatrix}
$$
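Two small helpers following the relations above (the names are hypothetical):

```python
import numpy as np

def essential_from_fundamental(F, K0, K1):
    """E = K1^T F K0, from the constraint above."""
    return K1.T @ F @ K0

def calibrate_point(m, K):
    """Calibrated image coordinates: m_tilde = K^-1 m for a homogeneous pixel m."""
    return np.linalg.inv(K) @ m

# sanity check: a proper essential matrix has two equal singular values and one zero
```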


Estimation of P from E

• From E we can obtain a camera projection matrix pair via the SVD E = U diag(1,1,0) V^T.
• P0 = [I3×3 | 03×1], and there are four choices for P1:

P1 = [UWV^T | +u3] or P1 = [UWV^T | −u3] or P1 = [UW^TV^T | +u3] or P1 = [UW^TV^T | −u3]

$$
\text{with } W = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

Four possible configurations:

only one with the 3D points in front of both cameras
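A sketch of this decomposition (assuming numpy; the cheirality test that picks among the four candidates is the triangulation check noted above):

```python
import numpy as np

def camera_pairs_from_E(E):
    """Return the four candidate [R | t] blocks for P1 (with P0 = [I | 0])."""
    U, _, Vt = np.linalg.svd(E)
    # enforce proper rotations (det = +1)
    if np.linalg.det(U) < 0: U = -U
    if np.linalg.det(Vt) < 0: Vt = -Vt
    W = np.array([[0., -1., 0.],
                  [1.,  0., 0.],
                  [0.,  0., 1.]])
    u3 = U[:, 2]
    candidates = []
    for R in (U @ W @ Vt, U @ W.T @ Vt):
        for t in (u3, -u3):
            candidates.append(np.hstack([R, t.reshape(3, 1)]))
    return candidates  # keep the one that puts triangulated points in front of both cameras
```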

3D Feature Reconstruction

• A corresponding point pair (m0, m1) is the projection of a 3D feature point M.
• M is reconstructed from (m0, m1) by triangulation.
• In general the two viewing rays l0 and l1 do not intersect exactly; M is placed at the minimum distance d of intersection, d² → min, with the constraints

$$
l_0^T d = 0, \qquad l_1^T d = 0
$$

• Alternatively, minimize the reprojection error:

$$
\left( m_0 - P_0 M \right)^2 + \left( m_1 - P_1 M \right)^2 \rightarrow \min
$$


Multi-View Tracking

• 2D match: image correspondence (m1, mi)
• 3D match: correspondence transfer (mi, M) via P1
• 3D pose estimation of Pi from mi − Pi M → min
• Minimize the global reprojection error:

$$
\sum_{i=0}^{N} \sum_{k=0}^{K} \left( m_{k,i} - P_i M_k \right)^2 \rightarrow \min
$$

Correspondences: matching vs. tracking

SIFT matcher: extract features independently in each image, then match them by comparing descriptors [Lowe 2004] (see the sketch below).

KLT tracker: extract features in the first image and find the same features again in the next view [Lucas & Kanade 1981], [Shi & Tomasi 1994].
• Small differences between consecutive frames, but a potentially large difference overall.

Image-to-image correspondences are essential to 3D reconstruction.
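A minimal sketch of descriptor matching with the distance-ratio test from [Lowe 2004] (the 128-dimensional descriptors and the 0.8 threshold follow the paper; the function name is hypothetical):

```python
import numpy as np

def match_descriptors(desc0, desc1, ratio=0.8):
    """Match two sets of SIFT descriptors (N0x128 and N1x128 arrays)
    by nearest neighbors, keeping only distinctive matches."""
    matches = []
    for i, d in enumerate(desc0):
        dists = np.linalg.norm(desc1 - d, axis=1)
        j0, j1 = np.argsort(dists)[:2]     # two nearest neighbors
        if dists[j0] < ratio * dists[j1]:  # ratio test rejects ambiguous matches
            matches.append((i, j0))
    return matches
```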


SIFT detector

• Scale- and image-plane-rotation-invariant feature descriptor [Lowe 2004].
• Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters.


Difference of Gaussians for scale invariance

• A difference of Gaussians with a constant ratio of scales is a close approximation to Lindeberg's scale-normalized Laplacian [Lindeberg 1998].

[Figure: Gaussian pyramid; adjacent levels are subtracted to form the difference-of-Gaussian pyramid]
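A sketch of one difference-of-Gaussian octave (assuming scipy; σ0 = 1.6 and k = 2^(1/3) are common SIFT defaults, not values from the slides):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, sigma0=1.6, k=2 ** (1 / 3), levels=5):
    """Build one octave of a difference-of-Gaussian stack with constant scale ratio k."""
    blurred = [gaussian_filter(image.astype(float), sigma0 * k ** i)
               for i in range(levels)]
    return [b1 - b0 for b0, b1 in zip(blurred, blurred[1:])]

# extrema of this stack across space and scale are the keypoint candidates
```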


Keypoint localization

• Detect maxima and minima of the difference of Gaussians in scale space (blur, then subtract adjacent levels).
• Fit a quadratic to the surrounding values for sub-pixel and sub-scale interpolation (Brown & Lowe, 2002).
• A Taylor expansion around the sample point gives the offset of the extremum (use finite differences for the derivatives); see the expressions below.
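The expansion and offset referenced above are, following Lowe 2004,

$$
D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x} + \frac{1}{2}\, \mathbf{x}^{T} \frac{\partial^2 D}{\partial \mathbf{x}^2}\, \mathbf{x},
\qquad
\hat{\mathbf{x}} = -\left( \frac{\partial^2 D}{\partial \mathbf{x}^2} \right)^{-1} \frac{\partial D}{\partial \mathbf{x}},
\qquad
\mathbf{x} = (x, y, \sigma)^T
$$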

Orientation normalization

• A histogram of local gradient directions (0 to 2π) is computed at the selected scale.
• The principal orientation is assigned at the peak of the smoothed histogram.
• Each key now specifies stable 2D coordinates (x, y, scale, orientation).


Example of keypoint detection

Threshold on the value at the DoG peak and on the ratio of principal curvatures (Harris approach):
(a) 233×189 input image
(b) 832 DoG extrema
(c) 729 left after the peak-value threshold
(d) 536 left after testing the ratio of principal curvatures

(courtesy Lowe)

SIFT vector formation

• Thresholded image gradients are sampled over a 16×16 array of locations in scale space.
• An array of orientation histograms is created (the figure shows a 2×2 histogram array as an example).
• 8 orientations × 4×4 histogram array = 128 dimensions.

(© Lowe)

SIFT feature detector

[Figure: SIFT feature detector example]

Robust data selection: RANSAC

Example: estimation of a plane from point data.

1. Select m samples from the {potential plane points}.
2. Compute the n-parameter solution (a plane hypothesis).
3. Evaluate the hypothesis on the {potential plane points}.
4. Best solution so far? If yes, keep it.
5. Repeat until 1 − (1 − (#inliers / |{potential plane points}|)^n)^steps > 0.99.
6. Output: the best solution, {inliers}, {outliers}.
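A compact sketch of this loop for the plane example (assuming numpy; for brevity it runs a fixed number of iterations rather than the adaptive 0.99 confidence stop above):

```python
import numpy as np

def ransac_plane(points, n_iters=200, threshold=0.05, seed=0):
    """RANSAC plane fit for an Nx3 point array: sample 3 points, form a plane
    hypothesis, count inliers by point-plane distance, keep the best one."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:                      # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal @ sample[0]
        inliers = np.abs(points @ normal + d) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers                        # refit the plane on these inliers
```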


RANSAC: evaluate hypotheses

• Evaluate the cost function for every hypothesis: each potential point contributes a piecewise penalty of its residual c that is 0 for small residuals (inliers, c² ≤ λε²), grows in a transition band, and saturates at a constant for clear outliers.
• Tabulating the residuals ε of all n potential points against all plane hypotheses selects the hypothesis with the lowest total cost.

Robust pose estimation, calibrated camera

Bootstrap from 2D-2D correspondences:
1. {2D-2D correspondences} → E-RANSAC (5-point algorithm) → E
2. Estimate P0, P1 from E
3. Nonlinear refinement of P0, P1 with all inliers
4. Triangulate points → known 3D points

Pose estimation from 2D-3D correspondences:
1. {2D-3D correspondences} → RANSAC (3-point algorithm) → P
2. Nonlinear refinement of P with all inliers


References

[Lowe 2004] David Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 60(2), 2004, pp. 91-110.

[Lucas & Kanade 1981] Bruce D. Lucas and Takeo Kanade, "An Iterative Image Registration Technique with an Application to Stereo Vision", Proceedings of the International Joint Conference on Artificial Intelligence, 1981.

[Shi & Tomasi 1994] Jianbo Shi and Carlo Tomasi, "Good Features to Track", IEEE Conference on Computer Vision and Pattern Recognition, 1994.

[Baker & Matthews 2004] S. Baker and I. Matthews, "Lucas-Kanade 20 Years On: A Unifying Framework", International Journal of Computer Vision, 56(3):221-255, March 2004.

[Fischler & Bolles 1981] M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography", Communications of the ACM, 24(6), June 1981.

[Mikolajczyk 2003] K. Mikolajczyk and C. Schmid, "A Performance Evaluation of Local Descriptors", CVPR, 2003.

[Lindeberg 1998] T. Lindeberg, "Feature Detection with Automatic Scale Selection", International Journal of Computer Vision, 30(2), 1998.

[Hartley 2003] R. Hartley and A. Zisserman, "Multiple View Geometry in Computer Vision", 2nd edition, Cambridge University Press, 2003.