c280, computer vision - eecs at uc berkeleytrevor/cs280notes/11sfm.pdfc280, computer vision prof....

67
C280, Computer Vision Prof. Trevor Darrell [email protected] Lecture 11: Structure from Motion

Upload: dinhcong

Post on 30-May-2018

225 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

C280, Computer Vision

Prof. Trevor Darrell

[email protected]

Lecture 11: Structure from Motion

Page 2: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Roadmap• Previous: Image formation, filtering, local features, (Texture)…

• Tues: Feature-based Alignment

– Stitching images together

– Homographies, RANSAC, Warping, Blending

– Global alignment of planar models

• Today: Dense Motion Models

– Local motion / feature displacement– Local motion / feature displacement

– Parametric optic flow

• No classes next week: ICCV conference

• Oct 6th: Stereo / ‘Multi-view’: Estimating depth with known

inter-camera pose

• Oct 8th: ‘Structure-from-motion’: Estimation of pose and 3D

structure

– Factorization approaches

– Global alignment with 3D point models

Page 3: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Last time: Stereo

• Human stereopsis & stereograms

• Epipolar geometry and the epipolar constraint

– Case example with parallel optical axes

– General case with calibrated cameras

• Correspondence search

• The Essential and the Fundamental Matrix

• Multi-view stereo

Page 4: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Today: SFM

• SFM problem statement

• Factorization

• Projective SFM

• Bundle Adjustment• Bundle Adjustment

• Photo Tourism

• “Rome in a day:

Page 5: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Structure from motion

Lazebnik

Page 6: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Multiple-view geometry questions

• Scene geometry (structure): Given 2D point matches in two or more images, where are the corresponding points in 3D?

• Correspondence (stereo matching): Given a point in just one image, how does it constrain the position of the corresponding point in another image?

• Camera geometry (motion): Given a set of corresponding points in two or more images, what are the camera matrices for these views?

Lazebnik

Page 7: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Structure from motion

• Given: m images of n fixed 3D points

xij = Pi Xj , i = 1, … , m, j = 1, … , n

• Problem: estimate m projection matrices Pi and

n 3D points Xj from the mn correspondences xij

Xj

x1j

x2j

x3j

P1

P2

P3

Lazebnik

Page 8: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Structure from motion ambiguity

• If we scale the entire scene by some factor k and, at the

same time, scale the camera matrices by the factor of 1/k,

the projections of the scene points in the image remain

exactly the same:

It is impossible to recover the absolute scale of the scene!

)(1

XPPXx kk

==

Lazebnik

Page 9: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Structure from motion ambiguity

• If we scale the entire scene by some factor k and, at the

same time, scale the camera matrices by the factor of 1/k,

the projections of the scene points in the image remain

exactly the same

• More generally: if we transform the scene using a

transformation Q and apply the inverse transformation to transformation Q and apply the inverse transformation to

the camera matrices, then the images do not change

( )( )QXPQPXx-1==

Lazebnik

Page 10: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Projective ambiguity

( )( )XQPQPXx P

-1

P==

Lazebnik

Page 11: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Projective ambiguity

Lazebnik

Page 12: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Affine ambiguity

Affine

( )( )XQPQPXx A

-1

A==

Lazebnik

Page 13: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Affine ambiguity

Lazebnik

Page 14: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Similarity ambiguity

( )( )XQPQPXx S

-1

S==

Lazebnik

Page 15: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Similarity ambiguity

Lazebnik

Page 16: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Hierarchy of 3D transformations

vTv

tAProjective15dof

Affine

12dof

Preserves intersection and

tangency

Preserves parallellism,

volume ratios

10

tAT

tRsSimilarity7dof

Euclidean

6dof

Preserves angles, ratios of

length

10

tRT

s

10

tRT

Preserves angles, lengths

• With no constraints on the camera calibration matrix or on the scene, we get a projective reconstruction

• Need additional information to upgrade the reconstruction to affine, similarity, or Euclidean

Lazebnik

Page 17: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Structure from motion

• Let’s start with affine cameras (the math is easier)

center atinfinity

Lazebnik

Page 18: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Recall: Orthographic Projection

Special case of perspective projection

• Distance from center of projection to image plane is infinite

Image World

• Projection matrix:

Slide by Steve SeitzLazebnik

Page 19: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Orthographic Projection

Affine cameras

Parallel Projection

Lazebnik

Page 20: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Affine cameras

• A general affine camera combines the effects of an affine

transformation of the 3D space, orthographic projection,

and an affine transformation of the image:

=

×=10

bAP

1000

]affine44[

1000

0010

0001

]affine33[ 2232221

1131211

baaa

baaa

• Affine projection is a linear mapping + translation in

inhomogeneous coordinates

10001000

x

Xa1

a2

bAXx +=

+

=

=

2

1

232221

131211

b

b

Z

Y

X

aaa

aaa

y

x

Projection ofworld originLazebnik

Page 21: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Affine structure from motion

• Given: m images of n fixed 3D points:

xij = Ai Xj + bi , i = 1,… , m, j = 1, … , n

• Problem: use the mn correspondences xij to estimate mprojection matrices Ai and translation vectors bi, and n points Xj

• The reconstruction is defined up to an arbitrary affine

transformation Q (12 degrees of freedom):transformation Q (12 degrees of freedom):

• We have 2mn knowns and 8m + 3n unknowns (minus 12 dof for affine ambiguity)

• Thus, we must have 2mn >= 8m + 3n – 12

• For two views, we need four point correspondences

1

XQ

1

X,Q

10

bA

10

bA1

Lazebnik

Page 22: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Affine structure from motion

• Centering: subtract the centroid of the image points

( )

ji

n

k

kji

n

k

ikiiji

n

k

ikijij

n

nn

XAXXA

bXAbXAxxx

ˆ1

11ˆ

1

11

=

−=

+−+=−=

∑∑

=

==

• For simplicity, assume that the origin of the world

coordinate system is at the centroid of the 3D points

• After centering, each normalized point xij is related to the

3D point Xi by

jiij XAx =ˆ

Lazebnik

Page 23: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Affine structure from motion

• Let’s create a 2m × n data (measurement) matrix:

=n

n

xxx

xxx

xxx

D

ˆˆˆ

ˆˆˆ

ˆˆˆ

22221

11211

L

O

L

L

cameras(2m)

mnmm xxx ˆˆˆ

21 L

points (n)

C. Tomasi and T. Kanade. Shape and motion from image streams under orthography:

A factorization method. IJCV, 9(2):137-154, November 1992.

Page 24: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Affine structure from motion

• Let’s create a 2m × n data (measurement) matrix:

[ ]n

n

n

XXX

A

A

A

xxx

xxx

xxx

D LM

L

O

L

L

21

2

1

22221

11211

ˆˆˆ

ˆˆˆ

ˆˆˆ

=

=

points (3 × n)

mmnmm Axxx L21ˆˆˆ

cameras

(2m × 3)

The measurement matrix D = MS must have rank 3!

C. Tomasi and T. Kanade. Shape and motion from image streams under orthography:

A factorization method. IJCV, 9(2):137-154, November 1992.

Page 25: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Factorizing the measurement matrix

Source: M. HebertLazebnik

Page 26: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Factorizing the measurement matrix

• Singular value decomposition of D:

Source: M. HebertLazebnik

Page 27: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Factorizing the measurement matrix

• Singular value decomposition of D:

Source: M. HebertLazebnik

Page 28: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Factorizing the measurement matrix

• Obtaining a factorization from SVD:

Source: M. HebertLazebnik

Page 29: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Factorizing the measurement matrix

• Obtaining a factorization from SVD:

Source: M. Hebert

This decomposition minimizes

|D-MS|2

Lazebnik

Page 30: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Affine ambiguity

• The decomposition is not unique. We get the same D by

using any 3×3 matrix C and applying the transformations M

→ MC, S →C-1S

• That is because we have only an affine transformation and

we have not enforced any Euclidean constraints (like

forcing the image axes to be perpendicular, for example)

Source: M. HebertLazebnik

Page 31: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

• Orthographic: image axes are perpendicular and scale is 1

Eliminating the affine ambiguity

x

a2

a1 · a2 = 0

|a1|2 = |a2|

2 = 1

• This translates into 3m equations in L = CCT :

Ai L AiT = Id, i = 1, …, m

• Solve for L

• Recover C from L by Cholesky decomposition: L = CCT

• Update M and S: M = MC, S = C-1S

Xa1

Source: M. HebertLazebnik

Page 32: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Algorithm summary

• Given: m images and n features xij

• For each image i, center the feature coordinates

• Construct a 2m × n measurement matrix D:

• Column j contains the projection of point j in all views

• Row i contains one coordinate of the projections of all the n

points in image i

• Factorize D:• Factorize D:

• Compute SVD: D = U W VT

• Create U3 by taking the first 3 columns of U

• Create V3 by taking the first 3 columns of V

• Create W3 by taking the upper left 3 × 3 block of W

• Create the motion and shape matrices:

• M = U3W3½ and S = W3

½ V3T (or M = U3 and S = W3V3

T)

• Eliminate affine ambiguity

Source: M. HebertLazebnik

Page 33: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Reconstruction results

C. Tomasi and T. Kanade. Shape and motion from image streams under orthography:

A factorization method. IJCV, 9(2):137-154, November 1992.

Page 34: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Dealing with missing data

• So far, we have assumed that all points are visible in all

views

• In reality, the measurement matrix typically looks

something like this:

cameras

points

Lazebnik

Page 35: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Dealing with missing data

• Possible solution: decompose matrix into dense sub-

blocks, factorize each sub-block, and fuse the results

• Finding dense maximal sub-blocks of the matrix is NP-complete

(equivalent to finding maximal cliques in a graph)

• Incremental bilinear refinement

(1) Perform

factorization on a

dense sub-block

(2) Solve for a new

3D point visible by

at least two known cameras (linear

least squares)

(3) Solve for a new

camera that sees at

least three known 3D points (linear

least squares)

F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting, Modeling, and

Matching Video Clips Containing Multiple Moving Objects. PAMI 2007.

Page 36: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Further Factorization work

Factorization with uncertainty

Factorization for indep. moving objects (now)

Factorization for articulated objects (now)

(Irani & Anandan, IJCV’02)

(Costeira and Kanade ‘94)

Factorization for articulated objects (now)

Factorization for dynamic objects (now)

Perspective factorization (next week)

Factorization with outliers and missing pts.

(Bregler et al. 2000, Brand 2001)

(Jacobs ‘97 (affine), Martinek & Pajdla‘01 Aanaes’02 (perspective))

(Sturm & Triggs 1996, …)

(Yan and Pollefeys ‘05)

Pollefeys

Page 37: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Structure from motion of multiple moving

objects

Pollefeys

Page 38: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Structure from motion of multiple moving

objects

Pollefeys

Page 39: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Shape interaction matrix

Shape interaction matrix for articulated objects looses

block diagonal structure

Costeira and Kanade’s approach is not usable for articulated bodies(assumes independent motions)

Pollefeys

Page 40: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Articulated motion subspaces

Joint (1D intersection)

(joint=origin)

Motion subspaces for articulated bodies intersect

(rank=8-1)

(Yan and Pollefeys, CVPR’05)

(Tresadern and Reid, CVPR’05)

Hinge (2D intersection)

(hinge=z-axis)(rank=8-2)

Exploit rank constraint to obtain better estimate

(Yan & Pollefeys, 06?)Also for non-rigid parts if

Page 41: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Student

Segmentation

Results

Toy truck

Segmentation

IntersectionIntersection

Pollefeys

Page 42: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Articulated shape and motion factorization

Automated kinematic chain building for articulated & non-rigid

obj.

• Estimate principal angles between subspaces

• Compute affinities based on principal angles

• Compute minimum spanning tree

(Yan and Pollefeys, 2006?)

Pollefeys

Page 43: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Structure from motion of deforming objects

Extend factorization approaches to deal with dynamic

shapes

(Bregler et al ’00;Brand ‘01)

Pollefeys

Page 44: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Representing dynamic shapes

∑=k

kk (t)ScS(t)

represent dynamic shape as varying linear combination of basis shapes

(fig. M.Brand)

Pollefeys

Page 45: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Results

(Bregler et al ’00)

Pollefeys

Page 46: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Dynamic SfM factorization

(Brand ’01)

constraints to be satisfied for M

constraints to be satisfied for M, use to compute J

hard!

(different methods are possible, not so simple and also not optimal)

Pollefeys

Page 47: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Non-rigid 3D subspace flow

Same is also possible using optical flow in stead of features, also

takes uncertainty into account (Brand ’01)

Pollefeys

Page 48: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

(Brand ’01)

Results

Pollefeys

Page 49: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Results

(Bregler et al ’01)

Pollefeys

Page 50: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Projective structure from motion

• Given: m images of n fixed 3D points

zij xij = Pi Xj , i = 1,… , m, j = 1, … , n

• Problem: estimate m projection matrices Pi and n 3D

points Xj from the mn correspondences xij

Xj

x1j

x2j

x3j

P1

P2

P3

Lazebnik

Page 51: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Projective structure from motion

• Given: m images of n fixed 3D points

zij xij = Pi Xj , i = 1,… , m, j = 1, … , n

• Problem: estimate m projection matrices Pi and n 3D

points Xj from the mn correspondences xij

• With no calibration info, cameras and points can only be

recovered up to a 4x4 projective transformation Q:recovered up to a 4x4 projective transformation Q:

X → QX, P → PQ-1

• We can solve for structure and motion when

2mn >= 11m +3n – 15

• For two cameras, at least 7 points are needed

Lazebnik

Page 52: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Projective SFM: Two-camera case

• Compute fundamental matrix F between the two views

• First camera matrix: [I|0]

• Second camera matrix: [A|b]

• Then

bAxbX0|IAx +=+=′′ zz ][

XbAxX0|Ix ]|[,][ =′′= zz

bAxbx ×=×′′ zz

xbAxxbx ′⋅×=′⋅×′′ )()( zz

0][T =′× Axbx

AbF ][ ×= b: epipole (FTb = 0), A = –[b×]F

F&P sec. 13.3.1Lazebnik

Page 53: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Projective factorization

[ ]n

mmnmnmmmm

nn

nn

zzz

zzz

zzz

XXX

P

P

P

xxx

xxx

xxx

D LM

L

O

L

L

21

2

1

2211

2222222121

1112121111

=

=

cameras

(3m × 4)

points (4 × n)

• If we knew the depths z, we could factorize D to estimate

M and S

• If we knew M and S, we could solve for z

• Solution: iterative approach (alternate between above

two steps)

(3m × 4)

D = MS has rank 4

Lazebnik

Page 54: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Sequential structure from motion

•Initialize motion from two images

using fundamental matrix

•Initialize structure

•For each additional view:

• Determine projection matrix

of new camera using all the

known 3D points that are

visible in its image –

ca

me

ras

points

visible in its image –

calibration

ca

me

ras

Lazebnik

Page 55: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Sequential structure from motion

•Initialize motion from two images

using fundamental matrix

•Initialize structure

•For each additional view:

• Determine projection matrix

of new camera using all the

known 3D points that are

visible in its image –

ca

me

ras

points

visible in its image –

calibration

• Refine and extend structure:

compute new 3D points,

re-optimize existing points

that are also seen by this

camera – triangulation ca

me

ras

Lazebnik

Page 56: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Sequential structure from motion

•Initialize motion from two images

using fundamental matrix

•Initialize structure

•For each additional view:

• Determine projection matrix

of new camera using all the

known 3D points that are

visible in its image –

ca

me

ras

points

visible in its image –

calibration

• Refine and extend structure:

compute new 3D points,

re-optimize existing points

that are also seen by this

camera – triangulation

•Refine structure and motion:

bundle adjustment

ca

me

ras

Lazebnik

Page 57: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Bundle adjustment

• Non-linear method for refining structure and motion

• Minimizing reprojection error

( )2

1 1

,),( ∑∑= =

=m

i

n

j

jiijDE XPxXP

Xj

x1j

x2j

x3j

P1

P2

P3

P1Xj

P2Xj

P3Xj

Lazebnik

Page 58: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Self-calibration

• Self-calibration (auto-calibration) is the process of

determining intrinsic camera parameters directly from

uncalibrated images

• For example, when the images are acquired by a single

moving camera, we can use the constraint that the

intrinsic parameter matrix remains fixed for all the images

• Compute initial projective reconstruction and find 3D projective • Compute initial projective reconstruction and find 3D projective

transformation matrix Q such that all camera matrices are in the

form Pi = K [Ri | ti]

• Can use constraints on the form of the calibration matrix:

zero skew

Lazebnik

Page 59: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Summary: Structure from motion

• Ambiguity

• Affine structure from motion: factorization

• Dealing with missing data

• Projective structure from motion: two views

• Projective structure from motion: iterative factorization

• Bundle adjustment• Bundle adjustment

• Self-calibration

Lazebnik

Page 60: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Photo Tourism:Photo Tourism:Exploring Photo Collections in 3DExploring Photo Collections in 3D

Noah Snavely

(Included presentation…available fromhttp://phototour.cs.washington.edu/)

© 2006 Noah Snavely

Noah Snavely

Steven M. Seitz

University of Washington

Richard Szeliski

Microsoft Research

© 2006 Noah Snavely

Page 61: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

http://grail.cs.washington.edu/rome/

Page 62: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

“Rome in a day”: Coliseum video

http://grail.cs.washington.edu/rome/

Page 63: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

“Rome in a day”: Trevi video

http://grail.cs.washington.edu/rome/

Page 64: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

“Rome in a day”: St. Peters video

http://grail.cs.washington.edu/rome/

Page 65: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Slide Credits

• Svetlana Lazebnik

• Marc Pollefeys

• Noah Snaveley & co-authors

Page 66: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Today: SFM

• SFM problem statement

• Factorization

• Projective SFM

• Bundle Adjustment• Bundle Adjustment

• Photo Tourism

• “Rome in a day:

Page 67: C280, Computer Vision - EECS at UC Berkeleytrevor/CS280Notes/11SFM.pdfC280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion

Roadmap• Previous: Image formation, filtering, local features, (Texture)…

• Tues: Feature-based Alignment

– Stitching images together

– Homographies, RANSAC, Warping, Blending

– Global alignment of planar models

• Today: Dense Motion Models

– Local motion / feature displacement– Local motion / feature displacement

– Parametric optic flow

• No classes next week: ICCV conference

• Oct 6th: Stereo / ‘Multi-view’: Estimating depth with known

inter-camera pose

• Oct 8th: ‘Structure-from-motion’: Estimation of pose and 3D

structure

– Factorization approaches

– Global alignment with 3D point models