3D Vision: Structure from Motion - CVG @ ETHZ


Post on 17-Jul-2020


Page 1:

3D Vision: Structure from Motion

Page 2:

Structure from Motion

•  Two view reconstruction
   •  Epipolar geometry computation
   •  Triangulation

•  Adding more views
   •  Pose estimation

Page 3:

Epipolar geometry

Page 4:

The fundamental matrix F

algebraic representation of epipolar geometry: x ↦ l’

we will see that this mapping is a (singular) correlation (i.e. a projective mapping from points to lines) represented by the fundamental matrix F

Page 5:

The fundamental matrix F

geometric derivation

mapping from 2-D to 1-D family (rank 2)
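The equations of the geometric derivation did not survive extraction. A sketch of the standard argument, assuming a scene plane π (not through either camera centre) that induces a homography H_π between the two images; the symbols H_π and [e']_× are assumptions, not taken from the transcript:

```latex
\begin{align*}
x' &= H_\pi\, x \\
l' &= e' \times x' = [e']_\times H_\pi\, x = F x \\
F  &= [e']_\times H_\pi
\end{align*}
```

Because [e']_× has rank 2 and H_π has rank 3, F has rank 2: the 2-D set of points x is mapped onto the 1-D family of lines through e'.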

Page 6:

The fundamental matrix F

algebraic derivation

(note: doesn’t work for C=C’ ⇒ F=0)
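The algebraic derivation can be sketched as follows, assuming P⁺ denotes the pseudo-inverse of P (so PP⁺ = I) and C the centre of the first camera (PC = 0); these symbols are assumptions filled in from the standard textbook treatment:

```latex
\begin{align*}
X(\lambda) &= P^{+} x + \lambda C
  && \text{(ray back-projected from } x\text{)} \\
l' &= (P' C) \times (P' P^{+} x) = [e']_\times P' P^{+} x
  && \text{(line through two projected ray points)} \\
F  &= [e']_\times P' P^{+}, \qquad e' = P' C
\end{align*}
```

If C = C’, then e’ = P’C = 0 and the expression collapses to F = 0, which is the failure case named in the note.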

Page 7:

The fundamental matrix F

correspondence condition

The fundamental matrix satisfies the condition that for any pair of corresponding points x↔x’ in the two images: x’ᵀFx = 0 (x’ lies on the epipolar line l’ = Fx, so x’ᵀl’ = 0)

Page 8:

The fundamental matrix F - recap

F is the unique 3x3 rank 2 matrix that satisfies x’ᵀFx=0 for all x↔x’

(i)   Transpose: if F is fundamental matrix for (P,P’), then Fᵀ is fundamental matrix for (P’,P)

(ii)  Epipolar lines: l’=Fx & l=Fᵀx’

(iii) Epipoles: on all epipolar lines, thus e’ᵀFx=0, ∀x ⇒ e’ᵀF=0, similarly Fe=0

(iv)  F has 7 d.o.f., i.e. 3x3-1(homogeneous)-1(rank2)

(v)   F is a correlation, projective mapping from a point x to a line l’=Fx (not a proper correlation, i.e. not invertible)
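The recap properties can be checked numerically. The following is a minimal sketch, assuming NumPy and a made-up camera pair; the helper names (`skew`, `camera_centre`, `fundamental_from_cameras`) are hypothetical, and F is built with the formula F = [e’]×P’P⁺.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def camera_centre(P):
    """Homogeneous camera centre C with P @ C = 0 (right null vector of P)."""
    return np.linalg.svd(P)[2][-1]

def fundamental_from_cameras(P1, P2):
    """F = [e']_x P' P^+ for the camera pair (P, P') = (P1, P2)."""
    e2 = P2 @ camera_centre(P1)          # epipole e' in the second image
    return skew(e2) @ P2 @ np.linalg.pinv(P1)

# Hypothetical setup: first camera at the origin, second rotated and shifted.
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([[1.0], [0.2], [0.1]])
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([R, t])
F = fundamental_from_cameras(P1, P2)

# Project random 3D points (in front of both cameras) into both views.
rng = np.random.default_rng(0)
X = np.vstack([rng.uniform(-1, 1, (2, 10)),
               rng.uniform(4, 6, (1, 10)),
               np.ones((1, 10))])
x1, x2 = P1 @ X, P2 @ X

residuals = np.sum(x2 * (F @ x1), axis=0)             # x'^T F x per match
singular_values = np.linalg.svd(F, compute_uv=False)  # third one should vanish
e1 = P1 @ camera_centre(P2)                           # epipole e, first image
e2 = P2 @ camera_centre(P1)                           # epipole e', second image
```

With this setup, `residuals`, `F @ e1` and `e2 @ F` should all vanish up to rounding, and swapping the views corresponds to using `F.T`.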

Page 9:

Computation of F

•  Linear (8-point)
•  Minimal (7-point)
•  Calibrated (5-point) (Essential matrix)

•  Practical two-view geometry computation

Page 10:

Epipolar geometry: basic equation

separate known from unknown

(data) (unknowns) (linear)
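The equations on this slide were lost in extraction. A reconstruction under the assumption that x = (x, y, 1)ᵀ, x’ = (x’, y’, 1)ᵀ and that f stacks the entries of F row by row (the same column order as the 5-point data matrix later in the deck):

```latex
\begin{align*}
x'^{\top} F x &= 0 \\
x'x\,f_{11} + x'y\,f_{12} + x'\,f_{13} + y'x\,f_{21} + y'y\,f_{22}
  + y'\,f_{23} + x\,f_{31} + y\,f_{32} + f_{33} &= 0 \\
\underbrace{\begin{bmatrix} x'x & x'y & x' & y'x & y'y & y' & x & y & 1 \end{bmatrix}}_{\text{data}}
\underbrace{\begin{bmatrix} f_{11} & f_{12} & \cdots & f_{33} \end{bmatrix}^{\top}}_{\text{unknowns}} &= 0
\end{align*}
```

Stacking one such row per correspondence gives the linear system Af = 0.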

Page 11:

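the NOT normalized 8-point algorithm

Typical magnitudes of the nine data-matrix columns for pixel coordinates: ~10000 for x’x, x’y, y’x, y’y; ~100 for x’, y’, x, y; 1 for the constant column

! Orders of magnitude difference between columns of data matrix → least-squares yields poor results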

Page 12:

the normalized 8-point algorithm

Transform image to ~[-1,1]x[-1,1]

[Figure: image with corners (0,0), (700,0), (0,500), (700,500) mapped to the square with corners (-1,-1), (1,-1), (-1,1), (1,1), centred at (0,0)]

Page 13:

the singularity constraint

SVD from linearly computed F matrix (rank 3)

Compute closest rank-2 approximation
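The normalization, the linear solve and the singularity constraint can be combined into one routine. This is a minimal sketch, assuming NumPy, a simple scale-and-shift normalization to [-1,1]x[-1,1] built from the image size, and hypothetical function names; the synthetic cameras and points are invented only to exercise the code.

```python
import numpy as np

def unit_square_transform(width, height):
    """Map pixel coordinates in [0,width]x[0,height] to roughly [-1,1]x[-1,1]."""
    return np.array([[2.0 / width, 0.0, -1.0],
                     [0.0, 2.0 / height, -1.0],
                     [0.0, 0.0, 1.0]])

def normalized_eight_point(x1, x2, T1, T2):
    """Estimate F from N >= 8 matches; x1, x2 are (N, 2) pixel coordinates."""
    ones = np.ones((len(x1), 1))
    n1 = np.hstack([x1, ones]) @ T1.T      # normalized points, first image
    n2 = np.hstack([x2, ones]) @ T2.T      # normalized points, second image
    x, y = n1[:, 0], n1[:, 1]
    xp, yp = n2[:, 0], n2[:, 1]
    # One row [x'x, x'y, x', y'x, y'y, y', x, y, 1] per match, so that A f = 0.
    A = np.column_stack([xp * x, xp * y, xp, yp * x, yp * y, yp,
                         x, y, np.ones(len(x))])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)   # least-squares null vector of A
    # Singularity constraint: zero the smallest singular value (closest rank 2).
    U, s, Vt = np.linalg.svd(F)
    F = U @ np.diag([s[0], s[1], 0.0]) @ Vt
    return T2.T @ F @ T1                   # undo the normalization

# Synthetic data: hypothetical calibration and motion for a 700x500 image.
K = np.array([[500.0, 0.0, 350.0], [0.0, 500.0, 250.0], [0.0, 0.0, 1.0]])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([[1.0], [0.2], [0.1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
rng = np.random.default_rng(1)
X = np.vstack([rng.uniform(-1, 1, (2, 12)),
               rng.uniform(4, 8, (1, 12)),
               np.ones((1, 12))])
p1, p2 = P1 @ X, P2 @ X
x1, x2 = (p1[:2] / p1[2]).T, (p2[:2] / p2[2]).T

T = unit_square_transform(700, 500)
F = normalized_eight_point(x1, x2, T, T)
h1 = np.hstack([x1, np.ones((12, 1))])
h2 = np.hstack([x2, np.ones((12, 1))])
residuals = np.sum(h2 * (h1 @ F.T), axis=1)   # x'^T F x in pixel coordinates
```

On noise-free data the residuals vanish up to rounding; with noisy matches the same routine returns the least-squares estimate projected onto rank 2.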

Page 14:

Page 15:

the minimum case – 7 point correspondences

one parameter family of solutions

but F1+λF2 not automatically rank 2

Page 16:

the minimum case – impose rank 2

[Figure: labels F1, F2, F, σ3, F7pts]

det(F1+λF2) = 0 (cubic equation)

(obtain 1 or 3 solutions)

Compute possible λ as eigenvalues of −F2⁻¹F1, for invertible F2 (only real solutions are potential solutions)
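A minimal sketch of the 7-point case, assuming NumPy. Instead of the eigenvalue formulation it recovers the cubic det(F1+λF2) by sampling it at four values of λ and takes the real roots, which is a swapped-in but equivalent route; the function names and the synthetic scene are hypothetical.

```python
import numpy as np

def seven_point(x1, x2):
    """Return the rank-2 candidates F = F1 + lam * F2 for 7 matches.

    x1, x2: (7, 2) arrays of matching points (ideally normalized).
    """
    x, y = x1[:, 0], x1[:, 1]
    xp, yp = x2[:, 0], x2[:, 1]
    A = np.column_stack([xp * x, xp * y, xp, yp * x, yp * y, yp,
                         x, y, np.ones(7)])
    Vt = np.linalg.svd(A)[2]               # A is 7x9: two-dimensional null space
    F1, F2 = Vt[-1].reshape(3, 3), Vt[-2].reshape(3, 3)
    # det(F1 + lam*F2) is a cubic in lam; four samples determine it exactly.
    lams = np.array([-1.0, 0.0, 1.0, 2.0])
    dets = [np.linalg.det(F1 + lam * F2) for lam in lams]
    roots = np.roots(np.polyfit(lams, dets, 3))
    real_roots = roots[np.abs(roots.imag) < 1e-8].real   # keep real solutions only
    return [F1 + lam * F2 for lam in real_roots]

def scale_free_distance(A, B):
    """Distance between two matrices that are only defined up to scale and sign."""
    A, B = A / np.linalg.norm(A), B / np.linalg.norm(B)
    return min(np.linalg.norm(A - B), np.linalg.norm(A + B))

# Hypothetical calibrated setup in normalized coordinates.
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, 0.2, 0.1])
F_true = np.array([[0.0, -t[2], t[1]],
                   [t[2], 0.0, -t[0]],
                   [-t[1], t[0], 0.0]]) @ R
rng = np.random.default_rng(2)
X = np.vstack([rng.uniform(-1, 1, (2, 7)), rng.uniform(4, 8, (1, 7))])
p1, p2 = X, R @ X + t[:, None]
x1, x2 = (p1[:2] / p1[2]).T, (p2[:2] / p2[2]).T

solutions = seven_point(x1, x2)
best = min(scale_free_distance(F, F_true) for F in solutions)
```

Each returned candidate has zero determinant by construction; in a RANSAC loop all of them would be scored against the remaining matches.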

Page 17:

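Calibrated case: 5-point relative motion

(Nistér, CVPR03)

•  Linear equations for 5 points

\[
\begin{bmatrix}
x'_1x_1 & x'_1y_1 & x'_1 & y'_1x_1 & y'_1y_1 & y'_1 & x_1 & y_1 & 1 \\
x'_2x_2 & x'_2y_2 & x'_2 & y'_2x_2 & y'_2y_2 & y'_2 & x_2 & y_2 & 1 \\
x'_3x_3 & x'_3y_3 & x'_3 & y'_3x_3 & y'_3y_3 & y'_3 & x_3 & y_3 & 1 \\
x'_4x_4 & x'_4y_4 & x'_4 & y'_4x_4 & y'_4y_4 & y'_4 & x_4 & y_4 & 1 \\
x'_5x_5 & x'_5y_5 & x'_5 & y'_5x_5 & y'_5y_5 & y'_5 & x_5 & y_5 & 1
\end{bmatrix}
\begin{bmatrix}
E_{11} \\ E_{12} \\ E_{13} \\ E_{21} \\ E_{22} \\ E_{23} \\ E_{31} \\ E_{32} \\ E_{33}
\end{bmatrix}
= 0
\]

(assumes normalized coordinates)

•  Linear solution space

\[ E = xX + yY + zZ + wW \]

scale does not matter, choose w = 1

•  Non-linear constraints

\[ \det E = 0, \qquad 2EE^{\top}E - \operatorname{trace}(EE^{\top})\,E = 0 \]

10 cubic polynomials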
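The non-linear constraints behind the 10 cubic polynomials are det E = 0 (one equation) and 2EEᵀE − trace(EEᵀ)E = 0 (nine equations). The sketch below, assuming NumPy and a hypothetical helper `essential_constraints`, checks that a matrix of the form E = [t]×R satisfies them while a generic matrix does not.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def essential_constraints(E):
    """Return (det E, 2 E E^T E - trace(E E^T) E); both vanish for a valid E."""
    EEt = E @ E.T
    return np.linalg.det(E), 2.0 * EEt @ E - np.trace(EEt) * E

# Hypothetical relative motion.
a = 0.3
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.5, -0.2, 1.0])
E = skew(t) @ R

det_E, cubic_E = essential_constraints(E)           # expected: all zeros
det_I, cubic_I = essential_constraints(np.eye(3))   # identity is not essential
```

In the 5-point method these expressions are expanded in the unknowns x, y, z of the linear solution space, which is where the cubic polynomials come from.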

Page 18:

Calibrated case: 5-point relative motion

•  Perform Gauss-Jordan elimination on polynomials

[Elimination table not recoverable; three of its entries read -z]

[n] represents polynomial of degree n in z

(Nister, CVPR03)

Page 19:

Automatic computation of F

RANSAC

Step 1. Extract features
Step 2. Compute a set of potential matches
Step 3. do
    Step 3.1 select minimal sample (i.e. 7 or 5 matches)    (generate hypothesis)
    Step 3.2 compute solution(s) for F
    Step 3.3 determine inliers                               (verify hypothesis)
until Γ(#inliers,#samples)>95%
Step 4. Compute F based on all inliers
Step 5. Look for additional matches
Step 6. Refine F based on all correct matches

Γ: probability that at least one sample drawn so far is free of outliers. Samples required for 95% with 7-point samples:

#inliers    90%    80%    70%    60%    50%
#samples    5      13     35     106    382
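The sample counts in the table follow from requiring that, with 95% probability, at least one minimal sample contains only inliers. A small sketch, assuming 7-point samples and the standard formula N = log(1 − p) / log(1 − wˢ); the function name is hypothetical.

```python
import math

def samples_needed(inlier_ratio, sample_size=7, confidence=0.95):
    """Smallest N with 1 - (1 - w**s)**N >= confidence."""
    p_all_inliers = inlier_ratio ** sample_size   # one sample is outlier-free
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_all_inliers))

# Reproduce the #samples row for inlier ratios 90% ... 50%.
table = [samples_needed(w) for w in (0.9, 0.8, 0.7, 0.6, 0.5)]
print(table)   # [5, 13, 35, 106, 382]
```

Passing `sample_size=5` shows how much the calibrated 5-point solver reduces the number of iterations at low inlier ratios.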

Page 20:

Finding more matches

restrict search range to neighborhood of epipolar line (e.g. ±1.5 pixels)
relax disparity restriction (along epipolar line)

Page 21:

Initial structure and motion

Epipolar geometry ↔ Projective calibration

(P1, P2) compatible with F

Yields correct projective camera setup (Faugeras ’92, Hartley ’92)

Obtain structure through triangulation
Use reprojection error for minimization
Avoid measurements in projective space

Page 22:

Initial structure and motion (calibrated case)

Essential Matrix: E = [t]×R

Essential Matrix decomposition

Recover R and t from E: two candidate rotations and two candidate translations (±t) give a four-fold ambiguity

P1 = [I | 0]

P2 = [R | t]

(e.g. see Hartley and Zisserman, Sec.8.6)
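A minimal sketch of the decomposition, assuming NumPy and the SVD-based construction from the cited textbook section (R = UWVᵀ or UWᵀVᵀ, t = ±u3); the function name is hypothetical and the translation is recovered only up to scale.

```python
import numpy as np

def decompose_essential(E):
    """Return the four (R, t) candidates for P2 = [R | t], with |t| = 1."""
    U, _, Vt = np.linalg.svd(E)
    # E is defined up to sign, so U and V can be forced to be proper rotations.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]                       # left null vector of E
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

# Hypothetical ground truth used to exercise the routine.
a = 0.3
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.5, -0.2, 1.0])
t_unit = t / np.linalg.norm(t)
E = np.array([[0.0, -t[2], t[1]],
              [t[2], 0.0, -t[0]],
              [-t[1], t[0], 0.0]]) @ R

candidates = decompose_essential(E)
found = any(np.allclose(Rc, R, atol=1e-8) and np.allclose(tc, t_unit, atol=1e-8)
            for Rc, tc in candidates)
```

The physically valid candidate is usually selected by triangulating a point and keeping the pair for which it lies in front of both cameras.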

Page 23:

Triangulation

[Figure: camera centres C1, C2, image points x1, x2, back-projected rays L1, L2 and 3D point X]

Triangulation requires:
-  calibration
-  correspondences

Page 24:

Triangulation

•  Backprojection

•  Triangulation
   Iterative least-squares

•  Maximum Likelihood Triangulation (geometric error)

[Figure: image points x1, x2, rays L1, L2 and 3D point X]
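A minimal sketch of linear triangulation, assuming NumPy and known cameras P1, P2. It solves the homogeneous system obtained from x × (PX) = 0 for both views in the algebraic least-squares sense; it is not the iterative or maximum-likelihood variant listed above. The function name and test scene are hypothetical.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear triangulation of one match; x1, x2 are (x, y) image points."""
    # Each view contributes two equations: x * (p3 . X) - p1 . X = 0, etc.
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]       # null vector of the 4x4 system
    return X[:3] / X[3]

# Hypothetical cameras and 3D point.
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([R, np.array([[1.0], [0.2], [0.1]])])
X_true = np.array([0.3, -0.2, 5.0])

p1 = P1 @ np.append(X_true, 1.0)
p2 = P2 @ np.append(X_true, 1.0)
X_est = triangulate(P1, P2, p1[:2] / p1[2], p2[:2] / p2[2])
```

With noisy measurements the rays no longer intersect, and this algebraic solution is typically used only as a starting point for minimizing the reprojection error.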

Page 25:

Optimal 3D point in epipolar plane

•  Given an epipolar plane, find best 3D point for (m1,m2)

[Figure: measured points m1, m2, epipolar lines l1, l2 and the closest points m1´, m2´ on those lines]

Select closest points (m1´,m2´) on epipolar lines
Obtain 3D point through exact triangulation
Guarantees minimal reprojection error (given this epipolar plane)

Page 26:

Non-iterative optimal solution

•  Reconstruct matches in projective frame by minimizing the reprojection error (3DOF)

•  Non-iterative method (Hartley and Sturm, CVIU´97)
   Determine the epipolar plane for reconstruction (1DOF, polynomial of degree 6)
   Reconstruct optimal point from selected epipolar plane
   Note: only works for two views

[Figure: points m1, m2 and epipolar lines l1(α), l2(α)]

Page 27:

Sequential Structure and Motion Computation

Initialize Motion (P1,P2 compatible with F or E)

Initialize Structure (minimize reprojection error)

Page 28:

Sequential structure and motion recovery

•  Initialize structure and motion from two views

•  For each additional view
   •  Determine pose
   •  Refine and extend structure

•  Determine correspondences robustly by jointly estimating matches and epipolar geometry

Page 29:

Determine pose towards existing structure

[Figure: new view linked to the previous view by 2D-2D matches (mi, mi+1) and to the existing 3D points M by 2D-3D matches]

Compute Pi+1 using robust approach (6-point RANSAC)
Extend and refine reconstruction

Page 30:

Compute P with 6-point RANSAC

•  Generate hypothesis using 6 points

•  Planar scenes are degenerate!

(similar DLT algorithm as seen in 2nd lecture for homographies)

(two equations per point)
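A minimal sketch of the hypothesis-generation step, assuming NumPy: a DLT that stacks two equations per 2D-3D match and takes the null vector of the resulting system as the 12 entries of P. The function name and the synthetic, non-planar test points are hypothetical; the surrounding RANSAC loop is omitted.

```python
import numpy as np

def dlt_camera(X, x):
    """Estimate a 3x4 camera P (up to scale) from N >= 6 matches.

    X: (N, 3) 3D points, x: (N, 2) image points.
    """
    rows = []
    for (Xw, Yw, Zw), (u, v) in zip(X, x):
        Xh = [Xw, Yw, Zw, 1.0]
        # u = (p1 . X) / (p3 . X)  ->  p1 . X - u * (p3 . X) = 0, same for v.
        rows.append(Xh + [0.0] * 4 + [-u * c for c in Xh])
        rows.append([0.0] * 4 + Xh + [-v * c for c in Xh])
    A = np.array(rows)
    return np.linalg.svd(A)[2][-1].reshape(3, 4)

# Hypothetical camera and six non-coplanar points.
a = 0.2
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
P_true = np.hstack([R, np.array([[0.3], [-0.1], [0.5]])])
rng = np.random.default_rng(3)
X = np.column_stack([rng.uniform(-1, 1, 6), rng.uniform(-1, 1, 6),
                     rng.uniform(4, 8, 6)])
proj = (P_true @ np.column_stack([X, np.ones(6)]).T).T
x = proj[:, :2] / proj[:, 2:]

P_est = dlt_camera(X, x)
reproj = (P_est @ np.column_stack([X, np.ones(6)]).T).T
reproj_error = np.abs(reproj[:, :2] / reproj[:, 2:] - x).max()
```

If all 3D points lie in one plane, the system loses rank and the null vector is no longer unique, which is the degeneracy flagged on the slide.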

Page 31:

Three points perspective pose – p3p (calibrated case)

(Haralick et al., IJCV94)

All techniques yield 4th order polynomial

[Figure labels: 1841, 1903]

Page 32:

Sequential Structure and Motion Computation

Initialize Motion (P1,P2 compatible with F or E)

Initialize Structure (minimize reprojection error)

Extend motion (compute pose through matches seen in 2 or more previous views)

Extend structure (Initialize new structure, refine existing structure)

Page 33:

Changchang’s SfM code

For iconic graph
•  uses 5-point+RANSAC for 2-view initialization
•  uses 3-point+RANSAC for adding views
•  performs bundle adjustment

For additional images
•  use 3-point+RANSAC pose estimation

http://ccwu.me/vsfm/

Page 34:

Rome on a cloudless day (Frahm et al. ECCV 2010)

GIST & clustering (1h35)

SIFT & Geometric verification (11h36)

SfM & Bundle (8h35)

Dense Reconstruction (1h58)

Some numbers
•  1 PC
•  2.88M images
•  100k clusters
•  22k SfM with 307k images
•  63k 3D models
•  Largest model 5700 images
•  Total time 23h53

Page 35:

Hierarchical structure and motion recovery

•  Compute 2-view
•  Compute 3-view
•  Stitch 3-view reconstructions
•  Merge and refine reconstruction

[Figure labels: F, T, H, PM]

Page 36:

Stitching 3-view reconstructions

Different possibilities

1. Align (P2,P3) with (P’1,P’2)

2. Align X,X’ (and C’C’)

3. Minimize reproj. error

4. MLE (merge)

Page 37:

SfM revisited

Soon available at https://github.com/colmap/colmap

Structure-from-Motion revisited, Johannes L. Schönberger, Jan-Michael Frahm IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

Page 38:

Next week: Dense Correspondences