
Page 1

Computer vision: models, learning and inference

M Ahad Multiple Cameras

http://research.google.com/pubs/pub37112.html

http://grail.cs.washington.edu/rome/

http://grail.cs.washington.edu/projects/interior/

http://phototour.cs.washington.edu/

http://phototour.cs.washington.edu/PhotoTourismPreview-640x480.mov

BigBed: http://photosynth.net/view.aspx?cid=877fce1c-4aa9-405c-8024-4fa1dce6a84f

Trevi Fountain: http://photosynth.net/view.aspx?cid=8089d414-fa91-4828-b88f-df07173edee4

Page 2

Structure from Motion (SfM)

• Consider a single camera moving around a static object.
• The goal: to build a 3D model from the images taken by the camera.
• To do this, we will also need to simultaneously establish the properties of the camera and its position in each frame.
• This problem is widely known as structure from motion (although this is something of a misnomer, as both ‘structure’ and ‘motion’ are recovered simultaneously).

Page 3

Structure from motion

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

Given:
• an object that can be characterized by I 3D points
• projections into J images

Find:
• the intrinsic matrix
• the extrinsic matrix for each of the J images
• the I 3D points

(See the objective sketched below.)
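In symbols, one common way to state this goal (a sketch in notation close to Prince's book, not taken from the slide: x_ij is the observed projection of point i in image j, A the intrinsic matrix, Ω_j and τ_j the rotation and translation of camera j, w_i the 3D points, and pinhole[·] the pinhole projection) is as a joint minimization of reprojection error:

$$
\hat{\mathbf{A}},\;\{\hat{\boldsymbol{\Omega}}_j,\hat{\boldsymbol{\tau}}_j\},\;\{\hat{\mathbf{w}}_i\}
=\arg\min\;\sum_{i=1}^{I}\sum_{j=1}^{J}
\bigl\|\mathbf{x}_{ij}-\operatorname{pinhole}\bigl[\mathbf{w}_i,\mathbf{A},\boldsymbol{\Omega}_j,\boldsymbol{\tau}_j\bigr]\bigr\|^{2},
$$

i.e. choose the camera parameters and 3D points that minimize the total squared reprojection error (ignoring, for this sketch, points that are not visible in every image).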

Page 4

Structure from motion

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

For simplicity, we'll start with a simpler problem:
• just J = 2 images
• a known intrinsic matrix

Page 5

Structure

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

• Two view geometry
• The essential and fundamental matrices
• Reconstruction pipeline
• Rectification
• Multi-view reconstruction
• Applications

Page 6

• There is a geometric relationship between corresponding points in two images of the same scene.
• This geometric relationship, the epipolar constraint, depends only on
  – the intrinsic parameters of the two cameras and
  – the relative translation and rotation of the two cameras (determined by the extrinsic parameters).

Page 7

Recap - Intrinsic parameters

Intrinsic [inherent/essential] parameters:
- Focal length parameter f: a different focal length parameter for the x and y dimensions.
- Skew parameter (gamma).
- Offset parameters (delta): the principal ray strikes the image plane at the center rather than at pixel (0,0), so a shift/offset to the center is applied.
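These parameters are conventionally collected into a single intrinsic (calibration) matrix; a sketch in notation close to the deck's (f_x, f_y for the two focal length parameters, γ for skew, δ_x, δ_y for the offset), using A as the symbol for the intrinsic matrix as on a later slide:

$$
\mathbf{A}=\begin{bmatrix} f_x & \gamma & \delta_x\\ 0 & f_y & \delta_y\\ 0 & 0 & 1 \end{bmatrix}
$$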

Page 8

Epipolar lines

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

• Consider point x1 in the first image.
• The 3D point w that projected to x1 must lie somewhere along the ray that passes from the optical center of camera 1 through the position x1 in the image plane (the dashed green line in the original figure).
• But we don't know where along that ray it lies (4 possibilities shown).
• It follows that x2, the projected position in camera 2, must lie somewhere on the projection of this ray.
• The projection of this ray is a line in image 2 and is referred to as an epipolar line.
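In equations (a sketch that anticipates the normalized-camera notation used at the end of this deck, with Ω, τ the rotation and translation of camera 2 relative to camera 1): the candidate 3D points on the ray are w = λ x̃1 for λ > 0, and their projections into the second camera are

$$
\tilde{\mathbf{x}}_2 \;\propto\; \boldsymbol{\Omega}\,\mathbf{w}+\boldsymbol{\tau} \;=\; \lambda\,\boldsymbol{\Omega}\,\tilde{\mathbf{x}}_1+\boldsymbol{\tau},
$$

which, as λ varies, traces out a line in the second image: the epipolar line.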

Page 9

Epipolar geometry

Typical use case for epipolar geometry:
• Two cameras take a picture of the same scene from different points of view.
• The epipolar geometry then describes the relation between the two resulting views.

[Figure: a scene/object viewed by two cameras, with the resulting view/image from each camera.]

Page 10

• Epipolar geometry is the geometry of stereo vision.
• When two cameras view a 3D scene from two distinct positions, there are a number of geometric relations between the 3D points and their projections onto the 2D images that lead to constraints between the image points.
• These relations are derived based on the assumption that the cameras can be approximated by the pinhole camera model.

Page 11

Page 12

• Two pinhole cameras looking at point X.
• In real cameras, the image plane is actually behind the center of projection, and produces an image that is rotated 180 degrees (upside-down).
• In epipolar geometry, however, the projection problem is simplified by placing a virtual image plane in front of the center of projection of each camera to produce an unrotated image.
• OL and OR are the centers of projection of the two cameras.
• X is the point of interest in both cameras.
• Points xL and xR are the projections of point X onto the image planes.

Page 13

• Each camera captures a 2D image of the 3D world.
• This conversion from 3D to 2D is referred to as a perspective projection and is described by the pinhole camera model.
• It is common to model this projection operation by rays that emanate from the camera, passing through its center of projection.
• Note that each emanating ray corresponds to a single point in the image.

Page 14

Epipole or epipolar point

• Since the centers of projection of the cameras are distinct, each center of projection projects to a distinct point in the other camera's image plane.
• These two image points are denoted by eL and eR and are called epipoles or epipolar points.
• Both epipoles eL and eR in their respective image planes and both centers of projection OL and OR lie on a single 3D line.

Page 15

Epipolar line

• The line OL–X is seen by the left camera as a point because it is directly in line with that camera's center of projection. However, the right camera sees this line as a line in its image plane. That line (eR–xR) in the right camera is called an epipolar line. Symmetrically, the line OR–X, seen by the right camera as a point, is seen as the epipolar line eL–xL by the left camera.
• An epipolar line is a function of the 3D point X, i.e. there is a set of epipolar lines in both images if we allow X to vary over all 3D points. Since the 3D line OL–X passes through the center of projection OL, the corresponding epipolar line in the right image must pass through the epipole eR (and correspondingly for epipolar lines in the left image).
• This means that all epipolar lines in one image must intersect the epipolar point of that image. In fact, any line which intersects the epipolar point is an epipolar line, since it can be derived from some 3D point X.

Source: http://encyclopedia.thefreedictionary.com/Epipolar+geometry

Page 16

Epipolar constraint

If the relative translation and rotation of the two cameras are known, the corresponding epipolar geometry leads to two important observations:

• If the projection point xL is known, then the epipolar line eR–xR is known, and the point X projects into the right image at a point xR which must lie on this particular epipolar line.
• This means that for each point observed in one image, the same point must be observed in the other image on a known epipolar line.

Page 17

Epipolar constraint

• This provides an epipolar constraint which corresponding image points must satisfy, and it means that it is possible to test whether two points really correspond to the same 3D point (a small numerical sketch of this test follows below).
• Epipolar constraints can be described by the essential matrix or the fundamental matrix between the two cameras.
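Anticipating the essential matrix E = [τ]× Ω derived at the end of this deck (for normalized cameras), a minimal NumPy sketch of such a test might look as follows; the rotation, translation, and points are made up purely for illustration:

```python
import numpy as np

def cross_matrix(t):
    """3x3 skew-symmetric matrix with cross_matrix(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical relative pose of camera 2 (world frame = camera 1 frame).
theta = np.deg2rad(10.0)
omega = np.array([[np.cos(theta), 0.0, np.sin(theta)],   # small rotation about y
                  [0.0, 1.0, 0.0],
                  [-np.sin(theta), 0.0, np.cos(theta)]])
tau = np.array([1.0, 0.0, 0.1])                           # translation

E = cross_matrix(tau) @ omega                              # essential matrix

# A 3D point and its projections into the two normalized cameras.
w = np.array([0.3, -0.2, 4.0])
x1 = w / w[2]                       # camera 1: homogeneous x~1, scaled so last entry = 1
x2 = omega @ w + tau
x2 = x2 / x2[2]                     # camera 2: homogeneous x~2

# Epipolar constraint: a true correspondence satisfies x~2^T E x~1 = 0.
print(x2 @ E @ x1)                               # ~0 (up to rounding)
print(np.array([0.5, 0.5, 1.0]) @ E @ x1)        # generally nonzero for a mismatch
```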

Page 18

• For any point in the first image, the corresponding point in the second image is constrained to lie on a line. This is known as the epipolar constraint.
• The particular line that it is constrained to lie on depends on:
  – the intrinsic parameters of the cameras, and
  – the relative translation and rotation of the two cameras (determined by the extrinsic parameters).

Page 19

Triangulation

• If the points xL and xR are known, their projection lines are also known.

• If the two image points correspond to the same 3D point X, the projection lines must intersect precisely at X.

• This means that X can be calculated from the coordinates of the two image points, a process called triangulation.
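The slide describes intersecting the two projection lines; in practice (with noisy points) a common linear implementation is the direct linear transform (DLT). A minimal sketch, where the camera matrices and the 3D point are illustrative rather than taken from the slides:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT-style) triangulation: recover the 3D point whose projections
    through the 3x4 camera matrices P1, P2 best match image points x1, x2."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Homogeneous solution: right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    w_h = vt[-1]
    return w_h[:3] / w_h[3]            # back to Cartesian coordinates

# Illustrative setup: two normalized cameras, the second translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

w_true = np.array([0.5, 0.2, 5.0])
x1 = P1 @ np.append(w_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(w_true, 1.0); x2 = x2[:2] / x2[2]

print(triangulate(P1, P2, x1, x2))     # ~ [0.5, 0.2, 5.0]
```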

Page 20

Epipole

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

• Now consider a number of points in the first image.
• Each is associated with a ray in 3D space.
• Each ray projects to form an epipolar line in the second image.
• Since all the rays converge at the optical center of the first camera, the epipolar lines must converge at a single point in the second image plane; this is the image in the second camera of the optical center of the first camera and is known as the epipole.
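In the normalized two-camera notation used later in this deck (a sketch, not from the slide): the optical center of camera 1 is w = 0, so its image in camera 2, i.e. the epipole, is

$$
\tilde{\mathbf{e}}_2 \;\propto\; \boldsymbol{\Omega}\,\mathbf{0}+\boldsymbol{\tau} \;=\; \boldsymbol{\tau},
$$

and symmetrically the epipole in image 1 is $\tilde{\mathbf{e}}_1 \propto \boldsymbol{\Omega}^{T}\boldsymbol{\tau}$ (the projection of camera 2's optical center, $\mathbf{w}=-\boldsymbol{\Omega}^{T}\boldsymbol{\tau}$, up to scale).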

Page 21

Special configurations: The epipoles are not necessarily within the observed images: the epipolar lines may converge to a point outside the visible area.

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

When the camera movement is a pure translation perpendicular to the optical axis (parallel to the image plane) and the cameras are oriented in the same direction (i.e., no relative rotation), the epipolar lines are parallel and the epipole (where they converge) is at infinity.

Page 22

Special configuration 2

When the camera movement is a pure translation along the optical axis, the epipoles are in the center of the image and the epipolar lines form a radial pattern.

Page 23

• To calculate depth information from a pair of images we need to compute the epipolar geometry.

• In the calibrated environment we capture this geometric constraint in an algebraic representation known as the essential matrix.

• In the uncalibrated environment, it is captured in the fundamental matrix.
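The two representations are closely related: with intrinsic matrices A1 and A2 (the notation of the next slide), the fundamental matrix absorbs the intrinsics. A hedged sketch of the standard relation:

$$
\mathbf{F}=\mathbf{A}_2^{-T}\,\mathbf{E}\,\mathbf{A}_1^{-1},
\qquad
\tilde{\mathbf{x}}_2^{T}\,\mathbf{F}\,\tilde{\mathbf{x}}_1=0,
$$

where x̃1, x̃2 are now measured in pixel coordinates rather than in normalized camera coordinates.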

Page 24

The essential matrix (Prince, Section 16.2)

• Assume that the world coordinate system is centered on the 1st camera, so that the extrinsic parameters (rotation and translation) of the 1st camera are {I, 0}.
• The 2nd camera may be in any general position {Ω, τ}.
• We will further assume that the cameras are normalized, so that the intrinsic matrices are A1 = A2 = I.

Page 25

The essential matrix (Prince, Section 16.2)

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

The geometric relationship between the two cameras is captured by the essential matrix. Assume normalized cameras, with the first camera at the origin. In homogeneous coordinates, a 3D point w is projected into the two cameras, where ~x1 is the observed position in camera 1 and ~x2 is the observed position in camera 2. The projection equation for the first camera, and the form it simplifies to, are given below.
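The equations on this slide were rendered as images and are not in the transcript; a reconstruction of the standard form (following Prince's derivation, same symbols as the surrounding slides) for the first camera is

$$
\lambda_1\,\tilde{\mathbf{x}}_1=\begin{bmatrix}\mathbf{I} & \mathbf{0}\end{bmatrix}\begin{bmatrix}\mathbf{w}\\ 1\end{bmatrix},
$$

which simplifies to

$$
\lambda_1\,\tilde{\mathbf{x}}_1=\mathbf{w}.
$$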

Page 26

Prince, Section 14.3, p. 371

• λ is an arbitrary scaling factor.
• This is a redundant representation, in that any scalar multiple λ represents the same 2D point.
• E.g., the homogeneous vectors ~x = [2; 4; 2]T and ~x = [3; 6; 3]T both represent the Cartesian 2D point x = [1; 2]T, where scaling factors λ = 2 and λ = 3 have been used, respectively.

Page 27

Similarly, for camera 2

By a similar process, the projection in the second camera can be written as shown below.
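Again the equation itself is missing from the transcript; the standard form (my reconstruction, same notation) is

$$
\lambda_2\,\tilde{\mathbf{x}}_2=\begin{bmatrix}\boldsymbol{\Omega} & \boldsymbol{\tau}\end{bmatrix}\begin{bmatrix}\mathbf{w}\\ 1\end{bmatrix}=\boldsymbol{\Omega}\,\mathbf{w}+\boldsymbol{\tau}.
$$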

Page 28

The essential matrix

Substituting the first-camera relation into the second-camera relation (see the equations below) gives a mathematical relationship between the points in the two images, but it is not in the most convenient form. This relationship represents a constraint between the possible positions of corresponding points x1 and x2 in the two images. The constraint is parameterized by the rotation and translation {Ω, τ} of camera 2 relative to camera 1.
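The three equations on this slide are not in the transcript; a reconstruction of the standard derivation (my LaTeX, same notation) is: first camera $\lambda_1\tilde{\mathbf{x}}_1=\mathbf{w}$, second camera $\lambda_2\tilde{\mathbf{x}}_2=\boldsymbol{\Omega}\mathbf{w}+\boldsymbol{\tau}$, and substituting the first into the second gives

$$
\lambda_2\,\tilde{\mathbf{x}}_2=\lambda_1\,\boldsymbol{\Omega}\,\tilde{\mathbf{x}}_1+\boldsymbol{\tau}.
$$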

Page 29

The essential matrix

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

Take the cross product of both sides with the translation vector τ; this removes the last term, as the cross product of any vector with itself is zero.

Then take the inner product of both sides with ~x2; the left-hand side disappears, since τ × ~x2 must be perpendicular to ~x2.
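In symbols (a reconstruction of the two results, which are not in the transcript, using the notation of the previous slides):

$$
\lambda_2\,\boldsymbol{\tau}\times\tilde{\mathbf{x}}_2=\lambda_1\,\boldsymbol{\tau}\times\boldsymbol{\Omega}\,\tilde{\mathbf{x}}_1,
$$

$$
0=\lambda_2\,\tilde{\mathbf{x}}_2^{T}(\boldsymbol{\tau}\times\tilde{\mathbf{x}}_2)=\lambda_1\,\tilde{\mathbf{x}}_2^{T}(\boldsymbol{\tau}\times\boldsymbol{\Omega}\,\tilde{\mathbf{x}}_1)
\;\;\Longrightarrow\;\;
\tilde{\mathbf{x}}_2^{T}\,(\boldsymbol{\tau}\times\boldsymbol{\Omega}\,\tilde{\mathbf{x}}_1)=0.
$$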

Page 30

The essential matrix

The cross-product term can be expressed as a matrix. Defining this matrix and folding it into the constraint (see below), we obtain the essential matrix relation: a formulation of the mathematical constraint between the positions of corresponding points x1 and x2 in two normalized cameras. The matrix E appearing in this relation is known as the essential matrix.
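The defining equations on this slide are not in the transcript; a reconstruction in the same notation: the cross product with τ can be written as multiplication by the skew-symmetric matrix

$$
\boldsymbol{\tau}\times\mathbf{v}=[\boldsymbol{\tau}]_{\times}\,\mathbf{v},
\qquad
[\boldsymbol{\tau}]_{\times}=\begin{bmatrix}0 & -\tau_z & \tau_y\\ \tau_z & 0 & -\tau_x\\ -\tau_y & \tau_x & 0\end{bmatrix},
$$

so defining E = [τ]× Ω gives the essential matrix relation

$$
\tilde{\mathbf{x}}_2^{T}\,\mathbf{E}\,\tilde{\mathbf{x}}_1=0.
$$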