
Page 1

Computer vision: models, learning and inference

M Ahad Multiple Cameras

http://research.google.com/pubs/pub37112.html

http://grail.cs.washington.edu/rome/

http://grail.cs.washington.edu/projects/interior/

http://phototour.cs.washington.edu/

http://phototour.cs.washington.edu/PhotoTourismPreview-640x480.mov

BigBed: http://photosynth.net/view.aspx?cid=877fce1c-4aa9-405c-8024-4fa1dce6a84f

Trevi Fountain: http://photosynth.net/view.aspx?cid=8089d414-fa91-4828-b88f-df07173edee4

Page 2

Structure from Motion (SfM)

• Consider a single camera moving around a static object.
• The goal: to build a 3D model from the images taken by the camera.
• To do this, we will also need to simultaneously establish the properties of the camera and its position in each frame.
• This problem is widely known as structure from motion (although this is something of a misnomer, as both ‘structure’ and ‘motion’ are recovered simultaneously).

Page 3

Structure from motion

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

Given:
• an object that can be characterized by I 3D points
• projections into J images

Find:
• the intrinsic matrix
• the extrinsic matrix for each of the J images
• the I 3D points

(See the objective sketched below.)
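In symbols, one common way to state this goal (a sketch in notation close to Prince's book, not taken from the slide: x_ij is the observed projection of point i in image j, A the intrinsic matrix, Ω_j and τ_j the rotation and translation of camera j, w_i the 3D points, and pinhole[·] the pinhole projection) is as a joint minimization of reprojection error:

$$
\hat{\mathbf{A}},\;\{\hat{\boldsymbol{\Omega}}_j,\hat{\boldsymbol{\tau}}_j\},\;\{\hat{\mathbf{w}}_i\}
=\arg\min\;\sum_{i=1}^{I}\sum_{j=1}^{J}
\bigl\|\mathbf{x}_{ij}-\operatorname{pinhole}\bigl[\mathbf{w}_i,\mathbf{A},\boldsymbol{\Omega}_j,\boldsymbol{\tau}_j\bigr]\bigr\|^{2},
$$

i.e. choose the camera parameters and 3D points that minimize the total squared reprojection error (ignoring, for this sketch, points that are not visible in every image).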

Page 4

Structure from motion

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

For simplicity, we'll start with a simpler problem:
• just J = 2 images
• a known intrinsic matrix

Page 5

Structure

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

• Two view geometry
• The essential and fundamental matrices
• Reconstruction pipeline
• Rectification
• Multi-view reconstruction
• Applications

Page 6

• There is a geometric relationship between corresponding points in two images of the same scene.
• This geometric relationship, the epipolar constraint, depends only on
  – the intrinsic parameters of the two cameras and
  – the relative translation and rotation of the two cameras (determined by the extrinsic parameters).

Page 7

Recap - Intrinsic parameters

Intrinsic [inherent/essential] parameters:
- Focal length parameter f: a different focal length parameter for the x and y dimensions.
- Skew parameter (gamma).
- Offset parameters (delta): the principal ray strikes the image plane at the center rather than at pixel (0,0), so a shift/offset to the center is applied.
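These parameters are conventionally collected into a single intrinsic (calibration) matrix; a sketch in notation close to the deck's (f_x, f_y for the two focal length parameters, γ for skew, δ_x, δ_y for the offset), using A as the symbol for the intrinsic matrix as on a later slide:

$$
\mathbf{A}=\begin{bmatrix} f_x & \gamma & \delta_x\\ 0 & f_y & \delta_y\\ 0 & 0 & 1 \end{bmatrix}
$$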

Page 8

Epipolar lines

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

• Consider point x1 in the first image.
• The 3D point w that projected to x1 must lie somewhere along the ray that passes from the optical center of camera 1 through the position x1 in the image plane (the dashed green line in the original figure).
• But we don't know where along that ray it lies (4 possibilities shown).
• It follows that x2, the projected position in camera 2, must lie somewhere on the projection of this ray.
• The projection of this ray is a line in image 2 and is referred to as an epipolar line.
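In equations (a sketch that anticipates the normalized-camera notation used at the end of this deck, with Ω, τ the rotation and translation of camera 2 relative to camera 1): the candidate 3D points on the ray are w = λ x̃1 for λ > 0, and their projections into the second camera are

$$
\tilde{\mathbf{x}}_2 \;\propto\; \boldsymbol{\Omega}\,\mathbf{w}+\boldsymbol{\tau} \;=\; \lambda\,\boldsymbol{\Omega}\,\tilde{\mathbf{x}}_1+\boldsymbol{\tau},
$$

which, as λ varies, traces out a line in the second image: the epipolar line.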

Page 9

Epipolar geometry

Typical use case for epipolar geometry:
• Two cameras take a picture of the same scene from different points of view.
• The epipolar geometry then describes the relation between the two resulting views.

[Figure: a scene/object viewed by two cameras, with the resulting view/image from each camera.]

Page 10

• Epipolar geometry is the geometry of stereo vision.
• When two cameras view a 3D scene from two distinct positions, there are a number of geometric relations between the 3D points and their projections onto the 2D images that lead to constraints between the image points.
• These relations are derived based on the assumption that the cameras can be approximated by the pinhole camera model.

Page 11

Page 12

• Two pinhole cameras looking at point X.
• In real cameras, the image plane is actually behind the center of projection, and produces an image that is rotated 180 degrees (upside-down).
• In epipolar geometry, however, the projection problem is simplified by placing a virtual image plane in front of the center of projection of each camera to produce an unrotated image.
• OL and OR are the centers of projection of the two cameras.
• X is the point of interest in both cameras.
• Points xL and xR are the projections of point X onto the image planes.

Page 13

• Each camera captures a 2D image of the 3D world.
• This conversion from 3D to 2D is referred to as a perspective projection and is described by the pinhole camera model.
• It is common to model this projection operation by rays that emanate from the camera, passing through its center of projection.
• Note that each emanating ray corresponds to a single point in the image.

Page 14

Epipole or epipolar point

• Since the centers of projection of the cameras are distinct, each center of projection projects to a distinct point in the other camera's image plane.
• These two image points are denoted by eL and eR and are called epipoles or epipolar points.
• Both epipoles eL and eR in their respective image planes and both centers of projection OL and OR lie on a single 3D line.

Page 15

Epipolar line

• The line OL–X is seen by the left camera as a point because it is directly in line with that camera's center of projection. However, the right camera sees this line as a line in its image plane. That line (eR–xR) in the right camera is called an epipolar line. Symmetrically, the line OR–X, seen by the right camera as a point, is seen as the epipolar line eL–xL by the left camera.
• An epipolar line is a function of the 3D point X, i.e. there is a set of epipolar lines in both images if we allow X to vary over all 3D points. Since the 3D line OL–X passes through the center of projection OL, the corresponding epipolar line in the right image must pass through the epipole eR (and correspondingly for epipolar lines in the left image).
• This means that all epipolar lines in one image must intersect the epipolar point of that image. In fact, any line which intersects the epipolar point is an epipolar line, since it can be derived from some 3D point X.

Source: http://encyclopedia.thefreedictionary.com/Epipolar+geometry

Page 16

Epipolar constraint

If the relative translation and rotation of the two cameras are known, the corresponding epipolar geometry leads to two important observations:

• If the projection point xL is known, then the epipolar line eR–xR is known, and the point X projects into the right image at a point xR which must lie on this particular epipolar line.
• This means that for each point observed in one image, the same point must be observed in the other image on a known epipolar line.

Page 17

Epipolar constraint

• This provides an epipolar constraint which corresponding image points must satisfy, and it means that it is possible to test whether two points really correspond to the same 3D point (a small numerical sketch of this test follows below).
• Epipolar constraints can be described by the essential matrix or the fundamental matrix between the two cameras.
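Anticipating the essential matrix E = [τ]× Ω derived at the end of this deck (for normalized cameras), a minimal NumPy sketch of such a test might look as follows; the rotation, translation, and points are made up purely for illustration:

```python
import numpy as np

def cross_matrix(t):
    """3x3 skew-symmetric matrix with cross_matrix(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical relative pose of camera 2 (world frame = camera 1 frame).
theta = np.deg2rad(10.0)
omega = np.array([[np.cos(theta), 0.0, np.sin(theta)],   # small rotation about y
                  [0.0, 1.0, 0.0],
                  [-np.sin(theta), 0.0, np.cos(theta)]])
tau = np.array([1.0, 0.0, 0.1])                           # translation

E = cross_matrix(tau) @ omega                              # essential matrix

# A 3D point and its projections into the two normalized cameras.
w = np.array([0.3, -0.2, 4.0])
x1 = w / w[2]                       # camera 1: homogeneous x~1, scaled so last entry = 1
x2 = omega @ w + tau
x2 = x2 / x2[2]                     # camera 2: homogeneous x~2

# Epipolar constraint: a true correspondence satisfies x~2^T E x~1 = 0.
print(x2 @ E @ x1)                               # ~0 (up to rounding)
print(np.array([0.5, 0.5, 1.0]) @ E @ x1)        # generally nonzero for a mismatch
```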

Page 18

• For any point in the first image, the corresponding point in the second image is constrained to lie on a line. This is known as the epipolar constraint.
• The particular line that it is constrained to lie on depends on:
  – the intrinsic parameters of the cameras, and
  – the relative translation and rotation of the two cameras (determined by the extrinsic parameters).

Page 19

Triangulation

• If the points xL and xR are known, their projection lines are also known.

• If the two image points correspond to the same 3D point X, the projection lines must intersect precisely at X.

• This means that X can be calculated from the coordinates of the two image points, a process called triangulation.
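The slide describes intersecting the two projection lines; in practice (with noisy points) a common linear implementation is the direct linear transform (DLT). A minimal sketch, where the camera matrices and the 3D point are illustrative rather than taken from the slides:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT-style) triangulation: recover the 3D point whose projections
    through the 3x4 camera matrices P1, P2 best match image points x1, x2."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Homogeneous solution: right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    w_h = vt[-1]
    return w_h[:3] / w_h[3]            # back to Cartesian coordinates

# Illustrative setup: two normalized cameras, the second translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

w_true = np.array([0.5, 0.2, 5.0])
x1 = P1 @ np.append(w_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(w_true, 1.0); x2 = x2[:2] / x2[2]

print(triangulate(P1, P2, x1, x2))     # ~ [0.5, 0.2, 5.0]
```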

Page 20

Epipole

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

• Now consider a number of points in the first image.
• Each is associated with a ray in 3D space.
• Each ray projects to form an epipolar line in the second image.
• Since all the rays converge at the optical center of the first camera, the epipolar lines must converge at a single point in the second image plane; this is the image in the second camera of the optical center of the first camera and is known as the epipole.
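In the normalized two-camera notation used later in this deck (a sketch, not from the slide): the optical center of camera 1 is w = 0, so its image in camera 2, i.e. the epipole, is

$$
\tilde{\mathbf{e}}_2 \;\propto\; \boldsymbol{\Omega}\,\mathbf{0}+\boldsymbol{\tau} \;=\; \boldsymbol{\tau},
$$

and symmetrically the epipole in image 1 is $\tilde{\mathbf{e}}_1 \propto \boldsymbol{\Omega}^{T}\boldsymbol{\tau}$ (the projection of camera 2's optical center, $\mathbf{w}=-\boldsymbol{\Omega}^{T}\boldsymbol{\tau}$, up to scale).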

Page 21

Special configurations: The epipoles are not necessarily within the observed images: the epipolar lines may converge to a point outside the visible area.

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

When the camera movement is a pure translation perpendicular to the optical axis (parallel to the image plane) and the cameras are oriented in the same direction (i.e., no relative rotation), the epipolar lines are parallel and the epipole (where they converge) is at infinity.

Page 22

Special configuration 2

When the camera movement is a pure translation along the optical axis, the epipoles are in the center of the image and the epipolar lines form a radial pattern.

Page 23

• To calculate depth information from a pair of images we need to compute the epipolar geometry.

• In the calibrated environment we capture this geometric constraint in an algebraic representation known as the essential matrix.

• In the uncalibrated environment, it is captured in the fundamental matrix.
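The two representations are closely related: with intrinsic matrices A1 and A2 (the notation of the next slide), the fundamental matrix absorbs the intrinsics. A hedged sketch of the standard relation:

$$
\mathbf{F}=\mathbf{A}_2^{-T}\,\mathbf{E}\,\mathbf{A}_1^{-1},
\qquad
\tilde{\mathbf{x}}_2^{T}\,\mathbf{F}\,\tilde{\mathbf{x}}_1=0,
$$

where x̃1, x̃2 are now measured in pixel coordinates rather than in normalized camera coordinates.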

Page 24

The essential matrix (Prince, Section 16.2)

• Assume that the world coordinate system is centered on the 1st camera, so that the extrinsic parameters (rotation and translation) of the 1st camera are {I, 0}.
• The 2nd camera may be in any general position {Ω, τ}.
• We will further assume that the cameras are normalized, so that the intrinsic matrices are A1 = A2 = I.

Page 25

The essential matrix (Prince, Section 16.2)

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

The geometric relationship between the two cameras is captured by the essential matrix. Assume normalized cameras, with the first camera at the origin. In homogeneous coordinates, a 3D point w is projected into the two cameras, where ~x1 is the observed position in camera 1 and ~x2 is the observed position in camera 2. The projection equation for the first camera, and the form it simplifies to, are given below.
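The equations on this slide were rendered as images and are not in the transcript; a reconstruction of the standard form (following Prince's derivation, same symbols as the surrounding slides) for the first camera is

$$
\lambda_1\,\tilde{\mathbf{x}}_1=\begin{bmatrix}\mathbf{I} & \mathbf{0}\end{bmatrix}\begin{bmatrix}\mathbf{w}\\ 1\end{bmatrix},
$$

which simplifies to

$$
\lambda_1\,\tilde{\mathbf{x}}_1=\mathbf{w}.
$$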

Page 26

Prince, Section 14.3, p. 371

• λ is an arbitrary scaling factor.
• This is a redundant representation, in that any scalar multiple λ represents the same 2D point.
• E.g., the homogeneous vectors ~x = [2; 4; 2]T and ~x = [3; 6; 3]T both represent the Cartesian 2D point x = [1; 2]T, where scaling factors λ = 2 and λ = 3 have been used, respectively.

Page 27

Similarly, for camera 2

By a similar process, the projection in the second camera can be written as shown below.
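Again the equation itself is missing from the transcript; the standard form (my reconstruction, same notation) is

$$
\lambda_2\,\tilde{\mathbf{x}}_2=\begin{bmatrix}\boldsymbol{\Omega} & \boldsymbol{\tau}\end{bmatrix}\begin{bmatrix}\mathbf{w}\\ 1\end{bmatrix}=\boldsymbol{\Omega}\,\mathbf{w}+\boldsymbol{\tau}.
$$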

Page 28

The essential matrix

Substituting the first-camera relation into the second-camera relation (see the equations below) gives a mathematical relationship between the points in the two images, but it is not in the most convenient form. This relationship represents a constraint between the possible positions of corresponding points x1 and x2 in the two images. The constraint is parameterized by the rotation and translation {Ω, τ} of camera 2 relative to camera 1.
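The three equations on this slide are not in the transcript; a reconstruction of the standard derivation (my LaTeX, same notation) is: first camera $\lambda_1\tilde{\mathbf{x}}_1=\mathbf{w}$, second camera $\lambda_2\tilde{\mathbf{x}}_2=\boldsymbol{\Omega}\mathbf{w}+\boldsymbol{\tau}$, and substituting the first into the second gives

$$
\lambda_2\,\tilde{\mathbf{x}}_2=\lambda_1\,\boldsymbol{\Omega}\,\tilde{\mathbf{x}}_1+\boldsymbol{\tau}.
$$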

Page 29

The essential matrix

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

Take the cross product of both sides with the translation vector τ; this removes the last term, as the cross product of any vector with itself is zero.

Then take the inner product of both sides with ~x2; the left-hand side disappears, since τ × ~x2 must be perpendicular to ~x2.
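In symbols (a reconstruction of the two results, which are not in the transcript, using the notation of the previous slides):

$$
\lambda_2\,\boldsymbol{\tau}\times\tilde{\mathbf{x}}_2=\lambda_1\,\boldsymbol{\tau}\times\boldsymbol{\Omega}\,\tilde{\mathbf{x}}_1,
$$

$$
0=\lambda_2\,\tilde{\mathbf{x}}_2^{T}(\boldsymbol{\tau}\times\tilde{\mathbf{x}}_2)=\lambda_1\,\tilde{\mathbf{x}}_2^{T}(\boldsymbol{\tau}\times\boldsymbol{\Omega}\,\tilde{\mathbf{x}}_1)
\;\;\Longrightarrow\;\;
\tilde{\mathbf{x}}_2^{T}\,(\boldsymbol{\tau}\times\boldsymbol{\Omega}\,\tilde{\mathbf{x}}_1)=0.
$$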

Page 30

The essential matrix

The cross-product term can be expressed as a matrix. Defining this matrix and folding it into the constraint (see below), we obtain the essential matrix relation: a formulation of the mathematical constraint between the positions of corresponding points x1 and x2 in two normalized cameras. The matrix E appearing in this relation is known as the essential matrix.
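The defining equations on this slide are not in the transcript; a reconstruction in the same notation: the cross product with τ can be written as multiplication by the skew-symmetric matrix

$$
\boldsymbol{\tau}\times\mathbf{v}=[\boldsymbol{\tau}]_{\times}\,\mathbf{v},
\qquad
[\boldsymbol{\tau}]_{\times}=\begin{bmatrix}0 & -\tau_z & \tau_y\\ \tau_z & 0 & -\tau_x\\ -\tau_y & \tau_x & 0\end{bmatrix},
$$

so defining E = [τ]× Ω gives the essential matrix relation

$$
\tilde{\mathbf{x}}_2^{T}\,\mathbf{E}\,\tilde{\mathbf{x}}_1=0.
$$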