

1

Image formation

Computer Vision, CENG 421/ELEC 536, Spring 2008

2

Course outline

• Image formation
• Camera Models
  • Pinhole Perspective Projection
  • Affine Projection

Reading: Cipolla and Gee, Lecture Notes on Projection, 1999.

3

Digital images

A digital image = a 2D array (matrix) of numbers. Depending on the nature of the image, these numbers may represent:
- light intensities
- colour (wavelength)
- distance
- other physical quantities related to the image acquisition process

Several types of images are typically used in computer vision:
- Intensity images: encode light intensities and colour (R, G, B) channels; acquired by digital cameras
- Range images: encode shape and distance (3D vision); acquired by special sensors like sonars, radars, laser scanners; oceanography, intelligent vehicles, etc.
- Medical images: 2D or 3D depending on the acquisition technique; ultrasound (2D/3D), CT (3D), MRI (3D), digital X-ray (2D), etc.
- others

4

Digital images- discussion

The exact relationship between a digital image and the physical world is determined by the acquisition process, which depends on the sensor used.

Any information contained in images (e.g. shape-related measurements, relative position of various objects, or object identity) must ultimately be extracted from the 2D numerical arrays in which it is encoded.

Computer Vision algorithms extract the information relevant for the task at hand from the 2D images.

This chapter investigates the process of image formation in intensity images; the rest of the course is dedicated to computational techniques for the extraction of information from images.

5

Image formation (geometric)

The geometric process is projection: a 3D scene is projected onto a 2D image.

Basic assumptions:
- the 3D scene consists of opaque and reflective objects in a transparent medium (air) with one or more light sources.

Additional assumptions (for more complex scenes and/or tasks):
- for instance, controlled scene lighting is helpful for the removal of cast shadows.

For a sharp image (in focus), all rays coming from a single scene point P must converge to a single point P' in the image. P' will be referred to as the image of P.

6

Cameras

Basic abstraction: the pinhole camera
- Abstract camera model: a box with a small hole in it
- Pinhole cameras work in practice
- The pinhole perspective projection equations were discovered by Brunelleschi in the 15th century
- First pinhole camera: 16th century
- It still represents the most widely used theoretical camera model


7

Brunelleschi’s experiments

The Baptistry in Florence (http://www.kap.pdx.edu/trow/winter01/perspective/)

A peephole in the mirror

A two-mirror system for comparing the painting and the real scene

Painted panel of the baptistry

Final result: a mathematical theory of projection by which 3D space could be rendered on any 2D surface.

8

First pinhole camera

The first published picture of a pinhole camera obscura: a drawing in Gemma Frisius' De Radio Astronomica et Geometrica (1545). Gemma Frisius (an astronomer) had used the pinhole in a darkened room to study the solar eclipse of 1544.

The term camera obscura ("dark room") was coined by Johannes Kepler (1571–1630).

9

In a pinhole camera, images are formed by the projection of 3D objects.

Figure from US Navy Manual of Basic Optics and Optical Instruments, prepared by Bureau of Naval Personnel. Reprinted by Dover Publications, Inc., 1969.

10

Camera obscura : each point on the image plane sees light from only one direction, the one that passes through the pinhole. The pinhole is the center of projection through which all light passes.

• Perspective projection creates inverted images.

• It is sometimes convenient to consider a virtual image in a plane lying in front of the pinhole, symmetric to the image plane with respect to the pinhole.

11

Pinhole optics

Using ray-tracing, we see that only a narrow light beam passes through a pinhole

A) In a wide pinhole, light from the source spreads across the image, making it blurry.

B) In a narrow pinhole, only a small amount of light is let in. The image is sharper.

Small apertures require longer exposure times.

The sharpness is limited by diffraction.

12

Pinhole too big: many directions are averaged, blurring the image.

Pinhole too small: diffraction effects blur the image.

Generally, images from pinhole cameras are dark, because a very small set of rays from a particular point hits the screen.


13

Pinhole optics

Pinhole optics focuses images:
- without lenses
- with an infinite depth of field
(Depth of field = the distance between the nearest and farthest objects that appear in acceptably sharp focus in the image.)

Small pinhole:
- better focus
- less light energy available from any scene point
- the sharpness is limited by diffraction

14

Perspective effect: Far objects appear smaller than close ones

• The image plane is behind the pinhole (inverted images).

15

Pinhole optics: Horizon and vanishing points

The film plane is usually placed in front of the pinhole O (virtual image plane). Moving the film plane → image scaling. H is the horizon line. Considering all the possible sets of parallel lines in plane Π, their intersections (vanishing points) lie on the horizon line.

What is the image of a point located on line L?

16

Parallel lines meet at vanishing points

17

Parallel lines meet (cont’d)

• Each set of parallel lines in the real 3D world will have a different vanishing point in the image, located on the horizon line.

• Also, planes parallel to the ground plane meet in the horizon line.

18

Spotting ‘fake’ images

Is this a perspective image of four identical buildings?


19

The Pinhole Perspective Equation

We associate a coordinate system to the pinhole camera (also known as the camera frame).

The pinhole camera is defined by Oc and by the image plane.

The Zc axis is called the optical axis of the camera.

Point C’ is the image center.

3D scene point P = (Xc, Yc, Zc)^T projects to image point P’ = (x, y, f), where f is the focal distance.

Equation of perspective projection is found by analyzing similar triangles

[Figure: pinhole camera geometry showing scene point P, its image P’, and the image center C’]
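Since the similar-triangles argument yields x = f·Xc/Zc, y = f·Yc/Zc (made explicit on the "Perspective projection in inhomogeneous coordinates" slide below), a minimal numeric sketch is easy to write; the function name and values here are illustrative, not from the slides:

```python
import numpy as np

def pinhole_project(P_c, f):
    """Pinhole perspective projection: x = f*Xc/Zc, y = f*Yc/Zc."""
    Xc, Yc, Zc = P_c
    if Zc <= 0:
        raise ValueError("point must lie in front of the camera (Zc > 0)")
    return np.array([f * Xc / Zc, f * Yc / Zc])

# Scaling effect: a point twice as far away projects to half the size.
print(pinhole_project((1.0, 2.0, 5.0), f=0.05))   # [0.01  0.02 ]
print(pinhole_project((1.0, 2.0, 10.0), f=0.05))  # [0.005 0.01 ]
```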

20

Properties of perspective projection

Size of the image of an object changes as it translates along the z axis (scaling effect).

Perspective projection is line-preserving

For a planar scene lying in a plane z = z0 parallel to the image plane, the magnification m is the same for all points:

$$m = \frac{|P'Q'|}{|PQ|} = \frac{f'}{z_0}$$

- Ratios of lengths are not preserved, except for planar scenes parallel to the image plane

- The focal distance f is an essential parameter of the pinhole camera.

- f small: more world points project onto the finite image; this is called a wide-angle image

- f large: telescopic image

21

Homogenous coordinates

Add an extra coordinate and use an equivalence relation:
- for 2D: the equivalence relation is k(X, Y, Z) is the same as (X, Y, Z)
- for 3D: the equivalence relation is k(X, Y, Z, T) is the same as (X, Y, Z, T)

Basic notion: it becomes possible to represent points “at infinity”:
- where parallel lines intersect
- where parallel planes intersect

It also becomes possible to describe the perspective projection as a matrix transform.
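A minimal sketch of the equivalence relation in code (the helper names are illustrative):

```python
import numpy as np

def to_homogeneous(p):
    """Inhomogeneous (x, y) -> homogeneous (x, y, 1)."""
    return np.append(np.asarray(p, dtype=float), 1.0)

def from_homogeneous(p):
    """Homogeneous (X, Y, T) -> inhomogeneous (X/T, Y/T); requires T != 0."""
    p = np.asarray(p, dtype=float)
    return p[:-1] / p[-1]

# k*(X, Y, Z) and (X, Y, Z) represent the same 2D point:
print(from_homogeneous([2.0, 4.0, 2.0]))  # [1. 2.]
print(from_homogeneous([1.0, 2.0, 1.0]))  # [1. 2.]
```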

22

Rationale for using homogeneous coordinates

Every point in an image corresponds to one incoming light ray: any 3D point along the ray projects to the same image point, so only the direction of the ray is relevant, not the distance of the point along it. One way to represent incoming ray directions is by their corresponding pixel location: 2 image coordinates (x, y). Another way is to arbitrarily choose some 3D point along each ray to represent the ray's direction.

In this case we need 3 homogeneous coordinates instead of 2 ‘inhomogeneous’ ones to represent each ray. This seems inefficient, but it has the significant advantage of making the image projection process much easier to model.

23

Rationale for homogeneous coordinates (cont’d)

A. Suppose that the camera is at the origin (0, 0, 0). The ray represented by homogeneous coordinates (X, Y, T) passes through the 3D point (X, Y, T). The 3D point (λX, λY, λT) also lies on (represents) the same ray. Thus, rescaling homogeneous coordinates makes no difference:

$$(\lambda X, \lambda Y, \lambda T) = \lambda (X, Y, T)$$

$$\lambda (X, Y, T) \simeq (X, Y, T)$$

24

Relationship between homogeneous and inhomogeneous coordinates

Suppose that the image plane of the camera is T = 1. The ray through pixel (x, y) can be represented homogeneously by the vector (x, y, 1) ≈ (xT, yT, T) for any depth T > 0. The homogeneous point vector (X, Y, T) with T ≠ 0 corresponds to the inhomogeneous coordinates (X/T, Y/T) on the plane T = 1.


25

Perspective projection revisited

We want to express the projection equation in HCs (homogeneous coordinates):
- HCs for the 3D point are [λX λY λZ λ]^T
- HCs for its image are [sx sy s]^T
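The slide's matrix equation did not survive extraction; written out (consistent with the projection matrix Pp used in the "Overall mapping" slide below), the perspective projection in homogeneous coordinates is:

$$\begin{pmatrix} sx \\ sy \\ s \end{pmatrix} = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} \lambda X \\ \lambda Y \\ \lambda Z \\ \lambda \end{pmatrix}$$

with $s = \lambda Z$, which recovers the inhomogeneous equations $x = fX/Z$, $y = fY/Z$.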

26

Homogeneous coordinates and vanishing points

What happens when T = 0? (X, Y, 0) is a valid 3D point defining an optical ray parallel to the plane T = 1. It has no finite intersection with it! Such rays (homogeneous vectors) can no longer be interpreted as finite points of the standard 2D plane. They may be considered as ‘ideal’ points, or limits:
- points at infinity (vanishing points)
- lines at infinity (the intersection between two parallel planes, e.g. the horizon line)

27

Application

Given two parallel planes

$$n_x X_c + n_y Y_c + n_z Z_c = d_1$$
$$n_x X_c + n_y Y_c + n_z Z_c = d_2, \qquad d_1 \neq d_2,$$

in inhomogeneous coordinates, prove that the 4th homogeneous coordinate of every point lying on the horizon line is 0.
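A sketch of the argument (the solution is not in the slides): write a point in homogeneous coordinates as (Xc, Yc, Zc, W), so each plane reads $n_x X_c + n_y Y_c + n_z Z_c = d_i W$. A point on the horizon line is the image of a direction common to both parallel planes, hence satisfies both equations; subtracting them gives $(d_1 - d_2)W = 0$, and since $d_1 \neq d_2$, we must have $W = 0$.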

28

Affine projection models: Weak perspective projection

The weak perspective projection (m = constant for all points in the scene) works when the scene depth is small relative to the average distance z0 from the camera:

$$x' = -m x, \qquad y' = -m y, \qquad \text{where} \quad m = -\frac{f'}{z_0}$$

m is the magnification.
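A minimal sketch comparing weak perspective against true perspective (the sign convention is dropped, so m = f/z0 here; names and numbers are illustrative):

```python
import numpy as np

def perspective_project(P_c, f):
    """True perspective: divide by each point's own depth Zc."""
    Xc, Yc, Zc = P_c
    return np.array([f * Xc / Zc, f * Yc / Zc])

def weak_perspective_project(P_c, f, z0):
    """Weak perspective: every point shares the average depth z0,
    so the projection is a uniform scaling by m = f / z0."""
    m = f / z0
    return m * np.array(P_c[:2])

# Shallow scene far from the camera: the approximation error is small.
P = (1.0, 1.0, 10.2)                                  # true depth 10.2
print(perspective_project(P, f=0.05))                 # [0.0049 0.0049]
print(weak_perspective_project(P, f=0.05, z0=10.0))   # [0.005  0.005 ]
```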

29

Affine projection models: Orthographic projection

$$x' = x, \qquad y' = y$$

When the camera is at a (roughly constant) distance from the scene, take m = 1. Unlike other geometric models of image formation, orthographic projection does not involve a reversal of image features.

What is the main difference between orthographic, weak perspective, and general perspective projection?

30

The projection matrix for orthographic projection

In homogeneous coordinates:

$$\begin{pmatrix} U \\ V \\ W \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ T \end{pmatrix}, \qquad x' = x, \quad y' = y$$

The focal distance does not influence the image formation process under the assumption of orthographic projection.


31

Image formation with orthographic projection

• Parallel lines in the scene appear as parallel lines in the image

• The length of parallel segments is preserved by the projection transform

32

Geometric camera models

Camera models describe the mapping from world to pixel coordinates.
- useful in the process of camera calibration
- can be expressed in either homogeneous or inhomogeneous coordinates
- must account for the following transformations:
  1) rigid body motion between the camera and the scene
  2) perspective projection of the 3D real world onto the image plane
  3) CCD imaging – the geometry of the sensor array

33

Rigid body transformation

34

Euclidean, right-handed coordinate systems and vectors of coordinates

$$x = \overrightarrow{OP} \cdot \mathbf{i}, \quad y = \overrightarrow{OP} \cdot \mathbf{j}, \quad z = \overrightarrow{OP} \cdot \mathbf{k} \;\Leftrightarrow\; \overrightarrow{OP} = x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k} \;\Leftrightarrow\; \mathbf{P} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$

35

Coordinate changes: pure translation

Basis vectors are parallel to each other; origins O_A ≠ O_B:

$$\overrightarrow{O_B P} = \overrightarrow{O_B O_A} + \overrightarrow{O_A P} \;\Rightarrow\; P^B = P^A + O_A^B$$

Convention : PF is the coordinate vector of point P in frame F

36

Coordinate Changes: Rotation

The rotation matrix describing the frame (A) in the coordinate system (B)

$$R_A^B = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{(rotation about the } z \text{ axis)}$$

$$P^B = R_A^B \, P^A$$


37

Coordinate Changes: Pure Rotation

$$R_A^B = \begin{pmatrix} \mathbf{i}_A \cdot \mathbf{i}_B & \mathbf{j}_A \cdot \mathbf{i}_B & \mathbf{k}_A \cdot \mathbf{i}_B \\ \mathbf{i}_A \cdot \mathbf{j}_B & \mathbf{j}_A \cdot \mathbf{j}_B & \mathbf{k}_A \cdot \mathbf{j}_B \\ \mathbf{i}_A \cdot \mathbf{k}_B & \mathbf{j}_A \cdot \mathbf{k}_B & \mathbf{k}_A \cdot \mathbf{k}_B \end{pmatrix} = \begin{pmatrix} (\mathbf{i}_B^A)^T \\ (\mathbf{j}_B^A)^T \\ (\mathbf{k}_B^A)^T \end{pmatrix}$$

The columns of $R_A^B$ are the coordinate vectors of frame A's basis vectors expressed in frame B.
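A short numeric sketch of these coordinate changes (the values are illustrative):

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix R_A^B for a frame rotated by theta about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# Rotation followed by translation: P^B = R_A^B P^A + O_A^B
R = rot_z(np.pi / 2)                   # frame A rotated 90 degrees about z
O_A_in_B = np.array([1.0, 0.0, 0.0])   # origin of A expressed in frame B
P_A = np.array([1.0, 0.0, 0.0])
P_B = R @ P_A + O_A_in_B
print(P_B)                             # [1. 1. 0.]

# Rotation matrices are orthonormal: the inverse is the transpose.
assert np.allclose(R.T @ R, np.eye(3))
```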

38

Geometric camera models

Camera models describe the mapping from world to pixel coordinates.
- useful in the process of camera calibration
- can be expressed in either homogeneous or inhomogeneous coordinates
- must account for the following transformations:
  1) rigid body motion between the camera and the scene
  2) perspective projection of the 3D real world onto the image plane
  3) CCD imaging

39

Perspective projection in inhomogeneous coordinates

$$x = f \, \frac{X_c}{Z_c}, \qquad y = f \, \frac{Y_c}{Z_c}$$

Non-linear.

40

Geometric camera models

Camera models describe the mapping from world to pixel coordinates.
- useful in the process of camera calibration
- can be expressed in either homogeneous or inhomogeneous coordinates
- must account for the following transformations:
  1) rigid body motion between the camera and the scene
  2) perspective projection of the 3D real world onto the image plane
  3) CCD imaging – the geometry of the sensor array

41

CCD imaging

Same scene, same camera viewpoint (external parameters), two different images

CCD imaging refers to the geometry of the CCD array (size and shape of pixels) and its position with respect to the optical axis.

42

Camera model – CCD imaging (inhomogeneous coordinates)

• The imaging process involves digitization (discrete images).

• Thus, we define a vector w = (u, v)^T of pixel coordinates, related to the vector x = (x, y)^T of image coordinates.

• The relationship between x and w depends on the pixel size (aspect ratio).

The pixel size is $\frac{1}{k_u} \times \frac{1}{k_v}$.

Image coordinates to pixel coordinates:

$$u = u_0 + k_u x, \qquad v = v_0 + k_v y$$
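A one-line check of the image-to-pixel mapping (the principal point and pixel densities below are illustrative):

```python
def image_to_pixel(x, y, u0, v0, ku, kv):
    """Map image-plane coordinates (x, y) to pixel coordinates (u, v).

    (u0, v0): pixel position of the image center; ku, kv: pixels per
    unit length, so the pixel size is (1/ku) x (1/kv).
    """
    return u0 + ku * x, v0 + kv * y

# 10 um square pixels (ku = kv = 1e5 pixels/m), principal point (320, 240):
print(image_to_pixel(0.001, -0.0005, u0=320, v0=240, ku=1e5, kv=1e5))
# (420.0, 190.0)
```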


43

Rigid camera motion in homogeneous coordinates

Pr is the rigid body transformation matrix

It is composed of extrinsic parameters and has 6 degrees of freedom (DOF).

44

Perspective projection in homogeneous coordinates

Pp is the projection matrix

45

CCD imaging in homogeneous coordinates

Equivalently, w=Pc x

Pc is the CCD calibration matrix

$$\begin{pmatrix} su \\ sv \\ s \end{pmatrix} = \begin{pmatrix} k_u & 0 & u_0 \\ 0 & k_v & v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} sx \\ sy \\ s \end{pmatrix}$$

46

Overall mapping from world coordinates to pixel coordinates in homogeneous coordinates

w = Pps X

where Pps = Pc Pp Pr is the camera projection matrix for a perspective camera.

Intrinsic camera parameters: Pc Pp; extrinsic camera parameters: Pr.

Pps has 10 degrees of freedom, because f, ku and kv are not independent (they contribute 2 DOF instead of 3).

$$P_{ps} = P_c P_p P_r = \begin{pmatrix} k_u & 0 & u_0 \\ 0 & k_v & v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} R & T \\ \mathbf{0}^T & 1 \end{pmatrix}$$
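A sketch composing the three factors and projecting a world point (all parameter values are illustrative):

```python
import numpy as np

def camera_matrix(f, ku, kv, u0, v0, R, T):
    """Compose Pps = Pc Pp Pr for a perspective camera."""
    Pc = np.array([[ku, 0.0, u0],
                   [0.0, kv, v0],
                   [0.0, 0.0, 1.0]])
    Pp = np.array([[f, 0.0, 0.0, 0.0],
                   [0.0, f, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
    Pr = np.vstack([np.hstack([R, T.reshape(3, 1)]),   # rigid body motion
                    [0.0, 0.0, 0.0, 1.0]])
    return Pc @ Pp @ Pr

Pps = camera_matrix(f=0.05, ku=1e5, kv=1e5, u0=320, v0=240,
                    R=np.eye(3), T=np.zeros(3))
X = np.array([0.2, 0.1, 2.0, 1.0])   # homogeneous world point
su, sv, s = Pps @ X
print(su / s, sv / s)                 # pixel coordinates (820.0, 490.0)
```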

47

The camera projection matrix

It is not a general 3 × 4 matrix, but has a special structure composed of Pr, Pp, and Pc.

It can be conveniently written as a product of two matrices.

• αu = f ku and αv = f kv are image scaling factors
• the ratio αu / αv is known as the aspect ratio

48

The projective camera

The projective camera: a general 3 × 4 matrix (11 degrees of freedom). The perspective camera is a special case of the projective camera.


49

Projective camera versus perspective camera

It is more convenient to work with a projective camera model instead of a perspective one, since we do not have to worry about any nonlinear constraints on the elements of P.

Perspective = special case of projective. Thus, any results derived for the general projective camera will also hold for the perspective camera.

50

Camera calibration

Estimation of the projection matrix from an image of a controlled scene.

Good images for calibration are grids with patterns of known size.

51

A typical set-up for grid-based calibration

52

Camera calibration

If we use a projective camera model, we need to estimate 11 parameters (we can set p_34 to 1).
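The slides do not spell out the estimation procedure; a standard choice (named here, not in the slides) is the Direct Linear Transform: each 3D-to-2D correspondence contributes two linear equations in the 12 entries of P, and 6 or more points determine the 11 free parameters in least squares:

```python
import numpy as np

def dlt_calibrate(X_world, x_pix):
    """Estimate a 3x4 projective camera matrix from >= 6 correspondences.

    X_world: (N, 3) world points; x_pix: (N, 2) pixel coordinates.
    Solves A p = 0; p is the right singular vector with the smallest
    singular value, which also fixes the overall scale of P.
    """
    A = []
    for (Xw, Yw, Zw), (u, v) in zip(X_world, x_pix):
        Xh = [Xw, Yw, Zw, 1.0]
        A.append([*Xh, 0.0, 0.0, 0.0, 0.0, *[-u * c for c in Xh]])
        A.append([0.0, 0.0, 0.0, 0.0, *Xh, *[-v * c for c in Xh]])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)
```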

53

Particular cases: how many parameters do we need to estimate?

2D→2D

1D→1D

54

Recovery of world position

With a calibrated camera, we can attempt to recover the world position of image features:
- 1D case (line to line)
- 2D case (plane to plane)


55

Recovery of world position (cont’d)

3D case (3D world to image plane)

We need at least two cameras to determine the position of the world point: Stereo vision

56

Particular projection cases simplify the calibration process

Orthographic (parallel) projection:
- depth of the objects in the scene is small compared to the distance of the camera to the scene
- f → ∞, Zav → ∞

Weak perspective:
- scaling according to the average depth of the scene
- preserves ratios of segments and angles, thus preserves parallelism

57

58

Error introduced by the weak perspective approximation

59

Affine camera models

- Describe weak perspective and orthographic projection
- Error in assigned reading, p. 40: parallel projection should be weak perspective
- The form of the projection matrix is simplified when using the weak perspective assumption

60

Affine cameras – planar view

The 6 degrees of freedom for an affine planar camera


61

Sensing

Main differences between a modern camera and the camera obscura of the 17th century:
- ability to record the pictures formed in the backplane (photographic film, CCD technology)
- ability to focus the image with lenses

The pinhole perspective is still considered a convenient mathematical model for camera sensing.

62

Lenses

Useful for:
1. Gathering light. Under ideal pinhole projection, a single ray of light would reach each point in the image plane.
2. Sharpening the image.

The trade-off between 1 and 2 is possible only by using lenses.

63

Lenses behave according to the laws of geometric optics

- Light travels in straight lines (light rays) in homogeneous media
- Reflection law
- Refraction law: the incident ray, the refracted ray and the normal at the refraction surface are coplanar; angles obey Snell’s law: n1 sin α1 = n2 sin α2

64

Paraxial (or first-order) optics

Snell’s law:

$$n_1 \sin\alpha_1 = n_2 \sin\alpha_2$$

Small angles: $n_1 \alpha_1 \approx n_2 \alpha_2$, which leads to the paraxial refraction equation:

$$\frac{n_1}{d_1} + \frac{n_2}{d_2} = \frac{n_2 - n_1}{R}$$
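A quick numeric check of the paraxial refraction equation, solving for the image distance d2 (the values are illustrative):

```python
def paraxial_image_distance(d1, n1, n2, R):
    """Solve n1/d1 + n2/d2 = (n2 - n1)/R for d2 (paraxial approximation)."""
    return n2 / ((n2 - n1) / R - n1 / d1)

# Air-to-glass spherical surface, object 1 m away, R = 0.1 m:
print(paraxial_image_distance(d1=1.0, n1=1.0, n2=1.5, R=0.1))
# 0.375 = 1.5 / (5.0 - 1.0)
```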

65

Thin lenses: basic properties

- Any ray entering the lens parallel to the axis on one side goes through the focus on the other side.
- Any ray entering the lens from the focus on one side emerges parallel to the axis on the other side.

66

Thin Lenses: a ray entering the lens and refracted at its right boundary is immediately refracted again at the left boundary.

$$x' = x \, \frac{z'}{z}, \qquad y' = y \, \frac{z'}{z}, \qquad \text{where} \quad \frac{1}{z'} - \frac{1}{z} = \frac{1}{f} \quad \text{and} \quad f = \frac{R}{2(n-1)}$$

All rays passing through P are focused by the thin lens on point P’(x’, y’, z’) along PO.
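A minimal sketch of the thin lens equation, using the convention that the object depth z is negative (in front of the lens) and the image depth z' is positive; the focal length is illustrative:

```python
def thin_lens_image_depth(z, f):
    """Solve 1/z' - 1/z = 1/f for the image depth z'."""
    return 1.0 / (1.0 / f + 1.0 / z)

f = 0.05                                    # 50 mm focal length
print(thin_lens_image_depth(z=-1.0, f=f))   # ~0.0526 m, just behind f
print(thin_lens_image_depth(z=-1e9, f=f))   # ~0.05 m: far objects focus at f
```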


67

Thin lenses – depth of field

When a lens focuses on an object at a given distance, all objects at the same distance are sharply focused. Objects located at different distances are out of focus and theoretically not sharp. Is it possible to reduce the size of the circle of confusion?

68

Spherical Aberration

Blue region: paraxial zone (small angles), where P corresponds to P’ (called the paraxial image).

If the image plane is Π’, then the image of P is a circle of confusion of diameter d’.

The focus plane (dashed) leads to a circle of confusion of minimal diameter.

69

Barrel or pincushion?

70

How to correct (minimize) aberrations?

By aligning several simple lenses with well-chosen shapes (apertures) and refractive indices – compound lenses.

Vignetting: light beams emanating from objects located off-axis are partially blocked by the various apertures of individual lenses.

Brightness drops gradually in the peripheral zones of the image.

71

Vignetting in photography

72

CCD Camera has discrete elements

- The lens collects light rays
- CCD elements replace the chemicals of film
- Number of elements less than with film (so far)


73

CCD sensors

Incoming light is recorded on a small, rectangular piece of silicon called a charge-coupled device (CCD). This silicon wafer is an array of individual light-sensitive cells called photosites. Each photosite corresponds to one picture element, or pixel.

The CCD photosites sense incoming light through the photoelectric effect: an electron is released when the photosite is hit by a photon of light. Electrons emitted within the CCD are fenced within nonconductive boundaries, so they remain within the area of the photon strike. As long as light is allowed to impinge on a photosite, electrons will accumulate in that pixel.

When the shutter is closed, the CCD array is unloaded using charge coupling; the electrons in each pixel are counted, and the resulting data is displayed and stored as an image.

74

Sensors in CCD cameras (cont’d)

$$I(r,c) = T \int_{p \in S(r,c)} \int_{\lambda} E(p, \lambda) \, R(p) \, q(\lambda) \; d\lambda \, dp$$

T is the electron collection time;

The integral is computed over the spatial domain of the cell, and over its range of wavelengths.

E is the irradiance (more about it in following courses)

R – spatial response of the site

q is the quantum efficiency (how many electrons are generated per unit of incident light energy).

75

You know now

- Digital images can be acquired by a variety of acquisition processes.
- Cameras:
  - Geometric image formation obeys the laws of perspective projection geometry (pinhole camera).
  - Mapping the 3D world onto 2D image coordinates must consider:
    - camera motion (rigid)
    - perspective projection
    - CCD imaging
  - Camera calibration is necessary if we want an exact mapping.
- Lenses: optical systems that help enhance the brightness and overall quality of the image.
- Camera-based image acquisition features systematic distortions and aberrations.