
PhotoMeter: Easy-to-use MonoGraphoMetrics
- Visual Metrology from Calibrated Images -

Hendrik Müller*

In cooperation with:

Prof. Jarek Rossignac
GVU Center, IRIS Cluster
College of Computing
Georgia Institute of Technology, GA

Dr. Markus Wacker
Department of Computer Graphics
College of Computing
University of Technology Dresden

Dresden, July 30th 2004

* [email protected]


Acknowledgements

I have enjoyed my work at the GVU Center at the College of Computing at the Georgia Institute of Technology immensely. It has been a fantastic experience, extremely useful for my educational and personal growth. Being surrounded by helpful and outgoing people in a friendly atmosphere and environment contributed immeasurably to the success of this work. I extend many thanks to Prof. Jarek Rossignac for giving me this opportunity. Additionally, I would like to thank Alexander Powell for his guidance and help in developing the user interface for PhotoMeter, and Lawrence Ibarria and Francisco Palop for their suggestions and support.

I also want to highlight Dr. Markus Wacker's seamless takeover of my project and his guidance upon my return to Dresden from Atlanta, GA. Finally, I express my gratitude to Mark Wittman, who proofread the whole thesis and gave me suggestions to improve the grammar and fluency of my English.


Figure 1: Waterfall by M. C. Escher (1961). This painting illustrates perspective absurdities. It seems as though the waterfall is fed by its own water, yet the painting's perspective projection is correct. The effect occurs because of a specific view position at which well-chosen parts of the flume overlap. Pictures like this one show why it is necessary to understand the issues of perspective. One way to find out whether this building could actually exist is to create a 3D model of it. Creating three-dimensional models from single images is a main topic of the present thesis.


Abstract

MonoGraphoMetrics is the science of computing 3D measures from a single image. Several recent research activities have produced theoretical principles and practical tools for extracting 3D measures from uncalibrated photographs, using either single- or multiple-view metrology. Some tools require that the user identify configurations of edges, which are used to establish constraints for the camera model or to identify planes in the scene. Other tools detect vanishing lines and vanishing points automatically to compute the camera position as a prerequisite for computing several dimensions. The process involved is often laborious, and its application is limited to images where the required configurations of edges are visible. In contrast, the work presented here is limited to photographs taken with a calibrated camera. More specifically, I assume a horizontally oriented camera at a known height above the floor and with a known field of view. Under these simple conditions, a single mouse operation (point-press-drag-release) provides enough information to compute the 3D position of any point P with respect to the coordinate system of the camera, provided that P and its projection F on the floor can be identified in the image. Horizontal and vertical displacements between pairs of points are displayed as 3D measurements (arrows and dimension labels). This approach is the basis of the interactive PhotoMeter system, in which a novice user can easily measure indoor and outdoor spaces with only one or two mouse clicks per measurement. The resulting measurements yield the dimensions and positions of buildings, rooms, offices, windows, doors, pieces of furniture, and even people. Through theoretical and experimental analysis, I discuss how errors in lens distortion, camera calibration, height, and alignment impact the accuracy of the reported measurements. For example, I achieved an inaccuracy of less than five centimeters when measuring walls, doors, or furniture at five meters from the camera. I envision applications of this simple technology in a variety of fields, including architecture, home acquisition, interior decoration, and the online purchase of furniture.


German Summary

MonoGraphoMetrics is the science of computing three-dimensional measures from a single image. Recent research activities have developed theoretical principles and practical tools for extracting 3D measures from uncalibrated photographs using single- or multiple-view metrology. Some tools assume that the user identifies configurations of edges in the image, which are later used to establish constraints for the camera model and to detect planes in the scene. Other tools automatically detect vanishing lines and vanishing points to determine the camera position, which serves as the prerequisite for computing numerous dimensions. The process involved is often laborious, and its applications are limited to images in which the required edge configurations are visible. In contrast, the work presented here is limited to photographs taken with a calibrated camera. I assume a horizontally oriented camera at a known height above the floor and with a known field of view. Under these simple conditions, a single mouse operation (click-drag-release) provides enough information to compute the 3D position of any point P relative to the coordinate system of the camera, provided that the point P and its projection F onto the floor can be identified in the image. Horizontal and vertical displacements between pairs of points are displayed as 3D measures (arrows with dimensions). This approach forms the basis of the interactive PhotoMeter system, in which a novice can easily measure indoor and outdoor spaces, such as buildings, rooms, offices, windows, doors, pieces of furniture, and even people, with only one or two mouse clicks per measurement. Through theoretical and experimental analyses, I discuss how errors caused by lens distortion, calibration, height, and alignment of the camera affect the reported measurements. For example, I achieve an inaccuracy of less than five centimeters for walls, doors, or furniture located five meters from the camera. Owing to its simple technology, this application can be used in a variety of fields, such as architecture, interior decoration, and even the online purchase of furniture.


Table of Contents

Acknowledgements
Abstract
German Summary
Table of Contents
List of Figures
1 Introduction
1.1 Measurements from images
1.2 Using vision vs. reality
1.3 Thesis outline
2 Projective Geometry
2.1 Projective geometry and art history
2.1.1 The visual pyramid & Alberti's "open window"
2.1.2 Alberti's perspective construction
2.1.3 Pelerin's "distance point construction"
2.2 Camera models and perspective mappings
2.2.1 Pinhole camera model
2.2.2 Plane-to-plane homography
2.2.3 Plane-to-plane homology
2.3 Distortion correction
2.4 Vanishing points and vanishing lines
3 Previous & Related Work
3.1 Metrology & modeling from a single image
3.1.1 "Tour Into the Picture"
3.1.2 "Single-view metrology"
3.2 Metrology & modeling from multiple images
3.2.1 Using two views
3.2.2 Using three or more views
3.3 Metrology & modeling from panoramas?
4 PhotoMeter: Easy-to-use MonoGraphoMetrics
4.1 Introduction
4.2 Computations
4.2.1 Main Idea
4.2.2 Mathematical/geometrical calculations
4.2.3 Validation with "3D Measurement Simulation"
4.3 Requirements & Preliminaries
4.4 User Interface
4.5 Implementation & Documentation
4.5.1 Implementation decisions
4.5.2 Class diagrams
4.5.3 Activity diagrams
4.6 Results
5 Conclusion
Bibliography
CD Contents


List of Figures

Figure 1: Waterfall by M. C. Escher (1961).
Figure 2: A three-dimensional visual measuring device.
Figure 3: Errors from input to output.
Figure 4: First proof of the perspective effect by Leonardo.
Figure 5: From left to right: Filippo Brunelleschi and Leon Battista Alberti.
Figure 6: The model of a visual pyramid.
Figure 7: Leonardo da Vinci, The Annunciation (c. 1472-1475).
Figure 8: Alberti's perspective construction.
Figure 9: Two applications of perspective construction with Alberti's method.
Figure 10: Pelerin's distance point construction process.
Figure 11: Pinhole camera model.
Figure 12: Leonardo's perspectograph; a perspective machine from Albrecht Dürer.
Figure 13: Plane-to-plane camera model.
Figure 14: Inter-image homography.
Figure 15: Planar homology is defined by a vertex, an axis, and a characteristic ratio.
Figure 16: Radial distortion correction.
Figure 17: Extracted parallel lines, vanishing points, and a vanishing line.
Figure 18: "Tour into the picture" process overview.
Figure 19: "Tour into the picture": a) spidery mesh; b) five regions.
Figure 20: Basic geometry of Criminisi's approach.
Figure 21: Distance between two planes.
Figure 22: Distance between two planes 2.
Figure 23: An example of results obtained with Criminisi's single-view metrology.
Figure 24: Stereovision system.
Figure 25: Vanishing point and vanishing circle for a spherical panorama image.
Figure 26: Using a spherical image to create other TIP views.
Figure 27: Using a cylindrical image to create other TIP views.
Figure 28: PhotoMeter's applications.
Figure 29: The basic setup of the 3D scene (left) and basic labels in the scene (right).
Figure 30: The sequence shows how to measure a window with two mouse clicks.
Figure 31: 3D view showing the geometric computation of the horizontal distance.
Figure 32: Top view showing how to compute the x and z coordinates.
Figure 33: Side view of the vertical section plane defined in Figure 32.
Figure 34: Implementation of the basic PhotoMeter computations.
Figure 35: The two different views of 3D Measurement Simulation.
Figure 36: A section of 3D Measurement Simulation.
Figure 37: The comparison process.
Figure 38: Initial camera calibration panel in PhotoMeter.
Figure 39: An inexpensive laser level for adjusting the horizontal.
Figure 40: Start screen of PhotoMeter.
Figure 41: Menu of PhotoMeter with all submenus.
Figure 42: The user interface of PhotoMeter.
Figure 43: PhotoMeter panels.
Figure 44: Main UML class diagram of PhotoMeter.
Figure 45: UML activity diagram explaining the use of panels and dialogs.
Figure 46: UML activity diagram explaining mouse actions in the window.
Figure 47: Acceptable tilt as a function of depth.
Figure 48: Error in computed height as a function of distance for a tilt of one degree.
Figure 49: Position error as a function of inaccuracy of the camera height.
Figure 50: Error in computed height as a function of distance due to pixel resolution.
Figure 51: The error in the dimensions measured with PhotoMeter.


1 Introduction

In this section I will introduce and state the motivation for metrology from images. I will present the basic model and differentiate between metrology from vision and metrology from reality.

1.1 Measurements from images

Images or sequences of images potentially carry a tremendous amount of geometric information about the represented scene. One aim of computer graphics and computer vision is to extract this information in a quantifiable, accurate way. By interpreting photographs, general techniques are developed to reconstruct a three-dimensional digital model of a scene. The leading idea is as follows (see Figure 2):

(i) A person takes some pictures of the scene (or the object) to be measured.

(ii) A computer creates a 3D metric model of the viewed scene by interpreting those images.

(iii) Finally, the model is stored in a database, which may be queried at any time for measurements via a graphical user interface.

Figure 2: A three-dimensional visual measuring device: (i) a photograph of a scene is taken; (ii) the image is transferred into a computer and interpreted; (iii) a 3D model of the viewed scene is reconstructed and interactively queried for measurements.

Such a device possesses several interesting features:

(i) It is user friendly. Once the images are taken and the model is built, the user can virtually walk through it, view the scene from different locations, take measurements by querying the software interface and store them in a database, interact with the objects of the scene, place new, consistent objects in the scene (augmented reality), and create animations;

(ii) The capture process is rapid and simple, since it only involves one camera taking pictures of the environment to be measured;

(iii) The acquired data are stored digitally on disk, ready for reuse whenever measurements of the original scene and its objects are needed;

(iv) The hardware involved is cheap and easy to use. No new, dedicated hardware is necessary.

The work presented here first gives an overview of existing single- and multiple-view metrology methods and the simple geometry underlying them. As the main topic, a new way of extracting measurements from a single image taken with a calibrated camera, using a dedicated graphical user interface, is presented.

The process of taking measurements is traditionally a difficult engineering task and, like all engineering tasks, must be accurate and robust. Here, the image is taken with a calibrated camera; the input data, all transformations, and the output measurement are affected by error (see Figure 3).

Figure 3: Errors: the input data are processed by the transformation T to obtain the required output measurement. Input data and transformations are affected by error, leading to error in the output measurement.

Because the process presented here is highly fault-prone, I will provide a detailed error analysis at the end of this thesis.

1.2 Using vision vs. reality

For digital measurements of the real world, several different methods of distance measuring are used. Devices that help to obtain these dimensions can be categorized as active and passive devices, which differ in how they acquire measurements. Active devices send signals into the environment and receive reflected ones back. The returned signals are analyzed and compared with the outgoing ones to retrieve information related to distances. Passive devices, by contrast, do not use any active signals; they rely on after-the-fact analysis of given data, e.g. a photograph taken with a camera.

Many distance-measuring systems have been based on ultrasonic technology. It is possible to buy relatively cheap ultrasonic devices capable of measuring the distance from the operator to an object (such as a wall) with an echo reflection time measurement system. Ultrasonic scanners have, for instance, been successfully used in medical imaging and in robotics [FM94, SS94]. The main problem of such an approach is that the returned measurement is affected by phenomena like multiple reflections of the ultrasound waves on objects, leading to wrong estimations of the reflection time. A second approach for measuring depth is the use of laser range finders. These devices work by directing laser beams onto the object to be measured and analyzing the phase or echo time of the reflected beams [Le99]. Those systems are extremely accurate, but they suffer problems similar to those of ultrasonic devices. Other active devices employ cameras to acquire images of an object illuminated by a regular light pattern projected by special auxiliary devices; the shape of the object is computed from the deformation of the projected grid [Ma92]. Consequently, active devices are affected by reflections or interference and need to be used with care. Passive devices such as cameras do not suffer from the above problems and are characterized by a wide range of applications. They can be applied to measure the distance of the device from an object as well as the distance between two arbitrary points in space, surface areas, and angles. Cameras return two-dimensional data (rather than the one-dimensional data from active devices) by dense sampling within the field of view. They can therefore measure objects that are far away as well as close ones, as long as they are visible in the image. Speed is not an issue for such devices, because a photograph is taken in a few milliseconds. Despite these advantages, many issues and problems arise when using cameras.

Taking measurements of the world from images is complicated by the fact that in the imaging process the 3D space is projected onto a planar image, with some unavoidable loss of information. Reconstructing the scene means retrieving that information from the images and/or the camera's internal parameters. In particular, perspective distortions occur during the acquisition stage (see Figure 4). For example, objects that are far away from the eye or the camera look smaller than objects that are close. Furthermore, camera properties lead to lens distortion and other issues.

Figure 4: First proof of the perspective effect by Leonardo: "Among objects of equal size that which is most remote from the eye will look smallest."


1.3 Thesis outline

The remainder of this thesis is organized as follows: In section 2, I give an overview of the mathematical foundations and the development of projective geometry. I start with a journey into art, describe camera models, and provide basics about vanishing points and vanishing lines. Previous and related work on metrology from single views, multiple views, and panoramas is discussed in section 3. I delve into "Tour into the picture" and single-view metrology as two examples of visual metrology. Section 4 is dedicated to the results of my research in single-view metrology. There, I primarily describe the method used to compute the location of 3D points from a single mouse click. This also includes descriptions of the developed system's user interface and its implementation decisions. Furthermore, results from simulations and real data are provided and discussed. Finally, I conclude this thesis with a short summary and a discussion of possible future work in the last section.


2 Projective Geometry

In this section, I will introduce the mathematical foundations used in metrology from images. This involves a journey into art history, a presentation of different camera models, and a short discussion of distortion correction. Readers already familiar with these contents can skip this section.

To begin with projective geometry, we first need to understand the difference between transformations and projections. Transformations are mappings within n-dimensional space, for example the movement of a point in 3D space. In contrast, a projection is a mapping from n-dimensional space down to a lower-dimensional subspace, e.g. the mapping of a point in 3D to a point on a plane (a 2D entity). Here, we are interested in such 3D-to-2D projections where the plane is an image. Basically, there are two different kinds of projection: orthographic projection and perspective projection. In orthographic projection, perspective effects do not occur; distant objects look the same as near ones. With perspective projection, objects farther away appear smaller than nearer ones, while straight lines remain straight.

In the present thesis, only the perspective projection model is used and explained, since it models real-world vision; images taken with cameras are based on this model. For further details on this section, I refer to [A1435], [Ae1435], and [Crim01].

2.1 Projective geometry and art history

Figure 5: From left to right: Filippo Brunelleschi and Leon Battista Alberti.

In the early Renaissance, painters started using perspective projection to produce mathematically correct paintings. Ironically, the invention of mathematical rules for correct perspective came not from a painter, but from the sculptor and architect Filippo Brunelleschi (1377-1446). He did not write down his deliberations, and as a result the first written account of a method to construct pictures in correct perspective is found in a treatise written by the learned humanist Leon Battista Alberti (1404-1472). The first version, written in Latin and dated 1435, was entitled "De Pictura" (On Painting) and was Alberti's effort to relate the development of painting in Florence to his own theories of art. Later, in 1505, Jean Pelerin (1445-1522) published his book "De Artificiali Perspectiva", which describes another way of perspective construction. Here, I will go into the details of the work of Alberti [A1435, Ae1435] and Pelerin.

2.1.1 The visual pyramid & Alberti's "open window"

Alberti used the common knowledge of a visual pyramid to understand perspective. He described the visual pyramid as "…a figure of a body from whose base straight lines are drawn upward, terminating in a single point. The base of this pyramid is a plane, which is seen. The sides of the pyramid are those rays, which I have called extrinsic. The cuspid, that is the point of the pyramid, is located within the eye where the angle of the quantity is…" [Ae1435] (see Figure 6).

Figure 6: The model of a visual pyramid (also known as the pyramid of vision or visual cone).

He defined a two-dimensional image as a cutting plane through the visual pyramid. To explain this, he proposed the following thought experiment: imagine an image plane made of glass. The painter/viewer is located at the peak of the pyramid. When looking from the peak through the pyramid, you can see many different lines crossing the image plane; some are parallel to the cutting edges, others are perpendicular, transversal, or horizontal. When these lines are projected onto the image plane, you get the correct perspective projection. Thus, the perspectively correct image arises from a linear projection from the base plane of the pyramid onto the image plane.

Alberti also explained this method in a different way, with an "open window" [A1435]: To use linear perspective, an artist must first imagine the picture surface as an "open window" through which he sees the painted world. Straight lines are then drawn on the canvas to represent the horizon. "Visual rays" connect the viewer's eye to a point in the distance. The horizon line runs across the canvas at the eye level of the viewer; it is where the sky appears to meet the ground. The vanishing point should be located near the center of the horizon line. Accordingly, it is located where all parallel lines that run towards the horizon line, which he called orthogonals, appear to come together like train tracks in the distance. Orthogonal lines are "visual rays" helping the viewer's eye to connect points from the edges of the canvas to the vanishing point. An artist uses them to align the edges of walls and paving stones. Leonardo learned this method as an apprentice in Florence and produced his painting "Annunciation" (see Figure 7) when he was only 21 years old.

Figure 7: Leonardo da Vinci, The Annunciation (c. 1472-1475), showing the correctness of perspective in his painting. The green horizontal line is the vanishing line, with the red circle representing the vanishing point, the location where all parallel lines (orthogonals, drawn in blue) cross.

2.1.2 Alberti's perspective construction

Using the previously described theory, Alberti came up with his perspective construction of an image from the real world. He explained it with a floor or pavement. Referring to Figure 8(a), the centric point C is chosen; this is the point in the picture directly opposite the viewer's eye. It is also known as the central vanishing point. The ground line AB in the picture is divided equally, and each division point is joined to C by a line. These are lines that run perpendicular to the plane of the picture. In Figure 8(b), the point R is determined by setting NR as the viewing distance. The viewing distance is how far the painter was from the picture, and thus how far a viewer should stand from the picture. R is known as the right diagonal vanishing point. The lines from the edge AB to the centric point C are intersected by the lines converging at R. As seen in Figure 8(c), these intersection points, highlighted as red dots, are the basis for drawing lines perpendicular to the line NB. These lines are called transversals and run parallel to the ground line AB of the picture. Notice that the diagonal points of the squares in the grid can be joined by a straight line. This is an indication that Alberti's construction shows the ground plane of a picture in the correct perspective. To summarize, the result of this construction is a floor like a checkerboard (green in Figure 8(c)) in the right perspective relative to a specific viewing distance. The construction of the floor can be used to draw other objects like cuboids or a circle (see Figure 9). Alberti became very famous with this method, even though it was neither explained completely nor proved correctly.

Figure 8: Alberti's perspective construction, original [A1435] (left). The construction process (right).

Figure 9: Two applications of perspective construction with Alberti's method: (a) the rectangle PQRS is the base plane of the cuboid. The perspective construction of this plane originates in the perspective floor. The whole cuboid is then created; (b) the circle can be drawn in perspective.

2.1.3 Pelerin's "distance point construction"

Historical records show that besides Alberti's construction, there were other methods for constructing floors. One of them is known as the distance point construction, and is found in the treatise of Jean Pelerin (1445-1522) entitled "De Artificiali Perspectiva". Referring to Figure 10(a), the ground line AB is divided equally, and each of these division points is joined to the centric point C. Next, the distance point D is chosen; the distance CD is the viewing distance. As in Figure 10(b), the line AD will intersect all the orthogonals. These intersection points are used to draw the transversals, which are parallel to AB. The result is a floor as accurate in perspective as Alberti's method would produce. This can be proven geometrically, as discussed in more detail in [NUS04].

Figure 10: Pelerin's distance point construction process.

2.2 Camera models and perspective mappings

Now that we have gone through perspective constructions in art history, we need to address the mathematics. In current single- and multiple-view metrology, the image formation process must be modeled in a rigorous mathematical way. This section describes the camera models and the projective transformations that are relevant for these methods.

2.2.1 Pinhole camera model

The pinhole camera is the simplest, and the ideal, model of camera function. It has an infinitesimally small hole through which light enters before forming an inverted image on the camera surface facing the hole. The intersections of the light rays with the camera surface plane form the image of the object. To simplify things, we usually model a pinhole camera by placing the image plane between the focal point of the camera and the object, such that the image is not inverted. Such a mapping from three dimensions onto two dimensions is called perspective projection; it arises because all projecting rays are straight lines passing through a single point. A schematic pinhole camera model is presented in Figure 11.

Figure 11: Pinhole camera model: a point X in 3D space is imaged as x. Euclidean coordinates (X, Y, Z) and (x, y) are used for the world and image reference systems, respectively. O is the center of projection, the viewer.


Simple geometry shows that if we denote the distance of the image plane to the center of projection by $f$ (also known as the focal length), then the image coordinates $\mathbf{x} = (x, y)^\top$ are related to the object coordinates $\mathbf{X} = (X, Y, Z)^\top$ by similar triangles:

$$x = f\,\frac{X}{Z}\,; \qquad y = f\,\frac{Y}{Z}\,. \qquad (1)$$

These equations are non-linear. They can be made linear by introducing homogeneous coordinates, which is effectively a matter of embedding Euclidean geometry into the projective framework. Each point $(X, Y, Z)$ in three-space is mapped onto a line in four-space given by $(WX, WY, WZ, W)$, where $W$ is a dummy variable that sweeps out the line ($W \neq 0$). In homogeneous coordinates, the perspective projection onto the plane is given by $\mathbf{x} = \mathsf{P}\mathbf{X}$ with

$$\begin{pmatrix} x \\ y \\ w \end{pmatrix} = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ W \end{pmatrix}. \qquad (2)$$

The "=" in equation (2) stands for equality up to scale; the result may need to be multiplied by a scale factor, which is not important here. The camera model is completely specified once the matrix $\mathsf{P}$ is determined. The matrix can be computed from the relative positioning of the world points and the camera center, and from the camera's internal parameters (focal length); however, it can also be computed directly from image-to-world point correspondences as shown in (1).
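To make the projection concrete, here is a minimal sketch of equations (1) and (2) in Python; the function names and the sample point are my own illustration, not part of PhotoMeter:

```python
import numpy as np

def pinhole_projection_matrix(f: float) -> np.ndarray:
    """3x4 perspective projection matrix P from equation (2)."""
    return np.array([[f, 0.0, 0.0, 0.0],
                     [0.0, f, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]])

def project(P: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Project a 3D point X = (X, Y, Z) to image coordinates (x, y).

    X is lifted to homogeneous coordinates (X, Y, Z, 1); the result
    (x, y, w) is only defined up to scale, so we divide by w.
    """
    Xh = np.append(X, 1.0)           # homogeneous 4-vector
    x, y, w = P @ Xh
    return np.array([x / w, y / w])  # dehomogenize

# Example: a point 2 m to the right, 1 m up, 5 m in front of the camera,
# with focal length f = 1 (normalized image coordinates).
P = pinhole_projection_matrix(f=1.0)
print(project(P, np.array([2.0, 1.0, 5.0])))  # -> [0.4 0.2], i.e. (fX/Z, fY/Z)
```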

Leonardo da Vinci and Albrecht Dürer were two of the first painters to use the principle of the pinhole camera model to paint pictures with correct perspective (see Figure 12). They used special machines instead of geometric calculations.

Figure 12: Left: Leonardo's perspectograph; "The things approach the point of the eye in pyramids, and these pyramids are intersected on the glass plane." Leonardo da Vinci (1452-1519). Right: A perspective machine from Albrecht Dürer, "Underweysung der Messung mit Zirckel und Richtscheyt" (1525).


2.2.2 Plane-to-plane homography

An interesting specialization of the general central projection described above is the plane-to-plane projection, a 2D-to-2D projective mapping. The camera model for perspective images of planes, mapping points on a world plane to points on the image plane (and vice versa), is well known [SeKn79]. Points on one plane are mapped to points on another plane by a plane-to-plane homography, also known as a planar projective transformation. It is an invertible mapping induced by the star of rays focused in the camera center (center of projection). Planar homographies arise, for example, when a planar world surface is imaged.

Figure 13: Plane-to-plane camera model: a point X in the world plane is imaged as x. Euclidean coordinates (X, Y) and (x, y) are used for the world and image planes, respectively. O is the viewer's position.

This homography can be described by a 3x3 homogeneous matrix. Figure 13 shows the imaging process. Under perspective projection, corresponding points $\mathbf{X} = (X, Y)^\top$ in the world plane and $\mathbf{x} = (x, y)^\top$ in the image plane are related homogeneously by $\mathbf{x} = \mathsf{H}\mathbf{X}$ with

$$\begin{pmatrix} x \\ y \\ w \end{pmatrix} = \begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ W \end{pmatrix}. \qquad (3)$$

The $W$ (or $w$) in this equation is what makes it possible to apply matrix transformations like rotation, scaling, and translation to the object. The "=" again stands for equality up to scale. The camera model is completely specified once the matrix is determined. Here, too, the matrix can be computed from the relative positioning of the two planes and the camera center, and from the camera's internal parameters; however, it can also be computed directly from image-to-world correspondences.

The plane-to-plane camera model can also be used to understand inter-image homography. A planar surface viewed from two different viewpoints induces a homography between the two images. Points on the world plane can be transferred from one image to the other by means of this homography mapping (see Figure 14). A few visible point correspondences are enough to determine the projection matrix, as the sketch below illustrates.

Figure 14: Inter-image homography: the floor viewed in both images induces a homography. Some points can be mapped from one image to the other.
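To illustrate how few correspondences are needed, the following sketch estimates a homography with the standard direct linear transformation (DLT). It is my own minimal illustration, not code from PhotoMeter, and the sample coordinates are made up:

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src (up to scale)
    from four (or more) point correspondences via the standard DLT.

    src, dst: arrays of shape (N, 2) with N >= 4 matching points.
    """
    rows = []
    for (X, Y), (x, y) in zip(src, dst):
        # Each correspondence yields two linear equations in the nine
        # entries of H (stacked as a 9-vector h, solving A h = 0).
        rows.append([-X, -Y, -1, 0, 0, 0, x * X, x * Y, x])
        rows.append([0, 0, 0, -X, -Y, -1, y * X, y * Y, y])
    A = np.array(rows)
    # h is the right singular vector for the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize so that H[2,2] = 1

# Example: four floor points matched between two views.
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
dst = np.array([[10, 20], [110, 25], [105, 115], [8, 108]], dtype=float)
H = homography_from_points(src, dst)
p = H @ np.array([0.5, 0.5, 1.0])   # transfer the center of the square
print(p[:2] / p[2])                 # its position in the second image
```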

2.2.3 Plane-to-plane homology

A planar homology is a plane-to-plane projective transformation and a specialization of the homography. It is characterized by a line of fixed points, called the axis, and a distinct fixed point not on the axis, known as the vertex (see Figure 15). Planar homologies arise in several imaging situations, for instance when different light sources cast shadows of an object onto the same plane.

Figure 15: A planar homology is defined by a vertex, an axis, and a characteristic ratio: its characteristic invariant is given by the cross-ratio $(v, p_1, p_2, i_p)$, where $p_1$ and $p_2$ are any pair of points mapped by the homology and $i_p$ is the intersection of the axis with the line through $p_1$ and $p_2$. The point $p_1$ is projected onto the point $p_2$ under the homology, and similarly for $q_1$ and $q_2$.


Such a transformation is defined by a 3x3 matrix $\mathsf{H}$ with one distinct eigenvalue, whose corresponding eigenvector is the vertex, and two repeated eigenvalues, whose corresponding eigenvectors span the axis. A planar homology can be interpreted as a particular planar homography. The projective transformation representing the homology can be given directly in terms of the 3-vector $\mathbf{a}$ representing the axis, the 3-vector $\mathbf{v}$ representing the vertex, and a scalar factor $\mu$ as

$$\mathsf{H} = \mathsf{I} + \mu\,\frac{\mathbf{v}\,\mathbf{a}^\top}{\mathbf{v}\cdot\mathbf{a}}\,. \qquad (4)$$

The factor $\mu$ is the characteristic ratio, and it can be computed as the cross-ratio of four aligned points $a, b, c, d$. The cross-ratio $\mu = (a, b, c, d)$ is then given by

$$(a, b, c, d) = \frac{(c - a)(d - b)}{(c - b)(d - a)}\,. \qquad (5)$$

This case can also be seen in Figure 15. For more details on cross-ratios see [Sam86].
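As a small numerical check of equation (5), here is a minimal sketch in Python; parameterizing the four collinear points by signed positions along their common line is my own simplification:

```python
def cross_ratio(a: float, b: float, c: float, d: float) -> float:
    """Cross-ratio (a, b, c, d) of four collinear points, equation (5).

    The points are given by their signed positions along the common line;
    the cross-ratio is invariant under any projective transformation.
    """
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

# Four points on a line, before and after a projective map x -> 1 / (x + 3):
pts = [0.0, 1.0, 2.0, 4.0]
mapped = [1.0 / (x + 3.0) for x in pts]
print(cross_ratio(*pts))     # 1.5
print(cross_ratio(*mapped))  # 1.5 as well: the cross-ratio is preserved
```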

2.3 Distortion correction

A prerequisite of the theory treated in this thesis is that the camera behaves according to the pinhole model. But cheap wide-angle lenses, such as those used in security systems, violate this requirement. In such cases the grossest deviations from the pinhole model are usually radial, so a correction step is necessary before any metrology process may be performed. Several methods have been investigated to correct such distortion. A simple correction has been proposed by Devernay and Faugeras [DeFa01], where only one image of the scene is necessary and the radial distortion model is computed from the deformation of the images of straight world edges. Devernay's algorithm first extracts edges from the image and measures how much they are bent compared to a straight line. On the basis of the degree of distortion, a correction factor $f(r_d)$ can be computed. It is then possible to correct the whole image in the following way: a point $\mathbf{x}_c$ in the corrected image can be computed from its corresponding point $\mathbf{x}_d$ in the distorted image, given the center of distortion $\mathbf{c}$, by

$$\mathbf{x}_c = \mathbf{c} + f(r_d)\,(\mathbf{x}_d - \mathbf{c})\,. \qquad (6)$$

Ideally, all previously detected lines that were bent by the lens distortion become straight, and the image can be considered to have been taken with a pinhole camera. In practice this is not possible for all edges, because not all bent edges can be detected. With such a corrected image, perspective and metrology algorithms can be safely performed.
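To illustrate equation (6), here is a minimal sketch of the per-point correction. The simple polynomial model f(r_d) = 1 + k1 * r_d^2 and the coefficient value are illustrative assumptions of mine, not the calibrated model of [DeFa01]:

```python
import numpy as np

def undistort_point(xd, c, k1):
    """Correct one image point per equation (6): xc = c + f(rd) * (xd - c).

    xd : distorted point (x, y)
    c  : center of distortion (x, y)
    k1 : first radial coefficient of the assumed model f(rd) = 1 + k1 * rd^2
    """
    xd, c = np.asarray(xd, float), np.asarray(c, float)
    rd = np.linalg.norm(xd - c)   # radial distance from the center
    f_rd = 1.0 + k1 * rd ** 2     # assumed correction factor f(rd)
    return c + f_rd * (xd - c)

# Points far from the center move more than points near it:
center = (320.0, 240.0)
print(undistort_point((330.0, 240.0), center, k1=1e-6))  # barely moves
print(undistort_point((600.0, 420.0), center, k1=1e-6))  # moves noticeably
```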


An example from [Crim01] shows an image captured by a cheap security-type camera, which exhibits radial distortion (see Figure 16). Note how straight edges in the scene appear curved in the image. After all edges have been extracted, a set of edges assumed to be straight in the scene is selected. From those, the distortion parameters are computed and the image can be corrected accordingly. Note that images of straight world edges are now straight. I did not implement any distortion correction algorithm in my system because several solutions are offered already†. Thus, the lens distortion of the image from which measurements are going to be computed has to be corrected beforehand.

† An example of software that corrects lens distortion is LensDoc from Andromeda Software (http://www.andromeda.com/info/lensdoc/).

Figure 16: Radial distortion correction: a) original image showing radial distortion; b) lines corresponding to straight world edges have been selected in the image; c) corrected image; d) edges from the corrected image. [Crim01]

2.4 Vanishing points and vanishing lines

Vanishing points and vanishing lines are extremely powerful geometric cues. They convey vast amounts of information about the directions of lines and the orientations of planes. These entities can be estimated directly from the image, and no explicit knowledge of the relative geometry between the camera and the viewed scene is required. Often they lie outside the physical image, but this does not affect the computations.


After straight line segments in the image have been detected with an edge detector and broken ones have been merged, vanishing points can be computed. The images of parallel world lines intersect in a common vanishing point. Note that these lines are parallel in the world, but because of perspective distortion they no longer look parallel in the image. A vanishing point is therefore defined by at least two such lines. The images of lines that are parallel to each other and to a common plane intersect in points on that plane's vanishing line. Therefore, two sets of such lines with different directions are sufficient to define the vanishing line (see Figure 17); the sketch below shows the corresponding computation. A Maximum Likelihood Estimate algorithm [Crim01] can be employed when more than two lines or orientations are available, to compute the most likely intersection.

Figure 17: Extracted parallel lines, vanishing points, and a vanishing line from an image.
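In homogeneous coordinates this computation reduces to cross products: the line through two image points is their cross product, and the intersection of two lines is again their cross product. A minimal sketch of my own, with made-up pixel coordinates:

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return np.cross([*p, 1.0], [*q, 1.0])

def intersection(l1, l2):
    """Intersection point of two homogeneous lines, dehomogenized.

    (If the lines are parallel in the image, the third coordinate is 0
    and the intersection lies at infinity.)
    """
    p = np.cross(l1, l2)
    return p[:2] / p[2]

# Two images of parallel world lines (e.g. the left and right floor edges)
# meet in a vanishing point:
v1 = intersection(line_through((100, 400), (250, 300)),
                  line_through((500, 400), (350, 300)))

# A second set of parallel lines on the same plane gives a second vanishing
# point; the line through both points is the plane's vanishing line.
v2 = intersection(line_through((120, 420), (200, 310)),
                  line_through((480, 410), (390, 305)))
vanishing_line = line_through(v1, v2)
print(v1, v2, vanishing_line)
```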


3 Previous & Related Work

Several areas of research are applicable to my work: multiple- and single-view modeling, rendering, and metrology. This section presents a survey of the most significant work in the field of three-dimensional reconstruction from two-dimensional images. The papers are arranged from single-view systems to multi-view ones. I will talk specifically about the use of panoramas and about the method "Tour into the picture" (TIP).

There have been various papers about image-based modeling and rendering and image-based metrology. These results are closely related to the approach presented here. Metrology only adds the knowledge of a reference distance in the image, on the basis of which other measurements can be obtained. For modeling purposes, multiple input images are necessary to get a 360-degree viewable model [DTM96]. On the other hand, for metrology, a single image can be enough [HAA97, LCZ99, CRZ00, Crim01, Crim02, KBB02, WW02, ElHa01, KCS03].

In 1996, Debevec et al. [DTM96] proposed a hybrid geometry- and image-based approach to model and render architecture from photographs. The authors developed a photogrammetric modeling method, which facilitates the recovery of the basic geometry of the photographed scene, and a model-based stereo algorithm to recover how the real scene deviates from the basic model. Finally, with view-dependent texture mapping, the method combines multiple views of the scene.

"Tour Into the Picture" [HAA97], also known as TIP, presented in 1997, was one of the first approaches to use a single image to create a 3D scene and an animation out of it. It will be described in more detail in section 3.1.1. In 1999, Criminisi et al. [LCZ99] started to contribute tremendously to multiple- and single-view metrology and modeling. Contrary to Debevec's method, Criminisi's approach requires neither multiple images nor the camera's internal calibration. His algorithms use vanishing points and vanishing lines to establish planes by defining a square with four control points. These are rectified and can then be used to take measurements on that plane, given a known reference distance of an object on the same plane. Furthermore, it is possible to compute internal camera parameters from three orthogonal vanishing points. This technique is restricted to exceptional images, e.g. ones where specific edge configurations can be identified. Later, he added the option to measure between parallel planes, emphasized the use of uncalibrated images, and finally addressed the reconstruction of complete 3D scenes from single images [CRZ00, Crim01, Crim02]. He assumes images from which vanishing lines can be extracted. I will talk more about Criminisi's single-view metrology in section 3.1.2. Several authors describe extensions of Criminisi's method. A vanishing-point-based method achieves precision and robustness equal to Criminisi's homography-based approach, but with less complexity [WW02]. Another approach [ElHa01] needs neither vanishing lines nor calibrated images to create complete 3D models. Finally, Kushal et al. [KBB02] suggested a method using two planes in the scene, selected by the user, to construct a three-dimensional representation of the image. Furthermore, in [KCS03] they presented a method based on building the final model through high-level primitives like planes, spheres, cuboids, and others.

3.1 Metrology & modeling from a single image

In general, one view alone does not provide enough information for a complete 3D reconstruction. However, some metric quantities can be computed from knowledge of some geometric information, such as the relative position of points, lines, and planes in the scene. But generally, in order to do so, the intrinsic parameters of the camera need to be known. These parameters are: focal length, principal point, skew, and aspect ratio [Fau93].

A number of visual algorithms have been developed to compute the intrinsic parameters of a camera when they are not known. This task is called camera calibration. Usually, calibration algorithms assume some of the camera's internal parameters to be known and derive the remaining ones. Common assumptions are: unit aspect ratio, zero skew, or coincidence of the principal point with the image center. The work of Tsai [Tsai87] has been popular in the field of camera calibration. From a single image of a known, planar calibration grid (e.g. a checkerboard), the algorithm estimates the focal length of the camera, its external position, and its orientation, assuming a known principal point. The problem of calibrating a camera is also discussed by several other researchers, for example by Faugeras [Fau93, DeFa01], as mentioned in section 2.3. He presents algorithms to compute the projection matrix (external calibration) and eventually the camera's internal parameters from only one view of a known 3D grid. He analyzes linear and non-linear methods for estimating the 3D-to-2D projection matrix, the robustness of the estimate, and the best location of the reference points. An interesting problem is addressed in [KSH98] by Kim et al. In this paper, the authors compute the position of a ball from single images of a football game. By making use of shadows on the ground plane and simple geometric relationships based on similar triangles, the ball can be tracked throughout the sequence.

In the following sections, I will discuss TIP [HAA97] and Criminisi's "Single-view metrology" [LCZ99, CRZ00] in more detail. With TIP it is not necessary to know internal camera parameters or any reference distances; nothing is needed to create a kind of walk-through or fly-through animation. "Single-view metrology" suggests a way of measuring the scene without knowing any internal camera parameters; other scene constraints are necessary instead. I note that the approach developed in this thesis does need some internal camera parameters, but does not require special scenes as "Single-view metrology" does.

3.1.1 "Tour Into the Picture"

TIP, presented in 1997 [HAA97], was one of the first approaches to use a single image to create a 3D scene and an animation out of it. The main idea of this method is simply to provide a user interface which allows the user to easily and interactively perform the following operations (I follow [HAA97]):

(i) Adding "virtual" vanishing points for the scene – The specification of the vanishing point should be done by the user.

(ii) Distinguishing foreground objects from background – The decision as to whether an object in the scene is near the viewer should be made by the user, since no 3D geometry of the scene is known. In other words, the user can freely position the foreground objects once the camera parameters are arranged.

(iii) Constructing the background scene and the foreground objects from simple polygons – In order to approximate the geometry of the background scene, several polygons are generated to represent the background. The resulting model is a polyhedron-like form with the vanishing point on its base. A "billboard"-like representation and its variations are used for foreground objects.

These three operations are closely related to each other, so the interactive user interface should allow them to be performed easily and simultaneously. A spidery mesh is the key to fulfilling this requirement.

Figure 18: "Tour into the picture" process overview.


Figure 19: "Tour into the picture": a) apply a spidery mesh to the image; b) define five regions.

The method is outlined as follows (Figure 18 shows the process flow): After an input image is digitized (Figure 18(a)), the 2D image of the background and the 2D mask image of the foreground objects are made (Figure 18(b), (c)). The background image is made by retouching the image with the foreground objects removed. This is easily done with 2D paint tools like Adobe Photoshop. TIP uses a spidery mesh, as seen in Figure 19(a), to prescribe a few perspective conditions, including the specification of a vanishing point (Figure 18(d)). This spidery mesh is a 2D image consisting of a vanishing point and an inner rectangle. The inner rectangle is used to specify the rear window in 3D space (see Figure 19(a)). The rear window can be thought of as the border that the virtual camera cannot go through when making the animation. The TIP GUI allows the user to deform and translate the inner rectangle and to translate the vanishing point. Next, the background is modeled with five 3D rectangles (Figure 18(e)). The outer rectangle in the spidery mesh is decomposed into five smaller regions: floor, right wall, left wall, rear wall, and ceiling (Figure 19(b)). Then, simple polygonal models for the foreground objects are also constructed (Figure 18(f)). Based on the foreground mask information, a 3D polygonal model is built for each foreground object in the scene. Since one polygon is usually not enough to specify a complete object, several hierarchically ordered ones may be needed. After the scene is completely reconstructed, the last user input is to position the new virtual camera from which we want to see the scene (Figure 18(g)). For that, three parameters can be controlled by the user through the user interface: the camera position, the view-plane normal, and the view angle. This allows for rotations, translations, zoom, and view-angle changes of the generated 3D scene. Finally, the image seen from a virtual position can be rendered using ordinary texture mapping techniques (Figure 18(h)). On top of that, it is possible to render animations with a key-framing method.

There are a few things to note about TIP. In some images it is difficult for users to specify the vanishing point. When the 2D image contains more than one vanishing point (see Figure 17), it is hard to create a background image and a foreground mask. But TIP shows that it is possible to create an animation from only a single photograph.

3.1.2 "Single-view metrology"

Single-view metrology is the science of obtaining measurements of scene structure (e.g. lengths, areas) from a single image. This field is principally marked by Criminisi, whose work I will follow here [CRZ00, Crim02]. His idea is to use scene constraints imposed by parallel lines and planes to obtain dimensions from a single image. These dimensions can be up to scale, meaning we only know ratios, or have absolute metric values thanks to a known reference measurement in the scene. Criminisi assumes in his approach that the vanishing line of a reference plane in the scene can be determined from the image, together with a vanishing point for a reference direction (not parallel to the plane). He is concerned with three types of output:

(i) measurements of distances between any of the planes which are parallel to the reference plane;

(ii) measurements on these planes; and

(iii) the determination of the camera's position in terms of the reference plane and direction.

His measurement methods are independent of the camera's internal parameters. Furthermore, his ideas can be seen as reversing the rules for drawing perspective images given by Leon Battista Alberti in his treatise on perspective (1435), as described in section 2.1.

The basic geometry of the plane’s vanishing line and the vanishing point used by Criminisi is illustrated in Figure 20. The vanishing line l of the reference plane is the projection of the line at infinity of the reference plane into the image. The vanishing point v is the image of the point at infinity in the reference direction. Note that the reference direction need not be vertical; if it is, the vanishing point is also the image of the vertical “footprint” of the camera center on the reference plane. Likewise, the reference plane will often, but not necessarily, be the ground plane, in which case the vanishing line is commonly known as the “horizon”.


Figure 20: Basic geometry of Criminisi’s approach: The plane’s vanishing line l is the intersection of the image plane with a plane parallel to the reference plane and passing through the camera center. The vanishing point v is the intersection of the image plane with a line parallel to the reference direction through the camera center [Crim01].

First, let us take a look at Criminisi’s approach to measuring the distance (in the reference direction) between two parallel planes, specified by the image points x and x'. Figure 21 shows the geometry, with points x and x' in correspondence. He formulated the following theorems to solve this problem.

Theorem 1: Given the vanishing line of a reference plane and the vanishing point for a reference direction, then distances from the reference plane parallel to the reference direction can be computed from their imaged end points up to a common scale factor. The scale factor can be determined from one known reference length.

Theorem 2: Given a set of linked parallel planes, the distance between any pair of planes is sufficient to determine the absolute distance between any other pair. The link provides a chain of point correspondences between the set of planes.

Figure 21: Distance between two planes relative to the distance of the camera center from one of the two planes: a) in the real world; b) in the image [Crim01].

To understand this, we need to take a look at Figure 21. The four points x, x', c, v define a cross-ratio, where x is a point on the far plane X, x' is a point on the near plane X', c is the camera center, and v is the vanishing point. The value of the cross-ratio determines a ratio of distances d(a,b) between planes in the world as follows:

cross = \frac{d(x,c)\, d(x',v)}{d(x',c)\, d(x,v)} = \frac{d(X,c)\, d(X',v)}{d(X',c)\, d(X,v)}   (7)

Assuming that the vanishing point v is at infinity, the distances d(X',v) and d(X,v) become infinitely large and their ratio tends to one, so they cancel in (7), which leads to:

cross = \frac{d(X,c)}{d(X',c)}   (8)

Let us define the far-plane-to-camera distance as Z_C and the near-plane-to-camera distance as Z_C - Z:

d(X,c) = Z_C \quad \text{and} \quad d(X',c) = Z_C - Z   (9)

cross = \frac{Z_C}{Z_C - Z}   (10)

After some algebra this leads to:

\frac{Z}{Z_C} = 1 - \frac{1}{cross}   (11)

So, with a known far-plane-to-camera distance Z_C or near-plane-to-camera distance Z_C - Z, we can use the cross-ratio from the image to compute the plane-to-plane distance Z (and vice versa). The plane-to-camera distance may be difficult or inconvenient to measure directly. Instead, we can use a known plane-to-plane distance (in the reference direction) to derive the unknown plane-to-plane distance (see Figure 22). If a reference distance Z_r is known, we apply the cross-ratio as described above to v, c_r, r_2, r_1 to get the plane-π_r-to-camera distance. Then, we use that distance and the cross-ratio of v, c_s, s_2, s_1 to get the distance between π_r and π, and thereby determine Z_C.

Figure 22: Distance between two planes relative to the distance between two other planes: a) in the world; b) in the image [Crim01].
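To make the use of equations (7) and (11) concrete, the following minimal C++ sketch computes the cross-ratio from four image points and derives the plane-to-plane distance from a known far-plane-to-camera distance. The type and function names are hypothetical, not part of Criminisi's published code:

#include <cmath>

// A minimal sketch of Theorem 1, assuming all quantities are the 2D image
// distances defined in Figure 21.
struct Point2D { double x, y; };

static double dist(const Point2D& a, const Point2D& b)
{
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Cross-ratio of the image points x, x', c, v as in equation (7).
double crossRatio(const Point2D& x, const Point2D& xp,
                  const Point2D& c, const Point2D& v)
{
    return (dist(x, c) * dist(xp, v)) / (dist(xp, c) * dist(x, v));
}

// Plane-to-plane distance Z from a known far-plane-to-camera distance Z_C,
// following equation (11): Z / Z_C = 1 - 1/cross.
double planeToPlaneDistance(double cross, double zC)
{
    return zC * (1.0 - 1.0 / cross);
}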

If the reference plane π is affinely calibrated (its vanishing line is known), then from image measurements we can compute the ratios of lengths of parallel line segments on the plane and the ratios of areas on the plane. An affine calibration is an affine rectification. This method was described earlier as a plane-to-plane homology; to recall:

x = HX \quad \text{with} \quad H = I + \mu \frac{v a^{T}}{v \cdot a}   (12)

As described before, the camera’s distance Z_C from a particular plane can be obtained knowing a single reference distance Z_r. Knowing Z_C allows computing the complete position of the camera.

Now, there is the question of reconstructing the whole scene. Computing measurements of relevant edges in the scene allows a complete model reconstruction. An example of a complete reconstruction is shown in Figure 23.

Figure 23: An example of results obtained with Criminisi’s single-view metrology: a) original image; b) synthesized image; c) synthesized view with original camera location.

3.2 Metrology & modeling from multiple images

An extension of single-view metrology and modeling is multiple-view metrology and modeling. With more than one view, it is possible to reconstruct or measure a larger portion of the scene.

3.2.1 Using two views

Figure 24: Stereovision system: Two images of the same scene are captured. Three-dimensional structure can be computed from the analysis of those images.


The classical algorithms for 3D reconstruction use stereovision systems. Stereovision consists of capturing two images of a scene taken from different viewpoints and estimating the depth of the scene by analyzing the disparity between corresponding features (see Figure 24). This methodology finds its basis in trigonometry and is employed by the human binocular vision system. The basic steps in reconstructing a scene from two images are:

(i) finding corresponding points in the two images; and

(ii) intersecting the corresponding rays in 3D space.
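In practice the two back-projected rays rarely intersect exactly because of noise, so a common choice, sketched below, is the midpoint of the shortest segment connecting the two rays. This is a generic textbook construction, not tied to any particular system discussed in this thesis:

// Minimal 3D vector helpers for the sketch; names are hypothetical.
struct Vec3 { double x, y, z; };
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }
static Vec3 operator+(Vec3 a, Vec3 b) { return {a.x+b.x, a.y+b.y, a.z+b.z}; }
static Vec3 operator*(double s, Vec3 a) { return {s*a.x, s*a.y, s*a.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Rays: p1 + s*d1 and p2 + t*d2 (camera centers p1, p2; view directions d1, d2).
// Returns the midpoint of the shortest segment between the two rays.
Vec3 triangulateMidpoint(Vec3 p1, Vec3 d1, Vec3 p2, Vec3 d2)
{
    Vec3 w0 = p1 - p2;
    double a = dot(d1, d1), b = dot(d1, d2), c = dot(d2, d2);
    double d = dot(d1, w0), e = dot(d2, w0);
    double denom = a * c - b * b;          // zero only for parallel rays
    double s = (b * e - c * d) / denom;
    double t = (a * e - b * d) / denom;
    Vec3 q1 = p1 + s * d1;                 // closest point on ray 1
    Vec3 q2 = p2 + t * d2;                 // closest point on ray 2
    return 0.5 * (q1 + q2);
}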

3.2.2 Using three or more views

Two views suffice to reconstruct a scene, but adding more images from different points of view can further constrain the reconstruction problem by reducing the uncertainty in the estimated structure. This is particularly true if a line-matching process is used rather than a point-matching one (line matching is not possible with only two views). Furthermore, the use of three or more views allows a check on the consistency of the features matched using the first two views.

3.3 Metrology & modeling from panoramas?

Panoramas are an interesting extension of multiple views for scene reconstruction and metrology methods. Not much work has been done in this field yet. In [KAS01], a walk-through of a panorama image is proposed as an extension to TIP. With this model it is not possible to reconstruct the whole scene, but it is a first step towards creating a 3D model.

In a (spherical) panorama image, the environment viewed from a camera is mapped onto a base sphere at the camera position. As shown in Figure 25, parallel lines A and B on the ground plane are projected onto this sphere as arcs A' and B', respectively. A' and B' intersect at a point on the base sphere referred to as a vanishing point of the spherical image. Since A and B on the ground plane may take on any inclination, the set of all vanishing points on the sphere forms a circle, which is the intersection of the base sphere with the horizon plane. This circle is called a vanishing circle and is analogous to the vanishing line of the planar image. The vanishing circle divides the base sphere into two disjoint hemispheres. The lower one corresponds to the ground plane in the 3D environment, and the upper one corresponds to the space above the ground plane. Thus, the vanishing circle can be thought of as the horizon that separates the earth, represented by the ground plane, from the sky. Using these assumptions, it is easy to reconstruct the background and foreground images, and a TIP-like walk-through or fly-through animation can be computed.


Figure 26 and Figure 27 show examples of spherical and cylindrical panoramas that are used for walk-through or fly-through animations.

Figure 25: Vanishing point and vanishing circle for a spherical panorama image.

Figure 26: Using a spherical image to create other TIP views.

Figure 27: Using a cylindrical image to create other TIP views.


4 PhotoMeter: Easy-to-use MonoGraphoMetrics

The solution proposed here is less general than the approaches mentioned in the previous section, but it offers two significant advantages. First, the user does not need to spend time establishing a reference plane as in Criminisi’s approach. Second, measurements may be taken even when only a small portion of the floor below the measured points is visible and no set of floor edges is available to define vanishing points (e.g. see Figure 28).

Figure 28: PhotoMeter helps, for example, to measure the height and width of a window (a) or the height, width, and depth of a locker (b).

4.1 Introduction

One of the primary challenges of Computer Vision and Computer Graphics is the automation of the creation of precise 3D models of real environments. The objective of the project described here is much more modest: I strive to provide users with a very simple-to-use and effective tool for performing real 3D measurements from a single photograph. Such a tool may be used for measuring buildings, rooms, windows, doors, furniture pieces, and even people.

The advantage of the proposed approach lies in its simplicity. A single mouse operation – position-press-drag-release – is sufficient to precisely measure the 3D location of a point visible in the image. Relative vertical and horizontal dimensions between such a user-defined 3D point and a previously defined point, serving as a temporary anchor, are clearly shown on the screen as mark-up arrows with dimension labels. This tool has been integrated into a complete interactive system, called PhotoMeter. The user of PhotoMeter can load a picture, perform the desired measurements, and save the marked image for future reference or email it to a colleague, client, or sub-contractor.

The simplicity and effectiveness of PhotoMeter result from a design decision that restricts its use to pictures taken with a calibrated camera. More specifically, the user must know the horizontal field of view of the camera and the height of the camera above the floor or ground when the picture was taken. Furthermore, PhotoMeter relies heavily on the assumption that the camera was perfectly leveled (horizontal). Consequently, errors in the camera calibration, in its height, and in its horizontal alignment will introduce errors in the measurements displayed by PhotoMeter. I provide an error analysis and suggest an approach for camera calibration, but recommend that PhotoMeter be used with a tripod that permits locking the camera tilt, hence ensuring a horizontal orientation, and that makes it easy to always set the camera at the same height. Alternatively, a simple stool may be used to support the camera.

4.2 Computations

In this section, I explain the necessary foundations for the computations and how the measurements are computed by PhotoMeter.

4.2.1 Main Idea

After the image has been prepared by correcting lens distortion and loaded, PhotoMeter can perform its work. PhotoMeter assumes the following scene settings:

(i) the aspect ratio of the image is 4:3;

(ii) the camera’s focal center is at C = (0, h, 0)^T, where h is the height above the floor; this means the origin of the world coordinate system lies on the floor exactly below the camera;

(iii) the center of the image is at S = (0, h, d)^T, where d is the distance between the camera center and the image center.

Figure 29: The basic setup of the 3D scene (left) and basic labels in the scene (right).

The scene in Figure 29 displays the image as a look-through screen. The distance d is set automatically by PhotoMeter depending on the horizontal field of view set by the user through the calibration dialog described in section 4.3. Furthermore, the scene relies on a left-handed coordinate system: the y-axis is the up-vector, the x-axis is the side-vector, and the z-axis is the depth-vector, pointing from the camera center into the scene. Using these settings for the scene, PhotoMeter needs two more coordinates. First, the coordinates of the pixel P' = (p'_x, p'_y)^T are needed, which is the pixel in the image where the object to be measured is situated. Second, the projection of the point P' onto the floor gives the pixel F' = (f'_x, f'_y)^T. A dragging constraint ensures that f'_x = p'_x. Finally, with this input, dimensions between any points in the image whose floor shadows are visible can be computed easily. These computations can be performed with similar triangles.

4.2.2 Mathematical/geometrical calculations

Figure 30: The sequence shows how to measure a window with two mouse clicks. (a) The user clicks the top left corner of the window, drags down, and releases when the line touches the floor. During the drag, the line is constrained to remain vertical. The 3D location of the top corner of the window is computed and becomes the anchor (datum). (b) The user clicks the opposite, lower-right corner of the window and drags down to the floor. The 3D location of the opposite corner is computed. (c) PhotoMeter displays the perspective projection of the vertical and horizontal displacements between the anchor and the new point. (d) The user presses ENTER to keep this series of measurements and to annotate the image with the associated dimensions.

To obtain the height of a world 3D point P in the scene, the user clicks at the pixel P' where P appears on the screen, drags the cursor down to the floor, and releases the mouse button at the pixel F' where the vertical projection of P onto the floor appears on the screen. To help the user ensure that the line from P' to F' is vertical, the horizontal motion of the cursor is temporarily disabled once P' is selected, until the mouse button is released. As shown in Figure 30, when P' identifies a point on a wall or on a vertical side of a piece of furniture, the location of F' is obvious. Furthermore, when P lies on the floor, no dragging is necessary; in this case P' equals F'. The first 3D point P identified during a series of measurements is used as an anchor (datum) for all subsequent measurements in the series. Subsequent 3D points may be added to the series until the ENTER key is pressed. PhotoMeter computes the vertical and horizontal displacements, in 3D, between each of these points and the anchor. It overlays these displacements on the image by drawing their perspective projections as red lines. When ENTER is pressed, the series is frozen and the labels with dimensions are added to the image. The next selected 3D point will automatically become the anchor for the next series. To find the coordinates (p_x, p_y, p_z)^T of the selected 3D point P in the coordinate system of the floor projection of the camera, PhotoMeter uses P' and F'.


Figure 31: 3D view showing the geometric computation of the horizontal distance d_object from the object to the camera.

Figure 32: Top view showing how to compute the x and z coordinates of the object. The orange line defines a vertical section plane through the camera and the object.

Figure 33: Side view of the vertical section plane defined in Figure 32, explaining the computation of the height of the object.


To understand the internal computations of PhotoMeter, we first take a closer look at Figure 31. In this figure, you can see how to compute d_{object}, the absolute distance from the world coordinate origin O = (0,0,0)^T (the vertical projection of the camera position onto the floor) to the foot point of the object F = (f_x, 0, f_z)^T. To solve this problem, I use similar triangles (see the green marked triangles in Figure 31). The distance d_{object} can be computed as follows:

\frac{d_{object}}{d'} = \frac{h}{h'} \quad \text{with} \quad d' = \sqrt{p'^2_x + d^2} \quad \text{and} \quad h' = f'_y   (13)

d_{object} = \frac{h \sqrt{p'^2_x + d^2}}{f'_y}   (14)

Now, to compute p_x, I also use similar triangles (see Figure 32):

\frac{p_x}{p'_x} = \frac{d_{object}}{d'} \quad\Rightarrow\quad p_x = \frac{p'_x \cdot d_{object}}{d'}   (15)

After substituting d_{object} and d', I get the following result:

p_x = \frac{p'_x}{\sqrt{p'^2_x + d^2}} \cdot \frac{h \sqrt{p'^2_x + d^2}}{f'_y}, \quad \text{which leads to} \quad p_x = \frac{p'_x \cdot h}{f'_y}   (16)

I also use similar triangles to get p_y (see Figure 33); the camera height h has to be added afterwards:

\frac{p_y - h}{p'_y} = \frac{d_{object}}{d'} \quad\Rightarrow\quad p_y = h + \frac{p'_y \cdot d_{object}}{d'}   (17)

After substituting d_{object} and d', I get the following result:

p_y = h + \frac{p'_y}{\sqrt{p'^2_x + d^2}} \cdot \frac{h \sqrt{p'^2_x + d^2}}{f'_y}, \quad \text{which leads to} \quad p_y = h + \frac{p'_y \cdot h}{f'_y}   (18)

Then, only p_z is left, and it is again computed with similar triangles (see Figure 32):

\frac{p_z}{d} = \frac{d_{object}}{d'} \quad\Rightarrow\quad p_z = \frac{d \cdot d_{object}}{d'}   (19)

After substituting d_{object} and d', I get the following result:

p_z = \frac{d}{\sqrt{p'^2_x + d^2}} \cdot \frac{h \sqrt{p'^2_x + d^2}}{f'_y}, \quad \text{which leads to} \quad p_z = \frac{d \cdot h}{f'_y}   (20)

To conclude, I get P = (p_x, p_y, p_z)^T with:

p_x = \frac{p'_x \cdot h}{f'_y}; \quad p_y = h + \frac{p'_y \cdot h}{f'_y}; \quad p_z = \frac{d \cdot h}{f'_y}   (21)


The implementation of these computations is shown in Figure 34.

class Calculation
{
public:
    // p' and f' values have to be relative to a screen center of (0,0).
    Calculation(GLfloat distCameraScreen, GLfloat cameraHeight,
                GLfloat pPrimeX, GLfloat pPrimeY, GLfloat fPrimeY);

    void Compute3DCoordinate();

private:
    void ComputeHeight();
    void ComputeWidth();
    void ComputeDepth();

    // member variables (camera, object, screen) elided in this listing
};

Calculation::Calculation(GLfloat distCameraScreen, GLfloat cameraHeight,
                         GLfloat pPrimeX, GLfloat pPrimeY, GLfloat fPrimeY)
{
    // given values through user input
    camera.y = cameraHeight;
    object.pPrime.x = pPrimeX;
    object.pPrime.y = pPrimeY;
    object.fPrime.x = pPrimeX;   // dragging constraint: f'_x = p'_x
    object.fPrime.y = fPrimeY;
    screen.distFromCamera = distCameraScreen;

    // set values
    camera.x = 0.0f;
    camera.z = 0.0f;
    screen.center.x = 0.0f;
    screen.center.y = camera.y;
    screen.center.z = screen.distFromCamera;
}

void Calculation::Compute3DCoordinate()
{
    ComputeHeight();
    ComputeWidth();
    ComputeDepth();
}

// p_y = h + p'_y * h / |f'_y|   -- equation (18)
void Calculation::ComputeHeight()
{
    object.p.y = camera.y + object.pPrime.y * camera.y / fabs(object.fPrime.y);
}

// p_x = p'_x * h / |f'_y|   -- equation (16)
void Calculation::ComputeWidth()
{
    object.p.x = object.pPrime.x * camera.y / fabs(object.fPrime.y);
}

// p_z = d * h / |f'_y|   -- equation (20)
void Calculation::ComputeDepth()
{
    object.p.z = screen.distFromCamera * camera.y / fabs(object.fPrime.y);
}

Figure 34: Implementation of the basic PhotoMeter computations.

Knowing the position of more than one pixel that has been projected back into the world coordinate system, it is possible to compute distances between these positions.
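As a hypothetical usage sketch (this mirrors Figure 34 and equation (21) but is not PhotoMeter's actual API, and the pixel values below are made up), two back-projected points yield the vertical and horizontal displacements that PhotoMeter displays:

#include <cmath>
#include <cstdio>

struct Point3D { double x, y, z; };

// h = camera height, d = camera-to-screen distance; (px, py) is the selected
// pixel P', fy the y-coordinate of its floor shadow F' (screen-center relative).
Point3D backProject(double h, double d, double px, double py, double fy)
{
    double s = h / std::fabs(fy);            // common scale factor h / |f'_y|
    return { px * s, h + py * s, d * s };    // equation (21)
}

int main()
{
    Point3D a = backProject(1.6, 1.5, -0.10, 0.05, -0.30);  // anchor point
    Point3D b = backProject(1.6, 1.5,  0.20, 0.08, -0.30);  // second point

    double vertical   = std::fabs(b.y - a.y);               // height difference
    double horizontal = std::hypot(b.x - a.x, b.z - a.z);   // floor-plane distance
    std::printf("vertical %.3f m, horizontal %.3f m\n", vertical, horizontal);
    return 0;
}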


4.2.3 Validation with “3D Measurement Simulation”

It is easy to prove the mathematical correctness of the functions stated above. To verify that these functions correctly compute the 3D position of a point represented by a pixel in the image, I implemented another program, which I called 3D Measurement Simulation. Using 3D Measurement Simulation, I altered and simplified these functions several times until I arrived at the final result presented above.

3D Measurement Simulation basically consists of two views (see Figure 35 and Figure 36). The first one is a perspective view showing the scene with the setup of a camera, a screen (image plane), a floor, several wall planes, and two objects used for validation (cuboids). The second one is a camera view, which shows the picture taken by the camera, i.e. the picture seen through the camera.

To validate the correctness of the computed 3D position of the selected pixel in the image, I put two spheres into the scene representing specific positions in 3D. Here, I will only discuss the position of the upper right corner of the right cube. Since this is a generated scene, the correct position of this corner is known. Now, I need to display the computed position after clicking at this specific corner and show the resulting differences, i.e. the error. To do so, I implemented a special section in the program, as seen in Figure 36. Here is a short description of the outputs; the numbers refer to Figure 36:

1) The user can change the perspective matrix of the perspective view to set values (camera view, navigation view).

2) This section provides the user with basic scene settings: here, the camera is at a height of 1, the distance between the screen and the camera is set to 1.5, and this results in a horizontal field of view of 28.07°.

3) A selection between two presets is given here: this sets the comparison object (left or right cube). Also, their original positions are displayed.

4) This section presents the scanned position in the image and the computed position corresponding to the same pixel.

5) The original and the computed positions are compared, and the error in percent is shown here.

Having a very low error (usually less than 1%) for different pixels representing the preset position of the sphere allows concluding that the computations are correct; the remaining error is solely due to pixel discretization. Besides the calculated error, there is another way of validation: the program draws a small sphere in the scene at the computed position. Now, you can check whether this sphere is at the point you intended to reach.
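A sketch of the comparison behind output 5), assuming the error is reported per component as a percentage of the known coordinate (the exact formula used by 3D Measurement Simulation is not documented here, and the function name is hypothetical):

#include <cmath>

// Relative error, in percent, between a known coordinate of the generated
// scene and the corresponding back-projected coordinate.
double errorPercent(double known, double computed)
{
    return std::fabs(computed - known) / std::fabs(known) * 100.0;
}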


Figure 35: The two different views of 3D Measurement Simulation.

Figure 36: The section of 3D Measurement Simulation where the computed error is displayed.

Figure 37 presents the results of computing the error for the right cube. The goal is to calculate the position of the green sphere (see Figure 37(a), upper left). To do so, the user clicks on that position, drags the mouse down to the floor, and releases the mouse button (this draws a red arrow). Simultaneously, the new position and the error are displayed. Furthermore, a blue sphere is drawn at that position (see Figure 37(b)). The perspective view allows visually checking its correctness. Figure 37(c) provides the results.


Figure 37: The comparison of the original and the computed position of a preset sphere.

4.3 Requirements & Preliminaries

When metrology is used for engineering or architectural applications, it is important to provide accurate measurements and to quantify the associated error. As in other MonoGraphoMetrics applications, the error in a measurement computed with PhotoMeter is due to the combined effect of several error sources:

(i) camera calibration;

(ii) lens distortion;

(iii) horizontal alignment;

(iv) tripod height;

(v) error in selecting a point in the image; and

(vi) error in selecting its floor shadow.

The cumulative effects of these errors on the accuracy of the measurements are discussed in section 4.6. Here, I briefly suggest how to reduce these errors through proper camera calibration and image correction.

PhotoMeter uses the field of view of the camera. Tests have shown that the specifications provided by camera manufacturers may be confusing or inaccurate. Therefore, it is preferable to measure the actual field of view manually. Figure 38 shows the starting panel in PhotoMeter, explaining graphically how to measure the horizontal field of view by placing the camera parallel to a wall and reporting the perpendicular distance (label “a” in Figure 38) from the camera to the wall and the horizontal length (label “b” in Figure 38) of the visible portion of the wall. The horizontal field of view angle (“α”) is computed automatically. To ensure that the camera viewing direction is perpendicular to the wall, I recommend placing the camera so that the left and right vertical edges of a wall, door, or window appear perfectly flush against the left and right borders of the image in the viewfinder. This camera calibration needs to be performed only once per camera.
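The angle computed by the calibration panel follows directly from the right triangle formed by half the visible wall length and the perpendicular distance; a minimal sketch (the function name is hypothetical):

#include <cmath>

// Horizontal field of view from the wall measurement of Figure 38:
// a = perpendicular camera-to-wall distance, b = visible wall length.
// From the right triangle: tan(alpha/2) = (b/2) / a.
double horizontalFovDegrees(double a, double b)
{
    const double kPi = 3.14159265358979323846;
    return 2.0 * std::atan((b / 2.0) / a) * 180.0 / kPi;
}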

Figure 38: Initial camera calibration panel in PhotoMeter.

To ensure that the camera is perfectly horizontal when used to take a picture for PhotoMeter, one could simply place the camera on a horizontal surface, such as a stool or a tripod with fixed orientation and height. To ensure that the stool or tripod is perfectly horizontal, one could use an expensive theodolite or a much cheaper laser level (Figure 39), which includes a laser pointer that can be used to verify that the height of the laser point on the wall is identical in several directions.

Figure 39: An inexpensive laser level for horizontal alignment.

The height of the camera must also be measured accurately. Pointing the laser at a vertical edge in the room ensures that the height is measured vertically. After the picture has been taken, distortions in the image caused by the lens of the camera should be corrected [DeFa01]. Several programs that correct lens distortion automatically or semi-automatically are available and should be used here independently of PhotoMeter‡.

‡ An example of software that corrects lens distortion is LensDoc from Andromeda Software (http://www.andromeda.com/info/lensdoc/).


4.4 User Interface

I implemented PhotoMeter on the Mac OS X platform as non-portable software. In this section, I describe all features and buttons of PhotoMeter. For better recognition of PhotoMeter, I created an application icon (see Figure 40). The program can be opened in two different ways: by clicking on the application icon in the file browser or in the dock, or by dragging an image onto the application icon.

Figure 40: Start screen of PhotoMeter.

Right after the user has opened the program, another panel automatically appears from the top of the main window – the calibration panel (see Figure 38). Here, the user is asked to provide some values to calculate the camera’s internal parameters. After confirming this dialog, the program can be controlled by the buttons in the toolbar on the right (see Figure 40), by keyboard shortcuts, or by the menu. The buttons are explained as follows:

– Starts an open panel to load an image.

– Starts a save panel to save the measured image.

– Starts a print panel that allows printing of the annotated image.

– Saves the current measurements permanently.

– Clears the last performed measurement.

– Clears all measurements to restore the original image.

– Opens the calibration dialog to change camera settings.

Keyboard shortcuts are displayed in the menu (see Figure 41).

Figure 41: Menu of PhotoMeter with all submenus.

When an image is loaded – either by dragging it onto the application icon, by dragging it into the program, or by using the menu or the button (see Figure 43(a)) – PhotoMeter checks the file format, the aspect ratio, and the size of the image. The system can interpret images in the file formats JPEG, JPG, BMP, GIF, TIFF, TIF, PNG, PDF, PSD, PICT, and EPS. This variety is possible because of the Application framework offered by Apple. If the image does not have an aspect ratio of 4:3, PhotoMeter rejects the image (see Figure 43(f)). If the image does not have the right resolution (800×600 pixels) but has the right aspect ratio (4:3), PhotoMeter resizes it to 800×600 (see Figure 43(e)); 800×600 is a size that can be used as an OpenGL texture. After the image is loaded, the user can start measuring the scene. The current measurement is drawn with red arrows. The anchor representing the first measured point is marked with a red dot and is saved for further processing. After pressing ENTER or using the “Keep Red Arrows” button, the red arrows change their color, and dimensions are assigned and finally saved (see Figure 42). The user can change the color and the width of the arrows in the preferences dialog if they interfere with the image (see Figure 43(d)). After the user has finished his or her work, the result can be saved or printed (see Figure 43(b) and (c)). If the user opens another image without saving the previously edited one, the software asks whether to discard the changes or to save the measured image before proceeding (see Figure 43(e)). When the user quits the program, the same check takes place.


Figure 42: The user interface of PhotoMeter. This screenshot also shows measurements that are already saved (blue arrows with dimensions) and a measurement series that is still in progress (red arrows with anchor).

Figure 43: PhotoMeter panels: a) open dialog; b) save dialog; c) print dialog; d) preferences panel; e) save changes dialog; f) aspect ratio dialog.


4.5 Implementation & Documentation

In this section I describe the implementation of PhotoMeter. First, I give a short insight into important implementation decisions. I will also go into the system’s class structure and explain some significant activities.

4.5.1 Implementation decisions

In the early stages, the question of the right platform emerged. I first wanted to implement a portable program for the Mac OS X, Windows, and UNIX platforms. To reach this goal, I wanted to use OpenGL combined with GLUT. However, several disadvantages came up when thinking about user interface development: GLUT only allows a sparse user interface, and problems with image loading could arise. In the end, I decided to implement my ideas for the Mac OS X platform.

Mac OS X development is typified by Cocoa. Cocoa is a rich set of object-oriented frameworks that allow for rapid development of applications on Mac OS X. The Cocoa application environment is designed specifically for Mac OS X-native applications and allows the development of applications and plug-ins that can make use of all the extensive features of Mac OS X. Using Apple’s free Xcode Tools with Cocoa allows for rapid interface prototyping and a seamless development process. Apple’s Xcode Tools consist of several programs:

(i) Xcode for the actual implementation;

(ii) Interface Builder to prototype the interface;

(iii) Icon Composer to create Apple-conformant icons; and

(iv) several others for useful features.

After reviewing this development package, I decided to create a Cocoa application.

Then there was the question of the right programming language: Java, C, C++, Objective-C, and Objective-C++ can be used. The Objective-C language is a simple computer language designed to enable sophisticated object-oriented programming, based on standard C. It combines several features:

(i) object-oriented techniques give access to the functionality packed in the Cocoa frameworks;

(ii) because it is a standard ANSI C extension, existing C code can be adapted, and all the benefits of C are available when working within Objective-C; and

(iii) it is a very simple and dynamic programming language.

The Objective-C++ language allows you to freely mix C++ and Objective-C code in the same source file or project. Using Objective-C++, you can directly call Objective-C objects from C++ code, and you can directly call C++ code from Objective-C objects. Thus, Objective-C++ allows you to use C++ class libraries directly from within a Cocoa application.


The filename extension for Objective-C files is .m (e.g. file.m); Objective-C++ uses .mm (e.g. file.mm).

To summarize, I decided to implement a Cocoa application (to get support from its extensive frameworks) in Objective-C++ (because it allows me to use my previous knowledge of C and C++, while retaining support for all Mac OS X features).

4.5.2 Class diagrams

The UML class diagram presented in Figure 44 shows all classes used in my implementation and their inheritance from classes of several Cocoa frameworks. The main class in my implementation is MyNSOpenGLView. This class handles the user interface and all user interactions, such as mouse and keyboard functions. Furthermore, all panels and the image saving and printing processes are controlled here. The basic OpenGL functionality is inherited from the Cocoa class NSOpenGLView. The appearance of the actual scene – the image and all arrows – is described in the class MyScene. This class is also connected to the classes MeasurementEntry and APTexture. MeasurementEntry represents all values of an arrow, including its start and end positions, all connected measurements, its color, and so on. The class APTexture handles image loading using the Cocoa Application framework and the creation of textures for OpenGL. The class Calculation, implemented in C++, computes the 3D position of an object in the scene from the necessary input values. All these classes inherit from several Cocoa classes, mainly from NSWindow to create windows, from NSPanel to create panels, and of course from NSObject for basic object behavior.

4.5.3 Activity diagrams

PhotoMeter offers a variety of interaction possibilities:

(i) setting camera parameters,

(ii) changing scene settings,

(iii) loading an image,

(iv) measuring the scene,

(v) saving the measured image, and

(vi) printing the image.

Figure 45 shows a UML activity diagram of all loading, saving, and printing procedures, as well as the usage of the camera calibration and scene settings dialogs. Displaying the main window without any panels on top characterizes the main state of the program. By clicking on buttons, different processes are started. When the user opens the Calibration Panel and sets new values, the input is validated continuously; after applying the user input, these values are saved. The Preferences Panel, the Save Panel, and the Print Panel work in a similarly simple way: pressing the assigned button opens the panel, and confirming the input closes it. Also, all panels can be cancelled without applying any input. The image loading process is a more involved procedure. First, it needs to be checked whether the current image has undergone any changes since it was opened or saved. Second, the selected image needs to be checked for its aspect ratio and its size. Based on the results and the related user inputs, the procedure takes different paths. Figure 46 describes the workflow of interactions when measuring a loaded picture. Here, we need to distinguish between pressing ENTER and mouse operations. When the user presses ENTER, the current measuring process is finished by saving and labeling the measurements. Using the mouse leads to new measurements: pressing the mouse button sets the start point of a new measurement. Before this position is set as the anchor, the software needs to check the number of arrows in the current measuring process. With dragging and later releasing the mouse, the measurement process ends and red arrows are drawn on the image.

Figure 44: Main UML class diagram of PhotoMeter.


Figure 45: UML activity diagram explaining the use of panels and dialogs of PhotoMeter.


Figure 46: UML activity diagram explaining mouse actions in the window.

4.6 Results

The accuracy of PhotoMeter depends strongly on a human task – taking the picture with a well-set-up camera. A small human error induces large uncertainties in the computed measurements. Besides human errors, there are also errors caused by pixel discretization.

The measurement errors due to camera tilt and height error are also a function of depth (i.e., the distance between the measured object and the camera). These dependencies are illustrated as theoretical values in Figure 47, Figure 48, and Figure 49. Figure 47 shows the acceptable horizontal camera tilt as a function of depth, such that the error in the object’s height does not exceed ten centimeters. A tilt of one degree creates an error of about ten centimeters for an object located six meters from the camera. As the distance from the camera to the object grows, the acceptable camera tilt for a ten-centimeter error drops rapidly to unachievable values. Figure 48 shows the position error as a function of distance for a one-degree tilt. For example, at a distance of twenty meters from the camera, a one-degree camera tilt causes a position error of thirty-five centimeters. As above, the error rises linearly with depth and quickly becomes unacceptable. Figure 49 plots the error in the position of P as a function of the error in camera height. This error depends on the height of the measured point; I have plotted the range of errors (dotted lines) and the mean error (red line).
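These curves are consistent with a simple first-order model in which a camera tilt shifts the measured height by roughly depth · tan(tilt); a sketch of this approximation (my own simplification, not the exact error model behind Figures 47 and 48):

#include <cmath>

// First-order approximation: a camera tilt of tiltDegrees shifts the
// measured height of a point at the given depth by about depth * tan(tilt).
double heightErrorFromTilt(double depthMeters, double tiltDegrees)
{
    const double kPi = 3.14159265358979323846;
    return depthMeters * std::tan(tiltDegrees * kPi / 180.0);
}

// heightErrorFromTilt(20.0, 1.0) ≈ 0.35 m, matching the thirty-five
// centimeters quoted above for a one-degree tilt at twenty meters.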

Figure 47: Acceptable tilt as a function of depth to ensure an accuracy of 10 cm in the object’s height.

Figure 48: Error in computed height as a function of distance for a tilt of one degree.


Figure 49: Position error as a function of inaccuracy in the estimation of the camera height.

A further source of error comes from the inaccurate selection of points on the screen. It stems from the discretization of the image and from the difficulty of aligning the cursor with the correct pixel. Even assuming that the user has selected the correct pixel, the maximal error of the position in the image is half the diagonal of a pixel. Figure 50 plots the maximal position error under correct pixel selection as a function of depth, for points near the center of the screen and points near the edge.
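One way to estimate this bound, assuming the back-projection of equation (21), is to perturb the floor-shadow pixel by half a pixel diagonal and compare the two reconstructed depths; a sketch (names hypothetical):

#include <cmath>

// Estimate of the depth error caused by mis-selecting the floor shadow F'
// by half a pixel diagonal, using p_z = d * h / |f'_y| from equation (21).
// halfDiagonal is half the pixel diagonal expressed in screen units and is
// assumed small compared to |f'_y|.
double depthErrorFromPixel(double h, double d, double fy, double halfDiagonal)
{
    double z0 = d * h / std::fabs(fy);                // nominal depth
    double z1 = d * h / std::fabs(fy - halfDiagonal); // perturbed depth
    return std::fabs(z1 - z0);
}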

Figure 50: Error in computed height as a function of distance due to pixel resolution, for points near the center of the image (red line) and points near the border (blue line).

To measure the cumulative effect of these errors, I have experimented with a variety of indoor scenes. The camera and tripod were calibrated as discussed in section 4.3. In practice, I obtain an error of less than five centimeters when measuring the dimensions of windows, doors, or furniture pieces within five meters of the camera (see Figure 51).


Figure 51: The error in the dimensions measured with PhotoMeter and shown here does not exceed 5 cm.


5 Conclusion

I have described a very simple-to-use tool for measuring vertical and horizontal distances between arbitrary 3D points from a single image, provided that the points and their vertical projections on the floor can be identified in the image. The errors resulting from the cumulative effects of camera calibration, lens distortion, camera alignment, and pixel selection increase with distance from the camera. At five meters, I usually achieve an error of less than five centimeters. I believe PhotoMeter is a viable alternative to physical measurements or to more elaborate photogrammetry techniques for a range of applications where speed and ease of use are important and where the inherent inaccuracy is acceptable. Such applications include floor and wall area estimations for interior decoration and planning the layout of office or kitchen furniture.

In the future, I will explore how higher resolution and local image-processing techniques can improve the accuracy of the manual point selection on the screen. Furthermore, the method presented here can be extended to panoramas.


Bibliography

[A1435] L. B. ALBERTI. De Pictura. 1435. Reproduced and commented by Wissenschaftliche Buchgesellschaft Darmstadt. 2000.

[Ae1435] L. B. ALBERTI. On Painting. Translation of “De Pictura” into English at http://www.noteaccess.com/Texts/Alberti/index.htm.

[Ber87] M. BERGER. Geometry II. Springer-Verlag. 1987.

[Crim01] A. CRIMINISI. Accurate Visual Metrology from Single and Multiple Uncalibrated Images. Distinguished Dissertation Series. Springer-Verlag London Ltd., September 2001.

[Crim02] A. CRIMINISI. Single-View Metrology: Algorithms and Applications. In Proc. DAGM 2002 Symposium, Zürich, Switzerland, September 2002.

[CRZ00] A. CRIMINISI, I. REID, AND A. ZISSERMAN. Single view metrology. IJCV, 40(2): pages 123-148, 2000.

[DeFa01] F. DEVERNAY AND O. D. FAUGERAS. Straight lines have to be straight: Automatic calibration and removal of distortion from scenes of structured environments. Machine Vision and Applications, 1, pages 14-24. 2001.

[DTM96] P. E. DEBEVEC, C. J. TAYLOR, AND J. MALIK. Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In Proceedings, ACM SIGGRAPH, pages 11-20, 1996.

[HAA97] Y. HORRY, K. ANJYO, AND K. ARAI. Tour into the picture: Using a spidery mesh interface to make animation from a single image. In SIGGRAPH, pages 225-232, 1997.

[ElHa01] S. F. EL-HAKIM. A flexible approach to 3D reconstruction from single images. SIGGRAPH ’01 Sketches and Applications, 2001.

[Fau93] O. D. FAUGERAS. Three-dimensional Computer Vision: a Geometric Viewpoint. MIT Press. 1993.

[FM94] F. FIGUEROA AND A. MAHAJAN. A robust method to determine the coordinates of a wave source for 3-D position sensing. ASME Journal of Dynamic Systems, Measurements and Control, 116: pages 505-511, September 1994.

[KAS01] H. KANG, S. PYO, K. ANJYO, AND S. SHIN. Tour into the picture using a Vanishing Line and its Extension to Panoramic Images. In Proceedings of EUROGRAPHICS. 2001.

[KSH98] T. KIM, Y. SEO, AND K. HONG. Physics-based 3D position analysis of a soccer ball from monocular image sequences. Proceedings of International Conference on Computer Vision, pages 721-726, 1998.

[KCS03] A. M. KUSHAL, G. CHANDA, K. SRIVASTAVA, M. GUPTA, S. SANYAL, T. V. N. SRIRAM, P. KALRA, AND S. BANERJEE. Multilevel modelling and rendering of architectural scenes. In Proceedings of EUROGRAPHICS. 2003.

[KBB02] A. M. KUSHAL, V. BANSAL, AND S. BANERJEE. A simple method for interactive 3D reconstruction and camera calibration from a single view. In Proceedings Indian Conference in Computer Vision, Graphics and Image Processing, 2002.

[LCZ99] D. LIEBOWITZ, A. CRIMINISI, AND A. ZISSERMAN. Creating architectural models from images. In Proceedings EUROGRAPHICS, pages 39-50, 1999.

[Le99] M. LEVOY. The digital Michelangelo project. In Proc. Eurographics, volume 18, September 1999.

[Maas92] H.-G. MAAS. Robust automatic surface reconstruction with structured light. In International Archives of Photogrammetry and Remote Sensing, volume XXIX of Part B5, pages 102-107. 1992.

[NUS04] H. ASLAKSEN, National University of Singapore. Perspective in Mathematics and Art. www.math.nus.edu.sg/aslaksen/projects/perspective/. 2004.

[Sam86] P. SAMUEL. Undergraduate Texts in Mathematics: Projective Geometry. Springer-Verlag. 1986.

[SeKn79] J. SEMPLE AND G. KNEEBONE. Algebraic Projective Geometry. Oxford University Press. 1979.

[SS94] G. STERN AND A. SCHINDLER. Three-dimensional visualization of bone surfaces from ultrasound scanning. Technical report, A. A. Du Pont Institute, 1994.

[Tsai87] R. Y. TSAI. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal of Robotics and Automation, RA-3(4): pages 323-344, August 1987.

[WW02] G. H. WANG, Y. H. WU, AND Z. Y. HU. A Novel Approach for Single View Based Plane Metrology. In Proc. International Conference on Pattern Recognition, vol. 2, pages 556-559, Quebec City, Canada, August 2002.


CD Contents

Written formulations:

- Thesis as PDF

Source code:

- 3D Measurement Simulation

- PhotoMeter (version 0.1)

Compilations:

- 3D Measurement Simulation

- PhotoMeter (version 1.0)