2007 IEEE 11th International Conference on Computer Vision (ICCV 2007), Rio de Janeiro, Brazil
A new Omnidirectional Stereovision Sensor
El Mustapha Mouaddib Gilles Dequen Laure Devendeville
CREA, LaRIA -FRE 2733-
University of Picardie Jules Verne,
33 rue Saint Leu, 80039 Amiens Cedex 1, France
{mouaddib, gilles.dequen, laure.devendeville}@u-picardie.fr
Abstract
This paper describes a new compact omnidirectional
stereovision sensor that combines a single orthographic
camera and four paraboloidal mirrors. Its geometry has
been designed with the help of a stochastic optimization ap-
proach in order to minimize the 3D reconstruction error. In
comparison with state-of-the-art sensors described in the
literature, better results are obtained for this sensor during
simulations. We especially compare it with a classical
two-mirror configuration, using two criteria: 3D reconstruction
accuracy and field of view. We illustrate the advantages of our
sensor in a simulation framework, using a realistic environment
and ray-tracing software.
1. Introduction
There are many methods to obtain omnidirectional stere-
ovision systems. One way is to employ a pair of rotating
cameras simultaneously [3]. This method achieves a very good
resolution, but the required rotation of the cameras prevents
handling scenes with moving objects. A second way is to use two omni-
directional catadioptric cameras [14]. This method avoids
the previous problem but it requires two cameras and two
mirrors, thus increasing the weight and size of the sensor.
It also has all the conventional stereovision drawbacks, such
as the need to calibrate both cameras, differences in optical
response between the cameras, and so on.
Another way to achieve stereovision is to exploit a single
camera that observes several mirrors. This makes it possible
to design sensors which have many advantages compared to
the systems which use several cameras. These advantages
are: single calibration, no synchronization problem, similar
optical response, wide field of view, rigid link between mir-
rors, and finally a reduced cost. Several works have dealt
with a single camera and planar mirrors [6, 7, 9, 13]. We
restrict our overview to the stereo system based on a single
camera (single lens) and convex mirrors. A stereo vision
system based on a single conventional camera (one lens)
and two specular spheres (convex mirrors) was probably
used first by [17]. In [17], the authors studied four stereo
systems with a single camera looking at mirrors. They dis-
cussed the case of all single view point systems (planar, el-
lipsoidal, hyperboloidal and paraboloidal). A stereovision
system that uses two vertically aligned mirrors with different
curvatures ("two-biconvex lobes") has been proposed
in [15, 20]. More recently, Sagawa et al. proposed a single
camera with nine spherical mirrors (which are easier to
manufacture): a principal one surrounded by eight others [19].
Usually, the design of these sensors is carried out without any
optimization process. In [4], the authors studied the behavior
of such sensors when the number (at least two), positions and
sizes of the mirrors are varied. This paper aims to describe a
new way to design a stereovision system using a single camera
and four mirrors. We will describe the criteria we used and the
optimization algorithm developed for the design. Next, we will
show how to validate this sensor and compare it with a classical
two-mirror sensor. Finally, we will discuss the accuracy and the
field of view of this sensor.
2. Criteria to design a better sensor
Most omnidirectional stereo catadioptric sensors are based on
the designs illustrated in Fig. 1, and this study builds on
them: we use several mirrors set on the same plane and facing
a camera.
2.1. Criteria and models
We wish to design a stereovision sensor using a single
camera and multiple mirrors to obtain a compact sensor.
The purpose of this sensor is to reconstruct an environment
for the navigation of mobile robots. To design it, we use the
following criteria:
978-1-4244-1631-8/07/$25.00 ©2007 IEEE
Figure 1. Stereovision systems (a-d) based on a single camera and several
mirrors
Figure 2. Paraboloidal mirrors seen by an orthographic camera through a telecentric lens
• 3D reconstruction accuracy
• Field of view
The components of sensors are: mirrors, lenses and cam-
eras. Typical sensors based on different associations of
these components have been proposed in [16].
As shown in [1], one interesting constraint is the single
viewpoint. In the case of a hyperboloidal mirror, its focal
point has to coincide with the camera's focal point. This is
easy to achieve with a single mirror, but with more than one
it becomes very difficult to manufacture. The paraboloidal
mirror, on the contrary, does not have this problem.
Moreover, it reflects rays parallel to the optical axis, so a
horizontal translation of the camera (perpendicular to the
optical axis) does not change the image. In other words, the
image is invariant with respect to the camera placement
[5, 18]. This is an interesting property, since several
mirrors can then be set with their revolution axes parallel.
In this case an orthographic projection is used, as shown in
[1], obtained with a telecentric lens and a camera. This is the
configuration we used to design our sensor (see Fig. 2).
Let P = (X, Y, Z) be a three-dimensional point and
p = (x, y, z)_i its image on mirror i. The projection center of
mirror i is at (dX, dY, dZ), as shown in Fig. 3.
The general model for the non-centered mirrors is given by (1):
Figure 3. Stereovision system with two mirrors and reconstruction
error
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix}_i = \frac{h_i}{\sqrt{\bar{X}^2+\bar{Y}^2+\bar{Z}^2}+\bar{Z}} \begin{pmatrix} \bar{X} \\ \bar{Y} \\ \bar{Z} \end{pmatrix} \qquad (1)$$

where $\bar{X} = X - dX$, $\bar{Y} = Y - dY$, $\bar{Z} = Z - dZ$, and $4h_i$ is
the latus rectum of mirror $i$ ($h_i$ being its focal length).
The orthographic projection of those points on the cam-
era is given by (2):
$$\begin{pmatrix} u \\ v \end{pmatrix}_i = \begin{pmatrix} \alpha_u & 0 \\ 0 & \alpha_v \end{pmatrix} \begin{pmatrix} x + dX \\ y + dY \end{pmatrix}_i \qquad (2)$$
where (u, v) are the image coordinates of this point and
(αu, αv) are intrinsic camera parameters. Note that the
model does not need the camera center.
To show the principle of the 3D reconstruction, we con-
sider only two mirrors (see Fig. 3).
Equations (1) and (2) remain valid even when horizontal and
vertical shifts are combined; dX_i, dY_i and dZ_i represent the
translation between the general frame and mirror i's frame
(respectively dX_j, dY_j and dZ_j for mirror j).
2.2. Reconstruction Error
Considering two images (obtained thanks to two mirrors, see
Fig. 3), the main problem is to find the real point
corresponding to its position in both images. When there is no
noise the problem is trivial. Otherwise the rays cannot meet,
and the problem becomes finding the real point, which we call
the "back-projected point". This is known as the
"triangulation" problem, which corresponds to finding the
intersection of two rays in space. Let (u′, v′)_1 and
(u′, v′)_2 be the coordinates of the reflections of P on the
image plane through the two mirrors. Because of noise,
(u′, v′)_1 and (u′, v′)_2 are not the exact values. Let
(u, v)_1 and (u, v)_2 be the correct values, close to
(u′, v′)_1 and (u′, v′)_2, whose rays meet in P (i.e.
(u′, v′)_1 = (u, v)_1 + (du, dv)_1 and
(u′, v′)_2 = (u, v)_2 + (du, dv)_2). As described in [11],
there are many methods to find the back-projected point knowing
(u′, v′)_1 and (u′, v′)_2.
Figure 4. SLS4OCS: results for 2 to 5 mirrors. Each circle is the top view of a paraboloidal mirror (i.e. its omnidirectional image), annotated with its center coordinates and radius (in cm); each panel reports the mean reconstruction error (mean errors of 40.300, 34.778, 31.595 and 28.691 cm over the four configurations).
The reconstruction error associated with P is the distance
between P and its back-projected point P′. The shorter this
distance, the greater the 3D reconstruction accuracy and the
better the configuration. For an environment E (i.e. a set of
points), the reconstruction error is the mean of the
reconstruction errors associated with each point of E.
For our experiments we chose the method that computes the
midpoint of the common perpendicular to the two rays from
(u′, v′)_1 and (u′, v′)_2 [2].
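The midpoint method of [2] admits a short closed form: minimize the squared distance between points on the two rays and average the two minimizers. A sketch (ours, with assumed names, using NumPy):

```python
import numpy as np

def midpoint_triangulation(c1, r1, c2, r2):
    """Midpoint of the common perpendicular to two 3D rays.
    c1, c2 : ray origins; r1, r2 : ray directions (not necessarily unit)."""
    c1, r1, c2, r2 = (np.asarray(v, float) for v in (c1, r1, c2, r2))
    w = c1 - c2
    # Minimize |(c1 + s*r1) - (c2 + t*r2)|^2 over s and t.
    A = np.array([[r1 @ r1, -(r1 @ r2)],
                  [r1 @ r2, -(r2 @ r2)]])
    s, t = np.linalg.solve(A, [-(r1 @ w), -(r2 @ w)])
    p1 = c1 + s * r1              # closest point on the first ray
    p2 = c2 + t * r2              # closest point on the second ray
    return 0.5 * (p1 + p2)        # the back-projected point
```

When the rays actually intersect this returns the intersection; when they are skew (the noisy case) it returns the midpoint of the shortest segment joining them.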
3. SLS4OCS: a Stochastic Local Search for
Omnidirectional Catadioptric System
As the sizes and positions of the mirrors depend on the
environment to be reconstructed, we developed an automatic
method that computes a sensor adapted to a given environment
and a preset number of mirrors. A preliminary study led us to a
method based on a genetic algorithm [8], but this first
approach quickly proved prohibitively time-consuming even for a
small number of mirrors. We finally chose to develop a
stochastic local search method that computes a solution (i.e.
the size and coordinates of each mirror) in reasonable time.
This approach provides the geometry of all the mirrors of the
sensor so as to minimize the mean reconstruction error of a
representative, randomly chosen set of points of the considered
environment.
A classical stochastic local search algorithm [12] consists in
a "walk" among all feasible solutions. A cost function guides
the walk toward areas of the search space where solutions have
a low mean reconstruction error; this technique is called
"local search". To escape from local minima and find other
"good" areas, random steps across the search space can be
taken. Our algorithm (named SLS4OCS) starts from a random
feasible solution whose reconstruction error is estimated. At
each step, it considers all feasible solutions obtained by
incrementing or decrementing a single parameter (a coordinate
or the radius) of one mirror in the current solution; this set
is called the "neighborhood". The reconstruction error of each
solution in the neighborhood is computed and the solution with
the lowest error is chosen. To escape local minima, a random
neighbor can be chosen instead of the best one with probability
p (empirically set to 0.5 in this study). A local minimum is a
solution such that every solution in its neighborhood has a
greater reconstruction error. The algorithm stops when a preset
number of iterations is reached and returns the best solution
found. Within the framework of this study we impose some
constraints: the mirrors cannot overlap each other, they cannot
overflow the sensor image, and the mirror centers lie on the
same plane (z = 0).
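Stripped of the sensor-specific parts, the loop sketched above has the following generic shape. This is our illustration, not the authors' C implementation; `neighbors` and `cost` stand for the one-parameter moves and the mean reconstruction error.

```python
import random

def sls4ocs(initial, neighbors, cost, p=0.5, max_iter=1000, seed=0):
    """Generic shape of the SLS4OCS loop: greedy descent over the
    one-parameter "neighborhood" moves, with a random neighbor
    accepted with probability p to escape local minima."""
    rng = random.Random(seed)
    best = current = initial
    for _ in range(max_iter):
        nbrs = list(neighbors(current))
        if not nbrs:
            break
        if rng.random() < p:
            current = rng.choice(nbrs)        # random step across the space
        else:
            current = min(nbrs, key=cost)     # greedy step: best neighbor
        if cost(current) < cost(best):
            best = current                    # keep the best solution found
    return best
```

With p = 0 this degenerates to pure greedy local search; the paper's p = 0.5 trades half the steps for exploration.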
SLS4OCS has been implemented in C and run on an AMD Opteron
under Linux. Within the experimental framework of this study,
the sensor is characterized by an image plane of 1000 × 1000
pixels. Before back-projecting a given point from the image
plane, we introduce an error of 1/2 pixel in a random direction
to simulate pixel noise. Each point of the set to be
reconstructed is randomly chosen in a cubic or cylindrical
environment in order to simulate a realistic environment. The
final solutions found by the SLS4OCS algorithm for 2 to 5
mirrors, using a set of points from a cubic environment with
500 cm edges, are presented in Fig. 4. Each circle of the
figure represents the top view of a paraboloidal mirror (i.e.
its omnidirectional image). For each configuration in Fig. 4,
we give the centers of the mirrors, the radii and the mean
reconstruction error. Solutions for 6 to 9 mirrors have been
computed but are not presented here.
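The half-pixel noise model above is simple to reproduce: displace each image point by a fixed magnitude in a uniformly random direction. A sketch (ours; the function name is hypothetical):

```python
import math, random

def add_pixel_noise(u, v, magnitude=0.5, rng=random):
    """Displace image point (u, v) by `magnitude` pixels in a
    uniformly random direction, simulating pixel noise."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return u + magnitude * math.cos(theta), v + magnitude * math.sin(theta)
```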
4. Validation
We used the SLS4OCS algorithm [4] as a decision-support tool.
The solutions it provided have been slightly modified in order
to obtain symmetric configurations, which are easier and
cheaper to manufacture than those shown in Fig. 4 while being
almost as accurate. Among all the results, we focus on the
two-mirror and four-mirror configurations; in the following, we
call them Two and Four, respectively [16]. The Two
configuration has its first mirror centered at (−1.48 cm,
−1.48 cm) and its second at (1.48 cm, 1.48 cm), both with
radius 1.31 cm. The Four configuration has its mirrors centered
at (−1.48 cm, 1.48 cm), (1.48 cm, 1.48 cm), (−1.48 cm,
−1.48 cm) and (1.48 cm, −1.48 cm), all with radius 1.31 cm.

Figure 5. Modeling of our Lab: (a) image from the Two sensor; (b) image from the Four sensor; (c) Lab environment.

Figure 6. Pictures of targets on walls: (a) with the Two sensor; (b) with the Four sensor.
We focus on Four because it yields the best reconstruction
error. To validate this configuration we propose:

• to compare it with the Two configuration, since the Two
configuration was used in [17];

• to generate images with PovRay^1 and compute the
reconstruction error. PovRay was chosen because it makes it
easy to obtain a precise ground truth for evaluating the
reconstruction error. We modeled our Lab environment for the
experiments, as can be seen in Fig. 5(c); the images taken by
the Two and Four configurations are presented in Fig. 5(a) and
Fig. 5(b).

In the following, all images have a size of 1000 × 1000 pixels.
4.1. Accuracy
We evaluate the reconstruction accuracy of the Two and
Four configurations, respectively. We select targets of our
^1 Persistence of Vision Ray-Tracing Software.
Figure 7. 3D reconstruction error (in cm, log scale) for the Two and Four mirror configurations. Two sensor: Avg 47, Std 113; Four sensor: Avg 6, Std 5.
Figure 8. 3D reconstruction error (in cm) averaged over 1000 images for the Two and Four mirror configurations; image points are corrupted by random noise. Two sensor: Avg 21, Std 18; Four sensor: Avg 13, Std 8.
modeled environment, whose 3D coordinates are known in the
global frame. Fig. 6 shows the omnidirectional projection of
the environment on the camera for the Two (Fig. 6(a)) and Four
(Fig. 6(b)) sensors, respectively. Walls and textures have been
removed in Fig. 6 to facilitate the matching, since matching is
not the aim of this work. To estimate the 3D reconstruction
accuracy automatically, we apply a Harris detector [10] and a
sub-pixel localization of image points to each mirror image,
and finally a cross-correlation algorithm to match
corresponding points. The triangulation algorithm (see Section
2.2) is then applied to the extracted and matched points. Each
3D point needs exactly (resp. at least) two projected points
from the mirror images for the triangulation process with the
Two sensor (resp. the Four sensor).
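Of the pipeline above, only the Harris detector [10] is compact enough to sketch here; the sub-pixel refinement and cross-correlation matching are omitted. This is the textbook formulation with a box window in place of the usual Gaussian, not the authors' implementation:

```python
import numpy as np

def harris_response(img, k=0.04, r=1):
    """Harris corner response R = det(M) - k*trace(M)^2, where M is
    the structure tensor smoothed over a (2r+1)x(2r+1) box window."""
    img = np.asarray(img, float)
    Iy, Ix = np.gradient(img)              # image gradients (rows, cols)
    def box(a):
        # crude box smoothing: sum over the (2r+1)^2 neighborhood
        p = np.pad(a, r, mode="edge")
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += p[r + dy : r + dy + a.shape[0],
                         r + dx : r + dx + a.shape[1]]
        return out
    Sxx, Syy, Sxy = box(Ix * Ix), box(Iy * Iy), box(Ix * Iy)
    det = Sxx * Syy - Sxy ** 2             # det of the structure tensor
    tr = Sxx + Syy                         # trace of the structure tensor
    return det - k * tr ** 2               # >0 at corners, <0 on edges
```

The response is positive at corners (both eigenvalues large) and negative along straight edges, which is why thresholding it isolates corner points.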
The reconstruction error corresponds to the Euclidean distance
between the real 3D point and the one reconstructed from the
images. Several tests have been carried out to compare the two
sensors.

The curve in Fig. 7 shows the reconstruction error for each
point of our environment corresponding to the selected targets
(see Fig. 6) when sub-pixel detection is used. The
reconstruction error of the Four sensor is about half that of
the Two sensor. Note that these points are well distributed in
the image, so we can consider them representative enough. In
Fig. 9, the real 3D points and the back-projected points are
displayed to show the spatial placement corresponding to the
targets in Fig. 6(b).
In the last experiment (see Fig. 8), the images obtained from
the ray tracing were corrupted by Gaussian noise and the result
is averaged over 1000 images. This is why it differs from the
mean reconstruction error of Fig. 7. The Four sensor again
exhibits greater accuracy than the other one.

Figure 9. Real 3D points and associated reconstructed points for the Four sensor.

Figure 10. Field of view of the configurations (unrolled cylindrical environment, angle 0–360°, height 0–120 cm): (a) Two sensor; (b) Four sensor.

Figure 11. Image obtained by Four with a single target that is occluded by mirror #2 (seen by only 3 mirrors).
4.2. The field of view
We define the useful field of view as the set of points of the
3D environment that are reflected by the mirrors and
consequently seen by the camera. When the sensor consists of
multiple mirrors, self-occlusions between mirrors occur. It is
then interesting to study whether increasing the number of
mirrors remains beneficial for the sensor.
Fig. 10 illustrates the influence of the number of mirrors on
the field of view. Each panel of Fig. 10 shows the set of
points of the unrolled surface of our cylindrical environment
for the Two sensor (Fig. 10(a)) and the Four sensor
(Fig. 10(b)). For each point of this environment, a gray level
indicates the number of mirrors able to see it. Thus, the white
area represents the points of the environment that can be seen,
and hence reconstructed, by all the mirrors of the sensor. The
black area (see the bottom of Fig. 10(a)) shows the points that
can be seen by only one mirror and therefore cannot be
reconstructed. Between these two bounds, the other gray levels
describe the parts of the environment where the points cannot
be seen by all the mirrors of the sensor but by at least two of
them. The darker the area, the more mirrors of the sensor are
blind to it.
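The gray levels of Fig. 10 encode a simple visibility count, which can be sketched as follows. The predicate `is_visible` is hypothetical: in the real sensor it must account for the mirror geometry and the inter-mirror occlusions.

```python
def coverage(points, mirrors, is_visible):
    """For each environment point, count how many mirrors see it.
    A point can be triangulated iff at least two mirrors see it;
    the gray levels of Fig. 10 encode exactly this count."""
    counts = {}
    for pt in points:
        counts[pt] = sum(1 for m in mirrors if is_visible(pt, m, mirrors))
    reconstructible = [pt for pt, c in counts.items() if c >= 2]
    return counts, reconstructible
```

The "blind" areas of Fig. 10(a) are precisely the points whose count drops to one for the Two sensor.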
Within this framework, Fig. 10(a) shows the field of view of
the Two sensor for a cylindrical environment. We can notice
that the geometry of this sensor causes two "blind" areas where
the points are seen by only one mirror and therefore cannot be
reconstructed. On the other hand, the field of view of the
four-mirror sensor (Fig. 10(b)) has no "blind" areas.

To explain this phenomenon, we select a target of our modeled
environment and project it to the Two and Four sensors,
respectively. Fig. 11 shows the omnidirectional image of this
projected target for the Four sensor (considering mirrors #1 to
#4) and the Two sensor (considering mirrors #2 and #3). As
mirror #3 is occluded by mirror #2, the Two sensor cannot
reconstruct the target. On the contrary, three of the four
mirrors of the Four sensor are able to back-project the target.
We are then able to estimate the associated 3D points, as can
be seen in Fig. 12.
We first note that the 3D point reconstruction accuracy depends
on the geometry of the back-projected image points across the
mirrors. When a back-projected point in one image lies on (or
nearly on) the line through the mirror centers, the
triangulation is ill-conditioned and a small image localization
error generates a large estimation error on the 3D point. The
Four sensor reduces the occurrence of this situation compared
to the Two sensor. Another advantage of the Four sensor is
that, for each 3D point of the environment, there always exist
at least two of the four mirrors that are well positioned to
back-project it. To
illustrate this property, the curves in Fig. 12 represent the
mean reconstruction error of our selected target when it is
reconstructed with the Four sensor (see Fig. 11). As this
target can be reconstructed by at most three mirrors (i.e. #1,
#2 and #4), the four curves of Fig. 12(a) (and likewise of
Fig. 12(b) and Fig. 12(c)) show the reconstruction error of the
target when the triangulation is processed from the mirror
pairs (#1, #2), (#2, #4), (#1, #4), and finally from all three
mirrors. Unlike Fig. 12(a), where the projected points are not
corrupted, Fig. 12(b) and Fig. 12(c) show the reconstruction
errors when the projected points are corrupted by at most 1/4
and 1/2 pixel, respectively. We can note in Fig. 12 that using
three mirrors often gives the best reconstruction error, but
the pair of mirrors #1 and #4 also gives good results, since
the single target lies near the perpendicular bisector of the
segment joining their centers. This is another advantage of the
Four sensor.

Figure 12. Reconstruction error (in cm) for the single target obtained with Four versus the mirrors used (1, 2 & 4; 1 & 2; 1 & 4; 2 & 4): (a) image points extracted with sub-pixel accuracy; (b) image point positions corrupted by Gaussian noise (σ = 0.25); (c) image point positions corrupted by Gaussian noise (σ = 0.5).
5. Conclusion and Future Works
We have presented a compact sensor for omnidirectional
stereovision. This sensor consists of four paraboloidal mirrors
and one orthographic camera. The new configuration was obtained
with the help of an optimization process (a stochastic local
search algorithm). We have shown that this sensor is more
accurate than those based on two mirrors, using ray-traced
images since they provide a ground truth. This advantage holds
for all the noise levels added to the pixel positions. We have
also shown that this sensor is more isotropic, as it offers a
better field of view. We have finished building the Four sensor
and are now working to validate it on a mobile robot for
navigation in an indoor environment.
Acknowledgment
We want to thank our colleagues L. Guyot and P. Vasseur
for their remarks and corrections.
References
[1] S. Baker and S. Nayar. A theory of catadioptric image for-
mation. In ICCV, pages 35–42, 1998.
[2] P. A. Beardsley, A. Zisserman, and D. W. Murray. Navigation
using affine structure from motion. In ECCV, pages 85–96,
1994.
[3] R. Benosman, T. Maniere, and J. Devars. Multidirectional
stereovision sensor, calibration and scene reconstruction. In
ICPR 96, pages 161–1665, 1996.
[4] G. Dequen, L. Devendeville, and M. Mouaddib. Stochastic
Local Search for Omnidirectional Catadioptric Stereovision
Design. In IbPRIA, volume 4478 of LNCS, pages 404–411,
2007.
[5] C. Geyer and K. Daniilidis. Catadioptric camera calibration.
In ICCV, pages 398–404, 1999.
[6] J. Gluckman and S. Nayar. Planar catadioptric stereo: Ge-
ometry and calibration. CVPR, 1:22–28, 1999.
[7] J. Gluckman and S. Nayar. Rectified catadioptric stereo sen-
sors. In CVPR, pages 224–236, 2000.
[8] D. E. Goldberg. Genetic Algorithms in Search, Optimization
and Machine Learning. Kluwer Academic, 1989.
[9] A. Goshtasby and W. Gruver. Design of a single lens stereo
camera system. Pattern Recognition, 26(6):923–937, 1993.
[10] C. Harris and M. Stephens. A combined corner and edge
detector. In AVC, pages 147–152, 1988.
[11] R. Hartley and P. Sturm. Triangulation. Computer Vision
and Image Understanding, 68(2):146–157, 1997.
[12] H. H. Hoos and T. Stützle. Stochastic Local Search:
Foundations and Applications. Elsevier, 2005.
[13] M. Inaba, T. Hara, and H. Inoue. A stereo viewer based on
a single camera with view-control mechanism. In ICIRS,
volume 2, 1993.
[14] H. Ishiguro, M. Yamamoto, and S. Tsuji. Omnidirectional
stereo. IEEE Trans. PAMI, 14(2):257–262, 1992.
[15] G. Jang, S. Kim, and I. Kweon. Single-camera panoramic
stereo system with single-viewpoint optics. Optics Letters,
31(1):41–43, 2006.
[16] E. Mouaddib, R. Sagawa, T. Echigo, and Y. Yagi. Stereo vi-
sion with a single camera and multiple mirrors. ICRA, 2005.
[17] S. Nayar. Sphereo: Determining Depth using Two Specular
Spheres and a Single Camera. In SPIE Conference on Op-
tics, Illumination, and Image Sensing for Machine Vision III,
pages 245–254, 1988.
[18] S. Nayar. Catadioptric Omnidirectional Camera. In CVPR,
pages 482–488, 1997.
[19] R. Sagawa, N. Kurita, T. Echigo, and Y. Yagi. Compound
catadioptric stereo sensor for omnidirectional object detec-
tion. In IROS, volume 2, pages 2612–2617, 2004.
[20] D. Southwell, J. Reyda, M. Fiala, and A. Basu. Panoramic
stereo. In ICPR96, 1996.