2007 IEEE 11th International Conference on Computer Vision (ICCV 2007), Rio de Janeiro, Brazil
A new Omnidirectional Stereovision Sensor
El Mustapha Mouaddib Gilles Dequen Laure Devendeville
CREA, LaRIA -FRE 2733-
University of Picardie Jules Verne,
33 rue Saint Leu, 80039 Amiens Cedex 1, France
{mouaddib, gilles.dequen, laure.devendeville}@u-picardie.fr
Abstract
This paper describes a new compact omnidirectional
stereovision sensor that combines a single orthographic
camera and four paraboloidal mirrors. Its geometry has
been designed with the help of a stochastic optimization ap-
proach in order to minimize the 3D reconstruction error. In
comparison with state-of-the-art sensors described in the
literature, better results are obtained for this sensor during
simulations. We especially compare it with a classical
two-mirror configuration, using two criteria: 3D reconstruction
accuracy and field of view. We illustrate the advantages of our
sensor in a simulation framework, using a realistic environment
and ray-tracing software.
1. Introduction
There are many methods to obtain omnidirectional stere-
ovision systems. One way is to employ a pair of rotating
cameras simultaneously [3]. This method achieves a very good
resolution, but the required rotation of the cameras prevents
handling scenes with moving objects. A second way is to use two omni-
directional catadioptric cameras [14]. This method avoids
the previous problem but it requires two cameras and two
mirrors, thus increasing the weight and size of the sensor.
It also has all the conventional stereovision drawbacks, such
as the need to calibrate both cameras, differences in optical
response between the cameras, and so on.
Another way to achieve stereovision is to exploit a single
camera that observes several mirrors. This makes it possible
to design sensors which have many advantages compared to
the systems which use several cameras. These advantages
are: single calibration, no synchronization problem, similar
optical response, wide field of view, rigid link between mir-
rors, and finally a reduced cost. Several works have dealt
with a single camera and planar mirrors [6, 7, 9, 13]. We
restrict our overview to the stereo system based on a single
camera (single lens) and convex mirrors. A stereo vision
system based on a single conventional camera (one lens)
and two specular spheres (convex mirrors) was probably
used first by [17]. In [17], the authors studied four stereo
systems with a single camera looking at mirrors. They dis-
cussed the case of all single view point systems (planar, el-
lipsoidal, hyperboloidal and paraboloidal). A stereovision
system that uses two vertically aligned mirrors with different
curvatures ("two-biconvex lobes") has been proposed
in [15, 20]. More recently, Sagawa et al. proposed a single
camera with nine spherical mirrors (which are easier to
manufacture): a principal one surrounded by eight others [19].
Usually, the design of these sensors is carried out without any
optimization process. In [4], the authors studied the behavior
of such sensors when the number (at least two), positions and
sizes of the mirrors are varied. This paper aims to describe a
new way to design a stereovision system using a single camera
and four mirrors. We will describe the criteria we used and the
optimization algorithm developed for the design. Next, we will
show how to validate this sensor and compare it with a classical
two-mirror sensor. Finally, we will discuss the accuracy and the
field of view of this sensor.
2. Criteria to design a better sensor
Most omnidirectional stereo catadioptric sensors are based on
the designs illustrated in Fig. 1, and this study builds on
them: we use several mirrors set on the same plane and facing
a camera.
2.1. Criteria and models
We wish to design a stereovision sensor using a single
camera and multiple mirrors to obtain a compact sensor.
The purpose of this sensor is to reconstruct an environment
for the navigation of mobile robots. To design it, we use the
following criteria:
978-1-4244-1631-8/07/$25.00 ©2007 IEEE
Figure 1. Stereovision systems (a-d) based on a single camera and several
mirrors
Figure 2. Paraboloidal mirrors seen by an orthographic camera through a telecentric lens
• 3D reconstruction accuracy
• Field of view
The components of sensors are: mirrors, lenses and cam-
eras. Typical sensors based on different associations of
these components have been proposed in [16].
As shown in [1], one interesting constraint is the single
viewpoint. In the case of a hyperboloidal mirror, its focal
point has to coincide with the camera's focal point. This is
easy to achieve with a single mirror, but with more than one
it becomes very difficult to manufacture. The paraboloidal
mirror, on the contrary, does not have this problem.
Moreover, it reflects rays parallel to the optical axis, so a
horizontal translation of the camera (perpendicular to the
optical axis) does not change the image. In other words, the
image is invariant with respect to the camera placement
[5, 18]. This is an interesting property, since several
mirrors can then be set with their revolution axes parallel.
In this case an orthographic projection is used, as shown in
[1], obtained with a telecentric lens and a camera. This is the
configuration we used to design our sensor (see Fig. 2).
Let P = (X, Y, Z) be a three-dimensional point and
p = (x, y, z)_i its image on mirror i. The projection center of
mirror i is at (dX, dY, dZ), as shown in Fig. 3.
The general model for the non-centered mirrors is given by (1):
Figure 3. Stereovision system with two mirrors and reconstruction
error
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix}_i = \frac{h_i}{\sqrt{\bar{X}^2+\bar{Y}^2+\bar{Z}^2}+\bar{Z}} \begin{pmatrix} \bar{X} \\ \bar{Y} \\ \bar{Z} \end{pmatrix} \qquad (1)$$

where $\bar{X} = X - dX$, $\bar{Y} = Y - dY$, $\bar{Z} = Z - dZ$, and $4h_i$ is
the latus rectum of mirror $i$ ($h_i$ being its focal length).
The orthographic projection of those points on the cam-
era is given by (2):
$$\begin{pmatrix} u \\ v \end{pmatrix}_i = \begin{pmatrix} \alpha_u & 0 \\ 0 & \alpha_v \end{pmatrix} \begin{pmatrix} x + dX \\ y + dY \end{pmatrix}_i \qquad (2)$$
where (u, v) are the image coordinates of this point and
(αu, αv) are intrinsic camera parameters. Note that the
model does not need the camera center.
To show the principle of the 3D reconstruction, we con-
sider only two mirrors (see Fig. 3).
Equations (1) and (2) remain valid even when horizontal and
vertical shifts are combined; dX_i, dY_i and dZ_i represent the
translation between the general frame and mirror i's frame
(respectively dX_j, dY_j and dZ_j for mirror j).
2.2. Reconstruction Error
Considering two images (obtained thanks to two mirrors, see
Fig. 3), the main problem is to find the real point
corresponding to its position in both images. When there is no
noise the problem is trivial. Otherwise the rays cannot meet,
and the problem becomes finding the real point, which we call
the "back-projected point". This is known as the
"triangulation" problem, which corresponds to finding the
intersection of two rays in space. Let (u′, v′)_1 and
(u′, v′)_2 be the coordinates of the reflections of P on the
image plane through the two mirrors. Because of noise,
(u′, v′)_1 and (u′, v′)_2 are not the exact values. Let
(u, v)_1 and (u, v)_2 be the correct values, close to
(u′, v′)_1 and (u′, v′)_2, whose rays meet in P (i.e.
(u′, v′)_1 = (u, v)_1 + (du, dv)_1 and
(u′, v′)_2 = (u, v)_2 + (du, dv)_2). As described in [11],
there are many methods to find the back-projected point knowing
(u′, v′)_1 and (u′, v′)_2.
Figure 4. SLS4OCS: results for 2 to 5 mirrors. Each circle is the top view of a paraboloidal mirror (i.e. its omnidirectional image), annotated with its center coordinates and radius (in cm); each panel reports the mean reconstruction error (mean errors of 40.300, 34.778, 31.595 and 28.691 cm over the four configurations).
The reconstruction error associated with P is the distance
between P and its back-projected point P′. The shorter this
distance, the greater the 3D reconstruction accuracy and the
better the configuration. For an environment E (i.e. a set of
points), the reconstruction error is the mean of the
reconstruction errors associated with each point of E.
For our experiments we chose the method that computes the
midpoint of the common perpendicular to the two rays from
(u′, v′)_1 and (u′, v′)_2 [2].
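The midpoint method of [2] admits a short closed form: minimize the squared distance between points on the two rays and average the two minimizers. A sketch (ours, with assumed names, using NumPy):

```python
import numpy as np

def midpoint_triangulation(c1, r1, c2, r2):
    """Midpoint of the common perpendicular to two 3D rays.
    c1, c2 : ray origins; r1, r2 : ray directions (not necessarily unit)."""
    c1, r1, c2, r2 = (np.asarray(v, float) for v in (c1, r1, c2, r2))
    w = c1 - c2
    # Minimize |(c1 + s*r1) - (c2 + t*r2)|^2 over s and t.
    A = np.array([[r1 @ r1, -(r1 @ r2)],
                  [r1 @ r2, -(r2 @ r2)]])
    s, t = np.linalg.solve(A, [-(r1 @ w), -(r2 @ w)])
    p1 = c1 + s * r1              # closest point on the first ray
    p2 = c2 + t * r2              # closest point on the second ray
    return 0.5 * (p1 + p2)        # the back-projected point
```

When the rays actually intersect this returns the intersection; when they are skew (the noisy case) it returns the midpoint of the shortest segment joining them.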
3. SLS4OCS: a Stochastic Local Search for
Omnidirectional Catadioptric System
As the sizes and positions of the mirrors depend on the
environment to be reconstructed, we developed an automatic
method that computes a sensor adapted to a given environment
and a preset number of mirrors. A preliminary study led us to a
method based on a genetic algorithm [8], but this first
approach quickly proved prohibitively time-consuming even for a
small number of mirrors. We finally chose to develop a
stochastic local search method that computes a solution (i.e.
the size and coordinates of each mirror) in reasonable time.
This approach provides the geometry of all the mirrors of the
sensor so as to minimize the mean reconstruction error of a
representative, randomly chosen set of points of the considered
environment.
A classical stochastic local search algorithm [12] consists in
a "walk" among all feasible solutions. A cost function guides
the walk toward areas of the search space where solutions have
a low mean reconstruction error; this technique is called
"local search". To escape from local minima and find other
"good" areas, random steps across the search space can be
taken. Our algorithm (named SLS4OCS) starts from a random
feasible solution whose reconstruction error is estimated. At
each step, it considers all feasible solutions obtained by
incrementing or decrementing a single parameter (a coordinate
or the radius) of one mirror in the current solution; this set
is called the "neighborhood". The reconstruction error of each
solution in the neighborhood is computed and the solution with
the lowest error is chosen. To escape local minima, a random
neighbor can be chosen instead of the best one with probability
p (empirically set to 0.5 in this study). A local minimum is a
solution such that every solution in its neighborhood has a
greater reconstruction error. The algorithm stops when a preset
number of iterations is reached and returns the best solution
found. Within the framework of this study we impose some
constraints: the mirrors cannot overlap each other, they cannot
overflow the sensor image, and the mirror centers lie on the
same plane (z = 0).
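Stripped of the sensor-specific parts, the loop sketched above has the following generic shape. This is our illustration, not the authors' C implementation; `neighbors` and `cost` stand for the one-parameter moves and the mean reconstruction error.

```python
import random

def sls4ocs(initial, neighbors, cost, p=0.5, max_iter=1000, seed=0):
    """Generic shape of the SLS4OCS loop: greedy descent over the
    one-parameter "neighborhood" moves, with a random neighbor
    accepted with probability p to escape local minima."""
    rng = random.Random(seed)
    best = current = initial
    for _ in range(max_iter):
        nbrs = list(neighbors(current))
        if not nbrs:
            break
        if rng.random() < p:
            current = rng.choice(nbrs)        # random step across the space
        else:
            current = min(nbrs, key=cost)     # greedy step: best neighbor
        if cost(current) < cost(best):
            best = current                    # keep the best solution found
    return best
```

With p = 0 this degenerates to pure greedy local search; the paper's p = 0.5 trades half the steps for exploration.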
SLS4OCS has been implemented in C and run on an AMD Opteron
under Linux. Within the experimental framework of this study,
the sensor is characterized by an image plane of 1000 × 1000
pixels. Before back-projecting a given point from the image
plane, we introduce an error of 1/2 pixel in a random direction
to simulate pixel noise. Each point of the set to be
reconstructed is randomly chosen in a cubic or cylindrical
environment in order to simulate a realistic environment. The
final solutions found by the SLS4OCS algorithm for 2 to 5
mirrors, using a set of points from a cubic environment with
500 cm edges, are presented in Fig. 4. Each circle of the
figure represents the top view of a paraboloidal mirror (i.e.
its omnidirectional image). For each configuration in Fig. 4,
we give the centers of the mirrors, the radii and the mean
reconstruction error. Solutions for 6 to 9 mirrors have been
computed but are not presented here.
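The half-pixel noise model above is simple to reproduce: displace each image point by a fixed magnitude in a uniformly random direction. A sketch (ours; the function name is hypothetical):

```python
import math, random

def add_pixel_noise(u, v, magnitude=0.5, rng=random):
    """Displace image point (u, v) by `magnitude` pixels in a
    uniformly random direction, simulating pixel noise."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return u + magnitude * math.cos(theta), v + magnitude * math.sin(theta)
```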
4. Validation
We used the SLS4OCS algorithm [4] as a decision-support tool.
The solutions it provided have been slightly modified in order
to obtain symmetric configurations, which are easier and
cheaper to manufacture than those shown in Fig. 4 while being
almost as accurate. Among all the results, we focus on the
two-mirror and four-mirror configurations; in the following, we
call them Two and Four, respectively [16]. The Two
configuration has its first mirror centered at (−1.48 cm,
−1.48 cm) and its second at (1.48 cm, 1.48 cm), both with
radius 1.31 cm. The Four configuration has its mirrors centered
at (−1.48 cm, 1.48 cm), (1.48 cm, 1.48 cm), (−1.48 cm,
−1.48 cm) and (1.48 cm, −1.48 cm), all with radius 1.31 cm.

Figure 5. Modeling of our Lab: (a) image from the Two sensor; (b) image from the Four sensor; (c) Lab environment.

Figure 6. Pictures of targets on walls: (a) with the Two sensor; (b) with the Four sensor.
We focus on Four because it yields the best reconstruction
error. To validate this configuration we propose:

• to compare it with the Two configuration, since the Two
configuration was used in [17];

• to generate images with PovRay^1 and compute the
reconstruction error. PovRay was chosen because it makes it
easy to obtain a precise ground truth for evaluating the
reconstruction error. We modeled our Lab environment for the
experiments, as can be seen in Fig. 5(c); the images taken by
the Two and Four configurations are presented in Fig. 5(a) and
Fig. 5(b).

In the following, all images have a size of 1000 × 1000 pixels.
4.1. Accuracy
We evaluate the reconstruction accuracy of the Two and
Four configurations, respectively. We select targets of our
^1 Persistence of Vision Ray-Tracing Software.
Figure 7. 3D reconstruction error (in cm, log scale) for the Two and Four mirror configurations. Two sensor: Avg 47, Std 113; Four sensor: Avg 6, Std 5.
Figure 8. 3D reconstruction error (in cm) averaged over 1000 images for the Two and Four mirror configurations; image points are corrupted by random noise. Two sensor: Avg 21, Std 18; Four sensor: Avg 13, Std 8.
modeled environment, whose 3D coordinates are known in the
global frame. Fig. 6 shows the omnidirectional projection of
the environment on the camera for the Two (Fig. 6(a)) and Four
(Fig. 6(b)) sensors, respectively. Walls and textures have been
removed in Fig. 6 to facilitate the matching, since matching is
not the aim of this work. To estimate the 3D reconstruction
accuracy automatically, we apply a Harris detector [10] and a
sub-pixel localization of image points to each mirror image,
and finally a cross-correlation algorithm to match
corresponding points. The triangulation algorithm (see Section
2.2) is then applied to the extracted and matched points. Each
3D point needs exactly (resp. at least) two projected points
from the mirror images for the triangulation process with the
Two sensor (resp. the Four sensor).
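Of the pipeline above, only the Harris detector [10] is compact enough to sketch here; the sub-pixel refinement and cross-correlation matching are omitted. This is the textbook formulation with a box window in place of the usual Gaussian, not the authors' implementation:

```python
import numpy as np

def harris_response(img, k=0.04, r=1):
    """Harris corner response R = det(M) - k*trace(M)^2, where M is
    the structure tensor smoothed over a (2r+1)x(2r+1) box window."""
    img = np.asarray(img, float)
    Iy, Ix = np.gradient(img)              # image gradients (rows, cols)
    def box(a):
        # crude box smoothing: sum over the (2r+1)^2 neighborhood
        p = np.pad(a, r, mode="edge")
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += p[r + dy : r + dy + a.shape[0],
                         r + dx : r + dx + a.shape[1]]
        return out
    Sxx, Syy, Sxy = box(Ix * Ix), box(Iy * Iy), box(Ix * Iy)
    det = Sxx * Syy - Sxy ** 2             # det of the structure tensor
    tr = Sxx + Syy                         # trace of the structure tensor
    return det - k * tr ** 2               # >0 at corners, <0 on edges
```

The response is positive at corners (both eigenvalues large) and negative along straight edges, which is why thresholding it isolates corner points.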
The reconstruction error corresponds to the Euclidean distance
between the real 3D point and the one reconstructed from the
images. Several tests have been carried out to compare the two
sensors.

The curve in Fig. 7 shows the reconstruction error for each
point of our environment corresponding to the selected targets
(see Fig. 6) when sub-pixel detection is used. The
reconstruction error of the Four sensor is about half that of
the Two sensor. Note that these points are well distributed in
the image, so we can consider them representative enough. In
Fig. 9, the real 3D points and the back-projected points are
displayed to show the spatial placement corresponding to the
targets in Fig. 6(b).
In the last experiment (see Fig. 8), the images obtained from
the ray tracing were corrupted by Gaussian noise and the result
is averaged over 1000 images. This is why it differs from the
mean reconstruction error of Fig. 7. The Four sensor again
exhibits greater accuracy than the other one.

Figure 9. Real 3D points and associated reconstructed points for the Four sensor.

Figure 10. Field of view of the configurations (unrolled cylindrical environment, angle 0–360°, height 0–120 cm): (a) Two sensor; (b) Four sensor.

Figure 11. Image obtained by Four with a single target that is occluded by mirror #2 (seen by only 3 mirrors).
4.2. The field of view
We define the useful field of view as the set of points of the
3D environment that are reflected by the mirrors and
consequently seen by the camera. When the sensor consists of
multiple mirrors, self-occlusions between mirrors occur. It is
then interesting to study whether increasing the number of
mirrors remains beneficial for the sensor.
Fig. 10 illustrates the influence of the number of mirrors on
the field of view. Each panel of Fig. 10 shows the set of
points of the unrolled surface of our cylindrical environment
for the Two sensor (Fig. 10(a)) and the Four sensor
(Fig. 10(b)). For each point of this environment, a gray level
indicates the number of mirrors able to see it. Thus, the white
area represents the points of the environment that can be seen,
and hence reconstructed, by all the mirrors of the sensor. The
black area (see the bottom of Fig. 10(a)) shows the points that
can be seen by only one mirror and therefore cannot be
reconstructed. Between these two bounds, the other gray levels
describe the parts of the environment where the points cannot
be seen by all the mirrors of the sensor but by at least two of
them. The darker the area, the more mirrors of the sensor are
blind to it.
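The gray levels of Fig. 10 encode a simple visibility count, which can be sketched as follows. The predicate `is_visible` is hypothetical: in the real sensor it must account for the mirror geometry and the inter-mirror occlusions.

```python
def coverage(points, mirrors, is_visible):
    """For each environment point, count how many mirrors see it.
    A point can be triangulated iff at least two mirrors see it;
    the gray levels of Fig. 10 encode exactly this count."""
    counts = {}
    for pt in points:
        counts[pt] = sum(1 for m in mirrors if is_visible(pt, m, mirrors))
    reconstructible = [pt for pt, c in counts.items() if c >= 2]
    return counts, reconstructible
```

The "blind" areas of Fig. 10(a) are precisely the points whose count drops to one for the Two sensor.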
Within this framework, Fig. 10(a) shows the field of view of
the Two sensor for a cylindrical environment. We can notice
that the geometry of this sensor causes two "blind" areas where
the points are seen by only one mirror and therefore cannot be
reconstructed. On the other hand, the field of view of the
four-mirror sensor (Fig. 10(b)) has no "blind" areas.

To explain this phenomenon, we select a target of our modeled
environment and project it to the Two and Four sensors,
respectively. Fig. 11 shows the omnidirectional image of this
projected target for the Four sensor (considering mirrors #1 to
#4) and the Two sensor (considering mirrors #2 and #3). As
mirror #3 is occluded by mirror #2, the Two sensor cannot
reconstruct the target. On the contrary, three of the four
mirrors of the Four sensor are able to back-project the target.
We are then able to estimate the associated 3D points, as can
be seen in Fig. 12.
We first note that the 3D point reconstruction accuracy depends
on the geometry of the back-projected image points across the
mirrors. When a back-projected point in one image lies on (or
nearly on) the line through the mirror centers, the
triangulation is ill-conditioned and a small image localization
error generates a large estimation error on the 3D point. The
Four sensor reduces the occurrence of this situation compared
to the Two sensor. Another advantage of the Four sensor is
that, for each 3D point of the environment, there always exist
at least two of the four mirrors that are well positioned to
back-project it. To
illustrate this property, the curves in Fig. 12 represent the
mean reconstruction error of our selected target when it is
reconstructed with the Four sensor (see Fig. 11). As this
target can be reconstructed by at most three mirrors (i.e. #1,
#2 and #4), the four curves of Fig. 12(a) (and likewise of
Fig. 12(b) and Fig. 12(c)) show the reconstruction error of the
target when the triangulation is processed from the mirror
pairs (#1, #2), (#2, #4), (#1, #4), and finally from all three
mirrors. Unlike Fig. 12(a), where the projected points are not
corrupted, Fig. 12(b) and Fig. 12(c) show the reconstruction
errors when the projected points are corrupted by at most 1/4
and 1/2 pixel, respectively. We can note in Fig. 12 that using
three mirrors often gives the best reconstruction error, but
the pair of mirrors #1 and #4 also gives good results, since
the single target lies near the perpendicular bisector of the
segment joining their centers. This is another advantage of the
Four sensor.

Figure 12. Reconstruction error (in cm) for the single target obtained with Four versus the mirrors used (1, 2 & 4; 1 & 2; 1 & 4; 2 & 4): (a) image points extracted with sub-pixel accuracy; (b) image point positions corrupted by Gaussian noise (σ = 0.25); (c) image point positions corrupted by Gaussian noise (σ = 0.5).
5. Conclusion and Future Works
We have presented a compact sensor for omnidirectional
stereovision. This sensor consists of four paraboloidal mirrors
and one orthographic camera. The new configuration was obtained
with the help of an optimization process (a stochastic local
search algorithm). We have shown that this sensor is more
accurate than those based on two mirrors, using ray-traced
images since they provide a ground truth. This advantage holds
for all the noise levels added to the pixel positions. We have
also shown that this sensor is more isotropic, as it offers a
better field of view. We have finished building the Four sensor
and are now working to validate it on a mobile robot for
navigation in an indoor environment.
Acknowledgment
We want to thank our colleagues L. Guyot and P. Vasseur
for their remarks and corrections.
References
[1] S. Baker and S. Nayar. A theory of catadioptric image for-
mation. In ICCV, pages 35–42, 1998.
[2] P. A. Beardsley, A. Zisserman, and D. W. Murray. Navigation
using affine structure from motion. In ECCV, pages 85–96,
1994.
[3] R. Benosman, T. Maniere, and J. Devars. Multidirectional
stereovision sensor, calibration and scene reconstruction. In
ICPR 96, pages 161–1665, 1996.
[4] G. Dequen, L. Devendeville, and M. Mouaddib. Stochastic
Local Search for Omnidirectional Catadioptric Stereovision
Design. In IbPRIA, volume 4478 of LNCS, pages 404–411,
2007.
[5] C. Geyer and K. Daniilidis. Catadioptric camera calibration.
In ICCV, pages 398–404, 1999.
[6] J. Gluckman and S. Nayar. Planar catadioptric stereo: Ge-
ometry and calibration. CVPR, 1:22–28, 1999.
[7] J. Gluckman and S. Nayar. Rectified catadioptric stereo sen-
sors. In CVPR, pages 224–236, 2000.
[8] D. E. Goldberg. Genetic Algorithms in Search, Optimization
and Machine Learning. Kluwer Academic, 1989.
[9] A. Goshtasby and W. Gruver. Design of a single lens stereo
camera system. Pattern Recognition, 26(6):923–937, 1993.
[10] C. Harris and M. Stephens. A combined corner and edge
detector. In AVC, pages 147–152, 1988.
[11] R. Hartley and P. Sturm. Triangulation. Computer Vision
and Image Understanding, 68(2):146–157, 1997.
[12] H. H. Hoos and T. Stützle. Stochastic Local Search:
Foundations and Applications. Elsevier, 2005.
[13] M. Inaba, T. Hara, and H. Inoue. A stereo viewer based on
a single camera with view-control mechanism. In ICIRS,
volume 2, 1993.
[14] H. Ishiguro, M. Yamamoto, and S. Tsuji. Omnidirectional
stereo. IEEE Trans. PAMI, 14(2):257–262, 1992.
[15] G. Jang, S. Kim, and I. Kweon. Single-camera panoramic
stereo system with single-viewpoint optics. Optics Letters,
31(1):41–43, 2006.
[16] E. Mouaddib, R. Sagawa, T. Echigo, and Y. Yagi. Stereo vi-
sion with a single camera and multiple mirrors. ICRA, 2005.
[17] S. Nayar. Sphereo: Determining Depth using Two Specular
Spheres and a Single Camera. In SPIE Conference on Op-
tics, Illumination, and Image Sensing for Machine Vision III,
pages 245–254, 1988.
[18] S. Nayar. Catadioptric Omnidirectional Camera. In CVPR,
pages 482–488, 1997.
[19] R. Sagawa, N. Kurita, T. Echigo, and Y. Yagi. Compound
catadioptric stereo sensor for omnidirectional object detec-
tion. In IROS, volume 2, pages 2612–2617, 2004.
[20] D. Southwell, J. Reyda, M. Fiala, and A. Basu. Panoramic
stereo. In ICPR96, 1996.