Real Time Distance Determination for an
Automobile Environment using Inverse
Perspective Mapping in OpenCV
Shane Tuohy
B.E in Electronic and Computer Engineering
Supervisor – Dr. Martin Glavin
Co Supervisor – Dr. Fearghal Morgan
24 March 2010
Abstract
This project aims to develop a real time distance determination algorithm for use in an automobile
environment. Increasingly, modern cars are being fitted with image sensors, which can be used to obtain
large amounts of information about the surrounding area.
From a single front facing image, it is difficult to determine distances to objects in front of the vehicle
with any degree of certainty. There is a non-linear relationship between the height of an object in a front
facing image, and its distance from the camera.
This project aims to use Inverse Perspective Mapping to overcome this problem. Using Inverse
Perspective Mapping, we can transform the front facing image to a top down bird’s eye view, in which
there is a linear relationship between distances in the image and in the real world.
The aim of the project is to implement the algorithm in the C language using the OpenCV libraries.
Working in this way provides a high performance, low overhead system that can be implemented and
run on a low power embedded device in an automobile environment.
Acknowledgements
I would like to acknowledge the help and support I received throughout the project from my project
supervisor Dr. Martin Glavin and postgraduate researchers in the CAR lab, NUIG.
In particular I would like to thank Diarmaid O'Cualain for his constant support and patience.
This project would not have been possible without debugging help, discussion and encouragement
received from my fellow 4th year EE/ECE classmates.
Finally, I’d like to thank my parents for their continued support over the last 4 years.
Declaration of Originality
I declare that this thesis is my original work except where stated.
Date: ___________________________________
Signature: ___________________________________
Contents
Abstract ..................................................................................................................................................... ii
Acknowledgements .................................................................................................................................. iii
Declaration of Originality ............................................................................................................................. iv
Table of Figures ........................................................................................................................................... vii
1 Glossary ................................................................................................................................................. 1
2 System Overview................................................................................................................................... 2
3 Background Technologies ..................................................................................................................... 6
3.1 Computer Vision ........................................................................................................................... 6
3.2 OpenCV ......................................................................................................................................... 6
3.2.1 Useful OpenCV Functions ..................................................................................................... 7
3.2.2 cvSetImageROI ...................................................................................................................... 8
3.2.3 cvWarpPerspective ............................................................................................................... 8
3.2.4 Drawing Functions ................................................................................................................ 8
3.3 Inverse Perspective Mapping ........................................................................................................ 9
4 Project Structure ................................................................................................................................. 12
4.1 Overall System Flowchart ........................................................................................................... 13
4.2 First Frame Operations Flowchart .............................................................................................. 14
4.3 Process Specifics ......................................................................................................................... 15
4.3.1 Capture Video Frame .......................................................................................................... 15
4.3.2 First Frame Operations ....................................................................................................... 15
4.3.3 Threshold Image ................................................................................................................. 16
4.3.4 Warp Perspective ................................................................................................................ 18
4.3.5 Distance Determination ...................................................................................................... 20
4.3.6 Provide Graphical Overlay................................................................................................... 21
4.3.7 Processing Real Time Images (from camera) ...................................................................... 22
5 Optimisation and Testing .................................................................................................................... 23
5.1 Generate lookup array ................................................................................................................ 23
5.2 Sampling Rate ............................................................................................................................. 24
5.3 Finding Threshold Range ............................................................................................................. 25
5.4 Using Performance Primitives ..................................................................................................... 27
5.5 Level to Trigger Detection ........................................................................................................... 28
5.6 Memory Management ................................................................................................................ 29
5.7 Calibration ................................................................................................................................... 30
6 Results ................................................................................................................................................. 32
6.1 Selection of Sampling Rate ......................................................................................................... 32
6.2 Performance of Algorithms ......................................................................................................... 34
7 Further Work ....................................................................................................................................... 37
7.1.1 Processing Time .................................................................................................................. 37
7.1.2 Environmental Conditions................................................................................................... 37
7.1.3 Tracking ............................................................................................................................... 38
7.1.4 Embedded Implementation ................................................................................................ 38
8 Conclusion ........................................................................................................................................... 39
9 References .......................................................................................................................................... 40
10 Appendix A - On the CD .................................................................................................................. 41
Table of Figures
Figure 1 - Overview of proposed system ...................................................................................................... 4
Figure 2 - Images illustrating differences between vertical distance on camera image, and real world
distance ......................................................................................................................................................... 5
Figure 3 - Inverse Perspective Mapped image with distance indicated by white arrow. Vertical distance
now corresponds linearly to real distance .................................................................................................... 5
Figure 4 – Illustration of camera position and coordinate systems in use ................................................... 9
Figure 5 - Inverse Perspective Mapped view of road scene. Distorted image of car ahead can be seen in
red box at top of image. .............................................................................................................................. 11
Figure 6 - Overall System Flowchart ........................................................................................................... 13
Figure 7 - First Frame Operations Flowchart .............................................................................................. 14
Figure 8 - Original image before thresholding to remove road pixels ........................................................ 17
Figure 9 - Thresholded image of same scene as figure above with road pixels removed .......................... 17
Figure 10 - Thresholded image from figure above with road removed and non road objects highlighted 18
Figure 11 – Sample road scene image before perspective is warped using transformation matrix .......... 19
Figure 12 – Previous figure after perspective has been warped using transformation matrix .................. 19
Figure 13 – Transformed image of sample road scene, ready for object detection. ................................. 20
Figure 14 – Source image with overlay of rectangular box and distance value ......................................... 22
Figure 15 - Illustrates mapping of points in source top down image to points in front facing image ........ 24
Figure 16 - Original Image - No Thresholding Applied ................................................................................ 26
Figure 17 - Small threshold value applied to scene (left); large threshold value applied to scene (right) .................. 26
Figure 18 - Thresholding with range of ±35 ................................................................................................ 27
Figure 19 - Example of distance detection performed with small trigger value ........................................ 28
Figure 20 - Graph illustrating memory use for several video samples ....................................................... 30
Figure 21 – Example of ‘known square’ method of calibration .................................................................. 31
Figure 22 - Samples of successful distance determination ......................................................................... 32
Figure 23 - Plot of computation times for different sampling rates ........................................................... 33
Figure 24 - Measured computation time - comparison between sampling rate of 1 and 10 ..................... 34
Figure 25 - Comparison of computation times in seconds ......................................................................... 35
Figure 26 - Graph of processing time for each of 3 major constituents of algorithm ................................ 36
1 Glossary
IPM – Inverse Perspective Mapping
OpenCV – Open Computer Vision
Thresholding – Process by which pixels above or below a certain intensity are removed from an image
C – Low level, compiled programming language
gcc – Open source C compiler
ROI – Region of Interest
2 System Overview
In 2005, 396 people, more than one per day, were killed in road traffic accidents [1]. For this reason,
collision avoidance and prevention systems are of clear value to car safety, which is a primary concern
for all car manufacturers. In recent years, ABS, stability control, airbags, ESP and similar systems have
become standard on many car models.
Using computer vision techniques and optical cameras, safety systems can be vastly improved. Cars in
the near future will be able to intelligently analyze their environment and react accordingly to improve
driver safety.
Computer vision is fundamentally the process by which we can allow machines to ‘see’ the world and
react to it. Its importance cannot be overstated in fields such as manufacturing, surveillance and
environment detection. Using the techniques of computer vision, we can create powerful and helpful
real world applications which incorporate real world conditions.
An increasingly common application of computer vision systems is in the field of safety. Machines can be
programmed to detect and respond to dangerous conditions automatically, based on the interpretation
of the world around them. Computer vision can be used to provide accurate, useful information to
machine operators or users. One such machine, where computer vision can be leveraged to provide
useful, potentially lifesaving information, is the car.
Current systems on the market from manufacturers such as Mercedes [2] pre-charge brakes and tighten
slack on seatbelts if an imminent collision is detected.
It is becoming increasingly common for modern automobiles to be fitted with camera systems to aid in
driver awareness and safety. Systems such as those found in the Opel Insignia are becoming more and
more popular; the Insignia uses a front mounted camera to detect road signs and monitor lane
departures, providing increased levels of information to drivers.
Distance determination in an automobile environment is understandably a worthwhile undertaking.
With an effective distance determination algorithm, steps can be taken to alert drivers to potential
hazards and unsafe driving. Distance data from a system similar to the one proposed could be applied to
an adaptive cruise control system, which senses upcoming obstacles and adjusts the speed of the
vehicle accordingly. In fact, combined with lane detection algorithms, it is entirely possible to envision a
car that could, in theory, drive itself.
Currently available systems on the market from manufacturers such as Audi, Mercedes Benz, and Nissan
use RADAR or LIDAR sensors to implement collision detection. These work well when the RADAR
signals reflect from a metal object; they do not, however, detect pedestrians or animals on the road.
These systems are also expensive to implement, and are therefore a sub-optimal solution.
Current research into collision detection focuses on forward facing cameras, which provide more
information about a scene and are cheap and reliable.
The proposed system consists of a single front facing video camera mounted on a vehicle capturing
video images at 30 frames per second. This setup distinguishes the system from similar systems which
use either a multi camera setup or, alternatively, active devices such as RADAR or LIDAR. A single
camera system is more reliable and simpler than any of these methods.
A dual (or more) camera setup, as employed by Toyota in some Lexus models, provides more data to
process and, therefore, more accurate results. However, it also carries severe processing and
configuration overheads, which render it unsuitable for use in low power, low resource, embedded
devices typically found in automobiles. It is also a much more expensive system to implement, for
obvious reasons, than a single camera system.
Active systems such as RADAR or LIDAR require signals to be reflected from targets; this leaves them
susceptible to interference, possibly from other identical systems approaching them. Mounting these
active systems to cars can be difficult and, most importantly, carries a significant financial expense.
Often, a front facing optical camera fitted to a car can have several uses; the same camera can be used
for lane detection and road sign detection as well as distance determination, providing a comprehensive
safety package using a single camera.
Previously, computer vision algorithms were much too computationally heavy to implement in real time
on low power devices. Devices such as the Intel Atom processor, which are very capable, yet consume
very little power, can make implementation of these types of algorithm in real time a reality.
In conclusion, the passive nature, flexibility and simplicity of a single camera setup make it well suited
to implementation in an automobile environment. The proposed system is capable of providing life
saving information to drivers.
Figure 1 - Overview of proposed system
The two figures below illustrate the problem explicitly. As the vehicle in front approaches the camera,
vertical distance in the image, as indicated by the white lines, does not vary in a linear fashion.
The third image has been transformed to a top down view, in which distance on the image is linearly
related to distance in the real world.
Figure 2 - Images illustrating differences between vertical distance on camera image, and real world distance
Figure 3 - Inverse Perspective Mapped image with distance indicated by white arrow. Vertical distance now corresponds
linearly to real distance
3 Background Technologies
This chapter illustrates the technology background of the project. It provides a brief overview of the
field of computer vision and the technologies being used to implement the system.
3.1 Computer Vision
Computer vision is a rapidly growing field. Modern, more powerful, microprocessors are increasingly
able to handle the computationally expensive overhead of working with images. This opens up exciting
possibilities for innovative computer vision implementations in everyday life. A recently popular
example is the field of augmented reality, which uses computer vision techniques on a portable device
to ‘understand’ a scene and overlay useful information. It is often found on mobile phones, which until
recently would not have been capable of handling the processing overhead of such a system.
One area in which computer vision is an integral part is the DARPA Urban Challenge1 in which
students attempt to create a vehicle that will drive itself. Vehicles are required to merge into two way
traffic and carry out other complex driving manoeuvres autonomously. Without advanced computer
vision algorithms, these challenges would be impossible.
3.2 OpenCV2
OpenCV stands for “Open Computer Vision”. OpenCV is a library of functions for image processing.
Originally developed by Intel in 1999, it is now an Open Source project released under the BSD license.
The functions themselves are mostly written in C. The purpose of OpenCV is to provide open, optimized
and robust routines for standard image processing techniques, supporting the advancement of
computer vision research. There are many commonly used techniques in image processing for computer
vision applications, and OpenCV provides implementations of many of them, allowing for rapid
algorithm development. While implementing algorithms, the programmer doesn’t have to continually
“reinvent the wheel”.
Although OpenCV itself is an open source project, Intel provides a product named “Integrated
Performance Primitives”, a commercial package of highly optimized routines which OpenCV can use in
place of its own routines to speed up computation.
1 http://www.darpa.mil/grandchallenge/index.asp
2 http://opencv.willowgarage.com/wiki/
For this particular project, OpenCV greatly accelerated development by providing routines to threshold
images, generate homography matrices and sample regions of images.
Although wrappers for the OpenCV libraries have been developed for high level languages such as C#
and Python, code for this system was written solely in C, and compiled using the standard gcc compiler.
Using a lower level language like C increases performance, leading to a real time, or close to real time
implementation.
3.2.1 Useful OpenCV Functions
3.2.1.1 cvThreshold
Performing a threshold of an image is a fundamental image processing technique. It involves examining
the intensity values of each pixel in an image, and performing a particular operation depending on this
value.
OpenCV provides extensive thresholding options via the cvThreshold function. Several different types of
thresholding are available;
• CV_THRESH_BINARY
• CV_THRESH_BINARY_INV
• CV_THRESH_TRUNC
• CV_THRESH_TOZERO_INV
• CV_THRESH_TOZERO
Most pertinent to this project are CV_THRESH_TOZERO, CV_THRESH_TOZERO_INV and
CV_THRESH_BINARY.
CV_THRESH_TOZERO – If a pixel is below the threshold value, it is given an intensity value of 0.
Otherwise it is not affected.
CV_THRESH_TOZERO_INV – The opposite of above, if a pixel is above the threshold value, it is given an
intensity of 0. Otherwise it is not affected.
CV_THRESH_BINARY – As the name suggests, depending on which side of the threshold value the pixel
intensity value lies on, it is assigned a value of 0, or 255.
3.2.2 cvSetImageROI
cvSetImageROI allows the programmer to perform operations on specified areas of an image. This
allows functions such as thresholding, averaging and smoothing to be applied to one part of an image
while leaving the rest of the image unchanged. This function has proven very useful throughout the
project.
3.2.3 cvWarpPerspective
cvWarpPerspective, as the name suggests, allows the perspective of an image to be warped based on a
transformation matrix. This has been the most important function provided by OpenCV for this project.
It allows points to be mapped from one perspective to another through the use of a single function,
greatly reducing programmer overhead.
3.2.4 Drawing Functions
OpenCV provides numerous drawing functions which allow feedback to be given to the user easily by
overlaying shapes and text onto images.
3.3 Inverse Perspective Mapping
A front facing image from a car is useful for many applications; it is, however, useless for distance
determination. Vehicles that are far away from the camera appear high in the image, and appear
progressively lower as they approach the camera.
The problem is that this does not occur in a linear fashion; therefore, there is no simple way to discern
distance information based on the position of a vehicle in a front facing image.
To overcome this issue, we use Inverse Perspective Mapping [3][4]. Inverse Perspective Mapping (IPM)
uses a 3 by 3 homography matrix to translate points from the image plane to those found in the real
world.
Using this homography matrix, we can transform our image so that we are looking directly at the true
road plane. We can use this to measure the distance between objects in the image with a degree of
certainty, since the relative position of the objects will change in a linear fashion.
The figure below illustrates the different coordinate systems that are employed, and their relation to
one another.
Figure 4 – Illustration of camera position and coordinate systems in use
$W(x, y, z) \in W$ is a point in the world space $W$. $I(u, v) \in I$ is a point in the image plane $I$.
$S(x, y, 0) \in S$ is the subset of points of $W$ belonging to the road surface $S$.
As can be seen from the figure above, mapping a point from I(u,v) involves a rotation about the angle θ
and a translation along the line of sight of the camera.
The matrix mathematics are shown below in Equation 1 [5]
Equation 1 – Matrix Mathematics for transforming point from image coordinates to world coordinates
$$[u, v, 1]^T \propto K\,R\,T\,[x, y, z, 1]^T$$
where $K$ is the camera matrix, $R$ the rotation through the angle $\theta$ and $T$ the translation along the camera's line of sight.
Simplifying results in Equation 2:
Equation 2 - Illustrating transformation matrix
Transformation Matrix
$$\begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} \propto \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$
We use homogeneous coordinate systems as they allow us to represent points at infinity, e.g. vanishing
points. They also allow us to incorporate division, as well as constants, into linear equations. [6]
Equation 3 - Demonstration of benefits of homogenous coordinates
A point at infinity in the direction $(x, y, z)$ is represented as $(x, y, z, 0)$; a general homogeneous point $W' = (x, y, z, w)$ corresponds to the real point $W = (x/w,\; y/w,\; z/w)$.
Using the Transformation Matrix above, we can map any point in the plane I(u,v) to its corresponding
point in the real world plane W(X, Y, Z) with Z = 0. The transformation results in images in which any
point with a Z coordinate greater than zero is incorrectly mapped, resulting in distortion of the image, as
can be seen in the image below. For our purposes, an image similar to the one in the figure below is
perfectly sufficient to determine distances.
Figure 5 - Inverse Perspective Mapped view of road scene. Distorted image of car ahead can be seen in red box at top of
image.
4 Project Structure
This chapter describes the structure of work carried out during the course of the project and the
development of the system. It describes an overall flow for the system before explaining in detail the
logic and implementation behind each step in the system.
This project was split into 5 distinct goals to measure progress. Those goals were;
Goal 1
• Commission the OpenCV system to load frames of video into memory.
• Sample the image pixels of the road directly in front of the vehicle.
• Threshold based on these sample pixels to remove the road surface from the image. The only
remaining pixels are those of the sky, the edges of the road (such as trees and buildings), road
markings and other vehicles.
• Generate the IPM characteristic matrix which is developed from the height of the camera from
the ground, and the angle at which the camera is mounted (w.r.t. the ground plane).
Goal 2
• Transform the image from the original input view to the IPM transformed view. This is done by
applying the IPM matrix developed in the previous milestone.
Goal 3
• Determine the distance to the vehicle in front, by looking for the first pixels of “non-road”
directly in front of the vehicle. Once the position is determined, it is then possible to calculate the
distance.
Goal 4
• Display this information to the driver by overlaying graphics on the original image to clearly
indicate the distance to the vehicle in front.
Goal 5
• Modify the system to run in real time with a video stream. If possible, this could be achieved in
real-time using a video camera and a laptop (or other available embedded signal processing
hardware).
4.1 Overall System Flowchart
The following flowchart illustrates general overall system operation.
Figure 6 - Overall System Flowchart
4.2 First Frame Operations Flowchart
The following flowchart illustrates operations carried out on receipt of first frame of video.
Figure 7 - First Frame Operations Flowchart
4.3 Process Specifics
4.3.1 Capture Video Frame
Capturing of video frames is done using OpenCV’s file capture or camera capture functions. Using these
functions we create a CvCapture object which we can query for frames from either a video file or a
camera connected to the laptop.
4.3.2 First Frame Operations
This section of the system consists of several operations, which need only be carried out once. The data
generated through these operations can then be used over and over for each frame of video that is
sampled. Many of the methods used in this project are computationally heavy to do on the fly, so as
much work as possible is done at the beginning of the program. This way the stored values can be used
repeatedly for each frame, saving on computation time.
4.3.2.1 Capture source points
Firstly, 4 source points are captured from the user. These points are stored in an array, and used for the
next step, generation of the transformation matrix.
This is done using a mouse handler to return the position of user selected points in the image.
4.3.2.2 Generate transformation matrix
The transformation matrix is the key to the Inverse Perspective Mapping algorithm. Helpfully, OpenCV
provides a function to generate this matrix without needing to manually carry out the mathematical
operations listed in the Inverse Perspective Mapping section above.
The function cvGetPerspectiveTransform takes an array of points from the source image and
generates the transformation matrix that maps them to an array of points in the destination (top down)
image. The simplified matrix mathematics is illustrated in the equation below.
Destination Points = Transformation Matrix × Source Points
The source points that are chosen by the user by clicking points in the image, map to a square in the
destination image. If we lay a square shaped object on the road in front of the camera, we can use the
corners of the square to generate the appropriate transformation matrix for the current environmental
conditions.
Importantly, applying this operation in the other direction gives us the inverse transformation matrix.
This allows us to map points from the destination image back to the source image.
4.3.3 Threshold Image
In order to detect a vehicle on the road in front of us, we need to be able to discern what part of an
image is the road, and what part is a vehicle.
A solution to this problem is to threshold the image to remove road pixels. That way, anything in front of
the car with an RGB intensity value greater than zero is an object in front of the car.
The major difficulty that this presents is, given a certain image, how can one detect what is a road pixel
and what is not? Roads vary in shade depending on the time of day, weather etc. To obtain a value for
the particular scene we are working on, we can sample the value of pixels directly in front of the vehicle.
We take a small patch of road slightly in front of the car and obtain an average value for the pixels
across that patch. This gives a good value for the RGB characteristics of the road surface.
This process is somewhat rough, and could be improved by instead taking the median value for all pixels,
thus eliminating noise values generated by the system. Adaptive thresholding, that is, using different
thresholding values for different areas of the image, could be implemented to improve the process
further.
We threshold based on these values. Thresholding works as follows;
• Split image into its constituent channels (R, G, B)
• Use the built in OpenCV threshold modes CV_THRESH_TOZERO and CV_THRESH_TOZERO_INV to
remove pixels above and below the threshold range
• This leaves us with images containing just the road surface in the R, G and B planes
• We subtract the original R, G and B images from the images containing just the road surface and
merge the result
• Finally we perform a binary threshold to leave any non road values with a high value to aid in
distance determination
An example of the thresholding algorithm at work can be seen in the following three figures. Fig. 8
shows a sample source image prior to application of the algorithm. Fig. 9 shows the result of
removal of road pixels from the image. Finally, Fig. 10 shows how non road objects are highlighted
to allow for easier detection.
Figure 8 - Original image before thresholding to remove road pixels
Figure 9 - Thresholded image of same scene as figure above with road pixels removed
Figure 10 - Thresholded image from figure above with road removed and non road objects highlighted
As can be seen from the example above, the thresholding algorithm is very effective. Most of the road
surface has been removed, leaving the object in front clearly detectable due to its bright colouring.
4.3.4 Warp Perspective
As part of the first frame operations, we generated the transformation matrix based on 4 source points
in the image mapped to 4 destination points in a transformed image. Now we need to transform the
video frame based on this transformation matrix.
OpenCV provides a function to transform an image based on a 3x3 homography matrix. This is the
matrix we generated earlier. Application of this function to an image results in a transformation similar
to the one shown in the figures below.
Warping the perspective of an image involves considerable computational overhead. It is therefore
pertinent to use the operation as sparingly as possible.
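The mapping that cvWarpPerspective applies to every pixel can be sketched in plain C. The division by the homogeneous coordinate w is the perspective (non-linear) step, and performing it per pixel is what makes the warp expensive. The matrix values used in testing this sketch are illustrative.

```c
#include <math.h>

/* Sketch of the per-pixel mapping behind a perspective warp: a 3x3
   homography H maps (x, y) to (x', y') in homogeneous coordinates.
   cvWarpPerspective performs this (plus interpolation) for every
   pixel of the image. */
static void apply_homography(const double H[3][3],
                             double x, double y,
                             double *xo, double *yo)
{
    double u = H[0][0]*x + H[0][1]*y + H[0][2];
    double v = H[1][0]*x + H[1][1]*y + H[1][2];
    double w = H[2][0]*x + H[2][1]*y + H[2][2];
    *xo = u / w;   /* divide by w: the perspective step */
    *yo = v / w;
}
```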
Figure 11 – Sample road scene image before perspective is warped using transformation matrix
Figure 12 – Previous figure after perspective has been warped using transformation matrix
Note – Image samples above are not thresholded as they would be during real operation of the system.
They are given to illustrate clearly the effects of the perspective transform.
4.3.5 Distance Determination
Now that we have warped the image to a top down view, we can measure the distance to the vehicle in
front linearly. This consists of several steps.
Figure 13 – Transformed image of sample road scene, ready for object detection.
We know the coordinates of the front of our car and so, we loop vertically upwards through the image,
working on a small rectangle in the area in front of the car. For each small rectangle, we average the
pixel values across the rectangle. We know that all road pixels are zero, thanks to the threshold applied
earlier, so we increment the position of the rectangle until the average function returns a non-zero
value. This value is stored in a global variable that is accessible from the main body of code and
corresponds to the distance between our vehicle and the object directly in front.
Now we have the position of the object directly in front of the car in its transformed image coordinates.
We transform this point back to the original coordinate system in order to overlay feedback to the user.
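The upward scan described above can be sketched in plain C on a single-channel, row-major top-down image. The rectangle geometry and image size here are illustrative; the real system operates on the thresholded OpenCV image, in which road pixels are already zero.

```c
/* Sketch of the upward scan: starting at the row corresponding to
   the front of our car, move a small sampling rectangle up through
   the top-down image until its pixel sum is non-zero. Returns the
   distance in pixels, or -1 if no object is found. Geometry values
   are illustrative. */
static int scan_for_object(const unsigned char *img, int width,
                           int car_row, int rect_x, int rect_w,
                           int rect_h)
{
    for (int row = car_row - rect_h; row >= 0; row -= rect_h) {
        long sum = 0;
        for (int y = row; y < row + rect_h; y++)
            for (int x = rect_x; x < rect_x + rect_w; x++)
                sum += img[y * width + x];
        if (sum > 0)                  /* road pixels are zero */
            return car_row - row;     /* distance in pixels */
    }
    return -1;
}
```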
4.3.6 Provide Graphical Overlay
Now that we have a value for the position of the object in coordinates relevant to the original image,
we can overlay graphics and text for display to the user.
The value that we have calculated for distance is in pixels. It is the distance in pixels from the base of the
image (front of our vehicle) to the next object in front of our vehicle. This value will change in a linear
fashion as the real world distance changes, e.g. a value of 700 pixels will equate to twice the distance
350 pixels equates to. It is this linearity that is the strength of the Inverse Perspective Mapping
algorithm.
In order to display an accurate value for distance, we need to know the ‘scaling factor’: the value in pixels that corresponds to one meter. This value can be calibrated for a particular camera configuration by placing a meter stick in front of the car, flat on the road surface, and measuring the number of pixels that this corresponds to in the top-down view. This value will stay the same as long as the characteristics of the camera (its height from the ground, focal length, etc.) remain the same. For the purposes of testing
using videos from several different sources, an approximate scaling factor was chosen based on analysis
of several sample top down images. The value was calibrated based on the fact that the standard for
Irish lane markings is approximately 1.5m.
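The conversion from the measured pixel distance to meters is then a single division by the scaling factor. The figure of 100 pixels per meter below is an assumed illustrative value, not the calibration actually used in testing.

```c
#include <math.h>

/* Sketch of the pixel-to-meter conversion. SCALE_PX_PER_M is an
   assumed calibration value: the number of pixels spanning one
   meter in the top-down view. */
#define SCALE_PX_PER_M 100.0

static double pixels_to_meters(int distance_px)
{
    return distance_px / SCALE_PX_PER_M;
}
```

Because the top-down view is linear, a pixel distance of 700 converts to exactly twice the real-world distance of 350 pixels, which is the property Inverse Perspective Mapping provides.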
Graphical information provided to the user is in the form of the distance figure being displayed in the
upper left quadrant of the image, typically where very little activity takes place.
A rectangular box is also drawn around the detected area to verify that the correct object has been
detected. The figure below shows a sample overlay of information to the user.
Figure 14 – Source image with overlay of rectangular box and distance value
4.3.7 Processing Real Time Images (from camera)
While most design, testing and processing was carried out on pre-recorded video, a further important goal had to be realised: for the algorithm to function in a real time system, it had to work not only on pre-recorded video, but also on live real time data from a camera.
This involved implementation of a mechanism by which the program would read frames from a camera
rather than a file. If the system is invoked with one command line argument, it attempts to load the
video at the path specified by the argument. If the system is started without a command line argument,
the program will attempt to query frames from a camera attached to the system.
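The source-selection logic can be sketched as follows. The OpenCV capture calls appear only as comments, since this sketch illustrates just the argument-based dispatch.

```c
/* Sketch of video-source selection: one command line argument means
   a video file path, none means a live camera. */
typedef enum { SOURCE_FILE, SOURCE_CAMERA } source_t;

static source_t select_source(int argc)
{
    if (argc > 1)
        return SOURCE_FILE;   /* e.g. capture = cvCaptureFromFile(argv[1]); */
    return SOURCE_CAMERA;     /* e.g. capture = cvCaptureFromCAM(0); */
}
```

In either case the rest of the algorithm is unchanged: frames are queried one at a time from the capture structure regardless of their origin.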
5 Optimisation and Testing
Computer vision is inherently computationally heavy. Algorithms which employ many computer vision
techniques can take prohibitively long to compute for real time video. This section explores some ways
in which the system was optimised for maximum accuracy with minimum computation overhead.
5.1 Generate lookup array
One way of optimising an algorithm is to generate a lookup table. In order to translate a single point
from the top down, IPM view, back to the source image we must perform a perspective transform on
this point. If we are running the algorithm on a real time system, this overhead may be unacceptable for
real time operation. As was explored in the section on Inverse Perspective Mapping, in order to
transform an image from the IPM coordinate system to the image coordinate system, we must perform
a non-trivial matrix multiplication.
The solution is to use the inverse transformation matrix to map all vertical points back to their
equivalent point in the source image in one single operation, and store the values in an array. This way
we can map a point to its equivalent vertical co-ordinate by simply referencing the lookup array at that
point.
So, when we need to map a point to the original image from the transformed view, we simply need to
reference a value in a pre computed array, instead of performing the actual transformation.
The figure below illustrates how points are not linearly mapped back to the original image. Using the lookup table, we can map any point's vertical coordinate in the top-down image to its equivalent in the front facing image. This allows us to map the point where the object is detected back to the original image with ease.
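Building the lookup array might be sketched in plain C as follows: the inverse homography is applied once per row at start-up, and only the resulting vertical coordinate is stored. The array size and reference column used here are illustrative.

```c
/* Sketch of the vertical-coordinate lookup array: for every row y of
   the top-down image, precompute the corresponding row in the
   front-facing image by applying the inverse homography H_inv once.
   At run time, mapping a detected point back is a single array read.
   IPM_HEIGHT and the reference column are illustrative. */
#define IPM_HEIGHT 480

static int lookup_y[IPM_HEIGHT];

static void build_lookup(const double H_inv[3][3], double x_ref)
{
    for (int y = 0; y < IPM_HEIGHT; y++) {
        double u = H_inv[0][0]*x_ref + H_inv[0][1]*y + H_inv[0][2];
        double v = H_inv[1][0]*x_ref + H_inv[1][1]*y + H_inv[1][2];
        double w = H_inv[2][0]*x_ref + H_inv[2][1]*y + H_inv[2][2];
        (void)u;                           /* only the row is needed */
        lookup_y[y] = (int)(v / w + 0.5);  /* round to nearest row  */
    }
}
```

The full perspective transform is thus paid once at start-up rather than on every frame.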
Figure 15 - Illustrates mapping of points in source top down image to points in front facing image
5.2 Sampling Rate
Image processing operations are computationally heavy. A video file contains, in general, 30 frames per second, so performing all operations on each frame equates to 30 sets of operations per second. This is much too frequent for our needs, so instead we perform the operations once every x frames.
This project is targeted to function in an automobile environment using a front mounted camera. Most of the time, the rate of change of distance between our vehicle and the object in front of us is relatively small; usually cars on the same road are travelling at roughly similar speeds.
Sampling and computing distance information every 33 milliseconds provides very little extra
information. Over 33ms the difference in distance between our vehicle and the vehicle in front of us will
be negligible. For a car travelling 5km/hr faster than another car, the rate of change of distance between
the two cars is 1.38m/s. In 33ms, the difference in distance is 0.046m. This value is small enough as to be imperceptible to a user. Therefore we can alter the sampling rate to calculate distance less frequently, saving resources with little to no visible difference during operation.
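A worked version of the figures above, in plain C:

```c
#include <math.h>

/* Relative (closing) speed in m/s from a speed difference in km/h. */
static double closing_speed_ms(double speed_kmh)
{
    return speed_kmh / 3.6;   /* 1 km/h = 1/3.6 m/s */
}

/* Change in distance over one frame interval, in meters. */
static double distance_change_m(double speed_kmh, double interval_s)
{
    return closing_speed_ms(speed_kmh) * interval_s;
}
```

For a 5 km/h speed difference this gives roughly 1.39 m/s, and over a 33 ms frame about 0.046 m, matching the figures quoted above.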
5.3 Finding Threshold Range
In order to remove the road surface from the input image, we must know something about the
characteristics of the road we are travelling on. This involves sampling an area of the road to determine
appropriate RGB values for the colour of the road surface.
This is done by sampling a small box in front of the car and extracting average colour data from this
area. A larger sample area gives us a more reliable and accurate reading of the road surface, but there is a trade-off in computation time and levels of noise in the image.
The larger the sampling box used, the more likely that road markings or other objects will be included in
the averaging process. This results in a less accurate thresholding value, which when used in
thresholding, decreases reliability and performance.
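The sampling step might be sketched in plain C on one channel as follows; in the real system the average is taken per R, G and B plane, and the box geometry here is illustrative.

```c
/* Sketch of road-colour sampling: average the pixels of a small box
   directly in front of the car on a single channel, stored row-major.
   Box coordinates and size are illustrative. */
static int sample_road_value(const unsigned char *img, int width,
                             int box_x, int box_y, int box_w, int box_h)
{
    long sum = 0;
    for (int y = box_y; y < box_y + box_h; y++)
        for (int x = box_x; x < box_x + box_w; x++)
            sum += img[y * width + x];
    return (int)(sum / (box_w * box_h));
}
```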
Choosing an appropriate range above and below the threshold value is very important. Enough allowance must be given for all road pixels to be removed, without removing too much of other objects.
If the value is too small, not all of the road surface will be removed, leading to detection of the road as
an object. If the value is too large, too much of the destination object will be removed, leading to
difficulty in distance determination. An example of incorrect values in use can be seen below.
Figure 16 - Original Image - No Thresholding Applied
Figure 17 - Small threshold value applied to scene (left); large threshold value applied to scene (right)
As can be seen from the image above on the left, in which a range of ±1 around the threshold values was applied, we need a relatively large range to accurately remove the entire road surface. The image on the right applied a threshold range of ±100, which is clearly too large, as we have removed much of the vehicle in front along with the road.
A range of ±35 gives satisfactory results, which can be seen below.
Figure 18 - Thresholding with range of ±35
In the figure above some of the detail in the bottom part of the vehicle in front of ours has been
removed along with the road. This is not a problem as we will now apply a binary threshold to highlight
any values that are above zero.
5.4 Using Performance Primitives
Performance is a major concern in this project. OpenCV provides somewhat optimised routines for image processing, but more highly optimised implementations of some core functions are available.
These are available in a commercial package called “Intel Performance Primitives” [7]. When deploying
on an Intel processor, OpenCV is able to take advantage of these performance routines to greatly
accelerate the execution of code. Since an intended target platform for this system is the Intel Atom [8]
processor, use of these performance primitives could greatly accelerate execution of the algorithm.
This package is a commercial product, and for cost reasons its use was not explored during the course of
the project.
5.5 Level to Trigger Detection
While carrying out distance detection, we loop vertically through the frame, averaging a small rectangle directly in front of the vehicle.
The frame that we are scanning has been thresholded, but that does not definitely mean that all road
pixels have successfully been removed. We cannot simply check for the first non-zero pixel value in the image, as there may be noise or artifacts left in the image.
Testing needed to be done to select a value to ‘trigger’ object detection, which would only be reached
by an actual object, and filter out any noise. The figure below shows the effect of noise on distance
detection.
Figure 19 - Example of distance detection performed with small trigger value
Using a very low value, as shown above, results in very small levels of noise triggering object detection.
Conversely, using a very high value leads to no object detection at all. An appropriate value was found
through measuring average values across the detection rectangle and discerning a threshold value from
these measurements.
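The trigger test might be sketched as follows; the trigger level of 40 is an assumed illustrative value, not the one found during testing.

```c
/* Sketch of the detection trigger: the detection rectangle's average
   must exceed a trigger level before it counts as an object, so that
   isolated noise pixels are ignored. TRIGGER_LEVEL is an assumed
   illustrative value. */
#define TRIGGER_LEVEL 40

static int rect_triggers(const unsigned char *rect, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += rect[i];
    return (sum / n) > TRIGGER_LEVEL;   /* 1 = object, 0 = noise */
}
```

A single bright noise pixel in an otherwise dark rectangle fails the test, while a solid patch of a detected vehicle passes it.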
5.6 Memory Management
Given that the majority of the project was completed using the C programming language, which does not include automatic memory management, it was of paramount importance to deallocate any memory that was allocated. Memory is not in ample supply in embedded systems and therefore must be strictly monitored.
During early testing of the finished algorithm, memory use became a big concern. While processing a 30
second video, system memory use peaked at over 1GB. Clearly this was unacceptable.
There are several ways to ensure that memory used by an OpenCV program is kept in check. Primary among these is vigilance: when allocating memory structures, one must be careful to deallocate them when finished. The following table lists common memory allocation functions and their equivalent deallocation functions.
Allocation function — Deallocation function
cvCreateImage — cvReleaseImage
cvCreateImageHeader — cvReleaseImageHeader
cvCreateMat — cvReleaseMat
cvCreateMatND — cvReleaseMatND
cvCreateData — cvReleaseData
cvCreateSparseMat — cvReleaseSparseMat
cvCreateMemStorage — cvReleaseMemStorage
cvCreateGraphScanner — cvReleaseGraphScanner
cvOpenFileStorage — cvReleaseFileStorage
cvAlloc — cvFree
Once these rules were observed strictly, memory usage declined drastically. Below is a graph illustrating measurements of memory use for several sample videos after memory use was reduced.
Figure 20 - Graph illustrating memory use for several video samples
5.7 Calibration
In order to obtain accurate distances from the system, calibration for a particular environment needs to
be carried out. There are several ways in which this can be done.
When the system is installed in a vehicle, the position and angle of the camera will not change, which
means that instead of the approximate method employed for testing in the current implementation, we
can employ a more accurate method of calibration, which will give more accurate distance values.
One such method of calibration is to calibrate the camera by placing a square object of known size in
front of the camera on the road plane. Using simple mouse clicks, the transformation matrix for that
environment can be obtained. Due to the wide variety of samples from different environments being
tested as part of this project, this is the method that has been implemented. This method provides a
somewhat rough value, but is satisfactory for testing purposes.
The figure below shows a scene in which a rectangular shape has been overlain; by clicking the corners of this rectangle, the transformation matrix for this environment can be found.
Figure 21 – Example of ‘known square’ method of calibration
A second method that can be employed to calibrate the camera and generate the transformation matrix is through the use of the camera's intrinsic and extrinsic characteristics: its focal length, height above the ground, and viewing angle can be used. Exploration of this method of calibration is outside the scope of this project.
Finally, it is possible to perform automatic calibration of the camera using a checkerboard patterned
square placed in front of the car. This is the preferred option, providing simple and reliable calibration.
Firstly, the system detects the checkerboard pattern of known size [11]. From this information the
transformation matrix can be generated. This technique has not been implemented as part of this
system.
6 Results
This section illustrates results obtained by the system as implemented. Insight is given into situations in
which the algorithm is effective and where it can be improved.
Naturally, the easiest way to evaluate the effectiveness of a system is to test it under a variety of different conditions. It was found that the algorithm functioned as expected for almost all sample videos. Below are screenshots of several videos in which the algorithm functioned as expected.
Figure 22 - Samples of successful distance determination
Areas in which the algorithm did not perform as expected will be explored in the “Further Work”
chapter.
6.1 Selection of Sampling Rate
Computer vision algorithms are, as a general rule, very resource intensive. Since the goal for this algorithm is to implement it in real time on a video stream, performance is a very important concern.
A standard camera captures frames at a rate of 30 frames per second. Each frame is displayed for 33
milliseconds. It is not feasible to run the entire algorithm on each frame, 30 times per second.
To improve performance and lessen the load on the system, we run the algorithm less than 30 times per
second, every x frames.
Below is a chart of computation times for the algorithm on a 386 frame video stream, at different
sampling rates.
Figure 23 - Plot of computation times for different sampling rates
A sampling rate of 10 frames was chosen as a good trade-off between accuracy and updating frequency.
Figure 24 - Measured computation time - comparison between sampling rate of 1 and 10
6.2 Performance of Algorithms
Analysis was carried out on the time taken for each step of the system, as described in the overall flowchart; the results of this analysis can be seen below. First, measurements were taken of the total time to perform each of the 3 major operations that must be repeatedly carried out, namely:
• Thresholding image to remove road surface,
• Warping perspective to create top down view,
• Distance determination.
The chart below shows a comparison on the time taken by each step to process all frames of a 386
frame video.
Figure 25 - Comparison of computation times in seconds
Carrying out all operations on each frame was found to take a total of 0.053 seconds, or 53 milliseconds. This figure is not inherently very useful on its own, as it is relative to the processor on which the test is carried out and is greatly affected by other processes running on the system. Testing the algorithm on a dedicated microprocessor would give more quantitative benchmarks. What can be inferred from the measurements, however, is the percentage of total time taken up by each part of the system.
To generate these values, the algorithm was modified to run only one of the 3 major operations listed above. Timing was then carried out using the built-in Linux command ‘time’ [9], which measures real time as well as user and kernel time taken to execute programs. The results were corroborated with a second timing method, using the built-in clock functionality of the C language [10].
Below is an illustration of the percentage of total execution time taken up by each of the 3 major parts
of the algorithm.
Figure 26 - Graph of processing time for each of 3 major constituents of algorithm
As can be seen from the figure above, as predicted, warping perspective of the image in order to
generate the top down view of the scene is by far the most time consuming part of the algorithm.
The second most computationally expensive operation is thresholding of the image, again as expected.
Thresholding an image works on the whole image, altering each pixel based on a rule. In this system, this
is done several times, resulting in significant processing time.
7 Further Work
While the system is very successful in determining distances and detecting objects in front of a vehicle, it
stands to be improved in several areas.
7.1.1 Processing Time
Currently each processed frame requires, on average, 0.05 seconds of processor time. This figure can be improved upon in a number of ways:
• Reduce the number of channels in the image to be transformed from three to one.
This will reduce the computation required for the perspective transform, and should provide a drastic increase in performance.
• When thresholding the image, only threshold the portion required by the algorithm.
Currently, thresholding is applied to the whole image; this is not required, as some parts of the image, e.g. the horizon and the bonnet of the car, are irrelevant. Cropping these areas out will increase efficiency.
• Change the thresholding algorithm to use less memory.
In the current system, several extra data structures are allocated and deallocated as part of the thresholding operation. This slows down computation and increases the amount of memory used. A more efficient algorithm using fewer resources would improve overall processing time.
• Implement a tracking algorithm.
The sample rate could be further reduced from 3 times per second with the help of a tracking algorithm.
7.1.2 Environmental Conditions
The algorithm in its current form is quite susceptible to changes in environment, e.g., going from bright
areas to dim areas. This aspect of the system could be improved using adaptive thresholding.
Secondly, the system detects road markings in the middle of the road as objects, which interferes with distance detection. The system could be improved to intelligently filter out these markings and improve the reliability of the algorithm.
7.1.3 Tracking
Implementing tracking as part of the system would greatly improve the algorithm in several ways. By the
nature of the environment where the system operates, there is little change in the location of the
detected object from one frame to another. A tracking algorithm could assist in situations where the
algorithm has lost the object or has been compromised by noise conditions on the road.
7.1.4 Embedded Implementation
It is very much hoped that the system will be ported to an embedded processor in the near future where
it can be properly tested and benchmarked for use in an actual vehicle. Manufacturer specific high
performance C libraries such as the Intel Performance Primitives could be employed to greatly increase
performance.
8 Conclusion
As can be seen from the successful implementation of this algorithm in the C language, a real time
distance determination system using OpenCV is clearly achievable. The system as it stands is functional
and complete. Refinements are needed before the system can be deployed with confidence to an actual
embedded device, but indications are positive that this will be possible.
OpenCV has proven a powerful and lightweight computer vision framework and greatly assisted in the
development of the project.
A real time, single camera, passive distance determination algorithm as implemented here could have a
positive effect on road safety and avoidance of road collisions. The use of a single optical camera, which
can have many purposes in a single installation, makes it an attractive proposition for car manufacturers
due to its low cost and simple configuration.
This system offers benefits over similar active systems in terms of both cost and functionality, in that its
object detection is not solely limited to metal, reflective objects.
For ‘normal’ road conditions the algorithm was found to function very well, providing useful information to the user. This information could then be integrated into the vehicle's operation in several ways: by alerting a user of imminent danger; by alerting a user that they are not maintaining a safe following distance in relation to the car in front; and by performing pre-crash safety procedures if a collision is impending.
All of these benefits combine to make a vehicle which implements this system a safer one which ought
to lead to fewer road accidents and fewer injuries or fatalities.
9 References
1. Road Safety Authority – Road Collision Facts 2005
(http://www.rsa.ie/publication/publication/upload/2005%20Road%20Collision%20Facts.pdf)
2. Mercedes Pre Safe (http://www2.mercedes-
benz.co.uk/content/unitedkingdom/mpc/mpc_unitedkingdom_website/en/home_mpc/passengercars/ho
me/new_cars/models/cls-class/c219/overview/safety.html)
3. Maud, Hussain, Samad et al. 2004. Implementation of Inverse Perspective Mapping Algorithm For The
Development Of An Automatic Lane Tracking System
4. Mallot et al. 1991. Inverse perspective mapping simplifies optical flow computation and obstacle detection
5. D. O Cualain, C. H. 2009. Lane Departure Detection Using Subtractive Clustering in the Hough Domain.
6. Paul Smith, NUIG Guest Lecture. Applications of Linear Algebra: Computer Vision in Sports
7. Intel Performance Primitives (http://software.intel.com/en-us/intel-ipp/)
8. Intel Atom processor (http://www.intel.com/technology/atom/)
9. ‘time’ command (http://linux.about.com/library/cmd/blcmdl1_time.htm)
10. Timing in C (http://beige.ucs.indiana.edu/B673/node104.html)
11. Learning OpenCV: Computer Vision with the OpenCV Library. Gary Bradski, Adrian Kaehler. 2008. O'Reilly Media.
10 Appendix A - On the CD
Included on the submitted CD is the entirety of the Subversion repository of code developed throughout the course of the project.
The code is split into various folders with snippets to carry out different parts of the algorithm.
The final implementation, which incorporates many of the separate parts can be found in the ‘Final
Implementation’ folder. Some sample images and videos are included for testing purposes.