Real Time Distance Determination for an
Automobile Environment using Inverse
Perspective Mapping in OpenCV
Shane Tuohy
B.E in Electronic and Computer Engineering
Supervisor – Dr. Martin Glavin
Co Supervisor – Dr. Fearghal Morgan
24 March 2010
Abstract
This project aims to develop a real time distance determination algorithm for use in an automobile
environment. Increasingly, modern cars are being fitted with image sensors, which can be used to obtain
large amounts of information about the surrounding area.
From a single front facing image, it is difficult to determine distances to objects in front of the vehicle
with any degree of certainty. There is a non-linear relationship between the height of an object in a front
facing image, and its distance from the camera.
This project aims to use Inverse Perspective Mapping to overcome this problem. Using Inverse
Perspective Mapping, we can transform the front facing image to a top down bird’s eye view, in which
there is a linear relationship between distances in the image and in the real world.
The aim of the project is to implement the algorithm in the C language using the OpenCV libraries.
Working in this way provides a high performance, low overhead system that can be implemented and
run on a low power embedded device in an automobile environment.
Acknowledgements
I would like to acknowledge the help and support I received throughout the project from my project
supervisor Dr. Martin Glavin and postgraduate researchers in the CAR lab, NUIG.
In particular I would like to thank Diarmaid O'Cualain for his constant support and patience.
This project would not have been possible without debugging help, discussion and encouragement
received from my fellow 4th year EE/ECE classmates.
Finally, I’d like to thank my parents for their continued support over the last 4 years.
Declaration of Originality
I declare that this thesis is my original work except where stated.
Date: ___________________________________
Signature: ___________________________________
Contents
Abstract ..................................................................................................................................................... ii
Acknowledgements .................................................................................................................................. iii
Declaration of Originality ............................................................................................................................. iv
Table of Figures ........................................................................................................................................... vii
1 Glossary ................................................................................................................................................. 1
2 System Overview................................................................................................................................... 2
3 Background Technologies ..................................................................................................................... 6
3.1 Computer Vision ........................................................................................................................... 6
3.2 OpenCV ......................................................................................................................................... 6
3.2.1 Useful OpenCV Functions ..................................................................................................... 7
3.2.2 cvSetImageROI ...................................................................................................................... 8
3.2.3 cvWarpPerspective ............................................................................................................... 8
3.2.4 Drawing Functions ................................................................................................................ 8
3.3 Inverse Perspective Mapping ........................................................................................................ 9
4 Project Structure ................................................................................................................................. 12
4.1 Overall System Flowchart ........................................................................................................... 13
4.2 First Frame Operations Flowchart .............................................................................................. 14
4.3 Process Specifics ......................................................................................................................... 15
4.3.1 Capture Video Frame .......................................................................................................... 15
4.3.2 First Frame Operations ....................................................................................................... 15
4.3.3 Threshold Image ................................................................................................................. 16
4.3.4 Warp Perspective ................................................................................................................ 18
4.3.5 Distance Determination ...................................................................................................... 20
4.3.6 Provide Graphical Overlay................................................................................................... 21
4.3.7 Processing Real Time Images (from camera) ...................................................................... 22
5 Optimisation and Testing .................................................................................................................... 23
5.1 Generate lookup array ................................................................................................................ 23
5.2 Sampling Rate ............................................................................................................................. 24
5.3 Finding Threshold Range ............................................................................................................. 25
5.4 Using Performance Primitives ..................................................................................................... 27
5.5 Level to Trigger Detection ........................................................................................................... 28
5.6 Memory Management ................................................................................................................ 29
5.7 Calibration ................................................................................................................................... 30
6 Results ................................................................................................................................................. 32
6.1 Selection of Sampling Rate ......................................................................................................... 32
6.2 Performance of Algorithms ......................................................................................................... 34
7 Further Work ....................................................................................................................................... 37
7.1.1 Processing Time .................................................................................................................. 37
7.1.2 Environmental Conditions................................................................................................... 37
7.1.3 Tracking ............................................................................................................................... 38
7.1.4 Embedded Implementation ................................................................................................ 38
8 Conclusion ........................................................................................................................................... 39
9 References .......................................................................................................................................... 40
10 Appendix A - On the CD .................................................................................................................. 41
Table of Figures
Figure 1 - Overview of proposed system ...................................................................................................... 4
Figure 2 - Images illustrating differences between vertical distance on camera image, and real world
distance ......................................................................................................................................................... 5
Figure 3 - Inverse Perspective Mapped image with distance indicated by white arrow. Vertical distance
now corresponds linearly to real distance .................................................................................................... 5
Figure 4 – Illustration of camera position and coordinate systems in use ................................................... 9
Figure 5 - Inverse Perspective Mapped view of road scene. Distorted image of car ahead can be seen in
red box at top of image. .............................................................................................................................. 11
Figure 6 - Overall System Flowchart ........................................................................................................... 13
Figure 7 - First Frame Operations Flowchart .............................................................................................. 14
Figure 8 - Original image before thresholding to remove road pixels ........................................................ 17
Figure 9 - Thresholded image of same scene as figure above with road pixels removed .......................... 17
Figure 10 - Thresholded image from figure above with road removed and non road objects highlighted 18
Figure 11 – Sample road scene image before perspective is warped using transformation matrix .......... 19
Figure 12 – Previous figure after perspective has been warped using transformation matrix .................. 19
Figure 13 – Transformed image of sample road scene, ready for object detection. ................................. 20
Figure 14 – Source image with overlay of rectangular box and distance value ......................................... 22
Figure 15 - Illustrates mapping of points in source top down image to points in front facing image ........ 24
Figure 16 - Original Image - No Thresholding Applied ................................................................................ 26
Figure 17 - Small threshold value applied to scene (left); large threshold value applied to scene (right) .................. 26
Figure 18 - Thresholding with range of ±35 ................................................................................................ 27
Figure 19 - Example of distance detection performed with small trigger value ........................................ 28
Figure 20 - Graph illustrating memory use for several video samples ....................................................... 30
Figure 21 – Example of ‘known square’ method of calibration .................................................................. 31
Figure 22 - Samples of successful distance determination ......................................................................... 32
Figure 23 - Plot of computation times for different sampling rates ........................................................... 33
Figure 24 - Measured computation time - comparison between sampling rate of 1 and 10 ..................... 34
Figure 25 - Comparison of computation times in seconds ......................................................................... 35
Figure 26 - Graph of processing time for each of 3 major constituents of algorithm ................................ 36
1 Glossary
IPM – Inverse Perspective Mapping
OpenCV – Open Computer Vision
Thresholding – Process by which pixels above or below a certain intensity are removed from an image
C – Low level, compiled programming language
gcc – Open source C compiler
ROI – Region of Interest
2 System Overview
In 2005, 396 people, more than one per day, were killed in road traffic accidents [1]. For this reason,
collision avoidance and prevention systems are of clear value to car safety, which is a primary concern
for all car manufacturers. In recent years, ABS, stability control, airbags, ESP and similar systems have
become standard on many car models.
Using computer vision techniques and optical cameras, safety systems can be vastly improved. Cars in
the near future will be able to intelligently analyze their environment and react accordingly to improve
driver safety.
Computer vision is fundamentally the process by which we can allow machines to ‘see’ the world and
react to it. Its importance cannot be overstated in fields such as manufacturing, surveillance and
environment detection. Using the techniques of computer vision, we can create powerful and helpful
real world applications which incorporate real world conditions.
An increasingly common application of computer vision systems is in the field of safety. Machines can be
programmed to detect and respond to dangerous conditions automatically, based on the interpretation
of the world around them. Computer vision can be used to provide accurate, useful information to
machine operators or users. One such machine, where computer vision can be leveraged to provide
useful, potentially lifesaving information, is the car.
Current systems on the market from manufacturers such as Mercedes [2] pre-charge brakes and tighten
slack on seatbelts if an imminent collision is detected.
It is becoming increasingly common for modern automobiles to be fitted with camera systems to aid in
driver awareness and safety. Systems such as those found in the Opel Insignia are becoming more and
more popular; the Insignia uses a front mounted camera to detect road signs and monitor lane
departures, providing increased levels of information to drivers.
Distance determination in an automobile environment is understandably a worthwhile undertaking.
With an effective distance determination algorithm, steps can be taken to alert drivers to potential
hazards and unsafe driving. Distance data from a system similar to the one proposed could be applied to
an adaptive cruise control system, which senses upcoming obstacles and adjusts the speed of the
vehicle accordingly. In fact, combined with lane detection algorithms, it is entirely possible to envision a
car that could, in theory, drive itself.
Currently available systems on the market from manufacturers such as Audi, Mercedes Benz, and Nissan
use RADAR or LIDAR sensors to implement collision detection. These work well when the RADAR
signals reflect from a metal object; they do not, however, detect pedestrians or animals on the road.
These systems are also expensive to implement, and are therefore a sub-optimal solution.
Current research into collision detection focuses on forward facing cameras, which provide more
information about a scene and are cheap and reliable.
The proposed system consists of a single front facing video camera mounted on a vehicle capturing
video images at 30 frames per second. This setup distinguishes the system from similar systems which
use either a multi camera setup or, alternatively, active devices such as RADAR or LIDAR. A single
camera system is more reliable and simpler than any of these methods.
A dual (or more) camera setup, as employed by Toyota in some Lexus models, provides more data to
process and, therefore, more accurate results. However, it also carries severe processing and
configuration overheads, which render it unsuitable for use in low power, low resource, embedded
devices typically found in automobiles. It is also a much more expensive system to implement, for
obvious reasons, than a single camera system.
Active systems such as RADAR or LIDAR require signals to be reflected from targets; this leaves them
susceptible to interference, possibly from other identical systems approaching them. Mounting these
active systems to cars can be difficult and, most importantly, carries a significant financial expense.
Often, a front facing optical camera fitted to a car can have several uses; the same camera can be used
for lane detection and road sign detection as well as distance determination, providing a comprehensive
safety package using a single camera.
Previously, computer vision algorithms were much too computationally heavy to implement in real time
on low power devices. Devices such as the Intel Atom processor, which are very capable, yet consume
very little power, can make implementation of these types of algorithm in real time a reality.
In conclusion, the passive nature, flexibility and simplicity of a single camera setup make it well suited
to implementation in an automobile environment. The proposed system is capable of providing life
saving information to drivers.
Figure 1 - Overview of proposed system
The two figures below illustrate the problem explicitly. As the vehicle in front approaches the camera,
vertical distance in the image, as indicated by the white lines, does not vary in a linear fashion.
The third image has been transformed to a top down view, in which distance on the image is linearly
related to distance in the real world.
Figure 2 - Images illustrating differences between vertical distance on camera image, and real world distance
Figure 3 - Inverse Perspective Mapped image with distance indicated by white arrow. Vertical distance now corresponds
linearly to real distance
3 Background Technologies
This chapter illustrates the technology background of the project. It provides a brief overview of the
field of computer vision and the technologies being used to implement the system.
3.1 Computer Vision
Computer vision is a rapidly growing field. Modern, more powerful, microprocessors are increasingly
able to handle the computationally expensive overhead of working with images. This opens up exciting
possibilities for innovative computer vision implementations in everyday life. A recently popular
example is the field of augmented reality, which uses computer vision techniques on a portable device
to ‘understand’ a scene and overlay useful information. It is often found on mobile phones, which until
recently would not have been capable of handling the processing overhead of such a system.
One area in which computer vision is an integral part is the DARPA Urban Challenge1 in which
students attempt to create a vehicle that will drive itself. Vehicles are required to merge into two way
traffic and carry out other complex driving manoeuvres autonomously. Without advanced computer
vision algorithms, these challenges would be impossible.
3.2 OpenCV2
OpenCV stands for “Open Computer Vision”. OpenCV is a library of functions for image processing.
Originally developed by Intel in 1999, it is now an Open Source project released under the BSD license.
The functions themselves are mostly written in C. The purpose of OpenCV is to provide open, optimized
and robust routines for standard image processing techniques, supporting the advancement of
computer vision research. There are many commonly used techniques in image processing for computer
vision applications, and OpenCV provides implementations of many of them, allowing for rapid
algorithm development. While implementing algorithms, the programmer doesn’t have to continually
“reinvent the wheel”.
Although OpenCV itself is an open source project, Intel provides a product named “Integrated
Performance Primitives”, a commercial package of highly optimized routines which OpenCV can use in
place of its own routines to speed up computation.
1 http://www.darpa.mil/grandchallenge/index.asp
2 http://opencv.willowgarage.com/wiki/
For this particular project, OpenCV greatly accelerated development by providing routines to threshold
images, generate homography matrices and sample regions of images.
Although wrappers for the OpenCV libraries have been developed for high level languages such as C#
and Python, code for this system was written solely in C, and compiled using the standard gcc compiler.
Using a lower level language like C increases performance, leading to a real time, or close to real time
implementation.
3.2.1 Useful OpenCV Functions
3.2.1.1 cvThreshold
Performing a threshold of an image is a fundamental image processing technique. It involves examining
the intensity values of each pixel in an image, and performing a particular operation depending on this
value.
OpenCV provides extensive thresholding options via the cvThreshold function. Several different types of
thresholding are available;
• CV_THRESH_BINARY
• CV_THRESH_BINARY_INV
• CV_THRESH_TRUNC
• CV_THRESH_TOZERO_INV
• CV_THRESH_TOZERO
Most pertinent to this project are CV_THRESH_TOZERO, CV_THRESH_TOZERO_INV and
CV_THRESH_BINARY.
CV_THRESH_TOZERO – If a pixel is below the threshold value, it is given an intensity value of 0.
Otherwise it is not affected.
CV_THRESH_TOZERO_INV – The opposite of above, if a pixel is above the threshold value, it is given an
intensity of 0. Otherwise it is not affected.
CV_THRESH_BINARY – As the name suggests, depending on which side of the threshold value the pixel
intensity value lies on, it is assigned a value of 0, or 255.
3.2.2 cvSetImageROI
cvSetImageROI allows the programmer to perform operations on specified areas of an image. This
allows functions such as thresholding, averaging and smoothing to be applied to one part of an image
while leaving the rest of the image unchanged. This function has proven very useful throughout the
project.
3.2.3 cvWarpPerspective
cvWarpPerspective, as the name suggests, allows the perspective of an image to be warped based on a
transformation matrix. This has been the most important function provided by OpenCV for this project.
It allows points to be mapped from one perspective to another through the use of a single function,
greatly reducing programmer overhead.
3.2.4 Drawing Functions
OpenCV provides numerous drawing functions which allow feedback to be given to the user easily by
overlaying shapes and text onto images.
3.3 Inverse Perspective Mapping
A front facing image from a car is useful for many applications; it is, however, useless for distance
determination. Vehicles that are far away from the camera appear high in the image, and appear
progressively lower as they approach the camera.
The problem is that this does not occur in a linear fashion; therefore, there is no simple way to discern
distance information based on the position of a vehicle in a front facing image.
To overcome this issue, we use Inverse Perspective Mapping [3][4]. Inverse Perspective Mapping (IPM)
uses a 3 by 3 homography matrix to translate points from the image plane to those found in the real
world.
Using this homography matrix, we can transform our image so that we are looking directly at the true
road plane. We can use this to measure the distance between objects in the image with a degree of
certainty, since the relative position of the objects will change in a linear fashion.
The figure below illustrates the different coordinate systems that are employed, and their relation to
one another.
Figure 4 – Illustration of camera position and coordinate systems in use
$W(x, y, z) \in W$ is a point in the world space $W$. $I(u, v) \in I$ is a point in the image plane $I$.
$S(x, y, 0) \in S$ is the subset of points of $W$ belonging to the road surface $S$.
As can be seen from the figure above, mapping a point from I(u,v) involves a rotation about the angle θ
and a translation along the line of sight of the camera.
The matrix mathematics are shown below in Equation 1 [5]
Equation 1 – Matrix Mathematics for transforming point from image coordinates to world coordinates
$$[u, v, 1]^T \propto K\,R\,T\,[x, y, z, 1]^T$$
where $K$ is the camera matrix, $R$ the rotation through the angle $\theta$ and $T$ the translation along the camera's line of sight.
Simplifying results in Equation 2:
Equation 2 - Illustrating transformation matrix
Transformation Matrix
$$\begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} \propto \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$
We use homogeneous coordinate systems as they allow us to represent points at infinity, e.g. vanishing
points. They also allow us to incorporate division, as well as constants, into linear equations. [6]
Equation 3 - Demonstration of benefits of homogenous coordinates
A point at infinity in the direction $(x, y, z)$ is represented as $(x, y, z, 0)$; a general homogeneous point $W' = (x, y, z, w)$ corresponds to the real point $W = (x/w,\; y/w,\; z/w)$.
Using the Transformation Matrix above, we can map any point in the plane I(u,v) to its corresponding
point in the real world plane W(X, Y, Z) with Z = 0. The transformation results in images in which any
point with a Z coordinate greater than zero is incorrectly mapped, resulting in distortion of the image, as
can be seen in the image below. For our purposes, an image similar to the one in the figure below is
perfectly sufficient to determine distances.
Figure 5 - Inverse Perspective Mapped view of road scene. Distorted image of car ahead can be seen in red box at top of
image.
4 Project Structure
This chapter describes the structure of work carried out during the course of the project and the
development of the system. It describes an overall flow for the system before explaining in detail the
logic and implementation behind each step in the system.
This project was split into 5 distinct goals to measure progress. Those goals were;
Goal 1
• Commission the OpenCV system to load frames of video into memory.
• Sample the image pixels of the road directly in front of the vehicle.
• Threshold based on these sample pixels to remove the road surface from the image. The only
remaining pixels are those of the sky, the edges of the road (such as trees and buildings), road
markings and other vehicles.
• Generate the IPM characteristic matrix which is developed from the height of the camera from
the ground, and the angle at which the camera is mounted (w.r.t. the ground plane).
Goal 2
• Transform the image from the original input view to the IPM transformed view. This is done by
applying the IPM matrix developed in the previous milestone.
Goal 3
• Determine the distance to the vehicle in front, by looking for the first pixels of “non-road”
directly in front of the vehicle. Once the position is determined, it is then possible to calculate the
distance.
Goal 4
• Display this information to the driver by overlaying graphics on the original image to clearly
indicate the distance to the vehicle in front.
Goal 5
• Modify the system to run in real time with a video stream. If possible, this could be achieved in
real-time using a video camera and a laptop (or other available embedded signal processing
hardware).
4.1 Overall System Flowchart
The following flowchart illustrates general overall system operation.
Figure 6 - Overall System Flowchart
4.2 First Frame Operations Flowchart
The following flowchart illustrates operations carried out on receipt of first frame of video.
Figure 7 - First Frame Operations Flowchart
4.3 Process Specifics
4.3.1 Capture Video Frame
Capturing of video frames is done using OpenCV’s file capture or camera capture functions. Using these
functions we create a CvCapture object which we can query for frames from either a video file or a
camera connected to the laptop.
4.3.2 First Frame Operations
This section of the system consists of several operations, which need only be carried out once. The data
generated through these operations can then be used over and over for each frame of video that is
sampled. Many of the methods used in this project are computationally heavy to do on the fly, so as
much work as possible is done at the beginning of the program. This way the stored values can be used
repeatedly for each frame, saving on computation time.
4.3.2.1 Capture source points
Firstly, 4 source points are captured from the user. These points are stored in an array, and used for the
next step, generation of the transformation matrix.
This is done using a mouse handler to return the position of user selected points in the image.
4.3.2.2 Generate transformation matrix
The transformation matrix is the key to the Inverse Perspective Mapping algorithm. Helpfully, OpenCV
provides a function to generate this matrix without needing to manually carry out the mathematical
operations listed in the Inverse Perspective Mapping section above.
The function cvGetPerspectiveTransform takes an array of points from the source image and
generates the transformation matrix that maps them to an array of points in the destination (top down)
image. The simplified matrix mathematics is illustrated in the equation below.
Destination Points = Transformation Matrix × Source Points
The source points that are chosen by the user by clicking points in the image, map to a square in the
destination image. If we lay a square shaped object on the road in front of the camera, we can use the
corners of the square to generate the appropriate transformation matrix for the current environmental
conditions.
Importantly, applying this operation in the other direction gives us the inverse transformation matrix.
This allows us to map points from the destination image back to the source image.
4.3.3 Threshold Image
In order to detect a vehicle on the road in front of us, we need to be able to discern what part of an
image is the road, and what part is a vehicle.
A solution to this problem is to threshold the image to remove road pixels. That way, anything in front of
the car with an RGB intensity value greater than zero is an object in front of the car.
The major difficulty that this presents is, given a certain image, how can one detect what is a road pixel
and what is not? Roads vary in shade depending on the time of day, weather etc. To obtain a value for
the particular scene we are working on, we can sample the value of pixels directly in front of the vehicle.
We take a small patch of road slightly in front of the car and obtain an average value for the pixels
across that patch. This gives a good value for the RGB characteristics of the road surface.
This process is somewhat rough, and could be improved by instead taking the median value for all pixels,
thus eliminating noise values generated by the system. Adaptive thresholding, that is, using different
thresholding values for different areas of the image, could be implemented to improve the process
further.
We threshold based on these values. Thresholding works as follows;
• Split image into its constituent channels (R, G, B)
• Use the built in OpenCV threshold modes CV_THRESH_TOZERO and CV_THRESH_TOZERO_INV to
remove pixels above and below the threshold range
• This leaves us with images containing just the road surface in the R, G and B planes
• We subtract the original R, G and B images from the images containing just the road surface and
merge the result
• Finally we perform a binary threshold to leave any non road values with a high value to aid in
distance determination
An example of the thresholding algorithm at work can be seen in the following three figures. Fig. 8
shows a sample source image prior to application of the algorithm. Fig. 9 shows the result of
removal of road pixels from the image. Finally, Fig. 10 shows how non road objects are highlighted
to allow for easier detection.
Figure 8 - Original image before thresholding to remove road pixels
Figure 9 - Thresholded image of same scene as figure above with road pixels removed
Figure 10 - Thresholded image from figure above with road removed and non road objects highlighted
As can be seen from the example above, the thresholding algorithm is very effective. Most of the road
surface has been removed, leaving the object in front clearly detectable due to its bright colouring.
4.3.4 Warp Perspective
As part of the first frame operations, we generated the transformation matrix based on 4 source points
in the image mapped to 4 destination points in a transformed image. Now we need to transform the
video frame based on this transformation matrix.
OpenCV provides a function to transform an image based on a 3x3 homography matrix. This is the
matrix we generated earlier. Application of this function to an image results in a transformation similar
to the one shown in the figures below.
Warping the perspective of an image involves considerable computational overhead. It is therefore
pertinent to use the operation as sparingly as possible.
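The mapping that cvWarpPerspective applies to every pixel can be sketched in plain C. The division by the homogeneous coordinate w is the perspective (non-linear) step, and performing it per pixel is what makes the warp expensive. The matrix values used in testing this sketch are illustrative.

```c
#include <math.h>

/* Sketch of the per-pixel mapping behind a perspective warp: a 3x3
   homography H maps (x, y) to (x', y') in homogeneous coordinates.
   cvWarpPerspective performs this (plus interpolation) for every
   pixel of the image. */
static void apply_homography(const double H[3][3],
                             double x, double y,
                             double *xo, double *yo)
{
    double u = H[0][0]*x + H[0][1]*y + H[0][2];
    double v = H[1][0]*x + H[1][1]*y + H[1][2];
    double w = H[2][0]*x + H[2][1]*y + H[2][2];
    *xo = u / w;   /* divide by w: the perspective step */
    *yo = v / w;
}
```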
Figure 11 – Sample road scene image before perspective is warped using transformation matrix
Figure 12 – Previous figure after perspective has been warped using transformation matrix
Note – Image samples above are not thresholded as they would be during real operation of the system.
They are given to illustrate clearly the effects of the perspective transform.
4.3.5 Distance Determination
Now that we have warped the image to a top down view, we can measure the distance to the vehicle in
front linearly. This consists of several steps.
Figure 13 – Transformed image of sample road scene, ready for object detection.
We know the coordinates of the front of our car and so, we loop vertically upwards through the image,
working on a small rectangle in the area in front of the car. For each small rectangle, we average the
pixel values across the rectangle. We know that all road pixels are zero, thanks to the threshold applied
earlier, so we increment the position of the rectangle until the average function returns a non-zero
value. This value is stored in a global variable that is accessible from the main body of code and
corresponds to the distance between our vehicle and the object directly in front.
Now we have the position of the object directly in front of the car in its transformed image coordinates.
We transform this point back to the original coordinate system in order to overlay feedback to the user.
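The upward scan described above can be sketched in plain C on a single-channel, row-major top-down image. The rectangle geometry and image size here are illustrative; the real system operates on the thresholded OpenCV image, in which road pixels are already zero.

```c
/* Sketch of the upward scan: starting at the row corresponding to
   the front of our car, move a small sampling rectangle up through
   the top-down image until its pixel sum is non-zero. Returns the
   distance in pixels, or -1 if no object is found. Geometry values
   are illustrative. */
static int scan_for_object(const unsigned char *img, int width,
                           int car_row, int rect_x, int rect_w,
                           int rect_h)
{
    for (int row = car_row - rect_h; row >= 0; row -= rect_h) {
        long sum = 0;
        for (int y = row; y < row + rect_h; y++)
            for (int x = rect_x; x < rect_x + rect_w; x++)
                sum += img[y * width + x];
        if (sum > 0)                  /* road pixels are zero */
            return car_row - row;     /* distance in pixels */
    }
    return -1;
}
```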
4.3.6 Provide Graphical Overlay
Now that we have a value for the position of the object in coordinates relevant to the original image,
we can overlay graphics and text for display to the user.
The value that we have calculated for distance is in pixels. It is the distance in pixels from the base of the
image (front of our vehicle) to the next object in front of our vehicle. This value will change in a linear
fashion as the real world distance changes, e.g. a value of 700 pixels will equate to twice the distance
350 pixels equates to. It is this linearity that is the strength of the Inverse Perspective Mapping
algorithm.
In order to display an accurate value for distance, we need to know the ‘scaling factor’: the value in pixels that corresponds to one meter. This value can be calibrated for a particular camera configuration by placing a meter stick in front of the car, flat on the road surface, and measuring the number of pixels that this corresponds to in the top-down view. This value will stay the same as long as the characteristics of the camera (its height from the ground, focal length, etc.) remain the same. For the purposes of testing
using videos from several different sources, an approximate scaling factor was chosen based on analysis
of several sample top down images. The value was calibrated based on the fact that the standard for
Irish lane markings is approximately 1.5m.
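The conversion from the measured pixel distance to meters is then a single division by the scaling factor. The figure of 100 pixels per meter below is an assumed illustrative value, not the calibration actually used in testing.

```c
#include <math.h>

/* Sketch of the pixel-to-meter conversion. SCALE_PX_PER_M is an
   assumed calibration value: the number of pixels spanning one
   meter in the top-down view. */
#define SCALE_PX_PER_M 100.0

static double pixels_to_meters(int distance_px)
{
    return distance_px / SCALE_PX_PER_M;
}
```

Because the top-down view is linear, a pixel distance of 700 converts to exactly twice the real-world distance of 350 pixels, which is the property Inverse Perspective Mapping provides.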
Graphical information provided to the user is in the form of the distance figure being displayed in the
upper left quadrant of the image, typically where very little activity takes place.
A rectangular box is also drawn around the detected area to verify that the correct object has been
detected. The figure below shows a sample overlay of information to the user.
Figure 14 – Source image with overlay of rectangular box and distance value
4.3.7 Processing Real Time Images (from camera)
While most design, testing and processing was carried out on pre-recorded video, a further important goal had to be realised: for the algorithm to function in a real time system, it had to work not only on pre-recorded video, but also on live real time data from a camera.
This involved implementation of a mechanism by which the program would read frames from a camera
rather than a file. If the system is invoked with one command line argument, it attempts to load the
video at the path specified by the argument. If the system is started without a command line argument,
the program will attempt to query frames from a camera attached to the system.
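The source-selection logic can be sketched as follows. The OpenCV capture calls appear only as comments, since this sketch illustrates just the argument-based dispatch.

```c
/* Sketch of video-source selection: one command line argument means
   a video file path, none means a live camera. */
typedef enum { SOURCE_FILE, SOURCE_CAMERA } source_t;

static source_t select_source(int argc)
{
    if (argc > 1)
        return SOURCE_FILE;   /* e.g. capture = cvCaptureFromFile(argv[1]); */
    return SOURCE_CAMERA;     /* e.g. capture = cvCaptureFromCAM(0); */
}
```

In either case the rest of the algorithm is unchanged: frames are queried one at a time from the capture structure regardless of their origin.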
5 Optimisation and Testing
Computer vision is inherently computationally heavy. Algorithms which employ many computer vision
techniques can take prohibitively long to compute for real time video. This section explores some ways
in which the system was optimised for maximum accuracy with minimum computation overhead.
5.1 Generate lookup array
One way of optimising an algorithm is to generate a lookup table. In order to translate a single point
from the top down, IPM view, back to the source image we must perform a perspective transform on
this point. If we are running the algorithm on a real time system, this overhead may be unacceptable for
real time operation. As was explored in the section on Inverse Perspective Mapping, in order to
transform an image from the IPM coordinate system to the image coordinate system, we must perform
a non-trivial matrix multiplication.
The solution is to use the inverse transformation matrix to map all vertical points back to their
equivalent point in the source image in one single operation, and store the values in an array. This way
we can map a point to its equivalent vertical co-ordinate by simply referencing the lookup array at that
point.
So, when we need to map a point to the original image from the transformed view, we simply need to
reference a value in a pre computed array, instead of performing the actual transformation.
The figure below illustrates how points are not linearly mapped back to the original image. Using the lookup table, we can map any point's vertical coordinate in the top-down image to its equivalent in the front facing image. This allows us to map the point where the object is detected back to the original image with ease.
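Building the lookup array might be sketched in plain C as follows: the inverse homography is applied once per row at start-up, and only the resulting vertical coordinate is stored. The array size and reference column used here are illustrative.

```c
/* Sketch of the vertical-coordinate lookup array: for every row y of
   the top-down image, precompute the corresponding row in the
   front-facing image by applying the inverse homography H_inv once.
   At run time, mapping a detected point back is a single array read.
   IPM_HEIGHT and the reference column are illustrative. */
#define IPM_HEIGHT 480

static int lookup_y[IPM_HEIGHT];

static void build_lookup(const double H_inv[3][3], double x_ref)
{
    for (int y = 0; y < IPM_HEIGHT; y++) {
        double u = H_inv[0][0]*x_ref + H_inv[0][1]*y + H_inv[0][2];
        double v = H_inv[1][0]*x_ref + H_inv[1][1]*y + H_inv[1][2];
        double w = H_inv[2][0]*x_ref + H_inv[2][1]*y + H_inv[2][2];
        (void)u;                           /* only the row is needed */
        lookup_y[y] = (int)(v / w + 0.5);  /* round to nearest row  */
    }
}
```

The full perspective transform is thus paid once at start-up rather than on every frame.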
Figure 15 - Illustrates mapping of points in source top down image to points in front facing image
5.2 Sampling Rate
Image processing operations are computationally heavy. A video file contains, in general, 30 frames per second, so performing all operations on each frame equates to 30 sets of operations per second. This is much too frequent for our needs, so instead we perform the operations once every x frames.
This project is targeted to function in an automobile environment using a front mounted camera. Most of the time, the rate of change of distance between our vehicle and the object in front of us is relatively small; usually cars on the same road are travelling at roughly similar speeds.
Sampling and computing distance information every 33 milliseconds provides very little extra
information. Over 33ms the difference in distance between our vehicle and the vehicle in front of us will
be negligible. For a car travelling 5km/hr faster than another car, the rate of change of distance between
the two cars is 1.38m/s. In 33ms, the difference in distance is 0.046m. This value is small enough as to be imperceptible to a user. Therefore we can alter the sampling rate to calculate distance less frequently, saving resources with little to no visible difference during operation.
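A worked version of the figures above, in plain C:

```c
#include <math.h>

/* Relative (closing) speed in m/s from a speed difference in km/h. */
static double closing_speed_ms(double speed_kmh)
{
    return speed_kmh / 3.6;   /* 1 km/h = 1/3.6 m/s */
}

/* Change in distance over one frame interval, in meters. */
static double distance_change_m(double speed_kmh, double interval_s)
{
    return closing_speed_ms(speed_kmh) * interval_s;
}
```

For a 5 km/h speed difference this gives roughly 1.39 m/s, and over a 33 ms frame about 0.046 m, matching the figures quoted above.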
5.3 Finding Threshold Range
In order to remove the road surface from the input image, we must know something about the
characteristics of the road we are travelling on. This involves sampling an area of the road to determine
appropriate RGB values for the colour of the road surface.
This is done by sampling a small box in front of the car and extracting average colour data from this
area. A larger sample area gives us a more reliable and accurate reading of the road surface, but there is a trade-off in computation time and levels of noise in the image.
The larger the sampling box used, the more likely that road markings or other objects will be included in
the averaging process. This results in a less accurate thresholding value, which when used in
thresholding, decreases reliability and performance.
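The sampling step might be sketched in plain C on one channel as follows; in the real system the average is taken per R, G and B plane, and the box geometry here is illustrative.

```c
/* Sketch of road-colour sampling: average the pixels of a small box
   directly in front of the car on a single channel, stored row-major.
   Box coordinates and size are illustrative. */
static int sample_road_value(const unsigned char *img, int width,
                             int box_x, int box_y, int box_w, int box_h)
{
    long sum = 0;
    for (int y = box_y; y < box_y + box_h; y++)
        for (int x = box_x; x < box_x + box_w; x++)
            sum += img[y * width + x];
    return (int)(sum / (box_w * box_h));
}
```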
Choosing an appropriate range above and below the threshold value is very important. Enough allowance must be given for all road pixels to be removed, without removing too much of other objects.
If the value is too small, not all of the road surface will be removed, leading to detection of the road as
an object. If the value is too large, too much of the destination object will be removed, leading to
difficulty in distance determination. An example of incorrect values in use can be seen below.
Figure 16 - Original Image - No Thresholding Applied
Figure 17 - Small threshold value applied to scene (left); large threshold value applied to scene (right)
As can be seen from the image above on the left, in which a range of ±1 around the threshold values was applied, we need a relatively large range to accurately remove the entire road surface. The image on the right applied a threshold range of ±100, which is clearly too large, as we have removed much of the vehicle in front along with the road.
A range of ±35 gives satisfactory results, which can be seen below.
Figure 18 - Thresholding with range of ±35
In the figure above some of the detail in the bottom part of the vehicle in front of ours has been
removed along with the road. This is not a problem as we will now apply a binary threshold to highlight
any values that are above zero.
5.4 Using Performance Primitives
Performance is a major concern in this project. OpenCV provides somewhat optimised routines for image processing, but more highly optimised implementations of some core functions are available.
These are available in a commercial package called “Intel Performance Primitives” [7]. When deploying
on an Intel processor, OpenCV is able to take advantage of these performance routines to greatly
accelerate the execution of code. Since an intended target platform for this system is the Intel Atom [8]
processor, use of these performance primitives could greatly accelerate execution of the algorithm.
This package is a commercial product, and for cost reasons its use was not explored during the course of
the project.
5.5 Level to Trigger Detection
While carrying out distance detection, we loop vertically through the frame, averaging a small rectangle directly in front of the vehicle.
The frame that we are scanning has been thresholded, but that does not definitely mean that all road
pixels have successfully been removed. We cannot simply check for the first non-zero pixel value in the image, as there may be noise or artifacts left in the image.
Testing needed to be done to select a value to ‘trigger’ object detection, which would only be reached
by an actual object, and filter out any noise. The figure below shows the effect of noise on distance
detection.
Figure 19 - Example of distance detection performed with small trigger value
Using a very low value, as shown above, results in very small levels of noise triggering object detection.
Conversely, using a very high value leads to no object detection at all. An appropriate value was found
through measuring average values across the detection rectangle and discerning a threshold value from
these measurements.
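The trigger test might be sketched as follows; the trigger level of 40 is an assumed illustrative value, not the one found during testing.

```c
/* Sketch of the detection trigger: the detection rectangle's average
   must exceed a trigger level before it counts as an object, so that
   isolated noise pixels are ignored. TRIGGER_LEVEL is an assumed
   illustrative value. */
#define TRIGGER_LEVEL 40

static int rect_triggers(const unsigned char *rect, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += rect[i];
    return (sum / n) > TRIGGER_LEVEL;   /* 1 = object, 0 = noise */
}
```

A single bright noise pixel in an otherwise dark rectangle fails the test, while a solid patch of a detected vehicle passes it.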
5.6 Memory Management
Given that the majority of the project was completed using the C programming language, which does not include automatic memory management, it was of paramount importance to deallocate any memory that was allocated. Memory is not in ample supply in embedded systems and therefore must be strictly monitored.
During early testing of the finished algorithm, memory use became a big concern. While processing a 30
second video, system memory use peaked at over 1GB. Clearly this was unacceptable.
There are several ways to ensure that memory used by an OpenCV program is kept in check. Primary among these is vigilance: when allocating memory structures, one must be careful to deallocate them when finished. The following table lists common memory allocation functions and their equivalent deallocation functions.
Allocation function — Deallocation function
cvCreateImage — cvReleaseImage
cvCreateImageHeader — cvReleaseImageHeader
cvCreateMat — cvReleaseMat
cvCreateMatND — cvReleaseMatND
cvCreateData — cvReleaseData
cvCreateSparseMat — cvReleaseSparseMat
cvCreateMemStorage — cvReleaseMemStorage
cvCreateGraphScanner — cvReleaseGraphScanner
cvOpenFileStorage — cvReleaseFileStorage
cvAlloc — cvFree
Once these rules were observed strictly, memory usage declined drastically. Below is a graph illustrating measurements of memory use for several sample videos after memory use was reduced.
Figure 20 - Graph illustrating memory use for several video samples
5.7 Calibration
In order to obtain accurate distances from the system, calibration for a particular environment needs to
be carried out. There are several ways in which this can be done.
When the system is installed in a vehicle, the position and angle of the camera will not change, which
means that instead of the approximate method employed for testing in the current implementation, we
can employ a more accurate method of calibration, which will give more accurate distance values.
One such method of calibration is to calibrate the camera by placing a square object of known size in
front of the camera on the road plane. Using simple mouse clicks, the transformation matrix for that
environment can be obtained. Due to the wide variety of samples from different environments being
tested as part of this project, this is the method that has been implemented. This method provides a
somewhat rough value, but is satisfactory for testing purposes.
The figure below shows a scene in which a rectangular shape has been overlain; by clicking the corners of this rectangle, the transformation matrix for this environment can be found.
Figure 21 – Example of ‘known square’ method of calibration
A second method that can be employed to calibrate the camera and generate the transformation matrix is through the use of the camera's intrinsic and extrinsic characteristics: its focal length, height above the ground, and viewing angle can be used. Exploration of this method of calibration is outside the scope of this project.
Finally, it is possible to perform automatic calibration of the camera using a checkerboard patterned
square placed in front of the car. This is the preferred option, providing simple and reliable calibration.
Firstly, the system detects the checkerboard pattern of known size [11]. From this information the
transformation matrix can be generated. This technique has not been implemented as part of this
system.
6 Results
This section illustrates results obtained by the system as implemented. Insight is given into situations in
which the algorithm is effective and where it can be improved.
Naturally, the easiest way to evaluate the effectiveness of a system is to test it under a variety of different conditions. It was found that the algorithm functioned as expected for almost all sample videos. Below are screenshots of several videos in which the algorithm functioned as expected.
Figure 22 - Samples of successful distance determination
Areas in which the algorithm did not perform as expected will be explored in the “Further Work”
chapter.
6.1 Selection of Sampling Rate
Computer vision algorithms are, as a general rule, very resource intensive. Since the goal for this algorithm is to implement it in real time on a video stream, performance is a very important concern.
A standard camera captures frames at a rate of 30 frames per second. Each frame is displayed for 33
milliseconds. It is not feasible to run the entire algorithm on each frame, 30 times per second.
To improve performance and lessen the load on the system, we run the algorithm less than 30 times per
second, every x frames.
Below is a chart of computation times for the algorithm on a 386 frame video stream, at different
sampling rates.
Figure 23 - Plot of computation times for different sampling rates
A sampling rate of 10 frames was chosen as a good trade-off between accuracy and updating frequency.
Figure 24 - Measured computation time - comparison between sampling rate of 1 and 10
6.2 Performance of Algorithms
Analysis was carried out on the time taken for each step of the system, as described in the overall flowchart; the results of this analysis can be seen below. First, measurements were taken of the total time to perform each of the 3 major operations that must be repeatedly carried out, namely:
• Thresholding image to remove road surface,
• Warping perspective to create top down view,
• Distance determination.
The chart below shows a comparison on the time taken by each step to process all frames of a 386
frame video.
Figure 25 - Comparison of computation times in seconds
Carrying out all operations on each frame was found to take a total of 0.053 seconds, or 53 milliseconds. This figure is not inherently very useful on its own, as it is relative to the processor on which the test is carried out and is greatly affected by other processes running on the system. Testing the algorithm on a dedicated microprocessor would give more quantitative benchmarks. What can be inferred from the measurements, however, is the percentage of total time taken up by each part of the system.
To generate these values, the algorithm was modified to run only one of the 3 major operations listed above. Timing was then carried out using the built-in Linux command ‘time’ [9], which measures real time as well as user and kernel time taken to execute programs. The results were corroborated with a second timing method, using the built-in clock functionality of the C language [10].
Below is an illustration of the percentage of total execution time taken up by each of the 3 major parts
of the algorithm.
Figure 26 - Graph of processing time for each of 3 major constituents of algorithm
As can be seen from the figure above, as predicted, warping perspective of the image in order to
generate the top down view of the scene is by far the most time consuming part of the algorithm.
The second most computationally expensive operation is thresholding of the image, again as expected.
Thresholding an image works on the whole image, altering each pixel based on a rule. In this system, this
is done several times, resulting in significant processing time.
7 Further Work
While the system is very successful in determining distances and detecting objects in front of a vehicle, it
stands to be improved in several areas.
7.1.1 Processing Time
Currently each processed frame requires, on average, 0.05 seconds of processor time. This figure can be improved upon in a number of ways:
• Reduce the number of channels in the image to be transformed from three to one.
This will reduce the computation required for the perspective transform, and should provide a drastic increase in performance.
• When thresholding the image, only threshold the portion required by the algorithm.
Currently, thresholding is applied to the whole image; this is not required, as some parts of the image, e.g. the horizon and the bonnet of the car, are irrelevant. Cropping these areas out will increase efficiency.
• Change the thresholding algorithm to use less memory.
In the current system, several extra data structures are allocated and deallocated as part of the thresholding operation. This slows down computation and increases the amount of memory used. A more efficient algorithm using fewer resources would improve overall processing time.
• Implement a tracking algorithm.
The sample rate could be further reduced from 3 times per second with the help of a tracking algorithm.
7.1.2 Environmental Conditions
The algorithm in its current form is quite susceptible to changes in environment, e.g., going from bright
areas to dim areas. This aspect of the system could be improved using adaptive thresholding.
Secondly, the system detects road markings in the middle of the road as objects, which interferes with distance detection. The system could be improved to intelligently filter out these markings and improve the reliability of the algorithm.
7.1.3 Tracking
Implementing tracking as part of the system would greatly improve the algorithm in several ways. By the
nature of the environment where the system operates, there is little change in the location of the
detected object from one frame to another. A tracking algorithm could assist in situations where the
algorithm has lost the object or has been compromised by noise conditions on the road.
7.1.4 Embedded Implementation
It is very much hoped that the system will be ported to an embedded processor in the near future where
it can be properly tested and benchmarked for use in an actual vehicle. Manufacturer specific high
performance C libraries such as the Intel Performance Primitives could be employed to greatly increase
performance.
8 Conclusion
As can be seen from the successful implementation of this algorithm in the C language, a real time
distance determination system using OpenCV is clearly achievable. The system as it stands is functional
and complete. Refinements are needed before the system can be deployed with confidence to an actual
embedded device, but indications are positive that this will be possible.
OpenCV has proven a powerful and lightweight computer vision framework and greatly assisted in the
development of the project.
A real time, single camera, passive distance determination algorithm as implemented here could have a
positive effect on road safety and avoidance of road collisions. The use of a single optical camera, which
can have many purposes in a single installation, makes it an attractive proposition for car manufacturers
due to its low cost and simple configuration.
This system offers benefits over similar active systems in terms of both cost and functionality, in that its
object detection is not solely limited to metal, reflective objects.
For ‘normal’ road conditions the algorithm was found to function very well, providing useful information to the user. This information could then be integrated into the vehicle's operation in several ways: by alerting a user of imminent danger; by alerting a user that they are not maintaining a safe following distance in relation to the car in front; and by performing pre-crash safety procedures if a collision is impending.
All of these benefits combine to make a vehicle which implements this system a safer one which ought
to lead to fewer road accidents and fewer injuries or fatalities.
9 References
1. Road Safety Authority – Road Collision Facts 2005
(http://www.rsa.ie/publication/publication/upload/2005%20Road%20Collision%20Facts.pdf)
2. Mercedes Pre Safe (http://www2.mercedes-
benz.co.uk/content/unitedkingdom/mpc/mpc_unitedkingdom_website/en/home_mpc/passengercars/ho
me/new_cars/models/cls-class/c219/overview/safety.html)
3. Maud, Hussain, Samad et al. 2004. Implementation of Inverse Perspective Mapping Algorithm For The
Development Of An Automatic Lane Tracking System
4. Mallot et al. 1991. Inverse perspective mapping simplifies optical flow computation and obstacle detection
5. D. O Cualain, C. H. 2009. Lane Departure Detection Using Subtractive Clustering in the Hough Domain.
6. Paul Smith, NUIG Guest Lecture. Applications of Linear Algebra: Computer Vision in Sports
7. Intel Performance Primitives (http://software.intel.com/en-us/intel-ipp/)
8. Intel Atom processor (http://www.intel.com/technology/atom/)
9. ‘time’ command (http://linux.about.com/library/cmd/blcmdl1_time.htm)
10. Timing in C (http://beige.ucs.indiana.edu/B673/node104.html)
11. Learning OpenCV: Computer Vision with the OpenCV Library. Gary Bradski, Adrian Kaehler. 2008. O'Reilly Media.
10 Appendix A - On the CD
Included on the submitted CD is the entirety of the Subversion repository of code developed throughout the course of the project.
The code is split into various folders with snippets to carry out different parts of the algorithm.
The final implementation, which incorporates many of the separate parts can be found in the ‘Final
Implementation’ folder. Some sample images and videos are included for testing purposes.