Automated Traffic Sign Recognition

Upload: takudzwa-matshaba

Post on 06-Apr-2018


  • 8/3/2019 Automated Traffic Sign Recognition

    1/72

Road Sign Recognition from a Moving Vehicle

    Bjorn Johansson


    Abstract

This project aims to research the current technology for recognising road signs in real time from a moving vehicle. The most promising technology for intelligent vehicle systems is vision sensors and image processing, so this is examined the most thoroughly. Different processing algorithms and research around the world concerned with sign recognition are investigated. A functioning system has also been implemented using a standard web camera mounted in a testing vehicle. This system is restricted to speed signs and achieves good performance thanks to fast but still robust algorithms. Colour information is used for the segmentation, and a model-matching algorithm is responsible for the recognition. The human-computer interface is a voice saying what sign has been found.


    Contents

1 Introduction 5
    1.1 Background 5
    1.2 The problem 9

2 Sensing Hardware 12
    2.1 Active versus Passive Sensors 12
    2.2 Vision based intelligent vehicles 13
    2.3 Advantages and Disadvantages of Computer Vision Approach 15

3 Processing 17
    3.1 Processing Strategies 17
    3.2 Architectural Issues 18

4 Previous work 20
    4.1 Using knowledge about road signs 20
        4.1.1 Colour basics 21
        4.1.2 The HSI colour space 22
    4.2 Object detection and recognition using colour 24
    4.3 Detection using Shape 26
    4.4 Algorithm considerations 26

5 Techniques 29
    5.1 Interesting Techniques 29
    5.2 Sign Recognition 30
    5.3 Multi-feature Hierarchical Template Matching Using Distance Transforms 31
    5.4 Shape recognition using convex hull approximation 34
    5.5 Temporal Integration 35
    5.6 Colour classification with Neural Networks 35
    5.7 Recognition in Noisy Images Using Simulated Annealing 36
    5.8 Using a Statistical Classifier 39
        5.8.1 Statistical Classifier with cascade classification 41
        5.8.2 How to get features from an image 42
    5.9 Present state of the art of the Road Sign Recognition Research 45

6 My Implementation 47
    6.1 Resulting system 47
    6.2 Introduction 49
    6.3 Influences caused by motion 49
    6.4 Practical tests 49
    6.5 Hardware 51
        6.5.1 The Camera 51
        6.5.2 The Computer 52
    6.6 The Program 52
        6.6.1 Internal Segmentation of the Sign 60
        6.6.2 OCR for the Digits 60
        6.6.3 Results 62
    6.7 Trial and error, Experiences learned during the process 64
        6.7.1 Color Segmentation 64
        6.7.2 Filtering/Feature extraction 68
        6.7.3 Using an Edge Image 73
        6.7.4 Compensating for a Rotated/Tilted sign 77
        6.7.5 Discriminators between digits 79

7 The Future of RSR/ITS 82


    Chapter 1

    Introduction

    1.1 Background

Road Sign Recognition (RSR) is a field of applied computer vision research concerned with the automatic detection and classification of traffic signs in traffic-scene images acquired from a moving car. The results of the RSR research effort can be used as a support system for the driver. When the surrounding environment is understood, computer support can assist the driver in advanced collision prediction and avoidance.

Driving is a task based almost entirely on visual information processing. The road signs and traffic signals define a visual language interpreted by drivers. Road signs carry much information necessary for successful driving - they describe the current traffic situation, define right-of-way, prohibit or permit certain directions, warn about risky factors, etc. Road signs also help drivers with navigation.

Two basic applications of RSR are under consideration in the research community - a driver's aid (DSS) and automated surveillance of road traffic devices. It is desirable to design smart car control systems in such a way as to allow the evolution of fully autonomous vehicles in the future. The RSR system is also being considered as a valuable complement to GPS-based navigation systems. The dynamic environmental map may be enriched with road sign types and positions (acquired by RSR), which will increase the precision of vehicle positioning.

Problems concerning traffic mobility, safety and energy consumption have become more serious in most developed countries. The endeavours to solve these problems have triggered interest in new fields of research and applications, such as automatic vehicle driving, in which new techniques are investigated for the entire or partial automation of driving tasks. A recently defined comprehensive and integrated system approach, referred to as the intelligent transportation system (ITS), links the vehicle, the infrastructure and the driver to make it possible to achieve more mobile and safer traffic conditions by using state-of-the-art electronic communication and computer-controlled technology. Over time the ITS research community expects that intelligent vehicles will advance in three primary ways: in the capabilities of in-vehicle systems, in the sophistication of the driver-vehicle interface and in the ability of vehicles to


communicate with each other and with a smart infrastructure [1]. An example of this can be found in the research by DaimlerChrysler [50]. A vehicle detects ice on the road and notifies a radio transmitting station, which broadcasts this information to all other vehicles approaching the area. The vehicle can also transmit the information directly to vehicles moving behind and approaching the area. The first car may have to brake, and it will warn approaching vehicles of this intention so that it is not hit from behind. See figure 1.1.

Figure 1.1: Cooperation between the intelligent infrastructure and intelligent vehicles to warn about a slippery road ahead.

Smart vehicles will be able to give route directions, sense objects, warn drivers of impending collisions (with obstacles, other cars and even pedestrians), automatically signal for help in emergencies, keep drivers alert, and may ultimately be able to take over driving.

ITS technologies may provide vehicles with different types and levels of intelligence to complement the driver. Information systems expand the driver's knowledge of routes and locations. Warning systems, such as collision avoidance technologies, enhance the driver's ability to sense the surrounding environment and help the driver sort and understand all the information passed to him via road signs and other types of road markings.

In the last two decades government institutions have activated initial explorative phases by means of various projects world-wide, involving a large number of research units who worked co-operatively, producing several prototypes and solutions based on rather different approaches.

In Europe, the PROMETHEUS project (PROgraM for a European Traffic with Highest Efficiency and Unprecedented Safety) started this exploration stage in 1986. The project involved more than 13 vehicle manufacturers and several research units from governments and universities of 19 European countries. Within this framework a number of different ITS approaches were conceived, implemented and demonstrated.

In the United States many initiatives were launched to address the mobility problem, involving universities, research centres and automobile companies. After this pilot phase, in 1995 the US government established the


National Automated Highway System Consortium (NAHSC) [3] and launched the Intelligent Vehicle Initiative (IVI) right after, in 1997.

Figure 1.2: A typical city image in a non-cluttered scene.

In Japan, where the mobility problem is even more intense and evident, some vehicle prototypes were also developed within the framework of different projects. Similarly to the US case, in 1996 the Advanced Cruise-Assist Highway System Research Association (AHSRA) was established among a large number of automobile industries and research centres [4], which developed different approaches to the problem of automatic vehicle guidance.

ITS is now entering its second phase, characterised by a maturity in approaches and by new technological possibilities which allow the development of the first experimental products. A number of prototypes of intelligent vehicles have been designed, implemented and tested on the road. The design of these prototypes has been preceded by the analysis of solutions deriving from similar and close fields of research and has produced a great flourishing of new ideas, innovative approaches and novel ad hoc solutions. Robotics, artificial intelligence, computer science, computer architectures, telecommunications, control and automation and signal processing are just some of the principal research areas from which the main ideas and solutions were first derived. Initially the underlying technological devices, such as head-up displays, infrared cameras, radar and sonar, derived from expensive military applications, but thanks to the increased interest in these applications and to the progress in industrial production, today's technology offers sensors, processing systems and output devices at very competitive prices. In order to test a wide spectrum of different approaches, these automatic vehicle prototypes are equipped with a large number of different sensors and computing engines.

    1.2 The problem

The problem of detecting road signs might seem well defined and simple. Road signs occur in standardised positions in traffic scenes, and their shapes, colours and pictograms are known (because of international standards).

To see the problem in its whole complexity we must add additional features that influence the recognition system design and performance: road signs are acquired from a vehicle moving on the (often uneven) road surface at considerable speed. The traffic-scene images therefore often suffer from vibrations, and colour information is affected by varying illumination. Road signs are frequently partially occluded by other vehicles or by objects like trees, lamp poles or other signs. Many objects present in traffic scenes make sign detection hard (pedestrians, other vehicles, buildings and billboards may confuse the detection system with patterns similar to those of road signs). Furthermore, the algorithms must be suitable for real-time implementation. The hardware platform must be able to process the huge amount of information in the video data stream.

There are also variations in the actual pictograms on the signs. For example, an examination of the sign A12 Children reveals that there is a wide difference between the norm and the signs that are used, see figure 1.3. The most common differences are in the ideograms and the widths of the sign border. Some symmetrical signs like No Stopping and End Of Right Of Way are often inverted. The reason is that these alterations do not cause problems to human drivers and hence have been neglected. For automatic road sign recognition this means that the norms cannot be taken as a fundamental basis and the variations have to be dealt with by the program.

    Figure 1.3: Differences between European road signs (sign A12 Children).

An interesting approach would be to integrate a road sign recognition system with a Geographical Information System (GIS), which could back up the recognition system with stored information about the road on which the vehicle is currently driving. This can increase the safety of the recognition system. A recognition of a 110 sign inside a town would then be classified as a false recognition. A test with ISA (a Swedish abbreviation for Intelligent Support system for Adjustment of speed), which integrates a GPS system with a database of speed limits, has been carried out in a public transport bus. The system makes it difficult to accelerate above the legal speed limit; the accelerator resists the attempt with a counter-force. As reported in Metro [49], the driver finds the system helpful and stress-reducing. He says that he is more aware of the speed limits and is not as tempted as before to increase the speed a little if he falls behind schedule. Safety must come first, the time schedule second.


The same thing might apply to all drivers, including those driving personal vehicles. If a DSS system regulates the speed, it is not as tempting to accelerate to speeds over the regulated limit if another car comes up too close behind. Attention can be shifted from the speedometer to the current traffic situation.

Any on-board system for ITS applications needs to meet some important requirements:

- The final system, installed on a commercial vehicle, must be sufficiently robust to adapt to different conditions and changes of environment, road, traffic, illumination, and weather. Moreover, the hardware system needs to be resistant to mechanical and thermal stress.

- On-board systems for ITS applications are safety-critical and require a high degree of reliability: the project has to be thorough and rigorous during all its phases, from requirements specification to design and implementation. An extensive phase of testing and validation is therefore of paramount importance.

- For marketing reasons, the design of an ITS system is driven by strict cost criteria (it should cost no more than 10 % of the vehicle's price), thus requiring a specific engineering phase. Operative costs (such as power consumption) need to be kept low as well, since vehicle performance should not be affected by the use of ITS apparatus.

- The system's hardware and sensors have to be kept compact in size and should not disturb car styling.

- The design of the driver-vehicle interface (the place where the driver interacts physically and cognitively with the vehicle) is critical. When giving drivers access to ITS systems inside the vehicle, designers must not only consider safety (i.e. not overloading the driver's information-processing resources), but also usability and driver acceptance [5]: interfaces will need to be intelligent and user friendly, effective, and transparent to use; in particular, a full understanding of the subtle tradeoffs of multimodal interface integration will require significant research [2].


    Chapter 2

    Sensing Hardware

    2.1 Active versus Passive Sensors

Laser-based sensors and millimetre-wave radars detect the distance of objects by measuring the travel time of a signal emitted by the sensors themselves and reflected by the object, and are therefore classified as active sensors. Their main common drawbacks are low spatial resolution and slow scanning speed. Millimetre-wave radars are, however, more robust to rain and fog than laser-based radars, though more expensive.

Vision-based sensors are defined as passive sensors and have an intrinsic advantage over laser and radar sensors: the possibility of acquiring data in a non-invasive way, thus not altering the environment (image scanning is performed fast enough for ITS applications). Moreover, they can be used for some specific applications for which visual information plays a basic role (such as lane-marking localisation, traffic sign recognition and obstacle identification) without requiring any modifications to road infrastructures. Unfortunately, vision sensors are less robust than millimetre-wave radars in foggy, night or direct-sunlight conditions.

Active sensors possess some specific peculiarities which result in advantages over vision-based sensors in this specific application: they can measure some quantities, such as movement, in a more direct way than vision, and they require less powerful computing resources, as they acquire a considerably lower amount of data. Nevertheless, besides the problem of environment pollution, the wide variation in reflection ratios caused by different factors (such as the obstacle's shape or material) and the need for the maximum signal level to comply with safety rules, the main problem in using active sensors is interference among sensors of the same type, which could be critical for a large number of vehicles moving simultaneously in the same environment, as, for example, in the case of autonomous vehicles travelling on intelligent highways. Hence, foreseeing a massive and widespread use of autonomous sensing agents, the use of passive sensors, such as cameras, obtains key advantages over the use of active ones.

Obviously, machine vision does not extend sensing capabilities beyond human possibilities in very critical conditions (e.g., in foggy weather or at night with no specific illumination), but unlike a human driver it does not fail due to a lack of concentration or due to drowsiness.


    2.2 Vision based intelligent vehicles

Some important issues must be carefully considered in the design of a vision system for automotive applications. In the first place, ITS systems require faster processing than other applications, since vehicle speed is bounded by the processing rate. The main problem that has to be faced when real-time imaging is concerned, and which is intrinsic to the processing of images, is the large amount of data - and therefore computing - involved. As a result, specific computer architectures and processing techniques must be devised in order to achieve real-time performance. Nevertheless, since the success of ITS apparatus is tightly related to its cost, the computing engines cannot be based on expensive processors. Therefore, either off-the-shelf components or ad hoc dedicated low-cost solutions must be considered.

Secondly, in the automotive field no assumptions can be made on key parameters, for example scene illumination or contrast, which are directly measured by the vision sensor. Hence, the subsequent processing must be robust enough to adapt to different environmental conditions (such as sun, rain or fog) and to their dynamic changes (such as transitions between sun and shadow, or the entrance to or exit from a tunnel).

Furthermore, other key issues, such as robustness to the vehicle's movements and drifts in the camera's calibration, must be handled as well. However, recent advances in both computer and sensor technologies promote the use of machine vision also in the intelligent vehicles field. The developments in computational hardware, such as a higher degree of integration and a reduction of the power supply voltage, permit the production of machines that can deliver high computing power with fast networking facilities at an affordable price. Current technology allows the use of SIMD-like processing paradigms even in generations of processors that include multimedia extensions.

In addition, current cameras include new important features that permit the solution of some basic problems directly at the sensor level. For example, image stabilisation can be performed during acquisition, while the extension of camera dynamics allows one to avoid, at least to some extent, the processing required to adapt the acquisition parameters to specific light conditions. The resolution of sensors has been drastically enhanced, and, in order to decrease the acquisition and transfer time, new technological solutions can be found in CMOS sensors, such as the possibility of addressing pixels independently as in traditional memories. Another key advantage of CMOS-based sensors is that their integration on the processing chip seems to be straightforward.

Many different parameters must be evaluated for the design and choice of an image acquisition device. First of all, some parameters tightly coupled with the algorithms concern the choice of monocular versus binocular (stereo) vision and the sensor's angle of view (some systems adopt a multi-camera approach, using more than one camera with different viewing angles, e.g. fish-eye or zoom). The resolution and the depth (number of bits/pixel) of the images have to be selected as well (this also includes the selection of colour versus monochrome images). Other parameters, intrinsic to the sensor, must also be considered. Although the frame rate is generally fixed for CCD devices (25 or 30 Hz), the dynamics of the sensor is of basic importance: conventional cameras allow an intensity contrast of 500:1 within the same image frame, while most ITS applications require a 10 000:1 dynamic range for each frame and 100 000:1 for a short image sequence.
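These contrast figures translate directly into sensor bit depth. A quick back-of-the-envelope check, assuming an idealised linear sensor (my simplification), shows why conventional 8-bit cameras fall short of the ITS requirement:

```python
import math

def bits_needed(contrast_ratio):
    # An idealised linear sensor needs ceil(log2(ratio)) bits to resolve
    # intensities spanning the given contrast ratio.
    return math.ceil(math.log2(contrast_ratio))

print(bits_needed(500))      # 9  - within reach of conventional cameras
print(bits_needed(10_000))   # 14 - per-frame ITS requirement
print(bits_needed(100_000))  # 17 - short-image-sequence requirement
```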


Different approaches have been studied to meet this requirement, ranging from the use of CMOS-based cameras with a logarithmically compressed dynamic [6], [7] to the interpolation and superimposition of values from two subsequent images taken by the same camera [8].
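A logarithmically compressed dynamic can be illustrated with a toy transfer function; the exact curve used by the cameras in [6], [7] is not given here, so this mapping is only an assumption:

```python
import math

def log_compress(intensity, max_intensity=100_000, out_levels=256):
    """Map a high-dynamic-range intensity onto an 8-bit output value
    using logarithmic compression (illustrative transfer curve)."""
    x = max(intensity, 1)  # avoid log(0)
    return round((out_levels - 1) * math.log(x) / math.log(max_intensity))

# A 100 000:1 input range is squeezed into 256 output levels.
for i in (1, 100, 10_000, 100_000):
    print(i, log_compress(i))
```

Each decade of input intensity maps to a roughly equal slice of the output range, which is what lets a single frame cover both sunlit and shadowed regions.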

In conclusion, although extremely complex and highly demanding, computer vision is a powerful means for sensing the environment and has been widely employed to deal with a large number of tasks in the automotive field, thanks to the great deal of information it can deliver (it has been estimated that humans perceive visually about 90 % of the environment information required for driving).

2.3 Advantages and Disadvantages of Computer Vision Approach

Employing computer vision technology in smart vehicle design calls for consideration of all its advantages and disadvantages. Firstly, a vision subsystem incorporated into the DSS may exploit all the information processed by human drivers without any requirements for new traffic infrastructure devices (a hard and expensive task). Smart cars equipped with vision-based systems will be able to adapt themselves to operate in different countries (with often quite dissimilar traffic devices). As the integration of various technologies in the field of traffic engineering has been introduced, the convenience of computer vision usage has become more obvious. We may observe this trend, e.g., in the proceedings of the annual IEEE International Conference on Intelligent Vehicles (IVS), where more than 50 percent of the papers are focused on image processing and computer vision methods.

Obviously, there are also disadvantages of the vision-based approach. Smart vehicles will operate in real traffic conditions on the road, so the algorithms must be robust enough to give good results even under adverse illumination and weather conditions. For example, Fridtjof Stein, main project manager of the Cleopatra project (Clusters of embedded parallel time-critical applications) [37], said that "Reliable optical detection is the biggest hurdle the project must overcome."

It is impossible to assure absolute system reliability, and the system will not be fail-safe. The aim is to provide a level of safety similar to or higher than that of human drivers. Experiments have shown that 60 percent of crashes at intersections and about 30 percent of head-on collisions could have been avoided if the driver had had an additional half-second to react. About 75 percent of vehicular crashes are caused by inattentive drivers.


    Chapter 3

    Processing

    3.1 Processing Strategies

Since sign recognition is generally based on the localisation of specific patterns, it can be performed with the analysis of a single still image. In addition, some assumptions may help and/or speed up the detection process. Due to both physical and continuity constraints, the processing of the whole image can be replaced by the analysis of specific regions of interest only (the so-called focus of attention), in which the features of interest are more likely to be found. This is a generally followed strategy that can be adopted using the results of previously processed frames or by assuming a priori knowledge of the road environment.

In some approaches, in particular, windows of interest (WOIs) are determined dynamically by means of statistical methods. For example, the system developed by LASMEA [9] selects the proper window according to the current state and previously detected WOIs. The search for features is an iterative process where continuous updates of the lane model and of the size of the areas of interest allow the lane detection task to be relatively insensitive to noise.
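The window-of-interest idea can be sketched as follows; the margin value and the rectangle representation are assumptions of mine, not details of the LASMEA system:

```python
def window_of_interest(frame, prev_detection, margin=20):
    """Crop the frame to a window around the previous frame's
    detection, so that later stages process only a region of interest."""
    x, y, w, h = prev_detection  # rectangle found in the previous frame
    x0, y0 = max(0, x - margin), max(0, y - margin)
    x1 = min(len(frame[0]), x + w + margin)
    y1 = min(len(frame), y + h + margin)
    return [row[x0:x1] for row in frame[y0:y1]]

frame = [[0] * 100 for _ in range(80)]        # dummy 100x80 image
woi = window_of_interest(frame, (40, 30, 10, 10))
print(len(woi), len(woi[0]))                  # 50 50
```

Processing the cropped window instead of the full frame is what makes real-time rates attainable on modest hardware.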

Other systems adopt a more generic model of the road. The ROMA vision-based system uses a contour-based method [11]. A dynamic road model permits the processing of small portions of the acquired image, thereby enabling real-time performance. At present, only straight or slightly curved roads without intersections are included in this model. Images are processed using a gradient-based filter and a programmable threshold. The road model is used to follow contours formed by pixels that feature a significant gradient direction value.
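A minimal version of a gradient-based filter with a programmable threshold might look like this; ROMA's actual filter kernel is not specified above, so simple forward differences stand in:

```python
def gradient_threshold(image, thresh):
    """Mark pixels whose gradient magnitude (approximated here with
    forward differences) meets a programmable threshold."""
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = image[y][x + 1] - image[y][x]  # horizontal difference
            gy = image[y + 1][x] - image[y][x]  # vertical difference
            if abs(gx) + abs(gy) >= thresh:
                edges[y][x] = 1
    return edges

# A vertical intensity step is detected as a contour.
img = [[10, 10, 200, 200]] * 3
for row in gradient_threshold(img, 50):
    print(row)
```

Raising or lowering `thresh` trades sensitivity against noise, which is why the threshold is kept programmable.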

3.2 Architectural Issues

In the early years of ITS applications, a great deal of custom solutions were proposed, based on ad hoc, special-purpose hardware. This recurrent choice was motivated by the fact that the hardware available on the market at a reasonably low cost was not powerful enough to provide real-time image processing capabilities. As an example, the researchers of the Universität der Bundeswehr developed their own system architecture: several special-purpose boards were included in the Transputer-based architecture of the VITA vehicle [12]. Others developed or acquired ad hoc processing engines based on SIMD computational


paradigms to exploit the spatial parallelism of images. Among them are the 16k MasPar MP-2 installed on the experimental vehicle NavLab I [13] at Carnegie Mellon University and the massively parallel architecture PAPRICA [16], jointly developed by the University of Parma and the Politecnico di Torino and tested on the MOB-LAB vehicle.

Besides selecting the proper sensors and developing specific algorithms, a large percentage of this first research stage was therefore dedicated to the design, implementation and testing of new hardware platforms. In fact, when a new computer architecture is built, not only do the hardware and architectural aspects - such as the instruction set, I/O interconnections, or computational paradigm - need to be considered, but software issues as well. Low-level basic libraries must be developed and tested along with specific tools for code generation, optimisation and debugging.

In the last few years, the technological evolution has led to a change: almost all research groups are shifting toward the use of off-the-shelf components for their systems. In fact, commercial hardware has nowadays reached a low price/performance ratio. As an example, both the new NavLab5 vehicle from Carnegie Mellon and the ARGO vehicle from the University of Parma are presently driven by systems based on general-purpose processors. Thanks to the current availability of fast internetworking facilities, even some MIMD solutions are being explored, composed of a rather small number of powerful, independent processors, as in the case of the VaMoRs-P vehicle of the Universität der Bundeswehr, on which the Transputer processing system has now been partly replaced by a cluster of three PCs (dual Pentium II) connected via a fast Ethernet-based network [10].

Current trends, however, are moving toward a mixed architecture, in which a powerful general-purpose processor is aided by specific hardware such as boards and chips implementing optical-flow computation, pattern matching, convolution and morphological filters. Moreover, some SIMD capabilities are now being transferred into the instruction sets of the latest generation of CPUs, which have been tailored to exploit the parallelism intrinsic to the processing of visual and audio (multimedia) data. The MMX extensions of the Intel Pentium processor, for instance, are exploited by the GOLD system, which acts as the automatic driver of the ARGO vehicle, to boost performance.

In conclusion, it is important to emphasise that, although the new generation of systems are all based on commercial hardware, the development of custom hardware has not lost significance but is gaining renewed interest for the production of embedded systems. Once a hardware and software prototype has been built and extensively tested, its functionalities have to be integrated in a fully optimised and engineered embedded system before marketing. It is in this stage of the project that the development of ad hoc custom hardware still plays a fundamental role, and its costs are justified through a large-scale market.


    Chapter 4

    Previous work

    4.1 Using knowledge about road signs

Knowledge is available that can be exploited for tackling the problem of road sign detection and recognition in an efficient way. All road signs are designed, manufactured and installed according to tight regulations stated by federal councils:

    colour is regulated not only for the sign category (red = stops, yellow =danger, etc) but also for the tint of the paint that covers the sign, whichshould correspond, with a tolerance, to a specific wavelength in the visiblespectrum. This certainly is a key information but one would be careful inusing it since the standard has been determined according to the controlledillumination that prevailed during the experiments while, in practice, theweather conditions will have a definite impact on the outdoor illuminationand as a result, on the colours as perceived by the cameras. The paint onsigns also deteriorates with time.

    Sign shape and dimensions along with those for pictograms, including textfont and character height are also regulated

• Signs are usually located on the right side of the road, at a distance usually ranging from 2 to 4.5 m from the road edge, which is loosely regulated, with the exception of overhead or clearance signs which appear over the middle lanes. This fact is useful for sign detection since a large portion of the road image can be ignored and thus the processing can be speeded up.

• Signs may appear in various conditions, including damaged, partly occluded, and highlighted by sunlight. Signs may also be clustered, e.g., three or four signs appearing one above or beside the other.

    4.1.1 Colour basics

Colour undoubtedly represents key information for drivers. As a consequence, almost all traffic sign recognition systems found in the literature process colour and acknowledge its importance. Before describing any solution for sign detection based on colour, it is advisable to understand what colour is and how it is dealt with in the computer vision community, particularly with regard to changes in lighting conditions, which is a major factor for road sign recognition systems that must operate in every imaginable outdoor lighting condition.

When we see a red object, we tend to believe that the red colour is an intrinsic property of the object. This is not strictly the case. Such a fundamental property would actually be the reflectance of the surface of the object, which alters the incident light rays so that a portion of the light energy is absorbed while the rest is reflected to our eyes (or the camera). The spectral distribution of the reflected rays conveys the chromatic information, and one will acknowledge that it is dependent not only on the object surface (the interface, the nature of the matter that makes the object and the pigments of the colorant) but also on the spectral distribution of the incident light. The following equation expresses the basic idea:

ρ_k = ∫_visible E(λ) S(λ) R_k(λ) dλ        (4.1)

where ρ_k is a measurement by sensor k, E(λ)S(λ) is the colour signal perceived by the sensor and R_k(λ) is the spectral sensitivity of the sensor. The colour signal is the result of light of spectral power distribution E(λ) hitting the surface of an object with spectral reflectance S(λ).

This implies that, for example, if a traffic sign is lit with sunlight, which is characterised by a rich spectral distribution with much energy toward the blue wavelengths, then its colour, as measured by a sensor, will be different from that of the same sign lit by a car's headlamps, which have an asymmetrical distribution and high energy in the red portion of the spectrum.

Moreover, the colour that human eyes perceive is actually a sensation that is a function of many parameters of a biological nature and of the environment in which the object appears. This is the reason why a sub-domain of the science of colour is concerned with the psychophysics of the phenomenon.

Similarly to the human eye, which has three kinds of receptors (cones) for sensing specific parts of the spectrum, modern CCD cameras perceive colour with three sensors, one for each primary colour: red, green and blue. So an object seen by a camera is represented by a collection of three-coordinate (R, G, B) pixels. The data space containing the pixels is called the colour space. Apart from the RGB colour space, there are many ways to represent colour depending on the application. For example, the YIQ colour scheme, which is based on linear transformations of the RGB coordinates, is used for broadcasting. In this colour scheme, Y stands for luminance, and the I and Q coordinates carry the chrominance information.

Directly using a threshold value in the RGB space is generally not applicable, since variations in the ambient light intensity will shift the same colour toward the white corner (r, g, b) = (255, 255, 255) or (for low-energy light) toward the black corner (r, g, b) = (0, 0, 0).

    4.1.2 The HSI colour space

An interesting space is called HSI (Hue, Saturation and Intensity) and it has the distinctive feature of being similar to the way colours are perceived by humans. The first coordinate, hue (H), represents the actual colour or tint information.


Saturation (S) indicates how deep or pure the colour is, e.g., red is deeper than pink. Intensity (I) is simply the amount of light. RGB coordinates can be mapped to HSI space with the use of non-linear transformations (from Digital Image Processing, Gonzalez et al. [48]):

H = { θ          if B ≤ G
    { 360° − θ   if B > G        (4.2)

with

θ = cos⁻¹( ½[(R − G) + (R − B)] / [(R − G)² + (R − B)(G − B)]^(1/2) )        (4.3)

The saturation component is given by

S = 1 − 3/(R + G + B) · min(R, G, B)        (4.4)

Finally, the intensity component is given by

I = (1/3)(R + G + B)        (4.5)

Even though variations exist to these transformations, the HSI space is always viewed as a conically shaped space with the position of a point expressed in terms of cylindrical coordinates, i.e., the triplet (H, S, I) corresponds to cylindrical coordinates (θ, r, z). The HSI colour space is certainly appealing for colour processing because chromatic information is represented by the hue coordinate, and varying light conditions are absorbed (to some extent) by the intensity coordinate. However, there are difficulties in using this space:

• There is a singularity in the hue dimension along the grey-level axis (R = G = B).

• The hue coordinate is unstable near the same axis, i.e., small perturbations in the RGB signals may cause strong variations in hue.

    Properties

• Hue is multiplicative/scale invariant: hue(R, G, B) = hue(aR, aG, aB) for all a such that (aR, aG, aB) ∈ [0, 255] × [0, 255] × [0, 255].

• Hue is additive/shift invariant: hue(R, G, B) = hue(R + b, G + b, B + b) for all b such that (R + b, G + b, B + b) ∈ [0, 255] × [0, 255] × [0, 255].

The second point underlines that hue is invariant under saturation changes, which means that the tint of an object can still be recovered even if the object is lit with an intensity-varying illumination source. In fact, Perez and Koach [17] show that the hue coordinate is unaffected by the presence of highlights and shadows on the object, as long as the illumination is white (i.e. equal energy in the red, green and blue colour bands).


    4.2 Object detection and recognition using colour

Colour-based object detection and recognition has a definite appeal to researchers and developers in machine vision. With colour, it is possible to efficiently perform tasks such as object sorting, quality control, even counting, or structure measuring in medical imaging, etc. Various methods have been published about colour object detection.

One approach is to use neural networks that are trained to recognise patterns of colours. For example, Krumbiegel et al [18] used a neural net as a classifier for recognising traffic signs within a region of interest. Another idea is to compare colour histograms. Dubuisson and Jain [19] exploited this idea in the design of a car matching system for measurements of travel times. A first camera grabs an image of a passing car and the computer saves its histogram to a database. A second camera is set up further down the road to get the image of another car, hopefully the same car that was imaged by the first camera. The computer uses the colour histogram for indexing the image into the database, and the potential candidates are then analysed for shape similarity.

One could imagine a database of road signs along with their histograms, and a sign detection module that would analyse various portions of the image in order to give an estimate of the probability of finding a sign at each location. The drawback of this is the sensitivity to lighting changes, as it deals with RGB pixels.

    Projects using colour information for image segmentation:

• Ghica et al [20] use look-up tables on the RGB colour space to filter out unwanted colours.

• Kehtarnavaz [21] exclusively processes stop signs. The colours of a road scene are mapped to the HSI colour space, in which sub-spaces have been defined according to a statistical study on stop signs (i.e. 3 ≤ hue ≤ 56, saturation > 15 units, intensity < 84 units; the units are not specified by the authors but are probably in the [0, 255] interval). A binary image is then constructed from the pixels falling into the sub-space.

• An interesting work is that of Priese and Rehrmann [22], who proposed a new parallel segmentation method based on region growing. The approach is in fact hierarchical in the sense that subsets of pixels (in the HSI space) are grouped at different levels so that every object can be represented by a tree. The work has been incorporated into a complete traffic sign recognition system [23] for which colour classes have been set up according to the types of signs to be recognised.

• The authors of [24] also work in HSI space. They apply a region growing process to get colour patches, which in turn undergo a shape analysis in a later stage.

• Krumbiegel et al [18] explore the connectionist approach for road sign detection. An artificial retina is made up of three multilayer neural networks for the three RGB planes and a control unit. The nets act as correlators by giving a high output if the region of interest covered by the retina contains a sign. If no sign is present, the control unit forces the retina to shift to another portion of the image.


• Kellmeyer and Zwahlen [25] follow the same line in using the same type of neural net for road scene segmentation. Inputs to the net are colour differences, defined as proportion of red − proportion of green, and 2 × proportion of blue − proportion of yellow, between each pixel and its neighbours. The eight outputs correspond to a palette of the eight colours red, orange, yellow, green, blue, violet, brown and achromatic, which are the most discriminating colours for traffic signs. All pixels are processed by the net. However, since only warning signs are sought, only the yellow patches are further processed for shape analysis.
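As an illustration of this family of approaches, a thresholding step in HSI space in the spirit of the stop-sign sub-space of [21] can be sketched as follows; the threshold values here are placeholders mirroring the figures quoted above, not calibrated ones:

```python
def hsi_mask(pixels, h_range=(3, 56), s_min=15, i_max=84):
    """Binary segmentation of a sequence of (H, S, I) pixels.

    The default thresholds are placeholders in the spirit of [21];
    a real system would calibrate them on sign imagery.
    """
    h_lo, h_hi = h_range
    return [1 if (h_lo <= h <= h_hi and s > s_min and i < i_max) else 0
            for (h, s, i) in pixels]
```

The resulting binary image would then be passed on to the shape-analysis stage.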

    4.3 Detection using Shape

The search for shape-based sign detection algorithms can take advantage of the tremendous research efforts in the field of object recognition, where techniques have been developed for scene analysis in robotics, solid (3D) object recognition, part localisation in CAD databases and so on. The intrinsic complexity of the task to be performed leads one to think that a model-based approach is needed. It would yield an elegant solution for object detection in cluttered scenes and allow new shapes to be added to the set of shapes under tracking. There are a number of difficulties for shape-based sign detection:

• The signs appear in cluttered scenes, which increases the number of candidates, since many objects typical of urban environments can be confused with road signs from a shape point of view: building windows, commercial signs (e.g. real-estate signs), various man-made objects (e.g. cars, mailboxes), etc.

• The signs do not always have a perfect shape (corners may be torn, other signs may be attached to them, etc.) and some are tilted with respect to the vertical axis.

• A significant problem with sign detection is related to the variance in scale: signs get bigger as the vehicle moves toward them. The detection module should obviously handle these variations.

• Not only do the signs have a varying size, they also appear relatively small: 40-50 pixels wide, at the most.

• Another difficulty is linked to the way the signs are captured by the acquisition system. There may be a non-zero angle between the optical axis of each camera and the normal vector to the sign surface. This angle may be as high as 30 degrees, depending on the distance between the sign and the camera.

    4.4 Algorithm considerations

    A final algorithm should meet the following requirements:

• The detection module should operate in real-time. One should thus look for efficient, low-complexity matching algorithms.


• The detection module should be flexible and adaptable to different conditions (such as the speed of the vehicle, day-night conditions, exits of tunnels, changes in light, etc.).

The selection of an object recognition scheme for the detection of road signs based on their shape will have to address a number of issues, such as the type of object representation, the ways to create a model for a new object, and how to deal with uncertainties, e.g., imperfect models. Scale invariance is a desirable property of object representations. Past studies of shape detection in road sign recognition include:

• When looking for stop signs, Kehtarnavaz et al. [21] extract the red components of the image, perform edge detection and then apply the Hough transform to characterise the sides of the sign. A specific criterion is used to confirm the presence of a stop sign.

• The approach chosen by Saint-Blancard [26] is more refined. The acquired image strongly reveals red components due to the presence of a red cut-off filter in front of the camera lens. The image is filtered with an edge detection algorithm (viz. the differential Nagao gradient) and the result is scanned for contour follow-up using Freeman coding. The appropriate contours are then analysed with respect to a set of features consisting of perimeter (number of pixels), outside surrounding box, surface (inside/outside contour within the surrounding box), centre of gravity, compactness (aspect ratio of the box), polygon approximation, Freeman code, histogram of the Freeman code, and average grey level inside the box. Classification based on these features is done by a neural network (restricted Coulomb energy) or an expert system. The author reports good results, especially in terms of processing speed (contour detection in 0.6 s using a DSP-based vision board).

• The work by Priese [23] is more model-based, since basic shapes of traffic sign components (circles, triangles, etc.) are predefined with 24-edge polygons describing their convex hulls. At the detection stage, patches of colour extracted by the segmentation algorithm are collected in object lists, with one list for each sign colour (blue, white, red, ...); all objects are then encoded, and assigned a probability (based on an edge-to-edge comparison between the object and the model) to characterise their membership of specific shape classes. A decision tree is constructed for the final classification. As far as processing times go, the analysis (colour and shape) of all objects in a 512x512 image takes about 700 ms on a Sparc10 machine. The concept is interesting, but the shape modelling part is weak, as is the decision making process.

• A novel shape detection technique has been proposed by Besserer et al [27] and it has been integrated into a traffic sign recognition system. The method classifies chain-coded objects according to evidence (in the Dempster-Shafer sense) supplied by knowledge sources. These sources are a corner detector, a circle detector and a histogram-based analyser which informs on the number of main directions; the classes are the circle, the triangle and the polygon. When an unknown shape is presented to the module, each knowledge source studies the pattern and computes a basic probability assignment attached to the feature found. These probabilities are combined with the Dempster rule, and a semantic network builds up the belief for each class that the unknown shape is a member of that class. The authors claim that this method is flexible and reliable, but point out that the limiting factor is the segmentation quality, which can be improved using colour information.


    Chapter 5

    Techniques

    5.1 Interesting Techniques

Techniques referred to as rigid model fitting in [28] may be promising. Many of these use specific model representations and a common matching mechanism, called geometric hashing, for indexing into a model database. The idea of geometric hashing is to build a hash table based on the chosen representation (geometrical information) for each object; at the recognition stage, the hash table is used to make a correspondence between a collection of features and a potential object model. Examples:

• Lamdan and Wolfson [29] use a model description in terms of interest points, which are described in a transformation-invariant coordinate frame and stored in a hash table along with the model itself. For recognition, the image is scanned for interest points, whose coordinates are then transformed to be transformation invariant; votes are compiled for models that are pointed to through the hash table, and those that collected enough votes are more closely examined against the portions of the image that triggered the votes. The principle can be generalised to collections of edges or surfaces. The authors claim that this method is very efficient even when objects are occluded (affine transformations are naturally handled), but if the number of features is too high many false alarms might arise.

• Stein and Medioni [62] have presented an idea using super-segments for polygonal approximation of 2D shapes. A super-segment is made up of a fixed number of adjacent segments, and is encoded as a key to a hash table (the coding is a collection of angles between pairs of segments plus an eccentricity parameter that is similar to a compactness index). At the recognition stage, the image is filtered with a Canny edge detector, boundary tracing is performed and a line fitting algorithm is used for polygonal approximation. The resulting super-segments trigger hypotheses via the hash table (in the form: super-segment i belongs to object model j) and a consistency check allows the identification of the best matching model. It is mentioned that the approach can recognise models in the presence of noise, occlusion, scale, rotation, translation and weak perspective. An example is given where the system manages to localise aircraft in an aerial image of an airport (a very cluttered scene).

These algorithms might tend to be slow, and indeed in this crude form they are. However, in road sign recognition more knowledge can be used to make the implementation efficient and real-time. Colour information could be taken into account, as well as knowledge about the position in the image, and so on.
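The hashing-and-voting mechanism can be sketched for 2-D point features. This is a deliberately simplified toy version of the Lamdan-Wolfson idea (one basis pair per model for brevity, similarity invariance via complex arithmetic, coarse quantisation into hash bins), not a production implementation:

```python
from collections import defaultdict

def invariant(p, b0, b1):
    """Coordinates of point p in the frame where basis b0 -> 0 and b1 -> 1.

    Using complex arithmetic, (p - b0) / (b1 - b0) is invariant under
    translation, rotation and uniform scaling of the whole point set.
    """
    z = (complex(*p) - complex(*b0)) / (complex(*b1) - complex(*b0))
    return (round(z.real, 1), round(z.imag, 1))   # quantise into hash bins

def build_table(models):
    """models: dict name -> list of (x, y) points. One basis pair per
    model here; the full scheme enumerates all basis pairs."""
    table = defaultdict(list)
    for name, pts in models.items():
        b0, b1 = pts[0], pts[1]
        for p in pts[2:]:
            table[invariant(p, b0, b1)].append(name)
    return table

def recognise(table, scene_pts):
    """Try every ordered basis pair in the scene and vote via the table."""
    votes = defaultdict(int)
    for i, b0 in enumerate(scene_pts):
        for j, b1 in enumerate(scene_pts):
            if i == j:
                continue
            for p in scene_pts:
                if p in (b0, b1):
                    continue
                for name in table.get(invariant(p, b0, b1), []):
                    votes[name] += 1
    return max(votes, key=votes.get) if votes else None
```

A scene that is a translated and rotated copy of a stored model accumulates the most votes for that model, which is then verified against the image region that triggered the votes.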

    5.2 Sign Recognition

To satisfy the real-time demand, most sign recognition techniques use fast recognition techniques (e.g. neural networks) instead of more elaborate, more precise but slower approaches.

• A common approach is template matching. For example, Piccioli et al [30] use a template matching algorithm. All signs are stored in a database. Each potential sign is normalised in size and compared to every template of the same shape. Normalised cross-correlations are used.

• Akatsuka and Imai [31] restricted the recognition to speed signs, so their content recognition algorithm is in fact a digit classifier. A histogram analysis helps to determine the position and size of each digit, then the actual matching is done through correlation with digits from standard speed signs.

• Estable et al [32] relied on Radial Basis Function networks for pictogram recognition. A collection of sign images with hand-defined regions of interest enclosing the signs are used for training the neural nets. Some colour processing is performed in order to enhance sign features such as the coloured frame around signs, pictograms, etc. After training, each net has the task to detect a specific coloured frame or a specific pictogram. The decision is taken following inspection of the best responses supplied by the RBF networks.

• Structural strategies include those that decompose the pictogram into its basic elements (circle, arrow, etc.), assess their exactness with respect to the model and combine various measures to yield a global similarity/dissimilarity index. Classification using Fourier descriptors for each component might also be used. Dynamic programming on separate elements could also be possible, considering that the search space is well defined within the sign boundaries.

• Global strategies do not try to decompose the pictogram into salient features. Instead, the symbol image is recoded into a compact representation by means of a data compression technique such as a neural network, vector quantization, or the Karhunen-Loève (K-L) transform, and this representation is used as an input to a standard classifier.
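The normalised cross-correlation used by such template-matching schemes can be sketched as a zero-mean correlation over equal-sized pixel sequences:

```python
import math

def ncc(patch, template):
    """Zero-mean normalised cross-correlation of two equal-length
    pixel sequences; returns a value in [-1, 1]."""
    n = len(patch)
    ma = sum(patch) / n
    mb = sum(template) / n
    da = [a - ma for a in patch]
    db = [b - mb for b in template]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 0.0 if denom == 0 else sum(x * y for x, y in zip(da, db)) / denom
```

The score is invariant to brightness and contrast changes of either patch, which is what makes it preferable to a plain sum of products for outdoor imagery.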


5.3 Multi-feature Hierarchical Template Matching Using Distance Transforms

From Gavrila [14]. Since matching is a central problem in pattern recognition, an efficient implementation is crucial for computer vision algorithms that base the recognition phase on these kinds of approaches.

    Figure 5.1: (a) Original traffic scene, from [14]

    Figure 5.2: (b) Template, from [14]

    Figure 5.3: (c) Edge image, from [14]

This method uses distance transforms (DT), which transform a binary image consisting of feature and non-feature pixels, e.g. from an edge image, into a DT image where each pixel denotes the distance to the nearest feature point, see figure 5.5. Similarly, the object that is to be matched is described using the same scheme. Matching proceeds by correlating the template against the DT image. The correlation value is a measure of similarity in image space. Particular DT algorithms depend on a variety of factors. The use of a Euclidean distance metric or not is one of these factors. Another possibility is to use the chamfer-2-3 metric. See for example Borgefors [33] for more information.


    Figure 5.4: (d) DT image, from [14]

    Figure 5.5: A binary pattern and its Euclidean Distance Transform, from [14]

Matching a template T with an image I consists of computing the distance transform of I and correlating the template T against it. The matching distance is determined by the pixel values of the DT of the image which lie under the "on" pixel values of the template image. One possible measure of the similarity at the template position is the chamfer distance:

D_chamfer(T, I) = (1/|T|) Σ_{t ∈ T} d_I(t)        (5.1)

where |T| denotes the number of features and d_I(t) denotes the distance between feature t and the closest feature in I. A template is considered matched if the distance measure D(T, I) is below a user-supplied threshold. A coarse-to-fine approach can be used to speed up the correlation testing. The resolutions of T and I can be reduced and matching tried with the smaller images. If a sufficiently good match is found, the matching can be continued at higher resolution where there might be a match. Also, if several templates are to be matched in the same image, they too can be grouped together and replaced by prototypes. For circles this can be concretised by replacing all circles with ranges of circles, where the prototype for each range will be the circle with radius equal to the median value of the interval. These can be grouped as a binary tree, each level further reducing the intervals.
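Equation (5.1) can be sketched directly. The brute-force version below recomputes each nearest-feature distance on the fly; a real implementation would precompute the DT image once and simply read d_I(t) from it:

```python
import math

def chamfer_distance(template_feats, image_feats):
    """Average distance from each template feature point to its nearest
    image feature point, i.e. equation (5.1) with a Euclidean metric.

    Brute force, O(|T| * |I|); a real system precomputes the DT image.
    """
    if not template_feats:
        raise ValueError("empty template")
    total = 0.0
    for (tx, ty) in template_feats:
        total += min(math.hypot(tx - ix, ty - iy) for (ix, iy) in image_feats)
    return total / len(template_feats)
```

The distance is 0 when the template lies exactly on image features and grows as it drifts off the edge map; a match is declared when it falls below the user-supplied threshold.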

Figure 5.6: Matching using a DT, from [14]

To reduce the number of false positives one can also divide the image into several feature spaces. One possibility is to only consider edges in one direction at a time. The directions can be found by dividing the unit circle into M bins:

{ [ (i/M)·2π, ((i+1)/M)·2π ] | i = 0, …, M − 1 }        (5.2)

Thus a template edge with edge orientation θ is assigned to the typed template with index

⌊ (θ/2π)·M ⌋        (5.3)

Errors in the measurement of edge orientation must be considered, so each edge point is assigned to a range of templates within the tolerance of the error in angle measurement. The distance measure will be the sum of the distance measures between the M templates and the M feature images at the examined location.

    Figure 5.7: Results using DT transforms, from [14]


5.4 Shape recognition using convex hull approximation

The discrimination of the shape can be done by approximating the shape by a polygon. The shape of an object can be encoded by its convex hull. A good approximation of the convex hull can be a 24-sided regular polygon. All neighbouring edges are at 15 degrees to each other. A circle will then be characterised by 24 edges of nearly the same length. A square will have four sides of approximately the same length at 90-degree angles to each other.
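One way to sketch such a 24-direction approximation is via supporting points: for each of the 24 directions, 15 degrees apart, take the contour point with the largest projection. A circle then yields 24 distinct supports while a square yields only its four corners. This is an illustrative variant, not necessarily the exact encoding used in the systems cited above:

```python
import math

def supports_24(points):
    """Supporting contour point in each of 24 directions (15 deg apart)."""
    poly = []
    for k in range(24):
        a = k * 2.0 * math.pi / 24.0
        dx, dy = math.cos(a), math.sin(a)
        poly.append(max(points, key=lambda p: p[0] * dx + p[1] * dy))
    return poly

def shape_hint(points):
    """Crude circle/polygon discrimination by counting distinct supports."""
    distinct = len(set(supports_24(points)))
    return "circle-like" if distinct >= 20 else "polygon-like"
```

A full classifier would also examine the edge lengths and angles of the resulting polygon, as described above.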

    5.5 Temporal Integration

Simple tracking techniques involve few assumptions about the world and rely solely on what is observed in the images to estimate object motion and establish correspondence. More sophisticated techniques model camera geometry and vehicle speed to achieve better motion estimates. For example, [34] considers a vehicle driving straight with constant velocity and uses a Kalman-filter framework to track the centres of detected traffic signs. Once correspondence is established over time, integration of recognition results is done by simple averaging techniques where larger weights are given to recognition results of traffic signs closer to the camera.
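The weighted averaging step can be sketched as follows; weighting each observation by the detected sign width in pixels is one plausible proxy for "closer to the camera", not the exact scheme of [34]:

```python
def fuse_track(observations):
    """Fuse per-frame classification scores for one tracked sign.

    observations: list of (class_scores: dict, sign_width_px: float).
    Wider (closer) detections get proportionally larger weights.
    """
    fused = {}
    total_w = sum(w for _, w in observations)
    for scores, w in observations:
        for cls, s in scores.items():
            fused[cls] = fused.get(cls, 0.0) + (w / total_w) * s
    return max(fused, key=fused.get)
```

A low-confidence early detection is thus overruled by a confident close-range one once correspondence has been established.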

    5.6 Colour classification with Neural Networks

An article by Rogahn [35] describes experiments using a neural network to classify pixels into the output classes "sign pixel" and "non-sign pixel". The use of a neural network for the colour classification has the benefit that a better solution might be found than a human designer would devise. The design of the network is as follows:

Input Layer: 3x3x3 (x, y, colour space)

Hidden Layer 1: 6 nodes

Hidden Layer 2: 3 nodes

Output Layer: 3 (colour space) with range 0-1 (0 = not a sign, 1 = is a sign)

The training algorithm used is hybrid delta-bar-delta backpropagation with skip. A different colour space is used for the input, similar to YUV and the human visual system: YGrBy, where Y = red + green, Gr = green − red, By = blue − Y. A neural network for detecting edges had a similar design but with one output node. The networks are trained using noise-free and noisy images. It seems, however, that changes in illumination in the image can reduce the classification result, especially for the colour classification.
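The input colour space can be sketched as a direct transformation, following the definitions given above (Y = red + green, Gr = green − red, By = blue − Y):

```python
def rgb_to_ygrby(r, g, b):
    """Opponent-style colour space used in [35]:
    Y = R + G, Gr = G - R, By = B - Y."""
    y = r + g
    return y, g - r, b - y
```

Like YUV, it separates a luminance-like channel (Y) from two opponent chromatic channels, which matches the structure of the human visual system.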


5.7 Recognition in Noisy Images Using Simulated Annealing

Fast object recognition in noisy images using simulated annealing is described in [36].

Simulated Annealing is a search technique for solving optimisation problems. Its name originates from the process of slowly cooling molecules to form a perfect crystal. The cooling process and its analogous search algorithm is an iterative process, controlled by a decreasing temperature parameter. If the search problem involves different kinds of parameters, the annealing algorithm is analogous to the cooling of a mixture of liquids, each of which has a different freezing point.

A model image M(x, y) should be matched in the image I(x, y). From the model M(x, y), templates T(x, y) can be generated by choosing parameters that describe a transformation of M into T. The parameters used are a rotation parameter and two sampling parameters s_x and s_y that define the number of samples along the template's coordinate axes (i.e. the scale factors). Thus new templates can be generated online from a given model image.

Figure 5.8: Template generation from a model template, [36]

The recognition problem is defined as follows. An object in the image I is defined to be recognised if it correlates highly with a template image T of the hypothesised object. This template image T is a transformed version of the model of the hypothesised object. Model images can be stored in a library. A correlation coefficient is defined which measures how accurately a subpart of the image can be approximated by template T. Since a model normally does not fill a squared region, only the non-zero pixels in the template are compared with the pixels in the image.

The dimension of the search space is determined by the number of possibilities for position, size, shape and orientation of the object to be found. The number of possibilities for the centroid of the object in the image is O(n²) for an n×n image. Assuming that the width and height of the object can be approximated by sampling the model along two perpendicular axes, the number of possibilities to approximate the size and shape of the object is also O(n²). The number of possible angles is very large, but since the image is discrete it can be assumed that the number of angles is O(n). Thus the size of the search space is O(n⁵) for an n×n image. An exhaustive search would take too long.

Terminology from the radar and sonar literature is used to describe the search space. The search space is called an ambiguity surface. A peak in the surface means that the correlation coefficient is high for a particular set of parameters. There may be several peaks in an ambiguity surface. If the template and the object in the image match perfectly, the cross-correlation between template and image results in a peak in the ambiguity surface which is the global optimum. An iterative search such as steepest descent risks getting stuck in local minima. Simulated Annealing is able to jump out of local minima and find the globally best correlation value.

Figure 5.9: Traffic scene and its ambiguity surface for all possible translations using fixed scaling and rotation parameters. Simulated Annealing is used to find the best correlation value (here the darkest pixel value), [36]

At each iteration of the algorithm new templates are generated online by randomly perturbing the values for location, sampling and rotation from the current values. If the new correlation coefficient r_j increases over the previous coefficient r_{j−1}, the new parameters are accepted in the j-th iteration (as in the gradient method). Otherwise they are accepted if

e^(−(E_j − E_{j−1})/T_j) > ε        (5.4)

where ε is randomly chosen in [0, 1], T_j is the temperature parameter and E_j = 1 − r_j is the cost function in the j-th iteration. For a sufficiently high temperature this allows jumps out of local minima. Take T_j = T_0/j as the cooling temperature for the j-th update of the temperature, where T_0 is the initial temperature. The criterion for stopping is chosen to be a limit L on the search length.


An algorithm that randomly perturbs all parameters at the same time has poor convergence properties. Therefore, at a specific temperature, the tests for the location, sampling and rotation angle are not combined. Good results are obtained by using simulated annealing only for the location parameters and a gradient descent with large enough random perturbations for the remaining parameters.
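The acceptance rule (5.4) together with the cooling schedule T_j = T_0/j can be sketched on a generic cost function; this is a toy one-dimensional illustration, not the full template-matching search:

```python
import math
import random

def anneal(cost, x0, step=0.5, t0=1.0, search_len=2000, seed=0):
    """Minimise cost(x) with simulated annealing.

    Improvements are always accepted; a worse candidate is accepted
    when exp(-(E_new - E_old) / T_j) > eps, eps uniform in [0, 1],
    with cooling schedule T_j = t0 / j and a search-length limit.
    """
    rng = random.Random(seed)
    x, e = x0, cost(x0)
    best_x, best_e = x, e
    for j in range(1, search_len + 1):
        t = t0 / j                                   # cooling schedule T_j = T_0 / j
        y = x + rng.uniform(-step, step)             # random perturbation
        ey = cost(y)
        if ey < e or math.exp(-(ey - e) / t) > rng.random():
            x, e = y, ey
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e
```

On a simple quadratic cost the search settles near the minimum; in the recognition setting the state would be the location, sampling and rotation parameters and the cost E_j = 1 − r_j.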

A false match occurs when one template from the image gets a higher correlation value in the simulated annealing process than the correct one. This can happen if the information content of a template is not very high. For example, the yield sign can give high correlation values wherever there is a horizontal bar with dark regions above and below. The information content can be determined using the coherence area (see Betke [36] for details). The yield sign has a large coherence area (197), meaning that even if the sign is moved a bit from a perfect match it would still give a high correlation value. Lots of similar objects will give high correlation values for this model. The stop sign has a coherence area of only 56, meaning that only similar objects and good transformation values give high correlation values.

The method produces good results even on noisy images. The authors advocate the use of template matching for recognition tasks. Since the templates can be constructed on-line, the method is well suited for recognition tasks that involve objects with scale and shape variations. The method described is for grey-scale images, and an extension to colour images could be interesting to pursue. However, it might be too slow for real-time applications: on a 112 x 77 pixel image their implementation found the sign in 15 seconds after 300 iterations. This is still a drastic improvement over the exhaustive search, which took more than 10 hours.

    5.8 Using a Statistical Classifier

A pattern recognition problem can often be modelled as a statistical decision problem, a theoretical setting where the Bayesian paradigm applies. One wishes to classify a feature vector x ∈ R^D into one of C mutually exclusive classes, knowing that the class of x, denoted ω, takes values in Ω = {ω_1, . . . , ω_C} with probabilities P(ω_1), . . . , P(ω_C), respectively, and that x is a realisation of a random vector X characterised by a class-conditional probability density function f(x|ω), ω ∈ Ω. The task is to find a mapping d: R^D → Ω such that the expected loss R(d) = E{λ(d(X), ω)}, called the risk, is minimal. Here λ(ω_i, ω_j) is the loss or penalty incurred when the decision d(x) = ω_i is made and the true pattern class is in fact ω_j, j = 1, 2, . . . , C.

It can be assumed, without loss of generality, that λ(ω_i, ω_j) = 0 for i = j and λ(ω_i, ω_j) = 1 for i ≠ j; then R(d) = P(d(x) ≠ ω) is called the probability of error. The optimal rule d_opt (called the Bayes rule), which minimizes R(d), is of the following form:

d_opt(x) = arg max_{1≤i≤C} P(ω_i|x)    (5.5)

where

P(ω_i|x) = P(ω_i) f(x|ω_i) / Σ_{j=1}^{C} P(ω_j) f(x|ω_j),    i = 1, . . . , C    (5.6)


are the posterior probabilities. R_opt then denotes the Bayes risk (the risk of the rule in equation 5.5).

In practice we rarely have any information about the distribution of (x, ω); instead we have a set of samples T_N = {(x_i, ω_i)}_{i=1}^{N}, i.e. a sequence of pairs (x_i, ω_i) distributed like (x, ω), where x_i is a feature vector and ω_i is its class assignment. The set T_N of samples is called the training set.

An empirical classification rule d_N is a function of X and T_N. It is natural to construct the rule by replacing P(ω_i|x) in 5.5 by some estimate P̂(ω_i|x). Such a rule can be defined as

d_N(x) = arg max_{1≤i≤C} P̂(ω_i|x)    (5.7)
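Rule 5.7 reduces to picking the class with the largest estimated prior-times-density product, since the denominator of equation 5.6 is shared by all classes. A minimal sketch, with an illustrative 1-D Gaussian density estimate standing in for whatever estimator is actually trained:

```python
import math

def gaussian(mu, sigma):
    """Illustrative 1-D Gaussian class-conditional density estimate
    (a stand-in; any trained density estimator could be plugged in)."""
    def f(x):
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return f

def bayes_decision(x, priors, densities):
    """Empirical Bayes rule (eq. 5.7): choose the class maximising
    P(omega_i) * f(x | omega_i). The shared denominator of eq. 5.6
    does not change the argmax, so it is left out."""
    return max(priors, key=lambda c: priors[c] * densities[c](x))
```

With equal priors, a sample near a class mean is assigned to that class; unequal priors shift the decision boundary accordingly.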

    5.8.1 Statistical Classifier with cascade classification

It has been demonstrated that road signs cannot be efficiently recognised with a monolithic classifier. Firstly, the number of distinct classes would exceed a feasible limit: Kotek ([38] pp. 178) recommends at most 10 classes and at most 20 features. Moreover, the higher the number of classes, the longer the classifier's decision takes, since a decision has to be evaluated for every class one by one.

The solution is a cascade classifier [39], where the recognition problem is divided into several small recognition tasks using specific a priori knowledge. The subproblems are then covered by tiny classifiers. The classification process has the form of an n-ary tree with classifiers at each node and a verification process at the leaves. The number of classes to be considered at a leaf is greatly reduced compared to the number of initial classes, which gives a higher classification speed and more failsafe results. The misclassification risk between separate groups of road signs is also minimised. If the algorithm rejects a sample before the leaf classifier, the partial results still contain important information about the road sign group (contrary to the monolithic classifier, which offers only an all-or-nothing answer). Finally, the particularities within each group may be highlighted using the most descriptive features.

In a statistical approach to pattern recognition, Bayes rule 5.6 is used in the classifier's design. The a priori probability P(ω_i) may be replaced by sample frequencies, but the probability density f(x|ω_i) must be estimated from a training data set. For this there are two possibilities: if the shape of the probability density is known but not its parameters, the classifier is a parametric classifier; if the density shape is unknown, the classifier is nonparametric. Since the distribution type for road signs is unknown, a nonparametric classifier must be used. The probability density and its parameters are learned from the training data. Hence the classification quality depends mainly on the selection of characteristic training samples and on the parameter estimation. An example of a nonparametric approach is the kernel classifier, see [40] for details.

    5.8.2 How to get features from an image

There exists no general theory yet for passing from raw image data to quality features. Successfully implemented methods from the literature may be used. For road signs, the entities carrying information are colour combinations, shape and ideogram. The features selected in this classifier are:


Unscaled Spatial Moment MU

MU(m, n) = Σ_{j=1}^{J} Σ_{k=1}^{K} (x_k)^m (y_j)^n F(j, k)    (5.8)

The coordinates x_k and y_j are defined as:

x_k = k - 1/2    (5.9)

y_j = J + 1/2 - j    (5.10)

where the coordinate transformation is described by Pratt [41].

Scaled Spatial Moment M

M(m, n) = MU(m, n) / (J^n K^m) = (1 / (J^n K^m)) Σ_{j=1}^{J} Σ_{k=1}^{K} (x_k)^m (y_j)^n F(j, k)    (5.11)

Unscaled Spatial Central Moment UU

Central moments are made invariant against translation by referring to the shape centroid (x̄_k, ȳ_j). The moment M(0, 0) equals the sum of the pixel values in the image.

UU(m, n) = Σ_{j=1}^{J} Σ_{k=1}^{K} (x_k - x̄_k)^m (y_j - ȳ_j)^n F(j, k)    (5.12)

where x̄_k and ȳ_j are defined as:

x̄_k = MU(1, 0) / MU(0, 0)    (5.13)

ȳ_j = MU(0, 1) / MU(0, 0)    (5.14)

Scaled Spatial Central Moment U

U(m, n) = (1 / (J^n K^m)) Σ_{j=1}^{J} Σ_{k=1}^{K} (x_k - x̄_k)^m (y_j - ȳ_j)^n F(j, k)    (5.15)

The centroid coordinates x̄_k and ȳ_j are defined using the unscaled moments:

x̄_k = MU(1, 0) / MU(0, 0)    (5.16)

ȳ_j = MU(0, 1) / MU(0, 0)    (5.17)


Normalized Unscaled Central Moment V

V(m, n) = UU(m, n) / [M(0, 0)]^α    (5.18)

where

α = (m + n)/2 + 1    (5.19)

The normalization of the unscaled central moments has been proposed by Hu.

Hu's Invariants h_i (the first four; more computationally complex invariants also exist)

These absolute invariants (introduced by Hu) are invariant under translation, scaling and rotation.

h_1 = V(2, 0) + V(0, 2)    (5.20)

h_2 = [V(2, 0) - V(0, 2)]^2 + 4[V(1, 1)]^2    (5.21)

h_3 = [V(3, 0) - 3V(1, 2)]^2 + [V(0, 3) - 3V(2, 1)]^2    (5.22)

h_4 = [V(3, 0) + V(1, 2)]^2 + [V(0, 3) + V(2, 1)]^2    (5.23)
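The normalized central moments and the four invariants above can be computed directly from an image array. The sketch below uses pure Python and my own helper names, with zero-based pixel coordinates instead of the offsets of equations 5.9-5.10 (central moments are unaffected by a constant coordinate offset); the test demonstrates the translation invariance.

```python
def central_moment(F, m, n):
    """Central moment of order (m, n) of image F (a list of rows),
    computed about the shape centroid, hence translation invariant."""
    s = float(sum(sum(row) for row in F))
    xbar = sum(k * v for row in F for k, v in enumerate(row)) / s
    ybar = sum(j * v for j, row in enumerate(F) for v in row) / s
    return sum((k - xbar) ** m * (j - ybar) ** n * v
               for j, row in enumerate(F) for k, v in enumerate(row))

def V(F, m, n):
    """Normalized central moment (eqs. 5.18-5.19): U(m, n) / M(0, 0)^alpha."""
    s = float(sum(sum(row) for row in F))
    return central_moment(F, m, n) / s ** ((m + n) / 2 + 1)

def hu_first_four(F):
    """Hu's first four invariants (eqs. 5.20-5.23)."""
    v = lambda m, n: V(F, m, n)
    h1 = v(2, 0) + v(0, 2)
    h2 = (v(2, 0) - v(0, 2)) ** 2 + 4 * v(1, 1) ** 2
    h3 = (v(3, 0) - 3 * v(1, 2)) ** 2 + (v(0, 3) - 3 * v(2, 1)) ** 2
    h4 = (v(3, 0) + v(1, 2)) ** 2 + (v(0, 3) + v(2, 1)) ** 2
    return h1, h2, h3, h4
```

Shifting a binary blob inside the frame leaves all four values unchanged, which is exactly why these features suit sign ideograms whose position in the cropped region varies.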

    Affine Moments Invariants I

    These four affine moment invariants were introduced by Flusser [43]:

I_{2,2} = (V(2, 0)V(0, 2) - V(1, 1)^2) / V(0, 0)^4    (5.24)

I_{3,4} = (V(3, 0)^2 V(0, 3)^2 - 6V(3, 0)V(2, 1)V(1, 2)V(0, 3) + 4V(3, 0)V(1, 2)^3 + 4V(2, 1)^3 V(0, 3) - 3V(2, 1)^2 V(1, 2)^2) / V(0, 0)^10    (5.25)

I_{3,2} = (V(2, 0)(V(2, 1)V(0, 3) - V(1, 2)^2) - V(1, 1)(V(3, 0)V(0, 3) - V(2, 1)V(1, 2)) + V(0, 2)(V(3, 0)V(1, 2) - V(2, 1)^2)) / V(0, 0)^7    (5.26)

I_{4,2} = (V(4, 0)V(0, 4) - 4V(3, 1)V(1, 3) + 3V(2, 2)^2) / V(0, 0)^6    (5.27)

Normalized Size

n_size = M(0, 0) / (J · K)    (5.28)

    A simple feature but with large distinctive strength for some signs.

    Center of gravity

    Calculated as in 5.13 and 5.14.

Eigenvalues of U

λ_1 = (1/2)[U(2, 0) + U(0, 2)] + (1/2) √(U(2, 0)^2 + U(0, 2)^2 - 2U(2, 0)U(0, 2) + 4U(1, 1)^2)    (5.29)

λ_2 = (1/2)[U(2, 0) + U(0, 2)] - (1/2) √(U(2, 0)^2 + U(0, 2)^2 - 2U(2, 0)U(0, 2) + 4U(1, 1)^2)    (5.30)

λ_max = max[λ_1, λ_2]    (5.31)

λ_min = min[λ_1, λ_2]    (5.32)
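Equations 5.29-5.30 are the closed-form eigenvalues of the 2x2 second-order central-moment matrix [[U(2,0), U(1,1)], [U(1,1), U(0,2)]]. A small sketch (the function name is my own) that can be checked against a hand-computed symmetric matrix:

```python
import math

def inertia_eigenvalues(u20, u02, u11):
    """Closed-form eigenvalues (eqs. 5.29-5.30) of the symmetric 2x2
    moment matrix [[u20, u11], [u11, u02]], returned as (max, min)."""
    tr = u20 + u02
    disc = math.sqrt(u20 ** 2 + u02 ** 2 - 2.0 * u20 * u02 + 4.0 * u11 ** 2)
    l1 = 0.5 * tr + 0.5 * disc
    l2 = 0.5 * tr - 0.5 * disc
    return max(l1, l2), min(l1, l2)
```

For the matrix [[2, 1], [1, 2]] this gives the well-known eigenvalues 3 and 1, and for a diagonal matrix it simply returns the diagonal entries sorted.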


Eigenvalues Ratio RA

RA = λ_min / λ_max    (5.33)

Standard Deviation Moments m

These moments have been introduced by Mertzios and Tsirikolis in the article [42]. The 2-D moment of order p, q is defined as

m_pq = (1 / (L·M)) Σ_{x=1}^{L} Σ_{y=1}^{M} ((x - x̄)/σ_x)^p ((y - ȳ)/σ_y)^q F(x, y),    p, q = 1, 2, . . .    (5.34)

where L and M denote the image dimensions. The basic idea is that the moment is normalised with respect to the standard deviation, which is defined as:

σ_x = √( (1/(L·M)) Σ_{x=1}^{L} Σ_{y=1}^{M} (x - x̄)^2 F(x, y) )    (5.35)

The average x̄ is the centroid coordinate. Moments defined in this way are invariant under translation and magnification of the image, but not under rotation. The authors recommend using the moments m_30, m_40, m_50, m_60, m_70, m_80 and their counterparts m_03 etc. The moments may be calculated on either binary or grey-level images.

Compactness

Compactness is calculated as:

comp = P^2 / (4πA)    (5.36)

where P stands for the object perimeter and A denotes the object's area. For circles the compactness comes close to unity, while elongated objects have values comp ∈ (1.0, ∞). This allows easy separation of circles from other objects. The perimeter can be found using standard mathematical morphological operations.
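Equation 5.36 is easy to check numerically: for a circle of radius r, P = 2πr and A = πr² give exactly comp = 1, while a unit square (P = 4, A = 1) gives 4/π ≈ 1.27. A one-line sketch:

```python
import math

def compactness(perimeter, area):
    """Compactness (eq. 5.36): P^2 / (4*pi*A). Close to 1 for circles,
    growing without bound as the shape becomes more elongated."""
    return perimeter ** 2 / (4.0 * math.pi * area)
```

In practice the discrete perimeter estimate makes even ideal circles score slightly above 1, so a tolerance band rather than an exact test against 1 would be used.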

5.9 Present state of the art of road sign recognition research

Many algorithms for road sign detection and classification have been introduced to date. Road signs are often used as convenient real-world objects for algorithm testing purposes. Papers presenting the successful recognition of particular road signs by some special algorithm can be found in the literature; these papers are a valuable source of information about different recognition approaches.

    A lot of articles that test various algorithms on the detection problem canbe found in the reference list at the end of this document.

The use of optical correlators has also been reported by research groups:


Application of optical multiple-correlation to recognition of road signs: the ability of multiple-correlation, Matsuoka, Taniguchi, Mokuno, Optical Computing, Proceedings of the International Conference, 1995

On-board optical joint transform correlator for real-time road sign recognition, Guibert, Keryer, Servel, Attia, Optical Engineering, Vol. 34, Iss. 1, 1995

Scale-invariant optical correlators using ferroelectric liquid-crystal spatial light modulators, Wilkinson, Petillot, Mears, Applied Optics, Vol. 34, Iss. 11, 1995

Extensive research efforts have been funded by Daimler-Benz (now Daimler-Chrysler), whose research groups have reported papers concerning colour segmentation, parallel computation structure design and more. The detection system is designed to use colour information for the sign detection. The classification stage is covered by various neural-network or nearest-neighbour classifiers. The presence of colour is essential, and the system is unable to operate on images with weak or missing colour information. The most important advantage of the research groups supported by Daimler-Chrysler is their library of 60 000 traffic scene images used for system training and evaluation.


    Chapter 6

    My Implementation

    6.1 Resulting system

The algorithm which has been implemented is quite fast; the average time for detection and classification of signs is about 200 ms on a 1 GHz computer. The detection phase is based on colour. This detection is quite rudimentary, but it has proven to be both fast and able to segment both daytime and night-time images. It might still fail if there are too many sign-like colours in the image. It will also fail if the sign is too small for a proper classification to succeed; this limit is reached when the sign is approximately 17 pixels wide. The worst case for processing time occurs when there are many red and yellow regions in the image, which produce lots of segments that have to be analysed. The worst-case processing time during the tests was about 400 ms.

Many methods have been tried during the realisation of this algorithm, and I have described the most important of these in the section on trial and error.


    Figure 6.1: Screenshot Fifty


    6.2 Introduction

For a successful analysis, the sign has to have a minimum width in number of pixels. This imposes requirements on the acquisition of the image in terms of camera resolution, distance from the camera to the sign, directivity of the camera, and the factors imposed by the lens and zoom options of the camera.

The standard road sign is 64 cm wide. The minimum number of pixels over the width of the sign for a possible recognition is about 18. A probable horizontal resolution of the camera system is 640 pixels, and the field of view of the camera can be assumed to be 55 degrees. Then, at 5 meters from the car, the sign will be 79 pixels wide and a classification should be possible. The minimum pixel width of 18 pixels occurs at about 21 meters from the car.
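The pixel widths quoted above follow from simple pinhole geometry: at distance d, a camera with horizontal field of view φ sees a strip 2·d·tan(φ/2) metres wide, which is mapped onto the image width in pixels. A sketch (the function name and default parameters are my own, taken from the figures in the text):

```python
import math

def sign_width_pixels(distance_m, sign_width_m=0.64,
                      fov_deg=55.0, image_width_px=640):
    """Projected pixel width of a sign at the given distance:
    the visible strip is 2*d*tan(fov/2) metres wide, and the sign
    occupies its proportional share of the image width."""
    visible_m = 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)
    return sign_width_m / visible_m * image_width_px
```

With these defaults the sign projects to roughly 79 pixels at 5 m and just under 19 pixels at 21 m, matching the figures in the paragraph above.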

6.3 Influences caused by motion

As the sign gets closer to the car it occupies a bigger part of the view plane, but it also becomes slightly more rotated: the angle between the line of sight from the car and the normal of the sign increases. This means that the sign gets more and more distorted as the car approaches. Equivalently, when the sign is far away from the car the angle between the line of sight and the normal is small, so the sign is less distorted. There is thus a trade-off between the distance to the sign and the angle of distortion that should be optimised, probably by experimentation.

The motion of the car can also cause vibrations in the images, and blurring can occur because of the motion of the objects captured in the images. The tests will reveal how severe these effects are and how they can be dealt with.

    6.4 Practical tests

The tests should consist of evaluating the quality of the images that can be retrieved. For this purpose a camera that is at the top of the line among today's cameras should be used; there is no point in using a camera only suitable for distances up to a couple of meters. The tests will evaluate whether the resolution is adequate for a possible detection. Different camera variables will also be tested. Most cameras have settings or options for brightness and gamma corrections; these can also be applied as a pre-processing step before the detection algorithm. Different resolution settings can also be tried. A smaller resolution can mean faster detection, since the number of pixels to be scanned is smaller. Of course, this can also make detection more difficult, since the number of pixels covering the sign is reduced. Focus settings will also be tried; the best focal distance might be 5 meters from the car, since the sign is at its best position there. Since we are receiving a flow of images we must try to choose the best images to analyse. With today's computers it is not possible to do an extensive analysis of all images (up to 30 per second), so we must quickly scan through a few images per second and try to detect whether there might be a sign in the image. Lots of heuristics can be applied here to speed up the search as much as possible. When a possible sign is detected, another algorithm


can be applied to determine which sign it is. The practical tests should produce lots of material that can be used for testing different heuristics.

These tests have shown that the effects of blurring caused by motion and by vibrations of the car are not that severe. A successful detection when moving at 70 or 90 km/h is possible without additional motion compensation. Otherwise, there are optical stabilisation systems that can improve the image at the camera level; fast systems now exist for hand-held cameras, and similar systems could be used for road sign recognition.


    6.5 Hardware

    6.5.1 The Camera

The camera used is a Logitech QuickCam Pro 3000 (figure 6.2), which has a true 640x480 pixel resolution and a colour depth of 24 bits. It has a lens aperture of F/2.0 and manual focus (you have to turn the ring on the camera by hand). This means that the focus must be set to the specific range where the sign is expected to be located. This has proven not to be a big problem: as long as the focus is set for a few meters, the sign will appear with enough clarity for a successful classification even if it is tens of meters away. The focus setting is not one of the most critical factors of the system.

    Figure 6.2: The camera used for the experiments

The chosen camera can be accessed using standard TWAIN interfaces. Even if this seems to be a quite slow solution, as demonstrated by the initial testing, we can still evaluate the performance of the detection system by analysing the time needed to process the images captured from the camera. A reasonable demand on a system of this type is that the sign should be detected before the car passes it. A speed limit sign imposes a speed restriction which takes effect at the sign, so the system can warn the driver if he is going too fast at that point. Preferably the warning should come a bit sooner, to give the driver time to slow down. A crude detection algorithm might find possible signs at a long distance, and before it is able to correctly decode the sign it might warn the driver that a speed limit sign may be approaching, thus giving the driver adequate time to react.

    6.5.2 The Computer

The computer used for the experiments is a portable 450 MHz machine with 128 MB of RAM, a 2x16 KB internal (code+data) cache and a 256 KB external cache. The testing is done under Windows 2000. This computer and the camera have been mounted in a testing vehicle for image gathering and testing purposes.


    6.6 The Program

    Figure 6.3: Simplified class diagram showing the active classes

This represents almost 10 000 lines of code. The algorithm works largely as follows.

The first thing done with an image is to extract the regions containing colours that belong to signs. The red regions are found by first extracting all pixels in the image that meet the following requirements:

- The red component of the RGB value is at least 35 (of 255).

- The ratio between blue and red is at most 0.69: blue/red < 0.69, so the influence of blue is limited.

- The ratio between green and red is at most 0.64: green/red < 0.64.
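The red-pixel test can be written down directly from these requirements. This is a sketch with hypothetical names; the production code presumably scans the whole frame rather than testing single pixels, and switches threshold sets as described below.

```python
def is_red_candidate(r, g, b, r_min=35, br_max=0.69, gr_max=0.64):
    """Red-pixel test: red component at least r_min, and the blue/red
    and green/red ratios below the given limits. The defaults are the
    daytime thresholds from the text; the relaxed night-time set would
    be r_min=20, br_max=0.76, gr_max=0.74. The r_min check also keeps
    the ratio divisions safe (r is never zero when it passes)."""
    if r < r_min:
        return False
    return b / r < br_max and g / r < gr_max
```

A strongly red pixel such as (120, 30, 30) passes, while a dark pixel or one with too much green relative to red is rejected.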


    Figure 6.4: Localizing signs

If too few pixels meeting these requirements are found, the lighting conditions in the image may be such that lower limits are required, so the values are changed to 20 for the red value, 0.76 for blue/red and 0.74 for green/red. Even lower values might be needed for images where a very bright sky dominates and the iris is almost closed, so that the colours in the sign become very dark; then the values are lowered even more. If still too few pixels meeting the requirements are found, the lighting conditions may be very bad, but a segmentation is still attempted after a dilation of the pixels found, to increase the probability that a potential sign is found.

If too many pixels are found, it is probable that the signs in the image are relatively big, or that the colour classification was not successful in selecting only the best matching pixels. We can then erode the binary image to increase the probability of a correct segmentation in the following steps.

The binary image is segmented and overlapping regions are joined. If joined regions represent two signs, this will be detected in subsequent steps. If we did not find any region that is bigger than the smallest possible sign, the pixels


    Figure 6.5: Classifying signs

found are dilated and the image is resegmented.

Even though this is a rudimentary and heuristic method, it has proven to be fast and successful even in poor lighting conditions.

    Figure 6.6: Red regions marked.


The classification of yellow pixels is done in the same fashion. Here the minimum-value limits apply to both red and green, and blue should not be too big in comparison with the other values.

    Figure 6.7: Yellow regions marked.

The two segmented images are merged so that potential signs can be marked. Ideally we would like to find a yellow region that covers about 75 % of an overlapping red region. This is rarely the case, since signs may be over-segmented or under-segmented. The procedure used is as follows:

For all yellow regions do
    For all red regions do
        If the yellow region is within the red region, then add a sign region using the yellow region extended by a normal sign border, which is 16 % of the height of the yellow region. Also add the red region if it is approximately square shaped.
        If some yellow regions merely overlap the red region, a new sign region is formed by joining the red and yellow regions.

Yellow regions are also compared to one another, to find signs that have been split somehow in the segmentation process and need to be put back together. Red regions which lie close to yellow regions are also examined, so that an under-segmented sign can still be found by extending the region to include the probable sign area.
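Two of the geometric tests in the procedure above can be sketched with axis-aligned bounding boxes (x, y, w, h). The box representation and function names are a simplifying assumption for illustration; the actual region representation in the program may differ.

```python
def contains(outer, inner):
    """True if bounding box `inner` lies entirely inside `outer`;
    boxes are (x, y, w, h) tuples."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh

def extend_by_border(yellow, border_frac=0.16):
    """Grow a yellow inner region on every side by the expected sign
    border, 16 % of the region height (rounded to whole pixels here)."""
    x, y, w, h = yellow
    b = int(round(border_frac * h))
    return (x - b, y - b, w + 2 * b, h + 2 * b)
```

For a 20x25 yellow region the border becomes 4 pixels, so the candidate sign box grows to 28x33, which is then handed on to the classification stage.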

The expected number of possible segmentation regions is small, so the correctness of the segmentation is not verified at this stage, except that too small or too big segments are removed, as well as those extending beyond the borders of the image.


    Figure 6.8: Example of a segmentation of a partly hidden sign


Figure 6.9: Example of a segmentation of a sign backgrounded by a red ceiling


    Figure 6.10: Example of a segmentation of a nighttime image


    6.6.1 Internal Segmentation of the Sign

Next follows the internal segmentation, i.e. finding the digits that should be there if the sign is a speed sign.

All the pixels in the internal region of the sign area are sorted and the darkest 30 % are kept. These are likely to include only the pixels belonging to the digits, since the yellow background is lighter (higher values in the RGB space). The kept pixels are segmented and labelled into regions. Ideally, each digit would end up as one segment.
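The "darkest 30 %" selection is a simple order-statistics threshold: sort the intensities and cut. This is a sketch with hypothetical names; the real code presumably keeps the pixel positions, not just their intensities, so that the surviving pixels can be segmented into digit regions.

```python
def darkest_fraction(pixels, fraction=0.3):
    """Keep the darkest `fraction` of grey-level intensities from a
    flat list of pixel values (at least one value is always kept)."""
    n = max(1, int(len(pixels) * fraction))
    return sorted(pixels)[:n]
```

Equivalently, the threshold itself is the value at the 30th percentile of the region's intensity distribution, so the cut adapts to overall image brightness.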

Possible digit regions are searched for within the segmented region, starting at the possible locations of speed sign digits. If no regions are found that are big enough and in the right place for standard digits, small regions are joined until the most likely digit region is found.

    Each digit is magnified by a factor of 4 to faci