
Efficient Learning of Pre-attentive Steering in a Driving School Framework∗

Reinis Rudzits Nicolas Pugeault†

December 24, 2014

Abstract

Autonomous driving is an extremely challenging problem and existing driverless cars use non-visual sensing to palliate the limitations of machine vision approaches. This paper presents a driving school framework for learning incrementally a fast and robust steering behaviour from visual gist only. The framework is based on an autonomous steering program interfacing in real time with a racing simulator (TORCS): hence the teacher is a racing program having perfect insight into its position on the road, whereas the student learns to steer from visual gist only. Experiments show that (i) such a framework allows the visual driver to drive around the track successfully after a few iterations, demonstrating that visual gist is sufficient input to drive the car successfully; and (ii) the number of training rounds required to drive around a track reduces when the student has experienced other tracks, showing that the learnt model generalises well to unseen tracks.

1 Introduction

Imagine you are driving on a countryside road. You steer the car keeping it on your side of the road, slow down in curves, shift gears accordingly. When another car is in front, you adapt your driving to match its speed, and if a wide truck comes in front, you correct your lane position to avoid collision. Now consider that you are crossing a town. In addition to these tasks, you need to handle intersections, other cars, cyclists and pedestrians, traffic lights and signs. All of these tasks are performed routinely and effortlessly by experienced drivers, even to the point that driver inattention is named as a major cause of road accidents. In contrast, and despite significant progress over the last decades, artificial systems struggle to approach human performance in any of these tasks, let alone all of them. Approaches to autonomous driving can be traced back to the 70s (e.g., VITS [12], ALVINN [7]; see [5] for a review), up to state-of-the-art systems such as the Stanley robot [11], the Google car¹ or Oxford's RoboCar UK²; for example, the UK is planning to test them in selected cities as early as 2015. Yet today, all working systems must rely on extensive arrays of sensors (LIDAR, GPS, aerial images, road maps) to palliate the insufficiencies of machine vision. Despite the difficulties it entails, vision-based driving remains the most promising way to spread driverless cars outside of city centres and motorways to rural areas and countryside roads: first, because most of the existing infrastructure and signalisation in place is visual; second, because vision offers the advantage of a fast, passive sensor that allows for early detection of hazards. Unfortunately, vision-based autonomous driving is still far from our reach, and significant progress in machine vision is required.

∗ This article appeared in KI - Künstliche Intelligenz, 2014, http://dx.doi.org/10.1007/s13218-014-0340-1. The final publication is available at link.springer.com.
† Corresponding author. Centre for Vision, Speech and Signal Processing, University of Surrey, GU2 7XH, Guildford, UK. [email protected]
¹ https://plus.google.com/+GoogleSelfDrivingCars/posts
² http://mrg.robots.ox.ac.uk/robotcar/


Figure 1: The tracks from the TORCS simulator that were used in this work.

One aspect that sets vision-based driving apart from other computer vision problems is the vast amount of visual information that is continuously perceived, and the high variability of the environment: driving differs in cities versus countryside or motorways, lane markings can be faded or nonexistent, and it is impractical to predict a priori all sorts of road users that a car may encounter (cars, motorcyclists and trucks, but also pedestrians, cyclists in London, sheep in Scotland, wild ponies in Dartmoor, etc.). In addition, any driving system is required to analyse visual information swiftly in order to act with minimal latency: a delayed response when driving at 60 mph can have lethal consequences. Mainstream computer vision approaches, for example running detectors over scanning windows in images, require high processing power and are ill suited to this low-latency requirement.

In contrast to these approaches, the recent availability of large image collections on the internet has motivated computer vision scientists to look for faster approaches. Drawing inspiration from the pre-attentive capabilities of the human visual system, Oliva & Torralba proposed a holistic image descriptor based on the low-frequency components of images' Fourier decomposition, to encode the gist of a visual scene [6]. They demonstrated that such a coarse holistic encoding was sufficient for classifying images between categories such as open or closed, or indoor or outdoor. Because such gist descriptors discard all high-frequency components, they are a plausible approximation of human pre-attentive perception based on peripheral vision. In addition to this, gist descriptors are computationally attractive because they only process low-resolution versions of the images, which makes them well suited for low-latency applications such as driving.
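To make the idea concrete, the following toy snippet (ours, not Oliva & Torralba's implementation, which pools filter-bank energies over a spatial grid) keeps only the lowest-frequency Fourier magnitudes of a grey-scale image as a crude holistic descriptor:

```python
import numpy as np

def toy_gist(grey: np.ndarray, k: int = 8) -> np.ndarray:
    """Toy holistic descriptor: keep only the k x k lowest spatial
    frequencies of the image's 2-D Fourier transform (magnitudes only)."""
    spectrum = np.fft.fftshift(np.fft.fft2(grey))  # zero frequency at centre
    cy, cx = spectrum.shape[0] // 2, spectrum.shape[1] // 2
    half = k // 2
    low = spectrum[cy - half:cy + half, cx - half:cx + half]
    return np.abs(low).ravel()  # discard phase and all high frequencies
```

Whatever the exact implementation, the key property is the same: the descriptor is cheap to compute and summarises the whole scene at a coarse scale.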

The first application of visual gist in the robotics domain was by Siagian & Itti [10], where visual gist was used for robot localisation (but not steering). Pugeault & Bowden [8] demonstrated that detectors based on visual gist could predict driving contexts such as 'crossroads', 'urban' versus 'non-urban' and 'single-lane' versus 'motorway'. Additionally, it was shown that the driver's actions, such as pressing the brake pedal or turning the steering wheel, could also be predicted to a large extent. A later article [9] showed that a human driver's steering angle could be predicted accurately using a Random Forest regression approach on the extracted gist descriptors. The results were demonstrated on a robotic platform that could steer itself at moderate speed on a narrow track, and on driving data recorded in a real car on a countryside road. These studies demonstrated that (i) the majority of human driving actions can be explained from visual gist only, and (ii) such models have the capacity to adapt to large variations in the environment: lane markers were learnt as cues when visible, whereas other cues were learnt otherwise [9]. Both of these works used batch learning approaches, however, where all training examples were provided at training time. This is a limitation because of the extensive variability in driving environments and situations, and missing training examples can cause the system to steer off the road. To make matters worse, informative situations, such as taking a narrow curve or a hairpin bend, are uncommon, which means that hundreds of hours of driving may be required before all informative situations are featured in the training set. This is impractical both from the perspective of data storage and of training algorithms.


In the following, we overcome this limitation by devising a driving school in a racing simulation program. In this approach, the 'student' is the pre-attentive driving program, and the 'teacher' is another program that has perfect knowledge of the track and how to steer around it, and does not rely on vision for this.

2 Introducing the virtual driving school

In order to implement a driving school for an artificial vision-based program, two main challenges need to be addressed: safety and training. First, in a real-life driving school, the risk of giving the wheel to an untrained driver is alleviated by the assumption that the learner has enough common sense and caution. There is no such comfort with an algorithm: assessing algorithm performance on a racing track at a speed of 70 mph under varying environmental conditions would pose significant safety issues, notwithstanding material damage. Second, a driving school requires an experienced driver, the teacher, to take over and correct the student when they go wrong. Although we could imagine using a human teacher, it is easier and faster for the system to learn from another program that is known to be a good and safe driver, and that uses additional sensors besides vision as well as world knowledge. For these reasons, we devised our driving school experiment inside a virtual environment, using an artificial driver as tutor.

2.1 Simulated environment

In this work, we build our driving school on an open-source racing simulator called TORCS [13]. The video game engine provides a convenient perception/action framework as well as performance information such as the car's position on the road. Additionally, given that we are only using visual gist as input, the 3D graphics are virtually indistinguishable from real-life videos. Using an open-source program allows us to easily modify the simulator's code in order to (i) transmit visual data and controls to the driving program, (ii) override the player's steering control with our program, and (iii) check for failure cases by monitoring the position of the car on the road, and override the student's controls with the teacher's when this happens.

Figure 2: Visual gist extraction (HOG) from the car's cockpit: each frame is converted to grey-scale, down-scaled to 160×120, and HOG features are extracted into a feature vector.

TORCS provides us with realistic car and road physics and a variety of racing tracks: Figure 1 shows the selection of 16 tracks that we used in this experiment, featuring large variations in weather, landscape, layout, difficulty and markings. Finally, TORCS also provides automated racing programs (or 'bots') that steer the car around the track based on perfect knowledge of the track shape and of the position on the road. In this project we use a bot developed by Bernhard Wymann, called 'damned', as the teacher for our system [13]. Note that this work only considers the problem of high-speed steering for road following; therefore, in all experiments described in this paper, the system was racing on an empty track with no other cars or obstacles (as would an actual driver when training). Also, anyone could drive around a racing track at low speed, as it allows the driver plenty of time to correct errors and poor steering choices. In contrast, high-speed driving requires quick and accurate steering to remain on the track, challenging even expert drivers. Hence, we set the car's speed to 70 mph in all experiments, to ensure that successful driving around the track is an indicator of skilful steering, while ensuring that an average human player could also succeed.


2.2 Visual gist

As explained before, we want our system to learn a model of pre-attentive driving, and therefore we ensure that it only perceives the visual gist of the view from the car's cockpit, as in [8, 9]. In [9], two descriptors were considered for regressing steering: the GIST descriptor from [6] and the related Histogram of Oriented Gradients (HOG) descriptor [3]. HOG is a popular feature descriptor in machine vision thanks to its efficiency, originally designed for human detection in images. It consists of calculating the image gradients at all pixel locations and pooling the gradients' orientations into histograms over a coarse grid. In this work we make use of HOG descriptors calculated on 8 × 8 grids over 160 × 120 images, which is an efficient approximation of visual gist [9], allowing us to process the simulator's visual input at 20 frames per second (see Figure 2). Formally, we define our visual gist extraction from images I ∈ I as a process φ : I → X, where X is the space of our HOG features.
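A minimal sketch of this extraction step, assuming scikit-image (the paper does not specify its HOG implementation or the number of orientation bins; 9 bins is the common default):

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog
from skimage.transform import resize

def extract_gist(frame_rgb: np.ndarray) -> np.ndarray:
    """phi: I -> X. Grey-scale the cockpit frame, down-scale it to
    160x120, and pool gradient orientations over an 8x8 grid of cells."""
    grey = rgb2gray(frame_rgb)
    small = resize(grey, (120, 160), anti_aliasing=True)  # (rows, cols)
    # 15x20-pixel cells tile the 120x160 image into exactly 8x8 cells
    return hog(small, orientations=9,
               pixels_per_cell=(15, 20),  # (rows, cols) per cell
               cells_per_block=(1, 1),
               feature_vector=True)       # 8*8*9 = 576-dim gist vector
```

With these settings the descriptor stays small enough to compute at the 20 fps frame rate quoted above.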

2.3 Steering angle regression

The learning is performed using random regression forests, as in [9]. Random forests were initially proposed by [1, 2], and consist of learning a randomised collection of decision trees from a dataset. Random forests can perform regression on large datasets with high-dimensional visual gist observation vectors, are efficient to train, and can predict steering values in real time; they are therefore ideally suited for this study. Formally, a random forest regressor ψ is a mapping from the space of visual features X to a steering angle θ ∈ R: ψ : X → R. A steering forest ψ_D will be defined from a training set D composed of N pairs (x_i, θ_i), where x_i ∈ X is a HOG feature extracted at time t_i and θ_i ∈ R is the corresponding steering value decided at time t_i by the teacher program. In practice, random forest training time typically increases superlinearly with the number of training examples N; hence the aim of our approach is to add example pairs to D parsimoniously. We refer to [9] for details of the random forest learning approach.
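For illustration, a minimal sketch using scikit-learn's random forest regressor (the paper does not report forest hyper-parameters such as the number of trees; the values below are placeholders):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def train_steering_forest(X: np.ndarray, theta: np.ndarray):
    """psi_D: X -> R, fitted on N pairs (x_i, theta_i) where x_i is a
    gist feature vector and theta_i is the teacher's steering angle."""
    forest = RandomForestRegressor(n_estimators=100, n_jobs=-1)
    forest.fit(X, theta)  # X: (N, 576) gist features, theta: (N,) angles
    return forest

# At race time, each frame's gist yields one steering prediction:
# angle = forest.predict(gist.reshape(1, -1))[0]
```
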

Figure 3: Diagram of the driving school framework built on top of the racing simulator: TORCS provides the race image to the student (the autonomous driving system) and the car position to the teacher (the demonstration robot); a student/teacher override signal selects whether the student's or the teacher's steering controls the car.

Figure 4: Flow diagram of the incremental learning approach: for each race step in which the system failed, the 100 previous images and the demonstrator's 100 previous steering responses are retrieved and added to the training dataset; once no failure steps remain, the system is re-trained.

2.4 Incremental learning

This work focuses on incremental re-training by adding failure cases to the training set after each race. Incrementally adding failure cases to update a learnt model is a common approach in visual tracking (see, e.g., [4]). The rationale is that the object's appearance is likely to change over time due to lighting and viewpoint, and that it is impossible to define a complete set of training examples a priori. Therefore, a more successful approach is to update the model of the tracked object at each frame with new success and failure cases. The same rationale is used in this work: we start by learning a simple random forest regression from a minimal training sample (one race around the track demonstrated by the teacher), and update this training set after each race by adding the failure cases.

In our experimental set-up, the car speed is fixed at 70 mph, and the steering can be controlled either by the student, using the learnt random forest regressor, or by the teacher, using the TORCS bot program: the student controls the steering alone until the car gets off the road, then control is passed to the teacher to correct the car's position. This is illustrated in Figure 3, where the 'student/teacher override' signal switches the car's control from student to teacher when the car goes off track.

Figure 5: Student learning on a single curve: (a) the trajectories followed by the system after one, two or three rounds of training on the same curve. Each time the trajectory touches the road boundary (denoted by shaded circles in the figure), the teacher took over the steering and brought the car back on the road; hence runs 1 and 2 both failed twice, and run 3 was successful. Panel (b) shows the actual steering control for runs 2 and 3 on this curve: the blue curve is the student's control and the purple one is the teacher's correction when the student failed.

In addition to correcting the car's position, the teacher then provides feedback to the student in the form of correct steering controls for the 5 seconds that preceded the incident. This duration of 5 seconds is arbitrary and chosen because, at a speed of 70 mph, it will include most, if not all, steering information leading to the failure (it corresponds to a distance of ≈ 155 meters). The system's performance is unaffected by small variations of this parameter, although shorter episodes will require more retraining, while significantly longer episodes will pad the training set with less relevant examples and slow down each re-training.

In practice, every time the student drives the car off the track: (1) the 100 frames (i.e., 5 seconds) prior to the failure are saved as a new training episode, along with the steering suggested by the teacher for these frames; (2) the teacher takes over and brings the car back on the track; and (3) the student resumes driving around the track. This process is illustrated in Figure 4. When the student finally finishes racing around the track, all failure cases and the associated teacher's steering are added to the training set of the student's regression model, and the random forest is re-trained. Both stages of experimenting and training are performed iteratively on the same track until the student manages to race 10 consecutive laps without a mistake. Figure 5 illustrates the control signal generated by the student (b), along with the corresponding trajectories (a), for one curve.
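Putting Sections 2.2–2.4 together, the following sketch shows the shape of this loop. It is a hypothetical reconstruction: `track`, `teacher` and `retrain` are assumed interfaces standing in for the modified TORCS engine, the 'damned' bot and forest re-training, none of which are published as code here.

```python
from collections import deque

def driving_school(track, teacher, extract_gist, retrain,
                   window=100, target_clean_laps=10):
    """Hypothetical sketch of the incremental learning loop (Figure 4).
    `track`, `teacher` and `retrain` are assumed interfaces."""
    dataset = list(teacher.demonstrate_lap(track))   # one demonstrated lap
    forest = retrain(dataset)                        # initial student model
    clean_laps = 0
    while clean_laps < target_clean_laps:            # until 10 clean laps
        history = deque(maxlen=window)               # last 5 s at 20 fps
        failures = []
        for frame in track.race_one_lap():
            gist = extract_gist(frame)
            # record what the teacher would have done for this frame
            history.append((gist, teacher.suggested_steering(track)))
            if track.car_off_road():
                failures.append(list(history))       # (1) save the episode
                teacher.bring_car_back(track)        # (2) teacher recovers
            else:
                track.steer(forest.predict([gist])[0])  # (3) student steers
        if failures:
            for episode in failures:
                dataset.extend(episode)              # grow D parsimoniously
            forest = retrain(dataset)                # re-train after the lap
            clean_laps = 0
        else:
            clean_laps += 1
    return forest
```
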

3 Results

The experiments were set up in the following way: first, the teacher demonstrates racing one lap around track #1; then, the student program is given the controls and attempts to race around the track, acquiring new training data after each mistake. At the end of the lap, the student is retrained before attempting again. When the student succeeds in driving 10 consecutive laps around the track, it proceeds to the next track.



Figure 6: Incremental learning of the student driver: (a) average number of student failures (i.e., going off track) against the number of laps and re-trainings performed; (b) number of training steps required when discovering a new track before driving without mistakes. Note that the first column indicates the number of re-trainings required on the same track on which pre-training was performed.

3.1 Learning to race on a known track

Figure 6(a) shows the average number of times the student went off the track per km, after each training stage. This graph shows that the incremental retraining improves steering behaviour quickly and that the system can drive successfully around the track after 6 re-trainings. Figure 7 provides a more qualitative insight into the student's steering improvement due to re-training. In this figure, the bottom row shows the trajectories at the first trial and after four re-training stages, and the top row shows the intermediate trajectories, after one, two and three re-trainings. In these figures, the student's trajectory is denoted by the green line, and the failure cases where the teacher had to intervene are denoted by the purple segments. The student originally fails in all curves (and even on some straight portions), whereas a single failure occurs after just two re-training steps. After four re-trainings, the student successfully drives around the track, as shown by the absence of purple segments.

3.2 Generalisation to new tracks

One question is whether the behaviour learnt by the student generalises to different tracks, surfaces, environments, lane markings, etc. Figure 6(b) provides some evidence that this is the case. This graph plots the number of retraining stages required before the student could drive successfully around the track, against the number of tracks previously encountered by the student (the tracks were attempted in the same order as in Figure 1, from top-left to bottom-right). This shows that training generally becomes faster after the student has already mastered several tracks, although some more difficult tracks still require re-training. For example, the student could drive around the third, fifth and sixth tracks without re-training, i.e., the training examples collected from the previous tracks were sufficient to inform driving on these ones. Overall, six tracks could be raced without additional training, and three more with only one additional training round. Conversely, some of the tracks seem to be intrinsically more difficult to learn: for example, tracks 7 and 8 contain tighter curves than the previous ones, and track 9 is the only track in snowy weather, hence the need for re-training.


Figure 7: Illustration of car trajectories across successive trials and re-training stages. The green line is the trajectory, and the purple sections indicate the student's failures and the episodes selected for re-training.


4 Conclusions

In this article we have presented results from a simulated driving school framework for artificial driving systems. The studied system used only visual gist as perceptual input and achieved a good approximation of an expert program's driving behaviour, despite the fact that the program in question benefits from perfect knowledge of the track and of the car's position on the road. This confirms the evidence in [8, 9] that pre-attentive vision accounts for a large share of driving behaviour.

In addition, we presented an incremental learning approach that only sparingly increases the training set with failure episodes, and is shown to reach good performance after only a few re-training stages. Moreover, the learnt behaviour is shown to generalise well to different tracks. The re-training stages could be improved in future work by implementing a variant of boosting instead of retraining the whole forest at each step.

Finally, this experiment showed that simulated environments can provide convenient test beds for learning, and especially for active learning approaches.

References

[1] Amit Y, Geman D (1997) Shape quantization and recognition with randomized trees. Neural Computation 9:1545–1588

[2] Breiman L (2001) Random forests. Machine Learning 45(1):5–32

[3] Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

[4] Kalal Z, Mikolajczyk K, Matas J (2012) Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(7):1409–1422

[5] Markelic I, Kulvicius T, Tamosiunaite M, Wörgötter F (2008) Anticipatory driving for a robot-car based on supervised learning. In: ABiALS, pp 267–282

[6] Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision 42(3):145–175

[7] Pomerleau D (1989) ALVINN: An autonomous land vehicle in a neural network. In: Proceedings of NIPS

[8] Pugeault N, Bowden R (2010) Learning pre-attentive driving behaviour from holistic visual features. In: Proceedings of the European Conference on Computer Vision (ECCV 2010), Part VI, LNCS 6316, pp 154–167

[9] Pugeault N, Bowden R (2011) Driving me around the bend: Learning to drive from visual gist. In: 1st IEEE Workshop on Challenges and Opportunities in Robotic Perception, in conjunction with ICCV 2011

[10] Siagian C, Itti L (2009) Biologically inspired mobile robot vision localization. IEEE Transactions on Robotics 25(4):861–873

[11] Thrun S, Montemerlo M, Dahlkamp H, Stavens D, Aron A, Diebel J, Fong P, Gale J, Halpenny M, Hoffmann G, Lau K, Oakley C, Palatucci M, Pratt V, Stang P, Strohband S, Dupont C, Jendrossek LE, Koelen C, Markey C, Rummel C, van Niekerk J, Jensen E, Alessandrini P, Bradski G, Davies B, Ettinger S, Kaehler A, Nefian A, Mahoney P (2006) Stanley: The robot that won the DARPA Grand Challenge. Journal of Robotic Systems 23(9):661–692

[12] Turk M, Morgenthaler D, Gremban K, Marra M (1988) VITS—a vision system for autonomous land vehicle navigation. IEEE Transactions on Pattern Analysis and Machine Intelligence 10(3):342–361

[13] Wymann B, Espié E, Guionneau C, Dimitrakakis C, Coulom R, Sumner A (2013) TORCS: The open racing car simulator, v1.3.5


Reinis Rudzits graduated with a first-class BEng in Electronics Engineering from the University of Surrey (UK) in 2014. He is currently employed as a financial software developer at BNP Paribas, London.

Nicolas Pugeault is a Lecturer at the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey (UK). He holds a Ph.D. from the University of Göttingen, and has co-authored over 40 peer-reviewed publications. His research covers aspects of computer vision, machine learning and cognitive robotics.
