
Chapter 30
Vision for Driver Assistance: Looking at People in a Vehicle

Cuong Tran and Mohan Manubhai Trivedi

Abstract An important real-life application domain of computer vision techniques for looking at people is the development of Intelligent Driver Assistance Systems (IDASs). By analyzing information from both looking in and looking out of the vehicle, such systems can actively prevent vehicular accidents and improve driver safety as well as the driving experience. Toward these goals, systems that look at the people in a vehicle (i.e., driver and passengers) to understand their intent, behavior, and states are needed. This is a challenging task, typically requiring high reliability, accuracy, and efficient performance. Challenges also come from the dynamic background and varying lighting conditions of driving scenes. However, looking at people in a vehicle also has characteristics that can be exploited to simplify the problem: people typically sit in fixed positions, and their activities are highly related to the driving context. In this chapter, we give a concise overview of related research studies, examining how their approaches were developed to fit the specific requirements and characteristics of looking at people in a vehicle. Following the historical development of the area, we first discuss studies looking at the head, eyes, and facial landmarks, and then studies looking at the body, hands, and feet. Despite much active research and many published papers, developing accurate, reliable, and efficient approaches for looking at people in real-world driving scenarios remains an open problem. To this end, we also discuss some remaining issues for future development in the area.

30.1 Introduction and Motivation

Automobiles were at the core of transforming the lives of individuals and nations during the 20th century. However, despite their many benefits, motor vehicles pose a considerable safety risk. A study by the World Health Organization reports that, annually, over 1.2 million fatalities and over 20 million serious injuries occur worldwide [28].

C. Tran (✉) · M.M. Trivedi
Laboratory for Intelligent and Safe Automobiles (LISA), University of California at San Diego, San Diego, CA 92037, USA
e-mail: [email protected]

M.M. Trivedi
e-mail: [email protected]

T.B. Moeslund et al. (eds.), Visual Analysis of Humans, DOI 10.1007/978-0-85729-997-0_30, © Springer-Verlag London Limited 2011



Fig. 30.1 Looking-in and looking-out of a vehicle [36]

Most roadway accidents are caused by driver error. A 2006 study sponsored by the US Department of Transportation's National Highway Traffic Safety Administration concluded that driver inattention contributes to nearly 80 percent of crashes and 65 percent of near-crashes. Therefore, in today's vehicles, embedded computing systems are increasingly used to make them safer as well as more reliable, comfortable, and enjoyable to drive.

In vehicle-based safety systems, it is more desirable to prevent an accident (active safety) than to reduce the severity of injuries (passive safety). However, active safety systems also pose more difficult and challenging problems. To be effective, such technologies must be human-centric and work in a "holistic" manner [34, 36]. As illustrated in Fig. 30.1, information from looking inside the vehicle (i.e., at the driver and passengers), from looking outside at the environment (e.g., at roads and other cars), as well as from vehicle sensors (e.g., measuring steering angle and speed) needs to be taken into account. In this chapter, we focus on the task of looking at people inside a vehicle (i.e., driver and passengers) to understand their intent, behavior, and states. This task is inherently challenging due to the dynamic driving-scene background and varying lighting conditions. Moreover, it demands high reliability, accuracy, and efficient performance (e.g., real-time performance for safety-related applications). The fundamental computer vision and machine learning techniques for looking at people, which were covered in previous chapters, are of course the foundation for techniques looking at people inside a vehicle. However, human activity in a vehicle also has its own characteristics, which should be exploited to improve system performance, such as people typically sitting in fixed positions and their activities being highly related to the driving context (e.g., most driver foot movements are related to pedal press activity).

In the following sections, we provide a concise overview of several selected research studies, focusing on how computer vision techniques are developed to fit the requirements and characteristics of systems looking at people in a vehicle. We start in Sect. 30.2 with a discussion of some criteria for categorizing existing approaches, such as their objective (e.g., to monitor driver fatigue or to analyze driver intent) or the cueing information used (e.g., looking at the head, eyes, or feet). Initially, research studies in this area focused more on cues related to the driver's head, such as head pose, eye gaze, and facial landmarks, which are needed to determine driver attention and fatigue state [3, 13, 14, 18, 20, 26, 30, 38]. Selected approaches of this kind are covered in Sect. 30.3. More recently, besides these traditional cues, other parts of the body, such as hand movement, foot movement, or the whole upper body posture, have also been shown to be important for understanding people's intent and behavior in a vehicle [6, 7, 19, 31, 33, 35]. We discuss selected approaches in this category in Sect. 30.4. Despite much active research, developing accurate, reliable, and efficient approaches for looking inside a vehicle, as well as combining them with looking-out information for holistic human-centered Intelligent Driver Assistance Systems (IDASs), are still open problems. Section 30.5 discusses some open issues for future development in the area, and we finally offer concluding remarks in Sect. 30.6.

30.2 Overview of Selected Studies

There are several ways to categorize the related studies in this area, depending on the specific purpose. Figure 30.2 shows the basic steps of a common computer vision system for looking at people. Approaches may use different types of input (e.g., a monocular camera, a stereo camera, or a camera with active infrared illuminators), extract different types of intermediate features, and aim to analyze different types of driver behavior or state. Besides these functional criteria, we can also categorize approaches by the fundamental techniques underlying their implementation at each step. With the goal of providing an overview of several selected research studies, we organize them in a summary table (Table 30.1) according to the following elements.

• Objective: What is the final goal of the study (e.g., to monitor driver fatigue, detect driver distraction, or recognize driver turn intent)?

• Sensor input: Which type of sensor input is used (e.g., a monocular, stereo, or thermal camera)?

• Monitored body parts: Which type of cueing feature is extracted (e.g., information about head pose, eye gaze, body posture, or foot movement)?

• Methodology and algorithm: Which underlying techniques were used?

• Experiment and evaluation: How was the proposed approach evaluated? Was it actually evaluated in a real-world driving scenario or in indoor simulation?

Fig. 30.2 Basic components of a system looking at people in a vehicle


Table 30.1 Overview of selected studies for looking at people in a vehicle

• Grace et al. '98 [14]. Objective: drowsiness detection for truck drivers. Sensor input: two PERCLOS [14] cameras. Monitored body parts: eyes. Methodology and evaluation: illuminated eye detection and PERCLOS measurement; in-vehicle experiment.

• Smith et al. '03 [30]. Objective: determination of driver visual attention. Sensor input: monocular. Monitored body parts: head, eyes, and face features. Methodology and evaluation: appearance-based head and face feature tracking; driver visual attention modeled with FSMs; in-vehicle experiment.

• Ishikawa et al. '04 [18]. Objective: driver gaze tracking. Sensor input: monocular. Monitored body parts: eyes. Methodology and evaluation: active appearance model to track the whole face, then iris detection with template matching and eye gaze estimation; in-vehicle and simulation.

• Ji et al. '04 [20]. Objective: driver fatigue monitoring and prediction. Sensor input: two cameras with active infrared illuminators. Monitored body parts: head, eyes, facial landmarks. Methodology and evaluation: combines illumination-based and appearance-based techniques for eye detection; fuses information from head pose and eyes in a probabilistic fatigue model; simulation experiment.

• Trivedi et al. '04 [35]. Objective: occupant posture analysis. Sensor input: stereo and thermal cameras. Monitored body parts: body posture. Methodology and evaluation: head tracking to infer sitting posture; in-vehicle experiment.

• Fletcher et al. '05 [13]. Objective: driver awareness monitoring. Sensor input: commercial eye tracker. Monitored body parts: eye gaze. Methodology and evaluation: road sign recognition algorithm; epipolar geometry to correlate eye gaze with the road scene for awareness monitoring; in-vehicle experiment.

• Veeraraghavan et al. '05 [37]. Objective: unsafe driver activity detection. Sensor input: monocular. Monitored body parts: face and hands. Methodology and evaluation: motion of skin regions and a Bayesian classifier to detect some unsafe activities (e.g., drinking, using a cellphone); simulation experiment.

• Bergasa et al. '06 [3]. Objective: driver vigilance monitoring. Sensor input: one camera with illuminator. Monitored body parts: head and eyes. Methodology and evaluation: PERCLOS, nodding frequency, and blink frequency in a fuzzy inference system to compute vigilance level; in-vehicle experiment.

• Cheng and Trivedi '06 [6]. Objective: turn intent analysis. Sensor input: multi-modal sensors and marker-based motion capture. Monitored body parts: head and hands. Methodology and evaluation: sparse Bayesian learning to classify turn intent, evaluated with different feature vector combinations; in-vehicle experiment.

• Cheng et al. '06 [5]. Objective: driver hand grasp and turn analysis. Sensor input: color and thermal cameras. Monitored body parts: head and hands. Methodology and evaluation: optical-flow head tracking and an HMM-based activity classifier; in-vehicle experiment.

• Ito and Kanade '08 [19]. Objective: prediction of 9 driver operations. Sensor input: monocular. Monitored body parts: body. Methodology and evaluation: tracks 6 marker points on shoulders, elbows, and wrists; discriminant analysis to learn Gaussian operation models, then a Bayesian classifier; simulation experiment.

• Doshi and Trivedi '09 [11]. Objective: driver lane change intent analysis. Sensor input: monocular. Monitored body parts: head, eyes. Methodology and evaluation: Relevance Vector Machine for lane change prediction (optical-flow-based head motion, manually labeled eye gaze); in-vehicle and simulation.

• Tran and Trivedi '09 [31]. Objective: driver distraction monitoring. Sensor input: 3 cameras. Monitored body parts: head, hands. Methodology and evaluation: combines tracked head pose and hand position using a rule-based approach; in-vehicle experiment.

• Murphy-Chutorian and Trivedi '10 [26]. Objective: real-time 3D head pose tracking. Sensor input: monocular. Monitored body parts: head. Methodology and evaluation: hybrid method combining static pose estimation with an appearance-based particle filter 3D head tracking algorithm; in-vehicle experiment.

• Wu and Trivedi '10 [38]. Objective: eye gaze tracking and blink recognition. Sensor input: monocular. Monitored body parts: eyes. Methodology and evaluation: two interactive particle filters to simultaneously track eyes and detect blinks; in-vehicle and lab experiment.

• Cheng and Trivedi '10 [7]. Objective: driver and passenger hand determination. Sensor input: monocular camera with illuminator. Monitored body parts: hands. Methodology and evaluation: HOG feature descriptor and SVM classifier; in-vehicle experiment.


In the next sections, we review several selected methods, focusing on how computer vision techniques were developed to fit the requirements and characteristics of systems looking at people in a vehicle. Based on the type of cueing information used, we discuss these approaches in two main categories: approaches looking at the driver's head, face, and facial landmarks (Sect. 30.3) and approaches looking at the driver's body, hands, and feet (Sect. 30.4).

30.3 Looking at Driver Head, Face, and Facial Landmarks

Initial research studies looking at the driver focused more on cues related to the driver's head, such as head pose, eye gaze, and facial landmarks. These cueing features were shown to be important in determining driver attention and cognitive state (e.g., fatigue) [3, 13–15, 20]. Example studies in this category include approaches for monitoring and prediction of driver fatigue, driver head pose tracking for monitoring driver awareness, and eye tracking and blink recognition.

30.3.1 Monitoring and Prediction of Driver Fatigue

The National Highway Traffic Safety Administration (NHTSA) [27] has reported drowsy driving as an important cause of fatal on-road crashes and injuries in the U.S. Therefore, systems that actively monitor a driver's level of vigilance and alert the driver to insecure driving conditions are desirable for accident prevention. Different approaches have been used to tackle the problem, such as assessing the vigilance capacity of an operator before work is performed [9], assessing the driver state using sensors mounted on the driver to measure heart rate or brain activity [39], or using information from vehicle-embedded sensors (e.g., steering wheel movements, acceleration and braking profiles) [2]. Computer vision techniques looking at the driver provide another, non-intrusive approach to the problem. Research studies have shown that measurements such as PERCLOS, introduced by Grace et al. [14], are highly correlated with fatigue state and can be used to monitor driver fatigue. Other head- and face-related features, such as eye blink frequency, eye movement, nodding frequency, and facial expression, have also been used for driver fatigue and vigilance analysis [3, 20].
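PERCLOS is commonly defined as the proportion of time within a window (e.g., one minute) during which the eyes are mostly closed. The following is a minimal sketch of how a rolling PERCLOS score could be computed from per-frame eyelid-closure estimates; the 80% closure threshold, window length, and warning level here are illustrative assumptions, not the parameters used in [14].

    import numpy as np
    from collections import deque

    class PerclosMonitor:
        """Rolling PERCLOS over a fixed-length frame window (illustrative sketch)."""

        def __init__(self, fps=30, window_s=60, closure_thresh=0.8):
            # Per-frame closed/open flags for the most recent window_s seconds
            self.window = deque(maxlen=fps * window_s)
            self.closure_thresh = closure_thresh  # eyelid >= 80% closed counts as "closed"

        def update(self, eyelid_closure):
            # eyelid_closure in [0, 1]: 0 = fully open, 1 = fully closed
            self.window.append(eyelid_closure >= self.closure_thresh)
            return float(np.mean(self.window))  # fraction of recent frames with eyes closed

    # Example: raise a warning above a hypothetical drowsiness level
    monitor = PerclosMonitor()
    for closure in [0.1, 0.9, 0.95, 0.2]:  # per-frame eyelid-closure estimates
        if monitor.update(closure) > 0.15:
            print("drowsiness warning")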

We will take a closer look at a representative approach proposed by Ji et al. [20] for real-time monitoring and prediction of driver fatigue. To achieve the robustness required for in-vehicle applications, different cues, including eyelid movement, gaze movement, head movement, and facial expression, were extracted and fused in a Bayesian network for human fatigue modeling and prediction. Two CCD cameras with active infrared illuminators were used. For eye detection and tracking, the bright-pupil technique was combined with an appearance-based technique using an SVM classifier to improve robustness. This eye detection and tracking information was then also utilized in their algorithms for tracking the head pose with a Kalman filter and for tracking facial landmarks around the mouth and eye regions using Gabor features.
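As context for the tracking component, the following is a minimal constant-velocity Kalman filter for a 2D image point of the kind commonly used in such eye and head trackers; the frame rate and noise covariances are illustrative assumptions, not the settings of [20].

    import numpy as np

    dt = 1.0 / 30.0                      # frame period, assuming 30 fps
    F = np.array([[1, 0, dt, 0],         # state transition for (x, y, vx, vy)
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], float)
    H = np.array([[1, 0, 0, 0],          # we measure position only
                  [0, 1, 0, 0]], float)
    Q = 1e-2 * np.eye(4)                 # process noise (illustrative)
    R = 4.0 * np.eye(2)                  # measurement noise in pixels^2 (illustrative)

    def kalman_step(x, P, z):
        x = F @ x                        # predict state
        P = F @ P @ F.T + Q              # predict covariance
        S = H @ P @ H.T + R              # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
        x = x + K @ (z - H @ x)          # correct with measurement z = (x, y)
        P = (np.eye(4) - K @ H) @ P
        return x, P

    x, P = np.zeros(4), 100.0 * np.eye(4)
    x, P = kalman_step(x, P, np.array([320.0, 240.0]))  # detected eye position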

Validation of the eye detection and tracking components as well as of the extracted fatigue parameters and scores was provided and showed good results (e.g., a 0.05% false-alarm rate and a 4.2% misdetection rate). However, the proposed approach appears to have been evaluated only with data from an indoor environment, so how it would perform in real-world driving scenarios, with their attendant challenges, remains an open question.

30.3.2 Eye Localization, Tracking and Blink Pattern Recognition

Focusing on the task of robustly extracting visual cues, Wu et al. (a former member of our team) proposed an appearance-based approach using monocular camera input for eye tracking and blink pattern recognition [38]. For better accuracy and robustness, a binary tree is used to model the statistical structure of the object's feature space. This is a global-to-local representation in which each subtree explains more detailed information than its parent tree (useful for representing objects with high-order substructures, such as eye images). After the eyes are automatically located, a particle-filter-based approach is used to simultaneously track the eyes and detect blinks. Two interactive particle filters are used, one for the open eye and one for the closed eye. The posterior probabilities learned by the particle filters are used to determine which particle filter gives the correct tracks; that filter is then labeled as the primary one and used to reinitialize the other. Both the blink detection rate and the eye tracking accuracy were evaluated and showed good results in various scenarios, including indoor and in-vehicle data sequences as well as the FRGC (Face Recognition Grand Challenge) benchmark data for evaluation of tracking accuracy.
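A minimal sketch of the competing-filters idea follows, using one bootstrap particle filter per eye-state hypothesis; the appearance likelihood functions are hypothetical stand-ins for the binary-tree appearance model of [38].

    import numpy as np

    rng = np.random.default_rng(0)

    def pf_step(particles, likelihood, frame, motion_std=3.0):
        """One predict/update/resample step of a bootstrap particle filter."""
        particles = particles + rng.normal(0.0, motion_std, particles.shape)  # random-walk motion
        w = np.array([likelihood(frame, p) for p in particles])  # appearance weights
        score = w.sum()                       # unnormalized evidence for this hypothesis
        w = w / (score + 1e-12)
        idx = rng.choice(len(particles), size=len(particles), p=w)  # resample
        return particles[idx], score

    def track_eye(frames, init_xy, open_lik, closed_lik, n=200):
        open_p = init_xy + rng.normal(0, 2, (n, 2))
        closed_p = init_xy + rng.normal(0, 2, (n, 2))
        for frame in frames:
            open_p, s_open = pf_step(open_p, open_lik, frame)
            closed_p, s_closed = pf_step(closed_p, closed_lik, frame)
            if s_open >= s_closed:            # open-eye filter explains the frame better
                state, closed_p = "open", open_p.copy()   # reinitialize the weaker filter
                estimate = open_p.mean(axis=0)
            else:
                state, open_p = "blink", closed_p.copy()
                estimate = closed_p.mean(axis=0)
            yield state, estimate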

Also targeting a robust eye gaze tracking system, Ishikawa et al. [18] proposed tracking the whole face with Active Appearance Models (AAMs) for more reliable extraction of the eye regions and head pose. Within the extracted eye regions, a template matching method is used to detect the iris, which is then used for eye gaze estimation. This approach was evaluated with a few subjects on both indoor and in-vehicle video sequences and showed promising results.
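As a minimal OpenCV sketch of the iris-detection step, assuming the eye region has already been extracted (e.g., by the AAM face tracker, which is not shown) and that an iris template image is available; the file names and the crude gaze proxy are illustrative assumptions.

    import cv2

    eye = cv2.imread("eye_region.png", cv2.IMREAD_GRAYSCALE)      # hypothetical AAM crop
    iris = cv2.imread("iris_template.png", cv2.IMREAD_GRAYSCALE)  # hypothetical template

    # Normalized cross-correlation between the eye region and the iris template
    response = cv2.matchTemplate(eye, iris, cv2.TM_CCOEFF_NORMED)
    _, score, _, top_left = cv2.minMaxLoc(response)

    # Iris center = best-match location offset by half the template size
    cx = top_left[0] + iris.shape[1] // 2
    cy = top_left[1] + iris.shape[0] // 2

    # Crude horizontal gaze proxy: iris offset from the eye-region center; the
    # actual method maps iris position and head pose through a geometric eye model
    gaze_x = (cx - eye.shape[1] / 2) / (eye.shape[1] / 2)
    print("iris center:", (cx, cy), "match score:", round(score, 2))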

30.3.3 Tracking Driver Head Pose

Head pose is also a strong indicator of a driver's field of view and current focus of attention, and it is typically less noisy than eye gaze. Driver head-motion estimation has been used along with video-based lane detection and vehicle CAN-bus (Controller Area Network) data to predict the driver's intent to change lanes in advance of the actual movement of the vehicle [22]. Related work in head pose estimation can be roughly categorized into static methods, which estimate head pose directly from the current still image; tracking methods, which recover the global pose change of the head from the observed movement between video frames; and hybrid methods. A detailed survey of head pose estimation and tracking approaches can be found in [25]. To date, computational head pose estimation remains a challenging vision problem, and there are no solutions that are both inexpensive and widely available. In [26], Murphy-Chutorian et al. (a former member of our team) proposed an integrated approach using monocular camera input for real-time driver head pose tracking in 3D. To overcome the difficulties inherent in the varying lighting conditions of a moving car, a static head pose estimator using support vector regressors (SVRs) was combined with an appearance-based particle filter for 3D head model tracking in an augmented-reality environment.

For initial head pose estimation with SVRs, the Local Gradient Orientation (LGO) histogram, which is robust to minor deviations in region alignment and lighting, was used. The LGO histogram of a scale-normalized facial region is a 3D histogram of size M × N × O, in which the first two dimensions correspond to the vertical and horizontal positions in the image and the third to the gradient orientation. Based on the initial head pose estimate, an appearance-based particle filter in an augmented reality, i.e., a virtual environment that mimics the view space of a real camera, is used to track the driver's head in 3D. Using an initial estimate of the head position and orientation, the system generates a texture-mapped 3D model of the head from the most recent video image and places it into the environment. A particle filter approach is then used to match the view from each subsequent video frame. Though this operation is computationally expensive, in the proposed implementation it was highly optimized for graphics processing units (GPUs) to achieve real-time performance (tracking the head at ∼30 frames per second). Evaluation of this approach showed good results in real-world driving situations with drivers of varying ages, races, and sexes, spanning daytime and nighttime conditions.
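A minimal numpy sketch of such a spatially binned gradient-orientation histogram follows; the bin counts, unsigned orientation range, and normalization are illustrative assumptions rather than the exact design of [26].

    import numpy as np

    def lgo_histogram(patch, M=4, N=4, O=8):
        """M x N x O gradient-orientation histogram of a grayscale patch."""
        gy, gx = np.gradient(patch.astype(float))
        mag = np.hypot(gx, gy)
        ang = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation in [0, pi)

        H, W = patch.shape
        hist = np.zeros((M, N, O))
        for i in range(H):
            for j in range(W):
                m = min(i * M // H, M - 1)           # vertical spatial bin
                n = min(j * N // W, N - 1)           # horizontal spatial bin
                o = min(int(ang[i, j] / np.pi * O), O - 1)  # orientation bin
                hist[m, n, o] += mag[i, j]           # magnitude-weighted vote

        hist /= np.linalg.norm(hist) + 1e-12         # normalize for lighting robustness
        return hist.ravel()                          # feature vector for the SVRs

    feat = lgo_histogram(np.random.rand(64, 64))     # 64x64 scale-normalized face region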

30.4 Looking at Driver Body, Hands, and Feet

Besides cues from the head, eyes, and facial features, information from other parts of the driver's body, such as hand movement, foot movement, or the whole upper body posture, also provides important information. Recently, more research studies have made use of such cues for a better understanding of driver intent and behavior [6, 7, 31, 33].

30.4.1 Looking at Hands

Looking at the driver's hands is important since the hands are a primary means of controlling the vehicle; however, they have not been studied much in the area of looking inside a vehicle. In [6], a sparse Bayesian classifier taking into account both hand position and head pose was developed for lane change intent prediction. Hand position was also used in a system for assisting the driver in "keeping hands on the wheel and eyes on the road" [31], in which a rule-based approach with state machines was used to combine hand position and head pose for monitoring driver distraction (Fig. 30.3).

Fig. 30.3 System for "keeping hands on the wheel and eyes on the road" [31]

In [7], Cheng et al. (a former member of our team) proposed a novel real-time computer vision system that robustly discriminates which of the front-row-seat occupants is accessing the infotainment controls. Knowing who the user is (driver, passenger, or no one) can alleviate driver distraction and maximize the passenger infotainment experience (e.g., the infotainment system should provide its richer options, which can be distracting, only to the passenger and not to the driver). The algorithm uses a modified histogram-of-oriented-gradients (HOG) feature descriptor to represent the image area over the infotainment controls, together with an SVM and median filtering over time to classify each image into one of the three classes, achieving a ∼96% average correct classification rate. This rate was achieved over a wide range of illumination conditions, human subjects, and times of day.
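A minimal sketch of this HOG-plus-SVM-plus-median-filtering pattern, using scikit-image and scikit-learn, follows; the training data, HOG parameters, and filter window are placeholders, and the modified HOG variant of [7] is not reproduced.

    import numpy as np
    from scipy.ndimage import median_filter
    from skimage.feature import hog
    from sklearn.svm import SVC

    def describe(image):
        # HOG over the image patch covering the infotainment controls
        return hog(image, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2))

    # Placeholder training data; labels: 0 = no one, 1 = driver, 2 = passenger
    train_images = [np.random.rand(64, 64) for _ in range(30)]
    train_labels = np.random.randint(0, 3, 30)

    clf = SVC(kernel="linear")
    clf.fit([describe(im) for im in train_images], train_labels)

    # Classify each frame, then median-filter the label stream to suppress flicker
    frames = [np.random.rand(64, 64) for _ in range(15)]
    raw = np.array([clf.predict([describe(f)])[0] for f in frames])
    smoothed = median_filter(raw, size=5)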

30.4.2 Modeling and Prediction of Driver Foot Behavior

Besides the hands, the driver's feet also play an important role in controlling the vehicle. In addition to information from embedded pedal sensors, the visual foot movement before and after a pedal press can provide valuable information for a better semantic understanding of driver behavior, state, and style. It can also be used to gain a time advantage by predicting a pedal press before it actually happens, which is very important for providing proper assistance to the driver in time-critical (e.g., safety-related) situations. However, there have been very few research studies analyzing driver foot information. Mulder et al. introduced a haptic gas pedal feedback system for car following [23] in which the gas pedal position was used to improve system performance. McCall et al. [22] (a former member of our team) developed a brake assistance system that took into account both the driver's intent to brake (from pedal positions and the camera-based foot position) and the need to brake given the current situation.

Recently, our team has examined an approach for driver foot behavior analysis using monocular foot camera input. The underlying idea is motivated by the fact that driver foot movement is highly related to pedal press activity. After tracking the foot movement with an optical-flow-based tracking method, a 7-state hidden Markov model (HMM) describing foot behavior was designed specifically for driving scenarios (Fig. 30.4). The elements of this driver foot behavior HMM are as follows.
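A minimal OpenCV sketch of pyramidal Lucas-Kanade point tracking of the kind that could supply such foot position and velocity observations follows; the video source, feature parameters, and the mean-point foot estimate are illustrative assumptions.

    import cv2
    import numpy as np

    cap = cv2.VideoCapture("foot_camera.avi")            # hypothetical foot-camera video
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    # Corner features to track inside the (here: whole-image) foot region
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50,
                                  qualityLevel=0.01, minDistance=5)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Pyramidal Lucas-Kanade optical flow for the tracked points
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        good_new, good_old = nxt[status == 1], pts[status == 1]
        position = good_new.mean(axis=0)                  # (px, py) foot estimate
        velocity = (good_new - good_old).mean(axis=0)     # (vx, vy) per frame
        prev_gray, pts = gray, good_new.reshape(-1, 1, 2)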


Fig. 30.4 Foot behavior HMM state model with 7 states

• Hidden states: There are 7 states {s1, s2, s3, s4, s5, s6, s7}, namely Neutral, BrkEngage, AccEngage, TowardsBrk, TowardsAcc, ReleaseBrk, ReleaseAcc. The state at time t is denoted by the random variable qt.

• Observation: The observation at time t is denoted by the random variable Ot, which has 6 components, Ot = {px, py, vx, vy, B, A}, where {px, py, vx, vy} are the current estimated position and velocity of the driver's foot, and {B, A} are obtained from vehicle CAN information, which determines whether the brake and accelerator are currently engaged or not.

• Observation probability distributions: In our HMM, we assume a Gaussian output probability distribution P(Ot | qt = si) = N(μi, σi).

• Transition matrix: A = {aij} is a 7 × 7 state transition matrix, where aij is the probability of making a transition from state si to sj: aij = P(qt+1 = sj | qt = si).

• Initial state distribution: A uniform distribution over the initial states is assumed.

Utilizing reliable information from the vehicle CAN data, an automatic data labeling procedure was developed for training and evaluating the HMM. The HMM parameters Λ, including the Gaussian observation probability distributions and the transition matrix, are learned using the Baum–Welch algorithm.
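A minimal sketch with hmmlearn's GaussianHMM (whose fit method implements Baum–Welch/EM) follows; the observation data and the state-index mapping are placeholders, while the 6-component observation vector and 7 states follow the description above.

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    # Each row: [px, py, vx, vy, B, A] (foot position/velocity + CAN pedal flags)
    obs = np.random.rand(1000, 6)              # placeholder for real training sequences
    lengths = [500, 500]                       # two recorded runs concatenated in obs

    # 7 hidden states: Neutral, BrkEngage, AccEngage, TowardsBrk,
    # TowardsAcc, ReleaseBrk, ReleaseAcc
    model = GaussianHMM(n_components=7, covariance_type="diag", n_iter=50)
    model.fit(obs, lengths)                    # Baum-Welch (EM) parameter estimation

    # Decode the most likely state sequence for a new observation stream
    states = model.predict(np.random.rand(30, 6))

    # A pedal press can be predicted when the decoded state is TowardsBrk or
    # TowardsAcc; the index-to-label mapping below is an illustrative assumption,
    # since in practice it is fixed by the automatic CAN-based labeling
    TOWARDS_BRK, TOWARDS_ACC = 3, 4
    if states[-1] in (TOWARDS_BRK, TOWARDS_ACC):
        print("pedal press predicted")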

The meanings of these estimated foot behavior states also connect directly to the prediction of actual pedal presses: when the foot is in the TowardsBrk or TowardsAcc state, we can predict a corresponding brake or accelerator press in the near future. This approach was evaluated with data from a real-world driving testbed (Fig. 30.5).


Fig. 30.5 Vehicle testbed configuration for foot analysis experiment

Fig. 30.6 Tracked trajectories of a brake (red) and an acceleration (blue). The labeled points show the outputs of the HMM-based foot behavior analysis

An experimental data collection paradigm was designed to approximate stop-and-go traffic, in which the driver accelerates or brakes depending on whether a stop or a go cue is shown. Figure 30.6 visualizes the outputs of the approach for a brake and an acceleration example.


Over all 15 experimental runs, with 128 trials (a stop or go cue shown) per run, a major portion (∼75%) of the pedal presses could be predicted with ∼95% accuracy at 133 ms prior to the actual pedal press. Regarding the misapplication cases (i.e., the subject was cued to hit a specific pedal but instead applied the wrong one), all of them were predicted correctly, on average ∼200 ms before the actual press, which is actually earlier than for general pedal press prediction. This indicates the potential of the proposed approach for predicting and mitigating pedal errors, a problem of recent interest to the automotive safety community [16].

30.4.3 Analyzing Driver Posture for Driver Assistance

Whole-body posture is another cueing source that should be explored further when looking at people inside a vehicle. Figure 30.7 shows some possible ranges of driver posture movement, which may be connected to driver state and intention. For example, leaning backward might indicate a relaxed position, while leaning forward indicates concentration. A driver may also change posture in preparation for specific tasks, such as moving the head forward for a better visual check before a lane change. In [19], Ito and Kanade used six marker points on the shoulders, elbows, and wrists to predict nine driver operations toward different destinations: navigation unit, A/C, left vent, right vent, gear box, console box, passenger seat, glove compartment, and rear-view mirror. Their approach was evaluated with different subjects in driving simulation, with a high prediction accuracy of 90% and a low false positive rate of 1.4%. This approach, however, requires putting markers on the driver. In [8], Datta et al. developed a markerless approach for tracking systems of articulated planes, which was also applied to track 2D driver body pose on the same simulation data. Though this approach automates the tracking, it still requires manual initialization of the tracking model.
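A minimal scikit-learn sketch of this learn-Gaussian-class-models-then-apply-Bayes-rule pattern follows; the marker features and labels are placeholders rather than the actual data of [19].

    import numpy as np
    from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

    # Feature vector per sample: (x, y) of 6 markers on shoulders, elbows, wrists
    X_train = np.random.rand(900, 12)          # placeholder marker coordinates
    y_train = np.random.randint(0, 9, 900)     # 9 operations (A/C, gear box, ...)

    # QDA fits one Gaussian per class and classifies via Bayes' rule
    clf = QuadraticDiscriminantAnalysis(store_covariance=True)
    clf.fit(X_train, y_train)

    posterior = clf.predict_proba(np.random.rand(1, 12))  # posterior over operations
    print("predicted operation:", int(posterior.argmax()))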

Besides looking at the driver, looking at occupant posture is also important. In [35], our team investigated the basic feasibility of using stereo and thermal long-wavelength infrared video for occupant position and posture analysis, a key requirement in designing "smart airbag" systems. In that investigation, our suggestion was to use head tracking information, which is easier to obtain, instead of more detailed occupant posture analysis, for robust "smart airbag" deployment.

Fig. 30.7 Illustration of some possible ranges of driver posture movement during driving



However, for potential applications going beyond "smart airbags," such as driver attentiveness analysis and human-machine interfaces inside the car, we need to look at the more detailed body posture of the driver and occupants.

Our team has developed a computational approach for upper body tracking using the 3D movement of the extremities (head and hands) [32]. This approach tracks a 3D skeletal upper body model determined by a set of upper body joint and end-point positions. To achieve robustness and real-time performance, the approach first tracks the 3D movements of the extremities, i.e., the head and hands. Then, using human upper body configuration constraints, the movements of the extremities are used to predict the whole 3D upper body motion, including the inner joints. Since the head and hand regions are typically well defined and undergo less occlusion, their tracking is more reliable and enables more robust upper body pose determination. Moreover, by breaking the high-dimensional search for upper body pose into two steps, the complexity is reduced considerably. The downside is the ambiguity in the inverse kinematics of the upper body, i.e., various upper body poses can correspond to the same head and hand positions. This issue is reduced in driving scenarios, however, since the driver typically sits in a fixed position. To deal with the remaining ambiguity, "temporal inverse kinematics" based on observing the dynamics of the extremities is used instead of inverse kinematics constraints at each single frame alone.

Fig. 30.8 Elbow joint prediction. (Left) Elbow candidates are generated at each frame. (Right) Over a temporal segment, the sequence of elbow joints minimizing the total joint displacement is selected. By adding two pseudo-nodes s and t with zero-weighted edges, this selection can be represented as a shortest path problem

Figure 30.8 briefly describes this idea with a numerical method for predicting elbow joint sequences. Since the lengths of the upper arm and lower arm are fixed, the possible elbow joint positions for a known shoulder joint position S and hand position H lie on a circle. At each frame, the range of possible elbow joints (the mentioned circle) is determined and then quantized into several elbow candidates based on a distance threshold between candidates (Fig. 30.8, left). For a whole temporal segment, the selected sequence is the one that minimizes the total elbow joint displacement. As shown in Fig. 30.8 (right), this selection can be represented as a shortest path problem. Due to the layered structure of the constructed graph, a dynamic programming technique can solve this shortest path problem with time complexity linear in the number of frames n in the sequence, i.e., O(n).
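A minimal sketch of this layered-graph dynamic program follows, assuming the per-frame elbow candidates have already been generated (candidate generation on the shoulder-hand circle is omitted).

    import numpy as np

    def min_displacement_path(candidates):
        """candidates: list over frames of (k_i, 3) arrays of 3D elbow candidates.
        Returns one candidate index per frame, minimizing the total
        frame-to-frame elbow displacement (shortest path through the layers)."""
        cost = np.zeros(len(candidates[0]))   # source s -> first layer: zero-weight edges
        back = []
        for curr, prev in zip(candidates[1:], candidates[:-1]):
            # Pairwise displacements between consecutive layers: d[a, b]
            d = np.linalg.norm(curr[:, None, :] - prev[None, :, :], axis=2)
            total = d + cost[None, :]         # accumulated path cost via each predecessor
            back.append(np.argmin(total, axis=1))
            cost = total.min(axis=1)
        # Last layer -> sink t: pick the cheapest endpoint, then backtrack
        path = [int(np.argmin(cost))]
        for b in reversed(back):
            path.append(int(b[path[-1]]))
        return path[::-1]

    frames = [np.random.rand(4, 3) for _ in range(5)]   # 5 frames, 4 candidates each
    print(min_displacement_path(frames))

Each pass is linear in the number of frames, matching the O(n) claim above, with a constant factor quadratic in the number of candidates per frame.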

This approach was validated and showed good results with various subjects in both indoor and in-vehicle environments. Figure 30.9 shows some example results of the 3D driver body pose tracking superimposed on the input images for visual evaluation.

Fig. 30.9 Superimposed results of 3D driver body pose tracking using extremity movement

30.5 Open Issues for Future Research

Several related research studies have shown promising results. However, the development of accurate, reliable, and efficient approaches to looking at people in a vehicle for real-world driver assistance systems is still in its infancy. In this section, we discuss some of the main issues that we believe should be addressed in the future development of the area.

• Coordination between real-world and simulation testbeds: Simulation environments have the advantage of more flexibility in configuring sensors and designing experimental tasks for deeper analysis, which may be difficult and unsafe to implement in real-world driving. However, the ultimate goal is to develop systems that work in real vehicles, and there are always gaps between the simulation environment and the real world. Therefore, in general, coordination between real-world driving and simulation environments is useful and should be considered in the development process.

• Looking at the driver's body at multiple levels: To achieve robustness and accuracy, a promising direction is to combine cues at multiple body levels, since the human body is a homogeneous and harmonious whole, and behaviors and states are generally expressed at different body levels simultaneously. However, cueing information from different body parts has different characteristics and typically requires different approaches to extract. How to develop efficient systems looking at the driver's body at multiple levels therefore remains an open question.

• Investigating the role of features extracted from different body parts: Depending on the behavior and/or cognitive state of concern, features from some body parts may be useful, while others may not be, or may even be distracting factors. Moreover, for efficiency, only useful feature cues should be extracted. In [11], Doshi et al. from our team performed a comparative study of the roles of head pose and eye gaze in driver lane change intent analysis. The results indicated that head pose, which is typically less noisy and easier to track than eye gaze, is actually the better feature for lane change intent prediction. In general, a way to systematically perform similar investigations for different feature cues and analysis tasks is desirable.

• Combining looking-in and looking-out: Some research studies have combined the outputs of looking-in and looking-out analysis for different assistance systems, such as driver intent analysis [7, 10, 21], intelligent brake assistance [22], traffic sign awareness [13], and driver distraction [17]. In [29], Pugeault and Bowden showed that information from a looking-out camera can be used to predict some driver actions, including steering left or right and pressing the accelerator, brake, or clutch. This implies that contextual looking-out information is also important to looking-in analysis of driver behavior and states. In general, both looking-in and looking-out information will be needed to develop efficient human-centered driver assistance systems [34, 36].

• Interacting with the driver when needed: Generally, IDASs need the ability to provide feedback to the user when needed (e.g., to alert the driver in critical situations). However, such feedback must be introduced carefully to ensure that it does not confuse or distract the driver, thereby undermining its intended purpose. Interdisciplinary efforts are needed to investigate the effects of different feedback mechanisms, including visual, audio, and/or haptic feedback [1].

• Learning individual driver models vs. generic driver models: It has been noted that individual drivers may act and respond in different ways under various conditions [4, 12, 24]. Therefore, it may be difficult to learn generic driver models that work well for all drivers. To achieve better performance, assistance systems need to adapt to individual drivers based on their style and preferences. Murphey et al. [24] used pedal press profiles to classify driver styles (i.e., calm, normal, and aggressive) and showed the correlation between these styles and fuel consumption. In [12], our team also studied some measures of driving style and their correlation with the predictability and responsiveness of the driver. The results indicated that "aggressive" drivers are more predictable than "non-aggressive" drivers, while "non-aggressive" drivers are more receptive to feedback from driver assistance systems.

30.6 Conclusion

Looking at people in a vehicle to understand their behavior and state is an important area that plays a significant role in developing human-centered Intelligent Driver Assistance Systems. The task is challenging due to the high demands on reliability and efficiency as well as the inherent computer vision difficulties of dynamic backgrounds and varying lighting conditions. In this chapter, we provided a concise overview of several selected research studies looking at different body parts, ranging from the coarse body to the more detailed levels of feet, hands, head, eyes, and facial landmarks. To overcome the inherent challenges and achieve the required performance, some high-level directions learned from those studies are as follows.

• Design techniques specific to in-vehicle applications, utilizing characteristics such as the driver typically sitting in a fixed position or driver foot movement being highly related to pedal press actions.

• Integrate cueing information from different body parts.
• Consider the trade-offs between cues that can be extracted more reliably and cues that seem useful but are hard to extract.
• Make use of both dynamic information (body motion) and static information (body appearance).
• Make use of different input modalities (e.g., color cameras and thermal infrared cameras).

Despite many active studies, more research effort is still needed to turn these high-level ideas into accurate, reliable, and efficient approaches for looking at people in a vehicle, and to actually improve the lives of drivers around the world.

30.6.1 Further Reading

Interested readers may consult the following references for a broad overview of research topic trends and research groups in the area of intelligent transportation systems.

• Li, L., Li, X., Cheng, C., Chen, C., Ke, G., Zeng, D., Scherer, W.T.: Research collaboration and ITS topic evolution: 10 years at T-ITS. IEEE Trans. Intell. Transp. Syst. (June 2010)

• Li, L., Li, X., Li, Z., Zeng, D., Scherer, W.T.: A bibliographic analysis of the IEEE Transactions on Intelligent Transportation Systems literature. IEEE Trans. Intell. Transp. Syst. (October 2010)

Acknowledgements We thank the sponsorship of the U.C. Discovery Program and the National Science Foundation, as well as industry sponsors including Nissan, the Volkswagen Electronic Research Laboratory, and Mercedes. We also thank former and current colleagues from our Laboratory for Intelligent and Safe Automobiles (LISA) for their cooperation, assistance, and contributions: Dr. Kohsia Huang, Dr. Joel McCall, Dr. Tarak Gandhi, Dr. Sangho Park, Dr. Shinko Cheng, Dr. Steve Krotosky, Dr. Junwen Wu, Dr. Erik Murphy-Chutorian, Dr. Brendan Morris, Dr. Anup Doshi, Mr. Sayanan Sivaraman, Mr. Ashish Tawari, and Mr. Ofer Achler.

References

1. Adell, E., Várhelyi, A.: Development of HMI components for a driver assistance system for safe speed and safe distance. In: The 13th World Congress and Exhibition on Intelligent Transport Systems and Services, ExCel London, United Kingdom (2006)

2. Artaud, P., Planque, S., Lavergne, C., Cara, H., de Lepine, P., Tarriere, C., Gueguen, B.: An on-board system for detecting lapses of alertness in car driving. In: The 14th Int. Conf. Enhanced Safety of Vehicles (1994)

3. Bergasa, L.M., Nuevo, J., Sotelo, M.A., Barea, R., Lopez, M.E.: Real-time system for monitoring driver vigilance. IEEE Trans. Intell. Transp. Syst. 7(1), 63–77 (2006)

4. Burnham, G.O., Seo, J., Bekey, G.A.: Identification of human driver models in car following. IEEE Trans. Autom. Control 19(6), 911–915 (1974)

5. Cheng, S.Y., Park, S., Trivedi, M.M.: Multiperspective and multimodal video arrays for 3D body tracking and activity analysis. Comput. Vis. Image Underst. (Special Issue on Advances in Vision Algorithms and Systems Beyond the Visible Spectrum) 106(2–3), 245–257 (2007)

6. Cheng, S.Y., Trivedi, M.M.: Turn-intent analysis using body pose for intelligent driver assistance. IEEE Pervasive Comput. 5(4), 28–37 (2006)

7. Cheng, S.Y., Trivedi, M.M.: Vision-based infotainment user determination by hand recognition for driver assistance. IEEE Trans. Intell. Transp. Syst. 11(3), 759–764 (2010)

8. Datta, A., Sheikh, Y., Kanade, T.: Linear motion estimation for systems of articulated planes. In: IEEE Conference on Computer Vision and Pattern Recognition (2008)

9. Dinges, D., Mallis, M.: Managing fatigue by drowsiness detection: Can technological promises be realized? In: Hartley, L. (ed.) Managing Fatigue in Transportation. Elsevier, Oxford (1998)

10. Doshi, A., Trivedi, M.M.: Investigating the relationships between gaze patterns, dynamic vehicle surround analysis, and driver intentions. In: IEEE Intelligent Vehicles Symposium (2009)

11. Doshi, A., Trivedi, M.M.: On the roles of eye gaze and head pose in predicting driver's intent to change lanes. IEEE Trans. Intell. Transp. Syst. 10(3), 453–462 (2009)

12. Doshi, A., Trivedi, M.M.: Examining the impact of driving style on the predictability and responsiveness of the driver: Real-world and simulator analysis. In: IEEE Intelligent Vehicles Symposium (2010)

13. Fletcher, L., Loy, G., Barnes, N., Zelinsky, A.: Correlating driver gaze with the road scene for driver assistance systems. Robot. Auton. Syst. 52(1), 71–84 (2005)

14. Grace, R., Byrne, V.E., Bierman, D.M., Legrand, J.M., Davis, R.K., Staszewski, J.J., Carnahan, B.: A drowsy driver detection system for heavy vehicles. In: Proceedings of the 17th Digital Avionics Systems Conference (DASC), AIAA/IEEE/SAE (1998)

15. Hammoud, R., Wilhelm, A., Malawey, P., Witt, G.: Efficient real-time algorithms for eye state and head pose tracking in advanced driver support systems. In: IEEE Conference on Computer Vision and Pattern Recognition (2005)

16. Healey, J.R., Carty, S.S.: Driver error found in some Toyota acceleration cases. USA Today (2010)

17. Huang, K.S., Trivedi, M.M., Gandhi, T.: Driver's view and vehicle surround estimation using omnidirectional video stream. In: IEEE Intelligent Vehicles Symposium (2003)

18. Ishikawa, T., Baker, S., Matthews, I., Kanade, T.: Passive driver gaze tracking with active appearance models. In: The 11th World Congress on Intelligent Transportation Systems (2004)

19. Ito, T., Kanade, T.: Predicting driver operations inside vehicles. In: IEEE International Conference on Automatic Face and Gesture Recognition (2008)

20. Ji, Q., Zhu, Z., Lan, P.: Real time non-intrusive monitoring and prediction of driver fatigue. IEEE Trans. Veh. Technol. 53(4), 1052–1068 (2004)

21. McCall, J., Wipf, D., Trivedi, M.M., Rao, B.: Lane change intent analysis using robust operators and sparse Bayesian learning. IEEE Trans. Intell. Transp. Syst. 8(3), 431–440 (2007)

22. McCall, J.C., Trivedi, M.M.: Driver behavior and situation aware brake assistance for intelligent vehicles. Proc. IEEE 95(2), 374–387 (2007)

23. Mulder, M., Pauwelussen, J.J.A., van Paassen, M.M., Mulder, M., Abbink, D.A.: Active deceleration support in car following. IEEE Trans. Syst. Man Cybern., Part A, Syst. Hum. 40(6), 1271–1284 (2010)

24. Murphey, Y.L., Milton, R., Kiliaris, L.: Driver's style classification using jerk analysis. In: IEEE Workshop on Computational Intelligence in Vehicles and Vehicular Systems (2009)

25. Murphy-Chutorian, E., Trivedi, M.M.: Head pose estimation in computer vision: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 31(4), 607–626 (2009)

26. Murphy-Chutorian, E., Trivedi, M.M.: Head pose estimation and augmented reality tracking: An integrated system and evaluation for monitoring driver awareness. IEEE Trans. Intell. Transp. Syst. 11(2), 300–311 (2010)

27. NHTSA: Traffic safety facts 2006 – a compilation of motor vehicle crash data from the Fatality Analysis Reporting System and the General Estimates System. Nat. Center Stat. Anal., US Dept. Transp., Washington, DC (2006)

28. Peden, M., Scurfield, R., Sleet, D., Mohan, D., Hyder, A.A., Jarawan, E., Mathers, C.: World report on road traffic injury prevention: Summary. World Health Organization, Geneva, Switzerland (2004)

29. Pugeault, N., Bowden, R.: Learning pre-attentive driving behaviour from holistic visual features. In: The 11th European Conference on Computer Vision (2010)

30. Smith, P., Shah, M., Lobo, N.V.: Determining driver visual attention with one camera. IEEE Trans. Intell. Transp. Syst. 4(4), 205–218 (2003)

31. Tran, C., Trivedi, M.M.: Driver assistance for keeping hands on the wheel and eyes on the road. In: IEEE International Conference on Vehicular Electronics and Safety (2009)

32. Tran, C., Trivedi, M.M.: Introducing 'XMOB': Extremity movement observation framework for upper body pose tracking in 3D. In: IEEE International Symposium on Multimedia (2009)

33. Tran, C., Trivedi, M.M.: Towards a vision-based system exploring 3D driver posture dynamics for driver assistance: Issues and possibilities. In: IEEE Intelligent Vehicles Symposium (2010)

34. Trivedi, M.M., Cheng, S.Y.: Holistic sensing and active displays for intelligent driver support systems. IEEE Comput. 40(5), 60–68 (2007)

35. Trivedi, M.M., Cheng, S.Y., Childers, E., Krotosky, S.: Occupant posture analysis with stereo and thermal infrared video: Algorithms and experimental evaluation. IEEE Trans. Veh. Technol. (Special Issue on In-Vehicle Vision Systems) 53(6), 1698–1712 (2004)

36. Trivedi, M.M., Gandhi, T., McCall, J.: Looking-in and looking-out of a vehicle: Computer-vision-based enhanced vehicle safety. IEEE Trans. Intell. Transp. Syst. 8(1), 108–120 (2007)

37. Veeraraghavan, H., Atev, S., Bird, N., Schrater, P., Papanikolopoulos, N.: Driver activity monitoring through supervised and unsupervised learning. In: IEEE Conference on Intelligent Transportation Systems (2005)

38. Wu, J., Trivedi, M.M.: An eye localization, tracking and blink pattern recognition system: Algorithm and evaluation. ACM Trans. Multimedia Comput. Commun. Appl. 6(2) (2010)

39. Yammamoto, K., Higuchi, S.: Development of a drowsiness warning system. J. Soc. Automot. Eng. Jpn. (1992)