

Vehicles overtaking detection using RGB-D data

Yingjie Xia a,*, Chunhui Wang a, Xingmin Shi a, Luming Zhang b

a Intelligent Transportation and Information Security Lab, Hangzhou Normal University, Hangzhou, Zhejiang, China
b School of Computing, National University of Singapore, Singapore

Article info

Article history: Received 1 March 2014; Received in revised form 11 May 2014; Accepted 30 July 2014

Keywords: RGB-D; Traffic scene understanding; Vehicle tracking; Overtaking detection; Intelligent transportation systems

http://dx.doi.org/10.1016/j.sigpro.2014.07.025

* Corresponding author. E-mail address: [email protected] (Y. Xia).

Abstract

An outstanding breakthrough in traffic video surveillance has been made with the development of computer vision techniques. However, some problems remain to be solved due to the limitations of two-dimensional (2D) video data. With the popularity of three-dimensional (3D) cameras, high-quality color and depth data can be obtained simultaneously in real time through the video stream. In this paper, we propose a vehicles overtaking detection method using RGB-D data captured by the Kinect device in simulated traffic scenes. Vehicles are detected and tracked with a robust traffic scene understanding on RGB-D data. The depth data is utilized to recognize vehicles overtaking by analyzing the posture change of vehicles in different scenes. The principle of vehicles overtaking detection is to fit the line of the vehicle side in the coordinate system and calculate the angle between the vehicle side and the road orientation. As a consequence, the posture change of the vehicle can be recognized. The proposed method is evaluated using simulation experiments, which show that our work performs well in vehicles overtaking detection and tracking.

© 2014 Elsevier B.V. All rights reserved.


1. Introduction

Overtaking is a complex maneuver which has resulted in great loss of life and property. In vehicles overtaking, the blind spot of drivers increases the chance of traffic accidents. As a result, overtaking detection and warning are helpful in avoiding potential accidents. When a driver intends to overtake the preceding vehicle, a warning message can be sent to prompt the two vehicles to evade each other.

There exist various sensors to detect overtaking vehicles, such as sonar and radar. However, the performance of those sensors is unsatisfactory because they are susceptible to the external environment. Compared with these sensors, the digital camera has a cost advantage and can provide various functions, such as overtaking detection, collision avoidance, lane detection, traffic sign recognition, object identification, and vehicle license plate recognition. With the rapid development of computer vision, vision-based traffic surveillance has become an important tool in intelligent transportation systems (ITS). For example, to improve the robustness of overtaking vehicle detection, a series of research work using the traffic camera has been conducted. However, most of this work uses two-dimensional (2D) video data, which encounters many difficulties, such as camera vibrations, illumination changes, vehicle poses, occasional overlaps and complex backgrounds [1]. These problems reduce the stability and efficiency of vehicle overtaking detection.

Driven by indoor motion-sensing games, RGB-D cameras have attracted great attention from developers. Microsoft Kinect is one such RGB-D camera; it can capture color images and per-pixel depth information in real time.



The depth information gives us more choices when selecting object features. Moreover, it can effectively overcome the influence of complex backgrounds, illumination changes and scale variations. As a consequence, it can make the detection result more reliable than traditional 2D images. Nowadays, the RGB-D camera is widely applied in various fields, such as people detection in robot systems [2] and three-dimensional (3D) modeling of indoor environments [3].

In order to enhance the robustness of tracking and overtaking detection, we propose a method using RGB-D data. The color and depth information of traffic scenes are captured by Microsoft Kinect. In this paper, we build simulation scenarios in which the vehicles are near the Kinect and their size is small. In real scenes, however, the size of vehicles is large and the distance between the camera and the vehicles becomes far. If the distance is less than 10 m, the Kinect can still work by adapting to the "far mode"; otherwise, if the distance is greater than 10 m, we need to choose another type of workable depth camera. The vehicles in different scenarios are first detected and tracked with a robust 3D tracking algorithm. Subsequently, according to the depth data of the detected vehicles, the behaviors of the tracked vehicles are analyzed and recognized. Finally, a judgment can be made on whether a vehicle has a trend to overtake the preceding vehicle. The contribution of this paper is twofold: (1) we utilize the RGB-D data to make the tracking algorithm more robust, which makes up for the limitations of color information; (2) the RGB-D data provides better semantic understanding of vehicles overtaking.

The rest of the paper is organized as follows: Section 2 provides some related work. Section 3 first presents our approach for 3D vehicle tracking, which integrates color and depth information for better performance; then the method for overtaking judgment is specified. Section 4 presents the simulation experimental results and their analysis. Conclusions and future research are finally presented in Section 5.

2. Related work

With the development of RGB-D cameras, obtaining color and depth images stably and in real time has become a reality. Under the promotion of RGB-D cameras, a series of research has been conducted in various fields. Henry et al. applied depth cameras to build 3D maps of indoor environments and detect loop closures with a joint optimization algorithm which consists of both visual features and shape-based alignment [3]. In order to overcome the problems of depth data missing and uncertainty, Shen and Cheung proposed a probabilistic model based on the concept of depth layers [4]. This method obtains the depth layer labels by solving a maximum a posteriori probability problem, and the process is smoothed through a Markov random field algorithm. Since real-time processing is a key issue for RGB-D data, Holz et al. proposed a highly efficient plane labeling algorithm based on fast plane segmentation in 3D point clouds [5].

Video-based object tracking is an important topic in the field of computer vision. In the past decades, a large number of algorithms have been proposed, but many challenging issues still need to be solved. The task of object tracking is to follow the locations of moving targets across sequential video frames so that the motion or trajectory information of the moving objects can be learned automatically.

Detection of moving objects is usually the first step of tracking. To detect the moving objects in a surveillance video, a simple and widely used approach is the frame difference (FD) method, in which no background model needs to be estimated or updated and whose processing speed is faster than that of other methods [6]. The FD method subtracts two or three time-adjacent frame images to obtain difference images; the moving objects can then be segmented with a specific threshold value. This approach is easy to implement, but the complete contour of the moving objects is hard to obtain. Moreover, the 'double' and 'hole' phenomena are often generated in the internal region of the object. As a result, the obtained object information may become inaccurate. Another detection method is background subtraction, which subtracts the background model image from the current frame to extract the target in the video [7]. In background subtraction, the background modeling algorithm is a critical process. The modeling method can be based on codebooks [8] or Gaussian mixture models [9]; the latter is most commonly utilized due to its effectiveness and robustness.

After locating the target objects, matching is the final stage of tracking. The methods usually adopted for matching include the feature-based method and the model-based method [10]. In the feature-based method, feature selection is critical for object matching, because distinctive features make the judgments more efficient and robust, and different features can be selected for different application scenarios. Commonly used features include color features [11], texture features [12], shape features [9], spatio-temporal features [13], local features [14] and combinations of them [15]. Because the speeded-up robust features (SURF) detector can extract key points of the objects and the SURF descriptor is fully invariant to rotation and scale changes, Miao et al. proposed a new framework for on-line object tracking based on SURF [16] to improve overall tracking performance. In some instances, a single feature may not be enough for object detection, and Wang et al. incorporated optical flow, color and depth in their framework [17]. The model-based method requires the generation of an object model or template, which could be an appearance model, a shape model and so on, before tracking. Song and Nevatia utilized vehicle shape models to successfully separate the vehicles in merged blobs [18]. Zhong et al. adopted a patch-based appearance model combined with local background estimation to tackle the problems of drift and occlusion, and robust tracking is achieved [19].

The advent of the RGB-D camera has made it possible to provide depth information and RGB values for each imaged pixel. The depth data has several advantages in foreground/background segmentation under challenging scenarios, in which the illumination changes gradually and there may exist occlusions and shadows. It is difficult to make a tracking or segmentation algorithm adapt to these challenging conditions with 2D images. Wei et al. utilized RGB-D data to accurately detect and segment objects in the foreground, and then classified the objects into pedestrian and non-pedestrian regions by estimating the distance and classifying with shape and texture features [20].

RGB-D cameras have been applied in ITS, where overtaking detection and assistance play a key role in reducing traffic accidents and have drawn significant attention. Batavia et al. detected vehicles in the blind spot with an implicit optical-flow-based method; the method does not explicitly calculate the optical flow field, so it saves much computational resource [21]. Milanés et al. proposed an intelligent automatic overtaking system which can decide whether to overtake by analyzing the preceding environment in real time. Furthermore, this system can perform overtaking autonomously, considering time-to-collision, the width and length of the preceding vehicle and other information from a differential global positioning system and an inertial measurement unit [22]. Wang et al. put forward a method which models the background into dynamic and quasi-static regions to detect overtaking vehicles. In this method, homogeneous sparse optical flow is utilized to minimize the impact of camera vibrations. Moreover, a block-based eigenspace algorithm which models the quasi-static background is applied to make the method more robust to illumination change [23]. In order to reduce the influence of shadows and illumination, Zhu et al. proposed an overtaking detection approach which is supplementary to previous sensing systems. The method utilizes dynamic scene modeling, hypothesis testing, and variable-bandwidth density and multi-scale mean-shift based information fusion to make the detection more robust and reliable to illumination change and camera vibration [24].

3. Approach

The whole approach contains two steps: 3D vehicle tracking and overtaking detection. They are specified as follows.

3.1. 3D vehicle tracking

Object tracking is regularly performed in the RGB color space of each surveillance video frame, and the tracked objects are classified. Most tracking methods are applicable to relatively simple or specific scenarios. However, when occlusion and light variation exist, these methods may not work well in extracting the object. This is because a 2D image does not contain any spatial information, so the relative positions of objects cannot be analyzed easily.

In this paper, we utilize a vehicle tracking method that introduces depth information into the RGB workspace to identify and classify objects efficiently and correctly. In our method, the RGB image is used to track the moving vehicles in the surveillance video, and depth information is utilized to differentiate multiple vehicles in the current scenario. The process of 3D vehicle tracking is illustrated in Fig. 1.

Fig. 1. Flowchart of 3D vehicle tracking.

3.1.1. Vehicle detection in RGB color space

The developed techniques for extracting moving objects from a video stream can be mainly grouped into two methods: background subtraction (BS) and frame differencing (FD). For BS, an efficient background modeling method is essential to obtain accurate foreground objects, but generating a more perfect background model leads to an increase in time consumption; a perfect background modeling method does not seem to exist so far. The FD method focuses on the temporal intensity variation of each pixel in consecutive adjacent frames to extract moving objects. Compared with BS, FD provides a faster and more efficient way of tracking and can be used in real-time tracking applications, since there is no need to estimate and update a background model continuously. Besides, when vehicles run too fast, motion blur may influence the vehicle detection. As a solution, we can directly employ an automatic deblurring method [25] as an independent preprocessing step for the motion blur.

Let $I_n(x, y)$ be the $n$th frame; $I_{n-1}(x, y)$ and $I_{n+1}(x, y)$ refer to the $(n-1)$th and $(n+1)$th frames, respectively. The differences can be computed per pixel with two adjacent frames based on Eq. (1). For FD with two frames, a part of the moving region (the region in white) which does not belong to the moving object will be detected due to intensity variation in this region, as illustrated in Fig. 2.

Fig. 2. FD method with two frames.

In this paper, we adopt a three-frame differencing method to detect moving vehicles in the video in order to avoid the problem mentioned above. The principle of this method is illustrated in Fig. 3, where the white region, the common region of $D_{n+1,n}(x, y)$ and $D_{n,n-1}(x, y)$, accurately represents the moving object in the $n$th frame. The equations are formulated in (1)–(3).

Fig. 3. Three-frame FD method.

$D_{n,n-1}(x, y) = |I_n(x, y) - I_{n-1}(x, y)| \quad (1)$

$D_{n+1,n}(x, y) = |I_{n+1}(x, y) - I_n(x, y)| \quad (2)$


$F_n(x, y) = D_{n,n-1}(x, y) \cap D_{n+1,n}(x, y) \quad (3)$

Due to the complex condition of moving regions, some holes or discontinuous edges may be left inside the region. As a result, some morphological operations are utilized to fill in the holes and connect the discontinuous edges.
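For illustration, the following is a minimal C++ sketch of this three-frame differencing step (the paper's experiments are implemented in C++); OpenCV is assumed here, and the grayscale inputs, the binarization threshold of 30 and the 5×5 closing kernel are illustrative choices, not values given in the paper.

#include <opencv2/opencv.hpp>

// Three-frame differencing (Eqs. (1)-(3)): the moving region of frame n is
// the intersection of the two thresholded adjacent difference images.
cv::Mat threeFrameDiff(const cv::Mat& prev, const cv::Mat& cur,
                       const cv::Mat& next, double thresh = 30.0) {
    cv::Mat d1, d2, moving;
    cv::absdiff(cur, prev, d1);                        // D_{n,n-1}, Eq. (1)
    cv::absdiff(next, cur, d2);                        // D_{n+1,n}, Eq. (2)
    cv::threshold(d1, d1, thresh, 255, cv::THRESH_BINARY);
    cv::threshold(d2, d2, thresh, 255, cv::THRESH_BINARY);
    cv::bitwise_and(d1, d2, moving);                   // F_n, Eq. (3)
    // Morphological closing fills holes and reconnects broken edges.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(5, 5));
    cv::morphologyEx(moving, moving, cv::MORPH_CLOSE, kernel);
    return moving;
}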

3.1.2. Clustering of depth data

When two or more vehicles are running in the video surveillance, the vehicles in the image may be occluded or connected with each other in most cases. When these events occur, it is difficult to separate the regions efficiently using 2D images. Furthermore, if a vehicle stops in the scene temporarily, a motion-based object detection method will fail to locate it. The depth data from the RGB-D camera can solve such problems by providing the distance of each pixel from the camera, based on the assumption that occluded objects in a traffic scenario have different distances. Given the color image and the depth image shown in Fig. 4(a) and (b) respectively, we exploit the histogram of the depth image (Fig. 4(c)) to cluster the depth data into different pixel groups according to distance. In our application, the groups with the minimum or maximum distance values are not considered, for they do not represent any objects in the scene. Each of the other pixel groups represents an object in the current frame image, as illustrated in Fig. 4(d). The clustering results are used to determine whether occlusion exists. If there are occluded or connected objects, each object can be extracted with the help of the depth data clustering result, and the connected objects can be separated.

Fig. 4. Illustration of depth clustering: (a) color image, (b) depth image, (c) histogram and (d) clustering result.
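A minimal sketch of this histogram-based grouping follows, assuming a 16-bit depth map in millimeters; the bin width, minimum pixel count, working range, and the simple skipping of the boundary bins (standing in for the paper's exclusion of the nearest and farthest groups) are all illustrative assumptions.

#include <opencv2/opencv.hpp>
#include <cstdint>
#include <vector>

// Cluster depth pixels into distance groups via a coarse depth histogram.
std::vector<cv::Mat> clusterByDepth(const cv::Mat& depth16u,
                                    int binWidthMm = 200, int minPixels = 500) {
    const int maxMm = 10000;                       // assumed working range
    std::vector<int> hist(maxMm / binWidthMm, 0);
    for (int r = 0; r < depth16u.rows; ++r)
        for (int c = 0; c < depth16u.cols; ++c) {
            int d = depth16u.at<uint16_t>(r, c);
            if (d > 0 && d < maxMm) ++hist[d / binWidthMm];
        }
    std::vector<cv::Mat> groups;
    // Skip the first and last bins: nearest/farthest groups are discarded.
    for (int b = 1; b + 1 < static_cast<int>(hist.size()); ++b) {
        if (hist[b] < minPixels) continue;         // ignore sparse bins
        // Mask of pixels whose depth falls into this distance group.
        cv::Mat mask = (depth16u >= b * binWidthMm) &
                       (depth16u < (b + 1) * binWidthMm);
        groups.push_back(mask);
    }
    return groups;
}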

3.1.3. Segmentation of vehicles

In the proposed method, the moving regions MR are obtained by the frame difference tracking method and the clustered regions CR are obtained by the depth clustering. CR and MR can be formulated as Eqs. (4) and (5), respectively:

$CR = \{C_1, C_2, \ldots, C_N\}, \quad N \geq 1 \quad (4)$

$MR = \{M_1, M_2, \ldots, M_N\}, \quad N \geq 1 \quad (5)$

Therefore, the final region set containing all vehicles is $R = MR \cap CR$.

As illustrated in Fig. 5, we evaluate the detected moving regions with the clustering results derived from the depth data. For each moving region, we take the clustered regions into account, considering possible occlusion within the moving region. If occlusion occurs, the moving region is refined into multiple regions according to the depth clustering results. All bounding rectangles can then be calculated by contour finding, and the segmentation is completed with the help of the resulting rectangles.

Fig. 5. The process of vehicle segmentation.
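The intersection and contour-finding step could look like the following sketch; the minimum blob area used to reject noise is an assumed parameter.

#include <opencv2/opencv.hpp>
#include <vector>

// Refine each moving region with the depth clusters (R = MR ∩ CR) and
// extract one bounding rectangle per resulting connected component.
std::vector<cv::Rect> segmentVehicles(const cv::Mat& movingMask,
                                      const std::vector<cv::Mat>& depthGroups) {
    std::vector<cv::Rect> boxes;
    for (const cv::Mat& group : depthGroups) {
        cv::Mat region;
        cv::bitwise_and(movingMask, group, region);    // MR ∩ CR per group
        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(region, contours, cv::RETR_EXTERNAL,
                         cv::CHAIN_APPROX_SIMPLE);
        for (const auto& contour : contours)
            if (cv::contourArea(contour) > 400.0)      // drop small blobs
                boxes.push_back(cv::boundingRect(contour));
    }
    return boxes;
}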

3.1.4. Feature extraction and matching

SURF is a kind of local feature detector which is based on the sums of 2D Haar wavelet responses around the point of interest, and it can use integral images efficiently. Many applications have shown that the SURF descriptor is much faster and more robust against different image transformations than SIFT.

Firstly, we extract the SURF feature points from the segmented vehicle image. According to the SURF algorithm, points of interest are detected with the Hessian matrix. Let $P = (x, y)$ denote a point in an image; the Hessian matrix at $P$ at scale $\sigma$ is formulated as Eq. (6), where $L_{xx}(P, \sigma)$ refers to the convolution of the Gaussian second-order derivative with the image at point $P$ for smooth filtering, and similarly for $L_{xy}(P, \sigma)$ and $L_{yy}(P, \sigma)$.

$H(P, \sigma) = \begin{bmatrix} L_{xx}(P, \sigma) & L_{xy}(P, \sigma) \\ L_{xy}(P, \sigma) & L_{yy}(P, \sigma) \end{bmatrix} \quad (6)$

The SURF feature vector is then extracted by finding a dominant orientation and constructing a square region along that direction to obtain the descriptive vector. The square region is divided into 4 × 4 sub-regions. For each sub-region we calculate $d_x$ and $d_y$, which denote the Haar wavelet responses in the vertical and horizontal directions at the point of interest, respectively, weighted with a Gaussian ($\sigma = 3.3s$). $|d_x|$ and $|d_y|$ are also summed up in order to counteract rotation. After the vector is normalized, we obtain a vector with four elements for each sub-region, formulated as Eq. (7).

$v = \left( \sum d_x, \; \sum d_y, \; \sum |d_x|, \; \sum |d_y| \right) \quad (7)$

Therefore, the SURF descriptor is formed as a feature vector consisting of 16 × 4 = 64 elements.

Finally, in the matching step, the Euclidean distance between the descriptor vectors of the points of interest in the vehicle images is computed for matching evaluation. As a result, the correspondences between the vehicles extracted in the current frame and those in the previous frame are generated.
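A sketch of this matching step is given below, assuming OpenCV's contrib SURF implementation; the Hessian threshold of 400 and the mean-distance acceptance criterion are illustrative assumptions, not the paper's exact decision rule.

#include <opencv2/opencv.hpp>
#include <opencv2/xfeatures2d.hpp>
#include <vector>

// Match a vehicle patch from the previous frame against one from the
// current frame using SURF descriptors and Euclidean (L2) distance.
bool sameVehicle(const cv::Mat& prevPatch, const cv::Mat& curPatch,
                 double maxMeanDist = 0.25) {
    cv::Ptr<cv::xfeatures2d::SURF> surf = cv::xfeatures2d::SURF::create(400.0);
    std::vector<cv::KeyPoint> kpPrev, kpCur;
    cv::Mat descPrev, descCur;
    surf->detectAndCompute(prevPatch, cv::noArray(), kpPrev, descPrev);
    surf->detectAndCompute(curPatch, cv::noArray(), kpCur, descCur);
    if (descPrev.empty() || descCur.empty()) return false;
    cv::BFMatcher matcher(cv::NORM_L2);        // Euclidean distance on 64-D vectors
    std::vector<cv::DMatch> matches;
    matcher.match(descPrev, descCur, matches);
    double total = 0.0;
    for (const cv::DMatch& m : matches) total += m.distance;
    // Declare the two detections the same vehicle if descriptors agree on average.
    return total / matches.size() < maxMeanDist;
}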

3.2. Vehicles overtaking detection

A whole vehicle overtaking process consists of several steps, including changing direction, changing lanes, accelerating, passing, and returning to the original lane. For safety purposes, a warning message should be sent to the preceding vehicle in the initial phase of overtaking. Our vehicles overtaking detection method recognizes the trend of overtaking when the vehicle changes direction. In addition, this method can also be used to judge whether drivers violate traffic regulations. In the following, the method of vehicles overtaking detection is discussed in detail.

At the beginning of overtaking, there will be an angle between the vehicle and the road orientation, as shown in Fig. 6. According to the depth values of different pixels on the vehicle, the angle can be estimated by fitting the line obtained by projecting the vehicle side onto a plane. If a vehicle is detected ahead, it is believed that the vehicle changing direction intends to overtake the preceding vehicle.

Fig. 6. Direction change of the vehicle.

The vehicles overtaking scenes are modeled in the coordinate system shown in Fig. 7. The positive direction of the X-axis points to the right of the Kinect. The y-value is the depth value, which represents the distance from the plane of the Kinect to each pixel of the vehicle. The angle between the vehicle side and the road can be obtained through the following steps.

Fig. 7. The angle between the detected vehicle side and the road.

(1) Line fitting: After the vehicle tracking, the corresponding area of the vehicle in the depth image can be determined. As a result, the depth value of every pixel of the detected vehicle can be obtained. In the coordinate system, the vehicle side captured by the Kinect can be projected onto the XY plane, approximately forming a line. The equation of this straight line is obtained as follows.

Firstly, the average y-value at intervals along the X-axis is calculated as $ad_i = (\sum_{j=1}^{n} d_j)/n$, where $d_j$ denotes the depth value of a pixel and $n$ denotes the number of pixels on a vertical line of the vehicle side. The model to obtain the average vertical depth is shown in Fig. 8.

Fig. 8. Average depth at intervals along the X-axis.

Then some representative coordinates of the vehicle side are obtained, such as $(x_1, ad_1), (x_2, ad_2), \ldots, (x_j, ad_j)$. By fitting the line with a least squares method, the linear equation of the vehicle side can be represented as $y = kx + b$, where $b$ is calculated as (8) and $k$ as (9).

$b = \frac{1}{m}\sum_{j=1}^{m} ad_j - k \cdot \frac{1}{m}\sum_{j=1}^{m} x_j \quad (8)$

$k = \left( \sum_{j=1}^{m} x_j\, ad_j - \frac{1}{m}\sum_{j=1}^{m} x_j \sum_{j=1}^{m} ad_j \right) \bigg/ \left( \sum_{j=1}^{m} x_j^2 - \frac{1}{m}\Big(\sum_{j=1}^{m} x_j\Big)^2 \right) \quad (9)$
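The fit and the resulting angle reduce to a few lines of C++; this is a minimal sketch of ordinary least squares over the $(x_j, ad_j)$ samples, with the function name and degree conversion being our own additions.

#include <cmath>
#include <vector>

// Fit y = kx + b to the (x_j, ad_j) samples by least squares (Eqs. (8)-(9))
// and return alpha = arctan(k), the angle to the road orientation, in degrees.
double vehicleSideAngleDeg(const std::vector<double>& x,
                           const std::vector<double>& ad) {
    const double m = static_cast<double>(x.size());
    double sx = 0.0, sy = 0.0, sxy = 0.0, sxx = 0.0;
    for (size_t j = 0; j < x.size(); ++j) {
        sx  += x[j];
        sy  += ad[j];
        sxy += x[j] * ad[j];
        sxx += x[j] * x[j];
    }
    const double k = (sxy - sx * sy / m) / (sxx - sx * sx / m);  // Eq. (9)
    // const double b = sy / m - k * sx / m;                     // Eq. (8), if needed
    return std::atan(k) * 180.0 / 3.14159265358979323846;
}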

(2) Overtaking judgment: Since we have obtained the line representing the vehicle side, the approximate angle between the vehicle and the road direction can be calculated as $\alpha = \arctan(k)$. The direction change of the vehicle can then be traced continuously and the motion of the vehicle can be tracked.

We set an angle threshold $\theta$ appropriate for deciding whether the detected vehicle is turning its driving direction and changing to the adjacent lane. In order to counteract the depth data errors generated randomly in the acquisition process, $\theta$ should be set reasonably to eliminate those errors. Different overtaking scenarios and the corresponding judgment methods are elucidated below.

As illustrated in Fig. 9, two vehicles are running in the driving direction, which is from right to left in the diagram. If the angle $\alpha$ between the direction of vehicle B and the road changes suddenly and its variation exceeds the threshold $\theta$, vehicle B is overtaking vehicle A on the left.

Fig. 9. Left overtaking driving from right to left.

The proposed estimation rules also apply when the vehicles are running from left to right, as shown in Fig. 10.

Fig. 10. Left overtaking driving from left to right.

If the angle is between $-\theta$ and $-90°$, it is believed that illegal overtaking on the right is taking place, as shown in Figs. 11 and 12.

Fig. 11. Right overtaking driving from right to left.

Fig. 12. Right overtaking driving from left to right.
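Taken together, the threshold rules amount to a small classifier; the sketch below assumes positive angles correspond to left overtaking, which is our reading of Figs. 9-12 rather than a convention stated explicitly in the paper.

#include <cstdlib>

enum class Maneuver { Straight, LeftOvertake, RightOvertake };

// Classify the following vehicle's maneuver from the fitted angle (degrees).
// thetaDeg is the tuned threshold (10 degrees in the paper's experiments).
Maneuver judgeOvertaking(double alphaDeg, double thetaDeg = 10.0) {
    if (alphaDeg >  thetaDeg && alphaDeg <  90.0) return Maneuver::LeftOvertake;
    if (alphaDeg < -thetaDeg && alphaDeg > -90.0) return Maneuver::RightOvertake;
    return Maneuver::Straight;   // |alpha| <= theta: depth noise, not a lane change
}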


4. Experimental results

4.1. Experiment setup

The simulation experiments, implemented in C++, are conducted on a desktop PC with a 2.6 GHz dual-core i5 CPU. Two vehicle models are arranged to simulate several traffic scenarios, including driving straight and overtaking in different directions. The Kinect was fixed horizontally at the side of the virtual road, facing perpendicular to the direction of the road. Several videos with a resolution of 640 × 480 were captured to test the effectiveness of the proposed method for both 3D vehicle tracking and overtaking detection.


4.2. 3D vehicle detection and tracking

The vehicle detection and tracking algorithm was tested at different times under different lighting conditions and scenes, for driving straight ahead and overtaking from different directions. The segmented vehicles in several daytime traffic scenes are labeled green and blue in Fig. 13 to distinguish between them. Fig. 13(a) shows the tracking scene, depth images and simulation results where the vehicles go straight ahead. Fig. 13(b) shows the case of overtaking where the smaller vehicle is going to overtake the preceding truck on the left side. In Fig. 13(c), the smaller vehicle overtakes on the right side, which is illegal in some countries. A vehicle is enclosed in a box if it is successfully tracked. As shown in Fig. 13, the gray level in the depth image changes with the depth value: the greater the depth, the greater the gray value. The experimental results show that the proposed algorithm can detect vehicles in challenging scenes and overcome occlusion effectively.

Fig. 13. Vehicle tracking in different traffic scenes during the daytime. (a) Straightly driving, (b) left overtaking and (c) right overtaking. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Some simulations were also conducted under weak lighting conditions. Fig. 14 shows parts of the experimental results at night, for driving straight, left overtaking and right overtaking. As shown in the figure, even though the color condition is poor, the vehicles in the images can still be segmented.

Fig. 14. Vehicle tracking in different traffic scenes at night. (a) Straightly driving, (b) left overtaking and (c) right overtaking.

The experimental results in Figs. 13 and 14 demonstrate the robustness of the proposed method to illumination change. Although the low-light condition affects the acquisition of color information, the depth data can still be obtained for the segmentation of vehicles and robust understanding of traffic scenes. As a result, the depth data can compensate for the influence of the low-light environment effectively.

4.3. Tracking accuracy

In order to evaluate the accuracy of the vehicle tracking algorithm, each car model in the video is first numbered manually. By comparing the number of each frame labeled by the proposed tracking algorithm with the manually labeled number, the tracking accuracy can be calculated as Eq. (10).

$Accuracy = R_{all}/N_{all} \quad (10)$

where $R_{all}$ denotes the number of frames whose labels are correct and $N_{all}$ denotes the total number of frames in a video. The tracking algorithm is evaluated on 12 videos, and Fig. 15 shows the tracking accuracy in different scenes and under different illumination conditions.

Fig. 15. Tracking accuracy during the daytime and night in different scenarios.

As shown in Fig. 15, the tracking accuracy of all experiments is higher than 70%, with the maximum reaching about 97%. The experimental results show a robust and adaptable performance of our method in different scenes, in which challenging issues such as partial occlusion can be solved with the help of depth data. Because the smaller vehicle is partially or totally blocked by the truck while overtaking on the right, the tracking accuracy becomes lower in that case; when the smaller vehicle overtakes the truck on the left, however, there is little influence on segmenting the two vehicles. The experimental results also show that the 3D tracking accuracy during the daytime is higher than that at night. The reason is that even though the depth data work well at night, they cannot completely make up for the loss of color information.

4.4. Overtaking detection

In the process of overtaking, the vehicle changes its angle gradually until it completes the lane change. Fig. 16 shows the process of the angle change in scenes of straight driving, left overtaking and right overtaking in three videos.

Fig. 16. The direction change in overtaking.

Fig. 17 shows the fitting effect of the proposed method during the daytime. The black shadows are obtained by projecting the vehicle side onto a plane whose ordinate is the depth, according to the depth data of each pixel. By averaging the depth along the vertical direction of the vehicle side, some representative points of the vehicle side are obtained. We then fit the line of the vehicle side with these points, shown as the red line in Fig. 17. Fig. 18 shows the fitting effect at night when the lighting is weak.

Fig. 17. Linear fitting of vehicle side during the daytime. (a) Straightly driving, (b) left overtaking and (c) right overtaking. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

Fig. 18. Linear fitting of vehicle side at night. (a) Straightly driving, (b) left overtaking and (c) right overtaking.

As shown in Figs. 17 and 18, the proposed overtaking detection algorithm is able to effectively detect the angle change between the vehicle and the road orientation during both daytime and night, which lays the foundation for the overtaking judgment in the next step. However, because of the error of the depth data and the roughness of the vehicle side, there is some error in the linear fitting, as shown in Fig. 17(b).

Another experiment is conducted to evaluate the overtaking detection accuracy by counting incorrect judgments on overtaking. An incorrect judgment is one of the following: (1) judging straight driving as overtaking; (2) judging left overtaking as right overtaking, or right overtaking as left overtaking; (3) missing an overtaking event. The angle threshold $\theta$ is set to 10° in order to eliminate the error of the depth data and the roughness of the vehicle side. Six videos containing different traffic incidents were captured during the daytime and at night. The ground truth of each frame is labeled manually as straight-driving, left-overtaking or right-overtaking. By calculating the angle $\alpha$ with the fitted line of the vehicle behind, the judgment on overtaking can be given. Fig. 19 shows the error rate of the overtaking detection experiments in different scenes during the daytime and at night. The error rate is defined as Eq. (11).

$ErrorRate = N_{error}/N_{all} \quad (11)$

where $N_{error}$ denotes the total number of frames with a wrong judgment and $N_{all}$ denotes the total number of frames in a video.


Fig. 19. Error rate of overtaking detection.

As shown in Fig. 19, the overall error rate of the conducted experiments is lower than 25%, and the proposed method performs well in different traffic scenes and lighting conditions according to the experimental results. Meanwhile, we also observe that the error rate remains higher when overtaking occurs, especially in the situation of right overtaking, in which most of the smaller car is blocked by the truck, and therefore little useful information can be extracted from either the color image or the depth data.

4.5. Efficiency

In order to avoid potential accidents caused by normal or illegal overtaking in a timely manner, the overtaking behavior should be recognized within a short time period. We performed three efficiency tests using three videos containing different daytime scenes. The time consumed to process each frame, from vehicle tracking to making the judgment on overtaking, is shown in Fig. 20.

Fig. 20. Time consumption for each frame.

The figure shows that the average time consumption for each frame is in the range from 200 ms to 350 ms, which can be regarded as achieving real-time performance. The time consumption for each frame is determined by the number of feature points detected by the Hessian matrix: if more feature points are extracted on the vehicles, it takes more time to match them. In addition, the proposed algorithm can detect the trend of overtaking efficiently and help avoid potential accidents with little delay.

5. Conclusions and future work

In this paper, we proposed an overtaking detection method using the RGB-D camera. This method first detects vehicles with the three-frame FD algorithm. Then the vehicles are segmented with the assistance of depth data. By extracting the SURF feature vector, which is more efficient than SIFT, the vehicles in the current and previous frames are matched, and the vehicles are tracked. Finally, the depth data of the vehicles are used to detect vehicles overtaking.

The two steps of the proposed method have been tested in different simulation scenes and different lighting conditions. The experimental results show the effectiveness of the 3D vehicle tracking and overtaking detection algorithm. The primary contribution of our method is the utilization of depth data, which is a feasible supplement to 2D vehicle detection and tracking. We can use the depth data to overcome the occlusion problem and make the vehicle segmentation more reliable. Moreover, the position and posture information of vehicles can be determined easily with depth data, and the overtaking can be recognized effectively.

In the future, by rectifying incorrect depth data and filling in missing data, the algorithm will be able to track vehicles more accurately and give more reliable judgments about overtaking. In addition, methods to detect other steps of vehicle overtaking, such as changing lanes and accelerating, will be developed.


Acknowledgment

This paper draws on work supported in part by the following funds: National High Technology Research and Development Program of China (863 Program) under grant number 2011AA010101, National Natural Science Foundation of China under grant numbers 61002009 and 61304188, Key Science and Technology Program of Zhejiang Province of China under grant number 2012C01035-1, and Zhejiang Provincial Natural Science Foundation of China under grant number LZ13F020004.

References

[1] Z. Sun, G. Bebis, R. Miller, On-road vehicle detection using optical sensors: a review, in: Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems, IEEE, 2004, pp. 585–590.

[2] L. Spinello, K.O. Arras, People detection in RGB-D data, in: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2011, pp. 3838–3843.

[3] P. Henry, M. Krainin, E. Herbst, X. Ren, D. Fox, RGB-D mapping: using depth cameras for dense 3D modeling of indoor environments, in: Experimental Robotics, Springer, 2014, pp. 477–491.

[4] J. Shen, S.-C.S. Cheung, Layer depth denoising and completion for structured-light RGB-D cameras, in: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2013, pp. 1187–1194.

[5] D. Holz, S. Holzer, R.B. Rusu, S. Behnke, Real-time plane segmentation using RGB-D cameras, in: RoboCup 2011: Robot Soccer World Cup XV, Springer, 2012, pp. 306–317.

[6] R. Liang, L. Yan, P. Gao, X. Qian, Z. Zhang, H. Sun, Aviation video moving-target detection with inter-frame difference, in: 2010 3rd International Congress on Image and Signal Processing, vol. 3, 2010, pp. 1494–1497.

[7] S. Brutzer, B. Hoferlin, G. Heidemann, Evaluation of background subtraction techniques for video surveillance, in: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2011, pp. 1937–1944.

[8] J.-M. Guo, Y.-F. Liu, C.-H. Hsia, M.-H. Shih, C.-S. Hsu, Hierarchical method for foreground detection using codebook model, IEEE Trans. Circuits Syst. Video Technol. 21 (6) (2011) 804–815.

[9] Z. Liu, H. Shen, G. Feng, D. Hu, Tracking objects using shape context matching, Neurocomputing 83 (2012) 47–55.

[10] B. Płaczek, M. Staniek, Model based vehicle extraction and tracking for road traffic control, in: Computer Recognition Systems, vol. 2, Springer, 2007, pp. 844–851.

[11] J.U. Cho, S.H. Jin, X. Dai Pham, D. Kim, J.W. Jeon, A real-time color feature tracking system using color histograms, in: 2007 International Conference on Control, Automation and Systems, ICCAS'07, IEEE, 2007, pp. 1163–1167.

[12] C.-Y. Chung, H.H. Chen, Video object extraction via MRF-based contour tracking, IEEE Trans. Circuits Syst. Video Technol. 20 (1) (2010) 149–155.

[13] H. Sabirin, M. Kim, Moving object detection and tracking using a spatio-temporal graph in H.264/AVC bitstreams for video surveillance, IEEE Trans. Multimed. 14 (3) (2012) 657–668.

[14] H.-c. Lu, C. Wang, Y.-w. Chen, Gaze tracking by binocular vision and LBP features, in: 2008 19th International Conference on Pattern Recognition, ICPR 2008, IEEE, 2008, pp. 1–4.

[15] V. Takala, M. Pietikainen, Multi-object tracking using color, texture and motion, in: 2007 IEEE Conference on Computer Vision and Pattern Recognition, CVPR'07, IEEE, 2007, pp. 1–7.

[16] Q. Miao, G. Wang, C. Shi, X. Lin, Z. Ruan, A new framework for on-line object tracking based on SURF, Pattern Recognit. Lett. 32 (13) (2011) 1564–1571.

[17] Q. Wang, J. Fang, Y. Yuan, Multi-cue based tracking, Neurocomputing.

[18] X. Song, R. Nevatia, A model-based vehicle segmentation method for tracking, in: Tenth IEEE International Conference on Computer Vision, ICCV 2005, vol. 2, IEEE, 2005, pp. 1124–1131.

[19] B. Zhong, Y. Chen, Y. Shen, Y. Chen, Z. Cui, R. Ji, X. Yuan, D. Chen, W. Chen, Robust tracking via patch-based appearance model and local background estimation, Neurocomputing 123 (2014) 344–353.

[20] X. Wei, S.L. Phung, A. Bouzerdoum, Object segmentation and classification using 3-D range camera, J. Vis. Commun. Image Represent. 25 (1) (2014) 74–85.

[21] P.H. Batavia, D.A. Pomerleau, C.E. Thorpe, Overtaking vehicle detection using implicit optical flow, in: IEEE Conference on Intelligent Transportation System, ITSC'97, IEEE, 1997, pp. 729–734.

[22] V. Milanés, D.F. Llorca, J. Villagrá, J. Pérez, C. Fernández, I. Parra, C. González, M.A. Sotelo, Intelligent automatic overtaking system using vision for vehicle detection, Expert Syst. Appl. 39 (3) (2012) 3362–3373.

[23] J. Wang, G. Bebis, R. Miller, Overtaking vehicle detection using dynamic and quasi-static background modeling, in: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, IEEE, 2005, pp. 64–64.

[24] Y. Zhu, D. Comaniciu, M. Pellkofer, T. Koehler, Reliable detection of overtaking vehicles using robust information fusion, IEEE Trans. Intell. Transp. Syst. 7 (4) (2006) 401–414.

[25] A. Agrawal, Y. Xu, R. Raskar, Invertible motion blur in video, ACM Trans. Graph. (2009) 1–8.
