SPPRA'2013 paper presentation

Authors: Andrews Sobral, Leizer Schnitman, Luciano Oliveira, Felippe De Souza

• Introduction
• Proposed System
• Crowd Segmentation
• Background Subtraction Evaluation
• Crowd Density Estimation
• Crowd Tracking
• Speed Estimation of Vehicle Crowd
• Feature Vector
• Experimental Evaluation
• Discussion and Conclusions

Intelligent vision systems for urban traffic surveillance have been adopted more frequently.

The traditional approaches are based on detection and counting of individual vehicles.

Basically, each vehicle is segmented and tracked, and its motion trajectory is analyzed to estimate traffic flow and vehicle speed and to detect parked vehicles.

• AgilityVideo system for vehicle counting.
• Vaxtor Systems for vehicle speed estimation.
• I2V system for vehicle counting and classification.
• VCA system for vehicle detection, tracking and classification.

Most of the existing work commonly fails in crowded situations due to the heavy occlusion of moving objects.

Alternative methods for dealing with this problem gave rise to a new field of study called crowd analysis (Junior et al., 2010).

There are two approaches to perform behavioral analysis of crowded scenes:

• Object-based approaches try to infer crowd behavior by analyzing individual elements of the scene (tracking of some individuals to analyze the group behavior).

• Holistic approaches evaluate the crowd as an individual entity.

Holistic approaches try to obtain global information, such as crowd flows, and skip local information (e.g. single vehicle against the flow).

Some holistic properties can be extracted from crowd behavior analysis (Saxena et al. (2008), Zhan et al. (2008) and Junior et al. (2010)), such as:

• Density
• Speed
• Direction
• Localization

They can be extracted from:

• Background Subtraction
• Optical Flow
• Texture, Color and Edge analysis
• Analysis in Frequency Domain

Figure labels: Input video, Vehicle crowd.

Related works

• Porikli and Li (2004): DCT and MPEG flow vectors
• Chan and Vasconcelos (2005): ARIMA
• Lee and Bovik (2009): histogram of optical flow vectors
• Derpanis and Wildes (2011): 3D Gaussian + Fourier

Classification based on an autoregressive model (Chan and Vasconcelos, 2005).

Commercial system from ObjectVideo for classification of traffic status.

We propose a method to classify traffic patterns based on a holistic approach.

The method classifies the traffic into three classes (light, medium or heavy congestion) using the average crowd density and the average speed of the vehicle crowd.

Heavy congestion is represented by a high crowd density and a low (or zero) crowd speed. Conversely, when crowd density is low and crowd speed is high, the system considers that the traffic has light congestion. In intermediate situations, the traffic is classified as medium congestion.

Here we estimate the vehicle crowd density through a background subtraction (BGS) process.

Five recent background subtraction methods are evaluated on the ChangeDetection.net video database.

All BGS methods were run with the default parameters defined in their original works.

As can be seen in Table 1, the Multi-Layer method proposed by Yao and Odobez (2007) had the best score.

The crowd density is determined by counting the number of pixels in the foreground mask obtained by the Multi-Layer BGS. This procedure is performed for each video frame.

Note: The traffic crowd density is estimated as the average of the density variation in each video.
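
The density computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each foreground mask is a 2-D grid in which any nonzero value means foreground, and it normalizes the pixel count by the frame area (an assumption; the slides only mention counting pixels). The function names are hypothetical.

```python
def frame_density(mask):
    """Fraction of foreground (nonzero) pixels in one binary mask."""
    total = sum(len(row) for row in mask)
    foreground = sum(1 for row in mask for pixel in row if pixel)
    return foreground / total


def video_density(masks):
    """Average of the per-frame densities over all frames of a video."""
    densities = [frame_density(m) for m in masks]
    return sum(densities) / len(densities)


# Two tiny 2x2 masks: one foreground pixel, then three foreground pixels.
masks = [
    [[0, 255], [0, 0]],
    [[255, 255], [0, 255]],
]
print(video_density(masks))  # 0.25 and 0.75 average to 0.5
```

Normalizing by the frame area keeps the value in [0, 1], which makes densities comparable across resolutions.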

To perform the crowd tracking, the traditional KLT (Kanade-Lucas-Tomasi) tracker method was chosen.

Example of feature point tracking. Given two consecutive frames (a), a certain number of feature points are extracted in the first frame (b) and their matching points are sought in the second frame (c). The filtered-out points are shown in (d).

To estimate the speed, the average displacement of the feature points along all frames is calculated.
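
A minimal sketch of this speed estimate is shown below. It assumes the tracker has already produced, for each pair of consecutive frames, a list of matched point pairs; the data layout, the function names, and the choice to average per frame pair before averaging over the video are all assumptions made for illustration.

```python
import math


def mean_displacement(matches):
    """Mean Euclidean distance between matched points of two frames.

    `matches` is a list of ((x0, y0), (x1, y1)) pairs.
    """
    distances = [math.hypot(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in matches]
    return sum(distances) / len(distances)


def video_speed(matches_per_frame_pair):
    """Average displacement (pixels per frame) over all frame pairs of a video."""
    per_pair = [mean_displacement(m) for m in matches_per_frame_pair if m]
    return sum(per_pair) / len(per_pair) if per_pair else 0.0


pairs = [
    [((0, 0), (3, 4)), ((10, 10), (10, 15))],  # displacements 5 and 5
    [((1, 1), (1, 1))],                        # a stationary point
]
print(video_speed(pairs))  # (5.0 + 0.0) / 2 = 2.5
```

The result is in pixels per frame; converting it to a physical speed would require camera calibration, which the slides do not describe.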

To train the classifier for predicting traffic congestion, a feature vector is built for the i-th processed video from:

• the average crowd density of the i-th video
• v_i, the average crowd speed of the i-th video

They are all combined into one training vector.

UCSD highway traffic data set

The data set contains 254 videos of daytime highway traffic in Seattle. All videos were recorded from a single stationary camera, totaling 20 minutes. The data set includes a diversity of traffic patterns (light, medium and heavy congestion) under a variety of weather conditions (e.g., clear, raining and cloudy).

Each video has 42-52 frames at 320x240 resolution, recorded at 10 frames per second (fps). The data set also provides a hand-labeled ground truth that describes each video sequence. Table 2 shows a summary of the UCSD data set.

Highway traffic video classification

The same training and testing methodology as Chan and Vasconcelos (2005) and Derpanis and Wildes (2011) is adopted here.

The experimental evaluation consists of four trials (T1,...,T4), where in each trial the data set is split into 75% for training and cross-validation and 25% for testing.
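
The trial structure can be illustrated with the sketch below. It draws a seeded random 75/25 split for each trial, which is an assumption made only for illustration: the actual trials follow the splits used in the cited works, which are not reproduced in these slides. The function name and parameters are hypothetical.

```python
import random


def split_trials(n_videos, n_trials=4, train_fraction=0.75, seed=0):
    """Return a list of (train_indices, test_indices), one pair per trial."""
    rng = random.Random(seed)
    n_train = int(round(n_videos * train_fraction))
    trials = []
    for _ in range(n_trials):
        indices = list(range(n_videos))
        rng.shuffle(indices)
        trials.append((sorted(indices[:n_train]), sorted(indices[n_train:])))
    return trials


trials = split_trials(254)
for t, (train, test) in enumerate(trials, start=1):
    print(f"T{t}: {len(train)} training videos, {len(test)} test videos")
```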

Feature classification is evaluated using four classifiers (kNN, Naive Bayes, SVM and ANN-MLP).

The results obtained for each classifier are shown in the next slides.

kNN with Euclidean distance was used, and the number of neighbors K was evaluated empirically over the range K = 1, ..., 10.
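
A minimal sketch of this kNN evaluation follows. The (density, speed) values and labels are made up for illustration and are not taken from the UCSD data; the function names are hypothetical. Ties in the vote are resolved in favor of the label seen first among the nearest neighbors, which is one possible convention.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Label of `query` by majority vote among its k nearest training points."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)


# Made-up (density, speed) feature vectors, one per video.
train_x = [(0.10, 0.90), (0.15, 0.80), (0.50, 0.50),
           (0.45, 0.55), (0.90, 0.10), (0.85, 0.05)]
train_y = ["light", "light", "medium", "medium", "heavy", "heavy"]
val_x = [(0.12, 0.85), (0.50, 0.45), (0.80, 0.10)]
val_y = ["light", "medium", "heavy"]

# Evaluate K = 1, ..., 10 empirically, as described above.
for k in range(1, 11):
    print(k, accuracy(train_x, train_y, val_x, val_y, k))
```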

Here the Naive Bayes classifier (NBC) assumes a Gaussian distribution.

Table 4 shows the accuracy of NBC.
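
To make the Gaussian assumption concrete, the sketch below fits one mean and one variance per feature and per class, then picks the class with the highest log-posterior. It is a generic Gaussian Naive Bayes written for illustration, not the implementation used in the experiments; the function names, the variance floor and the toy data are assumptions.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(xs, ys):
    """Per class: prior, per-feature means and per-feature variances."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # floor avoids division by zero
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(xs), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Class with the highest log prior plus Gaussian log-likelihood."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Made-up (density, speed) feature vectors, one per video.
xs = [(0.10, 0.90), (0.15, 0.80), (0.50, 0.50),
      (0.45, 0.55), (0.90, 0.10), (0.85, 0.05)]
ys = ["light", "light", "medium", "medium", "heavy", "heavy"]
model = fit_gaussian_nb(xs, ys)
print(predict_gaussian_nb(model, (0.12, 0.85)))
```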

Here, the following SVM kernels are selected: linear, polynomial, radial basis function and sigmoid.

The parameters of each kernel function have been adjusted automatically by 3-fold and 10-fold cross-validation on training sets.
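
The k-fold selection idea can be sketched independently of any particular SVM library. In the sketch below, `score` is a hypothetical callback that would train a model with a candidate parameter on the training indices and return its accuracy on the validation indices; the toy scorer at the end only stands in for that step. All names are assumptions.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, validation) index pairs."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    pairs = []
    for i, validation in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        pairs.append((train, validation))
    return pairs


def select_parameter(candidates, n_samples, k, score):
    """Pick the candidate with the best mean validation score over k folds."""
    def mean_score(candidate):
        pairs = k_fold_indices(n_samples, k)
        return sum(score(candidate, tr, va) for tr, va in pairs) / len(pairs)
    return max(candidates, key=mean_score)


def toy_score(candidate, train_idx, val_idx):
    """Stand-in scorer that simply prefers candidates close to 2."""
    return -abs(candidate - 2)


print(select_parameter([1, 2, 3], n_samples=10, k=3, score=toy_score))
```

Using both 3-fold and 10-fold validation, as the slide mentions, would amount to calling `select_parameter` with `k=3` and `k=10`.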

The MLP network was configured as follows:

a) the input layer has 2 neurons (one for crowd density and the other for crowd speed);

b) the hidden layer is evaluated with 2 to 5 neurons;

c) the output layer contains 3 neurons, one for each traffic pattern (light, medium and heavy).

All neurons use the same activation function. The sigmoid function (SIG) and the Gaussian function (GAU) are evaluated with standard parameters.

Two training algorithms are chosen: the traditional gradient descent back-propagation (BPROP) and resilient back-propagation (RPROP).

The experimental results have shown that the ANN-MLP network has the best accuracy (94.5%).

The most critical misclassification occurs between medium and heavy traffic patterns (as the next slide shows).

All experiments were performed on a computer with an Intel Core i5-2410M processor.

On the UCSD data, the proposed system requires an average of 30 ms per frame for background segmentation and tracking.

Chan and Vasconcelos (2005) obtained 94.5% accuracy using an SVM classifier.

Later, Derpanis and Wildes (2011) achieved an accuracy of 95.3% with a kNN classifier (Table 9, next slide).

The present proposal achieved 94.5% accuracy using ANN-MLP (Artificial Neural Network, Multi-Layer Perceptron).

This paper has presented a system for traffic congestion classification based on crowd density and crowd speed of vehicles.

The present approach is based on crowd segmentation and tracking using a robust background subtraction method and the traditional pyramidal KLT feature tracker.

Experimental evaluation on a real-world data set shows that the proposed system achieves results comparable to those of similar previous work, even though it uses a different approach.

In previous works, Chan and Vasconcelos (2005) and Derpanis and Wildes (2011) reported that dynamical information alone is insufficient to distinguish empty traffic from stopped traffic (both stationary), since the pixel dynamics are similar.

Derpanis and Wildes (2011) suggest that one possible solution is to incorporate spatial appearance information of the background scene to distinguish the presence (or absence) of vehicles.

In the present work, the BGS method includes both spatial appearance and dynamical information to build a background model, but the problem of distinguishing empty-traffic from stopped-traffic (both stationary) still remains a challenge since:

a) to initialize the background model it is necessary that traffic is not stopped, otherwise the background model will include stopped vehicles; and

b) even having a reasonable background model, the BGS method needs updating with a certain learning rate. If the vehicles are stopped for a lengthy period, the BGS method may include stopped vehicles in the background model.
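
Point b) can be illustrated with a simple running-average background model, used here only as a stand-in: it is not the Multi-Layer BGS method, and the intensities, threshold and function names are assumptions. A pixel covered by a motionless vehicle is gradually blended into the background, and the higher the learning rate, the sooner it stops being detected as foreground.

```python
def update_background(background, frame, learning_rate):
    """Blend the current frame into the background model, pixel by pixel."""
    return [(1 - learning_rate) * b + learning_rate * f
            for b, f in zip(background, frame)]


def frames_until_absorbed(road, vehicle, learning_rate, threshold):
    """Count frames until a motionless vehicle pixel stops being foreground."""
    background = [road]
    frames = 0
    while abs(vehicle - background[0]) > threshold:
        background = update_background(background, [vehicle], learning_rate)
        frames += 1
    return frames


# Road intensity 100, stopped vehicle intensity 200, foreground if |diff| > 30.
for rate in (0.01, 0.05, 0.2):
    print(rate, frames_until_absorbed(100, 200, rate, 30))
```

A lower learning rate delays the absorption but also makes the model slower to adapt to genuine scene changes such as lighting.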

So, the problem of empty-traffic and stopped-traffic is not completely solved here.

Another possible solution for future research may be the development of a crowd detector that uses only spatial appearance to segment the vehicle crowd.

Thank you!

Felippe de Souza

BUCH, N.; VELASTIN, S.; ORWELL, J. A review of computer vision techniques for the analysis of urban traffic. IEEE Transactions on Intelligent Transportation Systems (ITS'11), v. 12, n. 3, p. 920-939, Sept. 2011.

CHAN, A.; VASCONCELOS, N. Classification and retrieval of traffic video using auto-regressive stochastic processes. In: IEEE Intelligent Vehicles Symposium, 2005. p. 771-776.

DERPANIS, K. G.; WILDES, R. P. Classification of traffic video based on a spatiotemporal orientation analysis. In: Proceedings of the 2011 IEEE Workshop on Applications of Computer Vision (WACV). Washington, DC, USA: IEEE Computer Society, 2011. (WACV'11), p. 606-613. ISBN 978-1-4244-9496-5.

JUNIOR, J. J.; MUSSE, S.; JUNG, C. Crowd analysis using computer vision techniques. IEEE Signal Processing Magazine, v. 27, n. 5, p. 66-77, Sept. 2010.

LUCAS, B. D.; KANADE, T. An iterative image registration technique with an application to stereo vision. In: Proceedings of the 7th international joint conference on Artificial intelligence - Volume 2. [S.l.: s.n.], 1981. p. 674-679.

LEE, J.; BOVIK, A. Estimation and analysis of urban traffic flow. In: 16th IEEE International Conference on Image Processing (ICIP'09). [S.l.: s.n.], 2009. p. 1157-1160. ISSN 1522-4880.

SANTORO, F.; PEDRO, S.; TAN, Z.-H.; MOESLUND, T. B. Crowd analysis by using optical flow and density based clustering. Proceedings of the European Signal Processing Conference (EUSIPCO), European Association for Signal Processing (EURASIP), v. 18, p. 269-273, 2010. ISSN 2076-1465.

SAXENA, S.; BREMOND, F.; THONNAT, M.; MA, R. Crowd behavior recognition for video surveillance. In: Proceedings of the 10th International Conference on Advanced Concepts for Intelligent Vision Systems. Berlin, Heidelberg: Springer-Verlag, 2008. (ACIVS'08), p. 970-981. ISBN 978-3-540-88457-6.

SENST, T.; EISELEIN, V.; SIKORA, T. Robust local optical flow for feature tracking. IEEE Transactions on Circuits and Systems for Video Technology, 2012.

SHI, J.; TOMASI, C. Good features to track. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'94). [S.l.: s.n.], 1994. p. 593-600. ISSN 1063-6919.

SOBRAL, A.; OLIVEIRA, L.; SCHNITMAN, L.; SOUZA, F. D. Highway traffic congestion classification using holistic properties. 10th IASTED International Conference on Signal Processing, Pattern Recognition and Applications (SPPRA'2013), Feb. 2013.

YAO, J.; ODOBEZ, J. Multi-layer background subtraction based on color and texture. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07), 2007. p. 1-8.

YILMAZ, A.; JAVED, O.; SHAH, M. Object tracking: A survey. ACM Computing Surveys, ACM, New York, NY, USA, v. 38, n. 4, Dec. 2006. ISSN 0360-0300.

ZHAN, B.; MONEKOSSO, D. N.; REMAGNINO, P.; VELASTIN, S. A.; XU, L.-Q. Crowd analysis: a survey. Machine Vision and Applications, v. 19, n. 5-6, p. 345-357, 2008.