Improvement of Search Strategy With KNN Approach for Traffic State Prediction

Midhun Xavier

CSE15P004

Contents

• Introduction

• Why pattern searching approach

• Why KNN

• Challenges

• Search strategy with KNN

• Single level search strategy

• Sequential search strategy

• Computation complexities

• Performance comparison

• Conclusion

• References

10/2/2016 CSE15P004 2

Introduction

• Information on the future traffic state is essential for maintaining successful Intelligent Transportation Systems (ITS) deployments and can eventually benefit society by mitigating congestion costs.

• For this reason, various data-driven algorithms aiming to increase the prediction reliability have been developed in the past:

• statistical linear models (e.g., linear regression, ARIMA family models, Kalman filter),

• artificial intelligence based models (Artificial neural networks),

• pattern searching methods (Nearest neighbors methods).

Why pattern searching approach?

• The pattern searching approach has recently been receiving attention from researchers due to the recent progress in “big data” related technologies.

• This strategy relies on historical data, which seems practicable if a sufficiently large database is available.

• One of the pioneering studies of the K-NN based approach, implemented in 1991, predicted the traffic flow and occupancy with data from the previous day.

Why KNN?

• With larger historical data, the K-NN has been found to outperform other methods such as ARIMA and feed-forward neural networks.

• However, the local linear regression model and the time-varying linear regression model have outperformed the K-NN in terms of accuracy.

• To improve the accuracy, the K-NN method is modified by incorporating a hybrid state vector, a multivariate matching process, and optimal parametric constants.

Challenges

• We have to increase the efficiency of traffic state predictions while dealing with relatively large prediction ranges and reducing computational effort.

• This paper develops a novel sequential search strategy for the K-NN based approach, which reduces the searching space sequentially.

• The proposed sequential strategy is found to outperform the conventional single-level search approach in terms of three prediction measures: accuracy, efficiency, and stability.

Search Strategy With K-NN

• The K-NN extracts k neighbors for a given input set by measuring its similarity to those neighbors, and the feature vector represents the description of a traffic situation.

• Therefore, it is important to compose the feature vector with relevant variables to recognize correct past records.

• The computational efficiency relies on the search process, which is closely linked with the dimension of the feature vector and the design of the search strategy.

Single-Level Search Strategy

• The feature vector contains current acceleration and deceleration information contributing to the identification of traffic dynamics.

• Traffic conditions of adjacent sensors are included in the feature vector by incorporating speeds collected from upstream and downstream of a Vehicle Detection System (VDS).

Feature Vector

• Fig. 1 provides a schematic description of the hypothetical highway network containing N VDS sensors (red circles). The basic feature vector (FV) at a certain time (ti) and location (Ln) contains Msi components (an Msi by 1 matrix) as:

in which,

v(ti) = speed at time ti;

v̇(ti) = acceleration rate at time ti;

Ln = nth VDS; Ln−1 = upstream VDS;

Ln+1 = downstream VDS.

Section Feature Vector

• For the study network covered with N VDS sensors (for all Ln, 1 ≤ n ≤ N), the FVs of all locations together form an Msi by N matrix, the section feature vector (SFV), as:

in which,

L1−1 indicates the VDS upstream of the first VDS,

LN+1 indicates the VDS downstream of the last VDS.
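
The construction of the FV and the SFV can be illustrated with a minimal Python sketch. The component ordering (three speeds followed by three accelerations, so Msi = 6), the backward-difference acceleration, and the function names `feature_vector` and `section_feature_vector` are assumptions made for illustration; the slides do not fix them.

```python
def feature_vector(speeds, n, i, dt_min=5.0):
    """Feature vector for the VDS in row n at time index i (i must be >= 1).

    speeds[row][t] is the speed (km/h) at time step t. Row n - 1 is the
    upstream VDS and row n + 1 the downstream VDS. Acceleration is
    approximated by a backward difference (an assumption of this sketch).
    """
    rows = (n - 1, n, n + 1)
    current_speeds = [speeds[r][i] for r in rows]
    accelerations = [(speeds[r][i] - speeds[r][i - 1]) / dt_min for r in rows]
    return current_speeds + accelerations


def section_feature_vector(speeds, i, dt_min=5.0):
    """Stack the N feature vectors as columns of an M-by-N matrix.

    speeds needs N + 2 rows: row 0 stands for L(1-1), row N + 1 for L(N+1).
    """
    n_vds = len(speeds) - 2
    columns = [feature_vector(speeds, n, i, dt_min) for n in range(1, n_vds + 1)]
    # Transpose so that rows are components and columns are locations.
    return [list(row) for row in zip(*columns)]


speeds = [[60, 60], [50, 55], [40, 40], [30, 20]]  # made-up values, N = 2
sfv = section_feature_vector(speeds, i=1)
print(len(sfv), len(sfv[0]))  # number of components, number of locations
```

The two extra rows in `speeds` play the role of L1−1 and LN+1, so every column has a defined upstream and downstream neighbor.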

Pseudo code of single level search

• At a certain location (nth VDS), the similarity between the current input and the record from the dth day can be determined by calculating the normalized Euclidean distance (which is unit-less) as:

in which,

SFVTa[, n] and SFVdyd[, n] indicate the nth column (= Ln) in the SFV of the target and of the dth day,

with 1 ≤ d ≤ Dsi, 1 ≤ n ≤ N, and 1 ≤ m ≤ Msi.
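
A normalized Euclidean distance between two SFV columns might look like the sketch below. The normalization used here (dividing each component difference by a positive scale such as the observed range of that component) is an assumption, and `normalized_distance` and `scales` are hypothetical names; the slides only state that the distance is normalized and unit-less.

```python
import math


def normalized_distance(target_col, record_col, scales):
    """Unit-less Euclidean distance between two feature-vector columns.

    scales[m] is a positive normalizing constant for component m, in the
    same unit as that component (for example its range in the database).
    """
    total = 0.0
    for target, record, scale in zip(target_col, record_col, scales):
        total += ((target - record) / scale) ** 2
    return math.sqrt(total)


# Speed differs by 30 km/h (scale 10 km/h); acceleration is identical.
print(normalized_distance([60.0, 1.0], [30.0, 1.0], [10.0, 2.0]))  # 3.0
```

Dividing by a per-component scale keeps speeds and accelerations comparable, which would otherwise be dominated by whichever component has the larger numeric range.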

• This process yields a separate similarity for each VDS location (Ln) and each day (dyd) as:

• Based on the similarity measurement, we select the ksi neighbors that yield the minimum distances in the similarity vector for each location, and then generate the future state by taking the average of the ksi neighbors for each column.

• We call this approach a single-level search strategy because it searches for the ksi historical nearest neighbors of the target traffic pattern with a single type of feature vector in a single search process.
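
For a single location, the single-level search described above can be sketched as follows. The data layout (one `(feature_column, future_speed)` pair per historical day), the scale-based normalization, and the name `single_level_predict` are assumptions of this sketch, not the pseudo code from the slides.

```python
import math


def single_level_predict(target_col, history, k, scales):
    """Average the future speeds of the k most similar historical days.

    history holds one (feature_column, future_speed) pair per past day.
    """
    def distance(record):
        feature_col = record[0]
        return math.sqrt(sum(((t - r) / s) ** 2
                             for t, r, s in zip(target_col, feature_col, scales)))

    neighbors = sorted(history, key=distance)[:k]  # k nearest days
    return sum(future for _, future in neighbors) / len(neighbors)


history = [  # made-up records: ([speed, acceleration], speed at the next step)
    ([60.0, 0.0], 58.0),
    ([30.0, -1.0], 25.0),
    ([62.0, 0.2], 61.0),
    ([90.0, 0.0], 95.0),
]
print(single_level_predict([61.0, 0.1], history, k=2, scales=[10.0, 1.0]))  # 59.5
```

Every historical day is compared using the full feature vector, which is what makes this a single search process over the whole database.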

Sequential Search Strategy

• This structure separates the attributes for each search level: the speeds of the nth VDS and its upstream and downstream VDS for Level I, and their accelerations for Level II.

• The speed is considered in the first level as it is an intuitive representative variable indicating the traffic state.

• In Level II, the acceleration is considered for distinguishing the detailed class of the current input and the historical data among the observations selected in the first stage (Level I).

• Searching for historical patterns is thus a sequential process with multiple levels.

• In the first level (Level I), search queries are made for the klvI historical patterns that are most similar to the current observation, based on FVlvI, from the whole set of data points (DlvI).

• Note that DlvI indicates the total number of days in the database, which is equivalent to the size of the data that the single-level search strategy covers (DlvI = Dsi = {dy1, dy2, . . . , dyDsi (or dyDlvI)}).

• Subsequently, in the second search process (Level II), we consider only the historical observations (DlvII = {dy1, dy2, . . . , dyDlvII}) that are the outputs of Level I.
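
The two-level process can be sketched for one location as below. This is a simplified illustration under stated assumptions: records are dictionaries with hypothetical keys `speed`, `accel`, and `future`, the Level II output size is called `k_lv2`, and the candidates are handled per location rather than pooled over all VDS locations.

```python
import math


def _distance(a, b, scales):
    """Normalized Euclidean distance (normalization by scales is assumed)."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))


def sequential_predict(target, history, k_lv1, k_lv2, speed_scales, accel_scales):
    """Level I filters days by speed; Level II ranks the survivors by acceleration."""
    level1 = sorted(
        history,
        key=lambda rec: _distance(target["speed"], rec["speed"], speed_scales),
    )[:k_lv1]
    level2 = sorted(
        level1,
        key=lambda rec: _distance(target["accel"], rec["accel"], accel_scales),
    )[:k_lv2]
    return sum(rec["future"] for rec in level2) / len(level2)


target = {"speed": [60.0], "accel": [-1.0]}
history = [  # made-up records
    {"speed": [61.0], "accel": [1.0], "future": 65.0},
    {"speed": [59.0], "accel": [-1.0], "future": 50.0},
    {"speed": [62.0], "accel": [-0.8], "future": 52.0},
    {"speed": [20.0], "accel": [-1.0], "future": 15.0},
]
print(sequential_predict(target, history, 3, 2, [10.0], [1.0]))  # 51.0
```

In this example the day with a speed of 20 km/h is discarded in Level I even though its acceleration matches the target, and the accelerating day is then discarded in Level II.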

Pseudo code for sequential search

Computation Complexities

• Compared with the single-level search structure, the computational complexity is reduced for two main reasons:

• i) the similarity is computed with a smaller feature-vector dimension at each level;

• ii) the searching space in Level II is reduced, as long as the following size inequality holds:

in which,

size(SFVsi) = Msi ∗ N;

size(SFVlvI) = MlvI ∗ N;

size(SFVlvII) = MlvII ∗ N.

• Using the size of the input data set, the computational complexity of the single-level search can be expressed as:

• The complexity of the sequential search grows linearly with the complexity of each sub-level in the sequential process:

• The searching size inequality characterizes the complexity inequality between the single-level search and the sequential search:

• To achieve this inequality, it is necessary to satisfy the following size condition:

• This size condition is derived from the maximum possible number of days that must be considered in Level II, which occurs if Level I yields mutually exclusive klvI candidates for each location (Ln).
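
The comparison can be made concrete with a rough operation count. The cost model below (number of days searched times the SFV size) is an assumption chosen to match the size definitions above, not the complexity expressions from the slides, and the values Msi = 6, MlvI = MlvII = 3, and klvI = 5 are hypothetical. Only N = 16 and 240 days come from the experimental setting.

```python
def single_level_cost(days, m_si, n_vds):
    """Assumed cost: every day is compared on the full Msi-by-N SFV."""
    return days * m_si * n_vds


def sequential_cost(days_lv1, m_lv1, k_lv1, m_lv2, n_vds):
    """Assumed cost: Level I on all days plus Level II on the candidates.

    Worst case for Level II: each of the N locations contributes k_lv1
    mutually exclusive days, capped by the size of the database.
    """
    days_lv2 = min(k_lv1 * n_vds, days_lv1)
    return days_lv1 * m_lv1 * n_vds + days_lv2 * m_lv2 * n_vds


print(single_level_cost(240, 6, 16))      # 23040
print(sequential_cost(240, 3, 5, 3, 16))  # 15360
```

Under this model the sequential search is cheaper whenever the worst-case Level II candidate set, klvI ∗ N days, stays below the full database size.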

Experimental Setting

• The predictive performance of the two strategies is investigated using VDS data (5-min speed) from the highway SR78-E (State Route 78 Eastbound), collected from the Performance Measurement System (PeMS).

• This site is chosen because of its high-quality data and the availability of various traffic situations.

• The study site has 16 VDS stations (N = 16) along a 25 km stretch with 0.6∼2.9 km spacing, and isolated bottlenecks frequently appear during the PM peak hour on typical weekdays.

• The historical data cover 240 typical weekdays spanning 1.5 years (January 2013∼July 2014).

Description of Experimental Setting

K?

• The number of neighbors (k) is usually determined with an empirical procedure.

• As a preliminary study, RSS (Residual Sum of Squares) values are estimated from the linear relation between real points and predicted values.

Performance Comparison Between Single-Level and Sequential Search Strategies

• Visually, both strategies appear able to replicate the traffic state transition by capturing the bottleneck location, the activation time, the maximum queue length, and the duration of the bottleneck.

• The prediction accuracy and efficiency have been quantitatively measured.

• The accuracies of the two strategies have been compared using two indicators of the average error between the real data and the predicted state: MAPE (Mean Absolute Percent Error) and RMSE (Root Mean Square Error).
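
Both indicators follow their standard definitions, sketched here in Python. The function names and the sample speeds are made up for illustration.

```python
import math


def mape(actual, predicted):
    """Mean Absolute Percent Error, in percent (actual values must be non-zero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def rmse(actual, predicted):
    """Root Mean Square Error, in the unit of the data (km/h for speeds)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


actual = [100.0, 50.0]     # made-up real speeds (km/h)
predicted = [90.0, 55.0]   # made-up predicted speeds (km/h)
print(round(mape(actual, predicted), 2))  # 10.0
print(round(rmse(actual, predicted), 2))  # 7.91
```

MAPE weights errors relative to the observed speed, so it is more sensitive at low (congested) speeds, while RMSE reports the error in km/h and penalizes large deviations more heavily.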

Example of speed contour generated from real data and prediction results

Comparison of accuracy between the two strategies for the whole day

Improvement of pattern searching performance

Prediction error and Searching time for each strategy

Internal Improvement Between Level I and Level II in Sequential Search Strategy

• Internally, from Level I to Level II, the prediction error decreased on average from 5.09 to 3.83% for MAPE and from 5.12 to 3.88 km/h for RMSE.

• Besides the small improvements for the non-peak hour predictions, the prediction error has been reduced by approximately 30% for the peak hour traffic predictions.

• For the peak hour, MAPE and RMSE decreased on average from 8.84 to 6.13% and from 14.29 to 9.91 km/h, respectively.

• Moreover, the smaller variance in Level II indicates that the performance has been stabilized: from 4.65 to 2.05% in MAPE and from 7.32 to 3.25 km/h in RMSE.

Internal hierarchical improvement across sub-levels in sequential search strategy.

Prediction error and Searching time for each level in Sequential search strategy

Increasing searching time according to the size of searching space in Level II

Conclusion

• Compared with the conventional single-level search strategy, the sequential structure in the K-NN based searching algorithm is found to achieve higher efficiency and accuracy.

• Especially for the peak hour traffic predictions, the proposed algorithm significantly reduces the prediction errors of the conventional strategy while improving computational efficiency.

• The performance improvements are mainly credited to the sequential selection process.

References

• H. Yeo, K. Jang, A. Skabardonis, and S. Kang, “Impact of traffic states on freeway crash involvement rates,” Accident Anal. Prev., vol. 50, pp. 713–723, Jan. 2013.

• B. L. Smith, B. M. Williams, and R. K. Oswald, “Comparison of parametric and nonparametric models for traffic flow forecasting,” Transp. Res. Part C, Emerging Technol., vol. 10, no. 4, pp. 303–321, Aug. 2002.

• J. W. C. van Lint, S. P. Hoogendoorn, and H. J. van Zuylen, “Accurate freeway travel time prediction with state-space neural networks under missing data,” Transp. Res. Part C, Emerging Technol., vol. 13, no. 5/6, pp. 347–369, Oct.–Dec. 2005.

THANK YOU...
