TRANSCRIPT
Using Big Data and Efficient Methods to Capture Stochasticity for Calibration of Macroscopic Traffic Simulation Models
Sandeep Mudigonda 1, Kaan Ozbay 2
1 Department of Civil and Environmental Engineering, Rutgers University, New Jersey
2 Center for Urban Science + Progress (CUSP); Department of Civil & Urban Engineering, New York University, New York
CIVIL AND ENVIRONMENTAL ENGINEERING
Simulation & Calibration

Traffic Simulation Model: O_sim | I_s, C_s
• Inputs I_s: travel demand, geometry, operational rules
• Calibration parameters C_s: user- and traffic-related parameters
• Simulated outputs O_sim: given the inputs and parameters
• Observed field data: O_obs
• Error: ε

Calibration: min_{C_s} { U(ε : O_obs, O_sim(I_s, C_s)) }

where
  O_obs = f(I_s, C_s) + ε
  f(I_s, C_s) = functional form of the internal models in the simulation system
  O_sim = simulation output data given the input data and calibration parameters
  ε = margin of error between simulation output and observed data
  O_obs = observed field data
Data Needs for Calibration

Model Inputs:
• Driver characteristics data
• Vehicle composition data
• Travel demand data
• Ped./bike data
• Weather
• Driver behavior data
• Activity data

Model Parameters:
• Link (capacity, speed limit, …)
• Path (route choice, tolls, …)
• Infrastructure (signal timings, VMS, work zones, …)

Observed Outputs:
• Flows & speeds
• Queue data
• Trajectories
• Accidents (?)
• Emissions
• Other
Data Used in Previous Calibration Studies

Study | Data
Hourdakis et al. (2003) | 5-min data; 21 detector stations; 12-mile freeway section; PM peak; 3 days
Jha et al. (2004) | Detector data; 15 days; AM and PM peaks; large urban network
Toledo et al. (2004) | 68 detector stations; 3 freeways; 5 weekdays
Qin and Mahmassani (2004) | 7 detector stations; 3 freeways; AM peak; 5 weekdays
Kim et al. (2005) | Travel time data for 1 hr; AM peak; 1.1-km freeway section
Balakrishna et al. (2007) | 15-min data; 33 detector stations
Zhang et al. (2008) | 5-min detector counts; PM peak; 7 days
Mudigonda et al. (2009) | ETC data for AM and PM peaks
Lee and Ozbay (2009) | 5-min detector counts; AM & PM peaks; 16 days

Data used for simulation calibration:
• spans 3-16 days,
• is limited to a few specific conditions, or
• is a diluted sample of different conditions.
CURRENT PRACTICE
Distribution of traffic data: is there a "typical" day?
• Illustration of demand clusters: there is no "typical" day.
• Big Data’s large spatial and temporal extent can help calibrate and validate traffic simulation models.
– RFIDs,
– GPS-equipped devices,
– Traffic sensors and cameras
Incorporating Stochasticity in the Macroscopic Model
• A macroscopic model is adopted to simulate traffic flow under different conditions.
• A stochastic version of the first-order model is used.
• Hence the simulation parameters and outputs are obtained as distributions.
Stochastic first-order model:

  ∂ρ(x, t, ω)/∂t + ∂f(ρ(x, t, ω))/∂x = 0,   (x, t, ω) ∈ D × Ω
  f(ρ) = ρ v(ρ) = q : flow, from the fundamental diagram
  ρ(x, 0) = ρ_0(x, ω) : stochastic initial condition
  q(x_IB, t) = Z_t(ω) : stochastic demand at the inflow boundary for the t-th time period

The solution and parameters are random fields over probability spaces:

  ρ(x, t) = f_t(x, ω), ω ∈ (Ω, A, P)
  Θ′(x) = g_t(x, ω′), ω′ ∈ (Ω′, A′, P′)

indexed by t (time of day, season, weather, etc.) and x (distance, changing geometry or pavement condition in different parts of the network).
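At each realization of the stochastic inputs, the model above reduces to a deterministic first-order (LWR) conservation law that can be solved by a standard Godunov / cell-transmission scheme. The sketch below is a minimal illustration under assumed values, not the authors' code: the Greenshields fundamental diagram, the free-flow speed, jam density, cell size, and demand level are all made-up parameters.

```python
# Minimal Godunov / cell-transmission sketch for the deterministic
# first-order (LWR) model solved at one realization of the inputs.
# All constants are illustrative assumptions, not the study's values.
V_F = 30.0              # free-flow speed (m/s), assumed
RHO_JAM = 0.2           # jam density (veh/m), assumed
RHO_C = RHO_JAM / 2.0   # critical density of the Greenshields diagram

def flux(rho):
    """Greenshields fundamental diagram: q = rho * v_f * (1 - rho/rho_jam)."""
    return V_F * rho * (1.0 - rho / RHO_JAM)

def godunov_flux(rho_l, rho_r):
    """Interface flux = min(upstream sending demand, downstream supply)."""
    return min(flux(min(rho_l, RHO_C)), flux(max(rho_r, RHO_C)))

def step(rho, q_in, dx, dt):
    """Advance all cell densities one time step (explicit, conservative)."""
    n = len(rho)
    f = [0.0] * (n + 1)
    f[0] = min(q_in, flux(max(rho[0], RHO_C)))   # inflow limited by supply
    for i in range(1, n):
        f[i] = godunov_flux(rho[i - 1], rho[i])
    f[n] = flux(min(rho[-1], RHO_C))             # free outflow (demand only)
    return [rho[i] + dt / dx * (f[i] - f[i + 1]) for i in range(n)]

# 2-km section (20 cells of 100 m), initially empty, constant 1.0 veh/s demand
rho = [0.0] * 20
for _ in range(300):                             # 600 s; CFL = dt*V_F/dx = 0.6
    rho = step(rho, q_in=1.0, dx=100.0, dt=2.0)
```

With the supply-demand form of the numerical flux, the scheme keeps densities inside [0, ρ_jam] whenever the CFL condition dt · v_f ≤ dx holds.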
METHODOLOGY
B(ρ : x, t, ω) = 0
Solving Stochastic Traffic Simulation Models
• Computational complexity is an important factor in the choice of numerical solution methods
• The simplest and most common solution method is Monte Carlo-type independent sampling of n simulation runs for various traffic conditions.
• The number of replications n needed for a level of precision γ satisfies

    n ≥ ( t_{n-1, 1-α/2} · S(n) / (γ · Ξ(n)) )²

  where S(n) is the sample standard deviation and Ξ(n) the sample mean of the output.
• The convergence rate for an MC-type method is slow: O(1/√n).
• Depending on the size of the network and the number of stochastic dimensions, this approach can become prohibitive in terms of computational time requirements.
• Also, all possible points in the stochastic space of simulation output may not have corresponding observed data.
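The replication-count criterion can be applied to a small pilot sample of runs. A minimal sketch, assuming the t-quantile is approximated by the normal quantile (reasonable once n is moderately large) and using made-up pilot outputs:

```python
# Sketch: number of MC replications n for relative precision gamma,
#   n >= ( t_{n-1, 1-alpha/2} * S(n) / (gamma * mean(n)) )^2.
# The t-quantile is approximated by the normal quantile for simplicity;
# the pilot outputs below are made-up numbers.
from statistics import NormalDist, mean, stdev

def required_replications(pilot, gamma, alpha=0.05):
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # ~ t-quantile for large n
    return (z * stdev(pilot) / (gamma * mean(pilot))) ** 2

pilot = [98.0, 103.0, 95.0, 107.0, 99.0, 101.0, 96.0, 104.0]
n_req = required_replications(pilot, gamma=0.02)  # 2% relative precision
```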
Stochastic Collocation
• The stochasticity is treated as another dimension, and the stochastic solution space Ω is approximated (by Γ) using a set of prescribed support nodes with basis functions Θ_j (stochastic collocation).
• The multi-dimensional stochastic solution is approximated by an interpolation function built using deterministic solutions evaluated at each of a set of prescribed nodes (collocation points)
  {Θ_j}_{j=1}^Q ∈ Γ

  ρ(x, t, ξ) ≈ ρ̂(x, t, ξ) = Σ_{j=1}^Q α_j Θ_j(ξ)

  ρ̃(x, t) = ∫_Γ ρ(x, t, ξ) p(ξ) dξ ≈ Σ_{j=1}^Q α_j p_j

where p_j = p(ξ_j) is the pdf/weight of the j-th interpolation basis function.
For higher dimensions of stochasticity, computationally efficient schemes are required to reduce the number of collocation points.
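In one stochastic dimension the collocation idea can be sketched as follows: the "simulation" is evaluated only at Q Clenshaw-Curtis (Chebyshev-extrema) nodes, and a Lagrange interpolant stands in for the model everywhere else. The model function below is an illustrative stand-in for a full simulation run, not part of the study:

```python
# Sketch of 1-D stochastic collocation: evaluate the model only at Q
# Clenshaw-Curtis (Chebyshev-extrema) nodes, then approximate it elsewhere
# by the Lagrange interpolant sum_j alpha_j * Theta_j(xi).
import math

def model(xi):                       # hypothetical simulation output
    return math.exp(-xi) * math.sin(3.0 * xi)

Q = 9
nodes = [math.cos(math.pi * j / (Q - 1)) for j in range(Q)]   # on [-1, 1]
alpha = [model(x) for x in nodes]    # deterministic solutions at the nodes

def theta(j, xi):                    # Lagrange basis function Theta_j
    p = 1.0
    for k in range(Q):
        if k != j:
            p *= (xi - nodes[k]) / (nodes[j] - nodes[k])
    return p

def rho_hat(xi):                     # interpolated stochastic solution
    return sum(a * theta(j, xi) for j, a in enumerate(alpha))
```

Nine model evaluations suffice here because the smooth output is approximated to high accuracy between the nodes; this is the saving that collocation exploits over independent sampling.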
Smolyak Algorithm
• Developed originally for multi-dimensional integration.
• 1-D interpolant:

    U^i(f) = Σ_{j=1}^{m_i} f(ξ_j^i) Θ_j^i,   m_i = number of nodes at level i.

• In N dimensions the full tensor-product interpolant is approximated by the sparse grid interpolant.
• Error: O(Q^{-2} |log Q|^{3(N-1)}) for a piecewise linear basis; O(Q^{-k} |log Q|^{(k+2)(N-1)}) for a polynomial basis of order k.
• The convergence order can be controlled by the polynomial order k, so O(Smolyak) > O(MC), O(LHS).
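The saving from the Smolyak construction can be illustrated by counting nodes. The sketch below builds the sparse grid from nested Clenshaw-Curtis levels (m_1 = 1, m_i = 2^(i-1) + 1) and compares its size with the full tensor grid; the dimension and level are illustrative choices:

```python
# Sketch: node count of a Smolyak sparse grid vs. the full tensor grid,
# built from nested Clenshaw-Curtis levels (m_1 = 1, m_i = 2^(i-1) + 1).
# Nodes are stored as angle fractions j/(m-1) (the abscissa would be
# cos(pi*j/(m-1))), so the nesting is exact under Fraction arithmetic.
from fractions import Fraction
from itertools import product

def nodes_1d(level):
    m = 1 if level == 1 else 2 ** (level - 1) + 1
    if m == 1:
        return {Fraction(1, 2)}                  # single midpoint node
    return {Fraction(j, m - 1) for j in range(m)}

def sparse_grid_size(dim, max_level):
    """Count distinct points in the union of tensor grids with
    i_1 + ... + i_dim <= dim + max_level - 1 (Smolyak construction)."""
    pts = set()
    def rec(levels, remaining, budget):
        if remaining == 0:
            pts.update(product(*(nodes_1d(l) for l in levels)))
            return
        for l in range(1, budget - (remaining - 1) + 1):
            rec(levels + [l], remaining - 1, budget - l)
    rec([], dim, dim + max_level - 1)
    return len(pts)

N, L = 3, 4
sparse = sparse_grid_size(N, L)                  # 69 points
full = (2 ** (L - 1) + 1) ** N                   # 9^3 = 729 points
```

Even in three dimensions the sparse grid needs roughly a tenth of the full tensor grid's nodes; the gap widens rapidly with N, which is what makes higher-dimensional collocation tractable.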
[Figure: collocation points at which the deterministic simulation is performed, shown over the distributions of parameter 1 (free-flow speed, etc.) and parameter 2 (jam density, etc.); multiple replications are used for variance reduction due to stochastic demands.]
Parameter Optimization
• From each realization of the parameter set, using the demand distribution as an input, the simulation output distribution (e.g., flow or density distribution) is generated.
• This distribution is compared with the observed output distribution, and the error is estimated using a test statistic (such as that of the KS test).
• This error is used as the objective function and is minimized as part of a multi-objective parameter optimization using the simultaneous perturbation stochastic approximation (SPSA) algorithm.
  min_{Θ_t} Σ_{i=1}^N { w_1 U_1(q_i^Ob, q_i^S(Θ_t^k)) + w_2 U_2(ρ_i^Ob, ρ_i^S(Θ_t^k)) }

where
  q_i^Ob, q_i^S - observed and simulated flows at location i
  ρ_i^Ob, ρ_i^S - observed and simulated densities at location i
  Θ_t^k - parameter set for time period t and iteration k
  w_1, w_2 - weights for the error measures
  U_1, U_2 - functions representing the error in flow and density
The weight parameter w signifies the variance of each output measure in the data.
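A minimal SPSA sketch: each iteration perturbs all parameters simultaneously with a random ±1 vector, so the gradient estimate costs two objective evaluations regardless of dimension. The quadratic objective below is only a stand-in for the weighted flow/density error, with noise mimicking simulation variability; the gain constants are illustrative, not the study's settings.

```python
# Sketch of SPSA: a simultaneous random +-1 perturbation of all parameters
# yields a gradient estimate from just two objective evaluations per step.
# Objective and gain constants are illustrative assumptions.
import random

def spsa(loss, theta, iters=1000, a=0.2, c=0.1, alpha=0.602, gamma=0.101):
    theta = list(theta)
    for k in range(1, iters + 1):
        ak = a / k ** alpha                       # decaying step-size gain
        ck = c / k ** gamma                       # decaying perturbation gain
        delta = [random.choice((-1.0, 1.0)) for _ in theta]
        plus  = [t + ck * d for t, d in zip(theta, delta)]
        minus = [t - ck * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)           # two evaluations per step
        theta = [t - ak * diff / (2.0 * ck * d) for t, d in zip(theta, delta)]
    return theta

def noisy_loss(th):
    """Stand-in for the weighted flow/density error; optimum at (2, -1)."""
    return (th[0] - 2.0) ** 2 + (th[1] + 1.0) ** 2 + random.gauss(0.0, 0.01)

random.seed(0)                                    # reproducible sketch
est = spsa(noisy_loss, [0.0, 0.0])
```

The two-evaluation cost per iteration is what makes SPSA attractive when each objective evaluation requires a full pass through the simulation.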
Flowchart of Calibration using Stochastic Collocation:
1. Input: demand / parameter distribution; set j = 0.
2. Generate the collocation points {ξ_j}_{j=1}^Q.
3. Solve the deterministic first-order PDE at collocation point j to obtain the output ρ_j(x, t); increment j = j + 1 and repeat until j = Q. (Parallelizable; any existing simulation or legacy code can alternatively be used.)
4. Build the output distribution ρ(x, t, Z*) from the interpolant.
5. If error ≤ allowed error, END; otherwise run SPSA optimization to obtain a new parameter set Θ_t^k, reset j = 0, and return to step 3.
Study section
• Section of NJTPK at interchange 7 with a single on- and off-ramp with stochastic demand
• Big Data: ETC Data – Vehicle-by-vehicle entry and exit time, lane,
transaction type, vehicle type, number of axles.
– Available in NJ for 150 miles of NJTPK and 170 miles of GSP.
• The variation in demand at this section is captured using the ETC data for every 5 minutes between January 1, 2011 and August 31, 2011.
RESULTS
Study Data
• The demand is divided into clusters using k-means algorithm.
• For each cluster, the distribution of demand during each 5 minute time period is generated.
• The simulation is performed for:
  – weekday AM peak (7-9 AM)
  – weekday off-peak (10 AM-12 PM)
  – weekend peak period (10 AM-12 PM)
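The clustering step can be sketched with a plain Lloyd's-algorithm k-means on demand profiles. The profiles, their dimension, and k below are made-up illustrations, not the ETC data:

```python
# Sketch: k-means (Lloyd's algorithm) on 5-minute demand profiles.
# Profiles and k are made-up; the study clusters ETC-derived demand and
# then fits a per-cluster demand distribution.
def kmeans(points, k, iters=20):
    centers = [points[i * len(points) // k] for i in range(k)]  # simple init
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                 # assign each profile to nearest center
            j = min(range(k), key=lambda c: sum((a - b) ** 2
                                                for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        centers = [[sum(d) / len(cl) for d in zip(*cl)] if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers, clusters

low  = [[100, 110, 120], [105, 108, 118], [98, 112, 121]]   # light-demand days
high = [[300, 320, 340], [310, 318, 336], [295, 325, 342]]  # heavy-demand days
centers, clusters = kmeans(low + high, k=2)
```

Each resulting cluster then supplies its own empirical demand distribution for the corresponding simulation scenario.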
Implementation of Proposed Approach to Study Section
• With the demand distribution as an input, the simulation output flow distribution is generated for each realization of the parameter set.
• The Clenshaw-Curtis grid is the appropriate sparse grid to discretize the stochastic demand.
• Sparse grid interpolation is performed using the output of the simulation at each collocation node.
• The distribution of simulated flows is obtained by repeated evaluation of the Smolyak interpolation function.
• This distribution is compared with the sensor-data flow distribution; the error is estimated using the test statistic of the KS test at the 90% significance level and is minimized using the SPSA algorithm.
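The comparison step can be sketched as the two-sample Kolmogorov-Smirnov statistic between simulated and observed flow samples, checked against the standard large-sample threshold c(α)·√((m+n)/(mn)) with c(0.10) ≈ 1.22. The flow samples below are made-up numbers:

```python
# Sketch: two-sample Kolmogorov-Smirnov statistic as the calibration error,
# i.e., the largest gap between the empirical CDFs of simulated and observed
# flows, with the standard large-sample 90% threshold. Flows are made up.
def ecdf(sample, x):
    return sum(1 for v in sample if v <= x) / len(sample)

def ks_statistic(sim, obs):
    grid = sorted(sim) + sorted(obs)             # gaps occur at sample points
    return max(abs(ecdf(sim, x) - ecdf(obs, x)) for x in grid)

def ks_threshold(m, n, c_alpha=1.22):            # c(0.10) ~ 1.22
    return c_alpha * ((m + n) / (m * n)) ** 0.5

sim_flows = [1480, 1500, 1510, 1525, 1540, 1560, 1575, 1590]  # veh/h, assumed
obs_flows = [1470, 1495, 1515, 1530, 1545, 1555, 1580, 1600]
d = ks_statistic(sim_flows, obs_flows)
calibrated = d < ks_threshold(len(sim_flows), len(obs_flows))
```

Because the KS statistic compares whole distributions rather than means, it penalizes a simulation that matches average flow but misses its day-to-day variability.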
Results
• AM peak: to achieve the required flow distribution,
  – the SC approach required 2,433 evaluations
  – MC-type sampling required 240,000 runs
• Off-peak: to achieve the required flow distribution,
  – the SC approach required 441 evaluations
  – MC-type sampling required 5,420 runs
• Weekend peak:
  – the SC approach required 441 evaluations
  – MC-type sampling required 8,000 runs
• To illustrate the drawback of using limited data, we compare the distribution of flow for high and low weekend demands with the case where only three weekend days of flow and demand are used to calibrate the weekend model.
Conclusions
• Calibrating for various traffic conditions requires large datasets.
• Big Data, such as RFID-based ETC data, is useful.
• However, Big Data poses a computational challenge when calibrating for all conditions.
• Traditional MC-type sampling needs heavy computational resources.
• We propose a methodology to capture stochasticity using stochastic collocation, defining each stochastic factor as a dimension.
• Computationally efficient sparse grids are used to sample the stochastic space and build an interpolant from the deterministic output at each support node of the grid.
CONCLUSIONS
Conclusions & Future Work
• Using 5-min demand data spanning 8 months, we calibrate AM-peak, off-peak, and weekend-peak macroscopic traffic models.
• The distribution of flows is obtained from the interpolant and used with the observed distribution to build a KS test statistic for calibration using SPSA.
• The proposed methodology:
  – works with any type of simulation model
  – is more efficient than MC-type methods
  – can be parallelized to increase speed
• Use stochastic parameters for jam density and wave speed in the traffic flow fundamental diagram for a larger freeway section.
• Apply the methodology to a larger network with higher dimensions of stochasticity.
FUTURE WORK