TRANSCRIPT
Using Big Data and Efficient Methods to Capture Stochasticity for Calibration of Macroscopic Traffic Simulation Models
Sandeep Mudigonda 1, Kaan Ozbay 2
1 Department of Civil and Environmental Engineering, Rutgers University, New Jersey
2 Center for Urban Science + Progress (CUSP); Department of Civil & Urban Engineering, New York University, New York
CIVIL AND ENVIRONMENTAL ENGINEERING
Simulation & Calibration

Traffic Simulation Model: O_sim | I_s, C_s
• Inputs I_s: travel demand, geometry, operational rules
• Calibration parameters C_s: user- and traffic-related parameters
• Simulated outputs O_sim: given the inputs and parameters
• Observed field data: O_obs
• Error: ε

Calibration: min_{C_s} { U(ε : O_obs, O_sim(I_s, C_s)) }

where
  O_obs = f(I_s, C_s) + ε
  f(I_s, C_s) = functional form of the internal models in the simulation system
  O_sim = simulation output data given the input data and calibration parameters
  ε = margin of error between simulation output and observed data
  O_obs = observed field data
Data Needs for Calibration

Model Inputs:
• Driver characteristics data
• Vehicle composition data
• Travel demand data
• Ped./bike data
• Weather
• Driver behavior data
• Activity data

Model Parameters:
• Link (capacity, speed limit, …)
• Path (route choice, tolls, …)
• Infrastructure (signal timings, VMS, work zones, …)

Observed Outputs:
• Flows & speeds
• Queue data
• Trajectories
• Accidents (?)
• Emissions
• Other
Data Used in Previous Calibration Studies

Study | Data
Hourdakis et al. (2003) | 5-min data; 21 detector stations; 12-mile freeway section; PM peak; 3 days
Jha et al. (2004) | Detector data; 15 days; AM and PM peaks; large urban network
Toledo et al. (2004) | 68 detector stations; 3 freeways; 5 weekdays
Qin and Mahmassani (2004) | 7 detector stations; 3 freeways; AM peak; 5 weekdays
Kim et al. (2005) | Travel time data for 1 hr; AM peak; 1.1-km freeway section
Balakrishna et al. (2007) | 15-min data; 33 detector stations
Zhang et al. (2008) | 5-min detector counts; PM peak; 7 days
Mudigonda et al. (2009) | ETC data for AM and PM peaks
Lee and Ozbay (2009) | 5-min detector counts; AM & PM peaks; 16 days

Data used for simulation calibration:
• spans 3-16 days,
• is limited to a few specific conditions, or
• is a diluted sample of different conditions.
CURRENT PRACTICE
Distribution of traffic data: is there a "typical" day?
• Illustration of demand clusters: there is no "typical" day.
• Big Data’s large spatial and temporal extent can help calibrate and validate traffic simulation models.
– RFIDs,
– GPS-equipped devices,
– Traffic sensors and cameras
Incorporating Stochasticity in the Macroscopic Model
• A macroscopic model is adopted to simulate traffic flow under different conditions.
• A stochastic version of the first-order model is used.
• Hence the simulation parameters and outputs are obtained as distributions.
Stochastic first-order model:

  ∂ρ(x, t, ω)/∂t + ∂f(ρ(x, t, ω))/∂x = 0,   (x, t, ω) ∈ D × Ω
  f(ρ) = ρ v(ρ) = q : flow, from the fundamental diagram
  ρ(x, 0) = ρ_0(x, ω) : stochastic initial condition
  q(x_IB, t) = Z_t(ω) : stochastic demand at the inflow boundary for the t-th time period

The solution and parameters are random fields over probability spaces:

  ρ(x, t) = f_t(x, ω), ω ∈ (Ω, A, P)
  Θ′(x) = g_t(x, ω′), ω′ ∈ (Ω′, A′, P′)

indexed by t (time of day, season, weather, etc.) and x (distance, changing geometry or pavement condition in different parts of the network).
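At each realization of the stochastic inputs, the model above reduces to a deterministic first-order (LWR) conservation law that can be solved by a standard Godunov / cell-transmission scheme. The sketch below is a minimal illustration under assumed values, not the authors' code: the Greenshields fundamental diagram, the free-flow speed, jam density, cell size, and demand level are all made-up parameters.

```python
# Minimal Godunov / cell-transmission sketch for the deterministic
# first-order (LWR) model solved at one realization of the inputs.
# All constants are illustrative assumptions, not the study's values.
V_F = 30.0              # free-flow speed (m/s), assumed
RHO_JAM = 0.2           # jam density (veh/m), assumed
RHO_C = RHO_JAM / 2.0   # critical density of the Greenshields diagram

def flux(rho):
    """Greenshields fundamental diagram: q = rho * v_f * (1 - rho/rho_jam)."""
    return V_F * rho * (1.0 - rho / RHO_JAM)

def godunov_flux(rho_l, rho_r):
    """Interface flux = min(upstream sending demand, downstream supply)."""
    return min(flux(min(rho_l, RHO_C)), flux(max(rho_r, RHO_C)))

def step(rho, q_in, dx, dt):
    """Advance all cell densities one time step (explicit, conservative)."""
    n = len(rho)
    f = [0.0] * (n + 1)
    f[0] = min(q_in, flux(max(rho[0], RHO_C)))   # inflow limited by supply
    for i in range(1, n):
        f[i] = godunov_flux(rho[i - 1], rho[i])
    f[n] = flux(min(rho[-1], RHO_C))             # free outflow (demand only)
    return [rho[i] + dt / dx * (f[i] - f[i + 1]) for i in range(n)]

# 2-km section (20 cells of 100 m), initially empty, constant 1.0 veh/s demand
rho = [0.0] * 20
for _ in range(300):                             # 600 s; CFL = dt*V_F/dx = 0.6
    rho = step(rho, q_in=1.0, dx=100.0, dt=2.0)
```

With the supply-demand form of the numerical flux, the scheme keeps densities inside [0, ρ_jam] whenever the CFL condition dt · v_f ≤ dx holds.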
METHODOLOGY
B(ρ : x, t, ω) = 0
Solving Stochastic Traffic Simulation Models
• Computational complexity is an important factor in the choice of numerical solution methods
• The simplest and most common solution method is Monte Carlo-type independent sampling of n simulation runs for various traffic conditions.
• The number of replications n needed for a level of precision γ satisfies

    n ≥ ( t_{n-1, 1-α/2} · S(n) / (γ · Ξ(n)) )²

  where S(n) is the sample standard deviation and Ξ(n) the sample mean of the output.
• The convergence rate for an MC-type method is slow: O(1/√n).
• Depending on the size of the network and the number of stochastic dimensions, this approach can become prohibitive in terms of computational time requirements.
• Also, all possible points in the stochastic space of simulation output may not have corresponding observed data.
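The replication-count criterion can be applied to a small pilot sample of runs. A minimal sketch, assuming the t-quantile is approximated by the normal quantile (reasonable once n is moderately large) and using made-up pilot outputs:

```python
# Sketch: number of MC replications n for relative precision gamma,
#   n >= ( t_{n-1, 1-alpha/2} * S(n) / (gamma * mean(n)) )^2.
# The t-quantile is approximated by the normal quantile for simplicity;
# the pilot outputs below are made-up numbers.
from statistics import NormalDist, mean, stdev

def required_replications(pilot, gamma, alpha=0.05):
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # ~ t-quantile for large n
    return (z * stdev(pilot) / (gamma * mean(pilot))) ** 2

pilot = [98.0, 103.0, 95.0, 107.0, 99.0, 101.0, 96.0, 104.0]
n_req = required_replications(pilot, gamma=0.02)  # 2% relative precision
```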
Stochastic Collocation
• The stochasticity is treated as another dimension, and the stochastic solution space Ω is approximated (by Γ) using a set of prescribed support nodes with basis functions Θ_j (stochastic collocation).
• The multi-dimensional stochastic solution is approximated by an interpolation function built using deterministic solutions evaluated at each of a set of prescribed nodes (collocation points)
  {Θ_j}_{j=1}^Q ∈ Γ

  ρ(x, t, ξ) ≈ ρ̂(x, t, ξ) = Σ_{j=1}^Q α_j Θ_j(ξ)

  ρ̃(x, t) = ∫_Γ ρ(x, t, ξ) p(ξ) dξ ≈ Σ_{j=1}^Q α_j p_j

where p_j = p(ξ_j) is the pdf/weight of the j-th interpolation basis function.
For higher dimensions of stochasticity, computationally efficient schemes are required to reduce the number of collocation points.
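In one stochastic dimension the collocation idea can be sketched as follows: the "simulation" is evaluated only at Q Clenshaw-Curtis (Chebyshev-extrema) nodes, and a Lagrange interpolant stands in for the model everywhere else. The model function below is an illustrative stand-in for a full simulation run, not part of the study:

```python
# Sketch of 1-D stochastic collocation: evaluate the model only at Q
# Clenshaw-Curtis (Chebyshev-extrema) nodes, then approximate it elsewhere
# by the Lagrange interpolant sum_j alpha_j * Theta_j(xi).
import math

def model(xi):                       # hypothetical simulation output
    return math.exp(-xi) * math.sin(3.0 * xi)

Q = 9
nodes = [math.cos(math.pi * j / (Q - 1)) for j in range(Q)]   # on [-1, 1]
alpha = [model(x) for x in nodes]    # deterministic solutions at the nodes

def theta(j, xi):                    # Lagrange basis function Theta_j
    p = 1.0
    for k in range(Q):
        if k != j:
            p *= (xi - nodes[k]) / (nodes[j] - nodes[k])
    return p

def rho_hat(xi):                     # interpolated stochastic solution
    return sum(a * theta(j, xi) for j, a in enumerate(alpha))
```

Nine model evaluations suffice here because the smooth output is approximated to high accuracy between the nodes; this is the saving that collocation exploits over independent sampling.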
Smolyak Algorithm
• Developed originally for multi-dimensional integration.
• 1-D interpolant:

    U^i(f) = Σ_{j=1}^{m_i} f(ξ_j^i) Θ_j^i,   m_i = number of nodes at level i.

• In N dimensions the full tensor-product interpolant is approximated by the sparse grid interpolant.
• Error: O(Q^{-2} |log Q|^{3(N-1)}) for a piecewise linear basis; O(Q^{-k} |log Q|^{(k+2)(N-1)}) for a polynomial basis of order k.
• The convergence order can be controlled by the polynomial order k, so O(Smolyak) > O(MC), O(LHS).
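The saving from the Smolyak construction can be illustrated by counting nodes. The sketch below builds the sparse grid from nested Clenshaw-Curtis levels (m_1 = 1, m_i = 2^(i-1) + 1) and compares its size with the full tensor grid; the dimension and level are illustrative choices:

```python
# Sketch: node count of a Smolyak sparse grid vs. the full tensor grid,
# built from nested Clenshaw-Curtis levels (m_1 = 1, m_i = 2^(i-1) + 1).
# Nodes are stored as angle fractions j/(m-1) (the abscissa would be
# cos(pi*j/(m-1))), so the nesting is exact under Fraction arithmetic.
from fractions import Fraction
from itertools import product

def nodes_1d(level):
    m = 1 if level == 1 else 2 ** (level - 1) + 1
    if m == 1:
        return {Fraction(1, 2)}                  # single midpoint node
    return {Fraction(j, m - 1) for j in range(m)}

def sparse_grid_size(dim, max_level):
    """Count distinct points in the union of tensor grids with
    i_1 + ... + i_dim <= dim + max_level - 1 (Smolyak construction)."""
    pts = set()
    def rec(levels, remaining, budget):
        if remaining == 0:
            pts.update(product(*(nodes_1d(l) for l in levels)))
            return
        for l in range(1, budget - (remaining - 1) + 1):
            rec(levels + [l], remaining - 1, budget - l)
    rec([], dim, dim + max_level - 1)
    return len(pts)

N, L = 3, 4
sparse = sparse_grid_size(N, L)                  # 69 points
full = (2 ** (L - 1) + 1) ** N                   # 9^3 = 729 points
```

Even in three dimensions the sparse grid needs roughly a tenth of the full tensor grid's nodes; the gap widens rapidly with N, which is what makes higher-dimensional collocation tractable.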
[Figure: collocation points at which the deterministic simulation is performed, shown over the distributions of parameter 1 (free-flow speed, etc.) and parameter 2 (jam density, etc.); multiple replications are used for variance reduction due to stochastic demands.]
Parameter Optimization
• From each realization of the parameter set, using the demand distribution as an input, the simulation output distribution (e.g., flow or density distribution) is generated.
• This distribution is compared with the observed output distribution, and the error is estimated using a test statistic (such as that of the KS test).
• This error is used as the objective function and is minimized as part of a multi-objective parameter optimization using the simultaneous perturbation stochastic approximation (SPSA) algorithm.
  min_{Θ_t} Σ_{i=1}^N { w_1 U_1(q_i^Ob, q_i^S(Θ_t^k)) + w_2 U_2(ρ_i^Ob, ρ_i^S(Θ_t^k)) }

where
  q_i^Ob, q_i^S - observed and simulated flows at location i
  ρ_i^Ob, ρ_i^S - observed and simulated densities at location i
  Θ_t^k - parameter set for time period t and iteration k
  w_1, w_2 - weights for the error measures
  U_1, U_2 - functions representing the error in flow and density
The weight parameter w signifies the variance of each output measure in the data.
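A minimal SPSA sketch: each iteration perturbs all parameters simultaneously with a random ±1 vector, so the gradient estimate costs two objective evaluations regardless of dimension. The quadratic objective below is only a stand-in for the weighted flow/density error, with noise mimicking simulation variability; the gain constants are illustrative, not the study's settings.

```python
# Sketch of SPSA: a simultaneous random +-1 perturbation of all parameters
# yields a gradient estimate from just two objective evaluations per step.
# Objective and gain constants are illustrative assumptions.
import random

def spsa(loss, theta, iters=1000, a=0.2, c=0.1, alpha=0.602, gamma=0.101):
    theta = list(theta)
    for k in range(1, iters + 1):
        ak = a / k ** alpha                       # decaying step-size gain
        ck = c / k ** gamma                       # decaying perturbation gain
        delta = [random.choice((-1.0, 1.0)) for _ in theta]
        plus  = [t + ck * d for t, d in zip(theta, delta)]
        minus = [t - ck * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)           # two evaluations per step
        theta = [t - ak * diff / (2.0 * ck * d) for t, d in zip(theta, delta)]
    return theta

def noisy_loss(th):
    """Stand-in for the weighted flow/density error; optimum at (2, -1)."""
    return (th[0] - 2.0) ** 2 + (th[1] + 1.0) ** 2 + random.gauss(0.0, 0.01)

random.seed(0)                                    # reproducible sketch
est = spsa(noisy_loss, [0.0, 0.0])
```

The two-evaluation cost per iteration is what makes SPSA attractive when each objective evaluation requires a full pass through the simulation.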
Flowchart of Calibration using Stochastic Collocation:
1. Input: demand / parameter distribution; set j = 0.
2. Generate the collocation points {ξ_j}_{j=1}^Q.
3. Solve the deterministic first-order PDE at collocation point j to obtain the output ρ_j(x, t); increment j = j + 1 and repeat until j = Q. (Parallelizable; any existing simulation or legacy code can alternatively be used.)
4. Build the output distribution ρ(x, t, Z*) from the interpolant.
5. If error ≤ allowed error, END; otherwise run SPSA optimization to obtain a new parameter set Θ_t^k, reset j = 0, and return to step 3.
Study section
• Section of NJTPK at interchange 7 with a single on- and off-ramp with stochastic demand
• Big Data: ETC Data – Vehicle-by-vehicle entry and exit time, lane,
transaction type, vehicle type, number of axles.
– Available in NJ for 150 miles of NJTPK and 170 miles of GSP.
• The variation in demand at this section is captured using the ETC data for every 5 minutes between January 1, 2011 and August 31, 2011.
RESULTS
Study Data
• The demand is divided into clusters using k-means algorithm.
• For each cluster, the distribution of demand during each 5 minute time period is generated.
• The simulation is performed for:
  – weekday AM peak (7-9 AM)
  – weekday off-peak (10 AM-12 PM)
  – weekend peak period (10 AM-12 PM)
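The clustering step can be sketched with a plain Lloyd's-algorithm k-means on demand profiles. The profiles, their dimension, and k below are made-up illustrations, not the ETC data:

```python
# Sketch: k-means (Lloyd's algorithm) on 5-minute demand profiles.
# Profiles and k are made-up; the study clusters ETC-derived demand and
# then fits a per-cluster demand distribution.
def kmeans(points, k, iters=20):
    centers = [points[i * len(points) // k] for i in range(k)]  # simple init
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                 # assign each profile to nearest center
            j = min(range(k), key=lambda c: sum((a - b) ** 2
                                                for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        centers = [[sum(d) / len(cl) for d in zip(*cl)] if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers, clusters

low  = [[100, 110, 120], [105, 108, 118], [98, 112, 121]]   # light-demand days
high = [[300, 320, 340], [310, 318, 336], [295, 325, 342]]  # heavy-demand days
centers, clusters = kmeans(low + high, k=2)
```

Each resulting cluster then supplies its own empirical demand distribution for the corresponding simulation scenario.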
Implementation of Proposed Approach to Study Section
• With the demand distribution as an input, the simulation output flow distribution is generated for each realization of the parameter set.
• The Clenshaw-Curtis grid is the appropriate sparse grid to discretize the stochastic demand.
• Sparse grid interpolation is performed using the output of the simulation at each collocation node.
• The distribution of simulated flows is obtained by repeated evaluation of the Smolyak interpolation function.
• This distribution is compared with the sensor-data flow distribution; the error is estimated using the test statistic of the KS test at the 90% significance level and is minimized using the SPSA algorithm.
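The comparison step can be sketched as the two-sample Kolmogorov-Smirnov statistic between simulated and observed flow samples, checked against the standard large-sample threshold c(α)·√((m+n)/(mn)) with c(0.10) ≈ 1.22. The flow samples below are made-up numbers:

```python
# Sketch: two-sample Kolmogorov-Smirnov statistic as the calibration error,
# i.e., the largest gap between the empirical CDFs of simulated and observed
# flows, with the standard large-sample 90% threshold. Flows are made up.
def ecdf(sample, x):
    return sum(1 for v in sample if v <= x) / len(sample)

def ks_statistic(sim, obs):
    grid = sorted(sim) + sorted(obs)             # gaps occur at sample points
    return max(abs(ecdf(sim, x) - ecdf(obs, x)) for x in grid)

def ks_threshold(m, n, c_alpha=1.22):            # c(0.10) ~ 1.22
    return c_alpha * ((m + n) / (m * n)) ** 0.5

sim_flows = [1480, 1500, 1510, 1525, 1540, 1560, 1575, 1590]  # veh/h, assumed
obs_flows = [1470, 1495, 1515, 1530, 1545, 1555, 1580, 1600]
d = ks_statistic(sim_flows, obs_flows)
calibrated = d < ks_threshold(len(sim_flows), len(obs_flows))
```

Because the KS statistic compares whole distributions rather than means, it penalizes a simulation that matches average flow but misses its day-to-day variability.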
Results
• AM peak: to achieve the required flow distribution,
  – the SC approach required 2,433 evaluations
  – MC-type sampling required 240,000 runs
• Off-peak: to achieve the required flow distribution,
  – the SC approach required 441 evaluations
  – MC-type sampling required 5,420 runs
• Weekend peak:
  – the SC approach required 441 evaluations
  – MC-type sampling required 8,000 runs
• To illustrate the drawback of using limited data, we compare the distribution of flow for high and low weekend demands with the case where only three weekend days of flow and demand are used to calibrate the weekend model.
Conclusions
• Calibrating for various traffic conditions requires large datasets.
• Big Data, such as RFID-based ETC data, is useful.
• However, Big Data poses a computational challenge when calibrating for all conditions.
• Traditional MC-type sampling needs heavy computational resources.
• We propose a methodology to capture stochasticity using stochastic collocation, defining each stochastic factor as a dimension.
• Computationally efficient sparse grids are used to sample the stochastic space and build an interpolant from the deterministic output at each support node of the grid.
CONCLUSIONS
Conclusions & Future Work
• Using 5-min demand data spanning 8 months, we calibrate AM-peak, off-peak, and weekend-peak macroscopic traffic models.
• The distribution of flows is obtained from the interpolant and used with the observed distribution to build a KS test statistic for calibration using SPSA.
• The proposed methodology:
  – works with any type of simulation model
  – is more efficient than MC-type methods
  – can be parallelized to increase speed
• Use stochastic parameters for jam density and wave speed in the traffic flow fundamental diagram for a larger freeway section.
• Apply the methodology to a larger network with higher dimensions of stochasticity.
FUTURE WORK