
Developing HPC-enabled Simulation System for Multi-scale Mobility Network Analytics

PI: Pengfei (Taylor) Li
Oct. 12, 2018

Project Kickoff Meeting

Overview

• Research Team (2018-2019):
  • Dr. Taylor Li, PI (HPC-enabled simulation engine development)
  • Dr. Linkan Bian, Co-PI (methodology design of big data analytics)
  • Dr. Wenmeng Tian, Senior Staff (big data analytics and visualization)
  • Slade Wang, Ph.D. student (HPC environment configuration and data visualization in the HPC environment)

Objective

• Build a multi-purpose, scalable, agent-based simulation engine within the HPC environment

• Stream, archive, visualize, and analyze mega-scale data within the distributed computing environment
  • At the scale of hundreds of GB, TB, or PB

• Conduct a preliminary investigation of pioneering GPU-based computing engines for:
  • Combinatorial optimization
  • Discrete-event simulation

Research Tasks

• Task 1: Develop and fine-tune an HPC-enabled discrete-event simulation engine
  • Mega-scale, scalable, high-fidelity

• Task 2: Data analysis and visualization
  • On a single computer
  • In a distributed environment

• Task 3: Optimization

Preliminary Results

• HPC-enabled simulation engine development
  • Second-by-second formulation
  • Path first-in-first-out (FIFO) rule
  • The engine can be accelerated on a computer cluster
  • Hybrid programming: MPI (for shortest path finding) + multi-threading (for network loading)
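The second-by-second formulation and the FIFO rule listed above can be illustrated at the scale of a single link. The sketch below is a hypothetical, simplified model, not the project's engine: each vehicle may leave the link no earlier than its entry second plus a free-flow time, at most `outflow_cap_per_s` vehicles exit per second (the capacity constraint), and vehicles leave strictly in entry order. The function name and its parameters are illustrative assumptions.

```python
from collections import deque


def load_link_fifo(entry_times, free_flow_s, outflow_cap_per_s, horizon_s):
    """Hypothetical second-by-second loading of one link under a FIFO rule.

    entry_times: list of (vehicle_id, entry_second), sorted by entry_second.
    free_flow_s: free-flow traversal time of the link, in seconds.
    outflow_cap_per_s: maximum number of vehicles allowed to exit per second.
    Returns {vehicle_id: exit_second}; vehicles still on the link at the
    horizon are omitted.
    """
    queue = deque()  # vehicles currently on the link, in entry order
    exits = {}
    i = 0
    for t in range(horizon_s):
        # 1) Admit every vehicle whose entry second has arrived (tail of queue).
        while i < len(entry_times) and entry_times[i][1] <= t:
            vid, t_in = entry_times[i]
            queue.append((vid, t_in + free_flow_s))  # earliest possible exit
            i += 1
        # 2) Release up to capacity from the head only. A head vehicle that
        #    cannot leave yet blocks everyone behind it (no overtaking).
        released = 0
        while queue and released < outflow_cap_per_s and queue[0][1] <= t:
            vid, _ = queue.popleft()
            exits[vid] = t
            released += 1
    return exits
```

For example, five vehicles entering at second 0 on a link with a 10-second free-flow time and a capacity of 2 vehicles per second exit at seconds 10, 10, 11, 11, and 12: the capacity constraint, not the free-flow time, spreads them out.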

Simulation Parallelism

[Figure: Computer 1 and Computer 2, each running multi-threading, linked by MPI.]
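The hybrid scheme of MPI across computers plus multi-threading within each computer can be sketched in plain Python. This is a hypothetical single-process emulation under stated assumptions: the MPI step is mimicked by a loop over `n_ranks` that splits origins round-robin, so each "rank" builds Dijkstra shortest-path trees only for its own origins; a real deployment would run one process per rank (for example with mpi4py) and gather the trees. Network loading is then split across a thread pool, each thread counting link volumes for its share of trips. All function names are illustrative.

```python
import heapq
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def dijkstra_tree(graph, source):
    """Shortest-path predecessor tree from `source`.
    graph: {node: [(neighbor, cost), ...]} with non-negative costs."""
    dist = {source: 0.0}
    pred = {source: None}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    return pred


def path_links(pred, dest):
    """Walk the tree back from `dest`; returns [(u, v), ...] from the origin.
    Returns [] if dest is the origin or is unreachable."""
    links = []
    while pred.get(dest) is not None:
        links.append((pred[dest], dest))
        dest = pred[dest]
    return links[::-1]


def shortest_paths_by_rank(graph, origins, n_ranks):
    """Emulates the MPI step: origins are split round-robin across ranks,
    each rank builds trees for its own origins, and results are gathered."""
    gathered = {}
    for rank in range(n_ranks):
        my_origins = origins[rank::n_ranks]  # this rank's share of the work
        gathered.update({o: dijkstra_tree(graph, o) for o in my_origins})
    return gathered


def load_network_threaded(trees, trips, n_threads):
    """Multi-threaded loading: each thread counts link volumes for a chunk
    of (origin, destination) trips; partial counts are merged at the end."""
    chunks = [trips[k::n_threads] for k in range(n_threads)]

    def count(chunk):
        c = Counter()
        for o, d in chunk:
            c.update(path_links(trees[o], d))
        return c

    total = Counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for partial in pool.map(count, chunks):
            total.update(partial)
    return total
```

In CPython, pure-Python threads do not speed up CPU-bound counting because of the global interpreter lock; the thread pool here only shows how the loading work can be decomposed and merged, and a compiled loading kernel would be needed for real speedup.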

Data Set for Experiment

• Houston network
  • 28+ million vehicles per day with known origins and destinations
  • 86,400 seconds (24 hours)
  • 70k links and 20k nodes
  • Shortest path finding: 6 hours per iteration
  • Network loading: 5 hours per simulated day
  • Mobility data size: 14+ GB per day

[Figure panels: Time of Day Demand; Road Network]
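A back-of-the-envelope check of the figures above, assuming decimal gigabytes (1 GB = 10^9 bytes) and, hypothetically, that one iteration consists of one shortest-path pass followed by one full-day network loading:

```latex
\frac{14 \times 10^{9}\ \text{bytes/day}}{28 \times 10^{6}\ \text{vehicles/day}} = 500\ \text{bytes per vehicle},
\qquad
6\ \text{h} + 5\ \text{h} = 11\ \text{h per iteration}
```

Because both the data size and the vehicle count are stated as lower bounds (14+ GB, 28+ million), the 500-byte figure is an order-of-magnitude estimate only.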

Preliminary Results

• Two output files:
  • Vehicle shortest path data (14 GB), with no link capacity constraints
  • Vehicle trajectory data, with link capacity constraints

• Multiple runs will be needed.

Data Analytics Overview

• Model 1: Time series analysis
  • Analyze changes in traffic flow over time at critical locations

• Model 2: Spatial analysis
  • Recognize spatial patterns at a limited number of time points

• Model 3: Spatio-temporal analysis
  • Identify spatio-temporal patterns using a tensor-based approach

Model 1: Time Series Analysis at Critical Locations

[Figure: Number of vehicles over time at three links: Link ID=26248 (3104559.2805, 13946399.335); Link ID=34838 (3115688.604, 13756788.775); Link ID=1 (3118945.5045, 13841059.685).]

• 74,990 links, hence 74,990 separate time series models

• Drawbacks
  • This approach completely ignores the spatial correlation between links
  • Huge redundancy in parameter estimation

Time series analysis based on an AR(p) model*, estimated coefficients for lags Xt-1 to Xt-5:
  Link 1: 0.9954, 0.0135, 0.0353, -0.0897
  ...
  Link 26248: 0.9801, 0.0223, 0.0018, 0.0372, -0.0438
  Link 34838: 1.0070, 0.0046, -0.0170, -0.0052, 0.0156

*The first 10k points are used for demonstration purposes.
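Model 1 fits one autoregressive model per link. A minimal sketch of that per-link fitting, assuming ordinary least squares with an intercept (the slide does not say how the AR(p) coefficients were estimated or whether an intercept was included); `fit_ar` and `fit_all_links` are hypothetical names, and NumPy is assumed to be available:

```python
import numpy as np


def fit_ar(series, p):
    """Least-squares fit of x_t = c + a_1 x_{t-1} + ... + a_p x_{t-p} + e_t.
    Returns (c, array([a_1, ..., a_p]))."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    if n <= p:
        raise ValueError("series must be longer than the AR order p")
    # Column k holds lag k+1: for rows t = p..n-1 it contains x[t-k-1].
    lags = np.column_stack([x[p - k - 1 : n - k - 1] for k in range(p)])
    design = np.column_stack([np.ones(n - p), lags])  # intercept + lags
    target = x[p:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[0], coef[1:]


def fit_all_links(volumes, p):
    """One independent AR(p) model per link: {link_id: (c, coeffs)}.
    volumes: {link_id: 1-D sequence of vehicle counts over time}."""
    return {link_id: fit_ar(series, p) for link_id, series in volumes.items()}
```

With 74,990 links, `fit_all_links` estimates 74,990 independent coefficient vectors and never couples neighboring links, which is exactly the parameter redundancy and missing spatial correlation listed as drawbacks.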

Model 2: Spatial Analysis at Limited Time Points

• Characterize the spatial correlation at each time stamp
  • How does the traffic pattern change over space?
  • Predict traffic volume at unobserved locations

• Drawbacks
  • Ignores temporal correlation entirely
  • Very computationally intensive
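The spatial model is not named on the slide. As one common choice for predicting traffic volume at unobserved locations, the sketch below uses Gaussian-process regression (simple kriging) with a squared-exponential covariance and a plug-in constant mean; the kernel, its parameters, and all names are assumptions. The Cholesky factorization of the n × n covariance matrix costs O(n^3) in the number of observed links and must be redone at every time stamp, which illustrates why this model becomes very computationally intensive at network scale.

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale, variance):
    """Squared-exponential covariance between point sets a (n, 2) and b (m, 2)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)


def krige(obs_xy, obs_vol, new_xy, length_scale=1.0, variance=1.0, noise=1e-6):
    """GP posterior mean of traffic volume at new_xy, from the volumes
    observed at obs_xy at a single time stamp."""
    obs_xy = np.asarray(obs_xy, dtype=float)
    new_xy = np.asarray(new_xy, dtype=float)
    y = np.asarray(obs_vol, dtype=float)
    mean = y.mean()  # constant mean, estimated from the observations
    K = sq_exp_kernel(obs_xy, obs_xy, length_scale, variance)
    K[np.diag_indices_from(K)] += noise  # jitter for numerical stability
    L = np.linalg.cholesky(K)  # the O(n^3) step: K = L @ L.T
    # Solve K @ alpha = (y - mean) with two triangular systems.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - mean))
    k_star = sq_exp_kernel(new_xy, obs_xy, length_scale, variance)
    return mean + k_star @ alpha
```

Near an observed link the prediction reproduces the observed volume; far from all observations it falls back to the mean, so the length scale effectively sets how far spatial correlation reaches.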

Model 3: Spatio-temporal Analysis Using a Tensor-based Approach

• Advantages
  • Handles large-scale spatio-temporal data
  • Jointly considers spatial and temporal correlation
  • Effective data compression in space and time
  • Projection matrices indicate the importance of each point on the map

• Challenges
  • Large matrix operations may still be a problem

• Proposed solution
  • Distributed computing

[Figure: Original data (x × y × t) of dimension 121 × 121 × 10 compressed to (x′ × y′ × t′) 50 × 52 × 4; dimension reduced by 92.9%.]
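The tensor method is not specified on the slide. Assuming a Tucker decomposition computed by truncated higher-order SVD (HOSVD), the sketch below compresses an x × y × t tensor into a smaller core; the per-mode factor matrices play the role of the projection matrices mentioned above. Function names are illustrative and NumPy is assumed.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` (new_dim x old_dim)."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Tucker compression via truncated HOSVD.
    Returns (core, factors); factors[n] has shape (tensor.shape[n], ranks[n])."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :r])  # leading r left singular vectors
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each subspace
    return core, factors


def reconstruct(core, factors):
    """Approximate the original tensor from the core and factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out
```

For the slide's sizes, the core holds 50 × 52 × 4 = 10,400 entries versus 121 × 121 × 10 = 146,410, so 1 - 10,400/146,410 ≈ 92.9%. That percentage counts the core only; including the three factor matrices (121 × 50, 121 × 52, and 10 × 4, i.e. 12,382 more entries) gives a reduction of about 84.4%.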

Ongoing Efforts and Future Plans

• Improve the simulation engine
  • Generate a benchmark data set for cross-comparing various big data analytics approaches

• Explore alternative options for mega-scale data processing and visualization
  • Alt 1: Distributed MATLAB
  • Alt 2: Python + Spark within the HPC environment
  • Alt 3: Other appropriate tools?

• Preliminary GPU-based computing
  • GPU development environment setup
  • GPU-based big data analytics packages
  • GPU-based computing engine development

Potential Impacts on Other Projects

• Data generation platform

• Data visualization tool

• Data analysis

• Optimization

• ERDC / TARDEC mission-critical applications