PADS 2010, Georgia Institute of Technology, Atlanta, GA, USA Exploring Multi-Grained Parallelism in Compute-Intensive DEVS Simulations Qi (Jacky) Liu and Gabriel Wainer Department of Systems and Computer Engineering Carleton University Ottawa, Canada


PADS 2010, Georgia Institute of Technology, Atlanta, GA, USA

Exploring Multi-Grained Parallelism in Compute-Intensive DEVS Simulations

Qi (Jacky) Liu and Gabriel Wainer

Department of Systems and Computer Engineering

Carleton University

Ottawa, Canada


Outline

Motivation & Background

Fine-Grained Event Parallelism

Event Processing Kernel

Parallel DEVS Simulation on Cell

Experimental Results

Conclusion & Future Work


Motivation

Accelerate general-purpose DEVS-based simulations on heterogeneous CMP architectures like the Cell processor

Develop new parallelization strategies based on fine-grained event-level parallelism inherent in the simulation process

Exploit multi-grained parallelism simultaneously at different levels of the system

Allow general users to gain performance transparently w/o being distracted by multicore programming details

Provide some generalizable methods & insight for PDES on emerging CMP architectures


Cell Processor Overview

Nine-core heterogeneous CMP with two distinct ISAs

Software-managed LS with explicitly-addressed DMA transfer

Low-latency EIB channels – 32-bit mailbox & signal messages


Discrete-EVent System Specification (DEVS)

[Figure: coupled model with components M1, M2, M3, M4]

Parallel DEVS (P-DEVS) Formalism

Cell-DEVS Formalism


Layered View of M&S


Parallel Simulation with CD++

Flat LP Structure

Structured Simulation Process

» (I) LP and model init.
» (@) model output
» (*) model state trans.
» (D) model sync.
» (X) model input data
» (Y) model output data
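The message legend above can be captured as a small enumeration. This is an illustrative Python sketch rather than CD++ code: the class name `MessageKind`, the member names, and the `is_data` helper are hypothetical, and only the symbols and their meanings come from the slide.

```python
from enum import Enum


class MessageKind(Enum):
    """Message symbols from the slide legend (member names are hypothetical)."""
    INIT = "I"        # LP and model initialization
    COLLECT = "@"     # model output
    INTERNAL = "*"    # model state transition
    DONE = "D"        # model synchronization
    EXTERNAL = "X"    # model input data
    OUTPUT = "Y"      # model output data


def is_data(kind: MessageKind) -> bool:
    """True for the two messages the legend describes as carrying data (X, Y)."""
    return kind in (MessageKind.EXTERNAL, MessageKind.OUTPUT)
```

Looking a message up by its symbol, for example `MessageKind("@")`, returns the corresponding member.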


Fine-Grained Event Parallelism

Event-embarrassing parallelism
» Independent events within a step
» Executed in an arbitrary order

Event-streaming parallelism
» Causally-related events between consecutive steps
» Executed in a pipelined fashion

Phase-changing events
» Exchanged between NC & FC
» Natural fork & join points

Data-flow oriented parallelization
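A minimal Python sketch of the fork and join pattern described above, assuming events within a step are independent. It illustrates only the event-embarrassing case, not the pipelined (event-streaming) one. The function names, the `(sim_id, value)` event layout, and the squaring workload are hypothetical stand-ins, and Python threads stand in for SPE threads.

```python
from concurrent.futures import ThreadPoolExecutor


def process_event(event):
    """Hypothetical per-event work: here, just square the payload."""
    sim_id, value = event
    return sim_id, value * value


def run_step(events, workers=4):
    """Fork: dispatch independent events; join: wait for all results.

    Events within one step are independent, so execution order does not
    matter; results are keyed by simulator id to make that explicit.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_event, events))


def run_steps(steps, workers=4):
    """Process steps in order; each step ends with a join before the next."""
    return [run_step(events, workers) for events in steps]
```

The join at the end of `run_step` plays the role of a phase-changing event: no event of the next step starts before every event of the current step has finished.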


Event Processing Kernel

Hydrological Watershed Simulation
» 320×320×2 with 204,800 Simulators
» Compute-intensive state transitions
» Over 300 million events across 663 phases
» Cell-DEVS model defined in CD++ spec. lang.

Simulation Profile on the PPE

SEK
» Concurrent exec. across SPEs (event-embarrassing parallelism): 98.02%
» Pipelined exec. between PPE & SPEs (event-streaming parallelism): 1.15%
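As a rough reading of the profile, Amdahl's law gives an upper bound on the speedup available from spreading the concurrent part across workers. The sketch below assumes the 98.02% figure is a fraction of total PPE execution time and treats every worker as equal in speed to the PPE, so it ignores any per-core gain from running on SPEs. The function name is hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


# Bound for 8 workers with 98.02% of the time parallelizable.
bound_8 = amdahl_speedup(0.9802, 8)
```

Under these assumptions the bound for 8 workers comes out at roughly 7.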


Parallel DEVS Simulation on Cell - Overview

Vector Parallelism (SPE SIMD)

Thread Parallelism

Event-Embarrassing Parallelism

Event-Streaming Parallelism (Two-Stage Pipeline)

Data-Streaming Parallelism (Double-Buffered DMA at Three Layers)

Compute-I/O Parallelism
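Double buffering, named above for the DMA transfers, overlaps the fetch of the next data chunk with computation on the current one. The sketch below shows the general idea in Python: a background thread stands in for an asynchronous DMA transfer, and `fetch` and `process_double_buffered` are hypothetical names. It is not Cell SDK code and models a single layer only.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(chunks, i):
    """Stand-in for a DMA 'get': returns a copy of chunk i."""
    return list(chunks[i])


def process_double_buffered(chunks, compute):
    """Overlap fetching chunk i+1 with computing on chunk i."""
    results = []
    if not chunks:
        return results
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(fetch, chunks, 0)              # fill first buffer
        for i in range(len(chunks)):
            current = pending.result()                      # wait for transfer
            if i + 1 < len(chunks):
                pending = dma.submit(fetch, chunks, i + 1)  # prefetch next
            results.append(compute(current))                # overlaps the fetch
    return results
```

For example, `process_double_buffered([[1, 2], [3]], sum)` computes the sum of each chunk while the following chunk is being fetched.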


Parallel DEVS Simulation on Cell – LP Virtualization

Purpose
» Map active Simulators to a limited group of SPE threads
» Fit into the small on-chip LS
» Assign each SPE a reusable task operating on a stream of data
» Facilitate fine-grained dynamic load-balancing between SPEs

Solution
» Turn Simulators (and associated atomic models) into virtual LPs
» Separate event-processing logic (wrapped in SPE threads) from state data (maintained in main memory buffers)
» Match the states of active Simulators to available SPE threads dynamically at each virtual time – SEK job scheduling
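The separation of logic from state can be sketched as follows. This is a simplified Python illustration under stated assumptions: `states` stands in for the main-memory buffers, `transition` is a hypothetical stateless processing routine, workers are simulated sequentially, and a plain rotating assignment replaces whatever policy the SEK actually applies.

```python
def transition(state, now):
    """Hypothetical stateless event-processing logic shared by all workers."""
    return {"value": state["value"] + 1, "last": now}


def run_virtual_time(states, active_ids, now, n_workers):
    """Assign active simulator states to workers for one virtual time.

    Workers hold no state of their own, so any worker can process any
    simulator; only the simulators that are active are touched.
    """
    assignment = {w: [] for w in range(n_workers)}
    for k, sim_id in enumerate(sorted(active_ids)):
        assignment[k % n_workers].append(sim_id)   # rotate over workers
    for worker, sim_ids in assignment.items():
        for sim_id in sim_ids:
            states[sim_id] = transition(states[sim_id], now)
    return assignment
```

Because the mapping is recomputed at every virtual time, the number of simulators can far exceed the number of workers, which matches the purpose listed above.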


Parallel DEVS Simulation on Cell – More Details

Virtual Simulator State Mgmt.

Decentralized Event Mgmt.


Parallel DEVS Simulation on Cell – More Details

Rule Evaluation on SPEs

SEK Job Scheduling


Platform and Configuration

IBM BladeCenter QS22

3.2GHz PowerXCell 8i × 2

32GB RAM

Red Hat Enterprise Linux 5.2

IBM SDK for Multicore Acceleration 3.1

Parallel DEVS simulator on Cell: CD++/Cell

SEK job scheduling policy
» round-robin or shortest-queue-first

CD++ event-logging turned off
» minimize the impact of file I/O
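The two scheduling policies named above can be contrasted with a small Python sketch. It is an illustration only: jobs are represented by hypothetical integer costs, and "shortest" is interpreted here as the queue with the least accumulated cost, which is an assumption rather than the documented SEK behavior.

```python
from itertools import cycle


def round_robin(jobs, n_queues):
    """Assign jobs to queues in rotating order, ignoring job cost."""
    queues = [[] for _ in range(n_queues)]
    for job, q in zip(jobs, cycle(range(n_queues))):
        queues[q].append(job)
    return queues


def shortest_queue_first(jobs, n_queues):
    """Assign each job to the queue with the least accumulated cost."""
    queues = [[] for _ in range(n_queues)]
    loads = [0] * n_queues
    for job in jobs:
        q = loads.index(min(loads))   # ties go to the lowest-numbered queue
        queues[q].append(job)
        loads[q] += job
    return queues
```

With one expensive job followed by cheap ones, round-robin keeps alternating queues, while shortest-queue-first steers the cheap jobs away from the busy queue.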


Total Simulation Time with Watershed Model

Performance gain with just one SPE: 5.84×
» OO C++ code on PPE vs. SIMD-aware C code on SPEs
» Memory latency & cache miss vs. data locality & double-buffered DMA
» Low-level optimizations on SPEs (LS data alignment, call stack usage, branch minimization, loop unrolling, in-line substitution, pipelined event execution)

Overall performance with 8 SPEs: 33.06×
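The two figures above can be combined into a relative speedup and a parallel efficiency. The sketch below assumes both numbers are measured against the same PPE-only baseline; the function names are hypothetical and the derived values are not stated on the slides.

```python
def relative_speedup(overall: float, single_spe: float) -> float:
    """Speedup of the 8-SPE run over the 1-SPE run."""
    return overall / single_spe


def parallel_efficiency(speedup: float, n_workers: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / n_workers


rel = relative_speedup(33.06, 5.84)   # about 5.66
eff = parallel_efficiency(rel, 8)     # about 0.71
```

Under that assumption, going from one SPE to eight yields about 5.66× of the ideal 8×, or roughly 71% efficiency.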


Speedups over (PPE with 1 SPE) Version

Speedup grows more slowly as more SPEs are added
» Higher overhead for SEK job scheduling and orchestration
» Increased DMA contention & channel stalls


Conclusion

Formalism-Based Design Methodology
» Facilitate model reuse & portability
» Reduce validation & verification cost

Performance-Centric Approach
» Accelerate event processing for compute-intensive DEVS models
» Minimize communication & synchronization overhead
» Achieve fine-grained dynamic load balancing

New Parallelization Strategy for PDES
» Exploit fine-grained event parallelism from a data-flow perspective
» Combine multi-grained parallelism at different system levels
» Break LP boundaries with LP virtualization

Insight for PDES on Heterogeneous CMP Architectures
» Match workload characteristics to functional specialization of cores
» Address data locality, memory latency, & code optimization issues


Future Work

Porting different types of models to Cell, performance testing
» Transparency
» Minimal knowledge (and learning curve) from users

Integrating with existing conservative/optimistic approaches
» Combine cluster-level LP-based conservative simulation, using both synchronous & asynchronous algorithms
» Combine cluster-level Time Warp optimistic simulation, using Lightweight Time Warp (DS-RT 2008, PADS 2009)

Testing on large-scale hybrid supercomputers

Using Cell processor in new ways


This research was supported in part by the MITACS Accelerate Ontario program, Canada, and by the IBM T. J. Watson Research Center, NY.


http://www.sce.carleton.ca/~liuqi/

ARS Lab: http://cell-devs.sce.carleton.ca/ars/

Questions?


Some Applications

Battlefield Simulations

Crowd Behavior & Evacuation Analysis

Defense & Emergency Planning


Some Applications

Biomedical & Environmental Analysis

Presynaptic Nerve

Krebs Cycle in living organisms

Forest fire propagation

Watershed formation

Deformable Membrane