Page 1: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Characterization of Domain-Based Partitioners

for Parallel SAMR Applications

Johan Steensland Sumir Chandra Michael Thuné Manish Parashar

IT, Dept. of Scientific Computing Dept. of Electrical & Computer Engg.Uppsala University Rutgers, The State University of NJUppsala, Sweden Piscataway, NJ, USA

This research was supported by the National Science Foundation and the Swedish Foundation for Strategic Research.

Page 2: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Overview

• Structured AMR
• Partitioning Adaptive Grid Hierarchies
• Grid Structures
• Characterizing Partitioning Schemes
• SAMR Applications
• Partitioning Techniques
• Experimental Evaluation
• Partitioner Performance
• Octant Approach
• Partitioning Prescriptions
• Towards ARMaDA
• Conclusions

Page 3: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Structured AMR

Adaptive Mesh Refinement

• Start with a base coarse grid with minimum acceptable resolution
• Tag regions in the domain requiring additional resolution, cluster the tagged cells, and fit finer grids over these clusters
• Proceed recursively so that regions on the finer grid requiring more resolution are similarly tagged and even finer grids are overlaid on these regions
• Resulting grid structure is a dynamic adaptive grid hierarchy

The Berger-Oliger Algorithm

Recursive Procedure Integrate(level)
    If (RegridTime) Regrid
    Step Δt on all grids at level “level”
    If (level + 1 exists)
        Integrate(level + 1)
        Update(level, level + 1)
    End if
End Recursion

level = 0
Integrate(level)
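The recursion on this slide can be traced with a small Python sketch. It does no numerical work and only records the order in which levels are regridded, stepped, and updated. The names `integrate`, `regrid_time`, and `substeps` are illustrative assumptions, not part of the presentation. With `substeps=1` the sketch follows the slide literally; setting it to the refinement factor would mimic time sub-cycling on finer levels, which is an assumption beyond what the slide states.

```python
def integrate(level, num_levels, trace, regrid_time=lambda level: False, substeps=1):
    """Record the order of operations of the recursive Integrate(level) procedure."""
    if regrid_time(level):
        trace.append(f"regrid {level}")
    # Advance all grids on this level by one step of this level's size.
    trace.append(f"step {level}")
    if level + 1 < num_levels:  # "level + 1 exists"
        for _ in range(substeps):
            integrate(level + 1, num_levels, trace, regrid_time, substeps)
        # Project the finer solution back onto this level.
        trace.append(f"update {level}<-{level + 1}")
    return trace


if __name__ == "__main__":
    for entry in integrate(0, 3, []):
        print(entry)
```

Running it with three levels shows the coarse-to-fine stepping followed by fine-to-coarse updates.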

Page 4: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Partitioning Adaptive Grid Hierarchies

• Balance load and…
  – Expose available parallelism
  – Minimize communication overheads
    • Inter-level prolongations/restrictions
    • Intra-level “ghost” communications
  – Enable dynamic load redistribution with minimum overheads
• Parallel AMR costs
  – Communications
    • intra-level “ghost” communication
      – along the surface of each block
    • inter-level prolongation/restriction communications
      – gather/scatter between parents/children
  – Grid recomposition
    • grid refinement/coarsening
    • redistribution and load-balancing
    • prolongation
    • data-movement
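To illustrate why ghost communication scales with block surface, the following sketch counts the ghost cells around a rectangular block. The function name `ghost_cells` and the uniform ghost `width` are assumptions made for illustration; the presentation does not specify a ghost-layer width.

```python
from math import prod


def ghost_cells(shape, width=1):
    """Number of ghost cells in a layer of the given width around a block."""
    inner = prod(shape)
    outer = prod(n + 2 * width for n in shape)
    return outer - inner


if __name__ == "__main__":
    # One 8x8 block versus the same area split into four 4x4 blocks.
    print(ghost_cells((8, 8)))      # single block
    print(4 * ghost_cells((4, 4)))  # four smaller blocks
```

Splitting the same area into smaller blocks raises the total ghost-cell count, which is one reason finer-grained partitionings tend to cost more communication.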

Page 5: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Grid Structures

• 2-D Grid Structure

[Figure: 2-D grid structure at time steps 0, 40, 80, 120, 160, and 182; the legend distinguishes refinement levels 0 through 4.]

Page 6: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Grid Structures (contd.)

• 3-D Grid Hierarchy

Page 7: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Characterizing Partitioning Schemes

• PAC Tuple
  – Run-time selection of partitioning schemes based on system/application parameters
• Evaluation Metrics
  – Communication Requirement
    • inter-level/intra-level communication & memory copies
  – Load Imbalance
    • amount of imbalance
    • effort required
  – Data Migration
    • consider existing distribution
  – Partitioning Time
  – Partitioning Induced Overhead
    • number of grid components
    • quality of grid components
      – size, aspect ratio
• Overview of Distribution Schemes
  – Space-Filling Curves (SFC)
  – Sequence Partitioning (SP)
  – Multi-level Inverse SFC (Vampire)
    • Geometric, binary dissection, parameterized binary dissection
  – Binary Dissection (BD)
  – Wavefront Diffusion (WD - ParMetis)
  – Iterative Tree Balancing (ITB)
  – Combined Grid Distribution (CGD)
  – Independent Grid Distribution (IGD)
  – Independent Level Distribution (ILD)
  – Weighted Distribution
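Two of the evaluation metrics above can be made concrete with a short sketch. The formulas are common textbook definitions chosen for illustration; the presentation does not state how its metrics were computed, so `load_imbalance` and `data_migration` are assumed names and assumed definitions.

```python
def load_imbalance(loads):
    """Percentage by which the most loaded processor exceeds the average load."""
    average = sum(loads) / len(loads)
    return 100.0 * (max(loads) - average) / average


def data_migration(old_owner, new_owner, weights):
    """Fraction of the total work whose owning processor changes between two partitionings."""
    moved = sum(w for o, n, w in zip(old_owner, new_owner, weights) if o != n)
    return moved / sum(weights)


if __name__ == "__main__":
    print(load_imbalance([15, 5]))
    print(data_migration([0, 0, 1, 1], [0, 1, 1, 1], [1, 3, 2, 2]))
```

Taking the previous owner of each grid component into account is what "consider existing distribution" amounts to in this simplified form.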

Page 8: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

SAMR Applications

• Suite of 5 real-world SAMR application kernels
• Scientific and engineering domains
  – Numerical relativity: scalarwave 2-D & 3-D
  – Oil reservoir simulations: Buckley-Leverett 2-D & 3-D
  – Computational fluid dynamics:
    • Compressible turbulence: rm 2-D
    • Supersonic flows: enoamr 2-D
  – Transport equation: Transport 2-D
• Applications use 3 levels of factor 2 refinements
• Refinements performed every 4 time-steps
• Applications executed for 100 time-steps

Page 9: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Partitioning Techniques

• SFC (ISP)
  – Recursive linear representation of multi-dimensional grid hierarchy generated using space-filling mappings (N-to-1 dimensional mapping)
  – Computational load determined by segment length and recursion level
• G-MISP
  – Multi-level algorithm viewing matrix of workloads from SAMR grid hierarchy as a one-vertex graph, refined recursively
  – Favors speed at expense of load balance
• G-MISP + SP
  – “Smarter” variant of G-MISP: uses sequence partitioning to assign consecutive portions of one-dimensional list to processors
  – Load balance improves but scheme is computationally more expensive
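The two ideas named above, an N-to-1 space-filling mapping and sequence partitioning of the resulting list, can be sketched together. This is a simplified illustration, not the ISP or G-MISP+SP implementation: it uses a Morton (Z-order) curve as the space-filling mapping, assumes one integer workload per cell of a flat 2-D grid, and solves sequence partitioning by binary search on the bottleneck. All function names are hypothetical.

```python
def morton_index(x, y, bits=16):
    """Interleave the bits of (x, y) to obtain the Z-order (Morton) index."""
    idx = 0
    for b in range(bits):
        idx |= ((x >> b) & 1) << (2 * b)
        idx |= ((y >> b) & 1) << (2 * b + 1)
    return idx


def sfc_order(workloads):
    """Flatten a 2-D grid of per-cell workloads into a 1-D list along the curve."""
    cells = [(morton_index(x, y), w)
             for y, row in enumerate(workloads)
             for x, w in enumerate(row)]
    return [w for _, w in sorted(cells)]


def chunks_needed(loads, cap):
    """Greedy count of contiguous chunks when no chunk load may exceed cap."""
    count, current = 1, 0
    for w in loads:
        if current + w > cap:
            count += 1
            current = w
        else:
            current += w
    return count


def sequence_partition(loads, p):
    """Smallest possible maximum chunk load for p contiguous chunks (integer loads)."""
    lo, hi = max(loads), sum(loads)
    while lo < hi:
        mid = (lo + hi) // 2
        if chunks_needed(loads, mid) <= p:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    ordered = sfc_order([[1, 1], [4, 2]])
    print(ordered, sequence_partition(ordered, 2))
```

The extra search over candidate bottlenecks is a small-scale view of why adding sequence partitioning improves balance at a higher computational cost.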

Page 10: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Partitioning Techniques (contd.)

• pBD-ISP
  – Generalization of binary dissection: domain partitioned into p partitions
  – Each split divides load as evenly as possible, considering processors
• SP
  – Domain sub-divided into p*b equally sized blocks
  – Dual-level algorithm enabling different parameter settings for each level
  – Fine granularity scheme: good load balance but increased overhead, communication and computational cost
• WD
  – Part of ParMetis suite based on global workload and specializes in repartitioning graphs where refinements are scattered
  – Scheme results in fine grain partitionings with jagged boundaries and increased communication costs and overheads
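The pBD-ISP idea of splitting "as evenly as possible, considering processors" can be sketched on a one-dimensional list of workloads. This is a minimal illustration under assumptions: the list is taken to be already ordered (for example along a space-filling curve), each processor receives at least one element, and the name `pbd` is hypothetical. It is not the authors' implementation.

```python
def pbd(loads, p):
    """Recursively bisect a 1-D workload list into p contiguous partitions.

    Each cut aims to give the left side a share of the load proportional to
    the number of processors assigned to it, so p need not be a power of two.
    """
    if len(loads) < p:
        raise ValueError("need at least one element per processor")
    if p == 1:
        return [list(loads)]
    p_left = p // 2
    p_right = p - p_left
    target = sum(loads) * p_left / p
    best_cut, best_err, prefix = None, None, 0
    for cut in range(1, len(loads)):
        prefix += loads[cut - 1]
        # Leave enough elements on both sides for the processors assigned there.
        if cut < p_left or len(loads) - cut < p_right:
            continue
        err = abs(prefix - target)
        if best_err is None or err < best_err:
            best_cut, best_err = cut, err
    return pbd(loads[:best_cut], p_left) + pbd(loads[best_cut:], p_right)


if __name__ == "__main__":
    print(pbd([1, 1, 4, 2, 3, 1], 3))
```

Because every cut is decided locally, the scheme is fast, but the resulting balance can be only average, in line with the performance summary later in the presentation.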

Page 11: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Experimental Evaluation

• Normalized results for Scalarwave and Buckley-Leverett applications

Page 12: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Experimental Evaluation (contd.)

Scheme      LB              Comm.           DM              OH              Speed
G-MISP      - o - - - - -   o o o o o o o   o o o o o o o   o o o o o o o   +
G-MISP+SP   + + + o + o +   o o o o o o o   o o o o o o o   o o o o o o o   o
pBD-ISP     o o o - o - -   + + + o + o o   + o + o + o +   + + + + + + +   +
SP          + + - - o - o   - o o + o + o   o o o + - - o   - o - - o - o   -
ISP         - - o - - - -   - - o - o - -   - - o o - o o   o + + + + o +   +

Legend: (+) Significantly better   (o) Average   (-) Significantly worse

Page 13: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Partitioner Performance

Performance summary of observed results

• G-MISP
  – Fast, load balance not optimized, average overall performance
• G-MISP+SP
  – Similar to G-MISP, better load balance, higher computational costs
• pBD-ISP
  – Good overall performance, very fast, small communications and data movement, average load balance
• SP
  – Computationally very expensive, unpredictable behavior, worse load balance than G-MISP+SP

Page 14: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Partitioner Performance (contd.)

• ISP
  – Very fast, low overhead, below-average load balance, higher communication; results similar to those of G-MISP
• WD
  – Metis integration extremely expensive; dedicated SAMR partitioners performed much better
  – Even though Metis is known to produce high-quality partitionings at a low cost, two extra steps were needed in our interface
  – Metis graph generated from grid before partitioning; clustering used to regenerate grid blocks from graph partitions after partitioning
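The first of the two extra steps, generating a graph from the grid, can be sketched as follows. The code builds the compressed adjacency arrays (`xadj`, `adjncy`) and vertex weights (`vwgt`) that Metis-style graph partitioners take as input, for a flat 2-D grid with 4-neighbor connectivity. It does not call Metis, and the flat single-level grid and the function name `grid_to_csr` are simplifying assumptions; the interface used in the study is not shown in the presentation.

```python
def grid_to_csr(workloads):
    """Convert a 2-D grid of cell workloads into CSR adjacency arrays.

    Vertex i corresponds to cell (row, col) with i = row * cols + col.
    The neighbors of vertex i are adjncy[xadj[i]:xadj[i + 1]].
    """
    rows, cols = len(workloads), len(workloads[0])
    xadj, adjncy, vwgt = [0], [], []
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (0, -1), (0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    adjncy.append(nr * cols + nc)
            xadj.append(len(adjncy))
            vwgt.append(workloads[r][c])
    return xadj, adjncy, vwgt


if __name__ == "__main__":
    print(grid_to_csr([[1, 2], [3, 4]]))
```

Every cell becomes a vertex, so the graph grows with the number of cells rather than the number of grid blocks, which hints at why this conversion adds cost on top of the partitioning itself.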

Page 15: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Octant Approach

• Used to classify the state of the SAMR application with respect to
  – Adaptation pattern (scattered or localized)
  – Whether run-time is dominated by computation or communication
  – Activity dynamics in the solution

Page 16: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Partitioning Prescriptions

• Association of partitioning techniques to application state octants

Octant   Scheme
I        pBD-ISP
II       pBD-ISP
III      G-MISP+SP
IV       G-MISP+SP, SP
V        pBD-ISP
VI       pBD-ISP
VII      G-MISP+SP
VIII     G-MISP+SP
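The prescription table maps directly onto a lookup structure. The sketch below encodes exactly the octant-to-scheme associations listed above; the names `PRESCRIPTIONS` and `prescribed_schemes` are illustrative. How an application state is assigned to an octant number is not given in the table, so that step is left out.

```python
# Octant -> recommended partitioning scheme(s), copied from the table above.
PRESCRIPTIONS = {
    "I": ["pBD-ISP"],
    "II": ["pBD-ISP"],
    "III": ["G-MISP+SP"],
    "IV": ["G-MISP+SP", "SP"],
    "V": ["pBD-ISP"],
    "VI": ["pBD-ISP"],
    "VII": ["G-MISP+SP"],
    "VIII": ["G-MISP+SP"],
}


def prescribed_schemes(octant):
    """Return the scheme(s) prescribed for an octant given as a Roman numeral."""
    return PRESCRIPTIONS[octant.upper()]


if __name__ == "__main__":
    print(prescribed_schemes("IV"))
```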

Page 17: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Towards ARMaDA

• ARMaDA: Adaptive Runtime Management of Dynamic Applications
• “Best” partitioning depends on application/system configuration and current application/system state
  – Application Sensitive Adaptation
    • Partitioning Scheme: Vampire (MISP), GrACE (SFC), ParMetis (WD), RSB, ITB, etc.
    • Granularity: Patch size: AMR efficiency, comm./comp. ratio, overhead, node-performance, load-balance, etc.
    • Number of Processors/Load on Processors: Dynamic allocations/configuration/management (1000+ processors from the beginning or “on-demand”, hierarchical decomposition using dynamic processor groups)
  – System Sensitive Adaptation
    • Availability of system resources
    • State of system resources: SNMP, NWS, REMOS
    • Heterogeneity

Page 18: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Towards ARMaDA (contd.)

• Adaptive meta-partitioner
  – Dynamic PAC tuple
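The control flow of an adaptive meta-partitioner can be sketched as a small dispatcher: classify the current state, look up a scheme in a policy, and invoke it. Everything here is hypothetical scaffolding for illustration. The classifier, the policy contents, the state dictionary, and the stand-in partitioner `even_chunks` are assumptions and do not describe ARMaDA's actual design.

```python
def even_chunks(loads, p):
    """Stand-in partitioner: split the list into p nearly equal-length chunks."""
    n = len(loads)
    bounds = [i * n // p for i in range(p + 1)]
    return [loads[bounds[i]:bounds[i + 1]] for i in range(p)]


def meta_partition(loads, num_procs, state, classify, policy, partitioners):
    """Pick a scheme for the current state and apply it.

    classify:     callable mapping a state description to a policy key
    policy:       dict mapping keys to an ordered list of scheme names
    partitioners: dict mapping scheme names to callables (loads, p) -> chunks
    """
    key = classify(state)
    scheme = policy[key][0]  # take the first prescribed scheme
    return scheme, partitioners[scheme](loads, num_procs)


if __name__ == "__main__":
    # Hypothetical two-entry policy and classifier, for demonstration only.
    policy = {"A": ["scheme-fast"], "B": ["scheme-balanced"]}
    registry = {"scheme-fast": even_chunks, "scheme-balanced": even_chunks}
    choose = lambda s: "B" if s["scattered"] else "A"
    print(meta_partition([1, 2, 3, 4], 2, {"scattered": True}, choose, policy, registry))
```

Keeping the classifier, policy, and partitioners as separate arguments lets each be replaced at run time without touching the dispatcher.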

Page 19: Characterization of  Domain-Based Partitioners  for Parallel SAMR Applications

Conclusions

• Application-centric characterization of domain-based partitioners
  – Partitioning quality determined by a 5-component metric
  – 6 partitioning schemes evaluated using 5 application kernels
  – Mapping of partitioners onto application state octants
  – Octant approach and dynamic PAC tuple
• Overall goal
  – Support the formulation of policies required to drive a dynamically adaptive meta-partitioner for SAMR grid hierarchies
  – Selection of most appropriate partitioning strategy at run-time, based on current application and system state
  – Decrease in overall execution time
  – ARMaDA: Adaptive Run-time Management of Dynamic Applications