enzo-p / cello pursuing petascale astrophysics · 2017. 10. 17. · , numerical methods mostly...

29

Upload: others

Post on 02-Mar-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

Enzo-P / CelloPursuing Petascale Astrophysics

PI Michael L. Norman, Co-PI James Bordner

University of California, San Diego

San Diego Supercomputer Center

NCSA Blue Waters Symposium16�19 May 2017

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 2: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

The search for missing physicsRealistic simulations of the intergalactic medium

Hubble 2010Science Year in Review

Our project has two components:

cosmological simulations (Norman)

investigating Helium reionization epochsimulations don't match new observationsran extreme simulations using Enzo on BWmore: Norman BW 2016, Pengfei Chen thesis

software development (Bordner)

Enzo's AMR is Terascale not Petascaledeveloping next-generation Enzo-P

parallelized using Charm++scalable array-of-octree AMR (Cello)using BW for performance testing

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 3: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

The Enzo cosmology and astrophysics applicationEnzo's strengths

Applicable to a wide range of astrophysical/cosmological problems

Physics Equations: mathematical models

� Hydrodynamics (Euler equations) � Gravity (∇2Φ = 4πGρ) � Chemistry/cooling �

Star formation � Magnetism � Radiation . . .

Numerical Methods: approximate and solve

� PPM, � ZEUS, � MUSCL � FFT � multigrid � Gadget cooling � Cloudy cooling �

Grackle � Dedner MHD � MHD-CT � Implicit FLD � Moray . . .

Data Structures: computer representation

� Structured Adaptive Mesh Re�nement (SAMR) � Eulerian �elds � Lagrangian

particles . . .

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 4: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

The Enzo cosmology and astrophysics applicationEnzo's strengths

Implements a variety of sophisticated numerical methods

Physics Equations: mathematical models

� Hydrodynamics (Euler equations) � Gravity (∇2Φ = 4πGρ) � Chemistry/cooling �

Star formation � Magnetism � Radiation . . .

Numerical Methods: approximate and solve

� PPM, � ZEUS, � MUSCL � FFT � multigrid � Gadget cooling � Cloudy cooling �

Grackle � Dedner MHD � MHD-CT � Implicit FLD � Moray . . .

Data Structures: computer representation

� Structured Adaptive Mesh Re�nement (SAMR) � Eulerian �elds � Lagrangian

particles . . .

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 5: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

The Enzo cosmology and astrophysics applicationEnzo's strengths

Enabled by adaptive mesh re�nement with particles and �elds

Physics Equations: mathematical models

� Hydrodynamics (Euler equations) � Gravity (∇2Φ = 4πGρ) � Chemistry/cooling �

Star formation � Magnetism � Radiation . . .

Numerical Methods: approximate and solve

� PPM, � ZEUS, � MUSCL � FFT � multigrid � Gadget cooling � Cloudy cooling �

Grackle � Dedner MHD � MHD-CT � Implicit FLD � Moray . . .

Data Structures: computer representation

� Structured Adaptive Mesh Re�nement (SAMR) � Eulerian �elds � Lagrangian

particles . . .

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 6: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

The Enzo cosmology and astrophysics applicationEnzo's struggles

Limitations primarily due to relentless growth of HPC platforms

Ever-changing software requirementsEnzo was born in early 1990's�massive parallelism� meant P ≈ 100×1000 parallelism today

A�ects di�erent parts of Enzo di�erently

, physics requirements unchanged, numerical methods mostly viable/ data structures limit Enzo's scalability

Motivates AMR data structure redesignEnzo-P: �petascale� version of Enzokeep Enzo's physics and many methodsbuilt on Cello scalable AMR framework

[ Sam Skillman, Matt Turk ]

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 7: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

The Enzo cosmology and astrophysics applicationEnzo's struggles

Limitations primarily due to relentless growth of HPC platforms

Ever-changing software requirementsEnzo was born in early 1990's�massive parallelism� meant P ≈ 100×1000 parallelism today

A�ects di�erent parts of Enzo di�erently

, physics requirements unchanged, numerical methods mostly viable/ data structures limit Enzo's scalability

Motivates AMR data structure redesignEnzo-P: �petascale� version of Enzokeep Enzo's physics and many methodsbuilt on Cello scalable AMR framework

[ Sam Skillman, Matt Turk ]

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 8: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

The Enzo cosmology and astrophysics applicationEnzo's struggles

Limitations primarily due to relentless growth of HPC platforms

Ever-changing software requirementsEnzo was born in early 1990's�massive parallelism� meant P ≈ 100×1000 parallelism today

A�ects di�erent parts of Enzo di�erently

, physics requirements unchanged, numerical methods mostly viable/ data structures limit Enzo's scalability

Motivates AMR data structure redesignEnzo-P: �petascale� version of Enzokeep Enzo's physics and many methodsbuilt on Cello scalable AMR framework

[ Sam Skillman, Matt Turk ]

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 9: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

Cello scalable adaptive mesh re�nementDi�erences between Enzo and Enzo-P

Enzo

structured AMRvariable shaped patchesneighbor- & parent-communicationMPI parallelization

Enzo-P/Cello

array of octrees�xed shaped blocksneighbor-only communicationCharm++ parallelization

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 10: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

Cello scalable adaptive mesh re�nementDi�erences between Enzo and Enzo-P

Enzo

structured AMRvariable shaped patchesneighbor- & parent-communicationMPI parallelization

Enzo-P/Cello

array of octrees�xed shaped blocksneighbor-only communicationCharm++ parallelization

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 11: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

The Charm++ parallel programming system

pX()

Main

ChareC

pW()

Main()

pZ()

pY()

pV()

ChareA

Charm++ Runtime

ChareB[0]

A Charm++ parallel program

Charm++ program

Decomposed by objects

Charm++ objects called chares

invoke entry methods

asynchronous

communicate via messages

Charm++ runtime system

maps chares to processorsschedules entry methodsmigrates chares to load balance

Additional features

checkpoint/restartdynamic load balancingfault-tolerance

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 12: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

The Charm++ parallel programming system

pX()

Main

ChareC

pW()

Main()

pZ()

pY()

pV()

ChareA

Charm++ Runtime

ChareB[0]

A Charm++ parallel program

Charm++ program

Decomposed by objects

Charm++ objects called chares

invoke entry methods

asynchronous

communicate via messages

Charm++ runtime system

maps chares to processorsschedules entry methodsmigrates chares to load balance

Additional features

checkpoint/restartdynamic load balancingfault-tolerance

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 13: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

The Charm++ parallel programming system

pX()

Main

ChareC

pW()

Main()

pZ()

pY()

pV()

ChareA

Charm++ Runtime

ChareB[0]

A Charm++ parallel program

Charm++ program

Decomposed by objects

Charm++ objects called chares

invoke entry methods

asynchronous

communicate via messages

Charm++ runtime system

maps chares to processorsschedules entry methodsmigrates chares to load balance

Additional features

checkpoint/restartdynamic load balancingfault-tolerance

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 14: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

Cello scalable adaptive mesh re�nementHow Enzo-P addresses Enzo's limitations

Memory usage

AMR structure is fully distributeduniform blocks reduce fragmentation

Mesh quality

2-to-1 re�nement constraint maintained

Parallel task de�nition

uniform �eld array sizes in blockssizes determined by user

Parallel task scheduling

asynchronous data-driven

Data locality

primarily nearest-neighbor comm.

Enzo

[ John Wise ]

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 15: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

Cello scalable adaptive mesh re�nementHow Enzo-P addresses Enzo's limitations

Memory usage

AMR structure is fully distributeduniform blocks reduce fragmentation

Mesh quality

2-to-1 re�nement constraint maintained

Parallel task de�nition

uniform �eld array sizes in blockssizes determined by user

Parallel task scheduling

asynchronous data-driven

Data locality

primarily nearest-neighbor comm.

Enzo

[ John Wise ]

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 16: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

Cello scalable adaptive mesh re�nementHow Enzo-P addresses Enzo's limitations

Memory usage

AMR structure is fully distributeduniform blocks reduce fragmentation

Mesh quality

2-to-1 re�nement constraint maintained

Parallel task de�nition

uniform �eld array sizes in blockssizes determined by user

Parallel task scheduling

asynchronous data-driven

Data locality

primarily nearest-neighbor comm.

Enzo

[ John Wise ]

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 17: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

Cello scalable adaptive mesh re�nementHow Enzo-P addresses Enzo's limitations

Memory usage

AMR structure is fully distributeduniform blocks reduce fragmentation

Mesh quality

2-to-1 re�nement constraint maintained

Parallel task de�nition

uniform �eld array sizes in blockssizes determined by user

Parallel task scheduling

asynchronous data-driven

Data locality

primarily nearest-neighbor comm.

Enzo

[ John Wise ]

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 18: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

Cello scalable adaptive mesh re�nementHow Enzo-P addresses Enzo's limitations

Memory usage

AMR structure is fully distributeduniform blocks reduce fragmentation

Mesh quality

2-to-1 re�nement constraint maintained

Parallel task de�nition

uniform �eld array sizes in blockssizes determined by user

Parallel task scheduling

asynchronous data-driven

Data locality

primarily nearest-neighbor comm.

Enzo

[ John Wise ]

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 19: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

Cello scalable adaptive mesh re�nementHow Enzo-P addresses Enzo's limitations

Memory usage

AMR structure is fully distributeduniform blocks reduce fragmentation

Mesh quality

2-to-1 re�nement constraint maintained

Parallel task de�nition

uniform �eld array sizes in blockssizes determined by user

Parallel task scheduling

asynchronous data-driven

Data locality

primarily nearest-neighbor comm.

Enzo

[ John Wise ]

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 20: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

Cello distributed adaptive mesh re�nement

Cello implements a fully-distributed array-of-octree mesh hierarchy usinga Charm++ chare array of Blocks. Block chare elements are indexedusing a bit-coding of the array and octree indices.

Data-driven execution

Field face data sent when availablelast update triggers computation

Dynamic task scheduling

multiple Blocks per processoverlapedcommunication/computation

Charm++ also provides

cutting-edge load-balancing schemesdouble in-memory checkpoint/restart

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 21: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

How particle data is communicated between Blocks

communication is required when particles move outside a Block

this is done using a 4x4x4 array

array contains pointers to ParticleData (PD) objectsone PD object per neighbor Block

migrating particles are

scatter()-ed to PD array objectssent to associated neighborsgather()-ed by neighbors

one sweep through particles

one communication step per neighbor

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 22: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

How Particle objects store particle data

Types

"trace"

"dark_matter"

y

z

vx

vy

vz

mass

x

x

y

z

id

Batches multiple particle types

particles allocated in batches

�xed size arraysfewer new/delete operationse�cient insert/delete operationspotentially useful for GPU's

batches store particle attributes

(position, velocity, mass, etc.)8,16,32,64-bit integers32,64,128-bit �oats

particle positions may be �oating-point or integers

�oating-point for storing global positionsintegers for Block-local coordinates

solves reduced precision issue for deep hierarchiesless memory required for given accuracy

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 23: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

Enzo-P/Cello Blue Waters scaling testHydro and particles: �Alphabet Soup� problem

We tested basic Enzo-P hydrodynamics and particles scalability

variation of �array of Sedov Blast� test

letters instead of spheres

inhibits lockstep coarsen/re�ne

one letter per BW fp-core

tested with/without tracer particles

323 or 243 cells per block

among largest AMR runs ever done

256K fp-cores1.7T cells; 0.7T (cells + particles)50M Blocks

Enzo would require 72GB / process

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 24: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

Enzo-P/Cello Blue Waters hydro/particle scalingInitial tests: e�ciency dropo� before 32K fp-cores

100 101 102 103 104 105

Processes (FP cores)

0.0

0.2

0.4

0.6

0.8

1.0

1.2

Eff

icie

ncy

parallel efficiencymemory efficiency

Enzo-P Weak Scaling on Blue Waters: EfficiencyAlphabet Soup Test (with tracer particles): 161013

100 101 102 103 104 105

Processes (FP cores)

10-4

10-3

10-2

10-1

100

101

102

Tim

e (

s)

computeadapt timeadapt sync

refresh timerefresh sync

stoppingtotal

Enzo-P Weak Scaling on Blue Waters: TimeAlphabet Soup Test (with tracer particles): 161013

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 25: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

Enzo-P/Cello Blue Waters hydro/particle scalingProjections analysis: hidden O(N) or O(P)

Bytes recevied

Messages received

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 26: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

Enzo-P/Cello Blue Waters hydro/particle scalingCharm++ SMP mode: excellent e�ciency through 256K fp-cores

100 101 102 103 104 105 106

Processes (FP cores)

0.0

0.2

0.4

0.6

0.8

1.0

1.2

Eff

icie

ncy

parallel efficiencymemory efficiency

Enzo-P Weak Scaling on Blue Waters: EfficiencyAlphabet Soup Test (with tracer particles): 161110

100 101 102 103 104 105 106

Processes (FP cores)

10-4

10-3

10-2

10-1

100

101

102

Tim

e (

s)

computeadapt timeadapt sync

refresh timerefresh sync

stoppingtotal

Enzo-P Weak Scaling on Blue Waters: TimeAlphabet Soup Test (with tracer particles): 161110

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 27: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

Enzo-P/Cello Blue Waters gravity test

We tested scalability of a recently implemented gravity solver

�array of collapsing spheres� problem

varied both block size and problem size

multigrid V-cycle solve (uniform-mesh)

debugging scalable AMR gravity

Reynolds �HG� algorithmBiCG-STAB Krylov solvermultigrid-based preconditioner

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 28: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

Enzo-P/Cello Blue Waters gravity scaling

23 25 27 29 211 213 215 217

Processing units (FP cores)

2 6

2 4

2 2

20

22

24

Tim

e pe

r MG

V-cy

cle (s

)

Enzo-P Gravity Weak Scaling on Blue Waters

B24B32B48B64

N256N384N512

N768N1024N1536

N2048N3072N4096

23 25 27 29 211 213 215 217

Processing units (FP cores)

2 6

2 4

2 2

20

22

24

Tim

e pe

r MG

V-cy

cle (s

)

Enzo-P Gravity Strong Scaling on Blue Waters

B24B32B48B64

N256N384N512

N768N1024N1536

N2048N3072N4096

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017

Page 29: Enzo-P / Cello Pursuing Petascale Astrophysics · 2017. 10. 17. · , numerical methods mostly viable / data structures limit Enzo's scalability Motivates AMR data structure redesign

Conclusions and next steps

We are developing Enzo-P/Cello, the next-generation of the Enzoparallel AMR cosmology and astrophysics application

Scalability of AMR hydro is excellent through 256K cores

Scalability of uniform grid gravity is very good through 64K cores

Scalable AMR gravity capability is around the corner

Next steps include the following

Analyse and optimize uniform grid gravity

Test and optimize AMR gravity

More rigorous and realistic scaling problems

Exercise Charm++ load balancing algorithms

http://cello-project.org/ NSF SI2-SSE-1440709

James Bordner Enzo-P / Cello Pursuing Petascale Astrophysics BW Symposium 2017