Best Practices: Large Scale Multiphysics
TRANSCRIPT
Best Practices: Large Scale Multiphysics
Frank Ham & Sanjeeb Bose, Cascade Technologies
L. Shunn, G. Bres, D. Philips, M. Emory, A. Saghafian, D. Kim, B. Hejazi, P. Quang [Cascade]
Acknowledging financial, technical, or computational support from: NASA, NAVAIR, Dept. of Energy Office of Science, Oak Ridge National Lab, Argonne National Lab, ERDC (DoD), GE
Motivation: a multiphysics problem. Gas Turbine Self-Excited Dynamics (SED)
• In 2014, GE observed combustion dynamics during a full-scale test of a gas turbine
• The particular frequency had not been predicted by lower-fidelity models, and had not been observed in single-can combustor testing
• Amplitudes were not a concern, and could be managed with fueling adjustments, but important questions remained:
  • What caused the observed dynamics?
  • Would it manifest in a new engine configuration with new technology scheduled for testing in late 2015?
2015 Press Release: Super Collaboration. GE and Cascade Apply New Technologies and Super-Computer Capabilities to Improve Gas Turbine Combustion
“The enhanced simulation and visualization capabilities enabled by our collaboration with Cascade can help us deliver even higher efficiency and lower emissions in the next generation of gas turbines,” John Lammas, Vice President, Power Generation Engineering at GE Power and Water
Source: GE Press Release June 22, 2015
The timeline: Simulating Gas Turbine Self-Excited Dynamics (SED)
• 2014: full-speed, full-load test uncovers unexpected instability
• 2015: test planned for new engine configuration including new technology
• Joint Development Agreement announced between GE and Cascade to develop simulation to improve gas turbine combustion
• Pre-test prediction delivered one month prior to the scheduled full-speed, full-load testing
• In between: lots of hard work
HPC Partnerships: critical for success stories (and the likely model for early industrial use of exascale)
• Owner of the problem/challenge (GE): business urgency (read: $), deep domain-specific knowledge, compelling energy and US competitive arguments
• Owner of the hardware, Titan (Oak Ridge National Lab): aligned with the energy and US competitive argument
• Owner of the software (Cascade): HPC software/multiphysics/numerics development team able to move fast and touch all parts of the simulation workflow
The full story: https://www.olcf.ornl.gov/2016/05/31/better-combustion-for-power-generation/
Revolutionary Computational Aerosciences: 5 revolutions required…
• Efficient deployment on next-generation computer hardware and algorithmic innovation
• Complex geometries: mesh generation and adaptation remain a bottleneck
• Wall modeling, and the current inability to predict separated turbulent flows
• The need for more than a single prediction: sensitivity analysis, design optimization, UQ, and the consequent data management issues
• Multi-physics modeling: push towards constitutive models that reflect the underlying physics more closely than ever before
Starting point: Cascade’s CharLES solver (2015)
• Finite volume method
• Explicit RK3 time advancement
• Compressible reacting flow
• Flamelet-based tabulation of the EOS
• Arbitrary unstructured grids with ParMETIS-based domain decomposition
• Significant verification and validation
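As a point of reference, explicit RK3 time advancement of a semi-discrete system du/dt = R(u) can be sketched as below; this assumes the standard three-stage SSP (Shu-Osher) coefficients, which may differ from the exact scheme used in CharLES.

```python
import numpy as np

def ssp_rk3_step(u, residual, dt):
    """Advance u by one step of three-stage SSP Runge-Kutta (Shu-Osher form).

    u        : state vector (numpy array)
    residual : callable returning du/dt, e.g. the discrete flux divergence
    dt       : time step
    Illustrative only; the actual CharLES RK3 coefficients are not given on the slides.
    """
    u1 = u + dt * residual(u)
    u2 = 0.75 * u + 0.25 * (u1 + dt * residual(u1))
    return u / 3.0 + (2.0 / 3.0) * (u2 + dt * residual(u2))

# Example: simple decay model problem du/dt = -u
u = np.ones(8)
for _ in range(10):
    u = ssp_rk3_step(u, lambda v: -v, dt=0.1)
```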
Pressure and temperature field of a screeching supersonic jet. Image from “Stanford researchers break million-core supercomputing barrier”.
Can we do grid generation on the HPC resource?
• Generating grids is still a bottleneck for most solvers working with geometries of engineering relevance
• Grids are not going away for turbulent flows: problems are non-smooth and require energy estimates to ensure stability
• Grid generation/optimization should ultimately be part of the simulation process (to minimize grid-related error that is only known during the simulation)
• What about Voronoi diagrams?
Clipped Voronoi Diagrams
• Key technology: robust direct generation of the clipped Voronoi diagram inside an arbitrary hull
• Fully and uniquely defines the mesh from a set of points (connectivity, volumes, normals)
• To modify the mesh, move, add, or remove points
  • mesh changes are continuous and differentiable with point motion
• The Voronoi diagram has compact support, and can thus be essentially “rendered” from the given point set
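To make the “mesh is fully defined by the points” idea concrete, here is a minimal 2-D sketch that builds a Voronoi diagram from a point set with SciPy and clips each finite cell against a hull polygon using Shapely; the hull, point set, and library choice are illustrative assumptions, not Cascade’s implementation.

```python
import numpy as np
from scipy.spatial import Voronoi
from shapely.geometry import MultiPoint, Polygon

# Illustrative hull and generating points (assumptions, not the real geometry)
hull = Polygon([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)])
rng = np.random.default_rng(0)
points = rng.random((50, 2))

vor = Voronoi(points)

cells = []
for p_idx, r_idx in enumerate(vor.point_region):
    region = vor.regions[r_idx]
    if -1 in region or len(region) == 0:
        continue  # unbounded cell: a production code must clip these robustly too
    # Voronoi cells are convex, so the convex hull of the cell vertices recovers the cell
    cell = MultiPoint(vor.vertices[region]).convex_hull
    clipped = cell.intersection(hull)  # clip the cell against the hull
    if not clipped.is_empty:
        # connectivity, face areas, and volumes all follow from the points alone
        cells.append((p_idx, clipped.area, clipped.centroid.coords[0]))

print(f"{len(cells)} clipped Voronoi cells inside the hull")
```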
Voronoi generating points
• HCP = hexagonal close-packed lattice
  • Leads to centroidal Voronoi meshes of self-similar truncated octahedra
  • User specifies regions/windows and desired point density
• HCP mesh inserted using a fast scanning approach
  • Similar to pixel “rendering”
  • Parallel, scalable
Optimal sphere packing is the motivation for HCP-lattice based Voronoi diagrams
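A minimal sketch of generating HCP-lattice points inside a user-specified box at a requested spacing is given below; the box bounds and spacing parameter are stand-ins for the slide’s regions/windows and point density, and this is plain lattice construction, not the fast scanning/rendering algorithm mentioned above.

```python
import numpy as np

def hcp_points(xmin, xmax, ymin, ymax, zmin, zmax, a):
    """Hexagonal close-packed lattice points with nearest-neighbor spacing a
    inside an axis-aligned box (illustrative window/density specification)."""
    dy = a * np.sqrt(3.0) / 2.0   # row spacing within a layer
    dz = a * np.sqrt(2.0 / 3.0)   # spacing between layers
    pts = []
    k, z = 0, zmin
    while z <= zmax:
        # B layers sit over the triangle centers of A layers
        x_shift = a / 2.0 if (k % 2) else 0.0
        y_shift = a * np.sqrt(3.0) / 6.0 if (k % 2) else 0.0
        j, y = 0, ymin + y_shift
        while y <= ymax:
            x = xmin + x_shift + (a / 2.0 if (j % 2) else 0.0)
            while x <= xmax:
                pts.append((x, y, z))
                x += a
            y += dy
            j += 1
        z += dz
        k += 1
    return np.array(pts)

pts = hcp_points(0, 1, 0, 1, 0, 1, a=0.1)
print(pts.shape)
```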
Boundary Recovery using Lloyd Iteration
• Lloyd iteration is a robust and powerful mesh-smoothing capability for Voronoi diagrams that can be used to:
  • Improve the mesh quality either locally or globally
  • Introduce boundary alignment
Starting mesh: Voronoi sites and mesh centroids are very different
Iteration 1: regenerate the Voronoi diagram from the centroids of the previous mesh
Iteration 2: regenerate the Voronoi diagram from the centroids of the previous mesh
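A minimal 2-D sketch of the Lloyd iteration described above, reusing the SciPy/Shapely clipping approach from the earlier example; the square domain and fixed iteration count are assumptions for illustration.

```python
import numpy as np
from scipy.spatial import Voronoi
from shapely.geometry import MultiPoint, Polygon

domain = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])  # assumed hull
rng = np.random.default_rng(1)
sites = rng.random((100, 2))

def lloyd_step(sites, domain):
    """One Lloyd iteration: replace each site by the centroid of its clipped cell."""
    vor = Voronoi(sites)
    new_sites = sites.copy()
    for p_idx, r_idx in enumerate(vor.point_region):
        region = vor.regions[r_idx]
        if -1 in region or len(region) == 0:
            continue  # keep sites of unbounded cells fixed in this simple sketch
        cell = MultiPoint(vor.vertices[region]).convex_hull.intersection(domain)
        if not cell.is_empty and cell.area > 0.0:
            new_sites[p_idx] = np.array(cell.centroid.coords[0])
    return new_sites

for _ in range(2):  # "regenerate Voronoi from centroids of previous mesh"
    sites = lloyd_step(sites, domain)
```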
Example of a Voronoi Mesh around an airfoil
Stretched Cartesian mesh point distributions near important boundaries
Uniform point distributions in the far-field: Cartesian or hexagonal close-packed (HCP)
Interface volumes and connectivity uniquely determined by the Voronoi diagram: fully conformal
CPU-side solver optimizations: 1/2
• Restructuring data layout for increased cache efficiency
• Aggressive overlapping of computation/communication (latency hiding)
• Vectorizing instructions, aggressive inlining
• Improved partitioning algorithm (native, no longer ParMETIS)
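To illustrate the computation/communication overlap, here is a minimal mpi4py sketch that posts non-blocking halo exchanges, updates interior cells while messages are in flight, and only then touches cells that need ghost data; the array sizes, ring of neighbors, and interior/boundary split are illustrative assumptions, not the solver’s actual communication pattern.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local, n_ghost = 10000, 256                 # assumed local sizes
u = np.random.rand(n_local)
send_buf = np.ascontiguousarray(u[-n_ghost:])  # assumed halo layer to send
recv_buf = np.empty(n_ghost)

neighbor = (rank + 1) % size                   # assumed 1-D ring of neighbors
prev_neighbor = (rank - 1) % size

# 1. Post the non-blocking halo exchange
reqs = [comm.Irecv(recv_buf, source=prev_neighbor, tag=0),
        comm.Isend(send_buf, dest=neighbor, tag=0)]

# 2. Do all work that needs only local data while messages are in flight
interior_result = np.sum(u[:-n_ghost] ** 2)    # stand-in for the interior flux loop

# 3. Wait for the halo, then finish the faces that need ghost data
MPI.Request.Waitall(reqs)
boundary_result = np.sum(u[-n_ghost:] * recv_buf)  # stand-in for boundary fluxes
```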
CPU-side solver optimizations: 2/2
• LLVM bytecode analysis: load operations in the internal flux calculation [Dec 2015]
• Reordering instructions, algebraic factorizations, and storing intermediate variables in the internal flux routines to reduce repeated load instructions
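Schematically, the “store intermediates to avoid repeated loads” idea looks like the rewrite below (shown in Python only for brevity; the production code is compiled flux kernels, and the variable names here are made up).

```python
# Before: rho[icv0], rho[icv1], u[icv0], u[icv1], and the face area are loaded repeatedly.
def face_flux_naive(rho, u, fa_area, icv0, icv1):
    return (0.5 * (rho[icv0] * u[icv0] + rho[icv1] * u[icv1]) * fa_area[0],
            0.5 * (rho[icv0] * u[icv0] ** 2 + rho[icv1] * u[icv1] ** 2) * fa_area[0])

# After: load each operand once, factor out shared products,
# and reuse the stored intermediates in both flux components.
def face_flux_factored(rho, u, fa_area, icv0, icv1):
    r0, r1 = rho[icv0], rho[icv1]
    u0, u1 = u[icv0], u[icv1]
    area_half = 0.5 * fa_area[0]
    m0, m1 = r0 * u0, r1 * u1          # mass fluxes reused below
    return ((m0 + m1) * area_half,
            (m0 * u0 + m1 * u1) * area_half)
```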
Execution time breakdown: CPU-optimized compressible premixed solver
• The most time-consuming operations are associated with tabulated lookups (cubic interpolation on a Cartesian grid): move execution of these kernels to the GPU (CUDA for Titan’s K20X Keplers)
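For reference, the structure of such a tabulated lookup, interpolation of a precomputed flamelet-style table on a Cartesian grid, can be sketched on the CPU with SciPy as below; the table contents and coordinate names are invented, cubic-spline interpolation via map_coordinates stands in for the solver’s cubic interpolation, and the slide’s point is that this batched kernel is what gets ported to CUDA.

```python
import numpy as np
from scipy.ndimage import map_coordinates

# Invented 3-D table: some property tabulated on a uniform Cartesian grid
# over (mixture fraction Z, progress variable C, enthalpy defect H).
nz, nc, nh = 64, 64, 32
table = np.random.rand(nz, nc, nh)   # placeholder for a real flamelet table

def lookup(z, c, h):
    """Cubic-spline interpolation of the table at normalized coordinates in [0, 1]."""
    coords = np.vstack([z * (nz - 1), c * (nc - 1), h * (nh - 1)])
    return map_coordinates(table, coords, order=3, mode='nearest')

# One lookup per cell; in the production solver this batch is the GPU kernel.
ncells = 100000
z = np.random.rand(ncells); c = np.random.rand(ncells); h = np.random.rand(ncells)
props = lookup(z, c, h)
```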
Speedup from CPU-side optimizations, including flux simplifications associated with Voronoi diagrams
• Normalized timing: core-seconds per million cells per step (core-sec/M cells/step), at 2 levels of strong scaling
• Approximately a 5X speedup for the CPU-only execution
• Slightly better strong scaling due to latency hiding
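The normalized cost metric used on these slides, core-seconds per million cells per step, is just the following arithmetic; the sample numbers are placeholders, not measured values.

```python
def core_sec_per_mcell_step(n_cores, wall_seconds, n_cells, n_steps):
    """Core-seconds per million cells per time step: lower is faster, and it stays
    roughly constant across core counts when strong scaling is ideal."""
    return (n_cores * wall_seconds) / (n_cells / 1.0e6) / n_steps

# Placeholder example: 4096 cores, 500 s of wall time, 200 M cells, 1000 steps
print(core_sec_per_mcell_step(4096, 500.0, 200e6, 1000))
```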
Execution time breakdown: CPU-optimized + GPU port of tabulation
Speedup from CPU+GPU optimizations: GPU chemistry tabulation
• Normalized timing: core-seconds per million cells per step (core-sec/M cells/step), at 2 levels of strong scaling
• Approximately a 9X speedup for the CPU+GPU execution
Great: simulations are running fast. Now how do we look at them and learn from them?
• Our ability to produce data is outgrowing our ability to move data by roughly 5x every 7 years
[Figure: history of supercomputer performance (slope: 100x / 7 years) vs. history of network bandwidth (slope: 20x / 7 years)]
Solution: images + metadata
Simulation result file size: 20-200+ GB; image size: ~1 MB (4-5 orders of magnitude smaller)
• Instrument solvers to directly write images + metadata
• Images become Cartesian arrays of probes
• Dimensional reduction + lossless compression at source
• Intrinsically valuable
Leveraging the PNG standard: images + metadata as Cartesian arrays of probes
Contours of streamwise velocity, u/c0; Re_j = 418,000, St_samp = 2, data size = 50 MB
Treat each pixel in the image as a data point (excluding geometry), analogous to PIV analysis
Supported by a NAVAIR STTR project with FSU, under the supervision of Dr. John T. Spyropoulos. Computer time provided by DoD HPC facilities at the ERDC DSRC.
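A minimal sketch of the “image + metadata” idea using the PNG standard’s text chunks, here with Pillow and NumPy; the field, quantization, and metadata keys are invented stand-ins for whatever the instrumented solver actually records (8-bit grayscale is used only for brevity, a higher bit depth reduces quantization error).

```python
import numpy as np
from PIL import Image
from PIL.PngImagePlugin import PngInfo

# Invented 2-D field sampled on a Cartesian array of probes (one value per pixel)
field = np.random.rand(480, 640)          # e.g. streamwise velocity u/c0
lo, hi = float(field.min()), float(field.max())

# Quantize to 8-bit grayscale; PNG then provides lossless compression at the source
pixels = np.round(255 * (field - lo) / (hi - lo)).astype(np.uint8)

meta = PngInfo()
meta.add_text("var", "u/c0")              # metadata needed to recover physical values
meta.add_text("range_min", repr(lo))
meta.add_text("range_max", repr(hi))
meta.add_text("solver_step", "12345")

Image.fromarray(pixels, mode="L").save("snapshot.png", pnginfo=meta)

# Reading it back: each pixel is again a quantitative probe
img = Image.open("snapshot.png")
lo2, hi2 = float(img.text["range_min"]), float(img.text["range_max"])
recovered = lo2 + (hi2 - lo2) * np.asarray(img, dtype=np.float64) / 255.0
```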
Quantitative data analysis from images: dynamic mode decomposition
• Identify salient flow structures/frequencies directly from images (similar to Schmid 2011 and Schmid et al. 2012, applied to experimental data)
[Figure: DMD amplitude spectrum, relative amplitude vs. Strouhal number St from 0.001 to 1, with identified modes labeled “separation” and “near wall/shear layer”]
[Figure: second DMD amplitude spectrum, with identified modes labeled “separation” and “exit boundary layer/shear layer”]
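A minimal sketch of standard (exact) DMD applied to a stack of image snapshots, each image flattened into one column of the data matrix; the snapshot source, truncation rank, and sampling interval are illustrative assumptions, not the analysis behind the spectra above.

```python
import numpy as np

def dmd(snapshots, dt, rank=20):
    """Standard DMD of a snapshot matrix whose columns are flattened images.

    Returns modes, continuous-time eigenvalues, and relative amplitudes,
    so peaks can be plotted against frequency (or Strouhal number)."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    r = min(rank, len(s))
    U, s, Vh = U[:, :r], s[:r], Vh[:r, :]
    A_tilde = U.conj().T @ Y @ Vh.conj().T / s       # low-rank linear operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = (Y @ Vh.conj().T / s) @ W                # exact DMD modes
    omegas = np.log(eigvals) / dt                    # continuous-time eigenvalues
    amps = np.abs(np.linalg.lstsq(modes, X[:, 0].astype(complex), rcond=None)[0])
    return modes, omegas, amps / amps.max()

# Invented example: 200 snapshots of a 64x64 image sequence
imgs = np.random.rand(200, 64, 64)
snap = imgs.reshape(200, -1).T                       # columns = flattened images
modes, omegas, amps = dmd(snap, dt=0.01)
freqs = omegas.imag / (2 * np.pi)                    # Hz; rescale to St as needed
```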
Summary
• Fast development + tight coupling with the internal design team AND the hardware team allowed Titan-based simulations to contribute understanding and mitigate new-technology risk for GE
• Solvers realized a speedup of 9X relative to the starting point over 2 quarters
• The 3-way partnership with the physics software developer in the loop was critical for this success
• The speedup has allowed GE to bring many more impactful simulations in-house onto internal HPC