COMPUTE | STORE | ANALYZE CAE Applications for HPC Stephen Behling Cray Inc May, 2015 8/20/2015 1


CAE Applications for HPC Stephen Behling

Cray Inc May, 2015


Short bio

● Education in Nuclear Engineering
● Worked at U.S. National Laboratory in Idaho on reactor safety computer codes
● Joined Cray Research in 1986
  ● Vectors, micro-tasking, macro-tasking
  ● CAE applications
● IBM (1999 – 2008)
  ● CAE applications
● Now back at Cray Inc. in Performance Team
  ● CAE applications: PowerFLOW, PAMCrash, AcuSolve, ANSYS Mechanical
  ● Many other codes: SU3, GFS, NIM, …


CAE encompasses industries, national laboratories, and research

● Aerospace
  ● Commercial; military; space
● Automotive
  ● Commercial; sports
● Other transportation
  ● Trains; ocean transport
● Manufacturing
● Energy
  ● Fossil fuels
  ● Nuclear
● Hydrology; medical devices; architecture; insurance


CAE is growing rapidly

● Trends are:
  ● More accurate analyses
  ● Bigger models
    ● “1 billion cells”; “100 million elements”; “19 million degrees of freedom”
  ● Bigger computers
    ● More nodes; more cores; more memory; more parallel I/O
● Much CAE work uses software from third-party Independent Software Vendors (ISVs) for financial reasons
  ● Engineers (cost the most)
    ● Need to get answers quickly
  ● Software licenses (second most costly)
    ● Less costly than internal code development, maintenance, and support
  ● Computer hardware (third most costly)


HPC workload

[Figure: HPC workload breakdown for two industries, Automotive and Aerospace, each split into Dynamics, CFD, Structures, and Other]


Growth in CAE HPC usage

Automotive and aerospace companies saw a huge growth in CAE HPC power from 2000-2015, and CAE simulation is growing at an increasing rate in recent years.

Ref: Industrial High Performance Computing, Michael Taeschner, Volkswagen AG


• More car models
• More runs per model
• More accuracy


NEED BIGGER COMPUTERS!


ISV license pricing favors parallel computing

• Most ISVs have a pricing system that encourages running in parallel
• It is typically cheaper per simulation to use more cores
• Graph shows PAM-CRASH example with “very conservative” estimate for parallel performance (Ref. March 2015)

[Figure: PAM-CRASH performance vs. license cost; annotations: 7.5X performance, 2.1X license cost]
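The claim that more cores can be cheaper per simulation follows from the two figures quoted for the PAM-CRASH example. The sketch below is only a back-of-the-envelope check. It assumes that the license is paid per unit of time and that "7.5X performance" means the elapsed time per simulation drops by a factor of 7.5; neither assumption is stated on the slide, and the function name is made up for illustration.

```python
# Figures quoted on the slide for the PAM-CRASH example.
PERFORMANCE_GAIN = 7.5      # simulations finish 7.5X faster (assumed meaning)
LICENSE_COST_RATIO = 2.1    # the larger license costs 2.1X as much

def relative_license_cost_per_simulation(cost_ratio: float, perf_gain: float) -> float:
    """License cost per simulation relative to the baseline run.

    Assumes cost per simulation = license rate x elapsed time.
    """
    return cost_ratio / perf_gain

ratio = relative_license_cost_per_simulation(LICENSE_COST_RATIO, PERFORMANCE_GAIN)
print(f"License cost per simulation: {ratio:.2f} of baseline")
```

Under these assumptions each simulation carries roughly 28% of the baseline license cost, even though the license itself is more expensive.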


CAE and parallel computers

● Computational Fluid Dynamics (CFD)
  ● Most scalable of the CAE applications
  ● All codes are MPI parallel
  ● Some have threading
● Structural Dynamics
  ● Moderate scaling; contact as parts buckle is difficult
  ● All codes are MPI parallel
  ● Some have threading
● Structural NVH (Noise, Vibration, Harshness)
  ● Low scaling; large memory or large I/O requirements
  ● All codes are MPI parallel and may be threaded


CAE Workflow

1. Recognize problem to be solved
  ● Meet safety requirements?
  ● Reduce drag?
  ● Minimize weight/noise/cost?
  ● Maximize efficiency/reliability/profit/safety?
2. Represent system via a CAD model
  ● Multiple use: both for manufacturing and for various analyses
3. Translate CAD description into computational mesh
  ● Each discipline needs its own mesh
4. Decompose mesh into computational domains
  ● First pick number of computational nodes/cores and then run decomposition tool
5. Solve
  ● Main computational task in CAE
6. Analyze
  ● Graphics; statistics


CAE Workflow requirements

● CAD
  ● Days to weeks
● Translate CAD to mesh
  ● Hours to a day
  ● Usually single processor
● Domain decomposition
  ● Minutes to hours; can be single workstation
  ● May use parallel processing and may be part of solve step
  ● Examples: METIS (serial), ParMETIS (parallel)
● Solve
  ● Hours to days to weeks to … unsolvable
  ● Engineers benefit most from getting one or more results per day
  ● Need a supercomputer for this
● Analyze
  ● Days or more
  ● Need powerful graphics
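To make the domain decomposition step concrete, here is a minimal sketch of recursive coordinate bisection on 2D cell centroids. It is a deliberately simple stand-in and not the multilevel graph partitioning that tools such as METIS use; the function name and the toy 8 x 4 grid are invented for illustration. It does follow the order described in the workflow: pick the number of parts first, then decompose.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def bisect_cells(centroids: List[Point], cell_ids: List[int], n_parts: int) -> List[List[int]]:
    """Split cell_ids into n_parts groups of near-equal size.

    Recursive coordinate bisection: sort along the longer axis of the
    current group and cut it in proportion to the parts on each side.
    """
    if n_parts > len(cell_ids):
        raise ValueError("more parts requested than cells available")
    if n_parts == 1:
        return [cell_ids]
    xs = [centroids[i][0] for i in cell_ids]
    ys = [centroids[i][1] for i in cell_ids]
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(cell_ids, key=lambda i: centroids[i][axis])
    left_parts = n_parts // 2
    cut = len(ordered) * left_parts // n_parts
    return (bisect_cells(centroids, ordered[:cut], left_parts)
            + bisect_cells(centroids, ordered[cut:], n_parts - left_parts))

# Toy mesh: an 8 x 4 grid of unit cells, decomposed for 4 cores.
centroids = [(x + 0.5, y + 0.5) for x in range(8) for y in range(4)]
parts = bisect_cells(centroids, list(range(len(centroids))), 4)
sizes = [len(p) for p in parts]
imbalance = max(sizes) / (sum(sizes) / len(sizes))  # 1.0 means perfectly balanced
print(sizes, imbalance)
```

Production partitioners also try to minimize the number of mesh edges cut between domains, since those determine the communication volume during the solve step; this sketch balances cell counts only.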


CAE simulation characteristics for solve step

• Computational Fluid Dynamics (CFD)
  • Often highly scalable (16000+ cores)
  • I/O requirements low to moderate for typical analyses; big data for LES
  • Seldom use math libraries; HDF5
  • Typical runs: 100 – 1000 cores
• Dynamics: Impact Simulation; Crash/Safety Simulation
  • Can be moderately scalable (2000+ cores)
  • Low I/O requirements
  • Seldom use math libraries; HDF5
  • Typical runs: 20 – 200 cores
• Structures and NVH
  • Low scaling (200+ cores)
  • Large memory; good I/O; often have GPU option
  • BLAS2, BLAS3
  • Typical runs: 1 to 10% of the HPC environment
• Other/Multi-Physics
  • Fluid-Structure interaction
  • Ships and waves; Blood flow; Oil pipe riser (sub-sea well to shore)


Scaling is affected by load imbalance and network communication

[Figure: Speedup vs. number of cores, both axes 0 to 16000; curves: Excellent, Good, Not so good, Ideal speedup]

It can be a lot of work to move from “good” to “excellent” scaling.
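The effect of load imbalance and communication on speedup can be illustrated with a toy model. The sketch below assumes a bulk-synchronous time step, in which every rank waits for the slowest one and communication does not overlap with computation; real codes are more complicated, and the work and communication numbers are hypothetical.

```python
from typing import Sequence

def step_speedup(work_per_rank: Sequence[float], comm_time: float) -> float:
    """Speedup of one synchronous step over running all the work serially.

    Parallel time = slowest rank's compute time + communication time.
    """
    serial_time = sum(work_per_rank)
    parallel_time = max(work_per_rank) + comm_time
    return serial_time / parallel_time

# Hypothetical 4-rank cases, each with 100 units of total work.
balanced = step_speedup([25, 25, 25, 25], comm_time=0.0)    # ideal: 4.0
imbalanced = step_speedup([40, 20, 20, 20], comm_time=0.0)  # one overloaded rank
with_comm = step_speedup([25, 25, 25, 25], comm_time=5.0)   # perfect balance plus communication
print(balanced, imbalanced, with_comm)
```

In this model a single overloaded rank costs more speedup than a modest communication overhead, which is one reason decomposition quality matters.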


Elapsed time (on log-log plot) is another way to look at scaling

[Figure: Elapsed time vs. number of cores on log-log axes, 32 to 16384 cores; curves: Actual, Ideal]
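On a log-log plot, ideal scaling appears as a straight line of slope -1, because doubling the core count halves the elapsed time. The short sketch below computes that slope between two measurements. The timings are hypothetical values chosen only to show an ideal and a sub-ideal case.

```python
import math

def loglog_slope(cores_a: int, time_a: float, cores_b: int, time_b: float) -> float:
    """Slope between two (cores, elapsed time) points on log-log axes."""
    return math.log2(time_b / time_a) / math.log2(cores_b / cores_a)

# Hypothetical timings: doubling from 32 to 64 cores.
ideal = loglog_slope(32, 1024.0, 64, 512.0)    # time halves, so the slope is -1
weaker = loglog_slope(32, 1024.0, 64, 640.0)   # time drops less than half
print(ideal, round(weaker, 3))
```

A slope between -1 and 0 means the run still gets faster with more cores but with diminishing returns; a slope near 0 means added cores no longer help.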


CFD ISV examples: all are unstructured grids

● OpenFOAM by OpenCFD Ltd at ESI Group
  ● Open source under GNU General Public License
  ● Finite volume discretization for typical CFD; MPI parallelization
● ANSYS Fluent by ANSYS Inc.
  ● Finite volume discretization; MPI parallelization
● STAR-CCM+ by CD-adapco Inc.
  ● Finite volume discretization; MPI parallelization
● AcuSolve by Altair
  ● Finite element; hybrid MPI and OpenMP parallelization
● PowerFLOW by Exa Corporation
  ● Lattice Boltzmann; MPI parallelization with some threading
● HiFUN by SandI
  ● MPI parallelization


EXA/PowerFLOW scaling on the Cray XC40

[Figure: Elapsed time for 512 time steps (s.) vs. number of SP tasks (cores) on log-log axes, 32 to 16384 cores; curves: PowerFLOW scaling, Ideal scaling]

● PowerFLOW version 5.1a on Cray XC40
● 2.3 GHz 16-core Intel® Haswell processors
● PowerFLOW CFD simulation scaling to over 16,000 cores
● Lattice Boltzmann code
● 88 million voxels
● “large-performance-test”


Cray and ANSYS/Fluent work together to add value

On-going development effort to improve HPC scaling in Fluent

• Segregated implicit solver
  • Scalable at ~10K cells per core!
• Pressure based coupled solver
  • Scalable at ~10K cells per core!

[Figure: two panels of Performance Rating vs. Number of Cores: Truck_111M Turbulent Flow (0 to 12288 cores) and DLR_96M LES Combustion (0 to 14336 cores); legend entries: Release13, Release14, Release15, R15.0, Ideal]

Rating is jobs per day. A higher rating means faster performance.
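The "~10K cells per core" guideline and the jobs-per-day rating both reduce to simple arithmetic. The sketch below assumes that the benchmark names Truck_111M and DLR_96M denote meshes of 111 million and 96 million cells, and that "jobs per day" means 86,400 seconds divided by the elapsed time of one job; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400
CELLS_PER_CORE = 10_000   # "~10K cells per core" from the slide

def cores_for_mesh(n_cells: int, cells_per_core: int = CELLS_PER_CORE) -> int:
    """Core count at which each core holds about cells_per_core cells."""
    return n_cells // cells_per_core

def jobs_per_day(elapsed_seconds: float) -> float:
    """Performance rating: how many such jobs fit into 24 hours."""
    return SECONDS_PER_DAY / elapsed_seconds

truck_cores = cores_for_mesh(111_000_000)   # assumed mesh size for Truck_111M
dlr_cores = cores_for_mesh(96_000_000)      # assumed mesh size for DLR_96M
example_rating = jobs_per_day(100.0)        # hypothetical 100-second job
print(truck_cores, dlr_cores, example_rating)
```

Under these assumptions the guideline puts the two cases at roughly 11,100 and 9,600 cores, which sits within the core ranges plotted for each benchmark.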


1 billion element AcuSolve Formula 1 external flow simulation

● Date = Fri Nov 8 15:49:22 2013
● Problem = F1
● Title = AcuSolve Problem
● Platform = Linux 3.0.80-0.5.1_1.0501.7664-cray_ari_c x86_64
● Machine = linux64
● No. of threads = 24
● No. of nodes = 169984316
● No. of elements = 1007704126

[Figure: Speed up vs. number of cores, both axes 0 to 3000; curves: Speed up, Linear]


CFD Results from NCSA “Blue Waters” system

ALYA CFD code: 3 Real-World Cases

● Human Respiratory System: transient incompressible turbulent flow; 360M elements, scaled to 25,000 cores
● Kiln Furnace: transient incompressible turbulent flow coupled with energy and combustion; 4.22 billion elements, scaled to 100,000 cores
● Human Heart: non-linear solid mechanics coupled with electrical propagation; 3.4 billion elements, scaled to 100,000 cores

Ref: “Growth of HPC Industrial Partnership”, Merle Giles, NCSA, Oct. 2014


Impact/Crash Simulation: Dynamic Structural Analysis

Examples:

• LS-DYNA by LSTC

• RADIOSS by Altair

• PAMCrash by ESI Group

• Abaqus explicit by Dassault Systèmes


2014 IDC award for scaling LS-DYNA

Rolls-Royce, Procter and Gamble, National Center for Supercomputing Applications, Cray Inc., Livermore Software Technology Corporation (U.S.).

Researchers from NCSA, Rolls-Royce, Procter and Gamble, Cray Inc., and Livermore Software Technology Corporation were able to scale the commercial explicit finite element code, LS-DYNA, to 15,000 cores…


Ford: 100M Element Model of “B pillar”

● Three papers at the 2014 LS-DYNA conference using 100M element model
  ● “LS-DYNA performance in Side Impact Simulations with 100M element Models”, El Fadl, B., Ford Motor Company
    ● 2048 cores: 2.5 days
    ● 1024 cores: 4.5 days
  ● “Meso-Scale FEA Modeling to Simulate Crack Initiation and Propagation in Boron Steel”, Chen, Y., Ford Motor Company
  ● “Fracture Prediction and Correlation of AlSi Hot Stamped Steels with Different Models in LS-DYNA”, Zhu, H.
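The two timings quoted for the 100M element side-impact model give a quick scaling check. The sketch below uses only the slide's numbers (4.5 days on 1024 cores, 2.5 days on 2048 cores); the core-days figure is plain arithmetic on those values and not a reported cost.

```python
# Timings quoted for the 100M element model.
cores_small, days_small = 1024, 4.5
cores_large, days_large = 2048, 2.5

speedup = days_small / days_large                   # gain from doubling the cores
efficiency = speedup / (cores_large / cores_small)  # fraction of the ideal 2X

core_days_small = cores_small * days_small          # compute consumed by each run
core_days_large = cores_large * days_large

print(f"speedup {speedup:.2f}, efficiency {efficiency:.0%}")
print(f"core-days: {core_days_small:.0f} vs {core_days_large:.0f}")
```

Doubling the cores gives a 1.8X speedup, which is 90% parallel efficiency, at the price of about 11% more core-days per run.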


Example using RADIOSS

Results for Taurus A05 Refined, 10 million elements, RADIOSS 13.0
Crystal XC40; Haswell, 32 cores, 2.3 GHz

[Figure: ELAPSED TIME (log axis, 100 to 10000) vs. number of nodes (1 to 512); plot annotation: 16000 cores]

Number of nodes    MPI x OpenMP         Elapsed time
1                  16 MPI x 2 OMP       8072.29
2                  16 MPI x 4 OMP       4509.92
4                  32 MPI x 4 OMP       2686.82
8                  32 MPI x 8 OMP       1747.8
16                 64 MPI x 8 OMP       1242.66
32                 512 MPI x 2 OMP      844.09
64                 512 MPI x 4 OMP      628.5
128                512 MPI x 8 OMP      458.23
256                512 MPI x 16 OMP     356.68
512                512 MPI x 32 OMP     349.56
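The RADIOSS timings can be turned into speedup and parallel efficiency relative to the single-node run. The sketch below assumes that the ten elapsed times on the slide map, in order, to the ten node counts 1 through 512. The time unit is not stated on the slide, and it cancels out of both ratios.

```python
# Elapsed times from the slide, keyed by node count (in-order mapping assumed).
elapsed_by_nodes = {
    1: 8072.29, 2: 4509.92, 4: 2686.82, 8: 1747.8, 16: 1242.66,
    32: 844.09, 64: 628.5, 128: 458.23, 256: 356.68, 512: 349.56,
}

def scaling_table(elapsed):
    """Return (nodes, speedup, efficiency) rows relative to the smallest run."""
    base_nodes = min(elapsed)
    base_time = elapsed[base_nodes]
    rows = []
    for nodes in sorted(elapsed):
        speedup = base_time / elapsed[nodes]
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, speedup, efficiency))
    return rows

rows = scaling_table(elapsed_by_nodes)
for nodes, speedup, efficiency in rows:
    print(f"{nodes:4d} nodes  speedup {speedup:6.2f}  efficiency {efficiency:6.1%}")
```

Read this way, the last doubling from 256 to 512 nodes lowers the elapsed time only from 356.68 to 349.56, so a 10 million element model has essentially stopped scaling at that point.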


Noise, Vibration & Harshness (NVH): Implicit Structural Analysis

[Image courtesy of MSC Software]

Examples:

• MSC-Nastran by MSC Software

• Abaqus Implicit by Dassault Systèmes

• ANSYS Mechanical by ANSYS Inc.


MSC Nastran NVH performance on Cray CS400

Large Structure Nodes
● Two Xeon E5-2667-v3 (Haswell, 8 core, 3.2 GHz)
● 768 GB RAM – Twenty four (24) 32GB DIMMs
● 4 x 1.6TB PCIe SSDs (Striped)

Implicit, structural eigenvalue solutions require a balance of processor speed, memory and IO performance

Recent MSC Nastran benchmarks posted: http://web.mscsoftware.com/support/prod_support/nastran/performance/msc20140.cfm

• Largest NVH model size increased to 19 million DOF
• This NVH model is 5X the size of the largest version 2013 example
• Cray CS400 “NVH configuration” is 1.6X faster than the best version 2013 results


Summary


Scalability of select ISV applications in CLE

ISV Application    Primary segment          Demonstrated scalability *
ANSYS Fluent       Commercial CFD           >36,000 cores
LS-DYNA            Impact/crash analysis    >15,000 cores
CFD++              Aerospace CFD            >10,000 cores
STAR-CCM+          Commercial CFD           >100,000 cores
PowerFLOW          External CFD             >16,000 cores
AcuSolve           Commercial CFD           >6,000 cores
Abaqus/standard    Structural analysis      >300 cores

* Demonstrated scalability typically limited by the simulation model available


CAE is growing rapidly

● Trends are:
  ● More accurate analyses
  ● Bigger models
    ● “1 billion cells”; “100 million elements”; “19 million degrees of freedom”
  ● Bigger computers
    ● More nodes; more cores; more memory; more parallel I/O
● Cray Inc is proud to be a key vendor in this discipline


Questions
