engineering productivity: gpu-accelerated simulation & hpc

21
1 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential Improving Engineering Productivity with GPU-accelerated Simulation and HPC Ray Browell NVIDIA GPU Technology Conference March 19, 2013 Session - S3546

Upload: others

Post on 31-Dec-2021

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Engineering Productivity: GPU-Accelerated Simulation & HPC

1 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

Improving Engineering Productivity with GPU-accelerated Simulation and HPC

Ray Browell

NVIDIA GPU Technology Conference

March 19, 2013

Session - S3546

Page 2: Engineering Productivity: GPU-Accelerated Simulation & HPC

2 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

• ANSYS Inc. Overview

• ANSYS and NVIDIA Collaboration

• Mechanical GPU Accelerator Capabilities

• Fluent CFD GPU Accelerator Capabilities

• HPC Revolution

• Questions

Improving Engineering Productivity with GPU-accelerated Simulation and HPC

Page 3: Engineering Productivity: GPU-Accelerated Simulation & HPC

© 2012 ANSYS, Inc. March 18, 2013 3 ANSYS Confidential

Insert image here

“We’re relentlessly committed to your product development success.

We’re passionate about developing world-class engineering software

that addresses your current and future product development needs.”

ANSYS is dedicated exclusively to engineering simulation and is the world's

leading software provider. Product innovators in the most demanding

markets have trusted us for over 40 years.

Dipankar Choudhury Chief Technologist ANSYS

ANSYS, Inc. - Our Focus

Page 4: Engineering Productivity: GPU-Accelerated Simulation & HPC

© 2012 ANSYS, Inc. March 18, 2013 4 ANSYS Confidential

Insert image here

Insert image here

“Continual ANSYS research leads to advanced, more robust solutions

for even the most complex problems. End users have confidence in

analysis results, meaning they can rely less on costly physical testing.”

Florian Menter Research and Development Fellow ANSYS

Our technology enables you to predict with confidence that your products will

thrive in the real world. Customers trust our software to help ensure the

integrity of their products and drive business success through innovation.

ANSYS, Inc. - Our Technology

Page 5: Engineering Productivity: GPU-Accelerated Simulation & HPC

5 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

Release

ANSYS Mechanical ANSYS Fluent

13.0 Dec 2010

Shared Memory Solvers;

Single Node/ Single GPU

14.0 Dec 2011

+ Distributed ANSYS;

Multi-node - 1 GPU/node

Radiation Heat Transfer

(beta)

14.5 Nov 2012

+ Multi-GPU / node;

+ Hybrid PCG

+ GPU AMG Solver (beta),

Single GPU

15.0 2013

Ongoing Collaboration

Cuda 5, etc.

Pressure based coupled

solver (PBCS) (WIP), multi-

GPU distributed (WIP)

ANSYS and NVIDIA Collaboration

Page 6: Engineering Productivity: GPU-Accelerated Simulation & HPC

6 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

Mechanical GPU Accelerator Capability

• “Accelerate” Sparse direct equation solver (Shared & Distributed Memory)

– GPU is used to factor many dense “frontal” matrices

– Decision is made automatically on when to send data to GPU

• “Frontal matrix” too small, too much overhead, stays on CPU

• “Frontal matrix” too large, exceeds GPU memory, only partially accelerated

• “Accelerate” Preconditioned/Jacobian Conjugate Gradient iterative solvers (Shared & Distributed Memory)

– GPU is only used for sparse-matrix vector multiply (SpMV kernel)

– Decision is made automatically on when to send data to GPU

• Model too small, too much overhead, stays on CPU

• Model too large, exceeds GPU memory, only partially accelerated

Page 7: Engineering Productivity: GPU-Accelerated Simulation & HPC

7 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

Mechanical GPU Accelerator Capability

• Supported Hardware

– Currently support NVIDIA Tesla 20-series, Quadro 6000, Quadro K5000

– Next Generation NVIDIA Tesla Cards (Kepler) will be supported at 14.5.7 Release

– Installing a GPU requires the following:

• Larger power supply (single card needs ~250W)

• Open 2x form factor PCIe x16 2.0 (or 3.0) slot

• Supported Operating Systems

– Windows and Linux 64-bit platforms only

Page 8: Engineering Productivity: GPU-Accelerated Simulation & HPC

8 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

NVIDIA Tesla C2075

NVIDIA Tesla

M2090

NVIDIA Quadro

6000

NVIDIA Quadro K5000

NVIDIA Tesla K10

NVIDIA Tesla K20

Power (W) 225 250 225 122 250 225

Memory 6 GB 6 GB 6 GB 4 GB 8 GB 5 GB

Memory Bandwidth

(GB/s) 144 177.4 144 173 320 208

Peak Speed SP/DP

(GFlops) 1030/515 1331/665 1030/515 2290/95 4577/190 3520/1170

• Targeted Hardware Specifications

Mechanical GPU Accelerator Capability

Page 9: Engineering Productivity: GPU-Accelerated Simulation & HPC

9 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

Mechanical GPU Accelerator Capability

• Supports majority of ANSYS Mechanical users

– Covers both sparse direct and PCG/JCG iterative solvers

– Only a few minor limitations

• Ease of use

– Requires at least one supported GPU card to be installed

– Requires at least one HPC Pack license

– No rebuild, no additional installation steps

• Performance

– ~10-25% reduction in time to solution when using 8 CPU cores

– Should never slow down you simulation!

Page 10: Engineering Productivity: GPU-Accelerated Simulation & HPC

10 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

8 8

12

19

0

5

10

15

20

25

AN

SY

S M

echanic

al

Num

ber

of

Jobs

Per

Day

Results from HP Z820; 2 x Xeons

(16 Cores, use of only 8) 128GB

memory, Win7; 2 x Tesla C2075

V145sp-5 Model

Turbine geometry

2,100 K DOF

SOLID187 FEs

Static, nonlinear

One iteration

ANSYS Mechanical14.5

Direct sparse solver

Results for Distributed ANSYS 14.5 Preview and Xeon 8-Core CPUs

Higher is

Better

Xeon E5-2687W 8 Cores + Tesla C2075

1.6x

ANSYS Mechanical 14.5 Preview

Xeon E5-2687W 8 Cores + 2 x Tesla C2075

Page 11: Engineering Productivity: GPU-Accelerated Simulation & HPC

11 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

2.6x

3.8x

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

2 cores 8 cores 8 cores

Re

lati

ve S

pe

ed

up

GPU Performance

(no GPU) (no GPU)

• 6.5 million DOF • Linear static analysis • Sparse solver (DMP) • 2 Intel Xeon E5-2670 (2.6 GHz, 16 cores total), 128 GB RAM, SSD, 4 Tesla C2075, Win7

• GPUs can offer significantly faster time to solution

(1 GPU)

Mechanical GPU Accelerator Capability

1.5x

Page 12: Engineering Productivity: GPU-Accelerated Simulation & HPC

12 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

• GPUs can offer significantly faster time to solution

2.7x

5.2x

0.0

1.0

2.0

3.0

4.0

5.0

6.0

2 cores 8 cores 16 cores

Re

lati

ve S

pe

ed

up

GPU Performance

• 11.8 million DOF • Linear static analysis • PCG solver (DMP) • 2 Intel Xeon E5-2670 (2.6 GHz, 16 cores total), 128 GB RAM, SSD, 4 Tesla C2075, Win7

(no GPU) (1 GPU) (4 GPUs)

Mechanical GPU Accelerator Capability

Page 13: Engineering Productivity: GPU-Accelerated Simulation & HPC

13 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

Fluent CFD Radiation Modeling on GPUs VIEWFAC

• Utility to compute view factors

• Hybrid MPI-OpenMP-OpenCL parallel implementation

• Works on CPUs, GPUs or both

RAY TRACING

• Utility to compute view factors

• Uses Optix on NVIDIA C2070

Available as full features in 14.5

Page 14: Engineering Productivity: GPU-Accelerated Simulation & HPC

14 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

2832

933

517 517

0

1000

2000

3000 Dual Socket CPUDual Socket CPU + Tesla C2075

AN

SY

S F

luent

AM

G

Solv

er

Tim

e (

Sec)

2 x Xeon X5650, Only 1 Core Used

1.8x

5.5x

Lower is

Better

2 x Xeon X5650, All 12 Cores Used

Helix geometry

1.2M Hex cells

Unsteady, laminar

Coupled PBNS, DP

AMG F-cycle on CPU

AMG V-cycle on GPU

Helix Model

NOTE: All jobs solver

time only ~65% of total

time

Fluent CFD AMG Solver on GPUs Work-in-Progress

NVAMG Project – Preview of ANSYS Fluent 14.5 Performance

Page 15: Engineering Productivity: GPU-Accelerated Simulation & HPC

15 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

HPC Revolution

Recent advancements have revolutionized the computational speed available on the desktop

– Multi-core processors

• Every core is really an independent processor

– Large amounts of RAM

– Solid State Drives (SSDs)

– GPUs

Page 16: Engineering Productivity: GPU-Accelerated Simulation & HPC

16 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

HPC Revolution

The right combination of algorithms and hardware

leads to maximum efficiency

SMP vs. DMP

HDD vs. SSDs

Interconnects Clusters

GPUs

Page 17: Engineering Productivity: GPU-Accelerated Simulation & HPC

17 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

• Balanced system for overall optimum performance

1.0x 2.7x 5.2x

12.5x

0

5

10

15

20

25

30

2 cores 8 cores 8 cores +GPU

8 cores +GPU + SSD

Re

lati

ve S

pe

ed

up

Balanced Performance IO Bound

• 2.1 million DOF • Nonlinear static analysis • Direct sparse solver (DSPARSE) • 2 Intel Xeon E5-2670 (2.6 GHz, 16 cores total), 128 GB RAM, SSD, 1 Tesla K20c, Win7

Mechanical GPU Accelerator Capability

1.9x

Page 18: Engineering Productivity: GPU-Accelerated Simulation & HPC

18 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

• Balanced system for overall optimum performance

1.0x 2.7x 5.2x

12.5x

5.7x

12.0x

24.8x 27.3x

0

5

10

15

20

25

30

2 cores 8 cores 8 cores +GPU

8 cores +GPU + SSD

Re

lati

ve S

pe

ed

up

Balanced Performance

IO Bound

Compute Bound

• 2.1 million DOF • Nonlinear static analysis • Direct sparse solver (DSPARSE) • 2 Intel Xeon E5-2670 (2.6 GHz, 16 cores total), 128 GB RAM, SSD, 1 Tesla K20c, Win7

Mechanical GPU Accelerator Capability

Page 19: Engineering Productivity: GPU-Accelerated Simulation & HPC

19 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

How will you use all of this computing power?

Design Optimization Studies

Higher fidelity Full assemblies More nonlinear

HPC Revolution - Design Optimization

Page 20: Engineering Productivity: GPU-Accelerated Simulation & HPC

20 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

Improving Engineering Productivity with GPU-accelerated Simulation and HPC

• ANSYS, Inc. is a world leader in CAE

• ANSYS, Inc. is a world leader in GPU software deployment

• HPC Revolution

• Accelerates knowledge

• Allows better understanding

• Empowers Design Optimization

• ANSYS, Inc. and NVIDIA will continue to lead!

Page 21: Engineering Productivity: GPU-Accelerated Simulation & HPC

21 © 2013 ANSYS, Inc. March 18, 2013 ANSYS Confidential

Thank You!

Improving Engineering Productivity with HPC and GPU-Accelerated Simulation

Raymond Browell

724.514.3070

[email protected]