Abaqus Performance Benchmark and Profiling December 2009



Note

• The following research was performed under the HPC Advisory Council activities

– Participating vendors: Intel, SIMULIA, Dell, Mellanox

– Compute resource - HPC Advisory Council Cluster Center

• The participating members would like to thank SIMULIA for their support and guidelines

• For more info please refer to

– www.mellanox.com, www.dell.com/hpc, www.intel.com, http://www.simulia.com


SIMULIA Abaqus

• ABAQUS offers a suite of engineering design analysis software products, including tools for:

– Nonlinear finite element analysis (FEA)

– Advanced linear and dynamics application problems

• ABAQUS/Standard provides general-purpose FEA that includes a broad range of analysis capabilities

• ABAQUS/Explicit provides nonlinear, transient, dynamic analysis of solids and structures using explicit time integration


Objectives

• The presented research was done to provide best practices

– Abaqus performance benchmarking

– Interconnect performance comparisons

– Understanding Abaqus communication patterns

– Power-efficient simulations

• The presented results will demonstrate

– The scalability of the compute environment to provide good application scalability

– Considerations for power saving through balanced system configuration


Test Cluster Configuration

• Dell™ PowerEdge™ M610 24-node cluster

• Quad-Core Intel X5570 @ 2.93 GHz CPUs

• Intel Cluster Ready certified cluster

• Mellanox ConnectX MCQH29-XCC 4X QDR InfiniBand mezzanine card

• Mellanox M3601Q 32-Port Quad Data Rate (QDR-40Gb) InfiniBand Switch

• Memory: 24GB memory per node

• OS: RHEL5U3, OFED 1.4.1 InfiniBand SW stack

• MPI: HP-MPI 2.3

• Application: Abaqus 6.9 EF1

• Benchmark Workload

– Abaqus/Standard Server Benchmarks: S4B

– Abaqus/Explicit Server Benchmarks: E5


Mellanox InfiniBand Solutions

• Industry Standard

– Hardware, software, cabling, management

– Design for clustering and storage interconnect

• Performance

– 40Gb/s node-to-node

– 120Gb/s switch-to-switch

– 1us application latency

– Most aggressive roadmap in the industry

• Reliable with congestion management

• Efficient

– RDMA and Transport Offload

– Kernel bypass

– CPU focuses on application processing

• Scalable for Petascale computing & beyond

• End-to-end quality of service

• Virtualization acceleration

• I/O consolidation including storage

InfiniBand Delivers the Lowest Latency

[Figure: The InfiniBand Performance Gap is Increasing. Bandwidth roadmap comparing InfiniBand with Fibre Channel and Ethernet; labeled data rates: 20Gb/s, 40Gb/s, 60Gb/s, 80Gb/s (4X), 120Gb/s, 240Gb/s (12X)]


Delivering Intelligent Performance: Next Generation Intel® Microarchitecture

• Bandwidth Intensive

– Intel® QuickPath Technology

– Integrated Memory Controller

• Performance on Demand

– Intel® Turbo Boost Technology

– Intel® Intelligent Power Technology

• Threaded Applications

– 45nm quad-core Intel® Xeon® Processors

– Intel® Hyper-threading Technology

[Diagram labels: Intel® 5520 Chipset, PCI Express* 2.0, ICH 9/10, Intel® X25-E SSDs, Intel® Data Center Manager, Intel® Node Manager Technology]

Performance That Adapts to The Software Environment


Dell PowerEdge Servers helping Simplify IT

• System Structure and Sizing Guidelines

– 24-node cluster built with Dell PowerEdge™ M610 blade servers

– Servers optimized for High Performance Computing environments

– Building Block Foundations for best price/performance and performance/watt

• Dell HPC Solutions

– Scalable Architectures for High Performance and Productivity

– Dell's comprehensive HPC services help manage the lifecycle requirements

– Integrated, Tested and Validated Architectures

• Workload Modeling

– Optimized System Size, Configuration and Workloads

– Test-bed Benchmarks

– ISV Applications Characterization

– Best Practices & Usage Analysis


Abaqus/Standard Benchmark Results

• Input Dataset: S4B

– Cylinder head bolt-up

• InfiniBand provides higher utilization, performance and scalability

– Up to 51% higher performance versus GigE

[Figure: Abaqus/Standard Server Benchmark (S4B). Total run time (s), axis 0 to 5000, versus number of cores (8, 16, 32, 64) for GigE and InfiniBand QDR; 8 cores per node; lower is better; chart annotation: 51%]
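The slide reports "up to 51% higher performance versus GigE" but does not say how that figure is derived from the run times. A minimal sketch, assuming the common convention that performance is the inverse of run time; the run times used here are hypothetical placeholders, not values read from the chart.

```python
def performance_gain_percent(baseline_runtime_s: float, improved_runtime_s: float) -> float:
    """Percent performance gain of the improved run over the baseline.

    Assumes performance is the inverse of run time. This convention is an
    assumption; the slides do not state which formula was used.
    """
    if baseline_runtime_s <= 0 or improved_runtime_s <= 0:
        raise ValueError("run times must be positive")
    return (baseline_runtime_s / improved_runtime_s - 1.0) * 100.0


# Hypothetical run times in seconds (NOT taken from the benchmark chart).
gige_runtime_s = 1510.0
infiniband_runtime_s = 1000.0

gain = performance_gain_percent(gige_runtime_s, infiniband_runtime_s)
print(f"Performance gain over GigE in this made-up example: {gain:.1f}%")
```

Under this convention a 51% gain means the GigE run takes about 1.51 times as long as the InfiniBand run, which is not the same as a 51% reduction in run time.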


Abaqus/Explicit Benchmark Results

• Input Dataset: E5

– Blast loaded plate

• InfiniBand provides higher utilization, performance and scalability

– Up to 57% higher performance versus GigE

[Figure: Abaqus/Explicit Server Benchmark (E5). Total run time (s), axis 0 to 500, versus number of cores (8, 16, 32, 64) for GigE and InfiniBand QDR; 8 cores per node; lower is better; chart annotation: 57%]
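The scalability claim can be checked for any interconnect by converting run times at each core count into speedup and parallel efficiency. The sketch below assumes the smallest core count is the baseline; the function name and the run times are hypothetical and are not taken from the E5 chart.

```python
def scaling_table(runtimes_s: dict[int, float]) -> list[tuple[int, float, float]]:
    """Return (cores, speedup, efficiency) rows relative to the smallest core count.

    speedup    = baseline run time / run time at this core count
    efficiency = speedup / (cores / baseline cores); 1.0 means ideal scaling
    """
    base_cores = min(runtimes_s)
    base_time = runtimes_s[base_cores]
    rows = []
    for cores in sorted(runtimes_s):
        speedup = base_time / runtimes_s[cores]
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


# Hypothetical run times in seconds for the core counts used in the deck.
example_runtimes_s = {8: 400.0, 16: 220.0, 32: 125.0, 64: 80.0}

for cores, speedup, efficiency in scaling_table(example_runtimes_s):
    print(f"{cores:3d} cores: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Running the same table on the GigE and InfiniBand run times side by side shows at which core count the slower interconnect stops scaling.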


Power Cost Savings with Different Interconnect

• InfiniBand saves up to $6400 in power cost to finish the same number of Abaqus jobs compared to GigE

– Yearly basis for a 24-node cluster

• As cluster size increases, more power can be saved

• Power cost ($) = energy used (kWh) * $0.20 per kWh

– For more information: http://enterprise.amd.com/Downloads/svrpwrusecompletefinal.pdf

[Figure: Abaqus Power Cost, InfiniBand QDR vs. GigE (24 nodes). Power cost ($), axis 0 to 20000, for Abaqus/Standard (S4B) and Abaqus/Explicit (E5) with GigE and InfiniBand; chart annotations: $6400, $5750]
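The power-cost comparison rests on the $0.20 per kWh rate given on the slide: a faster interconnect finishes the same jobs in fewer cluster-hours, so less energy is billed. A rough sketch follows, assuming the cluster draws the same power with either interconnect; the power draw, job durations and job count are hypothetical, and the slides do not publish the inputs behind the $6400 figure.

```python
PRICE_PER_KWH_USD = 0.20  # rate stated on the slide


def energy_cost_usd(power_kw: float, hours: float,
                    price_per_kwh: float = PRICE_PER_KWH_USD) -> float:
    """Cost of running a load of `power_kw` kilowatts for `hours` hours."""
    return power_kw * hours * price_per_kwh


def yearly_savings_usd(cluster_power_kw: float, slow_job_hours: float,
                       fast_job_hours: float, jobs_per_year: int) -> float:
    """Power-cost difference for completing the same number of jobs per year.

    Assumes identical cluster power draw for both interconnects, which is a
    simplification made for this sketch.
    """
    slow_cost = energy_cost_usd(cluster_power_kw, slow_job_hours * jobs_per_year)
    fast_cost = energy_cost_usd(cluster_power_kw, fast_job_hours * jobs_per_year)
    return slow_cost - fast_cost


# Hypothetical inputs: 8 kW cluster, 1.5 h vs 1.0 h per job, 5000 jobs a year.
savings = yearly_savings_usd(8.0, 1.5, 1.0, 5000)
print(f"Yearly power-cost savings in this made-up example: ${savings:,.0f}")
```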


Abaqus Benchmark Results Summary

• Interconnect comparison shows

– InfiniBand delivers superior performance in every cluster size

– Performance advantage extends as cluster size increases

• InfiniBand enables power saving

– Up to $6400/year power savings versus GigE


Abaqus/Explicit MPI Profiling – MPI Functions

• Most frequently used MPI functions

– MPI_Test, MPI_Isend, MPI_Irecv, MPI_Waitall, and MPI_Allgather/Allgatherv

[Figure: Abaqus/Explicit MPI Profiling (E5). Number of messages (log scale, 1 to 1,000,000,000) per MPI function (MPI_Allgather, MPI_Allgatherv, MPI_Allreduce, MPI_Bcast, MPI_Irecv, MPI_Isend, MPI_Recv, MPI_Send, MPI_Test, MPI_Waitall) at 16, 32 and 64 cores]


Abaqus/Explicit MPI Profiling – Timing

• MPI_Allgatherv and MPI_Allgather show highest communication overhead

[Figure: Abaqus/Explicit MPI Profiling (E5). MPI run time (s) (log scale, 1 to 1,000,000) per MPI function (MPI_Allgather, MPI_Allgatherv, MPI_Allreduce, MPI_Bcast, MPI_Irecv, MPI_Isend, MPI_Recv, MPI_Send, MPI_Test, MPI_Waitall) at 16, 32 and 64 cores]


Abaqus/Explicit MPI Profiling – Messages

• The majority of messages are small or medium sized

• Number of messages increases with cluster size

[Figure: Abaqus/Explicit MPI Profiling (E5). Number of messages (log scale, 1 to 1,000,000,000) per message size range ([0..64B], [65B..256B], [257B..1024B], [1KB..4KB], [4KB..16KB], [16KB..64KB], [64KB..256KB]) at 16, 32 and 64 cores]


Abaqus/Explicit MPI Profiling – Messages

• Most data-related MPI messages are within 65B-256B

• Total data transferred increases with cluster size

[Figure: Abaqus/Explicit MPI Profiling (E5). Total size (KB) (log scale, 1 to 100,000,000) per message size range ([0..64B], [65B..256B], [257B..1024B], [1KB..4KB], [4KB..16KB], [16KB..64KB], [64KB..256KB]) at 16, 32 and 64 cores]
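The two message charts group MPI messages into the size ranges shown on their axes. A minimal sketch of that binning, useful for reproducing the histogram from a list of message sizes; the profiling tool used in the study is not named, so the input format, the inclusive upper bounds and the extra ">256KB" overflow bucket are all assumptions.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper bounds in bytes for the ranges on the chart axis.
# Treating the shared boundaries (1KB, 4KB, ...) as inclusive upper bounds
# is an assumption; the slides do not specify it.
UPPER_BOUNDS_BYTES = [64, 256, 1024, 4096, 16384, 65536, 262144]
LABELS = [
    "[0..64B]", "[65B..256B]", "[257B..1024B]", "[1KB..4KB]",
    "[4KB..16KB]", "[16KB..64KB]", "[64KB..256KB]",
    ">256KB",  # hypothetical overflow bucket, not on the chart
]


def bucket_label(size_bytes: int) -> str:
    """Map a message size in bytes to its size-range label."""
    if size_bytes < 0:
        raise ValueError("message size cannot be negative")
    return LABELS[bisect_left(UPPER_BOUNDS_BYTES, size_bytes)]


def histogram(sizes_bytes: list[int]) -> dict[str, int]:
    """Count messages per size range, keeping the chart's bucket order."""
    counts = Counter(bucket_label(size) for size in sizes_bytes)
    return {label: counts.get(label, 0) for label in LABELS}


# Hypothetical message sizes in bytes, for illustration only.
for label, count in histogram([8, 128, 200, 5000, 70000]).items():
    print(f"{label:>14}: {count}")
```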


Abaqus Profiling Summary

• Abaqus/Explicit was profiled to identify its communication patterns

• Frequently used message sizes

– Abaqus/Explicit has a large number of both small and medium messages

– Number of messages increases with cluster size

• Interconnect effect on Abaqus performance

– Both interconnect latency (MPI_Allgather/Allgatherv) and bandwidth (MPI_Isend/Irecv) are important to Abaqus/Explicit performance

• A balanced system, in which CPU, memory and interconnect match each other's capabilities, is essential for application efficiency
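Why both latency and bandwidth matter can be illustrated with a simple first-order cost model: transfer time = latency + message size / bandwidth. The sketch below plugs in the headline InfiniBand figures from the earlier slide (1us latency, 40Gb/s node-to-node) as-is; the model itself is an assumption made for illustration, and delivered MPI latency and throughput will differ from these peak numbers.

```python
LATENCY_S = 1e-6              # "1us application latency" from the InfiniBand slide
BANDWIDTH_BITS_PER_S = 40e9   # "40Gb/s node-to-node" from the same slide


def transfer_time_s(message_bytes: int,
                    latency_s: float = LATENCY_S,
                    bandwidth_bits_per_s: float = BANDWIDTH_BITS_PER_S) -> float:
    """First-order estimate: fixed latency plus serialization time."""
    return latency_s + message_bytes * 8 / bandwidth_bits_per_s


def latency_share(message_bytes: int,
                  latency_s: float = LATENCY_S,
                  bandwidth_bits_per_s: float = BANDWIDTH_BITS_PER_S) -> float:
    """Fraction of the estimated transfer time spent on latency."""
    total = transfer_time_s(message_bytes, latency_s, bandwidth_bits_per_s)
    return latency_s / total


# Upper edges of the smallest and largest size ranges on the message charts.
for size in (64, 256, 4096, 262144):
    print(f"{size:>7} B: {transfer_time_s(size) * 1e6:8.3f} us, "
          f"latency share {latency_share(size):.0%}")
```

In this model the 65B-256B messages that dominate the profile are almost entirely latency-bound, while messages near 256KB are dominated by bandwidth.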


Productive Systems = Balanced System

• Balanced system enables highest productivity

– Interconnect performance to match CPU capabilities

– CPU capabilities to drive the interconnect capability

– Memory bandwidth to match CPU performance

• Application scalability relies on balanced configuration

– “Bottleneck free”

– Each system component can reach its highest capability

• Dell M610 system integrates balanced components

– Intel “Nehalem” CPUs and Mellanox InfiniBand QDR

• Latency to memory and interconnect latency are of the same order of magnitude

– Provides the leading productivity and power/performance system for Abaqus simulations


Thank You

HPC Advisory Council

All trademarks are property of their respective owners. All information is provided “As-Is” without any kind of warranty. The HPC Advisory Council makes no representation as to the accuracy and completeness of the information contained herein. The HPC Advisory Council undertakes no duty and assumes no obligation to update or correct any information presented herein.