
Zellescher Weg 12

Willers-Bau A113

Tel. +49 351 - 463 - 39835

Matthias S. Mueller ([email protected])

Center for Information Services and High Performance Computing (ZIH)

Altix Usage and Application Programming

Discussion And Important Information For Users

Outline

Timeline

Support and Collaboration for Computational Science on HPC

Access to the Systems and Current Configuration

First Experiences

Some final remarks

Timeline, July 2005 to September 2006 (figure): Machine Room Upgrade; Installation Stage 1a (test operation); Installation Stage 1b; Installation Stage 2

Overall Infrastructure - Details

Performance of computers at ZIH (figure): performance of the ZIH systems from 1993 to 2005 on a logarithmic scale (100 Mflop/s to 1 Pflop/s), plotted against the TOP500 curves N=1, N=500 and SUM. Systems shown: Origin 2800 (Rapunzel), T3E, Origin 3800 (Romulus, Remus), Altix 3700 (merkur, venus) and Altix + PC Farm; one data point is labeled 59.7 GF/s.

Evolution of a parallel application (figure): starting from a serial program, choose a programming model (MPI or OpenMP) and a platform, then work through parallelization, correctness, performance and postprocessing, supported by HPC consulting and the DebugServer.

Parallel Debugging - DDT (screenshot) with the following panes: MPI groups; threads, stack, local and global variables; evaluation window; output, breakpoints and watches; file browser and source pane.

Vampir – Performance Analysis of Applications

Vampir Next Generation (figure): a server consisting of a master and workers 1 to m that analyzes trace files 1 to N.

Tools

1. Trace generator

2. Vampir viewer and analyzer

3. VNG viewer

4. Parallel VNG analysis engine

5. Conversion and analysis tools

Visualization of experimental data

(Visualization of experimental data from a low-speed axial compressor)

• Flow field and compressor geometry

• Animation to show time evolution.


Third Party Applications

Name         O2K        O3K   Altix  Cluster
LS-Dyna      maletti    ?     ?      ?
CPMD         Installed  MPI   Avail  Avail
Gaussian 03  SMP        SMP   Avail  Avail

Installed on one system, Avail on two others: Maple, Mathematica, Matlab, Abaqus, Nastran/Patran, Fluent, AMBER

Installed on two systems, Avail on one other: Ansys

Avail on one system: Marc

Numerical Libraries

Name       O2K        O3K   Altix  Cluster
BLAS       Installed  ?     Avail  Avail
Lapack     Installed  ?     Avail  Avail
ScaLapack  ?          ?     Avail  Avail
NAG        Installed  MPI   Avail  Avail
IMSL       Installed  MPI   Avail  Avail


Current Configuration

General configuration

Currently the system is split into two partitions:

– Merkur with 64 CPUs

– Venus with 128 CPUs

Merkur is the login partition.

The debugger DDT is currently available only on merkur. Because the xpmem module has been removed on this partition, it has slower MPI communication and no support for one-sided communication. MPI jobs spanning both partitions are currently not possible.

LSF

Queue         CPU count  Time limit
Interactive   1-16       2h
Small         1-8        8h
Intermediate  8-32       8h
Large         32-124     4h
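The queue table can be read as a simple selection rule. The following Python sketch is a hypothetical helper, not part of LSF: it returns the first listed batch queue whose CPU range and time limit fit a job. The queue names are copied from the slide (the actual LSF queue names on the system may be spelled differently), and resolving the overlapping boundaries at 8 and 32 CPUs in favor of the smaller queue is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Queue:
    name: str
    min_cpus: int
    max_cpus: int
    max_hours: float


# Batch queues as listed on the slide (CPU range and wall-clock limit in hours).
QUEUES = [
    Queue("Small", 1, 8, 8.0),
    Queue("Intermediate", 8, 32, 8.0),
    Queue("Large", 32, 124, 4.0),
]
INTERACTIVE = Queue("Interactive", 1, 16, 2.0)


def pick_queue(cpus: int, hours: float, interactive: bool = False) -> Optional[str]:
    """Return the first queue whose CPU range and time limit fit the job, else None.

    Hypothetical helper: overlapping CPU counts (8, 32) go to the smaller queue
    because the list is ordered from small to large.
    """
    candidates = [INTERACTIVE] if interactive else QUEUES
    for q in candidates:
        if q.min_cpus <= cpus <= q.max_cpus and hours <= q.max_hours:
            return q.name
    return None
```

For example, `pick_queue(64, 3)` returns `"Large"`, while a 64-CPU job that needs 5 hours fits no queue and returns `None`, since the Large queue is limited to 4 hours.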


Access

Access - Technical

Access is possible only via ssh.

Hostname: merkur.hrsk.tu-dresden.de

Access - administrative

Access to the machine is granted by an external committee after evaluation of the project proposal.

Proposals can be submitted online at http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/dienste/formulare/projektantrag

Initially, access will be granted immediately after proposal submission.

Test operation ("user-friendly mode") during December 2005

Production starts in January 2006

Electronic Proposal Submission (I)

Electronic Proposal Submission (II)


First Experiences on Altix

Stress tests

Memory:

– >18 tests, >68000 different patterns, >500 TB memory throughput

– ~20 h test time

MPI:

– >28 tests, >14000 different patterns, >100 TB message throughput

– ~24 h test time

Disk:

– >260 tests, >11400 files, 8.5 h test time, 157 TB disk throughput

MPI latency (figure): pairwise latency map with both axes ranging from 0 to 70; color scale from 1.1 to 2.1

MPI bandwidth (figure): pairwise bandwidth map with both axes ranging from 0 to 70; color scale from 800 to 2200

I/O Performance during acceptance (figure, scale 0 to 3):

        Accept  Removed Disk  Rebuild
Read    2.89    2.73          2.73
Write   2.79    2.76          2.63

Scalability of /fastfs file system (figure): bandwidth [GB/s] versus number of CPUs (1, 2, 4, 8, 16, 32, 64, 128); I/O benchmark with 3928 MB per CPU, 8 chunks. Maximum bandwidth: read 1.67 GB/s (venus) and 1.73 GB/s (merkur); write 1.51 GB/s (venus) and 1.18 GB/s (merkur).
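Aggregate file-system bandwidth of this kind is usually computed as the total data moved by all CPUs divided by the wall-clock time. The Python sketch below shows that arithmetic under stated assumptions: decimal units (1 GB = 1000 MB, the slide does not say whether MB means 10^6 or 2^20 bytes) and equally sized chunks; the function names are hypothetical and not part of the benchmark.

```python
def aggregate_bandwidth_gb_s(cpus: int, mb_per_cpu: float, seconds: float) -> float:
    """Aggregate bandwidth in GB/s: total data of all CPUs divided by wall-clock time.

    Assumes decimal units (1 GB = 1000 MB).
    """
    total_gb = cpus * mb_per_cpu / 1000.0
    return total_gb / seconds


def chunk_size_mb(mb_per_cpu: float, chunks: int) -> float:
    """Size of each I/O request, assuming a CPU's data is split into equal chunks."""
    return mb_per_cpu / chunks
```

With 64 CPUs, the benchmark moves 64 × 3928 MB = 251.392 GB, so sustaining 1.73 GB/s would take about 145 s; with 8 equal chunks each request is 491 MB.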

Code Tuning: different compiler flags (figure): run time [s] for 21 compiler flag settings; the two labeled values are 905.748 s and 638.904 s.


Short Comparison Origin - Altix

Matrix Multiplication from www.benchit.de (figure, kernel numerical.matmul.F77.0.0.double): GFLOPS (0 to 7) versus matrix size (0 to 1000) for the jki loop order on Intel Itanium 2 and MIPS R12000.
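The label "jki" refers to the nesting order of the three matrix multiplication loops. The Python sketch below illustrates that order and the usual flop count (2n³ for an n × n product); it is not the BenchIT kernel itself. In Fortran's column-major storage the innermost i-loop walks down columns of A and C with unit stride, which is why this order suits the F77 kernel; the nested Python lists here only show the loop structure, not the memory layout.

```python
def matmul_jki(a, b, n):
    """C = A * B for n x n matrices using the jki loop order.

    j is the outer loop, k the middle loop, i the inner loop.
    """
    c = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            bkj = b[k][j]  # constant over the inner loop
            for i in range(n):
                c[i][j] += a[i][k] * bkj
    return c


def gflops(n, seconds):
    """GFLOPS rate: a dense n x n product needs n^3 multiplications and n^3 additions."""
    return 2.0 * n**3 / seconds / 1e9
```

For example, a 1000 × 1000 product that takes 2 s runs at `gflops(1000, 2.0)`, i.e. 1 GFLOPS.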

DGEMM from www.benchit.de (figures): GFLOPS (0 to 150) versus matrix size (0 to 4000) for two kernels:

– numerical.matmul.C.0.SCSL.double: 1, 2, 4, 8, 16 and 32 threads

– numerical.matmul.C.0.MKL.double: auto-parallelism (OpenMP) using Intel MKL on 2, 4, 8, 16 and 32 CPUs

MPI Bandwidth (figure): ping-pong with 8 pairs, bandwidth [GiB/s] (0 to 4) versus message size [MiB] (0 to 10), Altix vs. O3K.

MPI latencies (figure): latency [µs] (0 to 7) versus number of pairs (1 to 1024, logarithmic), Altix vs. O3K.
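In a ping-pong test, one process sends a message and its partner sends it straight back, so each round trip crosses the link twice. The Python sketch below shows how latency and bandwidth figures are commonly derived from measured round-trip times. The function names are hypothetical, and whether the slide's 8-pair curve shows per-pair or summed bandwidth is not stated, so the summation helper is an assumption.

```python
from typing import Iterable


def one_way_latency_us(round_trip_s: float) -> float:
    """Latency in microseconds from the round-trip time of a small message."""
    return round_trip_s / 2.0 * 1e6


def pingpong_bandwidth_gib_s(message_bytes: int, round_trip_s: float) -> float:
    """Per-pair bandwidth in GiB/s; the message crosses the link once per half round trip."""
    return message_bytes / (round_trip_s / 2.0) / 2**30


def aggregate_bandwidth_gib_s(per_pair_gib_s: Iterable[float]) -> float:
    """Total bandwidth of several pairs running at the same time (assumed: simple sum)."""
    return sum(per_pair_gib_s)
```

For example, a 1 GiB message with a round trip of 2 s gives 1 GiB/s per pair, and a 4 µs round trip for a tiny message corresponds to a 2 µs one-way latency.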

Single CPU Results for CFD kernels


Performance of Lautrec: O3K vs. Altix (figure): relative speed (0 to 350) versus number of CPUs (0 to 70) for five cases, O3K-00 to O3K-04 and Altix-00 to Altix-04.
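The slide does not define "relative speed". A common definition is the speedup relative to a reference run, with parallel efficiency measuring how close the scaling is to linear. The Python sketch below shows these two formulas under that assumption; the function names are illustrative, and the reference CPU count `p_ref` is a hypothetical parameter for cases where the reference run uses more than one CPU.

```python
def relative_speedup(t_ref: float, t_p: float) -> float:
    """Speedup of a run on p CPUs relative to a reference run: t_ref / t_p."""
    return t_ref / t_p


def parallel_efficiency(t_ref: float, t_p: float, p: int, p_ref: int = 1) -> float:
    """Fraction of ideal (linear) scaling achieved when going from p_ref to p CPUs."""
    return relative_speedup(t_ref, t_p) * p_ref / p
```

For example, a run that takes 100 s on one CPU and 25 s on 8 CPUs has a speedup of 4 and a parallel efficiency of 0.5.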

Performance Ratio Altix 3700 / Origin 3800, preliminary (figure): ratio (scale 0 to 20) for the x-axis entries 201, 243, 247, 252, 441, 446, 450, 621, 644 and 649.

Your results may be different.

Feedback is very welcome.

ZIH Application Performance Competition

Prizes are awarded for the best ratio between SGI Origin 3800 and SGI Altix 3700 performance

Two categories:

– Single CPU performance

– 32 CPU performance

Criteria:

– Real application

– Performance demonstrated with a Vampir trace file

– Cheating is not allowed!!

Deadline: 28 February 2006

Winners will be selected by the ZIH award committee

ZIH staff members are not eligible.
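The competition ranks entries by the ratio between Origin 3800 and Altix 3700 performance within each category. The Python sketch below assumes the ratio is taken from wall-clock times as t_Origin / t_Altix, so a value above 1 means the code runs faster on the Altix; the entry names and timings are hypothetical, and the official scoring may differ.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str        # hypothetical application name
    category: str    # "1 CPU" or "32 CPUs"
    t_origin: float  # wall-clock time on the Origin 3800 [s]
    t_altix: float   # wall-clock time on the Altix 3700 [s]

    @property
    def ratio(self) -> float:
        # ratio > 1 means the code runs faster on the Altix
        return self.t_origin / self.t_altix


def winners(entries):
    """Return the entry name with the highest ratio in each category."""
    best = {}
    for e in entries:
        if e.category not in best or e.ratio > best[e.category].ratio:
            best[e.category] = e
    return {cat: e.name for cat, e in best.items()}
```

For example, an entry taking 100 s on the Origin and 20 s on the Altix (ratio 5) beats one taking 90 s and 30 s (ratio 3) in the same category.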

ZIH Application Performance Competition

Prizes:

– A good bottle of wine and one ZIH shirt for each category

Good luck!