Zellescher Weg 12
Willers-Bau A113
Tel. +49 351 - 463 - 39835
Matthias S. Mueller
Center for Information Services and High Performance Computing (ZIH)
Altix Usage and Application Programming
Discussion And Important Information For Users
Outline
Timeline
Support and Collaboration for Computational Science on HPC
Access to the Systems and Current Configuration
First Experiences
Some final remarks
Timeline 2005 - 2006
[Chart: schedule from July 2005 to September 2006 with the following phases]
– Machine Room Upgrade
– Installation Stage 1a (Test operation)
– Installation Stage 1b
– Installation Stage 2
Performance of computers at ZIH
[Figure: performance of the ZIH systems from 1993 to 2005 on a logarithmic scale from 100 Mflop/s to 1 Pflop/s, plotted together with the curves N=1, N=500 and SUM. Labelled systems: Origin 2800, Rapunzel; T3E; Origin 3800 (Romulus, Remus); Altix + PC Farm; Altix 3700 (merkur, venus). One data point is labelled 59.7 GF/s.]
Evolution of a parallel application
DebugServer
Parallelization – Correctness – Performance – Postprocessing
Parallel Debugging - DDT
[Screenshot of the DDT main window with the following elements]
– MPI Groups
– Thread, Stack, Local and Global Variables pane
– Evaluation window
– Output, Breakpoints, Watch pane
– File browser and Source pane
Vampir Next Generation
[Diagram labels: Server with Master and Worker 1, Worker 2, ..., Worker m; Trace 1, Trace 2, Trace 3, ..., Trace N]
Tools
1. Trace generator
2. Vampir viewer and analyzer
3. VNG viewer
4. Parallel VNG analysis engine
5. Conversion and analysis tools
Visualization of experimental data
(Visualization of experimental data of a low-speed axial compressor)
• Flow field and compressor geometry
• Animation to show time evolution.
Third Party Applications
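Status per system, listed in the order O2K, O3K, Altix, Cluster (rows with fewer than four entries have blank cells):
– Gaussian03: SMP, SMP, Avail, Avail
– AMBER: Installed, Avail, Avail
– Fluent: Installed, Avail, Avail
– Nastran/Patran: Installed, Avail, Avail
– Marc: Avail
– Ansys: Installed, Installed, Avail
– Abaqus: Installed, Avail, Avail
– Matlab: Installed, Avail, Avail
– Mathematica: Installed, Avail, Avail
– Maple: Installed, Avail, Avail
– CPMD: Installed, MPI, Avail, Avail
– LS-Dyna: maletti, ???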
Numerical Libraries
Status per system, listed in the order O2K, O3K, Altix, Cluster:
– IMSL: Installed, MPI, Avail, Avail
– NAG: Installed, MPI, Avail, Avail
– ScaLAPACK: ??, Avail, Avail
– LAPACK: Installed, ?, Avail, Avail
– BLAS: Installed, ?, Avail, Avail
Current Configuration
General configuration
Currently the system is split into two partitions:
– Merkur with 64 CPUs
– Venus with 128 CPUs
Merkur is for login
The DDT debugger is currently available only on Merkur. Because the xpmem module has been removed on this partition, MPI communication is slower there and one-sided communication is not available.
Cross-partition MPI jobs are currently not possible.
Access
Access - Technical
The only available method of access is via ssh
Hostname: merkur.hrsk.tu-dresden.de
Access - administrative
Access to the machine is granted by an external committee after evaluation
Proposals can be submitted online at http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/dienste/formulare/projektantrag
Initially, access will be granted immediately after proposal submission
Test operation ("user-friendly mode") during December
Production starts in January 2006
First Experiences on Altix
Stress tests
Memory:
– >18 tests, >68000 different patterns, >500 TB memory throughput
– ~20h test time
MPI:
– >28 tests, >14000 different patterns, >100 TB message throughput
– ~24h test time
Disk:
– >260 tests, >11400 files, 8.5h, 157 TB disk throughput
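As a rough plausibility check, the stress-test totals can be converted into average sustained rates. The sketch below assumes decimal units (1 TB = 1000 GB) and treats the ">" and "~" figures as exact, so the results are estimates derived from the totals, not measured values. The function name is illustrative.

```python
def average_rate_gb_per_s(total_tb: float, hours: float) -> float:
    """Average throughput in GB/s for total_tb terabytes moved in `hours` hours.

    Assumes decimal units: 1 TB = 1000 GB.
    """
    return total_tb * 1000.0 / (hours * 3600.0)


# Figures taken from the stress-test summary (lower bounds, approximate times).
stress_tests = {
    "memory": (500.0, 20.0),  # >500 TB in ~20 h
    "mpi": (100.0, 24.0),     # >100 TB in ~24 h
    "disk": (157.0, 8.5),     # 157 TB in 8.5 h
}

for name, (total_tb, hours) in stress_tests.items():
    rate = average_rate_gb_per_s(total_tb, hours)
    print(f"{name}: about {rate:.1f} GB/s on average")
```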
MPI latency
[Figure: MPI latency plotted over two axes ranging from 0 to 70, with a latency scale from 1.1 to 2.1]
MPI bandwidth
[Figure: MPI bandwidth plotted over two axes ranging from 0 to 70, with a bandwidth scale from 800 to 2200]
I/O Performance during acceptance
[Bar chart, vertical axis from 0 to 3, with the following values]
         Accept   Removed Disk   Rebuild
Read     2.89     2.73           2.73
Write    2.79     2.76           2.63
Scalability of /fastfs file system
[Figure: bandwidth [GB/s] (0 to 1.8) over the number of CPUs (1 to 128). I/O benchmark with 3928 MB / CPU, 8 chunks]
Maximum bandwidth:
– read (venus): 1.67 GB/s
– read (merkur): 1.73 GB/s
– write (venus): 1.51 GB/s
– write (merkur): 1.18 GB/s
Code Tuning: different compiler flags
[Figure: run time [s] (0 to 1200) for 21 different sets of compiler flags; labelled values: 905.748 and 638.904]
Short Comparison Origin - Altix
Matrix Multiplication from www.benchit.de
[Figure: numerical.matmul.F77.0.0.double, GFLOPS (0 to 7) over matrix size (0 to 1000). Curves: Intel Itanium 2, FLOPS (jki); MIPS R12000, FLOPS (jki)]
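The "(jki)" in the legend refers to the order of the three nested loops in the matrix multiplication kernel. The following sketch shows that loop order and how a FLOP rate can be derived from it, assuming the usual count of 2·n³ floating-point operations for an n×n multiplication. It is written in Python for readability only: the benchmark kernel itself is Fortran 77 (per the kernel name), and the function names here are illustrative.

```python
import time


def matmul_jki(a, b, n):
    """Multiply two n x n matrices (lists of rows) using j-k-i loop order."""
    c = [[0.0] * n for _ in range(n)]
    for j in range(n):            # column of C and B (outermost)
        for k in range(n):        # summation index
            b_kj = b[k][j]
            for i in range(n):    # row of C and A (innermost)
                c[i][j] += a[i][k] * b_kj
    return c


def measure_flops(n):
    """Return the achieved FLOP/s for one n x n multiplication."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    start = time.perf_counter()
    matmul_jki(a, b, n)
    elapsed = time.perf_counter() - start
    # One multiply and one add per innermost iteration: 2 * n^3 operations.
    return 2.0 * n**3 / elapsed


if __name__ == "__main__":
    print(f"n=100: {measure_flops(100) / 1e6:.1f} MFLOP/s (pure Python)")
```

In Fortran's column-major layout, the innermost i loop walks down a column of A and C with stride one, which is why this loop order is commonly the cache-friendly one there.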
DGEMM from www.benchit.de
[Figure: numerical.matmul.C.0.SCSL.double, GFLOPS (0 to 150) over matrix size (0 to 4000). Curves: performance with 1, 2, 4, 8, 16 and 32 threads]
[Figure: numerical.matmul.C.0.MKL.double, GFLOPS (0 to 150) over matrix size (0 to 4000). Curves: auto-parallelism (OpenMP) using Intel MKL, performance with 2, 4, 8, 16 and 32 CPUs]
MPI Bandwidth
[Figure: MPI Bandwidth (Pingpong with 8 pairs), bandwidth [GiB/s] (0 to 4) over message size [MiB] (0 to 10). Curves: Altix, O3K]
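In a ping-pong test, one message travels to the partner and back, so the bandwidth is usually derived from half the round-trip time. A minimal sketch of that arithmetic follows. The numbers are purely illustrative and are not values read from the plot, and the function name is hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def pingpong_bandwidth_gib_s(message_bytes: int, round_trip_s: float) -> float:
    """Bandwidth in GiB/s from message size and round-trip time of one ping-pong."""
    if round_trip_s <= 0:
        raise ValueError("round-trip time must be positive")
    one_way_s = round_trip_s / 2.0  # the message crosses the link twice
    return message_bytes / one_way_s / GIB


# Illustrative example only: a 4 MiB message with a 4 ms round trip.
print(f"{pingpong_bandwidth_gib_s(4 * MIB, 0.004):.2f} GiB/s")
```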
Performance of Lautrec: O3K vs. Altix
[Figure: relative speed (0 to 350) over #CPUs (0 to 70). Curves: O3K-00, Altix-00, O3K-01, Altix-01, O3K-02, Altix-02, O3K-03, Altix-03, O3K-04, Altix-04]
Performance Ratio Altix3700/Origin3800 (preliminary)
[Bar chart: ratio (0 to 20) for the entries 201, 243, 247, 252, 441, 446, 450, 621, 644, 649]
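A performance ratio of this kind can be computed from the run times of the same application on both systems. The sketch below assumes the ratio is defined as Origin 3800 run time divided by Altix 3700 run time, so that values above 1 mean the Altix is faster. The slides do not spell out the definition, and the timings used here are made up for illustration.

```python
def performance_ratio(origin_seconds: float, altix_seconds: float) -> float:
    """Altix/Origin performance ratio, assuming performance ~ 1 / run time."""
    if origin_seconds <= 0 or altix_seconds <= 0:
        raise ValueError("run times must be positive")
    return origin_seconds / altix_seconds


# Hypothetical run times of one application on both systems.
print(performance_ratio(origin_seconds=1200.0, altix_seconds=150.0))
```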
ZIH Application Performance Competition
Prizes are awarded for the best performance ratio between SGI Origin 3800 and SGI Altix 3700
Two categories:
– Single CPU performance
– 32 CPU performance
Criteria:
– Real application
– Demonstrated performance with Vampir tracefile
– Cheating is not allowed!!
Deadline: 28 February 2006
Winners will be selected by the ZIH award committee
ZIH staff is not eligible.