HPCMP Benchmarking Update
Cray Henry, April 2008
Department of Defense High Performance Computing Modernization Program
Outline
Context – HPCMP
Initial Motivation from 2003
Process Review
Results
DoD HPC Modernization Program
HPCMP Serves a Large, Diverse DoD User Community
519 projects and 4,086 users at approximately 130 sites
Requirements categorized in 10 Computational Technology Areas (CTA)
FY08 non-real-time requirements of 1,108 Habu-equivalents
156 users are self-characterized as “Other”
Computational Structural Mechanics – 437 Users
Electronics, Networking, and Systems/C4I – 114 Users
Computational Chemistry, Biology & Materials Science – 408 Users
Computational Electromagnetics & Acoustics – 337 Users
Computational Fluid Dynamics – 1,572 Users
Environmental Quality Modeling & Simulation – 147 Users
Signal/Image Processing – 353 Users
Integrated Modeling & Test Environments – 139 Users
Climate/Weather/Ocean Modeling & Simulation – 241 Users
Forces Modeling & Simulation – 182 Users
Benchmarks Have REAL Impact
In 2003 we started to describe our benchmarking approach
Today benchmarks are even more important
2003 Benchmark Focus
Focused on application benchmarks
Recognized application benchmarks were not enough
2003 Challenge – Move to Synthetic Benchmarks
5 years later we have made progress, but not enough to fully transition to synthetics
Supported over $300M in purchases so far
Comparison of HPCMP System Capabilities – FY 2003–FY 2008
What Has Changed Since 2003
(TI-08) Introduction of performance modeling and predictions
– Primary emphasis still on application benchmarks
– Performance modeling now used to predict some application performance
– Performance predictions and measured benchmark results compared for HPCMP systems used in TI-08 to assess accuracy
(TI-08) Met one-on-one with vendors to review performance predictions for each vendor’s individual systems
Overview of TI-XX Acquisition Process
Determine requirements, usage, and allocations
Choose application benchmarks, test cases, and weights
Vendors provide measured and projected times on offered systems
Measure benchmark times on DoD standard system
Measure benchmark times on existing DoD systems
Determine performance for each offered system per application test case
Determine performance for each existing system per application test case
Determine performance for each offered system
Usability/past performance information on offered systems
Use optimizer to determine price/performance for each offered system and combination of systems
Center facility requirements
Vendor pricing
Life-cycle costs for offered systems
Collective acquisition decision
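The optimizer step above can be illustrated with a brute-force price/performance search over system combinations. The system names, prices, and performance scores below are hypothetical, and the actual HPCMP optimizer weighs far more inputs (usability, past performance, facility and life-cycle costs):

```python
from itertools import combinations

# Hypothetical offered systems: (name, price in $M, weighted performance
# score across application test cases). Invented for illustration only.
offered = [
    ("SystemA", 20.0, 55.0),
    ("SystemB", 12.0, 30.0),
    ("SystemC", 18.0, 44.0),
]

def best_combination(systems, budget):
    """Return the combination of offered systems that maximizes total
    performance without exceeding the acquisition budget."""
    best, best_perf = (), 0.0
    for r in range(1, len(systems) + 1):
        for combo in combinations(systems, r):
            price = sum(s[1] for s in combo)
            perf = sum(s[2] for s in combo)
            if price <= budget and perf > best_perf:
                best, best_perf = combo, perf
    return best, best_perf

combo, perf = best_combination(offered, budget=35.0)
print([s[0] for s in combo], perf)
```

An exhaustive search is fine at this scale; with many offered systems and quantity options, a real optimizer would use integer programming instead.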
TI-09 Application Benchmarks
AMR – Gas dynamics code (C++/Fortran, MPI, 40,000 SLOC)
AVUS (Cobalt-60) – Turbulent flow CFD code (Fortran, MPI, 19,000 SLOC)
CTH – Shock physics code (~43% Fortran/~57% C, MPI, 436,000 SLOC)
GAMESS – Quantum chemistry code (Fortran, MPI, 330,000 SLOC)
HYCOM – Ocean circulation modeling code (Fortran, MPI, 31,000 SLOC)
ICEPIC – Particle-in-cell magnetohydrodynamics code (C, MPI, 60,000 SLOC)
LAMMPS – Molecular dynamics code (C++, MPI, 45,400 SLOC)
Red = predicted, Black = benchmarked
Predicting Code Performance for TI-08 and TI-09
*The next 12 charts were provided by the Performance Modeling and Characterization Group at the San Diego Supercomputer Center.
Prediction Framework – Processor and Communications Models
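As a rough sketch of how a processor model and a communications model combine into a runtime prediction: the machine profile and application signature below are invented for illustration, while the actual framework derives them from detailed memory and network traces of each system and code.

```python
# Hypothetical machine profile: sustained rates measured by synthetic probes.
machine = {
    "bw_l1": 20e9,         # bytes/s sustained from L1 cache
    "bw_mem": 3e9,         # bytes/s sustained from main memory
    "net_latency": 5e-6,   # seconds per message
    "net_bandwidth": 1e9,  # bytes/s across the interconnect
}

# Hypothetical application signature: traffic counts from tracing one run.
app = {
    "bytes_l1": 4e12,   # bytes of memory traffic satisfied by L1
    "bytes_mem": 1e12,  # bytes of traffic going to main memory
    "messages": 2e6,    # MPI messages sent
    "msg_bytes": 8e9,   # total bytes communicated
}

def predict_runtime(machine, app):
    """Convolve an application signature with a machine profile:
    processor (memory) time plus communications time."""
    compute = (app["bytes_l1"] / machine["bw_l1"]
               + app["bytes_mem"] / machine["bw_mem"])
    comm = (app["messages"] * machine["net_latency"]
            + app["msg_bytes"] / machine["net_bandwidth"])
    return compute + comm

print(predict_runtime(machine, app))
```

The point of the split is that each half can be re-evaluated against a new machine profile, predicting performance on systems that do not yet exist.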
Memory Subsystem Is Key in Predicting Performance
Red Shift – Memory Subsystem Bottleneck
[Chart: CPU cycle time, multi-core effective cycle time, and memory access time in nanoseconds (log scale, 0.01–1000), 1982–2007.]
Predicted Compute Time Per Core – HYCOM
MultiMAPS System Profile
One curve per stride pattern
– Plateaus correspond to data fitting in cache
– Drops correspond to data split between cache levels
MultiMAPS ported to C and will be included in HPC Challenge Benchmarks
Sample MultiMAPS output: [Chart: memory bandwidth (MB/s) vs. working set size (8-byte words, 1e3–1e9) for IBM p655, SGI Altix, and IBM Opteron.]
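The MultiMAPS idea (sweep working-set size and stride, and watch bandwidth plateaus fall off at cache boundaries) can be approximated in a few lines. This pure-Python sketch only hints at the effect, since interpreter overhead dominates; MultiMAPS itself is compiled code.

```python
import time

def measure_bandwidth(n_words, stride, repeats=3):
    """Time strided reads over a working set of n_words 8-byte floats
    and return apparent bandwidth in MB/s (best of several trials)."""
    data = [1.0] * n_words
    touched = len(range(0, n_words, stride))  # elements actually read
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        s = 0.0
        for i in range(0, n_words, stride):
            s += data[i]
        best = min(best, time.perf_counter() - t0)
    return (touched * 8) / best / 1e6  # bytes moved / seconds, in MB/s

# One curve per stride pattern, swept across working-set sizes.
for stride in (1, 4):
    curve = [(n, measure_bandwidth(n, stride)) for n in (1 << 12, 1 << 15, 1 << 18)]
    print(stride, curve)
```

In compiled code, the plateaus of each curve sit at the sustained bandwidth of one cache level, and the drops mark working sets that straddle two levels.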
Modeling the Effects of Multicore
[Chart: memory bandwidth (MB/s) vs. working set size (bytes, 1e5–1e9) on a 4-core Woodcrest node, comparing 1 core to 4 cores, with an annotation marking where the L2 cache is being shared.]
Performance Sensitivity of LAMMPS LRG to 2x Improvements
[Chart: sensitivity of LAMMPS LRG at 256 cores to 2x improvements in flops, L1 cache, L2 cache, L3 cache, main memory, on-node latency, on-node bandwidth, off-node latency, and off-node bandwidth.]
Performance Sensitivity of OVERFLOW2 STD to 2x Improvements
Performance Sensitivity of OVERFLOW2 LRG to 2x Improvements
Main Memory and L1 Cache Have Most Effect on Runtime
[Chart: sensitivity to 2x improvements in flops, L1/L2/L3 cache, main memory, and on-/off-node latency and bandwidth, for WRF STD 64, AMR STD 128, LAMMPS LRG 256, and Overflow2 LRG 512.]
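A 2x-improvement sensitivity study can be sketched with a simple additive runtime model: attribute each second of runtime to the hardware component that bounds it, then halve one component's share at a time. The breakdown below is hypothetical, not measured HPCMP data.

```python
# Hypothetical runtime breakdown (seconds) by bounding hardware component.
baseline = {
    "flops": 10.0, "L1 cache": 30.0, "L2 cache": 8.0, "L3 cache": 4.0,
    "main memory": 35.0, "on-node latency": 3.0, "on-node BW": 2.0,
    "off-node latency": 5.0, "off-node BW": 3.0,
}

def sensitivity_to_2x(times):
    """For each component, the whole-application speedup if that
    component alone were made 2x faster (its share of runtime halves)."""
    total = sum(times.values())
    return {c: total / (total - t / 2) for c, t in times.items()}

for component, speedup in sorted(sensitivity_to_2x(baseline).items(),
                                 key=lambda kv: -kv[1]):
    print(f"{component:16s} {speedup:.3f}x")
```

With this breakdown, main memory and L1 cache dominate the ranking, which is the pattern the chart above reports for the HPCMP codes.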
Differences Between Predicted and Measured Benchmark Times (Unsigned)

| System | AMR Std | AMR Lg | ICEPIC Std | ICEPIC Lg | LAMMPS Lg | OVERFLOW2 Std | OVERFLOW2 Lg | WRF Std | WRF Lg | Overall |
|---|---|---|---|---|---|---|---|---|---|---|
| ASC HP Opteron Cluster | 16.6% | 6.3% | - | - | 2.9% | 8.0% | 43.0% | - | - | 15.4% |
| ASC SGI Altix | 14.1% | 3.4% | 22.1% | 15.6% | 7.5% | 4.1% | 10.0% | 24.3% | 16.5% | 13.1% |
| MHPCC Dell Xeon Cluster | 20.7% | 14.7% | 6.7% | 4.2% | 8.1% | 23.3% | - | - | - | 13.0% |
| NAVO IBM P5+ | 11.7% | - | - | - | 9.6% | 3.0% | 1.8% | 7.8% | 16.4% | 8.4% |
| Overall |  |  |  |  |  |  |  |  |  | 12.4% |

Note: Average uncertainties of measured benchmark times on loaded HPCMP systems are approximately 5%.
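The unsigned differences tabulated above are relative errors between predicted and measured runtimes; a minimal sketch of the metric, using made-up timings rather than HPCMP data:

```python
def unsigned_pct_error(predicted, measured):
    """Unsigned relative difference, as a percent of the measured time."""
    return abs(predicted - measured) / measured * 100.0

# Hypothetical (predicted, measured) benchmark times in seconds.
cases = [(940.0, 1000.0), (1180.0, 1100.0), (505.0, 500.0)]
errors = [unsigned_pct_error(p, m) for p, m in cases]
print(errors, sum(errors) / len(errors))
```

Taking the absolute value before averaging keeps over- and under-predictions from cancelling, which is why the table reports unsigned errors.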
Solving the hard problems . . .

ASC_HP (AMR_LG)
[Chart: relative performance (Sapphire Eq.) vs. core count, 192–832; benchmark data and fitted curve, predicted data and fitted curve, with STM ranges for both.]
ASC_DC_Altix (AMR_LG)
[Chart: relative performance (Sapphire Eq.) vs. core count, 192–832; benchmark and predicted data, fitted curves, and STM ranges.]
MHPCC_DC_Dell (AMR_LG)
[Chart: relative performance (Sapphire Eq.) vs. core count, 192–832; benchmark and predicted data, fitted curves, and STM ranges.]
MHPCC_DC_Dell (OVERFLOW2_STD)
[Chart: relative performance (Sapphire Eq.) vs. core count, up to 320; benchmark and predicted data, fitted curves, and STM ranges.]
NAVO_DC_P5+ (OVERFLOW2_LG)
[Chart: relative performance (Sapphire Eq.) vs. core count, 192–832; benchmark and predicted data, fitted curves, and STM ranges.]
ASC_DC_Altix (ICEPIC_LG)
[Chart: relative performance (Sapphire Eq.) vs. core count, 192–1600; benchmark and predicted data, fitted curves, and STM ranges.]
ASC_HP (LAMMPS_LG)
[Chart: relative performance (Sapphire Eq.) vs. core count, 192–704; benchmark and predicted data, fitted curves, and STM ranges.]
ASC_DC_Altix (AMR_STD)
[Chart: relative performance (Sapphire Eq.) vs. core count, up to 320; benchmark and predicted data, fitted curves, and STM ranges.]
NAVO_DC_P5+ (WRF_STD)
[Chart: relative performance (Sapphire Eq.) vs. core count, up to 320; benchmark and predicted data, fitted curves, and STM ranges.]
What’s Next?
More focus on Signature Analysis
Continue to evolve application benchmarks to accurately represent the HPCMP computational workload
Increase profiling and performance modeling to understand application performance better
Use performance predictions to supplement application benchmark measurements and to guide vendors in designing more efficient systems