optimum system balance for systems of finite price john d. mccalpin, ph.d. ibm corporation austin,...
TRANSCRIPT
![Page 1: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/1.jpg)
Optimum System Balance for Systems of Finite Price
John D. McCalpin, Ph.D.IBM Corporation
Austin, TX
SuperComputing 2004Pittsburgh, PA
November 10, 2004
![Page 2: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/2.jpg)
Overview
• The HPC Challenge Benchmark was announced last year at SuperComputing’2003
• The HPC Challenge Benchmark consists of– LINPACK (HPL)– STREAM– PTRANS (transposing the array used by HPL)– RandomAccess (random read/modify/write)– FFT– and some low-level MPI latency & BW measurements
• No single figure of merit is defined
![Page 3: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/3.jpg)
Overview (continued)
• Q: What is a “balanced” system?
• My answer:“A balanced system is one for which the primary applications are limited in performance by the most expensive component(s) of the system.”
![Page 4: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/4.jpg)
The Two Questions
• We need to understand both performance and cost in the context of low-level component metrics
• Performance– What performance model?– Use a harmonically weighted, time-based model
• Cost– What cost model?– Simple linear additive cost model
![Page 5: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/5.jpg)
Performance Model
• Composite Figures of Merit for Performance must be based on “time” rather than “rate”– i.e., weighted harmonic means of rates
• Why?– Combining “rates” in any other way fails to have a
“Law of Diminishing Returns”
• Time = Work/Rate• Repeat for each component: Ti = Wi/Ri
![Page 6: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/6.jpg)
A Simple Composite Model
• Assume the time to solution is composed of a compute time proportional to peak GFLOPS plus a memory transfer time proportional to sustained memory bandwidth
• Assume “x Bytes/FLOP” to get:
• Target SPECfp_rate2000 as the workload
GB/s SustainedBytes
GFLOPSPeak op FP 1
op" FP Effective" 1 GFLOPS" Balanced"
x
![Page 7: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/7.jpg)
Does Peak GFLOPS predict SPECfp_rate2000?SPECfp_rate2000 vs Peak MFLOPS
0
1000
2000
3000
4000
5000
6000
7000
8000
0.00 5.00 10.00 15.00 20.00 25.00 30.00
SPECfp_rate2000/cpu
Pe
ak
MF
LO
PS
![Page 8: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/8.jpg)
Does Sustained Memory Bandwidth predict SPECfp_rate2000?
SPECfp_rate2000 vs Sustained BW
0.000
1.000
2.000
3.000
4.000
5.000
6.000
7.000
0.00 5.00 10.00 15.00 20.00 25.00 30.00
SPECfp_rate2000/cpu
GB
/s p
er
CP
U
![Page 9: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/9.jpg)
Optimized Model Results
• Results rounded to nearby round values:– Bytes/FLOP for large caches === 0.16– Bytes/FLOP for small caches === 0.80– Size of asymptotically large cache === ~12 MB– Coefficient of best fit === ~6.4– The units of the coefficient are
SPECfp_rate2000 / Effective GFLOPS
![Page 10: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/10.jpg)
Does this Revised Metric predict SPECfp_rate2000?Optimized SPECfp_rate2000 Estimates
0.00
5.00
10.00
15.00
20.00
25.00
30.00
0.00 5.00 10.00 15.00 20.00 25.00 30.00
SPECfp_rate2000/cpu
Es
tim
ate
d R
ate
/cp
u
![Page 11: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/11.jpg)
Cost Model
• Assume simple linear additive model– FLOPS cost some amount
– Sustained BW costs a different amount
– Define:
= Rmem / Rcpu
= Wmem / Wcpu
= ($/BW) / ($/FLOPS)
System characteristics
Application characteristics
Technology characteristics
![Page 12: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/12.jpg)
How to Optimize?
• For a given application (Wmem/Wcpu), what is the optimum system balance ?
• Many people seem to believe that the system should be “balanced” to the application:– optimal =
i.e.,
– Rmem/Rcpu = Wmem/Wcpu
• This does not optimize price/performance
![Page 13: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/13.jpg)
The Correct Optimization
• This is actually an easy optimization problem• Minimize cost/performance
– Same as minimizing cost * time
• Optimum cost/performance occurs at – = sqrt(/)
• Definitely not intuitive!
![Page 14: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/14.jpg)
Example: High BW, expensive BWgamma = 3, delta = 3
0.000
0.500
1.000
1.500
2.000
2.500
3.000
0.10 1.00 10.00
beta
rela
tive
pri
ce/p
erfo
rman
ce
= 3 relatively high BW = 3 relatively expensive BW
Optimum price/performance is at =1, not =3Improvement in price/performance is ~30%
![Page 15: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/15.jpg)
High BW, very expensive memorygamma = 3, delta = 10
0.000
0.500
1.000
1.500
2.000
2.500
3.000
3.500
0.10 1.00 10.00
beta
rela
tive
pri
ce/p
erfo
rman
ce
= 3 relatively high BW = 10 very expensive BW
Optimum price/performance is at =0.58, not =3Improvement in price/performance is ~50%
![Page 16: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/16.jpg)
Low-BW, expensive BWgamma = 0.1, delta = 3
0.000
2.000
4.000
6.000
8.000
10.000
12.000
14.000
0.10 1.00 10.00
beta
rela
tive
pri
ce/p
erfo
rman
ce
= 0.1 low BW application = 3 moderately expensive BW
Optimum price/performance is at =0.18, not =0.1Improvement in price/performance is ~5%
More BW helps here even though it is expensive, because the application does not need much.
![Page 17: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/17.jpg)
Medium BW, expensive BWgamma = 1, delta = 3
0.000
0.500
1.000
1.500
2.000
2.500
3.000
3.500
4.000
4.500
0.10 1.00 10.00
beta
rela
tive
pri
ce/p
erfo
rman
ce
= 1 modest BW = 3 moderately expensive BW
Optimum price/performance is at =0.58, not =1Improvement in price/performance is ~10%
![Page 18: Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004](https://reader035.vdocuments.net/reader035/viewer/2022081519/56649ee45503460f94bf3367/html5/thumbnails/18.jpg)
Summary
• Balance is important to cost/performance
• You must understand performance
• You must understand cost
• Optimum cost-performance is not intuitive!