Benchmarking and Performance Evaluations, courses.cs.washington.edu › courses › csep506 › 11sp...
Benchmarking and Performance Evaluations
Todd Mytkowicz, Microsoft Research
Let’s poll for an upcoming election
I ask 3 of my co-workers who they are voting for.
• My approach does not deal with
  – Variability
  – Bias
Issues with my approach
Variability
source: http://www.pollster.com
My approach is not reproducible
Issues with my approach (II)
Bias
source: http://www.pollster.com
My approach is not generalizable
Take Home Message
• Variability and Bias are two different things
  – Difference between reproducible and generalizable!
Do we have to worry about Variability and Bias when we benchmark?
Let’s evaluate the speedup of my whizbang idea
What do we do about Variability?
• Statistics to the rescue
  – mean
  – confidence interval
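As a minimal sketch of what "mean" and "confidence interval" amount to for repeated benchmark runs, the snippet below computes both with only the Python standard library. The speedup values are made up for illustration, and the small table of critical values is an assumption of this sketch (it covers only a few sample sizes).

```python
import math
import statistics

# Two-sided 95% critical values of Student's t for a few degrees of freedom
# (standard table values; this sketch only covers these sample sizes).
T_CRIT_95 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}

def mean_ci95(samples):
    """Return (mean, lower, upper): the sample mean and its 95% confidence interval."""
    n = len(samples)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError(f"no tabulated critical value for df={df}")
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(n)  # standard error of the mean
    half_width = T_CRIT_95[df] * std_err
    return mean, mean - half_width, mean + half_width

# Hypothetical speedups from 5 repeated runs (made-up numbers).
speedups = [1.21, 1.18, 1.25, 1.15, 1.20]
mean, lo, hi = mean_ci95(speedups)
print(f"speedup = {mean:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Reporting the interval rather than a single number shows how much the result would move if the runs were repeated.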
Intuition for T-Test
• Each face, 1 through 6, is equally likely (p = 1/6)
• Throw die 10 times: calculate mean
Trial   Mean of 10 throws
1       4.0
2       4.3
3       4.9
4       3.8
5       4.3
6       2.9
…       …
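The die experiment can be simulated to see how the per-trial means scatter. This is a sketch, not part of the original slides: the seed and the number of trials are arbitrary choices. For a fair die the expected mean is 3.5, and the standard deviation of a mean of 10 throws is sqrt(35/12/10), roughly 0.54.

```python
import random
import statistics

def mean_of_throws(rng, throws=10):
    """Throw a fair six-sided die `throws` times and return the mean."""
    return sum(rng.randint(1, 6) for _ in range(throws)) / throws

rng = random.Random(42)  # fixed seed so the run is repeatable
trial_means = [mean_of_throws(rng) for _ in range(10_000)]

grand_mean = statistics.mean(trial_means)
spread = statistics.stdev(trial_means)
print(f"mean of trial means:    {grand_mean:.2f}")
print(f"std dev of trial means: {spread:.2f}")
```

Individual trials land well away from 3.5, as in the table above, but their distribution is narrow and centered. The t-test uses that spread to say how far a sample mean is likely to be from the true mean.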
Back to our Benchmark: Managing Variability
> x = scan('file')
Read 20 items
> t.test(x)

    One Sample t-test

data:  x
t = 49.277, df = 19, p-value < 2.2e-16
95 percent confidence interval:
 1.146525 1.248241
sample estimates:
mean of x
 1.197383
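The reported interval can be checked by hand from the other numbers in the output. The sketch below assumes the default `t.test(x)` null hypothesis of a true mean of 0, so that t equals the mean divided by the standard error, and uses the tabulated two-sided 95% critical value for 19 degrees of freedom.

```python
# Values copied from the t.test output above.
mean = 1.197383
t_stat = 49.277
df = 19

# With a null hypothesis of mean 0, t = mean / standard_error.
std_err = mean / t_stat

# Two-sided 95% critical value of Student's t for df = 19 (table value).
t_crit = 2.093

half_width = t_crit * std_err
lower, upper = mean - half_width, mean + half_width
print(f"standard error = {std_err:.6f}")
print(f"95% CI = [{lower:.4f}, {upper:.4f}]")
```

The result agrees with the interval R printed, which shows that the confidence interval is just the mean plus or minus a multiple of the standard error.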
So we can handle Variability. What about Bias?
Evaluating compiler optimizations
System = gcc -O2 perlbench
System + Innovation = gcc -O3 perlbench
Madan: speedup = 1.18 ± 0.0002
Conclusion: O3 is good
Todd: speedup = 0.84 ± 0.0002
Conclusion: O3 is bad
Why does this happen?
Differences in our experimental setup
Madan: HOME=/home/madan
Todd: HOME=/home/toddmytkowicz
[Figure: process memory layout for each user, with regions labeled text, stack, and env]
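One way to probe this kind of setup difference is to rerun the same command while varying only the size of the environment. The sketch below does that with the standard library. The variable name `BENCH_PADDING`, the padding sizes, and the placeholder workload are all assumptions for illustration. A real study would substitute the benchmark binary and repeat each configuration enough times to compute a confidence interval.

```python
import os
import subprocess
import sys
import time

def time_with_env_padding(cmd, pad_bytes):
    """Run `cmd` once with an extra environment variable of `pad_bytes` bytes."""
    env = dict(os.environ)
    env["BENCH_PADDING"] = "x" * pad_bytes  # hypothetical variable name
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start

# Placeholder workload; substitute the benchmark binary under test.
cmd = [sys.executable, "-c", "sum(range(100000))"]

# One timing per padding size, from 0 to 3584 bytes in 512-byte steps.
results = {pad: time_with_env_padding(cmd, pad) for pad in range(0, 4096, 512)}
for pad, seconds in results.items():
    print(f"env padding {pad:5d} bytes: {seconds:.4f} s")
```

The timing here includes process startup, so it is only a rough stand-in for measuring the benchmark itself.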
Runtime of SPEC CPU 2006 perlbench depends on who runs it!
Bias from linking order
[Figure: speedup for each of 32 randomly generated linking orders]
Order of .o files can lead to contradictory conclusions
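A sketch of how such randomized linking orders might be generated: shuffle the list of object files with a seeded generator and emit one link command per order. The object file names and the output name `bench` are hypothetical, and the commands are only printed, not run.

```python
import random

def random_link_orders(objects, count=32, seed=0):
    """Return `count` random permutations of the object file list."""
    rng = random.Random(seed)  # seeded so the same orders can be regenerated
    orders = []
    for _ in range(count):
        order = list(objects)
        rng.shuffle(order)
        orders.append(order)
    return orders

# Hypothetical object files; substitute the benchmark's real .o files.
objects = ["main.o", "parser.o", "eval.o", "regex.o", "util.o"]

orders = random_link_orders(objects)
for order in orders[:3]:
    print("gcc -O2 -o bench " + " ".join(order))
```

Measuring the speedup under every order, instead of one arbitrary order, turns the linking order from a hidden source of bias into a variable the experiment samples over.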
Where exactly does Bias come from?
Interactions with hardware buffers
[Figure: O2 code laid out across Page N and Page N + 1, with regions labeled "Dead code", "Code affected by O3", and "Hot code"]
[Figure: O2 and O3 layouts across the Page N / Page N + 1 boundary, in two cases: one labeled "O3 better than O2", the other "O2 better than O3"]
[Figure: the same two cases across the Cacheline N / Cacheline N + 1 boundary]
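The alignment effect comes down to simple arithmetic: the same block of code occupies a different number of cache lines (or pages) depending on where it starts. The sketch below assumes a 64-byte cache line and a hypothetical 48-byte hot loop; both numbers are illustrative.

```python
CACHE_LINE = 64  # bytes; an assumed, typical line size

def lines_touched(start, size, line=CACHE_LINE):
    """Number of `line`-sized blocks covered by the byte range [start, start + size)."""
    first = start // line
    last = (start + size - 1) // line
    return last - first + 1

# A hypothetical 48-byte hot loop placed at two different start addresses.
aligned = lines_touched(0x1000, 48)   # starts on a line boundary
shifted = lines_touched(0x1030, 48)   # same code, shifted by 48 bytes
print(f"start 0x1000: {aligned} cache line(s)")
print(f"start 0x1030: {shifted} cache line(s)")

# The same function works for pages by passing line=4096.
print(f"10 bytes at offset 4090: {lines_touched(4090, 10, line=4096)} page(s)")
```

Anything that shifts the code, such as an optimization elsewhere in the program, the linking order, or the environment size, can move hot code across such a boundary.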
Other Sources of Bias
• JIT
• Garbage Collection
• CPU Affinity
• Domain specific (e.g. size of input data)
How do we manage these?
– JIT:
  • ngen to remove impact of JIT
  • “warmup” phase to JIT code before measurement
– Garbage Collection
  • Try different heap sizes (JVM)
  • “warmup” phase to build data structures
  • Ensure program is not “leaking” memory
– CPU Affinity
  • Try to bind threads to CPUs (SetProcessAffinityMask)
– Domain Specific:
  • Up to you!
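A minimal measurement harness along these lines might look like the sketch below. It is an illustration, not the slides' method: the workload is a placeholder, the warmup and run counts are arbitrary, and pinning uses Python's `os.sched_setaffinity`, which exists only on Linux, as a stand-in for the Windows `SetProcessAffinityMask` call named above.

```python
import os
import time

def pin_to_cpu(cpu=0):
    """Best-effort pinning of this process to one CPU; returns True on success."""
    if hasattr(os, "sched_setaffinity"):  # available on Linux only
        try:
            os.sched_setaffinity(0, {cpu})
            return True
        except OSError:
            return False
    return False

def measure(workload, warmup=5, runs=20):
    """Run `workload` untimed `warmup` times, then return `runs` timed samples."""
    for _ in range(warmup):
        workload()  # lets caches, JITs, and data structures settle
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    return times

def workload():
    # Placeholder; substitute the code under test.
    sorted(range(50_000), reverse=True)

pinned = pin_to_cpu(0)
times = measure(workload)
print(f"pinned={pinned}, runs={len(times)}, fastest={min(times):.6f} s")
```

The timed samples can then be summarized with a mean and confidence interval, as in the t-test slides.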
R for the T-Test
• Where to download
  – http://cran.r-project.org
• Simple intro to get data into R
• Simple intro to do t.test
Some Conclusions
• Performance Evaluations are hard!
  – Variability and Bias are not easy to deal with
• Other experimental sciences go to great effort to work around variability and bias
  – We should too!