p ath & e dge p rofiling
DESCRIPTION
P ath & E dge P rofiling. Continuous. Michael Bond, UT Austin Kathryn McKinley, UT Austin. Presented by: Yingyi Bu. Why profile?. Inform optimizations Target hot code Inlining and unrolling Code scheduling and register allocation Increasingly important for speculative optimization - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/1.jpg)
Path &
Edge
Profiling
Michael Bond, UT Austin
Kathryn McKinley, UT Austin
Continuous
Presented by: Yingyi Bu
![Page 2: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/2.jpg)
Why profile?
Inform optimizations– Target hot code– Inlining and unrolling– Code scheduling and register allocation
Increasingly important for speculative optimization– Hardware trends simplicity & multiple contexts– Less speculation in hardware, more in software
![Page 3: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/3.jpg)
Speculative optimization
Superblocks
[Hwu et al. ’93]
Dynamo fragments
[Bala et al. ’00]
Hyperblocks
[Mahlke et al. ‘92]
rePLay frames
[Patel & Lumetta ’99]
MSSP tasks
[Zilles & Sohi ’02]
Less speculative More speculative
![Page 4: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/4.jpg)
Speculative optimization
Superblocks
[Hwu et al. ’93]
Dynamo fragments
[Bala et al. ’00]
Hyperblocks
[Mahlke et al. ‘92]
rePLay frames
[Patel & Lumetta ’99]
MSSP tasks
[Zilles & Sohi ’02]
Less speculative More speculative
![Page 5: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/5.jpg)
Predict future with representative profile– Accurate– Continuous
Profiling requirements
![Page 6: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/6.jpg)
Predict future with representative profile– Accurate– Continuous
Other requirements– Low overhead– Portable– Path profiling
Previous work struggles to meet all goals
Profiling requirements
![Page 7: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/7.jpg)
Predict future with representative profile– Accurate: 94-96%– Continuous: yes
Other requirements– Low overhead: 1.2%– Portable: yes– Path profiling: yes
PEP: continuous path and edge profiling
![Page 8: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/8.jpg)
Combines all-the-time instrumentation & sampling
PEP: continuous path and edge profiling
New!
![Page 9: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/9.jpg)
Combines all-the-time instrumentation & sampling– Instrumentation computes
path number– Sampling updates profiles
using path number
PEP: continuous path and edge profiling
r=r+2
r=0
SAMPLE r
r=r+1
New!
![Page 10: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/10.jpg)
Combines all-the-time instrumentation & sampling– Instrumentation computes
path number– Sampling updates profiles
using path number– Overhead: 30% 2%
PEP: continuous path and edge profiling
r=r+2
r=0
SAMPLE r
r=r+1
New!
![Page 11: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/11.jpg)
Combines all-the-time instrumentation & sampling– Instrumentation computes
path number– Sampling updates profiles
using path number– Overhead: 30% 2%
Path profiling as efficient means of edge profiling
PEP: continuous path and edge profiling
r=r+2
r=0
SAMPLE r
r=r+1
New!
New!
![Page 12: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/12.jpg)
Outline
Introduction Background: Ball-Larus path profiling PEP Implementation & methodology Overhead & accuracy
![Page 13: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/13.jpg)
Ball-Larus path profiling
Acyclic, intraprocedural paths Instrumentation maintains execution
frequency of each path– Each path computes unique integer in [0, N-1]
![Page 14: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/14.jpg)
Ball-Larus path profiling
4 paths [0, 3]
![Page 15: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/15.jpg)
Ball-Larus path profiling
2
1
4 paths [0, 3] Each path sums to
unique integer
0
0
![Page 16: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/16.jpg)
Ball-Larus path profiling
2
4 paths [0, 3] Each path sums to
unique integer
Path 01 0
0
![Page 17: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/17.jpg)
Ball-Larus path profiling
2
4 paths [0, 3] Each path sums to
unique integer
Path 0
Path 11 0
0
![Page 18: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/18.jpg)
Ball-Larus path profiling
2
4 paths [0, 3] Each path sums to
unique integer
Path 0
Path 1
Path 2
1 0
0
![Page 19: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/19.jpg)
Ball-Larus path profiling
2
4 paths [0, 3] Each path sums to
unique integer
Path 0
Path 1
Path 2
Path 3
1 0
0
![Page 20: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/20.jpg)
Ball-Larus path profiling
2
1 0
0
r: path register– Computes path number
count: frequency table– Stores path frequencies
![Page 21: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/21.jpg)
Ball-Larus path profiling
r=0
count[r]++
2
1 0
0
r: path register– Computes path number
count: frequency table– Stores path frequencies
![Page 22: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/22.jpg)
Ball-Larus path profiling
r=r+2
r=0
count[r]++
r: path register– Computes path number
count: frequency table– Stores path frequencies
r=r+1
![Page 23: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/23.jpg)
Ball-Larus path profiling
r=r+2
r=0
r=r+1
count[r]++
r: path register– Computes path number
count: frequency table– Stores path frequencies– Array by default– Too many paths?
Hash table
![Page 24: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/24.jpg)
Outline
Introduction Background: Ball-Larus path profiling PEP Implementation & methodology Overhead & accuracy
![Page 25: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/25.jpg)
r=r+2
r=0
count[r]++
r=r+1
Motivation for PEP
Computes path
Updates path profile
![Page 26: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/26.jpg)
r=r+2
r=0
count[r]++
r=r+1
Motivation for PEP
cheap <10%
expensive >90%
Where have all the cycles gone?
![Page 27: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/27.jpg)
r=r+2
r=0
r=r+1
What PEP does
All-the-time instrumentation
Sampling
(piggybacks on existing mechanism)
SAMPLE r
![Page 28: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/28.jpg)
r=r+2
r=0
r=r+1
What PEP does
All-the-time instrumentation
Sampling
(piggybacks on existing mechanism)
Overhead: 30% 2%
SAMPLE r
![Page 29: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/29.jpg)
freq = 30
freq = 90 freq = 10
freq = 70
Profile-guided profiling
Existing edge profile informs path profiling [Joshi et al. ’04]
![Page 30: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/30.jpg)
2
0 1
0
Profile-guided profiling
Existing edge profile informs path profiling [Joshi et al. ’04]
Assign zero to hotter edges
![Page 31: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/31.jpg)
r=r+2
r=0
SAMPLE r
r=r+1
Profile-guided profiling
Existing edge profile informs path profiling [Joshi et al. ’04]
Assign zero to hotter edges– No instrumentation
![Page 32: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/32.jpg)
Existing edge profile informs path profiling [Joshi et al. ’04]
Assign zero to hotter edges– No instrumentation
From practical path profiling [Bond & McKinley ’05]
r=r+2
r=0
SAMPLE r
r=r+1
Profile-guided profiling
![Page 33: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/33.jpg)
Outline
Introduction Background: Ball-Larus path profiling PEP Implementation & methodology Overhead & accuracy
![Page 34: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/34.jpg)
Implementation
Jikes RVM– High performance, Java-in-Java VM– Adaptive compilation triggered by sampling
Two compilers– Baseline compiles at first invocation
Adds instrumentation-based edge profiling
– Optimizer recompiles hot methods Three optimization levels
PEP implemented in optimizing compiler
![Page 35: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/35.jpg)
PEP sampling
Piggybacks on existing thread-switching mechanism
![Page 36: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/36.jpg)
PEP sampling
if (flag) handler();
if (flag) handler();
if (flag) handler(); Piggybacks on existing
thread-switching mechanism Jikes RVM inserts yieldpoints
at loop headers and method entry & exit– Yieldpoint handlers switch
threads and update profiles
![Page 37: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/37.jpg)
Piggybacks on existing thread-switching mechanism
Jikes RVM inserts yieldpoints at loop headers and method entry & exit– Yieldpoint handlers switch
threads and update profiles
PEP samples r at headers and method exit– Updates path and edge profiles
PEP sampling
if (flag) handler(r);
if (flag) handler(r);
if (flag) handler();
![Page 38: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/38.jpg)
Sampling methodology
Classic
![Page 39: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/39.jpg)
Sampling methodology
Classic
Arnold-Grove sampling: invaluable for PEP– Multiple samples per timer tick– Strides to reduce timer-based bias
![Page 40: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/40.jpg)
Sampling methodology
Classic
Arnold-Grove sampling: invaluable for PEP– Multiple samples per timer tick– Strides to reduce timer-based bias
PEP strides before first sample only, not between samples
– Timer goes off every 20 ms– Stride 0 -16 samples, then 64 consecutive samples– PEP sampling rate: 3200 paths/second
![Page 41: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/41.jpg)
Execution methodology
Adaptive: normal adaptive run– Different behavior from run to run
Adaptive execution
Adaptive run
![Page 42: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/42.jpg)
Execution methodology
Adaptive: normal adaptive run– Different behavior from run to run
Replay: deterministic compilation decisions– First run includes compilation– Second run is application only [Eeckhout et al. 2003]
Adaptive execution Replay execution
Optimization decisions
Edge profile
Adaptive runFirst run
(includes compiler)Call graph profile
Second run
(application only)
![Page 43: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/43.jpg)
Benchmarks and platform
SPEC JVM98 pseudojbb: SPEC JBB2000, fixed workload DaCapo Benchmarks
– Exclude hsqldb
3.2 GHz Pentium 4 with Linux– 8K DL1, 12Kμop IL1, 512K L2, 1GB memory
![Page 44: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/44.jpg)
Outline
Introduction Background: Ball-Larus path profiling PEP Implementation & methodology Accuracy & overhead
![Page 45: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/45.jpg)
Path accuracy
Compare PEP’s path profile to perfect profile Wall weight-matching scheme [Wall ’91]
– Measures how well PEP predicts hot paths Branch-flow metric [Bond & McKinley ’05]
– Weights paths by their lengths
![Page 46: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/46.jpg)
Path accuracy
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Ac
cu
rac
y
![Page 47: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/47.jpg)
Edge accuracy
Compare PEP’s edge profile to perfect profile Relative overlap
– Measures how well PEP predicts edge frequency relative to source basic block
– Jikes RVM uses relative frequencies only
![Page 48: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/48.jpg)
Edge accuracy
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Ac
cu
rac
y
![Page 49: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/49.jpg)
Compilation overhead
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
To
tal E
xe
cu
tio
n T
ime
Without PEP With PEP
![Page 50: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/50.jpg)
Execution overhead
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Ove
rhead
Instrumentation only Instrumentation & sampling
![Page 51: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/51.jpg)
Execution overhead
0%
1%
2%
3%
4%
5%
6%
Ove
rhead
Instrumentation only Instrumentation & sampling
![Page 52: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/52.jpg)
Related work
Instrumentation [Ball & Larus ’96]
– High overhead One-time profiling [Jikes RVM baseline compiler]
– Vulnerable to phased behavior
Sampling– Code sampling [Anderson et al. ’00]
No path profiling
– Code switching [Arnold & Ryder ’01, dynamic instrumentation]
Hardware [Vaswani et al. ’05, Shye et al. ‘05]
![Page 53: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/53.jpg)
Summary
Continuous & accurate profiling needed for aggressive, speculative dynamic optimization
PEP: continuous, accurate, low overhead, portable, path profiling
![Page 54: P ath & E dge P rofiling](https://reader035.vdocuments.net/reader035/viewer/2022062301/56815862550346895dc5beef/html5/thumbnails/54.jpg)
Thank you!
Questions?