The PinPoints Toolkit for Finding Representative Regions of Large Programs
Harish Patil
Platform Technology & Architecture Development
Enterprise Platform Group
Intel Corporation
Presented as part of the Pin tutorial at ASPLOS 2004, Boston, MA
10/09/2004
ASPLOS’04 2PinPoints
People
PinPoints: Harish Patil, Robert Cohn, Mark Charney, Andrew Sun, Rajiv Kapoor, Anand Karunanidhi
Pin: Robert Cohn, Artur Klauser, Geoff Lowney, CK Luk, Robert Muth, Harish Patil, Vijay Janapa Reddi, Steven Wallace
Acknowledgements: Brad Calder, Michael Greenfield, Geoff Lowney, Joel Emer, Chris Weaver, Michael Adler, Kim Hazelwood, James Vash, Ram Ramanujam, Roger Golliver, Timothy Prince, Allan Knies, Youngsoo Choi, Nechama Katan, Chris Gianos, Hideki Saito, Mahesh Madhav …
PinPoints
Representative regions of programs
– Automatically chosen
– Validated (represent whole-program behavior)
– For trace-driven or execution-driven simulation
• Pin (Intel): http://rogue.colorado.edu/Pin + SimPoint (UCSD): http://www.cse.ucsd.edu/~calder/simpoint/
Found and validated PinPoints for long-running programs (trillions of instructions) on IPF and x86
Outline Of the Talk
• Why PinPoints?
• PinPoints methodology: How to find and validate representative regions for simulation
Reference: Paper in MICRO-37: “Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation”, Patil et al.
http://rogue.colorado.edu/Pin/links.php PinPoints [download]
Motivation: Simulating Large Programs
Problem: Whole-program simulation is very slow (can take months)
Solution: Find representative simulation points
– Programs have phases: random/blind selection may miss them
– SimPoint approach: Find phases using a basic block profile; one simulation point (PinPoint) per phase
PinPoints: < 1% of program execution
– Capture whole-program behavior
Motivation: Simulating Large Programs (continued)
Problem: Porting programs to simulators is often not practical
– License issues
– Extra resources (disks etc.)
Solution: Drive simulation from the native environment
– Run under Pin
Pin runs programs “out-of-the-box” (no porting required)
The PinPoints Methodology
[Figure: flow of the PinPoints methodology]
Phase Detection + PinPoint Selection
– isimpoint: Generate dynamic basic block profile
– SimPoint tools: Analyze basic block profile to find phases
– Scripts: Generate PinPoints files
The PinPoints file feeds two uses:
– H/W counters-based validation: sample counters, then check whether the whole-program value matches the weighted sum for PinPoints
– Trace generation/simulation
Phases in gzip’s Execution
[Figure: behavior of gzip over its execution. Panels: performance (IPC), energy used per interval, instruction cache misses, data cache misses, 2nd-level cache misses, branch mispredictions. X-axis: instructions.]
SimPoint: You are what you execute
• Goal: Track behavior of a program
– Behavior caused by the path through code
• How: Track the code that is executing
– Detect changes and similarities in code
• Basic Block Distribution Analysis
– Generate and compare code signatures
Basic-Block Distribution Analysis
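[Figure: control-flow graph of basic blocks A, B, C, D, E, annotated with execution counts for one interval]
Basic block vector (A B C D E): < 3, 1, 2, 3, 1 >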
Basic-Block Distribution Analysis
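[Figure: the same control-flow graph of basic blocks A, B, C, D, E, annotated with execution counts for three intervals]
Basic block vectors (A B C D E):
< 3, 1, 2, 3, 1 >
< 3, 2, 1, 3, 1 >
< 2, 0, 2, 2, 2 >
• Capture using isimpoint
• Compare vectors
• Group similar vectors in clusters
• Choose one PinPoint per cluster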
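As a rough illustration of the "compare vectors" step, the sketch below normalizes the three basic block vectors shown on this slide and computes their pairwise Manhattan distances. The slice names and the choice of Manhattan distance on normalized counts are assumptions made for illustration; SimPoint's actual analysis involves more than this pairwise comparison.

```python
from itertools import combinations

# Basic block vectors from the slide, ordered as blocks A, B, C, D, E.
# The slice names are hypothetical labels, not part of the original.
bbvs = {
    "slice1": [3, 1, 2, 3, 1],
    "slice2": [3, 2, 1, 3, 1],
    "slice3": [2, 0, 2, 2, 2],
}


def normalize(bbv):
    """Scale the counts so that each vector sums to 1."""
    total = sum(bbv)
    return [count / total for count in bbv]


def manhattan(a, b):
    """Sum of absolute element-wise differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


normalized = {name: normalize(v) for name, v in bbvs.items()}
distances = {
    (p, q): manhattan(normalized[p], normalized[q])
    for p, q in combinations(sorted(normalized), 2)
}
closest_pair = min(distances, key=distances.get)

for pair, dist in sorted(distances.items()):
    print(pair, round(dist, 3))
print("most similar:", closest_pair)
```

With these inputs the first two vectors are the closest pair, so a clustering step would be expected to group them before either joins the third.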
Phase Detection + PinPoint Selection
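[Figure: a row of numbered program slices (labels shown include 1, 2, 350, 1022, 3518, and 4232) passing through two steps]
– Profile with isimpoint: profiles (vectors) for program slices (100 million instructions each)
– Analyze with SimPoint: selected slices are written to pinpoints.pp
Example from the figure: PinPoint 1: weight 30%; PinPoint 2: weight 70%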
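The sketch below shows one way weights such as 30% and 70% could arise once slices have been assigned to clusters. It assumes that a PinPoint's weight is the fraction of slices in its cluster and that the representative is the slice closest to the cluster centroid; the labels and the two-element vectors are made-up illustration data, not output of isimpoint or SimPoint.

```python
from collections import Counter


def cluster_weights(labels):
    """Fraction of all slices that fall into each cluster."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in sorted(counts)}


def representative(vectors, labels, cluster):
    """Index of the slice closest (squared Euclidean) to the cluster centroid."""
    members = [i for i, label in enumerate(labels) if label == cluster]
    dim = len(vectors[0])
    centroid = [
        sum(vectors[i][d] for i in members) / len(members) for d in range(dim)
    ]

    def sq_dist(i):
        return sum((vectors[i][d] - centroid[d]) ** 2 for d in range(dim))

    return min(members, key=sq_dist)


# Hypothetical data: 10 slices, 3 in cluster 0 and 7 in cluster 1.
labels = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
vectors = [
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
    [0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.2, 0.8],
    [0.1, 0.9], [0.2, 0.8], [0.3, 0.7],
]

weights = cluster_weights(labels)
picks = {c: representative(vectors, labels, c) for c in weights}
print(weights)
print(picks)
```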
PinPoints Generated for Some Programs (Commercial and SPEC2000)
Program      # Retired Instructions (billions)   # Slices (250 million insts.)   # PinPoints
AMBER-rt     3994                                15975                           6
Fluent-m3    2625                                10499                           8
LS-DYNA      4932                                19729                           6
SPECINT      142                                 567                             4
SPECFP       373                                 1491                            5
PinPoints : < 1% of program execution
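The "< 1%" figure can be checked against the table with a few lines of arithmetic. This sketch assumes each PinPoint covers exactly one slice, which the slide does not state explicitly.

```python
# (slices, pinpoints) per program, copied from the table above.
rows = {
    "AMBER-rt": (15975, 6),
    "Fluent-m3": (10499, 8),
    "LS-DYNA": (19729, 6),
    "SPECINT": (567, 4),
    "SPECFP": (1491, 5),
}

# Assumption: one PinPoint corresponds to one slice of execution.
fractions = {name: pinpoints / slices for name, (slices, pinpoints) in rows.items()}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.3%} of slices selected")
```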
PinPoints: Validation
• Do PinPoints capture whole-program behavior?
– Whole-program CPI: Actual-CPI
– Predict using CPI for PinPoints: Predicted-CPI
$\text{Predicted-CPI} = \sum_i \text{Weight}_i \times \text{CPI}_i$
$\%\,\text{Delta} = (\text{Actual-CPI} - \text{Predicted-CPI}) \times 100 / \text{Actual-CPI}$
• Do they work across micro-architectures?
– Predict performance on different configurations with the same binary/PinPoints: compare with actual performance
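The weighted-sum prediction and the percentage delta can be sketched as follows. The weights match the 30%/70% example from the earlier slide; the per-PinPoint CPI values and the actual CPI are hypothetical numbers chosen only to exercise the formulas.

```python
def predicted_cpi(weights, cpis):
    """Weighted sum of per-PinPoint CPI values."""
    return sum(w * c for w, c in zip(weights, cpis))


def percent_delta(actual, predicted):
    """Signed difference between actual and predicted CPI, in percent of actual."""
    return (actual - predicted) * 100 / actual


weights = [0.3, 0.7]
pinpoint_cpis = [1.2, 0.8]  # hypothetical measurements per PinPoint
actual_cpi = 0.95           # hypothetical whole-program measurement

predicted = predicted_cpi(weights, pinpoint_cpis)
delta = percent_delta(actual_cpi, predicted)
print(f"predicted CPI = {predicted:.3f}, delta = {delta:.2f}%")
```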
Predicting Whole-program CPI with PinPoints (Itanium 2: 1.3 GHz)
[Figure: “CPI: Some IPF SPEC2000 programs: Actual vs. Predicted”. Left axis: CPI (0.1 to 2.1). Right axis: % Abs(Delta in CPI) (0% to 100%). Series: Whole_pgm_CPI, PinPoints_CPI, %Delta.]
Predicting Whole-program CPI with PinPoints (Pentium 4: 2.8 GHz)
[Figure: “CPI: Some x86 SPEC2000 programs: Actual vs. Predicted”. Left axis: CPI (0 to 25). Right axis: Abs(% Delta in CPI) (0% to 100%). Series: Whole_pgm_CPI, PinPoints_CPI, %Delta.]
Predicting Whole-program L2 Misses with PinPoints (Itanium 2: 900 MHz)
[Figure: “L2 Misses Per Thousand Insts for Some SPEC2000 Programs: Actual vs. Predicted”. Left axis: MPKI (0 to 90). Right axis: Abs(Delta in MPKI) (0.0 to 14.0). Series: Whole_pgm_L2MPKI, PinPoints_L2MPKI, Delta.]
Speedup Prediction with PinPoints (Itanium 1, 2 varying Frequency)
Same binaries/ Same set of PinPoints : Different Microarchitectures
[Figure: “Speedup over Config1: Some IPF SPEC2000 programs: Actual vs. Predicted”. Y-axis: speedup (0 to 9). Series: Config2:Actual, Config2:Predicted, Config4:Actual, Config4:Predicted.]
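One plausible way to turn per-PinPoint measurements into a predicted speedup is sketched below. It assumes that the same binary executes the same number of instructions on both configurations, so execution time is proportional to CPI divided by clock frequency; the slides do not spell out the exact calculation used. All CPI values and frequencies are hypothetical.

```python
def predicted_cpi(weights, cpis):
    """Weighted sum of per-PinPoint CPI values."""
    return sum(w * c for w, c in zip(weights, cpis))


def predicted_speedup(weights, base_cpis, base_ghz, new_cpis, new_ghz):
    """Speedup of the new configuration over the base configuration.

    Assumes equal instruction counts, so time is proportional to CPI / frequency.
    """
    base_time = predicted_cpi(weights, base_cpis) / base_ghz
    new_time = predicted_cpi(weights, new_cpis) / new_ghz
    return base_time / new_time


weights = [0.3, 0.7]  # same set of PinPoints on both configurations
speedup = predicted_speedup(
    weights,
    base_cpis=[2.0, 1.0], base_ghz=0.8,  # hypothetical base configuration
    new_cpis=[1.0, 0.5], new_ghz=1.6,    # hypothetical faster configuration
)
print(f"predicted speedup = {speedup:.2f}x")
```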
Relevant Pin Tools
• isimpoint : generates basic block vectors in a format suitable for SimPoint analysis
• controller : allows fast-forwarding till a region of interest is reached
Specifying a region of interest:
– Skip N instructions
– Specific code address + Count
– PinPoints file + PinPoint number
Available as “class CONTROL” in a Pin kit
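To relate the "PinPoints file + PinPoint number" option to the "Skip N instructions" option, the sketch below computes how many instructions precede a given slice. It is not the controller's API or the PinPoints file format: the function name is hypothetical, and it assumes fixed-size slices numbered from 0.

```python
def instructions_to_skip(slice_index, slice_size):
    """Instructions executed before the given slice begins.

    Hypothetical helper: assumes 0-based slice numbering and equal-size slices.
    """
    if slice_index < 0 or slice_size <= 0:
        raise ValueError("slice_index must be >= 0 and slice_size must be > 0")
    return slice_index * slice_size


# Example: slice 350 with 100-million-instruction slices.
skip = instructions_to_skip(350, 100_000_000)
print(f"fast-forward {skip:,} instructions, then simulate 100,000,000")
```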
Summary
Finding simulation points: The Pin Advantage
• No special compiler/link flags or porting required
• Allows analysis of programs as they run
PinPoints: < 1% of program execution
• Predict whole-program behavior
• Work across microarchitectures
Resources
• Timothy Sherwood, Erez Perelman, Greg Hamerly, and Brad Calder. “Automatically Characterizing Large Scale Program Behavior.” ASPLOS’02.
• SimPoint toolkit: http://www-cse.ucsd.edu/~calder/simpoint/
• Harish Patil, Robert Cohn, Mark Charney, Rajiv Kapoor, Andrew Sun, and Anand Karunanidhi. “Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation.” MICRO-37 (2004).
• PinPoints toolkit: To be released soon (available upon request)