Post on 17-Dec-2015
Copyright © 2006 Intel Corporation. All Rights Reserved.
Techniques for Speeding up Pin-based Simulation
Harish Patil

Objective
IS: High-level techniques for speeding up Pin-based simulation
IS NOT: Low-level optimizations (inlining etc.) of Pintools
Two usage models
[Diagram: two arrangements of Pin-tool and Simulator]

Outline
Two techniques:
1. Selective simulation
2. Conditional instrumentation
PinPoints: Selecting simulation regions with Pin and SimPoint
Case Study: Pin + SimpleScalar.x86

Instruction Counts: Some IPF Applications
[Bar chart: "Real Applications Are Long-running". Y-axis: # Instructions (billions), 0 to 6,000. X-axis: IPF Applications. Bar values: 142, 373, 463, 3,979, 3,994, 4,932.]

Problem: Whole-Program Simulation is Slow
[Bar chart: "Simulation Time in YEARS @ 10,000 Instructions/Second". Y-axis: Years, 0 to 18. X-axis: IPF Applications. Bar values: 0.4, 1.2, 1.5, 12.6, 12.7, 15.6.]
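
The bar values on this chart follow from the instruction counts on the previous slide by simple division. The sketch below shows that arithmetic; the function name is illustrative, and the 365.25-day year is an assumption, since the slide does not say which year length it used.

```cpp
#include <cassert>

// Converts an instruction count into simulation time in years.
// Assumption: a 365.25-day year (31,557,600 seconds).
double SimulationYears(double instructions, double instructions_per_second) {
    assert(instructions_per_second > 0.0);
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;
    return instructions / instructions_per_second / seconds_per_year;
}
```

For the largest application, SimulationYears(4932e9, 1e4) divides 4.932e8 seconds by the seconds in a year and lands at roughly 15.6 years, which matches the tallest bar.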

Solution: Select Simulation Points
Select One Point
– At the beginning (no skip)
– After 1 billion instructions
– After skipping a random number of instructions
Select Multiple Points
– Manually by looking at performance data
– Randomly anywhere
– Randomly from uniform regions
– By program phase analysis (SimPoint: UCSD)
– Fine-grain sampling (SMARTS: CMU)
[Diagram: execution timeline alternating Fast-forward and Simulation segments]

How Does Pin Support Selective Simulation?
Class CONTROL: in InstLib/control.H (via instlib.H)
Pintool includes the class and provides a "Handler" for "start and end of region"
Provides a number of switches:
– For specifying "start of region"
    -skip <instruction count>
    -start_address <Address>
    …
– For specifying "end of region"
    -length <instruction count>
    -stop_address <Address>
    …

InstlibExamples/control
    $ pin -t control -skip 100 -length 500 -- hello
    ip: 0x40000e00 104 Start
    ip: 0x4000105e 598 Stop
    Hello world
Other example switches:
One region:
1. -start_address foo:10 -length 500
Multiple regions:
2. -uniform_period 1000 -uniform_length 200
3. -ppfile foo.pp

InstLibExamples/control.C
    #include "instlib.H"
    using namespace INSTLIB;

    // Contains knobs and instrumentation to recognize start/stop points
    CONTROL control;

    // Analysis routine: called at the start and end of each region.
    // icount is an instruction counter declared elsewhere in the tool (not shown on the slide).
    VOID Handler(CONTROL_EVENT ev, VOID *v, CONTEXT *ct, VOID *ip, VOID *tid)
    {
        std::cout << "ip: " << ip << " " << icount.Count();
        switch (ev) {
          case CONTROL_START:
            std::cout << "Start" << std::endl;
            break;
          case CONTROL_STOP:
            std::cout << "Stop" << std::endl;
            break;
          default:
            ASSERTX(false);
            break;
        }
    }

    int main(int argc, char *argv[]) {
        ...
        // Instrumentation (hidden inside CONTROL)
        control.CheckKnobs(Handler, 0);
    }
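
To make the -skip/-length behaviour concrete without the Pin headers, here is a minimal standalone sketch of a region controller. It is not the INSTLIB CONTROL class: SkipLengthController, RegionEvent, and BeforeInstruction are hypothetical names, and the model fires at exact instruction counts. The real run on the previous slide reported 104 and 598 for -skip 100 -length 500, so the actual controller evidently does not trigger at exact counts.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

enum class RegionEvent { Start, Stop };

// Hypothetical stand-in for a region controller (NOT the INSTLIB CONTROL class).
class SkipLengthController {
public:
    using Handler = std::function<void(RegionEvent, std::uint64_t)>;

    SkipLengthController(std::uint64_t skip, std::uint64_t length, Handler handler)
        : skip_(skip), length_(length), handler_(std::move(handler)) {}

    // Call once before each instruction executes.
    void BeforeInstruction() {
        // Start fires once `skip` instructions have executed.
        if (executed_ == skip_) handler_(RegionEvent::Start, executed_);
        // Stop fires once `length` further instructions have executed.
        if (executed_ == skip_ + length_) handler_(RegionEvent::Stop, executed_);
        ++executed_;
    }

private:
    std::uint64_t skip_;
    std::uint64_t length_;
    Handler handler_;
    std::uint64_t executed_ = 0;  // instructions executed so far
};
```

A tool would construct the controller with a handler like the one on the slide and call BeforeInstruction() from its per-instruction hook; the handler then sees one Start and one Stop event per run.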

Recap: Instrumentation vs. Analysis
Instrumentation routines define where instrumentation is inserted
– e.g. before instruction
Occurs first time an instruction is executed
Analysis routines define what to do when instrumentation is activated
– e.g. increment counter
Occurs every time an instruction is executed
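
The once-versus-every-time distinction can be illustrated with a toy model that does not use Pin at all. In this sketch, ToyInstrumenter is a hypothetical class: it caches the analysis calls chosen by the instrumentation routine the first time an address executes, then replays them on every later execution.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

using Address = std::uint64_t;
using AnalysisFn = std::function<void()>;
// Instrumentation routine: decides which analysis calls to attach to an instruction.
using InstrumentFn = std::function<void(Address, std::vector<AnalysisFn>&)>;

// Toy model of instrument-once, analyze-always (hypothetical, not the Pin API).
class ToyInstrumenter {
public:
    explicit ToyInstrumenter(InstrumentFn instrument)
        : instrument_(std::move(instrument)) {}

    void Execute(Address pc) {
        auto it = cache_.find(pc);
        if (it == cache_.end()) {
            // First execution of this address: run the instrumentation routine once.
            std::vector<AnalysisFn> calls;
            instrument_(pc, calls);
            ++instrumentation_calls_;
            it = cache_.emplace(pc, std::move(calls)).first;
        }
        // Every execution: run the attached analysis routines.
        for (auto& fn : it->second) fn();
    }

    std::uint64_t InstrumentationCalls() const { return instrumentation_calls_; }

private:
    InstrumentFn instrument_;
    std::unordered_map<Address, std::vector<AnalysisFn>> cache_;
    std::uint64_t instrumentation_calls_ = 0;
};
```

Executing two addresses ten times each with a counter-incrementing analysis routine runs the instrumentation routine twice and the analysis routine twenty times.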

Selective Simulation: Naive Approach: Conditional Analysis
    LOCALVAR INT32 enabled = 0;

    // Conditional analysis routine
    VOID Simulation()
    {
        if (!enabled) return;
        // Analysis code for detailed simulation
    }

    VOID Handler(CONTROL_EVENT ev, VOID *v, CONTEXT *ct, VOID *ip, VOID *tid) {
        switch (ev) {
          case CONTROL_START:
            enabled = 1;
            break;
          case CONTROL_STOP:
            enabled = 0;
            break;
        }
    }
Instrumentation always present!

Changing Instrumentation on-the-fly
PIN_RemoveInstrumentation()
All instrumentation is removed. When application code is executed the instrumentation routines will be called to re-instrument all code
Removes old instrumentation, forces instrumentation to be done again (after a delay)
PIN_ExecuteAt(const CONTEXT *ctxt)
Starts execution at an arbitrary point given the architectural state.
– CONTEXT passed in to Handler()
– Currently only on IA32 and IA32E

Selective Simulation: Faster Approach: Conditional Instrumentation
DebugTrace/debugtrace.C
    LOCALVAR INT32 enabled = 0;

    // Conditional instrumentation routine
    VOID Trace(TRACE trace, VOID *v) {
        if (!enabled) return;
        // Add instrumentation for detailed simulation
    }

    VOID Handler(CONTROL_EVENT ev, VOID *v, CONTEXT *ctxt, VOID *ip, VOID *tid) {
        switch (ev) {
          case CONTROL_START:
            enabled = 1;
            PIN_RemoveInstrumentation();
            if (ctxt) PIN_ExecuteAt(ctxt);  // Only on IA32/IA32E
            break;
          case CONTROL_STOP:
            enabled = 0;
            PIN_RemoveInstrumentation();
            if (ctxt) PIN_ExecuteAt(ctxt);  // Only on IA32/IA32E
            break;
        }
    }
Instrumentation only in simulation regions
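
The difference between the two approaches shows up as the number of analysis calls made while fast-forwarding. The sketch below is a toy cost model, not Pin code: the function names, the Cost struct, and the loop of loop_size distinct addresses are all assumptions made for illustration. Clearing the cache stands in for PIN_RemoveInstrumentation(); PIN_ExecuteAt() and the controller's own counting overhead are not modelled.

```cpp
#include <cstdint>
#include <unordered_map>

struct Cost {
    std::uint64_t analysis_calls = 0;  // calls into the analysis routine
    std::uint64_t simulated = 0;       // instructions actually simulated
};

// Naive approach: the analysis call is always present and checks a flag.
Cost RunConditionalAnalysis(std::uint64_t total, std::uint64_t start, std::uint64_t stop) {
    Cost c;
    bool enabled = false;
    for (std::uint64_t i = 0; i < total; ++i) {
        if (i == start) enabled = true;
        if (i == stop) enabled = false;
        ++c.analysis_calls;      // the call happens even while fast-forwarding
        if (!enabled) continue;  // early return inside the analysis routine
        ++c.simulated;
    }
    return c;
}

// Faster approach: analysis calls exist only in code instrumented while enabled.
Cost RunConditionalInstrumentation(std::uint64_t total, std::uint64_t start,
                                   std::uint64_t stop, std::uint64_t loop_size) {
    Cost c;
    bool enabled = false;
    std::unordered_map<std::uint64_t, bool> cache;  // pc -> analysis call inserted?
    for (std::uint64_t i = 0; i < total; ++i) {
        if (i == start) { enabled = true; cache.clear(); }   // models removing instrumentation
        if (i == stop)  { enabled = false; cache.clear(); }
        const std::uint64_t pc = i % loop_size;
        auto it = cache.find(pc);
        if (it == cache.end()) it = cache.emplace(pc, enabled).first;  // instrument once
        if (it->second) { ++c.analysis_calls; ++c.simulated; }
    }
    return c;
}
```

For 1,000 instructions with a region from 100 to 600, the naive model makes 1,000 analysis calls while the conditional-instrumentation model makes 500, and both simulate the same 500 instructions.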

Comparing Naïve vs. Fast Approach
naive_debugtrace vs. debugtrace
Switches: -skip 100000000 -length 1000 -instruction -memory -early_out
Naïve approach: Conditional Analysis
Fast approach (default): Conditional Instrumentation

debugtrace: Conditional Analysis vs. Conditional Instrumentation
Fast-forwarding is 5X faster with conditional instrumentation!
[Bar chart: "Time to skip 100 million instructions". Y-axis: Seconds, 0 to 250. X-axis: SPECINT, SPECFP. Series: naive_debugtrace, debugtrace (default).]
[Diagram: execution timeline alternating Fast-forward and Simulation segments]

Simulation Point Selection: Re-visited
Select One Point
– At the beginning (no skip)
– After 1 billion instructions
– After skipping a random number of instructions
Select Multiple Points
– Manually by looking at performance data
– Randomly anywhere
– Randomly from uniform regions
– By program phase analysis (SimPoint: UCSD)
– Fine-grain sampling (SMARTS: CMU)
Question: Are the simulation points representative?

CPI: Average Error, SPEC2000 (IA32): Whole Program vs. Selected Points
[Bar chart. Y-axis: Average CPI Error, 0% to 50%. X-axis: Selection method. Bar labels in extraction order: 27.1%, 13.6%, 10.8%, 8.9%, 4.6%, 48.0%.]
Legend:
– No Skip: 1 point (N*100 million insts.)
– Skip 1 billion: 1 point (N*100 million insts.)
– Skip Random: 1 point (N*100 million insts.)
– Random: N points (100 million insts. each)
– Uniform Random: N points (100 million insts. each)
– Phase-based: N points (100 million insts. each)

PinPoints
http://rogue.colorado.edu/Pin/PinPoints/
Pin (Intel) + SimPoint (UCSD)
What are PinPoints? Representative regions of programs
– Automatically chosen
– Validated (represent whole-program behavior)
– For trace-driven or execution-driven simulation
Found/validated PinPoints for long-running (trillions of instructions) programs [IA-32, EM64T, Itanium]

Phase Detection + PinPoint Selection
1. Profile with isimpoint (intervals: 100 million instructions each) to produce BB-vectors
2. Analyze with SimPoint to find phases
3. Choose one simulation point per phase and record it in the PinPoints file
Two Phases => Two PinPoints
– PinPoint 1: Weight 30%
– PinPoint 2: Weight 70%
[Diagram: execution split into numbered intervals (labels 1, 2, …, 350, …, 1022, …, 3518, …, 4232)]

Inside a PinPoints file
    Region-number
    Slice-number Weight
    Start-address Count1
    End-address Count2
Start-of-region: When Start-address is reached Count1 times
End-of-region: When End-address is reached Count2 times
Example usage:
    $ pin -t simulator -ppfile foo.pp -- foo
[Diagram: execution timeline alternating Fast-forward and Simulation segments]
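
The start and end rules can be sketched as a small state machine. This is not the actual PinPoints or CONTROL implementation: Region and RegionTracker are hypothetical names, only the four address/count fields from the slide are modelled, and both counts are assumed to be measured from the start of the program.

```cpp
#include <cstdint>

using Address = std::uint64_t;

// Hypothetical in-memory form of one region; field names mirror the slide.
struct Region {
    Address start_address;
    std::uint64_t count1;  // region starts when start_address is reached count1 times
    Address end_address;
    std::uint64_t count2;  // region ends when end_address is reached count2 times
};

class RegionTracker {
public:
    explicit RegionTracker(const Region& region) : region_(region) {}

    // Call with the address of each executed instruction.
    // Returns true while execution is inside the region.
    bool OnInstruction(Address pc) {
        if (pc == region_.start_address && ++start_hits_ == region_.count1) active_ = true;
        if (pc == region_.end_address && ++end_hits_ == region_.count2) active_ = false;
        return active_;
    }

private:
    Region region_;
    std::uint64_t start_hits_ = 0;  // assumption: counted from program start
    std::uint64_t end_hits_ = 0;    // assumption: counted from program start
    bool active_ = false;
};
```

Using an address plus an execution count, rather than a raw instruction count, ties the region boundary to a specific point in the code.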

PinPoints: Estimating Total Execution Time
Total Execution Time = Total Cycles / Frequency
– We know the simulated Frequency; need to know Total Cycles for *full* run of the binary on the Simulator
Total Cycles Simulated = (Weighted CPI) * (Total Instructions)
– PinPoints provides the total number of instructions in the PinPoints file.
Weighted CPI can be determined through simulation of PinPoints regions and weighting of results:
$\text{Weighted CPI} = \sum_i \text{Weight}_i \times \text{CPI}_i$
CAUTION: Use the formula only for statistics normalized by instructions: CPI computation OK; IPC computation is NOT OK
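
The three formulas on this slide chain together as follows. This is a minimal sketch: PinPointResult, WeightedCpi, and TotalExecutionSeconds are illustrative names, and the CPI values used in the note afterwards are made up for the example.

```cpp
#include <vector>

// Simulation result for one PinPoint: its weight (a fraction of the whole
// run, with weights summing to 1) and the CPI measured in that region.
struct PinPointResult {
    double weight;
    double cpi;
};

// Weighted CPI = sum over i of Weight_i * CPI_i
double WeightedCpi(const std::vector<PinPointResult>& results) {
    double sum = 0.0;
    for (const PinPointResult& r : results) sum += r.weight * r.cpi;
    return sum;
}

// Total Execution Time = (Weighted CPI * Total Instructions) / Frequency
double TotalExecutionSeconds(double weighted_cpi, double total_instructions,
                             double frequency_hz) {
    const double total_cycles = weighted_cpi * total_instructions;
    return total_cycles / frequency_hz;
}
```

With weights 0.3 and 0.7 and hypothetical CPIs of 2.0 and 1.0, the weighted CPI is 1.3. Weighting the corresponding IPCs (0.5 and 1.0) gives 0.85, whose reciprocal is about 1.18 rather than 1.3, which is why the caution restricts the formula to statistics normalized by instructions.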

PinPoints: Usage Model
[Diagram: Pin-based profiler -> BB Profile -> Simulation Point Selection -> PinPoints -> consumers, each using CONTROL: Pin-based Trace Generator, Pin-based Branch Predictor, Your Simulator Here]

User-level Simulation with SimpleScalar (Alpha): Old Approach
Ad-hoc system call side-effect emulation
    switch (syscall_id)
      case SC1: // Action for SC1
      case SC2: // Action for SC2
SimpleScalar (Alpha) emulates 80+ syscalls (enough to run SPEC2000 only)
[Diagram: Host Machine, Host Operating System, and a User Level Simulator containing an Architecture Simulation Engine and a System Call Emulation Engine. Labels: syscall(id, arg1,…,argn); Executes syscall natively; Register and memory updates.]

pinSEL: A Tool for Automatic System-call Side-effect Logging
No ad-hoc processing of system calls needed
Ease of porting to newer OSes (MacOS/Windows)
Simulation of many more applications (non-SPEC) feasible
[Diagram: pinSEL -> Log of syscall side-effects -> simulator]
    // At a system call
    // set memory locations as
    // specified in the log
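
The replay step described in the comment above can be sketched as follows. The record layout here is entirely hypothetical (the slides do not show the SEL file format): MemoryWrite, SyscallEffects, and LogReplayer are illustrative names, and memory is modelled as a simple byte map.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using Address = std::uint64_t;

struct MemoryWrite {
    Address address;
    std::uint8_t value;
};

// Hypothetical log record: the side effects one system call had on the program.
struct SyscallEffects {
    std::int64_t return_value;
    std::vector<MemoryWrite> writes;
};

class LogReplayer {
public:
    explicit LogReplayer(std::vector<SyscallEffects> log) : log_(std::move(log)) {}

    // Called when the simulated program reaches a system call. Instead of
    // emulating the call, apply the logged memory and register updates.
    // Returns false if the log has no more records.
    bool OnSyscall(std::unordered_map<Address, std::uint8_t>& memory,
                   std::int64_t& return_register) {
        if (next_ >= log_.size()) return false;
        const SyscallEffects& effects = log_[next_++];
        for (const MemoryWrite& w : effects.writes) memory[w.address] = w.value;
        return_register = effects.return_value;
        return true;
    }

private:
    std::vector<SyscallEffects> log_;
    std::size_t next_ = 0;  // index of the next unreplayed record
};
```

Because the simulator only copies logged values, it needs no per-syscall emulation code, which is the point of the "no ad-hoc processing" bullet.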

Coming Soon: pinSEL + SimpleScalar-x86
pinSEL: Pin-based "System Effects Log" generator (alternative to EIO traces)
[Diagram: PinPoints -> pinSEL (with CONTROL) -> SELs -> SimpleScalar-x86]
pinSEL Key Advantages
• Automated system-call effect analysis
• Easy port to MacOS and Windows

Example: pinSEL for SimpleScalar.x86
    $ pin -t pinSEL -ppfile perlbmk.makerand.pp -tracefile perlbmk.makerand -- perlbmk.exe -I lib makerand.pl
    START: icount:13 do_trace: 1
    PinPoint #: 1 phase id: 2 weight: 25.64 slice_size: 30000000
    SEL file names: perlbmk.makerand_1_0.sel perlbmk.makerand_1_0.ssi
    END: icount:30000786 do_trace: 0
Selective Simulation
Conditional Instrumentation

Summary
Techniques for speeding up Pin-based simulation
1. Be selective: choose simulation regions
2. Instrument conditionally: only in "regions of interest"
Coming Soon [from UCSD]: pinSEL + SimpleScalar-x86

Resources
Pin Manual:
– Instrumentation Library: library for common instrumentation tasks
– Controller: identify start and stop points for instrumentation
PinPoints: Harish Patil, Robert Cohn, Mark Charney, Rajiv Kapoor, Andrew Sun, and Anand Karunanidhi. “Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation” MICRO-37(2004)
pinSEL: Satish Narayanasamy, Cristiano Pereira, Harish Patil, Robert Cohn, and Brad Calder. “Automatic Logging of Operating System Effects to Guide Application-Level Architecture Simulation” SIGMETRICS’06