University of Maryland
Using Dyninst to Dynamically Control Memory Reference Tracing
Jeff Odom
Sigma Goals
Collaboration between IBM and UMD
Family of tools to understand caches
  – Focus on detailed statistics
  – Complement existing hardware counters
Ability to handle real applications
  – MPI and OpenMP programs
  – Fortran and C
Provide hints about restructuring
  – Padding (both inter- and intra-data structure)
  – Blocking
Approach
Run instrumented program
  – Capture full information about memory use
  – Produce compact trace
    • Extracts loops and memory strides
Post-execution tools
  – Detailed simulator
    • Full discrete event simulator
  – Memory profiler
    • Share of accesses due to each data structure
  – Cache Prediction Tool
    • Predict cache misses using symbolic equations
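The post-execution simulator replays the trace through a model of the memory hierarchy. The C++ sketch below is a rough illustration only, not Sigma's discrete event simulator. It replays addresses through a single set-associative cache with LRU replacement and counts misses. `CacheModel` and its constructor parameters are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <vector>

// Hypothetical single-level cache model: set-associative, LRU replacement.
// Replays one address at a time and counts misses (illustrative only).
class CacheModel {
public:
    CacheModel(std::size_t sets, std::size_t ways, std::size_t lineBytes)
        : sets_(sets), ways_(ways), lineBytes_(lineBytes), lru_(sets) {
        assert(sets > 0 && ways > 0 && lineBytes > 0);
    }

    // Returns true on a hit, false on a miss.
    bool access(std::uint64_t addr) {
        const std::uint64_t line = addr / lineBytes_;
        std::list<std::uint64_t>& lines = lru_[line % sets_];
        for (auto it = lines.begin(); it != lines.end(); ++it) {
            if (*it == line) {
                lines.splice(lines.begin(), lines, it);  // move to most-recently-used
                return true;
            }
        }
        ++misses_;
        lines.push_front(line);
        if (lines.size() > ways_) lines.pop_back();  // evict least-recently-used
        return false;
    }

    std::size_t misses() const { return misses_; }

private:
    std::size_t sets_, ways_, lineBytes_;
    std::vector<std::list<std::uint64_t>> lru_;  // one list per set, MRU at front
    std::size_t misses_ = 0;
};
```

For instance, `CacheModel l1(64, 8, 64)` would model a 32 KB, 8-way cache with 64-byte lines. A fuller tool would chain levels (L1, then L2) and attribute each miss to the data structure that owns the address.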
Representing Program Execution
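Capture full execution behavior
  – Record all basic blocks and memory addresses
  – Produces large traces (due to looping)
Trace compression
  – Maintain pattern buffer
  – Scan for repeating patterns
    • Extract memory strides
  – Repeat algorithms for nested loops
[Figure: example compressed trace. Records BLK1, ADR, ADR, ADR, BLK2, ADR, ADR; the ADR entries have bases 100, 200, 300, 300, 500 and stride 4 each. An RPT record with count 250 and length 7 repeats them, alongside a BLK3 record. Legend: Count, Length, Base, Stride.]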
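The pattern-and-stride idea can be illustrated with a small C++ sketch. It is an assumption-based illustration, not Sigma's encoder. It checks whether a raw trace of basic-block ids and addresses is one loop body of known length repeated back to back. Block ids must match on every iteration, and each address slot must advance by a constant stride. If so, the sketch emits the body as (base, stride) slots plus a repeat count, like the slide's 7-record body repeated 250 times with stride 4. `Record`, `Slot`, `Repeat`, and `compressLoop` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

enum class Kind { Block, Addr };

// One raw trace record: a basic-block id or a memory address.
struct Record {
    Kind kind;
    std::uint64_t value;
};

// One slot of the compressed loop body: first-iteration value plus a
// per-iteration stride (always 0 for Block slots).
struct Slot {
    Kind kind;
    std::uint64_t base;
    std::int64_t stride;
};

// Hypothetical RPT-style record: `body` repeated `count` times.
struct Repeat {
    std::vector<Slot> body;
    std::size_t count;
};

// Returns std::nullopt if the trace length is not a multiple of `length`,
// or if any slot breaks the "same block id / constant stride" pattern.
std::optional<Repeat> compressLoop(const std::vector<Record>& trace, std::size_t length) {
    if (length == 0 || trace.size() < length || trace.size() % length != 0)
        return std::nullopt;
    const std::size_t count = trace.size() / length;
    Repeat rpt{{}, count};
    for (std::size_t s = 0; s < length; ++s) {
        const Record& first = trace[s];
        std::int64_t stride = 0;
        if (first.kind == Kind::Addr && count > 1)
            stride = static_cast<std::int64_t>(trace[length + s].value - first.value);
        for (std::size_t i = 1; i < count; ++i) {
            const Record& r = trace[i * length + s];
            if (r.kind != first.kind) return std::nullopt;
            // For Block slots stride is 0, so the id must match exactly.
            const std::uint64_t expected = first.value + static_cast<std::uint64_t>(stride) * i;
            if (r.value != expected) return std::nullopt;
        }
        rpt.body.push_back({first.kind, first.value, stride});
    }
    return rpt;
}
```

This handles a single loop level and assumes the body length is already known. Nested loops would apply the same check again to the compressed bodies, and a real pattern buffer would also have to search for the length.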
Not Enough
A few seconds of execution generate gigabytes of trace
  – Regularity of data is critical to compression

Application   Original program   Trace size (bytes)
seis_s        5.9 s              1,900,591,649
cg.S          1.2 s              2,154,062,238

Lossy tracing
  – Statistically “rebuild” the trace from a sampled set
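The slides do not say how the sampled trace is "rebuilt". The simplest estimator, assumed here rather than taken from Sigma, scales each quantity measured during the sampled fraction of the run by the inverse of the sampling rate. The function name `extrapolate` is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Scale per-data-structure totals measured on a sampled fraction of the run
// up to an estimate for the full run. `rate` is the sampled fraction,
// e.g. 0.01 for a 1% sample. Assumes samples are spread uniformly over time steps.
std::map<std::string, double> extrapolate(const std::map<std::string, double>& sampled,
                                          double rate) {
    assert(rate > 0.0 && rate <= 1.0);
    std::map<std::string, double> estimate;
    for (const auto& [name, value] : sampled)
        estimate[name] = value / rate;
    return estimate;
}
```

This estimate is only reasonable if the sampled time steps behave, on average, like the unsampled ones. A biased choice of steps skews every scaled value.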
Sampling
Leverages Sigma
  – Most scientific apps are loop-based
  – Regular data access gives better compression
Time step boundary
  – Outermost loop
  – Non-uniform memory access OK
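One way to sample at time-step boundaries is a small gate evaluated at the top of the outermost loop. The sketch below is an assumption, not Sigma's actual mechanism. `TimeStepSampler` is a hypothetical class that turns tracing on for an evenly spaced fraction of time steps. Its `setRate` method hints at how a rate could be changed while the program runs.

```cpp
#include <cassert>

// Hypothetical gate, called once per outermost-loop iteration (time step).
// Spreads traced steps evenly so that roughly `rate` of all steps are traced.
class TimeStepSampler {
public:
    explicit TimeStepSampler(double rate) : rate_(rate) { assert(rate > 0.0 && rate <= 1.0); }

    // Returns true if tracing should be enabled for the step that is starting.
    bool beginStep() {
        credit_ += rate_;
        if (credit_ >= 1.0) {
            credit_ -= 1.0;
            return true;
        }
        return false;
    }

    // Change the rate mid-run; accumulated credit carries over.
    void setRate(double rate) {
        assert(rate > 0.0 && rate <= 1.0);
        rate_ = rate;
    }

private:
    double rate_;
    double credit_ = 0.0;  // floating-point rounding may shift a traced step by one
};
```

Spacing samples evenly, rather than tracing a contiguous block of steps, matters when data patterns vary from step to step, as they do in the seismic application.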
Sigma + Dyninst
Dyninst natural choice
  – Vary sample rate without recompilation
  – Adaptive/progressive rate during execution
Leverage existing Sigma infrastructure
  – Only generate trace
  – Offline simulation step unchanged
DynSigma
Mutator parses executable, inserts instrumentation, generates aux files
  – Instructions/module
  – Stack/global variables
  – Functions/line #
Group points by basic block (NEW)
  – Find load/store instrumentation points via BPatch_basicBlock::findPoint()
Mutatee generates trace
  – Inserted Sigma library
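Grouping load/store points by basic block lets the instrumentation emit one block record followed by that block's addresses. In DynSigma the points come from `BPatch_basicBlock::findPoint()`. To stay self-contained and avoid guessing at Dyninst signatures, this C++ sketch uses a hypothetical `MemPoint` record in place of real Dyninst point objects, and a hypothetical `groupByBlock` function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a load/store instrumentation point (not a Dyninst type).
struct MemPoint {
    std::uint64_t blockId;    // basic block containing the instruction
    std::uint64_t instrAddr;  // address of the load/store instruction
    bool isStore;
};

// Group points by basic block, ordering each block's points by instruction address.
std::map<std::uint64_t, std::vector<MemPoint>> groupByBlock(const std::vector<MemPoint>& points) {
    std::map<std::uint64_t, std::vector<MemPoint>> groups;
    for (const MemPoint& p : points)
        groups[p.blockId].push_back(p);
    for (auto& entry : groups)
        std::sort(entry.second.begin(), entry.second.end(),
                  [](const MemPoint& a, const MemPoint& b) { return a.instrAddr < b.instrAddr; });
    return groups;
}
```

Because a basic block executes all or none of its instructions, one block record can stand in for per-instruction bookkeeping. This also gives the compressor a natural pattern unit.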
Sample Application
Seismic simulation from SPEC-HPC 2002
  – Models multiple seismic processes
  – Process results pipelined
Variable time steps
  – Different data pattern for each process
C & Fortran
  – Fortran: data processing
  – C: dynamic memory management, I/O
L1 cache memtime by data structure
[Chart: time (s) per data structure, log scale from 0.000001 to 10, for the full trace and 0.1%, 0.5%, 1%, 5%, and 10% samples]
L2 cache memtime by data structure
[Chart: time (s) per data structure, log scale from 0.00000001 to 10, for the full trace and 0.1%, 0.5%, 1%, 5%, and 10% samples]
L1 + L2 memtime by data structure
[Chart: time (s) per data structure, log scale from 0.000001 to 10, for the full trace and 0.1%, 0.5%, 1%, 5%, and 10% samples]
L1 + L2 memtime by data structure
[Chart: time (s) per data structure, log scale from 0.000001 to 10, comparing the full trace, a 1% sample, and a 1% sample including init. Phase times: init 0.091 s, process 5.713 s, report 0.095 s]
Why go to all the trouble?
How about just one time step?
Single time step: the First, Middle, and Last columns each trace only one time step

Data structure   Full       1%         First      Middle     Last
sa               3.46       3.86       2.46       6.32       9.96
otr              1.18       1.50       4.14       0.291      0.22
ra               0.0512     0.0973     0          0          0.299166
sxyz@coord       0.00153    0.00181    0          0          0.004883
rxyz@coord       0.001459   0.002038   0          0          0.005640
xn@dgen          0.001311   0.001934   0          0          0.005895
ityp@dgen        0.000202   0.000394   0          0          0.001211
ref@dgen         0.000184   0.000344   0          0          0.001050
z0@dgen          0.000159   0.000031   0          0          0.000899
name@sys         2.77E-05   2.48E-05   0          0          0
kra@sys          9.54E-06   2.48E-05   0          0          0
lenra@sys        9.54E-06   9.54E-06   9.46E-06   9.46E-06   9.46E-06
Size does matter

Sample          Trace size   Time    Correlation
0.10%           0.90 MB      0:31    0.999506
0.50%           4.56 MB      1:07    0.999516
1.00%           9.55 MB      1:58    0.999021
1.00% w/ init   9.79 MB      2:00    0.999433
5.00%           46.8 MB      8:22    0.999581
10.00%          116 MB       18:00   0.995556
Full            1,813 MB     43:03   -
Uninstrumented  -            0:06    -

Times for instrumented runs include 0:12 of mutator overhead
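The slides do not say how the Correlation column is computed. One plausible reading, labeled here as an assumption, is the Pearson correlation between per-data-structure memtimes from a sampled run and from the full trace. The sketch below computes that coefficient. The function name `pearson` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient between two equal-length series, e.g.
// per-data-structure memtime from the full trace (x) and from a sample (y).
// Result is NaN if either series is constant.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mx, dy = y[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / std::sqrt(sxx * syy);
}
```

Per-structure memtimes span several orders of magnitude, so a plain Pearson coefficient is dominated by the largest structures (sa and otr). Correlating log-times would weight the small structures more evenly.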
Conclusions
Compressed traces may be very large for short runtimes
Sampling a single time step is not representative
Concentrate on the main processing loop
Small (1%) samples are accurate enough
Ongoing & Future Work
Measure another application
Determining time steps at runtime
  – Extending code coverage with counters
Adaptive sampling rates
  – Multi-pass memory profiling
Irregular accesses
  – Sampling
Multithreaded applications