dynaprof and papi - linux clusters institute · 6 papi preset listing from tests/avail test case 8:...


DynaProf and PAPI

An Object Code Instrumentation System for Dynamic Profiling

Philip J. Mucci
[email protected]
Sept. 2000


Introduction to PAPI

• A portable interface to the hardware performance counters used for profiling.

• Stimulate and leverage performance tool development and research.

• Further research on run-time and/or feedback-driven compilation techniques:
  – ATLAS-like software
  – Hardware resource caching
  – Run-time self-optimizing code


PAPI Methodology

• “Get the right people talking.”
• Loosely standardize an interface (and kernel functionality) under Linux/x86.
• Release a free reference implementation for a few architectures.
• Redesign and standardize through an “MPI-like” forum.


PAPI v1.2beta

• 42 C and 33 Fortran functions
• Works on Linux/x86 kernels 2.0, 2.2, and 2.4
• Also runs on AIX, IRIX, Unicos, and Solaris 8/Ultra
• Integrated into the TAU, DEEP/MPI, vprof, and lperfex tools
• Adopted by Cray
• Running at various SC centers, including AHPCC on Road Runner


Functions in PAPI

• Efficient aggregate counter access
• Handler dispatch on threshold (overflow)
• Segmented statistical profiling via PC sampling on handler dispatch
• Query and describe the hardware
• User-interface helpers
• Open Source
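"Handler dispatch on threshold" means a counter is armed so that, after a chosen number of events, control passes to a user handler that receives the interrupted program counter (PC). The sketch below is a user-level simulation in plain C, not the PAPI API: `overflow_counter`, `counter_add`, and `log_sample` are hypothetical names, and a real implementation relies on the hardware counter's overflow interrupt delivered by the kernel as a signal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical handler type: receives the sampled program counter (PC). */
typedef void (*overflow_handler)(uintptr_t pc, void *ctx);

/* Hypothetical software model of one counter armed with an overflow threshold. */
typedef struct {
    uint64_t count;            /* events since the last dispatch */
    uint64_t threshold;        /* dispatch the handler every `threshold` events */
    overflow_handler handler;
    void *ctx;                 /* user data passed through to the handler */
} overflow_counter;

void counter_init(overflow_counter *c, uint64_t threshold,
                  overflow_handler handler, void *ctx)
{
    assert(threshold > 0);
    c->count = 0;
    c->threshold = threshold;
    c->handler = handler;
    c->ctx = ctx;
}

/* Records `n` events observed at address `pc`. Each time the running count
   reaches the threshold, the handler is dispatched once and the count wraps. */
void counter_add(overflow_counter *c, uint64_t n, uintptr_t pc)
{
    c->count += n;
    while (c->count >= c->threshold) {
        c->count -= c->threshold;
        c->handler(pc, c->ctx);
    }
}

/* Example handler: counts dispatches and remembers the last sampled PC. */
typedef struct {
    unsigned long hits;
    uintptr_t last_pc;
} sample_log;

void log_sample(uintptr_t pc, void *ctx)
{
    sample_log *slog = ctx;
    slog->hits++;
    slog->last_pc = pc;
}
```

With a threshold of 100, adding 250 events in one step dispatches the handler twice and leaves 50 events pending; a hardware counter would instead interrupt as each threshold is crossed.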


PAPI Preset Listing from tests/avail

Test case 8: Available events and hardware information.
-------------------------------------------------------------------------
Vendor string and code : GenuineIntel (-1)
Model string and code  : Celeron (Mendocino) (6)
CPU revision           : 10.000000
CPU Megahertz          : 366.504944
-------------------------------------------------------------------------
Name          Code        Avail  Deriv  Description (Note)
PAPI_L1_DCM   0x80000000  Yes    No     Level 1 data cache misses
PAPI_L1_ICM   0x80000001  Yes    No     Level 1 instruction cache misses
PAPI_L2_DCM   0x80000002  No     No     Level 2 data cache misses
PAPI_L2_ICM   0x80000003  No     No     Level 2 instruction cache misses
PAPI_L3_DCM   0x80000004  No     No     Level 3 data cache misses
PAPI_L3_ICM   0x80000005  No     No     Level 3 instruction cache misses
PAPI_L1_TCM   0x80000006  Yes    Yes    Level 1 cache misses
PAPI_L2_TCM   0x80000007  Yes    No     Level 2 cache misses
PAPI_L3_TCM   0x80000008  No     No     Level 3 cache misses
PAPI_CA_SNP   0x80000009  No     No     Requests for a snoop
PAPI_CA_SHR   0x8000000a  No     No     Requests for shared cache line
PAPI_CA_CLN   0x8000000b  No     No     Requests for clean cache line
PAPI_CA_INV   0x8000000c  No     No     Requests for cache line inv....
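In the listing, every preset code appears to be the high bit 0x80000000 plus a sequential index, and PAPI_L1_TCM is the only derived event, presumably computed as the sum of the L1 data and instruction misses. The sketch below decodes codes under exactly those two inferences; they are read off this listing, not taken from PAPI's documented internals, and `preset_name` and `derived_l1_tcm` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption read off the listing: presets are 0x80000000 | index. */
#define PRESET_BIT 0x80000000u

/* Preset names in listing order (index 0 .. 12). */
static const char *const preset_names[] = {
    "PAPI_L1_DCM", "PAPI_L1_ICM", "PAPI_L2_DCM", "PAPI_L2_ICM",
    "PAPI_L3_DCM", "PAPI_L3_ICM", "PAPI_L1_TCM", "PAPI_L2_TCM",
    "PAPI_L3_TCM", "PAPI_CA_SNP", "PAPI_CA_SHR", "PAPI_CA_CLN",
    "PAPI_CA_INV",
};

/* Returns the preset name for `code`, or NULL if it is not a known preset. */
const char *preset_name(uint32_t code)
{
    if (!(code & PRESET_BIT))
        return NULL;                      /* high bit clear: not a preset code */
    uint32_t idx = code & ~PRESET_BIT;
    if (idx >= sizeof preset_names / sizeof preset_names[0])
        return NULL;                      /* beyond the presets listed here */
    return preset_names[idx];
}

/* Hypothetical derivation of PAPI_L1_TCM ("Deriv Yes") from two counters
   that this CPU does provide directly. */
uint64_t derived_l1_tcm(uint64_t l1_dcm, uint64_t l1_icm)
{
    assert(l1_dcm <= UINT64_MAX - l1_icm);   /* guard against overflow */
    return l1_dcm + l1_icm;
}
```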


Internal Design of PAPI


The Linux/x86 Kernel

• No hardware counter support in any major Linux distribution
• 2.0 and 2.2 kernel patch:
  – System call interface based on Beowulf/perf from Erik Hendricks
  – Backwards compatible
• 2.4 kernel patch:
  – Memory-mapped interface from Mikael Pettersson of Uppsala


Future of PAPI

• Additional functionality:
  – Process/thread inheritance
  – Third-party interface
• Performance tuning
• Java language binding
• Additional platforms: IA-64


Tools for PAPI

• Now:
  – DEEP/MPI: from Pacific Sierra Research
  – TAU: Tuning and Analysis Utilities
  – lperfex: simple perfex clone for x86
  – vprof: Qt visualizer for statistical profile data
• In progress:
  – UTK: perfometer, profometer
  – Luis Derose: SvPablo


TAU Screenshot


DEEP/MPI Screenshot


vprof Screenshot


Introduction to DynaProf

• A portable tool to instrument a running executable with performance probes.

• Simple and user-friendly command-line interface.

• Open Source
• Very much a work in progress…


DynaProf Methodology

• Make collection of run-time performance data easy (and fun).

• Use the same tool with different probes.
• Keep the probes easy to build.
• Probes specify how to handle their own output.


Why the “Dyna”?

• Instrumentation is selectively inserted directly into the program's address space.

• Why is this a better way?
  – No perturbation of compiler optimizations.
  – No extra compilation step to use the tool.
  – Language independence.
  – Multiple Attach/Insert/Remove/Detach/Continue cycles.


DynaProf Design

• User interface: command-line and script interface

• Uses GNU readline and GNU history for command-line editing and command completion. Feels a lot like GDB…

• Instrumentation substrate:
  – DynInst 2.2 on Linux (and maybe others)
  – DPCL on AIX with the latest PSSP


DynaProf commands

Load [executable]
Use [probe [probe_args]]
List [module,function,functions] [regexp]
Select [<module>]
Run [args]
Info
Help
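List takes an optional regexp to narrow the output. How DynaProf actually matches names is not stated here; the sketch below assumes POSIX extended regular expressions through the standard `regcomp`/`regexec` calls, and `filter_functions` is a hypothetical helper, not part of DynaProf.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Stores the indices of `names` that match the POSIX extended regular
   expression `pattern` into `out` (at most `out_cap` entries).
   Returns the total number of matches, or -1 if the pattern is invalid. */
int filter_functions(const char *const *names, size_t n, const char *pattern,
                     size_t *out, size_t out_cap)
{
    assert(pattern != NULL);
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int matches = 0;
    for (size_t i = 0; i < n; i++) {
        if (regexec(&re, names[i], 0, NULL, 0) == 0) {
            if ((size_t)matches < out_cap)
                out[matches] = i;         /* remember which function matched */
            matches++;
        }
    }
    regfree(&re);
    return matches;
}
```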


DynaProf Probe Types

• 2 subroutine profiling probes, which generate:
  – Inclusive metrics
  – Exclusive metrics
  – A 1-level call tree and metrics

• 1 init/stop probe: simple aggregate metrics
• 1 statistical profiling probe


Current DynaProf Probes

• wall clock
  – Measures real microseconds from the cycle counter. (SMP)
• papiprobe
  – Measures any valid PAPI event, native or preset. (SMP)
• vprofprobe
  – Generates data for the vprof visual profiler from a timer or PAPI hardware event overflow. (from Curtis Janssen of Sandia Livermore)
• lperfexprobe
  – Dynamic version of the lperfex command (from Troy Baer of OSC)
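The wall clock probe reports microseconds derived from the cycle counter. Because a clock of F MHz ticks F times per microsecond, a cycle delta divided by F gives elapsed microseconds. The sketch below shows only that conversion; reading the counter itself (for example the x86 time-stamp counter) is platform specific and omitted, and `cycles_to_usec` is a hypothetical helper. The 366.504944 MHz figure is the Celeron value from the tests/avail listing.

```c
#include <assert.h>
#include <stdint.h>

/* Converts two cycle-counter readings into elapsed microseconds,
   given the CPU clock rate in MHz (cycles per microsecond). */
double cycles_to_usec(uint64_t start, uint64_t end, double mhz)
{
    assert(mhz > 0.0 && end >= start);
    return (double)(end - start) / mhz;
}
```

For example, 366504944 elapsed cycles at 366.504944 MHz is approximately one second (1,000,000 microseconds).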


DynaProf Probe Design

• Shared libraries containing the instrumentation are loaded into the executable's address space.

• Probes export 2 functions with a standardized interface:
  – Initialization function and program exit handler
  – Instrumentation function for function entry, call-before, call-after, and return
• Easy to roll your own (less than 1 day for existing timers).
• Supports separate libraries for threads.
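A probe built to this two-function shape might look like the sketch below. The enum values, the `probe_ops` struct, and the function names are all hypothetical, since the actual DynaProf signatures are not given here; loading the probe as a shared library and calling its hooks is the instrumentation substrate's job and is omitted. The init hook registers an exit handler with the standard `atexit`, which matches the idea that probes handle their own output.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical instrumentation points, mirroring the four listed above. */
typedef enum {
    PROBE_FUNC_ENTRY,
    PROBE_CALL_BEFORE,
    PROBE_CALL_AFTER,
    PROBE_FUNC_RETURN,
    PROBE_NPOINTS
} probe_point;

/* Hypothetical probe interface: one init hook, one instrumentation hook. */
typedef struct {
    void (*init)(void);                           /* called once at load time */
    void (*instrument)(int func_id, probe_point where);
} probe_ops;

/* A trivial counting probe: tallies how often each point fires. */
enum { MAX_FUNCS = 64 };
static unsigned long hits[MAX_FUNCS][PROBE_NPOINTS];

/* Exit handler: the probe writes its own report when the program ends. */
static void count_report(void)
{
    for (int f = 0; f < MAX_FUNCS; f++)
        if (hits[f][PROBE_FUNC_ENTRY])
            fprintf(stderr, "func %d: %lu calls\n", f, hits[f][PROBE_FUNC_ENTRY]);
}

static void count_init(void)
{
    atexit(count_report);
}

static void count_instrument(int func_id, probe_point where)
{
    assert(func_id >= 0 && func_id < MAX_FUNCS && where < PROBE_NPOINTS);
    hits[func_id][where]++;
}

const probe_ops counting_probe = { count_init, count_instrument };
```

Swapping in a different probe means supplying another `probe_ops` with the same two hooks, which is what keeps the tool itself unchanged across probes.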


Subroutine Profiling

• Instrument entry and exit of selected subroutines to get inclusive time.

• Instrument before and after calls to children in those functions to get exclusive time (inclusive time minus the time spent in children).

• Output of the probe consists of:
  – Inclusive profile
  – Exclusive profile
  – 1-level call tree with percentages of children vs. parent
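The bookkeeping described above can be sketched with a shadow stack driven by the four instrumentation points: entry and return give inclusive time, and the gaps between call-before and call-after give child time, so exclusive time is inclusive time minus child time. This is an illustrative sketch, not DynaProf's code; all names are hypothetical, and "metric" can be any monotonic counter (microseconds, cycles, or a PAPI event count).

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_FUNCS = 16, MAX_DEPTH = 64 };

/* Accumulated results per instrumented function. */
typedef struct {
    uint64_t inclusive[MAX_FUNCS];
    uint64_t exclusive[MAX_FUNCS];
    uint64_t edge[MAX_FUNCS][MAX_FUNCS];  /* parent -> callee: 1-level call tree */
} profile;

/* Shadow stack of instrumented functions currently active. */
typedef struct {
    int func[MAX_DEPTH];
    int callee[MAX_DEPTH];          /* child being called from this frame */
    uint64_t start[MAX_DEPTH];      /* metric value at function entry */
    uint64_t call_start[MAX_DEPTH]; /* metric value at call-before */
    uint64_t child[MAX_DEPTH];      /* metric spent in children so far */
    int depth;
} call_stack;

void prof_entry(call_stack *s, int func, uint64_t now)
{
    assert(s->depth < MAX_DEPTH && func >= 0 && func < MAX_FUNCS);
    int d = s->depth++;
    s->func[d] = func;
    s->start[d] = now;
    s->child[d] = 0;
}

void prof_call_before(call_stack *s, int callee, uint64_t now)
{
    assert(s->depth > 0 && callee >= 0 && callee < MAX_FUNCS);
    s->callee[s->depth - 1] = callee;
    s->call_start[s->depth - 1] = now;
}

void prof_call_after(call_stack *s, profile *p, uint64_t now)
{
    assert(s->depth > 0);
    int d = s->depth - 1;
    uint64_t spent = now - s->call_start[d];
    s->child[d] += spent;                     /* charge to this frame's children */
    p->edge[s->func[d]][s->callee[d]] += spent;
}

void prof_return(call_stack *s, profile *p, uint64_t now)
{
    assert(s->depth > 0);
    int d = --s->depth;
    uint64_t incl = now - s->start[d];
    p->inclusive[s->func[d]] += incl;
    p->exclusive[s->func[d]] += incl - s->child[d];
}

/* Percentage of a parent's inclusive metric spent in one callee. */
double child_percent(const profile *p, int parent, int callee)
{
    uint64_t total = p->inclusive[parent];
    return total ? 100.0 * (double)p->edge[parent][callee] / (double)total : 0.0;
}
```

Because child time is measured around the call site, callees that are not themselves instrumented (such as library routines) are still excluded from the parent's exclusive time.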


Statistical Profiling

• A probabilistic distribution of where in the code events occurred.

[Figure: event count (y-axis) versus program text addresses (x-axis)]


Statistical Profiling

• Most OSes have a profil() system or library call that accumulates samples based on time.

• On overflow of a timer, an interrupt or signal is dispatched.

• The handler gets the address at which the code was interrupted.

• Counts of interrupts are stored for each address.
• The PAPI profil() call maintains the same semantics, but for hardware counter thresholds.
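The per-address counting step can be sketched as a histogram update. The mapping below follows the profil(3) man page description: subtract `offset` from the PC, multiply by `scale`, and divide by 65536, so `scale` acts as a 16.16 fixed-point fraction (0x10000 gives one bucket per byte of text, 0x4000 one bucket per four bytes). It is simplified: the bound here is a bucket count rather than a buffer size in bytes, and real implementations differ in such details. `profil_record` and `bucket_address` are hypothetical helpers; in a PAPI-style profiler the PC would come from a counter-overflow handler instead of a timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Increments the histogram bucket that covers program counter `pc`. */
void profil_record(unsigned short *buf, size_t nbuckets, uintptr_t offset,
                   unsigned int scale, uintptr_t pc)
{
    assert(buf != NULL);
    if (pc < offset)
        return;                                  /* below the profiled range */
    uint64_t idx = ((uint64_t)(pc - offset) * scale) >> 16;
    if (idx < nbuckets)
        buf[idx]++;                              /* past the end: sample dropped */
}

/* Lowest text address that maps to bucket `idx` (inverse of the mapping),
   used to turn a hot bucket back into a code address. */
uintptr_t bucket_address(size_t idx, uintptr_t offset, unsigned int scale)
{
    assert(scale > 0);
    return offset + (uintptr_t)((((uint64_t)idx << 16) + scale - 1) / scale);
}
```

With `offset = 0x1000` and `scale = 0x4000`, addresses 0x1000 through 0x1003 share bucket 0 and 0x1004 starts bucket 1.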


Vprof: A simple visualizer


DynaProf Status

• V0.4 includes:
  – MPI support for all probes
  – PAPI subroutine probe with OpenMP support
  – Microsecond subroutine probe with OpenMP support
  – vprof/cprof probe and visualization tool
  – DynInst 2.2 shared libraries
  – lperfex 1.0 probe

http://www.cs.utk.edu/~mucci/DynaProf/dynaprof-0.4-bin86.tgz


DynaProf V0.4 Bugs

– Only entire modules can be selected.
– A large amount of instrumentation can exceed the 4M limit.
– Argument handling is inconsistent.
– Memory leaks abound.


Web Resources

• vprof from Curtis Janssen of Sandia Livermore
  http://aros.ca.sandia.gov/~cljanss/perf/vprof/
• DynInst 2.2 from U. Wisconsin and U. Maryland
  http://www.cs.umd.edu/projects/dyninstAPI/
• lperfex 1.0 from Troy Baer of the Ohio Supercomputer Center
  http://www.osc.edu/~troy/lperfex/
• PAPI from me, ICL/UTenn
  http://icl.cs.utk.edu/projects/papi


Mailing lists

– Send “subscribe ptools-perfapi” to [email protected]
– [email protected] is the reflector
– Send “subscribe perfapi-devel” to [email protected]
– [email protected] is the reflector
– Informally, [email protected] for DynaProf news