Towards Scalable Cross-Platform Application Performance Analysis -- Tool Goals and Progress

Shirley Moore, shirley@cs.utk.edu
October 18, 2001 LACSI Symposium, Santa Fe, NM


TRANSCRIPT

Page 1: Towards Scalable Cross-Platform Application Performance Analysis -- Tool Goals and Progress

Shirley Moore, shirley@cs.utk.edu

Page 2: Scalability Issues

• Code instrumentation
  – Hand instrumentation too tedious for large codes
• Runtime control of data collection
• Batch queueing systems
  – Cause problems for interactive tools
• Tracefile size and complexity
• Data analysis

Page 3: Cross-platform Issues

• Goal: similar user interfaces across different platforms

• Tools necessarily rely on platform-dependent substrates – e.g., for accessing hardware counters.

• Standardization of interfaces and data formats promotes interoperability and allows design of portable tools.

Page 4: Where is Standardization Needed?

• Performance data
  – Trace records vs. summary statistics
  – Data format
  – Data semantics
• Library interfaces
  – Access to hardware counters
  – Statistical profiling
  – Dynamic instrumentation

Page 5: Standardization? (cont.)

• User interfaces
  – Common set of commands
  – Common functionality
• Timing routines
• Memory utilization information

Page 6: Parallel Tools Consortium

• http://www.ptools.org/
• Interaction between vendors, researchers, and users
• Venue for standardization
• Current projects
  – PAPI
  – DPCL

Page 7: Hardware Counters

• Small set of registers that count events, which are occurrences of specific signals related to the processor’s function

• Monitoring these events facilitates correlation between the structure of the source/object code and the efficiency of the mapping of that code to the underlying architecture.
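To make that correlation concrete, tools built on counters typically combine raw event counts into derived metrics. The sketch below (not PAPI code; event names and counts are hypothetical, loosely modeled on common counter events) shows two such metrics:

```python
# Sketch: deriving efficiency metrics from raw hardware-counter
# values, as a counter-based tool might. Numbers are hypothetical.

def derived_metrics(counts):
    """Compute derived metrics from raw event counts."""
    metrics = {}
    # Instructions per cycle: how well the code keeps the pipeline busy.
    metrics["IPC"] = counts["TOT_INS"] / counts["TOT_CYC"]
    # L1 data-cache miss rate: how well data accesses map to the cache.
    metrics["L1_miss_rate"] = counts["L1_DCM"] / counts["LD_INS"]
    return metrics

counts = {"TOT_INS": 4_000_000, "TOT_CYC": 2_000_000,
          "L1_DCM": 50_000, "LD_INS": 1_000_000}
print(derived_metrics(counts))  # IPC = 2.0, L1 miss rate = 0.05
```

A low IPC or high miss rate points at a poor mapping of the code to the architecture, which is exactly the correlation the slide describes.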

Page 8: Goals of PAPI

• Solid foundation for cross platform performance analysis tools

• Free tool developers from re-implementing counter access

• Standardization between vendors, academics and users

• Encourage vendors to provide hardware and OS support for counter access

• Reference implementations for a number of HPC architectures

• Well documented and easy to use

Page 9: PAPI Implementation

Layered architecture diagram:

Portable layer:
  – Tools
  – PAPI High Level and PAPI Low Level APIs
Machine-specific layer:
  – PAPI Machine Dependent Substrate
  – Kernel Extension
  – Operating System
  – Hardware Performance Counters

Page 10: PAPI Preset Events

• Proposed standard set of events deemed most relevant for application performance tuning
• Defined in papiStdEventDefs.h
• Mapped to native events on a given platform
  – Run tests/avail to see the list of PAPI preset events available on a platform
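The preset-to-native mapping can be pictured as a table keyed by preset name; a preset is "available" on a platform only if a native event backs it there. A minimal sketch (the native event names below are illustrative inventions, not real counter names):

```python
# Sketch of the preset-to-native event mapping idea. The native
# event names here are hypothetical placeholders.
PRESET_MAP = {
    "PAPI_TOT_CYC": {"x86": "NATIVE_CYCLES", "power": "PM_CYC_EVT"},
    "PAPI_L1_DCM":  {"x86": "NATIVE_L1D_MISS"},  # no power mapping
}

def avail(platform):
    """List presets that map to a native event on this platform,
    roughly what the tests/avail utility reports."""
    return sorted(p for p, m in PRESET_MAP.items() if platform in m)

print(avail("x86"))    # ['PAPI_L1_DCM', 'PAPI_TOT_CYC']
print(avail("power"))  # ['PAPI_TOT_CYC']
```

This is why the available preset list differs from platform to platform: a preset with no native counterpart simply drops out.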

Page 11: Statistical Profiling

• PAPI provides support for execution profiling based on any counter event.

• PAPI_profil() creates a histogram by text address of overflow counts for a specified region of the application code.

• Used in the vprof tool from Sandia National Laboratories
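The mechanism behind PAPI_profil-style profiling can be sketched simply: on each counter overflow the interrupted program counter (PC) is sampled, and binning samples by text address yields a histogram attributing events to code regions. The addresses below are synthetic stand-ins for a real process's text segment:

```python
# Sketch of overflow-driven statistical profiling (not PAPI's API):
# bin sampled PC values into fixed-size address buckets.
from collections import Counter

def profile(pc_samples, text_start, bucket_size):
    """Bin overflow PC samples into fixed-size address buckets."""
    hist = Counter()
    for pc in pc_samples:
        hist[(pc - text_start) // bucket_size] += 1
    return hist

samples = [0x1004, 0x1008, 0x1044, 0x1048, 0x104C, 0x1050]
hist = profile(samples, text_start=0x1000, bucket_size=0x40)
print(dict(hist))  # {0: 2, 1: 4} -> most events fall in bucket 1
```

Because any counter event can drive the overflow, the same histogram machinery profiles cache misses or floating-point operations just as easily as time.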

Page 12: PAPI Reference Implementations

• Linux/x86, Windows 2000
  – Requires a patch to the Linux kernel; a driver for Windows
• Linux/IA-64
• Sun Solaris 2.8/Ultra I/II
• IBM AIX 4.3+/Power
  – Contact IBM for pmtoolkit
• SGI IRIX/MIPS
• Compaq Tru64/Alpha EV6 & EV67
  – Requires an OS device driver patch from Compaq
  – Per-thread and per-process counts not possible
  – Extremely limited number of events
• Cray T3E/Unicos

Page 13: PAPI Future Work

• Improve accuracy of hardware counter and statistical profiling data
  – Microbenchmarks to measure accuracy (Pat Teller, UTEP)
  – Use hardware support for overflow interrupts
  – Use Event Address Registers (EARs) where available
• Data structure based performance counters (collaboration with UMd)
  – Qualify event counting by address range
  – Page-level counters in cache coherence hardware

Page 14: PAPI Future (cont.)

• Memory utilization extensions (list suggested by Jack Horner, LANL)
  – Memory available on a node
  – Total memory available/used
  – High-water-mark memory used by process/thread
  – Disk swapping by process
  – Process-memory locality
  – Location of memory used by an object
• Dynamic instrumentation
  – e.g., PAPI probe modules
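One of the proposed memory extensions, the high-water-mark used by a process, is already queryable on Unix systems via getrusage; a PAPI extension would wrap this kind of OS facility behind a portable interface. A sketch using Python's stdlib binding:

```python
# Sketch of querying per-process high-water-mark memory via
# getrusage, one of the memory-utilization extensions proposed
# above. (Unix-only; a portable library would hide this.)
import resource

def peak_memory():
    """High-water-mark resident set size of this process.
    Units are kilobytes on Linux, bytes on some other systems."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

big = bytearray(10_000_000)  # touch some memory
print("peak RSS:", peak_memory())
```

The unit discrepancy in ru_maxrss across operating systems is itself a small illustration of why the slide calls for standardization of memory utilization information.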

Page 15: For More Information

• http://icl.cs.utk.edu/projects/papi/
  – Software and documentation
  – Reference materials
  – Papers and presentations
  – Third-party tools
  – Mailing lists

Page 16: DPCL

• Dynamic Probe Class Library
• Built on top of the IBM version of the University of Maryland's dyninst
• Current platforms
  – IBM AIX
  – Linux/x86 (limited functionality)
• Dyninst has been ported to more platforms but by itself lacks functionality for easily instrumenting parallel applications.
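The core idea of dynamic probes — attaching instrumentation to a running program's function entry points without recompiling it — can be sketched in miniature. This is not DPCL's or dyninst's API; Python's sys.settrace stands in for binary probe insertion:

```python
# Sketch of the dynamic-probe idea: attach a probe to function
# entries at runtime, count calls, then detach. sys.settrace is
# a stand-in for dyninst-style probe insertion into a binary.
import sys
from collections import Counter

call_counts = Counter()

def probe(frame, event, arg):
    """Fires on every function entry; records call counts."""
    if event == "call":
        call_counts[frame.f_code.co_name] += 1
    return None  # no per-line tracing needed

def work(n):
    return sum(range(n))

sys.settrace(probe)   # "insert" the probe
work(10); work(20)
sys.settrace(None)    # "remove" the probe

print(call_counts["work"])  # 2
```

What DPCL adds beyond this single-process picture is exactly what the slide names: machinery for installing such probes across the many processes of a parallel application.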

Page 17: Infrastructure Components?

• Parsers for common languages
• Access to hardware counter data
• Communication behavior instrumentation and analysis
• Dynamic instrumentation capability
• Runtime control of data collection and analysis
• Performance data management

Page 18: Case Studies

• Test tools on large-scale applications in production environment

• Reveal limitations of tools and point out areas where improvements are needed

• Develop performance tuning methodologies for large-scale codes

Page 19: PERC: Performance Evaluation Research Center

• Developing a science for understanding performance of scientific applications on high-end computer systems.
• Developing engineering strategies for improving performance on these systems.
• DOE Labs: ANL, LBNL, LLNL, ORNL
• Universities: UCSD, UIUC, UMD, UTK
• Funded by SciDAC: Scientific Discovery through Advanced Computing

Page 20: PERC: Real-World Applications

• High Energy and Nuclear Physics
  – Shedding New Light on Exploding Stars: Terascale Simulations of Neutrino-Driven Supernovae and Their Nucleosynthesis
  – Advanced Computing for 21st Century Accelerator Science and Technology
• Biology and Environmental Research
  – Collaborative Design and Development of the Community Climate System Model for Terascale Computers
• Fusion Energy Sciences
  – Numerical Computation of Wave-Plasma Interactions in Multi-dimensional Systems
• Advanced Scientific Computing
  – Terascale Optimal PDE Solvers (TOPS)
  – Applied Partial Differential Equations Center (APDEC)
  – Scientific Data Management (SDM)
• Chemical Sciences
  – Accurate Properties for Open-Shell States of Large Molecules
• …and more

Page 21: Parallel Climate Transition Model

• Components for Ocean, Atmosphere, Sea Ice, Land Surface and River Transport

• Developed by Warren Washington’s group at NCAR

• POP: Parallel Ocean Program from LANL

• CCM3: Community Climate Model 3.2 from NCAR including LSM: Land Surface Model

• ICE: CICE from LANL and CCSM from NCAR

• RTM: River Transport Module from UT Austin

• Fortran 90 with MPI

Page 22: PCTM: Parallel Climate Transition Model

Diagram: a Flux Coupler connects the Ocean Model, Atmosphere Model, Sea Ice Model, Land Surface Model, and River Model; the parallelized modules execute sequentially.

Page 23: PCTM Instrumentation

• Vampir tracefile in the tens-of-gigabytes range even for a toy problem
• Hand instrumentation with PAPI is tedious
• UIUC working on SvPablo instrumentation
• Must work in a batch queueing environment
• Plan to try other tools
  – MPE logging and Jumpshot
  – TAU
  – VGV?

Page 24: In Progress

• Standardization and reference implementations for memory utilization information (funded by DoD HPCMP PET, Ptools-sponsored project)

• Repositories of application performance evaluation case studies (e.g., SciDAC PERC)

• Portable dynamic instrumentation for parallel applications (DOE MICS project – UTK, UMd, UWisc)

• Increased functionality and accuracy of hardware counter data collection (DoD HPCMP, DOE MICS)

Page 25: Next Steps

• Additional areas for standardization?
  – Scalable trace file format
  – Metadata standards for performance data
  – New hardware counter metrics (e.g., SMP and DMP events, data-centric counters)
  – Others?

Page 26: Next Steps (cont.)

• Sharing of tools and data
  – Open source software
  – Machine and software profiles
  – Runtime performance data
  – Benchmark results
  – Application examples and case studies
• Long-term goal: common performance tool infrastructure across HPC systems