Parallel Performance Analysis with Open|SpeedShop
Seminar @ NASA, NASA Ames Research Center
October 29, 2008



Presenters and Partners

Presenters:
- Jim Galarowicz, Krell
- Don Maghrak, Krell

Larger team:
- Martin Schulz, LLNL
- David Montoya, LANL
- Scott Cranford, Sandia NLs
- William Hachfeld, Krell
- Samuel Gutierrez, LANL
- Joseph Kenny, Sandia NLs
- Chris Chambreau, LLNL

Partners:
- University of Wisconsin
- University of Maryland
- Rice University

Seminar Goals

- Introduce Open|SpeedShop
  - Basic concepts, terminology, modes of operation
  - Running first examples
- Provide Overview of Features
  - Sampling & Tracing in O|SS
  - Performance comparisons
  - Parallel performance analysis
- Status and Roadmap

Highlights

- Open Source Performance Analysis Tool Framework
  - Most common performance analysis steps in one tool
  - Extensible by using plugins for data collection and representation
  - Profiling (sampling) and Tracing (wrapping functions)
- Multiple Instrumentation Options
  - All work on unmodified application binaries
  - Need -g to map to source lines, but it can be combined with -O3, -O2, etc.
  - Offline data collection: run program start to end
  - Online data collection with ability to attach to running applications; start and stop data collection

Highlights (continued)

- Flexible and Easy to Use
  - User access through:
    - Graphical User Interface (GUI)
    - Interactive Command Line
    - Python Scripting API
- Large Range of Platforms
  - Linux clusters/SSI with x86, IA-64, Opteron, and EM64T CPUs
  - New: more portable offline data collection mechanism
- Availability
  - Full source available on sourceforge.net
  - Release tarballs on sourceforge.net

O|SS Target Audience

- Programmers/code teams
  - Use Open|SpeedShop out of the box
  - Powerful performance analysis
  - Ability to integrate O|SS into projects
- Tool developers
  - Single, comprehensive infrastructure
  - Easy deployment of new tools
- Project/product specific customizations
  - Predefined/custom experiments

Performance Experiments

- Concept of an Experiment
  - What program to analyze
  - What type of performance data to gather
  - How often the performance data is gathered
- Consists of Collectors and Views
  - Collectors define specific type of performance data
    - Hardware counters, program counter samples
    - Tracing of certain routines (I/O, MPI)
  - Views specify data aggregation and presentation
  - Multiple collectors per experiment possible
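
As a rough mental model, an experiment can be pictured as a small record that ties together the program, the collectors, the gathering rate, and the views. The sketch below is plain Python and is not the O|SS API: the class name, field names, default rate, and the "functions" view label are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Experiment:
    """Hypothetical model of an experiment; not part of the O|SS API."""
    program: str                      # what program to analyze
    collectors: List[str]             # what type of data to gather, e.g. "pcsamp"
    sampling_rate_hz: int = 100       # how often data is gathered (assumed default)
    views: List[str] = field(default_factory=list)  # how results are presented

    def describe(self) -> str:
        return (f"{self.program}: collectors={','.join(self.collectors)} "
                f"rate={self.sampling_rate_hz}Hz views={','.join(self.views)}")


# Multiple collectors per experiment are possible.
exp = Experiment("sweep3d.mpi", ["pcsamp", "io"], views=["functions"])
print(exp.describe())
```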

Experiment Workflow

[Figure: Run -> Application -> "Experiment" -> Results, with the Process Management Panel.]
- "Experiment": consists of one or more data "Collectors"
- Results: stored in SQL database; can be displayed using several "Views"

Experiment Types in O|SS

- Sampling Experiments
  - Periodically interrupt run and record location
  - Report statistical distribution of these locations
  - Typically provides good overview
  - Overhead mostly low and uniform
- Tracing Experiments
  - Gather and store individual application events, e.g., function invocations (MPI, I/O, ...)
  - Provides detailed, low-level information
  - Higher overhead, potentially bursty

Sampling Experiments

- PC Sampling (pcsamp)
  - Record PC in user defined time intervals
  - Low overhead overview of time distribution
- User Time (usertime)
  - PC Sampling + Call stacks for each sample
  - Provides inclusive & exclusive timing data
- Hardware Counters (hwc, hwctime)
  - Sample HWC overflow events
  - Access to data like cache and TLB misses
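
To make the inclusive/exclusive distinction concrete, here is a minimal sketch of how call-stack samples can be turned into both kinds of timing. It is plain Python with made-up sample data and an assumed sampling interval; it does not reflect how O|SS stores or processes its samples internally.

```python
from collections import Counter

SAMPLE_INTERVAL = 0.01  # seconds per sample (assumed rate, for illustration)

# Each sample is a call stack, outermost frame first (made-up data).
samples = (
    [("main", "work", "f3")] * 5
    + [("main", "work", "f2")] * 3
    + [("main", "work", "f1")] * 2
)

exclusive = Counter()  # samples where the function is the innermost frame
inclusive = Counter()  # samples where the function is anywhere on the stack

for stack in samples:
    exclusive[stack[-1]] += 1
    for func in set(stack):   # count each function once per sample
        inclusive[func] += 1

for func in ("main", "work", "f3", "f2", "f1"):
    print(f"{func:5s} excl={exclusive[func] * SAMPLE_INTERVAL:.2f}s "
          f"incl={inclusive[func] * SAMPLE_INTERVAL:.2f}s")
```

A plain PC sample corresponds to keeping only the innermost frame, which is why pcsamp alone yields exclusive time but not inclusive time.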

Tracing Experiments

- I/O Tracing (io, iot)
  - Record invocation of all POSIX I/O events
  - Provides aggregate and individual timings
- MPI Tracing (mpi, mpit, mpiotf)
  - Record invocation of all MPI routines
  - Provides aggregate and individual timings
- Floating Point Exception Tracing (fpe)
  - Triggered by any FPE caused by the code
  - Helps pinpoint numerical problem areas
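
The idea of tracing by wrapping functions can be illustrated in a few lines of Python. This is only an analogy: O|SS wraps routines in the application binary, whereas the sketch below wraps a Python function with a decorator. The function `write_block` is a hypothetical stand-in for an I/O routine.

```python
import functools
import time
from collections import defaultdict

trace_log = defaultdict(list)  # function name -> list of per-call durations


def traced(func):
    """Wrap func so that every invocation is timed and recorded."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            trace_log[func.__name__].append(time.perf_counter() - start)
    return wrapper


@traced
def write_block(data):
    """Stand-in for an I/O routine; hypothetical, for illustration only."""
    return len(data)


for block in (b"abc", b"defgh", b""):
    write_block(block)

calls = trace_log["write_block"]
print(f"calls={len(calls)} total={sum(calls):.6f}s")   # aggregate timing
print([f"{t:.6f}" for t in calls])                      # individual timings
```

Because every call is recorded, the cost grows with the number of invocations, which matches the "higher overhead, potentially bursty" behavior of tracing experiments.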

Parallel Experiments

- O|SS supports MPI and threaded codes
  - Tested with a variety of MPI implementations
  - Thread support based on POSIX threads
- Any collector can be applied to a parallel job
  - Automatically applied to all tasks/threads
  - Default views aggregate across all tasks/threads
  - Data from individual tasks/threads available
- Specific parallel experiments (e.g., mpi, mpit)

High-level Architecture

[Figure: high-level architecture. Interfaces: GUI, pyO|SS, CLI. Other labels: Code Instrumentation; Open Source Software; AMD and Intel based clusters/SSI using Linux.]

Code Instrumentation in O|SS

- Offline/External Data Collection
  - Instrument application at start-up
  - Write data to raw files and convert to O|SS
  - Performance data available at end of execution
- Online Scalable Data Collection via MRNet
  - Scalable transport layer
  - Performance data delivered directly to tool online
  - Ability for interactive online analysis and viewing intermediate results

Offline & Online Data Collection

[Figure: two data collection paths. Offline: MPI Application -> O|SS, post-mortem. Online: MPI Application -> MRNet -> O|SS.]


Three Interfaces (GUI, CLI, Python)

- Experiment Commands: expAttach, expCreate, expDetach, expGo, expView
- List Commands: list -v exp, list -v hosts, list -v status
- Session Commands: setBreak, openGui

Python scripting example:

    import openss as oss

    my_filename = oss.FileList("myprog.a.out")
    my_exptype = oss.ExpTypeList("pcsamp")
    my_id = oss.expCreate(my_filename, my_exptype)

    oss.expGo()

    my_metric_list = oss.MetricList("exclusive")
    my_viewtype = oss.ViewTypeList("pcsamp")
    result = oss.expView(my_id, my_viewtype, my_metric_list)

Running an Experiment

- Running a simple example experiment
  - Examine the command syntax
  - List the outputs from the experiment
- Viewing and interpreting gathered measurements
  - GUI, CLI via the experiment database file
- Show "-offline" example in more detail
- Introduce additional command syntax

Basic offline experiment syntax

    openss -offline -f "executable" pcsamp

- openss is the command to invoke Open|SpeedShop
- -offline indicates the user interface to use (immediate command)
  - There are a number of user interface options
- -f is the option for specifying the executable name
  - The "executable" can be a sequential or parallel command
- pcsamp indicates what type of performance data (metric) you will gather
  - Here pcsamp indicates that we will periodically take a sample of the address that the program counter is pointing to.
  - We will associate that address with a function and/or source line.
  - There are several existing performance metric choices

What are the outputs?

Outputs from: openss -offline -f "executable" pcsamp

- Normal program output while executable is running
- The sorted list of performance information
  - A list of the top time-taking functions
  - The corresponding sample-derived time for each function
- A performance information database file
  - The database file contains all the information needed to view the data at any time in the future without the executable(s):
    - Symbol table information from executable(s) and system libraries
    - Performance data openss gathered
    - Time stamps for when dso(s) were loaded and unloaded

Example Parallel Run with Output

    openss -offline -f "orterun -np 128 sweep3d.mpi" pcsamp

Output from Example Run

    openss -offline -f "orterun -np 128 sweep3d.mpi" pcsamp

Using the Database file

- Database file is one of the outputs from running: openss -offline -f "executable" pcsamp
- Use this file to view the data
- How to open the database file with openss
  - openss -f <database file name>
  - openss (then use menus or wizard to open)
  - openss -cli
    - exprestore -f <database file name>
- In this example, we show both:
  - openss -cli -f X.0.openss (CLI)
  - openss -f X.0.openss (GUI)
  - X.0.openss is the file name openss creates by default

Output from Example Run

Loading the database file: openss -cli -f X.0.openss

Process Management Panel

Control your job, focus stats panel, create process subsets

Default Stats Panel View

openss -f X.0.openss: performance statistics by function is the default view

Results map to Source

Split screen mapping of performance data to source line

Min, Max, Average (Load Balance) View

Select "LB" in Toolbar to generate Load Balance View
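
What a min/max/average view computes can be sketched in a few lines. The per-rank numbers and function names below are made up, and the output layout is an assumption for illustration; the real Load Balance View is generated inside the O|SS GUI.

```python
# Exclusive time in seconds per function for each MPI rank (made-up numbers).
times_by_rank = {
    0: {"sweep": 4.0, "source": 1.0},
    1: {"sweep": 6.0, "source": 1.0},
    2: {"sweep": 5.0, "source": 4.0},
}


def load_balance(times_by_rank):
    """Return {function: (min, rank_of_min, max, rank_of_max, average)}."""
    functions = sorted({f for per_rank in times_by_rank.values() for f in per_rank})
    result = {}
    for func in functions:
        values = {rank: per_rank.get(func, 0.0)
                  for rank, per_rank in times_by_rank.items()}
        min_rank = min(values, key=values.get)
        max_rank = max(values, key=values.get)
        average = sum(values.values()) / len(values)
        result[func] = (values[min_rank], min_rank, values[max_rank], max_rank, average)
    return result


for func, (lo, lo_rank, hi, hi_rank, avg) in load_balance(times_by_rank).items():
    print(f"{func:8s} min={lo:.1f} (rank {lo_rank}) "
          f"max={hi:.1f} (rank {hi_rank}) avg={avg:.1f}")
```

A large gap between the maximum and the average for a function is the usual sign that a few ranks carry more work than the rest.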

Comparative Analysis: Clustering Ranks

Select "CA" in Toolbar to generate Comp. Analysis View
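
The slides do not say which clustering algorithm the Comparative Analysis View uses. As a stand-in, the sketch below groups ranks whose times for one function lie close together, using a simple gap threshold on sorted values. The function name, the tolerance parameter, and the data are all hypothetical.

```python
def cluster_ranks(time_by_rank, tolerance):
    """Group ranks whose sorted times are within `tolerance` of their neighbor.

    A simple 1-D gap-based grouping used as a stand-in; not the O|SS algorithm.
    """
    ordered = sorted(time_by_rank.items(), key=lambda item: item[1])
    clusters = []
    previous_time = None
    for rank, value in ordered:
        if previous_time is None or value - previous_time > tolerance:
            clusters.append([])          # gap too large: start a new cluster
        clusters[-1].append(rank)
        previous_time = value
    return clusters


# Made-up per-rank times in seconds for one function.
time_by_rank = {0: 5.1, 1: 5.0, 2: 9.8, 3: 5.2, 4: 10.1}
print(cluster_ranks(time_by_rank, tolerance=0.5))
```

Ranks that land in the same group behave alike, so only one representative per group needs closer inspection.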


Additional experiment syntax

- openss -offline -f "executable" pcsamp
  - -offline indicates the user interface is immediate command mode.
  - Uses offline (LD_PRELOAD) collection mechanism.
- openss -cli -f "executable" pcsamp
  - -cli indicates the user interface is interactive command line.
  - Uses online (dynamic instrumentation) collection mechanism.
- openss -f "executable" pcsamp
  - No interface option indicates the user interface is the graphical user interface.
  - Uses online (dynamic instrumentation) collection mechanism.
- openss -batch < input.commands.file
  - Executes from file of CLI commands
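
When launching experiments from a job script, it can help to assemble the command line programmatically. The sketch below only builds and prints the argument list for the three `-f` invocations listed above; it does not run openss. The helper name `openss_command` and the `MODES` table are hypothetical conveniences, while the options themselves are taken from the slide.

```python
import shlex

# Interface option -> (user interface, collection mechanism), as listed above.
MODES = {
    "-offline": ("immediate command", "offline (LD_PRELOAD)"),
    "-cli": ("interactive command line", "online (dynamic instrumentation)"),
    None: ("graphical user interface", "online (dynamic instrumentation)"),
}


def openss_command(executable, experiment="pcsamp", interface="-offline"):
    """Return the argv list for one of the invocations shown on the slide."""
    if interface not in MODES:
        raise ValueError(f"unknown interface option: {interface!r}")
    argv = ["openss"]
    if interface is not None:
        argv.append(interface)
    argv += ["-f", executable, experiment]
    return argv


argv = openss_command("orterun -np 128 sweep3d.mpi")
command_line = " ".join(shlex.quote(arg) for arg in argv)
print(command_line)              # shell-quoted form
print(MODES["-offline"][1])      # collection mechanism for this mode
```

Keeping the parallel launch command as a single list element preserves the quoting that `-f "executable"` requires.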

Wizard Panel – page 1

- Gather data from new runs
- Analyze and/or compare existing data from previous runs
- O|SS Command Line Interface

Wizard Panel – Gather new data

Select type of data to be gathered by Open|SpeedShop

Compare Wizard

Side by side performance results

Compare Wizard

Side by side source for the two versions


Comparing MPI Ranks

Rank 0 Rank 1

CLI Language

- An interactive Command Line Interface
  - gdb/dbx like processing
- Several interactive commands
  - Create Experiments
  - Provide Process/Thread Control
  - View Experiment Results
- Where possible, commands execute asynchronously

http://www.openspeedshop.org/docs/cli_doc/

CLI Command Overview

- Experiment Creation
  - expcreate
  - expattach
- Experiment Control
  - expgo
  - expwait
  - expdisable
  - expenable
- Experiment Storage
  - expsave
  - exprestore
- Result Presentation
  - expview
  - opengui
- Misc. Commands
  - help
  - list
  - log
  - record
  - playback
  - history
  - quit

User-Time Example

    lnx-jeg.americas.sgi.com-17>openss -cli
    openss>>Welcome to OpenSpeedShop 1.9
    openss>>expcreate -f test/executables/fred/fred usertime
    The new focused experiment identifier is:  -x 1
    openss>>expgo
    Start asynchronous execution of experiment:  -x 1
    openss>>Experiment 1 has terminated.

- expcreate: create experiments and load application
- expgo: start application

Showing CLI Results

    openss>>expview
    Excl CPU time  Inclu CPU time  % of Total Exclusive  Function
    in seconds.    in seconds.     CPU Time              (defining location)
           5.2571          5.2571               49.7297  f3 (fred: f3.c,2)
           3.3429          3.3429               31.6216  f2 (fred: f2.c,2)
           1.9714          1.9714               18.6486  f1 (fred: f1.c,2)
           0.0000         10.5714                0.0000  __libc_start_main (libc.so.6)
           0.0000         10.5714                0.0000  _start (fred)
           0.0000         10.5429                0.0000  work (fred: work.c,2)
           0.0000         10.5714                0.0000  main (fred: fred.c,5)
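
Text output like this is easy to post-process. The sketch below parses a few rows copied from the expview listing into Python dictionaries using a regular expression. The column layout is inferred from this one example, so treat the pattern as an assumption rather than a documented output format.

```python
import re

# Sample rows copied from the expview listing above.
EXPVIEW_ROWS = """\
5.2571 5.2571 49.7297 f3 (fred: f3.c,2)
0.0000 10.5714 0.0000 __libc_start_main (libc.so.6)
0.0000 10.5714 0.0000 main (fred: fred.c,5)
"""

ROW = re.compile(
    r"^\s*(?P<excl>\d+\.\d+)\s+(?P<incl>\d+\.\d+)\s+(?P<pct>\d+\.\d+)\s+"
    r"(?P<func>\S+)\s*\((?P<location>[^)]*)\)\s*$"
)


def parse_expview(text):
    """Parse usertime-style rows into dictionaries; skip lines that don't match."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append({
                "exclusive_s": float(match["excl"]),
                "inclusive_s": float(match["incl"]),
                "percent": float(match["pct"]),
                "function": match["func"],
                "location": match["location"],
            })
    return rows


rows = parse_expview(EXPVIEW_ROWS)
hot = [r["function"] for r in rows if r["exclusive_s"] > 0]
print(hot)
```

Header lines and prompts do not match the pattern and are skipped, so the full listing can be passed in unchanged.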

CLI Batch Scripting (1)

- Create batch file with CLI commands
  - Plain text file
  - Example:

    # Create batch file
    echo expcreate -f fred pcsamp >> input.script
    echo expgo >> input.script
    echo expview pcsamp10 >> input.script

    # Run OpenSpeedShop
    openss -batch < input.script
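
The same batch file can be generated from Python, which is convenient when the executable or experiment type varies between runs. The sketch below writes the three CLI commands from the example to a plain text file in a temporary directory and prints it. It does not invoke openss, and the helper name `write_batch_script` is hypothetical.

```python
import os
import tempfile


def write_batch_script(path, executable, experiment="pcsamp", view="pcsamp10"):
    """Write the three CLI commands from the example above to a plain text file."""
    commands = [
        f"expcreate -f {executable} {experiment}",
        "expgo",
        f"expview {view}",
    ]
    with open(path, "w") as handle:
        handle.write("\n".join(commands) + "\n")
    return commands


script_path = os.path.join(tempfile.mkdtemp(), "input.script")
write_batch_script(script_path, "fred")
with open(script_path) as handle:
    print(handle.read(), end="")
# The file can then be fed to: openss -batch < input.script
```

Unlike the shell example, which appends with `>>`, this version overwrites the file on each call so that commands from an earlier run cannot accumulate.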

CLI Batch Scripting (2)

Open|SpeedShop Batch Example Results

    The new focused experiment identifier is:  -x 1
    Start asynchronous execution of experiment:  -x 1
    Experiment 1 has terminated.

      CPU Time  Function (defining location)
       24.2700  f3 (mutatee: mutatee.c,24)
       16.0000  f2 (mutatee: mutatee.c,15)
        8.9400  f1 (mutatee: mutatee.c,6)
        0.0200  work (mutatee: mutatee.c,33)

CLI Batch Scripting (3)

Open|SpeedShop Batch Example: direct

    # Run Open|SpeedShop as a single non-interactive command
    openss -batch -f fred pcsamp

    The new focused experiment identifier is:  -x 1
    Start asynchronous execution of experiment:  -x 1
    Experiment 1 has terminated.

      CPU Time  Function (defining location)
       24.2700  f3 (mutatee: mutatee.c,24)
       16.0000  f2 (mutatee: mutatee.c,15)
        8.9400  f1 (mutatee: mutatee.c,6)
        0.0200  work (mutatee: mutatee.c,33)

Python Scripting

- Open|SpeedShop Python API that executes the "same" Interactive/Batch Open|SpeedShop commands
- User can intersperse "normal" Python code with Open|SpeedShop Python API
- Run Open|SpeedShop experiments via the Open|SpeedShop Python API

Python Example (1)

- Necessary steps:
  - Import O|SS Python module
  - Prepare arguments for target application
  - Set view and experiment type
  - Create experiment

    import openss

    my_filename = openss.FileList("usability/phaseII/fred")
    my_viewtype = openss.ViewTypeList()
    my_viewtype += "pcsamp"
    exp1 = openss.expCreate(my_filename, my_viewtype)

Python Example (2)

- After experiment creation
  - Start target application (asynchronous!)
  - Wait for completion
  - Write results

    try:
        openss.expGo()
        openss.wait()
    except openss.error:
        print("expGo(exp1,my_modifer) failed")

    openss.dumpView()

Python Example Output

- Two interfaces to dump data
  - Plain text (similar to CLI) for viewing
  - As Python objects for post-processing

    >python example.py
    /work/jeg/OpenSpeedShop/usability/phaseII/fred: successfully completed.

    Excl. CPU time  % of CPU Time  Function (def. location)
            4.6700        47.7994  f3 (fred: f3.c,23)
            3.5100        35.9263  f2 (fred: f2.c,2)
            1.5900        16.2743  f1 (fred: f1.c,2)

Extensibility

- O|SS is more than a performance tool
  - All functionality in one toolset with one interface
  - General infrastructure to create new tools
- Plugins to add new functionality
  - Cover all essential steps of performance analysis
  - Automatically loaded at O|SS startup
- Three types of plugins
  - Collectors: How to acquire performance data?
  - Views: How to aggregate and present data?
  - Panels: How to visualize data in the GUI?
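
The division of labor between plugin types can be illustrated with a toy registry. This is a conceptual Python sketch only: O|SS plugins are not written this way, and every name below (the registry, the decorator, the stub classes, the sample data) is hypothetical.

```python
PLUGIN_KINDS = ("collector", "view", "panel")
registry = {kind: {} for kind in PLUGIN_KINDS}


def plugin(kind, name):
    """Class decorator that registers a plugin under one of the three kinds."""
    if kind not in PLUGIN_KINDS:
        raise ValueError(f"unknown plugin kind: {kind}")

    def register(cls):
        registry[kind][name] = cls
        return cls
    return register


@plugin("collector", "pcsamp")
class PcSampCollector:
    """How to acquire performance data (hypothetical stub)."""
    def collect(self):
        return [("f2", 3), ("f3", 5)]   # made-up (function, sample count) pairs


@plugin("view", "top")
class TopView:
    """How to aggregate and present data (hypothetical stub)."""
    def render(self, data):
        ranked = sorted(data, key=lambda pair: -pair[1])
        return [f"{count:3d} {func}" for func, count in ranked]


data = registry["collector"]["pcsamp"]().collect()
print("\n".join(registry["view"]["top"]().render(data)))
```

Because collectors and views only meet through the data they exchange, a new view can be added without touching any collector, which is the property the plugin design aims for.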

Overview Summary

- Two techniques for instrumentation
  - Online vs. Offline
  - Different strengths for different target scenarios
- Flexible GUI that can be customized
- Several compatible scripting options
  - Command Line Language
  - Direct batch interface
  - Integration of O|SS into Python
- GUI and scripting interoperable
- Plugin concept to extend Open|SpeedShop

Status & Future Plans

- Open|SpeedShop 1.9 available shortly
  - Packages and source from sourceforge.net
  - Tested on a variety of platforms
- Offline version featured in version 1.9
- Online (MRNet) work in progress
  - Target is version 2.0 in December
  - Working on some platforms but not all
- Focus on Scalability in coming months
- Support for capability machines via Office of Science proposal with ASC assistance

Availability and Contact

- Open|SpeedShop website: http://www.openspeedshop.org/
- Installed on cfe1.nas.nasa.gov
- Download options:
  - Package with Install Script
  - Source for tool and base libraries
- Feedback
  - Bug tracking and contact info available from website
  - Feel free to contact presenters directly: [email protected] and/or [email protected] (@krellinst.org)