Recent Advances in the TAU Performance System (Sameer Shende, Allen D. Malony, University of Oregon)
![Page 1: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/1.jpg)
Recent Advances in the TAU Performance System
Sameer Shende, Allen D. Malony
University of Oregon
![Page 2: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/2.jpg)
TAU Performance System Framework
Tuning and Analysis Utilities
Performance system framework for scalable parallel and distributed high-performance computing
Targets a general complex system computation model
  nodes / contexts / threads
  Multi-level: system / software / parallelism
  Measurement and analysis abstraction
Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
  Portable, configurable performance profiling/tracing facility
  Open software approach
University of Oregon, LANL, FZJ Germany
http://www.cs.uoregon.edu/research/paracomp/tau
![Page 3: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/3.jpg)
TAU Performance System Architecture
[Figure: TAU architecture diagram; legible labels: EPILOG, Paraver]
![Page 4: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/4.jpg)
Program Database Toolkit (PDT)
Program code analysis framework for developing source-based tools
High-level interface to source code information
Integrated toolkit for source code parsing, database creation, and database query
  commercial-grade front-end parsers
  portable IL analyzer, database format, and access API
  open software approach for tool development
Target and integrate multiple source languages
Use in TAU to build automated performance instrumentation tools
![Page 5: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/5.jpg)
PDT Architecture and Tools
[Figure: PDT architecture diagram; legible labels: C/C++, Fortran 77/90]
![Page 6: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/6.jpg)
PDT Components
Language front end
  Edison Design Group (EDG): C (C99), C++
  Mutek Solutions Ltd.: F77, F90
  creates an intermediate-language (IL) tree
IL Analyzer
  processes the intermediate-language (IL) tree
  creates a “program database” (PDB) formatted file
DUCTAPE (C++ program Database Utilities and Conversion Tools APplication Environment)
  processes and merges PDB files
  C++ library to access the PDB for PDT applications
![Page 7: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/7.jpg)
New Features in TAU
Instrumentation
  OPARI – OpenMP directive rewriting approach [POMP, FZJ]
  Selective instrumentation – grouping, include/exclude lists
  tau_reduce – rule-based detection of high-overhead lightweight routines
Measurement
  PAPI [UTK] – support for multiple hardware counters/time
  Callpath profiling (1-level)
  Native generation of EPILOG traces [EXPERT, FZJ]
Analysis
  Support for Paraver [CEPBA] trace visualizer
  jracy – new Java-based profile browser in TAU
Availability
  New platforms and compilers supported (NEC, Hitachi, Intel)
![Page 8: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/8.jpg)
TAU Instrumentation
Flexible instrumentation mechanisms at multiple levels
Source code
  manual
  automatic, using Program Database Toolkit (PDT) or OPARI (for OpenMP programs)
Object code
  pre-instrumented libraries (e.g., MPI using PMPI)
  statically linked
  dynamically linked (e.g., virtual machine instrumentation)
  fast breakpoints (compiler generated)
Executable code
  dynamic instrumentation (pre-execution) using DynInstAPI
![Page 9: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/9.jpg)
Instrumentation of OpenMP Constructs
OPARI: OpenMP Pragma And Region Instrumentor
Source-to-source translator that inserts POMP calls around OpenMP constructs and API functions
Done: supports
  Fortran77 and Fortran90, OpenMP 2.0
  C and C++, OpenMP 1.0
  POMP extensions
  EPILOG and TAU POMP implementations
  Preserves source code information (#line line file)
Work in progress: investigating standardization through OpenMP Forum
![Page 10: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/10.jpg)
OpenMP API Instrumentation
Transform
  omp_#_lock() -> pomp_#_lock()
  omp_#_nest_lock() -> pomp_#_nest_lock()
  [ # = init | destroy | set | unset | test ]
POMP version
  Calls the omp version internally
  Can perform additional work (e.g., measurement) before and after the call
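The renaming and wrapping described above can be sketched in Python. This is only an analogy under stated assumptions: OPARI itself works on C, C++ and Fortran source, and the names `rewrite_lock_calls`, `pomp_wrapper`, `events`, and the stand-in `omp_set_lock` are hypothetical, not part of OPARI or POMP. The text-based rename is also naive, since it would touch matches inside comments and string literals.

```python
import functools
import re

# Matches the ten lock routines named on the slide:
# omp_{init,destroy,set,unset,test}_lock and their _nest_ variants.
LOCK_CALL = re.compile(r"\bomp_(?:init|destroy|set|unset|test)(?:_nest)?_lock\b")

def rewrite_lock_calls(source):
    """Rename omp_#_lock to pomp_#_lock by prefixing a 'p' (naive, text-based)."""
    return LOCK_CALL.sub(lambda m: "p" + m.group(0), source)

# Hypothetical event log standing in for a measurement library.
events = []

def pomp_wrapper(omp_func):
    """Build a wrapper that records events around a call to omp_func."""
    @functools.wraps(omp_func)
    def wrapper(*args, **kwargs):
        events.append(("enter", omp_func.__name__))   # extra work before the call
        try:
            return omp_func(*args, **kwargs)          # the wrapped routine does the real work
        finally:
            events.append(("exit", omp_func.__name__))  # extra work after the call
    return wrapper

def omp_set_lock(lock):
    """Stand-in for the real routine; here it only flips a flag."""
    lock["held"] = True

pomp_set_lock = pomp_wrapper(omp_set_lock)

print(rewrite_lock_calls("omp_set_lock(&l); omp_unset_nest_lock(&n);"))
lock = {"held": False}
pomp_set_lock(lock)
print(events)
```

The word-boundary anchor keeps an already rewritten `pomp_set_lock` from being prefixed a second time, and other `omp_` routines are left alone.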
![Page 11: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/11.jpg)
Example: !$OMP PARALLEL DO Instrumentation
Original:
!$OMP PARALLEL DO clauses...
      do loop
!$OMP END PARALLEL DO
Instrumented (the combined directive is split so that the parallel region, the loop, and the implicit barrier can each be measured):
      call pomp_parallel_fork(d)
!$OMP PARALLEL other-clauses...
      call pomp_parallel_begin(d)
      call pomp_do_enter(d)
!$OMP DO schedule-clauses, ordered-clauses, lastprivate-clauses
      do loop
!$OMP END DO NOWAIT
      call pomp_barrier_enter(d)
!$OMP BARRIER
      call pomp_barrier_exit(d)
      call pomp_do_exit(d)
      call pomp_parallel_end(d)
!$OMP END PARALLEL
      call pomp_parallel_join(d)
![Page 12: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/12.jpg)
Opari Instrumentation: Example
OpenMP directive instrumentation
Ocean Current Circulation [Tim Kaiser, SDSC]
pomp_for_enter(&omp_rd_2);
#line 252 "stommel.c"
#pragma omp for schedule(static) reduction(+: diff) private(j) firstprivate (a1,a2,a3,a4,a5) nowait
for( i=i1;i<=i2;i++) {
for(j=j1;j<=j2;j++){
new_psi[i][j]=a1*psi[i+1][j] + a2*psi[i-1][j] + a3*psi[i][j+1]
+ a4*psi[i][j-1] - a5*the_for[i][j];
diff=diff+fabs(new_psi[i][j]-psi[i][j]);
}
}
pomp_barrier_enter(&omp_rd_2);
#pragma omp barrier
pomp_barrier_exit(&omp_rd_2);
pomp_for_exit(&omp_rd_2);
#line 261 "stommel.c"
![Page 13: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/13.jpg)
OPARI: Basic Usage (f90)
Reset OPARI state information
  rm -f opari.rc
Call OPARI for each input source file
  opari file1.f90
  ...
  opari fileN.f90
Generate OPARI runtime table, compile it with ANSI C
  opari -table opari.tab.c
  cc -c opari.tab.c
Compile modified files *.mod.f90 using OpenMP
Link the resulting object files, the OPARI runtime table opari.tab.o, and the TAU POMP RTL
![Page 14: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/14.jpg)
OPARI: Makefile Template (C/C++)
OMPCC  = ... # insert C OpenMP compiler here
OMPCXX = ... # insert C++ OpenMP compiler here

.c.o:
	opari $<
	$(OMPCC) $(CFLAGS) -c $*.mod.c

.cc.o:
	opari $<
	$(OMPCXX) $(CXXFLAGS) -c $*.mod.cc

opari.init:
	rm -rf opari.rc

opari.tab.o:
	opari -table opari.tab.c
	$(CC) -c opari.tab.c

myprog: opari.init myfile*.o ... opari.tab.o
	$(OMPCC) -o myprog myfile*.o opari.tab.o -lpomp

myfile1.o: myfile1.c myheader.h
myfile2.o: ...
![Page 15: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/15.jpg)
Tracing Hybrid Executions – TAU and Vampir
![Page 16: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/16.jpg)
Profiling Hybrid Executions
![Page 17: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/17.jpg)
Instrumentation Control
Selection of which performance events to observe
  Could depend on scope, type, level of interest
  Could depend on instrumentation overhead
How is selection supported in the instrumentation system?
  No choice
  Include / exclude lists (TAU)
  Environment variables
  Static vs. dynamic
Problem: controlling instrumentation of small routines
  High relative measurement overhead
  Significant intrusion and possible perturbation
![Page 18: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/18.jpg)
Instrumentation Control: Grouping
Profile groups
  A group of related routines forms a profile group
Statically defined
  TAU_DEFAULT, TAU_USER[1-5], TAU_MESSAGE, TAU_IO, …
Dynamically defined
  Group name based on a string: "integrator", "particles"
  Runtime lookup in a map to get a unique group identifier
  tau_instrumentor file.pdb file.cpp -o file.i.cpp -g "particles"
    Assigns all routines in file.cpp to group "particles"
Ability to change group names at runtime
Instrumentation control based on profile groups
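A minimal Python sketch of the runtime lookup for dynamically defined groups follows. It only illustrates the idea of a name-to-identifier map; the class `GroupRegistry`, its method name, and the choice of small consecutive integers as identifiers are assumptions made here, not TAU's actual data structures.

```python
class GroupRegistry:
    """Maps dynamically defined group names to unique identifiers."""

    def __init__(self):
        self._ids = {}   # group name -> identifier

    def get_profile_group(self, name):
        """Return the identifier for name, allocating a new one on first use."""
        if name not in self._ids:
            self._ids[name] = len(self._ids) + 1
        return self._ids[name]

registry = GroupRegistry()
particles = registry.get_profile_group("particles")
integrator = registry.get_profile_group("integrator")
# A second lookup of the same name yields the same identifier.
print(particles, integrator, registry.get_profile_group("particles"))
```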
![Page 19: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/19.jpg)
TAU Instrumentation Control API
Enabling profile groups
  TAU_ENABLE_INSTRUMENTATION();          // global control
  TAU_ENABLE_GROUP(TAU_GROUP);           // statically defined
  TAU_ENABLE_GROUP_NAME("group name");   // dynamic
  TAU_ENABLE_ALL_GROUPS();               // for all groups
Disabling profile groups
  TAU_DISABLE_INSTRUMENTATION();
  TAU_DISABLE_GROUP(TAU_GROUP);
  TAU_DISABLE_GROUP_NAME("group name");
  TAU_DISABLE_ALL_GROUPS();
Obtaining a profile group identifier
  TAU_GET_PROFILE_GROUP("group name");
Runtime switching of profile groups
  TAU_PROFILE_SET_GROUP(TAU_GROUP);
  TAU_PROFILE_SET_GROUP_NAME("group name");
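How a global switch and per-group switches might combine can be sketched in Python. This is a hypothetical model of the semantics suggested by the macro names above, not TAU's implementation; the class `InstrumentationControl` and its methods are invented for illustration, and whether all groups start enabled is an assumption.

```python
class InstrumentationControl:
    """Tracks which profile groups are currently measured."""

    def __init__(self, known_groups):
        self.known_groups = set(known_groups)
        self.enabled_groups = set(known_groups)   # assumption: everything on by default
        self.globally_enabled = True              # the global on/off switch

    def enable_group(self, group):
        self.known_groups.add(group)
        self.enabled_groups.add(group)

    def disable_group(self, group):
        self.enabled_groups.discard(group)

    def enable_all_groups(self):
        self.enabled_groups = set(self.known_groups)

    def disable_all_groups(self):
        self.enabled_groups.clear()

    def is_measured(self, group):
        """A routine is measured only if both switches allow it."""
        return self.globally_enabled and group in self.enabled_groups

control = InstrumentationControl(["TAU_DEFAULT", "TAU_IO", "particles"])
control.disable_group("TAU_IO")
print(control.is_measured("particles"), control.is_measured("TAU_IO"))
```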
![Page 20: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/20.jpg)
TAU Pre-execution Instrumentation Control
Dynamic groups defined at file scope
Group names and group associations may be modified at runtime
Controlling groups at pre-execution time using the --profile <group1+group2+…+groupN> option
  % tau_instrumentor app.pdb app.cpp -o app.i.cpp -g "particles"
  % mpirun -np 4 application --profile particles+field+mesh+io
  Enables instrumentation for TAU_DEFAULT and the particles, field, mesh, and io groups
Examples:
  POOMA v1 (LANL): static groups used
  VTF (ASAP Caltech): dynamic execution instrumentation control by a Python-based controller
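As a rough illustration of the `--profile` option, the following Python sketch splits the group list on `+` and always keeps TAU_DEFAULT enabled, as the slide describes. The function name `parse_profile_option` and the handling of a missing or empty list are assumptions made here; TAU's own argument processing may differ.

```python
def parse_profile_option(argv, default_group="TAU_DEFAULT"):
    """Return the set of groups enabled by '--profile g1+g2+...' in argv."""
    enabled = {default_group}   # the default group stays on
    for i, arg in enumerate(argv):
        if arg == "--profile" and i + 1 < len(argv):
            # Split the list on '+', ignoring empty pieces.
            enabled.update(g for g in argv[i + 1].split("+") if g)
    return enabled

groups = parse_profile_option(
    ["application", "--profile", "particles+field+mesh+io"])
print(sorted(groups))
```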
![Page 21: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/21.jpg)
Selective Instrumentation: Include/Exclude Lists
% tau_instrumentor
Usage : tau_instrumentor <pdbfile> <sourcefile> [-o <outputfile>] [-noinline] [-g groupname] [-i headerfile] [-c|-c++|-fortran] [-f <instr_req_file> ]
For selective instrumentation, use the -f option
% cat selective.dat
# Selective instrumentation: Specify an exclude/include list.
BEGIN_EXCLUDE_LIST
void quicksort(int *, int, int)
void sort_5elements(int *)
void interchange(int *, int *)
END_EXCLUDE_LIST
# If an include list is specified, the routines in the list will be the only
# routines that are instrumented.
# To specify an include list (a list of routines that will be instrumented)
# remove the leading # to uncomment the following lines
#BEGIN_INCLUDE_LIST
#int main(int, char **)
#int select_
#END_INCLUDE_LIST
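The file format shown above can be read with a few lines of Python. This sketch is a guess at reasonable parsing rules based only on the example file: lines starting with `#` are treated as comments, and a non-empty include list takes precedence over the exclude list. The function names `parse_selective_file` and `should_instrument` are hypothetical, and the real tau_instrumentor may match routine signatures differently.

```python
def parse_selective_file(text):
    """Return (exclude, include) lists of routine signatures."""
    exclude, include = [], []
    current = None                      # list being filled, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                    # blank line or comment
        if line == "BEGIN_EXCLUDE_LIST":
            current = exclude
        elif line == "BEGIN_INCLUDE_LIST":
            current = include
        elif line in ("END_EXCLUDE_LIST", "END_INCLUDE_LIST"):
            current = None
        elif current is not None:
            current.append(line)
    return exclude, include

def should_instrument(signature, exclude, include):
    """If an include list is given, only its routines are instrumented."""
    if include:
        return signature in include
    return signature not in exclude

SELECTIVE = """\
# Selective instrumentation: Specify an exclude/include list.
BEGIN_EXCLUDE_LIST
void quicksort(int *, int, int)
void sort_5elements(int *)
void interchange(int *, int *)
END_EXCLUDE_LIST
"""
exclude, include = parse_selective_file(SELECTIVE)
print(should_instrument("void quicksort(int *, int, int)", exclude, include))
print(should_instrument("int main(int, char **)", exclude, include))
```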
![Page 22: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/22.jpg)
Rule-Based Overhead Analysis (N. Trebon, UO)
Analyze the performance data to determine events whose performance measurements have high (relative) overhead
Create a select list for excluding those events
Rule grammar (used in TAUreduce tool)
  [GroupName:] Field Operator Number
  GroupName indicates the rule applies to events in that group
  Field is an event metric attribute (from profile statistics)
    numcalls, numsubs, percent, usec, cumusec, count [PAPI], totalcount, stdev, usecs/call, counts/call
  Operator is one of >, <, or =
  Number is any number
  Compound rules are possible using & between simple rules
![Page 23: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/23.jpg)
Example Rules
#Exclude all events that are members of TAU_USER
#and use less than 1000 microseconds
TAU_USER:usec < 1000
#Exclude all events that have less than 1000
#microseconds and are called only once
usec < 1000 & numcalls = 1
#Exclude all events that have less than 1000 usecs per
#call OR have a (total inclusive) percent less than 5
usecs/call < 1000
percent < 5
Scientific notation can be used
usec>1000 & numcalls>400000 & usecs/call<30 & percent>25
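The rule grammar can be made concrete with a small Python evaluator. This is a sketch written from the grammar and examples on these slides, not the tau_reduce source: terms joined by `&` must all hold, separate rule lines act as alternatives (OR), and a group prefix restricts only the term it is attached to. The function names and the event records below are hypothetical.

```python
import operator
import re

OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}
# [GroupName:] Field Operator Number
SIMPLE = re.compile(
    r"^\s*(?:(\w+)\s*:)?\s*([\w/]+)\s*([<>=])\s*([-+.\deE]+)\s*$")

def parse_rule(line):
    """Parse 'A & B & ...' into a list of (group, field, op, number) terms."""
    terms = []
    for part in line.split("&"):
        m = SIMPLE.match(part)
        if m is None:
            raise ValueError(f"bad rule: {part!r}")
        group, field, op, number = m.groups()
        terms.append((group, field, OPS[op], float(number)))
    return terms

def rule_matches(terms, event):
    """A compound rule matches when every term matches (logical AND)."""
    for group, field, op, number in terms:
        if group is not None and group not in event["groups"]:
            return False
        if not op(event[field], number):
            return False
    return True

def excluded(rules_text, events):
    """Names of events matched by any rule line (logical OR across lines)."""
    rules = [parse_rule(line) for line in rules_text.splitlines()
             if line.strip() and not line.lstrip().startswith("#")]
    return [e["name"] for e in events
            if any(rule_matches(r, e) for r in rules)]

# Hypothetical profile statistics, invented for this example.
EVENTS = [
    {"name": "interchange", "groups": {"TAU_USER"},
     "usec": 500.0, "numcalls": 100000},
    {"name": "main", "groups": {"TAU_DEFAULT"},
     "usec": 5e6, "numcalls": 1},
]
RULES = "TAU_USER:usec < 1000\nusec < 1000 & numcalls = 1\n"
print(excluded(RULES, EVENTS))
```

Because numbers are converted with `float`, a threshold written as `4e5` is accepted as well as `400000`.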
![Page 24: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/24.jpg)
TAU Measurement: Integration with PAPI
Uniform access to hardware performance counters, wall clock, and process virtual time
PAPI (Performance API) (UTK, Ptools Consortium)
  consistent, portable API
Support for measuring multiple counter/timing metrics (up to 25) in a single experiment
  -MULTIPLECOUNTERS TAU configuration option
![Page 25: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/25.jpg)
TAU Measurement System Configuration
configure [OPTIONS]
  {-c++=<CC>, -cc=<cc>}          Specify C++ and C compilers
  {-pthread, -sproc}             Use pthread or SGI sproc threads
  -openmp                        Use OpenMP threads
  -opari=<dir>                   Specify location of Opari OpenMP tool
  -papi=<dir>                    Specify location of PAPI
  -pdt=<dir>                     Specify location of PDT
  {-mpiinc=<d>, -mpilib=<d>}     Specify MPI library instrumentation
  -TRACE                         Generate TAU event traces
  -PROFILE                       Generate TAU profiles
  -PROFILECALLPATH               Generate callpath profiles (1-level)
  -MULTIPLECOUNTERS              Use more than one hardware counter
  -CPUTIME                       Use usertime+system time
  -PAPIWALLCLOCK                 Use PAPI to access wallclock time
  -PAPIVIRTUAL                   Use PAPI for virtual (user) time
  …
![Page 26: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/26.jpg)
TAU Measurement Configuration – Examples
./configure -c++=xlC -cc=xlc -pdt=/usr/packages/pdtoolkit-2.1 -papi=/usr/packages/papi -mpi
  Use TAU with IBM’s xlC compiler, PDT, PAPI, and MPI
  Enable TAU profiling (default)
./configure -TRACE -PROFILE
  Enable both TAU profiling and tracing
./configure -c++=CC -cc=cc -MULTIPLECOUNTERS -papi=/usr/local/packages/papi -opari=/usr/local/opari-pomp-1.1 -mpiinc=/usr/packages/mpich/include -PAPIWALLCLOCK -mpilib=/usr/packages/mpich/lib -PAPIVIRTUAL -useropt=-O2
  Use OpenMP+MPI using SGI’s compiler suite and Opari, and use PAPI for accessing hardware performance counters, wallclock, and process virtual time for measurements
Typically configure multiple measurement libraries
![Page 27: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/27.jpg)
Setup: Running Applications
% setenv PROFILEDIR /home/data/experiments/profile/01
% setenv TRACEDIR /home/data/experiments/trace/01 (optional)
% set path=($path <taudir>/<arch>/bin)
% setenv LD_LIBRARY_PATH $LD_LIBRARY_PATH\:<taudir>/<arch>/lib
For PAPI (1 counter):
% setenv PAPI_EVENT PAPI_FP_INS
For PAPI (multiplecounters):
% setenv COUNTER1 PAPI_FP_INS (PAPI’s Floating point ins)
% setenv COUNTER2 PAPI_L1_DCM (PAPI’s L1 Data cache misses)
% setenv COUNTER3 P_VIRTUAL_TIME (PAPI’s virtual time)
% setenv COUNTER4 P_WALL_CLOCK_TIME (PAPI’s Wallclock time)
% mpirun -np <n> <application>
% llsubmit job.sh
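The COUNTER<n> convention above can be illustrated with a short Python helper that collects the metric names from the environment. This is a sketch only: the function `read_counters` is hypothetical, the limit of 25 is taken from the "up to 25" figure on the PAPI slide, and how TAU itself treats gaps in the numbering is not stated in the slides.

```python
import os

def read_counters(environ=None, max_counters=25):
    """Collect metric names from COUNTER1 .. COUNTER<max_counters>."""
    environ = os.environ if environ is None else environ
    metrics = []
    for n in range(1, max_counters + 1):
        name = environ.get(f"COUNTER{n}")
        if name:                       # skip unset or empty variables
            metrics.append(name)
    return metrics

# Example with an explicit mapping instead of the real environment.
env = {
    "COUNTER1": "PAPI_FP_INS",
    "COUNTER2": "PAPI_L1_DCM",
    "COUNTER3": "P_VIRTUAL_TIME",
    "COUNTER4": "P_WALL_CLOCK_TIME",
}
print(read_counters(env))
```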
![Page 28: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/28.jpg)
Profiling with PAPI (PAPI_L1_DCM, Matrix)
![Page 29: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/29.jpg)
Profiling with PAPI (PAPI_FP_INS, NPB - LU)
![Page 30: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/30.jpg)
Performance Mapping
Associate performance with “significant” entities (events)
Source code points are important
  Functions, regions, control flow events, user events
Execution process and thread entities are important
Some entities are more abstract, harder to measure
Consider callgraph (callpath) profiling
  Measure time (metric) along an edge (path) of callgraph
  Incident edge gives parent / child view
  Edge sequence (path) gives parent / descendant view
Problem: callpath profiling when callgraph is unknown
  Determine callgraph dynamically at runtime
  Map performance measurement to dynamic call path state
![Page 31: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/31.jpg)
1-Level Callpath Implementation in TAU
TAU maintains a performance event (routine) callstack
Profiled routine (child) looks in callstack for parent
  Previous profiled performance event is the parent
  A callpath profile structure is created the first time the parent calls the child
  TAU records the parent in a callgraph map for the child
  String representing the 1-level callpath is used as its key
    “a( )=>b( )”: name for time spent in “b” when called by “a”
Map returns pointer to callpath profile structure
  1-level callpath is profiled using this profiling data
Build upon TAU’s performance mapping technology
Measurement is independent of instrumentation
Use -PROFILECALLPATH to configure TAU
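The callstack-and-map scheme described above can be sketched in Python. This is an illustrative model, not TAU's code: the class `CallpathProfiler`, the record layout, and the injectable clock are assumptions made here. The key has the "parent()=>child()" form mentioned on the slide, and a routine with no profiled parent is keyed by its own name.

```python
import time
from collections import defaultdict

class CallpathProfiler:
    """Accumulates inclusive time per 1-level callpath."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.stack = []   # entries: (routine name, callpath key, start time)
        # callpath key -> profile record, created on first use
        self.profiles = defaultdict(lambda: {"calls": 0, "inclusive": 0.0})

    def start(self, name):
        # The previously started, still active routine is the parent.
        parent = self.stack[-1][0] if self.stack else None
        key = f"{parent}()=>{name}()" if parent else f"{name}()"
        self.stack.append((name, key, self.clock()))

    def stop(self):
        _name, key, started = self.stack.pop()
        record = self.profiles[key]
        record["calls"] += 1
        record["inclusive"] += self.clock() - started

# Deterministic clock for the example: each call returns the next value.
ticks = iter([0.0, 1.0, 4.0, 5.0, 7.0, 10.0])
profiler = CallpathProfiler(clock=lambda: next(ticks))
profiler.start("a")   # t = 0
profiler.start("b")   # t = 1
profiler.stop()       # t = 4, b ran for 3 under a
profiler.start("b")   # t = 5
profiler.stop()       # t = 7, b ran for 2 under a
profiler.stop()       # t = 10, a ran for 10
print(dict(profiler.profiles))
```

Keying the map by the callpath string means time spent in `b` is kept separately for each distinct parent, which is what distinguishes a callpath profile from a flat one.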
![Page 32: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/32.jpg)
Callpath Profiling Example (NAS LU v2.3)
% configure -PROFILECALLPATH -SGITIMERS -arch=sgi64 -mpiinc=/usr/include -mpilib=/usr/lib64 -useropt=-O2
![Page 33: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/33.jpg)
Vampir Trace Visualization Tool
Visualization and Analysis of MPI Programs
Originally developed by Forschungszentrum Jülich
Current development by Technical University Dresden
Distributed by PALLAS, Germany
http://www.pallas.de/pages/vampir.htm
![Page 34: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/34.jpg)
Vampir (NAS Parallel Benchmark – LU)
Timeline display
Callgraph display
Communications display
Parallelism display
![Page 35: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/35.jpg)
Applications: EVH1
![Page 36: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/36.jpg)
Applications: VTF (ASCI ASAP Caltech)
C++, C, F90, Python
PDT, MPI
![Page 37: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/37.jpg)
Applications: SAMRAI (LLNL)
C++
PDT, MPI
SAMRAI timers (groups)
![Page 38: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/38.jpg)
Applications: Uintah (U. Utah) (500 cpus)
TAU uses SCIRun [U. Utah] for visualization of performance data (online/offline)
![Page 39: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/39.jpg)
Applications: Uintah (contd.)
Scalability analysis
![Page 40: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/40.jpg)
TAU Performance System Status
Computing platforms
  IBM SP, SGI Origin, ASCI Red, Cray T3E, HP Tru64 (Compaq) SC, HP Superdome (HPUX), Sun, Apple, Windows, Linux (IA-32, IA-64, Alpha, PPC), Hitachi, NEC
Programming languages
  C, C++, Fortran 77/90, HPF, Java
Communication libraries
  MPI, PVM, Nexus, Tulip, ACLMPL, MPIJava
Thread libraries
  pthread, Java, Windows, SGI sproc, Tulip, SMARTS, OpenMP
Compilers
  Intel KAI (KCC, KAP/Pro), PGI, GNU, Fujitsu, HP, Sun, Microsoft, SGI, Cray, IBM, HP, Compaq, Hitachi, NEC, Intel
![Page 41: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/41.jpg)
Information
TAU (http://www.acl.lanl.gov/tau)
PDT (http://www.acl.lanl.gov/pdtoolkit)
PAPI (http://icl.cs.utk.edu/projects/papi/)
OPARI (http://www.fz-juelich.de/zam/kojak/)
![Page 42: Recent Advances in the TAU Performance System Sameer Shende , Allen D. Malony University of Oregon](https://reader034.vdocuments.net/reader034/viewer/2022051401/56814982550346895db6cbd4/html5/thumbnails/42.jpg)
Support Acknowledgement
TAU and PDT support:
Department of Energy (DOE)
  DOE 2000 ACTS contract
  DOE MICS contract
  DOE ASCI Level 3 (LANL, LLNL)
  U. of Utah DOE ASCI Level 1 subcontract
DARPA
NSF National Young Investigator (NYI) award