MPICL/ParaGraph Evaluation Report
Adam Leko, Hans Sherburne
UPC Group
HCS Research Laboratory, University of Florida

Color encoding key:
- Blue: Information
- Red: Negative note
- Green: Positive note
Basic Information
- Name: MPICL/ParaGraph
- Developer:
  - ParaGraph: University of Illinois, University of Tennessee
  - MPICL: ORNL
- Current versions: ParaGraph (no version number, but last available update 1999), MPICL 2.0
- Websites:
  - http://www.csar.uiuc.edu/software/paragraph/
  - http://www.csm.ornl.gov/picl/
- Contacts:
  - ParaGraph: Michael Heath ([email protected]), Jennifer Finger
  - MPICL: Patrick Worley ([email protected])
- Note: ParaGraph last updated 1999, MPICL last updated 2001 (both projects appear dead)
MPICL/ParaGraph Overview
MPICL:
- Trace file creation library
- Uses the MPI profiling interface
- Only records MPI commands
- Supports "custom" events via manual instrumentation
- Writes traces in the documented ASCII PICL format

ParaGraph:
- PICL trace visualization tool
- Very old tool (first written during 1989-1991)
- Offers a lot of visualizations

Analogy: MPICL corresponds to MPE, and ParaGraph to Jumpshot.
MPICL Overview
Installation is a nightmare:
- Requires knowledge of the F2C symbol naming convention (!)
- Had to edit and remove some code to work with a new version of MPICH
- Hardcoded values for certain field sizes had to be updated
- One statement in the Fortran environment setup caused instrumented programs to dump core on startup

Automatic instrumentation of MPI programs is offered via the profiling interface:
- Once installed, very easy to use
- Have to add 3 lines of code to enable creation of trace files: calls to tracefiles(), tracelevel(), and tracenode() (see ParaGraph documentation); a minor annoyance that could be done automatically
- Manual instrumentation routines are also available: calls to tracedata() and traceevent() (see ParaGraph documentation); a notion of program "phases" allows a crude form of source code correlation

Clock synchronization:
- Extra code ensures accurate clock synchronization and consistent ordering of events
- Helps prevent "tachyons" (messages shown as received before they are sent)
- Delays startup by several seconds (but is not mandatory)

After the trace file is collected, it must be sorted using tracesort.
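The tracesort step and the tachyon problem can be illustrated in miniature. This is a hedged sketch assuming a simplified record layout (timestamp, node, event kind, message tag) that stands in for the real PICL format:

```python
# Illustrative per-node trace records: (timestamp, node, kind, message tag).
# This is a simplified stand-in for MPICL's PICL record format.
events = [
    (0.50, 1, "recv", 7),
    (0.10, 0, "send", 7),
    (0.25, 0, "send", 9),
    (0.20, 1, "recv", 9),   # clock skew: appears received before it was sent
]

# tracesort-style pass: merge all records into global time order
events.sort(key=lambda e: e[0])

def find_tachyons(events):
    """Report message tags whose recv timestamp precedes the matching send."""
    send_time = {tag: t for t, _, kind, tag in events if kind == "send"}
    return [tag for t, _, kind, tag in events
            if kind == "recv" and t < send_time.get(tag, float("-inf"))]

print(find_tachyons(events))  # [9] -- tag 9 is a tachyon caused by clock skew
```

Sorting alone cannot repair skewed timestamps, which is why MPICL spends extra effort adjusting clocks at startup rather than just reordering records afterward.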
MPICL Overhead
- Instrumentation performed using the MPI profiling interface
- Used a 5 MB buffer for trace files
- On average, instrumentation is relatively intrusive, but within 20%
- Does not include overhead for synchronizing clocks
- Note: benchmarks marked with * have high variability in runtimes
[Chart: MPICL instrumentation overhead, measured as instrumented/uninstrumented runtime. All benchmarks fall between 0% and 20%: CAMEL*, NAS LU (8p, W), PP: Big message, PP: Diffuse procedure*, PP: Hot procedure*, PP: Intensive server, PP: Ping pong, PP: Random barrier, PP: Small messages*, PP: System time, PP: Wrong way*]
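The overhead metric in the chart is the ratio of instrumented to uninstrumented runtime, expressed as a percent slowdown. A minimal sketch of that arithmetic (the runtimes are illustrative, chosen to reproduce the ~18% CAMEL figure reported in the evaluation section):

```python
def overhead_pct(instrumented, uninstrumented):
    """Percent slowdown: (instrumented / uninstrumented - 1) * 100."""
    return (instrumented / uninstrumented - 1.0) * 100.0

# Illustrative runtimes in seconds, not measured values.
print(round(overhead_pct(11.8, 10.0), 1))  # 18.0
```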
ParaGraph Overview
Uses its own widget set:
- Probably necessary when it was first written in 1989
- Widgets look extremely crude by today's standards (a button is a square with text in the middle)
- Uses its own conventions; takes a bit of getting used to
- Once you adjust to the interface it becomes less of an issue, but at times the conventions become cumbersome (example: closing any child window shuts down the entire application)

ParaGraph philosophy:
- Provide as many different types of visualizations as possible
- 4 categories: utilization, communication, tasks, other
- Uses a tape-player abstraction for viewing trace data; similar to Paraver, and cumbersome when trying to maneuver to specific times
- All visualizations use a form of animation; trace data is drawn as fast as possible, which creates problems on modern machines
- A "slow motion" option is available, but doesn't work that well

Supports application-specific visualizations:
- Have to write custom code and link against it during ParaGraph compilation
ParaGraph Visualizations
Utilization visualizations:
- Display a rough estimate of processor utilization, broken down into 3 states:
  - Idle: the program is blocked waiting for a communication operation (or has stopped execution)
  - Overhead: the program is performing communication but is not blocked (time spent within the MPI library)
  - Busy: the program is executing anything other than communication
- "Busy" doesn't necessarily mean useful work is being done, since the tool assumes (not communication) := busy
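The three-state classification can be sketched as follows; the interval format and state names here are illustrative assumptions, not MPICL's actual trace representation:

```python
# Classify wall-clock time into ParaGraph's three utilization states.
def utilization(intervals):
    totals = {"idle": 0.0, "overhead": 0.0, "busy": 0.0}
    state_map = {"in_mpi_blocked": "idle",   # blocked in a communication call
                 "in_mpi": "overhead",       # inside MPI library, not blocked
                 "user": "busy"}             # everything else counts as busy
    for start, end, kind in intervals:
        totals[state_map[kind]] += end - start
    return totals

trace = [(0.0, 2.0, "user"), (2.0, 2.5, "in_mpi"), (2.5, 4.0, "in_mpi_blocked")]
print(utilization(trace))  # {'idle': 1.5, 'overhead': 0.5, 'busy': 2.0}
```

Note how the "user" bucket absorbs everything that is not communication, which is exactly why "busy" can overstate useful work.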
Communication visualizations:
- Display different aspects of communication: frequency, volume, overall pattern, etc.
- "Distance" is computed from the topology set in the options menu

Task visualizations:
- Display information about when processors start and stop tasks
- Require manually instrumented code to identify when processors start/stop tasks

Other visualizations:
- Miscellaneous things
- Can load/save a visualization window set (does not work)
Utilization Visualizations – Utilization Count
- Displays the number of processors in each state at a given moment in time
- Busy shown on bottom, overhead in middle, idle on top
Utilization Visualizations – Gantt Chart
Displays utilization state of each processor as a function of time
Utilization Visualizations – Kiviat Diagram
- Shows our friend, the Kiviat diagram
- Each spoke is a single processor
- Dark green shows the moving average, light green shows the current high watermark
- Timing parameters for each can be adjusted
- Metric shown can be "busy" or "busy + overhead"
Utilization Visualizations – Streak
- Shows "streak" of state, similar to winning/losing streaks of baseball teams
- Win = overhead or busy; loss = idle
- Not sure how useful this is
Utilization Visualizations – Utilization Summary
Shows percentage of time spent in each utilization state up to current time
Utilization Visualizations – Utilization Meter
Shows percentage of processors in each utilization state at current time
Utilization Visualizations – Concurrency Profile
- Shows histograms of the number of processors in a particular utilization state
- Ex: the diagram shows that only 1 processor was busy ~5% of the time, and all 8 processors were busy ~90% of the time
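A concurrency profile is just a normalized histogram of how many processors are in a given state at each sample. A sketch with illustrative sampled data:

```python
from collections import Counter

# Illustrative samples: number of processors busy at each point in time.
busy_counts = [8, 8, 8, 8, 8, 8, 8, 8, 8, 1]

def concurrency_profile(counts):
    """Fraction of samples during which exactly k processors were busy."""
    hist = Counter(counts)
    n = len(counts)
    return {k: hist[k] / n for k in sorted(hist)}

print(concurrency_profile(busy_counts))  # {1: 0.1, 8: 0.9}
```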
Communication Visualizations – Color Code
- The color code controls the colors used on most communication visualizations
- Color can indicate message size, message distance, or message tag
- Distance is computed from the topology set in the options menu
Communication Visualizations – Communication Traffic
- Shows overall traffic at a given time: bandwidth used, or number of messages in flight
- Can show a single node or an aggregate of all nodes
Communication Visualizations – Spacetime Diagram
- Shows a standard space-time diagram for communication: which messages were sent from node to node at which times
Communication Visualizations – Message Queues
- Shows data about message queue lengths: incoming/outgoing, number of bytes queued / number of messages queued
- Colors mean different things: the dark color shows the current moving average, the light color shows the high watermark
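The moving average and high watermark shown in this view can be sketched as follows (the window size and queue-length samples are illustrative):

```python
from collections import deque

def queue_stats(lengths, window=4):
    """Moving average (over `window` samples) and running high watermark
    of message-queue lengths.  Window size and data are illustrative."""
    recent = deque(maxlen=window)   # deque drops the oldest sample itself
    high = 0
    out = []
    for n in lengths:
        recent.append(n)
        high = max(high, n)
        out.append((sum(recent) / len(recent), high))
    return out

stats = queue_stats([0, 2, 4, 2, 1, 0])
print(stats[-1])  # (1.75, 4): moving average 1.75, high watermark 4
```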
Communication Visualizations – Communication Matrix
- Shows which processors sent data to which other processors
Communication Visualizations – Communication Meter
- Shows the percentage of communication used at the current time (message count or bandwidth)
- 100% = the maximum number of messages / maximum bandwidth used by the application at a specific time
Communication Visualizations – Animation
- Animates messages as they occur in the trace file
- Can overlay messages over a topology
- Available topologies: mesh, ring, hypercube, user-specified
- User-specified: can lay out each node as you want, and can store the layout to a file and load it later
Communication Visualizations – Node Data
- Shows detailed communication data
- Can display metrics broken down by node, message tag, message distance, or message length
- For a single node, or an aggregate of all nodes
Task Visualizations – Task Count
- Shows the number of processors executing a task at the current time
- At the end of the run, changes to show a summary of all tasks
Task Visualizations – Task Gantt
Shows Gantt chart of which task each processor was working on at a given time
Task Visualizations – Task Speed
- Similar to the Gantt chart, but displays the "speed" of each task
- Work done by a task must be recorded in the instrumentation call (not done for the example shown above)
Task Visualizations – Task Status
- Shows which tasks have started and finished at the current time
Task Visualizations – Task Summary
- Shows the percentage of time spent on each task
- Also shows any overlap between tasks
Task Visualizations – Task Surface
Shows time spent on each task by each processor
Useful for seeing load imbalance on a task-by-task basis
Task Visualizations – Task Work
- Displays work done by each processor
- Shows the rate and volume of work being done
- The example doesn't show anything because no work amounts were recorded in the trace being visualized
Other Visualizations – Clock, Coordinates
- Clock: shows the current time
- Coordinate information: shows coordinates when you click on any visualization
Other Visualizations – Critical Path
- Highlights the critical path (longest serial path) in the space-time diagram in red
- Depends on point-to-point communication (collectives can screw it up)
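Conceptually, the critical path is the longest-duration chain through a DAG of computation segments connected by messages. A sketch over an illustrative event graph (not ParaGraph's internal representation):

```python
from functools import lru_cache

# DAG of trace events; edges carry elapsed time.  Illustrative data only.
# graph[node] = [(successor, cost), ...]
graph = {
    "start": [("p0_compute", 4.0), ("p1_compute", 1.0)],
    "p0_compute": [("p1_recv", 0.5)],   # message from p0 to p1
    "p1_compute": [("p1_recv", 0.0)],
    "p1_recv": [("end", 2.0)],
}

@lru_cache(maxsize=None)
def longest(node):
    """Longest path length from `node` to a sink, plus the path itself."""
    succs = graph.get(node, [])
    if not succs:
        return 0.0, (node,)
    return max((cost + longest(nxt)[0], (node,) + longest(nxt)[1])
               for nxt, cost in succs)

length, path = longest("start")
print(length, path)  # 6.5 ('start', 'p0_compute', 'p1_recv', 'end')
```

This also shows why collectives cause trouble: they create many-to-many dependences that do not decompose cleanly into the point-to-point edges this computation relies on.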
Other Visualizations – Phase Portrait
- Shows the relationship between processor utilization and communication usage
Other Visualizations – Statistics
- Gives overall statistics for the run:
  - % busy, overhead, and idle time
  - Total count and bandwidth of messages
  - Max, min, and average of message size, distance, and transit time
- Shows a max of 16 processors at a time
Other Visualizations – Processor Status
- Shows processor status: which task each processor is executing, plus communication (sends & receives)
- Each processor is a square in the grid (8-processor example shown)
Other Visualizations – Trace Events
Shows text output of all trace file events
Bottleneck Identification Test Suite
Testing metric: what did the visualizations tell us (with no manual instrumentation)? Program correctness was not affected by instrumentation.

CAMEL: PASSED
- Space-time diagram & bandwidth utilization visualizations showed a large number of small messages at the beginning
- Utilization graphs showed low overhead and few idle states

LU: PASSED
- Space-time diagram showed a large number of small messages
- Kiviat diagram showed a low moving average of processor utilization
- Phase portrait showed a large correlation between communication and low processor utilization

Big messages: PASSED
- Utilization Gantt and space-time diagrams showed a large amount of overhead at the time of each send

Diffuse procedure: PASSED
- Utilization Gantt showed one processor busy & the rest idle
- Manual instrumentation would still be needed to determine that one routine takes too long
Bottleneck Identification Test Suite (2)
Hot procedure: FAILED
- Purely sequential code, so ParaGraph could not distinguish between idle and busy states

Intensive server: PASSED
- Utilization Gantt chart showed all processors except the first idle
- Space-time chart showed processor 0 being inundated with messages

Ping-pong: PASSED
- Space-time chart showed a large number of small messages dependent on each other

Random barrier: TOSS-UP
- Utilization count showed one processor busy throughout execution
- Utilization Gantt chart showed the busy processor randomly dispersed
- However, the "waiting for barrier" state is shown as idle, so it is difficult to track the problem down to the barrier without extra manual instrumentation
Bottleneck Identification Test Suite (3)
Small messages: PASSED
- Utilization Gantt chart showed lots of time spent in MPI code (overhead)
- Space-time diagram showed large numbers of small messages

System time: FAILED
- All processes show as busy; no distinction between user and system time
- With no communication, classification of processor states is not really done at all; everything just gets attributed to busy time

Wrong order: PASSED
- Space-time diagram showed messages being received in the reverse of the order they were sent
- But you have to pay close attention to how the diagram is drawn
How to Best Use ParaGraph/MPICL
Don't use MPICL:
- Better trace file formats and libraries are available now
- We probably should look over the clock synchronization code, but this probably isn't useful if high-resolution timers are available, especially on shared-memory machines

Don't use ParaGraph's code directly:
- But it has a lot of neat visualizations we could copy
- At most we should scan the code to see how a visualization is calculated

In summary: just take the best ideas & visualizations.
Evaluation (1)
Available metrics: 2/5
- Only records communication and task entrance/exit
- Approximates processor state by equating "not communicating" with busy

Cost: 5/5
- Free!

Documentation quality: 2/5
- ParaGraph has an excellent manual
- Very hard to find information on MPICL; the MPICL installation instructions are woefully inadequate

Extensibility: 2/5
- Can add custom visualizations, but must write code and recompile ParaGraph
- Open source, but uses an old X Windows API & its own widget set
- Dead project (no updates since 1999)

Filtering and aggregation: 1/5
- Not really performed
- A few visualizations can be restricted to a certain processor
- Can output summary statistics (Other visualizations -> Statistics)
Evaluation (2)
Hardware support: 5/5
- Cray X1, AlphaServer (Tru64), IBM SP (AIX), SGI Altix, 64-bit Linux clusters (Opteron & Itanium)
- Support for a large number of vendor-specific MPI libraries
- Would probably need a lot of effort to port to more modern architectures, though

Heterogeneity support: 0/5 (not supported)

Installation: 1.5/5
- ParaGraph is relatively easy to compile and install
- MPICL installation is extremely difficult, especially with modern versions of MPICH/LAM

Interoperability: 0/5
- Does not interoperate with other tools

Learning curve: 2.5/5
- The MPICL library is easy to use
- The ParaGraph interface is unintuitive and can get in the way
Evaluation (3)
Manual overhead: 1/5
- Can record all MPI calls by linking, but this requires adding trace control instructions to the source code
- Task visualizations depend on manual instrumentation

Measurement accuracy: 2/5
- CAMEL: ~18% overhead
- Instrumentation adds a bit of runtime overhead, especially when many messages are sent

Multiple executions: 0/5 (not supported)

Multiple analyses & views: 5/5
- Many, many ways of looking at trace data

Performance bottleneck identification: 4/5
- Bottleneck identification must be performed manually
- Many visualizations help with bottleneck detection, but no guidance is provided on which one to examine first
Evaluation (4)
Profiling/tracing support: 3/5
- Only tracing supported
- Profiling data can be shown in ParaGraph after processing the trace file

Response time: 2/5
- Nothing reported until after the program runs
- Also requires the (computationally expensive) trace sort before you can view the trace file
- Large trace files take a while to load (ParaGraph must pass over the entire trace before displaying anything)

Searching: 0/5 (not supported)

Software support: 3/5
- Can link against any library using the MPI profiling interface, but it will not be instrumented
- Only MPI and some (very old, obsolete) vendor-specific message-passing libraries are supported
Evaluation (5)
Source code correlation: 0/5
- Not supported
- Can be done indirectly via manual instrumentation of tasks, but it is still hard to figure out exactly where things occur in the source code

System stability: 3.5/5
- MPICL relatively stable after bugs were fixed during compilation
- ParaGraph stable as long as you don't try to do weird things (load the wrong file); not very robust with error handling
- ParaGraph's load/save window set doesn't work

Technical support: 0/5
- Dead project
- Project email addresses still seem valid, but not sure how much help we could get from the developers now