
CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2002; 14:103–136 (DOI: 10.1002/cpe.618)

SPiDER—An advanced symbolic debugger for Fortran 90/HPF programs

T. Fahringer 1,∗,†, K. Sowa-Piekło 2, P. Czerwiński 1, P. Brezany 1, M. Bubak 3, R. Koppler 4 and R. Wismüller 5

1 Institute for Software Science, University of Vienna, Liechtensteinstrasse 22, A-1090, Vienna, Austria
2 ABB Corporate Research, ul. Starowiślna 13A, 31-038 Kraków, Poland
3 Institute of Computer Science, AGH, al. Mickiewicza 30, 30-059 Kraków, Poland
4 GUP Linz, Johannes Kepler University Linz, Altenbergerstrasse 69, 4040 Linz, Austria
5 Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR-TUM), Technische Universität München, D-80290 München, Germany

SUMMARY

Debuggers play an important role in developing parallel applications. They are used to control the state of many processes, to present distributed information in a concise and clear way, to observe the execution behavior, and to detect and locate programming errors. More sophisticated debugging systems also try to improve understanding of global execution behavior and intricate details of a program. In this paper we describe the design and implementation of SPiDER, which is an interactive source-level debugging system for both regular and irregular High-Performance Fortran (HPF) programs. SPiDER combines a base debugging system for message-passing programs with a high-level debugger that interfaces with an HPF compiler. SPiDER, in addition to conventional debugging functionality, allows a single process of a parallel program to be inspected or the entire program to be examined from a global point of view. A sophisticated visualization system has been developed and included in SPiDER to visualize data distributions, data-to-processor mapping relationships, and array values. SPiDER enables a programmer to dynamically change data distributions as well as array values. For arrays whose distribution can change during program execution, an animated replay displays the distribution sequence together with the associated source code location. Array values can be stored at individual execution points and compared against each other to examine execution behavior (e.g. convergence behavior of a numerical algorithm). Finally, SPiDER also offers limited support to evaluate the performance of parallel programs through a graphical load diagram. SPiDER has been fully implemented and is currently being used for the development of various real-world applications. Several experiments are presented that demonstrate the usefulness of SPiDER. Copyright 2002 John Wiley & Sons, Ltd.

KEY WORDS: debugger; data parallel programs; message passing programs

∗Correspondence to: T. Fahringer, Institute for Software Science, University of Vienna, Liechtensteinstrasse 22, A-1090, Vienna, Austria.
†E-mail: [email protected]

Contract/grant sponsor: Austrian Science Fund; contract/grant number: SFBF1104


1. INTRODUCTION

In recent years, parallel processing has evolved into a more widespread technology for delivering parallel computing capability across a range of parallel architectures. An important reason for this evolution is the fact that proprietary parts, e.g. CPUs, disks, and memories, have been replaced by commodity parts. Technologies for such commodity parts have matured enough to be used for high-end computer systems and are much cheaper than proprietary components.

Unfortunately, availability of parallel systems does not imply ease of use. Hence, there has been an increased emphasis on parallel programming environments, including parallel language systems and tools for performance analysis, debugging, and visualization.

In this paper we discuss the design and implementation of SPiDER, which is an interactive source-level debugging system for High-Performance Fortran (HPF) programs and leverages the HPF language, compiler, and runtime system to address the general problem of providing high-level access to distributed data. The objectives of SPiDER are summarized as follows:

• to support programmers to observe and to understand the execution behavior of their programs;
• to detect and locate programming errors at the high-level HPF code instead of debugging the low-level message-passing program;
• to enable sophisticated data distribution steering and animation as well as visualization and comparison of array values;
• to provide support to examine the quality of data distribution strategies;
• to develop debugging technology that is capable of handling both regular and irregular parallel programs.

The development of SPiDER is a joint effort among several research groups in Austria, Germany, and Poland. SPiDER integrates a base debugging system (Technical University of Munich [1] and AGH Cracow [2,3]) for message-passing programs with a high-level debugger (University of Vienna) that interfaces with VFC (University of Vienna [4]), a Fortran 90/HPF compiler. Among others, a special symbol table has been developed to store information about the translation process from Fortran 90/HPF to Fortran 90 message-passing code. This symbol table is crucial in order to relate the debugging process at the message-passing code to the HPF program. The visualization system of SPiDER, which is crucial for achieving the design objectives mentioned above, consists of two subsystems. Firstly, a graphical user interface displays the source code and allows the programmer to control execution, to inspect and to modify the program state. Secondly, GDDT (University of Linz [5]) is a sophisticated system to visualize data distributions and array values, to animate array distribution sequences and to display how data has been distributed across all processors.

SPiDER has been applied to several real-world applications including: a system for pricing of financial derivatives [6] developed by Professor Dockner's group at the University of Vienna, and a system for quantum mechanical calculations of solids [7] developed by Professor Schwarz and his group at the Vienna University of Technology.

In the next section we give an overview of the VFC compiler and the most important HPF language constructs which are necessary to describe some of the functionality of SPiDER. In Sections 3–5 we describe SPiDER as a multi-layer system comprising VFC, the HPF-dependent debugging system, and a base debugging system. The visualization system of SPiDER is presented in Section 6, experiments are described in Section 7, related work is discussed in Section 8, and concluding remarks are given in Section 9.


2. PREREQUISITES

2.1. VFC compiler and High-Performance Fortran

The Vienna High Performance Compiler (VFC, [4]) is a command-line source-to-source parallelization system that translates Fortran 90/HPF+ programs into Fortran 90/MPI message-passing SPMD (single-program-multiple-data) programs. The SPMD model implies that each processor executes the same program based on a different data domain. VFC has been developed at the Institute for Software Science at the University of Vienna. The system is currently available for the Solaris operating system. Codes parallelized with VFC can be executed on clusters of workstations as well as on the QSW (Meiko) CS-2, NEC Cenju-3, and Fujitsu AP-3000 architectures.

VFC is the first HPF compilation system that, besides the basic block and cyclic data distributions, provides new data distribution formats required for irregular codes such as the generalized block and indirect distributions. Irregular codes can be characterized by data access patterns that cannot be determined at compile-time, e.g. indirect array accessing where array subscripts are in turn array references. Moreover, the system fully implements dynamic data redistribution. VFC also provides a wide range of powerful parallelization strategies that are applicable to a large class of non-perfectly nested loops with irregular runtime-dependent access patterns, which are common in industrial codes.

The input language to VFC is Fortran 90/HPF+ where HPF+ [8] is an improved variant of the HPF language. HPF consists of a set of language extensions for Fortran to support data parallel programming. The main concept of HPF relies on data distribution. A programmer writes a sequential program and specifies how the data space of a program should be distributed by adding data distribution directives to the declarations of arrays. It is then the responsibility of the compiler to translate a program containing such directives into an efficient parallel SPMD target program using explicit message-passing on distributed memory machines. HPF+ goes beyond HPF by offering mechanisms to reuse runtime-generated communication schedules of irregular loops, to influence the mapping of computations to processors, to assert the locality of computations, and to specify, in addition to the distribution of an array, the possibility of irregular areas of non-local accesses.

The core element of HPF is the specification of the data distribution which is expressed by the DISTRIBUTE directive. In HPF+ the data space is mapped directly onto a virtual processor array. HPF supports a two-level mapping model where arrays must be at first aligned to a template and then the template is distributed onto a processor array. Processor arrays are declared by using the PROCESSORS directive. For every array dimension the distribution is specified separately. HPF+ extends the standard HPF set of distribution methods (replicated, block, cyclic, block-cyclic) with the generalized block and indirect distributions which support more flexible distribution methods especially useful for irregular problems. In HPF the ALIGN directive can be used to specify that an array or an array dimension (alignee) is distributed in the same way as another array or array dimension (align target).
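To make these directives concrete, the following minimal sketch combines them in a small program; the array and processor names are invented for this illustration and do not appear in the paper.

PROGRAM distr_example
  REAL, DIMENSION(100,100) :: A, B
!HPF$ PROCESSORS P(4)                  ! a one-dimensional array of four abstract processors
!HPF$ DISTRIBUTE A(BLOCK,*) ONTO P     ! first dimension of A distributed blockwise onto P
!HPF$ ALIGN B(I,J) WITH A(I,J)         ! B (alignee) is mapped in the same way as A (align target)
  A = 0.0
  B = A
END PROGRAM distr_example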

An array can be distributed statically or dynamically. Arrays are assumed to have static distributions, unless a programmer explicitly associates them with the DYNAMIC directive. A static distribution of an array cannot be changed during execution. On the other hand, dynamic distribution allows arrays to be redistributed during program execution. The REDISTRIBUTE directive can be used to change the distribution of an array, which can cause communication among processors that own parts of the array. If the array being redistributed is an align target, all its alignees also have to be redistributed accordingly. The set of possible distributions for a given array can be narrowed by employing the RANGE directive (introduced in HPF+). When the RANGE attribute is not present, a dynamic array can be assigned any valid distribution. The RANGE directive may enable the compiler to apply more aggressive optimization techniques.
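A dynamically distributed array in this style might be declared and redistributed as sketched in the following small program; the names are invented for this example and the attribute form of the directives mirrors the code excerpts shown later in Figure 3.

PROGRAM redist_example
  REAL, DIMENSION(1000) :: W
!HPF$ PROCESSORS Q(8)
!HPF$ DYNAMIC :: W                       ! the distribution of W may change at runtime
!HPF$ DISTRIBUTE (BLOCK) ONTO Q :: W     ! initial distribution
  W = 1.0
!HPF$ REDISTRIBUTE W(CYCLIC) ONTO Q      ! change the mapping of W during execution
  W = W + 1.0
END PROGRAM redist_example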

The INDEPENDENT directive is a key HPF feature for expressing parallelism; it asserts that the iterations of the subsequent DO loop are independent and, therefore, may be executed in any order. In this context VFC supports the NEW clause for defining variables that are private to all loop iterations, the REDUCTION clause for performing reduction operations, and the ON HOME clause to specify how the loop iterations should be distributed.
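As an illustrative sketch (the loop body and names are invented, and the exact clause ordering accepted may vary between compilers), such a loop could look as follows:

!HPF$ INDEPENDENT, NEW(T), REDUCTION(S), ON HOME(A(I))
  DO I = 1, N
     T = 2.0*A(I)      ! T is private to each iteration (NEW clause)
     S = S + T         ! S is accumulated across iterations (REDUCTION clause)
  END DO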

VFC employs the inspector/executor [4] strategy for parallelization of DO INDEPENDENT loop nests. We will describe this strategy in more detail because it is a crucial feature for real-world applications and it is supported by SPiDER.

In general, a loop nest that is transformed based on the inspector/executor strategy is executed in three phases: the work distribution phase, the inspector phase, and the executor phase. In the work distribution phase the iteration space of the loop nest is distributed among processors. A programmer can control the work distribution of the loop nest by, firstly, specifying whether loop iterations are distributed BLOCK or CYCLIC onto a set of processors, or by, secondly, using the ON HOME clause. For instance, ON HOME(A(i)) states that the iteration i is executed on the processor that owns A(i). In the inspector phase the communication schedule is computed together with other supporting data structures. The communication schedule specifies for each processor the set of non-local data elements accessed during the loop execution which must be transferred from other processors. In the executor phase, all non-local data are gathered at first (according to the previously computed schedules). Thereafter, the actual execution of the loop is performed. Note that it is also possible that a processor modifies data owned by another processor, which invokes a final scatter communication.

HPF+ contains several language constructs to improve the parallelization quality (e.g. by reducing runtime analysis and/or communication) of DO INDEPENDENT loop nests. The RESIDENT clause expresses the locality assertion for the specified arrays. For such arrays the compiler can assume that all array accesses are local. Work distribution and inspector phases can be very time consuming due to runtime analysis and communication. VFC tries to reduce these overheads by reusing the information obtained during these phases across subsequent executions of the loop. However, in many cases, the compiler does not have sufficient information to decide whether an inspector computation is redundant. Therefore, eliminating redundant inspector phases is performed in VFC based on the REUSE clause. For a DO INDEPENDENT loop with a REUSE clause the work distribution and inspector phases are guarded by means of conditional statements to enforce that they are executed only if the reuse condition is true or if the loop is executed for the first time.
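A hedged sketch of an irregular DO INDEPENDENT loop using the ON HOME and REUSE clauses is given below; the index array, the reuse condition, and the loop body are invented for this example, and the exact HPF+ clause spelling may differ in detail.

!HPF$ INDEPENDENT, ON HOME(X(IDX(I))), REUSE(IDX_UNCHANGED)
  DO I = 1, M
     X(IDX(I)) = X(IDX(I)) + Y(I)   ! indirect, runtime-dependent access pattern
  END DO
  ! The work distribution and inspector phases are skipped as long as IDX_UNCHANGED holds.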

3. SPiDER

SPiDER [9] is an advanced symbolic debugging system for Fortran 90/HPF parallel programs that enables program processes to be controlled and monitored at the source code level. A multiple-process view of the program allows a programmer to examine a single process of a parallel program or to inspect the entire program from a global point of view. SPiDER permits distributed data structures to be examined as a single global entity; for instance, a programmer can inspect and modify a section or individual elements of distributed arrays without the need to specify on which processor the elements reside. Moreover, SPiDER provides support for regular and irregular applications with several exceptional features for visualization and steering of data distributions. The data distribution can be dynamically changed after stopping program execution at a breakpoint. Sophisticated visualization capabilities provide graphical representations of array values and the data distribution with convenient navigation facilities for distributed data and logical processor arrays. The contents of an array can be stored in so-called snapshots. Differences between snapshots can be graphically displayed. For complex applications in which the distribution of arrays changes many times during program execution, SPiDER provides an animated replay of the array redistribution sequence and allows the migration of arbitrary array elements to be observed in a stepwise or continuous mode. Finally, SPiDER supports a load diagram that visualizes how many array elements have been mapped to each processor. This feature enables a programmer to examine how evenly the data has been distributed across all processors.

Figure 1. Architecture of the HPF debugging system. [Figure: the HPF+ source program is translated by the Vienna Fortran Compiler VFC (University of Vienna) into a Fortran 90+MPI parallel program, producing HPF+ and F90 symbol tables; a back-end F90 compiler generates the object program running on the parallel hardware; the HPF-dependent debugging system with its GUI and the GDDT visualization system (University of Vienna and University of Linz) sits on top of the base debugging system and the OCM monitoring system (TU Munich and AGH Cracow).]

Figure 1 shows the architecture of SPiDER with an emphasis on the support provided by VFC and low- and high-level debugging technology. The input programs of SPiDER are compiled with VFC to Fortran 90 message-passing programs. In order to generate an executable file, a vendor back-end Fortran 90 compiler is used. The two-stage compilation process is reflected in the debugger architecture. The main parts of the system are the Base Debugging System (BDS) (Section 4) and the HPF-Dependent Debugging System (HDDS) (Section 5). BDS operates as a low-level debugger closely related to the target machine on which it is running. It resolves all platform-specific issues and hides them from the HDDS level. It also constitutes a clear, simple but unequivocal interface that provides functionality allowing the state of processes and values of data in the parallel program to be inspected. BDS does not check for consistency of the running application with the HPF source code but provides information to HDDS about every process of the program. The design of BDS partially relies on the DETOP parallel debugger [1] and on the OCM [10] monitoring system developed at LRR-TUM. HDDS works on top of BDS and provides higher-level functionality for visualizing the associated HPF source code of the target parallel program and for interactively controlling and altering application data. The interface of SPiDER to VFC is supported by a symbol table file (see Section 5.1) which includes mapping information about mutually corresponding lines and symbols in the HPF and the resulting Fortran 90 message-passing programs, and information about compiler transformations.

A programmer interacts with SPiDER by using the visualization system which consists of a Graphical User Interface (GUI) and a Graphical Data Distribution Tool [11] for visualization of HPF data structures [5] (see Section 6.2).

4. BASE DEBUGGING SYSTEM

BDS is responsible for providing basic debugger functionality at the level of the parallel Fortran 90/MPI program generated by VFC. BDS is based on parts of the debugger DETOP which was originally developed at LRR-TUM for Parsytec computers. The current DETOP version supports parallel programs written in C and Fortran—based on the PVM or MPI programming library [2]—which are executed on clusters of workstations.

DETOP provides a procedural C/C++ interface (called ULIBS) for all its debugger commands, which can be used to construct high-level debuggers such as SPiDER. The procedural interface of DETOP allows its functionality to be accessed much more easily compared to other systems that offer only command-line or graphical interfaces designed to be accessed by programmers but not by software systems. An investigation of ULIBS showed that it is sufficiently complete to act as the BDS for SPiDER. On the other hand, it is small and relatively simple, so there would be a good chance of replacing it with another system if necessary, provided one was available for a particular target machine.

ULIBS is a library offering a single point of control over a distributed set of processes. This includes debugger functions like running, stopping, and single-stepping processes, setting breakpoints, viewing the back-trace of procedure calls, and viewing/modifying variables. A basic feature of ULIBS is location transparency, for instance debugger functions can be called without knowing the exact location of a process on a parallel architecture. Location transparency is realized by using the OMIS [12] compliant distributed on-line monitoring system (OCM) which monitors and optionally manipulates the execution of processes of a parallel program. OCM consists of a local monitor process on each target node and a component that automatically distributes requests to the proper monitor processes [10]. Since the resource usage (CPU, memory, and communication) of the monitor processes should be minimal, the OCM operates on physical addresses (instead of symbols), which removes the need for storing replicated copies of the program's Fortran 90 symbol table on each node. Instead, the symbol table is stored only once within ULIBS. However, ULIBS can store multiple symbol tables, e.g. to support heterogeneous target systems.

A major concern for the development of SPiDER was the fact that the ULIBS functions imply communication with OCM (a distributed system). The original ULIBS was based on blocking communication, which in turn blocked the user interface every time a ULIBS function was called. Even more critical was that only one OCM request could be active at any given time, which prevented the debugger from executing independent requests in parallel. These drawbacks led to an improved version of ULIBS which enables non-blocking ULIBS functions. In the current version, ULIBS functions immediately return a result-id which represents a future result. A call-back function may then be associated with one or more result-ids and will be called as soon as the specified results are available.

5. HPF DEPENDENT DEBUGGING SYSTEM

HDDS is a debugger kernel which services all the functionality at the HPF program level. HDDS controls program execution, for instance creation of processes at the beginning and termination of processes at the end of the debugging session, proceeding program execution to the next breakpoint, single stepping statement by statement, and dealing with program exceptions and interrupts. After stopping the program at a breakpoint, HDDS enables inspection of the program which includes: presenting the back-trace of procedure calls, mapping relationships between data and processors, and viewing data distributions, data types, and values. More generally, the main responsibilities of HDDS cover observing and controlling the state of many processors, and summarizing and presenting distributed information in a concise and clear way together with the input program. The current implementation focuses on support for a unified view of distributed arrays. HDDS resolves the problem of finding the location and fetching the values of distributed array elements. Distributed array elements can be examined and modified by simply referring to the array's name. However, for execution control and examining the program stack HDDS still supports a multiple-process view of the HPF program. As a result, SPiDER displays a list of all processes that exist in the program and the user can switch between them or select a group on which the debugger commands work.

5.1. Symbol table

In a source-level debugger, a programmer operates on the application's objects by using symbol names (e.g. variable and procedure names) only. It is the task of the debugger to map these names to the corresponding objects located in the address space of the program. Compilers provide this crucial information to debuggers through a symbol table.


As described in Figure 1, HPF programs are compiled in two phases. In phase 1, an HPF program is transformed into Fortran 90 message-passing code which in turn is compiled into machine code for a specific target architecture in phase 2. For both phases a unique symbol table is generated. VFC and the target Fortran 90 compiler, respectively, generate an HPF symbol table and a machine code symbol table (we call it the target symbol table). The HPF symbol table describes the translation from HPF to Fortran 90 message-passing code which is critical information for SPiDER so that it can relate the two source codes as well as their variables to each other. The target symbol table associates Fortran 90 statements with blocks of machine code. In this section we describe the content, structure, and implementation of the HPF symbol table.

Since HPF is a set of directive-based extensions for Fortran 90, the structure of both the HPF and the generated Fortran 90 message-passing programs is the same. In order to avoid redundancy, the HPF symbol table contains only information about HPF-specific symbols (e.g. distributed data and processor arrays) and program transformations (e.g. line mapping information, effects of program parallelization and optimizations, etc.). The HPF symbol table information provides a concise data and control flow view of the HPF program.

Existing symbol table formats which are used to describe a mapping between source and machine code do not meet the requirements of a compiler that transforms HPF to Fortran 90 message-passing code. Therefore, we decided to adjust an existing format for our HPF symbol table. An investigation of existing support for debugging HPF programs in the PGI HPF compiler [13] convinced us to use ASCII code to improve portability of SPiDER. The format of our symbol table also retains the scoping structure of the stored information that reflects the HPF and Fortran 90 semantics. Similarities with the PGI HPF symbol table can also be found in the notation (for instance, tags) used to identify the basic types of entries describing program structure, for instance program units and symbols. However, the format of these entries has been redesigned due to the fact that the PGI HPF compiler transforms HPF to Fortran 77 programs. Our format focuses specifically on HPF directives of the program and contains information only about data structures and blocks of the code that extended/changed their meaning after applying HPF constructs in the compilation process. For this purpose we added a new type of symbol table entry which describes parallel constructs appearing in the code.

During compilation VFC generates a separate HPF symbol table for every HPF source file of the input program. An HPF symbol table file comprises a set of records. Each record is described by its type and consists of various fields separated by white space. The number of fields and their format depend on the type of the record.

Figure 2 depicts the structure of the HPF symbol table file as generated by VFC. All information about a single HPF source file is enclosed in a file section. A file section starts with a begin file section record and ends with a matching end file section record. The begin file section record stores information about the location of the HPF program and the generated Fortran 90 message-passing file. Furthermore, a time stamp is included which allows the symbol table to be checked for validity.

Figure 2. Organization of the HPF symbol table file generated by VFC. [Figure: a file section delimited by a begin file section record (path, HPF file, F90 file, time, compiler version) and an end file section record (HPF file); it contains a line translation record (list of pairs <HPF line, F90 line>) and nested blocks delimited by begin block/end block records (HPF name, F90 name), each holding a list of symbols (id, HPF name, F90 name, type, attributes, descriptor) and a list of parallel regions (id, type, start line, end line, symbol ids).]

The symbol table file has a hierarchical structure which corresponds to the hierarchy of Fortran program units (e.g. program, subroutine, function, or module). Every Fortran unit is represented by means of the block construct. Each block starts with a begin block record and ends with a matching end block record. A block contains a list of symbols, a list of parallel regions, and a list of blocks which are included in this block (e.g. a program may contain subroutines). The list of symbols provides information about HPF data objects such as processor arrays, distributed arrays, and other variables that have been inserted during compilation. The list of parallel regions contains information about parallel constructs in the program (e.g. DO INDEPENDENT loops). The HPF symbol table file also contains a line mapping table that enables the debugger to associate HPF source lines with their corresponding line in the Fortran 90 message-passing code and vice versa.

Figure 3 shows an HPF example code, the associated Fortran 90 message-passing code, and the HPF symbol table as generated by VFC. For the sake of illustration only the most important code sections are shown. The arrows between HPF and Fortran 90 message-passing code illustrate line mapping relationships. The HPF symbol table includes only line mappings between code blocks (e.g. begin and end of loops and programs) and executable statements. Note that the first line of the DO INDEPENDENT loop is related to the first line of the inspector phase—inserted by VFC to implement the parallel loop—in the Fortran 90 message-passing code.

An excerpt of the HPF symbol table that is generated is displayed in the lower part of Figure 3. Records specified with begin tag F and end tag Z mark the beginning and the end of the symbol table for this file. Note that a file may contain several units of a program. The unit boundaries are specified by a begin tag B and an end tag E. A record defined by tag L specifies a set of line-to-line mapping pairs. 'L 8' means that there are eight pairs of line-to-line mappings between the HPF and Fortran 90 message-passing code. For instance, the entry '1 8' means that line 1 in the HPF code is associated with line 8 in the Fortran 90 message-passing code.

A block of a given unit is marked with begin tag B and end tag E. Tag S describes HPF symbols such as a processor array P and a distributed array A, which are described by references to runtime descriptors provided by VFC and also used by SPiDER. The record marked with tag P allows a parallel region to be located through a begin and an end line. For instance, the DO INDEPENDENT loop in the given code excerpt is bounded by lines 14 and 16.
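As far as it can be recovered from Figure 3, the symbol table excerpt for this example reads approximately as follows (record order and line breaks are approximate):

F /home/spider/ a.hpf a_vfc.f90 907755947 1.1
B simple_hpf simple_hpf
S 405 p p_dsc 1 0 p_dsc
S 406 a a 2 0 a_dsc
B foo foo
S 409 x x 2 0 x_dsc
P 410 1 14 16
E foo
L 8  1 8  6 19  9 23  14 31  15 41  16 44  18 49  20 58
E simple_hpf
Z simple_hpf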

Figure 3. HPF symbol table example. [Figure: the HPF source code of the program simple_hpf (a processor array P, a distributed array A, and a subroutine foo containing a DO INDEPENDENT loop), the Fortran 90 message-passing code generated by VFC (runtime descriptors, inspector code, and the transformed loop), and the resulting HPF symbol table; arrows indicate the line mapping relationships.]

In order to maintain a portable debugging system, a symbol table management library ST LIB has been developed [1] and extended to support Fortran 90/HPF compilers [3]. ST LIB can read symbol files in various formats (e.g. binary or ASCII format) and transforms them into a generic language-independent format. ST LIB can manage several symbol tables at the same time which enables debuggers to handle heterogeneous programs for heterogeneous workstation clusters. Even programs which consist of different binary files that are dynamically loaded at runtime can be supported.

At the beginning of a debugging session the target machine and the file format which are used to build the executable file are determined. Thereafter, a format-specific conversion module of ST LIB transforms all relevant data to the internal format of ST LIB. This format reflects the structure of the input program which in turn has an impact on SPiDER's internal program representation. For instance, a program is described by several objects which include: source files of the program, procedures (subroutines), functions, variables, data types, mapping between source and target statements, etc.

The internal symbol table format is organized as a tree-like hierarchical structure. Each node of the tree represents a data object (for instance, a symbol) and contains a set of attributes describing its properties. The common part of a symbol specification comprises the symbol name and address, the symbol class (e.g. variable, procedure, and constant) and a subclass which determines the primary scope of the object (e.g. local/global symbol, function parameter, etc.). The root node represents an entire application. Program unit nodes are the roots of subtrees which contain all objects defined or enclosed in the scope of that unit. Each statement of the program is associated with a unit node that immediately encloses this statement.

A programmer refers to a symbol by its name. In order to find the actual representation of a symbol, the debugger invokes a symbol lookup function. The search for a symbol starts at the node associated with the currently valid scope which is usually the unit that immediately encloses the statement of a breakpoint. If the symbol cannot be found in a node then the search continues in the parent of this node. The search proceeds until the symbol has been found.

Besides a symbol table lookup function more sophisticated queries are provided. Since each node contains the entire set of attributes describing a given object, the symbol table manager can return a set of objects that matches a query based on these attributes. A variety of query types is supported which includes:

• local: the query is limited to a symbol defined in a single unit;
• tree: the query traverses a given tree;
• scope: the query starts at a given node and continues with its parent node until it reaches the root node.

5.2. Displaying Fortran 90 types

As HPF provides directive-based extensions to Fortran 90, an HPF debugger must be able to support Fortran 90 data types. Among the most important features are dynamic data structures. Moreover, the concept of modules requires the debugger to address the issue of variable scope.

Dynamic data structures

Fortran 90 introduces three types of dynamic data: pointers, automatic data objects, and dynamic arrays. A Fortran 90 pointer type allows the programmer to point to scalar or array types. Automatic data objects consist of those objects that are created on entry to a subprogram and destroyed upon exit.


For dynamic arrays, e.g. allocatable arrays, pointers to arrays, and assumed-shape arrays, the actual bounds are not determined until the array is allocated, the pointer is assigned, and the subroutine is called, respectively. SPiDER displays the type of dynamic data imitating the Fortran 90 syntax. For example, a rank 1 dynamic (assumed-shape or allocatable) array of real data is displayed using a typeof command as follows:

REAL*8 :: (?:?)

Similarly, a pointer to the same rank 1 array of real data is displayed as follows:

REAL*8 , pointer :: (?:?)

If array bounds are unknown at runtime (the array has not been allocated so far), then the unknown lower and upper bound values are denoted with a '?' character; otherwise the actual array bounds are printed. For instance, a pointer to a rank 1 array of real data is displayed using a typeof command as follows:

REAL*8 , pointer :: (1:32)

The allocated lower and upper bounds of this array are given by 1 and 32, respectively. Additionally, SPiDER indicates in the variable's type information whether it is distributed or replicated. For instance, the full information displayed by the typeof command applied to an array u distributed on four processors is as follows:

Type of u is:
Distributed onto [p1,p2,p3,p4]
REAL*8, pointer :: (1:32)
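For orientation, declarations that would give rise to the kinds of type information discussed above might look as follows in Fortran 90/HPF; the names and bounds are invented for this sketch and are not taken from the paper.

  REAL*8, ALLOCATABLE, DIMENSION(:) :: v   ! allocatable array; bounds unknown until ALLOCATE(v(32))
  REAL*8, POINTER, DIMENSION(:)     :: q   ! array pointer; bounds unknown until q is associated
!HPF$ PROCESSORS PR(4)
  REAL*8, DIMENSION(32)             :: u
!HPF$ DISTRIBUTE u(BLOCK) ONTO PR          ! u is additionally reported as distributed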

Fortran 90 modules and variable scopes

Fortran 90 provides a new program unit, called a module, that lets the programmer place a set of declarations and module procedures under a single global name available for access in any other program unit via the USE statement, unless they have been renamed or excluded by the USE ONLY statement. As a result the programmer does not need to explicitly qualify the name of a module function or variable from the source code. When debugging this kind of information, the user must use the following syntax to specify the context of displayed functions or variables contained within a module:

typeof ’module_name’ variable_name|procedure_name

Generally the user can use the following syntax to specify a context for typeof, set, and print commands:

context ::= { ‘‘
            | ‘module_name‘
            | ‘procedure_name‘
            | ‘module_name:procedure_name‘
            | ‘procedure_name(incarnation)‘
            | ‘module_name:procedure_name(incarnation)‘ }


The context specifications have the following meanings:

• ‘‘: this string defines a global context, for instance each variable name given in the expression is regarded as a global variable. Using this context the user can inspect global variables that otherwise would be shadowed by local variables with the same name.

• ‘module name‘: supplying a module name results in a global context, for instance each identifier in the expression is first searched for among the global variables defined in the given module. In this way the user can explicitly request the value of a global variable in a given module.

• ‘procedure name‘ or ‘module name:procedure name‘: specifying a procedure name will result in the evaluation of the expression in the context of the topmost activation of that procedure on the call stack. All identifiers will be searched for according to the usual rule: identifiers defined locally in the procedure are considered first, then identifiers in the lexically enclosing scopes. The module name and the colon may be omitted if the procedure name is unique within a program.

• ‘procedure name(incarnation)‘ or ‘module name:procedure name(incarnation)‘: the user may also add an incarnation number to the context specification. In this way the user can read the values in incarnations other than the most recent one of a recursive procedure. The incarnation number is 0 for the most recent incarnation and is incremented for each occurrence of the given procedure when scanning the procedure call stack from top to bottom.

If no context is specified on the command line, SPiDER will use the current context of each process. The current context is the local context defined by the topmost activation of a procedure compiled with a debugging option (for instance, -g).
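As a usage illustration, with hypothetical module, procedure, and variable names, the context specification can be combined with the typeof and print commands as follows; the first command refers to a module variable tolerance in a module solver, the second to the variable residual in the next-to-most-recent incarnation of a procedure jacobi:

typeof ‘solver‘ tolerance
print ‘solver:jacobi(1)‘ residual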

5.3. Replicated variables

SPiDER obtains the information about replicated variables from a program symbol table generated by the compiler. If the value of a replicated scalar variable changes in one or more processes, then SPiDER automatically invokes an update of replicated variables. As a general approach SPiDER compares values of scalar variables in selected processes and presents the synthesized view by grouping values with processes. Lists of processes with the same value of a scalar variable are enclosed in square brackets followed by the value of the scalar. For example, the value of a variable 'i' at some execution point can be displayed by using the print command as follows:

Value of 'i' is:
[p1, p3, p4] 2
[p2] 3

5.4. Data distribution steering

The capability to modify variable values in order to influence the behavior of a program is a very important feature of traditional debuggers. For long-running applications the programmer may inspect the program state at a given breakpoint and also control parameters that impact the program's performance. Specifying data distributions is of paramount importance for the performance of HPF programs. Therefore, the ability to steer the selection of data distributions provides the programmer with an excellent capability for program tuning. However, changing data distributions during program execution must be done with great care, otherwise compiler assumptions about data distributions may become invalid, which can result in incorrect program behavior. Compilers perform various optimizations and transformations based on the assumption of a single data distribution (or possibly a set of them) that holds at a specific program point. If a programmer changes the data distribution during a debugging session such assumptions may become invalid. Whether interactive redistribution of an array at a specific program point is valid or not mostly depends on the underlying compiler. In the following we discuss important issues for interactive array redistribution under SPiDER.

In HPF the DYNAMIC directive is used to specify that the distribution of an array can be changed during program execution. All other arrays are assumed to be statically distributed (distribution cannot be changed during execution). For DYNAMIC arrays, compilers may assume that the associated distribution strategy is always unknown and, therefore, generate code that is distribution transparent, for instance its behavior does not depend on the distribution of arrays, and refrain from performing any distribution-driven optimizations. However, advanced compiler technology may determine the set of possible distributions of an array at a given program point. This information can enable more optimized code generation which usually implies a reduced runtime overhead.

The execution of an SPMD parallel program commonly consists of interleaved independent computation phases and communication phases. Note that independent computation phases are not restricted to code sections associated with the INDEPENDENT directive. The processes of a parallel program are not synchronized during a computation phase and every process may execute a different line of code at any given point in time. In many cases breakpoints serve as synchronization points where the debugger can provide a consistent view of the program execution and data (for instance, a single execution point of processes and a single value of replicated variables). However, there are exceptions based on the parallel nature of some HPF constructs that break the consistency of the program. The most typical example is a DO INDEPENDENT loop nest where every process executes a unique set of loop iterations. In order to enable a genuine parallel execution of DO INDEPENDENT loops, all data read or written by any process has to be local, otherwise non-local accesses would synchronize the execution. Invoking an array redistribution during execution of a DO INDEPENDENT loop would change the placement of array elements and as a consequence may invalidate the current work distribution.

VFC employs the inspector/executor strategy (see Section 2.1) to implement DO INDEPENDENT loops. A communication schedule specifies the non-local data needed to perform local computations. The loop nest is transformed in order to provide a uniform mechanism for accessing local and non-local data kept in buffers. By changing the distribution of a given array, both the communication schedule and the associated buffers used to access the array would be invalidated. The semantics of the program may be changed and incorrect results may be computed, or the program may even crash. Another danger of changing a program's semantics stems from the REUSE clause which prevents redundant computation of inspector phases. Array redistribution could invalidate communication schedules and, therefore, also REUSE clauses. Although it is possible to recalculate the work distribution and resume execution of a loop nest, VFC and SPiDER currently disallow array redistribution during execution of a DO INDEPENDENT loop.

Figure 4. Interaction between the debugging system and GDDT. [Figure: the debugging system, the GDDT gateway (built on ET++), GDDT, the running program, OCM, and a parallel file system; the numbered interactions include user interaction via the GUI (1), writing of the array contents via the parallel I/O library (2), commands exchanged via a pipe and via ET++ (3), calling an internal function of the running program via OCM (4), and reading of arbitrary array sections via the parallel I/O library (5).]

VFC provides SPiDER with important information (included in the HPF symbol table) to decide whether redistribution is allowed or not. Currently array redistribution is allowed based on the following constraints:

1. an array is associated with the DYNAMIC directive;
2. an array is not an alignee or an align target;
3. a breakpoint is set outside a DO INDEPENDENT loop nest;
4. distribution-driven compiler optimizations are turned off.

VFC determines these constraints and provides them to SPiDER through the HPF symbol table. The conditions are evaluated by the debugger at breakpoints and, depending on the result, a programmer is permitted to change the distribution of an array or not.

5.5. Support for visualization of distributed data

SPiDER uses GDDT (Graphical Data Distribution Tool) (see Section 6.2) to visualize the distribution or array values. GDDT commands (opening and displaying data files, visualizing array distribution or values, etc.) can be invoked by using the ET++ user interface library [14]. In order to interface GDDT to the debugging system (DS, comprising HDDS and BDS of Figure 1) we developed the GDDT gateway (see Figure 4) which is built upon ET++. DS accesses the GDDT gateway through a Unix pipe.


In order for GDDT to visualize array information, DS generates a file (with pseudo-Fortran format) with all important array information (e.g. rank, shape, distribution, etc.). DS issues a GDDT command to open this file and interprets its contents via the GDDT gateway. Thereafter, GDDT can visualize array values and data distributions depending on the semantics of the command issued. We use a pseudo-Fortran format for the file to be read by GDDT because GDDT has a built-in parser for parsing Fortran programs. Similarly, the GDDT gateway is used to access GDDT's data animator (see Section 6.2) which visualizes a redistribution history for a given array during a debugger session.

If array values should be displayed, DS invokes a function (part of a SPiDER library which is linked with the target program) of the currently debugged program that stores the array contents in a file (see (2) in Figure 4) and passes the name of this file on to GDDT. Thereafter, GDDT accesses this file by using an I/O library and extracts the requested array sections from it (see (5) in Figure 4).

GDDT currently uses an I/O library based on MPI-IO [15] to access files provided by DS. We currently use two different library versions: one for distributed memory parallel programs and another one for sequential programs. The first version is used to write distributed arrays to a file. The second one is used to write and read files by DS.

5.6. Collecting data for redistribution history

The data required to display the redistribution history of arrays is collected by SCALA [16], an integrated instrumentation and profiling system for VFC. SCALA inserts instrumentation code at all program points where redistribution can occur. During execution of the HPF program every redistribution defines an event which is written to a log file, and which records all important information including source file, event number, code line number, array name, rank, shape, distribution information, processor array, etc. Most of this information can be accessed by using VFC runtime descriptors that are maintained by VFC runtime libraries. The log file is written in a pseudo-Fortran format which can be accessed by GDDT in order to animate the redistribution sequence of a given array.

The instrumentation code does not influence the execution order and has very little impact on the performance of the program to be debugged. A negative consequence of instrumentation can be that some objects in the memory may be shifted and as a result some elusive bugs may appear in a different place or can be masked during debugging. For such cases the programmer can compile the program without visualization support and use all other SPiDER functionality to find these bugs.

6. VISUALIZATION SYSTEM

In this section we describe the visualization system of SPiDER which consists of two sub-systems: firstly, a graphical user interface which displays the source code and allows a programmer to control execution, to inspect, and to modify the program state; secondly, GDDT which is used to visualize data distributions and array values, to animate data distribution sequences, and to reflect the quality of data distributions.

6.1. Graphical user interface

SPiDER's GUI comprises a debugger window (see 'SPiDER window' in Figure 5) which consists of several frames: task frame, source code frame, state frame, output frame, and command frame. The source code frame shows the source of the currently debugged program. A green arrow (see Figure 5) points to the current statement where the debugger stopped all associated processes. Breakpoints are marked by red STOP icons. A programmer may click on a code line, variable name, or a breakpoint marker which pops up a menu offering all possible debugger commands for the given selection. For instance, if a line or a variable is selected, the menu will allow a breakpoint to be set or the type or contents of the variable to be printed. If a breakpoint marker has been selected, the menu will enable this breakpoint to be deleted or modified.

Figure 5. Inspecting the pricing system under SPiDER.

The task frame tabulates the list of processes (tasks) which are currently executing the program shown in the source code frame. All commands invoked by a programmer in the debugger window will be applied to all selected processes in the task frame. In addition, there are also global commands that affect the entire application, e.g. a global stop. The set of processes shown in the task frame can be changed any time.

The state frame displays the back-trace of procedure calls for all processes that are currently stopped. The command frame enables the programmer to enter debugger commands. The output frame shows various debugger outputs as a result of debugger commands entered by a programmer. In this frame, for instance, SPiDER outputs array values or data distributions.

A single debugger window is often sufficient and most convenient for debugging SPMD data parallel programs where all processes execute the same code. However, in the case where different processes are executing different parts of a program at a given time, it is very useful to simultaneously visualize all source code frames. Among other things this feature may be useful if pure procedures are called in DO INDEPENDENT loops. Moreover, a coarse grain view of the entire program can be shown in one window, whereas a specific process could be debugged in a second window, which is useful for debugging EXTRINSIC(HPF_LOCAL) and EXTRINSIC(HPF_SERIAL) procedures.

SPiDER has been designed to allow multiple debugger windows, each of which may be associated with an arbitrary set of processes.

The debugging commands offered by SPiDER can be subdivided into six classes.

1. Execution control. SPiDER enables starting, stopping, and single stepping of either the entire program or a specific process at any given time (see Figure 5).

2. Inspection of program state. These commands allow retrieval of information on the program's current state, e.g. the list of all existing processes, the current point of execution, the back-trace of procedure calls, and the types and values of variables (distributed and replicated) or expressions (see Figure 5).

3. Visualization of distributed data. These commands invoke GDDT to graphically visualize data distributions and array values (see Figure 5). Moreover, a history of data distributions and array value changes can be displayed.

4. Modification of program state. A set of commands is provided to modify the contents of variables and to change data distributions.

5. Events and actions (breakpoints). Breakpoints (see Figure 5) may be set on source code lines or procedure entries for an arbitrary set of processes. A breakpoint consists of an execution event and an associated stop action. The event is raised whenever one of the selected processes reaches the given position in the source code. The stop action can either stop the process that raised the event, the processing node (on which a process is executing), or the entire program. These modes are essential in order to obtain consistent views of shared variables or the program's global state. Additionally, there are several events that are permanently monitored by SPiDER, e.g. exceptions or termination of a process.

6. Miscellaneous. There are also commands to display a source file, to set defaults, e.g. a default action for breakpoints, and to configure the graphical interface.

6.2. GDDT

In this section we describe GDDT, which is used by SPiDER to visualize distributed arrays and their corresponding processor arrays. Note that the development of GDDT has been largely driven by the needs of SPiDER; however, as of today it is a tool that can be used for other systems as well. More details about GDDT can be found in [5].

GDDT has been designed for visualization and manipulation of distributed data structures and comprises the following features:

• visualization of
  – data distributions,
  – statistical information about data distributions,
  – array values;
• animation of redistribution histories.

Array distribution viewer

Up to three dimensions of distributed arrays with an arbitrary number of dimensions can be visualized. For instance, a three-dimensional array is displayed as a cube. If an array has more than three dimensions then a programmer must choose up to three dimensions for visualization. The three-dimensional projection on the screen can be translated, rotated, and zoomed for better investigation of the displayed geometry. The same is true for the associated processor arrays. Both data and processor arrays are color-coded. Every processor in the processor array has a unique color. Data elements have the same colors as their owning processors, which visualizes the mapping of data to processors. For instance, Figure 6 shows a one-dimensional processor array PR(1:4) (window 'Processor Array') together with a data array h(1:50,1:50) (window 'Data Array').
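A distributed array like the one pictured in Figure 6 could be declared with directives in the style of the code figures later in this paper. The following minimal sketch assumes the (*,CYCLIC) distribution that Section 7.2 reports for the HNS array; the exact distribution format used for the array in Figure 6 is not stated in the text, so it is an assumption here:

!HPF$ PROCESSORS :: PR(4)
      REAL :: h(50,50)
!HPF$ DISTRIBUTE (*,CYCLIC) ONTO PR :: h

With such a mapping, GDDT assigns each of the four processors a unique color and paints every column of h in the color of the processor that owns it, which is exactly the data-to-processor relationship shown in the 'Data Array' window.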

Array value viewer

An important feature of an on-line visualization tool is the ability to reflect data values at specific points during execution of a program. For this purpose GDDT provides the array value viewer, which visualizes array element values through color-coding. The color spectrum ranges from 'cool' colors (reflecting low values) such as blue to 'warm' colors (reflecting high values) such as red.
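The mapping from values to colors is not specified in the paper; the following sketch merely illustrates the idea of such a cool-to-warm ramp and is not GDDT's actual color scheme:

      SUBROUTINE VALUE_TO_COLOR(V, VMIN, VMAX, R, G, B)
      ! Map a value linearly onto a blue ('cool') to red ('warm') ramp.
      REAL, INTENT(IN)  :: V, VMIN, VMAX
      REAL, INTENT(OUT) :: R, G, B
      REAL :: T
      T = (V - VMIN) / MAX(VMAX - VMIN, TINY(1.0))   ! normalize to [0,1]
      T = MAX(0.0, MIN(1.0, T))
      R = T                                          ! warm share grows with the value
      G = 0.0
      B = 1.0 - T                                    ! cool share shrinks with the value
      END SUBROUTINE VALUE_TO_COLOR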

Like GDDT's array distribution viewer, the array value viewer can display up to three array dimensions. Several array value viewers can be invoked for a specific array in order to visualize the array's contents at different points in program execution. The value of an array at a specific execution point is called an 'array snapshot'. This feature is especially useful for monitoring the progress of a program's execution. For instance, the value changes of an array can be monitored over subsequent loop iterations in order to detect convergence problems. Since small differences between two array snapshots may escape a human observer, GDDT also visualizes the difference between snapshots. Up to seven array snapshots can be stored under GDDT. Arbitrary pairs of snapshots can be selected for comparison. Snapshot differences are again visualized through color-coding: strong and small differences are displayed using the colors red and blue, respectively. Figure 7 illustrates the array value viewer. The two upper windows display two different snapshots of an array, and the lower left window shows the differences in values between the two snapshots. The lower right window shows the content manager for naming snapshots and selecting snapshots for comparison. The snapshot visualization can be controlled by a programmer, for instance, by zooming, rotating, and slicing (choosing array sections) operations.

Figure 6. Inspecting the HNS code under SPiDER.

Figure 7. Visualization of array values of the HNS code.

The display can be customized by a programmer (see the 'DAAV View Controls' and 'Scene-Setup' windows in Figure 6), for instance, by changing the color range, adding grids to the display, etc.

Data load diagram

A load balance metric of a program specifies how evenly program data are distributed onto a set of processors. For this purpose GDDT includes a data load diagram (see Figure 8). For every processor a color-coded bar depicts the number of array elements that have been mapped to it. The colors help to distinguish the array elements of neighboring processors in the diagram. The info browser in the lower part of the load diagram shows several statistics, for example the average number of data elements owned by a processor in a given processor array or the variance of the data load. Load imbalances as shown in Figure 8 occur, for instance, in triangular loop nests that iterate over arrays based on a block distribution.

Figure 8. The data load diagram.
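The block-distribution imbalance mentioned above is easy to reproduce with a little counting code. The sketch below is our own illustration (the array size, the processor count, and the simple owner formula are assumptions, not taken from the paper): it counts how many iterations of a triangular loop nest touch columns owned by each processor when the column index is BLOCK-distributed.

      PROGRAM TRIANGULAR_LOAD
      ! Count the iterations of the triangular loop nest
      !    DO I = 1, N;  DO J = 1, I;  ... H(I,J) ...
      ! that fall on each processor when the column index J is
      ! BLOCK-distributed over NP processors.
      INTEGER, PARAMETER :: N = 64, NP = 4
      INTEGER :: LOAD(NP), I, J, BLK, P
      LOAD = 0
      BLK  = (N + NP - 1) / NP        ! block size of the BLOCK distribution
      DO I = 1, N
         DO J = 1, I
            P = (J - 1) / BLK + 1     ! owner of column J
            LOAD(P) = LOAD(P) + 1
         END DO
      END DO
      PRINT *, 'iterations per processor: ', LOAD
      END PROGRAM TRIANGULAR_LOAD

For N = 64 and four processors this prints 904, 648, 392, and 136 iterations, respectively; the first processor receives more than six times the work of the last one, which is exactly the kind of skewed bar chart the data load diagram makes visible, and it is why Section 7.2 chooses a CYCLIC distribution for the triangular HNS loops.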


Figure 9. Inspecting the ADI code under SPiDER.


Distribution animator

A distribution animator shows the distribution history of user-selected arrays. This feature is important for DYNAMIC arrays, which can be redistributed during execution of a program (see also Section 5.6).

Figure 9 gives an example of the distribution animator (see the upper left window), which has similar functionality to a player console. In order to browse and replay the redistribution sequence of arrays a programmer can use the following control buttons: play (distribution sequence), stop (browsing through distribution sequence), goto first (distribution in sequence), and goto last (distribution in sequence). The distribution sequence can be examined step-by-step or played continuously. A programmer can examine the changes of array distributions by selecting certain array elements and watching them migrate among processors.

Working with GDDT

GDDT can either be directly invoked from SPiDER in order to visualize data, or it can be employed post-mortem, based on a previous debugging session during which the appropriate data have been written to a file. In the first case, the user issues the view debugger command with one of the subcommands distribution, values, or history, followed by an array name. Among other things, these subcommands can be used to minimize the visualization time if the user chooses a limited set of array properties to be displayed. By default (if no subcommand is specified) all data is generated for all possible visualizations. The view command automatically invokes GDDT (using the interfaces described in Section 5.5) in order to visualize the generated array data. Alternatively, the user can employ GDDT for post-mortem analysis of the program's data by starting the visualizer from the command line. Post-mortem analysis is based on data generated in a preceding debugging session. If a fatal error occurs in a debugged program, some of the data that is about to be written to a file can be lost; in this case, post-mortem visualization is likely to be disabled.
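Based on this description, a session might issue commands of the following form, using array names from the applications in Section 7; the subcommand names are taken from the text above, but the exact command syntax shown here is an assumption:

      view distribution U
      view values VALUE
      view history H

The first form would presumably open the array distribution viewer for U, the second the array value viewer for VALUE, and the third the distribution animator for H.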

7. APPLICATIONS

SPiDER has been fully implemented with all the functionality described in this paper. SPiDER is currently based on DETOP version 1.1, GDDT version 1.1, and VFC version 2.0, and currently runs under Sun Solaris 7. VFC generates message-passing programs based on the MPI library mpich 1.1.2. In this section we present three test cases in order to examine the usefulness of SPiDER. These are a system for pricing of financial derivatives [6] developed by Professor Dockner's group at the University of Vienna, a system for quantum mechanical calculations of solids [7] developed by Professor Schwarz and his group at the Vienna University of Technology, and an alternating direction implicit (ADI) code [17] which implements a well-known and effective method for solving partial differential equations in two or more dimensions.

7.1. Pricing of financial derivatives

The pricing of derivative products is an important field in finance theory. A derivative (or derivative security) is a financial instrument whose value depends on other, so-called underlying, securities [18].


Figure 10. A Hull and White tree for the Δt spot rate with selected path. (The plot shows the short rate, ranging from 0.00 to 0.12, over time from 0 to 5; the arcs carry the transition probabilities pup = 0.16, pmid = 0.67, pdown = 0.16.)

Examples are stock options and variable coupon bonds, the latter paying interest-rate-dependent coupons. The pricing problem can be stated as follows: what is the price today of an instrument which will pay some cash flows in the future, depending on the development of an underlying security, e.g. stock prices or interest rates? For simple cases analytical formulas are available, but for a range of products whose cash flows depend on the value of a financial variable in the past (so-called path-dependent products), Monte Carlo simulation techniques have to be applied [19]. By utilizing massively parallel architectures very efficient implementations can be achieved [20].

The Monte Carlo simulation is based on a discrete representation of a stochastic process that describes the dynamics of the underlying security over time [21]. In the case of interest-rate-dependent products, the Hull and White tree describes the future development of the short rate, which is used to calculate the entire interest rate curve for a specific state of the system [18]. Each state is represented by a node in a directed graph and has three successor nodes, representing increasing, constant, and decreasing interest rates. Nodes are described by (time, interest rate) pairs. Arcs are labeled with the transition probabilities pup, pmid, and pdown. A state can be reached by more than one predecessor; this recombining property establishes a lattice structure. Figure 10 shows a Hull and White tree with time (for instance, in years) on the horizontal axis and interest rates on the vertical axis.
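A node of such a tree could be represented, for example, by the following Fortran 90 derived type; this is purely illustrative, since the pricing code's actual data structures are not shown in the paper:

      TYPE HW_NODE
         INTEGER :: STEP                ! time step of this state
         REAL    :: RATE                ! short rate in this state
         REAL    :: PUP, PMID, PDOWN    ! transition probabilities to the successors
         INTEGER :: UP, MID, DOWN       ! indices of the three successor nodes
      END TYPE HW_NODE

Storing successors as indices into a node array rather than as separately allocated children reflects the recombining property: two predecessors can share the same successor, so the structure is a lattice rather than a genuine tree.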

To price interest-rate-dependent products, the interest rate tree is used either by solving it backwards in time or by simulating paths through the tree and averaging the corresponding prices. The Monte Carlo simulation algorithm selects N paths in the Hull and White tree from the root node to some final node (see Figure 10). Along each path, it iteratively discounts, backwards from the final node to the root node, the cash flow generated by the instrument along this path. For variable coupon bonds, the cash flows are path dependent; for instance, they depend on the interest rates at predecessor nodes. Discounting is performed using the interest rates along this path. The resulting price of the instrument is the mean value over all selected paths. The group of Professor Dockner at the Department of Business Administration, University of Vienna, developed the pricing system [6,16] as an HPF application.


      ...
!HPF$ PROCESSORS :: PR(NUMBER_OF_PROCESSORS())
!HPF$ DISTRIBUTE (BLOCK) ONTO PR :: VALUE
      ...
      ! the bond to be priced
      TYPE(BOND) :: B

      ! path in the Hull and White tree
      INTEGER :: PATH(0:N_STEPS)

      ! all path results
      REAL(DBLE) :: VALUE(1:N)

!HPF$ INDEPENDENT, NEW(PATH), ON HOME(VALUE(I))
      DO I = 1, N
         ! select a path starting at node (0,0)
         PATH = RANDOM_PATH(0,0,N)

         ! discount the bond's cashflow to time 0
         VALUE(I) = DISCOUNT(0,CASH_FLOW(B,1,N),FACTORS_AT(PATH))
      END DO

      ! mean value
      PRICE = SUM(VALUE)/N
      ...

Figure 11. HPF DO INDEPENDENT code of the pricing system.

VFC was used to parallelize the pricing system and SPiDER to debug and to control the numerical behavior of this application. The HPF/Fortran 90 code segment in Figure 11 shows the main loop of the simulation procedure TRAVERSE_DISCOUNT. The cash flow generated by an instrument is stored in array VALUE.

In Figure 5 we show a snapshot of the SPiDER debugging session with the pricing system stopped in procedure TRAVERSE_DISCOUNT at a specific path in the Hull and White tree. The generated cash flow values are stored in array VALUE, which consists of 5904 elements.

SPiDER provides a multiple process view of the pricing system. A programmer can either monitor and control a single process or inspect the entire program. SPiDER displays a list of all processes that are currently executing the program. A programmer can switch among them or select a group of processes based on which debugger commands can be invoked. In the source code frame the current execution point is displayed with several breakpoints set in procedure TRAVERSE_DISCOUNT. When execution reaches a breakpoint, either the processes for which the breakpoint has been set or all processes of the program are stopped. The state frame shows the current backtrace of procedure calls. It can be seen that the main program started procedure au, which in turn invoked procedure TRAVERSE_DISCOUNT. Window 'Processor Array' shows processor array PR(1:4) with two processors PR(2:3) selected by a programmer. Window 'Array Values' displays the element values for VALUE(2940:2970). The value range is between 14.19 and 89.63. Window 'Data Array' shows the mapping relationship between the processors and the selected array elements.


Figure 12. Computation of a crystal structure using WIEN97. (The figure outlines the workflow for a crystal containing Sr, Ti, and O atoms: the problem is defined by the crystal structure, the atoms, and the atomic positions; the generalized eigenvalue problem HC = ESC is set up, with matrix elements H_nm = <Φ_n|H_MT|Φ_m>_sp + <Φ_n|H_NS|Φ_m>_sp + <Φ_n|H|Φ_m>_int and S_nm = <Φ_n|Φ_m>_sp + <Φ_n|Φ_m>_int, the non-spherical contributions being computed by the HNS code; finally HC = ESC is solved using Lapack (ScaLapack).)

A programmer can operate on distributed arrays in the same way as for arrays in sequential programs or for replicated arrays. Displaying or modifying values of distributed arrays does not require a programmer to know anything about array distribution or mapping. For large arrays, as in the case of the pricing system, a programmer can slice arrays and view or operate on selected parts of an array. This feature of SPiDER has been helpful, in particular, in the debugging of code errors and in the study of the numerical behavior of the algorithm.

7.2. Quantum mechanical calculations of solids

A material science program package called WIEN97 [7] has been developed by the group of Professor Schwarz, Institute of Physical and Theoretical Chemistry, Vienna University of Technology. WIEN97 is based on density functional theory and the LAPW method [22], which is one of the most accurate methods to theoretically investigate the properties of high technology materials.

      ...
!HPF$ PROCESSORS :: PR(NUMBER_OF_PROCESSORS())
!HPF$ DISTRIBUTE (*,CYCLIC) ONTO PR :: H
      ...
      DO 60 I = 1, N
!HPF$ INDEPENDENT, ON HOME (H(:,J))
         DO 70 J = 1, I
            H(I,J) = H(I,J) + A1R(1,J)*A2R(1,I)
            H(I,J) = H(I,J) - A1I(1,J)*A2I(1,I)
            H(I,J) = H(I,J) + B1R(1,J)*B2R(1,I)
            H(I,J) = H(I,J) - B1I(1,J)*B2I(1,I)
   70    CONTINUE
   60 CONTINUE
      ...
      DO 260 I = N+1, N+NLO
!HPF$ INDEPENDENT, ON HOME (H(:,J))
         DO 270 J = 1, I
            H(I,J) = H(I,J) + A1R(1,J)*A2R(1,I)
            H(I,J) = H(I,J) - A1I(1,J)*A2I(1,I)
            H(I,J) = H(I,J) + B1R(1,J)*B2R(1,I)
            H(I,J) = H(I,J) - B1I(1,J)*B2I(1,I)
            H(I,J) = H(I,J) + C1R(1,J)*C2R(1,I)
            H(I,J) = H(I,J) - C1I(1,J)*C2I(1,I)
  270    CONTINUE
  260 CONTINUE
      ...

Figure 13. HNS based on HPF DO INDEPENDENT.

WIEN97 calculates the electronic structure of solids. Figure 12 describes the principal tasks of such a calculation. After the definition of the problem, a generalized eigenvalue problem must first be set up and then solved iteratively (for instance, many times), leading to energies (eigenvalues, E) and the corresponding coefficients (eigenvectors, C). The size (N) of the corresponding Hamilton (H) and Overlap (S) matrices is related to the accuracy of the calculation and thus to the number of plane wave (PW) basis functions. About 50–100 PWs are needed per atom in the unit cell. For systems containing from 50 up to 100 atoms per unit cell, matrices of size 2500 to 10 000 must be handled.

One of the most computationally intensive parts of WIEN97 is the setting up of the matrix elements of H and S, which are complicated sums of various terms (integrals between basis functions). A large fraction of this time is spent in the subroutine HNS (see Figure 13), where the contributions to H due to the non-spherical potential are calculated.

In HNS, radial and angular-dependent contributions to these elements are precomputed and condensed into a number of vectors which are then applied in a series of rank-two updates to the symmetric (Hermitian) Hamilton matrix. HNS has 17 one-, 14 two-, 5 three-, and 6 four-dimensional arrays. The computational complexity of HNS is of the order O(N²). All floating point operations are done in double (8 bytes) precision.
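To put these sizes into perspective (our own back-of-the-envelope estimate, not a figure from the paper): at the upper end, N = 10 000, a single N × N array of 8-byte values occupies 10 000 × 10 000 × 8 bytes = 8 × 10^8 bytes, i.e. roughly 800 MB, and storing real and imaginary parts separately, as the HNS code in Figure 13 does, doubles that. Matrices of this size therefore have to be spread over the memories of several processors, which is one reason why the distribution of H matters.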


H, which is the main HNS array, has been distributed CYCLIC [23] in the second dimension onto the maximum number of processors (HPF intrinsic function NUMBER_OF_PROCESSORS) that are available on a given architecture. The CYCLIC distribution has been chosen because of the triangular loop iteration spaces, in order to achieve a good work distribution. Figures 6 and 7 show two snapshots of an HNS debugging session under SPiDER. The parallel HNS version has been executed on a cluster of SUN workstations. Like the pricing system, WIEN97 requires sophisticated support to visualize distributed arrays and the mapping relationship of data to processors. The window 'Data Array' in Figure 6 shows the cyclic distribution of array H of the HNS code. The individual colors of array elements indicate the owning processor in window 'Processor Array'. GDDT's rich set of display customizations (slicing, scaling, and rotation) enables exploration of the view of array H. The graphical representation of distributed data is complemented by the visualization of array values. The global view of array values (see window 'Array Values') allows a programmer to quickly identify array elements with possibly inaccurate values. By incorporating GDDT's ability to store various array snapshots, the position in the source code where an inaccurate value has been assigned can be found quite quickly. Among other things, SPiDER has been used to locate an erroneous initialization of array H in the HNS code. All elements of H should have values in the range between 0 and 1. Moreover, only the lower left triangle of H should have values different from 0. The window 'Array Values' of Figure 6 clearly displays array element values above 1 and also shows that array H is not triangular. Several array snapshots were made which quickly enabled the programmer to detect that this bug was caused by an initialization procedure. Figure 7 shows the values of array H after eliminating this bug. The upper two windows show array elements at different iterations of a timing loop. The differences in values can be visualized by another feature of GDDT, which is shown in the lower left window. Comparing array values at different execution points again allows the numerical behavior and, in particular, the convergence rate of the underlying algorithm to be checked.
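The two invariants that exposed the initialization bug (values in [0,1] and a lower triangular matrix) can also be written down as a simple check; the following function is our own illustration and is not part of the HNS code:

      LOGICAL FUNCTION H_IS_CONSISTENT(H, N)
      ! Check the invariants described in the text: every element of H
      ! must lie in [0,1] and the strict upper triangle must be zero.
      INTEGER, INTENT(IN) :: N
      REAL, INTENT(IN)    :: H(N,N)
      INTEGER :: I, J
      H_IS_CONSISTENT = .TRUE.
      DO J = 1, N
         DO I = 1, N
            IF (J > I) THEN
               IF (H(I,J) /= 0.0) H_IS_CONSISTENT = .FALSE.
            ELSE IF (H(I,J) < 0.0 .OR. H(I,J) > 1.0) THEN
               H_IS_CONSISTENT = .FALSE.
            END IF
         END DO
      END DO
      END FUNCTION H_IS_CONSISTENT

Under SPiDER, however, the same information is obtained visually: out-of-range values and non-zero elements above the diagonal stand out immediately in the array value viewer.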

7.3. Alternating direction implicit

ADI (alternating direction implicit) [17] is a well-known and effective method for solving partial differential equations in two or more dimensions. An excerpt of the ADI code is shown in the 'SPiDER window' of Figure 9.

The ADI code contains an outermost loop whose body is subdivided into two computational phases: phase-1 comprises a forward and a backward sweep along the rows of array U, followed by phase-2, a forward and a backward sweep along the columns. Phase-1 and phase-2, respectively, favor column- and row-wise distributions of array U. Let us assume that an inexperienced programmer included an HPF REDISTRIBUTE directive between phase-1 and phase-2 that dynamically distributed array U CYCLIC in the first dimension and BLOCK in the second dimension. A breakpoint can be set at line 51 (right before phase-2) in order to examine the data distribution and the mapping relationship of data to processors for array U by using the distribution viewer of GDDT (the two middle-left windows of Figure 9). By examining the loop nest starting at line 51 (phase-2) it can be seen that a row-wise distribution of array U should achieve better performance than the (CYCLIC, BLOCK) distribution. A programmer can now interactively change the distribution of array U accordingly, which is confirmed in the output frame of the 'SPiDER window'.
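A minimal sketch of the directives involved in this scenario is given below. The initial column-wise distribution and the array size are assumptions (the paper shows the ADI code only as a screen excerpt in Figure 9); the two REDISTRIBUTE lines correspond to the distribution discussed above and to the row-wise alternative suggested by the debugging session, written in the attributed directive style used in the other figures:

      INTEGER, PARAMETER :: N = 512          ! assumed problem size
      REAL :: U(N,N)
!HPF$ DYNAMIC :: U
!HPF$ DISTRIBUTE (*,BLOCK) :: U              ! column-wise distribution, favored by phase-1
      ! ... phase-1: forward and backward sweeps along the rows of U ...
!HPF$ REDISTRIBUTE (CYCLIC,BLOCK) :: U       ! the redistribution questioned in the text
      ! ... phase-2: forward and backward sweeps along the columns of U ...
      ! the debugging session suggests a row-wise distribution instead, e.g.
!HPF$ REDISTRIBUTE (BLOCK,*) :: U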

For complex applications in which the distribution of a given array may change several times during execution, the programmer can see an animated replay of the redistribution sequence by opening a data array viewer and selecting the window 'Distribution Animator' (see the upper left window in Figure 9). SPiDER offers stepwise or continuous replay. A programmer can observe the migration of arbitrary array elements by selecting them in the data array viewer. For every redistribution selected in the distribution animator, the corresponding targets in the processor array viewer change their location, which shows how array elements migrate from one set of processors to another. Finally, a programmer may also examine array values, which can be done by invoking the array value viewer (see the upper right window).

8. RELATED WORK

The simplest approach to debugging parallel programs is to start a serial debugger on every single process of the parallel program and to provide some kind of communication facility among the processes. However, only a few interesting HPF debugging challenges would be addressed by this basic debugging approach. Despite much activity in parallel software development, debugging parallel applications, in particular debugging HPF applications, has not been adequately supported so far. Most debuggers lack sophisticated support to present the user with a single control flow and data space.

The experimental HPF debugger Aardvark [24] from DIGITAL represents a system that addresses many of the HPF debugging challenges. Aardvark introduces the concept of logical entities (an abstraction that exists within the debugger), which groups together several related physical entities and synthesizes a single view or behavior from them. Derived from this concept, a logical process contains a set of physical processes and provides algorithms for controlling an HPF program. A logical stack collects the physical stack frames and provides mechanisms for presenting them as a single view. A logical breakpoint collects the physical breakpoint representations and provides mechanisms for setting and managing breakpoints with actions or conditions associated with the logical breakpoint. Regarding execution control, Aardvark implements a time-stamping execution state transition policy which addresses the problem of determining a consistent stop state of a program. Although the concept of logical entities covers much of the control complexity and data management, Aardvark does not ensure a consistent view of the HPF program execution when processes stop at the same place but in different iterations of a loop. The debugger mistakenly presents this state as synchronized and presents data as if it were consistent. In addition, Aardvark does not provide a mechanism for recovering the consistent state of a program, for instance, when a program was interrupted by the user or an exception occurred. In this scenario it might happen that processes are stopped at different places in a program. Aardvark does not try to advance the execution to a consistent state. This means that in the worst case, when every process is stopped at a different execution point, Aardvark works as a multiprocess debugger (stacks and states of individual processes are presented). However, Aardvark's concept of logical entities makes it easier, compared to SPiDER, for a user to detect inconsistencies in the execution state of an HPF program. Aardvark provides a text-based user interface without support for displaying or visualizing array distributions.

Many tools (e.g. PDT [25], TotalView [26]) support debugging of HPF programs by providing a display of the source code and global data visualization for viewing entire arrays and array segments allocated across processors. PDT supports global data visualization and replay for race conditions at the message-passing level. TotalView allows source-level debugging of HPF programs compiled with the PGI HPF compiler [13]. TotalView allows one to display the HPF source code and set breakpoints in an HPF program. In terms of execution control, TotalView and SPiDER are similar in supporting process grouping features. Neither tool provides a unified view of call stacks. In order to examine and modify data, TotalView permits automatic updating of all copies of replicated scalar variables, the display of distributed arrays, and visualization of data distributions. Array values can be visualized as graphs or as two- or three-dimensional surfaces. Only two array dimensions can be visualized at a time.

The importance of distributed data visualization has been recognized in other data-parallel debuggers. Prism [27] provides various displays based on histograms, multidimensional graphs, surfaces, and vector representations of program data, including support for data-parallel programs written in CM-Fortran. The most recent version of Prism [28] supports debugging of MPI applications by visualizing data values and distributions according to distribution descriptions provided by the programmer. There are other systems that visualize data distributions (e.g. HPF-Builder [29]) at compile time, but they lack the ability to examine intricate details of dynamically changing data distributions. One of the most advanced systems in this field is DAQV [30], which has been designed for visualization of HPF programs. It is not a debugger by itself but rather a framework for accessing and modifying data at runtime in order to simplify visualization and computational steering. CUMULVS (Collaborative User Migration User Library for Visualization and Steering) [31] is a software framework that enables programmers to incorporate fault-tolerance, interactive visualization, and computational steering into existing parallel programs. The CUMULVS software consists of two libraries: one for the application program and one for the visualization and steering front-end (called the 'viewer'). CUMULVS collects and transfers distributed data to the viewers and enables parameters in the application to be steered. It also manages the dynamic attachment and detachment of multiple independent viewers to a running parallel application.

Recent activities to support the development of applications that run on large-scale computational grids try to cope with heterogeneous computing platforms and to provide sufficiently abstract and scalable operations for examining and controlling program execution. The p2d2 [32] debugger addresses these requirements by providing various degrees of abstraction: a top-level view focusing on the entire grid, showing which processes are running or stopped, an intermediate-level view summarizing the state of processes in user-selected process groups, and a low-level view providing full information about a single process. It also supports the visualization of data distributions, but only for simple, structured distribution types. Additionally, if the program has been parallelized without the use of parallelization support tools, p2d2 prompts the user to provide distribution information via a dialog box.

9. CONCLUSIONS AND FUTURE WORK

In this paper we have described SPiDER, which is an interactive, source-level debugging system for both regular and irregular HPF programs. SPiDER combines a base debugging system for message-passing programs with a high-level debugger that interfaces to an HPF compiler. A sophisticated visualization system has been developed and integrated into SPiDER for data distribution steering and animation as well as visualization and comparison of array values. The main novel features of SPiDER are the following.

• Besides regular applications, SPiDER also supports irregular codes with highly dynamic behavior, including indirect array accesses.

Copyright 2002 John Wiley & Sons, Ltd. Concurrency Computat.: Pract. Exper. 2002; 14:103–136

Page 32: SPiDER—An advanced symbolic debugger for Fortran 90/HPF programs

134 T. FAHRINGER ET AL.

• Arrays can be dynamically redistributed at well-selected execution points which are controlled by the underlying compiler and SPiDER.

• Convenient facilities to navigate through distributed data arrays and logical processor arrays are provided, with an emphasis on the mapping relationships between data and processors.

• An automatic replay feature enables the user to browse and replay array distribution sequences, which supports the examination of data redistributions during execution of a program.

• Array snapshots can be taken to store all values of an array at a specific execution point. Sophisticated visualization technology has been developed to examine and compare array snapshots, which, for instance, enables the observation of the numerical behavior (e.g. convergence rate) of applications.

• The quality of data distributions can be examined using a load diagram, which visualizes how many data elements have been mapped to each process in a parallel program.

SPiDER combines the most useful capabilities of many existing debuggers and provides novel visualization and data distribution steering functionality for both regular and irregular data distributions.

During the development of SPiDER we were faced with several problems. Most notably, the runtime information about program data was mostly controlled by a compiler runtime library, and some information (e.g. data alignment information) was not available at runtime. A generic interface to runtime information that can be controlled by the debugger would be of paramount importance to improve portability and to extend the range of runtime information available to a debugger.

In future work we will enhance SPiDER to present a single control flow of the program being debugged instead of a multiple process view. Moreover, we plan to extend existing SPiDER technology to support metacomputer applications based on OCM (see Section 4), which is a distributed monitoring environment [33,34,35]. The target language will be JavaSymphony [36], a high-level coordination language for distributed and heterogeneous applications. The main challenge for a JavaSymphony debugger is to provide extensive and easy-to-use process-grouping capabilities that enable the control and inspection of JavaSymphony processes. For this feature, OCM must be extended with a set of new objects and services.

ACKNOWLEDGEMENT

This research is partially supported by the Austrian Science Fund as part of the Aurora Project under contract SFBF1104.

REFERENCES

1. Oberhuber M, Wismüller R. DETOP—An interactive debugger for PowerPC based multicomputers. Parallel Programming and Applications, Fritzson P, Finmo L (eds.). IOS Press: Amsterdam, 1995; 170–183.
2. Bubak M, Funika W, Gembarowski R, Hodurek P, Wismüller R. Enhancing OCM to support MPI applications. Proceedings of the International Conference on High Performance Computing and Networking, Amsterdam, The Netherlands, April 1999 (Lecture Notes in Computer Science, vol. 1593), Sloot P, Bubak M, Hoekstra A, Hertzberger B (eds.). Springer, 1999; 1274–1277.
3. Bubak M, Funika W, Młynarczyk G, Sowa K, Wismüller R. Symbol table management in an HPF debugger. Proceedings of the 7th International Conference, HPCN Europe 1999, Amsterdam, The Netherlands, April 1999. Springer, 1999; 1278–1281.
4. Benkner S. VFC: The Vienna Fortran Compiler. Scientific Programming 1999; 7(1):67–81.
5. Koppler R, Grabner S, Volkert J. Visualization of distributed data structures for HPF-like languages. Scientific Programming (Special Issue on High Performance Fortran Comes of Age) 1997; 6(1):115–126.
6. Dockner E, Moritsch H. Pricing constant maturity floaters with embedded options using Monte Carlo simulation. AURORA Technical Report AuR 99-04, University of Vienna, January 1999.
7. Blaha P, Schwarz K, Luitz J. WIEN97, full-potential, linearized augmented plane wave package for calculating crystal properties. Institute of Technical Electrochemistry, Vienna University of Technology, Vienna, Austria, 1997. ISBN 3-9501031-0-4.
8. Benkner S. HPF+—High Performance Fortran for advanced industrial applications. Proceedings of HPCN 98, Amsterdam, The Netherlands, April 1998. Springer, 1998.
9. Brezany P, Bubak M, Czerwinski P, Koppler R, Sowa K, Volkert J, Wismüller R. Advanced symbolic debugging of HPF programs with SPiDER. Proceedings of SC'99, Portland, OR, U.S.A., November 1999. ACM, 1999.
10. Wismüller R, Trinitis J, Ludwig T. OCM—a monitoring system for interoperable tools. Proceedings of the 2nd SIGMETRICS Symposium on Parallel and Distributed Tools SPDT'98, Welches, OR, USA, August 1998. ACM Press, 1998.
11. Koppler R, Grabner S, Volkert J. Design and visualization of irregular data distributions. Technical Report Deliverable D1V-3, Institute for Computer Science, Johannes Kepler University Linz, PACT Consortium, CEI, May 1995.
12. Ludwig T, Wismüller R, Sunderam V, Bode A. OMIS—On-Line Monitoring Interface Specification Version 2.0 (LRR-TUM Research Report Series, vol. 9). Shaker: Aachen, 1997.
13. The Portland Group, Wilsonville, Oregon. PGHPF High Performance Fortran Manual, 1998. http://www.pgroup.com.
14. Gamma E. Objektorientierte Software-Entwicklung mit ET++. Springer: Berlin, 1992.
15. Message Passing Interface Forum. MPI-2: Extensions to the Message-Passing Interface, July 1997.
16. Fahringer T, Blaha P, Hossinger A, Luitz J, Mehofer E, Moritsch H, Scholz B. Development and performance analysis of real-world applications for distributed and parallel architecture. AURORA Technical Report TR1999-16, University of Vienna, August 1999. http://www.vcpc.univie.ac.at/aurora/publications/.
17. Vetterling W, Teukolsky S, Press W, Flannery B. Numerical Recipes: Example Book (FORTRAN). Cambridge University Press: Cambridge, 1990.
18. Hull JC. Options, Futures, and Other Derivatives. Prentice Hall: Englewood Cliffs, NJ, 1997.
19. Boyle P, Broadie M, Glasserman P. Monte Carlo methods for security pricing. Journal of Economic Dynamics and Control 1997; 21:1267–1321.
20. Hutchinson J, Zenios S. Financial simulations on a massively parallel connection machine. The International Journal of Supercomputer Applications 1991; 5(2):27–45.
21. Hull JC, White A. One factor interest rate models and the valuation of interest rate derivative securities. Journal of Financial and Quantitative Analysis 1993; 28:235–254.
22. Schwarz K, Blaha P. Description of an LAPW DF program (Wien95). Lecture Notes in Chemistry 1996; 67:139–153.
23. High Performance Fortran Forum. High Performance Fortran language specification. Scientific Programming 1993; 2(1–2):1–170.
24. LaFrance-Linden DCP. Challenges in designing an HPF debugger. DIGITAL Technical Journal 1997; 9(3):50–64.
25. Clemencon C, Fritscher J, Rühl R. Visualization, execution control and replay of massively parallel programs within Annai's debugging tool. Technical Report TR-94-11, Swiss Center for Scientific Computing, 1994.
26. Etnus LLC. TotalView User's Guide, version 5.0. August 2001.
27. A Sun Microsystems, Inc. Business. Prism 5.0 User's Guide. July 1997.
28. Sistare S, Dorenkamp E, Nevin N. MPI support in the Prism programming environment. Proceedings of the SC99 Conference, Portland, Oregon, USA, November 1999. ACM Press, 1999.
29. Lefebvre C, Dekeyser J-L. Visualisation of HPF data mappings and of their communication cost. Proceedings of VECPAR'98, Porto, Portugal, June 1998.
30. Hackstadt ST, Malony AD. DAQV: Distributed array query and visualization framework. Theoretical Computer Science, Special Issue on Parallel Computing 1998; 196(1–2):289–317.
31. Geist GA, Kohl JA, Papadopoulos PM. CUMULVS: Providing fault-tolerance, visualization and steering of parallel applications. International Journal of High Performance Computing Applications 1997; 11(3):224–236.
32. Hood R, Jost G. A debugger for heterogeneous grid applications. Proceedings of the 9th Heterogeneous Computing Workshop, Cancun, Mexico, 2000.
33. Bubak M, Funika W, Zbik D, van Albada D, Iskra K, Sloot P, Wismüller R, Sowa-Piekło K. Performance measurement, debugging and load balancing for metacomputing. Proceedings of ISThmus 2000—Research and Development for the Information Society, Poznan, Poland, April 2000; 409–418.
34. Bubak M, Funika W, Balis B, Wismüller R. On-line tool support for parallel applications. Proceedings of the International Conference on High Performance Computing and Networking, Amsterdam, The Netherlands, June 2001 (Lecture Notes in Computer Science, vol. 2110). Springer, 2001; 415–424.
35. Bubak M, Funika W, Balis B, Wismüller R. Proposal of the tool support for grid application monitoring. PIONIER 2001, Conference Proceedings, Poznan, Poland, 2001. Instytut Informatyki Politechniki Poznanskiej, 2001; 149–154.
36. Fahringer T. JavaSymphony: A system for development of locality-oriented distributed and parallel Java applications. Proceedings of the IEEE International Conference on Cluster Computing (CLUSTER 2000), Chemnitz, Germany, 2000. IEEE Computer Society, 2000. www.par.univie.ac.at/project/javasymphony.
