Post on 20-Dec-2015

Computer and Automation Research Institute

Hungarian Academy of Sciences

The P-GRADE Visual Parallel Programming Environment

Péter Kacsuk

Laboratory of Parallel and Distributed Systems

MTA SZTAKI Research Institute

kacsuk@sztaki.hu

www.lpds.sztaki.hu

Problems of Developing Parallel Programs

[Figure: machines connected by a high-speed switch, annotated with the questions "Observing?" and "Programming?"]

Our Solution: P-GRADE

P-GRADE is a parallel programming environment which supports the whole life-cycle of parallel program development

For non-specialist programmers it provides a complete solution for efficient and easy parallel program development

Fast reengineering of sequential programs for parallel computers

Unified graphical support in program design, debugging and performance analysis

Portability on supercomputers and heterogeneous workstation/PC clusters based on PVM and MPI

Tools of P-GRADE

• GRAPNEL: Hybrid Parallel Programming Language
– Graphics to express parallelism
– C/C++ to describe sequential parts
• GRED: Graphical Editor
• GRP2C: Pre-compiler to (C/C++)+(PVM/MPI)
• DIWIDE: Integrated distributed debugger and animation system
• GRM: Distributed monitoring system
• PROVE: Integrated visualisation tool

Life-cycle of Parallel Program Development and its support in P-GRADE

[Diagram: development stages and the P-GRADE tools that support them]

• Parallel program design: GRAPNEL, GRED
• Mapping: user mapping
• Pre-compilation: GRP2C
• Building executables: C compiler, linker
– GRP-PVM: GRM library, PVM library
– GRP-MPI: GRM library, MPI library
• Debugging: DIWIDE
• Monitoring: GRM
• Visualisation: PROVE
• Files passed between stages: GRP file; C source code, cross-ref file, make file; executables; trace file

Design Goals of GRAPNEL

• Graphical interface
– to define all parallel activities
– Strong support for hierarchical design
– Visual abstractions to hide the low level details of message-passing
• C/C++ (or Fortran) to describe sequential parts
– Strong support for parallelizing sequential applications
– Support for programming in large
– No steep learning curve
• GRAPNEL = (C/C++) + graphics

GRAPNEL: GRaphical Process NEt Language

• Programming paradigm: message-passing
– component processes run in parallel and can interact only by means of sending and receiving messages
• Communication model:
– point-to-point, synchronous/asynchronous
– collective (e.g. multicast, scatter, reduce, etc.)
• Process model:
– single processes
– process groups
– predefined process communication templates

Three layers of GRAPNEL

GRAPNEL

Hierarchical design levels: graphics used at application level
• Defines interprocess communication topology
• Port protocols

Graphics hides PVM/MPI function calls

Support for SPMD programming style
• Predefined communication patterns
• Automatic scaling of parallel programs

Communication Templates

• Pre-defined regular process topologies
– process farm
– pipeline
– 2D mesh
– tree
• User defines:
– representative processes
– actual size
• Automatic scaling
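The idea of scaling a template from an "actual size" can be sketched in a few lines of Python. This is only an illustration, not GRAPNEL or GRP2C code: the function names (`pipeline`, `farm`, `mesh2d`, `tree`), the process naming scheme and the edge-list representation are all assumptions made for the example.

```python
from typing import List, Tuple

Edge = Tuple[str, str]  # a communication channel between two named processes


def pipeline(n: int) -> List[Edge]:
    """n stages, each connected to its successor."""
    return [(f"stage{i}", f"stage{i + 1}") for i in range(n - 1)]


def farm(n: int) -> List[Edge]:
    """One master with a channel to and from each of n slaves."""
    edges: List[Edge] = []
    for i in range(n):
        edges.append(("master", f"slave{i}"))
        edges.append((f"slave{i}", "master"))
    return edges


def mesh2d(rows: int, cols: int) -> List[Edge]:
    """rows x cols grid; each process is linked to its right and lower neighbour."""
    edges: List[Edge] = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                edges.append((f"p{r}_{c}", f"p{r}_{c + 1}"))
            if r + 1 < rows:
                edges.append((f"p{r}_{c}", f"p{r + 1}_{c}"))
    return edges


def tree(depth: int, arity: int = 2) -> List[Edge]:
    """Complete tree with `depth` levels; nodes are numbered breadth-first."""
    n_nodes = sum(arity ** d for d in range(depth))
    edges: List[Edge] = []
    for parent in range(n_nodes):
        for j in range(1, arity + 1):
            child = arity * parent + j
            if child < n_nodes:
                edges.append((f"n{parent}", f"n{child}"))
    return edges
```

Changing only the size argument regenerates the whole topology, which is the effect the template mechanism aims for: the user describes a representative process once and the environment produces the rest.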

Mesh Template

Tree Template

The process farm parallelisation approach

Master

Send work packages: send();

Collect results: recv();

Slave1 Slave2 SlaveN

spawn(N);

The code of each slave is the same.
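A minimal sketch of the farm pattern in Python, using threads and queues in place of PVM/MPI processes and messages. The names `master` and `slave`, the squaring workload and the `None` sentinel are assumptions chosen for the example; the slide's `spawn`, `send` and `recv` appear only as comments marking the corresponding steps.

```python
import queue
import threading


def slave(work_q: queue.Queue, result_q: queue.Queue) -> None:
    """Every slave runs this same code."""
    while True:
        package = work_q.get()            # recv(): wait for a work package
        if package is None:               # sentinel (assumed convention): no more work
            break
        index, value = package
        result_q.put((index, value * value))  # send(): return the result


def master(packages: list, n_slaves: int) -> list:
    work_q: queue.Queue = queue.Queue()
    result_q: queue.Queue = queue.Queue()

    # spawn(N): start N identical slaves
    slaves = [threading.Thread(target=slave, args=(work_q, result_q))
              for _ in range(n_slaves)]
    for t in slaves:
        t.start()

    # Send work packages, then one sentinel per slave
    for item in enumerate(packages):
        work_q.put(item)
    for _ in slaves:
        work_q.put(None)

    # Collect results; they may arrive in any order, so place them by index
    results = [None] * len(packages)
    for _ in packages:
        index, value = result_q.get()
        results[index] = value

    for t in slaves:
        t.join()
    return results
```

Tagging each package with its index is one simple way to keep the final result independent of which slave finishes first.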

Parallelising the Mandelbrot set computation

[Screenshots: Draw process output; Compute process input; Compute process output; Draw process input]

Process Groups

• Hierarchical design (subgraph abstraction)
• Collective communication (group ports)
– multicast
– scatter
– gather
– reduce
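The four collective operations listed above can be illustrated on plain Python lists. This is a sketch of their meaning only, not of how GRAPNEL group ports or PVM/MPI implement them; the function names and the contiguous chunking rule used by `scatter` are assumptions made for the example.

```python
from functools import reduce as fold


def multicast(value, n):
    """Every one of the n group members receives the same value."""
    return [value] * n


def scatter(data, n):
    """Split data into n contiguous chunks of nearly equal size, one per member."""
    base, extra = divmod(len(data), n)
    chunks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def gather(parts):
    """Concatenate the members' parts in member order (the inverse of scatter)."""
    return [x for part in parts for x in part]


def reduce_group(values, op):
    """Combine one value per member into a single result with a binary operator."""
    return fold(op, values)
```

For example, scattering seven items over three members gives chunks of sizes 3, 2 and 2, and gathering those chunks restores the original list.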

GRAPNEL

Hierarchical design levels:
• Graphics used at process internal level
• C/C++ used at the text level

Synch/asynch. comm.

Programming in large:
• Any C/C++ library call can be included in text blocks
• Graphical support for object-based programming

GRAPNEL

Structuring facility by macro graphs

[Figure: macro graph with port types labelled multicast, gather, Userdef (grp_in), reduce, scatter, point-to-point, gather, scatter, Userdef (grp_out)]

GRED Editor

Supports the creation of all the elements of GRAPNEL

Drag-and-drop style of drawing

Cut/copy/paste/move on graphical objects

Automatic port positioning that minimises the length and the crossings of communication channels

GRED Editor

Extremely easy and fast construction of the process graph

Automatic arrangement of the process graph

Automatic resizing of process windows

Cut/copy/paste on graphical objects

Macro graph construction at arbitrarily nested level

C/C++ code can be edited by any standard text editor

GRP2C Pre-compiler

• Automatic generation of PVM and MPI calls based on GRAPNEL graphics
• Automatic code instrumentation for debugging and performance monitoring

[Diagram: GRAPNEL (graphics + C/C++) → GRP2C → generated code (C/C++ + PVM/MPI)]

Debugging Parallel Programs

[Figure: machines connected by a high-speed switch, annotated with the question "Observing?"]

Principle of sequential program debugging

• Reproducibility (determinism)
– For the same input set the sequential program always delivers the same output set (even if the program is incorrect)
• Used technique: cyclic debugging
– breakpoints
– step-by-step execution

Problem of parallel program debugging

• Non-reproducibility (non-determinism)
– For the same input set the incorrect parallel program can deliver different output sets
• Cyclic debugging cannot be used
– breakpoints
– step-by-step execution

Classification of parallel debuggers

• Parallel running sequential debuggers
• Replayable debuggers
– Monitor & replay
– Control & replay

DIWIDE Debugger

Graphical and C/C++ level debug support (breakpoints, variable inspection, etc.)

3 kinds of “step by step execution”, according to the programmer’s demand:
• Instruction by instruction
• Graphical item by graphical item
• Macrostep by macrostep

Visualisation and animation support

Hierarchical Debugging by DIWIDE

Classification of parallel breakpoints

• Local breakpoints
• Global breakpoints
• Individual breakpoints
• Collective breakpoints

Principle of Macrostep Debugging

Parallel debugging is as easy as debugging traditional sequential programs.

Macrosteps                              Collective breakpoints
M0 = {S1 -> A1, S2 -> A2, S3 -> A3}     A1 A2 A3
M1 = {A1 -> B1, A2 -> B2, A3 -> B3}     B1 B2 B3
M2 = {B1 -> B1, B2 -> C2, B3 -> B3}     B1 C2 B3
M3 = {B1 -> B1, C2 -> D2, B3 -> E3}     B1 D2 E3
M4 = {B1 -> E1, D2 -> E2}               E1 E2

where Si = Start_i and Ei = End_i

[Figure: execution paths of three processes. P1: S1, A1, B1, E1. P2: S2, A2, B2, C2, D2, E2. P3: S3, A3, B3, E3.]
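The macrostep table can be replayed mechanically. The sketch below is an illustration in Python, not DIWIDE code: the dictionary encoding of each macrostep, the `replay` function and the consistency check are assumptions made for the example, while the M0 to M4 data is taken from the table.

```python
# Each macrostep maps the breakpoint a process leaves to the one it reaches.
MACROSTEPS = [
    {"S1": "A1", "S2": "A2", "S3": "A3"},  # M0
    {"A1": "B1", "A2": "B2", "A3": "B3"},  # M1
    {"B1": "B1", "B2": "C2", "B3": "B3"},  # M2: P1 and P3 stay where they are
    {"B1": "B1", "C2": "D2", "B3": "E3"},  # M3
    {"B1": "E1", "D2": "E2"},              # M4: P3 has already ended
]


def replay(start, macrosteps):
    """Apply the macrosteps in order.

    Returns the collective breakpoint reached after each macrostep and the
    final set of breakpoints. Raises ValueError if a macrostep starts from a
    breakpoint that no process currently occupies.
    """
    state = set(start)
    collective = []
    for k, step in enumerate(macrosteps):
        missing = set(step) - state
        if missing:
            raise ValueError(f"M{k} starts from unreached points {sorted(missing)}")
        state = (state - set(step)) | set(step.values())
        # order the breakpoint labels by process number (last character)
        collective.append(sorted(step.values(), key=lambda label: label[-1]))
    return collective, state
```

Replaying from `["S1", "S2", "S3"]` reproduces the right-hand column of the table and ends with every process at its End point.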

Macrostep Debugging

Support for systematic debugging to handle non-deterministic behaviour of parallel applications

Systematic and automatic generation of Execution Trees

Testing parallel programs for all timing conditions

Replay technique with collective breakpoints
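Why testing under every timing condition matters can be shown with a toy receiver whose result depends on message arrival order. The Python sketch below is a deliberately simplified stand-in for an execution tree: it enumerates every arrival order with `itertools.permutations` and runs the receiver once per order. The `receiver` and `explore` functions and the sample messages are hypothetical and do not reflect how DIWIDE builds its trees.

```python
from itertools import permutations


def receiver(arrival_order):
    """A process that accepts messages from any sender, in arrival order.

    The update rule is order-sensitive on purpose, so different timings
    produce different results.
    """
    total = 0
    for _sender, value in arrival_order:
        total = total * 2 + value
    return total


def explore(messages):
    """Run the receiver once for every possible arrival order.

    Each permutation corresponds to one root-to-leaf path of the (toy)
    execution tree. Returns {order of senders: result}.
    """
    outcomes = {}
    for order in permutations(messages):
        senders = tuple(sender for sender, _ in order)
        outcomes[senders] = receiver(order)
    return outcomes


if __name__ == "__main__":
    results = explore([("P1", 1), ("P2", 2), ("P3", 3)])
    for senders, value in sorted(results.items()):
        print(senders, value)
```

A single run would exercise just one of the six paths; exhaustive enumeration is what exposes results that only occur under rare timings. Real programs need pruning, since the number of orders grows factorially.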

Automatic Deadlock Detection by Macrostep Debugging

Integration of Macrostep Debugging and PROVE

Performance monitoring and analysis of Parallel Programs

[Figure: machines connected by a high-speed switch, annotated with the question "Observing?"]

Visualisation Systems

What to visualise?
• Scientific (data oriented) visualisation
• Program visualisation
• Problem visualisation (algorithm animation)

Goal of visualisation?
• Correctness debugging
• Performance (debugging) visualisation
• Combined visualisation

When to visualise?
• Off-line
• On-line
• Semi on-line

Phases of Performance Visualisation

• Source code instrumentation (GRAPNEL/GRED)
• Runtime monitoring (GRM)
• Visualisation (PROVE)
• Data analysis (PROVE)

Evaluation Criteria

Performance visualisation:
• Scalability (data handling)
• Source code instrumentation
• Versatility (visualisation)

Source Code Instrumentation

• Manual or automatic
• Monitoring modes
• Filtering
• Click-back facility
• Selectable program units
• Individual events
• On/off facility
• Statistics

Scalability

• Data acquisition: turning tracing on/off, filtering
• Data analysis & display: zooming, filtering
• Interactive / non-interactive
• Examples: VISTOP, Nupshot

Versatility

• Interoperate with other tools: No / Yes
• Different views: event views, statistics views

Standalone Performance Analysis Tools

• VAMPIR
• Pablo
• ParaGraph
• AIMS
• Paradyn

VAMPIR

Integrated Performance Analysis Tools

• VISTOP (TOPSYS)

• PVMVis (EDPEPPS)

• PROVE (GRADE)

Source Code Instrumentation

• Automatic
• Monitoring modes
• Filtering
• Click-back facility
• Selectable program units
• Individual events
• On/off facility
• Statistics

Source code click-back facility and click-forward facility

Behaviour Window of PROVE

Scrolling visualisation windows forward and backwards

User controlled focus on processors, processes and messages

Zooming, event filtering facilities

Filtering in PROVE

PROVE Performance analyser

• Various views for displaying performance information
• Synchronised multi-window visualisation

PROVE Summary Windows

• Various views for displaying summary information
• Synchronised multi-window visualisation

PROVE Statistics Windows

• Profiling based on counters
• Analysis of very long running programs is enabled

The GRM Monitor

• Off-line monitoring (GRADE)
– stores trace events in a (local or global) storage and
– makes it available after execution for post-mortem processing.
• Semi-on-line monitoring (P-GRADE)
– stores trace events in a storage but
– makes it available for the visualisation tool any time during execution if the user asks for it
– interactive usage of PROVE
– user can remove already inspected part of the trace
– evaluation of long-running programs
– macrostep debugging in P-GRADE with execution visualisation

• Application-level monitor
• Tracing + statistics collection
• Semi-on-line

[Architecture diagram. Labels: P-GRADE (DIWIDE, GRED); GRM (MM); trace file; LM on Host 1, Host 2, ..., Host n; socket; file operation; local host; server host; remote cluster; PROVE; Prove-rdd]

GRM monitor

[Diagram. Labels: MM; LM; proc 1, proc 2; shared-memory buffer; host; socket; memory op.; pipe]

Trace collection

[Diagram: MM, two LMs, Process 1, Process 2, Process 3, trace file]

Buffer is full (to a certain threshold)

Process notifies LM

LM notifies MM

MM asks all LMs to stop application

MM for each LM:
– asks the LM to send its trace
– receives trace from LM
– sets timestamps to a global time
– writes trace into the trace file

MM asks LMs to continue application
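The collection steps can be sketched as two small Python classes. This is a simplified model rather than GRM's implementation: the class and method names are invented for the example, the trace file is an in-memory list, and the conversion to global time assumes each local monitor's clock offset is already known (the slides do not say how GRM derives it). Stopping and continuing the application is only indicated by a comment.

```python
class LocalMonitor:
    """Per-host monitor (LM) holding a bounded trace buffer."""

    def __init__(self, host, clock_offset, threshold):
        self.host = host
        self.clock_offset = clock_offset  # assumed: local clock minus global clock
        self.threshold = threshold
        self.buffer = []

    def record(self, local_time, event):
        """Store an event; return True once the buffer has reached the threshold."""
        self.buffer.append((local_time, event))
        return len(self.buffer) >= self.threshold

    def send_trace(self):
        """Hand the buffered events over and empty the buffer."""
        trace, self.buffer = self.buffer, []
        return trace


class MainMonitor:
    """Main monitor (MM) that merges all local traces into one trace file."""

    def __init__(self, local_monitors):
        self.local_monitors = local_monitors
        self.trace_file = []  # stands in for the real trace file

    def collect(self):
        # The application is assumed to be stopped while this runs.
        collected = []
        for lm in self.local_monitors:
            for local_time, event in lm.send_trace():
                global_time = local_time - lm.clock_offset
                collected.append((global_time, lm.host, event))
        collected.sort()  # order events by global time
        self.trace_file.extend(collected)


if __name__ == "__main__":
    lm1 = LocalMonitor("host1", clock_offset=0, threshold=2)
    lm2 = LocalMonitor("host2", clock_offset=5, threshold=2)
    mm = MainMonitor([lm1, lm2])
    lm1.record(10, "send")
    lm2.record(12, "recv")
    if lm1.record(11, "compute"):  # buffer full: LM would notify MM
        mm.collect()
    print(mm.trace_file)
```

Collecting from every local monitor at once, rather than only from the full one, keeps the merged trace consistent up to the moment the application was stopped.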


Portability: Supported Hardware/Software Platforms

Workstation clusters
• SGI MIPS / IRIX 5.x/6.x (MTA SZTAKI, Univ. of Vienna)
• Sun UltraSPARC / Solaris 2.x (Univ. of Athens)
• Intel x86 / Linux (MTA SZTAKI)

Supercomputers
• Hitachi SR2201 / HI-UX/MPP (Polish-Japanese School, Warsaw)
• Cray T3E / UNICOS (Jülich, Germany)

International installations

• Current
– UK
– Austria
– Spain
– Portugal
– Poland
– Germany
– Slovakia
– Greece
– Japan
– Mexico
– USA
• Planned
– Australia
– Korea

Further Developments

• Family of parallel programming environments

P-GRADE, VisualMP, VisualGrid

- checkpointing
- dynamic load balancing
- fault tolerance
- grid resource management
- grid monitoring
- mobile processes

Conclusion

• Current applications in physics
– Efficiency lost due to high level graphical programming is less than 2 %
• Weather forecast application under development
• Download version:
– www.lpds.sztaki.hu
• P-GRADE (Professional GRADE)
– Project with Silicon Graphics Hungary
– Current developments to support
  • SPMD style programming
  • Object based programming

Thank You ...

?
