ASKALON: A Tool Set for Cluster and Grid Computing


Page 1: ASKALON A Tool  Set  for  Cluster and Grid Computing

ASKALON: A Tool Set for Cluster and Grid Computing

A Special Research Program funded by ASF

Automatic Performance Analysis: Real Tools

T. Fahringer, A. Hofer, A. Jugravu, S. Pllana, R. Prodan, C. Seragiotto, J. Testori, H.-L. Truong, A. Villazon, M. Welzl

Institute for Computer Science, University of Innsbruck

informatik.uibk.ac.at/dps

Cracow’03 Grid Workshop, Oct. 2003

Page 2: ASKALON A Tool  Set  for  Cluster and Grid Computing

Outline

• ASKALON: Overview
• Performance Analysis and the Grid
• Automatic Experiment Management
• JavaSymphony: A New Programming Method for the Grid
• Summary

Page 3: ASKALON A Tool  Set  for  Cluster and Grid Computing

ASKALON: A Tool Set for Cluster and Grid Architectures

• Zenturio: parameter studies, performance studies, experiment management, software testing
• Scalea: instrumentation, measuring, performance analysis
• Performance Prophet: modeling, simulation, performance prediction
• Aksum: automatic bottleneck analysis
• Architectures: NOWs, PC clusters, SMP clusters, Grid systems, DM/SM systems
• Programming paradigms: MPI, GlobusMPI, OpenMP/MPI, HPF/OpenMP, JavaSymphony
• Database: performance, experiment, program, machine

informatik.uibk.ac.at/dps

Page 4: ASKALON A Tool  Set  for  Cluster and Grid Computing

ASKALON Web Services

[Architecture diagram. Labels shown:]
• User portals: SCALEA, ZENTURIO, AKSUM, PROPHET
• ASKALON Visualization Diagrams
• Services: Performance Property Analyzer, Overhead Analyzer, Search Engine, Performance Analyzer, Performance Estimator, SIS Instrumentor, Experiment Generator, Experiment Executor, Scheduler, Registry, Factory
• Repositories: ASKALON Data Repository, ASKALON Service Repository
• Experiment inputs: application, compilation command, execution command, machine
• Sites: service sites, compute site
• Middleware

Page 5: ASKALON A Tool  Set  for  Cluster and Grid Computing

Performance Analysis for the Grid

So far, mostly low-level analysis:
– monitoring, instrumentation, and analysis for the Grid infrastructure
  • but not for applications
– lots of low-level performance data and visualization
  • lack of high-level summary information
– difficult to associate data with specific middleware components and applications

Page 6: ASKALON A Tool  Set  for  Cluster and Grid Computing

Low-level performance analysis

Page 7: ASKALON A Tool  Set  for  Cluster and Grid Computing

Performance Analysis for the Grid: next steps

– higher-level analysis
  • performance analysis for the Grid and its applications (single-entry single-exit regions)
  • summaries instead of details
  • problems and interpretation instead of raw data
– combined Grid performance analysis (SCALEA, AKSUM) for
  • network
  • site
  • application
– customizable tools instead of hard-coded analysis
– multi-experiment instead of single-experiment analysis
– online and scalable performance analysis

Page 8: ASKALON A Tool  Set  for  Cluster and Grid Computing

Aksum: A Tool for Semi-Automatic Multi-Experiment Performance Analysis

• user-provided problem and machine sizes
• automated instrumentation, experiment management, performance interpretation, and search for performance bottlenecks
• performance analysis for single-entry single-exit regions
• performance problems related to the program
• targets OpenMP, MPI, and mixed programs
• customizable (build your own performance tool)
  – API for performance overheads
  – define performance problems and code regions of interest
  – influence the search (strategy, time, code regions)

Page 9: ASKALON A Tool  Set  for  Cluster and Grid Computing

Specification of Performance Problems with JavaPSL

• JavaPSL is
  – an API for the specification of performance problems
  – a high-level interface for raw performance data
• pre-defined and user-defined JavaPSL problems
• performance problems as values between 0 and 1 (interpretation)

    public class SynchronizationOverhead implements Property {
        // Synchronization overhead relative to the reference region's execution time
        private float severity;

        public SynchronizationOverhead(DynamicCodeRegion d,
                                       ReferenceDynamicCodeRegion r) {
            severity = (float) d.getSynchronizationOverhead() / r.getExecutionTime();
        }

        // The property holds whenever any synchronization overhead was measured
        public boolean holds() { return severity > 0; }
        public float getSeverity() { return severity; }
        public float getConfidence() { return 1; }
    }
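A user-defined property could follow the same pattern. The sketch below is self-contained, so it declares its own stand-ins: `Property` is reduced to the three methods shown on the slide, and `RegionTimings` is a hypothetical class that takes the place of the real `DynamicCodeRegion` and `ReferenceDynamicCodeRegion` types, whose full API the slides do not give. The property name `CommunicationOverheadSketch` is likewise only illustrative.

```java
// Stand-in for the JavaPSL Property interface; only the three methods
// visible on the slide are declared.
interface Property {
    boolean holds();
    float getSeverity();
    float getConfidence();
}

// Hypothetical timing data for one code region (not a JavaPSL class).
final class RegionTimings {
    final double communicationTime;
    final double executionTime;

    RegionTimings(double communicationTime, double executionTime) {
        this.communicationTime = communicationTime;
        this.executionTime = executionTime;
    }
}

// User-defined property: share of the reference region's execution time
// that the analysed region spends communicating.
class CommunicationOverheadSketch implements Property {
    private final float severity;

    CommunicationOverheadSketch(RegionTimings region, RegionTimings reference) {
        // Guard against an empty reference region to avoid dividing by zero.
        severity = reference.executionTime > 0
                ? (float) (region.communicationTime / reference.executionTime)
                : 0f;
    }

    @Override public boolean holds() { return severity > 0; }
    @Override public float getSeverity() { return severity; }
    @Override public float getConfidence() { return 1; }
}

class JavaPslSketchDemo {
    public static void main(String[] args) {
        RegionTimings loop = new RegionTimings(2.5, 4.0);
        RegionTimings program = new RegionTimings(3.0, 10.0);
        Property p = new CommunicationOverheadSketch(loop, program);
        System.out.println(p.holds() + " " + p.getSeverity()); // prints "true 0.25"
    }
}
```

Normalizing by the reference region keeps the severity comparable across code regions, which is what allows problems to be ranked on the 0 to 1 scale mentioned above.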

Page 10: ASKALON A Tool  Set  for  Cluster and Grid Computing

Property Hierarchy

• Defines the evaluation order of performance properties
• Predefined hierarchies
  – OpenMP, MPI, mixed mode
• Can be customized
• Each node has:
  – a threshold (property instances with severity less than the threshold are discarded)
  – a reference code region
  – bean properties
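The role of the per-node threshold can be illustrated with a small tree walk. This is a sketch under stated assumptions: the slide only says that instances below the threshold are discarded, while the code additionally assumes that the subtree of a discarded node is skipped. `HierarchyNode` and `search` are hypothetical names, severities are supplied as constants instead of being computed from measurements, and the parent/child arrangement of the example nodes is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node of a property hierarchy: a name, a precomputed
// severity in [0, 1], a threshold, and child properties that refine it.
class HierarchyNode {
    final String name;
    final double severity;
    final double threshold;
    final List<HierarchyNode> children = new ArrayList<>();

    HierarchyNode(String name, double severity, double threshold) {
        this.name = name;
        this.severity = severity;
        this.threshold = threshold;
    }

    HierarchyNode add(HierarchyNode child) {
        children.add(child);
        return this;
    }
}

class HierarchySearchSketch {
    // Depth-first walk in hierarchy order. A node below its threshold is
    // discarded and (an assumption of this sketch) its subtree is skipped.
    static List<String> search(HierarchyNode root) {
        List<String> reported = new ArrayList<>();
        collect(root, reported);
        return reported;
    }

    private static void collect(HierarchyNode node, List<String> reported) {
        if (node.severity < node.threshold) {
            return;
        }
        reported.add(node.name);
        for (HierarchyNode child : node.children) {
            collect(child, reported);
        }
    }

    public static void main(String[] args) {
        HierarchyNode root = new HierarchyNode("Inefficiency", 0.40, 0.05)
            .add(new HierarchyNode("ParallelInefficiency", 0.30, 0.05)
                .add(new HierarchyNode("SynchronizationOverhead", 0.20, 0.05))
                .add(new HierarchyNode("LoadImbalance", 0.01, 0.05)))
            .add(new HierarchyNode("SerialInefficiency", 0.02, 0.05));
        // Reports the three nodes whose severity reaches the threshold.
        System.out.println(search(root));
    }
}
```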

Page 11: ASKALON A Tool  Set  for  Cluster and Grid Computing

Property Hierarchy (first levels)

[Diagram. Nodes shown: Inefficiency, ParallelInefficiency, SerialInefficiency, DataMovementOverhead, LoadImbalance, SynchronizationOverhead, ControlOfParallelismOverhead, ImperfectFloatingPointBehavior, ImperfectCacheBehavior, ...]

Page 12: ASKALON A Tool  Set  for  Cluster and Grid Computing

NonScalability

Inefficiency

DataMovementOverhead

SynchronizationOverhead

ControlOfParallelismOverhead

SerialInefficiency

ImperfectFloatingPointBehavior

SharedMemoryCoPOverhead

MessagePassingCoPOverhead

ReplicatedComputationOverhead

LossOfParallelismOverhead

UnparallelizedComputationOverhead

PartiallyParallelizedComputationOverhead

ExecutionTimeImbalance

ComputationImbalance

CommunicationImbalance

SynchronizationImbalance

IOImbalance

LocalMemoryAccessImbalance

MultipleAdressSpaceSynchOverhead

SingleAdressSpaceSynchOverhead

L1CacheMissOverhead

IOOverhead

CommunicationOverhead

LocalMemoryOverhead

ImperfectL1CacheBehavior

FlushOverhead

...

P2POverhead

CollectiveOverhead

LargeMessagesOverhead

SmallMessagesOverhead

SendOverhead

ReceiveOverhead

CollectiveCommunicationOverhead

CollectiveComputationOverhead

RemoteMemoryOverhead

PutOverhead

GetOverhead

RemoteMemoryInitializationOverhead

ReadOverhead

WriteOverhead

SmallIORequestsOverhead

LargeIORequestsOverhead

RemoteIOOverhead

LocalIOOverhead

NonStrippedIOOverhead

LargeStartupFileOverhead

LargeOutputFilesOverhead

ReceiveContentionOverhead

LateSendOverhead

LocalReceiveOverhead

LocalSendOverhead

LateReceiveOverhead

SmallMessages2SameDestinationOverhead

SerialCommunicationOverhead

CollectiveCommunicationOnLargeDataStructures

LatePartyOverhead

LocalCollectiveOverhead

GatherOverhead

LocalGatherOverhead

LateGatherOverhead

ScatterOverhead

LocalScatterOverhead

LateScatterOverhead

BroadcastOverhead

LocalBroadcastOverhead

LateBroadcastOverhead

ManualBroadcastOverhead

CollectiveComputationOnLargeDataStructures

LateCollectiveComputationPartyOverhead

LocalCollectiveComputationOverhead

ManualCollectiveComputationOverhead

LargeNumberOfCollectiveComputations

RMALocksOverhead

RMACollectiveSynchOverhead

DeferredCommunicationOverhead

MessagePassingBarrierOverhead

ExplicitLockOverhead

CriticalSectionSynchOverhead

WorkSharingSynchOverhead

AtomicSynchronizationOverhead

SharedMemoryOverhead

LoadImbalance

ImperfectL2CacheBehavior

L2CacheMissOverhead

ImperfectTLBBehavior

TLBMissOverhead

ImperfectPageCacheBehavior

PageFaultOverhead

ParallelRegionFinishOverhead

SharedMemoryLoopSchedulingOverhead

InitializationOverhead

FinalizationOverhead

ParallelRegionStartupOverhead

MessagePassingLoopSchedulingOverhead

ReductionOverhead

LastPrivateOverhead

JoinOverhead

CopyInForCommonBlocksInitOverhead

FirstPrivateClauseInitOverhead

Property Hierarchy

Page 13: ASKALON A Tool  Set  for  Cluster and Grid Computing

Application parameters

• Strings to be substituted in some or all of the input files
• Mapped to ZEN directives in the input files
• Basis for experiment generation and execution done by ZENTURIO

Page 14: ASKALON A Tool  Set  for  Cluster and Grid Computing


Case study: LAPW0 material science code

Page 15: ASKALON A Tool  Set  for  Cluster and Grid Computing


Case study: LAPW0 (views)

Page 16: ASKALON A Tool  Set  for  Cluster and Grid Computing


Case study: LAPW0 (charts)

Page 17: ASKALON A Tool  Set  for  Cluster and Grid Computing

Outline

• Performance Analysis and the Grid
• Automatic Experiment Management
• JavaSymphony: A New Programming Method for the Grid
• Summary

Page 18: ASKALON A Tool  Set  for  Cluster and Grid Computing

Management of Experiments and Parameter Studies

Currently scientists
– manually create parameter studies
– manage many different sets of input data
– launch large numbers of compilations and executions
– administer result files
– invoke performance analysis tools
– interpret/visualize performance and parameter results, etc.

This is a tedious, error-prone, and time-consuming process.

Page 19: ASKALON A Tool  Set  for  Cluster and Grid Computing

ZENTURIO: An Automatic Experiment Management Framework for Cluster and Grid Architectures

Support for scientists to semi-automatically conduct large sets of
– parameter studies
  • throughput versus high-performance computing
– performance studies
– software tests
on cluster and Grid architectures.

Page 20: ASKALON A Tool  Set  for  Cluster and Grid Computing

ZENTURIO: A Web Service based Architecture

[Architecture diagram. Labels shown:]
• User Portal: Experiment Preparation (application, compilation command, execution command, machine), Experiment Monitor, Application Data Visualiser
• Experiment Generator Service (G-Site)
• Experiment Executor Service and Scheduler (E-Site)
• Registry Service
• Experiment Data Repository
• Instrumentation
• Middleware

Page 21: ASKALON A Tool  Set  for  Cluster and Grid Computing

Application Parameters and Value Sets

• Performance and parameter results depend on application parameters and their value sets.
  – machine sizes {x CPUs, y Grid sites, …}
  – problem sizes {x atoms, matrix size, …}
  – program variables {1,2,3,16:110:2}
  – data distributions {block, cyclic, …}
  – loop scheduling strategies {static, guided, …}
  – communication networks {Myrinet, FastEthernet, …}
  – input/output file names, etc.
• An experiment is defined by its sources with every application parameter replaced by a specific value.
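Read this way, the set of experiments is the cross product of the value sets. The sketch below is not ZENTURIO code. `expandRange` assumes that a range such as `16:110:2` means lower bound, upper bound, and stride (the slides do not spell this out), and `crossProduct` is a hypothetical helper that returns one parameter-to-value map per experiment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ValueSetSketch {
    // Expands "low:high" or "low:high:stride" into explicit integers.
    // The meaning of the three fields is an assumption of this sketch.
    static List<Integer> expandRange(String range) {
        String[] parts = range.split(":");
        int low = Integer.parseInt(parts[0].trim());
        int high = Integer.parseInt(parts[1].trim());
        int stride = parts.length > 2 ? Integer.parseInt(parts[2].trim()) : 1;
        if (stride <= 0) {
            throw new IllegalArgumentException("stride must be positive");
        }
        List<Integer> values = new ArrayList<>();
        for (int v = low; v <= high; v += stride) {
            values.add(v);
        }
        return values;
    }

    // Builds every combination of parameter values; each map is one experiment.
    static List<Map<String, String>> crossProduct(Map<String, List<String>> valueSets) {
        List<Map<String, String>> experiments = new ArrayList<>();
        experiments.add(new LinkedHashMap<>());
        for (Map.Entry<String, List<String>> entry : valueSets.entrySet()) {
            List<Map<String, String>> extended = new ArrayList<>();
            for (Map<String, String> partial : experiments) {
                for (String value : entry.getValue()) {
                    Map<String, String> next = new LinkedHashMap<>(partial);
                    next.put(entry.getKey(), value);
                    extended.add(next);
                }
            }
            experiments = extended;
        }
        return experiments;
    }

    public static void main(String[] args) {
        Map<String, List<String>> valueSets = new LinkedHashMap<>();
        valueSets.put("distribution", Arrays.asList("block", "cyclic"));
        valueSets.put("schedule", Arrays.asList("static", "guided"));
        valueSets.put("network", Arrays.asList("Myrinet", "FastEthernet"));
        System.out.println(crossProduct(valueSets).size() + " experiments"); // 2 * 2 * 2 = 8
        System.out.println(expandRange("16:110:2").size() + " values");      // 16, 18, ..., 110
    }
}
```

The experiment count is the product of the value-set sizes, so it grows quickly as parameters are added, which is why constraints between parameters matter in practice.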

Page 22: ASKALON A Tool  Set  for  Cluster and Grid Computing

ZEN: Directive-based Language
Specification of Arbitrarily Complex Experiments

• Set of directives to specify value sets of interest for arbitrary application parameters.
• Directives:
  – assignment
  – substitute
  – constraint
  – performance
• Annotation of arbitrary source/input files
  – program files, Makefiles, scripts, input files, etc.
• ZENTURIO generates sources for every different experiment based on ZEN directives.

Page 23: ASKALON A Tool  Set  for  Cluster and Grid Computing

LAPW0 Machine Size: Globus RSL script

    +(&
      (resourceManagerContact="gescher/jobmanager-pbs")
      (*ZEN SUBSTITUTE count\=4 = { count={2:40} } *)
      (count=4)
      (jobtype=mpi)
      (directory="/home/radu/APPS/LAPW0")
      (executable="../SRC/lapw0")
      (arguments="lapw0.def")
    )

The directive replaces count=4 with each value in the set: count=2, count=3, count=4, ..., count=40.
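One way to picture the effect of the substitute directive is plain textual replacement: each experiment is a copy of the script in which `count=4` is replaced by one element of the value set. The sketch below is an assumption about the effect, not ZENTURIO's implementation. `instantiate` is a hypothetical helper and the script is abbreviated.

```java
import java.util.ArrayList;
import java.util.List;

class SubstituteSketch {
    // Returns one copy of the source per replacement, with every
    // occurrence of the target string replaced.
    static List<String> instantiate(String source, String target, List<String> replacements) {
        List<String> instances = new ArrayList<>();
        for (String replacement : replacements) {
            instances.add(source.replace(target, replacement));
        }
        return instances;
    }

    public static void main(String[] args) {
        // Abbreviated script; the directive line itself is omitted here.
        String rsl = "(count=4)(jobtype=mpi)(executable=\"../SRC/lapw0\")";

        // Value set {2:40}, read as every integer from 2 to 40.
        List<String> counts = new ArrayList<>();
        for (int n = 2; n <= 40; n++) {
            counts.add("count=" + n);
        }

        List<String> scripts = instantiate(rsl, "count=4", counts);
        System.out.println(scripts.size() + " generated scripts");
        System.out.println(scripts.get(0));
    }
}
```

Each generated script is then one experiment with a different machine size.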

Page 24: ASKALON A Tool  Set  for  Cluster and Grid Computing

Problem size: lapw0.def

     4, 'znse_6.inm', 'unknown', 'formatted', 0
    !ZEN$ SUBSTITUTE ktp_.125hour.clmsum = { ktp_.125hour.clmsum, ktp_.25hour.clmsum, ktp_.5hour.clmsum, ktp_1hour.clmsum }
     8, 'ktp_.125hour.clmsum', 'old', 'formatted', 0
    !ZEN$ SUBSTITUTE ktp_.125hour.struct = { ktp_.125hour.struct, ktp_.25hour.struct, ktp_.5hour.struct, ktp_1hour.struct }
    20, 'ktp_.125hour.struct', 'old', 'formatted', 0
    58, 'znse_6.vint', 'unknown','formatted', 0
    !ZEN$ CONSTRAINT INDEX ktp_.125hour.clmsum == ktp_.125hour.struct

Value sets, paired element by element through the index constraint:

    ktp_.125hour.clmsum  ktp_.25hour.clmsum  ktp_.5hour.clmsum  ktp_1hour.clmsum
    ktp_.125hour.struct  ktp_.25hour.struct  ktp_.5hour.struct  ktp_1hour.struct
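Without the constraint, two four-element value sets would give 16 combinations. Requiring equal indices leaves only the four matching pairs. A minimal sketch of that reading of `CONSTRAINT INDEX` follows. The pairing logic is an assumption drawn from the slide, and `pairByIndex` is a hypothetical helper, not part of ZEN.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IndexConstraintSketch {
    // Enumerates all (i, j) combinations of the two value sets and keeps
    // those with i == j, mimicking an index-equality constraint.
    static List<String[]> pairByIndex(List<String> first, List<String> second) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < first.size(); i++) {
            for (int j = 0; j < second.size(); j++) {
                if (i == j) {
                    pairs.add(new String[] { first.get(i), second.get(j) });
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> clmsum = Arrays.asList("ktp_.125hour.clmsum", "ktp_.25hour.clmsum",
                "ktp_.5hour.clmsum", "ktp_1hour.clmsum");
        List<String> struct = Arrays.asList("ktp_.125hour.struct", "ktp_.25hour.struct",
                "ktp_.5hour.struct", "ktp_1hour.struct");
        for (String[] pair : pairByIndex(clmsum, struct)) {
            System.out.println(pair[0] + " with " + pair[1]);
        }
    }
}
```

The nested loop is kept deliberately so the filtering of the full cross product stays visible. A single loop over the shared index would produce the same pairs.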

Page 25: ASKALON A Tool  Set  for  Cluster and Grid Computing

ZEN Performance Behaviour Directive

    !ZEN$ CR CR_P, CR_L PERF WTIME, ODATA
    . . .
    !ZEN$ CR CR_OMPDO, CR_CALLS PERF WTIME, OSYNC BEGIN
    !$OMP DO SCHEDULE(STATIC)
    . . .
    !$OMP END DO NOWAIT
    !$OMP BARRIER
    !ZEN$ END CR

• request performance data for arbitrary code regions
  – CR_P = entire program
  – CR_L = all loops
  – CR_OMPDO = OpenMP do regions
  – CR_CALLS = procedure calls
  – WTIME = execution time
  – ODATA = data movement
  – OSYNC = synchronisation
• 50 code region mnemonics
• 40 performance metrics
• supported by SCALEA

Page 26: ASKALON A Tool  Set  for  Cluster and Grid Computing

Experiment Preparation

Page 27: ASKALON A Tool  Set  for  Cluster and Grid Computing


ZENTURIO User Portal

Page 28: ASKALON A Tool  Set  for  Cluster and Grid Computing

Application Data Visualiser (ADV)

Page 29: ASKALON A Tool  Set  for  Cluster and Grid Computing


Scalability Fast Ethernet

Page 30: ASKALON A Tool  Set  for  Cluster and Grid Computing

Backward Pricing: Total Price Evolution

[Chart: Backward Pricing, delta-t = 1.0. Total price (0 to 14000) over number of timesteps (5 to 60) and coupon (0.01 to 0.09).]

[Chart: Backward Pricing, coupon = 0.05. Total price (0 to 35000) over number of timesteps (5 to 45) and delta-t (0.08 to 0.72).]

Page 31: ASKALON A Tool  Set  for  Cluster and Grid Computing

JavaSymphony: High-Level Object-Oriented Programming of Grid Applications

JavaSymphony (100% Java): a new object-oriented programming paradigm for concurrent and distributed systems
– portability
– higher-level programming
– simple access to resources
– explicit control of locality and parallelism
– performance-oriented

JavaSymphony programming model:
– dynamic virtual architectures (VAs)
– API for system parameters
– single- and multi-threaded remote distributed objects
– distribution/migration of objects and code
– asynchronous and one-sided (remote) method invocation
– synchronization and events (distributed)

And all of that without programming RMI, sockets, and threads!

Page 32: ASKALON A Tool  Set  for  Cluster and Grid Computing

Summary

• Performance analysis for the Grid
  – higher-level analysis, performance interpretation, multi-experiments, automatic, customizable
  – high-level performance instrumentation interface
  – standardization of performance data
• Multi-experiment performance analysis and parameter studies for the Grid
  – request for arbitrary number of experiments
  – automatic management of experiments
  – fault tolerance, events
  – combine with schedulers and performance tools
• JavaSymphony: a new programming model for Grid applications
  – explicit control of locality, parallelism, and load balancing at a high level
  – dynamic virtual architectures, events, synchronization, migration, multi-threaded objects, asynchronous/synchronous/one-sided remote methods
  – no RMI, socket, or thread programming
