
1

Applications Development for the Computational Grid

David Abramson
Faculty of Information Technology
Monash University

2

Overview

- New methods in scientific discovery: e-Science & e-Research
- Computational platforms: the Grid and the Web
- Supporting a software lifecycle: the role of Grid services & middleware
- Software lifecycle tools: applications development, deployment, test and debugging, execution
- Examples from Monash tools

3

Scientific discovery

e-Science & e-Research

4

e-Science

Pre-Internet: theorize and/or experiment, alone or in small teams; publish paper.

Post-Internet:
- Construct and mine large databases of observational or simulation data
- Develop simulations & analyses
- Access specialized devices remotely
- Exchange information within distributed multidisciplinary teams

Source: Ian Foster

5

6

Typical Grid Applications

Characteristics:
- High-performance computation
- Distributed infrastructure
- Instruments are first-class resources
- Lots of data: not just bigger, fundamentally different

Some examples:
- In silico biology (see myGrid)
- Earthquake simulation
- Virtual observatory
- Dynamic aircraft maintenance
- High-energy physics
- Medical applications
- Environmental questions

7

Computational Platforms

Grid and Web Services

8

The Grid

Infrastructure ("middleware" & "services") for establishing, managing, and evolving multi-organizational federations:
- Dynamic, autonomous, domain independent
- On-demand, ubiquitous access to computing, data, and services

Mechanisms for creating and managing workflow within such federations:
- New capabilities constructed dynamically and transparently from distributed services
- Service-oriented, virtualization

Source: Ian Foster

9

What is a Grid?

Three key criteria: a Grid coordinates distributed resources, using standard, open, general-purpose protocols and interfaces, to deliver non-trivial qualities of service.

What is not a Grid? A cluster, a network-attached storage device, a scientific instrument, a network, etc. Each may be an important component of a Grid, but by itself does not constitute a Grid.

Source: Ian Foster

10

The (Power) Grid: On-Demand Access to Electricity

[Figure: quality and economies of scale increasing over time.]

Source: Ian Foster

11

By Analogy, A Computing Grid

- Decouple production and consumption
- Enable on-demand access
- Achieve economies of scale
- Enhance consumer flexibility
- Enable new devices

On a variety of scales: department, campus, enterprise, Internet.

Source: Ian Foster

12

Grid and Web Services Convergence

The definition of WSRF means that the Grid and Web services communities can move forward on a common base.

Source: Globus Alliance

13

Supporting the Software Lifecycle

14

Why is this challenging?

Write software for local workstation

15

Why is this challenging?

Build heterogeneous testbed

16

Why is this challenging?

Deploy Software

17

Why is this challenging?

Test Software

18

Why is this challenging?

Build, schedule & execute virtual application

19

Why is this challenging?

Interpret results

20

But this is what I do well

21

Can we support this process better?

Deploy & Build

Execution

Applications Development

Test & Debug

22

Grid Services & Middleware

23

Building Software for the Grid

Applications: Environmental Sciences, Life & Pharmaceutical Sciences, Geo Sciences
Middleware: Globus GT4, Condor, APST
Platform/Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .NET Runtime, VPN, SSH

Courtesy IBM

24

Building Software for the Grid

Applications: Environmental Sciences, Life & Pharmaceutical Sciences, Geo Sciences
Upper Middleware & Tools (bonds applications to the layers below)
Lower Middleware: Globus GT4, Condor, APST
Platform/Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .NET Runtime, VPN, SSH

Courtesy IBM

25

Building Software for the Grid

Applications: Environmental Sciences, Life & Pharmaceutical Sciences, Geo Sciences
Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB
Platform/Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .NET Runtime, VPN, SSH

26

Building Software for the Grid

Applications: Environmental Sciences, Life & Pharmaceutical Sciences, Geo Sciences
(Semantic Gap)
Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB
Platform/Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .NET Runtime, VPN, SSH

27

Coding to underwear

def build_rsl_file(executable, args, stagein=[], stageout=[], cleanup=[]):
    tocleanup = []
    stderr = t5temp.mktempfile()
    stdout = t5temp.mktempfile()
    rstderr = '${GLOBUS_USER_HOME}/.nimrod/' + os.path.basename(stderr)
    rstdout = '${GLOBUS_USER_HOME}/.nimrod/' + os.path.basename(stdout)

    rslfile = t5temp.mktempfile()
    f = open(rslfile, 'w')
    f.write("<job>\n  <executable>%s</executable>\n" % executable)
    for arg in args:
        f.write("  <argument>%s</argument>\n" % str(arg))
    f.write("  <stdout>%s</stdout>\n" % rstdout)
    f.write("  <stderr>%s</stderr>\n" % rstderr)
    # User-defined stage-in section
    if stagein:
        f.write("  <fileStageIn>")
        for src, dest, leave in stagein:
            if not leave:
                tocleanup.append(dest)
            f.write("""<transfer>
                <sourceUrl>gsiftp://%s%s</sourceUrl>
                <destinationUrl>file:///${GLOBUS_USER_HOME}/.nimrod/%s</destinationUrl>
            </transfer>""" % (hostname, src, dest))
        f.write("\n\t</fileStageIn>\n")
    f.write("  <fileStageOut>")
    # User-defined stage-out files section
    ...

28

Software Layers

Applications: Environmental Sciences, Life & Pharmaceutical Sciences, Geo Sciences
Upper Middleware/Tools
Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB
Platform/Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .NET Runtime, VPN, SSH

29

Software Layers

Applications: Environmental Sciences, Life & Pharmaceutical Sciences, Geo Sciences
Upper Middleware/Tools: Nimrod, Nimrod Portal & WS, DistANT, Motor, Worqbench, Guard, REMUS, GriddLeS, Kepler, ActiveSheets (spanning Development, Deploy, Test/Debug, and Execution)
Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB
Platform/Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .NET Runtime, VPN, SSH

30

Applications Development

31

Applications Development on the Grid

New applications:
- Code to middleware standards
- Significant effort
- Exciting new distributed applications
- Numerous programming techniques

Legacy applications:
- Were built before the Grid
- They are fragile
- File-based IO
- May be sequential
- Leverage old codes to produce new virtual applications
- Amenable to Grid workflows

32

Approaches to Grid Programming

General-purpose workflows: generic solution; workflow editor; scheduler.

Special-purpose workflows: solve one class of problem; specification language; scheduler.

33

Grid Workflows

Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB
Upper Middleware/Tools: Nimrod, Nimrod Portal & WS, DistANT, Motor, Worqbench, Guard, REMUS, GriddLeS, Kepler, ActiveSheets

34

Genomics: Promoter Identification Workflow

Source: Matt Coleman (LLNL)

Source: Ilkay Altintas, SDSC

35

Ecology: GARP Analysis Pipeline for Invasive Species Prediction

[Workflow diagram: EcoGrid queries against registered EcoGrid databases retrieve species presence & absence points (a) and environmental layers (b) for both the native range and the invasion area; layer integration produces integrated layers (c); sample data yields training and test samples (d); a data calculation step derives the GARP rule set (e); map generation and validation (with the user selecting among maps) produce native-range and invasion-area prediction maps (f), model quality parameters (g), and selected prediction maps (h); metadata is generated and results are archived back to the EcoGrid.]

Source: NSF SEEK (Deana Pennington et al., UNM); Ilkay Altintas, SDSC

36

Source: NIH BIRN (Jeffrey Grethe, UCSD)

Source: Ilkay Altintas, SDSC

37

The KEPLER GUI (Vergil)

Drag and drop utilities, director and actor libraries.

Source: Ilkay Altintas, SDSC

38

A Generic Web Service Actor

Given a WSDL and the name of an operation of a web service, the actor dynamically customizes itself to implement and execute that method. (Configure: select service operation.)

Source: Ilkay Altintas, SDSC

39

Kepler Directors Orchestrate Workflow

Synchronous Data Flow (SDF):
- Consumer actors not started until the producer completes
- Files copied from producer to consumer

Process Networks (PN):
- All actors execute concurrently
- Communication through TCP/IP sockets or dedicated IO

The IO modes produce different performance results, and actors need to be coded to support specific IO modes.
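Kepler itself is Java-based, but the contrast between the two director styles can be sketched in plain Python. The snippet below (all names invented for illustration) mimics the Process Networks style: producer and consumer actors run concurrently as threads connected by a queue, whereas under an SDF-style director the producer would run to completion and hand over its whole output before the consumer starts.

```python
import queue
import threading

def producer(out_q):
    # Emit a stream of tokens, then a sentinel to signal completion.
    for i in range(5):
        out_q.put(i)
    out_q.put(None)

def consumer(in_q, results):
    # Consume tokens as they arrive (Process Networks style: producer and
    # consumer run concurrently, connected by a channel).
    while True:
        token = in_q.get()
        if token is None:
            break
        results.append(token * token)

channel = queue.Queue()
results = []
t1 = threading.Thread(target=producer, args=(channel,))
t2 = threading.Thread(target=consumer, args=(channel, results))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [0, 1, 4, 9, 16]
```

With a single FIFO channel the result order is deterministic even though the two actors overlap in time; that overlap is what the PN director buys over SDF.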

40

Parameter Sweep Workflows with Nimrod

Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB
Upper Middleware/Tools: Nimrod, Nimrod Portal & WS, DistANT, Motor, Worqbench, Guard, REMUS, GriddLeS, Kepler, ActiveSheets

41

Nimrod ...

Supports workflows for robust design and search:
- Vary parameters, execute programs, copy data in and out
- Sequential and parallel dependencies
- Computational economy drives scheduling; computation is scheduled near data when appropriate
- Uses distributed high-performance platforms
- Upper-middleware broker for resource discovery
- Wide community adoption
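The core of a parameter sweep is expanding the cross product of parameter values into one job per combination. As an illustrative sketch only (the parameter names mirror the plan file shown later in the deck, but this is not Nimrod code), in Python:

```python
import itertools

# Hypothetical sweep definition: each parameter ranges over a set of values,
# in the spirit of a plan file's "parameter ... select anyof ..." lines.
parameters = {
    "length": [0.8, 0.9, 1.0],
    "radius": [0.0625, 0.0725, 0.0825],
}

def expand_jobs(parameters):
    """Expand the cross product of parameter values: one job per combination."""
    names = sorted(parameters)
    for values in itertools.product(*(parameters[n] for n in names)):
        yield dict(zip(names, values))

jobs = list(expand_jobs(parameters))
print(len(jobs))  # 9 jobs: 3 lengths x 3 radii
```

Each resulting dictionary would be substituted into a job template and dispatched to a resource; the scheduler's job is deciding where each of those independent jobs runs.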

Nimrod Roadmap (1994-2006): Nimrod, Nimrod/G, EnFuzion (www.axceleon.com), Nimrod/O, Nimrod/OI, Active Sheets (Excel), Nimrod/WS, Nimrod/K

42

Parameter Studies & Search

Study or search the behaviour of some of the output variables against a range of different input scenarios:
- Design optimization
- Allows robust analysis
- More realistic simulations

Computations are loosely coupled (file transfer); very wide range of applications.

43

Nimrod scales from local to remote resources

Office

Department

Organisation

Nation

44

From Quantum chemistry to aircraft design

Drug Docking Aerofoil Design

45

Nimrod Development Cycle

Prepare Jobs using Portal

Jobs scheduled & executed dynamically

Sent to available machines

Results displayed & interpreted

46

Experimental Design

- Want to evaluate the effects of parameters and parameter combinations
- "Design of Experiments" approach: dates back to the 1950s; extensively used to generate the minimum number of "right" experiments
- New support in Nimrod/G: specify the resolution of the experiment

[Figure: estimated main effects and interactions for the model parameters, ordered by magnitude, ranging from about -0.8 to 0.8.]

A = Kncx, B = KPCa, C = KgL, D = Kgammacf, E = PropLocalNCX, F = constKmCa, G = constKmNa, H = constksat, J = Kryrmax, K = Kvmax, L = Kgto, M = KVSR, N = KVSS, O = Kr_xfer, P = K_IpCamax, Q = K_KmpCa, R = KbL, S = KaL, T = KfL
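The main-effect estimates plotted above come from standard two-level factorial machinery. A minimal sketch (a toy two-factor example, not the Nimrod/G implementation): code each factor at -1/+1, run every combination, and estimate a factor's main effect as the average response at its high level minus the average at its low level.

```python
import itertools

def full_factorial(factors):
    """All 2^k runs of a two-level design, coded -1/+1."""
    return [dict(zip(factors, levels))
            for levels in itertools.product((-1, 1), repeat=len(factors))]

def main_effect(runs, responses, factor):
    """Average response at +1 minus average response at -1."""
    hi = [y for run, y in zip(runs, responses) if run[factor] == 1]
    lo = [y for run, y in zip(runs, responses) if run[factor] == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

# Toy response y = 3*A + 0.5*B (no interaction), so the estimated
# main effects should recover 6 for A and 1 for B.
runs = full_factorial(["A", "B"])
responses = [3 * r["A"] + 0.5 * r["B"] for r in runs]
print(main_effect(runs, responses, "A"))  # 6.0
print(main_effect(runs, responses, "B"))  # 1.0
```

A fractional design keeps only a subset of these 2^k runs; the "resolution" mentioned on the slide controls which effects remain distinguishable from one another.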

47

Optimization using Nimrod/O

- Nimrod/G allows exploration of design scenarios: search by enumeration
- Nimrod/O searches for local/global minima based on an objective function: how do I minimise the cost of this design? How do I maximize the life of this object?
- The objective function is evaluated by a computational model, which is computationally expensive
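The overall shape of such an optimizer can be sketched in a few lines. This is not Nimrod/O code: `simulate` is an invented stand-in for the expensive model (Nimrod/O would farm each evaluation out as a Grid job), and the search here is a deliberately simple one-dimensional pattern search rather than BFGS or a genetic algorithm.

```python
def simulate(x):
    # Stand-in for an expensive computational model evaluation.
    return (x - 2.0) ** 2 + 1.0

def pattern_search(f, x0, step=1.0, tol=1e-6):
    """Minimal pattern search: try both neighbours, move to an
    improvement, otherwise halve the step until it is below tol."""
    x, fx = x0, f(x0)
    while step > tol:
        moved = False
        for cand in (x - step, x + step):
            fc = f(cand)
            if fc < fx:
                x, fx, moved = cand, fc, True
                break
        if not moved:
            step /= 2.0
    return x, fx

x, fx = pattern_search(simulate, x0=10.0)
print(x, fx)  # 2.0 1.0
```

The point of the slide survives the simplification: every probe of the search costs one full model run, which is why the scheduler and the optimizer have to cooperate.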

48

How Nimrod/O Works

[Diagram: a Nimrod plan file drives optimization algorithms (BFGS, Simplex, Genetic Algorithm); these request function evaluations, which the dispatcher turns into jobs for Nimrod or EnFuzion, executed on a Grid or cluster.]

49

Interactive Design

Human in the optimization loop:
- Use population-based methods
- Rank solutions

50

Interactive Design

51

Deployment

Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB
Upper Middleware/Tools: Nimrod, Nimrod Portal & WS, DistANT, Motor, Worqbench, Guard, REMUS, GriddLeS, Kepler, ActiveSheets

52

Why is this challenging?

Deploy Software

53

Deployment: Two Approaches

Hide the heterogeneity:
- Use local knowledge about the instruction set, machine structure, file system, I/O system, and installed libraries
- Build on VM technology; provide services for deployment

Expose the heterogeneity:
- Build an integrated framework that knows about the testbed
- Support the user in managing differences
- Build on IDE technology

54

Deployment Service (hide the heterogeneity)

[Diagram: on the client machine, .NET compilers turn application source into intermediate code or an application binary; the deployment service installs it on the Grid resource via GRAM and returns an application handle; installed applications are then executed through that handle on a stack of .NET Runtime, .NET Parallel Virtual Machine, and Globus/OGSA.]

Grid build files based on ANT; create a deployment space consistent with GT GRAM.

55

Motor Runtime: A VM for HPC

Our approach is runtime-internal. Why do Java & .NET support web services, UI, security and other libraries as part of the standard environment? Because functionality is guaranteed. Similarly, we aim to provide guaranteed HPC functionality.

56

Leveraging IDEs: Worqbench

- Use the Eclipse IDE to support users
- The testbed is a first-class object in the IDE

57

Test and Debug

Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB
Upper Middleware/Tools: Nimrod, Nimrod Portal & WS, DistANT, Motor, Worqbench, Guard, REMUS, GriddLeS, Kepler, ActiveSheets

58

Why is this challenging?

Test Software

59

Grid level basic debugging

Hardware

Software

60

Relative Debugging

What do you do when you move your application to another node of the Grid and it stops working? Subtle errors can be introduced through changes made by the programmer or in the environment (DLL hell). The programmer must understand the application intimately to locate the source of errors, and can spend much time tracing program state and understanding how code changes may have resulted in errors.

Relative debugging is about automating this process: a hybrid test-and-debug methodology.
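The essence of a relative-debugging assertion is "variable v in the reference run should equal variable v' in the suspect run at this breakpoint". A minimal sketch of that comparison (invented names, not GUARD's implementation, which also handles differing endianness, word sizes, and data decompositions):

```python
def compare_states(reference, suspect, tolerance=0.0):
    """Compare corresponding variables from two runs; return the
    (name, reference_value, suspect_value) triples that diverge."""
    diffs = []
    for name, ref_value in reference.items():
        sus_value = suspect.get(name)
        if sus_value is None or abs(ref_value - sus_value) > tolerance:
            diffs.append((name, ref_value, sus_value))
    return diffs

# Hypothetical state snapshots captured at the same breakpoint in two runs.
ref = {"pressure": 1.013, "temp": 293.0}
sus = {"pressure": 1.013, "temp": 293.5}
print(compare_states(ref, sus, tolerance=0.1))  # [('temp', 293.0, 293.5)]
```

Automating this comparison is what spares the programmer from hand-tracing state across two platforms to find where the ports first diverge.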

61

Relative Debugging on the Grid

[Diagram: a client running GUARD communicates over TCP/IP with two servers running the application: one big-endian, 64-bit; the other little-endian, 32-bit.]

62

[Diagram: from the source code, build assertions over simple and complex data types; run both applications; where the results differ, visualize the differences.]

63

Execution

Lower Middleware: Globus GT4, Web Services, Shibboleth, SRB
Upper Middleware/Tools: Nimrod, Nimrod Portal & WS, DistANT, Motor, Worqbench, Guard, REMUS, GriddLeS, Kepler, ActiveSheets

64

Why is this challenging?

Build, schedule & execute virtual application

65

The Nimrod Portal

66

Nimrod’s Runtime machinery

[Figure: number of jobs running on each resource over a ~55-minute run: Linux cluster, Monash (20); Sun, ANL (5); SP2, ANL (5); SGI, ANL (15); SGI, ISI (10).]

Soft real-time scheduling problem

67

Flexible Grid Workflows

[Workflow diagram: global climate data (temperature, pressure, etc.) drives the CCAM and DARLAM models and a set of RCM instances (including CIT), spread across a vector machine, a shared-memory multiprocessor, a Linux cluster, and a mainframe; outputs include regional weather data (temperature, pressure, etc.) and ozone concentration contours.]

All models provided by CSIRO Division of Atmospheric Research

68

GriddLeS

Legacy applications need to be shielded from IO details in the Grid:
- Local files
- Remote files
- Replicated files
- Producer-consumer pipes

We don't want to lock in the IO model when the application is written (or even Grid-enabled); the choice of IO model should be dynamic and late-bound.
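Late binding here means the application opens a logical file name and a name service decides, at run time, what actually backs it. A minimal sketch of that idea (invented names and a two-scheme toy mapping, not the GriddLeS API):

```python
import io

class FileMultiplexer:
    """Late-bound open(): a name-server mapping resolves logical file
    names to an access method at run time, so the application never
    hard-codes whether a file is local, remote, or a pipe."""

    def __init__(self, name_server):
        # name_server: logical name -> (scheme, target)
        self.name_server = name_server

    def open(self, logical_name, mode="r"):
        scheme, target = self.name_server[logical_name]
        if scheme == "local":
            return open(target, mode)
        if scheme == "memory":  # stands in for a remote/buffered source
            return io.StringIO(target)
        raise ValueError("unsupported scheme: %s" % scheme)

# The mapping can change between runs without touching application code.
gns = {"input.dat": ("memory", "42\n")}
mux = FileMultiplexer(gns)
with mux.open("input.dat") as f:
    print(f.read())
```

Rebinding "input.dat" to a local path, a replica, or a pipe is then purely a configuration change, which is the property the slide is after.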

69

Flexible IO in GriddLeS

[Diagram: the legacy application's open(), close(), read(), write() and seek() calls pass through a file multiplexer, which routes them to a local file, a remote file, a remote application process, or a cache backed by replicas 1-3.]

70

GriddLeS Architecture

[Diagram: the application's read, write, etc. calls enter the file multiplexer, which dispatches to a local file client (local file system), a remote file client (GridFTP server), a grid buffer client (grid buffer server), a GNS client (GriddLeS Name Server), and a Grid Replication Service (GRS) client backed by SRB, Globus replication, and GFarm.]

71

Nimrod/K

New project to integrate:
- The special-purpose functions of Nimrod/G and Nimrod/O
- General-purpose workflows from Kepler
- The IO model from GriddLeS

Plus better integration with portals and more flexible scheduling.

72

Can we support this process better?

Deploy & Build

Execution

Applications Development

Test & Debug

Support scientists to do what they do best: science.

Through a combination of middleware and software tools.

73

Acknowledgements (Monash Grid Research)

Research Fellows: Colin Enticott, Slavisa Garic, Jagan Kommineni, Tom Peachy, Jeff Tan

PhD Students: Shahaan Ayyub, Philip Chan, Tim Ho, Donny Kurniawan, Wojtek Goscinski, Aaron Searle

Funding & Support: CRC for Enterprise Distributed Systems (DSTC), Australian Research Council, GrangeNet (DCITA), Australian Partnership for Advanced Computing (APAC), Microsoft, Sun Microsystems, IBM, Hewlett Packard, Axceleon

74

Questions?

www.csse.monash.edu.au/~davida

75

parameter energy label "Variable Photon Energy" float select anyof 0.03 0.05 0.1 0.2 0.3 default 0.03 0.05 0.1 0.2 0.3;
parameter iseed integer random from 0 to 10000;
parameter length label "Length of collecting electrode" float select anyof .8 .9 1 default .8 .9 1;
parameter radius label "Radius" float select anyof 0.0625 0.0725 0.0825 default 0.0625 0.0725 0.0825;

task nodestart
    copy NE2611.dat node:.
    copy ne2611.skel node:.
endtask

task main
    node:substitute ne2611.skel NE2611.INP
    node:execute ne2611.xx
    copy node:NE2611.OP ne2611out.$jobname
    copy node:stderr ne2611.time.$jobname
endtask

Plan File

www.monash.edu.au

Burnoff of the Australian savanna: does it affect the climate? Testing the PRAGMA Testbed.

K. Görgen, A. Lynch, C. Enticott*, J. Beringer, D. Abramson**,

P. Uotila, N. Tapper

School of Geography and Environmental Science
* Distributed Systems Technology Centre
** School of Computer Science and Software Engineering

77

Savanna Burnoff

• Extensive savanna ecosystems in northern Australia (25% of Australia)
• Vegetation: spinifex / tussock grasslands; forest / open woodland
• Warm, semiarid tropical climate
• Primary land uses: pastoralism, mining, tourism, Aboriginal land management

(Tropical Savannas CRC)

78

Motivation

• Extensive savanna ecosystems in northern Australia
• Changing fire regime
• Fires lead to abrupt changes in surface properties: surface energy budgets, partitioning of convective fluxes, increased soil heat flux → modified surface-atmosphere coupling
• Sensitivity study: do the fire's effects on atmospheric processes lead to changes in the highly variable precipitation regime of the Australian Monsoon?
• Many potential impacts (e.g. agricultural productivity)

(J. Beringer)

79

Experiment Design

• Combination of atmospheric modelling (C-CAM), re-analysis and observational data
• C-CAM simulations:
  Part I, 1974 to 1978: spinup
  Part II, 1979 to 1999: control run (no fires / succession) and real fires / succession for selected scenarios
• ~90 independent runs (fire / succession scenarios) for sensitivity studies → 1890 yrs of simulations

80

Use of Grid Computing

• 90 parallel independent model runs
• Single-CPU model version of parallelized C-CAM (MPI)
• Distribution of forcing-data repositories to cluster sites (~80 GB), 250 MB forcing data per month
• Machine-independent data formats (NetCDF)
• Architecture-specific, validated C-CAM executables
• ~1.5 months CPU time for one experiment (90 experiments total)
• Robust, portable, self-controlling model system incl. all processing tools and restart files
• PRAGMA Testbed:
  – Can we get enough nodes to complete the experiment?
  – Can we maintain a testbed for 1.5 months?
  – Can we keep a node up for 0.5 days?
  – Can we make this routine for climate modelers?

81

[Figure: number of running jobs (0-100) on each testbed node (mahar, rocks-52, ume, jupiter, pragma001, amata1, tgc) and in total, March through August 2006.]
