
1

Parametric modeling on the Grid with Nimrod/G

Jeff Tan
Faculty of Information Technology
Monash e-Science and Grid Engineering Laboratory

2

Overview

New methods in scientific discovery
e-Science & e-Research
The role of Grid services & middleware
Software lifecycle tools: applications development and execution
Examples from Monash tools

3

Scientific discovery

e-Science & e-Research

4

e-Science

Pre-Internet
Theorize and/or experiment, alone or in small teams; publish paper

Post-Internet
Construct and mine large databases of observational or simulation data
Develop simulations & analyses
Access specialized devices remotely
Exchange information within distributed multidisciplinary teams

Source: Ian Foster

5

Typical Grid Applications: Characteristics

High performance computation
Distributed infrastructure
Instruments are first-class resources
Lots of data
Not just bigger – fundamentally different

Some examples:
In silico biology (see MyGrid)
Earthquake simulation
Virtual observatory
Dynamic aircraft maintenance
High energy physics
Medical applications
Environmental questions

6

Software Life Cycle on the Grid?

Deploy & Build

Execution

Applications Development

Test & Debug

7

Grid Services & Middleware

8

[Figure: Building software for the Grid – a layered stack. Applications (Geo Sciences, Environmental Sciences, Life & Pharmaceutical Sciences) sit on middleware (Globus GT4, Condor, APST), which sits on the platform/infrastructure layer (Unix, Windows, JVM, TCP/IP, MPI, .NET Runtime, VPN, SSH). Courtesy IBM.]

9

[Figure: the same layered stack, with the middleware split into lower middleware and upper middleware & tools, which bond the applications to the infrastructure.]

10

[Figure: the layered stack again, highlighting the semantic gap between the lower middleware (Globus GT4, Web Services, Shibboleth, SRB) and the applications above it.]

11

Coding to underwear

def build_rsl_file(executable, args, stagein=[], stageout=[], cleanup=[]):
    tocleanup = []
    stderr = t5temp.mktempfile()
    stdout = t5temp.mktempfile()
    rstderr = '${GLOBUS_USER_HOME}/.nimrod/' + os.path.basename(stderr)
    rstdout = '${GLOBUS_USER_HOME}/.nimrod/' + os.path.basename(stdout)

    rslfile = t5temp.mktempfile()
    f = open(rslfile, 'w')
    f.write("<job>\n <executable>%s</executable>\n" % executable)
    for arg in args:
        f.write(" <argument>%s</argument>\n" % str(arg))
    f.write(" <stdout>%s</stdout>\n" % rstdout)
    f.write(" <stderr>%s</stderr>\n" % rstderr)
    # User defined stage-in section
    if stagein:
        f.write(" <fileStageIn>")
        for src, dest, leave in stagein:
            if not leave:
                tocleanup.append(dest)
            f.write("""<transfer>
                <sourceUrl>gsiftp://%s%s</sourceUrl>
                <destinationUrl>file:///${GLOBUS_USER_HOME}/.nimrod/%s</destinationUrl>
                </transfer>""" % (hostname, src, dest))
        f.write("\n\t</fileStageIn>\n")
    f.write(" <fileStageOut>")
    # User defined stage-out files section …

12

[Figure: Software layers – the same stack, with the lower middleware (Globus GT4, Web Services, Shibboleth, SRB) separated from the upper middleware/tools layer.]

13

[Figure: the software layers populated with the Monash tools, mapped onto the lifecycle phases (development, deploy, test/debug, execution): Nimrod, the Nimrod Portal & WS, DistANT, Motor, Worqbench, the REMUS debugger, GriddLeS, Kepler, Guard and ActiveSheets sit in the upper middleware/tools layer, above the lower middleware (Globus GT4, Web Services, Shibboleth, SRB).]

14

Applications Development

15

Why is this challenging?

Write software for local workstation

16

Why is this challenging?

Build heterogeneous testbed

17

Applications Development on the Grid

New applications
Code to middleware standards
Significant effort
Exciting new distributed applications
Numerous programming techniques

Legacy applications
Were built before the Grid
They are fragile
File-based I/O
May be sequential
Leverage old codes to produce new virtual applications
Amenable to Grid workflows

18

Approaches to Grid programming

General-purpose workflows
Generic solution: workflow editor, scheduler

Special-purpose workflows
Solve one class of problem: specification language, scheduler

19

Parameter Sweep Workflows with Nimrod

[Figure: the tools map again, highlighting Nimrod and the Nimrod Portal & WS in the upper middleware/tools layer.]

20

Nimrod supports workflows for robust design and search:
Vary parameters
Execute programs
Copy data in and out
Sequential and parallel dependencies
Computational economy drives scheduling
Computation scheduled near data when appropriate
Uses distributed high-performance platforms
Upper-middleware broker for resource discovery
Wide community adoption
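The computational economy that drives the scheduling can be illustrated with a toy sketch (this is not Nimrod/G's actual scheduler; the resource fields and numbers are hypothetical): each job goes to the cheapest resource that can still finish before the user's deadline, subject to a total budget.

```python
def schedule(jobs, resources, budget, deadline):
    """Greedy economy-style scheduler: for each job, pick the cheapest
    resource whose estimated runtime still meets the deadline, stopping
    if the budget would be exceeded."""
    plan, spent = [], 0.0
    for job in jobs:
        # Only resources fast enough to meet the deadline are feasible.
        feasible = [r for r in resources if r["est_minutes"] <= deadline]
        if not feasible:
            raise RuntimeError("deadline cannot be met on any resource")
        cheapest = min(feasible, key=lambda r: r["cost_per_job"])
        if spent + cheapest["cost_per_job"] > budget:
            raise RuntimeError("budget exhausted")
        spent += cheapest["cost_per_job"]
        plan.append((job, cheapest["name"]))
    return plan, spent
```

A tight deadline pushes jobs onto faster, more expensive resources; a tight budget does the opposite – which is the trade-off the computational economy exposes to the user.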

[Figure: Nimrod roadmap, 1994–2006 – Nimrod, Nimrod/G, EnFuzion (www.axceleon.com), Nimrod/O, Nimrod/OI, Nimrod/K, Active Sheets (Excel), Nimrod/WS.]

21

Parameter Studies & Search
Study or search the behaviour of some of the output variables against a range of different input scenarios.

Design optimization
Allows robust analysis
More realistic simulations
Computations are loosely coupled (file transfer)
Very wide range of applications
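A parameter study of this kind is essentially the cross product of each parameter's value range, with one loosely coupled job per combination. A minimal sketch in Python (illustrative only, not Nimrod's implementation):

```python
from itertools import product

def sweep(params):
    """Expand a dict of parameter-name -> value-list into one job
    description per combination (the cross product)."""
    names = list(params)
    return [dict(zip(names, values)) for values in product(*params.values())]

# e.g. two design parameters with two values each -> 4 jobs
jobs = sweep({"energy": [0.03, 0.05], "radius": [0.0625, 0.0725]})
```

Because every combination is an independent computation coupled only through files, the whole sweep can be farmed out across the Grid.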

22

Nimrod scales from local to remote resources

Office → Department → Organisation → Nation

23

From Quantum chemistry to aircraft design

Drug Docking Aerofoil Design

24

Nimrod Development Cycle

Prepare jobs using the portal

Jobs are scheduled and executed dynamically

Sent to available machines

Results displayed & interpreted

25

Optimization using Nimrod/O

Nimrod/G allows exploration of design scenarios

Search by enumeration

Search for local/global minima based on objective function

How do I minimize the cost of this design?
How do I maximize the life of this object?

Objective function evaluated by computational model

Computationally expensive

26

[Figure: How Nimrod/O works – optimization algorithms (Genetic Algorithm, Simplex, BFGS) issue function evaluations to a dispatcher, which turns them into jobs for Nimrod or EnFuzion to run on a grid or cluster, driven by a Nimrod plan file.]
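The loop in the figure – optimizer proposes points, dispatcher farms out the expensive evaluations, results feed back – can be sketched as below. This is a toy coordinate search with concurrent local evaluations standing in for grid jobs, not Nimrod/O's actual algorithms; `model` is a hypothetical expensive simulation.

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch(model, points, workers=4):
    """Evaluate the objective at several points concurrently
    (threads stand in for remote grid jobs)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(model, points))

def coordinate_descent(model, start, step=0.5, iters=20):
    """Toy search: evaluate the neighbours of the current point in
    parallel, move to the best one, halve the step when stuck."""
    x, fx = list(start), model(tuple(start))
    for _ in range(iters):
        candidates = []
        for i in range(len(x)):
            for d in (-step, step):
                c = list(x)
                c[i] += d
                candidates.append(tuple(c))
        values = dispatch(model, candidates)
        best_val, best_pt = min(zip(values, candidates))
        if best_val < fx:
            fx, x = best_val, list(best_pt)
        else:
            step /= 2.0
    return tuple(x), fx
```

The key point is the batching: because each iteration proposes several candidate points, the dispatcher can evaluate them in parallel, hiding the cost of each expensive model run.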

27

Interactive Design

Human in the optimization loop
Use population-based methods
Rank solutions

28

Execution

[Figure: the tools map again, for the execution phase.]

29

Why is this challenging?

Build, schedule & execute a virtual application

30

The Nimrod Portal

31

Nimrod’s Runtime machinery

[Figure: Jobs vs. time (minutes) for a run spread across the testbed – Linux cluster at Monash (20), Sun at ANL (5), SP2 at ANL (5), SGI at ANL (15), SGI at ISI (10).]

Soft real-time scheduling problem

32

Active Sheets …

33

Can we support this process better?

[The lifecycle diagram again: applications development, deploy & build, test & debug, execution.]

Support scientists to do what they do best: science.
A combination of middleware and software tools.

34

Acknowledgements (Monash Grid Research)

Research Fellows: Colin Enticott, Slavisa Garic, Jagan Kommineni, Tom Peachy, Jeff Tan

PhD Students: Shahaan Ayyub, Philip Chan, Tim Ho, Donny Kurniawan, Wojtek Goscinski, Aaron Searle

Funding & Support: CRC for Enterprise Distributed Systems (DSTC), Australian Research Council, GrangeNet (DCITA), Australian Partnership for Advanced Computing (APAC), Microsoft, Sun Microsystems, IBM, Hewlett Packard, Axceleon

35

Questions?

www.csse.monash.edu.au/~davida

36

parameter energy label "Variable Photon Energy" float select anyof 0.03 0.05 0.1 0.2 0.3 default 0.03 0.05 0.1 0.2 0.3;
parameter iseed integer random from 0 to 10000;
parameter length label "Length of collecting electrode" float select anyof .8 .9 1 default .8 .9 1;
parameter radius label "Radius" float select anyof 0.0625 0.0725 0.0825 default 0.0625 0.0725 0.0825;

task nodestart
    copy NE2611.dat node:.
    copy ne2611.skel node:.
endtask

task main
    node:substitute ne2611.skel NE2611.INP
    node:execute ne2611.xx
    copy node:NE2611.OP ne2611out.$jobname
    copy node:stderr ne2611.time.$jobname
endtask

Plan File
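For illustration, the combinations this plan file generates can be reproduced in a few lines of Python (a sketch of the expansion, not Nimrod's engine): each "select anyof" parameter contributes its list of values, and "random" parameters draw a fresh value per job.

```python
import itertools
import random

energy = [0.03, 0.05, 0.1, 0.2, 0.3]   # 5 values
length = [0.8, 0.9, 1.0]                # 3 values
radius = [0.0625, 0.0725, 0.0825]       # 3 values

# One job per combination of the swept parameters, each with an
# independently drawn random seed:
jobs = [
    {"energy": e, "length": l, "radius": r,
     "iseed": random.randint(0, 10000)}
    for e, l, r in itertools.product(energy, length, radius)
]
# 5 * 3 * 3 = 45 jobs
```

Each job then runs the nodestart and main tasks: stage the input files in, substitute the parameter values into the skeleton, execute the model, and copy the outputs back.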


Burnoff of the Australian savanna – does it affect the climate? Testing the PRAGMA Testbed.

K. Görgen, A. Lynch, C. Enticott*, J. Beringer, D. Abramson**, P. Uotila, N. Tapper

School of Geography and Environmental Science; * Distributed Systems Technology Centre; ** School of Computer Science and Software Engineering

38

Savanna Burnoff

• Extensive savanna ecosystems in northern Australia
– 25% of Australia
– Vegetation: spinifex / tussock grasslands; forest / open woodland
– Warm, semiarid tropical climate
– Primary land uses:
> Pastoralism
> Mining
> Tourism
> Aboriginal land management

(Tropical Savannas CRC)

39

Motivation

• Extensive savanna ecosystems in northern Australia
• Changing fire regime
• Fires lead to abrupt changes in surface properties
– Surface energy budgets
– Partitioning of convective fluxes
– Increased soil heat flux
→ Modified surface–atmosphere coupling
• Sensitivity study: do the fire's effects on atmospheric processes lead to changes in the highly variable precipitation regime of the Australian Monsoon?
• Many potential impacts (e.g. agricultural productivity)

(J. Beringer)

40

Experiment Design

• Combination of atmospheric modelling (C-CAM), re-analysis and observational data
• C-CAM simulations:
– Part I, 1974 to 1978: spinup
– Part II, 1979 to 1999: control run (no fires / succession) and real fires / succession, selected scenarios
• ~90 independent runs (fire / succession scenarios) for sensitivity studies → 1890 yrs of simulations

41

Use of Grid Computing

• 90 parallel independent model runs
• Single-CPU model version of parallelized C-CAM (MPI)
• Distribution of forcing data repositories to cluster sites (~80 GB), 250 MB forcing data per month
• Machine-independent data formats (NetCDF)
• Architecture-specific, validated C-CAM executables
• ~1.5 months CPU time for one experiment (90 exp. total)
• Robust, portable, self-controlling model system incl. all processing tools and restart files
• PRAGMA Testbed
– Can we get enough nodes to complete the experiment?
– Can we maintain a testbed for 1.5 months?
– Can we maintain a node up for 0.5 days?
– Can we make this routine for climate modellers?

42

[Figure: time series from March to August 2006 (scale 0–100) for each testbed resource – mahar, rocks-52, ume, jupiter, pragma001, amata1, tgc – and the total.]
