Cyberinfrastructure: Helping Push Research Boundaries

Cyberinfrastructure: Helping Push Research Boundaries. Shantenu Jha* (Asst. Res. Professor, CS; Sr. Research Scientist, CCT). *Also affiliated with the National e-Science Centre (UK) & UCL.

Upload: cybera-inc

Post on 11-May-2015


DESCRIPTION

Shantenu Jha's presentation from Cybera's 2007 Banff Cyberinfrastructure Summit

TRANSCRIPT

Page 1: Cyberinfrastructure: Helping Push Research Boundaries

Cyberinfrastructure: Helping Push Research Boundaries 

Shantenu Jha*

Asst. Res. Professor (CS)  

Sr. Research Scientist (CCT)
*also affiliated with the National e-Science Centre (UK) & UCL

Page 2: Cyberinfrastructure: Helping Push Research Boundaries


CI: Helping Push Research Boundaries

• Developing CI: a one-step process? "If we build it, will they come?" "Will it be usable?"

• The interplay of (sustainable, long-term, and broadly usable) CI and research is more complex:
  – Research & application requirements inform the development of CI
  – In response, the developed CI "roughly" sets the boundaries of applications and their usage modes
  – Novel applications and usage modes that can exploit CI will push the boundaries of research...

Page 3: Cyberinfrastructure: Helping Push Research Boundaries


Outline
• Scientific Grid Applications
  • Computing Free Energies in Biological Systems
    – STIMD (2003-04), SPICE (2005-06)
  • Challenges of Distributed Environments
  • HARC: A tool for co-allocating resources
    – GENIUS: Grid-Enabled Neurosurgical Imaging Using Simulations (2007-08)
  • Simple API for Grid Applications (SAGA)
• Regional CI Example - Louisiana
  • Software: Cactus, HARC, PetaShare, SAGA...
  • People: LONI Institute and NSF CyberTools
• Novel e-Science Applications

Page 4: Cyberinfrastructure: Helping Push Research Boundaries

[Figure] Source: NSF report on Cyberinfrastructure for Biology

Page 5: Cyberinfrastructure: Helping Push Research Boundaries

   

Computing Free Energies: Motivation

Free energy is the thermodynamic quantity of maximum significance.

Characterizes the binding accurately: inhibiting specific protein domains, cell signalling events, intelligent drug design...

Rapid & accurate determination is critical:
  – where the FE difference may be just one part of the overall "system"
  – a library of ligands to explore

[Diagram: cell-signalling example - cellular messengers (e.g. growth factors, cytokines) bind a transmembrane receptor; the activated receptor is recognised by an SH2 domain; a gene is switched on]

Page 6: Cyberinfrastructure: Helping Push Research Boundaries

   

Scientific Grid Computing: An Exemplar

• Computing FE is computationally very expensive. There is a balance between accuracy and rapid determination. Some experimental time-scales are 2-3 days.

• Algorithmic advances (e.g., O(N log N)) have helped; but more than just algorithmic advances are required.

• Computational "Grid" Science: which approaches can be adapted to exploit grid-based computation? Interplay of physical algorithm(s) and grid architecture(s).

 

Page 7: Cyberinfrastructure: Helping Push Research Boundaries

   

Computing a Free Energy Difference Using Thermodynamic Integration (TI)

[Thermodynamic-cycle diagram: λ = 0 to λ = 1, with legs labelled ∆GA, ∆GB, ∆G1 and ∆G2]

TI provides a formalism to compute the difference of the free energy of binding (∆∆GAB) between two somewhat similar, yet different peptides. The key concept in TI is that of a thermodynamic cycle - varying the value of λ from 0 (peptide A) to 1 (peptide B).

∆∆GAB = ∆GB − ∆GA = ∆G1 − ∆G2

[Figure: Src SH2 domain with bound ligand] The free energy of binding of the ligand to the larger protein characterises the strength of the binding.
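For reference, the working equation behind this construction (the standard thermodynamic-integration expression, not shown explicitly on the slide) is:

  \Delta G = \int_{0}^{1} \left\langle \frac{\partial H(\lambda)}{\partial \lambda} \right\rangle_{\lambda} d\lambda ,
  \qquad
  \Delta\Delta G_{AB} = \Delta G_{B} - \Delta G_{A} = \Delta G_{1} - \Delta G_{2}

Each leg of the cycle is obtained by integrating the ensemble average of ∂H/∂λ over the coupling parameter, which is why the workflow on the next slide has to sample (and converge) many λ windows.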

Page 8: Cyberinfrastructure: Helping Push Research Boundaries

   

In general, use real-time analysis to dynamically determine the best next value of λ.

Launch the initial job, then use real-time analysis to determine when to spawn the next simulation at a new λ value. Spawning of simulations continues until sufficient data have been collected. Need to control several jobs.

TI Calculation: Modified Workflow

[Workflow diagram: from a starting conformation, simulations are spawned over time at λ = 0.10, 0.25, 0.33, ... 0.9; each run is checked for convergence before the next λ value is chosen]

Combine data from all runs and compute the integral to obtain ∆∆GAB (a toy sketch of this adaptive loop follows).
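A minimal, self-contained sketch of that adaptive-λ loop (assumptions: the analytic dH_dlambda() below is a toy stand-in for a converged simulation at one λ; in the actual STIMD workflow every evaluation is a spawned, steered MD job and the refinement test is the real-time convergence analysis):

#include <cmath>
#include <iostream>
#include <iterator>
#include <map>

// Toy stand-in for "run a simulation at lambda and return <dH/dlambda>".
static double dH_dlambda(double lambda) { return 10.0 * std::exp(-3.0 * lambda) - 2.0; }

int main()
{
    std::map<double, double> window;          // lambda -> <dH/dlambda>
    window[0.0] = dH_dlambda(0.0);            // peptide A end point
    window[1.0] = dH_dlambda(1.0);            // peptide B end point

    // Adaptive refinement: "spawn" a new lambda wherever the integrand is
    // not yet resolved between two neighbouring windows.
    const double tol = 1e-3;
    bool refined = true;
    while (refined) {
        refined = false;
        for (auto it = window.begin(); std::next(it) != window.end(); ++it) {
            auto nxt = std::next(it);
            double mid          = 0.5 * (it->first  + nxt->first);
            double interpolated = 0.5 * (it->second + nxt->second);
            double measured     = dH_dlambda(mid);   // would be a new spawned job
            if (std::abs(measured - interpolated) > tol) {
                window[mid] = measured;
                refined = true;
                break;                               // re-scan after each new window
            }
        }
    }

    // Combine data from all runs: trapezoidal integration over lambda.
    double ddG = 0.0;
    for (auto it = window.begin(); std::next(it) != window.end(); ++it) {
        auto nxt = std::next(it);
        ddG += 0.5 * (it->second + nxt->second) * (nxt->first - it->first);
    }
    std::cout << "lambda windows used: " << window.size()
              << ", integral = " << ddG << "\n";
}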

Page 9: Cyberinfrastructure: Helping Push Research Boundaries

   

Infrastructure (Software) Developed for Application

Steering Library: the correct functional abstraction from the application's perspective.

steering_control(), register_params(), emit_data()

The library determines the destination, transport mechanism, fopen, fwrite. Details of the infrastructure & middleware (layers) are hidden from the app (see the call-pattern sketch below).

[Figure: Architecture of a steered application]
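To make the call pattern concrete, here is an illustrative skeleton of where those three calls sit in a simulation's main loop. The function names are the ones on the slide, but the signatures are invented for this sketch and are not the actual RealityGrid/ReG steering prototypes:

#include <cstddef>
#include <vector>

// Invented signatures, for illustration only (the real library differs):
void register_params(double* lambda, int* output_freq);
int  steering_control(int step, std::vector<int>& commands);
void emit_data(const char* name, const void* buf, std::size_t nbytes);

void md_main_loop(int nsteps)
{
    double lambda      = 0.0;
    int    output_freq = 100;
    register_params(&lambda, &output_freq);        // expose steerable parameters once

    std::vector<int> commands;
    for (int step = 0; step < nsteps; ++step) {
        // ... advance the MD integrator by one step ...

        // Poll the steering library; it decides destination and transport
        // (sockets, files, ...) behind this single call.
        if (steering_control(step, commands) > 0) {
            // ... apply changed parameters, honour pause/stop/spawn requests ...
        }

        if (step % output_freq == 0) {
            double dHdl = 0.0;                       // ... computed by the MD code ...
            emit_data("dH_dlambda", &dHdl, sizeof dHdl);
        }
    }
}

The point of the abstraction is that the application only ever sees these few calls; fopen/fwrite versus socket transport, and which machine the data goes to, are decided inside the library.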

Page 10: Cyberinfrastructure: Helping Push Research Boundaries

Extensions of the Distributed TI Concept: Computational Techniques

– Replica Exchange Methods: need for "intelligent" infrastructure coupled with the analysis method
– Ensemble MD:
  – Simulate each system many times from the same starting position
  – Allows conformational sampling; can't say how much a priori

[Diagram: start conformation → series of runs (equilibration protocols eq1-eq8) → end conformations C1, C2, C3, C4, ... Cx]

Page 11: Cyberinfrastructure: Helping Push Research Boundaries

   

RNA Translocation Through Protein Pores

Molecular Biology: a critical and ubiquitous process. A model for:
  – gene expression in eukaryotic cells
  – viral infections, which rely on import of the viral genome via the nuclear pore
  – translocation across the bacterial membrane during phage infection
Technical Applications: artificial pores (similar to the natural pore) for high-throughput DNA screening
Theoretical Physics: the long-standing problem of semi-flexible polymer motion in a confined geometry

Page 12: Cyberinfrastructure: Helping Push Research Boundaries

   

Simulated Pore Interactive Computing Environment

Size, complexity & timescale: computations are expensive. Millions of CPU hours using "vanilla" MD. Not good enough.

Free Energy Profile: extremely challenging, but yields maximal insight and understanding of the translocation process.

Novel Algorithm: Steered Molecular Dynamics to "pull DNA through the pore". Jarzynski's equation to compute the equilibrium free energy profile from such non-equilibrium pulling.

exp(−β∆F) = ⟨ exp(−βW) ⟩
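Spelled out (this is the standard form of the Jarzynski estimator, with β = 1/k_BT, rather than anything SPICE-specific), the equilibrium free-energy difference is recovered from N non-equilibrium pulling works W_i as:

  e^{-\beta \Delta F} = \left\langle e^{-\beta W} \right\rangle
  \;\;\Longrightarrow\;\;
  \Delta F \approx -\frac{1}{\beta} \ln\!\left( \frac{1}{N} \sum_{i=1}^{N} e^{-\beta W_i} \right)

Because the exponential average is dominated by rare low-work pulls, many independent pulling trajectories are needed, which is one reason the calculation maps naturally onto distributed resources.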

Page 13: Cyberinfrastructure: Helping Push Research Boundaries

   

Grid Computing Using Novel Algorithms 

Reduces the computational cost by a factor of ca. 100. Solves a computationally "intractable" problem using a novel algorithm. Our solution not only exploits grid infrastructure, but requires it.

SMD+JE: need to determine "optimal" parameters before running simulations at those optimal values.

Requires: interactive simulations, and distributing many large parallel simulations. Interactive "live coupling": use visualization to steer the simulation.

Page 14: Cyberinfrastructure: Helping Push Research Boundaries

   

Replace a single long-running "vanilla" MD simulation with the following scheme:

Step I: Understand structural features using static visualization.

Step II: Interactive simulations for dynamic and energetic features
  – Steered simulations: bidirectional communication. Qualitative + quantitative (SMD+JE)
  – Haptic interaction: use haptics to feel the feedback forces

Step III: Simulations to compute "optimal" parameter values: e.g., 75 simulations on 128/256 processors each.

Step IV: Use the computed "optimal" values to calculate the full FEP along the cylindrical axis of the pore.

    

SPICE: Computing the Free Energy Profile (FEP)

Page 15: Cyberinfrastructure: Helping Push Research Boundaries

   

Grid Computing, Interactivity and Analysis

• Interactive simulations used to determine: the optimal value of the force constant & pulling velocity, and the choice of sub-trajectory length and location for the optimal-value simulations.

• Use visualization to provide input to the running simulation. Requires 256 processors (or more) of HPC for interactivity. Steady-state data stream (up & down).

Interactive simulations perform better when using optical lightpaths between simulation and visualization, due to network characteristics. A typical "legacy" app (NAMD) is not written for network I/O; "unreliable" transfer can stall simulations.

Page 16: Cyberinfrastructure: Helping Push Research Boundaries

“Global” Grid Infrastructure

[Network diagram: computation sites on the US TeraGrid (PSC, SDSC, NCSA) and DEISA; visualization sites on the UK NGS (Leeds, Manchester, Oxford, RAL) and HPCx; application scientists; linked via network PoPs at Starlight (Chicago), Netherlight (Amsterdam) and UKLight. All sites connected by the production network.]

Page 17: Cyberinfrastructure: Helping Push Research Boundaries

Recap: FE Exemplars

Both FE algorithms are good candidates for distributed resource utilization
  – i.e., "pleasantly" distributable
Similar infrastructure
  – Software (ReG Steering Services), middleware...
  – Federated grids
SPICE more complex than STIMD:
  – Complexity of tasks differs
  – Needs co-scheduling of heterogeneous resources
  – Number of components / degrees of freedom differs

Page 18: Cyberinfrastructure: Helping Push Research Boundaries

   

VORTONICS: Vortex Dynamics on Transatlantic Federated Grids

US-UK TG-NGS joint projects supported by NSF, EPSRC, and TeraGrid.

Computational challenge: enormous problem sizes, memory requirements, and long run times. The largest runs require geographically distributed domain decomposition (GD3).

Page 19: Cyberinfrastructure: Helping Push Research Boundaries

   

Run Sizes to Date / Performance

• Using an early version of MPICH-G2, 3D lattice sizes up to 645³ across six sites on TG/NGS
  – NCSA, SDSC, ANL, TACC, PSC, CSAR (UK)
• Amount of data injected into the network: strongly bandwidth limited.
• Effective SUPS/processor reduced by a factor approximately equal to the number of sites
• Therefore total SUPS approximately constant as the problem grows in size
  – If too large to fit onto one machine, GD3 over N resources simultaneously is no worse than N sequential runs (see the arithmetic note below)

[Chart: kSUPS/processor versus number of sites (1-4); per-processor performance falls roughly in proportion to the number of sites]
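The scaling claim above can be made explicit with a rough model (an assumption for illustration: each of the N sites contributes P processors, and a single site sustains S site-updates per second per processor):

  \text{SUPS}_{\text{total}} \approx (N \cdot P) \times \frac{S}{N} = P \cdot S

So although bandwidth limits cut the per-processor rate roughly in proportion to the number of sites, the aggregate update rate stays close to that of one fully used machine; a problem too large for any single machine therefore runs about as fast as its N pieces would run one after another.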

Page 20: Cyberinfrastructure: Helping Push Research Boundaries


Outline
• Scientific Grid Applications
  • Computing Free Energies in Biological Systems
    – STIMD (2003-04), SPICE (2005-06)
  • Challenges of Distributed Environments
  • HARC: A tool for co-allocating resources
    – GENIUS: Grid-Enabled Neurosurgical Imaging Using Simulations (2007-08)
  • Simple API for Grid Applications (SAGA)
• Regional CI Example
  • Software: Cactus, HARC, PetaShare, SAGA...
  • People: LONI Institute and NSF CyberTools
• Novel e-Science Applications

Page 21: Cyberinfrastructure: Helping Push Research Boundaries

   

Challenges of Distributed Environments: Lessons Learnt from Pilot Projects

• Need for usable, stable and extensible infrastructure
  – Infrastructure is relatively "easy" for demo(s); difficult for routine use! Science requires stable & persistent infrastructure.

• Hiding the heterogeneity; providing uniformity
  – We interface application code to grid middleware through well-defined user-level APIs. No code refactoring required.
  – Hides the heterogeneity of the software stack and site-specific details:
      VORTONICS: MPICH-G2 hides low-level details (communication, network topology, resource allocation and management)
      SPICE: RealityGrid steering library

Motivation for SAGA Efforts at OGF

Page 22: Cyberinfrastructure: Helping Push Research Boundaries

   

• Machine configuration issues:
  – Variants of the same problem faced, e.g., the hidden-IP issue for MPICH-G2 and RealityGrid steering
  – The same problem on different resources, e.g., PSC and HPCx:
      PSC: qsocket + Access Gateway Node; performance issues remain due to protocol constraints
      HPCx: the same solution does not work for ReG steering; port-forwarding being tested

Challenges of Distributed Environments

Page 23: Cyberinfrastructure: Helping Push Research Boundaries

   

Challenges of Distributed Environments: Federated Grids

• The current barrier to utilising federated grids is still high:
  – Many degrees of freedom need coordination
  – Collective inter-grid debugging required
• Federated grids must be interoperable in practice:
  – Stress test using real applications
  – Requires additional "user-level middleware" (MPICH-G2, ReG steering infrastructure) to work across grids
• Paper on the theory, implementation and experiences of the three joint projects (CLADE 2006, Special Issue of Cluster Computing):
  http://www.realitygrid.org/publications/triprojects_clade_final.pdf
• Application-level interoperability; influenced the creation of GIN

Page 24: Cyberinfrastructure: Helping Push Research Boundaries

   

Challenges of Distributed Environments: New Policies for Resource Coordination Are Required

A common requirement of SPICE and VORTONICS: co-scheduling of resources (compute, visualization, network)!

Three levels of scheduling complexity:
  – Advance single-resource reservation
  – Advance, coordinated multiple reservations across a grid
  – Advance, coordinated reservations across distinct grids!

The first breaks the standard HPC usage model; the third, cross-grid co-scheduling, is very hard today.

Current levels of human intervention are too high: need automation.

 Motivation for HARC

Page 25: Cyberinfrastructure: Helping Push Research Boundaries

   

HARC: Highly-Available Resource Co-allocator

• What is co-allocation?
  – The process of reserving multiple resources for use by a single application or "thing" - but in a single step...
• Can reserve the resources:
  – For the same time:
      • Metacomputing, large MPIg/MPICH-G2 jobs
      • Distributed visualization
  – Or some coordinated set of times:
      • Computational workflows

HARC is primarily developed by Jon Maclaren @ CCT.

Page 26: Cyberinfrastructure: Helping Push Research Boundaries

   

How does HARC work?
• The client makes a request, from the command line or from another tool via the Client API
• The request goes to the HARC Acceptors, which manage the co-allocation
• The Acceptors talk to individual Resource Managers, which make the individual reservations by talking to the local schedulers

Page 27: Cyberinfrastructure: Helping Push Research Boundaries

   

HARC is Extensible (Community Model)
• Modular design throughout
  – Not just compute resources. New resource types can be added, then co-allocated with all other types of resource
  – No modification to the Acceptors is needed. Just provide Resource Manager code to schedule the resource
  – And extend the Client API with new classes (again, no modifications to existing code)
  – Even works from the command line
• Example: Network Resource Manager

• Example: Network Resource Manager$ harc-reserve -n EnLIGHTened/RA1-BTH -c bluedawg.loni.org/8 -s 12:00 -d 1:00

  – Co-allocates a lightpath between LSU & MCNC, with 8 processors on bluedawg...
  – Was used to schedule lightpaths in the EnLIGHTened testbed for Thomas Sterling's HPC class, broadcast in high-definition video

Page 28: Cyberinfrastructure: Helping Push Research Boundaries

   

GENIUS: Overview (PI: Coveney, UCL)

Goals: provide a better understanding of cerebral fluid flow; inform clinicians of the best surgical approaches.

Approach: model large-scale, patient-specific cerebral blood flow within clinically relevant time frames.

Provides:
  – Reliable & effective patient-specific image-based models
  – Efficient LB blood flow simulation
  – Real-time blood flow volume rendering - visualisation

Page 29: Cyberinfrastructure: Helping Push Research Boundaries

   

A fast fluid flow simulation of a very large system requires an efficient parallel fluid solver running over several processors (the standard lattice-Boltzmann update it iterates is sketched below):

• Lattice-Boltzmann method; parallel MPI code
• Efficient algorithms for sparse geometries
• Topology-aware graph-growing partitioning technique
• Optimized inter- and intra-machine communication patterns
• Full checkpoint capabilities...

HemeLB fluid flow solver
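For context, the single-relaxation-time (BGK) lattice-Boltzmann update that such a solver iterates at every site and time step is, in its standard form (HemeLB's precise discretisation is not given on the slide):

  f_i(\mathbf{x} + \mathbf{c}_i\,\Delta t,\; t + \Delta t)
  = f_i(\mathbf{x}, t) - \frac{1}{\tau}\Big[ f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t) \Big]

Each distribution f_i only streams to an immediate neighbour, so the method decomposes naturally over sparse vascular geometries and across machines, which is exactly what the partitioning and communication optimisations above exploit.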

Page 30: Cyberinfrastructure: Helping Push Research Boundaries

   

MPI-g: a pre-release grid-enabled MPI implementation which is optimised for the overlap of communication and computation.

[Chart: performance of HemeLB's fluid solver on a patient-specific system using LONI machines (IBM Power5, 1.9 GHz); LB time steps per second for cross-site runs on 2 LONI IBM machines]

HemeLB fluid solver performance

Page 31: Cyberinfrastructure: Helping Push Research Boundaries

   

Why HemeLB + MPI-g + HARC?

Efficient code and MPI-g exist, but how to run over several distributed machines?

Use HARC, the Highly-Available Resource Co-allocator (developed by Jon Maclaren at LSU), for advance resource reservation.

Heterogeneous and sparse resources are more likely to be available and give us prompt results: clinically relevant and timely results.

Page 32: Cyberinfrastructure: Helping Push Research Boundaries

   

US/UK Grid Infrastructure

[Diagram: the GENIUS infrastructure - LONI sites (LSU, La Tech, LSU HSC, ULL, Tulane, SU, UNO, ULM, McNeese, NSU, SLU, Alex), the UK NGS (Leeds, Manchester, Oxford, RAL, HPCx) and the TeraGrid]

The GENIUS project makes use of infrastructure provided by LONI, TeraGrid and NGS, connected by dedicated switched optical light paths.

Page 33: Cyberinfrastructure: Helping Push Research Boundaries

   

Using HARC...
• Our aim is to make HARC available to users as part of the basic grid infrastructure
• Current deployments:
  – LONI (Louisiana Optical Network Initiative): production mode
  – UK NGS: Manchester, Leeds and Oxford NGS2
  – TeraGrid co-scheduling testbed machines (SDSC/NCSA IA-64)
  – NW-Grid (Lancaster, Manchester)
• Everything is open source too
• See: http://www.cct.lsu.edu/~maclaren/HARC/

Page 34: Cyberinfrastructure: Helping Push Research Boundaries


Rough Taxonomy of Applications

• Some applications are grid-unaware and want to remain so
  – Use tools/environments (e.g., NanoHub, GridChem)
  – May run on grid-aware/grid-enabled environments (e.g., Condor) or programming environments (e.g., MPICH-G2)
• Some applications are explicitly grid-aware
  – Control, interact with & exploit distributed systems at the application level

Page 35: Cyberinfrastructure: Helping Push Research Boundaries

SAGA: In a Nutshell

• A lack of:
  – A programming interface that provides common grid functionality at the correct level of abstraction
  – The ability to hide underlying complexities, varying semantics, heterogeneities and changes from the application and its programmer(s)
• A simple, integrated, stable, uniform, high-level interface
  – Simplicity: restricted in scope, 80/20
• Measure(s) of success:
  – Does SAGA enable quick development of "new" distributed applications?
  – Does it enable greater functionality using less code?

Page 36: Cyberinfrastructure: Helping Push Research Boundaries


Copy a File: Globus GASS

int copy_file (char const* source_URL, char const* target)
{
    globus_url_t                        source_url;
    globus_io_handle_t                  dest_io_handle;
    globus_ftp_client_operationattr_t   source_ftp_attr;
    globus_result_t                     result;
    globus_gass_transfer_requestattr_t  source_gass_attr;
    globus_gass_copy_attr_t             source_gass_copy_attr;
    globus_gass_copy_handle_t           gass_copy_handle;
    globus_gass_copy_handleattr_t       gass_copy_handleattr;
    globus_ftp_client_handleattr_t      ftp_handleattr;
    globus_io_attr_t                    io_attr;
    int                                 output_file = -1;

    if ( globus_url_parse (source_URL, &source_url) != GLOBUS_SUCCESS ) {
        printf ("can not parse source_URL \"%s\"\n", source_URL);
        return (-1);
    }

    if ( source_url.scheme_type != GLOBUS_URL_SCHEME_GSIFTP &&
         source_url.scheme_type != GLOBUS_URL_SCHEME_FTP    &&
         source_url.scheme_type != GLOBUS_URL_SCHEME_HTTP   &&
         source_url.scheme_type != GLOBUS_URL_SCHEME_HTTPS ) {
        printf ("can not copy from %s - wrong prot\n", source_URL);
        return (-1);
    }

    globus_gass_copy_handleattr_init  (&gass_copy_handleattr);
    globus_gass_copy_attr_init        (&source_gass_copy_attr);
    globus_ftp_client_handleattr_init (&ftp_handleattr);
    globus_io_fileattr_init           (&io_attr);

    globus_gass_copy_attr_set_io (&source_gass_copy_attr, &io_attr);
    globus_gass_copy_handleattr_set_ftp_attr (&gass_copy_handleattr, &ftp_handleattr);
    globus_gass_copy_handle_init (&gass_copy_handle, &gass_copy_handleattr);

    if ( source_url.scheme_type == GLOBUS_URL_SCHEME_GSIFTP ||
         source_url.scheme_type == GLOBUS_URL_SCHEME_FTP ) {
        globus_ftp_client_operationattr_init (&source_ftp_attr);
        globus_gass_copy_attr_set_ftp (&source_gass_copy_attr, &source_ftp_attr);
    }
    else {
        globus_gass_transfer_requestattr_init (&source_gass_attr, source_url.scheme);
        globus_gass_copy_attr_set_gass (&source_gass_copy_attr, &source_gass_attr);
    }

    output_file = globus_libc_open ((char*) target,
                                    O_WRONLY | O_TRUNC | O_CREAT,
                                    S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP);
    if ( output_file == -1 ) {
        printf ("could not open the file \"%s\"\n", target);
        return (-1);
    }

    /* convert stdout to be a globus_io_handle */
    if ( globus_io_file_posix_convert (output_file, 0, &dest_io_handle) != GLOBUS_SUCCESS ) {
        printf ("Error converting the file handle\n");
        return (-1);
    }

    result = globus_gass_copy_register_url_to_handle (&gass_copy_handle,
                                                      (char*) source_URL,
                                                      &source_gass_copy_attr,
                                                      &dest_io_handle,
                                                      my_callback, NULL);
    if ( result != GLOBUS_SUCCESS ) {
        printf ("error: %s\n",
                globus_object_printable_to_string (globus_error_get (result)));
        return (-1);
    }

    globus_url_destroy (&source_url);
    return (0);
}

Page 37: Cyberinfrastructure: Helping Push Research Boundaries


SAGA Example: Copy a File (high-level, uniform)

#include <iostream>
#include <string>
#include <saga/saga.hpp>

void copy_file (std::string source_url, std::string target_url)
{
  try {
    saga::file f (source_url);
    f.copy (target_url);
  }
  catch (saga::exception const & e) {
    std::cerr << e.what () << std::endl;
  }
}

• Provides the high-level abstraction that application programmers need; will work across different systems
• Shields the gory details of the lower-level middleware system
• Like MapReduce - leave the details of distribution etc. out

Page 38: Cyberinfrastructure: Helping Push Research Boundaries

   

SAGA: Scope
• Is:
  – the Simple API for Grid-Aware Applications
      • deal with distributed infrastructure explicitly
  – a high-level (= application-level) abstraction
  – a uniform interface to different middleware(s)
  – client-side software
• Is NOT:
  – middleware
  – a service management interface!
  – it does not hide the resources - remote files, jobs - but it hides the details

Page 39: Cyberinfrastructure: Helping Push Research Boundaries

   

SAGA API: Towards a Standard
• The need for a standard programming interface
  – "Go it alone" versus "community" model
  – Reinventing the wheel again, yet again, and again
  – MPI as a useful analogy of a community standard
  – OGF the natural choice; establish SAGA-RG
• "Tedium" of the standardisation process?
  – Not all technology needs to be standardised upfront
  – Standardisation is not a guarantee of success
• Requirements document
  – Quick skim through the requirements document
  – Design and requirements derived from 23 use cases
  – Different projects, applications and functionality

Page 40: Cyberinfrastructure: Helping Push Research Boundaries

   

The SAGA Landscape

Page 41: Cyberinfrastructure: Helping Push Research Boundaries

SAGA  C++ (LSU)

Page 42: Cyberinfrastructure: Helping Push Research Boundaries


Implementation - Requirements
• A non-trivial set of requirements:
  – Allow heterogeneous middleware to co-exist
  – Cope with evolving grid environments; dynamic resources
  – Future SAGA API extensions
  – Portable, syntactically and semantically platform independent; permit latency-hiding mechanisms
  – Ease of deployment, configuration, multiple-language support, documentation, etc.
  – Provide synchronous, asynchronous & task versions
Portability, modularity, flexibility, adaptability, extensibility

Page 43: Cyberinfrastructure: Helping Push Research Boundaries

   

Job Submission API

// Submitting a simple job and waiting for completion
saga::job_description jobdef;
jobdef.set_attribute ("Executable", "job.sh");

saga::job_service js;
saga::job job = js.create_job ("remote.host.net", jobdef);

job.run ();

while ( job.get_state () == saga::job::Running )
{
  std::cout << "Job running with ID: "
            << job.get_attribute ("JobID") << std::endl;
  sleep (1);
}

Page 44: Cyberinfrastructure: Helping Push Research Boundaries

   

SAGA Landscape

Page 45: Cyberinfrastructure: Helping Push Research Boundaries

   

GridSAT: A First-Principles Grid Application

• A grid implementation of the satisfiability problem: determine whether the variables of a given Boolean formula can be assigned so as to make it TRUE.

• Adaptive: the computation-to-communication ratio needs to be / can be adjusted (!)

• Allows new domain science - beats zChaff (in time taken and in the problems it can solve)

Adapted from slides by Wolski & Chakrab

Page 46: Cyberinfrastructure: Helping Push Research Boundaries

   

GridSAT Characteristics
• A parallel, distributed SAT solver
  – Both CPU and memory intensive
  – Splitting leads to better performance
  – Allows sharing: clauses learned in one solver are shared
• A grid-aware application:
  – Heterogeneous (single machines, clusters & supercomputers)
  – Dynamic resource usage
• Unpredictable runtime behaviour
  – How much time? How many resources? When to split? Which process splits first?
  – Problems vary: easy to hard, short to long
  – Need to be adaptive: "add resources as you go"

Page 47: Cyberinfrastructure: Helping Push Research Boundaries

   

GridSAT: Programming Requirements

• RPC, dynamic resource & job management
• Error handling, scheduling and checkpointing

SAGA provides the required programming functionality, at the correct level of abstraction, and thus makes it easier to manage, deploy and extend (for new functionality) GridSAT. A hedged sketch of the splitting step follows.
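A hedged sketch of what that splitting step could look like when expressed with the SAGA job calls shown earlier on slide 43; should_split() and write_subproblem() are hypothetical solver-side helpers, and the executable name is assumed, so this illustrates the required primitives rather than GridSAT's actual code:

#include <string>
#include <saga/saga.hpp>

// Hypothetical solver-side helpers (not SAGA, not GridSAT code):
bool        should_split();          // e.g. memory pressure or slow progress
std::string write_subproblem();      // dump one half of the search space to a file

// When the solver decides to split, hand one half of the search space to a
// fresh job on another resource, using the SAGA job API from slide 43.
void split_and_spawn(saga::job_service& js, std::string const& host)
{
    if (!should_split())
        return;

    std::string subproblem = write_subproblem();

    saga::job_description jobdef;
    jobdef.set_attribute("Executable", "gridsat_solver");  // assumed executable name
    // ... point the new solver at 'subproblem', stage shared learned clauses, etc. ...

    saga::job job = js.create_job(host, jobdef);
    job.run();                                              // the new solver joins the search
}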

Page 48: Cyberinfrastructure: Helping Push Research Boundaries

   

Legacy Application: Replica Exchange

• A "class of algorithm" used for bio-molecular simulations
  – e.g., protein (mis-)folding
• Primarily used for:
  – Enhanced sampling
  – Determining transition rates
• Task-level parallelism - embarrassingly distributable!

Page 49: Cyberinfrastructure: Helping Push Research Boundaries

   

Replica Exchange Algorithm

• Create replicas of the initial configuration
• Spawn N replicas over different machines
• Run for time t
• Attempt a configuration swap Ri <-> Rj
• Run for a further time t
• ...
• Repeat until termination

[Diagram: replicas R1, R2, R3 ... RN running in parallel at temperatures from 300 K up to "hot", with exchange attempts every interval t]
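At each exchange attempt, the swap is normally accepted with the standard parallel-tempering Metropolis probability (not spelled out on the slide), where β_k = 1/k_BT_k and E_k is the current potential energy of replica k:

  P_{\mathrm{acc}}(R_i \leftrightarrow R_j)
  = \min\Big\{ 1,\; \exp\big[ (\beta_i - \beta_j)\,(E_i - E_j) \big] \Big\}

Only two energies and two temperatures need to cross the machine boundary at an exchange attempt, which is why the algorithm stays "embarrassingly distributable" between attempts.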

Page 50: Cyberinfrastructure: Helping Push Research Boundaries

   

RE: Programming Requirements

RE can be implemented using the following "primitives":
• Read the job description
  – # of processors, replicas; determine resources
• Submit jobs
  – Move files, launch jobs
• Access simulation data & analysis
• Checkpoint and re-launch simulations
  – Exchange, RPC (to swap or not)

Implement the above using the "grid primitives" provided by SAGA (a sketch follows below):
  – Separates the "distributed" logic from the "simulation" logic
  – Independent of the underlying code/engine
  – The science kernel is independent of the details of distributed resource management
  – Desktop akin to a high-end supercomputer!!

Page 51: Cyberinfrastructure: Helping Push Research Boundaries

   

Programming Distributed Applications: A Parallel Programming Analogy

The status of distributed programming today is (somewhat) similar to parallel programming in the pre-MPI days. MPI was a "success" in that it helped many new applications:
  – MPI was simple
  – MPI was a standard (stable and portable code!)

• SAGA is to the grid application developer what MPI is to the parallel program developer ("grid primitives")
• SAGA's conception & trajectory are similar to MPI's
  – SAGA is simple to use
  – OGF specification; on the path to becoming a standard
• Therefore, SAGA's measure(s) of success:
  – Does SAGA enable "new" grid applications?

Page 52: Cyberinfrastructure: Helping Push Research Boundaries


Outline
• Scientific Grid Applications
  • Computing Free Energies in Biological Systems
    – STIMD (2003-04), SPICE (2005-06)
  • Challenges of Distributed Environments
  • HARC: A tool for co-allocating resources
    – GENIUS: Grid-Enabled Neurosurgical Imaging Using Simulations (2007-08)
  • Simple API for Grid Applications (SAGA)
• Regional CI Example: LONI (now part of TeraGrid!)
  • Hardware: Compute + Network
  • Software: Cactus, HARC, PetaShare, SAGA...
  • People: LONI Institute and NSF CyberTools
• Novel e-Science Applications

Page 53: Cyberinfrastructure: Helping Push Research Boundaries

National Lambda Rail

~100 TF of IBM and Dell supercomputers

[Map: LONI sites - UNO, Tulane, UL-L, SUBR, LSU, LA Tech]

3 Axes:
  – LONI
  – CyberTools
  – LONI Institute

Page 54: Cyberinfrastructure: Helping Push Research Boundaries

   

Cybertools: Providing Application Software 

Page 55: Cyberinfrastructure: Helping Push Research Boundaries

   

Cybertools (2)

WP4: Core Package!

Page 56: Cyberinfrastructure: Helping Push Research Boundaries

   

Integrating Applications into a Cyberenvironment

Page 57: Cyberinfrastructure: Helping Push Research Boundaries


Page 58: Cyberinfrastructure: Helping Push Research Boundaries

   

CyberTools: Not just compute! (PI: Tevfik Kosar, CCT/LSU)

• Goal: enable the underlying infrastructure to manage the low-level data handling issues.

• Novel approach: treat data storage resources and the tasks related to data access as first-class entities, just like computational resources and compute tasks.

• Key technologies being developed: data-aware storage systems, data-aware schedulers (i.e., Stork), and a cross-domain metadata scheme.

• PetaShare exploits 40 Gb/sec LONI connections between five Louisiana universities: LSU, LaTech, Tulane, ULL & UNO.

Page 59: Cyberinfrastructure: Helping Push Research Boundaries

   

[Map: PetaShare sites on LONI - UNO, Tulane, LSU, ULL, LaTech - with sample application areas: high energy physics, biomedical data mining, coastal modeling, petroleum engineering, synchrotron X-ray microtomography, computational fluid dynamics, biophysics, molecular biology, computational cardiac electrophysiology, geology]

Participating institutions in the PetaShare project, connected through LONI. Sample research of the participating researchers pictured (i.e., biomechanics by Kodiyalam & Wischusen, tangible interaction by Ullmer, coastal studies by Walker, and molecular biology by Bishop).

Page 60: Cyberinfrastructure: Helping Push Research Boundaries

LONI Institute

• Build on the LONI infrastructure; create a bold new inter-university superstructure
  – New faculty, staff, students; train others. Focus on CS, Bio and Materials, but all disciplines impacted
  – Promote collaborative research at interfaces for innovation
  – Much stronger recruiting opportunities for all institutions
• Two new faculty at each institution (12 total)
  – Six each in CS and Comp. Bio/Materials, with half PKSFI matching; fully covered after five years
• Six computational scientists
  – Support 70-90 projects over five years; lead to external funding
• Graduate students
  – 36 new students funded and trained; two years each

Page 61: Cyberinfrastructure: Helping Push Research Boundaries


Applications: Where it all comes together...

Page 62: Cyberinfrastructure: Helping Push Research Boundaries


Resource Performance Monitoring Application

• NWS, BQP - only 1 resource at a time!!
• How to choose M resources out of N? e.g., for an MPICH-G2 application, which M?
• Cactus + SAGA + LONI (lightpaths)

Page 63: Cyberinfrastructure: Helping Push Research Boundaries


Page 64: Cyberinfrastructure: Helping Push Research Boundaries


Page 65: Cyberinfrastructure: Helping Push Research Boundaries

Acknowledgements: The SAGA Team

Hartmut Kaiser

Andre Merzky

Ole Weidner

Thilo Kielmann

Ceriel Jacobs

Kees Verstop

Page 66: Cyberinfrastructure: Helping Push Research Boundaries


Acknowledgements

• HARC: Jon Maclaren, LSU 

• GENIUS: Peter Coveney and Steve Manos, UCL

• PetaShare: Tevfik Kosar 

• Students & Research Staff @ CCT 

• LONI Staff

• Funding Agencies: NSF, EPSRC (UK), LA BoR

Page 67: Cyberinfrastructure: Helping Push Research Boundaries


Conference Announcement

• MardiGras Conference 2008: "Novel Distributed Computing Applications and Technology"
  – http://mardigrasconference.org
• Dan Katz (Chair) & Shantenu Jha (Co-Chair)
• Craig Lee (PC Chair), Geoffrey Fox (Vice-Chair, Emerging Technologies), Bill St. Arnaud (Vice-Chair, Network-Intensive Applications), Matei Ripeanu (UBC, Publicity)
• Oct 31: paper submission deadline
• Peer-reviewed proceedings to be published in the ACM library (ISBN 978-1-59593-835-0)