
Page 1: From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)

From Grid to Global Computing: Deploying Parameter Sweep Applications

Henri Casanova

Grid Research And Innovation Laboratory (GRAIL)

http://grail.sdsc.edu/

San Diego Supercomputer Center (SDSC)

Computer Science and Engineering Dept. (CSE)

University of California, San Diego (UCSD)

Page 2:

Parameter Sweep Applications

• Many compute tasks
• No or simple dependencies
• Several output post-processing stages
• Potentially large datasets

[Diagram: Input data → Tasks → Raw Output → Post-processing → Final Output]
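The structure on this slide can be sketched in a few lines of Python. This is only an illustration of the general pattern (independent tasks over a parameter grid, followed by a post-processing stage); the function names `parameter_sweep`, `post_process`, and `toy_task` are hypothetical and do not come from the talk or from APST.

```python
import itertools

def parameter_sweep(param_grid, task):
    """Run one independent task per point of the parameter grid."""
    names = sorted(param_grid)
    raw_output = {}
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        raw_output[values] = task(**params)  # no dependency between tasks
    return raw_output

def post_process(raw_output):
    """Post-processing stage: keep the parameter point with the smallest result."""
    best = min(raw_output, key=raw_output.get)
    return best, raw_output[best]

def toy_task(x, y):
    # Stand-in for a real simulation run (assumption for illustration only).
    return (x - 1) ** 2 + (y + 2) ** 2

raw = parameter_sweep({"x": [0, 1, 2], "y": [-3, -2, -1]}, toy_task)
print(len(raw), post_process(raw))
```

Because the tasks share no state, the loop body is the unit that a Grid deployment would ship to remote resources.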

Page 3:

Relevance

• Arise in virtually every field of science and engineering
    • Monte Carlo, Parameter Space Searches, Parameter Studies, etc.
    • Biology, Astrophysics, Physics, Bioinformatics, Economics, etc.
• Primary candidate for Grid computing
    • Latency-tolerant, amenable to simple fault-tolerance
    • Need a huge amount of resources

Page 4:

Outline of the Presentation

• Parameter Sweep Applications (PSAs)
• APST
• The Virtual Instrument
• BIO@Home

Page 5:

Scheduling of PSAs

?

Page 6:

Grid Scheduling Practice

• Ad-hoc solutions:
    • specific to one application
    • hand-tuned to the environment (e.g. SF-Express demo)
• Large body of work on scheduling
• What can we re-use on the Grid?
    • Heterogeneous resources
    • Dynamic performance characteristics
    • Resource downtimes
    • Complex network topologies
    • Performance prediction errors

Page 7:

“DataGrid” Scheduling

• Goal: Co-locate/replicate data and computation
• Dynamic Priority List-Scheduling
    • Built on heuristics described in [Ibarra77, Siegel99]
    • Added adaptivity
• Simulation results
    • List-scheduling works, adaptivity should make it practical
• Experimental results (Demo at SC’00 and SC’01)

[HCW’00] H. Casanova, A. Legrand, et al.
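As a rough illustration of list scheduling with dynamic priorities, the sketch below recomputes every remaining task's priority after each assignment. The two priority rules shown, min-min and sufferage, are commonly associated with this family of heuristics; the slide does not say which rules were used, and this sketch ignores data transfer costs entirely. All names (`list_schedule`, `min_min`, `sufferage`, `exec_time`) are assumptions made for the example.

```python
def list_schedule(exec_time, priority):
    """exec_time[t][h] is the estimated run time of task t on host h."""
    n_hosts = len(exec_time[0])
    ready = [0.0] * n_hosts                 # time at which each host is free
    unscheduled = set(range(len(exec_time)))
    schedule = []                           # (task, host, start, finish)
    while unscheduled:
        best = None
        for t in sorted(unscheduled):
            # Completion time of t on every host, best host first.
            ct = sorted((ready[h] + exec_time[t][h], h) for h in range(n_hosts))
            key = priority(ct)              # recomputed each round: dynamic
            if best is None or key > best[0]:
                best = (key, t, ct[0])
        _, t, (finish, h) = best
        schedule.append((t, h, ready[h], finish))
        ready[h] = finish
        unscheduled.remove(t)
    return schedule, max(ready)

def min_min(ct):
    # Highest priority to the task with the earliest best completion time.
    return -ct[0][0]

def sufferage(ct):
    # Highest priority to the task that loses most if denied its best host.
    return ct[1][0] - ct[0][0] if len(ct) > 1 else 0.0

times = [[3, 5], [2, 4], [6, 1]]            # 3 tasks, 2 hosts (made-up numbers)
mm_schedule, mm_makespan = list_schedule(times, min_min)
sf_schedule, sf_makespan = list_schedule(times, sufferage)
print(mm_makespan, sf_makespan)
```

Adaptivity, in the sense of the slide, would mean refreshing `exec_time` from current measurements before each scheduling round rather than trusting the initial estimates.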

Page 8:

Lessons

• Much scheduling work to re-use
• List-scheduling with Dynamic Priorities seems effective
    • Simulation
    • Experimental
• Let’s build software that uses it
• Let’s target scientific communities

Page 9:

Motivation for APST

• Started as scheduling research
• Evolved into a tool that provides
    • Transparency of Grid execution
        • Data movements
        • Remote job management
        • Multiple Grid middleware back-ends
    • Scheduling
        • Self-scheduling
        • List scheduling w/ dynamic priorities
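For contrast with list scheduling, self-scheduling needs no performance estimates at decision time: whenever a host becomes idle it takes the next task in the queue. The sketch below simulates that rule; `self_schedule`, `task_costs`, and `host_speeds` are hypothetical names, and the numbers are invented for the example.

```python
import heapq

def self_schedule(task_costs, host_speeds):
    """Simulate self-scheduling: the next idle host takes the next task."""
    idle_at = [(0.0, h) for h in range(len(host_speeds))]
    heapq.heapify(idle_at)
    assignment = []                          # (task, host, finish time)
    for task, cost in enumerate(task_costs):
        now, h = heapq.heappop(idle_at)      # earliest idle host
        finish = now + cost / host_speeds[h]
        assignment.append((task, h, finish))
        heapq.heappush(idle_at, (finish, h))
    makespan = max(t for t, _ in idle_at)
    return assignment, makespan

# Four equal tasks, one slow host (speed 1) and one fast host (speed 2).
assignment, makespan = self_schedule([4, 4, 4, 4], [1, 2])
print([h for _, h, _ in assignment], makespan)
```

In this toy run the last task lands on the slow host and stretches the makespan, which is the kind of situation a priority-based list scheduler with performance estimates can try to avoid.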

Page 10:

APST Designs

The AppLeS Parameter Sweep Template: An Application Execution Environment

XML application and resource descriptions

[Architecture diagram. Labels: APST client, Scheduler, Transport, Compute, Metadata Bookkeeper, Decisions, Actions, Information, Grid Services, Grid, APST]

Page 11:

APST: Lessons

• The Grid is difficult to use
• APST provides a simple software layer that does one thing well
    • Minimal user interface (XML, command-line)
    • Used as a building block for domain-specific applications
        • E.g. multi-cluster bio-informatics (Singapore)
• Ssh?
    • Default mechanism
    • Critical for gaining user buy-in
    • Natural way to lead to using the Grid

Page 12:

APST Status

• Version 1.1 released 2 weeks ago
• Available for public download
• Used for 10+ applications
    • Bioinformatics (BLAST, HMM, …)
    • Computational Neuro-science
• Supported back-ends
    • Compute: Globus, NetSolve, Ssh, Condor
    • Data transport: GASS, IBP, Scp, GridFTP, SRB
    • Information: NWS, MDS, Ganglia, …

http://grail.sdsc.edu/projects/apst

Page 13:

APST Research Directions

• APST is a research platform
    • Maintained by one staff member
    • Several graduate student contributors
• Partitionable Workload
    • Bioinformatics (database splitting)
    • Factoring: Decrease chunk size
    • Pipelining: Increase chunk size
    • Combined?
    • Create APST-BLAST (Mario Lauria, OSU; Yang Yang, UCSD)
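To make "decrease chunk size" concrete, here is a minimal sketch of one common factoring rule: each round hands out a fixed fraction of the remaining work (half, with `factor=2`), split evenly across the workers. The slide gives no formula, so this rule, the function name `factoring_chunks`, and its parameters are assumptions for illustration; the pipelining variant with growing chunks is not shown.

```python
import math

def factoring_chunks(total, workers, factor=2, min_chunk=1):
    """Return the sequence of chunk sizes handed out under factoring."""
    remaining = total
    chunks = []
    while remaining > 0:
        # Every round shares 1/factor of the remaining work among the workers.
        size = max(min_chunk, math.ceil(remaining / (factor * workers)))
        for _ in range(workers):
            if remaining <= 0:
                break
            take = min(size, remaining)
            chunks.append(take)
            remaining -= take
    return chunks

chunks = factoring_chunks(total=100, workers=2)
print(chunks)  # [25, 25, 13, 13, 6, 6, 3, 3, 2, 2, 1, 1]
```

Large early chunks keep per-chunk overhead low, while small late chunks limit how long the slowest worker can delay completion.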

Page 14:

Outline of the Presentation

• Parameter Sweep Applications (PSAs)
• APST
• Virtual Instrument
• BIO@home

Page 15:

Computational Neuroscience

• MCell: Monte Carlo Cell simulator
• Developed at Salk and PSC
• Gain knowledge about neuro-transmission mechanisms
• Fundamental for drug design (psychiatry)
• Large user base (yearly MCell workshop)
• Parallel MC simulations at the molecular level

Page 16:

Traditional MCell usage

• “By hand”
    • No automatic project management
    • No transparent resource access
    • No automated data management
• Consequences
    • No interactive simulations
    • No fault-tolerance, scheduling, …
    • MCell limited to resources in the lab

Page 17:

MCell and APST

• APST alleviates some of the limitations
    • Large-scale simulations
    • Fault-tolerance and scheduling
    • Data retrieval from distributed storage
    • XML application descriptions
• No interactivity
    • MCell is exploratory
    • User interaction is fundamental for many users

Page 18:

The Virtual Instrument

• $2.5M funding from the NSF
    • Salk, PSC, UCSB, UTK, UCSD
• A running MCell simulation should behave as a lab instrument
• Computational steering for MCell
    • User interface
    • Grid software
    • Application software
    • Scheduling research (how does one schedule an application that’s being steered interactively?)

Page 19:

[Architecture diagram of the VI Software. Components: VI User, VI Interface, VI Daemon, VI Database, OpenDX, Grid Services, Grid Storage and Compute Resources (storage, compute). Link labels: control, data, control+data, process]

Page 20:

Scheduling Goals

• Reduce the “search” time
    • Let user assign levels of importance to regions of the parameter space
    • Assign fraction of resources with respect to the importance levels
    • Assign priorities to tasks
• Interesting questions
    • Job control limited on Grid resources
    • Cannot assign exact fractions
    • Interesting trade-offs between control overhead and accuracy of priorities
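One simple way to turn importance levels into resource shares is proportional allocation with rounding. The sketch below is an assumption-laden illustration, not the Virtual Instrument's scheduler: `allocate_hosts`, the region names, and the largest-remainder rounding rule are all hypothetical. It also shows why exact fractions cannot be assigned when hosts are indivisible.

```python
def allocate_hosts(importance, n_hosts):
    """Split n_hosts among regions in proportion to their importance levels."""
    total = sum(importance.values())
    exact = {r: n_hosts * w / total for r, w in importance.items()}
    alloc = {r: int(x) for r, x in exact.items()}      # round down first
    leftover = n_hosts - sum(alloc.values())
    # Give the remaining hosts to the regions with the largest fractional part.
    by_remainder = sorted(exact, key=lambda r: exact[r] - alloc[r], reverse=True)
    for r in by_remainder[:leftover]:
        alloc[r] += 1
    return alloc

shares = allocate_hosts({"A": 3, "B": 2, "C": 1}, n_hosts=10)
print(shares)  # {'A': 5, 'B': 3, 'C': 2}
```

Region B is entitled to 3.33 hosts and C to 1.67, so the realized shares only approximate the requested ones; re-running the allocation more often tracks the targets better at the price of more control overhead.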

Page 21:

Current Status

• First software prototype released in Feb 2002
    • Globus and Ssh
    • MySQL
    • OpenDX
    • priority-based scheduling
    • 20,000 lines of C++
• Upcoming papers
    • JPDC submission
    • Scheduling paper (SC submission)

Page 22:

Outline of the Presentation

• Parameter Sweep Applications (PSAs)
• PSAs on the Grid with APST
• MCell Virtual Instrument
• Global Computing

Page 23:

SETI@home

• Over 500,000 active participants, most of whom run a screensaver on a home PC
• Over a cumulative 20 TeraFlop/sec
    • Versus 12.3 TeraFlop/sec of IBM’s ASCI White
• Cost: $500,000 + $200,000 in donated hardware
    • Less than 1% of the $110 million required for ASCI White
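The "less than 1%" claim follows directly from the figures on the slide, as this short check shows. The variable names are made up for the example; the numbers are the ones quoted above, and 20 TeraFlop/sec is a lower bound ("over").

```python
seti_cost = 500_000 + 200_000        # dollars, including donated hardware
asci_white_cost = 110_000_000        # dollars

cost_ratio = seti_cost / asci_white_cost
throughput_ratio = 20 / 12.3         # SETI@home vs. ASCI White, TeraFlop/sec

print(f"cost ratio: {cost_ratio:.2%}")              # cost ratio: 0.64%
print(f"throughput ratio: {throughput_ratio:.2f}")  # throughput ratio: 1.63
```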

Page 24:

Global vs. Grid Computing

• Nature of resources
    • Home desktops that run Windows and are completely autonomous
    • Machines powered on and off by user
    • Behind firewalls, dynamic IP, transient network connections
• Programming model
    • Server cannot “push” tasks to clients
    • Server has little or no means for remote job control
    • Server has incomplete information about resources and availability

Page 25:

Goal

• SETI@home limitations:
    • Embarrassingly parallel
    • Infinite amount of input data
    • Pure throughput
• Can we do something more?
    • Short-lived applications?
    • Parallel applications?
    • Compute service?
• BIO@Home
    • Smith-Waterman for short/long sequences
    • No real software yet (build on XtremWeb?)

Page 26:

Scheduling?

• Sophisticated scheduling algorithms need information and control
• At the moment: Simple mechanisms
    1. Work unit duplication
       Specifies max number of times a work unit can be resent
    2. Timeouts
       Time that must elapse before work unit is resent
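The two mechanisms can be sketched as a small pull-based server, where clients ask for work and the server only decides what to hand out. This is a hypothetical illustration, not BIO@Home, SETI@home, or XtremWeb code: the class `WorkServer`, its methods, and the parameter names `max_resends` and `timeout` are all assumptions.

```python
class WorkServer:
    """Pull-model server with a resend limit and a resend timeout."""

    def __init__(self, units, max_resends, timeout):
        self.max_resends = max_resends   # mechanism 1: duplication limit
        self.timeout = timeout           # mechanism 2: wait before resending
        self.pending = {u: {"sends": 0, "last_sent": None} for u in units}
        self.results = {}

    def request_work(self, now):
        """Called by a client; the server cannot push tasks on its own."""
        for unit, state in self.pending.items():
            if state["last_sent"] is None:
                first_send = True
            else:
                first_send = False
                if state["sends"] - 1 >= self.max_resends:
                    continue             # resend budget exhausted
                if now - state["last_sent"] < self.timeout:
                    continue             # not timed out yet
            state["sends"] += 1
            state["last_sent"] = now
            return unit, first_send
        return None

    def submit_result(self, unit, result):
        # Keep the first result received; later duplicates are ignored.
        if unit in self.pending:
            del self.pending[unit]
            self.results[unit] = result

server = WorkServer(["a", "b"], max_resends=1, timeout=10)
log = [server.request_work(t) for t in (0, 1, 2, 10)]
server.submit_result("b", "ok")
log.append(server.request_work(25))
print(log)
```

In this run unit "a" is sent at time 0, resent once at time 10 after the timeout, and never again because its resend budget is spent.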

Page 27:

Simulation

• Built a simulation model
    • Using statistics/surveys/extrapolations
    • Next: logs from real systems (XtremWeb?, Entropia?)
• Evaluated the impact of both mechanisms on performance and throughput

Page 28:

Early Lessons

• Trade-off between throughput and turn-around time
• Duplication:
    • aggressively decreases turn-around time
    • wastes resources
    • there is an optimal value
• Timeouts:
    • moderately lower turn-around times
    • preserve good throughput
    • infinite timeouts are of course not a good idea
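The trade-off can be reproduced in a toy Monte Carlo model. This is not the simulation model from the talk: it assumes that all copies of a work unit are dispatched at once, that each copy's completion time is exponentially distributed with mean 1, and that copies are never cancelled. The function name `simulate` and its parameters are hypothetical.

```python
import random

def simulate(copies, n_units=20_000, seed=0):
    """Return (mean turn-around time, work units completed per unit of host time)."""
    rng = random.Random(seed)
    turnaround = 0.0
    host_time = 0.0
    for _ in range(n_units):
        times = [rng.expovariate(1.0) for _ in range(copies)]
        turnaround += min(times)     # the first copy to finish gives the result
        host_time += sum(times)      # every copy still consumes a host
    return turnaround / n_units, n_units / host_time

for k in (1, 2, 3):
    mean_turnaround, throughput = simulate(k)
    print(k, round(mean_turnaround, 2), round(throughput, 2))
```

Under these assumptions, more copies shorten the mean turn-around time while the number of useful results per unit of host time drops, which matches the direction of the lesson on the slide.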

Page 29:

Future work

• Two knobs
• Question: A compute service?
    • Mix of applications (SETI, short-lived, …)
    • Singapore Bio-informatics institute
    • Notion of fairness?
    • How do we implement policy with many volatile resources?
• Software
    • Re-use existing platforms:
        • XtremWeb
        • Entropia

Page 30:

Conclusion

• APST, Virtual Instrument, BIO@Home
• Other GRAIL activities I didn’t talk about
    • Scientific Computing
    • Simulation
    • Adaptive Scheduling
    • Networking

http://grail.sdsc.edu

Page 31:

Page 32:

Page 33:

Experimental Results

[Figure residue. Site labels: UTK, UCSD, TITECH, Tokyo. Series: Self-scheduling, XSufferage]