Extending the Galaxy portal with parallel and distributed execution capability

Ketan Maheshwari, Alex Rodriguez, David Kelly, Ravi Madduri, Justin Wozniak, Michael Wilde, Ian Foster

Argonne National Laboratory & University of Chicago

Post on 23-Feb-2016

swift-lang.org

Overview

Introduce the Galaxy and Swift systems

Couple the Swift and Galaxy gateway frameworks

Combine the features offered by Galaxy and Swift into an integrated platform

Different integration schemes based on user requirements and application patterns

Data management schemes

Example use-case

A demo screencast (if time permits)

Summary and future work

Overview of the Galaxy Workflow System*

[Figure: Galaxy web interface showing the tools panel, the workspace, and the monitor/history panel]

* Slide courtesy: Center for Genomic Regulation, Barcelona, Spain

Overview of Swift Parallel Scripting Framework

Simulation of super-cooled glass materials

Protein folding using homology-free approaches

Climate model analysis and decision making in energy policy

Simulation of RNA-protein interaction

Multiscale subsurface flow modeling

Modeling of power grid applications

All have published science results obtained using Swift

[Figure: Protein loop modeling for target T0623 (25 residues, 8.2Å to 6.3Å, excluding tail), with panels labeled Native, Predicted, and Initial. Courtesy A. Adhikari]

Motivation: Swift and Galaxy are complementary in many ways

Galaxy (galaxyproject.org) offers a simple, user-friendly, web-based interface for composing, executing, and monitoring workflows

Galaxy results are sharable, reproducible, and reusable

Galaxy is widely used and well supported by its user community, e.g. the Next Generation Sequencing (NGS) community

Swift provides a sophisticated interface to parallel and distributed platforms

Swift scripts are structured expressions of complex application flows that are readily executable on multiple, diverse, and independent remote resources

Swift-Galaxy Integration Overview

Approaches enabling integration in different ways:
– At tool level
– At workflow level
– At language/expression level

[Figure: architecture diagram with the components user computer, Galaxy web console, Galaxy server, Galaxy tool, Swift, and app libraries, and the execution targets clouds, clusters, supercomputers, and grids]

Computational Infrastructure

Galaxy offers limited support for distributed and parallel resources:
– Needs additional ad hoc configuration to interface with them
– Constrained in some ways, e.g. needs a shared file system*

Swift is robustly interfaced to a wider range of resource managers, with finer control over job submission parameters:
– Supports PBS/Torque, SGE, SLURM, Condor
– Supports bag-of-workstations setups: clouds, workstation clusters
– Supports distributed file systems and multiple execution sites simultaneously

* To the best of our knowledge

Interfacing with heterogeneous parallel systems is a challenge

SLURM:
#!/bin/bash
#SBATCH -J ...
#SBATCH -o ...
#SBATCH -e ...
#SBATCH -p ...
#SBATCH -N ...
ibrun ./my_exec args

Condor:
Executable=e
Universe=std
Error=err.$
Input=in.$
Output=out.$
Log=foo.log
Queue

SGE:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
pwd
./my_exec args

Torque/PBS:
#!/bin/bash
#PBS -q ccs_short
#PBS -N my_serial_job
#PBS -l walltime=01:00:00
#PBS -l nodes=1:noib:ppn=1
#PBS -m e
./a.out
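To make the burden of these per-scheduler dialects concrete, the sketch below renders one job description into SLURM, Torque/PBS, and SGE submit scripts. It is written in Python purely for illustration and is not how Swift implements its resource-manager interface. JobSpec and render_submit_script are hypothetical names, and the SGE branch leaves out the node count because parallel environments are site-specific.

```python
from dataclasses import dataclass


@dataclass
class JobSpec:
    """Scheduler-neutral job description (hypothetical, for illustration)."""
    name: str
    queue: str
    nodes: int
    walltime: str  # HH:MM:SS
    command: str


def render_submit_script(scheduler: str, job: JobSpec) -> str:
    """Render the same job as a submit script for one scheduler dialect."""
    if scheduler == "slurm":
        directives = [
            f"#SBATCH -J {job.name}",
            f"#SBATCH -p {job.queue}",
            f"#SBATCH -N {job.nodes}",
            f"#SBATCH -t {job.walltime}",
        ]
    elif scheduler == "pbs":
        directives = [
            f"#PBS -N {job.name}",
            f"#PBS -q {job.queue}",
            f"#PBS -l nodes={job.nodes}:ppn=1",
            f"#PBS -l walltime={job.walltime}",
        ]
    elif scheduler == "sge":
        # Node count omitted: SGE parallel environments (-pe) vary by site.
        directives = [
            f"#$ -N {job.name}",
            f"#$ -q {job.queue}",
            "#$ -cwd",
            f"#$ -l h_rt={job.walltime}",
        ]
    else:
        raise ValueError(f"unsupported scheduler: {scheduler}")
    return "\n".join(["#!/bin/bash", *directives, job.command]) + "\n"


if __name__ == "__main__":
    job = JobSpec("my_serial_job", "ccs_short", 1, "01:00:00", "./a.out")
    for scheduler in ("slurm", "pbs", "sge"):
        print(render_submit_script(scheduler, job))
```

Every new scheduler adds another branch of directives. A common execution layer is meant to take that maintenance cost away from the gateway user.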

Scheme 1: Wrap Swift around Galaxy Tools

[Figure: Swift-wrapped tools (swift-tool A, swift-tool B, ..., swift-tool N) shown alongside other Galaxy tools, with execution history entries]

Scheme 2: Interoperability between expressions

[Figure: XML transformation between a Swift script and a Galaxy workflow]

Internally, both Swift and Galaxy code is represented in XML dialects

An automated transformation converts one form into the other

Currently under development

Scheme 3: Harness Data Parallelism using foreach

[Figure: a swift-foreach wrapper around multiple Galaxy-tool instances, with in-data (split) on the input side and out-data (merge) on the output side]

foreach protein, idx in proteinList {
    runBlast(protein);
    tracef("The index is: %i\n", idx);
}

foreach idx in [begin:end:step] {
    runmyapp(idx);
}
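The split, parallel apply, and merge pattern behind the foreach wrapper can be sketched outside Swift as well. The Python analogy below is illustrative only: run_tool is a hypothetical stand-in for one Galaxy tool invocation, and a local thread pool takes the place of the remote resources that Swift would manage.

```python
from concurrent.futures import ThreadPoolExecutor


def split(records, n_chunks):
    """Split records into at most n_chunks contiguous chunks."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def run_tool(chunk):
    """Hypothetical stand-in for one Galaxy tool run on one chunk."""
    return [record.upper() for record in chunk]


def merge(parts):
    """Concatenate per-chunk outputs back into one result list."""
    return [record for part in parts for record in part]


def foreach_wrapper(records, n_chunks=3):
    """Apply run_tool to every chunk concurrently, then merge the outputs."""
    chunks = split(records, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # Executor.map returns results in input order, so the merge is stable.
        parts = list(pool.map(run_tool, chunks))
    return merge(parts)


if __name__ == "__main__":
    print(foreach_wrapper(["a", "b", "c", "d", "e"]))
```

The chunks share no state, so the same structure carries over when each chunk runs on a different machine. That independence is what the foreach construct relies on.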

Cloud Interfaces

Galaxy instances running on cloud nodes are already taking advantage of cloud-based resources

Swift’s coasters mechanism can farm resources and combine multiple cloud and non-cloud resources in a single application run

Data Management

Both Galaxy and Swift offer various data management capabilities

Galaxy offers remote data uploading and viewing capabilities

Swift allows disk-resident data to be operated upon as program variables

Swift’s data providers are interfaced with various data management protocols and can manage data movement at runtime

Evaluation Application: Inference analysis for power prices

[Figure: inference analysis workflow. Labels: generate sample, candidate solution, variance & mean, lower bound, upper bound, samples, batches, batch size]

Swift Script for Inference Analysis

import "mappings";
import "apps";
type file;

int nS[] = [10, 100, 1000, 10000, 100000];
foreach S, idxs in nS {
    sample0 = gensample(S, wind_data);
    obj[idxs] = ampl(sample0);
    foreach B, idxb in [10:40:10] {
        foreach k in [0:B] {
            sample1 = gensample(S, wind_data);
            obj_l[idxs][idxb][k] = ampl_L(sample1);
            sample2 = gensample(S, wind_data);
            obj_u[idxs][idxb][k] = ampl_U(sample2, obj[idxs]);
        }
    }
}
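To get a feel for how much work this short script fans out, the Python sketch below counts the application invocations it implies. The count assumes that Swift range bounds are inclusive, so that [10:40:10] yields 10, 20, 30, 40 and [0:B] yields B + 1 iterations. The inclusive flag shows the result if that assumption does not hold. count_app_calls is a hypothetical helper, not part of Swift.

```python
def count_app_calls(n_sample_sizes, b_start, b_stop, b_step, inclusive=True):
    """Count app invocations implied by the nested foreach loops.

    Per sample size S: one gensample and one ampl call for the candidate
    solution, then for every batch size B and every k in [0:B] two
    gensample calls, one ampl_L call, and one ampl_U call.
    """
    extra = 1 if inclusive else 0
    batch_sizes = range(b_start, b_stop + extra, b_step)
    calls_per_sample_size = 2  # gensample + ampl
    for b in batch_sizes:
        k_iterations = b + extra  # [0:B] covers 0..B when bounds are inclusive
        calls_per_sample_size += 4 * k_iterations
    return n_sample_sizes * calls_per_sample_size


if __name__ == "__main__":
    print(count_app_calls(5, 10, 40, 10))
    print(count_app_calls(5, 10, 40, 10, inclusive=False))
```

Under the inclusive assumption the five sample sizes expand to 2090 invocations. Within one sample size, only the ampl_U calls depend on another result, the candidate objective obj[idxs]. Everything else can run concurrently.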

Summary

Swift-Galaxy integration improves science gateways:
– User control
– Structured distributed computing
– Simple
– Interactive

Commonalities in the basic execution models of Galaxy and Swift lead to many avenues of integration

Broadly, Swift acts as the backend manager while Galaxy is the frontend for operations

An example of combining command-line and GUI-based frameworks

Future Work

A generic approach for each of the integration schemes

Wider application adaptation

Finer and broader exposure of configuration options to users

Interactive monitoring features

Authentication features, Globus-based identity management

Acknowledgements

This work was supported in part by the NIH through the NHLBI grant: The Cardiovascular Research Grid (R24HL085343) and by the U.S. Department of Energy under contract DE-AC02-06CH11357.

We are grateful to Amazon, Inc., for an award of Amazon Web Services time that facilitated early experiments.

We thank our colleagues in the Swift and Globus groups at the MCS Division, Argonne National Laboratory.

Thank you!

Visit swift-lang.org for more information about the Swift parallel scripting framework