extending the galaxy portal with parallel and distributed execution capability

Post on 23-Feb-2016

44 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

Extending the Galaxy portal with parallel and distributed execution capability. Ketan Maheshwari , Alex Rodriguez, David Kelly, Ravi Madduri , Justin Wozniak, Michael Wilde, Ian Foster Argonne National Laboratory & University of Chicago. Overview. Introduce the Galaxy and Swift systems - PowerPoint PPT Presentation

TRANSCRIPT

Extending the Galaxy portal with parallel and distributed execution

capability

Ketan Maheshwari, Alex Rodriguez, David Kelly, Ravi Madduri, Justin Wozniak, Michael Wilde, Ian Foster

Argonne National Laboratory & University of Chicago

swift-lang.org

2

Overview

Introduce the Galaxy and Swift systems Couple the Swift and Galaxy gateway frameworks Combine the features offered by Galaxy and Swift into an

integrated platform Different integration schemes based on user

requirements, and application patterns Data management schemes Example use-case A demo screencast (if time permits) Summary and Future work

3

Overview of the Galaxy Workflow System*

swift-lang.org *slide courtesy: Center for Genomic Regulation, Barcelona, Spain

workspaceTools panelMonitor/HistoryPanel

swift-lang.org

4

Overview of Swift Parallel Scripting Framework

Simulation of super-cooled glass materials

Protein folding using homology-free approaches

Climate model analysis and decision making in energy policy

Simulation of RNA-protein interaction

Multiscale subsurfaceflow modeling

Modeling of power gridapplications

All have published science results obtained using Swift

T0623, 25 res., 8.2Å to 6.3Å (excluding tail)

Protein loop modeling. Courtesy A. Adhikari

Native Predicted

Initial

E

D

C

A B

A

B

C

D

E

F

F>

swift-lang.org

5

Motivation : Swift and Galaxy are Complementary in many ways

Galaxy (galaxyproject.org) offers a simple, user-friendly web-based interface for composing, execution, monitoring workflows

Galaxy results are sharable, reproducible and reusable

Galaxy is a widely used: well-supported by user community e.g. Next Generation Sequencing (NGS) Community

Swift provides sophisticated interface to parallel and distributed platforms

Swift scripts are structured expressions of complex application flows which are readily executable on multiple, diverse and independent remote resources

swift-lang.org

6

Galaxy web-console

Swift-Galaxy Integration Overview

Approaches enabling integration in different ways:– At tool level– At Workflow level– At language/expression level

Galaxy server Galaxy-tool

Swift

app libraries

CloudsClusters

SupercomputersGrids

user computer

7

Computational Infrastructure

Galaxy offers a limited support for Distributed and Parallel Resources– Needs additional adhoc configuration to interface– Constrained in some ways, e.g. needs shared file system*

Swift is robustly interfaced to a wider types of Resource Managers with finer control over job submission parameters:– Supports: PBS/Torque, SGE, SLURM, Condor– Supports bag-of-workstations: clouds, workstation clusters– Supports distributed file system, multiple execution sites

simultaneously

swift-lang.org * To the best of our knowledge

swift-lang.org

8

Interface with heterogeneous parallel systems is a challenge

## SLURM#!/bin/bash#SBATCH -J ...#SBATCH -oe ...#SBATCH –p ...#SBATCH –N ...ibrun ./my_exec args

# CONDORExecutable=eUniverse=stdError=err.$Input=in.$Output=out.$Log=foo.logQueue

SGE#!/bin/bash#$ -cwd#$ -j y#$ -S /bin/bashpwd./my_exec args

##TORQUE/PBS#!/bin/bash#PBS -q ccs_short#PBS -N my_serial_job#PBS -l walltime=01:00:00#PBS -l nodes=1:noib:ppn=1#PBS -m e./a.out

swift-lang.org

9

Scheme1: Wrap Swift around Galaxy Tools

swift-tool A

swift-tool B

swift-tool N

Other Galaxytools

.

.

.

.

.

execution history

execution history

execution history

.

.

.

swift-lang.org

10

Scheme 2: Interoperability between expressions

Swift script Galaxy WorkflowXML transformation

Internally both Swift and Galaxy codes are represented in XML dialects

Automated transformation to convert from one form into another

Currently under development

swift-lang.org

11

Scheme 3: Harness Data Parallelism using foreach

Galaxy-tool

Galaxy-tool

Galaxy-tool

.

.

in-data(split)

out-data(merge)

swift-foreach wrapper

foreach protein, idx in proteinList{ runBlast (protein); tracef(“The index is: %i\n", idx);}

foreach idx in [begin:end:step]{ runmyapp (idx);}

swift-lang.org

12

Cloud Interfaces Galaxy instances running on cloud nodes are already taking

advantage of cloud-based resources Swift’s coasters mechanism can farm resources and combine

multiple cloud and non-cloud resources in a single application run.

swift-lang.org

13

Data Management

Both Galaxy and Swift offer various data management capabilities

Galaxy offers remote data uploading and viewing capabilities Swift allows disc resident data to be operated upon as

program variables Swift’s data-providers are interfaced with various data

management protocols and can manage data motions at runtime

swift-lang.org

14

Evaluation Application: Inference analysis for power prices

generatesample

Candidate Solution

Variance &Mean

generatesample

Candidate Solution

generatesample

generatesample

generatesamplegenerate

sample

lower bound

upper bound

… …

samples

batches

batch size lower bound

upper bound

… …

batches

swift-lang.org

15

Swift Script for Inference Analysis

import "mappings";import "apps”;type file;

int nS[] = [10, 100, 1000, 10000, 100000];foreach S, idxs in nS { sample0 = gensample(S, wind_data); obj[idxs] = ampl(sample0); foreach B, idxb in [10:40:10] { foreach k in [0:B]{ sample1 = gensample(S, wind_data); obj_l[idxs][idxb][k] = ampl_L(sample1); sample2 = gensample(S, wind_data); obj_u[idxs][idxb][k] = ampl_U(sample2, obj[idxs]);}}}

swift-lang.org

16

Summary

Swift-Galaxy integration improves science gateways:– User control– Structured distributed computing– Simple– Interactive

Commonalities in basic execution model of Galaxy and Swift leads to many avenues of integration schemes

Broadly, Swift acts as a backend manager while Galaxy being the frontend for operations

Example of combining command-line and GUI based frameworks

swift-lang.org

17

Future Work

A generic approach for each of the integration schemes Wider application adaptation Finer and broader exposure to configuration options to users Interactive monitoring features Authentication features, Globus based identity management

swift-lang.org

18

Acknowledgements

This work was supported in part by the NIH through the NHLBI grant: The Cardiovascular Research Grid (R24HL085343) and by the U.S. Department of Energy under contract DE-AC02- 06CH11357.

We are grateful to Amazon, Inc., for an award of Amazon Web Services time that facilitated early experiments.

Colleagues at Swift and Globus groups at the MCS Division, Argonne National Laboratory

swift-lang.org

19

Thank you!

Visit swift-lang.org for more information about Swift parallel

scripting framework

top related