Enabling Cloud Bursting for Life Sciences within Galaxy Enis Afgan Johns Hopkins University Galaxy Team Slides available at bit.ly/gxy-bursting

Post on 15-Jul-2015

Page 1: Enabling Cloud Bursting for Life Sciences within Galaxy

Enabling Cloud Bursting for Life Sciences within Galaxy

Enis Afgan Johns Hopkins University

Galaxy Team

Slides available at bit.ly/gxy-bursting

Page 2: Enabling Cloud Bursting for Life Sciences within Galaxy

What is Galaxy?

•  A data analysis and integration tool

•  A (free for everyone) web service integrating a wealth of tools, compute resources, terabytes of reference data and permanent storage

•  Open source software that makes integrating your own tools and data and customizing for your own site simple

Page 3: Enabling Cloud Bursting for Life Sciences within Galaxy

usegalaxy.org

or any of the other 60+ public servers

$ hg clone https://bitbucket.org/galaxy/galaxy-dist
$ cd galaxy-dist
$ sh run.sh

Page 4: Enabling Cloud Bursting for Life Sciences within Galaxy
Page 5: Enabling Cloud Bursting for Life Sciences within Galaxy

[Diagram: Galaxy deployment models.
Local: a single Galaxy with its own tools, data, indices, DB and compute resources, running RNA-Seq, assembly and quality control (QC) jobs.
Federated: Galaxy with its DB and an ObjectStore interface (S3, Swift), using Pulsar to reach sites A and B and a local site C, each with its own tools, data and indices. Artifact and job provenance is kept for RNA-Seq, assembly and QC jobs. CloudMan is also shown.]

Page 6: Enabling Cloud Bursting for Life Sciences within Galaxy

Focus on Cloud Bursting

Peak usage scenarios

Resource heterogeneity

Software licensing

Software installation restrictions

National cyber infrastructure resource access

Per-user, merit-based resource access

Page 7: Enabling Cloud Bursting for Life Sciences within Galaxy

Burst Triggers

When?

Resource capacity

Job requirements

Data locality

System configuration

User preferences

Where?

Remote resource availability

Cost
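
The "When?" and "Where?" questions on this slide could be combined into one decision. The following is a minimal sketch under stated assumptions, not Galaxy code: `RemoteSite`, `should_burst` and `choose_site` are hypothetical names, and only three of the listed triggers (resource capacity, user preferences, and remote availability with cost) are modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RemoteSite:
    name: str             # hypothetical destination id
    available: bool       # "Where?": remote resource availability
    cost_per_hour: float  # "Where?": cost


def should_burst(free_local_slots: int, queued_jobs: int,
                 user_allows_burst: bool = True) -> bool:
    """'When?': burst only if queued work exceeds local capacity
    and the user has not opted out."""
    return user_allows_burst and queued_jobs > free_local_slots


def choose_site(sites: List[RemoteSite]) -> Optional[RemoteSite]:
    """'Where?': the cheapest site that is currently available, if any."""
    candidates = [s for s in sites if s.available]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.cost_per_hour)


# Illustrative values only.
sites = [
    RemoteSite("cloud_a", available=True, cost_per_hour=0.40),
    RemoteSite("cloud_b", available=True, cost_per_hour=0.25),
    RemoteSite("cloud_c", available=False, cost_per_hour=0.10),
]
if should_burst(free_local_slots=0, queued_jobs=12):
    print(choose_site(sites).name)
```

Job requirements, data locality and system configuration would enter the same way, as further inputs to `should_burst`.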

Page 8: Enabling Cloud Bursting for Life Sciences within Galaxy

Burst Architecture

1.  Galaxy dynamic job destination framework

2.  Galaxy CloudMan cluster with Pulsar

3.  A job destination mapper function

[Diagram: Galaxy's dynamic job destination framework uses a mapper function, f(mapper), to route each job either to the local DRM or to one of two CloudMan clusters running Pulsar.]

Page 9: Enabling Cloud Bursting for Life Sciences within Galaxy

Pulsar

A standalone job manager server for Galaxy

Can be deployed on dedicated or transient servers (even MS Windows!)

Handles data staging and remote job execution

[Diagram: a job handled by Pulsar goes through four steps: stage data, submit job, monitor job, send back the data.]
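
The four steps above (stage data, submit job, monitor job, send back the data) can be illustrated with a local stand-in. This is a sketch only: it does not use Pulsar's API, the "remote" side is a temporary directory on the same machine, and `run_remote_style` and `demo` are hypothetical names.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_remote_style(inputs, command, output_name, results_dir):
    """Stage inputs, run a command, wait for it, and copy the output back."""
    with tempfile.TemporaryDirectory() as staging:
        staging = Path(staging)
        # 1. Stage data into the job's working directory.
        for path in inputs:
            shutil.copy(path, staging / Path(path).name)
        # 2. Submit the job (here: a local subprocess).
        proc = subprocess.Popen(command, cwd=staging)
        # 3. Monitor the job until it finishes.
        return_code = proc.wait()
        if return_code != 0:
            raise RuntimeError(f"job failed with exit code {return_code}")
        # 4. Send back the data before the staging area is cleaned up.
        results_dir = Path(results_dir)
        results_dir.mkdir(parents=True, exist_ok=True)
        return Path(shutil.copy(staging / output_name,
                                results_dir / output_name))


def demo():
    with tempfile.TemporaryDirectory() as work:
        src = Path(work) / "in.txt"
        src.write_text("acgt")
        # The "tool" upper-cases its input; it stands in for a real analysis.
        tool = [sys.executable, "-c",
                "open('out.txt', 'w').write(open('in.txt').read().upper())"]
        result = run_remote_style([src], tool, "out.txt",
                                  Path(work) / "results")
        return result.read_text()


if __name__ == "__main__":
    print(demo())
```

In a real deployment the staging and the return transfer cross the network, which is why data locality appears among the burst triggers.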

Page 10: Enabling Cloud Bursting for Life Sciences within Galaxy

1. Galaxy dynamic job destination framework

Define job execution properties

•  Runners: local, Slurm, HTCondor, DRMAA, Pulsar, …

•  Destinations: resource & job properties (e.g., DRM queue, wall time)

Page 11: Enabling Cloud Bursting for Life Sciences within Galaxy

2. CloudMan with Pulsar

A.  Launch a Galaxy on the Cloud instance

B.  Enable Pulsar service

C.  Add the instance as a destination in job config

Tool availability

•  Direct tool install

•  Docker images

Page 12: Enabling Cloud Bursting for Life Sciences within Galaxy

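3. Job mapper function

Determine job destination at runtime

    import pyslurm


    def cloud_burst():
        # Query Slurm for the current state of every node in the local cluster.
        n = pyslurm.node()
        nodes_state = n.get()
        available_nodes = []
        for node in nodes_state.values():
            # As on the slide, a node counts as available if it reports any CPUs.
            if node['total_cpus'] > 0:
                available_nodes.append(node)
        if not available_nodes:
            # No local nodes available: burst to the remote Pulsar destination.
            return 'pulsar_nectar_galaxy'
        # The returned string is the id of the job destination.
        return 'drmaa_runner'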
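
The decision itself can be separated from the Slurm query so that it can be tried without a cluster. This is a minimal sketch, not the code from the talk. It assumes each node record is a dict with a `total_cpus` key and an `alloc_cpus` key; `alloc_cpus` is an assumption made for illustration and should be checked against the actual scheduler output. The function name `pick_destination` is hypothetical; the two destination ids are the ones on the slide.

```python
def pick_destination(nodes_state,
                     local_dest='drmaa_runner',
                     burst_dest='pulsar_nectar_galaxy'):
    """Return the local destination if any node has a free CPU, else burst."""
    for node in nodes_state.values():
        # 'alloc_cpus' is an assumed key; a missing key counts as 0 allocated.
        free = node.get('total_cpus', 0) - node.get('alloc_cpus', 0)
        if free > 0:
            return local_dest
    return burst_dest


# Illustrative node states only.
busy = {'n1': {'total_cpus': 8, 'alloc_cpus': 8}}
mixed = {'n1': {'total_cpus': 8, 'alloc_cpus': 8},
         'n2': {'total_cpus': 8, 'alloc_cpus': 2}}

print(pick_destination(busy))   # every CPU allocated, so the job bursts
print(pick_destination(mixed))  # n2 has free CPUs, so the job stays local
```

Keeping the rule a pure function of the node states means the burst policy can be changed and tested separately from the scheduler query.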

Page 13: Enabling Cloud Bursting for Life Sciences within Galaxy

[Diagram: the burst architecture again (Galaxy dynamic job destination framework, f(mapper), local DRM, two CloudMan clusters running Pulsar), with a further Pulsar destination marked "?".]

Page 14: Enabling Cloud Bursting for Life Sciences within Galaxy

An outcome?

[Chart: usegalaxy.org, by week from 2013-41 to 2015-03. Two series: jobs run to completion (count) and average wait time (minutes). Annotations: "Start bursting", "No job wait", "More jobs".]

Page 15: Enabling Cloud Bursting for Life Sciences within Galaxy

An outcome?

[Chart: usegalaxy.org, "User frustration level". Jobs deleted while queued (% of jobs submitted, axis from 0% to 14%), by week from 2013-41 to 2015-03.]

Page 16: Enabling Cloud Bursting for Life Sciences within Galaxy