Enabling Cloud Bursting for Life Sciences within Galaxy
Enis Afgan Johns Hopkins University
Galaxy Team
Slides available at bit.ly/gxy-bursting
What is Galaxy?
• A data analysis and integration tool
• A (free for everyone) web service integrating a wealth of tools, compute resources, terabytes of reference data and permanent storage
• Open source software that makes integrating your own tools and data and customizing for your own site simple
usegalaxy.org or
any of the other 60+ public servers
$ hg clone bitbucket.org/galaxy/galaxy-dist
$ sh run.sh
[Diagram: a Galaxy instance with its tools, data, indices, DB, and compute resources]
[Diagram: Local vs. Federated. Local: separate Galaxy instances for RNA-Seq, Assembly, and Quality Control (QC). Federated: one Galaxy (RNA-Seq, Assembly, QC) with a DB and an ObjectStore interface (S3, Swift) reaching tools, data, and indices at sites A, B, and C, either locally or through Pulsar, with artifact and job provenance]
Galaxy CloudMan
Focus on Cloud Bursting
• Peak usage scenarios
• Resource heterogeneity
• Software licensing
• Software installation restrictions
• National cyber infrastructure resource access
• Per-user, merit-based resource access
Burst Triggers
When?
• Resource capacity
• Job requirements
• Data locality
• System configuration
• User preferences
Where?
• Remote resource availability
• Cost
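The "When?" and "Where?" questions above can be read as two small decisions. The following is only a sketch under stated assumptions: the names RemoteResource, should_burst, and pick_remote are hypothetical and not part of Galaxy, and only some of the listed triggers (resource capacity, job requirements, data locality, user preferences, remote availability, cost) are modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RemoteResource:
    """Hypothetical description of a remote (cloud) destination."""
    destination_id: str
    available: bool        # "Where?": remote resource availability
    cost_per_hour: float   # "Where?": cost


def should_burst(free_local_cores: int, cores_needed: int,
                 data_is_local_only: bool, user_allows_burst: bool) -> bool:
    """'When?': burst only if the job cannot be placed locally."""
    # Data locality and user preferences can veto bursting outright.
    if data_is_local_only or not user_allows_burst:
        return False
    # Resource capacity versus job requirements.
    return free_local_cores < cores_needed


def pick_remote(resources: List[RemoteResource],
                max_cost_per_hour: float) -> Optional[str]:
    """'Where?': cheapest available remote resource within budget, if any."""
    candidates = [r for r in resources
                  if r.available and r.cost_per_hour <= max_cost_per_hour]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.cost_per_hour).destination_id
```

Keeping the two decisions separate mirrors the slide: a job may qualify for bursting and still stay local when no remote resource is available or affordable.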
Burst Architecture
1. Galaxy dynamic job destination framework
2. Galaxy CloudMan cluster with Pulsar
3. A job destination mapper function
[Diagram: Galaxy's dynamic job destination framework calls f(mapper), which routes jobs to the local DRM or to CloudMan clusters running Pulsar]
Pulsar
• A standalone job manager server for Galaxy
• Can be deployed on dedicated or transient servers (even MS Windows!)
• Handles data staging and remote job execution
Pulsar job: Stage data → Submit job → Monitor job → Send back the data
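The four job steps above (stage data, submit job, monitor job, send back the data) can be illustrated with a local simulation. This is not Pulsar's API: run_staged_job is a hypothetical helper, a temporary directory stands in for the remote server, and the "tool" is a placeholder that upper-cases its input.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# Placeholder "tool": reads argv[1], writes its upper-cased content to argv[2].
TOOL_CODE = (
    "import sys, pathlib; "
    "pathlib.Path(sys.argv[2]).write_text("
    "pathlib.Path(sys.argv[1]).read_text().upper())"
)


def run_staged_job(input_file: Path, output_dir: Path) -> Path:
    """Stage an input, run the placeholder tool on it, and copy the result back."""
    with tempfile.TemporaryDirectory() as staging:  # stands in for the remote host
        staging_dir = Path(staging)
        # 1. Stage data: copy the input into the job's working directory.
        staged_input = staging_dir / input_file.name
        shutil.copy(input_file, staged_input)
        # 2. Submit job: start the tool as a separate process.
        staged_output = staging_dir / "output.txt"
        proc = subprocess.Popen(
            [sys.executable, "-c", TOOL_CODE, str(staged_input), str(staged_output)]
        )
        # 3. Monitor job: block until the process exits and check its status.
        if proc.wait() != 0:
            raise RuntimeError("job failed with exit code %d" % proc.returncode)
        # 4. Send back the data: copy the output out before staging is removed.
        output_dir.mkdir(parents=True, exist_ok=True)
        result = output_dir / staged_output.name
        shutil.copy(staged_output, result)
    return result
```

A real deployment would move files and job state over the network rather than between local directories; the sketch only shows the order of the steps.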
1. Galaxy dynamic job destination framework
Define job execution properties
• Runners: local, Slurm, HTCondor, DRMAA, Pulsar, …
• Destinations: resource & job properties (e.g., DRM queue, wall time)
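The relationship between runners and destinations can be pictured with a small in-memory model. This is a sketch only: Galaxy reads these settings from its job configuration file rather than from Python dictionaries, and every identifier, queue name, and URL below is a hypothetical placeholder.

```python
# Hypothetical model: runners say HOW a job is executed,
# destinations bind a runner to resource and job properties.
RUNNERS = {"local", "slurm", "drmaa", "pulsar"}

DESTINATIONS = {
    "local_default": {"runner": "local", "params": {}},
    "cluster_long": {
        "runner": "drmaa",
        "params": {"queue": "long", "walltime": "24:00:00"},
    },
    "remote_cloud": {
        "runner": "pulsar",
        "params": {"url": "https://pulsar.example.org/"},
    },
}


def resolve(destination_id):
    """Return (runner, params) for a destination, checking the runner is declared."""
    dest = DESTINATIONS[destination_id]  # KeyError for an unknown destination
    if dest["runner"] not in RUNNERS:
        raise ValueError("unknown runner: %s" % dest["runner"])
    return dest["runner"], dest["params"]
```

In this model, several destinations can share one runner while differing only in properties such as queue or wall time, which is the distinction the two bullets draw.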
2. CloudMan with Pulsar
A. Launch a Galaxy on the Cloud instance
B. Enable Pulsar service
C. Add the instance as a destination in job config
Tool availability
• Direct tool install
• Docker images
3. Job mapper function
Determine job destination at runtime

import pyslurm


def cloud_burst():
    # Ask Slurm for the current state of every node.
    n = pyslurm.node()
    nodes_state = n.get()
    # Collect the nodes that still report CPUs.
    available_nodes = []
    for node in nodes_state.values():
        if node['total_cpus'] > 0:
            available_nodes.append(node)
    # No usable local node: send the job to the cloud Pulsar destination.
    if not available_nodes:
        return 'pulsar_nectar_galaxy'
    return 'drmaa_runner'
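The mapper above depends on a live Slurm cluster, which makes it awkward to exercise on its own. As a sketch, the same decision can be separated from the Slurm query so it can be tried with hand-written node data. The function name choose_destination and its parameters are hypothetical; the destination ids and the 'total_cpus' key are taken from the slide.

```python
def choose_destination(nodes_state,
                       local_destination="drmaa_runner",
                       burst_destination="pulsar_nectar_galaxy"):
    """Return the burst destination only when no node reports usable CPUs.

    nodes_state maps a node name to a dict with a 'total_cpus' entry,
    the same key the slide's mapper inspects.
    """
    has_capacity = any(node.get("total_cpus", 0) > 0
                       for node in nodes_state.values())
    return local_destination if has_capacity else burst_destination


# Example with made-up node data: one busy node, one with CPUs to spare.
example_nodes = {"node1": {"total_cpus": 0}, "node2": {"total_cpus": 4}}
destination = choose_destination(example_nodes)
```

With this split, the rule registered with Galaxy would only need to fetch the node state and pass it in, and an empty cluster report falls through to the burst destination.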
[Diagram: f(mapper) in Galaxy's dynamic job destination framework selects the job destination: the local DRM or a CloudMan cluster running Pulsar]
An outcome?
[Chart, usegalaxy.org: average wait time (minutes) and jobs run to completion (count) per week, from 2013-41 to 2015-03. Annotations: Start bursting; No job wait; More jobs]
An outcome?
[Chart, usegalaxy.org: "User frustration level", measured as jobs deleted while queued (% of jobs submitted) per week, from 2013-41 to 2015-03; vertical axis 0% to 14%]