Building a Europe-wide bioinformatics job execution network
Gianmauro Cuccuru - Galaxy Europe Team, University of Freiburg
A Galaxy for all Scientists
• Galaxy is a gateway for transparent & reproducible data analysis
• Easy access (no installation) & sharing of data, tools, analyses and workflows
• Multiple interfaces:
  • intuitive web portals for biologists
  • a unified API for bioinformaticians
• International: 120+ public instances, 8000+ citations
• UseGalaxy.eu, UseGalaxy.org.au, UseGalaxy.org ...
• ELIXIR community: WG established in 2015
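The unified API is a REST interface over HTTP. The sketch below only builds a request for an endpoint such as `/api/histories`. It assumes a recent Galaxy release that accepts the API key in an `x-api-key` header, and the helper name `build_galaxy_request` is hypothetical. Sending the request with `urllib.request.urlopen` would require a real key. For everyday scripting, the BioBlend library wraps these endpoints.

```python
from typing import Optional
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_galaxy_request(base_url: str, path: str, api_key: str,
                         params: Optional[dict] = None) -> Request:
    """Build (but do not send) a GET request for a Galaxy REST endpoint."""
    # Normalise slashes so "https://usegalaxy.eu" + "/api/x" joins cleanly.
    url = urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))
    if params:
        url += "?" + urlencode(params)
    # Assumption: the server accepts the key in the x-api-key header.
    return Request(url, headers={"x-api-key": api_key,
                                 "Accept": "application/json"})


req = build_galaxy_request("https://usegalaxy.eu", "/api/histories", "MY_KEY")
print(req.full_url)  # https://usegalaxy.eu/api/histories
```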
European Galaxy server
With the UseGalaxy.eu server we provide access to:
● Free compute and storage resources
● More than 2000 different, well-documented and constantly maintained bioinformatics tools
● Reference genomes (incl. most common plant genomes)
● 250 GB of storage per user
● Free registration
● Login with ELIXIR AAI
● Member sites: Freiburg, Erasmus MC, VIB Belgium, Institut Pasteur
https://usegalaxy.eu
Galaxy Interface
[Screenshot of the Galaxy interface; top menu: Analyze Data, Workflows, Visualize, Shared Data, Help; welcome page links: New, Events, Statistics]
Tools with documentation, example input and results, and references
History as a digital lab book
Galaxy Workflows for multi-step Analyses
How?
● Extracted from a history
● Built manually
● Imported from a shared workflow
Why?
● Automate your analysis
● Run pipelines
● Re-run the same analysis on different inputs or reproduce results
● Change parameters
● Sub-workflows
● Share them
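A workflow is essentially a directed acyclic graph of tool steps, which is why re-running it on new inputs is cheap. The toy model below is a sketch and not Galaxy's workflow engine. The step names and string "outputs" are hypothetical, and the steps are ordered with Python's standard `graphlib` (3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def run_workflow(steps: Dict[str, Callable[..., str]],
                 deps: Dict[str, List[str]],
                 inputs: Dict[str, str]) -> Dict[str, str]:
    """Run each step once its upstream results exist; return all results."""
    results = dict(inputs)
    graph = {name: deps.get(name, []) for name in steps}
    # static_order() yields every node after all of its predecessors,
    # including workflow inputs that appear only as dependencies.
    for name in TopologicalSorter(graph).static_order():
        if name in results:  # a workflow input, already provided
            continue
        args = [results[d] for d in deps.get(name, [])]
        results[name] = steps[name](*args)
    return results


# Hypothetical two-step pipeline: trim reads, then align them to a reference.
steps = {
    "trim": lambda reads: f"trimmed({reads})",
    "align": lambda trimmed, ref: f"aligned({trimmed},{ref})",
}
deps = {"trim": ["reads"], "align": ["trim", "reference"]}

for sample in ("sample1.fq", "sample2.fq"):
    out = run_workflow(steps, deps, {"reads": sample, "reference": "hg38"})
    print(out["align"])
# aligned(trimmed(sample1.fq),hg38)
# aligned(trimmed(sample2.fq),hg38)
```

The workflow definition is unchanged between the two runs. Only the inputs differ, and this is what makes results reproducible.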
Infrastructure to connect interactive environments
● Jupyter
● RStudio
● Shiny
● Neo4j
● Phinch
● …
Training Infrastructure as a Service
● A queue where only your training's jobs run
● Free; register using a Google form
● No Galaxy maintenance
● No Galaxy administration
● Official Galaxy Training Materials are guaranteed to work and regularly tested
● See how your students are progressing with our dashboard
● >1500 students in the past year
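Conceptually, a dedicated training queue is a routing rule: a job's destination depends on who submits it. The following is a minimal sketch. It assumes trainees are identified by a user role whose name starts with `training-`. Both that prefix and the destination names are hypothetical and are not taken from the UseGalaxy.eu configuration.

```python
from typing import Iterable

DEFAULT_DESTINATION = "main_cluster"  # hypothetical name of the shared queue
TRAINING_PREFIX = "training-"          # hypothetical role naming convention


def choose_destination(user_roles: Iterable[str]) -> str:
    """Send jobs of users enrolled in a training to that training's own queue."""
    # sorted() keeps the choice deterministic if a user holds several training roles.
    for role in sorted(user_roles):
        if role.startswith(TRAINING_PREFIX):
            return "training_" + role[len(TRAINING_PREFIX):]
    return DEFAULT_DESTINATION


print(choose_destination(["training-rnaseq", "staff"]))  # training_rnaseq
print(choose_destination(["staff"]))                     # main_cluster
```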
Subdomains - Fostering Communities
Subdomains with their own welcome page and tool box:
● annotation.usegalaxy.eu
● cheminformatics.usegalaxy.eu
● climate.usegalaxy.eu
● clipseq.usegalaxy.eu
● ecology.usegalaxy.eu
● graphclust.usegalaxy.eu
● proteomics.usegalaxy.eu
● rna.usegalaxy.eu
● imaging.usegalaxy.eu
● metabolomics.usegalaxy.eu
● metagenomics.usegalaxy.eu
● nanopore.usegalaxy.eu
● singlecell.usegalaxy.eu
● streetscience.usegalaxy.eu
● hicexplorer.usegalaxy.eu
● humancellatlas.usegalaxy.eu
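One way to picture how a single server can present a different welcome page and tool box per community is to derive a community label from the requested host name. The sketch below does this. The `TOOL_SECTIONS` mapping and its section names are hypothetical, and this is not how UseGalaxy.eu actually implements its subdomains.

```python
from typing import List, Optional

BASE_DOMAIN = "usegalaxy.eu"

# Hypothetical mapping from community label to the tool-panel sections it shows.
TOOL_SECTIONS = {
    "proteomics": ["Proteomics", "Statistics"],
    "nanopore": ["Nanopore", "Assembly"],
}


def community_from_host(host: str) -> Optional[str]:
    """Return the community label of a usegalaxy.eu subdomain, or None."""
    host = host.lower().split(":", 1)[0]  # ignore an explicit port
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        return None  # the main site or an unrelated host
    label = host[: -len(suffix)]
    # Accept only a single label such as "proteomics", not "a.b".
    return label if label and "." not in label else None


def tool_sections(host: str) -> List[str]:
    """Fall back to the full tool box when no community matches."""
    return TOOL_SECTIONS.get(community_from_host(host), ["All tools"])


print(community_from_host("Nanopore.usegalaxy.eu:443"))  # nanopore
print(tool_sections("usegalaxy.eu"))                     # ['All tools']
```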
European Galaxy community at a glance
UseGalaxy.eu since 2018
● 8,000 users (17th September 2019)
● 6 million jobs
● 11,000,000 datasets
● 13,800 workflows
● Training material: 137 contributors
● Annual Galaxy community conference
Galaxy Computational Power
● 2000 CPU cores
● 20 TB RAM
● 1.5 PB storage
● 50 TB of data per month (all users)
● 130,000 jobs per month (all users)
● Cloud infrastructure of de.NBI (German Network for Bioinformatics Infrastructure) to perform analyses of large datasets
● Central manager and interactive environments (~10 physical nodes)
● Main cluster (~100 cloud nodes divided into 8 different classes)
● Training cluster (1-9 cloud nodes)
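Dividing the totals gives a rough sense of scale. The figures below are back-of-the-envelope averages that assume decimal units and a 30-day month. Real jobs vary widely in cores, memory and data volume, so these numbers describe no particular job.

```python
# Figures from the slide; decimal units (1 TB = 1000 GB) and a
# 30-day month are assumptions made for this rough estimate.
CPU_CORES = 2000
RAM_TB = 20
JOBS_PER_MONTH = 130_000
DATA_TB_PER_MONTH = 50
DAYS_PER_MONTH = 30


def capacity_summary() -> dict:
    """Derive average per-core and per-job figures from the totals."""
    return {
        "ram_gb_per_core": RAM_TB * 1000 / CPU_CORES,
        "jobs_per_day": JOBS_PER_MONTH / DAYS_PER_MONTH,
        "data_gb_per_job": DATA_TB_PER_MONTH * 1000 / JOBS_PER_MONTH,
    }


for key, value in capacity_summary().items():
    print(f"{key}: {value:.2f}")
# ram_gb_per_core: 10.00
# jobs_per_day: 4333.33
# data_gb_per_job: 0.38
```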
Bundle of tools
● A VM image, named Virtual Galaxy Compute Nodes (VGCN), that provides everything you need to run Galaxy jobs.
● Terraform scripts that deploy the infrastructure onto the cloud resources
● Continuous testing
● Continuous deployment
Open Infrastructure
Join Forces
The most innovative computing centers across Europe are currently interested in sharing their remote computing power to support the UseGalaxy.eu load:
● DE, de.NBI cloud
● IT, ReCaS
● BE, Vlaams Supercomputer Centrum (VSC)
● PT, Técnico ULisboa
● ES, Barcelona Supercomputing Center (INB-BSC)
● NO, University of Bergen
● CZ, CESNET
A Pulsar network across Europe
To create this network of shared computational resources, we leverage Pulsar, a Task Execution Service (TES)-like service. Pulsar allows a Galaxy server to automatically interact with those remote systems, ensuring job and provenance information are correctly exchanged.
Local + remote clusters
DE01, DE02, IT01, UK01,...
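The exchange Pulsar handles can be pictured as three phases: stage in, run, stage out, with a record kept of each step. The sketch below is a conceptual toy model of that idea only. It does not use Pulsar's actual API or wire protocol, and all class and method names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Files = Dict[str, bytes]


@dataclass
class RemoteJob:
    """Toy model of a job shipped from a Galaxy server to a remote endpoint."""
    job_id: str
    command: str
    inputs: Files
    state: str = "new"
    outputs: Files = field(default_factory=dict)
    provenance: List[str] = field(default_factory=list)

    def _log(self, event: str) -> None:
        self.provenance.append(f"{self.job_id}: {event}")

    def stage_in(self) -> None:
        # Checksums let both sides verify exactly which inputs were transferred.
        for name, data in sorted(self.inputs.items()):
            digest = hashlib.sha256(data).hexdigest()[:12]
            self._log(f"staged {name} sha256={digest}")
        self.state = "queued"

    def run(self, executor: Callable[[str, Files], Files]) -> None:
        if self.state != "queued":
            raise RuntimeError(f"cannot run job in state {self.state!r}")
        self.state = "running"
        self.outputs = executor(self.command, self.inputs)
        self._log(f"ran {self.command!r}")
        self.state = "complete"

    def stage_out(self) -> Files:
        # Results and the provenance log travel back to the Galaxy server.
        for name in sorted(self.outputs):
            self._log(f"returned {name}")
        return self.outputs


job = RemoteJob("job-1", "upper in.txt", {"in.txt": b"acgt"})
job.stage_in()
job.run(lambda cmd, files: {"out.txt": files["in.txt"].upper()})
print(job.stage_out())  # {'out.txt': b'ACGT'}
print(job.state)        # complete
```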
Remote resources examples
FQDN: uk01.pulsar.galaxyproject.eu
Computation details:
● 5-30 nodes, each with 60 cores and 320 GB RAM
FQDN: de03.pulsar.galaxyproject.eu
Computation details:
● 8 NVIDIA Tesla T4 GPUs
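When local and remote clusters sit side by side, each job needs a destination whose nodes can accommodate it. The following is a minimal matching sketch, assuming jobs declare their cores, memory and GPUs. It is not Galaxy's job-routing configuration. The `local` entry and de03's core and memory figures are hypothetical placeholders, because the slide lists only de03's GPUs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Destination:
    name: str
    cores_per_node: int
    mem_gb_per_node: int
    gpus: int = 0


# uk01 figures and the de03 GPU count come from the slides; the "local"
# entry and de03's cores/memory are hypothetical placeholders.
DESTINATIONS = [
    Destination("local", cores_per_node=16, mem_gb_per_node=64),
    Destination("uk01.pulsar.galaxyproject.eu", 60, 320),
    Destination("de03.pulsar.galaxyproject.eu", 16, 64, gpus=8),
]


def select_destination(cores: int, mem_gb: int, gpus: int = 0) -> Optional[str]:
    """Pick a destination whose nodes can hold the job, or None if none fits."""
    fits = [
        (d.gpus, order, d.name)
        for order, d in enumerate(DESTINATIONS)
        if d.cores_per_node >= cores and d.mem_gb_per_node >= mem_gb and d.gpus >= gpus
    ]
    # Prefer destinations with the fewest GPUs (keeping GPU nodes free for GPU
    # jobs), then the earlier entry in the list (local before remote).
    return min(fits)[2] if fits else None


print(select_destination(32, 200))  # uk01.pulsar.galaxyproject.eu
```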
Singularity
Regenerated 32,000 Singularity containers and made sure that all best-practice Galaxy tools and workflows are available as Singularity containers.
Currently moving these containers (7 TB) to CVMFS; they will be part of every standard Pulsar endpoint as soon as the CVMFS snapshot is ready.
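Once the containers are on CVMFS, a Pulsar endpoint can look up an image by its name instead of downloading it. The sketch below shows that lookup. It assumes a flat repository layout under `/cvmfs/singularity.galaxyproject.org/all` and BioContainers-style `tool:version--build` image names. Verify the layout against the mounted tree. The build string in the example is hypothetical.

```python
from pathlib import PurePosixPath

# Assumed flat layout of the container repository; check the mounted tree.
CVMFS_ROOT = PurePosixPath("/cvmfs/singularity.galaxyproject.org/all")


def container_path(tool: str, version: str, build: str) -> PurePosixPath:
    """Expected image path for a BioContainers-style 'tool:version--build' tag."""
    if not tool or any(c in tool for c in ":/"):
        raise ValueError(f"invalid tool name: {tool!r}")
    return CVMFS_ROOT / f"{tool}:{version}--{build}"


# "h8571acd_11" is a hypothetical build string used only for illustration.
print(container_path("samtools", "1.9", "h8571acd_11"))
# /cvmfs/singularity.galaxyproject.org/all/samtools:1.9--h8571acd_11
```

On a worker node with the repository mounted, wrapping the result in `pathlib.Path(...).exists()` would indicate whether the image is available locally.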
Next
● Python bindings
● Late materialization
● 8.6 -> 8.8
● Annex
● HTCondor-CE
Thanks!
https://galaxyproject.eu/freiburg/
Docker-galaxy
Minimum requirements
For a prototype setup, the minimum requirements are:
● Central manager and NFS server, each with 4 cores and 8 GB RAM
● Computational workers, each with 4-8 cores and 16 GB RAM
● A volume larger than 200 GB
but the more, the better.
[Diagram: NFS server, central manager (HTCondor + Pulsar) and computational workers]
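A planned prototype can be checked against these minimums before deployment. The sketch below encodes the slide's figures: 4 cores and 8 GB for the central manager and the NFS server, at least 4 cores and 16 GB per worker, and a volume larger than 200 GB. The role names and the `Node` type are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    role: str  # hypothetical labels: "central_manager", "nfs" or "worker"
    cores: int
    ram_gb: int


# Minimums from the prototype slide: (cores, RAM in GB) per role.
MINIMUMS = {"central_manager": (4, 8), "nfs": (4, 8), "worker": (4, 16)}
MIN_VOLUME_GB = 200  # the slide asks for more than this


def check_setup(nodes: List[Node], volume_gb: int) -> List[str]:
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    roles = {n.role for n in nodes}
    for role in ("central_manager", "nfs", "worker"):
        if role not in roles:
            problems.append(f"missing {role}")
    for n in nodes:
        min_cores, min_ram = MINIMUMS[n.role]
        if n.cores < min_cores or n.ram_gb < min_ram:
            problems.append(f"{n.role} below minimum ({n.cores} cores, {n.ram_gb} GB)")
    if volume_gb <= MIN_VOLUME_GB:
        problems.append(f"volume {volume_gb} GB is not above {MIN_VOLUME_GB} GB")
    return problems


print(check_setup([Node("central_manager", 2, 8), Node("nfs", 4, 8)], 200))
```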