PORTABLE CONTAINERS ORCHESTRATION AT SCALE WITH NEXTFLOW Paolo Di Tommaso, Seqera Labs ISC-HPC 2019 - Frankfurt

Page 1:

PORTABLE CONTAINERS ORCHESTRATION AT SCALE WITH NEXTFLOW

Paolo Di Tommaso, Seqera Labs

ISC-HPC 2019 - Frankfurt

Page 2:

ENABLING TECHNOLOGY

[Diagram labels: code; dependencies; orchestration; deployment; sharing & reproducibility; Git; GitHub]

Page 3:

WHAT DO YOU MEAN?

# satellite sequences reported by RepeatMasker.
zcat rmsk.txt.gz \
  | grep Satellite \
  | cut -f6,7,8 \
  | sed s,^chr,, \
  | perl -pe 's/^[^\s_]+_([^\s_]+)_random/$1.1/' \
  | tr "gl" "GL" \
  | sort -k1,1N -k2,2n \
  | bgzip > hs37d5.satellite.bed.gz

Credit: Heng Li, https://goo.gl/2nF5NC

Page 4:

THE NEXTFLOW WAY

process filtering {
  input:
  file 'rmsk.txt.gz' from sequences_ch

  output:
  file 'hs37d5.satellite.bed.gz' into results_ch

  '''
  # satellite sequences reported by RepeatMasker.
  zcat rmsk.txt.gz \
    | grep Satellite \
    | cut -f6,7,8 \
    | sed s,^chr,, \
    | perl -pe 's/^[^\s_]+_([^\s_]+)_random/$1.1/' \
    | tr "gl" "GL" \
    | sort -k1,1N -k2,2n \
    | bgzip > hs37d5.satellite.bed.gz
  '''
}

Channel.fromPath('data/rmsk.txt.gz') | filtering | publishTo { '/path' }

Page 5:

THE NEXTFLOW WAY

process filtering {
  input:
  file 'rmsk.txt.gz' from sequences_ch

  output:
  file 'hs37d5.satellite.bed.gz' into results_ch

  '''
  # satellite sequences reported by RepeatMasker.
  zcat rmsk.txt.gz \
    | grep Satellite \
    | cut -f6,7,8 \
    | sed s,^chr,, \
    | perl -pe 's/^[^\s_]+_([^\s_]+)_random/$1.1/' \
    | tr "gl" "GL" \
    | sort -k1,1N -k2,2n \
    | bgzip > hs37d5.satellite.bed.gz
  '''
}

Channel.fromPath('data/*.txt.fq') | filtering | publishTo { '/path' }
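The only change on this slide is the glob passed to Channel.fromPath. As a rough shell analogy (not how Nextflow is implemented), a channel built from a glob emits one item per matching file, and the process runs once for each item. The sketch below assumes three hypothetical files created in a temporary directory and simply counts the tasks that would be launched; Nextflow would dispatch them as independent jobs rather than looping sequentially.

```shell
# Hypothetical input files standing in for data/*.txt.fq.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/data"
touch "$demo_dir/data/a.txt.fq" "$demo_dir/data/b.txt.fq" "$demo_dir/data/c.txt.fq"

task_count=0
for input in "$demo_dir"/data/*.txt.fq; do
  # One 'filtering' task per matched file; here it is only announced.
  echo "task for $(basename "$input")"
  task_count=$((task_count + 1))
done
echo "$task_count tasks"
```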

Page 6:

CONTAINERISATION

• Nextflow envisioned the use of software containers to fix computational reproducibility

• Mar 2014 (ver 0.7), support for Docker

• Dec 2016 (ver 0.23), support for Singularity

[Diagram labels: Nextflow; job, job, job]


Page 8:

PORTABILITY

nextflow run your-script.nf
nextflow run your-script.nf -with-docker your/image

Page 9:

process {
  executor = 'slurm'
  queue = 'my-queue'
  memory = '8 GB'
  cpus = 4
  container = 'user/image'
}

PORTABILITY

Page 10:

process {
  executor = 'awsbatch'
  queue = 'my-queue'
  memory = '8 GB'
  cpus = 4
  container = 'user/image'
}

PORTABILITY
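The two configurations above differ only in the executor line. One way to keep both in a single file is Nextflow's configuration profiles, selected with the -profile option. The sketch below writes such a file into a temporary directory; the profile names cluster and cloud are hypothetical, and a real AWS Batch run needs further settings not shown on the slides (for example an S3 work directory), just as a SLURM run needs a container engine enabled for the container setting to apply.

```shell
# Write a hypothetical nextflow.config into a scratch directory.
config_dir=$(mktemp -d)
cat > "$config_dir/nextflow.config" <<'EOF'
// Settings shared by every profile (values from the slides).
process {
  queue = 'my-queue'
  memory = '8 GB'
  cpus = 4
  container = 'user/image'
}

// Profile names are illustrative; only the executor changes.
profiles {
  cluster {
    process.executor = 'slurm'
  }
  cloud {
    process.executor = 'awsbatch'
  }
}
EOF

# With Nextflow installed, the same script would then be launched as:
#   nextflow run your-script.nf -profile cluster
#   nextflow run your-script.nf -profile cloud
profile_count=$(grep -c "process.executor" "$config_dir/nextflow.config")
echo "$profile_count profiles defined"
```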

Page 11:

WHO IS USING NEXTFLOW?

Page 12:

38 members

12+ institutions

20 pipelines

Page 13:

THANK YOU

http://nextflow.io

http://seqera.io