FAIR bioinfo for bioinformaticians
Introduction to the tools of reproducibility in bioinformatics
C. Hernandez (1), T. Denecker (1), J. Sellier (2), C. Toffano-Nioche (1)
(1) Institute for Integrative Biology of the Cell (I2BC), UMR 9198, Universite Paris-Sud, CNRS, CEA
91190 - Gif-sur-Yvette, France
(2) Institut de Genetique et de Biologie Moleculaire et Cellulaire (IGBMC), CNRS UMR 7104 - Inserm U 1258
67404 - Illkirch cedex, France
Sept. 2020
Celine, Thomas, Claire (I2BC-IFB) FAIR Bioinfo IFB 2020 1 / 28
Introduction
A (not-so-uncommon) nightmare
Run analysis
Submit to a journal
Request from a reviewer
Re-install software
Re-run analysis
Results are different?!
What changed?
Software version
Libraries version
OS version
..?
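One way to answer "what changed?" is to record the environment next to the results each time the analysis runs. The sketch below is only an illustration: it assumes the analysis is driven from Python, and the package names passed to it are placeholders.

```python
import json
import platform
from importlib import metadata


def snapshot_environment(packages):
    """Return the OS, Python and package versions as a plain dictionary."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "os": platform.platform(),            # OS version
        "python": platform.python_version(),  # software version
        "packages": versions,                 # library versions
    }


if __name__ == "__main__":
    # "pip" and "numpy" are example names only; list the packages your analysis uses.
    print(json.dumps(snapshot_environment(["pip", "numpy"]), indent=2))
```

Saving this dictionary with every run makes it possible to compare two runs field by field when the results differ.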
Different levels of encapsulation
Goal: capture the system environment of applications (OS, packages, libraries, ...) to control their execution.
Hardware virtualisation (virtual machines)
OS virtualisation (images and containers)
Environment management
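Environment management is the lightest of these three levels. The slides do not name a tool for it, so the sketch below uses Python's built-in venv module as a stand-in; the directory name is hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_isolated_environment(parent_dir):
    """Create a fresh virtual environment and return its path."""
    env_dir = Path(parent_dir) / "analysis-env"  # hypothetical directory name
    # with_pip=False keeps the example fast and offline; use True to get pip.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_isolated_environment(tmp)
        # pyvenv.cfg records which Python installation the environment is based on.
        print((env_dir / "pyvenv.cfg").read_text())
```

Such an environment isolates packages only: the OS and system libraries are still those of the host, which is what the two other levels address.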
Encapsulation
Let’s say we want to install Firefox...
Windows MacOS Unix-based
Encapsulation
Computer
Host OS
Libraries
Application
We started with a computer using a specific OS...
And inside this environment, we installed a new application.
Applications rely on dependencies, e.g. external libraries.
Encapsulation
Computer
Host OS
Libraries
Application v1
Libraries
Application v1.2
Usually dependencies of different applications don't interfere.
But what if we want to test the latest version of our favourite tool?
There might be conflicts...
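A toy illustration of such a conflict, assuming both versions of the application need the same library in different versions. The library name and version numbers are invented for the example.

```python
# Hypothetical requirements: each application version pins the library version it needs.
REQUIREMENTS = {
    "application v1": {"libfoo": "1.0"},
    "application v1.2": {"libfoo": "2.0"},
}


def find_conflicts(requirements):
    """Return the libraries that different applications need in different versions."""
    wanted = {}  # library name -> {version: [applications asking for it]}
    for app, libs in requirements.items():
        for lib, version in libs.items():
            wanted.setdefault(lib, {}).setdefault(version, []).append(app)
    return {lib: versions for lib, versions in wanted.items() if len(versions) > 1}


if __name__ == "__main__":
    for lib, versions in find_conflicts(REQUIREMENTS).items():
        print(f"conflict on {lib}: {versions}")
```

A single shared installation can hold only one version of the library, so one of the two applications ends up with a version it was not built for.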
Encapsulation : hardware virtualisation
Computer
Host OS
VM manager
Guest OS 1
Libraries
Application v1
Guest OS 2
Libraries
Application v2
Idea: use virtual machines
Pros:
Each application gets a completely different and independent environment
Virtual machines can be transferred to another computer (using the same manager)
Encapsulation : hardware virtualisation
MacOS
Ubuntu
Encapsulation : hardware virtualisation
Idea: use virtual machines
Pros: transferable, independent environments
Cons:
Redundancy between VMs
Heavy to set up
No automation
Encapsulation : OS virtualisation
Computer
Host OS
???
Guest OS 1
Libraries
Application v1
Libraries
Application v2
Libraries
Idea: "trick" applications into believing that they are in a different OS than the host's
Avoid redundancy.
Encapsulation : OS virtualisation
Computer
Host OS Container engine
Minimal guest OS
Libraries
Application v1
Libraries
Application v2
Libraries
Idea: "trick" applications into believing that they are in a different OS than the host's
Avoid redundancy.
Encapsulation : OS virtualisation
Practical Computational Reproducibility in the Life Sciences - Björn Grüning et al. (2018)
What is Docker?
Docker is not very “old”
First commit January 2013
First version March 2013
Version 1.0 in June 2014
But its adoption was fast
Officially packaged in Ubuntu since 2014 (v14.04)
What is Docker?
Image
Set of libraries and functions
Fixed. Cannot be modified
Can be stored/shared online
Can be automatically built
Container
"Active image"
Can be modified (interactive)
Can be turned into an image
One image, many containers
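The relationship between the two can be pictured with a toy model. This is not Docker's API or its internals: the class and method names below are invented, and a "file system" is reduced to a dictionary.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Image:
    """A fixed set of files: it cannot be modified once built."""
    name: str
    files: tuple  # immutable tuple of (path, content) pairs

    def start(self):
        """Start a new container from this image."""
        return Container(image=self, files=dict(self.files))


@dataclass
class Container:
    """An "active image": a private, modifiable copy of the image's files."""
    image: Image
    files: dict = field(default_factory=dict)

    def commit(self, name):
        """Freeze the current state of the container into a new image."""
        return Image(name=name, files=tuple(sorted(self.files.items())))


if __name__ == "__main__":
    base = Image(name="toy-base", files=(("/bin/tool", "v1"),))
    first, second = base.start(), base.start()  # one image, many containers
    first.files["/data/result.txt"] = "42"      # only this container changes
    snapshot = first.commit("toy-with-result")  # container turned into an image
    print(sorted(second.files))
    print(sorted(snapshot.start().files))
```

Changes made in one container affect neither the image nor the other containers started from it.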
What is Docker?
(https://docs.docker.com/get-started/overview/)
What is Docker?
DockerHub
(https://hub.docker.com/)
What is Docker?
User-made images (1/2)
(https://hub.docker.com/u/genomicpariscentre/)
What is Docker?
User-made images (2/2)
Be critical!
(https://hub.docker.com/r/genomicpariscentre/samtools/)
What is Docker?
Other commands :
docker images : list images available locally
docker ps : list running containers and their status (add -a to include stopped ones)
docker rm : delete a container
docker rmi : delete an image
...
(More details during the practical session.)
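As a rough sketch, the housekeeping commands listed above can be written out with concrete arguments. The commands are only built and printed here, never executed, and the container and image names are placeholders.

```python
import shlex


def docker_command(subcommand, *args):
    """Build the argument list for a docker subcommand without running it."""
    return ["docker", subcommand, *args]


if __name__ == "__main__":
    commands = [
        docker_command("images"),                  # list images available locally
        docker_command("ps", "-a"),                # status of all containers, stopped ones included
        docker_command("rm", "my_container"),      # delete a container (placeholder name)
        docker_command("rmi", "my_image:latest"),  # delete an image (placeholder name)
    ]
    for argv in commands:
        print(shlex.join(argv))
```

Each printed line can be typed as-is in a terminal, or the corresponding list passed to subprocess.run on a machine where Docker is installed.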
Encapsulation : OS virtualisation
OS virtualisation vs hardware virtualisation
Pros:
Speed
  Installation is faster
  No boot time
Lightweight
  Minimal base OS
  Minimal libraries and application set
Easy sharing of applications
Cons:
Docker needs root access (alternative: Singularity, which can run containers without root privileges)
Changes in the policies of the Docker company
Docker policy
Update of the Docker Image retention policy (13/08/2020)
https://www.docker.com/pricing/retentionfaq
Practical session
Practical session: Docker and Samtools.
See companion document.
Practical session
Analysis workflow
green=input, blue=tool
fastqc: quality control of the input reads
bowtie2: mapping of the reads onto the genome sequence
samtools: selection and formatting of the mapped reads
HTSeq: count table of mapped reads on genes (annotations)
DESeq2: statistical analysis, list of differentially expressed genes
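The workflow can also be written down as an ordered data structure, which is a first step towards scripting it. The sketch below only restates the order and roles listed above; it does not run any tool.

```python
# Order and roles are taken from the list above; nothing is executed here.
WORKFLOW = [
    ("fastqc", "quality control of the input reads"),
    ("bowtie2", "mapping of the reads onto the genome sequence"),
    ("samtools", "selection and formatting of the mapped reads"),
    ("HTSeq", "count table of mapped reads on genes (annotations)"),
    ("DESeq2", "statistical analysis, list of differentially expressed genes"),
]


def describe(workflow):
    """Return one numbered line per step, in execution order."""
    return [f"{i}. {tool}: {role}" for i, (tool, role) in enumerate(workflow, start=1)]


if __name__ == "__main__":
    print("\n".join(describe(WORKFLOW)))
```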
Practical session
Savoir FAIRe (know-how)
(Installing Docker)
Learn the structure of a Docker command
Pull a pre-defined image available on the DockerHub
Start a container
Bonus: build a Dockerfile
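To give an idea of the structure of a Docker command before the session, here is a minimal sketch that assembles a pull and a run command without executing them. The image name comes from the DockerHub page shown earlier. The host directory, the mount point and the command run inside the container are assumptions made for the example, and the companion document remains the reference.

```python
import shlex

# Image name from the DockerHub page shown earlier; no tag is given,
# so Docker would default to "latest".
IMAGE = "genomicpariscentre/samtools"


def docker_pull(image):
    """docker pull IMAGE: download an image from the DockerHub."""
    return ["docker", "pull", image]


def docker_run(image, command, volumes=None, remove=True):
    """docker run [OPTIONS] IMAGE [COMMAND] [ARG...]: start a container."""
    argv = ["docker", "run"]
    if remove:
        argv.append("--rm")  # delete the container once the command exits
    for host_dir, container_dir in (volumes or {}).items():
        argv += ["-v", f"{host_dir}:{container_dir}"]  # share a host directory
    return argv + [image] + list(command)


if __name__ == "__main__":
    print(shlex.join(docker_pull(IMAGE)))
    # "/home/user/data" and "/data" are hypothetical paths; running "samtools"
    # as the container command is an assumption about how the image is used.
    print(shlex.join(docker_run(IMAGE, ["samtools"], volumes={"/home/user/data": "/data"})))
```

The options come before the image name and the command to run inside the container comes after it. That ordering is the part worth remembering.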