cge methods and databases + docker - goseqit · 2018-10-19 · containers are different from...

Advanced Whole Genome Sequence Analysis, 5-6 Nov. 2018 CGE methods and databases + Docker


Post on 27-May-2020


Page 1: CGE methods and databases + Docker - GoSeqIt · 2018-10-19 · Containers are different from Virtual Machines since they share the hosts OS Containers are lightweight, require less

Advanced Whole Genome Sequence Analysis, 5-6 Nov. 2018

CGE methods and databases + Docker


Learning objective:

After this lecture, you should be able to…

… describe the principle behind the CGE BLAST-based -Finder tools

… describe the principle behind the CGE KmerFinder method for species identification

… navigate the CGE databases located on BitBucket

… describe the principle behind Docker containers in general terms



(ResFinder and PointFinder)


Publication | Name of Method | PMID
Multilocus sequence typing of total genome sequenced bacteria | MLST | 22238442
In silico detection and typing of plasmids using PlasmidFinder and plasmid multilocus sequence typing | PlasmidFinder and pMLST | 24777092
Identification of acquired antimicrobial resistance genes | ResFinder | 22782487
PointFinder: a novel web tool for WGS-based detection of antimicrobial resistance associated with chromosomal point mutations in bacterial pathogens | PointFinder | 29091202
Rapid and Easy In Silico Serotyping of Escherichia coli Isolates by Use of Whole-Genome Sequencing Data | SerotypeFinder | 25972421
Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli | VirulenceFinder | 24574290
Benchmarking of methods for genomic taxonomy | KmerFinder | 24574292


The CGE BLAST-based -Finder tools

ResFinder, PlasmidFinder, and VirulenceFinder work exactly like this


The CGE BLAST-based -Finder tools with a twist

Name of Method | Twist
MLST | The identified MLST alleles are translated into a Sequence Type
pMLST | The identified pMLST alleles are translated into a Sequence Type
PointFinder | The identified genes are examined for presence of point mutations
SerotypeFinder | The identified alleles are translated into a serotype
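
Two of these twists can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CGE implementation: the locus names, the profile table, and both function names are hypothetical, and the point-mutation check is simplified to a nucleotide-by-nucleotide comparison of two already aligned sequences.

```python
# Hypothetical toy scheme: three loci and a tiny profile table.
# Real MLST schemes define their own loci and many more profiles.
LOCI = ("locus1", "locus2", "locus3")
PROFILES = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (2, 1, 3): 3,
}


def alleles_to_sequence_type(alleles):
    """Translate identified allele numbers (dict: locus -> number) into a Sequence Type.

    Returns None when a locus is missing or the allele combination is unknown.
    """
    try:
        key = tuple(alleles[locus] for locus in LOCI)
    except KeyError:
        return None  # at least one locus was not identified
    return PROFILES.get(key)


def find_point_mutations(reference, query):
    """List differences between two aligned, equal-length sequences.

    Each difference is written as <reference base><1-based position><query base>.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref_base}{position}{query_base}"
        for position, (ref_base, query_base) in enumerate(zip(reference, query), start=1)
        if ref_base != query_base
    ]


if __name__ == "__main__":
    print(alleles_to_sequence_type({"locus1": 1, "locus2": 2, "locus3": 1}))
    print(find_point_mutations("ATGGCA", "ATGACA"))
```

In both cases the BLAST step is unchanged; the twist is an extra lookup or comparison applied to the hits.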


Species identification: KmerFinder

• Genomes in the reference database are chopped into 16-mers:

A T G A C G T A T G A C T G A T G G C G T A G T A G T C C

• Downsampling

• Only 16-mers with a specific prefix (ATG) are kept
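
The chopping and downsampling steps can be illustrated with a minimal Python sketch applied to the example sequence from the slide. The function names are hypothetical and this is not the KmerFinder code itself; it only shows overlapping 16-mers being generated and then filtered by the ATG prefix.

```python
def chop_into_kmers(sequence, k=16):
    """Return all overlapping k-mers of the sequence, in order of position."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def downsample(kmers, prefix="ATG"):
    """Keep only the k-mers that start with the given prefix."""
    return [kmer for kmer in kmers if kmer.startswith(prefix)]


# The example sequence shown on the slide, with the spaces removed.
SLIDE_SEQUENCE = "ATGACGTATGACTGATGGCGTAGTAGTCC"

if __name__ == "__main__":
    kmers = chop_into_kmers(SLIDE_SEQUENCE)
    kept = downsample(kmers)
    print(f"{len(kmers)} 16-mers in total, {len(kept)} kept after downsampling")
    for kmer in kept:
        print(kmer)
```

For this 29-base sequence there are 14 overlapping 16-mers, of which two start with ATG, so downsampling greatly reduces the number of k-mers that have to be stored and compared.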


Bacterial Analysis Pipeline

Service Overview

* Species/Lineage prediction (KmerFinder)

* Multilocus Sequence Typing (MLST)

* Resistance Gene Identification (only ResFinder)

* Plasmid Identification (Enterobacteriaceae or Gram-positive bacteria)

* Plasmid MLST (IncF, IncHI1, IncHI2, IncI1, IncN, or IncA/C)

* Virulence Gene Identification (Escherichia coli, Enterococcus spp., Staphylococcus aureus, and Listeria spp.)

PMID: 27327771


Joensen et al 2014. J. Clin. Microbiol. 52(5): 1501-1510

How to get the CGE databases


Gene content of the E. coli virulence database


Database content

Static databases at https://cge.cbs.dtu.dk/services/data.php are no longer maintained

All databases are in an online repository called BitBucket

BitBucket: A web-based hosting service for version-controlled repositories, aimed at teams

Main advantages:

It is easy to see which changes have been made since the last update

It is possible to roll back to previous versions


_db means it is a database repository

Otherwise it is a software repository
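
As a rough illustration, the naming rule and the usual Git steps for fetching a repository, inspecting its history, and rolling back can be sketched in Python. The base URL and the repository names used here (resfinder_db, resfinder) are assumptions for the example, and the function names are hypothetical. The sketch only prints the commands; it does not run them.

```python
import shlex

# Assumed location of the CGE repositories on BitBucket (not stated on the slide).
BASE_URL = "https://bitbucket.org/genomicepidemiology"


def is_database_repo(name):
    """A repository whose name ends in _db holds a database; otherwise it holds software."""
    return name.endswith("_db")


def git_commands(name, commit=None):
    """Build the git commands to clone a repository and list its change history.

    If a commit identifier is given, also add the command that checks out
    that earlier version (the roll back mentioned among the advantages).
    """
    commands = [
        ["git", "clone", f"{BASE_URL}/{name}.git"],
        ["git", "-C", name, "log", "--oneline"],
    ]
    if commit is not None:
        commands.append(["git", "-C", name, "checkout", commit])
    return commands


if __name__ == "__main__":
    for repo in ("resfinder_db", "resfinder"):
        kind = "database" if is_database_repo(repo) else "software"
        print(f"# {repo}: {kind} repository")
        for command in git_commands(repo):
            print(shlex.join(command))
```

The printed commands can be pasted into a terminal on a machine with Git installed.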


https://www.goseqit.com/advanced-workshop-day2/


Docker


Problem: Software developed on one system is often difficult to get to run on another system (has low portability)

Installation of racon

Racon installation is a bit tricky and has several dependencies:

gcc (already installed earlier with the command: sudo yum groupinstall "Development Tools")

cmake (unfortunately, installing cmake via “sudo yum install cmake” gives an old version, so we have to build it from source like this):

$ wget https://cmake.org/files/v3.6/cmake-3.6.2.tar.gz

$ tar -zxvf cmake-3.6.2.tar.gz

$ cd cmake-3.6.2

$ sudo ./bootstrap --prefix=/usr/local

$ sudo make

$ sudo make install

Finally, the file .bash_profile must be modified by adding the line “PATH=/home/ec2-user/cmake-3.6.2/bin:$PATH:$HOME/bin”.

Now, for the installation of racon:

$ git clone --recursive https://github.com/isovic/racon.git racon

$ cd racon

From Ex. 4 Extra


Containers

The goal of software containers: To isolate the software and its dependencies into a self-contained unit that can run anywhere

Containers differ from virtual machines in that they share the host's OS, whereas each virtual machine runs its own guest OS

Containers are lightweight, require less memory than virtual machines, and launch very fast

Docker was first released in 2013 and has become synonymous with containers, but there are other providers (e.g., Singularity)

[Figure: container architecture, from bottom to top: Server, Host OS, Docker Engine, and three containers, each with its own Bins/Libs and App]


Docker terminology

Dockerfile: A text file containing instructions for how to build a Docker image

Docker image: A package with all the dependencies and information needed to create a container

Docker Hub: Public repository of Docker images, from which images can be downloaded (pulled)

Docker container: An instance of a Docker image representing the execution of a single application or process
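
To connect these terms, the following Python sketch holds a small example Dockerfile as text and builds the docker command lines that correspond to each term. It is a sketch under stated assumptions: the image tag my-tool, the script app.py, and the function names are hypothetical, and the commands are only printed, not executed.

```python
import shlex

# Dockerfile: instructions for building an image.
# python:3.11-slim is a public base image; app.py is a hypothetical script.
DOCKERFILE = """\
FROM python:3.11-slim
COPY app.py /app/app.py
CMD ["python", "/app/app.py"]
"""


def build_command(tag, context_dir="."):
    """Docker image: built from the Dockerfile found in context_dir."""
    return ["docker", "build", "-t", tag, context_dir]


def pull_command(image):
    """Docker Hub: download (pull) an existing image instead of building it."""
    return ["docker", "pull", image]


def run_command(image):
    """Docker container: a running instance of an image, removed again on exit."""
    return ["docker", "run", "--rm", image]


if __name__ == "__main__":
    print(DOCKERFILE)
    for command in (
        build_command("my-tool"),
        pull_command("python:3.11-slim"),
        run_command("my-tool"),
    ):
        print(shlex.join(command))
```

The same image can be started many times; each docker run creates a new, separate container.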