cge methods and databases + docker - goseqit · 2018-10-19 · containers are different from...
TRANSCRIPT
Advanced Whole Genome Sequence Analysis, 5-6 Nov. 2018
CGE methods and databases + Docker
Learning objective:
After this lecture, you should be able to…
… describe the principle behind the CGE BLAST-based -Finder tools
… describe the principle behind the CGE KmerFinder method for species identification
… navigate the CGE databases located on BitBucket
… describe the principle behind Docker containers in general terms
(ResFinder and PointFinder)
Publication | Name of Method | PMID
Multilocus sequence typing of total genome sequenced bacteria | MLST | 22238442
In silico detection and typing of plasmids using PlasmidFinder and plasmid multilocus sequence typing | PlasmidFinder and pMLST | 24777092
Identification of acquired antimicrobial resistance genes | ResFinder | 22782487
PointFinder: a novel web tool for WGS-based detection of antimicrobial resistance associated with chromosomal point mutations in bacterial pathogens | PointFinder | 29091202
Rapid and Easy In Silico Serotyping of Escherichia coli Isolates by Use of Whole-Genome Sequencing Data | SerotypeFinder | 25972421
Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli | VirulenceFinder | 24574290
Benchmarking of methods for genomic taxonomy | KmerFinder | 24574292
The CGE BLAST-based -Finder tools
ResFinder, PlasmidFinder, and VirulenceFinder work exactly like this
The CGE BLAST-based -Finder tools with a twist
Name of Method | Twist
MLST | The identified MLST alleles are translated into a Sequence Type
pMLST | The identified pMLST alleles are translated into a Sequence Type
PointFinder | The identified genes are examined for presence of point mutations
SerotypeFinder | The identified alleles are translated into a serotype
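The "twist" for MLST and pMLST, translating identified alleles into a Sequence Type, can be pictured as a table lookup. The sketch below is only an illustration of that idea, not the CGE implementation: the locus names, allele numbers, and ST labels are all hypothetical, and the scheme has three loci only to keep the example short.

```python
# Hypothetical scheme: locus names, allele numbers and ST labels are invented
# for illustration and do not come from any real MLST database.
LOCI = ("locusA", "locusB", "locusC")

# Each allele profile (one allele number per locus) maps to one Sequence Type.
PROFILES = {
    (1, 1, 1): "ST1",
    (1, 2, 1): "ST2",
    (3, 2, 4): "ST3",
}


def sequence_type(alleles):
    """Translate the best-matching allele per locus into a Sequence Type."""
    profile = tuple(alleles.get(locus) for locus in LOCI)
    # A missing locus or an unseen allele combination has no entry in the table.
    return PROFILES.get(profile, "Unknown ST")


# Alleles as they might be reported by the BLAST step (hypothetical values).
hits = {"locusA": 1, "locusB": 2, "locusC": 1}
st = sequence_type(hits)
print(st)
```

A combination of alleles that is not in the profile table is reported as unknown rather than being assigned to the nearest Sequence Type; that is a design choice of this sketch.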
Species identification: KmerFinder
• Genomes in the reference database are chopped into 16-mers:
A T G A C G T A T G A C T G A T G G C G T A G T A G T C C
• Downsampling: only 16-mers with a specific prefix (ATG) are kept
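The two steps on this slide, chopping a sequence into overlapping 16-mers and keeping only those that start with ATG, can be sketched in a few lines. This is a minimal illustration using the example sequence from the slide; the function names are assumptions made for the example and are not taken from the KmerFinder source code.

```python
def kmers(sequence, k=16):
    """Yield every overlapping k-mer of the sequence, moving one base at a time."""
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]


def downsample(sequence, k=16, prefix="ATG"):
    """Keep only the k-mers that begin with the given prefix."""
    return {kmer for kmer in kmers(sequence, k) if kmer.startswith(prefix)}


# Example sequence from the slide, written without spaces.
genome = "ATGACGTATGACTGATGGCGTAGTAGTCC"

all_kmers = list(kmers(genome))
kept = downsample(genome)

print(len(all_kmers), "16-mers before downsampling")
print(len(kept), "16-mers kept after downsampling")
```

For this 29-base sequence, the third ATG lies too close to the end to start a full 16-mer, so only two of the 14 possible 16-mers survive the downsampling.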
Bacterial Analysis Pipeline
Service Overview
* Species/Lineage prediction (KmerFinder)
* Multilocus Sequence Typing (MLST)
* Resistance Gene Identification (only ResFinder)
* Plasmid Identification (Enterobacteriaceae or Gram-positive bacteria)
* Plasmid MLST (IncF, IncHI1, IncHI2, IncI1, IncN, or IncA/C)
* Virulence Gene Identification (Escherichia coli, Enterococcus spp., Staphylococcus aureus, and Listeria spp.)
PMID: 27327771
Joensen et al 2014. J. Clin. Microbiol. 52(5): 1501-1510
How to get the CGE databases
Gene content of the E. coli virulence database
Database content
Static databases at https://cge.cbs.dtu.dk/services/data.php are no longer maintained
All databases are in an online repository called BitBucket
BitBucket: Web-based hosting service aimed at software version control systems for teams
Main advantages:
It is easy to see which changes have been made since last update
It is possible to roll back to previous versions
A repository whose name ends in _db is a database repository
Otherwise it is a software repository
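The naming rule is simple enough to express as a one-line check. The sketch below is only an illustration; the two repository names are used as examples and are assumed, not quoted from the slide.

```python
def repository_kind(name):
    """Classify a repository by the _db naming convention."""
    return "database" if name.endswith("_db") else "software"


# Example names, assumed for illustration.
kinds = {name: repository_kind(name) for name in ("resfinder_db", "resfinder")}
print(kinds)
```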
https://www.goseqit.com/advanced-workshop-day2/
Docker
Problem: Software developed on one system is often difficult to get to run on another system (has low portability)
Installation of racon
Racon installation is a bit tricky and has several dependencies:
gcc (we already installed this earlier with the “sudo yum groupinstall "Development Tools"” command)
cmake (unfortunately, installing cmake via “sudo yum install cmake” gives an old version, so we have to do it like this):
$ wget https://cmake.org/files/v3.6/cmake-3.6.2.tar.gz
$ tar -zxvf cmake-3.6.2.tar.gz
$ cd cmake-3.6.2
$ sudo ./bootstrap --prefix=/usr/local
$ sudo make
$ sudo make install
Finally, the file .bash_profile must be modified by adding the line “PATH=/home/ec2-user/cmake-3.6.2/bin:$PATH:$HOME/bin”.
Now, for the installation of racon:
$ git clone --recursive https://github.com/isovic/racon.git racon
$ cd racon
From Ex. 4 Extra
Containers
The goal of software containers: to isolate the software and its dependencies into a self-contained unit that can run anywhere
Containers differ from virtual machines in that they share the host's OS
Containers are lightweight, require less memory, and launch very fast
Docker was first released in 2013 and has become synonymous with containers, but there are other providers (e.g., Singularity)
[Figure: Containers. Layers from bottom to top: Server, Host OS, Docker Engine, then three containers side by side, each with its own Bins/Libs and App.]
Docker terminology
Dockerfile: A text file containing the instructions for building a Docker image
Docker image: A package with all the dependencies and information needed to create a container
Docker Hub: A public repository of Docker images, from which images can be downloaded (pulled)
Docker container: An instance of a Docker image representing the execution of a single application or process