Computational Resources in Infectious Disease
João André Carriço, Microbiology Institute and Instituto de Medicina Molecular, Faculty of Medicine, University of [email protected] twitter: @jacarrico
ME081 – Meet-The-Expert Session, 26th ECCMID, Amsterdam, Netherlands, 7-12 April 2016
Disclaimer
This presentation is not intended to cover all available software or databases (we would need several weeks or months to do that).
I’ll present what I use or intend to use in the near future.
I gladly accept any suggestions to include in similar presentations in the future.
It is supposed to be interactive, so ask away during the presentation.
Summary
Available Databases
• Virulence Factors and AMR DBs
• Sequence-based typing databases: Pubmlst.org / Enterobase
High Throughput Sequencing data analysis (freeware)
• Prokka
• Roary
• Nullarbor
• Microreact.org
• PHYLOViZ
Commercial Solutions
• Bionumerics 7.5
• CLC Genomics Workbench (CLC Bio)
• Ridom Seqsphere+
Virulence Factor (VF) Databases
• VFDB (http://www.mgc.ac.cn/VFs/main.htm)
• Pathosystems Resource Integration Center (PATRIC) VF (https://www.patricbrc.org/)
• Victors (http://www.phidias.us/victors/)
• PHI-Base (http://www.phi-base.org/)
• MvirDB (http://mvirdb.llnl.gov/)
To know more: presentation in the session “Controversies in interpreting whole genome sequence data”: http://eccmidlive.org/#resources/how-can-we-design-actionable-virulome-databases
Antibiotic Resistance Databases
• Comprehensive Antibiotic Resistance Database (CARD) (https://card.mcmaster.ca/)
• Repository of Antibiotic resistance Cassettes (RAC) (http://rac.aihi.mq.edu.au/rac/)
• INTEGRALL: The Integron Database (http://integrall.bio.ua.pt/)
(…)
Sequenced my strain…now what?
To know more: http://www.slideshare.net/nickloman/eccmid-2015-so-i-have-sequenced-my-genome-what-now
Reads (fastq files)
Contigs (fasta files)
Annotated contigs (gbk/gff files)
Roary: Pan Genome Analysis
Enterobase / BIGSdb
Nullarbor
PHYLOViZ: Tree + metadata visualization
Microreact.org: Tree + metadata + visualization
Prokka
De novo assembler
Sequence Based Typing: PubMLST / BIGSdb
http://www.pubmlst.org
http://bigsdb.web.pasteur.fr/
Sequence Based Typing: Enterobase
Slide by @happy_khan
Martin Sergeant, Mark Achtman, Nabil-Fareed Alikhan, Zhemin Zhou
Prokka
Genome annotation made easy, by Torsten Seemann (slides by Torsten)
Genome annotation: adding biological information to the sequence by describing features
To know more: http://www.slideshare.net/torstenseemann/prokka-rapid-bacterial-genome-annotation-abphm-2013
Available at: https://github.com/tseemann/prokka
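To make "describing features" concrete, here is a minimal sketch that tallies the feature types in an annotated GFF3 file, the kind of file an annotation pipeline produces. The function name and the three-feature example are hypothetical and made up for illustration; the sketch only assumes the standard nine-column GFF3 layout, optionally followed by a ##FASTA section with the sequences.

```python
from collections import Counter


def count_feature_types(gff_text):
    """Count GFF3 feature types (column 3), skipping comments and sequence data."""
    counts = Counter()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments and other directives
        fields = line.split("\t")
        if len(fields) >= 9:  # a well-formed feature line has nine columns
            counts[fields[2]] += 1
    return counts


# Hypothetical annotation, invented for this example only.
EXAMPLE_GFF = "\n".join([
    "##gff-version 3",
    "contig1\texample\tCDS\t100\t1200\t.\t+\t0\tID=gene_00001;product=hypothetical protein",
    "contig1\texample\tCDS\t1500\t2300\t.\t-\t0\tID=gene_00002;product=DNA gyrase subunit A",
    "contig1\texample\ttRNA\t2500\t2575\t.\t+\t.\tID=gene_00003;product=tRNA-Ala",
    "##FASTA",
    ">contig1",
    "ATGC",
])

if __name__ == "__main__":
    for feature_type, n in sorted(count_feature_types(EXAMPLE_GFF).items()):
        print(feature_type, n)
```

A quick tally like this is a cheap sanity check before passing annotated contigs on to downstream tools.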
Roary
Pan genome analysis, by Andrew Page
Available at: https://sangerpathogens.github.io/Roary/
Core genome
Accessory genome
Pan-genome
Roary
Inputs: Annotated de novo assemblies (GFF files)
• Typically from the annotation pipeline
Outputs:
• Spreadsheet with presence and absence of genes
• Multi-FASTA alignment of core genes so you can build a tree without a reference
• Multi-FASTA alignments for each gene
• Plots for the open/closed genome, unique genes
• Integrates with iCANDY so you can visualise all structural variation
• QC report from Kraken to help identify suspect samples
(Slide by Andrew Page)
Roary outputs
Core (n or n-1 strains)
Soft-Core (n-2 or n-3 strains)
Shell (8(?) to n-3 strains)
Cloud (<8(?) strains)
Core genome: Core + Soft-Core
Accessory genome: Shell + Cloud
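The categories on this slide can be sketched as a small classifier over a gene presence/absence table. This is an illustration only, not Roary's own implementation: the function names and the example data are hypothetical, the cloud threshold of 8 strains is the uncertain "(?)" value from the slide, and because the slide lists n-3 under both Soft-Core and Shell, the sketch assumes n-3 counts as Soft-Core.

```python
from collections import Counter


def classify_gene(n_present, n_strains, cloud_max=7):
    """Label a gene by how many of the n_strains genomes carry it.

    cloud_max=7 encodes the slide's tentative "<8 (?) strains" cloud limit.
    """
    if not 0 < n_present <= n_strains:
        raise ValueError("n_present must be between 1 and n_strains")
    missing = n_strains - n_present
    if missing <= 1:
        return "core"       # present in n or n-1 strains
    if missing <= 3:
        return "soft-core"  # present in n-2 or n-3 strains (assumption for n-3)
    if n_present > cloud_max:
        return "shell"      # from 8 strains up to just below soft-core
    return "cloud"          # fewer than 8 strains


def summarise(presence, n_strains, cloud_max=7):
    """Count genes in the core genome and in the accessory genome."""
    counts = Counter(
        classify_gene(len(strains), n_strains, cloud_max)
        for strains in presence.values()
    )
    return {
        "core genome": counts["core"] + counts["soft-core"],
        "accessory genome": counts["shell"] + counts["cloud"],
    }


# Hypothetical presence/absence data: gene name -> set of strains carrying it.
STRAINS = [f"strain{i:02d}" for i in range(1, 21)]
PRESENCE = {
    "geneA": set(STRAINS),       # 20 of 20
    "geneB": set(STRAINS[:19]),  # 19 of 20
    "geneC": set(STRAINS[:17]),  # 17 of 20
    "geneD": set(STRAINS[:12]),  # 12 of 20
    "geneE": set(STRAINS[:3]),   # 3 of 20
}

if __name__ == "__main__":
    for gene, strains in PRESENCE.items():
        print(gene, classify_gene(len(strains), len(STRAINS)))
    print(summarise(PRESENCE, len(STRAINS)))
```

Keeping the cloud limit as a parameter makes it easy to change once the uncertain threshold is settled.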
Roary outputs
iCANDY output of presence and absence of genes in accessory genome. S. Weltevreden & public S. enterica genomes
(Slide by Andrew Page)
Nullarbor
Complete pipeline from reads to reports, by Torsten Seemann
The objective is to automate analysis for everyday use in public health labs / research settings
Uses and distills the outputs of a lot of software
Available at: https://github.com/tseemann/nullarbor
Nullarbor
Slide by Torsten Seemann
Nullarbor
From: https://github.com/tseemann/nullarbor
Some Nullarbor outputs in report
Slides by Torsten Seemann
PHYLOViZ
www.phyloviz.net
PHYLOViZ
Inputs:
- Tab separated txt (profiles)
- Fasta files
- Automatic database retrieval (MLST)
Outputs:
• goeBURST and goeBURST MST
• Link quality assessment
• High quality images
Can be easily applied to:
- MLST / cgMLST / wgMLST
- MLVA
- SNP data*
- Gene presence/absence
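As a rough illustration of how tab-separated profiles become a tree, the sketch below parses a small profile table, measures the number of differing loci between profiles (Hamming distance) and builds a minimum spanning tree with Prim's algorithm. It is a simplification and not PHYLOViZ code: goeBURST applies its own tie-break rules when several links have the same distance, which this sketch does not reproduce (ties are broken by identifier order). The profile table, locus names and function names are hypothetical.

```python
def parse_profiles(text):
    """Parse a tab-separated table: header row, then one profile per line."""
    lines = [line for line in text.splitlines() if line.strip()]
    loci = lines[0].split("\t")[1:]
    profiles = {}
    for line in lines[1:]:
        fields = line.split("\t")
        profiles[fields[0]] = tuple(fields[1:])
    return loci, profiles


def hamming(a, b):
    """Number of loci at which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Prim's algorithm; returns a list of (distance, from_id, to_id) links."""
    ids = sorted(profiles)
    if not ids:
        return []
    in_tree = {ids[0]}
    links = []
    while len(in_tree) < len(ids):
        # Cheapest link from any profile in the tree to any profile outside it.
        best = min(
            (hamming(profiles[u], profiles[v]), u, v)
            for u in sorted(in_tree)
            for v in ids
            if v not in in_tree
        )
        links.append(best)
        in_tree.add(best[2])
    return links


# Hypothetical allelic profiles, invented for this example only.
EXAMPLE_PROFILES = "ST\tlocus1\tlocus2\tlocus3\n1\t1\t1\t1\n2\t1\t1\t2\n3\t1\t2\t2\n4\t5\t6\t7\n"

if __name__ == "__main__":
    loci, profiles = parse_profiles(EXAMPLE_PROFILES)
    for distance, u, v in minimum_spanning_tree(profiles):
        print(f"ST{u} - ST{v}: {distance} differing loci")
```

The same distance works for any data that can be written as one categorical value per column, which is why the approach carries over from MLST to MLVA, SNP or gene presence/absence profiles.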
PHYLOViZ 2.0
New features:
• Hierarchical clustering
• Neighbor-Joining
• Project Saving
PHYLOViZ Online
Available at http://online.phyloviz.net
Web based version of PHYLOViZ
Allows users to create their own datasets, save them and share their data (privately or publicly)
REST API available
Scalable to thousands of nodes
Tree analysis tools: interactive distance matrix, NLV graph
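The two tree analysis tools can be illustrated with a short sketch. It assumes that "NLV" stands for N Locus Variant, so that an NLV graph links every pair of profiles differing at no more than N loci; that reading, the function names and the example profiles are assumptions made for illustration and do not come from the PHYLOViZ Online code or its REST API.

```python
from itertools import combinations


def hamming(a, b):
    """Number of loci at which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(profiles):
    """Pairwise distances as a dict keyed by (id_a, id_b), with id_a < id_b."""
    return {
        (u, v): hamming(profiles[u], profiles[v])
        for u, v in combinations(sorted(profiles), 2)
    }


def nlv_graph(profiles, n):
    """All links between profiles that differ at no more than n loci."""
    return [
        (u, v, d)
        for (u, v), d in distance_matrix(profiles).items()
        if d <= n
    ]


# Hypothetical allelic profiles, invented for this example only.
PROFILES = {
    "1": ("1", "1", "1"),
    "2": ("1", "1", "2"),
    "3": ("1", "2", "2"),
    "4": ("5", "6", "7"),
}

if __name__ == "__main__":
    for n in (1, 2):
        print(f"N = {n}:", nlv_graph(PROFILES, n))
```

Unlike a spanning tree, which keeps one link per added profile, this graph keeps every link under the threshold, so alternative close relationships stay visible.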
PHYLOViZ Online
Slide by @happy_khan
PHYLOViZ Online
NLV Graph
Tree cut-off
Full MST
microreact.org
Create Selections
Change tree options
microreact.org
Available at http://microreact.org/
Presentation in the session “Harnessing whole genome sequence data for public health applications”: “Novel open access tools for WGS-based pathogen surveillance and the identification of high-risk clones”
http://eccmidlive.org/#resources/novel-open-access-tools-for-wgs-based-pathogen-surveillance-and-the-identification-of-high-risk-clones
Meet The Experts (available on twitter by order of appearance)
Commercial solutions
• Ridom Seqsphere+: http://www.ridom.de/seqsphere/
• Applied Maths Bionumerics 7.6: http://www.applied-maths.com/bionumerics
• CLC bio Genomics Workbench: http://www.clcbio.com/blog/clc-genomics-workbench-7-5/
Take home messages
• Huge variety of software and database solutions
• There is no single One-Size-Fits-All solution (job security for bioinformaticians)
• Different questions require different approaches
• Always question the results and data provenance
More references/presentations
ECCMID 2015 Meet-the-expert session on “What bioinformatic tools should I use for analysis of High Throughput Sequencing data for molecular diagnostics?”
Nick Loman: http://www.slideshare.net/nickloman/eccmid-2015-meettheexpert-bioinformatics-tools
João André Carriço: http://www.slideshare.net/joaoandrecarrico/eccmid-meet-theexpert2015
Acknowledgments
UMMI Members: Bruno Gonçalves, Mário Ramirez, José Melo-Cristino
INESC-ID: Alexandre Francisco, Cátia Vaz, Marta Nascimento
EFSA INNUENDO Project (https://sites.google.com/site/innuendocon/): Mirko Rossi
FP7 PathoNGenTrace (http://www.patho-ngen-trace.eu/): Dag Harmsen (Univ. Muenster), Stefan Niemann (Research Center Borstel), Keith Jolley, James Bray and Martin Maiden (Univ. Oxford), Joerg Rothganger (RIDOM), Hannes Pouseele (Applied Maths)
Genome Canada IRIDA (Integrated Rapid Infectious Disease Analysis) project (www.irida.ca): Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar (NLM, PHAC), Ed Taboada and Peter Kruczkiewicz (Lab Foodborne Zoonoses, PHAC), Fiona Brinkman (SFU), William Hsiao (BCCDC)