Page 1: Genome Sequencing of Pathogens with Epidemic Potential

Genome Sequencing of Pathogens with Epidemic Potential: Implications for Control of Communicable Diseases

Vitali Sintchenko

Centre for Infectious Diseases and Microbiology – Public Health, ICPMR

Sydney Emerging Infections and Biosecurity Institute, The University of Sydney

Page 2: Genome Sequencing of Pathogens with Epidemic Potential

Outline

• Transformational power of Whole Genome Sequencing (WGS) technologies

• Added value of WGS of pathogens with epidemic potential to public health

• International initiatives for WGS data sharing

• Challenges of assuring that this value is realised

Page 3: Genome Sequencing of Pathogens with Epidemic Potential

Magnitude of microbial diversity

• Number of microbes on Earth: 5 x 10^30

• Number of stars in the Universe: 7 x 10^21

• Number of humans: 6 x 10^9

• Number of human cells in one human: 10^13

• Number of microbial cells in one human: 10^14

• Number of microbial genes in one human gut: 3 x 10^6

Page 4: Genome Sequencing of Pathogens with Epidemic Potential

Accelerating technology

• Human Genome Project: 15 years and $3 billion

• Celera genome (J. Craig Venter): 9 months and $100 million

• Currently: 3 hours and $1,000

• One human genome being sequenced every 3 minutes

• Sequencing of H. influenzae in 1995 took 13 months and cost >$1 million

• >3,000 complete bacterial genomes published in NCBI GenBank

[Figure: sequencers by class. Specialist technology: Harvard/MIT 2005. Portable technology: MinION sequencer 2012 (Oxford Nanopore Technologies). Bench top technology: Ion Torrent PGM.]

Page 5: Genome Sequencing of Pathogens with Epidemic Potential

WGS bench-top instruments

Instrument                           Chemistry              Read length (bases)  Run time (hours)  Data output per run
454 GS Junior (Roche)                Pyrosequencing         500                  8                 35 Mb
MiSeq (Illumina)                     Reversible terminator  150                  27                1.5 Gb
Ion Torrent PGM (Life Technologies)  Proton detection       200                  3                 500 Mb (316 chip) or up to 1 Gb (318 chip)

Rapid WGS of bacteria in clinical settings can be cost-saving (clinically relevant time to result ~ 50h)
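
To make the run-time and output figures for the three bench-top instruments easier to compare, the short sketch below converts them to an average output per hour of run time. The numbers are copied from the slide; treating "up to 1 Gb" as 1000 Mb and averaging over the whole run are simplifying assumptions, not a statement about how the instruments actually deliver data.

```python
# Figures copied from the slide: (instrument, run time in hours, output in Mb).
# "up to 1 Gb" is treated as 1000 Mb, which is an assumption for illustration.
RUNS = [
    ("454 GS Junior", 8, 35),
    ("MiSeq", 27, 1500),
    ("Ion Torrent PGM (316 chip)", 3, 500),
    ("Ion Torrent PGM (318 chip)", 3, 1000),
]


def mb_per_hour(run_hours, output_mb):
    """Average data output per hour of run time."""
    return output_mb / run_hours


for name, hours, output_mb in RUNS:
    print(f"{name}: {mb_per_hour(hours, output_mb):.1f} Mb/h")
```

Output per hour ignores library preparation and analysis time, so it is only one input into the ~50 h time-to-result quoted above.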

Page 6: Genome Sequencing of Pathogens with Epidemic Potential

Advantages of WGS data

• Pathogen independent solution with high throughput, speed and quality

• Sequences represent smallest biologically meaningful units (ATCG….)

• DNA sequences represent agnostic and ‘future-proof’ data amenable to exchange and comparison (highly portable ‘molecular Esperanto’)

• Rapid growth of public DBs with reference sequences

Page 7: Genome Sequencing of Pathogens with Epidemic Potential

From Pasteur to Watson

Conventional microbiology: clinical specimen → organism growth detected from culture → identification → characterisation/typing by reference laboratory (phage typing, PFGE, MLST, VNTR etc)

WGS-based examination: clinical specimen → WGS identification & characterisation → identification of specific subtypes from WGS data → upload to reference DB with early warning of emergence or spread of virulent/resistant strains

Page 8: Genome Sequencing of Pathogens with Epidemic Potential

Core and accessory/variable genomes

• Core (essential functions; conserved in all strains)

• Accessory/dispensable genome

  • Pathogenicity islands, prophages, transposons and integrated plasmids

  • Strain-specific genes

[Figure: bar chart of number of genes (axis 0 to 8000), core genome vs variable genome, for Escherichia coli, Pseudomonas aeruginosa, Streptococcus pyogenes and Streptococcus pneumoniae; percentage labels on the chart: 70%, 39%, 57%, 57%]
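
The core/accessory distinction can be illustrated with plain set operations. This is a minimal sketch, assuming each strain is already reduced to a set of gene names; the strain names and gene content are hypothetical, and real pan-genome analyses cluster genes by sequence similarity rather than by name.

```python
def split_pan_genome(strains):
    """Split genes into core (present in every strain) and accessory (the rest)."""
    gene_sets = [set(genes) for genes in strains.values()]
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return core, pan - core


# Hypothetical gene content for three strains of one species.
strains = {
    "strain_A": {"gyrB", "recA", "rpoB", "stx2"},
    "strain_B": {"gyrB", "recA", "rpoB", "blaCTX-M"},
    "strain_C": {"gyrB", "recA", "rpoB"},
}

core, accessory = split_pan_genome(strains)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
```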

Page 9: Genome Sequencing of Pathogens with Epidemic Potential

Transformational power of WGS

• Diagnostic microbiology

• Monitoring emerging clones and new pathogens

• Discovery of virulence/drug resistance mechanisms

• Laboratory surveillance (local, national, global)

• Outbreak detection (at point of first secondary case)

• Detection of covert clusters (proof-of-concept studies demonstrated WGS superiority to current typing methods*)

• Tracing of transmission events within outbreaks

• Source attribution and ‘molecular compass’ – geographical structure among related isolates

• Mycobacterium tuberculosis (Gardy et al, 2012; Walker et al, 2013)

• Enterohaemorrhagic Escherichia coli (Underwood et al, 2013)

• Listeria monocytogenes (Gilmour et al, 2010)

• Acinetobacter baumannii (Lewis et al, 2010)

• Legionella pneumophila (Reuter et al, 2013)

• MRSA (Köser et al, 2012)

Page 10: Genome Sequencing of Pathogens with Epidemic Potential

Approaches to genome wide comparison

• Variable number tandem repeat (VNTR) analysis

• Problematic for NGS assembled genomes

• Single nucleotide polymorphism approach

• Works well for monomorphic organisms

• ‘Subjective’ SNP selection

• May be difficult to reproduce

• Gene by gene approach

• Hierarchical locus-by-locus analysis

• wgMLST, MLST+, cgMLST

• Intragenic variation is counted as a single event

• Can place the isolate in context with existing typing methods
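
The difference between the SNP approach and the gene-by-gene approach can be sketched as two distance functions. This is an illustrative toy, assuming pre-aligned sequences of equal length per locus; the locus names and sequences are made up, and the functions are not taken from any particular typing tool.

```python
def snp_distance(seq_a, seq_b):
    """Count differing positions between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def allele_distance(loci_a, loci_b):
    """Count loci whose sequences differ; loci missing from either isolate are skipped."""
    shared = loci_a.keys() & loci_b.keys()
    return sum(1 for locus in shared if loci_a[locus] != loci_b[locus])


# Hypothetical isolates with two loci each.
isolate_1 = {"locus1": "ATGCCGTA", "locus2": "ATGAAACT"}
isolate_2 = {"locus1": "ATGCTGTT", "locus2": "ATGAAACT"}

snps = sum(snp_distance(isolate_1[k], isolate_2[k]) for k in isolate_1)
alleles = allele_distance(isolate_1, isolate_2)
print(snps)     # two SNPs, both inside locus1
print(alleles)  # one locus differs
```

Both SNPs fall in the same locus, so the gene-by-gene distance counts them as a single event, which is the behaviour described in the list above.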

Page 11: Genome Sequencing of Pathogens with Epidemic Potential

Core genome MLST+

Ribosomal MLST (rMLST): 53 conserved genes

Classification according to the Bacterial Isolate Genome Sequence database (BIGSdb)

Jolley et al. JCM 2012; 50(9): 3046

Page 12: Genome Sequencing of Pathogens with Epidemic Potential

Gene-by-gene genomic similarity

Designation of sequence types (ST) and clonal complexes (CC)

Size of the node is proportional to the number of isolates with this sequence type in the database

S. aureus strains with one locus difference

Chambers & DeLeo. Nature Rev Micro 2009;7:629

Page 13: Genome Sequencing of Pathogens with Epidemic Potential

SNP-based genetic diversity of related isolates

Serial isolates from patients with long-term cavitating pulmonary disease, non-compliant with therapy

Walker et al. Lancet Infect Dis 2013

Page 14: Genome Sequencing of Pathogens with Epidemic Potential

Zooming in to mutations in genomes

• Mutations (e.g., single nucleotide variants or polymorphisms [SNPs]) often accumulate randomly

• Different rates of mutations

  • MRSA – 1 SNP/3 months (Croucher et al. Science 2011) or 1 SNP/6 weeks (Harris et al. Science 2010)

  • Vibrio cholerae – 3.3 SNPs per annum (Mutreja et al. Nature 2011)

  • M. tuberculosis – 1 SNP/2 years (Walker et al. Lancet Infect Dis 2012)

• Accumulation during the course of outbreak or natural variation?

[Figure: isolates along an axis of evolutionary time, labelled "No difference", "1 mutation difference" and "2 mutations difference"]
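
As a rough worked example of why these rates matter for outbreak interpretation, the sketch below converts the quoted rates to SNPs per year and multiplies by an elapsed time. It assumes a constant clock-like rate along a single lineage, which is a simplification; the 6-month window is an arbitrary illustration.

```python
# SNPs per genome per year, converted from the rates quoted on the slide.
RATES_PER_YEAR = {
    "MRSA (1 SNP/3 months)": 12 / 3,
    "MRSA (1 SNP/6 weeks)": 52 / 6,
    "Vibrio cholerae": 3.3,
    "M. tuberculosis (1 SNP/2 years)": 1 / 2,
}


def expected_snps(rate_per_year, years):
    """Expected SNPs accumulated along one lineage, assuming a constant rate."""
    return rate_per_year * years


for organism, rate in RATES_PER_YEAR.items():
    print(f"{organism}: {expected_snps(rate, 0.5):.2f} SNPs expected in 6 months")
```

Under these assumptions a slowly mutating organism such as M. tuberculosis is expected to show few or no SNPs over a short outbreak, so small SNP differences alone cannot separate recent transmission from background variation.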

Page 15: Genome Sequencing of Pathogens with Epidemic Potential

Inferring direction of transmission

(a) No direction can be inferred

(b) and (c) The root suggests transmission from left to right

(d) A central source case infects three secondary cases

(e) Likely undiagnosed common source case

Walker et al. Clin Microbiol Infect 2013
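
One simplified way to read the rooted cases is as a subset test on the SNPs each isolate carries relative to the root. The sketch below is a heuristic written for illustration, not the method of the cited paper; the function name and the SNP positions are hypothetical, and it ignores within-host diversity and sampling time.

```python
def infer_direction(derived_a, derived_b):
    """Compare the SNPs isolates A and B carry relative to the root (outgroup)."""
    a, b = set(derived_a), set(derived_b)
    if a == b:
        return "indistinguishable"
    if a < b:  # A's SNPs are a proper subset of B's
        return "consistent with A -> B"
    if b < a:
        return "consistent with B -> A"
    return "neither is ancestral; possible unsampled common source"


# Hypothetical SNP positions, for illustration only.
print(infer_direction({101}, {101, 2050}))
print(infer_direction({101, 777}, {101, 2050}))
```

The last branch corresponds loosely to case (e): both isolates share variation but each has private SNPs, so neither can be the direct source of the other.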

Page 16: Genome Sequencing of Pathogens with Epidemic Potential

Deciphering outbreaks

Step 1: Binary interpretation of subtyping results (match vs. mismatch)

[Figure: timeline; axis label: Time]

Page 17: Genome Sequencing of Pathogens with Epidemic Potential

Deciphering outbreaks

Step 2: Inferring directionality of transmission from WGS data

[Figure: timeline; axis label: Time; one element marked "?"]

Page 18: Genome Sequencing of Pathogens with Epidemic Potential

New Lab Infrastructure

• Bioinformatics pipelines (QC, genome assembly, variant calling, sequence typing etc)

• Standard, stable and scalable to amount of data

• Reproducible results

• Lab ethernet capacity (?1 GB/sec)

• Data processing and storage

• Data analysis – Cloud computing/HPC disk (e.g. Lustre)

• Pipelines - Linux/NFS server

• Warehouse – Compressed data storage and backup
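
As a small illustration of what a QC step in such a pipeline might look like, the sketch below filters reads by mean base quality. It assumes four-line FASTQ records with Phred+33 quality encoding and an arbitrary threshold of 20; the function names are hypothetical, and a production pipeline would normally rely on established QC tools instead.

```python
import io


def mean_phred(quality_line, offset=33):
    """Mean Phred score of one read, assuming Phred+33 ASCII encoding."""
    return sum(ord(ch) - offset for ch in quality_line) / len(quality_line)


def filter_fastq(handle, min_mean_quality=20):
    """Yield (header, sequence, quality) for reads at or above the threshold.

    Assumes the common four-lines-per-record FASTQ layout.
    """
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip()
        if quality and mean_phred(quality) >= min_mean_quality:
            yield header, sequence, quality


# Two made-up reads: 'I' encodes Phred 40, '#' encodes Phred 2.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n####\n")
passed = list(filter_fastq(example))
print([header for header, _, _ in passed])
```

Keeping the threshold as a named parameter makes the step easier to report and reproduce across laboratories.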

Page 19: Genome Sequencing of Pathogens with Epidemic Potential

International initiatives

• Europe - Patho-NGen-Trace

  • EU funded FP7 strategic research

• Northern America

  • 100,000 Microbial Genomes/GenomeTrakr (FDA/UCD)

  • Advanced Molecular Detection (CDC)

  • Integrated Rapid Infectious Disease Analysis [IRIDA] (Canada)

• Global Microbial Identifier

  • 25 countries

  • Mission – link and share WGS and epidemiological data in near real-time for public health surveillance

  • Targets:

    • Pipeline with 4 h TAT for outbreak detection

    • Proficiency testing schemes for WGS (sequencing, genome assembly and genome analysis steps)

Page 20: Genome Sequencing of Pathogens with Epidemic Potential

Challenges

• Global health diplomacy

  • WHO IHR should include “sharing of sequencing data”

• Minimal data sets and open access

  • Requires collaboration between different sectors (human and animal health, food and environment) and stakeholders (government, commercial and not-for-profit)

• Ethics and confidentiality issues

• Sharing of benefits and IP rights

  • DNA sequence as a potential commodity

• IT infrastructure

Page 21: Genome Sequencing of Pathogens with Epidemic Potential

Exchange of genomic data

• 26 TB/day of download

• 4 TB of data exchange

[Figure: data exchange among NCBI GenBank, EBI and DDBJ, shown alongside prediction of storm tracks (Hurricane Sandy)]

Page 22: Genome Sequencing of Pathogens with Epidemic Potential

GMI Minimal Pathogen Metadata

WHAT: Sample name, Organism, Strain/isolate

Category/Attribute: 1a) Clinical/Host associated (Specific_host, Isolation_source, Host_disease) OR 1b) Environmental/Food/Other (Isolation_source)

WHEN: Collection_date

WHERE: Geographic location, 6a) Geo_loc_name OR 6b) Lat_lon

WHO: Collected by

Courtesy of James Ostell, NCBI
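
A minimal metadata check along these lines could look like the sketch below. It is one reading of the slide, not an official GMI or NCBI schema: the lowercase field names, the two category labels and the example record are assumptions made for illustration.

```python
# Field names follow the slide loosely; they are assumptions, not an official schema.
ALWAYS_REQUIRED = ("sample_name", "organism", "strain", "collection_date", "collected_by")
REQUIRED_BY_CATEGORY = {
    "clinical": ("specific_host", "isolation_source", "host_disease"),
    "environmental": ("isolation_source",),
}


def metadata_problems(record, category):
    """Return a list of problems with a record; an empty list means it passes."""
    required = ALWAYS_REQUIRED + REQUIRED_BY_CATEGORY[category]
    issues = [f"missing {field}" for field in required if not record.get(field)]
    # WHERE: either a named location or coordinates is enough.
    if not (record.get("geo_loc_name") or record.get("lat_lon")):
        issues.append("missing geo_loc_name or lat_lon")
    return issues


# Hypothetical record for a clinical isolate.
record = {
    "sample_name": "example-001",
    "organism": "Listeria monocytogenes",
    "strain": "isolate-1",
    "collection_date": "2013-05-01",
    "collected_by": "Example reference laboratory",
    "specific_host": "Homo sapiens",
    "isolation_source": "blood",
    "host_disease": "listeriosis",
    "geo_loc_name": "Australia: Sydney",
}
print(metadata_problems(record, "clinical"))
```

Returning a list of problems rather than raising on the first one lets a submitter fix all missing fields in a single pass.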

Page 23: Genome Sequencing of Pathogens with Epidemic Potential

Concluding remarks

• Evidence-based recommendations for implementation of WGS in public health practice and the assessment of outcomes are required

• Technical framework (your data is worth more if you share it)

  • Proficiency testing and standardisation of WGS processes and data analysis to guide WGS evaluation and implementation

• Education and professional training

  • Establish WGS training and competencies for public health professionals, clinicians and scientists

Page 24: Genome Sequencing of Pathogens with Epidemic Potential

Thank You!
