Post on 18-Jun-2020
Whole genome sequencing for outbreak analysis and pathogen typing
Challenges and Opportunities
Alan Tsang
Scientific Officer (Medical)
Microbiology Division, PHLSB
23 Dec 2019
Agenda
• Overview of typing
• WGS-based typing
• Examples
• Challenges and advantages of WGS
Typing
• Allows differentiation of microbes beyond the species and subspecies level
– To relate individual cases to an outbreak of infectious disease
– To establish an association between an outbreak of food poisoning
and a specific food vehicle
– To trace the source of contaminants within a manufacturing process
Typing
• Phenotypic
– Characterization of bacteria based on expressed traits
• Serotyping
• Genotyping
– Characterization of bacteria based on genetic content
• Pulsed–field gel electrophoresis (PFGE)
• Multi-locus sequence typing (MLST)
• Variable-number tandem repeat (VNTR) typing
Drawbacks
• Low resolution
– Only a rough idea of the relationship between isolates
• Labour intensive
– Lots of tedious lab work
• Relatively expensive
– In time and consumables
3 years ago…
Systems Comparison
15 Gb output
• For E. coli, ~5 Mb genome
• 80x coverage depth
• ~0.4 Gb of sequence data
• ~3% of a MiSeq run
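The arithmetic behind these figures can be checked with a short sketch. The function name and parameter names are illustrative only, and the 15 Gb run output is taken from the slide title rather than from any instrument specification.

```python
def run_fraction(genome_size_bp, depth, run_output_bp):
    """Return (bases needed for one genome, fraction of the run used)."""
    bases_needed = genome_size_bp * depth
    return bases_needed, bases_needed / run_output_bp


# Values from the slide: ~5 Mb E. coli genome, 80x depth, 15 Gb run output
bases, fraction = run_fraction(genome_size_bp=5e6, depth=80, run_output_bp=15e9)
print(f"{bases / 1e9:.1f} Gb per genome, {fraction:.1%} of the run")
```

This gives 0.4 Gb and about 2.7%, which the slide rounds to roughly 3%.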
Whole genome sequencing workflow
Kwong JC et al. Whole genome sequencing in clinical and public health microbiology. Pathology. 2015
Next-Gen Sequencing Library preparation
How does Illumina sequencing work
Better libraries, better runs, better data
Basic genome informatics
• Millions of DNA sequences
– Reads
• Typically 50-300 bp each
• Includes quality information
• File size ~ 1 gigabyte
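As a rough illustration of what "reads with quality information" look like in practice, the sketch below parses one read record and converts its quality string to numeric scores. It assumes the reads are stored in FASTQ format with Phred+33 quality encoding; the slide names neither, and the record itself is made up for the example.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from 4-line FASTQ records."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line
        quality = handle.readline().strip()
        # Assumed Phred+33 encoding: score = ASCII code - 33
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores


# Made-up single record, used only to show the structure
example = io.StringIO("@read_1\nGGCAGCAGTG\n+\nIIIIIIIII5\n")
for read_id, seq, scores in parse_fastq(example):
    print(read_id, len(seq), sum(scores) / len(scores))
```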
Two main approaches
• Gene-by-gene comparisons
• Single Nucleotide Polymorphism (SNP) analysis
Gene-by-gene comparisons
• Compare at the gene level
• Core genome / whole genome multi-locus sequence typing (cgMLST/wgMLST)
• Can be standardized between laboratories
• Databases:
• Ridom SeqSphere+ (Commercial Software)
• BIGSdb
cgMLST database
Genomes and Loci
[Figure: Strains 1 to 6 shown against loci L1 to L2850. Some loci (for example L2, L6, L7 or L8) are absent from individual strains.]
cgMLST
[Figure: the same six strains restricted to the loci present in all of them: L1, L3, L4, L5, ….., L2850.]
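The idea of a core genome, the loci shared by every strain, can be sketched as a set intersection. The locus presence data below is hypothetical toy data chosen to resemble the figure, not read from it, and the variable names are illustrative.

```python
# Hypothetical locus presence per strain (toy data, not taken from the slide)
loci_by_strain = {
    "Strain 1": {"L1", "L2", "L3", "L4", "L5", "L6", "L7", "L8", "L2850"},
    "Strain 2": {"L1", "L3", "L4", "L5", "L6", "L7", "L8", "L2850"},
    "Strain 3": {"L1", "L3", "L4", "L5", "L6", "L8", "L2850"},
    "Strain 4": {"L1", "L2", "L3", "L4", "L5", "L7", "L8", "L2850"},
    "Strain 5": {"L1", "L3", "L4", "L5", "L6", "L7", "L2850"},
    "Strain 6": {"L1", "L2", "L3", "L4", "L5", "L6", "L7", "L8", "L2850"},
}

# Core loci: present in every strain. Accessory loci: present in only some.
core = set.intersection(*loci_by_strain.values())
accessory = set.union(*loci_by_strain.values()) - core


def locus_number(name):
    """Sort key: numeric part of a locus name such as 'L2850'."""
    return int(name[1:])


print("core:", sorted(core, key=locus_number))
print("accessory:", sorted(accessory, key=locus_number))
```

Comparing strains on the core loci only means that every strain contributes an allele at every compared locus.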
Genomes and Loci
            L1   L3   L4   L5   …..   L2850
Strain 1    1    1    1    1    …..   1
Strain 2    1    1    1    1    …..   1
Strain 3    2    2    1    1    …..   2
Strain 4    3    3    2    2    …..   3
Strain 5    2    1    1    1    …..   2
Strain 6    1    1    1    1    …..   1
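A minimal sketch of how allele profiles like these are compared: count the loci at which two strains carry different allele numbers. Only the five loci visible on the slide are used (the elided loci are left out), and the function name and the handling of missing loci as None are assumptions for illustration.

```python
from itertools import combinations

# Only the loci visible on the slide; the elided loci are not reconstructed
loci = ["L1", "L3", "L4", "L5", "L2850"]
profiles = {
    "Strain 1": [1, 1, 1, 1, 1],
    "Strain 2": [1, 1, 1, 1, 1],
    "Strain 3": [2, 2, 1, 1, 2],
    "Strain 4": [3, 3, 2, 2, 3],
    "Strain 5": [2, 1, 1, 1, 2],
    "Strain 6": [1, 1, 1, 1, 1],
}


def allele_distance(a, b):
    """Count loci with different allele numbers, skipping loci missing (None) in either."""
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


distances = {
    (s, t): allele_distance(profiles[s], profiles[t])
    for s, t in combinations(profiles, 2)
}
for (s, t), d in distances.items():
    print(f"{s} vs {t}: {d} allele difference(s)")
```

On these five loci, Strains 1, 2 and 6 are indistinguishable, while Strain 4 differs from them at every locus.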
Single Nucleotide Polymorphism (SNP) analysis
• This approach provides even higher resolution than cgMLST
• A SNP is a difference between DNA sequences in the identity of a single nucleotide (an A, T, G, or C)
• It has the advantage of including intergenic regions
Read mapping
What a SNP looks like
[Figure: reads mapped to a reference sequence, showing a SNP (A=>G)]
SNP-based typing
Ref       GGCAGCAGTGTCTTGCCCGATTGCAGGATGAGTTACCAGCCACAGAATT
Strain A  GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT
Strain B  GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT
Strain C  GGCAGCAGTGTCATGCCCGATTGCAGGATGAGTTACCAGCCACAGAATT
Strain D  GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT
Strain E  GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT
Strain F  GCCACCAGAGTCTTACCGGATAGCAGCATGAGATACCTGCCACACAATT
SNP-based typing
SNP matrix
      A    B    C    D    E
A
B     0
C     1    1
D     0    0    1
E     0    0    1    0
F    12   12   11   12   12
Phylogenetic tree
[Figure: phylogenetic tree of strains A to F, leaf order A, B, D, E, C, F. Scale bar: 1 SNP]
Concatenated SNPs from the SNP matrix are used to construct a phylogenetic tree
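The pairwise SNP counts in the matrix can be reproduced from the aligned strain sequences with a simple position-by-position comparison. This is only a sketch of the counting step; real pipelines call SNPs from mapped reads and apply filters, and the function and variable names here are illustrative.

```python
# Aligned sequences copied from the slide
alignment = {
    "A": "GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT",
    "B": "GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT",
    "C": "GGCAGCAGTGTCATGCCCGATTGCAGGATGAGTTACCAGCCACAGAATT",
    "D": "GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT",
    "E": "GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT",
    "F": "GCCACCAGAGTCTTACCGGATAGCAGCATGAGATACCTGCCACACAATT",
}


def snp_distance(seq1, seq2):
    """Count positions at which two equal-length aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq1, seq2) if x != y)


names = list(alignment)
matrix = {
    a: {b: snp_distance(alignment[a], alignment[b]) for b in names}
    for a in names
}

# Print the lower triangle, one row per strain
for i, a in enumerate(names):
    print(a, *[matrix[a][b] for b in names[:i]])
```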
SNP-based typing
Ref       GGTTGCTGGTAG
Strain A  GGTAGCTCGTAG
Strain B  GGTAGCTCGTAG
Strain C  GGTAGCTGGTAG
Strain D  GGTAGCTCGTAG
Strain E  GGTAGCTCGTAG
Strain F  CCATAGAGCATC
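These shortened sequences are the variable columns of the full alignment joined together. A minimal sketch of that extraction step follows, assuming a gap-free alignment; the function and variable names are illustrative and not from any particular pipeline.

```python
# Full aligned sequences, including the reference, copied from the slide
alignment = {
    "Ref": "GGCAGCAGTGTCTTGCCCGATTGCAGGATGAGTTACCAGCCACAGAATT",
    "A": "GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT",
    "B": "GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT",
    "C": "GGCAGCAGTGTCATGCCCGATTGCAGGATGAGTTACCAGCCACAGAATT",
    "D": "GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT",
    "E": "GGCAGCAGTGTCATGCCCGATTCCAGGATGAGTTACCAGCCACAGAATT",
    "F": "GCCACCAGAGTCTTACCGGATAGCAGCATGAGATACCTGCCACACAATT",
}


def variable_sites(sequences):
    """Return 0-based alignment columns where the sequences do not all agree."""
    return [i for i, column in enumerate(zip(*sequences)) if len(set(column)) > 1]


sites = variable_sites(alignment.values())

# Keep only the variable columns and join them into one string per sequence
concatenated = {
    name: "".join(seq[i] for i in sites) for name, seq in alignment.items()
}
for name, snps in concatenated.items():
    print(f"{name:4s}{snps}")
```

Invariant columns carry no information for distinguishing these strains, so only the 12 variable positions need to be passed on to tree building.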
Example – outbreak investigation
• In 2019, a cluster of Candida auris colonization occurred in a public hospital in Hong Kong, affecting 15 patients over approximately one month. This was the first detection of C. auris in Hong Kong.
• Whole-genome sequencing of the isolates was performed as part of the outbreak investigation.
Major clades of Candida auris
Strains were:
• Very different across clades
• Highly related within clade
SNP numbers will vary…
[Figure panels: using SNP calling pipeline A; using SNP calling pipeline B]
SNP analysis
• Many academic researchers have developed pipelines for similar analysis, some of which are publicly available
– outputs vary
• Many variables affect the number of SNPs measured between isolates
– tools employed
– SNP-calling filters / parameters
– species (nucleotide mutation rates vary between pathogens)
– reference sequence
– number and diversity of isolates analyzed
– time between samples
• Interpret genomic data in parallel with local epidemiological data
• No SNP database or nomenclature is available
Schürch AC et al. Clin Microbiol Infect. 2018
Hatherell HA et al. BMC Med. 2016
Example – serovar prediction
• Traditional serology and the Kauffmann-White scheme (KWS) have been the gold standard for Salmonella serotyping
– maintained by the World Health Organization (WHO) Collaborating Centre for Reference and Research on Salmonella, located at the Pasteur Institute in Paris, France
– the current (9th) edition, issued in 2007, comprises antigenic variants that had been validated as of January 1, 2007
• Aim: evaluate the potential use of WGS as a method for routine serotyping of Salmonella isolates
Salmonella Serotyping Using WGS
Strain  Traditional serotyping  Tool A            Tool B v1    Tool B v2
1       Derby                   Derby             N/A          Derby
2       Bovismorbificans        Bovismorbificans  N/A          Bovismorbificans
3       Wandsworth              Wandsworth        Wandsworth   N/A
4       Typhimurium             I 4,[5],12:i:-    Typhimurium  Typhimurium
5       Chailey                 Breda             Chailey      Chailey
6       Virchow                 Virchow           N/A          Virchow
7       Urbana                  Johannesburg      N/A          Urbana
8       Crewe                   Crewe|Poitiers    N/A          Crewe
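One way to summarize a comparison like this is to tally, for each tool, how many predictions match traditional serotyping. The sketch below makes three assumptions that are not stated on the slide: "N/A" is treated as no call, "|" is read as a list of candidate serovars (an ambiguous call), and matching is by exact string only, so a closely related call still counts as a mismatch.

```python
# Traditional serotyping results, strains 1 to 8, copied from the table
reference = ["Derby", "Bovismorbificans", "Wandsworth", "Typhimurium",
             "Chailey", "Virchow", "Urbana", "Crewe"]

# Tool predictions; None stands for "N/A" (assumed to mean no call)
predictions = {
    "Tool A": ["Derby", "Bovismorbificans", "Wandsworth", "I 4,[5],12:i:-",
               "Breda", "Virchow", "Johannesburg", "Crewe|Poitiers"],
    "Tool B v1": [None, None, "Wandsworth", "Typhimurium",
                  "Chailey", None, None, None],
    "Tool B v2": ["Derby", "Bovismorbificans", None, "Typhimurium",
                  "Chailey", "Virchow", "Urbana", "Crewe"],
}


def tally(expected, called):
    """Classify each call as match, ambiguous, mismatch or no call."""
    counts = {"match": 0, "ambiguous": 0, "mismatch": 0, "no call": 0}
    for exp, call in zip(expected, called):
        if call is None:
            counts["no call"] += 1
        elif call == exp:
            counts["match"] += 1
        elif exp in call.split("|"):
            # Expected serovar is one of several candidates (assumed meaning of "|")
            counts["ambiguous"] += 1
        else:
            counts["mismatch"] += 1
    return counts


summary = {tool: tally(reference, calls) for tool, calls in predictions.items()}
for tool, counts in summary.items():
    print(tool, counts)
```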
New edition of the scheme - 2020
Challenges
• Different pipelines
– different results
• Different versions of same pipeline
– different results
Drawbacks
• Interpretation of WGS data
• A set of standardized tools and guidelines has not yet been defined
• Cost?
• Data storage
– WGS generates large amounts of data
– requires both physical and virtual storage space
• Internet connection/speed
– The large amounts of data generated by WGS need to be transferred over the Internet to be available and of benefit to the global community
Benefits of WGS
• Performance
– far superior resolution
– provides more information on pathogens
• Ease of sharing
– data can be easily exchanged electronically around the globe
– can be stored in repositories (e.g. NCBI, EBI)
– the genomic data can be reanalyzed locally at any time
– local pathogens can easily be compared with other sequences in publicly available international databases, allowing the local outbreak to be interpreted in an international context
• Universality
– universal across all pathogens
– no species-specific primers needed
– no species-specific enzymes needed
Thank you
For Your Attention