Division of Foodborne, Waterborne, and Environmental Diseases
National Center for Emerging and Zoonotic Infectious Diseases
The Use of Average Nucleotide Identity
(ANI) for Bacterial Identification
Patti Fields
for
Maryann Turnsek
Enteric Diseases Laboratory Branch (EDLB)
CDC
2017 APHL Annual Meeting
Providence, Rhode Island
June 14, 2017
REPLACE multiple microbiological
workflows:
• Identification
• Serotyping
• Virulence profiling
• Antimicrobial susceptibility
• Subtyping
WGS
With ONE cost-efficient
and precise method:
All of this information
can be derived from a
genome sequence!
Vision for Public Health
Microbiology in the WGS Era
Wish List for a Bacterial
Identification Method
Definitive, accurate identification
High specificity and sensitivity
Organism-agnostic
One method for all taxa
Computationally fast
Easy to perform and interpret
“Push button” solution
Works in BioNumerics framework
REFERENCE ID DATABASE
Perform a WGS-based test
to identify target
organisms:
Listeria monocytogenes
Campylobacter spp.
Escherichia coli
Salmonella enterica
Vibrio spp.
Yersinia enterocolitica
Cronobacter sakazakii
Proposed WGS-based Workflow
Bacterial Isolate → DNA, QC → WGS, QC → Assembly, QC → ID?
ID? Yes:
• Move to organism-specific database for subtyping
• Report ID to state
ID? No:
• Perform standard identification methods
• Perform additional characterization as needed
• Report ID to state
Goal: Consolidate Workflows in BioNumerics
Bacterial isolate → WGS data
Reference Identification Database:
• Basic QC on raw reads
• De novo assembly
• ANI calculation
• Species ID
Organism-specific Database(s):
• Serotyping
• Virulence profile
• AMR profile
• wgMLST
• etc.
Average Nucleotide Identity (ANI)*
* Konstantinidis and Tiedje, 2005. Proc Natl Acad Sci USA 102(7):2567
* Richter and Rossello-Mora, 2009. Proc Natl Acad Sci USA 106(45):19126
A computational method to compare two genomes
Compares an unknown query sequence to a well-
characterized reference genome
Two calculations:
• Compares the genetic similarity of shared sequences
• Determines proportion of bases aligned
Closely mirrors comparisons by DNA-DNA hybridization
The traditional gold standard method for determining
species boundaries
How ANI Works
[Figure: two genomes, labeled Query and Reference, aligned in two shared blocks; regions labeled MGE are marked X]
Aligned block 1 (35 bases; the mismatch is shown in lower case):
ACGTGTGCACCACGTACTTAACGTGTGCACCACGT
ACGTaTGCACCACGTACTTAACGTGTGCACCACGT
Aligned block 2 (20 bases; the mismatch is shown in lower case):
ACGTGTGCACCACGTACTTA
ACGTGTGCACCACGTtCTTA
Aligns shared sequences and calculates percent
identical nucleotides
Answers the question: Are these two genomes the
same taxon? Yes or No
In this example, 53/55 aligned bases = 96.4% identity
The ANI “cutoff” value for % identity and % bases
aligned is determined empirically for each taxon
Published values are on the order of 95% identity
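The arithmetic in this example can be reproduced with a short Python sketch. The two aligned blocks are copied from the slide; the function name `percent_identity` is illustrative and does not come from any ANI tool. Real ANI software also performs the alignment itself, which this sketch assumes has already been done.

```python
# Aligned blocks copied from the slide, as (row 1, row 2) pairs.
# The slide labels the two rows Query and Reference; lower-case
# letters mark the mismatched bases.
BLOCKS = [
    ("ACGTGTGCACCACGTACTTAACGTGTGCACCACGT",
     "ACGTaTGCACCACGTACTTAACGTGTGCACCACGT"),
    ("ACGTGTGCACCACGTACTTA",
     "ACGTGTGCACCACGTtCTTA"),
]


def percent_identity(blocks):
    """Return (matches, aligned, percent) over pre-aligned, gap-free blocks."""
    matches = 0
    aligned = 0
    for seq_a, seq_b in blocks:
        if len(seq_a) != len(seq_b):
            raise ValueError("aligned blocks must have equal length")
        aligned += len(seq_a)
        # Compare case-insensitively; case is only used as a visual marker.
        matches += sum(a.upper() == b.upper() for a, b in zip(seq_a, seq_b))
    return matches, aligned, 100.0 * matches / aligned


matches, aligned, pct = percent_identity(BLOCKS)
print(f"{matches}/{aligned} aligned bases = {pct:.1f}% identity")
# prints "53/55 aligned bases = 96.4% identity"
```

Sequence that is not shared between the two genomes does not enter this calculation, which is why the proportion of bases aligned is tracked as a separate value.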
How ANI Works (cont)
Query sequence for identification
Reference sequence library:
• Salmonella enterica
• Campylobacter coli
• Escherichia albertii
• Listeria monocytogenes
• Campylobacter jejuni
• Listeria innocua
• Escherichia coli
• Vibrio cholerae
Any values above cutoff for ID?
[Figure: seven of the eight reference comparisons are marked X]
Add sequence to Listeria database for further characterization:
serotyping, virulence genes, wgMLST, AST, etc. (depending on organism)
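The library lookup on this slide might be sketched as follows. This is a minimal illustration, not the BioNumerics implementation: the `AniResult` type, the `identify` function, and every "percent aligned" cutoff are assumptions made for the example. Only the 92% Listeria figure and the roughly 95% published value are taken from these slides, and the query results are invented numbers.

```python
from dataclasses import dataclass


@dataclass
class AniResult:
    reference: str           # name of the reference genome
    percent_identity: float  # ANI over the aligned sequence
    percent_aligned: float   # share of bases that aligned


# Hypothetical per-taxon cutoffs as (min identity, min aligned).
# The "aligned" values are placeholders, NOT validated thresholds.
CUTOFFS = {"Listeria monocytogenes": (92.0, 70.0)}
DEFAULT_CUTOFF = (95.0, 70.0)


def identify(results, cutoffs=CUTOFFS, default=DEFAULT_CUTOFF):
    """Return the reference names whose results pass both cutoffs, best first."""
    hits = []
    for r in results:
        min_identity, min_aligned = cutoffs.get(r.reference, default)
        if r.percent_identity >= min_identity and r.percent_aligned >= min_aligned:
            hits.append(r)
    hits.sort(key=lambda r: r.percent_identity, reverse=True)
    return [r.reference for r in hits]


# Invented results for one query genome, for illustration only.
example = [
    AniResult("Listeria monocytogenes", 98.7, 91.0),
    AniResult("Listeria innocua", 88.9, 78.0),
    AniResult("Escherichia coli", 0.0, 0.0),
]
print(identify(example) or "No ID: perform standard identification methods")
```

An empty result corresponds to the "No" branch of the workflow, where standard identification methods take over.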
Pros and Cons for ANI
PROs
Replicates species determinations by DNA-DNA
hybridization
Very rapid: Compare two genomes in seconds
Very robust: Reliable answer with 5X sequence coverage
(based on down-sampling experiment)
Relatively easy to interpret; clear cut off values
CONs
X Definitive identification requires that a representative genome
be in the Reference Sequence Library
• New or unrepresented species cannot be identified
X Useful for comparing closely related bacteria only.
• Distantly related => No Match
X As reference library gets bigger, computation time gets
longer
Clinical Laboratory Improvement Amendments (CLIA) of 1988*
• Regulatory requirements for clinical testing of human specimens
• Requires the establishment of performance specifications to
ensure the analytical validity of test results prior to patient testing
Compliance with regulatory requirements and quality
standards can be challenging for NGS-based assays
because traditional definitions for performance
characteristics do not readily translate to DNA
sequence methods and data.
*Code of Federal Regulations. 42 CFR Part 493. (1256)
Validation of ANI
CLIA Requirements
VALIDATION: platform, test, informatics
TEST DEVELOPMENT AND OPTIMIZATION
PATIENT TESTING
QA/QC PT/AA
ANI CLIA Validation Plan Framework
Based on NGS Standardization of Clinical Testing (Nex-StoCT)
Working Group Recommendations*
Guidelines were developed for human genetic testing
Many of the recommendations are applicable to
microbiology tests
*Gargis et al, 2012. Nat Biotechnol. 30:1033.
Test Development
Establish workflow and QC check points
Workflow steps (QC check point after the colon):
• DNA Extraction: Concentration, Purity
• Library Preparation: Library Concentration
• Sequencing: Read Quality
• Data Transfer
• Assembly: Coverage, Genome Size
• ANI Calculation: ANI Criteria
• Organism ID
A total of 30 documents – SOPs, data worksheets, QC logs,
etc – were developed.
1) Extract DNA from a bacterial isolate; perform DNA sequencing.
2) Assess quality of DNA sequence data
3) Perform ANI
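Step 2 acts as a gate before ANI is run. A minimal sketch of such a gate is shown below, using the coverage and genome size check points named in the workflow. The function name `qc_failures` and all threshold values are placeholders chosen for illustration; actual acceptance criteria are taxon-specific and set during validation.

```python
def qc_failures(coverage, genome_size, min_coverage, size_range):
    """Return a list of QC failure messages; an empty list means the data pass."""
    failures = []
    if coverage < min_coverage:
        failures.append(
            f"coverage {coverage:.0f}X is below the minimum {min_coverage:.0f}X"
        )
    low, high = size_range
    if not low <= genome_size <= high:
        failures.append(
            f"genome size {genome_size} bp is outside {low}-{high} bp"
        )
    return failures


# Placeholder thresholds for illustration only (not validated values).
MIN_COVERAGE = 20
SIZE_RANGE = (2_700_000, 3_200_000)

problems = qc_failures(40, 2_950_000, MIN_COVERAGE, SIZE_RANGE)
print("Proceed to ANI" if not problems else problems)
```

Returning the full list of failures, rather than stopping at the first one, makes it easier to record every failed check point in a QC log.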
Example: Listeria
L.monocytogenes 2011L-2626 1/2a
L.monocytogenes 2011L-2676 1/2a
L.monocytogenes 2010L-1846 1/2a
L.monocytogenes SRR1039789 1/2a
L.monocytogenes 2011L-2625 1/2a
L.monocytogenes 2013L-5369 1/2c
L.monocytogenes 2010L-2074 1/2a
L.monocytogenes 2010L-1878 1/2a
L.monocytogenes 2012L-5240 1/2a
L.monocytogenes 2009L-1023 1/2a
L.monocytogenes 2013L-5409 1/2a
L.monocytogenes F4554 4b
L.monocytogenes 2014L-6447 4b
L.monocytogenes 2013L-5275 4b
L.monocytogenes 2013L-5562 4b
L.monocytogenes SRR1016925 4b
L.monocytogenes 2013L-5504 4b
L.monocytogenes F2365 4b
L.monocytogenes 2014L-6438 4b
L.monocytogenes G4599 1/2b
L.monocytogenes 2013L-5347 1/2b
L.monocytogenes F4233 1/2b
L.monocytogenes 2009L-1181 1/2b
L.monocytogenes 2014L-6025 1/2b
L.monocytogenes 2011L-2624 1/2b
L.monocytogenes J0099 4c
L.monocytogenes 2015L-6039 4b
L.monocytogenes 2014L-6657 4b
L.monocytogenes 2013L-5616 4b
L.monocytogenes 2013L-5455 4b
L.monocytogenes 2014L-6694 4c
L.monocytogenes 2014L-6652 non-typeable
L.monocytogenes 2013L-5130 4b
L.monocytogenes 2013L-5061 4b
L.monocytogenes 2014L-6393 4b
L.monocytogenes 2013L-5443 4b
L.monocytogenes F6212 non-typeable
L.monocytogenes F6214 non-typeable
L.monocytogenes 2014L-6256 4b
[Figure: tree of the strains listed above, with nodes labeled 100 and groups labeled Lineage I, Lineage II, Lineage III, and Lineage IV]
Strain selection:
79 Listeria genomes
39 L. monocytogenes,
including all four lineages
L. innocua, L. ivanovii,
L. marthii, L. seeligeri, and
L. welshimeri
Pairwise comparison of all
73 genomes
Establish “Cut off” Values for ANI
[Figure: density plot of pairwise ANI values; x-axis ANI (85 to 100), y-axis Density (0 to 6)]
Within Lineage I: ~ 99%
Within Listeria monocytogenes: > 92%
Between Listeria species: < 91%
92% ANI is a clear cutoff for species in Listeria
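One way to look for an empirical cutoff is to check whether within-species and between-species ANI values are separated by a gap. The sketch below assumes the pairwise values have already been computed and grouped. The function name `suggest_cutoff` and the example numbers are hypothetical; they were chosen only to be consistent with the ranges shown here (within L. monocytogenes above 92%, between species below 91%).

```python
def suggest_cutoff(within, between):
    """Return the midpoint of the gap between the groups, or None if they overlap."""
    lowest_within = min(within)
    highest_between = max(between)
    if lowest_within <= highest_between:
        return None  # no clean separation, so no single cutoff value
    return (lowest_within + highest_between) / 2


# Hypothetical pairwise ANI values, for illustration only.
within_species = [99.1, 98.8, 94.2, 93.5]
between_species = [88.0, 89.5, 90.6]

cutoff = suggest_cutoff(within_species, between_species)
print(f"Suggested cutoff: {cutoff:.1f}%" if cutoff else "Groups overlap")
```

The midpoint is only a starting point; a laboratory would still confirm the value against its own validation strain set.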
Comparisons of Other Listeria species
[Figure: ANI (y-axis, 85 to 100) for each comparison below; legend: Within species, Between species]
Comparison:
• Within L. innocua
• Between L. innocua and Lm
• Between L. innocua and Non-Lm
• Within L. ivanovii
• Between L. ivanovii and Lm
• Between L. ivanovii and Non-Lm
• Within L. marthii
• Between L. marthii and Lm
• Between L. marthii and Non-Lm
• Between L. seeligeri and Lm
• Within L. seeligeri
• Between L. seeligeri and Non-Lm
• Between L. welshimeri and Lm
• Within L. welshimeri
• Between L. welshimeri and Non-Lm
Validation
Initial focus: L. monocytogenes, Escherichia, Campylobacter
Based on public health priority (plus manageable numbers)
Finalize reference sequence library
Finalize thresholds for % identity and % bases aligned
Select strain set for validation
Targeted species, near neighbors, taxonomic diversity
325 strains total
Standard parameters for validation
Establish comparative method
Define acceptance criteria (depth of coverage, % bases
aligned, accuracy, sensitivity, specificity, etc)
Identify equipment that will be used
Designate trained staff to perform validation testing
Taxon-specific Acceptance Criteria
15 species validated
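Acceptance criteria such as accuracy, sensitivity, and specificity can be tallied by comparing each ANI call with the comparative method. The sketch below is a minimal illustration: the function name `performance` and the example calls are hypothetical, and the numbers are not results from this validation.

```python
def performance(calls):
    """Summarize (is_target, called_target) pairs for one target taxon.

    is_target: the comparative method says the strain is the target taxon.
    called_target: ANI identified the strain as the target taxon.
    """
    tp = fp = tn = fn = 0
    for is_target, called_target in calls:
        if is_target and called_target:
            tp += 1
        elif is_target:
            fn += 1
        elif called_target:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        "accuracy": (tp + tn) / total if total else None,
    }


# Invented example: 5 target strains (one missed) and 5 near neighbors.
calls = [(True, True)] * 4 + [(True, False)] + [(False, False)] * 5
summary = performance(calls)
print(summary)
```

Including near neighbors in the strain set matters here, because specificity can only be measured on strains that should not be called the target.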
Development of additional analytical tools
Workflow steps (QC check point after the colon):
• DNA Extraction: Concentration, Purity
• Library Preparation: Library Concentration
• Sequencing: Read Quality
• Data Transfer
• Assembly: Coverage, Genome Size
• ANI Calculation: ANI Criteria
• Organism ID
Validation easy(er) for subsequent tools
Same basic workflow through sequence QC
Development of additional analytical tools
Workflow steps (QC check point after the colon):
• DNA Extraction: Concentration, Purity
• Library Preparation: Library Concentration
• Sequencing: Read Quality
• Data Transfer
• Assembly: Coverage, Genome Size
• rpoB Calculation: rpoB Criteria
• Organism ID
Easy(er) for subsequent tools
Same basic workflow through sequence QC
Develop and validate each additional tool
Identification algorithms based on rpoB
Serotype prediction
Virulence and AMR profiles
etc
Future Plans for WGS-based Methods
ANI for additional taxa
Salmonella, Vibrio, Cronobacter, Yersinia
Additional analytical tools
rpoB-based identification
Serotype prediction
• SeqSero for Salmonella*
• SerotypeFinder for E. coli#
VirulenceFinder for E. coli#
ResFinder for AMR profiles#
Perform reference testing on a submitted genome
sequence rather than a bacterial isolate?
* Adapted from Zhang et al, 2015. J Clin Microbiol 53:1685
# Adapted from CGE: http://www.genomicepidemiology.org/
Some Key Points
We are still learning; threshold values are not set in stone.
Just like any other identification test, ANI must be
validated and performed in compliance with CLIA
regulations.
Quality in = Quality out: QA/QC built in to every step in the process
Optimum use of WGS data will require tighter integration
of diagnostic testing and surveillance activities
WGS-based identification is a CLIA workflow
Subtyping methods may be non-CLIA processes
All staff will be CLIA testing personnel?
Essential to work together to create seamless workflow
Minimize duplication of efforts
Interested in validating ANI in your
laboratory?
We hope to have a pilot ANI database available in 2017
Each laboratory will need to follow their own state and
local requirements for compliance with CLIA
regulations.
CDC can help!
Sample documents
Quality assurance tips
Strain sets
Questions?
Contact the National Enteric Reference and Outbreak Team
(NERO) in EDLB ([email protected])
For more information please contact Centers for Disease Control and Prevention
1600 Clifton Road NE, Atlanta, GA 30333
Telephone, 1-800-CDC-INFO (232-4636)/TTY: 1-888-232-6348
E-mail: [email protected] Web: www.cdc.gov
Thank you!
Patti Fields
Maryann Turnsek
The findings and conclusions in this report are those of the authors and do not necessarily
represent the official position of the Centers for Disease Control and Prevention.