the use of average nucleotide identity (ani) for bacterial

Division of Foodborne, Waterborne, and Environmental Diseases

National Center for Emerging and Zoonotic Infectious Diseases

The Use of Average Nucleotide Identity (ANI) for Bacterial Identification

Patti Fields

for

Maryann Turnsek

Enteric Diseases Laboratory Branch (EDLB)

CDC

2017 APHL Annual Meeting

Providence, Rhode Island

June 14, 2017

Vision for Public Health Microbiology in the WGS Era

REPLACE multiple microbiological workflows:

• Identification

• Serotyping

• Virulence profiling

• Antimicrobial susceptibility

• Subtyping

With ONE cost-efficient and precise method: WGS

All of this information can be derived from a genome sequence!

Wish List for a Bacterial Identification Method

Definitive, accurate identification

High specificity and sensitivity

Organism-agnostic

One method for all taxa

Computationally fast

Easy to perform and interpret

“Push button” solution

Works in BioNumerics framework

REFERENCE ID DATABASE

Perform a WGS-based test to identify target organisms:

Listeria monocytogenes

Campylobacter spp.

Escherichia coli

Salmonella enterica

Vibrio spp.

Yersinia enterocolitica

Cronobacter sakazakii

Proposed WGS-based Workflow

Bacterial Isolate → DNA, QC → WGS, QC → Assembly, QC → ID?

ID? Yes:

• Move to organism-specific database for subtyping

• Report ID to state

ID? No:

• Perform standard identification methods

• Perform additional characterization as needed

• Report ID to state

Goal: Consolidate Workflows in BioNumerics

Bacterial isolate → WGS data

Reference Identification Database

• Basic QC on raw reads

• De novo assembly

• ANI calculation

• Species ID

Organism-specific Database(s)

• Serotyping

• Virulence profile

• AMR profile

• wgMLST

• etc.

Average Nucleotide Identity (ANI)*

A computational method to compare two genomes

Compares an unknown query sequence to a well-characterized reference genome

Two calculations:

• Compares the genetic similarity of shared sequences

• Determines proportion of bases aligned

Closely mirrors comparisons by DNA-DNA hybridization

The traditional gold standard method for determining species boundaries

* Konstantinidis and Tiedje, 2005. Proc Natl Acad Sci USA 102(7):2567

Richter and Rossello-Mora, 2009. Proc Natl Acad Sci USA 106(45):19126
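The two calculations listed above can be written compactly. This is a simplified formulation for orientation only, not the exact procedure of either cited paper. The symbols are introduced here for illustration: $\ell_i$ is the length of the $i$-th aligned block, $m_i$ is the number of identical positions in that block, and $L_q$ is the total length of the query sequence.

```latex
\mathrm{ANI}\ (\%) = 100 \times \frac{\sum_i m_i}{\sum_i \ell_i}
\qquad
\text{bases aligned}\ (\%) = 100 \times \frac{\sum_i \ell_i}{L_q}
```

Sequence that has no counterpart in the other genome contributes to $L_q$ but not to $\sum_i \ell_i$, so it lowers the proportion of bases aligned without changing the identity value.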

How ANI Works

Aligns shared sequences and calculates percent identical nucleotides

[Figure: a Query and a Reference genome aligned in two shared blocks. Regions labeled MGE are marked X and are not aligned. Mismatched bases are shown in lowercase.]

Aligned block 1:

A C G T G T G C A C C A C G T A C T T A A C G T G T G C A C C A C G T

A C G T a T G C A C C A C G T A C T T A A C G T G T G C A C C A C G T

Aligned block 2:

A C G T G T G C A C C A C G T A C T T A

A C G T G T G C A C C A C G T t C T T A

Answers the question: Are these two genomes the same taxon? Yes or No

In this example, 53/55 aligned bases = 96.4% identity

The ANI “cutoff” value for % identity and % bases aligned is determined empirically for each taxon

Published values are on the order of 95% identity
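The 53/55 example above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the alignment has already been done and the blocks are gap-free; the function name `percent_identity` and the constant `SLIDE_BLOCKS` are made up for illustration, and real ANI tools also perform the alignment step, which is not shown here.

```python
def percent_identity(aligned_blocks):
    """Return (identical, aligned, percent) over gap-free aligned blocks.

    Each block is a pair of equal-length strings: one row from each genome.
    Comparison is case-insensitive because the slide marks mismatches in
    lowercase.
    """
    identical = 0
    aligned = 0
    for row_a, row_b in aligned_blocks:
        if len(row_a) != len(row_b):
            raise ValueError("aligned rows must have equal length")
        aligned += len(row_a)
        identical += sum(a == b for a, b in zip(row_a.upper(), row_b.upper()))
    if aligned == 0:
        raise ValueError("no aligned bases")
    return identical, aligned, 100.0 * identical / aligned


# The two aligned blocks transcribed from the slide (spaces removed).
SLIDE_BLOCKS = [
    ("ACGTGTGCACCACGTACTTAACGTGTGCACCACGT",
     "ACGTaTGCACCACGTACTTAACGTGTGCACCACGT"),
    ("ACGTGTGCACCACGTACTTA",
     "ACGTGTGCACCACGTtCTTA"),
]

identical, aligned, pct = percent_identity(SLIDE_BLOCKS)
print(f"{identical}/{aligned} aligned bases = {pct:.1f}% identity")
# prints: 53/55 aligned bases = 96.4% identity
```

Unaligned regions (the ones marked X on the slide) never enter this calculation; they only affect the separate "% bases aligned" value.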

How ANI Works (cont)

Query sequence for identification

Reference sequence library:

• Salmonella enterica

• Campylobacter coli

• Escherichia albertii

• Listeria monocytogenes

• Campylobacter jejuni

• Listeria innocua

• Escherichia coli

• Vibrio cholerae

Any values above cutoff for ID?

[Figure: seven of the eight reference comparisons are marked X (no match).]

Add sequence to Listeria database for further characterization: Serotyping, virulence genes, wgMLST, AST, etc. (depending on organism)
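The library lookup described above can be sketched as a filter over per-reference ANI results. Everything named here (`AniResult`, `identify`, `CUTOFFS`, `DEFAULT_CUTOFF`) is hypothetical. The 92% identity value for L. monocytogenes and the "order of 95%" default come from the slides; the 70% bases-aligned threshold and the example ANI values are invented placeholders, since the talk states that both thresholds are set empirically for each taxon.

```python
from typing import NamedTuple


class AniResult(NamedTuple):
    reference: str    # taxon name of the reference genome
    identity: float   # percent identical nucleotides over aligned bases
    aligned: float    # percent of query bases aligned


# (min % identity, min % bases aligned). The aligned values are placeholders.
DEFAULT_CUTOFF = (95.0, 70.0)
CUTOFFS = {"Listeria monocytogenes": (92.0, 70.0)}


def identify(results, cutoffs=CUTOFFS, default=DEFAULT_CUTOFF):
    """Return the best-matching taxon above its cutoff, or None if no match."""
    hits = []
    for r in results:
        min_identity, min_aligned = cutoffs.get(r.reference, default)
        if r.identity >= min_identity and r.aligned >= min_aligned:
            hits.append(r)
    if not hits:
        return None
    return max(hits, key=lambda r: r.identity).reference


# Invented comparison results for a single query genome.
results = [
    AniResult("Listeria monocytogenes", 98.7, 91.0),
    AniResult("Listeria innocua", 88.9, 75.0),
    AniResult("Escherichia coli", 0.0, 0.0),  # distantly related: no alignment
]
print(identify(results))
```

A return value of `None` corresponds to the "No" branch of the workflow, where standard identification methods are used instead.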

Pros and Cons for ANI

PROs

Replicates species determinations by DNA-DNA hybridization

Very rapid: Compare two genomes in seconds

Very robust: Reliable answer with 5X sequence coverage (based on down-sampling experiment)

Relatively easy to interpret; clear cutoff values

CONs

X Definitive identification requires that a representative genome be in the Reference Sequence Library

• New or unrepresented species cannot be identified

X Useful for comparing closely related bacteria only

• Distantly related => No Match

X As the reference library gets bigger, computation time gets longer

Validation of ANI

CLIA Requirements

Clinical Laboratory Improvement Amendments (CLIA) of 1988*

• Regulatory requirements for clinical testing of human specimens

• Requires the establishment of performance specifications to ensure the analytical validity of test results prior to patient testing

Compliance with regulatory requirements and quality standards can be challenging for NGS-based assays because traditional definitions for performance characteristics do not readily translate to DNA sequence methods and data.

*Code of Federal Regulations. 42 CFR Part 493. (1256)

ANI CLIA Validation Plan Framework

Based on NGS Standardization of Clinical Testing (Nex-StoCT) Working Group Recommendations*

Guidelines were developed for human genetic testing

Many of the recommendations are applicable to microbiology tests

[Figure labels: TEST DEVELOPMENT AND OPTIMIZATION; VALIDATION (platform, test, informatics); PATIENT TESTING; QA/QC; PT/AA]

*Gargis et al, 2012. Nat Biotechnol. 30:1033.

Test Development

Establish workflow and QC check points

Workflow steps: DNA Extraction → Library Preparation → Sequencing → Data Transfer → Assembly → ANI Calculation → Organism ID

QC check points: Concentration, Purity; Library Concentration; Read Quality; Coverage, Genome Size; ANI Criteria

A total of 30 documents (SOPs, data worksheets, QC logs, etc.) were developed.

1) Extract DNA from a bacterial isolate; perform DNA sequencing.

2) Assess quality of DNA sequence data

3) Perform ANI
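Step 2 above includes coverage and genome size as QC check points. The sketch below shows one common way to estimate depth of coverage (total sequenced bases divided by genome size) and to gate a sample before ANI is run. The function names, the read counts, and every threshold value are hypothetical placeholders; the slides do not state the acceptance values used.

```python
def estimated_coverage(total_read_bases: int, genome_size: int) -> float:
    """Average depth of coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


def passes_sequence_qc(coverage: float, assembly_size: int, expected_size: int,
                       min_coverage: float, size_tolerance: float) -> bool:
    """True if coverage is sufficient and assembly size is near the expected size.

    size_tolerance is a fraction of expected_size (0.1 means within 10%).
    """
    size_ok = abs(assembly_size - expected_size) <= size_tolerance * expected_size
    return coverage >= min_coverage and size_ok


# Hypothetical run: 1,000,000 reads of 150 bases against a 3,000,000-base genome.
coverage = estimated_coverage(1_000_000 * 150, 3_000_000)
ok = passes_sequence_qc(coverage, assembly_size=2_950_000,
                        expected_size=3_000_000,
                        min_coverage=20.0, size_tolerance=0.1)
print(f"coverage = {coverage:.0f}X, QC pass = {ok}")
```

Keeping the thresholds as parameters rather than constants fits the talk's point that acceptance criteria are defined per taxon during validation.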

Example: Listeria

Strain selection:

79 Listeria genomes

39 L. monocytogenes, including all four lineages

L. innocua, L. ivanovii, L. marthii, L. seeligeri, and L. welshimeri

Pairwise comparison of all 73 genomes

[Figure: tree of L. monocytogenes strains with node values of 100 and clades labeled Lineage I, Lineage II, Lineage III, and Lineage IV]

Strains shown (identifier, serotype):

L. monocytogenes 2011L-2626 1/2a

L. monocytogenes 2011L-2676 1/2a

L. monocytogenes SRR1039789 1/2a

L. monocytogenes 2013L-5369 1/2c

L. monocytogenes F4554 4b

L. monocytogenes 2014L-6447 4b

L. monocytogenes SRR1016925 4b

L. monocytogenes F2365 4b

L. monocytogenes G4599 1/2b

L. monocytogenes 2013L-5347 1/2b

L. monocytogenes F4233 1/2b

L. monocytogenes J0099 4c

L. monocytogenes 2014L-6694 4c

L. monocytogenes 2014L-6652 non-typeable

L. monocytogenes F6212 non-typeable

L. monocytogenes F6214 non-typeable

Establish “Cut off” Values for ANI

92% ANI is a clear cutoff for species in Listeria

[Figure: density plot of pairwise ANI values (x-axis: ANI, 85 to 100; y-axis: Density, 0 to 6)]

Within Lineage I: ~ 99%

Within Listeria monocytogenes: > 92%

Between Listeria species: < 91%
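A cutoff is only "clear" when the within-species and between-species ANI values do not overlap. The sketch below checks for that gap and proposes a value inside it. The function name `separating_cutoff`, the midpoint rule, and the pairwise values are illustrative assumptions; the slides report the resulting 92% figure for Listeria but do not describe how it was chosen from the distributions.

```python
def separating_cutoff(within, between):
    """Return the midpoint of the gap between two sets of ANI values.

    within:  ANI values for genome pairs from the same species
    between: ANI values for genome pairs from different species
    Returns None when the two sets overlap, i.e. no clean cutoff exists.
    """
    lowest_within = min(within)
    highest_between = max(between)
    if lowest_within <= highest_between:
        return None
    return (lowest_within + highest_between) / 2


# Invented pairwise ANI values, for illustration only.
within_species = [99.2, 98.8, 94.5, 93.0]
between_species = [90.0, 89.1, 87.4, 86.2]

print(separating_cutoff(within_species, between_species))
```

With overlapping inputs the function returns `None`, which would signal that identity alone cannot separate the taxa and another criterion is needed.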

Comparisons of Other Listeria species

[Figure: ANI (y-axis, 85 to 100) by comparison, grouped as within species and between species]

Comparisons shown:

• Within L. innocua

• Between L. innocua and Lm

• Between L. innocua and Non-Lm

• Within L. ivanovii

• Between L. ivanovii and Lm

• Between L. ivanovii and Non-Lm

• Within L. marthii

• Between L. marthii and Lm

• Between L. marthii and Non-Lm

• Within L. seeligeri

• Between L. seeligeri and Lm

• Between L. seeligeri and Non-Lm

• Within L. welshimeri

• Between L. welshimeri and Lm

• Between L. welshimeri and Non-Lm

Validation

Initial focus: L. monocytogenes, Escherichia, Campylobacter

Based on public health priority (plus manageable numbers)

Finalize reference sequence library

Finalize thresholds for % identity and % bases aligned

Select strain set for validation

Targeted species, near neighbors, taxonomic diversity

325 strains total

Standard parameters for validation

Establish comparative method

Define acceptance criteria (depth of coverage, % bases aligned, accuracy, sensitivity, specificity, etc.)

Identify equipment that will be used

Designate trained staff to perform validation testing

Taxon-specific Acceptance Criteria

15 species validated
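Accuracy, sensitivity, and specificity in the acceptance criteria above are computed by comparing ANI calls against the comparative method for each target taxon. The sketch below uses the standard definitions. The function name `performance` and the counts are hypothetical and are not results from this validation.

```python
def performance(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard performance measures from a 2x2 comparison.

    tp: target strains called as the target by ANI
    fn: target strains not called as the target
    tn: non-target strains correctly not called as the target
    fp: non-target strains called as the target
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one target and one non-target strain")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Invented counts for one taxon: 50 target strains and 50 near neighbors.
for name, value in performance(tp=48, fp=1, tn=49, fn=2).items():
    print(f"{name}: {value:.2%}")
```

Including near neighbors in the non-target set, as the strain selection above does, is what makes the specificity estimate meaningful.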

Development of additional analytical tools

Validation easy(er) for subsequent tools

Same basic workflow through sequence QC

[Figure: the Test Development workflow and QC check points (DNA Extraction through Assembly), ending in ANI Calculation → Organism ID with ANI Criteria as the final check point]

Development of additional analytical tools

Easy(er) for subsequent tools

Same basic workflow through sequence QC

[Figure: the same workflow and QC check points, ending in rpoB Calculation → Organism ID with rpoB Criteria as the final check point]

Develop and validate each additional tool

Identification algorithms based on rpoB

Serotype prediction

Virulence and AMR profiles

etc.

Future Plans for WGS-based Methods

ANI for additional taxa

Salmonella, Vibrio, Cronobacter, Yersinia

Additional analytical tools

rpoB-based identification

Serotype prediction

• SeqSero for Salmonella*

• SerotypeFinder for E. coli#

VirulenceFinder for E. coli#

ResFinder for AMR profiles#

Perform reference testing on a submitted genome sequence rather than a bacterial isolate?

* Adapted from Zhang et al, 2015. J Clin Microbiol 53:1685

# Adapted from CGE: http://www.genomicepidemiology.org/

Some Key Points

We are still learning; threshold values are not set in stone.

Just like any other identification test, ANI must be validated and performed in compliance with CLIA regulations.

Quality in = Quality out: QA/QC built in to every step in the process

Optimum use of WGS data will require tighter integration of diagnostic testing and surveillance activities

WGS-based identification is a CLIA workflow

Subtyping methods may be non-CLIA processes

All staff will be CLIA testing personnel?

Essential to work together to create seamless workflow

Minimize duplication of efforts

Interested in validating ANI in your laboratory?

We hope to have a pilot ANI database available in 2017

Each laboratory will need to follow their own state and local requirements for compliance with CLIA regulations.

CDC can help!

Sample documents

Quality assurance tips

Strain sets

Questions?

Contact the National Enteric Reference and Outbreak Team

(NERO) in EDLB ([email protected])


For more information please contact Centers for Disease Control and Prevention

1600 Clifton Road NE, Atlanta, GA 30333

Telephone, 1-800-CDC-INFO (232-4636)/TTY: 1-888-232-6348

E-mail: [email protected] Web: www.cdc.gov

Thank you !

Patti Fields

[email protected]

Maryann Turnsek

[email protected]

National Center for Emerging and Zoonotic Infectious Diseases

Division of Foodborne, Waterborne, and Environmental Diseases

The findings and conclusions in this report are those of the authors and do not necessarily

represent the official position of the Centers for Disease Control and Prevention.


