The Genomics Behind the Amerithrax Investigation
Claire M. Fraser-Liggett, Jacques Ravel, David A. Rasko
Institute for Genome Sciences
Department of Microbiology and Immunology
University of Maryland School of Medicine
Baltimore, MD 21201
The Science behind the Amerithrax
Obtained B. anthracis spore preparations (powder) from the letters and other sources (buildings, victims...)
Goal: establish the origin of the material, or linkages between events, people, places and processes
Finding unique features using traditional DNA-based analysis
New York Post letter spore preparation
Leahy letter spore preparation
The Amerithrax investigation
Traditional genotyping methods did not achieve high discriminative power
Would the genome sequence of these B. anthracis isolates differ, and could this information be used for attribution?
All isolates were identified as B. anthracis Ames (strain)
The Amerithrax investigation
Obtained DNA from B. anthracis recovered from the spinal fluid of Mr. Stevens
Sequenced the genome of B. anthracis Florida
Need to compare the genome sequence to that of an appropriate reference
Porton Down strain was not an appropriate reference
Sequenced the genome of B. anthracis Ames Ancestor
B. anthracis Ames Ancestor: the reference
fully virulent (pXO1, pXO2)
isolated from a dead Beefmaster heifer in Texas in 1981
used worldwide for vaccine challenge studies
sequenced and closed (AE017334)
chromosome: 5,227,419 bp
pXO1: 181,677 bp
pXO2: 94,830 bp
SNP analysis in B. anthracis genomes
>ID642-16 (94989bp) 200731 c-->t 50878 COV: 16 QUAL: 31.5
NON-SYNONYMOUS (2): GGC - G ---> GAC - D (from start: 5 bp/848 bp)
AMES  : GAAGAAGCGCTAATAAAATGCCCATTACAAGACTCCCTTCG 200751
        |||||||||||||||||||| ||||||||||||||||||||
QUERY : GAAGAAGCGCTAATAAAATGTCCATTACAAGACTCCCTTCG 50898
ORF01519 transporter putative
 50873 A 564 AAAAAAAAAAAAAAAA 35:36:35:35:36:36:36:36:35:27:36:36:36:36:37:36
 50874 A 569 AAAAAAAAAAAAAAAA 34:36:35:35:36:36:36:36:35:35:36:36:36:36:35:36
 50875 A 550 AAAAAAAAAAAAAAAA 18:36:35:35:36:36:36:36:34:36:36:35:36:34:35:36
 50876 T 545 TTTTTTTTTTTTTTTT 13:36:36:35:36:36:36:34:35:38:35:34:36:34:37:34
 50877 G 517 GGGGGGGGGGGGGGGG 17:35:35:35:34:37:36:36:32:26:34:34:33:36:22:35
*50878 T 504 TTTTTTTTTTTTTTTT 11:36:35:33:34:36:31:34:35:19:31:35:31:36:33:34
 50879 C 559 CCCCCCCCCCCCCCCC 33:34:35:35:36:36:34:35:32:35:36:35:36:36:35:36
 50880 C 515 CCCCCCCCCCCCCCC 34:35:35:36:34:36:36:29:34:36:35:35:36:35:29
 50881 A 508 AAAAAAAAAAAAAAA 36:34:34:36:40:34:36:32:26:36:34:36:36:32:26
 50882 T 513 TTTTTTTTTTTTTTT 36:35:35:36:35:36:36:36:25:36:37:35:36:29:30
 50883 T 498 TTTTTTTTTTTTTTT 36:35:34:36:33:36:36:35:15:36:35:32:36:33:30
REFERENCE: B. anthracis Ames Ancestor
 200726 A 334 AAAAAAAAAA 29:29:37:31:51:22:45:40:35:15
 200727 A 362 AAAAAAAAAA 25:41:37:34:51:27:45:40:34:28
 200728 A 347 AAAAAAAAAA 29:41:37:34:45:27:45:40:34:15
 200729 T 341 TTTTTTTTTT 25:41:45:33:45:21:45:25:34:27
 200730 G 336 GGGGGGGGGG 33:41:38:33:45:16:45:24:34:27
*200731 C 334 CCCCCCCCCC 33:37:40:37:45:16:18:45:35:28
 200732 C 318 CCCCCCCCCC 33:37:40:38:45:18:4:45:35:23
 200733 C 284 CCCCCCCCC 33:37:37:37:41:15:4:45:35
 200734 A 307 AAAAAAAAA 27:45:37:37:51:21:4:51:34
 200735 T 291 TTTTTTTTT 24:45:37:37:41:24:4:45:34
 200736 T 293 TTTTTTATT 24:45:45:37:41:24:4:45:35
COV: 10 QUAL: 33.4
Access to genome coverage, sequence quality scores, and sequence in a single file
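As an illustration of how coverage, sequence and quality can be read from one such record, here is a minimal Python sketch. The column meanings (position, consensus base, summed quality, per-read bases, per-read quality values) are an assumption inferred from the numbers on the slide, where the third column equals the sum of the quality values and QUAL equals their mean. The names `PileupColumn` and `parse_column` are hypothetical and do not come from the original pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PileupColumn:
    position: int          # coordinate on the contig or reference
    consensus: str         # consensus base call at this position
    quality_sum: int       # third column: appears to be the summed quality
    bases: str             # one character per read covering the position
    qualities: List[int]   # one quality value per read

    @property
    def coverage(self) -> int:
        # Number of reads covering this position
        return len(self.bases)

    @property
    def mean_quality(self) -> float:
        # Mean of the per-read quality values
        return sum(self.qualities) / len(self.qualities)


def parse_column(line: str) -> PileupColumn:
    """Parse one line of the slide's listing; a leading '*' marks the SNP."""
    position, consensus, quality_sum, bases, quals = line.strip().lstrip("*").split()
    return PileupColumn(
        position=int(position),
        consensus=consensus,
        quality_sum=int(quality_sum),
        bases=bases,
        qualities=[int(q) for q in quals.split(":")],
    )


snp = parse_column(
    "*50878 T 504 TTTTTTTTTTTTTTTT 11:36:35:33:34:36:31:34:35:19:31:35:31:36:33:34"
)
print(snp.coverage, snp.quality_sum, snp.mean_quality)  # 16 504 31.5
```

The printed values match the COV: 16 and QUAL: 31.5 shown in the record header for this position.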
The Amerithrax investigation
Surprisingly, no differences were found between the Florida isolate and Ames Ancestor sequences
No differences in over 5 million base pairs: the two genome sequences were identical
What did it mean? Could genomics still help in making a match?
Morphological variants analyzed
Origin           Type        Sequencing
Leahy Letter     Wild Type   12X
Leahy Letter     A           Closed
Leahy Letter     B           12X
Leahy Letter     C           12X
Leahy Letter     E/Opaque    12X
NY Post Letter   Wild Type   Closed
NY Post Letter   A           12X
NY Post Letter   B           Closed
Strategy: discover genetic differences by comparing the genome sequences to Ames Ancestor.
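The comparison strategy can be sketched in Python on a small scale. The sketch below assumes the two sequences are already aligned without gaps and uses the 41-base AMES/QUERY excerpt from the SNP analysis slide as input. `find_snps` is a hypothetical helper, not the tool used in the investigation, and the start coordinate is derived from the end coordinate (200751) printed next to the AMES line.

```python
def find_snps(ref, query, ref_start):
    """Return (reference_coordinate, ref_base, query_base) for each mismatch.

    Both sequences must already be aligned and of equal length (no gaps).
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        (ref_start + i, r, q)
        for i, (r, q) in enumerate(zip(ref, query))
        if r != q
    ]


AMES = "GAAGAAGCGCTAATAAAATGCCCATTACAAGACTCCCTTCG"
QUERY = "GAAGAAGCGCTAATAAAATGTCCATTACAAGACTCCCTTCG"

# The slide labels the last AMES base as 200751, so the excerpt starts at 200711.
REF_START = 200751 - len(AMES) + 1

print(find_snps(AMES, QUERY, REF_START))  # [(200731, 'C', 'T')]
```

A whole-genome comparison additionally needs an aligner and per-base quality filtering; this only shows the final step of reporting positions where two aligned sequences disagree.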
Wild Type isolates from letters
When the closed genome of B. anthracis Ames Ancestor was compared to the closed genome of a wild type isolate of the NY Post letter and the draft genome of a wild type isolate of the Leahy Letter, no polymorphisms were found.
The genome sequences were 100% identical at each of the 5,227,419 bp compared.
Accessing differences through a combination of microbiological analyses and whole genome sequencing
Morphological variants isolation
Visual differential tests
Colony morphology, color, sporulation efficiency for each spore sample
Morphological variants characteristics
All four morphotypes were found in the spore preparations recovered from each of the four letters
This was a key step in the investigation, as it linked the two events in New York and Washington DC
Physical examination of the spore preparations had originally shown a difference in quality between the New York and Washington DC letters
Morphological variants isolation
Could the population composition be the ID that can be used to make a match?
Genomics
Morphological variants A
Figure labels: PCR assay; Leahy Letter A; Ames Ancestor
Figure labels: NY Post Letter A; Leahy Letter A; Daschle Letter A; Ames Ancestor; ORF1572: conserved hypothetical protein; ORF1573: polysaccharide deacetylase
These three A morphotypes are morphologically identical but genetically different. All three are present in the letters.
Morphological variants B - NY Post Letter
>ID161revcom-1 (295550bp) 5065092 T-->C 8836 COV: 9 QUAL: 43
AMES : CCAAATTCCTTCCTATTCTTTCTTATTTTACCGTTTCCTAT 5065112
       |||||||||||||||||||| ||||||||||||||||||||
B    : CCAAATTCCTTCCTATTCTTCCTTATTTTACCGTTTCCTAT 8856
Intergenic
 286720 T 373 TTTTTTTTT 47:40:47:38:36:36:47:40:42
 286719 T 375 TTTTTTTTT 47:40:47:38:36:36:47:40:44
 286718 C 388 CCCCCCCCC 47:45:47:38:36:36:45:47:47
 286717 T 360 TTTTTTTTT 47:40:47:34:36:24:44:47:41
 286716 T 387 TTTTTTTTT 47:40:47:38:36:36:49:47:47
*286715 C 387 CCCCCCCCC 41:44:47:38:36:36:49:49:47
 286714 C 372 CCCCCCCCC 41:40:47:38:36:36:47:40:47
 286713 T 365 TTTTTTTTT 40:40:47:32:36:36:47:40:47
 286712 T 367 TTTTTTTTT 40:40:47:34:36:36:47:40:47
 286711 A 380 AAAAAAAAA 44:45:47:38:36:36:47:40:47
 286710 T 368 TTTTTTTTT 44:40:44:34:36:38:45:40:47
REFERENCE: B. anthracis Ames Ancestor
 5065087 T 595 TTTTTTTTTTTTTTTTT 45:40:45:41:24:51:34:27:33:37:34:4:51:36:36:36:21
 5065088 T 598 TTTTTTTTTTTTTTTTT 45:40:45:41:28:51:33:31:33:37:33:4:51:36:36:36:18
 5065089 C 639 CCCCCCCCCCCCCCCCC 51:49:49:41:31:51:34:33:34:37:38:15:51:36:36:36:17
 5065090 T 668 TTTTTTTTTTTTTTTTT 51:49:49:51:33:51:36:31:34:37:38:15:51:36:36:36:34
 5065091 T 673 TTTTTTTTTTTTTTTTT 51:49:49:51:34:51:38:33:40:37:45:4:51:36:36:35:33
*5065092 T 654 TTTTTTTTTTTTTTTTT 51:45:45:51:34:51:33:36:40:37:45:4:51:36:36:35:24
 5065093 C 628 CCCCCCCCCCCCCCCCC 45:36:45:51:31:51:33:36:37:37:40:15:45:36:35:35:20
 5065094 T 614 TTTTTTTTTTTTTTTTT 45:36:38:51:30:51:31:36:36:37:40:4:45:34:36:36:28
 5065095 T 593 TTTTTTTTTTTTTTTT- 45:36:45:45:31:51:31:38:36:37:34:15:45:39:36:29:0
 5065096 A 607 AAAAAAAAAAAAAAAAA 45:36:45:45:31:51:28:38:33:37:34:18:41:39:34:34:18
 5065097 T 642 TTTTTTTTTTTTTTTTTT 45:34:45:45:31:51:28:38:33:37:34:19:41:28:34:36:35:28
COV: 17 QUAL: 38.47
SNP found in each variant B analyzed
Morphological variants B
Figure labels: KinB, spo0F
Morphological variants C - Leahy Letter
>ID25-2 (934628bp) 2139122 g-->a 47177 COV: 9 CB_QVal: 352 QUAL: 39.11
NON-SYNONYMOUS (2): TGG - W ---> TAG - _ (from start: 685 bp/1130 bp)
AMES : CGTATCGAAAAAGGAAATGTGGAATGAATCAGAAAGTTTTT 2139142
C    : CGTATCGAAAAAGGAAATGTAGAATGAATCAGAAAGTTTTT 47197
ORF05728 sensor histidine kinase
 47172 A 369 AAAAAAAAA 47:47:47:47:44:47:19:36:35
 47173 A 368 AAAAAAAAA 47:47:47:41:47:44:19:41:35
 47174 T 373 TTTTTTTTT 47:47:47:36:47:47:21:47:34
 47175 G 361 GGGGGGGGG 41:36:47:47:47:47:22:44:30
 47176 T 358 TTTTTTTTT 47:41:47:47:44:47:17:38:30
*47177 A 352 AAAAAAAAA 47:41:49:41:36:47:27:38:26
 47178 G 359 GGGGGGGGG 47:36:41:36:47:47:14:47:44
 47179 A 351 AAAAAAAAA 36:47:47:44:41:36:20:47:33
 47180 A 371 AAAAAAAAA 47:47:47:47:36:47:20:47:33
 47181 T 365 TTTTTTTTT 36:47:47:47:41:47:20:47:33
 47182 G 364 GGGGGGGGG 47:47:41:36:41:47:23:47:35
REFERENCE: B. anthracis Ames Porton
 2139117 A 319 AAAAAAAAAA 4:36:37:36:36:36:31:34:36:33
 2139118 A 299 AAAAAAAAAA 4:36:37:36:36:36:28:29:36:21
 2139119 T 290 TTTTTTTTTT 4:34:36:36:36:28:28:24:36:28
 2139120 G 308 GGGGGGGGGG 18:34:36:36:36:27:28:26:36:31
 2139121 T 291 TTTTTTTTTT 4:34:36:35:36:29:28:22:36:31
*2139122 G 311 GGGGGGGGGG 4:34:36:36:36:34:28:33:36:34
 2139123 G 309 GGGGGGGGGG 4:35:37:36:36:36:20:34:36:35
 2139124 A 309 AAAAAAAAAA 4:34:31:36:36:36:32:29:36:35
 2139125 A 295 AAAAAAAAAA 4:34:31:36:36:36:34:17:34:33
 2139126 T 317 TTTTTTTTTT 15:35:29:34:36:34:36:23:40:35
 2139127 G 313 GGGGGGGGGG 4:35:34:34:36:36:33:32:35:34
COV: 10 CBQUAL: 311 QUAL: 31.1
SNP results in an amber stop codon and a truncated protein
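To show why the g-->a change truncates the protein, the following Python sketch translates the reference and variant codons with the standard genetic code and classifies the change. The table is built from the conventional TCAG ordering. `classify_codon_change` is a hypothetical helper written for this illustration, not part of the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Return (ref_aa, alt_aa, kind); '*' denotes a stop codon."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"   # premature stop, so the protein is truncated
    else:
        kind = "missense"
    return ref_aa, alt_aa, kind


# Morphotype C: TGG (W) becomes TAG, the amber stop codon.
print(classify_codon_change("TGG", "TAG"))  # ('W', '*', 'nonsense')
```

The same helper reports the GGC to GAC change from the earlier SNP record as a missense (G to D) substitution.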
Morphological variants C/D - Leahy Letter
Wild Type      1 MEMEGMEVFPIDKDIKEVFCSHLKNNRHQFVENWKNKMIISDKDPFRLEV 50
Morphotype D   1 MEMEGMEVFPIDKDIKEVFCSHLKNNRHQFVENWKNKMIISDKDPFRLEV 50

Wild Type     51 VQNGEDLLEFIIELIMEEKDINYLQPLCEKIAIERAGADANIGDFVYNAN 100
Morphotype D  51 VQNGEDLLEFIIELIMEEKDINYLQPLCEKIAIERAGADANIGDFVYNAN 100

Wild Type    101 VGRNELFEAMCELDVSARELKPIMNQIHTCFDKLIYYTVLKYSEIISRNL 150
Morphotype D 101 VGRNELFEAMCELDVSARELKPIMNQIHTCFDKLIYYTVLKYSEIISRNL 150

Wild Type    151 EEKQQYINETHKERLTILGQMSASFVHEFRNPLTSIMGFVKLLKADHPSL 200
Morphotype D 151 EEKQQYINETHKERLTILGQMSASFVHEFRN------------------- 181

Wild Type    201 SYLDIISHELDQLNFRISQFLFVSKKEMWNESESFWLNDLFQDIIQFLYP 250
Morphotype D 182 -------------------------------------------------- 182

Wild Type    251 SLVNANVSIEKNLPYPIPLTGYRSEVRQVFLNILMNSIDALESMKEERKI 300
Morphotype D 182 -----------------PLTGYRSEVRQVFLNILMNSIDALESMKEERKI 214

Wild Type    301 IIDVFEEDQSIRIVIKNNGPMIPAENVETIFEPFVTTKKLGTGIGLFVCK 350
Morphotype D 215 IIDVFEEDQSIRIVIKNNGPMIPAENVETIFEPFVTTKKLGTGIGLFVCK 264

Wild Type    351 QIVEKHNGSIMCRSDDDWTEFQIAFQK* 378
Morphotype D 265 QIVEKHNGSIMCRSDDDWTEFQIAFQK* 292
Morphological variants E - Leahy Letter
INDEL-REF-1:172510 21 bp ID3 (12311 bp) INS on REF
GBAAA0205 response regulator putative
R: ATCAATATATGCTTGATAGTTTAAGTATTGGAAAAGATAGTTTTGATAAAGTAGATTCACTG
E: ATCAATATATGCTTGATAGTT---------------------TTGATAAAGTAGATTCACTG
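A short Python sketch can confirm the size of this indel from the two sequences on the slide and check whether it preserves the reading frame. It assumes the variant differs from the reference by a single contiguous deletion. `describe_deletion` is a hypothetical helper, and because the flanking bases repeat, the reported offset is only one of several equivalent placements of the gap.

```python
def describe_deletion(ref, variant):
    """Describe a single contiguous deletion of `variant` relative to `ref`."""
    size = len(ref) - len(variant)
    if size <= 0:
        raise ValueError("variant is not shorter than the reference")

    # Length of the shared prefix: the right-most position the gap can start at.
    prefix = 0
    while prefix < len(variant) and ref[prefix] == variant[prefix]:
        prefix += 1

    # Verify that removing `size` bases at that offset reproduces the variant.
    if ref[:prefix] + ref[prefix + size:] != variant:
        raise ValueError("not a single contiguous deletion")

    return {"offset": prefix, "size_bp": size, "in_frame": size % 3 == 0}


REF = "ATCAATATATGCTTGATAGTTTAAGTATTGGAAAAGATAGTTTTGATAAAGTAGATTCACTG"
# Morphotype E sequence from the slide with the gap characters removed
VARIANT_E = "ATCAATATATGCTTGATAGTT" + "TTGATAAAGTAGATTCACTG"

result = describe_deletion(REF, VARIANT_E)
print(result["size_bp"], result["in_frame"])  # 21 True
```

A 21 bp in-frame deletion removes seven codons while leaving the downstream reading frame intact.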
Morphological variants E - Leahy Letter
GBAAA0205   1 MIVSVKGNEQITKMLNDWYIEIRARHVGKAHNLKLEIDQKIHNIEEDQNL 50
LL18        1 MIVSVKGNEQITKMLNDWYIEIRARHVGKAHNLKLEIDQKIHNIEEDQNL 50
LL19        1 MIVSVKGNEQITKMLNDWYIEIRARHVGKAHNLKLEIDQKIHNIEEDQNL 50
              **************************************************
GBAAA0205  51 LLYYALLDFRHQYMLDSLSIGKDSFDKVDSLGVPADQFLQYYYHFFKAIH 100
LL18       51 LLYYALLDFRHQYMLD-------SFDKVDSLGVPADQFLQYYYHFFKAIH 93
LL19       51 LLYYALLDFRHQYMLDS---LKDSFDKVDSLGVPADQFLQYYYHFFKAIH 97
              ****************       ***************************
GBAAA0205 101 SNITGDFTSAKEHYNQAELLLKHIPDEIEHAEFRFKLSTFHYHIYKPLAA 150
LL18       94 SNITGDFTSAKEHYNQAELLLKHIPDEIEHAEFRFKLSTFHYHIYKPLAA 143
LL19       98 SNITGDFTSAKEHYNQAELLLKHIPDEIEHAEFRFKLSTFHYHIYKPLAA 147
              **************************************************
GBAAA0205 151 IKEATKAKDIFKKHAGYETNIGLCDNLIGLACTHLKQFEEAEEHFITAIN 200
LL18      144 IKEATKAKDIFKKHAGYETNIGLCDNLIGLACTHLKQFEEAEEHFITAIN 193
LL19      148 IKEATKAKDIFKKHAGYETNIGLCDNLIGLACTHLKQFEEAEEHFITAIN 197
              **************************************************
GBAAA0205 201 TFKKSGKEKNITFVRHNLGLMYSGQNLSELAIRYLSEVTQELPKDYKAIF 250
LL18      194 TFKKSGKEKNITFVRHNLGLMYSGQNLSELAIRYLSEVTQELPKDYKAIF 243
LL19      198 TFKKSGKEKNITFVRHNLGLMYSGQNLSELAIRYLSEVTQELPKDYKAIF 247
              **************************************************
GBAAA0205 251 IKAREHMKIGESKETYNLIVKGLEICKELKNEEYEHHFLILEKLNQKVSA 300
LL18      244 IKAREHMKIGESKETYNLIVKGLEICKELKNEEYEHHFLILEKLNQKVSA 293
LL19      248 IKAREHMKIGESKETYNLIVKGLEICKELKNEEYEHHFLILEKLNQKVSA 297
              **************************************************
GBAAA0205 301 DELEKTIKTGISYFKRENLHEYVQEYAKKLAVLFHQENNRSKASDYFYLS 350
LL18      294 DELEKTIKTGISYFKRENLHEYVQEYAKKLAVLFHQENNRSKASDYFYLS 343
LL19      298 DELEKTIKTGISYFKRENLHEYVQEYAKKLAVLFHQENNRSKASDYFYLS 347
              **************************************************
GBAAA0205 351 HQAEEQNFEKEALK* 365
LL18      344 HQAEEQNFEKEALK* 358
LL19      348 HQAEEQNFEKEALK* 362
              ***************
Morphotypes/genotypes
Variant   Mutation          Locus                                             Function
A         Duplication       rRNA-D, polysaccharide deacetylase and GBAA0151   Unknown
B         SNP               Upstream of spo0F                                 Initiation of sporulation
C/D       SNP/INDEL         Sensor histidine kinase                           Phosphorylation of Spo0F/Spo0A
E         INDEL (9/21 bp)   pXO1 response regulator                           Dephosphorylation of Spo0F
Making the match
Develop and validate quantitative PCR assays for each of the genotypes (genetic variations)
Screen a repository of nearly 1,100 samples
All four mutations were found together in only one source: RMR-1029 (USAMRIID)
No other sample had three hits; a few had one or two hits
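The screening logic described above can be sketched in Python: tally the positive assays for each repository sample and keep only the samples positive for all four. Everything in the sketch is illustrative. The marker names, sample identifiers and results are invented placeholders, not data from the investigation, and `hit_count` and `full_matches` are hypothetical helpers.

```python
# Placeholder names for the four genotype-specific assays (assumed labels).
MARKERS = ("marker_1", "marker_2", "marker_3", "marker_4")


def hit_count(result):
    """Number of assays that came back positive for one sample."""
    return sum(1 for marker in MARKERS if result.get(marker, False))


def full_matches(repository):
    """Sample IDs that are positive for every marker, in sorted order."""
    return sorted(
        sample_id
        for sample_id, result in repository.items()
        if hit_count(result) == len(MARKERS)
    )


# Invented results for three hypothetical repository samples.
repository = {
    "sample_001": {"marker_1": False, "marker_2": False, "marker_3": False, "marker_4": False},
    "sample_002": {"marker_1": True, "marker_2": False, "marker_3": True, "marker_4": False},
    "sample_003": {"marker_1": True, "marker_2": True, "marker_3": True, "marker_4": True},
}

for sample_id, result in sorted(repository.items()):
    print(sample_id, hit_count(result))

print(full_matches(repository))  # ['sample_003']
```

Requiring all markers at once is what makes the signature discriminating: a sample carrying one or two variants is treated as a partial hit, not as a match.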
RMR-1029
Population genetics - the unique signature
Science Magazine - August 2008
Population genetics - the unique signature
Population genetics - a minor subpopulation is unique to the spore preparation recovered from the letters.
These polymorphisms could be used to screen spore preparations, not single B. anthracis colonies on a plate.
Link the NY and Washington DC letters.
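One way to see why screening the bulk spore preparation matters more than testing single colonies is a simple sampling calculation. The sketch below assumes each colony is an independent draw from the population and that a variant makes up a fixed fraction of it. The fractions used are illustrative only and are not measurements from the investigation. `detection_probability` is a hypothetical helper.

```python
def detection_probability(variant_fraction, colonies_screened):
    """Chance that at least one of the screened colonies carries the variant.

    Assumes independent draws: P = 1 - (1 - f) ** n.
    """
    if not 0.0 <= variant_fraction <= 1.0:
        raise ValueError("variant_fraction must be between 0 and 1")
    return 1.0 - (1.0 - variant_fraction) ** colonies_screened


# Illustrative numbers only: a variant present in 1% of the population.
for n in (1, 10, 100, 1000):
    print(n, round(detection_probability(0.01, n), 3))
```

Under these assumptions a single colony almost never reveals a rare variant, whereas an assay run on DNA from the whole preparation samples a very large number of cells at once.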
Timeline
Crime committed: Fall 2001
Variants identified: 2001-2002
Variants genetically characterized: 2002-2003
Assay developed and validated: 2003-2006
All repository collections and analysis completed: mid-2007
Preparation for indictment: 2008
Unfortunate and tragic death of the suspect: 2008
Making the match and attribution
The scientific investigation was key in identifying the source material
More traditional police investigation was required to narrow down the list of potential perpetrators
Microbial forensics: a new field (attribution/exclusion) with different scientific standards
Acknowledgments
IGS/TIGR: David Rasko, Lingxia Jiang, Regina Cer, Steven Salzberg, Claire Fraser-Liggett
FBI: Jason Bannan, Mark Wilson, Richard Langham, Scott Stanley, Scott Decker, Matt Feinberg
USAMRIID: Patricia Worsham
NIH/NIAID: Maria Giovanni
NSF: Rita Colwell, Maryanna Henckart
NAU: Paul Keim
Funding: National Institute of Allergy and Infectious Diseases (N01-AI15447); Federal Bureau of Investigation (J-FBI-02-016); National Science Foundation