Perspectives on the Future of Forensic Genetics
Bruce Budowle
Institute of Applied Genetics
Department of Molecular and Medical Genetics,
University of North Texas Health Science Center
Fort Worth, Texas USA
The Motivation
Goals
• Identification
• Associations
• Exclusions
• Investigative leads
• Databases
• Demands
• High volume
• Lead to backlogs
• High throughput sample processing
• Special attention situations
• Expedites
• Technology developments and enhancements to meet demands
• Redefining Forensic Genetics
• Sample Collection
• Low Copy Number Typing
• Statistical Analyses
• For LCN and Mixture
• Rapid DNA Typing
• Next Generation Sequencing***
• Novel Investigative Leads
Prospects for the Future
• No field has embraced molecular biology as a diagnostic tool more so than forensics
• The “Gold Standard” of the forensic science disciplines
• Such accolades, while somewhat deserved, can be detrimental to quality and performance
• Risk of becoming complacent
• Need to question, critique, and build
• Still much to improve upon
Forensic Genetics
• The application of genetics for the resolution of legal cases (Jobling and Gill 2004)
• The application of genetics to human and nonhuman material (in the sense of a science with the purpose of studying inherited characteristics for the analysis of inter- and intraspecific variations in populations) for the resolution of legal conflicts (FSI Genetics 2007)
• Commonly used descriptions that reflect typical applications: support in the resolution of established cases (suspect/victim…)
Definition of Forensic Genetics
• Many applications do not begin with a courtroom-type scenario
• Investigative leads
• Database searches
• Familial searches
• Phenotype
• Ancestry
• Cause and manner of death
• etc.
Expand Definition
Crime Scene Investigation
• Recovery of Sample
• Swab
• Adsorption/absorption
• Efficiency of sample recovery from scene
• Release from swab matrix
• DNA yield from swab
• These two desirable features must be balanced
• Opposing properties
Diversity of Swabs
Fab Swab
GE EasiCollect
Bode Buccal Collector FTA Paper
Hydraflock Swab with breakoff tab
Cotton-Tip Swab
• Omni Swab
Diomics Swab Material
• Features that suggest this material may perform better than current swabs
• Polymer
• Highly absorptive
• Potentially can dissolve swab
• “Better” is defined as greater DNA yield and higher-quality DNA typing results
DNA Yield - Saliva
X Swab vs Copan Swab
Initial Low-Copy Number Work
• Early work on “touch samples”:
• van Oorschot, R. A. and Jones, M. K. (1997) Nature. 387(6635): 767
• Findlay, I., et al. (1997) Nature. 389(6651): 555-556
• Application to routine limited quantity casework:
• Gill, P., et al. (2000) Forensic Sci. Int. 112(1): 17-40
• Whitaker, J. P., et al. (2001) Forensic Sci. Int. 123(2-3): 215-223
• Gill, P. (2001) Croatian Medical Journal 42(3): 229-32
• Note that Touch Samples do not necessarily equate to LCN samples
Comparison of STR Results with Different DNA Amounts
1 ng: standard result
33 pg LT-DNA (2 reps): allele drop-in, allele drop-out, locus drop-out, increased stutter (43%), heterozygote peak imbalance (57%)
Hypothesis Driven Methodologies
• The limitation of the “gold standard” is most evident in mixture interpretation
• Substantial subjectivity
• But a good sign is the substantial discussion
• The real strength of the field
• Present and future issues will be in hypothesis formulation, interpretation of results, documentation and communication
• mtDNA led the way
• Education
Models
From J Bright Presentation 2013
Allele Drop-out and Drop-in Rates
• D: the probability of drop-out of one allele of a heterozygote
• Depends on locus and DNA quantity; values from 0.0 to 0.66 have been reported
• Can be as high as 100% in a specific case
• D2: the homozygote drop-out probability
• $D_2 \approx \tfrac{1}{2} D^2$
• C: drop-in probability
• Some include both stutter and contamination together
• Complements: $\bar{D} = 1 - D$ and $\bar{D}_2 = 1 - D_2$
Balding. Interpreting low template DNA profiles. Forensic Sci Int Genet. 2009 Dec;4(1):1-10.
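The relationships among the drop-out terms on this slide can be checked numerically. The snippet below is a minimal sketch, assuming the approximation that the homozygote drop-out probability is about half the square of the heterozygote drop-out probability; the function name, the `alpha` parameter, and the example value D = 0.4 are illustrative and not from the presentation.

```python
def dropout_terms(d: float, alpha: float = 0.5) -> dict:
    """Return drop-out probabilities and their complements for a given D."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("D must be a probability between 0 and 1")
    d2 = alpha * d ** 2  # homozygote drop-out, D2 ~ 1/2 * D^2 (assumed alpha = 0.5)
    return {"D": d, "D_bar": 1.0 - d, "D2": d2, "D2_bar": 1.0 - d2}


# Hypothetical heterozygote drop-out probability
terms = dropout_terms(0.4)
```

Because D2 scales with the square of D, a homozygote is much less likely to drop out entirely than a single heterozygote allele at the same template level.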
Hd: V + Unknown = E
Likelihood: Pr(ABC | AB, Unknown, Hp)
(Overbars on the D and C factors, marking 1 − D and 1 − C, did not survive in the source.)

 #   Victim  Unknown  Weight  Likelihood term
 1   AB      AA       w1      Pr(AA) × Pr(ABC | AB, AA) = Pr(AA) × D D C
 2   AB      BB       w2      Pr(BB) × Pr(ABC | AB, BB) = Pr(BB) × D D C
 3   AB      CC       w3      Pr(CC) × Pr(ABC | AB, CC) = Pr(CC) × D D D2 C
 4   AB      DD       w4      Pr(DD) × Pr(ABC | AB, DD) = Pr(DD) × D D D2 C
 5   AB      AB       w5      Pr(AB) × Pr(ABC | AB, AB) = Pr(AB) × D D C
 6   AB      AC       w6      Pr(AC) × Pr(ABC | AB, AC) = Pr(AC) × D D D C
 7   AB      AD       w7      Pr(AD) × Pr(ABC | AB, AD) = Pr(AD) × D D D C
 8   AB      BC       w8      Pr(BC) × Pr(ABC | AB, BC) = Pr(BC) × D D D C
 9   AB      BD       w9      Pr(BD) × Pr(ABC | AB, BD) = Pr(BD) × D D D C
10   AB      CD       w10     Pr(CD) × Pr(ABC | AB, CD) = Pr(CD) × D D D C
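The structure of this calculation, a sum over every genotype the unknown contributor could have, can be sketched in code. This is a minimal illustration only: it assumes Hardy-Weinberg genotype probabilities and hypothetical allele frequencies, and it takes the conditional probability Pr(E | V, U) as a caller-supplied function rather than implementing any particular drop-out/drop-in model. None of the names below come from the presentation.

```python
from itertools import combinations_with_replacement


def genotype_prior(genotype, freqs):
    """Hardy-Weinberg genotype probability: p^2 (homozygote) or 2pq (heterozygote)."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def likelihood_hd(freqs, conditional):
    """Sum Pr(U) * Pr(E | V, U) over every candidate unknown genotype U."""
    genotypes = combinations_with_replacement(sorted(freqs), 2)
    return sum(genotype_prior(g, freqs) * conditional(g) for g in genotypes)


# Hypothetical allele frequencies for alleles A, B, C, D (sum to 1)
freqs = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4}

# The ten unknown genotypes: AA, AB, AC, AD, BB, BC, BD, CC, CD, DD
unknowns = list(combinations_with_replacement(sorted(freqs), 2))

# With a placeholder conditional of 1.0 the sum reduces to the total prior
total_prior = likelihood_hd(freqs, lambda genotype: 1.0)
```

Passing a real conditional function in place of the placeholder would give the Hd likelihood; the choice of that function is where the drop-out and drop-in assumptions enter.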
Statistical Models
• Estimate the drop-out or drop-in rates through the observed peak heights or the quantity of input DNA (Logistic Model)
• The higher the observed peak, the lower the drop-out rate
• The parameters (β0 and β1) depend on the process used to generate DNA profiles
• # of cycles, the kind of samples used; the allelic composition; etc.

$$P(D \mid \hat{H}) = \frac{\exp(\beta_0 + \beta_1 \log \hat{H})}{1 + \exp(\beta_0 + \beta_1 \log \hat{H})}$$
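The logistic drop-out model on this slide can be sketched as a small function. This is an illustration under stated assumptions: the parameter values used below (β0 = 12.0, β1 = −2.5) are hypothetical and would in practice be estimated from validation data for a given typing process.

```python
import math


def dropout_probability(h_hat: float, beta0: float, beta1: float) -> float:
    """P(D | H) under the logistic model; H is a peak-height or DNA-quantity proxy."""
    if h_hat <= 0:
        raise ValueError("H must be positive")
    z = beta0 + beta1 * math.log(h_hat)
    return 1.0 / (1.0 + math.exp(-z))  # algebraically equal to exp(z) / (1 + exp(z))


# Hypothetical parameters; a negative beta1 makes drop-out fall as peak height rises
p_low_peak = dropout_probability(100.0, beta0=12.0, beta1=-2.5)
p_high_peak = dropout_probability(1000.0, beta0=12.0, beta1=-2.5)
```

With these illustrative parameters the low peak gives a much higher drop-out probability than the high peak, which is the qualitative behaviour the slide describes.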
STRMIX
http://strmix.esr.cri.nz/
Issues for LCN Typing
• Do models reflect reality?
• Variation of profile with low level DNA
• What constitutes an effective validation for drop out and drop in?
• Multiple amplifications vs one amplification
Timely Response
• DNA database of known offenders and forensic samples
• Timeliness of the match is critical and can reduce the resources required to solve the crime
DNA database
Traditional DNA Typing Process
DNA Extraction: 2 hours
DNA Quantification: 2 hours
DNA Amplification: 3 hours
CE: 30 min
Data analysis: 2-3 hours
Direct Amplification Workflow
Fast with Direct Amplification
Five/Four Lab Processes in One “Field” Device
Human Analyst
Profile Match
Hours - Weeks
• Disposable microfluidic biochip technology integrates and automates five laboratory processes into one field device operated by non-expert users
Rapid DNA Single Platform Systems
A “Trainable” Component of the System
Potential Applications
• Law Enforcement—Booking desk/arrestee processing to generate investigative leads
• Forensic DNA Laboratories
• Borders and Ports of Entry
• Immigration/Illegal Adoption/Human Trafficking
• Victim Identification at Mass Disaster Sites
• Military and Military Intelligence
Areas of Interest
• STRs
• Mitochondrial DNA
• HID SNPs
• Ancestry SNPs
• Phenotype SNPs
• Mixtures
• Pharmacogenetics
• Microbial Forensics
• …going to need a bigger boat
Image courtesy: geek.com
• Decision trees required for marker set selection for various sample types
• Autosomal STRs
• Y STRs
• X STRs
• SNPs
• mtDNA
• Markers provide no additional lead without a suspect or a database hit
• Need tests that provide additional investigative leads
• Ex: facial reconstruction; phenotype and ancestry markers; familial searching
Current Forensic DNA Workflows
Next Generation Sequencing
• NGS is no longer “next generation”; consider:
• Current Generation Sequencing
• Massively Parallel Sequencing (MPS)
• Sequencing on steroids
MPS and Forensic STR “Gold Standard”
~13-24 markers
• Capillary electrophoresis fragments (size differences)
Length Variations
Increasing size (bp)
100s of Markers
Massively parallel sequencing
Fragments can overlap in size
Read counts / noise
Length & Sequence Variations
Targeted MPS
[Figure: stacked sequence reads spanning an ATA repeat region]
Size No Longer Matters
• With CE, marker amplicons tagged with the same fluor had to differ in size and/or use mobility modifiers
• Limits the number of loci that can be placed in a CE-based multiplex
• With MPS, size [for detection purposes] is no longer an issue
• Each molecule is read independently
• Enables more markers to be analyzed simultaneously
AATG AATG AATG AATG AATG
AATG AATG AATG AATG AATG
AATG AATG AATG AATG AATG
AATG
AATG AATG
STRs
• Current mainstay for identity testing
• High discrimination power
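An STR allele is named by the number of tandem copies of its repeat motif, as the AATG blocks on this slide suggest. The sketch below counts the longest uninterrupted run of a motif in a sequence; the flanking bases and the example allele are hypothetical.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive copies of motif found in sequence."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


# Hypothetical flanking bases around five AATG repeats
allele = "GGTC" + "AATG" * 5 + "TTCA"
repeats = longest_repeat_run(allele, "AATG")
```

Counting only the longest uninterrupted run is a simplification; alleles with interrupted or compound repeats need locus-specific rules.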
Types of SNPs
• Individual Identification SNPs:
• SNPs that collectively give very low probabilities of two individuals having the same multisite genotype (individualization); high heterozygosity, low Fst
• Ancestry Informative SNPs:
• SNPs that collectively give a high probability of an individual’s ancestry being from one part of the world or being derived from two or more areas of the world
• Lineage Informative SNPs:
• Sets of tightly linked SNPs that function as multiallelic markers that can serve to identify relatives with higher probabilities than simple di-allelic SNPs
• Phenotype Informative SNPs:
• SNPs that provide high probability that the individual has particular phenotypes, such as a particular skin color, hair color, eye color, etc.
• Pharmacogenetic SNPs – molecular autopsy
General MPS Workflow
• Extract DNA
• Fragment genomic DNA or targeted amplification
• Library preparation
• Cluster generation/clonal amplification
• Sequence
• Data analysis
• Bioinformatics
Overview of Ion Torrent PGM™ Technology and Workflow
• Personal Genome Machine
• Ion 314, 316, 318 Chip v2
• Up to 2Gb and 5.5M reads
• 2.3-7.3 hour runs
• Depending on read length and chip size
• Multiplex 96 samples
Concept
• To date, DNA sequencing required imaging technology to support detection of electromagnetic intermediates (light) and specialized nucleotides or other reagents
• An alternative now exists that is based on non-optical sequencing on integrated circuits
Chemistry
Chemistry
• Reduce sequencing errors:
• Modified bases
• Fluorescent bases
• Laser detection
• Enzymatic amplification cascades
• Improve read length limitations:
• Unnatural bases
• Faulty synthesis
• Slow cycle time
• Deliver highly uniform genome coverage
• In principle similar to pyrosequencing
• But less complex
• An “ionogram” is the output of the signals in flow space
• Must be read “up-and-down” along with “left-to-right”
• Height of bar indicates how many nucleotides were incorporated during that flow
Data Output is an Ionogram
Sequence: AATCTTCTG…
Key Sequence
TTT
TCAG
AA
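Reading an ionogram amounts to walking through the nucleotide flows in order and emitting each base as many times as the bar height indicates. The sketch below is a simplified illustration: the repeating flow order "TACG" and the idealised integer signals are assumptions, whereas real instruments use their own flow orders and produce non-integer signals that must be normalised first.

```python
def flows_to_sequence(flow_order: str, signals: list) -> str:
    """Convert per-flow incorporation counts into a base sequence."""
    bases = []
    for i, count in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]  # flows cycle through the order
        bases.append(nucleotide * count)              # bar height = bases added in that flow
    return "".join(bases)


# Assumed flow order and idealised signals for the first eight flows
flow_order = "TACG"
signals = [1, 0, 1, 0, 0, 1, 0, 1]
key = flows_to_sequence(flow_order, signals)
```

Under these assumptions the eight flows decode to the key sequence TCAG shown on the slide; a flow with height 2 or 3 would emit a homopolymer such as AA or TTT.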
Overview of Illumina Technology and Workflow
• MiSeq
• FGx - Forensic Genomics System
• Up to 15Gb and 50M reads
• 4-55 hour runs
• Depending on read length
• Multiplex 96 samples
Illumina Sequencing Technology Overview
[Figure: library preparation from DNA (0.1-5.0 µg); single molecule array; cluster growth; sequencing cycles 1-9 (T G T A C G A T …); image acquisition; base calling; example read TGTACGATCACCCGATCGAA]
Bioinformatics
• A science in itself
• Many science experiments are carried out with bioinformatics
• “the new field that merges biology, computer science, and information technology to manage and analyze the data, with the ultimate goal of understanding and modeling living systems.”
• Genomics and Its Impact on Medicine and Society: A 2001 Primer, U.S. Department of Energy Human Genome Program
5000 Bases per Page
[Figure: one full page of raw genomic sequence text]
The magnitude of genomic data in an analysis!
• 3 pallets with 40 boxes per pallet x 5000 pages per box x 5000 bases per page = 3,000,000,000 bases!
• Getting an accurate sequence requires 6-fold coverage
• Now: Shred 18 pallets and reassemble
• Really need Bioinformatics
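The paper analogy on this slide is straightforward arithmetic, worked through below. The constant names are illustrative; the figures are the ones quoted on the slide.

```python
PALLETS = 3
BOXES_PER_PALLET = 40
PAGES_PER_BOX = 5000
BASES_PER_PAGE = 5000
COVERAGE = 6  # fold coverage quoted on the slide

# One copy of the genome printed out
genome_bases = PALLETS * BOXES_PER_PALLET * PAGES_PER_BOX * BASES_PER_PAGE

# Sequencing at 6-fold coverage multiplies both the paper and the bases
pallets_at_coverage = PALLETS * COVERAGE
bases_at_coverage = genome_bases * COVERAGE
```

The product reproduces the 3,000,000,000 bases and the 18 pallets mentioned above, which corresponds to 18,000,000,000 bases of raw data to be reassembled.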
Bioinformatics
• Allele calls
• Alignment
• Strand bias
• Coverage
• Thresholds
• ….
STR Allele Calling Software
• Compare back to nominal allele length
• Needed better approaches
• More versatile, more flexible
Average Locus Coverage (n=24) with 62 pg of input DNA
Prototype NGS (Promega)
Loci with Intra-Repeat Variation (n=24)
Locus    Repeats  Observations
D21S11   29       8
D21S11   29       1
D21S11   30       3
D21S11   30       2
D21S11   30       5
D21S11   31       5
D21S11   31       2
D21S11   32.2     4
D21S11   32.2     1
D2S1338  20       1
D2S1338  20       2
D2S1338  20       1
D2S1338  22       1
D2S1338  22       2
D2S1338  23       1
D2S1338  23       5
D2S1338  25       5
D2S1338  25       1
D2S1338  25       1
D3S1358  15       3
D3S1358  15       10
D3S1358  15       1
D3S1358  16       10
D3S1358  16       4
D3S1358  17       3
D3S1358  17       4
D8S1179  13       11
D8S1179  13       5
D8S1179  14       1
D8S1179  14       7
D8S1179  14       1
D8S1179  15       3
D8S1179  15       1
VWA      16       3
VWA      16       4
VWA      17       1
VWA      17       15
VWA      18       1
VWA      18       5
VWA      19       1
VWA      19       1
Consistent with MS work
Variation Example at D3S1358 Locus
Stutter sequence for 15 alleles
AGATAGATAGATAGATAGATAGATAGATAGATAGATAGACAGACAGACAGA TAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGACAGACAGA TAGAT
Sequence for 15 alleles
AGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGACAGACAGA CAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGACAGA CAGATAGAT
Allele and Stutter Distribution for D3S1358 Homozygote
Allele 30
TAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATATGGAT AGATAGATGATAGATAGATAGATATAGATAGATAGACAGACAGACAGACAGACAGATAGATAG ATAGATAGATAGATAGA
Minus Stutter Allele 31
TAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATATGGAT AGATAGATGATAGATAGATAGATATAGATAGATAGACAGACAGACAGACAGACAGACAGATAG ATAGATAGATAGATAGA
Allele 31
TAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAT GGATAGATAGATGATAGATAGATAGATATAGATAGATAGACAGACAGACAGACAGACAGACAG ATAGATAGATAGATAGATAGA
Plus Stutter Allele 30
TAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAT GGATAGATAGATGATAGATAGATAGATATAGATAGATAGACAGACAGACAGACAGACAGATAG ATAGATAGATAGATAGATAGA
Variation Example at D21S11 Locus
Allele and Stutter Distribution for D21S11 Heterozygote
Mixture Interpretation Enhancement
Minor and Major Contributor Alleles
[Figure: D5S818 bar chart; x-axis: nominal alleles by repeat (9-13); y-axis: depth of coverage (0-6000); series: Major, Shared, Minor; sequence labels: AGAT...AGAT, AGAT...AGAT, AGAT...AGAG]
Mixture of 2 people
Both 11,12
11 indistinguishable
Stutter from allele 11
12 distinguishable
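The point of this slide, that two contributors sharing the same nominal alleles can still be separated when one allele differs in sequence, can be illustrated with a toy tally. The reads below are hypothetical repeat-region sequences of equal length; the read counts are invented for the example.

```python
from collections import Counter

MOTIF_LENGTH = 4  # tetranucleotide repeat

# Two hypothetical allele sequences with the same length but a different final repeat
allele_a = "AGAT" * 12
allele_b = "AGAT" * 11 + "AGAG"
reads = [allele_a] * 30 + [allele_b] * 10

# Length-based calling (as in CE) sees a single nominal allele
by_length = Counter(len(read) // MOTIF_LENGTH for read in reads)

# Sequence-based calling keeps the two variants apart
by_sequence = Counter(reads)
```

By length all 40 reads collapse into nominal allele 12, while the sequence tally splits them 30 to 10, which is the kind of information that can help assign alleles to major and minor contributors.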
Analysis of Difficult Samples
mtDNA is the most successful marker
Advantages of mtDNA Analysis
• High copy number: suited to limited samples (hair, teeth, bones)
• Less prone to degradation: structure, location
• Maternal inheritance: maternal relatives are a source of known samples in missing persons cases
• Highly variable among individuals
Limitations of Sanger Sequencing
• Sequencing is labor intensive
• Analysis of results is time consuming
• Costly (prices range from $1000 to $3000 per mtDNA sample, and only for HV1 and HV2)
• Variation in intensity of peaks
• Not quantitative, which impacts mixture interpretation
• Heteroplasmy difficulties
Length Heteroplasmy
...CCACCAAACCCCCCCCTCCCCCCGCTT… 8 C’s
...CCACCAAACCCCCCCCCTCCCCCCGCTT… 9 C’s
...CCACCAAACCCCCCCCCCTCCCCCCGCTT… 10 C’s
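Length heteroplasmy of this kind can be quantified from sequence reads by measuring the C-stretch in each read and tallying the lengths. The sketch below is a minimal illustration: the reads are built from the sequence context shown above, the read counts are invented, and anchoring on the preceding AAA is an assumption made for this example rather than a general alignment strategy.

```python
import re
from collections import Counter


def c_stretch_length(read: str) -> int:
    """Length of the first run of C's that follows the AAA anchor in the read."""
    match = re.search(r"AAA(C+)", read)
    return len(match.group(1)) if match else 0


def make_read(n_c: int) -> str:
    """Build a hypothetical read with n_c C's in the variable stretch."""
    return "CCACCAAA" + "C" * n_c + "TCCCCCCGCTT"


# Invented mixture of length variants within one sample
reads = [make_read(8)] * 6 + [make_read(9)] * 3 + [make_read(10)]
length_counts = Counter(c_stretch_length(read) for read in reads)
```

The resulting counts give the proportion of each length variant, something that is difficult to read from overlapping Sanger traces.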
mitoSAVE
IGV
mitoSAVE: King et al. 2014
• Alignment issues
• Tool to provide consistency in nomenclature
• EMPOP tools
Population Studies Become Easy
• Sequenced 283 whole mitochondrial genomes on the MiSeq following protocol
• Three populations – African American, Caucasian, SW Hispanics
• Multiple software tools
• mitoSAVE
• Haplogrep
• First upload of MPS mtDNA data to EMPOP
Variation Across the mtGenome (n=283)
• 11,607 variants
• defined in relation to the rCRS
• Polymorphism density clustered in HVI/HVII
• 2,938 of the variants (25.3%)
• ~75% of variation in coding region
• Increase the value of mtDNA
Length heteroplasmy sequences detected using MPS
Sanger sequencing of homopolymer C stretch (forward and reverse strands)
Length Heteroplasmy
Tissue Sample 400
Sanger vs MiSeq: four bone samples

HV1
Sample number  Method  Variants
6997           Sanger  16192T 16260T
6997           NGS     16192T 16260T
0056-12        Sanger  16093Y 16223T 16239T 16260T 16274A 16325C 16362C
0056-12        NGS     16093Y* 16223T 16239T 16260T 16274A 16325C 16362C
H2             Sanger  16111T 16184T 16223T 16290T 16319A 16362C
H2             NGS     16111T 16184T 16223T 16290T 16319A 16362C
7000           Sanger  16126C 16223T 16325C 16362C
7000           NGS     16126C 16223T 16325C 16362C

HV2
Sample number  Method  Variants
6997           Sanger  263G 315.1C
6997           NGS     263G 315.1C
0056-12        Sanger  73G 263G
0056-12        NGS     73G 263G
H2             Sanger  64T 73G 146C 153G 155C 203A 222T 235G 263G 315.1C
H2             NGS     64T 73G 146C 153G 155C 203A 222T 235G 263G 315.1C
7000           Sanger  55C 56G 64T 73G 263G 279C 309.1C 315.1C
7000           NGS     55C 56G 64T 73G 263G 279C 309.1C 315.1C

16093Y* is 75% C and 25% T
All samples had the threshold set at 25%; heteroplasmy at position 16093 for sample 0056-12 was detected at 19.1%. All sites were concordant.
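A point heteroplasmy call such as 16093Y can be thought of as a threshold on the minor-base fraction at a position. The sketch below is a simplified illustration, not the pipeline used for these samples: the read counts are invented, the threshold is a parameter, and only two-base IUPAC codes are handled.

```python
IUPAC = {
    frozenset("CT"): "Y", frozenset("AG"): "R", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("CG"): "S", frozenset("AT"): "W",
}


def call_base(counts: dict, threshold: float = 0.25) -> str:
    """Call the major base, or a two-base IUPAC code if the minor base reaches the threshold.

    counts must contain at least one base with a non-zero read count.
    """
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    major = ranked[0][0]
    if len(ranked) > 1 and total > 0:
        minor, minor_reads = ranked[1]
        if minor_reads / total >= threshold:
            return IUPAC[frozenset((major, minor))]
    return major


# Hypothetical read counts at one position: 75% C and 25% T
site_call = call_base({"C": 75, "T": 25})
```

Lowering the threshold reports lower-level mixtures as heteroplasmic, at the cost of calling more noise, which is why threshold choice appears among the validation issues later in the talk.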
Short Amplicon Multiplexes
Whole Genome
• Degraded sample analyses
History of Deadwood, S.D.
Deadwood Cemeteries
• Ingleside Cemetery
• (1876 – 1878) located near the downtown core business district; approximately 100 burials
• Mt. Moriah Cemetery
• Established in 1878
• Final resting place of Western legends, murderers, madams, and pillars of Deadwood’s early economic development
Unidentified Skeletal Remains: Case Objectives
The Washington Times, July 2014
http://www.washingtontimes.com/news/2014/jul/6/scientists-unraveling-a-historic-deadwood-mystery/?page=all
• Who is he?
• What did he look like?
• Where did he come from?
• Genetic Typing
• Y-STRs
• AIMs
• Phenotype-informative SNPs
• mtDNA
Y SNPs in the HID-Ion AmpliSeq™ Identity Panel
• 34 upper clade SNPs

Columns: (1) Deadwood Femur 008.001 E1 EVCV2; (2) Deadwood Femur 008.002 E1 EVCV2; (3) Deadwood Femur 008.002 E2 EVCV2; (4) Consensus

SNP         (1)  (2)  (3)  (4)
rs2032636   G    G    G    G
rs9341278   G    G    N    G
rs2032658   G    G    G    G
rs2319818   G    G    G    G
rs17269816  C    C    C    C
rs17222573  A    A    A    A
M479        C    C    N    C
rs3848982   C    C    C    C
rs3900      G    G    G    G
rs3911      A    A    N    A
rs2032631   A    A    A    A
rs2032673   T    T    T    T
rs2032652   T    T    T    T
rs16980426  T    T    T    T
rs13447443  A    A    N    A
rs17842518  G    G    G    G
rs2033003   C    C    C    C

R1b
European Ancestry
AIMs
Phenotype Probabilities
http://hirisplex.erasmusmc.nl/
Accessed on 10-07-2014
HIrisPlex: Walsh et al. (2013)
IrisPlex: Walsh et al. (2011)
Green Mountain Study
• Data from blind study
• Green Mountain Study
• 12 samples
• Used 1 ng template DNA
• Markers
• HID SNPs
• AIMs
• Y SNPs
• STRs
• Whole Genome mtDNA
STR Panel
• 10-plex STR Panel
• Amelogenin
• D16S539
• D3S1358
• D5S818
• CSF1PO
• D7S820
• D8S1179
• TH01
• TPOX
• vWA
• Analyzed data
• STRait Razor
• STR Genotyper Plugin
• Compared data with genotypes generated on 3130xl Genetic Analyzer
• Also evaluated sequence variants within alleles
STRait Razor: Warshauer et al. 2013
mtDNA Genome
Sample Haplogroup
1 J1c5
3 H3b
4 U7b
5 H6a1b4
6 H33
7 M7b1a1c1
10 H5n
13 L2a1f
14 H1c
15 H1c
16 L3e1a1a
17 H1c
Identifying Relationships
Genotypes from STRs and Identity SNPs allow for expansion and refinement of the partial pedigree identified with the mitochondrial haplotypes
Identifying Relationships
Pedigree supported by data from Ancestry SNP panel
Biogeography ≠ Bioancestry
Biogeography = Bioancestry
Phenotype: brown hair, hazel eyes, white fair skin complexion
Bioancestry Inferences
Identical Twins
• After complete forensic autopsy
• Cause of death unknown 0.3 - 0.6 % (~1/200)
• Manner of death unknown 3 - 6 % (~1/20)
Negative Autopsies
Microbiome-Human ID
• Fierer et al, 2009
Microbial Forensics
• Analysis of evidence from a bioterrorism act, biocrime, or inadvertent microorganism/toxin release for attribution purposes
• Essentially the same as any other forensic discipline
Turn Research to Practice
Some Issues to Address
• Data quality
• Q scores
• Filters
• Noise
• Thresholds
• Detection
• Stochastic
• Heteroplasmy
• Marker specific criteria
• STRs
• Nominal length vs sequence variation
• Bioinformatics
• Software
• Reference alignment
• Training/Education
• Typical Validation Studies
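The Q scores listed under data quality follow the standard Phred convention, in which the score is −10 times the base-10 logarithm of the base-call error probability. A minimal sketch of the conversion in both directions is shown below; the function names are illustrative, and how a given platform estimates the underlying error probability is outside the scope of this example.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert a base-call error probability to a Phred quality score."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


q30_error = phred_to_error_probability(30)
q_for_one_percent = error_probability_to_phred(0.01)
```

On this scale Q20 corresponds to a 1 in 100 chance of an incorrect base call and Q30 to 1 in 1000, which is why quality filters are often expressed as a minimum Q value.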
We have come a long way!
Yet more is still to come!
ACKNOWLEDGMENTS
• Thermo Fisher Scientific
• Illumina
• Promega
• IntegenX
• NetBio
• South Dakota Historical Society Archaeological Research Center
• City of Deadwood
• Walther Parson
• Antonio Amorim
• Research Team
Life Technologies and its affiliates are not endorsing, recommending, or promoting any use or application of Life Technologies products presented by third parties during this seminar. Information and materials presented or provided by third parties are provided as-is and without warranty of any kind, including regarding intellectual property rights and reported results. Parties presenting images, text and material represent they have the rights to do so.
WHEN USED FOR PURPOSES OTHER THAN HUMAN IDENTIFICATION HID-ION PGM™ IS FOR RESEARCH USE ONLY. NOT FOR USE IN DIAGNOSTIC PROCEDURES.
HID Ion AmpliSeq Identity Panel and HID Ion AmpliSeq Ancestry Panel are for Research, Forensics and Paternity Use Only. Y Filer is for Forensics and Paternity Use Only.
Speaker was provided travel and hotel support by Thermo Fisher Scientific for this presentation, but no remuneration.