Measuring Rates of mtDNA Heteroplasmy Using a NextGen Sequencing Approach
Mitchell M. Holland, Ph.D.
Former Director, Forensic Science Program
Associate Professor, Biochem & MolBio
Penn State University, University Park, PA
www.forensics.psu.edu
NC State University
15 Sep 2015
Nucleus
Mitochondria
Nuclear DNA
Mitochondrial DNA
High Copy Number Genome
Types of DNA in the Cell
Membrane-enclosed organelles distributed through the cytosol of most eukaryotic cells
Tsar Nicholas II
Family Reference, 5 Generations Removed
Identification of Nicholas Romanov
Tsar Nicholas II
Identification of Nicholas Romanov
Georgij Romanov
LR = 150
LR = 375,000 when the heteroplasmy is considered
“Substitution” Rate of mtDNA
We compared DNA sequences of the two CR hyper-variable segments from close maternal relatives, from 134 independent mtDNA lineages spanning 327 generational events.
Germline Bottleneck
Family Reference, 5 Generations Removed
DGGE to Identify Heteroplasmy
Used DGGE analysis to identify the heteroplasmic sequences … including from the distant maternal relative
“Substitution” Rate of mtDNA
Ten substitutions were observed, resulting in an empirical rate of 1/33 generations, or 2.5/site/Myr. This is roughly twenty-fold higher than estimates derived from phylogenetic studies: 0.118 +/- 0.031/site/Myr.
Using our empirical rate to calibrate the mtDNA molecular clock would result in an age for the mtDNA MRCA of only ~6,500 y.a., clearly incompatible with the known age of modern humans.
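The arithmetic behind these figures can be checked with a short sketch. It only uses numbers quoted on the slides (10 substitutions, 327 generational events, and the two rates); the variable names are illustrative and not taken from the original study.

```python
# Figures quoted on the slides.
substitutions = 10             # substitutions observed
generational_events = 327      # generational events surveyed
empirical_rate = 2.5           # substitutions/site/Myr (empirical)
phylogenetic_rate = 0.118      # substitutions/site/Myr (phylogenetic)

# One substitution per how many generational events?
generations_per_substitution = generational_events / substitutions

# How many times higher is the empirical rate than the phylogenetic one?
fold_difference = empirical_rate / phylogenetic_rate

print(f"One substitution per {generations_per_substitution:.1f} generational events")
print(f"Empirical rate is {fold_difference:.1f}-fold higher")
```

The first value rounds to the 1/33 quoted on the slide, and the second is consistent with "roughly twenty-fold".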
Genetic Bottlenecks & Empirical Mutation Rates
The germline mutation rate is 0.13 mutations/site/Myr (compared to phylogenetic rate estimates of 0.118 mutations/site/Myr)
The number of mtDNA molecules “transmitted” to the next generation is 30-35 (human germline bottleneck)
Non-synonymous mutations showed signs of purifying selection
Proceedings of the National Academy of Sciences (2014)
Using an NGS Approach
Hypothesis: An NGS approach will allow for the routine detection and reporting of mtDNA heteroplasmy, including low level variants. Differences in heteroplasmic profiles may allow for the differentiation of maternal relatives.
Problem: Forensic labs still don’t have a suitable method for detecting and reporting mtDNA heteroplasmy. Even high levels of heteroplasmy go unreported, lowering the discrimination potential of the typing system.
Here we are in 2015, and …
Our Initial Work: 2009-2011
www.isabs.hr
www.forensics.psu.edu (Under Mitch Holland’s Research Page)
www.cmj.hr
Croat Med J (2011), 52, pp. 299-313
Using the 454 LifeSciences GS Junior Instrument & Chemistry
Sample | Sanger mtDNA Profile | Percent of Minor Heteroplasmy & Site | 454 GS Junior mtDNA Profile | Percent of Minor Heteroplasmy & Site
F2 | 16069T, 16093C, 16126C, 16261T, 16274A, 16355T | 16311 – 18.4% C | 16069T, 16093C, 16126C, 16261T, 16274A, 16355T | 16093 – 3.71% T; 16261 – 1.29% C; 16311 – 20.14% C
F3 | 16069T, 16126C, 16145A, 16172C, 16261T | Not Detected | 16069T, 16126C, 16145A, 16172C, 16261T | Not Detected
F4 | No polymorphisms | Not Detected | No polymorphisms | Not Detected
F5 | 16129A, 16172C, 16223T, 16311C | Not Detected | 16129A, 16172C, 16223T, 16311C | 16129 – 0.51% G; 16311 – 0.33% T
F7, F12-13, M13-14 | 16192T, 16256T, 16270T | Not Detected | 16192T, 16256T, 16270T | 16192 – 2.64-4.50% C
F8 | 16223T, 16362C | Not Detected | 16223T, 16362C | 16223 – 1.86% C
F9 | 16356C | Not Detected | 16356C | Not Detected
F10 | 16298C | Not Detected | 16298C | 16298 – 0.45% T
F16 | 16126C, 16239T, 16294T, 16296T, 16304C | Not Detected | 16126C, 16239T, 16294T, 16296T, 16304C | Not Detected
F25 | 16343G | Not Detected | 16343G | Not Detected
F26 | 16093C | Not Detected | 16093C | Not Detected
F27 | 16172C, 16278T | Not Detected | 16172C, 16278T | Not Detected
M3 | 16355T | Not Detected | 16355T | Not Detected
M4 | 16111T | Not Detected | 16111T | 16111 – 0.52% C
Evaluated 30 individuals from 25 different mtDNA lineages
0.33% or 1/300
Table 3, Holland et al, CMJ 2011
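As a rough illustration of how a "percent of minor heteroplasmy" value could be derived from sequencing reads, the sketch below computes the minor-variant percentage at one position from per-base read counts. The function name and the read counts are hypothetical, and treating the 0.33% (1/300) level as a reporting cutoff is an assumption made here for illustration, not a method stated on the slides.

```python
from collections import Counter

def minor_variant_percent(base_counts):
    """Return (major_base, minor_base, minor_percent) for one position."""
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no reads at this position")
    # Rank bases by read count, most frequent first.
    ranked = Counter(base_counts).most_common()
    major_base = ranked[0][0]
    if len(ranked) < 2 or ranked[1][1] == 0:
        return major_base, None, 0.0
    minor_base, minor_count = ranked[1]
    return major_base, minor_base, 100.0 * minor_count / total

# Hypothetical read counts, invented for illustration only.
counts = {"C": 2985, "T": 15}
major, minor, pct = minor_variant_percent(counts)

# Assumed cutoff: the 1/300 (about 0.33%) level noted on the slide.
reportable = pct >= 100.0 / 300
print(major, minor, f"{pct:.2f}%", reportable)
```

In this invented case 15 of 3,000 reads carry the minor base, giving 0.5%, which sits above the assumed 1/300 level.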
Concordance
3.71% C/T Heteroplasmy
1.29% T/C Heteroplasmy
20.14% C/T Heteroplasmy
Sanger versus NGS Heteroplasmy Detection
[Figure panels: Sanger, NGS]
Figure 2, Holland et al, CMJ 2011
Other Examples
PGM capable of producing quality, reliable mtDNA sequence data
64 mtgenomes
<0.02% Differences from Sanger Data
Most Differences in Homopolymeric Stretches
Concordance
Sample | Sanger mtDNA Profile | Percent of Minor Heteroplasmy & Site | 454 GS Junior mtDNA Profile | Percent of Minor Heteroplasmy & Site
M5 | 16114A, 16129A, 16192T, 16213A, 16223T, 16278T, 16355T, 16362C | Not Detected | 16114A, 16129A, 16192T, 16213A, 16223T, 16278T, 16355T, 16362C | 16192 – 3.18% C
M7 | 16129A, 16223T, 16264T | Not Detected | 16129A, 16223T, 16264T | Not Detected
M8 | 16224C, 16311C | Not Detected | 16224C, 16311C | Not Detected
M9 | 16301T, 16343G, 16356C | Not Detected | 16301T, 16343G, 16356C | Not Detected
M10 | 16304C | Not Detected | 16304C | 16209 – 2.62% C; 16222 – 2.30% T; 16304 – 2.99% T
M11 | 16129A, 16223T | Not Detected | 16129A, 16223T | Not Detected
M12 | 16069T, 16126C | Not Detected | 16069T, 16126C | 16126 – 1.14% T
M15 | 16093C, 16224C, 16311C | Not Detected | 16093C, 16224C, 16311C | 16093 – 3.04% T
M17 | 16126C, 16294T, 16296T | Not Detected | 16126C, 16294T, 16296T | Not Detected
M18 | 16278T, 16304C, 16311C | Not Detected | 16278T, 16304C, 16311C | 16128 – 0.52% T; 16278 – 0.77% C; 16293 – 0.77% G; 16304 – 1.00% T
M19, F22 | 16069T, 16126C, 16222T | Not Detected | 16069T, 16126C, 16222T | Not Detected
Is low level heteroplasmy reproducible?
Reproducibility
Sample | Sanger mtDNA Profile | Percent of Minor Heteroplasmy & Site | 454 GS Junior mtDNA Profile | Percent of Minor Heteroplasmy & Site
M10 | 16304C | Not Detected | 16304C | 16209 – 2.62% C; 16222 – 2.30% T; 16304 – 2.99% T
M10 Replicate #1 | | | | 16209 – 2.58% C; 16222 – 2.03% T; 16304 – 1.87% T
M10 Replicate #2 | | | | 16209 – 2.32% C; 16222 – 2.57% T; 16304 – 0.56% T
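One simple way to look at the reproducibility question is to compare the minor-variant percentages for M10 across the original run and the two replicates. The sketch below uses only the percentages shown for M10; the helper name `spread_by_site` and the choice of min/max range as the summary are illustrative assumptions, not the analysis used in the original work.

```python
# Minor-variant percentages for sample M10, copied from the slides.
m10_runs = {
    "original":    {16209: 2.62, 16222: 2.30, 16304: 2.99},
    "replicate_1": {16209: 2.58, 16222: 2.03, 16304: 1.87},
    "replicate_2": {16209: 2.32, 16222: 2.57, 16304: 0.56},
}

def spread_by_site(runs):
    """Return {site: (min, max, range)} across all runs that report the site."""
    sites = sorted({site for run in runs.values() for site in run})
    summary = {}
    for site in sites:
        values = [run[site] for run in runs.values() if site in run]
        low, high = min(values), max(values)
        summary[site] = (low, high, round(high - low, 2))
    return summary

summary = spread_by_site(m10_runs)
for site, (low, high, spread) in summary.items():
    print(f"{site}: {low:.2f}% to {high:.2f}% (range {spread:.2f})")
```

All three sites are detected in every run, but the range at 16304 is much wider than at 16209 or 16222.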
Rate of Heteroplasmy
Data Set = 109 Individual Lineages (50 Pairs of Maternal Relatives)
Region | 0.5-1.0% Heteroplasmy | >1% Heteroplasmy | >10% Heteroplasmy
Coding Region | 69% | 50% | 14%
Control Region | 50% | 26% | 8.6%*
*Consistent with previous reports: for example, Irwin et al, J Mol Evol 2009
Things to Consider
If we agree that NGS should be employed in forensic cases then we need to better understand:
rates of heteroplasmy (per sample & per nucleotide)
transmission and drift of heteroplasmic variants
where to set reporting thresholds
how DNA damage will impact thresholds
statistical approaches when reporting heteroplasmy
• mtDNA Control Region
• Buccal swabs from 550 Unrelated individuals
• European descent
• Three age groups
• 18-30, 31-50, >50 yoa
• MiSeq/Nextera XT
• Initial findings
• Haplotypes/Heteroplasmy
Quigley's Cartoons | blog | June 5, 2013 http://www.capecodtoday.com/blogs/quigley/2013/06/05/19650-swabbing-cheek
NIJ 2014-DN-BX-K022
Rate Study
http://forensics.psu.edu/research/dr.-mitchell-holland
Haplotypes
• 265 samples analyzed, thus far
• 222 different haplotypes in the dataset (84%)
• 196/265 unique haplotypes (74%)
• Consistent with previous analyses, but higher percentages due to sequence range analyzed
~72%
~63%
Shared Haplotypes
[Chart values: 16, 7, 2, 1]
Most common haplotype = 16519C, 263G, 315.1C (3%), shared by 8/265 individuals
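The haplotype percentages can be reproduced from the counts on the slides. The sketch below also checks one reading of the four shared-haplotype chart values: that they are the numbers of haplotypes shared by 2, 3, 4, and 8 individuals. Those group sizes are an assumption inferred from the totals, not something stated on the slide.

```python
# Counts quoted on the slides.
samples = 265
distinct_haplotypes = 222
unique_haplotypes = 196
most_common_count = 8

# Assumption: the chart values 16, 7, 2, 1 are the numbers of haplotypes
# shared by 2, 3, 4, and 8 individuals. The group sizes are inferred.
shared = {2: 16, 3: 7, 4: 2, 8: 1}

pct_distinct = 100 * distinct_haplotypes / samples
pct_unique = 100 * unique_haplotypes / samples
pct_most_common = 100 * most_common_count / samples

shared_haplotypes = sum(shared.values())
individuals_sharing = sum(size * n for size, n in shared.items())

print(f"Distinct: {pct_distinct:.0f}%, unique: {pct_unique:.0f}%, "
      f"most common: {pct_most_common:.0f}%")
print(shared_haplotypes, individuals_sharing)
```

Under that reading the totals are self-consistent: 26 shared haplotypes equals 222 minus 196, and 69 individuals carrying a shared haplotype equals 265 minus 196.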
Haplogroups
H, 93
U, 42
R, 27
J, 26
T, 25
K, 23
I, M, N, V, W, X, 20
Native American (C n=3)
African (L n=6)
Heteroplasmy
[Bar chart: Observations of Heteroplasmy. Categories: No Heteroplasmy, One Site, Two Sites, Three Sites, Four Sites, Five Sites, At Least One, At Least Two, At Least Three, At Least Four, At Least Five. Vertical axis: 0 to 70. Legend: 1-10% MAF, >10% MAF]
13% individuals
60% individuals
NOTE: >10% means at least one site above this value
27%
24%
26%
[Bar chart: Samples with No Heteroplasmy. Categories: 18-30 Years of Age, 31-50 Years of Age, >50 Years of Age. Vertical axis: 0 to 45]
Heteroplasmy v. Age
Normalized for Sample Set Size
[Bar chart: Number of Samples by number of heteroplasmic sites. Categories: One Site, Two Sites, Three Sites, Four Sites, Five or More Sites. Series: 18-30 Years of Age, 31-50 Years of Age, >50 Years of Age. Vertical axis: 0 to 45]
18-30: 26% individuals
31-50: 27% individuals
>50: 43% individuals
Heteroplasmy v. Site
Cold Spots
Hot Spots
65% in HV1, 21% in HV2, 14% Outside HV1/HV2
Likelihood Ratio
LR = [p(E1/R) x p(E2/R)] / [p(E1/R’) x p(E2/R’)]
p(E1/R) = the probability of the evidence (match between Georgij and Nicholas) given the hypothesis that the remains are those of Nicholas Romanov
p(E2/R) = the probability of the evidence (co-occurrence of heteroplasmy) given the same hypothesis
R’ = the hypothesis that the remains are unrelated
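The likelihood ratio above is a product of two evidence terms, which a small sketch can make concrete. It assumes that the LR of 150 quoted earlier reflects the haplotype match alone and that 375,000 is the combined value; the individual probabilities passed to the function are hypothetical, chosen only so that the ratios reproduce those two figures.

```python
from fractions import Fraction

def likelihood_ratio(p_e1_r, p_e2_r, p_e1_not_r, p_e2_not_r):
    """LR = [p(E1/R) x p(E2/R)] / [p(E1/R') x p(E2/R')]."""
    return (p_e1_r * p_e2_r) / (p_e1_not_r * p_e2_not_r)

# Values quoted on the slides.
lr_haplotype_only = 150          # assumed to be the haplotype match alone
lr_with_heteroplasmy = 375_000   # when the heteroplasmy is considered

# Factor contributed by the heteroplasmy evidence, implied by the two LRs.
implied_heteroplasmy_factor = Fraction(lr_with_heteroplasmy, lr_haplotype_only)

# Hypothetical probabilities chosen to reproduce the quoted ratios.
lr = likelihood_ratio(
    p_e1_r=Fraction(1), p_e2_r=Fraction(1),
    p_e1_not_r=Fraction(1, 150), p_e2_not_r=Fraction(1, 2500),
)
print(implied_heteroplasmy_factor, lr)
```

Under these assumptions, the co-occurrence of heteroplasmy multiplies the haplotype-only LR by a factor of 2,500.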
LR = 375,000
Increase Discrimination Potential
Differentiate Between Maternal Relatives
#1098
Primary Haplotype: A263G, T16093C, C16261T, C16291C, T16311C, T16362C, T16519C
Heteroplasmy Positions: 200 A/G (3.0%), 16093T/C (12.6%)
#1100
Primary Haplotype: A263G, T16093C, C16261T, C16291C, T16311C, T16362C, T16519C
Heteroplasmy Positions: 16093C/T (3.4%)
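The comparison between samples #1098 and #1100 can be sketched as a difference of heteroplasmy profiles. The positions and percentages below are transcribed from the two profiles; the function name is hypothetical, and the percentages are carried as reported without interpreting which base is the minor one.

```python
# Heteroplasmy calls transcribed from the two profiles (position: reported %).
sample_1098 = {200: 3.0, 16093: 12.6}
sample_1100 = {16093: 3.4}

def compare_heteroplasmy(a, b):
    """Return positions seen only in a, only in b, and in both (with both values)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    shared = {pos: (a[pos], b[pos]) for pos in sorted(set(a) & set(b))}
    return only_a, only_b, shared

only_1098, only_1100, shared = compare_heteroplasmy(sample_1098, sample_1100)
print("Only in #1098:", only_1098)
print("Only in #1100:", only_1100)
print("In both:", shared)
```

Although the primary haplotypes listed for the two samples are identical, the heteroplasmy at position 200 appears only in #1098, and the reported percentage at 16093 differs between them.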
Issues Still to Address
Forensic Context
Reporting mechanism for heteroplasmy
Weight of a heteroplasmic match
Impact of maternal transmission of heteroplasmic variants
Impact of drift in heteroplasmic variants at the tissue level
Thanks!!
Illumina
Cydne Holt, Kathy Stephens, Joe Valaro, Carey Davis, Dan Gheba, etc
SoftGenetics – NextGENe®
John Fosnacht, Teresa Snyder-Leiby, etc
Penn State
Kateryna Makova, Anton Nekratenko
Mitotyping Technologies
Bob Bever, et al
Battelle Memorial Institute
National Institute of Justice (NIJ 2014-DN-BX-K022)
Eberly College of Science, Forensic Science Program
Current Research Group
Jen McElhoe, Research Associate (NIJ)
Master’s Students: Molly Rathbun (damage), Laura Wilson (D-loop val), Elena Zavala (bone extr), Jamie Gallimore (drift)
UG Students: Alyssa Duffy, Jillian Baker, Erica Pack
Walther Parson & Ann Gross
Thanks for your hospitality!!