mingkun li, roland schröder, shengyu ni, burkhard madea, and … · 2015. 2. 5. · mingkun li,...
Supplementary Materials
Extensive tissue-related and allele-related mtDNA heteroplasmy suggests positive selection for
somatic mutations
Mingkun Li, Roland Schröder, Shengyu Ni, Burkhard Madea, and Mark Stoneking
Supplementary Materials includes:
Supplementary Notes 1-2
Supplementary Figures 1-4
Supplementary Tables 1-8
Supplementary Note 1. Authenticity of the Results
In this note we consider several potential technical artifacts that might explain the tissue-specific
and allele-specific heteroplasmies. First, the postmortem interval between death and tissue sampling
varied from 24-72 hours, so heteroplasmies might reflect postmortem degradation of DNA (which
might be more pronounced in some tissues, such as liver). However, the correlation between the
number of heteroplasmies identified and the postmortem interval is not significant either across all
individuals (Spearman’s r=0.075, P=0.364) or within each specific tissue (all p-values > 0.04).
Moreover, postmortem damage should occur randomly with respect to positions in the sequence,
whereas we find distinctive nonrandom patterns, such as more heteroplasmies in the control region
(Fig. 1). The cause of death may also influence heteroplasmy; however none of the individuals died
from any disease known to be associated with mtDNA mutations, and only one death was attributed
to cancer. We investigated whether there was any association between cause of death (according to
the major categories in Table S1) and the number of heteroplasmies detected (Fig. S1.1).
Figure S1.1 Boxplots of the number of heteroplasmies detected in each tissue according to cause of
death (A: Cardiovascular; B: Traumatic injuries; C: Natural causes (non-cardiovascular); D:
Intoxication; E: Unclear or other). See Fig. 1 for tissue abbreviations.
Individuals dying from intoxication had significantly fewer heteroplasmies in small intestine, liver,
and myocardial muscle tissue than did individuals dying from other causes; however, individuals
dying from intoxication were also significantly younger than individuals dying from other causes
(age at death = 42 vs. 60; P = 0.0039, Mann-Whitney test), and heteroplasmies are strongly age-
related (Fig. 3). To control for this age effect, for each tissue we generated 10,000 random subsets of
individuals dying from other causes, containing the same number of samples as the set of individuals
dying from intoxication, and required the difference in average age between the two sets to be
less than or equal to three years. There were no significant differences in the number of
heteroplasmies detected in individuals dying from intoxication vs. individuals dying from other
causes in these age-controlled subsets. We thus conclude that, when age is controlled for, there is no
effect of cause of death on heteroplasmy incidence.
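The age-controlled comparison described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `age_matched_null`, the toy data, and the use of the subset mean as the comparison statistic are all assumptions, since the text does not state how the subsets were compared.

```python
import random
from statistics import mean

def age_matched_null(focal, others, n_subsets=10_000, max_age_gap=3.0,
                     max_tries=1_000_000, seed=0):
    """Draw random subsets of `others`, each the same size as `focal`,
    keeping only subsets whose mean age is within `max_age_gap` years
    of the focal group's mean age.

    focal, others: lists of (age, heteroplasmy_count) pairs for one tissue.
    Returns the mean heteroplasmy count of each accepted subset.
    """
    rng = random.Random(seed)
    focal_age = mean(age for age, _ in focal)
    subset_means = []
    tries = 0
    while len(subset_means) < n_subsets and tries < max_tries:
        tries += 1
        subset = rng.sample(others, len(focal))
        if abs(mean(age for age, _ in subset) - focal_age) <= max_age_gap:
            subset_means.append(mean(count for _, count in subset))
    return subset_means

# Hypothetical toy data (not taken from the study).
focal = [(38, 2), (41, 3), (44, 1), (45, 2)]
others = [(30, 1), (35, 2), (40, 2), (42, 3), (47, 2),
          (50, 4), (55, 3), (60, 5), (66, 6), (72, 7)]

null_means = age_matched_null(focal, others, n_subsets=1000)
focal_mean = mean(count for _, count in focal)
# Fraction of age-matched subsets with a mean count as low as the focal group.
frac_as_low = sum(m <= focal_mean for m in null_means) / len(null_means)
```

If `frac_as_low` is not small, the focal group is not unusual once age is matched, which is the pattern the note reports for intoxication.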
Second, all samples were pooled into a single library and sequenced together on multiple lanes,
eliminating potential batch effects due to variation between sequencing runs/lanes. Biased capture
during capture enrichment can also be excluded because individuals of different ages were included
in each pool of libraries that was subject to capture-enrichment, and allele-specific heteroplasmies
are correlated with age (Fig. S3), which would not be expected if biased capture were responsible for
the heteroplasmy observations.
Third, systematic differences in coverage across tissues, or across specific mtDNA regions, could
result in tissue-related differences in the detection of heteroplasmy. As shown in Fig. S1.2, coverage
is highest for myocardial muscle and lowest for blood, with the coverage for the remaining tissues
approximately the same, and there are characteristic “peaks” and “valleys” in the coverage across the
mtDNA genome.
Figure S1.2 Variation in coverage across the mtDNA genome per tissue. The X-axis is the position
in the mtDNA genome, while the Y-axis is the average coverage.
However, the number of heteroplasmies detected is not systematically correlated with coverage (Fig.
S1.3), nor does coverage differ with respect to the age of an individual (Fig. S1.4). Thus, variation in
coverage cannot explain the age-related correlations we see in heteroplasmy. Moreover, variation in
coverage does not differ between alleles for the allele-specific heteroplasmies (Fig. S1.5). We
therefore conclude that variation in coverage cannot explain the age-related, tissue-related, and
allele-specific heteroplasmies found in this study.
Figure S1.3 Number of heteroplasmies identified (Y-axis) vs. average coverage (X-axis) for each
tissue sample. The numbers after each tissue abbreviation are the Spearman rank correlation
coefficient, followed by the associated p-value.
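For readers unfamiliar with the statistic, a Spearman rank correlation coefficient can be computed as the Pearson correlation of the ranks, with tied values given their average rank. The sketch below uses only the standard library; the function names are illustrative, and the p-value calculation is omitted.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In the setting of Fig. S1.3, `x` would hold the average coverage of each sample of a tissue and `y` the number of heteroplasmies detected in that sample.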
Figure S1.4 Average coverage (Y-axis) vs. age for each individual (X-axis). The numbers after
each tissue abbreviation are the Spearman rank correlation coefficient, followed by the associated
p-value.
Figure S1.5 Boxplots of the variation in coverage for each of the seven tissue-specific and allele-
specific heteroplasmic positions shown in Fig. 2 of the main manuscript. In each plot, the coverage is
shown for each tissue and each consensus allele.
Fifth, if we focus on heteroplasmies shared by two or more tissues in the same individual, this
should enrich for heteroplasmies that were either transmitted from the mother or occurred early in
development (prior to the divergence of the tissues that share the heteroplasmy). A tree relating
tissues based on such shared heteroplasmies closely corresponds to patterns of tissue development
(Fig. S1.6), indicating that shared heteroplasmies are behaving as expected.
Sixth, previous studies of fewer tissues and individuals have found some of the same tissue-
related and allele-related heteroplasmies that we find (Table S2). However, these previous studies
analyzed too few samples to identify the significant tissue-related and allele-related patterns that we
have identified.
As a further check on the reproducibility of the results, we selected 15 samples for resequencing.
New libraries were prepared from the DNA extracts, pooled, captured for mtDNA sequencing as
before (1) and sequenced on the HiSeq platform (paired-end reads, 96 bp); the results are shown in
Fig. S1.7. There is a very high and significant correlation between the alternative allele frequencies
at each heteroplasmic site detected in the two HiSeq runs (r = 0.971, p <0.0001); thus the original
findings are reproducible with the same technology.
Figure S1.6 Neighbor-joining tree based on the alternative allele frequency at heteroplasmic sites
shared by two or more tissues. See legend to Fig. 1 for tissue abbreviations.
Figure S1.7 Correlation between alternative allele frequencies at heteroplasmic sites for 15 samples
resequenced on the HiSeq platform.
Finally, we also applied a different technology, namely droplet digital PCR (ddPCR, see
Methods) to independently estimate the heteroplasmy alternative allele frequency in a subset of the
data. We chose 8 positions (Table S7); one position (16086) exhibits heteroplasmy in virtually every
tissue, and was analyzed in every tissue from five individuals. The remaining positions show tissue-
related heteroplasmy (including NS mutations at positions 4142, 10851, 11126, and 12569 that occur
preferentially in liver). The results are provided in Table S8 and shown in Fig. S1.8; the correlation
between alternative allele frequencies (where the consensus allele is defined as the consensus among
all of the tissues from an individual) estimated from sequencing vs. ddPCR is quite high (Pearson’s r
= 0.996; p <0.00001). Even when restricting the comparison to low-level heteroplasmies, with
alternative allele frequencies less than 0.05, the correlation remains quite convincing (n= 57, r =
0.835, p<0.00001). Thus, these results provide independent confirmation of the heteroplasmies
inferred from sequencing.
In sum, we are not able to find any experimental or analytical artifact that could explain the age-
related, tissue-specific, and allele-specific heteroplasmies that we find in this study.
Figure S1.8 Correlation between heteroplasmy level (alternative allele frequency) estimated from
sequencing, vs. that estimated from ddPCR. Note that the consensus allele is defined from the
consensus sequences from all of the tissues in an individual, and hence the alternative allele
frequency in a specific tissue can be greater than 0.5.
Supplementary Note 2. Detecting positive selection involving nonsynonymous heteroplasmies.
Our introduction of the hN/hS statistic is motivated by a similar statistic, the dN/dS ratio, which
is commonly used as a test for selection on protein-coding genes (2): dN is the number of
nonsynonymous differences per nonsynonymous site between two sequences, while dS is the number
of synonymous differences per synonymous site. An analogous statistic is the ka/ks ratio, in which
ka is the number of nonsynonymous changes along a lineage per nonsynonymous site, and ks is the
number of synonymous changes along the same lineage per synonymous site; ka/ks ratios have, for
example, been used to evaluate claims of climate-related selection on human mtDNA variation (3).
For our hN/hS statistic, the numerator, hN, is the number of nonsynonymous heteroplasmies divided
by the total number of sites in the sequence where a mutation would produce a nonsynonymous
difference. The denominator, hS, is the number of synonymous heteroplasmies divided by the total
number of sites in the sequence where a mutation would produce a synonymous difference. Thus, the
ratio hN/hS is normalized for the number of nonsynonymous and synonymous sites in the sequence
(typically, about 70% of the positions in a protein-coding sequence are nonsynonymous sites and
30% are synonymous, but the actual numbers vary depending on the specific codons used).
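The definition above can be written compactly as follows. The symbols $H_N$, $H_S$ (numbers of nonsynonymous and synonymous heteroplasmies) and $L_N$, $L_S$ (numbers of nonsynonymous and synonymous sites) are notation introduced here for illustration and do not appear in the original text.

```latex
h_N = \frac{H_N}{L_N}, \qquad
h_S = \frac{H_S}{L_S}, \qquad
\frac{h_N}{h_S} = \frac{H_N / L_N}{H_S / L_S} = \frac{H_N}{H_S} \cdot \frac{L_S}{L_N}
```

As a consistency check under this notation, the liver-specific values in Table S6 ($H_N = 103$, $H_S = 11$, $h_N/h_S = 3.11$) would imply $L_S / L_N \approx 3.11 \times 11 / 103 \approx 0.33$, that is, roughly three nonsynonymous sites for every synonymous site.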
The purpose of dividing hN by hS is to distinguish positive selection on nonsynonymous
heteroplasmies (which would increase hN only) from an elevated mutation rate (which would
increase both hN and hS). The conventional interpretation of hN/hS ratios is as follows:
hN/hS < 1: some degree of purifying (negative) selection on nonsynonymous heteroplasmies (fewer
nonsynonymous than synonymous heteroplasmies)
hN/hS = 1: complete neutrality (no selection against or for nonsynonymous heteroplasmies)
hN/hS > 1: positive selection (more nonsynonymous than synonymous heteroplasmies)
Note that the lower the hN/hS ratio, the greater the degree of negative selection against amino acid
changes. If functional constraints are relaxed, such that the negative selection pressure is decreased,
then hN/hS will increase, but with relaxed constraints the hN/hS ratio is not expected to become
greater than one. If nonsynonymous and synonymous heteroplasmies are occurring at the same rate
(as would be expected, for example, with postmortem degradation), then the expectation is hN/hS =
1. However, it has been shown that the above interpretation of dN/dS (in our case, hN/hS) ratios
holds strictly only when distantly related lineages are compared, such that dN and dS can be taken to
represent fixed differences between lineages; when comparing polymorphisms within a species, the
above relationships may not hold – e.g., it is possible to get dN/dS ratios that are less than one even
with positive selection, or dN/dS ratios greater than one without positive selection (4, 5). The
standard tests for significance of a dN/dS ratio therefore may not give accurate results when applied
to intraspecific data, and we would expect this to also hold for intra-individual data.
Therefore, in order to investigate the significance of the observed hN/hS ratio of 3.11 in liver-
specific heteroplasmies, we used a resampling approach that takes into account the observed
spectrum of heteroplasmic mutations. There were 114 liver-specific heteroplasmies in the mtDNA
coding region, with the following spectrum of mutations:
A>C: 2
A>G: 3
C>T: 3
G>A: 62
T>C: 43
T>G: 1
We took the coding portion of the rCRS sequence, applied the above spectrum of changes to
positions at random, and calculated the resulting hN/hS ratio. We repeated this procedure 100,000
times to generate a distribution of hN/hS ratios that would be expected if the observed spectrum of
mutational changes were occurring at random with respect to nonsynonymous vs. synonymous sites,
taking into account the actual codons used in human mtDNA sequences. The results are shown in
Fig. S2.1, and there are two important conclusions. First, the average hN/hS ratio is 1.4, and the
probability of a random hN/hS ratio that is greater than one is 0.93. This is in accordance with
previous observations that even under neutrality, dN/dS ratios greater than one can be obtained for
within-population comparisons (4, 5). Second, the empirical probability that a random hN/hS ratio
exceeds the observed value of 3.11 for liver-specific heteroplasmies is only 0.00241. This result is
significant after Bonferroni correction for the number of independent tests of hN/hS ratios: there are
16 such tests (12 based on tissue-shared heteroplasmies and 4 based on tissue-specific
heteroplasmies; see Table S5), resulting in an adjusted significance level of 0.05/16 = 0.00313. Thus,
this analysis provides strong evidence against the null hypothesis that relaxed constraints against
nonsynonymous mutations are producing the observed hN/hS ratio in liver tissue. Instead, these
results favor the alternative hypothesis of positive selection for nonsynonymous somatic mutations in
liver.
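The resampling procedure described in this note can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: a randomly generated toy coding sequence stands in for the coding portion of the rCRS, sites are counted with a Nei-Gojobori-style fractional scheme (the text does not specify the counting method), the vertebrate mitochondrial genetic code is taken from NCBI translation table 2, and all function names are hypothetical. For simplicity, different mutation classes may occasionally hit the same position.

```python
import random
from itertools import product

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial); codons in TCAG order.
AMINO = ("FFLLSSSSYY**CCWW" "LLLLPPPPHHQQRRRR"
         "IIMMTTTTNNKKSS**" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def is_synonymous(seq, pos, alt):
    """True if changing seq[pos] to alt leaves the encoded amino acid unchanged."""
    start = pos - pos % 3
    codon = seq[start:start + 3]
    mutant = codon[:pos % 3] + alt + codon[pos % 3 + 1:]
    return CODE[codon] == CODE[mutant]

def site_counts(seq):
    """Fractional (nonsynonymous, synonymous) site counts for a coding sequence."""
    syn = 0.0
    for pos, ref in enumerate(seq):
        syn += sum(is_synonymous(seq, pos, alt) for alt in BASES if alt != ref) / 3
    return len(seq) - syn, syn

def hn_hs(n_ns, n_s, ns_sites, s_sites):
    """hN/hS ratio; infinite when no synonymous change was drawn."""
    if n_s == 0:
        return float("inf")
    return (n_ns / ns_sites) / (n_s / s_sites)

def resample_ratios(seq, spectrum, n_reps=1000, seed=0):
    """Apply the mutation spectrum to random positions and collect hN/hS ratios."""
    rng = random.Random(seed)
    ns_sites, s_sites = site_counts(seq)
    by_base = {b: [i for i, x in enumerate(seq) if x == b] for b in BASES}
    ratios = []
    for _ in range(n_reps):
        n_ns = n_s = 0
        for (ref, alt), count in spectrum.items():
            for pos in rng.sample(by_base[ref], count):
                if is_synonymous(seq, pos, alt):
                    n_s += 1
                else:
                    n_ns += 1
        ratios.append(hn_hs(n_ns, n_s, ns_sites, s_sites))
    return ratios

def empirical_p(null_ratios, observed):
    """Fraction of resampled ratios that exceed the observed ratio."""
    return sum(r > observed for r in null_ratios) / len(null_ratios)

# Observed liver-specific spectrum from this note (114 coding-region changes).
liver_spectrum = {("A", "C"): 2, ("A", "G"): 3, ("C", "T"): 3,
                  ("G", "A"): 62, ("T", "C"): 43, ("T", "G"): 1}

# Hypothetical stand-in for the rCRS coding sequence: 300 random sense codons.
toy_rng = random.Random(1)
sense_codons = [c for c, aa in CODE.items() if aa != "*"]
toy_seq = "".join(toy_rng.choice(sense_codons) for _ in range(300))

null_ratios = resample_ratios(toy_seq, liver_spectrum, n_reps=200)
p_value = empirical_p(null_ratios, observed=3.11)
bonferroni_alpha = 0.05 / 16  # 16 tests, as stated in the text
```

Because the toy sequence is random, `p_value` here says nothing about the real data; the point is only to show how the null distribution, the empirical p-value, and the Bonferroni threshold fit together.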
Figure S2.1 Distribution of hN/hS ratios obtained from resampling with the observed mutational
spectrum for liver-specific heteroplasmic mutations in the mtDNA coding region. The red line shows
the hN/hS ratio of 3.11 that is observed for liver-specific heteroplasmies, and the associated
empirical p-value.
We adopted a similar resampling approach to investigate if there was an excess of liver-specific
NS mutations that were predicted to have a high or medium risk of a functional effect on the protein.
We observed 103 NS liver-specific mutations; for 100 of these, the risk of a functional effect could
be assigned by the MutationAssessor software (6), and 84% are predicted to have a high or medium
risk of a functional effect. Using the observed mutational spectrum for these 100 mutations, we
resampled 100 NS mutations at random (based on the rCRS), predicted the risk of a functional effect
for each mutation, and repeated this process 100,000 times, to generate a null distribution for the
frequency of high/medium risk NS mutations. The results are shown in Fig. S2.2, and indicate that
the probability by chance of observing 84% of NS mutations with a high or medium risk of a
functional effect is only 0.00179. This analysis thus suggests that high/medium risk NS somatic
mutations are occurring preferentially in liver tissue.
Figure S2.2 Distribution of the proportion of predicted high/medium risk NS mutations, based on
random resampling of NS mutations conditioned on the mutation spectrum for liver-specific NS
heteroplasmies. The red line indicates the observed proportion of 0.84 for liver-specific NS
mutations and the associated empirical p-value.
Figure S1 Age distribution of the individuals in this study.
Figure S2 Sequencing coverage for each tissue. See legend to Fig. 1 for tissue abbreviations.
Figure S3 Correlation between age and level of heteroplasmy. The label for each plot indicates the
tissue, correlation coefficient, and p-value for the null hypothesis that the correlation coefficient is 0.
a np 72, consensus allele T
b np 189, consensus allele A
c np 94, consensus allele G
d np 408, consensus allele T
e np 64, consensus allele C
f np 16327, consensus allele C
g np 60, consensus allele T
h np 564, consensus allele G
i np 204, consensus allele T
j np 16148, consensus allele C
Figure S4 Predicted secondary structure in the genomic region surrounding each heteroplasmic
position exhibiting a significant allele-specific effect.
a np 72: left T, right C
b np 185: left A, right G
c np 189: left A, right G
d np 16086: left C, right T
e np 16092: left C, right T
f np 16093: left C, right T
g np 16129: left A, right G
Table S1. Major categories of cause of death for the subjects in this study.
Cause of death                                                   No.   Percentage   Avg. Age
Cardiovascular (myocardial infarction, coronary disease, etc.)   49    32.2         60
Traumatic injuries                                               35    23.0         54
Natural causes (non-cardiovascular)                              33    21.7         63
Intoxication                                                     16    10.5         42
Unclear or other                                                 19    12.5         65
TOTAL                                                            152   100          58
32
Table S2 List of tissue and allele-related heteroplasmies found in this study that have also been reported in the same tissue
(or in tumors of that tissue) in previous studies.
Position  Nucleotides  Tissues showing allele specificity in this study  Tissues previously reported           Tumors previously reported
60        T>C          KI, LIV                                           KI [1,2], LIV [1,2]
64        C>A          KI, SM                                            SM [1,2]
72        T>C          KI, LIV, SM                                       KI [1,2], LIV [1,2], SM [1,3,10]      LIV [4,5]
94        G>A          KI, LIV                                           KI [2], LIV [2]
185       A>G          All                                               SM [3,10]                             Pancreas [6]
189       A>G          All except BL, KI, SK, OV                         CER [1], LIV [1], SM [1,2,3,7]        LIV [4,5]
203       G>A          KI, LIV                                           LIV [2]
408       T>A          SM                                                SM [1,2]
16092     C>T          All                                               CER [1], CEL [1]
16093     C>T          All                                               multiple [1,2,8]
16129     A>G          All except BL, SK                                 Skeletal remains [9]
1 He, Y. et al. Nature 464, 610-614 (2010).
2 Samuels, D.C. et al. PLoS Genetics 9, e1003929 (2013).
3 Zsurka, G. et al. Nature Genetics 37, 873-877 (2005).
4 Lee, H.C. et al. Mutation Research 547, 71-78 (2004).
5 Zhang, R. et al. Journal of Experimental & Clinical Cancer Research 29, 130 (2010).
6 Navaglia, F. et al. American Journal of Clinical Pathology 126, 593-601 (2006).
7 Theves, C. et al. Journal of Forensic Sciences 51, 865-873 (2006).
8 Krjutškov, K. et al. Current Genetics 60, 1-6 (2013).
9 Nelson, K. & Melton, T. Journal of Forensic Sciences 52, 557-61 (2007).
10 Durham, S.E., Samuels, D.C. & Chinnery, P.F. Neuromuscul Disord 16, 381-386 (2006).
33
Table S3 Mutation pairs that occurred more often than expected from their frequencies in the population (p-value < 0.001, q-value < 0.005)
Tissue   Sample size   First mutation np   Count   Second mutation np   Count   Observed count with both mutations   Expected count with both mutations
MM 149 204 26 564 37 16 6
SM 150 408 65 16327 46 44 20
KI 151 185 8 189 11 8 1
LI 150 185 9 16126 13 6 1
SM 150 185 12 16126 9 5 1
CO 152 185 9 16126 11 5 1
LIV 151 185 8 16126 14 5 1
SI 150 152 9 564 12 5 1
BL 139 12684 4 12705 6 3 0
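The expected counts in Table S3 are consistent with the product of the two mutation counts divided by the sample size (for example, 26 × 37 / 149 ≈ 6 for the MM row). The sketch below reproduces that calculation and adds a one-sided hypergeometric tail probability as one plausible way to test for excess co-occurrence. The choice of the hypergeometric test is an assumption, as the table does not name the test used, and the function names are illustrative.

```python
from math import comb

def expected_joint(count_a, count_b, n):
    """Expected number of samples carrying both mutations under independence."""
    return count_a * count_b / n

def hypergeom_upper_tail(observed, count_a, count_b, n):
    """P(X >= observed), where X is the number of samples carrying both
    mutations if the count_b carriers of the second mutation were drawn
    at random from n samples, count_a of which carry the first mutation."""
    total = comb(n, count_b)
    upper = min(count_a, count_b)
    favourable = sum(comb(count_a, k) * comb(n - count_a, count_b - k)
                     for k in range(observed, upper + 1))
    return favourable / total

# MM row of Table S3: n = 149, np 204 in 26 samples, np 564 in 37, both in 16.
mm_expected = expected_joint(26, 37, 149)
mm_p = hypergeom_upper_tail(16, 26, 37, 149)
```

Under these assumptions `mm_expected` rounds to the tabulated value of 6, and `mm_p` falls below the 0.001 cutoff given in the table caption.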
34
Table S4 Mutation spectrum for tissue-specific and tissue-shared heteroplasmies, and for polymorphisms (differences among
consensus sequences) from the same individuals. AAF, alternative allele frequency.
Mutations   Tissue-specific   Proportion (%)   Tissue-shared   Proportion (%)   Polymorphism   Proportion (%)   Heteroplasmies AAF>10%   Proportion (%)
AT 49 11.7 22 2.8 3 0.5 26 3.4
GA 185 44 342 44 281 44 361 46.8
CA 27 6.5 34 4.4 19 3 20 2.6
GT 14 3.3 2 0.3 5 0.8 0 0
CT 144 34.2 372 47.8 325 51.1 353 45.8
GC 2 0.5 5 0.6 3 0.5 11 1.4
Transversions 92 21.9 63 8.1 30 4.7 57 7.4
Transitions 329 78.1 714 91.9 606 95.3 714 92.6
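The transition and transversion totals in Table S4 follow directly from the six mutation classes: a change is a transition when both bases are purines (A, G) or both are pyrimidines (C, T), and a transversion otherwise. A short sketch using the tissue-specific column, with the two-letter labels read as unordered base pairs (an interpretation of the table, not stated in it):

```python
PURINES = {"A", "G"}

def is_transition(pair):
    """True if both bases are purines or both are pyrimidines."""
    a, b = pair
    return (a in PURINES) == (b in PURINES)

def ts_tv_summary(counts):
    """Return (transitions, transversions, % transitions, % transversions)."""
    total = sum(counts.values())
    ts = sum(n for pair, n in counts.items() if is_transition(pair))
    tv = total - ts
    return ts, tv, round(100 * ts / total, 1), round(100 * tv / total, 1)

# Tissue-specific column of Table S4.
tissue_specific = {"AT": 49, "GA": 185, "CA": 27, "GT": 14, "CT": 144, "GC": 2}

summary = ts_tv_summary(tissue_specific)  # (329, 92, 78.1, 21.9)
```

The result matches the "Transitions" and "Transversions" rows of the table for this column.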
35
Table S5 Nonsynonymous (NS) and synonymous (S) heteroplasmies that are either shared by two or more tissues, or specific to a single tissue.
“New” indicates heteroplasmies that have not been reported as polymorphisms (Phylotree Build 15). hN/hS is the ratio of NS heteroplasmies per
NS site to S heteroplasmies per S site.
Shared heteroplasmies
Tissue  Number  NS   S    New NS  New S  hN/hS
BL      234     46   44   29      8      0.35
SI      340     46   32   26      7      0.48
LI      330     46   33   27      9      0.46
KI      458     42   30   23      7      0.47
LIV     497     38   25   20      6      0.5
MM      351     38   26   21      6      0.49
SM      486     40   27   22      6      0.49
CO      416     41   28   22      7      0.49
CEL     331     43   30   24      7      0.48
CER     374     40   27   20      6      0.49
SK      265     46   41   26      8      0.37
OV      74      12   8    10      1      0.5
All     4156    478  351  270     78     0.45

Tissue-specific heteroplasmies
Tissue  Number  NS   S    New NS  New S  hN/hS
BL      47      19   8    14      2      0.79
SI      -       -    -    -       -      -
LI      2       -    1    -       -      -
KI      38      1    -    1       -      -
LIV     146     103  11   95      2      3.11
MM      28      1    -    -       -      -
SM      129     1    -    1       -      -
CO      -       -    -    -       -      -
CEL     -       -    -    -       -      -
CER     6       2    2    -       -      0.33
SK      25      4    1    2       -      1.32
OV      -       -    -    -       -      -
All     421     131  23   113     4      1.89

All heteroplasmies
Tissue  Number  NS   S    New NS  New S  hN/hS
BL      281     65   52   43      10     0.42
SI      340     46   32   26      7      0.48
LI      332     46   34   27      9      0.45
KI      496     43   30   24      7      0.48
LIV     643     141  36   115     8      1.3
MM      379     39   26   21      6      0.5
SM      615     41   27   23      6      0.5
CO      416     41   28   22      7      0.49
CEL     331     43   30   24      7      0.48
CER     380     42   29   20      6      0.48
SK      290     50   42   28      8      0.4
OV      74      12   8    10      1      0.5
All     4577    609  374  383     82     0.54
36
Table S6 Liver-specific nonsynonymous and synonymous heteroplasmies per mtDNA
protein-coding gene. The P-value is for the null hypothesis that the hN/hS ratio is equal to
one.
Gene NS S hN/hS P-value
ND2 5 1 1.67 1
ND5 24 0 >7.87 0.0026
ND4 16 1 5.46 0.08843
ND1 15 1 5.35 0.08514
ND4L 2 1 0.71 1
COX3 5 2 0.81 0.6814
ATP6 4 0 >1.39 0.577
ND3 7 1 2.23 0.6857
CYTB 11 2 1.79 0.7425
ND6 6 1 1.95 1
COX1 6 0 1.96 0.3459
COX2 2 1 0.64 0.5644
ATP8 0 0 NA NA
All 103 11 3.11 0.0024
37
Table S7 List of positions and associated primers and probes analyzed in ddPCR experiments.
Numbers indicate positions; F and R indicate forward and reverse PCR primers; Probe indicates
the allele-specific probe; 5´ Modification indicates the fluorescent label attached to the probe.
Position and primer/probe   Sequence (5´-3´)   5´ Modification
16086_F TCATGGGGAAGCAGATTTGGG
16086_R ATATTCATGGTGGCTGGCAGT
16086_Probe_c CCATCAACAACCGCcATGTATTTCGTACA [6FAM]
16086_Probe_t CCCATCAACAACCGCtATGTATTTCGTACA [HEX]
11126_F CGCCACTTATCCAGTGAACC
11126_R ATCGGGTGATGATAGCCAAG
11126_Probe_g ATATCTTCTTCgAAACCACACTTATCCCC [6FAM]
11126_Probe_a ATATCTTCTTCaAAACCACACTTATCCCC [HEX]
4142_F CTCCCCTGAACTCTACACAACA
4142_R GGGGAAATGCTGGAGATTGT
4142_Probe_g AGCATACCCCCgATTCCGCT [6FAM]
4142_Probe_a AGCATACCCCCaATTCCGCT [HEX]
10851_F GCTAAAACTAATCGTCCCAACAA
10851_R AAAGGTTGGGGAACAGCTAAA
10851_Probe_g CACAACCACCCACAgCCTAATTATTAGC [6FAM]
10851_Probe_a CACAACCACCCACAaCCTAATTATTAGC [HEX]
12569_F TCAGTCTCTTCCCCACAACA
12569_R CGAACAATGCTACAGGGATG
12569_Probe_c CCAGCTCTCCCcAAGCTTCAAACTAG [6FAM]
12569_Probe_t CCAGCTCTCCCtAAGCTTCAAACTAG [HEX]
408_F CCAAACCCCAAAAACAAAGA
408_R TGGGAGGGGAAAATAATGTG
408_Probe_t CAAATTTTATCTTTtGGCGGTATGCACTT [6FAM]
408_Probe_a CAAATTTTATCTTTaGGCGGTATGCACTT [HEX]
564_F CTAACCCCATACCCCGAAC
564_R GGTGATGTGAGCCCGTCTA
564_Probe_g CCAAACCCCAAAgACACCCCC [6FAM]
564_Probe_a CCAAACCCCAAAaACACCCCC [HEX]
16327_F CAAACCTACCCACCCTTAACA
16327_R ATTGATTTCACGGAGGATGG
16327_Probe_c CATAAAGCCATTTAcCGTACATAGCACATT [6FAM]
16327_Probe_t CATAAAGCCATTTAtCGTACATAGCACATT [HEX]
38
Table S8 Comparison of alternative allele frequencies determined by ddPCR vs. sequencing on the
Illumina platform. Samples are indicated by tissue abbreviation (from the legend to Fig. 1) and ID
number; ddPCR and Sequencing are alternative allele frequencies inferred by ddPCR and Illumina
sequencing, respectively.
Position   Sample   Consensus allele   Alternative allele   ddPCR   Sequencing
16086 KI200 C T 0.032 0.036
16086 KI202 C T 0.048 0.044
16086 KI282 C T 0.021 0.023
16086 KI354 C T 0.036 0.024
16086 KI375 C T 0.020 0.021
16086 CEL200 C T 0.061 0.089
16086 CEL202 C T 0.050 0.051
16086 CEL282 C T 0.081 0.083
16086 CEL354 C T 0.139 0.137
16086 CEL375 C T 0.050 0.066
16086 CER200 C T 0.250 0.232
16086 CER202 C T 0.117 0.126
16086 CER282 C T 0.183 0.194
16086 CER354 C T 0.372 0.355
16086 CER375 C T 0.136 0.153
16086 BL200 C T 0.033 0.018
16086 BL202 C T 0.010 0.011
16086 BL354 C T 0.023 0.010
16086 BL375 C T 0.000 0.000
16086 MM200 C T 0.242 0.244
16086 MM282 C T 0.085 0.086
16086 MM354 C T 0.451 0.445
16086 MM375 C T 0.019 0.015
16086 SM200 C T 0.898 0.909
16086 SM202 C T 0.390 0.464
16086 SM282 C T 0.900 0.917
16086 SM354 C T 0.950 0.945
16086 SM375 C T 0.313 0.353
16086 SK200 C T 0.021 0.024
16086 SK202 C T 0.022 0.010
16086 SK282 C T 0.024 0.005
16086 SK354 C T 0.027 0.018
16086 SK375 C T 0.012 0.006
16086 CO200 C T 0.239 0.228
16086 CO202 C T 0.158 0.143
16086 CO282 C T 0.156 0.151
16086 CO354 C T 0.286 0.291
16086 CO375 C T 0.148 0.156
16086 LI200 C T 0.093 0.078
16086 LI202 C T 0.083 0.077
16086 LI282 C T 0.092 0.091
16086 SI200 C T 0.139 0.135
16086 SI202 C T 0.093 0.116
16086 SI282 C T 0.034 0.030
16086 SI354 C T 0.159 0.150
16086 SI375 C T 0.017 0.007
16086 LIV200 C T 0.024 0.028
16086 LIV282 C T 0.025 0.028
16086 LIV354 C T 0.048 0.047
16086 LIV202 C T 0.059 0.062
16086 LIV375 C T 0.064 0.058
16086 LIV375 C T 0.072 0.058
11126 SM290 G A 0.027 0.029
11126 LIV240 G A 0.028 0.034
11126 LIV248 G A 0.039 0.037
11126 LIV289 G A 0.072 0.056
11126 LIV307 G A 0.024 0.024
11126 LIV361 G A 0.086 0.079
11126 LIV197 G A 0.020 0.024
4142 LIV264 G A 0.023 0.026
4142 LIV289 G A 0.039 0.034
4142 LIV334 G A 0.029 0.030
10851 LIV338 G A 0.057 0.054
10851 LIV344 G A 0.024 0.026
10851 LIV197 G A 0.018 0.024
12569 LIV261 T C 0.135 0.127
12569 LIV315 T C 0.021 0.040
408 SM192 T A 0.066 0.105
408 CO192 T A 0.010 0.010
408 SM239 T A 0.089 0.122
408 SM289 T A 0.226 0.282
408 SM248 T A 0.263 0.231
408 CO248 T A 0.022 0.011
408 SM379 T A 0.062 0.090
408 SM323 T A 0.060 0.057
408 SM279 T A 0.058 0.063
408 SM222 T A 0.033 0.038
408 SM347 T A 0.043 0.035
408 SM268 T A 0.024 0.020
408 SM214 T A 0.017 0.021
564 MM315 G A 0.011 0.014
564 SM315 G A 0.058 0.052
564 MM290 G A 0.047 0.044
564 MM193 G A 0.046 0.043
564 MM381 G A 0.029 0.029
564 MM279 G A 0.024 0.030
564 MM220 G A 0.024 0.020
564 MM227 G A 0.023 0.022
564 SM326 G A 0.030 0.026
16327 SM362 C T 0.185 0.199
16327 SM289 C T 0.124 0.139
16327 SM244 C T 0.093 0.114
16327 SM235 C T 0.045 0.067
16327 SM248 C T 0.065 0.082
16327 SM227 C T 0.033 0.053
16327 SM355 C T 0.034 0.050
16327 SM230 C T 0.034 0.040
16327 SM380 C T 0.025 0.030
16327 LI380 C T 0.002 0.007
16327 SM314 C T 0.021 0.020
16327 SM204 C T 0.072 0.093
16327 SK204 C T 0.011 0.010
Additional References
1. Maricic T, Whitten M, & Paabo S (2010) Multiplexed DNA sequence capture of mitochondrial genomes using PCR products. PLoS One 5:e14004.
2. Nielsen R (2005) Molecular signatures of natural selection. Annu Rev Genet 39:197-218.
3. Ingman M & Gyllensten U (2007) Rate variation between mitochondrial domains and adaptive evolution in humans. Hum Mol Genet 16:2281-2287.
4. Kryazhimskiy S & Plotkin JB (2008) The population genetics of dN/dS. PLoS Genet 4:e1000304.
5. Mugal CF, Wolf JB, & Kaj I (2014) Why time matters: codon evolution and the temporal dynamics of dN/dS. Mol Biol Evol 31:212-231.
6. Reva B, Antipin Y, & Sander C (2011) Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res 39:e118.