clonal evolution of acute myeloid leukemia revealed by ... · 1 clonal evolution of acute myeloid...
TRANSCRIPT
Clonal Evolution of Acute Myeloid Leukemia Revealed by High-Throughput Single-Cell 1 Genomics 2
Kiyomi Morita1,8*, Feng Wang2*, Katharina Jahn6*, Jack Kuipers6, Yuanqing Yan7, Jairo 3 Matthews1, Latasha Little2, Curtis Gumbs2, Shujuan Chen2, Jianhua Zhang2, Xingzhi Song2, 4 Erika Thompson3, Keyur Patel4, Carlos Bueso-Ramos4, Courtney D DiNardo1, Farhad Ravandi1, 5 Elias Jabbour1, Michael Andreeff1, Jorge Cortes1, Marina Konopleva1, Kapil Bhalla1, Guillermo 6 Garcia-Manero1, Hagop Kantarjian1, Niko Beerenwinkel6†, Nicholas Navin3,5, P Andrew 7 Futreal2† and Koichi Takahashi1,2† 8 9 Departments of 1Leukemia, 2Genomic Medicine, 3 Genetics, 4Hematopathology, 5Bioinformatics, 10 The University of Texas MD Anderson Cancer Center, Houston, Texas, USA 11 6Department of Biosystems Science and Engineering, Swiss Federal Institute of Technology in 12 Zurich, Zurich, Switzerland 13 7Department of Neurosurgery, The University of Texas Health Science Center at Houston, 14 Houston, Texas, USA 15 8Department of Hematology and Oncology, Graduate School of Medicine, The University of 16 Tokyo, Tokyo, Japan 17 18 19 20 *These authors contributed equally to this work. 21 22 †Correspondence to: 23 Niko Beerenwinkel, Ph.D. 24 Department of Biosystems Science and Engineering, Swiss Federal Institute of Technology in 25 Zurich, Mattenstrasse 26, 4058, Basel, Switzerland, Email: [email protected] 26 27 28 P Andrew Futreal, Ph.D. 29 Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center,1881 30 East Road, Unit 1954, Houston, TX 77054, USA; Email: [email protected] 31 32 Koichi Takahashi, M.D. 33 Department of Leukemia and Genomic Medicine, The University of Texas MD Anderson 34 Cancer Center, 1515 Holcombe Boulevard, Unit 428, Houston, TX 77030, USA; Email: 35 [email protected] 36 37 38 39
Summary 1
One of the pervasive features of cancer is the diversity of mutations found in malignant 2
cells within the same tumor; a phenomenon called clonal diversity or intratumor heterogeneity. 3
Clonal diversity allows tumors to adapt to the selective pressure of treatment and likely 4
contributes to the development of treatment resistance and cancer recurrence. Thus, the ability to 5
precisely delineate the clonal substructure of a tumor, including the evolutionary history of its 6
development and the co-occurrence of its mutations, is necessary to understand and overcome 7
treatment resistance. However, DNA sequencing of bulk tumor samples cannot accurately 8
resolve complex clonal architectures. Here, we performed high-throughput single-cell DNA 9
sequencing to quantitatively assess the clonal architecture of acute myeloid leukemia (AML). 10
We sequenced a total of 556,951 cells from 77 patients with AML for 19 genes known to be 11
recurrently mutated in AML. The data revealed clonal relationship among AML driver mutations 12
and identified mutations that often co-occurred (e.g., NPM1/FLT3-ITD, DNMT3A/NPM1, 13
SRSF2/IDH2, and WT1/FLT3-ITD) and those that were mutually exclusive (e.g., NRAS/KRAS, 14
FLT3-D835/ITD, and IDH1/IDH2) at single-cell resolution. Reconstruction of the tumor 15
phylogeny uncovered history of tumor development that is characterized by linear and branching 16
clonal evolution patterns with latter involving functional convergence of separately evolved 17
clones. Analysis of longitudinal samples revealed remodeling of clonal architecture in response 18
to therapeutic pressure that is driven by clonal selection. Furthermore, in this AML cohort, 19
higher clonal diversity (≥4 subclones) was associated with significantly worse overall survival. 20
These data portray clonal relationship, architecture, and evolution of AML driver genes with 21
unprecedented resolution, and illuminate the role of clonal diversity in therapeutic resistance, 22
relapse and clinical outcome in AML. 23
Main 1
A growing body of evidence supports the role of clonal diversity in therapeutic 2
resistance, recurrence, and poor outcomes in cancer 1. Clonal diversity also reflects the history of 3
the accumulation of somatic mutations within a tumor. Thus, a precise characterization of clonal 4
diversity reveals not only the extent of a tumor’s clonal complexity but also the evolutionary 5
history of the tumor’s development. Much of the work characterizing the clonal architecture of 6
tumors has been done by computational inference using variant allele fraction (VAF) data from 7
massively parallel DNA sequencing of bulk tumor samples 2,3. However, the ability to infer 8
clonal heterogeneity and tumor phylogeny from bulk sequencing data is inherently limited, 9
because bulk sequencing techniques cannot reliably infer mutation co-occurrences and hence 10
often fail in reconstructing clonal substructure. 11
Single-cell DNA sequencing (scDNA-seq) can address some of these challenges 4-8. 12
However, until recently, the available methods required laborious single-cell isolation protocols 13
and suffered from low cell throughput, limited gene coverage, and technical artifacts from 14
whole-genome amplification that hindered their ability to characterize clonal architecture with 15
precision 9. Recent technological advances now allow rapid single-cell genotyping of targeted 16
cancer-related genes in thousands of cells. We previously described the performance and 17
feasibility of a new scDNA-seq platform (Tapestri® , Mission Bio, Inc.) in primary samples 18
from 2 patients with acute myeloid leukemia (AML) 10. Here, using this method, we conducted 19
scDNA-seq in 91 AML samples from 77 patients and uncovered the landscape of AML clonal 20
architecture at single-cell resolution. Using the data, we reconstructed the mutational history of 21
driver genes, some of which are therapeutic targets, and identified both linear and branching 22
clonal evolution patterns in AML. Additionally, we studied dynamic changes of clonal 23
architecture in response to therapies and analyzed the clinical implications of clonal diversity in 1
AML. 2
3
The landscape of driver mutations in AML at single-cell resolution 4
We analyzed bone marrow mononuclear cells (BMNC) from 77 AML patients, of which 5
64 (83%) were previously untreated, and 13 (17%) had relapsed or refractory disease (detailed 6
clinical characteristics are summarized in Supplementary Table 1). The cohort was enriched with 7
samples with normal diploid karyotype (N = 68, 88%) to avoid allelic imbalance affecting the 8
genotype calling. The median bone marrow blast percentage was 44% (interquartile range [IQR]: 9
29%-67%). A median of 7,584 BMNC (IQR: 6,194-8,361) per sample were sequenced by the 10
scDNA-seq platform (Fig. 1a). Across 40 amplicons targeting 19 known AML driver genes, 11
scDNA-seq resulted in a median of 25× coverage per amplicon per cell (IQR: 12×-43×, 12
Extended Data Fig. 1). The amplicons covering guanine-cytosine–rich sequences, such as 13
GATA2, SRSF2, and parts of RUNX1 and TP53, had lower coverage than others, such that 14
relatively large numbers of cells had inconclusive genotype information for the mutations 15
covered by these amplicons (Extended Data Fig. 2). The estimated median allele dropout (ADO) 16
rate was 4.7% (IQR: 3.6%-5.7%) (Extended Data Fig. 3). The estimated lower limit of detection 17
(LOD) of the platform was 0.1% of the cellular population based on the serial dilution assay of a 18
cell line and also from mutation validation by droplet digital PCR (Supplementary Table 2 and 19
Extended Data Fig. 4). 20
In total, we sequenced 556,951 BMNC from 77 AML patient samples (Fig. 1b). The 21
scDNA-seq approach detected 331 somatic mutations in 19 cancer genes, which included 238 22
(72%) single-nucleotide variants (SNV) and 93 (28%) small indels. Among those, 314 mutations 23
(95%) were orthogonally validated: 274 (87%) by conventional bulk next-generation sequencing 1
11 (bulk-seq, median 407×), 29 (9%) by droplet digital PCR, and 11 (3%) by a quantitative PCR 2
assay (all FLT3-internal tandem duplication [ITD], 4%). Therefore, the subsequent analyses used 3
a final set of 314 validated mutations (Supplementary Table 3). Of note, among the shared 4
genomic regions covered by the scDNA-seq and the bulk-seq platforms, all 274 (100%) 5
mutations called by the bulk-seq were also detected by scDNA-seq. The VAF from bulk-seq 6
(bulk VAF) and the VAF inferred from the scDNA-seq data (scDNA-seq VAF) had a good 7
concordance (rs = 0.78, p < 0.001) suggesting that the sequenced cells are a good representation 8
of the total bulk samples. (Fig. 1c and Extended Data Fig. 5). 9
The most frequently detected mutations by scDNA-seq in the 77 patients were in FLT3 10
(N = 37, 48%; 30 [39%] with ITD and 16 [21%] with non-ITD mutations), followed by NRAS 11
(N = 35, 45%), NPM1 (N = 32, 42%), IDH2 (N = 23, 30%), DNMT3A (N = 20, 26%), SRSF2 (N 12
= 17, 22%), RUNX1 (N = 14, 18%), KRAS (N = 14, 18%), PTPN11 (N = 14, 18%), and WT1 13
(N=14, 18%). scDNA-seq detected substantially more FLT3 mutations (11 [79%] ITD and 3 14
[21%] non-ITD) than bulk-seq (Extended Data Fig. 6a). This is likely due to the capability of the 15
scDNA-seq platform in detecting cryptic FLT3 mutations in small cellular subpopulations 16
(Extended Data Fig. 6b), which has been also reported previously using a different single-cell 17
technology 12. 18
19
Cellular-level co-occurrence and mutual exclusivity of AML driver mutations 20
Analysis of the cell-level co-occurrence and mutual exclusivity of AML driver mutations 21
suggested cooperative mechanisms and functional redundancy among these mutations. For 22
instance, sample number AML-68-001 carried DNMT3A, NPM1, and FLT3-ITD mutations, 23
which are the most frequently co-occurring mutations in AML at the patient level 13. scDNA-seq 1
unambiguously identified a cellular population carrying all 3 of these mutations (Fig. 1d). 2
Analysis of statistically significant mutation co-occurrence (false discovery rate [FDR] < 0.001) 3
identified frequently co-occurring mutations at the cell-level, which included NPM1/NRAS, 4
NPM1/FLT3, NPM1/DNMT3A, SRSF2/IDH2, NPM1/PTPN11, NPM1/IDH2, DNMT3A/FLT3-5
ITD, DNMT3A/NRAS, WT1/FLT3-ITD, NRAS/IDH2, NPM1/KRAS, NPM1/WT1, and others (Fig. 6
1e-f, Extended Data Fig. 7a and 8, variant-level co-occurrence is summarized in Extended Data 7
Fig. 7b). These cell-level co-occurrence data confirm the known cooperative relationship 8
between the driver mutations that has been suggested by previous studies with bulk-seq13,14 or 9
functional studies15-21, but also generates new hypotheses for the role of rare combinations in 10
cellular oncogenesis. For instance, we detected the cell-level co-occurrence of SF3B1 p.K666N 11
and SRSF2 p.P95H mutations in AML-04-001 (Fig. 1g). Mutations in RNA splicing genes are 12
mutually exclusive in general and thought to have functional redundancy or synthetically lethal 13
relationship 22; however, in rare instances, patients having both SF3B1 and SRSF2 mutations 14
were reported 23. Our data confirm that the two mutations can co-occur at the cellular level, and 15
suggests potential cooperativity between the two RNA splicing mutations. 16
In contrast, sample AML-45-001 carried ASXL1, KRAS, and 2 NRAS mutations. The 17
KRAS and 2 NRAS mutations were mutually exclusive at the cellular level (Fig. 1h; FDR < 18
0.001). Mutually exclusive relationships were frequently identified among mutations that are in 19
the same gene or part of the same molecular pathway (e.g., KIT p.D816V and FLT3-ITD; IDH1 20
and IDH2; FLT3-p.D835Y and FLT3-ITD; RUNX1 p.K152fs and RUNX1 p.D198N) (Fig. 1i-l). 21
These results support the widely explored hypothesis that co-occurrence of two or more 22
functionally redundant oncogenic mutations does not provide a selective advantage to cancer 23
cells and could be potentially synthetically lethal 24,25. Taken together, these data provide 1
definitive evidence of mutation co-occurrence at cellular level and provide a landscape of clonal 2
relationship among AML driver mutations (Extended Data Fig. 8 and 9). 3
4
Zygosity of AML driver mutations 5
One of the unique aspects of scDNA-seq is its capability of calling mutations in 6
individual cells with zygosity information. In fact, a previous single-cell study reported the 7
cellular diversity in the zygosity of NPM1 and FLT3 mutations in AML4. However, the lack of 8
validation method has made the interpretation of zygostiy difficult. In the current cohort, 9
mutations in genes such as FLT3-ITD, GATA2, JAK2, NPM1, RUNX1, and SRSF2 were 10
frequently detected as homozygous (Extended Data Fig. 10). Because amplicons covering some 11
of these mutations (GATA2, RUNX1, and SRSF2) had relatively low sequencing depth (Extended 12
Data Fig. 1), it is possible that some homozygous calls were the result of low sequencing depth 13
and ADO. To validate zygosity called by the scDNA-seq, we performed SNP arrays in selected 14
samples with homozygous mutation calls. In AML-25-001, 97% of the cells had a homozygous 15
RUNX1 p.Q335X mutation determined by scDNA-seq data, and SNP array data detected a copy-16
neutral loss of heterozygosity (CN-LOH) on chromosome 21 involving RUNX1 (Fig. 2a). 17
Similarly, in AML-03-001, 66% of the cells had a homozygous FLT3-ITD mutation determined 18
by scDNA-seq data, and the SNP array detected CN-LOH on chromosome 13 involving FLT3 19
(Fig. 2b). These results indicate that the observation of homozygosity of the RUNX1 and FLT3-20
ITD mutations in these cases was true and was a result of CN-LOH. In contrast, none of the 21
samples with homozygous SRSF2 (17% of the cells genotyped as homozygous in AML-57-001, 22
Fig. 2c) or NPM1 (13% of the cells genotyped as homozygous in AML-13-001, Fig. 2d) 23
mutations had allelic imbalance involving the mutated loci. These results do not rule out the 1
possibility that the SNP arrays missed the subclonal allelic imbalance. However, the cells that 2
were genotyped as homozygous had significantly lower sequencing depth than did the cells that 3
were genotyped as heterozygous (Fig. 2c-d and Extended Data Fig. 11), suggesting that the 4
homozygous calls in these mutations may have resulted from low sequencing depth and ADO. 5
For cases with validated homozygous calls, the zygosity of the mutations added further 6
resolution to the interpretation of the clonal substructure of the samples (Fig. 2a-b and Extended 7
Data Fig. 11). 8
9
Reconstructing mutational histories in AML 10
To reconstruct mutational histories in AML, we used SCITE, a probabilistic model to 11
infer phylogenetic trees from single-cell sequencing data that involves a flexible Markov-chain 12
Monte Carlo (MCMC) algorithm 26 (Supplemental Methods). Reconstructed phylogenies 13
revealed both linear and branching evolution patterns in AML (Extended Data Fig. 12). Patients 14
showing branching evolution had a significantly higher number of mutations, compared with 15
those with linear evolution (median number of mutations 5 [IQR: 4-6] vs. 3 [IQR: 2-4], p < 16
0.001, Fig. 3a). In cases with linear evolution pattern, ancestral mutations often involved 17
DNMT3A, IDH2, and SRSF2 mutations that have been detected as preleukemic clonal 18
hematopoiesis 27,28 followed by sequential accrual of secondary mutations that frequently 19
involved NPM1, FLT3, RUNX1, NRAS, KRAS, and PTPN11. (Fig. 3b-e and Extended Data Fig. 20
12) 13,29. In some cases with branching pattern, we observed the evolutionary history that is 21
consistent with convergent evolution. For example, in AML-38-001, a putative founding 22
mutation, NPM1 p.L287fs, generated 2 independent branches with IDH1 p.R132H or IDH2 23
p.R140Q mutations. Each of these branches then separated into PTPN11 p.D61H or KRAS 1
p.G12A mutations and FLT3-ITD, NRAS p.G13R, PTPN11 p.A72G, or NRAS p.G12A 2
mutations, respectively. As a result, the sample carried 9 clones, each with a combination of 3
similar, but separately evolved, molecular alterations (NPM1-IDH-RAS/RTK pathway alteration) 4
and the same mutation order (Fig. 3f). Other cases exhibiting evidence of branching evolution 5
are shown in Fig. 3g-i and Extended Data Fig. 12. These data with convergent evolution indicate 6
the presence of selective pressure in a bone marrow ecosystem that favors AML clones having 7
certain combination of molecular alterations with a fixed order. 8
9
Clonal remodeling under therapeutic pressure 10
We then analyzed 24 longitudinal samples from 10 patients (8 with baseline and relapse 11
pairs and 2 with multiple refractory timepoints) to study the evolution of clonal architecture in 12
response to therapies (Fig. 4 and Extended Data Fig. 13). We observed clonal selection and 13
adaptation under the therapeutic pressure that were associated with the patients’ clinical courses. 14
For instance, AML-09 was a 74-year-old man with previously untreated AML with NPM1 15
p.L287fs, FLT3-ITD, FLT3 p.D835E, FLT3 p.D835Y, and KRAS p.G13D mutations. The patient 16
was treated with azacitidine and sorafenib and experienced morphological complete remission 17
(i.e. leukemic blast less than 5% in marrow with normal recovery of blood counts). However, 5 18
months later, his AML relapsed. scDNA-seq of the baseline-relapse pair revealed that NPM1-19
FLT3 p.D835Y clone, originally a subclone that constituted 1.7% of the diagnostic sample, was 20
selected during the therapy, suggesting that clonal selection is the underlying mechanism of 21
relapse in this case (Fig. 4a). This clonal selection is consistent with the known in vitro 22
differential sensitivity of various FLT3 mutations to sorafenib; indeed, the FLT3 p.D835Y 23
mutation was shown to be more resistant to sorafenib than were the D835E and ITD mutations 1
30. 2
The second case AML-21 is a 56-year-old-woman with newly-diagnosed AML who was 3
treated with induction chemotherapy with clofarabine, idarubicin, and cytarabine followed by 4 4
additional cycles of consolidation therapy. After approximately 7 months of remission, her 5
disease relapsed. While both baseline and relapse samples shared ancestral WT1 p.A382fs -6
NPM1 mutations, the NRAS clone was replaced by the FLT3-ITD-WT1 p.S381fs clone at relapse 7
(the acquired WT1 p.S381fs mutation was in trans, making biallelic WT1 mutations at relapse, 8
Extended Data Fig. 14). These FLT3-ITD and WT1 p.S381fs mutations were undetectable at 9
baseline. They were likely acquired de novo at relapse or sub-detectable at baseline and were 10
selected during the therapy (Fig. 4b). 11
Finally, two treatment refractory AML cases showed highly adaptive clonal structure 12
during therapy. Both AML-38 (Fig. 4c, the same case in Fig. 3f) and AML-04 (Fig. 4d) had 13
AML with multiple branching clones. In both cases, treatment with a FLT3 inhibitor-containing 14
therapy decreased clones with FLT3 mutations, however with a concurrent expansion or 15
selection of other clones and development of new clones. High clonal diversity in both cases 16
seems to have allowed these AMLs to flexibly re-configure their clonal composition during 17
therapy, which likely contributed to the therapeutic resistance (Fig. 4c-d). These data from 18
longitudinal cases illustrate the evolution of clonal architecture under the selective pressure of 19
therapy and elucidate the role of clonal selection and adaptation in therapy resistance and 20
relapse. 21
22
Association between clonal diversity and clinical outcomes 23
We then analyzed the clinical implications of clonal diversity in AML that is uncovered 1
by our single-cell sequencing. The median number of cell subclones per patient was 4 (IQR: 3-2
5). Using the median as a cut-off, we divided the patients into lower (<4 subclones) and higher 3
clonal diversity (≥4 subclones) groups. Patients with higher clonal diversity were significantly 4
older compared with those with lower clonal diversity (median age 63 vs. 56 years, p = 0.0283). 5
Patients with higher clonal diversity were more likely to have a secondary or therapy-related 6
disease, relapsed/refractory disease, and tended to harbor chromosomal abnormalities detected 7
by karyotyping, although the differences were statistically not significant. Among the 64 8
previously-untreated cases, those with higher clonal diversity had a trend toward lower CR rate 9
compared with those with lower clonal diversity (CR rate 78% vs. 97%, p = 0.0534, Fig. 5a). 10
Also, AML patients with higher clonal diversity showed significantly worse overall survival 11
(OS) compared with those with lower clonal diversity (2-year OS 25 vs. 59 months, p = 0.0469; 12
Fig. 5b). 13
14
Discussion 15
Using a novel high-throughput scDNA-seq platform, we determined the clonal 16
architecture of AML at single-cell resolution and described the clonal relationships among AML 17
driver mutations. Cell-level co-occurrence and mutual exclusivity data obtained from this study 18
provide the rationale for future studies investigating the cooperative mechanism, functional 19
redundancy, and synthetic lethality among oncogenic mutations. Reconstruction of mutational 20
history based on the single-cell data not only revealed inter-tumor diversity in the evolutionary 21
history of AML but also provided evidence for linear and branching evolution patterns in AML 22
with some cases exhibiting convergent evolution. In cases with convergent evolution, we 23
observed clones that evolved separately but with a similar coalescence of molecular alterations, 1
which is similar to the observations in other studies utilizing multi-region/site sequencing or 2
single-cell analysis for different tumors6,31-34. These observations indicate for an underlying 3
genetic instability and evolutionary adaptation of AML clones to selective pressure in tissue 4
ecosystem. Cancer therapies, particularly molecularly targeted therapies, provide additional 5
selective pressure to AML clones, which facilitates selection of resistant clones and acquisition 6
of new mutations or clones, leading to recurrence or treatment resistance. 7
This work represents the largest cohort of AML patients yet examined at single-cell 8
resolution, moves a growing body of data4,6 forward into a deeper understanding of the 9
fundamental clonal architectures of AML. The depth of both patient numbers and cells 10
sequenced allowed a robust analysis of the clonal relationship and phylogeny in this study 11
despite the technical challenges associated with single-cell sequencing, such as but not limited 12
to, ADO, multiplets, coverage inconsistency, and false positives. Moreover, a large sample size 13
allowed the description of inter-tumor diversity in the patterns of clonal evolution, and offered 14
some indication that clonal diversity affects prognosis in AML. Here, we interrogated 19 known 15
leukemia driver genes that have given rise to a remarkable level of clonal complexity in AML. It 16
is noteworthy that this is still an underestimation of the true extent of clonal diversity. Future 17
studies with even more cells, broader coverage of the genome, and integration with single-cell 18
transcriptomic and epigenomic states, that is becoming a reality with the recent technological 19
advancement 35 36, will further elucidate the clonal diversity and evolutionary trajectories of 20
AML, which may contribute to the development of predictive biomarkers or therapies targeting 21
clonal diversity. 22
23
METHODS 1
Patients and samples 2
We included in the analysis 91 samples (89 bone marrow mononuclear cells and 2 3
peripheral blood mononuclear cells) from 75 patients with AML and 2 patients with high-risk 4
myelodysplastic syndrome who had at least one somatic mutation covered by the targeted panel 5
for scDNA-seq. In order to avoid allelic imbalance, samples with a normal karyotype were 6
prioritized for analysis. For samples that exhibited cytogenetic abnormalities, we confirmed that 7
the chromosomal position of the examined somatic mutations did not overlap with the regions 8
with a structural variation. Of the 77 patients, 67 patients were analyzed for the single-timepoint 9
sample including pre-treatment (N=59), relapse (N=5), and random timepoint with refractory 10
disease (N=3). For the remaining 10 patients, we analyzed the longitudinal samples obtained at 11
pre-treatment and relapse (N=6), pre-treatment, during treatment, and relapse (N=2), and 3 12
random refractory timepoints (N=2). All the patients provided written informed consent for 13
sample banking and analysis. The study was approved by the MD Anderson institutional review 14
board and was in accordance with the Declaration of Helsinki. 15
Variant detection by single-cell DNA sequencing 16
We used a novel microfluidic approach with molecular barcode technology to amplify the 17
DNA from individual cells as previously described 10. Briefly, cryopreserved bone marrow 18
mononuclear cells were thawed, and cells were quantified using a Countess Automated Cell 19
Counter (Thermo Fisher). The cells were resuspended in cell buffer and diluted to a 20
concentration of 2,000,000 to 4,000,000 cells/mL. Next, 100 µL of cell suspension was loaded 21
onto a microfluidics cartridge and cells were encapsulated on the Tapestri instrument followed 22
by the cell lysis and protease digestion on a thermal cycler within the individual droplet. The cell 23
lysate was then barcoded such that each cell had a unique label. The barcoded samples were then 1
thermocycled using 50 primer pairs specific to a panel of 19 mutated genes covering known 2
AML-related hotspot loci and 10 commonly heterozygous SNP loci for ADO determination 3
(Supplementary Table 4). 4
The pooled library was sequenced on an Illumina MiSeq system with 150- or 250-base 5
pair (bp) paired-end multiplexed runs. Detailed methods are provided in the Supplemental 6
Methods. Briefly, fastq files generated by the MiSeq instrument were processed using the 7
Tapestri Analysis Pipeline for adapter trimming, sequence alignment, barcode correction, cell 8
finding, and variant calling. Loom files that were generated by the pipeline via GATK-based 9
haplotype calling were then processed using in-house filtering criteria. We included cells for 10
downstream analysis that met the following criteria for genotyping: total read count (depth, DP) 11
≥ 10× and alternative allele count ≥ 3 (scVAF ≥ 15% if 20× ≤ DP ≤ 99×; scVAF ≥ 10% if DP ≥ 12
100×). Cells that did not satisfy these criteria were considered to have missing genotypes. 13
The ADO rate was calculated on the basis of common SNP information using 10 14
amplicons designed to cover 10 highly polymorphic loci in the Tapestri Single-Cell DNA AML 15
Panel. 16
Mutation detection by bulk sequencing 17
As an orthogonal validation, all samples were concurrently sequenced by conventional 18
bulk next-generation sequencing (NGS) using target-capture deep sequencing (N = 66, median 19
coverage: 432×, IQR: 283×-610×) or whole-exome sequencing (N = 11, median coverage: 146×, 20
IQR: 86×-158×). Target-capture NGS was performed using a SureSelect (Agilent Technologies) 21
custom panel of 295 genes that are recurrently mutated in hematological malignancies 22
(Supplementary Table 5). Detailed methods were previously described 11. Briefly, genomic DNA 23
was extracted using an Autopure extractor (QIAGEN/Gentra) and was fragmented and bait-1
captured in solution according to the manufacturer’s protocols. Captured DNA libraries were 2
then sequenced using a HiSeq 2000 sequencer (Illumina) with 76-bp paired-end reads. Whole-3
exome sequencing was performed using SureSelect V4 exome probes (Agilent Technologies) 4
and a HiSeq 2000 sequencer (Illumina) with 76-bp paired-end reads. Modified Mutect and Pindel 5
algorithms were used for mutation calling as described previously 11. 6
Comparison of genotype results from scDNA-seq and bulk sequencing 7
To determine how the models of clonal architecture obtained using the 2 sequencing 8
methods differed, we compared the VAF from bulk sequencing (bulk VAF) and the VAF from 9
single-cell genotype data (scDNA-seq VAF). scDNA-seq VAF was calculated as follows based 10
on the sequencing reads from the pooled single cells: (number of the single-cell sequencing reads 11
with alternate allele) / (number of total single-cell sequencing reads). 12
Inference of mutational histories 13
We used the SCITE (Single Cell Inference of Tumor Evolution) software to infer 14
phylogenetic trees of the driver mutations from scDNA-seq data as previously described 26. 15
SCITE is an MCMC-based Bayesian inference scheme that can be used to find a mutation tree (a 16
partial temporal order of mutations) that best fits the observed single-cell genotypes. The 17
concentration on the mutation tree (as opposed to a cell lineage tree) makes the use of SCITE 18
very efficient for use with our data that is characterized by few mutational events and many cells. 19
SCITE operates with 2 parameters, one for the false positive rate (FPR) and one for the 20
false negative rate, which can be either set to predefined values or inferred in the MCMC model 21
along with the tree structure. We used a global estimate of the sequencing error rate as the FPR 22
(1%) and dataset-specific estimates of the dropout rate (ADO provided by the platform) as the 23
false negative rate (FNR). In cases where no dropout rate was estimated, we let SCITE learn the 1
value from the data by giving it the average value of the estimates across all patients as a prior 2
estimate. We ran SCITE separately for each patient, providing the table of mutation calls as the 3
input (encoding 0 for wild-type, 1 for mutation, and 3 for missing data point). To obtain a robust 4
model, we ran SCITE with 4 different combinations of parameters: 1) use all cells including 5
missing genotype information with 1% FPR and SCITE inferred FNR, 2) use all cells including 6
missing genotype information with 1% FPR and platform provided FNR, 3) use only cells with 7
full genotype information with 1% FPR and SCITE inferred FNR, and 4) use only cells with full 8
genotype information with 1% FPR and platform provided FNR. When provided with an 9
incomplete genotype for a cell, SCITE is still able to use the partial genotyping information in 10
the tree inference and assigns cells into subclones based on the available information. 11
The inference procedure underlying SCITE is fully Bayesian, which allowed us to 12
quantify uncertainty in the inferred clonal architectures by sampling trees from the model’s 13
posterior distribution. We summarized the sampled trees by reporting 95% credible intervals for 14
the inferred subclones. 15
The tree structure (branching vs. linear) were mostly consistent among the 4 models (47 16
of 76 [62%] cases showing consistent tree structure, Extended data. Fig. 12). Phylogeny figures 17
that are shown in Fig. 3 are based on model 2 (all cells, 1% FPR, and platform provided FNR). 18
For longitudinal samples, we combined the scDNA-seq data from all time points from the same 19
patient and ran SCITE for the pooled data, and reconstructed the tumor phylogeny. To obtain 20
time point-specific estimates of subclone sizes, we performed the cell to subclone assignment in 21
the posterior sampling separately for each time point. As in some cases not all mutations were 22
observed at all time points, we adjusted the assignment probabilities such that a cell cannot be 23
placed below any mutation unobserved at the cell’s sampling time. This leads to subclones with a 1
temporary prevalence of 0%. This does not necessarily mean that the subclone was non-2
existent/extinct at that time, but simply reflects the lack of evidence for its existence based on the 3
cells sampled at the respective time point. The number of subclones was defined as the number 4
of distinct cellular populations carrying at least one mutations based on model 2. 5
SNP array 6
Genomic DNA from 28 samples in which scDNA-seq data showed at least 5% of 7
homozygously mutated clones were analyzed by Illumina Omni2.5-8 SNP array. The raw data 8
retrieved from an Illumina Omni2.5-8 SNP array was processed using GenomeStudio 2.0. The 9
raw log R ratio and B allele frequency were used for ASCAT (allele-10
specific copy number analysis of tumors) algorithm 37 to identify copy-number alterations. 11
Droplet digital PCR 12
We performed droplet digital PCR (ddPCR) using QX200TM Droplet Digital TM System 13
(Bio-Rad Laboratories) to confirm the variants that were detected by scDNA-seq but were not 14
detected by bulk NGS. ddPCRTM Supermix for Probes (No dUTP) was used with 50ng of 15
genomic DNA as a template for ddPCR assay in a 96-well plate according to the manufacture’s 16
protocol. 7ng of synthesized mutant DNA (designed through Bio-Rad Laboratories and ordered 17
through Integrated DNA Technologies) in a background of 130ng of normal human genomic 18
DNA (Promega) was used as a positive control. 50ng of normal human genomic DNA 19
(Promega) was used as a negative control. Water was used instead of DNAs for no-template 20
control reactions. Each reaction was tested in duplicate. Variant-specific primers/probes 21
(ddPCRTM Mutation Detection Assays, FAM/HEX for mutant/wildtype) were designed and 22
ordered through Bio-Rad Laboratories and are summarized in Supplementary Table 6. Data was 1
analyzed using Quanta-Soft Analysis Pro software v1.0.596 (Bio-Rad Laboratories). 2
Statistical analysis 3
Categorical variables were compared using Chi-squared or Fisher’s exact tests. 4
Continuous variables were analyzed by Student’s t-tests or Mann-Whitney U test depending on 5
the satisfaction of the statistical testing assumptions. Spearman’s rank correlation coefficient (rs) 6
was used to assess the relationships between two continuous variables that did not follow a 7
normal distribution. To evaluate cell-level co-occurrence and mutual exclusivity, a contingency 8
table was constructed to compute the log2-transformed odds ratios. Fisher’s exact test was used 9
to evaluate the statistical significance of associations. The Benjamini-Hochberg method was used 10
to adjust for multiple testing 38. In order to assess the prognostic relevance of clonal 11
heterogeneity, we collected survival information for previously-untreated 64 AML patients. 12
Overall survival was calculated from the date of pretreatment sample collection to the date of 13
death from any cause, and censored on the date of last follow-up if alive. Those who underwent 14
stem cell transplantation was censored on the date of transplantation. Kaplan-Meier plots were 15
used to visualize survival distributions. Differences in survival between groups were analyzed 16
using log-rank tests. We considered P value of less than 0.05 to be statistically significant. R 17
(ver. 3.4.3) and EZR 39 software packages were used for statistical analysis. 18
Code availability 19
Publicly available codes were used with a citation for data analysis. In-house codes that were 20
used for single-cell sequencing data variant calling are available from the corresponding author 21
on reasonable request. 22
Data availability 23
Deidentified clinical and genetic data is available in supplementary information. 1
Acknowledgments 2
This study was supported in part by the Cancer Prevention and Research Institute of Texas (grant 3
R120501 to PAF), the Welch Foundation (grant G-0040 to PAF), the University of Texas System 4
STARS Award (grant PS100149 to PAF), Physician Scientist Program at MD Anderson (to KT), 5
Lyda Hill Foundation (to PAF), the Charif Souki Cancer Research Fund (to HK), the MD 6
Anderson Cancer Center Leukemia SPORE grant (NIH P50 CA100632) (to HK), the MD 7
Anderson Cancer Center Support Grant (NIH/NCI P30 CA016672), Research Fellowships of the 8
Japan Society for the Promotion of Science for Young Scientists (to KM), and generous 9
philanthropic contributions to MD Anderson’s Moon Shot Program (to PAF, KT, GGM, and 10
HK). We thank Amy Ninetto at Department of Scientific Publications at MD Anderson for 11
providing scientific editing of the manuscript. We also thank Charles Silver, Dennis Eastburn, 12
Robert Durruthy-Durruthy, Matt Cato, Hannah Viernes, Anup Parikh, Sombeet Sahu, Kelly 13
Kaihara, and all others members of Mission Bio Inc. for the technical support. 14
15
Author contributions 16
KM performed the experiments, analyzed the data, and wrote the initial draft of the manuscript. 17
KT designed the study and wrote the manuscript. KJ, JK, and NB performed the phylogenetic 18
analysis. FW, JZ, and XS performed the bioinformatic analysis. YY performed the statistical 19
analysis. JM collected samples. LL, CG, SC, and ET performed sequencing. KP and CBR 20
performed pathologic analyses. CD, FR, EJ, MA, JC, MK, KB, GGM, and HK collected samples 21
and treated patients. NN and PAF critically reviewed the manuscript. PAF and KT provided 22
leadership and managed the study team. All authors read and approved the manuscript. 23
References 1
1 McGranahan, N. & Swanton, C. Clonal Heterogeneity and Tumor Evolution: Past, 2 Present, and the Future. Cell 168, 613-628 (2017). 3
2 Welch, J. S. et al. The origin and evolution of mutations in acute myeloid leukemia. Cell 4 150, 264-278 (2012). 5
3 Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994-1007 (2012). 6 4 Paguirigan, A. L. et al. Single-cell genotyping demonstrates complex clonal diversity in 7
acute myeloid leukemia. Sci Transl Med 7, 281re282 (2015). 8 5 Wang, Y. et al. Clonal evolution in breast cancer revealed by single nucleus genome 9
sequencing. Nature 512, 155 (2014). 10 6 Potter, N. et al. Single cell analysis of clonal architecture in acute myeloid leukaemia. 11
Leukemia (2018). 12 7 Navin, N. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90 13
(2011). 14 8 Eirew, P. et al. Dynamics of genomic clones in breast cancer patient xenografts at single-15
cell resolution. Nature 518, 422-426 (2015). 16 9 Wang, Y. & Navin, Nicholas E. Advances and Applications of Single-Cell Sequencing 17
Technologies. Molecular Cell 58, 598-609 (2015). 18 10 Pellegrino, M. et al. High-throughput single-cell DNA sequencing of acute myeloid 19
leukemia tumors with droplet microfluidics. Genome Res (2018). 20 11 Takahashi, K. et al. Preleukaemic clonal haemopoiesis and risk of therapy-related 21
myeloid neoplasms: a case-control study. Lancet Oncol 18, 100-111 (2017). 22 12 Shouval, R. et al. Single cell analysis exposes intratumor heterogeneity and suggests that 23
FLT3-ITD is a late event in leukemogenesis. Exp Hematol 42, 457-463 (2014). 24 13 Papaemmanuil, E. et al. Genomic Classification and Prognosis in Acute Myeloid 25
Leukemia. New Engl J Med 374, 2209-2221 (2016). 26 14 The Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of 27
adult de novo acute myeloid leukemia. N Engl J Med 368, 2059-2074 (2013). 28 15 Mupo, A. et al. A powerful molecular synergy between mutant Nucleophosmin and Flt3-29
ITD drives acute myeloid leukemia in mice. Leukemia 27, 1917-1920 (2013). 30 16 Meyer, S. E. et al. DNMT3A Haploinsufficiency Transforms FLT3ITD Myeloproliferative 31
Disease into a Rapid, Spontaneous, and Fully Penetrant Acute Myeloid Leukemia. 32 Cancer discovery 6, 501-515 (2016). 33
17 Yoshimi, A. et al. Spliceosomal Dysfunction Is a Critical Mediator of IDH2 Mutant 34 Leukemogenesis. Blood 130, 473-473 (2017). 35
18 Dovey, O. M. et al. Molecular synergy underlies the co-occurrence patterns and 36 phenotype of NPM1-mutant acute myeloid leukemia. Blood 130, 1911-1922 (2017). 37
19 Huang, Y.J. et al. RUNX1 Deficiency and SRSF2 Mutation Cooperate to Promote 38 Myelodysplastic Syndrome Development. Blood 130, 119-119 (2017). 39
20 Vicent, S. et al. Wilms tumor 1 (WT1) regulates KRAS-driven oncogenesis and 40 senescence in mouse and human models. J Clin Invest 120, 3940-3952 (2010). 41
21 Pronier, E. et al. Genetic and epigenetic evolution as a contributor to WT1-mutant 42 leukemogenesis. Blood 132, 1265-1278 (2018). 43
22 Dvinge, H., Kim, E., Abdel-Wahab, O. & Bradley, R. K. RNA splicing factors as 44 oncoproteins and tumour suppressors. Nat Rev Cancer 16, 413-430 (2016). 45
23 Bejar, R. et al. Validation of a prognostic model and the impact of mutations in patients 1 with lower-risk myelodysplastic syndromes. J Clin Oncol 30, 3376-3382 (2012). 2
24 Cisowski, J., Sayin, V. I., Liu, M., Karlsson, C. & Bergo, M. O. Oncogene-induced 3 senescence underlies the mutual exclusive nature of oncogenic KRAS and BRAF. 4 Oncogene 35, 1328 (2015). 5
25 Unni, A. M., Lockwood, W. W., Zejnullahu, K., Lee-Lin, S. Q. & Varmus, H. Evidence 6 that synthetic lethality underlies the mutual exclusivity of oncogenic KRAS and EGFR 7 mutations in lung adenocarcinoma. Elife 4, e06907 (2015). 8
26 Jahn, K., Kuipers, J. & Beerenwinkel, N. Tree inference for single-cell data. Genome Biol 9 17, 86 (2016). 10
27 Shlush, L. I. et al. Identification of pre-leukaemic haematopoietic stem cells in acute 11 leukaemia. Nature 506, 328-333 (2014). 12
28 Abelson, S. et al. Prediction of acute myeloid leukaemia risk in healthy individuals. 13 Nature 559, 400-404 (2018). 14
29 Welch, J. S. Mutation position within evolutionary subclonal architecture in AML. Semin 15 Hematol 51, 273-281 (2014). 16
30 Smith, C. C., Lin, K., Stecula, A., Sali, A. & Shah, N. P. FLT3 D835 mutations confer 17 differential resistance to type II FLT3 inhibitors. Leukemia 29, 2390-2392 (2015). 18
31 Anderson, K. et al. Genetic variegation of clonal architecture and propagating cells in 19 leukaemia. Nature 469, 356 (2010). 20
32 Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by 21 multiregion sequencing. N Engl J Med 366, 883-892 (2012). 22
33 Campbell, P. J. et al. The patterns and dynamics of genomic instability in metastatic 23 pancreatic cancer. Nature 467, 1109-1113 (2010). 24
34 Yates, L. R. et al. Subclonal diversification of primary breast cancer revealed by 25 multiregion sequencing. Nat Med 21, 751-759 (2015). 26
35 van Galen, P. et al. Single-Cell RNA-Seq Reveals AML Hierarchies Relevant to Disease 27 Progression and Immunity. Cell 176, 1265-1281 e1224 (2019). 28
36 Gaiti, F. et al. Epigenetic evolution and lineage histories of chronic lymphocytic 29 leukaemia. Nature 569, 576-580 (2019). 30
37 Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc Natl Acad Sci U 31 S A 107, 16910-16915 (2010). 32
38 Benjamini, Y. & Hochberg, Y. Controlling the False Discovery Rate: A Practical and 33 Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B 34 (Methodological) 57, 289-300 (1995). 35
39 Kanda, Y. Investigation of the freely available easy-to-use software 'EZR' for medical 36 statistics. Bone marrow transplantation 48, 452-458 (2013). 37
38
39
Figure Legends 1 2 Fig. 1. The Genetic landscape of AML based on single-cell DNA sequencing. a, Distribution 3
of the number of total sequenced cells. Each point represents a sample from unique patients. b, 4
Somatic mutations in 556,951 cells from 77 AML patients detected by single-cell DNA 5
sequencing. Each column represents a cell, and cells from the same case are clustered together 6
within the areas surrounded by the grey lines. Note that some cases are difficult to be segregated 7
in print. Cells that were genotyped as being mutated or wild type for the indicated gene are 8
colored in blue and white, respectively. Cells with missing genotypes are colored in grey. When 9
one sample has multiple different mutations in the same gene, they were annotated differently 10
(e.g., NRAS_a, NRAS_b). A total of 57,953 cells that were genotyped as wild type for all the 11
variants screened are not shown. c, Correlation of the variant allele fraction (VAF) from bulk-12
sequencing and single cell DNA sequencing. The X-axis shows the VAF from the single-cell 13
genotype data (scDNA-seq VAF). The Y-axis shows the VAF from the bulk next-generation 14
sequencing (bulk VAF). Each dot represents a detected variant. The linear trendline was added to 15
best fit the distribution of the dots. The shaded area around the trendline represents the 95% 16
confidence intervals. d, Cellular-level co-occurrence of DNMT3A, NPM1, and FLT3-ITD 17
mutations. Heat map (left) shows the genotype of each sequenced cell for each variant, with 18
clustering based on the genotypes of driver mutations. Each column represents a cell at the 19
indicated scale. Cells with mutations and wild-type cells are indicated in blue and white, 20
respectively. Cells with missing genotypes are indicated in grey. The subclones located to the 21
right of the red line comprised <1% of the total sequence cells, since such small subclones can 22
represent false positive or negative genotypes as a result of ADO or multiplets. The figure on the 23
right show the pairwise association of mutations. The color and size of each panel represent the 24
degree of the logarithmic odds ratio (log OR). The bar on the right side is a key indicating the 1
association of the colors with the log OR. Co-occurrence and mutual exclusivity are indicated by 2
red and blue, respectively. The statistical significance of the associations based on the false 3
discovery rate (FDR) is indicated by the asterisks (*FDR < 0.1, **FDR < 0.05, ***FDR < 4
0.001). e, Frequency of mutation combination showing statistically-significant cell-level co-5
occurrence (FDR < 0.001). x-axis represents the combination of mutations based on mutated 6
genes, and y-axis shows the number of patients showing the significant cell-level co-occurrence 7
of each mutation combination. Mutation combinations that were detected in 3 or more patients 8
are plotted. Bars are colored based on the frequency (red if significantly co-occurred in >10 9
patients, orange if 6-10 patients, green if 4-5 patients, blue if 3 patients). f, Circos plot showing 10
the patterns of mutation co-occurrence for all genes based on the single-cell genotype data. 11
When 2 mutations co-occurred in the same cell, a ribbon connects the genes. The width of each 12
ribbon is proportional to the frequency of mutational events. g, Cellular-level co-occurrence of 13
SF3B1 and SRSF2 mutations. h-l, Cell-level mutual exclusivity patterns of somatic mutations in 14
individual samples for 5 representative cases. (h) KRAS and NRAS, (i) KIT and FLT3-ITD, (j) 15
IDH1 and IDH2, (k) FLT3-non-ITD, FLT3-ITD, and NRAS, and (l) RUNX1 p.K152fs and 16
RUNX1 p.D198N variants did not co-occur in the same cellular populations. Mut, mutated; WT, 17
wild type; Missing, missing genotype. 18
19
Fig. 2. Homozygous variants involving copy-neutral loss of heterozygosity. a-d, 20
Representative cases with highly homozygous variants analyzed by SNP array. The bar graphs 21
on the left show the distribution of zygosity for each indicated variant. Cells that were genotyped 22
as having heterozygous and homozygous mutations are shown in blue and red, respectively. The 23
numbers on the bars represent the number of cells with each genotype. The figures in the middle 1
show the distribution of the allele counts for the two alleles (green or red). The allele count is 2
shown on the vertical axes, and the chromosomes are shown on the horizontal axes. The 3
chromosomes on which the highly homozygous variants were located are highlighted with blue 4
rectangles. Distributions of depth are shown in the figures on the right based on the genotype 5
calling. Heat maps incorporating the zygosity information are also shown for cases with 6
validated homozygosity. Copy-neutral loss of heterozygosity (CN-LOH) involving the 7
homozygously called variant loci was detected by SNP array in cases with highly homozygous 8
(a) RUNX1 p.Q355X and (b) FLT3-ITD variants. Cases with homozygously called (c) SRSF2 9
p.P95R and (d) NPM1 p.L287fs variants did not have CN-LOH or any other copy-number 10
alterations. WT, wild type; Het, heterozygous; Homo, homozygous; Missing, missing genotype; 11
IQR, interquartile range. 12
13
Fig. 3. Inference of mutational history from single-cell genotype data using the SCITE 14
algorithm. a, Distribution of the number of driver mutations based on the evolution patterns 15
(branching or linear). The thick line within each box represents the median, and the top and 16
bottom edges represent the 25th and 75th percentiles, respectively. The upper and lower 17
whiskers represent the 75th percentile plus 1.5 times the interquartile range and the 25th 18
percentile minus 1.5 times the interquartile range, respectively. b-i, Phylogenetic trees for 19
representative cases illustrating distinct patterns of clonal evolution. The size of each circle is 20
proportional to the clonal population. The numbers within each circle are the number of cells and 21
the percentage of each clone among the total number of sequenced cells and the 95% credible 22
intervals from the posterior sampling to illustrate the uncertainty in the subclone sizes. (b)-(e) 23
show linear clonal evolution pattern, in which a subset of cells from the founder clone acquired 1
additional mutations in a stepwise manner. The trunk clone exhibits a forked evolution pattern 2
based on the status of additional mutations. (f)-(i) show branching clonal evolution pattern 3
characterized by the parallel acquisition of multiple functionally redundant mutations in different 4
cell populations. ADO, allele dropout; FPR, false positive rate. 5
6
Fig. 4. Various patterns of clonal evolution after therapy. The fish plot shows the inferred 7
clonal evolution pattern based on the single-cell genotype data. The phylogenetic trees visualize 8
the estimated order of mutation acquisition and the proportion of subclones with a different 9
combination of mutations at each timepoint. (a) A 74-year-old man with newly diagnosed 10
therapy-related acute myelomonocytic leukemia showing a clonal selection of FLT3 p.D835Y-11
mutated clone, a small subclone at baseline. (b) A 56-year old woman with acute 12
myelomonocytic leukemia showing the regression of NRAS-mutated clone replaced by FLT3-13
WT1 p.S381fs-mutated clone. (c) A 58-year-old-man with refractory AML showing the dynamic 14
change of subclonal architecture. The patient was started on cytarabine and quizartinib and 15
initially responded, but had a recurrent disease, and was refractory to a total of 4 cycles of 16
cytarabine and quizartinib therapy. FLT3-ITD mutated clone was cleared after the therapy, 17
whereas the remaining subclones persisted or expanded. (d) A 76-year-old man with refractory 18
secondary AML showing the adaptive behavior of subclones. Treatment with azacitidine plus 19
quizartinib followed by crenolanib monotherapy substantially shrank FLT3-ITD–mutated clone, 20
whereas the NRAS-IDH1 p.R132C-mutated clone and IDH1 p.R132S-mutated clone emerged in 21
the context of SF3B1 and SRSF2 mutations and expanded. Full case description is available in 22
the figure legends of Extended Data Fig. 13. 23
1
Fig. 5. Clonal diversity and its association with clinical outcome. 2
a, Association between clonal diversity and clinical characteristics. b, Overall survival (OS) for 3
previously-untreated patients (N=64) according to clonal diversity based on single-cell DNA 4
sequencing data. 5
6
7
h
i
Figure1
g
f
dKITJAK2TP53GATA2
PTPN11_cPTPN11_bPTPN11_aWT1_bWT1_aASXL1
RUNX1_bRUNX1_aU2AF1_bU2AF1_aEZH2SF3B1SRSF2KRAS_cKRAS_bKRAS_aNRAS_fNRAS_eNRAS_dNRAS_cNRAS_bNRAS_aIDH2IDH1
FLT3−non-ITD_cFLT3−non-ITD_bFLT3−non-ITD_a
FLT3_ITDDNMT3ANPM1
10,000 Cells
MissingMut
0
0.98
1.96
2.93
3.91
NPM
1 p.
L287
fsFL
T3−I
TDKR
AS p
.G12
S
DNMT3A p.R882HNPM1 p.L287fs
FLT3−ITD
***************
***
KRAS p.G12S
FLT3−ITD
NPM1 p.L287fs
DNMT3A p.R882H
AML-68-001 1000 Cells
Missing
WT
Mut
NRAS p.G12A
NRAS p.G12D
KRAS p.G12R
ASXL1 p.E801X
AML-45-001 1000 Cells
Missing
WT
Mut
−4.55
−2.74
−0.94
0.86
2.67
NPM
1 p.
L287
fsID
H1
p.R
132H
IDH
2 p.
R14
0QFL
T3−I
TD
DNMT3A p.R882CNPM1 p.L287fs
IDH1 p.R132HIDH2 p.R140Q
*** *** *** ****** *** ***
*** ******
FLT3−ITD
IDH2 p.R140Q
IDH1 p.R132H
NPM1 p.L287fs
DNMT3A p.R882C
AML-28-001 1000 Cells
Missing
WT
Mut
−3.95
−2.25
−0.56
1.14
2.83
NPM
1 p.
L287
fsFL
T3 p
.D83
5YFL
T3−I
TDN
RAS
p.G
12V
NR
AS p
.G12
D
DNMT3A p.R882CNPM1 p.L287fs
FLT3 p.D835YFLT3−ITD
NRAS p.G12V
*** *** ****** *** ***
*** *** ****** ***
NRAS p.G12D
NRAS p.G12VFLT3−ITD
FLT3 p.D835Y
NPM1 p.L287fs
DNMT3A p.R882C
AML-51-001 1000 Cells
Missing
WT
Mut
e
j
−4.85
−0.43
4
8.42
12.85SRSF
2 p.
P95H
FLT3
−ITD
FLT3
p.D
835E
PTPN
11 p
.F71
LW
T1 p
.R38
0WW
T1 p
.R38
0fs
NR
AS p
.G12
DID
H1
p.R
132C
SF3B1 p.K666NSRSF2 p.P95H
FLT3−ITDFLT3 p.D835E
PTPN11 p.F71LWT1 p.R380W
WT1 p.R380fsNRAS p.G12D
*** *** ** ** ** ***** ***
*** *** *** *** ****** *** ***
*** ******IDH1 p.R132C
NRAS p.G12DWT1 p.R380fsWT1 p.R380W
PTPN11 p.F71LFLT3 p.D835E
FLT3−ITDSRSF2 p.P95HSF3B1 p.K666N
AML-04-001 1000 Cells
Missing
WT
Mut
−2.81
−1.67
−0.53
0.61
1.75
NPM
1 p.
L287
fsKI
T p.
D81
6V
FLT3
−ITD
IDH2 p.R140QNPM1 p.L287fs
KIT p.D816V
*** *** *********FLT3−ITD
KIT p.D816V
NPM1 p.L287fs
IDH2 p.R140Q
AML-63-001 1000 Cells
Missing
WT
Mut
b
KITJAK2TP53GATA2
PTPN11_cPTPN11_bPTPN11_aWT1_bWT1_aASXL1
RUNX1_bRUNX1_aU2AF1_bU2AF1_aEZH2SF3B1SRSF2KRAS_cKRAS_bKRAS_aNRAS_fNRAS_eNRAS_dNRAS_cNRAS_bNRAS_aIDH2IDH1
FLT3−TKD_cFLT3−TKD_bFLT3−TKD_aFLT3_ITDDNMT3ANPM1
10,000 Cells
MissingMut
scDNA-seq VAF
bulk
VA
Fa
c
k
RUNX1 p.D198N
RUNX1 p.K152fs
DNMT3A p.R882H
IDH1 p.R132C
AML-59-001 1000 Cells
Missing
WT
Mut
−4.17
−2.31
−0.45
1.41
3.27
DN
MT3
A p.
R88
2HR
UN
X1 p
.K15
2fs
RU
NX1
p.D
198N
IDH1 p.R132CDNMT3A p.R882H
RUNX1 p.K152fs
************ *
***
0
2500
5000
7500
10000
12500
0.00
0.25
0.50
0.75
1.00
0.00 0.25 0.50 0.75 1.00pseudobulk_VAF
No.
of s
eque
nced
cel
ls
rs = 0.78(p < 0.001)
86420
101214
No.
of c
ases
JAK2KIT
EZH2TP53
KRASPTPN11
U2AF1
SF3B1
IDH1ASXL1
RUNX1
SRSF2
NRAS
WT1
IDH2
FLT3−ITD DNMT3A
NPM1
00
00
0
0
0
0
3000
0
0
30000
0
30000
0
300000
30000
0
30000
0
30000
0
30000
0
3000
0
6000
0
0
3000
0
60000
0
30000
600000
30000
60000
90000
0
30000
60000
90000
120000
l
−
−
−
0.86
2.14
3.42
0.43
1.71
KRAS
p.G
12R
NR
AS p
.G12
DN
RAS
p.G
12A
ASXL1 p.E801XKRAS p.G12R
NRAS p.G12D
*** ****** ***
**
FLT3-
non-ITD
GATA2
Total 556,951 cells from 77 AML patients
log OR
log OR
log OR
log OR
log OR
log OR
log OR
* FDR <0.1** FDR <0.05 ***FDR <0.001
* FDR <0.1** FDR <0.05 ***FDR <0.001
* FDR <0.1** FDR <0.05 ***FDR <0.001
* FDR <0.1** FDR <0.05 ***FDR <0.001
* FDR <0.1** FDR <0.05 ***FDR <0.001
* FDR <0.1** FDR <0.05 ***FDR <0.001
* FDR <0.1** FDR <0.05 ***FDR <0.001
Figure2
AML-13-001,NPM1p.L287fs(chr5)
AML-03-001,FLT3-ITD(chr13)
AML-25-001,RUNX1p.Q355X(chr21)
AML-57-001, SRSF2p.P95R(chr17) mediandepth,10(IQR:8-16)vs.7IQR(6-10)
a
b
c
dmediandepth,23(IQR:16-33)vs.17IQR(13-23)
0
20
40
60
genotype
dept
h
WT Het Homo Missing
p<0.001
0
20
40
60
80
100
120
genotype
dept
h
WT Het Homo Missing
p<0.001
173 24
1440
0
4625 4263
47
108
0% 10% 20% 30% 40% 50% 60% 70% 80% 90%
100%
EZH2
p.R670C
ASXL1p.G6
42fs
RUNX
1p.Q335X
FLT3-IT
D
Het
380
5182
4730
2642
0% 10% 20% 30% 40% 50% 60% 70% 80% 90%
100%
NPM1p.L287fs
FLT3-IT
D
Het
62 347
1646 1687
0% 10% 20% 30% 40% 50% 60% 70% 80% 90%
100%
NPM1p.L287fs
SRSF2p.P95R
Het
177 27 256 2
4130 2684 1648 127
0% 10% 20% 30% 40% 50% 60% 70% 80% 90%
100%
IDH2
p.R140Q
WT1p.R462P
NPM1p.L287fs
TP53p.V172fs
Het
0
50
100
150
200
genotype
dept
h
WT Het Homo Missing
mediandepth50(IQR:32-79)vs.38IQR(24-58)p=0.00384
0
50
100
150
200
250
genotype
dept
h
WT Het Homo Missing
mediandepth29(IQR:21-40)vs.25IQR(18-35)
p<0.001
%cellswith
diffe
rentzy
gosity
%cellswith
diffe
rentzy
gosity
%cellswith
diffe
rentzy
gosity
%cellswith
diffe
rentzy
gosity
Copynum
bero
feachallele
Copynum
bero
feachallele
Copynum
bero
feachallele
Copynum
bero
feachallele
FLT3−ITD
RUNX1 p.Q335X
ASXL1 p.G642fs
EZH2 p.R670C
1000 Cells
Missing
WT
Mut−Hetero
Mut−Homo
FLT3−ITD
NPM1 p.L287fs
1000 Cells
Missing
WT
Mut−Hetero
Mut−Homo
Homo
Homo
Homo
Homo
b AML-02-001 c AML-52-001 e AML-33-001d AML-23-001
Figure3
g AML-51-001 h AML-40-001 i AML-13-001
a
�������������
����������������
������������
��������������
�������������
���
�������������������
���������������
�����������������
���������������
����������������
����
����
�������������
���
�����������������������������������
������������
������������������
���������������
�������������
���������������
����������������
���
����
����������������
����
�����������������������������������
������������
����������������
��������������
����������������
���
����
����������������
���
�����������������������������������
������������
������������������
���������������
������������
��������������������
��������������������
����
��������������������
�����������������������������������
��������������
�����������������������������
������������ �������� ����������� ����������������
���������������
��������������������
��������������������
����������������
����������������
����
����������������
�����������������������������������
������������
������������������
��������������
����������� ����������� ��������������������� ��������������������� �������������
���������������
����������������
����
����������������
����������������
����������������
����������������
����
����������������
�����������������������������������
������������
����������������
���������������
������������������
���������������
����������������
����
�������������
����������������
����
����������������
����
�����������������������������������
branching linear
2
4
6
8
10
12
num
ber o
f clo
nes
p<0.001
�������������
�������������� ���������������������������
��������
��������������������
�������������
������������ ����������������������������
���������������������������
�����������������
���������������
����������� ������������� ���������������������������
���������������
����������������
�������������������
���������������
����������������
����������������
����
����������������
�����������������������������������
f AML-38-001
BLAML-09-001
BM blast 75%
CR_C5D29 REL_C9D38AML-09-002
BM blast 62%
AML-09
NPM1_p.L287fs FLT3−ITD FLT3_p.D835E KRAS_p.G13D FLT3_p.D835Y WT1_p.P372fs
�������������������������
����������������
����������������
����������������
����������������
����
����������������
����
�������������
�
����������������
����
�������������
�
�������������
�
����������������
�������������
�
����������������
�������������������������
������������
������������
�����������
�������������������������
��������
����
�����������������������������������
FLT3-ITDNPM1
FLT3p.D835YFLT3p.D835E
KRAS
WT1p.P372fs
azacitidine +sorafenib
Figure4
a
WT1 p.P372fs
FLT3 p.D835Y
KRAS p.G13D
FLT3 p.D835ENPM1 p.L287fs
FLT3-ITD
Root
7.9%[7.2, 8.8]
340
0.0%[0.0, 0.0]
0
1.7%[1.4, 1.9]
120
5.2%[5.0, 5.5]
381
8.6%[8.4, 8.9]
629
86.4%[85.4, 87.3]
3741
0.0%[0.0, 0.0]
0
0.0%[0.0, 0.0]
0
0.0%[0.0, 0.0]
0
69.6%[69.1, 70.0]
5066
14.9%[14.3, 15.5]
1085
5.7%[5.2, 6.2]
246
ADO rate =4.9%FPR= 1.0%
BLAML-21-001
BM blast 38%
CR_C2D36 REL_D315AML-21-002
BM blast 58%
AML-21
WT1_p.A382fs NPM1_p.L287fs FLT3−ITD NRAS_p.G12D WT1_p.S381fs
�������������������������
����������������
����
����������������
���
����������������
����
�������������
�
�������������
�
�������������
�
�������������
��
�������������
��
�������������
��
����������������
����
�������������������������
�����������
������������
�������������
��������
������������
����
�����������������������������������
WT1p.A382fs NRAS
NPM1FLT3
WT1p.S381fs
clofarabine,idarubicin,cytarabine
b
NRAS p.G12D
WT1 p.S381fs
FLT3-ITD
NPM1 p.L287fs
WT1 p.A382fsRoot
71.7%[71.0, 72.3]
56810.2%
[0.0, 0.5]3
88.5%[87.1, 89.8]
12410.0%
[0.0, 0.0]0
0.0%[0.0, 0.0]
0
16.3%[15.7, 16.9]
1292
12.0%[11.5, 12.6]
952
5.4%[4.7, 6.2]
75
1.2%[0.6, 2.2]
16
4.7%[3.7, 5.8]
65
ADO rate= 3.9%FPR= 1.0%
�������������������������
����������������
��������������������
����������������
����������������
��������������������
����������������
���������������
����������������
�������������������
���������������
����������������
����������������
��������������������
���������������
����������������
����������������
�������������������
���������������
���������������
��������������
����������������
���������������
��������������
���������������
��������������������
��������������
����������������
����������������
��������������������
����������������
�����������������
����������������
����������������
����������������
���������������
����������������
�������������������������
�������������
��������������
�����������
��������
�������������
������������
������������
�����������
������������
�����������
�������������
�����������
����
�����������������������������������
C2D1AML-38-001
BM blast 33%
C3D23AML-38-002
BM blast 42%
C4D26AML-38-003
BM blast 53%
AML-38
NPM1_p.L287fsKRAS_p.G12A
IDH2_p.R140Q PTPN11_p.D61H
IDH1_p.R132HFLT3_p.D835H
NRAS_p.G13R PTPN11_p.G503A
NRAS_p.G12AKRAS_p.G12D
FLT3−ITD PTPN11_p.A72G
NPM1
KRASp.G12A
PTPN11p.D61H
FLT3-ITD
NRASp.G13R
IDH1
IDH2 NRASp.G12A
KRASp.G12D
PTPN11p.A72G
PTPN11p.G503AFLT3p.D835H
cytarabine +quizartinib cytarabine +quizartinib
c
PTPN11 p.G503A
KRAS p.G12D
PTPN11 p.D61H
FLT3 p.D835H
KRAS p.G12A
IDH1 p.R132H
FLT3-ITD
NRAS p.G13R
IDH2 p.R140QNPM1 p.L287fs PTPN11 p.A72G
NRAS p.G12A
0.8%[0.6, 1.0]
51
0.5%[0.4, 0.7]
33
7.0%[6.7, 7.3]
478
3.1%[2.7, 3.6]
209
36.4%[35.8, 36.9]
2477
5.9%5.[ 6, 6.3]
403
21.9%[21.6, 22.2]
1491
10.6%[10.3, 10.9]
719
1.7%[1.5, 1.8]
113
5.0%[4.7, 5.2]
339
5.4%[5.1, 5.8]
369
Root
1.7%[1.4, 1.9]
113
3.9%[3.0, 4.9]
85
3.2%[2.5, 3.9]
68
48.2%[46.9, 49.5]
1044
0.0%[0.0, 0.0]
0
20.8%[20.3, 21.4]
450
4.7%[4.0, 5.4]
101
0.9%[0.7, 1.1]
19
4.6%[4.3, 5.0
100
0.0%[0.0, 0.0]
0
2.2%[1.9, 2.6]
48
3.8%[3.3, 4.4]
82
7.6%[6.6, 8.6]
164
9.8%[9.2, 10.5]
737
3.4%[3.0, 3.8]
255
47.2%[46.4, 48.0]
3543
1.6%[1.3, 1.9]
118
23.4%[22.9, 23.9]
1754
5.9%[5.4, 6.2]
439
0.0%[0.0, 0.0]
0
2.7%[2.6, 2.9]
205
0.5%[0.4, 0.6]
37
2.2%[2.1, 2.3]
166
1.8%[1.7, 2.0]
137
1.5%[1.2, 1.7]
109
ADO rate= 6.2%FPR= 1.0%
C1D3AML-04-001PB blast 56%
C7D34AML-04-002
BM blast 24%
C3D2AML-04-003PB blast 51%
AML-04
SF3B1_p.K666NWT1_p.R380W
SRSF2_p.P95HPTPN11_p.F71L
IDH1_p.R132SIDH1_p.R132C
FLT3_p.D835EWT1_p.R380fs
FLT3−ITDWT1_p.R369fs
NRAS_p.G12DWT1_p.V371fs
�������������������������
����������������
�������������
��
����������������
����
����������������
���
����������������
�������������
�
����������������
�������������
��
�������������
�
�������������
�
�������������
�
�������������
�
�������������
��
����������������
���
�������������
��
�������������
��
�������������
��
����������������
���
�������������
��
�������������
�
�������������
�
�������������
��
����������������
���
�������������
��
����������������
����������������
���
�������������
��
�������������
��
����������������
����������������
����
�������������
��
�������������
��
�������������
�
����������������
����������������
���
����������������
���������������
�������������
������������
�����������
������������
��������
�������������
������������
������������
������������
������������
������������
�����������
����
�����������������������������������
crenolanib
SF3B1
FLT3-ITD IDH1p.R132C
IDH1p.R132S
SRSF2
FLT3p.D835EWT1p.R380fs
WT1p.R380W
PTPN11p.F71L
WT1p.R369fs WT1p.V371fsNRAS
azacitidine +quizartinib
d
WT1 p.R369fs
WT1 p.V371fs
IDH1 p.R132C
NRAS p.G12D
PTPN11 p.F71L
FLT3-ITD
WT1 p.R380fs
WT1 p.R380W
FLT3 p.D835E
SRSF2 p.P95H
SF3B1 p.K666N
IDH1 p.R132S
Root
0.0%[0.0, 0.0]
0
0.1%[0.0, 0.2]
4
0.0%[0.0, 0.0]
0
1.0%[0.8, 1.2]
55
16.8%[16.0, 17.7]
964
70.7%[69.7, 71.6]
4048
4.4%[4.3, 4.5]
250
0.1%[0.0, 0.2]
60.6%
[0.5, 0.8]36
0.0%[0.0, 0.0]
0
4.2%[3.8, 4.6]
240
2.1%[1.8, 2.5]
121
3.9%[3.0, 4.9]
48
1.4%[0.0, 3.6]
18
27.6%[25.6, 29.5]
345
12.3%[11.2, 13.4]
154
1.2%[0.9, 1.6]
15
3.7%[3.2, 4.2]45
2.5%[2.3, 2.6]31
0.1%[0.0, 0.2]
10.6%
[0.3, 1.0]7
36.3%[35.5, 37.1]
453
4.7%[3.8, 5.7]59
4.6%[3.7, 5.6]57
5.4%[4.6, 6.2]325
2.7%[0.0, 6.1]165
56.6%[53.8, 59.2]3419
11.7%[10.9, 12.5]
706
0.5%[0.4, 0.6]28
0.8%[0.6, 0.9]47
0.8%[0.8, 0.8]48
0.1%[0.0, 0.1]
30.5%
[0.3, 0.6]27
15.1%[14.8, 15.4]
911
2.6%[2.2, 3.0]
159
1.9%[1.6, 2.2]
113
ADO rate= 8.2%FPR= 1.0%
Figure5
0 20 40 60 80 100
0.0
0.2
0.4
0.6
0.8
1.0
OS (months)
prob
abili
ty
32 9 6 5 1 132 4 3 0 0 0
Number at risk
No. of clones<4≥4
<4≥4≥4
P=0.0469
a
b
clonal diversitylower(No. of clones <4)
higher(No. of clones ≥4) p
Median age, years (IQR) 56 (44-63) 63 (55-74) 0.0283Ontogeny 0.287
de novo 29 (83%) 30 (71%) secondary/therapy-related 6 (17%) 12 (29%)
Prior therapy 0.125previously-untreated 32 (91%) 32 (76%) relapsed/refractory 3 (9%) 10 (24%)
Karyotype 0.499normal 32 (91%) 36 (86%) others 3 (9%) 6 (14%)
Best response 0.0534complete remission 31 (97%) 25 (78%) others 1 (3%) 7 (22%)
Relapse 0.793relapsed 15 (48%) 11 (44%) never relapsed 16 (52%) 14 (56%)