next generation sequencing based multi-gene … monitor minimal residual disease and prediction of...
TRANSCRIPT
Next generation sequencing based multi-gene mutational screenfor acute myeloid leukemia using miseq: applicability for diagnostics and disease monitoring
by Rajyalakshmi Luthra, Keyur P. Patel, Neelima G. Reddy, Varan Haghshenas, Mark J. Routbort, Michael A. Harmon, Bedia A. Barkoh, Rashmi Kanagal-Shamanna, Farhad Ravandi, Jorge E. Cortes, Hagop M. Kantarjian, L. Jeffrey Medeiros, and Rajesh R. Singh
Haematologica 2013 [Epub ahead of print]
Citation: Luthra R, Patel KP, Reddy NG, Haghshenas V, Routbort MJ, Harmon MA, Barkoh BA, Kanagal-Shamanna R, Ravandi F, Cortes JE, Kantarjian HM, Medeiros LJ, and Singh RR. Next generationsequencing based multi-gene mutational screen for acute myeloid leukemia using miseq: applicability for diagnostics and disease monitoring. Haematologica. 2013; 98:xxxdoi:10.3324/haematol.2013.093765
Publisher's Disclaimer.E-publishing ahead of print is increasingly important for the rapid dissemination of science.Haematologica is, therefore, E-publishing PDF files of an early version of manuscripts thathave completed a regular peer review and have been accepted for publication. E-publishingof this PDF file has been approved by the authors. After having E-published Ahead of Print,manuscripts will then undergo technical and English editing, typesetting, proof correction andbe presented for the authors' final approval; the final version of the manuscript will thenappear in print on a regular issue of the journal. All legal disclaimers that apply to thejournal also pertain to this production process.
Copyright 2013 Ferrata Storti Foundation.Published Ahead of Print on October 18, 2013, as doi:10.3324/haematol.2013.093765.
1
Next generation sequencing based multi-gene mutational screen for acute myeloid leukemia using miseq: applicability for diagnostics and
disease monitoring
Running title: Clinical NGS for AML gene mutations
Rajyalakshmi Luthra1#, Keyur P. Patel1, Neelima G. Reddy1, Varan Haghshenas1, Mark J. Routbort1, Michael A. Harmon1, Bedia A. Barkoh1,
Rashmi Kanagal-Shamanna1, Farhad Ravandi 2, Jorge E Cortes2, Hagop M Kantarjian2, L. Jeffrey Medeiros1 and Rajesh R. Singh1#
Departments of Hematopathology1 and Leukemia2, The University of Texas M.D. Anderson Cancer Center, Houston, Texas
Correspondence
Rajyalakshmi Luthra and Rajesh R Singh, The University of Texas M. D. Anderson Cancer Center 8515 Fannin Street, Houston, TX 77054 USA. E-mail: [email protected]/ [email protected]
2
Abstract
Routine molecular testing of Acute Myeloid Leukemia involves mutational screening of
several genes of therapeutic and prognostic significance. Comprehensive analysis using
single-gene assays has high DNA requirement, cumbersome and timely consolidation of
results for clinical reporting is challenging. High throughput next generation sequencing
platforms widely used in research have not been tested vigorously for clinical application.
Here, we describe the clinical application of MiSeq, a next generation sequencing platform to
screen mutational hotspots in 54 cancer-related genes including genes relevant in Acute
Myeloid Leukemia (NRAS, KRAS, FLT3, NPM1, DNMT3A, IDH1/2, JAK2, KIT and EZH2). We
sequenced 63 Acute Myeloid Leukemia/myelodysplastic syndrome samples using MiSeq and
compared with another next generation sequencing platform (Ion-Torrent Personal Genome
Machine) and other conventional testing platforms. MiSeq detected a total of 100 single
nucleotide variants and 23 NPM1 insertions that were confirmed by Ion Torrent or conventional
platforms indicating complete concordance. FLT3 Internal tandem duplications (n=10) we not
detected, however reanalysis of MiSeq output by Pindel, an indel detection algorithm detected
them. Dilution studies of cancer cell-line DNA showed quantitative accuracy of mutation
detection up to 1.5% allelic frequency with high level of inter- and intra-run assay
reproducibility, suggesting potential utility for monitoring the therapy response, clonal
heterogeneity and evolution. Examples demonstrating the advantages of MiSeq over
conventional platforms for disease monitoring have been provided. Easy work-flow, high
throughput multiplexing capability, 4-day turnaround time and simultaneously assessment of
routinely tested and emerging markers makes MiSeq highly applicable for clinical molecular
testing of Acute Myeloid Leukemia.
3
Introduction
Acute myeloid leukemia (AML) is commonly characterized by genetic aberrations,
including gene mutations, copy number alterations and gene translocations.1, 2 In recent
years, massively parallel next generation sequencing (NGS) technology has shown great
potential in identifying AML-related genetic aberrations.3-6 The current 2008 World Health
Organization classification system requires genetic information, in addition to the clinical,
morphologic and immunophenotypic findings, for the diagnosis and classification of AML.
Several types of AML are now defined by the presence of recurrent chromosomal
translocations that result in fusion genes, and two new provisional categories have been
proposed based on the prognostic impact of these single gene defects AML with mutated
NPM1 and AML with mutated CEBPA. 7 Newer risk stratification models for AML patients also
have been proposed that incorporate data from cytogenetic studies and mutational profiling of
several critical genes, such as FLT3, NPM1, CEPBA, DNMT3A, IDH1, IDH2, KIT, MLL-PTD,
TET2, RUNX1, ASXL1 and TP53. 8, 9 For accurate risk stratification, it is important to analyze
the mutations in several genes comprehensively due to complex interactions among different
pathways in leukemogenesis 9-11. Some of these gene mutations are currently targetable with
inhibitors, such as RAS and FLT3, and other genes will likely be targetable in the future.
Recent literature also suggests that quantitative assessment of gene mutations may be useful
to monitor minimal residual disease and prediction of risk of relapse.11 Therefore, routine
molecular diagnostic testing of AML patients is indispensable for rational therapeutic decisions
and includes mutational screening of several genes, which are either targetable for therapy or
useful for prognostication.12
4
In a clinical molecular diagnostic laboratory, multi-gene screening with conventional
platforms proves challenging due to the need for relatively large quantities of DNA to assess
one gene at a time and the coordination and compilation of the results from several analysis
platforms into an integrated report. Massively parallel sequencing ability of next generation
sequencing (NGS) technologies have made high throughput and multiplexed sequencing of
specific panels of genes or whole exomes/genomes feasible.13-15 Currently, they are being
applied extensively to characterize genomic alterations in cancer16-19 and are highly relevant
for diagnostic purposes as alternatives to the first generation sequencing techniques.20
However, inclusion of NGS technologies into the clinical diagnostic arena has been delayed
due to lack of adequate guidelines and thorough validation.
MiSeq is a next generation sequencer that is well-suited for various targeted
sequencing applications and generates genomic sequence information with high accuracy. 21-
27 In this study, we used MiSeq and a customized TruSeq Amplicon Cancer Panel (TSACP) to
screen for mutations in 54 cancer-related genes in 63 samples (60 AML and 3 myelodysplastic
syndrome) in our Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory.
We demonstrate the value of this approach, which requires less DNA and provides data for
multiple genes simultaneously in an integrated molecular diagnostic laboratory report with a
short turnaround time.
Methods
Tumor samples and sequencing platforms. A total of 63 patient samples that included 62 bone
marrow aspirates and 1 peripheral blood, from patients with the diagnosis of AML (n=60) or
myelodysplasia (n=3) were assessed using MiSeq sequencer (illumina Inc, San Diego, CA).
5
DNA was extracted using the Autopure extractor (QIAGEN/Gentra, Valencia, CA) and
quantified using the Qubit DNA BR assay kit (Life Technologies, Carlsbad, CA). Out of these,
52 samples were identified from the archives of the Molecular Diagnostics Laboratory at The
University of Texas MD Anderson Cancer Center based on their mutational status ascertained
by conventional mutation detection platforms in the laboratory. Forty-four of the archival
samples were sequenced using both the MiSeq and IT-PGM (Life Technologies). Eleven
additional samples were sequenced on the MiSeq instrument in parallel (blind) with routinely
used platforms, including Sanger sequencing, pyrosequencing and fragment analysis by
capillary electrophoresis (CE) performed as described previously. 28-30 To confirm mutations
detected in regions covered by TSACP but not by preexisting assays in our laboratory
(AmpliSeq cancer panel, Sanger or pyro sequencing) 42 new Sanger sequencing assays were
also developed.
Customized TruSeq Amplicon Caner Panel.
TruSeq Amplicon Cancer Panel (TSACP) interrogates mutational hotspots in 48 cancer-related
genes (listed in supplementary methods). TSACP consists of 212 pairs of probes designed to
bind flanking genomic areas of interest. Additionally, 7 more probe pairs were designed by
illumina on our request, which were spiked into the original TSCAP to interrogate mutational
hotspots in 3 additional AML-related genes (DNMT3A, IDH2 and EZH2) and 3 chronic
lymphocytic leukemia (CLL)-related genes (XPO1, KLHL6 and MYD88). Library preparation
and sequencing using MiSeq was performed as per manufacturer’s instruction (details
provided in supplementary methods).
6
Validation of version 2 upgrade on MiSeq. The Version 2 (V2) upgrade on MiSeq has changes
in sequencing chemistry and imaging to allow sequencing twice as many samples as V1. To
validate its accuracy and efficiency we sequenced in parallel 12 archival AML samples
(additional to the above mentioned cohort) using MiSeq V1 and the V2 upgrade. To
compensate for the higher sequencing capacity of V2, 12 additional samples were included in
the sequencing run with V2, which facilitated optimal comparison.
Variant calling and data analysis. Human genome build 19 (hg19) was used as the reference.
Alignment to hg19 genome and variant calling was performed by MiSeq Reporter Software
1.3.17. Integrative Genomics Viewer (IGV)31 was used to visualize the read alignment and
confirm the variant calls. A sequencing coverage of 250X (bi-directional) and a minimum
variant frequency of 5% in the background of wild type (WT) were used as cutoffs for clinical
reporting. An in-house custom developed software32 designated as OncoSeek, was used to
interface the data with IGV and to annotate the sequence variants. For identification of FLT3
internal tandem duplications (ITDs), Pindel33 analysis (source:
http://www.ebi.ac.uk/~kye/pindel/ and version Pindel0.2.4t) was performed using the BAM files.
Sequencing using Ion Torrent Personal Genome Machine (IT-PGM). Library preparation and
sequencing on IT-PGM was performed as described earlier.34 (Details provided in
supplementary methods).
7
RESULTS
MiSeq sequencing run metrics. In 16 different V1 sequencing runs, the average cluster
density of sequencing template generated per square millimeter (mm2) in the flow cell
ranged from 502,000 to 1,186,000 clusters per mm2 with a median of 950,000 clusters per
mm2. The cluster which passed filter or clusters from which sequencing information could be
obtained (without signal overlap from surrounding clusters) ranged from 460,000 to
1,065,000 clusters per mm2 with a median value of 851,950 per mm2 (Supplementary Figure
1A). The total sequencing output for the runs ranged from 1.1 to 2.2 Gigabases (Gb) with
sequencing quality score > Q30 with a median value of 1.9 Gb (Supplementary Figure 1B).
The total sequencing reads obtained from the run ranged from 4,079,435 to 9,570,965, with
a median value of 7,424,774. The identified sequencing reads or reads pass filter (with
chastity scores of ≥ 0.6) ranged from 3,716,549 to 8,080,211 reads with a median value of
6,641,052 (Supplementary Figure 1C) indicating that 72.4-96.1% of the reads (median,
95.3%) were ‘identified reads’ or reads that could be identified with the barcode indexes
used (Supplementary Figure 1D). The average sequencing depth per base per sample in
these runs was 1455X, which indicated adequate sequencing of the samples.
Sensitivity of mutation detection. To measure the mutation detection sensitivity, we
sequenced H2122 cell line DNA (with a homozygous KRAS mutation, p.G12C and a
heterozygous MET mutation, p.N375S) diluted into HL60 cell line DNA to provide different
levels of H2122 DNA, ranging from 100% (undiluted), 20% (1:3), 10% (1:9) and 5% (1:19).
Sequencing results from two independent experiments showed efficient detection of these
two mutations. The presence of mutations at all dilutions was clearly evident in the aligned
8
sequencing reads for KRAS (p.G12C) (Figure 1A, upper panel) and MET (p.N375S) (Figure
1A, lower panel). The concordance between the average variant frequencies detected at
different dilution levels and the expected frequencies are shown in Figure 1B, upper and
lower panels.
Concordance of MiSeq with IT-PGM. In the 44 tumor samples analyzed in parallel using the
MiSeq and IT-PGM sequencers a high degree of concordance was observed. A total of 110
variants (single nucleotide variations and insertions) were detected by MiSeq, 102 of which
were confirmed by IT-PGM. The 8 variants not detected by IT-PGM were not covered by
the AmpliSeq Cancer panel but were confirmed by Sanger sequencing (Supplementary
Table 1). Several variants listed in this table are potential germ line polymorphisms (KDR
p.Q472H, MET p.N375S, MET p.T1010I and KIT p.M541L). They were also listed in the
comparison study (Supplementary Table 1) as they help in establishing the overall mutation
detection accuracy and specificity of MiSeq. High degree of concordance in variant
detection was also observed between MiSeq and Sanger sequencing, pyrosequencing and
fragment analysis by capillary electrophoresis. For example, in a representative archival
sample a KRAS p.G60V (GGT>GTT) mutation was detected by pyrosequencing (Figure
2A) and was detected at a variant frequency of 45.6% at a sequencing depth of 3931X by
MiSeq (Figure 2B). IT-PGM also detected this mutation at a comparable frequency of 39%
(3120X) (Figure 2C). Similarly, an IDH2 p.R132H (CGT>CAT) (Figure 3A) mutation
detected by Sanger sequencing in a sample was also detected by MiSeq (42.0%, 3,672X)
(Figure 3B) and IT-PGM (43.1%, 3,318X) (Figure 3C). In a third example, a 4bp insertion
(G>GTCTG) detected in exon 11 of NPM1 by fragment analysis coupled to CE at 42.6%
9
(Figure 4A) was detected by MiSeq (32.8%, 1,164X) (Figure 4B) and also by IT-PGM
(26.5%, 2,249X) (Figure 4C).
Parallel comparison of mutation detection using MiSeq and routinely used testing platforms.
Eleven samples were sequenced using the MiSeq in parallel (blind) with conventional
techniques used routinely in our laboratory. In these samples, 14 sequence variants detected
by MiSeq were confirmed by Sanger sequencing (n=8), pyrosequencing (n=5) and fragment
analysis by CE (n=1) indicating complete concordance (Supplementary Table 2).
Detection of insertions by MiSeq. In our archival set of 44 samples, 23 samples had a 4 bp
insertion in NPM1 and 2 samples had FLT3 internal tandem duplications (ITDs). MiSeq
reporter detected all 4bp insertions in NPM1 but did not detect the 2 FLT3-ITDs (sample no. 3
and 40, Supplementary Table 1). These ITDs were also not detected by IT-PGM. To further
assess the ability of MiSeq reporter to call FLT3-ITDs, we sequenced an additional 8 archival
samples with known FLT3 ITDs of various sizes and allele frequencies as detected by CE.
MiSeq did not call any of the FLT3 ITDs present in these samples. However, using Pindel, an
indel detection algorithm, we were able to detect and map each of the ITDs in the MiSeq
sequencing output with complete concordance with CE based fragment analysis method
(Supplementary Table 3).
Comparison of Version 1 and Version 2 sequencing on MiSeq. During our study a MiSeq
upgrade was introduced with changes in the sequencing chemistry and imaging (referred to as
version 2 or V2), which doubles the throughput capacity in comparison with Version1 (V1). To
10
validate the upgrade, we sequenced 12 samples using MiSeq V1 and V2 in parallel. To
compensate for the higher sequencing capacity of V2 and to facilitate an optimal comparison,
we included 12 additional samples in the sequencing run with V2. A comparison of the run
metrics showed comparable cluster density and clusters pass filter between the two versions
(Supplementary Figure 2A). A higher sequencing capacity of V2 was evident by the 4.7 Gb
sequencing output in comparison with the 2.1Gb V1 output, whereas the sequencing quality
(Q30) remained comparable (Supplementary Figure 2B). The increased sequencing capacity
of V2 was evident in the doubling of the total sequencing reads and reads pass filter (PF) with
the total overall percentage of reads identified remaining comparable (Supplementary Figure
2C, upper panel). Similarly, total reads and average coverage per base per sample remained
highly comparable between V1 and V2 (Supplementary Figure 2C, lower panel). Furthermore,
a high degree of concordance was also observed in the sequencing coverage and mutations
detected by V1 and V2 (Supplementary Table 4) indicating that the V2 upgrade maintained the
same efficiency and accuracy as V1.
Sensitivity of mutation detection using V2 sequencing chemistry. Sensitivity analysis was
performed using several sequentially diluted samples of the cell line DLD1 DNA (harboring 8
heterozygous mutations) into normal control DNA. Across six different sequencing runs each
of eight heterozygous (monoallelic) mutations present in DLD1 were successfully detected in
four sequential dilutions (50%, 25%, 10% and 5%) with minimal variation in the overall variant
frequencies. The lowest detection limit was seen for the IDH1 p.G97D variant detected at a
frequency of 1.5% in the 1:19 or 5% diluted DLD1 sample, indicating high detection sensitivity
(Figure 5). Supplementary Table 5 provides a detailed summary of the dilution studies with
11
H2122 (V1) and DLD1 DNA (V2) with expected and detected variant frequencies of the
mutations at each dilution level.
Inter- and Intra-run reproducibility. Inter-run reproducibility was assessed by sequencing a 1:9
(10%) dilution of DLD1 DNA in 11 separate multiplexed sequencing runs. The results showed
that each of the eight expected mutations was consistently detected in every run with minimal
variability in variant frequencies (Supplementary Figure 3A). Intra-run assay reproducibility was
assessed by sequencing the (1:9) 10% DLD1 DNA with 24 different barcode indexes and
sequencing them on the same multiplexed sequencing run. Again, consistent detection of each
mutation in the 24 samples analyzed with comparable variant frequencies was observed
indicating high inter-assay reproducibility (Supplementary Figure 3B).
Disease monitoring or assessment of minimal residual disease (MRD). We evaluated the utility
of highly sensitive sequencing on MiSeq in monitoring minimal residual disease using 4
representative cases with archival samples analyzed and reported in our laboratory by
routinely used platforms and subsequently sequenced by MiSeq for this study. For patient 1
(Figure 6A), we sequenced five consecutive pre-treatment (day 1) and post-treatment (day 10,
185, 455, and 475) samples by MiSeq and compared to the results from conventional test
platforms. To begin with (day 1) the sample had two mutations, IDH2 p.R140Q and 4bp
insertion in exon 11 of NPM1 which were detected by Sanger sequencing analysis and
fragment analysis, respectively. MiSeq also detected these mutations. However, in subsequent
samples obtained at 10 days and 185 days, MiSeq detected the IDH2 p.R140Q mutation at an
allele frequency of 5.5% and 2.8%, respectively, and the NPM1 mutation at an allele frequency
12
of 6% and 2%, respectively. By contrast, Sanger sequencing did not detect any IDH2 mutation
in the 10 day and 185 day samples and it was reported negative, indicating the advantage of
higher sensitivity of MiSeq. In these 2 samples, capillary electrophoresis did detect the 4bp
insertions in NPM1 at 13.6% and 10.5%, respectively. In later samples obtained at 455 and
475 days, MiSeq detected a FLT3 point mutation p.D835Y as well as the IDH2 and NPM1
mutations originally detected in the sample, which were also, detected using the routine
alternative platforms (Figure 6A).
In the second sample (Figure 6B); mutations in four genes were detected at diagnosis
and after 35 days (NRAS, NPM1, DNMT3A and FLT3-ITD). At 55 and 95 days the NRAS and
NPM1 mutations decrease drastically while the FLT3-ITD and DNMT3A mutations persist
indicating the presence of distinct clonal population with different somatic mutations.
Furthermore, at 95 days two low FLT3 p.D835 mutations appeared making the total of
mutations to be followed for disease monitoring to six indicating the need to follow multiple
mutations, which can be conveniently achieved on MiSeq. In the third example (Figure 6C),
two NRAS mutations (p.G12S and p.G12C) were detected at diagnosis, of which p.G12S
decreased considerably with treatment indicating their occurrence in distinct clones. However,
an activating FLT3 kinase domain mutation (p.Y842H)35 with implications in therapy response
was observed at low levels at diagnosis and progressively increased in allelic frequency over
time. As this mutation is not routinely tested for AML, it was not tested by conventional
platforms but was identified by the virtue of multi-gene screening on MiSeq. In the fourth
sample (Figure 6D), four somatic mutations were detected at the onset of the disease in IDH2,
NRAS, NPM1 and DNMT3A whereas only the DNMT3A mutation remained detectable after 33
days. In these cases, the ability to monitor multiple somatic mutations at high sensitivity by a
13
single platform, MiSeq, highlights the advantages of the NGS platforms which not only helps in
conserving precious samples but also improving analysis and clinical reporting workflows.
Discussion
AML is a genetically heterogeneous disease and history has shown that identification of
genomic abnormalities has identified prognostically meaningful subgroups with therapeutic
implications. Therefore, more accurate identification of genetic lesions is needed for better
disease classification, risk stratification and treatment strategies. Comprehensive routine
screening using conventional or first generation sequencing platforms in a clinical molecular
diagnostics laboratory is very challenging, requiring high amounts of DNA, labor-intensive,
expensive, and logistically difficult in terms of data integration in real-time. Next generation
technologies are highly advantageous with their decreasing cost, high throughput and
multiplexing capacity thereby facilitating parallel and targeted sequencing of all genes of
interest on a routine basis. However, sample preparation, sequencing and variant calling
components of NGS technologies need to be validated before they can be applied for routine
diagnostic purposes.
In recent years, NGS studies have revealed numerous previously unrecognized or
under-recognized molecular changes of therapeutic and prognostic significance. Hence,
upfront screening for genetic abnormalities and their subsequent monitoring is essential for
optimal patient management. Here we have rigorously tested the applicability of a NGS-based
screen using MiSeq by analyzing both archival samples with known mutations and samples
assessed prospectively. For single nucleotide variants, complete concordance was observed
between MiSeq and all other sequencing methods including the IT-PGM, another NGS
14
platform. MiSeq also successfully detected all 4bp insertions in NPM1 in this cohort. MiSeq
software, however, could not identify FLT3-ITDs. A similar deficiency was also observed for IT-
PGM. To detect FLT3-ITDs using MiSeq, reanalysis of the aligned sequencing data using
another program, namely Pindel was necessary.
The sequencing results using MiSeq showed a high level of inter- and intra-run
reproducibility with consistent detection of mutations. Dilution studies with 2 cell lines harboring
a total of 10 mutations across 7 chromosomes showed MiSeq can detect every mutation at
each dilution level tested with variant frequency as low as 1.5%. This high detection sensitivity
is advantageous for better monitoring the efficacy of therapy as evident in one of the examples
included in the study where an IDH2 mutation missed by Sanger sequencing in two
longitudinal samples but was detected by MiSeq (Figure 6A). This mutation recurred at high
levels subsequently indicating reemergence of the same clone. High detection sensitivity of
MiSeq also has the potential to detect new clones at an earlier time point than less sensitive
conventional mutation detection methods. Furthermore, the high throughput capacity of MiSeq
allows for the screening of multiple genes over the course of therapy thereby maximizing the
chances of detecting any other emergent clone that would be missed using classical single-
gene screening approaches.
A summary of all somatic mutations detected by MiSeq in our study has been provided
in Supplementary Figure 4. It is interesting to note that 23% of mutations were found in genes
which are not routinely tested in AML but are important emerging markers with implications in
disease outcome. For instance, JAK2 p.V617F mutations have been found to occur at low
levels in AML with an aberrant karyotype, more prevalent in AML patients with a preceding
myeloproliferative neoplasm36 and associated with aggressive disease and recurrence37.
15
Similarly, mutations in PTPN11, which encodes a non-receptor tyrosine phosphatase SHP-2
occur at low levels in a subset of AML occurring in both adults and children. 38, 39 Furthermore,
in 78% of AML patients with a complex karyotype, loss of one copy of the TP53 gene and
mutation of the other has been reported 40, however, the prevalence in AML cases with normal
karyotype is very low (~ 2%) 41. It is also noteworthy that two cases in our study also harbored
FLT3 Y842H mutation, which is implicated in resistance to targeted therapy but is not yet
routinely tested in AML. Three of the genes added to TSACP (XPO1, KLHL6 and MYD88)
were those which are found to be mutated in CLL, however it is of interest to note that
application of this test for routine screening of 875 AML/MDS cases in our laboratory has
identified three mutations in XPO1 (p. E571K) and KLHL6 (p. A93G and p.A9G) indicating their
occurrence in AML/MDS, albeit at lower frequencies. On the whole, the above mentioned
examples demonstrate the advantages of higher sensitivity and simultaneous screening of
multiple genes by NGS in identifying clonal heterogeneity and additional genomic aberrations
in markers not routinely screened in AML.
The testing of mutation detection ability of this assay in each of the 54 genes was not
feasible due to lack of positive patient or cell line samples. However, in our set of 63 AML/MDS
samples, 100 single nucleotide variants and 33 insertions were detected by this platform and
confirmed by alternate platforms (IT-PGM, CE and Sanger sequencing) indicating complete
concordance and high specificity of mutation detection. We also detected a total of 73 somatic
mutations in 7 genes routinely tested in AML/MDS (Supplementary Figure 4). Furthermore, we
have also established the overall performance of the test (specificity, sensitivity and
reproducibility) for routine clinical use. Our experience has shown that the average sample
preparation using TSACP is less than 9 hours ( including library preparation, concentration
16
normalization, sample multiplexing and loading on MiSeq), with only 4 hours of hands-on
time and provides high multiplexing capacity of up to 12 samples (V1) and 24 samples (V2) per
run. All subsequent sequencing and analysis steps involving cluster generation, sequencing,
base-calling, sequence alignment and variant calling are performed on board and takes 28
hours (27h for sequencing and 1h for analysis) making MiSeq a very self-sufficient sequencer.
The average turnaround time of 72 hours, starting from DNA extraction to completion of
sequencing can be achieved if batching of samples is not required and is very desirable in
medium and high-volume laboratories to offset costs. Furthermore, the convenience and
efficiency of screening several routinely tested and AML- associated genes using only 250ng
of DNA and reporting the data simultaneously in a single clinical report enables efficient use of
patient material and resources in a molecular diagnostic laboratory. On the whole the overall
simplicity of sample preparation, the self-sufficiency of MiSeq sequencer and the associated
high specificity, sensitivity and accuracy makes it a highly applicable next-generation
sequencer for routine mutation screening of AML samples in a clinical molecular diagnostics
laboratory.
17
Authorship and Disclosures
Contributions
Luthra R and Singh RR: Designed the study, planned experiments and their execution, data
analysis and manuscript writing
Reddy NG: Conducted experiments and data analysis
Haghshenas V, Barkoh BA and Harmon MA: conducted experiments
Patel KP: Pathology review and review of results
Kanagal-Shamanna R: Review of clinical history and cytogenetics
Routbort MJ: Informatics support for NGS data output and annotation
Medeiros LJ: Review of results and manuscript writing
Kantarjian H, Cortes J, Ravandi F: Review of the results and clinical information
Disclosures
R. Luthra, Patel KP, Reddy NG, Haghshenas V, Barkoh BA, Harmon MA, Kanagal-Shamanna
R, Routbort MJ, Ravandi F, Medeiros LJ and Singh RR have no conflicts of interest to declare.
Kantarjian H and Cortes J have grant support to declare but outside the submitted work.
Completed COI declaration forms from all authors have been submitted as supplementary
material with the revised manuscript.
18
REFERENCES.
1. Owen C, Barnett M, Fitzgibbon J. Familial myelodysplasia and acute myeloid leukaemia-a review. Br J Haematol. 2008;140(2):123-32. 2. Sanders MA, Valk PJ. The evolving molecular genetic landscape in acute myeloid leukaemia. Curr Opin Hematol. 2013;20(2):79-85. 3. Ley TJ, Ding L, Walter MJ, McLellan MD, Lamprecht T, Larson DE, et al. DNMT3A mutations in acute myeloid leukemia. N Engl J Med. 2010;363(25):2424-33. 4. Ley TJ, Mardis ER, Ding L, Fulton B, McLellan MD, Chen K, et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature. 2008;456(7218):66-72. 5. Ding L, Ley TJ, Larson DE, Miller CA, Koboldt DC, Welch JS, et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature. 2012;481(7382):506-10. 6. Welch JS, Ley TJ, Link DC, Miller CA, Larson DE, Koboldt DC, et al. The origin and evolution of mutations in acute myeloid leukemia. Cell. 2012;150(2):264-78. 7. Swederlow SH, Campo E, Harris NLE. WHO classification of tumours of haematopoietic and lymphoid tissues, International Agency for Research on Cancer Press, Lyon, France. 2008. 8. Grossmann V, Schnittger S, Kohlmann A, Eder C, Roller A, Dicker F, et al. A novel hierarchical prognostic model of AML solely based on molecular mutations. Blood. 2012;120(15):2963-72. 9. Patel JP, Gonen M, Figueroa ME, Fernandez H, Sun Z, Racevskis J, et al. Prognostic relevance of integrated genetic profiling in acute myeloid leukemia. N Engl J Med. 2012;366(12):1079-89. 10. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med. 2013;368(22):2059-74. 11. Estey EH. Acute myeloid leukemia: 2013 update on risk-stratification and management. Am J Hematol. 2013;88(4):318-27. 12. Burnett A, Wetzler M, Lowenberg B. Therapeutic advances in acute myeloid leukemia. J Clin Oncol. 2011;29(5):487-94. 13. Mardis ER. Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet. 2008;9:387-402. 14. Mardis ER. The impact of next-generation sequencing technology on genetics. Trends Genet. 2008;24(3):133-41. 15. Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010;11(1):31-46. 16. Loyo M, Li RJ, Bettegowda C, Pickering CR, Frederick MJ, Myers JN, et al. Lessons learned from next-generation sequencing in head and neck cancer. Head Neck. 2013;35(3):454-63. 17. Mardis ER. Applying next-generation sequencing to pancreatic cancer treatment. Nat Rev Gastroenterol Hepatol. 2012;9(8):477-86. 18. Ramsay AJ, Martinez-Trillos A, Jares P, Rodriguez D, Kwarciak A, Quesada V. Next-generation sequencing reveals the secrets of the chronic lymphocytic leukemia genome. Clin Transl Oncol. 2013;15(1):3-8. 19. Toffanin S, Cornella H, Harrington A, Llovet JM. Next-Generation Sequencing: Path for Driver Discovery in Hepatocellular Carcinoma. Gastroenterology. 2012/09/26 ed, 2012.
19
20. Ku CS, Cooper DN, Roukos DH. Clinical relevance of cancer genome sequencing. World J Gastroenterol. 2013;19(13):2011-18. 21. Harismendy O, Schwab RB, Bao L, Olson J, Rozenzhak S, Kotsopoulos SK, et al. Detection of low prevalence somatic mutations in solid tumors with ultra-deep targeted sequencing. Genome Biol. 2011;12(12):R124. 22. Wang C, Krishnakumar S, Wilhelmy J, Babrzadeh F, Stepanyan L, Su LF, et al. High-throughput, high-fidelity HLA genotyping with deep sequencing. Proc Natl Acad Sci U S A. 2012;109(22):8676-81. 23. Eyre DW, Golubchik T, Gordon NC, Bowden R, Piazza P, Batty EM, et al. A pilot study of rapid benchtop sequencing of Staphylococcus aureus and Clostridium difficile for outbreak detection and surveillance. BMJ Open. 2012;2(3). 24. Liu L, Li Y, Li S, Hu N, He Y, Pong R, et al. Comparison of next-generation sequencing systems. J Biomed Biotechnol. 2012;2012:251364. 25. Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, et al. Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotechnol. 2012;30(5):434-9. 26. Quail MA, Smith M, Coupland P, Otto TD, Harris SR, Connor TR, et al. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics. 2012;13:341. 27. Junemann S, Sedlazeck FJ, Prior K, Albersmeier A, John U, Kalinowski J, et al. Updating benchtop sequencing performance comparison. Nat Biotechnol. 2013;31(4):294-6. 28. Singh RR, Bains A, Patel KP, Rahimi H, Barkoh BA, Paladugu A, et al. Detection of high-frequency and novel DNMT3A mutations in acute myeloid leukemia by high-resolution melting curve analysis. J Mol Diagn. 2012;14(4):336-45. 29. Verma S, Greaves WO, Ravandi F, Reddy N, Bueso-Ramos CE, O'Brien S, et al. Rapid detection and quantitation of BRAF mutations in hairy cell leukemia using a sensitive pyrosequencing assay. Am J Clin Pathol. 2012;138(1):153-6. 30. Deqin M, Chen Z, Nero C, Patel KP, Daoud EM, Cheng H, et al. Somatic deletions of the polyA tract in the 3' untranslated region of epidermal growth factor receptor are common in microsatellite instability-high endometrial and colorectal carcinomas. Arch Pathol Lab Med. 2012;136(5):510-6. 31. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29(1):24-6. 32. Routbort MH, B. Patel, K.P, Singh, R.R. Aldape,K. Reddy,N.G. Barkoh,B.A et al. OncoSeek - A versatile Annotation and Reporting System for Next Generation Sequencing-Based Clinical Mutation Analysis of Cancer Specimens, Abstract - Association For Molecular Pathology, Annual Conference. J Mol Diagn. 2012;14(6). 33. Ye K, Schulz MH, Long Q, Apweiler R, Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics. 2009;25(21):2865-71. 34. Singh RR, Patel KP, Routbort MJ, Reddy NG, Barkoh BA, Handal B, et al. Clinical validation of a next-generation sequencing screen for mutational hotspots in 46 cancer-related genes. J Mol Diagn. 2013;15(5):607-22. 35. Bagrintseva K, Schwab R, Kohl TM, Schnittger S, Eichenlaub S, Ellwart JW, et al. Mutations in the tyrosine kinase domain of FLT3 define a new molecular mechanism of
20
acquired drug resistance to PTK inhibitors in FLT3-ITD-transformed hematopoietic cells. Blood. 2004;103(6):2266-75. 36. Levine RL, Wadleigh M, Cools J, Ebert BL, Wernig G, Huntly BJ, et al. Activating mutation in the tyrosine kinase JAK2 in polycythemia vera, essential thrombocythemia, and myeloid metaplasia with myelofibrosis. Cancer Cell. 2005;7(4):387-97. 37. Illmer T, Schaich M, Ehninger G, Thiede C. Tyrosine kinase mutations of JAK2 are rare events in AML but influence prognosis of patients with CBF-leukemias. Haematologica. 2007;92(1):137-8. 38. Hou HA, Chou WC, Lin LI, Chen CY, Tang JL, Tseng MH, et al. Characterization of acute myeloid leukemia with PTPN11 mutation: the mutation is closely associated with NPM1 mutation but inversely related to FLT3/ITD. Leukemia. 2008;22(5):1075-8. 39. Tartaglia M, Martinelli S, Iavarone I, Cazzaniga G, Spinelli M, Giarin E, et al. Somatic PTPN11 mutations in childhood acute myeloid leukaemia. Br J Haematol. 2005;129(3):333-9. 40. Rucker FG, Schlenk RF, Bullinger L, Kayser S, Teleanu V, Kett H, et al. TP53 alterations in acute myeloid leukemia with complex karyotype correlate with specific copy number alterations, monosomal karyotype, and dismal outcome. Blood. 2012;119(9):2114-21. 41. Fernandez-Mercado M, Yip BH, Pellagatti A, Davies C, Larrayoz MJ, Kondo T, et al. Mutation patterns of 16 genes in primary and secondary acute myeloid leukemia (AML) with normal cytogenetics. PLoS One. 2012;7(8):e42334.
21
Figure Legends
Figure 1. Sensitivity of detection using MiSeq. (A) A representative image of the aligned
sequencing reads as visualized in Integrative Genome Viewer shows a progressive decrease
in the single nucleotide variant detected in the background of wild type sequence in
sequentially diluted H2122 cell line DNA into HL60 cell line DNA. With dilution, a clear and
proportionate decrease in the homozygous KRAS (GGT>TGT, p.G12C) and a heterozygous
MET (AAC>AGC, p.N375S) mutation are evident. KRAS gene has the reverse orientation on
chromosome 12. However, by default, IGV exhibits aligned reads in ‘forward’ orientation.
Hence the substituted nucleotide appears as ‘A’ instead of ‘T’ in the reads (CCA>ACA or
GGT>TGT). The directions of gene orientation are indicated by the arrows. (B) The expected
variant frequencies and the average variant frequency of the KRAS (p.G12C) and MET
(p.N375S) mutations detected in two independent sensitivity analyses are shown.
Figure 2. Concordance of MiSeq, Pyrosequencing and IT-PGM. A representative sample in
which a KRAS mutation (p.G60V) identified by pyrosequencing was also detected and called
by both MiSeq and the Ion-Torrent Personal Genome Machine. (A) The pyrosequencing
results showing the base change (GGT>GTT, indicated by the arrow) in KRAS in comparison
with wild type control. (B) The same mutation change is evident in the aligned reads from the
MiSeq sequencing output at a very high coverage of 3,931X and showing a variant frequency
of 45.6%. (C) Sequencing on the IT-PGM also confirms the same mutation at a sequencing
depth and variant frequency comparable with the MiSeq. KRAS gene has the reverse
orientation on chromosome 12. As the default setting on IGV exhibits aligned reads in ‘forward’
22
orientation the substituted nucleotide appears as ‘A’ instead of ‘T’ in the reads (CCA>CAA or
GGT>GTT). The orientation of the gene is depicted by the arrows in panels B and C.
Figure 3. Concordance of MiSeq, Sanger sequencing and IT-PGM. A mutation in IDH2
(p.R132H) resulting from a CGT>CAT substitution originally detected by Sanger sequencing
(A) is also detected clearly in the MiSeq sequencing output and (B) sequencing using the IT-
PGM with comparable coverage and mutational frequency between both platforms. IDH2 has
reverse orientation on chromosome 15. The default setting in IGV shows aligned reads in
forward orientation only. Hence, the substituted nucleotide appears as ‘T’ instead of ‘A’ in the
reads (CGT>CAT or GCA>GTA). The orientation of the gene is depicted by the arrows in
panels B and C.
Figure 4. Ability of sequencing using MiSeq to detect insertions in NPM1. (A) A characteristic 4
bp insertion (G>GTCTG) in exon 11 of NPM1 gene is evident in this sample as detected by
capillary electrophoresis and is present at a ratio of 0.426 or 42.6%. (B) Sequencing of the
same sample using MiSeq also detected and called this mutation at a sequencing depth of
1,164X and a variant frequency of 32.8%. (C) IT-PGM also detected and called the same 4 bp
insertion at a sequencing depth of 2,249X and a variant frequency of 26.5% in the background
of wild type sequence. The presence of the insertions in the aligned reads are indicated as a
characteristic red bars in IGV as seen in panels B and C ( pointed by the arrows).
Figure 5. Assay sensitivity for V2 upgrade. Undiluted (100%) and sequentially diluted samples
of DLD1 cell line DNA into normal female control DNA (50%, 25%, 10% and 5%) were
23
sequenced in 6 different sequencing runs and the ability to detect the 8 different heterozygous
(monoallelic) mutations present in each dilution was tested. These mutations were successfully
detected in every dilution tested in all of the sequencing runs with minimal variation in the
variant frequency detected.
Figure 6. Application for disease monitoring. (A) An IDH2 and NPM1 mutation detected in a
patient at the onset of the disease were missed subsequently when present at low levels (days
10 and 185) when monitored by Sanger sequencing but were effectively detected and called
by MiSeq. The time point of clinical remission is indicated by the arrow (B) Mutations in
DNMT3A and NRAS, and insertions in NPM1 and FLT3 were detected at diagnosis and
followed subsequently. Two low-lying FLT3 D835 mutations also appear at day 95 indicating
the need to follow multiple mutations. The FLT3 ITD percentages as detected by CE are
plotted. The time point of clinical remission is indicated by the arrow (C) Two NRAS mutations
were detected and one of it disappears with treatment indicating distinct clones. A FLT3
Y842H activating mutation, not routinely tested was also detected by multi-gene screening on
MiSeq (D) Mutations were detected in 4 genes at diagnosis warranting testing of each of them
post treatment. The mutiplexing capacity of MiSeq would help consolidating them on to a
single platform. Data presented in sub panels C and D were patients with refractory AML.
1
Supplementary Methods
Genes interrogated by TruSeq Cancer Amplicon Panel: AKT1, BRAF, FGFR1, GNAS,
IDH1, FGFR2, KRAS, NRAS, PIK3CA, MET, RET, EGFR, JAK2, MPL, PDGFRA,
PTEN, TP53, FGFR3, FLT3, KIT, ERBB2, ABL1, HNF1A, HRAS, ATM, RB1, CDH1,
SMAD4, STK11, ALK, SRC, SMARCB1, VHL, MLH1, CTNNB1, KDR, FBXW7, APC,
CSF1R, NPM1,SMO, ERBB4, CDKN2A, NOTCH1, JAK3, PTPN11, GNAQ and GNA11.
Library preparation for sequencing on MiSeq: The genomic library was prepared using
250 ng of DNA template and customized Truseq Amplicon Cancer Panel (TSACP) kit
(illumina) according to the manufacturer’s protocol. Briefly, library preparation involved
hybridization of the probe mixture to genomic DNA and areas of interest were captured
by extension and ligation. In addition to areas complimentary to genomic DNA, each of
the probes has a common sequence, which is used to PCR-amplify the captured
sequences in a subsequent step. This PCR step also adds sequencing adapters (for
binding of sequencing primers) and short stretches of identifier sequences or barcodes
on the 3 and 5 ends of the amplicons to be used as unique sample identifiers, thus
facilitating multiplexed sequencing of the samples.
Sequencing using MiSeq. Library generated using customized TSCAP was purified
using AMPure magnetic beads (Agentcourt, Brea, CA) according to the manufacturer’s
protocol. From each library, equal quantities of the DNA were isolated and eluted using
Library normalization beads (TSCAP kit) following the manufacturer’s instructions and
equal volumes were mixed. This ensures same representation of the library from each
sample during multiplexed sequencing. Paired end sequencing of samples was
performed using MiSeq Reagent Kit, V1 (300 cycles) using MiSeq sequencer (illumina).
2
On average, libraries from 10 samples per sequencing run were multiplexed. Base
calling and the sequencing run summary were obtained using Real Time Analysis
Software (V1.14.23) (illumina). Sequencing quality was evident by the Q30 scores (one
error in 1000 bp sequence) and a cutoff of 85% sequence with Q30 score was used as
an indication of successful sequencing.
Sensitivity, inter-run and intra-run reproducibility. Sensitivity was determined by serially
diluting DNA from H2122 (ATCC CRL-5985), a human lung carcinoma cell line with a
known homozygous mutation in KRAS and a heterozygous mutation in MET into DNA
from HL60 cell line, a human promyelocytic leukemia cell line. Briefly, DNA isolated
from H2122 was diluted into DNA from HL60, in ratios of 1:4, 1:9, 1:19 (H2122:HL60)
resulting in 20%, 10% and 5% dilutions, respectively which were sequenced. Similarly,
DLD1 cell line (ATCC CCL-221) DNA with 8 heterozygous (monoallelic) mutations was
serially diluted into normal DNA control (Promega, Madison, WI) in ratios of 1:1, 1:3, 1:9
and 1: 19 (DLD1: normal) resulting in 50%, 25%, 10% and 5% dilutions of DLD1 DNA
and sequenced in 6 independent runs. Intra-run reproducibility was tested by
sequencing 1:9 diluted (10%) (DLD1: normal DNA) sample barcoded by 20 different
barcodes on a single sequencing run and inter-run assay reproducibility was assessed
by sequencing 10% DLD1 DNA diluted into normal DNA sequenced in 12 different runs.
Library preparation for Ion Torrent Personal Genome Machine (IT-PGM). Library
preparation was performed using the Ion Torrent (IT) Ampliseq Kit 2.0 Beta (Life
Technologies) and IT Ampliseq cancer panel primers (Life Technologies) to interrogate
mutational hotspots in 46 cancer-related genes following the manufacturer’s instructions
as described previously.34 Sequencing adaptors with short stretches of index sequences
3
(barcodes) that enable sample multiplexing were ligated to the amplicons using the Ion
Express Barcode Adaptors Kit (Life Technologies). The library prepared was quantified
using the Bioanalyzer highsensitivty DNA chip (Agilent Technologies Inc, Santa Clara,
CA).
Emulsion PCR and sequencing using IT-PGM. Clonal amplification of the barcoded
library DNA onto the ionspheres (ISPs) by emulsion PCR (E-PCR) and subsequent
isolation of ISPs with DNA was performed using Ion Express Template Kit (Life
Technologies) and IT OneTouch ES (Life Technologies) as described previously.34
Enriched ISPs were subjected to sequencing on a 318 IonChip (8 samples) using the
sequencing kit (Life Technologies) as per the manufacturer’s instructions. A cutoff of
300,000 reads with a quality score of AQ20 (1 misaligned base per 100 bases) was
used as a measure of successful sequencing of a sample. Torrent Suite V.2.0.1 and
variant caller V.1.0 (Life Technologies) were used for alignment and variant calling,
respectively using Human genome build 19 (hg19) as reference.
Supplementary Table 1. Summary of mutations detected in the sample set by MiSeq and IT-PGM. Tumor type used, the correlation of results from sequencing on MiSeq using Truseq Amplicon Cancer panel and on IT-PGM using Ampliseq Cancer Panel have been summarized (NC - Not Covered)
MiSeq Ion-Torrent PGM
Sample No WHO diagnosis (Blast Count)
Cytogenetic profile
Mutation Detected by Miseq (Coverage)
Variant
Frequency (%)
Mutation detected by IT-PGM (Coverage)
Variant Frequency
(%)
1 AMML (38%)
46,XX[20]
KDR p.Q472H (6193) KRAS p.G12A (5745) NRAS p. Q61R (765)
49.4 18.3 14.6
Yes (3849) Yes (1271) Yes (3392)
54.0 49.8 19.6
2 AML-MRC (33%)
46,XY[20]
KDR p.Q472H (3035)
NPM1, Exon 11, 4bp Insertion (3231)
47.4
29.4
Yes (2268)
Yes (2196)
47.0
33.8
3 AMML (67%)
46,XY,del(12) (p11.2)[2]
FLT3 p.D835H (1826) KDR p.Q472H (3927) MET p.T1010I (4415)
NPM1, Exon 11, 4bp Insertion (1190) APC p.L1129S (3605)
24.7 49.1 51.6
34.2 46.0
Yes (2584) Yes (4660) Yes (5780)
Yes (4448)
NC on IT-PGM (Sanger Confirmed )
48.4 47.0 49.7
31.8
-
4 AMML (81%)
46,XX[20]
KDR p.Q472H (3034) KIT p.M541L (6536)
NPM1, Exon 11, 4bp Insertion (937) ABL1 p.L248R (275)
49.1 50.1
35.1 5.09
Yes (4083) Yes (1608)
Yes (3018)
NC on IT-PGM (Sanger Confirmed )
46.0 50.1
39.0
-
5 AML without
maturation (84%) 46,XX[20]
IDH1 p.R132C (2801)
NPM1, Exon 11, 4bp Insertion (937)
47.23
31.8
Yes (4252)
Yes (3108)
37.6
32.8
6 t-AML (73%)
46,XX[20]
KIT p.M541L (5736)
NPM1, Exon 11, 4bp Insertion (937)
49.9
31.8
Yes (1608)
Yes (2090)
50.1
25.3
7 AML without
maturation (79%) 46,XX[20]
ATM, p.V410A (5907) KDR p.Q472H (4928) MET p.N375S (1449)
NPM1, Exon 11, 4bp Insertion (1449)
50.1 50.2 52.8
39.1
Yes (4648) Yes (2832) Yes (2335)
Yes (3166)
52
48.0 60.8
34.8
8 AML without
maturation (91%) 47,XX,+15[1]
NPM1, Exon 11,
4bp Insertion (1617) KDR p.Q472H (4693)
35.0 50.8
Yes(302) Yes (452)
35.8 49.0
9 Acute erythroid leukemia (20%)
46,XX[20] NPM1, Exon 11,
4bp Insertion (981) APC p.M1413V (2816)
39.1
51.1
Yes (2502)
NC on IT-PGM
(Sanger Confirmed )
32.1
-
10 AML-MRC (30%)
47,XX,+8[10]
MET p.N375S (1449) NRAS p.G12R (3376)
56.3 37.5
Yes (2335) Yes (4108)
60.8 21.2
11 AML without
maturation (93%)
46,XY[20]
ATM p.A1039T (2868) KDR p.Q472H (4856)
33.5 50.8
Yes (2346) Yes (3065)
62.4 46.0
12 AMML (38%)
46,XY[20]
KDR p.Q472H (4632) NPM1, Exon 11,
4bp Insertion (1929) NRAS p.G13D (2685)
51.0
35.6 20.1
Yes (2581)
Yes (2059) Yes (3911)
48.0
28.8 27.4
13 AML (56%)
46,XY[20]
IDH1 p. R132H (2607) KDR p.Q472H (4700) KIT p.M541L (9548)
NPM1, Exon 11, 4bp Insertion (2488)
25.7 99.7 48.4
19.5
Yes (4934) Yes (3426) Yes (1765)
Yes (3633)
46.4 99.0 50.7
18.1
14 Persistent AML (82%)
Complex KRAS p.G12D (3984) 51.4 Yes (1271) 49.8
15 AMML (24%)
46,XX [20] KRAS p.G12R (3574) 45.9 Yes (973) 42.5
16 AML-MRC (56%)
46,XY,del(16) (q13q21)[16]
IDH1 p.R132H (3009) KIT p.M541L (9081)
NPM1, Exon 11, 4bp Insertion (461)
KDR p.Q472H (4530)
44.7 49.0
31.8 58.2
Yes (4055) Yes (1097)
Yes (2092) Yes (3496)
51.3 47.3
31.6 52.0
17 AML (1%)
46,XX[20] KDR p.Q472H (3937) ABL1 p. L248R (458)
47.8 6.11
Yes (2448)
NC on IT-PGM (Sanger Confirmed)
50.0 -
18 AML (0%)
46,XY,t(11;21) (p11.2;q22)[1]
MET p.N375S (1735) 37.9 Yes (2335) 60.8
19 (AML) (3%)
46,XX[20]
IDH1 p. R132L (3012)
22.9
Yes (4934)
46.4
20 AML without
maturation (80%)
46,XY,del(20) (q11.2q13.3)[5]
JAK2 p.V617F (950) 50.2 Yes (4701) 49.2
21
MDS, RAEB-2 (10%)
46,XX[20]
KDR p.Q472H (3459)
NPM1, Exon 11, 4bp Insertion (1260)
48.1
30.2
Yes (1121)
Yes (1453)
52.0
23.9
22
MDS (1%)
47, XY, +8[10]
APC p.R876Q (2791)
52.9
Yes (3345)
46.1
23 AML-MRC (40%)
46,XX,del(5)(q13q33)[20]
KRAS p.Q61H (2494) 27.5
Yes (2985)
37.02
24
AML-MRC with CLL (53%)
46,XY[20] NPM1, Exon 11,
4bp Insertion (297)
32.3
Yes (3971)
29.3
25
AML-MRC (43%) 46,XY[18] KDR p.Q472H (4794) 48.7
Yes (1146)
61.0
26 AML-MRC (61%)
46,XX[20]
FLT3 p.D835Y (2939)
NPM1, Exon 11, 4bp Insertion (982)
49.1
36.4
Yes (2584)
Yes (1293)
48.4
36.1
27 AMML (61%)
46,XX[20]
FLT3 p.D835A (592) KIT p.M541L (3109) NRAS p.G13D (911)
24.6 50.3 36.0
Yes (4476) Yes (1608) Yes (3911)
12.5 50.1 27.4
28 AML-MRC (42%)
46,XY[20]
ATM p.P604S (1225)
55.1
Yes (555)
30.0
NPM1, Exon 11, 4bp Insertion (1807)
PTPN11 p.A72D
32.6 35.1
Yes (961) Yes (353)
22.0 79.6
29 (Acute
monocytic Leukemia) (74%)
53,XX,+X, der(1)t(1;1)
(p36.3;q12),+2,+4,+5,+8,+12,+
16[4]
ABL1 p.R307Q (2556) KRAS p.Q61H (1197)
NPM1, Exon 11, 4bp Insertion (1553)
44.6 37.9
20.2
Yes (703) Yes (2985)
Yes (2594)
61.4 37.0
21.7
30 AML-MRC (26%)
45,X,-Y[20]
NPM1, Exon 11, 4bp Insertion (2129) KDR p.Q472H (4150) KIT p.M541L (8411) MPL p.W515R (331)
32.0 48.7 99.5 32.3
Yes (1735) Yes (1134) Yes (4188) Yes (498)
24.7 58.0 99.9 41.3
31 t-AML (81%)
46,XX[20]
NPM1, Exon 11, 4bp Insertion (490)
APC p.I1307K IDH2 p.R140Q (1047)
36.5 48.8 47.6
Yes (3189) Yes (657)
NC on IT-PGM (Sanger Confirmed)
34.7 43.6
-
32 AML-MRC (88%)
46,XY,inv(9) (p12q13)[20]
KDR p.Q472H (4348) KIT p.M541L (6982) IDH2 p.R140Q (542)
48.6 49.5 26.5
Yes (3193) Yes (933)
NC on IT-PGM (Sanger Confirmed)
52.0 49.8
-
33 AML with
Inv16 (42%)
46,XX,inv(16)(p13.1q22)[16]
KDR p.Q472H (4683) RET p.S649L (8513) TP53 p. R148T (781)
47.7 47.6 92.5
Yes (2564) Yes (3713)
NC on IT-PGM (Sanger Confirmed)
56.0 40.4
-
34
AML with Inv16 (53%)
46,XX,inv(16) (p13.1q22)[19]
KDR p.Q472H (2304) KRAS p.G12D (1727) NRAS p.Q61R (325)
52.7 7.7
19.0
Yes (2858) Yes (1271) Yes (3392)
52.0 49.8 19.6
35 AML-MRC (37%)
46,XX,del(5)(q13q33)[4]
TP53 p.V274F (795) TP53 p. H179R (1059) APC p.S1404F (874)
39.1 24.8 5.15
Yes (503)
Yes (1558) NC on IT-PGM
(Sanger Confirmed)
26.8 41.7
-
36 AMML (65%)
46,XY,del(7)(q22),-10,-18[1]
KDR p.Q472H (4840) FLT3 p.D835Y (2365) IDH1 p.R132H (2975)
NPM1, Exon 11, 4bp Insertion (1164)
47.4 9.5
39.8
32.8
Yes (3474) Yes (3633) Yes (1443)
Yes (2249)
56.0 18.1 29.1
26.5
37 AML (81%)
46,XY,t(4;17)(q13;p13)[11]
FLT3 p.Y842H (1968) NRAS p. G12C (1938) NRAS p.G13D (1939)
15.7 21.5 18.0
Yes (3233) Yes (4108) Yes (3911)
9.3 21.2 27.4
38
AML (80%)
47,XY,+Y[6]
ATM p.V410A (6016) IDH1 p.R132H (2960) KDR p.Q472H (6029)
NPM1, Exon 11, 4bp Insertion (2214)
51.0 43.8 50.2
34.7
Yes (4648) Yes (4934) Yes (4030)
Yes (3961)
46.4 46.4 46.0
34.4
39 AML (45%)
Complex NPM1, Exon 11,
4bp Insertion (1342) KIT p.D816V (3832)
33.01 77.6
Yes (2975) Yes (4331)
31.6 77.8
40 Acute monocytic Leukemia (43%)
46,XX [20]
FLT3 p.D835V (1947) KDR p.Q472H (4861)
NPM1, Exon 11, 4bp Insertion (3560)
PTPN11 p. E76K (7409)
21.1 46.3
32.1 24.4
Yes (4476) Yes (1899)
Yes (1443) Yes (3270)
12.1 47.0
12.1 17.1
41 AML (32%)
NA IDH1 p.R132C (3520) KIT p.M541L (12238)
41.0 50.7
Yes (4934) Yes (1483)
46.4 48.6
42 AML-MRC (7%)
46,XY[20] KIT p.M541L (22765) KRAS p.G60V (3931)
99.8 45.6
Yes (991) Yes (8063)
99.8 38.5
43 MDS (15%)
46,XY[19]
KIT p.M541L (16620)
NRAS p.G13R (531) NRAS p.G12A(531)
49.4 23.5 21.2
Yes (1608) Yes (2804) Yes (2827)
50.1 23.6 23.6
44 AMML (61%)
46,XX,[22]
KDR p.Q472H (4218)
NPM1, Exon 11, 4bp Insertion (1052)
99.3
37.8
Yes (1474)
Yes (1576)
100
22.8
Supplementary Table 2. Sequencing of samples in parallel using MiSeq and
conventional sequencing platforms. A set of 11 samples were sequenced in parallel
using MiSeq and other sequencing platforms used in the laboratory and compared.
Diagnosis Blast (%)
MiSeq
Detection on Parallel Platforms
Mutation detected (Coverage)
Frequency (%)
1 AML (81%)
DNMT3A p. R882H (822)
47.8
Yes (Sanger)
2 AML (10%)
IDH1 p.R132C (4581) JAK2 p. V617F (7283)
33.7 96.8
Yes (Sanger)
Yes (Pyro)
3
AML (1%)
DNMT3A
p. R882H (2188)
25.8
Yes (Sanger)
4
AML-MRC (49%)
JAK2 p. V617F (5202)
25.9
Yes (Pyro)
5
AML-MRC (67%)
IDH1 p.R132C (3223)
34.3
Yes (Sanger)
6 AML-MRC (50%)
NRAS p.G12D (4905)
13.9
Yes (Pyro)
7
t-AML (90%)
IDH1 p.R132L (4018)
40.9
Yes (Sanger)
8 AMML (76%)
NPM1 (4bp Ins) (800)
DNMT3A p. R882H (466)
30.2
42.2
Yes (Capillary Electrophoresis)
Yes (Sanger)
9 t-AML (35%)
IDH2 p.R140Q (1816)
31.5
Yes (Sanger)
10 AML-MRC (90%)
NRAS p.Q61K (1167)
31.1
Yes (Pyro)
11 AML (30%)
IDH2 p.R140W (4522) JAK2 p. V617F (1833)
31.2 23.6
Yes (Sanger)
Yes (Pyro)
Supplementary Table 3. Detection of FLT3 ITDs by MiSeq Reporter and Pindel. 10 archival samples with known internal tandem duplications (ITDs) in exon 11 of FLT3 were sequenced using MiSeq and the aligned sequencing data was reanalyzed via Pindel, a program designed to detect indels. (NC – Not called by MiSeq Reporter)
FLT3 ITD Detection
Sample
Cytogenetics
Capillary Electrophoresis
(ITD Size and % allele frequency)
MiSeq Reporter
Pindel
1
AMML (67%)
46,XY,del(12)(p11.2) [2] 47,XY,+8[1] 46,XY[37]
18bp (23%) NC Yes (18bp)
2
AMML (81%)
46,XX[20] 31bp (39%) NC Yes (30bp)
3
AML without maturation (56%)
46,XY,t(11;21)(p11.2;q22)[1]
94bp (1%) NC Yes (93bp)
4
Acute monocytic leukemia (86%)
Complex 61bp (3%) NC Yes (60bp)
5
AMML (61%)
46,XX,[22]
42bp (3%)
58bp (35%)
NC NC
Yes (42bp) Yes (57bp)
6
AML with maturation (38%)
46,XY[13] 31bp (32%) NC Yes (30bp)
7
t-AML (81%)
46,XX[20] 46bp (1%) NC Yes (45bp)
8
Acute monoblastic leukemia (68%)
Complex 55bp (6%) 58bp (9%)
NC NC
Yes (54bp) Yes (57bp)
9
AML with Maturation (32%)
Complex 61bp (3%) NC Yes (60bp)
10
AML (3%)
NA (PB)
100bp (34%)
NC Yes (99bp)
Supplementary Table 4. Comparison of sequencing results from MiSeq Version1 and Version 2 sequencing. 12 samples were sequenced by both version 1 and Version 2 sequencing on MiSeq. The variants detected, their frequencies and sequencing coverage are compared.
Variants Version 1 Version 2
Sample Mutation Detected by Miseq and Coverage
Variant Frequency (%)
Coverage (X) Variant
Frequency (%) Coverage
(X)
1
FGFR3 p.F384L KIT, p.M541L MET, T1010I
48.7 51.4 51.2
306
18034 8119
49.5 51.0 54.3
216
11994 5479
2
APC p.E1317Q JAK2 p.V617F
50.7 59.7
9923 3015
49.9 59.3
12056 3416
3
KIT p.M541L TP53 p.Y220C
99.8 54.8
21312 5348
99.8 57.5
22977 5358
4
KIT p.M541L
51.8
19410
50.9
23113
5
KDR p.Q472H
50.5
9159
50.2
10275
6
KDR p.Q472H TP53 p.R273H
49.9 36.6
6433 2151
50.5 35.4
5449 1948
7
TP53 p.N247I TP53 p. R110P
8.7 29.1
3702 999
7.0
22.9
4539 1938
8
TP53 p.N239S 11.0 4460 10.3
4425
9
NPM1 (4bp insertion)
PTPN11 p.S502L
33.9 36.1
1676 5772
40.5 38.9
2863 8682
10
KIT p.M541L 48.5 9159 50.2 10275
11
KDR p.Q472H
99.5
5046
99.6
6083
12
KDR p.Q472H
51.4
6050
51.2
9207
Supplementary Table 5. Sensitivity studies were conducted using two cell lines (H2122 and DLD1). Details regarding the various dilutions used,
the average expected and detected variant frequencies of the mutations in six separate dilution studies are summarized below. The cytogenetic
profile of the cell lines is also included when available. The information summarized is an average of 2 and 6 separate studies for H2122 and
DLD1, respectively.
DLD1 (Variants)
Chromosome positions
DLD1
(Undiluted)
DLD1 (50%) (1:1 diluted)
DLD1 (25%) (1:3 diluted)
DLD1 (10%) (1:9 diluted)
DLD1 (5%)
(1:19 diluted)
Variant Frequency
(%)
Expected
(%)
Detected
(%)
Expected
(%)
Detected
(%)
Expected
(%)
Detected
(%)
Expected
(%)
Detected
(%)
IDH1 p.G97D PIK3CA p.D549N PIK3CA p.E545K
KIT p.V532I XPO1 p.E510K KRAS p.G13D TP53 p.S241F SMO p.T640A
2q33.3 3q26.3 3q26.3
4q11-q12 2p15
12p12.1 17p13.1 7q32.3
33.2 48.8 49.4 50.5 66.8 50.2 49.2 64.6
16.6 24.4 24.7 25.2 33.4 25.1 24.6 32.3
13.5 25.0 25.1 25.9 42.4 26.5 25.5 36.0
8.3 12.2 12.3 12.6 16.7 12.5 12.3 16.1
7.1 10.8 11.2 12.7 24.5 14.5 12.6 23.0
3.3 4.8 4.9 5.0 6.6 5.0 4.9 6.4
2.8 4.6 4.9 5.6
11.3 6.2 5.2 9.2
1.6 2.4 2.4 2.5 3.3 2.5 2.4 3.2
1.5 2.4 2.5 2.9 6.0 3.5 2.5 6.4
DLD1 Cytogenetic analysis (ATCC CCL-221): Karyotype: 46, XY, 2, + dir dup (2)(p13p23). Modal chromosome number of 46, occurring in 86% of cells. Rate of polyploidy (17.1%).
H2122 (Variants)
Chromosome positions
H2122
(Undiluted)
H2122 (20%) (1:4 diluted)
H2122 (10%) (1:9 diluted)
H2122 (5%) (1:19 diluted)
Frequency
(%)
Expected
(%)
Detected
(%)
Expected
(%)
Detected
(%)
Expected
(%)
Detected
(%)
NRAS p.G12C MET p.N375S
1p13.2 7q31
99.5 50.4
19.9 10.0
17.2 7.2
9.9 5.0
7.6 3.6
4.9 2.5
4.3 2.1
H2122 Cytogenetic analysis (ATCC CRL-5985) : Not available
Supplementary Figure Legends
Supplementary Figure 1. MiSeq Sequencing run metrics. The performance of 16 sequencing
runs are summarized. (A) The cluster formation efficiency across the runs is reflected by the
cluster density and the clusters pass filter. (B) Plot of total sequencing outputs with a base
calling quality of more than Q30 (one error in 1000 bases) for the sequencing runs. (C) Total
sequencing reads and reads pass filter which provide useful sequencing information (with
chastity scores of ≥ 0.6) are shown. (D) The total number of reads that could be identified or
assigned to the barcode sequence ID associated with each of the multiplexed samples.
Supplementary Figure 2. Validation of MiSeq Version 2 upgrade. (A) A comparison of the
cluster generation density and clusters pass filter of the sequencing runs with Version1 (V1) (12
samples) and Version 2 (V2) (with the same 12 samples and additional 12 samples). (B) A
doubling of sequencing capacity of MiSeq V2 upgrade (4.7Gb) was evident in comparison to
the V1 (2.1Gb) with comparable sequencing quality (Q30). (C) Higher sequencing capacity of
V2 also evident in higher sequencing reads and reads pass filter (upper panel). The 12 samples
common to runs of both versions had comparable sequencing reads and average per base
coverage (lower panel).
Supplementary Figure 3. Inter- and intra-run assay reproducibility. (A) Inter-run reproducibility
was assessed by sequencing the 1:9 diluted (10%) sample of DLD1 in 11 separate sequencing
runs and the detection of 8 mutations was tested. (B) Intra-run reproducibility was estimated by
sequencing 24 DLD1 (10%) samples, each indexed with a distinct barcode combination and
sequenced on a single sequencing run. Detection of the 8 mutations in DLD1 was found to be
consistent in each of the 24 samples sequenced.
Supplementary Figure 4. Summary of somatic mutations detected. A pie chart depicting
different genes and the number of somatic mutations detected in the AML/MDS cohort tested by
sequencing on MiSeq (includes all samples listed in supplementary Tables 1, 2, 3 and 4). The
mutations detected in our study found in genes not routinely tested for AML have also been
summarized (right panel). Information regarding the number of mutations detected in each gene
and their relative percentages are provided next to the gene names. Germline polymorphisms
were not included in this summary.
Supplementary Figure 1
0
200
400
600
800
1000
1200
1400
0 5 10 15 20
A
Sequencing Runs
Clu
ste
rs /m
m2 (
X 1
03)
Clusters
Clusters Pass
Filter
0
0.5
1
1.5
2
2.5
0 5 10 15 20
Sequencing Runs
Se
quencin
g O
utp
ut in
Gb
ases (
>Q
30
)
B
50
60
70
80
90
100
0 5 10 15 20
Sequencing Runs
% R
ea
ds Id
entifie
d
D
0
200
400
600
800
1000
1200
0 5 10 15 20
Total Reads
Reads Pass
Filter
Sequencing Runs
Se
qcuence R
ea
ds (
X 1
03)
C
V1 (12 Samples)
Pass filter – 85.4% V2 (24 Samples)
Pass filter – 85.0%
A
B
Re
ads (
mill
ion
s)
V1: 86.3% Q30 reads
2.1Gb sequence output
Clu
ste
r D
en
sity (
K/m
m2)
V2: 88.5% Q30 reads
4.7Gb sequence output
TOTAL
READS PF READS
% READS
INDENTIFIED
(PF)
Version 1 9162448 7830961 96.3212
Version 2 19598832 16657210 96.5842
C
Supplementary Figure 2
0
2
4
6
8
10
12
Variants
Vari
ant F
req
uen
cy (
%)
10% DLD1 (n=11)
IDH1
(p.G97D)
PIK3CA
(p.D549N)
PIK3CA
(p.E545K) KIT
(p.V532I) XPO1
(p.E510K)
KRAS
(p.G13D)
TP53
(p.S241F)
SMO
(p.T640A)
A
B
Supplementary Figure 3
0
2
4
6
8
10
12
10% DLD1 (n=24)
IDH1
(p.G97D)
PIK3CA
(p.D549N)
PIK3CA
(p.E545K) KIT
(p.V532I) XPO1
(p.E510K)
KRAS
(p.G13D)
TP53
(p.S241F)
SMO
(p.T640A)
Vari
ant F
req
uen
cy (
%)
Variants