next generation sequencing based multi-gene … monitor minimal residual disease and prediction of...

47
Next generation sequencing based multi-gene mutational screen for acute myeloid leukemia using miseq: applicability for diagnostics and disease monitoring by Rajyalakshmi Luthra, Keyur P. Patel, Neelima G. Reddy, Varan Haghshenas, Mark J. Routbort, Michael A. Harmon, Bedia A. Barkoh, Rashmi Kanagal-Shamanna, Farhad Ravandi, Jorge E. Cortes, Hagop M. Kantarjian, L. Jeffrey Medeiros, and Rajesh R. Singh Haematologica 2013 [Epub ahead of print] Citation: Luthra R, Patel KP, Reddy NG, Haghshenas V, Routbort MJ, Harmon MA, Barkoh BA, Kanagal-Shamanna R, Ravandi F, Cortes JE, Kantarjian HM, Medeiros LJ, and Singh RR. Next generation sequencing based multi-gene mutational screen for acute myeloid leukemia using miseq: applicability for diagnostics and disease monitoring. Haematologica. 2013; 98:xxx doi:10.3324/haematol.2013.093765 Publisher's Disclaimer. E-publishing ahead of print is increasingly important for the rapid dissemination of science. Haematologica is, therefore, E-publishing PDF files of an early version of manuscripts that have completed a regular peer review and have been accepted for publication. E-publishing of this PDF file has been approved by the authors. After having E-published Ahead of Print, manuscripts will then undergo technical and English editing, typesetting, proof correction and be presented for the authors' final approval; the final version of the manuscript will then appear in print on a regular issue of the journal. All legal disclaimers that apply to the journal also pertain to this production process. Copyright 2013 Ferrata Storti Foundation. Published Ahead of Print on October 18, 2013, as doi:10.3324/haematol.2013.093765.

Upload: truongmien

Post on 07-Mar-2018

219 views

Category:

Documents


5 download

TRANSCRIPT

Next generation sequencing based multi-gene mutational screenfor acute myeloid leukemia using miseq: applicability for diagnostics and disease monitoring

by Rajyalakshmi Luthra, Keyur P. Patel, Neelima G. Reddy, Varan Haghshenas, Mark J. Routbort, Michael A. Harmon, Bedia A. Barkoh, Rashmi Kanagal-Shamanna, Farhad Ravandi, Jorge E. Cortes, Hagop M. Kantarjian, L. Jeffrey Medeiros, and Rajesh R. Singh

Haematologica 2013 [Epub ahead of print]

Citation: Luthra R, Patel KP, Reddy NG, Haghshenas V, Routbort MJ, Harmon MA, Barkoh BA, Kanagal-Shamanna R, Ravandi F, Cortes JE, Kantarjian HM, Medeiros LJ, and Singh RR. Next generationsequencing based multi-gene mutational screen for acute myeloid leukemia using miseq: applicability for diagnostics and disease monitoring. Haematologica. 2013; 98:xxxdoi:10.3324/haematol.2013.093765

Publisher's Disclaimer.E-publishing ahead of print is increasingly important for the rapid dissemination of science.Haematologica is, therefore, E-publishing PDF files of an early version of manuscripts thathave completed a regular peer review and have been accepted for publication. E-publishingof this PDF file has been approved by the authors. After having E-published Ahead of Print,manuscripts will then undergo technical and English editing, typesetting, proof correction andbe presented for the authors' final approval; the final version of the manuscript will thenappear in print on a regular issue of the journal. All legal disclaimers that apply to thejournal also pertain to this production process.

Copyright 2013 Ferrata Storti Foundation.Published Ahead of Print on October 18, 2013, as doi:10.3324/haematol.2013.093765.

1

Next generation sequencing based multi-gene mutational screen for acute myeloid leukemia using miseq: applicability for diagnostics and

disease monitoring

Running title: Clinical NGS for AML gene mutations

Rajyalakshmi Luthra1#, Keyur P. Patel1, Neelima G. Reddy1, Varan Haghshenas1, Mark J. Routbort1, Michael A. Harmon1, Bedia A. Barkoh1,

Rashmi Kanagal-Shamanna1, Farhad Ravandi 2, Jorge E Cortes2, Hagop M Kantarjian2, L. Jeffrey Medeiros1 and Rajesh R. Singh1#

Departments of Hematopathology1 and Leukemia2, The University of Texas M.D. Anderson Cancer Center, Houston, Texas

Correspondence

Rajyalakshmi Luthra and Rajesh R Singh, The University of Texas M. D. Anderson Cancer Center 8515 Fannin Street, Houston, TX 77054 USA. E-mail: [email protected]/ [email protected]

2

Abstract

Routine molecular testing of Acute Myeloid Leukemia involves mutational screening of

several genes of therapeutic and prognostic significance. Comprehensive analysis using

single-gene assays has high DNA requirement, cumbersome and timely consolidation of

results for clinical reporting is challenging. High throughput next generation sequencing

platforms widely used in research have not been tested vigorously for clinical application.

Here, we describe the clinical application of MiSeq, a next generation sequencing platform to

screen mutational hotspots in 54 cancer-related genes including genes relevant in Acute

Myeloid Leukemia (NRAS, KRAS, FLT3, NPM1, DNMT3A, IDH1/2, JAK2, KIT and EZH2). We

sequenced 63 Acute Myeloid Leukemia/myelodysplastic syndrome samples using MiSeq and

compared with another next generation sequencing platform (Ion-Torrent Personal Genome

Machine) and other conventional testing platforms. MiSeq detected a total of 100 single

nucleotide variants and 23 NPM1 insertions that were confirmed by Ion Torrent or conventional

platforms indicating complete concordance. FLT3 Internal tandem duplications (n=10) we not

detected, however reanalysis of MiSeq output by Pindel, an indel detection algorithm detected

them. Dilution studies of cancer cell-line DNA showed quantitative accuracy of mutation

detection up to 1.5% allelic frequency with high level of inter- and intra-run assay

reproducibility, suggesting potential utility for monitoring the therapy response, clonal

heterogeneity and evolution. Examples demonstrating the advantages of MiSeq over

conventional platforms for disease monitoring have been provided. Easy work-flow, high

throughput multiplexing capability, 4-day turnaround time and simultaneously assessment of

routinely tested and emerging markers makes MiSeq highly applicable for clinical molecular

testing of Acute Myeloid Leukemia.

3

Introduction

Acute myeloid leukemia (AML) is commonly characterized by genetic aberrations,

including gene mutations, copy number alterations and gene translocations.1, 2 In recent

years, massively parallel next generation sequencing (NGS) technology has shown great

potential in identifying AML-related genetic aberrations.3-6 The current 2008 World Health

Organization classification system requires genetic information, in addition to the clinical,

morphologic and immunophenotypic findings, for the diagnosis and classification of AML.

Several types of AML are now defined by the presence of recurrent chromosomal

translocations that result in fusion genes, and two new provisional categories have been

proposed based on the prognostic impact of these single gene defects AML with mutated

NPM1 and AML with mutated CEBPA. 7 Newer risk stratification models for AML patients also

have been proposed that incorporate data from cytogenetic studies and mutational profiling of

several critical genes, such as FLT3, NPM1, CEPBA, DNMT3A, IDH1, IDH2, KIT, MLL-PTD,

TET2, RUNX1, ASXL1 and TP53. 8, 9 For accurate risk stratification, it is important to analyze

the mutations in several genes comprehensively due to complex interactions among different

pathways in leukemogenesis 9-11. Some of these gene mutations are currently targetable with

inhibitors, such as RAS and FLT3, and other genes will likely be targetable in the future.

Recent literature also suggests that quantitative assessment of gene mutations may be useful

to monitor minimal residual disease and prediction of risk of relapse.11 Therefore, routine

molecular diagnostic testing of AML patients is indispensable for rational therapeutic decisions

and includes mutational screening of several genes, which are either targetable for therapy or

useful for prognostication.12

4

In a clinical molecular diagnostic laboratory, multi-gene screening with conventional

platforms proves challenging due to the need for relatively large quantities of DNA to assess

one gene at a time and the coordination and compilation of the results from several analysis

platforms into an integrated report. Massively parallel sequencing ability of next generation

sequencing (NGS) technologies have made high throughput and multiplexed sequencing of

specific panels of genes or whole exomes/genomes feasible.13-15 Currently, they are being

applied extensively to characterize genomic alterations in cancer16-19 and are highly relevant

for diagnostic purposes as alternatives to the first generation sequencing techniques.20

However, inclusion of NGS technologies into the clinical diagnostic arena has been delayed

due to lack of adequate guidelines and thorough validation.

MiSeq is a next generation sequencer that is well-suited for various targeted

sequencing applications and generates genomic sequence information with high accuracy. 21-

27 In this study, we used MiSeq and a customized TruSeq Amplicon Cancer Panel (TSACP) to

screen for mutations in 54 cancer-related genes in 63 samples (60 AML and 3 myelodysplastic

syndrome) in our Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory.

We demonstrate the value of this approach, which requires less DNA and provides data for

multiple genes simultaneously in an integrated molecular diagnostic laboratory report with a

short turnaround time.

Methods

Tumor samples and sequencing platforms. A total of 63 patient samples that included 62 bone

marrow aspirates and 1 peripheral blood, from patients with the diagnosis of AML (n=60) or

myelodysplasia (n=3) were assessed using MiSeq sequencer (illumina Inc, San Diego, CA).

5

DNA was extracted using the Autopure extractor (QIAGEN/Gentra, Valencia, CA) and

quantified using the Qubit DNA BR assay kit (Life Technologies, Carlsbad, CA). Out of these,

52 samples were identified from the archives of the Molecular Diagnostics Laboratory at The

University of Texas MD Anderson Cancer Center based on their mutational status ascertained

by conventional mutation detection platforms in the laboratory. Forty-four of the archival

samples were sequenced using both the MiSeq and IT-PGM (Life Technologies). Eleven

additional samples were sequenced on the MiSeq instrument in parallel (blind) with routinely

used platforms, including Sanger sequencing, pyrosequencing and fragment analysis by

capillary electrophoresis (CE) performed as described previously. 28-30 To confirm mutations

detected in regions covered by TSACP but not by preexisting assays in our laboratory

(AmpliSeq cancer panel, Sanger or pyro sequencing) 42 new Sanger sequencing assays were

also developed.

Customized TruSeq Amplicon Caner Panel.

TruSeq Amplicon Cancer Panel (TSACP) interrogates mutational hotspots in 48 cancer-related

genes (listed in supplementary methods). TSACP consists of 212 pairs of probes designed to

bind flanking genomic areas of interest. Additionally, 7 more probe pairs were designed by

illumina on our request, which were spiked into the original TSCAP to interrogate mutational

hotspots in 3 additional AML-related genes (DNMT3A, IDH2 and EZH2) and 3 chronic

lymphocytic leukemia (CLL)-related genes (XPO1, KLHL6 and MYD88). Library preparation

and sequencing using MiSeq was performed as per manufacturer’s instruction (details

provided in supplementary methods).

6

Validation of version 2 upgrade on MiSeq. The Version 2 (V2) upgrade on MiSeq has changes

in sequencing chemistry and imaging to allow sequencing twice as many samples as V1. To

validate its accuracy and efficiency we sequenced in parallel 12 archival AML samples

(additional to the above mentioned cohort) using MiSeq V1 and the V2 upgrade. To

compensate for the higher sequencing capacity of V2, 12 additional samples were included in

the sequencing run with V2, which facilitated optimal comparison.

Variant calling and data analysis. Human genome build 19 (hg19) was used as the reference.

Alignment to hg19 genome and variant calling was performed by MiSeq Reporter Software

1.3.17. Integrative Genomics Viewer (IGV)31 was used to visualize the read alignment and

confirm the variant calls. A sequencing coverage of 250X (bi-directional) and a minimum

variant frequency of 5% in the background of wild type (WT) were used as cutoffs for clinical

reporting. An in-house custom developed software32 designated as OncoSeek, was used to

interface the data with IGV and to annotate the sequence variants. For identification of FLT3

internal tandem duplications (ITDs), Pindel33 analysis (source:

http://www.ebi.ac.uk/~kye/pindel/ and version Pindel0.2.4t) was performed using the BAM files.

Sequencing using Ion Torrent Personal Genome Machine (IT-PGM). Library preparation and

sequencing on IT-PGM was performed as described earlier.34 (Details provided in

supplementary methods).

7

RESULTS

MiSeq sequencing run metrics. In 16 different V1 sequencing runs, the average cluster

density of sequencing template generated per square millimeter (mm2) in the flow cell

ranged from 502,000 to 1,186,000 clusters per mm2 with a median of 950,000 clusters per

mm2. The cluster which passed filter or clusters from which sequencing information could be

obtained (without signal overlap from surrounding clusters) ranged from 460,000 to

1,065,000 clusters per mm2 with a median value of 851,950 per mm2 (Supplementary Figure

1A). The total sequencing output for the runs ranged from 1.1 to 2.2 Gigabases (Gb) with

sequencing quality score > Q30 with a median value of 1.9 Gb (Supplementary Figure 1B).

The total sequencing reads obtained from the run ranged from 4,079,435 to 9,570,965, with

a median value of 7,424,774. The identified sequencing reads or reads pass filter (with

chastity scores of ≥ 0.6) ranged from 3,716,549 to 8,080,211 reads with a median value of

6,641,052 (Supplementary Figure 1C) indicating that 72.4-96.1% of the reads (median,

95.3%) were ‘identified reads’ or reads that could be identified with the barcode indexes

used (Supplementary Figure 1D). The average sequencing depth per base per sample in

these runs was 1455X, which indicated adequate sequencing of the samples.

Sensitivity of mutation detection. To measure the mutation detection sensitivity, we

sequenced H2122 cell line DNA (with a homozygous KRAS mutation, p.G12C and a

heterozygous MET mutation, p.N375S) diluted into HL60 cell line DNA to provide different

levels of H2122 DNA, ranging from 100% (undiluted), 20% (1:3), 10% (1:9) and 5% (1:19).

Sequencing results from two independent experiments showed efficient detection of these

two mutations. The presence of mutations at all dilutions was clearly evident in the aligned

8

sequencing reads for KRAS (p.G12C) (Figure 1A, upper panel) and MET (p.N375S) (Figure

1A, lower panel). The concordance between the average variant frequencies detected at

different dilution levels and the expected frequencies are shown in Figure 1B, upper and

lower panels.

Concordance of MiSeq with IT-PGM. In the 44 tumor samples analyzed in parallel using the

MiSeq and IT-PGM sequencers a high degree of concordance was observed. A total of 110

variants (single nucleotide variations and insertions) were detected by MiSeq, 102 of which

were confirmed by IT-PGM. The 8 variants not detected by IT-PGM were not covered by

the AmpliSeq Cancer panel but were confirmed by Sanger sequencing (Supplementary

Table 1). Several variants listed in this table are potential germ line polymorphisms (KDR

p.Q472H, MET p.N375S, MET p.T1010I and KIT p.M541L). They were also listed in the

comparison study (Supplementary Table 1) as they help in establishing the overall mutation

detection accuracy and specificity of MiSeq. High degree of concordance in variant

detection was also observed between MiSeq and Sanger sequencing, pyrosequencing and

fragment analysis by capillary electrophoresis. For example, in a representative archival

sample a KRAS p.G60V (GGT>GTT) mutation was detected by pyrosequencing (Figure

2A) and was detected at a variant frequency of 45.6% at a sequencing depth of 3931X by

MiSeq (Figure 2B). IT-PGM also detected this mutation at a comparable frequency of 39%

(3120X) (Figure 2C). Similarly, an IDH2 p.R132H (CGT>CAT) (Figure 3A) mutation

detected by Sanger sequencing in a sample was also detected by MiSeq (42.0%, 3,672X)

(Figure 3B) and IT-PGM (43.1%, 3,318X) (Figure 3C). In a third example, a 4bp insertion

(G>GTCTG) detected in exon 11 of NPM1 by fragment analysis coupled to CE at 42.6%

9

(Figure 4A) was detected by MiSeq (32.8%, 1,164X) (Figure 4B) and also by IT-PGM

(26.5%, 2,249X) (Figure 4C).

Parallel comparison of mutation detection using MiSeq and routinely used testing platforms.

Eleven samples were sequenced using the MiSeq in parallel (blind) with conventional

techniques used routinely in our laboratory. In these samples, 14 sequence variants detected

by MiSeq were confirmed by Sanger sequencing (n=8), pyrosequencing (n=5) and fragment

analysis by CE (n=1) indicating complete concordance (Supplementary Table 2).

Detection of insertions by MiSeq. In our archival set of 44 samples, 23 samples had a 4 bp

insertion in NPM1 and 2 samples had FLT3 internal tandem duplications (ITDs). MiSeq

reporter detected all 4bp insertions in NPM1 but did not detect the 2 FLT3-ITDs (sample no. 3

and 40, Supplementary Table 1). These ITDs were also not detected by IT-PGM. To further

assess the ability of MiSeq reporter to call FLT3-ITDs, we sequenced an additional 8 archival

samples with known FLT3 ITDs of various sizes and allele frequencies as detected by CE.

MiSeq did not call any of the FLT3 ITDs present in these samples. However, using Pindel, an

indel detection algorithm, we were able to detect and map each of the ITDs in the MiSeq

sequencing output with complete concordance with CE based fragment analysis method

(Supplementary Table 3).

Comparison of Version 1 and Version 2 sequencing on MiSeq. During our study a MiSeq

upgrade was introduced with changes in the sequencing chemistry and imaging (referred to as

version 2 or V2), which doubles the throughput capacity in comparison with Version1 (V1). To

10

validate the upgrade, we sequenced 12 samples using MiSeq V1 and V2 in parallel. To

compensate for the higher sequencing capacity of V2 and to facilitate an optimal comparison,

we included 12 additional samples in the sequencing run with V2. A comparison of the run

metrics showed comparable cluster density and clusters pass filter between the two versions

(Supplementary Figure 2A). A higher sequencing capacity of V2 was evident by the 4.7 Gb

sequencing output in comparison with the 2.1Gb V1 output, whereas the sequencing quality

(Q30) remained comparable (Supplementary Figure 2B). The increased sequencing capacity

of V2 was evident in the doubling of the total sequencing reads and reads pass filter (PF) with

the total overall percentage of reads identified remaining comparable (Supplementary Figure

2C, upper panel). Similarly, total reads and average coverage per base per sample remained

highly comparable between V1 and V2 (Supplementary Figure 2C, lower panel). Furthermore,

a high degree of concordance was also observed in the sequencing coverage and mutations

detected by V1 and V2 (Supplementary Table 4) indicating that the V2 upgrade maintained the

same efficiency and accuracy as V1.

Sensitivity of mutation detection using V2 sequencing chemistry. Sensitivity analysis was

performed using several sequentially diluted samples of the cell line DLD1 DNA (harboring 8

heterozygous mutations) into normal control DNA. Across six different sequencing runs each

of eight heterozygous (monoallelic) mutations present in DLD1 were successfully detected in

four sequential dilutions (50%, 25%, 10% and 5%) with minimal variation in the overall variant

frequencies. The lowest detection limit was seen for the IDH1 p.G97D variant detected at a

frequency of 1.5% in the 1:19 or 5% diluted DLD1 sample, indicating high detection sensitivity

(Figure 5). Supplementary Table 5 provides a detailed summary of the dilution studies with

11

H2122 (V1) and DLD1 DNA (V2) with expected and detected variant frequencies of the

mutations at each dilution level.

Inter- and Intra-run reproducibility. Inter-run reproducibility was assessed by sequencing a 1:9

(10%) dilution of DLD1 DNA in 11 separate multiplexed sequencing runs. The results showed

that each of the eight expected mutations was consistently detected in every run with minimal

variability in variant frequencies (Supplementary Figure 3A). Intra-run assay reproducibility was

assessed by sequencing the (1:9) 10% DLD1 DNA with 24 different barcode indexes and

sequencing them on the same multiplexed sequencing run. Again, consistent detection of each

mutation in the 24 samples analyzed with comparable variant frequencies was observed

indicating high inter-assay reproducibility (Supplementary Figure 3B).

Disease monitoring or assessment of minimal residual disease (MRD). We evaluated the utility

of highly sensitive sequencing on MiSeq in monitoring minimal residual disease using 4

representative cases with archival samples analyzed and reported in our laboratory by

routinely used platforms and subsequently sequenced by MiSeq for this study. For patient 1

(Figure 6A), we sequenced five consecutive pre-treatment (day 1) and post-treatment (day 10,

185, 455, and 475) samples by MiSeq and compared to the results from conventional test

platforms. To begin with (day 1) the sample had two mutations, IDH2 p.R140Q and 4bp

insertion in exon 11 of NPM1 which were detected by Sanger sequencing analysis and

fragment analysis, respectively. MiSeq also detected these mutations. However, in subsequent

samples obtained at 10 days and 185 days, MiSeq detected the IDH2 p.R140Q mutation at an

allele frequency of 5.5% and 2.8%, respectively, and the NPM1 mutation at an allele frequency

12

of 6% and 2%, respectively. By contrast, Sanger sequencing did not detect any IDH2 mutation

in the 10 day and 185 day samples and it was reported negative, indicating the advantage of

higher sensitivity of MiSeq. In these 2 samples, capillary electrophoresis did detect the 4bp

insertions in NPM1 at 13.6% and 10.5%, respectively. In later samples obtained at 455 and

475 days, MiSeq detected a FLT3 point mutation p.D835Y as well as the IDH2 and NPM1

mutations originally detected in the sample, which were also, detected using the routine

alternative platforms (Figure 6A).

In the second sample (Figure 6B); mutations in four genes were detected at diagnosis

and after 35 days (NRAS, NPM1, DNMT3A and FLT3-ITD). At 55 and 95 days the NRAS and

NPM1 mutations decrease drastically while the FLT3-ITD and DNMT3A mutations persist

indicating the presence of distinct clonal population with different somatic mutations.

Furthermore, at 95 days two low FLT3 p.D835 mutations appeared making the total of

mutations to be followed for disease monitoring to six indicating the need to follow multiple

mutations, which can be conveniently achieved on MiSeq. In the third example (Figure 6C),

two NRAS mutations (p.G12S and p.G12C) were detected at diagnosis, of which p.G12S

decreased considerably with treatment indicating their occurrence in distinct clones. However,

an activating FLT3 kinase domain mutation (p.Y842H)35 with implications in therapy response

was observed at low levels at diagnosis and progressively increased in allelic frequency over

time. As this mutation is not routinely tested for AML, it was not tested by conventional

platforms but was identified by the virtue of multi-gene screening on MiSeq. In the fourth

sample (Figure 6D), four somatic mutations were detected at the onset of the disease in IDH2,

NRAS, NPM1 and DNMT3A whereas only the DNMT3A mutation remained detectable after 33

days. In these cases, the ability to monitor multiple somatic mutations at high sensitivity by a

13

single platform, MiSeq, highlights the advantages of the NGS platforms which not only helps in

conserving precious samples but also improving analysis and clinical reporting workflows.

Discussion

AML is a genetically heterogeneous disease and history has shown that identification of

genomic abnormalities has identified prognostically meaningful subgroups with therapeutic

implications. Therefore, more accurate identification of genetic lesions is needed for better

disease classification, risk stratification and treatment strategies. Comprehensive routine

screening using conventional or first generation sequencing platforms in a clinical molecular

diagnostics laboratory is very challenging, requiring high amounts of DNA, labor-intensive,

expensive, and logistically difficult in terms of data integration in real-time. Next generation

technologies are highly advantageous with their decreasing cost, high throughput and

multiplexing capacity thereby facilitating parallel and targeted sequencing of all genes of

interest on a routine basis. However, sample preparation, sequencing and variant calling

components of NGS technologies need to be validated before they can be applied for routine

diagnostic purposes.

In recent years, NGS studies have revealed numerous previously unrecognized or

under-recognized molecular changes of therapeutic and prognostic significance. Hence,

upfront screening for genetic abnormalities and their subsequent monitoring is essential for

optimal patient management. Here we have rigorously tested the applicability of a NGS-based

screen using MiSeq by analyzing both archival samples with known mutations and samples

assessed prospectively. For single nucleotide variants, complete concordance was observed

between MiSeq and all other sequencing methods including the IT-PGM, another NGS

14

platform. MiSeq also successfully detected all 4bp insertions in NPM1 in this cohort. MiSeq

software, however, could not identify FLT3-ITDs. A similar deficiency was also observed for IT-

PGM. To detect FLT3-ITDs using MiSeq, reanalysis of the aligned sequencing data using

another program, namely Pindel was necessary.

The sequencing results using MiSeq showed a high level of inter- and intra-run

reproducibility with consistent detection of mutations. Dilution studies with 2 cell lines harboring

a total of 10 mutations across 7 chromosomes showed MiSeq can detect every mutation at

each dilution level tested with variant frequency as low as 1.5%. This high detection sensitivity

is advantageous for better monitoring the efficacy of therapy as evident in one of the examples

included in the study where an IDH2 mutation missed by Sanger sequencing in two

longitudinal samples but was detected by MiSeq (Figure 6A). This mutation recurred at high

levels subsequently indicating reemergence of the same clone. High detection sensitivity of

MiSeq also has the potential to detect new clones at an earlier time point than less sensitive

conventional mutation detection methods. Furthermore, the high throughput capacity of MiSeq

allows for the screening of multiple genes over the course of therapy thereby maximizing the

chances of detecting any other emergent clone that would be missed using classical single-

gene screening approaches.

A summary of all somatic mutations detected by MiSeq in our study has been provided

in Supplementary Figure 4. It is interesting to note that 23% of mutations were found in genes

which are not routinely tested in AML but are important emerging markers with implications in

disease outcome. For instance, JAK2 p.V617F mutations have been found to occur at low

levels in AML with an aberrant karyotype, more prevalent in AML patients with a preceding

myeloproliferative neoplasm36 and associated with aggressive disease and recurrence37.

15

Similarly, mutations in PTPN11, which encodes a non-receptor tyrosine phosphatase SHP-2

occur at low levels in a subset of AML occurring in both adults and children. 38, 39 Furthermore,

in 78% of AML patients with a complex karyotype, loss of one copy of the TP53 gene and

mutation of the other has been reported 40, however, the prevalence in AML cases with normal

karyotype is very low (~ 2%) 41. It is also noteworthy that two cases in our study also harbored

FLT3 Y842H mutation, which is implicated in resistance to targeted therapy but is not yet

routinely tested in AML. Three of the genes added to TSACP (XPO1, KLHL6 and MYD88)

were those which are found to be mutated in CLL, however it is of interest to note that

application of this test for routine screening of 875 AML/MDS cases in our laboratory has

identified three mutations in XPO1 (p. E571K) and KLHL6 (p. A93G and p.A9G) indicating their

occurrence in AML/MDS, albeit at lower frequencies. On the whole, the above mentioned

examples demonstrate the advantages of higher sensitivity and simultaneous screening of

multiple genes by NGS in identifying clonal heterogeneity and additional genomic aberrations

in markers not routinely screened in AML.

The testing of mutation detection ability of this assay in each of the 54 genes was not

feasible due to lack of positive patient or cell line samples. However, in our set of 63 AML/MDS

samples, 100 single nucleotide variants and 33 insertions were detected by this platform and

confirmed by alternate platforms (IT-PGM, CE and Sanger sequencing) indicating complete

concordance and high specificity of mutation detection. We also detected a total of 73 somatic

mutations in 7 genes routinely tested in AML/MDS (Supplementary Figure 4). Furthermore, we

have also established the overall performance of the test (specificity, sensitivity and

reproducibility) for routine clinical use. Our experience has shown that the average sample

preparation using TSACP is less than 9 hours ( including library preparation, concentration

16

normalization, sample multiplexing and loading on MiSeq), with only 4 hours of hands-on

time and provides high multiplexing capacity of up to 12 samples (V1) and 24 samples (V2) per

run. All subsequent sequencing and analysis steps involving cluster generation, sequencing,

base-calling, sequence alignment and variant calling are performed on board and takes 28

hours (27h for sequencing and 1h for analysis) making MiSeq a very self-sufficient sequencer.

The average turnaround time of 72 hours, starting from DNA extraction to completion of

sequencing can be achieved if batching of samples is not required and is very desirable in

medium and high-volume laboratories to offset costs. Furthermore, the convenience and

efficiency of screening several routinely tested and AML- associated genes using only 250ng

of DNA and reporting the data simultaneously in a single clinical report enables efficient use of

patient material and resources in a molecular diagnostic laboratory. On the whole the overall

simplicity of sample preparation, the self-sufficiency of MiSeq sequencer and the associated

high specificity, sensitivity and accuracy makes it a highly applicable next-generation

sequencer for routine mutation screening of AML samples in a clinical molecular diagnostics

laboratory.

17

Authorship and Disclosures

Contributions

Luthra R and Singh RR: Designed the study, planned experiments and their execution, data

analysis and manuscript writing

Reddy NG: Conducted experiments and data analysis

Haghshenas V, Barkoh BA and Harmon MA: conducted experiments

Patel KP: Pathology review and review of results

Kanagal-Shamanna R: Review of clinical history and cytogenetics

Routbort MJ: Informatics support for NGS data output and annotation

Medeiros LJ: Review of results and manuscript writing

Kantarjian H, Cortes J, Ravandi F: Review of the results and clinical information

Disclosures

R. Luthra, Patel KP, Reddy NG, Haghshenas V, Barkoh BA, Harmon MA, Kanagal-Shamanna

R, Routbort MJ, Ravandi F, Medeiros LJ and Singh RR have no conflicts of interest to declare.

Kantarjian H and Cortes J have grant support to declare but outside the submitted work.

Completed COI declaration forms from all authors have been submitted as supplementary

material with the revised manuscript.

18

REFERENCES.

1. Owen C, Barnett M, Fitzgibbon J. Familial myelodysplasia and acute myeloid leukaemia-a review. Br J Haematol. 2008;140(2):123-32. 2. Sanders MA, Valk PJ. The evolving molecular genetic landscape in acute myeloid leukaemia. Curr Opin Hematol. 2013;20(2):79-85. 3. Ley TJ, Ding L, Walter MJ, McLellan MD, Lamprecht T, Larson DE, et al. DNMT3A mutations in acute myeloid leukemia. N Engl J Med. 2010;363(25):2424-33. 4. Ley TJ, Mardis ER, Ding L, Fulton B, McLellan MD, Chen K, et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature. 2008;456(7218):66-72. 5. Ding L, Ley TJ, Larson DE, Miller CA, Koboldt DC, Welch JS, et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature. 2012;481(7382):506-10. 6. Welch JS, Ley TJ, Link DC, Miller CA, Larson DE, Koboldt DC, et al. The origin and evolution of mutations in acute myeloid leukemia. Cell. 2012;150(2):264-78. 7. Swederlow SH, Campo E, Harris NLE. WHO classification of tumours of haematopoietic and lymphoid tissues, International Agency for Research on Cancer Press, Lyon, France. 2008. 8. Grossmann V, Schnittger S, Kohlmann A, Eder C, Roller A, Dicker F, et al. A novel hierarchical prognostic model of AML solely based on molecular mutations. Blood. 2012;120(15):2963-72. 9. Patel JP, Gonen M, Figueroa ME, Fernandez H, Sun Z, Racevskis J, et al. Prognostic relevance of integrated genetic profiling in acute myeloid leukemia. N Engl J Med. 2012;366(12):1079-89. 10. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med. 2013;368(22):2059-74. 11. Estey EH. Acute myeloid leukemia: 2013 update on risk-stratification and management. Am J Hematol. 2013;88(4):318-27. 12. Burnett A, Wetzler M, Lowenberg B. Therapeutic advances in acute myeloid leukemia. J Clin Oncol. 2011;29(5):487-94. 13. Mardis ER. Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet. 2008;9:387-402. 14. Mardis ER. The impact of next-generation sequencing technology on genetics. Trends Genet. 2008;24(3):133-41. 15. Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010;11(1):31-46. 16. Loyo M, Li RJ, Bettegowda C, Pickering CR, Frederick MJ, Myers JN, et al. Lessons learned from next-generation sequencing in head and neck cancer. Head Neck. 2013;35(3):454-63. 17. Mardis ER. Applying next-generation sequencing to pancreatic cancer treatment. Nat Rev Gastroenterol Hepatol. 2012;9(8):477-86. 18. Ramsay AJ, Martinez-Trillos A, Jares P, Rodriguez D, Kwarciak A, Quesada V. Next-generation sequencing reveals the secrets of the chronic lymphocytic leukemia genome. Clin Transl Oncol. 2013;15(1):3-8. 19. Toffanin S, Cornella H, Harrington A, Llovet JM. Next-Generation Sequencing: Path for Driver Discovery in Hepatocellular Carcinoma. Gastroenterology. 2012/09/26 ed, 2012.

19

20. Ku CS, Cooper DN, Roukos DH. Clinical relevance of cancer genome sequencing. World J Gastroenterol. 2013;19(13):2011-18. 21. Harismendy O, Schwab RB, Bao L, Olson J, Rozenzhak S, Kotsopoulos SK, et al. Detection of low prevalence somatic mutations in solid tumors with ultra-deep targeted sequencing. Genome Biol. 2011;12(12):R124. 22. Wang C, Krishnakumar S, Wilhelmy J, Babrzadeh F, Stepanyan L, Su LF, et al. High-throughput, high-fidelity HLA genotyping with deep sequencing. Proc Natl Acad Sci U S A. 2012;109(22):8676-81. 23. Eyre DW, Golubchik T, Gordon NC, Bowden R, Piazza P, Batty EM, et al. A pilot study of rapid benchtop sequencing of Staphylococcus aureus and Clostridium difficile for outbreak detection and surveillance. BMJ Open. 2012;2(3). 24. Liu L, Li Y, Li S, Hu N, He Y, Pong R, et al. Comparison of next-generation sequencing systems. J Biomed Biotechnol. 2012;2012:251364. 25. Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, et al. Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotechnol. 2012;30(5):434-9. 26. Quail MA, Smith M, Coupland P, Otto TD, Harris SR, Connor TR, et al. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics. 2012;13:341. 27. Junemann S, Sedlazeck FJ, Prior K, Albersmeier A, John U, Kalinowski J, et al. Updating benchtop sequencing performance comparison. Nat Biotechnol. 2013;31(4):294-6. 28. Singh RR, Bains A, Patel KP, Rahimi H, Barkoh BA, Paladugu A, et al. Detection of high-frequency and novel DNMT3A mutations in acute myeloid leukemia by high-resolution melting curve analysis. J Mol Diagn. 2012;14(4):336-45. 29. Verma S, Greaves WO, Ravandi F, Reddy N, Bueso-Ramos CE, O'Brien S, et al. Rapid detection and quantitation of BRAF mutations in hairy cell leukemia using a sensitive pyrosequencing assay. Am J Clin Pathol. 2012;138(1):153-6. 30. Deqin M, Chen Z, Nero C, Patel KP, Daoud EM, Cheng H, et al. Somatic deletions of the polyA tract in the 3' untranslated region of epidermal growth factor receptor are common in microsatellite instability-high endometrial and colorectal carcinomas. Arch Pathol Lab Med. 2012;136(5):510-6. 31. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29(1):24-6. 32. Routbort MH, B. Patel, K.P, Singh, R.R. Aldape,K. Reddy,N.G. Barkoh,B.A et al. OncoSeek - A versatile Annotation and Reporting System for Next Generation Sequencing-Based Clinical Mutation Analysis of Cancer Specimens, Abstract - Association For Molecular Pathology, Annual Conference. J Mol Diagn. 2012;14(6). 33. Ye K, Schulz MH, Long Q, Apweiler R, Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics. 2009;25(21):2865-71. 34. Singh RR, Patel KP, Routbort MJ, Reddy NG, Barkoh BA, Handal B, et al. Clinical validation of a next-generation sequencing screen for mutational hotspots in 46 cancer-related genes. J Mol Diagn. 2013;15(5):607-22. 35. Bagrintseva K, Schwab R, Kohl TM, Schnittger S, Eichenlaub S, Ellwart JW, et al. Mutations in the tyrosine kinase domain of FLT3 define a new molecular mechanism of

20

acquired drug resistance to PTK inhibitors in FLT3-ITD-transformed hematopoietic cells. Blood. 2004;103(6):2266-75. 36. Levine RL, Wadleigh M, Cools J, Ebert BL, Wernig G, Huntly BJ, et al. Activating mutation in the tyrosine kinase JAK2 in polycythemia vera, essential thrombocythemia, and myeloid metaplasia with myelofibrosis. Cancer Cell. 2005;7(4):387-97. 37. Illmer T, Schaich M, Ehninger G, Thiede C. Tyrosine kinase mutations of JAK2 are rare events in AML but influence prognosis of patients with CBF-leukemias. Haematologica. 2007;92(1):137-8. 38. Hou HA, Chou WC, Lin LI, Chen CY, Tang JL, Tseng MH, et al. Characterization of acute myeloid leukemia with PTPN11 mutation: the mutation is closely associated with NPM1 mutation but inversely related to FLT3/ITD. Leukemia. 2008;22(5):1075-8. 39. Tartaglia M, Martinelli S, Iavarone I, Cazzaniga G, Spinelli M, Giarin E, et al. Somatic PTPN11 mutations in childhood acute myeloid leukaemia. Br J Haematol. 2005;129(3):333-9. 40. Rucker FG, Schlenk RF, Bullinger L, Kayser S, Teleanu V, Kett H, et al. TP53 alterations in acute myeloid leukemia with complex karyotype correlate with specific copy number alterations, monosomal karyotype, and dismal outcome. Blood. 2012;119(9):2114-21. 41. Fernandez-Mercado M, Yip BH, Pellagatti A, Davies C, Larrayoz MJ, Kondo T, et al. Mutation patterns of 16 genes in primary and secondary acute myeloid leukemia (AML) with normal cytogenetics. PLoS One. 2012;7(8):e42334.

21

Figure Legends

Figure 1. Sensitivity of detection using MiSeq. (A) A representative image of the aligned

sequencing reads as visualized in Integrative Genome Viewer shows a progressive decrease

in the single nucleotide variant detected in the background of wild type sequence in

sequentially diluted H2122 cell line DNA into HL60 cell line DNA. With dilution, a clear and

proportionate decrease in the homozygous KRAS (GGT>TGT, p.G12C) and a heterozygous

MET (AAC>AGC, p.N375S) mutation are evident. KRAS gene has the reverse orientation on

chromosome 12. However, by default, IGV exhibits aligned reads in ‘forward’ orientation.

Hence the substituted nucleotide appears as ‘A’ instead of ‘T’ in the reads (CCA>ACA or

GGT>TGT). The directions of gene orientation are indicated by the arrows. (B) The expected

variant frequencies and the average variant frequency of the KRAS (p.G12C) and MET

(p.N375S) mutations detected in two independent sensitivity analyses are shown.

Figure 2. Concordance of MiSeq, Pyrosequencing and IT-PGM. A representative sample in

which a KRAS mutation (p.G60V) identified by pyrosequencing was also detected and called

by both MiSeq and the Ion-Torrent Personal Genome Machine. (A) The pyrosequencing

results showing the base change (GGT>GTT, indicated by the arrow) in KRAS in comparison

with wild type control. (B) The same mutation change is evident in the aligned reads from the

MiSeq sequencing output at a very high coverage of 3,931X and showing a variant frequency

of 45.6%. (C) Sequencing on the IT-PGM also confirms the same mutation at a sequencing

depth and variant frequency comparable with the MiSeq. KRAS gene has the reverse

orientation on chromosome 12. As the default setting on IGV exhibits aligned reads in ‘forward’

22

orientation the substituted nucleotide appears as ‘A’ instead of ‘T’ in the reads (CCA>CAA or

GGT>GTT). The orientation of the gene is depicted by the arrows in panels B and C.

Figure 3. Concordance of MiSeq, Sanger sequencing and IT-PGM. A mutation in IDH2

(p.R132H) resulting from a CGT>CAT substitution originally detected by Sanger sequencing

(A) is also detected clearly in the MiSeq sequencing output and (B) sequencing using the IT-

PGM with comparable coverage and mutational frequency between both platforms. IDH2 has

reverse orientation on chromosome 15. The default setting in IGV shows aligned reads in

forward orientation only. Hence, the substituted nucleotide appears as ‘T’ instead of ‘A’ in the

reads (CGT>CAT or GCA>GTA). The orientation of the gene is depicted by the arrows in

panels B and C.

Figure 4. Ability of sequencing using MiSeq to detect insertions in NPM1. (A) A characteristic 4

bp insertion (G>GTCTG) in exon 11 of NPM1 gene is evident in this sample as detected by

capillary electrophoresis and is present at a ratio of 0.426 or 42.6%. (B) Sequencing of the

same sample using MiSeq also detected and called this mutation at a sequencing depth of

1,164X and a variant frequency of 32.8%. (C) IT-PGM also detected and called the same 4 bp

insertion at a sequencing depth of 2,249X and a variant frequency of 26.5% in the background

of wild type sequence. The presence of the insertions in the aligned reads are indicated as a

characteristic red bars in IGV as seen in panels B and C ( pointed by the arrows).

Figure 5. Assay sensitivity for V2 upgrade. Undiluted (100%) and sequentially diluted samples

of DLD1 cell line DNA into normal female control DNA (50%, 25%, 10% and 5%) were

23

sequenced in 6 different sequencing runs and the ability to detect the 8 different heterozygous

(monoallelic) mutations present in each dilution was tested. These mutations were successfully

detected in every dilution tested in all of the sequencing runs with minimal variation in the

variant frequency detected.

Figure 6. Application for disease monitoring. (A) An IDH2 and NPM1 mutation detected in a

patient at the onset of the disease were missed subsequently when present at low levels (days

10 and 185) when monitored by Sanger sequencing but were effectively detected and called

by MiSeq. The time point of clinical remission is indicated by the arrow (B) Mutations in

DNMT3A and NRAS, and insertions in NPM1 and FLT3 were detected at diagnosis and

followed subsequently. Two low-lying FLT3 D835 mutations also appear at day 95 indicating

the need to follow multiple mutations. The FLT3 ITD percentages as detected by CE are

plotted. The time point of clinical remission is indicated by the arrow (C) Two NRAS mutations

were detected and one of it disappears with treatment indicating distinct clones. A FLT3

Y842H activating mutation, not routinely tested was also detected by multi-gene screening on

MiSeq (D) Mutations were detected in 4 genes at diagnosis warranting testing of each of them

post treatment. The mutiplexing capacity of MiSeq would help consolidating them on to a

single platform. Data presented in sub panels C and D were patients with refractory AML.

1

Supplementary Methods

Genes interrogated by TruSeq Cancer Amplicon Panel: AKT1, BRAF, FGFR1, GNAS,

IDH1, FGFR2, KRAS, NRAS, PIK3CA, MET, RET, EGFR, JAK2, MPL, PDGFRA,

PTEN, TP53, FGFR3, FLT3, KIT, ERBB2, ABL1, HNF1A, HRAS, ATM, RB1, CDH1,

SMAD4, STK11, ALK, SRC, SMARCB1, VHL, MLH1, CTNNB1, KDR, FBXW7, APC,

CSF1R, NPM1,SMO, ERBB4, CDKN2A, NOTCH1, JAK3, PTPN11, GNAQ and GNA11.

Library preparation for sequencing on MiSeq: The genomic library was prepared using

250 ng of DNA template and customized Truseq Amplicon Cancer Panel (TSACP) kit

(illumina) according to the manufacturer’s protocol. Briefly, library preparation involved

hybridization of the probe mixture to genomic DNA and areas of interest were captured

by extension and ligation. In addition to areas complimentary to genomic DNA, each of

the probes has a common sequence, which is used to PCR-amplify the captured

sequences in a subsequent step. This PCR step also adds sequencing adapters (for

binding of sequencing primers) and short stretches of identifier sequences or barcodes

on the 3 and 5 ends of the amplicons to be used as unique sample identifiers, thus

facilitating multiplexed sequencing of the samples.

Sequencing using MiSeq. Library generated using customized TSCAP was purified

using AMPure magnetic beads (Agentcourt, Brea, CA) according to the manufacturer’s

protocol. From each library, equal quantities of the DNA were isolated and eluted using

Library normalization beads (TSCAP kit) following the manufacturer’s instructions and

equal volumes were mixed. This ensures same representation of the library from each

sample during multiplexed sequencing. Paired end sequencing of samples was

performed using MiSeq Reagent Kit, V1 (300 cycles) using MiSeq sequencer (illumina).

2

On average, libraries from 10 samples per sequencing run were multiplexed. Base

calling and the sequencing run summary were obtained using Real Time Analysis

Software (V1.14.23) (illumina). Sequencing quality was evident by the Q30 scores (one

error in 1000 bp sequence) and a cutoff of 85% sequence with Q30 score was used as

an indication of successful sequencing.

Sensitivity, inter-run and intra-run reproducibility. Sensitivity was determined by serially

diluting DNA from H2122 (ATCC CRL-5985), a human lung carcinoma cell line with a

known homozygous mutation in KRAS and a heterozygous mutation in MET into DNA

from HL60 cell line, a human promyelocytic leukemia cell line. Briefly, DNA isolated

from H2122 was diluted into DNA from HL60, in ratios of 1:4, 1:9, 1:19 (H2122:HL60)

resulting in 20%, 10% and 5% dilutions, respectively which were sequenced. Similarly,

DLD1 cell line (ATCC CCL-221) DNA with 8 heterozygous (monoallelic) mutations was

serially diluted into normal DNA control (Promega, Madison, WI) in ratios of 1:1, 1:3, 1:9

and 1: 19 (DLD1: normal) resulting in 50%, 25%, 10% and 5% dilutions of DLD1 DNA

and sequenced in 6 independent runs. Intra-run reproducibility was tested by

sequencing 1:9 diluted (10%) (DLD1: normal DNA) sample barcoded by 20 different

barcodes on a single sequencing run and inter-run assay reproducibility was assessed

by sequencing 10% DLD1 DNA diluted into normal DNA sequenced in 12 different runs.

Library preparation for Ion Torrent Personal Genome Machine (IT-PGM). Library

preparation was performed using the Ion Torrent (IT) Ampliseq Kit 2.0 Beta (Life

Technologies) and IT Ampliseq cancer panel primers (Life Technologies) to interrogate

mutational hotspots in 46 cancer-related genes following the manufacturer’s instructions

as described previously.34 Sequencing adaptors with short stretches of index sequences

3

(barcodes) that enable sample multiplexing were ligated to the amplicons using the Ion

Express Barcode Adaptors Kit (Life Technologies). The library prepared was quantified

using the Bioanalyzer highsensitivty DNA chip (Agilent Technologies Inc, Santa Clara,

CA).

Emulsion PCR and sequencing using IT-PGM. Clonal amplification of the barcoded

library DNA onto the ionspheres (ISPs) by emulsion PCR (E-PCR) and subsequent

isolation of ISPs with DNA was performed using Ion Express Template Kit (Life

Technologies) and IT OneTouch ES (Life Technologies) as described previously.34

Enriched ISPs were subjected to sequencing on a 318 IonChip (8 samples) using the

sequencing kit (Life Technologies) as per the manufacturer’s instructions. A cutoff of

300,000 reads with a quality score of AQ20 (1 misaligned base per 100 bases) was

used as a measure of successful sequencing of a sample. Torrent Suite V.2.0.1 and

variant caller V.1.0 (Life Technologies) were used for alignment and variant calling,

respectively using Human genome build 19 (hg19) as reference.

Supplementary Table 1. Summary of mutations detected in the sample set by MiSeq and IT-PGM. Tumor type used, the correlation of results from sequencing on MiSeq using Truseq Amplicon Cancer panel and on IT-PGM using Ampliseq Cancer Panel have been summarized (NC - Not Covered)

MiSeq Ion-Torrent PGM

Sample No WHO diagnosis (Blast Count)

Cytogenetic profile

Mutation Detected by Miseq (Coverage)

Variant

Frequency (%)

Mutation detected by IT-PGM (Coverage)

Variant Frequency

(%)

1 AMML (38%)

46,XX[20]

KDR p.Q472H (6193) KRAS p.G12A (5745) NRAS p. Q61R (765)

49.4 18.3 14.6

Yes (3849) Yes (1271) Yes (3392)

54.0 49.8 19.6

2 AML-MRC (33%)

46,XY[20]

KDR p.Q472H (3035)

NPM1, Exon 11, 4bp Insertion (3231)

47.4

29.4

Yes (2268)

Yes (2196)

47.0

33.8

3 AMML (67%)

46,XY,del(12) (p11.2)[2]

FLT3 p.D835H (1826) KDR p.Q472H (3927) MET p.T1010I (4415)

NPM1, Exon 11, 4bp Insertion (1190) APC p.L1129S (3605)

24.7 49.1 51.6

34.2 46.0

Yes (2584) Yes (4660) Yes (5780)

Yes (4448)

NC on IT-PGM (Sanger Confirmed )

48.4 47.0 49.7

31.8

-

4 AMML (81%)

46,XX[20]

KDR p.Q472H (3034) KIT p.M541L (6536)

NPM1, Exon 11, 4bp Insertion (937) ABL1 p.L248R (275)

49.1 50.1

35.1 5.09

Yes (4083) Yes (1608)

Yes (3018)

NC on IT-PGM (Sanger Confirmed )

46.0 50.1

39.0

-

5 AML without

maturation (84%) 46,XX[20]

IDH1 p.R132C (2801)

NPM1, Exon 11, 4bp Insertion (937)

47.23

31.8

Yes (4252)

Yes (3108)

37.6

32.8

6 t-AML (73%)

46,XX[20]

KIT p.M541L (5736)

NPM1, Exon 11, 4bp Insertion (937)

49.9

31.8

Yes (1608)

Yes (2090)

50.1

25.3

7 AML without

maturation (79%) 46,XX[20]

ATM, p.V410A (5907) KDR p.Q472H (4928) MET p.N375S (1449)

NPM1, Exon 11, 4bp Insertion (1449)

50.1 50.2 52.8

39.1

Yes (4648) Yes (2832) Yes (2335)

Yes (3166)

52

48.0 60.8

34.8

8 AML without

maturation (91%) 47,XX,+15[1]

NPM1, Exon 11,

4bp Insertion (1617) KDR p.Q472H (4693)

35.0 50.8

Yes(302) Yes (452)

35.8 49.0

9 Acute erythroid leukemia (20%)

46,XX[20] NPM1, Exon 11,

4bp Insertion (981) APC p.M1413V (2816)

39.1

51.1

Yes (2502)

NC on IT-PGM

(Sanger Confirmed )

32.1

-

10 AML-MRC (30%)

47,XX,+8[10]

MET p.N375S (1449) NRAS p.G12R (3376)

56.3 37.5

Yes (2335) Yes (4108)

60.8 21.2

11 AML without

maturation (93%)

46,XY[20]

ATM p.A1039T (2868) KDR p.Q472H (4856)

33.5 50.8

Yes (2346) Yes (3065)

62.4 46.0

12 AMML (38%)

46,XY[20]

KDR p.Q472H (4632) NPM1, Exon 11,

4bp Insertion (1929) NRAS p.G13D (2685)

51.0

35.6 20.1

Yes (2581)

Yes (2059) Yes (3911)

48.0

28.8 27.4

13 AML (56%)

46,XY[20]

IDH1 p. R132H (2607) KDR p.Q472H (4700) KIT p.M541L (9548)

NPM1, Exon 11, 4bp Insertion (2488)

25.7 99.7 48.4

19.5

Yes (4934) Yes (3426) Yes (1765)

Yes (3633)

46.4 99.0 50.7

18.1

14 Persistent AML (82%)

Complex KRAS p.G12D (3984) 51.4 Yes (1271) 49.8

15 AMML (24%)

46,XX [20] KRAS p.G12R (3574) 45.9 Yes (973) 42.5

16 AML-MRC (56%)

46,XY,del(16) (q13q21)[16]

IDH1 p.R132H (3009) KIT p.M541L (9081)

NPM1, Exon 11, 4bp Insertion (461)

KDR p.Q472H (4530)

44.7 49.0

31.8 58.2

Yes (4055) Yes (1097)

Yes (2092) Yes (3496)

51.3 47.3

31.6 52.0

17 AML (1%)

46,XX[20] KDR p.Q472H (3937) ABL1 p. L248R (458)

47.8 6.11

Yes (2448)

NC on IT-PGM (Sanger Confirmed)

50.0 -

18 AML (0%)

46,XY,t(11;21) (p11.2;q22)[1]

MET p.N375S (1735) 37.9 Yes (2335) 60.8

19 (AML) (3%)

46,XX[20]

IDH1 p. R132L (3012)

22.9

Yes (4934)

46.4

20 AML without

maturation (80%)

46,XY,del(20) (q11.2q13.3)[5]

JAK2 p.V617F (950) 50.2 Yes (4701) 49.2

21

MDS, RAEB-2 (10%)

46,XX[20]

KDR p.Q472H (3459)

NPM1, Exon 11, 4bp Insertion (1260)

48.1

30.2

Yes (1121)

Yes (1453)

52.0

23.9

22

MDS (1%)

47, XY, +8[10]

APC p.R876Q (2791)

52.9

Yes (3345)

46.1

23 AML-MRC (40%)

46,XX,del(5)(q13q33)[20]

KRAS p.Q61H (2494) 27.5

Yes (2985)

37.02

24

AML-MRC with CLL (53%)

46,XY[20] NPM1, Exon 11,

4bp Insertion (297)

32.3

Yes (3971)

29.3

25

AML-MRC (43%) 46,XY[18] KDR p.Q472H (4794) 48.7

Yes (1146)

61.0

26 AML-MRC (61%)

46,XX[20]

FLT3 p.D835Y (2939)

NPM1, Exon 11, 4bp Insertion (982)

49.1

36.4

Yes (2584)

Yes (1293)

48.4

36.1

27 AMML (61%)

46,XX[20]

FLT3 p.D835A (592) KIT p.M541L (3109) NRAS p.G13D (911)

24.6 50.3 36.0

Yes (4476) Yes (1608) Yes (3911)

12.5 50.1 27.4

28 AML-MRC (42%)

46,XY[20]

ATM p.P604S (1225)

55.1

Yes (555)

30.0

NPM1, Exon 11, 4bp Insertion (1807)

PTPN11 p.A72D

32.6 35.1

Yes (961) Yes (353)

22.0 79.6

29 (Acute

monocytic Leukemia) (74%)

53,XX,+X, der(1)t(1;1)

(p36.3;q12),+2,+4,+5,+8,+12,+

16[4]

ABL1 p.R307Q (2556) KRAS p.Q61H (1197)

NPM1, Exon 11, 4bp Insertion (1553)

44.6 37.9

20.2

Yes (703) Yes (2985)

Yes (2594)

61.4 37.0

21.7

30 AML-MRC (26%)

45,X,-Y[20]

NPM1, Exon 11, 4bp Insertion (2129) KDR p.Q472H (4150) KIT p.M541L (8411) MPL p.W515R (331)

32.0 48.7 99.5 32.3

Yes (1735) Yes (1134) Yes (4188) Yes (498)

24.7 58.0 99.9 41.3

31 t-AML (81%)

46,XX[20]

NPM1, Exon 11, 4bp Insertion (490)

APC p.I1307K IDH2 p.R140Q (1047)

36.5 48.8 47.6

Yes (3189) Yes (657)

NC on IT-PGM (Sanger Confirmed)

34.7 43.6

-

32 AML-MRC (88%)

46,XY,inv(9) (p12q13)[20]

KDR p.Q472H (4348) KIT p.M541L (6982) IDH2 p.R140Q (542)

48.6 49.5 26.5

Yes (3193) Yes (933)

NC on IT-PGM (Sanger Confirmed)

52.0 49.8

-

33 AML with

Inv16 (42%)

46,XX,inv(16)(p13.1q22)[16]

KDR p.Q472H (4683) RET p.S649L (8513) TP53 p. R148T (781)

47.7 47.6 92.5

Yes (2564) Yes (3713)

NC on IT-PGM (Sanger Confirmed)

56.0 40.4

-

34

AML with Inv16 (53%)

46,XX,inv(16) (p13.1q22)[19]

KDR p.Q472H (2304) KRAS p.G12D (1727) NRAS p.Q61R (325)

52.7 7.7

19.0

Yes (2858) Yes (1271) Yes (3392)

52.0 49.8 19.6

35 AML-MRC (37%)

46,XX,del(5)(q13q33)[4]

TP53 p.V274F (795) TP53 p. H179R (1059) APC p.S1404F (874)

39.1 24.8 5.15

Yes (503)

Yes (1558) NC on IT-PGM

(Sanger Confirmed)

26.8 41.7

-

36 AMML (65%)

46,XY,del(7)(q22),-10,-18[1]

KDR p.Q472H (4840) FLT3 p.D835Y (2365) IDH1 p.R132H (2975)

NPM1, Exon 11, 4bp Insertion (1164)

47.4 9.5

39.8

32.8

Yes (3474) Yes (3633) Yes (1443)

Yes (2249)

56.0 18.1 29.1

26.5

37 AML (81%)

46,XY,t(4;17)(q13;p13)[11]

FLT3 p.Y842H (1968) NRAS p. G12C (1938) NRAS p.G13D (1939)

15.7 21.5 18.0

Yes (3233) Yes (4108) Yes (3911)

9.3 21.2 27.4

38

AML (80%)

47,XY,+Y[6]

ATM p.V410A (6016) IDH1 p.R132H (2960) KDR p.Q472H (6029)

NPM1, Exon 11, 4bp Insertion (2214)

51.0 43.8 50.2

34.7

Yes (4648) Yes (4934) Yes (4030)

Yes (3961)

46.4 46.4 46.0

34.4

39 AML (45%)

Complex NPM1, Exon 11,

4bp Insertion (1342) KIT p.D816V (3832)

33.01 77.6

Yes (2975) Yes (4331)

31.6 77.8

40 Acute monocytic Leukemia (43%)

46,XX [20]

FLT3 p.D835V (1947) KDR p.Q472H (4861)

NPM1, Exon 11, 4bp Insertion (3560)

PTPN11 p. E76K (7409)

21.1 46.3

32.1 24.4

Yes (4476) Yes (1899)

Yes (1443) Yes (3270)

12.1 47.0

12.1 17.1

41 AML (32%)

NA IDH1 p.R132C (3520) KIT p.M541L (12238)

41.0 50.7

Yes (4934) Yes (1483)

46.4 48.6

42 AML-MRC (7%)

46,XY[20] KIT p.M541L (22765) KRAS p.G60V (3931)

99.8 45.6

Yes (991) Yes (8063)

99.8 38.5

43 MDS (15%)

46,XY[19]

KIT p.M541L (16620)

NRAS p.G13R (531) NRAS p.G12A(531)

49.4 23.5 21.2

Yes (1608) Yes (2804) Yes (2827)

50.1 23.6 23.6

44 AMML (61%)

46,XX,[22]

KDR p.Q472H (4218)

NPM1, Exon 11, 4bp Insertion (1052)

99.3

37.8

Yes (1474)

Yes (1576)

100

22.8

Supplementary Table 2. Sequencing of samples in parallel using MiSeq and

conventional sequencing platforms. A set of 11 samples were sequenced in parallel

using MiSeq and other sequencing platforms used in the laboratory and compared.

Diagnosis Blast (%)

MiSeq

Detection on Parallel Platforms

Mutation detected (Coverage)

Frequency (%)

1 AML (81%)

DNMT3A p. R882H (822)

47.8

Yes (Sanger)

2 AML (10%)

IDH1 p.R132C (4581) JAK2 p. V617F (7283)

33.7 96.8

Yes (Sanger)

Yes (Pyro)

3

AML (1%)

DNMT3A

p. R882H (2188)

25.8

Yes (Sanger)

4

AML-MRC (49%)

JAK2 p. V617F (5202)

25.9

Yes (Pyro)

5

AML-MRC (67%)

IDH1 p.R132C (3223)

34.3

Yes (Sanger)

6 AML-MRC (50%)

NRAS p.G12D (4905)

13.9

Yes (Pyro)

7

t-AML (90%)

IDH1 p.R132L (4018)

40.9

Yes (Sanger)

8 AMML (76%)

NPM1 (4bp Ins) (800)

DNMT3A p. R882H (466)

30.2

42.2

Yes (Capillary Electrophoresis)

Yes (Sanger)

9 t-AML (35%)

IDH2 p.R140Q (1816)

31.5

Yes (Sanger)

10 AML-MRC (90%)

NRAS p.Q61K (1167)

31.1

Yes (Pyro)

11 AML (30%)

IDH2 p.R140W (4522) JAK2 p. V617F (1833)

31.2 23.6

Yes (Sanger)

Yes (Pyro)

Supplementary Table 3. Detection of FLT3 ITDs by MiSeq Reporter and Pindel. 10 archival samples with known internal tandem duplications (ITDs) in exon 11 of FLT3 were sequenced using MiSeq and the aligned sequencing data was reanalyzed via Pindel, a program designed to detect indels. (NC – Not called by MiSeq Reporter)

FLT3 ITD Detection

Sample

Cytogenetics

Capillary Electrophoresis

(ITD Size and % allele frequency)

MiSeq Reporter

Pindel

1

AMML (67%)

46,XY,del(12)(p11.2) [2] 47,XY,+8[1] 46,XY[37]

18bp (23%) NC Yes (18bp)

2

AMML (81%)

46,XX[20] 31bp (39%) NC Yes (30bp)

3

AML without maturation (56%)

46,XY,t(11;21)(p11.2;q22)[1]

94bp (1%) NC Yes (93bp)

4

Acute monocytic leukemia (86%)

Complex 61bp (3%) NC Yes (60bp)

5

AMML (61%)

46,XX,[22]

42bp (3%)

58bp (35%)

NC NC

Yes (42bp) Yes (57bp)

6

AML with maturation (38%)

46,XY[13] 31bp (32%) NC Yes (30bp)

7

t-AML (81%)

46,XX[20] 46bp (1%) NC Yes (45bp)

8

Acute monoblastic leukemia (68%)

Complex 55bp (6%) 58bp (9%)

NC NC

Yes (54bp) Yes (57bp)

9

AML with Maturation (32%)

Complex 61bp (3%) NC Yes (60bp)

10

AML (3%)

NA (PB)

100bp (34%)

NC Yes (99bp)

Supplementary Table 4. Comparison of sequencing results from MiSeq Version1 and Version 2 sequencing. 12 samples were sequenced by both version 1 and Version 2 sequencing on MiSeq. The variants detected, their frequencies and sequencing coverage are compared.

Variants Version 1 Version 2

Sample Mutation Detected by Miseq and Coverage

Variant Frequency (%)

Coverage (X) Variant

Frequency (%) Coverage

(X)

1

FGFR3 p.F384L KIT, p.M541L MET, T1010I

48.7 51.4 51.2

306

18034 8119

49.5 51.0 54.3

216

11994 5479

2

APC p.E1317Q JAK2 p.V617F

50.7 59.7

9923 3015

49.9 59.3

12056 3416

3

KIT p.M541L TP53 p.Y220C

99.8 54.8

21312 5348

99.8 57.5

22977 5358

4

KIT p.M541L

51.8

19410

50.9

23113

5

KDR p.Q472H

50.5

9159

50.2

10275

6

KDR p.Q472H TP53 p.R273H

49.9 36.6

6433 2151

50.5 35.4

5449 1948

7

TP53 p.N247I TP53 p. R110P

8.7 29.1

3702 999

7.0

22.9

4539 1938

8

TP53 p.N239S 11.0 4460 10.3

4425

9

NPM1 (4bp insertion)

PTPN11 p.S502L

33.9 36.1

1676 5772

40.5 38.9

2863 8682

10

KIT p.M541L 48.5 9159 50.2 10275

11

KDR p.Q472H

99.5

5046

99.6

6083

12

KDR p.Q472H

51.4

6050

51.2

9207

Supplementary Table 5. Sensitivity studies were conducted using two cell lines (H2122 and DLD1). Details regarding the various dilutions used,

the average expected and detected variant frequencies of the mutations in six separate dilution studies are summarized below. The cytogenetic

profile of the cell lines is also included when available. The information summarized is an average of 2 and 6 separate studies for H2122 and

DLD1, respectively.

DLD1 (Variants)

Chromosome positions

DLD1

(Undiluted)

DLD1 (50%) (1:1 diluted)

DLD1 (25%) (1:3 diluted)

DLD1 (10%) (1:9 diluted)

DLD1 (5%)

(1:19 diluted)

Variant Frequency

(%)

Expected

(%)

Detected

(%)

Expected

(%)

Detected

(%)

Expected

(%)

Detected

(%)

Expected

(%)

Detected

(%)

IDH1 p.G97D PIK3CA p.D549N PIK3CA p.E545K

KIT p.V532I XPO1 p.E510K KRAS p.G13D TP53 p.S241F SMO p.T640A

2q33.3 3q26.3 3q26.3

4q11-q12 2p15

12p12.1 17p13.1 7q32.3

33.2 48.8 49.4 50.5 66.8 50.2 49.2 64.6

16.6 24.4 24.7 25.2 33.4 25.1 24.6 32.3

13.5 25.0 25.1 25.9 42.4 26.5 25.5 36.0

8.3 12.2 12.3 12.6 16.7 12.5 12.3 16.1

7.1 10.8 11.2 12.7 24.5 14.5 12.6 23.0

3.3 4.8 4.9 5.0 6.6 5.0 4.9 6.4

2.8 4.6 4.9 5.6

11.3 6.2 5.2 9.2

1.6 2.4 2.4 2.5 3.3 2.5 2.4 3.2

1.5 2.4 2.5 2.9 6.0 3.5 2.5 6.4

DLD1 Cytogenetic analysis (ATCC CCL-221): Karyotype: 46, XY, 2, + dir dup (2)(p13p23). Modal chromosome number of 46, occurring in 86% of cells. Rate of polyploidy (17.1%).

H2122 (Variants)

Chromosome positions

H2122

(Undiluted)

H2122 (20%) (1:4 diluted)

H2122 (10%) (1:9 diluted)

H2122 (5%) (1:19 diluted)

Frequency

(%)

Expected

(%)

Detected

(%)

Expected

(%)

Detected

(%)

Expected

(%)

Detected

(%)

NRAS p.G12C MET p.N375S

1p13.2 7q31

99.5 50.4

19.9 10.0

17.2 7.2

9.9 5.0

7.6 3.6

4.9 2.5

4.3 2.1

H2122 Cytogenetic analysis (ATCC CRL-5985) : Not available

Supplementary Figure Legends

Supplementary Figure 1. MiSeq Sequencing run metrics. The performance of 16 sequencing

runs are summarized. (A) The cluster formation efficiency across the runs is reflected by the

cluster density and the clusters pass filter. (B) Plot of total sequencing outputs with a base

calling quality of more than Q30 (one error in 1000 bases) for the sequencing runs. (C) Total

sequencing reads and reads pass filter which provide useful sequencing information (with

chastity scores of ≥ 0.6) are shown. (D) The total number of reads that could be identified or

assigned to the barcode sequence ID associated with each of the multiplexed samples.

Supplementary Figure 2. Validation of MiSeq Version 2 upgrade. (A) A comparison of the

cluster generation density and clusters pass filter of the sequencing runs with Version1 (V1) (12

samples) and Version 2 (V2) (with the same 12 samples and additional 12 samples). (B) A

doubling of sequencing capacity of MiSeq V2 upgrade (4.7Gb) was evident in comparison to

the V1 (2.1Gb) with comparable sequencing quality (Q30). (C) Higher sequencing capacity of

V2 also evident in higher sequencing reads and reads pass filter (upper panel). The 12 samples

common to runs of both versions had comparable sequencing reads and average per base

coverage (lower panel).

Supplementary Figure 3. Inter- and intra-run assay reproducibility. (A) Inter-run reproducibility

was assessed by sequencing the 1:9 diluted (10%) sample of DLD1 in 11 separate sequencing

runs and the detection of 8 mutations was tested. (B) Intra-run reproducibility was estimated by

sequencing 24 DLD1 (10%) samples, each indexed with a distinct barcode combination and

sequenced on a single sequencing run. Detection of the 8 mutations in DLD1 was found to be

consistent in each of the 24 samples sequenced.

Supplementary Figure 4. Summary of somatic mutations detected. A pie chart depicting

different genes and the number of somatic mutations detected in the AML/MDS cohort tested by

sequencing on MiSeq (includes all samples listed in supplementary Tables 1, 2, 3 and 4). The

mutations detected in our study found in genes not routinely tested for AML have also been

summarized (right panel). Information regarding the number of mutations detected in each gene

and their relative percentages are provided next to the gene names. Germline polymorphisms

were not included in this summary.

Supplementary Figure 1

0

200

400

600

800

1000

1200

1400

0 5 10 15 20

A

Sequencing Runs

Clu

ste

rs /m

m2 (

X 1

03)

Clusters

Clusters Pass

Filter

0

0.5

1

1.5

2

2.5

0 5 10 15 20

Sequencing Runs

Se

quencin

g O

utp

ut in

Gb

ases (

>Q

30

)

B

50

60

70

80

90

100

0 5 10 15 20

Sequencing Runs

% R

ea

ds Id

entifie

d

D

0

200

400

600

800

1000

1200

0 5 10 15 20

Total Reads

Reads Pass

Filter

Sequencing Runs

Se

qcuence R

ea

ds (

X 1

03)

C

V1 (12 Samples)

Pass filter – 85.4% V2 (24 Samples)

Pass filter – 85.0%

A

B

Re

ads (

mill

ion

s)

V1: 86.3% Q30 reads

2.1Gb sequence output

Clu

ste

r D

en

sity (

K/m

m2)

V2: 88.5% Q30 reads

4.7Gb sequence output

TOTAL

READS PF READS

% READS

INDENTIFIED

(PF)

Version 1 9162448 7830961 96.3212

Version 2 19598832 16657210 96.5842

C

Supplementary Figure 2

0

2

4

6

8

10

12

Variants

Vari

ant F

req

uen

cy (

%)

10% DLD1 (n=11)

IDH1

(p.G97D)

PIK3CA

(p.D549N)

PIK3CA

(p.E545K) KIT

(p.V532I) XPO1

(p.E510K)

KRAS

(p.G13D)

TP53

(p.S241F)

SMO

(p.T640A)

A

B

Supplementary Figure 3

0

2

4

6

8

10

12

10% DLD1 (n=24)

IDH1

(p.G97D)

PIK3CA

(p.D549N)

PIK3CA

(p.E545K) KIT

(p.V532I) XPO1

(p.E510K)

KRAS

(p.G13D)

TP53

(p.S241F)

SMO

(p.T640A)

Vari

ant F

req

uen

cy (

%)

Variants

MPL, 1, 4%

RET, 1, 5%

ABL1, 3, 14%

JAK2, 4, 18%

PTPN11, 3, 14%

TP53, 8, 36%

ATM, 2, 9%

Others, 22, 23%

NPM1, 24, 25%

KRAS, 7, 7%

NRAS, 11, 12%

KIT, 1, 1%

IDH1, 10, 11%

IDH2, 4, 4%

FLT3 , 16, 17%

Supplementary Figure 4

Somatic mutations (n=95)