genetic analysis of mutation and response in cancer
TRANSCRIPT
Genetic Analysis of Mutation and Response in Cancer
A DISSERTATION SUBMITTED TO THE FACULTY OF THE
UNIVERSITY OF MINNESOTA BY
MONICA KIEFFER AKRE
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
BRIAN VAN NESS, PHD ADVISOR
AUGUST 2017
© Monica Kieffer Akre 2017
i
Acknowledgements
“But science and everyday life cannot and should not be separated.”
Rosalind Franklin (1940, letter to Ellis Franklin)
In these words, Dr. Franklin and I are kindred. Science is my sustenance. I
have the wonderment of a child. I often anthropomorphize my work: the
chromosomes, the cells, the code—whatever it is I am working on, I feel a
relationship. I’ve been known to talk to chromosomes or cells. I come home from
a day of work and tell my children about what Nature had to say to me today. For
Dr. Franklin, the struggle must have been heart-wrenching. I’m comforted that
she seemed to find a space in science, if not among Nobel Laureates, then in the
peaceful folds of the mystery.
This is the greatest piece I must acknowledge: all the women who have
come before me. Not just in science, but also in all things. To my Great
Grandmother Anna Hammel, who homesteaded as a woman at the turn of the
last century in western North Dakota and kept it in her name her entire life—a
pioneer and intrepid role-model. To my Grandmother Lois Maroney Hammel,
who struggled with mental illness for which there was nothing to be done. To my
Great Grandmother Elizabeth Killian Sonsalla, married off at age 14 and raised 5
children alone during the Great Depression. And to my Grandfather Herman
Hammel who said, “ain’t that something” when told I was going to graduate
school—he was a great ally. I come from fortitude. And not just within the
ii
immortal thread of heredity, but from the wisdom of all the elders from whom I
have had the pleasure to listen as they share their secrets to happiness.
My family—Erik, Alexa, and Lily—was patient with me through so much.
This course did not run smooth, but they were grounding and supportive. They
shared the blood, sweat, and tears—and there was much of all three. We defend
together.
I am daily aware of, and grateful for, the opportunity to learn. And thank all
who have made the journey possible. All of the wonderful people I gained
experience with during rotations: the labs of Naoko Shima, Ph.D, Reuben Harris,
Ph.D, and Tim Starr, Ph.D.
I owe so much to the Harris lab. The individual lab members, each and
every one, showed me science, compassion, and strength and I am forever
grateful. Nadine Shaban, Ph.D., has given me strength, laughter, and
encouragement.
I found a welcoming home in the Van Ness lab of curious seekers and
have loved the communal hypothesizing. Thank you, Amit Mitra, Ph.D, and
Taylor Harding for making me feel welcome and grow as a scientist. The
neighboring Griffith lab has been a source of reagents and joy. Thank you.
My committee members past and present have helped guide me and
challenge me: Tim Starr Ph.D., David Largaespada Ph.D., Anja Bielinsky, Ph.D.,
Chad Myers, Ph.D., Ran Blekhman, Ph.D., Nik Somia, Ph.D., and Naoko Shima,
Ph.D. Thank you.
iii
And finally, thank you to Dr. Brian Van Ness. The past 2 years in your lab,
I have felt like a child in a playground able to wonder. Thank you for bringing me
back to focus when my wonder turned to wander. The opportunity to discuss and
debate science (usually during your lunch—I promise I didn’t do that on purpose)
was invaluable in shaping me as a critical scientist.
iv
Abstract
This thesis is divided into two distinct parts: mutation and treatment of
cancer. Each abstract is described separately.
Molecular, cellular, and clinical studies have combined to demonstrate a
contribution from the DNA cytosine deaminase APOBEC3B (A3B) to the overall
mutation load in breast, head/neck, lung, bladder, cervical, ovarian, and other
cancer types. However, the complete landscape of mutations attributable to this
enzyme has yet to be determined in a controlled human cell system. In Chapter
2, we report a conditional and isogenic system for A3B induction, genomic DNA
deamination, and mutagenesis. Targeted sequencing of portions of TP53 and
MYC demonstrated greater mutation accumulation in the A3B-eGFP exposed
pools. Clones were generated and microarray analyses were used to identify
those with the greatest number of SNP alterations for whole genome sequencing.
A3B-eGFP exposed clones showed global increases in C-to-T transition
mutations, enrichments for cytosine mutations within A3B-preferred trinucleotide
motifs, and more copy number aberrations. The 293-based system characterized
here still yielded a genome-wide view of A3B-catalyzed mutagenesis in human
cells and a system for additional studies on the compounded effects of
simultaneous mutation mechanisms in cancer.
Personalized medicine includes the identification of the biomarkers and/or
expression profiles important in the treatment of cancer. Cancer cell lines offer a
representative diversity of molecular processes involved in both malignant
v
phenotypes as well as therapeutic responses. Coupling cancer line expression
with high-throughput drug screens allows for examination of perturbed pathways
that influence drug sensitivities. Here we use The Sanger Institute’s Genomics of
Drug Sensitivity in Cancer to develop a bioinformatic and pathway analysis
approach that identifies activation of the BCR pathway as an important biomarker
in response to the tyrosine kinase inhibitor, dasatinib, used in hematologic cancer
therapies. This approach establishes a process by which data from cell line
repositories can be used to identify biomarkers associated with drug response in
the treatment of cancers.
vi
Table of Contents List of Figures .................................................................................................................. x
Chapter 1 Introduction
Part I: Background for Mutational Processes ........................................................... 1
Controlled Endogenous Mutagenesis .................................................................... 2
Antibody Diversification ......................................................................................... 2
Foreign DNA Restriction ........................................................................................ 3
Nuclear APOBEC3 Activity .................................................................................... 4
Enzymatic-induced Mutation in Cancer ................................................................. 4
Substrate Context Preference ............................................................................... 5
A3B as Mutagen .................................................................................................... 6
Statement of Thesis ............................................................................................... 7
Part II: Background for Drug Prediction Expression Profiles ................................. 7
Components of Precision Medicine ....................................................................... 7
Diagnostic Mutations ............................................................................................. 8
Prognostic Mutations ............................................................................................. 9
Prognostic Expression ........................................................................................... 9
Targeted Treatment ............................................................................................. 10
Drug Metabolism .................................................................................................. 10
Genetic Approaches to Predictive Problems ....................................................... 11
Precision Medicine Summary .............................................................................. 11
Commonalities Across Cancers ........................................................................... 12
Opportunities in Public Repositories .................................................................... 12
vii
Diving In to Big Data ............................................................................................ 13
B-cell Development .............................................................................................. 15
Statement of Thesis ............................................................................................. 16
Chapter 2 Mutation Processes in 293-Based Clones Overexpressing the DNA Cytosine Deaminase APOBEC3B
Overview ..................................................................................................................... 28
Introduction ................................................................................................................ 29
Materials and Methods .............................................................................................. 32
Cell Lines ............................................................................................................. 32
Immunoblots ........................................................................................................ 32
DNA Deaminase Activity Assays ......................................................................... 33
Differential DNA Denaturation (3D) PCR Experiments ........................................ 34
SNP Analysis ....................................................................................................... 35
Whole Genome Sequencing (WGS) .................................................................... 36
Results ........................................................................................................................ 37
System for conditional A3B expression ............................................................... 37
Iterative rounds of A3B exposure ........................................................................ 38
Targeted DNA sequencing provides evidence for A3B mutagenesis .................. 40
Genome wide mutation analyses ......................................................................... 41
A3B mutational landscape by whole genome sequencing .................................. 43
Discussion ................................................................................................................. 46
Acknowledgements ................................................................................................... 48
viii
Chapter 3 Development of Expression-based Biomarkers of Dasatinib Response in
Hematologic Malignancies
Overview ..................................................................................................................... 57
Introduction ................................................................................................................ 58
Methods ...................................................................................................................... 60
Expression Analysis ............................................................................................. 60
Normalization of CCLE values ............................................................................. 60
Statistical Analysis ............................................................................................... 60
Tissue Culture ...................................................................................................... 61
Results ........................................................................................................................ 62
Classification of Response and Resistance ........................................................ 62
Differential gene expression and feature selection .............................................. 63
BCR and B-cell Development .............................................................................. 64
Evaluation of the genes in the BCR pathway ...................................................... 64
Validation ............................................................................................................. 65
Independent Cell Lines ........................................................................................ 65
Clinical Associations ............................................................................................ 67
Bortezomib Resistance, CD19,
and Collatoral Dasatinib Sensitivity ..................................................................... 67
Discussion ................................................................................................................. 68
Acknowledments: ...................................................................................................... 70
Chapter 4
Discussion and Future and Future Aims ................................................................ 87
ix
Mutational Processes ................................................................................................ 87
Future Directions ................................................................................................. 87
Predictive Expression Profiles ................................................................................. 88
Future Directions ................................................................................................. 88
The Importance of Pathway Analysis .................................................................. 89
Personal Future Directions .................................................................................. 89
Bibliography .............................................................................................................. 94
x
List of Figures
Chapter 1
Fig 1. Antibody Formation: Use for Controlled Mutagenesis ................................ 17
Fig. 2 Outcomes of A3B Deamination ..................................................................... 19
Fig 3. APOBEC3 Family of Cytosine Deaminases .................................................. 22
Fig 4. Precision Medicine in Cancer Timeline ......................................................... 23
Fig 5. B-cell Differentiation ....................................................................................... 26
Chapter 2
Fig 1. A conditional system for A3B expression. ................................................... 49
Fig 2. A3B induction optimization and targeted sequencing results ................... 51
Fig 3. SNP analyses to estimate new mutation accumulation. ............................. 53
Fig 4. Summary of somatic mutations detected by WGS. ..................................... 55
Chapter 3
Fig 1. Workflow .......................................................................................................... 72
Fig 2. Diversity of Dasatinib Response and Expression Correlation ................... 74
Fig 3. BCR Pathway Activated in Extreme Responders ........................................ 76
Fig 4. B-cell Differentiation Stages with Malignancy and CD19 Expression of Cell
Lines ........................................................................................................................... 78
Fig 5. Statistically Significant Signature Genes ..................................................... 80
Fig 6. Gene Expression Signature of Extreme Response ..................................... 82
Fig 7. Validation of Lines With 5 Gene Signature ................................................... 84
Supp Fig 1. Expression Comparison between Multiple Myeloma and
Waldenström’s macroglobulinemia ......................................................................... 86
xi
Chapter 4
Fig 1. Genes of the BCR Pathway ............................................................................ 91
Fig 2. Potential of Expression Profiling in Precision Medicine ............................. 93
1
Chapter 1 Introduction
This thesis is divided into two distinct parts: mutation and treatment of
cancer. The background of each will be done separately.
Part I: Background for Mutational Processes
It has long been suspected that cancer arises as an imprecise passing on
of hereditary material of somatic cells. Indeed, over 100 years ago, through
astute observation of dividing sea urchin cells Theodor Boveri discovered that an
exact chromosome complement was necessary for controlled growth. Boveri
hypothesized in 1902 that an aberrant number of chromosomes in a single cell
gave rise to uncontrolled growth similar to a tumor(1). Since the time Boveri and
his wife, Marcella, made these observations, many more observed the specifics
of mutations and their contribution to oncogenesis.
Viruses proved to be not only an exogenous source of cancer, but also
communicable (2). Mutations accrued in irradiated mice resulted in tumors (3).
More sources of mutations were added to the list of growing scientific
understanding. These sources could be divided into two groups: exogenous and
endogenous.
Mutations have been accepted as one cause of cancer. But there are also
beneficial uses for mutations. Indeed, we would still be bubbles in the primordial
ooze rather than multi-cellular, multi-function beings had evolution not used
2
mutation to retool gene products into the complexity we see in the diversity of life
forms.
Within humans, 3 important mutational processes occur: antibody
formation and diversification, foreign DNA restriction, and T-cell receptor
rearrangements. I will describe in detail the first two processes.
Controlled Endogenous Mutagenesis
Antibody Diversification
Mutations are necessary for diversification during antibody production.
Antibody diversity and specificity is a result of recombination of the heavy and
light (either kappa or lambda) chain components of the future antibody.
Recombination activating gene (RAG) enzymes are responsible for controlled
cutting of variable, diversity, and joining (V(D)J) regions of the heavy chain (HC)
which are then rejoined in a newly re-configured genetic sequence by non-
homologous end joining (NHEJ) (4). This process allows construction of
antibodies with an enormous diversity of gene rearrangements. RAG1/2-assisted
recombination also occurs at either the kappa or lambda light chain (LC) variable
and joining (VJ) gene regions (Fig 1A-C). The heavy and light chains are
presented together via disulfide bonds upon the IgM stem embedded in the
plasma membrane. The new, unique configuration is now tested for efficacy
against potential immunological threats.
Activation-Induced Cytosine Deaminase (AID) is an enzyme responsible
for removing amine groups from single-stranded cytosines. The resulting uracil
lesion is mismatched with the no-longer-complementary guanine on the opposite
3
strand. This lesion can be repaired back to an error-free cytosine through base-
excision repair. However, the lesion can also result in mutation (Fig 2).
1. Replication over the uracil results in a C>T transition,
2. translesion synthesis can bring about C>T transitions or C>A
transversions,
3. mutagenic mismatch repair can result in transitions or
transversions,
4. and abasic sites left un-repaired may result in double-stranded
breaks(5).
AID deamination is responsible for somatic hypermutation (SHM) that
modifies antibodies and diversifies them to acquire a “best fit” antibody to attack
foreign invaders with precision (Fig 1D)(6). Class-switch recombination (CSR)
occurs when AID-induced double-stranded breaks recombine via non-
homologous end joining to result in isotype of the antibody produced (Fig 1E)(4).
Antibody construction is an orchestrated process and the enzymes involved are
well controlled and largely sequester their activity to the antigen binding sites
(hypervariable regions) of IGH, IGL (7).
Foreign DNA Restriction
Cellular immunity is a defense system against foreign DNA. It has been
formed as a set of 7 genes in tandem on chromosome 22, decendents from AID:
the APOBEC3 genes (Fig 4) (8). Like their progenitor, these genes also code for
active cytosine deaminases (9). The canonical function of the APOBEC3 proteins
is foreign DNA restriction in the cytoplasm during viral infection (10).
4
These enzymes deaminate DNA in the cytoplasm from viral infection or
even uptake of foreign DNA material protecting the cell from infection or
disruptive integration into host DNA (8,11–13).
The enzymes work on single-stranded DNA of retroviruses deaminating
cytosines and rendering the code ineffective in infection. The cytoplasmic
localization of the APOBEC3s reflect their function as that is where foreign DNA
first presents (14).
Nuclear APOBEC3 Activity
There are several APOBEC3s found in the nucleus (Fig 4). One important
function of nuclear localization APOBEC3s is the suppression of Alu and LINE
elements—ancient retrotransposon “jumping genes” (12,15–17) and therefore
their nuclear occupation may be justified. The restriction of retro-elements does
not require enzymatic activity (10). But are nuclear APOBEC3s enzymatically
active? Can they deaminate genomic cytosines?
Of the APOBEC3s found in the nucleus, APOBEC3B (A3B) is the only A3
member that is exclusively nuclear (14,18). A3B is, therefore, a good candidate
to study for evidence of genomic DNA mutation. What is the impact of A3B’s
enzymatic activity in the nucleus. A3B’s enzymatic activity is potent (11). Is it
under tight control, or does it, indeed cause mutation of the very DNA from which
it came?
Enzymatic-induced Mutation in Cancer
During controlled breakage of the IGH region, the DNA is vulnerable to
NHEJ with another region, or another chromosome altogether. AID was
5
discovered to be responsible for the t(8;14) chromosomal translocation involving
the newly severed IGH gene region, that was destined to recombine within the
same gene region, is now recombined with the MYC oncogene. This
translocation juxtaposing the MYC gene under the influence of IGH’s strong
promoter is the hallmark of Burkitt lymphoma (19). Balb/c mice are prone to
spontaneously developing c-Myc/IgH translocations and plasmacytomas (20).
However, Balb/c AID-/- mice do not develop neoplasms (19). Multiple other
translocations with evidence of AID-assisted mutagenesis involving the IGH gene
region have since been reported(21). This was the first example of the controlled
mutation machinery of antibody formation contributing to an oncogenic
translocation(19).
Substrate Context Preference
Given the distribution of cytosines in the genome, random chance would
predict a pattern of C>T transition with a fairly equal chance of being positioned
3’ of any one of the 4 bases. However, exhaustive analysis of substrate context
revealed A3B has a strong preference for deaminating cytosines flanked by a 5’
T and 3’ W base (5’TCW3’ context). This signature mirrors the context of point
mutations seen in multiple cancer patient samples (22–24).
As described in Figure 2 the resulting uracils can be properly excised and
repaired. However, mutations can occur if the repair of the site is incomplete or
erroneously repaired(5).
6
A3B as Mutagen
The APOBEC3 member most likely to cause mutations in human genomic
DNA is A3B. Firstly, A3B has import mechanisms to the nucleus allowing it
access to genomic DNA (18). It is the only member that has exclusive nuclear
localization.
Further, breast cancers with a high level of A3B expression also had
significantly higher numbers of mutations within A3B’s preferred substrate
context (22). Since first discovering the correlation between high A3B expression
and mutation load in breast cancer, more cancers have been examined for their
A3B expression, mutation load, and sequence context specific to A3B—the
cytosine substrate preceded by a thymine at the 5’ position and followed by a
weak base—adenine or thymine—in the 3’ position (5,23). The sequence context
of mutations have been examined by whole genome sequencing of tumors
(24,25). These studies show a mutational signature that is indicative of the
enzymatic activity of an APOBEC3 as shown in vitro (22).
The mounting data implicated A3B as a causal agent in mutations
(5,22,23,26,27). However, there are implications that another cytosine
deaminase, APOBEC3A, could be responsible for the mutation signature seen in
breast cancer (28). Further, tumors from individuals with a germline deletion of
the coding region of A3B also carry an APOBEC3 signature, even in patients
homozygous for the deletion (29,30). If A3B was deleted, what was responsible
for the mutations seen in these tumors?
7
The field required an investigation into what a single APOBEC3 molecule,
A3B, could do to human genomic DNA. We, therefore, developed a clean,
isogenic system to monitor A3B’s mutagenic impact over time and the genomic
landscape it shaped.
Statement of Thesis
Given the increased mutational load of breast cancers with higher A3B
expression, we decided to examine the accumulation of mutations by A3B in a
controlled, isogenic cell line system to prove the causal link between A3B and
resulting increased mutation load and context specificity. In an A3B-GFP
doxycline-inducible system with a GFP control derived from the same clone, 2
biological replicates were grown side-by-side and induced weekly for 10 weeks.
Our hypothesis was that A3B over-expressing lines would accumulate more
mutation over the course of the experiment compared to the GFP controls and
that these mutations would prefer the TCW sequence, characteristic of A3B’s in
vitro enzymatic activity.
Part II: Background for Gene Expression
Drug Prediction Profiles
Components of Precision Medicine
There are multiple components to precision medicine. Genetic profiles
contribute to precision medicine through identification of:
1. diagnostic and prognostic mutations,
2. diagnostic and prognostic expression,
8
3. targeted treatment, and
4. treatment metabolism by congenital variants.
Each of these components will be briefly examined for their historical
contributions to the overall campaign to successfully treat cancer (Fig 5).
Diagnostic Mutations
The first structural chromosomal defect of cancer was identified with the
discovery of the Philadelphia chromosome in 1959. Cytogenetic analysis of
chronic myelogenous leukemia (CML) patients (in Philadelphia and so, thusly
named) revealed a small chromosome that was soon discovered to be the
product of the causal translocation between chromosomes 9 and 22 (31). The
resulting derivative 22 contained a fusion gene of BCR (Breakpoint Cluster
Region) and ABL. The BCR gene region should not be confused with B-cell
Receptor, which shares its abbreviation. For this reason, the gene region will
always be followed by the words “gene region.”
ABL is a tyrosine kinase that is constitutively active when fused in the
BCR gene region. This extremely specific translocation led to one of the first
targeted therapies, albeit over 40 years after the initial discovery of the
translocation.
Imatinib mesylate was approved for use in the United States in 2001 and
dramatically improved CML treatment and outcomes (32). Mainstream media
responded with sensational headlines like Time’s, “There Is New Ammunition In
the War Against Cancer. These Are the Bullets” with a picture of imatinib
mesylate pills on the cover of the May 2001 edition (33).
9
Imatinib mesylate is a targeted tyrosine kinase inhibitor and, at least
initially, the hype seemed warranted. However, CML patients started relapsing.
Their cancers had mutated or amplified the fusion gene so that imatinib mesylate
no longer managed their disease (34). This highlights one of the difficulties in
precision medicine: the moving target of treating a dynamic disease—one that
can change, or is heterogeneous at the start of treatment.
Prognostic Mutations
The Döhner classification of chronic lymphocytic leukemia (CLL) is based
on cytogenetic analysis identifying deletions, duplications, translocations, and
aneusomies. The analysis stratifies patients into 5 prognostic groupings allowing
practitioners and patients to make treatment decisions based on their prognostic
classification (35).
Prognostic Expression
Her2 expression is an example of a prognostic marker turned targeted
therapy. The Her2 (ERBB2) protein is overexpressed in about 25% of breast
cancers often due to gene amplification. Her2 amplified breast cancers were
aggressive and patients were given a poor prognosis (36). Cleverly, a
monoclonal antibody against Her2 was used to neutralize the proliferation signal
from the amplified protein. A humanized form of the original mouse monoclonal
was created to allow for patients’ immune system to better tolerate the presence
of the foreign antibody (37). For Her2-amplified breast cancer, trastuzumab,
turned a poor prognosis into a treatable disease (36,38–40).
10
Targeted Treatment
Both imatinib mesylate and trastuzumab represent treatments targeted to
mutations and expression respectively. Success of these molecular targeting
therapies resulted in a race of clinicians and researchers looking for similar
molecular aberrations in other cancer types. As a result, imatinib mesylate is now
used as an effective treatment in gastrointestinal stromal tumors (41,42).
Trastuzumab is used in treating Her2-amplified metastatic gastric cancer and
gastroesophogeal junction adenocarcinoma (43–45). Thus, therapies designed
for a specific cancer were found to have possible broader applications. A shift
began in which cancer treatments were based on common genetic events, even
across cancers of different tissue types.
Drug Metabolism
The final component of precision medicine to be discussed here is
congenital variants of metabolism. Liver enzymes that either pre-process drugs
from pro-drug inactive to active forms or break down drugs to be cleared have a
variety of reported single nucleotide polymorphisms (SNPs). For example,
Tamoxifen is an anti-estrogen drug but is only effective when converted by the
liver enzyme CYP2D6 into its active endoxifen form (46). Single Nucleotide
Polymorphisms (SNP) variants of CYP2D6 with less enzymatic activity convert
less drug to endoxifen resulting in a weakened therapeutic benefit. However,
even with enzymatically potent CYP2D6 variants, tamoxifen benefit can be
reduced as other prescription drugs can reduce the activity of CYP2D6 (47).
11
Genetic Approaches to Predictive Problems
Cytogenetic analysis was our first experience with diagnostic precision
medicine—the discovery of the Philadelphia chromosome. Indeed, DNA analysis
for mutations and variations make up a greater portion of clinically available tests
for diagnosis, prognosis, targeted treatment, and metabolic variants.
However, there has been a place for gene expression, even relatively
early on with assays for receptors that classify tumors or indicate overexpression
via immunohistochemistry (IHC). The technique was developed in 1941, long
before the cancer field had a use for it (48). IHC and other immunofluorescent
techniques in research labs have now become routine. In 1996, a Phase II trial to
test patients for Her2 overexpression status via IHC, brought the utility into
clinical precision medicine (49).
The central dogma of biology: DNA makes RNA makes protein, would
make the argument that what can be learned from the DNA is also in the RNA.
And, importantly, it is the amount of RNA that may define cellular activity,
including response to therapies. In Chapter 3, I make use of expression data
from cell lines to describe expression signatures predictive of drug response.
Precision Medicine Summary
The patient-specific cancer characteristics described above also
determine which therapies will or won’t be effective. Further, a patient’s
congenital enzymatic make-up determines if drugs are converted into active
conformations, metabolized and cleared too quickly to illicit a response, or too
slowly and build up to toxic levels. The multiple layers of information that can be
12
gathered from a patient and tumor make precision medicine difficult and highlight
a necessity to integrate genetic data from a larger body of evidence.
Commonalities Across Cancers
Tumor suppressors and proto-oncogenes are within every non-tumor cell.
The disruption of their function can be found across different cancer types. For
example, the ALK gene was first found rearranged in large-cell lymphoma has
been found in non-small cell lung carcinoma and clear cell renal cell carcinoma
(50–52). TP53 is estimated to be mutated, deleted, or silenced in nearly every
type of cancer with frequencies ranging from 10% of hematologic malignancies to
nearly 100% of high-grade serous ovarian cancer (53). Likewise, mis-regulation
of MYC-targeted genes is involved in a statistically significant proportion of 11 of
the 17 cancer types examined in the the Cancer Genome Atlas (TCGA) (54).
Discoveries of similar aberrations across cancer types has led to a movement of
classification of cancer into genetic-based categories rather than the more broad
categories based on histology or tissue of origin (55–57).
The difficulties presented by variant components of precision medicine
require utilizing information from all cancers. Large databases representing
multiple cancer types allow us to cross tissue barriers as repositories are
collections of multiple cancer types, tissues, stages, and individuals. Chapter 3
utilizes big data from multiple cancer types.
Opportunities in Public Repositories
Big data presents the opportunity to explore the gene expression-drug
response relationship in greater detail and across multiple cancer types. Cancer
13
cell line collections such as the NCI-60, Genomics of Drug Sensitivity in Cancer
(GDSC), and Cancer Cell Line Encyclopedia (CCLE) offer mutation status,
expression profiles, and drug response data (58–60).
The GDSC has 1074 cell lines with whole-genome sequence, micro-array
expression data, and response profiles to 265 drugs (58). Examination of the
expression profiles of cell lines derived from diverse primary tumors and their
corresponding response to drugs allows us to dissect important gene and
pathway expression that contributes to either response or resistance to a drug.
Once these patterns are established and tested clinically, patient tumor profiles
could direct therapy selection toward a more effective treatment.
Diving In to Big Data
The gene-drug approach described above can, as I show in Chapter 3, be
extended beyond one gene to include multiple genes of a pathway. In order to
accomplish this, multiple steps in limiting and classifying the data were
employed.
Firstly, we subset the cell lines to only include B-cell malignancies, leaving
us with expression data for 94 lines. This was done to limit any confounding
impact of tissue-specific expression on analysis, but still benefit from the
combination of multiple cancer types to increase sample size as well as broaden
what is understood about the all the cancers of B-cell origin. This is a small
version of the pan-cancer analysis described earlier.
14
Secondly, we examined 14 drugs with targets currently being used for
treatment in B-cell malignancies. We chose to focus on Dasatinib, which is a
multi-kinase inhibitor.
Area under the survival curve (AUSC) is a measurement of drug response
that is, in this data set, normalized between 0 and 1. Lines with an AUSC for a
drug that approaches 1 is more resistant than a line that has a value of 0.5 or
less. It’s important to note that there is no specific criterion for what is responsive
in these studies. Relative responsiveness is the measure by which value cutoffs
for AUSC were based. Cutoffs were chosen based on breaks in the AUSC values
that we assess represented a natural cutoff between a physiological response to
the drug. These separations in AUSC values are drug-specific and, in some
drugs, do not exist, but rather a gradual, continuous, distribution of AUSC values
is observed.
We describe lines as Responders of dasatinib when they have an AUSC
of less than 0.75 with the additional criterion of 50% of cell death (IC50) at a drug
concentration below the maximum testing concentration divided by 4. The IC50
criterion was to ensure that there was cell death assayed well within the survival
curve plot. There were 14 Responders of the 71 lines for which drug testing was
available. We described lines as Non-Responders when their AUSC was greater
than 0.98. This is an extremely high value and there is no ambiguity that these
lines’ metabolism was unaffected by dasatinib.
It was important that we have cell lines that displayed extreme responses
so as not to confound analysis. We identified differential expression between
15
Non-Responders and Responders. Once a we narrowed down candidate gene
expression markers of dasatinib response, we examined the expression of these
candidates in the remaining 46 llines. We were looking for trends in lines with
partial response or resistance. Did they have expression values similar to their
nearest extreme responders?
B-cell Development
The hematopoietic stem cell starts down a course of differentiation that
has multiple bifurcations and destinations. The one arm we will focus on here is
that of B-cell differentiation. The differentiation of a B-cell is marked by surface
receptors, activation of the B-cell receptor (BCR) pathway, and the maturation of
the antibody described in Part I of this introduction (Fig 6).
The pluripotent pre-B cell has an activated B-cell receptor pathway with
high expression of BCR components such as CD19, CD79A, and CD79B
presented on the surface of the cell (61). The HC first recombines the D-J
regions followed by joining this newly recombined section with a rearrangement
of the V region. The pre-B-cell adds the HC to the BCR assembly as prototype of
an antibody with a surrogate LC substituting for the yet-to-be recombined IGL
region (either kappa or lambda). Should the surrogate LC bind sufficiently bind
and present the HC, the new pro-B cell moves forward in differentiation and
replaces the surrogate LC with the newly-formed LC to the BCR ensemble and
the naïve cell lies in wait. Once stimulated by an antigen, the intercellular
portions of CD79A and CD79B are phosphorylated and a proliferation and
survival cascade ensues in the mature B cell (61).
16
After the BCR has signaled, its usefulness is obsolete and the cell
downregulates the BCR pathway components. The fully mature product of the
differentiated B-cell, the plasma cell, in fact, has essentially no expression or
presentation of BCR components, such as CD19 (61).
The subset of B-cell malignancies from the GDSC has expression data for
71 cell lines, which also have drug response for dasatinib. It is to be expected
that molecules involved in differentiation will be differentially expressed based on
at what stage in development the malignancy arose. What is also expected is the
cell lines carry enough in common to reveal what is truly different between the
Responders and Non-Responders.
Statement of Thesis
We hypothesize that the phenotype of extreme response and non-
response to drug treatment in cell lines must be due to differences between the
lines that can be revealed by analysis of RNA expression. We focused our
attention on dasatinib response in B-cell malignancies. We augmented our
analysis by including pathway analsyis of the differentially expressed genes. Our
gene expression response signature was confirmed by validating the biomarkers
on untested cell lines and primary patient samples and supported by outcomes of
a clinical trial.
17
Fig1.AntibodyFormation:UseforControlledMutagenesis
A) The HC Variable, Diversity, and Joining (V(D)J) loci undergo recombination
via the double-stranded breaks induced by the RAG enzymes (depicted as white
circles). Non-homologous end joining joins breaks to reconfigure the locus into a
unique region.
B) The LC (either the kappa or lambda locus) also undergoes recombination,
however there is no diversity sequence within the LC.
C) After being transcribed and translated, the antibody assembles. The HC ends
in a constant region (black) that abuts to the IgM locus that creates the “stem” of
the antibody. The LC also ends in a constant region.
D) Somatic Hypermutation (SHM) occurs as AID deaminates cytosines within the
heavy and light chain loci. SHM of the HC is depicted here with the mutations
arrowed on resultant antibody.
E) Class Switch Recombination (CSR) occurs when the AID-induced lesion
results in a double-stranded break that recombines the locus designating the
type of antibody to be produced.
18
V(D)J Recombination
AIDU
UAID
U AID
G
G
G
Somatic HyperMutation (SHM) Class-Switch Recombination
Fig 1. Antibody Formation: Use for Controlled Mutagenesis
VJ Recombination
Heavy C
hain
Lig
ht C
hain
(kappa o
r la
mbda)
IgA
IgE
IgM
IgD
IgG
AID
AID
IgG
A
B
C
D E
variable
div
ers
ity
join
ing
variable
join
ing
19
Fig.2OutcomesofA3BDeamination
A deaminated cytosine has multiple outcomes. Repair enzymes can, excise, and
replace the cytosine resulting in error-free repair. However, all along the process
of repair, mutations can result as
1) the lesion is used as a template during replication
2) error-prone mismatch repair can result in transition or transversions
3) Translesion synthesis with an error-prone polymerase can also result in
transition and transversions.
4) Double-stranded breaks can occur at weakened abasic sites or when two
abasic sites are near and nicked with AP endonuclease.
20
T
CA T
C AT AG
A G T
Highlighted Cytosines are within the A3B preferred context
Single-stranded DNA allows A3B catalytic acess
A3B
A3B
T
UA T
U AT AG
A G T
T
UA T
U AT AG
A G T
T
UA T
U AT AG
A G T
T
CA T
C AT AG
A G T
Lesions
Error-free Repair
Uracil DNA
Glycosylase Excision
T
UA T
U AT AG
A G T
T
UA T
U AT AG
A G T
T
A T
AT AG
A G T
Abasic Sites
Backbone Breakage
by AP endonuclease
T
UA T
U AT AG
A G T
T
UA T
U AT AG
A G T
T
A T
AT AG
A G T
T
A
AG
T
A
T
AG
T
Double-stranded
Break
Deamination by A3B
Polymerase
Replication over uracil-
C>T transition
Translesion
Synthesis-
Transitions or
Transversions
Mutagenic Mismatch
Repair-
Transitions or
Transversions
Fig 2. Outcomes of A3B Deamination
21
Fig 3. APOBEC3 Family of Cytosine Deaminases
A schematic representing the location of the 7 APOBEC3 family members in
tandem on chromosome 22. The cell representations on the right show the sub-
cellular localization of each family member corresponding to the color scheme.
22
Chromosome 22
22q13.1
APOBEC3 Family
A3A
A3B
A3C
A3DE
A3G
A3H
A3F
Fig 4. APOBEC3 Family of Cytosine Deaminases
Sub-cellularLocalization
Fig 3. APOBEC3 Family of Cytosine Deaminases
23
Fig.4PrecisionMedicineinCancerTimeline
Examples of Diagnostic, Prognostic, Treatment, and Metabolic components of
cancer care throughout recent history.
24
Fig
. 4 P
reci
sion
Med
icin
e in
Can
cer T
imel
ine
25
Fig5.B-cellDifferentiation The schematic represents B-cell differentiation and antibody maturation from left
to right. The lymphoid stem cell commits to the b-cell lineage and undergoes D-J
rearrangement of the IGH region as a Pro-B cell. Later in this stage the variable
region recombines with the newly formed DJ region. Entering the Pre-B cell
stage, the HC is displayed along with a surrogate LC. If successful binding of the
surrogate LC with the HC occurs, the LC is recombined and added to the CD79A
and CD79B components with the IgM stem at the surface of the immature B cell.
The naïve B cell travels to secondary lymph tissues to wait for antigen
stimulation. Upon antigen stimulation, the intercellular BCR intercellular
components are phosphorylated and a signaling cascade results in proliferative
and survival signals. Class switch recombination in the mature B cell results in
the formation of antibody isotypes. Receptors and molecules of the BCR pathway
are shown below as bars that extend across the differentiation stages where they
are most expressed.
26 Fi
g. 5
B-c
ell D
iffer
entia
tion
27
Chapter2
Mutation Processes in 293-Based Clones Overexpressing the
DNA Cytosine Deaminase APOBEC3B
Short title: APOBEC3B Mutation Signature in Human Cells
Monica K. Akre1, Gabriel J. Starrett1, Jelmar S. Quist2, Nuri A. Temiz1, Michael A.
Carpenter1, Andrew N. J. Tutt2, Anita Grigoriadis2, Reuben S. Harris1,3,*
1 Department of Biochemistry, Molecular Biology, and Biophysics,
Institute for Molecular Virology, Masonic Cancer Center, University of Minnesota,
Minneapolis, MN, USA, 55455
2 Breast Cancer Now Research Unit, Research Oncology, Guy’s Hospital, King’s
College London, London, UK, SE1 9RT
3 Howard Hughes Medical Institute, University of Minnesota, Minneapolis, MN,
USA, 55455
Keywords: APOBEC3B (A3B); DNA cytosine deamination; DNA damage;
genomic mutagenesis; mismatch repair deficiency; somatic mutation
28
Overview
Molecular, cellular, and clinical studies have combined to demonstrate a
contribution from the DNA cytosine deaminase APOBEC3B (A3B) to the overall
mutation load in breast, head/neck, lung, bladder, cervical, ovarian, and other
cancer types. However, the complete landscape of mutations attributable to this
enzyme has yet to be determined in a controlled human cell system. We report a
conditional and isogenic system for A3B induction, genomic DNA deamination,
and mutagenesis. Human 293-derived cells were engineered to express
doxycycline-inducible A3B-eGFP or eGFP constructs. Cells were subjected to 10
rounds of A3B-eGFP exposure that each caused 80-90% cell death. Control
pools were subjected to parallel rounds of non-toxic eGFP exposure, and
dilutions were done each round to mimic A3B-eGFP induced population
fluctuations. Targeted sequencing of portions of TP53 and MYC demonstrated
greater mutation accumulation in the A3B-eGFP exposed pools. Clones were
generated and microarray analyses were used to identify those with the greatest
number of SNP alterations for whole genome sequencing. A3B-eGFP exposed
clones showed global increases in C-to-T transition mutations, enrichments for
cytosine mutations within A3B-preferred trinucleotide motifs, and more copy
number aberrations. Surprisingly, both control and A3B-eGFP clones also elicited
strong mutator phenotypes characteristic of defective mismatch repair. Despite
this additional mutational process, the 293-based system characterized here still
yielded a genome-wide view of A3B-catalyzed mutagenesis in human cells and a
system for additional studies on the compounded effects of simultaneous
29
mutation mechanisms in cancer.
Introduction
Cancer genome sequencing studies have defined approximately 30 distinct
mutation signatures (reviewed by (27,62–64)). Some signatures are large-scale
confirmations of established sources of DNA damage that have escaped repair.
The largest is water-mediated deamination of methyl-cytosine bases, which
manifest as C-to-T transitions in genomic 5’-CG motifs (65). This process
impacts almost all cancer types and accumulates as a function of age. Other well
known examples include ultraviolet radiation, UV-A and UV-B, which crosslink
adjacent pyrimidine bases and result in signature C-to-T transitions (66), and
tobacco mutagens such as nitrosamine ketone (NNK), which metabolize into
reactive forms that covalently bind guanine bases and result in signature G-to-T
transversions (67). These latter mutagenic processes are well known drivers of
skin cancer and lung cancer, respectively, but also contribute to other tumor
types. A lesser-known but still significant example of a mutagen is the dietary
supplement aristolochic acid, which is derived from wild ginger and related plants
and metabolized into reactive species that covalently bind adenine bases and
cause A-to-T transversions(68,69). Aristolochic acid mutation signatures are
evident in urothelial cell, hepatocellular, and bladder carcinomas. Other
confirmed mutation sources include genetic defects in recombination repair
(BRCA1, BRCA2, etc.), post-replication mismatch repair (MSH2, MLH1, etc.),
and DNA replication proofreading function, which manifest as microhomology-
mediated insertion/deletion mutations, repeat/microsatellite slippage mutations,
30
and transversion mutation signatures, respectively(27,65,70).
The largest previously undefined mutation signature in cancer is C-to-T
transitions and C-to-G transversions within 5’-TC dinucleotide motifs(24,65,71).
This mutation signature occurs throughout the genome, as well as less frequently
in dense clusters called kataegis. This signature is ascribable to the enzymatic
activity of members of the APOBEC family of DNA cytosine to uracil
deaminases(22–24,65,71,72). Human cells encode up to 9 distinct APOBEC
family members with demonstrated C-to-U editing activity, and 7/9 have been
shown to prefer 5’-TC dinucleotide motifs in single-stranded DNA substrates:
APOBEC1, APOBEC3A, APOBEC3B (A3B), APOBEC3C, APOBEC3D,
APOBEC3F, and APOBEC3H. In contrast, AID and APOBEC3G prefer 5’RC and
5’CC, respectively (R = purine; reviewed by(73,74)). The size and similarity of
this protein family, as well as the formal possibility that another DNA damage
source may be responsible for the same mutation signature(75), have made DNA
sequencing data and informatics analyses open to multiple interpretations.
However, independent (14,22) and subsequent (5,23,28,72,76–80)
studies indicate that at least one DNA deaminase family member, A3B, has a
significant role in causing these types of mutations in cancer. First, A3B localizes
to the nucleus throughout the cell cycle except during mitosis when it appears
excluded from chromatin(14). Second, A3B is upregulated in breast cancer cell
lines and primary tumors at the mRNA, protein, and activity levels(5,22,81).
Third, endogenous A3B is the only detectable deaminase activity in nuclear
extracts of many cancer cell lines representing a broad spectrum of cancer types
31
(breast, head/neck, lung, ovarian, cervix, and bladder (5,22,81)). Fourth,
endogenous A3B is required for elevated levels of steady state uracil and
mutation frequencies in breast cancer cell lines(22). Fifth, overexpressed A3B
induces a potent DNA damage response characterized by g-H2AX and 53BP1
accumulation, multinuclear cell formation, and cell cycle deregulation(22,28,76).
Sixth, A3B levels correlate with overall mutation loads in breast and head/neck
tumors (22,77). Seventh, the biochemical deamination preference of recombinant
A3B, 5’TCR, is similar to the actual cytosine mutation pattern observed in breast,
head/neck, lung, cervical, and bladder cancers(5,22,23). Eighth, human
papillomavirus (HPV) infection induces A3B expression in several human cell
types, providing a link between viral infection and the observed strong APOBEC
mutation signatures in cervical and some head/neck and bladder cancers(82–
84). Ninth, the spectrum of oncogenic mutations in PIK3CA is biased toward
signature A3B mutation targets in HPV-positive head/neck cancers(77). Finally,
high A3B levels correlate with poor outcomes for estrogen receptor-positive
breast cancer patients (79,80,85).
Despite this extensive and rapidly growing volume of genomic,
molecular, and clinical information on A3B in cancer, the full breadth of this
enzyme’s activity on the human genome has yet to be determined. Here we
report further development of a human 293 cell-based system for conditional
expression of human A3B. The results reveal, for the first time in a human cell
line, the genomic landscape of A3B mutagenesis.
32
Materials and Methods
Cell Lines
We previously reported T-REx-293 cells that conditionally express A3B (22).
However, the mother, daughter, and granddaughter lines described here are new
in order to ensure a single cell origin and have all of the controls derived in
parallel. T-REx-293 cells were cultured in high glucose DMEM (Hyclone)
supplemented with 10% FBS and 0.5% Pen/Strep. Single cell derived mother
lines, A and C, were obtained by limiting dilution in normal growth medium.
These mother clones were transfected with linearized pcDNA5/TO-A3Bintron-
eGFP (A3Bi-eGFP) or pcDNA5/TO-eGFP vectors (22,86), selected with 200
µg/mL hygromycin, and screened as described in the main text to identify drug-
resistant daughter clones capable of Dox-mediated induction of A3Bi-eGFP or
eGFP, respectively. The encoded A3B enzyme is identical to “isoform a” in
GenBank (NP_004891.4). GFP flow cytometry was done using a FACSCanto II
instrument (BD Biosciences).
Immunoblots
Whole cell lysates were prepared by suspending 1x106 cells in 300µL 10x
reducing sample buffer (125mM Tris pH 6.8, 40% Glycerol, 4%SDS, 5% 2-
mercaptoethanol and 0.05% bromophenol blue). Soluble proteins were
fractionated by 4% stacking and 12% resolving SDS PAGE, and transferred to
PVDF membranes using a wet transfer BioRad apparatus. Membranes were
blocked for 1 hr in 4% milk in PBS with 0.05% sodium azide. Primary antibody
33
incubations, anti-GFP (JL8-BD Clontech) and anti-b-actin (Cell Signaling) were
done in at a 1:1000 dilution in 4% milk diluted in PBST, and incubation conditions
ranged from 4-8 degrees C for 2-16 hrs. Membranes were then washed 3 times
for 5 minutes in PBST. Secondary antibody incubations, anti-mouse 680
(1:20000) and anti-rabbit 800 (1:20000), were done in 4% milk diluted in PBST
with 0.01% SDS, and incubation conditions ranged from 4-8 degrees C for 2-16
hrs. The resulting membranes were washed 3 times for 5 minutes in PBST and
imaged using Licor instrumentation (Odyssey).
DNA Deaminase Activity Assays
This assay was adapted from published procedures (81,87). Whole-cell extracts
were prepared from 1x106 cells by sonication in 200µL HED buffer (25mM
HEPES, 5mM EDTA, 10% glycerol, 1mM DTT, and one tablet protease inhibitor-
Roche per 50mL HED buffer). Debris was removed by a 30 min maximum speed
spin in a tabletop micro-centrifuge at 4 degrees C. The supernatant was then
used in 20µL deamination reactions that contained the following: 1µL of 4pM
fluorescently-labeled 43-mer oligo (5’-
ATTATTATTATTCGAATGGATTTATTTATTTATTTATTTATTT-fluorescein)
containing a single interior 5’-TC substrate, 9.25µL UDG (NEB), 0.25µL RNase,
2µL 10x UDG buffer (NEB), 16.5µL lysate. Reactions were incubated at 37
degrees C for 1h. 2µL 1M NaOH was added and reaction was heated to 95
degrees C in a thermocycler for 10 min. 22µL of 2x formamide loading buffer was
34
added to each sample. 5µL of each reaction was fractionated on a 15% TBE
Urea Gel and imaged using a SynergyMx plate reader (BioTek).
Differential DNA Denaturation (3D) PCR Experiments
This assay was adapted from published procedures (22,88). Genomic DNA was
extracted from samples using a PureGene protocol (Gentra) and quantified using
Nanodrop instrumentation (ThermoFisher Scientific). 20 ng of genomic DNA was
subjected to one round of normal high denaturation temperature PCR using Taq
Polymerase (Denville) and primers for MYC (5’-ACGTTAGCTTCACCAACAGG
and 3’TTCATCAAAAACATCATCATCCAG) or TP53
(5’GAGCTGGAGCTTAGGCTCCAGAAAGGACAA and
3’TTCCTAGCACTGCCCAACAACACCAGC). 383 bp and 376 bp PCR products
were purified and quantified using qPCR with nested primer sets and SYBR
Green detection (Roche 480 LightCycler;
5’ACGAGGAGGAGAACTTCTACCAGCA and
3’TTCATCTGCGACCCGGACGACGAGA for MYC and
5’TTCTCTTTTCCTATCCTGAGTAGTGGTAA and
3’TTATGCCTCAGATTCACTTTTATCACCTTT for TP53). Equivalent amounts of
each PCR product were then used for 3D-PCR using the same nested PCR
primer sets. The resulting 291 and 235 bp products were fractionated by agarose
gel electrophoresis, purified using QIAEX II (Qiagen), cloned into a pJet vector
(Fermentas), and subjected to sequencing (GENEWIZ). Alignments and mutation
calls were done with Sequencher (Gene Codes Corporation).
35
SNP Analysis.
Granddaughter clones were established by limiting dilution after the final pulse
round. Genomic DNA was prepared using the Gentra PureGene kit (Qiagen,
Valencia, CA, quantified by agarose gel staining with ethidium bromide and by
NanoDrop measurements (Thermo Scientific, Wilmington, DE), and subjected to
SNP analyses by Source BioScience (Cambridge, UK) using the Human
OmniExpress-24v1-0 BeadChip (Illumina, San Diego, CA). Raw data were pre-
processed in GenomeStudio using the Genotyping Module (Illumina, San Siego,
CA). Genotype clustering was performed using the humanomniexpress_24v1-
0_a cluster file, whereby probes with a GenCall score below 0.15, indicating low
genotyping reliability, were discarded. All samples passed quality control as
assessed by call rates and frequencies. Genotypes for a total of 716,503 probes
were used for further analyses.
By comparing the genotypes of the granddaughter clones with their
respective mother clones, six classes of base substitutions could be determined
(C-to-T, C-to-G, C-to-A, T-to-G, T-to-C, and T-to-A). For example, a C-to-T
transition occurred if the C/C genotype of the mother clone changed to a C/T
genotype in the granddaughter clone. Given the design of some microarray
probes (i.e., some probes detect the Watson-strand rather than the Crick-strand),
a change from a G/G in the mother clone to a G/A genotype in the granddaughter
clone was also scored as a C-to-T transition.
36
Chromosomal abnormalities in the genomes of granddaughter clones were
identified with Nexus Copy Number 7.5 software (BioDiscovery, Hawthorne, CA),
using the matched mother clone as a reference. SNPRank segmentation was
applied and the segmented copy number data were further processed with the
Tumor Aberrations Prediction Suite (TAPS) to obtain allele-specific copy number
profiles (89). All analyses were performed using the R statistical environment
(http://www.R-project.org). The number of copy number alterations in the A3B-
eGFP pulsed clones were determined based on the difference between the
segment copy number counts of the A3B-eGFP pulsed clones and the eGFP
pulsed clones. Segments which the eGFP pulsed granddaughter clones were not
identical or had CN of 0 were excluded. These were subsequently binned by
copy number loss or gain. All SNP data sets have been deposited in the NCBI
GEO database under accession code GSE78710.
Whole Genome Sequencing (WGS)
Library preparation and sequencing was performed by the Beijing Genome
Institute (BGI) on the Illumina X Ten platform to an average of 34.5 ± 2.8 fold
coverage using purified DNA from Pulse 10 subclone extractions described in the
SNP methods. Sequences were aligned to the hg19 reference genome using
BWA. PCR duplicates were marked and removed with Picard-tools (Broad).
Somatic mutation calling was conducted using mpileup (SamTools), VarScan2
(WashU), and MuTect (Broad). Mutations detected by both VarScan2 and
MuTect were kept as true somatic mutations. VarScan2 was run using
37
procedures describe by de Bruin and coworkers(78). MuTect was run using
default parameters. Alignments from CG1 and CG2 were used as “normal”
controls for CA1 and CA3. Alignment from AG3 was used as the as “normal”
control for AA3. CG1 and CG2 were used as normals for each other in order to
determine their somatic mutations. Somatic mutations that were called against
multiple “normal” genomes were merged to increase detection rates by
overcoming regions of poor sequence coverage unique to either “normal”
genome. Variants occurring at an allele frequency greater than 0.5 or falling into
repetitive regions or those with consistent mapping errors were removed as
described (78). Somatic indels were called by VarScan2 and filtered using the
same methods described above. Separation of mutation signatures present in
our WGS data was performed by the Somatic Signatures R package using
nsNMF decomposition instead of Brunet NMF decomposition as described by
Covington and colleagues (90). Mutation strand asymmetries were analyzed
using somatic mutations from all samples and the AsymTools MatLab software
(90). All raw sequences are available from NCBI SRA under project number,
PRJNA312357.
Results
System for conditional A3B expression
Previous studies have demonstrated that A3B over-expression induces a strong
DNA damage response resulting in cell cycle aberrations and eventual cell death
(14,22,28,76,86). To be able to control the degree of A3B-induced genotoxicity,
38
we built upon our prior studies (22) by establishing a single cell-derived isogenic
system for conditional and titratable expression of this enzyme. T-REx-293 cells
were subcloned to establish an isogenic “mother” line, which was then
transfected stably with a doxycycline (Dox) inducible A3B-eGFP construct or with
an eGFP vector as a negative control. The resulting “daughter” clones were
screened by flow cytometry to identify those that were non-fluorescent without
Dox (i.e., non-leaky) and uniformly fluorescent with Dox treatment (Fig 1A).
Daughter clones were also screened for Dox-inducible overexpression of A3B-
eGFP or eGFP by anti-GFP immunoblotting (Fig 1B). A3B-eGFP clones were
uniformly GFP-negative without Dox treatment, but eGFP only clones showed a
low level of leaky expression possibly related to greater protein stability. As
additional confirmation, the functionality of the induced A3B-eGFP protein was
tested using an in vitro ssDNA deamination assay using whole cell extracts (87).
As expected, only extracts from Dox-treated A3B-eGFP cells elicited strong
ssDNA C-to-U editing activity as evidenced by the accumulation of the
deaminated and hydrolytically cleaved reaction products (labeled P in Fig 1C;
see Methods for details). Nearly identical results were obtained with a parallel
set of independently derived daughter clones (Fig 1D-E).
Iterative rounds of A3B exposure
To establish reproducible A3B induction conditions, a series of cytotoxicity
experiments was done using a range of Dox concentrations. 10,000 T-REx-293
A3B-eGFP cells were plated in 10 cm plates in triplicate, treated with 0, 1, 4, or
39
16 ng/mL Dox, incubated 14 days to allow time for colony formation, and
quantified by crystal violet staining. As expected, higher Dox concentrations led
to greater levels of toxicity (Fig 2A, B). Interpolation from a best-fit logarithmic
curve indicated that 2 ng/mL Dox (C-series daughter clone) or 1 ng/mL Dox (A-
series daughter clone) would cause 80-90% cytotoxicity, and this concentration
was selected for subsequent experiments. Taken together with the measured
doubling times of daughter clones, each A3B-eGFP induction series was
estimated to span 7 days (represented in the workflow schematic in Fig 2C).
Each T-REx-293 A3B-eGFP daughter clone was then subjected to 10
rounds of A3B-eGFP induction and recovery (Fig 2C). Iterative exposures to
A3B-eGFP were expected to generate randomly dispersed mutations throughout
the genome. Ten rounds of A3B-eGFP induction were chosen as a sufficient
regimen for the cells to accumulate readily detectable levels of somatic mutation
as a proof-of-concept for this inducible system. Moreover, this approach left open
the option to go back and characterize an intermediate round, or pursue
additional rounds should analyses require less or more mutations, respectively.
A potential pitfall of this experimental approach is the possibility of
selecting cells that have inactivated the A3B expression construct or the capacity
for induction to avoid the cytotoxic effects of overexpressing this DNA
deaminase. Aliquots of cells from each pulse series were therefore periodically
tested by flow cytometry for A3B-eGFP inducibility, western blot for protein
expression, and ssDNA deamination assays for enzymatic activity (e.g., Fig 1).
Even after the tenth induction series, the A3B-eGFP daughter clones performed
40
similar to original daughter cultures as well as to daughter cultures that had been
grown continuously in parallel to the Dox-exposed experimental cultures and
diluted to mimic the population dynamics caused by each A3B-eGFP exposure
(e.g., Fig 1). These observations indicate that, despite negative selection
pressure imposed by A3B-eGFP mediated DNA damage, resistance or escape
mechanisms did not become overt.
Targeted DNA sequencing provides evidence for A3B mutagenesis
Next, target gene 3D-PCR and sequencing were used to determine if the cells
within each daughter culture had accumulated detectable levels of mutation after
10 rounds of A3B-eGFP exposure. 3D-PCR is a technique that enables the
preferential recovery of DNA templates with C-to-T transitions and/or C-to-A
transversions, because these mutations cause reduced hydrogen bonding
potential and yield DNA molecules that can be amplified at PCR denaturation
temperatures lower than those required to amplify the original non-mutated
sequences (11,22,91). MYC and TP53 were selected as target genes for this
analysis because prior work by our lab with transiently over-expressed A3B and
by others with related A3 family members has demonstrated that these genomic
regions are susceptible to enzyme-catalyzed deamination (19,22,88,92–97).
The 3D-PCR and DNA sequencing analyses revealed substantially more
mutations in MYC and TP53 in A3B-eGFP exposed cells in comparison to
controls (Fig 2D-G). For instance, in the C-series daughter clone 43 mutations,
mostly C-to-T transitions, were evident in MYC amplicons from A3B-eGFP
41
exposed cultures, whereas only 8 mutations were found in a similar number of
control amplicons (mutation plot on left side of Fig 2D; p=0.00036, Student's two-
tailed t-test). The mutation load per amplicon was also higher (pie graph on right
side of Fig 2D). Similar results were obtained for TP53 (Fig 2E; p=0.11,
Student's two-tailed t-test), as well as for both MYC and TP53 in a parallel set of
independently derived A-series daughter clones (Fig 2F, 2G; p<0.0001 and
p<0.0001, respectively, Student's two-tailed t-test). Taken together, these results
provided strong confirmations that 10 rounds of A3B-eGFP exposure caused
increased levels of genomic DNA mutagenesis.
Genome wide mutation analyses
The experiments described used pools of cells and, due to the largely stochastic
nature of the A3B mutational process and the duration of the pulse series, each
pool would be expected to manifest extreme genetic heterogeneity. This
complexity would constrain a standard deep sequencing approach by enabling
only the earliest arising mutations to be detected in the pool because most
subsequent mutations would persist at frequencies too low for reliable detection.
To reduce this complexity to a manageable level and be able to investigate the
mutational history of a single cell exposed to iterative rounds of either A3B-eGFP
or eGFP, we used limiting dilution to generate “granddaughter” subclones from
the tenth generation daughter pools. The strength of this strategy is that any new
base substitution in a single daughter cell, which occurred between the time the
daughter clone was originally generated until the recovery period following the
42
tenth Dox treatment, would be fixed in the granddaughter clonal population at a
predictable allele frequency depending on local chromosome ploidy (i.e., new
mutations would be expected at 50% in diploid regions, 33% in triploid regions,
25% in tetraploid regions, etc., of the 293 cell genome).
The dynastic relationship between mother, daughter, and granddaughter
clones is shown in Fig 3A. To provide initial estimates of the overall level of new
base substitution mutations, genomic DNA was extracted from each
granddaughter clone and subjected to single nucleotide polymorphism (SNP)
analysis using the Illumina OmniExpress Bead Chip. A base substitution
mutation was defined as a clear SNP difference between each daughter clone
and her respective granddaughter. These analyses revealed a wide range of
SNP alterations among granddaughter clones, ranging from a low of <500 in the
C-series eGFP expressing granddaughter subclone CG1 to a high of over 8,000
in the A-series A3B-eGFP expressing granddaughter subclone AA3 (Fig 3B).
This extensive variability was expected based on the sublethal Dox concentration
used in each exposure round, the randomness of granddaughter clone selection,
and the stochastic nature of the mutation processes. Nevertheless, A3B-eGFP
exposed granddaughter clones had an average of 3.4-fold more new cytosine
mutations than the eGFP controls (averages shown by dashed vertical lines in
Fig 3B). Sanger sequencing of cloned PCR products was used to confirm
several distinct SNP alterations and provide an orthologous validation of this
array-based approach (e.g., representative chromatograms of mutations in
granddaughter CA1 versus CG2 in Fig 3C). In addition, hundreds more genomic
43
copy number alterations were evident in A3B-eGFP exposed granddaughters in
comparison eGFP controls (Fig 3D). Interestingly, the overall number of copy
number alterations appears to correlate positively with the overall number of
cytosine mutations, suggesting that many A3B-catalyzed genomic DNA
deamination events are likely processed into DNA breaks and result in larger-
scale copy number aberrations (Fig 3E).
A3B mutational landscape by whole genome sequencing
Next, whole genome sequencing (WGS) was done to assess the mutation
landscape for 3 A3B-eGFP exposed and 3 eGFP control granddaughter clones
from two distinct biological replica experiments (granddaughters in Fig. 3A).
Samples were sequenced using the Illumina X Ten platform at the Beijing
Genome Institute. Approximately 700 million 150 bp paired-end reads were
generated for each genome, with an average read depth of 34.5 ± 2.8 (SD) per
locus. Reads were aligned against the hg19 genome with BWA and somatic
mutations were called using both VarScan2 (Washington University, MO) and
MuTect (Broad Institute, MA), with the intersection of the results these two
methods identifying unambiguous mutations for further analysis (98,99).
Using this conservative approach for mutation identification, a total of
6741, 3496, and 3530 somatic mutations occurred at cytosines in granddaughter
clones that had been subjected to 10 rounds of A3B-eGFP pulses in comparison
to only 910 and 1531 cytosine mutations in the eGFP controls, consistent with
the results of the SNP analyses described above (p=0.018, Student’s t-test; Fig
44
4A). In particular, the A3B-eGFP pulsed granddaughter clones had higher
proportions of C-to-T mutations than the eGFP controls, 59%, 52%, and 54%
versus 36% and 47%, respectively (red slices in pie graphs in Fig 4B). The A3B-
eGFP pulsed granddaughter clones also had higher proportions of mutations at
A/T base pairs suggesting that genomic uracil lesions introduced by A3B may be
processed by downstream error-prone repair processes analogous to those
involved in AID-dependent somatic hypermutation of immunoglobulin genes (6)
(Fig 4A).
However, despite finding significantly higher base substitution mutation
loads in A3B-eGFP pulsed granddaughter clones, the overall distributions of
cytosine mutations within the 16 possible trinucleotide contexts appeared visually
similar for the A3B-eGFP and eGFP controls (histograms comparing the absolute
frequencies of cytosine mutations with the 16 possible trinucleotide contexts are
shown in Fig 4C). This result was initially surprising because we had expected
obvious differences between the A3B-induced mutation spectrum and that
attributable to other mechanisms, particularly within 5’TC contexts. However, a
closer inspection of the eGFP control data sets strongly indicated that this 293-
based system is defective in post-replication mismatch repair. For instance, the
eGFP controls had large numbers base substitution mutations (predominantly C-
to-A, C-to-T, and T-to-C) as well as hallmark microsatellite instabilities consistent
with reported mutation spectra in mismatch repair defective tumors (65,100).
Moreover, each eGFP control had over 10,000 insertion/deletion mutations
ranging in size from 1 to 46 base pairs (constrained by the length of the Illumina
45
sequencing reads).
Therefore, to distinguish the A3B-eGFP mutational contribution from that
caused by mismatch repair deficiency, we used NMF decomposition via the
Somatic Signatures R package to extract mutational signatures from all
granddaughter clones (Methods). Extracted signature 1 (E1) contained minimal
C-to-A mutations and the overall proportion of C-to-T mutations was increased
greatly compared to the raw profiles observed for each sample. This contribution
of this signature to the overall mutation profile was specifically enriched in the
A3B-eGFP pulsed granddaughter clones, contributing to about 50% of all
mutations (Fig 4E). Notably, this signature shows significant enrichments for C-
to-T mutations within 5’TCG motifs, which are biochemically preferred by
recombinant A3B enzyme (5,22,23) (Fishers exact test for granddaughter clones
CA1/CA3: TCA, p=0.42/0.93; TCC, p=1.00/0.96; TCG, p < 0.0001/0.0001; and
TCT, p=1.00/0.46). Moreover, strong enrichments for C-to-G transversion
mutations were evident for cytosine mutations within TCW contexts (W=A or T) in
E1 in comparison to other trinucleotide combinations (p=0.027, Student’s t-test).
C-to-G transversions are hallmark A3B-mediated mutations as other cytosine-
biased mutational processes such as ageing (spontaneous deamination of
methyl-cytosines in 5’CG motifs) and UV-light (polymerase-mediated bypass of
cross-linked pyrimidine bases) primarily result in C-to-T transitions (26,63).
Extracted signatures 2 (E2) and 3 (E3) were characterized by larger proportions
of C-to-A mutations occurring independently of trinucleotide motif. Upon
clustering our extracted signatures with previously defined mutation signatures
46
(65), E2 and E3 appeared most similar to signature 16, which currently has no
known etiology. Unexpectedly our APOBEC-mediated signature, E1, clustered
most closely to signature 5, also of unknown etiology, rather than signatures 2 or
13, which are normally attributed to APOBEC-mediated mutagenesis. These
WGS data suggest that the “APOBEC” signature may be more complex than
anticipated from prior studies. Nevertheless, our WGS studies demonstrate
increased genome-wide mutagenesis attributable to A3B, even over additional
pre-existing mutational process in this human 293 cell-based system.
Discussion
A3B is emerging as a significant source of somatic mutation in many different
cancer types (reviewed by (27,62–64) and see Introduction for references to
primary literature). Here, we further develop a 293-based cellular system for
conditional, Dox-mediated expression of A3B. The system was validated using
flow cytometry, immunoblotting, enzyme activity assays, and, most importantly,
three complementary mutation detection methods (3D-PCR, SNP microarray,
and WGS). Our results demonstrate higher levels of cytosine-focused mutations
in A3B-eGFP expressing cells, in comparison to eGFP controls. In particular, C-
to-T transition mutations and C-to-G transversion mutations in A3B preferred
trinucleotide motifs predominated after the composite mutation spectra were
extracted into 3 separate signatures. These studies combine to fortify the
conclusion that A3B is a potent human genomic DNA mutagen.
An unexpected outcome of our studies was the discovery of a probable
47
defect in post-replication mismatch repair in the 293-based system. The
mismatch repair-defective phenotype is clear with hallmark microsatellite
instabilities and base substitution mutation biases. However, the molecular
nature of this defect is not obvious and may be genetic and/or epigenetic. For
instance, the WGS data show 6 exonic and over 100 intronic alterations to
mismatch repair and related genes that could induce a mismatch repair defective
phenotype. Our results are consistent with a prior WGS study that found 1000’s
of mutational differences between 6 different 293-derived cell lines, as well as
significant down-regulation of MLH1 and MLH3 in a subset of the lines (101). Our
studies are also consistent with at least two additional prior reports characterizing
the related 293T cell line as mismatch repair defective (102,103). Regardless of
the precise molecular explanation(s), given the large number of labs worldwide
that rely upon 293 or 293-derived cell lines, the results presented here are likely
to be helpful for informing future experimental designs using this system.
Despite a compelling case for A3B in cancer mutagenesis (key results
cited in the Introduction), the overall APOBEC mutation signature in cancer
cannot be explained by A3B alone, because it is still evident in breast cancers
lacking the entirety of the A3B gene due to a common deletion polymorphism
(104). One or more of the other APOBEC family members with an intrinsic
preference for 5’TC dinucleotide substrates may be responsible. For instance, a
leading candidate is A3A due to high catalytic activity in biochemical assays,
nuclear/cell-wide localization in some cell types, propensity to induce a DNA
damage response and cell death upon overexpression, and the resemblance of
48
its mutation signature in model systems to the observed APOBEC signature in
many cancers (11,12,14,22,28,87,94,105–110). A3A gene expression may also
be derepressed as a side-affect of the A3B gene deletion(111). Additional studies
will be needed to unambiguously delineate the identities of the full repertoire of
cancer-relevant APOBEC3 enzymes, quantify their relative contributions to
mutation in each cancer type, and build upon this fundamental knowledge to
improve cancer diagnostics and therapeutics.
Acknowledgements
We thank Emily Law for providing A3B-eGFP and eGFP constructs and several
Harris and Grigoriadis lab members for helpful comments.
49
Fig1.AconditionalsystemforA3Bexpression.
(A) Representative flow cytometry data for T-REx-293 A3B-eGFP and eGFP
daughter cultures 24 hrs after Dox treatment (n=3; mean +/- SD of technical
replicates).
(B) Representative anti-GFP immunoblot of T-REx-293 A3B-eGFP and eGFP
daughter cultures 24 hrs after Dox treatment.
(C) Representative DNA cytosine deaminase activity data of whole cell extracts
from T-REx-293 A3B-eGFP and eGFP daughter cultures 24 hrs after Dox
treatment.
(D, E, F) Biological replicate data using A-series daughter clones of the
experiments described in panels A, B, and C, which used C-series daughter
clones.
50
B
Cβ-actin
GFP
β-actin
GFP
0 2 10000 2 1000
A3B-eGFP eGFP
0 21000 0 2
1000
S
P
A
E
F
D
0
50
100
25
75
25
75
0 2 10000 2 1000
0 1 10000 1 1000
Dox ng/mL Dox ng/mL
Dox ng/mL0 1 1000
0 1 1000
A3B-eGFP eGFP
Dox ng/mL
Dox ng/mL
A3B-eGFP eGFP A3B-eGFP eGFP
A3B-eGFP eGFP A3B-eGFP eGFP
S
P
Dox ng/mL
70 kDa
30 kDa
42 kDa
70 kDa
30 kDa
42 kDa
43 nt
34 nt
43 nt
34 nt
50
100
0 110
00 0 110
000
C-series Daughter Clones A-series Daughter Clones
% G
FP+
% G
FP+
51
Fig2.A3Binductionoptimizationandtargetedsequencingresults.
Fig2.A3Binductionoptimizationandtargetedsequencingresults (A, B) Dose response curves indicating the relative colony forming efficiency
(viability index) of T-REx-293 A3B-eGFP daughter clones treated with the
indicated Dox concentrations (n=3; mean viability +/- SD of biological replicates).
The dotted lines show the Dox concentration required to induce 80% cell death
(2 or 1 ng/mL for C- and A-series daughter clones, respectively).
(C) A schematic of the experimental workflow depicting the viability index of a
population of cells induced to express A3B-eGFP and recover over time. Dox
treatment occurs on day 1, maximal death is observed on days 3 or 4, and each
population typically rebounds to normal viability levels by days 6 or 7.
(D-G) A summary of the base substitution mutations observed in MYC (241 bp)
and TP53 (176 bp) by 3D-PCR analysis of genomic DNA after 10 rounds of A3B-
eGFP or eGFP exposure. Red, blue, and black columns represent the absolute
numbers of C-to-T, C-to-A, and other base substitution types in sequenced 3D-
PCR products, respectively. Asterisks indicate cytosine mutations occurring in 5’-
TC dinucleotide motifs. The adjacent pie graphs summarize the base substitution
mutation load for each 3D-PCR amplicon. The number of sequences analyzed is
indicated in the center of each pie graph.
52
A3B-eGFP Exposed C-series Daughter
C-to-T
C-to-A
Other
TC context
C-to-T
C-to-A
Other
TC context
MYC
CB
D
A3B-eGFP Exposed C-series Daughter
eGFP Exposed C-Series Daughter
TP53
F
A3B-
Induced
Death
Growth
and
Recovery
A1 or 2ng/mL Dox
H
J
17
22
15
50 bp
50 bp
0 5 10 15
0
25
50
75
100
Dox (ng/mL) Dox (ng/mL)
Via
bil
ity
In
de
x
Via
bil
ity
In
de
x
0
25
50
75
100
Days
eGFP Exposed C-series Daughter
A3B-eGFP Exposed A-series Daughter
A3B-eGFP Exposed A-series Daughter
eGFP Exposed A-series Daughter
eGFP Exposed A-series Daughter
19
0
1
2
3
4
≥5
Mutations/
Sequence
0
1
2
3
4
≥5
Mutations/
Sequence
2ng/mL Dox
1ng/mL Dox
MYC
TP53
20
22
25
25
0 05 10 15
0
25
50
75
100
Via
bil
ity
In
de
x
C-series Titration A-series Titration
53
Fig3.SNPanalysestoestimatenewmutationaccumulation.
(A) A dynastic tree illustrating the relationship between mother, daughter, and
granddaughter clones used for SNP and WGS experiments. The red, dashed box
around the daughter clones denotes 10 cycles of Dox-treatment.
(B) A histogram summarizing the SNP alterations observed in granddaughter
clones by microarray hybridization. Red, blue, and black colors represent C-to-T,
C-to-A, and C-to-G mutations, respectively.
(C) Sanger sequencing chromatograms confirming representative cytosine
mutations predicted by SNP analysis. The left chromatogram shows a G-to-A
transition (C-to-T on the opposite strand) and the right chromatogram a C-to-G
transversion.
(D) A histogram plot of the total number of copy number (CN) alterations in the
indicated categories in A3B-eGFP exposed granddaughter clones in comparison
to eGFP exposed controls.
(E) A dot plot and best-fit line of data in panel B versus data in panel D.
54
55
Figure4.SummaryofsomaticmutationsdetectedbyWGS.
(A) Stacked bar graphs representing total number of C/G and T/A context
somatic mutations in each granddaughter subclone (black and white bars,
respectively). A1 and A3 are A3B-eGFP exposed subclones, and G1 and G2 are
eGFP exposed controls.
(B) Pie charts representing the proportion of each type of cytosine mutation
across the genome. Red, blue, and black wedges represent C-to-T, C-to-A, and
C-to-G mutations, respectively. A1 and A3 are A3B-eGFP exposed subclones,
and G1 and G2 are eGFP exposed controls.
(C) Stacked bar graphs representing the absolute frequency of C-context somatic
trinucleotide mutations of each subclone from the B panel.
(D) Stacked bar graphs representing the extracted mutation signatures from
WGS data.
(E) The proportion that each extracted mutation signature contributes to each
granddaughter clone.
56
A3B-eGFP eGFP
Num
ber o
f som
atic
mut
atio
ns
A
B C
E
D
ACAACCACGACTCCACCCCCGCCTGCAGCCGCGGCTTCATCCTCGTCT
0.000.020.040.060.080.10
ACAACCACGACTCCACCCCCGCCTGCAGCCGCGGCTTCATCCTCGTCT
0.000.020.040.060.080.10
CG1
CG2
AA3
36%
22%
42%
47%
20%
33%
ACAACCACGACTCCACCCCCGCCTGCAGCCGCGGCTTCATCCTCGTCT
0.000.020.040.060.080.10CA3
52%
19%
29%
3530
910
1531
ACAACCACGACTCCACCCCCGCCTGCAGCCGCGGCTTCATCCTCGTCT
0.000.020.040.060.080.10
CA1
54%
19%
27%
3496
Prop
ortio
n of
tota
l mut
atio
ns
CG1 - eGFP Exposed
CG2 - eGFP Exposed
CA3 - A3B-eGFP Exposed
AA3 - A3B-eGFP Exposed
CA1 - A3B-eGFP Exposed
ACAACCACGACTCCACCCCCGCCTGCAGCCGCGGCTTCATCCTCGTCT
0.000.020.040.060.080.10
AA3
CA1
CA3
CG1
CG2
0
5000
10000
15000T/A
C/G
6741
ACAACCACGACTCCACCCCCGCCTGCAGCCGCGGCTTCATCCTCGTCT
0.00.10.20.30.40.5
0.00
0.25
0.50
0.75
1.00
E1
E2
E3
AA3
CA1
CA3
CG1
CG2
ACAACCACGACTCCACCCCCGCCTGCAGCCGCGGCTTCATCCTCGTCT
0.00.10.20.30.40.5
Extracted Signature 3 (E3)
Extracted Signature 2 (E2)
ACAACCACGACTCCACCCCCGCCTGCAGCCGCGGCTTCATCCTCGTCT
0.00.10.20.30.40.5
Extracted Signature 1 (E1)
Con
trib
utio
n to
sig
natu
re
Sign
atur
e pr
opor
tion
59%
17%
24%
57
Chapter 3 Development of Expression-based Biomarkers of
Dasatinib Response in Hematologic Malignancies
Running Title: Development of Dasatinib-Response Biomarkers in B-cell
Malignancies
Monica K. Akre1, Mitra, Amit1, Wen Wang2, Chad L. Myers2, Brian Van Ness1*
1 Department of Molecular, Cellular, Developmental Biology, and Genetics,
Masonic Cancer Center, University of Minnesota, Minneapolis, MN, USA, 55455
2 Department of Computer Science and Engineering, University of Minnesota,
Minneapolis, MN, USA, 55455
Keywords: B-cell Receptor (BCR) Pathway, Genomics of Drug Sensitivity in
Cancer (GDSC), B-cell malignancies, Dasatinib
Overview
Personalized medicine includes the identification of the biomarkers and/or
expression profiles important in the treatment of cancer. Cancer cell lines offer a
representative diversity of molecular processes involved in both malignant
58
phenotypes as well as therapeutic responses. Coupling cancer line expression
with high-throughput drug screens allows for examination of perturbed pathways
that influence drug sensitivities. Here we use The Sanger Institute’s Genomics of
Drug Sensitivity in Cancer to develop a bioinformatic and pathway analysis
approach that identifies the BCR pathway as an important biomarker in response
to the tyrosine kinase inhibitor, dasatinib, used in hematologic cancer therapies.
This approach establishes a process by which data from cell line repositories can
be used to identify biomarkers associated with drug response in the treatment of
cancers.
Introduction
Despite improvements in cancer therapies, wide variation in tumor
response to treatment is a major limitation in achieving consistent therapeutic
effect(1,2). The variation is likely represented by tumor diversity. Previously, we
demonstrated computational approaches to define response and resistance
based on gene expression profiles in myeloma, a plasma cell malignancy (3).
This approach took advantage of a large panel of myeloma cell lines
representing a wide diversity of response to therapeutic drugs used in the
clinic(4).
Numerous collections of cancer cell lines provide additional opportunities
to characterize the genetic signatures that may distinguish response and
resistance to a variety of drugs. The NCI-60, Cancer Cell Line Encylopedia
(CCLE), and the Genomics of Drug Sensitivity in Cancer (GDSC) provide a
wealth of information that includes genomic sequences, mutational status, gene
59
expression, as well as response to panels of drugs used in cancer therapies (5–
7). This offers a unique opportunity to apply approaches that may provide genetic
profiles that define response and resistance, with the potential to apply these as
predictors in clinical decisions.
The GDSC is a collection of 1,047 cell lines from diverse tumor types that
have been tested with 265 drugs (5). The data collection includes DNA
sequence, mutation status, and gene expression data that we have used to
develop a pipeline of computational approaches that predict response and
resistance. Drug response is determined by a 9 step 2-fold serial dilution of drug
concentration and measuring cell viability. From these, two quantitative values
are provided: the drug concentration required to reduce viability by 50% (IC50)
and the area under the dose-response survival curve (AUSC). Gene expression
data is available from the Affymetrix U219 gene array platform.
Because expression patterns may vary widely simply based on tissue
specificity, we chose to limit our initial analysis to B cell malignancies,
represented by 71 cell lines derived from leukemias, lymphomas, and myelomas.
Our approach comprises a series of steps in which we classify response and
resistance, develop a differential classification profile of gene expression
patterns, identify features by pathway analysis, and validate on cell lines and
reported clinical outcomes (Fig 1). We demonstrate this approach to stratify and
predict response to the protein tyrosine kinase inhibitor, dasatinib.
Dasatinib is a multi-target kinase inhibitor that has affinity for about 50
kinases and is most widely used to manage chronic myelogenous leukemia (8–
60
12). Dasatinib’s ability to reversibly and competitively inhibit the ATP binding site
of kinases make its potential applications wide in scope. However, therapeutic
applications have been reported in multiple cancers with significant variation in
response(13–16). For this report, we develop an approach that identifies a 5
gene signature distinguishing dasatinib response and resistance.
Methods
Expression Analysis
Robust-Multi-Array Average (RMA) normalized expression data and drug
response data was downloaded directly from the GDSC website
(www.cancerrx.org).
CCLE Robust-Multi-Array Average (RMA) normalized expression data
was downloaded from Genomicscape.com. Multiple ENSEMBL IDs mapping to
the same gene region were averaged.
Normalization of CCLE values GDSC
There are 48 cell lines of B-cell origin that are in both the GDSC and the
CCLE. These were used to determine the best method of normalization. Lowess
plot comparison between these 48 lines showed the best concordance between
the two sets. First, the natural log of each CCLE intensitiy value was subtracted
from that value, then the set was quantile normalized to the GDSC. The
Bioconductor limma package normalizeQuantile function was modified to adjust
the CCLE values to their GDSC “twin” lines(17).
R scripts available by request.
Statistical Analysis
61
The Significant Analysis of Microarray (sam) function of the siggenes R
package was used in differential expression analysis (18,19). The thresholds
were varied until the number of genes exceeded 200, yet still within an FDR of
0.10.
Responders and Non-Responders expression values for each gene in the
signature were compared using the unpaired, nonparametric, Mann-Whitney test
in GraphPad (Prism).
Ordinary one-way ANOVA was performed in GraphPad (Prism) on all
genes of signature using 4 groupings based on response values: Responders
(AUSC<0.75), Partial Responders (AUSC 075-0.85) and Limited Responders
(AUSC 0.85-0.98), and Non-Responders (AUSC>0.98).
The cor() function of the R stats package was used to find Pearson
correlation coefficients using all expression values between cell lines (20).
All heatmaps, including the Pearson correlation coefficients were rendered
using the pheatmap package (21).
Tissue Culture
All cell lines were maintained in RPMI-1640 (Lonza) supplemented with
10% FBS (Invitrogen Life Technology), 1X Antibiotic/Antimycotic (Gibco), 1X L-
Glutamine (Gibco), and 1ng/mL of human IL-6. Incubators were humidified and
maintained at 37° Centigrade with a 5% CO2 content.
Cells were plated at a concentration of 5x105/mL on day0. On day 1, a
dilution series of concentrations (2-fold dilutions from the max dose of 5.12µM)
as well as a DMSO vehicle control were administered in triplicate. On day 4 (72
62
hours after treatment) cell viability was measured by Cell Titer-Glo Luminescent
cell viability assay according to manufacturer's instructions (Promega) and
luminescence was read and recorded using Synergy 2 Microplate Reader
(Biotek).
Maximum viability assigned, as 100% for IC50 or 1 for AUSC calculations,
was normalized to untreated controls. IC50values were estimated by calculating
the nonlinear regression using the inhibitor-normalized response equation
(variable slope) in GraphPad (Prism).
As done in the calculation of AUSC in the GDSC, wells containing media,
drug, but no cells were used to calculate the value for normalizing maximum
response, which corresponds to a 0 value. AUSC used the concentrations that
overlapped with GDSC doses and substituted the first lowest concentration within
the GDSC dilution series for the lower 2 doses tested in lab. AUSC were
calculated using GraphPad (Prism).
Results
Classification of Response and Resistance.
The first step in the process is shown in Figure 2, in which we arranged
the 71 B cell lines by response, using both Area Under the Survival Curve
(AUSC) and IC50. The distribution of response favored non-response, with only
14 lines showing a strong response to low doses, and 11 lines showing
essentially no response. Specifically, we classified Responders as lines that
show an AUSC <0.75, and an IC50 value < maximum drug concentration divided
by 4 and Non-Responders were defined as having an AUSC>0.98 and an IC50 >
63
maximum dose tested. This resulted in the 14 strong Responder lines versus 11
highly resistant (Non-Responder) lines, representing the extreme ends of
response and resistance. Our rationale was that underlying this wide separation
of response may be a common expression signature or pathway(s) that can
serve as a predictive biomarker.
We applied Pearson correlation coefficient analysis on all 71 cell lines
(see methods). Notably, Responders shared more similarity than Non-
Responders (rank-sum p = 4.97x10-10) (Fig 2B,C). What is apparent is the Non-
Responders, as a group, are far more diverse in gene expression than those
cells that are within the Responder group. Thus, we directed our attention to the
features that define the Responders that are distinct from the gene expression of
the diverse Non-Responders.
Differential Gene Expression and Feature Selection
Differential gene expression between the Responder and Non-Responder
lines was performed using Significance Analysis of Microarray (see methods),
with a False Discovery Rate limited to 10%. This resulted in 228 genes to further
analyze for their relevance to the dasatinib response.
Ingenuity Pathway Analysis (IPA) (Qiagen) uses an extensive, curated
literature bank to identify molecule interaction and pathway regulation (23). Data
is uploaded and populates the knowledge base interactions. Significant p-values
are generated when genes fall within a pathway in a non-random manner.
Further, pathway activation scoring is achieved by similarly populating relevant
pathways and the directionality of the gene expression. Analysis of major
64
canonical pathways provides both the p-value (likelihood of true positive) and z-
score (directional impact of activation/de-activation).
Pathway analysis was conducted using IPA on the 228 differentially
expressed genes and their log-fold ratio data of Responders relative to Non-
Responders. Notably, the top canonical pathway with a significant z-score (z-
score=2, which is 2 standard deviations from the mean) was the B-cell-receptor
(BCR) pathway (p-value=0.013). Using the log-fold ratio scores of Responders
relative to Non-Responders, molecules of the BCR pathway reflected a uniquely
activated pattern in the Responders(Fig 3).
BCR and B-cell Development
The BCR pathway has been demonstrated to be active at different stages
of B cell development (24–26). Activation of various oncogenes and tumor
suppressor genes gives rise to the malignancies at different stages of B-cell
development (Fig. 4). The distribution of response to dasatinib along the B-cell
differentiation path indicates cancers arising from a pre- or early- B-cell may be
more likely to respond than B cells or plasma cells later in development that
notably have decrease expression of BCR pathway genes.
Evaluation of the Genes in the BCR Pathway
We further characterized the differentially expressed genes of the BCR
pathway between the Responders (high expression) and Non-Responders (low
expression). The genes from the BCR pathway were analyzed using t-test
between Responders and Non-Responders. Genes that were significant (p-
value <0.05) were further analyzed.
65
Mann-Whitney tests of individual genes of the BCR pathway were
performed. We reasoned that effective drug-response predictors would also
show a linear trend between Responders, intermediate groups, and Non-
Responders.
Intermediate response groups were included to identify an expression
trend across the full range of responses. These groupings included cell lines with
a partial response (AUSC between 0.75-0.85) and limited response (AUSC
between 0.85-0.98). The 5 genes displaying the best separations were chosen
for use in a predictive scoring system described below. Significant p-values
between adjacent groupings were not observed. However, ANOVAs were
performed using all 4 groupings and indicated 4 of the 5 genes showed highly
significant differences between the Responders and Non-Responders (ANOVA
p-values listed underneath Mann-Whitney test p-values in Fig 5), and trends of
decreasing expression across the increasing resistance groupings. The
expression of the 5 genes is shown as a heatmap comparing the strong
Responders and the highly resistant Non-Responders (Fig 6). Notably, CD19
alone showed a significant association with response (p<0.0001).
Validation
Independent Cell Lines
Eleven cell lines not included in the differential expression analysis had
available gene expression data (8 from CCLE, 3 from GDSC). These were tested
in-house for dasatinib response. The CCLE/GDSC expression platforms were
then normalized to one another to obtain comparable values (not shown). During
66
the validation phase, we used the AUSC as the sole metric indicating response.
The binary classification of these lines based on the 5 gene signature into
Responder (AUSC<0.8) and Non-Responder (AUSC>0.8) demonstrated the
previously modeled BCR expression associated with response in 9 of the 11
lines (Fig 7).
An algorithm was developed from the averaging the expression values of
the Responders and Non-Responders, we refer to these as Response Averages
(RA). Lines that had expression lower than the RA of CD19 were immediately
binned as a Non-Responder. Lines that had expression higher than that of the
CD19 RA were given a score of “1” for each gene the expression exceeded the
RA of EBF1, PAX5, BTK, and BLNK. If a cell line’s score was less than 3, they
were binned as Non-Responders.
The discriminating genes were determined using the extreme Responders
versus the extreme Non-Responders. When applying the scoring system to the
intermediate responding lines (that were not included in the gene discrimination
modeling) 30 lines were correctly predicted for an accuracy rate of 67%. It should
be noted low CD19 does place 23 out of 23 of these cell lines in the Non-
Responder category. Thus, low CD19 is an effective discriminator of non-
response. Nevertheless, the 5 gene discrimination was very accurate in
distinguishing the highly sensitive from the highly resistant lines. Despite the
difficulty of determining the response of the equivocal intermediate group, the
sensitivity of the test set, training set, and intermediate group is 78.9% and
specificity is 74.6%.
67
Clinical Associations
MM lines (plasma cells) rarely express CD19, and show a low activation of
the BCR pathway. Thus, our BCR gene discrimination model would indicate
poor response. Indeed, a recent clinical study (NCT00429949) of relapsed,
refractory, or plateau phase MM patients was discontinued after using dasatinib
as a single agent in which a partial response occurred in only 1 of 21 enrolled in
the study.
Waldenström’s macroglobulinemia (WM) is also a plasma cell
malignancy, but in contrast to multiple myeloma, is CD19+, and has recently
been described to express an activated BCR pathway (27). Based on our 5 gene
signature we predicted these malignancies would respond to dasatinib; thus,
representing an independent validation set of Responders. Indeed, WM primary
patient lines exhibited good response to dasatinib in primary patient samples
(n=32)(27) supporting our findings that the expression of CD19 and 4 other
molecules of the BCR pathway are associated with dasatinib response. A
comparison of the 5 gene expression of the WM patients and MM cell lines
shows the stark difference in expression between these two plasma cell
malignancies (Sup Fig 1).
Bortezomib Resistance, CD19, and Collateral Dasatinib Sensitivity
Further supporting the findings here is work done in mantle cell lymphoma
(MCL). MCL is mid-stage B-cell malignancy and may, or may not, express CD19
(28,29). Two MCL lines were treated with low doses of Bortezomib over time to
better understand mechanisms of resistance in MCL (30). This acquired
68
bortezomib resistance (BTZ-R) was accompanied by a re-activation of the BCR
pathway, ie. these cells had increased expression and phosphorylation of BCR
components above parental lines. Notably, along with this re-activation, a
collateral increase in sensitivity to dasatinib was observed. The BTZ-R lines
responded to doses of dasatinib 10-fold less concentrated than parental lines.
These observations held true in mouse xenografts of the MCL line pairs treated
with dasatinib as well (30). These data further support that CD19 and the BCR
pathway are consistent biomarkers associated with B cell lineage cells response
to dasatinib.
To further explore the possibility of acquired bortezomib resistance
inducing a collateral sensitivity to dasatinib, we tested the sensitivity of a
myeloma cell line pair. U266-P and U266-VR (Velcade resistant) was developed
in our laboratory and had RNAseq data available (130). The U266-VR line had a
2.8-fold increase in FPKM reads for CD19. In addition the U266-P line had a
dasatinib AUSC value of 0.91, whereas U266-VR had an AUSC of 0.73. This
supports the observations of Kim, et al. and also lays the groundwork for
exploring the bortezomib-dasatinib relationship further in multiple myeloma.
Discussion
We describe the dasatinib response of B-cell malignancies as two
extremes: Responders and Non-Responders. From this binary categorization,
comparisons were made between the groups. Differential expression revealed a
set of genes, that, when uploaded to IPA, was most significant for the BCR
pathway. The Responders had an activated pathway and conversely, the Non-
69
Responders had a de-activated BCR pathway. Five genes: CD19, EBF1, BTK,
BLNK, and PAX5, of this pathway were used to sort the 25 original cell lines into
their Responder/Non-Responder groupings.
Eleven independent cell lines were likewise binned according to their
expression values for each of the five genes. Ten of the 11 were appropriately
categorized. WM patient primary samples have expression patterns indicative of
response. As a malignancy, WM primary cell lines are responsive to dasatinib
(27). The expression of the 5 BCR genes, and particularly CD19, showed
effective discrimination of the strong responders and highly resistant cell lines.
We show here that cell line expression and drug response can be
interrogated through differential expression and pathway analysis to find
meaningful relationships and identify biomarkers of drug response. However,
there are potential limitations. First, the modeling was done with a very limited
number of cells, at the extremes of response. Despite this, the associations
identified in the limited modeling set provided a robust association with actual
clinical outcomes. The predictive power in a clinical setting may not be accurate
in identifying partial responses, nor the increased efficacies that often
accompany combination therapies. However, we note myelomas with emerging
proteasome inhibitor resistant populations that may contribute to relapses, may
respond to dasatinib, and possibly benefit from its use in combination therapies.
The modeling also takes into account only the diversity of tumor cells, without
consideration of variations in microenvironmental influences or patient variations
70
in drug metabolism or distribution. Yet, the patient correlations remained
significant to predict response across the B cell developmental pathway.
This is just one example of the use of available data bases to develop
response signatures. Similar approaches may provide gene signatures across
many other drugs within the GDSC, CCLE, or similar large cell line data bases.
Ultimately, expression signatures may add important biomarkers that better direct
therapeutic approaches,
Acknowledments:
We thank Zachary Hunter and Steven Treon ,M.D.,Ph.D. for supportive guidance
with RNAseq data of the Waldenström’s data set. We thank the Department of
Genetics, Cell Biology & Development for support for MA. AM was supported by
Minnesota State Partnership Grant.
.
71
Fig 1. Workflow
A) Cell lines representing B cell malignancies and showing strong response
and strong resistance to disatinib were chosen.
B) Differential expression using sig.genes with an FDR of <0.10 was used to
describe potential genetic biomarkers of response.
C) Gene lists and fold change expression were analyzed using Ingenuity
Pathway Analysis.
D) Mann-Whitney tests were performed on extreme Responders/Non-
Responders followed by ANOVA analysis to determine expression trends
across response.
E) Statistically significant genes defined a gene signature and were used to
separate cell lines by response.
F) Untested lines and clinical samples were used to validate the gene
signature.
72
Feat
ure
Sel
ectio
nC
lass
ifica
tion
Gen
eE
valu
atio
nD
iffer
entia
tion
Valid
atio
n
Line
1Li
ne 2
Fig
1. W
orkf
low
**
**
Gen
e S
epar
atio
n
A)
B)
C)
D)
E)
F)
73
Fig. 2 Diversity of Dasatinib Response and Expression
Correlation
A) The 71 B-cell malignancies found in the GDSC with dasatinib data are plotted
along the x-axis in increasing order of AUSC value. Both log2(IC50) values
(black boxes, scaled on the primary axis) and AUSC scores (gray circles, scaled
on secondary axis) are represented.
B-C) Pearson correlation coefficients calculated using all available 17,419 gene
expression observations within the GDSC. Perfect correlation coefficients of 1
can be seen along the diagonal as each cell line is compared to itself. The lowest
correlation coefficients are found between Non-Responders indicative of their
expression diversity.
74
75
Fig 3. BCR Pathway Activated in Extreme Responders
Genes of the BCR pathway represented as expression ratios of highly sensitive
Responders relative to Non-Responders. The darker the red-colored molecule
represents a greater fold-change between groups.
76
PlasmaMembrane
NuclearEnvelope
Fig 3. BCR Pathway Activated in Extreme Responders
77
Fig 4. B-cell Differentiation Stages with Malignancy and CD19
Expression of Cell Lines
Maturation of B-cells is represented from left to right. Malignancies arising from
corresponding stages of B-cell development are depicted along with cell lines
representing those malignancies in the same vertical axis. Next to cell line names
are either a red dot (Responders) or blue dot (Non-Responders) along with their
corresponding log2 fluorescence intensity of CD19, the marker found most
associated with dasatinib response.
78
Man
tle C
ell L
ymph
oma
Burk
itt Ly
mph
oma
Diffu
se L
arge
B-c
ell L
ymph
oma
Follic
ular L
ymph
oma
Mult
iple
Mye
loma
Chro
nic L
ymph
ocyti
c Leu
kem
ia
GA
-10
SU
-DH
L-6
SU
-DH
L-16
Fara
ge
SU
P-B
15
KA
RPA
S-2
31
NU
-DU
L--1
DO
HH
-2
SU
-DH
L-4
KA
RPA
S-4
22
MLM
A
NA
LM-6
697
MH
H-P
RE
B-1
SK-M
M-2
BC
-1
EB
2TU
R
RP
MI-8
226
WS
U-N
HL
RL
WS
U-D
LCL2
SC
C-3
L-36
3IM
-9
Hairy
Cell
Leu
kem
ia
GCB-
like
Lym
phom
a
Acut
e Ly
mph
ocyti
c Leu
kem
ia2345678
Fig
4. B
-cel
l Diff
eren
tiatio
n St
ages
with
Mal
igna
ncy
and
CD
19 E
xpre
ssio
n of
Cel
l Lin
es
79
Fig 5. Statistically Significant Signature Genes
Each Gene in the 5 gene signature is represented with cell lines binned
according to AUSC and their log2 expression values. The discrimination of
categorical response is depicted in the graph below as the following 4 categories
with AUSC values following parenthetically: Responder (0-0.75), partial
Responder (0.75-0.85), limited Responder (0.85-0.98), and Non-Responder
(>0.98). Mann-Whitney tests p values assessing the Responders versus Non-
Responders are listed above the p values for ANOVA across all categories.
80
Fig 5. Statistically Significant Signature Genes
0
2
4
6
8
10lo
g2(F
I)**** p<0.0001**** p<0.0001
CD19
Responder
Partial
Res
ponder
Limite
d Res
ponder
Non-Res
ponder
AU
SC
0
0.5
1
2.5
3.0
3.5
4.0
4.5
5.0
log2
(FI)
PAX5**** p<0.0001**** p<0.0001
0
5
10
15
log2
(FI)
EBF1**p=0.0013**p=0.0042
Responder
Partial
Res
ponder
Limite
d Res
ponder
Non-Res
ponder
AU
SC
0
0.5
1
Responder
Partial
Res
ponder
Limite
d Res
ponder
Non-Res
ponder
AU
SC
0
0.5
1
0
5
10
15
log2
(FI)
BTK**p=0.0077ns p=0.0522
Responder
Partial
Res
ponder
Limite
d Res
ponder
Non-Res
ponder
AU
SC
0
0.5
10
5
10
15
log2
(FI)
BLNK**p=0.0013**p=0.0085
Responder
Partial
Res
ponder
Limite
d Res
ponder
Non-Res
ponder
AU
SC
0
0.5
1
81
Fig 6. Gene expression Signature of Extreme Response Unsupervised clustering of the gene expression signature that discriminates
Responders from Non-Responders Cell lines are listed along the x-axis while the
5 genes most associated with dasatinib response are on the y-axis. Expression
values are represented as scaled as z-scores of the log2 transformed
fluorescence intensities. The 14 extreme Responders are boxed in red on the left
of the heatmap, the 11 extreme Non-Responders are boxed in blue on the right.
The dynamic ranges of each gene in the signature is not always reflective of its
contribution to identifying response as can be seen in the case of PAX5. This
gene is highly significant in differentiating response, but its absolute values vary
subtly, but significantly (see Figure 5, p<.0001).
82
FarageNU−D
UL−1
MLM
ADOHH−2
NALM
−6KARPA
S−422
SUP−B
15697SU−D
HL−6
GA−10
SU−D
HL−16
KARPA
S−231
MHH−P
REB−1
SU−D
HL−4
WSU−N
HL
IM−9
RL
EB2
WSU−D
LCL2
TUR
RPMI−8226
SK−M
M−2
L−363BC−1
SCC−3
PAX5
CD19
EBF1
BTK
BLNK
−3
−2
−1
0
1
2
Fig 6. 5 Gene Expression Signature of Extreme Response
Extreme Responders Extreme Non-Responders
83
Fig 7. Validation of Lines With 5 Gene Signature Training and Test Lines are evaluated based on their expression values relative
to the extreme responding lines. Lines with expression values less that the
average of the Responders and Non-Responders of the training set are binned
into Responder and Non-Responder. Test set of 11 cell lines had 1 true positive,
2 false positives, 8 true negatives, and no false negatives.
84
CD19LOW
HIGH
SU-DHL-4
WSU-NHL
BC-1
EB2TUR
RL
WSU-DLCL2
SCC-3
SK-MM-2RPMI-8226L-363IM-9
SUP-B15
NALM-6697
GA-10
SU-DHL-6
SU-DHL-16
Farage
KARPAS-231
NU-DUL--1DOHH-2
KARPAS-422
MHH-PREB-1MLMA
KHM-1B
KMS-34KMM-1KMS-18
BL-41
U-266KMS-28 BMNCI-H929KMS-26KMS-11
Mino
BLNKBTKEBF1PAX5
Fig 7. Separation of Cell Lines with 5 Gene Signature
Training Set
Training Set
Test Set
Test Set
85
Supplementary Fig 1. Expression Comparison between multiple
myeloma and Waldenström’s macroglobulinemia
Sixteen MM cell line and 57 WM primary patient samples had RNAseq available
for comparison of the 5 genes associated with dasatinib response. Each
sample’s gene expression value is represented as a fraction of the total number
of reads for that sample
86
MM WM-0.0001
0.0000
0.0001
0.0002
0.0003
0.0004
0.0005
CD19G
ene
Rea
ds/T
otal
Rea
ds
MM WM-0.0001
0.0000
0.0001
0.0002
0.0003
BLNK
Gen
e R
eads
/Tot
al R
eads
MM WM-0.0001
0.0000
0.0001
0.0002
0.0003
0.0004
BTK
Gen
e R
eads
/Tot
al R
eads
MM WM-0.0001
0.0000
0.0001
0.0002
0.0003
EBF1
Gen
e R
eads
/Tot
al R
eads
MM WM-0.0005
0.0000
0.0005
0.0010
0.0015
0.0020
PAX5
Gen
e R
eads
/Tot
al R
eads
A
C
D
EB
Supplementary Figure 1. Expression Comparison between multiple myeloma and Waldenström’s macroglobulinemia
87
Chapter4
Discussion and Future Aims
My thesis work can be described by two words: differences and
similarities. In Chapter 2, I described an endogenous mutagenic process that
contributes heterogeneity--differences. In Chapter 3, I describe a 5 gene
expression signature predictive of dasatinib sensitivity. Non-Responders have
low expression of these genes, Responders have higher expression—similarities.
Mutational Processes
In Chapter 2, I introduced a mechanism of endogenous mutagenesis that
contributes to cancer heterogeneity. Our work proved that A3B overexpression
mutates genomic DNA in human cell lines. The extent of the phenotypic changes
that result from this process is an area of active research. This work showed A3B
mutagenesis in human genomic DNA occurs most often at the sequence context
previously described in in vitro work (22).
Future Directions
Comparison of the whole genome sequences of the control clones
provided evidence of a background mutagenic process in the T-REx-293 lines
used. The 293 line and its derivatives (such as the T-REx-293 used in this study)
are widely used. Researchers should be aware of this feature of the 293 line as it
may be an inappropriate choice for some studies. For example, 293 lines are not
ideal in studies that assume no microsatellite instability.
Future study to better understand the source and extent of the apparent
mismatch repair deficiency is also be important for repair research. Down-
88
regulation of the mis-match repair genes, MLH1 and MLH3 have been found in
293-derived lines previously (101). The WGS revealed mutations of mismatch
repair genes in all clones indicating the original line procured commercially also
contained the mutations. Are these mutated genes still functional? Does the
down-regulation of either or both MLH1 and MLH3 cause the phenotype?
Transfection experiments of MLH1, MH3, and the mutated versions of mismatch
repair genes in our studies could rescue the defect and answer the above
questions.
Predictive Expression Profiles
In Chapter 3, we used pathway analysis to describe the expression of 5
genes of the BCR pathway as important for response to dasatinib. Pathway
analysis identified the BCR as the top canonical dysregulated. Study of
heatmaps of the BCR components indicates that not all responders have
upregulation of all BCR components, nor to the same degree (Fig 1). But taken
together, the dysregulation was significant and consistent – providing novel
understanding of factors that influence response.
Future Directions
Dasatinib is a multi-kinase inhibitor and can interact with multiple
molecules of the BCR pathway (139). How does BCR pathway activation
increase cell sensitivity to dasatinib? Is it a causal relationship or is BCR pathway
activation a concomitant event? These questions can be addressed with
transfection of dasatinib resistant lines and knock-down experiments of dasatinib
sensitive lines.
89
Importance of Pathway Analysis
Differential expression does not always identify a dysregulated pathway.
Each gene’s expression is evaluated independently of the networks or pathways
in which it biologically connected. For example, a more restrictive FDR of 5%
found 44 genes differentially expressed between dasatinib Responders and Non-
Responders, but only 2 of the 5 genes of the predictive response signature
(PAX5 and CD19) were among those genes. After identifying the BCR pathway
activation as important in dasatinib response, all genes of the pathway were re-
examined. This critical step identified the importance of BTK, which was not in
the original 228 genes, in dasatinib response. This illustrates differential
expression alone leaves out relationships between genes that are necessary to
make clear what is biologically relevant.
Personal Future Directions
Large databases like the GDSC are resources, but underutilized. My work
creates a workflow for identification of expression profiles of response (Ch 3, Fig
1). In the future, I will preform similar analysis in the GDSC and CCLE to gene
expression profiles for more drugs. I want to expand my analysis to include all
265 drugs that show efficacy in B-cell malignancies. My goal is to create a
hematologic-oncology expression chip that will direct physicians and patients to
the most effective treatments for the patient’s specific malignancy (Fig 2).
90
Fig1.GenesoftheBCRPathway
Responsive cell lines are represented on the left-most portion of the heatmap,
boxed in red. Non-Responders are on the right, boxed in blue. The heatmap
represents log2 (fluorescence intensity) values for 21 genes of the BCR pathway.
This illustrates the patchwork nature of expression in response. This difficulty is
overcome with pathway analysis when the genes that work together in a system
are examined together as well.
91
697
DOHH−2
KARPA
S−422
MHH−P
REB−1
MLMA
NALM
−6
SUP−B
15
Farage
GA−10
KARPA
S−231
NU−D
UL−1
SU−D
HL−16
SU−D
HL−4
SU−D
HL−6
IM−9
SK−M
M−2
RPMI−8226
EB2
TUR
WSU−N
HL
RL
BC−1
SCC−3
L−363
WSU−D
LCL2
PDK1GAB2BCRPAX5ABL1GAB1PTENTCF3CSKBLNKPLCG2GRB2LYN
BTKEBF1CD22CD79ABADFOXO1CD79BCD19
4
6
8
10Fig 1. Genes of the BCR Pathway
Responders Non-Responders
92
Fig2.PotentialofExpressionProfilinginPrecisionMedicine
The outline is a model for using expression signatures of response and non-
response for each drug (left panel). Patient Samples are assayed for their tumor
profile (middle panel). Patient profiles are matched to responsive profiles of
drugs. These matches result in the patient receiving therapy most likely to be
effective.
93
94
Bibliography
1. Boveri T. Concerning the Origin of Malignant Tumours by Theodor Boveri.
Translated and annotated by Henry Harris. J Cell Sci [Internet].
2008;121(Supplement 1):1–84. Available from:
http://jcs.biologists.org/cgi/doi/10.1242/jcs.025742
2. Rous P. a Sarcoma of the Fowl Transmissible By an Agent Separable
From the Tumor Cells. J Exp Med [Internet]. 1911;13(4):397–411.
Available from: http://www.jem.org/cgi/doi/10.1084/jem.13.4.397
3. Rusch HP, Baumann CA. Tumor Production in Mice with Ultraviolet
Irradiation. Am J Cancer [Internet]. 1939 Jan 1;35(1):55 LP-62. Available
from: http://cancerres.aacrjournals.org/content/35/1/55.abstract
4. Casellas R, Nussenzweig A, Wuerffel R, Pelanda R, Reichlin A, Suh H, et
al. Ku80 is required for immunoglobulin isotype switching. EMBO J.
1998;17(8):2404–11.
5. Leonard B, Hart SN, Burns MB, Carpenter MA, Temiz NA, Rathore A, et al.
APOBEC3B upregulation and genomic mutation patterns in serous ovarian
carcinoma. Cancer Res. 2013;73(24):7222–31.
6. Di Noia JM, Neuberger MS. Molecular Mechanisms of Antibody Somatic
Hypermutation. Annu Rev Biochem [Internet]. 2007;76(1):1–22. Available
from:
http://www.annualreviews.org/doi/10.1146/annurev.biochem.76.061705.09
0740
7. Casellas R, Basu U, Yewdell WT, Chaudhuri J, Robbiani DF, Di Noia JM.
95
Mutations, kataegis and translocations in B cells: understanding AID
promiscuous activity. Nat Rev Immunol [Internet]. 2016;16(3):164–76.
Available from: http://www.nature.com/doifinder/10.1038/nri.2016.2
8. Conticello SG. The AID/APOBEC family of nucleic acid mutators. Genome
Biol [Internet]. 2008;9(6):229. Available from:
http://genomebiology.biomedcentral.com/articles/10.1186/gb-2008-9-6-229
9. Salter JD, Bennett RP, Smith HC. The APOBEC Protein Family: United by
Structure, Divergent in Function. Vol. 41, Trends in Biochemical Sciences.
2016. p. 578–94.
10. Refsland EW, Harris RS. The APOBEC3 family of retroelement restriction
factors. Vol. 371, Current topics in microbiology and immunology. 2013. p.
1–27.
11. Stenglein MD, Burns MB, Li M, Lengyel J, Harris RS. APOBEC3 proteins
mediate the clearance of foreign DNA from human cells. Nat Struct Mol
Biol [Internet]. 2010;17(2):222–9. Available from:
http://www.nature.com/doifinder/10.1038/nsmb.1744
12. Chen H, Lilley CE, Yu Q, Lee D V., Chou J, Narvaiza I, et al. APOBEC3A
is a potent inhibitor of adeno-associated virus and retrotransposons. Curr
Biol. 2006;16(5):480–5.
13. Harris RS, Liddament MT. Retroviral restriction by APOBEC proteins. Nat
Rev Immunol [Internet]. 2004;4(11):868–77. Available from:
http://www.nature.com/doifinder/10.1038/nri1489
14. Lackey L, Law EK, Brown WL, Harris RS. Subcellular localization of the
96
APOBEC3 proteins during mitosis and implications for genomic DNA
deamination. Cell Cycle. 2013;12(5):762–72.
15. Kinomoto M, Kanno T, Shimura M, Ishizaka Y, Kojima A, Kurata T, et al. All
APOBEC3 family proteins differentially inhibit LINE-1 retrotransposition.
Nucleic Acids Res. 2007;35(9):2955–64.
16. Bogerd HP, Wiegand HL, Hulme AE, Garcia-Perez JL, O’Shea KS, Moran
J V., et al. Cellular inhibitors of long interspersed element 1 and Alu
retrotransposition. Proc Natl Acad Sci [Internet]. 2006;103(23):8780–5.
Available from: http://www.pnas.org/cgi/doi/10.1073/pnas.0603313103
17. Muckenfuss H, Hamdorf M, Held U, Perkovic M, Löwer J, Cichutek K, et al.
APOBEC3 proteins inhibit human LINE-1 retrotransposition. J Biol Chem.
2006;281(31):22161–72.
18. Lackey L, Demorest ZL, Land AM, Hultquist JF, Brown WL, Harris RS.
APOBEC3B and AID have similar nuclear import mechanisms. J Mol Biol.
2012;419(5):301–14.
19. Ramiro AR, Jankovic M, Eisenreich T, Difilippantonio S, Chen-Kiang S,
Muramatsu M, et al. AID is required for c-myc/IgH chromosome
translocations in vivo. Cell. 2004;118(4):431–8.
20. Potter M, Wiener F. Plasmacytomagenesis in mice: Model of neoplastic
development dependent upon chromosomal translocations. Vol. 13,
Carcinogenesis. 1992. p. 1681–97.
21. Küppers R, Dalla-Favera R. Mechanisms of chromosomal translocations in
B cell lymphomas. Oncogene [Internet]. 2001;20(40):5580–94. Available
97
from:
http://www.nature.com/doifinder/10.1038/sj.onc.1204640%0Ahttp://www.nc
bi.nlm.nih.gov/pubmed/11607811
22. Burns MB, Lackey L, Carpenter MA, Rathore A, Land AM, Leonard B, et al.
APOBEC3B is an enzymatic source of mutation in breast cancer. Nature
[Internet]. 2013;494(7437):366–70. Available from:
http://www.nature.com/doifinder/10.1038/nature11881
23. Burns MB, Temiz NA, Harris RS. Evidence for APOBEC3B mutagenesis in
multiple human cancers. Nat Genet [Internet]. 2013;45(9):977–83.
Available from: http://www.nature.com/doifinder/10.1038/ng.2701
24. Nik-Zainal S, Alexandrov LB, Wedge DC, Van Loo P, Greenman CD,
Raine K, et al. Mutational processes molding the genomes of 21 breast
cancers. Cell. 2012;149(5):979–93.
25. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SAJR, Behjati S,
Biankin A V., et al. Signatures of mutational processes in human cancer.
Nature [Internet]. 2013;500(7463):415–21. Available from:
http://www.nature.com/doifinder/10.1038/nature12477
26. Harris RS, Stephens P, Tarpey P, Davies H, Loo P, Greenman C, et al.
Molecular mechanism and clinical impact of APOBEC3Bcatalyzed
mutagenesis in breast cancer. Breast Cancer Res 2015 171 [Internet].
2015;17(1):400404. Available from: http://breast-cancer-
research.com/content/17/1/8%5Cnhttps://breast-cancer-
research.biomedcentral.com/articles/10.1186/s13058-014-0498-3
98
27. Helleday T, Eshtad S, Nik-Zainal S. Mechanisms underlying mutational
signatures in human cancers. Nat Rev Genet [Internet]. 2014;15(9):585–
98. Available from: http://www.nature.com/doifinder/10.1038/nrg3729
28. Taylor BJM, Nik-Zainal S, Wu YL, Stebbings LA, Raine K, Campbell PJ, et
al. DNA deaminases induce break-associated mutation showers with
implication of APOBEC3B and 3A in breast cancer kataegis. Elife.
2013;2013(2).
29. Kidd JM, Newman TL, Tuzun E, Kaul R, Eichler EE. Population
stratification of a common APOBEC gene deletion polymorphism. PLoS
Genet. 2007;3(4):0584–92.
30. Long J, Delahanty RJ, Li G, Gao YT, Lu W, Cai Q, et al. A common
deletion in the APOBEC3 genes and breast cancer risk. J Natl Cancer Inst.
2013;105(8):573–9.
31. David A Hungerford PCN. A minute chromosome in human chronic
granulocytic leukemia. Science (80- ). 1960;132:1457–501.
32. Ren R. Mechanisms of BCR–ABL in the pathogenesis of chronic
myelogenous leukaemia. Nat Rev Cancer [Internet]. 2005;5(3):172–83.
Available from: http://www.nature.com/doifinder/10.1038/nrc1567
33. Lemonick M, Park A, Cray D GC. New Hope for Cancer. Time.
2001;157(21):62.
34. Druker BJ, Guilhot F, O’Brien SG, Gathmann I, Kantarjian H, Gattermann
N, et al. Five-year follow-up of patients receiving imatinib for chronic
myeloid leukemia. N Engl J Med [Internet]. 2006;355(23):2408–17.
99
Available from: http://www.ncbi.nlm.nih.gov/pubmed/17151364
35. Döhner H, Stilgenbauer S, Benner A, Leupolt E, Kröber A, Bullinger L, et
al. Genomic aberrations and survival in chronic lymphocytic leukemia. N
Engl J Med [Internet]. 2000;343(26):1910–6. Available from:
http://www.ncbi.nlm.nih.gov/pubmed/11136261%5Cnhttp://www.nejm.org/d
oi/pdf/10.1056/NEJM200012283432602
36. Slamon DJ. Studies of the HER-2/neu Proto-oncogene in Human Breast
Cancer. Cancer Invest [Internet]. 1990;8(2):253–4. Available from:
http://www.tandfonline.com/doi/full/10.3109/07357909009017573
37. Molina MA, Codony-Servat J, Albanell J, Rojo F, Arribas J, Baselga J.
Trastuzumab (Herceptin), a humanized anti-HER2 receptor monoclonal
antibody, inhibits basal and activated HER2 ectodomain cleavage in breast
cancer cells. Cancer Res. 2001;61(12):4744–9.
38. Carter P, Presta L, Gorman CM, Ridgway JB, Henner D, Wong WL, et al.
Humanization of an anti-p185HER2 antibody for human cancer therapy.
Proc Natl Acad Sci U S A [Internet]. 1992;89(May):4285–9. Available from:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed
&dopt=Citation&list_uids=1350088%5Cnhttp://www.ncbi.nlm.nih.gov/pmc/a
rticles/PMC49066/pdf/pnas01084-0075.pdf
39. De Santes K, Slamon D, Anderson SK, Shepard M, Fendly B, Maneval D,
et al. Radiolabeled antibody targeting of the HER-2/neu oncoprotein.
Cancer Res [Internet]. 1992;52(7):1916–23. Available from:
http://www.ncbi.nlm.nih.gov/pubmed/1348016
100
40. Shepard HM, Lewis GD, Sarup JC, Fendly BM, Maneval D, Mordenti J, et
al. Monoclonal antibody therapy of human cancer: Taking the HER2
protooncogene to the clinic. J Clin Immunol. 1991;11(3):117–27.
41. Demetri GD, von Mehren M, Blanke CD, Van den Abbeele AD, Eisenberg
B, Roberts PJ, et al. Efficacy and safety of imatinib mesylate in advanced
gastrointestinal stromal tumors. N Engl J Med [Internet]. 2002;347(7):472–
80. Available from:
https://www.researchgate.net/publication/11206772_Efficacy_and_Safety_
of_Imatinib_Mesylate_in_Advanced_Gastrointestinal_Stromal_Tumors
42. Din OS, Woll PJ. Treatment of Gastrointestinal Stromal Tumor: Focus on
Imatinib Mesylate. Ther Clin Risk Manag [Internet]. 2008;4(1):149–62.
Available from:
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2503651&tool=p
mcentrez&rendertype=abstract
43. Croxtall JD, McKeage K. Trastuzumab: In HER2-positive metastatic gastric
cancer. Vol. 70, Drugs. 2010. p. 2259–67.
44. Sawaki A, Ohashi Y, Omuro Y, Satoh T, Hamamoto Y, Boku N, et al.
Efficacy of trastuzumab in Japanese patients with HER2-positive advanced
gastric or gastroesophageal junction cancer: A subgroup analysis of the
Trastuzumab for Gastric Cancer (ToGA) study. Gastric Cancer.
2012;15(3):313–22.
45. Kim SY, Kim HP, Kim YJ, Oh DY, Im SA, Lee D, et al. Trastuzumab inhibits
the growth of human gastric cancer cell lines with HER2 amplification
101
synergistically with cisplatin. Int J Oncol. 2008;32(1):89–95.
46. Hoskins JM, Carey LA, McLeod HL. CYP2D6 and tamoxifen: DNA matters
in breast cancer. Nat Rev Cancer [Internet]. 2009;9(8):576–86. Available
from: http://www.nature.com/doifinder/10.1038/nrc2683
47. Jin Y, Desta Z, Stearns V, Ward B, Ho H, Lee KH, et al. CYP2D6
genotype, antidepressant use, and tamoxifen metabolism during adjuvant
breast cancer treatment. J Natl Cancer Inst. 2005;97(1):30–9.
48. Coons AH, Creech HJ, Jones RN. Immunological Properties of an Antibody
Containing a Fluorescent Group. Exp Biol Med. 1941;47(2):200–2.
49. Baselga J, Tripathy D, Mendelsohn J, Baughman S, Benz CC, Dantis L, et
al. Phase II Study of Weekly Intravenous Recombinant Humanized Anti-
p185HER2 Monoclonal Antibody in Patients with HER2/neu-
Overexpressing Metastatic Breast Cancer. J Clin Oncol. 1996;14(3):737–
44.
50. Morris S, Kirstein M, Valentine M, Dittmer K, Shapiro D, Saltman D, et al.
Fusion of a kinase gene, ALK, to a nucleolar protein gene, NPM, in non-
Hodgkin’s lymphoma. Science (80- ) [Internet]. 1994;263(5151):1281–4.
Available from:
http://www.sciencemag.org/cgi/doi/10.1126/science.8122112
51. Soda M, Choi YL, Enomoto M, Takada S, Yamashita Y, Ishikawa S, et al.
Identification of the transforming EML4–ALK fusion gene in non-small-cell
lung cancer. Nature [Internet]. 2007;448(7153):561–6. Available from:
http://www.nature.com/doifinder/10.1038/nature05945
102
52. Sukov WR, Hodge JC, Lohse CM, Akre MK, Leibovich BC, Thompson RH,
et al. ALK alterations in adult renal cell carcinoma: frequency,
clinicopathologic features and outcome in a large series of consecutively
treated patients. Mod Pathol [Internet]. 2012;25(11):1516–25. Available
from: http://www.ncbi.nlm.nih.gov/pubmed/22743654
53. Rivlin N, Brosh R, Oren M, Rotter V. Mutations in the p53 Tumor
Suppressor Gene: Important Milestones at the Various Steps of
Tumorigenesis. Genes Cancer [Internet]. 2011;2(4):466–74. Available
from: http://gan.sagepub.com/lookup/doi/10.1177/1947601911408889
54. Posa I, Carvalho S, Tavares J, Grosso AR. A pan-cancer analysis of MYC-
PVT1 reveals CNV-unmediated deregulation and poor prognosis in renal
carcinoma. Oncotarget [Internet]. 2016;7(30). Available from:
http://www.ncbi.nlm.nih.gov/pubmed/27366943
55. Cline MS, Craft B, Swatloski T, Goldman M, Ma S, Haussler D, et al.
Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics
Browser. Sci Rep [Internet]. 2013;3(1):2652. Available from:
http://www.nature.com/articles/srep02652
56. Zack TI, Schumacher SE, Carter SL, Cherniack AD, Saksena G, Tabak B,
et al. Pan-cancer patterns of somatic copy number alteration. Nat Genet
[Internet]. 2013;45(10):1134–40. Available from:
http://www.nature.com/doifinder/10.1038/ng.2760
57. Cancer Genome Atlas Research Network, Weinstein JN, Collisson E a,
Mills GB, Shaw KRM, Ozenberger B a, et al. The Cancer Genome Atlas
103
Pan-Cancer analysis project. Nat Genet [Internet]. 2013;45(10):1113–20.
Available from:
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3919969&tool=p
mcentrez&rendertype=abstract%5Cnhttp://www.ncbi.nlm.nih.gov/pubmed/2
4071849%5Cnhttp://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=P
MC3919969
58. Yang W, Soares J, Greninger P, Edelman EJ, Lightfoot H, Forbes S, et al.
Genomics of Drug Sensitivity in Cancer (GDSC): A resource for therapeutic
biomarker discovery in cancer cells. Nucleic Acids Res. 2013;
59. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S,
et al. The Cancer Cell Line Encyclopedia enables predictive modelling of
anticancer drug sensitivity. Nature [Internet]. 2012;483(7391):603–7.
Available from: http://dx.doi.org/10.1038/nature11003
60. Shoemaker RH. The NCI60 human tumour cell line anticancer drug screen.
Nat Rev Cancer [Internet]. 2006;6(10):813–23. Available from:
http://www.nature.com/doifinder/10.1038/nrc1951
61. Pieper K, Grimbacher B, Eibel H. B-cell biology and development. Vol. 131,
Journal of Allergy and Clinical Immunology. 2013. p. 959–71.
62. Roberts SA, Gordenin DA. Hypermutation in human cancer genomes:
footprints and mechanisms. Nat Rev Cancer [Internet]. 2014;14(12):786–
800. Available from: http://www.nature.com/doifinder/10.1038/nrc3816
63. Swanton C, McGranahan N, Starrett GJ, Harris RS. APOBEC Enzymes:
Mutagenic Fuel for Cancer Evolution and Heterogeneity. Vol. 5, Cancer
104
discovery. 2015. p. 704–12.
64. Henderson S, Fenton T. APOBEC3 genes: Retroviral restriction factors to
cancer drivers. Vol. 21, Trends in Molecular Medicine. 2015. p. 274–84.
65. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio S a JR, Behjati S,
Biankin A V, et al. Signatures of mutational processes in human cancer.
Nature [Internet]. 2013;500:415–21. Available from:
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3776390&tool=p
mcentrez&rendertype=abstract
66. Cleaver JE, Crowley E. UV damage, DNA repair and skin carcinogenesis.
Front Biosci [Internet]. 2002;7:d1024-43. Available from:
http://www.ncbi.nlm.nih.gov/pubmed/11897551
67. Hecht SS. Lung carcinogenesis by tobacco smoke. Int J Cancer.
2012;131(12):2724–32.
68. Poon SL, Pang S-T, McPherson JR, Yu W, Huang KK, Guan P, et al.
Genome-wide mutational signatures of aristolochic acid and its application
as a screening tool. Sci Transl Med [Internet]. 2013;5(197):197ra101.
Available from: http://www.ncbi.nlm.nih.gov/pubmed/23926199
69. Poon S, McPherson JR, Tan P, Teh B, Rozen SG. Mutation signatures of
carcinogen exposure: genome-wide detection and new opportunities for
cancer prevention. Genome Med [Internet]. 2014;6(3):24. Available from:
http://genomemedicine.biomedcentral.com/articles/10.1186/gm541
70. Lord CJ, Tutt ANJ, Ashworth A. Synthetic Lethality and Cancer Therapy:
Lessons Learned from the Development of PARP Inhibitors. Annu Rev
105
Med [Internet]. 2015;66(1):455–70. Available from:
http://www.annualreviews.org/doi/10.1146/annurev-med-050913-022545
71. Roberts SA, Sterling J, Thompson C, Harris S, Mav D, Shah R, et al.
Clustered Mutations in Yeast and in Human Cancers Can Arise from
Damaged Long Single-Strand DNA Regions. Mol Cell. 2012;46(4):424–35.
72. Roberts SA, Lawrence MS, Klimczak LJ, Grimm SA, Fargo D, Stojanov P,
et al. An APOBEC cytidine deaminase mutagenesis pattern is widespread
in human cancers. Nat Genet [Internet]. 2013;45(9):970–6. Available from:
http://www.nature.com/doifinder/10.1038/ng.2702
73. Albin JS, Harris RS. Interactions of host APOBEC3 restriction factors with
HIV-1 in vivo: implications for therapeutics. Expert Rev Mol Med [Internet].
2010;12:e4. Available from:
http://www.journals.cambridge.org/abstract_S1462399409001343
74. Harris RS, Dudley JP. APOBECs and virus restriction. Vols. 479–480,
Virology. 2015. p. 131–45.
75. Bacolla A, Cooper DN, Vasquez KM. Mechanisms of base substitution
mutagenesis in cancer genomes. Vol. 5, Genes. 2014. p. 108–46.
76. Shinohara M, Io K, Shindo K, Matsui M, Sakamoto T, Tada K, et al.
APOBEC3B can impair genomic stability by inducing base substitutions in
genomic DNA in human cells. Sci Rep [Internet]. 2012;2. Available from:
http://www.nature.com/articles/srep00806
77. Henderson S, Chakravarthy A, Su X, Boshoff C, Fenton TR. APOBEC-
Mediated Cytosine Deamination Links PIK3CA Helical Domain Mutations
106
to Human Papillomavirus-Driven Tumor Development. Cell Rep.
2014;7(6):1833–41.
78. de Bruin EC, McGranahan N, Mitter R, Salm M, Wedge DC, Yates L, et al.
Spatial and temporal diversity in genomic instability processes defines lung
cancer evolution. Science (80- ) [Internet]. 2014;346(6206):251–6.
Available from:
http://www.sciencemag.org/cgi/doi/10.1126/science.1253462
79. Sieuwerts AM, Willis S, Burns MB, Look MP, Gelder MEM Van, Schlicker
A, et al. Elevated APOBEC3B Correlates with Poor Outcomes for
Estrogen-Receptor-Positive Breast Cancers. Horm Cancer. 2014;5(6):405–
13.
80. Cescon DW, Haibe-Kains B, Mak TW. APOBEC3B expression in breast
cancer reflects cellular proliferation, while a deletion polymorphism is
associated with immune activation. Proc Natl Acad Sci U S A [Internet].
2015;112(9):2841–6. Available from:
http://www.ncbi.nlm.nih.gov/pubmed/25730878%5Cnhttp://www.pubmedce
ntral.nih.gov/articlerender.fcgi?artid=PMC4352793
81. Leonard B, McCann JL, Starrett GJ, Kosyakovsky L, Luengas EM, Molan
AM, et al. The PKC/NF- B Signaling Pathway Induces APOBEC3B
Expression in Multiple Human Cancers. Cancer Res [Internet].
2015;75(21):4538–47. Available from:
http://cancerres.aacrjournals.org/cgi/doi/10.1158/0008-5472.CAN-15-2171-
T
107
82. Vieira VC, Leonard B, White EA, Starrett GJ, Temiz NA, Lorenz LD, et al.
Human papillomavirus E6 triggers upregulation of the antiviral and cancer
genomic DNA deaminase APOBEC3B. MBio. 2014;5(6).
83. Warren CJ, Xu T, Guo K, Griffin LM, Westrich JA, Lee D, et al. APOBEC3A
Functions as a Restriction Factor of Human Papillomavirus. J Virol
[Internet]. 2015;89(1):688–702. Available from:
http://jvi.asm.org/lookup/doi/10.1128/JVI.02383-14
84. Mori S, Takeuchi T, Ishii Y, Kukimoto I. Identification of APOBEC3B
promoter elements responsible for activation by human papillomavirus type
16 E6. Biochem Biophys Res Commun. 2015;460(3):555–60.
85. Periyasamy M, Patel H, Lai CF, Nguyen VTM, Nevedomskaya E, Harrod A,
et al. APOBEC3B-Mediated Cytidine Deamination Is Required for Estrogen
Receptor Action in Breast Cancer. Cell Rep. 2015;13(1):108–21.
86. Land AM, Wang J, Law EK, Aberle R, Kirmaier A, Krupp A, et al.
Degradation of the cancer genomic DNA deaminase APOBEC3B by SIV
Vif. Oncotarget [Internet]. 2015;6(37). Available from:
http://www.oncotarget.com/abstract/5483
87. Carpenter MA, Li M, Rathore A, Lackey L, Law EK, Land AM, et al.
Methylcytosine and normal cytosine deamination by the foreign DNA
restriction enzyme APOBEC3A. J Biol Chem. 2012;287(41):34801–8.
88. Suspene R, Aynaud M-M, Guetard D, Henry M, Eckhoff G, Marchio A, et
al. Somatic hypermutation of human mitochondrial and nuclear DNA by
APOBEC3 cytidine deaminases, a pathway for DNA catabolism. Proc Natl
108
Acad Sci [Internet]. 2011;108(12):4858–63. Available from:
http://www.pnas.org/cgi/doi/10.1073/pnas.1009687108
89. Rasmussen M, Sundstrom M, Goransson Kultima H, Botling J, Micke P,
Birgisson H, et al. Allele-specific copy number analysis of tumor samples
with aneuploidy and tumor heterogeneity. Genome Biol [Internet].
2011;12(10):R108. Available from:
http://www.ncbi.nlm.nih.gov/pubmed/22023820
90. Gehring JS, Fischer B, Lawrence M, Huber W. SomaticSignatures:
Inferring mutational signatures from single-nucleotide variants.
Bioinformatics. 2015;31(22):3673–5.
91. Suspène R, Henry M, Guillot S, Wain-Hobson S, Vartanian JP. Recovery
of APOBEC3-edited human immunodeficiency virus G→A hypermutants by
differential DNA denaturation PCR. J Gen Virol. 2005;86(1):125–9.
92. Liu M, Duke JL, Richter DJ, Vinuesa CG, Goodnow CC, Kleinstein SH, et
al. Two levels of protection for the B cell genome during somatic
hypermutation. Nature [Internet]. 2008;451(7180):841–5. Available from:
http://www.nature.com/doifinder/10.1038/nature06547
93. Nilsen H, An Q, Lindahl T. Mutation frequencies and AID activation state in
B-cell lymphomas from Ung-deficient mice. Oncogene [Internet].
2005;24(18):3063–6. Available from:
http://www.nature.com/doifinder/10.1038/sj.onc.1208480
94. Mussil B, Suspène R, Aynaud MM, Gauvrit A, Vartanian JP, Wain-Hobson
S. Human APOBEC3A Isoforms Translocate to the Nucleus and Induce
109
DNA Double Strand Breaks Leading to Cell Stress and Death. PLoS One.
2013;8(8).
95. Rucci F, Cattaneo L, Marrella V, Sacco MG, Sobacchi C, Lucchini F, et al.
Tissue-specific sensitivity to AID expression in transgenic mouse models.
Gene. 2006;377(1–2):150–8.
96. Komori J, Marusawa H, Machimoto T, Endo Y, Kinoshita K, Kou T, et al.
Activation-induced cytidine deaminase links bile duct inflammation to
human cholangiocarcinoma. Hepatology. 2008;47(3):888–96.
97. Matsumoto Y, Marusawa H, Kinoshita K, Endo Y, Kou T, Morisawa T, et al.
Helicobacter pylori infection triggers aberrant expression of activation-
induced cytidine deaminase in gastric epithelium. Nat Med [Internet].
2007;13(4):470–6. Available from:
http://www.nature.com/doifinder/10.1038/nm1566
98. Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, et al.
VarScan 2: Somatic mutation and copy number alteration discovery in
cancer by exome sequencing. Genome Res. 2012;22(3):568–76.
99. Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C,
et al. Sensitive detection of somatic point mutations in impure and
heterogeneous cancer samples. Nat Biotechnol [Internet]. 2013;31(3):213–
9. Available from: http://www.nature.com/doifinder/10.1038/nbt.2514
100. Boland CR, Goel A. Microsatellite Instability in Colorectal Cancer.
Gastroenterology. 2010;138(6).
101. Lin Y-C, Boone M, Meuris L, Lemmens I, Van Roy N, Soete A, et al.
110
Genome dynamics of the human embryonic kidney 293 lineage in
response to cell biology manipulations. Nat Commun [Internet].
2014;5:4767. Available from:
http://www.nature.com/doifinder/10.1038/ncomms5767
102. Roesner LM, Mielke C, Fähnrich S, Merkhoffer Y, Dittmar KEJ, Drexler HG,
et al. Stable expression of MutL?? in human cells reveals no specific
response to mismatched DNA, but distinct recruitment to damage sites. J
Cell Biochem. 2013;114(10):2405–14.
103. Trojan J, Zeuzem S, Randolph A, Hemmerle C, Brieger A, Raedle J, et al.
Functional analysis of hMLH1 variants and HNPCC-related mutations
using a human expression system. Gastroenterology. 2002;122(1):211–9.
104. Nik-Zainal S, Wedge DC, Alexandrov LB, Petljak M, Butler AP, Bolli N, et
al. Association of a germline copy number polymorphism of APOBEC3A
and APOBEC3B with burden of putative APOBEC-dependent mutations in
breast cancer. Nat Genet [Internet]. 2014;46(5):487–91. Available from:
http://www.nature.com/doifinder/10.1038/ng.2955
105. Thielen BK, McNevin JP, McElrath MJ, Hunt BVS, Klein KC, Lingappa JR.
Innate immune signaling induces high levels of TC-specific deaminase
activity in primary monocyte-derived cells through expression of
APOBEC3A isoforms. J Biol Chem. 2010;285(36):27753–66.
106. Hultquist JF, Lengyel JA, Refsland EW, LaRue RS, Lackey L, Brown WL,
et al. Human and Rhesus APOBEC3D, APOBEC3F, APOBEC3G, and
APOBEC3H Demonstrate a Conserved Capacity To Restrict Vif-Deficient
111
HIV-1. J Virol [Internet]. 2011;85(21):11220–34. Available from:
http://jvi.asm.org/cgi/doi/10.1128/JVI.05238-11
107. Aynaud MM, Suspène R, Vidalain PO, Mussil B, Guétard D, Tangy F, et al.
Human tribbles 3 protects nuclear DNA from cytidine deamination by
APOBEC3A. J Biol Chem. 2012;287(46):39182–92.
108. Land AM, Law EK, Carpenter MA, Lackey L, Brown WL, Harris RS.
Endogenous APOBEC3A DNA cytosine deaminase is cytoplasmic and
nongenotoxic. J Biol Chem. 2013;288(24):17253–60.
109. Shee C, Cox BD, Gu F, Luengas EM, Joshi MC, Chiu LY, et al. Engineered
proteins detect spontaneous DNA breakage in human and bacterial cells.
Elife. 2013;2:e01222.
110. Chan K, Roberts SA, Klimczak LJ, Sterling JF, Saini N, Malc EP, et al. An
APOBEC3A hypermutation signature is distinguishable from the signature
of background mutagenesis by APOBEC3B in human cancers. Nat Genet
[Internet]. 2015;47(9):1067–72. Available from:
http://www.nature.com/doifinder/10.1038/ng.3378
111. Caval V, Suspène R, Shapira M, Vartanian J-P, Wain-Hobson S. A
prevalent cancer susceptibility APOBEC3A hybrid allele bearing
APOBEC3B 3′UTR enhances chromosomal DNA damage. Nat Commun
[Internet]. 2014;5:5129. Available from:
http://www.nature.com/doifinder/10.1038/ncomms6129
112. Vangsted A, Klausen TW, Vogel U. Genetic variations in multiple myeloma
II: Association with effect of treatment. Vol. 88, European Journal of
112
Haematology. 2012. p. 93–117.
113. Fonseca R, Bergsagel PL, Drach J, Shaughnessy J, Gutierrez N, Stewart
AK, et al. International Myeloma Working Group molecular classification of
multiple myeloma: spotlight review. Leukemia [Internet].
2009;23(12):2210–21. Available from:
http://www.nature.com/doifinder/10.1038/leu.2009.174
114. Mitra AK, Mukherjee UK, Harding T, Jang JS, Stessman H, Li Y, et al.
Single-cell analysis of targeted transcriptome (SCATTome) predicts drug
sensitivity of single cells within human myeloma tumors. Leukemia
[Internet]. 2015;(October 2015):1094–102. Available from:
http://www.ncbi.nlm.nih.gov/pubmed/26710886
115. Iorio F, Knijnenburg TA, Vis DJ, Bignell GR, Menden MP, Schubert M, et
al. A Landscape of Pharmacogenomic Interactions in Cancer. Cell. 2015;
116. Lombardo LJ, Lee FY, Chen P, Norris D, Barrish JC, Behnia K, et al.
Discovery of N-(2-chloro-6-methylphenyl)-2-(6-(4-(2-hydroxyethyl)-
piperazin-1-yl)-2-methylpyrimidin-4-ylamino)thiazole-5-carboxamide (BMS-
354825), a dual Src/Abl kinase inhibitor with potent antitumor activity in
preclinical assays. J Med Chem. 2004;47(27):6658–61.
117. Chen R, Chen B. The role of dasatinib in the management of chronic
myeloid leukemia. Vol. 9, Drug Design, Development and Therapy. 2015.
p. 773–9.
118. Keating GM. Dasatinib: A Review in Chronic Myeloid Leukaemia and Ph+
Acute Lymphoblastic Leukaemia. Drugs [Internet]. 2017;77(1):85–96.
113
Available from: http://link.springer.com/10.1007/s40265-016-0677-x
119. Wei G, Rafiyath S, Liu D. First-line treatment for chronic myeloid leukemia:
dasatinib, nilotinib, or imatinib. J Hematol Oncol. 2010;3:47.
120. Kantarjian H, Shah NP, Hochhaus A, Cortes J, Shah S, Ayala M, et al.
Dasatinib versus imatinib in newly diagnosed chronic-phase chronic
myeloid leukemia. N Engl J Med [Internet]. 2010;362(24):2260–70.
Available from: http://www.ncbi.nlm.nih.gov/pubmed/20525995
121. Dunn EF, Iida M, Myers RA, Campbell DA, Hintz KA, Armstrong EA, et al.
Dasatinib sensitizes KRAS mutant colorectal tumors to cetuximab.
Oncogene [Internet]. 2011;30(5):561–74. Available from:
http://dx.doi.org/10.1038/onc.2010.430
122. Xiao J, Xu M, Hou T, Huang Y, Yang C, Li J. Dasatinib enhances antitumor
activity of paclitaxel in ovarian cancer through Src signaling. Mol Med Rep
[Internet]. 2015;12(3):3249–56. Available from:
http://www.ncbi.nlm.nih.gov/pubmed/25975261%5Cnhttp://www.pubmedce
ntral.nih.gov/articlerender.fcgi?artid=PMC4526065
123. Le XF, Mao W, He G, Claret FX, Xia W, Ahmed AA, et al. The role of
p27Kip1 in dasatinib-enhanced paclitaxel cytotoxicity in human ovarian
cancer cells. J Natl Cancer Inst. 2011;103(18):1403–22.
124. Montero JC, Seoane S, Ocaña A, Pandiella A. Inhibition of Src family
kinases and receptor tyrosine kinases by dasatinib: Possible combinations
in solid tumors. Vol. 17, Clinical Cancer Research. 2011. p. 5546–52.
125. Smyth GK. limma: Linear Models for Microarray Data Bioinformatics and
114
Computational Biology Solutions Using R and Bioconductor. In:
Bioinformatics and Computational Biology Solutions Using R and
Bioconductor. 2005. p. 397–420.
126. Chu G, Li J, Narasimhan B, Tibshirani R, Tusher V. SAM - Significance
Analysis of Microarrays - Users guide and technical document. Policy.
2011;1–42.
127. Schwender H, Krause A, Ickstadt K. Identifying interesting genes with
siggenes. RNews. 2006;6:45–50.
128. Seal HL. Studies in the History of Probability and Statistics. XV The
historical development of the Gauss linear model. Biometrika [Internet].
1967;54(1–2):1–24. Available from:
http://biomet.oxfordjournals.org/content/54/1-2/1.abstract
129. Kolde R. Package `pheatmap’. Bioconductor. 2012;1–6.
130. Mitra AK, Harding T, Mukherjee U, Jang J, Li Y, HongZheng R, et al. A
gene expression signature distinguishes innate response and resistance to
proteasome inhibitors in multiple myeloma. Blood Cancer J.
2017;(February):1–10.
131. Krämer A, Green J, Pollard J, Tugendreich S. Causal analysis approaches
in Ingenuity Pathway Analysis. Bioinformatics [Internet]. 2014;30(4):523–
30. Available from:
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3928520&tool=p
mcentrez&rendertype=abstract
132. Irish JM, Czerwinski DK, Nolan GP, Levy R. Altered B-cell receptor
115
signaling kinetics distinguish human follicular lymphoma B cells from
tumor-infiltrating nonmalignant B cells. Blood. 2006;108(9):3135–42.
133. Irish JM, Myklebust JH, Alizadeh AA, Houot R, Sharman JP, Czerwinski
DK, et al. B-cell signaling networks reveal a negative prognostic human
lymphoma cell subset that emerges during tumor progression. Proc Natl
Acad Sci [Internet]. 2010;107(29):12747–54. Available from:
http://www.pnas.org/cgi/doi/10.1073/pnas.1002057107
134. Blix ES, Irish JM, Husebekk A, Delabie J, Forfang L, Tierens AM, et al.
Phospho-specific flow cytometry identifies aberrant signaling in indolent B-
cell lymphoma. BMC Cancer [Internet]. 2012;12(1):478. Available from:
http://bmccancer.biomedcentral.com/articles/10.1186/1471-2407-12-478
135. Argyropoulos K, Vogel R, Ziegler C, Altan-Bonnet G, Velardi E, Calafiore
M, et al. Clonal B cells in Waldenström’s macroglobulinemia exhibit
functional features of chronic active B-cell receptor signaling. Leukemia.
2016;30(108):1116–25.
136. Molot RJ, Meeker TC, Wittwer CT, Perkins SL, Segal GH, Masih AS, et al.
Antigen expression and polymerase chain reaction amplification of mantle
cell lymphomas. Blood. 1994;83(6):1626–31.
137. Ginaldi L, De Martinis M, Matutes E, Farahat N, Morilla R, Catovsky D.
Levels of expression of CD19 and CD20 in chronic B cell leukaemias. J
Clin Pathol [Internet]. 1998;51(5):364–9. Available from:
http://jcp.bmj.com/cgi/doi/10.1136/jcp.51.5.364
138. Kim A, Seong KM, Kang HJ, Park S, Lee S-S. Inhibition of Lyn is a
116
promising treatment for mantle cell lymphoma with bortezomib resistance.
Oncotarget [Internet]. 6(35). Available from:
www.impactjournals.com/oncotarget
139. Shi H, Zhang CJ, Chen GYJ, Yao SQ. Cell-based proteome profiling of
potential Dasatinib targets by use of affinity-based probes. J Am Chem
Soc. 2012;134(6):3001–14.