ACTA UNIVERSITATIS UPSALIENSIS
UPPSALA 2012

Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 822

Somatic Mutations in Breast Cancer Genomes

Discovery and Validation of Breast Cancer Genes

XIANG JIAO

ISSN 1651-6206
ISBN 978-91-554-8490-3
urn:nbn:se:uu:diva-182319



Dissertation presented at Uppsala University to be publicly examined in Rudbecksalen, Dag Hammarskjölds v 20, Uppsala, Wednesday, November 21, 2012 at 09:15 for the degree of Doctor of Philosophy (Faculty of Medicine). The examination will be conducted in English.

Abstract

Jiao, X. 2012. Somatic Mutations in Breast Cancer Genomes: Discovery and Validation of Breast Cancer Genes. Acta Universitatis Upsaliensis. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 822. 53 pp. Uppsala. ISBN 978-91-554-8490-3.

Breast cancer is the most common cancer in women worldwide. However, the genetic alterations that lead to breast cancer are not fully understood. This thesis aims to identify novel genes of potential mechanistic, diagnostic or therapeutic interest in breast cancers by mutational analysis and whole-genome sequencing.

In paper I, sequencing of 36 previously identified candidate genes in 96 breast tumors with patient-matched normal DNA determined the somatic mutation prevalence of these candidate genes and identified additional mutations in the Notch, NF-κB, PI3K, and Hedgehog pathways as well as in processes mediating DNA methylation, RNA processing and calcium signaling.

In paper II, comparison of massively parallel mate-pair sequencing results of a human genome before and after phi29-mediated multiple displacement amplification (MDA) revealed that MDA introduces structural alteration artifacts, with an emphasis on false positive inversions, and impairs the sensitivity to detect true inversions. Therefore, MDA has limited value in sample preparation for whole-genome sequencing for structural alteration detection.

In paper III, massively parallel paired-end sequencing identified gene rearrangements in 15 hormone receptor negative breast cancers. Forty validated rearrangements were predicted to directly affect 30 genes involved in epigenetic regulation, cell mitosis, signal transduction and glycolytic flux. RNA interference-based assays revealed potential roles in cell growth for some of the affected genes, among which DDX10 was implicated in apoptosis.

In paper IV, a method for statistical evaluation of putative translocations detected by massively parallel paired-end sequencing was proposed. In an application of this method to analyse translocations detected by cancer genome deep paired-end sequencing, 76 putative translocations were classified into four categories, with the majority likely to be caused by mismapping due to repetitive regions.

Taken together, this thesis provides insights into genes and pathways mutated in sporadic breast cancer genomes, which broaden our understanding of the genetic basis of breast cancer and may ultimately facilitate the diagnosis and treatment of this disease.

Keywords: breast cancer, cancer gene, pathway, somatic mutation, structural alteration, sequencing, whole genome amplification

Xiang Jiao, Uppsala University, Department of Immunology, Genetics and Pathology, Rudbecklaboratoriet, SE-751 85 Uppsala, Sweden.

© Xiang Jiao 2012

ISSN 1651-6206
ISBN 978-91-554-8490-3
urn:nbn:se:uu:diva-182319 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-182319)


Dedicated to my family 献给我的家人


List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Jiao, X., Wood, L.D., Lindman, M., Jones, S., Buckhaults, P., Polyak, K., Sukumar, S., Carter, H., Kim, D., Karchin, R. and Sjöblom, T. (2012) Somatic mutations in the Notch, NF-KB, PIK3CA, and Hedgehog pathways in human breast cancers. Genes Chromosomes Cancer, 51(5):480–489

II Jiao, X., Rosenlund, M., Hooper, S.D., Tellgren-Roth, C., He, L., Fu, Y., Mangion, J. and Sjöblom, T. (2011) Structural alterations from multiple displacement amplification of a human genome revealed by mate-pair sequencing. PLoS ONE, 6(7): e22250

III Hooper, S.D.*, Jiao, X.*, Djureinovic, T., Larsson, C., Wärnberg, F., Tellgren-Roth, C., Botling, J. and Sjöblom, T. Gene rearrangements in hormone receptor negative breast cancers revealed by paired-end sequencing. Submitted

IV Hooper, S.D., Jiao, X., Rosenlund, M., Tellgren-Roth, C., Cavelier, L. and Sjöblom, T. Interpreting translocations detected by paired-end sequencing of cancer samples. Submitted

*These authors contributed equally to this work.

Reprints were made with permission from the respective publishers.


Contents

Introduction
  The cancer genome
    Genome instability as a driver of tumor development
    Cancer genes and pathways
  Technologies to characterize cancer genomes
    Identification of cancer genes prior to the completion of the human genome sequence
    New insights into cancer genomics from the completion of the human genome sequence
    Massively parallel sequencing technology in cancer genomics
  The landscape of cancer genomes
  Breast cancer
    Epidemiology and etiology
    Pathology and staging
    Subtypes and targeted therapy
    Genomic landscapes

Present Investigation
  Aims
  Results and discussion
    Paper I Genes in Notch, NF-κB, PI3K, and Hedgehog pathways are somatically mutated in human breast cancers
    Paper II Phi29-mediated multiple displacement amplification introduces false positive structural alterations detected by whole-genome sequencing
    Paper III Somatic gene rearrangements in hormone receptor negative breast cancers
    Paper IV Statistical evaluation and interpretation of putative translocations detected by massively parallel paired-end sequencing

Concluding remarks and future perspectives

Appendix

Acknowledgements

References


Abbreviations

AJCC       American Joint Committee on Cancer
CAN-genes  Candidate cancer genes
CGH        Comparative genomic hybridization
CGP        Cancer Genome Project
CHASM      Cancer-specific high-throughput annotation of somatic mutations
CIN        Chromosome instability
DCIS       Ductal carcinoma in situ
ER         Estrogen receptor
HD         Homozygous deletion
HER2       Human epidermal growth factor receptor 2
HNPCC      Hereditary non-polyposis colon cancer
HR         Hormone receptor
ICGC       International Cancer Genome Consortium
IDC        Invasive ductal carcinoma
IHC        Immunohistochemistry
ILC        Invasive lobular carcinoma
LOH        Loss of heterozygosity
MDA        Multiple displacement amplification
MIN        Microsatellite instability
MMR        Mismatch repair
mTOR       Mechanistic target of rapamycin
MuSiC      Mutational significance in cancer
PCR        Polymerase chain reaction
PI3K       Phosphoinositide 3-kinase
PolyPhen   Polymorphism phenotyping
PR         Progesterone receptor
PTK        Protein tyrosine kinase
PTP        Protein tyrosine phosphatase
RTK        Receptor tyrosine kinase
SIFT       Sorting intolerant from tolerant
TCGA       The Cancer Genome Atlas
TNBC       Triple negative breast cancer
WHO        World Health Organization


Introduction

The cancer genome

Cancer is, essentially, a genetic disease caused by a series of alterations in genes that control cell growth and proliferation. These alterations can be in the form of changes of one or a few nucleotides, such as point mutations, or displacements of larger DNA segments, namely structural alterations. Structural alterations include translocations, inversions and copy number variations caused by insertions, duplications and deletions. Mutations in cancer genomes are either constitutional or somatic. Constitutional mutations are inherited from a parent and cause hereditary susceptibility to cancer, whereas somatic mutations occur later in tumor development and result in sporadic tumors. In addition, epigenetic alterations that lead to up- and down-regulation of gene expression occur in most cancers [1-2].

Genome instability as a driver of tumor development

Genomic instability is a prominent characteristic of both hereditary cancers and sporadic cancers, enabling malignant cells to randomly acquire mutations at a higher rate [3-4]. Most cancers exhibit an elevated rate of chromosome structure and number changes compared to normal cells, a phenotype termed chromosomal instability (CIN) [5], while some other cancers show an increased level of expansions and contractions of oligonucleotide repeats, namely microsatellite instability (MIN) [6-7], or a higher rate of base-pair mutations [8].

Germline mutations in DNA repair genes are associated with genomic instability in the majority of hereditary tumors. For instance, MIN in hereditary non-polyposis colon cancers (HNPCC) results from mutations in DNA mismatch repair (MMR) genes [9-10], while germline mutations in MYH lead to an increase in G:C to T:A transversions in colorectal tumors [8]. Another intensely studied example is that germline mutations in BRCA1 and BRCA2 are associated with CIN in ~25% of hereditary breast cancer [11] and 12% [12] of ovarian cancer [13-14].

However, the molecular mechanism of genome instability in sporadic tumors remains obscure. The idea that DNA repair deficiency is responsible for loss of genome integrity has been challenged, since mutations targeting DNA repair pathways seem insufficient for genomic instability in most sporadic carcinomas, as revealed by cancer genome sequencing studies [15-17]. Disruption of such pathways could, however, be underestimated in these studies due to the incomplete characterization of the full caretaker gene cast as well as failure to detect structural rearrangements and epigenetic alterations, which also might lead to disruption of DNA repair processes.

Besides mutations disrupting DNA repair genes, activation of oncogenes has been observed to induce chromosomal instability [18-20]. Based on these observations, the oncogene-induced DNA replication stress model was proposed [21]. This model states that oncogene activation preferentially induces DNA damage at common fragile sites, while escape from p53-dependent apoptosis is required for these precancerous cells to survive and become cancerous.

It was previously thought that somatic genomic alterations accumulate progressively during cancer evolution. However, a particular phenomenon of CIN, characterized by tens to hundreds of rearrangements apparently acquired within a single catastrophic event to form complex chromosome changes and termed chromothripsis, was recently discovered in multiple tumor types [22-24]. Although several possibilities regarding the emergence of such a phenomenon have been suggested [23, 25-26], the actual initiating cause, the underlying DNA repair mechanism and the potential implications in cancer remain to be delineated.

Cancer genes and pathways

A combination of genetic changes in cancer genes collectively leads to the initiation, maintenance and progression of cancer. By evaluating numbers and patterns of recurrent mutations, over 400 genes have been identified as cancer genes to date, comprising >2% of all known protein-coding genes in the human genome [27]. It is believed that this cast will be further expanded by increased understanding of the cancer genome. In general, there are two major classes of cancer genes: oncogenes and tumor suppressor genes.

Oncogenes are genes that intrinsically stimulate cell growth and are altered by gain-of-function mutations in cancer, which enhance the physiological activities of genes that are not supposed to be activated under normal conditions. Thus activation of one allele is usually sufficient to confer tumor growth advantage, meaning that oncogenes are dominant. Activation of oncogenes arises from gene amplifications (e.g. ERBB2 amplification in breast cancer), chromosomal translocations (e.g. MYC in B-cell lymphoma) or point mutations (e.g. BRAF V600E in melanoma). In addition, genomic rearrangements creating fusion genes (e.g. TMPRSS2-ERG in prostate cancer) are also a source of oncogene activation.

Tumor suppressor genes, on the other hand, are generally inactivated by mutations in cancer. Such inactivation can be accomplished either by point mutations, leading to amino acid substitution or premature truncation of the protein products, or by large-scale rearrangements that disrupt the gene coding sequences. Many tumor suppressor genes protect cells from unrestrained growth, and deficiencies in these genes will lead to a variety of cellular changes, such as abnormal growth signaling, resistance to cell death, avoidance of immune surveillance and reprogrammed energy metabolism, and eventually give rise to tumorigenesis, as activation of oncogenes does [28]. A special class of tumor suppressor genes, termed "caretakers" or stability genes, is crucial for maintaining the genomic stability of cells as discussed earlier. Mutations in these genes cause deficiency in DNA repair and lead to a higher mutation rate in other genes [29].

It was previously thought that tumor suppressor genes, including caretaker genes, act in a recessive manner, meaning that both alleles have to be inactivated for a biological effect to result [30]. But some tumor suppressor genes show a phenomenon called haplo-insufficiency, in which loss of function in only one copy may also give rise to a cancerous phenotype (e.g. TP53 [31-32], CDKN1B [33], PTEN [34]).

Cancer genes belong to a limited number of pathways, and mutations in different components belonging to the same pathway lead to similar phenotypes [35]. For example, in the phosphoinositide 3-kinase (PI3K) pathway, oncogenic mutations have been frequently observed in multiple genes. Those mutations include activating mutations in genes encoding upstream receptor tyrosine kinases (RTKs) such as EGFR, ERBB2 and PDGFRA, in the genes encoding the PI3K catalytic p110α and regulatory p85α subunits (PIK3CA and PIK3R1) and in genes encoding downstream kinases such as AKTs, as well as inactivating mutations in PTEN, which encodes a negative regulator of this pathway [36]. All of these mutations lead to oncogenic activation of the PI3K pathway, resulting in increased cell proliferation, growth and survival. Generally, alteration events within a pathway are mutually exclusive, meaning that mutations affecting the same pathway are unlikely to co-occur in the same patient. Understanding of cancer pathways also provides insights into therapeutic intervention. For instance, many inhibitors targeting different PI3K nodes (RTKs, PI3K, AKT and mTOR) are currently undergoing clinical development [37].
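The tendency toward mutual exclusivity can be examined on a table of which samples carry a mutation in which gene. The following is a minimal sketch, not a method taken from this thesis: it assumes a one-sided hypergeometric (Fisher-type) test asking how likely it is to see so few co-mutated samples by chance. The function name, sample identifiers and mutation assignments are hypothetical.

```python
from math import comb


def co_occurrence_pvalue(mutated_a, mutated_b, n_samples):
    """Probability of observing this few (or fewer) samples mutated in both
    genes, if mutations in gene A and gene B were distributed independently.

    mutated_a, mutated_b: sets of sample identifiers carrying a mutation.
    n_samples: total number of samples screened for both genes.
    A small value suggests the two genes are mutated in a mutually
    exclusive pattern.
    """
    a = len(mutated_a)
    b = len(mutated_b)
    both = len(mutated_a & mutated_b)
    total = comb(n_samples, b)
    # Hypergeometric lower tail: sum over overlaps 0..both.
    favourable = sum(
        comb(a, k) * comb(n_samples - a, b - k) for k in range(both + 1)
    )
    return favourable / total


# Hypothetical cohort of 40 tumors: 10 with a mutation in one pathway gene,
# 10 others with a mutation in a second gene of the same pathway, none with both.
gene_a = set(range(0, 10))
gene_b = set(range(10, 20))
print(co_occurrence_pvalue(gene_a, gene_b, 40))
```

With real data the test would be repeated for many gene pairs, so some correction for multiple testing would be needed before drawing conclusions.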

Technologies to characterize cancer genomes

To better prevent, detect and treat cancer, a comprehensive understanding of the genetic mechanism of cancer is required, including which genes are responsible for cancer and how they participate in tumorigenesis. Systematic analysis of cancer genomes is therefore fruitful for identification of genes and pathways wherein mutations contribute to cancer development.


Identification of cancer genes prior to the completion of the human genome sequence

The first report of a cancer-associated gene mutation in a cancer genome (HRAS G12V in a human bladder carcinoma cell line) was published in 1982 [38-39], followed by the discovery of many human cancer genes during the last two decades of the 20th century. The early approaches for cancer gene identification include retrovirus studies [40-41], cytogenetic approaches [42-43], oncogene transformation assays [44], mapping of genes underlying familial cancer syndromes [45-46], and genome-wide search for regions recurrently affected by homozygous deletions (HD) and loss of heterozygosity (LOH) [47-48], as reviewed in Weir et al [49]. Using these methods, a handful of frequently mutated cancer genes were identified, such as MYC [50], ABL1 [51], RB1 [45], TP53 [52], ERBB2 [44], PTEN [53] and APC [54].

New insights into cancer genomics from the completion of the human genome sequence

The completion of the human genome project at the beginning of the 21st century [55-56] provided a high-quality, comprehensive reference for human genome research, leading to new opportunities and strategies for systematic exploration of cancer genes.

Gene amplifications and deletions are among the causes of oncogene activation and tumor suppressor gene inactivation, respectively. Therefore, detection of abnormal copy numbers in cancer genomes is an effective way to identify cancer genes. Completion of the human genome sequence enables copy number analysis with relatively fine resolution and high coverage. For instance, microarrays harboring representative probes that capture DNA fragments throughout the entire genome provide a quantitative assessment of the DNA copy number profile, where the density and distribution of probes become critical for the accuracy and precision of mapping copy number altered regions [57-61]. Some cancer genes have been discovered by copy number analysis, including both oncogenes (associated with copy number gains) [62-65] and tumor suppressor genes (associated with copy number losses) [66-68]. Besides DNA microarrays, digital karyotyping [69] and end-sequence profiling [70] also provide unbiased measurements of copy number changes across the whole genome.
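As a rough illustration of how probe intensities can be turned into copy number calls, the sketch below compares tumor and matched normal signal per probe as a log2 ratio and labels each probe as a gain, a loss or neutral. This is a simplified assumption for illustration only; the function name, the intensity values and the threshold of 0.3 are hypothetical, and real analyses normalize the data and segment neighbouring probes before calling.

```python
from math import log2


def call_copy_number(tumor_signal, normal_signal, threshold=0.3):
    """Label each probe by the log2 ratio of tumor to normal intensity.

    threshold is a hypothetical cut-off on the absolute log2 ratio.
    """
    calls = []
    for tumor, normal in zip(tumor_signal, normal_signal):
        ratio = log2(tumor / normal)
        if ratio >= threshold:
            calls.append("gain")
        elif ratio <= -threshold:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Hypothetical intensities for three probes.
print(call_copy_number([200.0, 100.0, 50.0], [100.0, 100.0, 100.0]))
```

Denser and more evenly distributed probes give more ratios per region, which is why probe design affects how precisely the boundaries of an altered region can be mapped.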

Another major advance is the possibility of cancer genome resequencing to identify mutations involved in cancer. The resequencing of cancer genomes can be performed on certain genes of interest or on coding sequences across the entire genome. For example, one of the most frequently mutated oncogenes, PIK3CA, was identified through resequencing of the PI3K genes among 35 colorectal cancers and further validated in 199 additional colorectal cancers [71]. On the one hand, such mutational analyses focused on genes with known or potential relationships to cancer, for example the genes encoding tyrosine kinases [72], tyrosine phosphatases [73] and protein kinases [74], have identified genes recurrently mutated in cancer. On the other hand, resequencing of particular genes in large cohorts of patients has shed light on the molecular mechanisms of tumor response to existing therapy, providing guidelines for cancer treatment [75-77].

Following the first whole-exome sequencing of breast and colorectal cancers published in 2006 [15], mutational analyses have been expanded to cover almost all coding exons instead of focusing on a subset of genes [16, 78-79], which has remarkably broadened our knowledge about cancer genomes. These studies gave insight into the numbers of somatic point mutations and their variation across and within cancer types [16, 78-79], and revealed that the vast majority of point mutations in cancer genomes are neutral, whereas only a small fraction is collectively responsible for cancer development [16]. Candidate genes with significantly higher mutation rates than the background were identified (e.g. IDH1 in glioblastoma), providing promising targets for subsequent studies. However, because resequencing detects only point mutations and sheds no light on other forms of alteration, integration of resequencing with copy number analysis is necessary to obtain a comprehensive picture of genes altered in cancer [78-82].
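One simple way to ask whether a gene is mutated more often than the background would predict is a binomial tail probability. The sketch below is an illustrative assumption, not the statistical framework of the cited studies: it treats every sequenced base in every tumor as an independent trial with a single hypothetical background mutation rate, whereas published methods also account for sequence context, gene length and mutation type. All names and numbers are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1).

    Adequate for a sketch; extremely small tails lose floating-point precision.
    """
    lower = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - lower)


def gene_mutation_pvalue(observed_mutations, coding_bases, n_tumors, background_rate):
    """Chance of seeing at least the observed number of somatic mutations
    in a gene if only the background rate were acting."""
    bases_screened = coding_bases * n_tumors
    return binomial_upper_tail(observed_mutations, bases_screened, background_rate)


# Hypothetical gene: 1,500 coding bases screened in 100 tumors,
# with an assumed background rate of 1 mutation per million bases.
print(gene_mutation_pvalue(5, 1500, 100, 1e-6))
print(gene_mutation_pvalue(1, 1500, 100, 1e-6))
```

Under these assumed numbers a single mutation is unremarkable, while five mutations would be very unlikely to arise from background mutation alone.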

Massively parallel sequencing technology in cancer genomics

Although the conventional polymerase chain reaction (PCR)-coupled Sanger sequencing approach is accurate and unbiased, the labor, time and cost requirements impeded wide application of this technology in whole-exome sequencing. Therefore, previous systematic sequencing studies have usually been restricted to certain subsets of genes, or to relatively small numbers of cancer samples. However, the emergence of massively parallel sequencing technologies [83-84], also known as next-generation sequencing or second-generation sequencing, promises high-throughput, low-cost approaches for cancer genome research. Several commercially available platforms (e.g. ABI/SOLiD, Roche/454 and Illumina/Solexa) use distinct nucleotide signal detection methods. By parallelizing the sequencing process and simultaneously producing millions of short (50-400 bases) sequence reads [85-86], these platforms currently generate tens of billions of bases of DNA sequence per day (from performance specifications of Illumina HiSeq 2000), equivalent to more than 10 human haploid genomes. The yield has been increasing over the past several years and is predicted to continue rising in the future with the cost for sequencing of each DNA base decreasing dramatically at the same time.
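The equivalence between daily yield and haploid genomes is simple arithmetic, sketched below. The haploid genome size of about 3.1 billion bases is an assumed round figure, and the daily yield of 35 billion bases is a hypothetical value chosen only to fall within "tens of billions"; neither number is taken from the instrument specification cited above.

```python
HAPLOID_GENOME_BASES = 3.1e9  # assumed approximate size of one human haploid genome


def genome_equivalents(bases_per_day, genome_bases=HAPLOID_GENOME_BASES):
    """Number of haploid genomes' worth of raw sequence produced per day."""
    return bases_per_day / genome_bases


def days_for_coverage(target_coverage, bases_per_day, genome_bases=HAPLOID_GENOME_BASES):
    """Days of sequencing needed to reach a given average depth of coverage."""
    return target_coverage * genome_bases / bases_per_day


daily_yield = 35e9  # hypothetical yield in bases per day
print(round(genome_equivalents(daily_yield), 1))
print(round(days_for_coverage(30, daily_yield), 1))
```

Note that one genome equivalent corresponds to only 1x average coverage; reliable mutation calling needs each position to be read many times, which is why the second function scales by the target depth.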

The first determination of sequences from a cancer genome by second-generation sequencing proved that the new sequencing technology is capable of detecting point mutations and short indels [87]. In subsequent sequencing efforts using paired-end or mate pair sequencing strategies [88], which generate short reads from both ends of size-selected DNA fragments and detect incorrectly aligned read pairs (Figure 1), many chromosome rearrangements including copy number alterations have been observed, demonstrating that massively parallel whole-genome sequencing can detect sequence alterations at potentially all dimensions in cancer genomes [89-91]. Target capture and enrichment approaches such as PCR, molecular inversion probes and hybrid capture [92] enable sequence analysis to be carried out not only genome-wide, but also focused on regions of interest, for instance the exome [93]. Moreover, RNA extracted from tissue can be sequenced using these high-throughput platforms as well, allowing characterization of the whole cancer transcriptome [94-95]. These sequencing platforms have therefore proved to be powerful for comprehensive characterization of cancer genomes [87, 89-90].
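The idea of detecting "incorrectly aligned read pairs" can be made concrete with a small classifier. The sketch below is a simplified assumption rather than the pipeline used in the papers: it presumes a library in which properly mapped pairs face each other (forward read to the left, reverse read to the right) and fall within a hypothetical insert size window of 200 to 500 bases. Mate pair libraries on other platforms have different expected orientations, and real callers require several supporting pairs per event. All names and thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str  # '+' or '-'


def classify_pair(pair, min_insert=200, max_insert=500):
    """Return a coarse label for one mapped read pair.

    Assumes inward-facing pairs are normal; thresholds are hypothetical.
    """
    if pair.chrom1 != pair.chrom2:
        return "translocation_candidate"
    # Order the two reads by mapped coordinate.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion_candidate"
    if (left_strand, right_strand) == ("-", "+"):
        # Reads point away from each other, as expected for tandem duplications.
        return "everted_candidate"
    span = right_pos - left_pos
    if span > max_insert:
        return "deletion_candidate"
    if span < min_insert:
        return "insertion_candidate"
    return "concordant"


pairs = [
    ReadPair("chr1", 1000, "+", "chr1", 1300, "-"),
    ReadPair("chr1", 1000, "+", "chr1", 9000, "-"),
    ReadPair("chr1", 1000, "+", "chr1", 1300, "+"),
    ReadPair("chr1", 1000, "+", "chr5", 1300, "-"),
]
for pair in pairs:
    print(classify_pair(pair))
```

Because each label depends on where both reads map, a read placed in the wrong copy of a repeat produces the same signal as a genuine rearrangement, which is one reason such calls need validation.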

Despite their advances in medical genetics research, second-generation sequencing technologies present challenges as well. One practical issue is that these platforms require large amounts of input material, which can be difficult to obtain from clinical tumor samples. Therefore accurate template amplification is in great demand. In PCR-coupled Sanger sequencing-based genome analyses, phi29-mediated multiple displacement whole genome amplification is the most common way to amplify template DNA due to its high fidelity at nucleotide level (error rate less than 3 × 10⁻⁶) [15-16, 96]. However, multiple displacement amplification was reported to introduce massive false positive inversions detected by mate pair sequencing technology [97], which limits its utility in genome resequencing.

Another challenge lies in structural variation discovery with second-generation sequencing data. Since the paired-end sequencing methods rely on mapping of short paired reads to a reference genome to discover structural variants, they have low sensitivity for detecting variation in repetitive regions [98], which are probably enriched with structural variation [99-100]. Individual polymorphism of repetitive regions [101] and low-complexity regions leads to additional mapping mistakes. More in-depth understanding of these regions and their variation in the human genome and further optimization of bioinformatic methods are required before we can reliably predict variations in these regions.


Figure 1. Schematic illustration of mate pair sequencing strategy. Genomic DNA is sheared into fragments, of which those in a certain size range are circularized with internal adaptors and subjected to sequencing. Sequenced mate pairs are mapped to a reference genome. Structural alterations are reported by abnormal mapping of mate pairs.

The landscape of cancer genomes

The rapidly evolving sequencing technologies generate massive genomic data at an increasing rate with reduced cost. In recent years in particular, large-scale analyses of cancer genomes have produced a wealth of information, which has greatly expanded our knowledge of human cancer (summarized in Table 1). The launch of comprehensive cancer genome projects including the Cancer Genome Project (CGP) [102], The Cancer Genome Atlas (TCGA) [103] and the International Cancer Genome Consortium (ICGC) [104] facilitates the compilation of an encyclopedic catalogue of the genomic changes involved in cancer.


Table 1. Summary of selected cancer genome sequencing studies of common epithelial cancers (breast, colorectal, ovarian, prostate and lung carcinoma).

Year Tumor type Sequencing target T/N pairs† Findings

2003 72 Colorectal 138 PTK‡ genes 35 Identified recurrent mutations in NTRK3, FES, KDR, EPHA3, NTRK2, MLK4, GUCY2F.

2004 71 Colorectal, others

16 PI3K‡ genes 35+199 Identified frequent PIK3CA muta-tions in multiple cancer types.

2004 73 Colorectal 87 PTP‡ genes 18+157 Identified mutations in potential tumor suppressor genes PTPRT, PTPN13, PTPN14, PTPRG, PTPRF, PTPN3 in colorectal cancer.

2005 105 Breast Protein kinome (518 genes)

25 Identified diverse patterns of somatic mutations in breast cancer.

2005 106 Lung Protein kinome (518 genes)

33 Suggested candidate genes ATM, ATR, FGFR2, AURKC and re-vealed somatic mutation spectra in lung cancer.

2007 15-

16 Breast, colorectal

All RefSeq genes (18 191 genes)

11+24 per tumor type

The first sequencing effort of all coding regions in cancer ge-nomes. Identified 280 candidate genes and revealed the mutation landscape of breast and colorectal cancer genomes.

2007 107 Multiple Protein kinome (518 genes)

210 Provided insights into the intrinsic mutation rates and patterns of different cancer types.

2008 89 Lung Whole genome 2 The first cancer genome study using massively parallel se-quencing technology. Identified somatic rearrangements in cancer.

2008 108 Lung 623 candidate genes

188 Identified frequently mutated genes (e.g. NF1, LRP1B, PTPRD, ERBB4, NTRKs) and pathways in lung cancer.

2009 109 Breast (metastasis)

Whole genome, whole transcrip-tome

1§ Demonstrated single nucleotide mutational heterogeneity and mutational evolution in breast tumor progression.

2009 110 Breast Whole genome 24 The first genomic screen for somatic rearrangements in tu-mor samples. Revealed the ge-nome landscape of somatic rear-rangements in breast cancer.

2010 111 Lung Whole genome 1 Identified multiple mutation signa-tures linked to tobacco carcinogens from the full repertoire of ~23 000 somatic mutations.

2010 112 Breast Whole genome 1* Indicated that metastasis may arise from a minority of cells within the primary tumor.

2010 81 Multiple 1 507 known cancer genes and druggable genes 441 Revealed different mutation rates and mutated gene sets across tumor types and subtypes. Identified candidate genes (e.g. GNAS, GNAO1, MAP2K4) and pathways (e.g. RTK/RAS, GPCR, JNK).

2010 113 Ovarian All RefSeq genes (~18 000 genes) 8+34 Identified tumor suppressor gene ARID1A in ovary cancer.

2011 114 Breast Whole transcriptome 4 Discovered novel fusion genes (e.g. VAPB-IKZF3) with potential functional role in breast cancer.

2011 115 Prostate Whole genome 7 Identified novel recurrently disrupted genes CADM2 and MAGI2 and suggested a link between chromatin and transcriptional regulation and genomic rearrangements.

2011 116 Ovarian Whole exome 316 Identified candidate genes (e.g. NF1, RB1, CDK12) and pathways (NOTCH, FOXM1) in ovarian carcinoma and provided classification with potential prognostic values.

2012 117-118 Breast Whole genome 21 Identified distinct nucleotide substitution signatures, observed localized hypermutation and constructed a model of breast cancer evolution.

2012 119 Breast Whole genome (n=46), whole exome (n=31) 77+240 Identified novel significantly mutated genes (e.g. GATA3, TBX3, ATR, RUNX1, LDLRAP1, STMN2, AGTR2, SF3B1) in luminal breast cancer and revealed pathways (e.g. TP53, DNA replication, MMR) associated with aromatase inhibitor response.

2012 120 Breast Whole genome (n=15), whole exome (n=54) 65 Revealed mutations and structural alterations with clonal frequency and suggested involvement of cytoskeletal gene mutations in breast cancer.

2012 121 Breast Whole exome 100+250 Revealed multiple mutation signatures of breast cancers and identified novel driver mutations (e.g. AKT2).

2012 122 Breast Whole genome (n=22), whole exome (n=103) 108+235 Identified novel recurrent mutations in CBFB and a recurrent fusion gene MAGI3-AKT3.

2012 123 Colorectal Whole genome (n=97), whole exome (n=224) 276 Identified novel significantly altered genes (e.g. ARID1A, SOX9, FAM123B, ERBB2, IGF2) in colorectal cancer.

2012 124 Breast Whole exome 507 Revealed molecular subtype-specific patterns of mutations and identified novel candidate genes.


†T/N pairs, patient-matched tumor/normal pairs investigated. In some cases, numbers of T/N pairs in discovery screen and validation screen are indicated before and after the plus sign “+”, respectively.
‡Abbreviations: PTK, protein tyrosine kinase; PI3K, phosphatidylinositol 3-kinase; PTP, protein tyrosine phosphatase.
§DNA from primary tumor and metastasis was analyzed in this study.
*DNA from blood, primary tumor, metastasis and xenograft was analyzed in this study.

One question that cancer genome resequencing efforts attempted to answer is how many somatic mutations a cancer genome would harbor, which is largely dependent on the scope and sensitivity of mutation detection technologies. It was previously known that somatic mutation prevalence varies both within and between different tumor types 107. Whole-genome sequencing studies enable observation of genetic alterations previously undetectable by protein-coding sequence screens, including mutations in non-coding regions and large rearrangements. Primary breast cancers were reported to harbor ~7 000-10 000 somatic point mutations per genome 117, 119, 122, of which tens to hundreds reside in the protein-coding regions 15-16, 121-122, as well as up to hundreds (average 20-50) of somatic structural variants 110, 112, 117, 119, 122. The number of somatic mutations in breast cancer is on a similar scale to that in pancreatic cancer 125, prostate cancer 115, non-hypermutated colorectal cancer 15-16, 123 and multiple myeloma 126, but is ~3-5 fold higher than that in acute myeloid leukemia (average 539 somatic mutations and structural alterations, 21 coding sequence mutations per tumor) 87, 127 and is lower by about one order of magnitude than that of lung carcinoma 111, 128 and malignant melanoma 90, in which high mutation prevalence might be attributed to extensive mutagenic exposure such as tobacco carcinogens or ultraviolet light.

However, among the somatic genetic alterations in cancer genomes, only a small fraction actually confers selective advantage and contributes to tumorigenesis, termed driver mutations, while others are non-causative passenger mutations 107. The minimum number of mutations necessary for tumorigenesis has been estimated to be around 5-6, according to incidence modeling of solid tumors such as breast and colorectal cancers, and this number would be smaller in leukemia and childhood cancers 129. However, recent systematic mutational screens of cancer genomes suggested a higher number of causal gene mutations in each tumor (range 10-20 genes) 15, 130. Distinguishing the driver mutations from passengers cannot be accomplished by analyzing genetic data alone, but requires functional validation of the cancer-relevant activities. Since most functional assays are relatively labor and time intensive, prioritization of the genes for functional studies presents a great challenge in cancer genomic data interpretation.

Several measurements have been adopted to identify the most promising driver mutations. First, analyzing the ratio of non-synonymous mutations to synonymous mutations of a given gene would indicate whether the mutations have been under positive selection during tumor development; a higher than expected ratio thus suggests the presence of driver mutations 107, 131. Second, assessment of the mutation prevalence in genes also identifies drivers that contribute to cancer if they are highly unlikely to be mutated by chance 15. Third, several tools have been employed to predict the effect of non-synonymous single nucleotide variants on protein function based on phylogenetic conservation and physical considerations [e.g. Sorting Intolerant From Tolerant (SIFT) 132, Polymorphism Phenotyping (PolyPhen) 133, Panther 134, MutationTaster 135, etc.]. Finally, as the number of pathways involved in cancer is much smaller than that of cancer genes, and a variety of mutations in multiple cancer genes from the same pathway would likely have similar pathological effects 35, evaluation of the combined prevalence of somatic alterations at the pathway level provides strategies for identification of cancer-associated processes 81, 136.
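The first two measurements can be sketched in a few lines of Python. This is a minimal illustration, not the method of the cited studies: the function names, the neutral expected ratio, the per-base background rate and all example numbers are assumptions chosen for demonstration, and the binomial model ignores gene-specific sequence context and multiple testing.

```python
from math import comb


def ns_ratio_excess(n_nonsyn, n_syn, expected_ratio):
    """Observed non-synonymous:synonymous ratio relative to the neutral expectation.

    Values above 1 are compatible with positive selection; expected_ratio
    must be supplied by the caller (it depends on the mutation spectrum).
    """
    if n_syn == 0:
        raise ValueError("at least one synonymous mutation is required")
    return (n_nonsyn / n_syn) / expected_ratio


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - below)


def prevalence_p_value(n_mutated, n_tumors, coding_bases, background_rate):
    """Chance that n_mutated or more tumors hit the gene by passenger mutation alone."""
    # Probability that one tumor carries at least one background mutation in the gene.
    p_hit = 1.0 - (1.0 - background_rate) ** coding_bases
    return binomial_tail(n_mutated, n_tumors, p_hit)


# Illustrative numbers only (hypothetical, not taken from the thesis).
print(ns_ratio_excess(n_nonsyn=30, n_syn=10, expected_ratio=2.5))
print(prevalence_p_value(n_mutated=4, n_tumors=96, coding_bases=3000,
                         background_rate=1e-6))
```

A ratio excess well above 1, or a small tail probability, would flag a gene for closer inspection; neither is evidence of a driver role on its own.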

In the past few years, comprehensive mutation interpretation implementing most if not all of these measurements has been introduced into cancer genome analyses. For example, Carter and her colleagues developed a computational pipeline for cancer-specific high-throughput annotation of somatic mutations (CHASM), which takes a total of 49 predictive features into account for driver identification 137. Another example is a package for determination of mutational significance in cancer (MuSiC), designed by Dees et al. MuSiC is the first software suite that integrates clinical data with coverage data and database references to identify drivers from large mutational discovery sets 138. Although many tools can help to prioritize the candidates of interest for downstream analyses, only the evidence from functional assays and biological studies can fully credential a candidate gene as a bona fide cancer gene.


Figure 2. The most frequently mutated genes in breast cancers. Genes are sorted by somatic mutation prevalence. Data obtained from COSMIC 139 v61.

It is clear from cancer genome resequencing efforts that not all cancer genes are mutated at high prevalence. On the contrary, despite conferring selective advantage, the vast majority of cancer genes are infrequently mutated and are therefore difficult to identify through sequencing of a limited number of samples (Figure 2). In order to discover these infrequent driver mutations, systematic screens of large cohorts of patients are required. For instance, it was estimated that 500 tumor samples of a particular tumor type are needed in whole-exome sequencing studies to reach ~80% power to detect genes with a true mutation frequency of ~3% 104. A retrospective look at cancer gene discovery in common epithelial cancers over the past 30 years is consistent with this statement, in the sense that the discovery of novel cancer genes has been accelerated by large-scale genome analyses but most of the newly identified cancer genes are mutated at low prevalence (<10%) (Figure 3). From another point of view, recent large-scale cancer genome sequencing projects 123-124 indicate that the likelihood of finding novel frequently mutated protein-coding genes in common cancer types is low.
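The dependence of detection power on cohort size can be illustrated with a simple binomial model. This sketch is a simplification and not the calculation behind the estimate cited above: it assumes a gene counts as detected once at least `min_mutated` tumors carry a mutation, where `min_mutated` is a hypothetical stand-in for a significance threshold that in practice depends on background mutation rate and multiple-testing correction.

```python
from math import comb


def detection_power(n_samples, mutation_freq, min_mutated):
    """P(at least min_mutated of n_samples are mutated), X ~ Binomial(n, f)."""
    missed = sum(
        comb(n_samples, i) * mutation_freq**i * (1 - mutation_freq) ** (n_samples - i)
        for i in range(min_mutated)
    )
    return max(0.0, 1.0 - missed)


def samples_needed(mutation_freq, min_mutated, target_power, max_samples=5000):
    """Smallest cohort size reaching target_power, or None if not reached."""
    for n in range(min_mutated, max_samples + 1):
        if detection_power(n, mutation_freq, min_mutated) >= target_power:
            return n
    return None


# Power for a gene mutated in 3% of tumors under a few hypothetical thresholds.
for threshold in (5, 10, 15):
    print(threshold, round(detection_power(500, 0.03, threshold), 3))
```

Raising the threshold, as a stricter significance criterion would, lowers the power at a fixed cohort size, which is why rarely mutated genes call for large cohorts.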


Figure 3. Cancer gene somatic mutation prevalence in common epithelial tumors as a function of year of discovery. For a complete list of genes with mutational prevalence and year of discovery, please see the Appendix.


Breast cancer

Epidemiology and etiology

Breast cancer represents the most frequent cancer and the leading cause of cancer death in women worldwide. It constitutes about one fourth of female cancer cases and accounts for 14% of cancer deaths. An estimated 1.38 million cases were diagnosed worldwide in 2008. Incidence rates of breast cancer are generally high in Northern and Western Europe and North America, but lower in Africa and East Asia 140-141. In Sweden alone, over 7 000 women are diagnosed with breast cancer annually 142-144.

It is estimated that about 7% of breast cancers are hereditary due to predisposing genetic factors such as germline mutations in BRCA1 145 or BRCA2 146, whereas the vast majority are sporadic breast cancers. A number of genome-wide association studies implicated other risk factor loci. Besides mutations in susceptibility genes, other established risk factors of breast cancer include postmenopausal hormone replacement therapy, oral contraceptives, reproductive patterns such as nulliparity, late age at first birth and avoidance of breast-feeding, physical inactivity, alcohol intake and obesity 147.

Pathology and staging

The progression of breast cancer is a multi-step process starting with ductal hyper-proliferation termed atypical hyperplasia. Subsequent pathological stages include in situ carcinoma and invasive carcinoma, of which the former is characterized by neoplastic cell proliferation restricted within the basement membrane whereas the latter is characterized by invasion into the surrounding stroma. Finally, invasive carcinoma evolves into a lethal metastatic cancer 148. Ductal carcinoma in situ (DCIS) is thought to be the precursor of invasive ductal carcinoma (IDC), the most common histological subtype of breast cancer. However, studies on genetic profiles of in situ, invasive and metastatic breast carcinomas failed to discover stage-specific genetic events in the tumor cells. Instead, other studies indicated that microenvironmental factors such as abnormalities in the surrounding myoepithelial cells and stromal cells play a key role in breast cancer invasion and metastasis 149. Therefore, the multi-step breast cancer development and progression is attributed to acquisition of genetic/epigenetic alterations conferring new selective advantage to the cells as well as tumor-promoting microenvironmental alterations 4.


The most common clinical staging system for breast cancer is the TNM system formulated by the American Joint Committee on Cancer (AJCC). In this system, primary tumor size (T), spread to nearby lymph nodes (N) and distant metastasis (M) are evaluated collectively to assign the pathologic stage of breast cancer, ranging from stage 0 to stage IV (Figure 4) 150.

Figure 4. TNM classification of breast carcinoma. The diagram groups breast tumors into stages 0, I, IIA, IIB, IIIA, IIIB, IIIC and IV according to T, N and M category.

T, primary tumor: Tis, carcinoma in situ; T0, no evidence; T1, tumor ≤ 2 cm; T2, 2 cm - 5 cm; T3, tumor > 5 cm; T4, any size, direct extension to chest wall or skin.

N, regional lymph nodes (LN): N0, no LN metastasis; N1, metastasis in movable axillary LNs; N2, metastasis in fixed axillary or internal mammary LNs; N3, metastasis in a) infraclavicular, or b) internal mammary and axillary, or c) supraclavicular LNs.

M, distant metastasis: M0, no distant metastasis; M1, distant metastasis.
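How T, N and M categories combine into a stage can be sketched as a lookup. The grouping below follows the widely used AJCC anatomic stage grouping (sixth edition style, without the later IA/IB split) as an assumption; it was not transcribed from Figure 4, and the function name `tnm_stage` is hypothetical.

```python
# Stage groups for M0 tumors that are neither T4 nor N3 (assumed AJCC-style grouping).
_STAGE_TABLE = {
    ("T1", "N0"): "I",
    ("T0", "N1"): "IIA", ("T1", "N1"): "IIA", ("T2", "N0"): "IIA",
    ("T2", "N1"): "IIB", ("T3", "N0"): "IIB",
    ("T0", "N2"): "IIIA", ("T1", "N2"): "IIIA", ("T2", "N2"): "IIIA",
    ("T3", "N1"): "IIIA", ("T3", "N2"): "IIIA",
}


def tnm_stage(t, n, m):
    """Return the stage label for a T/N/M combination, or None if undefined."""
    if m == "M1":
        return "IV"        # any T, any N with distant metastasis
    if t == "Tis":
        return "0" if n == "N0" else None
    if n == "N3":
        return "IIIC"      # any T with N3 nodal disease
    if t == "T4":
        return "IIIB"      # T4 with N0-N2
    return _STAGE_TABLE.get((t, n))


print(tnm_stage("T2", "N1", "M0"))
```

Checking M first, then N3, then T4 mirrors the "any T" and "any N" wording of the stage definitions, so the table only needs the remaining combinations.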


Subtypes and targeted therapy

Breast cancer is a collection of substantially heterogeneous tumors with distinct morphology, genomic landscapes, prognosis and responses to treatment. The World Health Organization (WHO) histological classification of breast tumors is based on microscopic morphological features. IDC represents the largest group (~75%) of invasive breast cancer, while invasive lobular carcinoma (ILC) accounts for 5-15% of all invasive tumors. The remaining tumors, including medullary, neuroendocrine, tubular and inflammatory carcinomas among others, are known as “special” types 151.

Based on immunohistochemistry (IHC) staining of the biomarkers estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2), breast cancer can be categorized into at least three subtypes: hormone receptor positive (HR+, ER+ and/or PR+), HER2 positive (ER-/PR-/HER2+) and triple negative breast cancers (TNBC, ER-/PR-/HER2-) 152-153. Each IHC subtype exhibits distinct prognosis and response to therapies. For instance, most HR+ tumors respond well to hormonal interventions such as tamoxifen and aromatase inhibitors, while HER2+ tumors are generally sensitive to anti-HER2 therapies such as trastuzumab (trade name Herceptin®). Despite many ongoing clinical developments, unfortunately there is no promising targeted therapy available for all TNBCs.
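The three IHC categories defined above can be written as a small decision rule. This is only a sketch of the definitions as stated in the text; the function name `ihc_subtype` and the boolean inputs are assumptions, and clinical scoring thresholds for calling a marker positive are outside its scope.

```python
def ihc_subtype(er_positive, pr_positive, her2_positive):
    """Assign an IHC subtype from ER, PR and HER2 status (booleans)."""
    if er_positive or pr_positive:
        return "HR+"      # ER+ and/or PR+, regardless of HER2 status
    if her2_positive:
        return "HER2+"    # ER-/PR-/HER2+
    return "TNBC"         # ER-/PR-/HER2-


print(ihc_subtype(er_positive=False, pr_positive=False, her2_positive=False))
```

Hormone receptor status is tested first because the HER2 positive category, as defined here, is restricted to ER-/PR- tumors.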

Another widely accepted human breast cancer classification system is based on characterized gene expression profiles, also known as “molecular portraits” 154. Breast tumors can thus be categorized into at least five subtypes, namely luminal A, luminal B, basal-like, HER2-enriched and normal breast-like 155. Clinical trial cohort studies showed that patients classified into different subgroups exhibit distinct outcomes and response to therapies, suggesting that the molecular subtype classification is a strong prognostic indicator 155-158. Moreover, these molecular subtypes are also distinct in genomic complexity 122, 159, mutational patterns and epigenetic alterations 124, and preference for sites of distant metastases 160.

Genomic landscapes

A pilot genome-wide sequencing effort on breast tumors identified a total of 1 137 somatically mutated genes from 11 breast cancers, with an average of 52 non-synonymous mutations per sample. Using gene mutation prevalence as the principal criterion, 140 genes were identified as candidate cancer genes (CAN-genes) that require further evaluation to confirm their roles as causal contributors to tumorigenesis 15-16. These studies also portrayed the genomic landscape of human breast cancer, which consists of a few frequently mutated gene “mountains” and a huge number of infrequently mutated (usually < 5%) gene “hills” 16.


Besides the discovery of novel candidate genes, more recent studies have delineated new aspects of breast cancer exomes. Interrogating luminal-type breast cancer genomes with clinical data revealed that somatic mutations in TP53 signaling pathway, DNA replication and mismatch repair are associated with aromatase inhibitor resistance 119. Determination of clonal frequencies by deep sequencing provided new insights into the initiating events of TNBCs 120.

Massively parallel paired-end sequencing technologies enable whole-genome detection of gene rearrangements at DNA sequence level 88. An analysis on 24 breast cancers revealed more than 2 000 gene rearrangements, enriched with tandem duplications 110. Analysis of breast cancers across a variety of subtypes revealed that luminal B and HER2-enriched breast tumors harbor many more structural rearrangements when compared to luminal A subtype. However, no frequently recurrent rearrangements have been discovered in breast cancer by previous studies except for the MAGI3-AKT3 gene fusion detected in 4% (9 out of 257) of breast cancers 122.

Like all cancer types, breast cancer progression is thought to be a dynamic multi-step Darwinian evolution process. Independent mutations occur in a stepwise fashion, of which those conferring selective advantages promote cell proliferation and clonal expansion 109. Through deep whole-genome sequencing of 21 breast cancers and analysis of subclonal genetic alterations, Nik-Zainal et al. proposed a model for clonal evolution in which many molecular aberrations accumulate in dormant cell lineages before final expansion of the most-recent common ancestor, which triggers diagnosis 118.

Integrative breast cancer studies aim at developing new definitions of breast cancer subtypes with better prognostic and predictive values. A cluster analysis integrating copy number and gene expression profiles of ~2 000 breast cancers suggested a novel classification system 161. A recent multi-platform study on hundreds of breast cancers revealed subtype-specific patterns in many tumor characteristics including gene mutations, microRNA expression, DNA methylation, copy number changes and protein expression. Moreover, in whole-exome sequencing of more than 500 tumors this study also revealed almost all frequently altered pathways (PI3K/AKT, TP53, RB) in breast cancer (Figure 5) 124.


Figure 5. Frequently altered pathways in breast cancer cells. Yellow curves represent the cell membrane. Genes always affected by activating mutations in breast cancers are illustrated in red boxes, while genes always inactivated are shown in blue boxes.


Present Investigation

Aims

The main objective of this thesis was to discover and validate novel genes of potential mechanistic, diagnostic or therapeutic interest in breast cancers. More specifically the aims were as follows:

I To assess somatic mutation prevalence of 36 candidate genes in 96 breast cancer patients.

II To investigate whether phi29-mediated whole genome amplification introduces false positive structural alterations detected by massively parallel mate pair sequencing.

III To identify somatic structural variants in hormone receptor negative breast cancer by paired-end whole-genome sequencing.

IV To evaluate and interpret putative translocations identified by paired-end sequencing using a set of statistical tests.

Results and discussion

Paper I

Genes in Notch, NF-κB, PI3K, and Hedgehog pathways are somatically mutated in human breast cancers

A previous exome-wide mutational analysis identified 140 candidate genes possibly involved in breast cancer, most of which were affected by recurring but infrequent somatic mutations 15-16. To further determine the mutation prevalence of these genes and investigate which pathways and processes might be involved in breast carcinogenesis, we selected 36 novel candidate genes and sequenced their protein-coding regions in a panel of 96 human breast cancers with patient-matched normal DNA.

We observed a total of 30 novel somatic mutations comprising 14 missense, three frame-shift, one truncating, one splice site, and 11 synonymous mutations in 12 genes with potential impact on protein function, namely ADAM12, CENTB1 (also known as ACAP1), CENTG1 (also known as AGAP2), DIP2C, GLI1, GRIN2D, HDLBP, IKBKB, KPNA5, NFKB1, NOTCH1, and OTOF.


Sequencing of genes from the Notch pathway (NOTCH1 and ADAM12) revealed several novel somatic mutations including a frame-shift mutation in NOTCH1 and a missense substitution in ADAM12 predicted to be disease-causing, implying the involvement of alterations in this pathway in breast cancer development.

NF-κB signaling plays a crucial role in the regulation of the inflammatory response in cancer development 162, whereas the PI3K pathway, frequently altered in breast cancer, is involved in cell growth, proliferation and survival 163. In this study, somatic mutations in IKBKB, NFKB1, CENTB1 and CENTG1, with a combined mutation prevalence of 8%, highlighted the involvement of the NF-κB pathway in breast tumorigenesis. Moreover, the likely disease-causing mutations discovered in CENTB1 and CENTG1, which sit at the crossroads of NF-κB and PI3K signaling, implicated alternative paths for PI3K activation. Notably, these genes were mutated more frequently in hormone receptor negative breast tumors (22%, n=32) than in receptor positive tumors (1.7%, n=60) when our dataset was combined with a previous screen 15 (p=0.002). No additional mutations were observed in three other NF-κB pathway genes (NFKBIE, NFKBIA and KEAP1) tested in this study.
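The comparison between receptor negative and receptor positive tumors can be checked with a two-sided Fisher's exact test. Two assumptions are made here: the summary does not state which test produced p=0.002, and the counts 7 of 32 and 1 of 60 are inferred from the reported percentages rather than given in the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum every table at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Rows: receptor negative (7 mutated, 25 not), receptor positive (1 mutated, 59 not).
p = fisher_exact_two_sided(7, 25, 1, 59)
print(f"p = {p:.4f}")
```

Under these inferred counts the test gives a p-value close to the reported 0.002.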

Missense mutations identified in the sonic hedgehog effector GLI1 suggested that this gene might be activated in breast cancers, given the fact that GLI1 is a proto-oncogene usually amplified in malignant glioma 164. We also observed mutations in genes involved in DNA methylation (DIP2C), RNA metabolism (HDLBP), nuclear protein transport (KPNA5) and ion channel (GRIN2D), pointing to novel processes likely involved in breast cancer development.

Generally, our results are consistent with the knowledge that these genes are infrequently altered in breast cancer. However, for most genes reported in this study, the somatic mutation prevalences are lower than those from the previous study 15-16, but are comparable with other contemporaneous studies 120-121 (Table 2). Potential explanations for the disparity in mutation frequencies include the variance in mutation detection sensitivity among different sequencing strategies, the inability of mutational screens based on a low number of samples to pinpoint the true mutation prevalence, and different sample cohort composition in terms of histological/molecular subtypes used in these studies. Further, the effective sequence coverage of early second-generation exome sequencing may be lower than that of Sanger-based approaches. However, given that breast cancer is a heterogeneous disease, larger cohorts of well-characterized samples will be required to better interrogate subtype-specific mutation prevalence.


Table 2. Comparison of candidate gene mutation prevalence† in published cancer genome projects.

Genes‡               Current study  Wood et al 16  Shah et al 120  Stephens et al 121  Banerji et al 122  Ellis et al 119
                     (n=96§)        (n=35)         (n=65, TNBC)    (n=100)             (n=103)            (n=77, luminal)
ADAM12               2%             9%             2%              0                   0                  0
CENTB1               1%             6%             0               2%                  0                  0
CENTG1               1%             6%             0               0                   0                  0
DIP2C                2%             11%            0               0                   0                  0
GLI1                 1%             6%             0               1%                  0                  0
GRIN2D               2%             9%             0               0                   0                  0
HDLBP                1%             9%             0               1%                  2%                 3%
IKBKB                1%             3%             0               0                   0                  0
KPNA5                1%             6%             0               1%                  0                  0
NFKB1                1%             3%             0               0                   0                  0
NOTCH1               4%             9%             0               1%                  1%                 0
OTOF                 1%             9%             0               4%                  1%                 0
ABCA3                0              9%             0               0                   0                  0
AIM1                 0              11%            0               1%                  0                  0
AMFR                 0              6%             0               0                   1%                 0
ATP8B1               0              11%            0               0                   0                  0
BAP1                 0              3%             2%              2%                  0                  1%
CYP1A1               0              6%             0               0                   0                  0
DBN1                 0              9%             0               0                   0                  0
FLJ13479 (ZNF668)    0              11%            3%              0                   0                  0
GEN1                 0              6%             0               1%                  0                  0
GAB1                 0              6%             0               0                   1%                 0
HOXA3                0              6%             0               0                   0                  0
KEAP1                0              6%             0               0                   0                  0
KIAA1946 (FAM171B)   0              9%             0               2%                  1%                 0
LOC340156 (MYLK4)    0              9%             0               0                   1%                 0
LRRFIP1              0              9%             0               0                   0                  1%
MRE11A               0              6%             0               0                   0                  1%
NCOA6                0              6%             0               0                   0                  1%
NFKBIA               0              3%             2%              0                   0                  0
NFKBIE               0              6%             0               0                   0                  0
PIK3R1               0              3%             0               1%                  1%                 1%
SIX4                 0              9%             3%              0                   1%                 0
TCF1 (HNF1A)         0              6%             0               0                   0                  0
TMEM123              0              6%             0               0                   0                  0
VEPH1                0              9%             3%              0                   0                  0

†Only non-synonymous mutations were included in mutation prevalence calculation.
‡HGNC symbols were indicated in brackets if different from the names used in paper I.
§All common breast cancer subtypes were included if not otherwise specified. TNBC, triple negative breast cancer.


Identifying a gene’s role as a driver from genome-wide mutational screens is challenging when, as in most cases, its mutation prevalence is low. However, by jointly evaluating the mutation prevalence of multiple genes at the pathway level, one can locate the biological process critical for tumorigenesis, even if the mutation frequency is low for every single member 165. In this paper, each of the NF-κB components was altered in <2% of samples and therefore would have been neglected by single gene examination. However, the overall mutation rate of this pathway makes it of potential interest by combined calculation. Further, pathway analysis can provide strategies for molecular-based cancer therapy, since targeting certain components could effectively correct the aberration caused by mutations in other genes within the pathway 37.
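The combined calculation amounts to counting samples with a mutation in any member of the pathway. The sketch below uses a hypothetical toy cohort; the sample identifiers and the assignment of mutations to samples are invented for illustration and do not reproduce the data of paper I.

```python
def pathway_prevalence(mutations_by_gene, pathway_genes, n_samples):
    """Fraction of samples carrying a mutation in at least one pathway gene."""
    mutated_samples = set()
    for gene in pathway_genes:
        mutated_samples |= mutations_by_gene.get(gene, set())
    return len(mutated_samples) / n_samples


# Hypothetical cohort of 96 tumors: gene -> set of mutated sample identifiers.
toy_mutations = {
    "IKBKB": {"T03"},
    "NFKB1": {"T17"},
    "CENTB1": {"T42"},
    "CENTG1": {"T42", "T88"},
}
pathway = ["IKBKB", "NFKB1", "CENTB1", "CENTG1"]

for gene in pathway:
    print(gene, round(pathway_prevalence(toy_mutations, [gene], 96), 3))
print("pathway", round(pathway_prevalence(toy_mutations, pathway, 96), 3))
```

Using a set union means a tumor with mutations in two pathway members is counted once, so the pathway prevalence is not simply the sum of the per-gene values.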

To sum up, we substantiated the evidence supporting the role of mutations in a subset of novel candidate cancer genes in breast tumorigenesis. We have identified additional somatic mutations in genes of the Notch, Hedgehog, NF-κB and PI3K pathways as well as in processes not yet strongly linked to human cancer such as RNA processing and calcium signaling.

Paper II

Phi29-mediated multiple displacement amplification introduces false positive structural alterations detected by whole-genome sequencing

Genome-wide identification of acquired alterations can currently be achieved with second-generation sequencing technologies. However, a considerable amount of high-quality DNA is required, which can be a major hurdle for investigation of tumor biopsy specimens. Possible solutions to this problem include construction of patient-derived cell lines in vitro or xenografts to expand the cells, and whole genome amplification to obtain a sufficient amount of DNA. Despite elimination of interference by non-neoplastic cells 166, establishing cell lines and xenografts is labor- and time-consuming. Therefore, an approach for accurate amplification of genomic DNA is in high demand for large-scale studies.

Among a variety of available whole genome amplification methods, isothermal multiple displacement amplification (MDA) using highly processive phi29 DNA polymerase is the most commonly used for the purpose of sequencing, genotyping and comparative genomic hybridization (CGH) arrays 167. It has been reported that MDA introduces copy number variant artifacts 168 in human genome analysis and chimeras with inverted sequences when applied to prokaryotic genomes 169. However, the spectrum and extent of structural alterations formed by MDA in mammalian genomes remain unknown.

In order to investigate the effect of MDA on sequencing coverage, sensitivity and specificity of rearrangement detection, and to determine the abundance of false positive structural alterations detected by massively parallel paired-end sequencing, we sequenced the human genome from a healthy donor using SOLiD long-insert mate pair sequencing platform (Life Technologies) before and after MDA.

A greater local read coverage variation was found in the MDA sample, implying that MDA biased sequence representation in the sequencing result. We observed that 12.6% of mate pairs from the MDA sample indicated potential structural alterations, which was much higher than the proportion in the non-amplified sample (4.4%). The numbers of mate pairs supporting deletions, insertions and translocations were increased by MDA by less than one order of magnitude, whereas MDA led to a ~50-fold increase in non-redundant mate pairs spanning putative inversions and a ~10-fold increase in pairs spanning double inversions.

On the one hand, inversion detection with more stringent criteria revealed that the MDA sample had >200-fold more inversions, evenly distributed across the whole genome, and the size range of the inverted sequences was substantially different in the MDA sample. On the other hand, fewer than 50% of true inversions were detected by sequencing after MDA, suggesting that a large number of inversion artifacts in the MDA sample masked the true positive inversions, resulting in lower detection sensitivity.

Inter-chromosomal translocations observed in the non-amplified sample and the MDA sample were filtered by a series of in silico steps to remove likely false positives. Validation of the remaining translocations by PCR followed by Sanger sequencing demonstrated that the two translocations detected only in the MDA sample were actually also present in the non-amplified DNA, indicating that phi29-mediated MDA did not introduce false positive inter-chromosomal translocations.

This study concludes that phi29-mediated MDA introduces structural alteration artifacts, with an emphasis on false positive inversions. As it leads to sequence representation bias and greatly increases the subsequent validation effort, MDA currently has limited value in whole-genome sequencing.

Paper III

Somatic gene rearrangements in hormone receptor negative breast cancers

Recent genome-wide mutational analyses have revealed many genes with recurrent somatic mutations in breast cancer; however, chromosomal rearrangements in terms of large insertions, deletions, inversions and translocations in breast cancer have not been intensively studied. In order to interrogate genes affected by somatic structural variants in breast cancer, we sequenced 15 hormone receptor negative breast cancers using massively parallel paired-end sequencing of both ends of ~2.5 kb DNA fragments. Thirteen samples were sequenced at a relatively low coverage (average clone coverage ~8-fold), while two samples were sequenced at a 10-fold higher coverage (average clone coverage ~80-fold). The numbers and types of structural variations varied among tumors. The most striking disparity was observed in insertions. Two samples harbored thousands of insertions each, whereas others had a median of 17 insertions (range 3-260).

Putative rearrangements were selected for validation by PCR and Sanger sequencing in tumor and patient-matched normal DNA to identify true somatic rearrangements. Forty rearrangements, including 8 deletions, 6 inversions and 26 inter-chromosomal translocations, were validated as somatic events. These validated rearrangements were predicted to directly affect 30 genes, including genes previously reported to be altered in cancer as well as genes that have not yet been related to cancer. We identified rearranged genes involved in epigenetic regulation, mitosis, signal transduction and glycolytic flux, as well as genes whose function is not yet clear. Silencing of the genes CLTC, EPHA5, TNIK, DDX10 and SKA3 by siRNA inhibited cell growth in the breast adenocarcinoma cell line MCF-7 and the premalignant breast cell line MCF-10A, indicating their potential role in breast cancer development.

Consistent with other studies [110, 119-120, 122], we did not observe frequently recurring rearrangements. The only recurrent somatic structural alteration was an inter-chromosomal translocation t(11;13)(q22.3;q12.11) disrupting the genes DDX10 and SKA3 in two out of 15 breast tumors. DDX10 (DEAD box polypeptide 10) encodes an RNA helicase and is known to form the NUP98-DDX10 fusion oncogene in leukemia, whereas SKA3 (spindle and kinetochore associated complex subunit 3) has not been linked to any type of cancer so far. Besides the growth inhibition induced by RNA interference, DDX10-suppressed cell lines also showed a higher percentage of cells exhibiting apoptotic nuclear morphology, suggesting that DDX10 might be involved in apoptosis.

Long-insert paired-end sequencing strategies provide several advantages for whole-genome analysis. First, they generate higher clone coverage than short-insert libraries at the same sequencing throughput. Second, they allow detection of rearrangements spanning repetitive regions, which cannot be identified by short-insert paired-end sequencing.
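
The first advantage can be illustrated with a rough calculation. The sketch below assumes the usual definition of clone (physical) coverage as the number of mapped fragments multiplied by the insert size and divided by the genome size. Only the ~2.5 kb insert size comes from the text; the number of read pairs, the 300 bp short-insert size and the 3.1 Gb genome size are illustrative assumptions.

```python
def clone_coverage(n_pairs: int, insert_size_bp: float, genome_size_bp: float) -> float:
    """Clone (physical) coverage: bases spanned by all fragments / genome size."""
    return n_pairs * insert_size_bp / genome_size_bp


GENOME_SIZE_BP = 3.1e9   # assumed approximate size of the human genome
N_PAIRS = 10_000_000     # hypothetical number of uniquely mapped read pairs

# Same number of read pairs (same throughput), different insert sizes.
short_insert = clone_coverage(N_PAIRS, 300, GENOME_SIZE_BP)    # assumed short-insert library
long_insert = clone_coverage(N_PAIRS, 2500, GENOME_SIZE_BP)    # ~2.5 kb inserts, as in Paper III

print(f"short-insert clone coverage: {short_insert:.2f}x")
print(f"long-insert clone coverage:  {long_insert:.2f}x")
print(f"long/short ratio:            {long_insert / short_insert:.1f}")
```

Under these assumptions the gain in clone coverage is simply the ratio of the insert sizes, which is why long-insert libraries reach a given clone coverage with fewer reads.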

In summary, this study identified several novel candidate genes altered in breast cancers by somatic rearrangements, suggesting an alternative path for cancer gene discovery. RNAi-based functional assays validated their roles in cell growth and indicated potential involvement of DDX10 in apoptosis, nominating a few promising target genes for future research.

Paper IV: Statistical evaluation and interpretation of putative translocations detected by massively parallel paired-end sequencing

Whole-genome paired-end sequencing has become a commonly used technique to identify genetic alterations. However, bioinformatic analyses to reliably detect gene rearrangements and validation of putative alterations remain major challenges. In previous studies where PCR- or sequencing-based validation was employed to test structural alterations detected by paired-end sequencing, the validation rate (the percentage of putative rearrangements confirmed to be true) could be as low as 50-60% despite the bioinformatic filters used [122-123], indicating that a considerable proportion of predicted structural alterations either differed from the prediction made from the paired-end reads or were false positives. Possible explanations include read mapping errors due to repetitive regions of the human genome as well as the complicated nature of rearrangements, for instance translocations accompanied by deletions. In this study we interrogated 76 putative inter-chromosomal translocations detected by deep paired-end sequencing of two breast cancer samples with HER2 overexpression, aiming to study the patterns of paired-end reads and to develop a method to suggest putative translocations for further validation.

We propose an idealized translocation model with a statistical description of the strand distribution, strand fidelity, anchor position correlation and variance, and anchor range of the supporting paired-end reads. Assessed against these criteria, the 76 putative translocations were divided into four major categories. Category 1, containing 60 translocations, was characterized by a lack of anchor correlation, probably caused by read mismapping due to repetitive regions. Category 2 comprised a single transposition, representing insertion of a chromosomal fragment into a heterologous chromosome. The fourteen translocations in category 3 co-occurred with chimeric deletions at or near the breakpoints, whereas the single translocation in category 4 fulfilled all the criteria for an idealized translocation.
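
To make the kind of criteria named above more concrete, the following is a minimal sketch of how strand fidelity, anchor position correlation and anchor range might be summarized for the read pairs supporting one putative translocation. The definitions used here are assumptions made for illustration and are not the formulas of Paper IV: strand fidelity is taken as the fraction of pairs sharing the most common strand combination, and anchor correlation as the Pearson correlation between anchor positions on the two chromosomes. The `MatePair` type and all coordinates are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt


@dataclass
class MatePair:
    pos_a: int     # anchor position of one read on chromosome A
    strand_a: str  # '+' or '-'
    pos_b: int     # anchor position of its mate on chromosome B
    strand_b: str  # '+' or '-'


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either side has no spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def summarize(pairs):
    """Summary statistics for the read pairs supporting one putative translocation."""
    combos = Counter((p.strand_a, p.strand_b) for p in pairs)
    xs = [p.pos_a for p in pairs]
    ys = [p.pos_b for p in pairs]
    return {
        "strand_fidelity": combos.most_common(1)[0][1] / len(pairs),
        "anchor_correlation": pearson(xs, ys),
        "anchor_range_a": max(xs) - min(xs),
        "anchor_range_b": max(ys) - min(ys),
    }


# Hypothetical pairs around a single breakpoint: consistent strands, and the
# further a read lies from the breakpoint on A, the closer its mate lies on B.
coherent = [
    MatePair(1000, "+", 52400, "-"),
    MatePair(1500, "+", 51950, "-"),
    MatePair(2000, "+", 51400, "-"),
    MatePair(2400, "+", 51050, "-"),
]

# Hypothetical pairs as might arise from mismapping in repeats: mixed strands
# and no relationship between the anchor positions.
scattered = [
    MatePair(1000, "+", 52400, "-"),
    MatePair(1500, "-", 90000, "-"),
    MatePair(2000, "+", 51000, "+"),
    MatePair(2400, "+", 70000, "-"),
]

for label, pairs in (("coherent", coherent), ("scattered", scattered)):
    print(label, summarize(pairs))
```

In this toy example the coherent set shows full strand fidelity and a strong (negative) anchor correlation, whereas the scattered set shows neither; any thresholds for assigning real data to categories would have to come from the model in the paper itself.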

This study showed that only a minor fraction of the putative translocations in these two breast cancer samples were classified as ideal, whereas the majority were consequences of sequence repeats or complicated rearrangements involving deletions. This might partly explain the low validation rate observed in some studies, although further tests in a larger population and in other cancer types are required.
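
As a quick consistency check, the category counts reported above can be tallied and expressed as fractions of the 76 putative translocations. The counts are taken from the text; the short category labels are paraphrases.

```python
# Counts per category as reported for the 76 putative translocations.
counts = {
    "category 1 (no anchor correlation)": 60,
    "category 2 (transposition)": 1,
    "category 3 (with deletions near breakpoints)": 14,
    "category 4 (idealized translocation)": 1,
}

total = sum(counts.values())
fractions = {name: n / total for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name}: {n}/{total} = {fractions[name]:.1%}")
```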

Concluding remarks and future perspectives

This thesis identified and validated candidate genes for breast cancer by mutational analysis and whole-genome sequencing of breast tumor samples, providing greater insight into the genes and pathways mutated in sporadic breast cancer genomes. It also presented challenges in cancer genome research, such as the limited quality and quantity of tumor samples and the detection of structural variation using massively parallel sequencing technology.

During the past 30 years, discovery of driver mutations has been the major cancer research activity leading to the development of diagnostics and therapeutics. Taking advantage of rapidly evolving genome technologies, intense efforts to interrogate cancer genomes should ultimately generate the full repertoire of alterations in diverse cancer types in the near future. In the following decades, extensive functional and mechanistic studies will be required to validate that candidate mutations substantially contribute to cancer development and thus have the potential to serve as biomarkers or drug targets.

An intensively discussed issue in breast cancer genome research is tumor heterogeneity. Like other types of cancer, breast cancer exhibits a high degree of diversity in genetic and epigenetic alterations and gene expression profiles, at both the intertumor and intratumor levels. Intertumor heterogeneity of breast cancer is represented by a variety of classification systems based on histological or molecular aspects of breast tumors, among which molecular subtype currently serves as the basis for treatment decisions. Intratumor heterogeneity refers to the differences between cancer cells within one tumor in tumorigenic traits such as angiogenic, invasive and metastatic potential, reflecting cellular diversity in genetic and epigenetic alterations.

Fully understanding breast cancer heterogeneity is a prerequisite for personalized treatment of breast cancer. Despite significant predictive values, the molecular subtype or IHC subtype classification systems appear insufficiently informative for individualized treatment decisions, as tumors of the same subtype do not always respond similarly to a given therapy. Furthermore, dynamic tumor heterogeneity (i.e. expansion of subclones during and after therapy) should be appreciated, since it is the reason for treatment failure in many cases.

Breast cancer is a dynamic and heterogeneous disease. With the improvement in sequencing technologies as well as other investigation approaches, future studies on breast cancer genomes will:

• Reveal new insights into the molecular basis of initiation, maintenance and progression of breast cancer;

• Provide multi-dimensional stratification of breast tumors for accurate therapy decision;

• Shed light on the dynamics of tumor heterogeneity and monitor subclonal expansion with high sensitivity to prevent relapse.

Although many challenges exist on the way from cancer genome research to personalized treatment, I believe that a comprehensive and in-depth understanding of the breast cancer genome will ultimately build the bridge from bench to bedside.

Appendix

Table 3. Cancer genes discovered by mutational analyses in breast, colorectal, ovarian, prostate and lung cancers.

Gene | Year of discovery | Mutation prevalence† | Major mutation types†‡
KRAS | 1982 [41] | 43% in colorectal [123]; 14% in ovary; 17% in lung | MS at codon 12 or 13
ERBB2 | 1985 [170] | ~20% in breast [171] | AMP, MS, in-frame insertion in PTK domain
EGFR | 1988 [172] | 5-15% in colorectal [173]; 27% in lung | AMP, MS, in-frame deletion in PTK domain
RB1 | 1988 [174] | 6% in breast; 2% in ovary [116] | HD, NS
TP53 | 1989 [52] | 23% in breast; 60% in colorectal [123]; 47% in ovary; 18% in prostate; 37% in lung [175]§ | MS, FS, NS [175]
APC | 1992 [54] | 81% in colorectal [123]; 9% in prostate | FS, NS
CDH1 | 1994 [176-177] | 17% in breast | MS, FS, NS
CDKN2A | 1994 [47] | 13% in lung | HD, MS, NS
SMAD2 | 1996 [178] | 6% in colorectal [123] | MS, NS
SMAD4 | 1996 [179] | 10% in colorectal [123] | MS, HD
CTNNB1 | 1997 [180] | 5% in colorectal [123]; 11% in ovary | MS, in-frame deletion
MAP2K4 | 1997 [181] | 4% in breast; 5% in colorectal | MS, HD
PTEN | 1997 [53] | 6% in breast; 4% in colorectal [123]; 16% in prostate | MS, FS
TCF7L2 | 1999 [182] | 9% in colorectal [123] | MS, NS
EP300 | 2000 [183] | 3% in breast; 16% in colorectal | MS, FS
LRP1B | 2000 [184] | 9% in lung | MS
FBXW7 | 2001 [185] | 11% in colorectal [123] | MS, NS
PIK3R1 | 2001 [186] | 2% in breast; 5% in colorectal; 3% in ovary | MS, in-frame deletions in the inter-SH2 region
BRAF | 2002 [187] | 12% in colorectal; 10% in ovary | MS at codon 600
EPHA3 | 2003 [72] | 4% in lung | MS
NTRK3 | 2003 [72] | 2% in lung | MS
EPHB2 | 2004 [188] | 2% in prostate [189-191] | MS
GATA3 | 2004 [192] | 11% in breast | FS, MS
PIK3CA | 2004 [71] | 26% in breast; 18% in colorectal [123]; 11% in ovary; 2% in prostate | MS at codon 1047, 545, 542
TMPRSS2-ERG | 2005 [193] | 41% in prostate | Fusion gene
ZFHX3 (also known as ATBF1) | 2005 [194] | 8% in prostate [189-190] | MS
DIP2C | 2006 [15] | 5% in breast | MS
GNAS | 2006 [15] | 6% in colorectal; 11% in ovary | AMP, MS at codon 201
MLL3 | 2006 [15] | 7% in breast [124]; 12% in colorectal | MS, FS, NS
PTPRD | 2006 [15] | 2% in breast; 4% in lung | MS
AKT1 | 2007 [195] | 5% in breast [124] | MS
ALK | 2008 [196-199] | 8% in colorectal | MS at codon 1174 and 1275
ERBB4 | 2008 [108] | 4% in lung | MS
NF1 | 2008 [108] | 2% in breast; 4% in ovary [116]; 5% in lung | FS, NS, MS
KDM6A (also known as UTX) | 2009 [200] | 2% in lung | FS, NS
ARID1A | 2010 [113, 201] | 5% in breast; 18% in colorectal; 29% in ovary | FS, NS, MS
NCOA2 | 2010 [191] | 6% in prostate [191] | AMP
PPP2R1A | 2010 [113] | 7% in ovary | MS
CDK12 | 2011 [116] | 3% in ovary [116] | MS, NS
CSMD3 | 2011 [116] | 6% in ovary [116] | MS
SPOP | 2011 [115] | 10% in prostate [189] | MS
VTI1A-TCF7L2 | 2011 [202] | 3% in colorectal [202] | Fusion gene
AFF2 | 2012 [123] | 3% in breast [124] | MS
AGTR2 | 2012 [119] | 2% in breast [119, 121] | MS
AKT2 | 2012 [121] | 0.6% in breast [119, 121-122] | MS
ARID1B | 2012 [121] | 4% in breast [121-122] | HD, NS, FS, MS
ATR | 2012 [119] | 3% in breast [119-122] | MS
CASP8 | 2012 [121] | 2% in breast [120-122] | NS, MS
CBFB | 2012 [119, 122] | 3% in breast [119, 121-122] | FS, NS, MS
CDKN1B | 2012 [121] | 2% in breast [119, 121-122]; 4% in prostate [189] | FS
FAM123B | 2012 [123] | 7% in colorectal [123] | HD, NS
FOXA1 | 2012 [189] | 2% in breast [124]; 4% in prostate [189] | MS, FS
LDLRAP1 | 2012 [119] | 2% in breast [119, 122] | MS, NS
MAGI3-AKT3 | 2012 [122] | 4% in breast [122] | Fusion gene
MAP3K1 | 2012 [119, 121] | 7% in breast [119, 121] | FS, NS
MAP3K13 | 2012 [121] | 3% in breast [121] | FS, in-frame deletion, MS
MED12 | 2012 [189] | 5% in prostate [189-190] | MS at codon 44
MLL2 | 2012 [190] | 5% in prostate [189-190] | MS, NS, FS
NCOR1 | 2012 [121] | 4% in breast [119-122] | HD, MS, NS
NIPA2 | 2012 [189] | 3% in prostate [189] | MS
PTPN22 | 2012 [124] | 1% in breast [124] | MS
RUNX1 | 2012 [119] | 3% in breast [119, 121] | MS
SCN11A | 2012 [189] | 5% in prostate [189-190] | MS
SF3B1 | 2012 [119] | 3% in breast [119, 121-122] | MS
SMARCD1 | 2012 [121] | 2% in breast [121] | NS, MS
SOX9 | 2012 [123] | 4% in colorectal [123] | FS, NS
STMN2 | 2012 [119] | 2% in breast [119, 122] | FS, MS
SYNE3 (C14orf49) | 2012 [189] | 4% in prostate [189] | MS
TBX3 | 2012 [119, 121] | 4% in breast [119, 121] | In-frame deletions, FS
THSD7B | 2012 [189] | 6% in prostate [189] | MS
ZNF595 | 2012 [189] | 4% in prostate [189] | MS

†Mutation prevalence and mutation type composition across all cancer types were obtained from the Catalogue of Somatic Mutations in Cancer (COSMIC), version 60, released on 19 July 2012, http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/ [139], unless otherwise specified.
‡Abbreviations: AMP, amplification; FS, frame-shift indel; HD, homozygous deletion; MS, missense substitution; NS, nonsense mutation; PTK, protein tyrosine kinase.
§The version of the IARC TP53 database: R15, November 2010.

Acknowledgements

This work was carried out at the Department of Immunology, Genetics and Pathology, Rudbeck Laboratory at Uppsala University. Financial support was provided by the Swedish Cancer Foundation, the Virginia and D.K. Ludwig Fund for Cancer Research and the National Institutes of Health (NIH).

I would like to take this opportunity to express my gratitude to the people who have supported and helped me during my work.

First of all, I would like to sincerely thank my supervisor, Tobias Sjöblom. Thanks for accepting me as a PhD student in your group, for guiding me through my academic journey with great patience, for motivating me with your enthusiasm for science, and for always being there for me whenever I turned to you.

Thanks to my co-supervisor, Ulf Gyllensten, for encouraging me and for including me in the Medical Genetics group meetings during the beginning of my time in Rudbeck.

Thanks to all the MOLCAN people, for creating a nice working environment and sharing a great time in the lab (and out of the lab as well). Sara K, Muhammad and Lucy, for your company along the way and fighting together with me. Monica and Sofia, for preparing high-quality samples for my work and keeping everything in the lab in good order. Sean, for our great collaboration and for taking good care of me in London. Tanja and Chatarina for your skilled contributions to this work. Ivaylo and Snehangshu for your insightful advice and critical thinking. Jenny, for being the best student I’ve ever had and for often working with me until midnight. Tom for keeping my old computer working normally. Verónica, for the delicacies from Venezuela and for encouraging me over Facebook. Viktor for pointing out my stupid mistakes while reading this thesis. Magnus, for encouraging me to learn bioinformatics and for creating the list “Things to do in Stockholm” for me. To all ex-colleagues Dongyan, Sara S, Tanzila, Anders and Jessica, for the good old days we have had together.

Thanks to all the co-authors and collaborators in Sweden and abroad for your contribution to these studies. Special thanks to Liqun from Life Technologies for generous help in analyzing data from SOLiD sequencing.

Thanks to Bengt Glimelius, for your endless knowledge on colorectal cancer and insightful discussion at Friday journal clubs. To Gunnar Westin, Lene Uhrbom and Eva Berglund, for your valuable and constructive advice on my half-time review. To Karin Forsberg Nilsson and Anna Dimberg for your help with the application process of my dissertation.

To the Uppsala Genome Center for providing fast and reliable sequencing service.

To the IGP administration board for keeping everything running smoothly and patiently answering my questions and fixing my problems.

To all the friends and colleagues at Rudbeck: Nicola, Millaray, Lesley, Larry, Jelena, Demet, Umash, Annika, Suomi, Lothar, Antonia, Jimmy, Anja, Ivana, Ammar, Hamid and many others, thank you for making Rudbeck such a great place to work!

To guys and girls from the “Chinese lunch-table”: Xiujuan, Yiwen, Yuan X, Anqi, Dan, Hua, Lei Z, Lei C, Kun, Junhong, Di, Gucci, Jin and Marcy for the delightful lunch time.

Rachel, thank you very much for convincing me ten years ago that Uppsala is a perfect place to study and to live (by not telling me how cold and dark the winters will be). You are totally right!

To all the fellows from CSSAU (Chinese Student and Scholar Association in Uppsala): Chengxi, Meng, Mi, Xin, Yan, Yemao, Yuan T, Lei S, En and others, for bringing me so much fun and reminding me that life is not all about research.

To all my friends in Sweden, in China, and elsewhere, for missing me, supporting me and caring about me.

Mom and Dad, thanks for your endless love, for always believing in me and letting me pursue the life I want. 爸爸妈妈,谢谢你们一直相信我,支持我,爱我,让我追寻自己想要的生活。

Dear Yifei, thank you for the greatest love I’ve experienced ever.

References

1. Jones PA and Baylin SB, The fundamental role of epigenetic events in cancer. Nat Rev Genet, 2002. 3(6): p. 415-428.

2. Jones PA and Baylin SB, The epigenomics of cancer. Cell, 2007. 128(4): p. 683-692.

3. Negrini S, Gorgoulis VG, and Halazonetis TD, Genomic instability--an evolving hallmark of cancer. Nat Rev Mol Cell Biol, 2010. 11(3): p. 220-228.

4. Hanahan D and Weinberg RA, Hallmarks of cancer: the next generation. Cell, 2011. 144(5): p. 646-674.

5. Lengauer C, Kinzler KW, and Vogelstein B, Genetic instability in colorectal cancers. Nature, 1997. 386(6625): p. 623-627.

6. Bhattacharyya NP, Skandalis A, Ganesh A, Groden J, and Meuth M, Mutator phenotypes in human colorectal carcinoma cell lines. Proc Natl Acad Sci U S A, 1994. 91(14): p. 6319-6323.

7. Marra G and Boland CR, Hereditary nonpolyposis colorectal cancer: the syndrome, the genes, and historical perspectives. J Natl Cancer Inst, 1995. 87(15): p. 1114-1125.

8. Al-Tassan N, Chmiel NH, Maynard J, et al., Inherited variants of MYH associated with somatic G:C-->T:A mutations in colorectal tumors. Nat Genet, 2002. 30(2): p. 227-232.

9. Fishel R, Lescoe MK, Rao MR, et al., The human mutator gene homolog MSH2 and its association with hereditary nonpolyposis colon cancer. Cell, 1993. 75(5): p. 1027-1038.

10. Leach FS, Nicolaides NC, Papadopoulos N, et al., Mutations of a mutS homolog in hereditary nonpolyposis colorectal cancer. Cell, 1993. 75(6): p. 1215-1225.

11. Meindl A, Ditsch N, Kast K, Rhiem K, and Schmutzler RK, Hereditary breast and ovarian cancer: new genes, new treatments, new concepts. Dtsch Arztebl Int, 2011. 108(19): p. 323-330.

12. Risch HA, McLaughlin JR, Cole DE, et al., Prevalence and penetrance of germline BRCA1 and BRCA2 mutations in a population series of 649 women with ovarian cancer. Am J Hum Genet, 2001. 68(3): p. 700-710.

13. Gretarsdottir S, Thorlacius S, Valgardsdottir R, et al., BRCA2 and p53 mutations in primary breast cancer in relation to genetic instability. Cancer Res, 1998. 58(5): p. 859-862.

14. Tirkkonen M, Johannsson O, Agnarsson BA, et al., Distinct somatic genetic changes associated with tumor progression in carriers of BRCA1 and BRCA2 germ-line mutations. Cancer Res, 1997. 57(7): p. 1222-1227.

15. Sjöblom T, Jones S, Wood LD, et al., The consensus coding sequences of human breast and colorectal cancers. Science, 2006. 314(5797): p. 268-274.

16. Wood LD, Parsons DW, Jones S, et al., The genomic landscapes of human breast and colorectal cancers. Science, 2007. 318(5853): p. 1108-1113.

17. Wang TL, Rago C, Silliman N, et al., Prevalence of somatic alterations in the colorectal cancer cell genome. Proc Natl Acad Sci U S A, 2002. 99(5): p. 3076-3080.

18. Denko NC, Giaccia AJ, Stringer JR, and Stambrook PJ, The human Ha-ras oncogene induces genomic instability in murine fibroblasts within one cell cycle. Proc Natl Acad Sci U S A, 1994. 91(11): p. 5124-5128.

19. Karlsson A, Deb-Basu D, Cherry A, Turner S, Ford J, and Felsher DW, Defective double-strand DNA break repair and chromosomal translocations by MYC overexpression. Proc Natl Acad Sci U S A, 2003. 100(17): p. 9974-9979.

20. Woo RA and Poon RY, Activated oncogenes promote and cooperate with chromosomal instability for neoplastic transformation. Genes Dev, 2004. 18(11): p. 1317-1330.

21. Halazonetis TD, Gorgoulis VG, and Bartek J, An oncogene-induced DNA damage model for cancer development. Science, 2008. 319(5868): p. 1352-1355.

22. Kloosterman WP, Hoogstraat M, Paling O, et al., Chromothripsis is a common mechanism driving genomic rearrangements in primary and metastatic colorectal cancer. Genome Biol, 2011. 12(10): p. R103.

23. Stephens PJ, Greenman CD, Fu B, et al., Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell, 2011. 144(1): p. 27-40.

24. Molenaar JJ, Koster J, Zwijnenburg DA, et al., Sequencing of neuroblastoma identifies chromothripsis and defects in neuritogenesis genes. Nature, 2012. 483(7391): p. 589-593.

25. Liu P, Erez A, Nagamani SC, et al., Chromosome catastrophes involve replication mechanisms generating complex genomic rearrangements. Cell, 2011. 146(6): p. 889-903.

26. Maher CA and Wilson RK, Chromothripsis and human disease: piecing together the shattering process. Cell, 2012. 148(1-2): p. 29-32.

27. Futreal PA, Coin L, Marshall M, et al., A census of human cancer genes. Nat Rev Cancer, 2004. 4(3): p. 177-183.

28. Hanahan D and Weinberg RA, The hallmarks of cancer. Cell, 2000. 100(1): p. 57-70.

29. Friedberg EC, DNA damage and repair. Nature, 2003. 421(6921): p. 436-440.

30. Knudson AG, Jr., Mutation and cancer: statistical study of retinoblastoma. Proc Natl Acad Sci U S A, 1971. 68(4): p. 820-823.

31. Nigro JM, Baker SJ, Preisinger AC, et al., Mutations in the p53 gene occur in diverse human tumour types. Nature, 1989. 342(6250): p. 705-708.

32. Mulligan LM, Matlashewski GJ, Scrable HJ, and Cavenee WK, Mechanisms of p53 loss in human sarcomas. Proc Natl Acad Sci U S A, 1990. 87(15): p. 5863-5867.

33. Pietenpol JA, Bohlander SK, Sato Y, et al., Assignment of the human p27Kip1 gene to 12p13 and its analysis in leukemias. Cancer Res, 1995. 55(6): p. 1206-1210.

34. Muller M, Rink K, Krause H, and Miller K, PTEN/MMAC1 mutations in prostate cancer. Prostate Cancer Prostatic Dis, 2000. 3(S1): p. S32.

35. Van Dyke T and Jacks T, Cancer modeling in the modern era: progress and challenges. Cell, 2002. 108(2): p. 135-144.

36. Yuan TL and Cantley LC, PI3K pathway alterations in cancer: variations on a theme. Oncogene, 2008. 27(41): p. 5497-5510.

37. Liu P, Cheng H, Roberts TM, and Zhao JJ, Targeting the phosphoinositide 3-kinase pathway in cancer. Nat Rev Drug Discov, 2009. 8(8): p. 627-644.

38. Tabin CJ, Bradley SM, Bargmann CI, et al., Mechanism of activation of a human oncogene. Nature, 1982. 300(5888): p. 143-149.

39. Reddy EP, Reynolds RK, Santos E, and Barbacid M, A point mutation is responsible for the acquisition of transforming properties by the T24 human bladder carcinoma oncogene. Nature, 1982. 300(5888): p. 149-152.

40. Ellis RW, Defeo D, Shih TY, et al., The p21 src genes of Harvey and Kirsten sarcoma viruses originate from divergent members of a family of normal vertebrate genes. Nature, 1981. 292(5823): p. 506-511.

41. Pulciani S, Santos E, Lauver AV, Long LK, Aaronson SA, and Barbacid M, Oncogenes in solid human tumours. Nature, 1982. 300(5892): p. 539-542.

42. Rowley JD, Letter: A new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining. Nature, 1973. 243(5405): p. 290-293.

43. Klein G, Multiple phenotypic consequences of the Ig/Myc translocation in B-cell-derived tumors. Genes Chromosomes Cancer, 1989. 1(1): p. 3-8.

44. Bargmann CI, Hung MC, and Weinberg RA, The neu oncogene encodes an epidermal growth factor receptor-related protein. Nature, 1986. 319(6050): p. 226-230.

45. Friend SH, Bernards R, Rogelj S, et al., A human DNA segment with properties of the gene that predisposes to retinoblastoma and osteosarcoma. Nature, 1986. 323(6089): p. 643-646.

46. Cavenee WK, Dryja TP, Phillips RA, et al., Expression of recessive alleles by chromosomal mechanisms in retinoblastoma. Nature, 1983. 305(5937): p. 779-784.

47. Kamb A, Gruis NA, Weaver-Feldhaus J, et al., A cell cycle regulator potentially involved in genesis of many tumor types. Science, 1994. 264(5157): p. 436-440.

48. Nobori T, Miura K, Wu DJ, Lois A, Takabayashi K, and Carson DA, Deletions of the cyclin-dependent kinase-4 inhibitor gene in multiple human cancers. Nature, 1994. 368(6473): p. 753-756.

49. Weir B, Zhao X, and Meyerson M, Somatic alterations in the human cancer genome. Cancer Cell, 2004. 6(5): p. 433-438.

50. Collins S and Groudine M, Amplification of endogenous myc-related DNA sequences in a human myeloid leukaemia cell line. Nature, 1982. 298(5875): p. 679-681.

51. de Klein A, van Kessel AG, Grosveld G, et al., A cellular oncogene is translocated to the Philadelphia chromosome in chronic myelocytic leukaemia. Nature, 1982. 300(5894): p. 765-767.

52. Baker SJ, Fearon ER, Nigro JM, et al., Chromosome 17 deletions and p53 gene mutations in colorectal carcinomas. Science, 1989. 244(4901): p. 217-221.

53. Li J, Yen C, Liaw D, et al., PTEN, a putative protein tyrosine phosphatase gene mutated in human brain, breast, and prostate cancer. Science, 1997. 275(5308): p. 1943-1947.

54. Powell SM, Zilz N, Beazer-Barclay Y, et al., APC mutations occur early during colorectal tumorigenesis. Nature, 1992. 359(6392): p. 235-237.

55. Lander ES, Linton LM, Birren B, et al., Initial sequencing and analysis of the human genome. Nature, 2001. 409(6822): p. 860-921.

56. Venter JC, Adams MD, Myers EW, et al., The sequence of the human genome. Science, 2001. 291(5507): p. 1304-1351.

57. Bignell GR, Huang J, Greshock J, et al., High-resolution analysis of DNA copy number using oligonucleotide microarrays. Genome Res, 2004. 14(2): p. 287-295.

58. Brennan C, Zhang Y, Leo C, et al., High-resolution global profiling of genomic alterations with long oligonucleotide microarray. Cancer Res, 2004. 64(14): p. 4744-4748.

59. Ishkanian AS, Malloff CA, Watson SK, et al., A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet, 2004. 36(3): p. 299-303.

60. Lucito R, Healy J, Alexander J, et al., Representational oligonucleotide microarray analysis: a high-resolution method to detect genome copy number variation. Genome Res, 2003. 13(10): p. 2291-2305.

61. Zhao X, Li C, Paez JG, et al., An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism arrays. Cancer Res, 2004. 64(9): p. 3060-3071.

62. Nanjundan M, Nakayama Y, Cheng KW, et al., Amplification of MDS1/EVI1 and EVI1, located in the 3q26.2 amplicon, is associated with favorable patient prognosis in ovarian cancer. Cancer Res, 2007. 67(7): p. 3074-3084.

63. Bass AJ, Watanabe H, Mermel CH, et al., SOX2 is an amplified lineage-survival oncogene in lung and esophageal squamous cell carcinomas. Nat Genet, 2009. 41(11): p. 1238-1242.

64. Weir BA, Woo MS, Getz G, et al., Characterizing the cancer genome in lung adenocarcinoma. Nature, 2007. 450(7171): p. 893-898.

65. Firestein R, Bass AJ, Kim SY, et al., CDK8 is a colorectal cancer oncogene that regulates beta-catenin activity. Nature, 2008. 455(7212): p. 547-551.

66. Mullighan CG, Goorha S, Radtke I, et al., Genome-wide analysis of genetic alterations in acute lymphoblastic leukaemia. Nature, 2007. 446(7137): p. 758-764.

67. Mullighan CG, Miller CB, Radtke I, et al., BCR-ABL1 lymphoblastic leukaemia is characterized by the deletion of Ikaros. Nature, 2008. 453(7191): p. 110-114.

68. Rivera MN, Kim WJ, Wells J, et al., An X chromosome gene, WTX, is commonly inactivated in Wilms tumor. Science, 2007. 315(5812): p. 642-645.

69. Wang TL, Maierhofer C, Speicher MR, et al., Digital karyotyping. Proc Natl Acad Sci U S A, 2002. 99(25): p. 16156-16161.

70. Volik S, Zhao S, Chin K, et al., End-sequence profiling: sequence-based analysis of aberrant genomes. Proc Natl Acad Sci U S A, 2003. 100(13): p. 7696-7701.

71. Samuels Y, Wang Z, Bardelli A, et al., High frequency of mutations of the PIK3CA gene in human cancers. Science, 2004. 304(5670): p. 554.

72. Bardelli A, Parsons DW, Silliman N, et al., Mutational analysis of the tyrosine kinome in colorectal cancers. Science, 2003. 300(5621): p. 949.

73. Wang Z, Shen D, Parsons DW, et al., Mutational analysis of the tyrosine phosphatome in colorectal cancers. Science, 2004. 304(5674): p. 1164-1166.

74. Bignell G, Smith R, Hunter C, et al., Sequence analysis of the protein kinase gene family in human testicular germ-cell tumors of adolescents and adults. Genes Chromosomes Cancer, 2006. 45(1): p. 42-46.

75. Stephens P, Hunter C, Bignell G, et al., Lung cancer: intragenic ERBB2 kinase mutations in tumours. Nature, 2004. 431(7008): p. 525-526.

76. Lynch TJ, Bell DW, Sordella R, et al., Activating mutations in the epidermal growth factor receptor underlying responsiveness of non-small-cell lung cancer to gefitinib. N Engl J Med, 2004. 350(21): p. 2129-2139.

77. Paez JG, Janne PA, Lee JC, et al., EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science, 2004. 304(5676): p. 1497-1500.

78. Jones S, Zhang X, Parsons DW, et al., Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science, 2008. 321(5897): p. 1801-1806.

79. Parsons DW, Jones S, Zhang X, et al., An integrated genomic analysis of human glioblastoma multiforme. Science, 2008. 321(5897): p. 1807-1812.

80. TCGA, Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 2008. 455(7216): p. 1061-1068.

81. Kan Z, Jaiswal BS, Stinson J, et al., Diverse somatic mutation patterns and pathway alterations in human cancers. Nature, 2010. 466(7308): p. 869-873.

82. Leary RJ, Lin JC, Cummins J, et al., Integrated analysis of homozygous deletions, focal amplifications, and sequence alterations in breast and colorectal cancers. Proc Natl Acad Sci U S A, 2008. 105(42): p. 16224-16229.

83. Bentley DR, Balasubramanian S, Swerdlow HP, et al., Accurate whole human genome sequencing using reversible terminator chemistry. Nature, 2008. 456(7218): p. 53-59.

84. Wheeler DA, Srinivasan M, Egholm M, et al., The complete genome of an individual by massively parallel DNA sequencing. Nature, 2008. 452(7189): p. 872-876.

85. Margulies M, Egholm M, Altman WE, et al., Genome sequencing in microfabricated high-density picolitre reactors. Nature, 2005. 437(7057): p. 376-380.

86. Shendure J, Porreca GJ, Reppas NB, et al., Accurate multiplex polony sequencing of an evolved bacterial genome. Science, 2005. 309(5741): p. 1728-1732.

87. Ley TJ, Mardis ER, Ding L, et al., DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature, 2008. 456(7218): p. 66-72.

88. Korbel JO, Urban AE, Affourtit JP, et al., Paired-end mapping reveals extensive structural variation in the human genome. Science, 2007. 318(5849): p. 420-426.

89. Campbell PJ, Stephens PJ, Pleasance ED, et al., Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat Genet, 2008. 40(6): p. 722-729.

90. Pleasance ED, Cheetham RK, Stephens PJ, et al., A comprehensive catalogue of somatic mutations from a human cancer genome. Nature, 2010. 463(7278): p. 191-196.

91. Chiang DY, Getz G, Jaffe DB, et al., High-resolution mapping of copy-number alterations with massively parallel sequencing. Nat Methods, 2009. 6(1): p. 99-103.

92. Mamanova L, Coffey AJ, Scott CE, et al., Target-enrichment strategies for next-generation sequencing. Nat Methods, 2010. 7(2): p. 111-118.

93. Ng SB, Turner EH, Robertson PD, et al., Targeted capture and massively parallel sequencing of 12 human exomes. Nature, 2009. 461(7261): p. 272-276.

94. Maher CA, Kumar-Sinha C, Cao X, et al., Transcriptome sequencing to detect gene fusions in cancer. Nature, 2009. 458(7234): p. 97-101.

95. Maher CA, Palanisamy N, Brenner JC, et al., Chimeric transcript discovery by paired-end transcriptome sequencing. Proc Natl Acad Sci U S A, 2009. 106(30): p. 12353-12358.

96. Nelson JR, Cai YC, Giesler TL, et al., TempliPhi, phi29 DNA polymerase based rolling circle amplification of templates for DNA sequencing. Biotechniques, 2002. Suppl: p. 44-47.

97. Jiao X, Rosenlund M, Hooper SD, et al., Structural alterations from multiple displacement amplification of a human genome revealed by mate-pair sequencing. PLoS One, 2011. 6(7): p. e22250.

98. Medvedev P, Stanciu M, and Brudno M, Computational methods for discovering structural variation with next-generation sequencing. Nat Methods, 2009. 6(11 Suppl): p. S13-20.

99. Huang CR, Schneider AM, Lu Y, et al., Mobile interspersed repeats are major structural variants in the human genome. Cell, 2010. 141(7): p. 1171-1182.

100. Kidd JM, Cooper GM, Donahue WF, et al., Mapping and sequencing of structural variation from eight human genomes. Nature, 2008. 453(7191): p. 56-64.

101. Payseur BA, Jing P, and Haasl RJ, A genomic portrait of human microsatellite variation. Mol Biol Evol, 2011. 28(1): p. 303-312.

102. Dickson D, Wellcome funds cancer database. Nature, 1999. 401(6755): p. 729.

103. Collins FS and Barker AD, Mapping the cancer genome. Pinpointing the genes involved in cancer will help chart a new course across the complex landscape of human malignancies. Sci Am, 2007. 296(3): p. 50-57.

104. Hudson TJ, Anderson W, Artez A, et al., International network of cancer genome projects. Nature, 2010. 464(7291): p. 993-998.

105. Stephens P, Edkins S, Davies H, et al., A screen of the complete protein kinase gene family identifies diverse patterns of somatic mutations in human breast cancer. Nat Genet, 2005. 37(6): p. 590-592.

106. Davies H, Hunter C, Smith R, et al., Somatic mutations of the protein kinase gene family in human lung cancer. Cancer Res, 2005. 65(17): p. 7591-7595.

107. Greenman C, Stephens P, Smith R, et al., Patterns of somatic mutation in human cancer genomes. Nature, 2007. 446(7132): p. 153-158.

108. Ding L, Getz G, Wheeler DA, et al., Somatic mutations affect key pathways in lung adenocarcinoma. Nature, 2008. 455(7216): p. 1069-1075.

109. Shah SP, Morin RD, Khattra J, et al., Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature, 2009. 461(7265): p. 809-813.

110. Stephens PJ, McBride DJ, Lin ML, et al., Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature, 2009. 462(7276): p. 1005-1010.

111. Pleasance ED, Stephens PJ, O'Meara S, et al., A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature, 2010. 463(7278): p. 184-190.

112. Ding L, Ellis MJ, Li S, et al., Genome remodelling in a basal-like breast cancer metastasis and xenograft. Nature, 2010. 464(7291): p. 999-1005.

113. Jones S, Wang TL, Shih Ie M, et al., Frequent mutations of chromatin remodeling gene ARID1A in ovarian clear cell carcinoma. Science, 2010. 330(6001): p. 228-231.

114. Edgren H, Murumagi A, Kangaspeska S, et al., Identification of fusion genes in breast cancer by paired-end RNA-sequencing. Genome Biol, 2011. 12(1): p. R6.

115. Berger MF, Lawrence MS, Demichelis F, et al., The genomic complexity of primary human prostate cancer. Nature, 2011. 470(7333): p. 214-220.

116. TCGA, Integrated genomic analyses of ovarian carcinoma. Nature, 2011. 474(7353): p. 609-615.

117. Nik-Zainal S, Alexandrov LB, Wedge DC, et al., Mutational processes molding the genomes of 21 breast cancers. Cell, 2012. 149(5): p. 979-993.

118. Nik-Zainal S, Van Loo P, Wedge DC, et al., The life history of 21 breast cancers. Cell, 2012. 149(5): p. 994-1007.

119. Ellis MJ, Ding L, Shen D, et al., Whole-genome analysis informs breast cancer response to aromatase inhibition. Nature, 2012. 486(7403): p. 353-360.

120. Shah SP, Roth A, Goya R, et al., The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature, 2012. 486(7403): p. 395-399.

121. Stephens PJ, Tarpey PS, Davies H, et al., The landscape of cancer genes and mutational processes in breast cancer. Nature, 2012. 486(7403): p. 400-404.

122. Banerji S, Cibulskis K, Rangel-Escareno C, et al., Sequence analysis of mutations and translocations across breast cancer subtypes. Nature, 2012. 486(7403): p. 405-409.

123. TCGA, Comprehensive molecular characterization of human colon and rectal cancer. Nature, 2012. 487(7407): p. 330-337.

124. TCGA, Comprehensive molecular portraits of human breast tumours. Nature, 2012. 490(7418): p. 61-70.

125. Campbell PJ, Yachida S, Mudie LJ, et al., The patterns and dynamics of genomic instability in metastatic pancreatic cancer. Nature, 2010. 467(7319): p. 1109-1113.

126. Chapman MA, Lawrence MS, Keats JJ, et al., Initial genome sequencing and analysis of multiple myeloma. Nature, 2011. 471(7339): p. 467-472.

127. Ding L, Ley TJ, Larson DE, et al., Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature, 2012. 481(7382): p. 506-510.

128. Lee W, Jiang Z, Liu J, et al., The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature, 2010. 465(7297): p. 473-477.

129. Hornsby C, Page KM, and Tomlinson IPM, What can we learn from the population incidence of cancer? Armitage and Doll revisited. The Lancet Oncology, 2007. 8(11): p. 1030-1038.

130. Beerenwinkel N, Antal T, Dingli D, et al., Genetic progression and the waiting time to cancer. PLoS Comput Biol, 2007. 3(11): p. e225.

131. Greenman C, Wooster R, Futreal PA, Stratton MR, and Easton DF, Statistical analysis of pathogenicity of somatic mutations in cancer. Genetics, 2006. 173(4): p. 2187-2198.

132. Ng PC and Henikoff S, SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res, 2003. 31(13): p. 3812-3814.

133. Adzhubei IA, Schmidt S, Peshkin L, et al., A method and server for predicting damaging missense mutations. Nat Methods, 2010. 7(4): p. 248-249.

134. Thomas PD, Campbell MJ, Kejariwal A, et al., PANTHER: a library of protein families and subfamilies indexed by function. Genome Res, 2003. 13(9): p. 2129-2141.

135. Schwarz JM, Rodelsperger C, Schuelke M, and Seelow D, MutationTaster evaluates disease-causing potential of sequence alterations. Nat Methods, 2010. 7(8): p. 575-576.

136. Sjöblom T, Systematic analyses of the cancer genome: lessons learned from sequencing most of the annotated human protein-coding genes. Curr Opin Oncol, 2008. 20(1): p. 66-71.

137. Carter H, Chen S, Isik L, et al., Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer Res, 2009. 69(16): p. 6660-6667.

138. Dees ND, Zhang Q, Kandoth C, et al., MuSiC: Identifying mutational significance in cancer genomes. Genome Res, 2012. 22(8): p. 1589-1598.

139. Bamford S, Dawson E, Forbes S, et al., The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br J Cancer, 2004. 91(2): p. 355-358.

140. Jemal A, Bray F, Center MM, Ferlay J, Ward E, and Forman D, Global cancer statistics. CA Cancer J Clin, 2011. 61(2): p. 69-90.

141. Ferlay J, Shin HR, Bray F, Forman D, Mathers C, and Parkin DM, GLOBOCAN 2008 v2.0, Cancer Incidence and Mortality Worldwide: IARC CancerBase No. 10 [Internet]. Lyon, France: International Agency for Research on Cancer. 2010. Available from: http://globocan.iarc.fr. Accessed on 24/09/2012.

142. Swedish National Board of Health and Welfare (Socialstyrelsen), Cancer incidence in Sweden 2010. 2011. Available from: http://www.socialstyrelsen.se/Lists/Artikelkatalog/Attachments/18530/2011-12-15.pdf

143. Swedish National Board of Health and Welfare (Socialstyrelsen), Cancer incidence in Sweden 2009. 2010. Available from: http://www.socialstyrelsen.se/Lists/Artikelkatalog/Attachments/18204/2010-12-17.pdf

144. Swedish National Board of Health and Welfare (Socialstyrelsen), Cancer incidence in Sweden 2008. 2009. Available from: http://www.socialstyrelsen.se/Lists/Artikelkatalog/Attachments/17841/2009-12-1.pdf

145. Miki Y, Swensen J, Shattuck-Eidens D, et al., A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science, 1994. 266(5182): p. 66-71.

146. Wooster R, Neuhausen SL, Mangion J, et al., Localization of a breast cancer susceptibility gene, BRCA2, to chromosome 13q12-13. Science, 1994. 265(5181): p. 2088-2090.

147. Hulka BS and Moorman PG, Breast cancer: hormones and other risk factors. Maturitas, 2001. 38(1): p. 103-113; discussion 113-116.

148. Sgroi DC, Preinvasive breast cancer. Annu Rev Pathol, 2010. 5: p. 193-221.

149. Polyak K, Breast cancer: origins and evolution. J Clin Invest, 2007. 117(11): p. 3155-3163.

150. American Joint Committee on Cancer, AJCC Cancer Staging Manual, 7th edition. 2010.

151. Schmutzler R, Schlegelberger B, Meindl A, et al., [Counselling, genetic testing and prevention in women with hereditary breast- and ovarian cancer. Interdisciplinary recommendations of the consortium "Hereditary Breast- and Ovarian Cancer" of the German Cancer Aid]. Zentralbl Gynakol, 2003. 125(12): p. 494-506.

152. Carlson RW, Moench SJ, Hammond ME, et al., HER2 testing in breast cancer: NCCN Task Force report and recommendations. J Natl Compr Canc Netw, 2006. 4 Suppl 3: p. S1-22; quiz S23-24.

153. Hammond ME, Hayes DF, Dowsett M, et al., American Society of Clinical Oncology/College Of American Pathologists guideline recommendations for immunohistochemical testing of estrogen and progesterone receptors in breast cancer. J Clin Oncol, 2010. 28(16): p. 2784-2795.

154. Perou CM, Sorlie T, Eisen MB, et al., Molecular portraits of human breast tumours. Nature, 2000. 406(6797): p. 747-752.

155. Sorlie T, Perou CM, Tibshirani R, et al., Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A, 2001. 98(19): p. 10869-10874.

156. Sorlie T, Tibshirani R, Parker J, et al., Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A, 2003. 100(14): p. 8418-8423.

157. Sotiriou C, Neo SY, McShane LM, et al., Breast cancer classification and prognosis based on gene expression profiles from a population-based study. Proc Natl Acad Sci U S A, 2003. 100(18): p. 10393-10398.

158. Langerod A, Zhao H, Borgan O, et al., TP53 mutation status and gene expression profiles are powerful prognostic markers of breast cancer. Breast Cancer Res, 2007. 9(3): p. R30.

159. Hicks J, Krasnitz A, Lakshmi B, et al., Novel patterns of genome rearrangement and their association with survival in breast cancer. Genome Res, 2006. 16(12): p. 1465-1479.

160. Hess KR, Pusztai L, Buzdar AU, and Hortobagyi GN, Estrogen receptors and distinct patterns of breast cancer relapse. Breast Cancer Res Treat, 2003. 78(1): p. 105-118.

161. Curtis C, Shah SP, Chin SF, et al., The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature, 2012. 486(7403): p. 346-352.

162. Baldwin AS, Regulation of cell death and autophagy by IKK and NF-kappaB: critical mechanisms in immune function and cancer. Immunol Rev, 2012. 246(1): p. 327-345.

163. Dillon RL, White DE, and Muller WJ, The phosphatidyl inositol 3-kinase signaling network: implications for human breast cancer. Oncogene, 2007. 26(9): p. 1338-1345.

164. Kinzler KW, Bigner SH, Bigner DD, et al., Identification of an amplified, highly expressed gene in a human glioma. Science, 1987. 236(4797): p. 70-73.

165. Bell DW, Our changing view of the genomic landscape of cancer. J Pathol, 2010. 220(2): p. 231-243.

166. Jones S, Chen WD, Parmigiani G, et al., Comparative lesion sequencing provides insights into tumor evolution. Proc Natl Acad Sci U S A, 2008. 105(11): p. 4283-4288.

167. Lovmar L and Syvanen AC, Multiple displacement amplification to create a long-lasting source of DNA for genetic studies. Hum Mutat, 2006. 27(7): p. 603-614.

168. Pugh TJ, Delaney AD, Farnoud N, et al., Impact of whole genome amplification on analysis of copy number variants. Nucleic Acids Res, 2008. 36(13): p. e80.

169. Lasken RS and Stockwell TB, Mechanism of chimera formation during the Multiple Displacement Amplification reaction. BMC Biotechnol, 2007. 7: p. 19.

170. King CR, Kraus MH, and Aaronson SA, Amplification of a novel v-erbB-related gene in a human mammary carcinoma. Science, 1985. 229(4717): p. 974-976.

171. Slamon DJ, Clark GM, Wong SG, Levin WJ, Ullrich A, and McGuire WL, Human breast cancer: correlation of relapse and survival with amplification of the HER-2/neu oncogene. Science, 1987. 235(4785): p. 177-182.

172. Yamazaki H, Fukui Y, Ueyama Y, et al., Amplification of the structurally and functionally altered epidermal growth factor receptor gene (c-erbB) in human brain tumors. Mol Cell Biol, 1988. 8(4): p. 1816-1820.

173. Fearon ER, Molecular genetics of colorectal cancer. Annu Rev Pathol, 2011. 6: p. 479-507.

174. T'Ang A, Varley JM, Chakraborty S, Murphree AL, and Fung YK, Structural rearrangement of the retinoblastoma gene in human breast carcinoma. Science, 1988. 242(4876): p. 263-266.

175. Petitjean A, Mathe E, Kato S, et al., Impact of mutant p53 functional properties on TP53 mutation patterns and tumor phenotype: lessons from recent developments in the IARC TP53 database. Hum Mutat, 2007. 28(6): p. 622-629.

176. Oda T, Kanai Y, Oyama T, et al., E-cadherin gene mutations in human gastric carcinoma cell lines. Proc Natl Acad Sci U S A, 1994. 91(5): p. 1858-1862.

177. Risinger JI, Berchuck A, Kohler MF, and Boyd J, Mutations of the E-cadherin gene in human gynecologic cancers. Nat Genet, 1994. 7(1): p. 98-102.

178. Eppert K, Scherer SW, Ozcelik H, et al., MADR2 maps to 18q21 and encodes a TGFbeta-regulated MAD-related protein that is functionally mutated in colorectal carcinoma. Cell, 1996. 86(4): p. 543-552.

179. Schutte M, Hruban RH, Hedrick L, et al., DPC4 gene in various tumor types. Cancer Res, 1996. 56(11): p. 2527-2530.

180. Morin PJ, Sparks AB, Korinek V, et al., Activation of beta-catenin-Tcf signaling in colon cancer by mutations in beta-catenin or APC. Science, 1997. 275(5307): p. 1787-1790.

181. Teng DH, Perry WL 3rd, Hogan JK, et al., Human mitogen-activated protein kinase kinase 4 as a candidate tumor suppressor. Cancer Res, 1997. 57(19): p. 4177-4182.

182. Duval A, Gayet J, Zhou XP, Iacopetta B, Thomas G, and Hamelin R, Frequent frameshift mutations of the TCF-4 gene in colorectal cancers with microsatellite instability. Cancer Res, 1999. 59(17): p. 4213-4215.

183. Gayther SA, Batley SJ, Linger L, et al., Mutations truncating the EP300 acetylase in human cancers. Nat Genet, 2000. 24(3): p. 300-303.

184. Liu CX, Musco S, Lisitsina NM, Forgacs E, Minna JD, and Lisitsyn NA, LRP-DIT, a putative endocytic receptor gene, is frequently inactivated in non-small cell lung cancer cell lines. Cancer Res, 2000. 60(7): p. 1961-1967.

185. Moberg KH, Bell DW, Wahrer DC, Haber DA, and Hariharan IK, Archipelago regulates Cyclin E levels in Drosophila and is mutated in human cancer cell lines. Nature, 2001. 413(6853): p. 311-316.

186. Philp AJ, Campbell IG, Leet C, et al., The phosphatidylinositol 3'-kinase p85alpha gene is an oncogene in human ovarian and colon tumors. Cancer Res, 2001. 61(20): p. 7426-7429.

187. Davies H, Bignell GR, Cox C, et al., Mutations of the BRAF gene in human cancer. Nature, 2002. 417(6892): p. 949-954.

188. Huusko P, Ponciano-Jackson D, Wolf M, et al., Nonsense-mediated decay microarray analysis identifies mutations of EPHB2 in human prostate cancer. Nat Genet, 2004. 36(9): p. 979-983.

189. Barbieri CE, Baca SC, Lawrence MS, et al., Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer. Nat Genet, 2012. 44(6): p. 685-689.

190. Grasso CS, Wu YM, Robinson DR, et al., The mutational landscape of lethal castration-resistant prostate cancer. Nature, 2012. 487(7406): p. 239-243.

191. Taylor BS, Schultz N, Hieronymus H, et al., Integrative genomic profiling of human prostate cancer. Cancer Cell, 2010. 18(1): p. 11-22.

192. Usary J, Llaca V, Karaca G, et al., Mutation of GATA3 in human breast tumors. Oncogene, 2004. 23(46): p. 7669-7678.

193. Tomlins SA, Rhodes DR, Perner S, et al., Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science, 2005. 310(5748): p. 644-648.

194. Sun X, Frierson HF, Chen C, et al., Frequent somatic mutations of the transcription factor ATBF1 in human prostate cancer. Nat Genet, 2005. 37(4): p. 407-412.

195. Carpten JD, Faber AL, Horn C, et al., A transforming mutation in the pleckstrin homology domain of AKT1 in cancer. Nature, 2007. 448(7152): p. 439-444.

196. George RE, Sanda T, Hanna M, et al., Activating mutations in ALK provide a therapeutic target in neuroblastoma. Nature, 2008. 455(7215): p. 975-978.

197. Chen Y, Takita J, Choi YL, et al., Oncogenic mutations of ALK kinase in neuroblastoma. Nature, 2008. 455(7215): p. 971-974.

198. Janoueix-Lerosey I, Lequin D, Brugieres L, et al., Somatic and germline activating mutations of the ALK kinase receptor in neuroblastoma. Nature, 2008. 455(7215): p. 967-970.

199. Mosse YP, Laudenslager M, Longo L, et al., Identification of ALK as a major familial neuroblastoma predisposition gene. Nature, 2008. 455(7215): p. 930-935.

200. van Haaften G, Dalgliesh GL, Davies H, et al., Somatic mutations of the histone H3K27 demethylase gene UTX in human cancer. Nat Genet, 2009. 41(5): p. 521-523.

201. Wiegand KC, Shah SP, Al-Agha OM, et al., ARID1A mutations in endometriosis-associated ovarian carcinomas. N Engl J Med, 2010. 363(16): p. 1532-1543.

202. Bass AJ, Lawrence MS, Brace LE, et al., Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion. Nat Genet, 2011. 43(10): p. 964-968.

Acta Universitatis Upsaliensis
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 822

Editor: The Dean of the Faculty of Medicine

A doctoral dissertation from the Faculty of Medicine, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine.

Distribution: publications.uu.se
urn:nbn:se:uu:diva-182319
