17 August 2017 | Vol 548 | Nature | 297
Article doi:10.1038/nature23306
Integrative clinical genomics of metastatic cancer
Dan R. Robinson1,2*, Yi-Mi Wu1,2*, Robert J. Lonigro1*, Pankaj Vats1, Erin Cobain3, Jessica Everett3, Xuhong Cao1, Erica Rabban1, Chandan Kumar-Sinha1,2, Victoria Raymond3, Scott Schuetze3, Ajjai Alva3, Javed Siddiqui1,2, Rashmi Chugh3, Francis Worden3, Mark M. Zalupski3, Jeffrey Innis4, Rajen J. Mody4, Scott A. Tomlins1,2, David Lucas2, Laurence H. Baker3, Nithya Ramnath3, Ann F. Schott3, Daniel F. Hayes3, Joseph Vijai5, Kenneth Offit5, Elena M. Stoffel3, J. Scott Roberts6, David C. Smith3, Lakshmi P. Kunju1,2, Moshe Talpaz7, Marcin Cieślik1,2* & Arul M. Chinnaiyan1,2,7,8,9
Tumour metastasis is the process by which cancer cells disperse from a primary site and progressively colonize distant organs. In over 90% of cases, metastatic spread of tumour cells is the greatest contributor to deaths from cancer1,2. With the preponderance of cancer patients enrolled in early stage (phase I–II) clinical trials harbouring metastatic disease2, and with the advent of genomic testing of tumours, there remains the promise of matching patients to the right therapy on the basis of comprehensive molecular profiles3 of pathogenic somatic4,5 and germline6 variants, and components of the functional genome, tumour phenotype, and tumour microenvironment afforded by RNA sequencing (RNA-seq)7,8. While metastatic tumours share key driver mutations with the primary tumour from which they arise, they often develop new mutations as they evolve during metastasis and treatment9. Thus, it is preferable to match patients to potential therapies and clinical trials on the basis of a real-time analysis of their metastatic tumour, rather than archival material of their primary tumour2,10.
In 2010, we introduced the Michigan Oncology Sequencing (MI-ONCOSEQ) Program, an institutional review board (IRB)-approved protocol to perform prospective, integrative exome and transcriptome sequencing of advanced cancers in patients7, mirroring the efforts of The Cancer Genome Atlas (TCGA) project, which focuses on generating exome and transcriptome sequence11 in primary cancers. The MI-ONCOSEQ program was among the first comprehensive, clinical DNA- and RNA-seq programs offered to cancer patients8,12–16. Its purpose was to determine the use of genomic sequencing of tumours and germline coupled with a multidisciplinary precision medicine tumour board (PMTB) in the management of advanced cancer in patients7,15. The program transitioned into sequencing in a clinical setting (under Clinical Laboratory Improvement Amendments) as part of the Clinical Sequencing Exploratory Research consortium in 2013 (refs 14–16).
In this study, we performed clinical-grade whole-exome (tumour/normal) and -transcriptome sequencing (that is, integrative sequencing) of 500 cancer patients harbouring metastatic cancers from over 30 primary sites and biopsied from over 22 organs (abbreviated as the ‘MET500’ cohort). Sequencing-matched tumour and normal samples from patients delineated potentially pathogenic germline alterations and provided high-resolution copy-number landscapes. RNA-seq analysis provided insights into functional gene fusions, transcriptional pathway activation, and a landscape of immune infiltration.
Molecular aberrations in metastatic cancers
We successfully obtained 537 biopsies from 556 enrolled patients, and complete sequencing results on 500 patients with metastatic cancers, representing a 93% success rate. Reasons for failure included lack of tumour content on biopsy (37 cases, 6.6%) and biopsy material not available (19 cases, 3.4%; patient declined biopsy, poor physical performance, unable to image site, unsafe for biopsy, insufficient tissue, or enrolled in other clinical trials). Most patients, 468 (93.6%), were seen at the University of Michigan Comprehensive Cancer Center; however, patients from 21 other institutions were also enrolled. The patient demographics were 258 (51.6%) males, 242 females (48.4%), 460 (92%) white, and 40 (8%) non-white. The median age of the cohort was 59 years, with a range of 18–86 (Supplementary Table 1 and Extended Data Fig. 1a, b).
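The enrolment figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the paragraph; the helper name `pct` is ours, and it assumes the 93% success rate is taken relative to the 537 biopsies obtained, which is the reading that matches the reported value.

```python
# Counts quoted in the text above.
enrolled = 556
biopsies_obtained = 537
sequenced = 500
no_tumour_content = 37    # biopsy obtained, but no usable tumour content
no_biopsy_material = 19   # no biopsy material available


def pct(part, whole, digits=1):
    """Percentage of `whole` represented by `part`, rounded to `digits` places."""
    return round(100.0 * part / whole, digits)


# The two failure categories account for every enrolled patient not sequenced.
failures = no_tumour_content + no_biopsy_material
accounted_for = (enrolled - failures == sequenced)
biopsies_consistent = (enrolled - no_biopsy_material == biopsies_obtained)

# Assumption: success rate = sequenced patients / biopsies obtained.
success_rate = pct(sequenced, biopsies_obtained, 0)
seen_at_michigan = pct(468, sequenced)
male_fraction = pct(258, sequenced)
```

Both consistency flags evaluate to true, so the failure counts, biopsy count, and final cohort size agree with one another.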
Figure 1a shows the cancer types (n = 20) represented in the MET500 cohort. The top three cancer types in our cohort included 93 (18.6%) metastatic prostate cancers, 91 (18.2%) metastatic breast cancers, and 42 (8.4%) soft tissue sarcomas. There were also 25 (5%) carcinomas of unknown primary origin. Figure 1b highlights the diverse metastatic sites analysed (n > 30) in the MET500 cohort. The most prevalent sites of metastases included 134 liver, 114 lymph node, 46 lung, 42 bone, and 32 abdominal mass/ascites/pleural fluid.
Metastasis is the primary cause of cancer-related deaths. Although The Cancer Genome Atlas has sequenced primary tumour types obtained from surgical resections, much less comprehensive molecular analysis is available from clinically acquired metastatic cancers. Here we perform whole-exome and -transcriptome sequencing of 500 adult patients with metastatic solid tumours of diverse lineage and biopsy site. The most prevalent genes somatically altered in metastatic cancer included TP53, CDKN2A, PTEN, PIK3CA, and RB1. Putative pathogenic germline variants were present in 12.2% of cases of which 75% were related to defects in DNA repair. RNA sequencing complemented DNA sequencing to identify gene fusions, pathway activation, and immune profiling. Our results show that integrative sequence analysis provides a clinically relevant, multi-dimensional view of the complex molecular landscape and microenvironment of metastatic cancers.
1Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, Michigan 48109, USA. 2Department of Pathology, University of Michigan, Ann Arbor, Michigan 48109, USA. 3Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan 48109, USA. 4Department of Pediatrics, University of Michigan, Ann Arbor, Michigan 48109, USA. 5Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA. 6Department of Health Behavior & Health Education, School of Public Health, University of Michigan, Ann Arbor, Michigan 48109, USA. 7Comprehensive Cancer Center, University of Michigan, Ann Arbor, Michigan 48109, USA. 8Department of Urology, University of Michigan, Ann Arbor, Michigan 48109, USA. 9Howard Hughes Medical Institute, Chevy Chase, Maryland 20815, USA.
*These authors contributed equally to this work.
© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.
For each patient, we performed paired-exome sequencing on tumour and germline DNA to identify likely pathogenic variants and resolve their somatic or germline origin. Mean target coverages for tumour and normal exomes were 180× and 120×, respectively. The average tumour content was 62%. Sequencing metrics are summarized in Supplementary Table 2. Within the targeted regions, we identified an average of 119 somatic mutations per patient. For most cancer types, the number of mutations significantly increased in metastases relative to primary tumours in TCGA (Fig. 1c). The difference was more pronounced for tumour types with low mutation burden in the primary stage, for example prostate or adrenal cancer. To identify the most recurrent, and hence likely pathogenic, targets of genetic alterations, we performed an integrative analysis of single-nucleotide variants (SNVs), copy-number variants (CNVs), and gene fusions. For each patient and gene, we classified the most recurrently mutated genes as putative tumour suppressors or oncogenes on the basis of the increase in frequency of inactivating mutations and expert knowledge (Fig. 1d, e and Supplementary Table 3). We found a long-tailed mutational spectrum for both tumour suppressors and oncogenes. TP53 (266, 53.2%), CDKN2A (80, 16%), PTEN (79, 15.8%), and RB1 (68, 13.6%) were the most frequently altered tumour suppressors, while the most frequently mutated oncogenes included PIK3CA (67, 13.4%), AR (63, 12.6%), and KRAS (51, 10.2%). Overall, tumour suppressors were altered across many cancer types (for example, TP53 and RB1), while oncogenes were more strongly associated with individual cancer types (for example, AR or GNAS) (Extended Data Fig. 1c). We further compared the alteration frequencies with those from primary tumours and found that the increase in mutation burden (Fig. 1c) was mirrored by an increase in the frequency of genetic aberrations for the most widely mutated genes (Extended Data Fig. 1d).
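The per-gene frequencies quoted above are simply patient counts divided by the cohort size of 500. A minimal sketch of that tabulation follows; the counts are taken from the paragraph, while the function and variable names are illustrative and not part of the authors' pipeline.

```python
COHORT_SIZE = 500

# Number of patients with an alteration in each gene, as quoted in the text.
altered_patients = {
    "TP53": 266, "CDKN2A": 80, "PTEN": 79, "RB1": 68,   # tumour suppressors
    "PIK3CA": 67, "AR": 63, "KRAS": 51,                 # oncogenes
}


def alteration_frequencies(counts, cohort_size):
    """Return (gene, percent) pairs sorted from most to least frequently altered."""
    freqs = {gene: round(100.0 * n / cohort_size, 1) for gene, n in counts.items()}
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)


frequencies = alteration_frequencies(altered_patients, COHORT_SIZE)
for gene, percent in frequencies:
    print(f"{gene:8s}{percent:5.1f}%")
```

Sorting by frequency makes the long-tailed shape visible: TP53 stands well apart, and the remaining genes cluster between roughly 10% and 16%.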
Germline variants in metastatic cancer
Through sequencing-matched germline DNA, we identified 63 presumed pathogenic germline mutations (PPGMs), as defined by ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/) expert curation, involving 18 genes in 61 individuals (12.2%) (Fig. 2a). These included 30 deleterious missense mutations, 8 nonsense mutations, 20 frameshift mutations, and 5 deleterious splice site mutations (Fig. 2b and Supplementary Table 4). In total, 75% of the PPGMs were in genes related to DNA repair; of these, mutations in MUTYH (n = 10, 16%), BRCA2 (n = 9, 14%), CHEK2 (n = 9, 14%), and BRCA1 (n = 5, 8%) were the most common. Outside DNA repair pathways, we observed PPGMs in APC (n = 6, 9.5%), MITF (n = 5, 8%), and HOXB13 (n = 3, 5%), among other genes. Of the 63 instances of pathogenic alleles identified, 5 were previously unreported, while the remaining alleles had existing ClinVar entries with assigned pathogenic or likely pathogenic importance. Of the 61 individuals with PPGMs identified in this study, 30 (49%) had a somatic second allele aberration within the tumour genome, including loss of heterozygosity, and exhibited molecular phenotypes consistent with pathogenicity (Supplementary Table 4). The remaining cases were of carrier status for the identified allele.
Next, we compared the frequencies in genes with PPGMs discovered in the MET500 cohort with the population frequencies in 52,790 individuals compiled by the Exome Aggregation Consortium (http://exac.broadinstitute.org/), excluding TCGA cancer samples17. The odds of any PPGM in metastatic cancer significantly exceeded the odds found in the populations comprising the Exome Aggregation Consortium (odds ratio = 3.00, range 2.28–3.9, P = 1 × 10^−13). The genes analysed and found to be enriched in the metastatic series included BRCA1, BRCA2, APC, CHEK2, MITF, MLH1, NBN, and RB1 (Supplementary Table 5).
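An odds ratio of this kind comes from a 2×2 table of carriers and non-carriers in cases versus controls. The sketch below shows one standard way to compute it, with a Wald confidence interval on the log scale. This is an assumption: this section does not say how the reported range or P value was obtained. The case counts (61 of 500) come from the text. The control carrier count is a hypothetical placeholder, because the text gives only the control cohort size, so the output is not expected to reproduce the published value of 3.00.

```python
import math


def odds_ratio_wald(case_pos, case_neg, ctrl_pos, ctrl_neg, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    z = 1.96 gives an approximate 95% interval. All four cells must be
    positive, because the standard error uses their reciprocals.
    """
    if min(case_pos, case_neg, ctrl_pos, ctrl_neg) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    estimate = (case_pos * ctrl_neg) / (case_neg * ctrl_pos)
    std_err = math.sqrt(1 / case_pos + 1 / case_neg + 1 / ctrl_pos + 1 / ctrl_neg)
    lower = math.exp(math.log(estimate) - z * std_err)
    upper = math.exp(math.log(estimate) + z * std_err)
    return estimate, lower, upper


# From the text: 61 of 500 MET500 patients carried a PPGM.
CASE_CARRIERS, CASE_TOTAL = 61, 500
# From the text: 52,790 control individuals.
CONTROL_TOTAL = 52790
# HYPOTHETICAL: the control carrier count is not given in this section.
HYPOTHETICAL_CONTROL_CARRIERS = 2500

estimate, lower, upper = odds_ratio_wald(
    CASE_CARRIERS,
    CASE_TOTAL - CASE_CARRIERS,
    HYPOTHETICAL_CONTROL_CARRIERS,
    CONTROL_TOTAL - HYPOTHETICAL_CONTROL_CARRIERS,
)
print(f"OR = {estimate:.2f} (approx. 95% CI {lower:.2f} to {upper:.2f})")
```

With sparse per-gene counts, an exact test would normally be preferred over the Wald approximation; the function is kept deliberately simple to show the structure of the calculation.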
The gene fusion landscape of metastatic cancer
To identify both activating and inactivating gene fusions, we analysed 868 transcriptome libraries from 496 metastatic tumour RNAs. These fusion junctions involved 12,027 unique gene pairs, an average of 34 gene fusions per tumour, derived from a range of structural aberrations (Fig. 3a). Large differences in fusion burden were observed across tumour type (Extended Data Fig. 2a). One hundred
[Figure 1 graphic not reproduced. Panels a and b: case counts by cancer type and by biopsy site. Panel c: log10(mutational burden) by cohort, MET500 versus TCGA. Panels d and e: per-patient alteration maps (amplification, missense, deleterious, two-hit, deletion) with histograms of patients with mutations (%).]
Cohort abbreviations (panel c): ACC, adrenal; BLCA, bladder; BRCA, breast; CHOL, gall bladder; COLO, colon; ESCA, oesophagus; GBM, glioblastoma; HCC, liver; HNSC, head-neck; KDNY, kidney; LUNG, lung; OV, ovarian; PAAD, pancreas; PRAD, prostate; SARC, sarcoma; SECR, miscellaneous gland; SKCM, melanoma; STAD, stomach; TGCT, germ-cell; THCA, thyroid; THYM, thymoma; MISC, other/unknown.
Figure 1 | Landscape of molecular alterations in metastatic cancer. a, Cancer types in the MET500 cohort. Number of cases indicated for each cancer type. GIST, gastrointestinal stromal tumour. b, Site of biopsies. c, Mutational burden across tumour types from the MET500 and corresponding primary TCGA cohorts. Transparent box plots signify insignificant differences (Wilcoxon rank-sum test false discovery rate ≥ 0.1). d, e, Landscape of molecular alterations in the MET500 cohort. Each cell represents the mutation status of an individual gene for a single patient. Putative oncogenes are represented in d and putative tumour-suppressor genes in e. The percentage of mutations across the MET500 cohort is represented by vertical histograms.
and ninety-nine cases (39.8%) harboured at least one putative pathogenic fusion, with 138 activating fusions and 103 deleterious fusions (Supplementary Table 6). The activating fusions could be classified as DNA-binding (n = 88), protein kinases (n = 29), and signal transducers (n = 21) (Fig. 3b). The loss-of-function fusions segregated into canonical tumour-suppressor genes (n = 59), chromatin-modifying genes (n = 35), and genes involved in cell adhesion (n = 9). The most commonly fused tumour-suppressor genes were NF1 (n = 18), TP53 (n = 11), PTEN (n = 11), and RB1 (n = 6) (Extended Data Fig. 2b). Interestingly, we identified a series of eight novel fusion pairs in metastatic cancers that we believe are pathogenic (Fig. 3c). These include activated FGFR, BRAF, and ALK fusions with novel partners, extending the range of both fusion partners and cancer types for these clinically targetable fusions18–22. Novel gene fusions with functional domains include GREB1–NR4A3 in uterine leiomyosarcoma, POC5–PRKD1 in polymorphous low-grade adenocarcinoma of the tongue, and CIC–CITED1 in undifferentiated high-grade sarcoma. Notch fusions fall into two classes: those predicted to be sensitive to γ-secretase inhibition (for example, NOTCH2–SPAG17), and fusions that are independent of γ-secretase processing (for example, PARS2–NOTCH2).
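The fusion counts in this paragraph can be tallied to confirm that the subclasses sum to the stated totals and to express each class as a share of all putative pathogenic fusions. The sketch below uses only the counts given above; the dictionary and variable names are illustrative. Shares are computed by simple division, so they may differ by a rounding step from percentages shown in the published figure.

```python
# Subclass counts quoted in the text.
activating = {"DNA binding": 88, "kinase": 29, "signal transducer": 21}
loss_of_function = {"tumour suppressor": 59, "chromatin modifying": 35, "cell adhesion": 9}

total_activating = sum(activating.values())
total_loss_of_function = sum(loss_of_function.values())
total_fusions = total_activating + total_loss_of_function

# Share of each activating class, plus loss of function as a single class.
shares = {
    name: round(100.0 * count / total_fusions, 1) for name, count in activating.items()
}
shares["loss of function"] = round(100.0 * total_loss_of_function / total_fusions, 1)

for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20s}{share:5.1f}%")
```

The subclass counts add up to the 138 activating and 103 deleterious fusions stated in the text, giving 241 putative pathogenic fusions in total.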
Transcriptional signatures of metastatic disease
To investigate the potential clinical use of metastatic expression profiles, we analysed transcriptomes of the 496 biopsy samples (868 libraries). We first evaluated to what extent tissue- and cancer-specific gene expression was maintained across metastatic lesions. We used the t-distributed stochastic neighbour embedding (t-SNE) projection23 to qualitatively visualize the expression of primary cancer markers across the MET500. Compared with primary tumours, metastatic samples were less well separated, more heterogeneous, and did not segregate on the basis of biopsy site, with the exception of liver biopsies (Fig. 4a and Extended Data Fig. 2c, d). We compared the expression of tissue-specific marker genes derived from 36 normal tissues24 between normal, primary, and metastatic samples, and observed significant de-differentiation with disease progression (Extended Data Fig. 2e).
Next, we looked at transcriptional signatures associated with perturbed cancer-related genes25,26. Compared with normal
[Figure 2 graphic not reproduced. Panel a: share of pathogenic germline alterations by gene: MUTYH (16%), BRCA2 (14%), CHEK2 (14%), APC (9.5%), BRCA1 (8%), MITF (8%), HOXB13 (5%), MLH1 (3.2%), MSH2 (3.2%), NBN (3.2%), PALB2 (3.2%), RAD50 (3.2%), and DICER1, RB1, ATM, BRIP1, FANCA, and RAD51D (no percentage shown). Panel b: protein-domain schematics for BRCA1, BRCA2, HOXB13, MITF, MUTYH, APC, CHEK2, MLH1, MSH2, NBN, RAD50, and PALB2, marking missense, frameshift (indel), nonsense (X), and splice site mutations; colour indicates LOH status (green, copy-neutral LOH; red, one-copy-loss LOH; blue, no LOH).]
Figure 2 | Putative pathogenic germline variants in metastatic cancers. a, Pathogenic germline alterations identified in the MET500 cohort. DNA repair pathway-related variants are indicated in shades of blue, while ‘other’ alterations are indicated in shades of red. b, Gene-level schematic of pathogenic germline variants identified in the MET500 cohort. LOH, loss of heterozygosity. *, stop gain; fs, frameshift.
[Figure 3 graphic not reproduced. Panel a: fusions by structural aberration: translocation (38%), inversion (25%), duplication (22%), deletion (15%). Panel b: functional fusion classes: loss of function (43%), DNA binding (36%), kinase (12%), signal transducer (9%). Panel c: exon-level structures of the CIC–CITED1, BCAP29–BRAF, CCDC61–ALK, GREB1–NR4A3, NFATC2–EYA2, NOTCH2–SPAG17, PARS2–NOTCH2, and POC5–PRKD1 fusions; tumour types labelled in the panel are seminal vesicle carcinoma, breast (triple-negative, HER2+, and ER+), colon cancer, high-grade sarcoma, leiomyosarcoma, and adenocarcinoma of the tongue.]
Figure 3 | Diverse classes of gene fusions identified in metastatic cancers. a, Fusions classified by underlying structural aberrations. b, Functional gene fusions identified in the MET500 cohort. c, Molecular structure of novel, potentially activating, gene fusions in the MET500 cohort.
tissues, transcriptional output was increased for most oncogenic signatures (Extended Data Fig. 3a, b), indicating a global shift towards a cancer-related transcriptional program. Unsupervised clustering of signature scores across patients revealed relevant associations between gene sets, and phenotypic similarities among patients (Extended Data Fig. 4). Inference of patient-specific activities27,28 revealed coordinated changes across curated pathways that coalesced into a small number of principal cancer hallmarks (Extended Data Fig. 5a, b): interferon response, inflammatory response, epithelial to mesenchymal transition (EMT), proliferation, and metabolism. Importantly, these associations were robust to algorithm choice. Compared with normal tissues, metastatic tumours show a global increase in proliferation, stress response, and metabolism. Conversely, hallmarks of EMT and cancer-immune responses can be either up- or downregulated (Extended Data Fig. 5a, b). Next, we computationally delineated 25 non-redundant experimental ‘meta-signatures’ (Extended Data Fig. 6). Unsupervised clustering and correlation analysis of meta-signatures revealed four of the canonical cancer hallmarks: immune response, EMT, proliferation, and metabolism (Fig. 4b). Metastatic tumours fall into two main subtypes: an EMT-like subtype associated with inflammation signatures29, and a proliferative subtype associated with increased metabolism and systemic stress. In agreement, we observed mutual exclusivity between curated proliferative and EMT gene sets (Fig. 4c). Interestingly, this trend was less prominent across primary tumours (Extended Data Fig. 7a). Importantly, meta-signature activities were found to be weakly associated with biopsy site (Extended Data Fig. 7b) and primary tissue (Extended Data Fig. 7c), and held independently for common cancer types and biopsy sites (Extended Data Fig. 7d).
The immune microenvironment of metastatic disease
To characterize the phenotype of host immune responses, we leveraged exome, RNA-seq, and a dedicated assay for T-cell repertoire profiling. On the basis of immune-cell markers proposed in ref. 30, we developed an RNA-seq-based score, MImmScore, to assess the magnitude of leukocyte infiltration. We found that MImmScore was negatively correlated with tumour content (Extended Data Fig. 8a), and positively correlated with stromal infiltration (Extended Data Fig. 8b). MImmScore was compared with canonical T-cell expression markers (RNA-seq-based) and DNA-based T-cell receptor-β (TCRβ) CDR3 sequencing: all three analyses were in good agreement (Extended Data Fig. 8c, d). We also discovered that metastatic immune infiltration was strongly determined by tumour type (Fig. 5a) and to a lesser degree by biopsy site (Extended Data Fig. 9a). Cancer types known to be infiltrated in the localized stage (Extended Data Fig. 9b), including kidney cancer31,32, lung cancer33, and melanoma34, remained infiltrated at metastatic sites. Less immunogenic types, such as breast and prostate cancers, were generally associated with a lower MImmScore in the primary and metastatic stages (Fig. 5a and Extended Data Fig. 9b). Immune infiltration was found to be heterogeneous not only across cancer types but also within individual cohorts (Extended Data Fig. 9c, d). Strikingly, individual patients with high levels of immune infiltration could be identified even within tumour types that did not thus far respond to immunotherapies.
We hypothesized that metastatic tumours differ not only in the magnitude but also in composition of leukocyte cell types, tumour-infiltrating lymphocytes (TILs), and macrophages. Unsupervised clustering revealed groups of samples with significant differences in TIL composition, on the basis of bulk tumour transcriptome data35 (Fig. 5b). Cancers were most strongly typified by the different ratios of M2 to M0 (unpolarized) macrophages (clusters TIL-2, -4, -6, -7) and different CD8+ to CD4+ T-cell ratios (high CD8+ TIL-1, high CD4+ TIL-4). While immunosuppressive M2 macrophages36 were highly prevalent, pro-inflammatory, anti-tumour M1 macrophages were largely absent. A small cluster of samples (TIL-5) was characterized by a dominant ratio of cytotoxic CD8+ T cells. To assess clonal T-cell expansion, we selected index cases with a high MImmScore and CD8+ T-cell ratio or with low immune infiltration. We ascertained the identity and frequency of T-cell clones by TCRβ CDR3 deep sequencing, and found that the estimated numbers of T cells were markedly increased in the index cases (Supplementary Table 7). Most importantly, this increase was correlated with a significant expansion of T-cell clones (increased clonality) (Fig. 5c and Extended Data Fig. 9e) and a concomitant decrease in the ratio of regulatory T cells to cytotoxic T cells (Extended Data Fig. 9f). Highly mutated samples were found to be associated with a larger number of infiltrating T cells (Fig. 5d) and an increased MImmScore (Extended Data Fig. 10a).
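This section refers to clonality without defining it. As an illustration only, the sketch below assumes a convention that is common in TCR repertoire analysis: one minus the normalized Shannon entropy of clone frequencies, so that 0 means all clones are equally abundant and values near 1 mean a few clones dominate. The paper's Methods may use a different definition, and the template counts here are made up for the example.

```python
import math


def clonality(template_counts):
    """Clonality as 1 - (Shannon entropy / log of the number of distinct clones).

    `template_counts` holds one count per distinct TCR-beta CDR3 clone.
    This definition is an assumption, not taken from the paper.
    """
    counts = [c for c in template_counts if c > 0]
    if not counts:
        raise ValueError("at least one clone with a positive count is required")
    if len(counts) == 1:
        return 1.0  # a single clone is maximally clonal by convention
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return 1.0 - entropy / math.log(len(counts))


# Hypothetical repertoires for illustration.
even_repertoire = [10, 10, 10, 10]        # no clone dominates
expanded_repertoire = [970, 10, 10, 10]   # one clone has expanded

print(f"even:     {clonality(even_repertoire):.3f}")
print(f"expanded: {clonality(expanded_repertoire):.3f}")
```

Under this definition an expanded repertoire scores much higher than an even one, which is the sense in which the text links increased T-cell numbers to increased clonality.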
[Figure 4 graphic not reproduced. Panel a: t-SNE scatter plot (Dimension 1 versus Dimension 2), coloured by cohort (BRCA, CHOL, GBM, HCC, HNSC, MISC, OV, PRAD, SARC, SKCM). Panel b: correlation heatmap (scale −0.5 to 1.0) of meta-signatures: Immune response (3), Hypoxia, Allograft rejection, Immune response (1), Immune response (2), Interferon response, Inflammation, Metastasis, EMT (2), EMT (1), Muscle, Epithelial genes, Proliferation (2), Cell cycle lit., DNA replication, Proliferation (1), Cell cycle (reactome), Homologous recombination, Telomere maintenance, Proteasome pathway, Cancer signature, Translation, TCA cycle. Panel c: scatter plot of EMT signature versus S phase (FANCA) signature.]
Figure 4 | Diverse transcriptional profiles of metastatic cancers. a, Global gene expression patterns of the MET500 identify poorly differentiated cancers as illustrated by a t-SNE projection of the MET500 samples. Position of samples within the plot reflects the relative similarity in the expression of cancer-specific markers. Samples are colour-coded on the basis of their assigned analysis cohort. See Fig. 1c for cancer abbreviations; MISC, miscellaneous. b, Correlation heatmap and hierarchical clustering showing similarities (red) and dissimilarities (blue) in the transcriptional activity of computationally derived aggregate sets of signatures across the MET500 ‘meta-signatures’; lit., signature extracted from literature. c, Negative correlation between signatures of EMT and proliferation (S phase of the cell cycle, FANCA pathway). All MET500 samples are shown.
Next, we focused on the expression of ligand/receptor pairs on the surface of T cells and antigen-presenting cells (APCs). These molecules are either co-stimulatory and required for T-cell activation, or co-inhibitory as in the case of immune checkpoints (Fig. 5e). Most patients were either immunologically silent (clusters Tcell-0, APC-0) or immunologically active (Tcell-1, APC-1), with a highly significant overlap between the independent cluster analyses (Extended Data Fig. 10b, c). Importantly, almost all samples in APC-1 expressed CD80/CD86 and almost all samples in Tcell-1 expressed CD28. CD80/CD86 are the ligands for the CD28 receptor and a critical signal for T-cell activation.
Finally, we examined the relationships between the emerging predictive biomarkers for immune therapy and the transcriptomic immune phenotypes. We stratified patients into three categories: immunologically silent, partly active, and fully active. A sample was categorized as fully active if it was a member of all of the active clusters (TIL-5, APC-1, Tcell-1), and as partly active if it was a member of at least one but not all of them. Patients in the active categories exhibited increased levels of expression biomarkers PD-L1 (ref. 37) (Fig. 5f), HLA38, and granzyme39, and had higher mutational burden (Fig. 5g), which is both a prognostic and predictive marker40. Finally, leveraging a predictive signature to immunotherapy in metastatic melanoma41, we developed a clinical-response score. As expected, immunologically active patients had significantly higher clinical-response scores (Fig. 5h).
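The three-way stratification described above reduces to counting how many of the three active clusters a sample belongs to. The sketch below is one way to express that rule; the cluster labels come from the text, while the function name, the dictionary layout, and the example samples are hypothetical.

```python
# Active cluster on each of the three independent clusterings (from the text).
ACTIVE_CLUSTERS = {"TIL": "TIL-5", "APC": "APC-1", "Tcell": "Tcell-1"}


def immune_category(assignments):
    """Classify a sample as 'silent', 'partial', or 'complete'.

    `assignments` maps each clustering ("TIL", "APC", "Tcell") to the
    cluster label the sample received; missing keys count as not active.
    """
    hits = sum(
        assignments.get(axis) == active for axis, active in ACTIVE_CLUSTERS.items()
    )
    if hits == len(ACTIVE_CLUSTERS):
        return "complete"
    if hits > 0:
        return "partial"
    return "silent"


# Hypothetical samples for illustration.
samples = {
    "sample_A": {"TIL": "TIL-5", "APC": "APC-1", "Tcell": "Tcell-1"},
    "sample_B": {"TIL": "TIL-2", "APC": "APC-1", "Tcell": "Tcell-0"},
    "sample_C": {"TIL": "TIL-4", "APC": "APC-0", "Tcell": "Tcell-0"},
}
categories = {name: immune_category(a) for name, a in samples.items()}
print(categories)
```

Keeping the active labels in a single mapping means the rule can be re-run unchanged if a different cluster is later judged to be the active one on any axis.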
Discussion
Decreases in the cost of sequencing have led to the widespread adoption of integrative sequencing for the study of cancer and precision oncology. Accordingly, our real-time clinical sequencing program was established to explore the practical challenges of clinical translation and at the same time to characterize the genomic landscape of advanced cancer. The resulting MET500 cohort represents the first assessment of the genetic and transcriptomic heterogeneity across a wide range of metastatic cancers.
The distribution of mutation frequencies across diverse lineages of metastatic cancers is extremely long-tailed, with relatively few genes mutated at a high rate. We found that 12.2% of our cases harboured potentially pathogenic germline variants, most of which (75%) were related to DNA repair pathways. Mutations in DNA repair pathways
[Figure 5 graphic not reproduced. Panel a: MImmScore by cohort. Panel b: predicted infiltrate (%) by cluster (TIL-1 to TIL-7) for NK cells resting, macrophages M0, macrophages M1, T cells follicular helper, T cells CD8, B cells naive, T cells CD4 memory resting, mast cells resting, and macrophages M2. Panel c: clonality versus number of T cells (TCRB-seq). Panel d: number of T cells (TCRB-seq) versus number of mutations. Correlation labels in the extracted text for panels c and d: 0.86 (***) and 0.68 (**). Panel e: expression Z-score heatmaps for clusters APC-0 to APC-8 (HHLA2, LGALS9, VTCN1, ICOSLG, CD276, TNFSF4, TNFSF18, TNFRSF14, TNFRSF8, CD70, TNFSF9, ENTPD1, CD80, PDCD1LG2, CD86, CD274, CD40) and clusters Tcell-0 to Tcell-7 (TMIGD2, CD28, CD40LG, BTLA, CD27, ICOS, NT5E, TNFSF8, ADORA2A, LAG3, TNFRSF18, PDCD1, HAVCR2, TNFRSF9, CTLA4, TNFRSF4). Panels f–h: PD-L1 (RPKM), non-synonymous mutations, and clinical-response score for the silent, partial, and complete groups.]
Figure 5 | The immune microenvironment of metastatic cancers. a, Magnitude of immune (leukocyte) infiltration (MImmScore) across the MET500 analysis cohort. b, Hierarchical clustering of samples by their predicted immune infiltrates. c, d, TCR profiling by TCRβ DNA deep sequencing (TCRB–seq). Correlation of estimated T-cell numbers (templates): c, with clonal expansion (high clonality indicates that many T cells have the same TCRβ sequence) (ρ, Spearman’s rank correlation coefficient); d, with number of mutations. e, Clusters of patients based on the normalized expression levels of APC (left) or T-cell (right) surface molecules. f–h, Genomic correlates for patients grouped by their membership in immunologically active clusters: TIL-5, APC-1, Tcell-1 (silent, none; partial, some; complete, all). f, Expression of PD-L1 (t-test); RPKM, reads per kilobase per million mapped reads. g, Number of non-synonymous mutations (Wilcoxon rank-sum test). h, Response score based on a predictive gene expression signature to immunotherapy (t-test). Significance levels: * P = 0.05–0.001, ** P = 0.001–10^−6, *** P ≤ 10^−6.
have therapeutic implications: hypermutated tumours may respond to immune checkpoint inhibitors42, while HR deficiency could suggest sensitivity to PARP inhibitors42,43. The high prevalence of likely pathogenic germline variants suggests that metastatic patients should be considered for genetic counselling and associated germline testing.
By integrating whole-exome sequencing with RNA-seq, we were able to demonstrate that transcriptome profiling provides clinically important and complementary molecular information. We have demonstrated how RNA-seq can be used in a clinical context to characterize gene fusions, outlier gene expression, transcriptional pathways, and the immune microenvironment. Across the MET500 cohort, 37% of cases harboured a putative driver fusion, or an inactivating fusion in a tumour-suppressor gene. RNA-seq data played an important part in characterizing the transcriptional networks active in tumour cells as well as the metastatic tumour microenvironment, and suggest that metastatic tumours are significantly de-differentiated, but retain some tissue- and cancer-specific gene expression patterns. We were able to delineate two distinct types of metastasis: proliferative and EMT-like. Interestingly, proliferative tumours were associated with increased metabolism and stress response, while EMT-like tumours were associated with inflammation-related signatures.
Particularly valuable in the context of immunotherapy are mechanism-driven biomarkers that delineate discrete immune checkpoints or mechanisms of immune evasion. However, immune biomarkers need to characterize a complex disease state comprising the tumour genotype (for example, mutational burden), phenotype (for example, PD-L1 expression), and host response (for example, presence of CD8+ T cells). Towards comprehensive immunogenomic profiling, in this study we leveraged DNA- and RNA-seq data, which enabled us to characterize not only the tumour genotype but also the phenotype of the host immune response. Our results demonstrate the feasibility of using RNA-seq data to delineate immunologically and potentially clinically distinct subtypes of metastatic tumours, highlighting the potential of clinical RNA-seq for monitoring the tumour microenvironment and guiding immunotherapeutic approaches.
While this study has compared the molecular attributes of a metastatic cancer cohort with those of primary cancer cohorts, it did not utilize matched samples of primary and metastatic biopsies from individual cases. The sequencing of matched samples could illuminate further the processes behind tumour evolution, resistance to therapy, and immune interactions.
In summary, the metastatic solid tumour cohort represented in this study is a powerful complement to studies that have been done on primary cancers. Metastatic cancer is a highly heterogeneous disease at the genetic, transcriptomic, and microenvironment levels. Progress in the treatment of advanced cancer will therefore depend on our ability to learn the therapeutic implications of metastatic heterogeneity and to develop screening methods and clinical trial designs that match patients to the most promising therapies.
Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.
Received 23 November 2016; accepted 25 June 2017.
Published online 2 August 2017.
1. Mehlen, P. & Puisieux, A. Metastasis: a question of life or death. Nat. Rev. Cancer 6, 449–458 (2006).
2. Steeg, P. S. Targeting metastasis. Nat. Rev. Cancer 16, 201–218 (2016).
3. Friedman, A. A., Letai, A., Fisher, D. E. & Flaherty, K. T. Precision medicine for cancer with next-generation functional diagnostics. Nat. Rev. Cancer 15, 747–756 (2015).
4. Mauer, C. B., Pirzadeh-Miller, S. M., Robinson, L. D. & Euhus, D. M. The integration of next-generation sequencing panels in the clinical cancer genetics practice: an institutional experience. Genet. Med. 16, 407–412 (2014).
5. Shen, T., Pajaro-Van de Stadt, S. H., Yeat, N. C. & Lin, J. C. Clinical applications of next generation sequencing in cancer: from panels, to exomes, to genomes. Front. Genet. 6, 215 (2015).
6. Jones, S. et al. Personalized genomic analyses for cancer mutation discovery and interpretation. Sci. Transl. Med. 7, 283ra53 (2015).
7. Roychowdhury, S. et al. Personalized oncology through integrative high-throughput sequencing: a pilot study. Sci. Transl. Med. 3, 111ra121 (2011).
8. Byron, S. A., Van Keuren-Jensen, K. R., Engelthaler, D. M., Carpten, J. D. & Craig, D. W. Translating RNA sequencing into clinical diagnostics: opportunities and challenges. Nat. Rev. Genet. 17, 257–271 (2016).
9. Turajlic, S. & Swanton, C. Metastasis as an evolutionary process. Science 352, 169–175 (2016).
10. Robinson, D. et al. Integrative clinical genomics of advanced prostate cancer. Cell 161, 1215–1228 (2015).
11. The Cancer Genome Atlas Research Network, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
12. Gagan, J. & Van Allen, E. M. Next-generation sequencing to guide cancer therapy. Genome Med. 7, 80 (2015).
13. Mardis, E. R. The translation of cancer genomics: time for a revolution in clinical cancer care. Genome Med. 6, 22 (2014).
14. Parsons, D. W. et al. Diagnostic yield of clinical tumor and germline whole-exome sequencing for children with solid tumors. J. Am. Med. Assoc. Oncol. 2, 616–624 (2016).
15. Mody, R. J. et al. Integrative clinical sequencing in the management of refractory or relapsed cancer in youth. J. Am. Med. Assoc. 314, 913–925 (2015).
16. Wagle, N. et al. High-throughput detection of actionable genomic alterations in clinical tumor samples by targeted, massively parallel sequencing. Cancer Discov. 2, 82–93 (2012).
17. Pritchard, C. C. et al. Inherited DNA-repair gene mutations in men with metastatic prostate cancer. N. Engl. J. Med. 375, 443–453 (2016).
18. Palanisamy, N. et al. Rearrangements of the RAF kinase pathway in prostate cancer, gastric cancer and melanoma. Nat. Med. 16, 793–798 (2010).
19. Robinson, D. R. et al. Functionally recurrent rearrangements of the MAST kinase and Notch gene families in breast cancer. Nat. Med. 17, 1646–1651 (2011).
20. Stransky, N., Cerami, E., Schalm, S., Kim, J. L. & Lengauer, C. The landscape of kinase fusions in cancer. Nat. Commun. 5, 4846 (2014).
21. Agaram, N. P., Zhang, L., Sung, Y. S., Singer, S. & Antonescu, C. R. Extraskeletal myxoid chondrosarcoma with non-EWSR1–NR4A3 variant fusions correlate with rhabdoid phenotype and high-grade morphology. Hum. Pathol. 45, 1084–1091 (2014).
22. Weinreb, I. et al. Novel PRKD gene rearrangements and variant fusions in cribriform adenocarcinoma of salivary gland origin. Genes Chromosom. Cancer 53, 845–856 (2014).
23. Amir, A. D. et al. viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia. Nat. Biotechnol. 31, 545–552 (2013).
24. Lonsdale, J. et al. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
25. Liberzon, A. et al. The Molecular Signatures Database hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
26. Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
27. Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 14, 7 (2013).
28. Vaske, C. J. et al. Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics 26, i237–i245 (2010).
29. López-Novoa, J. M. & Nieto, M. A. Inflammation and EMT: an alliance towards organ fibrosis and cancer progression. EMBO Mol. Med. 1, 303–314 (2009).
30. Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4, 2612 (2013).
31. Attig, S. et al. Simultaneous infiltration of polyfunctional effector and suppressor T cells into renal cell carcinomas. Cancer Res. 69, 8412–8419 (2009).
32. Nakano, O. et al. Proliferative activity of intratumoral CD8+ T-lymphocytes as a prognostic factor in human renal cell carcinoma: clinicopathologic demonstration of antitumor immunity. Cancer Res. 61, 5132–5136 (2001).
33. Ruffini, E. et al. Clinical significance of tumor-infiltrating lymphocytes in lung neoplasms. Ann. Thorac. Surg. 87, 365–371 (2009).
34. Boon, T., Coulie, P. G., Van den Eynde, B. J. & van der Bruggen, P. Human T cell responses against melanoma. Annu. Rev. Immunol. 24, 175–208 (2006).
35. Newman, A. M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12, 453–457 (2015).
36. Sica, A., Schioppa, T., Mantovani, A. & Allavena, P. Tumour-associated macrophages are a distinct M2 polarised population promoting tumour progression: potential targets of anti-cancer therapy. Eur. J. Cancer 42, 717–727 (2006).
37. Patel, S. P. & Kurzrock, R. PD-L1 expression as a predictive biomarker in cancer immunotherapy. Mol. Cancer Ther. 14, 847–856 (2015).
38. van Houdt, I. S. et al. Favorable outcome in clinically stage II melanoma patients is associated with the presence of activated tumor infiltrating T-lymphocytes and preserved MHC class I antigen expression. Int. J. Cancer 123, 609–615 (2008).
39. Galon, J. et al. Type, density, and location of immune cells within human colorectal tumors predict clinical outcome. Science 313, 1960–1964 (2006).
40. Rizvi, N. A. et al. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. Science 348, 124–128 (2015).
41. Ulloa-Montoya, F. et al. Predictive gene signature in MAGE-A3 antigen-specific cancer immunotherapy. J. Clin. Oncol. 31, 2388–2395 (2013).
42. Mateo, J. et al. DNA-repair defects and olaparib in metastatic prostate cancer. N. Engl. J. Med. 373, 1697–1708 (2015).
43. Helleday, T., Petermann, E., Lundin, C., Hodgson, B. & Sharma, R. A. DNA repair pathways as targets for cancer therapy. Nat. Rev. Cancer 8, 193–204 (2008).
Supplementary Information is available in the online version of the paper.
Acknowledgements This work was supported by a National Institutes of Health (NIH) Clinical Sequencing Exploratory Research Award NIH 1UM1HG006508. Other sources of support included the Prostate Cancer Foundation, Stand Up 2 Cancer (SU2C)-Prostate Cancer Foundation Prostate Dream Team Grant SU2C-AACR-DT0712, Early Detection Research Network grant U01 CA214170, and Prostate SPORE grant P50 CA186786. A.M.C. is a Howard Hughes Medical Institute Investigator, A. Alfred Taubman Scholar, and American Cancer Society Professor. M.C. is supported by a PCF Young Investigator Award. We acknowledge Y. Ning, R. Wang, X. Dang, M. Davis, L. Hodges, J. Griggs, J. Athanikar, C. Brennan, C. Betts, J. Chen, S. Kalyana-Sundaram, K. Giles, and R. Mehra for their contributions to this study. Over 100 physicians referred patients to this study and we acknowledge the following: K. Cooney, M. Hussain, S. Urba, N. Henry, V. Sahai, D. Simeone, C. Lao, J. Smerage, M. Caram, M. Burness, G. Kalemkerian, C. Van Poznak, M. Wicha, R. Buckanovich, J. Bufill, P. Grivas, P. Hu, A. Morikawa, P. Palmbos, B. Redman, F. Feng, G. Hammer, S. Merajver, and A. Pearson. We thank S. Roychowdhury and K. Pienta for help
in protocol development for the MI-ONCOSEQ program. Most importantly, we recognize the generosity and kindness of the cancer patients and their families for participating in this study.
Author Contributions D.R.R., Y.-M.W., and X.C. coordinated clinical sequencing. R.J.L., M.C., and P.V. developed the bioinformatics analysis. J.S. coordinated sample procurement, L.P.K., D.L., and S.A.T. led the histopathology analysis. D.C.S., S.S., M.M.Z., A.A., R.C., F.W., L.H.B., R.J.M., N.R., A.F.S., and D.F.H. coordinated patient recruitment. E.R. was the lead study coordinator. J.E., V.R., E.M.S., and J.I. provided genetic counselling and assessment of PPGMs, and J.V. and K.O. analysed relative risk assessment. J.S.R. coordinated the bioethics component. M.T. and A.M.C. coordinated IRB protocol development. D.R.R., Y.-M.W. and C.K.-S. prepared PMTBs. E.C., M.T., D.F.H., D.R.R., and Y.-M.W. implemented the clinical tiering of molecular aberrations. Y.-M.W., D.R.R., M.C., and A.M.C. developed the figures and tables. A.M.C., M.C., D.R.R., and Y.-M.W. wrote the manuscript with input from all authors. A.M.C. and M.T. designed and supervised the study.
Author Information Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests. Readers are welcome to comment on the online version of the paper. Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Correspondence and requests for materials should be addressed to A.M.C. ([email protected]).
Reviewer Information Nature thanks S. Bova, P. Robbins and the other anonymous reviewer(s) for their contribution to the peer review of this work.
METHODS
No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.
Patient enrolment. Sequencing of clinical samples was performed under our IRB-approved studies at the University of Michigan (Michigan Oncology Sequencing Protocol, MI-ONCOSEQ, IRB numbers HUM00046018, HUM00067928, HUM00056496). Medically qualified patients 18 years or older with metastatic disease (including advanced or refractory) that could be safely accessed by image-guided biopsy were eligible for inclusion. The MI-ONCOSEQ study was initiated in 2010 and is continuing as of July 2017. All patients provided written informed consent to obtain fresh tumour biopsies, and to perform comprehensive molecular profiling of tumour/germline exomes and tumour transcriptome. Four hundred and twenty-two patients (MO prefix) were enrolled under the MI-ONCOSEQ protocol, which included meeting with genetic counsellors at the time of consent. A ‘flexible default’ consent model was used with MO patients, which mandated disclosure of findings that directly impacted the current cancer management, but patients could choose whether to receive incidental results, including those with possible significance for family members or conditions unrelated to the current cancer. The remaining patients (TP prefix) were usually from external sites and were enrolled under a tissue profiling protocol without initial genetic counselling. TP patients were not given the option to decline germline findings, and germline results relevant to cancer were automatically reported. Once sequenced, a patient’s clinical course was updated quarterly to document clinical status and treatment decisions made by the physicians since the last follow-up.
Tissue acquisition and pathology review. Needle biopsies or surgically resected tissues were flash frozen in OCT and a section was cut for evaluation.
Remaining portions of each specimen were retained for nucleic acid extraction. Haematoxylin and eosin (H&E)-stained frozen sections were reviewed by pathologists to identify cores or areas with the highest tumour content. In general, multiple sources of data were used to confirm the diagnosis and site of origin of the carcinomas, especially metastatic adenocarcinoma in these biopsies sent for next-generation sequencing. These included review of the electronic medical record for the clinical history, documentation of primary diagnosis and/or previously performed metastatic biopsy with confirmation of site of origin and the morphological assessment of H&E-stained sections (frozen section slides in all cases), and review of concurrent formalin-fixed paraffin-embedded sections, as most of these cases had a previous documentation of metastasis with confirmation of site of origin. In no case were the frozen section slides used exclusively to determine site of origin. Immunohistochemistry using a panel of antibodies was performed on the formalin-fixed paraffin-embedded sections, especially if the biopsy was the first documentation of metastasis. The antibodies used for confirmation of site of primary origin were based on primary diagnosis; however, when clinically indicated, depending on the site of metastasis, the presence of multiple primaries and/or whether there was a long time gap between primary diagnosis and metastatic biopsy, to the best possible extent other sites of origin were excluded. Presentation at the PMTB included a review of morphological assessment and immunohistochemical results, and correlation of these results with expression analysis of the RNA-seq data, as well as mutation signatures. Sequencing results were used in a supportive fashion to reinforce the diagnosis of primary origin of the metastatic cancer.
Integrative clinical sequencing.
Integrative clinical sequencing was performed using standard protocols in our Clinical Laboratory Improvement Amendments-compliant sequencing laboratory10,15. Tissues with the highest tumour content for each case were disrupted by 5 mm beads on a TissueLyser II (Qiagen). Tumour genomic DNA and total RNA were purified from the same sample using an AllPrep DNA/RNA/miRNA kit (Qiagen). Matched normal genomic DNA from blood, buccal swab, or saliva was isolated using a DNeasy Blood & Tissue Kit (Qiagen). RNA integrity was measured on an Agilent 2100 Bioanalyzer using RNA Nano reagents (Agilent Technologies). RNA-seq was performed on either the processed (poly(A)+) transcriptome or the exome-capture transcriptome platform10,44. Both poly(A)+ and capture transcriptome libraries were prepared using 1–2 μg of total RNA. Poly(A)+ RNA was isolated using Sera-Mag oligo(dT) beads (Thermo Scientific) and fragmented with an Ambion Fragmentation Reagents kit (Ambion). cDNA synthesis, end-repair, A-base addition, and ligation of the Illumina index adapters were performed according to Illumina’s TruSeq RNA protocol (Illumina). Libraries were size-selected on 3% agarose gel. Recovered fragments were enriched by PCR using Phusion DNA polymerase (New England Biolabs) and purified using AMPure XP beads (Beckman Coulter). Capture transcriptomes were prepared as above without the upfront mRNA selection and captured by Agilent SureSelect Human All Exon v.4 probes following the manufacturer’s protocol. Library quality was measured on an Agilent 2100 Bioanalyzer for product size and concentration.
Paired-end libraries were sequenced by an Illumina HiSeq 2000 or HiSeq 2500 (2 × 100-nucleotide read length), with a sequence coverage of 40–50 million paired reads. Reads that passed the chastity filter of Illumina BaseCall software were used for subsequent analysis.
Exome libraries of matched pairs of tumour/normal DNAs were prepared as described previously10,15. In brief, 1–3 μg of genomic DNA was sheared using a Covaris S2 to a peak target size of 250 base pairs (bp). Fragmented DNA was concentrated using AMPure beads, followed by end-repair, A-base addition, ligation of the Illumina indexed adapters, and size selection on 3% Nusieve agarose gels (Lonza). Fragments between 300 and 350 bp were recovered, amplified using Illumina index primers, and purified by AMPure beads. One microgram of the library was hybridized to the Agilent SureSelect Human All Exon v.4. The targeted exon fragments were captured and enriched following the manufacturer’s protocol (Agilent). Paired-end whole-exome libraries were analysed on an Agilent 2100 Bioanalyzer with DNA 1000 reagents and sequenced using an Illumina HiSeq 2000 or HiSeq 2500 (Illumina).
We used the publicly available software FastQC to assess sequencing quality. For each lane, per-base quality scores across the length of the reads were examined. Lanes were deemed passing if the per-base quality score box plot indicated that > 75% of the reads had > Q20 for bases 1–80. In addition to the raw sequence quality, the alignment quality was also assessed using the Picard package. This allowed monitoring of duplication rates and chimaeric reads that may have resulted from ligation artefacts, which are crucial statistics for interpreting the results of copy number and structural variant analysis.
TCRβ repertoire deep sequencing (immunoSEQ). Amplification and sequencing of (TCRB/IGH/IGKL/TCRAD/TCRG) CDR3 was performed using the immunoSEQ Platform (Adaptive Biotechnologies). The same DNA aliquot obtained from frozen tumour tissues was used as for the exome sequencing. The immunoSEQ Platform combined multiplex PCR with high-throughput sequencing and a sophisticated bioinformatics pipeline for (TCRB/IGH/IGKL/TCRAD/TCRG) CDR3 analysis, including internal PCR amplification controls. Duplicate PCR reactions were done on all samples with > 1 μg of DNA. Computational analysis of sequencing data, including the estimation of the total number of templates and identification of clonotypes, was performed using the vendor-supplied analysis portal.
Mutation analysis. Whole-exome sequencing was performed on an Illumina HiSeq 2000 in paired-end mode and the primary base call files were converted into FASTQ sequence files using the bcl2fastq converter tool bcl2fastq-1.8.4 in the CASAVA 1.8 pipeline. The FASTQ sequence files generated were then processed through an in-house pipeline constructed for whole-exome sequence analyses of paired cancer genomes. The sequencing reads were aligned to the reference genome build hg19, GRCh37 using Novoalign Multithreaded (version 2.08.02) (Novocraft) and converted into BAM files using SAMtools (version 0.1.18).
Sorting and indexing of BAM files used Novosort threaded (version 1.00.01) and duplicate reads were removed using Picard (version 1.74). Mutation analysis was performed with VarScan2 algorithms (version 2.3.2) using the pileup files created by SAMtools mpileup for tumour and matched normal samples, simultaneously performing the pairwise comparisons of base call and normalized sequence depth at each position. To detect SNVs, filtering parameters including coverage, variant read support, variant frequency, P value, base quality, homopolymer, and strandedness were applied. For indel analysis, Pindel (version 0.2.4) was used on tumour and matched normal samples; indels common to both samples were classified as germline, and indels present in tumour but not in normal were classified as somatic. Finally, the list of candidate indels as well as somatic and/or germline mutations was generated by excluding synonymous SNVs. ANNOVAR69 was used to functionally annotate the detected genetic variants, and positions were based on Ensembl66 transcript sequences.
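The tumour/normal indel rule described above can be illustrated with a minimal Python sketch. It is not Pindel's interface or the in-house pipeline: the `Indel` record, the `classify_indels` helper, and the exact-match comparison of coordinates and alleles are all assumptions made for illustration.

```python
from typing import Iterable, NamedTuple


class Indel(NamedTuple):
    """Hypothetical minimal indel record (not a Pindel output format)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def classify_indels(tumour_calls: Iterable[Indel],
                    normal_calls: Iterable[Indel]) -> dict:
    """Split tumour indel calls into germline and somatic sets.

    Assumption: two calls are 'the same indel' only if chromosome,
    position, and both alleles match exactly.
    """
    tumour, normal = set(tumour_calls), set(normal_calls)
    return {
        "germline": sorted(tumour & normal),  # seen in tumour and normal
        "somatic": sorted(tumour - normal),   # seen in tumour only
    }


# Toy calls with made-up coordinates.
shared = Indel("chr1", 1000, "AT", "A")
tumour_only = Indel("chr2", 2000, "G", "GCC")
result = classify_indels([shared, tumour_only], [shared])
```

Real callers usually need left-normalized coordinates before such a comparison; the sketch assumes that has already been done.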
Tumour content for each tumour exome library was estimated from the sequence data by fitting a binomial mixture model with two components to the set of most likely SNV candidates on two-copy genomic regions. The set of candidates used for estimation consisted of coding variants that (1) exhibited at least 3 variant fragments in the tumour sample, (2) exhibited zero variant fragments in the matched benign sample with at least 16 fragments of coverage, (3) were not present in the Single Nucleotide Polymorphism database (dbSNP), (4) were within a targeted exon or within 100 bp of a targeted exon, (5) were not in homopolymer runs of four or more bases, and (6) exhibited no evidence of amplification or deletion. To filter out regions of possible amplification or deletion, we used exon coverage ratios to infer copy-number changes, as described below. Resulting SNV candidates were not used for estimation of tumour content if the segmented log-ratio exceeded 0.2 in absolute value. Candidates on the Y chromosome were also eliminated, because they were unlikely to exist in two-copy genomic regions. Using this set of candidates, we fitted a binomial mixture model with two components
using the R package flexmix, version 2.3.8. One component consisted of SNV candidates with very low variant fractions, presumably resulting from recurrent sequencing errors and other artefacts. The other component, consisting of the likely set of true SNVs, was informative of tumour content in the tumour sample. Specifically, under the assumption that most or all of the observed SNV candidates in this component were heterozygous SNVs, we expected the estimated binomial proportion of this component to represent one-half of the proportion of tumour cells in the sample. Thus, the estimated binomial proportion as obtained from the mixture model was doubled to obtain an estimate of tumour content.
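The doubling step can be made concrete with a small Python sketch. The study used the R package flexmix; the code below instead fits the two-component binomial mixture with a hand-written EM loop, so the function name, starting values, and iteration count are assumptions for illustration, and the read counts are toy numbers.

```python
import math


def _clamp(x, lo=1e-9, hi=1 - 1e-9):
    """Keep probabilities away from 0 and 1 so logarithms stay finite."""
    return min(max(x, lo), hi)


def _log_binom(k, n, p):
    """Log binomial probability of k variant fragments out of n."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def estimate_tumour_content(counts, init=(0.02, 0.3), n_iter=200):
    """counts: list of (variant_fragments, total_fragments) per SNV candidate.

    Fits two binomial components by EM (assumed starting proportions in
    `init`), then doubles the larger proportion, on the assumption that
    this component holds heterozygous SNVs in two-copy regions.
    """
    p = [_clamp(init[0]), _clamp(init[1])]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each candidate.
        resp = []
        for k, n in counts:
            logs = [math.log(w[j]) + _log_binom(k, n, p[j]) for j in (0, 1)]
            top = max(logs)
            weights = [math.exp(v - top) for v in logs]
            total = sum(weights)
            resp.append([v / total for v in weights])
        # M-step: update mixing weights and binomial proportions.
        for j in (0, 1):
            r_sum = sum(r[j] for r in resp)
            k_sum = sum(r[j] * k for r, (k, n) in zip(resp, counts))
            n_sum = sum(r[j] * n for r, (k, n) in zip(resp, counts))
            w[j] = _clamp(r_sum / len(counts))
            if n_sum > 0:
                p[j] = _clamp(k_sum / n_sum)
    return min(1.0, 2.0 * max(p))


# Toy data: four low-fraction artefacts and five likely true SNVs.
toy_counts = [(3, 150), (4, 200), (3, 180), (3, 160),
              (30, 100), (28, 100), (33, 110), (29, 95), (31, 100)]
tumour_content = estimate_tumour_content(toy_counts)
```

With these toy counts the high-fraction component pools 151 variant fragments out of 505, so the doubled estimate is close to 0.6.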
Recurrently mutated genes were classified as putative oncogenes and tumour suppressors. Initially, we divided them into tumour suppressors and oncogenes on the basis of the proportion of inactivating (two-hit, nonsense) aberrations (relative to all other aberrations, including missense mutations and amplifications) using a heuristic cut-off of 0.65. This initial classification was then reviewed on the basis of the distribution of somatic mutations in COSMIC (for example, presence of hotspots, ratio of inactivating mutations, prevalence of frameshifts, etc.), whether the mutations were putative gain or loss-of-function, and relevant gene-related literature. This review resulted in the re-classification of several genes from oncogene to tumour suppressor (but of none in the opposite direction), namely FAT1, KMT2B, KMT2C, KMT2D, RAD50, RNF43, MSH2, SMC4, KEAP1, MUTYH, BRIP1, and VHL.
Copy-number aberration. Copy-number aberration was quantified and reported for each gene as the segmented normalized log2-transformed exon coverage ratios between each tumour sample and matched normal sample. To account for observed associations between coverage ratios and variation in GC content across the genome, lowess normalization was used to correct per-exon coverage ratios before segmentation analysis. Specifically, mean GC percentage was computed for each targeted region, and a lowess curve was fitted to the scatterplot of log2-coverage ratios versus mean GC content across the targeted exome using the lowess function in R (version 2.13.1) with smoothing parameter f = 0.05. Partly redundant sequencing of areas of the genome afforded the ability for cross-validation of findings. We cross-validated exome-based point mutation calls by manually examining the genomic and transcriptomic reads covering the mutation using the University of California, Santa Cruz (UCSC) Genome Browser.
Likewise, gene fusion calls from the transcriptome data could be further supported by structural variant detection in the genomic sequence data, as well as copy-number information derived from the genome and exome sequencing.
Mutation burden estimation. The VarScan2-processed VCF files from 33 TCGA cohorts were downloaded from the GDC data portal and lifted over from the GRCh38 to the GRCh37 reference genome using CrossMap to compare with MET500. The mutations were filtered by coverage (at least 10×) and variant allelic fraction (at least 6%). These mutations were further narrowed down to be within 10 bp of the Agilent All Exon v.4 captured regions. The mutation burden was estimated as (total mutations/total covered bases) × 10^6, that is, mutations per megabase. Finally, we identified 20 common cohorts between MET500 and TCGA.
Comparisons of gene-level aberration frequency. To compare the mutation frequency between primary and metastatic tumours, we first identified 20 analysis cohorts (tumour types) shared between the TCGA and MET500 projects. For each of those cohorts, we obtained aberration frequencies for selected most recurrently/ubiquitously mutated tumour suppressors (TP53, PTEN, RB1) and oncogenes (KRAS, PIK3CA, GNAS). We compared the aberration frequencies for each gene within each tumour type with Fisher’s exact test for a total of 120 dependent tests. To correct for multiple testing, we applied the BY method of Benjamini, Hochberg, and Yekutieli.
RNA-seq data analysis. Strand-specific RNA-seq libraries were analysed using the CRISP clinical RNA-seq pipeline, which comprises expression analysis, virus detection, and structural variant detection using a separate tool CODAC (M.C., Y.M.W., D.R.R. and A.M.C., manuscript in preparation). CRISP is composed of several tasks: pre-alignment QC, read grooming, alignment, post-alignment QC, and quantification. Notably, fusion calling is not a part of CRISP and is done independently using several CRISP output files.
The components were either chosen on the basis of their performance and robustness (for example, featureCounts) or rewritten from scratch (HPSEQ, sepath, PaPy). In brief, reads that passed vendor QC thresholds were trimmed of adaptor sequences and aligned to the GRCh38 reference genome with added sequences for known oncogenic viruses and a transcript reference database based on Gencode v.23. STAR (version 2.4.0g1) was used for alignment with the following non-default settings: outSAMstrandField: None; alignSJoverhangMin: 8; alignSJDBoverhangMin: 3; scoreGenomicLengthLog2scale: 0; alignIntronMin: 20; alignIntronMax: 1000000; alignMatesGapMax: 1000000. For chimaeric alignment, used for structural variant detection, the following settings were applied: alignIntronMax: 400000; alignMatesGapMax: 400000; chimSegmentMin: 10; chimJunctionOverhangMin: 1; chimScoreSeparation: 0; chimScoreJunctionNonGTAG: 0; chimScoreDropMax: 1000; chimScoreMin: 1.
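For readers who want to reuse these values, the settings listed above can be written as STAR command-line options. The Python sketch below only renders the parameter names and values given in the text into `--name value` pairs; the dictionary names and the `star_args` helper are hypothetical, and required options such as the genome index and input files are deliberately left out because the text does not specify them.

```python
# Non-default settings for linear alignment, as listed in the text.
LINEAR_SETTINGS = {
    "outSAMstrandField": "None",
    "alignSJoverhangMin": 8,
    "alignSJDBoverhangMin": 3,
    "scoreGenomicLengthLog2scale": 0,
    "alignIntronMin": 20,
    "alignIntronMax": 1000000,
    "alignMatesGapMax": 1000000,
}

# Settings listed for chimaeric alignment (structural variant detection).
CHIMERIC_SETTINGS = {
    "alignIntronMax": 400000,
    "alignMatesGapMax": 400000,
    "chimSegmentMin": 10,
    "chimJunctionOverhangMin": 1,
    "chimScoreSeparation": 0,
    "chimScoreJunctionNonGTAG": 0,
    "chimScoreDropMax": 1000,
    "chimScoreMin": 1,
}


def star_args(settings):
    """Render a settings dictionary as a flat list of STAR options."""
    args = []
    for name, value in settings.items():
        args.extend([f"--{name}", str(value)])
    return args


linear_args = star_args(LINEAR_SETTINGS)
chimeric_args = star_args(CHIMERIC_SETTINGS)
```

Whether the chimaeric run also inherited the remaining linear settings is not stated in the text, so the two sets are kept separate here.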
The chimaeric output was analysed for chimaeric junctions supported by spanning and encompassing reads, and then filtered. CRISP and CODAC were tuned to perform optimally with a custom set of reference transcripts based on Gencode v.23 (MOTR). This set lacked many questionable transcripts such as non-coding transcripts overlapping coding exons, transcripts linking two protein-coding genes, read-through transcripts, non-coding transcripts for protein-coding genes, strange isoforms with extremely long exons, intron-retention isoforms, etc. Many of those could decrease the reliability of gene-based expression estimates and, most importantly, limit the ability to detect fusions as the detection of chimaeric reads rests on the assumption that the ‘fused’ genes are not part of a known isoform. Also, by eliminating coding–non-coding overlaps, the coding (PROT) and non-coding (NONC) portions of MOTR were disjoint, simplifying many downstream analyses. In addition to standard chromosomes and unplaced contigs, we included several sequences of laboratory contaminants (for example, mycoplasma), pathogens (for example, tuberculosis), and oncogenic viruses (for example, human papilloma virus). A custom pipeline inspected the reads aligned to viral sequences (which are often problematical, for example highly repetitive). The sequencing approach used in this study did not detect HIV with reliability or sensitivity, precluding its use clinically in this regard. Paired-end reads were trimmed of adaptor sequences (in-house tool) and processed twice, once for linear alignment and downstream expression profiling, and a second time for chimaeric alignment. Before chimaeric alignment, the reads were ‘merged’: that is, if the two mates overlapped because the sequenced RNA fragment was shorter than twice the read length, the reads were combined into a synthetic single long read. This greatly improved the sensitivity with which STAR could detect a chimaeric junction.
CODAC could use both alignment files to call fusions.
Fusion calling (CODAC). Our fusion-calling pipeline allowed us to detect fusions regardless of the location of breakpoints within gene bodies, which in turn allowed us to detect a wider range of aberrations including gain-of-function fusions and truncating loss-of-function fusions. The chimaeric alignments from STAR were aggregated using custom software (M.C., Y.M.W., D.R.R. and A.M.C., manuscript in preparation) and filtered for recurrent artefacts, breakpoints within problematic repetitive regions, segmental duplications, and possible alignment errors (mismatches, pseudogenes). Variable cut-offs of supporting reads were required, depending on the breakpoint position (higher if the breakpoint was in problematic regions), with a minimum of three high-quality spanning reads defined as having a long alignment (> 60 bp) on both ends of the breakpoint and a low number of mismatches (sequencing errors, SNPs, or mutations), and low repetitiveness score.
Computational fusion validation. We performed both computational and experimental validations to estimate the specificity of our fusion-calling algorithm CODAC (M.C., Y.M.W., D.R.R. and A.M.C., manuscript in preparation). First, we ran the algorithm on 50 randomly selected GTEx libraries (normal samples) and found that, for 90% of the cases, at most, two false-positive fusions were called. Next, we compared the quality of clinically reportable or pathogenic fusion calls with the remaining fusion calls, using a compound fusion-quality score taking into account several quality metrics: number of spanning reads, alignment quality, repetitiveness of the DNA, presence of splice donor–acceptor motif, etc. Likely pathogenic and non-pathogenic fusions were very similar in terms of overall quality.
To further validate our algorithm, we plotted the number of fusions against the number of copy-number breakpoints per sample, and observed that the numbers of DNA and RNA breakpoints were highly correlated.
Experimental fusion validation. To validate the sensitivity/specificity of the fusion-calling pipeline, we randomly selected 17 private pathogenic fusions from Supplementary Table 6, and 40 fusions from 2 cases with highly rearranged genomes (20 random fusions per case, ranging from high to low numbers of supporting reads), and performed reverse transcription PCR followed by Sanger sequencing of the fusion fragments. cDNA was synthesized using a SuperScript III First-Strand Synthesis SuperMix Kit (Thermo Fisher Scientific/Invitrogen) according to the manufacturer’s instructions. PCR amplification was performed using fusion-specific primers (synthesized by IDT) and a HotStarTaq Plus Master Mix Kit (Qiagen). PCR products were subjected to electrophoresis and purified using a QIAEX II Gel Extraction Kit (Qiagen) before Sanger sequencing. Sanger sequencing was performed by the University of Michigan DNA Sequencing Core. Fifty-six of the 57 candidates (98.2%) were validated by this approach.
Adjustment. Capture and polyA expression levels could be almost perfectly adjusted across the whole dynamic range of gene expression. The adjustment was based on the assumption (shown to be valid) that most of the differences were systematic: for example, owing to capture efficiency or transcript stability. These differences were estimated from data of 400 paired polyA and capture libraries using a linear model with shrinkage and variance pooling, using limma with voom precision estimates (M.C., Y.M.W., D.R.R. and A.M.C., manuscript in preparation). Correction of systematic biases through a linear model (Supplementary Methods) eliminated almost all of the apparent differences, and resulted in a good correlation between capture and polyA RNA-seq data (r = 0.97), indicating high overall reproducibility and enabling us to jointly analyse both datasets.
Selection of marker genes. We followed a multi-step procedure to identify expression markers for normal tissues and primary tumour types. First, we assembled expression compendia for both tasks. The normal tissue compendium included all the data available from GTEX and the Human Proteome Atlas, for a total of 36 different tissues/organs. The primary cancer compendium included all data from the TCGA for 33 primary tumour types. Then, for each tissue/cancer type, we sought protein-coding genes that fulfilled the following criteria: (1) highly expressed in that tissue/cancer type; (2) upregulated in that tissue/cancer type compared with other tissues/cancer types; (3) expressed in only a few tissues/cancer types; (4) not redundant with other, better markers for that tissue/cancer type.
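These four criteria can be turned into simple per-gene statistics. The Python sketch below is not the authors' implementation (theirs is linked in the next paragraph); it only illustrates one plausible reading of the procedure that paragraph describes. The exact form of the enrichment Z-score, the use of tissue-level profiles for the Spearman correlation, and all function and gene names are assumptions. Hoyer's sparsity follows its standard definition, (√n − L1/L2)/(√n − 1).

```python
import math
from statistics import mean, pstdev


def enrichment_z(profile, target):
    """Z-score of the target tissue against all other tissues (assumed form)."""
    others = [x for i, x in enumerate(profile) if i != target]
    sd = pstdev(others)
    return (profile[target] - mean(others)) / sd if sd > 0 else 0.0


def hoyer_sparsity(profile):
    """Hoyer's index: 1 if expressed in a single tissue, 0 if equal in all."""
    n = len(profile)
    l1 = sum(abs(x) for x in profile)
    l2 = math.sqrt(sum(x * x for x in profile))
    if l2 == 0:
        return 0.0
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


def ranks(values):
    """Rank 1 = largest value; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(a, b):
    """Spearman's rho as the Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def select_markers(expr, target, n_markers=50, max_rho=0.85):
    """Greedy shortlist: best average rank first, skipping redundant genes."""
    genes = list(expr)
    z_rank = ranks([enrichment_z(expr[g], target) for g in genes])
    s_rank = ranks([hoyer_sparsity(expr[g]) for g in genes])
    by_avg_rank = sorted(range(len(genes)),
                         key=lambda i: (z_rank[i] + s_rank[i]) / 2)
    shortlist = []
    for i in by_avg_rank:
        gene = genes[i]
        if all(spearman(expr[gene], expr[h]) <= max_rho for h in shortlist):
            shortlist.append(gene)
        if len(shortlist) == n_markers:
            break
    return shortlist


# Toy matrix: hypothetical genes, four tissues, tissue 0 is the target.
toy = {
    "GENE_A": [100, 1, 1, 2],
    "GENE_A_LIKE": [90, 1, 1, 3],    # same tissue pattern as GENE_A
    "GENE_B": [80, 5, 0, 1],
    "GENE_FLAT": [50, 50, 50, 50],   # expressed equally everywhere
}
markers = select_markers(toy, target=0, n_markers=2)
```

In the toy example, GENE_A_LIKE scores well but is skipped because its profile is perfectly rank-correlated with the already shortlisted GENE_A, which is the behaviour criterion (4) asks for.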
We implemented an algorithm (available at https://github.com/mcieslik-mctp/) that identifies such genes using two statistical criteria: enrichment Z-score (that is, how much higher a gene was expressed in a target tissue/tumour relative to other tissues/tumours) and Hoyer’s sparsity (that is, sparsity was highest if a gene was expressed in only one tissue type and lowest if it was expressed in all tissues). Next, all genes were ranked according to both criteria and an average rank was computed. This ranked list of genes was traversed from the best putative markers (highest rank) to the worst (lowest rank) to populate a shortlist of uncorrelated markers, beginning with the top-ranking gene, which was automatically included in the shortlist. For each subsequently evaluated marker, the correlation with all previously included markers was computed and a cut-off applied (Spearman’s ρ > 0.85), so that candidates too strongly correlated with an already shortlisted marker were skipped. This process was repeated until 50 shortlisted marker genes were identified for each tissue. We applied this algorithm to the two datasets (that is, 36 normal tissues and 33 cancer types), which resulted in two sets of marker genes, Nt36 and Tc33, respectively.
Expression signature analysis. To estimate signature/hallmark/pathway activity levels, we used two different approaches. The first, relative (‘intrinsic’) approach was similar to GSVA and ssGSEA in that it estimated the activity of a pathway in one sample relative to the activity of that pathway in a cohort of other samples. In our case, the cohort was the MET500. The expression of each gene in the pathway was transformed into percentiles and the activity of each pathway was calculated as the average percentile score of all genes in the pathway minus 50 (that is, the expected median activity of a pathway). The intrinsic score has been shown to correlate very well with GSVA estimates. All GSVA analyses used default settings.
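The percentile-based activity score just described can be written down in a few lines. The following Python sketch is an illustration under stated assumptions rather than the study's code: the mid-rank percentile convention for ties, the function names and the gene names are hypothetical. The reference cohort is passed in as an argument, so the same function works with any set of reference samples.

```python
from bisect import bisect_left, bisect_right


def percentile_in(reference, value):
    """Percentile (0 to 100) of `value` within `reference`, mid-rank for ties."""
    ref = sorted(reference)
    below = bisect_left(ref, value)
    equal = bisect_right(ref, value) - below
    return 100.0 * (below + 0.5 * equal) / len(ref)


def pathway_activity(sample_expr, reference_expr, pathway_genes):
    """Average percentile of the pathway genes, minus 50.

    sample_expr:    gene -> expression in the sample of interest
    reference_expr: gene -> list of expression values in the reference cohort
    """
    pct = [percentile_in(reference_expr[g], sample_expr[g])
           for g in pathway_genes
           if g in sample_expr and g in reference_expr]
    if not pct:
        raise ValueError("no pathway gene is present in both inputs")
    return sum(pct) / len(pct) - 50.0


# Hypothetical two-gene pathway in a five-sample cohort.
cohort = {"GENE_X": [1, 2, 3, 4, 5], "GENE_Y": [10, 20, 30, 40, 50]}
top_sample = {"GENE_X": 5, "GENE_Y": 50}
median_sample = {"GENE_X": 3, "GENE_Y": 30}
high = pathway_activity(top_sample, cohort, ["GENE_X", "GENE_Y"])
mid = pathway_activity(median_sample, cohort, ["GENE_X", "GENE_Y"])
```

Subtracting 50 centres the score, so a sample whose pathway genes sit at the cohort median scores 0 and the score is bounded between −50 and 50 percentile points.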
The second, ‘extrinsic’ approach was analogous to the relative (intrinsic) approach, but expression percentiles for each gene were calculated not within the MET500 cohort but against the expression levels of that gene in over 8,000 samples from 36 normal tissues (GTEX and HUPA), each tissue represented by the same number of transcriptomes (oversampling for some of the less well studied tissues). For all pathway analyses, we used the gene sets provided by MSigDB, including the hallmark sets and the experimentally perturbed datasets.
MImmScore. The MImmScore is an aggregate measure of immune infiltration based on the expression of multiple immune-related genes. It is derived from the ‘immune signature’ genes used in the ESTIMATE method in ref. 30. The 141 genes included in this set fulfil several criteria: they (1) are common to RNA-seq and major microarray platforms, (2) are highly expressed in haematopoietic cells compared with normal tissues, (3) are upregulated in highly immune-infiltrated ovarian tumours, and (4) are non-redundant. A signature derived from the expression of these genes has been shown to track with the amount of non-tumour cells on the basis of EpCam expression. We used all 141 ‘immune score’ genes, but chose a different statistical approach to transform their relative expression levels into a compound score. We used the so-called ‘inverse normal transformation’ method that is common in eQTL and other regression analyses. In brief, this rank-based method minimizes the influence of outliers, and of genes with non-normally distributed expression levels, on the compound score. The first step is to transform the variable (that is, gene expression across samples) to ranks and subsequently percentiles. The percentiles are then transformed to standard normal deviates using the inverse normal (or probit) function. In other words, the expression of each gene is made to follow the normal distribution. The MImmScore is then computed by summing, for each sample, the standard normal deviates of all genes. The MImmScore is hence analogous to the ‘intrinsic’ signature expression analysis, with the added step of converting percentiles to the Z-scores associated with a given probability (using the standard qnorm function in R). Samples with high levels of immune infiltration will have many genes expressed above the 50th percentile (Z-score > 0) and hence the overall MImmScore will be positive.
Pathogenic germline variant analysis. The pathogenicity of germline variants was determined through review of the published literature and public databases, including but not limited to ClinVar, the Human Gene Mutation Database, Leiden Open Variation Databases, and variant-specific databases (for example, the International Agency for Research on Cancer TP53 Database and the International Society for Gastrointestinal Hereditary Tumours mutation databases). Only cancer-relevant germline variants that had been previously categorized as pathogenic in ClinVar, or adjudicated at the PMTB as pathogenic, were disclosed on the clinical report. These clinically reported germline variants are shown in Supplementary Table 4. Variants with conflicting pathogenicity reports and variants not previously reported were considered to be of uncertain significance and were not considered for disclosure. After disclosure, familial testing was recommended.
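Returning briefly to the MImmScore defined earlier in this section, its rank, percentile, probit and sum steps can be sketched in Python. This is an illustration and not the study's R code: `statistics.NormalDist().inv_cdf` plays the role of R's `qnorm`, the percentile convention (rank − 0.5)/n is an assumption made so that no value maps to 0 or 1, and the gene names are hypothetical stand-ins for the 141 signature genes.

```python
from statistics import NormalDist


def inverse_normal_transform(values):
    """Map one gene's expression across samples to standard normal deviates."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    deviates = [0.0] * n
    std_normal = NormalDist()
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1        # 1-based; ties share the mean rank
        quantile = (avg_rank - 0.5) / n   # assumed percentile convention
        z = std_normal.inv_cdf(quantile)  # probit, as qnorm does in R
        for k in range(i, j + 1):
            deviates[order[k]] = z
        i = j + 1
    return deviates


def mimm_like_score(expr_by_gene):
    """Sum the per-gene normal deviates for each sample."""
    per_gene = [inverse_normal_transform(v) for v in expr_by_gene.values()]
    n_samples = len(per_gene[0])
    return [sum(gene[s] for gene in per_gene) for s in range(n_samples)]


# Three samples, two hypothetical immune genes; sample 2 is highest for both.
toy_expr = {"IMMUNE_GENE_1": [1.0, 5.0, 9.0], "IMMUNE_GENE_2": [2.0, 4.0, 8.0]}
scores = mimm_like_score(toy_expr)
```

Because only ranks enter the transformation, a single extreme expression value moves a sample's score no further than being the top-ranked sample for that gene would.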
Clinical relevance of somatic variants was investigated using an integrated approach incorporating technical considerations (recurrence, variant allele fraction, expression levels, and predictive algorithms for pathogenicity), variant-specific information (ClinVar, published literature, and curated gene-specific resources), and published correlations of drug/variant sensitivity profiles. Considerations of tumour heterogeneity, including clonal versus subclonal mutations, were addressed by comparing variant allele fractions and copy-number estimates for each of the mutations with post-sequencing estimates of tumour content derived from SNV and copy-number analyses.
PMTB activity. A fortnightly, multidisciplinary PMTB interpreted and deliberated on sequencing results for each patient. PMTB participants included paediatric and adult oncologists, geneticists, pathologists, biologists, bioinformaticians, bioethicists, genetic counsellors, study coordinators, and ad hoc experts. Selected findings underwent additional independent Clinical Laboratory Improvement Amendments-validated testing, and summarized results were disclosed to treating oncologists and families by the clinical sequencing team, board-certified clinical geneticists, and/or counsellors, as appropriate. For this study, potentially actionable findings were defined as any genomic findings discovered during sequencing analysis that could (1) lead to a change in patient management by providing a targetable molecular aberration, (2) lead to a change in diagnosis or risk stratification, or (3) provide cancer-related germline findings that could inform patients/families about a potential future risk of various cancers.
Data availability. Sequencing data from the 500 patients enrolled in this study can be obtained from the Database of Genotypes and Phenotypes (dbGaP) under accession number phs000673.v2.p1. The MET500 web portal is available at http://met500.path.med.umich.edu.
To make the data accessible to the broader scientific and clinical communities, we have made the somatic landscape searchable through a MAGI-based web interface45 at http://met500.path.med.umich.edu. All custom analysis software used in this study is publicly available on GitHub at https://github.com/mcieslik-mctp/ and https://github.com/mctp/. All other data are available from the corresponding author upon reasonable request.
44. Cieslik, M. et al. The use of exome capture RNA-seq for highly degraded RNA with application to clinical cancer sequencing. Genome Res. 25, 1372–1381 (2015).
45. Leiserson, M. D. et al. MAGI: visualization and collaborative annotation of genomic aberrations. Nat. Methods 12, 483–484 (2015).
[Extended Data Figure 1, image residue. a, Pie chart of the MET500 cohort by sex (labels: Female, Male; 242, 48.4%; 258, 51.6%). b, Bar chart ‘Age Distribution’: number of cases per age group in years (18-25, 26-35, 36-45, 46-55, 56-65, >65), with bar labels 12, 31, 46, 113, 161 and 137. c, Bubble plot with gene labels PIK3CA, AR, KRAS, CCND1, EGFR, ESR1, ERBB2, BRAF, MYC, MDM2, MTOR, FGFR2, FOXA1, EWSR1, TP53, CDKN2A, PTEN, RB1, ARID1A, APC, NF1, ARID1B, KDM6A, ARID2, BRCA2, KMT2C, CREBBP, NCOR1 and ERG; legend: common GOF, restricted GOF, common LOF, restricted LOF. d, Scatter plots of the percentage altered in primary tumours (x axis) against the percentage altered in metastatic tumours (y axis) for PTEN, RB1, TP53, GNAS, PIK3CA and KRAS, with points labelled by analysis cohort; legend: nominal −log10(p-value), adjusted −log10(p-value), BY < 0.1 or N.S.]
Extended Data Figure 1 | Demographics of the MET500 cohort and summary of common genetic aberrations. a, Gender distribution of the MET500 cohort. b, Age distribution of the MET500 cohort. c, Bubble plot of clinically actionable genetic aberrations. Genes have been divided by putative gain-of-function (oncogene, red) or loss-of-function (tumour suppressor, blue) status. Common aberrations are defined as those observed in five or more MET500 analysis cohorts (Fig. 1c); restricted aberrations are found in fewer than five analysis cohorts. Bubble area is proportional to the observed frequency of the aberration across the MET500 cohort. d, Comparison of genetic aberration frequencies (SNVs, indels, amplifications, predicted homozygous deletions) between primary (TCGA) and metastatic (MET500) tumours for select tumour suppressors (left) and oncogenes (right). TCGA data for the primary cancer cohorts were obtained from the cBio portal. Nominal statistical significance is based on Fisher’s exact test. Statistically significant differences in frequencies after correction for multiple dependent tests using the Benjamini–Yekutieli procedure are indicated as circles; non-significant differences are shown as triangles.
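The two statistical steps named in panel d, Fisher's exact test on a 2 × 2 table of altered and unaltered cases, followed by Benjamini–Yekutieli adjustment, can be sketched in standard-library Python. This is a generic illustration rather than the analysis code behind the figure: the function names are hypothetical, the two-sided p-value sums all tables that are no more probable than the observed one (a common convention, assumed here), and the example counts are invented.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_yekutieli(pvalues):
    """BY-adjusted p-values, valid under arbitrary dependence between tests."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m * c_m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented 2 x 2 table: rows = metastatic, primary; columns = altered, unaltered.
p_example = fisher_exact_two_sided(3, 1, 1, 3)
adjusted_example = benjamini_yekutieli([0.01, 0.04, 0.03])
```

Compared with the Benjamini–Hochberg procedure, the extra factor `c_m` makes the adjustment more conservative, which is the price of not assuming that the tests are independent.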
[Extended Data Figure 2, image residue. a, Box plots of the number of fusions per case (y axis, 0 to 100) by analysis cohort. b, Circos diagrams (chromosomes 1 to 22, X, Y and M) for PTEN, NF1, TP53 and RB1. c, t-SNE plot (Dimension 1 against Dimension 2) coloured by TCGA cohort. d, t-SNE plot (Dimension 1 against Dimension 2) coloured by biopsy site. e, Bar chart of tissue markers (percentile expression, 0 to 90) in normal, primary and metastatic (‘mets’) samples by tissue; NS marks non-significant comparisons.]
Extended Data Figure 2 | Analysis of pan-cancer metastatic transcriptomes. a, Structural rearrangements in metastatic genomes. Distribution of the number of fusions per case is plotted across the MET500 by analysis cohort (see Fig. 1c for cancer abbreviations). The y axis is truncated at 100 fusions. Error bars show the range from Q1 − 1.5 × IQR to Q3 + 1.5 × IQR, where Q1 is the first quartile, Q3 is the third quartile, and IQR is the interquartile range. b, Summary Circos diagrams of predicted inactivating fusions for select tumour-suppressor genes across the cohort. Arc end positions indicate the chimaeric junctions; colours indicate type of rearrangement: black, tandem duplication; blue, translocation; red, inversion; grey, multiple close junctions detected. c, The t-SNE plot for the TCGA pan-cancer meta-cohort (a random selection of cases from each primary tumour type) on the basis of the expression of tumour-type-specific marker genes (same genes as in Fig. 4a). d, The t-SNE plot for the MET500 samples coloured by biopsy site (same samples as in Fig. 4a, there coloured by cancer type). e, Average percentile expression of tissue-specific genes in normal tissues, primary cancers, and metastases. Error bars, s.d. Significance tests were done for all normal–primary and primary–metastasis pairs of samples; all comparisons were significant (P < 0.01) according to a two-tailed t-test, with the exception of those indicated with NS. ACC, adrenocortical carcinoma; BLCA, bladder urothelial carcinoma; BRCA, breast invasive carcinoma; CESC, cervical squamous cell carcinoma and endocervical adenocarcinoma; CHOL, cholangiocarcinoma; COAD, colon adenocarcinoma; DLBC, lymphoid neoplasm diffuse large B-cell lymphoma; ESCA, oesophageal carcinoma; GBM, glioblastoma multiforme; HNSC, head and neck squamous cell carcinoma; KICH, kidney chromophobe; KIRC, kidney renal clear cell carcinoma; KIRP, kidney renal papillary cell carcinoma; LAML, acute myeloid leukaemia; LGG, brain lower grade glioma; LIHC, liver hepatocellular carcinoma; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; MESO, mesothelioma; OV, ovarian serous cystadenocarcinoma; PAAD, pancreatic adenocarcinoma; PCPG, pheochromocytoma and paraganglioma; PRAD, prostate adenocarcinoma; READ, rectal adenocarcinoma; SARC, sarcoma; SKCM, skin cutaneous melanoma; STAD, stomach adenocarcinoma; TGCT, testicular germ cell tumours; THYM, thymoma; THCA, thyroid carcinoma; UCS, uterine carcinosarcoma; UCEC, uterine corpus endometrial carcinoma; UVM, uveal melanoma.
[Extended Data Figure 3, image residue. a, Heatmap of MSigDB oncogenic signatures (rows, for example CSR_LATE_UP.V1_UP, RB_P107_DN.V1_UP, E2F1_UP.V1_UP and KRAS.KIDNEY_UP.V1_UP) across samples (columns); colour scale ‘% diff activity’, −20 to 20. b, Box plots of the aggregate score of oncogenic pathway activity [%] (y axis, −20 to 20) by analysis cohort.]
Extended Data Figure 3 | Global activity of oncogenic signatures. a, Activity of signatures is calculated relative to a normal tissue baseline: that is, activity scores are compared with a compendium of 36 normal tissues. Therefore, this plot represents a comparison of pathway activities between metastatic tissues and normal tissues. Increased activity (positive difference, red) or decreased activity (negative difference, blue) indicates that the signature genes are on average more (or less) expressed in a metastatic tumour sample relative to the baseline (in average percentile point difference labelled ‘% diff activity’). Samples (columns) are ordered from left to right by decreasing average activity difference (column averages: that is, the aggregate score in b). b, Box plots summarizing the aggregate scores (column averages of ‘% diff activity’) in a. Analysis cohorts are ordered left to right by median aggregate scores. Error bars show the range from Q1 − 1.5 × IQR to Q3 + 1.5 × IQR.
[Extended Data Figure 4, image residue. Heatmap of MSigDB oncogenic signatures (rows, for example PTEN_DN.V1_DN, ATM_DN.V1_UP and VEGF_A_UP.V1_UP) across samples (columns); colour scale ‘% diff activity’, −20 to 20.]
Extended Data Figure 4 | Relative activity of oncogenic signatures. Hierarchically clustered heatmap of activity scores for the most variable oncogenic signatures. In contrast to Supplementary Figure 7, here activity scores are computed intrinsically: that is, relative to other samples in the MET500 (like ssGSEA or GSVA), which represents a relative comparison between different patients/samples. Red indicates that a signature is more active (in percentile points) for a given sample relative to the median activity; blue indicates that a signature is less active for a given sample.
[Extended Data Figure 5, image residue. a, b, Heatmaps of the MSigDB hallmark signatures (rows, for example INTERFERON_ALPHA_RESPONSE, E2F_TARGETS and SPERMATOGENESIS), with group labels 1 to 8; colour scales ‘% diff activity’, −40 to 40 and −20 to 20.]
Extended Data Figure 5 | Activity of cancer hallmarks in metastatic cancers. Clustered heatmaps of activity scores for the 50 MSigDB cancer hallmarks are shown. a, Gene expression patterns of cancer hallmark pathways. Average increase (red) or decrease (blue) in the relative expression levels (percentiles) of transcriptional signatures associated with the hallmarks of cancers. b, Activity scores are calculated relative to a compendium of 36 normal tissues, which represent a comparison of hallmark activities between metastatic tissues and normal tissues (analogous to Supplementary Figure 7 but for a different gene set). Red indicates that a signature is more active (in percentile points) for a given sample relative to the median activity; blue indicates that a signature is less active for a given sample.
[Extended Data Figure 6, image residue. Heatmap of MSigDB experimental signatures (rows, labels truncated in the source; for example HYPOXIA_DN, MYC_TARGETS_UP and E2F_TARGETS) grouped into meta-signature clusters labelled 1 to 25; colour scale ‘% diff. activity’, −40 to 40.]
Extended Data Figure 6 | Discovery of oncogenic meta-signatures. Relative activity scores were computed for all experimental signatures in the MSigDB database across the MET500 cohort. The signatures were clustered into 25 meta-signatures on the basis of their activity profiles across the MET500. For each of the 25 meta-signature clusters, the 5 most variable signatures were selected. Red indicates that a signature is more active (in percentile points) for a given sample relative to the median activity across the MET500; blue indicates that a signature is less active for a given sample.
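The selection step in this caption, keeping the five most variable signatures in each meta-signature cluster, is easy to state in code. In the Python sketch below the cluster assignments are taken as given, because the caption does not name the clustering method; using the population variance of the activity scores as the measure of variability is an assumption, as are all names.

```python
from statistics import pvariance


def most_variable_per_cluster(activity, cluster_of, top_n=5):
    """Keep the top_n highest-variance signatures within each cluster.

    activity:   signature name -> list of activity scores across samples
    cluster_of: signature name -> cluster label (computed elsewhere)
    """
    by_cluster = {}
    for signature, scores in activity.items():
        by_cluster.setdefault(cluster_of[signature], []).append(
            (pvariance(scores), signature))
    return {
        cluster: [name for _, name in sorted(items, reverse=True)[:top_n]]
        for cluster, items in by_cluster.items()
    }


# Hypothetical signatures in two clusters, three samples each.
toy_activity = {
    "SIG_A": [0.0, 0.0, 0.0],
    "SIG_B": [-10.0, 0.0, 10.0],
    "SIG_C": [-1.0, 0.0, 1.0],
    "SIG_D": [5.0, 5.0, 6.0],
}
toy_clusters = {"SIG_A": 1, "SIG_B": 1, "SIG_C": 1, "SIG_D": 2}
selected = most_variable_per_cluster(toy_activity, toy_clusters, top_n=2)
```

Restricting the heatmap to a few high-variance representatives per cluster keeps every meta-signature visible without repeating near-identical rows.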
[Extended Data Figure 7, image residue. a, Scatter plot of EMT signature 1 (x axis) against S-phase (FANCA) (y axis), both about −0.3 to 0.3, coloured by tissue. b, c, Heatmaps of meta-signatures (for example immune response, interferon response, EMT, extracellular matrix, liver, TCA cycle, cell cycle, proliferation, DNA repair, translation and telomere maintenance) by tissue or biopsy site; colour scale ‘% diff activity’, −40 to 40. d, Meta-signature panels titled Breast, Prostate, Soft Tissue, Lymph Node, Sarcoma and Liver.]
Extended Data Figure 7 | Activity of the oncogenic meta-signatures. a, Relative activity of EMT and proliferation signatures across the TCGA analysis meta-cohort. b, Relative activity of the 25 meta-signatures across MET500 samples from different biopsy sites. Red indicates that a signature is more active for a given biopsy site relative to the median activity; blue indicates that a signature is less active for a given biopsy site. c, Relative activity of the 25 meta-signatures across samples from different normal tissues. Red indicates that a signature is more active (in percentile points) for a given tissue relative to the median activity; blue indicates that a signature is less active for a given tissue. d, Correlations between the 25 meta-signatures. Correlation heatmap and hierarchical clustering showed similarities (red) and dissimilarities (blue) in the transcriptional activity of computationally derived aggregate sets of MSigDB signatures (that is, ‘meta-signatures’) across samples from the MET500 stratified by the most common primary tumour type (left) and biopsy site (right).
[Extended Data Figure 8 image. Recoverable labels: panel a, x axis "Tumor Content (DNA based)", y axis "MImmScore"; panel b, x axis "MImmScore", y axis "stromal score"; panel c, x axis "number of T-cells (T-cell repertoire profiling)", y axis "T-cell expression score (RNA-seq based)"; panel d, x axis "MImmScore" (high, low), y axis "number of T-cells". Annotations in the plots: rho=0.96***, r=−0.62***, r=0.69***, ***.]
Extended Data Figure 8 | Prediction of immune infiltration in cancer tissues. a, Correlation between the MImmScore, a measurement of absolute immune infiltration in a tumour sample, with tumour content estimated from exome DNA-seq using CNVs and SNVs. b, Correlation between MImmScores and an analogous score for tumour-stromal infiltration. c, Correlation between a T-cell expression score summarizing the expression levels (RNA-seq-based) of marker genes CD3D, CD3E, CD3G, CD6, SH2D1A, TRAT1 and the estimated number of T cells based on T-cell repertoire profiling (DNA-based). d, Number of T cells based on T-cell repertoire profiling for index cases stratified into MImmScore low (< 0) or MImmScore high (> 0). Error bars show the range from Q1 − 1.5 × IQR to Q3 + 1.5 × IQR. Significance levels of Spearman’s rank correlation coefficient test: *P = 0.05–0.001, **P = 0.001–10⁻⁶, ***P < 10⁻⁶.
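As an illustration of the statistics named in this legend, the following minimal sketch computes Spearman's rank correlation coefficient (Pearson correlation of average ranks) and maps a P value to the star notation. The function names are hypothetical, the example data are invented, and the handling of P values that fall exactly on a boundary (0.05, 0.001) is an assumption, since the legend does not specify it.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def significance_stars(p):
    """Star notation from the legend; boundary handling is an assumption."""
    if p < 1e-6:
        return "***"
    if p <= 1e-3:
        return "**"
    if p <= 0.05:
        return "*"
    return ""


# Invented values: T-cell expression score versus number of T cells.
expression_score = [1.0, 3.5, 2.2, 8.1, 12.4]
t_cell_count = [500, 9000, 4000, 60000, 150000]
print(spearman_rho(expression_score, t_cell_count))
print(significance_stars(2e-7))
```

The P value for rho itself is not computed here; obtaining it requires a permutation test or a statistics library.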
[Extended Data Figure 9 image. Recoverable labels: panel a, x axis "biopsy site" (bladder, liver, brain, soft_tissue, bone_marrow, skin, lymph_node, lung, oral), y axis "MImmScore"; panel b, "MImmScore" by TCGA cohort; panel c, "% MImmScore high patients" by MET500 cohort; panel d, "% MImmScore high patients" by TCGA cohort; panel e, x axis "number of T-cells (T-cell repertoire profiling)", y axis "number of expanded clones"; panel f, x axis "number of T-cells (T-cell repertoire profiling)", y axis "T reg to T CD8+ ratio". Annotations in the plots: rho=0.99***, rho=−0.61**.]
Extended Data Figure 9 | Differential immune infiltration in various cancer types. a, Distribution of MImmScores, a measurement of the magnitude of immune infiltration in a tumour sample, for MET500 samples/patients grouped by tumour biopsy site. b, Distribution of MImmScores across the TCGA meta-cohort, grouped by primary cancer designation. Haematological malignancies (DLBC, LAML) are included as positive control. Error bars show the range from Q1 − 1.5 × IQR to Q3 + 1.5 × IQR. c, Percentage of patients in each of the MET500 analysis cohorts with a high MImmScore, defined here as > 80th percentile across the whole MET500. The total number of cases with high MImmScore is indicated above each bar. d, Same as c but for the TCGA meta-cohort. e, Correlation between the total number of T cells (templates) based on T-cell repertoire (DNA-seq) of the TCR CDR3 sequence and the number of expanded clones (an expanded T-cell clone is defined as having more than 30 cells with the same CDR3 sequence). f, Ratio of expression levels for markers of CD8+ T cells (CD8A, CD8B) and regulatory T cells (FOXP3) as a function of the total number of T cells. Significance levels of Spearman’s rank correlation coefficient: *P = 0.05–0.001, **P = 0.001–10⁻⁶, ***P < 10⁻⁶.
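The three definitions in this legend (expanded clones, the high-MImmScore cut-off and the error-bar range) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be one CDR3 string per T-cell template, and the quantile interpolation method ("inclusive") is a choice made here because the legend does not state which one was used.

```python
from collections import Counter
import statistics


def count_expanded_clones(cdr3_templates, min_cells=30):
    """Count clones with more than min_cells templates sharing one CDR3."""
    counts = Counter(cdr3_templates)
    return sum(1 for n in counts.values() if n > min_cells)


def high_score_flags(scores, percentile=80):
    """Flag samples whose score exceeds the cohort's given percentile."""
    cut_points = statistics.quantiles(scores, n=100, method="inclusive")
    cutoff = cut_points[percentile - 1]
    return [s > cutoff for s in scores]


def error_bar_range(values):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR) as described in the legend."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Invented repertoire: one clone with 31 templates, one with 30, one with 5.
templates = ["CASSLGT"] * 31 + ["CASSPGQ"] * 30 + ["CASRDNE"] * 5
print(count_expanded_clones(templates))

# Invented scores standing in for MImmScores across a small cohort.
scores = list(range(11))
print(high_score_flags(scores))
print(error_bar_range(scores))
```

Note that a clone with exactly 30 templates is not counted, because the legend requires more than 30 cells with the same CDR3 sequence.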
[Extended Data Figure 10 image. Recoverable labels: panel a, "MImmScore" (y axis) by "highly mutated" status, annotated p=1.7e-6; panel b, clustered heatmap of "Pearson's residual" (scale −5 to 10) for T-cell clusters Tcell-0 to Tcell-7 against APC clusters APC-0 to APC-8; panel c, "Chi-squared residual" by APC cluster for T-cell clusters Tcell-1 and Tcell-4, annotated "p-value: 2.75e−36 (Chi-squared)".]
Extended Data Figure 10 | Genomic correlates of immune infiltration. a, Association between the MImmScore and mutation status (hypermutated samples have been defined here as having > 250 non-synonymous mutations). Statistical significance of this association was determined using logistic regression. b, c, A χ2 test for independence is used to determine whether the clusterings of samples based on T-cell and APC markers are independent. Enrichment or depletion is calculated as the Pearson’s residual. Red (positive enrichment) indicates that the clusters overlap significantly. Blue (depletion) indicates that the clusters tend to be mutually exclusive. A clustered heatmap of enrichment levels (χ2 table cell residuals) is shown in b. Enrichment levels for the active Tcell-1 and Tcell-4 clusters across all APC clusters (APC-1, -4 active) are shown in c.
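The Pearson residuals used here as enrichment levels follow the standard definition for a contingency table: residual = (observed − expected) / √expected, where expected = row total × column total / grand total, and the χ2 statistic is the sum of the squared residuals. A minimal sketch, with a hypothetical function name and an invented 2 × 2 table rather than the study's cluster counts:

```python
import math


def pearson_residuals(table):
    """Pearson residuals, chi-squared statistic and degrees of freedom
    for a contingency table given as a list of rows of counts.
    Assumes every row and column total is non-zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    residuals = []
    chi2 = 0.0
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            r = (observed - expected) / math.sqrt(expected)
            res_row.append(r)
            chi2 += r * r
        residuals.append(res_row)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return residuals, chi2, dof


# Invented counts: rows are two T-cell clusters, columns two APC clusters.
table = [[30, 10],
         [10, 30]]
residuals, chi2, dof = pearson_residuals(table)
print(residuals)
print(chi2, dof)
```

Positive residuals correspond to cluster pairs that co-occur more often than expected under independence (red) and negative residuals to pairs that co-occur less often (blue). Converting the statistic to a P value requires the χ2 distribution with `dof` degrees of freedom, which is not included in this sketch.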
nature research | life sciences reporting summary | June 2017
Corresponding Author: Arul M. Chinnaiyan
Date: June 2, 2017
Life Sciences Reporting Summary
Nature Research wishes to improve the reproducibility of the work we publish. This form is published with all life science papers and is intended to promote consistency and transparency in reporting. All life sciences submissions use this form; while some list items might not apply to an individual manuscript, all fields must be completed for clarity.
For further information on the points included in this form, see Reporting Life Sciences Research. For further information on Nature Research policies, including our data availability policy, see Authors & Referees and the Editorial Policy Checklist.
Experimental design
1. Sample size
Describe how sample size was determined.
Sample size was not predetermined.
2. Data exclusions
Describe any data exclusions.
No exclusion applied.
3. Replication
Describe whether the experimental findings were reliably reproduced.
No replicates for the clinical sequencing portion. Every patient was sequenced once. Methods for data analysis were provided and can be reproduced.
4. Randomization
Describe how samples/organisms/participants were allocated into experimental groups.
The experiments were not randomized
5. Blinding
Describe whether the investigators were blinded to group allocation during data collection and/or analysis.
This study is not blinded.
Note: all studies involving animals and/or human research participants must disclose whether blinding and randomization were used.
6. Statistical parameters
For all figures and tables that use statistical methods, confirm that the following items are present in relevant figure legends (or the Methods section if additional space is needed).
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement (animals, litters, cultures, etc.)
A description of how samples were collected, noting whether measurements were taken from distinct samples or whether the same sample was measured repeatedly.
A statement indicating how many times each experiment was replicated
The statistical test(s) used and whether they are one- or two-sided (note: only common tests should be described solely by name; more complex techniques should be described in the Methods section)
A description of any assumptions or corrections, such as an adjustment for multiple comparisons
The test results (e.g. p values) given as exact values whenever possible and with confidence intervals noted
A summary of the descriptive statistics, including central tendency (e.g. median, mean) and variation (e.g. standard deviation, interquartile range)
Clearly defined error bars
See the web collection on statistics for biologists for further resources and guidance.
Software
Policy information about availability of computer code
7. Software
Describe the software used to analyze the data in this study.
Publicly available software for next-generation sequencing analysis: Illumina CASAVA, Novoalign, SAMtools, Picard, Pindel, ANNOVAR69, CRISP and others. Custom software: code provided.
For all studies, we encourage code deposition in a community repository (e.g. GitHub). Authors must make computer code available to editors and reviewers upon request. The Nature Methods guidance for providing algorithms and software for publication may be useful for any submission.
Materials and reagents
Policy information about availability of materials
8. Materials availability
Indicate whether there are restrictions on availability of unique materials or if these materials are only available for distribution by a for-profit company.
Patient samples were consented for research purposes, but their distribution is restricted.
9. Antibodies
Describe the antibodies used and how they were validated for use in the system under study (i.e. assay and species).
N/A
10. Eukaryotic cell lines
a. State the source of each eukaryotic cell line used. N/A
b. Describe the method of cell line authentication used. N/A
c. Report whether the cell lines were tested for mycoplasma contamination.
N/A
d. If any of the cell lines used in the paper are listed in the database of commonly misidentified cell lines maintained by ICLAC, provide a scientific rationale for their use.
N/A
Animals and human research participants
Policy information about studies involving animals; when reporting animal research, follow the ARRIVE guidelines
11. Description of research animals
Provide details on animals and/or animal-derived materials used in the study.
N/A
Policy information about studies involving human research participants
12. Description of human research participants
Describe the covariate-relevant population characteristics of the human research participants.
N/A