

  • Integrated Systems and Technologies

    Transcriptome Analysis of Recurrently Deregulated Genes across Multiple Cancers Identifies New Pan-Cancer Biomarkers

    Bogumil Kaczkowski1, Yuji Tanaka1,2, Hideya Kawaji1,2,3, Albin Sandelin4, Robin Andersson4, Masayoshi Itoh1,3, Timo Lassmann1,5, the FANTOM5 consortium, Yoshihide Hayashizaki3, Piero Carninci1, and Alistair R.R. Forrest1,6

    Abstract

    Genes that are commonly deregulated in cancer are clinically attractive as candidate pan-diagnostic markers and therapeutic targets. To globally identify such targets, we compared Cap Analysis of Gene Expression profiles from 225 different cancer cell lines and 339 corresponding primary cell samples to identify transcripts that are deregulated recurrently in a broad range of cancer types. Comparing RNA-seq data from 4,055 tumors and 563 normal tissues profiled in The Cancer Genome Atlas and FANTOM5 datasets, we identified a core transcript set with theranostic potential. Our analyses also revealed enhancer RNAs that are upregulated in cancer, and showed that promoters overlapping repetitive elements (especially SINE/Alu and LTR/ERV1 elements) are often upregulated in cancer. Lastly, we documented for the first time upregulation of multiple copies of the REP522 interspersed repeat in cancer. Overall, our genome-wide expression profiling approach identified a comprehensive set of candidate biomarkers with pan-cancer potential, and extended the perspective and pathogenic significance of repetitive elements that are frequently activated during cancer progression. Cancer Res; 76(2); 216–26. ©2015 AACR.

    Introduction

    Successful cancer treatment depends heavily on early detection and diagnosis. Despite decades of research, relatively few biomarkers are routinely used in clinics (e.g., CA-125 and PSA in ovarian and prostate cancers, respectively; refs. 1, 2). There is a need for reliable and clinically applicable new cancer biomarkers for early detection. Cancers originating in the same tissue can be very heterogeneous, often being derived from different cell types and having drastically different mutation profiles (3). At the same time, cancers from different tissues can share some common features; for example, The Cancer Genome Atlas (TCGA) has found genes and pathways, DNA copy number alterations, mutations, methylation, and transcriptome changes that recur across 12 different primary tumor types (4).

    Here, using Cap Analysis of Gene Expression (CAGE) data collected for the Functional ANnoTation Of Mammalian genome (FANTOM5) project (5), we identified mRNAs, long-noncoding RNAs (lncRNA), enhancer RNAs (eRNA), and RNAs initiating from within repeat elements, which are recurrently perturbed in cancer cell lines. To confirm that these transcripts are relevant to tumors, we compared their expression in 4,055 primary tumors and 563 matching normal tissue samples profiled by RNA-seq in the TCGA (6) and in a set of colorectal tumor samples (7) profiled proteomically. Finally, for the most promising biomarker candidates we performed qRT-PCR validations in cancer cell lines and tumor cDNA panels. Taken together, our analyses allowed for identification of a set of robust pan-cancer biomarker candidates, which have the potential for development as blood biomarkers for early detection and for histological screening of biopsies.

    This work is part of the FANTOM5 project. Data download, genomic tools, and copublished manuscripts have been summarized at the FANTOM5 website (8).

    Materials and Methods

    FANTOM5 data

    We used the cap analysis of gene expression (CAGE) data from the FANTOM5 project (libraries sequenced to a median depth of 4 million mapped tags; ref. 5). We used 564 CAGE profiles: 225 cancer cell lines and 339 primary cell samples. We split the data into three datasets: (i) matched solid, (ii) unmatched solid, and

    1RIKEN Center for Life Science Technologies, Division of Genomic Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, Japan. 2RIKEN Advanced Center for Computing and Communication, Preventive Medicine and Applied Genomics unit, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Japan. 3RIKEN Preventive Medicine & Diagnosis Innovation Program, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan. 4The Bioinformatics Centre, Department of Biology and Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark. 5Telethon Kids Institute, the University of Western Australia, Perth, Western Australia, Australia. 6Harry Perkins Institute of Medical Research, QEII Medical Centre and Centre for Medical Research, the University of Western Australia, Nedlands, Western Australia, Australia.

    Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

    Corresponding Authors: Bogumil Kaczkowski, RIKEN Center for Life Science Technologies (CLST), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan. Phone: 81-45-503-9222; Fax: 81-45-503-9216; E-mail: [email protected]; and Alistair R.R. Forrest, QEII Medical Centre and Centre for Medical Research, the University of Western Australia, 6 Verdun Street, Nedlands, WA 6009, Australia. Phone: 61-8-6151-0780; Fax: 61-8-6151-0701; E-mail: [email protected]

    doi: 10.1158/0008-5472.CAN-15-0484

    ©2015 American Association for Cancer Research.

    Cancer Research

    Cancer Res; 76(2) January 15, 2016

    Downloaded from cancerres.aacrjournals.org on July 10, 2021. © 2016 American Association for Cancer Research.

    Published OnlineFirst November 9, 2015; DOI: 10.1158/0008-5472.CAN-15-0484

    http://cancerres.aacrjournals.org/

  • [Figure 1 graphic. Panel A: differential expression (DE) pipeline (edgeR and ON/OFF analysis) applied to the FANTOM5 data (225 cancer cell lines, 339 primary cells), split into 10 matched origins of solid tumors (72 cancer cell lines, 65 primary cells), 2 matched origins of blood cancers (51 cancer cell lines, 74 primary cells), and unmatched origins of solid cancers (102 cancer cell lines, 200 primary cells); overlapping features define solid-cancer-only and pan-cancer differential expression. Panel B: examples of expression switching (ON/OFF) and expression shift (UP/DOWN) in cancer versus normal samples for p1@TERT, p1@POLQ, p1@NAALADL1, and p1@C13orf15, shown by tissue of origin (solid matched: Brain, Kidney, Liver, Lung, Melanocytes, Mesothelium, Breast, Prostate, Ovary; blood matched: Lymphoid, Myeloid, Bone). Panel C: gene-level log2 FC versus promoter-level log2 FC.]

    Figure 1. Summary of comparisons carried out to identify recurrently perturbed transcripts in the FANTOM5 cell line dataset. A, differential expression (DE) pipeline applied to the FANTOM5 data. B, examples of differentially expressed promoters showing expression switching (ON and OFF) and expression shift (UP and DOWN). C, comparison between promoter and gene level differential expression (based on CAGE data). Note: Although the majority of differentially expressed promoters reflect gene-wise differential expression, a significant fraction behave differently, for example, MPP2 or BCAT1. D, table summarizing the number of promoters and genes showing differential expression. Numbers in parentheses indicate numbers of unique genes.

    Pan-Cancer Transcriptome


  • (iii) matched blood (see Supplementary Tables S1A and S1C for the list of cancer types and sample annotation). The CAGE tag counts under 184,827 robust decomposition-based peak identification (DPI) clusters (5) were used to represent promoter-level expression. For enhancer activity, we used the CAGE tag counts under 43,011 enhancer regions identified in ref. 9.

    FANTOM5 differential expression analysis

    To identify up- and downregulated transcripts in cancer cell lines versus normal primary cells, we used genewise negative binomial generalized linear models as implemented in edgeR (10). The cancer versus normal comparison was performed using the glmLRT function. In the matched solid comparison, we set equal weight for each solid cancer type, each type contributing equally to the overall comparison. In the matched solid and matched blood dataset, a simple cancer versus normal comparison was performed.

    The P-values were adjusted for multiple testing by the Benjamini–Hochberg method. The thresholds of fold change >4 and FDR < 0.01 […] ON (detected, count > 0), OFF (not detected, count = 0). We then calculated the frequency of expression in cancer and normal samples. Features expressed four times more frequently in cancer than in normal samples were selected as "ON in cancer," whereas features not expressed/lost four times more often in cancer than in normal samples were selected as "OFF in cancer." The procedure was applied to each dataset (matched solid, unmatched solid, and matched blood). The significance of the association (contingency) between ON/OFF status and cancer/normal status was tested by a two-sided Fisher exact test with adjustment for multiple testing by the Benjamini–Hochberg method. The threshold of FDR < 0.01 was used. The pipeline of differential expression described above was applied separately to the DPI/promoter counts and enhancer counts. The features found differentially expressed in all three datasets were selected as "pan" cancer features, whereas features differentially expressed in matched and unmatched solid datasets only were selected as "solid only" cancer features.
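The ON/OFF procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`fisher_exact_two_sided`, `benjamini_hochberg`, `on_off_call`) are hypothetical, the "four times more frequently" criterion is read as `freq_a >= 4 * freq_b`, and a zero frequency in the reference group is treated as satisfying that criterion. The toy counts at the end are invented for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def on_off_call(cancer_counts, normal_counts, ratio=4.0):
    """Return ("ON" | "OFF" | None, Fisher p-value) for one feature.

    Assumption: "four times more frequently" means freq_a >= ratio * freq_b.
    """
    on_c = sum(1 for x in cancer_counts if x > 0)
    on_n = sum(1 for x in normal_counts if x > 0)
    off_c, off_n = len(cancer_counts) - on_c, len(normal_counts) - on_n
    f_on_c, f_on_n = on_c / len(cancer_counts), on_n / len(normal_counts)
    f_off_c, f_off_n = 1 - f_on_c, 1 - f_on_n
    if f_on_c > 0 and f_on_c >= ratio * f_on_n:
        status = "ON"
    elif f_off_c > 0 and f_off_c >= ratio * f_off_n:
        status = "OFF"
    else:
        status = None
    return status, fisher_exact_two_sided(on_c, off_c, on_n, off_n)


# Toy example (invented tag counts): detected in 7 of 8 cancer samples
# and in none of 10 normal samples.
status, p_value = on_off_call([5, 3, 0, 7, 2, 9, 1, 4], [0] * 10)
```

In a full analysis the Fisher p-values of all features in one dataset would be passed together to `benjamini_hochberg`, and only features with an adjusted value below 0.01 would be kept.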

    TCGA RNA-seq data

    We obtained the RNA-seq profiling data of 4,055 cancer samples and 563 normal tissues from The Cancer Genome Atlas (TCGA) Data Portal (data status as of Aug 5, 2013, origin listed in Supplementary Table S1B; ref. 6). The profiles represented 14 solid cancer types for which both tumor and normal tissue samples were available. We downloaded level 3 RNASeqV2, upper quartile normalized RSEM count estimates with expression profiles of 20,531 genes in 4,618 samples.

    The counts were log2 transformed and used as input expression data to LIMMA.

    The cancer versus normal comparison was performed using equal weight for each solid cancer type, each type contributing equally to the overall comparison. The P-values were adjusted for multiple testing by the Benjamini–Hochberg method. The thresholds of fold change >2 and FDR < 0.01 […]
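To make the idea of "equal weight for each solid cancer type" concrete, the sketch below averages per-type log2 fold changes so that a cancer type with many samples does not dominate the overall estimate. It is only an illustration of the weighting idea and does not reproduce the LIMMA model used in the study; the function name `equal_weight_log2fc`, the input layout, and the pseudocount of 1 added before the log2 transform are assumptions.

```python
from math import log2
from statistics import mean


def equal_weight_log2fc(samples, pseudocount=1.0):
    """Mean of per-cancer-type log2 fold changes for one gene.

    samples: iterable of (cancer_type, is_tumor, count) tuples.
    Each cancer type contributes one tumor-minus-normal difference,
    regardless of how many samples it has. The pseudocount is an
    assumption; the source does not state how zero counts were handled.
    """
    by_type = {}
    for cancer_type, is_tumor, count in samples:
        groups = by_type.setdefault(cancer_type, {True: [], False: []})
        groups[is_tumor].append(log2(count + pseudocount))
    per_type = [mean(g[True]) - mean(g[False]) for g in by_type.values()]
    return mean(per_type)


# Toy data (invented): type "A" has two tumors and one normal sample,
# type "B" has one tumor and three normal samples.
toy = [
    ("A", True, 7), ("A", True, 7), ("A", False, 1),
    ("B", True, 3), ("B", False, 3), ("B", False, 3), ("B", False, 3),
]
overall = equal_weight_log2fc(toy)
```

With the toy data, type "A" contributes a difference of 2 and type "B" a difference of 0, so the equally weighted estimate is their plain average rather than a value skewed by group sizes.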

  • Results

    Identification of transcripts recurrently up- or downregulated in cancer cell lines

    Using CAGE data collected for the FANTOM5 project (5, 9), we compared expression levels of transcripts from 184,827 promoter and 43,011 enhancer regions between a panel of 225 cancer cell lines and a panel of 339 primary cell samples (sample IDs and their annotations are listed in Supplementary Table S1C).

    First, the cancer cell line and primary cell datasets were divided into three subsets (see Supplementary Table S1A); cell lines and primary cells from solid tissues or blood lineages that could be matched are referred to as matched-solid or matched-blood. The remaining samples from solid tissue are referred to as unmatched-solid.

    In each subset, we identified promoters that were differentially expressed between cancer and normal (edgeR; ref. 10, >4-fold change, FDR < 0.01). We also performed an alternative binary analysis (which we refer to as an ON/OFF analysis) to identify transcripts that were consistently switched off or switched on in cancer [four times more often expressed (switched ON) or not detected (switched OFF) in the cancer group compared with the normal group, using a significance level of FDR < 0.01 by Fisher exact test (examples in Fig. 1B)]. The results of the ON/OFF and edgeR analyses were then merged to obtain a final selection of up- and downregulated promoters (Fig. 1A and Supplementary Table S2).
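The merging and selection logic can be expressed with plain set operations. The sketch below is a hedged illustration: it assumes that "merged" means the union of the edgeR and ON/OFF calls within a dataset, and it uses hypothetical helper names (`merge_calls`, `classify_features`) and invented promoter IDs. It would be run once for upregulated and once for downregulated features.

```python
def merge_calls(edger_hits, on_off_hits):
    """Union of edgeR and ON/OFF calls for one dataset (assumed merge rule)."""
    return set(edger_hits) | set(on_off_hits)


def classify_features(matched_solid, unmatched_solid, matched_blood):
    """Split features into "pan" and "solid only" sets.

    pan: called in all three datasets.
    solid only: called in both solid datasets but not in matched blood.
    """
    in_both_solid = matched_solid & unmatched_solid
    pan = in_both_solid & matched_blood
    solid_only = in_both_solid - matched_blood
    return pan, solid_only


# Toy example with invented promoter IDs.
matched_solid = merge_calls({"pA", "pB"}, {"pC"})
unmatched_solid = merge_calls({"pA", "pB", "pD"}, set())
matched_blood = merge_calls({"pA"}, set())
pan, solid_only = classify_features(matched_solid, unmatched_solid, matched_blood)
```

Features called in only one solid dataset (here "pC" and "pD") fall into neither group, which matches the requirement that solid-only features be found in both matched and unmatched solid comparisons.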

    In total, 2,108 promoters were differentially expressed in cancer cell lines. Seven hundred and eighty-one were consistently upregulated in all three comparisons and a further 814 were up only in solid cancers. Conversely, 99 were consistently downregulated in all three datasets and a further 414 were down only in solid cancers (Table 1). Sixty-three percent of the differentially expressed peaks overlapped protein-coding genes, 12% overlapped long noncoding genes (GENCODE v19; ref. 16), and 25% were not associated with any known genes (Supplementary Table S3).

    In some cases, the CAGE analysis identified alternative promoters. Comparing the gene-wise differential expression (total CAGE signal for the same gene) to the differential expression of individual promoters (Fig. 1C), we found that for 23% of differentially expressed protein coding genes, at least one alternative promoter behaved differently to that of the whole gene, whereas for lncRNAs (which have fewer alternative promoters) it was only 5% (Fig. 1D).
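The promoter-versus-gene comparison can be illustrated as follows. This is a simplified sketch, not the method used in the paper: it takes the gene-level signal as the sum of promoter signals, uses a pseudocount of 1, and treats a promoter as "behaving differently" when its log2 fold change has the opposite sign to the gene-level value. All three choices, along with the names `log2fc` and `promoter_vs_gene` and the promoter IDs, are assumptions.

```python
from math import log2


def log2fc(cancer, normal, pseudocount=1.0):
    """log2 fold change of cancer over normal with an assumed pseudocount."""
    return log2((cancer + pseudocount) / (normal + pseudocount))


def promoter_vs_gene(promoters):
    """Compare each promoter of ONE gene with the gene-level signal.

    promoters: {promoter_id: (mean_cancer_signal, mean_normal_signal)}.
    Returns the gene-level log2 FC and the promoters whose log2 FC has
    the opposite sign (assumed definition of "behaving differently").
    """
    gene_cancer = sum(c for c, _ in promoters.values())
    gene_normal = sum(n for _, n in promoters.values())
    gene_fc = log2fc(gene_cancer, gene_normal)
    discordant = [pid for pid, (c, n) in promoters.items()
                  if log2fc(c, n) * gene_fc < 0]
    return gene_fc, discordant


# Toy example (invented values): the dominant promoter goes up,
# a minor alternative promoter goes down.
gene_fc, discordant = promoter_vs_gene({
    "p1@GENE": (99, 9),
    "p2@GENE": (0, 3),
})
```

In the toy example the gene as a whole appears upregulated because the dominant promoter drives the total, while the minor promoter moves in the opposite direction and would be missed by a gene-level analysis.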

    Differentially expressed protein coding genes are enriched in cancer-associated genes

    Focusing on CAGE peaks unambiguously at the 5′ end of protein coding genes (±500 bp from the 5′ end of annotated

    [Figure 2 graphic: heat map of log2 fold change (cancer vs. normal; color scale −10 to 10) for FANTOM5 CAGE data (columns: Blood, Bone, Brain, Breast, Kidney, Liver, Lung, Melanocyte, Mesothelium, Ovary, Prostate) and TCGA RNA-seq data (columns: BLCA, BRCA, COAD, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, PRAD, READ, THCA, UCEC).

    Upregulated genes (rows): PRAME, ZIC2, ZIC5, GABRD, HOXB13, OTX1, DLX4, DLX6, TERT, ONECUT2, HOXC13, FABP6, TNNT1, TP73, MNX1, E2F8, GRIN2D, DLX1, TOP2A, NMU, MMP13, CD70, KIF14, CENPF, PKMYT1, MKI67, EXO1, SRCIN1, MYEOV, SLCO1B3, IGF2BP3, FAM111B, BUB1B, TM4SF19, TMEM145, E2F7, POLQ, RAD54L, SKA3, CDC6, SGOL1, KIF18A, NUSAP1, MAST1, RHPN1, CBX2, CLSPN, SLC7A11, XRCC2, RDM1, CCNO, KRT80, GPR19, RNFT2, KIF23, PDX1, ATAD2, HELLS, MLF1IP, RCOR2, BLM, CCDC150, CCNE2, CASC5, CHRNA5, DNAH14, MCM2, ATAD5, PABPC1L, RACGAP1, KIFC2, CDC7, DBF4, RFC4, ASNS, MTHFD2.

    Downregulated genes (rows): ACOX2, LYNX1, PTRF, CLIP3, SLIT3, BST1, JAM3, THBD, PTGS1, PEAR1, ANPEP, ARMCX1, BEX4, KCTD12, ZNF677, FBLN5, FLRT2, TWIST2, CXCL2, TNFAIP8L3, TMEM220, TIMP3, TSPYL5, MT1A, APOD, IGF2, DNALI1, SERPING1, GYPC, S1PR1, MYL9, NAALADL1, GPX3, SERP2, TPM2, OSR1, PHYHD1, NDN, PHYHIP, FEZ1, PCDHGA12, SRPX, MT1E, PAMR1, DCN, SFRP1, AOX1, FABP4, NKAPL, COX7A1, HSPB6, TCEAL7.]

    Figure 2. Pan-cancer biomarker candidates. Genes aberrantly expressed in both FANTOM5 cancer cell lines (>4 fold change or four times expression gain/loss, FDR < 0.01) and […] (fold change >2, FDR < 0.01) […]

  • [Figure 3 graphic: genome browser views with tracks for Entrez genes (hg19), GENCODE v19 transcripts, miTranscriptome transcript assembly, FANTOM5 CAGE Phase 1 CTSS, UCSC hg19 RepeatMasker repeats, FANTOM5 human permissive enhancers, ENCODE ChIA-PET ditags, and qPCR primer spans. Panel A: RP11-124N14.3 (ENST00000456355.1) is downregulated in cancer and antisense to VIM (EMT marker); downregulation confirmed by qPCR. Panel B: REP522-initiated bidirectional transcription of CCDC144NL and CCDC144NL-AS1; upregulation confirmed by qPCR. Panel C: REP522-initiated bidirectional transcription of BRCAT95 (T091488) and T091486; BRCAT95 upregulation confirmed by qPCR, no qPCR signal for the other transcript. Panel D: hg19 chr1:46482587..46621931 (MAST2, PIK3R3); ChIA-PET chromatin interaction between an enhancer and the promoter of PIK3R3.]

    Kaczkowski et al.


  • transcripts or located in 5′ UTRs) we identified 911 promoters corresponding to 656 unique genes that were differentially expressed: 435 upregulated and 221 downregulated (Supplementary Table S9). The gene set was significantly enriched for oncogenes (hypergeometric test P = 7.5e−05, 33 genes), tumor suppressors (P = 0.0043, 13 genes), genes frequently mutated in cancer (P = 0.034, 18 genes; ref. 14), and genes listed in the Cancer Gene Census (P = 0.01, 28 genes; see Supplementary Table S3F; ref. 15). Interestingly, eight oncogenes were downregulated, and five tumor suppressors were upregulated, changing in the opposite direction to what one would expect (Supplementary Fig. S1A–S1C). This may be caused by regulatory feedback loops responding to neoplastic changes.
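For readers unfamiliar with the hypergeometric enrichment test mentioned above, the following sketch computes the upper-tail probability of seeing at least a given overlap between a selected gene set and an annotated category. The function name `hypergeom_upper_tail` and the toy numbers are hypothetical; the source does not state the size of the background gene universe or of each annotation list, so no attempt is made here to reproduce the reported P-values.

```python
from math import comb


def hypergeom_upper_tail(universe, annotated, selected, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    universe:  number of genes in the background
    annotated: number of background genes in the category (e.g., oncogenes)
    selected:  number of genes in the tested set (e.g., DE genes)
    overlap:   number of selected genes that are in the category
    """
    denom = comb(universe, selected)
    upper = min(annotated, selected)
    favourable = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / denom


# Toy example (invented numbers): 10 genes in the background, 4 annotated,
# 5 selected, 3 of the selected genes annotated.
p_toy = hypergeom_upper_tail(universe=10, annotated=4, selected=5, overlap=3)
```

A small upper-tail probability indicates that the observed overlap is larger than expected if the selected genes were drawn at random from the background.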

    We next performed an analogous cancer versus normal analysis on RNA-seq data from 14 tumor-normal pairs (4,055 primary cancer samples and 563 normal tissue samples; Supplementary Table S1B) from The Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov/). The fold changes observed for the TCGA analysis were considerably weaker than those seen for the FANTOM5 analysis (Fig. 2), presumably because the mixture of cells in a tumor dilutes the cancer cell signal. To recover similar numbers of genes from both the TCGA and FANTOM5 analyses, we therefore applied a weaker threshold (abs FC > 2, FDR < 0.01) to identify 490 upregulated genes and 1,661 downregulated genes (Supplementary Table S4). The upregulated genes were enriched for those listed in the Cancer Gene Census (hypergeometric test P = 0.03, 18 genes). Of particular note, many more genes were downregulated in the tumor-normal comparison than we observed for the cancer cell line-primary cell comparison.

    Potential pan-cancer biomarkers

    We found that 76 (17%) of the upregulated genes identified in the cancer cell lines analysis were also upregulated in primary tumors from TCGA (Fig. 2). Among them we find oncogenes (HOXC13, MYEOV, MNX1, and CASC5), cancer antigens (PRAME, CD70, CASC5, IGF2BP3) and, somewhat unexpectedly, the tumor suppressors (TP73, BLM, BUB1B). The upregulated genes were also enriched in genes involved in cell cycle, DNA metabolism, biopolymer metabolism, and homeobox genes involved in development. This included well-known pan-cancer genes such as TERT, PRAME, and TOP2A (17, 18), as well as MYEOV and MNX1, which are implicated in blood malignancies (19, 20), and FAM111B, implicated in prostate cancer (21).

    For the downregulated genes, 52 (19%) genes from the FANTOM5 cancer cell lines analysis were also downregulated in primary tumors (Fig. 2). Interestingly, the list was enriched for genes related to oxidoreductase activity (five genes: AOX1, PTGS1, ACOX2, COX7A1, and the tumor suppressor GPX3; ref. 22). Because the downregulation is seen in both cancer cell lines and primary tumors, we deduce that the changes are caused by a permanent reprogramming of metabolism in cancer cells rather than by a response to the tumor microenvironment or cell culture conditions. We also observed seven discrepancies: CDKN2A, COL1A1, COL5A2, GJB2, HIST1H2BH, MMP9, and TNFRSF6B were downregulated in the cancer cell lines but upregulated in the primary tumor analysis.

    Finally, we used recent proteome data from 90 colorectal cancers and 30 normal tissues published by Zhang and colleagues (7). The spectral count data were available for 239 of our 656 differentially expressed genes. Twenty mRNAs/proteins were upregulated in both the cancer cell lines (CAGE) and colorectal tumors (mass spec data), whereas 16 were upregulated in both the RNA-seq and mass spec data (Supplementary Fig. S2A and Supplementary Table S9C). Notably, four genes were robustly upregulated in all three comparisons: MCM2, TOP2A, ASNS, and MKI67.

    There were 108 genes that were downregulated at the protein level and in at least one transcriptome analysis (CAGE or RNA-seq). Strikingly, the top 10 enriched terms within those genes were all related to metabolic processes, either to oxidative processes or lipid metabolism (Supplementary Fig. S2B), thereby confirming the metabolic pathway changes that we have observed from the RNA data.

    Pan-cancer long-noncoding RNAs

    From the cancer cell line analysis we identified 271 differentially expressed lncRNAs (181 lncRNAs annotated with GENCODE 19, plus a further 90 with the miTranscriptome annotation; ref. 23). The majority (247 lncRNAs) were upregulated, whereas 24 were downregulated (Supplementary Table S10A). In total, 39 and five of these were up- and downregulated, respectively, in both the cancer cell line analysis and at least one tumor type in the miTranscriptome study (23). Of those, 21 were consistently upregulated and two consistently downregulated in cancer cell lines and at least two tumor types (Supplementary Table S10B).

    For two of these lncRNAs (ENST00000448869 and FOXP4-AS1), we performed qRT-PCR validation in cancer cell lines versus primary cells and also in a cDNA panel covering eight tumor types and normal matching tissues. In both cases, the targets were highly significantly upregulated in both cancer cell lines and tumors (Fig. 4).

    We also looked for overlap between our list of pan-cancer lncRNAs and the 229 "onco-lncRNAs" identified by Cabanski and colleagues (24), which allowed us to confirm three additional upregulated lncRNAs (Supplementary Table S10B and S10D). Our analysis of preprocessed TCGA RNA-seq data also allowed us to confirm deregulation of four lncRNAs: two already confirmed by miTranscriptome and Cabanski (MEG3 and DGCR5) and two that were missed by other reports, namely downregulation of the MT1L pseudogene and, most notably, upregulation of PVT1, which is a well-known lncRNA oncogene (25).

    Figure 3. Genomic neighborhood of pan-cancer–associated noncoding transcripts. A, RP11-124N14.3 lncRNA is consistently downregulated in cancer cell lines and is positioned antisense to VIM (EMT marker). The downregulation has been confirmed by qPCR (Fig. 4A) and in the miTranscriptome data (Supplementary Table S10A). B, we observed that the REP522 repeat becomes activated in cancer, giving rise to bidirectional transcription. The example here shows the promoters of the protein coding CCDC144NL and its antisense CCDC144NL-AS1 overlapping a REP522 element. The upregulation of both genes was validated in cancer cell lines by qPCR (Fig. 4A). C, in another example, BRCAT95, which is upregulated in breast cancer (Supplementary Table S10A), is initiated from a REP522 element and is confirmed to be upregulated by qPCR. The other transcript could not be confirmed by qPCR, likely due to low expression level and/or an incorrectly assembled transcript. Interestingly, the promoter pair overlaps a FANTOM5-defined enhancer region. D, ChIA-PET data show that the pan-cancer enhancer chr1:46575103-46575175 is physically associated with the promoter of the PIK3R3 gene, which is reported to increase tumor migration and invasion when overexpressed in colorectal cancer (29) and is identified as a potential therapeutic target in ovarian cancer (41). The visualizations were performed in the ZENBU genome browser (42).


  • Deregulated long-noncoding RNAs located near cancer-related genes

    We next looked at the genomic neighborhood of the differentially expressed lncRNAs. For 27 of the 181 (GENCODE 19) differentially expressed lncRNAs, we found 33 cancer-related genes within 100 kb (Table 2, example in Supplementary Fig. S3). For example, PVT1 neighbors MYC; these are consistently co-gained in cancer. We also observe RP11-1070N10.5 neighboring the TCL6 (lincRNA), TCL1A, and TCL1B oncogenes [located in a breakpoint cluster region on chromosome 14q32 in T-cell leukemia (26)] and HOXA11-AS, neighboring HOXA13 and HOXA9 and overlapping the HOXA11 oncogene. Notably, five out of six cancer-related genes located within 1 kb from upregulated lncRNAs were also upregulated; these include the MCF2L, GATA2, and MNX1 oncogenes and the BSG and CSAG1 cancer antigens (Table 2). Possibly linked to cancer metabolism, the upregulated PCAT7 is located antisense to fructose-1,6-bisphosphatase-2 (FBP2; Supplementary Fig. S3), whose decreased expression promotes glycolysis and growth in gastric cancer cells (27).

    Activation of repeat elements in cancer

    Globally, about 20% of all FANTOM5 promoters initiate from within repetitive elements and low complexity DNA sequences annotated by RepeatMasker. We observed a simple relationship for promoters that overlapped a repetitive element: the higher the fold change (upregulation in cancer), the higher the probability that the promoter overlapped a repetitive element (Supplementary Fig. S5; see Supplementary Table S13 for the promoter–repeat associations). A more detailed analysis revealed that the upregulated promoters are enriched in retrotransposons (SINE/Alu, LINE/L1, LTR/ERV1, LTR/ERVL). The SINE/Alu and LINE/L1 overlapping promoters tended to be located in intronic regions of protein coding genes (49% and 32%, respectively) and were not associated with known RNA transcripts, whereas the upregulated promoters overlapping LTR/ERV1 often initiated the expression of lncRNAs (31 GENCODE lncRNAs and 48 miTranscriptome lncRNAs; Table 3).

    In contrast, the majority of promoters overlapping simple repeats and low complexity sequences were associated with protein coding transcripts. Simple repeats were enriched among upregulated promoters, whereas low complexity sequences were enriched among downregulated promoters (Table 3).
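    The enrichments in Table 3 are reported as Fisher exact odds ratios with two-sided P-values. A minimal pure-Python sketch of such a test for a 2×2 table follows. The function is an illustrative implementation, not the code used in the study, and the background counts in the usage example are hypothetical, so the output is not expected to reproduce the values in Table 3.

```python
from math import exp, lgamma


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test for the table [[a, b], [c, d]].
    Returns (sample odds ratio, p-value)."""
    odds = (a * d) / (b * c) if b * c else float("inf")
    row1, col1, n = a + b, a + c, a + b + c + d

    def log_p(x):
        # Hypergeometric log-probability of observing x in the top-left cell.
        return (_log_comb(col1, x) + _log_comb(n - col1, row1 - x)
                - _log_comb(n, row1))

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    log_obs = log_p(a)
    # Sum all tables that are no more probable than the observed one.
    p = sum(exp(log_p(x)) for x in range(lo, hi + 1)
            if log_p(x) <= log_obs + 1e-9)
    return odds, min(1.0, p)


# Rows: promoters overlapping the repeat family / all other promoters.
# Columns: upregulated / not upregulated.
# 25 and 47 follow the REP522 counts in the text (25 of 72 upregulated);
# 1,000 and 99,000 are HYPOTHETICAL background counts.
odds_ratio, p_value = fisher_exact_2x2(25, 47, 1_000, 99_000)

# Small textbook-sized check table.
small_odds, small_p = fisher_exact_2x2(3, 1, 1, 3)
```

    Working in log space with `lgamma` keeps the calculation stable when the promoter totals are large.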

    Bidirectional transcription from REP522 satellite repeat is activated in cancer

    Interestingly, a specific repeat family, REP522, was strongly enriched in the most upregulated promoters. REP522, originally called a telomeric satellite, is a largely palindromic, unclassified interspersed repeat of ~1.8 kb in size (28). We observed that out of 72 promoters overlapping REP522, 25 were upregulated in cancer (odds ratio, 62.05). Twenty out of these 25 promoters were associated with a known transcript (five pseudogenes, nine lncRNAs, and one protein coding gene) including the pseudogene BAGE2 (B melanoma antigen family, member 2) and the lncRNAs PCAT7 and BRCAT95, which were previously implicated in cancer (23). In most cases, the transcription is initiated bidirectionally and in five cases it overlaps regions previously annotated as enhancers. To show that the observed activation of REP522 elements was not due to a mapping artifact, we performed qPCR validation for 11 upregulated, REP522-initiated transcripts from different genomic regions in three cancer cell lines and dermal fibroblast cells as a control. For eight of these we confirmed upregulation in the cancer cell lines compared with normal fibroblasts (Fig. 4A). In one case, we confirmed the bidirectional transcription of CCD144NL and CCD144NL-AS1 from one REP522 element (Fig. 3B). The three transcripts for which the qPCR validations did not yield any results represented very lowly expressed, novel and computationally assembled transcripts from miTranscriptome, hinting at the possibility that they were either too lowly expressed or the transcripts were not correctly assembled (Fig. 3C). To our knowledge this is the first report implicating REP522 activation in cancer.

    Enhancer activation in cancer

    Taking advantage of the fact that CAGE data can be used to estimate the activity of enhancers from balanced bidirectional capped transcription (9), we performed differential expression analysis based on CAGE tag counts within 43,011 CAGE-defined enhancers (9), using the same differential expression pipeline as for the promoter regions. We found 28 pan-cancer enhancers upregulated in solid and blood cancers and a further 62 upregulated in solid cancers only (Supplementary Table S5). Enhancers tend to be highly cell-type specific (9); accordingly, we found no broadly downregulated enhancers in cancer.
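    "Balanced bidirectional" transcription can be expressed numerically with a directionality score computed from the CAGE tag counts on the two strands. The sketch below is an illustration under stated assumptions: the score definition is a common formulation, and the 0.8 cutoff is an illustrative value that is not given in this section.

```python
def directionality(fwd_tags, rev_tags):
    """Directionality score in [-1, 1]; 0 means perfectly balanced strands."""
    total = fwd_tags + rev_tags
    if total == 0:
        raise ValueError("no CAGE tags on either strand")
    return (fwd_tags - rev_tags) / total


def is_balanced(fwd_tags, rev_tags, max_abs_score=0.8):
    """True if transcription is roughly balanced (cutoff is an assumed, illustrative value)."""
    return abs(directionality(fwd_tags, rev_tags)) <= max_abs_score


# Toy tag counts, invented for illustration.
balanced_region = is_balanced(30, 20)      # score 0.2
promoter_like_region = is_balanced(95, 5)  # score 0.9, strongly one-sided
```

    A strongly one-sided score is more typical of a promoter producing a stable transcript in one direction than of an enhancer.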

    We found that 23 of the 90 upregulated enhancers could be associated with a miTranscriptome transcript (5′ end within 500 bp of the enhancer; Supplementary Table S11A) and that four of those transcripts were reported to be upregulated in at least one cancer type (Supplementary Table S11B).
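    The 500 bp criterion depends on which end of a transcript counts as its 5′ end, which in turn depends on the strand. A minimal sketch follows, assuming 0-based half-open coordinates and features on the same chromosome; all names and coordinates are hypothetical.

```python
def five_prime_end(start, end, strand):
    """Position of the 5' end for a transcript spanning [start, end)."""
    return start if strand == "+" else end - 1


def distance_to_interval(pos, start, end):
    """0 if pos lies inside [start, end), else the distance to the nearest base."""
    if pos < start:
        return start - pos
    if pos >= end:
        return pos - (end - 1)
    return 0


def associated(transcript, enhancer, max_dist=500):
    """transcript: (start, end, strand); enhancer: (start, end)."""
    tss = five_prime_end(*transcript)
    return distance_to_interval(tss, *enhancer) <= max_dist


# Toy coordinates, invented for illustration.
enhancer = (10_000, 10_400)
plus_strand_near = associated((10_700, 15_000, "+"), enhancer)   # 5' end 301 bp away
minus_strand_near = associated((5_000, 9_800, "-"), enhancer)    # 5' end 201 bp away
minus_strand_far = associated((10_900, 20_000, "-"), enhancer)   # 5' end 9,600 bp away
```

    The third transcript starts only 500 bp from the enhancer in genomic coordinates, but its 5′ end is at the far side because it lies on the minus strand.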

    We next used Chromatin Interaction Analysis by Paired-End Tags (ChIA-PET) data from the ENCODE project to associate these pan-cancer enhancers with their target genes. We found that 55 of the 90 upregulated enhancers can be physically linked to promoters of known genes (228 unique enhancer–gene links; Supplementary Table S6). Seventeen of the enhancers were linked to cancer-related genes, including seven oncogenes (BCL9, CREB1, ZNF384, SALL4, TFRC, BTG1, and the oncomir MIR21), two tumor suppressors (ING4, KCTD11), and five Mut-Drivers (PIK3CB, CLIP1, KIFC3, GPS2, and CARM1; Supplementary Table S6). In addition, eight of the upregulated enhancers were linked to promoters found to be upregulated in cancer cell lines, including cancer-linked genes such as TNFSF12 and PIK3R3 (Fig. 3D; ref. 29).
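    Linking enhancers to genes through ChIA-PET amounts to finding interaction pairs in which one anchor overlaps an enhancer and the other anchor overlaps a promoter. The sketch below assumes a simplified data layout (tuples of chromosome, start, end); the function names and all coordinates are hypothetical, and real ChIA-PET processing involves additional filtering that is not shown.

```python
def overlaps(a, b):
    """a, b: (chrom, start, end) half-open intervals."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def link_enhancers(enhancers, promoters, interactions):
    """enhancers, promoters: dict mapping name -> (chrom, start, end).
    interactions: list of (anchor1, anchor2) interval pairs.
    Returns a sorted list of unique (enhancer, gene) links."""
    links = set()
    for left, right in interactions:
        # An interaction is unordered, so test both anchor orientations.
        for enh_anchor, prom_anchor in ((left, right), (right, left)):
            for enh_name, enh in enhancers.items():
                if not overlaps(enh, enh_anchor):
                    continue
                for gene, prom in promoters.items():
                    if overlaps(prom, prom_anchor):
                        links.add((enh_name, gene))
    return sorted(links)


# Toy data, invented for illustration.
enhancers = {"enh1": ("chr1", 1_000, 1_400), "enh2": ("chr2", 500, 900)}
promoters = {"geneA": ("chr1", 50_000, 50_500), "geneB": ("chr1", 90_000, 90_500)}
interactions = [
    (("chr1", 900, 1_500), ("chr1", 49_800, 50_200)),
    (("chr1", 89_900, 90_100), ("chr1", 1_200, 1_600)),
    (("chr2", 400, 600), ("chr2", 7_000, 7_500)),
]
links = link_enhancers(enhancers, promoters, interactions)
```

    Collecting links in a set mirrors the counting of unique enhancer–gene links: one enhancer can contribute several links, and repeated interactions between the same pair count once.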

    Discussion

    By using the FANTOM5 CAGE expression data from cancer cell lines and primary cells, together with the TCGA RNA-seq and proteome expression datasets from tumors and normal tissues, we have built an overview of recurrent expression changes in cancer.

    These datasets have their own strengths and weaknesses. Both tumors and normal tissues are complex mixtures of cell types (cancer cells, infiltrating lymphocytes, stroma, and blood vessels), which complicates the interpretation of differential expression between normal and cancer in the TCGA analysis: differences in gene expression may simply reflect differences in cell composition. To minimize this issue, the TCGA (3) required that profiled tumor samples contain at least 60% tumor cells and less than 20% necrosis. The FANTOM5 cell line and primary cell data avoid this complication, as relatively homogeneous, pure cell cultures were profiled. Conversely, artifacts from the long-term culture of cell lines and their artificial in vitro culture conditions could affect our FANTOM5 analysis. The TCGA avoids this by directly profiling fresh tissue.
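    The cell-composition effect can be made concrete with a toy mixture calculation. In the sketch below, bulk expression is modeled as the fraction-weighted mean of per-cell-type expression; the cell types, fractions, and expression values are all invented for illustration and the linear mixing model is a simplifying assumption.

```python
from math import log2


def bulk_expression(fractions, per_cell_expression):
    """Expression of a bulk sample as the fraction-weighted mean of its cell types."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("cell-type fractions must sum to 1")
    return sum(frac * per_cell_expression[cell] for cell, frac in fractions.items())


# Hypothetical gene: identical per-cell expression in normal and tumor samples,
# but ten times higher in lymphocytes than in epithelial or cancer cells.
per_cell = {"epithelial_or_cancer": 10.0, "lymphocyte": 100.0}

normal_tissue = {"epithelial_or_cancer": 0.95, "lymphocyte": 0.05}
tumor_tissue = {"epithelial_or_cancer": 0.70, "lymphocyte": 0.30}  # more infiltration

apparent_log2fc = log2(
    bulk_expression(tumor_tissue, per_cell) / bulk_expression(normal_tissue, per_cell)
)
```

    Here the gene appears more than twofold "upregulated" in the tumor although no cell type changed its expression, and the tumor sample would still satisfy a 60% tumor-cell threshold.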

    As expected, there are differences in the gene sets identified by the two datasets. Despite this, we identified a core set of 128 markers that are consistently perturbed in both the FANTOM5 cell and TCGA tissue analyses. Four of the markers are also upregulated at the protein level in a colon cancer dataset: TOP2A, MKI67, MCM2, and ASNS, which are among the most studied cancer biomarkers and drug targets. TOP2A is targeted by etoposide (30), ASNS is targeted in asparaginase therapy of acute lymphoblastic leukemia (31), and both MKI67 and MCM2 have been studied as biomarkers (32, 33) and potential drug targets (34, 35). Targeting these genes is likely to bring therapeutic value to many patients, as they are recurrently upregulated across many cancer types. Our pan-cancer markers also appear to be mostly novel, as comparison with prior works [Rhodes and colleagues, a multicancer meta-signature of 67 genes upregulated in cancer from a meta-analysis of 40 published microarray experiments (18); Xu and colleagues, 46 genes upregulated across 21 microarray data sets (36)] found little overlap (Supplementary Table S8).
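    A core set of this kind can be thought of as the genes that are called in both analyses with the same direction of change. The sketch below shows that idea with toy inputs; `GENE_X`, `GENE_Y`, and `GENE_Z` are hypothetical placeholders, and the actual selection criteria used in the study are not reproduced here.

```python
def core_markers(calls_a, calls_b):
    """calls_a, calls_b: dict mapping gene -> 'UP' or 'DOWN'.
    Returns genes present in both with the same direction."""
    return {gene: direction
            for gene, direction in calls_a.items()
            if calls_b.get(gene) == direction}


# Toy calls; only TOP2A and MKI67 reflect genes named in the text.
cell_line_calls = {"TOP2A": "UP", "MKI67": "UP", "GENE_X": "UP", "GENE_Y": "DOWN"}
tissue_calls = {"TOP2A": "UP", "MKI67": "UP", "GENE_X": "DOWN", "GENE_Z": "DOWN"}

core = core_markers(cell_line_calls, tissue_calls)
```

    Requiring the same direction excludes genes such as the hypothetical `GENE_X`, which is deregulated in both datasets but in opposite directions.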

    [Figure 4, plot panels. B, qRT-PCR in the cDNA panel, Normal vs. Tumor; y-axis: log2FC (ddCt) vs. mean normal. Panel titles: BLM, mean normal Ct = 35.7, log2FC = 2.6, P = 0.0000002; PRAME, mean normal Ct = 33.2, log2FC = 1.3, P = 0.12; ENST00000448869, mean normal Ct = 33.3, log2FC = 2.6, P = 0.0001; FOXP4-AS1, mean normal Ct = 31.6, log2FC = 2.2, P = 0.00000007. C, ENST00000448869, Normal vs. Tumor by tissue: Breast, Colon, Kidney, Liver, Lung, Ovarian, Prostate, Thyroid.]

    Legend: Statistical significance (double-sided t test): * - P < 0.05, ** - P < 0.01, *** - P < 0.001. • - qPCR validation in cDNA tumor panel. ˚ - Bidirectional transcription from the same REP522 element.

    Figure 4. Validation of pan-cancer biomarkers by qRT-PCR. A, summary of the qRT-PCR validations in three cancer cell lines and dermal fibroblasts as normal reference. The table shows the significant upregulation of eight REP522-associated transcripts and six potential biomarkers, and downregulation of three lncRNAs (potential tumor suppressors). B, for the most promising candidates, we performed qRT-PCR validation across a cDNA panel of 65 tumors, seven lesions, and 24 normal tissues. Note: BLM, a known tumor suppressor, and two selected lncRNAs (ENST00000448869 and FOXP4-AS1) showed highly significant upregulation in cancer. C, as in B, but showing the upregulation of ENST00000448869 across multiple cancer types.
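    The log2FC (ddCt) values relative to the mean of normal samples in Figure 4 follow the usual ΔΔCt logic, which can be sketched as below. This is an illustration under stated assumptions: the reference gene, the Ct values, and the function names are hypothetical, and the calculation assumes perfect doubling per PCR cycle.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """ΔCt: target Ct normalized to a reference gene measured in the same sample."""
    return ct_target - ct_reference


def log2fc_vs_mean_normal(sample_dct, normal_dcts):
    """-ΔΔCt relative to the mean normal ΔCt.
    Positive values mean higher expression than the average normal sample."""
    return -(sample_dct - mean(normal_dcts))


# Toy Ct values, invented for illustration (reference gene Ct fixed at 20.0).
normal_dcts = [delta_ct(31.0, 20.0), delta_ct(32.0, 20.0), delta_ct(30.0, 20.0)]
tumor_dct = delta_ct(28.4, 20.0)

tumor_log2fc = log2fc_vs_mean_normal(tumor_dct, normal_dcts)
```

    A lower Ct means the transcript was detected earlier and is therefore more abundant, which is why the sign is flipped when converting ΔΔCt to a log2 fold change.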


  • The FANTOM5 CAGE data also allowed us to look at transcript types rarely studied in prior efforts (long noncoding RNAs, enhancer RNAs, and repetitive element-derived RNAs). We report 271 pan-cancer lncRNAs, including well-known cancer-associated lncRNAs such as PVT1 and many novel cases. Public datasets confirmed the upregulation of 25 and downregulation of three of these lncRNAs in at least two primary tumor types (23, 24), and we further validated upregulation of two novel lncRNAs by qRT-PCR in a cDNA panel covering eight tumor types. We also identify 90 enhancer RNA-producing regions that are recurrently activated in cancer cell lines. For four of them, a corresponding lncRNA transcript model is upregulated in the TCGA dataset.

    The observation that promoters that overlap repetitive elements are often upregulated in cancer is quite interesting, and the link of the little-known REP522 element to cancer is novel. One instance of REP522 near the B melanoma antigen (BAGE) locus has been reported to be marked with H3K9me3 and actively transcribed (37), perhaps suggesting that REP522 transcriptional activation is responsible for upregulation of BAGE in cancer. Other, better-studied elements such as LTR elements have previously been reported to act as alternative promoters of host genes in mouse embryos (38) and to contribute to the complexity of the transcriptome of iPS and stem cells (39). Thus, the reactivation of these elements and the eRNAs identified above suggests acquisition of stem cell-like properties by cancer cells. This reactivation is possibly because repetitive sequences are usually suppressed by methylation in somatic cells, whereas in cancer they are frequently hypomethylated (40).

    Table 3. Promoters overlapping repetitive elements

    Column groups: Total = repeat overlapping promoters; # down through the second P-value = differentially expressed repeat overlapping promoters; 5′UTR through Not annotated = location of differentially expressed repeat (5′UTR, Intron, Exon, and 3′UTR are protein coding).

    Repeat family   Total   # down  Odds ratio  P-value   # up  Odds ratio  P-value   5′UTR  Intron  Exon  3′UTR  lncRNA  Pseudogene  Not annotated
    REP522          72      0       0           1         25    62.05       2.2E−16   1      0       0     0      9       3           12
    Low_complexity  2,013   13      2.37        4.7E−03   18    1.04        0.81      15     2       2     6      2       0           4
    Simple_repeat   11,982  44      1.35        0.06      204   2.13        2.2E−16   86     70      4     7      17      1           63
    SINE/Alu        3,961   0       0           2.4E−05   138   4.44        2.2E−16   5      67      1     1      11      3           50
    LINE/L1         3,426   1       0.1         1.5E−03   67    2.35        1.8E−09   2      22      0     0      12      0           32
    LINE/L2         3,220   2       0.22        0.01      25    0.9         0.7       2      4       0     0      4       0           17
    LTR/ERVL-MaLR   3,613   0       0           7.8E−05   31    0.99        1         6      4       0     0      10      0           11
    LTR/ERV1        3,932   2       0.18        3.0E−03   133   4.3         2.2E−16   7      12      0     0      31      2           83
    LTR/ERVL        1,488   0       0           0.04      20    1.57        0.049     2      2       0     0      8       0           8

    NOTE: The table shows the numbers of upregulated and downregulated promoters that overlap nine families of repetitive elements (≥20 promoters) as well as Fisher exact statistics of the enrichments (odds ratios and P-values, two-sided test). The right side of the table shows the available information about the annotation of those promoters.

    Table 2. The differentially expressed lncRNAs located within 100 kb from known cancer-related genes

    lncRNA      lncRNA DE summary   Neighbor gene name   Neighbor DE summary   Neighbor gene info   Distance from lncRNA   Overlap   Strand
    MCF2L-AS1   Solid UP            MCF2L                Solid UP              Oncogene

    In conclusion, our results, which highlight the transcriptome changes in cancer and cover protein coding genes, non-protein coding transcripts, unannotated promoters, and enhancer RNAs, represent an important step towards discovery of potentially useful cancer biomarkers and therapeutic targets. Technologies developed to detect and target these molecules have great potential to be applicable to a broad range of cancers. One last note is that we identify molecules that are consistently up or down in cancer-versus-normal comparisons, but are not necessarily always higher in all cancers relative to all normal tissues (a subset are). Such molecules may not be suitable for plasma/serum-based diagnostics but would be useful in screening biopsies in a histopathologic setting.
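    The distinction between "consistently up in cancer-versus-normal comparisons" and "higher in all cancers than in all normal tissues" can be illustrated with a small check on matched expression values. The tissues and numbers below are invented, and the two functions are hypothetical helpers rather than criteria used in the study.

```python
def consistently_up(pairs):
    """pairs: (tumor, normal) expression values for matched comparisons.
    True if the tumor value is higher within every comparison."""
    return all(tumor > normal for tumor, normal in pairs)


def separates_all(pairs):
    """True only if every tumor value exceeds every normal value,
    so that a single cutoff could separate the two groups."""
    return min(tumor for tumor, _ in pairs) > max(normal for _, normal in pairs)


# Toy expression values per tissue: (tumor, matched normal).
pairs = [
    (8.0, 2.0),    # e.g. breast
    (40.0, 20.0),  # e.g. liver
    (15.0, 5.0),   # e.g. kidney
]

up_in_each_comparison = consistently_up(pairs)
single_cutoff_possible = separates_all(pairs)
```

    In this toy case the marker is up in every matched comparison, yet the normal value in one tissue exceeds the tumor values in the others, so a tissue-agnostic threshold would fail while a biopsy compared against its own tissue of origin would still be informative.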

    Disclosure of Potential Conflicts of Interest

    P. Carninci is founder and CEO for TransSINE Technologies. No potential conflicts of interest were disclosed by the other authors.

    Authors' Contributions

    Conception and design: B. Kaczkowski, Y. Hayashizaki, P. Carninci, A.R.R. Forrest
    Development of methodology: B. Kaczkowski, M. Itoh, P. Carninci
    Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): Y. Tanaka, M. Itoh, The FANTOM5 Consortium, P. Carninci, A.R.R. Forrest
    Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): B. Kaczkowski, H. Kawaji, A. Sandelin, R. Andersson, M. Itoh, T. Lassmann
    Writing, review, and/or revision of the manuscript: B. Kaczkowski, R. Andersson, P. Carninci, A.R.R. Forrest
    Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): Y. Tanaka, H. Kawaji, A. Sandelin, T. Lassmann, The FANTOM5 Consortium, P. Carninci
    Study supervision: A.R.R. Forrest
    Other (supported experimental design of qPCR validation of pan-cancer marker candidates and data analysis, as well as performing experiments): Y. Tanaka

    Acknowledgments

    The authors thank Erik Arner, Efthymios Motakis, Kosuke Hashimoto, Dave Tang, Chung-Chau Hon, Jordan Ramilowski, and Giovani Pascarella for valuable discussions and comments on the manuscript, and Yuri Ishizu for technical assistance.

    Grant Support

    B. Kaczkowski was supported by the Postdoctoral Fellowship Program from the Japan Society for the Promotion of Science (JSPS) and the Foreign Postdoctoral Researcher (FPR) program from RIKEN, Japan. Y. Tanaka was supported by Grants-in-Aid for Scientific Research (KAKENHI) from the Ministry of Education, Culture, Sports, Science, and Technology. R. Andersson was supported by funding from the European Research Council (ERC) under the European Union's Horizon 2020 Research and Innovation Programme (grant agreement No. 638273). A. Sandelin was supported by the Novo Nordisk Foundation and the Lundbeck Foundation. FANTOM5 was made possible by a Research Grant for RIKEN Omics Science Center from MEXT to Y. Hayashizaki and a Grant of the Innovative Cell Biology by Innovative Technology (Cell Innovation Program) from the MEXT, Japan to Y. Hayashizaki. This study is also supported by Research Grants from the Japanese Ministry of Education, Culture, Sports, Science and Technology through RIKEN Preventive Medicine and Diagnosis Innovation Program to Y. Hayashizaki and RIKEN Centre for Life Science, Division of Genomic Technologies to P. Carninci. A.R.R. Forrest is supported by a Senior Cancer Research Fellowship from the Cancer Research Trust and funds raised by the MACA Ride to Conquer Cancer.

    The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

    Received February 18, 2015; revised September 21, 2015; accepted October 4, 2015; published OnlineFirst November 9, 2015.

    References

    1. Felder M, Kapur A, Gonzalez-Bosquet J, Horibata S, Heintz J, Albrecht R, et al. MUC16 (CA125): tumor biomarker to cancer therapy, a work in progress. Mol Cancer 2014;13:129.

    2. Makarov DV, Loeb S, Getzenberg RH, Partin AW. Biomarkers for prostate cancer. Annu Rev Med 2009;60:139–51.

    3. Cancer Genome Atlas Research Network, Kandoth C, Schultz N, Cherniack AD, Akbani R, Liu Y, et al. Integrated genomic characterization of endometrial carcinoma. Nature 2013;497:67–73.

    4. Cancer Genome Atlas Research Network, Genome Characterization Center, Chang K, Creighton CJ, Davis C, Donehower L, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 2013;45:1113–20.

    5. FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 2014;507:462–70.

    6. The Cancer Genome Atlas - Data Portal [Internet]. [cited 2015 Jul 14]. Available from: https://tcga-data.nci.nih.gov/

    7. Zhang B, Wang J, Wang X, Zhu J, Liu Q, Shi Z, et al. Proteogenomic characterization of human colon and rectal cancer. Nature 2014;513:382–7.

    8. FANTOM5 project [Internet]. [cited 2015 Jul 14]. Available from: http://fantom.gsc.riken.jp/5/

    9. Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, et al. An atlas of active enhancers across human cell types and tissues. Nature 2014;507:455–61.

    10. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010;26:139–40.

    11. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102:15545–50.

    12. Magrane M, UniProt Consortium. UniProt Knowledgebase: a hub of integrated protein data. Database (Oxford) 2011:bar009.

    13. Zhao M, Sun J, Zhao Z. TSGene: a web resource for tumor suppressor genes. Nucleic Acids Res 2013;41:D970–6.

    14. Tamborero D, Gonzalez-Perez A, Perez-Llamas C, Deu-Pons J, Kandoth C, Reimand J, et al. Comprehensive identification of mutational cancer driver genes across 12 tumor types. Sci Rep 2013;3:2650.

    15. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, et al. A census of human cancer genes. Nat Rev Cancer 2004;4:177–83.

    16. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res 2012;22:1760–74.

    17. Fratta E, Coral S, Covre A, Parisi G, Colizzi F, Danielli R, et al. The biology of cancer testis antigens: putative function, regulation and therapeutic potential. Mol Oncol 2011;5:164–82.

    18. Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, et al. Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci U S A 2004;101:9309–14.

    19. Janssen JW, Vaandrager JW, Heuser T, Jauch A, Kluin PM, Geelen E, et al. Concurrent activation of a novel putative transforming gene, myeov, and cyclin D1 in a subset of multiple myeloma cell lines with t(11;14)(q13;q32). Blood 2000;95:2691–8.

    20. Taketani T, Taki T, Sako M, Ishii T, Yamaguchi S, Hayashi Y. MNX1-ETV6 fusion gene in an acute megakaryoblastic leukemia and expression of the MNX1 gene in leukemia and normal B cell lines. Cancer Genet Cytogenet 2008;186:115–9.

    21. Akamatsu S, Takata R, Haiman CA, Takahashi A, Inoue T, Kubo M, et al. Common variants at 11q12, 10q26 and 3p11.2 are associated with prostate cancer susceptibility in Japanese. Nat Genet 2012;44:426–9, S1.

    22. Barrett CW, Ning W, Chen X, Smith JJ, Washington MK, Hill KE, et al. Tumor suppressor function of the plasma glutathione peroxidase gpx3 in colitis-associated carcinoma. Cancer Res 2013;73:1245–55.

    23. Iyer MK, Niknafs YS, Malik R, Singhal U, Sahu A, Hosono Y, et al. The landscape of long noncoding RNAs in the human transcriptome. Nat Genet 2015;47:199–208.

    24. Cabanski CR, White NM, Dang HX, Silva-Fisher JM, Rauck CE, Cicka D, et al. Pan-cancer transcriptome analysis reveals long noncoding RNAs with conserved function. RNA Biol 2015;12:628–42.

    25. Tseng Y-Y, Moriarity BS, Gong W, Akiyama R, Tiwari A, Kawakami H, et al. PVT1 dependence in cancer with MYC copy-number increase. Nature 2014;512:82–6.

    26. Saitou M, Sugimoto J, Hatakeyama T, Russo G, Isobe M. Identification of the TCL6 genes within the breakpoint cluster region on chromosome 14q32 in T-cell leukemia. Oncogene 2000;19:2796–802.

    27. Li H, Wang J, Xu H, Xing R, Pan Y, Li W, et al. Decreased fructose-1,6-bisphosphatase-2 expression promotes glycolysis and growth in gastric cancer cells. Mol Cancer 2013;12:110.

    28. Wheeler TJ, Clements J, Eddy SR, Hubley R, Jones TA, Jurka J, et al. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res 2013;41:D70–82.

    29. Wang G, Yang X, Li C, Cao X, Luo X, Hu J. PIK3R3 induces epithelial-to-mesenchymal transition and promotes metastasis in colorectal cancer. Mol Cancer Ther 2014;13:1837–47.

    30. Johnson CA, Padget K, Austin CA, Turner BM. Deacetylase activity associates with topoisomerase II and is necessary for etoposide-induced apoptosis. J Biol Chem 2001;276:4539–42.

    31. Hawkins DS, Park JR, Thomson BG, Felgenhauer JL, Holcenberg JS, Panosyan EH, et al. Asparaginase pharmacokinetics after intensive polyethylene glycol-conjugated L-asparaginase therapy for children with relapsed acute lymphoblastic leukemia. Clin Cancer Res 2004;10:5335–41.

    32. Dudderidge TJ, Stoeber K, Loddo M, Atkinson G, Fanshawe T, Griffiths DF, et al. Mcm2, Geminin, and KI67 define proliferative state and are prognostic markers in renal cell carcinoma. Clin Cancer Res 2005;11:2510–7.

    33. Wharton SB, Chan KK, Anderson JR, Stoeber K, Williams GH. Replicative Mcm2 protein as a novel proliferation marker in oligodendrogliomas and its relationship to Ki67 labelling index, histological grade and prognosis. Neuropathol Appl Neurobiol 2001;27:305–13.

    34. Liu Y, He G, Wang Y, Guan X, Pang X, Zhang B. MCM-2 is a therapeutic target of Trichostatin A in colon cancer cells. Toxicol Lett 2013;221:23–30.

    35. Rahmanzadeh R, Rai P, Celli JP, Rizvi I, Baron-Lühr B, Gerdes J, et al. Ki-67 as a molecular target for therapy in an in vitro three-dimensional model for ovarian cancer. Cancer Res 2010;70:9234–42.

    36. Xu L, Geman D, Winslow RL. Large-scale integration of cancer microarray data identifies a robust common cancer signature. BMC Bioinformatics 2007;8:275.

    37. Ward MC, Wilson MD, Barbosa-Morais NL, Schmidt D, Stark R, Pan Q, et al. Latent regulatory potential of human-specific repetitive elements. Mol Cell 2013;49:262–72.

    38. Peaston AE, Evsikov AV, Graber JH, de Vries WN, Holbrook AE, Solter D, et al. Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev Cell 2004;7:597–606.

    39. Fort A, Hashimoto K, Yamada D, Salimullah M, Keya CA, Saxena A, et al. Deep transcriptome profiling of mammalian stem cells supports a regulatory role for retrotransposons in pluripotency maintenance. Nat Genet 2014;46:558–66.

    40. Ehrlich M. DNA methylation in cancer: too much, but also too little. Oncogene 2002;21:5400–13.

    41. Zhang L, Huang J, Yang N, Greshock J, Liang S, Hasegawa K, et al. Integrative genomic analysis of phosphatidylinositol 3′-kinase family identifies PIK3R3 as a potential therapeutic target in epithelial ovarian cancer. Clin Cancer Res 2007;13:5314–21.

    42. Severin J, Lizio M, Harshbarger J, Kawaji H, Daub CO, Hayashizaki Y, et al. Interactive visualization and analysis of large-scale sequencing datasets using ZENBU. Nat Biotechnol 2014;32:217–9.


  • Cancer Res 2016;76:216-226. Published OnlineFirst November 9, 2015.
    Bogumil Kaczkowski, Yuji Tanaka, Hideya Kawaji, et al. Transcriptome Analysis of Recurrently Deregulated Genes across Multiple Cancers Identifies New Pan-Cancer Biomarkers.

    Updated version: access the most recent version of this article at doi: 10.1158/0008-5472.CAN-15-0484

    Supplementary Material: access the most recent supplemental material at http://cancerres.aacrjournals.org/content/suppl/2015/11/10/0008-5472.CAN-15-0484.DC1
