
Cancer Cell, Volume 23

Supplemental Information

Integrative Genomic Analyses Reveal an Androgen-Driven Somatic Alteration Landscape in Early-Onset Prostate Cancer

Joachim Weischenfeldt, Ronald Simon, Lars Feuerbach, Karin Schlangen, Dieter Weichenhan, Sarah Minner, Daniela Wuttig, Hans-Jörg Warnatz, Henning Stehr, Tobias Rausch, Natalie Jäger, Lei Gu, Olga Bogatyrova, Adrian M. Stütz, Rainer Claus, Jürgen Eils, Roland Eils, Clarissa Gerhäuser, Po-Hsien Huang, Barbara Hutter, Rolf Kabbe, Christian Lawerenz, Sylwester Radomski, Cynthia C Bartholomae, Maria Fälth, Stephan Gade, Manfred Schmidt, Nina Amschler, Thomas Haß, Rami Galal, Jovisa Gjoni, Ruprecht Kuner, Constance Baer, Sawinee Masser, Christof von Kalle, Thomas Zichner, Vladimir Benes, Benjamin Raeder, Malte Mader, Vyacheslav Amstislavskiy, Meryem Avci, Hans Lehrach, Dmitri Parkhomchuk, Marc Sultan, Lia Burkhardt, Markus Graefen, Hartwig Huland, Martina Kluth, Antje Krohn, Hüseyin Sirma, Laura Stumm, Stefan Steurer, Katharina Grupp, Holger Sültmann, Guido Sauter, Christoph Plass, Benedikt Brors, Marie-Laure Yaspo, Jan O. Korbel, and Thorsten Schlomm

Inventory of Supplemental Information

Supplemental Data

Figure S1, related to Table 2.

Table S1, related to Table 2. Provided as an Excel file.

Figure S2, related to Figure 2.

Table S2, related to Figure 2. Provided as an Excel file.


Table S3, related to Figure 2. Provided as an Excel file.

Table S4, related to Figure 2. Provided as an Excel file.

Table S5, related to Figure 2. Provided as an Excel file.

Table S6, related to Figure 2. Provided as an Excel file.

Table S7, related to Figure 2.

Figure S3, related to Figure 4.

Figure S4, related to Figure 5.

Supplemental Experimental Procedures

Supplemental References


Supplemental Information

Supplemental Data

Figure S1, related to Table 2

False-negative rate for SNV calling and genome-wide patterns of the EO-PCA methylome.


(A) Estimated false-negative rate (FNR) of 5%, based on a sequencing depth of 30x, a tumor purity of 0.5, and the assumption that at least 2 variant-supporting reads are required to call an SNV.

(B) Analysis of differentially DNA-methylated genomic regions (DMRs) revealed 521 promoter-associated DMRs, mostly hypermethylated, common to all eleven EO-PCA tumor samples. Shown is the genome-wide distribution of observed (red bars) and expected (yellow bars) hypermethylated promoters, and of observed (blue bars) and expected (green bars) hypomethylated promoters in EO-PCA. Genes with methylated promoters and associated downregulation of gene expression are listed in Table S6.

(C) Non-random distribution of differentially methylated promoters throughout the PCA

genome, similar to reports in other cancers (Plass and Smiraglia, 2006). The X-axis

displays the number of tumor samples and the Y-axis indicates the number of

differentially methylated promoters. Black and red curves show the expected and the

observed distribution, respectively. The empirical p-value was calculated based on

10,000 permutations.
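The empirical p-value in (C) can be illustrated with a short permutation sketch. It is not the authors' code: the null model (each tumor keeps its number of differentially methylated promoters, but which promoters are affected is drawn uniformly at random), the test statistic (number of promoters altered in at least k tumors), and the function names are assumptions made for illustration.

```python
import random


def promoters_in_at_least_k(sample_sets, k):
    """Count promoters that are differentially methylated in >= k samples."""
    counts = {}
    for promoters in sample_sets:
        for p in promoters:
            counts[p] = counts.get(p, 0) + 1
    return sum(1 for c in counts.values() if c >= k)


def permutation_pvalue(sample_sets, n_promoters, k, n_perm=10_000, seed=0):
    """Empirical p-value for recurrent differential promoter methylation.

    Assumed null model: each sample keeps its number of altered promoters,
    but which promoters are altered is drawn uniformly at random.
    """
    rng = random.Random(seed)
    observed = promoters_in_at_least_k(sample_sets, k)
    universe = range(n_promoters)
    hits = 0
    for _ in range(n_perm):
        permuted = [set(rng.sample(universe, len(s))) for s in sample_sets]
        if promoters_in_at_least_k(permuted, k) >= observed:
            hits += 1
    # Add-one correction: the estimate can never be exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: 11 tumors share promoters 0-49 out of 1,000.
    rng = random.Random(1)
    shared = set(range(50))
    tumors = [shared | set(rng.sample(range(50, 1000), 50)) for _ in range(11)]
    print(permutation_pvalue(tumors, n_promoters=1000, k=11, n_perm=200))
```

With the add-one correction, 10,000 permutations cannot yield a p-value below 1/10,001, so a result at that floor is best read as an upper bound.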


Figure S2, related to Figure 2

(A) Genomic and epigenomic alterations in EO-PCA. Circos plots showing genomic

structural rearrangements, copy-number profiles, SNVs and methylation patterns in 11

EO-PCAs. See legend of Figure 2A for further details.

(B) Tumors with NCOR2 deletions are associated with shorter PSA recurrence-free survival. Prognostic impact of NCOR2 deletions (red line; n=163) compared with control patients without NCOR2 deletion (blue line; n=4,937; p=0.0391; likelihood-ratio test), detected by FISH in a set of 5,100 PCA samples on our TMA resource.

(C-J) PTEN rearrangements and clinical impact.


(C) Co-occurrence of deletions and inactivating translocations of PTEN, assessed on TMAs in a large patient cohort including 11,152 PCAs (del., deletion). Analysis with a PTEN break-apart FISH probe showed positive PTEN break-apart signals in 3% of PCAs (n=5,404). PTEN breaks were detected in 102/443 (23%) cases with a concurrent PTEN deletion and in 53/4,389 (1.2%) samples lacking such a deletion (p<0.0001, Fisher’s exact test).
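The 2x2 comparison in (C) can be reproduced in outline with a pure-Python two-sided Fisher's exact test. This is a sketch, not the authors' software; in practice SciPy's `scipy.stats.fisher_exact` would be the usual choice. The table below is built from the counts stated in (C) (break present/absent among cases with and without PTEN deletion); the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(x):
        # Unnormalized hypergeometric probability that the top-left cell is x.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    # Integer arithmetic is exact; only this final division produces a float.
    return extreme / comb(n, col1)


# Rows: PTEN deletion yes/no; columns: PTEN break yes/no (counts from panel C).
p = fisher_exact_two_sided(102, 443 - 102, 53, 4389 - 53)
print(f"p = {p:.3g}")
```

Comparing exact integer weights avoids the floating-point tie tolerance that library implementations need.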

(D, E) Deletions and translocations are enriched in PCAs with advanced tumor stage (D) and higher Gleason grade (E).

(F-H) FISH analysis of PTEN breaks and deletions.

(F) Tumor cell showing two PTEN copies without breaks as indicated by two pairs of

adjacent red and green FISH signals corresponding to the 5’ and 3’ flanking regions of

the gene.

(G) Tumor cell with heterozygous deletion.

(H) Tumor cell with break of one allele.

(I) Tumor cell with a concurrent PTEN deletion and break.

(J) Kaplan-Meier analysis showing an association between PTEN disruption and early PSA recurrence, both when breaks occur alone and when they occur together with deletions encompassing the other PTEN allele (PTEN normal vs homozygous deletion, p < 0.0001; PTEN normal vs heterozygous deletion, p < 0.0001; PTEN normal vs heterozygous deletion and break, p < 0.0001; PTEN normal vs break only (homozygous or heterozygous), p = 0.0001; likelihood-ratio test).
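The survival curves in (J), and in panel (B) above, are Kaplan-Meier estimates of PSA recurrence-free survival. A minimal estimator is sketched below under the assumption that each patient contributes a follow-up time and a 0/1 recurrence indicator; it only estimates one curve (call it once per PTEN group) and does not reproduce the likelihood-ratio tests reported above. Names and data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of recurrence-free survival.

    times  -- follow-up time per patient (e.g. months to PSA recurrence or last visit)
    events -- 1 if PSA recurrence was observed, 0 if the patient was censored
    Returns (time, survival) pairs at each time with at least one event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_removed = 0
        # Group all patients sharing this time point (events and censorings).
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Patients censored at t are still counted as at risk at t.
        at_risk -= n_removed
    return curve


# Hypothetical follow-up (months) and recurrence flags for five patients.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```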

(K-O) Deregulated miRNAs in EO-PCA.

(K) Overlap of differentially expressed miRNAs in our study and in previously published

PCA studies (Martens-Uzunova et al., 2012; Szczyrba et al., 2010; Wach et al., 2012). In

our study, miRNAs consistently up- or downregulated in all seven tumors analyzed for miRNA expression were considered differentially expressed.


(L) The PTEN-targeting miR-106b-5p that displays oncogenic activity in a PCA mouse

model (Poliseno et al., 2010a) is upregulated 2.4x – 4.6x in all seven EO-PCA samples.

Previous reports showed fold-changes of 1.5 – 2.3 for this miRNA in PCA (Martens-

Uzunova et al., 2012; Szczyrba et al., 2010; Wach et al., 2012; Taylor et al., 2010).

Shown is the expression of miR-106b-5p in all seven analyzed tumor samples (dots), including two samples with full PTEN inactivation (red dots), and in the normal control (horizontal line).

(M) miRNAs contribute to PTEN inactivation. PTEN gene expression (red dots) is shown together with the mean expression of 16 miRNAs previously reported to target PTEN, including miR-106b-5p (Table S7) (left panel, and green squares in right panel), and the mean expression of 14 previously reported competing endogenous RNAs (ceRNAs; blue triangles, right panel), which can stabilize PTEN mRNA by serving as decoys for PTEN-targeting miRNAs (Sumazin et al., 2011; Poliseno et al., 2010b). Expression levels were normalized to a normal prostate tissue sample (red horizontal line).

(N) Methylation analyses by MassARRAY revealed hypomethylation of PTEN-targeting

miRNA promoters (Baer et al., 2012). Data are from an independent set of 35 PCA and

35 normal prostate epithelium samples. Panels on the right display correlations between

miR-93 and miR-141 expression and average methylation levels across the promoter

regions (R corresponds to Spearman rank correlation values). Horizontal lines depict

median values. Samples are labeled using the following color code: black dots = elderly-

onset PCA, red dots = EO-PCAs; grey dots = normal prostate epithelium > 50yrs, green

dots = normal prostate epithelium ≤ 50 yrs.
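R in (N), and in Table S7, is Spearman's rank correlation. A short pure-Python sketch, assuming average ranks for ties (the common convention), shows that it is simply the Pearson correlation of the ranks; it does not compute the p-value (`scipy.stats.spearmanr` returns both). The example values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_r(x, y):
    """Spearman's rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: mean promoter methylation vs. miRNA expression.
methylation = [0.62, 0.55, 0.48, 0.40, 0.33, 0.30]
expression = [1.0, 1.4, 1.3, 2.2, 2.9, 3.5]
print(round(spearman_r(methylation, expression), 3))
```

A negative R, as reported for miR-93 and miR-141, means that lower promoter methylation tends to accompany higher miRNA expression.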

(O) The tumor-suppressive miRNA MIRLET7B (Kong et al., 2012) was inactivated by a disruptive translocation. Displayed are RNA-seq-based expression data (RPM) of


MIRLET7B and the paired-end-mapping-based translocation call involving the miRNA cluster surrounding MIRLET7B in EOPC-04 (left panel); the translocation was also verified by Sanger sequencing. For comparison, RNA expression levels from a normal prostate and a tumor sample (EOPC-010), neither of which carries a rearrangement at this locus, are displayed. The right panel displays expression levels (RPKM) for MIRLET7B in all 11 EO-PCA tumors and a normal prostate tissue sample.

(P-T) A novel SNURF:ETV1 fusion gene.

(P) Verification of the SNURF:ETV1 fusion by dideoxy (Sanger) sequencing across the fusion breakpoint region in patient EOPC-03. The fusion comprised SNURF up to intron 2 at the 5’ end and continued into intron 4 of ETV1 at the 3’ end. SNURF is an imprinted, androgen-regulated, paternally expressed gene (Montano et al., 2007) located in a region of chromosome 15q in which the maternal, but not the paternal, allele is epigenetically silenced (Buiting et al., 1995).

(Q) Relative expression of ETV1 (y-axis; exon RPKM 5’ and 3’ to the breakpoint of

SNURF:ETV1 in EOPC-03, log scale) in all eleven tumors. Further to the right, relative

expression levels (normalized to prostate control) measured by qRT-PCR are depicted

for EOPC-01 – 04 (Table S2).

(R) Evidence for the paternal origin and, hence, activating character of the SNURF

fusion allele in EOPC-03. PCR products used to identify the parental origin of the fusion

allele by sequencing common SNPs are indicated by thick, horizontal black lines

underneath the depicted fused gene. Individual clones were sequenced from EOPC-03

(right, longer PCR product) or from EOPC-03 and parents (left, shorter product). The

base composition is indicated. Question marks indicate fusion-allele-specific bases that could not be detected.


(S) Correlating promoter methylation and gene expression in the imprinting domain on

chromosome 15, with genes (blue boxes) and transcriptional directions (blue arrow

heads). Hyper- and hypomethylated promoters in patients, indicated by numbers or

“ALL”, are shown as red and green vertical bars, respectively. Paternally or maternally

silenced genes are marked by red stars. “IC” marks the imprinting center. An expression

ratio >1.5 was considered upregulated and a ratio <0.67 downregulated; these are indicated by green triangles pointing up or down, respectively.

(T) FISH analysis of SNURF:ETV1 fusion gene rearrangement using break-apart probe

sets for SNURF (left column) and ETV1 (middle column) as well as a fusion probe set for

SNURF:ETV1 (right column). The break-apart probe sets were made from two

differentially (red and green) labeled BAC clones corresponding to the 5’ and 3’ flanking

regions of each of SNURF and ETV1. Intact alleles (no breakage) are indicated by paired red-green signals, whereas breakage is indicated by separate red and green signals.

The fusion probe was made from two differentially labeled BAC clones corresponding to

the 3’ end of SNURF and the 5’ end of ETV1. Shown are single breaks of SNURF (left

column) and of ETV1 (center column), and close proximity of the two breaks

(SNURF:ETV1 fusion; right column). Two additional SNURF breaks were detected by FISH analysis of 1,219 additional PCAs (PCA-Additional#1 and a tetraploid PCA-Additional#2).


Table S7. Related to Figure 2.

Columns: mature miRNA | EO-PCA: number of tumors with upregulation (N=7) | EO-PCA fold-change | elderly-onset PCA fold-change | methylation analysis (MassARRAY): promoter region; methylation; correlation with miRNA expression (R, p). Methylation data that apply to a whole miRNA cluster are given on the cluster line.

Cluster chr.13 (promoter region chr13:91995949-92000723; methylation n.s.; correlation n.s.)
hsa-miR-17-5p | 7 | 4.0 | 1.4
hsa-miR-19a-3p | 6 | 2.6 | 1.6
hsa-miR-19b-3p | 6 | 2.7 | 1.5
hsa-miR-20a-5p | 7 | 5.2 | 1.5
hsa-miR-92a-3p | 5 | 2.3 | n.s.

Cluster chr.7 (promoter region chr7:99695817-99701753; hypomethylated)
hsa-miR-106b-5p | 7 | 3.3 | 1.5 | R = -0.543, p < 0.001
hsa-miR-93-5p | 7 | 5.2 | 1.4 | R = -0.682, p < 0.001
hsa-miR-25-3p | 7 | 3.3 | 1.6 | R = -0.625, p < 0.001

Cluster chr.12 (promoter region chr12:7057258-7074488; hypomethylated)
hsa-miR-141-3p | 6 | 2.0 | 1.6 | R = -0.678, p < 0.001

Cluster chr.1 (not analyzed by MassARRAY)
hsa-miR-214-3p | 3 | 1.5 | -1.2

Cluster chr.14 (not analyzed by MassARRAY)
hsa-miR-494 | 4 | 1.5 | 1.3

Cluster chr.X (promoter region chrX:45610594-45611811; not analyzed)
hsa-miR-221-3p | 3 | 1.1 | -2.2
hsa-miR-222-3p | 1 | -2.0 | -2.2

Other miRNAs
hsa-miR-21-5p | 5 | 1.9 | n.s. | promoter region chr17:57912817-57921277; methylation n.s.; correlation n.s.
hsa-miR-22-3p | 1 | -1.3 | -1.3 | not analyzed
hsa-miR-26b-5p | 4 | 1.1 | 1.2 | not analyzed

PTEN-targeting miRNAs are upregulated in EO-PCA and elderly-onset PCA. The table

displays fold-changes of 16 miRNAs that were previously reported to directly target and

downregulate PTEN (Poliseno et al., 2010a; Liang et al., 2011; Cao et al., 2011; Zhang

et al., 2012; Wang et al., 2011; Xu et al., 2012; Palumbo et al., 2012). The correlation

between gene expression and promoter hypomethylation of the four miRNAs


investigated by MassARRAY analysis was assessed by Spearman’s rank correlation (R, correlation coefficient; p, p-value). For this analysis, miRNA expression was measured in an independent dataset of 35 PCAs and 35 non-malignant prostate samples (n.s., not significant).


Figure S3, related to Figure 4

Patient age as a function of ERG upregulation (upper left plot; n=9,567 patients analyzed using TMAs, p=3.89 × 10^-25), TMPRSS2:ERG presence (upper right plot; n=6,071, p=1.96 × 10^-15), 6q15 deletion (middle left plot; n=3,493, p=2.08 × 10^-7), PTEN disruption (middle right plot; n=5,374, p=3.17 × 10^-3), CHD1 disruption (bottom left plot; n=2,981, p=1.25 × 10^-5) and NCOR2 disruption (bottom right plot, y-axis adjusted; n=5,487, p=0.0158). The display items for ERG, TMPRSS2:ERG, PTEN and 6q were generated using the same data as in Figure 4A. PCA age distributions are shown as horizontal boxplots (boxes, 25th to 75th percentiles; lines, medians; whiskers, 1.5 times the interquartile range; rings, outliers), with histograms indicating presence (top) or absence (bottom) of the disrupted or upregulated protein, and a red line showing the logistic regression. TMPRSS2:ERG, 6q15 loss, and PTEN, CHD1 and NCOR2 disruption were detected by FISH on TMAs, and ERG overexpression by IHC (see Experimental Procedures).
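The red logistic-regression line models the probability that a tumor carries the alteration as a function of patient age. A minimal Newton-Raphson fit of a one-covariate logistic model is sketched below; the toy ages and 0/1 labels are hypothetical, the function names are ours, and the sketch does not reproduce the p-values shown, whose underlying test is not specified here.

```python
import math


def fit_logistic(ages, present, n_iter=25):
    """Fit P(alteration | age) = 1 / (1 + exp(-(b0 + b1 * (age - mean_age))))
    by Newton-Raphson on the log-likelihood."""
    mean_age = sum(ages) / len(ages)
    xs = [a - mean_age for a in ages]  # centering improves numerical stability
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, present):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) @ gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1, mean_age


def predict(b0, b1, mean_age, age):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * (age - mean_age))))


# Hypothetical patients: age at surgery and alteration present (1) / absent (0).
ages = [42, 45, 47, 50, 52, 55, 58, 60, 62, 65, 68, 70, 72, 75, 78]
present = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
b0, b1, m = fit_logistic(ages, present)
print(f"slope per year: {b1:.3f}; P(age 45) = {predict(b0, b1, m, 45):.2f}")
```

A negative slope corresponds to an alteration that is more frequent in younger patients. If the labels were perfectly separable by age, the maximum-likelihood slope would diverge, so real analyses usually rely on a statistics package that reports convergence.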


Figure S4, related to Figure 5

(A) ERG expression, detected by IHC in a set of 8,317 PCA samples, has no prognostic impact. PSA recurrence-free survival was assessed for patients with ERG-positive (blue line; n=3,632) and ERG-negative tumors (red line; n=4,685; p=0.1979; likelihood-ratio test).

(B) Free testosterone (nM) by age decade. Reproduced from (Mohr et al., 2005).

(C) Fraction of ERG-positive tumors in different age groups, stratified by AR expression scored by IHC as strong, moderate, weak or negative.


Supplemental Experimental Procedures

Discovery of SNVs using a consensus approach

To arrive at a high-confidence list of somatic SNVs we applied three distinct

computational pipelines for somatic SNV discovery, and subsequently kept somatic SNV

calls only if they were identified by at least two out of the three pipelines. The following

SNV discovery pipelines were used. In one pipeline, we applied the Genome Analysis

Toolkit (GATK) (DePristo et al., 2011) on reads aligned to the reference genome with

ELAND2, using the quality recalibration and the local realignment features of GATK, and

called SNVs with the UnifiedGenotyper. In another pipeline, reads were aligned with

BWA (with softmasking applied to overlapping read pairs), and SNVs called using

samtools mpileup (Li et al., 2009) and bcftools (version 0.1.17), with parameter

adjustments to allow calling of somatic variants. The default settings of bcftools are designed for diploid samples, but because of tumor heterogeneity, polyploidy, and contamination with normal cells, somatic variants in tumor genomes often have a substantially lower allele frequency than heterozygous variants in a normal diploid genome. The third pipeline used a combination of VarScan (Koboldt et al., 2009) (v2.2.7) and SNVMix2 (Goya et al., 2010) (v0.11.8-r4), also on BWA-aligned reads. Here, to adapt variant calling to tumor genomes, variants with <18x coverage in the tumor and normal samples, as well as those with at least one variant-supporting read in the blood sample, were discarded. Scoring was performed with the following parameters: (i) a maximum estimated tumor purity of 60%; (ii) a variant allele frequency in the tumor of at least 12%; and (iii) minimum mapping and base qualities of 10. Additionally, SNVs that did not pass the quality filters but had matching RNA reads were retained in the third pipeline. For calling germline variants, only high-confidence SNVs present in both germline and tumor and called by both GATK and mpileup were considered. For filtering,

we excluded SNV calls without sequence coverage in the corresponding germline


sample, as well as reads that overlapped simple repeat regions identified by RepeatMasker.

Additionally, sequence contexts known to give rise to false-positive calls (Nakamura et al., 2011) were identified, and where such suspected sequencing errors were present we generally only considered variant calls supported by reads from both strands. SNV calls were annotated with RefSeq, Ensembl genes, and dbSNP (release dbSNP132), and nonsynonymous variant calls were inferred using ANNOVAR (Wang et al., 2010). Condel (Gonzalez-Perez and Lopez-Bigas, 2011) was applied to infer potentially damaging SNVs. Finally, RNA reads were used to assess the expression status of the SNVs.
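The "at least two of three pipelines" consensus rule described at the start of this section can be sketched as a simple vote. This assumes (our assumption, not stated above) that calls from different pipelines are matched on identical chromosome, position, reference and alternate allele; the pipeline variable names are illustrative, and the RNA-based rescue used in the third pipeline is not modeled.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# A somatic SNV keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def consensus_calls(*pipeline_calls: Iterable[Variant], min_support: int = 2) -> Set[Variant]:
    """Keep SNVs reported by at least `min_support` of the given pipelines."""
    support: Counter = Counter()
    for calls in pipeline_calls:
        support.update(set(calls))  # each pipeline votes at most once per SNV
    return {variant for variant, votes in support.items() if votes >= min_support}


# Hypothetical call sets from the three pipelines.
gatk = {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A")}
mpileup = {("chr1", 100, "C", "T"), ("chr3", 300, "A", "G")}
varscan_snvmix = {("chr2", 200, "G", "A"), ("chr3", 300, "A", "G"), ("chr4", 400, "T", "C")}
print(sorted(consensus_calls(gatk, mpileup, varscan_snvmix)))
```

Here the chr4 call, supported by only one pipeline, is dropped.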

We estimated the false-negative rate (FNR) assuming, in line with general ICGC recommendations, that at least 2 variant-supporting reads must be observed to call an SNV. Where we reach 30x sequencing coverage, the estimated FNR is below 5% for all observed tumor purities (Figure S1A).
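One simple way to compute such an FNR is a binomial model, sketched below. The exact model used for Figure S1A is not described here; this sketch assumes a clonal heterozygous somatic mutation (expected variant allele fraction = purity / 2), a fixed depth, and binomially sampled reads, ignoring coverage variation and subclonality, so its numbers need not match the figure. The function name is hypothetical.

```python
from math import comb


def false_negative_rate(depth, purity, min_alt_reads=2, ploidy=2):
    """P(fewer than `min_alt_reads` variant reads) for a heterozygous somatic SNV.

    Assumes (illustration only) a clonal heterozygous mutation, so the expected
    variant allele fraction is purity / ploidy, and binomially sampled reads.
    """
    vaf = purity / ploidy
    return sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )


for purity in (0.2, 0.3, 0.5, 0.8):
    print(f"purity {purity:.1f}: FNR = {false_negative_rate(30, purity):.4f}")
```

The FNR rises steeply as purity falls, because the expected number of variant reads at 30x is only 30 × purity / 2.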

Further details on structural variant calling

For high-confidence detection of structural rearrangements, we considered calls supported by a minimum of four read pairs or by split reads. Rearrangement calls were inferred to be tumor-specific if no corresponding variant was present in the matched normal sample, where correspondence was assessed using an 80% reciprocal overlap criterion for rearrangements larger than 5 kb and a 40% reciprocal overlap criterion for smaller rearrangements. Additionally, we removed all calls present in at least 0.5% of the germline (blood lymphocyte and lymphoblastoid cell line) samples used in the 1000 Genomes Project (Mills et al., 2011) or present in other germline DNA samples of our patient cohort. Gene rearrangements and fusion genes were predicted from mapped read-pair calls, using a 5 kb search window around the two inferred breakpoint loci. We used read-depth analysis to obtain further support for unbalanced genomic rearrangements, by


applying BIC-seq (Xi et al., 2011) to read pairs with standard parameters and lambda=4. Breakpoint information from split-read analysis was included to obtain optimal resolution for read-depth-based calls of unbalanced rearrangements.
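The reciprocal-overlap rule used to decide whether a tumor rearrangement call is also present in the matched normal sample can be sketched for intrachromosomal events (deletions, duplications, inversions) as follows. Assumptions of this sketch: calls are half-open (chrom, start, end, type) intervals, a match also requires the same chromosome and SV type, and the size cutoff is applied to the tumor call; translocations and the 1000 Genomes frequency filter are not modeled. All names are hypothetical.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smallest fraction of either interval covered by their intersection."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def matches_germline(tumor_sv, normal_sv, size_cutoff=5_000):
    """True if a tumor SV corresponds to a call in the matched normal:
    80% reciprocal overlap for events > 5 kb, 40% for smaller ones."""
    t_chrom, t_start, t_end, t_type = tumor_sv
    n_chrom, n_start, n_end, n_type = normal_sv
    if (t_chrom, t_type) != (n_chrom, n_type):
        return False
    threshold = 0.8 if (t_end - t_start) > size_cutoff else 0.4
    return reciprocal_overlap(t_start, t_end, n_start, n_end) >= threshold


def somatic_calls(tumor_calls, normal_calls):
    """Tumor calls without a corresponding call in the matched normal."""
    return [sv for sv in tumor_calls
            if not any(matches_germline(sv, n) for n in normal_calls)]


# Hypothetical calls.
tumor = [("chr1", 10_000, 20_000, "DEL"), ("chr2", 5_000, 6_000, "DEL")]
normal = [("chr1", 11_000, 20_000, "DEL")]
print(somatic_calls(tumor, normal))
```

Requiring the overlap to hold in both directions prevents a small tumor event nested inside a large germline event (or vice versa) from being treated as the same variant.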

Gene expression level calculation

All calculations were based on Ensembl v. 62 exons, build GRCh37.p3. In order to avoid

counting reads twice in regions featuring overlapping annotated exons belonging to the

same gene, those exons were merged into “non-redundant” exonic units. Reads

mapping to overlapping exons belonging to different genes were treated independently,

and counted for each gene. The gene coverage was estimated after a filtering step

retaining only unique reads (minimum mapping quality 1 in the correct orientation).

RPM per gene  

Counts were normalized to the total number of uniquely mapped reads per library and expressed as reads per million mapped reads (RPM):

$$\mathrm{RPM}_{\mathrm{gene}} = \frac{\text{gene reads} \times 10^{6}}{\text{total exon reads}}$$

RPKM per gene

Gene expression levels were quantified as reads per kilobase of exon model per million mapped reads (RPKM). This normalization was based on the total non-redundant cumulative exon length of a gene (non-redundant exonic units):

$$\mathrm{RPKM}_{\mathrm{gene}} = \frac{\mathrm{RPM}_{\mathrm{gene}}}{\text{exon length (kb)}} = \frac{\mathrm{RPM}_{\mathrm{gene}} \times 10^{3}}{\text{exon length (bp)}}$$

Note: for samples with several sequencing lanes, RPKM values were averaged between

lanes.
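The exon merging and the RPM/RPKM formulas above can be sketched together. This assumes 1-based inclusive exon coordinates and that read counting and uniqueness filtering have already been done; averaging across lanes is not shown, and the function names are hypothetical.

```python
def merge_exons(exons):
    """Merge overlapping exons of one gene into non-redundant exonic units.

    exons -- iterable of (start, end), 1-based inclusive coordinates (assumed).
    """
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current unit
        else:
            merged.append([start, end])
    return [tuple(unit) for unit in merged]


def rpm(gene_reads, total_exon_reads):
    return gene_reads * 1_000_000 / total_exon_reads


def rpkm(gene_reads, total_exon_reads, exons):
    length_bp = sum(end - start + 1 for start, end in merge_exons(exons))
    return rpm(gene_reads, total_exon_reads) * 1_000 / length_bp


# Hypothetical gene: two overlapping exons and one separate exon (300 bp merged).
exons = [(100, 199), (150, 299), (400, 499)]
print(merge_exons(exons), rpm(600, 20_000_000), rpkm(600, 20_000_000, exons))
```

Merging first ensures that bases shared by overlapping exons of the same gene are counted only once in the length denominator.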


Discovery of fusion genes using RNA sequence read data

We also used our RNA data for fusion transcript inference. Specifically, potential fusion events were detected with TopHat-Fusion (version 0.1.0 Beta; Kim and Salzberg, 2011), an extension of TopHat that aligns reads across potential fusion points. FASTQ files from paired-end RNA-seq data were used as input. The minimum required read match size at each fusion end (i.e., the fusion anchor size) was set to 13 nt, and two mismatches were permitted per read (default parameter). A minimum number of both spanning reads and read pairs flanking the fusion was required. In a second step, the TopHat-Fusion-Post module was used to filter out spurious fusions arising from highly similar sequences or pseudogenes; in this step, reads were re-aligned against synthetic sequences corresponding to the putative fusions. In a third step, visual inspection of the read distribution and assessment of the overall transcript read depth upstream and downstream of the fusion were used as additional criteria to identify high-confidence events. This search yielded 17 additional fusion transcripts (Table S2) that were not identified by our paired-end mapping approach. Each of these had >10 reads spanning the fusion junction, suggesting high confidence. Eleven of these fusion transcripts were tested by RT-PCR, and all 11 were verified.

Fusion transcript verification by RT-PCR and sequencing

Eleven TopHat-predicted candidate fusion transcripts were validated by RT-PCR

analysis and sequencing. Primers for amplification of neighboring exons in the normal

(unfused) transcript forms were designed using Primer3Plus

(http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi) with annealing

temperatures set between 59-62°C and requiring different amplicon sizes for the normal

and the fusion transcripts (see Table S3). The primers were tested by RT-PCR using

total RNA from HEK 293T/17 cells (ATCC CRL-11268) or total RNA from EO-PCA


samples to validate their performance (data not shown). The validated primers were

used to amplify the normal transcripts from the EO-PCA total RNA samples of interest

using the Verso 1-Step RT-PCR Kit (Thermo Scientific). The 25 µl RT-PCR reactions

contained 1x 1-Step PCR Master Mix, 1x Verso Enzyme Mix, 2 ng/µl total RNA, and 0.2

µM of each primer. Thermal cycling was carried out on an MJ Research PTC-200 using the following program: cDNA synthesis at 55°C for 30 min; RTase inactivation at 95°C for 2 min; 5 cycles of touchdown PCR at 95°C for 20 sec, 60-56°C (-1°C/cycle) for 30 sec and 72°C for 30 sec; 35 cycles of PCR at 95°C for 20 sec, 55°C for 30 sec and 72°C for 30 sec; and a final extension at 72°C for 5 min. The RT-PCR products were separated on a 2.5% TBE-agarose gel at 90 V for 70 min, stained with ethidium bromide and visualized on a UV transilluminator.

The fusion transcripts were amplified using the same protocol, combining the forward and reverse primers for the fused exons of the two partner gene transcripts, and the products were sent for Sanger sequencing.
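The touchdown step lowers the annealing temperature by 1°C per cycle from 60°C to 56°C, i.e., five cycles, before the 35 cycles annealed at 55°C. A small sketch that expands the program into per-cycle annealing temperatures makes this arithmetic explicit; the function name and its parameters are hypothetical and default to the values stated above.

```python
def touchdown_program(start_c=60, end_c=56, step_c=1, main_cycles=35, main_anneal_c=55):
    """Annealing temperature (°C) for every PCR cycle, touchdown cycles first."""
    touchdown = list(range(start_c, end_c - 1, -step_c))  # 60, 59, 58, 57, 56
    return touchdown + [main_anneal_c] * main_cycles


program = touchdown_program()
print(program[:6], "...", f"{len(program)} cycles in total")
```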

Reanalysis of SVs in elderly-onset PCA data from Berger and co-workers

To exclude biases introduced by differences between our SV calling algorithms and those underlying the published results of Berger and co-workers, we also re-called SVs from the raw (BAM format) files published in the Berger et al. study (Berger et al., 2011). All results of these additional analyses were consistent with our model of an androgen-driven somatic DNA alteration landscape being prevalent in EO-PCA, with the re-called data showing similar (or even slightly more pronounced) differences in gene rearrangements between EO-PCAs and elderly-onset PCAs. First, we

confirmed, based on reanalysis of the Berger et al. raw data, that no more than 3/7

elderly-onset PCAs harbored androgen-driven oncogenic ETS transcription factor fusion

events (compare with main text, and Figure 3E). Second, the fraction of fusion gene


rearrangements was confirmed to be significantly higher in EO-PCAs than in elderly-onset PCAs, based on reanalysis of the Berger et al. raw data, consistent with the data shown in Figure 3A (p=0.0002; Welch two-sample t-test; median of 0.06 inferred for the Berger et al. raw data). Third, we verified that the fraction of androgen-responsive rearranged genes is significantly higher in EO-PCAs than in elderly-onset PCAs, based on reanalysis of the Berger et al. raw data, consistent with the data shown in Figure 3C (p=9.3 × 10^-7; Welch two-sample t-test; median of 0.23 inferred for the Berger et al. raw data).
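Welch's two-sample t-test compares group means without assuming equal variances. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library; converting these to a p-value requires the t distribution (for example `scipy.stats.ttest_ind(x, y, equal_var=False)` returns the p-value directly). The per-tumor fractions in the example are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)  # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx**2 / (len(x) - 1) + vy**2 / (len(y) - 1))
    return t, df


# Hypothetical per-tumor fractions of fusion-gene rearrangements.
eo_pca = [0.31, 0.25, 0.40, 0.28, 0.35, 0.22, 0.30]
elderly_onset = [0.05, 0.08, 0.04, 0.10, 0.06, 0.07, 0.03]
t, df = welch_t(eo_pca, elderly_onset)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Welch's version is the safer default when, as here, the two groups may differ in spread as well as in mean.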

Further details on small RNA sequencing

Small RNA was eluted from gel slices in 0.3 M NaCl overnight at 4°C; the gel slurry was then passed through a 5 µm filter tube (IST Engineering, Milpitas, CA, USA), and the RNA was precipitated overnight at -80°C. Small RNA libraries were prepared with the NEBNext Small RNA Sample Prep Set (NEB, Frankfurt/M., Germany) following the manufacturer’s specifications, with a few modifications. Briefly, NEB’s 3’ adaptor (TCGTATGCCGTCTTCTGCTTG) was ligated to the precipitated small RNAs at 25°C for 1 h. After incubation with the RT primer (CAAGCAGAAGACGGCATACGA), the 5’ adaptor (GUUCAGAGUUCUACAGUCCGACGAUC) was ligated to the RNA.

Subsequently, reverse transcription was performed using SuperScript II Reverse Transcriptase (Invitrogen). The cDNA product was amplified by PCR using the following cycling conditions: 3 min at 94°C; 13 cycles of 94°C for 80 s, 60°C for 30 s and 65°C for 15 s; and a final extension at 65°C for 5 min (PCR primer sequences: CAAGCAGAAGACGGCATACGA and AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA). Amplicons corresponding to small RNAs (~90-100 bp) were purified on a 6% TBE polyacrylamide gel and eluted in NEB’s elution buffer at RT overnight, as described above. Fragment size, purity and DNA concentration were determined with the Bioanalyzer 2100 (Agilent, Boeblingen, Germany). Samples were sequenced (50bp,


single read) on a HiSeq 2000 instrument. Raw sequencing reads were processed and

mapped using the function mapper.pl of the miRDeep2 package (Friedlander et al.,

2012). Low quality reads were filtered out, adaptor sequence was clipped (using the first

10nt of the adaptor) and reads shorter than 18nt were discarded. Reads were mapped to

known human miRNAs based on miRBase18.0 (Griffiths-Jones et al., 2006; Griffiths-

Jones et al., 2008) using the function quantifier.pl in miRDeep2. One mismatch was

allowed when mapping to the miRNA precursor sequence, and two nucleotides

upstream and five nucleotides downstream of the mature sequence were considered for

the mapping. We allowed for read mapping onto multiple miRNAs. The obtained raw

read counts were normalized sample-wise by dividing with the total number of reads

mapping to known human microRNAs for each sample.
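The clipping, length-filtering and normalization steps can be sketched in Python as follows. This is a simplified illustration, not miRDeep2's `mapper.pl` implementation: it assumes an exact match to the first 10 nt of the 3’ adaptor given above, keeps reads in which no adaptor is found untrimmed (an assumption), and omits quality filtering and mapping. Function names and example counts are hypothetical.

```python
ADAPTOR_PREFIX = "TCGTATGCCG"  # first 10 nt of the 3' adaptor given above
MIN_LENGTH = 18                # reads shorter than 18 nt are discarded

def clip_read(read, prefix=ADAPTOR_PREFIX, min_len=MIN_LENGTH):
    """Cut the read at the first exact adaptor-prefix match; None if the insert is too short.

    Reads without a detectable adaptor are returned untrimmed (assumption).
    """
    pos = read.find(prefix)
    insert = read if pos == -1 else read[:pos]
    return insert if len(insert) >= min_len else None

def normalize(counts):
    """Divide each miRNA's raw count by the sample's total known-miRNA read count."""
    total = sum(counts.values())
    return {mirna: n / total for mirna, n in counts.items()}

# Hypothetical 50 nt read: a 22 nt insert followed by adaptor sequence
read = "TGAGGTAGTAGGTTGTATAGTT" + "TCGTATGCCGTCTTCTGCTTG" + "AAAAAAA"
print(clip_read(read))  # TGAGGTAGTAGGTTGTATAGTT
print(normalize({"hsa-let-7a-5p": 30, "hsa-miR-21-5p": 10}))
# {'hsa-let-7a-5p': 0.75, 'hsa-miR-21-5p': 0.25}
```

Because each count is divided by the sample's own total, the normalized values of a sample sum to 1, which makes libraries of different sequencing depth comparable.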

Further details on DNA methylome sequencing and analysis

MCIp-based enrichment of highly methylated DNA from tumor and normal samples was carried out as described previously (Gebhard et al., 2006), with minor modifications. In brief, about 3 µg of DNA per sample was sonicated using a Covaris S sonicator (Covaris Inc., Woburn, USA) for 6 min at 4°C (20% duty cycle, intensity 5, 200 cycles/burst) to obtain fragments of about 150 bp. Using the SX-8G IP-Star laboratory robot (Diagenode, Liege, Belgium), fragmented DNA was enriched with 60 µg MBD2-Fc protein coupled to magnetic Protein A-decorated beads (Diagenode, Liege, Belgium) for 30 min, followed by stepwise elution with 400 mM, 500 mM, 550 mM and 1 M NaCl buffers. Eluates were desalted with MinElute columns (Qiagen, Hilden, Germany) and analyzed for enrichment of methylated DNA by quantitative PCR using primers from the imprinted SNURF gene: the non-methylated allele is enriched in the low-salt eluate, whereas the methylated allele elutes at high salt. For deep-sequencing-based analysis (MCIp-seq), DNA libraries were prepared from the highly methylated DNA fractions eluted with 1 M NaCl, using NEBNext chemistry (New England Biolabs, Ipswich, MA, USA) according to the manufacturer’s recommendations. In brief, 10-30 ng of MCIp-enriched DNA fragments were end-repaired, and barcoded adaptors compatible with the SOLiD sequencing platform were ligated.

Subsequently, the libraries were enriched by 10 cycles of PCR amplification, and fragments of 220-270 bp were size-selected by extraction after agarose gel electrophoresis. The purified DNA was sequenced as single-end 50 bp reads on the SOLiD 4 next-generation sequencing platform (Applied Biosystems, Life Technologies Corporation, Carlsbad, CA, USA). Reads were mapped to the human genome reference sequence (Build 37) using the alignment software BFAST (Homer et al., 2009). We performed two types of quality control: (1) we removed duplicate reads and reads with a MAQ score of <20; (2) we re-sequenced samples with a saturation coefficient of <0.95 to ensure that reads covered all regions that can be captured by MCIp (Chavez et al., 2010). To detect regions of differential methylation between tumor and normal samples, we applied three criteria (q value, coverage and fold change) in both locus-specific analyses (focused approach) and unbiased analyses (genome-wide approach) (Bock et al., 2010). The methylome analyses described in this manuscript were carried out in an unbiased, genome-wide fashion, except for those used to identify potential driver genes, which involved locus-specific analysis.

Regions with an odds ratio >1 are considered hypermethylated, and those with an odds ratio <1 hypomethylated, in the tumor samples. To test the hypothesis that such differentially methylated promoters are non-randomly distributed throughout the PCA genome, we constructed a test statistic based on the number of differentially methylated promoters occurring in more than 50% of the tumors. The empirical P value was calculated from 10,000 permutations. Data processing was performed with a set of custom Perl (http://www.perl.org/) and R (http://www.r-project.org/) scripts.
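The direction rule and the recurrence-based permutation test can be sketched in Python. The text does not specify what was permuted, so the null model here is hypothetical: each tumor keeps its number of differentially methylated promoters, but their positions are redrawn at random. The (1 + b)/(1 + n) form of the empirical P value, where b is the number of permutations at least as extreme as the observed statistic and n the number of permutations, is a common convention rather than a detail stated in the source. All names and numbers are illustrative.

```python
import random

def methylation_call(odds_ratio):
    """Direction of change in the tumor: OR > 1 hyper-, OR < 1 hypomethylated."""
    if odds_ratio > 1:
        return "hyper"
    if odds_ratio < 1:
        return "hypo"
    return "unchanged"

def recurrence_statistic(hits_per_promoter, n_tumors):
    """Number of promoters differentially methylated in more than 50% of tumors."""
    return sum(1 for hits in hits_per_promoter if hits > 0.5 * n_tumors)

def permuted_statistic(calls_per_tumor, n_promoters, rng):
    """Hypothetical null: each tumor's calls are placed on randomly chosen promoters."""
    hits = [0] * n_promoters
    for k in calls_per_tumor:
        for promoter in rng.sample(range(n_promoters), k):
            hits[promoter] += 1
    return recurrence_statistic(hits, len(calls_per_tumor))

def empirical_p(observed, null_stats):
    """(1 + number of null statistics >= observed) / (1 + number of permutations)."""
    exceed = sum(1 for s in null_stats if s >= observed)
    return (1 + exceed) / (1 + len(null_stats))

# Toy example: 4 tumors, 500 promoters, hypothetical observed statistic
rng = random.Random(0)
calls_per_tumor = [40, 35, 50, 45]
observed = 12
null = [permuted_statistic(calls_per_tumor, 500, rng) for _ in range(1000)]  # study: 10,000
p_value = empirical_p(observed, null)
```

The +1 terms keep the empirical P value strictly positive, so a statistic never exceeded by any permutation is reported as P ≤ 1/(n + 1) rather than 0.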


Allelic linkage analysis of the SNURF locus

For allelic linkage analysis of germline SNVs in the SNURF locus, ~1.8 kb PCR products were generated using DNA derived from blood samples of EOPC-03 and of the patient’s parents. The products were cloned with the TOPO TA cloning kit (Life Technologies, Frankfurt, Germany), and the clones were subsequently analyzed by Sanger sequencing. Long-range PCR with tumor DNA from EOPC-03 as template was performed with the Expand Long-Range PCR kit (Roche, Mannheim, Germany), using primers on either side of the SNURF:ETV1 fusion point.

Gene expression analysis by quantitative real-time RT-PCR (qRT-PCR)

500 ng of total RNA from snap-frozen tissue was reverse transcribed with SuperScript™ II Reverse Transcriptase (Invitrogen, Darmstadt, Germany) using random hexamer primers according to the manufacturer’s protocol. qPCR was performed on this cDNA using the Roche LightCycler® 480 system (Roche Diagnostics, Mannheim, Germany) and the SYBR Green kit from Qiagen (Hilden, Germany). Expression of target genes was normalized to the average expression level of the housekeeping genes ACTB, GAPDH and HPRT1.
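The text does not state the exact normalization formula. A common choice, assumed in the Python sketch below, is the ΔCt method: the target Ct is compared with the arithmetic mean of the reference-gene Ct values (equivalent to normalizing to the geometric mean of their relative quantities), and relative expression is taken as 2^-ΔCt, assuming roughly 100% amplification efficiency. The Ct values and function names are hypothetical.

```python
from statistics import mean

def delta_ct(target_ct, reference_cts):
    """Target Ct minus the mean Ct of the reference (housekeeping) genes."""
    return target_ct - mean(reference_cts)

def relative_expression(target_ct, reference_cts):
    """2 ** -dCt, assuming ~100% amplification efficiency for every assay."""
    return 2 ** -delta_ct(target_ct, reference_cts)

# Hypothetical Ct values for one sample
housekeeping = {"ACTB": 20.0, "GAPDH": 21.0, "HPRT1": 22.0}
expr = relative_expression(25.0, list(housekeeping.values()))
print(expr)  # 0.0625
```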


RNA isolation and microarray analysis of testosterone-stimulated LNCaP cells

To infer androgen-regulated genes genome-wide, LNCaP cells were left untreated or treated with dihydrotestosterone (100 nM) for 24 h. Total RNA was extracted from these cells using Trizol and the RNeasy system (Macherey-Nagel). RNA quality and concentration were determined using the Agilent RNA 6000 Nano Kit (Agilent Technologies) and a NanoDrop 1000 (Peqlab). cDNA synthesis, labeling and hybridization were carried out according to the protocols of the 3’ IVT Express Kit and the Hybridization, Wash and Stain Kit (Affymetrix, Santa Clara, USA), using 100 ng of total RNA as starting material. All experiments were performed using the Human GeneChip U133 Plus 2.0 Array (Affymetrix), which contains more than 47,000 transcripts and variants, including more than 38,500 well-characterized genes. Microarrays were scanned with the GeneChip Scanner 3000 7G using GeneChip Command Console (version 3.0, Affymetrix). Signals were processed with GeneChip Operating Software (GCOS, version 1.4, Affymetrix). To allow comparison across samples and experiments, the trimmed mean signal of each array was scaled to a target intensity of 200. Absolute and comparison analyses were performed with GCOS (version 1.4) using default parameters. Genes increased or decreased by at least 1.74-fold relative to the control (|Signal Log Ratio| ≥ 0.8) were considered androgen-regulated.
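GCOS Signal Log Ratios are log2-transformed, so the |SLR| ≥ 0.8 cutoff corresponds to a fold change of 2^0.8 ≈ 1.74. A minimal Python sketch of the classification rule; the function and label names are illustrative.

```python
SLR_CUTOFF = 0.8  # |Signal Log Ratio| threshold from the text

def fold_change(slr):
    """GCOS Signal Log Ratios are log2 ratios, so fold change = 2 ** |SLR|."""
    return 2 ** abs(slr)

def androgen_regulation(slr, cutoff=SLR_CUTOFF):
    """Label a gene from its DHT-vs-control Signal Log Ratio."""
    if slr >= cutoff:
        return "induced"
    if slr <= -cutoff:
        return "repressed"
    return None  # not considered androgen-regulated

print(round(fold_change(SLR_CUTOFF), 2))  # 1.74
```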

Further details on FISH and IHC analysis

The probe sets used in this study included: a PTEN deletion probe consisting of two SpectrumGreen (SG)-labeled BAC clones (RP11-380G5, RP11-813O3; Source Bioscience, United Kingdom) and a SpectrumOrange (SO)-labeled commercial centromere 10 reference probe (#06J36-090; Abbott, Wiesbaden, Germany); a PTEN break-apart probe including probe sets corresponding to the 5’ upstream region (SO-labeled clones RP11-659F22, RP11-79A15) and the 3’ downstream region of PTEN (SG-labeled RP11-765C10, RP11-813O3); a MAP3K7 deletion probe (SG-labeled RP3-470J8, RP11-501P02) and an SO-labeled commercial centromere 6 reference probe (#06J36-06; Abbott, Wiesbaden, Germany); a SNURF:ETV1 fusion probe (3’ ETV1: SG-labeled RP11-138H16, RP11-79G16; 5’SNURF: SO-labeled RP11-732F04, RP11-720B15); and separate break-apart probes for SNURF (3’SNURF: SG-labeled RP11-732F04, RP11-720B15; 5’SNURF: SO-labeled RP11-732F04, RP11-720B15) and ETV1 (3’ ETV1: SG-labeled RP11-138H16, RP11-79G16; 5’ ETV1: SO-labeled RP11-173F05, RP11-621E24). Additional FISH break-apart probes included ROS1 (3’ROS1: SG-labeled RP11-721K11, RP11-976L17; 5’ROS1: SO-labeled RP11-48A22, RP11-835I21), NEDD4L (3’NEDD4L: SO-labeled RP11-167O10, RP11-440O04; 5’NEDD4L: SG-labeled RP11-613N08, RP11-718I15), PRPH2 (3’PRPH2: SO-labeled RP11-18K18, RP11-315O16; 5’PRPH2: SG-labeled RP11-501I18, RP11-475N16), CCDC21 (3’CCDC21: SG-labeled RP11-423L24, RP11-758G19; 5’CEP85: SO-labeled RP11-111D20, RP11-349K08), MED6 (3’MED6: SG-labeled RP11-794M19; 5’MED6: SO-labeled RP11-137A13), PPAP2A (3’PPAP2A: SG-labeled RP11-173L16; 5’PPAP2A: SO-labeled RP11-643H16), FOXP1 (3’FOXP1: SG-labeled RP11-154H23, RP11-49E03; 5’FOXP1: SO-labeled RP11-79P21, RP11-430J3), NSL1 (3’NSL1: SG-labeled RP11-338C15; 5’NSL1: SO-labeled RP11-348H13), and VASH2 (3’VASH2: SO-labeled RP11-15O15; 5’VASH2: SG-labeled RP11-168K13).

A previous study reported a 5% false-negative rate for a break-apart probe in the special case of the balanced, intrachromosomal inversion that produces the EML4:ALK fusion in lung cancer (Rodig et al., 2009). Because most rearrangements in our study are interchromosomal and/or unbalanced, we estimated the false-negative detection rate in our study to be ≤5%. To exclude false-positive findings, we arbitrarily selected a stringent threshold requiring split signals in ≥50% of tumor cells to define gene breakage. Each tissue spot was evaluated, and the predominant signal number and constellation (for break-apart probes) was recorded for each FISH probe.
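The breakage criterion can be written as a simple rule. The Python sketch below assumes per-spot counts of evaluable tumor cells and of cells showing split signals; how cells were counted per spot is not detailed in the text, and the function name is hypothetical.

```python
SPLIT_THRESHOLD = 0.5  # split signals in >= 50% of tumor cells define breakage

def gene_breakage(split_cells, evaluable_cells, threshold=SPLIT_THRESHOLD):
    """Call gene breakage for one spot from hypothetical per-spot cell counts."""
    if evaluable_cells <= 0:
        raise ValueError("no evaluable tumor cells in this spot")
    return split_cells / evaluable_cells >= threshold

print(gene_breakage(30, 50))  # True  (60% of cells split)
print(gene_breakage(20, 50))  # False (40% of cells split)
```

A high threshold trades sensitivity for specificity: spots with a subclonal rearrangement in fewer than half of the tumor cells are deliberately called negative.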

For IHC analysis, slides were deparaffinized and subjected to heat-induced antigen retrieval for 5 minutes in an autoclave at 121°C and pH 7.8. Bound primary antibody was visualized using the DAKO EnVision™ Kit (DAKO). Only nuclear ERG staining was considered, and for each tumor sample the staining intensity was scored from 0 to 4. Ki67 and AR staining were performed as described before (Bubendorf et al., 1998; Minner et al., 2011). In brief, nuclei were considered Ki67-positive if any nuclear staining was seen. The Ki67 labeling index (LI; percentage of Ki67-positive cells) was determined by scoring 100 consecutive tumor cells in each arrayed tissue sample; if fewer than 100 cells were present in a TMA spot, all tumor cells were scored. AR expression was estimated on a four-step scale: negative (no staining at all), weak (1+ staining in ≥1% of tumor cells), moderate (2+ staining in ≥1% of tumor cells), and strong (3+ staining in ≥1% of tumor cells).
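The Ki67 LI and the four-step AR scale can be expressed as small scoring rules. In this Python sketch, per-cell Ki67 calls are given in scoring order, and the AR category is taken as the highest staining intensity present in ≥1% of tumor cells. How staining in fewer than 1% of cells was handled is not stated in the text; treating it as negative is an assumption. Function names are illustrative.

```python
def ki67_labeling_index(nuclei_positive, max_cells=100):
    """Percentage of Ki67-positive nuclei among the first 100 consecutive tumor cells.

    nuclei_positive: per-cell booleans in scoring order (any nuclear staining = True).
    If fewer than 100 cells are present, all of them are scored.
    """
    scored = nuclei_positive[:max_cells]
    if not scored:
        raise ValueError("no tumor cells to score")
    return 100.0 * sum(scored) / len(scored)

def ar_category(intensity_fractions):
    """intensity_fractions: fraction of tumor cells at intensity 1, 2 or 3, e.g. {3: 0.02}.

    Returns the highest intensity present in >= 1% of tumor cells.
    """
    for level, label in ((3, "strong"), (2, "moderate"), (1, "weak")):
        if intensity_fractions.get(level, 0.0) >= 0.01:
            return label
    return "negative"  # includes staining in < 1% of cells (assumption)
```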

Further details on TMA

The TMA contains prostatectomy specimens from patients who underwent radical prostatectomy between 1992 and 2008 at the Department of Urology, University Medical Center Hamburg-Eppendorf, Germany. Clinical follow-up data are available for the vast majority (~90%) of arrayed tumors. Median follow-up was 46.7 months (range 1 to 219 months). None of the patients received neoadjuvant endocrine therapy. Additional (salvage) therapy was initiated in case of biochemical relapse (BCR). In all patients, prostate-specific antigen (PSA) values were measured quarterly in the first year, twice yearly in the second year, and annually after the third year following surgery. Recurrence was defined as a postoperative PSA of 0.2 ng/ml that was rising thereafter. The first PSA value ≥0.2 ng/ml was used to define the time of recurrence. Patients without evidence of tumor recurrence were censored at the time of the last follow-up.
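The BCR end point can be derived from a patient's PSA series as sketched below in Python. The text defines recurrence as a PSA of 0.2 ng/ml that rises thereafter and dates it to the first value ≥0.2 ng/ml. Requiring at least one later, higher measurement as confirmation of the rise is an assumption of this sketch, as are the input format and the function name.

```python
PSA_CUTOFF = 0.2  # ng/ml

def bcr_endpoint(visits, cutoff=PSA_CUTOFF):
    """visits: (months_after_surgery, psa_ng_per_ml) tuples sorted by time.

    Returns (time_in_months, event). The event is dated to the first PSA >= cutoff,
    provided a later measurement is higher (assumed confirmation of a rising PSA);
    otherwise the patient is censored at the last follow-up.
    """
    for i, (month, psa) in enumerate(visits):
        if psa >= cutoff:
            later = [value for _, value in visits[i + 1:]]
            if later and max(later) > psa:
                return month, True
            break
    return visits[-1][0], False

print(bcr_endpoint([(3, 0.01), (6, 0.05), (12, 0.2), (15, 0.4)]))  # (12, True)
```

The (time, event) pairs produced this way are the usual input for Kaplan-Meier or Cox analyses of BCR-free survival.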

TMA age distribution and power analysis

Our TMA resource contained the following numbers of EO-PCA and elderly-onset PCA cases with available marker status, respectively: TMPRSS2:ERG FISH (250/5,821); ERG IHC (383/9,184); PTEN FISH (240/5,134); 6q15 FISH (163/3,330); CHD1 FISH (128/2,846); NCOR2 FISH (232/5,251). We analyzed a total of 11,073 TMA tumor samples, 431 (3.89%) of which belonged to patients aged 50 or younger. Because the Martini-Clinic Prostate Cancer Center is a reference center in Europe, young PCA patients in particular often present at the hospital on their own initiative. Since the age distribution and the associations are continuous, we evaluated the general correlation of EO-PCA-related events with age, instead of using categorical variables and a fixed age cutoff. If categorical variables (i.e., fixed age cutoffs) were used to facilitate effect size estimates (using the ‘pwr’ package of the R statistical programming language), we estimated the power to detect effect sizes of 0.124 (TMPRSS2:ERG), 0.081 (ERG+), 0.238 (PTEN disruptions), 0.084 (6q15 deletions), 0.089 (CHD1 disruptions) and 0.059 (NCOR2 disruptions), and an overall ability to identify effects smaller than 0.051 with a power of 0.99 at a significance level of 0.05 when using eight categorical variables (i.e., seven degrees of freedom; two-sided test).
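The ‘pwr’ package computes the power of a chi-square test from Cohen's effect size w, the sample size N and the degrees of freedom, using a noncentral chi-square distribution with noncentrality N·w². An equivalent Python sketch with SciPy is shown below. Setting N to the 11,073 analyzed TMA samples is an assumption for illustration, since the text does not state which N entered each calculation; the function name is illustrative.

```python
from scipy.stats import chi2, ncx2

def chisq_power(w, n, df, alpha=0.05):
    """Power of a chi-square test for Cohen's effect size w and n observations."""
    critical = chi2.ppf(1 - alpha, df)        # rejection threshold under H0
    return ncx2.sf(critical, df, n * w ** 2)  # P(statistic > threshold) under H1

# Eight categories -> 7 degrees of freedom; N = 11,073 is an illustrative choice
print(round(chisq_power(0.051, 11_073, 7), 3))
```

Because the noncentrality grows with N·w², doubling the sample size buys the same power as increasing w by a factor of √2.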


Methylation and miRNA analyses of additional, independent prostate samples

Samples: To enable a comprehensive analysis of the DNA methylation level of miRNA promoters and of miRNA expression, we assessed an independent dataset of 35 PCAs and 35 non-matched non-malignant prostate tissue samples. The median age of the patients was 65 years (range 48-75). The Gleason score distribution of the PCA samples was: 1x (3+3), 8x (3+4), 15x (4+3), 8x (4+5), and 3x (5+4). The tumor stages were: 9x pT2c, 13x pT3a, 11x pT3b, and 2x pT4a. All tumor samples contained at least 70% tumor cells. Normal prostate tissue samples were obtained from non-suspect areas of the peripheral zone of 35 different patients with clinically low-risk tumors.

Nucleic acid extraction: DNA and RNA, including miRNA, were isolated using the AllPrep DNA/RNA kit (Qiagen) with a few modifications. Briefly, tissue was lysed in lysis plus buffer using a TissueLyser (Qiagen). DNA was bound to AllPrep DNA spin columns, washed with buffers AW1 and AW2, and eluted with EB buffer. RNA was isolated from the AllPrep DNA spin column flow-through by adding 1.5 volumes of 100% ethanol, binding the RNA to RNeasy mini columns, washing with buffers WT and RPE, and eluting in water.

miRNA expression analysis: miRNA expression was quantified using the TaqMan Array Human MicroRNA Set Cards v2.0 (Applied Biosystems) according to the manufacturer’s specifications (primer sequences are listed in Table S3). Data were median normalized, and differentially expressed miRNAs were identified with limma. Raw data of the Taylor et al. dataset (Taylor et al., 2010) were quantile normalized, and differentially expressed miRNAs were identified with limma.
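The median normalization step is sketched below in Python. The text does not say on which scale it was applied; this sketch assumes Ct values from the TaqMan cards, centered by subtracting each sample's median Ct. The subsequent differential-expression testing with limma (an R/Bioconductor package) is not reproduced here. Sample and miRNA names and values are hypothetical.

```python
from statistics import median

def median_normalize(ct_by_sample):
    """ct_by_sample: {sample: {miRNA: Ct}}. Center each sample on its median Ct."""
    normalized = {}
    for sample, cts in ct_by_sample.items():
        m = median(cts.values())
        normalized[sample] = {mirna: ct - m for mirna, ct in cts.items()}
    return normalized

# Hypothetical Ct values for one tumor sample
print(median_normalize({"T1": {"miR-a": 20.0, "miR-b": 24.0, "miR-c": 30.0}}))
# {'T1': {'miR-a': -4.0, 'miR-b': 0.0, 'miR-c': 6.0}}
```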

MassARRAY methylation analysis: Quantitative DNA methylation analysis was performed using the MassARRAY® technique. Briefly, genomic DNA was chemically modified with sodium bisulfite using the EZ DNA Methylation Kit (Zymo Research, Orange, CA, USA) according to the manufacturer’s instructions, in vitro transcribed, cleaved by RNase A, and subjected to MALDI-TOF mass spectrometry to determine methylation patterns, as previously described (Ehrich et al., 2008). DNA methylation standards (0%, 20%, 40%, 60%, 80%, and 100% methylated genomic DNA) were used to control for potential PCR bias.

Statistical analysis: The correlation between methylation and miRNA expression was assessed using Spearman rank correlation.
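Spearman's rank correlation is Pearson's correlation computed on ranks, with tied values receiving the average of the ranks they span. A dependency-free Python sketch is given below; `scipy.stats.spearmanr` provides the same coefficient together with a p-value. The function names and example values are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values get the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical promoter methylation levels and miRNA expression values
methylation = [0.10, 0.40, 0.60, 0.90]
expression = [8.0, 5.0, 3.0, 1.0]
print(spearman_rho(methylation, expression))  # -1.0
```

Working on ranks makes the coefficient insensitive to the different scales of methylation fractions and expression values, and to monotone but non-linear relationships between them.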


Supplemental References

Baer, C., Claus, R., Frenzel, L. P., Zucknick, M., Park, Y. J., Gu, L., Weichenhan, D., Fischer, M., Pallasch, C. P., Herpel, E. et al. (2012). Extensive Promoter DNA Hypermethylation and Hypomethylation Is Associated with Aberrant MicroRNA Expression in Chronic Lymphocytic Leukemia. Cancer Res 72, 3775-3785.

Bock, C., Tomazou, E. M., Brinkman, A. B., Muller, F., Simmer, F., Gu, H., Jager, N., Gnirke, A., Stunnenberg, H. G., and Meissner, A. (2010). Quantitative comparison of genome-wide DNA methylation mapping technologies. Nat Biotechnol 28, 1106-1114.

Bubendorf, L., Tapia, C., Gasser, T. C., Casella, R., Grunder, B., Moch, H., Mihatsch, M. J., and Sauter, G. (1998). Ki67 labeling index in core needle biopsies independently predicts tumor-specific survival in prostate cancer. Hum Pathol 29, 949-954.

Buiting, K., Saitoh, S., Gross, S., Dittrich, B., Schwartz, S., Nicholls, R. D., and Horsthemke, B. (1995). Inherited microdeletions in the Angelman and Prader-Willi syndromes define an imprinting centre on human chromosome 15. Nat Genet 9, 395-400.

Cao, Y., Yu, S. L., Wang, Y., Guo, G. Y., Ding, Q., and An, R. H. (2011). MicroRNA-dependent regulation of PTEN after arsenic trioxide treatment in bladder cancer cell line T24. Tumour Biol 32, 179-188.

Chavez, L., Jozefczuk, J., Grimm, C., Dietrich, J., Timmermann, B., Lehrach, H., Herwig, R., and Adjaye, J. (2010). Computational analysis of genome-wide DNA methylation during the differentiation of human embryonic stem cells along the endodermal lineage. Genome Res 20, 1441-1450.

Ehrich, M., Turner, J., Gibbs, P., Lipton, L., Giovanneti, M., Cantor, C., and van den Boom, D. (2008). Cytosine methylation profiling of cancer cell lines. Proc Natl Acad Sci U S A 105, 4844-4849.

Gonzalez-Perez, A., and Lopez-Bigas, N. (2011). Improving the assessment of the outcome of nonsynonymous SNVs with a consensus deleteriousness score, Condel. Am J Hum Genet 88, 440-449.

Goya, R., Sun, M. G., Morin, R. D., Leung, G., Ha, G., Wiegand, K. C., Senz, J., Crisan, A., Marra, M. A., Hirst, M. et al. (2010). SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors. Bioinformatics 26, 730-736.

Griffiths-Jones, S., Grocock, R. J., van Dongen, S., Bateman, A., and Enright, A. J. (2006). miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34, D140-D144.

Griffiths-Jones, S., Saini, H. K., van Dongen, S., and Enright, A. J. (2008). miRBase: tools for microRNA genomics. Nucleic Acids Res 36, D154-D158.

Kim, D., and Salzberg, S. L. (2011). TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome Biol 12, R72.

Koboldt, D. C., Chen, K., Wylie, T., Larson, D. E., McLellan, M. D., Mardis, E. R., Weinstock, G. M., Wilson, R. K., and Ding, L. (2009). VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25, 2283-2285.

Kong, D., Heath, E., Chen, W., Cher, M. L., Powell, I., Heilbrun, L., Li, Y., Ali, S., Sethi, S., Hassan, O. et al. (2012). Loss of let-7 up-regulates EZH2 in prostate cancer consistent with the acquisition of cancer stem cell signatures that are attenuated by BR-DIM. PLoS One 7, e33729.

Liang, Z., Li, Y., Huang, K., Wagar, N., and Shim, H. (2011). Regulation of miR-19 to breast cancer chemoresistance through targeting PTEN. Pharm Res 28, 3091-3100.

Martens-Uzunova, E. S., Jalava, S. E., Dits, N. F., van Leenders, G. J., Moller, S., Trapman, J., Bangma, C. H., Litman, T., Visakorpi, T., and Jenster, G. (2012). Diagnostic and prognostic signatures from the small non-coding RNA transcriptome in prostate cancer. Oncogene 31, 978-991.

Mills, R. E., Walter, K., Stewart, C., Handsaker, R. E., Chen, K., Alkan, C., Abyzov, A., Yoon, S. C., Ye, K., Cheetham, R. K. et al. (2011). Mapping copy number variation by population-scale genome sequencing. Nature 470, 59-65.

Montano, M., Flanagan, J. N., Jiang, L., Sebastiani, P., Rarick, M., LeBrasseur, N. K., Morris, C. A., Jasuja, R., and Bhasin, S. (2007). Transcriptional profiling of testosterone-regulated genes in the skeletal muscle of human immunodeficiency virus-infected men experiencing weight loss. J Clin Endocrinol Metab 92, 2793-2802.

Nakamura, K., Oshima, T., Morimoto, T., Ikeda, S., Yoshikawa, H., Shiwa, Y., Ishikawa, S., Linak, M. C., Hirai, A., Takahashi, H. et al. (2011). Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res 39, e90.

Palumbo, T., Faucz, F. R., Azevedo, M., Xekouki, P., Iliopoulos, D., and Stratakis, C. A. (2012). Functional screen analysis reveals miR-26b and miR-128 as central regulators of pituitary somatomammotrophic tumor growth through activation of the PTEN-AKT pathway. Oncogene.

Plass, C., and Smiraglia, D. J. (2006). Genome-wide analysis of DNA methylation changes in human malignancies. Curr Top Microbiol Immunol 310, 179-198.

Poliseno, L., Salmena, L., Riccardi, L., Fornari, A., Song, M. S., Hobbs, R. M., Sportoletti, P., Varmeh, S., Egia, A., Fedele, G. et al. (2010a). Identification of the miR-106b~25 microRNA cluster as a proto-oncogenic PTEN-targeting intron that cooperates with its host gene MCM7 in transformation. Sci Signal 3, ra29.

Poliseno, L., Salmena, L., Zhang, J., Carver, B., Haveman, W. J., and Pandolfi, P. P. (2010b). A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 465, 1033-1038.

Rodig, S. J., Mino-Kenudson, M., Dacic, S., Yeap, B. Y., Shaw, A., Barletta, J. A., Stubbs, H., Law, K., Lindeman, N., Mark, E. et al. (2009). Unique clinicopathologic features characterize ALK-rearranged lung adenocarcinoma in the western population. Clin Cancer Res 15, 5216-5223.

Sumazin, P., Yang, X., Chiu, H. S., Chung, W. J., Iyer, A., Llobet-Navas, D., Rajbhandari, P., Bansal, M., Guarnieri, P., Silva, J. et al. (2011). An extensive microRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma. Cell 147, 370-381.

Szczyrba, J., Loprich, E., Wach, S., Jung, V., Unteregger, G., Barth, S., Grobholz, R., Wieland, W., Stohr, R., Hartmann, A. et al. (2010). The microRNA profile of prostate carcinoma obtained by deep sequencing. Mol Cancer Res 8, 529-538.

Wach, S., Nolte, E., Szczyrba, J., Stohr, R., Hartmann, A., Orntoft, T., Dyrskjot, L., Eltze, E., Wieland, W., Keck, B. et al. (2012). MicroRNA profiles of prostate carcinoma detected by multiplatform microRNA screening. Int J Cancer 130, 611-621.

Wang, K., Li, M., and Hakonarson, H. (2010). ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38, e164.

Wang, Z. X., Lu, B. B., Wang, H., Cheng, Z. X., and Yin, Y. M. (2011). MicroRNA-21 modulates chemosensitivity of breast cancer cells to doxorubicin by targeting PTEN. Arch Med Res 42, 281-290.

Xu, X. D., Song, X. W., Li, Q., Wang, G. K., Jing, Q., and Qin, Y. W. (2012). Attenuation of microRNA-22 derepressed PTEN to effectively protect rat cardiomyocytes from hypertrophy. J Cell Physiol 227, 1391-1398.

Zhang, B. G., Li, J. F., Yu, B. Q., Zhu, Z. G., Liu, B. Y., and Yan, M. (2012). microRNA-21 promotes tumor proliferation and invasion in gastric cancer by targeting PTEN. Oncol Rep 27, 1019-1026.