identifying modules of cooperating cancer drivers...2020/06/29  · identifying modules of...

35
I DENTIFYING MODULES OF C OOPERATING C ANCER D RIVERS Michael I. Klein 1,6 , Vincent L. Cannataro 2 , Jeffrey P. Townsend 1,3,4 , Scott Newman 6 , David F. Stern 4,5and Hongyu Zhao 1,3,41 Program in Computational Biology and Bioinformatics, Yale University 2 Department of Biology, Emmanuel College, Boston, MA 3 Department of Biostatistics, Yale School of Public Health 4 Yale Cancer Center, Yale University 5 Department of Pathology, Yale School of Public Health 6 Bioinformatics R&D, Sema4, Stamford, CT Equal contributors. Correspondences: [email protected], [email protected] ABSTRACT Identifying cooperating modules of driver alterations can provide biological insights to cancer causation and would advance the development of effective personalized treatments. We present Cancer Rule-Set Optimization (CRSO) for inferring the combinations of alterations that cooperate to drive tumor formation in individual patients. Application to 19 TCGA cancer types found a mean of 11 core driver combinations per cancer, comprising 2-6 alterations per combination, and accounting for a mean of 70% of samples per cancer. CRSO departs from methods based on statistical cooccurrence, which we demonstrate is a suboptimal criterion for investigating driver cooperation. CRSO identified well-studied driver combinations that were not detected by other approaches and nominated novel combinations that correlate with clinical outcomes in multiple cancer types. Novel synergies were identified in NRAS-mutant melanomas that may be therapeutically relevant. Core driver combinations involving NFE2L2 mutations were identified in four cancer types, supporting the therapeutic potential of NRF2 pathway inhibition. CRSO is available at https://github.com/mikekleinsgit/CRSO/. Keywords driver gene combinations · cancer etiology · personalized cancer medicine · precision oncology · multi-gene biomarker · bioinformatics · stochastic optimization 1 Introduction Cells must deregulate multiple genetic pathways in in order to become cancerous. Most recent estimates are that 2-8 ‘hits’ are necessary for a precursor cell to become neoplastic [1, 2, 3] . Although identifying a small number of driver mutations among an excess of passenger mutations is a challenging problem, many methods already exist to do so. Generally, these methods identify genes mutated significantly more than the background mutation rate would predict and their output is a list of significant genes or regions found across a whole cohort. Well known examples include MutSigCV [4] and dNdScv [5] for identification of significantly mutated genes (SMGs), and GISTIC2 [6] for identification of significant somatic copy number variations (SCNVs). Although useful in identifying known and novel cancer genes, these statistical enrichment methods do not attempt to identify co-occurring or mutually exclusive mutations. For example, activating mutations in KRAS/NRAS are frequently accompanied by loss of function of CDKN2A/B in melanoma, lung, pancreatic and other cancers. A major driving force behind the co-occurrence is that loss of the G1/S checkpoint (CDKN2A/B) is necessary to avoid oncogene induced senescence caused by KRAS oncogenic signaling [7, 8, 9]. Thus, mutations in these two genes cooperate to produce a pro-growth phenotype. Many methods can identify mutually exclusive candidate driver genes [10, 11, 12, 13, 14, 15, 16], but there are comparatively few that identify functionally relevant modules of cooccurring gene alterations in individual patients. Consequently, this extra layer of biological information and its possible relevance to therapy is not captured by any of the above methods and it would be beneficial to derive new approaches to identify groups of cooperating mutations. From a biological perspective, cooccurrence of driver alterations clearly does occur [17, 18, 19, 20], and appears to be a requirement for carcinogenesis, as evidenced by the insufficiency of BRAF and RAS hotspot mutations to transform benign colon polyps and nevi into invasive carcinoma [21, 22]. However, statistical approaches to identifying . CC-BY-NC-ND 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229 doi: bioRxiv preprint

Upload: others

Post on 10-Oct-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS

Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4, Scott Newman6,David F. Stern4,5† and Hongyu Zhao 1,3,4†

1 Program in Computational Biology and Bioinformatics, Yale University2 Department of Biology, Emmanuel College, Boston, MA3 Department of Biostatistics, Yale School of Public Health

4 Yale Cancer Center, Yale University5 Department of Pathology, Yale School of Public Health

6 Bioinformatics R&D, Sema4, Stamford, CT† Equal contributors. Correspondences: [email protected], [email protected]

ABSTRACT

Identifying cooperating modules of driver alterations can provide biological insights to cancercausation and would advance the development of effective personalized treatments. We presentCancer Rule-Set Optimization (CRSO) for inferring the combinations of alterations that cooperate todrive tumor formation in individual patients. Application to 19 TCGA cancer types found a mean of 11core driver combinations per cancer, comprising 2-6 alterations per combination, and accounting fora mean of 70% of samples per cancer. CRSO departs from methods based on statistical cooccurrence,which we demonstrate is a suboptimal criterion for investigating driver cooperation. CRSO identifiedwell-studied driver combinations that were not detected by other approaches and nominated novelcombinations that correlate with clinical outcomes in multiple cancer types. Novel synergies wereidentified in NRAS-mutant melanomas that may be therapeutically relevant. Core driver combinationsinvolving NFE2L2 mutations were identified in four cancer types, supporting the therapeutic potentialof NRF2 pathway inhibition. CRSO is available at https://github.com/mikekleinsgit/CRSO/.

Keywords driver gene combinations · cancer etiology · personalized cancer medicine · precision oncology · multi-genebiomarker · bioinformatics · stochastic optimization

1 Introduction

Cells must deregulate multiple genetic pathways in in order to become cancerous. Most recent estimates are that2-8 ‘hits’ are necessary for a precursor cell to become neoplastic [1, 2, 3] . Although identifying a small number ofdriver mutations among an excess of passenger mutations is a challenging problem, many methods already exist todo so. Generally, these methods identify genes mutated significantly more than the background mutation rate wouldpredict and their output is a list of significant genes or regions found across a whole cohort. Well known examplesinclude MutSigCV [4] and dNdScv [5] for identification of significantly mutated genes (SMGs), and GISTIC2 [6] foridentification of significant somatic copy number variations (SCNVs).

Although useful in identifying known and novel cancer genes, these statistical enrichment methods do not attempt toidentify co-occurring or mutually exclusive mutations. For example, activating mutations in KRAS/NRAS are frequentlyaccompanied by loss of function of CDKN2A/B in melanoma, lung, pancreatic and other cancers. A major driving forcebehind the co-occurrence is that loss of the G1/S checkpoint (CDKN2A/B) is necessary to avoid oncogene inducedsenescence caused by KRAS oncogenic signaling [7, 8, 9]. Thus, mutations in these two genes cooperate to produce apro-growth phenotype. Many methods can identify mutually exclusive candidate driver genes [10, 11, 12, 13, 14, 15, 16],but there are comparatively few that identify functionally relevant modules of cooccurring gene alterations in individualpatients. Consequently, this extra layer of biological information and its possible relevance to therapy is not capturedby any of the above methods and it would be beneficial to derive new approaches to identify groups of cooperatingmutations.

From a biological perspective, cooccurrence of driver alterations clearly does occur [17, 18, 19, 20], and appearsto be a requirement for carcinogenesis, as evidenced by the insufficiency of BRAF and RAS hotspot mutations totransform benign colon polyps and nevi into invasive carcinoma [21, 22]. However, statistical approaches to identifying

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 2: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

cooccurrence have given mixed results. For example, Canisius et al. concluded that, after accounting for patient-specificmutation frequencies, there is no evidence of statistically significant cooccurrence of somatic mutations in cancer[23]. Conversely, Mina et al. did identify pairwise oncogenic synergies in multiple cancer types based on a model ofsequential evolution using their SELECT algorithm [24]. Using mathematical modeling of cancer evolution, Minaet al. argue that statistically significant cooccurrence emerges only by conditioning on the sequence of alterations,consistent with the lack of traditional cooccurrence demonstrated by Canisius et al. [23]. Zhang et al. found evidenceof statistically significant cooccurrence at the level of the pathway, but did not investigate at the gene-level [25]. Thus,current approaches can identify pairwise combinations of driver mutations in a subset of individuals. However, methodsto detect combinations of three or more drivers, and methods that attempt to identify driver combinations within everysample in a cohort (or at least within a large proportion of samples) are needed.

Here, we describe Cancer Rule-Set Optimization (CRSO), a method to identify modules of cooperating alterations thatare essential and collectively sufficient to drive cancer in individual patients. CRSO is developed as part of a theoreticalframework that assumes the existence of specific combinations of two or more alterations called rules that cooperativelydrive cancer if found in the same host cell. Rules are assumed to be minimally sufficient, meaning that exclusion ofany of the alterations renders the remaining collection of alterations insufficient to drive cancer. CRSO seeks to find acollection of rules called a rule set that represent all of the unique minimal combinations that can explain all of thegiven tumors in the study population i.e., every sample is required to harbor all of the alterations in at least one ofthe rules. CRSO is robust to heterogenous mutational rates within the cohort as it uses alteration-specific passengerprobabilities that reflect how likely specific observations would have occurred by chance. The output of CRSO canprovide biological insights to cancer causation, as well as infer the likely driver combinations in individual patients. Weshow that CRSO can identify known and novel combinations of driver alterations in 19 tissue types from The CancerGenome Atlas (TCGA) [26, 27], and that some of these combinations correlate with clinical outcomes.

2 Results

2.1 CRSO overview

CRSO finds combinations of genomic alterations (referred to as events) that are predicted to cooperate to drive cancerin individual patients. The inputs into CRSO are two event-by-sample matrices: D and P. D is a binary alterationmatrix, such that Dij = 1 if event i occurs in sample j, and 0 otherwise. P is a continuous valued penalty matrix,where Pij is the negative log of the probability of event i occurring in sample j by chance, i.e., as a passenger event.Three primary event types were considered: coding mutations identified as candidate drivers by dNdScv [5], as wellas copy number amplifications and deletions identified as candidate drivers by GISTIC2 [6]. The version of CRSOpresented here was restricted to SCNVs and coding mutations because candidate drivers for these alteration classes canbe readily generated by statistical enrichment methods. However, CRSO models can include additional alteration types,such as gene-fusions and aneuploidies.

In order to calculate passenger probabilities, mutations and copy number variations were first represented as a categoricalevent-by-sample matrix, M. Mij can take one of several values called observation types, that depend on on the typeof event i. Mutation events take values in the set {Z,HS,L, S, I}, corresponding to wild-type, hotspot mutation,loss mutation, splice site mutation or in-frame indel (Section S1). Amplification events take values in {Z,WA,SA},corresponding to wild-type, weak amplification and strong amplification, respectively. Similarly, deletion events takevalues in {Z,WD,SD}, corresponding to wild-type, weak deletion (hemizygous) and strong deletion (homozygous),respectively. Two additional event types, referred to as hybrid mutDels and mutAmps, were defined to represent genesthat were identified by both dNdScv and GISTIC2 in the same tumor type (Methods 5.1, Figure 1A).

The CRSO model defines a rule as a collection of two or more events that drive tumors harboring all of the events inthe collection. Rules are defined to be minimally sufficient, meaning that any strict subset of events within the rule areinsufficient to cause cancer. A rule set is defined as a collection of rules that account for all minimally sufficient drivercombinations within the entire cohort. Under a proposed rule set, every sample is assigned to at most one rule in therule set. Two rules are defined to be family members if one rule is a strict subset of the other. Rule sets cannot containany two rules that are family members because it is a contradiction for both rules to be minimally sufficient.

CRSO is a stochastic optimization procedure over the space of possible rule sets. When a sample is assigned to a rule ina rule set RS, the events that comprise the rule are considered to be drivers within that sample, and all of the remainingevents in that sample are considered passengers. The objective function score for RS, J(RS), is defined to be thereduction in total statistical penalty under RS compared to the null rule set (Methods 5.2). Figure 1B shows an exampleof a rule set consisting of two rules applied to the miniature melanoma dataset. The total penalty is greatly reducedonce samples are assigned to rules and the corresponding events are designated as drivers. The coverage of a rule set isthe percentage of samples that can be assigned to at least one rule in the rule set. In Fig. 1B the coverage of RS is 80%.

2

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 3: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

Figure 1: Visualization of CRSO using toy example that was extracted from TCGA melanoma (SKCM). A) Candidate driver eventsidentified by dNdScv and GISTIC2. Data is first represented as categorical matrix M, where each event can take one of severalobservation types, depending on the event type. Event types are indicated by the suffix, -M for mutations, -A for amplificationsand -MD for hybrid mutDels. Wild-type observations are represented as Z for all event types. Mutations are hotspot (HS), loss (L)splice site (S) or in-frame indel (I). Amplifications are either weak (WA) or strong (SA). Similarly, deletions are either weak (WD) orstrong (SD). MutDel hybrid events can take values in any of the mutation or deletion observation types. Events in M are representedas a binary matrix D for making rule assignments. The penalty matrix P contains penalties for each possible observation basedpatient-specific, event-specific and observation-type specific passenger probabilities. B) Objective function calculated for an examplerule set. Under the proposed rule set, the assigned events are designated as drivers, and the corresponding penalties are reduced to 0.C) Workflow of four phase procedure for identifying the best rule sets of size K, for a range of Ks.

The goal of CRSO is to find the rule set that achieves the best balance of objective function score and rule set size,which we call the core rule set. CRSO uses a four-phase procedure (Fig. 1C) to first find the highest scoring rule setof size K for K ∈ {1 . . . 40} (Methods 5.2.1). The core rule set is then determined from among all of the solutions ofsize K. A subsampling process is used to identify an expanded list of generalized core rules (GCRs). A confidencescore is determined for each GCR to be the frequency of inclusion in the sub-sampled iterations. The subset of GCRsthat have confidence levels above 50 comprise the consensus GCRs (con-GCR), and by definition cannot containfamily members. The full CRSO methodology is presented Methods 5.2.

3

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 4: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

We applied CRSO to 19 cancer types obtained from TCGA (Table 1). The average number of events per cancer typewas 86.4. There was a wide distribution of the number of candidate drivers per patient within individual cancers. In 17out of 19 cancer types, the median number of total candidate drivers (SMGs plus SCNVs) was ≥ 6, and in 6 of themwas ≥ 10 (Figure S1). CRSO identified an average of 11.1 core rules, covering an average of 70.3% of samples percancer type. Of the 210 total core rules identified, 137 (65%), 47 (22%), 13 (6%), 12 (6%) and 1 (< 0.1%) consisted of2, 3, 4, 5 and 6 events, respectively. The output of each CRSO run is a detailed report containing summaries and visualrepresentations of the results (Supplementary S5).

Table 1: Summary of 19 TCGA tissues types investigated with CRSO

Cancer Full Name Samples Events (M/C) RCT Rules Kc Cov MRS con-GCRs

BLCA Bladder urothelial CA 392 113 (47/66) 0.033 1730 14 76 2.21 15BRCA Breast invasive CA 963 90 (28/62) 0.03 449 11 58 2 13CESC Cervical and endocervical cancers 191 82 (23/59) 0.031 173 12 53 2.08 15COAD Colon AD 362 97 (35/62) 0.039 1792 14 81 3.64 10ESCA Esophageal CA 184 92 (12/80) 0.054 1894 13 84 3.23 10GBM Glioblastoma multiforme 273 78 (15/63) 0.033 186 11 77 2.27 10HNSC Head and neck squamous cell CA 505 94 (25/69) 0.032 926 10 68 2.5 7KIRC Kidney renal clear cell CA 433 39 (13/26) 0.014 66 10 52 2 11LGG Brain lower grade glioma 513 67 (19/48) 0.031 210 5 65 2.4 6LIHC Liver hepatoceullar CA 366 75 (18/57) 0.03 146 12 56 2 16LUAD Lung AD 478 93 (22/71) 0.031 291 10 60 2.1 11LUSC Lung squamous cell CA 178 112 (37/75) 0.045 1588 14 83 2.93 10OV Ovarian serous cystAD 455 77 (6/71) 0.075 1990 12 86 3.08 12PAAD Pancreatic AD 126 65 (11/54) 0.032 549 6 79 2.67 3PRAD Prostate AD 492 73 (13/60) 0.03 301 10 51 2.5 6READ Rectum AD 120 67 (13/54) 0.042 1260 12 89 3.17 9SKCM Skin cutaneous melanoma 290 71 (20/51) 0.031 197 10 69 2 9STAD Stomach AD 391 121 (37/84) 0.031 1690 13 64 2.46 14UCEC Uterine corpus endometrial CA 242 136 (39/97) 0.037 1515 11 84 2.36 7

Full Name: AD short for adenocarcinoma, CA short for carcinoma. Events total number of events as determined by dNdScv andGistic2, (M/C) is number of mutations and CNVs. RCT rule coverage threshold, chosen to be .03 or the smallest threshold satisfiedby at most 2000 rules, except in the case of KIRC. Rules starting library size. Kc core rule set size. Cov core rule set coverage.MRS mean core rule size. con-GCRs consensus GCR sizes.

2.2 CRSO performance on simulated data

In order to assess CRSO’s ability to detect known driver combinations, we initially used simulations with knownground truth rule sets. Ten simulated datasets with ground truth rule sets of size ntr were randomly generated foreach ntr ∈ {2 . . . 20}, for a total of 190 simulations. Each dataset contained 100 events and 400 samples, chosento approximate the mean number of events and samples across the TCGA datasets. Passengers rates for events andsamples were sampled from a pooled distribution of real data passenger rates (Supplementary S6.1, Figure S2). Weevaluated CRSO using 3 metrics: sensitivity, specificity, and rule assignment accuracy. Sensitivity was calculated to bethe percentage of ground truth rules included in the consensus GCRs (con-GCRs). Specificity was calculated to bethe percentage of con-GCRs that are in the ground truth rule set. Assignment accuracy was the percentage of samplesidentically assigned in the core RS assignment and the ground truth assignment.

The mean sensitivity was at least 0.85 for all ntr ∈ {2 . . . 20}, and was ≥ 0.90 for 16/19 ntr values (Table S3, Figure2A). The mean specificity was 0.73 for ntr = 2 and was ≥ 0.96 for the other 18 true rule set sizes (Table S3, Figure2B). The lowest mean accuracy was observed for ntr = 2, at 0.76, and was above 0.80 for all other ntr values. Startingwith ntr = 3, for which the mean sensitivity was 1.0, and the mean accuracy was 0.98, the general trend was thatsensitivity and accuracy decreased as ntr increased. For ntr = 2, 3 simulations performed poorly with accuraciesbelow 0.4 and the other 7 simulations all had accuracies greater than 0.90 (Figure S3), suggesting that CRSO may besusceptible to over-fitting when the number of true rules is only 2.

Across 190 simulated datasets, CRSO was able to navigate an intractable search space of possible rule sets (e.g., fora library of 1500 rules, there are > 1045 possible rule sets of size ≤ 20) and identify ground truth rules comprisingbetween 2-6 events with high specificity and high sensitivity. We compared CRSO’s performance on the simulateddataset with that of pairwise cooccurrence tests- a popular strategy for detecting driver-gene cooperation. For eachsimulation, the Fisher exact test was performed for all pairs of events, and the detected cooccurrences were compared tothe ground truth duos, i.e., the superset of all pairs of events that cooccur within at least one ground truth rule. Events

4

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 5: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

with 0 or 1 occurrence were excluded. Multiple-hypothesis correction for each simulation experiment was performedusing a false discovery rate of 5%. Sensitivities were calculated as the percentage of true pairwise interactions that wereidentified as statistically co-occurrent. Specificities were calculated as the percentage of identified pairwise interactionsthat were part of at least one ground truth rule.

The mean sensitivity achieved by the Fisher test 0.53± 0.18 over all ntr, and ranged from a high of 0.88 for ntr = 3,to a low of 0.30 for ntr = 20 (Figure 2A). The mean specificity achieved by the Fisher test was 0.41± 0.09 over allntr, and ranged from a low of 0.29 for ntr = 2, to a high of 0.60 for ntr = 19 (Figure 2B). The mean sensitivitiesand specificities achieved by CRSO outperformed those of the Fisher test for all ntr, with an average sensitivityimprovement of 0.4 and an average specificity improvement of 0.57 (Figure 2).

Figure 2: Performance of CRSO and Fisher tests applied to 190 ground truth simulations. Results are grouped according to groundtruth rule set size (NTR). The group denoted ’All’ is the global mean over all NTR. (A) Mean sensitivities of CRSO predictionsevaluated against ground truth rules (light blue bars) and Fisher test predictions evaluated against ground truth duos (orange bars).(B) Mean specificities of CRSO predictions evaluated against ground truth rules (light blue bars) and Fisher predictions evaluatedagainst ground truth duos (orange bars).

To better understand the limitations of pairwise cooccurrence tests, we applied the Fisher exact test to a version ofthe simulations having uniform passenger rates across samples and events, and to a version of the simulations withzero passenger events. Figure S2 shows an example simulation with zero passenger events (Dz), uniformly distributedpassenger events (Du), and realistically distributed noise rates that were used for the CRSO performance evaluation(Dr). The mean sensitivity for Du was 0.47 ± 0.20 across all ntr, and was comparable to the mean sensitivity forDr (Figure S5A). The mean specificity for Du was 0.96± 0.03 across all ntr, and was far superior to the those forDr (Figure S5B). The specificity results suggest that unaccounted-for heterogeneity in passenger rates across samplesand events lead to an excess of false-positives, consistent with the findings of Canisius et al. [23]. The specificity wasalways 1.0 for Dz, since it is impossible to have false positives in the absence of noise. The mean sensitivity for Dz

was 0.76± 0.11 across all ntr, and ranged from a high of 1.0 for ntr = 2, to a low of 0.64 for ntr = 20 (Figure 2A).The sensitivities for simulations without any noise show that many false negatives result directly from the ground truthrule set structure.

2.3 Application to TCGA melanoma data

We next present results from 290 TCGA cutaneous melanomas (SKCM) (Figure 3). We use this as an exemplar datasetas SKCM is a heterogeneous cancer type with some known driver combinations [28, 29, 30, 31]. We used as input, 20SMGs from dNdScv, 34 SCNV deletions and 20 SCNV amplifications from GISTIC2 narrow peak calls. Three genes,CDKN2A, PTEN and B2M, were represented as hybrid "mutDel" events as they were both significantly mutated anddeleted in the dataset (Sections S1, S2 and S3).

The candidate rule library contained 197 rules that satisfied the coverage threshold of 3%, of which 165 consisted oftwo events and 32 consisted of three events. The core rule set was determined to be the best rule set of size K = 10,as this was the smallest rule set that exceeded both the performance and coverage thresholds of at least 90% (FigureS6A-B). There were no valid rule sets that satisfied the minimum samples assigned (msa) threshold of 9 samples forK ≥ 18. Figure 4 shows the optimal assignment under the core RS and the corresponding reduction in penalty. Thirtypercent of the SKCM patients do not satisfy any of the core rules, indicated by the grey color bar in Fig. 4A.

5

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 6: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

Figure 3: CRSO Representation of TCGA Melanoma Dataset. A) Binary representation of top-25 most frequent events. Horizontalbars on the right show event frequencies across the population. B) Penalty matrix for top-25 most frequent events. Horizontal bars onthe right show total event penalties. Event types indicated by suffixes: -A for amplifications, -D for deletions, -M for mutations, and-MD for mutDel hybrid events.

Generalized core (GC) analysis using 100 subsampling iterations identified 9 con-GCRs (Methods 5.2.2). Althoughall of the melanoma con-GCRs contained 2 events, comparison of the GCRs with GC duos reveals that some con-GCRs appear as part of larger rules in some GC iterations (Figure S7). For example, BRAF-M+CDKN2A-MD andBRAF-M+PTEN-MD are observed as core rules in 83% and 84% of GC iterations, respectively, but are both observedas core duos in 100% of GC iterations. The difference in GCD and GCR confidence scores indicates that CRSO is100% confident that BRAF-M+PTEN-MD and BRAF-M+CDKN2A-MD are both essential combinations in a subsetof melanoma patients, but is approximately 84% confident that these duos are independently sufficient to producemelanomas.

The core RS and con-GCRs can be considered complementary best rule sets, with the core RS providing the singlebest performing RS and corresponding assignment over the full dataset, and the con-GCRs providing a robust set ofrules with quantified confidence scores. The union of the melanoma core RS and con-GCRs comprise 11 distinct rules,

6

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 7: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

Melanoma core rule-set assignment

Figure 4: A) Heatmap of the binary alteration matrix D under core rule set assignment. Events are ordered by frequency. Samplesare ordered according to rule set membership, as indicated by the color bar. The right-most group (red bar), are not assigned to anyrule. For each sample, assigned events are designated as drivers (black), and unassigned events (blue) are assumed to be passengers.B–C) Heatmaps of the passenger penalty matrix P before and after assignment to the core rule set.

and are dominated by rules that contain either BRAF or NRAS mutations (Table 2). Of the 11 rules, 5 contain BRAF, 5contain NRAS and only 1 rule, B2M-MD+FMN1/SNORD77-D, contains neither. Hotspot mutations in BRAF and NRASdefine the two major subtypes of melanoma that are mutually exclusive, with 50% of patients harboring BRAFV600Emutations and 30% of patients harboring NRAS hotspot mutations [32]. CRSO prioritized rules containing these eventsbecause it is improbable that these highly recurrent hotspot mutations would have happened by chance.

2.3.1 Expected melanoma combinations detected by CRSO

We categorized the 11 rules as either expected (5 rules) or potentially novel (6 rules) based on literature review. The 5expected rules involve combinations of either BRAF or NRAS with well-studied tumor suppressors: 1) NRAS-M+TP53-M, 2) BRAF-M+TP53-M, 3) NRAS-M+CDKN2A-MD, 4) BRAF-M+CDKN2A-MD, and 5) BRAF-M+PTEN-MD. Thefirst 4 of these rules are instances of cooccurring MAPK3 pathway activation with P53 inactivation- a synergy thatis known promote carcinogenesis [30, 31]. Evidence in multiple cancer types supports the cooperation betweenactivating mutations in KRAS and loss of the G1/S checkpoint by inactivation of CDKN2A or TP53 [7, 8, 9]. CRSO

7

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 8: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

Table 2: Melanoma Core RS and Consensus Generalized Core Rules (conGCR)

Rule Core Type P1 Rank Confidence Coverage SJ Fraction Assigned

NRAS-M + TP53-M Both 4 100.00 6.6% (r=29) 168 (r=8) 0.89BRAF-M + RN7SKP254-A Both 11 98.00 6.6% (r=32) 137 (r=21) 0.63

BRAF-M + HIPK2/TBXAS1-A Both 6 97.00 7.9% (r=13) 161 (r=10) 0.52CDKN2A-MD + NRAS-M Both 2 84.00 14% (r=2) 327 (r=2) 0.68

BRAF-M + PTEN-MD Both 3 84.00 11% (r=3) 227 (r=3) 0.68BRAF-M + CDKN2A-MD Both 1 83.00 26% (r=1) 524 (r=1) 0.86

HULC-A + NRAS-M Both 5 71.00 6.6% (r=28) 167 (r=9) 0.58B2M-MD + FMN1/SNORD77-D Both 10 66.00 9% (r=6) 144 (r=36) 0.54

ADAM18-M + NRAS-M Core 9 48.00 6.9% (r=20) 146 (r=11) 0.45NRAS-M + SMYD3-A Core 15 40.00 5.5% (r=40) 131 (r=20) 0.62BRAF-M + TP53-M conGCR 8 51.00 8.3% (r=9) 156 (r=7) -

Core Type Core RS, conGCR or Both. Confidence is generalized core rules confidence level, conGCRs defined as rules withconfidence above 50. P1 Ranks phase 1 ranking of rule. Coverage is the percentage of samples that cover the rule (and rankingamong all the rules in the rule library), SJ is the single rule performance (and ranking), FracA is the fraction of covered samplesthat are assigned to the rule. Only applies to core rules.

identified KRAS-M+TP53-M as a consensus GCR in 3 other cancer types, LUAD, PAAD and STAD, and identifiedKRAS-M+CDKN2A-MD as a con-GCR in LUAD and PAAD. Cooccurrence of BRAFV600E and CDKN2A loss definesa subset of pediatric brain tumors that are responsive to combined treatment with BRAF and CDK4/6 inhibitors [33].Unlike CDKN2A and TP53, PTEN was inferred to exclusively cooperate with BRAF. Supporting the minimal sufficiencyof BRAF-M+PTEN-MD, cooccurrence of BRAFV600E and PTEN loss were shown to induce metastatic melanoma inmouse models, while BRAFV600E alone only produced benign nevi in the mice [28]. PTEN loss is a mechanism ofprimary and acquired resistance to BRAF inhibitors, inspiring the development of combination strategies co-targetingthe BRAF/MEK and PI3K/AKT pathways in tumors harboring both BRAFV600E and PTEN loss [29, 34, 35]

These 5 rules exemplify CRSO’s ability to identify combinations with experimental evidence of being minimallysufficient to drive cancer, and with established utility as biomarkers and sources of rational combination strategy.Although these combinations are not novel, CRSO prioritized these rules without any prior biological knowledge. Bycontrast, 2 published methods for systematically identifying driver combinations were applied to the same TCGAmelanoma dataset and did not identify any of these 5 combinations [24, 36]. Since most strategies are based on pairwisecooccurrence, the omission of these well studied synergies in melanoma further demonstrates the inappropriateness ofstatistical cooccurrence as a criteria for detecting driver combinations.

2.3.2 Novel melanoma combinations detected by CRSO

The other 6 rules in Table 2 involve lesser-known drivers and may represent novel biological subtypes of melanoma.NRAS-M+HULC-A was prioritized as both a con-GCR (conf. = 71), and as part of the core RS. HULC is a longnon-coding RNA that has been identified as a driver of tumorigenesis in multiple cancers [37, 38, 39], includingliver cancer, osteosarcoma, cervical cancer, pancreatic, stomach and ovarian cancers [40, 41, 42, 43, 44, 45]. HULCexpression is a predictor of poor prognosis across multiple cancers [46, 47, 48, 49], and HULC silencing enhancedthe effectiveness of chemotherapy in stomach cancer cell lines [44]. The involvement of HULC in the tumorigenesisof a subset of NRAS mutant melanomas appears to be unreported and merits further investigation as a possible noveldiscovery with therapeutic implications.

NRAS-M+SMYD3-A (conf. = 40) was identified as part of the CRSO core RS. SMYD3 is a member of the histone lysinemethyltransferases enzyme family, and upregulates transcription of a plethora of oncogenes in multiple cancer types,including CDK2 and MMP2 in hepatocellular carcinomas [50], BCLAF1 in bladder cancer [51], androgen receptorsin prostate cancer [52], EGFR in renal cell carcinoma [53] and MYC and CTNNB1 in colon and liver cancers [54].Mazur et al. showed that SMYD3 methylation of MAP3K2 leads to upregulation of MAP kinase signaling and promotescarcinogenesis in RAS mutated lung and pancreatic cancers [55]. The authors further showed that preventing SMYD3catalytic activity in mouse models with oncogenic RAS mutations inhibited tumor development [55]. This presents adirect biological link between SMYD3 expression and RAS mutations, and nominates SMYD3 as a drug target for RASdriven cancers. As with HULC, the possible role of SMYD3 in melanoma appears to be unreported.

CRSO identified two synergies exclusive to BRAF-M with very high confidence: BRAF-M+HIPK2/TBXAS1-A (conf. =97) and BRAF-M+RN7SKP254-A (conf. = 98). Although we annotated amplifications according to the GISTIC narrowpeak regions, these two amplification events are part of large wide-peak regions containing many genes. The wide peakannotated by HIPK2 and TBXAS1 contains 72 genes, including BRAF. BRAF amplification is a mechanism of acquired

8

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 9: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

resistance to BRAF inhibitors in BRAFV600E melanomas [56, 57, 58, 59]. We found that patients harboring BRAF-Mand HIPK2/TBXAS1-A had shorter progression-free intervals (PFI) compared to patients harboring BRAF-M withoutHIPK2/TBXAS1-A (cox-ph P = 0.016, Table 4, Figure 5). Because TCGA tumors are primary and treatment-naive, thisobservation suggests that BRAF amplification and BRAFV600E may be a sufficient combination for tumor formation,and a cause of intrinsic resistance to BRAF inhibition.

The wide amplification peak annotated by RN7SKP254 is on chromosome 15q26.2 and contains 169 genes. Fourgenes were classified as cancer genes based on the Sanger Institute Cancer Gene Census [60]: BLM, IDH2, NTRK2and CRTC3. This rule was prioritized by CRSO as the second highest confidence rule (98%) despite ranking 32nd incoverage (6.6%) and 21st in SJ ranking, suggesting that this amplification may merit further investigation.

BRAF melanomas with HIPK2/TBXAS1-A have poor PFI

+++++++++

+++++++++++++

++++

+++++++

+++++++++++ + + + + +++++ ++ + ++ ++p = 0.0280.00

0.25

0.50

0.75

1.00

0 2500 5000 7500 10000Days

PF

I Pro

babi

lity

+ +WT BRAF−M

A

140 19 6 2 0140 35 12 3 1−−Number at risk

++

++++

++++++++++ + + + + ++++ ++ + + ++

+

+ + +

p = 0.0140.00

0.25

0.50

0.75

1.00

0 2500 5000 7500 10000Days

PF

I Pro

babi

lity

+ +BRAF only BRAF+HIPK2/TBXAS−A

B

119 33 11 3 121 2 1 0 0−−

Number at risk

Figure 5: HIPK2/TBXAS1-A annotates an amplified region that contains BRAF and defines a subtype of BRAF mutant SKCMpatients with poor PFI. A) BRAF patients have improved PFI compared to non-BRAF patients. B) SKCM patients withBRAF+HIPK2/TBXAS1-A have worse PFI than patients with only BRAF. P-values calculated from Kaplan–Meier estimator.

2.4 Known and novel findings in TCGA

In this section we highlight some findings from the TCGA CRSO results. Table S6 presents all con-GCRs, and providesa concise snapshot of the results from all cancer types.

2.4.1 Known 3-gene combinations in brain and colorectal cancers

One of the distinguishing features of CRSO is the ability to identify functionally relevant cooccurrences involving 3 ormore events. The rule ATRX-M+IDH1-M +TP53-M was identified as a con-GCR in both LGG (conf.: 71) and GBM(conf.: 63), despite only occurring in 3.7% of GBM samples. This 3-gene combination defines a well known subtype ofbrain cancers with differential prognosis, and has been experimentally shown to induce carcinogenesis [61, 62, 63].

9

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 10: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

Similarly, the well-studied combination of APC+KRAS+TP53 in colorectal cancers [64, 65, 66] was identified as acon-GCR in READ (conf.: 79), and as a non-consensus GCR in COAD (conf.: 42).

2.4.2 CRSO prioritized NFE2L2 combinations in multiple cancers

NFE2L2 mutations were identified as part of a con-GCR in 4 cancer types: LUSC, HNSC, ESCA and BLCA. All of thecon-GCRs involving NFE2L2 ranked very low in terms of single-rule metrics (Table 3), suggesting that NFE2L2 isaccounting for samples not accounted for by any higher ranking rules. NFE2L2 encodes NRF2, a key transcriptionfactor that regulates cellular response to oxidative-stress [67]. Initially, NRF2 activation was identified as a mechanismof cellular protection against cancer [68, 69]. However, we now have evidence that constitutive NRF2 activation viamutations in KEAP1 or recurrent NFE2L2 exon 2 deletions can drive tumor proliferation and metastasis [70, 67].NFE2L2 mutations have recently been shown to define subtypes with differential prognosis in lung and head and neckcancers [71, 72, 73, 74]. Several upcoming clinical are designed to explore the therapeutic benefit of compounds thatinhibit NRF2 in advanced cancer patients harboring mutations in NFE2L2 or KEAP1 [75, 76, 77]. The rules containingNFE2L2 may refine our understanding of the contexts in which NFE2L2 drives cancer and may identify subsets ofpatients that are responsive to NFE2L2 inhibition.

Table 3: Consensus GCRs involving NFE2L2

Tissue Rule Conf. Conf. Rank Coverage SJ NE

LUSC CSMD3-M + NFE2L2-MA + SOX2-A 79 3 8.43% (r=230) 152 (r=126) 3HNSC CDKN2A-MD + NFE2L2-MA + TP53-M 50 7 8.71% (r=77) 406 (r=35) 3ESCA ACTRT3/MYNN-A + NFE2L2-M + TP53-M 59 9 5.98% (r=1280) 169 (r=345) 3BLCA CDKN2A-MD + NFE2L2-M 51 15 5.61% (r=337) 220 (r=131) 2SJ is single rule objective function score. NE is number of events.

2.4.3 CRSO prioritized a rare combination in head and neck cancers (HNSC)

The rule CASP8-M+HRAS-M was identified in HNSC as the third highest ranking GCR (conf. = 73) even though itranks 591st in coverage and 301st in SJ out of 926 rules. Patients that have these two mutations and are also wild-typefor TP53 have been shown to define a biologically distinct subtype characterized by low SCNV burden [78]. Theprioritization of CASP8-M+HRAS-M despite low coverage (3.8%) demonstrates CRSO’s capability to systematicallyidentify non-obvious, biologically meaningful combinations among the myriad of possible combinations.

2.4.4 CRSO prioritized rules containing ALB mutations in liver cancer

Two high confidence GCRs were identified in LIHC involving ALB mutations: ALB-M+CTNNB1-M (confidence = 98)and ALB-M+TP53-M (confidence = 80). ALB was experimentally shown to be a tumor suppressor in hepatocellularcarcinomas [79], and has been discussed as an important part of the somatic mutation landscape [80]. Cooperationsinvolving ALB have not been systematically reported, and the two combinations we identified may help inform context-dependent treatments of ALB mutant patients. Evidence of the relevance of these rules on PFI is presented in Section2.5.

2.4.5 CRSO identified a novel 3-gene combination in bladder cancer

ARID1A-MD+SOX4-A+TP53-M was identified as a con-GCR in BLCA (conf. = 79, coverage = 7%). SOX4 over-expression has been studied experimentally and has been reported to be an important contributor in bladder cancertumorigenesis [81, 82]. The hypothesis that SOX4 cooperates with ARID1A and TP53 to initiate bladder carcinogenesisis novel and may merit further experimental investigation.

2.5 Associations of rules with patient outcomes

We evaluated whether stratifying patients according to con-GCRs predicted by CRSO instead of individual driver eventsprovided extra prognostic information. We restricted this analysis to events that appear in more than one con-GCR,(referred to as multi-rule events), since these events are predicted by CRSO to occur in distinct genetic contexts. Forevery con-GCR, R, containing a multi-rule event, E, we performed univariate cox-ph analysis to compare the PFIs ofsamples that satisfy R versus samples that harbor E but do not satisfy R (Methods 5.5).

A total of 289 tests were performed across the 19 cancer types, and 21 (7.3%) significant associations (|Z| ≥ 1.96)were detected (Table 4). Positive Z-scores indicates better outcomes in the cohort satisfying R versus that only harbors

10

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 11: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

E, and negative Z-scores indicate the opposite. To account for multiple hypotheses, an adjusted P value, PAdj , (Table4) was calculated using a permutation procedure (Methods 5.5).

Table 4: Significant PFI associations among consensus GCRs

Tissue Event Rule NR NE Z P P.Adj

BLCA ARID1A-MD ARID1A-MD + SOX4-A + TP53-M 28 100 2.986 0.0028 0.006BLCA FGFR3-MA CDKN2A-MD + FGFR3-MA 44 41 -2.536 0.0112 0.021BLCA TP53-M ARID1A-MD + SOX4-A + TP53-M 28 167 2.457 0.0140 0.083BRCA PIK3CA-MA CCND1/ORAOV1-A + PIK3CA-MA 109 298 -2.167 0.0302 0.077BRCA TP53-M PIK3CA-MA + TP53-M 130 163 2.574 0.0101 0.043ESCA TP53-M GMDS-D + PIH1/WWOX-D + TP53-M 34 118 -2.021 0.0432 0.330GBM CDKN2A-D CDKN2A-D + PIK3R1-M 22 151 -2.148 0.0317 0.155KIRC RNA5SP200-A BAP1-M + RNA5SP200-A 12 96 -3.179 0.0015 0.005LGG IDH1-M CIC-M + IDH1-M 99 298 2.499 0.0124 0.047LGG IDH1-M IDH1-M + PIK3CA-M 31 366 -2.1 0.0357 0.139LGG TP53-M ATRX-MD + IDH1-M + TP53-M 191 59 2.12 0.0340 0.055LIHC ARID1A-MD ARID1A-MD + CTNNB1-M 23 61 -2.886 0.0039 0.006LIHC CTNNB1-M ARID1A-MD + CTNNB1-M 23 74 -2.27 0.0232 0.079LIHC TP53-MD ALB-M + TP53-MD 13 111 -2.351 0.0187 0.150LUAD TP53-M SFTA3-A + TP53-M 74 177 2.009 0.0445 0.265LUSC TP53-M CERS3-A + TP53-M 26 116 2.084 0.0371 0.261OV d2(n=15) d2(n=15) + 4 Events 59 121 -2.255 0.0242 0.062SKCM BRAF-M BRAF-M + HIPK2/TBXAS1-A 21 119 -2.407 0.0161 0.084UCEC PIK3CA-M PIK3CA-M + PTEN-MD 89 38 2.221 0.0264 0.056UCEC PIK3CA-M CTNNB1-M + PIK3CA-M 40 87 1.984 0.0472 0.110UCEC PIK3CA-M PIK3CA-M + TP53-M 32 95 -3.751 0.0002 0.000Each experiment consisted of comparing the samples satisfying the rule with the samples harboring the event, but not satisfyingthe rule. NR and NE are number of samples in the rule and event classes, respectively. Z-scores (Z) and p-values (P) werecalculated using univariate cox-ph. Positive Z-score indicates that the patients satisfying the rule have better prognosis than thosewho only satisfy the event. P.adj is adjusted p-value described in Methods 5.5. The rule d2(n=15) + 4 Events is short ford2(n=15) + FKSG52/PDE4D-D + MECOM-A + MYC-A + TP53-M.

2.5.1 CRSO combinations refine the classification of IDH1-mutant LGGs

IDH1 mutations occur in 78% of LGGs and is biomarker for improved outcome in LGG patients [61] (Figure 6A). Wedetected differential outcomes associated with the event/rule pairings IDH1-M vs. IDH1-M+CIC-M and IDH1-M vs.IDH1-M+PIK3CA-M that could further stratify IDH1 mutant samples (Figure 6). Among the patients with IDH1-M(n = 397), those that also have CIC-M (n = 99) have better PFI than those that are CIC wild-type (n = 298, Figure6B, Z = 2.5). On the other hand, the 31 IDH1-mutant samples that also harbor PIK3CA-M have worse PFI thanthose that are PIK3CA wild-type (n = 366, Figure 6C, Z = -2.1). Collectively, these results suggest that LGG patientscould be stratified into 4 groups: IDH1 wild-type, IDH1 mutant+CIC mutant, IDH1 mutant+PIK3CA mutant, andIDH1 mutant+CIC/PIK3CA wild-type. Figure 6D suggests that PIK3CA-M appears to nullify the improvement in PFIconferred by IDH1, and that CIC-M enhances the improvement in PFI. The improved prognosis of IDH1-M+CIC-Mrelative to other IDH1 mutant LGGs has been previously reported and has been independently characterized by p1/q19co-deletions, which overlap highly with CIC mutations [61, 83, 63]. We did not find literature evidence of the poorprognosis of IDH1 tumors harboring PIK3CA mutations in LGGs

11

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 12: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

CRSO rules refine the stratification of IDH1-mutant brain lower grade gliomas (LGG)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +++

++++ +

+

+++++++++++

++

+++++++++++++

++++++++++

+ ++ ++

+ +

p = 9.5e−170.00

0.25

0.50

0.75

1.00

0 1000 2000 3000 4000Days

PF

I Pro

babi

lity

+ +IDH1 WT

A

397 108 28 11 3 1

114 13 3 0 0 0−−

Number at risk

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +++

++++ +

+

+++++++

++++

+++

+++ + + +

p = 0.0330.00

0.25

0.50

0.75

1.00

0 1000 2000 3000 4000Days

PF

I Pro

babi

lity

+ +IDH1 IDH1 + PIK3CA

C

366 105 28 11 3 1

31 3 0 0 0 0−−

Number at risk

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

++++++ +

++ ++ + +

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

++++++ ++ ++++ +

+++

p = 0.0110.00

0.25

0.50

0.75

1.00

0 1000 2000 3000 4000Days

PF

I Pro

babi

lity

+ +IDH1 CIC + IDH1

B

99 25 9 3 2

298 83 19 8 1−−

Number at risk

++++++++++++++++++++++++++++++++++++++++++++++++++

+++++ +

++ ++ + +

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

+++++ ++ ++++ +

+++

+++

+

++

+ +

+++++++++++

++

+++++++++++++

++++++++++

+ ++ ++

+ +

p = 4.9e−170.00

0.25

0.50

0.75

1.00

0 1000 2000 3000 4000Days

PF

I Pro

babi

lity

+ + + +C+I I I+P WT

D

85 24 9 3 2281 81 19 8 117 2 0 0 0114 13 3 0 0−−

−−Number at risk

Figure 6: A) IDH1 mutations define a subset of LGG patients with better PFI. B) Patients with IDH1 and CIC havebetter PFI than patients with IDH1 only. C) Patients with IDH1 and PIK3CA have worse PFI than patients with IDH1only. D) LGG patients can be stratified into four classes: IDH1 wild-type (WT), IDH1 and CIC (C+I), IDH1+PIK3CA(I+P) and IDH1 wild-type for both PIK3CA and CIC. Samples harboring IDH1, PIK3CA and CIC were excluded frompanel D (n = 14). P-values calculated from Kaplan–Meier estimator.

12

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 13: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

2.5.2 CRSO identifies potential multi-gene biomarkers in liver cancer

LIHC patients with both ARID1A-MD and CTNNB1-M (n = 23) had worse PFI expectancy than patients with ARID1A-MD but not CTNNB1-M (Table 4, n = 61, Z = -2.89, PAdj = 0.006). The patients satisfying this rule also had worsePFI expectancy than those with CTNNB1-M but not ARID1A-MD (n = 74, Z = -2.27, PAdj = 0.079). Surprisingly,neither ARID1A-MD nor CTNNB1-M is a biomarker individually (Fig. 7A-B), and yet together they appear to define asubtype with significantly worse prognosis (Fig. 7C-D).

Cooccurrence of ARID1A+CTNNB1 defines a subtype with poor PFI in liver hepatocellular carcinoma (LIHC)

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++ +++ +++ ++++ + ++ ++++ + +++ ++++ + +

+++++++++++

+++++++++++++ + +++ ++++ ++ +

+ ++

p = 0.90.00

0.25

0.50

0.75

1.00

0 1000 2000Days

PF

I Pro

babi

lity

+ +WT ARID1A−MD

A

281 44 11 2 084 16 3 1 0−

−Number at risk

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++ +++++++++ ++++ ++ ++ +++++ + +++ ++++ + + +

+++

++ +

+

++

p = 0.00960.00

0.25

0.50

0.75

1.00

0 1000 2000Days

PF

I Pro

babi

lity

+ +WT ARID1A+CTNNB1 (R)

C

342 60 14 3 023 0 0 0 0−

−Number at risk

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++ ++++++++ ++ + + ++ ++ + +

+++++++++++

++++++++++++++ +++++++++++++

+++++ + + + ++ +

p = 0.660.00

0.25

0.50

0.75

1.00

0 1000 2000Days

PF

I Pro

babi

lity

+ +WT CTNNB1−M

B

268 49 1297 11 2−

−Number at risk

+++++++++++++++++++++++++++++

++++++++++++++++++++++++++++++++++++++++++++++ ++++ + + +++ ++ + ++ ++ +

+++++++

+++++++++++ +++ ++++

++ +

+ +

++++++++

+++++++++++ ++++++++++++ ++ ++ +

+ + ++ +

+++

++ +

+

++

p = 0.0420.00

0.25

0.50

0.75

1.00

0 1000 2000Days

PF

I Pro

babi

lity

+ + + +WT/WT ARID1A CTNNB1 R

D

207 33 961 16 374 11 223 0 0−−

−−Number at risk

Figure 7: A) ARID1A-MD is not a biomarker in LIHC. B) CTNNB1-M is not a biomarker. C-D) LIHC patients with ARID1A-MDand CTNNB1-M define a subtype with significantly worse prognosis compared to all other patients. P-values calculated fromKaplan–Meier estimator.

A second subtype with poor prognosis in LIHC was defined by ALB-M+TP53-M. Patients harboring both of theseevents had significantly worse PFI than those harboring either event individually, or neither event (Figure 8).

13

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 14: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

Cooccurrence of ALB+TP53 defines a subtype with poor PFI in LIHC

+++++++++++++++

++++

++++++ +++++++++++ + ++ +++ +++++ + + + + ++ ++

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++ + ++ + + ++ + +++++ + + + + + + +

p = 0.0580.00

0.25

0.50

0.75

1.00

0 1000 2000Days

PF

I Pro

babi

lity

+ +TP53 WT

A

124 19 4 0 0241 41 10 3 0−

−Number at risk

+

+++++

++ +

+ ++ +

+

++

p = 0.0620.00

0.25

0.50

0.75

1.00

0 500 1000 1500Days

PF

I Pro

babi

lity

+ +ALB ALB + TP53

C

29 12 3 1 113 3 1 1 0−

−Number at risk

+++++++++++++++++

+

++++++ ++++++++ + + ++ +++ +++ + + + + +

+

++

p = 0.0160.00

0.25

0.50

0.75

1.00

0 500 1000 1500Days

PF

I Pro

babi

lity

+ +TP53 ALB + TP53

B

111 38 18 713 3 1 1−

−Number at risk

+

+++++

+++

+ ++ +

+

++

+++++++++++++++++

+

++++++ ++++++++ + + ++ +++ +++++ + + + +

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++

+++ + ++ + ++++ + +

p = 0.00750.00

0.25

0.50

0.75

1.00

0 1000 2000Days

PF

I Pro

babi

lity

+ + + +ALB ALB + TP53 TP53 WT

D

29 3 113 1 0

111 18 4212 38 9−−

−−Number at risk

Figure 8: A) TP53 correlates with worse PFI in LIHC. B-C) ALB+TP53 patients have worse PFI than patients with only onemutation. D) LIHC patients with ARID1A-MD and CTNNB1-M define a subtype with significantly worse prognosis compared to allother patients. P-values calculated from Kaplan–Meier estimator.

14

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 15: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

2.5.3 CDKN2A and FGFR3 co-mutation status suggest a 3-tier stratification of BLCA tumors

FGFR3-MA+KDM6A-MD and FGFR3-MA+CDKN2A-MD are the 1st and 3rd highest confidence GCRs in BLCA. Theassociation between CDKN2A and prognosis is complicated in bladder cancer, as both p16 protein over-expressionand complete lack of expression have been shown to be biomarkers of poor prognosis in bladder cancers [84, 85].In our study, CDKN2A deletion and CDKN2A mutations were combined into a single hybrid event, resulting ina simple association between CDKN2A-MD status and poor PFI (Figure 9A). We found that FGFR3-MA confersimproved PFI within CDKN2A wild-type tumors (Figure 9A), but does not associate with any PFI difference in tumorsharboring CDKN2A-MD (Figure 9B-C). These results suggest a 3-tier classification, with CDKN2A-MD defininga tier with poorer prognosis, CDKN2A/FGFR3 double wild-type defining a tier with intermediate prognosis, andFGFR3-MA+/CDKN2A-MD- defining a tier with improved prognosis (Figure 9D).

BLCA outcomes depend on CDKN2A and FGFR3

+++++++++++++++++++++++++++++++++++++++++

+++++++++++++ +++++ ++++ ++++++++ + ++

+

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++ + ++ + +

+

p = 0.00450.00

0.25

0.50

0.75

1.00

0 1000 2000 3000 4000 5000Days

PF

I Pro

babi

lity

+ +CDKN2A−MD (C+) WT (C−)

A

163 26 9 2 0 0229 49 18 6 3 0−

−Number at risk

+++++

++++++++++++++++++

++++++++++++++++++ +++ + ++ + +++ ++ + + +

++++

++++++ +

+ ++ ++ + +

+

p = 0.860.00

0.25

0.50

0.75

1.00

0 1000 2000 3000Days

PF

I Pro

babi

lity

+ +C+/F− C+/F+

B

119 18 6 044 8 3 2−

−Number at risk

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++ ++ + ++ + +

+

++++++++ +++++++++++++

++ + + + + +++ +

+

p = 0.0610.00

0.25

0.50

0.75

1.00

0 1000 2000 3000 4000 5000Days

PF

I Pro

babi

lity

+ +C−/F− C−/F+

C

188 39 12 6 3 041 10 6 0 0 0−

−Number at risk

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++ ++ + ++ + +

+

++++++++ +++++++++++++

++ + + + + +++ +

+

+++++++++++++++++++++++++++++++++++++++++

+++++++++++++ +++++ ++++ ++++++++ + ++

+

p = 0.00430.00

0.25

0.50

0.75

1.00

0 1000 2000 3000 4000 5000Days

PF

I Pro

babi

lity

+ + +C−/F− C−/F+ C+

D

188 39 12 6 3 041 10 6 0 0 0

163 26 9 2 0 0−−−

Number at risk

Figure 9: C+ indicates tumors harboring CDKN2A-MD, C- indicates tumors wild-type for CDKN2A. F+ indicates tumors harboringFGFR3-MA, and F- indicates tumors wild-type for FGFR3 A) CDKN2A-MD is a single-gene biomarker for poor PFI in bladdercancer. B) FGFR3-MA is a biomarker for improved survival among CDKN2A wild-type samples. C) FGFR3-MA status does notcorrelate with PFI in CDKN2A-MD tumors. D) Proposed 3 tier stratification of BLCA defined by C+, C-/F- and C-/F+. P-valuescalculated from Kaplan–Meier estimator.

2.6 Comparison with SELECT

We compared the CRSO TCGA results with the pairs of cooccurrences identified by Mina et al. using SELECT[24]. Sixteen TCGA cancer types were analyzed by both SELECT and CRSO (Table 5). COAD and READ wereexcluded from the side-by-side comparison because they were analyzed as a single colorectal cancer type in Mina et al..Pancreatic adenocarcinoma (PAAD) was excluded because it was not analyzed in Mina et al..

15

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 16: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

For each tissue we compared the total coverage of all statistically co-occurrent pairs identified by SELECT within acurated set of 505 pan-cancer mutations and CNVs [24], versus the coverages of the core rule sets identified by CRSO(Table 5). Although both methods identified a similar mean number of synergies per cancer type, (10.4 ± 10.2 forSELECT versus 11.1± 2.2 for CRSO), the CRSO core rules covered an average of 68% of samples (sd = 12%) percancer type compared to an average of 19% of samples (sd = 11%) covered by the SELECT synergistic pairs. The largediscrepancy in coverage is because CRSO does not require statistical cooccurrence between the events in a cancer rule.CRSO’s ability to identify a likely driver combination in the majority of samples, and to identify combinations of morethan 2 events, may facilitate precision oncology advances that could benefit many patients.

Table 5: Coverages of SELECT pairs and CRSO core RS or 16 TCGA cancer types

Select_Cov CRSO_Core_Cov Select_Num_Pairs CRSO_Num_rules

BLCA 0.38 0.75 20 14BRCA 0.27 0.58 29 11CESC 0.05 0.53 2 12ESCA 0.29 0.84 10 13GBM 0.38 0.77 3 11HNSC 0.28 0.68 9 10KIRC 0.07 0.52 2 10LGG 0.06 0.65 2 5LIHC 0.09 0.56 0 12LUAD 0.20 0.60 7 10LUSC 0.15 0.83 2 14

OV 0.20 0.86 32 12PRAD 0.10 0.51 7 10SKCM 0.11 0.69 9 10STAD 0.28 0.64 24 13UCEC 0.11 0.84 9 11

Average 0.19 0.67 10.4 11.1All SELECT predicted synergies for each cancer type were extracted, and coverage for the cancer type wascalculated using the union of covered samples. CRSO Core RS coverages for each cancer type.

For each cancer type, we compared the CRSO con-GCDs (conf. ≥50) with the co-occurring pairs detected by SELECTusing the subset of mutation events that overlap between the datasets. Copy-number events were excluded because theywere processed and defined differently in Mina et al. (Methods 5.6). In total, 99 duos were identified by either CRSOor SELECT, 5 duos were identified by both algorithms, 19 duos were identified by SELECT only and 75 duos wereidentified by CRSO only (Table S7). The large discrepancy in the number of pairs is partially because the commonmutation set represents a larger fraction of the CRSO event pool. Additionally, a single CRSO rule can contain 3,6 or 10 duos if it contains 3, 4 or 5 events. Of the 19 duos identified by SELECT that were not con-GCDs, 13 hadcoverage below the CRSO rule-inclusion threshold of 3% (6 had coverages ≤1%). Of the 6 duos with coverage ≥3%identified by SELECT only, 4 were identified by CRSO but did not achieve 50% confidence threshold, and only 2 werenot identified at all by CRSO.

CRSO identified many well-known and high coverage combinations that were missed by SELECT, includingIDH1+{ATRX/TP53/CIC/PIK3CA} in LGG, BRAF+{PTEN/TP53} in SKCM, and many high coverage combina-tions involving PIK3CA in UCEC. Moreover, the full set of cooccurrences identified by SELECT (i.e., not restrictedto common events), included 0 combinations involving IDH1 in LGG, 0 combinations involving BRAF in SKCM,and 1 low-coverage combination involving NRAS in SKCM. Many of the CRSO con-GCDs missed by SELECTconsisted of a common tumor-suppressor such as TP53 and ARID1A cooperating with a growth-promoting oncogene.Tumor-suppressors that can cooperate with many genes appear to be independent from all of them and are overlookedby approaches that rely on statistical cooccurrence. Supporting this explanation, Mina et al. reported that bothmutual-exclusivity and cooccurrence are found at higher rates between genes that are within the same pathway [24].

16

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 17: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

3 Discussion

We developed CRSO as a stochastic optimization procedure for predicting combinations of alterations that are minimallysufficient to drive cancer in individual patients. We applied CRSO to 19 TCGA cancer types, using SMGs identified bydNdScv [5] and SCNVs identified by GISTIC2 [6] as input features. An optimal core rule set was determined for eachcancer type, as well as sets of generalized core rules, trios and duos, along with confidence scores.

CRSO is an optimization over a computationally intractable search space of rule sets, i.e., sets of combinations ofevents. We performed extensive simulation analyses with known ground truth rule sets and found that CRSO was ableto reproduce the ground truth rule sets with high sensitivity and specificity, and that CRSO was able to assign individualpatients to rules with high accuracy. Analysis of TCGA results showed that CRSO identified synergies predicted bya leading algorithm for detecting pairwise synergies, and nominated many additional cooperations. We showed thatCRSO prioritized biologically important events and combinations with literature support that would not be prioritizedbased on single-event analysis or other tools.

CRSO is developed based on a theoretical model that assumes that every tumor results from a specific minimallyessential driver combination, and that a small number of such combinations account for the majority of tumors ineach cancer type. Ultimately, whether a rule is minimally sufficient cannot be proven by CRSO alone. We also do notpresume that CRSO is finding the "ground truth rule set" for the 19 cancer types, nor that an unambiguous "groundtruth rule set" exists in all cases. Rather, we hypothesize that some of the rules identified by CRSO are of biological andclinical significance, and that the probability of a rule being of biological significance is positively correlated to itsgeneralized core confidence score.

Several characteristics of CRSO distinguish it from other available methods for detecting biological cooperation fromgenetic profiles. Most approaches can only detect pairwise synergies, whereas CRSO is able identify combinations of 3or more events. CRSO takes into account passenger event probabilities by optimizing over a probabilistic representationof genetic alterations in addition to the binary representation used by most approaches. CRSO does not use pairwisestatistical significance as a criteria for driver gene cooperation. Instead, CRSO optimizes for coverage of the populationand minimization of passenger penalties, using a minimal number of rules.

3.1 Novel findings and multi-gene biomarkers from 19 TCGA cancers

We highlighted several novel combinations from among the high confidence rules inferred by CRSO in 19 TCGAcancer types (Table S6).

NRAS-M+HULC-A and NRAS-M+SMYD3-A were identified as core rules that may represent distinct mechanisms ofcarcinogenesis in NRAS mutant melanomas 2.3. HULC and SMYD3 amplifications are lesser-known events that havebeen shown in the past few years to contribute to tumorigenesis in multiple cancer types, but have not been identifiedas driver events in melanoma. SMYD3 was shown halt tumor progression in RAS driven mouse models of lung andpancreatic cancers [55], and may represent a novel therapeutic target in NRAS-mutant melanomas.

CRSO identified NFE2L2 as con-GCRs in 4 cancer types as part of 4 distinct rules that had comparatively low coverageand single-rule performance (3).The CRSO findings support that NFE2L2-mediated NRF2 activation may be an essentialdriver in multiple cancers types.

In LIHC, CRSO nominated ALB-M+CTNNB1-M and ALB-M+TP53-MD as the 2nd and 9th highest confidence GCRs(Table S6), respectively, suggestion that ALB mutations may play a vital role in the formation of some liver cancers.ALB-M+TP53-MD was further identified to be a subtype of LIHC with poor prognosis.

Additional examples were presented in LGG, LIHC and BLCA of significant differences in patient PFIs that werefound based on rules identified by CRSO. In all of these examples the PFI differences could not have been identified byconsideration of individual alterations. Rather, these differences appear to be a consequence of specific combinationsof alterations who’s cooccurrences define subtypes with better or worse prognosis. These combinations, as well asother combinations in Table 4 that were not discussed, merit further investigation as prognostic biomarkers. Evaluatingthese findings on independent datasets is a necessary next step toward determining whether any of the combinationsare reliable biomarkers that can assist clinical stratification. Incorporating information about patient treatments mayhelp refine this analysis. The interpretability of potential biomarkers identified by CRSO may provide actionableinformation that goes beyond current clinical stratification, such as suggesting combination treatments or identifyingspecific alterations as drug targets.

17

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 18: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

3.2 Limitations of pairwise cooccurrence tests and comparison with SELECT

We evaluated whether pairwise cooccurrence tests could reproduce the pairwise ground-truth synergies in 190 simulateddatasets based CRSO model assumptions and empirically estimated passenger alteration probabilities. We found thatpairwise cooccurrence testing resulted in overall poor performance, with low sensitivity and low specificity. Moreover,we found that pairwise tests had poor sensitivity even when all passenger events were excluded (Fig. S5A). Theinadequacy of pairwise cooccurrence tests can be explained by the complicated cooperation network of cancer rule setsthat allows for individual alterations to be part of multiple different rules. Consider as a minimal illustrative examplethat a cancer rule set is defined by 3 rules encompassing 3 events: RS = {A + B,A + C,B + C}. If these threerules each occur in 1/3 of the population and are perfectly mutually exclusive, we would observe perfect statisticalindependence between the three events using pairwise Fisher-exact tests.

A real data comparison was performed between CRSO and SELECT, a popular, recently developed algorithm [24]for finding pairwise relationships based on a model of conditional selection. CRSO and SELECT were applied to 16common cancer types. Although both methods use TCGA mutations and SCNVs as inputs, and both methods predict asimilar number of pairs/rules per cancer type, the percentage of the cohort harboring at least one CRSO rule is muchhigher than the percentage harboring both events of at least one SELECT pair (see Table S7). The results filteredthrough a common mutation set suggest that CRSO is able to capture many of the higher coverage SELECT pairswithout screening for cooccurrence.

SELECT uses an information theoretic approach to prioritize event pairs based on the strength of pairwise inter-dependency patterns [24]. Ultimately, SELECT seeks to find pairs of events that co-occur more or less frequentlythan expected if the events were independent. Our analysis shows that this criterion leads to the omission of manywell-known, high coverage oncogenic synergies. Many of the combinations detected by CRSO and missed by SELECTinvolve tumor suppressors that can cooperate with multiple oncogenes in different tumors. For example, SELECT didnot detect any of the well-studied and highly-recurrent combinations in melanoma between TP53 or CDKN2A witheither BRAF or NRAS. CRSO does not assume that pairwise statistical dependence is required for biological synergyand is therefore able to identify driver combinations involving tumor suppressors that cooperate with many oncogenesto contribute to carcinogenesis.

3.3 CRSO limitations and future improvements

CRSO is a tool for prioritizing biologically relevant combinations that may merit further investigation from a hugespace of possible combinations, and it is inevitable that some identified combinations will be false positives, and someimportant biological combinations will be missed. In this section we discuss likely sources of false positives and falsenegatives, and possible steps that can be taken to improve the accuracy of the results.

3.3.1 Inclusion of missing driver events

The coverages achieved by the core rule sets in different cancers ranges from a low of 51% in prostate adenocarcinoma(PRAD), to a high of 89% in rectum adenocarcinoma (READ) cancers, revealing that a substantial subset of patients inevery cancer type were not assigned to any rule. One strategy for accounting for these samples would be to expand theset of mutations and SCNVs included as events by relaxing the significance threshold used for dNdScv or GISTIC2, orby including additional events identified by other approaches.

The rules identified by CRSO are further limited by the types of events that are used as inputs. We chose to use SCNVsand SMGs because these are the most common types of driver alterations in cancer, and can be systematically prioritizedby statistical enrichment methods. However, many other types of alterations were omitted that may contribute tocancer formation, including germline alterations, arm level CNVs, gene fusions, chromosomal translocations andepigenetic alterations. For examples, TMPRSS2-ERG fusions are observed in 40% of prostate cancers [86], andp1/q19 chromosomal co-deletions are observed in 30% of lower-grade gliomas [83]. CRSO supports inclusion ofany binary-encoded event of interest. In case it is hard to calculate passenger probabilities for an event of interest, werecommend assigning a penalty equal to 1.05 times the largest penalty in each sample that harbors this event. Doing sowould ensure that the event has maximum priority in the samples that harbor it. Using a much larger penalty comparedto those associated with SCNVs and mutations could adversely impact the coverage of CRSO by over-prioritizing rulesthat cover very few samples.

Since it is inevitable that some important drivers are missing from our datasets, we performed an experiment investigatingthe robustness of the melanoma results to exclusion of important events. The results suggest that most rules identifiedin the absence of an important driver event will also be identified when the event is included (Section 5.4, Table S5).

18

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 19: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

Nonetheless, it is likely that exclusion of very high frequency driver events, such as TMPRSS2-ERG fusion, will lead toidentification of some false positives.

3.3.2 Improved variant annotation can refine inputs and improve detection of patient-specific drivercombinations

It is sometimes possible to predict the functional consequences of specific alterations within candidate driver events,e.g., using CIViC [87]. Variant annotation can be integrated into CRSO by modifying the inputs so that high confidenceneutral alterations are labeled wild-type.

Future implementations of CRSO may consider distinguishing between bi-allelic and mono-allelic mutations. Forexample it is possible that a combination involving bi-allelic loss of a tumor suppressor is sufficient to drive cancerinitiation, whereas mono-allelic loss in the same combination may require additional alterations. Unfortunately,distinguishing between bi-allelic and mono-allelic mutations from bulk tumor samples remains challenging due tointra-tumor heterogeneity [88].

3.3.3 Limitations of CNV representation and passenger probability calculation

CRSO uses the focal amplification and deletion peaks identified by GISTIC2 as input features. Some GISTIC2 peakscontain many genes, making it difficult to identify the genes of biological significance.

For mutations, we were able to account for covariates that impact mutation rates and estimate passenger probabilitiesrates of individual amino-acid substitutions using CancerEffectSizeR [89]. Unfortunately, analogous tools for calculatingpassenger probabilities of SCNVs that account for built-in biases do not exist, and we relied on a crude approach toestimate these probabilities based on alterations rates in control cytobands. One strategy towards better prioritizinglikely driver SCNVs would be to filter out deletions and amplifications that do not have a corresponding impact ongene-expression dosage in individual patients, similar to the approach use by Mina et al. [24].

We observed retrospectively that some rules identified by CRSO contained multiple GISTIC peaks on the samechromosome, with high correlation across patients are likely to be artifacts. One strategy for avoiding this unwantedbias would be to disallow rules involving multiple co-directional copy-number events on the same chromosome, or toimplement a distance threshold. Fortunately, only a small fraction of core rules and con-GCRS were in this category,and only no rules that we highlighted in the manuscript belonged in this category.

3.3.4 Refinement of cohorts may improve accuracy

In the present analysis we applied CRSO to the full TCGA cohorts for 19 cancer types. It may be of interest to applyCRSO to previously established subtypes to better understand cancer causation within minor subtypes. For example,the melanoma analysis only revealed 1 rule that was wild-type for both BRAF and NRAS, and many samples in thissubtype were not assigned to any rule.To better understand these samples, users may run CRSO only on the subtypeof double-wild-type melanomas. We recommend running dNdScv and GISTIC2 on the smaller cohort to identifysubtype-specific candidate drivers. Another interesting example would be to run CRSO separately on the establishedpam50 subtypes in breast cancer [90]. This could identify driver combination specific to basal-like breast cancers,which have limited treatment options and poor prognosis [91]. It would also be interesting to run CRSO separately onmicrosatellite stable/unstable subtypes of colorectal cancers.

3.4 Comparison of CRSO with evolutionary progression models

Computational advances in recent years have enabled inference of the evolutionary histories of individual cancers[92, 93, 94]. In 2017 Jamal-Hanjani et al. showed that it was possible to track the evolutionary histories of non-smallcell lung cancers by comparing the exome sequences from multiple regions of individual tumors [95]. The authors wereable to distinguish early clonal drivers from late subclonal drivers. A recently published study by Gerstung et al. usedthe ratio of duplicated to non-duplicated mutations within amplified regions to reconstruct the timing of driver events in2,658 cancers [96]. Their analysis found that the earliest driver events overlapped highly with the most recurrent. Theability to reproduce the timing of mutations represents a major advance in our understanding of cancer evolution andpresents opportunities for early cancer detection and customizing treatments.

CRSO differs from these methods by introducing a dichotomization of driver events as essential or non-essential. Thisdichotomization allows for stratification of cancers into a small number of subgroups, each defined by its own cancerrule. Compared to methods that infer patient-specific progression models, CRSO sacrifices comprehensiveness forsimplicity and testability. The CRSO results represent a small collection of testable hypothesis about driver eventcooperations that recur in significant subsets of tumors.

19

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 20: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

There is a large gap between the current state of clinical oncology decision making and the computational advances thatallow for reconstruction of evolutionary trajectories. It is unlikely these approaches will translate into novel precisiontreatments in the short term. CRSO can move the needle forward by transitioning from clinical stratification of patientsbased on single events towards stratifying patients based on cancer rules. Understanding the biological cooperation of asmall number of essential drivers can inform novel treatment directions. Information about evolutionary trajectories ofindividual tumors may improve the accuracy of CRSO predictions by filtering out alterations with strong evidence ofbeing sub-clonal.

3.5 Comparison of CRSO with weighted set covering approaches

Dash et al. [97] proposed a method for identifying two-hit combinations that are likely responsible for carcinogenesis.The authors represented the problem as an instance of the weighted set cover problem [98] and showed that the identifiedset of combinations could discriminate normal and cancer samples with high accuracy. The combinations in Dash etal. are similar to the rules in CRSO and the set of combinations identified as the solution to the weighted set coveralgorithm is similar to the optimal rule set identified by CRSO. However, there is a major difference in the optimizationcriteria in each of the methods. In [97] sets of combinations are prioritized based on how well they can discriminatenormal and tumor samples in a training set. This choice of criteria assumes that combinations that occur only in tumorsare likely to be cooperating drivers. There are many differences in the mutational landscapes of tumors and normaltissue and it is possible that many combinations involving passenger mutations are also likely to be observed almostexclusively in tumors. An updated version extending the approach to 3-hit and some 4-hit combinations was recentlypublished by Al Hajri et al. [36]. The updated version implements customized GPU hardware to dramatically reducecomputation time. The performance of the multi-hit combinations was comparable to the 2-hit combinations, using thecriteria of distinguishing normal from tumor. The authors considered all genes containing any coding mutation, and didnot annotate variants into functional categories (e.g., they did not distinguish hotspots from other missense mutations).The authors report that the 2-hit version of the algorithm only identified 9 confirmed cancer genes (according toCOSMIC) across 197 combinations, missing well-known drivers such as CDKN2A, BRAF, PIK3CA, EGFR, ARID1A,CTTNB1 and many others. Specific genes were not discussed in the multi-hit version [36]. In contrast to weighted setcovering approaches for identifying multi-combinations that are unique to cancer, CRSO reproduces well-known drivercombinations and nominates novel combinations with evidence of being biologically relevant to carcinogenesis.

4 Conclusion

It has been almost two decades since the advent of next generation sequencing. Projects such as TCGA haveproduced high quality datasets containing extensive molecular characterization of tens of thousands of cancer patients.Computational tools have been developed to extract information from these data that has greatly advanced ourunderstanding of the molecular underpinnings of cancer evolution. Despite all of this progress, most targeted treatmentsare prescribed to target a single genetic alteration. It is still not possible to explain much of the heterogeneity in patientresponses to different treatments. We developed CRSO as an approach to infer essential driver combinations thatcooccur in many patients. CRSO is highly flexible and easy for researchers to use. The results represent testablehypotheses that are easy to interpret. We hope that CRSO will prove helpful in identifying biologically meaningful andclinically actionable combinations of driver alterations.

From a broader perspective, we envision that wide-spread adoption of CRSO by domain experts will propel a shiftin the precision oncology community from single-gene thinking towards multi-gene thinking, and that patients willbe classified and treated according to driver combinations. For example, we suggest that development of a cancercombination census, analogous to the Sanger Institute Cancer Gene Census [60], would accelerate therapeutic advancesby nominating novel therapeutic strategies and refining clinical trial recruitment based on driver combinations. Thiscensus would be comprehensive, and include tiers based on the strength of evidence of specific combinations inspecific cancer types. For example, one tier would consist experimentally validated driver combinations, such asBRAF+PTEN in melanomas [28, 34, 29] and ATRX+IDH1+TP53 in gliomas [61, 62, 63], and lower tiers would consistof computationally predicted combinations nominated by methods such as CRSO and SELECT, with varying amount ofliterature support.

20

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 21: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

5 Methods

5.1 Representation of TCGA inputs

TCGA data from 19 cancer types (Table 1) were obtained from the January 28, 2016 GDAC Firehose [99]. Candidatedriver mutations were defined to be the set of significantly mutated genes (SMGs) identified by dNdScv as significantlymutated using the threshold qsuball < 0.1—using tissue-specific mutational covariates developed within Cannataro etal. [89]. Candidate copy number variations were defined to be the set of genomic regions identified by GISTIC2 asamplified or deleted, using the threshold q-residual < 0.25.

The inputs into CRSO are two event-by-sample matrices: D and P. D is a binary alteration matrix, such that Dij = 1if event i occurs in sample j, and 0 otherwise. P is a continuous valued penalty matrix, where Pij is the negative log ofthe probability of event i occurring in sample j by chance, i.e., as a passenger event. In order to calculate passengerprobabilities, TCGA mutations and copy number variations were first represented as a categorical event-by-samplematrix, M. Mij can take one of several values called observation types. The possible observation types for event idiffer according to the event type of event i. Three primary event types were considered: mutations, amplifications anddeletions. Mutations were represented at the gene level, whereas both copy number types were represented at the regionlevel as defined by the GISTIC2 narrow peaks. Mutation events take values in the set {Z,HS,L, S, I}, correspondingto wild-type, hotspot mutation, loss mutation, splice site mutation or in-frame indel. Amplification events take values in{Z,WA,SA}, corresponding to wild-type, weak amplification and strong amplification. Similarly, deletion events takevalues in {Z,WD,SD}, corresponding to wild-type, weak deletion (hemizygous) and strong deletion (homozygous).Two additional event types, referred to as hybrid mutDels and mutAmps, were defined to represent genes that wereidentified by both dNdScv and GISTIC2 in the same tumor type. When an amplification peak contained an SMG, theamplification and mutation events were consolidated into a single mutAmp event, taking values in the set of mutationalobservation types, amplification observation types as well as combined observation types for when both an amplificationand mutation of the gene were observed in the same sample. MutDels were defined analogously (Section S3).

The entries of D only depend on whether an observation is wild-type or not. That is, Dij equals 0 if Mij is wild-type,and equals 1 otherwise. By contrast, the entry Pij is highly sensitive to specific observation type of event i that isobserved in sample j. For example, suppose TP53 is identified as an SMG within a cancer population. Some tumorsmay contain one of many nonsense point mutations within TP53, whereas other tumors may contain a highly recurrentmissense mutation, or a splice site mutation that produces an alternative isoform of the TP53 protein. Although allof these alterations are TP53 mutations, they occur with very different passenger probabilities, sometimes spanningmultiple orders of magnitude. The dual representation defined by D and P reflects the assumption that different types ofalterations within the same event are functionally similar but probabilistically distinct. This representation also allowsCRSO to account for sample-specific differences in passenger probabilities , which can sometimes be very substantial.Wild-type events are defined to have passenger probability of 1, so that if Mij = Z then Pij = 0. Figure 1A shows anexample of the input matrices using a miniature dataset that was extracted from TCGA melanoma (SKCM) data for thepurposes of illustration.

Exact definitions of observation types and calculation of passenger probabilities are provided in the SupplementalInformation, Sections S1 for mutations, S2 for CNVs and S3 for hybrid events.

5.2 CRSO Algorithm

CRSO is an optimization procedure over the space of possible rule sets. The ability of a rule set to account for thedistribution of events in the population is quantified by an objective function. Consider a rule set RS = (r1, ..., rk),and suppose each sample has been assigned to one rule in RS, or to the null rule. The null rule is implicitly includedin every rule set as an assignment placeholder for samples that do not satisfy any of the rules in RS. When a sampleis assigned to a rule, the events that comprise the rule are considered to be drivers within that sample, and all of theremaining events in that sample are considered passengers. The statistical penalty of RS is the sum of the penalties ofall of the events designated as passengers under RS. A penalty matrix, PRS, is derived by modifying the full penaltymatrix, P, such that the penalties of all assigned events are changed to 0. For example, suppose sample j is assigned toa rule containing events x and y. This is represented in PRS by assigning PRS

x,j and PRSy,j to be 0, instead of the original

values they took in P.

The objective function score for RS, J(RS), is defined to be the reduction in total statistical penalty under RScompared to the null rule set: J(RS) =

∑Pi,j −

∑PRSi,j . In cases when a sample satisfies multiple rules in RS, the

sample is always assigned to the rule that maximizes J(RS), i.e., the rule with largest cumulative penalty in the sample.

21

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 22: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

5.2.1 Identification of best RS of size K

Given D, a starting rule library is built by identifying all rules that contain at least 2 events and occur in a minimumpercentage of samples. Except where otherwise indicated, a minimum rule coverage threshold was chosen to be thelarger of 3% of the population, or the minimum threshold that at most 2000 rules satisfy. CRSO uses a four phaseprocedure (Fig. 1C) to find the best scoring ruleset of fixed size, K, for K ∈ {1 . . . 40}. Since there will generally bea large number of rules in the starting rule library it is impossible to exhaustively evaluate every possible rule set, evenfor small K. The computational time of exhaustive evaluation grows exponentially with the size of the rule pool, i.e.,O(2n) for n rules. To address this computational limitation, phase 1 is uses a random sampling procedure to prioritizerules according to how likely they are to be among the best rule sets. In phase 2, a subset of the top-performing rulesdetermined from phase 1 are exhaustively evaluated to identify the best rule set of size K, for K ∈ {1 . . . 10}. Thenumber of rules that are exhaustively evaluated is subject to a computational constraint on the number of total rule-setsthat are considered for each K. In general a larger number of rules can be exhaustively evaluated only for very smallK. The number of rules that can be exhaustively considered is generally small for k > 4, meaning that only the top20-30 rules have the opportunity to be among the best rule sets. Since this subset may be too restrictive, the algorithm isexposed to additional rules in phase 3 by considering many rulesets that include a small number (e.g., 1-3) of unexploredrules mixed with rules from the current best rule set. In phase 4, best rulesets of sizes 11-40 are determined by samplingmany rule sets of size K +1 that overlap highly with the best scoring rule set of size K. The full description of the fourphase procedure is presented in Section S4.

5.2.2 Determination of core RS and generalized core rules

In general larger rule sets perform better than smaller rule sets because the samples in D have more assignmentopportunities. On the other hand larger rule sets may be more likely to reflect noise in the dataset rather than truebiological signal, i.e., over-fitting. Fixing the size of the rule set enforces competition between rules based on thenumber of events covered per sample, the passenger rate of events covered and the number of samples covered. Tomitigate the impact of over-fitting, CRSO requires that every rule in a rule set is assigned to a minimum number ofsamples, denoted as the msa parameter, for minimum samples assigned. The default msa parameter is 3% of thepopulation. Rule sets that do not satisfy the msa threshold are automatically assigned an objective function score andcoverage of 0, i.e., they are discarded.

The four phase procedure results in a list of best rule sets of size K, for K ∈ {1 . . .Kmax}, where Kmax is the largestK for which a valid rule set exists and satisfies the msa threshold. Typically Kmax is much lower than 40, as large rulesets are constrained by the msa requirement as well as the requirement that rule sets do not contain family members,i.e., are valid. The goal of CRSO is to find the rule set that achieves the best balance of objective function score, samplecoverage and rule set size, which is called the core rule set. The core rule set is defined to be the smallest of the bestrule sets of size K that achieves 90% of the maximum coverage and performance. Using an arbitrary threshold is aheuristic for automatically choosing a core rule set. In some cases, closer inspection of the results and the convergencecurves of performance and coverage can help identify a better choice if a clear plateau is observed. To evaluate thestability of the core rule set and automatically identify a more robust set of rules, 100 iterations of a subsamplingprocedure are used to identify generalized core rules (GCRs).

In each iteration, between 67-85% of the samples are randomly chosen, and a core rule set is determined by evaluatingthe top 100 phase 4 rule sets for each K. The msa is scaled according to the subset size, and the core coverage andperformance thresholds are randomly sampled uniformly between 85-99%. The full set of GCRs consists of all rulesthat appear in any of the subsampled cores. After performing 100 iterations, a confidence score is determined for eachGCR to be the frequency of inclusion in the sub-core iterations. The subset of GCRs that have confidence levels above50 by definition cannot contain family members, and are a valid rule set. We refer to this rule set as the consensusGCR (con-GCR). The consensus GCR represents a robust estimate of the highest confidence rules for a given datasetthat are most likely to reflect true biological cooperation.

In some cases there can be a lack of high confidence GCRs, typically occurring when the core rule set contains manyrules with many events. In such a situation, there may be sets of 2 events (duos) or 3 events (trios) that recur often withinthe sub-cores of each iteration. We calculate generalized core duos (GCDs) and generalized core trios (GCTs), tobetter identify recurrent pairwise and three-way synergies. Generalized core events (GCEs) are also calculated andrepresent the frequency of individual events being part of sub-sampled core rule sets.

5.3 Phase 1 performance analysis and parameter optimization

We evaluated the performance of phase 1 on the simulations in order to assess whether alternative parameter choiceswould improve CRSO’s overall performance. Phase 1 prioritizes rules based on their contribution to random rule sets,

22

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 23: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

and is critical for reducing the search space of possible rule sets, which would otherwise be computationally intractable.This is an inherently heuristic strategy whose performance is mathematically difficult to evaluate without knowledgeof the ground truth rule set. A phase 1 score was defined for each of the 190 simulations to quantify the similarity ofthe phase 1 ranking to the maximum possible ranking, corresponding to all true rules in the top ntr positions (SectionS6.2.1). The mean phase 1 score was more than 0.94 for all ground truth rule set sizes, indicating that phase 1 rankingssuccessfully prioritized the ground truth rules (Table S4). Phase 1 rankings were far superior at prioritizing ground truthrules compared to coverage or SJ rankings for all ntr. In 186 out of 190 (0.98) total simulations, 100% of ground truthrules were identified within the top 40 rules post phase 1. By comparison, this was the case for 30 simulations (16%)using coverage rankings and for 19 simulations (10%) using single rule performance (SJ) rankings. On average, 99.8%of true rules were identified among the top 40 phase 1 rules, compare to 64% for coverage rankings and 58% for SJranking (see Table S4 for breakdown by ntr).

The simulation results suggest that the phase 1 procedure is very effective at prioritizing ground truth rules using thedefault parameters of p1.ntpr = 20 and p1.cs = 0.25. In order to understand how the parameter choices impact thephase 1 performance, we performed phase 1 over a grid of parameter values for each of the 190 simulations. The resultsshowed that phase 1 performance was robust to parameter choices, and that optimal performance can be expected aslong as p1.ntpr ≥ 10 and p1.cs ≤ 0.5 (Section S6.2.2).

5.4 Robustness of CRSO to missing features

The rules that can be identified by CRSO depend on the collection of events included in the inputs. In order to test therobustness of CRSO to the exclusion of important features, CRSO was applied to TCGA melanoma data excludinga single event, for each of the top 15 events. The results were compared to the results from the full dataset. Overallthe results suggest that if a driver event is excluded from the inputs, the duos/rules that do not contain the event willgenerally be comprehensively captured in the incomplete dataset. The rate of false associations caused by the eventexclusion is generally small for most events, but can be large if the excluded event is frequent and part of manyduos/rules in the full dataset (Supplemental S7, Table S5).

5.5 Clinical outcome analysis

Consider multi-rule event E that appears in rule R. The following test was performed to determine if there is statisticalevidence of differential patient prognosis associated with R. The subset of patients that contain E are considered,and those that are wild-type for E are disregarded. Patients within this cohort were assigned to one of two classes:those that satisfy R are assigned to class “rule", and those that harbor E but do not satisfy R are assigned to class“event". Univariate cox-ph analysis was performed to test for differences in progression-free intervals (PFI) between thetwo classes. The result of each test is summarized with a Z score, as recommended in [100]. Compared to using Pvalues for ascertaining statistical significance, Z scores provide additional information about the direction of the PFIdifferences. Positive Z scores indicate better PFI for patients in the rule class compared to patients in the event class,and negative Z scores indicate the opposite. PFI data were obtained from a recently published resource for TCGAoutcome analysis [101], and the use of PFI as primary endpoint is consistent with the authors’ recommendations forbest practices.

For each cancer type, each multi-rule event, E, was tested against all of the eligible GCRs that include E. The reasonsfor testing E against each rule separately, instead of directly comparing all of the rules containing E, are two-fold. First,some multi-rule events occur in many GCRs making it difficult to interpret the results. Second, comparing multiplerules at once introduces a problem of dealing with patients that satisfy multiple rules.

A total of 289 tests were performed across the 19 cancer types, and 21 (7.3%) significant associations (|Z| ≥ 1.96)were detected (Table 4). Only those pairings for which both the event and rule classes contained at least 10 patientswere considered. It is tempting to argue that the expected percentage of significant associates should be 5% and that weare finding more associations than expected by chance. However, the tests should not be assumed to be independent,and conventional multiple hypothesis corrections may not be appropriate as a consequence. Instead, a permutation testwas performed in order to address the multiple hypotheses associated with each multi-event rule. For each multi-ruleevent, the PFIs of the samples that contain the event were scrambled 1000 times, and the smallest P value attained byany of the rule vs. event comparisons was stored for each iteration. An adjusted P value, PAdj , (Table 4), was definedto be the fraction of iterations that have permuted P value smaller than the real data P value.

5.6 Criteria for common-event comparison with SELECT

For each cancer type a set of common mutations was identified as the intersection of the SELECT pan cancer mutationswith the cancer specific mutation events identified by dNdScv that were used in the CRSO analysis. CDKN2A

23

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 24: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

mutations were excluded from the common set of mutations in cancers for which CDKN2A was identified as hybridmutation/deletion in the CRSO analysis. In these cancers, large coverage discrepancies were observed in duos involvingCDKN2A because SELECT encoded CDKN2A mutations as separate events from CDKN2A copy number loss. Commonevent duos for each cancer were defined to be the union of GCDs identified by CRSO and the co-occurrent pairsidentified by SELECT for which both events were in the common set of mutations.

5.7 Software Details

CRSO is written in R utilizing “ggplot2”, “survival", “survminer”, “foreach” and “doMPI” packages [102, 103, 104,105, 106]. We used cancereffectsizeR [89] to calculate amino acid level mutational rates at individual amino acidpositions (see Section S1). The CRSO R code is available at https://github.com/mikekleinsgit/CRSO/.

24

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 25: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

6 Frequently Used Acronyms

• CRSO Cancer Rule Set Optimization

• TCGA The Cancer Genome Atlas

• SCNV Somatic Copy Number Variation

• SMG Significantly Mutated Gene

• dNdScv Name of method for identifying significantly mutated genes

• GISTIC2 Name of method for identifying significant copy number variations

• GCR Generalized Core Rule

• GCT Generalized Core Trio

• GCD Generalized Core Duo

• GCE Generalized Core Event

• GC Generalized Core (sometimes used in other contexts such as GC iterations)

• con-GCR Consensus Generalized Core Rule, i.e., GCRs with confidence > 50%

• RS Rule set

• MSA Minimum samples assigned. All rules in a rule set are required to be assigned to at least msa samples.

• NTR Number of true rules. Used to describe ground truth rule set size of simulation.

• PFI Progression-Free Interval

TCGA cancer type acronyms referenced in the manuscript:

• SKCM Skin cutaneous melanoma

• LIHC Liver hepatocellular carcinoma

• COAD Colon adenocarcinoma

• READ Rectum adenocarcinoma

• LGG Low grade glioma

• HNSC Head and neck squamous cell carcinoma

7 Funding

This work was supported by National Institutes of Health [grant numbers P30CA016359 to HZ, P50CA1965305 to HZ].

8 Conflict of Interest

The authors declare that they have no conflict of interest.

9 CRSO Availability

The CRSO R package is freely available for download at https://github.com/mikekleinsgit/CRSO/.

10 Acknowledgements

The results published here are based on data generated by the TCGA Research Network: http://cancergenome.nih.gov/[107]. The authors thank Dr. Michael Kane for his help with developing the CRSO software package. The authorsthank Dr. Christos Hatzis for his counsel.

25

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 26: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

11 Author Contributions

MK, DS and HZ conceived of the project. MK programmed the CRSO software and drafted the manuscript. VC andJT provided SNV mutation rates and dNdScv values that were used to calculate TCGA passenger probabilities. SNreviewed the TCGA results and identified interesting and potentially novel combinations via literature exploration. JT,SN, DS and HZ edited the manuscript. All authors read and approved the final manuscript.

References

[1] Cristian Tomasetti, Luigi Marchionni, Martin A Nowak, Giovanni Parmigiani, and Bert Vogelstein. Only threedriver gene mutations are required for the development of lung and colorectal cancers. Proceedings of theNational Academy of Sciences of the United States of America, 112(1):118–23, jan 2015.

[2] Ramu Anandakrishnan, Robin T. Varghese, Nicholas A. Kinney, and Harold R. Garner. Estimating the numberof genetic mutations (hits) required for carcinogenesis based on the distribution of somatic mutations. PLOSComputational Biology, 15(3):e1006881, mar 2019.

[3] Bert Vogelstein, Nickolas Papadopoulos, Victor E Velculescu, Shibin Zhou, Luis A Diaz, and Kenneth W Kinzler.Cancer genome landscapes. Science (New York, N.Y.), 339(6127):1546–58, mar 2013.

[4] Michael S Lawrence, Petar Stojanov, Paz Polak, Gregory V Kryukov, Kristian Cibulskis, Andrey Sivachenko,Scott L Carter, Chip Stewart, Craig H Mermel, Steven A Roberts, Adam Kiezun, Peter S Hammerman, AaronMcKenna, Yotam Drier, Lihua Zou, Alex H Ramos, Trevor J Pugh, Nicolas Stransky, Elena Helman, Jaegil Kim,Carrie Sougnez, Lauren Ambrogio, Elizabeth Nickerson, Erica Shefler, Maria L Cortés, Daniel Auclair, GordonSaksena, Douglas Voet, Michael Noble, Daniel DiCara, Pei Lin, Lee Lichtenstein, David I Heiman, TimothyFennell, Marcin Imielinski, Bryan Hernandez, Eran Hodis, Sylvan Baca, Austin M Dulak, Jens Lohr, Dan-AviLandau, Catherine J Wu, Jorge Melendez-Zajgla, Alfredo Hidalgo-Miranda, Amnon Koren, Steven A McCarroll,Jaume Mora, Brian Crompton, Robert Onofrio, Melissa Parkin, Wendy Winckler, Kristin Ardlie, Stacey BGabriel, Charles W M Roberts, Jaclyn A Biegel, Kimberly Stegmaier, Adam J Bass, Levi A Garraway, MatthewMeyerson, Todd R Golub, Dmitry A Gordenin, Shamil Sunyaev, Eric S Lander, and Gad Getz. Mutationalheterogeneity in cancer and the search for new cancer-associated genes. Nature, 499(7457):214–218, jul 2013.

[5] Iñigo Martincorena, Keiran M Raine, Moritz Gerstung, Kevin J Dawson, Kerstin Haase, Peter Van Loo, HelenDavies, Michael R Stratton, and Peter J Campbell. Universal Patterns of Selection in Cancer and Somatic Tissues.Cell, 171(5):1029–1041.e21, nov 2017.

[6] Craig H Mermel, Steven E Schumacher, Barbara Hill, Matthew L Meyerson, Rameen Beroukhim, and Gad Getz.GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alterationin human cancers. Genome Biology, 12(4):R41, 2011.

[7] Katja Schuster, Niranjan Venkateswaran, Andrea Rabellino, Luc Girard, Samuel Penã-Llopis, and Pier PaoloScaglioni. Nullifying the CDKN2AB locus promotes mutant K-ras lung tumorigenesis. Molecular CancerResearch, 12(6):912–923, 2014.

[8] S. Courtois-Cox, S. L. Jones, and K. Cichowski. Many roads lead to oncogene-induced senescence. NatureOncogene, 27(20):2801–2809, may 2008.

[9] Andrew J. Aguirre, Nabeel Bardeesy, Manisha Sinha, Lyle Lopez, David A. Tuveson, James Horner, Mark S.Redston, and Ronald A. DePinho. Activated Kras and Ink4a/Arf deficiency cooperate to produce metastaticpancreatic ductal adenocarcinoma. Genes and Development, 17(24):3112–3126, dec 2003.

[10] Christopher A Miller, Stephen H Settle, Erik P Sulman, Kenneth D Aldape, and Aleksandar Milosavljevic.Discovering functional modules by identifying recurrent and mutually exclusive mutational patterns in tumors.BMC Medical Genomics, 4(1):34, dec 2011.

[11] Mark Dm Leiserson, Hsin-Ta Wu, Fabio Vandin, and Benjamin J Raphael. CoMEt: a statistical approach toidentify combinations of mutually exclusive alterations in cancer. Genome Biology, 16:160, 2015.

[12] Mark D. M. Leiserson, Dima Blokh, Roded Sharan, and Benjamin J. Raphael. Simultaneous Identification ofMultiple Driver Pathways in Cancer. PLoS Computational Biology, 9(5):e1003054, may 2013.

[13] Jack P. Hou, Amin Emad, Gregory J. Puleo, Jian Ma, and Olgica Milenkovic. A new correlation clusteringmethod for cancer mutation analysis. Bioinformatics, 32(24):3717, 2016.

[14] Hao Wu, Lin Gao, Feng Li, Fei Song, Xiaofei Yang, and Nikola Kasabov. Identifying overlapping mutated driverpathways by constructing gene networks in cancer. BMC Bioinformatics, 16(Suppl 5):S3, 2015.

[15] Yahya Bokhari and Tomasz Arodz. QuaDMutEx: quadratic driver mutation explorer. 18:458, 2017.

26

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 27: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

[16] Bo Gao, Guojun Li, Juntao Liu, Yang Li, and Xiuzhen Huang. Identification of driver modules in pan-cancer viacoordinating coverage and exclusivity. Oncotarget, 8(22):36115–36126, may 2017.

[17] Peter Ulz, Ellen Heitzer, and Michael R. Speicher. Co-occurrence of MYC amplification and TP53 mutations inhuman cancer. Nature Genetics, 48(2):104–106, feb 2016.

[18] Yuan Wang, Fuquan Chen, Man Zhao, Zhe Yang, Jiong Li, Shuqin Zhang, Weiying Zhang, Lihong Ye, andXiaodong Zhang. The long noncoding RNA HULC promotes liver cancer by increasing the expression of theHMGA2 oncogene via sequestration of the microRNA-186. Journal of Biological Chemistry, 292(37):15395–15407, sep 2017.

[19] Youjin Kim, Boram Lee, Joon Ho Shim, Se Hoon Lee, Woong Yang Park, Yoon La Choi, Jong Mu Sun, Jin SeokAhn, Myung Ju Ahn, and Keunchil Park. Concurrent Genetic Alterations Predict the Progression to TargetTherapy in EGFR-Mutated Advanced NSCLC. Journal of Thoracic Oncology, 14(2):193–202, feb 2019.

[20] Paul A. VanderLaan, Deepa Rangachari, Susan M. Mockus, Vanessa Spotlow, Honey V. Reddi, Joan Malcolm,Mark S. Huberman, Loren J. Joseph, Susumu S. Kobayashi, and Daniel B. Costa. Mutations in TP53, PIK3CA,PTEN and other genes in EGFR mutated lung cancers: Correlation with clinical outcomes. Lung Cancer,106:17–21, apr 2017.

[21] Miriam Juárez, Cecilia Egoavil, María Rodríguez-Soler, Eva Hernández-Illán, Carla Guarinos, Araceli García-Martínez, Cristina Alenda, Mar Giner-Calabuig, Oscar Murcia, Carolina Mangas, Artemio Payá, José R. Aparicio,Francisco A. Ruiz, Juan Martínez, Juan A. Casellas, José L. Soto, Pedro Zapater, and Rodrigo Jover. KRAS andBRAF somatic mutations in colonic polyps and the risk of metachronous neoplasia. PLoS ONE, 12(9), sep 2017.

[22] Pamela M. Pollock, Ursula L. Harper, Katherine S. Hansen, Laura M. Yudt, Mitchell Stark, Christiane M.Robbins, Tracy Y. Moses, Galen Hostetter, Urs Wagner, John Kakareka, Ghadi Salem, Tom Pohida, PeterHeenan, Paul Duray, Olli Kallioniemi, Nicholas K. Hayward, Jeffrey M. Trent, and Paul S. Meltzer. Highfrequency of BRAF mutations in nevi. Nature Genetics, 33(1):19–20, jan 2003.

[23] Sander Canisius, John W. M. Martens, and Lodewyk F. A. Wessels. A novel independence test for somaticalterations in cancer shows that biology drives mutual exclusivity but chance explains most co-occurrence.Genome Biology, 17(1):261, dec 2016.

[24] Marco Mina, Franck Raynaud, Daniele Tavernari, Elena Battistello, Stephanie Sungalee, Sadegh Saghafinia,Titouan Laessle, Francisco Sanchez-Vega, Nikolaus Schultz, Elisa Oricchio, and Giovanni Ciriello. ConditionalSelection of Genomic Alterations Dictates Cancer Evolution and Oncogenic Dependencies. Cancer Cell,32(2):155–168.e6, aug 2017.

[25] Junhua Zhang, Ling-Yun Wu, Xiang-Sun Zhang, and Shihua Zhang. Discovery of co-occurring driver pathwaysin cancer. BMC Bioinformatics, 15(1):271, dec 2014.

[26] John N. Cancer Genome Atlas Research Network, John N Weinstein, Eric A Collisson, Gordon B Mills, KennaR Mills Shaw, Brad A Ozenberger, Kyle Ellrott, Ilya Shmulevich, Chris Sander, and Joshua M Stuart. TheCancer Genome Atlas Pan-Cancer analysis project. Nature genetics, 45(10):1113–20, oct 2013.

[27] Katarzyna Tomczak, Patrycja Czerwinska, and Maciej Wiznerowicz. Review The Cancer Genome Atlas (TCGA):an immeasurable source of knowledge. Współczesna Onkologia, 1A(1A):68–77, 2015.

[28] David Dankort, David P. Curley, Robert A. Cartlidge, Betsy Nelson, Anthony N. Karnezis, William E. Damsky,Mingjian J. You, Ronald A. DePinho, Martin McMahon, and Marcus Bosenberg. BrafV600E cooperates withPten loss to induce metastatic melanoma. Nature Genetics, 41(5):544–552, may 2009.

[29] Merope Griffin, Daniele Scotto, Debra H. Josephs, Silvia Mele, Silvia Crescioli, Heather J. Bax, Giulia Pellizzari,Matthew D. Wynne, Mano Nakamura, Ricarda M. Hoffmann, Kristina M. Ilieva, Anthony Cheung, James F.Spicer, Sophie Papa, Katie E. Lacy, and Sophia N. Karagiannis. BRAF inhibitors: Resistance and the promise ofcombination treatments for melanoma. Oncotarget, 8(44):78174–78192, aug 2017.

[30] Sheng Wu Gen. The functional interactions between the p53 and MAPK signaling pathways. Cancer Biologyand Therapy, 3(2):156–161, feb 2004.

[31] Lorenzo Stramucci, Angelina Pranteda, and Gianluca Bossi. Insights of crosstalk between p53 protein and theMKK3/MKK6/p38 MAPK signaling pathway in cancer. Cancers, 10(5), may 2018.

[32] Genomic Classification of Cutaneous Melanoma. Cell, 161(7):1681–1696, jun 2015.[33] Emmanuelle Huillard, Rintaro Hashizume, Joanna J. Phillips, Amélie Griveau, Rebecca A. Ihrie, Yasuyuki

Aoki, Theodore Nicolaides, Arie Perry, Todd Waldman, Martin McMahon, William A. Weiss, Claudia Petritsch,C. David James, and David H. Rowitch. Cooperative interactions of BRAFV600E kinase and CDKN2A locusdeficiency in pediatric malignant astrocytoma as a basis for rational therapy. Proceedings of the NationalAcademy of Sciences of the United States of America, 109(22):8710–8715, may 2012.

27

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 28: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

[34] José Luís Manzano, Laura Layos, Cristina Bugés, María de los Llanos Gil, Laia Vila, Eva Martínez-Balibrea,and Anna Martínez-Cardús. Resistant mechanisms to BRAF inhibitors in melanoma. Annals of TranslationalMedicine, 4(12):6, 2016.

[35] Inna V. Fedorenko, Kim H.T. Paraiso, and Keiran S.M. Smalley. Acquired and intrinsic BRAF inhibitor resistancein BRAF V600E mutant melanoma, aug 2011.

[36] Qais Al Hajri, Sajal Dash, Wu chun Feng, Harold R. Garner, and Ramu Anandakrishnan. Identifying multi-hitcarcinogenic gene combinations: Scaling up a weighted set cover algorithm using compressed binary matrixrepresentation on a GPU. Scientific Reports, 10(1):1–18, dec 2020.

[37] Xin Yu, Heyi Zheng, Matthew T.V. Chan, and William Ka Kei Wu. HULC: an oncogenic long non-coding RNAin human cancer. Journal of Cellular and Molecular Medicine, 21(2):410–417, feb 2017.

[38] Christiane Klec, Tony Gutschner, Katrin Panzitt, and Martin Pichler. Involvement of long non-coding RNAHULC (highly up-regulated in liver cancer) in pathogenesis and implications for therapeutic intervention. ExpertOpinion on Therapeutic Targets, 23(3):177–186, mar 2019.

[39] Soudeh Ghafouri-Fard, Mohammadhosein Esmaeili, Mohammad Taheri, and Majid Samsami. Highly upregulatedin liver cancer (HULC): An update on its role in carcinogenesis. Journal of Cellular Physiology, may 2020.

[40] Katrin Panzitt, Marisa M.O. Tschernatsch, Christian Guelly, Tarek Moustafa, Martin Stradner, Heimo M.Strohmaier, Charles R. Buck, Helmut Denk, Renée Schroeder, Michael Trauner, and Kurt Zatloukal. Characteri-zation of HULC, a Novel Gene With Striking Up-Regulation in Hepatocellular Carcinoma, as Noncoding RNA.Gastroenterology, 132(1):330–342, 2007.

[41] Chen Wang, Xiaoxue Jiang, Xiaonan Li, Shuting Song, Qiuyu Meng, Liyan Wang, Yanan Lu, Xiaoru Xin, Hu Pu,Xin Gui, Tianming Li, and Dongdong Lu. Long noncoding RNA HULC accelerates the growth of human livercancer stem cells by upregulating CyclinD1 through miR675-PKM2 pathway via autophagy. Stem cell research& therapy, 11(1):8, jan 2020.

[42] Yong Li, Jing Jing Liu, Jia Hui Zhou, Rui Chen, and Chao Qun Cen. LncRNA HULC induces the progression ofosteosarcoma by regulating the miR-372-3p/HMGB1 signalling axis. Molecular Medicine, 26(1), mar 2020.

[43] Wenjun Lu, Xiaobin Wan, Limin Tao, and Junhui Wan. Long non-coding RNA HULC promotes cervical cancercell proliferation, migration and invasion via mir-218/TPD52 axis. OncoTargets and Therapy, 13:1109–1118,2020.

[44] Yifei Zhang, Xiaojing Song, Xixun Wang, Jinchen Hu, and Lixin Jiang. Silencing of LncRNA HULC EnhancesChemotherapy Induced Apoptosis in Human Gastric Cancer. Journal of Medical Biochemistry, 35(2):137–143,apr 2016.

[45] Shuo Chen, Dan Dan Wu, Xiu Bo Sang, Li Li Wang, Zhi Hong Zong, Kai Xuan Sun, Bo Liang Liu, andYang Zhao. The lncRNA HULC functions as an oncogene by targeting ATG7 and ITGB1 in epithelial ovariancarcinoma. Cell Death and Disease, 8(10):e3118, oct 2017.

[46] Yang Hua Fan, Miao Jing Wu, Yuan Jiang, Minhua Ye, Shi Gang Lu, Lei Wu, and Xin Gen Zhu. Longnon-coding RNA HULC as a potential prognostic biomarker in human cancers: A meta-analysis. Oncotarget,8(13):21410–21417, 2017.

[47] Wei Peng, Wei Gao, and Jifeng Feng. Long noncoding RNA HULC is a novel biomarker of poor prognosis inpatients with pancreatic cancer. Medical Oncology, 31(12):1–7, nov 2014.

[48] Chunjing Jin, Wei Shi, Feng Wang, Xianjuan Shen, Jing Qi, Hui Cong, Jie Yuan, Linying Shi, Bingying Zhu,Xi Luo, Yan Zhang, and Shaoqing Ju. Long non-coding RNA HULC as a novel serum biomarker for diagnosisand prognosis prediction of gastric cancer. Oncotarget, 7(32):51763–51772, aug 2016.

[49] Xian Chen, Limin Lun, Huabin Hou, Runhua Tian, Haiping Zhang, and Yunyuan Zhang. The value of lncRNAHULC as a prognostic factor for survival of cancer outcome: A meta-analysis. Cellular Physiology andBiochemistry, 41(4):1424–1434, may 2017.

[50] Yu Wang, Bin hui Xie, Wei hao Lin, Yong hui Huang, Jia yan Ni, Jie Hu, Wei Cui, Jun Zhou, Long Shen,Lin feng Xu, Fan Lian, and He ping Li. Amplification of SMYD3 promotes tumorigenicity and intrahepaticmetastasis of hepatocellular carcinoma via upregulation of CDK2 and MMP2. Oncogene, 38(25):4948–4961,jun 2019.

[51] Bing Shen, Mingyue Tan, Xinyu Mu, Yan Qin, Fang Zhang, Yong Liu, and Yu Fan. Upregulated SMYD3promotes bladder cancer progression by targeting BCLAF1 and activating autophagy. Tumor Biology, 37(6):7371–7381, jun 2016.

28

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 29: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

[52] Cheng Liu, Chang Wang, Kun Wang, Li Liu, Qi Shen, Keqiang Yan, Xiaoqing Sun, Jie Chen, Jikai Liu, HongboRen, Hainan Liu, Zhonghua Xu, Sanyuan Hu, Dawei Xu, and Yidong Fan. SMYD3 as an Oncogenic Driver inProstate cancer by Stimulation of Androgen receptor transcription. 2013.

[53] Cheng Liu, Li Liu, Kun Wang, Xiao Feng Li, Li Yuan Ge, Run Zhuo Ma, Yi Dong Fan, Lu Chao Li, Zheng FangLiu, Min Qiu, Yi Chang Hao, Zhen Feng Shi, Chuan You Xia, Klas Strååt, Yi Huang, Lu Lin Ma, and DaweiXu. VHL-HIF-2α axis-induced SMYD3 upregulation drives renal cell carcinoma progression via direct trans-activation of EGFR. Oncogene, 2020.

[54] Michalis E Sarris, Panagiotis Moulos, Anna Haroniti, and Antonis Giakountis. Smyd3 Is a TranscriptionalPotentiator of Multiple Cancer-Promoting Genes and Required for Liver and Colon Cancer DevelopmentSmyd3 invades active chromatin domains via association with H3K4Me 3 and RNA Pol-II Accession NumbersGSE67790. Cancer Cell, 29:354–366, 2016.

[55] Pawel K. Mazur, Nicolas Reynoird, Purvesh Khatri, Pascal W.T.C. Jansen, Alex W. Wilkinson, Shichong Liu,Olena Barbash, Glenn S. Van Aller, Michael Huddleston, Dashyant Dhanak, Peter J. Tummino, Ryan G. Kruger,Benjamin A. Garcia, Atul J. Butte, Michiel Vermeulen, Julien Sage, and Or Gozani. SMYD3 links lysinemethylation of MAP3K2 to Ras-driven cancer. Nature, 510(7504):283–287, may 2014.

[56] Jessie Villanueva, Jeffrey R. Infante, Clemens Krepler, Patricia Reyes-Uribe, Minu Samanta, Hsin Yi Chen,Bin Li, Rolf K. Swoboda, Melissa Wilson, Adina Vultur, Mizuho Fukunaba-Kalabis, Bradley Wubbenhorst,Thomas Y. Chen, Qin Liu, Katrin Sproesser, Douglas J. DeMarini, Tona M. Gilmer, Anne Marie Martin, RonenMarmorstein, David C. Schultz, David W. Speicher, Giorgos C. Karakousis, Wei Xu, Ravi K. Amaravadi, XiaoweiXu, Lynn M. Schuchter, Meenhard Herlyn, and Katherine L. Nathanson. Concurrent MEK2 Mutation and BRAFAmplification Confer Resistance to BRAF and MEK Inhibitors in Melanoma. Cell Reports, 4(6):1090–1099, sep2013.

[57] Ryan B. Corcoran, Dora Dias-Santagata, Kristin Bergethon, A. John Iafrate, Jeffrey Settleman, and Jeffrey A.Engelman. BRAF gene amplification can promote acquired resistance to MEK inhibitors in cancer cells harboringthe BRAF V600E mutation. Science Signaling, 3(149):ra84, nov 2010.

[58] Katherine L Nathanson, Anne-Marie Martin, Bradley Wubbenhorst, Joel Greshock, Richard Letrero, KurtD’andrea, Steven O’day, Jeffrey R Infante, Gerald S Falchook, Hendrik-Tobias Arkenau, Michael Millward,Michael P Brown, Anna Pavlick, Michael A Davies, Bo Ma, Robert Gagnon, Martin Curtis, Peter F Lebowitz,Richard Kefford, and Georgina V Long. Tumor genetic analyses of patients with metastatic melanoma treatedwith the BRAF inhibitor Dabrafenib (GSK2118436). Clin Cancer Res, 19(17):4868–4878, 2013.

[59] Camilla Stagni, Carolina Zamuner, Lisa Elefanti, Tiziana Zanin, Paola Del Bianco, Antonio Sommariva, AlessioFabozzi, Jacopo Pigozzo, Simone Mocellin, Maria Cristina Montesco, Vanna Chiarion-Sileni, Arcangela DeNicolo, and Chiara Menin. BRAF gene copy number and mutant allele frequency correlate with time toprogression in metastatic melanoma patients treated with MAPK inhibitors. Molecular Cancer Therapeutics,17(6):1332–1340, jun 2018.

[60] Zbyslaw Sondka, Sally Bamford, Charlotte G. Cole, Sari A. Ward, Ian Dunham, and Simon A. Forbes. TheCOSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nature ReviewsCancer, 18(11):696–705, nov 2018.

[61] The Cancer Genome Atlas Research Network. Comprehensive, Integrative Genomic Analysis of DiffuseLower-Grade Gliomas. New England Journal of Medicine, 372(26):2481–2498, jun 2015.

[62] Aram S. Modrek, Danielle Golub, Themasap Khan, Devin Bready, Jod Prado, Christopher Bowman, JingjingDeng, Guoan Zhang, Pedro P. Rocha, Ramya Raviram, Charalampos Lazaris, James M. Stafford, Gary LeRoy,Michael Kader, Joravar Dhaliwal, N. Sumru Bayin, Joshua D. Frenster, Jonathan Serrano, Luis Chiriboga, RabaaBaitalmal, Gouri Nanjangud, Andrew S. Chi, John G. Golfinos, Jing Wang, Matthias A. Karajannis, Richard A.Bonneau, Danny Reinberg, Aristotelis Tsirigos, David Zagzag, Matija Snuderl, Jane A. Skok, Thomas A.Neubert, and Dimitris G. Placantonakis. Low-Grade Astrocytoma Mutations in IDH1, P53, and ATRX Cooperateto Block Differentiation of Human Neural Stem Cells via Repression of SOX2. Cell Reports, 21(5):1267–1280,oct 2017.

[63] Yuchen Jiao, Patrick J Killela, Zachary J Reitman, B Ahmed Rasheed, Christopher M Heaphy, Roeland FDe Wilde, Fausto J Rodriguez, Sergio Rosemberg, Sueli Mieko Oba-Shinjo, Suely Kazue, Nagahashi Marie,Chetan Bettegowda, Nishant Agrawal, Eric Lipp, Christopher J Pirozzi, Giselle Y Lopez, Yiping He, Henry SFriedman, Allan H Friedman, Gregory J Riggins, Matthias Holdhoff, Peter Burger, Roger E Mclendon, Darell DBigner, Bert Vogelstein, Alan K Meeker, Kenneth W Kinzler, Nickolas Papadopoulos, Luis A Diaz, and Hai Yan.Frequent ATRX, CIC, FUBP1 and IDH1 mutations refine the classification of malignant gliomas. Oncotarget,3(7):710–722, 2012.

29

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 30: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

[64] Eric R. Fearon and Bert Vogelstein. A genetic model for colorectal tumorigenesis. Cell, 61(5):759–767, jun1990.

[65] Gillian Smith, Francis A. Carey, Julie Beattie, Murray J.V. Wilkie, Tracy J. Lightfoot, Jonathan Coxhead, R. ColinGarner, Robert J.C. Steele, and C. Roland Wolf. Mutations in APC, Kirsten-ras, and p53 - Alternative geneticpathways to colorectal cancer. Proceedings of the National Academy of Sciences of the United States of America,99(14):9433–9438, jul 2002.

[66] Kathleen R. Cho and Bert Vogelstein. Genetic alterations in the adenoma–carcinoma sequence. Cancer, 70(4S):1727–1731, 1992.

[67] Montserrat Rojo de la Vega, Eli Chapman, and Donna D. Zhang. NRF2 and the Hallmarks of Cancer. CancerCell, 34(1):21–43, jul 2018.

[68] Donna D. Zhang. Mechanistic studies of the Nrf2-Keap1 signaling pathway. Drug Metabolism Reviews,38(4):769–789, jul 2006.

[69] Wenge Li and Ah Ng Kong. Molecular mechanisms of Nrf2-mediated antioxidant response. MolecularCarcinogenesis, 48(2):91–104, feb 2009.

[70] Leonard D. Goldstein, James Lee, Florian Gnad, Christiaan Klijn, Annalisa Schaub, Jens Reeder, AnneleenDaemen, Corey E. Bakalarski, Thomas Holcomb, David S. Shames, Ryan J. Hartmaier, Juliann Chmielecki,Somasekar Seshagiri, Robert Gentleman, and David Stokoe. Recurrent Loss of NFE2L2 Exon 2 Is a Mechanismfor Nrf2 Pathway Activation in Human Cancers. Cell Reports, 16(10):2605–2617, sep 2016.

[71] Xian Xu, Yang Yang, Xiaoyan Liu, Na Cao, Peng Zhang, Songhui Zhao, Donglin Chen, Li Li, Yong He, XiaoweiDong, Kai Wang, Hanqing Lin, Naiquan Mao, and Lingxiang Liu. NFE2L2 / KEAP1 Mutations Correlate withHigher Tumor Mutational Burden Value/ PD-L1 Expression and Potentiate Improved Clinical Outcome withImmunotherapy . The Oncologist, apr 2020.

[72] Rieke Frank, Matthias Scheffler, Sabine Merkelbach-Bruse, Michaela A. Ihle, Anna Kron, Michael Rauer, FrankUeckeroth, Katharina Konig, Sebastian Michels, Rieke Fischer, Anna Eisert, Jana Fassunke, Carina Heydt,Monika Serke, Yon Dschun Ko, Ulrich Gerigk, Thomas Geist, Britta Kaminsky, Lukas C. Heukamp, MathieuClement-Ziza, Reinhard Buttner, and Jurgen Wolf. Clinical and pathological characteristics of KEAP1- andNFE2L2-mutated Non–Small Cell Lung Carcinoma (NSCLC). Clinical Cancer Research, 24(13):3087–3096,jul 2018.

[73] Zongang Liu, Meiyan Deng, Lin Wu, and Suning Zhang. An integrative investigation on significant mutationsand their down-stream pathways in lung squamous cell carcinoma reveals CUL3/KEAP1/NRF2 relevant subtypes.Molecular Medicine, 26(1), may 2020.

[74] Akhileshwar Namani, Md Matiur Rahaman, Ming Chen, and Xiuwen Tang. Gene-expression signature regulatedby the KEAP1-NRF2-CUL3 axis is associated with a poor prognosis in head and neck squamous cell cancer.BMC cancer, 18(1):46, jan 2018.

[75] NCT03872427 ClinicalTrials.gov. Testing Whether Cancers With Specific Mutations Respond Better to Glutami-nase Inhibitor, CB-839 HCl, Anti-Cancer Treatment, BeGIN Study.

[76] NCT04267913 ClinicalTrials.gov. Testing of TAK228 (MLN0128, Sapanisertib) Plus Docetaxel to the UsualStandard of Care for Advanced Squamous Cell Lung Cancer (A Lung-MAP Treatment Trial).

[77] NCT02417701 ClinicalTrials.gov. Sapanisertib in Treating Patients With Stage IV or Recurrent Lung Cancer.

[78] The Cancer Genome Atlas Research Network. Comprehensive genomic characterization of head and necksquamous cell carcinomas. Nature, 517(7536):576–582, jan 2015.

[79] Shunsuke Nojiri and Takashi Joh. Albumin suppresses human hepatocellular carcinoma proliferation and the cellcycle. International Journal of Molecular Sciences, 15(3):5163–5174, mar 2014.

[80] The Cancer Genome Atlas Research Network. Comprehensive and Integrative Genomic Characterization ofHepatocellular Carcinoma. Cell, 169(7):1327–1341.e23, jun 2017.

[81] He Shen, Maxime Blijlevens, Nuo Yang, Costakis Frangou, Kayla E. Wilson, Bo Xu, Yinglong Zhang, LiruiZhang, Carl D. Morrison, Lori Shepherd, Qiang Hu, Qianqian Zhu, Jianmin Wang, Song Liu, and Jianmin Zhang.Sox4 expression confers bladder cancer stem cell properties and predicts for poor patient outcome. InternationalJournal of Biological Sciences, 11(12):1363–1375, nov 2015.

[82] Josue D. Moran, Hannah H. Kim, Zhenghong Li, and Carlos S. Moreno. SOX4 regulates invasion of bladdercancer cells via repression of WNT5a. International Journal of Oncology, 55(2):359–370, aug 2019.

30

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 31: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

[83] Daniel J. Brat, Roel G.W. Verhaak, Kenneth D. Aldape, W. K.Alfred Yung, Sofie R. Salama, Lee A.D. Cooper,Esther Rheinbay, C. Ryan Miller, Mark Vitucci, Olena Morozova, A. Gordon Robertson, Houtan Noushmehr,Peter W. Laird, Andrew D. Cherniack, Rehan Akbani, Jason T. Huse, Giovanni Ciriello, Laila M. Poisson, Jill S.Barnholtz-Sloan, Mitchel S. Berger, Cameron Brennan, Rivka R. Colen, Howard Colman, Adam E. Flanders,Caterina Giannini, Mia Grifford, Antonio Iavarone, Rajan Jain, Isaac Joseph, Jaegil Kim, Katayoon Kasaian,Tom Mikkelsen, Bradley A. Murray, Brian Patrick O’Neill, Lior Pachter, Donald W. Parsons, Carrie Sougnez,Erik P. Sulman, Scott R. Vandenberg, Erwin G. Van Meir, Andreas Von Deimling, Hailei Zhang, Daniel Crain,Kevin Lau, David Mallery, Scott Morris, Joseph Paulauskis, Robert Penny, Troy Shelton, Mark Sherman, PeggyYena, Aaron Black, Jay Bowen, Katie Dicostanzo, Julie Gastier-Foster, Kristen M. Leraas, Tara M. Lichtenberg,Christopher R. Pierson, Nilsa C. Ramirez, Cynthia Taylor, Stephanie Weaver, Lisa Wise, Erik Zmuda, TanjaDavidsen, John A. Demchok, Greg Eley, Martin L. Ferguson, Carolyn M. Hutter, Kenna R.Mills Shaw, Bradley A.Ozenberger, Margi Sheth, Heidi J. Sofia, Roy Tarnuzzer, Zhining Wang, Liming Yang, Jean Claude Zenklusen,Brenda Ayala, Julien Baboud, Sudha Chudamani, Mark A. Jensen, Jia Liu, Todd Pihl, Rohini Raman, YunhuWan, Ye Wu, Adrian Ally, J. Todd Auman, Miruna Balasundaram, Saianand Balu, Stephen B. Baylin, RameenBeroukhim, Moiz S. Bootwalla, Reanne Bowlby, Christopher A. Bristow, Denise Brooks, Yaron Butterfield,Rebecca Carlsen, Scott Carter, Lynda Chin, Andy Chu, Eric Chuah, Kristian Cibulskis, Amanda Clarke, Simon G.Coetzee, Noreen Dhalla, Tim Fennell, Sheila Fisher, Stacey Gabriel, Gad Getz, Richard Gibbs, Ranabir Guin,Angela Hadjipanayis, D. Neil Hayes, Toshinori Hinoue, Katherine Hoadley, Robert A. Holt, Alan P. Hoyle,Stuart R. Jefferys, Steven Jones, Corbin D. Jones, Raju Kucherlapati, Phillip H. Lai, Eric Lander, Semin Lee,Lee Lichtenstein, Yussanne Ma, Dennis T. Maglinte, Harshad S. Mahadeshwar, Marco A. Marra, Michael Mayo,Shaowu Meng, Matthew L. Meyerson, Piotr A. Mieczkowski, Richard A. Moore, Lisle E. Mose, Andrew J.Mungall, Angeliki Pantazi, Michael Parfenov, Peter J. Park, Joel S. Parker, Charles M. Perou, Alexei Protopopov,Xiaojia Ren, Jeffrey Roach, Thaís S. Sabedot, Jacqueline Schein, Steven E. Schumacher, Jonathan G. Seidman,Sahil Seth, Hui Shen, Janae V. Simons, Payal Sipahimalani, Matthew G. Soloway, Xingzhi Song, Huandong Sun,Barbara Tabak, Angela Tam, Donghui Tan, Jiabin Tang, Nina Thiessen, Timothy Triche, David J. Van Den Berg,Umadevi Veluvolu, Scot Waring, Daniel J. Weisenberger, Matthew D. Wilkerson, Tina Wong, Junyuan Wu, LiuXi, Andrew W. Xu, Lixing Yang, Travis I. Zack, Jianhua Zhang, B. Arman Aksoy, Harindra Arachchi, ChrisBenz, Brady Bernard, Daniel Carlin, Juok Cho, Daniel DiCara, Scott Frazer, Gregory N. Fuller, Jian Jiong Gao,Nils Gehlenborg, David Haussler, David I. Heiman, Lisa Iype, Anders Jacobsen, Zhenlin Ju, Sol Katzman, HoonKim, Theo Knijnenburg, Richard Bailey Kreisberg, Michael S. Lawrence, William Lee, Kalle Leinonen, Pei Lin,Shiyun Ling, Wenbin Liu, Yingchun Liu, Yuexin Liu, Yiling Lu, Gordon Mills, Sam Ng, Michael S. Noble, EvanPaull, Arvind Rao, Sheila Reynolds, Gordon Saksena, Zack Sanborn, Chris Sander, Nikolaus Schultz, YasinSenbabaoglu, Ronglai Shen, Ilya Shmulevich, Rileen Sinha, Josh Stuart, S. Onur Sumer, Yichao Sun, NatalieTasman, Barry S. Taylor, Doug Voet, Nils Weinhold, John N. Weinstein, Da Yang, Kosuke Yoshihara, SiyuanZheng, Wei Zhang, Lihua Zou, Ty Abel, Sara Sadeghi, Mark L. Cohen, Jenny Eschbacher, Eyas M. Hattab,Aditya Raghunathan, Matthew J. Schniederjan, Dina Aziz, Gene Barnett, Wendi Barrett, Darell D. Bigner,Lori Boice, Cathy Brewer, Chiara Calatozzolo, Benito Campos, Carlos Gilberto Carlotti, Timothy A. Chan,Lucia Cuppini, Erin Curley, Stefania Cuzzubbo, Karen Devine, Francesco DiMeco, Rebecca Duell, J. BradleyElder, Ashley Fehrenbach, Gaetano Finocchiaro, William Friedman, Jordonna Fulop, Johanna Gardner, BethHermes, Christel Herold-Mende, Christine Jungk, Ady Kendler, Norman L. Lehman, Eric Lipp, Ouida Liu,Randy Mandt, Mary McGraw, Roger Mclendon, Christopher McPherson, Luciano Neder, Phuong Nguyen,Ardene Noss, Raffaele Nunziata, Quinn T. Ostrom, Cheryl Palmer, Alessandro Perin, Bianca Pollo, AlexanderPotapov, Olga Potapova, W. Kimryn Rathmell, Daniil Rotin, Lisa Scarpace, Cathy Schilero, Kelly Senecal,Kristen Shimmel, Vsevolod Shurkhay, Suzanne Sifri, Rosy Singh, Andrew E. Sloan, Kathy Smolenski, Susan M.Staugaitis, Ruth Steele, Leigh Thorne, Daniela P.C. Tirapelli, Andreas Unterberg, Mahitha Vallurupalli, YunWang, Ronald Warnick, Felicia Williams, Yingli Wolinsky, Sue Bell, Mara Rosenberg, Chip Stewart, FranklinHuang, Jonna L. Grimsby, Amie J. Radenbaugh, and Jianan Zhang. Comprehensive, integrative genomic analysisof diffuse lower-grade gliomas. New England Journal of Medicine, 372(26):2481–2498, jun 2015.

[84] Shahrokh F. Shariat, Hideo Tokunaga, Jain Hua Zhou, Ja Hong Kim, Gustavo E. Ayala, William F. Benedict, andSeth P. Lerner. p53, p21, pRB, and p16 expression predict clinical outcome in cystectomy with bladder cancer.Journal of Clinical Oncology, 22(6):1014–1024, 2004.

[85] Thomas S. Worst, Cleo Aron Weis, Robert Stöhr, Simone Bertz, Markus Eckstein, Wolfgang Otto, JohannesBreyer, Arndt Hartmann, Christian Bolenz, Ralph M. Wirtz, and Philipp Erben. CDKN2A as transcriptomicmarker for muscle-invasive bladder cancer risk stratification and therapy decision-making. Scientific Reports,8(1):1–10, dec 2018.

[86] Adam Abeshouse, Jaeil Ahn, Rehan Akbani, Adrian Ally, Samirkumar Amin, Christopher D. Andry, MattiAnnala, Armen Aprikian, Joshua Armenia, Arshi Arora, J. Todd Auman, Miruna Balasundaram, Saianand Balu,Christopher E. Barbieri, Thomas Bauer, Christopher C. Benz, Alain Bergeron, Rameen Beroukhim, Mario

31

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 32: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

Berrios, Adrian Bivol, Tom Bodenheimer, Lori Boice, Moiz S. Bootwalla, Rodolfo Borges Dos Reis, Paul C.Boutros, Jay Bowen, Reanne Bowlby, Jeffrey Boyd, Robert K. Bradley, Anne Breggia, Fadi Brimo, Christopher A.Bristow, Denise Brooks, Bradley M. Broom, Alan H. Bryce, Glenn Bubley, Eric Burks, Yaron S.N. Butterfield,Michael Button, David Canes, Carlos G. Carlotti, Rebecca Carlsen, Michel Carmel, Peter R. Carroll, Scott L.Carter, Richard Cartun, Brett S. Carver, June M. Chan, Matthew T. Chang, Yu Chen, Andrew D. Cherniack,Simone Chevalier, Lynda Chin, Juok Cho, Andy Chu, Eric Chuah, Sudha Chudamani, Kristian Cibulskis,Giovanni Ciriello, Amanda Clarke, Matthew R. Cooperberg, Niall M. Corcoran, Anthony J. Costello, JanetCowan, Daniel Crain, Erin Curley, Kerstin David, John A. Demchok, Francesca Demichelis, Noreen Dhalla, RajivDhir, Alexandre Doueik, Bettina Drake, Heidi Dvinge, Natalya Dyakova, Ina Felau, Martin L. Ferguson, ScottFrazer, Stephen Freedland, Yao Fu, Stacey B. Gabriel, Jianjiong Gao, Johanna Gardner, Julie M. Gastier-Foster,Nils Gehlenborg, Mark Gerken, Mark B. Gerstein, Gad Getz, Andrew K. Godwin, Anuradha Gopalan, MarkusGraefen, Kiley Graim, Thomas Gribbin, Ranabir Guin, Manaswi Gupta, Angela Hadjipanayis, Syed Haider,Lucie Hamel, D. Neil Hayes, David I. Heiman, Julian Hess, Katherine A. Hoadley, Andrea H. Holbrook, Robert A.Holt, Antonia Holway, Christopher M. Hovens, Alan P. Hoyle, Mei Huang, Carolyn M. Hutter, Michael Ittmann,Lisa Iype, Stuart R. Jefferys, Corbin D. Jones, Steven J.M. Jones, Hartmut Juhl, Andre Kahles, Christopher J.Kane, Katayoon Kasaian, Michael Kerger, Ekta Khurana, Jaegil Kim, Robert J. Klein, Raju Kucherlapati, LouisLacombe, Marc Ladanyi, Phillip H. Lai, Peter W. Laird, Eric S. Lander, Mathieu Latour, Michael S. Lawrence,Kevin Lau, Tucker Lebien, Darlene Lee, Semin Lee, Kjong Van Lehmann, Kristen M. Leraas, Ignaty Leshchiner,Robert Leung, John A. Libertino, Tara M. Lichtenberg, Pei Lin, W. Marston Linehan, Shiyun Ling, Scott M.Lippman, Jia Liu, Wenbin Liu, Lucas Lochovsky, Massimo Loda, Christopher Logothetis, Laxmi Lolla, TeriLongacre, Yiling Lu, Jianhua Luo, Yussanne Ma, Harshad S. Mahadeshwar, David Mallery, Armaz Mariamidze,Marco A. Marra, Michael Mayo, Shannon McCall, Ginette McKercher, Shaowu Meng, Anne Marie Mes-Masson,Maria J. Merino, Matthew Meyerson, Piotr A. Mieczkowski, Gordon B. Mills, Kenna R.Mills Shaw, SarahMinner, Alireza Moinzadeh, Richard A. Moore, Scott Morris, Carl Morrison, Lisle E. Mose, Andrew J. Mungall,Bradley A. Murray, Jerome B. Myers, Rashi Naresh, Joel Nelson, Mark A. Nelson, Peter S. Nelson, Yulia Newton,Michael S. Noble, Houtan Noushmehr, Matti Nykter, Angeliki Pantazi, Michael Parfenov, Peter J. Park, Joel S.Parker, Joseph Paulauskis, Robert Penny, Charles M. Perou, Alain Piché, Todd Pihl, Peter A. Pinto, Davide Prandi,Alexei Protopopov, Nilsa C. Ramirez, Arvind Rao, W. Kimryn Rathmell, Gunnar Rätsch, Xiaojia Ren, Victor E.Reuter, Sheila M. Reynolds, Suhn K. Rhie, Kimberly Rieger-Christ, Jeffrey Roach, A. Gordon Robertson, BrianRobinson, Mark A. Rubin, Fred Saad, Sara Sadeghi, Gordon Saksena, Charles Saller, Andrew Salner, FranciscoSanchez-Vega, Chris Sander, George Sandusky, Guido Sauter, Andrea Sboner, Peter T. Scardino, EleonoraScarlata, Jacqueline E. Schein, Thorsten Schlomm, Laura S. Schmidt, Nikolaus Schultz, Steven E. Schumacher,Jonathan Seidman, Luciano Neder, Sahil Seth, Alexis Sharp, Candace Shelton, Troy Shelton, Hui Shen, RonglaiShen, Mark Sherman, Margi Sheth, Yan Shi, Juliann Shih, Ilya Shmulevich, Jeffry Simko, Ronald Simon,Janae V. Simons, Payal Sipahimalani, Tara Skelly, Heidi J. Sofia, Matthew G. Soloway, Xingzhi Song, AndreaSorcini, Carrie Sougnez, Serghei Stepa, Chip Stewart, John Stewart, Joshua M. Stuart, Travis B. Sullivan, CharlieSun, Huandong Sun, Angela Tam, Donghui Tan, Jiabin Tang, Roy Tarnuzzer, Katherine Tarvin, Barry S. Taylor,Patrick Teebagy, Imelda Tenggara, Bernard Têtu, Ashutosh Tewari, Nina Thiessen, Timothy Thompson, Leigh B.Thorne, Daniela P. Tirapelli, Scott A. Tomlins, Felipe Amstalden Trevisan, Patricia Troncoso, Lawrence D.True, Maria Christina Tsourlakis, Svitlana Tyekucheva, Eliezer Van Allen, David J. Van Den Berg, UmadeviVeluvolu, Roel Verhaak, Cathy D. Vocke, Doug Voet, Yunhu Wan, Qingguo Wang, Wenyi Wang, Zhining Wang,Nils Weinhold, John N. Weinstein, Daniel J. Weisenberger, Matthew D. Wilkerson, Lisa Wise, John Witte,Chia Chin Wu, Junyuan Wu, Ye Wu, Andrew W. Xu, Shalini S. Yadav, Liming Yang, Lixing Yang, ChristinaYau, Huihui Ye, Peggy Yena, Thomas Zeng, Jean C. Zenklusen, Hailei Zhang, Jianhua Zhang, Jiashan Zhang,Wei Zhang, Yi Zhong, Kelsey Zhu, and Erik Zmuda. The Molecular Taxonomy of Primary Prostate Cancer. Cell,163(4):1011–1025, nov 2015.

[87] Malachi Griffith, Nicholas C Spies, Kilannin Krysiak, Joshua F McMichael, Adam C Coffman, Arpad M Danos,Benjamin J Ainscough, Cody A Ramirez, Damian T Rieke, Lynzey Kujan, Erica K Barnell, Alex H Wagner,Zachary L Skidmore, Amber Wollam, Connor J Liu, Martin R Jones, Rachel L Bilski, Robert Lesurf, Yan-YangFeng, Nakul M Shah, Melika Bonakdar, Lee Trani, Matthew Matlock, Avinash Ramu, Katie M Campbell,Gregory C Spies, Aaron P Graubert, Karthik Gangavarapu, James M Eldred, David E Larson, Jason R Walker,Benjamin M Good, Chunlei Wu, Andrew I Su, Rodrigo Dienstmann, Adam A Margolin, David Tamborero,Nuria Lopez-Bigas, Steven J M Jones, Ron Bose, David H Spencer, Lukas D Wartman, Richard K Wilson,Elaine R Mardis, and Obi L Griffith. CIViC is a community knowledgebase for expert crowdsourcing the clinicalinterpretation of variants in cancer. Nature Genetics, 49(2):170–174, feb 2017.

[88] Javad Noorbakhsh and Jeffrey H. Chuang. Uncertainties in tumor allele frequencies limit power to inferevolutionary pressures. Nature Genetics, 49(9):1288–1289, sep 2017.

[89] Vincent L Cannataro, Stephen G Gaffney, and Jeffrey P Townsend. Effect Sizes of Somatic Mutations in Cancer.

32

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 33: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

JNCI: Journal of the National Cancer Institute, 110(11):1171–1177, nov 2018.

[90] Philip S. Bernard, Joel S. Parker, Michael Mullins, Maggie C.U. Cheung, Samuel Leung, David Voduc, TammiVickery, Sherri Davies, Christiane Fauron, Xiaping He, Zhiyuan Hu, John F. Quackenhush, Inge J. Stijleman,Juan Palazzo, J. S. Matron, Andrew B. Nobel, Elaine Mardis, Torsten O. Nielsen, Matthew J. Ellis, and Charles M.Perou. Supervised risk predictor of breast cancer based on intrinsic subtypes. Journal of Clinical Oncology,27(8):1160–1167, mar 2009.

[91] Daniel C. Koboldt, Robert S. Fulton, Michael D. McLellan, Heather Schmidt, Joelle Kalicki-Veizer, Joshua F.McMichael, Lucinda L. Fulton, David J. Dooling, Li Ding, Elaine R. Mardis, Richard K. Wilson, Adrian Ally,Miruna Balasundaram, Yaron S.N. Butterfield, Rebecca Carlsen, Candace Carter, Andy Chu, Eric Chuah, HyeJung E. Chun, Robin J.N. Coope, Noreen Dhalla, Ranabir Guin, Carrie Hirst, Martin Hirst, Robert A. Holt,Darlene Lee, Haiyan I. Li, Michael Mayo, Richard A. Moore, Andrew J. Mungall, Erin Pleasance, A. GordonRobertson, Jacqueline E. Schein, Arash Shafiei, Payal Sipahimalani, Jared R. Slobodan, Dominik Stoll, AngelaTam, Nina Thiessen, Richard J. Varhol, Natasja Wye, Thomas Zeng, Yongjun Zhao, Inanc Birol, Steven J.M.Jones, Marco A. Marra, Andrew D. Cherniack, Gordon Saksena, Robert C. Onofrio, Nam H. Pho, Scott L.Carter, Steven E. Schumacher, Barbara Tabak, Bryan Hernandez, Jeff Gentry, Huy Nguyen, Andrew Crenshaw,Kristin Ardlie, Rameen Beroukhim, Wendy Winckler, Gad Getz, Stacey B. Gabriel, Matthew Meyerson, LyndaChin, Raju Kucherlapati, Katherine A. Hoadley, J. Todd Auman, Cheng Fan, Yidi J. Turman, Yan Shi, LingLi, Michael D. Topal, Xiaping He, Hann Hsiang Chao, Aleix Prat, Grace O. Silva, Michael D. Iglesia, WeiZhao, Jerry Usary, Jonathan S. Berg, Michael Adams, Jessica Booker, Junyuan Wu, Anisha Gulabani, TomBodenheimer, Alan P. Hoyle, Janae V. Simons, Matthew G. Soloway, Lisle E. Mose, Stuart R. Jefferys, SaianandBalu, Joel S. Parker, D. Neil Hayes, Charles M. Perou, Simeen Malik, Swapna Mahurkar, Hui Shen, Daniel J.Weisenberger, Timothy Triche, Phillip H. Lai, Moiz S. Bootwalla, Dennis T. Maglinte, Benjamin P. Berman,David J. Van Den Berg, Stephen B. Baylin, Peter W. Laird, Chad J. Creighton, Lawrence A. Donehower, MichaelNoble, Doug Voet, Nils Gehlenborg, Daniel Di Cara, Juinhua Zhang, Hailei Zhang, Chang Jiun Wu, SpringYingchun Liu, Michael S. Lawrence, Lihua Zou, Andrey Sivachenko, Pei Lin, Petar Stojanov, Rui Jing, JuokCho, Raktim Sinha, Richard W. Park, Marc Danie Nazaire, Jim Robinson, Helga Thorvaldsdottir, Jill Mesirov,Peter J. Park, Sheila Reynolds, Richard B. Kreisberg, Brady Bernard, Ryan Bressler, Timo Erkkila, Jake Lin,Vesteinn Thorsson, Wei Zhang, Ilya Shmulevich, Giovanni Ciriello, Nils Weinhold, Nikolaus Schultz, JianjiongGao, Ethan Cerami, Benjamin Gross, Anders Jacobsen, Rileen Sinha, B. Arman Aksoy, Yevgeniy Antipin, BorisReva, Ronglai Shen, Barry S. Taylor, Marc Ladanyi, Chris Sander, Pavana Anur, Paul T. Spellman, Yiling Lu,Wenbin Liu, Roel R.G. Verhaak, Gordon B. Mills, Rehan Akbani, Nianxiang Zhang, Bradley M. Broom, Tod D.Casasent, Chris Wakefield, Anna K. Unruh, Keith Baggerly, Kevin Coombes, John N. Weinstein, David Haussler,Christopher C. Benz, Joshua M. Stuart, Stephen C. Benz, Jingchun Zhu, Christopher C. Szeto, Gary K. Scott,Christina Yau, Evan O. Paull, Daniel Carlin, Christopher Wong, Artem Sokolov, Janita Thusberg, Sean Mooney,Sam Ng, Theodore C. Goldstein, Kyle Ellrott, Mia Grifford, Christopher Wilks, Singer Ma, Brian Craft, ChunhuaYan, Ying Hu, Daoud Meerzaman, Julie M. Gastier-Foster, Jay Bowen, Nilsa C. Ramirez, Aaron D. Black,Robert E. Pyatt, Peter White, Erik J. Zmuda, Jessica Frick, Tara M. Lichtenberg, Robin Brookens, Myra M.George, Mark A. Gerken, Hollie A. Harper, Kristen M. Leraas, Lisa J. Wise, Teresa R. Tabler, Cynthia McAllister,Thomas Barr, Melissa Hart-Kothari, Katie Tarvin, Charles Saller, George Sandusky, Colleen Mitchell, Mary V.Iacocca, Jennifer Brown, Brenda Rabeno, Christine Czerwinski, Nicholas Petrelli, Oleg Dolzhansky, MikhailAbramov, Olga Voronina, Olga Potapova, Jeffrey R. Marks, Wiktoria M. Suchorska, Dawid Murawa, WitoldKycler, Matthew Ibbs, Konstanty Korski, Arkadiusz Spychała, Paweł Murawa, Jacek J. Brzezinski, HannaPerz, Radosław Łazniak, Marek Teresiak, Honorata Tatka, Ewa Leporowska, Marta Bogusz-Czerniewicz, JulianMalicki, Andrzej Mackiewicz, Maciej Wiznerowicz, Xuan Van Le, Bernard Kohl, Nguyen Viet Tien, RichardThorp, Nguyen Van Bang, Howard Sussman, Bui Duc Phu, Richard Hajek, Nguyen Phi Hung, Tran Viet ThePhuong, Huynh Quyet Thang, Khurram Zaki Khan, Robert Penny, David Mallery, Erin Curley, Candace Shelton,Peggy Yena, James N. Ingle, Fergus J. Couch, Wilma L. Lingle, Tari A. King, Ana Maria Gonzalez-Angulo,Mary D. Dyer, Shuying Liu, Xiaolong Meng, Modesto Patangan, Frederic Waldman, Hubert Stöppler, W. KimrynRathmell, Leigh Thorne, Mei Huang, Lori Boice, Ashley Hill, Carl Morrison, Carmelo Gaudioso, Wiam Bshara,Kelly Daily, Sophie C. Egea, Mark D. Pegram, Carmen Gomez-Fernandez, Rajiv Dhir, Rohit Bhargava, AdamBrufsky, Craig D. Shriver, Jeffrey A. Hooke, Jamie Leigh Campbell, Richard J. Mural, Hai Hu, Stella Somiari,Caroline Larson, Brenda Deyarmin, Leonid Kvecher, Albert J. Kovatich, Matthew J. Ellis, Thomas Stricker,Kevin White, Olufunmilayo Olopade, Chunqing Luo, Yaqin Chen, Ron Bose, Li Wei Chang, Andrew H. Beck,Todd Pihl, Mark Jensen, Robert Sfeir, Ari Kahn, Anna Chu, Prachi Kothiyal, Zhining Wang, Eric Snyder, JoanPontius, Brenda Ayala, Mark Backus, Jessica Walton, Julien Baboud, Dominique Berton, Matthew Nicholls,Deepak Srinivasan, Rohini Raman, Stanley Girshik, Peter Kigonya, Shelley Alonso, Rashmi Sanbhadti, SeanBarletta, David Pot, Margi Sheth, John A. Demchok, Kenna R.Mills Shaw, Liming Yang, Greg Eley, Martin L.Ferguson, Roy W. Tarnuzzer, Jiashan Zhang, Laura A.L. Dillon, Kenneth Buetow, Peter Fielding, Bradley A.

33

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 34: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

Ozenberger, Mark S. Guyer, Heidi J. Sofia, and Jacqueline D. Palchik. Comprehensive molecular portraits ofhuman breast tumours. Nature, 490(7418):61–70, oct 2012.

[92] Clemency Jolly and Peter Van Loo. Timing somatic events in the evolution of cancer. Genome Biology, 19(1):95,jul 2018.

[93] Daniele Ramazzotti, Giulio Caravagna, Loes Olde Loohuis, Alex Graudenzi, Ilya Korsunsky, Giancarlo Mauri,Marco Antoniotti, and Bud Mishra. Data and text mining CAPRI: efficient inference of cancer progressionmodels from cross-sectional data. Bioinformatics, (18):3016–3026, 2015.

[94] Luca De Sano, Giulio Caravagna, Daniele Ramazzotti, Alex Graudenzi, Giancarlo Mauri, Bud Mishra, andMarco Antoniotti. TRONCO: an R package for the inference of cancer progression models from heterogeneousgenomic data. Bioinformatics (Oxford, England), 32(12):1911–1913, jun 2016.

[95] Mariam Jamal-Hanjani, Gareth A. Wilson, Nicholas McGranahan, Nicolai J. Birkbak, Thomas B.K. Watkins,Selvaraju Veeriah, Seema Shafi, Diana H. Johnson, Richard Mitter, Rachel Rosenthal, Max Salm, Stuart Horswell,Mickael Escudero, Nik Matthews, Andrew Rowan, Tim Chambers, David A. Moore, Samra Turajlic, HangXu, Siow-Ming Lee, Martin D. Forster, Tanya Ahmad, Crispin T. Hiley, Christopher Abbosh, Mary Falzon,Elaine Borg, Teresa Marafioti, David Lawrence, Martin Hayward, Shyam Kolvekar, Nikolaos Panagiotopoulos,Sam M. Janes, Ricky Thakrar, Asia Ahmed, Fiona Blackhall, Yvonne Summers, Rajesh Shah, Leena Joseph,Anne M. Quinn, Phil A. Crosbie, Babu Naidu, Gary Middleton, Gerald Langman, Simon Trotter, MarianneNicolson, Hardy Remmen, Keith Kerr, Mahendran Chetty, Lesley Gomersall, Dean A. Fennell, Apostolos Nakas,Sridhar Rathinam, Girija Anand, Sajid Khan, Peter Russell, Veni Ezhil, Babikir Ismail, Melanie Irvin-Sellers,Vineet Prakash, Jason F. Lester, Malgorzata Kornaszewska, Richard Attanoos, Haydn Adams, Helen Davies,Stefan Dentro, Philippe Taniere, Brendan O’Sullivan, Helen L. Lowe, John A. Hartley, Natasha Iles, Harriet Bell,Yenting Ngai, Jacqui A. Shaw, Javier Herrero, Zoltan Szallasi, Roland F. Schwarz, Aengus Stewart, Sergio A.Quezada, John Le Quesne, Peter Van Loo, Caroline Dive, Allan Hackshaw, and Charles Swanton. Tracking theEvolution of Non–Small-Cell Lung Cancer. New England Journal of Medicine, 376(22):2109–2121, jun 2017.

[96] Moritz Gerstung, Clemency Jolly, Ignaty Leshchiner, Stefan C. Dentro, Santiago Gonzalez, Daniel Rosebrock,Thomas J. Mitchell, Yulia Rubanova, Pavana Anur, Kaixian Yu, Maxime Tarabichi, Amit Deshwar, Jeff Win-tersinger, Kortine Kleinheinz, Ignacio Vázquez-García, Kerstin Haase, Lara Jerman, Subhajit Sengupta, GeoffMacintyre, Salem Malikic, Nilgun Donmez, Dimitri G. Livitz, Marek Cmero, Jonas Demeulemeester, StevenSchumacher, Yu Fan, Xiaotong Yao, Juhee Lee, Matthias Schlesner, Paul C. Boutros, David D. Bowtell, HongtuZhu, Gad Getz, Marcin Imielinski, Rameen Beroukhim, S. Cenk Sahinalp, Yuan Ji, Martin Peifer, FlorianMarkowetz, Ville Mustonen, Ke Yuan, Wenyi Wang, Quaid D. Morris, Stefan C. Dentro, Ignaty Leshchiner,Moritz Gerstung, Clemency Jolly, Kerstin Haase, Maxime Tarabichi, Jeff Wintersinger, Amit G. Deshwar,Kaixian Yu, Santiago Gonzalez, Yulia Rubanova, Geoff Macintyre, David J. Adams, Pavana Anur, RameenBeroukhim, Paul C. Boutros, David D. Bowtell, Peter J. Campbell, Shaolong Cao, Elizabeth L. Christie, MarekCmero, Yupeng Cun, Kevin J. Dawson, Nilgun Donmez, Ruben M. Drews, Roland Eils, Yu Fan, Matthew Fittall,Dale W. Garsed, Gad Getz, Gavin Ha, Marcin Imielinski, Lara Jerman, Yuan Ji, Kortine Kleinheinz, Juhee Lee,Henry Lee-Six, Dimitri G. Livitz, Salem Malikic, Florian Markowetz, Inigo Martincorena, Thomas J. Mitchell,Ville Mustonen, Layla Oesper, Martin Peifer, Myron Peto, Benjamin J. Raphael, Daniel Rosebrock, S. CenkSahinalp, Adriana Salcedo, Matthias Schlesner, Steven Schumacher, Subhajit Sengupta, Ruian Shi, Seung JunShin, Oliver Spiro, Lincoln D. Stein, Ignacio Vázquez-García, Shankar Vembu, David A. Wheeler, Tsun PoYang, Xiaotong Yao, Ke Yuan, Hongtu Zhu, Wenyi Wang, Quaid D. Morris, Paul T. Spellman, David C. Wedge,Peter Van Loo, Paul T. Spellman, and David C. Wedge. The evolutionary history of 2,658 cancers. Nature,578(7793):122–128, feb 2020.

[97] Sajal Dash, Nicholas A. Kinney, Robin T. Varghese, Harold R. Garner, Wu-chun Feng, and Ramu Anandakrishnan.Differentiating between cancer and normal tissue samples using multi-hit combinations of genetic mutations.Scientific Reports, 9(1):1005, dec 2019.

[98] V. Chvatal. A Greedy Heuristic for the Set-Covering Problem. Mathematics of Operations Research, 4(3):233–235, aug 1979.

[99] Broad Institute TCGA Genome Data Analysis Center Firehose 2016 01 28 run. doi:10.7908/C11G0KM9, 2016.

[100] Joan C Smith and Jason M Sheltzer. Systematic identification of mutations and copy number alterations associatedwith cancer patient prognosis. eLife, 7, dec 2018.

[101] Jianfang Liu, Tara Lichtenberg, and Katherine A Hoadley. An Integrated TCGA Pan-Cancer Clinical DataResource to Drive High-Quality Survival Outcome Analytics In Brief Analysis of clinicopathologic annotationsfor over 11,000 cancer patients in the TCGA program leads to the generation of TCGA Clinical Data Reso. Cell,173:400–416.e11, 2018.

34

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint

Page 35: Identifying Modules of Cooperating Cancer Drivers...2020/06/29  · IDENTIFYING MODULES OF COOPERATING CANCER DRIVERS Michael I. Klein1,6, Vincent L. Cannataro2, Jeffrey P. Townsend1,3,4,

A PREPRINT - JUNE 23, 2020

[102] Hadley Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.[103] Terry M Therneau. A Package for Survival Analysis in S, 2015. version 2.38.[104] Alboukadel Kassambara and Marcin Kosinski. survminer: Drawing Survival Curves using ’ggplot2’, 2018. R

package version 0.4.3.[105] Microsoft and Steve Weston. foreach: Provides Foreach Looping Construct for R, 2017. R package version

1.4.4.[106] Steve Weston. doMPI: Foreach Parallel Adaptor for the Rmpi Package, 2017. R package version 0.2.2.[107] Robert L. Grossman, Allison P. Heath, Vincent Ferretti, Harold E. Varmus, Douglas R. Lowy, Warren A. Kibbe,

and Louis M. Staudt. Toward a Shared Vision for Cancer Genomic Data. New England Journal of Medicine,375(12):1109–1112, sep 2016.

35

.CC-BY-NC-ND 4.0 International licenseavailable under a(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprintthis version posted June 29, 2020. ; https://doi.org/10.1101/2020.06.29.168229doi: bioRxiv preprint