inferring and analyzing module-specific lncrna-mrna causal...

18
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/322853954 Inferring and analyzing module-specific lncRNA-mRNA causal regulatory networks in human cancer Article in Briefings in Bioinformatics · February 2018 DOI: 10.1093/bib/bby008 CITATIONS 5 READS 359 4 authors: Some of the authors of this publication are also working on these related projects: Economic Complexity View project Measuring the Economic Complexity of Australia's States and Territories View project Junpeng Zhang Dali University 57 PUBLICATIONS 266 CITATIONS SEE PROFILE Thuc D le University of South Australia 110 PUBLICATIONS 629 CITATIONS SEE PROFILE Lin Liu University of South Australia 180 PUBLICATIONS 1,267 CITATIONS SEE PROFILE Jiuyong Li University of South Australia 295 PUBLICATIONS 3,406 CITATIONS SEE PROFILE All content following this page was uploaded by Junpeng Zhang on 01 February 2018. The user has requested enhancement of the downloaded file.

Upload: others

Post on 08-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

See discussions stats and author profiles for this publication at httpswwwresearchgatenetpublication322853954

Inferring and analyzing module-specific lncRNA-mRNA causal regulatory

networks in human cancer

Article in Briefings in Bioinformatics middot February 2018

DOI 101093bibbby008

CITATIONS

5READS

359

4 authors

Some of the authors of this publication are also working on these related projects

Economic Complexity View project

Measuring the Economic Complexity of Australias States and Territories View project

Junpeng Zhang

Dali University

57 PUBLICATIONS 266 CITATIONS

SEE PROFILE

Thuc D le

University of South Australia

110 PUBLICATIONS 629 CITATIONS

SEE PROFILE

Lin Liu

University of South Australia

180 PUBLICATIONS 1267 CITATIONS

SEE PROFILE

Jiuyong Li

University of South Australia

295 PUBLICATIONS 3406 CITATIONS

SEE PROFILE

All content following this page was uploaded by Junpeng Zhang on 01 February 2018

The user has requested enhancement of the downloaded file

Inferring and analyzing module-specific lncRNAndash

mRNA causal regulatory networks in human cancerJunpeng Zhang Thuc Duy Le Lin Liu and Jiuyong LiCorresponding authors Junpeng Zhang School of Engineering Dali University Dali Yunnan 671003 Public Republic of China Tel thorn86 872 2219799 Faxthorn86 872 2219799 E-mail zhangjunpeng_411yahoocom Jiuyong Li School of Information Technology and Mathematical Sciences University of SouthAustralia Mawson Lakes 5095 SA Australia Tel thorn61 8 830 23898 Fax thorn61 8 830 23381 E-mail jiuyongliunisaeduau

Abstract

It is known that noncoding RNAs (ncRNAs) cover 98 of the transcriptome but do not encode proteins Among ncRNAslong noncoding RNAs (lncRNAs) are a large and diverse class of RNA molecules and are thought to be a gold mine of poten-tial oncogenes anti-oncogenes and new biomarkers Although only a minority of lncRNAs is functionally characterized it isclear that they are important regulators to modulate gene expression and involve in many biological functions To revealthe functions and regulatory mechanisms of lncRNAs it is vital to understand how lncRNAs regulate their target genes forimplementing specific biological functions In this article we review the computational methods for inferring lncRNAndashmRNA interactions and the third-party databases of storing lncRNAndashmRNA regulatory relationships We have found thatthe existing methods are based on statistical correlations between the gene expression levels of lncRNAs and mRNAs andmay not reveal gene regulatory relationships which are causal relationships Moreover these methods do not consider themodularity of lncRNAndashmRNA regulatory networks and thus the networks identified are not module-specific To addressthe above two issues we propose a novel method MSLCRN to infer and analyze module-specific lncRNAndashmRNA causal reg-ulatory networks We have applied it into glioblastoma multiforme lung squamous cell carcinoma ovarian cancer andprostate cancer respectively The experimental results show that MSLCRN as an expression-based method could be a use-ful complementary method to study lncRNA regulations

Key words lncRNA mRNA lncRNAndashmRNA co-expression lncRNAndashmRNA interaction lncRNAndashmRNA causal relationshiphuman cancer

Junpeng Zhang is an associate professor at the School of Engineering Dali University He received his BSc (2009) in Bio-medical Engineering and MSc(2012) in Control Theory and Control Engineering from Kunming University of Science and Technology Kunming City China His research interestsinclude bioinformatics and data miningThuc Duy Le is a research fellow at the University of South Australia (UniSA) He received his BSc (2002) and MSc (2006) in pure Mathematics from theUniversity of Pedagogy Ho Chi Minh City Vietnam and BSc (2010) in Computer Science from UniSA He received his PhD degree in Computer Science(Bioinformatics) in 2014 at UniSA His research interests are bioinformatics data mining and machine learningLin Liu is a senior lecturer at the School of Information Technology and Mathematical Sciences University of South Australia (UniSA) She received herbachelor and master degrees in Electronic Engineering from Xidian University China in 1991 and 1994 respectively and her PhD degree in computer sys-tems engineering from UniSA in 2006 Her research interests include data mining and bioinformatics as well as Petri nets and their applications to proto-col verification and network security analysisJiuyong Li is a professor at the School of Information Technology and Mathematical Sciences University of South Australia He received his PhD degree incomputer science from the Griffith University Australia (2002) His research interests are in the fields of data mining privacy preserving and bioinfor-matics His research has been supported by five prestigious Australian Research Council Discovery grants since 2005 and he has published more than 100research papersSubmitted 13 December 2017 Received (in revised form) 8 January 2018

VC The Author(s) 2018 Published by Oxford University Press All rights reservedFor Permissions please email journalspermissionsoupcom

1

Briefings in Bioinformatics 2018 1ndash17

doi 101093bibbby008Paper

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Introduction

Long noncoding RNAs (lncRNAs) are non-protein coding tran-scripts with gt200 nucleotides in length Unlike small noncodingRNAs (sncRNAs) lncRNAs generally exhibit low sequence con-servation However owing to rapidly adaptive selection pres-sures the low conservation of lncRNAs (such as Air and Xist)does not indicate absence of function [1] Similar to microRNAs(miRNAs) an important class of sncRNAs evidence has shownthat lncRNAs play important roles in a wide range of biologicalprocesses even in cancers [2 3] Despite the importance oflncRNAs in many physiological and pathological processes alarge number of lncRNAs remain to be functionally character-ized For this reason the number of studies on lncRNA researchhas been increased exponentially in the past decade (as shownin Figure 1)

To achieve various biological functions lncRNAs form generegulatory networks by interacting with other biological mole-cules such as transcription factors miRNAs messenger RNAs(mRNAs) and RNA-binding proteins [4] Among these biologicalmolecules interacting with lncRNAs mRNAs are the most popu-lar ones By regulating the transcription and translation ofmRNAs lncRNAs could get involved in several vital biologicalprocesses such as cell differentiation cell proliferation andcytoprotective programs [5] Therefore the identification oflncRNAndashmRNA regulatory networks would help to uncoverfunctions and regulatory mechanisms of lncRNAs

A straightforward method for identifying lncRNAndashmRNAregulatory networks is sequence-based complementary basepairing To predict lncRNA targets several sequence-basedmethods such as GUUGle [6] RNAup [7] RNAplex [8] IntaRNA[9] RactIP [10] LncTar [11] and RIblast [12] have been devel-oped Owing to the long sequence and complex tertiary struc-ture of each lncRNA the computational costs of predictinglarge-scale lncRNAndashmRNA regulatory relationships are usuallyhigh Moreover these sequence-based methods only considerthe sequence information of lncRNAs and target mRNAs andthus the predicted lncRNAndashmRNA regulatory networks arestatic However previous studies [13ndash15] have shown thatlncRNAs exhibit condition-specific expression fashion anddynamic networks of gene regulation To identify dynamic orcondition-specific lncRNAndashmRNA regulatory networks it is nec-essary to use expression data Some expression-based methods[16ndash19] for predicting co-expressed lncRNAndashmRNA networks

have been proposed However as the predictions are based onstatistical associations found in gene expression levels onlythey may not represent the real lsquocausalrsquo lncRNAndashmRNA regula-tory relationships Furthermore the existing expression-basedmethods do not consider the modularity of lncRNAndashmRNA reg-ulatory networks an important feature of gene regulatory net-works [20]

In this article we first review the computational methodsfor inferring lncRNAndashmRNA interactions and the public data-bases for storing lncRNAndashmRNA regulatory relationshipsSecond we propose a novel method to infer Module-SpecificLncRNAndashmRNA Causal Regulatory Network (thus the proposedmethod is called MSLCRN) In the first step by consideringmodularity of networks MSLCRN uses Weighted Gene Co-expression Network Analysis (WGCNA) [21] to identify lncRNAndashmRNA co-expression modules In each module the lncRNAsand mRNAs are regarded as module-specific genes In the sec-ond step MSLCRN uses a causal inference method named inter-vention calculus when the directed acyclic graph (DAG) isabsent (IDA) [22 23] to estimate the causal effects of possiblelncRNAndashmRNA causal pairs in each module To speed up theestimation the parallelized version of IDA [24] is used to calcu-late the causal effects For each module the noncausal lncRNAndashmRNA pairs are eliminated and the retained lncRNAndashmRNAcausal pairs are further assembled to generate a module-specific lncRNAndashmRNA causal network To obtain a globallncRNAndashmRNA causal regulatory network we further integratethe identified module-specific lncRNAndashmRNA causal networksin the third step

To evaluate MSLCRN we have applied it into four humancancer data sets including glioblastoma multiforme (GBM) lungsquamous cell carcinoma (LSCC) ovarian cancer (OvCa) andprostate cancer (PrCa) from [25] The validation survival andenrichment analysis results show that the proposed methodcan help with revealing the functions and regulatory mecha-nisms of lncRNAs MSLCRN is released under the GPL-30License and is freely available through GitHub (httpsgithubcomzhangjunpeng411MSLCRN)

Computational methods for inferringlncRNAndashmRNA interactions

In this section we review the computational approaches forinferring lncRNAndashmRNA interactions In Table 1 we divide themethods into two categories (1) sequence-based method and(2) expression-based method We will separately review thesemethods as follows

Sequence-based method

The common characteristic of the sequence-based methods isthat the identification of RNAndashRNA interactions depends onRNA binding energy between two RNA molecules To evaluatethe strength of RNA binding energy a number of energy models[6ndash12 26ndash32] are proposed to predict RNAndashRNA interactions

Gerlach and Giegerich [6] propose a utility program GUUGlefor locating potential helical regions under RNA complementarybase pairs rules The method can be effectively used as a filterfor noncoding RNA (ncRNA) target prediction However the reli-able prediction of RNAndashRNA binding energies is also importantfor the identification of RNAndashRNA interactions To study thethermodynamics of RNAndashRNA interactions Muckstein et al [7]present an extension of the standard partition function methodcalled RNAup to RNA secondary structures By comparing

129 141 164 240 338

639

991

1419

2068

2596

0

500

1000

1500

2000

2500

3000

Year

Num

ber

of p

ublic

atio

ns

Figure 1 The number of lncRNA-related publications in the past decade The

number of queried publications is obtained from PubMed library with keyword

lsquolncRNArsquo

2 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Table 1 Summary of computational methods or tools for inferring lncRNAndashmRNA interactions

Methodstools Categories of methods Brief descriptions Available

GUUGle [6] Sequence-based Target prediction by locating potential helical regions of RNAndashRNA pairs under RNA base pairing rules which include G-Ubases

httpbibiserv2cebitecuni-bielefelddeguugle

RNAup [7] Sequence-based Target prediction by studying thermodynamics of RNAndashRNApairs based on the sum of the energy of binding andhybridization

httprnatbiunivieacatcgi-binRNAWebSuiteRNAupcgi

RNAcofold [26] Sequence-based Target prediction by computing the hybridization energy andbase pairing pattern of RNAndashRNA pairs

httprnatbiunivieacatcgi-binRNAWebSuiteRNAcofoldcgi

Alkan et al [27] Sequence-based Target prediction by minimizing the joint free energy of RNAndashRNA pairs under a number of energy models including basepair energy model stacked pair energy model loop energymodel

On request

RNAplex [8] Sequence-based Target prediction by finding possible hybridization sites ofRNAndashRNA pairs

httpwwwtbiunivieacathtafer

IntaRNA [9] Sequence-based Target prediction by incorporating accessibility of target sitesas well as the existence of a user-definable seed

httprnainformatikuni-freiburgdeIntaRNAInputjsp

RactIP [10] Sequence-based Target prediction by integrating approximate information onan ensemble of equilibrium joint structures into the objec-tive function of integer programming

httprtipsdnabiokeioacjpractip

PETcofold [28] Sequence-based Target prediction by taking covariance information in intra-molecular and intermolecular base pairs into account

httprthdkresourcespetcofold

RIsearch [29] Sequence-based Target prediction by implementing a simplified Turner energymodel for fast computation of hybridization

httpsrthdkresourcesrisearchrisearch1php

RIsearch2 [30] Sequence-based An updated version of RIsearch and predict targets using a sin-gle integrated seed-and-extend framework based on suffixarrays

httpsrthdkresourcesrisearch

LncTar [11] Sequence-based lncRNA target prediction by finding the minimum free energyjoint structure of RNAndashRNA pairs based on base pairing

httpwwwcuilabcnlnctar

lncRNATargets [31] Sequence-based lncRNA target prediction based on nucleic acidthermodynamics

httpwwwherbbolorg 8001lrt

Terai et al [32] Sequence-based lncRNA target prediction by developing an integrated pipelineon the K computer which is one of the fastest super-com-puters in the world

httprtoolscbrcjpcgi-binRNARNAindexpl

RIblast [12] Sequence-based Target prediction based on the seed-and-extension approach httpgithubcomfukunagatsuRIblast

Liao et al [16] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodand the identified lncRNAndashmRNA interactions should be co-expressed in the same direction in no less than 3 Mousemicroarray data sets

On request

Guo et al [17] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodin OvCa malignant progression

On request

Du et al [18] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodand a power function in thyroid cancer

On request

Liu et al [33] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodin human colorectal carcinoma

On request

Huang et al [34] Expression-based Identify lncRNAndashmRNA interactions associated with pneumo-nia by using Pearson method

On request

Li et al [35] Expression-based Identify dynamic lncRNAndashmRNA interactions associated withvenous congestion by using Pearson method

On request

Wu et al [19] Expression-based Identify lncRNAndashmRNA interactions by using a generalized lin-ear model to regress mRNA expression on lncRNA expres-sion in breast cancer

On request

Fu et al [36] Expression-based Identify lncRNAndashmRNA interactions by considering mRNA lociwithin lncRNA and the Pearson correlation in cartilage

On request

Zhang et al [37] Expression-based Identify lncRNAndashmRNA interactions by considering mRNA lociwithin lncRNA and the Pearson correlation in cartilage inperipheral blood mononuclear cells

On request

Iwakiri et al [38] Expression-based Identify tissue-specific lncRNAndashmRNA interactions by integrat-ing the tissue specificity of lncRNAs and mRNAs intosequence-based prediction of human lncRNAndashRNAinteractions

On request

Lv et al [39] Expression-based Identify tissue-specific lncRNAndashmRNA interactions by usingPearson and sequence-based methods and in human intra-hepatic cholangiocarcinoma

On request

Module-specific lncRNA-mRNA causal regulatory networks | 3

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

predicted free energies of binding with RNA interference experi-mental data RNAup can produce biologically reasonableresults For genome-wide predictions of ncRNA targets RNAupis not fast enough Therefore it is usually to be combined withother faster RNAndashRNA prediction methods

To extend the standard dynamic programming algorithmsfor computing RNA secondary structures Bernhart et al [26]propose a program named RNAcofold to compute the hybridiza-tion energy and base pairing pattern of the co-folding of twoRNA molecules However the method disregards some impor-tant interaction structures and is restricted to dimeric com-plexes Moreover for the RNAndashRNA interaction predictionpredicting the joint secondary structure of two interacting RNAsis also important To solve it Alkan et al [27] develop severalalgorithms to minimize the joint free energy between the twoRNAs under a number of energy models Assuming that con-served RNAndashRNA interactions imply conserved functionSeemann et al [28] also implement a comparative method calledPETcofold to predict the joint secondary structure of two inter-acting RNAs As PETcofold considers sequence conservation anincreasing amount of structural covariance can further improveits performance

RNAup [7] and RNAcofold [26] are too slow for genome-widesearch in finding target sites of ncRNAs To accelerate the speedof RNAndashRNA interaction predictions RNAplex [8] is presented toquickly find possible hybridization sites between two interact-ing RNAs To focus on the target search on short highly stableinteractions RNAplex introduces a per nucleotide penaltyMeanwhile another general and fast approach IntaRNA [9] isproposed to efficiently predict bacterial RNAndashRNA interactionsCompared with other existing target prediction methodsIntaRNA considers both the accessibility of target sites and theexistence of a user-defined seed Therefore it shows a higheraccuracy than competing methods Kato et al [10] also present afast and accurate prediction method RactIP for comprehensivetype of RNAndashRNA interactions In terms of predicting joint sec-ondary structures of two interacting RNAs RactIP run incompa-rably faster than competitive programs

To further achieve a speed improvement of predictingRNAndashRNA interactions Wenzel et al [29] present RIsearch forfast computation of hybridization between two interactingRNAs They show that the energy model of RIsearch is anaccurate approximation of the full energy model for near-complementary RNAndashRNA duplexes Furthermore RIsearch isfaster than RNAplex [8] in RNAndashRNA interaction searchRecently RIsearch2 [30] an updated version of RIsearch [29] isproposed to localize potential near-complementary RNAndashRNAinteractions between two RNA sequences The comparisonresults show that RIsearch2 is much faster than the previousmethods such as GUUGle [6] RNAplex [8] IntaRNA [9] andRIsearch [29]

Although the above RNAndashRNA interaction prediction meth-ods can be extended to predict lncRNAndashmRNA interactionsnone of them are exclusively used for identifying the RNA tar-gets of lncRNAs in a large scale To efficiently identify lncRNAndashmRNA interactions Li et al [11] propose a tool named LncTarLncTar explores lncRNAndashmRNA interactions by finding the min-imum free energy joint structure of two interacting RNAs basedon base pairing As LncTar runs fast and does not have a limitto RNA size it can be used for large-scale identification of theRNA targets for all RNAs Another web-based platformlncRNATargets [31] is also provided for lncRNA target predic-tion Because there is no limit to RNA size lncRNATargets canalso be used to identify the RNA targets of all RNAs In a whole

human transcriptome Terai et al [32] develop an integratedpipeline to predict lncRNAndashmRNA interactions for the first timeIn the pipeline IntaRNA [9] is used to calculate interactionenergy and RactIP [10] is used to predict joint secondary struc-ture Recently to further shorten the running time of predictinglncRNAndashmRNA interactions an ultrafast RNAndashRNA interactionprediction method RIblast [12] based on the seed-and-extensionmethod is presented The comparison results show that RIblastruns faster than RNAplex [8] IntaRNA [9] Terai et al pipeline[32] and thus can be applied to a large scale of lncRNA targetidentification

Expression-based method

At the gene expression level the co-expressed lncRNAndashmRNApairs are regarded as lncRNAndashmRNA interactions for theexpression-based methods Among the existing expression-based methods [16ndash19 33ndash39] Pearson correlation method is akey step of most methods to identify co-expressed lncRNAndashmRNA pairs

Liao et al [16] construct a lncRNAndashmRNA co-expression net-work from re-annotated mouse microarray data sets By usingPearson method they only keep the lncRNAndashmRNA pairs withPlt 001 and Pearson correlation ranked in the top or bottom005 percentile The study is the first large-scale prediction oflncRNA functions from a lncRNAndashmRNA co-expression networkTo identify immune-associated lncRNA biomarkers in OvCa Guoet al [17] make a comprehensive analysis of lncRNAndashmRNA co-expression patterns To identify lncRNAndashmRNA co-expressionpairs they calculate Pearson correlation between differentiallyexpressed lncRNAs and mRNAs They only reserve the lncRNAndashmRNA co-expression pairs with Pearson correlationgt 05 and thecorresponding False Discovery Rate (FDR)lt 001 Liu et al [33] andHuang et al [34] also use Pearson method to study lncRNAndashmRNAco-expression networks in human colorectal carcinoma andpneumonia respectively The inferred lncRNAndashmRNA co-expression networks will help to study lncRNA functionsRecently Du et al [18] propose a two-step method to conduct acomprehensive analysis of lncRNAndashmRNA co-expression patternsin thyroid cancer First they use Pearson method to calculatePearson correlation and the cutoff of Pearson correlation is 05and the corresponding FDR cutoff is 001 Second the Pearson cor-relations are transformed into an adjacency matrix

Owing to dynamic characteristic of gene regulatory net-works Wu et al [19] identify two distinct lncRNAndashmRNA co-expression networks in tumor and normal breast tissue Theyuse a generalized linear model to regress mRNA expression onlncRNA expression in tumor and normal breast tissue and onlyfocus on dynamic breast lncRNAndashmRNA co-expression pairsthat differ in tumor and normal breast tissue Meanwhile tostudy the potential role of lncRNAs in venous congestion Liet al [35] also construct a dynamic lncRNAndashmRNA co-expression network By using Pearson method they separatelycalculate Pearson correlations of each lncRNAndashmRNA pair invenous congestion and normal samples The lncRNAndashmRNApairs with Pearson correlationgt099 orlt099 and P-valuelt001are selected as lncRNAndashmRNA co-expression pairs They con-struct two types of lncRNAndashmRNA co-expression networkslsquolostrsquo network where lncRNAndashmRNA co-expression pairs onlyexisted in normal samples and lsquoobtainedrsquo network wherelncRNAndashmRNA co-expression pairs only existed in venous con-gestion samples The lsquolostrsquo and lsquoobtainedrsquo networks are furtherintegrated to obtain a dynamic lncRNAndashmRNA co-expressionnetwork

4 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

The above methods simply use matched lncRNA and mRNAexpression data to identify lncRNAndashmRNA co-expression pairsTo identify lsquocis-regulated target genesrsquo of lncRNAs some methodsalso consider mRNA loci information within lncRNA For exampleFu et al [36] combine mRNA loci information and matchedlncRNA and mRNA expression data to predict lncRNA targetsThey identify the mRNAs as targets under two conditions (i) themRNA loci are within a 300-kb window up- or downstream oflncRNA and (ii) lncRNAndashmRNA co-expression pairs are signifi-cantly positive correlated (Pearson correlationgt 08 and the corre-sponding P-valuelt 005) Zhang et al [37] also use a similarmethod to Fu et al [36] for identifying lncRNA targets The mRNAscan be regarded as targets when (1) the mRNA loci are within a10 window up- or downstream of lncRNA and (2) lncRNAndashmRNAco-expression pairs are significantly positive correlated (Pearsoncorrelationgt 098 and the corresponding P-valuelt 005)

Apart from mRNA loci information within lncRNA someemerging methods consider predictions from sequence-basedmethods as putative lncRNAndashmRNA interactions For exampleIwakiri et al [38] integrate tissue-specific lncRNA and mRNAexpression data into predictions from a sequence-basedmethod in [32] They discover that integrating tissue specificitycan improve prediction accuracy of lncRNAndashmRNA interactionsLv et al [39] also combine matched lncRNA and mRNA expres-sion data with predictions from a sequence-based methodLncTar [11] They first use Pearson method to identify co-expressed lncRNAndashmRNA co-expression pairs with Pearsoncorrelationgt095 orlt095 Then LncTar is used to further filterthe identified lncRNAndashmRNA co-expression pairs

Public databases for storing lncRNAndashmRNAregulatory relationships

In this section we review the public databases of storinglncRNAndashmRNA regulatory relationships Table 2 shows a sum-mary of the third-party public databases including experimen-tally validated and computationally predicted databases

NPInter [40] contains experimentally validated interactionsbetween ncRNAs especially lncRNAs and miRNAs The databasecontains 915 067 interactions in 188 tissues or cell lines from 68kinds of experimental technologies There is a classification ofthe functional interactions based on the functional process thatncRNA is involved in Moreover NPInter allows users to searchinteractions related publications and other information

LncRNADisease [41] not only collects experimentally sup-ported lncRNAndashdisease associations and lncRNA interactionsbut also predicts novel lncRNAndashdisease associations Recentlythe database curates 478 entries of experimentally validatedlncRNA interactions LncRNADisease provides users severalways to search lncRNA-related diseases and interactions

To study differentially expressed genes after lncRNA knock-down or overexpression Jiang et al [42] develop a databasecalled LncRNA2Target in human and mouse organisms Thedatabase has a collection of 396 experimentally validatedlncRNAndashtarget interactions In LncRNA2Target if a gene is dif-ferentially expressed after lncRNA knockdown or overexpres-sion it is regarded as a target of a lncRNA For convenienceLncRNA2Target allows users to search for the targets of singlelncRNA or for the lncRNAs that target a specific geneMeanwhile Zhou et al [43] also build a reference resourceLncReg for lncRNA-related regulatory networks The databasehas 1081 experimentally validated lncRNA-related regulatory

records between 258 nonredundant lncRNAs and 571 nonredun-dant genes

IRNdb [44] is a database that focuses on collecting immuno-logically relevant lncRNAndashtarget miRNAndashtarget and PIWI-interacting RNAndashtarget interactions The current version ofIRNdb documents 22 453 immunologically relevant lncRNAndashtar-get interactions by integrating three databases LncRNADisease[41] LncRNA2Target [42] and LncReg [43] The aim is to helpresearchers study the roles of ncRNAs in the immune systemRecently a new experimentally validated database namedlncRInter [45] was developed to collect reliable and high-qualitylncRNAndashtarget interactions The extracted lncRNAndashtarget inter-actions are all from published literature and are supported bycertain biological experiments (eg luciferase reporter assayin vitro binding assay RNA pull-down) In total lncRInter con-tains 1036 experimentally validated lncRNAndashtarget interactionsin 15 organisms

In addition to the experimentally validated databases pre-sented above there are several computationally predicted data-bases for collecting lncRNAndashmRNA interactions For instancestarBase [46] is a comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNA interaction net-works from 108 CLIP-Seq (PAR-CLIP HITS-CLIP iCLIP CLASH)data sets The lncRNAndashmRNA interactions can be extractedfrom proteinndashRNA interaction networks InCaNet [47] aimsto establish a comprehensive regulatory network betweenlncRNAs and cancer genes They identify lncRNAndashcancergene interactions by computing gene co-expression betweenlncRNAs and cancer genes BmncRNAdb [48] is a comprehensivedatabase of silkworm lncRNAs and miRNAs The database pro-vides three online tools for users to predict both lncRNAndashtargetand miRNAndashtarget interactions lncRNAtor [49] collect expres-sion data from 243 RNA-seq experiments including 5237samples of various tissues and developmental stages ThelncRNAndashmRNA co-expression pairs are identified through co-expression analysis of lncRNAs and mRNAs lncRNome [50] is acomprehensive knowledgebase of sequence structure biologi-cal functions genomic variations and epigenetic modificationson gt17 000 lncRNAs in human For lncRNAndashprotein interactionsthe database incorporates PAR-CLIP experiments and a supportvector machine-based prediction method Co-lncRNA [51] andLncRNA2Function [52] predict co-expressed lncRNAndashmRNAinteractions from RNA-Seq data and further annotates thepotential functions of human lncRNAs using functional enrich-ment analysis lncRNAMap [53] is an integrated and compre-hensive database to explore regulatory functions of humanlncRNAs By integrating small RNAs supported by publicly avail-able deep sequencing data lncRNAMap construct lncRNA-derived siRNAndashtarget interactions

In summary for experimentally validated databases userscan select individual database or combine several databases asground truth to validate the predicted lncRNAndashmRNA interac-tions As for computationally predicted databases they can beused as initial structural of sequence-based or expression-basedmethods to identify lncRNAndashmRNA interactions

Inferring and analyzing MSLCRN networksRepurposed microarray data across human cancers

We collect the repurposed lncRNA and mRNA expression dataof GBM LSCC OvCa and PrCa from [25] A lncRNA or mRNA iseliminated if it does not have a corresponding gene symbol in adata set By calculating average expression values of replicate

Module-specific lncRNA-mRNA causal regulatory networks | 5

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

lncRNAs and mRNAs we obtain unique expression value ofthese replicates Consequently we get the matched expressiondata of 9704 lncRNAs and 18 282 mRNAs in 451 GBM 113 LSCC585 OvCa and 150 PrCa samples

Pipeline of MSLCRN

As shown in Figure 2 MSLCRN contains the following threesteps to infer module-specific lncRNAndashmRNA causal regulatorynetworks

i Identification of lncRNAndashmRNA co-expression modulesGiven the matched lncRNA and mRNA expression data weuse WGCNA to generate gene co-expression modules Amodule containing at least two lncRNAs and two mRNAsare regarded as a lncRNAndashmRNA co-expression moduleand used as the input of the second step

ii Identification of module-specific lncRNAndashmRNA causal reg-ulatory networks For each lncRNAndashmRNA co-expressionmodule with each lncRNAndashmRNA pair we apply parallelIDA to estimate the causal effect of the lncRNA on the

Table 2 Public databases for storing lncRNAndashmRNA regulatory relationships

Databases Types of databases Brief descriptions Organisms Available

NPInter [40] Validated A database of experimentally verified func-tional interactions between ncRNAs(including lncRNAs miRNAs etc) and bio-molecules (proteins RNAs and DNAs)

22 organisms httpwwwbioinfoorgNPInter

LncRNADisease [41] Validated A database of experimentally supportedlncRNAndashdisease association data andlncRNAndashtarget interactions in various lev-els including protein RNA miRNA andDNA

Human httpwwwcuilabcnlncrnadisease

LncRNA2Target [42] Validated A database of lncRNAndashtarget regulatory rela-tionships experimentally validated bylncRNA knockdown or overexpression

Human mouse httpwwwlncrna2targetorg

LncReg [43] Validated A database of experimentally validatedlncRNAndashtarget interactions from publicliterature

7 organisms httpbioinformaticsustceducnlncreg

IRNdb [44] Validated A database of immunologically relevantncRNAs (miRNAs lncRNAs and otherncRNAs) and target genes

Human mouse httpcompbiomasseyacnzappsirndb

lncRInter [45] Validated A database of experimentally validatedlncRNAndashtarget interactions extracted frompeer-reviewed publications

15 organisms httpbioinfolifehusteducnlncRInter

starBase [46] Predicted A comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNAinteraction networks from 108 CLIP-Seq(PAR-CLIP HITS-CLIP iCLIP CLASH) datasets

Human httpstarbasesysueducn

lnCaNet [47] Predicted A database of establishing a comprehensiveregulatory network source for lncRNA andcancer genes

Human httplncanetbioinfo-minzhaoorg

BmncRNAdb [48] Predicted A comprehensive database of the silkwormlncRNAs and miRNAs as well as the threeonline tools for users to predict the targetgenes of lncRNAs or miRNAs

Bombyx mori httpgenecqueducnBmncRNAdbindexphp

lncRNAtor [49] Predicted A comprehensive resource of encompassingannotation sequence analysis geneexpression protein binding and phyloge-netic conservation

6 organisms httplncrnatorewhaackr

lncRNome [50] Predicted A comprehensive knowledgebase on thetypes chromosomal locations descriptionon the biological functions and diseaseassociations of lncRNAs

Human httpgenomeigibresinlncRNome

Co-LncRNA [51] Predicted A computationally predicted database toidentify GO annotations and KEGG path-ways affected by co-expressed protein-cod-ing genes of a single or multiple lncRNAs

Human httpwwwbio-bigdatacomCo-LncRNA

LncRNA2Function [52] Predicted A comprehensive resource of investigatingthe functions of lncRNAs based on co-expressed lncRNAndashmRNA interactions

Human httpmlghiteducnlncrna2function

lncRNAMap [53] Predicted An integrated and comprehensive databaseof regulatory functions of lncRNAs and act-ing as ceRNAs

Human httplncRNAMapmbcnctuedutw

6 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

mRNA We use the absolute value of the causal effect(AVCE) to evaluate the strength of the regulation of thelncRNA on the mRNA and a higher AVCE indicates a stron-ger lncRNA regulation The lncRNAndashmRNA pairs with highAVCEs in each module are considered as module-specificlncRNAndashmRNA causal regulatory relationships and we calleach module with these relationships identified a module-specific causal regulatory network

iii Identification of global lncRNAndashmRNA causal regulatorynetwork We integrate the module-specific lncRNAndashmRNA

causal regulatory networks to form the global lncRNAndashmRNA causal regulatory network

Identification of lncRNAndashmRNA co-expression modules

In systems biology WGCNA [21] is a popular method for findingthe correlation patterns among genes across samples and canbe used to identify clusters or modules of highly co-expressedgenes Therefore we use WGCNA to first infer lncRNAndashmRNAco-expression modules

Figure 2 The pipeline of MSLCRN First WGCNA is used to identify lncRNAndashmRNA co-expression modules from matched lncRNA and mRNA expression data Second

we infer lncRNAndashmRNA causal regulatory relationships in each module by using parallel IDA method For each module we assemble the identified lncRNAndashmRNA reg-

ulatory relationships to obtain a module-specific lncRNAndashmRNA causal regulatory network Third the module-specific lncRNAndashmRNA causal regulatory networks are

integrated to form a global lncRNAndashmRNA causal regulatory network

Module-specific lncRNA-mRNA causal regulatory networks | 7

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas

sij frac14 jcorethi jTHORNj (1)

where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]

To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined

wij frac14P

uaiuauj thorn aij

minfP

uaiuP

uaujg thorn 1 aij(2)

where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step

Identification of module-specific lncRNAndashmRNA causalregulatory networks

After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]

In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001

In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]

The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships

We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08

Validation survival and enrichment analysis

Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it

To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth

We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test

To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia

8 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks

We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows

p frac14 1 FethxjBNMTHORN frac14 1Xx1

ifrac140

N

i

B N

M i

B

M

(3)

In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005

Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers

By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively

To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]

Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks

In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs

Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa

Datasets Cutoffs Number of causalregulations

yfrac14axb R2

GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792

LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972

OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460

PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503

Note The AVCE cutoffs range from 010 to 060 with a step of 005

The bold values are the degree distributions of global lncRNA-mRNA causal reg-

ulatory networks with a compromised AVCE cutoff (045) in four human cancers

Module-specific lncRNA-mRNA causal regulatory networks | 9

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and

KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer

Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers

Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common

396 424

106 115

1370

133 173

1599

126

1729

1224

252

2523

2157

5081

0

2000

4000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

025

0050

0075

00

1000

0

Set Size

6 283

13 1 80 673

206

3969

76 3493

407

2816

9950

7

1948

7

0

250000

500000

750000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

0e+0

0

2e+0

5

4e+0

5

6e+0

5

8e+0

5

Set Size

6 1 2 933

2

41

6 11

246

47

524

0

200

400In

ters

ectio

n S

ize

PrCa

Ov Ca

LSCC

GBM

0200

400

600

Set Size

Module-specific genes

Module-specific causal regulations Module-specific hub lncRNAs 808611

A

B C

Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa

(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-

section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa

10 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers

As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely

connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers

The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa

Cancer-specific networks Causal regulations y=axb R2

GBM-specific 2816 y=4812x-1292 09774

LSCC-specific 99507 y=1034x-1014 09923

OvCa-specific 19487 y=7366x-1156 09723

PrCa-specific 808611 y=5243x-07055 08310

1 10 100 11001

10

100

1100

Degree of genes

Nu

mb

er o

f ge

nes

GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution

solutecation symporter activitysymporter activity

gated channel activitysodium ion transmembrane transporter activity

ion channel activitysubstrate-specific channel activity

channel activitypassive transmembrane transporter activity

growth factor activitycation channel activity

metal ion transmembrane transporter activitycollagen bindingheparin binding

sulfur compound bindingintegrin binding

glycosaminoglycan bindingextracellular matrix binding

peptide receptor activityG-protein coupled peptide receptor activity

chemokine bindingG-protein coupled receptor binding

growth factor bindingcytokine binding

serine-type endopeptidase activitydeath receptor activity

tumor necrosis factor-activated receptor activitycytokine receptor activity

protein heterodimerization activitydipeptidase activity

glycoprotein bindingRAGE receptor binding

cytokine receptor bindingcytokine activity

GBM(203)

LSCC(1015)

OvCa(479)

PrCa(3019)

001

002

003

004

padjust

GeneRatio002

004

006

008

Taste transduction

ECM-receptor interaction

cAMP signaling pathway

Calcium signaling pathway

Neuroactive ligand-receptor interaction

PI3K-Akt signaling pathway

Pathways in cancerRegulation of actin cytoskeleton

Complement and coagulation cascades

AGE-RAGE signaling pathway in diabetic complications

Hematopoietic cell lineage

Th17 cell differentiation

Inflammatory bowel disease (IBD)

Malaria

Osteoclast differentiation

Influenza A

Tuberculosis

Intestinal immune network for IgA production

Chagas disease (American trypanosomiasis)

Leishmaniasis

TNF signaling pathway

Toll-like receptor signaling pathway

Rheumatoid arthritis

Cytokine-cytokine receptor interaction

GBM(129)

LSCC(500)

OvCa(266)

PrCa(1364)

GeneRatio

005

010

015

001

002

003

004padjust

GO enrichment analysis KEGG enrichment analysis

A

B

Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA

causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa

Module-specific lncRNA-mRNA causal regulatory networks | 11

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)

By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks

Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)

To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples

Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar

Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)

Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that

occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core

lncRNAndashmRNA causal network

12 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as

lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations

A

B

Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs

Module-specific lncRNA-mRNA causal regulatory networks | 13

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

MSLCRN networks are biologically meaningful

In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks

We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases

Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful

Comparison with other PC-based networkinference methods

Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045

We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers

Conclusions and discussion

Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions

As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological

functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks

Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs

In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers

Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and

Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks

Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)

MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)

Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-

ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-

ciated MSLCRN networks

14 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs

Key Points

bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers

bull lncRNAs exhibit dynamic positive gene regulationacross human cancers

bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships

bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms

Supplementary Data

Supplementary data are available online at httpsacademicoupcombib

Funding

The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)

References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding

RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5

2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69

3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63

4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042

5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30

6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4

7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82

8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63

9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56

10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6

11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12

12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74

13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89

14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22

15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156

16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78

17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683

18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15

19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731

20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13

21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559

22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64

23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8

24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526

25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13

26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3

27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82

28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19

29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46

Module-specific lncRNA-mRNA causal regulatory networks | 15

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60

31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016

32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12

33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892

34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408

35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51

36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32

37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305

38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15

39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99

40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057

41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6

42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6

43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083

44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138

45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8

46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7

47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7

48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370

49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5

50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034

51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082

52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2

53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9

54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17

55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000

56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000

57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1

58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6

59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910

60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000

61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7

62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9

63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30

64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5

65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765

66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9

67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40

68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46

69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52

70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94

71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88

16 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2
Page 2: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

Inferring and analyzing module-specific lncRNAndash

mRNA causal regulatory networks in human cancerJunpeng Zhang Thuc Duy Le Lin Liu and Jiuyong LiCorresponding authors Junpeng Zhang School of Engineering Dali University Dali Yunnan 671003 Public Republic of China Tel thorn86 872 2219799 Faxthorn86 872 2219799 E-mail zhangjunpeng_411yahoocom Jiuyong Li School of Information Technology and Mathematical Sciences University of SouthAustralia Mawson Lakes 5095 SA Australia Tel thorn61 8 830 23898 Fax thorn61 8 830 23381 E-mail jiuyongliunisaeduau

Abstract

It is known that noncoding RNAs (ncRNAs) cover 98 of the transcriptome but do not encode proteins Among ncRNAslong noncoding RNAs (lncRNAs) are a large and diverse class of RNA molecules and are thought to be a gold mine of poten-tial oncogenes anti-oncogenes and new biomarkers Although only a minority of lncRNAs is functionally characterized it isclear that they are important regulators to modulate gene expression and involve in many biological functions To revealthe functions and regulatory mechanisms of lncRNAs it is vital to understand how lncRNAs regulate their target genes forimplementing specific biological functions In this article we review the computational methods for inferring lncRNAndashmRNA interactions and the third-party databases of storing lncRNAndashmRNA regulatory relationships We have found thatthe existing methods are based on statistical correlations between the gene expression levels of lncRNAs and mRNAs andmay not reveal gene regulatory relationships which are causal relationships Moreover these methods do not consider themodularity of lncRNAndashmRNA regulatory networks and thus the networks identified are not module-specific To addressthe above two issues we propose a novel method MSLCRN to infer and analyze module-specific lncRNAndashmRNA causal reg-ulatory networks We have applied it into glioblastoma multiforme lung squamous cell carcinoma ovarian cancer andprostate cancer respectively The experimental results show that MSLCRN as an expression-based method could be a use-ful complementary method to study lncRNA regulations

Key words lncRNA mRNA lncRNAndashmRNA co-expression lncRNAndashmRNA interaction lncRNAndashmRNA causal relationshiphuman cancer

Junpeng Zhang is an associate professor at the School of Engineering Dali University He received his BSc (2009) in Bio-medical Engineering and MSc(2012) in Control Theory and Control Engineering from Kunming University of Science and Technology Kunming City China His research interestsinclude bioinformatics and data miningThuc Duy Le is a research fellow at the University of South Australia (UniSA) He received his BSc (2002) and MSc (2006) in pure Mathematics from theUniversity of Pedagogy Ho Chi Minh City Vietnam and BSc (2010) in Computer Science from UniSA He received his PhD degree in Computer Science(Bioinformatics) in 2014 at UniSA His research interests are bioinformatics data mining and machine learningLin Liu is a senior lecturer at the School of Information Technology and Mathematical Sciences University of South Australia (UniSA) She received herbachelor and master degrees in Electronic Engineering from Xidian University China in 1991 and 1994 respectively and her PhD degree in computer sys-tems engineering from UniSA in 2006 Her research interests include data mining and bioinformatics as well as Petri nets and their applications to proto-col verification and network security analysisJiuyong Li is a professor at the School of Information Technology and Mathematical Sciences University of South Australia He received his PhD degree incomputer science from the Griffith University Australia (2002) His research interests are in the fields of data mining privacy preserving and bioinfor-matics His research has been supported by five prestigious Australian Research Council Discovery grants since 2005 and he has published more than 100research papersSubmitted 13 December 2017 Received (in revised form) 8 January 2018

VC The Author(s) 2018 Published by Oxford University Press All rights reservedFor Permissions please email journalspermissionsoupcom

1

Briefings in Bioinformatics 2018 1ndash17

doi 101093bibbby008Paper

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Introduction

Long noncoding RNAs (lncRNAs) are non-protein coding tran-scripts with gt200 nucleotides in length Unlike small noncodingRNAs (sncRNAs) lncRNAs generally exhibit low sequence con-servation However owing to rapidly adaptive selection pres-sures the low conservation of lncRNAs (such as Air and Xist)does not indicate absence of function [1] Similar to microRNAs(miRNAs) an important class of sncRNAs evidence has shownthat lncRNAs play important roles in a wide range of biologicalprocesses even in cancers [2 3] Despite the importance oflncRNAs in many physiological and pathological processes alarge number of lncRNAs remain to be functionally character-ized For this reason the number of studies on lncRNA researchhas been increased exponentially in the past decade (as shownin Figure 1)

To achieve various biological functions lncRNAs form generegulatory networks by interacting with other biological mole-cules such as transcription factors miRNAs messenger RNAs(mRNAs) and RNA-binding proteins [4] Among these biologicalmolecules interacting with lncRNAs mRNAs are the most popu-lar ones By regulating the transcription and translation ofmRNAs lncRNAs could get involved in several vital biologicalprocesses such as cell differentiation cell proliferation andcytoprotective programs [5] Therefore the identification oflncRNAndashmRNA regulatory networks would help to uncoverfunctions and regulatory mechanisms of lncRNAs

A straightforward method for identifying lncRNAndashmRNAregulatory networks is sequence-based complementary basepairing To predict lncRNA targets several sequence-basedmethods such as GUUGle [6] RNAup [7] RNAplex [8] IntaRNA[9] RactIP [10] LncTar [11] and RIblast [12] have been devel-oped Owing to the long sequence and complex tertiary struc-ture of each lncRNA the computational costs of predictinglarge-scale lncRNAndashmRNA regulatory relationships are usuallyhigh Moreover these sequence-based methods only considerthe sequence information of lncRNAs and target mRNAs andthus the predicted lncRNAndashmRNA regulatory networks arestatic However previous studies [13ndash15] have shown thatlncRNAs exhibit condition-specific expression fashion anddynamic networks of gene regulation To identify dynamic orcondition-specific lncRNAndashmRNA regulatory networks it is nec-essary to use expression data Some expression-based methods[16ndash19] for predicting co-expressed lncRNAndashmRNA networks

have been proposed However as the predictions are based onstatistical associations found in gene expression levels onlythey may not represent the real lsquocausalrsquo lncRNAndashmRNA regula-tory relationships Furthermore the existing expression-basedmethods do not consider the modularity of lncRNAndashmRNA reg-ulatory networks an important feature of gene regulatory net-works [20]

In this article we first review the computational methodsfor inferring lncRNAndashmRNA interactions and the public data-bases for storing lncRNAndashmRNA regulatory relationshipsSecond we propose a novel method to infer Module-SpecificLncRNAndashmRNA Causal Regulatory Network (thus the proposedmethod is called MSLCRN) In the first step by consideringmodularity of networks MSLCRN uses Weighted Gene Co-expression Network Analysis (WGCNA) [21] to identify lncRNAndashmRNA co-expression modules In each module the lncRNAsand mRNAs are regarded as module-specific genes In the sec-ond step MSLCRN uses a causal inference method named inter-vention calculus when the directed acyclic graph (DAG) isabsent (IDA) [22 23] to estimate the causal effects of possiblelncRNAndashmRNA causal pairs in each module To speed up theestimation the parallelized version of IDA [24] is used to calcu-late the causal effects For each module the noncausal lncRNAndashmRNA pairs are eliminated and the retained lncRNAndashmRNAcausal pairs are further assembled to generate a module-specific lncRNAndashmRNA causal network To obtain a globallncRNAndashmRNA causal regulatory network we further integratethe identified module-specific lncRNAndashmRNA causal networksin the third step

To evaluate MSLCRN we have applied it into four humancancer data sets including glioblastoma multiforme (GBM) lungsquamous cell carcinoma (LSCC) ovarian cancer (OvCa) andprostate cancer (PrCa) from [25] The validation survival andenrichment analysis results show that the proposed methodcan help with revealing the functions and regulatory mecha-nisms of lncRNAs MSLCRN is released under the GPL-30License and is freely available through GitHub (httpsgithubcomzhangjunpeng411MSLCRN)

Computational methods for inferringlncRNAndashmRNA interactions

In this section we review the computational approaches forinferring lncRNAndashmRNA interactions In Table 1 we divide themethods into two categories (1) sequence-based method and(2) expression-based method We will separately review thesemethods as follows

Sequence-based method

The common characteristic of the sequence-based methods isthat the identification of RNAndashRNA interactions depends onRNA binding energy between two RNA molecules To evaluatethe strength of RNA binding energy a number of energy models[6ndash12 26ndash32] are proposed to predict RNAndashRNA interactions

Gerlach and Giegerich [6] propose a utility program GUUGlefor locating potential helical regions under RNA complementarybase pairs rules The method can be effectively used as a filterfor noncoding RNA (ncRNA) target prediction However the reli-able prediction of RNAndashRNA binding energies is also importantfor the identification of RNAndashRNA interactions To study thethermodynamics of RNAndashRNA interactions Muckstein et al [7]present an extension of the standard partition function methodcalled RNAup to RNA secondary structures By comparing

129 141 164 240 338

639

991

1419

2068

2596

0

500

1000

1500

2000

2500

3000

Year

Num

ber

of p

ublic

atio

ns

Figure 1 The number of lncRNA-related publications in the past decade The

number of queried publications is obtained from PubMed library with keyword

lsquolncRNArsquo

2 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Table 1 Summary of computational methods or tools for inferring lncRNAndashmRNA interactions

Methodstools Categories of methods Brief descriptions Available

GUUGle [6] Sequence-based Target prediction by locating potential helical regions of RNAndashRNA pairs under RNA base pairing rules which include G-Ubases

httpbibiserv2cebitecuni-bielefelddeguugle

RNAup [7] Sequence-based Target prediction by studying thermodynamics of RNAndashRNApairs based on the sum of the energy of binding andhybridization

httprnatbiunivieacatcgi-binRNAWebSuiteRNAupcgi

RNAcofold [26] Sequence-based Target prediction by computing the hybridization energy andbase pairing pattern of RNAndashRNA pairs

httprnatbiunivieacatcgi-binRNAWebSuiteRNAcofoldcgi

Alkan et al [27] Sequence-based Target prediction by minimizing the joint free energy of RNAndashRNA pairs under a number of energy models including basepair energy model stacked pair energy model loop energymodel

On request

RNAplex [8] Sequence-based Target prediction by finding possible hybridization sites ofRNAndashRNA pairs

httpwwwtbiunivieacathtafer

IntaRNA [9] Sequence-based Target prediction by incorporating accessibility of target sitesas well as the existence of a user-definable seed

httprnainformatikuni-freiburgdeIntaRNAInputjsp

RactIP [10] Sequence-based Target prediction by integrating approximate information onan ensemble of equilibrium joint structures into the objec-tive function of integer programming

httprtipsdnabiokeioacjpractip

PETcofold [28] Sequence-based Target prediction by taking covariance information in intra-molecular and intermolecular base pairs into account

httprthdkresourcespetcofold

RIsearch [29] Sequence-based Target prediction by implementing a simplified Turner energymodel for fast computation of hybridization

httpsrthdkresourcesrisearchrisearch1php

RIsearch2 [30] Sequence-based An updated version of RIsearch and predict targets using a sin-gle integrated seed-and-extend framework based on suffixarrays

httpsrthdkresourcesrisearch

LncTar [11] Sequence-based lncRNA target prediction by finding the minimum free energyjoint structure of RNAndashRNA pairs based on base pairing

httpwwwcuilabcnlnctar

lncRNATargets [31] Sequence-based lncRNA target prediction based on nucleic acidthermodynamics

httpwwwherbbolorg 8001lrt

Terai et al [32] Sequence-based lncRNA target prediction by developing an integrated pipelineon the K computer which is one of the fastest super-com-puters in the world

httprtoolscbrcjpcgi-binRNARNAindexpl

RIblast [12] Sequence-based Target prediction based on the seed-and-extension approach httpgithubcomfukunagatsuRIblast

Liao et al [16] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodand the identified lncRNAndashmRNA interactions should be co-expressed in the same direction in no less than 3 Mousemicroarray data sets

On request

Guo et al [17] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodin OvCa malignant progression

On request

Du et al [18] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodand a power function in thyroid cancer

On request

Liu et al [33] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodin human colorectal carcinoma

On request

Huang et al [34] Expression-based Identify lncRNAndashmRNA interactions associated with pneumo-nia by using Pearson method

On request

Li et al [35] Expression-based Identify dynamic lncRNAndashmRNA interactions associated withvenous congestion by using Pearson method

On request

Wu et al [19] Expression-based Identify lncRNAndashmRNA interactions by using a generalized lin-ear model to regress mRNA expression on lncRNA expres-sion in breast cancer

On request

Fu et al [36] Expression-based Identify lncRNAndashmRNA interactions by considering mRNA lociwithin lncRNA and the Pearson correlation in cartilage

On request

Zhang et al [37] Expression-based Identify lncRNAndashmRNA interactions by considering mRNA lociwithin lncRNA and the Pearson correlation in cartilage inperipheral blood mononuclear cells

On request

Iwakiri et al [38] Expression-based Identify tissue-specific lncRNAndashmRNA interactions by integrat-ing the tissue specificity of lncRNAs and mRNAs intosequence-based prediction of human lncRNAndashRNAinteractions

On request

Lv et al [39] Expression-based Identify tissue-specific lncRNAndashmRNA interactions by usingPearson and sequence-based methods and in human intra-hepatic cholangiocarcinoma

On request

Module-specific lncRNA-mRNA causal regulatory networks | 3

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

predicted free energies of binding with RNA interference experi-mental data RNAup can produce biologically reasonableresults For genome-wide predictions of ncRNA targets RNAupis not fast enough Therefore it is usually to be combined withother faster RNAndashRNA prediction methods

To extend the standard dynamic programming algorithmsfor computing RNA secondary structures Bernhart et al [26]propose a program named RNAcofold to compute the hybridiza-tion energy and base pairing pattern of the co-folding of twoRNA molecules However the method disregards some impor-tant interaction structures and is restricted to dimeric com-plexes Moreover for the RNAndashRNA interaction predictionpredicting the joint secondary structure of two interacting RNAsis also important To solve it Alkan et al [27] develop severalalgorithms to minimize the joint free energy between the twoRNAs under a number of energy models Assuming that con-served RNAndashRNA interactions imply conserved functionSeemann et al [28] also implement a comparative method calledPETcofold to predict the joint secondary structure of two inter-acting RNAs As PETcofold considers sequence conservation anincreasing amount of structural covariance can further improveits performance

RNAup [7] and RNAcofold [26] are too slow for genome-widesearch in finding target sites of ncRNAs To accelerate the speedof RNAndashRNA interaction predictions RNAplex [8] is presented toquickly find possible hybridization sites between two interact-ing RNAs To focus on the target search on short highly stableinteractions RNAplex introduces a per nucleotide penaltyMeanwhile another general and fast approach IntaRNA [9] isproposed to efficiently predict bacterial RNAndashRNA interactionsCompared with other existing target prediction methodsIntaRNA considers both the accessibility of target sites and theexistence of a user-defined seed Therefore it shows a higheraccuracy than competing methods Kato et al [10] also present afast and accurate prediction method RactIP for comprehensivetype of RNAndashRNA interactions In terms of predicting joint sec-ondary structures of two interacting RNAs RactIP run incompa-rably faster than competitive programs

To further achieve a speed improvement of predictingRNAndashRNA interactions Wenzel et al [29] present RIsearch forfast computation of hybridization between two interactingRNAs They show that the energy model of RIsearch is anaccurate approximation of the full energy model for near-complementary RNAndashRNA duplexes Furthermore RIsearch isfaster than RNAplex [8] in RNAndashRNA interaction searchRecently RIsearch2 [30] an updated version of RIsearch [29] isproposed to localize potential near-complementary RNAndashRNAinteractions between two RNA sequences The comparisonresults show that RIsearch2 is much faster than the previousmethods such as GUUGle [6] RNAplex [8] IntaRNA [9] andRIsearch [29]

Although the above RNAndashRNA interaction prediction meth-ods can be extended to predict lncRNAndashmRNA interactionsnone of them are exclusively used for identifying the RNA tar-gets of lncRNAs in a large scale To efficiently identify lncRNAndashmRNA interactions Li et al [11] propose a tool named LncTarLncTar explores lncRNAndashmRNA interactions by finding the min-imum free energy joint structure of two interacting RNAs basedon base pairing As LncTar runs fast and does not have a limitto RNA size it can be used for large-scale identification of theRNA targets for all RNAs Another web-based platformlncRNATargets [31] is also provided for lncRNA target predic-tion Because there is no limit to RNA size lncRNATargets canalso be used to identify the RNA targets of all RNAs In a whole

human transcriptome Terai et al [32] develop an integratedpipeline to predict lncRNAndashmRNA interactions for the first timeIn the pipeline IntaRNA [9] is used to calculate interactionenergy and RactIP [10] is used to predict joint secondary struc-ture Recently to further shorten the running time of predictinglncRNAndashmRNA interactions an ultrafast RNAndashRNA interactionprediction method RIblast [12] based on the seed-and-extensionmethod is presented The comparison results show that RIblastruns faster than RNAplex [8] IntaRNA [9] Terai et al pipeline[32] and thus can be applied to a large scale of lncRNA targetidentification

Expression-based method

At the gene expression level the co-expressed lncRNAndashmRNApairs are regarded as lncRNAndashmRNA interactions for theexpression-based methods Among the existing expression-based methods [16ndash19 33ndash39] Pearson correlation method is akey step of most methods to identify co-expressed lncRNAndashmRNA pairs

Liao et al [16] construct a lncRNAndashmRNA co-expression net-work from re-annotated mouse microarray data sets By usingPearson method they only keep the lncRNAndashmRNA pairs withPlt 001 and Pearson correlation ranked in the top or bottom005 percentile The study is the first large-scale prediction oflncRNA functions from a lncRNAndashmRNA co-expression networkTo identify immune-associated lncRNA biomarkers in OvCa Guoet al [17] make a comprehensive analysis of lncRNAndashmRNA co-expression patterns To identify lncRNAndashmRNA co-expressionpairs they calculate Pearson correlation between differentiallyexpressed lncRNAs and mRNAs They only reserve the lncRNAndashmRNA co-expression pairs with Pearson correlationgt 05 and thecorresponding False Discovery Rate (FDR)lt 001 Liu et al [33] andHuang et al [34] also use Pearson method to study lncRNAndashmRNAco-expression networks in human colorectal carcinoma andpneumonia respectively The inferred lncRNAndashmRNA co-expression networks will help to study lncRNA functionsRecently Du et al [18] propose a two-step method to conduct acomprehensive analysis of lncRNAndashmRNA co-expression patternsin thyroid cancer First they use Pearson method to calculatePearson correlation and the cutoff of Pearson correlation is 05and the corresponding FDR cutoff is 001 Second the Pearson cor-relations are transformed into an adjacency matrix

Owing to dynamic characteristic of gene regulatory net-works Wu et al [19] identify two distinct lncRNAndashmRNA co-expression networks in tumor and normal breast tissue Theyuse a generalized linear model to regress mRNA expression onlncRNA expression in tumor and normal breast tissue and onlyfocus on dynamic breast lncRNAndashmRNA co-expression pairsthat differ in tumor and normal breast tissue Meanwhile tostudy the potential role of lncRNAs in venous congestion Liet al [35] also construct a dynamic lncRNAndashmRNA co-expression network By using Pearson method they separatelycalculate Pearson correlations of each lncRNAndashmRNA pair invenous congestion and normal samples The lncRNAndashmRNApairs with Pearson correlationgt099 orlt099 and P-valuelt001are selected as lncRNAndashmRNA co-expression pairs They con-struct two types of lncRNAndashmRNA co-expression networkslsquolostrsquo network where lncRNAndashmRNA co-expression pairs onlyexisted in normal samples and lsquoobtainedrsquo network wherelncRNAndashmRNA co-expression pairs only existed in venous con-gestion samples The lsquolostrsquo and lsquoobtainedrsquo networks are furtherintegrated to obtain a dynamic lncRNAndashmRNA co-expressionnetwork

4 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

The above methods simply use matched lncRNA and mRNAexpression data to identify lncRNAndashmRNA co-expression pairsTo identify lsquocis-regulated target genesrsquo of lncRNAs some methodsalso consider mRNA loci information within lncRNA For exampleFu et al [36] combine mRNA loci information and matchedlncRNA and mRNA expression data to predict lncRNA targetsThey identify the mRNAs as targets under two conditions (i) themRNA loci are within a 300-kb window up- or downstream oflncRNA and (ii) lncRNAndashmRNA co-expression pairs are signifi-cantly positive correlated (Pearson correlationgt 08 and the corre-sponding P-valuelt 005) Zhang et al [37] also use a similarmethod to Fu et al [36] for identifying lncRNA targets The mRNAscan be regarded as targets when (1) the mRNA loci are within a10 window up- or downstream of lncRNA and (2) lncRNAndashmRNAco-expression pairs are significantly positive correlated (Pearsoncorrelationgt 098 and the corresponding P-valuelt 005)

Apart from mRNA loci information within lncRNA someemerging methods consider predictions from sequence-basedmethods as putative lncRNAndashmRNA interactions For exampleIwakiri et al [38] integrate tissue-specific lncRNA and mRNAexpression data into predictions from a sequence-basedmethod in [32] They discover that integrating tissue specificitycan improve prediction accuracy of lncRNAndashmRNA interactionsLv et al [39] also combine matched lncRNA and mRNA expres-sion data with predictions from a sequence-based methodLncTar [11] They first use Pearson method to identify co-expressed lncRNAndashmRNA co-expression pairs with Pearsoncorrelationgt095 orlt095 Then LncTar is used to further filterthe identified lncRNAndashmRNA co-expression pairs

Public databases for storing lncRNAndashmRNAregulatory relationships

In this section we review the public databases of storinglncRNAndashmRNA regulatory relationships Table 2 shows a sum-mary of the third-party public databases including experimen-tally validated and computationally predicted databases

NPInter [40] contains experimentally validated interactionsbetween ncRNAs especially lncRNAs and miRNAs The databasecontains 915 067 interactions in 188 tissues or cell lines from 68kinds of experimental technologies There is a classification ofthe functional interactions based on the functional process thatncRNA is involved in Moreover NPInter allows users to searchinteractions related publications and other information

LncRNADisease [41] not only collects experimentally sup-ported lncRNAndashdisease associations and lncRNA interactionsbut also predicts novel lncRNAndashdisease associations Recentlythe database curates 478 entries of experimentally validatedlncRNA interactions LncRNADisease provides users severalways to search lncRNA-related diseases and interactions

To study differentially expressed genes after lncRNA knock-down or overexpression Jiang et al [42] develop a databasecalled LncRNA2Target in human and mouse organisms Thedatabase has a collection of 396 experimentally validatedlncRNAndashtarget interactions In LncRNA2Target if a gene is dif-ferentially expressed after lncRNA knockdown or overexpres-sion it is regarded as a target of a lncRNA For convenienceLncRNA2Target allows users to search for the targets of singlelncRNA or for the lncRNAs that target a specific geneMeanwhile Zhou et al [43] also build a reference resourceLncReg for lncRNA-related regulatory networks The databasehas 1081 experimentally validated lncRNA-related regulatory

records between 258 nonredundant lncRNAs and 571 nonredun-dant genes

IRNdb [44] is a database that focuses on collecting immuno-logically relevant lncRNAndashtarget miRNAndashtarget and PIWI-interacting RNAndashtarget interactions The current version ofIRNdb documents 22 453 immunologically relevant lncRNAndashtar-get interactions by integrating three databases LncRNADisease[41] LncRNA2Target [42] and LncReg [43] The aim is to helpresearchers study the roles of ncRNAs in the immune systemRecently a new experimentally validated database namedlncRInter [45] was developed to collect reliable and high-qualitylncRNAndashtarget interactions The extracted lncRNAndashtarget inter-actions are all from published literature and are supported bycertain biological experiments (eg luciferase reporter assayin vitro binding assay RNA pull-down) In total lncRInter con-tains 1036 experimentally validated lncRNAndashtarget interactionsin 15 organisms

In addition to the experimentally validated databases pre-sented above there are several computationally predicted data-bases for collecting lncRNAndashmRNA interactions For instancestarBase [46] is a comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNA interaction net-works from 108 CLIP-Seq (PAR-CLIP HITS-CLIP iCLIP CLASH)data sets The lncRNAndashmRNA interactions can be extractedfrom proteinndashRNA interaction networks InCaNet [47] aimsto establish a comprehensive regulatory network betweenlncRNAs and cancer genes They identify lncRNAndashcancergene interactions by computing gene co-expression betweenlncRNAs and cancer genes BmncRNAdb [48] is a comprehensivedatabase of silkworm lncRNAs and miRNAs The database pro-vides three online tools for users to predict both lncRNAndashtargetand miRNAndashtarget interactions lncRNAtor [49] collect expres-sion data from 243 RNA-seq experiments including 5237samples of various tissues and developmental stages ThelncRNAndashmRNA co-expression pairs are identified through co-expression analysis of lncRNAs and mRNAs lncRNome [50] is acomprehensive knowledgebase of sequence structure biologi-cal functions genomic variations and epigenetic modificationson gt17 000 lncRNAs in human For lncRNAndashprotein interactionsthe database incorporates PAR-CLIP experiments and a supportvector machine-based prediction method Co-lncRNA [51] andLncRNA2Function [52] predict co-expressed lncRNAndashmRNAinteractions from RNA-Seq data and further annotates thepotential functions of human lncRNAs using functional enrich-ment analysis lncRNAMap [53] is an integrated and compre-hensive database to explore regulatory functions of humanlncRNAs By integrating small RNAs supported by publicly avail-able deep sequencing data lncRNAMap construct lncRNA-derived siRNAndashtarget interactions

In summary for experimentally validated databases userscan select individual database or combine several databases asground truth to validate the predicted lncRNAndashmRNA interac-tions As for computationally predicted databases they can beused as initial structural of sequence-based or expression-basedmethods to identify lncRNAndashmRNA interactions

Inferring and analyzing MSLCRN networksRepurposed microarray data across human cancers

We collect the repurposed lncRNA and mRNA expression dataof GBM LSCC OvCa and PrCa from [25] A lncRNA or mRNA iseliminated if it does not have a corresponding gene symbol in adata set By calculating average expression values of replicate

Module-specific lncRNA-mRNA causal regulatory networks | 5

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

lncRNAs and mRNAs we obtain unique expression value ofthese replicates Consequently we get the matched expressiondata of 9704 lncRNAs and 18 282 mRNAs in 451 GBM 113 LSCC585 OvCa and 150 PrCa samples

Pipeline of MSLCRN

As shown in Figure 2 MSLCRN contains the following threesteps to infer module-specific lncRNAndashmRNA causal regulatorynetworks

i Identification of lncRNAndashmRNA co-expression modulesGiven the matched lncRNA and mRNA expression data weuse WGCNA to generate gene co-expression modules Amodule containing at least two lncRNAs and two mRNAsare regarded as a lncRNAndashmRNA co-expression moduleand used as the input of the second step

ii Identification of module-specific lncRNAndashmRNA causal reg-ulatory networks For each lncRNAndashmRNA co-expressionmodule with each lncRNAndashmRNA pair we apply parallelIDA to estimate the causal effect of the lncRNA on the

Table 2 Public databases for storing lncRNAndashmRNA regulatory relationships

Databases Types of databases Brief descriptions Organisms Available

NPInter [40] Validated A database of experimentally verified func-tional interactions between ncRNAs(including lncRNAs miRNAs etc) and bio-molecules (proteins RNAs and DNAs)

22 organisms httpwwwbioinfoorgNPInter

LncRNADisease [41] Validated A database of experimentally supportedlncRNAndashdisease association data andlncRNAndashtarget interactions in various lev-els including protein RNA miRNA andDNA

Human httpwwwcuilabcnlncrnadisease

LncRNA2Target [42] Validated A database of lncRNAndashtarget regulatory rela-tionships experimentally validated bylncRNA knockdown or overexpression

Human mouse httpwwwlncrna2targetorg

LncReg [43] Validated A database of experimentally validatedlncRNAndashtarget interactions from publicliterature

7 organisms httpbioinformaticsustceducnlncreg

IRNdb [44] Validated A database of immunologically relevantncRNAs (miRNAs lncRNAs and otherncRNAs) and target genes

Human mouse httpcompbiomasseyacnzappsirndb

lncRInter [45] Validated A database of experimentally validatedlncRNAndashtarget interactions extracted frompeer-reviewed publications

15 organisms httpbioinfolifehusteducnlncRInter

starBase [46] Predicted A comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNAinteraction networks from 108 CLIP-Seq(PAR-CLIP HITS-CLIP iCLIP CLASH) datasets

Human httpstarbasesysueducn

lnCaNet [47] Predicted A database of establishing a comprehensiveregulatory network source for lncRNA andcancer genes

Human httplncanetbioinfo-minzhaoorg

BmncRNAdb [48] Predicted A comprehensive database of the silkwormlncRNAs and miRNAs as well as the threeonline tools for users to predict the targetgenes of lncRNAs or miRNAs

Bombyx mori httpgenecqueducnBmncRNAdbindexphp

lncRNAtor [49] Predicted A comprehensive resource of encompassingannotation sequence analysis geneexpression protein binding and phyloge-netic conservation

6 organisms httplncrnatorewhaackr

lncRNome [50] Predicted A comprehensive knowledgebase on thetypes chromosomal locations descriptionon the biological functions and diseaseassociations of lncRNAs

Human httpgenomeigibresinlncRNome

Co-LncRNA [51] Predicted A computationally predicted database toidentify GO annotations and KEGG path-ways affected by co-expressed protein-cod-ing genes of a single or multiple lncRNAs

Human httpwwwbio-bigdatacomCo-LncRNA

LncRNA2Function [52] Predicted A comprehensive resource of investigatingthe functions of lncRNAs based on co-expressed lncRNAndashmRNA interactions

Human httpmlghiteducnlncrna2function

lncRNAMap [53] Predicted An integrated and comprehensive databaseof regulatory functions of lncRNAs and act-ing as ceRNAs

Human httplncRNAMapmbcnctuedutw

6 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

mRNA We use the absolute value of the causal effect(AVCE) to evaluate the strength of the regulation of thelncRNA on the mRNA and a higher AVCE indicates a stron-ger lncRNA regulation The lncRNAndashmRNA pairs with highAVCEs in each module are considered as module-specificlncRNAndashmRNA causal regulatory relationships and we calleach module with these relationships identified a module-specific causal regulatory network

iii Identification of global lncRNAndashmRNA causal regulatorynetwork We integrate the module-specific lncRNAndashmRNA

causal regulatory networks to form the global lncRNAndashmRNA causal regulatory network

Identification of lncRNAndashmRNA co-expression modules

In systems biology WGCNA [21] is a popular method for findingthe correlation patterns among genes across samples and canbe used to identify clusters or modules of highly co-expressedgenes Therefore we use WGCNA to first infer lncRNAndashmRNAco-expression modules

Figure 2 The pipeline of MSLCRN First WGCNA is used to identify lncRNAndashmRNA co-expression modules from matched lncRNA and mRNA expression data Second

we infer lncRNAndashmRNA causal regulatory relationships in each module by using parallel IDA method For each module we assemble the identified lncRNAndashmRNA reg-

ulatory relationships to obtain a module-specific lncRNAndashmRNA causal regulatory network Third the module-specific lncRNAndashmRNA causal regulatory networks are

integrated to form a global lncRNAndashmRNA causal regulatory network

Module-specific lncRNA-mRNA causal regulatory networks | 7

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas

sij frac14 jcorethi jTHORNj (1)

where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]

To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined

wij frac14P

uaiuauj thorn aij

minfP

uaiuP

uaujg thorn 1 aij(2)

where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step

Identification of module-specific lncRNAndashmRNA causalregulatory networks

After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]

In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001

In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]

The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships

We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08

Validation survival and enrichment analysis

Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it

To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth

We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test

To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia

8 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks

We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows

p frac14 1 FethxjBNMTHORN frac14 1Xx1

ifrac140

N

i

B N

M i

B

M

(3)

In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005

Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers

By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively

To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]

Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks

In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs

Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa

Datasets Cutoffs Number of causalregulations

yfrac14axb R2

GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792

LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972

OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460

PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503

Note The AVCE cutoffs range from 010 to 060 with a step of 005

The bold values are the degree distributions of global lncRNA-mRNA causal reg-

ulatory networks with a compromised AVCE cutoff (045) in four human cancers

Module-specific lncRNA-mRNA causal regulatory networks | 9

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and

KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer

Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers

Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common

396 424

106 115

1370

133 173

1599

126

1729

1224

252

2523

2157

5081

0

2000

4000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

025

0050

0075

00

1000

0

Set Size

6 283

13 1 80 673

206

3969

76 3493

407

2816

9950

7

1948

7

0

250000

500000

750000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

0e+0

0

2e+0

5

4e+0

5

6e+0

5

8e+0

5

Set Size

6 1 2 933

2

41

6 11

246

47

524

0

200

400In

ters

ectio

n S

ize

PrCa

Ov Ca

LSCC

GBM

0200

400

600

Set Size

Module-specific genes

Module-specific causal regulations Module-specific hub lncRNAs 808611

A

B C

Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa

(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-

section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa

10 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers

As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely

connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers

The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa

Cancer-specific networks Causal regulations y=axb R2

GBM-specific 2816 y=4812x-1292 09774

LSCC-specific 99507 y=1034x-1014 09923

OvCa-specific 19487 y=7366x-1156 09723

PrCa-specific 808611 y=5243x-07055 08310

1 10 100 11001

10

100

1100

Degree of genes

Nu

mb

er o

f ge

nes

GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution

solutecation symporter activitysymporter activity

gated channel activitysodium ion transmembrane transporter activity

ion channel activitysubstrate-specific channel activity

channel activitypassive transmembrane transporter activity

growth factor activitycation channel activity

metal ion transmembrane transporter activitycollagen bindingheparin binding

sulfur compound bindingintegrin binding

glycosaminoglycan bindingextracellular matrix binding

peptide receptor activityG-protein coupled peptide receptor activity

chemokine bindingG-protein coupled receptor binding

growth factor bindingcytokine binding

serine-type endopeptidase activitydeath receptor activity

tumor necrosis factor-activated receptor activitycytokine receptor activity

protein heterodimerization activitydipeptidase activity

glycoprotein bindingRAGE receptor binding

cytokine receptor bindingcytokine activity

GBM(203)

LSCC(1015)

OvCa(479)

PrCa(3019)

001

002

003

004

padjust

GeneRatio002

004

006

008

Taste transduction

ECM-receptor interaction

cAMP signaling pathway

Calcium signaling pathway

Neuroactive ligand-receptor interaction

PI3K-Akt signaling pathway

Pathways in cancerRegulation of actin cytoskeleton

Complement and coagulation cascades

AGE-RAGE signaling pathway in diabetic complications

Hematopoietic cell lineage

Th17 cell differentiation

Inflammatory bowel disease (IBD)

Malaria

Osteoclast differentiation

Influenza A

Tuberculosis

Intestinal immune network for IgA production

Chagas disease (American trypanosomiasis)

Leishmaniasis

TNF signaling pathway

Toll-like receptor signaling pathway

Rheumatoid arthritis

Cytokine-cytokine receptor interaction

GBM(129)

LSCC(500)

OvCa(266)

PrCa(1364)

GeneRatio

005

010

015

001

002

003

004padjust

GO enrichment analysis KEGG enrichment analysis

A

B

Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA

causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa

Module-specific lncRNA-mRNA causal regulatory networks | 11

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)

By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks

Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)

To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples

Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar

Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)

Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that

occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core

lncRNAndashmRNA causal network

12 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as

lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations

A

B

Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs

Module-specific lncRNA-mRNA causal regulatory networks | 13

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

MSLCRN networks are biologically meaningful

In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks

We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases

Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful

Comparison with other PC-based networkinference methods

Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045

We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers

Conclusions and discussion

Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions

As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological

functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks

Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs

In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers

Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and

Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks

Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)

MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)

Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-

ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-

ciated MSLCRN networks

14 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs

Key Points

bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers

bull lncRNAs exhibit dynamic positive gene regulationacross human cancers

bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships

bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms

Supplementary Data

Supplementary data are available online at httpsacademicoupcombib

Funding

The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)

References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding

RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5

2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69

3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63

4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042

5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30

6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4

7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82

8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63

9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56

10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6

11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12

12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74

13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89

14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22

15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156

16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78

17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683

18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15

19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731

20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13

21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559

22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64

23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8

24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526

25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13

26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3

27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82

28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19

29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46

Module-specific lncRNA-mRNA causal regulatory networks | 15

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60

31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016

32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12

33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892

34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408

35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51

36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32

37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305

38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15

39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99

40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057

41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6

42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6

43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083

44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138

45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8

46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7

47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7

48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370

49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5

50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034

51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082

52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2

53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9

54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17

55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000

56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000

57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1

58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6

59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910

60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000

61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7

62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9

63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30

64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5

65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765

66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9

67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40

68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46

69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52

70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94

71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88

16 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2
Page 3: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

Introduction

Long noncoding RNAs (lncRNAs) are non-protein coding tran-scripts with gt200 nucleotides in length Unlike small noncodingRNAs (sncRNAs) lncRNAs generally exhibit low sequence con-servation However owing to rapidly adaptive selection pres-sures the low conservation of lncRNAs (such as Air and Xist)does not indicate absence of function [1] Similar to microRNAs(miRNAs) an important class of sncRNAs evidence has shownthat lncRNAs play important roles in a wide range of biologicalprocesses even in cancers [2 3] Despite the importance oflncRNAs in many physiological and pathological processes alarge number of lncRNAs remain to be functionally character-ized For this reason the number of studies on lncRNA researchhas been increased exponentially in the past decade (as shownin Figure 1)

To achieve various biological functions lncRNAs form generegulatory networks by interacting with other biological mole-cules such as transcription factors miRNAs messenger RNAs(mRNAs) and RNA-binding proteins [4] Among these biologicalmolecules interacting with lncRNAs mRNAs are the most popu-lar ones By regulating the transcription and translation ofmRNAs lncRNAs could get involved in several vital biologicalprocesses such as cell differentiation cell proliferation andcytoprotective programs [5] Therefore the identification oflncRNAndashmRNA regulatory networks would help to uncoverfunctions and regulatory mechanisms of lncRNAs

A straightforward method for identifying lncRNAndashmRNAregulatory networks is sequence-based complementary basepairing To predict lncRNA targets several sequence-basedmethods such as GUUGle [6] RNAup [7] RNAplex [8] IntaRNA[9] RactIP [10] LncTar [11] and RIblast [12] have been devel-oped Owing to the long sequence and complex tertiary struc-ture of each lncRNA the computational costs of predictinglarge-scale lncRNAndashmRNA regulatory relationships are usuallyhigh Moreover these sequence-based methods only considerthe sequence information of lncRNAs and target mRNAs andthus the predicted lncRNAndashmRNA regulatory networks arestatic However previous studies [13ndash15] have shown thatlncRNAs exhibit condition-specific expression fashion anddynamic networks of gene regulation To identify dynamic orcondition-specific lncRNAndashmRNA regulatory networks it is nec-essary to use expression data Some expression-based methods[16ndash19] for predicting co-expressed lncRNAndashmRNA networks

have been proposed However as the predictions are based onstatistical associations found in gene expression levels onlythey may not represent the real lsquocausalrsquo lncRNAndashmRNA regula-tory relationships Furthermore the existing expression-basedmethods do not consider the modularity of lncRNAndashmRNA reg-ulatory networks an important feature of gene regulatory net-works [20]

In this article we first review the computational methodsfor inferring lncRNAndashmRNA interactions and the public data-bases for storing lncRNAndashmRNA regulatory relationshipsSecond we propose a novel method to infer Module-SpecificLncRNAndashmRNA Causal Regulatory Network (thus the proposedmethod is called MSLCRN) In the first step by consideringmodularity of networks MSLCRN uses Weighted Gene Co-expression Network Analysis (WGCNA) [21] to identify lncRNAndashmRNA co-expression modules In each module the lncRNAsand mRNAs are regarded as module-specific genes In the sec-ond step MSLCRN uses a causal inference method named inter-vention calculus when the directed acyclic graph (DAG) isabsent (IDA) [22 23] to estimate the causal effects of possiblelncRNAndashmRNA causal pairs in each module To speed up theestimation the parallelized version of IDA [24] is used to calcu-late the causal effects For each module the noncausal lncRNAndashmRNA pairs are eliminated and the retained lncRNAndashmRNAcausal pairs are further assembled to generate a module-specific lncRNAndashmRNA causal network To obtain a globallncRNAndashmRNA causal regulatory network we further integratethe identified module-specific lncRNAndashmRNA causal networksin the third step

To evaluate MSLCRN we have applied it into four humancancer data sets including glioblastoma multiforme (GBM) lungsquamous cell carcinoma (LSCC) ovarian cancer (OvCa) andprostate cancer (PrCa) from [25] The validation survival andenrichment analysis results show that the proposed methodcan help with revealing the functions and regulatory mecha-nisms of lncRNAs MSLCRN is released under the GPL-30License and is freely available through GitHub (httpsgithubcomzhangjunpeng411MSLCRN)

Computational methods for inferringlncRNAndashmRNA interactions

In this section we review the computational approaches forinferring lncRNAndashmRNA interactions In Table 1 we divide themethods into two categories (1) sequence-based method and(2) expression-based method We will separately review thesemethods as follows

Sequence-based method

The common characteristic of the sequence-based methods isthat the identification of RNAndashRNA interactions depends onRNA binding energy between two RNA molecules To evaluatethe strength of RNA binding energy a number of energy models[6ndash12 26ndash32] are proposed to predict RNAndashRNA interactions

Gerlach and Giegerich [6] propose a utility program GUUGlefor locating potential helical regions under RNA complementarybase pairs rules The method can be effectively used as a filterfor noncoding RNA (ncRNA) target prediction However the reli-able prediction of RNAndashRNA binding energies is also importantfor the identification of RNAndashRNA interactions To study thethermodynamics of RNAndashRNA interactions Muckstein et al [7]present an extension of the standard partition function methodcalled RNAup to RNA secondary structures By comparing

129 141 164 240 338

639

991

1419

2068

2596

0

500

1000

1500

2000

2500

3000

Year

Num

ber

of p

ublic

atio

ns

Figure 1 The number of lncRNA-related publications in the past decade The

number of queried publications is obtained from PubMed library with keyword

lsquolncRNArsquo

2 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Table 1 Summary of computational methods or tools for inferring lncRNAndashmRNA interactions

Methodstools Categories of methods Brief descriptions Available

GUUGle [6] Sequence-based Target prediction by locating potential helical regions of RNAndashRNA pairs under RNA base pairing rules which include G-Ubases

httpbibiserv2cebitecuni-bielefelddeguugle

RNAup [7] Sequence-based Target prediction by studying thermodynamics of RNAndashRNApairs based on the sum of the energy of binding andhybridization

httprnatbiunivieacatcgi-binRNAWebSuiteRNAupcgi

RNAcofold [26] Sequence-based Target prediction by computing the hybridization energy andbase pairing pattern of RNAndashRNA pairs

httprnatbiunivieacatcgi-binRNAWebSuiteRNAcofoldcgi

Alkan et al [27] Sequence-based Target prediction by minimizing the joint free energy of RNAndashRNA pairs under a number of energy models including basepair energy model stacked pair energy model loop energymodel

On request

RNAplex [8] Sequence-based Target prediction by finding possible hybridization sites ofRNAndashRNA pairs

httpwwwtbiunivieacathtafer

IntaRNA [9] Sequence-based Target prediction by incorporating accessibility of target sitesas well as the existence of a user-definable seed

httprnainformatikuni-freiburgdeIntaRNAInputjsp

RactIP [10] Sequence-based Target prediction by integrating approximate information onan ensemble of equilibrium joint structures into the objec-tive function of integer programming

httprtipsdnabiokeioacjpractip

PETcofold [28] Sequence-based Target prediction by taking covariance information in intra-molecular and intermolecular base pairs into account

httprthdkresourcespetcofold

RIsearch [29] Sequence-based Target prediction by implementing a simplified Turner energymodel for fast computation of hybridization

httpsrthdkresourcesrisearchrisearch1php

RIsearch2 [30] Sequence-based An updated version of RIsearch and predict targets using a sin-gle integrated seed-and-extend framework based on suffixarrays

httpsrthdkresourcesrisearch

LncTar [11] Sequence-based lncRNA target prediction by finding the minimum free energyjoint structure of RNAndashRNA pairs based on base pairing

httpwwwcuilabcnlnctar

lncRNATargets [31] Sequence-based lncRNA target prediction based on nucleic acidthermodynamics

httpwwwherbbolorg 8001lrt

Terai et al [32] Sequence-based lncRNA target prediction by developing an integrated pipelineon the K computer which is one of the fastest super-com-puters in the world

httprtoolscbrcjpcgi-binRNARNAindexpl

RIblast [12] Sequence-based Target prediction based on the seed-and-extension approach httpgithubcomfukunagatsuRIblast

Liao et al [16] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodand the identified lncRNAndashmRNA interactions should be co-expressed in the same direction in no less than 3 Mousemicroarray data sets

On request

Guo et al [17] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodin OvCa malignant progression

On request

Du et al [18] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodand a power function in thyroid cancer

On request

Liu et al [33] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodin human colorectal carcinoma

On request

Huang et al [34] Expression-based Identify lncRNAndashmRNA interactions associated with pneumo-nia by using Pearson method

On request

Li et al [35] Expression-based Identify dynamic lncRNAndashmRNA interactions associated withvenous congestion by using Pearson method

On request

Wu et al [19] Expression-based Identify lncRNAndashmRNA interactions by using a generalized lin-ear model to regress mRNA expression on lncRNA expres-sion in breast cancer

On request

Fu et al [36] Expression-based Identify lncRNAndashmRNA interactions by considering mRNA lociwithin lncRNA and the Pearson correlation in cartilage

On request

Zhang et al [37] Expression-based Identify lncRNAndashmRNA interactions by considering mRNA lociwithin lncRNA and the Pearson correlation in cartilage inperipheral blood mononuclear cells

On request

Iwakiri et al [38] Expression-based Identify tissue-specific lncRNAndashmRNA interactions by integrat-ing the tissue specificity of lncRNAs and mRNAs intosequence-based prediction of human lncRNAndashRNAinteractions

On request

Lv et al [39] Expression-based Identify tissue-specific lncRNAndashmRNA interactions by usingPearson and sequence-based methods and in human intra-hepatic cholangiocarcinoma

On request

Module-specific lncRNA-mRNA causal regulatory networks | 3

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

predicted free energies of binding with RNA interference experi-mental data RNAup can produce biologically reasonableresults For genome-wide predictions of ncRNA targets RNAupis not fast enough Therefore it is usually to be combined withother faster RNAndashRNA prediction methods

To extend the standard dynamic programming algorithmsfor computing RNA secondary structures Bernhart et al [26]propose a program named RNAcofold to compute the hybridiza-tion energy and base pairing pattern of the co-folding of twoRNA molecules However the method disregards some impor-tant interaction structures and is restricted to dimeric com-plexes Moreover for the RNAndashRNA interaction predictionpredicting the joint secondary structure of two interacting RNAsis also important To solve it Alkan et al [27] develop severalalgorithms to minimize the joint free energy between the twoRNAs under a number of energy models Assuming that con-served RNAndashRNA interactions imply conserved functionSeemann et al [28] also implement a comparative method calledPETcofold to predict the joint secondary structure of two inter-acting RNAs As PETcofold considers sequence conservation anincreasing amount of structural covariance can further improveits performance

RNAup [7] and RNAcofold [26] are too slow for genome-widesearch in finding target sites of ncRNAs To accelerate the speedof RNAndashRNA interaction predictions RNAplex [8] is presented toquickly find possible hybridization sites between two interact-ing RNAs To focus on the target search on short highly stableinteractions RNAplex introduces a per nucleotide penaltyMeanwhile another general and fast approach IntaRNA [9] isproposed to efficiently predict bacterial RNAndashRNA interactionsCompared with other existing target prediction methodsIntaRNA considers both the accessibility of target sites and theexistence of a user-defined seed Therefore it shows a higheraccuracy than competing methods Kato et al [10] also present afast and accurate prediction method RactIP for comprehensivetype of RNAndashRNA interactions In terms of predicting joint sec-ondary structures of two interacting RNAs RactIP run incompa-rably faster than competitive programs

To further achieve a speed improvement of predictingRNAndashRNA interactions Wenzel et al [29] present RIsearch forfast computation of hybridization between two interactingRNAs They show that the energy model of RIsearch is anaccurate approximation of the full energy model for near-complementary RNAndashRNA duplexes Furthermore RIsearch isfaster than RNAplex [8] in RNAndashRNA interaction searchRecently RIsearch2 [30] an updated version of RIsearch [29] isproposed to localize potential near-complementary RNAndashRNAinteractions between two RNA sequences The comparisonresults show that RIsearch2 is much faster than the previousmethods such as GUUGle [6] RNAplex [8] IntaRNA [9] andRIsearch [29]

Although the above RNAndashRNA interaction prediction meth-ods can be extended to predict lncRNAndashmRNA interactionsnone of them are exclusively used for identifying the RNA tar-gets of lncRNAs in a large scale To efficiently identify lncRNAndashmRNA interactions Li et al [11] propose a tool named LncTarLncTar explores lncRNAndashmRNA interactions by finding the min-imum free energy joint structure of two interacting RNAs basedon base pairing As LncTar runs fast and does not have a limitto RNA size it can be used for large-scale identification of theRNA targets for all RNAs Another web-based platformlncRNATargets [31] is also provided for lncRNA target predic-tion Because there is no limit to RNA size lncRNATargets canalso be used to identify the RNA targets of all RNAs In a whole

human transcriptome Terai et al [32] develop an integratedpipeline to predict lncRNAndashmRNA interactions for the first timeIn the pipeline IntaRNA [9] is used to calculate interactionenergy and RactIP [10] is used to predict joint secondary struc-ture Recently to further shorten the running time of predictinglncRNAndashmRNA interactions an ultrafast RNAndashRNA interactionprediction method RIblast [12] based on the seed-and-extensionmethod is presented The comparison results show that RIblastruns faster than RNAplex [8] IntaRNA [9] Terai et al pipeline[32] and thus can be applied to a large scale of lncRNA targetidentification

Expression-based method

At the gene expression level the co-expressed lncRNAndashmRNApairs are regarded as lncRNAndashmRNA interactions for theexpression-based methods Among the existing expression-based methods [16ndash19 33ndash39] Pearson correlation method is akey step of most methods to identify co-expressed lncRNAndashmRNA pairs

Liao et al [16] construct a lncRNAndashmRNA co-expression net-work from re-annotated mouse microarray data sets By usingPearson method they only keep the lncRNAndashmRNA pairs withPlt 001 and Pearson correlation ranked in the top or bottom005 percentile The study is the first large-scale prediction oflncRNA functions from a lncRNAndashmRNA co-expression networkTo identify immune-associated lncRNA biomarkers in OvCa Guoet al [17] make a comprehensive analysis of lncRNAndashmRNA co-expression patterns To identify lncRNAndashmRNA co-expressionpairs they calculate Pearson correlation between differentiallyexpressed lncRNAs and mRNAs They only reserve the lncRNAndashmRNA co-expression pairs with Pearson correlationgt 05 and thecorresponding False Discovery Rate (FDR)lt 001 Liu et al [33] andHuang et al [34] also use Pearson method to study lncRNAndashmRNAco-expression networks in human colorectal carcinoma andpneumonia respectively The inferred lncRNAndashmRNA co-expression networks will help to study lncRNA functionsRecently Du et al [18] propose a two-step method to conduct acomprehensive analysis of lncRNAndashmRNA co-expression patternsin thyroid cancer First they use Pearson method to calculatePearson correlation and the cutoff of Pearson correlation is 05and the corresponding FDR cutoff is 001 Second the Pearson cor-relations are transformed into an adjacency matrix

Owing to dynamic characteristic of gene regulatory net-works Wu et al [19] identify two distinct lncRNAndashmRNA co-expression networks in tumor and normal breast tissue Theyuse a generalized linear model to regress mRNA expression onlncRNA expression in tumor and normal breast tissue and onlyfocus on dynamic breast lncRNAndashmRNA co-expression pairsthat differ in tumor and normal breast tissue Meanwhile tostudy the potential role of lncRNAs in venous congestion Liet al [35] also construct a dynamic lncRNAndashmRNA co-expression network By using Pearson method they separatelycalculate Pearson correlations of each lncRNAndashmRNA pair invenous congestion and normal samples The lncRNAndashmRNApairs with Pearson correlationgt099 orlt099 and P-valuelt001are selected as lncRNAndashmRNA co-expression pairs They con-struct two types of lncRNAndashmRNA co-expression networkslsquolostrsquo network where lncRNAndashmRNA co-expression pairs onlyexisted in normal samples and lsquoobtainedrsquo network wherelncRNAndashmRNA co-expression pairs only existed in venous con-gestion samples The lsquolostrsquo and lsquoobtainedrsquo networks are furtherintegrated to obtain a dynamic lncRNAndashmRNA co-expressionnetwork

4 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

The above methods simply use matched lncRNA and mRNAexpression data to identify lncRNAndashmRNA co-expression pairsTo identify lsquocis-regulated target genesrsquo of lncRNAs some methodsalso consider mRNA loci information within lncRNA For exampleFu et al [36] combine mRNA loci information and matchedlncRNA and mRNA expression data to predict lncRNA targetsThey identify the mRNAs as targets under two conditions (i) themRNA loci are within a 300-kb window up- or downstream oflncRNA and (ii) lncRNAndashmRNA co-expression pairs are signifi-cantly positive correlated (Pearson correlationgt 08 and the corre-sponding P-valuelt 005) Zhang et al [37] also use a similarmethod to Fu et al [36] for identifying lncRNA targets The mRNAscan be regarded as targets when (1) the mRNA loci are within a10 window up- or downstream of lncRNA and (2) lncRNAndashmRNAco-expression pairs are significantly positive correlated (Pearsoncorrelationgt 098 and the corresponding P-valuelt 005)

Apart from mRNA loci information within lncRNA someemerging methods consider predictions from sequence-basedmethods as putative lncRNAndashmRNA interactions For exampleIwakiri et al [38] integrate tissue-specific lncRNA and mRNAexpression data into predictions from a sequence-basedmethod in [32] They discover that integrating tissue specificitycan improve prediction accuracy of lncRNAndashmRNA interactionsLv et al [39] also combine matched lncRNA and mRNA expres-sion data with predictions from a sequence-based methodLncTar [11] They first use Pearson method to identify co-expressed lncRNAndashmRNA co-expression pairs with Pearsoncorrelationgt095 orlt095 Then LncTar is used to further filterthe identified lncRNAndashmRNA co-expression pairs

Public databases for storing lncRNAndashmRNAregulatory relationships

In this section we review the public databases of storinglncRNAndashmRNA regulatory relationships Table 2 shows a sum-mary of the third-party public databases including experimen-tally validated and computationally predicted databases

NPInter [40] contains experimentally validated interactionsbetween ncRNAs especially lncRNAs and miRNAs The databasecontains 915 067 interactions in 188 tissues or cell lines from 68kinds of experimental technologies There is a classification ofthe functional interactions based on the functional process thatncRNA is involved in Moreover NPInter allows users to searchinteractions related publications and other information

LncRNADisease [41] not only collects experimentally sup-ported lncRNAndashdisease associations and lncRNA interactionsbut also predicts novel lncRNAndashdisease associations Recentlythe database curates 478 entries of experimentally validatedlncRNA interactions LncRNADisease provides users severalways to search lncRNA-related diseases and interactions

To study differentially expressed genes after lncRNA knock-down or overexpression Jiang et al [42] develop a databasecalled LncRNA2Target in human and mouse organisms Thedatabase has a collection of 396 experimentally validatedlncRNAndashtarget interactions In LncRNA2Target if a gene is dif-ferentially expressed after lncRNA knockdown or overexpres-sion it is regarded as a target of a lncRNA For convenienceLncRNA2Target allows users to search for the targets of singlelncRNA or for the lncRNAs that target a specific geneMeanwhile Zhou et al [43] also build a reference resourceLncReg for lncRNA-related regulatory networks The databasehas 1081 experimentally validated lncRNA-related regulatory

records between 258 nonredundant lncRNAs and 571 nonredun-dant genes

IRNdb [44] is a database that focuses on collecting immuno-logically relevant lncRNAndashtarget miRNAndashtarget and PIWI-interacting RNAndashtarget interactions The current version ofIRNdb documents 22 453 immunologically relevant lncRNAndashtar-get interactions by integrating three databases LncRNADisease[41] LncRNA2Target [42] and LncReg [43] The aim is to helpresearchers study the roles of ncRNAs in the immune systemRecently a new experimentally validated database namedlncRInter [45] was developed to collect reliable and high-qualitylncRNAndashtarget interactions The extracted lncRNAndashtarget inter-actions are all from published literature and are supported bycertain biological experiments (eg luciferase reporter assayin vitro binding assay RNA pull-down) In total lncRInter con-tains 1036 experimentally validated lncRNAndashtarget interactionsin 15 organisms

In addition to the experimentally validated databases pre-sented above there are several computationally predicted data-bases for collecting lncRNAndashmRNA interactions For instancestarBase [46] is a comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNA interaction net-works from 108 CLIP-Seq (PAR-CLIP HITS-CLIP iCLIP CLASH)data sets The lncRNAndashmRNA interactions can be extractedfrom proteinndashRNA interaction networks InCaNet [47] aimsto establish a comprehensive regulatory network betweenlncRNAs and cancer genes They identify lncRNAndashcancergene interactions by computing gene co-expression betweenlncRNAs and cancer genes BmncRNAdb [48] is a comprehensivedatabase of silkworm lncRNAs and miRNAs The database pro-vides three online tools for users to predict both lncRNAndashtargetand miRNAndashtarget interactions lncRNAtor [49] collect expres-sion data from 243 RNA-seq experiments including 5237samples of various tissues and developmental stages ThelncRNAndashmRNA co-expression pairs are identified through co-expression analysis of lncRNAs and mRNAs lncRNome [50] is acomprehensive knowledgebase of sequence structure biologi-cal functions genomic variations and epigenetic modificationson gt17 000 lncRNAs in human For lncRNAndashprotein interactionsthe database incorporates PAR-CLIP experiments and a supportvector machine-based prediction method Co-lncRNA [51] andLncRNA2Function [52] predict co-expressed lncRNAndashmRNAinteractions from RNA-Seq data and further annotates thepotential functions of human lncRNAs using functional enrich-ment analysis lncRNAMap [53] is an integrated and compre-hensive database to explore regulatory functions of humanlncRNAs By integrating small RNAs supported by publicly avail-able deep sequencing data lncRNAMap construct lncRNA-derived siRNAndashtarget interactions

In summary for experimentally validated databases userscan select individual database or combine several databases asground truth to validate the predicted lncRNAndashmRNA interac-tions As for computationally predicted databases they can beused as initial structural of sequence-based or expression-basedmethods to identify lncRNAndashmRNA interactions

Inferring and analyzing MSLCRN networksRepurposed microarray data across human cancers

We collect the repurposed lncRNA and mRNA expression dataof GBM LSCC OvCa and PrCa from [25] A lncRNA or mRNA iseliminated if it does not have a corresponding gene symbol in adata set By calculating average expression values of replicate

Module-specific lncRNA-mRNA causal regulatory networks | 5

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

lncRNAs and mRNAs we obtain unique expression value ofthese replicates Consequently we get the matched expressiondata of 9704 lncRNAs and 18 282 mRNAs in 451 GBM 113 LSCC585 OvCa and 150 PrCa samples

Pipeline of MSLCRN

As shown in Figure 2 MSLCRN contains the following threesteps to infer module-specific lncRNAndashmRNA causal regulatorynetworks

i Identification of lncRNAndashmRNA co-expression modulesGiven the matched lncRNA and mRNA expression data weuse WGCNA to generate gene co-expression modules Amodule containing at least two lncRNAs and two mRNAsare regarded as a lncRNAndashmRNA co-expression moduleand used as the input of the second step

ii Identification of module-specific lncRNAndashmRNA causal reg-ulatory networks For each lncRNAndashmRNA co-expressionmodule with each lncRNAndashmRNA pair we apply parallelIDA to estimate the causal effect of the lncRNA on the

Table 2 Public databases for storing lncRNAndashmRNA regulatory relationships

Databases Types of databases Brief descriptions Organisms Available

NPInter [40] Validated A database of experimentally verified func-tional interactions between ncRNAs(including lncRNAs miRNAs etc) and bio-molecules (proteins RNAs and DNAs)

22 organisms httpwwwbioinfoorgNPInter

LncRNADisease [41] Validated A database of experimentally supportedlncRNAndashdisease association data andlncRNAndashtarget interactions in various lev-els including protein RNA miRNA andDNA

Human httpwwwcuilabcnlncrnadisease

LncRNA2Target [42] Validated A database of lncRNAndashtarget regulatory rela-tionships experimentally validated bylncRNA knockdown or overexpression

Human mouse httpwwwlncrna2targetorg

LncReg [43] Validated A database of experimentally validatedlncRNAndashtarget interactions from publicliterature

7 organisms httpbioinformaticsustceducnlncreg

IRNdb [44] Validated A database of immunologically relevantncRNAs (miRNAs lncRNAs and otherncRNAs) and target genes

Human mouse httpcompbiomasseyacnzappsirndb

lncRInter [45] Validated A database of experimentally validatedlncRNAndashtarget interactions extracted frompeer-reviewed publications

15 organisms httpbioinfolifehusteducnlncRInter

starBase [46] Predicted A comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNAinteraction networks from 108 CLIP-Seq(PAR-CLIP HITS-CLIP iCLIP CLASH) datasets

Human httpstarbasesysueducn

lnCaNet [47] Predicted A database of establishing a comprehensiveregulatory network source for lncRNA andcancer genes

Human httplncanetbioinfo-minzhaoorg

BmncRNAdb [48] Predicted A comprehensive database of the silkwormlncRNAs and miRNAs as well as the threeonline tools for users to predict the targetgenes of lncRNAs or miRNAs

Bombyx mori httpgenecqueducnBmncRNAdbindexphp

lncRNAtor [49] Predicted A comprehensive resource of encompassingannotation sequence analysis geneexpression protein binding and phyloge-netic conservation

6 organisms httplncrnatorewhaackr

lncRNome [50] Predicted A comprehensive knowledgebase on thetypes chromosomal locations descriptionon the biological functions and diseaseassociations of lncRNAs

Human httpgenomeigibresinlncRNome

Co-LncRNA [51] Predicted A computationally predicted database toidentify GO annotations and KEGG path-ways affected by co-expressed protein-cod-ing genes of a single or multiple lncRNAs

Human httpwwwbio-bigdatacomCo-LncRNA

LncRNA2Function [52] Predicted A comprehensive resource of investigatingthe functions of lncRNAs based on co-expressed lncRNAndashmRNA interactions

Human httpmlghiteducnlncrna2function

lncRNAMap [53] Predicted An integrated and comprehensive databaseof regulatory functions of lncRNAs and act-ing as ceRNAs

Human httplncRNAMapmbcnctuedutw

6 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

mRNA We use the absolute value of the causal effect(AVCE) to evaluate the strength of the regulation of thelncRNA on the mRNA and a higher AVCE indicates a stron-ger lncRNA regulation The lncRNAndashmRNA pairs with highAVCEs in each module are considered as module-specificlncRNAndashmRNA causal regulatory relationships and we calleach module with these relationships identified a module-specific causal regulatory network

iii Identification of global lncRNAndashmRNA causal regulatorynetwork We integrate the module-specific lncRNAndashmRNA

causal regulatory networks to form the global lncRNAndashmRNA causal regulatory network

Identification of lncRNAndashmRNA co-expression modules

In systems biology WGCNA [21] is a popular method for findingthe correlation patterns among genes across samples and canbe used to identify clusters or modules of highly co-expressedgenes Therefore we use WGCNA to first infer lncRNAndashmRNAco-expression modules

Figure 2 The pipeline of MSLCRN First WGCNA is used to identify lncRNAndashmRNA co-expression modules from matched lncRNA and mRNA expression data Second

we infer lncRNAndashmRNA causal regulatory relationships in each module by using parallel IDA method For each module we assemble the identified lncRNAndashmRNA reg-

ulatory relationships to obtain a module-specific lncRNAndashmRNA causal regulatory network Third the module-specific lncRNAndashmRNA causal regulatory networks are

integrated to form a global lncRNAndashmRNA causal regulatory network

Module-specific lncRNA-mRNA causal regulatory networks | 7

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas

sij frac14 jcorethi jTHORNj (1)

where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]

To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined

wij frac14P

uaiuauj thorn aij

minfP

uaiuP

uaujg thorn 1 aij(2)

where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step

Identification of module-specific lncRNAndashmRNA causalregulatory networks

After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]

In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001

In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]

The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships

We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08

Validation survival and enrichment analysis

Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it

To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth

We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test

To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia

8 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks

We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows

p frac14 1 FethxjBNMTHORN frac14 1Xx1

ifrac140

N

i

B N

M i

B

M

(3)

In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005

Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers

By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively

To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]

Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks

In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs

Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa

Datasets Cutoffs Number of causalregulations

yfrac14axb R2

GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792

LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972

OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460

PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503

Note The AVCE cutoffs range from 010 to 060 with a step of 005

The bold values are the degree distributions of global lncRNA-mRNA causal reg-

ulatory networks with a compromised AVCE cutoff (045) in four human cancers

Module-specific lncRNA-mRNA causal regulatory networks | 9

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and

KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer

Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers

Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common

396 424

106 115

1370

133 173

1599

126

1729

1224

252

2523

2157

5081

0

2000

4000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

025

0050

0075

00

1000

0

Set Size

6 283

13 1 80 673

206

3969

76 3493

407

2816

9950

7

1948

7

0

250000

500000

750000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

0e+0

0

2e+0

5

4e+0

5

6e+0

5

8e+0

5

Set Size

6 1 2 933

2

41

6 11

246

47

524

0

200

400In

ters

ectio

n S

ize

PrCa

Ov Ca

LSCC

GBM

0200

400

600

Set Size

Module-specific genes

Module-specific causal regulations Module-specific hub lncRNAs 808611

A

B C

Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa

(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-

section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa

10 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers

As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely

connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers

The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa

Cancer-specific networks Causal regulations y=axb R2

GBM-specific 2816 y=4812x-1292 09774

LSCC-specific 99507 y=1034x-1014 09923

OvCa-specific 19487 y=7366x-1156 09723

PrCa-specific 808611 y=5243x-07055 08310

1 10 100 11001

10

100

1100

Degree of genes

Nu

mb

er o

f ge

nes

GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution

solutecation symporter activitysymporter activity

gated channel activitysodium ion transmembrane transporter activity

ion channel activitysubstrate-specific channel activity

channel activitypassive transmembrane transporter activity

growth factor activitycation channel activity

metal ion transmembrane transporter activitycollagen bindingheparin binding

sulfur compound bindingintegrin binding

glycosaminoglycan bindingextracellular matrix binding

peptide receptor activityG-protein coupled peptide receptor activity

chemokine bindingG-protein coupled receptor binding

growth factor bindingcytokine binding

serine-type endopeptidase activitydeath receptor activity

tumor necrosis factor-activated receptor activitycytokine receptor activity

protein heterodimerization activitydipeptidase activity

glycoprotein bindingRAGE receptor binding

cytokine receptor bindingcytokine activity

GBM(203)

LSCC(1015)

OvCa(479)

PrCa(3019)

001

002

003

004

padjust

GeneRatio002

004

006

008

Taste transduction

ECM-receptor interaction

cAMP signaling pathway

Calcium signaling pathway

Neuroactive ligand-receptor interaction

PI3K-Akt signaling pathway

Pathways in cancerRegulation of actin cytoskeleton

Complement and coagulation cascades

AGE-RAGE signaling pathway in diabetic complications

Hematopoietic cell lineage

Th17 cell differentiation

Inflammatory bowel disease (IBD)

Malaria

Osteoclast differentiation

Influenza A

Tuberculosis

Intestinal immune network for IgA production

Chagas disease (American trypanosomiasis)

Leishmaniasis

TNF signaling pathway

Toll-like receptor signaling pathway

Rheumatoid arthritis

Cytokine-cytokine receptor interaction

GBM(129)

LSCC(500)

OvCa(266)

PrCa(1364)

GeneRatio

005

010

015

001

002

003

004padjust

GO enrichment analysis KEGG enrichment analysis

A

B

Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA

causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa

Module-specific lncRNA-mRNA causal regulatory networks | 11

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)

By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks

Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)

To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples

Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar

Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)

Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that

occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core

lncRNAndashmRNA causal network

12 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as

lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations

A

B

Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs

Module-specific lncRNA-mRNA causal regulatory networks | 13

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

MSLCRN networks are biologically meaningful

In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks

We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases

Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful

Comparison with other PC-based networkinference methods

Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045

We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers

Conclusions and discussion

Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions

As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological

functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks

Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs

In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers

Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and

Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks

Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)

MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)

Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-

ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-

ciated MSLCRN networks

14 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs

Key Points

bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers

bull lncRNAs exhibit dynamic positive gene regulationacross human cancers

bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships

bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms

Supplementary Data

Supplementary data are available online at httpsacademicoupcombib

Funding

The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)

References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding

RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5

2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69

3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63

4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042

5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30

6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4

7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82

8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63

9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56

10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6

11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12

12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74

13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89

14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22

15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156

16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78

17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683

18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15

19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731

20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13

21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559

22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64

23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8

24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526

25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13

26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3

27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82

28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19

29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46

Module-specific lncRNA-mRNA causal regulatory networks | 15

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60

31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016

32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12

33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892

34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408

35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51

36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32

37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305

38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15

39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99

40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057

41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6

42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6

43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083

44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138

45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8

46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7

47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7

48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370

49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5

50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034

51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082

52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2

53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9

54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17

55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000

56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000

57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1

58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6

59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910

60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000

61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7

62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9

63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30

64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5

65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765

66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9

67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40

68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46

69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52

70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94

71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88

16 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2
Page 4: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

Table 1 Summary of computational methods or tools for inferring lncRNAndashmRNA interactions

Methodstools Categories of methods Brief descriptions Available

GUUGle [6] Sequence-based Target prediction by locating potential helical regions of RNAndashRNA pairs under RNA base pairing rules which include G-Ubases

httpbibiserv2cebitecuni-bielefelddeguugle

RNAup [7] Sequence-based Target prediction by studying thermodynamics of RNAndashRNApairs based on the sum of the energy of binding andhybridization

httprnatbiunivieacatcgi-binRNAWebSuiteRNAupcgi

RNAcofold [26] Sequence-based Target prediction by computing the hybridization energy andbase pairing pattern of RNAndashRNA pairs

httprnatbiunivieacatcgi-binRNAWebSuiteRNAcofoldcgi

Alkan et al [27] Sequence-based Target prediction by minimizing the joint free energy of RNAndashRNA pairs under a number of energy models including basepair energy model stacked pair energy model loop energymodel

On request

RNAplex [8] Sequence-based Target prediction by finding possible hybridization sites ofRNAndashRNA pairs

httpwwwtbiunivieacathtafer

IntaRNA [9] Sequence-based Target prediction by incorporating accessibility of target sitesas well as the existence of a user-definable seed

httprnainformatikuni-freiburgdeIntaRNAInputjsp

RactIP [10] Sequence-based Target prediction by integrating approximate information onan ensemble of equilibrium joint structures into the objec-tive function of integer programming

httprtipsdnabiokeioacjpractip

PETcofold [28] Sequence-based Target prediction by taking covariance information in intra-molecular and intermolecular base pairs into account

httprthdkresourcespetcofold

RIsearch [29] Sequence-based Target prediction by implementing a simplified Turner energymodel for fast computation of hybridization

httpsrthdkresourcesrisearchrisearch1php

RIsearch2 [30] Sequence-based An updated version of RIsearch and predict targets using a sin-gle integrated seed-and-extend framework based on suffixarrays

httpsrthdkresourcesrisearch

LncTar [11] Sequence-based lncRNA target prediction by finding the minimum free energyjoint structure of RNAndashRNA pairs based on base pairing

httpwwwcuilabcnlnctar

lncRNATargets [31] Sequence-based lncRNA target prediction based on nucleic acidthermodynamics

httpwwwherbbolorg 8001lrt

Terai et al [32] Sequence-based lncRNA target prediction by developing an integrated pipelineon the K computer which is one of the fastest super-com-puters in the world

httprtoolscbrcjpcgi-binRNARNAindexpl

RIblast [12] Sequence-based Target prediction based on the seed-and-extension approach httpgithubcomfukunagatsuRIblast

Liao et al [16] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodand the identified lncRNAndashmRNA interactions should be co-expressed in the same direction in no less than 3 Mousemicroarray data sets

On request

Guo et al [17] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodin OvCa malignant progression

On request

Du et al [18] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodand a power function in thyroid cancer

On request

Liu et al [33] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodin human colorectal carcinoma

On request

Huang et al [34] Expression-based Identify lncRNAndashmRNA interactions associated with pneumo-nia by using Pearson method

On request

Li et al [35] Expression-based Identify dynamic lncRNAndashmRNA interactions associated withvenous congestion by using Pearson method

On request

Wu et al [19] Expression-based Identify lncRNAndashmRNA interactions by using a generalized lin-ear model to regress mRNA expression on lncRNA expres-sion in breast cancer

On request

Fu et al [36] Expression-based Identify lncRNAndashmRNA interactions by considering mRNA lociwithin lncRNA and the Pearson correlation in cartilage

On request

Zhang et al [37] Expression-based Identify lncRNAndashmRNA interactions by considering mRNA lociwithin lncRNA and the Pearson correlation in cartilage inperipheral blood mononuclear cells

On request

Iwakiri et al [38] Expression-based Identify tissue-specific lncRNAndashmRNA interactions by integrat-ing the tissue specificity of lncRNAs and mRNAs intosequence-based prediction of human lncRNAndashRNAinteractions

On request

Lv et al [39] Expression-based Identify tissue-specific lncRNAndashmRNA interactions by usingPearson and sequence-based methods and in human intra-hepatic cholangiocarcinoma

On request

Module-specific lncRNA-mRNA causal regulatory networks | 3

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

predicted free energies of binding with RNA interference experi-mental data RNAup can produce biologically reasonableresults For genome-wide predictions of ncRNA targets RNAupis not fast enough Therefore it is usually to be combined withother faster RNAndashRNA prediction methods

To extend the standard dynamic programming algorithmsfor computing RNA secondary structures Bernhart et al [26]propose a program named RNAcofold to compute the hybridiza-tion energy and base pairing pattern of the co-folding of twoRNA molecules However the method disregards some impor-tant interaction structures and is restricted to dimeric com-plexes Moreover for the RNAndashRNA interaction predictionpredicting the joint secondary structure of two interacting RNAsis also important To solve it Alkan et al [27] develop severalalgorithms to minimize the joint free energy between the twoRNAs under a number of energy models Assuming that con-served RNAndashRNA interactions imply conserved functionSeemann et al [28] also implement a comparative method calledPETcofold to predict the joint secondary structure of two inter-acting RNAs As PETcofold considers sequence conservation anincreasing amount of structural covariance can further improveits performance

RNAup [7] and RNAcofold [26] are too slow for genome-widesearch in finding target sites of ncRNAs To accelerate the speedof RNAndashRNA interaction predictions RNAplex [8] is presented toquickly find possible hybridization sites between two interact-ing RNAs To focus on the target search on short highly stableinteractions RNAplex introduces a per nucleotide penaltyMeanwhile another general and fast approach IntaRNA [9] isproposed to efficiently predict bacterial RNAndashRNA interactionsCompared with other existing target prediction methodsIntaRNA considers both the accessibility of target sites and theexistence of a user-defined seed Therefore it shows a higheraccuracy than competing methods Kato et al [10] also present afast and accurate prediction method RactIP for comprehensivetype of RNAndashRNA interactions In terms of predicting joint sec-ondary structures of two interacting RNAs RactIP run incompa-rably faster than competitive programs

To further achieve a speed improvement of predictingRNAndashRNA interactions Wenzel et al [29] present RIsearch forfast computation of hybridization between two interactingRNAs They show that the energy model of RIsearch is anaccurate approximation of the full energy model for near-complementary RNAndashRNA duplexes Furthermore RIsearch isfaster than RNAplex [8] in RNAndashRNA interaction searchRecently RIsearch2 [30] an updated version of RIsearch [29] isproposed to localize potential near-complementary RNAndashRNAinteractions between two RNA sequences The comparisonresults show that RIsearch2 is much faster than the previousmethods such as GUUGle [6] RNAplex [8] IntaRNA [9] andRIsearch [29]

Although the above RNAndashRNA interaction prediction meth-ods can be extended to predict lncRNAndashmRNA interactionsnone of them are exclusively used for identifying the RNA tar-gets of lncRNAs in a large scale To efficiently identify lncRNAndashmRNA interactions Li et al [11] propose a tool named LncTarLncTar explores lncRNAndashmRNA interactions by finding the min-imum free energy joint structure of two interacting RNAs basedon base pairing As LncTar runs fast and does not have a limitto RNA size it can be used for large-scale identification of theRNA targets for all RNAs Another web-based platformlncRNATargets [31] is also provided for lncRNA target predic-tion Because there is no limit to RNA size lncRNATargets canalso be used to identify the RNA targets of all RNAs In a whole

human transcriptome Terai et al [32] develop an integratedpipeline to predict lncRNAndashmRNA interactions for the first timeIn the pipeline IntaRNA [9] is used to calculate interactionenergy and RactIP [10] is used to predict joint secondary struc-ture Recently to further shorten the running time of predictinglncRNAndashmRNA interactions an ultrafast RNAndashRNA interactionprediction method RIblast [12] based on the seed-and-extensionmethod is presented The comparison results show that RIblastruns faster than RNAplex [8] IntaRNA [9] Terai et al pipeline[32] and thus can be applied to a large scale of lncRNA targetidentification

Expression-based method

At the gene expression level the co-expressed lncRNAndashmRNApairs are regarded as lncRNAndashmRNA interactions for theexpression-based methods Among the existing expression-based methods [16ndash19 33ndash39] Pearson correlation method is akey step of most methods to identify co-expressed lncRNAndashmRNA pairs

Liao et al [16] construct a lncRNAndashmRNA co-expression net-work from re-annotated mouse microarray data sets By usingPearson method they only keep the lncRNAndashmRNA pairs withPlt 001 and Pearson correlation ranked in the top or bottom005 percentile The study is the first large-scale prediction oflncRNA functions from a lncRNAndashmRNA co-expression networkTo identify immune-associated lncRNA biomarkers in OvCa Guoet al [17] make a comprehensive analysis of lncRNAndashmRNA co-expression patterns To identify lncRNAndashmRNA co-expressionpairs they calculate Pearson correlation between differentiallyexpressed lncRNAs and mRNAs They only reserve the lncRNAndashmRNA co-expression pairs with Pearson correlationgt 05 and thecorresponding False Discovery Rate (FDR)lt 001 Liu et al [33] andHuang et al [34] also use Pearson method to study lncRNAndashmRNAco-expression networks in human colorectal carcinoma andpneumonia respectively The inferred lncRNAndashmRNA co-expression networks will help to study lncRNA functionsRecently Du et al [18] propose a two-step method to conduct acomprehensive analysis of lncRNAndashmRNA co-expression patternsin thyroid cancer First they use Pearson method to calculatePearson correlation and the cutoff of Pearson correlation is 05and the corresponding FDR cutoff is 001 Second the Pearson cor-relations are transformed into an adjacency matrix

Owing to dynamic characteristic of gene regulatory net-works Wu et al [19] identify two distinct lncRNAndashmRNA co-expression networks in tumor and normal breast tissue Theyuse a generalized linear model to regress mRNA expression onlncRNA expression in tumor and normal breast tissue and onlyfocus on dynamic breast lncRNAndashmRNA co-expression pairsthat differ in tumor and normal breast tissue Meanwhile tostudy the potential role of lncRNAs in venous congestion Liet al [35] also construct a dynamic lncRNAndashmRNA co-expression network By using Pearson method they separatelycalculate Pearson correlations of each lncRNAndashmRNA pair invenous congestion and normal samples The lncRNAndashmRNApairs with Pearson correlationgt099 orlt099 and P-valuelt001are selected as lncRNAndashmRNA co-expression pairs They con-struct two types of lncRNAndashmRNA co-expression networkslsquolostrsquo network where lncRNAndashmRNA co-expression pairs onlyexisted in normal samples and lsquoobtainedrsquo network wherelncRNAndashmRNA co-expression pairs only existed in venous con-gestion samples The lsquolostrsquo and lsquoobtainedrsquo networks are furtherintegrated to obtain a dynamic lncRNAndashmRNA co-expressionnetwork

4 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

The above methods simply use matched lncRNA and mRNAexpression data to identify lncRNAndashmRNA co-expression pairsTo identify lsquocis-regulated target genesrsquo of lncRNAs some methodsalso consider mRNA loci information within lncRNA For exampleFu et al [36] combine mRNA loci information and matchedlncRNA and mRNA expression data to predict lncRNA targetsThey identify the mRNAs as targets under two conditions (i) themRNA loci are within a 300-kb window up- or downstream oflncRNA and (ii) lncRNAndashmRNA co-expression pairs are signifi-cantly positive correlated (Pearson correlationgt 08 and the corre-sponding P-valuelt 005) Zhang et al [37] also use a similarmethod to Fu et al [36] for identifying lncRNA targets The mRNAscan be regarded as targets when (1) the mRNA loci are within a10 window up- or downstream of lncRNA and (2) lncRNAndashmRNAco-expression pairs are significantly positive correlated (Pearsoncorrelationgt 098 and the corresponding P-valuelt 005)

Apart from mRNA loci information within lncRNA someemerging methods consider predictions from sequence-basedmethods as putative lncRNAndashmRNA interactions For exampleIwakiri et al [38] integrate tissue-specific lncRNA and mRNAexpression data into predictions from a sequence-basedmethod in [32] They discover that integrating tissue specificitycan improve prediction accuracy of lncRNAndashmRNA interactionsLv et al [39] also combine matched lncRNA and mRNA expres-sion data with predictions from a sequence-based methodLncTar [11] They first use Pearson method to identify co-expressed lncRNAndashmRNA co-expression pairs with Pearsoncorrelationgt095 orlt095 Then LncTar is used to further filterthe identified lncRNAndashmRNA co-expression pairs

Public databases for storing lncRNAndashmRNAregulatory relationships

In this section we review the public databases of storinglncRNAndashmRNA regulatory relationships Table 2 shows a sum-mary of the third-party public databases including experimen-tally validated and computationally predicted databases

NPInter [40] contains experimentally validated interactionsbetween ncRNAs especially lncRNAs and miRNAs The databasecontains 915 067 interactions in 188 tissues or cell lines from 68kinds of experimental technologies There is a classification ofthe functional interactions based on the functional process thatncRNA is involved in Moreover NPInter allows users to searchinteractions related publications and other information

LncRNADisease [41] not only collects experimentally sup-ported lncRNAndashdisease associations and lncRNA interactionsbut also predicts novel lncRNAndashdisease associations Recentlythe database curates 478 entries of experimentally validatedlncRNA interactions LncRNADisease provides users severalways to search lncRNA-related diseases and interactions

To study differentially expressed genes after lncRNA knock-down or overexpression Jiang et al [42] develop a databasecalled LncRNA2Target in human and mouse organisms Thedatabase has a collection of 396 experimentally validatedlncRNAndashtarget interactions In LncRNA2Target if a gene is dif-ferentially expressed after lncRNA knockdown or overexpres-sion it is regarded as a target of a lncRNA For convenienceLncRNA2Target allows users to search for the targets of singlelncRNA or for the lncRNAs that target a specific geneMeanwhile Zhou et al [43] also build a reference resourceLncReg for lncRNA-related regulatory networks The databasehas 1081 experimentally validated lncRNA-related regulatory

records between 258 nonredundant lncRNAs and 571 nonredun-dant genes

IRNdb [44] is a database that focuses on collecting immuno-logically relevant lncRNAndashtarget miRNAndashtarget and PIWI-interacting RNAndashtarget interactions The current version ofIRNdb documents 22 453 immunologically relevant lncRNAndashtar-get interactions by integrating three databases LncRNADisease[41] LncRNA2Target [42] and LncReg [43] The aim is to helpresearchers study the roles of ncRNAs in the immune systemRecently a new experimentally validated database namedlncRInter [45] was developed to collect reliable and high-qualitylncRNAndashtarget interactions The extracted lncRNAndashtarget inter-actions are all from published literature and are supported bycertain biological experiments (eg luciferase reporter assayin vitro binding assay RNA pull-down) In total lncRInter con-tains 1036 experimentally validated lncRNAndashtarget interactionsin 15 organisms

In addition to the experimentally validated databases pre-sented above there are several computationally predicted data-bases for collecting lncRNAndashmRNA interactions For instancestarBase [46] is a comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNA interaction net-works from 108 CLIP-Seq (PAR-CLIP HITS-CLIP iCLIP CLASH)data sets The lncRNAndashmRNA interactions can be extractedfrom proteinndashRNA interaction networks InCaNet [47] aimsto establish a comprehensive regulatory network betweenlncRNAs and cancer genes They identify lncRNAndashcancergene interactions by computing gene co-expression betweenlncRNAs and cancer genes BmncRNAdb [48] is a comprehensivedatabase of silkworm lncRNAs and miRNAs The database pro-vides three online tools for users to predict both lncRNAndashtargetand miRNAndashtarget interactions lncRNAtor [49] collect expres-sion data from 243 RNA-seq experiments including 5237samples of various tissues and developmental stages ThelncRNAndashmRNA co-expression pairs are identified through co-expression analysis of lncRNAs and mRNAs lncRNome [50] is acomprehensive knowledgebase of sequence structure biologi-cal functions genomic variations and epigenetic modificationson gt17 000 lncRNAs in human For lncRNAndashprotein interactionsthe database incorporates PAR-CLIP experiments and a supportvector machine-based prediction method Co-lncRNA [51] andLncRNA2Function [52] predict co-expressed lncRNAndashmRNAinteractions from RNA-Seq data and further annotates thepotential functions of human lncRNAs using functional enrich-ment analysis lncRNAMap [53] is an integrated and compre-hensive database to explore regulatory functions of humanlncRNAs By integrating small RNAs supported by publicly avail-able deep sequencing data lncRNAMap construct lncRNA-derived siRNAndashtarget interactions

In summary for experimentally validated databases userscan select individual database or combine several databases asground truth to validate the predicted lncRNAndashmRNA interac-tions As for computationally predicted databases they can beused as initial structural of sequence-based or expression-basedmethods to identify lncRNAndashmRNA interactions

Inferring and analyzing MSLCRN networksRepurposed microarray data across human cancers

We collect the repurposed lncRNA and mRNA expression dataof GBM LSCC OvCa and PrCa from [25] A lncRNA or mRNA iseliminated if it does not have a corresponding gene symbol in adata set By calculating average expression values of replicate

Module-specific lncRNA-mRNA causal regulatory networks | 5

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

lncRNAs and mRNAs we obtain unique expression value ofthese replicates Consequently we get the matched expressiondata of 9704 lncRNAs and 18 282 mRNAs in 451 GBM 113 LSCC585 OvCa and 150 PrCa samples

Pipeline of MSLCRN

As shown in Figure 2 MSLCRN contains the following threesteps to infer module-specific lncRNAndashmRNA causal regulatorynetworks

i Identification of lncRNAndashmRNA co-expression modulesGiven the matched lncRNA and mRNA expression data weuse WGCNA to generate gene co-expression modules Amodule containing at least two lncRNAs and two mRNAsare regarded as a lncRNAndashmRNA co-expression moduleand used as the input of the second step

ii Identification of module-specific lncRNAndashmRNA causal reg-ulatory networks For each lncRNAndashmRNA co-expressionmodule with each lncRNAndashmRNA pair we apply parallelIDA to estimate the causal effect of the lncRNA on the

Table 2 Public databases for storing lncRNAndashmRNA regulatory relationships

Databases Types of databases Brief descriptions Organisms Available

NPInter [40] Validated A database of experimentally verified func-tional interactions between ncRNAs(including lncRNAs miRNAs etc) and bio-molecules (proteins RNAs and DNAs)

22 organisms httpwwwbioinfoorgNPInter

LncRNADisease [41] Validated A database of experimentally supportedlncRNAndashdisease association data andlncRNAndashtarget interactions in various lev-els including protein RNA miRNA andDNA

Human httpwwwcuilabcnlncrnadisease

LncRNA2Target [42] Validated A database of lncRNAndashtarget regulatory rela-tionships experimentally validated bylncRNA knockdown or overexpression

Human mouse httpwwwlncrna2targetorg

LncReg [43] Validated A database of experimentally validatedlncRNAndashtarget interactions from publicliterature

7 organisms httpbioinformaticsustceducnlncreg

IRNdb [44] Validated A database of immunologically relevantncRNAs (miRNAs lncRNAs and otherncRNAs) and target genes

Human mouse httpcompbiomasseyacnzappsirndb

lncRInter [45] Validated A database of experimentally validatedlncRNAndashtarget interactions extracted frompeer-reviewed publications

15 organisms httpbioinfolifehusteducnlncRInter

starBase [46] Predicted A comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNAinteraction networks from 108 CLIP-Seq(PAR-CLIP HITS-CLIP iCLIP CLASH) datasets

Human httpstarbasesysueducn

lnCaNet [47] Predicted A database of establishing a comprehensiveregulatory network source for lncRNA andcancer genes

Human httplncanetbioinfo-minzhaoorg

BmncRNAdb [48] Predicted A comprehensive database of the silkwormlncRNAs and miRNAs as well as the threeonline tools for users to predict the targetgenes of lncRNAs or miRNAs

Bombyx mori httpgenecqueducnBmncRNAdbindexphp

lncRNAtor [49] Predicted A comprehensive resource of encompassingannotation sequence analysis geneexpression protein binding and phyloge-netic conservation

6 organisms httplncrnatorewhaackr

lncRNome [50] Predicted A comprehensive knowledgebase on thetypes chromosomal locations descriptionon the biological functions and diseaseassociations of lncRNAs

Human httpgenomeigibresinlncRNome

Co-LncRNA [51] Predicted A computationally predicted database toidentify GO annotations and KEGG path-ways affected by co-expressed protein-cod-ing genes of a single or multiple lncRNAs

Human httpwwwbio-bigdatacomCo-LncRNA

LncRNA2Function [52] Predicted A comprehensive resource of investigatingthe functions of lncRNAs based on co-expressed lncRNAndashmRNA interactions

Human httpmlghiteducnlncrna2function

lncRNAMap [53] Predicted An integrated and comprehensive databaseof regulatory functions of lncRNAs and act-ing as ceRNAs

Human httplncRNAMapmbcnctuedutw

6 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

mRNA We use the absolute value of the causal effect(AVCE) to evaluate the strength of the regulation of thelncRNA on the mRNA and a higher AVCE indicates a stron-ger lncRNA regulation The lncRNAndashmRNA pairs with highAVCEs in each module are considered as module-specificlncRNAndashmRNA causal regulatory relationships and we calleach module with these relationships identified a module-specific causal regulatory network

iii Identification of global lncRNAndashmRNA causal regulatorynetwork We integrate the module-specific lncRNAndashmRNA

causal regulatory networks to form the global lncRNAndashmRNA causal regulatory network

Identification of lncRNAndashmRNA co-expression modules

In systems biology WGCNA [21] is a popular method for findingthe correlation patterns among genes across samples and canbe used to identify clusters or modules of highly co-expressedgenes Therefore we use WGCNA to first infer lncRNAndashmRNAco-expression modules

Figure 2 The pipeline of MSLCRN First WGCNA is used to identify lncRNAndashmRNA co-expression modules from matched lncRNA and mRNA expression data Second

we infer lncRNAndashmRNA causal regulatory relationships in each module by using parallel IDA method For each module we assemble the identified lncRNAndashmRNA reg-

ulatory relationships to obtain a module-specific lncRNAndashmRNA causal regulatory network Third the module-specific lncRNAndashmRNA causal regulatory networks are

integrated to form a global lncRNAndashmRNA causal regulatory network

Module-specific lncRNA-mRNA causal regulatory networks | 7

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas

sij frac14 jcorethi jTHORNj (1)

where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]

To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined

wij frac14P

uaiuauj thorn aij

minfP

uaiuP

uaujg thorn 1 aij(2)

where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step

Identification of module-specific lncRNAndashmRNA causalregulatory networks

After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]

In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001

In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]

The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships

We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08

Validation survival and enrichment analysis

Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it

To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth

We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test

To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia

8 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks

We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows

p frac14 1 FethxjBNMTHORN frac14 1Xx1

ifrac140

N

i

B N

M i

B

M

(3)

In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005

Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers

By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively

To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]

Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks

In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs

Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa

Datasets Cutoffs Number of causalregulations

yfrac14axb R2

GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792

LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972

OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460

PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503

Note The AVCE cutoffs range from 010 to 060 with a step of 005

The bold values are the degree distributions of global lncRNA-mRNA causal reg-

ulatory networks with a compromised AVCE cutoff (045) in four human cancers

Module-specific lncRNA-mRNA causal regulatory networks | 9

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and

KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer

Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers

Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common

396 424

106 115

1370

133 173

1599

126

1729

1224

252

2523

2157

5081

0

2000

4000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

025

0050

0075

00

1000

0

Set Size

6 283

13 1 80 673

206

3969

76 3493

407

2816

9950

7

1948

7

0

250000

500000

750000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

0e+0

0

2e+0

5

4e+0

5

6e+0

5

8e+0

5

Set Size

6 1 2 933

2

41

6 11

246

47

524

0

200

400In

ters

ectio

n S

ize

PrCa

Ov Ca

LSCC

GBM

0200

400

600

Set Size

Module-specific genes

Module-specific causal regulations Module-specific hub lncRNAs 808611

A

B C

Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa

(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-

section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa

10 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers

As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely

connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers

The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa

Cancer-specific networks Causal regulations y=axb R2

GBM-specific 2816 y=4812x-1292 09774

LSCC-specific 99507 y=1034x-1014 09923

OvCa-specific 19487 y=7366x-1156 09723

PrCa-specific 808611 y=5243x-07055 08310

1 10 100 11001

10

100

1100

Degree of genes

Nu

mb

er o

f ge

nes

GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution

solutecation symporter activitysymporter activity

gated channel activitysodium ion transmembrane transporter activity

ion channel activitysubstrate-specific channel activity

channel activitypassive transmembrane transporter activity

growth factor activitycation channel activity

metal ion transmembrane transporter activitycollagen bindingheparin binding

sulfur compound bindingintegrin binding

glycosaminoglycan bindingextracellular matrix binding

peptide receptor activityG-protein coupled peptide receptor activity

chemokine bindingG-protein coupled receptor binding

growth factor bindingcytokine binding

serine-type endopeptidase activitydeath receptor activity

tumor necrosis factor-activated receptor activitycytokine receptor activity

protein heterodimerization activitydipeptidase activity

glycoprotein bindingRAGE receptor binding

cytokine receptor bindingcytokine activity

GBM(203)

LSCC(1015)

OvCa(479)

PrCa(3019)

001

002

003

004

padjust

GeneRatio002

004

006

008

Taste transduction

ECM-receptor interaction

cAMP signaling pathway

Calcium signaling pathway

Neuroactive ligand-receptor interaction

PI3K-Akt signaling pathway

Pathways in cancerRegulation of actin cytoskeleton

Complement and coagulation cascades

AGE-RAGE signaling pathway in diabetic complications

Hematopoietic cell lineage

Th17 cell differentiation

Inflammatory bowel disease (IBD)

Malaria

Osteoclast differentiation

Influenza A

Tuberculosis

Intestinal immune network for IgA production

Chagas disease (American trypanosomiasis)

Leishmaniasis

TNF signaling pathway

Toll-like receptor signaling pathway

Rheumatoid arthritis

Cytokine-cytokine receptor interaction

GBM(129)

LSCC(500)

OvCa(266)

PrCa(1364)

GeneRatio

005

010

015

001

002

003

004padjust

GO enrichment analysis KEGG enrichment analysis

A

B

Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA

causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa

Module-specific lncRNA-mRNA causal regulatory networks | 11

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)

By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks

Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)

To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples

Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar

Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)

Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that

occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core

lncRNAndashmRNA causal network

12 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as

lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations

A

B

Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs

Module-specific lncRNA-mRNA causal regulatory networks | 13

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

MSLCRN networks are biologically meaningful

In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks

We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases

Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful

Comparison with other PC-based networkinference methods

Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045

We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers

Conclusions and discussion

Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions

As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological

functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks

Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs

In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers

Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and

Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks

Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)

MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)

Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-

ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-

ciated MSLCRN networks

14 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs

Key Points

bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers

bull lncRNAs exhibit dynamic positive gene regulationacross human cancers

bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships

bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms

Supplementary Data

Supplementary data are available online at httpsacademicoupcombib

Funding

The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)

References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding

RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5

2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69

3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63

4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042

5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30

6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4

7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82

8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63

9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56

10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6

11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12

12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74

13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89

14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22

15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156

16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78

17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683

18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15

19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731

20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13

21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559

22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64

23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8

24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526

25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13

26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3

27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82

28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19

29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46

Module-specific lncRNA-mRNA causal regulatory networks | 15

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60

31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016

32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12

33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892

34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408

35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51

36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32

37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305

38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15

39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99

40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057

41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6

42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6

43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083

44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138

45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8

46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7

47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7

48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370

49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5

50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034

51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082

52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2

53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9

54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17

55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000

56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000

57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1

58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6

59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910

60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000

61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7

62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9

63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30

64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5

65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765

66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9

67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40

68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46

69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52

70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94

71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88

16 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2
Page 5: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

predicted free energies of binding with RNA interference experi-mental data RNAup can produce biologically reasonableresults For genome-wide predictions of ncRNA targets RNAupis not fast enough Therefore it is usually to be combined withother faster RNAndashRNA prediction methods

To extend the standard dynamic programming algorithmsfor computing RNA secondary structures Bernhart et al [26]propose a program named RNAcofold to compute the hybridiza-tion energy and base pairing pattern of the co-folding of twoRNA molecules However the method disregards some impor-tant interaction structures and is restricted to dimeric com-plexes Moreover for the RNAndashRNA interaction predictionpredicting the joint secondary structure of two interacting RNAsis also important To solve it Alkan et al [27] develop severalalgorithms to minimize the joint free energy between the twoRNAs under a number of energy models Assuming that con-served RNAndashRNA interactions imply conserved functionSeemann et al [28] also implement a comparative method calledPETcofold to predict the joint secondary structure of two inter-acting RNAs As PETcofold considers sequence conservation anincreasing amount of structural covariance can further improveits performance

RNAup [7] and RNAcofold [26] are too slow for genome-widesearch in finding target sites of ncRNAs To accelerate the speedof RNAndashRNA interaction predictions RNAplex [8] is presented toquickly find possible hybridization sites between two interact-ing RNAs To focus on the target search on short highly stableinteractions RNAplex introduces a per nucleotide penaltyMeanwhile another general and fast approach IntaRNA [9] isproposed to efficiently predict bacterial RNAndashRNA interactionsCompared with other existing target prediction methodsIntaRNA considers both the accessibility of target sites and theexistence of a user-defined seed Therefore it shows a higheraccuracy than competing methods Kato et al [10] also present afast and accurate prediction method RactIP for comprehensivetype of RNAndashRNA interactions In terms of predicting joint sec-ondary structures of two interacting RNAs RactIP run incompa-rably faster than competitive programs

To further achieve a speed improvement of predictingRNAndashRNA interactions Wenzel et al [29] present RIsearch forfast computation of hybridization between two interactingRNAs They show that the energy model of RIsearch is anaccurate approximation of the full energy model for near-complementary RNAndashRNA duplexes Furthermore RIsearch isfaster than RNAplex [8] in RNAndashRNA interaction searchRecently RIsearch2 [30] an updated version of RIsearch [29] isproposed to localize potential near-complementary RNAndashRNAinteractions between two RNA sequences The comparisonresults show that RIsearch2 is much faster than the previousmethods such as GUUGle [6] RNAplex [8] IntaRNA [9] andRIsearch [29]

Although the above RNAndashRNA interaction prediction meth-ods can be extended to predict lncRNAndashmRNA interactionsnone of them are exclusively used for identifying the RNA tar-gets of lncRNAs in a large scale To efficiently identify lncRNAndashmRNA interactions Li et al [11] propose a tool named LncTarLncTar explores lncRNAndashmRNA interactions by finding the min-imum free energy joint structure of two interacting RNAs basedon base pairing As LncTar runs fast and does not have a limitto RNA size it can be used for large-scale identification of theRNA targets for all RNAs Another web-based platformlncRNATargets [31] is also provided for lncRNA target predic-tion Because there is no limit to RNA size lncRNATargets canalso be used to identify the RNA targets of all RNAs In a whole

human transcriptome Terai et al [32] develop an integratedpipeline to predict lncRNAndashmRNA interactions for the first timeIn the pipeline IntaRNA [9] is used to calculate interactionenergy and RactIP [10] is used to predict joint secondary struc-ture Recently to further shorten the running time of predictinglncRNAndashmRNA interactions an ultrafast RNAndashRNA interactionprediction method RIblast [12] based on the seed-and-extensionmethod is presented The comparison results show that RIblastruns faster than RNAplex [8] IntaRNA [9] Terai et al pipeline[32] and thus can be applied to a large scale of lncRNA targetidentification

Expression-based method

At the gene expression level the co-expressed lncRNAndashmRNApairs are regarded as lncRNAndashmRNA interactions for theexpression-based methods Among the existing expression-based methods [16ndash19 33ndash39] Pearson correlation method is akey step of most methods to identify co-expressed lncRNAndashmRNA pairs

Liao et al [16] construct a lncRNAndashmRNA co-expression net-work from re-annotated mouse microarray data sets By usingPearson method they only keep the lncRNAndashmRNA pairs withPlt 001 and Pearson correlation ranked in the top or bottom005 percentile The study is the first large-scale prediction oflncRNA functions from a lncRNAndashmRNA co-expression networkTo identify immune-associated lncRNA biomarkers in OvCa Guoet al [17] make a comprehensive analysis of lncRNAndashmRNA co-expression patterns To identify lncRNAndashmRNA co-expressionpairs they calculate Pearson correlation between differentiallyexpressed lncRNAs and mRNAs They only reserve the lncRNAndashmRNA co-expression pairs with Pearson correlationgt 05 and thecorresponding False Discovery Rate (FDR)lt 001 Liu et al [33] andHuang et al [34] also use Pearson method to study lncRNAndashmRNAco-expression networks in human colorectal carcinoma andpneumonia respectively The inferred lncRNAndashmRNA co-expression networks will help to study lncRNA functionsRecently Du et al [18] propose a two-step method to conduct acomprehensive analysis of lncRNAndashmRNA co-expression patternsin thyroid cancer First they use Pearson method to calculatePearson correlation and the cutoff of Pearson correlation is 05and the corresponding FDR cutoff is 001 Second the Pearson cor-relations are transformed into an adjacency matrix

Owing to dynamic characteristic of gene regulatory net-works Wu et al [19] identify two distinct lncRNAndashmRNA co-expression networks in tumor and normal breast tissue Theyuse a generalized linear model to regress mRNA expression onlncRNA expression in tumor and normal breast tissue and onlyfocus on dynamic breast lncRNAndashmRNA co-expression pairsthat differ in tumor and normal breast tissue Meanwhile tostudy the potential role of lncRNAs in venous congestion Liet al [35] also construct a dynamic lncRNAndashmRNA co-expression network By using Pearson method they separatelycalculate Pearson correlations of each lncRNAndashmRNA pair invenous congestion and normal samples The lncRNAndashmRNApairs with Pearson correlationgt099 orlt099 and P-valuelt001are selected as lncRNAndashmRNA co-expression pairs They con-struct two types of lncRNAndashmRNA co-expression networkslsquolostrsquo network where lncRNAndashmRNA co-expression pairs onlyexisted in normal samples and lsquoobtainedrsquo network wherelncRNAndashmRNA co-expression pairs only existed in venous con-gestion samples The lsquolostrsquo and lsquoobtainedrsquo networks are furtherintegrated to obtain a dynamic lncRNAndashmRNA co-expressionnetwork

4 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

The above methods simply use matched lncRNA and mRNAexpression data to identify lncRNAndashmRNA co-expression pairsTo identify lsquocis-regulated target genesrsquo of lncRNAs some methodsalso consider mRNA loci information within lncRNA For exampleFu et al [36] combine mRNA loci information and matchedlncRNA and mRNA expression data to predict lncRNA targetsThey identify the mRNAs as targets under two conditions (i) themRNA loci are within a 300-kb window up- or downstream oflncRNA and (ii) lncRNAndashmRNA co-expression pairs are signifi-cantly positive correlated (Pearson correlationgt 08 and the corre-sponding P-valuelt 005) Zhang et al [37] also use a similarmethod to Fu et al [36] for identifying lncRNA targets The mRNAscan be regarded as targets when (1) the mRNA loci are within a10 window up- or downstream of lncRNA and (2) lncRNAndashmRNAco-expression pairs are significantly positive correlated (Pearsoncorrelationgt 098 and the corresponding P-valuelt 005)

Apart from mRNA loci information within lncRNA someemerging methods consider predictions from sequence-basedmethods as putative lncRNAndashmRNA interactions For exampleIwakiri et al [38] integrate tissue-specific lncRNA and mRNAexpression data into predictions from a sequence-basedmethod in [32] They discover that integrating tissue specificitycan improve prediction accuracy of lncRNAndashmRNA interactionsLv et al [39] also combine matched lncRNA and mRNA expres-sion data with predictions from a sequence-based methodLncTar [11] They first use Pearson method to identify co-expressed lncRNAndashmRNA co-expression pairs with Pearsoncorrelationgt095 orlt095 Then LncTar is used to further filterthe identified lncRNAndashmRNA co-expression pairs

Public databases for storing lncRNAndashmRNAregulatory relationships

In this section we review the public databases of storinglncRNAndashmRNA regulatory relationships Table 2 shows a sum-mary of the third-party public databases including experimen-tally validated and computationally predicted databases

NPInter [40] contains experimentally validated interactionsbetween ncRNAs especially lncRNAs and miRNAs The databasecontains 915 067 interactions in 188 tissues or cell lines from 68kinds of experimental technologies There is a classification ofthe functional interactions based on the functional process thatncRNA is involved in Moreover NPInter allows users to searchinteractions related publications and other information

LncRNADisease [41] not only collects experimentally sup-ported lncRNAndashdisease associations and lncRNA interactionsbut also predicts novel lncRNAndashdisease associations Recentlythe database curates 478 entries of experimentally validatedlncRNA interactions LncRNADisease provides users severalways to search lncRNA-related diseases and interactions

To study differentially expressed genes after lncRNA knock-down or overexpression Jiang et al [42] develop a databasecalled LncRNA2Target in human and mouse organisms Thedatabase has a collection of 396 experimentally validatedlncRNAndashtarget interactions In LncRNA2Target if a gene is dif-ferentially expressed after lncRNA knockdown or overexpres-sion it is regarded as a target of a lncRNA For convenienceLncRNA2Target allows users to search for the targets of singlelncRNA or for the lncRNAs that target a specific geneMeanwhile Zhou et al [43] also build a reference resourceLncReg for lncRNA-related regulatory networks The databasehas 1081 experimentally validated lncRNA-related regulatory

records between 258 nonredundant lncRNAs and 571 nonredun-dant genes

IRNdb [44] is a database that focuses on collecting immuno-logically relevant lncRNAndashtarget miRNAndashtarget and PIWI-interacting RNAndashtarget interactions The current version ofIRNdb documents 22 453 immunologically relevant lncRNAndashtar-get interactions by integrating three databases LncRNADisease[41] LncRNA2Target [42] and LncReg [43] The aim is to helpresearchers study the roles of ncRNAs in the immune systemRecently a new experimentally validated database namedlncRInter [45] was developed to collect reliable and high-qualitylncRNAndashtarget interactions The extracted lncRNAndashtarget inter-actions are all from published literature and are supported bycertain biological experiments (eg luciferase reporter assayin vitro binding assay RNA pull-down) In total lncRInter con-tains 1036 experimentally validated lncRNAndashtarget interactionsin 15 organisms

In addition to the experimentally validated databases pre-sented above there are several computationally predicted data-bases for collecting lncRNAndashmRNA interactions For instancestarBase [46] is a comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNA interaction net-works from 108 CLIP-Seq (PAR-CLIP HITS-CLIP iCLIP CLASH)data sets The lncRNAndashmRNA interactions can be extractedfrom proteinndashRNA interaction networks InCaNet [47] aimsto establish a comprehensive regulatory network betweenlncRNAs and cancer genes They identify lncRNAndashcancergene interactions by computing gene co-expression betweenlncRNAs and cancer genes BmncRNAdb [48] is a comprehensivedatabase of silkworm lncRNAs and miRNAs The database pro-vides three online tools for users to predict both lncRNAndashtargetand miRNAndashtarget interactions lncRNAtor [49] collect expres-sion data from 243 RNA-seq experiments including 5237samples of various tissues and developmental stages ThelncRNAndashmRNA co-expression pairs are identified through co-expression analysis of lncRNAs and mRNAs lncRNome [50] is acomprehensive knowledgebase of sequence structure biologi-cal functions genomic variations and epigenetic modificationson gt17 000 lncRNAs in human For lncRNAndashprotein interactionsthe database incorporates PAR-CLIP experiments and a supportvector machine-based prediction method Co-lncRNA [51] andLncRNA2Function [52] predict co-expressed lncRNAndashmRNAinteractions from RNA-Seq data and further annotates thepotential functions of human lncRNAs using functional enrich-ment analysis lncRNAMap [53] is an integrated and compre-hensive database to explore regulatory functions of humanlncRNAs By integrating small RNAs supported by publicly avail-able deep sequencing data lncRNAMap construct lncRNA-derived siRNAndashtarget interactions

In summary for experimentally validated databases userscan select individual database or combine several databases asground truth to validate the predicted lncRNAndashmRNA interac-tions As for computationally predicted databases they can beused as initial structural of sequence-based or expression-basedmethods to identify lncRNAndashmRNA interactions

Inferring and analyzing MSLCRN networksRepurposed microarray data across human cancers

We collect the repurposed lncRNA and mRNA expression dataof GBM LSCC OvCa and PrCa from [25] A lncRNA or mRNA iseliminated if it does not have a corresponding gene symbol in adata set By calculating average expression values of replicate

Module-specific lncRNA-mRNA causal regulatory networks | 5

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

lncRNAs and mRNAs we obtain unique expression value ofthese replicates Consequently we get the matched expressiondata of 9704 lncRNAs and 18 282 mRNAs in 451 GBM 113 LSCC585 OvCa and 150 PrCa samples

Pipeline of MSLCRN

As shown in Figure 2 MSLCRN contains the following threesteps to infer module-specific lncRNAndashmRNA causal regulatorynetworks

i Identification of lncRNAndashmRNA co-expression modulesGiven the matched lncRNA and mRNA expression data weuse WGCNA to generate gene co-expression modules Amodule containing at least two lncRNAs and two mRNAsare regarded as a lncRNAndashmRNA co-expression moduleand used as the input of the second step

ii Identification of module-specific lncRNAndashmRNA causal reg-ulatory networks For each lncRNAndashmRNA co-expressionmodule with each lncRNAndashmRNA pair we apply parallelIDA to estimate the causal effect of the lncRNA on the

Table 2 Public databases for storing lncRNAndashmRNA regulatory relationships

Databases Types of databases Brief descriptions Organisms Available

NPInter [40] Validated A database of experimentally verified func-tional interactions between ncRNAs(including lncRNAs miRNAs etc) and bio-molecules (proteins RNAs and DNAs)

22 organisms httpwwwbioinfoorgNPInter

LncRNADisease [41] Validated A database of experimentally supportedlncRNAndashdisease association data andlncRNAndashtarget interactions in various lev-els including protein RNA miRNA andDNA

Human httpwwwcuilabcnlncrnadisease

LncRNA2Target [42] Validated A database of lncRNAndashtarget regulatory rela-tionships experimentally validated bylncRNA knockdown or overexpression

Human mouse httpwwwlncrna2targetorg

LncReg [43] Validated A database of experimentally validatedlncRNAndashtarget interactions from publicliterature

7 organisms httpbioinformaticsustceducnlncreg

IRNdb [44] Validated A database of immunologically relevantncRNAs (miRNAs lncRNAs and otherncRNAs) and target genes

Human mouse httpcompbiomasseyacnzappsirndb

lncRInter [45] Validated A database of experimentally validatedlncRNAndashtarget interactions extracted frompeer-reviewed publications

15 organisms httpbioinfolifehusteducnlncRInter

starBase [46] Predicted A comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNAinteraction networks from 108 CLIP-Seq(PAR-CLIP HITS-CLIP iCLIP CLASH) datasets

Human httpstarbasesysueducn

lnCaNet [47] Predicted A database of establishing a comprehensiveregulatory network source for lncRNA andcancer genes

Human httplncanetbioinfo-minzhaoorg

BmncRNAdb [48] Predicted A comprehensive database of the silkwormlncRNAs and miRNAs as well as the threeonline tools for users to predict the targetgenes of lncRNAs or miRNAs

Bombyx mori httpgenecqueducnBmncRNAdbindexphp

lncRNAtor [49] Predicted A comprehensive resource of encompassingannotation sequence analysis geneexpression protein binding and phyloge-netic conservation

6 organisms httplncrnatorewhaackr

lncRNome [50] Predicted A comprehensive knowledgebase on thetypes chromosomal locations descriptionon the biological functions and diseaseassociations of lncRNAs

Human httpgenomeigibresinlncRNome

Co-LncRNA [51] Predicted A computationally predicted database toidentify GO annotations and KEGG path-ways affected by co-expressed protein-cod-ing genes of a single or multiple lncRNAs

Human httpwwwbio-bigdatacomCo-LncRNA

LncRNA2Function [52] Predicted A comprehensive resource of investigatingthe functions of lncRNAs based on co-expressed lncRNAndashmRNA interactions

Human httpmlghiteducnlncrna2function

lncRNAMap [53] Predicted An integrated and comprehensive databaseof regulatory functions of lncRNAs and act-ing as ceRNAs

Human httplncRNAMapmbcnctuedutw

6 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

mRNA We use the absolute value of the causal effect(AVCE) to evaluate the strength of the regulation of thelncRNA on the mRNA and a higher AVCE indicates a stron-ger lncRNA regulation The lncRNAndashmRNA pairs with highAVCEs in each module are considered as module-specificlncRNAndashmRNA causal regulatory relationships and we calleach module with these relationships identified a module-specific causal regulatory network

iii Identification of global lncRNAndashmRNA causal regulatorynetwork We integrate the module-specific lncRNAndashmRNA

causal regulatory networks to form the global lncRNAndashmRNA causal regulatory network

Identification of lncRNAndashmRNA co-expression modules

In systems biology WGCNA [21] is a popular method for findingthe correlation patterns among genes across samples and canbe used to identify clusters or modules of highly co-expressedgenes Therefore we use WGCNA to first infer lncRNAndashmRNAco-expression modules

Figure 2 The pipeline of MSLCRN First WGCNA is used to identify lncRNAndashmRNA co-expression modules from matched lncRNA and mRNA expression data Second

we infer lncRNAndashmRNA causal regulatory relationships in each module by using parallel IDA method For each module we assemble the identified lncRNAndashmRNA reg-

ulatory relationships to obtain a module-specific lncRNAndashmRNA causal regulatory network Third the module-specific lncRNAndashmRNA causal regulatory networks are

integrated to form a global lncRNAndashmRNA causal regulatory network

Module-specific lncRNA-mRNA causal regulatory networks | 7

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas

sij frac14 jcorethi jTHORNj (1)

where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]

To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined

wij frac14P

uaiuauj thorn aij

minfP

uaiuP

uaujg thorn 1 aij(2)

where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step

Identification of module-specific lncRNAndashmRNA causalregulatory networks

After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]

In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001

In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]

The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships

We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08

Validation survival and enrichment analysis

Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it

To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth

We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test

To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia

8 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks

We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows

p frac14 1 FethxjBNMTHORN frac14 1Xx1

ifrac140

N

i

B N

M i

B

M

(3)

In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005

Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers

By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively

To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]

Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks

In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs

Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa

Datasets Cutoffs Number of causalregulations

yfrac14axb R2

GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792

LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972

OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460

PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503

Note The AVCE cutoffs range from 010 to 060 with a step of 005

The bold values are the degree distributions of global lncRNA-mRNA causal reg-

ulatory networks with a compromised AVCE cutoff (045) in four human cancers

Module-specific lncRNA-mRNA causal regulatory networks | 9

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and

KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer

Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers

Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common

396 424

106 115

1370

133 173

1599

126

1729

1224

252

2523

2157

5081

0

2000

4000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

025

0050

0075

00

1000

0

Set Size

6 283

13 1 80 673

206

3969

76 3493

407

2816

9950

7

1948

7

0

250000

500000

750000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

0e+0

0

2e+0

5

4e+0

5

6e+0

5

8e+0

5

Set Size

6 1 2 933

2

41

6 11

246

47

524

0

200

400In

ters

ectio

n S

ize

PrCa

Ov Ca

LSCC

GBM

0200

400

600

Set Size

Module-specific genes

Module-specific causal regulations Module-specific hub lncRNAs 808611

A

B C

Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa

(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-

section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa

10 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers

As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely

connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers

The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa

Cancer-specific networks Causal regulations y=axb R2

GBM-specific 2816 y=4812x-1292 09774

LSCC-specific 99507 y=1034x-1014 09923

OvCa-specific 19487 y=7366x-1156 09723

PrCa-specific 808611 y=5243x-07055 08310

1 10 100 11001

10

100

1100

Degree of genes

Nu

mb

er o

f ge

nes

GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution

solutecation symporter activitysymporter activity

gated channel activitysodium ion transmembrane transporter activity

ion channel activitysubstrate-specific channel activity

channel activitypassive transmembrane transporter activity

growth factor activitycation channel activity

metal ion transmembrane transporter activitycollagen bindingheparin binding

sulfur compound bindingintegrin binding

glycosaminoglycan bindingextracellular matrix binding

peptide receptor activityG-protein coupled peptide receptor activity

chemokine bindingG-protein coupled receptor binding

growth factor bindingcytokine binding

serine-type endopeptidase activitydeath receptor activity

tumor necrosis factor-activated receptor activitycytokine receptor activity

protein heterodimerization activitydipeptidase activity

glycoprotein bindingRAGE receptor binding

cytokine receptor bindingcytokine activity

GBM(203)

LSCC(1015)

OvCa(479)

PrCa(3019)

001

002

003

004

padjust

GeneRatio002

004

006

008

Taste transduction

ECM-receptor interaction

cAMP signaling pathway

Calcium signaling pathway

Neuroactive ligand-receptor interaction

PI3K-Akt signaling pathway

Pathways in cancerRegulation of actin cytoskeleton

Complement and coagulation cascades

AGE-RAGE signaling pathway in diabetic complications

Hematopoietic cell lineage

Th17 cell differentiation

Inflammatory bowel disease (IBD)

Malaria

Osteoclast differentiation

Influenza A

Tuberculosis

Intestinal immune network for IgA production

Chagas disease (American trypanosomiasis)

Leishmaniasis

TNF signaling pathway

Toll-like receptor signaling pathway

Rheumatoid arthritis

Cytokine-cytokine receptor interaction

GBM(129)

LSCC(500)

OvCa(266)

PrCa(1364)

GeneRatio

005

010

015

001

002

003

004padjust

GO enrichment analysis KEGG enrichment analysis

A

B

Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA

causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa

Module-specific lncRNA-mRNA causal regulatory networks | 11

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)

By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks

Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)

To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples

Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar

Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)

Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that

occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core

lncRNAndashmRNA causal network

12 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as

lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations

A

B

Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs

Module-specific lncRNA-mRNA causal regulatory networks | 13

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

MSLCRN networks are biologically meaningful

In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks

We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases

Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful

Comparison with other PC-based networkinference methods

Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045

We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers

Conclusions and discussion

Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions

As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological

functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks

Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs

In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers

Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and

Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks

Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)

MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)

Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-

ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-

ciated MSLCRN networks

14 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs

Key Points

bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers

bull lncRNAs exhibit dynamic positive gene regulationacross human cancers

bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships

bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms

Supplementary Data

Supplementary data are available online at httpsacademicoupcombib

Funding

The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)

References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding

RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5

2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69

3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63

4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042

5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30

6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4

7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82

8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63

9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56

10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6

11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12

12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74

13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89

14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22

15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156

16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78

17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683

18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15

19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731

20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13

21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559

22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64

23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8

24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526

25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13

26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3

27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82

28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19

29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46

Module-specific lncRNA-mRNA causal regulatory networks | 15

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60

31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016

32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12

33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892

34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408

35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51

36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32

37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305

38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15

39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99

40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057

41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6

42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6

43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083

44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138

45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8

46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7

47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7

48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370

49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5

50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034

51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082

52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2

53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9

54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17

55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000

56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000

57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1

58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6

59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910

60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000

61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7

62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9

63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30

64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5

65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765

66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9

67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40

68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46

69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52

70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94

71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88

16 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2
Page 6: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

The above methods simply use matched lncRNA and mRNAexpression data to identify lncRNAndashmRNA co-expression pairsTo identify lsquocis-regulated target genesrsquo of lncRNAs some methodsalso consider mRNA loci information within lncRNA For exampleFu et al [36] combine mRNA loci information and matchedlncRNA and mRNA expression data to predict lncRNA targetsThey identify the mRNAs as targets under two conditions (i) themRNA loci are within a 300-kb window up- or downstream oflncRNA and (ii) lncRNAndashmRNA co-expression pairs are signifi-cantly positive correlated (Pearson correlationgt 08 and the corre-sponding P-valuelt 005) Zhang et al [37] also use a similarmethod to Fu et al [36] for identifying lncRNA targets The mRNAscan be regarded as targets when (1) the mRNA loci are within a10 window up- or downstream of lncRNA and (2) lncRNAndashmRNAco-expression pairs are significantly positive correlated (Pearsoncorrelationgt 098 and the corresponding P-valuelt 005)

Apart from mRNA loci information within lncRNA someemerging methods consider predictions from sequence-basedmethods as putative lncRNAndashmRNA interactions For exampleIwakiri et al [38] integrate tissue-specific lncRNA and mRNAexpression data into predictions from a sequence-basedmethod in [32] They discover that integrating tissue specificitycan improve prediction accuracy of lncRNAndashmRNA interactionsLv et al [39] also combine matched lncRNA and mRNA expres-sion data with predictions from a sequence-based methodLncTar [11] They first use Pearson method to identify co-expressed lncRNAndashmRNA co-expression pairs with Pearsoncorrelationgt095 orlt095 Then LncTar is used to further filterthe identified lncRNAndashmRNA co-expression pairs

Public databases for storing lncRNAndashmRNAregulatory relationships

In this section we review the public databases of storinglncRNAndashmRNA regulatory relationships Table 2 shows a sum-mary of the third-party public databases including experimen-tally validated and computationally predicted databases

NPInter [40] contains experimentally validated interactionsbetween ncRNAs especially lncRNAs and miRNAs The databasecontains 915 067 interactions in 188 tissues or cell lines from 68kinds of experimental technologies There is a classification ofthe functional interactions based on the functional process thatncRNA is involved in Moreover NPInter allows users to searchinteractions related publications and other information

LncRNADisease [41] not only collects experimentally sup-ported lncRNAndashdisease associations and lncRNA interactionsbut also predicts novel lncRNAndashdisease associations Recentlythe database curates 478 entries of experimentally validatedlncRNA interactions LncRNADisease provides users severalways to search lncRNA-related diseases and interactions

To study differentially expressed genes after lncRNA knock-down or overexpression Jiang et al [42] develop a databasecalled LncRNA2Target in human and mouse organisms Thedatabase has a collection of 396 experimentally validatedlncRNAndashtarget interactions In LncRNA2Target if a gene is dif-ferentially expressed after lncRNA knockdown or overexpres-sion it is regarded as a target of a lncRNA For convenienceLncRNA2Target allows users to search for the targets of singlelncRNA or for the lncRNAs that target a specific geneMeanwhile Zhou et al [43] also build a reference resourceLncReg for lncRNA-related regulatory networks The databasehas 1081 experimentally validated lncRNA-related regulatory

records between 258 nonredundant lncRNAs and 571 nonredun-dant genes

IRNdb [44] is a database that focuses on collecting immuno-logically relevant lncRNAndashtarget miRNAndashtarget and PIWI-interacting RNAndashtarget interactions The current version ofIRNdb documents 22 453 immunologically relevant lncRNAndashtar-get interactions by integrating three databases LncRNADisease[41] LncRNA2Target [42] and LncReg [43] The aim is to helpresearchers study the roles of ncRNAs in the immune systemRecently a new experimentally validated database namedlncRInter [45] was developed to collect reliable and high-qualitylncRNAndashtarget interactions The extracted lncRNAndashtarget inter-actions are all from published literature and are supported bycertain biological experiments (eg luciferase reporter assayin vitro binding assay RNA pull-down) In total lncRInter con-tains 1036 experimentally validated lncRNAndashtarget interactionsin 15 organisms

In addition to the experimentally validated databases pre-sented above there are several computationally predicted data-bases for collecting lncRNAndashmRNA interactions For instancestarBase [46] is a comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNA interaction net-works from 108 CLIP-Seq (PAR-CLIP HITS-CLIP iCLIP CLASH)data sets The lncRNAndashmRNA interactions can be extractedfrom proteinndashRNA interaction networks InCaNet [47] aimsto establish a comprehensive regulatory network betweenlncRNAs and cancer genes They identify lncRNAndashcancergene interactions by computing gene co-expression betweenlncRNAs and cancer genes BmncRNAdb [48] is a comprehensivedatabase of silkworm lncRNAs and miRNAs The database pro-vides three online tools for users to predict both lncRNAndashtargetand miRNAndashtarget interactions lncRNAtor [49] collect expres-sion data from 243 RNA-seq experiments including 5237samples of various tissues and developmental stages ThelncRNAndashmRNA co-expression pairs are identified through co-expression analysis of lncRNAs and mRNAs lncRNome [50] is acomprehensive knowledgebase of sequence structure biologi-cal functions genomic variations and epigenetic modificationson gt17 000 lncRNAs in human For lncRNAndashprotein interactionsthe database incorporates PAR-CLIP experiments and a supportvector machine-based prediction method Co-lncRNA [51] andLncRNA2Function [52] predict co-expressed lncRNAndashmRNAinteractions from RNA-Seq data and further annotates thepotential functions of human lncRNAs using functional enrich-ment analysis lncRNAMap [53] is an integrated and compre-hensive database to explore regulatory functions of humanlncRNAs By integrating small RNAs supported by publicly avail-able deep sequencing data lncRNAMap construct lncRNA-derived siRNAndashtarget interactions

In summary for experimentally validated databases userscan select individual database or combine several databases asground truth to validate the predicted lncRNAndashmRNA interac-tions As for computationally predicted databases they can beused as initial structural of sequence-based or expression-basedmethods to identify lncRNAndashmRNA interactions

Inferring and analyzing MSLCRN networksRepurposed microarray data across human cancers

We collect the repurposed lncRNA and mRNA expression dataof GBM LSCC OvCa and PrCa from [25] A lncRNA or mRNA iseliminated if it does not have a corresponding gene symbol in adata set By calculating average expression values of replicate

Module-specific lncRNA-mRNA causal regulatory networks | 5

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

lncRNAs and mRNAs we obtain unique expression value ofthese replicates Consequently we get the matched expressiondata of 9704 lncRNAs and 18 282 mRNAs in 451 GBM 113 LSCC585 OvCa and 150 PrCa samples

Pipeline of MSLCRN

As shown in Figure 2 MSLCRN contains the following threesteps to infer module-specific lncRNAndashmRNA causal regulatorynetworks

i Identification of lncRNAndashmRNA co-expression modulesGiven the matched lncRNA and mRNA expression data weuse WGCNA to generate gene co-expression modules Amodule containing at least two lncRNAs and two mRNAsare regarded as a lncRNAndashmRNA co-expression moduleand used as the input of the second step

ii Identification of module-specific lncRNAndashmRNA causal reg-ulatory networks For each lncRNAndashmRNA co-expressionmodule with each lncRNAndashmRNA pair we apply parallelIDA to estimate the causal effect of the lncRNA on the

Table 2 Public databases for storing lncRNAndashmRNA regulatory relationships

Databases Types of databases Brief descriptions Organisms Available

NPInter [40] Validated A database of experimentally verified func-tional interactions between ncRNAs(including lncRNAs miRNAs etc) and bio-molecules (proteins RNAs and DNAs)

22 organisms httpwwwbioinfoorgNPInter

LncRNADisease [41] Validated A database of experimentally supportedlncRNAndashdisease association data andlncRNAndashtarget interactions in various lev-els including protein RNA miRNA andDNA

Human httpwwwcuilabcnlncrnadisease

LncRNA2Target [42] Validated A database of lncRNAndashtarget regulatory rela-tionships experimentally validated bylncRNA knockdown or overexpression

Human mouse httpwwwlncrna2targetorg

LncReg [43] Validated A database of experimentally validatedlncRNAndashtarget interactions from publicliterature

7 organisms httpbioinformaticsustceducnlncreg

IRNdb [44] Validated A database of immunologically relevantncRNAs (miRNAs lncRNAs and otherncRNAs) and target genes

Human mouse httpcompbiomasseyacnzappsirndb

lncRInter [45] Validated A database of experimentally validatedlncRNAndashtarget interactions extracted frompeer-reviewed publications

15 organisms httpbioinfolifehusteducnlncRInter

starBase [46] Predicted A comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNAinteraction networks from 108 CLIP-Seq(PAR-CLIP HITS-CLIP iCLIP CLASH) datasets

Human httpstarbasesysueducn

lnCaNet [47] Predicted A database of establishing a comprehensiveregulatory network source for lncRNA andcancer genes

Human httplncanetbioinfo-minzhaoorg

BmncRNAdb [48] Predicted A comprehensive database of the silkwormlncRNAs and miRNAs as well as the threeonline tools for users to predict the targetgenes of lncRNAs or miRNAs

Bombyx mori httpgenecqueducnBmncRNAdbindexphp

lncRNAtor [49] Predicted A comprehensive resource of encompassingannotation sequence analysis geneexpression protein binding and phyloge-netic conservation

6 organisms httplncrnatorewhaackr

lncRNome [50] Predicted A comprehensive knowledgebase on thetypes chromosomal locations descriptionon the biological functions and diseaseassociations of lncRNAs

Human httpgenomeigibresinlncRNome

Co-LncRNA [51] Predicted A computationally predicted database toidentify GO annotations and KEGG path-ways affected by co-expressed protein-cod-ing genes of a single or multiple lncRNAs

Human httpwwwbio-bigdatacomCo-LncRNA

LncRNA2Function [52] Predicted A comprehensive resource of investigatingthe functions of lncRNAs based on co-expressed lncRNAndashmRNA interactions

Human httpmlghiteducnlncrna2function

lncRNAMap [53] Predicted An integrated and comprehensive databaseof regulatory functions of lncRNAs and act-ing as ceRNAs

Human httplncRNAMapmbcnctuedutw

6 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

mRNA We use the absolute value of the causal effect(AVCE) to evaluate the strength of the regulation of thelncRNA on the mRNA and a higher AVCE indicates a stron-ger lncRNA regulation The lncRNAndashmRNA pairs with highAVCEs in each module are considered as module-specificlncRNAndashmRNA causal regulatory relationships and we calleach module with these relationships identified a module-specific causal regulatory network

iii Identification of global lncRNAndashmRNA causal regulatorynetwork We integrate the module-specific lncRNAndashmRNA

causal regulatory networks to form the global lncRNAndashmRNA causal regulatory network

Identification of lncRNAndashmRNA co-expression modules

In systems biology WGCNA [21] is a popular method for findingthe correlation patterns among genes across samples and canbe used to identify clusters or modules of highly co-expressedgenes Therefore we use WGCNA to first infer lncRNAndashmRNAco-expression modules

Figure 2 The pipeline of MSLCRN First WGCNA is used to identify lncRNAndashmRNA co-expression modules from matched lncRNA and mRNA expression data Second

we infer lncRNAndashmRNA causal regulatory relationships in each module by using parallel IDA method For each module we assemble the identified lncRNAndashmRNA reg-

ulatory relationships to obtain a module-specific lncRNAndashmRNA causal regulatory network Third the module-specific lncRNAndashmRNA causal regulatory networks are

integrated to form a global lncRNAndashmRNA causal regulatory network

Module-specific lncRNA-mRNA causal regulatory networks | 7

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas

sij frac14 jcorethi jTHORNj (1)

where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]

To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined

wij frac14P

uaiuauj thorn aij

minfP

uaiuP

uaujg thorn 1 aij(2)

where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step

Identification of module-specific lncRNAndashmRNA causalregulatory networks

After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]

In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001

In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]

The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships

We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08

Validation survival and enrichment analysis

Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it

To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth

We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test

To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia

8 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks

We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows

p frac14 1 FethxjBNMTHORN frac14 1Xx1

ifrac140

N

i

B N

M i

B

M

(3)

In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005

Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers

By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively

To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]

Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks

In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs

Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa

Datasets Cutoffs Number of causalregulations

yfrac14axb R2

GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792

LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972

OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460

PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503

Note The AVCE cutoffs range from 010 to 060 with a step of 005

The bold values are the degree distributions of global lncRNA-mRNA causal reg-

ulatory networks with a compromised AVCE cutoff (045) in four human cancers

Module-specific lncRNA-mRNA causal regulatory networks | 9

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and

KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer

Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers

Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common

396 424

106 115

1370

133 173

1599

126

1729

1224

252

2523

2157

5081

0

2000

4000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

025

0050

0075

00

1000

0

Set Size

6 283

13 1 80 673

206

3969

76 3493

407

2816

9950

7

1948

7

0

250000

500000

750000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

0e+0

0

2e+0

5

4e+0

5

6e+0

5

8e+0

5

Set Size

6 1 2 933

2

41

6 11

246

47

524

0

200

400In

ters

ectio

n S

ize

PrCa

Ov Ca

LSCC

GBM

0200

400

600

Set Size

Module-specific genes

Module-specific causal regulations Module-specific hub lncRNAs 808611

A

B C

Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa

(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-

section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa

10 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers

As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely

connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers

The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa

Cancer-specific networks Causal regulations y=axb R2

GBM-specific 2816 y=4812x-1292 09774

LSCC-specific 99507 y=1034x-1014 09923

OvCa-specific 19487 y=7366x-1156 09723

PrCa-specific 808611 y=5243x-07055 08310

1 10 100 11001

10

100

1100

Degree of genes

Nu

mb

er o

f ge

nes

GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution

solutecation symporter activitysymporter activity

gated channel activitysodium ion transmembrane transporter activity

ion channel activitysubstrate-specific channel activity

channel activitypassive transmembrane transporter activity

growth factor activitycation channel activity

metal ion transmembrane transporter activitycollagen bindingheparin binding

sulfur compound bindingintegrin binding

glycosaminoglycan bindingextracellular matrix binding

peptide receptor activityG-protein coupled peptide receptor activity

chemokine bindingG-protein coupled receptor binding

growth factor bindingcytokine binding

serine-type endopeptidase activitydeath receptor activity

tumor necrosis factor-activated receptor activitycytokine receptor activity

protein heterodimerization activitydipeptidase activity

glycoprotein bindingRAGE receptor binding

cytokine receptor bindingcytokine activity

GBM(203)

LSCC(1015)

OvCa(479)

PrCa(3019)

001

002

003

004

padjust

GeneRatio002

004

006

008

Taste transduction

ECM-receptor interaction

cAMP signaling pathway

Calcium signaling pathway

Neuroactive ligand-receptor interaction

PI3K-Akt signaling pathway

Pathways in cancerRegulation of actin cytoskeleton

Complement and coagulation cascades

AGE-RAGE signaling pathway in diabetic complications

Hematopoietic cell lineage

Th17 cell differentiation

Inflammatory bowel disease (IBD)

Malaria

Osteoclast differentiation

Influenza A

Tuberculosis

Intestinal immune network for IgA production

Chagas disease (American trypanosomiasis)

Leishmaniasis

TNF signaling pathway

Toll-like receptor signaling pathway

Rheumatoid arthritis

Cytokine-cytokine receptor interaction

GBM(129)

LSCC(500)

OvCa(266)

PrCa(1364)

GeneRatio

005

010

015

001

002

003

004padjust

GO enrichment analysis KEGG enrichment analysis

A

B

Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA

causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa

Module-specific lncRNA-mRNA causal regulatory networks | 11

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)

By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks

Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)

To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples

Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar

Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)

Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that

occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core

lncRNAndashmRNA causal network

12 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as

lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations

A

B

Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs

Module-specific lncRNA-mRNA causal regulatory networks | 13

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

MSLCRN networks are biologically meaningful

In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks

We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases

Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful

Comparison with other PC-based networkinference methods

Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045

We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers

Conclusions and discussion

Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions

As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological

functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks

Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs

In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers

Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and

Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks

Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)

MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)

Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-

ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-

ciated MSLCRN networks

14 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs

Key Points

bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers

bull lncRNAs exhibit dynamic positive gene regulationacross human cancers

bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships

bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms

Supplementary Data

Supplementary data are available online at httpsacademicoupcombib

Funding

The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)

References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding

RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5

2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69

3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63

4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042

5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30

6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4

7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82

8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63

9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56

10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6

11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12

12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74

13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89

14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22

15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156

16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78

17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683

18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15

19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731

20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13

21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559

22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64

23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8

24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526

25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13

26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3

27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82

28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19

29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46

Module-specific lncRNA-mRNA causal regulatory networks | 15

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60

31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016

32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12

33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892

34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408

35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51

36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32

37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305

38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15

39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99

40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057

41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6

42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6

43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083

44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138

45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8

46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7

47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7

48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370

49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5

50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034

51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082

52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2

53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9

54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17

55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000

56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000

57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1

58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6

59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910

60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000

61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7

62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9

63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30

64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5

65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765

66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9

67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40

68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46

69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52

70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94

71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88

16 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2
Page 7: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

lncRNAs and mRNAs we obtain unique expression value ofthese replicates Consequently we get the matched expressiondata of 9704 lncRNAs and 18 282 mRNAs in 451 GBM 113 LSCC585 OvCa and 150 PrCa samples

Pipeline of MSLCRN

As shown in Figure 2 MSLCRN contains the following threesteps to infer module-specific lncRNAndashmRNA causal regulatorynetworks

i Identification of lncRNAndashmRNA co-expression modulesGiven the matched lncRNA and mRNA expression data weuse WGCNA to generate gene co-expression modules Amodule containing at least two lncRNAs and two mRNAsare regarded as a lncRNAndashmRNA co-expression moduleand used as the input of the second step

ii Identification of module-specific lncRNAndashmRNA causal reg-ulatory networks For each lncRNAndashmRNA co-expressionmodule with each lncRNAndashmRNA pair we apply parallelIDA to estimate the causal effect of the lncRNA on the

Table 2 Public databases for storing lncRNAndashmRNA regulatory relationships

Databases Types of databases Brief descriptions Organisms Available

NPInter [40] Validated A database of experimentally verified func-tional interactions between ncRNAs(including lncRNAs miRNAs etc) and bio-molecules (proteins RNAs and DNAs)

22 organisms httpwwwbioinfoorgNPInter

LncRNADisease [41] Validated A database of experimentally supportedlncRNAndashdisease association data andlncRNAndashtarget interactions in various lev-els including protein RNA miRNA andDNA

Human httpwwwcuilabcnlncrnadisease

LncRNA2Target [42] Validated A database of lncRNAndashtarget regulatory rela-tionships experimentally validated bylncRNA knockdown or overexpression

Human mouse httpwwwlncrna2targetorg

LncReg [43] Validated A database of experimentally validatedlncRNAndashtarget interactions from publicliterature

7 organisms httpbioinformaticsustceducnlncreg

IRNdb [44] Validated A database of immunologically relevantncRNAs (miRNAs lncRNAs and otherncRNAs) and target genes

Human mouse httpcompbiomasseyacnzappsirndb

lncRInter [45] Validated A database of experimentally validatedlncRNAndashtarget interactions extracted frompeer-reviewed publications

15 organisms httpbioinfolifehusteducnlncRInter

starBase [46] Predicted A comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNAinteraction networks from 108 CLIP-Seq(PAR-CLIP HITS-CLIP iCLIP CLASH) datasets

Human httpstarbasesysueducn

lnCaNet [47] Predicted A database of establishing a comprehensiveregulatory network source for lncRNA andcancer genes

Human httplncanetbioinfo-minzhaoorg

BmncRNAdb [48] Predicted A comprehensive database of the silkwormlncRNAs and miRNAs as well as the threeonline tools for users to predict the targetgenes of lncRNAs or miRNAs

Bombyx mori httpgenecqueducnBmncRNAdbindexphp

lncRNAtor [49] Predicted A comprehensive resource of encompassingannotation sequence analysis geneexpression protein binding and phyloge-netic conservation

6 organisms httplncrnatorewhaackr

lncRNome [50] Predicted A comprehensive knowledgebase on thetypes chromosomal locations descriptionon the biological functions and diseaseassociations of lncRNAs

Human httpgenomeigibresinlncRNome

Co-LncRNA [51] Predicted A computationally predicted database toidentify GO annotations and KEGG path-ways affected by co-expressed protein-cod-ing genes of a single or multiple lncRNAs

Human httpwwwbio-bigdatacomCo-LncRNA

LncRNA2Function [52] Predicted A comprehensive resource of investigatingthe functions of lncRNAs based on co-expressed lncRNAndashmRNA interactions

Human httpmlghiteducnlncrna2function

lncRNAMap [53] Predicted An integrated and comprehensive databaseof regulatory functions of lncRNAs and act-ing as ceRNAs

Human httplncRNAMapmbcnctuedutw

6 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

mRNA We use the absolute value of the causal effect(AVCE) to evaluate the strength of the regulation of thelncRNA on the mRNA and a higher AVCE indicates a stron-ger lncRNA regulation The lncRNAndashmRNA pairs with highAVCEs in each module are considered as module-specificlncRNAndashmRNA causal regulatory relationships and we calleach module with these relationships identified a module-specific causal regulatory network

iii Identification of global lncRNAndashmRNA causal regulatorynetwork We integrate the module-specific lncRNAndashmRNA

causal regulatory networks to form the global lncRNAndashmRNA causal regulatory network

Identification of lncRNAndashmRNA co-expression modules

In systems biology WGCNA [21] is a popular method for findingthe correlation patterns among genes across samples and canbe used to identify clusters or modules of highly co-expressedgenes Therefore we use WGCNA to first infer lncRNAndashmRNAco-expression modules

Figure 2 The pipeline of MSLCRN First WGCNA is used to identify lncRNAndashmRNA co-expression modules from matched lncRNA and mRNA expression data Second

we infer lncRNAndashmRNA causal regulatory relationships in each module by using parallel IDA method For each module we assemble the identified lncRNAndashmRNA reg-

ulatory relationships to obtain a module-specific lncRNAndashmRNA causal regulatory network Third the module-specific lncRNAndashmRNA causal regulatory networks are

integrated to form a global lncRNAndashmRNA causal regulatory network

Module-specific lncRNA-mRNA causal regulatory networks | 7

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas

sij frac14 jcorethi jTHORNj (1)

where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]

To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined

wij frac14P

uaiuauj thorn aij

minfP

uaiuP

uaujg thorn 1 aij(2)

where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step

Identification of module-specific lncRNAndashmRNA causalregulatory networks

After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]

In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001

In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]

The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships

We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08

Validation survival and enrichment analysis

Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it

To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth

We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test

To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia

8 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks

We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows

p frac14 1 FethxjBNMTHORN frac14 1Xx1

ifrac140

N

i

B N

M i

B

M

(3)

In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005

Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers

By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively

To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]

Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks

In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs

Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa

Datasets Cutoffs Number of causalregulations

yfrac14axb R2

GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792

LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972

OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460

PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503

Note The AVCE cutoffs range from 010 to 060 with a step of 005

The bold values are the degree distributions of global lncRNA-mRNA causal reg-

ulatory networks with a compromised AVCE cutoff (045) in four human cancers

Module-specific lncRNA-mRNA causal regulatory networks | 9

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and

KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer

Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers

Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common

396 424

106 115

1370

133 173

1599

126

1729

1224

252

2523

2157

5081

0

2000

4000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

025

0050

0075

00

1000

0

Set Size

6 283

13 1 80 673

206

3969

76 3493

407

2816

9950

7

1948

7

0

250000

500000

750000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

0e+0

0

2e+0

5

4e+0

5

6e+0

5

8e+0

5

Set Size

6 1 2 933

2

41

6 11

246

47

524

0

200

400In

ters

ectio

n S

ize

PrCa

Ov Ca

LSCC

GBM

0200

400

600

Set Size

Module-specific genes

Module-specific causal regulations Module-specific hub lncRNAs 808611

A

B C

Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa

(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-

section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa

10 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers

As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely

connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers

The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa

Cancer-specific networks Causal regulations y=axb R2

GBM-specific 2816 y=4812x-1292 09774

LSCC-specific 99507 y=1034x-1014 09923

OvCa-specific 19487 y=7366x-1156 09723

PrCa-specific 808611 y=5243x-07055 08310

1 10 100 11001

10

100

1100

Degree of genes

Nu

mb

er o

f ge

nes

GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution

solutecation symporter activitysymporter activity

gated channel activitysodium ion transmembrane transporter activity

ion channel activitysubstrate-specific channel activity

channel activitypassive transmembrane transporter activity

growth factor activitycation channel activity

metal ion transmembrane transporter activitycollagen bindingheparin binding

sulfur compound bindingintegrin binding

glycosaminoglycan bindingextracellular matrix binding

peptide receptor activityG-protein coupled peptide receptor activity

chemokine bindingG-protein coupled receptor binding

growth factor bindingcytokine binding

serine-type endopeptidase activitydeath receptor activity

tumor necrosis factor-activated receptor activitycytokine receptor activity

protein heterodimerization activitydipeptidase activity

glycoprotein bindingRAGE receptor binding

cytokine receptor bindingcytokine activity

GBM(203)

LSCC(1015)

OvCa(479)

PrCa(3019)

001

002

003

004

padjust

GeneRatio002

004

006

008

Taste transduction

ECM-receptor interaction

cAMP signaling pathway

Calcium signaling pathway

Neuroactive ligand-receptor interaction

PI3K-Akt signaling pathway

Pathways in cancerRegulation of actin cytoskeleton

Complement and coagulation cascades

AGE-RAGE signaling pathway in diabetic complications

Hematopoietic cell lineage

Th17 cell differentiation

Inflammatory bowel disease (IBD)

Malaria

Osteoclast differentiation

Influenza A

Tuberculosis

Intestinal immune network for IgA production

Chagas disease (American trypanosomiasis)

Leishmaniasis

TNF signaling pathway

Toll-like receptor signaling pathway

Rheumatoid arthritis

Cytokine-cytokine receptor interaction

GBM(129)

LSCC(500)

OvCa(266)

PrCa(1364)

GeneRatio

005

010

015

001

002

003

004padjust

GO enrichment analysis KEGG enrichment analysis

A

B

Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA

causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa

Module-specific lncRNA-mRNA causal regulatory networks | 11

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)

By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks

Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)

To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples

Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar

Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)

Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that

occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core

lncRNAndashmRNA causal network

12 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as

lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations

A

B

Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs

Module-specific lncRNA-mRNA causal regulatory networks | 13

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

MSLCRN networks are biologically meaningful

In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks

We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases

Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful

Comparison with other PC-based networkinference methods

Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045

We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers

Conclusions and discussion

Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions

As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological

functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks

Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs

In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers

Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and

Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks

Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)

MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)

Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-

ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-

ciated MSLCRN networks

14 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs

Key Points

bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers

bull lncRNAs exhibit dynamic positive gene regulationacross human cancers

bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships

bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms

Supplementary Data

Supplementary data are available online at httpsacademicoupcombib

Funding

The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)

References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding

RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5

2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69

3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63

4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042

5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30

6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4

7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82

8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63

9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56

10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6

11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12

12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74

13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89

14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22

15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156

16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78

17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683

18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15

19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731

20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13

21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559

22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64

23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8

24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526

25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13

26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3

27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82

28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19

29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46

Module-specific lncRNA-mRNA causal regulatory networks | 15

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60

31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016

32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12

33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892

34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408

35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51

36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32

37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305

38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15

39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99

40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057

41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6

42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6

43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083

44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138

45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8

46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7

47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7

48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370

49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5

50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034

51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082

52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2

53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9

54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17

55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000

56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000

57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1

58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6

59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910

60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000

61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7

62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9

63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30

64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5

65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765

66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9

67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40

68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46

69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52

70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94

71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88

16 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2
Page 8: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

mRNA We use the absolute value of the causal effect(AVCE) to evaluate the strength of the regulation of thelncRNA on the mRNA and a higher AVCE indicates a stron-ger lncRNA regulation The lncRNAndashmRNA pairs with highAVCEs in each module are considered as module-specificlncRNAndashmRNA causal regulatory relationships and we calleach module with these relationships identified a module-specific causal regulatory network

iii Identification of global lncRNAndashmRNA causal regulatorynetwork We integrate the module-specific lncRNAndashmRNA

causal regulatory networks to form the global lncRNAndashmRNA causal regulatory network

Identification of lncRNAndashmRNA co-expression modules

In systems biology WGCNA [21] is a popular method for findingthe correlation patterns among genes across samples and canbe used to identify clusters or modules of highly co-expressedgenes Therefore we use WGCNA to first infer lncRNAndashmRNAco-expression modules

Figure 2 The pipeline of MSLCRN First WGCNA is used to identify lncRNAndashmRNA co-expression modules from matched lncRNA and mRNA expression data Second

we infer lncRNAndashmRNA causal regulatory relationships in each module by using parallel IDA method For each module we assemble the identified lncRNAndashmRNA reg-

ulatory relationships to obtain a module-specific lncRNAndashmRNA causal regulatory network Third the module-specific lncRNAndashmRNA causal regulatory networks are

integrated to form a global lncRNAndashmRNA causal regulatory network

Module-specific lncRNA-mRNA causal regulatory networks | 7

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas

sij frac14 jcorethi jTHORNj (1)

where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]

To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined

wij frac14P

uaiuauj thorn aij

minfP

uaiuP

uaujg thorn 1 aij(2)

where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step

Identification of module-specific lncRNAndashmRNA causalregulatory networks

After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]

In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001

In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]

The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships

We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08

Validation survival and enrichment analysis

Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it

To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth

We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test

To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia

8 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks

We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows

p frac14 1 FethxjBNMTHORN frac14 1Xx1

ifrac140

N

i

B N

M i

B

M

(3)

In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005

Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers

By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively

To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]

Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks

In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs

Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa

Datasets Cutoffs Number of causalregulations

yfrac14axb R2

GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792

LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972

OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460

PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503

Note The AVCE cutoffs range from 010 to 060 with a step of 005

The bold values are the degree distributions of global lncRNA-mRNA causal reg-

ulatory networks with a compromised AVCE cutoff (045) in four human cancers

Module-specific lncRNA-mRNA causal regulatory networks | 9

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and

KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer

Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers

Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common

396 424

106 115

1370

133 173

1599

126

1729

1224

252

2523

2157

5081

0

2000

4000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

025

0050

0075

00

1000

0

Set Size

6 283

13 1 80 673

206

3969

76 3493

407

2816

9950

7

1948

7

0

250000

500000

750000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

0e+0

0

2e+0

5

4e+0

5

6e+0

5

8e+0

5

Set Size

6 1 2 933

2

41

6 11

246

47

524

0

200

400In

ters

ectio

n S

ize

PrCa

Ov Ca

LSCC

GBM

0200

400

600

Set Size

Module-specific genes

Module-specific causal regulations Module-specific hub lncRNAs 808611

A

B C

Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa

(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-

section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa

10 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers

As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely

connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers

The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa

Cancer-specific networks Causal regulations y=axb R2

GBM-specific 2816 y=4812x-1292 09774

LSCC-specific 99507 y=1034x-1014 09923

OvCa-specific 19487 y=7366x-1156 09723

PrCa-specific 808611 y=5243x-07055 08310

1 10 100 11001

10

100

1100

Degree of genes

Nu

mb

er o

f ge

nes

GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution

solutecation symporter activitysymporter activity

gated channel activitysodium ion transmembrane transporter activity

ion channel activitysubstrate-specific channel activity

channel activitypassive transmembrane transporter activity

growth factor activitycation channel activity

metal ion transmembrane transporter activitycollagen bindingheparin binding

sulfur compound bindingintegrin binding

glycosaminoglycan bindingextracellular matrix binding

peptide receptor activityG-protein coupled peptide receptor activity

chemokine bindingG-protein coupled receptor binding

growth factor bindingcytokine binding

serine-type endopeptidase activitydeath receptor activity

tumor necrosis factor-activated receptor activitycytokine receptor activity

protein heterodimerization activitydipeptidase activity

glycoprotein bindingRAGE receptor binding

cytokine receptor bindingcytokine activity

GBM(203)

LSCC(1015)

OvCa(479)

PrCa(3019)

001

002

003

004

padjust

GeneRatio002

004

006

008

Taste transduction

ECM-receptor interaction

cAMP signaling pathway

Calcium signaling pathway

Neuroactive ligand-receptor interaction

PI3K-Akt signaling pathway

Pathways in cancerRegulation of actin cytoskeleton

Complement and coagulation cascades

AGE-RAGE signaling pathway in diabetic complications

Hematopoietic cell lineage

Th17 cell differentiation

Inflammatory bowel disease (IBD)

Malaria

Osteoclast differentiation

Influenza A

Tuberculosis

Intestinal immune network for IgA production

Chagas disease (American trypanosomiasis)

Leishmaniasis

TNF signaling pathway

Toll-like receptor signaling pathway

Rheumatoid arthritis

Cytokine-cytokine receptor interaction

GBM(129)

LSCC(500)

OvCa(266)

PrCa(1364)

GeneRatio

005

010

015

001

002

003

004padjust

GO enrichment analysis KEGG enrichment analysis

A

B

Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA

causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa

Module-specific lncRNA-mRNA causal regulatory networks | 11

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)

By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks

Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)

To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples

Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar

Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)

Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that

occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core

lncRNAndashmRNA causal network

12 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as

lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations

A

B

Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs

Module-specific lncRNA-mRNA causal regulatory networks | 13

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

MSLCRN networks are biologically meaningful

In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks

We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases

Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful

Comparison with other PC-based networkinference methods

Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045

We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers

Conclusions and discussion

Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions

As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological

functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks

Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs

In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers

Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and

Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks

Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)

MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)

Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-

ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-

ciated MSLCRN networks

14 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs

Key Points

bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers

bull lncRNAs exhibit dynamic positive gene regulationacross human cancers

bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships

bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms

Supplementary Data

Supplementary data are available online at httpsacademicoupcombib

Funding

The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)

References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding

RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5

2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69

3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63

4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042

5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30

6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4

7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82

8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63

9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56

10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6

11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12

12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74

13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89

14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22

15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156

16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78

17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683

18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15

19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731

20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13

21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559

22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64

23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8

24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526

25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13

26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3

27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82

28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19

29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46

Module-specific lncRNA-mRNA causal regulatory networks | 15

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60

31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016

32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12

33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892

34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408

35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51

36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32

37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305

38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15

39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99

40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057

41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6

42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6

43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083

44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138

45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8

46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7

47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7

48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370

49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5

50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034

51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082

52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2

53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9

54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17

55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000

56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000

57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1

58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6

59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910

60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000

61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7

62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9

63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30

64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5

65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765

66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9

67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40

68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46

69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52

70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94

71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88

16 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2
Page 9: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas

sij frac14 jcorethi jTHORNj (1)

where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]

To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined

wij frac14P

uaiuauj thorn aij

minfP

uaiuP

uaujg thorn 1 aij(2)

where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step

Identification of module-specific lncRNAndashmRNA causalregulatory networks

After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]

In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001

In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]

The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships

We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08

Validation survival and enrichment analysis

Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it

To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth

We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test

To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia

8 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks

We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows

p frac14 1 FethxjBNMTHORN frac14 1Xx1

ifrac140

N

i

B N

M i

B

M

(3)

In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005

Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers

By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively

To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]

Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks

In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs

Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa

Datasets Cutoffs Number of causalregulations

yfrac14axb R2

GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792

LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972

OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460

PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503

Note The AVCE cutoffs range from 010 to 060 with a step of 005

The bold values are the degree distributions of global lncRNA-mRNA causal reg-

ulatory networks with a compromised AVCE cutoff (045) in four human cancers

Module-specific lncRNA-mRNA causal regulatory networks | 9

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and

KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer

Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers

Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common

396 424

106 115

1370

133 173

1599

126

1729

1224

252

2523

2157

5081

0

2000

4000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

025

0050

0075

00

1000

0

Set Size

6 283

13 1 80 673

206

3969

76 3493

407

2816

9950

7

1948

7

0

250000

500000

750000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

0e+0

0

2e+0

5

4e+0

5

6e+0

5

8e+0

5

Set Size

6 1 2 933

2

41

6 11

246

47

524

0

200

400In

ters

ectio

n S

ize

PrCa

Ov Ca

LSCC

GBM

0200

400

600

Set Size

Module-specific genes

Module-specific causal regulations Module-specific hub lncRNAs 808611

A

B C

Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa

(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-

section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa

10 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers

As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely

connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers

The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa

Cancer-specific networks Causal regulations y=axb R2

GBM-specific 2816 y=4812x-1292 09774

LSCC-specific 99507 y=1034x-1014 09923

OvCa-specific 19487 y=7366x-1156 09723

PrCa-specific 808611 y=5243x-07055 08310

1 10 100 11001

10

100

1100

Degree of genes

Nu

mb

er o

f ge

nes

GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution

solutecation symporter activitysymporter activity

gated channel activitysodium ion transmembrane transporter activity

ion channel activitysubstrate-specific channel activity

channel activitypassive transmembrane transporter activity

growth factor activitycation channel activity

metal ion transmembrane transporter activitycollagen bindingheparin binding

sulfur compound bindingintegrin binding

glycosaminoglycan bindingextracellular matrix binding

peptide receptor activityG-protein coupled peptide receptor activity

chemokine bindingG-protein coupled receptor binding

growth factor bindingcytokine binding

serine-type endopeptidase activitydeath receptor activity

tumor necrosis factor-activated receptor activitycytokine receptor activity

protein heterodimerization activitydipeptidase activity

glycoprotein bindingRAGE receptor binding

cytokine receptor bindingcytokine activity

GBM(203)

LSCC(1015)

OvCa(479)

PrCa(3019)

001

002

003

004

padjust

GeneRatio002

004

006

008

Taste transduction

ECM-receptor interaction

cAMP signaling pathway

Calcium signaling pathway

Neuroactive ligand-receptor interaction

PI3K-Akt signaling pathway

Pathways in cancerRegulation of actin cytoskeleton

Complement and coagulation cascades

AGE-RAGE signaling pathway in diabetic complications

Hematopoietic cell lineage

Th17 cell differentiation

Inflammatory bowel disease (IBD)

Malaria

Osteoclast differentiation

Influenza A

Tuberculosis

Intestinal immune network for IgA production

Chagas disease (American trypanosomiasis)

Leishmaniasis

TNF signaling pathway

Toll-like receptor signaling pathway

Rheumatoid arthritis

Cytokine-cytokine receptor interaction

GBM(129)

LSCC(500)

OvCa(266)

PrCa(1364)

GeneRatio

005

010

015

001

002

003

004padjust

GO enrichment analysis KEGG enrichment analysis

A

B

Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA

causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa

Module-specific lncRNA-mRNA causal regulatory networks | 11

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)

By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks

Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)

To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples

Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar

Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)

Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that

occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core

lncRNAndashmRNA causal network

12 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as

lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations

A

B

Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs

Module-specific lncRNA-mRNA causal regulatory networks | 13

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

MSLCRN networks are biologically meaningful

In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks

We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases

Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful

Comparison with other PC-based networkinference methods

Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045

We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers

Conclusions and discussion

Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions

As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological

functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks

Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs

In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers

Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and

Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks

Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)

MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)

Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-

ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-

ciated MSLCRN networks

14 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs

Key Points

bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers

bull lncRNAs exhibit dynamic positive gene regulationacross human cancers

bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships

bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms

Supplementary Data

Supplementary data are available online at httpsacademicoupcombib

Funding

The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)

References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding

RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5

2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69

3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63

4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042

5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30

6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4

7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82

8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63

9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56

10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6

11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12

12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74

13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89

14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22

15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156

16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78

17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683

18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15

19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731

20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13

21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559

22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64

23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8

24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526

25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13

26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3

27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82

28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19

29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46

Module-specific lncRNA-mRNA causal regulatory networks | 15

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60

31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016

32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12

33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892

34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408

35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51

36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32

37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305

38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15

39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99

40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057

41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6

42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6

43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083

44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138

45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8

46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7

47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7

48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370

49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5

50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034

51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082

52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2

53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9

54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17

55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000

56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000

57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1

58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6

59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910

60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000

61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7

62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9

63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30

64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5

65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765

66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9

67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40

68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46

69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52

70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94

71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88

16 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2
Page 10: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks

We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows

p frac14 1 FethxjBNMTHORN frac14 1Xx1

ifrac140

N

i

B N

M i

B

M

(3)

In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005

Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers

By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively

To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]

Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks

In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs

Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa

Datasets Cutoffs Number of causalregulations

yfrac14axb R2

GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792

LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972

OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460

PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503

Note The AVCE cutoffs range from 010 to 060 with a step of 005

The bold values are the degree distributions of global lncRNA-mRNA causal reg-

ulatory networks with a compromised AVCE cutoff (045) in four human cancers

Module-specific lncRNA-mRNA causal regulatory networks | 9

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and

KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer

Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers

Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common

396 424

106 115

1370

133 173

1599

126

1729

1224

252

2523

2157

5081

0

2000

4000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

025

0050

0075

00

1000

0

Set Size

6 283

13 1 80 673

206

3969

76 3493

407

2816

9950

7

1948

7

0

250000

500000

750000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

0e+0

0

2e+0

5

4e+0

5

6e+0

5

8e+0

5

Set Size

6 1 2 933

2

41

6 11

246

47

524

0

200

400In

ters

ectio

n S

ize

PrCa

Ov Ca

LSCC

GBM

0200

400

600

Set Size

Module-specific genes

Module-specific causal regulations Module-specific hub lncRNAs 808611

A

B C

Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa

(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-

section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa

10 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers

As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely

connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers

The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa

Cancer-specific networks Causal regulations y=axb R2

GBM-specific 2816 y=4812x-1292 09774

LSCC-specific 99507 y=1034x-1014 09923

OvCa-specific 19487 y=7366x-1156 09723

PrCa-specific 808611 y=5243x-07055 08310

1 10 100 11001

10

100

1100

Degree of genes

Nu

mb

er o

f ge

nes

GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution

solutecation symporter activitysymporter activity

gated channel activitysodium ion transmembrane transporter activity

ion channel activitysubstrate-specific channel activity

channel activitypassive transmembrane transporter activity

growth factor activitycation channel activity

metal ion transmembrane transporter activitycollagen bindingheparin binding

sulfur compound bindingintegrin binding

glycosaminoglycan bindingextracellular matrix binding

peptide receptor activityG-protein coupled peptide receptor activity

chemokine bindingG-protein coupled receptor binding

growth factor bindingcytokine binding

serine-type endopeptidase activitydeath receptor activity

tumor necrosis factor-activated receptor activitycytokine receptor activity

protein heterodimerization activitydipeptidase activity

glycoprotein bindingRAGE receptor binding

cytokine receptor bindingcytokine activity

GBM(203)

LSCC(1015)

OvCa(479)

PrCa(3019)

001

002

003

004

padjust

GeneRatio002

004

006

008

Taste transduction

ECM-receptor interaction

cAMP signaling pathway

Calcium signaling pathway

Neuroactive ligand-receptor interaction

PI3K-Akt signaling pathway

Pathways in cancerRegulation of actin cytoskeleton

Complement and coagulation cascades

AGE-RAGE signaling pathway in diabetic complications

Hematopoietic cell lineage

Th17 cell differentiation

Inflammatory bowel disease (IBD)

Malaria

Osteoclast differentiation

Influenza A

Tuberculosis

Intestinal immune network for IgA production

Chagas disease (American trypanosomiasis)

Leishmaniasis

TNF signaling pathway

Toll-like receptor signaling pathway

Rheumatoid arthritis

Cytokine-cytokine receptor interaction

GBM(129)

LSCC(500)

OvCa(266)

PrCa(1364)

GeneRatio

005

010

015

001

002

003

004padjust

GO enrichment analysis KEGG enrichment analysis

A

B

Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA

causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa

Module-specific lncRNA-mRNA causal regulatory networks | 11

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)

By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks

Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)

To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples

Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar

Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)

Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that

occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core

lncRNAndashmRNA causal network

12 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as

lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations

A

B

Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs

Module-specific lncRNA-mRNA causal regulatory networks | 13

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

MSLCRN networks are biologically meaningful

In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks

We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases

Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful

Comparison with other PC-based networkinference methods

Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045

We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers

Conclusions and discussion

Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions

As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological

functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks

Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs

In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers

Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and

Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks

Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)

MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)

Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-

ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-

ciated MSLCRN networks

14 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs

Key Points

bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers

bull lncRNAs exhibit dynamic positive gene regulationacross human cancers

bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships

bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms

Supplementary Data

Supplementary data are available online at httpsacademicoupcombib

Funding

The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)

References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding

RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5

2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69

3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63

4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042

5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30

6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4

7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82

8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63

9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56

10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6

11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12

12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74

13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89

14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22

15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156

16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78

17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683

18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15

19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731

20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13

21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559

22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64

23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8

24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526

25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13

26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3

27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82

28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19

29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46

Module-specific lncRNA-mRNA causal regulatory networks | 15

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60

31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016

32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12

33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892

34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408

35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51

36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32

37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305

38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15

39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99

40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057

41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6

42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6

43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083

44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138

45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8

46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7

47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7

48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370

49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5

50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034

51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082

52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2

53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9

54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17

55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000

56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000

57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1

58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6

59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910

60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000

61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7

62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9

63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30

64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5

65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765

66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9

67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40

68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46

69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52

70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94

71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88

16 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2
Page 11: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and

KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer

Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers

Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common

396 424

106 115

1370

133 173

1599

126

1729

1224

252

2523

2157

5081

0

2000

4000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

025

0050

0075

00

1000

0

Set Size

6 283

13 1 80 673

206

3969

76 3493

407

2816

9950

7

1948

7

0

250000

500000

750000

Inte

rsec

tion

Siz

e

PrCa

Ov Ca

LSCC

GBM

0e+0

0

2e+0

5

4e+0

5

6e+0

5

8e+0

5

Set Size

6 1 2 933

2

41

6 11

246

47

524

0

200

400In

ters

ectio

n S

ize

PrCa

Ov Ca

LSCC

GBM

0200

400

600

Set Size

Module-specific genes

Module-specific causal regulations Module-specific hub lncRNAs 808611

A

B C

Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa

(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-

section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa

10 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers

As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely

connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers

The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa

Cancer-specific networks Causal regulations y=axb R2

GBM-specific 2816 y=4812x-1292 09774

LSCC-specific 99507 y=1034x-1014 09923

OvCa-specific 19487 y=7366x-1156 09723

PrCa-specific 808611 y=5243x-07055 08310

1 10 100 11001

10

100

1100

Degree of genes

Nu

mb

er o

f ge

nes

GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution

solutecation symporter activitysymporter activity

gated channel activitysodium ion transmembrane transporter activity

ion channel activitysubstrate-specific channel activity

channel activitypassive transmembrane transporter activity

growth factor activitycation channel activity

metal ion transmembrane transporter activitycollagen bindingheparin binding

sulfur compound bindingintegrin binding

glycosaminoglycan bindingextracellular matrix binding

peptide receptor activityG-protein coupled peptide receptor activity

chemokine bindingG-protein coupled receptor binding

growth factor bindingcytokine binding

serine-type endopeptidase activitydeath receptor activity

tumor necrosis factor-activated receptor activitycytokine receptor activity

protein heterodimerization activitydipeptidase activity

glycoprotein bindingRAGE receptor binding

cytokine receptor bindingcytokine activity

GBM(203)

LSCC(1015)

OvCa(479)

PrCa(3019)

001

002

003

004

padjust

GeneRatio002

004

006

008

Taste transduction

ECM-receptor interaction

cAMP signaling pathway

Calcium signaling pathway

Neuroactive ligand-receptor interaction

PI3K-Akt signaling pathway

Pathways in cancerRegulation of actin cytoskeleton

Complement and coagulation cascades

AGE-RAGE signaling pathway in diabetic complications

Hematopoietic cell lineage

Th17 cell differentiation

Inflammatory bowel disease (IBD)

Malaria

Osteoclast differentiation

Influenza A

Tuberculosis

Intestinal immune network for IgA production

Chagas disease (American trypanosomiasis)

Leishmaniasis

TNF signaling pathway

Toll-like receptor signaling pathway

Rheumatoid arthritis

Cytokine-cytokine receptor interaction

GBM(129)

LSCC(500)

OvCa(266)

PrCa(1364)

GeneRatio

005

010

015

001

002

003

004padjust

GO enrichment analysis KEGG enrichment analysis

A

B

Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA

causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa

Module-specific lncRNA-mRNA causal regulatory networks | 11

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)

By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks

Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)

To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples

Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar

Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)

Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that

occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core

lncRNAndashmRNA causal network

12 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as

lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations

A

B

Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs

Module-specific lncRNA-mRNA causal regulatory networks | 13

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

MSLCRN networks are biologically meaningful

In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks

We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases

Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful

Comparison with other PC-based networkinference methods

Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045

We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers

Conclusions and discussion

Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions

As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological

functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks

Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs

In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers

Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and

Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks

Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)

MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)

Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-

ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-

ciated MSLCRN networks

14 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs

Key Points

bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers

bull lncRNAs exhibit dynamic positive gene regulationacross human cancers

bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships

bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms

Supplementary Data

Supplementary data are available online at httpsacademicoupcombib

Funding

The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)

References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding

RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5

2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69

3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63

4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042

5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30

6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4

7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82

8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63

9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56

10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6

11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12

12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74

13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89

14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22

15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156

16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78

17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683

18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15

19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731

20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13

21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559

22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64

23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8

24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526

25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13

26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3

27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82

28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19

29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46

Module-specific lncRNA-mRNA causal regulatory networks | 15

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60

31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016

32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12

33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892

34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408

35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51

36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32

37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305

38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15

39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99

40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057

41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6

42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6

43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083

44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138

45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8

46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7

47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7

48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370

49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5

50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034

51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082

52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2

53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9

54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17

55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000

56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000

57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1

58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6

59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910

60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000

61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7

62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9

63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30

64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5

65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765

66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9

67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40

68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46

69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52

70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94

71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88

16 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2
Page 12: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers

As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely

connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers

The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa

Cancer-specific networks Causal regulations y=axb R2

GBM-specific 2816 y=4812x-1292 09774

LSCC-specific 99507 y=1034x-1014 09923

OvCa-specific 19487 y=7366x-1156 09723

PrCa-specific 808611 y=5243x-07055 08310

1 10 100 11001

10

100

1100

Degree of genes

Nu

mb

er o

f ge

nes

GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution

solutecation symporter activitysymporter activity

gated channel activitysodium ion transmembrane transporter activity

ion channel activitysubstrate-specific channel activity

channel activitypassive transmembrane transporter activity

growth factor activitycation channel activity

metal ion transmembrane transporter activitycollagen bindingheparin binding

sulfur compound bindingintegrin binding

glycosaminoglycan bindingextracellular matrix binding

peptide receptor activityG-protein coupled peptide receptor activity

chemokine bindingG-protein coupled receptor binding

growth factor bindingcytokine binding

serine-type endopeptidase activitydeath receptor activity

tumor necrosis factor-activated receptor activitycytokine receptor activity

protein heterodimerization activitydipeptidase activity

glycoprotein bindingRAGE receptor binding

cytokine receptor bindingcytokine activity

GBM(203)

LSCC(1015)

OvCa(479)

PrCa(3019)

001

002

003

004

padjust

GeneRatio002

004

006

008

Taste transduction

ECM-receptor interaction

cAMP signaling pathway

Calcium signaling pathway

Neuroactive ligand-receptor interaction

PI3K-Akt signaling pathway

Pathways in cancerRegulation of actin cytoskeleton

Complement and coagulation cascades

AGE-RAGE signaling pathway in diabetic complications

Hematopoietic cell lineage

Th17 cell differentiation

Inflammatory bowel disease (IBD)

Malaria

Osteoclast differentiation

Influenza A

Tuberculosis

Intestinal immune network for IgA production

Chagas disease (American trypanosomiasis)

Leishmaniasis

TNF signaling pathway

Toll-like receptor signaling pathway

Rheumatoid arthritis

Cytokine-cytokine receptor interaction

GBM(129)

LSCC(500)

OvCa(266)

PrCa(1364)

GeneRatio

005

010

015

001

002

003

004padjust

GO enrichment analysis KEGG enrichment analysis

A

B

Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA

causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa

Module-specific lncRNA-mRNA causal regulatory networks | 11

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)

By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks

Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)

To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples

Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar

Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)

Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that

occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core

lncRNAndashmRNA causal network

12 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as

lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations

A

B

Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs

Module-specific lncRNA-mRNA causal regulatory networks | 13

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

MSLCRN networks are biologically meaningful

In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks

We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases

Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful

Comparison with other PC-based networkinference methods

Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045

We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers

Conclusions and discussion

Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions

As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological

functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks

Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs

In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers

Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and

Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks

Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)

MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)

Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-

ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-

ciated MSLCRN networks

14 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs

Key Points

bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers

bull lncRNAs exhibit dynamic positive gene regulationacross human cancers

bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships

bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms

Supplementary Data

Supplementary data are available online at httpsacademicoupcombib

Funding

The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)

References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding

RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5

2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69

3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63

4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042

5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30

6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4

7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82

8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63

9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56

10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6

11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12

12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74

13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89

14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22

15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156

16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78

17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683

18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15

19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731

20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13

21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559

22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64

23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8

24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526

25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13

26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3

27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82

28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19

29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46

Module-specific lncRNA-mRNA causal regulatory networks | 15

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60

31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016

32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12

33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892

34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408

35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51

36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32

37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305

38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15

39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99

40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057

41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6

42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6

43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083

44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138

45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8

46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7

47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7

48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370

49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5

50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034

51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082

52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2

53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9

54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17

55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000

56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000

57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1

58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6

59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910

60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000

61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7

62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9

63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30

64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5

65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765

66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9

67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40

68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46

69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52

70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94

71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88

16 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2
Page 13: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)

By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks

Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)

To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples

Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar

Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)

Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that

occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core

lncRNAndashmRNA causal network

12 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as

lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations

A

B

Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs

Module-specific lncRNA-mRNA causal regulatory networks | 13

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

MSLCRN networks are biologically meaningful

In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks

We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases

Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful

Comparison with other PC-based networkinference methods

Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045

We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers

Conclusions and discussion

Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions

As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological

functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks

Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs

In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers

Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and

Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks

Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)

MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)

Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-

ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-

ciated MSLCRN networks

14 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs

Key Points

bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers

bull lncRNAs exhibit dynamic positive gene regulationacross human cancers

bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships

bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms

Supplementary Data

Supplementary data are available online at httpsacademicoupcombib

Funding

The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)

References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding

RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5

2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69

3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63

4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042

5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30

6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4

7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82

8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63

9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56

10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6

11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12

12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74

13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89

14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22

15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156

16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78

17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683

18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15

19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731

20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13

21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559

22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64

23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8

24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526

25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13

26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3

27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82

28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19

29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46

Module-specific lncRNA-mRNA causal regulatory networks | 15

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60

31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016

32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12

33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892

34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408

35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51

36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32

37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305

38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15

39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99

40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057

41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6

42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6

43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083

44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138

45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8

46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7

47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7

48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370

49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5

50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034

51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082

52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2

53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9

54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17

55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000

56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000

57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1

58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6

59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910

60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000

61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7

62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9

63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30

64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5

65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765

66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9

67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40

68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46

69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52

70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94

71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88

16 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2
Page 14: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as

lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations

A

B

Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs

Module-specific lncRNA-mRNA causal regulatory networks | 13

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

MSLCRN networks are biologically meaningful

In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks

We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases

Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful

Comparison with other PC-based networkinference methods

Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045

We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers

Conclusions and discussion

Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions

As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological

functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks

Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs

In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers

Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and

Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks

Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)

MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)

Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-

ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-

ciated MSLCRN networks

14 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs

Key Points

bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers

bull lncRNAs exhibit dynamic positive gene regulationacross human cancers

bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships

bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms

Supplementary Data

Supplementary data are available online at httpsacademicoupcombib

Funding

The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)

References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding

RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5

2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69

3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63

4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042

5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30

6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4

7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82

8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63

9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56

10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6

11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12

12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74

13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89

14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22

15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156

16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78

17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683

18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15

19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731

20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13

21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559

22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64

23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8

24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526

25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13

26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3

27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82

28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19

29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46

Module-specific lncRNA-mRNA causal regulatory networks | 15

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60

31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016

32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12

33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892

34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408

35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51

36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32

37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305

38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15

39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99

40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057

41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6

42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6

43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083

44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138

45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8

46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7

47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7

48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370

49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5

50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034

51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082

52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2

53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9

54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17

55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000

56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000

57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1

58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6

59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910

60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000

61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7

62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9

63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30

64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5

65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765

66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9

67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40

68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46

69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52

70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94

71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88

16 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2
Page 15: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

MSLCRN networks are biologically meaningful

In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks

We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases

Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful

Comparison with other PC-based networkinference methods

Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045

We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers

Conclusions and discussion

Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions

As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological

functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks

Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs

In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers

Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and

Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks

Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)

MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)

Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-

ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-

ciated MSLCRN networks

14 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs

Key Points

bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers

bull lncRNAs exhibit dynamic positive gene regulationacross human cancers

bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships

bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms

Supplementary Data

Supplementary data are available online at httpsacademicoupcombib

Funding

The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)

References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding

RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5

2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69

3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63

4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042

5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30

6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4

7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82

8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63

9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56

10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6

11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12

12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74

13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89

14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22

15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156

16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78

17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683

18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15

19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731

20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13

21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559

22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64

23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8

24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526

25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13

26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3

27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82

28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19

29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46

Module-specific lncRNA-mRNA causal regulatory networks | 15

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60

31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016

32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12

33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892

34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408

35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51

36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32

37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305

38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15

39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99

40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057

41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6

42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6

43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083

44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138

45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8

46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7

47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7

48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370

49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5

50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034

51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082

52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2

53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9

54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17

55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000

56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000

57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1

58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6

59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910

60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000

61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7

62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9

63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30

64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5

65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765

66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9

67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40

68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46

69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52

70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94

71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88

16 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2
Page 16: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs

Key Points

bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers

bull lncRNAs exhibit dynamic positive gene regulationacross human cancers

bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers

bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships

bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms

Supplementary Data

Supplementary data are available online at httpsacademicoupcombib

Funding

The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)

References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding

RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5

2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69

3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63

4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042

5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30

6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4

7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82

8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63

9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56

10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6

11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12

12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74

13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89

14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22

15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156

16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78

17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683

18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15

19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731

20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13

21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559

22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64

23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8

24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526

25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13

26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3

27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82

28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19

29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46

Module-specific lncRNA-mRNA causal regulatory networks | 15

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60

31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016

32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12

33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892

34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408

35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51

36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32

37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305

38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15

39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99

40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057

41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6

42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6

43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083

44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138

45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8

46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7

47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7

48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370

49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5

50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034

51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082

52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2

53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9

54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17

55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000

56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000

57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1

58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6

59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910

60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000

61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7

62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9

63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30

64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5

65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765

66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9

67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40

68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46

69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52

70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94

71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88

16 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2
Page 17: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60

31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016

32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12

33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892

34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408

35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51

36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32

37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305

38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15

39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99

40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057

41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6

42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6

43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083

44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138

45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8

46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7

47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7

48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370

49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5

50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034

51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082

52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2

53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9

54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17

55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000

56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000

57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1

58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6

59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910

60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000

61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7

62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9

63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30

64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5

65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765

66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9

67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40

68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46

69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52

70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94

71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88

16 | Zhang et al

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2
Page 18: Inferring and analyzing module-specific lncRNA-mRNA causal ...nugget.unisa.edu.au/Thuc/Briefings2019JP.pdf · Thuc Duy Le is a research fellow at the University of South Australia

72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58

73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74

74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104

75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5

76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31

77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90

Module-specific lncRNA-mRNA causal regulatory networks | 17

Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats

  • bby008-TF1
  • bby008-TF51
  • bby008-TF2