STABILITY INDICATORS IN (BIOLOGICAL) NETWORK INFERENCE

Giuseppe Jurman (1), Michele Filosi (1,2), Roberto Visintainer (1), Samantha Riccadonna (3), Cesare Furlanello (1)

(1) Fondazione Bruno Kessler, Trento, Italy; (2) University of Trento, Italy; (3) Fondazione Edmund Mach, San Michele all'Adige, Italy

We propose a way to quantify the variability of network inference with respect to data perturbation and, in particular, data subsampling. We introduce a set of four indicators that allow the researcher to quantitatively evaluate the reliability of the inferred/non-inferred links. For a given ratio of removed data and a given number of resamplings, we quantitatively assess the mutual distances among all inferred networks and their distances to the network generated from the whole dataset. The rationale is that the smaller the average distance, the more stable the network. We also provide a ranked list of the most stable links and nodes, where the rank is induced by the variability of the link weight and of the node degree across the generated networks, the least variable being top ranked.

As a network distance we employ the HIM distance, which represents a good compromise between local (link-based) and global (structure-based) measures of network comparison. As a first testbed in a controlled situation, the four indicators are computed on a synthetic dataset for different instances of a correlation network with different measures, highlighting the impact of a False Discovery Rate filter on the network reconstruction method. Finally, we show the use of the stability measures in comparing the relevance networks inferred on a miRNA microarray dataset with paired tissues extracted from a cohort of 241 hepatocellular carcinoma patients.

STABILITY INDICATORS

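[Figure: inference and resampling schema. An algorithm ALG maps a dataset $D$, with $s$ samples described by $p$ features, to a network $N_D$ with nodes $\{x_1^D, \dots, x_p^D\}$ and link weights $w_{hk}^D$, $h, k = 1, \dots, p$. The same ALG is applied to $r$ subsampled datasets $D_1, \dots, D_i, \dots, D_r$, each with $n < s$ samples, where $r \le \binom{s}{n}$.]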

Stability of the entire network

$I_1(n, r) = \{\mathrm{HIM}(N_D, N_{D_i}) : i = 1, \dots, r\}$

Distances between the network constructed on the whole dataset and the networks inferred from the different subsampling replicates.

$I_2(n, r) = \{\mathrm{HIM}(N_{D_i}, N_{D_j}) : i, j = 1, \dots, r,\ i \ne j\}$

Mutual distances among the networks inferred from the different subsampling replicates.
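As a minimal sketch of how the two network-level indicators could be collected, the function below takes any inference routine and any network distance as callables. The helpers `pearson`, `corr_network` and `link_hamming` are hypothetical toy stand-ins introduced only for illustration (a thresholded correlation network and a plain link-set Hamming distance); they are not the poster's ALG or the HIM distance. Since a distance is symmetric, the sketch evaluates each unordered pair of replicates once.

```python
from itertools import combinations
from math import sqrt


def stability_indicators(data, subsamples, infer, dist):
    """Return (I1, I2) as lists of distances.

    data:       list of samples (rows)
    subsamples: list of index lists, one per replicate D_i
    infer:      callable mapping a list of rows to a network
    dist:       callable mapping two networks to a non-negative number
    """
    full_net = infer(data)
    nets = [infer([data[i] for i in idx]) for idx in subsamples]
    # I1: whole-dataset network against every replicate network.
    i1 = [dist(full_net, net) for net in nets]
    # I2: mutual distances among replicate networks (each unordered pair once).
    i2 = [dist(a, b) for a, b in combinations(nets, 2)]
    return i1, i2


# --- Toy stand-ins (assumptions for illustration only) ---------------------
def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 for constant inputs."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def corr_network(rows, threshold=0.8):
    """Link two features when their absolute correlation reaches the threshold."""
    cols = list(zip(*rows))
    pairs = combinations(range(len(cols)), 2)
    return frozenset(
        (h, k) for h, k in pairs if abs(pearson(cols[h], cols[k])) >= threshold
    )


def link_hamming(p):
    """Fraction of the p(p-1)/2 possible links on which two networks differ."""
    n_pairs = p * (p - 1) // 2
    return lambda a, b: len(a ^ b) / n_pairs
```

Averaging each returned list gives a single summary per indicator; smaller averages correspond to more stable networks in the sense described above.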

Stability (reliability) of single nodes and links

$I_3(n, r) = \{a^{D_i}_{hk}\}$ for $i = 1, \dots, r$ and $h, k = 1, \dots, p$

$I_4(n, r) = \{\partial(x^{D_i}_h)\}$ for $i = 1, \dots, r$ and $h = 1, \dots, p$, with $\partial$ the degree function

Variability of node degree and link weight of the networks inferred from the different subsampling replicates.
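The ranking induced by these two indicators can be sketched as follows. The poster only states that the least variable links and nodes rank first; the use of the population standard deviation as the variability measure, and of the sum of link weights as the (weighted) node degree, are assumptions made for this illustration. Each replicate network is given as a symmetric weighted adjacency matrix (a list of lists), and the function names are hypothetical.

```python
from itertools import combinations
from statistics import pstdev


def rank_links(adjs):
    """Rank links (h, k), h < k, from most to least stable.

    adjs: list of r adjacency matrices (lists of lists), one per replicate.
    Variability is measured by the standard deviation of the link weight
    across replicates (an assumption, not stated in the source).
    """
    p = len(adjs[0])
    spread = {
        (h, k): pstdev([a[h][k] for a in adjs])
        for h, k in combinations(range(p), 2)
    }
    return sorted(spread, key=spread.get)


def rank_nodes(adjs):
    """Rank nodes from most to least stable by variability of their degree.

    The degree is taken as the sum of the weights of the links incident to
    the node, excluding any diagonal entry.
    """
    p = len(adjs[0])
    spread = {
        h: pstdev([sum(a[h]) - a[h][h] for a in adjs])
        for h in range(p)
    }
    return sorted(spread, key=spread.get)
```

A different spread measure (for instance the range) can be swapped in by replacing `pstdev`; the ranking logic stays the same.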

RESAMPLING SCHEMA:
- LOO (leave-one-out stability): $n = s - 1$, $r = 1$
- 20 × k-fold cross validation for k = 2, 4, 10 (k2, k4 and k10): $n = \lfloor s(k-1)/k \rfloor$ and $r = 20k$.
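The repeated k-fold schema can be sketched with the standard library alone. The function name `kfold_subsamples`, the `seed` parameter and the way folds are cut are assumptions for illustration. Each of the 20 repetitions shuffles the s sample indices, splits them into k folds, and keeps, for every fold, the samples outside it, which yields r = 20k index sets. When k does not divide s, the retained sets have either ⌊s(k−1)/k⌋ or one more sample.

```python
import random


def kfold_subsamples(s, k, repeats=20, seed=0):
    """Return repeats * k lists of sample indices, one per replicate.

    s:       number of samples in the whole dataset
    k:       number of folds
    repeats: number of independent k-fold partitions (20 on the poster)
    seed:    seed of the private random generator (hypothetical parameter)
    """
    rng = random.Random(seed)
    subsamples = []
    for _ in range(repeats):
        order = list(range(s))
        rng.shuffle(order)
        # Fold f takes every k-th shuffled index, so fold sizes differ by at most 1.
        folds = [order[f::k] for f in range(k)]
        for held_out in folds:
            held = set(held_out)
            # The replicate keeps every sample that is not in the held-out fold.
            subsamples.append(sorted(i for i in order if i not in held))
    return subsamples
```

For example, `kfold_subsamples(20, 4)` produces 80 replicates of 15 samples each, matching $n = \lfloor 20 \cdot 3 / 4 \rfloor$ and $r = 20 \cdot 4$.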

HIM NETWORK DISTANCE

HIM Metric: product metric of
- Hamming: edit distance, focusing only on the local presence/absence of matching links.
- Ipsen-Mikhailov: spectral distance, evaluating the global structure of the topologies.

[Figure: two example networks (A) and (B), and the Hamming (H) vs. Ipsen-Mikhailov (IM) plane showing the points P(A,B), P(A,F) and P(A,E).]

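$\mathrm{HIM}(G,H) = \frac{1}{\sqrt{2}} \sqrt{\mathrm{H}(G,H)^2 + \mathrm{IM}(G,H)^2}$,

$\mathrm{H}(G,H) = \frac{1}{N(N-1)} \sum_{1 \le i \ne j \le N} \left| A^{(1)}_{ij} - A^{(2)}_{ij} \right|$,

$\mathrm{IM}(G,H) = \epsilon_\gamma(G,H) = \sqrt{\int_0^\infty \left[ \rho_G(\omega, \gamma) - \rho_H(\omega, \gamma) \right]^2 d\omega}$.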

Representation of the HIM distance in the Ipsen-Mikhailov and Hamming distance space between networks A versus B, F and E, where F is the fully connected network and E is the empty one. P(G,H) represents the distance between two networks G and H, whose coordinates are $x = \mathrm{H}(G,H)$ and $y = \mathrm{IM}(G,H)$, and the norm of P is $\sqrt{2}$ times the HIM distance HIM(G,H).

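- An N-node network is modelled as an N-atom system connected by identical elastic strings:
  $\ddot{x}_i + \sum_{j=1}^{N} A_{ij}(x_i - x_j) = 0$ for $i = 0, \dots, N-1$.
- The vibrational frequencies $\omega_i$ satisfy $\lambda_i = \omega_i^2$, where the $\lambda_i$ are the eigenvalues of the network Laplacian.
- $\rho(\omega) = K \sum_{i=1}^{N-1} \frac{\gamma}{(\omega - \omega_i)^2 + \gamma^2}$ is the spectral density, a sum of Lorentz distributions.
- $\gamma$ is the common width, the half-width at half-maximum (HWHM), equal to half the interquartile range, with $\epsilon_\gamma(E, F) = 1$.
- $K$ is the normalization constant, solution of $\int_0^\infty \rho(\omega)\, d\omega = 1$.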
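The formulas above can be turned into a short numerical sketch with NumPy. This is an illustration under stated assumptions, not the authors' implementation: networks are undirected with symmetric adjacency matrices on the same N ≥ 2 nodes, the width `gamma` is passed in by the caller rather than solved from the condition on the empty and full networks, and the integral is truncated at a finite upper limit and evaluated with a hand-written trapezoidal rule. The normalization uses the closed form of the Lorentzian integral on $[0, \infty)$, namely $\pi/2 + \arctan(\omega_i/\gamma)$ for each term.

```python
import numpy as np


def hamming(A, B):
    """Normalized Hamming distance between two adjacency matrices."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    N = A.shape[0]
    off_diagonal = ~np.eye(N, dtype=bool)  # keep only entries with i != j
    return np.abs(A - B)[off_diagonal].sum() / (N * (N - 1))


def _frequencies(A):
    """omega_i = sqrt(lambda_i) for the N-1 largest Laplacian eigenvalues."""
    A = np.asarray(A, dtype=float)
    laplacian = np.diag(A.sum(axis=1)) - A
    lam = np.linalg.eigvalsh(laplacian)  # ascending order, lam[0] is ~0
    return np.sqrt(np.clip(lam[1:], 0.0, None))


def _density(omega, freqs, gamma):
    """Spectral density: normalized sum of Lorentz distributions."""
    lorentz = gamma / ((omega[:, None] - freqs[None, :]) ** 2 + gamma ** 2)
    # Each Lorentzian integrates to pi/2 + arctan(omega_i / gamma) on [0, inf).
    K = 1.0 / np.sum(np.pi / 2 + np.arctan(freqs / gamma))
    return K * lorentz.sum(axis=1)


def ipsen_mikhailov(A, B, gamma, n_grid=20001):
    """Ipsen-Mikhailov distance, with the integral truncated numerically."""
    fa, fb = _frequencies(A), _frequencies(B)
    upper = max(fa.max(), fb.max()) + 100.0 * gamma  # truncation point (assumption)
    omega = np.linspace(0.0, upper, n_grid)
    diff2 = (_density(omega, fa, gamma) - _density(omega, fb, gamma)) ** 2
    integral = np.sum((diff2[1:] + diff2[:-1]) * np.diff(omega)) / 2.0
    return np.sqrt(integral)


def him(A, B, gamma):
    """HIM distance: Euclidean combination of H and IM, scaled by 1/sqrt(2)."""
    h = hamming(A, B)
    im = ipsen_mikhailov(A, B, gamma)
    return np.sqrt(h ** 2 + im ** 2) / np.sqrt(2.0)
```

Keeping `gamma` as an explicit argument makes the dependence on the width visible; a caller who wants the normalization stated on the poster would first have to determine the value of `gamma` for which the distance between the empty and the full network equals 1.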

[Figure: an example network G with nodes labelled 0 to 7, and its spectral density $\rho_G(\gamma, \omega)$ plotted together with the Laplacian spectrum spec(L_G).]

REFERENCES

Baralla et al. Inferring Gene Networks: Dream or Nightmare? Annals of the New York Academy of Sciences, 2009.

Budhu et al. Identification of Metastasis-Related MicroRNAs in Hepatocellular Carcinoma. Hepatology, 2008.

Faith et al. Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles. PLoS Biology, 2007.

Gillis and Pavlidis. The role of indirect connections in gene networks in predicting function. Bioinformatics, 2011.

Ipsen and Mikhailov. Evolutionary reconstruction of networks. Physical Review E, 2002.

Jurman et al. Stability Indicators in Network Reconstruction. arXiv, 2012.

Jurman et al. A glocal distance for network comparison. arXiv, 2012.

Meyer et al. Verification of systems biology research in the age of collaborative competition. Nature Biotechnology, 2011.

Miller et al. Identifying Biological Network Structure, Predicting Network Behavior, and Classifying Network State With High Dimensional Model Representation (HDMR). PLoS ONE, 2012.

Prill et al. Towards a Rigorous Assessment of Systems Biology Models: The DREAM3 Challenges. PLoS ONE, 2010.

Reshef et al. Detecting novel associations in large datasets. Science, 2011.

Volinia et al. Reprogramming of miRNA networks in cancer and leukemia. Genome Research, 2010.

FDR EFFECT ON CORRELATION NETWORKS

Aim: to assess the different levels of stability of a correlation network inferred from a set of synthetic high-throughput signals when the inference is computed with or without False Discovery Rate (FDR) control.

SYNTHETIC BENCHMARK.

[Figure: correlation heatmap of the 20 synthetic features f1, ..., f20, with Corr(fi, fj) ≈ 0.9, 0.7, 0.4; colour scale from −0.2 to 1.0.]

- WGCNA [Langfelder et al., 2008]
- MIC [Reshef et al., 2011]
- WGCNA with FDR correction

$\mathrm{Adj} = \{a_{hk}\}$, where $a_{hk} = \begin{cases} |\mathrm{Corr}(x_h, x_k)| & \text{if } |F_z(h, k)| \ge 1 \\ 0 & \text{otherwise} \end{cases}$

[Figure: I1 vs. I2 plot under the LOO, k2, k4 and k10 resampling schemas for the methods MINE, WGCNA, WGCNA FDR 1e-2, WGCNA FDR 5e-3 and WGCNA FDR 1e-4.]

MIRNA NETWORK ON A HEPATOCELLULAR CARCINOMA DATASET

- 482 tissue samples from 241 patients [Budhu et al., 2008, Volinia et al., 2010].
- For each patient, a sample from cancerous hepatic tissue and a sample from surrounding non-cancerous hepatic tissue.
- Ohio State University CCC MicroRNA Microarray 2.0: 11520 probes, 250 non-redundant human and 200 mouse miRNA.
- After preprocessing, the dataset HCC of 240+240 paired samples described by 210 human miRNA is analyzed (210 ♂ + 30 ♀).
- HCC is partitioned into four subsets combining the sex and disease status phenotypes (MT, FT, MnT, FnT).

INFERENCE ALGORITHMS COMPARISON

- I1 captures the robustness of the algorithms to subsampling.
- I2 tends to express the homogeneity of the dataset.
- The bigger the sample size used for reconstruction, the more stable the result.

[Figure: I1 vs. I2 plot for the ARACNE, CLR, TOM and WGCNA algorithms under the LOO, k2, k4 and k10 resampling schemas.]

[Figure: the four networks inferred on the MT, FT, MnT and FnT subsets, with nodes labelled by miRNA probe identifiers (e.g. let.7a.1.prec, 321No1, 3p21.v1.v2.AntiS5P).]

BIOLOGICAL RESULTS

HIM     MnT      FT       FnT
MT      0.0412   0.0858   0.0235
MnT              0.1265   0.0618
FT                        0.0684

[Figure: planar representation (Coordinate 1 vs. Coordinate 2) of the four networks MT, MnT, FT and FnT.]

Statistics on I4

id   hsa-mir idx1     hsa-mir idx2   MT    MnT    FT     FnT
(a)  321No1           321No2         1     1      9      2
(b)  016b.chr3        16.2No1        3     12     15     309 -
(c)  021.prec.17No1   21No1          27    5      2      921
(d)  219.1No1         321No2         2     6      1903   314
(e)  326No1           342No2         132   1017   3      -
(f)  192.2.3No1       215.precNo1    4     300    4      3340

(a) is top ranking in all four cases, as expected (hsa-mir 321No1 and hsa-mir 321No2 denote essentially the same miRNA).

(b) and (c) behave as (a), but with less stability in the FnT network due to noise.

(d) has different stability between the male and the female networks (the link is probably associated with sex rather than with HCC).

(e) is very stable for FT, while it is not even picked up as a link by CLR in the FnT network.

(f) is a very well known cancer-associated link, as confirmed by its high stability in MT and FT.

[Figure: I1 vs. I2 plot for CLR inferred networks in the 4 subgroups (FT, FnT, MT, MnT) under the LOO, k2, k4 and k10 resampling schemas.]

- M much more stable than F
- MT stability similar to MnT
- F LOO worse than M 4/10 Fold
- FnT much worse than FT

Authors acknowledge funding by the European Union FP7 Project HiperDART

NetSci 2013 - International School and Conference on Network Science http://mpba.fbk.eu {jurman, filosi, visintainer, furlan}@fbk.eu, [email protected]
