logical reduction of biological networks to their most determinative components · 2017-08-28 ·...

Bull Math Biol (2016) 78:1520–1545DOI 10.1007/s11538-016-0193-x

ORIGINAL ARTICLE

Logical Reduction of Biological Networks to Their MostDeterminative Components

Mihaela T. Matache1 · Valentin Matache1

Received: 14 December 2015 / Accepted: 23 June 2016 / Published online: 14 July 2016© The Author(s) 2016. This article is published with open access at Springerlink.com

Abstract Boolean networks have been widely used as models for gene regulatorynetworks, signal transduction networks, or neural networks, among many others. Oneof the main difficulties in analyzing the dynamics of a Boolean network and its sen-sitivity to perturbations or mutations is the fact that it grows exponentially with thenumber of nodes. Therefore, various approaches for simplifying the computations andreducing the network to a subset of relevant nodes have been proposed in the past fewyears.We consider a recently introduced method for reducing a Boolean network to itsmost determinative nodes that yield the highest information gain. The determinativepower of a node is obtained by a summation of all mutual information quantities overall nodes having the chosen node as a common input, thus representing a measureof information gain obtained by the knowledge of the node under consideration. Thedeterminative power of nodes has been considered in the literature under the assump-tion that the inputs are independent inwhich case one can use the Bahadur orthonormalbasis. In this article, we relax that assumption and use a standard orthonormal basisinstead. We use techniques of Hilbert space operators and harmonic analysis to gen-erate formulas for the sensitivity to perturbations of nodes, quantified by the notionsof influence, average sensitivity, and strength. Since we work on finite-dimensionalspaces, our formulas and estimates can be and are formulated in plain matrix algebraterminology. We analyze the determinative power of nodes for a Boolean model of a

Electronic supplementary material The online version of this article (doi:10.1007/s11538-016-0193-x)contains supplementary material, which is available to authorized users.

B Mihaela T. [email protected]

Valentin [email protected]

1 Department of Mathematics, University of Nebraska at Omaha, Omaha, NE 68182-0243, USA

123

http://crossmark.crossref.org/dialog/?doi=10.1007/s11538-016-0193-x&domain=pdf

http://dx.doi.org/10.1007/s11538-016-0193-x

Logical Reduction of Biological Networks to Their Most... 1521

signal transduction network of a generic fibroblast cell. We also show the similaritiesand differences induced by the alternative complete orthonormal basis used. Amongthe similarities, we mention the fact that the knowledge of the states of the most deter-minative nodes reduces the entropy or uncertainty of the overall network significantly.In a special case, we obtain a stronger result than in previous works, showing thata large information gain from a set of input nodes generates increased sensitivity toperturbations of those inputs.

Keywords Boolean networks · Biological information theory · Mutual information ·Sensitivity · Network reduction · Linear operators · Numerical simulations

1 Introduction

The past few decades have generated a large influx of data and information regardinga variety of real or artificial networks. The necessity to understand and use the datain a meaningful way has lead to various modeling approaches. In particular, Booleannetwork (BN) models introduced by Kauffman (1969) have acquired a significantimportance in modeling networks where the node activity can be described by twostates, 1 and 0, “ON and OFF,” “active and nonactive,” and where each node is updatedbased on logical relationships with other nodes. These models incorporate Booleanfunctions that are relevant to particular types of applications, such as signal transduc-tion in cells (e.g., Helikar et al. 2008), genetic regulatory networks or other biologicalnetworks (e.g., Kauffman 1993; Shmulevich and Kauffman 2004; Shmulevich et al.2002; Klemm and Bornholdt 2000; Albert and Othmer 2003), or neural networks (e.g.,Huepe and Aldana 2002).

However, a simplification of the reality to binary states of the nodes does not easethe difficulty of studying large, complex networks for which the existing data mayoffer only partial information on the real interactions in the network and for which thedynamics are hard to study even under a deterministic approach. As a matter of fact,even smaller networks of only a few hundred nodes or less can pose serious difficultiesin assessing the dynamics, given the exponential dependence of the state space onthe number of nodes. Consequently, a number of approaches aiming at simplifyingthe computational difficulty of analyzing the dynamics have been proposed in recentyears. For example, Goles et al. (2013) reduce the network and the number of updatesneeded to analyze the dynamics by generating sub-configurations of states that remainfixed regardless of the values of the other nodes and by identifying sets of updatingschedules which have the same dynamics. They show that such networks are minimalin the sense that no edge can be deleted because every one of them represents a realinteraction in the respective network. Various methods for reducing the network to afairly small subset of nodes that are relevant for the long-term dynamics have beenproposed. The definitions of “relevant” and “irrelevant” nodes differ depending on theactual approach. Some methods are related to eliminating stable nodes that end up inan attractor after a transient period and thus considered irrelevant. This may be pairedwith removing leaf nodes that do not contribute to the evolution of any other node, thatis, with zero out-degree (outputs) like in Bilke and Sjunnesson (2001) or Richardson

123

1522 M. T. Matache, V. Matache

(2004), or with merging or collapsing mediator nodes with one in-degree (input) andone out-degree in Saadatpour et al. (2013). Other methods are based on eliminatingirrelevant nodes that are frozen at the same value on every attractor together with nodeswhose outputs go only to irrelevant nodes in Socolar and Kauffman (2003), Kaufmanet al. (2005), or Kaufman and Drossel (2006). The basis of these methods is to reducethe “structure” of the network using some rules on the Boolean functions and thenprove that such a reduction simplifies the identification of attractors. As expected,they may carry an intrinsic numerical burden. Furthermore, alternative methods foreliminating the need for complete enumeration of the states have been considered. Forexample, Devloo et al. (2003) propose another formalism which permits calculationof the steady states as solutions of a system of steady-state equations, via an imagefunction which identifies a state by its image under a certain function. The authors useconstraint programming to solve the derived system of equations in an efficient wayfrom a computational point of view.

In addition to the previously mentioned methods for network reduction, the entropyof the relevant components of the network which are comprised of relevant nodes thateventually influence each other’s state is used as a measure of uncertainty of the futurebehavior of a random state of the network by Krawitz and Shmulevich (2007a, b). Theentropy is a measure of uncertainty that has been used also by Ribeiro et al. (2008)to find the average mutual information of a random Boolean model of regulatorynetwork as a way to quantify the efficiency of information propagation through theentire network. In this context, one needs to consider pairs of connected nodes and theintrinsic Boolean functions that govern the node updates, as opposed to evolving thenetworks in order to identify the attractors. Further research by some of the authors ofRibeiro et al. (2008), in particular Lloyd-Price et al. (2012), uses mutual informationto test for a relationship between the robustness to perturbations of an attractor in arandom BN and the amount of information propagated within the network when inthat attractor. They found that there is a trade-off between robustness and informationpropagation and that at the edge of chaos, robustness is not correlated with informationpropagation.

On the other hand, the notions of entropy and mutual information have been longused as measures of complexity of dynamical systems, such as BNs, as described, forexample, by Feldman and Crutchfield (1998) or by Sole and Luque (1997). Luque andFerrera are concerned with the mutual information contained in random BNs and itsbehavior as the networks undergo their order–disorder phase transition, showing thatthe mutual information stored in the network has a maximum at the transition point.

Only recently the mutual information has been used as a method for identifyingthe most powerful and therefore relevant nodes in a BN, thus offering an efficientalternative approach to network reduction to a relevant subset of nodes Heckel et al.(2013). The mutual information, as a basic concept in information theory, allows oneto represent the reduction in the uncertainty or entropy of the state of a node due to theknowledge of any of its inputs. A summation of all mutual information quantities overall nodes having a common input can be viewed as the determinative power of thatinput node. The more powerful the node, the more the information gain provided bythe knowledge of its state. In Heckel et al. (2013), the authors use harmonic analysis tocompare the determinative power of a set of inputs to the sensitivity to perturbations

123


to those inputs showing that an input with large sensitivity need not have a largedeterminative power. On the other hand, large information gain from a set of inputsgenerates large sensitivity to perturbations of those inputs. Moreover, by consideringthe feedforward regulatory network of E. coli, it is shown that the knowledge of thestates of the most determinative nodes reduces the uncertainty of the overall networksignificantly. Thus, one could focus on the dynamics of the reduced network of thenodes with the most determinative power.

In Heckel et al. (2013), the mutual information formula is obtained in terms ofFourier coefficients expressed in the Bahadur basis which assumes independence ofthe inputs of a Boolean function. In a subsequent paper by Klotz et al. (2014), it isshown that canalizing Boolean functions maximize the mutual information under thesame assumption as in Heckel et al. (2013). This assumption is strong, since in a BN,there are correlations between inputs that build up as the network evolves in time. Ourgoal is to relax this assumption and allow dependence of inputs, while exploring theimpact of a (necessarily) different basis on the results regarding themutual informationand the sensitivity to perturbations. We notice that some results still hold; however,not all are independent of the basis. At the same time we are interested to see theimpact of our approach on the network reduction based on most determinative nodesof a specific biological network. In particular, we use a Boolean model of the signaltransduction network of a generic fibroblast cell andwe obtain results similar toHeckelet al. (2013).

In Sect. 2, we provide the basic definitions, the mathematical setup, and we useelements of operator theory to generate formulas for finding the sensitivity to pertur-bations of the nodes of the network, quantified by the concepts of influence, averagesensitivity, and strength (to be defined in that section). We also discuss the computa-tional aspects of using those formulas in applications. In Sect. 3, we provide formulasand estimates for mutual information, determinative power, and strength, paired withsimulations, and estimates that link the mutual information and the sensitivity to per-turbations. We also consider a special case that allows us to compare our analyticalresults to those in Heckel et al. (2013). Conclusions and further directions of researchare in Sect. 4.

2 Influence, Sensitivity, and Strength

In this section, we provide analytical formulas for the sensitivity to perturbationsusing a complete orthonormal basis that does not assume independence of the Booleaninputs. We pair this with some computational aspects regarding the application of theformulas to an actual biological network.

2.1 Analytical Approach

Let �n = {0, 1}n and the random vector X valued in �n . If P denotes the proba-bility measure on the domain of definition of X , then denote PX−1 the (cumulative)distribution of X . A BN is modeled as the set [n] := {1, 2, . . . , n} of n nodes, eachnode being ON (that is in state 1) or OFF (that is in state 0). Then any ω ∈ �n is a

123


possible state of the network. Each node i ∈ [n] has an associated Boolean functionfi : �n → � that governs the dynamics of the node. We are usually interested inhow the network evolves by iterating the map F = ( f1, f2, . . . , fn) a large number oftimes. Although the measures used in this paper are discrete, we use notation typicalfor measure theoretical arguments to write shorter and more elegant proofs.

Given a node i with Boolean function fi : �n → �, the influence of the j th inputon fi has been formulated in various ways in the literature. Following the authors ofShmulevich and Kauffman (2004), Kahn et al. (1988), and Ghanbarnejad and Klemm(2012), we recall the following:

Definition 1 The influence, I j ( fi ), of variable x j on the Boolean function fi , orequivalently, the activity of node j on node i , is defined as follows:

I j ( fi ) := P( fi (X) �= fi (X ⊕ e j )) (1)

where X ⊕e j is the random vector obtained by flipping the j th slot of X from 1 to 0 orviceversa. The average sensitivity, avs( fi ), of fi is the sum of its incoming activities

avs( fi ) :=n∑

j=1

I j ( fi ). (2)

The strength, σ( fi ), of fi is the sum of the outgoing activities

σ( fi ) :=n∑

j=1

Ii ( f j ). (3)

Alternatively, I j ( fi ) can be regarded as the average of the Boolean partial derivative∂( j) fi (X) = δ fi (X), fi (X⊕e j ) with respect to the probability measure P as specifiedin Ghanbarnejad and Klemm (2012). Here δ is Kronecker’s delta function whichis equal to one if the two variables are equal and zero otherwise. The definition isoriginally introduced in the context of assuming the state of the BN as a random vectorX = (X1, . . . , Xn)with independently distributed coordinates, but this property playsno role in it. However, the aforementioned property plays an essential role in paperslikeHeckel et al. (2013), Klotz et al. (2014), orKahn et al. (1988)where it is essential inobtaining formulas for I j ( fi ) (and other related quantities relevant in the study of BNs)in terms of the Fourier coefficients of fi [see, e.g., (Kahn et al. 1988, Lemma 4.1)]. TheHilbert space where those formulas are obtained is L2(�n, dPX−1). The completeorthonormal basis used in Kahn et al. (1988) is the, so-called, Bahadur basis (seeHeckel et al. 2013). In order for that family of functions to forma complete orthonormalbasis of L2(�n, dPX−1) it is necessary that X1, . . . , Xn be independently distributed.However, the nodes of a BN may mutually influence each other, so independence is arestrictive assumption.

A composition operator is an operator on a linear space S of functions defined ona set E . For any fixed self-map ϕ of E , the operator

Cϕ f := f ◦ ϕ f ∈ S

123


is called the composition operator with symbol ϕ or induced by ϕ. It is necessarilylinear. We will use such operators on S = L2(�n, dPX−1).

Let ϕ j be the j th slot flip map. This means that, for all ω ∈ �n, ϕ j (ω) is theBoolean vector in �n obtained by flipping the j th coordinate of ω. Observe that ϕ j

are obviously self-inverse and hence so are the composition operators they induce,a fact which will be used without further comments throughout this paper. In thefollowing, 〈 , 〉 denotes the usual inner product of L2(�n, dPX−1) and ‖ ‖ the norminduced by that inner product. Also, T ∗ is the notation used for the adjoint of anyoperator T , whereas I denotes the identity operator. With these notations we prove:

Proposition 1 For all Boolean functions fi , the following formulas hold:

I j ( fi ) = 〈(I − Cϕ j )∗(I − Cϕ j ) fi , fi 〉 j = 1, . . . , n (4)

avs( fi ) = 〈T fi , fi 〉 where T =n∑

j=1

(I − Cϕ j )∗(I − Cϕ j ) (5)

σ( fi ) =n∑

j=1

〈Ti f j , f j 〉 where Ti = (I − Cϕi )∗(I − Cϕi ) (6)

Proof Using a well-known change in measure formula, for any Boolean function f(we drop the index for simplicity of notation) one can write

I j ( f ) = P({ f (X) �= f (X ⊕ e j )}) =∫

| f (X) − f (X ⊕ e j )|2 dP

=∫

| f (X) − f ◦ ϕ j (X)|2 dP =∫

�n| f − f ◦ ϕ j |2 dPX−1

= ‖(I − Cϕ j ) f ‖2 = 〈(I − Cϕ j )∗(I − Cϕ j ) f, f 〉.

��Proposition 2 For all j = 1, 2, . . . , n, let � j denote the largest eigenvalue of Tj =(I − Cϕ j )

∗(I − Cϕ j ), respectively, whereas � is the largest eigenvalue of T . Thefollowing estimates hold:

I j ( f ) ≤ � j E[ f (X)] (7)

avs( f ) ≤ � E[ f (X)]. (8)

Hence:σ( fi ) ≤ �i E[F(X)], i ∈ [n], (9)

where F = ∑nj=1 f j .

Proof Indeed, both the operators Tj and T are nonnegative operators, and hence, theirnumerical range is equal to the line interval with endpoints the least, respectively, thelargest eigenvalue. On the other hand, formulas (4) and (5) show that I j ( f )/‖ f ‖2and avs( f )/‖ f ‖2 belong to the numerical range of the operator Tj , respectively, T .

123


Combining all that with the fact that f , being a Boolean function, satisfies conditionE[ f (X)] = ‖ f ‖2, proves (7) and (8).

For arbitrary fixed i ∈ [n] one has by (7) that

σ( fi ) ≤ �i

n∑

j=1

E[ f j (X)].

Given that obviously∑n

j=1 E[ f j (X)] = E[F(X)], estimate (9) follows.

��The space L2(�n, dPX−1) has a simple and natural complete orthonormal basis,

namely B = {eω = χω/√pω : ω ∈ �n}, where

χω(x) = δω,x x ∈ �n

andpω = P(X = ω) ω ∈ �n .

We assume all states are possible, so pω > 0.Checking the fact that B is a complete orthonormal basis of L2(�n, dPX−1),

whether X1, . . . , Xn are independently distributed or not is left to the reader.

Proposition 3 For all j = 1, . . . , n, the operator Tj = (I − Cϕ j )∗(I − Cϕ j ) has a

matrix with respect to B whose entries are

aω,η(Tj ) = δω,η −√

pϕ j (ω)

pω

δϕ j (ω),η −√

pϕ j (η)

pη

δϕ j (η),ω

+√

pϕ j (ω) pϕ j (η)

pω pη

δϕ j (ω),ϕ j (η) ω, η ∈ �n . (10)

Hence, the entries in the matrix of the operator T are:

aω,η(T ) =n∑

j=1

aω,η(Tj ) ω ∈ �n . (11)

Proof Given ω, η ∈ �n , the entry aω,η(Tj ) in the matrix of (I − Cϕ j )∗(I − Cϕ j ) is

aω,η(Tj ) = 〈(I − Cϕ j )∗(I − Cϕ j )eη, eω〉 = 〈(I − Cϕ j )eη, (I − Cϕ j )eω〉. (12)

Note that

Cϕ j χν = χϕ j (ν) ν ∈ �n,

123


and hence

Cϕ j eν =√

pϕ j (ν)

pν

eν ν ∈ �n . (13)

Equalities (12) and (13) combine into establishing by a straightforward computationEq. (10).

��Therefore, one can state the following:

Corollary 1 Given a Boolean function f , the following practical formulas hold:

I j ( f ) =∑

ω,η∈�n

aω,η(Tj ) f (ω) f (η)√pω pη (14)

andavs( f ) =

∑

ω,η∈�n

aω,η(T ) f (ω) f (η)√pω pη. (15)

Hence:

σ( fi ) =n∑

j=1

∑

ω,η∈�n

aω,η(Ti ) f j (ω) f j (η)√pω pη, i ∈ [n]. (16)

Indeed, (14) and (15) are immediate consequences of the matricial description ofoperators Tj and T combined with the following computation of the Fourier coeffi-cients cω, ω ∈ �n of f relative to B:

cω = 〈 f, eω〉 = 1√pω

∫

�nf χωdPX−1 = 1√

pω

f (ω)P(X = ω) = f (ω)√pω.

(17)

Example 1 In this example, we show that formula (14) agrees with the definition ofI j ( f ). To this goal, assume a product distribution, which is the basic assumption ofHeckel et al. (2013), such that every state of the network is equally likely. Thus, theprobability of any state is 1/2n . Then

I j ( f ) =∑

ω,η∈�n

aω,η(Tj ) f (ω) f (η)√pω pη = 1

2n∑

ω,η∈�n

aω,η(Tj ) f (ω) f (η)

= 1

2n∑

ω,η∈supp f

[δω,η − δϕ j (ω),η − δϕ j (η),ω + δϕ j (ω),ϕ j (η)

]

where supp f is the support f −1(1) of the function f .Let f be the Boolean function with support supp f = {(0, 1, 1), (1, 1, 1)}. Then

obviously a flip of x1 does not generate a flip in the output, so I1( f ) = 0. Similarly,a flip in x2 generates a flip of the output only for (x1, x2, x3) = (0,1,1), (0,0,1), (1,0,1)or (1,1,1), so, by definition, I2( f ) = 4/23 = 1/2. By symmetry, I3( f ) = 1/2. On theother hand,

123


I1( f ) = 1

8[a(0,1,1),(0,1,1)(T1) + a(0,1,1),(1,1,1)(T1) + a(1,1,1),(0,1,1)(T1)

+ a(1,1,1),(1,1,1)(T1)] = 1

8[2 + (−2) + (−2) + 2] = 0

and similarly

I2( f ) = 1

8[2 + 0 + 0 + 2] = 1

2

so one can see that formula (14) agrees with the definition of I j ( f ).

In order to compute the influence of each node in the network on all its output nodes(those towhich the node under consideration is an input), we generateMATLAB codesthat involve nested “for loops.”Note that, if we denote by ki the actual number of inputsto node i (its connectivity), for formula (14) there are i × ki × 22ki such loops. Thisexponential number of loops can easilymake the computations prohibitive. As amatterof fact, in the actual simulations, even a single connectivity of at least 12 nodes turnedout to be excessive for the capabilities of MATLAB. Thus, one has to rely either oneasier estimates, such as the upper bounds of Proposition 2, or find alternative exactformulas. However, a quick analysis indicates that to compute the upper bounds onewould still need exponentially many “for loops,” since the procedure would requireagain the construction of the matrices Tj and T . For this reason, in the sequel we usethe following equivalent formula which follows from the definition.

Recall that, for any Boolean function f, supp f denotes the support of f , that is,supp f = f −1(1).

Remark 1 The following formula holds

I j ( f ) =∑

ω∈supp f \ϕ j (supp f )

(pω + pϕ j (ω)

), (18)

where the sum above is considered 0 if supp f \ ϕ j (supp f ) = ∅.Hence:

Corollary 2 Let fi , i ∈ [n] be the Boolean update function of node i . Then

avs( fi ) =n∑

j=1

⎛

⎝∑

ω∈supp fi\ϕ j (supp fi )

(pω + pϕ j (ω)

)⎞

⎠ i ∈ [n] (19)

σ( fi ) =n∑

j=1

⎛

⎝∑

ω∈supp f j\ϕi (supp f j )

(pω + pϕi (ω)

)⎞

⎠ i ∈ [n]. (20)

Observe that:

123


Remark 2 Formula (18) is computationally efficient because it identifies the influentialnodes, namely those in the support of the given Boolean function which get mappedoutside of support by that function. In particular, the variable x j has null influence onthe Boolean function f , that is I j ( f ) = 0, if and only if ϕ j (supp f ) ⊆ supp f .

Indeed, ϕ j (supp f ) ⊆ supp f implies

supp f = ϕ j (ϕ j (supp f )) ⊆ ϕ j (supp f ) ⊆ supp f.

This agrees perfectly with the definition (1) of the influence as the probability of achange in the output of f when its j th input is flipped.

To understand the computational efficiency of formula (18), or equivalently thedefinition of the influence, we run MATLAB codes for both formula (14) and (18)on the same network, and compare the processing times. The results are included inSupplementary Material, Section 1. We present one more example.

Example 2 We call a BN: “Dominant states network (DSN)” if the update Booleanfunctions of nodes are characteristic functions of distinct states (called dominantstates), that is if there is a set of n states S = {ω1, . . . , ωn}, so that f1 = χω1 , . . . , fn =χωn .

Our previous considerations and formulas show by straightforward computationsthat, in the case of a DSN, one has that:

σ( fi ) =n∑

j=1

(pω j + pϕi (ω j )

)i ∈ [n]

and

avs( fi ) =n∑

j=1

(pωi + pϕ j (ωi )

)i ∈ [n].

3 Determinative Power and Strength

In this section, we are comparing the impact of node strength to the so-called deter-minative power of nodes defined and explored in Heckel et al. (2013) under theassumption of a BN with product distribution of states. We recall the main defini-tions and concepts from Heckel et al. (2013) and Cover and Thomas (2006). Theseinclude the notion of entropy of random variables, which is a measure of uncertainty,and the mutual information which is a measure of dependence between two randomvariables and is defined in terms of the entropy.

Definition 2 Let X and Y be discrete random variables. The entropy of X is definedas

H(X) = −∑

x

px log2 px = −E[log2(X)]

123


which in binary reduces to the function

h(p) = −p log2(p) − (1 − p) log2(1 − p), p = P(X = 1).

The conditional entropy of Y conditional on the knowledge of X is

H(Y |X) =∑

x

px H(Y |X = x) = −E[log2 P(Y |X)].

The mutual information (MI) is the reduction in uncertainty of the random variable Ydue to the knowledge of X . That is

MI(Y ; X) = H(Y ) − H(Y |X).

In principle, the mutual information is a measure of the “gain of information,” orthe determinative power (DP) of X over Y . The authors of Heckel et al. (2013) usethe MI to construct the DP of a node j over the states of a BN, namely

DP( j) =n∑

i=1

MI ( fi (X); X j ) (21)

which represents a summation of all “information gains” obtained from node j overits outputs (i.e., nodes that have j as an input). Besides providing a number of relatedmathematical results to which we will refer below, the authors identify the nodeswith the largest determinative power in a feedforward E. coli network, assuming aproduct distribution of the input states. The goal is to be able to reduce the networkto a smaller sub-network whose knowledge can provide sufficient information aboutthe entire network; in other words the entropy of the network conditional on theknowledge of this sub-network is small enough. The authors show that by consideringthe nodes with most DP one can reduce the network to less than a half of its originalsize, so that for larger sub-networks the entropy does not improve significantly oncean approximate (threshold) network size is reached.

As specified in Introduction, network reduction is an important topic in the lit-erature, since many real networks have sizes that lead to prohibitive computationsand manipulations as we can see in Supplementary Material with the computationof the influence. For instance, signal transduction networks such as that of a genericfibroblast cell, which we will use as an example, can have thousands of nodes thatare interconnected. The analysis and dynamical study of such networks becomes pro-hibitive due to the computational burden despite the advances in technology and datascience. Thus, finding meaningful ways to reduce the network to a significant “core”or “relevant component” has been of interest for a number of authors, and a numberof procedures have been proposed. Ultimately, all of them generate a clear trade-offbetween accuracy and computational burden.

We are interested in comparing the effect of network reduction applied to the fibrob-last network, by considering the nodes with largest DP on the one hand, and the nodes

123


with largest strength values on the other hand. Before we compare them, let us focus onsome theoretical results that supplement some of the formulas in Heckel et al. (2013)for the less restrictive case we consider in this paper, that is, not requiring a productdistribution of the input values.

If X is the state of the network with values in �n , let XA taking values in �|A| bethe collection of states of the nodes in set A ⊆ [n]. So X can be written as (XA, XAc ),where Ac = [n] \ A. Let pω|ωA = P(X = ω|XA = ωA).

Theorem 1 The following formula for conditional entropy holds

H( f (X)|XA) = EA

⎡

⎣h

⎛

⎝∑

ω∈supp f

pω|XA

⎞

⎠

⎤

⎦ (22)

where EA refers to expected value with respect to the marginal distribution of X A.

Proof By the definition of the conditional entropy

H( f (X)|XA) =∑

ωA∈�|A|P(XA = ωA)H( f (X)|XA = ωA)

which in our binary case reduces to

H( f (X)|XA) =∑

ωA∈�|A|P(XA = ωA)h(P( f (X) = 1|XA = ωA))

and the obvious equality P( f (X) = 1|XA = ωA) = E[ f (X)|XA = ωA] implies

H( f (X)|XA) =∑

ωA∈�|A|P(XA = ωA)h(E[ f (X)|XA = ωA])

= EA[h(E[ f (X)|XA])]. (23)

But

E[ f (X)|XA = ωA] =∑

ω∈�n

f (ω)P(X = ω|XA = ωA) =∑

ω∈supp f

pω|ωA

which is a number in [0, 1] and we can substitute it in (23) to get formula (22).

��Formula (22) is exactly the analog of Theorem 1 of Heckel et al. (2013) where it is

written for a system with states −1 and 1 as opposed to 0 and 1.

Proposition 4 The mutual information formula M I ( f (X); XA) can be written as

M I ( f (X); XA) = h

⎛

⎝∑

ω∈supp f

pω

⎞

⎠ − EA

⎡

⎣h

⎛

⎝∑

ω∈supp f

pω|XA

⎞

⎠

⎤

⎦ . (24)

123


Proof The formula for MI follows again directly from the definition of the mutualinformation

MI ( f (X); XA) = H( f (X)) − H( f (X)|XA) = h(E[ f (X)]) − H( f (X)|XA).

But

h(E[ f (X)]) = h

(∑

ω∈�n

f (ω)P(X = ω)

)= h

⎛

⎝∑

ω∈supp f

pω

⎞

⎠

with the argument of h being in [0, 1] as needed. Substituting this and formula (22)in the definition of the mutual information we obtain formula (24).

��Now we focus on two special (extreme) cases considered in Heckel et al. (2013),

to see if we can identify an analog of the results in that paper, where the authors usethe additional assumption of independence paired with the Bahadur basis (a familyof functions which form a complete orthonormal basis of L2(�n, dPX−1) providedthat X1, X2, . . . , Xn are independent).

Special cases:

• A = [n].Then Ac = ∅ and

MI ( f (X); X) = h

⎛

⎝∑

ω∈supp f

pω

⎞

⎠

which is maximized for∑

ω∈supp f pω = 1/2, in other words if E[ f (X)] =P( f (X) = 1) = 1/2. Hence, we are dealing with a nonbiased function. So thecloser the function f is to a nonbiased function, the larger the MI between itsoutput and all of its inputs. This is similar to what was observed in Heckel et al.(2013).

• A = {i} where i is a fixed input/node.Thus, ωA = ωi , XA = Xi . The mutual information can be written as

MI ( f (X); Xi ) = h

⎛

⎝∑

ω∈supp f

pω

⎞

⎠ − P(Xi = 1)h

⎛

⎝∑

ω∈supp f

pω|1

⎞

⎠

−P(Xi = 0)h

⎛

⎝∑

ω∈supp f

pω|0

⎞

⎠

In comparison with Heckel et al. (2013), this formula does not allow for a simpli-fication to a small subset of Fourier coefficients in order to find the MI. In theirformula, the authors of Heckel et al. (2013) identify three coefficients that act as

123


independent variables. Based on them they manage to obtain some informationon the behavior of the MI, which is not the case for our approach. The reasonfor considering this special case is that the quantity MI( f (X); Xi ) is also knownas information gain, or informativeness, and is common in information theory.Thus, we note that a change in the underlying basis can induce different resultsand situations.

Now that we have a deeper understanding of the MI as used in formula (21), let usturn to the network under consideration, namely the signal transduction network of ageneric fibroblast cell which consists of several main signaling pathways, includingthe receptor tyrosine kinase, the G-protein coupled receptor, and the integrin signalingpathway. A Boolean representation of this network has been provided in Helikar et al.(2008), and has been studied further in Kochi and Matache (2012) and Kochi et al.(2014). The fully annotated signal transduction model is freely available for simula-tions and/or download via the Cell Collective software from www.thecellcollective.org Helikar et al. (2012) and Helikar et al. (2013). Each node in the model representsa signaling molecule (mainly protein). The network has 130 nodes with connectivitiesthat vary between 1 and 14 nodes.

Using formula (18), one can quickly determine the average sensitivity and thestrength of all nodes in the fibroblast network. In Fig. 1, we plot them against thenodes (in alphabetic order), together with two more plots on the connectivity and thebias, i.e., P( f (X) = 1), for an overall view of the main numerical characteristics ofthe fibroblast nodes.

We compute the DP of the nodes in the fibroblast network and compare them withthe node strength σ . The results are shown in Fig. 2. For the network under consid-eration, the strength values seem to be slightly larger than the DP values. We haveconducted a statistical analysis related toDP and σ values for the fibroblast network. Insummary, there is enough statistical evidence that the average DP-σ is negative with ap value of basically zero. The paired test gives an upper bound of−0.14208 for a 95%confidence interval for the difference DP−σ . On the other hand, a linear regressionanalysis indicates a fairly strong linear relationship between the two variables with a75.1% coefficient of determination (COD), and a higher COD of 82.4% for the linearrelationship between the average σ and the number of outlinks corresponding to thenodes. The average values are computed over all nodeswith a given number of outlinks.This relationship is weaker for average DP versus number of outlinks with a COD of60.3%. We also note that the outliers occur mostly for nodes with a larger number ofoutlinks. In other words, fewer outlinks generate a stronger correlation between theDP or σ and the number of outlinks. For example, there is one particular node in thenetwork, namely EGFR, that generates the maximum DP and is the only node with 13outlinks. If we eliminate this node from the correlation analysis, the COD for averageDP versus outlinks increases from 60.3 to 81.3%. Notably, mutations of the EGFR,epidermal growth factor receptor, are known to be related to lung cancer, interferingwith the signaling pathways within the cell triggered to promote cell growth and divi-sion (proliferation) and cell survival. The second node in the order of DP is ASK1,apoptosis signal-regulating kinase 1, and plays important roles in many stress-relateddiseases, including cancer, diabetes, cardiovascular, and neurodegenerative diseases.

123

www.thecellcollective.org

www.thecellcollective.org


0 50 1000

5

10

15Connectivity

nodes

nu

mb

er o

f in

pu

ts

average=4.4231, max=14

0 50 1000

0.2

0.4

0.6

0.8

1Bias

nodes

P(f

(X)=

1)

average = 0.4378

0 50 1000

0.5

1

1.5

2

nodes

avs

Average sensitivity

min=0.16, average=1.126, max=1.77

0 50 1000

1

2

3

4

5

6

7

nodes

σ

Strength

average=1.0485, max=6.5547

Fig. 1 (Color Figure online) Main numerical characteristics of the nodes of the fibroblast network asspecified in each subplot

The third node is Src, proto-oncogene tyrosine-protein kinase, which is involved inthe control of many functions, including cell adhesion, growth, movement, and dif-ferentiation. The fourth node is PIP3_345, phosphatidylinositol (3,4,5)-trisphosphatethat functions to activate downstream signaling components, while the fifth node isPKC, protein kinase C, involved in receptor desensitization, in modulating membranestructure events, in regulating transcription, in mediating immune responses, in regu-lating cell growth, and in learning andmemory. The DP procedure managed to capturethe importance of these nodes in relationship to the rest of the network. Four of thetop five DP nodes are also among the five strongest nodes which are: Src, PIP3_345,PKC, PIP2_45, and EGFR. Thus, the strength also captures biologically importantnodes. Moreover, higher DP and strength values are correlated with a larger numberof outlinks as seen from the figures, which means that this procedure can identify hubsin the network. It is also apparent from the figures that the COD increases when con-sidering smaller DP and σ values.We have included relevant figures in SupplementaryMaterial, Sect. 2

We would also like to point out at this time that the MI has been used as way toidentify relevant pairs of genes in genetic expression data sets by Butte and Kohane(2000, 2003), and Jiang et al. (2009). Those authors identify relevance networks byselecting pairs of genes whose MI surpasses a given threshold. For example, in Butte

123


0 20 60 100 1300

4

7

nodes in alphabetic order

DP

,σDeterminative power (DP) and strength (σ)

for the fibroblast nodes

0 20 60 100 130−4

−3

−2

−1

0

1

2

nodes in alphabetic order

DP

− σ

Difference (DP − σ) by nodes

0 20 60 100 1300

4

7

sorted nodes

DP

, σ

DP and σ by nodes sorted inascending order of DP

0 20 60 100 1300

4

7

sorted nodes

DP

, σ

DP and σ by nodes sorted in ascending order of σ

DPσ

DPσ

DPσ

Fig. 2 (Color Figure online) Comparison of DP and σ by nodes of the fibroblast network. The nodes aresorted by names in the top panels, while in the bottom panels they are ordered according to increasing DP,and σ , respectively, as indicated in the graphs. Note that the strength values seem to be slightly larger thanthe DP values

and Kohane (2000) it is shown that the relevance networks are of the following types:those that link identical genes, those linking geneswith similar functions, those linkinggenes in the same biological pathway, and combinations of these. In our work, the MIis directed from the node toward its outputs via the outlinks, so we do not involvethe bidirectional aspect of the previous studies mentioned here. Moreover, we do notuse genetic expression data sets, but actual Boolean functions. Nevertheless, it willbe of interest to explore in the future the main types or classes of nodes/sub-networksidentified by the DP procedure in a variety of cellular networks.

Next, let us compute the network entropy generated by sub-networks chosen basedon top DP or strength values of nodes. If we denote by Al the collection of the top lnodes in order of DP or σ , then we compute

H(X |XAl ) ≤n∑

i=1

H(Xi |XAl ), for l = 1, 2, 3, . . . , n. (25)

We plot the values of the larger quantity in (25) against l and obtain the graph inFig. 3, where we note that the entropy decreases with increased sub-network size

123


l, and that the two curves are very close, with a slightly better result for σ whichprovides lower entropy values for most values of l, as seen from the bottom panelwhere we plot also the differences of entropy values by l. As the sub-network sizeincreases, the strength provides a somewhat tighter upper bound for the entropy of thenetwork. Note that the actual entropy will be smaller than the upper bound graphed inFig. 3, and that sub-networks of sizes 60 or more (with approximation), do not yield asignificant improvement of the entropy. Thus, it suffices to consider less than half ofthe original network to be able to predict the overall network behavior with fairly lowuncertainty/entropy levels.

Knowing that the DP method allows a network reduction without a significantincrease in entropy, one could use a tested approach for node elimination that preservesessential dynamical properties to reduce the network to the sub-network of the top DPnodes. For example, in Naldi et al. (2009b) the authors introduce a general method foreliminating nodes one by one by basically connecting directly the inputs of a removednode to its output nodes. Of course, one needs to either keep or consider some extradecisions on autoregulated nodes, that is nodes that are or may become self-inputsupon elimination of other nodes. One also needs to understand the impact of the orderin which nodes are removed. In our case, the natural order is provided by the sortedDP values. In Naldi et al. (2009b), it is shown that with their approach the attractorsand stable states are preserved. We note here that there are nodes whose removal mayhave no impact on the dynamics, like those with no outputs. Moreover, the authors ofNaldi et al. (2009b) have developed a Java software Naldi et al. (2009a) that allowsone to apply the reduction algorithm and analyze attractors. Alternative methods havebeen proposed by Veliz-Cuba and collaborators in Veliz-Cuba (2011); Veliz-Cubaet al. (2015) that could be used in conjunction with the DP method. Future work willexplore those methods.

One could actually go one step further and provide the following estimates for theconditional entropy H( f (X)|XA), which are the analog of Theorem 2 of Heckel et al.(2013).

Theorem 2 The following estimates of the conditional entropy hold:

LB ≤ H( f (X)|XA) ≤ LB1/ln 4 (26)

where

LB = 4(E[ f (X)] − EA[E[ f (X)|XA]2]

)

= 4

⎛

⎜⎝∑

ω∈supp f

pω −∑

ωA∈�|A|pωA

⎛

⎝∑

ω∈supp f

pω|ωA

⎞

⎠2⎞

⎟⎠ (27)

Proof We use the following inequality (found in Topsoe 2001 and used in Heckel et al.2013) that provides lower and upper bounds on the binary entropy function h(p)

4p(1 − p) ≤ h(p) ≤ [4p(1 − p)]1/ln 4.

123


0 20 40 60 80 100 1200

20

40

60

80

100

l (sub−networks generated by l nodes)

H(X

|XA

l)Upper bound of entropy; initial network entropy = 107.983

0 20 40 60 80 100 120−3

−2

−1

0

1

2

3

4

5

l (sub−networks generated by l nodes)

H(X

|XA

l)

Difference of entropy values: DP−σ

DPσ

Fig. 3 (Color Figure online) Values of the upper bound in (25) for sub-networks chosen based on the top lvalues of DP and σ , respectively. The bottom panel shows the differences in the entropy that favor mostlyσ for l > 20 with approximation

If in formula (22) we denote

q(XA) =∑

ω∈supp f

pω|XA

then LB = EA[4q(XA)(1 − q(XA))]. For the upper bound, we use the fact that

EA[(4q(XA)(1 − q(XA)))1/ln 4] ≤ EA[4q(XA)(1 − q(XA))]1/ln 4.

Then the double inequality (26) is immediate. Now, LB can be expressed as follows

LB = EA[4q(XA)(1 − q(XA))]

= 4EA

⎡

⎣

⎛

⎝∑

ω∈supp f

pω|XA

⎞

⎠

⎛

⎝1 −∑

ω∈supp f

pω|XA

⎞

⎠

⎤

⎦

= 4EA

⎡

⎢⎣

⎛

⎝∑

ω∈supp f

pω|XA

⎞

⎠ −⎛

⎝∑

ω∈supp f

pω|XA

⎞

⎠2⎤

⎥⎦

123


= 4

⎛

⎜⎝∑

ωA∈�|A|pωA

∑

ω∈supp f

pω|ωA −∑

ωA∈�|A|pωA

⎛

⎝∑

ω∈supp f

pω|ωA

⎞

⎠2⎞

⎟⎠

= 4

⎛

⎜⎝∑

ω∈supp f

pω −∑

ωA∈�|A|pωA

⎛

⎝∑

ω∈supp f

pω|ωA

⎞

⎠2⎞

⎟⎠

where the first term is obtained by the law of total probability.

��We are interested in identifying possible relationships or inequalities between themutual information and the influence of a subset of nodes of the network. In Heckelet al. (2013), the authors show that for a given collection of nodes A ⊆ [n], thefollowing holds

IA( f ) ≥ mini∈A

(1

σ 2i

)[MI( f (X); XA) − �(Var( f (X)))] (28)

where �(x) = x (1/ ln 4) − x takes positive values that are very close to zero, andσ 2i = Var(Xi ), under the assumption of independence of the random variables. Let

us explore an alternative inequality using the results of this paper. For this purpose, weconsider a special case that allows us to compare directly our results with inequality(28). In this special case, we consider a uniform distribution of inputs, which is actuallya case of product distributions or independent inputs.

3.1 Special Case

Consider a network with n nodes, such that for each ω ∈ �n , we have pω = 12n , so

that we consider a uniform distribution of the inputs. Let |supp f | = K and A ⊆ [n].Then

pωA = P(XA = ωA) =∑

ωcA∈�n−|A|

1

2n= 1

2|A| ωA ∈ �|A|.

Using formula (24), we can write the following for the MI,

MI( f (X); XA) = h

⎛

⎝∑

ω∈supp f

pω

⎞

⎠ − EA

⎡

⎣h

⎛

⎝∑

ω∈supp f

pω|XA

⎞

⎠

⎤

⎦

= h

(K

2n

)−

∑

ωA∈�|A|pωAh

⎛

⎝∑

ω∈supp f

pω|ωA

⎞

⎠

123


= h

(K

2n

)− 1

2|A|∑

ωA∈�|A|h

⎛

⎝∑

ω∈supp f

pω|ωA

⎞

⎠

Note that

pω|ωA = P(X = ω|XA = ωA) = P(X = ω, XA = ωA)

P(XA = ωA)= P(X = ω)

P(XA = ωA)= 1

2n−|A| .

If we let KωA = |supp f ∩ Pr−1A (ω)|, where PrA is the projection of ω on A, then

MI( f (X); XA) = h

(K

2n

)− 1

2|A|∑

ωA∈�|A|h

(KωA

2n−|A|

). (29)

We immediately notice that 0 ≤ KωA ≤ K for all ωA ∈ �|A|, and that∑ωA∈�|A| KωA = K , so that we create a partition of supp f . Therefore, in the sum of

(29), some of the terms could be zero, since not all ωA ∈ �|A| need to be representedin supp f , leading to KωA = 0, which in turn leads to h(0) = 0.

Let us focus on IA( f ) = ∑j∈A I j ( f ). By formula (18) we obtain

I j ( f ) =∑


(pω + pϕ j (ω)

) =∑


1

2n−1

= |supp f \ ϕ j (supp f )|2n−1 .

If m j = |supp f ∩ ϕ j (supp f )| then

I j ( f ) = K − m j

2n−1

and consequently

IA( f ) =∑

j∈A

K − m j

2n−1 = K |A| − ∑j∈A m j

2n−1 . (30)

Notice that 0 ≤ m j ≤ K and that m j is an even number since ϕ j is its own inverse.Observe that if in formula (28) of Heckel et al. (2013) we consider a uniform

distribution of random variables, then all σ 2i = 1, and thus the influence bounds above

the MI minus a small positive quantity. We would like to check a similar inequalityfor our case. To this end, we first generate some simulations in which we plot bothMI( f (X); XA) and IA( f ) versus K for various values of |A| in Fig. 4, using formulas(29) and (30). Not only is IA( f ) an upper estimate of MI ( f (X); XA), but also itsvalues are significantly larger. So in our case, the inequality becomes

MI( f (X); XA) ≤ IA( f ) (31)

123


0 50 100 150 200 2500

2

4

K

n=8, |A|=1

MI(f o X; XA) I

A(f)

0 50 100 150 200 2500

2

4

K

n=8, |A|=2

0 50 100 150 200 2500

2

4

K

n=8, |A|=3

0 50 100 150 200 2500

2

4

K

n=8, |A|=4

0 50 100 150 200 2500

2

4

K

n=8, |A|=5

0 50 100 150 200 2500

2

4

K

n=8, |A|=6

0 50 100 150 200 2500

2

4

K

n=8, |A|=7

0 50 100 150 200 2500

2

4

K

n=8, |A|=8

Fig. 4 (Color Figure online) Graphs of IA( f ) and MI( f (X); XA) computed with formulas (29) and (30)versus K for a network with n = 8 nodes and |A| = 1, 2, . . . , n as specified in the subplots. The MI curveis very close to zero for most values of |A|, while the influence has much larger values in all cases. Recallalso that MI is always a number in [0, 1]. The actual values increase with the increase in |A|. Notice alsothe expected symmetry as K crosses from values less than to values greater than 2n/2

which is stronger than the corresponding inequality of Heckel et al. (2013). Besides,the quantityMI( f (X); XA)−�(Var( f (X))) used inHeckel et al. (2013) takes on alsonegative values. For example, one can easily check that this is true for n = 8, K =1, |A| = 3, 4, 5 under the assumptions of the uniform distribution (which leads toindependence of random variables). Then the inequality becomes superfluous.

Observe that in this case, if A = { j}, and fi is the Boolean function associatedwith node i of the network, then the inequality (31) becomesMI( fi (X); X j ) ≤ I j ( fi )which implies

∑ni=1 MI ( fi (X); X j ) ≤ ∑n

i=1 I j ( fi ), or in other words DP( j) ≤σ( f j ). However, when looking at the relationship between the DP and strength valuesin Fig. 2, we observe that this inequality does not hold for all nodes of the fibroblastnetwork. That is most likely due to the actual dependencies between the states ofthe nodes. Thus, dependent inputs may lead to different results. However, we noticein Fig. 2, top right graph, that the magnitudes of the positive differences DP−σ aregenerally smaller than themagnitudes for negative differences. Thus, itmaybepossiblethat a version of inequality (28) is still valid. We have not been able to find such an

123


alternative inequality so far. On the other hand, all the examples involving dependentinputs we have looked at, support inequality (31).We provide one such example beforereturning to the Special Case 3.1.

Example 3 Consider the Boolean function f (x) where x = (x1, x2, x3), representedby the truth table shown below and the corresponding probabilities of states. It is easyto check that this is not a product distribution, so the variables are dependent; for exam-ple, P(X1 = 0, X2 = 0) = 3/10 while P(X1 = 0)P(X2 = 0) = 1/4. One can alsocheck that I1( f ) = 3/5 and I2( f ) = I3( f ) = 1/2. In this case, for any A ⊆ [3] ofcardinal at least two, we get automatically that MI( f (X); XA) ≤ 1 ≤ IA( f ). On theother hand, for A = {1} we obtain via formula (24), that MI( f (X); X{1}) = h(1/2)−(1/2)h(3/5) − (1/2)h(2/5) = .029 < 0.6 = I1( f ). Also, MI( f (X); X{2}) =MI( f (X); X{3}) = h(1/2) − (1/2)h(7/10) − (1/2)h(3/10) = .1187 < 0.5 =I2( f ) = I3( f ). So in all cases, MI( f (X); XA) ≤ IA( f ).

(x1, x2, x3) f (x1, x2, x3) P(x1, x2, x3)

(0, 0, 0) 0 3/20(0, 0, 1) 1 3/20(0, 1, 0) 0 1/20(0, 1, 1) 1 3/20(1, 0, 0) 1 3/20(1, 0, 1) 1 1/20(1, 1, 0) 0 3/20(1, 1, 1) 0 3/20

Nowwe return to our Special Case 3.1 andwe conjecture that the following inequal-ity is true for all choices of parameters

h

(K

2n

)− 1

2|A|∑

ωA∈�|A|h

(KωA

2n−|A|

)≤ K |A| − ∑

j∈A m j

2n−1 . (32)

A general proof of this inequality seems to be very technical and intricate. Wepresent briefly a few particular cases whose proofs can be found in SupplementaryMaterial, Sect. 3 Note that the extreme cases of K = 0 and K = 2n are triviallysatisfied since they lead to null quantities on both sides of inequality (32).Case 1: The support is a singleton.

The inequality (32) takes on the particular form

h

(1

2n

)− 1

2kh

(1

2n−k

)≤ k

2n−1

where |A| = k.One obtains the following consequence, which is valid no matter the cardinality of

supp f .

123


Corollary 3 Inequality (32) holds if |PrA(supp f )| = 1.

Case 2: supp f = {τ, η}, τ �= η

The inequality (32) becomes

h

(2

2n

)− 1

2k∑

ωA∈�k

h

(KωA

2n−k

)≤ 2k − ∑

j∈A m j

2n−1 .

Case 3: The support is a subgroup of �n and A = [n].What is meant here is that we identify {0, 1} toZ2, the additive group of equivalence

classes modulo 2, and �n to the product group Zn2. For any fixed j ∈ [n], denote by

δ j the Boolean vector in �n whose entries are all null, except entry j . Under thepreviously described identification, one easily sees that, given a Boolean function f ,the quantities

m j = |supp f ∩ ϕ j (supp f )| j = 1, . . . , n,

can be calculated with the alternative formula

m j = |supp f ∩ (δ j + supp f )| j = 1, . . . , n,

where the kind of addition used is addition modulo 2. Finally, recall that the order ofa subgroup of �n must be a divisor of 2n ; hence, it will have the form 2k for somenonnegative integer k ≤ n. Keeping all the above in mind, we state and prove thefollowing:

Lemma 1 Let f be a Boolean function, S its support, and 〈S〉, the subgroup of �n

generated by S. Then, the following inequality holds:

n∑

j=1

m j ≤ k2k

where 2k = |〈S〉|.Now, observe that if A = [n], the inequality we wish to prove has the form

h

(K

2n

)≤ Kn − ∑n

j=1m j

2n−1 .

and if S = supp f is a subgroup of �n of order 2k , then the inequality becomes

h

(1

2n−k

)≤ 2kn − ∑n

j=1m j

2n−1 .

123


This one holds since one can write

h

(1

2n−k

)≤ (n − k)

2n−k−1 = n2k − k2k

2n−1 ≤ 2kn − ∑nj=1m j

2n−1 .

Remark 3 If f is a Boolean function having support S, A = [n], and 〈S〉 ∩{δ1, . . . , δn} = ∅, then (32) holds.

4 Final Comments

The main conclusions of this work are that operator theory can offer computationallyefficient ways to find or estimate important quantities used in assessing the sensitivityto perturbations of BNs, and to quantify the relevance of nodes using elements ofinformation theory, in particular MI. We conclude that MI is an excellent tool foridentifying a subset of relevant nodes in the network that offer the most informationgain and whose knowledge reduces the entropy of the whole network significantly.Moreover, the MI provides a lower estimate for the influence of nodes in variousscenarios.

It would be of interest to continue this exploration under various scenarios of depen-dent nodes in the network, as well as to refine further some of the results of this paper.For example, could one strengthen inequality (32) and prove it in general or for dif-ferent scenarios?

On the other hand, in Klotz et al. (2014) it is shown that MI is maximized for canal-izing functions. However, real networks do not consist of a single type of Booleanfunction. Therefore, it would be of interest to explore a possible hierarchy of varioustypes of functions regarding the information gain they provide. That could offer moreinformation regarding the inequality between the MI and the influence. Besides, mostof the functions in real networks, such as cellular networks, need not be strictly canal-izing as considered in Klotz et al. (2014) (i.e., one value of one of the inputs forcesthe output to take on a certain fixed value regardless of the other inputs). In reality,functions may be only partially canalizing, allowing for multiple, but not necessar-ily all inputs, to be canalizing in a cascading fashion as discussed, for example, inLayne et al. (2012) or Dimitrova et al. (2015). Partially nested canalizing functionshave been considered recently as a more realistic alternative to canalizing functions.Another type of function that is common in applications is a threshold function thatturns a node ON provided that a sufficient number of inputs are ON. This type offunction is typical for neural networks. This opens the door for a variety of directionsof research that stem from the work in this paper.

Furthermore, as specified before, it is our intention to explore the Java softwareNaldi et al. (2009a) to actually perform network reduction preserving attractors andstability and use it to analyze dynamics of various signal transduction networks foundin Cell Collective Helikar et al. (2012) and Helikar et al. (2013).

Finally, it would be of great interest to look further into other real networks to iden-tify the possibility of reducing them to the most determinative nodes. For example,

123


it would be interesting to compare the results of the fibroblast network to other net-works such as the Boolean model of the influenza–host interactions during infectionof an epithelial cell Madrahimov et al. (2013), to identify possible similarities anddifferences that may occur. At the same time, identifying possible classes of biologi-cal nodes/sub-networks obtained with the DP sub-network procedure in a variety ofother networks could bring further clarifications on the advantages of the DP method.Even more, it is important to assess the degree to which the reduced network providesaccurate information on specific tasks typical for the whole network, such as patternrecognition or decision making. At the same time, exploring the impact of consideringDP versus strength might provide new insights.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 Interna-tional License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution,and reproduction in any medium, provided you give appropriate credit to the original author(s) and thesource, provide a link to the Creative Commons license, and indicate if changes were made.

References

Albert R, Othmer H (2003) The topology of the regulatory interactions predicts the expression pattern ofthe segment polarity genes in Drosophila melanogaster. J Theor Biol 223:1–18

Bilke S, Sjunnesson F (2001) Stability of the Kauffman model. Phys Rev E 65:016129Butte AJ, Kohane IS (2000) Mutual information relevance networks: functional genomic clustering using

pairwise entropy measurements. Pac Symp Biocomput 5:415–426Butte AJ, Kohane IS (2003) Relevance networks: a first step toward finding genetic regulatory networks

within microarray data. In: Parmigiani G, Garett ES, Irizarry RA, Zeger SL (eds) The analysis of geneexpression data. Part of the series statistics for biology and health. Springer, Berlin, pp 428–446

Cover TM, Thomas JA (2006) Elements of information theory. John Wiley & Sons, Inc., Hoboken, NewJersey

Devloo V, Hansen P, LabbéM (2003) Identification of all steady states in large networks by logical analysis.Bull Math Biol 65(6):10251051

Dimitrova ES, Yordanov OI, Matache MT (2015) Difference equation for tracking perturbations in systemsof boolean nested canalyzing functions. Phys Rev E 91:062812

Feldman DP, Crutchfield JP (1998) Measures of statistical complexity: why? Phys Lett A 238(45):244252Ghanbarnejad F, KlemmK (2012) Impact of individual nodes in boolean network dynamics. Europhys Lett

99(5):58006Goles E, Montalva M, Ruz GA (2013) Deconstruction and dynamical robustness of regulatory networks:

application to the yeast cell cycle networks. Bull Math Biol 75(6):939–966Heckel R, Schober S, Bossert M (2013) Harmonic analysis of boolean networks: determinative power and

perturbations. EURASIP J Bioinform Syst Biol 2013:1–13Helikar T, Konvalina J, Heidel J, Rogers JA (2008) Emergent decision-making in biological signal trans-

duction networks. PNAS 105(6):1913–1918Helikar T, Kowal B, McClenathan S, Bruckner M, Rowley T, Wicks B, Shrestha M, Limbu K, Rogers JA

(2012) The cell collective: toward an open and collaborative approach to systems biology. BMC SystBiol 6:96

Helikar T, Kowal B, Rogers JA (2013) A cell simulator platform: the cell collective. Clin Pharmacol Ther93:393–395

Huepe C, Aldana-González M (2002) Dynamical phase transition in a neural network model with noise: anexact solution. J Stat Phys 108(3–4):527–540

Jiang W, Zhang L, Na B, Wang L, Xu J, Li X, Wang Y, Rao S (2009) Mapping and characterization of tworelevance networks from SNP and gene levels. Prog Nat Sci 19(5):653657

Kahn J, Kalai G, Linial N (1988) The influence of variables on Boolean functions. In: Proceedings of the29th annual symposium on foundations of computer science. IEEE, New York, pp 68–80

123

http://creativecommons.org/licenses/by/4.0/


Kauffman SA (1969) Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol22(3):437–467

Kauffman SA (1993) The origins of order. Oxford University Press, OxfordKaufman V, Mihaljev T, Drossel B (2005) Scaling in critical random boolean networks. Phys Rev E

72(4):046124Kaufman V, Drossel B (2006) Relevant components in critical random boolean networks. New J Phys

8(10):228Klemm K, Bornholdt S (2000) Stable and unstable attractors in boolean networks. Phys Rev E 72:055101Klotz JG, Kracht D, Bossert M, Schober S (2014) Canalizing boolean functions maximize the mutual

information. IEEE Trans Inform Theory 60(4):2139–2147Kochi N, Helikar T, Allen L, Rogers JA, Wang Z, Matache MT (2014) Sensitivity analysis of biological

boolean networks using information fusion based on nonadditive set functions. BMC Syst Biol 8:92Kochi N, Matache MT (2012) Mean-field boolean network model of a signal transduction network. Biosys-

tems 108:14–27Krawitz P, Shmulevich I (2007) Basin entropy in boolean network ensembles. Phys Rev Lett 98:158701Krawitz P, Shmulevich I (2007) Entropy of complex relevant components of boolean networks. Phys Rev

E 76:036115Layne L, Dimitrova E, Macauley M (2012) Nested canalyzing depth and network stability. Bull Math Biol

74(2):422–433Lloyd-Price J, Gupta A, Ribeiro AS (2012) Robustness and information propagation in attractors of random

boolean networks. PLoS One 7(7):e42018Luque B, Ferrera A (2000) Measuring mutual information in random boolean networks. Complex Syst

12(2):241Madrahimov A, Helikar T, Kowal B, Lu G, Rogers J (2013) Dynamics of influenza virus and human host

interactions during infection and replication cycle. Bull Math Biol 75(6):988–1011Naldi A, Berenguier D, Fauré A, Lopez F, Thieffry D, Chaouiya C (2009a) Logical modelling of regulatory

networks with GINsim 2.3. Biosystems 97(2):134–139Naldi A, Remy E, Thieffry D, Chaouiya C (2009b) A reduction of logical regulatory graphs preserving

essential dynamical. In:DeganoP,Gorrieri R (eds)Computationalmethods in systems biology. Lecturenotes in computer science, vol 5688. Springer-Verlag Berlin Heidelberg, pp 266–280

Ribeiro AS, Kauffman SA, Lloyd-Price J, Samuelsson B, Socolar JES (2008)Mutual information in randomboolean models of regulatory networks. Phys Rev E 77:011901

Richardson KA (2004) Simplifying boolean networks. Adv Complex Syst 8(4):365381Saadatpour A, Albert R, Reluga TC (2013) A reduction method for boolean network models proven to

conserve attractors. SIAM J Appl Dyn Syst 12:1997–2011Shmulevich I, Dougherty ER, Zhang W (2002) From boolean to probabilistic boolean networks as models

for genetic regulatory networks. Proc IEEE 90(11):1778–1792Shmulevich I, Kauffman SA (2004) Activities and sensitivities in boolean network models. Phys Rev Lett

93(4):048701Socolar JES, Kauffman SA (2003) Scaling in ordered and critical random boolean networks. Phys Rev Lett

90:068702Sole RV, Luque B (1997) Statistical measures of complexity for strongly interacting systems. Santa Fe

Institute working paper 97-11-083Topsoe F (2001) A bounds for entropy and divergence for distributions over a two-element set. J Inequal

Pure Appl Math 2(2):25Veliz-Cuba A (2011) Reduction of boolean network models. J Theor Biol 289:167172Veliz-Cuba A, Laubenbacher R, Aguilar B (2015) Dimension reduction of large sparse AND–NOT network

models. Electron Notes Theor Comput Sci 316:8395

123

logical reduction of biological networks to their most determinative components · 2017-08-28 ·...

Documents