Dynamics in Boolean Networks

Fredrik Karlsson

2005-04-28

Degree project LITH-ITN-ED-EX--05/012--SE

Degree project in electronics design at Linköpings Tekniska Högskola, Campus Norrköping

Supervisor: Michael Hörnquist
Examiner: Michael Hörnquist

Department of Science and Technology, Linköpings Universitet, SE-601 74 Norrköping, Sweden

ISRN: LITH-ITN-ED-EX--05/012--SE

URL for electronic version: http://www.ep.liu.se/exjobb/itn/2005/ed/012/

Keywords: Random Boolean Networks, Derrida plots, Genetic regulatory networks

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Fredrik Karlsson

Abstract

In this thesis several random Boolean networks are simulated. Both completely computer-generated networks and models of biological networks are simulated. Several different tools are used to gain knowledge about their robustness: Derrida plots, noise analysis and the mean probability for canalizing rules. Some simulations of how well entropy works as an indicator of whether a network is robust are also included. The noise analysis works by measuring the Hamming distance between the state of the network when noise is applied and when no noise is applied. For many of the simulated networks two types of rules are applied: nested canalizing rules and flat distributed rules. The computer-generated networks are of two types: scale-free networks and ER-networks. One of the conclusions in this report is that nested canalizing rules are often more robust than flat distributed rules. Another conclusion is that, for flat distributed rules, the mean probability for canalizing rules has a dominating effect on whether the network is robust or not. Yet another conclusion is that, when flat distributed rules are applied, the probability distribution for indegrees has a strong effect on whether a network is robust. This effect is due to the connection between the probability distribution for indegrees and the mean probability for canalizing rules.

Acknowledgment

First of all, I want to express my gratitude towards my supervisor and examiner Michael Hörnquist, who has been a great support in the work with my master's thesis by providing me with interesting ideas and articles. I would also like to thank my parents for their love and support.

Contents

Aim and purpose
Method
Language comments
Introduction
Theory
    Graphs and networks
    Boolean rules
        Flat distributed rules
        Canalizing and nested canalizing rules
    Entropy
    Hamming distance
    Derrida plots
    Noise analysis tool
    Standard error
Implementation
    Implementation of the network representation
    Implementation of the updating of states
    Implementation of the creation and updating of the Boolean rules
    Implementation of the rewiring
    Implementation of ER-networks generating procedure
    Implementation of scale free networks generating procedure
    Random number generator
    The implementation of the calculation of entropy
    The implementation of the noise analysis tool
    Implementation of the Derrida plot
Simulations
    Simulations on entropy versus chaos
        Simulation settings
        Simulation results
        Analysis of the result
    Calculations for mean probability for canalizing rules for different distributions
        Description of the calculations
        Results of the calculations
        Discussion and analysis
    Simulations on ER-networks
        Probability distribution
        Derrida plots
        Noise analysis tool
        Mean probability for canalizing rules
        Discussion and analysis
    Simulations on the Fang-net
        Probability distribution
        Derrida plots
        Noise analysis tool
        Mean probability for canalizing rules
    Simulations on the Lee-net
        Probability distribution
        Derrida plots
        Noise analysis tool
        Mean probability for canalizing rules
        Analysis and discussion
    Simulations on the Milo-net
        Probability distributions
        Derrida plots
        Noise analysis tool
        Mean probability for canalizing rules
        Analysis and discussion
    Simulations on rewired versions of the Milo-net
        Motif detection
        Derrida plots
        The noise analysis tool
        Analysis and discussion
    Simulations on the Lasso-net
        Probability distributions
        Derrida plots
        Noise analysis tool
        Mean probability for canalizing rules
        Analysis and discussion
Summarizing discussion and analysis
Suggestions on further research
Figure list
Equation list
Table list
References

Aim and purpose

The aim of this thesis is to study the robustness of different Boolean networks, which are used as models for biological networks. The purpose is both to examine which networks are robust and to try to determine which factors govern whether a network is robust or not.

Method

The robustness properties are examined by simulating different types of Boolean networks. The simulation software is written in C++.

Language comments

Sometimes several different words are used to designate the same thing, and sometimes a word can have several meanings. The usage of some of these words is explained here. The word input is sometimes used as a synonym for indegree: in the context "input to vertex" it is synonymous with indegree. A similar connection exists between output and outdegree. The terms ordered regime and chaotic regime are used to describe a network as robust and not robust, respectively. Instead of "not robust", the phrase "sensitive to initial perturbation" can be used. The word robust is also used in phrases such as "robust against noise", which means that the network is insensitive to noise. On some rare occasions, in connection with the article [23], the word stable is used as a synonym for robust.


Introduction

In 1969 Stuart Kauffman introduced random Boolean networks as a model for genetic regulatory networks [1]. The main idea is to approximate the expression of each gene as either on or off. This approximation is quite crude, since real genes can take expression levels other than on and off. Random Boolean networks have nevertheless gained some acceptance because they have shown results similar to those of networks with multilevel rules [2]. From now on, the word expression is replaced with the word state in the context of Boolean networks.

In a random Boolean network the state of one gene is determined by the states of other genes. This is done with logical rules, also known as Boolean rules. The network determines which genes influence a certain gene, and the Boolean rules determine how they influence that gene [3]. Typical examples of Boolean rules are AND, OR and XOR. A random Boolean network can thus be viewed as a digital circuit in which the genes are approximated by D flip-flops and the rules by combinational circuits.

Besides the Boolean rules, random Boolean networks rely on a second approximation: synchronous updating, which means that all genes are updated at the same time. This is an approximation because real genes are updated asynchronously. Asynchronous updating means that the genes are updated sequentially: the state of one gene is determined before the next gene is updated, and the updating order is often arbitrary. Random Boolean networks can also use an asynchronous updating scheme, in which case they are often called asynchronous random Boolean networks.

Why, then, is synchronous updating used? The answer is that it is simple. With asynchronous updating one must decide the order in which the genes are updated. One possible solution is to update the genes in a random order, but the system then becomes non-deterministic. The problem with non-deterministic behaviour is that it destroys the cyclic behaviour that is typical for synchronous updating [2]. This cyclic behaviour also exists in real cells [4].

Which genes affect which gene is decided randomly, and the Boolean rules are also chosen randomly. Is this type of model relevant? The answer is yes, but not for obtaining detailed information. The model can give information about generic properties, such as the probable number of connections between the genes. Random Boolean networks cannot disclose any properties of the real genetic regulatory network without a central postulate, which says that a biological network must be robust. Robust means, in this case, that a biological network must be insensitive to disturbances. The postulate is essential because a living organism depends, for its survival, on certain tasks being performed in a certain order at a certain time. The cell cycle, described below, is a good example of tasks that must be performed in an ordered way.

The cell cycle of the eucaryotic cell, which is a cell with a distinct nucleus and cytoplasm, consists of four phases. Cytoplasm is the content inside the plasma membrane but outside the nucleus. The four phases are the M-phase, the G1-phase, the S-phase and the G2-phase. The M-phase consists of two stages, mitosis and cytokinesis. Mitosis is the division of the nucleus and cytokinesis is the division of the cytoplasm. Mitosis occurs before cytokinesis, and after cytokinesis two new cells, each with its own nucleus, have been formed. During the G1-phase (G stands for gap) the cell grows, and if the appropriate conditions exist it enters the S-phase. The S in S-phase stands for synthesis, and during this phase the cell copies the DNA in the nucleus. Copying the DNA is a necessary step before mitosis can start. After the S-phase the chromosomes and their copies are tightly bound together, and they do not separate until mitosis has been performed. The G2-phase lies between the S-phase and the M-phase. During the G2-phase the cell grows, and when the appropriate conditions exist it enters the M-phase.

The phases must be executed in a certain order. The order is shown in Fig 1 with the help of the circular arrow. For example, the M-phase is executed before the G1-phase, and so on. To make sure that the phases are executed in the right order the cell uses the cell cycle control system. Besides the execution order, the cell cycle control system checks whether the cell cycle shall continue, with the help of feedback from the cell. Different things are tested at so-called checkpoints before the different phases. Two things must be checked before the M-phase can start: the cell size and whether the entire DNA has been copied. At the checkpoint before the S-phase three things are tested: the cell size, the environment and whether the DNA has been damaged. All the phases in the cell cycle are started and performed by different proteins.

Fig 1: The figure shows the control of the cell cycle, with the phases M, G1, S and G2.

The importance of proteins for the cell cycle calls for a short description of how they are manufactured. Proteins are manufactured in the ribosome, but the ribosome cannot manufacture a protein without a blueprint. The blueprint exists in the genes in the DNA. The information in the DNA is copied to a molecule called messenger RNA, abbreviated mRNA. The entire DNA is not copied into the mRNA molecules; only small parts of it are copied. These small parts contain the information necessary to manufacture the protein needed. The process from DNA to protein consists of many steps and involves many complex chemical processes. mRNA alone is not enough; one also needs tRNA, which stands for transfer RNA. Transfer RNA helps to bind the amino acids to components in the mRNA molecule. It is only one example of the many components involved in creating a protein from the description in the DNA.


With the knowledge of how the cell cycle works and how proteins are manufactured, it is easy to see that the genes cannot order the production of the same amount of proteins at all times. In a multicellular organism the DNA contains information for the whole organism, but a single cell does not need proteins that are used in other cell types. It is therefore clear that all genes cannot be active all the time in every cell type, and it is necessary to regulate how much protein different genes shall "produce". A cell can control the expression of a gene at different levels, so random Boolean networks can be seen as a projection to the gene level. In other words, the model contains no information about the level at which the control is performed. It only contains information about the genes that caused the controlling mechanism.

A short comparison between genetic regulation in the eucaryotic cell and (random) Boolean networks is given below. The word random is put in parentheses because the features shared with genetic regulation in the eucaryotic cell do not depend on the network being random. The first common feature is that a gene can affect any other gene, regardless of its position in the DNA. This is also true for (random) Boolean networks. The second common feature is that genes are regulated by a combination of proteins, which are expressed from several genes. This is consistent with (random) Boolean networks, which allow several genes to affect a single gene. [4]

The networks simulated in this report are not all random in the sense of random Boolean networks. Some have a fixed network, which has been proposed as a model for a particular regulatory network. In one sense these networks are also random, because the Boolean rules are assigned randomly.


Theory

Graphs and Networks

The networks are represented by graphs or, more precisely, by so-called digraphs. Before the concept of a digraph is described, some properties of graphs are presented. The main components of a graph are edges and vertices (see Fig 2). The dots in Fig 2 represent the vertices and the arrows represent the edges. Arrows are used because the graph in Fig 2 is a digraph, which means that the edges are directed. In an ordinary graph the edges are represented by simple lines. The reason for using a digraph instead of an ordinary graph is that it is not always desirable for two vertices connected by an edge to have mutual influence on each other.

Fig 2: The figure shows an example of a digraph.

A graph G is defined by Eq1, where V(G) and E(G) are the set of vertices and the set of edges, respectively, that define the graph G.

$$G = \bigl(V(G),\, E(G)\bigr) \qquad \text{(Eq1)}$$

One way of representing a graph defined by Eq1 is to use an adjacency matrix. A slightly modified version of an adjacency matrix is used in the implementation. For more information about the adjacency matrix and its implementation, see the topic Implementation of the network representation. [5]
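As a rough illustration of the adjacency matrix idea, the sketch below stores a digraph as a plain N x N matrix. It is not the modified variant used in the thesis implementation. The names `Digraph`, `add_edge`, `indegree` and `outdegree` are hypothetical, and the convention that `adj[from][to] == 1` marks an edge from `from` to `to` is an assumption made here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain adjacency-matrix digraph (illustrative; not the thesis's modified variant).
// Convention assumed here: adj[from][to] == 1 means a directed edge from -> to.
struct Digraph {
    std::vector<std::vector<int>> adj;

    explicit Digraph(std::size_t n) : adj(n, std::vector<int>(n, 0)) {}

    std::size_t size() const { return adj.size(); }

    void add_edge(std::size_t from, std::size_t to) {
        assert(from < size() && to < size());
        adj[from][to] = 1;
    }

    // Indegree: number of edges pointing at v (column sum).
    int indegree(std::size_t v) const {
        int d = 0;
        for (std::size_t u = 0; u < size(); ++u) d += adj[u][v];
        return d;
    }

    // Outdegree: number of edges leaving v (row sum).
    int outdegree(std::size_t v) const {
        int d = 0;
        for (std::size_t w = 0; w < size(); ++w) d += adj[v][w];
        return d;
    }
};
```

Because the edges are directed, the matrix is in general not symmetric: an edge from vertex 0 to vertex 1 sets `adj[0][1]` but leaves `adj[1][0]` untouched.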

Eq1 is not enough to describe the type of networks simulated in this report. The essential missing part is the Boolean rules. They are added in Eq2, which gives a complete definition of the networks used.

$$G = \bigl(V(G),\, E(G),\, B(G)\bigr) \qquad \text{(Eq2)}$$
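One way to read Eq2 is as a data structure: every vertex carries its list of input vertices (the edges) and a Boolean rule. The sketch below is a minimal illustration under assumptions that are not taken from the thesis: each rule is stored as a truth table with 2^k entries for a vertex with k inputs, the first input is treated as the most significant bit of the table row, and all vertices are updated synchronously. The names `Node`, `State` and `step` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One vertex of a Boolean network (illustrative layout, not the thesis's).
struct Node {
    std::vector<std::size_t> inputs;  // E(G): vertices with an edge into this vertex
    std::vector<bool> rule;           // B(G): truth table, 2^k entries for k inputs
};

using State = std::vector<bool>;

// Synchronous update: every vertex reads the OLD state, then all change at once.
State step(const std::vector<Node>& net, const State& s) {
    assert(net.size() == s.size());
    State next(s.size());
    for (std::size_t v = 0; v < net.size(); ++v) {
        const Node& n = net[v];
        assert(n.rule.size() == (std::size_t{1} << n.inputs.size()));
        std::size_t row = 0;  // truth-table row; inputs[0] is the most significant bit
        for (std::size_t in : n.inputs) row = (row << 1) | (s[in] ? 1u : 0u);
        next[v] = n.rule[row];
    }
    return next;
}
```

With this layout an AND rule on two inputs is the table `{false, false, false, true}` and an XOR rule is `{false, true, true, false}`. Writing the new values into a separate `next` vector is what makes the update synchronous; updating `s` in place would let later vertices see already updated inputs.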

B(G) in Eq2 is the set of Boolean rules, which together with V(G) and E(G) defines a network. The definition in Eq2 allows a vast number of possible networks. Eq3 gives a hint of how many.

$$\mathit{NoNet} = \left(\frac{2^{2^{k}}\, N!}{(N-k)!}\right)^{N} \qquad \text{(Eq3)}$$
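As a quick numerical check, Eq3 can be evaluated directly. The sketch below assumes that N is the number of vertices and that k is the number of inputs of every vertex; the function name `network_count` is hypothetical. Floating point is used because the count exceeds 64-bit integers even for small networks.

```cpp
#include <cassert>
#include <cmath>

// Evaluates Eq3, (2^(2^k) * N! / (N-k)!)^N, in floating point.
// Assumes 0 <= k <= N; the result overflows to infinity for larger networks.
long double network_count(int N, int k) {
    assert(N >= 1 && k >= 0 && k <= N);
    // Number of Boolean rules on k inputs: 2^(2^k).
    long double rules = std::pow(2.0L, std::pow(2.0L, static_cast<long double>(k)));
    // Ordered choices of k inputs: N! / (N-k)! = N * (N-1) * ... * (N-k+1).
    long double wirings = 1.0L;
    for (int i = 0; i < k; ++i) wirings *= static_cast<long double>(N - i);
    return std::pow(rules * wirings, static_cast<long double>(N));
}
```

For N = 10 and k = 2 the factors are 16 rules and 90 ordered input pairs per vertex, so the count is 1440 raised to the power 10, which is about 3.83e31.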

N in Eq3 denotes the number of vertices in the network and k denotes the number of indegrees of every vertex, that is, the number of inputs to the vertex. Eq3 does not give the number of all possible networks, since it assumes that every vertex has the same number of indegrees. To get a picture of how large the number of possible networks is, consider a network with N=10 and k=2, for which Eq3 gives NoNet≈3.83e31. The number of possible networks is even larger if k represents the mean number of inputs, for an arbitrary probability distribution, instead of the actual number of inputs. [2]

One type of graph is the Erdős-Rényi graph (ER-graph). From now on the word ER-network is used instead of the word ER-graph. In an ER-network the edges are chosen from all possible edges with equal probability. In a digraph the number of possible edges is N(N-1), where N is the number of vertices. It follows that the probability for one particular edge to be chosen is 1/(N(N-1)). For large N the ER-network has both Poisson distributed indegrees and Poisson distributed outdegrees. The Poisson distribution is given in Eq4.

$$p(k) = \frac{m^{k}\exp(-m)}{k!} \qquad \text{(Eq4)}$$

The letter k in Eq4 denotes a non-negative integer, which represents either the number of indegrees or the number of outdegrees. The m represents the mean number of outdegrees or indegrees, depending on what k represents. For example, when k represents the number of outdegrees, m represents the mean number of outdegrees. Fig 3 shows the Poisson distribution for some different values of m. [6]
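The Poisson probabilities of Eq4 are easy to tabulate. The sketch below is one possible way to do it; the function name `poisson_pmf` is hypothetical, and the computation is done in log space (using `std::lgamma` for the factorial) so that large k does not overflow.

```cpp
#include <cassert>
#include <cmath>

// Poisson probability p(k) = m^k * exp(-m) / k!  (Eq4).
// Computed in log space so that large k does not overflow the factorial.
double poisson_pmf(int k, double m) {
    assert(k >= 0 && m > 0.0);
    return std::exp(k * std::log(m) - m - std::lgamma(k + 1.0));
}
```

Calling `poisson_pmf(k, m)` for k = 0, 1, ..., 9 and m = 0.5, 1 and 2 gives the kind of values that a plot of the distribution for these means would show.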

[Figure 3 plot: "Poisson distribution for different values on m"; horizontal axis k (0 to 9), vertical axis p(k) (0 to 0.7); curves for m=0.5, m=1 and m=2]

Fig 3: The figure shows the Poisson distribution for three different values on m. One interesting property for the ER-networks, with Boolean rules that are drawn from a flat distribution, is that above a certain value for the mean number of indegrees, the network will

8

be placed in the chaotic regime. The value for the mean number of indegrees is obtained with help of a simplified model. The simplification that has been made is called the annealed model. The annealed model makes it possible to get an analytic result on how the distance between two states evolves with time. In the annealed model new Boolean rules are assigned after every time step. In the networks simulated in this thesis the quenched model is used, which means that the Boolean rules are kept intact for every time step. The ER-network is, according to the annealed model, in the chaotic regime when the mean number of indegrees lies above two. The same value is achieved when numerical simulations of the quenched model are performed. The numerical results for the quenched model and the analytical result for the annealed model can be found in the articles [7] and [8]. The edge between the chaotic and ordered regimen lies not always at the value two for the mean number of edges. It only lies there when the distribution of Boolean rules are flat. Eq5 gives the real edge between the chaotic and ordered region. The constant K is the average number of indegrees for a vertex and p is the bias, which is a parameter that affects the distribution of Boolean rules.

$$2p(1-p)K = 1$$ (Eq5)

Another interesting type of network is the one that has a power-law degree distribution. These networks are called scale-free networks, and the power-law distribution is given in Eq6.

$$p(k) = \left[\zeta(\gamma)\,k^{\gamma}\right]^{-1}$$ (Eq6)

In Eq6, k is the number of indegrees or outdegrees and γ is a constant. The normalizing constant ζ(γ) is given in Eq7.

$$\zeta(\gamma) = \sum_{k=1}^{\infty} k^{-\gamma}$$ (Eq7)

The constants in Eq7 have the same meaning as in Eq6. [2] The constant ζ(γ) cannot always be estimated by summing a large number of terms. For example, if γ=1, then ζ(γ) is the harmonic series, which diverges, so no finite value of ζ(γ) exists. For values of γ that are slightly bigger than one the series converges, but so slowly that summing many terms is not a suitable method. A better method is integral approximation: the first terms are summed, and the remainder is replaced by an integral from the last summed term to infinity. An even better approximation is to take the mean of two such integrals whose starting points differ by one term (see Eq8).

$$s \approx s_n^{*} = s_n + \frac{A_n + A_{n+1}}{2}$$ (Eq8)

The s in Eq8 is the exact value of the sum of the whole series. The symbol $s_n^{*}$ denotes the integral approximation in which n terms have been summed. The sum of these n terms is denoted $s_n$ and is given in Eq9.

$$s_n = \sum_{k=1}^{n} f(k)$$ (Eq9)


In the case of ζ(γ), the function f(k) in Eq9 is $f(k) = k^{-\gamma}$. The integral $A_n$ is described in Eq10.

$$A_n = \int_{n}^{\infty} f(x)\,dx$$ (Eq10)

The functions f(x) in Eq10 and f(k) in Eq9 are the same function, but k takes discrete values and x takes continuous values. The expression for $A_{n+1}$ is the same as the expression for $A_n$, except that the lower bound is n+1 instead of n. How many terms should one sum over? To answer this question one needs a way to estimate the error of the approximation, and an expression for the error is given in Eq11.

$$\left|s - s_n^{*}\right| \le \frac{A_n - A_{n+1}}{2}$$ (Eq11)

The expression in Eq11 guarantees that the real error is less than or equal to the value of the expression. The result of applying Eq8 to Eq7 is shown in Eq12, which is the approximation for ζ(γ). [9]

$$s \approx \sum_{k=1}^{n} k^{-\gamma} + \frac{n^{1-\gamma} + (n+1)^{1-\gamma}}{2(\gamma-1)}$$ (Eq12)

In a similar way it is possible to derive an expression for the error by applying Eq11 on Eq7. This expression is shown in Eq13.

$$\left|s - s_n^{*}\right| \le \frac{n^{1-\gamma} - (n+1)^{1-\gamma}}{2(\gamma-1)}$$ (Eq13)
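The integral approximation and its error bound can be tried out numerically. The following is a minimal sketch, assuming the tail integral of $x^{-\gamma}$ from n to infinity is $n^{1-\gamma}/(\gamma-1)$; the function name `zeta_approx` and the test value γ=2 (for which the exact sum π²/6 is known) are choices made for this illustration only.

```python
import math


def zeta_approx(gamma, n):
    """Approximate zeta(gamma) = sum_{k>=1} k**(-gamma) for gamma > 1.

    The first n terms are summed and the remaining tail is replaced by the
    mean of two integrals (starting at n and at n + 1). Returns the estimate
    together with the error bound (A_n - A_{n+1}) / 2.
    """
    if gamma <= 1:
        raise ValueError("the series diverges for gamma <= 1")
    s_n = sum(k ** (-gamma) for k in range(1, n + 1))
    a_n = n ** (1 - gamma) / (gamma - 1)          # integral from n to infinity
    a_n1 = (n + 1) ** (1 - gamma) / (gamma - 1)   # integral from n + 1 to infinity
    return s_n + (a_n + a_n1) / 2, (a_n - a_n1) / 2


estimate, bound = zeta_approx(2.0, 100)
exact = math.pi ** 2 / 6
print(estimate, bound, abs(estimate - exact))
```

With only 100 summed terms the observed error stays below the bound, which illustrates why the bound can be used to decide how many terms to sum.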

An alternative to the function in Eq7 is Eq14, which allows k to take the value zero.

$$p(k) = \left[g(\gamma)\,(k+1)^{\gamma}\right]^{-1}, \quad \text{where } g(\gamma) = \sum_{k=1}^{\infty} (k+1)^{-\gamma}$$ (Eq14)

If one applies Eq8 on Eq14 the result will be the expression shown in Eq15, which is the formula to approximate the sum.

$$s \approx \sum_{k=1}^{n} (k+1)^{-\gamma} + \frac{(n+1)^{1-\gamma} + (n+2)^{1-\gamma}}{2(\gamma-1)}$$ (Eq15)

The error approximation for Eq14 is shown in Eq16.

$$\left|s - s_n^{*}\right| \le \frac{(n+1)^{1-\gamma} - (n+2)^{1-\gamma}}{2(\gamma-1)}$$ (Eq16)

There are at least two interesting things about scale-free networks. The first is that they occur in many real networks, such as the world-wide web, the internet and science collaboration networks [6]. The other is that the chaotic regime does not dominate the parameter space, which is the opposite of the case for ER-networks. The mean connectivity is not a relevant parameter for describing the topology of scale-free networks, so the parameter γ is used instead. In the article [10] an expression that determines the critical value γc is found, which follows from simulations with the Derrida plot. The critical value γc is the value that lies on the limit between the chaotic and the ordered regime. The expression is given in Eq17.

$$2p(1-p)\,\frac{\zeta(\gamma_c - 1)}{\zeta(\gamma_c)} = 1$$ (Eq17)

The constant p in Eq17 is called the bias and determines the probability for true or false to occur (for more information see the topic Boolean rules). The function ζ is given in Eq7.

Besides the probability distributions for outdegrees and indegrees, another property of networks that can be interesting to study is the so-called motifs. Motifs are small networks within the network. There are many different motifs; one example is the feed-forward loop [11]. The interesting thing is that in real networks, such as biological networks and electronic circuits, some motifs occur more often than in completely randomly generated networks [12].
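Eq17 has no closed-form solution for γc, but it can be solved numerically. The sketch below assumes Eq17 has the form 2p(1-p)·ζ(γc-1)/ζ(γc) = 1, evaluates ζ with the integral approximation described earlier, and finds the root by bisection. The function names, the number of summed terms and the search bracket are assumptions made for this illustration.

```python
def zeta(s, n=2000):
    """zeta(s) for s > 1: n summed terms plus the averaged integral tail."""
    head = sum(k ** (-s) for k in range(1, n + 1))
    tail = (n ** (1 - s) + (n + 1) ** (1 - s)) / (2 * (s - 1))
    return head + tail


def critical_gamma(p, lo=2.001, hi=10.0, tol=1e-9):
    """Solve 2p(1-p) * zeta(g - 1) / zeta(g) = 1 for g by bisection."""
    def f(g):
        return 2 * p * (1 - p) * zeta(g - 1) / zeta(g) - 1

    if f(lo) < 0 or f(hi) > 0:
        raise ValueError("no sign change inside the search bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:      # still on the chaotic side: move the lower end up
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


gamma_c = critical_gamma(0.5)
print(round(gamma_c, 3))
```

For no bias (p=0.5) the root comes out near 2.48. A stronger bias makes 2p(1-p) smaller and therefore lowers the critical value.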


Boolean rules

Two types of Boolean rules are described under this topic: nested canalizing rules and rules drawn from a flat distribution.

Flat distributed rules

Flat distributed rules are Boolean rules that are drawn from the set of all possible Boolean rules. The word flat refers to the fact that all Boolean rules have the same probability of being drawn. The number of Boolean rules grows when the number of inputs is increased. Already for a relatively small number of inputs the number of possible Boolean rules is quite large. An expression for calculating the number of possible rules is given in Eq18. [2]

$$N = 2^{2^{K}}$$ (Eq18)

N is the number of Boolean rules and K is the number of inputs. Tab 2 gives a picture of how large the number of possible Boolean rules is for different numbers of inputs. How are the rules chosen with equal probability? To answer this question, consider for example a rule with two inputs A and B. Applying Eq18 shows that the number of possible Boolean rules is sixteen. All possible rules are listed in Tab 1. One interesting property to look at is the number of ones and zeros. If one looks at this number for each row across R1-R16, one can see that half of the values are one. From this one can conclude that if every output value is assigned either one or zero with probability one half, the rules are drawn from a flat distribution. It is the values in the output column of a truth table that are assigned one or zero with fifty percent probability.

Tab 1: The table shows all possible rules for two inputs.

A  B  | R1  R2  R3  R4  R5  R6  R7  R8  R9  R10 R11 R12 R13 R14 R15 R16
0  0  | 0   0   0   0   0   0   0   0   1   1   1   1   1   1   1   1
0  1  | 0   0   0   0   1   1   1   1   0   0   0   0   1   1   1   1
1  0  | 0   0   1   1   0   0   1   1   0   0   1   1   0   0   1   1
1  1  | 0   1   0   1   0   1   0   1   0   1   0   1   0   1   0   1

The probability for assigning either one or zero is often referred to as the bias. The term no bias can be used when the probability for true is fifty percent. Depending on this bias some rules have a higher probability of being drawn. The flat distribution is, as mentioned before, achieved with a fifty percent probability of obtaining a one; if the probability is less than fifty percent, rules with fewer ones have a higher probability of being chosen. It can be seen in Tab 1 that R1, R2, R3, R5 and R9, which contain at most a single one, then have the highest probability of being chosen.
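The effect of the bias on the sixteen rules in Tab 1 can be checked with a few lines of code. This is a minimal sketch in which every output value is set to one independently with probability p; the function name `rule_probabilities` and the example bias 0.3 are assumptions made for the illustration. The rules are numbered in the same order as in Tab 1, with the output for the row A=0, B=0 as the most significant position.

```python
from itertools import product


def rule_probabilities(num_inputs, p):
    """Probability of drawing each Boolean rule when every output value is
    set to 1 independently with probability p (the bias)."""
    rows = 2 ** num_inputs
    probs = {}
    for index, outputs in enumerate(product((0, 1), repeat=rows), start=1):
        ones = sum(outputs)
        probs[f"R{index}"] = p ** ones * (1 - p) ** (rows - ones)
    return probs


flat = rule_probabilities(2, 0.5)     # no bias: all 16 rules equally likely
biased = rule_probabilities(2, 0.3)   # bias below one half favours few ones
most_likely = sorted(biased, key=biased.get, reverse=True)[:5]
print(len(flat), set(flat.values()))
print(sorted(most_likely))
```

With p=0.5 every rule has the probability 1/16, and with p=0.3 the five most likely rules are exactly the ones named in the text.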


Canalizing and Nested canalizing rules

The concept of canalizing rules is described before the nested canalizing rules are explained. Canalizing rules are a subset of all possible Boolean rules and they have a special feature: at least one input can determine the output regardless of the values of the other inputs. An example of a canalizing rule is a+bc, where a, b and c each have the value 0 (false) or 1 (true). The rule is canalizing because if a is one the output will be one regardless of the values of b and c. An example of a non-canalizing rule is exclusive or (XOR). XOR is non-canalizing because it is impossible to determine the output value without knowing the values of both inputs. The feature that makes canalizing rules interesting is that they have the ability to repress chaotic behaviour. In other words, a network with a high proportion of canalizing rules should lie in the ordered regime. Canalizing rules are present among the rules drawn from a flat distribution of all possible Boolean rules. The probability for canalizing rules depends on the bias and the number of inputs. The exact dependence is given in Eq19. [13]

$$\Pr\left(C_n^p\right) = (-1)^n\left(p^{2^n} + (1-p)^{2^n}\right) - 2n\,p^{2^{n-1}}(1-p)^{2^{n-1}} + \sum_{k=1}^{n} (-1)^{k+1}\binom{n}{k}\,2^{k}\left(p^{2^n - 2^{n-k}} + (1-p)^{2^n - 2^{n-k}}\right)$$ (Eq19)

Eq19 will not be derived here, but a good derivation exists in the source article [13]. In Eq19 the bias is denoted by p, the variable n denotes the number of inputs, and C denotes that the equation gives the probability for canalizing rules. In Tab 2 the number of possible canalizing rules is given for different numbers of inputs. The data in the table is taken from [13]. The conclusion that can be drawn from the data in Tab 2 is that the fraction of canalizing rules among all possible rules decreases as the number of inputs increases.

Tab 2: The table shows the number of possible canalizing rules for different numbers of inputs.

n    |C|                                          Total number of possible Boolean rules
1    4                                            2^(2^1) = 4
2    14                                           2^(2^2) = 16
3    120                                          2^(2^3) = 256
4    3514                                         2^(2^4) = 65536
5    1292276                                      2^(2^5) = 4294967296
6    103071426294                                 2^(2^6) ≈ 1.8e19
7    516508833342349371376                        2^(2^7) ≈ 3.4e38
8    10889035741470030826695916769153787968498    2^(2^8)
9    4.168515213e78                               2^(2^9)
10   5.363123172e155                              2^(2^10)
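The first entries of Tab 2 can be reproduced by brute force. The sketch below enumerates every truth table for a small number of inputs and tests the definition directly: a rule is counted as canalizing if fixing some input to some value forces one output value. The function names and the bit ordering of the truth table are assumptions made for this illustration, and the enumeration is only feasible for very few inputs.

```python
from itertools import product


def is_canalizing(table, n):
    """table[j] is the output for the input row whose bits spell the number j
    (input 0 is the most significant bit). The rule is canalizing if some
    input, fixed to some value, forces a single output value."""
    for i in range(n):
        for value in (0, 1):
            outputs = {table[j] for j in range(2 ** n)
                       if ((j >> (n - 1 - i)) & 1) == value}
            if len(outputs) == 1:
                return True
    return False


def count_canalizing(n):
    """Count the canalizing rules among all 2**(2**n) rules with n inputs."""
    return sum(is_canalizing(t, n) for t in product((0, 1), repeat=2 ** n))


counts = [count_canalizing(n) for n in (1, 2, 3)]
print(counts)
```

Note that constant rules satisfy the definition used here, which is why all four rules with one input are counted.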


Now that the properties of canalizing rules have been described, it is time to describe nested canalizing rules, which are a subset of the canalizing rules. Nested canalizing rules have the advantage of being easy to generate, and their construction guarantees that the rules are canalizing. The main components of a nested canalizing rule are two look-up tables: one contains the output values and the other contains the values that are compared with the input values (see Fig 4). Capital O followed by an index denotes a value in the look-up table with output values. Lower case o denotes the actual output value, which takes one of the values from the look-up table with output values. Lower case i followed by an index is an input value, which is compared with the value denoted by capital I followed by the same index. First it is checked whether I1 equals i1, and if it does the value O1 is chosen as output. If I1 does not equal i1 the next position is compared, and this goes on until two equal values are found or all positions in the look-up table have been tried. If no match is found, the output is set to the value Od.

[Fig 4 sketch: the look-up table with comparison values I1, I2, I3, ..., Ik is compared position by position with the input values i1, i2, i3, ..., ik. The output o is chosen from the look-up table with output values O1, O2, O3, ..., Ok at the first position where the two values are equal ("Choose if I3=i3 else compare next"), and Od is chosen if no fit was found.]

Fig 4: The figure shows a sketch of how the nested canalizing rules work.

What is left to describe about nested canalizing rules is how the output values and the values that are compared with the input values are generated. They are generated randomly, but not from a flat distribution. The distribution is given in Eq20.

$$P(I_m = \text{true}) = P(O_m = \text{true}) = \frac{\exp\left(-2^{-m}\alpha\right)}{1 + \exp\left(-2^{-m}\alpha\right)}$$ (Eq20)

The letter m in Eq20 denotes the index and takes the values m=1, 2, ..., k-1, k. The symbol α is a constant, which is set to seven if nothing else is stated. According to the authors of [14], the distribution in Eq20 results in rules that are biologically relevant. Od is not assigned with the help of the distribution in Eq20; it is given the inverted value of Ok. [14]
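The generation and evaluation of a nested canalizing rule can be sketched as follows. The sketch assumes that Eq20 reads P(Im = true) = P(Om = true) = exp(-2^(-m)·α) / (1 + exp(-2^(-m)·α)); the function names and the use of Python lists for the two look-up tables are choices made for this illustration and not the data structures of the thesis program.

```python
import math
import random


def draw_nested_rule(k, alpha=7.0, rng=random):
    """Draw the comparison values I_1..I_k, the outputs O_1..O_k and the
    default output O_d of a nested canalizing rule with k inputs."""
    compare, outputs = [], []
    for m in range(1, k + 1):
        x = math.exp(-(2.0 ** -m) * alpha)
        prob_true = x / (1.0 + x)             # assumed form of Eq20
        compare.append(rng.random() < prob_true)
        outputs.append(rng.random() < prob_true)
    default = not outputs[-1]                 # O_d is the inverted value of O_k
    return compare, outputs, default


def evaluate_nested_rule(inputs, compare, outputs, default):
    """Return O_m for the first position m where i_m equals I_m, else O_d."""
    for i_m, big_i_m, o_m in zip(inputs, compare, outputs):
        if i_m == big_i_m:
            return o_m
    return default


# Hand-written rule with two inputs: I = (true, false), O = (false, true).
I, O, Od = [True, False], [False, True], False
print(evaluate_nested_rule([True, True], I, O, Od))    # first position matches
print(evaluate_nested_rule([False, False], I, O, Od))  # second position matches
print(evaluate_nested_rule([False, True], I, O, Od))   # no match: default
```

Because the first input alone decides the output whenever it equals I1, every rule built this way is canalizing in its first input.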


Entropy

This topic deals with entropy: not the entropy defined in thermodynamics, but the closely related information-theoretic definition of entropy. The first task is to define what information is. Consider a discrete random variable X={xk | k=0, 1,…, K}, which takes the value X=xk with the probability pk. The information gained by observing the event X=xk is defined by Eq21.

$$I(x_k) = \log\left(\frac{1}{p_k}\right) = -\log(p_k)$$ (Eq21)

It is possible to use a logarithmic function of an arbitrary base but in this report and in all simulations the logarithmic function with the base two will be used. For the definition in Eq21 to make any sense some restrictions on pk are required and these restrictions are given in Eq22.

$$p_k \in [0, 1] \quad \text{and} \quad \sum_{k=0}^{K} p_k = 1$$ (Eq22)

The unit of the information I(xk) depends on the base of the logarithmic function; with the base two the unit is bits. Now that information has been defined, it is time to give entropy its information-theoretic definition. Entropy is “a measure of the average amount of information conveyed per message” [14]. What a “message” is, in the context of simulating Boolean networks, will be discussed further down in the text. A formal definition of entropy is given in Eq23.

$$H(X) = E\left[I(x_k)\right] = -\sum_{k=0}^{K} p_k \log(p_k)$$ (Eq23)

In this report and in all simulations the units for the entropy are bits because, as mentioned earlier, the logarithmic function has the base two. The size of the entropy H(X) has a lower bound and an upper bound. The bounds for the entropy are given in Eq24.

$$H(X) \in \left[0, \log(K+1)\right]$$ (Eq24)

The lower bound 0 is attained when all states but one have zero probability of occurring. In that case Eq23 gives zero, because the state with nonzero probability must have the probability one, and the states with zero probability also contribute zero in accordance with Eq25.

$$p_k \log(p_k) \to 0 \quad \text{when } p_k \to 0^{+}$$ (Eq25)

The upper bound in Eq24 is attained when all K+1 states have the same, non-zero, probability of occurring. [14]

Now that entropy and some of its properties have been defined, it is time to describe how it can be used for determining if a Boolean network is in the ordered or the chaotic regime. In information theory one talks about the uncertainty of a random variable [15]. The entropy in thermodynamics is a measure of the disorder in the system. A higher degree of disorder in the system results in higher entropy [16]. The uncertainty of a variable is analogous to the disorder of a thermodynamic system, so the entropy measures the degree of disorder in the “message”. Higher entropy means a lower degree of order, that is, the “message” is more disordered. What is a “message” in the context of Boolean networks? The answer depends on whether one means the ideal “message”, which would contain information about all possible states generated by the network of interest, or the “message” that is practically possible to obtain. The ideal “message” is defined first. The ideal “message” S(G) consists of the probabilities for all possible states generated by a graph with a certain set of vertices, set of edges and set of Boolean rules (see Eq26). All possible states means the states the network generates for every possible initial state after an infinite number of time steps.

$$\left.\begin{matrix} V(G) \\ E(G) \\ B(G) \end{matrix}\right\} \Rightarrow S(G)$$ (Eq26)

From the description of the ideal “message” above it follows that it would be impossible to obtain the ideal “message” in order to calculate the entropy, because the number of time steps would be infinite. So the first step towards obtaining a practical “message” is to limit the number of time steps. What the limit should be is hard to say, because it probably depends on the graph. For most graphs it is also impossible to gather states from all initial states, because the number of initial states grows large even for relatively small networks. For example, a network with 100 vertices has 2^100 ≈ 1.27e+30 initial states. It would be too time consuming to gather information from all initial states, so the solution is to choose a few initial states. The estimate of the entropy that is based on the practical “message” is denoted He and is calculated as the mean over a number of subsets of S(G) (see Eq27). Every subset Pn(G) consists of the probabilities for every state that has occurred for a single initial state and a finite number of time steps.

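$$\begin{matrix} P_1(G) \subseteq S(G) \\ P_2(G) \subseteq S(G) \\ \vdots \\ P_N(G) \subseteq S(G) \end{matrix} \;\Rightarrow\; \begin{matrix} H(P_1(G)) \\ H(P_2(G)) \\ \vdots \\ H(P_N(G)) \end{matrix} \;\Rightarrow\; H_e = \frac{1}{N}\sum_{n=1}^{N} H(P_n(G))$$ (Eq27)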
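The estimate He can be illustrated with a few hand-made trajectories. In this sketch every trajectory is the list of states visited from one initial state, the probabilities are taken as relative frequencies of the visited states, and the logarithm has the base two. The function names and the three example trajectories are assumptions made for the illustration.

```python
import math
from collections import Counter


def entropy_bits(states):
    """Entropy (base 2) of the relative frequencies of the visited states."""
    counts = Counter(states)
    total = len(states)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def estimated_entropy(trajectories):
    """H_e: the mean of the entropies of the individual trajectories."""
    return sum(entropy_bits(t) for t in trajectories) / len(trajectories)


fixed_point = [(0, 0)] * 8                  # the network stays in one state
cycle = [(0, 1), (1, 0)] * 4                # a cycle of length two
all_different = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]

h_fixed = entropy_bits(fixed_point)
h_cycle = entropy_bits(cycle)
h_all = entropy_bits(all_different)
h_e = estimated_entropy([fixed_point, cycle, all_different])
print(h_fixed, h_cycle, h_all, h_e)
```

The trajectory in which no state is repeated reaches the logarithm of the number of measured states, which is the largest value the estimate can take for that trajectory length.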

There is no guarantee that the estimated entropy gives a good picture of the degree of order in the network, because the entropy is calculated from only a limited number of subsets, and these subsets might not give a fair picture of S(G). Although this method of estimating the entropy suffers from some disadvantages, it can still be justified. The justification comes from a phenomenon that is typical for chaotic systems: strange attractors. In the discrete systems simulated in this thesis one cannot talk about true strange attractors, because the systems have a finite number of states. The term strange attractor will be used anyway to denote systems whose behaviour seems to be random. Before strange attractors are described, ordinary attractors are described. In ordered systems there are so-called attractors, which are states that the systems are “drawn” to. An attractor is often an equilibrium state or some kind of periodic behaviour. An example of a system with an attractor is a bowl with a marble. As long as the marble is placed within the bowl, one can be sure that no matter where the marble is placed it will roll down to the bottom of the bowl. The position at the bottom of the bowl is an attractor.

Strange attractors are in one way the opposite of attractors. A strange attractor is associated with unpredictability of the motion of the state when there is a small uncertainty about the initial state. Very few examples of strange attractors are not chaotic according to the definition based on Lyapunov exponents. [17]

The question is how strange attractors and attractors can justify the estimate He of the entropy. The answer is that in an ordered network there will be attractors, which “attract” the initial states to either one state or a cycle of states, and this leads to relatively low entropy. In a system with strange attractors the network will not converge to a state or a cycle of states, and the states of the network will appear to be completely random, which leads to very high entropy.

The biggest disadvantage of this method is that it does not give an accurate measurement of the order of the system when the system lies in the ordered regime, because depending on the initial state the network can converge to a single state or to a cycle of states. A cycle will of course result in a higher entropy than a single state, and the length of the cycle also affects the entropy. It is impossible to know if the chosen initial states result in a representative number of cycles and single states, and therefore it is not suitable to use this method as an exact measurement of the order in the system. Another disadvantage is that if a cycle that the system has converged to is longer than the number of measured states, the entropy will take on its highest value. The highest practical entropy depends on the number of measured states: it is given by the logarithm of the number of measured states and has nothing to do with the real maximum entropy of the system. According to [18], numerical simulations of networks in the ordered regime have shown that the cycle length often is of the same order as the square root of the number of vertices. Later research has shown that this statement is false [2]. Because of the disadvantages mentioned above, the method should be used with caution. It is not recommended to use this method as the sole indicator of whether the system is in the ordered or the chaotic regime.

Hamming distance

The Hamming distance is a measure of the difference between two bit streams. It is the number of positions at which the two bit streams differ [19]. Consider two vectors whose values are either one (true) or zero (false), A=(a1 a2 … an) and B=(b1 b2 … bn). With these vectors the definition of the Hamming distance takes the form shown in Eq28 [2].

$$d(A, B) = \sum_{i=1}^{n} \left|a_i - b_i\right|$$ (Eq28)

To exemplify the use of Eq28, consider two vectors of size 4, (1 0 0 0) and (1 1 1 0). The Hamming distance between the two vectors is d((1 0 0 0),(1 1 1 0))=|1-1|+|0-1|+|0-1|+|0-0|=2.
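The same calculation can be written as a short function. This is a minimal sketch of Eq28 for vectors of zeros and ones; the function name is chosen for this illustration.

```python
def hamming_distance(a, b):
    """Sum of |a_i - b_i| over all positions of two equally long 0/1 vectors."""
    if len(a) != len(b):
        raise ValueError("the vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


distance = hamming_distance((1, 0, 0, 0), (1, 1, 1, 0))
print(distance)  # 2, as in the worked example
```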


Derrida plots

The Derrida plot is a tool for determining if a network lies in the chaotic or the ordered regime. In other words, it is a tool for determining if the network is robust. The robustness can be determined by examining how a small perturbation of the initial state affects the dynamics of the network. In practice this is done by measuring the distance between the initial state and a perturbed version of the initial state, then updating the two states and measuring the new distance. The distance between the two states is the Hamming distance. Consider an initial state and a perturbed version of it, SA=(1 0 0 0) and SB=(0 0 0 0), and a network that is defined by a set of vertices V(G), a set of edges E(G) and a set of Boolean rules B(G). One can derive two new states, SA’ and SB’, by updating the network one time step. So for example let SA=(1 0 0 0)→ SA’=(1 0 0 1) and SB=(0 0 0 0)→ SB’=(0 1 0 0). This gives the Hamming distances d(SA, SB)=1 and d(SA’, SB’)=3. These two Hamming distances will from now on be denoted d(T) and d(T+1), where d(T) is the Hamming distance before updating and d(T+1) is the Hamming distance after updating. The value d(T) is plotted along the horizontal axis and d(T+1) is plotted along the vertical axis. So far only one point of the Derrida plot has been derived; to get more points, new initial states with new Hamming distances must be sampled. [18]

According to the article [18] the Derrida plot is the binary discrete counterpart of the Lyapunov exponents. The Lyapunov exponent is a well-accepted tool for diagnosing if a system lies in the ordered or the chaotic regime. Lyapunov exponents measure the sensitivity to perturbations of the initial state. A small distance between two initial states is chosen, and when a small amount of time has elapsed the new distance is measured. The new distance can be written as the initial distance multiplied by an exponential with an arbitrary base. If the average exponent is larger than zero the system is in the chaotic regime, and if it is smaller the system lies in the ordered regime. [17]

To evaluate if the network is in the chaotic or the ordered regime, one compares the curve derived from the network with the diagonal of the graph. The diagonal corresponds to the situation where the Hamming distance between the two initial states is equal to the Hamming distance after one time step. In that situation the system is considered to be in the ordered regime. The network is also considered to be in the ordered regime if the curve derived from the network lies under the diagonal. When the curve lies above the diagonal the network is considered to be in the chaotic regime [18]. To justify these three statements it is instructive to compare with the Lyapunov exponents. The network is in the chaotic regime when the Lyapunov exponent is larger than zero, which means that the distance will grow larger than the initial distance when some time has elapsed. This behaviour of growing distances is shown in the Derrida plot by the fact that the curve derived from the network lies above the diagonal. A Lyapunov exponent that is zero corresponds to a curve that lies on the diagonal in the Derrida plot. A curve that lies below the diagonal in the Derrida plot corresponds to a negative Lyapunov exponent. No rigorous proof exists that Derrida plots actually measure the presence of chaos, but the similarities with the Lyapunov exponent make it a credible tool for determining if a network lies in the ordered or the chaotic regime. Both Derrida plots and Lyapunov exponents measure the sensitivity to perturbations of the initial state.
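The sampling of points for a Derrida plot can be sketched as follows. The network generator below is a stand-in made up for this illustration (every vertex gets k inputs chosen uniformly at random, possibly with repeats, and a rule drawn from a flat distribution); it is not the generator used in the thesis. For every distance d(T), pairs of random states with that Hamming distance are updated one time step and the mean of d(T+1) is recorded.

```python
import random


def random_network(n, k, rng):
    """n vertices, each with k random inputs and a flat distributed rule
    (quenched model: the rules are never redrawn)."""
    inputs = [[rng.randrange(n) for _ in range(k)] for _ in range(n)]
    tables = [[rng.randrange(2) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronous update of every vertex."""
    new_state = []
    for ins, table in zip(inputs, tables):
        index = sum(state[src] << pos for pos, src in enumerate(ins))
        new_state.append(table[index])
    return new_state


def derrida_point(d, inputs, tables, rng, samples=200):
    """Mean d(T+1) over pairs of random states at Hamming distance d(T) = d."""
    n = len(inputs)
    total = 0
    for _ in range(samples):
        a = [rng.randrange(2) for _ in range(n)]
        b = list(a)
        for v in rng.sample(range(n), d):    # flip d distinct positions
            b[v] ^= 1
        a1, b1 = step(a, inputs, tables), step(b, inputs, tables)
        total += sum(x != y for x, y in zip(a1, b1))
    return total / samples


rng = random.Random(0)
inputs, tables = random_network(n=50, k=3, rng=rng)
curve = [(d, derrida_point(d, inputs, tables, rng)) for d in range(0, 11)]
for d_before, d_after in curve:
    print(d_before, d_after)
```

Plotting the second column against the first and comparing the curve with the diagonal gives the diagnosis described above.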


Noise analysis tool

Entropy and Derrida plots are in one way noise analysis tools, because they are indicators of the sensitivity to perturbations of initial states, but to get a picture of how the network is affected by noise over longer periods one needs another tool. One suitable method is a slightly modified version of the method presented in [7]. The method in [7] is based on choosing two initial states with a certain Hamming distance and then updating them over several time steps. The Hamming distance between the states is calculated for every time step. To analyze the consequences of noise, only one initial state is chosen. The state is updated in the same way as in [7], but instead of calculating the Hamming distance between states that originate from two different initial states, it is calculated between states that have been updated with noise and states that have been updated without noise (see Fig 5).

[Fig 5 sketch: two copies, A and B, of the same four-bit initial state. One copy is updated one time step at a time with noise and the other without noise, and the Hamming distance d(A,B) is calculated after every time step.]

Fig 5: The figure shows an example of how the Noise analysis tool works. The number of time steps taken is four.

The idea of the noise analysis tool is to look at a plot of the Hamming distances against the time steps. To determine if one type of network is better than another, one simply plots the Hamming distances for both networks in the same plot. The features one should compare are the levels of the curves and their slopes. The number of time steps can be chosen arbitrarily.

Now that the noise analysis tool has been described, it is time to describe the different types of noise that will be used for testing the network. There are two main categories of noise: structural noise and dynamical noise. Structural noise is noise that affects the topology of the network, for example by randomly adding and removing edges. The simulations performed in this report do not include tests for structural noise. Dynamical noise affects the state, or the transfer of the state from one vertex to the input of another vertex. The Noise analysis tool can test three types of dynamical noise: freezing of a state (Delay noise), randomly altering a state (State noise) and randomly disturbing the transfer of states (Transfer noise).

Freezing of a state is described first. Freezing of a state means that for every vertex there is a probability that its state is not updated (see Fig 6). In Fig 6 the probability for freezing a state is denoted p and the states are denoted by the letters a to d. The index on the letters describes which time step they come from. It is possible for a state to stay frozen for more than one time step. In Fig 6, ct is an example of a state that is frozen over two time steps.

[Fig 6 sketch: the states (at, bt, ct, dt) are updated twice, and in every update each individual state is updated with the probability 1-p. After the first update the states are (at+1, bt+1, ct, dt+1) and after the second update they are (at+1, bt+2, ct, dt+2).]

Fig 6: The figure shows an example of how the Delay noise works.

The second type of noise is when the actual state changes with a certain probability p (see Fig 7). In Fig 7 the probability for a single state to be inverted is denoted p, and an inverted state is denoted by the state followed by an apostrophe. For every single state there is a probability that the state will be inverted. The updating of a state is performed after the noise has been applied.
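The two noise types described so far, Delay noise and State noise, can be sketched in a few lines. The function names are made up for this illustration, and the function `invert_all` is only a stand-in for the real network update so that the example is self-contained.

```python
import random


def apply_state_noise(state, p, rng):
    """State noise: invert every state independently with probability p."""
    return [bit ^ 1 if rng.random() < p else bit for bit in state]


def update_with_delay_noise(state, update, p, rng):
    """Delay noise: every vertex keeps its old state with probability p and
    takes its updated state with probability 1 - p."""
    updated = update(state)
    return [old if rng.random() < p else new
            for old, new in zip(state, updated)]


def invert_all(state):
    """Stand-in for the network update, used only in this illustration."""
    return [bit ^ 1 for bit in state]


rng = random.Random(1)
state = [0, 1, 0, 1]
unchanged = apply_state_noise(state, 0.0, rng)                 # p = 0: no noise
frozen = update_with_delay_noise(state, invert_all, 1.0, rng)  # p = 1: all frozen
normal = update_with_delay_noise(state, invert_all, 0.0, rng)  # p = 0: plain update
print(unchanged, frozen, normal)
```

With State noise the noisy state would then be passed to the network update, in line with the remark that the updating is performed after the noise has been applied.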

[Fig 7 sketch: the individual states (at, bt, ct, dt) are inverted with a probability of p, giving (at, bt’, ct, dt); all states are then updated one time step to (at+1, bt+1, ct+1, dt+1); the individual states are again inverted with a probability of p, giving (at+1, bt+1, ct+1, dt+1’).]

Fig 7: The figure shows an example of how the noise that affects the actual states works.

The first two noise types are applied directly to the states. The third noise type is not applied directly to the states; it is applied to the transfer of states between vertices. Transfer of states only occurs between vertices that are connected by an edge. The transfer of states is performed before the updating of the states. The behaviour of Transfer noise for a transfer of a state between two vertices is shown in Fig 8. The probability for sending the state without any change is denoted 1-p, and p is the probability that the transfer goes wrong.

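[Fig 8 sketch: a state sent from the start-vertex to the end-vertex. True arrives as True and False arrives as False with the probability 1-p; True arrives as False and False arrives as True with the probability p.]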

Fig 8: The figure shows a schematic sketch of how Transfer noise works.

One question that arises is why this third type of noise is interesting. The answer is that in the other two types of noise the states are affected directly, but in the third type they are only affected indirectly. If the Boolean rules are arranged in a certain way, the third type of noise is not necessarily visible.

Standard error

In all analysis methods described in this report it can be interesting to take means and standard errors over several Boolean rules and initial states. A formula for estimating the standard error is given by Eq29.

$$s = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(x_j - \bar{x}\right)^{2}}$$ (Eq29)

It is easier to use another formula for estimating the standard error and it is possible to derive it from Eq29. The equation is not derived but it is given in Eq30.

$$s = \sqrt{\frac{1}{n-1}\left(\sum_{j=1}^{n} x_j^{2} - \frac{1}{n}\left(\sum_{j=1}^{n} x_j\right)^{2}\right)}$$ (Eq30)

Eq30 is more practical to use because one does not need to calculate the mean before one calculates the standard error. [20]
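The equivalence of the two formulas can be checked numerically. The sketch assumes that both Eq29 and Eq30 carry a square root over the whole right-hand side; the function names and the data are made up for this illustration.

```python
import math


def s_two_pass(xs):
    """Eq29: the mean must be known before the deviations can be summed."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


def s_one_pass(xs):
    """Eq30: only the running sums of x and x**2 are needed."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(s_two_pass(data), s_one_pass(data))
```

Both functions give the same value for this data. One caveat with the second form is that it subtracts two large numbers, so it can lose precision in floating-point arithmetic when the values are large compared with their spread.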


Implementation

This section describes how the essential parts of the program have been implemented. The essential parts are those that can be considered necessary for judging the validity of the results. The programming was done in C++ using Dev-C++ 4.0.

Implementation of the network representation

The network structure is stored in a slightly modified version of the adjacency matrix. In the adjacency matrix every position in the matrix represents a possible edge. The row and the column of the position determine which vertices the edge is drawn between. A number can represent the number of edges, but this feature is not implemented [5]. In this implementation the rows represent the start vertices and the columns represent the end vertices (see Fig 9).

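[Fig 9 sketch: a directed graph with the vertices a, b and c, shown together with its adjacency matrix, whose rows and columns are ordered a, b, c.]

$$A(G) = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$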

Fig 9: The figure shows a graph and the corresponding adjacency matrix.

Not every row and column needs to be stored, because an adjacency list representation, which consists of several single linked lists, is used. It saves memory by storing only the positions in the matrix that actually represent an edge. A head node represents every vertex. Nodes with a number that refers to a position in the row of the adjacency matrix are connected to the head nodes (see Fig 10). Head nodes for vertices without indegrees do not have any nodes connected to them. [21]

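[Fig 10 sketch: the Head Nodes 0, 1 and 2, each connected to a single linked list with one node. In agreement with the adjacency matrix in Fig 9, the lists hold the numbers 1, 2 and 0 respectively.]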

Fig 10: The figure shows the adjacency list representation for the graph in Fig 9.

Each head node corresponds to the end vertex of the edges, and the single linked list connected to a head node holds the start vertices of all edges that have that end vertex. In Fig 10 the first head node has the number zero, which corresponds to the first position in the column (see the adjacency matrix in Fig 9). The node that is connected to the first head node in Fig 10 has the number one, which corresponds to the second position in the row. The adjacency matrix is represented in this way because it is not necessary to store all the zeros.
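The conversion from the adjacency matrix to this list representation can be sketched as follows. Ordinary Python lists are used here in place of the single linked lists of the C++ program, and the function name is made up for this illustration. The matrix is the three-vertex example with rows as start vertices and columns as end vertices.

```python
def in_neighbour_lists(matrix):
    """matrix[row][col] == 1 means an edge from vertex `row` (start vertex)
    to vertex `col` (end vertex). For every end vertex (head node) the list
    of its start vertices is returned; the zeros are never stored."""
    n = len(matrix)
    return [[row for row in range(n) if matrix[row][col]] for col in range(n)]


# The vertices a, b and c are numbered 0, 1 and 2.
A = [[0, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]
lists = in_neighbour_lists(A)
print(lists)
```

The first head node gets the list [1], which matches the description that the node connected to head node zero has the number one.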


Implementation of the updating of states

The updating of states is synchronous, and to get synchronous updating, input buffers are used. An input buffer is placed in “front” of every vertex. For example, the vertex (head node) zero in Fig 10 has a buffer that contains the state from vertex one. All the input buffers are updated before the states (see Fig 11). The updating of the input buffers is performed by a loop that steps through the vector with vertices (head nodes). For every vertex (head node) a loop steps through the single linked list, which contains the positions of the vertices whose states are used for updating the input buffer. The updating of the input buffer is complete when the single linked list has been stepped through.

[Figure 11 flowchart: Update the input buffers for every vertex. Then: Use nested canalizing rules? Yes: update states by using nested canalizing rules. No: update states by using flat distributed rules.]

Fig 11: The figure shows a simplified flowchart for how the states are updated.

The procedure for updating the states differs slightly depending on whether nested canalizing rules or flat distributed rules are used. For flat distributed rules the output values are stored in a vector, and the input buffer decides which of them is used as output by selecting a position in that vector. The input buffer can be read as a binary number, and this number, converted to a decimal integer, is the position in the output vector. The first position in the input buffer is defined as the least significant bit, and it contains the state of the vertex that is first in the single linked list. A higher position in the input buffer means a more significant bit, so the last position is the most significant bit. For nested canalizing rules the output also depends on a position in the vector of output values, but the position is not determined in the same way as for flat distributed rules. It is determined by comparing a predefined vector with the input buffer (for more details see the topic Boolean rules in the Theory part).
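The two lookups can be sketched in Python as follows. The bit order follows the description above. The nested canalizing lookup is an assumption based on the usual definition of such rules (the first input that matches its canalizing value selects the output, otherwise the last output entry applies); the Theory part should be consulted for the exact definition used in the thesis. All function names are hypothetical.

```python
def buffer_to_index(buffer):
    """Read the buffer as a binary number; position 0 is the least significant bit."""
    return sum(1 << pos for pos, bit in enumerate(buffer) if bit)

def flat_output(outputs, buffer):
    """Flat distributed rule: the buffer value indexes the output vector directly."""
    return outputs[buffer_to_index(buffer)]

def nested_canalizing_output(canalizing_values, outputs, buffer):
    """Assumed semantics: the first input that equals its canalizing value
    selects the output; if none matches, the last output entry is used."""
    for pos, bit in enumerate(buffer):
        if bit == canalizing_values[pos]:
            return outputs[pos]
    return outputs[-1]

print(buffer_to_index([True, False, True]))  # 5
```

The flat lookup needs an output vector with one entry per input combination, whereas the nested canalizing lookup needs only one entry per input plus one default entry.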


Implementation of the creation and updating of the Boolean rules

The first subject under this topic is the creation of flat distributed rules. The first task when creating a flat distributed rule is to determine the number of indegrees (inputs) of the vertex. The number of inputs is given by the size of the input buffer. If the number of inputs is higher than fifteen, no Boolean rule is created and the program is terminated, because the vector of output values would grow very large and cause memory problems (see Fig 12). After the vector of output values has been created, a value, either true or false, is assigned to every position in the vector. To draw a Boolean rule from a flat distribution over all possible Boolean rules, one assigns true or false, each with fifty percent probability, to every position in the vector of output values (for more details see the topic Boolean rules in the Theory part).
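As an illustration, the creation of a flat distributed rule might look as follows in Python. The names `MAX_INPUTS` and `create_flat_rule` are hypothetical, and raising an exception replaces the program termination described in the text.

```python
import random

MAX_INPUTS = 15  # limit stated in the text

def create_flat_rule(n_inputs, rng=random):
    """Draw one output per input combination, true or false with probability 0.5."""
    if n_inputs > MAX_INPUTS:
        # The thesis program terminates here; an exception is used in this sketch.
        raise ValueError("too many inputs for a flat distributed rule")
    return [rng.random() < 0.5 for _ in range(2 ** n_inputs)]

rule = create_flat_rule(3, random.Random(1))
print(len(rule))  # 8
```

The length of the output vector doubles with every added input, which is why an upper limit on the number of inputs is needed.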

[Figure 12 flowchart: Get the number of indegrees (inputs). Then: No. of inputs < 15? No: "Too many inputs!". Yes: create the vector for the output values with size 2^(No. of inputs), then assign random values from a flat distribution to the vector with output values.]

Fig 12: The figure shows a flowchart for the process of creating Boolean rules from a flat distribution.

The creation process for nested canalizing rules is somewhat similar to that for flat distributed rules (see Fig 13). It starts in the same way, by determining the number of inputs, but there is no upper limit on the number of inputs. Instead of one vector, two vectors are created: one with the same size as the input buffer and another that is one position bigger. The first vector contains the values that are going to be compared with the values in the input buffer. The second vector contains the output values. Both vectors are assigned values, either true or false, drawn randomly from a special probability distribution given in Eq20. After the values have been assigned, the last position in the vector of output values is set to the inverted value of the second-to-last position.
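The creation of a nested canalizing rule can be sketched in the same way. Since Eq20 is not reproduced in this section, the sketch uses a single probability `p_true` as a clearly labelled placeholder for that distribution; the function name is hypothetical as well.

```python
import random

def create_nested_canalizing_rule(n_inputs, p_true=0.5, rng=random):
    """Return (canalizing_values, outputs).

    p_true is a placeholder for the distribution of Eq20, which is not
    reproduced here; replace it with the real probabilities.
    """
    canalizing_values = [rng.random() < p_true for _ in range(n_inputs)]
    outputs = [rng.random() < p_true for _ in range(n_inputs + 1)]
    if n_inputs > 0:
        # The last output is the inverted value of the second-to-last position.
        outputs[-1] = not outputs[-2]
    return canalizing_values, outputs
```

Both vectors grow linearly with the number of inputs, so no upper limit on the number of inputs is required.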


[Figure 13 flowchart: Get the number of edges that have the current vertex as end vertex (number of inputs). Then create two vectors, one with a size that equals the number of inputs and another that is one position bigger. Then assign random values to the two vectors. Then assign the last value in the bigger vector by setting it to the inversion of the second-to-last value.]

Fig 13: The figure shows a flowchart for the process of creating nested canalizing rules.

When a network is loaded or created, the Boolean rules are not created instantly; they evolve as new edges are added to the network. The two methods described earlier under this topic are used for simulations that need to draw statistics from more Boolean rules than the original ones. The original Boolean rules are created by the updating process. The first step is to determine the number of inputs for the end vertex of the newly added edge (see Fig 14), which is done in the same way as in the processes of creating Boolean rules. Depending on whether nested canalizing rules or Boolean rules drawn from a flat distribution are used, one of two different courses of events is executed (see Fig 14).

Memory problems can occur if rules from a flat distribution are used. A temporary vector of size 2^x is created to store the output values that existed before the new edge was added, where x is the number of edges before the new edge was added. When the output values have been stored in the temporary vector, the vector that originally contained the output values is deleted and a new vector of size 2^y is created, where y is the number of edges after the new edge has been added. The values from the temporary vector are then copied to the first positions in the new vector, and the remaining positions are assigned either true or false with fifty percent probability.

The process for nested canalizing rules has no check on the number of inputs, because the length of the vectors grows linearly with the number of inputs and therefore causes no memory problems. The two vectors, one with output values and one with the values that are going to be compared with the input, are copied to two temporary vectors. The two original vectors are then deleted and new ones are created that are one position bigger, and the values from the temporary vectors are copied to the new vectors. New values for the last positions are created in the same way as in the process of creating nested canalizing rules.
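The two growth procedures can be sketched in Python as follows. The function names are hypothetical, `p_true` is again a placeholder for the distribution in Eq20, and the check on the number of inputs for flat distributed rules is omitted for brevity. Python lists make the explicit temporary vectors of the thesis program unnecessary.

```python
import random

def grow_flat_rule(outputs, rng=random):
    """Double the output vector: old values first, new positions drawn with p = 0.5."""
    if not outputs:  # no input existed before: create a vector with two positions
        return [rng.random() < 0.5 for _ in range(2)]
    return list(outputs) + [rng.random() < 0.5 for _ in range(len(outputs))]

def grow_nested_rule(canalizing_values, outputs, p_true=0.5, rng=random):
    """Extend both vectors by one position; p_true stands in for Eq20."""
    new_values = list(canalizing_values) + [rng.random() < p_true]
    kept = list(outputs[:-1])            # every output except the old last value
    new_output = rng.random() < p_true   # output for the newly added input
    return new_values, kept + [new_output, not new_output]
```

Calling `grow_nested_rule([], [])` covers the case where no input existed before and yields one vector with one position and one with two positions.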


[Figure 14 flowchart: Get the number of inputs (indegrees). Then: Use nested canalizing rules?

Yes (nested canalizing rules): Did there exist any input before? No: create two vectors, one with one position and one with two positions, and assign them the values true or false according to the distribution in Eq20. Yes: create two temporary vectors, one for the values that are compared with the values in the input buffer and another for the output values; copy all the values to the temporary vectors and create new versions that are one position bigger. All the values except the last in the temporary vector with output values are copied to the new vector; the two last positions are assigned randomly and by the inverted value of the second-to-last, respectively. All the values in the other temporary vector are copied to the other new vector, and the value of its last position is determined randomly.

No (flat distributed rules): No. of inputs < 15? No: "Too many inputs". Yes: Did there exist any input before? No: create the vector for the output with two positions and assign each position either true or false with fifty percent probability. Yes: create a temporary vector for the output values with size 2^(No. of edges - 1) and copy the output values to it; delete the vector with output values and create a new one with size 2^(No. of edges). The values of the temporary vector are copied to the first positions in the new vector for output values, and the rest of the positions are assigned true or false with fifty percent probability.]

Fig 14: The figure shows the flowchart for the process of updating the Boolean rules after an edge has been added.


Implementation of the rewiring

The rewiring procedure is used when the number of outdegrees and indegrees of every vertex should be kept constant but the edges should be reassigned. This can be achieved by switching the start vertices of two randomly chosen edges. For example, if the network is described by the graph in Fig 9 and the two edges (1,0) and (0,2) have been chosen, the new edges become (0,0) and (1,2) (see Fig 15). The first position in the parenthesis is the start vertex and the second is the end vertex.

[Figure 15: the adjacency list (head nodes as end vertices, list nodes as start vertices) before and after a switch; the two start vertices chosen to change place with each other are marked.]

Fig 15: The figure shows how the switches of start vertices work.

The selection of the two edges begins by randomly choosing two vertices among all the vertices in the network. It is then tested whether they have any inputs, and if one of them does not, two new vertices are chosen. If both have inputs, the start vertex (input) is chosen randomly, with equal probability, from all the vertices (inputs) connected to the current vertex. For the network in Fig 15 there is only one input per vertex, so the probability of it being chosen is 100%. A vertex with two inputs would give a probability of 50% for each of them. After the two edges have been chosen, it is checked whether either edge is part of a motif and whether motifs should be kept intact when the rewiring is applied. If one of the edges is part of a motif and motifs should be kept intact, two new edges are randomly chosen. The final step before the switch can occur is to check whether the new edges created by the switch already exist; if they do, a new pair of edges must be chosen.

The switching of start vertices continues until the number of switches equals the number of edges or until the selection of two edges has failed more than one hundred thousand times. For the switching to be interrupted by too many failures, the failures must have occurred consecutively. The reason for this interruption feature is that in some situations it would be practically impossible to find two edges that are allowed to switch start vertices.

This rewiring algorithm has some problems, but tests show that the rewiring is sufficient. One problem is that the switching can be interrupted before all edges have switched start vertices. Another problem is that there is no guarantee that all edges are switched even if no interruption has occurred. This is because the edges are chosen randomly, with no check on whether they have been chosen earlier, so an edge that has already switched its start vertex can be selected instead of an edge that has not been switched. Yet another problem is that an edge whose end vertex is shared by many edges has a lower probability of being chosen than an edge whose end vertex is shared by few edges. Despite these problems, tests show that the algorithm described above generates a high degree of rewiring.
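The rewiring loop described above can be sketched in Python as follows. The sketch works on a list of start-vertex lists per end vertex, leaves out the motif check, and uses the hypothetical names `rewire` and `max_failures`; it is an illustration under these assumptions, not the thesis program.

```python
import random

def rewire(in_lists, max_failures=100_000, rng=random):
    """Swap start vertices of randomly chosen edge pairs, preserving all degrees.

    in_lists[end] holds the start vertices of the edges that end in `end`.
    The motif check described in the text is omitted in this sketch.
    """
    n_edges = sum(len(starts) for starts in in_lists)
    switches = failures = 0
    while switches < n_edges and failures <= max_failures:
        end_a = rng.randrange(len(in_lists))
        end_b = rng.randrange(len(in_lists))
        if not in_lists[end_a] or not in_lists[end_b]:
            failures += 1  # one of the vertices has no input
            continue
        pos_a = rng.randrange(len(in_lists[end_a]))
        pos_b = rng.randrange(len(in_lists[end_b]))
        start_a, start_b = in_lists[end_a][pos_a], in_lists[end_b][pos_b]
        # Reject the switch if either new edge already exists.
        if start_b in in_lists[end_a] or start_a in in_lists[end_b]:
            failures += 1
            continue
        in_lists[end_a][pos_a], in_lists[end_b][pos_b] = start_b, start_a
        switches += 1
        failures = 0  # only consecutive failures count
    return switches
```

For the example in the text, choosing the edges (1,0) and (0,2) in the lists `[[1], [2], [0]]` turns them into `[[0], [2], [1]]`, that is, the new edges (0,0) and (1,2).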


Implementation of ER-networks generating procedure

One or several random networks are often used as a null hypothesis when the different networks from articles are simulated. The ER-networks are created by adding new edges randomly, one by one, until the network contains the predefined number of edges. The vertices between which the edges are drawn are chosen from all vertices with equal probability. As seen in Fig 16, the edges are added by a loop that does not stop until the predefined number of edges has been added. Some parts of the process of creating the random networks are left out of the flowchart in Fig 16. Tests on how many outdegrees the start vertex has and how many indegrees the end vertex has are left out; these tests can be enabled or disabled. Another detail left out of the flowchart is the option to disable self-feedback. When self-feedback is disabled, it is tested whether the two randomly chosen vertices are equal, and if they are, a new pair of vertices is selected.
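A minimal Python sketch of this generating loop is shown below. The function and parameter names are hypothetical, the optional tests on the number of indegrees and outdegrees are omitted, and a set of (start, end) pairs is used so that an already existing edge is simply not added a second time.

```python
import random

def generate_er_network(n_vertices, n_edges, allow_self_feedback=True, rng=random):
    """Add directed edges one by one between uniformly chosen vertices."""
    if allow_self_feedback:
        max_edges = n_vertices * n_vertices
    else:
        max_edges = n_vertices * (n_vertices - 1)
    if n_edges > max_edges:
        raise ValueError("more edges requested than the network can hold")
    edges = set()
    while len(edges) < n_edges:
        start = rng.randrange(n_vertices)
        end = rng.randrange(n_vertices)
        if start == end and not allow_self_feedback:
            continue  # self-feedback is disabled: pick a new pair
        edges.add((start, end))  # a set ignores an edge that already exists
    return edges
```

The guard on `max_edges` is an addition of this sketch; without it the loop would never terminate when more edges are requested than can exist.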

[Figure 16 flowchart: Is the No. of added edges < final No. of edges? No: the network is finished. Yes: choose randomly two vertices from all the vertices in the network. Is there no edge between the two chosen vertices? Yes: add an edge between the two chosen vertices.]

Fig 16: The figure shows a simplified flowchart for the generating of an ER-network.


Implementation of Scale free networks generating procedure

The process of generating the scale-free networks contains many steps. The first step is to transform the number of edges into the exponent, gamma, of the distribution shown in Eq6. This is done iteratively by calculating the mean number of edges for the distribution in Eq6 for different values of gamma until the requested value is obtained.

To obtain a scale-free distribution, the maximum number of indegrees and outdegrees of every vertex is locked to values generated from the distribution given in Eq6. The process of generating the “locking” values is divided into two parts, one for the indegrees and one for the outdegrees, because the number of indegrees is limited to ten whereas the number of outdegrees is allowed to exceed ten. Probabilities for one to ten indegrees are generated before the “locking” values for the indegrees are generated. The number of indegrees is then locked for all the vertices in accordance with these probabilities. The allowed number of outdegrees for every vertex is decided in a similar way: probabilities for one to one hundred outdegrees per vertex are generated, and with the help of these probabilities the allowed number of outdegrees per vertex is set.

The total number of indegrees and the total number of outdegrees must equal each other and the predefined number of edges. Because the process that determines the number of indegrees and outdegrees is random, the numbers must sometimes be corrected. Depending on whether there are too many or too few outdegrees or indegrees, they are removed or added by randomly choosing a vertex and removing or adding one indegree or outdegree for that vertex. Vertices are chosen randomly until the number of indegrees and outdegrees is correct.

When the number of outdegrees and indegrees has been corrected, the actual edges are added randomly. The edges are added one by one until the predefined number of edges has been added or until the adding of an edge has failed more than one hundred thousand times. The restriction on the number of indegrees and outdegrees of every vertex, together with the restriction that an edge with the same direction between two vertices is only allowed once, can make it impossible to add the last edges. This problem is solved by removing the restriction on the number of outdegrees for every vertex.

Random number generator

An external random generator is used instead of the one provided by the C++ standard library. The external generator is used because the built-in generator is insufficient when many random numbers are needed. The random generator that is used is taken from [22], and according to [22] it has passed all statistical tests within the floating-point precision.
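The first step of the scale-free procedure, finding gamma from a requested mean number of edges per vertex, can be sketched as follows. Two assumptions are made: Eq6 is taken to be a pure power law, p(k) proportional to k^(-gamma) on k = 1..k_max, and a bisection search replaces the iteration scheme of the thesis, which is not specified in detail. The function names are hypothetical.

```python
def power_law_mean(gamma, k_max):
    """Mean of p(k) proportional to k**(-gamma) on k = 1..k_max (assumed form of Eq6)."""
    ks = range(1, k_max + 1)
    weights = [k ** (-gamma) for k in ks]
    return sum(k * w for k, w in zip(ks, weights)) / sum(weights)

def find_gamma(target_mean, k_max, lo=0.0, hi=10.0, tol=1e-9):
    """Bisection search; the mean decreases monotonically as gamma grows."""
    if not power_law_mean(hi, k_max) <= target_mean <= power_law_mean(lo, k_max):
        raise ValueError("target mean is outside the reachable range")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if power_law_mean(mid, k_max) > target_mean:
            lo = mid  # mean still too large: gamma must increase
        else:
            hi = mid
    return (lo + hi) / 2
```

With the indegree limit of ten mentioned in the text, `find_gamma(2.0, 10)` would return the exponent for which the truncated distribution has a mean of two indegrees per vertex.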


The implementation of the calculation of entropy

The entropy is calculated from a file that contains the states for an arbitrary number of time steps. Every line in the file contains the state of the total network, that is, the states of all vertices, for one time step. In the flowchart in Fig 17 one state of the total network, that is, one line from the file, is read per cycle. A single linked list is used to store the states that occur in the file. For every new state a new node is added to the single linked list. When a state has occurred earlier, a counter that indicates the number of times the state has appeared is increased instead. The comparison between a state from the file and the states that already exist in the single linked list is conducted by a loop. The loop steps through the single linked list node by node, and if no node with a state equal to the state from the file is found, a new node is added at the end of the single linked list. When all states from the file have been loaded, the entropy is calculated from the counters in the single linked list. The program also allows one to calculate the mean of the entropy over several initial states.
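The counting and the final calculation can be sketched in Python as follows. A `Counter` (a hash table) replaces the single linked list, and the base-2 logarithm is an assumption, since the entropy formula is given in the Theory part and not in this section. The function name is hypothetical.

```python
from collections import Counter
from math import log2

def state_entropy(states):
    """Shannon entropy (base 2, an assumption) of the relative frequency of states."""
    counts = Counter(states)
    total = sum(counts.values())
    return sum(-(n / total) * log2(n / total) for n in counts.values())

# Each string is the state of the whole network at one time step.
print(state_entropy(["010", "110", "010", "110"]))  # 1.0
```

Looking up a state in a hash table takes roughly constant time, whereas stepping through a linked list takes time proportional to the number of distinct states seen so far.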

[Figure 17 flowchart: Have all the states been read? Yes: calculate the entropy based on the relative appearance of states in the single linked list. No: read a state from the file. Has the state occurred before? Yes: increase the number that indicates how many times the current state has appeared. No: add a new node with the new state to the single linked list that contains all the states that have appeared, and set the number that indicates how many times the current state has appeared to one.]

Fig 17: The figure shows a simplified flowchart for the calculation of entropy.


The implementation of the noise analysis tool

To analyze the impact of different types of noise, a special tool has been implemented. It compares the dynamics of the network with noise and without noise over a predefined number of time steps. The comparison is performed by calculating the Hamming distance between the states that have been updated with noise and the states that have been updated without noise. The comparison can be done with all the noise types enabled or with just one noise type enabled (see the topic Noise analysis tool in the Theory part for more information about the noise types).

A simplified flowchart for the implementation of the noise analysis tool is shown in Fig 18. According to the flowchart, a set of rules is loaded into the network, and these rules must be created before the noise analysis tool is run. It is also possible to load completely new networks instead of just Boolean rules. The initial states are chosen by assigning every position in the network either true or false, with fifty percent probability. For every Boolean rule and initial state the network runs a predefined number of time steps. First, one step is taken without any noise enabled, and the updated network state is stored in a temporary vector (TV1). In the same loop that stores the network state in TV1, the network state is assigned new values from another temporary vector (TV2), which contains the values from the updating with noise. Before any updating has occurred, TV2 contains the initial state. After the network state has been assigned the values from TV2, one step is taken, and the resulting network state is stored in TV2. In the same loop the network state is assigned the values from TV1. The Hamming distance between TV1 and TV2 is calculated before the updated states are loaded into them. The final step is to calculate the mean and standard error of the Hamming distances over all the Boolean rules and initial states (see Fig 18).

Implementation of the Derrida plot

Several implementations of the Derrida plot have been tried or, more exactly, several ways to sample the states used in the Derrida plot. See the topic Derrida plots in the Theory part for more information about the Derrida plot. The implementation presented here is the one that gave the best fit to the Derrida plots of the ER-network with flat distributed rules in the article [13]. One initial state is chosen, and a second state is created from it by randomly choosing a position in the initial state and inverting its value. The second state is stored in a vector (v2). The network is updated one step from the initial state and the updated state is stored in a vector (v1). The values from v2 are then copied to the network state. With the values from v2 in place, the network is updated one time step and the Hamming distance between the network state and v1 is calculated. After the calculation the network state is assigned the values from v2 again. The network state and v2 are now altered randomly to specify a state with a bigger Hamming distance. A position can only be altered once per initial state. The procedure described above is repeated until the predefined number of samples has been reached. To get a more accurate picture of the dynamics, several initial states and Boolean rules are used to calculate a Derrida plot, which is based on a mean.
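The sampling procedure for the Derrida plot can be sketched in Python as follows. The sketch assumes that the perturbed state grows by exactly one inverted position per sample, which the text does not state explicitly, and `step` is a hypothetical function that maps a network state to its successor. Averaging over initial states and Boolean rules is left out.

```python
import random

def hamming(a, b):
    """Number of positions in which two states differ."""
    return sum(x != y for x, y in zip(a, b))

def derrida_samples(step, initial_state, n_samples, rng=random):
    """Return (d(T), d(T+1)) pairs for growing perturbations of one initial state.

    `step` is a hypothetical function that maps a network state to its successor.
    """
    v1 = step(initial_state)      # successor of the unperturbed state
    v2 = list(initial_state)      # perturbed copy of the initial state
    untouched = list(range(len(v2)))
    rng.shuffle(untouched)        # each position is inverted at most once
    samples = []
    for distance in range(1, min(n_samples, len(v2)) + 1):
        pos = untouched.pop()
        v2[pos] = not v2[pos]     # assumption: one more inverted position per sample
        samples.append((distance, hamming(step(v2), v1)))
    return samples
```

Points that fall below the diagonal d(T+1) = d(T) indicate that perturbations shrink, and points above it indicate that they grow.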


[Figure 18 flowchart, boxes in order: Load new rules. Generate a new state, randomly, for the network. Update the state of the network with and without noise. Calculate the Hamming distance between the states with noise and without noise. Is No. of steps < predefined No. of steps? Have all initial states been tried? Have all Boolean rules been tried? Calculate the mean and standard error for the Hamming distances.]

Fig 18: The figure shows a simplified flowchart for the noise analysis tool.


Simulations

Simulations on entropy versus chaos

The aim of this simulation is to test how reliable entropy is as an indicator of chaos. This is done by comparing the results from Derrida plots with the results from entropy.

Simulation settings

The simulations are performed on ER-networks with 1000 vertices. The simulated ER-networks are divided into four groups based on their number of edges: 1000, 1500, 2000 and 2500 edges. For every group ten networks are generated with different sets of edges E(G). Every E(G) is associated with a certain B(G), and every B(G) consists of rules drawn from a flat distribution. The Derrida plots are based on ten different networks and 50 initial states per network. The entropy is also calculated over 50 initial states and is based on 1000 time steps.

Simulation results

The blue line in the Derrida plots is the diagonal and the error bars correspond to one standard error.

[Figure 19 plot: title "ER-1000"; axes d(T) and d(T+1), both from 0 to 1200; legend: d(T), er-1000.]

Fig 19: The figure shows the Derrida plot for the ER-networks with 1000 edges.


[Fig 20 plot: title "Derrida plot for ER-1500"; axes d(T) and d(T+1), both from 0 to 1200; legend: d(T), er-1500.]


[Fig 21 plot: title "Derrida plots for ER-2000"; axes d(T) and d(T+1), both from 0 to 1200; legend: d(T), er-2000.]



[Fig 22 plot: title "Derrida plots for ER-2500"; axes d(T) and d(T+1), both from 0 to 1200; legend: d(T), er-2500.]


[Figure 23 plot: title "Entropy"; horizontal axis Networks (1 to 10), vertical axis Entropy (0 to 12); legend: ER-1000, ER-1500, ER-2000, ER-2500.]

Fig 23: The figure shows the entropy for all the simulated networks.


Analysis of the result

A description and interpretation of the results from the simulation follows under this topic. The first thing worth observing is that the entropy differs for the same network when different sets of Boolean rules are applied. They differ to the extent that, for some Boolean rules, the entropy is higher than for some Boolean rules applied to networks with more edges (see Fig 23). This behaviour is not so strange; it is the same behaviour that the standard error in the Derrida plots shows. Despite the differences between the Boolean rules for the same network, one can see a trend: a higher number of edges results, for most of the Boolean rules, in a higher entropy. It is also interesting to observe that ER-2500 has the highest entropy for all of the applied rules. In the Derrida plot for ER-2500 in Fig 22, the curve crosses the diagonal, which means that ER-2500 lies in the chaotic regime. The curve in Fig 21, which shows the Derrida plot for ER-2000, lies very close to the diagonal. From Fig 21 one can conclude that ER-2000 lies on the boundary between the chaotic and ordered regimes, and Fig 23 shows that its entropy is relatively high. The Derrida plots for the two networks with the lowest entropy, ER-1000 and ER-1500, are displayed in Fig 19 and Fig 20; their curves lie below the diagonal, and the networks are therefore robust.

The results show that the entropy can be used as an indicator of whether the network lies in the chaotic or ordered regime. As mentioned under the topic entropy in the Theory part, one should be careful about using the entropy as the sole indicator of whether the system lies in the chaotic or ordered regime. The reason is that the number of time steps on which the entropy is based must be much bigger than the length of the attractors, and it is not possible to know the length of all the attractors. Another problem is that the entropy, with its current implementation, is not suitable for larger networks.


Calculations for mean probability for canalizing rules for different distributions

The aim of the calculations under this topic is to see how big an influence canalizing rules have on the robustness of the network. This is done by calculations that are based on the indegrees only. The result is then compared with values for the robustness of real networks.

Description of the calculations

No simulated networks are used for the calculations performed under this topic; probability distributions for the indegrees are used instead. Three different distributions are used. A slightly modified version of the Poisson distribution is used for the blue curve in Fig 24. The original Poisson distribution is presented in Eq4. The modification is that the Poisson distribution has been normalized over the interval from one to infinity. This has been done because a rule must have at least one input (indegree) for it to be possible to determine whether it is canalizing or not. A power law distribution, given in Eq6, is used for the blue curve in Fig 25. Another power law distribution, shown in Eq14, is used for the green curve in Fig 25.

The probability distributions are one part of deriving the curves in Fig 24 and Fig 25. The other part is to calculate the probability for canalizing rules, which is done with Eq19. The expression in Eq19 has two variables: the bias and the number of inputs. The bias is, as mentioned before, the probability that the output for a certain input is one. The desired value is the mean probability for canalizing rules, which is obtained by taking the expectation value, over a distribution, of the function for the probability for canalizing rules (see Eq31).

\[ E_{MCR} = \sum_{k=1}^{\infty} f(k,b) \cdot p(k) \qquad \text{(Eq31)} \]

E_MCR, in Eq31, is the mean probability for canalizing rules and f(k,b) denotes the function in Eq19. Lower case k and b, in Eq31, denote the number of indegrees (inputs) and the bias. The function p(k) is the probability distribution, either a Poisson or a power law distribution. For the Poisson distribution the mean number of indegrees is used as input besides the number of inputs. In the case of a power law distribution the constant gamma is used as input besides the number of indegrees. It is not practically possible to carry the sum all the way to infinity, so the sum is only taken up to k equal to one hundred. This is sufficient because the probability for canalizing rules is very small for high values of k.

Eq31 is evaluated for several values of the bias and, depending on the distribution used, of gamma or the mean number of indegrees. All these evaluated values form points with the bias as y coordinate and gamma or the mean number of indegrees as x coordinate. A certain mean probability for canalizing rules is associated with every point. All points below a certain value of the mean probability for canalizing rules are removed to obtain a region whose boundary is constituted by the blue curves and the green curve. The threshold that the mean probability for canalizing rules should lie above is the mean probability for canalizing rules when the mean number of indegrees for a Poisson distribution is two. The value two for the mean number of indegrees is based on the fact that it is, according to [8], the critical value between the ordered and chaotic region for ER-networks. The final step is to remove all points except the ones that lie on the edge.

The red curves display the real region of parameter values for which the result is a robust network. These red curves, in Fig 24 and Fig 25, are based on two different equations: one for the ER-networks, which are Poisson distributed, and another for the scale-free networks, which are power law distributed. These equations give the same curves as simulations on real networks, as shown in the article [10]. The equation for the edge of chaos for the power law distribution can be found in Eq17, and the equation for the Poisson distributed networks can be found in Eq5. The red curve in Fig 24 is calculated for the same bias as the blue curve. The situation is a bit different for Eq17, because the bias is calculated from an array of predefined values of gamma.
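The truncated sum of Eq31 can be sketched in Python as follows. The renormalized Poisson distribution follows the description above. Eq19 is not reproduced in this section, so the function f is passed in as a parameter, and the example uses a stand-in that is explicitly not the thesis formula. All names are hypothetical.

```python
from math import exp, factorial

def poisson_from_one(k, mean):
    """Poisson probability renormalized to k >= 1, as described in the text."""
    return mean ** k * exp(-mean) / (factorial(k) * (1.0 - exp(-mean)))

def mean_canalizing_probability(f, p, bias, k_max=100):
    """Truncated version of Eq31: sum of f(k, bias) * p(k) for k = 1..k_max."""
    return sum(f(k, bias) * p(k) for k in range(1, k_max + 1))

# Stand-in for Eq19 (NOT the thesis formula): every rule counts as canalizing,
# so the sum simply checks that the distribution is normalized.
always_canalizing = lambda k, bias: 1.0
print(mean_canalizing_probability(always_canalizing,
                                  lambda k: poisson_from_one(k, 2.0), 0.5))
```

With the real Eq19 supplied as `f`, the function would be evaluated on a grid of bias values and mean indegrees (or gamma values) to obtain the points from which the boundary curves are extracted.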

Results of the calculations

[Figure 24 plot: title "Robustness for different mean number of indegrees and bias"; horizontal axis Mean number of indegrees (0 to 100), vertical axis Bias (0 to 1); legend: Based on MCR, Real curve.]

Fig 24: The figure shows the robust regime and chaotic regime for the parameters bias and mean number of indegrees. The ordered regime is situated to the left of the curves. The red curve shows the real ordered regime and is based on Eq5. In the case of the blue curve the method based on mean probability for canalizing rules has been used.


[Figure 25 plot: title "Robustness for Scale-free networks for gamma versus bias"; horizontal axis Gamma (1 to 3), vertical axis Bias (0 to 1); legend: Based on MCR, Real value, PlO.]

Fig 25: The figure shows the robust regime and chaotic regime for the parameters bias and gamma. The ordered regime is situated to the right of the curves. The red curve shows the real ordered regime and is based on Eq17. In the case of the blue curve the method based on mean probability for canalizing rules has been used on the distribution given in Eq6. The green curve has also been derived with the help of mean probability for canalizing rules but Eq14 has been used instead of Eq6.

Discussion and analysis

The regions bounded by the blue curves or the green curve cannot be interpreted as a measurement of the ordered region, because the curves deviate from the red curves, which are the real edges between the chaotic and ordered regions. It is still interesting to observe similarities between the blue or green curve and the red curve. For the Poisson distribution, in Fig 24, the curves are almost identical when the bias lies in the interval 0.15 to 0.85. The behaviour is also similar outside this interval: although the curves lie at different levels, they converge to the same point. For the scale-free networks, which are based on power law distributions, the difference between the curves is larger. The blue curve in Fig 25 deviates most from the real curve, while the green curve lies closer. Both the blue and the green curve have a shape that looks somewhat like the red curve. The reason that the green curve lies closer to the red curve might be that its distribution allows a zero value for the number of indegrees, which is not an unreasonable property for real networks. Note the word "allows" in the previous sentence: the distribution allows zero values, but it is normalized over the interval from one to infinity and is used in this interval when the mean probability for canalizing rules is calculated. The resemblance between the curves cannot be interpreted as meaning that the mean probability for canalizing rules strictly decides whether a network lies in the ordered or chaotic regime, but it can be seen as an indication that canalizing rules have a strong impact on the robustness of the network.


Simulations on ER-networks

Several generated ER-networks are simulated under this topic to examine the robustness for the two types of rules. Four different types of ER-networks are used, with the same number of vertices but a different number of edges. The number of vertices is 1000 and the number of edges is 1000, 2000, 4000 and 8000 respectively. The two types of rules are nested canalizing rules and flat distributed rules. The tools used for evaluating the robustness are the Derrida plots, the noise analysis tool and the mean probability for canalizing rules. The probability distribution for both indegrees and outdegrees is also evaluated.

Probability distribution

Fig 26 shows the probability distribution for indegrees and Fig 27 the probability distribution for outdegrees. The distribution of a single network is not displayed; a mean over several networks of the same type is displayed instead. For each of the four types of ER-networks fifty networks are generated, and these sets of fifty networks are denoted ER-1000, ER-2000, ER-4000 and ER-8000.
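The averaging described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, every edge is assumed to pick its target vertex uniformly at random (so multiple edges are allowed), and a smaller or larger network size can be substituted freely.

```python
import random
from collections import Counter

def random_indegrees(n_vertices, n_edges, rng):
    """Indegrees of one random directed network.

    Assumption: each edge picks its target vertex uniformly at random.
    """
    counts = [0] * n_vertices
    for _ in range(n_edges):
        counts[rng.randrange(n_vertices)] += 1
    return counts

def degree_distribution(degrees):
    """Fraction of vertices having each degree value."""
    tally = Counter(degrees)
    return {k: tally[k] / len(degrees) for k in tally}

def mean_distribution(distributions):
    """Average several distributions; a degree missing from a network counts as 0."""
    keys = sorted(set().union(*distributions))
    return {k: sum(d.get(k, 0.0) for d in distributions) / len(distributions)
            for k in keys}

rng = random.Random(1)
# Fifty networks with 1000 vertices and 2000 edges, as for ER-2000.
networks = [random_indegrees(1000, 2000, rng) for _ in range(50)]
mean_dist = mean_distribution([degree_distribution(d) for d in networks])
mean_indegree = sum(k * p for k, p in mean_dist.items())
```

The mean indegree recovered from the averaged distribution equals the number of edges divided by the number of vertices, which gives a quick consistency check.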

[Figure: Probability distribution for indegrees. x-axis: Indegrees (inputs), 0 to 23; y-axis: Probability, 0 to 0.4; curves: ER-1000, ER-2000, ER-4000, ER-8000.]

Fig 26: The figure shows the mean probability distribution for the indegrees for ER-1000, ER-2000, ER-4000 and ER-8000.

[Figure: Probability distribution for outdegrees. x-axis: Outdegrees (outputs), 0 to 26; y-axis: Probability, 0 to 0.4; curves: ER-1000, ER-2000, ER-4000, ER-8000.]

Fig 27: The figure shows the mean probability distribution for the outdegrees for ER-1000, ER-2000, ER-4000 and ER-8000.

Derrida plots

Derrida plots for the networks ER-1000, ER-2000, ER-4000 and ER-8000 are displayed. Each of these networks generates two curves, one for each of the two types of rules applied. The applied rules are, as mentioned before, nested canalizing and flat distributed rules. The Derrida plots are displayed in two different graphs, one with the curves for flat distributed rules and another with the curves for nested canalizing rules. Two graphs are used because a single graph becomes a bit “messy” with all the curves in it. The networks with flat distributed rules are denoted BFD(ER-1000), BFD(ER-2000), BFD(ER-4000) and BFD(ER-8000). The symbol BFD(ER-X000) does not denote one set of Boolean rules but fifty sets, because ER-X000 is the notation for fifty networks. For nested canalizing rules the notation is BNC(ER-1000), BNC(ER-2000), BNC(ER-4000) and BNC(ER-8000). As for the flat distributed rules, BNC(ER-X000) denotes fifty sets of rules, not one. The Derrida plots are based on means over fifty networks of the same type and fifty initial states per network. For example, the curve denoted BFD(ER-1000) in Fig 28 is based on the fifty networks denoted ER-1000, each with a set of flat distributed rules attached, and fifty initial states for every network. Every curve in the Derrida plots is based on fifty points that represent fifty different Hamming distances between the initial states.
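One point of a Derrida curve can be sketched as below. The sketch assumes synchronous updating, random wiring without repeated inputs, and flat distributed rules drawn as uniform random truth tables; all function names are hypothetical, and the network is much smaller than the ER-networks in the text to keep the tables small.

```python
import random

def random_network(n, n_edges, rng):
    """Random wiring plus one uniformly drawn truth table per vertex."""
    inputs = [[] for _ in range(n)]
    added = 0
    while added < n_edges:
        target, source = rng.randrange(n), rng.randrange(n)
        if source not in inputs[target]:  # no repeated input on a vertex
            inputs[target].append(source)
            added += 1
    tables = [[rng.randrange(2) for _ in range(2 ** len(inp))] for inp in inputs]
    return inputs, tables

def step(state, inputs, tables):
    """Synchronous update: each vertex looks up its table with its input values."""
    new_state = []
    for inp, table in zip(inputs, tables):
        index = 0
        for source in inp:
            index = (index << 1) | state[source]
        new_state.append(table[index])
    return new_state

def hamming(a, b):
    """Number of positions in which two states differ."""
    return sum(x != y for x, y in zip(a, b))

def derrida_point(inputs, tables, d, n_starts, rng):
    """Mean distance d(T+1) after one step for pairs of states at distance d(T) = d."""
    n = len(inputs)
    total = 0
    for _ in range(n_starts):
        state = [rng.randrange(2) for _ in range(n)]
        perturbed = list(state)
        for i in rng.sample(range(n), d):  # flip exactly d distinct vertices
            perturbed[i] ^= 1
        total += hamming(step(state, inputs, tables),
                         step(perturbed, inputs, tables))
    return total / n_starts

rng = random.Random(2)
inputs, tables = random_network(100, 400, rng)
curve = [(d, derrida_point(inputs, tables, d, 50, rng)) for d in (1, 5, 10, 20)]
```

A curve that stays below the diagonal d(T+1) = d(T) for small d(T) indicates the ordered regime, while a curve above it indicates the chaotic regime.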

[Figure: Derrida plots for the ER-nets with flat distributed rules. x-axis: d(T), 0 to 1000; y-axis: d(T+1), 0 to 1000; curves: diag, BFD(ER-1000), BFD(ER-2000), BFD(ER-4000), BFD(ER-8000).]

Fig 28: The figure shows the Derrida plots for the four ER-networks with flat distributed rules.

[Figure: Derrida plots for the ER-networks with nested canalizing rules. x-axis: d(T), 0 to 1000; y-axis: d(T+1), 0 to 1000; curves: diag, BNC(ER-1000), BNC(ER-2000), BNC(ER-4000), BNC(ER-8000).]

Fig 29: The figure shows the Derrida plots for the four ER-networks with nested canalizing rules.


Noise analysis tool

The graphs are based on means over the sets of rules. For every set of rules fifty starts are made, and in every start fifty time steps are taken. One noise level, 0.001, is applied.
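A single start of such a measurement could look like the sketch below. It is an illustration only: State noise is assumed here to flip each vertex state independently with the given probability after every update, which may differ from the definition used by the tool, and Delay noise and Transfer noise are not covered. All function names are hypothetical.

```python
import random

def random_network(n, n_edges, rng):
    """Random wiring plus one uniformly drawn truth table per vertex."""
    inputs = [[] for _ in range(n)]
    added = 0
    while added < n_edges:
        target, source = rng.randrange(n), rng.randrange(n)
        if source not in inputs[target]:
            inputs[target].append(source)
            added += 1
    tables = [[rng.randrange(2) for _ in range(2 ** len(inp))] for inp in inputs]
    return inputs, tables

def step(state, inputs, tables):
    """Synchronous update of all vertices."""
    new_state = []
    for inp, table in zip(inputs, tables):
        index = 0
        for source in inp:
            index = (index << 1) | state[source]
        new_state.append(table[index])
    return new_state

def state_noise_run(inputs, tables, p, n_steps, rng):
    """Hamming distance per time step between a noisy and a noise-free run.

    Assumption: after each update every vertex of the noisy copy is
    flipped independently with probability p.
    """
    clean = [rng.randrange(2) for _ in range(len(inputs))]
    noisy = list(clean)  # both copies start from the same state
    distances = []
    for _ in range(n_steps):
        clean = step(clean, inputs, tables)
        noisy = step(noisy, inputs, tables)
        noisy = [s ^ 1 if rng.random() < p else s for s in noisy]
        distances.append(sum(a != b for a, b in zip(clean, noisy)))
    return distances

rng = random.Random(3)
inputs, tables = random_network(100, 100, rng)
distances = state_noise_run(inputs, tables, 0.001, 50, rng)
```

Averaging such distance series over many starts and many sets of rules would give curves of the kind shown in the figures that follow.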

[Figure: State noise 0.001 for flat distributed rules. x-axis: Time step, 0 to 50; y-axis: Hamming distance, 0 to 600; curves: BFD(ER-1000), BFD(ER-2000), BFD(ER-4000), BFD(ER-8000).]

Fig 30: The figure shows State noise at a noise level 0.001 for the flat distributed rules. BFD(ER-1000) is not visible because it lies entirely on the x-axis.

[Figure: State noise 0.001 for nested canalizing rules. x-axis: Time step, 0 to 50; y-axis: Hamming distance, 0 to 45; curves: BNC(ER-1000), BNC(ER-2000), BNC(ER-4000), BNC(ER-8000).]

Fig 31: The figure shows State noise at a noise level 0.001 for the nested canalizing rules.

[Figure: Delay noise 0.001 for flat distributed rules. x-axis: Time step, 0 to 50; y-axis: Hamming distance, 0 to 500.]

Fig 32: The figure shows Delay noise at a noise level 0.001 for the flat distributed rules.

[Figure: Delay noise 0.001 for nested canalizing rules. x-axis: Time step, 0 to 50; y-axis: Hamming distance, 0 to 0.6.]


Fig 33: The figure shows Delay noise at a noise level 0.001 for the nested canalizing rules

[Figure: Transfer noise 0.001 for flat distributed rules. x-axis: Time step, 0 to 50; y-axis: Hamming distance, 0 to 500.]


Fig 34: The figure shows Transfer noise at a noise level 0.001 for the flat distributed rules.

[Figure: Transfer noise 0.001 for nested canalizing rules. x-axis: Time step, 0 to 50; y-axis: Hamming distance, 0 to 1.6.]


Fig 35: The figure shows Transfer noise at a noise level 0.001 for nested canalizing rules.

[Figure: All noise 0.001 for flat distributed rules. x-axis: Time step, 0 to 50; y-axis: Hamming distance, 0 to 600.]


Fig 36: The figure shows All noise at a noise level 0.001 for the flat distributed rules.

[Figure: All noise 0.001 for nested canalizing rules. x-axis: Time step, 0 to 50; y-axis: Hamming distance, 0 to 45.]


Fig 37: The figure shows All noise at a noise level 0.001 for the nested canalizing rules.


Mean probability for canalizing rules

The mean probability for canalizing rules is calculated for ER-1000, ER-2000, ER-4000 and ER-8000. Vertices with no indegrees (inputs) are not taken into account when the mean probability for canalizing rules is calculated. The standard error is based on the differences between networks within a type of network.

Tab 3: The table shows the mean probability for canalizing rules for the different types of networks.

Network    Mean probability for canalizing rules    Standard error
ER-1000    0.883467                                 0.00568397
ER-2000    0.689399                                 0.00668482
ER-4000    0.308768                                 0.00706961
ER-8000    0.0286581                                0.00373516
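The quantity in Tab 3 can be illustrated with a small brute-force sketch. It assumes that a rule counts as canalizing when fixing one input to one value forces the output (constant rules included), and that flat distributed rules weight all truth tables equally. The function names are hypothetical, and rules with more than four inputs are approximated as never canalizing, since enumerating them is impractical and the fraction already falls to about 0.05 at four inputs.

```python
from functools import lru_cache
from math import exp, factorial

@lru_cache(maxsize=None)
def canalizing_fraction(k):
    """Fraction of all Boolean rules with k inputs that are canalizing."""
    rows = 2 ** k
    masks = []
    for i in range(k):
        for v in (0, 1):
            mask = 0
            for row in range(rows):
                if ((row >> i) & 1) == v:  # rows where input i has value v
                    mask |= 1 << row
            masks.append(mask)
    count = 0
    for f in range(1 << rows):  # bit `row` of f is the output for that row
        # canalizing: the output is constant on all rows selected by some mask
        if any((f & m) == 0 or (f & m) == m for m in masks):
            count += 1
    return count / (1 << rows)

def mean_canalizing_probability(degree_probs, max_exact=4):
    """Mean over an indegree distribution {k: probability}, ignoring k = 0."""
    weights = {k: p for k, p in degree_probs.items() if k >= 1}
    norm = sum(weights.values())
    total = 0.0
    for k, p in weights.items():
        # Approximation: rules with more than max_exact inputs count as 0.
        total += p * (canalizing_fraction(k) if k <= max_exact else 0.0)
    return total / norm

# Assumed Poisson indegree distribution with mean two, as a stand-in for ER-2000.
poisson = {k: exp(-2.0) * 2.0 ** k / factorial(k) for k in range(31)}
estimate = mean_canalizing_probability(poisson)
```

Under these assumptions the estimate for a mean indegree of two comes out close to 0.69, in line with the ER-2000 row of Tab 3.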

Discussion and analysis

A short description and interpretation of the results from the Derrida plots, in Fig 28 and Fig 29, is given first. The first thing to notice is the difference between the flat distributed rules and the nested canalizing rules: with a lower number of edges the robustness increases for flat distributed rules but decreases for nested canalizing rules. Another thing to notice is that with nested canalizing rules all of the tested networks are robust according to the Derrida plots, since none of the curves in Fig 29 crosses the diagonal. This result is consistent with the article [23], which claims that networks with nested canalizing rules are always stable. It is also worth noting that for flat distributed rules the networks ER-1000 and ER-2000 are robust according to the Derrida plots, but ER-4000 and ER-8000 are not. This is consistent with the article [8], which claims that a critical value of the mean connectivity exists and that values above it result in a network that lies in the chaotic regime. The critical value is two; multiplying by the 1000 vertices gives 2000 edges, so the network ER-2000 lies on the boundary between the ordered and the chaotic regime.

An interpretation of the results from the Noise analysis tool follows. The first noise type applied was State noise, and the results are shown in Fig 30 and Fig 31. The curves for the flat distributed rules are shown in Fig 30, where one can see a distinct difference between the networks that lie above the critical value and those that lie below it. For the networks below the critical value the State noise has almost no effect, but for the networks above it the State noise causes very large Hamming distances. The large Hamming distances are a good indicator that these networks are in the chaotic regime. The order follows the number of edges: BFD(ER-1000), BFD(ER-2000), BFD(ER-4000) and BFD(ER-8000), where BFD(ER-1000) is the most resistant to State noise and BFD(ER-8000) the least resistant. The curves with nested canalizing rules are displayed in Fig 31. The least State noise resistant type of network in Fig 31, BNC(ER-1000), shows a much higher resistance than the two least noise resistant networks in Fig 30. This is consistent with the claim that all networks with nested canalizing rules are stable. The order for the nested canalizing rules is also opposite to the order for the flat distributed rules: BNC(ER-8000), BNC(ER-4000), BNC(ER-2000) and BNC(ER-1000), where BNC(ER-8000) is the most resistant to State noise and BNC(ER-1000) the least resistant.

The second noise type applied was Delay noise, and the results are displayed in Fig 32 and Fig 33. One interesting property of these graphs is the difference between the curves that are the least resistant to Delay noise. The Hamming distance for the least noise resistant curve in Fig 32, which displays the curves with flat distributed rules, is very large compared with that of the least noise resistant curve in Fig 33, which shows the curves with nested canalizing rules. The difference between BFD(ER-1000) and BFD(ER-2000) is negligible, but both lie substantially lower than the networks above the critical value for the mean connectivity, BFD(ER-4000) and BFD(ER-8000). The values for the nested canalizing rules are so low that it is not interesting to examine their order.

The third noise type applied was Transfer noise. The results are displayed in Fig 34 and Fig 35. The same trends that were observed for the other noise types can also be observed for Transfer noise. For example, the least Transfer noise resistant network with flat distributed rules is much more sensitive to Transfer noise than the least resistant network with nested canalizing rules. The order for the curves with flat distributed rules is also the same as for the other noise types: BFD(ER-1000), BFD(ER-2000), BFD(ER-4000) and BFD(ER-8000), where BFD(ER-1000) is the most resistant to Transfer noise and BFD(ER-8000) the least resistant. The order is opposite for the nested canalizing rules: BNC(ER-8000), BNC(ER-4000), BNC(ER-2000) and BNC(ER-1000), where BNC(ER-8000) is the most resistant to Transfer noise and BNC(ER-1000) the least resistant.

In the last setting for the Noise analysis tool the three noise types were applied at the same time. The results are shown in Fig 36 and Fig 37, and the graphs have a similar appearance to those for State noise. The order for the flat distributed rules, in Fig 36, is BFD(ER-1000), BFD(ER-2000), BFD(ER-4000) and BFD(ER-8000), where BFD(ER-1000) is the most resistant to noise and BFD(ER-8000) the least resistant. The order is opposite for the nested canalizing rules: BNC(ER-8000), BNC(ER-4000), BNC(ER-2000) and BNC(ER-1000), where BNC(ER-8000) is the most resistant to noise and BNC(ER-1000) the least resistant.

The result of the last test is presented in Tab 3, and it is consistent with the results from the other tests performed on the networks with flat distributed rules. In other words the order is the same: BFD(ER-1000), BFD(ER-2000), BFD(ER-4000) and BFD(ER-8000), where BFD(ER-1000) has the highest mean probability for canalizing rules and BFD(ER-8000) the lowest. The result is also consistent with the probability distribution for indegrees (inputs): the network types that have a high fraction of vertices with few indegrees also have a high mean probability for canalizing rules. It is also worth noting that the mean probability for canalizing rules, for ER-networks, is approximately 0.69 when the mean number of indegrees is two.

The most important conclusions that can be drawn are summarized here. Robustness, for ER-networks, decreases with a higher number of edges for flat distributed rules and increases for nested canalizing rules. There is a connection between the robustness and the mean probability for canalizing rules. All networks with nested canalizing rules are robust.


Simulations on the Fang-net

The network simulated under this topic is taken from the article [24]. The Fang-net is one of the proposed models for Saccharomyces cerevisiae, also known as budding yeast. The Fang-net has eleven vertices and 34 edges. The rules that govern the dynamics of the network in the article are not Boolean but based on thresholds. Under this topic the same network, the Fang-net, is simulated with two types of Boolean rules: flat distributed rules and nested canalizing rules. Besides the Fang-net, two other types of networks are simulated and used for comparison. These are ER-networks and Scale-free networks with the same number of vertices and edges as the Fang-net. The tools used for evaluating the robustness are the Derrida plots, the noise analysis tool and the mean probability for canalizing rules. The probability distribution for both indegrees and outdegrees is also evaluated.

Probability distribution

Graphs of the probability distribution for the Fang-net, the ER-networks and the Scale-free networks are displayed. The probability distributions for both the indegrees and the outdegrees are displayed for all the networks. For the Fang-net the actual probability distribution is shown, but for the ER-networks and Scale-free networks the mean probability distribution is displayed, because fifty networks of each type are used. The fifty ER-networks are denoted NetER-34 and the fifty Scale-free networks are denoted NetSF-34.

[Figure: Probability distribution for the Fang-net. x-axis: Number of indegrees and outdegrees, 0 to 10; y-axis: Probability, 0 to 0.3; series: indegrees, outdegrees.]

Fig 38: The graph shows the probability distribution for the indegrees and outdegrees in the Fang-network.

[Figure: Mean probability distribution for the ER-networks. x-axis: Number of indegrees/outdegrees, 0 to 10; y-axis: Probability, 0 to 0.35; series: indegrees, outdegrees.]

Fig 39: The graph shows the mean probability distribution for the indegrees and outdegrees in NetER-34.

[Figure: Mean probability distribution for the Scale-free networks. x-axis: 0 to 10; y-axis: Probability, 0 to 0.35; series: indegrees, outdegrees.]

Fig 40: The graph shows the mean probability distribution for the indegrees and outdegrees in NetSF-34.


Derrida plots

Derrida plots based on the networks Fang-net, NetER-34 and NetSF-34 are displayed. Each of these networks generates two curves, one for each of the two types of rules applied: flat distributed and nested canalizing rules. Fifty sets of rules of each type are generated for the Fang-net. The fifty sets of flat distributed rules for the Fang-net are denoted B50-FD(Fang), and B50-NC(Fang) is the notation for the fifty sets of nested canalizing rules. For NetER-34 and NetSF-34 the situation is a bit different. Instead of assigning fifty sets of rules per network, only one set of rules of each type is assigned per network. This gives fifty sets of rules each for NetER-34 and NetSF-34, because each consists of fifty networks. The sets of flat distributed rules for NetER-34 and NetSF-34 are denoted BFD(NetER-34) and BFD(NetSF-34). The sets of nested canalizing rules for NetER-34 and NetSF-34 are denoted BNC(NetER-34) and BNC(NetSF-34). The Derrida plots are based on a mean over fifty sets of rules and 1000 initial states. The error bars in the graphs correspond to one standard error. The graphs are based on eleven test points, each corresponding to a unique Hamming distance.
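The error bars can be illustrated with the usual standard error of the mean. This is a sketch under an assumption: the per-network (or per-rule-set) means are treated as independent samples, and the standard error is the sample standard deviation divided by the square root of the number of samples. The function name and the example values are hypothetical.

```python
from math import sqrt

def mean_and_standard_error(values):
    """Mean of the samples and the standard error of that mean."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance with the n - 1 denominator.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, sqrt(variance / n)

# Hypothetical d(T+1) values measured for four sets of rules at one test point.
mean, standard_error = mean_and_standard_error([1.0, 2.0, 3.0, 4.0])
```

With fifty sets of rules per curve, one such pair of values would be computed for each of the eleven test points.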

[Figure: Derrida plots for the flat distributed rules. x-axis: d(T), 0 to 12; y-axis: d(T+1), 0 to 12; curves: diag, B50-FD(Fang), BFD(NetER-34), BFD(NetSF-34).]

Fig 41: The graph shows the Derrida plots for flat distributed rules for B50-FD(Fang), BFD(NetER-34) and BFD(NetSF-34).

[Figure: Derrida plots for nested canalizing rules. x-axis: D(T), 0 to 12; y-axis: D(T+1), 0 to 12; curves: diag, B50-NC(Fang), BNC(NetER-34), BNC(NetSF-34).]

Fig 42: The graph shows the Derrida plots for nested canalizing rules for B50-NC(Fang), BNC(NetER-34) and BNC(NetSF-34).

Noise analysis tool

The graphs are based on means over the sets of rules, and for every set of rules there are fifty starts. For every start fifty time steps are taken. Two noise levels are applied: 0.01 and 0.1.

[Figure: State noise 0.01. x-axis: Time steps, 0 to 60; y-axis: Hamming distance, 0 to 3.5; curves: B50-FD(Fang), B50-NC(Fang), BFD(NetER-34), BNC(NetER-34), BFD(NetSF-34), BNC(NetSF-34).]

Fig 43: The figure shows the result for the noise analysis tool with State noise at a 0.01 probability.

[Figure: State noise 0.1. x-axis: Time step, 0 to 60; y-axis: Hamming distance, 0 to 5.]

[Figure: Delay noise 0.01. x-axis: Time step, 0 to 60; y-axis: Hamming distance, 0 to 2.]


Fig 45: The figure shows the result for the noise analysis tool with Delay noise at a 0.01 probability.

[Figure: Delay noise 0.1. x-axis: Time step, 0 to 60; y-axis: Hamming distance, 0 to 3.5.]

[Figure: Transfer noise 0.01. x-axis: Time step, 0 to 60; y-axis: Hamming distance, 0 to 3.5.]


Fig 47: The figure shows the result for the noise analysis tool with Transfer noise at a 0.01 probability.

[Figure: Transfer noise 0.1. x-axis: Time step, 0 to 60; y-axis: Hamming distance, 0 to 4.5.]

[Figure: All noise 0.01. x-axis: Time step, 0 to 60; y-axis: Hamming distance, 0 to 4.]


Fig 49: The figure shows the result for the noise analysis tool with All noise at a 0.01 probability.

[Figure: All noise 0.1. x-axis: Time step, 0 to 60; y-axis: Hamming distance, 0 to 5.]



Mean probability for canalizing rules

The mean probability for canalizing rules is calculated for the Fang-net, NetER-34 and NetSF-34. Vertices with no indegrees (inputs) are not taken into account when the mean probability for canalizing rules is calculated. The standard error is based on the differences between networks within a type of network, and because of this the Fang-net has no standard error.

Tab 4: The table shows the mean probability for canalizing rules for the different types of networks.

Network     Mean probability for canalizing rules    Standard error
Fang-net    0.46719                                  0
NetER-34    0.448868                                 0.0555706
NetSF-34    0.570595                                 0.0697515

Analysis and discussion

A brief summary and interpretation of the results is given here, starting with the Derrida plots in Fig 41 and Fig 42. The most important feature to observe is that all the curves for flat distributed rules exceed the diagonal (see Fig 41). The Derrida plots thus indicate that the Fang-net, NetER-34 and NetSF-34 are situated in the chaotic regime when flat distributed rules are assigned. In other words the networks are not robust. With nested canalizing rules the three networks are robust (see the Derrida plot in Fig 42). This conclusion can be drawn because the curves do not exceed the diagonal. The order for the curves in Fig 41 is BFD(NetER-34), B50-FD(Fang) and BFD(NetSF-34), where BFD(NetER-34) is the most chaotic and BFD(NetSF-34) the least chaotic. For the nested canalizing rules, in Fig 42, the order is BNC(NetSF-34), B50-NC(Fang) and BNC(NetER-34), where BNC(NetSF-34) is the least robust and BNC(NetER-34) the most robust, but there is very little difference between B50-NC(Fang) and BNC(NetER-34). The standard error is very large for the curves in Fig 41 and Fig 42: for every type of network, the error bars include the means of the other network types. It is therefore impossible to say that all possible networks within one network type are more or less robust than a network of another type, but it is possible to claim that a network of one type is more likely to be more or less robust than a network of another type. So when something is referred to as robust or chaotic in the text above, it means that more networks within this type have the property than networks within other types.

The second tool applied was the Noise analysis tool. The results are interpreted for every noise type setting. The first noise type is State noise, and the results are shown in Fig 43 and Fig 44. The first observation is that the State noise has a fairly large effect for the flat distributed rules. Another is that the curves for the nested canalizing rules have a much lower Hamming distance, for both noise levels, than the curves with flat distributed rules. The difference between curves with the same type of Boolean rules is relatively small, but it is still interesting to observe the order. For the flat distributed rules, for both noise levels, B50-FD(Fang) and BFD(NetER-34) lie at the same level and BFD(NetSF-34) at a lower level. The order for the nested canalizing rules cannot be determined, because the curves are very close and the order differs between the two noise levels.

The effect of Delay noise is somewhat smaller than that of State noise, but it is too large to be neglected. One property it shares with State noise is that the curves with nested canalizing rules have substantially lower Hamming distances than the curves with flat distributed rules (see Fig 45 and Fig 46). The order for the flat distributed rules, for both noise levels, is B50-FD(Fang), BFD(NetER-34) and BFD(NetSF-34).
B50-FD(Fang) shows the least noise resistance and BFD(NetSF-34) has the best Delay noise resistance, but it is worth noting that the differences between the curves are relatively small.

The third noise type applied was Transfer noise, and it has almost as large an effect as State noise. The curves for the nested canalizing rules again have lower Hamming distances than the curves for the flat distributed rules (see Fig 47 and Fig 48). It is also interesting to observe the order of the curves. The order for the curves with flat distributed rules is a bit hard to determine, because B50-FD(Fang) and BFD(NetER-34) are situated at the same level for the noise level 0.1. At the noise level 0.01 it is possible to see that B50-FD(Fang) has a higher Hamming distance than BFD(NetER-34). In both Fig 47 and Fig 48 it is clear that BFD(NetSF-34) is the most Transfer noise resistant among the flat distributed rules, because it has a lower Hamming distance than B50-FD(Fang) and BFD(NetER-34). For the nested canalizing rules the order is B50-NC(Fang), BNC(NetSF-34) and BNC(NetER-34), where B50-NC(Fang) has the least resistance to Transfer noise.

For the fourth setting, with all noise types turned on, one can observe the same phenomenon as for the other noise types: the nested canalizing rules result in a lower Hamming distance than the flat distributed rules (see Fig 49 and Fig 50). The order for the flat distributed rules differs a bit between the graphs in Fig 49 and Fig 50, but in both graphs BFD(NetSF-34) is the most resistant to noise among the flat distributed rules. For the nested canalizing rules the order also differs between Fig 49 and Fig 50, but BNC(NetER-34) shows the highest noise resistance in both graphs.

The last test performed was the mean probability for canalizing rules, and the order for the networks is NetER-34, Fang-net and NetSF-34 (see Tab 4). NetER-34 has the lowest mean probability for canalizing rules and NetSF-34 the highest. It is also worth noting that NetER-34 and the Fang-net lie very close, within the standard error. The result is consistent with the probability distributions in Fig 38, Fig 39 and Fig 40. NetSF-34 has the highest probability for vertices with only one indegree (input), and a rule with one input is always canalizing. The Fang-net and NetER-34 have a lower probability for vertices with only one indegree and therefore also a lower mean probability for canalizing rules.

Some conclusions can be drawn from the generated data. The first is that the Fang-net is in the chaotic regime for flat distributed rules; in other words it is sensitive to noise. This contradicts the claim in [24] that the network should be very robust. As mentioned before, Boolean rules are not used in [24]; threshold-based rules are used instead. The results from the simulations on the Fang-net with flat distributed rules are consistent with the claim in (Derrida, 1986) that networks with a mean connectivity larger than two should be in the chaotic regime. The mean connectivity for the Fang-net is approximately 3.1. The second conclusion is that the Fang-net is robust when nested canalizing rules are applied. It is also interesting to observe the connection between the higher mean probability for canalizing rules for NetSF-34 and the higher noise resistance shown for BFD(NetSF-34).
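The comparison with the critical mean connectivity can be made explicit with a few lines of arithmetic. The sketch below uses the vertex and edge counts given in the text; the function names and the regime labels are illustrative, and the classification is only meant for flat distributed rules.

```python
CRITICAL_CONNECTIVITY = 2.0  # critical mean connectivity quoted in the text

# (vertices, edges) as stated for each network
networks = {
    "ER-1000": (1000, 1000),
    "ER-2000": (1000, 2000),
    "ER-4000": (1000, 4000),
    "ER-8000": (1000, 8000),
    "Fang-net": (11, 34),
}

def mean_connectivity(vertices, edges):
    """Mean number of inputs per vertex."""
    return edges / vertices

def expected_regime(k):
    """Regime expected for flat distributed rules at mean connectivity k."""
    if k < CRITICAL_CONNECTIVITY:
        return "ordered"
    if k == CRITICAL_CONNECTIVITY:
        return "boundary"
    return "chaotic"

summary = {}
for name, (vertices, edges) in networks.items():
    k = mean_connectivity(vertices, edges)
    summary[name] = (k, expected_regime(k))
```

For the Fang-net this gives 34 / 11, roughly 3.1, which lies above the critical value and matches the chaotic behaviour seen with flat distributed rules.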


Simulations on the Lee-net

The network studied under this topic is the Lee-net with self-feedback, which is one of the proposed regulatory networks for Saccharomyces cerevisiae. The Lee-net is taken from [14] and is based on the networks from [25], but the vertices with no outputs have been removed. The Lee-net has 30 vertices and 56 edges. Several ER-networks and Scale-free networks, with 56 edges and 30 vertices, are used for comparison. Several methods are used for determining the robustness of the Lee-net: Derrida plots, the Noise analysis tool and the mean probability for canalizing rules. Every type of network is evaluated with both flat distributed rules and nested canalizing rules. Flat distributed rules are rules that are drawn from all possible rules with the same probability. The probability distribution for both outdegrees and indegrees is also evaluated.
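Drawing a flat distributed rule can be sketched as follows. Choosing every entry of the truth table independently and uniformly gives each of the possible rules with k inputs the same probability. The function names and the bit ordering of the table index are assumptions made for this illustration.

```python
import random

def draw_flat_rule(k, rng):
    """Truth table of one rule with k inputs, all rules equally likely."""
    return [rng.randrange(2) for _ in range(2 ** k)]

def evaluate(rule, input_values):
    """Look up the output; the first input is the most significant index bit."""
    index = 0
    for bit in input_values:
        index = (index << 1) | bit
    return rule[index]

rng = random.Random(4)
rule = draw_flat_rule(3, rng)      # one random rule with three inputs
output = evaluate(rule, [1, 0, 1])

xor_rule = [0, 1, 1, 0]            # a fixed two-input table for comparison
xor_output = evaluate(xor_rule, [1, 0])
```

A vertex with no inputs gets a table with a single entry, so it simply keeps a constant value.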

Probability distribution

The graphs presented under this topic show the probability distribution for the indegrees and outdegrees in the Lee-net, the ER-networks and the Scale-free networks. Because it is impractical to display the probability distribution for all ER-networks and Scale-free networks, a mean is calculated and shown instead. For the Lee-net the actual probability distribution is shown. The means are based on fifty different, randomly generated networks of each type. The fifty ER-networks are denoted NetER-56 and the fifty Scale-free networks are denoted NetSF-56.

[Figure: Probability distributions for the Lee-net. x-axis: indegrees/outdegrees, ticks 0 to 6 and 11; y-axis: Probability, 0 to 0.5; series: indegrees, outdegrees.]

Fig 51: The graph shows the probability distribution for the indegrees and outdegrees in the Lee-network.

[Figure: Probability distributions for the ER-networks. x-axis: ticks 0 to 6 and 8; y-axis: Probability, 0 to 0.35; series: indegrees, outdegrees.]

Fig 52: The graph shows the mean probability distribution for the indegrees and outdegrees in NetER-56.

[Figure: Probability distributions for the Scale-free networks. x-axis: ticks 1 to 16; y-axis: Probability, 0 to 0.7; series: indegrees, outdegrees.]

Fig 53: The graph shows the mean probability distribution for the indegrees and outdegrees in NetSF-56.


Derrida plots

Derrida plots for the Lee-net, NetER-56 and NetSF-56 are displayed. Two types of rules, nested canalizing and flat distributed rules, are tested for every network. For the Lee-net fifty sets of rules of each type are assigned. The fifty sets of flat distributed rules for the Lee-net are denoted B50-FD(Lee) and the fifty sets of nested canalizing rules are denoted B50-NC(Lee). The assignment of rules for NetER-56 and NetSF-56 differs from that for the Lee-net. Instead of assigning fifty sets of rules per network, only one set of rules of each type is assigned per network. This results in the same number of sets of rules as for the Lee-net, because NetER-56 and NetSF-56 consist of fifty networks each. The sets of flat distributed rules and the sets of nested canalizing rules for the ER-networks are denoted BFD(NetER-56) and BNC(NetER-56) respectively. The sets of flat distributed rules for the Scale-free networks are denoted BFD(NetSF-56), and the sets of nested canalizing rules are denoted BNC(NetSF-56). To summarize, fifty sets of rules of each type are assigned to the Lee-net, while for NetER-56 and NetSF-56 one set of rules of each type is assigned per network. The Derrida plots are based on a mean over fifty sets of rules and 1000 initial states. The error bars in the graphs correspond to one standard error. The graphs are based on thirty test points, that is, on thirty different Hamming distances.

[Figure: Derrida plots for flat distributed rules. x-axis: D(T), 0 to 35; y-axis: D(T+1), 0 to 35; curves: diag, B50-FD(Lee), BFD(NetER-56), BFD(NetSF-56).]

Fig 54: The graph shows the Derrida plots for flat distributed rules for B50-FD(Lee), BFD(NetER-56) and BFD(NetSF-56).

[Figure: x-axis: d(T), 0 to 35; y-axis: d(T+1), 0 to 35; curves: diag, B50-NC(Lee), BNC(NetER-56), BNC(NetSF-56).]

Fig 55: The graph shows the Derrida plots for nested canalizing rules for B50-NC(Lee), BNC(NetER-56) and BNC(NetSF-56).

Noise analysis tool

The noise analysis tool is applied to B50-FD(Lee), B50-NC(Lee), BFD(NetER-56), BFD(NetSF-56), BNC(NetER-56) and BNC(NetSF-56). The graphs are based on means over the sets of rules, and for every set of rules fifty starts are performed. In every start fifty time steps are taken. Two noise levels are applied: 0.001 and 0.01.

[Figure: State noise 0.001. x-axis: Time step, 0 to 50; y-axis: Hamming distance, 0 to 2.5; curves: B50-FD(Lee), B50-NC(Lee), BFD(NetER-56), BNC(NetER-56), BFD(NetSF-56), BNC(NetSF-56).]

[Figure: State noise 0.01. x-axis: Time step, 0 to 50; y-axis: Hamming distance, 0 to 9.]

[Figure: Delay noise 0.001. x-axis: Time step, 0 to 50; y-axis: Hamming distance, 0 to 0.35.]

[Figure: Delay noise 0.01. x-axis: Time step, 0 to 50; y-axis: Hamming distance, 0 to 1.8.]

[Figure: Transfer noise 0.001. x-axis: Time step, 0 to 50; y-axis: Hamming distance, 0 to 1.2.]

[Figure: Transfer noise 0.01. x-axis: Time step, 0 to 50; y-axis: Hamming distances, 0 to 3.5; curves: B50-FD(Lee), B50-NC(Lee), BFD(NetER-56), BNC(NetER-56), BFD(NetSF-56), BNC(NetSF-56).]

[Figure: All noise 0.001. x-axis: Time step, 0 to 50; y-axis: Hamming distance, 0 to 3.]

[Figure: All noise 0.01. x-axis: Time step, 0 to 50; y-axis: Hamming distances, 0 to 10; curves: B50-FD(Lee), B50-NC(Lee), BFD(NetER-56), BNC(NetER-56), BFD(NetSF-56), BNC(NetSF-56).]


Mean probability for canalizing rules

The mean probability for canalizing rules is calculated for the Lee-net, NetER-56 and NetSF-56. Vertices with no indegrees (inputs) are not taken into account in the calculation. The standard error is based on the differences between the networks within a network type, and because the Lee-net is a single network it has no standard error.

Tab 5: The table shows the mean probability for canalizing rules for the different types of networks.

Network     Mean probability for canalizing rules    Standard error
Lee         0.592379                                 0
NetER-56    0.723275                                 0.0325901
NetSF-56    0.823927                                 0.0205114
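How a mean probability for canalizing rules follows from the indegrees can be illustrated by brute force. The sketch below assumes the common definition (a rule is canalizing if fixing some single input to some value fixes the output, which also counts constant rules) and assumes rules drawn uniformly from all Boolean rules; the thesis may use a different convention, and the function names are hypothetical. Enumeration is only practical for small indegrees, roughly k up to 4.

```python
from itertools import product

def is_canalizing(table, k):
    """table[index] is the output when the k input bits spell index (first input = highest bit)."""
    for i in range(k):
        for a in (0, 1):
            outputs = {table[idx] for idx in range(2 ** k)
                       if (idx >> (k - 1 - i)) & 1 == a}
            if len(outputs) == 1:  # input i at value a forces the output
                return True
    return False

def canalizing_probability(k):
    """Fraction of all Boolean rules with k inputs that are canalizing."""
    hits = sum(is_canalizing(t, k) for t in product((0, 1), repeat=2 ** k))
    return hits / 2 ** (2 ** k)

def mean_canalizing_probability(indegrees):
    """Mean over all vertices with at least one input; vertices without inputs are skipped."""
    used = [k for k in indegrees if k > 0]
    prob = {k: canalizing_probability(k) for k in set(used)}
    return sum(prob[k] for k in used) / len(used)

# Hypothetical indegree sequence of a five-vertex network.
example = mean_canalizing_probability([0, 1, 2, 2, 3])
```

Under these assumptions the probability falls quickly with the indegree (1 for one input, 14/16 for two, 120/256 for three), which is why a network with many low-indegree vertices gets a high mean.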

Analysis and discussion

A short description and interpretation of the result from each of the tried methods is presented under this topic.

The first method is the Derrida plot, and the results are displayed in Fig 54 and Fig 55. The results for the networks with flat distributed rules are shown in Fig 54, where one can see that B50-FD(Lee) is the least noise resistant. The second least noise resistant is BFD(NetER-56) and the most noise resistant is BFD(NetSF-56). Because of the large standard errors it is not possible to claim that all the networks within one network type are more or less robust than all the networks within another network type. One can see that the error bars for BFD(NetER-56) contain both B50-FD(Lee) and BFD(NetSF-56). Despite the large standard errors, it is still possible to claim that the Lee-net has the highest probability of being the least noise resistant, and so on. The order for the nested canalizing rules in Fig 55 differs from the order for the flat distributed rules. The least robust is BNC(NetSF-56) and the second least robust is B50-NC(Lee). The most robust is BNC(NetER-56). As for the flat distributed rules, the standard error is too large to claim that all networks within one type are more robust than the networks of the other types. It is still possible to claim that a network of a certain type has a greater probability of being more robust than a network of another type.

The Noise analysis tool is the second analysis tool that was applied. The results are presented for each of the noise settings. The first noise type that was applied is the State noise (see Fig 56 and Fig 57). For the flat distributed rules, the Lee-net has the highest curve for both noise levels, which means that it is the least State noise resistant. BFD(NetER-56) is the second least robust against State noise and BFD(NetSF-56) is the most robust. The order for nested canalizing rules is B50-NC(Lee), BNC(NetER-56) and BNC(NetSF-56): B50-NC(Lee) is the least State noise resistant and BNC(NetSF-56) is the most State noise resistant.

The second noise type that was applied is Delay noise, and the first conclusion that can be drawn is that it has a substantially lower impact than the State noise (see Fig 58 and Fig 59). The order for the flat distributed rules is BFD(NetSF-56), B50-FD(Lee) and BFD(NetER-56): the least resistant to Delay noise is BFD(NetSF-56) and the most resistant is BFD(NetER-56). For the nested canalizing rules the order is BNC(NetSF-56), BNC(NetER-56) and B50-NC(Lee): BNC(NetSF-56) has the worst resistance to Delay noise and B50-NC(Lee) is the most resistant.

Transfer noise is the third noise type that was applied, and the results are shown in Fig 60 and Fig 61. The Transfer noise has a smaller impact than the State noise but a bigger impact than the Delay noise. The order for the flat distributed rules is hard to determine for the noise level 0.01 in Fig 61, because the three curves lie very close to each other. It is easier to determine the order for the noise level 0.001 in Fig 60, where it is B50-FD(Lee), BFD(NetSF-56) and BFD(NetER-56): the least resistant to Transfer noise is B50-FD(Lee) and the most resistant is BFD(NetER-56). For the nested canalizing rules the order is BNC(NetSF-56), B50-NC(Lee) and BNC(NetER-56): BNC(NetSF-56) has the worst resistance to Transfer noise and BNC(NetER-56) has the best.

The fourth and last noise setting is when all noise types are turned on at the same time. The result of this setting is displayed in Fig 62 and Fig 63. For the flat distributed rules the order is B50-FD(Lee), BFD(NetER-56) and BFD(NetSF-56): B50-FD(Lee) is the least robust and BFD(NetSF-56) is the most robust. This is true for both noise levels, but the difference is bigger in Fig 62. For the nested canalizing rules the order is harder to determine because it differs depending on the time step.

The third test that was performed is the mean probability for canalizing rules, and the result is displayed in Tab 5. According to Tab 5 the Lee-net has the lowest mean probability for canalizing rules and NetSF-56 has the highest. This means that the Lee-net should show a lower degree of robustness.

It is also interesting to compare the Derrida plots in Fig 54 and Fig 55 with the Derrida plots in [14].
The early behavior is fairly similar, but for larger Hamming distances there is a substantial difference for both flat distributed and nested canalizing rules. Another difference is that the standard errors of the Derrida plots in [14] are smaller than those obtained in Fig 54 and Fig 55. The small standard error is somewhat suspicious, because a network with as few as 30 vertices should have a large standard error [8]. The small standard error in the article is probably caused by the fact that it is based on means over several initial states for every Boolean rule. This probably explains why the results are somewhat different.

A few interesting conclusions can be drawn from the results of the simulations on the Lee-net. For example, one can see that, for the tried noise levels, the State noise has the greatest impact and the Delay noise has the smallest impact. Another interesting property is that a probability distribution that gives high probability to few indegrees corresponds to a high mean probability for canalizing rules. This is logical, because a rule with few inputs has a higher probability of being canalizing (see Tab 2). The Lee-net has the lowest mean probability for canalizing rules, and this is consistent with its probability distribution: the Lee-net has the lowest probability for vertices with few indegrees (see Fig 51, Fig 52 and Fig 53).

One can also observe that the order for the flat distributed rules is the same for the Derrida plots, the Noise analysis tool (partially) and the mean probability for canalizing rules. For the Noise analysis tool the order is the same for the settings State noise and All noise. Because the State noise, which is the noise with the biggest effect, and the All noise result in the same order, it is possible to claim that the Lee-net is the least robust, NetER-56 the second least robust and NetSF-56 the most robust. For the nested canalizing rules it is not possible to draw any conclusions regarding the order of stability, because the Derrida plots and the Noise analysis tool give different results.


Simulations on the Milo-net

The aim of the simulations performed under this topic is to determine how robust the Milo-net is [26]. The Milo-net is one of the proposed models for the gene interaction in the yeast cell Saccharomyces cerevisiae. The Milo-net has 688 vertices and 1079 edges. To judge its robustness, two other types of networks are generated to serve as comparison. Both types are randomly generated: one with Poisson distributed out- and in-degrees (ER-networks) and one with power-law distributed out- and in-degrees (scale-free networks). For the scale-free networks the in-degree is limited to a maximum of ten. In both types of generated networks the number of vertices and edges is the same as in the Milo-net. The properties that will be evaluated are the probability distributions for in- and out-degrees, Derrida plots, the Noise analysis tool and the mean probability for canalizing rules.
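One simple way to obtain a random comparison network with a prescribed number of vertices and edges is sketched below. It is an assumption for illustration, not the generator used in the thesis: edges are drawn uniformly at random without duplicates, which gives approximately Poisson distributed in- and out-degrees. Self-loops are allowed here, which is also an assumption.

```python
import random

def random_directed_network(n_vertices, n_edges, rng):
    """Draw n_edges distinct directed edges (source, target) uniformly at random."""
    edges = set()
    while len(edges) < n_edges:
        edges.add((rng.randrange(n_vertices), rng.randrange(n_vertices)))
    return sorted(edges)

# Same size as the Milo-net: 688 vertices and 1079 edges.
er_edges = random_directed_network(688, 1079, random.Random(0))
```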

Probability distributions

The graphs presented under this topic show the probability distributions for in- and out-degrees for the Milo-net, the ER-networks and the scale-free networks. For the ER-networks and the scale-free networks a mean of the distributions is shown (see Fig 64, Fig 65 and Fig 66). The mean is based on fifty different networks, that is, fifty ER-networks and fifty scale-free networks are generated to calculate the respective means. The fifty ER-networks are denoted NetER-1079 and the fifty scale-free networks are denoted NetSF-1079. All graphs for the probability distributions have a log-log scale, and all zero values for the probability are therefore removed. Because a degree of zero cannot be shown on a logarithmic axis either, one is added to the number of in- or out-degrees when vertices with zero in- or out-degree occur. This is indicated by the label on the x-axis: indegrees/outdegrees +1.
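The preparation of such a distribution for a log-log plot can be sketched as follows. The edge-list representation (source, target) and the function names are assumptions made for this example.

```python
from collections import Counter

def degree_distribution(edges, n_vertices, direction="in"):
    """Probability of each in- or out-degree, including vertices with degree zero."""
    position = 1 if direction == "in" else 0  # edges are (source, target) pairs
    degree = Counter(edge[position] for edge in edges)
    counts = Counter(degree.get(v, 0) for v in range(n_vertices))
    return {k: c / n_vertices for k, c in sorted(counts.items())}

def loglog_points(distribution):
    """Shift the degree by one and drop zero probabilities so every point fits a log-log plot."""
    return [(k + 1, p) for k, p in distribution.items() if p > 0]

edges = [(0, 1), (0, 2), (1, 2)]  # hypothetical four-vertex network
in_dist = degree_distribution(edges, 4, "in")
points = loglog_points(in_dist)
```

For a mean distribution over several generated networks, the probabilities for each degree would be averaged over the networks before the shift is applied.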

[Figure: Probability distributions for the Milo-net. Horizontal axis (log scale): indegrees/outdegrees +1, 1 to 100. Vertical axis (log scale): Probability, 0.001 to 1. Series: indegrees, outdegrees.]

Fig 64: The graph shows the probability distribution for the indegrees and outdegrees in the Milo-network.

[Figure: Probability distributions for the ER-nets. Horizontal axis (log scale): indegrees/outdegrees +1, 1 to 100. Vertical axis (log scale): Probability, 0.00001 to 1. Series: indegrees, outdegrees.]

Fig 65: The graph shows the mean probability distribution for indegrees and outdegrees in NetER-1079.

[Figure: Probability distributions for the Scale-free nets. Horizontal axis (log scale): 1 to 100. Vertical axis (log scale): Probability, 0.00001 to 1. Series: indegrees, outdegrees.]

Fig 66: The graph shows the mean probability distribution for indegrees and outdegrees in NetSF-1079.

Derrida plots

Derrida plots are plotted for the same networks as the probability distributions: the Milo-net, NetER-1079 and NetSF-1079. Two types of Boolean rules are assigned to every type of network: nested canalizing rules and rules drawn from a flat distribution of all possible Boolean rules (flat distributed rules). Fifty sets of rules of each type are assigned to the Milo-net. The flat distributed rules for the Milo-net are denoted B50-FD(Milo). B50-FD(Milo) does not denote a single set of rules for the network but fifty different sets of rules. The nested canalizing rules for the Milo-net are denoted B50-NC(Milo), which likewise denotes fifty different sets of rules. The same two types of rules are assigned to NetER-1079 and NetSF-1079, but only one set of rules of each type is assigned per network. Since NetER-1079 and NetSF-1079 denote fifty networks each, the number of generated sets of Boolean rules is the same as for the Milo-net. BFD(NetER-1079) and BFD(NetSF-1079) denote the flat distributed rules assigned to the ER-networks and the scale-free networks, respectively. The nested canalizing rules for NetER-1079 and NetSF-1079 are denoted BNC(NetER-1079) and BNC(NetSF-1079), respectively. The error bars show one standard error for every point in the Derrida plots, which are based on the mean over the fifty sets of rules for the Milo-net or over the fifty networks in NetSF-1079 and NetER-1079.
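The two rule types can be illustrated with a short sketch that draws one truth table of each kind for a vertex with k inputs. The flat distributed rule follows directly from the description above. The nested canalizing rule is only a sketch under assumptions: the canalizing input values and canalized outputs are drawn uniformly, and the default output is taken as the complement of the last canalized output. The thesis may draw these values differently.

```python
import random

def flat_rule(k, rng):
    """Truth table drawn uniformly from all Boolean rules with k inputs."""
    return [rng.randint(0, 1) for _ in range(2 ** k)]

def nested_canalizing_rule(k, rng):
    """Truth table of a nested canalizing rule with k >= 1 inputs (assumed parameter choices)."""
    canalizing_inputs = [rng.randint(0, 1) for _ in range(k)]
    canalized_outputs = [rng.randint(0, 1) for _ in range(k)]
    default = 1 - canalized_outputs[-1]  # assumption: complement of the last canalized output
    table = []
    for index in range(2 ** k):
        bits = [(index >> (k - 1 - i)) & 1 for i in range(k)]  # first input = highest bit
        value = default
        # Inputs are checked in order; the first one at its canalizing value decides the output.
        for bit, a, b in zip(bits, canalizing_inputs, canalized_outputs):
            if bit == a:
                value = b
                break
        table.append(value)
    return table

rng = random.Random(0)
flat_table = flat_rule(3, rng)
nc_table = nested_canalizing_rule(3, rng)
```

In the nested canalizing table, the half that corresponds to the first input being at its canalizing value is constant, which is the property that makes the rule canalizing.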

[Figure: Derrida plots for flat distributed rules. Horizontal axis: d(T), 0 to 600. Vertical axis: d(T+1), 0 to 700. Curves: diagonal, B50-FD(Milo), BFD(NetER-1079), BFD(NetSF-1079).]

Fig 67: The graph shows the Derrida plots for flat distributed rules for B50-FD(Milo), BFD(NetER-1079) and BFD(NetSF-1079).

[Figure. Horizontal axis: d(T), 0 to 600. Vertical axis: d(T+1), 0 to 700. Curves: diagonal, B50-NC(Milo), BNC(NetER-1079), BNC(NetSF-1079).]

Fig 68: The graph shows the Derrida plots for nested canalizing rules for B50-NC(Milo), BNC(NetER-1079) and BNC(NetSF-1079).

Noise analysis tool

The Noise analysis tool is used for evaluating the Milo-net, NetSF-1079 and NetER-1079. The graphs are based on means for B50-FD(Milo), B50-NC(Milo), BFD(NetER-1079), BFD(NetSF-1079), BNC(NetER-1079) and BNC(NetSF-1079). For every set of rules fifty starts are performed and included in the mean. For every start fifty time steps are taken. Two noise levels, 0.0001 and 0.001, are applied for every noise type.

[Figure: State noise 0.0001. Horizontal axis: Time step, 0 to 50. Vertical axis: Hamming distance, 0 to 3.5. Curves: B50-FD(Milo), B50-NC(Milo), BFD(NetER-1079), BNC(NetER-1079), BFD(NetSF-1079), BNC(NetSF-1079).]

Fig 69: The figure shows the result for the noise analysis tool with State noise at a probability level of 0.0001.

[Figure: State noise 0.001. Horizontal axis: Time step, 0 to 50. Vertical axis: Hamming distance, 0 to 35.]

Fig 70: The figure shows the result for the noise analysis tool with State noise at a probability level of 0.001.

[Figure: Delay noise 0.0001. Horizontal axis: Time step, 0 to 50. Vertical axis: Hamming distance, 0 to 0.14.]

Fig 71: The figure shows the result for the noise analysis tool with Delay noise at a probability level of 0.0001.

[Figure: Delay noise 0.001. Horizontal axis: Time step, 0 to 50. Vertical axis: Hamming distance, 0 to 1.4.]

Fig 72: The figure shows the result for the noise analysis tool with Delay noise at a probability level of 0.001.

[Figure: Transfer noise 0.0001. Horizontal axis: Time step, 0 to 50. Vertical axis: Hamming distance, 0 to 0.7.]

Fig 73: The figure shows the result for the noise analysis tool with Transfer noise at a probability level of 0.0001.

[Figure: Transfer noise 0.001. Horizontal axis: Time step, 0 to 50. Vertical axis: Hamming distance, 0 to 7.]

Fig 74: The figure shows the result for the noise analysis tool with Transfer noise at a probability level of 0.001.

[Figure: All noise 0.0001. Horizontal axis: Time step, 0 to 60. Vertical axis: Hamming distance, 0 to 4.]

Fig 75: The figure shows the result for the noise analysis tool with All noise at a probability level of 0.0001.

[Figure: All noise 0.001. Horizontal axis: Time step, 0 to 50. Vertical axis: Hamming distance, 0 to 35.]

Fig 76: The figure shows the result for the noise analysis tool with All noise at a probability level of 0.001.

Mean probability for canalizing rules

The mean probability for canalizing rules is calculated for Milo, NetER-1079 and NetSF-1079. The mean is calculated over every vertex with at least one indegree, that is, vertices with no indegrees are removed from the calculations. The standard error is based on the differences between the networks within a network type, and because Milo is a single network it gets a zero standard error.

Tab 6: The table shows the mean probability for canalizing rules for the different types of networks.

Network       Mean probability for canalizing rules    Standard error
Milo          0.828003                                 0
NetER-1079    0.780073                                 0.00915448
NetSF-1079    0.884952                                 0.00387963
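The standard error over the networks of one type can be sketched as below. The sketch assumes the usual estimate, the sample standard deviation divided by the square root of the number of networks; the thesis does not spell out its formula here, so this is an assumption. The input values in the example are invented for illustration only.

```python
from math import sqrt
from statistics import stdev

def standard_error(values):
    """Standard error of the mean; a single value gives zero, as for a single network."""
    if len(values) < 2:
        return 0.0
    return stdev(values) / sqrt(len(values))

# Hypothetical per-network mean probabilities for three generated networks.
se_generated = standard_error([0.77, 0.78, 0.79])
se_single = standard_error([0.828003])
```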

Analysis and discussion

A description and interpretation of the results is presented here, starting with the Derrida plots. The Derrida plot in Fig 67 shows the results for the flat distributed rules B50-FD(Milo), BFD(NetER-1079) and BFD(NetSF-1079). The curve closest to the diagonal is BFD(NetER-1079), the second closest is B50-FD(Milo) and the most distant is BFD(NetSF-1079). The curve closest to the diagonal is the least robust, but all the networks plotted in Fig 67 are in the ordered region. Derrida plots based on nested canalizing rules are shown in Fig 68. The order of the curves in Fig 68 is different from the order in Fig 67. The curve that is closest to the diagonal is BNC(NetSF-1079), the second closest is B50-NC(Milo) and the most distant is BNC(NetER-1079). One important fact to observe for both Fig 67 and Fig 68 is that the standard errors are large. They are so large that the error bars overlap each other and in some cases even overlap the means of the other curves.

The Noise analysis tool has been applied with several different settings. The result is described and interpreted setting by setting in the order State noise, Delay noise, Transfer noise and All noise. The plots for State noise are shown in Fig 69 and Fig 70. For the flat distributed rules both noise levels result in the same order. The order of the curves is BFD(NetER-1079), B50-FD(Milo) and BFD(NetSF-1079). For nested canalizing rules the order is also the same for both noise levels. The order is B50-NC(Milo), BNC(NetER-1079) and BNC(NetSF-1079). B50-NC(Milo) is the highest, which means that it is the least noise resistant.

The result for the Delay noise is shown in Fig 71 and Fig 72. The impact of Delay noise is much smaller than the impact of State noise. Because of the small impact it is not interesting to compare the order of the different curves, but it can be noted that the highest curve is BNC(NetSF-1079).

The third type of noise is the Transfer noise, and the results are shown in Fig 73 and Fig 74. One can see in both figures that the impact is bigger than for Delay noise but smaller than for State noise. The behaviour of the curves is relatively similar for both noise levels. The curves are divided into three regions. The highest region contains BNC(NetSF-1079). The middle region contains BFD(NetSF-1079) and BFD(NetER-1079). The lowest region contains B50-NC(Milo), B50-FD(Milo) and BNC(NetER-1079).

The last setting for the Noise analysis tool is when all noise types are turned on at the same time, and the result is shown in Fig 75 and Fig 76. The curves are similar to those displayed for State noise. For the flat distributed rules the order of the curves is BFD(NetER-1079), B50-FD(Milo) and BFD(NetSF-1079). BFD(NetER-1079) is the highest curve and therefore the least noise resistant. The order for the nested canalizing rules is harder to describe because it depends on the time step.

The last test performed was the mean probability for canalizing rules, and the results are shown in Tab 6. NetSF-1079 has the highest probability for canalizing rules and NetER-1079 has the lowest.

Several conclusions can be drawn from the generated data. The first is that State noise is the noise type with the biggest effect. Another conclusion is that there is a connection between robustness and a high mean probability for canalizing rules. This conclusion can be drawn because ordering the networks by their mean probability for canalizing rules gives the same order as the Derrida plots and the Noise analysis tool give for flat distributed rules. The mean probability for canalizing rules is also consistent with the probability distributions for the indegrees (see Fig 64, Fig 65 and Fig 66): the networks with a low fraction of vertices that have few indegrees have a low mean probability for canalizing rules. Yet another conclusion is that, according to the performed tests, NetSF-1079 is the most robust network.


Simulations on rewired versions of the Milo-net

The simulations presented here are performed on the Milo-net and seven rewired versions of the Milo-net. These rewired nets are denoted MNet0, MNet1, MNet2, MNet3, MNet4, MNet5 and MNet6. The rewired nets have the same number of vertices and edges as the Milo-net, and the number of indegrees and outdegrees for every vertex is also the same as in the Milo-net. See the topic rewiring in the implementation part. Both nested canalizing rules and flat distributed rules are assigned to the networks. Fifty sets of each of the two types of rules are generated, and they are denoted B50-FD(X) and B50-NC(X), where X can be an arbitrary network with the same indegrees as the Milo-net. The same sets of rules are used for all the networks. This is possible because the number of indegrees of every vertex is the same for all the networks dealt with under this topic.

Three types of tests are performed on the networks: Derrida plots, the Noise analysis tool and motif detection. The motif detection is performed by a software tool called Mfinder. It works by comparing the motifs in a network that is loaded from a text file with the motifs in an arbitrary number of randomly generated networks. Motifs that have a higher occurrence in the network from the text file than in the random networks are presented in an output file. The Mfinder software can be found at the web page [26]. [27]
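A common way to rewire a network while keeping the indegree and outdegree of every vertex is to swap the targets of two randomly chosen edges. The sketch below illustrates that idea; it is an assumption and not necessarily the procedure described in the implementation part. Self-loops may be created, and swaps that would duplicate an existing edge are rejected.

```python
import random
from collections import Counter

def rewire(edges, n_swaps, rng):
    """Swap edge targets pairwise: (a, b), (c, d) -> (a, d), (c, b)."""
    edges = list(edges)
    present = set(edges)
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        if new1 in present or new2 in present:
            continue  # would duplicate an edge (also covers a == c or b == d)
        present.difference_update((edges[i], edges[j]))
        present.update((new1, new2))
        edges[i], edges[j] = new1, new2
        done += 1
    return edges

def degree_sequences(edges):
    """Out-degree and in-degree of every vertex, for checking that rewiring preserves them."""
    return Counter(s for s, _ in edges), Counter(t for _, t in edges)

example_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]  # hypothetical network
rewired = rewire(example_edges, 10, random.Random(0))
```

Because every swap only exchanges targets between two edges, each vertex keeps its number of incoming and outgoing edges, so the same sets of rules can be reused on the rewired network.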

Motif detection

Two types of motifs are examined: motifs with three vertices and motifs with four vertices. The Mfinder tool is set to generate one hundred random networks as comparison. Mfinder cannot handle self-feedback, so in networks that have vertices with self-feedback the edges that form the self-feedback are removed. No network has more than one self-feedback loop, and the networks with one self-feedback loop are MNet2, MNet3 and MNet6. In these three networks the edge that forms the self-feedback loop is removed when the Mfinder tool is applied, but the self-feedback loops are kept intact when the other tools are applied. Only in the Milo-net and MNet4 are there motifs that have a higher occurrence than in the generated random networks. See TAB 7 for the motif in MNet4 and TAB 8 for the motifs in the Milo-net. The rows in the adjacency matrix represent the start vertices and the columns represent the end vertices.

TAB 7: The table shows the adjacency matrix, the occurrence in the tested network and the occurrence in generated random networks for the motif that has a higher occurrence in MNet4. The adjacency matrix is written row by row.

Adjacency matrix       Number of the motif in MNet4    Number of the motif in the random networks
0000 0000 0101 1100    232                             141.3±44.7
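To get a feeling for how far the count in MNet4 lies from the random networks, the difference can be expressed in units of the reported spread. The sketch assumes that the value after ± is the standard deviation over the one hundred random networks, which the text above does not state explicitly.

```python
def z_score(observed, random_mean, random_spread):
    """Distance between the observed count and the random mean, in units of the spread."""
    return (observed - random_mean) / random_spread

# Values from TAB 7; reading the spread as a standard deviation is an assumption.
z_mnet4 = z_score(232, 141.3, 44.7)
```

Under that assumption the motif in MNet4 lies about two spreads above the random mean.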

TAB 8: The table shows the adjacency matrix, the occurrence in the tested network and the occurrence in generated random networks for motifs that have a higher occurrence in the Milo-net. The adjacency matrices are written row by row.

Adjacency matrix       Number of the motif in the Milo-net    Number of the motif in the random networks
000 100 110            70                                     14.1±3.8
0000 0000 0100 1110    1125                                   506±192.3
0000 0000 0101 1100    286                                    134.5±44.8
0000 0000 1100 1100    1843                                   338.9±41.9
0000 0000 1100 1110    157                                    4.9±4.7
0000 1000 1000 1010    102                                    48.5±18.9
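The adjacency matrices in TAB 8 list their entries row by row, with rows as start vertices and columns as end vertices. A small sketch of how such a string can be turned into a list of edges is given below; the function names are hypothetical and whitespace between rows is ignored.

```python
from math import isqrt

def parse_motif(flat):
    """Turn a row-by-row string of 0s and 1s into a square adjacency matrix."""
    digits = "".join(flat.split())
    size = isqrt(len(digits))
    if size * size != len(digits):
        raise ValueError("the number of entries is not a perfect square")
    return [[int(digits[r * size + c]) for c in range(size)] for r in range(size)]

def motif_edges(matrix):
    """Edges (start vertex, end vertex): rows are start vertices, columns are end vertices."""
    return [(r, c) for r, row in enumerate(matrix) for c, v in enumerate(row) if v]

three_vertex_motif = parse_motif("000100110")
three_vertex_edges = motif_edges(three_vertex_motif)
```

For the three-vertex motif this gives the edges 1 to 0, 2 to 0 and 2 to 1 (counting vertices from zero), that is, one vertex that regulates a second vertex and both of them regulating a third.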

Derrida plots

Derrida plots are displayed for all the networks in two different graphs. The first graph, Fig 77, shows the networks with flat distributed rules applied. The second graph, Fig 78, shows the same networks with nested canalizing rules applied. The Derrida plots are based on fifty sets of rules for each type of rule, and for every set of rules fifty different initial states are chosen. The Derrida plots are also based on fifty different Hamming distances.

[Figure: Derrida plots for the rewired Milo-nets with flat distributed rules. Horizontal axis: d(T), 0 to 700. Vertical axis: d(T+1), 0 to 700. Curves: diagonal, B50-FD(Milo), B50-FD(MNet0), B50-FD(MNet1), B50-FD(MNet2), B50-FD(MNet3), B50-FD(MNet4), B50-FD(MNet5), B50-FD(MNet6).]

Fig 77: The figure shows the Derrida plots for the flat distributed rules for the Milo-net and its rewired versions.

[Figure: Derrida plots for the rewired Milo-nets with nested canalizing rules. Horizontal axis: d(T), 0 to 700. Vertical axis: d(T+1), 0 to 700. Curves: diagonal, B50-NC(Milo), B50-NC(MNet0), B50-NC(MNet1), B50-NC(MNet2), B50-NC(MNet3), B50-NC(MNet4), B50-NC(MNet5).]

Fig 78: The figure shows the Derrida plots for the nested canalizing rules for the Milo-net and its rewired versions.

The Noise analysis tool

The Noise analysis tool has been applied for one noise level, 0.001, and two different noise settings, State noise and All noise. It has been used on the Milo-net and the rewired versions of the Milo-net, for both flat distributed rules and nested canalizing rules. The Noise analysis tool has been run on fifty sets of rules, with fifty different initial states per set of rules and fifty time steps.

[Figure: State noise 0.001 for the flat distributed rules. Horizontal axis: Time step, 0 to 50. Vertical axis: Hamming distance, 0 to 30. Curves: B50-FD(Milo), B50-FD(MNet0), B50-FD(MNet1), B50-FD(MNet2), B50-FD(MNet3), B50-FD(MNet4), B50-FD(MNet5), B50-FD(MNet6).]

Fig 79: The figure shows the results for the flat distributed rules from the Noise analysis tool with State noise at a level of 0.001.

[Figure: State noise 0.001 for nested canalizing rules. Horizontal axis: Time step, 0 to 50. Vertical axis: Hamming distance, 0 to 35. Curves: B50-NC(Milo), B50-NC(MNet0), B50-NC(MNet1), B50-NC(MNet2), B50-NC(MNet3), B50-NC(MNet4), B50-NC(MNet5), B50-NC(MNet6).]

Fig 80: The figure shows the results for the nested canalizing rules from the Noise analysis tool with State noise at a level of 0.001.

[Figure: All noise 0.001 for flat distributed rules. Horizontal axis: Time step, 0 to 50. Vertical axis: Hamming distance, 0 to 30. Curves: B50-FD(Milo), B50-FD(MNet0), B50-FD(MNet1), B50-FD(MNet2), B50-FD(MNet3), B50-FD(MNet4), B50-FD(MNet5), B50-FD(MNet6).]

Fig 81: The figure shows the results for the flat distributed rules from the Noise analysis tool with All noise at a level of 0.001.

[Figure: All noise 0.001 for nested canalizing rules. Horizontal axis: Time step, 0 to 50. Vertical axis: Hamming distance, 0 to 35. Curves: B50-NC(Milo), B50-NC(MNet0), B50-NC(MNet1), B50-NC(MNet2), B50-NC(MNet3), B50-NC(MNet4), B50-NC(MNet5), B50-NC(MNet6).]

Fig 82: The figure shows the results for the nested canalizing rules from the Noise analysis tool with All noise at a level of 0.001.

Analysis and discussion

A short summary and an interpretation of the results is given for every applied analysis. The first tool that was applied was Mfinder. The most interesting observation is that the Milo-net has several motifs that occur in larger numbers than in the randomly generated networks that Mfinder uses as comparison. This can be seen in TAB 8. Another interesting observation is that the motifs were broken up when the Milo-net was rewired. In other words, the motifs in the rewired networks occur in approximately the same numbers as in the random networks that Mfinder uses as comparison. The one exception is MNet4, which has one motif with higher occurrence (see TAB 7).

The Derrida plots in Fig 77 and Fig 78 show that the networks are robust for both nested canalizing and flat distributed rules, because the curves do not cross the diagonal. It is also interesting to notice that the difference between the networks is very small. This is true for both nested canalizing and flat distributed rules.

The third tool that was applied was the Noise analysis tool, and the result is consistent for all variations of parameters. For both State noise and All noise the difference between the networks is very small, and this is true for both nested canalizing and flat distributed rules (see Fig 79, Fig 80, Fig 81 and Fig 82). The difference increases slightly for a higher number of time steps, but it is still very small.

The overall conclusion is that the higher occurrence of certain motifs in the Milo-net does not affect the stability. This conclusion can be drawn because the different networks result in almost the same curves for both the Derrida plots and the Noise analysis tool, even though the higher occurrence of the motifs is present only in the Milo-net.


Simulations on the Lasso-net

The purpose of these simulations is to examine how robust the Lasso-net is. The network is called the Lasso-net after the method used to derive it, and it is a model for the genetic regulatory network of the yeast cell (Saccharomyces cerevisiae). The method is described in [28]. The Lasso-net has 6178 vertices and 11674 edges. Besides the Lasso-net, two other types of networks are generated for comparison: ER-networks and scale-free networks. The networks of these two types are generated randomly and have the same number of vertices and edges as the Lasso-net. For the scale-free networks the indegree is limited to a maximum of ten per vertex. The properties that will be evaluated are the probability distributions for in- and out-degrees, Derrida plots, the Noise analysis tool and the mean probability for canalizing rules.

Probability distributions

The probability distributions are presented in the graphs below. The probability distributions for both indegrees and outdegrees are shown for the Lasso-net, the ER-nets and the scale-free nets. For the Lasso-net the actual probability distribution is shown, but for the other two types of networks the mean probability distribution is shown. The mean is based on ten different networks, that is, ten ER-networks and ten scale-free networks are generated to calculate the respective means. The ER-networks are denoted NetER-11674 and the scale-free networks are denoted NetSF-11674. The probability distributions for indegrees and outdegrees are shown in the same graph with log-log scales. Because a degree of zero cannot be shown on a logarithmic axis, one is added to the number of in- or out-degrees when vertices with zero in- or out-degree occur. This is indicated by the label on the x-axis: number of indegrees/outdegrees +1.

[Figure: Probability distributions for the Lasso-net. Horizontal axis (log scale): Number of indegrees/outdegrees +1, 1 to 1000. Vertical axis (log scale): Probability, 0.0001 to 1. Series: Probability distribution for indegrees, Probability distribution for outdegrees.]

Fig 83: The figure shows the probability distributions for the Lasso-net.

[Figure: Probability distributions for ER-11674. Horizontal axis (log scale): Number of indegrees/outdegrees +1, 1 to 10. Vertical axis (log scale): Probability, 0.00001 to 1.]

Fig 84: The figure shows the mean probability distributions for NetER-11674. The mean probability distributions for both indegrees and outdegrees are plotted.

[Figure: Probability distributions for NetSF-11674. Horizontal axis (log scale): 1 to 100. Vertical axis (log scale): Probability, 0.00001 to 1.]

Fig 85: The figure shows the mean probability distributions for NetSF-11674.

Derrida plots

Derrida plots are plotted for the Lasso-net, NetER-11674 and NetSF-11674. Two types of rules are applied to every network: flat distributed and nested canalizing rules. For the Lasso-net ten sets of rules of each type are assigned. They are denoted B10-FD(Lasso) and B10-NC(Lasso), where B10-FD(Lasso) represents the ten sets of flat distributed rules and B10-NC(Lasso) represents the ten sets of nested canalizing rules. For the other two types of networks only one set of rules of each type is assigned per network, but the total number of sets of rules is the same because ten networks are used per network type. The rules for the ER-nets are denoted BFD(NetER-11674) for the flat distributed rules and BNC(NetER-11674) for the nested canalizing rules. For the scale-free nets the flat distributed rules are denoted BFD(NetSF-11674) and the nested canalizing rules are denoted BNC(NetSF-11674). The Derrida plots are based on the mean over the ten sets of rules for the Lasso-net or over the ten networks in NetSF-11674 and NetER-11674. For every set of Boolean rules fifty initial states are run, and the curves are based on fifty different Hamming distances.

[Figure: Derrida plots for the flat distributed rules. Horizontal axis: d(T), 0 to 6000. Vertical axis: d(T+1), 0 to 6000. Curves: diagonal, B10-FD(Lasso), BFD(NetER-11674), BFD(NetSF-11674).]

Fig 86: The figure shows the Derrida plots for the flat distributed rules for the Lasso-net, NetER-11674 and NetSF-11674.

[Figure. Horizontal axis: d(T), 0 to 6000. Vertical axis: d(T+1), 0 to 6000. Curves: diagonal, B10-NC(Lasso), BNC(NetER-11674), BNC(NetSF-11674).]

Fig 87: The figure shows the Derrida plots for the nested canalizing rules for the Lasso-net, NetER-11674 and NetSF-11674.

Noise analysis tool

The graphs are based on means over the sets of rules, and for every set of rules fifty starts are made. For every start fifty time steps are taken. A single noise level, 0.001, is applied.
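As an illustration of the measurement, the sketch below runs a noise-free and a noisy copy of the same network from the same start and records the Hamming distance between them at every time step. Only State noise is sketched, and it is assumed here to mean that every node state is flipped independently with probability equal to the noise level after each update; that interpretation, the synchronous update and the function names are assumptions, not the thesis implementation.

```python
import random


def step(state, inputs, tables):
    """Synchronous update: each node reads its inputs and looks up its rule."""
    new_state = []
    for node_inputs, table in zip(inputs, tables):
        index = 0
        for src in node_inputs:
            index = (index << 1) | state[src]
        new_state.append(table[index])
    return new_state


def state_noise_distances(inputs, tables, noise_level, n_starts, n_steps, rng):
    """Mean Hamming distance between noisy and noise-free runs, per time step."""
    n = len(inputs)
    totals = [0] * n_steps
    for _ in range(n_starts):
        clean = [rng.randint(0, 1) for _ in range(n)]
        noisy = list(clean)
        for t in range(n_steps):
            clean = step(clean, inputs, tables)
            noisy = step(noisy, inputs, tables)
            # Assumed State noise: flip each node with probability noise_level.
            noisy = [bit ^ 1 if rng.random() < noise_level else bit for bit in noisy]
            totals[t] += sum(c != x for c, x in zip(clean, noisy))
    return [total / n_starts for total in totals]


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 200, 2
    inputs = [[rng.randrange(n) for _ in range(k)] for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    print(state_noise_distances(inputs, tables, 0.001, 50, 50, rng))
```

With fifty starts and fifty time steps, the returned list corresponds to one curve of Hamming distance against time step.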

[Figure: State noise 0.001. Horizontal axis: Time step, 0 to 50; vertical axis: Hamming distance, 0 to 450. Curves: B10-FD(Lasso), BFD(NetER-11674), BFD(NetSF-11674), B10-NC(Lasso), BNC(NetER-11674), BNC(NetSF-11674).]


Fig 88: The figure shows the result for the State noise at a noise level 0.001 for the Lasso-net and its comparison net.

[Figure: Delay noise 0.001. Horizontal axis: Time step, 0 to 50; vertical axis: Hamming distance, 0 to 30.]


Fig 89: The figure shows the result for the Delay noise at a noise level 0.001 for the Lasso-net and its comparison net.

[Figure: Transfer noise 0.001. Horizontal axis: Time step, 0 to 50; vertical axis: Hamming distance, 0 to 120.]


Fig 90: The figure shows the result for the Transfer noise at a noise level 0.001 for the Lasso-net and its comparison net.

[Figure: All noise 0.001. Horizontal axis: Time step, 0 to 50; vertical axis: Hamming distance, 0 to 500.]


Fig 91: The figure shows the result for the All noise at a noise level 0.001 for the Lasso-net and its comparison net.

Mean probability for canalizing rules

The mean probability for canalizing rules is calculated for the Lasso-net, NetER-11674 and NetSF-11674. The mean is calculated over every vertex with at least one input; vertices with indegree zero are excluded from the calculation. The standard error is based on the differences between networks within a network type, and because of this the Lasso-net has no standard error.

Tab 9: The table shows the mean probability for canalizing rules for the different types of networks.

Network        Mean probability for canalizing rules   Standard error
Lasso          0.809457                                0
NetER-11674    0.715035                                0.00244881
NetSF-11674    0.822146                                0.00157343
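The calculation can be sketched as follows, assuming that a rule with k inputs is drawn uniformly from all Boolean functions of k inputs (an assumption about what "flat distributed" means here) and that constant functions count as canalizing. The probability that such a rule is canalizing is found by brute-force enumeration, which is only feasible for small k; for larger indegrees one would need tabulated values such as those in Tab 2. The function names are hypothetical.

```python
from functools import lru_cache
from itertools import product


def is_canalizing(table, k):
    """True if fixing some single input to some value forces a constant output.

    table[row] is the output for the input combination whose bits encode row.
    """
    for var in range(k):
        mask = 1 << (k - 1 - var)  # bit of input `var` inside a row index
        for value in (0, 1):
            outputs = {table[row] for row in range(2 ** k)
                       if bool(row & mask) == bool(value)}
            if len(outputs) == 1:
                return True
    return False


@lru_cache(maxsize=None)
def canalizing_fraction(k):
    """Fraction of all Boolean functions of k inputs that are canalizing."""
    n_rows = 2 ** k
    hits = sum(is_canalizing(table, k) for table in product((0, 1), repeat=n_rows))
    return hits / 2 ** n_rows


def mean_canalizing_probability(indegrees, max_k=4):
    """Average the per-vertex probability over all vertices with indegree >= 1."""
    used = [k for k in indegrees if k > 0]
    if not used:
        raise ValueError("no vertex has an input")
    if any(k > max_k for k in used):
        raise ValueError("brute-force enumeration is only sketched for small indegrees")
    return sum(canalizing_fraction(k) for k in used) / len(used)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, canalizing_fraction(k))
    print(mean_canalizing_probability([0, 1, 2, 2, 3]))
```

Because the fraction falls quickly as k grows, a network in which most vertices have few inputs ends up with a high mean value.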

Analysis and discussion

The results are interpreted and discussed for each analysis method in turn, starting with the Derrida plots. Fig 86 shows the Derrida plot for the flat distributed rules. The least robust network type is BFD(NetER-11674), since its curve lies closest to the diagonal. The most robust, according to the Derrida plot, is BFD(NetSF-11674). B10-FD(Lasso) has a robustness between those of BFD(NetER-11674) and BFD(NetSF-11674). It is also worth noting that all network types lie below the diagonal for flat distributed rules, which means that they are robust. Fig 87 shows the Derrida plots for the nested canalizing rules. All curves lie below the diagonal, so all the networks are robust for nested canalizing rules. According to Fig 87, the order of the networks is B10-NC(Lasso), BNC(NetER-11674) and BNC(NetSF-11674), where B10-NC(Lasso) is the most robust and BNC(NetSF-11674) the least robust.
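The two readings used above, "below the diagonal" and "closest to the diagonal", can be made concrete with a small sketch. The mean vertical gap to the diagonal is used here as one possible way to rank curves; that measure, the function names and the example numbers are illustrative assumptions, not values or methods from the thesis.

```python
def lies_below_diagonal(curve):
    """True if d(T+1) < d(T) for every sampled point with d(T) > 0."""
    return all(d_next < d_t for d_t, d_next in curve if d_t > 0)


def mean_gap_below_diagonal(curve):
    """Mean of d(T) - d(T+1); a larger value means the curve lies further below the diagonal."""
    return sum(d_t - d_next for d_t, d_next in curve) / len(curve)


if __name__ == "__main__":
    # Hypothetical (d(T), d(T+1)) samples for two curves, for illustration only.
    curve_a = [(1000, 400.0), (2000, 900.0)]
    curve_b = [(1000, 800.0), (2000, 1800.0)]
    for name, curve in (("a", curve_a), ("b", curve_b)):
        print(name, lies_below_diagonal(curve), mean_gap_below_diagonal(curve))
```

In this reading, the curve with the smaller mean gap would be the one described as lying closest to the diagonal, that is, the least robust.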


The Noise analysis tool has been applied with a multitude of different settings. The results are summarised setting by setting in the order State noise, Delay noise, Transfer noise and All noise.

The results for State noise are shown in Fig 88. The curves for nested canalizing rules show a higher degree of robustness against State noise than the curves for flat distributed rules. The biggest difference is observed for NetER-11674; for the other networks the difference is substantially smaller. It is also interesting that BFD(NetER-11674) is more sensitive to State noise than B10-FD(Lasso) and BFD(NetSF-11674).

The results for Delay noise are presented in Fig 89. The effect of Delay noise is smaller than that of State noise. In contrast to the results from the other methods, B10-FD(Lasso) shows the least robustness.

The results for Transfer noise are shown in Fig 90. The curves for nested canalizing rules show a lower sensitivity to Transfer noise than the curves for flat distributed rules. The effect of Transfer noise is smaller than that of State noise but bigger than that of Delay noise. It is not possible to determine which of the flat distributed rules is the most robust against Transfer noise, because the curves lie rather close to each other and cross each other.

The results from the setting where all three types of noise are turned on are shown in Fig 91. One can observe a difference between the curves for flat distributed rules and the curves for nested canalizing rules: the nested canalizing curves show a higher robustness toward the applied noise types. The order of the curves with flat distributed rules is also of interest. According to Fig 91, the most robust of them is BFD(NetSF-11674), the least robust is BFD(NetER-11674) and the second least robust is B10-FD(Lasso).

The last property evaluated was the mean probability for canalizing rules, see Tab 9. It gave the order NetSF-11674, Lasso and NetER-11674, where NetSF-11674 has the highest mean probability for canalizing rules and NetER-11674 the lowest. The order for the mean probability for canalizing rules is the same as the order for the robustness. The values of the mean probability for canalizing rules for the different networks are consistent with the probability distributions for the indegrees, which are displayed in Fig 83, Fig 84 and Fig 85. A higher probability for vertices with few indegrees results in a high mean probability for canalizing rules.

To summarize, the most interesting observations one can draw from the results are the connection between robustness and the mean probability for canalizing rules, and the difference between nested canalizing rules and flat distributed rules. It is also worth noting that B10-NC(Lasso) is the most robust for the nested canalizing rules according to the Derrida plots and the Noise analysis tool with the settings State noise and All noise. BFD(NetSF-11674) is the most robust when flat distributed rules are applied.


Summarizing discussion and analysis

There are three main conclusions one can draw from the simulations performed in this thesis:

• There is a strong connection between a high mean probability for canalizing rules and robustness.

• The probability distribution of indegrees is, when flat distributed rules are applied, the most important factor for whether the network lies in the chaotic or ordered regime.

• Networks with nested canalizing rules are robust.

When flat distributed rules are applied, the mean probability for canalizing rules is the most important property for whether the network lies in the chaotic or ordered regime. Canalizing rules are known for their ability to make a system ordered [13]. A high probability for canalizing rules cannot by itself bring order to a system, but in a system with randomly generated rules a high probability for canalizing rules will lead to more canalizing rules. Support for the claim that the mean number of canalizing rules is an important property for the robustness of a system can be found in Fig 24 and Fig 25. In these figures the curves generated with the help of the mean probability for canalizing rules lie close to the curves that show the edge between the chaotic and ordered regimes. More support can be found in the simulations for the different networks. For example, comparing the Derrida plot in Fig 28 with the mean probability for canalizing rules in Tab 3, one can see that networks with a mean probability for canalizing rules above 0.69 are robust. This can also be seen for the other networks: the Fang-net lies in the chaotic regime (see Fig 41), and its mean probability for canalizing rules lies below 0.69 (see Tab 4).

The Noise analysis tool gives, most of the time, the same order as the mean probability for canalizing rules. The most noise-resistant network has the highest mean number of canalizing rules, the second most noise-resistant has the second highest, and so on. Networks whose mean probabilities for canalizing rules lie close to each other often lie close to each other for the other analysis tools too.
This can be seen for the Milo-net and the ER-net with the same number of edges and vertices as the Milo-net (see Fig 67, Fig 70, Fig 75 and Tab 6). This is not always true for all noise types. Fig 72, which shows the curves for Delay noise, shows that the Milo-net and the ER-networks are far from each other; this can also be seen for Transfer noise (see Fig 74). The conclusion one can draw is that one can determine whether a network with flat distributed rules lies in the chaotic or ordered regime by observing the mean probability for canalizing rules: if it lies above 0.69, the network is robust. It is important to note that the mean probability for canalizing rules is only useful when the rules are randomly assigned, because it is possible to construct rules manually to be non-canalizing even though the mean probability for canalizing rules is high.

The probability distribution for indegrees is the most important factor that affects the robustness of networks with flat distributed rules. This statement is justified by the connection between the probability distribution for indegrees and the mean probability for canalizing rules. The probability for a Boolean rule to be canalizing is higher the fewer inputs it has, see Tab 2 [13]. Consequently, networks with a high probability for vertices with few indegrees have a high mean probability for canalizing rules, and one sees this connection for all the simulated networks. The blue graph in Fig 24 has been calculated without considering the outdegrees or the topology, apart from the probability distribution for indegrees, and its behavior is for a large part of the scale the same as the real edge between the chaotic and ordered regimes. The result in Fig 24 provides strong support for the claim that the probability distribution of indegrees is, when flat distributed rules are applied, the most important factor that affects the robustness of the network. Similar results that also support the claim can be found in Fig 25.

Yet more support can be found under the topic Simulations on rewired versions of the Milo-net in the simulations part. The rewired networks have almost exactly the same results for the analysis tools as the Milo-net; see for example Fig 77, Fig 79 or Fig 81. The rewired networks have the same response as the Milo-net despite the fact that the motifs in the Milo-net were broken up in the rewired versions. This indicates that the topology of the network, apart from the probability distributions, does not have any effect on the robustness. The motifs have no effect on the robustness for nested canalizing rules.

All the simulations performed in this report on networks with nested canalizing rules confirm the claim in the article [23] that nested canalizing rules are always stable; see for example Fig 29. Another interesting observation is that the higher the mean number of edges per vertex is, the more robust the network gets. For a more detailed description and interpretation of the robustness of the different networks, see the topic for the respective network in the Simulations part.

Suggestions on further research

The simulations performed in this thesis have raised many questions, and some of them have not been answered. One of the most interesting unanswered questions is which factors, besides mean connectivity, determine how robust a network is when nested canalizing rules are applied.
One factor one should be able to rule out, with the help of the simulations performed on the rewired versions of the Milo-net, is the motifs. Is it the probability distribution for the indegrees that affects the robustness when nested canalizing rules are applied? Or is it the probability distribution for the outdegrees? These questions could be interesting to study further. Another interesting issue to try to resolve is the problems with using entropy as an indicator of chaotic behavior. The problems are that one does not always have knowledge about the length of the cycle attractors and that entropy, with its current implementation, cannot handle larger networks. Yet another interesting property to study further is the phenomenon of higher Hamming distances for early time steps when Delay noise is applied. One possible explanation that would be interesting to examine is whether the Delay noise has a larger effect before the system has reached a cyclic behavior.
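The argument about motifs rests on rewired networks that keep the degree distributions while breaking up local structure. A common way to produce such networks is repeated swapping of edge targets, sketched below. This is an illustration under stated assumptions: whether the thesis used this procedure is not stated in this section, new self-loops and duplicate edges are rejected by choice, and the function names are hypothetical.

```python
import random
from collections import Counter


def degree_sequences(edges):
    """Return (outdegree counts, indegree counts) for a list of directed edges."""
    return Counter(s for s, _ in edges), Counter(t for _, t in edges)


def rewire(edges, n_swaps, rng):
    """Degree-preserving rewiring: swap the targets of two randomly chosen edges.

    (a, b), (c, d) become (a, d), (c, b), so every vertex keeps its indegree
    and outdegree. Assumes the input edge list contains no duplicate edges.
    """
    edges = list(edges)
    present = set(edges)
    done = 0
    attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        # Assumption: reject swaps that create duplicate edges or new self-loops.
        if new1 in present or new2 in present or a == d or c == b:
            continue
        present -= {(a, b), (c, d)}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
        done += 1
    return edges


if __name__ == "__main__":
    original = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2), (1, 3)]
    rewired = rewire(original, 10, random.Random(1))
    print(rewired)
    print(degree_sequences(rewired) == degree_sequences(original))
```

Comparing the analysis tools on the original and on such rewired copies separates the effect of the degree distributions from the effect of motifs.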


Figure list

Fig 1, p. 5; Fig 2, p. 7; Fig 3, p. 8; Fig 4, p. 14; Fig 5, p. 19; Fig 6, p. 20; Fig 7, p. 20; Fig 8, p. 21; Fig 9, p. 22; Fig 10, p. 22;
Fig 11, p. 23; Fig 12, p. 24; Fig 13, p. 25; Fig 14, p. 26; Fig 15, p. 27; Fig 16, p. 28; Fig 17, p. 30; Fig 18, p. 32; Fig 19, p. 33; Fig 20, p. 34;
Fig 21, p. 34; Fig 22, p. 35; Fig 23, p. 35; Fig 24, p. 38; Fig 25, p. 39; Fig 26, p. 40; Fig 27, p. 41; Fig 28, p. 42; Fig 29, p. 42; Fig 30, p. 43;
Fig 31, p. 43; Fig 32, p. 44; Fig 33, p. 44; Fig 34, p. 45; Fig 35, p. 45; Fig 36, p. 46; Fig 37, p. 46; Fig 38, p. 49; Fig 39, p. 50; Fig 40, p. 50;
Fig 41, p. 51; Fig 42, p. 52; Fig 43, p. 52; Fig 44, p. 53; Fig 45, p. 53; Fig 46, p. 54; Fig 47, p. 54; Fig 48, p. 55; Fig 49, p. 55; Fig 50, p. 56;
Fig 51, p. 59; Fig 52, p. 60; Fig 53, p. 60; Fig 54, p. 61; Fig 55, p. 62; Fig 56, p. 62; Fig 57, p. 63; Fig 58, p. 63; Fig 59, p. 64; Fig 60, p. 64;
Fig 61, p. 65; Fig 62, p. 65; Fig 63, p. 66; Fig 64, p. 69; Fig 65, p. 70; Fig 66, p. 70; Fig 67, p. 71; Fig 68, p. 72; Fig 69, p. 72; Fig 70, p. 73;
Fig 71, p. 73; Fig 72, p. 74; Fig 73, p. 74; Fig 74, p. 75; Fig 75, p. 75; Fig 76, p. 76; Fig 77, p. 80; Fig 78, p. 80; Fig 79, p. 81; Fig 80, p. 81;
Fig 81, p. 82; Fig 82, p. 82; Fig 83, p. 84; Fig 84, p. 85; Fig 85, p. 85; Fig 86, p. 86; Fig 87, p. 87; Fig 88, p. 87; Fig 89, p. 88; Fig 90, p. 88;
Fig 91, p. 89


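Equation list

Eq1, p. 7; Eq2, p. 7; Eq3, p. 7; Eq4, p. 8; Eq5, p. 9; Eq6, p. 9; Eq7, p. 9; Eq8, p. 9; Eq9, p. 9; Eq10, p. 10;
Eq11, p. 10; Eq12, p. 10; Eq13, p. 10; Eq14, p. 10; Eq15, p. 10; Eq16, p. 10; Eq17, p. 11; Eq18, p. 12; Eq19, p. 13; Eq20, p. 14;
Eq21, p. 15; Eq22, p. 15; Eq23, p. 15; Eq24, p. 15; Eq25, p. 15; Eq26, p. 16; Eq27, p. 16; Eq28, p. 17; Eq29, p. 21; Eq30, p. 21;
Eq31, p. 37

Table list

Tab 1, p. 12; Tab 2, p. 13; Tab 3, p. 47; Tab 4, p. 56; Tab 5, p. 66; Tab 6, p. 76; Tab 7, p. 78; Tab 8, p. 79; Tab 9, p. 89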


References

1. Kauffman Stuart (1969), “Metabolic stability and epigenesis in randomly constructed genetic nets”, Journal of Theoretical Biology vol. 22, p. 437
2. Gershenson Carlos (2004), Introduction to Random Boolean networks, http://uk.arxiv.org/abs/nlin.AO/0408006, (acc 2005-04-03)
3. Shmulevich Ilya, Dougherty Edward R., Zhang Wei (2002), “From Boolean to Probabilistic Boolean Networks as Models of Genetic Regulatory Networks”, Proceedings of the IEEE vol. 90, p. 1778
4. Alberts Bruce, Bray Dennis, Johnson Alexander, Lewis Julian, Raff Martin, Roberts Keith, Walter Peter (1998), Essential cell biology an introduction to the molecular biology of the cell, Garland Publishing, ISBN: 0-8153-2045-0
5. West, Douglas B (2001), Introduction to graph theory, Prentice Hall, 2nd edition, ISBN: 0-13-014400-2
6. Albert Réka, Barabási Albert-László (2002), “Statistical mechanics of complex networks”, Reviews of modern physics vol. 74 no. 1, p. 47
7. Derrida B, Weisbuch G (1986), “Evolution of overlaps between configurations in random Boolean networks”, Journal de physique vol. 47, p. 1297
8. Derrida B, Pomeau Y (1986), “Random networks of automata: a simple annealed approximation”, Europhysics Letters vol. 1, p. 45
9. Adams, Robert A (1999), A complete course calculus, Addison Wesley, 4th edition, ISBN: 0-201-39607-6
10. Aldana, Maximino (2003), “Boolean dynamics of networks with scale-free topology”, Physica D: Nonlinear Phenomena vol. 185, p. 45
11. Barabási Albert-László, Oltvai Zoltán N. (2004), “Network Biology: Understanding the Cell’s Functional Organization”, Nature Reviews Genetics vol. 5, p. 101
12. Milo R., Shen-Orr S., Itzkovitz S., Kashtan N., Chklovskii D., Alon U. (2002), “Network Motifs: Simple Building Blocks of Complex Networks”, Science vol. 298, p. 824
13. Just Winfried, Shmulevich Ilya, Konvalina John (2004), “The number and probability of canalizing functions”, Physica D vol. 197, p. 221
14. Kauffman Stuart, Peterson Carsten, Samuelsson Björn, Troein Carl (2003), “Random Boolean network models and the yeast transcriptional network”, PNAS vol. 100 no. 25, p. 4781
15. Cover Thomas M., Thomas Joy A. (1991), Elements of Information Theory, Wiley, 1st edition, ISBN: 0-471-06259-6
16. Harris Randy (1999), Nonclassical Physics Beyond Newton’s View, Addison Wesley, 1st edition, ISBN: 0-201-83436-7
17. Moon, Francis C (2004), Chaotic vibrations an introduction for applied scientists and engineers, Wiley, 2nd edition, ISBN: 0-471-67908-9
18. Kauffman Stuart (2003), “Understanding genetic regulatory networks”, International journal of astrobiology vol. 2, p. 131-139
19. Stallings, William (2002), Wireless communications and networks, Prentice Hall, ISBN: 0-13-040864-6
20. Blom, Gunnar (1989), Sannolikhetsteori och statistikteori med tillämpningar, Studentlitteratur, 4th edition, ISBN: 91-44-03594-2
21. Weiss, Mark Allen (1999), Data structures and algorithm analysis, Addison Wesley, ISBN: 0-201-35754-2
22. Press William H, Teukolsky Saul A, Vetterling William T, Flannery Brian P (2002), Numerical Recipes in C++ the art of scientific computing, Cambridge university press, 2nd edition, ISBN: 0-521-75033-4
23. Kauffman Stuart, Peterson Carsten, Samuelsson Björn, Troein Carl (2004), “Genetic networks with canalyzing Boolean rules are always stable”, PNAS vol. 101, p. 17102
24. Li Fangting, Long Tao, Lu Ying, Ouyang Qi, Tang Chao (2004), “The yeast cell-cycle network is robustly designed”, PNAS vol. 101 no. 14, p. 4781-4786
25. Lee Tong Ihn, et al. (2002), “Transcriptional Regulatory Networks in Saccharomyces cerevisiae”, Science vol. 298, p. 799
26. http://www.weizmann.ac.il/mcb/UriAlon/ (acc 2005-04-03)
27. Mfinder Tool Guide, Departments of Molecular Cell Biology and Computer Science & Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
28. Mika Gustafsson, Michael Hörnquist, Anna Lombardi (2004), “Large-scale reverse engineering by the Lasso”, http://arxiv.org/abs/q-bio.MN/0403012 (acc 2005-04-03)
