Analysis of Biological Networks: Constraint-Based Modeling of Metabolic Networks

Lecturer: Roded Sharan Scribe: Nir Dil and Liron Levkovitz

Lecture 10, January 5, 2006

1 Introduction

In silico analysis of metabolic networks has gained significant attention in recent years. We describe here a common approach called constraint-based modeling (CBM), which models achievable physiological metabolic states of the cell for unicellular organisms (microorganisms). Currently, CBM is the only approach that enables genome-scale description of metabolic processes. This review is based mainly on the review of Price et al., 2004 [13].

1.1 Metabolism

Metabolism is the biochemical modification of chemical compounds in living organisms and cells. Cell metabolism includes all chemical processes in a cell that produce energy and the basic materials needed for important life processes. This includes the biosynthesis of complex organic molecules (anabolism) and their breakdown with release of energy (catabolism).

A substance that participates in a biochemical reaction is called a metabolite. A process in which one or more molecules (reactants) are converted into one or more products, usually with the help of an enzyme, is called a biochemical reaction.

Metabolic processes consist of sequences of enzymatic steps, which form a metabolic pathway. These pathways achieve either the formation of a metabolic product to be used or stored in the cell, or the initiation of another metabolic pathway (see Figure 1 for an example). Many of these pathways involve a step-by-step modification of the initial substance to shape it into a product with a desired chemical structure.

Several distinct but linked metabolic pathways are used by cells to transfer the energy released by the breakdown of fuel molecules to ATP; together they are called cellular respiration. Anaerobic respiration refers to the oxidation of molecules in the absence of oxygen in order to generate energy. Aerobic respiration requires oxygen in order to generate energy and consists of three well-characterized metabolic pathways: glycolysis (which does not itself require oxygen), the Krebs (citric acid) cycle, and oxidative phosphorylation.

1.2 Metabolic Networks

Metabolic networks consist of vertices that represent metabolites and edges that represent biochemical reactions. A biochemical reaction is catalyzed by an enzyme, which is a protein translated from a corresponding gene. Therefore, each edge corresponds to a gene as well. Note that the graph that represents the metabolic network is not necessarily a simple graph, since each enzyme can catalyze more than one reaction. In addition, a single reaction can be catalyzed by a complex of enzymes. This can be represented by a bipartite graph.
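The bipartite representation can be sketched in a few lines of Python. This is only an illustration: the reaction names, the enzyme names (E1, E2a, E2b) and their assignments are hypothetical toy data, not curated biology. Metabolites and reactions form the two vertex classes, and a separate index records which reactions share an enzyme.

```python
from collections import defaultdict

# Hypothetical toy data: reaction -> (enzymes, reactants, products).
# The enzyme assignments are illustrative only, not curated biology.
REACTIONS = {
    "R1": (["E1"], ["glucose", "ATP"], ["G6P", "ADP"]),
    "R2": (["E2a", "E2b"], ["G6P"], ["F6P"]),        # catalyzed by a complex
    "R3": (["E1"], ["F6P", "ATP"], ["FBP", "ADP"]),  # E1 reused by a second reaction
}

def bipartite_edges(reactions):
    """Directed edges: metabolite -> reaction (consumed),
    reaction -> metabolite (produced)."""
    edges = []
    for name, (_enzymes, reactants, products) in reactions.items():
        edges.extend((m, name) for m in reactants)
        edges.extend((name, m) for m in products)
    return edges

def reactions_by_enzyme(reactions):
    """Map each enzyme to the reactions it takes part in."""
    index = defaultdict(list)
    for name, (enzymes, _reactants, _products) in reactions.items():
        for enzyme in enzymes:
            index[enzyme].append(name)
    return dict(index)

edges = bipartite_edges(REACTIONS)
enzyme_index = reactions_by_enzyme(REACTIONS)
print(enzyme_index["E1"])  # reactions that share enzyme E1
```

Keeping reactions as their own vertex class avoids the parallel edges that appear when a multi-substrate reaction is drawn directly between metabolites.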

Figure 1: Metabolic network of the central carbon metabolism and partial amino acid biosynthesis and metabolism of B. megaterium.

An alternative way of representing metabolic networks is the reaction graph representation. In such a representation, vertices represent enzymes and edges represent metabolites that are produced by one enzyme and consumed by the other enzyme. This representation is similar to other common networks in which vertices represent genes or proteins.

There are three main motivations to study metabolic networks. First, the metabolic network is probably the best understood of all cellular networks: the data are extensive, coverage is high, and quality is high (with few false positives and false negatives). Second, the function of the metabolic system is characterized in terms of its input and output. Third, it has tremendous importance in medicine, since it can be used for developing drugs and for studying metabolic disorders, liver disorders, heart disorders, etc.

2 Kinetic Models

One way to describe metabolic processes is by a kinetic model. Such a model describes the dynamics of metabolic behavior over time. It incorporates metabolite concentrations, enzyme concentrations and enzyme activity rates (which depend on these concentrations), and is solved using a set of differential equations.

Currently, such models are not feasible for large-scale networks, due to their high complexity and the requirement for specific enzyme activity rate data, which are difficult to measure. However, kinetic models were successfully used to model the activity of specific pathways such as glycolysis and histidine synthesis in yeast.
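To make the idea of a kinetic model concrete, the following minimal sketch integrates a single enzymatic step S -> P with forward Euler. The Michaelis-Menten rate law and all parameter values (vmax, km, the initial concentration, the step size) are assumptions chosen for illustration; they do not come from the lecture.

```python
def simulate_michaelis_menten(s0, vmax, km, dt, steps):
    """Euler-integrate dS/dt = -vmax*S/(km+S) and dP/dt = +vmax*S/(km+S).
    Returns a list of (time, substrate, product) tuples."""
    s, p = s0, 0.0
    trajectory = [(0.0, s, p)]
    for k in range(1, steps + 1):
        rate = vmax * s / (km + s)   # reaction rate depends on the concentration
        s -= rate * dt
        p += rate * dt
        trajectory.append((k * dt, s, p))
    return trajectory

# Hypothetical parameters, in arbitrary units.
traj = simulate_michaelis_menten(s0=10.0, vmax=1.0, km=0.5, dt=0.01, steps=2000)
t_end, s_end, p_end = traj[-1]
print(t_end, s_end, p_end)
```

Even this one-reaction model needs two kinetic parameters; a genome-scale network would need such parameters for every reaction, which is the practical obstacle mentioned above.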

3 Constraint Based Modeling

The constraint-based modeling approach is based on the assumption that organisms exist in particular environments that typically have scarce resources, and that over time the fit are more likely to survive than the less fit. To be fit for survival, an organism must satisfy a myriad of constraints, which limits the range of available phenotypes.

3.1 Constraints on cellular functions

The constraints can be divided into four categories:

• Physico-chemical constraints: Mass, energy and momentum must be conserved. Furthermore, biochemical reactions must result in a negative free-energy change to proceed in the forward direction.

• Topobiological constraints: The crowding of molecules inside cells leads to topobiological, or three-dimensional, constraints.

• Environmental constraints: Nutrient availability, pH, temperature, osmolarity, the availability of electron acceptors, etc.

• Regulatory constraints: Regulatory constraints differ from the three categories above as they are not "hard" constraints, but rather regulatory restraints. They are implemented by the cell in various ways, including the amount of gene products made (transcriptional and translational regulation) and their activity (enzyme regulation).

3.2 Metabolic Network Model Construction

When developing a metabolic network model we aim to define all of the metabolites and metabolic reactions in the system. This includes enzymes and their catalyzed reactions (reactants and products) and all transport mechanisms in the cell (diffusion through the membrane, diffusion through pores, and active transport).

The process of building such a model is iterative [1]. It begins with collecting and incorporating genomic, biochemical, and physiological data. This results in an in silico model which has some interpretive and predictive capabilities. However, because of incomplete knowledge of constraints and erroneous annotation, the initial model is able to represent only some functions of the organism correctly. Therefore, the next step is the iterative step, in which experiments are performed to test hypotheses that are based on the in silico analysis, and the model is updated accordingly. In addition to the experiments, the model is also updated using various bioinformatic methods and databases (see Figure 2).

Figure 2: Source: [1]. Iterative metabolic network model building. Involves the formulation of experimentally testable hypotheses based on the in silico analysis, collection of experimental data, and subsequent refinement of the models.

3.3 Mathematical Representation of Constraints

After the constraints have been recognized and defined, they need to be described mathematically. Once in a mathematical form, they can be used to perform an in silico analysis.

There are two fundamental types of constraints:

• Balances: Constraints that are associated with conserved quantities, such as energy, mass, redox potential and momentum, as well as with phenomena such as solvent capacity, electroneutrality and osmotic pressure.

• Bounds: Constraints that limit numerical ranges of individual variables and parameters such as concentrations, fluxes or kinetic constants.

The conservation of mass is an example of a balance constraint. Mass balance is expressed in terms of the flux and the stoichiometry of each reaction. Let S be the stoichiometric matrix containing the stoichiometry of all reactions in the network. S is an m × n matrix, where m is the number of metabolites and n is the number of reactions (fluxes) taking place within the network. The entry S_ij is the stoichiometric coefficient of metabolite i in reaction j (see Figure 3 for an example).
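As a small illustration of the definition above, the sketch below builds S for the glucokinase reaction of Figure 3 (glucose + ATP -> glucose-6-phosphate + ADP). The sign convention (negative for consumed metabolites, positive for produced ones) is the usual one and is assumed here; the helper name `stoichiometric_matrix` is hypothetical.

```python
def stoichiometric_matrix(metabolites, reactions):
    """Build S (m x n) as a list of rows; S[i][j] is the coefficient of
    metabolite i in reaction j (negative if consumed, positive if produced)."""
    S = [[0] * len(reactions) for _ in metabolites]
    row = {m: i for i, m in enumerate(metabolites)}
    for j, coeffs in enumerate(reactions):
        for metabolite, coeff in coeffs.items():
            S[row[metabolite]][j] = coeff
    return S

metabolites = ["glucose", "ATP", "G6P", "ADP"]
glucokinase = {"glucose": -1, "ATP": -1, "G6P": +1, "ADP": +1}

S = stoichiometric_matrix(metabolites, [glucokinase])
for name, row in zip(metabolites, S):
    print(name, row)
```

With a single reaction the matrix is one column; a genome-scale model has one row per metabolite and one column per reaction.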

Figure 3: The first step of the glycolysis pathway and the corresponding stoichiometric matrix. In this process, the enzyme glucokinase catalyzes the reaction in which glucose and ATP are converted into glucose-6-phosphate and ADP.

Figure 4: Solution space representation. (a) The mass balance constraint S · v = 0 limits the solution space to a subspace of R^n. (b) Thermodynamic constraints v_i > 0 further limit the solution space to a convex cone. (c) Capacity constraints v_i < v_max further limit the solution space to a bounded convex cone.

Let v be a vector such that v_i is the flux through reaction i. Mass balance constraints are formulated as:

$$ S \cdot v = 0 \tag{1} $$

Allowed ranges of fluxes are an example of bound constraints. Such ranges can be determined in several ways: experimentally (commonly by isotope labeling), thermodynamically (irreversibility of a reaction), or by capacity constraints (such as the maximum uptake rate of a transporter).

Mathematically speaking, we define a set of inequalities of the form:

$$ v_{min} < v_i < v_{max} \tag{2} $$

For irreversible reactions, v_min = 0. Specific upper limits v_max, based on enzyme capacity measurements, are generally imposed on reactions.

Taken together, the balances and bounds, expressed as linear equations and inequalities, define a space of feasible flux distributions, which is a polytope in a high-dimensional space. All allowable network states are contained in this space (Figure 4).
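Membership in this polytope is easy to test for a given flux vector: check Eq. 1 and then Eq. 2. The sketch below does so for a hypothetical two-metabolite toy network (an uptake flux, two routes from A to B with different yields, and a growth flux). The network, the bounds and the function names are assumptions made for illustration, and the bounds are treated as closed intervals, as is usual in LP practice.

```python
def mat_vec(S, v):
    """Multiply the stoichiometric matrix S (list of rows) by the flux vector v."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]

def is_feasible(S, v, bounds, tol=1e-9):
    """True if S.v = 0 (steady state) and every flux lies within its bounds."""
    balanced = all(abs(r) <= tol for r in mat_vec(S, v))
    bounded = all(lo - tol <= v_i <= hi + tol for v_i, (lo, hi) in zip(v, bounds))
    return balanced and bounded

# Hypothetical toy network.
# Columns: uptake (-> A), v1 (A -> B), v2 (A -> 0.5 B), growth (B ->)
S = [
    [1, -1, -1.0, 0],   # metabolite A
    [0, 1, 0.5, -1],    # metabolite B
]
bounds = [(0, 10), (0, 6), (0, 10), (0, 100)]

print(is_feasible(S, [10, 6, 4, 8], bounds))     # balanced and within bounds
print(is_feasible(S, [10, 7, 3, 8.5], bounds))   # balanced, but v1 exceeds its capacity
print(is_feasible(S, [10, 6, 4, 7], bounds))     # within bounds, but B accumulates
```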

In recent years, many in silico methods have been developed to explore the solution space polytope. These methods can be classified into two categories: finding optimal states according to some criteria, and exploring the phenotypic solution space. The following sections describe these methods.

4 Exploring the Convex Solution Space

4.1 Single Optima: Flux Balance Analysis (FBA)

FBA is a constraint-based modeling approach that assumes that under any given environmental condition, the organism will reach a steady state in which there is no accumulation or depletion of metabolites in the metabolic network. Hence, the rate of production of each metabolite in the network must equal its rate of consumption. This results in a mass balance constraint, as explained in the previous section.

Once the solution space has been defined, FBA can be used to find optimal metabolic states with respect to some objective function [8]. The objective function represents a probable physiological function, and is defined in the context of the studied organism. It can aim to maximize biomass production, minimize ATP production, minimize nutrient uptake rate, or optimize any other estimated function.

Maximization of biomass production has been shown to be consistent with experimental data and to approximate them best. Biomass production is essentially the production of the molecules that are needed for growth. Biomass is composed of a set of molecules in specific concentrations, such as amino acids, ribonucleotides, deoxyribonucleotides, phospholipids, lipids, fatty acids, etc.

Through our objective function we wish to maximize the production of these molecules and their removal from the network (by adding a new virtual reaction) in ratios that correspond to experimentally evaluated concentrations (see Figure 5). In such a case the objective function takes a form of this kind:

$$ Z = 41.257\,V_{ATP} - 3.547\,V_{NADH} + 18.225\,V_{NADPH} + \ldots $$

Figure 5: Biomass production ratios. Experimentally evaluated concentrations of the different molecules needed for the cell's growth.

Hence our problem can be formulated as an LP problem (see Appendix A) in the following manner:

$$
\begin{aligned}
\max\ & v_{growth} \\
\text{s.t.}\ & S \cdot v = 0 \\
& \forall i:\ v_{min} < v_i < v_{max}
\end{aligned}
\tag{3}
$$
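A minimal sketch of Eq. 3 on a hypothetical toy network follows. The network has two routes from a nutrient A to a biomass precursor B: v1 (yield 1, capacity 6) and v2 (yield 0.5), with total uptake limited to 10. Steady state gives uptake = v1 + v2 and growth = v1 + 0.5 v2, so only v1 and v2 remain free and the LP is two-dimensional. Instead of a real LP solver, the sketch enumerates the corners of the feasible region, which is only practical for such tiny problems; the network, the numbers and the helper `solve_lp_2d` are all assumptions for illustration.

```python
from itertools import combinations

def solve_lp_2d(objective, constraints, tol=1e-9):
    """Maximize objective . (x, y) subject to a1*x + a2*y <= b for each
    (a1, a2, b), by enumerating corners of the feasible region.
    Assumes the region is bounded and non-empty. Returns (value, x, y)."""
    best = None
    for (a1, a2, b1), (c1, c2, b2) in combinations(constraints, 2):
        det = a1 * c2 - a2 * c1
        if abs(det) < tol:
            continue  # parallel constraints do not define a corner
        x = (b1 * c2 - a2 * b2) / det   # Cramer's rule
        y = (a1 * b2 - b1 * c1) / det
        if all(p * x + q * y <= r + tol for p, q, r in constraints):
            value = objective[0] * x + objective[1] * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best

# Free variables: x = v1, y = v2 (dependent fluxes eliminated via S.v = 0).
constraints = [
    (-1, 0, 0),    # v1 >= 0
    (0, -1, 0),    # v2 >= 0
    (1, 0, 6),     # v1 <= 6   (enzyme capacity)
    (1, 1, 10),    # uptake = v1 + v2 <= 10
]
growth_objective = (1, 0.5)  # growth = v1 + 0.5 * v2

growth, v1, v2 = solve_lp_2d(growth_objective, constraints)
print(growth, v1, v2)
```

The optimum uses the efficient route up to its capacity and sends the remaining uptake through the less efficient route. For networks of realistic size, a dedicated LP solver would replace the corner enumeration.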

It should be noted that under some conditions the behavior of cellular systems is incompatible with this objective, and other objective functions should be evaluated. Furthermore, this objective function can cause inaccuracies because the biomass composition varies between different organisms, different growth media and different growth rates.

To better understand FBA, its steps are explained in detail and illustrated through an example in Figure 6.

Figure 6: Source: [8]. Methodology for flux balance analysis. (a) A model system comprising three metabolites (A, B and C) with three reactions (internal fluxes, v_i, including one reversible reaction) and three exchange fluxes (b_i). (b) Mass balance equations accounting for all reactions and transport mechanisms and the corresponding matrix. At steady state, this reduces to S · v = 0. (c) The fluxes of the system are constrained on the basis of thermodynamics and experimental insights. (d) Optimization of the system with different objective functions (Z). Case I gives a single optimal point, whereas case II gives multiple optimal points lying along an edge.

A unique solution is not guaranteed for the LP problem. In fact, there can be numerous solutions that all yield the optimal objective function value. Several methodologies for analyzing such alternate optima are described in Section 4.3.

4.2 FBA applications

The effect of the transient behavior of the regulatory system after genetic perturbations on metabolism has been observed in a recent study of Escherichia coli adaptive evolution [19]. In this study, strains carrying deletions of metabolic genes were created and each strain was allowed to evolve in replicate, to yield a total of 50 evolved deletion strains. For each deletion strain, computationally predicted growth rates were compared with experimental growth rates determined at the beginning and end of adaptive evolution.

The computational model used in this study was a genome-scale model of E. coli containing 904 genes and 1327 reactions, reconstructed using the constraint-based approach from genomic annotation, biochemical stoichiometry, physiological data, and thermodynamics. FBA was used to calculate predicted growth rates for the strains containing a single gene deletion. Given a specific gene deletion, the metabolic reactions associated with that gene were deleted from the reaction list to computationally simulate a knockout strain. For direct comparison with experimental strains, experimentally determined substrate and oxygen uptake rates were used as input parameters for the computational simulations.

The results showed that in most cases the growth rate of the organism drops after a gene knockout and then gradually increases and converges to a near-optimal growth rate similar to that predicted by FBA; for 39 of the 50 strains (78%), the computationally predicted growth rate was within 10% of the experimentally determined growth rate at each strain's evolutionary endpoint.

4.3 Alternative Optima

Currently, there is no definite answer as to what the multiple FBA solutions represent. Several arguments try to address this question.

• Missing constraints: It is possible that missing constraints lead to solutions that do not represent plausible metabolic behaviors. There have been several attempts to address this problem by introducing new constraints into the system. Incorporation of additional constraints is reviewed below.

• Effect of exogenous factors: The metabolic space corresponds to growth in a medium under various external conditions that are beyond the scope of the model, such as stress or temperature.

• Heterogeneity within a population: The metabolic space represents heterogeneous metabolic behaviors of individuals within a population of genetically identical cells.

• Alternative evolutionary paths: The metabolic space represents different metabolic states attainable through different evolutionary paths.

These interpretations are obviously not mutually exclusive.

There are numerous methods for analyzing the solution space for alternative optima, as described in the following sections.

4.3.1 Basic Solutions Enumeration Using MILP

As mentioned above, there can be multiple flux distributions which satisfy the metabolic network constraints and attain the optimal objective value.

Basic solutions enumeration is an approach for automatically enumerating such distributions. It utilizes mixed integer linear programming (MILP), a type of LP problem in which some of the variables are continuous and some are discrete; this class of problems is NP-hard. MILP is used to find all of the extreme points of the convex polytope (the feasible solution space) that have identical optimum objective function values [17]. Such extreme points are referred to as alternate optima of the given LP problem.

Under this approach, new discrete variables are introduced into the problem; altering these variables forces the selection of a new basis. A basis corresponds to an extreme point in the solution space polytope, which is a candidate optimum point. The iterative form of the algorithm enables enumeration of the bases and is guaranteed to find all of the alternate optima.

It should be noted that the number of iterations may be exponential, since the number of extreme points may be exponential.

4.3.2 Flux Variability Analysis

This approach determines the maximum and minimum values of each flux that satisfy the constraints and optimize the objective function [15]. It does not identify all alternate optima, but rather the range of flux variability that is possible across the optimal solutions (Figure 7).

Figure 7: Source: [15]. A schematic describing the optimal solution space. The shaded area represents the region where the objective function can take on the same maximum value, and the circles represent the solutions that can be identified using the flux variability analysis.

The analysis begins with FBA, which finds flux distributions that correspond to optimal growth. Next, the optimal growth value obtained is used to calculate the range of variability that can exist for each reaction in the network due to alternate optimal solutions. This is accomplished by a series of LP problems. Each LP problem corresponds to a reaction in the network: the objective function maximizes or minimizes the flux through this reaction, and the constraints are the same as in the original problem, with a new equality constraint added so that only alternate optimal solutions are obtained (using the pre-calculated optimal growth value). This determines the feasible range of flux values for each reaction. The formalization of this problem takes the following form:

For each i, solve the following two LP problems:

$$
\begin{aligned}
\max / \min\ & v_i \\
\text{s.t.}\ & S \cdot v = 0 \\
& v_{growth} = Z_{obj} \\
& \forall j:\ v_{min} < v_j < v_{max}
\end{aligned}
\tag{4}
$$

Z_obj represents the maximal growth obtained via FBA, and is used as a constraint in order to restrict the analysis to optimal solutions with maximal growth.

Metabolic systems do not necessarily operate at the fully optimal state and may operate in a suboptimal mode, so the approach can be used to examine suboptimal solutions as well. As shown above, the analysis examines solutions with optimum growth by introducing a new equality constraint. This constraint can be relaxed and replaced by a constraint of the form v_growth ≥ c · Z_obj for some constant c (c should be close to 1). This relaxation enables exploration of the flux ranges of suboptimal solutions, which can contribute to understanding the solution space and its flexibility.
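The procedure can be sketched on a hypothetical toy problem with two free fluxes v1 and v2 that contribute equally to growth (growth = v1 + v2), with v1 ≤ 6, v2 ≤ 8 and v1 + v2 ≤ 10, so that the optimum is not unique. As before, the tiny LPs are solved by corner enumeration rather than by a real LP solver; the network, the numbers and the helper names are assumptions for illustration. The growth constraint is written as v_growth ≥ c · Z_obj, which coincides with the equality constraint of Eq. 4 when c = 1.

```python
from itertools import combinations

def solve_lp_2d(objective, constraints, tol=1e-9):
    """Maximize objective . (x, y) subject to a1*x + a2*y <= b, by corner
    enumeration. Assumes a bounded, non-empty region. Returns (value, x, y)."""
    best = None
    for (a1, a2, b1), (c1, c2, b2) in combinations(constraints, 2):
        det = a1 * c2 - a2 * c1
        if abs(det) < tol:
            continue  # parallel constraints do not define a corner
        x = (b1 * c2 - a2 * b2) / det
        y = (a1 * b2 - b1 * c1) / det
        if all(p * x + q * y <= r + tol for p, q, r in constraints):
            value = objective[0] * x + objective[1] * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best

def flux_variability(constraints, growth, c=1.0):
    """Return (Z_obj, [(min v1, max v1), (min v2, max v2)])."""
    z_obj = solve_lp_2d(growth, constraints)[0]          # step 1: plain FBA
    # growth >= c * Z_obj, written as -growth <= -c * Z_obj
    constrained = constraints + [(-growth[0], -growth[1], -c * z_obj)]
    ranges = []
    for unit in [(1, 0), (0, 1)]:                        # one pair of LPs per flux
        hi = solve_lp_2d(unit, constrained)[0]
        lo = -solve_lp_2d((-unit[0], -unit[1]), constrained)[0]
        ranges.append((lo, hi))
    return z_obj, ranges

constraints = [(-1, 0, 0), (0, -1, 0), (1, 0, 6), (0, 1, 8), (1, 1, 10)]
z_obj, optimal_ranges = flux_variability(constraints, growth=(1, 1))
_, relaxed_ranges = flux_variability(constraints, growth=(1, 1), c=0.9)
print(z_obj, optimal_ranges, relaxed_ranges)
```

At full optimality each flux can still vary over an interval, and relaxing c to 0.9 widens both intervals, which is the kind of flexibility the analysis is meant to expose.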

4.3.3 Hit And Run Sampling

Hit and run sampling is a method for analyzing the convex solution space [16]. This method produces a sequence of random points corresponding to a random walk inside the solution space polytope.

It starts from an arbitrary solution and proceeds by iteratively taking steps in a random direction. Once the walls of the polytope are hit, the walk bounces off again in a random direction (see Figure 8 for an illustration). The method does not guarantee uniformly distributed random points, but it generates an approximation of such a distribution.
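The following is a minimal sketch of the textbook hit-and-run step, which differs slightly from the bouncing description above: from the current point it draws a random direction, computes the chord of the polytope along that direction, and jumps to a uniformly chosen point on the chord. The polytope {x : A x ≤ b} used here is a hypothetical two-dimensional example (0 ≤ v1 ≤ 6, v2 ≥ 0, v1 + v2 ≤ 10), and the function name and parameters are assumptions for illustration.

```python
import random

def hit_and_run(A, b, x0, n_samples, seed=0):
    """Sample points from the bounded polytope {x : A x <= b}, starting
    from an interior point x0."""
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    for _ in range(n_samples):
        d = [rng.gauss(0.0, 1.0) for _ in x]           # random direction
        t_lo, t_hi = float("-inf"), float("inf")
        for row, b_i in zip(A, b):
            along = sum(a * d_k for a, d_k in zip(row, d))
            slack = b_i - sum(a * x_k for a, x_k in zip(row, x))
            if abs(along) < 1e-12:
                continue                                # moving parallel to this wall
            if along > 0:
                t_hi = min(t_hi, slack / along)         # wall hit when moving forward
            else:
                t_lo = max(t_lo, slack / along)         # wall hit when moving backward
        t = rng.uniform(t_lo, t_hi)                     # uniform point on the chord
        x = [x_k + t * d_k for x_k, d_k in zip(x, d)]
        samples.append(x)
    return samples

A = [[-1, 0], [0, -1], [1, 0], [1, 1]]
b = [0, 0, 6, 10]
samples = hit_and_run(A, b, x0=[1.0, 1.0], n_samples=1000)
print(samples[:3])
```

Consecutive samples are correlated, so in practice only every k-th point is usually kept; that thinning is omitted here for brevity.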

Figure 8: Hit and Run sampling. Based on a random walk inside the solution space polytope

4.3.4 Uniform random sampling

The problem of uniformly sampling a high-dimensional polytope is NP-hard. To overcome the computational difficulties, Monte Carlo sampling is used as a way to quantify the size of the steady-state solution space with respect to a reference space [18]. In Monte Carlo sampling, points are picked in the reference space randomly and uniformly, and the number of hits in the subspace (versus the total number of points) is determined. The reference space is a hypercube surrounding the polytope, whose edges are defined by each reaction's V_max. Only the feasible points, i.e., flux vectors that meet the constraints of the model, are kept (see Figure 9 for an illustration). It should be noted that in higher dimensions this becomes problematic, as the volume of the solution space may be small compared with the reference hypercube.

If Q is any polytope that contains our polytope P, we can estimate the volume of P by:

$$ \mathrm{vol}(P) \approx \mathrm{vol}(Q)\,\frac{M}{N} \tag{5} $$

where M is the number of times a point was in P and N is the total number of random points generated. Note that Q is chosen such that its volume can easily be computed. For the hyper-rectangle Q = [0, V_max], we get:

$$ \mathrm{vol}(Q) = V_{max,1} \times V_{max,2} \times \cdots \times V_{max,n} $$
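Eq. 5 translates almost directly into code. The sketch below estimates the area of a hypothetical two-dimensional polytope P (0 ≤ v1 ≤ 6, 0 ≤ v2 ≤ 10, v1 + v2 ≤ 10) inside the box Q = [0, 6] × [0, 10]. The polytope, the sample size and the function names are assumptions for illustration; the exact area of this P is 42, which gives a reference for the estimate.

```python
import random

def estimate_volume(in_polytope, v_max, n_points, seed=0):
    """Monte Carlo estimate of vol(P) = vol(Q) * M / N, where Q is the box
    [0, v_max[0]] x ... x [0, v_max[n-1]] and M counts the points inside P."""
    rng = random.Random(seed)
    vol_q = 1.0
    for bound in v_max:
        vol_q *= bound
    hits = 0
    for _ in range(n_points):
        point = [rng.uniform(0.0, bound) for bound in v_max]
        if in_polytope(point):
            hits += 1
    return vol_q * hits / n_points

def in_toy_polytope(v):
    # Box constraints hold by construction; only the coupling constraint is left.
    return v[0] + v[1] <= 10.0

estimate = estimate_volume(in_toy_polytope, v_max=[6.0, 10.0], n_points=20000)
print(estimate)
```

In two dimensions most sampled points fall inside P. In a genome-scale flux space the hit fraction M/N can become vanishingly small, which is the difficulty noted above.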

5 Topological Methods

5.1 Convex basis vectors: Elementary modes and extreme pathways

The solution space can be spanned by a set of basis vectors, such that any vector within the convex solution cone can be represented as a nonnegative linear combination of the generating vectors of the cone (i.e., the basis vectors), which correspond to its edges [4] [3]. In other words, any steady-state flux pattern of a system can be decomposed as a linear superposition of modes, which correspond to steady-state pathways through the system.

Elementary modes are a unique set of non-decomposable reaction sets that characterizes the convex solution cone. Extreme pathways are a minimal and unique set of pathways (a subset of the elementary modes) that defines the edges of the convex solution cone (see Figure 10). A useful construct for describing these network-based metabolic pathways is presented in Figure 11, showing a metabolic network, its stoichiometric matrix and the corresponding set of elementary modes and extreme pathways.

Extreme pathways and elementary modes are useful for studying microorganisms. They have been used to study the inherent redundancy in metabolic networks, showing that there is more redundancy in the production of amino acids by the H. influenzae metabolic network than in the H. pylori metabolic network [12]. The robustness of an organism to gene deletions and changes in gene expression has also been studied using extreme pathways and elementary modes in central E. coli metabolism [5], and enzyme subsets (or correlated reaction subsets) have been calculated using network-based pathways [7]. In addition, extreme pathways and elementary modes have been used to assign functions to orphan genes based on metabolic data [6] and to design strains [14].

Figure 9: Uniform random sampling. The solution space polytope is surrounded by a hypercube. Points are randomly picked from the hypercube.

Figure 10: Source: [4]. Elementary modes and extreme pathways. (a) Simple reaction network. (b) For this reaction network there are three extreme pathways and (c) four elementary modes. Note that each set of the extreme pathways and elementary modes is unique.

6 Altering Phenotypic Potential: Gene Knockouts

A gene knockout is translated to a deletion of a reaction in the metabolic network. The deletion can result in a reduced volume of achievable metabolic states.

6.1 Minimization Of Metabolic Adjustment (MOMA)

Minimization Of Metabolic Adjustment (MOMA) is an algorithm for predicting metabolic flux distributions after a gene knockout [4]. Although the assumption of optimality for a wild-type organism may be justifiable, the same argument may not be valid for genetically engineered knockouts that were not exposed to long-term evolutionary pressure.

The method is based on the assumption that after a knockout, metabolic fluxes undergo a minimal change with respect to the flux configuration of the wild type. MOMA employs quadratic programming to identify the point in flux space that is closest to the wild-type point, based on the conjecture that the mutant initially remains as close as possible to the wild-type optimum in terms of flux values.

Mathematically, we denote by Φ_j the feasible solution space after a knockout of the enzyme catalyzing reaction j. The goal is to find the vector x ∈ Φ_j such that the Euclidean distance between x and w is minimized (see Eq. 6):

$$ \min D(w, x) = \min \sqrt{\sum_{i=1}^{N} (w_i - x_i)^2} \tag{6} $$

where N is the total number of fluxes and w is the optimal flux vector of the wild type. MOMA is formalized similarly to FBA, yet the objective function does not explicitly depend on biomass production.

Figure 11: Source: [4]. Network-based pathways mathematically characterize the functions of a metabolic network. (a) A reaction network is created from diverse data sets by defining all the different reactions in an organism. (b) These data are then used to create a stoichiometric matrix. With the matrix, extreme pathway and elementary mode analysis can be used to generate a unique set of pathways. (c) In a high-dimensional flux space in which each axis corresponds to the flux through a given reaction, the network-based pathways define the limits to the possible steady-state flux distributions that a network can achieve. All possible flux distributions of a metabolic network lie within the cone circumscribed by the pathways.
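To see why Eq. 6 is a quadratic programming problem, note that the square root is monotone, so minimizing D(w, x) is equivalent to minimizing its square. A short derivation in the notation above follows; the N × N identity matrix I is introduced here only for the derivation.

```latex
\begin{aligned}
\arg\min_{x \in \Phi_j} D(w, x)
  &= \arg\min_{x \in \Phi_j} \sum_{i=1}^{N} (w_i - x_i)^2 \\
  &= \arg\min_{x \in \Phi_j} \left( x^{T} x - 2\, w^{T} x + w^{T} w \right) \\
  &= \arg\min_{x \in \Phi_j} \left( \tfrac{1}{2}\, x^{T} I\, x - w^{T} x \right)
\end{aligned}
```

The last step drops the constant term w^T w and scales by 1/2, neither of which changes the minimizer. Assuming Φ_j is described by linear constraints, as in FBA (mass balance, flux bounds, and a zero flux for the knocked-out reaction), this is a convex quadratic program with a linear term determined by the wild-type flux vector w.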

6.2 Regulatory On-Off Minimization (ROOM)

Regulatory On-Off Minimization (ROOM) is also a CBM algorithm for predicting the behavior of metabolic networks in response to gene knockouts [20]. The method is based on the assumption that the organism adapts to the new situation by minimizing the set of regulatory changes, and adjusts to a steady state that is similar to the steady state of the wild type. It therefore implicitly minimizes the number of regulatory changes by minimizing the total number of significant flux changes. ROOM uses mixed integer linear programming (MILP), and can be formalized as:

$$
\begin{aligned}
\min\ & \sum_{i=1}^{N} y_i \\
\text{s.t.}\ & S \cdot v = 0 \\
& v_i - y_i (v_{max,i} - w_i) \le w_i, \quad 1 \le i \le N \\
& v_i - y_i (v_{min,i} - w_i) \ge w_i, \quad 1 \le i \le N \\
& v_j = 0, \quad j \in A \\
& y_i \in \{0, 1\}
\end{aligned}
\tag{7}
$$

where for each flux i, y_i = 1 for a significant flux change in v_i and y_i = 0 otherwise. A is the set of reactions associated with the deleted genes, w is the wild-type flux vector as in MOMA, and v and S are the flux vector of the knockout and the stoichiometric matrix.

Both MOMA and ROOM try to find a flux distribution that is closest to that of the wild type, without explicitly maximizing the growth rate. However, since ROOM minimizes the number of significant flux changes, it implicitly attempts to maintain the maximal possible growth rate of the wild type: a significant change in growth, which entails numerous changes in fluxes, is unlikely. A biological interpretation may be that the regulatory mechanisms that evolved in the cell to minimize flux changes after genetic perturbations may also result in maximizing growth (see the example in Figure 12).
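The difference between the two objectives can be illustrated by scoring two candidate mutant flux vectors against the same wild type. The sketch does not solve either optimization problem: the three vectors are hypothetical numbers, assumed feasible, and a simple tolerance stands in for the notion of a "significant" change.

```python
import math

def moma_score(w, v):
    """Euclidean distance between wild-type and mutant fluxes (MOMA objective)."""
    return math.sqrt(sum((w_i - v_i) ** 2 for w_i, v_i in zip(w, v)))

def room_score(w, v, tol=1e-9):
    """Number of fluxes that differ from the wild type (ROOM objective, sum of y_i)."""
    return sum(1 for w_i, v_i in zip(w, v) if abs(w_i - v_i) > tol)

# Hypothetical flux vectors; the third reaction is knocked out in both mutants.
wild_type = [10.0, 5.0, 5.0, 0.0, 0.0, 10.0]
rerouted = [10.0, 5.0, 0.0, 5.0, 5.0, 10.0]   # few, large changes along a bypass
spread = [8.0, 4.0, 0.0, 1.0, 1.0, 8.0]       # many small changes, reduced output

for name, v in [("rerouted", rerouted), ("spread", spread)]:
    print(name, round(moma_score(wild_type, v), 3), room_score(wild_type, v))
```

The Euclidean objective favors the vector with many small adjustments, whereas the on-off count favors the vector that reroutes flux through a short bypass and leaves the remaining fluxes, including the output flux, unchanged.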

6.3 A Comparison between ROOM and MOMA

Predictions of metabolic fluxes for five different E. coli knockouts were calculated by ROOM, MOMA and FBA and compared with experimental measurements [20].

All measured fluxes belong to the central carbon metabolism of E. coli and were determined empirically by combining NMR spectroscopy in 13C labeling experiments with physiological data measurements. Figure 13 shows the central carbon metabolism of E. coli and displays the reactions measured experimentally.

Figure 14(a) presents the results of the analysis using FBA, MOMA and ROOM. In eight of nine knockout experiments, ROOM's flux predictions are either equal to or more accurate than those of the other two methods. The low number of significant flux changes in ROOM's predictions is in agreement with biological data showing a small number of regulatory changes in the adapted steady-state condition after a knockout. The high number of significant flux changes in MOMA's predictions suggests that MOMA may be more suitable for predicting transient post-perturbation states, in agreement with measured large-scale transient changes in expression patterns.

Figure 14(b) shows the relative errors in growth rate predictions. ROOM and FBA provide significantly more accurate predictions than MOMA. Overall, the work of [20] suggests that MOMA is more appropriate for predicting transient growth rates in response to genetic perturbations, whereas ROOM and FBA better predict the final growth rate achieved after the adaptation process.

Figure 12: Source: [20]. An example network. (a) Stoichiometric matrix. (b) A given flux distribution for the wild-type intact network. The flux through b2 represents growth rate. (c) MOMA's prediction for the knocked-out network following the knockout of reaction v6. (d) ROOM's prediction for the knocked-out network. ROOM finds changes in flux only along a short alternative pathway through v5 and v4, preserving the optimal growth rate of the wild-type strain.

Figure 13: Source: [20]. A schematic representation of the central carbon metabolism of E. coli. The crossed-out reactions are those whose catalyzing enzymes were knocked out.

Figure 14: Source: [20]. Flux and growth-rate comparison among FBA, MOMA, and ROOM for five knocked-out organisms. (a) Pearson correlations between experimental fluxes and predictions. (b) Relative errors in growth rate predictions, calculated by subtracting the experimentally measured growth rate from the predicted growth rate and dividing by the experimentally measured growth rate.

7 Application of additional constraints

7.1 Regulatory constraints

Regulatory constraints differ from the rigid physico-chemical constraints in two ways. First, they are self-imposed by the organism and presumably represent the result of an optimal evolutionary process. Second, they are time-dependent, in that the external and internal environment at a given time point determines transcriptional activity.

As a result, the effects of transcriptional regulation can be treated as temporary constraints on the metabolic system. These constraints reduce the size of the solution space, change its shape from one environmental condition to another, and may help to avoid solutions that do not represent plausible metabolic behaviors [9][11][10].

Recently, a framework has been described for creating a combined metabolic/regulatory network (Figure 16) and for generating time profiles of the flux distribution. In this framework the transcriptional regulatory structure is described using Boolean logic equations.

Each reaction has a Boolean formula attached to it, which corresponds to the regulation status that permits this reaction (for instance, the presence of regulator A and the absence of regulator B). In addition, the regulators themselves have Boolean formulas that govern their presence (for instance, the presence of glucose, oxygen, etc.).
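The Boolean scheme can be sketched as two layers of rules: the environment determines which regulators are present, and the regulators determine which reactions are permitted. Reactions that are not permitted get their flux bounds set to zero before the optimization. The regulators A and B, the reactions R1 to R3 and all rules below are hypothetical examples, not taken from a published model.

```python
def regulator_states(env):
    """Hypothetical regulators: A is present when glucose is available,
    B is present when oxygen is absent."""
    return {
        "A": env["glucose"],
        "B": not env["oxygen"],
    }

# Reaction name -> Boolean rule over regulator states (hypothetical rules).
REACTION_RULES = {
    "R1": lambda reg: reg["A"] and not reg["B"],
    "R2": lambda reg: reg["B"],
    "R3": lambda reg: True,   # unregulated reaction
}

def regulated_bounds(bounds, env):
    """Set the flux bounds of every reaction that is not permitted to (0, 0)."""
    reg = regulator_states(env)
    return {
        name: (lo, hi) if REACTION_RULES[name](reg) else (0.0, 0.0)
        for name, (lo, hi) in bounds.items()
    }

bounds = {"R1": (0.0, 10.0), "R2": (0.0, 5.0), "R3": (-10.0, 10.0)}
aerobic = regulated_bounds(bounds, {"glucose": True, "oxygen": True})
anaerobic = regulated_bounds(bounds, {"glucose": True, "oxygen": False})
print(aerobic)
print(anaerobic)
```

Because the rules only tighten bounds, the regulated solution space is always a subset of the unregulated one.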

Figure 15: Source: [9]. In silico modeling of metabolism and transcriptional regulation using the constraints-based approach. (A) the constraints based approach to metabolic modeling. The metabolic genotype isdefined from the known genes in the genome, as identified in metabolic databases and in the literature.Once the metabolic network has been defined, known invariant constraints that the network must obey areapplied to the cell, enabling the network to be described geometrically as a closed solution space. Flux-balance analysis can be used to identify particular optimal solutions (such as optimization of growth) withinthe space (blue point), which represent possible behaviors of the cell (4). Assuming that metabolism is in aquasi-steady state relative to cell growth, the dynamic behavior of the cell may be simulated using numericalintegration and flux-balance analysis at each time step (5, 6). As shown in (B), transcriptional regulationreduces the steady-state solution space.


Once such a model is completed, it is possible to simulate the dynamic growth of a cell while incorporating such temporary regulatory constraints (Figure 16). This is achieved by dividing the experimental time into small time steps. Each step is optimized using FBA, and the results are used to determine the input for the next step (by calculating the new extracellular concentrations and the regulatory state for the next step).
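The time-stepping procedure can be sketched on a toy network. Everything below is an assumption made for illustration: the two-reaction network, the parameter values, the simple Euler update, and the single regulatory rule. It is not the model of [9]. It uses `scipy.optimize.linprog` for the FBA step.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: uptake (S_ext -> M) and growth (M -> biomass).
S = np.array([[1.0, -1.0]])   # steady-state row for the internal metabolite M
V_MAX = 10.0                  # assumed maximal uptake rate [mmol/gDW/h]
YIELD = 0.1                   # assumed biomass yield [gDW/mmol]

def fba_step(uptake_ub):
    """Maximize the growth flux subject to S v = 0 and the flux bounds."""
    c = [0.0, -1.0]           # linprog minimizes, so the growth flux is negated
    res = linprog(c, A_eq=S, b_eq=[0.0],
                  bounds=[(0.0, uptake_ub), (0.0, None)])
    return res.x

def simulate(substrate, biomass, dt=0.1, steps=50):
    """Alternate FBA and concentration updates over small time steps."""
    history = []
    for _ in range(steps):
        # Regulatory constraint: the uptake gene is on only if substrate is present.
        gene_on = substrate > 1e-9
        # Uptake cannot exceed what is available during this time step.
        ub = min(V_MAX, substrate / (biomass * dt)) if gene_on else 0.0
        v_up, v_growth = fba_step(ub)
        mu = YIELD * v_growth                         # specific growth rate [1/h]
        substrate = max(substrate - v_up * biomass * dt, 0.0)
        biomass = biomass + mu * biomass * dt         # explicit Euler update
        history.append((substrate, biomass))
    return history
```

For example, `simulate(10.0, 0.01)` returns a list of (substrate, biomass) pairs. Growth stops once the substrate is exhausted, because the regulatory rule then closes the uptake reaction.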

Figure 16: Source: [9]. Combined regulatory/metabolic network for central metabolism in E. coli. All of the metabolic genes considered are shown. The genes that are regulated are indicated by the color code shown in the legend. Genes or reactions regulated by multiple regulatory proteins or molecules are shown with multiple arrows.

7.2 Energy balance analysis

According to thermodynamic principles, each reaction in a biochemical reaction network must have a negative Gibbs free-energy change (bound constraint) in order to proceed, and the free-energy changes around a biochemical loop must sum to zero (balance constraint). Together, these two constraints force biochemical loops to have zero net flux. Imposing them results in a nonlinear problem and in a non-convex solution space [2].
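The interplay of the two constraints can be illustrated with a small feasibility check. This is a simplified sketch and not the formulation of [2]. It assumes that each reaction's free-energy change is a difference of metabolite potentials, written here as the hypothetical vector `mu`, so the balance constraint holds automatically. It then asks whether potentials exist that give every active reaction a negative free-energy change in its direction of flux.

```python
import numpy as np
from scipy.optimize import linprog

def thermodynamically_feasible(S, v, tol=1e-9):
    """Check whether potentials mu exist with sign(v_j) * (S^T mu)_j < 0
    for every reaction j that carries flux.

    Strict inequalities are replaced by '<= -1', which loses no generality
    because mu can be rescaled freely.
    """
    S = np.asarray(S, dtype=float)
    v = np.asarray(v, dtype=float)
    active = np.abs(v) > tol
    if not active.any():
        return True
    signs = np.sign(v[active])
    A_ub = signs[:, None] * S.T[active]        # one row per active reaction
    b_ub = -np.ones(int(active.sum()))
    res = linprog(np.zeros(S.shape[0]), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * S.shape[0])
    return bool(res.success)

# Closed loop A -> B -> C -> A (rows: A, B, C; columns: r1, r2, r3).
S_loop = [[-1,  0,  1],
          [ 1, -1,  0],
          [ 0,  1, -1]]
```

With this matrix, the circulating flux pattern `[1, 1, 1]` is rejected, while patterns that do not close the loop, such as `[1, 1, 0]`, are accepted. The check looks only at flux directions, not at mass balance.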


A Linear Programming - Overview

LP problems are optimization problems that minimize or maximize some linear cost function (the objective function) given some set of linear constraints. The general form of an LP problem is as follows:

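$$
\min_{x_1,\ldots,x_n} \;/\; \max_{x_1,\ldots,x_n} \; \sum_{i=1}^{n} c_i x_i
\qquad \text{s.t.} \qquad
\forall i = 1,\ldots,m: \;\; \sum_{j=1}^{n} a_{ij} x_j \;\{\le, \ge, =\}\; b_i \tag{8}
$$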
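As a concrete instance of this general form, the following sketch solves a small two-variable LP with `scipy.optimize.linprog`. The coefficients are made up for illustration. Since `linprog` minimizes and accepts only "≤" rows in `A_ub`, the maximization objective and the "≥" constraint are both negated.

```python
from scipy.optimize import linprog

# Hypothetical example:
#   maximize   x1 + 2*x2
#   subject to x1 + x2 <= 4,  x1 <= 3,  x2 <= 2,  x1 + x2 >= 1,  x1, x2 >= 0
c = [-1.0, -2.0]                 # negated objective (linprog minimizes)
A_ub = [[1, 1],
        [1, 0],
        [0, 1],
        [-1, -1]]                # the ">=" row is multiplied by -1
b_ub = [4, 3, 2, -1]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
optimum = -res.fun               # undo the sign flip of the objective
print(res.x, optimum)
```

The optimum is attained at the corner x = (2, 2), where the objective value is 6.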

LP problems can be visualized geometrically. Each inequality represents a half-space. The intersection of these half-spaces is the region that satisfies all constraints and is called the feasible region. When an optimum exists, at least one optimal point lies at a corner of the polyhedron formed by the intersection of all half-spaces. These corners are also referred to as extreme points.

The feasible region may be unbounded, in which case the objective function may be unbounded as well. The feasible region may also be empty, in which case the problem has no solution. Finally, the level sets of the objective function may be parallel to a face of the feasible region, in which case there are infinitely many optimal solutions (see Figure 17 for an example of an LP problem and its solutions).

There are several different algorithms for solving LP problems:

• Simplex (not necessarily polynomial)

• Ellipsoid (polynomial)

• Interior Point (polynomial)

– Projective Method

– Affine Method

– Logarithmic Barrier Method
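Two of the algorithm families listed above can be compared on the same problem. The sketch below assumes SciPy's HiGHS-based `linprog` methods, `"highs-ds"` (dual simplex) and `"highs-ipm"` (interior point); SciPy does not provide an ellipsoid solver. The toy problem is hypothetical. The last two calls show how an infeasible and an unbounded problem are reported.

```python
from scipy.optimize import linprog

# Hypothetical toy LP: maximize x1 + 2*x2 (negated, since linprog minimizes).
c = [-1.0, -2.0]
A_ub = [[1, 1], [1, 0], [0, 1]]
b_ub = [4, 3, 2]
bounds = [(0, None), (0, None)]

results = {}
for method in ("highs-ds", "highs-ipm"):
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method=method)
    results[method] = (res.x, -res.fun)   # store the point and the maximized value

# Empty feasible region: x <= 1 and x >= 2 cannot both hold.
infeasible = linprog([1.0], A_ub=[[1.0], [-1.0]], b_ub=[1.0, -2.0],
                     bounds=[(None, None)])

# Unbounded objective: maximize x subject only to x >= 0.
unbounded = linprog([-1.0], A_ub=[[-1.0]], b_ub=[0.0], bounds=[(None, None)])

print(results)
print(infeasible.message)
print(unbounded.message)
```

Both methods should agree on the optimal value. For the two degenerate problems, `success` is false and `message` describes the reason.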

The simplex algorithm is one of the earliest methods for solving LP problems. The basic simplex algorithm is quite intuitive. The idea behind it is that the optimum of the objective function lies at a vertex of the polytope. Hence, all that needs to be done is to traverse the extreme points of the LP problem and choose the one that performs best on the objective function. At first, an arbitrary corner is chosen (advanced simplex algorithms choose the initial corner wisely). The algorithm then checks how the neighbors of the current corner perform on the objective function. If some neighbors perform better, it moves to the best one and repeats; if the current corner performs at least as well as all its neighbors, then it optimizes the objective function. The main observation that the simplex algorithm exploits is that, in terms of neighboring corners and their objective function values, a local minimum/maximum is also a global minimum/maximum.

Note that the number of corners of a given LP problem can be exponential in the number of variables and constraints. Hence, the algorithm may have an exponential running time. Despite this, many improvements over the basic algorithm have been developed; in many cases simplex variants perform better than polynomial-time LP solvers, and they are very popular in practice.
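The claim that the optimum lies at a corner can be checked by brute force on a small two-variable problem. The sketch below is not the simplex algorithm. It enumerates all corners by intersecting pairs of constraint boundaries and evaluates the objective at each one. The number of candidate intersections grows combinatorially with problem size, which is why simplex walks between neighboring corners instead. The function names and the example problem are hypothetical.

```python
import itertools
import numpy as np

def enumerate_vertices(A, b, tol=1e-9):
    """Corners of {x in R^2 : A x <= b}, found by intersecting constraint pairs."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    vertices = []
    for i, j in itertools.combinations(range(len(b)), 2):
        M = A[[i, j]]
        if abs(np.linalg.det(M)) < tol:
            continue                                  # parallel boundaries
        x = np.linalg.solve(M, b[[i, j]])             # intersection point
        feasible = np.all(A @ x <= b + tol)
        if feasible and not any(np.allclose(x, v) for v in vertices):
            vertices.append(x)
    return vertices

def best_vertex(c, A, b):
    """Return the corner that maximizes c . x."""
    return max(enumerate_vertices(A, b), key=lambda x: float(np.dot(c, x)))

# x1 + x2 <= 4, x1 <= 3, x2 <= 2, x1 >= 0, x2 >= 0
A = [[1, 1], [1, 0], [0, 1], [-1, 0], [0, -1]]
b = [4, 3, 2, 0, 0]
print(best_vertex([1, 2], A, b))
```

This feasible region has five corners, and the objective x1 + 2 x2 is maximized at the corner (2, 2).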


Figure 17: An example LP problem and its corresponding solutions. (a) The LP problem comprises two variables, v1 and v2, and three constraints. The three diagrams show the geometric interpretation of each of the constraints. (b) The intersection (upper diagram) of the three regions defined by the constraints forms the feasible solution space (lower diagram). (c) Each of the diagrams refers to a different objective function. In the upper diagram the objective function is parallel to an edge of the feasible region, which results in an infinite number of optimal solutions. In the lower diagram the objective function reaches its optimum at one of the corners of the feasible solution space, which results in a single optimal solution.


References

[1] Palsson B. The challenges of in silico biology. Nature Biotechnology, 18:1147–1150, 2000.

[2] Beard D.A., Liang S.D., and Qian H. Energy balance for analysis of complex metabolic networks. Biophys. J., 83:79–86, 2002.

[3] Schilling C.H., Schuster S., Palsson B.O., and Heinrich R. Metabolic pathway analysis: Basic concepts and scientific applications in the post-genomic era. Biotechnol. Prog., 15:296–303, 1999.

[4] Papin J.A. et al. Metabolic pathways in the post-genome era. Trends in Biochemical Sciences, 28(5):250–258, 2003.

[5] Stelling J. et al. Metabolic network structure determines key aspects of functionality and regulation. Nature, 420:190–193, 2002.

[6] Forster J., Gombert A.K., and Nielsen J. A functional genomics approach using metabolomics and in silico pathway analysis. Biotechnol. Bioeng., 79:703–712, 2002.

[7] Papin J.A., Price N.D., and Palsson B.O. Extreme pathway lengths and reaction participation in genome-scale metabolic networks. Genome Res., 12:1889–1900, 2002.

[8] Kauffman K.J., Prakash P., and Edwards J.S. Advances in flux balance analysis. Current Opinion in Biotechnology, 14:491–496, 2003.

[9] Covert M.W. and Palsson B.O. Transcriptional regulation in constraints-based metabolic models of Escherichia coli. The Journal of Biological Chemistry, 277(31):28058–28064, 2002.

[10] Covert M.W. and Palsson B.O. Constraints-based models: Regulation of gene expression reduces the steady-state solution space. J. Theor. Biol., 221:309–325, 2003.

[11] Covert M.W., Schilling C.H., and Palsson B.O. Regulation of gene expression in flux balance models of metabolism. J. Theor. Biol., 213:73–88, 2001.

[12] Price N.D., Papin J.A., and Palsson B.O. Determination of redundancy and systems properties of the metabolic network of Helicobacter pylori using genome-scale extreme pathway analysis. Genome Res., 12:760–769, 2002.

[13] Price N.D., Reed J.L., and Palsson B.O. Genome-scale models of microbial cells: evaluating the consequences of constraints. Nature Reviews Microbiology, 2:886–897, 2004.

[14] Carlson R., Fell D., and Srienc F. Metabolic pathway analysis of a recombinant yeast for rational strain development. Biotechnol. Bioeng., 79:121–134, 2002.

[15] Mahadevan R. and Schilling C.H. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metabolic Engineering, 5:264–276, 2003.

[16] Smith R.L. The hit-and-run sampler: a globally reaching Markov chain sampler for generating arbitrary multivariate distributions. Department of Industrial and Operations Engineering, The University of Michigan. Proceedings of the 1996 Winter Simulation Conference, 260–264, 1996.

[17] Lee S., Phalakornkule C., Domach M.M., and Grossmann I.E. Recursive MILP model for finding all the alternate optima in LP models for metabolic networks. Computers and Chemical Engineering, 24:711–716, 2000.


[18] Wiback S.J., Famili I., Greenberg H.J., and Palsson B.O. Monte Carlo sampling can be used to determine the size and shape of the steady-state flux space. Journal of Theoretical Biology, 228:437–447, 2004.

[19] Fong S.S. and Palsson B.O. Metabolic gene deletion strains of Escherichia coli evolve to computationally predicted growth phenotypes. Nature Genetics, 36:1056–1058, 2004.

[20] Shlomi T., Berkman O., and Ruppin E. Regulatory on/off minimization of metabolic flux changes after genetic perturbations. PNAS, 102(21):7695–7700, 2005.
