Model based design of a Saccharomyces cerevisiae
platform strain with improved tyrosine
production capabilities
by
Sarat Chandra Cautha
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Chemical Engineering and Applied Chemistry University of Toronto
© Copyright by Sarat Chandra Cautha 2012
Model based design of a Saccharomyces cerevisiae platform
strain with improved tyrosine production capabilities
M.A.Sc Thesis, 2012
Sarat Chandra Cautha, Department of Chemical Engineering and Applied Chemistry,
University of Toronto
Abstract
Large-scale production of plant secondary metabolites is of interest because of their application
in the production of many valuable products. Recent advances in the area of recombinant DNA
technology have made it possible to produce these valuable compounds using microbial routes.
The objective of this work was to design a platform strain of Saccharomyces cerevisiae with
improved intracellular tyrosine pools using computational modeling. This engineered yeast could
be used as a host for producing important plant secondary metabolites on an industrial scale. In
this study, a combination of steady-state and dynamic modeling methods was used for strain
design. Initial strain design was performed using steady-state modeling, and the predictions from
steady-state modeling were prioritized for experimental validation using dynamic modeling. The
final strategy proposed included deletion of PDC1, ZWF1 and ARO10, over-expression of ALD6,
and alleviation of tyrosine feedback inhibition in the shikimate pathway. Initial experiments for
validation of this strategy showed promising results.
Acknowledgement
First and foremost, I wish to thank my supervisor, Professor Radhakrishnan Mahadevan. He has
provided excellent guidance and support throughout the course of this project and has always had
my best interest at heart. I am very grateful to him.
I would like to thank our collaborator, Professor Vince Martin and his group at Concordia
University, for validating the model predictions and providing me the data for ARO10 gene
deletion.
I would also like to thank my committee members, Professor William Cluett and Professor
Alexander Yakunin for providing valuable feedback and suggestions.
Many thanks are due to past and current members of Biozone and Laboratory of Metabolic
Systems Engineering, especially Pratish Gawand, Nadeera Jayasinghe, Ilan Adler, Tahnimeh
Khazaei and Christopher Gowen for their friendship, support, encouragement and help over the
course of this project.
I would like to thank Genome Canada and Department of Chemical Engineering and Applied
Chemistry for providing the funding for this project.
Finally and most importantly, I wish to thank my parents and sister for their unconditional love
and unwavering support during the best and worst periods of my life. I would like to dedicate
this thesis to them.
Table of Contents
Abstract
Acknowledgement
Table of Contents
List of Figures
Chapter 1: Introduction
1.1 Advantages of producing chemicals using engineered microbes
1.2 Challenges in large-scale production of heterologous products
1.2.1 Effective expression of heterologous genes in microbial host
1.2.2 Supply of microbial precursors to heterologous pathway
1.3 Industrial importance of tyrosine
Chapter 2: Objective
Chapter 3: Literature Review
3.1 Tyrosine production
3.1.1 Biotechnology based methods for production of tyrosine
3.1.2 Tyrosine production using engineered microbes
3.2 Steady-state modeling
3.3 Ensemble Modeling
Chapter 4: Methods and Methodology
4.1 Steady-state modeling
4.1.1 Flux Balance Analysis (FBA)
4.1.2 Genome-scale metabolic models
4.1.3 Bi-level strain design algorithms
4.1.4 Limitations of steady-state modeling
4.2 Dynamic modeling
4.2.1 Ensemble modeling concept
4.2.2 Ensemble modeling framework
4.2.3 Screening the ensemble using literature data
4.2.4 Limitations of ensemble modeling
4.3 Methodology
Chapter 5: Results and Discussion
5.1 Steady-state modeling results
5.1.1 Predicted strategy
5.1.2 Experimental validation for ARO10 deletion
5.1.3 Need for Ensemble modeling
5.2 Ensemble modeling results
5.2.1 S. cerevisiae central model reconstruction
5.2.2 Development of ensemble of models
5.2.3 Screening the ensemble using data from literature
5.2.4 Prioritizing the targets for experiments using screened models
5.3 Final strategy to be verified experimentally
Chapter 6: Conclusions and Future Work
6.1 Conclusions
6.2 Future work
References
Appendix A
Appendix B
Appendix C
Appendix D
List of Figures
Figure 1.1 Schematic of the steps involved in production of Xanthohumol from tyrosine.
Figure 1.2 Schematic of some of the industrially important chemicals which require tyrosine as a precursor
Figure 3.1 A schematic of aromatic amino acid (shikimate) pathway up to the generation of Chorismate
Figure 3.2 Tyrosine production from Chorismate
Figure 3.3 A schematic of allosteric regulation in aromatic amino acid pathway of S. cerevisiae
Figure 4.1 Trade-off between steady-state and dynamic modeling methods
Figure 4.2 Schematic of conversion of cell network into under-determined mass balance constraints at steady-state
Figure 4.3 Schematic of how optimal flux distribution is calculated using FBA
Figure 4.4 Schematic of computational prediction of possible flux space for wild-type and Optknock suggested mutant
Figure 4.5 Flow chart depicting the steps involved in Ensemble Modeling
Figure 5.1 Steady-state strain design approach adapted in this study
Figure 5.2 Map of computationally predicted solution space for wild type iMM904 and the strategy designed from GDLS
Figure 5.3 Schematic of the deletions suggested by GDLS
Figure 5.4 Schematic of experimental modifications made while testing ARO10 deletion
Figure 5.5 Graphs showing accumulation of 4HPP and tyrosine in the four strains that we investigated
Figure 5.6 Schematic of reconstructed S. cerevisiae network used in this work along with calculated flux data
Figure 5.7 Model screening using data from succinic acid and glycerol over-producing strains
Figure 5.8 EM predicted PEP, E4P, DAHP accumulation and biomass formation rates when deletions suggested by GDLS are implemented
Figure 5.9 Graph showing the effect of ALD over-expression on growth rate of PDC-- and ΔZWF mutant
Figure 5.10 Schematic of the proposed final strategy for tyrosine over-producing strain
Chapter 1
Introduction
1.1 Advantages of producing chemicals using engineered microbes
Declining supplies of fossil fuels and increasing environmental problems are currently driving
scientists around the world to develop novel biotechnology-based processes for producing fuels,
chemicals and other major materials using simple inexpensive sugars as the major carbon source.
Such processes do not require high temperatures and pressures, thereby minimizing energy
consumption, and do not generate toxic compounds as by-products. Recent advances in the area
of recombinant DNA technology have enabled the microbial production of many exotic and valuable
substances that were previously virtually unobtainable. These include
substances that are traditionally not produced by microbes such as plant secondary metabolites
(e.g. polyketides, alkaloids, flavonoids). Many drugs used in modern medicine, such as
vinblastine, digitalis, taxol and codeine, are derived from plant secondary metabolites and are
used for treatment of cancer, heart diseases and pain. Apart from pharmaceutical purposes, such
valuable chemicals are used in production of flavours, fragrances, pigments, insecticides and
other important products.
Chemical synthesis of plant secondary metabolites is often difficult and expensive because of
their chemical complexity, and yields from natural resources are typically low, making industrial
scale production difficult. Therefore, there is a great incentive for producing these valuable
compounds using microbial routes. Production of plant secondary metabolites in microbes like
E. coli and yeast is accomplished by incorporating the plant genes into micro-organisms (Khosla
et al., 2003; Maury et al., 2005; Hawkins et al., 2008; Minami et al., 2008). This process of
expressing non-native genes is called heterologous protein expression, and the compounds
obtained are called heterologous products.
The work presented here is part of a project that aims to produce a group of plant
secondary metabolites, such as codeine, xanthohumol and resveratrol, on an industrial scale using
Saccharomyces cerevisiae (baker’s yeast) as the microbial host. Tyrosine, an aromatic amino
acid produced by S. cerevisiae, acts as the precursor to the heterologous pathways that produce
codeine, xanthohumol and others. The following graphic (Fig. 1.1) shows the steps involved in
the production of one of the plant metabolites of interest, xanthohumol, a flavonoid with
anti-cancer capabilities, from tyrosine.
Figure 1.1 Schematic of the steps involved in production of Xanthohumol from tyrosine. All the enzymes shown in
the above graphic are heterologous enzymes that must be expressed in the microbial host (Phytometasyn
project report, 2008).
1.2 Challenges in large-scale production of heterologous products
Wild-type S. cerevisiae has the enzymes necessary to produce the precursor tyrosine from cheap
sugars, but the enzymes shown in Fig. 1.1 are non-native enzymes that need to be expressed
through genetic engineering. Although our current knowledge of microbial metabolism allows
heterologous gene expression, the possibility of producing non-native compounds on an
industrial scale is limited, primarily because of low production yields. Microbial yields of
heterologous products depend on two factors: effective expression of heterologous genes in the
microbial host, and supply of microbial precursors to the heterologous pathway.
1.2.1 Effective expression of heterologous genes in microbial host
With the advances in the field of synthetic biology and novel experimental techniques,
heterologous gene expression is routine, provided a suitable host is selected. The choice of
microbial host is very important for the production of heterologous products. The microbial host
should be amenable to genetic manipulation, grow well, and provide a suitable environment for
proper expression of heterologous genes. In this work, we chose S. cerevisiae as our host
microbe because it meets all these requirements, as detailed below.
S. cerevisiae is widely used in the baking, brewing and wine-making industries; hence yeast
genetics, physiology, biochemistry, genetic engineering and fermentation strategies are well
understood. Experimental techniques required to precisely modify the genetic network of this yeast
are widely available. Also, S. cerevisiae, being a eukaryote, is known to have protein machinery
similar to that of higher eukaryotes. It has been established that enzymes from plants and humans
are properly folded and processed in yeast, in contrast to a prokaryotic host (Primrose, 1986; Zabriskie
et al., 1986), making it a suitable host for expression of key enzymes like aromatic
prenyltransferase (DMADP) and cytochrome P450. A number of studies reporting successful
expression of heterologous genes in S. cerevisiae are available in the literature (Ro et al., 2004;
Porro et al., 2005; Jiang et al., 2005; Yan et al., 2007; Dejong et al., 2006; Ro et al., 2006). Also,
there are no endotoxins and no oncogenic or viral DNA in S. cerevisiae, thus making it a very
suitable choice for our purpose.
In addition, S. cerevisiae produces no toxic metabolites and is non-pathogenic, earning it a
GRAS (generally recognized as safe) classification by the U.S. Food and Drug Administration
(FDA) (Chemler et al., 2006). Also, its physical properties, such as tolerance to low pH and robust
growth under high sugar and ethanol conditions, led to the choice of yeast as our preferred
microbial cell factory.
1.2.2 Supply of microbial precursors to the heterologous pathway
Another major determinant of the yield of heterologous products is the supply of microbial
precursor metabolites that act as feed to the heterologous pathway. Tyrosine, an aromatic amino
acid, is the precursor for production of the plant metabolites of interest. In a previous study,
Jiang et al. (2005) examined the production of chalconaringenin in S.
cerevisiae and observed that, although the expression of the three genes TAL, 4CL and CHS
(shown in Fig. 1.1) was successful in producing chalconaringenin, its yield
was limited by the tyrosine flux. Similar observations suggesting limited heterologous
production due to insufficient precursor metabolite pools were reported by several others (Ro et
al., 2006; Santos et al., 2011; Brochado et al., 2010). Therefore, in order to improve the yields of
these non-native compounds, it is important to improve the intracellular pools of microbial
precursors.
To produce compounds such as xanthohumol on a large scale, it is important that we use a strain of
S. cerevisiae with sufficiently high levels of intracellular tyrosine as the host for expression of
heterologous pathways. S. cerevisiae, like other micro-organisms in nature, is assumed to have evolved
with a goal of maximizing its growth rate under the given conditions, a state where there is no
excess production of tyrosine. In order to obtain a strain with higher intracellular tyrosine pools,
it is necessary to modify the genetic network of the yeast, thereby forcing it to channel more of
the substrate towards tyrosine. This process of modifying the genetic network of organisms for
over-production of metabolites is called metabolic engineering.
Traditionally, metabolic engineering has been done by classical strain improvement methods
that involve random mutagenesis and screening. However, with the advances in the area of
genome sequencing, a greater knowledge of microbial genetic networks is now available. S.
cerevisiae, our microbial host, was the first eukaryotic organism whose complete genome was
sequenced (Goffeau et al., 1997). The information about its genome is widely available
(http://www.yeastgenome.org) along with information on open reading frames, biochemical
pathways, microarray studies and protein interaction networks. This information can be used to
devise rational design strategies for improved production of the required metabolites. Predicting
rational design strategies is not trivial considering the complexity of biological networks;
therefore, many mathematical modeling methods have been developed to help with this process
(Burgard et al., 2003; Patil et al., 2005; Pharkya et al., 2006; Tran et al., 2008; Lun et
al., 2009; Ranganathan et al., 2010; Yang et al., 2011). In the current study, we describe a
methodology that uses a combination of computational modeling methods to predict an effective
genetic engineering strategy for improved intracellular tyrosine. The resulting strain of S. cerevisiae with
higher tyrosine pools would be an ideal host for expression of the heterologous pathways of
interest.
1.3 Industrial importance of tyrosine
Tyrosine, apart from being a precursor for alkaloids and polyketides, is also a valuable
compound with a variety of applications (Fig. 1.2). Tyrosine is used in its natural form as a
common dietary supplement due to its ability to stimulate brain activity for improved memory
and to control depression and anxiety. Tyrosine also serves as an important starting material for a
variety of high-value compounds such as melanin and L-3,4-dihydroxyphenylalanine (L-DOPA or
levodopa), the latter being currently the most powerful symptomatic drug for treatment of Parkinson’s
disease. Therefore, in addition to our objective of obtaining a platform strain for plant secondary
metabolite production, if the final strain of S. cerevisiae is observed to have significantly high
titers of tyrosine, it can be considered for industrial production of tyrosine, replacing E. coli,
which is currently the preferred microbe.
Figure 1.2 A schematic of some of the industrially important chemicals which require tyrosine as a precursor.
Chapter 2
Objective
As stated in the previous chapter, the supply of microbial precursors is an important factor in
determining the yields of heterologous products. Recently, Santos et al. (2011) showed that when
a strain of E. coli that was engineered for production of tyrosine was used as the microbial host,
naringenin was produced at sufficiently high titers (up to 84 mg/l) from glucose using minimal
media without any precursor supplementation. This clearly suggests that there is an incentive to
obtain a tyrosine over-producing platform strain of S. cerevisiae, which can be used as a host for
heterologous pathway expression. Limited work has been done previously on improving
aromatic amino acid production in S. cerevisiae, as E. coli is the preferred microbe for
industrial-scale production of aromatic amino acids. In this work, our objective was to design a strain of S.
cerevisiae with improved intracellular tyrosine pools using computational modeling methods.
The only reported work on obtaining a metabolically engineered strain of S. cerevisiae with
tyrosine over-producing capabilities focused on removing the feedback inhibition present in
the aromatic amino acid pathway (Luttik et al., 2008). However, to optimize the network of S.
cerevisiae for tyrosine production, a holistic design that would account for the entire genome of
the yeast is required. The process of performing a genome-scale design is not trivial and cannot
be performed by observation because of the complexity of genetic networks. This difficulty in
developing a strategy, while considering the entire genome of the microbe, acts as motivation for
using mathematical modeling techniques for designing metabolic engineering strategies. In
addition, the availability of a number of well-curated in silico genome-scale models of S.
cerevisiae (Forster et al., 2003; Duarte et al., 2004; Nookaew et al., 2008; Herrgard et al., 2008;
Mo et al., 2009) provides an additional motivation for using mathematical modeling techniques.
In order to truly understand the dynamic microbial behaviour and to predict the result
of genetic manipulations with high accuracy, it is desirable to have a detailed genome-scale
dynamic model of the microbial metabolism. However, large-scale dynamic models are currently
impractical because of the lack of information on kinetic parameters and the regulatory
network. In the absence of kinetic and regulatory information, it is possible to partly predict the
behaviour of cellular metabolism by using steady-state analysis. However, the predictions made
by steady-state modeling methods need not be consistent with experiments because these
methods do not consider the dynamic nature of cells. Therefore, there exists a trade-off between the size
of the network that can be modeled and the accuracy of prediction when choosing between
steady-state and dynamic modeling methods.
In this study, the objective was to propose a methodology that can tackle the inherent limitations
of steady-state and dynamic modeling methods and devise an effective strategy for tyrosine over-
production. This was accomplished through the following tasks:
1. Obtain an initial strain design for improved tyrosine production using steady-state bi-
level optimization methods Optknock (Burgard et al., 2003) and GDLS (Lun et al., 2009).
2. Construct a small central model of S. cerevisiae, based on the suggestions from steady-
state modeling, for application of Ensemble modeling framework.
3. Predict the dynamic behaviour of yeast by applying Ensemble modeling over the central
model.
4. Use the dynamic central model obtained from Ensemble modeling to understand the
effect of each of the deletions suggested by steady-state models on the flux distribution
and predict the critical deletions for improving tyrosine flux.
5. Propose a final strategy for improved intracellular tyrosine pools for experimental
validation.
Chapter 3
Literature Review
3.1 Tyrosine production
Aromatic amino acids have a high industrial demand because of their many applications, primarily in
the food and pharmaceutical industries (Breuer et al., 2004). Among the aromatic amino acids, the
demand for tyrosine is much lower than that for the other two,
phenylalanine and tryptophan, which probably explains why industrial-scale
production of tyrosine has received limited attention. Tyrosine is manufactured by three different
methods: (a) enzymatic synthesis by tyrosine phenol lyase (Lütke-Eversloh et al., 2007), (b)
extraction from protein hydrolysates (Leuchtenberger et al., 2005) and (c) fermentation using
high-performance mutants or genetically engineered microbial strains (Lütke-Eversloh et al.,
2007; Gosset, 2009). For this project, we are interested in biological production of tyrosine, so
tyrosine production by genetically engineered microbial strains is reviewed in detail
here.
3.1.1 Biotechnology based methods for production of tyrosine
Aromatic amino acids are produced by microbes via the aromatic amino acid or shikimate
pathway (Fig. 3.1). Phosphoenolpyruvate (PEP) and erythrose-4-phosphate (E4P) act as major
precursors to this pathway. In the first step, PEP, a central carbon metabolite, and E4P, a pentose
phosphate pathway intermediate, are condensed to form 3-deoxy-D-arabinoheptulosonate-7-
phosphate (DAHP), which is further converted to shikimate (SHIK) via 3-dehydroquinate (DHQ)
and 3-dehydroshikimate (DHS). Shikimate is then phosphorylated and converted to chorismate
(CHOR) after the addition of another PEP molecule. Chorismate is the biosynthetic branch point
for aromatic amino acids, as well as for folate, ubiquinone, menaquinone and siderophore
synthesis (Fig. 3.2) (Pittard et al., 1996; Dosselaere et al., 2001).
Figure 3.1 A schematic of aromatic amino acid (shikimate) pathway up to the generation of Chorismate. Seven
reactions are involved in the conversion of PEP and E4P to Chorismate.
For phenylalanine and tyrosine biosynthesis, chorismate is converted to prephenate, a common
precursor, in a reaction catalyzed by the enzyme chorismate mutase. In the branch that produces
tyrosine, prephenate is converted to 4-hydroxyphenylpyruvate (4-HPP) by the enzyme
prephenate dehydrogenase. In S. cerevisiae, formation of 4-HPP is associated with formation of
one mole of NADPH. Tyrosine is then formed by transamination of 4-HPP by an
aminotransferase. A schematic of the tyrosine branch from chorismate is depicted in the following
figure (Fig. 3.2).
Figure 3.2 Tyrosine production from Chorismate. This process involves generation of one mol of NADPH.
3.1.2 Tyrosine production using engineered microbes
As mentioned before, most of the work on aromatic amino acid production was directed towards
phenylalanine and tryptophan production (Berry, 1996; Bongaerts et al., 2001; Leuchtenberger
et al., 2005; Ikeda et al., 2003, 2006). Most of the research conducted on tyrosine production
has focused on E. coli, which currently is the preferred microbe for industrial-scale
production. In E. coli, the first reaction in the aromatic amino acid pathway, a condensation reaction
between PEP and E4P, is catalyzed by a set of three isoenzymes, which are feedback inhibited by
the three aromatic amino acids (Berry et al., 1996; Frost et al., 1995). The first generation of
metabolic engineering approaches towards increasing carbon flux to tyrosine in E. coli
concentrated on over-expressing a feedback-resistant version of the isoenzyme that responds to tyrosine
(Lütke-Eversloh et al., 2007; Olson et al., 2007), accompanied by over-expression of
rate-controlling pathway enzymes (Olson et al., 2007). A second generation of quantitative metabolic
engineering approaches focussed on over-expression of the phosphoenolpyruvate synthase and
transketolase A genes, which would result in an increase in the precursor pools, PEP and E4P
(Lütke-Eversloh et al., 2007; Yi et al., 2003). In a recent study, Juminaga et al. (2012) expressed
all the genes encoding the formation of tyrosine from PEP and E4P on two plasmids and
transformed them into E. coli cells. This effort resulted in complete removal of bottlenecks in
the aromatic amino acid pathway (Juminaga et al., 2012).
In S. cerevisiae, two reactions in the aromatic amino acid pathway are known to be subject to
feedback inhibition (Fig. 3.3). The first of these reactions is the formation of DAHP, which is
catalyzed by two isoenzymes (Aro3p, Aro4p) that are feedback regulated by
phenylalanine (Aro3p) and tyrosine (Aro4p) (Kunzler et al., 1992). The second reaction is the
conversion of chorismate to prephenate. This reaction is catalyzed by the enzyme chorismate
mutase (Aro7p), whose activity is inhibited by tyrosine and activated by tryptophan (Brown et
al., 1990). Although considerable knowledge is available on the functioning of the aromatic amino
acid pathway in yeast, very little work has yet been done on using S. cerevisiae as a host to
produce tyrosine. To date, only one report provides a glimpse into the possibility of developing
S. cerevisiae into a tyrosine over-producer (Luttik et al., 2008). This work involved using
tyrosine feedback-resistant versions of both the Aro4p (Hartmann et al., 2003) and Aro7p enzymes
(Krappmann et al., 2000). The feedback-resistant Aro4p contains a single lysine-to-leucine
substitution at position 229, and the feedback-resistant Aro7p has a glycine-to-serine substitution at
position 141. In this strain, the production of tyrosine increased 3-fold compared to the
wild type. This work also showed that DAHP synthase exerts a stronger degree of control on the
synthesis of tyrosine than chorismate mutase in S. cerevisiae.
Figure 3.3 A schematic of allosteric regulation in aromatic amino acid pathway of S. cerevisiae.
3.2 Steady-state modeling
In this work, we used FBA-based (Orth et al., 2010) bi-level optimization methods, Optknock
(Burgard et al., 2003) and GDLS (Lun et al., 2009), for strain design. The advantage of
steady-state modeling methods is that they can be applied to genome-scale models and do not
require information on kinetic parameters. It has been shown previously that bi-level
optimization methods are useful for strain design in yeast:
Bro et al. (2006) used in silico simulations with the iFF708 model to increase ethanol
production while decreasing the amount of glycerol produced by the cell under
anaerobic growth conditions. The engineered strain showed a 40% decrease in glycerol production
and a 3% increase in ethanol yield without affecting the maximum specific growth rate.
Asadollahi et al. (2009) investigated strategies for improving the yield of sesquiterpene
production in S. cerevisiae by enhancing the precursor pools. They applied a bi-level optimization
framework, Optgene, to the iFF708 model for their predictions. The strategy that they obtained
led to an approximately 85% increase in the final cubebol titer.
Brochado et al. (2010) studied a recombinant yeast strain producing vanillin. This strain was
modeled in silico to find deletion targets that could improve the vanillin yield on glucose. The
iFF708 model was used to suggest two different deletion targets. When these deletions were
implemented in vivo, a 5-fold increase in vanillin yield was observed compared to previously
reported vanillin production in S. cerevisiae.
From the above examples, it is clear that in silico modeling using genome-scale metabolic
models can accelerate the process of metabolic engineering by suggesting rational targets for
over-expression or deletion for improved production of a certain metabolite.
3.3 Ensemble Modeling
Ensemble modeling (Tran et al., 2008) is a novel dynamic modeling approach that estimates
the kinetic behaviour of cells using phenotypic data from various enzyme tuning experiments
reported in the literature. Ensemble modeling is useful for predicting the metabolic behaviour of
cells in greater detail than steady-state modeling techniques, but it cannot be used
to perform extensive strain design like Optknock and GDLS. Ensemble modeling has been used
previously to improve the yield of lysine in engineered E. coli (Contador et al., 2009), to
understand fatty acid metabolism in hepatic cells (Dean et al., 2010) and, more recently, to predict
new drug targets for cancer (Khazaei, 2011). There is no reported work on its use in S.
cerevisiae.
A more detailed explanation of these modeling methods is provided in the following chapter.
Chapter 4
Methods and Methodology
As stated before, the objective of the current work was to design a strain of S. cerevisiae with
increased intracellular levels of tyrosine using computational methods. Using mathematical
models allows us to take a more holistic view of the microbial system while performing
the strain design. Mathematical modeling approaches can broadly be classified into two major
categories: constraint-based steady-state modeling and mechanism-based dynamic modeling.
Each of these modeling approaches has its own advantages and limitations (Fig. 4.1).
Figure 4.1 Trade-off between steady-state and dynamic modeling methods.
4.1 Steady-state modeling
Steady-state modeling analyzes metabolic networks based on reaction stoichiometries and
enzyme reversibilities in addition to network topology. With the advances in whole-genome
sequencing, these characteristics are readily available for several organisms in the form of
reconstructed metabolic networks (Feist et al., 2008; Herrgard et al., 2008). A major feature of
the steady-state modeling approach is that it assumes no change in the concentration of intracellular
metabolites. This assumption of constant metabolite concentrations is used to calculate the fluxes
by performing a mass balance across each intracellular metabolite, thereby circumventing the
need for information on kinetic parameters to characterize the flux distributions (Bailey, 2001). In
this work, strain design was performed using bi-level optimization-based methods, which are
extensions of a widely used fundamental modeling methodology called Flux Balance Analysis
(FBA) (Orth et al., 2010).
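The steady-state assumption can be written compactly as follows. This is only a sketch: x, the vector of intracellular metabolite concentrations, is notation assumed here for illustration, S and V denote the stoichiometric matrix and the flux vector used in the FBA formulation, and dilution of metabolites by growth is neglected.

\[
\frac{dx}{dt} = S V, \qquad \frac{dx}{dt} = 0 \;\Longrightarrow\; S V = 0
\]

For a single metabolite i this reads \(\sum_j S_{ij} V_j = 0\): the fluxes that produce the metabolite balance the fluxes that consume it, so the constraint can be written without any rate constants.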
4.1.1 Flux Balance Analysis (FBA)
FBA is one of the most widely used steady-state modeling techniques and allows for detailed
simulations of metabolic systems. In FBA, the reaction network of a cell is represented as a set
of under-determined mass balance constraints, as shown in Fig. 4.2.
Figure 4.2 Schematic of conversion of cell network into under-determined mass balance constraints at steady-state.
Mass balance equations are represented in mathematical form using the stoichiometric matrix, or
S matrix, and the flux distribution vector V. The S matrix reflects the stoichiometry of the
reactions involved in the network. In the S matrix, every row corresponds to one
metabolite and every column corresponds to one reaction; each entry is the stoichiometric
coefficient of that metabolite in that reaction. Because the system of
equations is under-determined, more than one flux distribution can
satisfy the mass balance constraints. Among these possible flux distributions, FBA predicts the
optimal flux distribution by assuming that the metabolic network has been optimized during
evolution with respect to a particular objective function. Objective functions commonly used are
maximization of ATP production (Ra et al., 1990; Ramakrishna et al., 2001), maximization of
biomass formation (Kauffman et al., 2003; Edwards et al., 2000; Price et al., 2003) or
minimization of metabolic adjustment (MOMA) (Segre et al., 2002). So far, growth maximization
has been the most extensively used approach to describe the physiology during growth (Edwards
et al., 2001; Famili et al., 2003).
The working of FBA is illustrated in the following figure (Fig. 4.3). When no constraints are imposed, the fluxes of a biological network can take any value. Introducing mass balance and capacity constraints defines a solution space, and optimizing a specific objective function over this space identifies a point on its edge as the optimal flux distribution.
Figure 4.3 Schematic of how optimal flux distribution is calculated using FBA (Orth et al., 2010).
Mathematically, FBA is formulated as:

$$
\max_{V} \; f^{T} V \tag{4.1}
$$
$$
\text{subject to} \quad S V = 0 \quad \text{(steady-state condition)} \tag{4.2}
$$
$$
A \le V \le B \tag{4.3}
$$

where S is the stoichiometric matrix, V is the flux vector, f is the objective vector, and A and B are the vectors of lower and upper bounds on the fluxes V.
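As a minimal sketch of the formulation above, the following Python snippet solves an FBA problem for a hypothetical five-reaction toy network. The network, its bounds and all variable names are illustrative assumptions and are not taken from iMM904. It uses `scipy.optimize.linprog`, which minimizes, so the objective vector is negated in order to maximize the growth flux.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (illustrative only):
#   v_up : -> A                 substrate uptake
#   r1   : A -> X               efficient route to the biomass precursor
#   r2   : A -> 0.5 X + 0.5 P   less efficient route with by-product P
#   v_bio: X ->                 growth (biomass drain)
#   v_p  : P ->                 product secretion
S = np.array([
    [1.0, -1.0, -1.0,  0.0,  0.0],   # metabolite A
    [0.0,  1.0,  0.5, -1.0,  0.0],   # metabolite X
    [0.0,  0.0,  0.5,  0.0, -1.0],   # metabolite P
])
lower = np.zeros(5)                                        # vector A (lower bounds)
upper = np.array([10.0, 1000.0, 1000.0, 1000.0, 1000.0])   # vector B (upper bounds)
f = np.array([0.0, 0.0, 0.0, 1.0, 0.0])                    # objective: growth flux

# Maximize f'V subject to S V = 0 and A <= V <= B.
res = linprog(-f, A_eq=S, b_eq=np.zeros(S.shape[0]),
              bounds=list(zip(lower, upper)), method="highs")
v_opt = res.x
growth = -res.fun

print("optimal flux vector:", v_opt)
print("maximum growth flux:", growth)
```

In this toy network the optimum routes all substrate through the efficient reaction r1, so the by-product flux v_p is zero at maximum growth.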
4.1.2 Genome-scale metabolic models
The simplicity of the FBA formulation allows it to be applied to large-scale metabolic reconstructions, called genome-scale metabolic models. Genome-scale metabolic models are in silico models that contain information about all the known pathways in an organism; they are constructed from the annotated genome sequence and the known biochemical and physiological data. Applying FBA to these models has been shown to be very useful in predicting the physiological behaviour of microorganisms, such as growth rate and product secretion rate, under different environmental and genetic perturbations (Forster et al., 2003; Duarte et al., 2004). Genome-scale metabolic models have also been found useful in designing cells for improved production of desired products by suggesting the reactions that need to be targeted (Alper et al., 2005; Fong et al., 2005; Wang et al., 2006; Bro et al., 2006).
S. cerevisiae is perhaps the best-studied eukaryotic microbe, and a number of genome-scale metabolic models of S. cerevisiae have been developed and are available for metabolic modeling: iFF708 (Forster et al., 2003), iND750 (Duarte et al., 2004), iLL672 (Kuepfer et al., 2005), iIN800 (Nookaew et al., 2008), iMM904 (Mo et al., 2009) and Yeast 4.0 (Dobson et al., 2010). Appendix D shows the number of genes, metabolites and reactions for each of these models and also compares the effectiveness of iIN800, iMM904 and Yeast 4.0 in predicting the in silico viability of single-deletion strains. These genome-scale metabolic models have been used successfully in the past to improve the yield of sesquiterpenes (Asadollahi et al., 2009), vanillin (Brochado et al., 2010) and ethanol (Bro et al., 2006). In this work we used iMM904, which has 904 metabolic genes, 1228 metabolites and 1577 reactions, to perform strain design for improved tyrosine yield. The design obtained with this model was then verified on Yeast 4.0 and on an improved version of iMM904 (Zomorrodi et al., 2010).
4.1.3 Bi-level strain design algorithms
After the success of FBA in predicting the result of various genetic manipulation strategies with minimal knowledge of kinetic parameters, the obvious next step was to extend its framework to predict metabolic engineering targets for improved production of desired metabolites. This can be achieved by searching the space of possible genetic manipulations for the strategy that results in the desired metabolic state of improved product formation. Many computational tools for identifying strain modifications that lead to targeted overproduction have been described in the literature (Burgard et al., 2003; Patil et al., 2005; Pharkya et al., 2006; Lun et al., 2009). One of the earliest was the Optknock procedure (Burgard et al., 2003), which proposes gene knockouts leading to targeted overproduction. Optknock is a bi-level optimization algorithm: an inner problem maximizes the cellular objective function (as described in the flux balance analysis section above), while an outer problem maximizes a bioengineering objective. The inner optimization of the cellular objective is necessary to prevent prediction of lethal deletions.
Optknock formulation (Burgard et al., 2003):

    Maximize  bioengineering objective (through gene knockouts)
    subject to
        number of knockouts <= limit
        Maximize  cellular objective
        subject to
            network stoichiometry
            reaction bounds
            substrate uptake rate
            blocked reactions identified by the outer problem

The input to the Optknock algorithm is the reaction network of interest and the substrate uptake rates. The bioengineering and cellular objectives need to be chosen such that they reflect the strain that we aim to engineer. In our case, we used the genome-scale model iMM904 as input, with glucose as the substrate under aerobic minimal-medium conditions. We defined the bioengineering objective as maximization of tyrosine production and chose growth maximization as the cellular objective.
The result obtained from Optknock is a set of reactions that need to be deleted to couple the growth of the organism to the formation of the product of interest. This process is explained in the following figure (Fig 4.4).
Figure 4.4 Schematic of computational prediction of possible flux space for wild-type and Optknock suggested
mutant. Deleting genes suggested by Optknock would obligate the cells to produce the product of our interest after
evolution. (Adapted from Fong et al., 2005).
In the above figure, region A shows all possible values of the growth and product formation fluxes at which the wild-type cell can operate for a given substrate uptake rate at steady state. We chose maximization of growth rate as the cellular objective because it is reasonable to assume that cells have evolved over millions of years to optimize their growth. Under this assumption, wild-type cells operate at point (1) in region A, where the growth rate is maximal. At this point the cell does not produce the required product, unless the product is a primary metabolite of the cell. In our case, because wild-type yeast produces only as much tyrosine as it requires for growth, we do not expect any excess production. After the deletions suggested by Optknock are made, the network of the resulting mutant differs from that of the wild type. This modification changes the feasible solution space for the growth and product formation fluxes of the mutant; the new solution space is represented by region B. After adaptive evolution, the mutant is expected to operate at point (2), the optimal growth point. With the mutations suggested by Optknock in place, the cell must form the product of interest if it is to grow at its most preferred state (maximum growth).
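The growth-coupling idea can be illustrated by brute-force enumeration on a small example. The sketch below is not the Optknock algorithm itself, which solves the bi-level problem as a single mixed-integer program; it simply tries every knockout set up to a size limit on a hypothetical toy network, solves the inner growth-maximization problem for each set, and then reports the product flux attainable at that growth optimum. The network, the bounds, the `min_growth` threshold and the function names are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: r1 makes biomass precursor X efficiently,
# r2 makes X less efficiently but co-produces P.
REACTIONS = ["v_up", "r1", "r2", "v_bio", "v_p"]
S = np.array([
    [1.0, -1.0, -1.0,  0.0,  0.0],   # A
    [0.0,  1.0,  0.5, -1.0,  0.0],   # X
    [0.0,  0.0,  0.5,  0.0, -1.0],   # P
])
BOUNDS = [(0.0, 10.0)] + [(0.0, 1000.0)] * 4
BIO, PROD = 3, 4


def maximize(index, bounds):
    """Maximize the flux at `index` subject to S v = 0 and the given bounds."""
    c = np.zeros(len(REACTIONS))
    c[index] = -1.0
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    return -res.fun if res.success else 0.0


def evaluate(knockouts):
    """Return (growth, product) for a tuple of knocked-out reaction names."""
    bounds = [(0.0, 0.0) if name in knockouts else b
              for name, b in zip(REACTIONS, BOUNDS)]
    growth = maximize(BIO, bounds)                        # inner (cellular) problem
    bounds[BIO] = (growth * (1.0 - 1e-6), BOUNDS[BIO][1])  # hold growth near optimum
    product = maximize(PROD, bounds)                      # outer objective
    return growth, product


def best_design(candidates, max_knockouts, min_growth=0.1):
    """Exhaustively search knockout sets and keep the best viable one."""
    best = ((), *evaluate(()))
    for k in range(1, max_knockouts + 1):
        for ko in combinations(candidates, k):
            growth, product = evaluate(ko)
            if growth >= min_growth and product > best[2] + 1e-9:
                best = (ko, growth, product)
    return best


design = best_design(["r1", "r2"], max_knockouts=1)
print("knockouts, growth, product:", design)
```

Exhaustive enumeration grows combinatorially with the number of candidate reactions and allowed knockouts, which is consistent with the runtime limits described for genome-scale models.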
When Optknock is applied to large networks such as iMM904, the algorithm does not converge if the design is allowed more than 3-4 knockouts. This limit on the maximum number of knockouts arises because the runtime scales exponentially with the number of knockouts in genome-scale models. This can be a limitation because certain metabolites might require more than 3-4 knockouts to show appreciable growth-product coupling. GDLS (Lun et al., 2009) is an algorithm formulated along the same lines as Optknock, but it uses a local search instead of the global search adopted by Optknock. The advantage of a local search is that designs with a much larger limit on the number of knockouts can be obtained with genome-scale models. However, unlike Optknock, the solution that GDLS returns might not be globally optimal. In the current work, we tried both Optknock and GDLS to design tyrosine over-producing S. cerevisiae.
Similar algorithms that consider over-expression and repression in addition to deletion of reactions have been reported, such as Optreg (Pharkya et al., 2006), Optforce (Ranganathan et al., 2010) and EMILiO (Yang et al., 2011). We did not use them in this project because designs based on up-regulation and repression of reactions might not be robust, especially when kinetics is not considered: when the algorithm is allowed to consider over-expression and repression, it predicts specific enzyme expression levels, which are difficult to implement experimentally.
4.1.4 Limitations of steady-state modeling
The major limitations of steady-state modeling follow from its two assumptions: that cells operate at a steady state, and that a biological objective function exists. The steady-state assumption is generally valid for microbial metabolic networks because the time scales for equilibration of metabolite concentrations are much shorter than those of genetic regulation (Segre et al., 2002). The assumption of a biological objective function is far less convincing, especially when performing strain design using Optknock and GDLS. While it is reasonable to assume that wild-type cells, evolving over millions of years, have optimized their genetic network to maximize growth rate, there is no experimental evidence that mutants follow a particular objective function. Even the steady-state assumption is more acceptable when predicting the metabolic behaviour of cells using FBA than when designing strategies using Optknock/GDLS. These limitations make a compelling case for verifying steady-state modeling predictions with a dynamic model before experimental validation.
4.2 Dynamic modeling
Dynamic modeling provides a more detailed analysis of biological systems and can predict dynamic cellular behaviour. A detailed kinetic model of an organism, if available, can be very useful for predicting how the metabolic flux map changes when a genetic manipulation is performed. However, development of detailed kinetic models (Chassagnole et al., 2002; Lee et al., 2006; Wang et al., 2004) has been difficult because of the lack of knowledge of kinetic parameters. The time-course data needed to estimate the kinetic parameters require tedious and expensive experimental procedures. Ensemble modeling (EM) (Tran et al., 2008) is an approach that can be used to overcome this drawback, as it uses experimental data reported in the literature to predict the dynamic behaviour of the cell. However, the large number of parameters involved in EM limits the size of the cellular network to which it can be applied.
4.2.1 Ensemble modeling concept
In EM, to avoid the difficulty of knowing the kinetic parameters of each reaction in the system, we construct an ensemble of models that all reach the given steady state in terms of flux distribution and metabolite concentrations. These models, which span the entire space of thermodynamically feasible kinetics, are then screened using available experimental data to obtain a smaller subset that can account for the dynamic behaviour of the cell. The data used for screening are phenotypic data, such as flux changes due to changes in enzyme expression. Even though such data are measured at steady state, they result from the interplay among many kinetic parameters and therefore provide a useful screen. EM breaks each reaction in the network down into its elementary steps, which allows us to incorporate any known information on the true mechanism of an enzymatic reaction, such as regulation, thermodynamics and steady-state metabolite levels, but does not require such information if it is not available. The major steps involved in EM are represented in the following flow chart (Fig. 4.5).
Figure 4.5 Flow chart depicting the steps involved in Ensemble Modeling (adapted from Contador et al., 2009).
4.2.2 Ensemble modeling framework
As stated before, it is currently impractical to apply the EM framework to large metabolic networks, so we developed a smaller network of 50 reactions for S. cerevisiae (Fig. 5.6). This central metabolic network is used as an input to the framework along with known reference steady-state flux data, which EM uses to anchor the models. In addition to flux data, the EM framework requires Gibbs free energy (ΔG) values for all the reactions in the network in order to calculate the thermodynamically feasible space. We obtained the ΔG values of the reactions from Jankowski et al. (2008). A table containing all the considered reactions, their steady-state flux values and ΔG values is shown in Appendix A. The framework then breaks down every enzymatic reaction into a set of elementary reactions. Elementary reactions are the most
fundamental form of reaction and represent events at the molecular level, which allows us to capture the mechanism of the reaction. For example, for a one-reactant, one-product reaction

$$
X_i \xrightarrow{E_i} X_{i+1} \tag{4.4}
$$

The scheme of breakdown into elementary reactions is:

(4.5)

Flux through each elementary reaction is given by:

$$
v_{i,1} = k_{i,1} [X_i] [E_i] \tag{4.6}
$$

where $k_{i,1}$ is the kinetic rate constant for the first elementary reaction, $[X_i]$ is the concentration of metabolite i, and $[E_i]$ is the concentration of enzyme i. Similarly, a standard mechanism of breakdown into elementary reactions (Appendix B) was followed for other reactions with different numbers of reactants and products. The EM framework can also account for any available information on allosteric regulation in the network by treating the regulation as an individual reaction.
To make the above equation (Eq. 4.6) dimensionless, and thus easier and more accurate to simulate numerically, we normalize the metabolite concentrations by the corresponding concentrations at the reference steady state, $X_i^{ss,ref}$. Similarly, the free enzyme and the enzyme complexes are normalized by the total concentration of the corresponding enzyme at the reference state, $E_{i,total}^{ref}$:

$$
v_{i,1} = k_{i,1} E_{i,total}^{ref} X_i^{ss,ref} \cdot \frac{[X_i]}{X_i^{ss,ref}} \cdot \frac{[E_i]}{E_{i,total}^{ref}} = \check{K}_{i,1}^{ref} \, \bar{X}_i \, \check{e}_{i,1} \tag{4.7}
$$

where $\check{K}_{i,1}^{ref} = k_{i,1} E_{i,total}^{ref} X_i^{ss,ref}$, $\bar{X}_i = [X_i]/X_i^{ss,ref}$ and $\check{e}_{i,1} = [E_i]/E_{i,total}^{ref}$.
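For orientation, in the ensemble modeling formulation of Tran et al. (2008) a one-reactant, one-product reaction is usually decomposed into three reversible elementary steps. The scheme below is written from that formulation as an assumption, using the flux indexing adopted in this chapter (odd indices for forward steps, even indices for reverse steps); the mechanisms actually used for each reaction are those of Appendix B.

$$
X_i + E_i \;\underset{v_{i,2}}{\overset{v_{i,1}}{\rightleftharpoons}}\; X_iE_i \;\underset{v_{i,4}}{\overset{v_{i,3}}{\rightleftharpoons}}\; X_{i+1}E_i \;\underset{v_{i,6}}{\overset{v_{i,5}}{\rightleftharpoons}}\; X_{i+1} + E_i
$$

Under this assumed scheme the enzyme is distributed over three species (free $E_i$ and the complexes $X_iE_i$ and $X_{i+1}E_i$), and the first elementary flux is the binding of $X_i$ to the free enzyme, which matches the mass-action form of the flux expression given above.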
Note that the rate expression has a log-linear form:

$$
\ln v_{i,1} = \ln \check{K}_{i,1}^{ref} + \ln \bar{X}_i + \ln \check{e}_{i,1} \tag{4.8}
$$

At the reference steady state the normalized concentration $\bar{X}_i = 1$, and the equation becomes:

$$
\ln v_{i,1}^{ref} = \ln \check{K}_{i,1}^{ref} + \ln \check{e}_{i,1}^{ref} \tag{4.9}
$$

From the above equation (Eq. 4.9), it can be seen that the kinetic parameters can be calculated if $v_{i,1}^{ref}$ and the enzyme fraction $\check{e}_{i,1}^{ref}$ are sampled. The enzyme fraction lies between 0 and 1 and can be sampled effectively, but the flux values are not easy to sample because they can range anywhere from 0 to infinity. To avoid this, we instead sample the reversibility of each elementary step, defined as:

$$
R_{i,j} = \frac{\min(v_{i,2j-1}, v_{i,2j})}{\max(v_{i,2j-1}, v_{i,2j})} \tag{4.10}
$$

where $v_{i,2j-1}$ is the forward flux and $v_{i,2j}$ the reverse flux of elementary step j of reaction i. By definition, the reversibility lies between 0 and 1, which makes it easy to sample effectively. The values of the reversibilities represent different kinetic states. For example, if within enzymatic reaction i, $R_{i,j}$ for step j is close to zero while that of the next step is near 1, step j is determined to be the rate-limiting step (Tran et al., 2008).
At the reference steady state, the forward and reverse fluxes of each step are constrained by the following equation:

$$
v_{i,2j-1}^{ref} - v_{i,2j}^{ref} = V_{i,net}^{ref} \tag{4.11}
$$
$V_{i,net}^{ref}$ is the reference steady-state flux of the reaction catalyzed by enzyme i. Using the above two equations (Eq. 4.10 and 4.11), we can calculate the forward and reverse fluxes of each elementary reaction at the reference steady state as:

$$
v_{i,2j-1}^{ref} = \frac{V_{i,net}^{ref}}{1 - R_{i,j}^{\operatorname{sign}(V_{i,net}^{ref})}} \tag{4.12}
$$
$$
v_{i,2j}^{ref} = \frac{V_{i,net}^{ref} \, R_{i,j}^{\operatorname{sign}(V_{i,net}^{ref})}}{1 - R_{i,j}^{\operatorname{sign}(V_{i,net}^{ref})}} \tag{4.13}
$$

Reversibilities are also used to apply thermodynamic constraints through the following equation:

$$
\left(\frac{\Delta G_i}{RT}\right)_{lower\ bound} \le \operatorname{sign}(V_{i,net}^{ref}) \sum_{j} \ln R_{i,j}^{ref} \le \left(\frac{\Delta G_i}{RT}\right)_{upper\ bound} \tag{4.14}
$$

$\operatorname{sign}(V_{i,net}^{ref})$ indicates the direction of the reaction: a value of +1 is assigned for forward reactions and -1 for reverse reactions. The upper and lower bounds are calculated from the standard Gibbs free energies and the metabolite concentration ranges. In our case, because the range of metabolite concentrations was not known, we assumed 0 and 100 as the lower and upper bounds on the metabolite concentrations.
In addition to reversibilities, the enzyme fractions are also sampled. At steady state, the total enzyme concentration for each reaction is the sum of the free and bound enzymes. The distribution of the total enzyme amount over the free and bound forms affects the kinetics of the system, and for this reason the enzyme fractions are considered. The enzyme fractions are sampled with the constraint that the total enzyme amount is conserved. In other words, the sum of the enzyme fractions of the elementary reactions for each enzymatic reaction must equal one (Contador et al., 2008):

$$
\sum_{j=1}^{n_i} \check{e}_{i,j}^{ref} = 1 \tag{4.15}
$$
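The sampling step can be sketched in a few lines of Python for a single reaction. The example assumes the three-step mechanism commonly used in ensemble modeling for a one-reactant, one-product reaction (free enzyme plus two enzyme complexes), a positive net flux, and a hypothetical range for ΔG/RT; the function name and all numerical values are illustrative. Reversibilities are drawn uniformly and rejected until the thermodynamic constraint holds, enzyme fractions are drawn uniformly on the simplex, and rate constants follow from the log-linear relation at the reference state, where normalized metabolite concentrations equal one.

```python
import math
import random


def sample_kinetics(v_net, n_steps=3, dg_bounds=(-10.0, -0.1), rng=random):
    """Sample one kinetic parameter set anchored to the net flux v_net.

    dg_bounds is a hypothetical (lower, upper) range for Delta G / RT; it must
    be negative for v_net > 0 and positive for v_net < 0.
    """
    sign = 1.0 if v_net >= 0 else -1.0

    # Rejection-sample reversibilities until the thermodynamic constraint holds.
    while True:
        R = [rng.uniform(1e-6, 1.0 - 1e-6) for _ in range(n_steps)]
        total = sign * sum(math.log(r) for r in R)
        if dg_bounds[0] <= total <= dg_bounds[1]:
            break

    # Forward and reverse elementary fluxes from the net flux and reversibility.
    forward = [v_net / (1.0 - r ** sign) for r in R]
    reverse = [v_net * r ** sign / (1.0 - r ** sign) for r in R]

    # Enzyme fractions summing to one (normalized exponential draws).
    draws = [rng.expovariate(1.0) for _ in range(n_steps)]
    e = [d / sum(draws) for d in draws]

    # Assumed cyclic mechanism: forward step j consumes enzyme species j,
    # reverse step j consumes species j + 1. At the reference state K = v / e.
    k_forward = [vf / e[j] for j, vf in enumerate(forward)]
    k_reverse = [vr / e[(j + 1) % n_steps] for j, vr in enumerate(reverse)]
    return {"R": R, "e": e, "forward": forward, "reverse": reverse,
            "k_forward": k_forward, "k_reverse": k_reverse}


random.seed(0)
ensemble = [sample_kinetics(v_net=1.0) for _ in range(5)]
for model in ensemble:
    print([round(r, 3) for r in model["R"]], [round(x, 3) for x in model["e"]])
```

Every model produced this way reproduces the same net reference flux by construction, which is what anchoring the ensemble to the reference steady state means in practice.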
Once reversibilities and enzyme fractions are sampled, kinetic parameters can be calculated using Eq. [4.9], Eq. [4.12] and Eq. [4.13]. Thus, each set of enzyme fractions and reversibilities yields a set of kinetic parameters that defines one possible kinetic model. This process is repeated thousands of times to generate thousands of sets of kinetic parameters, which are then used to build alternative kinetic models that are all anchored to the same reference steady-state flux values.

$$
Model_k = f(R_k^{ref}, e_k^{ref}) \tag{4.16}
$$

After estimating the kinetic parameters of all the elementary reactions, the next step is to calculate the steady-state flux and metabolite concentration data. The metabolic network for each model in the ensemble is described by a system of ODEs:

$$
\frac{d\bar{y}_i}{dt} = \frac{1}{y_i^{ss,ref}} \left( \sum v_{generation} - \sum v_{consumption} \right) \tag{4.17}
$$

where $\bar{y}_i$ represents either a metabolite concentration normalised with respect to the reference steady state or an enzyme fraction, and $y_i^{ss,ref}$ stands for the corresponding metabolite or total enzyme concentration at the reference state. Solving the ODEs by numerical integration gives the steady-state metabolite and enzyme concentrations, from which the steady-state flux data can be computed using Eq. [4.20]. In this work, we used the ode15s solver in MATLAB to solve the ODEs, with an integration time of 500 units and a step size of 25 units.
4.2.3 Screening the ensemble using literature data
The models developed are then screened using data reported in the literature. All the models are perturbed by modifying the enzyme concentration levels to reflect the reported experiments, and the model-predicted data are compared with the experimental data. Models that agree with the reported physiological data are retained for further screening using additional data.
The following equation represents how each model is perturbed:
$$
v_{i,1} = k_{i,1} E_{i,total}^{ref} X_i^{ss,ref} \cdot \frac{E_{i,total}}{E_{i,total}^{ref}} \cdot \frac{[X_i]}{X_i^{ss,ref}} \cdot \frac{[E_i]}{E_{i,total}} = \check{K}_{i,1}^{ref} \, E_{i,r} \, \bar{X}_i \, \check{e}_{i,1} \tag{4.18}
$$

In this equation, the new term $E_{i,total}$ represents the modified concentration of enzyme i, and the ratio $E_{i,r} = E_{i,total}/E_{i,total}^{ref}$ represents the fold change in enzyme concentration; for the two sides to balance, the enzyme fraction is taken relative to the modified total enzyme concentration. This fold change is varied in order to perturb the system. In this study, the ratio was set to 0.01 for a deletion and to 10 for over-expression. We used data from a succinic acid over-producing strain (Raab et al., 2010) and from a glycerol over-producing strain (Nevoigt et al., 1996) to screen the models. The screened models were then used to observe the changes in flux distribution upon deletion of the reactions suggested by steady-state modeling. The reactions predicted to exert greater control over the diversion of flux towards the aromatic amino acid pathway were chosen as targets for experimental validation.
4.2.4 Limitations of Ensemble modeling
The major limitation of ensemble modeling is its computationally intensive nature, which prevents it from being applied to genome-scale metabolic models. Also, EM in its present form can only estimate the metabolic behaviour of a cell after a mutation, as FBA does, and cannot predict strain designs for over-production as Optknock/GDLS do. Additionally, ensemble modeling can only make a qualitative prediction of the dynamic behaviour using steady-state flux data from the literature. To make a more accurate quantitative estimate of the kinetic parameters, EM requires knowledge of the actual mechanism of every reaction in the network, of the allosteric regulation operating inside the cell, and of the metabolite and enzyme concentrations.
4.3 Methodology
The steps followed in the computational design of the strain for tyrosine over-production in S. cerevisiae are outlined below:
1. Obtain the genome-scale model iMM904 in SBML format for steady-state strain design.
2. Formulate Optknock/GDLS such that the bioengineering objective is tyrosine production and
cellular objective is growth maximization.
3. Run Optknock/GDLS algorithm, using MATLAB, to obtain a set of reactions that need to be
knocked out to achieve our objective.
4. Run FBA simulation using COBRA toolbox after removing the above mentioned reactions to
verify the result from Optknock/GDLS and obtain FBA predictions of growth rate and tyrosine
production fluxes.
5. Develop a small model of S. cerevisiae metabolism taking into account the major metabolic
pathways and the reactions suggested by Optknock/GDLS.
6. Obtain reference steady-state flux values and ΔGs of all the reactions in the small-scale model
from literature.
7. Build an ensemble of 2500 models that are all anchored to the reference steady-state flux data
but with different kinetic parameters.
8. Screen these models using the data reported in the literature for succinic acid and glycerol over-producing strains.
9. Use the final set of models to get a better understanding of the S. cerevisiae metabolic flux distribution and to prioritize, among the reactions suggested by steady-state modeling, the set of reactions for experimental validation.
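The integration and perturbation steps of the ensemble modeling part of this workflow can be illustrated on a deliberately tiny system. The sketch below is an assumption-laden toy, not the 50-reaction model: a constant inflow feeds a metabolite X that is consumed by one enzyme through the hypothetical mechanism X + E <-> XE -> E + P, with one sampled reversibility and one sampled pair of enzyme fractions fixed by hand. `scipy.integrate.solve_ivp` with the BDF method is used here as a stand-in for MATLAB's ode15s, and the enzyme level is perturbed through the fold change in total enzyme.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical reference state (arbitrary units).
V_IN = 1.0                   # reference steady-state flux through the pathway
X_REF = 1.0                  # reference concentration of X
E_REF = 0.1                  # reference total enzyme concentration
R1 = 0.5                     # "sampled" reversibility of the binding step
E_FREE, E_BOUND = 0.6, 0.4   # "sampled" enzyme fractions (sum to one)

# Elementary fluxes at the reference state and the anchored rate constants.
v1, v2, v3 = V_IN / (1.0 - R1), V_IN * R1 / (1.0 - R1), V_IN
K1, K2, K3 = v1 / E_FREE, v2 / E_BOUND, v3 / E_BOUND


def rhs(t, y, fold):
    """Normalized mass balances; `fold` is E_total / E_total_ref."""
    x, c = y                      # normalized X, fraction of enzyme in complex
    bind = K1 * x * (1.0 - c)     # X + E -> XE
    release = K2 * c              # XE -> X + E
    cat = K3 * c                  # XE -> E + P
    dx = (V_IN - fold * (bind - release)) / X_REF
    dc = (bind - release - cat) / E_REF
    return [dx, dc]


def steady_state(fold, t_end=500.0):
    """Integrate from the reference state and return (x, c, pathway flux)."""
    sol = solve_ivp(rhs, (0.0, t_end), [1.0, E_BOUND], args=(fold,),
                    method="BDF", rtol=1e-8, atol=1e-10)
    x, c = sol.y[:, -1]
    return x, c, fold * K3 * c


print("reference state :", steady_state(1.0))
print("10x enzyme level:", steady_state(10.0))
```

With a tenfold over-expression the pathway flux stays pinned to the inflow while the pool of X drops, which is the kind of model-specific response that screening against literature data exploits. In this particular toy a fold change of 0.01 has no steady state because the enzyme saturates, so only over-expression is shown.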
Chapter 5
Results and Discussion
5.1 Steady-state modeling results
The strain design in this work was performed using bi-level optimization methods on the genome-scale model iMM904, which has 1577 reactions and 1228 metabolites. Our initial attempt at strain design used Optknock, with maximization of tyrosine production as the outer optimization objective and maximization of growth rate as the inner optimization objective. The simulation was carried out with glucose as the substrate under aerobic conditions (Fig 5.1).
Figure 5.1 Steady-state strain design approach adapted in this study.
Optknock did not converge when applied to iMM904. This was possibly because of the limit on the maximum number of knockouts that could be allowed, which in our case was four. Therefore, to circumvent this problem, we used GDLS, an alternative formulation of Optknock. GDLS uses a local search instead of the global search adopted by Optknock and allows the limit on the maximum number of knockouts to be increased. When GDLS was applied with tyrosine production as the objective of the outer problem, it did not yield any solution. However, as GDLS is a local search algorithm, we hypothesised that the path followed by the algorithm while searching for a strategy might be critical. Hence, to direct the search path, we first designed a strain that maximized chorismate production. Chorismate is an intermediate in the aromatic amino acid pathway and a precursor to all three aromatic amino acids. We then used the chorismate strain, instead of wild-type iMM904, as the starting point for the tyrosine strain design. With this approach, we obtained a strategy that showed excellent growth coupling. The figure below (Fig. 5.2) shows the FBA-predicted feasible flux space of wild-type iMM904 and the mutant.
Figure 5.2 Map of computationally predicted solution space for wild type iMM904 and the strategy designed from
GDLS. A and B represent the optimal points for wild type and mutant strains
According to our hypothesis, cells should grow at the state where the growth rate is maximal. From the above figure (Fig. 5.2) it is clear that for wild-type iMM904 this state is at point A. At this state of maximum growth rate there is, as expected, no production of tyrosine. In the mutant, however, the cells are forced to produce tyrosine. After adaptive evolution, the mutant is expected to grow at point B, where it is predicted to produce tyrosine at 60% of the maximum theoretical yield.
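A feasible flux space of this kind can be traced by fixing the growth flux at a series of values and computing the minimum and maximum product flux at each. The sketch below does this for a hypothetical five-reaction toy network and a single-knockout mutant; it is only meant to illustrate how such a map is computed and does not reproduce the iMM904 results. The network, bounds and function name are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network, columns: v_up, r1, r2, v_bio, v_p
#   r1: A -> X (efficient), r2: A -> 0.5 X + 0.5 P (co-produces P)
S = np.array([
    [1.0, -1.0, -1.0,  0.0,  0.0],   # A
    [0.0,  1.0,  0.5, -1.0,  0.0],   # X
    [0.0,  0.0,  0.5,  0.0, -1.0],   # P
])
BIO, PROD = 3, 4


def product_range(growth, knockouts=()):
    """Return (min, max) product flux at a fixed growth flux, or None if infeasible."""
    bounds = [(0.0, 10.0)] + [(0.0, 1000.0)] * 4
    for j in knockouts:
        bounds[j] = (0.0, 0.0)            # deleted reaction carries no flux
    bounds[BIO] = (growth, growth)        # pin the growth flux
    c = np.zeros(S.shape[1])
    c[PROD] = 1.0
    b_eq = np.zeros(S.shape[0])
    lo = linprog(c, A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")
    hi = linprog(-c, A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")
    if not (lo.success and hi.success):
        return None
    return lo.fun, -hi.fun


for g in (0.0, 2.5, 5.0, 7.5, 10.0):
    print(g, "wild type:", product_range(g), "mutant:", product_range(g, knockouts=(1,)))
```

In the toy wild type the product range collapses to zero at maximum growth, whereas in the mutant the minimum product flux at its maximum growth is strictly positive, which is the signature of growth coupling.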
5.1.1 Predicted strategy
Figure 5.3 Schematic of the deletions suggested by GDLS.
As shown in the figure above (Fig. 5.3), the predicted strategy is complex and targets the deletion of 8 different enzymes.
PDC, pyruvate decarboxylase, exists in three isoforms in S. cerevisiae: PDC1, PDC5 and PDC6 (http://www.yeastgenome.org/). This enzyme catalyzes the conversion of pyruvate to acetaldehyde and plays a major role in respiro-fermentative metabolism. We hypothesize that deleting this reaction would divert flux from ethanol production into the aromatic amino acid pathway.
PYC, pyruvate carboxylase, exists in two isoforms, PYC1 and PYC2, and converts cytoplasmic pyruvate to cytoplasmic oxaloacetate. MDH, malate dehydrogenase, is a mitochondrial enzyme that converts malate to oxaloacetate. MTM, the malate transporter, exports malate across the mitochondrial membrane (http://www.yeastgenome.org/). We hypothesize that the PYC, MDH and MTM deletions are required to prevent wastage of carbon flux on production of excess mitochondrial ATP through respiratory metabolism in the TCA cycle.
ZWF, glucose-6-phosphate dehydrogenase, catalyses the first step of the pentose phosphate pathway and is the major source of cytoplasmic NADPH. It has been reported that deletion of ZWF results in methionine auxotrophy due to depletion of the cytoplasmic NADPH pool. We hypothesize that ZWF deletion increases production through cofactor coupling, as production of tyrosine from prephenate (Fig. 3.2) is one of the few ways in which cells can generate cytoplasmic NADPH (http://www.yeastgenome.org/).
SER, 3-phosphoglycerate dehydrogenase, catalyzes the first step in serine and glycine biosynthesis. Deleting this gene does not deprive the cells of serine and glycine, because they can be produced by an alternative route from alanine. The SER deletion is probably suggested to prevent diversion of flux from glycolysis. DAK, dihydroxyacetone kinase, is involved in the glycerol production branch of cell metabolism; its deletion was probably suggested to prevent regeneration of the depleted NADPH pool through this reaction instead of through tyrosine production.
ARO10 is a decarboxylase that plays a role in the degradation of all three aromatic amino acids (http://www.yeastgenome.org/). This was the only reaction inside the aromatic amino acid pathway suggested by GDLS, and we have experimental evidence (Fig. 5.4) that deletion of the ARO10 gene increases intracellular tyrosine pools.
5.1.2 Experimental validation for ARO10 deletion
Figure 5.4 Schematic of experimental modifications made while testing ARO10 deletion. We compared ARO10
deletion mutant with tyrosine feedback resistant mutant reported earlier.
Experimental validation of the ARO10 deletion was carried out by Professor Vince Martin’s group, our collaborators at Concordia University. We compared the ARO10 deletion mutant with the only reported work on increased tyrosine levels in S. cerevisiae (Luttik et al., 2008). In that study, Luttik et al. observed increased tyrosine pools when the tyrosine feedback-sensitive enzymes were replaced by their feedback-insensitive versions. The figure above (Fig 5.4) shows the reactions targeted for experimental validation. The figure below (Fig. 5.5) shows the accumulation rates of 4HPP (A) and tyrosine (B) for four different strains: wild-type, the mutant with feedback-insensitive genes, the ARO10 deletion mutant, and the ARO10 deletion mutant combined with removal of feedback inhibition. In this work, coumarate was used as a sink for tyrosine.
Figure 5.5 Graphs showing accumulation of 4HPP and tyrosine in the four strains that we compared: wild-type,
tyrosine feedback insensitive mutant, ARO10 deletion mutant and mutant with both ARO10 deletion and feedback
insensitive enzymes. (Data from Vince Martin’s Group)
4HPP is the immediate precursor of tyrosine and is converted to tyrosine by an equilibrium reaction catalyzed by aromatic aminotransferase. Because 4HPP exists in equilibrium with tyrosine, intracellular 4HPP pools can also be drawn towards our products, naringenin or xanthohumol (Fig 1.1). The accumulation rates of both 4HPP and coumarate in the ΔARO10 mutant were found to be comparable to those of the feedback-resistant mutant reported earlier. However, an important observation was that when removal of feedback inhibition was combined with ARO10 deletion, the intracellular levels of both coumarate and 4HPP were much higher than in either of the above cases. This suggests that, in the feedback-resistant mutant, most of the flux coming into the aromatic amino acid pathway was diverted towards production of tyrosol, indole-3-ethanol and phenylethanol instead of increasing the tyrosine pools.
5.1.3 Need for Ensemble modeling
As shown before, steady-state modeling predicted a complex strategy. According to this strategy, there is no appreciable growth coupling of tyrosine production unless all the predicted deletions are made simultaneously. This seemingly incorrect prediction arises because steady-state models do not account for kinetic parameters and treat every reaction in the network as equally feasible. However, the fact that ARO10 deletion alone resulted in an increased tyrosine pool suggests that not all of the above deletions are needed to observe a further increase in tyrosine levels. Additionally, it is experimentally tedious to construct a mutant with multiple deletions, so it is important to select targets that are experimentally verifiable. To make this selection, we need to determine which of the deletions suggested by the steady-state model play major roles in re-routing carbon flux towards tyrosine. Determining this subset requires information on the kinetics of the reactions. However, the time-course data required to estimate kinetic parameters are difficult to obtain. To circumvent this difficulty, we used ensemble modeling to estimate the dynamic behaviour of the cells using steady-state flux data (section 4.2).
5.2 Ensemble modeling results
As stated in chapter 4, although EM is a novel approach that can predict the dynamic behaviour of cells using steady-state reference data available in the literature, the large number of parameters involved makes it currently impractical to apply EM to a large-scale network. For this reason we prepared a small central model of S. cerevisiae for EM application.
5.2.1 S. cerevisiae central model reconstruction
While constructing the model for EM, we included reactions from all the major metabolic pathways, such as glycolysis, glucose fermentation, the pentose phosphate pathway and the TCA cycle. In this small-scale model, we accounted for compartmentalisation by separating the metabolites in the cytoplasm and the mitochondria. Metabolites present in both compartments were connected through exchange reactions. However, we assumed that cofactors such as ATP, NADPH and NADH could be transported freely between the two compartments. The EM framework required us to provide reference steady-state flux data to anchor the models. Because we did not have 13C flux data for our strain, we used the yeast 13C data from Blank et al. (2005). However, the data they reported covered only core central metabolism and did not include the aromatic amino acid pathway. Hence, for this project, we did not consider the reactions of the aromatic amino acid pathway.
The central model that we used contained all the other reactions suggested by steady-state
modeling, except those of the aromatic amino acid pathway. However, one of our hypotheses was that
NADPH pool depletion might play a role in coupling growth rate with tyrosine production. To
test this hypothesis, we modified the first step of the aromatic amino acid pathway, the
condensation of PEP and E4P to DAHP, into the following form:
PEP + E4P + NADP ---> DAHP + NADPH
This made it a reaction that could regenerate NADPH, although the actual reaction does not
regenerate NADPH. The reference steady-state flux through this
reaction was set to zero because, under steady-state conditions, there should be no
accumulation of any of the aromatic amino acids in the wild type. Since the aromatic
amino acid pathway was not in our model, we could not prioritize targets for tyrosine production
directly. Instead, we looked for targets that would increase the intracellular pools of the
precursors PEP, E4P and DAHP.
In addition, Blank et al. (2005) did not report flux data for all the reactions present in our
network. To calculate the unknown flux values, and to ensure that the
resulting flux distribution was at steady state, we formulated the following optimization problem,
which minimizes the difference between the calculated and the reported flux data:
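\[
\begin{aligned}
\min_{V_{cal}} \quad & (V_{cal} - V_m)^2 && (5.1) \\
\text{subject to} \quad & S \cdot V_{cal} = 0 && (5.2) \\
& \text{reaction bounds} && (5.3)
\end{aligned}
\]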
In the above formulation, Vcal is the vector of calculated fluxes for all the reactions in our
network and Vm is the vector of flux values reported by Blank et al. (2005). S is the stoichiometric
matrix. The reaction bounds were chosen as 0 and 1000 for irreversible reactions, and -1000 and 1000
for reversible reactions.
The following figure (Fig. 5.6) shows the reactions we considered, along with the calculated
reference steady-state flux values:
Figure 5.6 Schematic of reconstructed S. cerevisiae network used in this work along with calculated flux data
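The flux reconciliation problem of equations 5.1 to 5.3 can be illustrated on a toy network. The sketch below is not the code used in this work: the four-reaction network, the measured values and the function names `solve_linear` and `reconcile_fluxes` are all hypothetical. It solves the equality-constrained least-squares problem through its Lagrange (KKT) system and assumes that the reaction bounds are inactive, which is checked afterwards rather than enforced.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: fluxes are not uniquely determined")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def reconcile_fluxes(S, measured):
    """Minimise sum_j (v_j - vm_j)^2 over measured reactions j, subject to S v = 0.

    S is a list of rows (one per metabolite); measured maps a reaction index
    to its reported flux. Bounds are NOT enforced in this sketch.
    """
    m, n = len(S), len(S[0])
    K = [[0.0] * (n + m) for _ in range(n + m)]
    rhs = [0.0] * (n + m)
    for j, vm in measured.items():       # gradient of the squared error
        K[j][j] = 2.0
        rhs[j] = 2.0 * vm
    for i in range(m):                   # steady-state constraints and multipliers
        for j in range(n):
            K[j][n + i] = S[i][j]
            K[n + i][j] = S[i][j]
    return solve_linear(K, rhs)[:n]


# Hypothetical toy network (not the thesis network):
#   v0: -> A,   v1: A -> B,   v2: A -> by-product,   v3: B -> product
S = [
    [1.0, -1.0, -1.0, 0.0],   # balance on A
    [0.0, 1.0, 0.0, -1.0],    # balance on B
]
measured = {0: 10.0, 2: 1.2, 3: 9.0}   # v1 unmeasured; the values do not balance exactly
v_cal = reconcile_fluxes(S, measured)
assert all(0.0 <= v <= 1000.0 for v in v_cal)   # irreversible bounds are inactive here
print([round(v, 3) for v in v_cal])
```

The small imbalance in the measured values is spread over the three measured fluxes, and the unmeasured flux is fixed by the steady-state constraint. If a bound became active, a quadratic programming solver would be needed instead of this linear solve.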
We took the biomass equation for our reconstructed model from Heer et al. (2009). Biomass
was assumed to be formed according to:
1.911 G6P[c] + 0.351 R5P[c] + 0.363 GAP[c] + 1.332 OAA[c] + 0.584 SER[c] + 1.397
ACCOA[c] + 0.155 ACCOA[m] + 0.997 OXOGLUTARATE[m] + 2.147 PYR[m] + 0.239
PYR[c] + 0.579 PEP[c] + 0.289 E4P[c] + 11.352 ATP + 11.249 NADPH + 0.118 NAD --->
Biomass + 11.249 NADP + 0.118 NADH + 11.352 ADP
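As a worked illustration of what this equation implies, the following sketch multiplies each coefficient by a biomass flux to obtain the drain on each precursor. The coefficients are copied from the equation above and the biomass flux of 0.26 is the Biomass_out value listed in Appendix A; the function name `precursor_drains` is hypothetical, and the values are in the same (unstated) flux units as Appendix A.

```python
# Left-hand side of the biomass equation quoted above: precursor -> coefficient.
BIOMASS_PRECURSORS = {
    "G6P[c]": 1.911, "R5P[c]": 0.351, "GAP[c]": 0.363, "OAA[c]": 1.332,
    "SER[c]": 0.584, "ACCOA[c]": 1.397, "ACCOA[m]": 0.155,
    "OXOGLUTARATE[m]": 0.997, "PYR[m]": 2.147, "PYR[c]": 0.239,
    "PEP[c]": 0.579, "E4P[c]": 0.289, "ATP": 11.352, "NADPH": 11.249,
    "NAD": 0.118,
}


def precursor_drains(biomass_flux, coefficients=BIOMASS_PRECURSORS):
    """Flux of each precursor withdrawn for biomass at the given biomass flux."""
    return {met: coeff * biomass_flux for met, coeff in coefficients.items()}


drains = precursor_drains(0.26)   # 0.26 = Biomass_out flux in Appendix A
for met in ("PEP[c]", "E4P[c]", "NADPH"):
    print(f"{met}: {drains[met]:.3f}")
```

The comparison makes the cofactor demand of growth explicit: the NADPH drain is far larger than the drain on the tyrosine precursors PEP and E4P.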
EM can also account for allosteric regulation by treating each regulatory interaction as a
separate reaction. In our network, we considered three reported allosteric regulations: activation
and inhibition of the conversion of fructose-6-phosphate (F6P) to fructose-1,6-bisphosphate (FBP)
by AMP and ATP, respectively (Simonis et al., 2004), and activation of the conversion of PEP to
pyruvate by FBP (Boles et al., 1997).
5.2.2 Development of an ensemble of models
The reconstructed network, along with the reference steady-state flux values and ΔG values for all
the reactions in the network, was given as input to the EM algorithm, and we constructed a set of
2500 models. Each of these models had a different set of kinetic parameter values, but all of
them were anchored to the same steady-state flux data.
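The idea of anchoring can be illustrated with a deliberately simplified sketch. This is not the EM algorithm, which works on reactions decomposed into elementary steps (Appendix B). Here a single Michaelis-Menten-style rate law is assumed for illustration, with substrate and enzyme levels normalised to 1 at the reference state. The sampling range for `km` and the function names are hypothetical. The reference flux of 15.04 is the pdc value from Appendix A.

```python
import random


def sample_anchored_model(v_ref, rng):
    """Sample one parameter set that reproduces v_ref at the reference state.

    Assumed rate law (illustration only): v = e * vmax * s / (km + s),
    with s = 1 and e = 1 at the reference steady state.
    """
    km = 10 ** rng.uniform(-2, 2)   # sampled freely (hypothetical range)
    vmax = v_ref * (km + 1.0)       # solved so that v(s=1, e=1) equals v_ref
    return {"km": km, "vmax": vmax}


def rate(model, s=1.0, e=1.0):
    """Reaction rate for normalised substrate level s and enzyme level e."""
    return e * model["vmax"] * s / (model["km"] + s)


rng = random.Random(0)
ensemble = [sample_anchored_model(v_ref=15.04, rng=rng) for _ in range(2500)]

# Every model gives the same flux at the reference state...
print(min(rate(m) for m in ensemble), max(rate(m) for m in ensemble))
# ...but the models disagree once the system moves away from it.
print(min(rate(m, s=2.0) for m in ensemble), max(rate(m, s=2.0) for m in ensemble))
```

Because the models agree only at the reference state, data from perturbed strains are needed to tell them apart, which motivates the screening step.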
5.2.3 Screening the ensemble using data from literature
The advantage of using EM is that we can predict the dynamic behaviour of cells using
reported steady-state flux data from enzyme tuning experiments. In this work, to screen the
models, we used data from glycerol (Nevoigt et al., 1996) and succinic acid (Raab et al., 2010)
over-producing strains reported in the literature. We chose these data sets because all the
modifications made in those studies targeted reactions that were included in our central
model.
In the glycerol over-producing strain, pyruvate decarboxylase (PDC) and glycerol-3-phosphate
dehydrogenase (GPD) were the targeted genes. When the PDC gene was
repressed to approximately 20% activity, the yield of glycerol increased 4.5-fold. When GPD
was over-expressed to increase its activity 20-fold, a six-fold increase in the yield of glycerol
was observed. When these two manipulations were performed simultaneously, the glycerol
yield increased around 8-fold. In each of these glycerol over-production cases, the
ethanol production rate decreased.
In the succinate over-producing strain, succinate dehydrogenase (SDH) and isocitrate
dehydrogenase (IDH) were the targeted reactions. When these two genes were deleted, an
increase in succinate production was observed, along with a 25% reduction in the growth rate of
the strain.
The schematic below (Fig. 5.7) shows how models were screened using data from each of the
above-mentioned experimental findings. For model screening, we perturbed each of the 2500
models computationally by modifying the enzyme concentration to reflect the genetic
manipulation performed experimentally. In the case of glycerol production, to simulate PDC
repression, the enzyme concentration in all 2500 models was decreased to 0.2 times the
original concentration, and for GPD over-expression, the enzyme concentration was increased 20
times. In the case of the succinic acid strain, SDH and IDH deletions were simulated by decreasing
the corresponding enzyme concentrations in each model to 0.01 times the original value. After
these perturbations, we found that 42 of the 2500 models gave results in agreement
with the experimental observations. These models were then used to determine the
experimental targets.
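The perturbations described above amount to scaling enzyme levels by fixed factors. A minimal sketch follows, assuming each model stores normalised enzyme levels in a dictionary. That data structure and the function name `perturb` are hypothetical; the fold changes are those stated in the text.

```python
# Fold changes in enzyme level used to mimic each experiment (values from the text).
PERTURBATIONS = {
    "PDC repression": {"PDC": 0.2},
    "GPD over-expression": {"GPD": 20.0},
    "GPD over-expression + PDC repression": {"GPD": 20.0, "PDC": 0.2},
    "SDH and IDH deletion": {"SDH": 0.01, "IDH": 0.01},
}


def perturb(enzyme_levels, fold_changes):
    """Return a copy of enzyme_levels with the given fold changes applied."""
    perturbed = dict(enzyme_levels)
    for enzyme, factor in fold_changes.items():
        perturbed[enzyme] = perturbed[enzyme] * factor
    return perturbed


# Enzyme levels normalised to 1 in the reference (wild-type) state: an assumption.
reference = {"PDC": 1.0, "GPD": 1.0, "SDH": 1.0, "IDH": 1.0}
mutants = {name: perturb(reference, fc) for name, fc in PERTURBATIONS.items()}
for name, levels in mutants.items():
    print(name, levels)
```

Note that a deletion is represented here by a small non-zero factor (0.01) rather than zero, as in the text.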
The EM framework applied in this work is qualitative and cannot be expected to
accurately predict the experimental flux data observed after an enzyme tuning experiment.
Therefore, while screening these models, we allowed a 25% error range on the predicted flux
values. The actual ranges used for screening each perturbation are shown in the table below:
Perturbation                            Range
GPD over-expression                     Glycerol yield between 4.5 and 7.5 times the wild-type yield
PDC repression                          Glycerol yield between 3.2 and 5.7 times the wild-type yield
GPD over-expression + PDC repression    Glycerol yield more than 6 times the wild-type yield
SDH and IDH deletion                    Growth rate between 60% and 90% of the wild-type growth rate
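The screening step can be sketched as a filter over model predictions using the ranges in the table above. The layout of the prediction dictionary and the two example models are hypothetical; only the acceptance ranges come from the table.

```python
# Acceptance test for each perturbation: (predicted quantity relative to the
# wild type, predicate implementing the range from the table).
SCREENS = {
    "GPD over-expression": ("glycerol_yield", lambda x: 4.5 <= x <= 7.5),
    "PDC repression": ("glycerol_yield", lambda x: 3.2 <= x <= 5.7),
    "GPD over-expression + PDC repression": ("glycerol_yield", lambda x: x > 6.0),
    "SDH and IDH deletion": ("growth_rate", lambda x: 0.60 <= x <= 0.90),
}


def passes_screen(predictions):
    """True if a model's predictions fall inside every acceptance range."""
    return all(
        accept(predictions[perturbation][quantity])
        for perturbation, (quantity, accept) in SCREENS.items()
    )


# Two hypothetical models with made-up predictions (fold change vs. wild type).
model_a = {
    "GPD over-expression": {"glycerol_yield": 5.8},
    "PDC repression": {"glycerol_yield": 4.1},
    "GPD over-expression + PDC repression": {"glycerol_yield": 7.9},
    "SDH and IDH deletion": {"growth_rate": 0.78},
}
model_b = dict(model_a, **{"PDC repression": {"glycerol_yield": 1.4}})

kept = [m for m in (model_a, model_b) if passes_screen(m)]
print(len(kept))
```

A model is kept only if it satisfies all four conditions at once, which is why so few of the 2500 models survive the screen.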
Figure 5.7 Model screening using data from succinic acid and glycerol over producing strains
5.2.4 Prioritizing the targets for experiments using screened models
The 42 models obtained after screening against the reported experimental data were used to
prioritize the reactions to be targeted among the deletions suggested by steady-state
modeling. To accomplish this, the selected models were perturbed to simulate the flux
distribution resulting from each deletion. We then observed the effect of each deletion on
growth rate and on the accumulation rates of PEP, E4P and DAHP. The accumulation rate of a
metabolite was defined as the difference between the total flux of the reactions that produce the
metabolite and the total flux of the reactions that consume it. The plots below (Fig. 5.8) show the
average rates of accumulation of PEP, E4P and DAHP, and of biomass formation, predicted by the 42
models. Results from SER and DAK deletions are not shown because they did not produce any
significant increase in the accumulation of PEP, E4P or DAHP.
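The definition of accumulation rate can be made concrete with a short sketch. Acetaldehyde is used as the example because all three reactions that touch it are listed with their reference fluxes in Appendix A. The function name `accumulation_rate` and the perturbed flux values are hypothetical.

```python
# Stoichiometric coefficient of acetaldehyde in each reaction that touches it
# (positive = produced, negative = consumed), taken from Appendix A.
ACETALDEHYDE_STOICH = {"pdc": 1, "ald": -1, "adh": -1}


def accumulation_rate(stoich, fluxes):
    """Total producing flux minus total consuming flux for one metabolite."""
    return sum(coeff * fluxes[rxn] for rxn, coeff in stoich.items())


# Reference steady state (Vss values from Appendix A): no net accumulation.
reference_fluxes = {"pdc": 15.04, "ald": 0.87, "adh": 14.17}
print(abs(accumulation_rate(ACETALDEHYDE_STOICH, reference_fluxes)) < 1e-9)

# Made-up fluxes after a perturbation, for illustration only.
perturbed_fluxes = {"pdc": 3.0, "ald": 1.5, "adh": 1.0}
print(accumulation_rate(ACETALDEHYDE_STOICH, perturbed_fluxes))
```

At the reference state the rate is zero by construction. A positive value after a perturbation indicates that the metabolite pool is predicted to grow.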
Although GDLS and FBA predict that deletion of the PDC reaction is not lethal for S. cerevisiae, it
has been reported that when all three isoforms of PDC (PDC1, PDC5 and PDC6) are deleted, yeast
does not grow with glucose as the only carbon source (Flikweert et al., 1996; Hohmann et al.,
1991). Also, complete deletion of PDC was not recommended for this project because we
needed to produce acetate, which is then converted to Acetyl CoA and further to
Malonyl CoA. Malonyl CoA is required for the conversion of 4-Coumaryl CoA to naringenin, as
shown in Fig. 1.1. Therefore, in the EM simulations, we repressed the PDC gene instead of deleting
it. It has been reported that PDC repression can be achieved by deleting the PDC1 isoform, which
results in a 30% reduction of total pyruvate decarboxylase activity (Flikweert et al., 1996).
Figure 5.8 EM predicted PEP, E4P, DAHP accumulation and biomass formation rates when deletions suggested by
GDLS are implemented.
The model predicted that PDC repression is the most important manipulation. PDC repression would
result in increased intracellular pools of PEP (Fig. 5.8), but no significant increase was observed
in either the E4P or DAHP pools (Fig. 5.8). PEP accumulation was probably observed because
repression of the PDC gene decreased the flux from pyruvate to acetaldehyde (which is a
major flux in S. cerevisiae under aerobic conditions), thereby increasing the pools of pyruvate
and its immediate precursor, PEP. Additionally, the predicted drop in growth rate was
significant compared to the wild type (Fig. 5.8).
PYC deletion did not add any significant value to our objective, and when PYC, MDH and
MTM were deleted together along with PDC repression, we observed that the growth rate decreased
significantly (Fig. 5.8) without any increase in the metabolites of interest. Therefore, PYC,
MTM and MDH deletions may not be required to observe improved tyrosine levels.
Although ZWF deletion resulted in a significant decrease in growth rate (Fig. 5.8), it was useful
for improving the intracellular levels of DAHP, our proxy metabolite for tyrosine (Fig. 5.8). We
also observed a slight increase in E4P pools with ZWF deletion. This appearance of DAHP
inside the cell probably compensated for the reduced NADPH pools. In addition to the
appearance of DAHP, we observed that the flux from acetaldehyde was directed more towards
acetate than ethanol, probably because production of acetate generates NADPH.
This observation is of interest to us because acetate is the precursor for Malonyl CoA, and
diverting more flux from acetaldehyde towards acetate instead of ethanol will be beneficial for
the project. After this observation, we searched the literature for reported data
that supported it. We found that over-expressing ALD6, the major isoform of acetaldehyde
dehydrogenase, has been reported as the only way in which the growth defect of ZWF mutants could
be reversed (Grabowska et al., 2003). To test this in our models, we
over-expressed acetaldehyde dehydrogenase activity, and we did observe a slight
increase in the growth rate (Fig. 5.9). Over-expression of the ALD reaction also resulted in a
further increase of flux towards acetate. The model-predicted flux distribution patterns for the
reactions in the fermentative branch of yeast for the different mutants are discussed in detail in
appendix C.
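The NADPH argument can be checked against the reference fluxes in Appendix A. The sketch below is a back-of-the-envelope calculation, not a model prediction. It sums the reference fluxes of the NADPH-regenerating reactions in the central model and computes the share lost when ZWF is deleted. It assumes that the lumped sol reaction also stops, because its substrate 6PGL is produced only by zwf in this network, and it leaves out the nadp_r recycle reaction.

```python
# Reference fluxes (Vss, Appendix A) of the reactions that regenerate NADPH in
# the central model. glyc1 and dahps also produce NADPH but carry zero flux
# in the reference state.
NADPH_SOURCES = {
    "zwf": 1.06, "sol": 1.06, "ald": 0.87, "malpyr": 0.03,
    "glyc1": 0.0, "dahps": 0.0,
}

# Reactions assumed to stop on ZWF deletion: zwf itself and sol, whose
# substrate (6PGL) is produced only by zwf in this network.
LOST_ON_ZWF_DELETION = ("zwf", "sol")

total = sum(NADPH_SOURCES.values())
lost = sum(NADPH_SOURCES[r] for r in LOST_ON_ZWF_DELETION)
print(f"share of reference NADPH supply lost: {lost / total:.0%}")
```

With most of the reference NADPH supply removed, the remaining NADPH-producing routes, ald in particular, are the natural candidates to carry more flux, which is consistent with the shift towards acetate described above.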
Figure 5.9 Graph showing the effect of ALD over-expression on growth rate of PDC-- and ΔZWF mutant.
In addition to improving the growth rate of the PDC-repressed, ZWF-deleted mutant,
over-expression of ALD6 can compensate for the loss of acetate flux resulting from PDC repression.
5.3 Final strategy to be verified experimentally
The final strategy that we propose to obtain a strain with increased tyrosine levels is shown in the
following schematic:
Figure 5.10 Schematic of the proposed final strategy for tyrosine over-producing strain
The final proposed strategy involves:
- Deletion of three genes: ZWF1, encoding glucose-6-phosphate dehydrogenase; PDC1, encoding the
major isoform of pyruvate decarboxylase; and ARO10
- Over-expression of ALD6, encoding the major isoform of acetaldehyde dehydrogenase
- Removal of tyrosine feedback inhibition by expressing tyrosine insensitive AROFFBR
Chapter 6
Conclusions and Future Work
6.1 Conclusions
In this work, we proposed a strategy for obtaining a strain of S. cerevisiae with an improved
tyrosine-producing capability. The proposed strategy involves five genetic manipulations:
deletion of the PDC1, ZWF1 and ARO10 genes, over-expression of ALD6, and substitution of the native
enzymes with their feedback-insensitive isoforms, which was shown previously to improve
tyrosine yields. Initial experimental validation of our strategy revealed that, when ARO10
deletion was combined with incorporation of feedback-insensitive enzymes, the intracellular
tyrosine pools were significantly higher than in the previously reported strain. While ARO10
deletion and introduction of feedback-insensitive enzymes in the aromatic amino acid pathway were
already shown to be effective, dynamic modeling predicted that PDC1 and ZWF1 deletions
would further improve the flux to tyrosine by increasing the pools of precursors and through
cofactor coupling. We also predicted that ZWF1 deletion would divert greater flux
from acetaldehyde towards acetate instead of ethanol due to NADPH coupling. This would be
further enhanced by over-expression of ALD6, which would also contribute towards improving
the growth rate of the PDC1, ZWF1 deletion mutant. We expect that the final strain containing all
the suggested manipulations would show higher pools of both tyrosine and acetate, making
it an ideal platform strain for production of plant secondary metabolites such as xanthohumol.
Our work will be the first reported study to investigate the genome-wide engineering of
S. cerevisiae for improved tyrosine production. Apart from using the final strain as a host for
production of plant metabolites, if the yield of tyrosine is found to be sufficiently high, the
strain can be used for large-scale production of tyrosine, which is itself an industrially valuable
compound. Using S. cerevisiae as the host instead of E. coli is advantageous because yeast is a
more robust organism.
The modeling procedure discussed here provides an effective way to arrive at a genome-scale
experimental design in the absence of kinetic and regulatory information. Although steady-state
strain design algorithms are effective in predicting genetic engineering strategies, the predicted
strategies can sometimes prove difficult to validate experimentally. In such cases, the
procedure followed in this work, prioritizing the deletions by taking kinetics into account, will
be very useful. It can also be concluded that, because steady-state modeling methods do not
account for kinetics, it is advisable to test steady-state predictions using a dynamic
model before experimental validation.
6.2 Future work
We are currently working on validating the final strategy experimentally, which is an obvious
future direction. Also, modeling methods such as ensemble modeling are highly strain-specific,
and if an accurate dynamic model for a strain is desired, it is necessary to use data only from that
strain for both construction and screening of the ensemble. We are currently working on
obtaining C13 flux data for the strain that we are using for plant secondary metabolite
production. Once the C13 data are available, we will expand our central model to include
reactions from the aromatic amino acid pathway and the heterologous pathway. This expanded
central model will be used to develop a kinetic model specific to our strain, using
experimental results from ZWF1, PDC1 and ARO10 deletions.
References
Alper H, Jin YS, Moxley JF, Stephanopoulos G. Identifying gene targets for the metabolic
engineering of lycopene biosynthesis in Escherichia coli. 2005. Metabolic Engg., 7:155–164.
Asadollahi MA, Maury J, Patil KR, et al: Enhancing sesquiterpene production in Saccharomyces
cerevisiae through in silico driven metabolic engineering. 2009. Metabolic Engg. 11:328-34.
Bailey JE. Complex biology with no parameters. 2001. Nature Biotechnology, 19:503–504.
Berry, A. Improving production of aromatic compounds in Escherichia coli by metabolic
engineering. 1996. Trends in Biotechnol., 14:250–256.
Blank LM, Lars Kuepfer, Uwe Sauer. Large-scale 13C-flux analysis reveals mechanistic
principles of metabolic network robustness to null mutations in yeast. 2005. Genome Biology,
6: R49.
Bongaerts J, Krämer M, Müller U, Raeven L, Wubbolts M. Metabolic engineering for
microbial production of aromatic amino acids and derived compounds. 2001. Metabolic Engg.
3:289–300.
Breuer M, Ditrich K, Habicher T, Hauer B, Keßeler M, Stürmer R, Zelinski T. Industrial
methods for the production of optically active intermediates. 2004. Angewandte Chem Int Ed,
43:788–824.
Bro C, Regenberg B, Förster J, Nielsen J. In silico aided metabolic engineering of
Saccharomyces cerevisiae for improved bioethanol production. 2006. Metabolic Engg.,
8:102–111.
Brochado, Ana Rita, Claudia Matos, Birger L Møller, Jørgen Hansen, Uffe H Mortensen, Kiran
Raosaheb Patil. Improving vanillin production in baker’s yeast through in silico design. 2010.
Microbial Cell Factories, 9:84.
Brown, I.W. Dawes. Regulation of chorismate mutase in Saccharomyces cerevisiae. 1990.
Molecular Genetics and Genomics, pp. 283–288.
Burgard, A. P., Pharkya, P., Maranas, C. D. OPTKNOCK: a bilevel programming framework for
identifying gene knockout strategies for microbial strain optimization. 2003. Biotechnology and
Bioengineering. 84(6), 647-657.
Chassagnole, C, Noisommit-Rizzi, N., Schmid, J.W., Mauch, K., Reuss, M. Dynamic
modeling of the central carbon metabolism of Escherichia coli. 2002. Biotechnology and
bioengineering 79, 53-73.
Chemler JA, Yan Y, Koffas MA. Biosynthesis of isoprenoids, polyunsaturated fatty acids and
flavonoids in Saccharomyces cerevisiae. 2006. Microbial Cell Fact, 5:20.
Contador, C. A., Rizk, M. L., Asenjo, J. A., Liao, J. C., Ensemble modeling for strain
development of L-lysine-producing Escherichia coli. 2009. Metabolic Engg., 11, 221–233.
Dean, J. T., M. L. Rizk, Y. Tan, K. M. Dipple, and J. C. Liao. Ensemble modeling of hepatic
fatty acid metabolism with a synthetic glyoxylate shunt. 2010. Biophysical Journal, 98 (8) :
1385-95.
Dejong JM, Liu Y, Bollon AP, Long RM, Jennewein S, Williams D, Croteau RB Genetic
engineering of taxol biosynthetic genes in Saccharomyces cerevisiae. 2006. Biotechnol
Bioeng., 93:212–224.
Dobson PD, Smallbone K, Jameson D, Simeonidis E, Lanthaler K, Pir P, Lu C, Swainston N,
Dunn WB, Fisher P. Further developments towards a genome-scale metabolic model of yeast.
2010. BMC Systems Biology, 4:145–151.
Dosselaere F, Vanderleyden J. A metabolic node in action: chorismate-utilizing enzymes in
microorganisms. 2001. Crit Rev Microbiol, 27:75–131.
Duarte NC, Herrgård MJ, Palsson B. Reconstruction and Validation of Saccharomyces
cerevisiae iND750, a Fully Compartmentalized Genome-Scale Metabolic Model.
2004. Genome Res.,14:1298–1309.
Edwards JS, Ibarra RU, Palsson BØ: In silico predictions of Escherichia coli metabolic
capabilities are consistent with experimental data. 2001. Nature Biotechnology, 19:125-30.
Edwards JS, Palsson BO. The Escherichia coli MG1655 in silico metabolic genotype: its
definition, characteristics, and capabilities. 2000. PNAS USA., 97:5528–5533.
Famili I, Förster J, Nielsen J, Palsson BØ: Saccharomyces cerevisiae phenotypes can be
predicted by using constraint-based analysis of a genome-scale reconstructed metabolic
network. 2003. PNAS USA, 100:13134-9.
Flikweert MT, Zanden LVD, Janssen WMTM, et al: Pyruvate decarboxylase: An
indispensable enzyme for growth of Saccharomyces cerevisiae on glucose. 1996. Yeast,
12:247-257.
Feist AM, Palsson BO The growing scope of applications of genome-scale metabolic
reconstructions using Escherichia coli. 2008. Nature Biotechnology, 26:659–667.
Fong SS, Burgard AP, Herring CD, Knight EM, Blattner FR, Maranas CD, Palsson BO. In
silico design and adaptive evolution of Escherichia coli for production of lactic acid. 2005.
Biotechnol Bioeng., 91:643–648.
Forster J, Famili I, Fu P, Palsson BO, Nielsen J. Genome-scale reconstruction of the
Saccharomyces cerevisiae metabolic network. 2003. Genome Res., 13:244–253.
Frost JW, Draths KM. Biocatalytic synthesis of aromatics from D-glucose: renewable microbial
sources of aromatic compounds. 1995. Annu Rev Microbiol., 49:557–579.
Gosset, G. Production of aromatic compounds in bacteria. 2009. Curr Opin Biotechnol., 20 , pp.
651–658.
Goffeau A: The yeast genome directory. 1997. Nature, 387:5-6.
Grabowska D and Chelstowska A .The ALD6 gene product is indispensable for providing
NADPH in yeast cells lacking glucose-6-phosphate dehydrogenase activity. 2003. J Biol
Chem., 278(16):13984-8.
Hawkins, K.M. and C.D. Smolke, Production of benzylisoquinoline alkaloids in Saccharomyces
cerevisiae. 2008. Nature Chemical Biology, 4: 564-73.
Hartmann, T.R. Schneider, A. Pfeil, G. Heinrich, W.N. Lipscomb, G.H. Braus. Evolution of
feedback-inhibited beta /alpha barrel isoenzymes by gene duplication and a single mutation.
2003. PNAS USA, 100 pp. 862–867.
Heer D, Heine D, Sauer U: Resistance of Saccharomyces cerevisiae to high concentrations of
furfural is based on NADPH-dependent reduction by at least two oxireductases. 2009. Appl
Environ Microbiol, 75:7631.
Herrgard MJ et al. A consensus yeast metabolic network reconstruction obtained from a
community approach to systems biology. 2008. Nature Biotechnology, 26:1155–1160.
Hohmann S: Characterization of PDC6, a third structural gene for pyruvate decarboxylase in
Saccharomyces cerevisiae. 1991. Journal of bacteriology, 173:7963-9.
Ikeda, M. Amino acid production processes. 2003. Advances in biochemical engineering/
biotechnology, 79:1-35.
Ikeda M. Towards bacterial strains overproducing L-tryptophan and other aromatics by
metabolic engineering. 2006. Appl Microbiol Biotechnol., 69:615–626.
Jankowski, M. D., C. S. Henry, L. J. Broadbelt, and V. Hatzimanikatis. Group contribution
method for thermodynamic analysis of complex metabolic networks. 2008. Biophysical
Journal, 95 (3) (Aug): 1487-99.
Jiang H, Wood KV, Morgan JA. Metabolic engineering of the phenylpropanoid pathway in
Saccharomyces cerevisiae. 2005. Appl Environ Microbiol, 71:2962–2969.
Juminaga D, Edward E. K. Baidoo, Alyssa M. Redding-Johanson, Tanveer S. Batth, Helcio
Burd, Aindrila Mukhopadhyay, Christopher J. Petzold and Jay D. Keasling. Modular
Engineering of Tyrosine Production in Escherichia coli. 2012. Appl. Environ. Microbiol., vol.
78.
Kauffman KJ, Prakash P, Edwards JS. Advances in flux balance analysis. 2003. Curr Opin
Biotechnol., 14:491–496.
Khosla C, Keasling JD, Metabolic engineering for drug discovery and development. 2003. Nat
Rev Drug Discov., 2:1019–1025.
Krappmann, W.N. Lipscomb, G.H. Braus. Coevolution of transcriptional and allosteric
regulation at the chorismate metabolic branch point of Saccharomyces cerevisiae. 2000.
PNAS USA, 97 pp. 13585–13590.
Kuepfer L, Sauer U, Blank LM. Metabolic functions of duplicate genes in Saccharomyces
cerevisiae. 2005. Genome Res., 15:1421–1430.
Kunzler, G. Paravicini, C.M. Egli, S. Irniger, G.H. Braus. Cloning, primary structure and
regulation of the ARO4 gene, encoding the tyrosine-inhibited 3-deoxy-d-arabino-
heptulosonate-7-phosphate synthase from Saccharomyces cerevisiae. 1992. Gene, 113 pp. 67–
74.
Lee, D.Y. et al. WebCell: a web-based environment for kinetic modeling and dynamic
simulation of cellular networks. 2006. Bioinformatics 22, 1150-1151.
Leuchtenberger, K. Huthmacher, K. Drauz Biotechnological production of amino acids and
derivatives: current status and prospects. 2005. Appl Microbiol Biotechnol, 69 pp. 1–8
Lun, D. S., Rockwell, G., Guido, N. J., Baym, M., Kelner, J. A., Berger, B., Galagan, J. E., et
al. Large-scale identification of genetic design strategies using local search. 2009. Molecular
Systems Biology.
Lütke-Eversloh, C.N. Santos, G. Stephanopoulos Perspectives of biotechnological production of
tyrosine and its applications. 2007. Appl Microbiol Biotechnol., 77 pp. 751–762.
Luttik MAH, Vuralhan Z, Suir E, et al: Alleviation of feedback inhibition in Saccharomyces
cerevisiae aromatic amino acid biosynthesis: quantification of metabolic impact. 2008.
Metabolic engineering, 10:141-53.
Maury J, Asadollahi MA, Moller K, Clark A, Nielsen J. Microbial isoprenoid production: an
example of green chemistry through metabolic engineering. 2005. Adv Biochem Eng
Biotechnol., 100:19–51.
Minami, H., J. S. Kim, N. Ikezawa, T. Takemura, T. Katayama, H. Kumagai, and F. Sato.
Microbial production of plant benzylisoquinoline alkaloids. 2008. PNAS USA, 105: 7393-98.
Mo ML, Palsson B, Herrgård MJ. Connecting extracellular metabolomic measurements to
intracellular flux states in yeast. 2009. BMC Sys Biol., 3:37–54.
Nevoigt E, Stahl U. Reduced pyruvate decarboxylase and increased glycerol-3-phosphate
dehydrogenase [NAD+] levels enhance glycerol production in Saccharomyces cerevisiae. 1996.
Yeast, Oct;12(13):1331-7.
Nissen TL, Kielland-Brandt MC, Nielsen J, Villadsen J: Optimization of ethanol production in
Saccharomyces cerevisiae by metabolic engineering of the ammonium assimilation. 2000.
Metabolic Engg. , 2:69-77.
Nookaew I, Jewett MC, Meechai A, Thammarongtham C, Laoteng K, Cheevadhanarak S,
Nielsen J, Bhumiratana S. The genome-scale metabolic model iIN800 of Saccharomyces
cerevisiae and its validation: a scaffold to query lipid metabolism. 2008. BMC Sys Biol., 2:71.
Olson MM, Templeton LJ, Suh W, Youderian P, Sariaslani FS, Gatenby AA, Van Dyk TK .
Production of tyrosine from sucrose or glucose achieved by rapid genetic changes to
phenylalanine-producing Escherichia coli strains. 2007. Appl Microbiol Biotechnol.,
74:1031–1040.
Osterlund, T., Nookaew, I., Nielsen, J. Fifteen years of large scale metabolic modeling of
yeast: Developments and impacts. 2011. Biotechnology Advances, V. 30 (5).
Orth, J.D., I. Thiele, and B.Ø. Palsson. What is flux balance analysis? 2010. Nature
Biotechnology, 28 (3): 245-8.
Patil,K.R., Rocha,I., Forster,J., and Nielsen,J. Evolutionary programming as a platform for in
silico metabolic engineering. 2005. BMC Bioinformatics.
Pharkya P, Maranas CD. An optimization framework for identifying reaction
activation/inhibition or elimination candidates for overproduction in microbial systems. 2006.
Metabolic Engg., 8:1–13.
Pittard J Biosynthesis of aromatic amino acids. In: Neidhardt FC (ed) Escherichia coli and
Salmonella typhimurium: cellular and molecular biology, vol. 1. 1996. American Society of
Microbiology, Washington, DC, pp 458–484.
Porro D, Sauer M, Branduardi P, Mattanovich D. Recombinant protein production in yeasts.
2005. Mol Biotechnol., 31:245–259.
Price ND, Papin JA, Schilling CH, Palsson BO. Genome-scale microbial in silico models: the
constraints-based approach. 2003. Trends in Biotechnol., 21:162–169.
Primrose SB. The application of genetically engineered micro-organisms in the production of
drugs. 1986. J. Appl. Bacteriol., 61: 99-116.
Majewski RA, Domach MM. Simple constrained optimization view of acetate overflow in E. coli. 1990.
Biotechnol Bioeng., 35:732–738.
Raab AM, Gebhardt G, Bolotina N, Weuster-Botz D, Lang C. Metabolic engineering of
Saccharomyces cerevisiae for the biotechnological production of succinic acid. 2010. Metabolic
Engg., 12, 518-525.
Ramakrishna R, Edwards JS, McCulloch A, Palsson BO. Flux-balance analysis of mitochondrial
energy metabolism: consequences of systemic stoichiometric constraints. 2001. American J
Physiol., 280:695–704.
Ranganathan S, Suthers PF, Maranas CD. OptForce: An Optimization Procedure for
Identifying All Genetic Manipulations Leading to Targeted Overproductions. 2010. PLoS
Comp Biol., 6:1–11.
Ro DK, Douglas CJ. Reconstitution of the entry point of plant phenylpropanoid metabolism in
yeast (Saccharomyces cerevisiae): implications for control of metabolic flux into the
phenylpropanoid pathway. 2004. J Biol Chem., 279:2600–2607.
Ro DK, Paradise EM, Ouellet M et al. Production of the antimalarial drug precursor
artemisinic acid in engineered yeast. 2006. Nature, 440:940–943.
Santos CNS, Mattheos Koffas, Gregory Stephanopoulos. Optimization of a heterologous
pathway for the production of flavonoids from glucose. 2011. Metabolic Engg., 13:392–400.
Segre D, Vitkup D, Church GM. Analysis of optimality in natural and perturbed metabolic
networks. 2002. PNAS USA, 99:15112–15117.
Khazaei, Tahmineh. Ensemble Modeling of Cancer Metabolism. 2011. University of Toronto,
MASc thesis.
Tran, L. M., Rizk, M. L., Liao, J. C., Ensemble modeling of metabolic networks. 2008.
Biophys. J., 95, 5606–5617.
Wang, L., Birol, I. & Hatzimanikatis, V. Metabolic control analysis under uncertainty:
framework development and case studies. 2004. Biophysical journal, 87, 3750-3763.
Wang Q, Chen X, Yang Y, Zhao X. Genome-scale in silico aided metabolic analysis and flux
comparisons of Escherichia coli to improve succinate production. 2006. Appl Microbiol
Biotechnol., 73:887–894.
Yan Y, Huang L, Koffas MA. Biosynthesis of 5-deoxy flavanones in microorganisms. 2007.
Biotechnol. J, 2:1250–1262.
Yang, L., Cluett, W. R., Mahadevan, R. EMILiO: a fast algorithm for genome-scale strain
design. 2011. Metabolic Engg., 13(3), 272-281.
Yi J, Draths KM, Li K, Frost JW. Altered glucose transport and shikimate product yields in
Escherichia coli. 2003. Biotechnol. Prog., 19:1450–1459.
Zabriskie DW, Arcuri EJ. Factors influencing productivity of fermentations employing
recombinant microorganisms. 1986. Enzyme Microb. Technol., 8: 706-717.
Zomorrodi, A.R and Maranas, C.D. Improving the iMM904 S. cerevisiae metabolic model
using essentiality and synthetic lethality data. 2010. BMC Systems Biology, 4:178.
Appendix A
Table showing the reactions considered for the central model of S. cerevisiae, along with
their ΔG and steady-state flux (Vss) values
Name Reaction ΔG Vss
hex [c] : D-Glucose + ATP ---> ADP + G6P -4.5 10
zwf [c] : G6P + NADP ---> NADPH + 6PGL -2 1.06
sol [c] : 6PGL + NADP ---> R5P + NADPH -4.81 1.06
pgi [c] : G6P <==> F6P -0.9 8.47
pfk [c] : ATP + F6P ---> ADP + FBP -4.5 9.1
fba [c] : FBP ---> 2 GAP 4.3 9.1
tkt1 [c] : 2 R5P <==> GAP + S7P 2.8 0.35
tal [c] : GAP + S7P <==> E4P + F6P -1.75 0.35
tkt2 [c] : E4P + R5P <==> GAP + F6P -1.75 0.28
ser3 [c] : GAP + NAD ---> NADH + SER 3.9 0.12
gpd [c] : GAP + NAD <==> 3pDGP + NADH -0.36 17.24
pgk [c] : 3pDGP + ADP ---> ATP + PEP -2.63 17.24
pyk [c] : PEP + ADP ---> PYR + ATP -4.61 17.14
pepck [c] : OAA + ATP ---> ADP + PEP -0.29 0.03
pyc [c] : PYR + ATP ---> ADP + OAA -1.1 0.89
acoah [c] : ACETATE + ATP ---> ACCOA + AMP -0.95 0.35
ald [c] : ACETALDEHYDE + NADP ---> ACETATE + NADPH -11.99 0.87
adh [c] : ACETALDEHYDE + NADH ---> ETHANOL + NAD -5.78 14.17
pdc [c] : PYR ---> ACETALDEHYDE -3.44 15.04
gdh [c] : G3P + NADH ---> GLYCEROL + NAD 1.44 1.06
glyc1 [c] : GLYCEROL + NADP ---> DHA + NADPH -0.9 0
dak [c] : DHA + ATP ---> ADP + GAP -4.5 0
dahps [c] : PEP + E4P + NADP ---> DAHP + NADPH -12.17 0
pdhm [m] : PYR + NAD ---> ACCOA + NADH -8.14 0.66
csm [m] : OAA + ACCOA ---> CIT -8.93 0.64
icdhm [m] : CIT + NAD ---> NADH + OXOGLUTARATE 4.55 0.64
icl [m] : CIT ---> SUCCINATE + GLYOXYLATE 5.25 0
mas [m] : GLYOXYLATE + mitACCOA ---> MAL 8 0
kgdm [m] : OXOGLUTARATE + NAD ---> SUCCINYLCOA + NADH -9.68 0.4
sucoam [m] : SUCCINYLCOA + 0.5 ADP ---> SUCCINATE + 0.5 ATP 1.06 0.2
sdhm [m] : SUCCINATE + NAD ---> FUMARATE + NADH -2.44 0.13
fumm [m] : FUMARATE ---> MAL -0.61 0.13
mdhm [m] : MAL + NAD ---> NADH + OAA 4.8 0.1
malpyr [m] : MAL + NADP ---> PYR + NADPH 1.32 0.03
nadhatp [c] : 2 NADH + 3 ADP + O2 ---> 2 NAD + 3 ATP -0.1 1.99
pyr_t PYR[c] <==> PYR[m] -0.1 1.15
accoa_t ACCOA[c] <==> ACCOA[m] -0.1 0.02
oaa_t OAA[c] <==> OAA[m] -0.1 0.53
atp_r ATP <==> ADP -0.1 17.1
nadp_r NADPH <==> NADP -0.1 0.33
nad_r NADH <==> NAD -0.1 0.01
glc_in [c] : ---> D-Glucose -0.1 10
O2_in [c] : ---> O2 -0.1 1.99
glycerol_out [c] : GLYCEROL ---> -0.1 1.06
ac_out [c] : ACETATE ---> -0.1 0.52
eth_out [c] : ETHANOL ---> -0.1 14.2
succ_out [m] : SUCCINATE ---> -0.1 0.26
dahp_out [c] : DAHP ---> -0.1 0
Biomass_out [c] : Biomass ---> -0.1 0.26
Abbreviations
Metabolites
G6P glucose-6-phosphate F6P fructose-6-phosphate
FBP fructose-1,6-bisphosphate DHAP dihydroxyacetonephosphate
GAP glyceraldehyde-3-phosphate DHA dihydroxyacetone
3PG 3-phosphoglycerate PEP phosphoenolpyruvate
PYR pyruvate SER serine
6PGL 6-phosphogluconate Ru5P ribulose-5-phosphate
X5P xylulose-5-phosphate R5P ribose-5-phosphate
S7P sedoheptulose-7-phosphate E4P erythrose-4-phosphate
ACCOA acetyl-CoA OAA oxaloacetate
CIT citrate ICIT isocitrate
MAL malate ATP adenosine-triphosphate
ADP adenosine-diphosphate AMP adenosine-monophosphate
NADH diphosphopyridine nucleotide (reduced) NAD diphosphopyridine nucleotide
NADP nicotinamide adenine dinucleotide phosphate
NADPH nicotinamide adenine dinucleotide phosphate (reduced)
DAHP 3-deoxy-D-arabino-heptulosonate-7-phosphate
Enzymes
hex hexokinase pgi phosphoglucose isomerase
pfk phosphofructokinase fba fructose bisphosphate aldolase
gpd glyceraldehyde 3-phosphate dehydrogenase pgk phosphoglycerate kinase
pyk pyruvate kinase pdh pyruvate dehydrogenase
pepck phosphoenolpyruvate carboxykinase gdh glycerol 3-phosphate dehydrogenase
ser3 serine synthesis zwf glucose-6-phosphate-1-dehydrogenase
tkt1 transketolase 1 tkt2 transketolase 2
tal transaldolase dahps DAHP synthase
csm citrate synthase icdhm isocitrate dehydrogenase
sdhm succinate dehydrogenase fumm fumarase
mdhm malate dehydrogenase icl isocitrate lyase
mas malate synthase pyc pyruvate carboxylase
dak dihydroxyacetone kinase pdc pyruvate decarboxylase
acoah acetyl-CoA hydroxylase ald acetaldehyde dehydrogenase
pdhm pyruvate dehydrogenase kgdm oxoglutarate dehydrogenase complex
sucoam succinate-CoA ligase adh alcohol dehydrogenase
sol enzyme representing the lumped reactions from 6PGL to R5P
pyr_t pyruvate transport atp_r ATP recycle
O2_in oxygen inflow ac_out acetate outflow
Appendix B
The standard mechanisms used to break a reaction into its elementary steps (Dean et al., 2010)
are shown below. A reaction with more than two substrates or products should first be
decomposed into two or more reactions, each with at most two substrates and two products.
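One substrate, one product
$$X_i + E_i \leftrightarrow X_iE_i \leftrightarrow X_{i+1}E_i \leftrightarrow X_{i+1} + E_i$$
One substrate, two products
$$X_i + E_i \leftrightarrow X_iE_i \leftrightarrow X_{i+1}X_{i+2}E_i \leftrightarrow X_{i+1} + X_{i+2}E_i \leftrightarrow X_{i+2} + E_i$$
Two substrates, one product
$$X_i + E_i \leftrightarrow X_iE_i + X_{i+1} \leftrightarrow X_iX_{i+1}E_i \leftrightarrow X_{i+2}E_i \leftrightarrow X_{i+2} + E_i$$
Two substrates, two products
$$X_i + E_i \leftrightarrow X_iE_i + X_{i+1} \leftrightarrow X_iX_{i+1}E_i \leftrightarrow X_{i+2}X_{i+3}E_i \leftrightarrow X_{i+2} + X_{i+3}E_i \leftrightarrow X_{i+3} + E_i$$
Allosteric regulation
$$M + E_{\text{free}} \leftrightarrow E_{\text{complex}}$$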
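The schemes above follow one pattern: ordered substrate binding, conversion on the enzyme, then ordered product release. The sketch below generates the steps for any reaction with one or two substrates and products. It is an illustration of that pattern, not the code used by EM; the function names and the `enzyme[bound species]` labelling of complexes are assumptions made here.

```python
def complex_name(enzyme, bound):
    """Label an enzyme form by the species bound to it, e.g. 'pyk[PEP.ADP]'."""
    return f"{enzyme}[{'.'.join(bound)}]" if bound else enzyme


def elementary_steps(substrates, products, enzyme="E"):
    """Split one enzymatic reaction into reversible elementary steps.

    Each step is a (left, right) pair of species lists. Substrates bind in the
    order given and products leave in the order given, as in the schemes above.
    """
    if not (1 <= len(substrates) <= 2 and 1 <= len(products) <= 2):
        raise ValueError("decompose the reaction first: at most two substrates and two products")

    steps = []
    current = enzyme
    bound = []
    # Ordered substrate binding: X + E <-> XE, then XE + X' <-> XX'E.
    for substrate in substrates:
        bound.append(substrate)
        nxt = complex_name(enzyme, bound)
        steps.append(([substrate, current], [nxt]))
        current = nxt

    # Conversion of the bound substrates into bound products.
    bound = list(products)
    nxt = complex_name(enzyme, bound)
    steps.append(([current], [nxt]))
    current = nxt

    # Ordered product release, ending with the free enzyme.
    while bound:
        product = bound.pop(0)
        nxt = complex_name(enzyme, bound)
        steps.append(([current], [product, nxt]))
        current = nxt
    return steps


def allosteric_step(modifier, enzyme="E"):
    """Binding of a modifier M to the free enzyme: M + E_free <-> E_complex."""
    return ([modifier, enzyme], [complex_name(enzyme, [modifier])])


# Example: pyk from Appendix A (PEP + ADP ---> PYR + ATP).
pyk_steps = elementary_steps(["PEP", "ADP"], ["PYR", "ATP"], enzyme="pyk")
for left, right in pyk_steps:
    print(" + ".join(left), "<->", " + ".join(right))
```

For a two-substrate, two-product reaction such as pyk this yields five reversible steps, matching the number of arrows in the corresponding scheme.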
After the reactions in our network (shown in Appendix A) were broken down into their
elementary form, EM generated a set of 394 elementary reactions. Each of the 2500 models
considered had a different set of kinetic parameters for these 394 elementary reactions. The
Excel file attached to this report contains the kinetic parameters for the forty-two screened
models, together with the stoichiometric matrix for the elementary reactions. The
stoichiometric matrix has 394 reactions and 240 species: 49 are actual metabolites and the
remaining 191 are enzyme complexes.
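To make the structure of such a matrix concrete, the sketch below builds it for a single one-substrate, one-product mechanism and evaluates the resulting time derivatives. It assumes mass-action kinetics for every elementary step, which is the usual choice in ensemble modeling but is stated here as an assumption. The rate constants and concentrations are arbitrary placeholders, not values from the attached file. Each reversible step is kept as one column with a net rate; how the thesis counts forward and reverse steps towards the 394 reactions is not specified here.

```python
species = ["X1", "E", "X1E", "X2E", "X2"]

# Reversible elementary steps: (reactants, products, k_forward, k_reverse).
# The rate constants are arbitrary placeholders, not values from the thesis.
steps = [
    (["X1", "E"], ["X1E"], 1.0, 0.5),
    (["X1E"], ["X2E"], 2.0, 0.1),
    (["X2E"], ["X2", "E"], 1.5, 0.2),
]


def stoichiometric_matrix(species, steps):
    """Rows are species, columns are steps; entries are net stoichiometric coefficients."""
    matrix = [[0] * len(steps) for _ in species]
    for j, (reactants, products, _, _) in enumerate(steps):
        for name in reactants:
            matrix[species.index(name)][j] -= 1
        for name in products:
            matrix[species.index(name)][j] += 1
    return matrix


def mass_action(k, names, conc):
    """Rate constant times the product of the concentrations of the listed species."""
    rate = k
    for name in names:
        rate *= conc[name]
    return rate


def net_rates(conc, steps):
    """Forward minus reverse mass-action rate for each reversible step."""
    return [mass_action(kf, r, conc) - mass_action(kr, p, conc) for r, p, kf, kr in steps]


def derivatives(conc, species, steps):
    """Time derivative of every species: dx/dt = S v."""
    S = stoichiometric_matrix(species, steps)
    v = net_rates(conc, steps)
    return {name: sum(S[i][j] * v[j] for j in range(len(v))) for i, name in enumerate(species)}


conc = {"X1": 1.0, "E": 0.5, "X1E": 0.2, "X2E": 0.1, "X2": 0.3}
dxdt = derivatives(conc, species, steps)
for name, value in dxdt.items():
    print(f"d[{name}]/dt = {value:+.3f}")
```

In this toy system two of the five species are metabolites and three are enzyme forms. The derivatives of the three enzyme forms sum to zero, because the total amount of enzyme is conserved.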
Appendix C
The above plots show the flux predicted by the dynamic model through the reactions ald
(acetaldehyde dehydrogenase) and adh (alcohol dehydrogenase), which convert acetaldehyde to
acetate and ethanol, respectively. Data are shown for four strains: wild type; a mutant with
PDC repression; a mutant with PDC repression and ZWF deletion; and a mutant with ALD
over-expression in addition to PDC repression and ZWF deletion. The table shows the ratio of
fluxes towards acetate and ethanol for the four strains. The plots show that, in all three
mutants, the flux towards both ethanol and acetate was lower than in the wild type. This is
because all three mutants carry PDC repression, which decreases the flux towards
acetaldehyde, the common precursor of ethanol and acetate. The table shows that, while the
ratio of flux towards ethanol and acetate remains the same for the wild type and the
PDC-repressed mutant, a greater fraction of the flux from acetaldehyde was channelled towards
acetate when ZWF was deleted. This increase in the ratio of flux towards acetate after
deletion of the ZWF gene compensates for the depleted NADPH pools in the cytoplasm. When this
mutant was further perturbed by over-expressing ALD (10-fold), the fraction of flux towards
acetate increased further, as expected. This increased flux towards acetate was the reason
for the improved growth rate (Fig 5.9).
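The split of acetaldehyde between acetate and ethanol discussed above can be expressed as a single fraction. The sketch below computes it from the ald and adh fluxes. The input values are the reference fluxes in the last column of the Appendix A table, used here only for illustration; they are not the dynamic-model predictions from the plots. The function name and the scaling factor are hypothetical.

```python
def acetate_fraction(v_ald, v_adh):
    """Fraction of the acetaldehyde flux that goes to acetate rather than ethanol."""
    total = v_ald + v_adh
    if total <= 0:
        raise ValueError("total flux out of acetaldehyde must be positive")
    return v_ald / total


# Reference fluxes for ald and adh, read from the last column of the Appendix A table.
reference = {"ald": 0.87, "adh": 14.17}
wild_type = acetate_fraction(reference["ald"], reference["adh"])

# Hypothetical case: both fluxes halved, as if only the acetaldehyde supply had dropped.
# The factor 0.5 is made up for illustration; it is not a model prediction.
scaled = acetate_fraction(0.5 * reference["ald"], 0.5 * reference["adh"])

print(f"acetate fraction, reference fluxes: {wild_type:.3f}")
print(f"acetate fraction, both fluxes halved: {scaled:.3f}")
```

Scaling both fluxes by the same factor leaves the fraction unchanged. This is why a reduced acetaldehyde supply alone lowers both fluxes without shifting the split, whereas a change in the fraction indicates a redistribution at the acetaldehyde branch point.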
Appendix D
Table showing the number of genes, metabolites and reactions in each of the genome-scale
models reported to date.
Model Name    No. of Genes Sequenced    No. of Metabolites    No. of Reactions
iFF708        708                       825                   1145
iND750        750                       646                   1149
iLL672        672                       636                   1038
iIN800        800                       1013                  1446
iMM904        904                       1228                  1577
Yeast 4.0     932                       1319                  1865
Model                 iIN800    iMM904    Yeast 4.0
Number of genes       707       904       924
True positive (%)     69.7      75        74.8
True negative (%)     6.9       5.1       5.3
False positive (%)    10.6      9.3       11.1
False negative (%)    12.7      10.6      8.8
In the above table, positive and negative refer to the ability and inability, respectively, to
grow under glucose-limited minimal media conditions (Dobson et al., 2010).
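The four outcome percentages can be condensed into summary metrics for easier comparison between models. The sketch below computes accuracy, sensitivity and specificity from the values in the table above. These derived metrics use the standard definitions and are not reported in the thesis; the names `outcomes` and `summarize` are illustrative.

```python
# Percentages copied from the table above.
outcomes = {
    "iIN800": {"TP": 69.7, "TN": 6.9, "FP": 10.6, "FN": 12.7},
    "iMM904": {"TP": 75.0, "TN": 5.1, "FP": 9.3, "FN": 10.6},
    "Yeast 4.0": {"TP": 74.8, "TN": 5.3, "FP": 11.1, "FN": 8.8},
}


def summarize(row):
    """Derive standard classification metrics from the four outcome percentages."""
    total = sum(row.values())  # close to 100; dividing guards against rounding in the table
    return {
        "accuracy": (row["TP"] + row["TN"]) / total,
        "sensitivity": row["TP"] / (row["TP"] + row["FN"]),
        "specificity": row["TN"] / (row["TN"] + row["FP"]),
    }


summary = {model: summarize(row) for model, row in outcomes.items()}
for model, metrics in summary.items():
    print(model, {name: round(value, 3) for name, value in metrics.items()})
```

Accuracy here is the share of correct predictions (true positives plus true negatives), sensitivity is the share of growing cases predicted correctly, and specificity is the share of non-growing cases predicted correctly.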