Model based design of a Saccharomyces cerevisiae

platform strain with improved tyrosine

production capabilities

by

Sarat Chandra Cautha

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science

Chemical Engineering and Applied Chemistry University of Toronto

© Copyright by Sarat Chandra Cautha 2012

Model based design of a Saccharomyces cerevisiae platform

strain with improved tyrosine production capabilities

M.A.Sc Thesis, 2012

Sarat Chandra Cautha, Department of Chemical Engineering and Applied Chemistry,

University of Toronto

Abstract

Large-scale production of plant secondary metabolites is of interest because of their application

in production of many valuable products. Recent advances in the area of recombinant DNA

technology have made it possible to produce these valuable compounds using microbial routes.

The objective of this work was to design a platform strain of Saccharomyces cerevisiae with

improved intracellular tyrosine pools using computational modeling. This engineered yeast could

be used as a host for producing important plant secondary metabolites on an industrial-scale. In

this study, a combination of steady-state and dynamic modeling methods was used for strain

design. Initial strain design was performed using steady-state modeling, and the predictions from

steady-state modeling were prioritized for experimental validation using dynamic modeling. The

final strategy proposed included deletion of PDC1, ZWF1, ARO10; over-expression of ALD6,

and alleviation of tyrosine feedback inhibition in the shikimate pathway. Initial experiments for

validation of this strategy showed promising results.

Acknowledgement

First and foremost, I wish to thank my supervisor, Professor Radhakrishnan Mahadevan. He has

provided excellent guidance and support throughout the course of this project and has always had

my best interest at heart. I am very grateful to him.

I would like to thank our collaborator, Professor Vince Martin and his group at Concordia

University, for validating the model predictions and providing me the data for ARO10 gene

deletion.

I would also like to thank my committee members, Professor William Cluett and Professor

Alexander Yakunin for providing valuable feedback and suggestions.

Many thanks are due to past and current members of Biozone and Laboratory of Metabolic

Systems Engineering, especially Pratish Gawand, Nadeera Jayasinghe, Ilan Adler, Tahnimeh

Khazaei and Christopher Gowen for their friendship, support, encouragement and help over the

course of this project.

I would like to thank Genome Canada and Department of Chemical Engineering and Applied

Chemistry for providing the funding for this project.

Finally and most importantly, I wish to thank my parents and sister for their unconditional love

and unwavering support during the best and worst periods of my life. I would like to dedicate

this thesis to them.

Table of Contents

Abstract

Acknowledgement

Table of Contents

List of Figures

Chapter 1: Introduction

1.1 Advantages of producing chemicals using engineered microbes

1.2 Challenges in large-scale production of heterologous products

1.2.1 Effective expression of heterologous genes in microbial host

1.2.2 Supply of microbial precursors to heterologous pathway

1.3 Industrial importance of tyrosine

Chapter 2: Objective

Chapter 3: Literature Review

3.1 Tyrosine production

3.1.1 Biotechnology based methods for production of tyrosine

3.1.2 Tyrosine production using engineered microbes

3.2 Steady-state modeling

3.3 Ensemble Modeling

Chapter 4: Methods and Methodology

4.1 Steady-state modeling

4.1.1 Flux Balance Analysis (FBA)

4.1.2 Genome-scale metabolic models

4.1.3 Bi-level strain design algorithms

4.1.4 Limitations of steady-state modeling

4.2 Dynamic modeling

4.2.1 Ensemble modeling concept

4.2.2 Ensemble modeling framework

4.2.3 Screening the ensemble using literature data

4.2.4 Limitations of ensemble modeling

4.3 Methodology

Chapter 5: Results and Discussion

5.1 Steady-state modeling results

5.1.1 Predicted strategy

5.1.2 Experimental validation for ARO10 deletion

5.1.3 Need for Ensemble modeling

5.2 Ensemble modeling results

5.2.1 S. cerevisiae central model reconstruction

5.2.2 Development of ensemble of models

5.2.3 Screening the ensemble using data from literature

5.2.4 Prioritizing the targets for experiments using screened models

5.3 Final strategy to be verified experimentally

Chapter 6: Conclusions and Future Work

6.1 Conclusions

6.2 Future work

References

Appendix A

Appendix B

Appendix C

Appendix D

List of Figures

Figure 1.1 Schematic of the steps involved in production of Xanthohumol from tyrosine.

Figure 1.2 Schematic of some of the industrially important chemicals which require tyrosine as a precursor

Figure 3.1 A schematic of aromatic amino acid (shikimate) pathway up to the generation of Chorismate

Figure 3.2 Tyrosine production from Chorismate

Figure 3.3 A schematic of allosteric regulation in aromatic amino acid pathway of S. cerevisiae

Figure 4.1 Trade-off between steady-state and dynamic modeling methods

Figure 4.2 Schematic of conversion of cell network into under-determined mass balance constraints at steady-state

Figure 4.3 Schematic of how optimal flux distribution is calculated using FBA

Figure 4.4 Schematic of computational prediction of possible flux space for wild-type and Optknock suggested mutant

Figure 4.5 Flow chart depicting the steps involved in Ensemble Modeling

Figure 5.1 Steady-state strain design approach adapted in this study

Figure 5.2 Map of computationally predicted solution space for wild type iMM904 and the strategy designed from GDLS

Figure 5.3 Schematic of the deletions suggested by GDLS

Figure 5.4 Schematic of experimental modifications made while testing ARO10 deletion

Figure 5.5 Graphs showing accumulation of 4HPP and tyrosine in the four strains that we investigated

Figure 5.6 Schematic of reconstructed S. cerevisiae network used in this work along with calculated flux data

Figure 5.7 Model screening using data from succinic acid and glycerol over-producing strains

Figure 5.8 EM predicted PEP, E4P, DAHP accumulation and biomass formation rates when deletions suggested by GDLS are implemented

Figure 5.9 Graph showing the effect of ALD over-expression on growth rate of PDC-- and ΔZWF mutant

Figure 5.10 Schematic of the proposed final strategy for tyrosine over-producing strain

Chapter 1

Introduction

1.1 Advantages of producing chemicals using engineered microbes

Declining supplies of fossil fuels and increasing environmental problems are currently driving

scientists around the world to develop novel biotechnology-based processes for producing fuels,

chemicals and other major materials using simple inexpensive sugars as the major carbon source.

Such processes do not require high temperatures and pressures thereby minimizing the energy

consumption and do not generate toxic compounds as by-products. Recent advances in the area

of recombinant DNA technology have enabled the microbial production of many exotic and valuable

substances that were previously virtually unobtainable. These include

substances that are traditionally not produced by microbes such as plant secondary metabolites

(e.g. polyketides, alkaloids, flavonoids). Many drugs used in modern medicine, such as

vinblastine, digitalis, taxol and codeine, are derived from plant secondary metabolites and are

used for treatment of cancer, heart diseases and pain. Apart from pharmaceutical purposes, such

valuable chemicals are used in production of flavours, fragrances, pigments, insecticides and

other important products.

Chemical synthesis of plant secondary metabolites is often difficult and expensive because of

their chemical complexity, and yields from natural resources are typically low, making industrial

scale production difficult. Therefore, there is a great incentive for producing these valuable

compounds using microbial routes. Production of plant secondary metabolites in microbes like

E. coli and yeast is accomplished by incorporating the plant genes into micro-organisms (Khosla

et al., 2003; Maury et al., 2005; Hawkins et al., 2008; Minami et al., 2008). This process of

expressing non-native genes is called heterologous protein expression, and the compounds

obtained are called heterologous products.

The work presented here is part of a project that aims to produce a group of plant

secondary metabolites, such as codeine, xanthohumol and resveratrol, on an industrial scale using

Saccharomyces cerevisiae (baker’s yeast) as the microbial host. Tyrosine, an aromatic amino

acid produced by S. cerevisiae, acts as a precursor to the heterologous pathways that produce

these compounds. The following graphic (Fig. 1.1) shows the steps involved in the

production from tyrosine of one plant metabolite of interest: xanthohumol, a flavonoid with

anti-cancer properties.

Figure 1.1 Schematic of the steps involved in production of Xanthohumol from tyrosine. All the enzymes shown in

the graphic above are heterologous enzymes that must be expressed in the microbial host (Phytometasyn

project report, 2008).

1.2 Challenges in large-scale production of heterologous products

Wild-type S. cerevisiae has the enzymes necessary to produce the precursor tyrosine from cheap

sugars, but the enzymes shown in the graphic are non-native enzymes that need to be expressed

through genetic engineering. Although our current knowledge of microbial metabolism allows us

to express heterologous genes, the possibility of producing non-native compounds on an

industrial scale is limited, primarily because of low production yields. Microbial yields of

heterologous products depend on two factors: effective expression of heterologous genes in the

microbial host, and supply of microbial precursors to the heterologous pathway.

1.2.1 Effective expression of heterologous genes in microbial host

With the advances in the field of synthetic biology and novel experimental techniques,

heterologous gene expression is routine, provided a suitable host is selected. The choice of

microbial host is very important for the production of heterologous products. The microbial host

should be amenable to genetic manipulation, easy to grow, and should provide a suitable environment for

proper expression of heterologous genes. In this work, we chose S. cerevisiae as our host

microbe because it meets all these requirements, as detailed below.

S. cerevisiae is widely used in the baking, brewing and wine-making industries; hence yeast

genetics, physiology, biochemistry, genetic engineering and fermentation strategies are well

understood. Experimental techniques required to precisely modify the genetic network of this yeast

are widely available. Also, S. cerevisiae, being a eukaryote, is known to have protein machinery

similar to that of higher eukaryotes. It has been established that enzymes from plants and humans

are more likely to be properly folded and processed in yeast than in a prokaryotic host (Primrose, 1986; Zabriskie

et al., 1986), making it a suitable host for expression of key enzymes like aromatic

prenyltransferase (DMADP) and cytochrome P450. A number of studies reporting successful

expression of heterologous genes in S. cerevisiae are available in the literature (Ro et al., 2004;

Porro et al., 2005; Jiang et al., 2005; Yan et al., 2007; Dejong et al., 2006; Ro et al., 2006). Also,

S. cerevisiae contains no endotoxins and no oncogenic or viral DNA, making it a very

suitable choice for our purpose.

In addition, S. cerevisiae produces no toxic metabolites and is non-pathogenic, earning it a

GRAS (generally recognized as safe) classification by the U.S. Food and Drug Administration

(FDA) (Chemler et al., 2006). Also, its physical properties, such as tolerance to low pH and robust

growth under high sugar and ethanol conditions, led to the choice of yeast as our preferred

microbial cell factory.

1.2.2 Supply of microbial precursors to the heterologous pathway

Another major determinant of the yield of heterologous products is the supply of microbial

precursor metabolites that act as feed to the heterologous pathway. Tyrosine, an aromatic amino

acid, is the precursor for production of the plant metabolites of interest. In a previous study

by Jiang et al. (2005), which examined the production of chalconaringenin in S.

cerevisiae, it was observed that although expression of the three genes TAL, 4CL and CHS

(shown in Fig. 1.1) was successful in producing chalconaringenin, its yield

was limited by the tyrosine flux. Similar observations suggesting limited heterologous

production due to insufficient precursor metabolite pools were reported by several others (Ro et

al., 2006; Santos et al., 2011; Brochado et al., 2010). Therefore, in order to improve the yields of

these non-native compounds, it is important to improve the intracellular pools of microbial

precursors.

To produce compounds such as xanthohumol on a large scale, it is important that we use a strain of

S. cerevisiae with sufficiently high levels of intracellular tyrosine as the host for expression of

heterologous pathways. S. cerevisiae, like other micro-organisms in nature, is assumed to evolve

with a goal of maximizing its growth rate under the given conditions, a state where there is no

excess production of tyrosine. In order to obtain a strain with higher intracellular tyrosine pools,

it is necessary to modify the genetic network of the yeast, thereby forcing it to channel more of

the substrate towards tyrosine. This process of modifying the genetic network of an organism for

over-production of metabolites is called metabolic engineering.

Traditionally, metabolic engineering has been done by classical strain improvement methods,

which involve random mutagenesis and screening. However, with the advances in the area of

genome sequencing, a greater knowledge of microbial genetic networks is widely available. S.

cerevisiae, our microbial host, was the first eukaryotic organism whose complete genome was

sequenced (Goffeau et al., 1997). The information about its genome is widely available

(http://www.yeastgenome.org) along with information on open reading frames, biochemical

pathways, microarray studies and protein interaction networks. This information can be used to

devise rational design strategies for improved production of the required metabolites. Predicting

rational design strategies is not trivial considering the complexity of biological networks;

therefore, in order to help with this process many mathematical modeling methods have been

developed (Burgard et al., 2003; Patil et al., 2005; Pharkya et al., 2006; Tran et al., 2008; Lun et

al., 2009; Ranganathan et al., 2010; Yang et al., 2011). In the current study, we describe a

methodology that uses a combination of computational modeling methods to predict an effective

genetic engineering strategy for improved intracellular tyrosine. The resulting strain of S. cerevisiae with

higher tyrosine pools would be an ideal host for expression of the heterologous pathways of

interest.

1.3 Industrial importance of tyrosine

Tyrosine, apart from being a precursor for alkaloids and polyketides, is also a valuable

compound with a variety of applications (Fig. 1.2). Tyrosine is used in its natural form as a

common dietary supplement due to its ability to stimulate brain activity for improved memory

and to control depression and anxiety. Tyrosine also serves as an important starting material for a

variety of high-value compounds such as melanin and L-3,4-dihydroxyphenylalanine (L-DOPA or

levodopa); the latter is currently the most powerful symptomatic drug for treatment of Parkinson’s

disease. Therefore, in addition to our objective of obtaining a platform strain for plant secondary

metabolite production, if the final strain of S. cerevisiae is observed to have significantly high

titers of tyrosine, it can be considered for industrial production of tyrosine, replacing E. coli,

which is currently the preferred microbe.

Figure 1.2 A schematic of some of the industrially important chemicals which require tyrosine as a precursor.

Chapter 2

Objective

As stated in the previous chapter, the supply of microbial precursors is an important factor in

determining the yields of heterologous products. Recently, Santos et al. (2011) showed that when

a strain of E. coli that was engineered for production of tyrosine was used as the microbial host,

naringenin was produced at sufficiently high titers (up to 84 mg/l) from glucose using minimal

media without any precursor supplementation. This clearly suggests that there is an incentive to

obtain a tyrosine over-producing platform strain of S. cerevisiae, which can be used as a host for

heterologous pathway expression. Limited work has been done previously on improving

aromatic amino acid production in S. cerevisiae, as E. coli is the preferred microbe for industrial

scale production of aromatic amino acids. In this work, our objective was to design a strain of S.

cerevisiae with improved intracellular tyrosine pools using computational modeling methods.

The only reported work on obtaining a metabolically engineered strain of S. cerevisiae with

tyrosine over-producing capabilities focused on removing the feedback inhibition present in

the aromatic amino acid pathway (Luttik et al., 2008). However, to optimize the network of S.

cerevisiae for tyrosine production, a holistic design that would account for the entire genome of

the yeast is required. The process of performing a genome-scale design is not trivial and cannot

be performed by observation because of the complexity of genetic networks. This difficulty in

developing a strategy, while considering the entire genome of the microbe, acts as motivation for

using mathematical modeling techniques for designing metabolic engineering strategies. In

addition, the availability of a number of well-curated in silico genome-scale models of S.

cerevisiae (Forster et al., 2003; Duarte et al., 2004; Nookaew et al., 2008; Herrgard et al., 2008;

Mo et al., 2009) provides an additional motivation for using mathematical modeling techniques.

To truly understand dynamic microbial behaviour and to predict the result

of genetic manipulations accurately, it is desirable to have a detailed genome-scale

dynamic model of microbial metabolism. However, large-scale dynamic models are currently

impractical because of the lack of information on kinetic parameters and regulatory

networks. In the absence of kinetic and regulatory information, it is possible to partly predict the

behaviour of cellular metabolism by using steady-state analysis. However, the predictions made

by steady-state modeling methods are not necessarily consistent with experiments because these

methods do not consider the dynamic nature of cells. Therefore, steady-state and dynamic modeling

methods present a trade-off between the size of the network that can be modeled and the accuracy

of prediction.

In this study, the objective was to propose a methodology that can tackle the inherent limitations

of steady-state and dynamic modeling methods and devise an effective strategy for tyrosine over-

production. This was accomplished through the following tasks:

1. Obtain an initial strain design for improved tyrosine production using steady-state bi-

level optimization methods Optknock (Burgard et al., 2003) and GDLS (Lun et al., 2009).

2. Construct a small central model of S. cerevisiae, based on the suggestions from steady-

state modeling, for application of Ensemble modeling framework.

3. Predict the dynamic behaviour of yeast by applying Ensemble modeling over the central

model.

4. Use the dynamic central model obtained from Ensemble modeling to understand the

effect of each of the deletions suggested by steady-state models on the flux distribution

and predict the critical deletions for improving tyrosine flux.

5. Propose a final strategy for improved intracellular tyrosine pools for experimental

validation.

Chapter 3

Literature Review

3.1 Tyrosine production

Aromatic amino acids are in high industrial demand because of their many applications, primarily in

the food and pharmaceutical industries (Breuer et al., 2004). Among the aromatic amino acids, the

demand for tyrosine is much lower than that for the other two,

phenylalanine and tryptophan, which probably explains why industrial-scale

production of tyrosine has received limited attention. Tyrosine is manufactured by three different

methods: (a) enzymatic synthesis by tyrosine phenol lyase (Lütke-Eversloh et al., 2007), (b)

extraction from protein hydrolysates (Leuchtenberger et al., 2005) and (c) fermentation using

high-performance mutants or genetically engineered microbial strains (Lütke-Eversloh et al.,

2007; Gosset, 2009). Because this project is concerned with biological production of tyrosine,

production by genetically engineered microbial strains is reviewed in detail

here.

3.1.1 Biotechnology based methods for production of tyrosine

Aromatic amino acids are produced by microbes via the aromatic amino acid or shikimate

pathway (Fig. 3.1). Phosphoenolpyruvate (PEP) and erythrose-4-phosphate (E4P) act as major

precursors to this pathway. In the first step PEP, a central carbon metabolite, and E4P, a pentose

phosphate pathway intermediate, are condensed to form 3-deoxy-D-arabinoheptulosonate-7-

phosphate (DAHP) which is further converted to shikimate (SHIK) via 3-dehydroquinate (DHQ)

and 3-dehydroshikimate (DHS). Shikimate is then phosphorylated and converted to chorismate

(CHOR) after the addition of another PEP molecule. Chorismate is the biosynthetic branch point

for aromatic amino acids, as well as for folate, ubiquinone, menaquinone and siderophore

synthesis (Fig. 3.2) (Pittard et al., 1996; Dosselaere et al., 2001).

Figure 3.1 A schematic of aromatic amino acid (shikimate) pathway up to the generation of Chorismate. Seven

reactions are involved in the conversion of PEP and E4P to Chorismate.

For phenylalanine and tyrosine biosynthesis, chorismate is converted to prephenate, a common

precursor, in a reaction catalyzed by the enzyme chorismate mutase. In the branch that produces

tyrosine, prephenate is converted to 4-hydroxyphenylpyruvate (4-HPP) by the enzyme

prephenate dehydrogenase. In S. cerevisiae, formation of 4-HPP is associated with formation of

one mole of NADPH. Tyrosine is then formed by transamination of 4-HPP, catalyzed by an

aminotransferase. A schematic of the tyrosine branch from chorismate is depicted in the following

figure (Fig. 3.2).

Figure 3.2 Tyrosine production from Chorismate. This process involves generation of one mol of NADPH.

3.1.2 Tyrosine production using engineered microbes

As mentioned before, most of the work on aromatic amino acid production was directed towards

phenylalanine and tryptophan production (Berry, 1996; Bongaerts et al., 2001; Leuchtenberger

et al., 2005; Ikeda et al., 2003, 2006). Most of the research on tyrosine production has

focused on E. coli, which is currently the preferred microbe for industrial-scale

production. In E. coli, the first reaction in the aromatic amino acid pathway, a condensation reaction

between PEP and E4P, is catalyzed by a set of three isoenzymes which are feedback inhibited by

the three aromatic amino acids (Berry et al., 1996; Frost et al., 1995). The first generation of

metabolic engineering approaches towards increasing carbon flux to tyrosine in E. coli

concentrated on over-expressing the feedback-resistant enzyme that corresponds to tyrosine

(Lütke-Eversloh et al., 2007; Olson et al., 2007), accompanied by over-expression of rate-controlling

pathway enzymes (Olson et al., 2007). A second generation of quantitative metabolic

engineering approaches focused on over-expression of the phosphoenolpyruvate synthase and

transketolase A genes, which would increase the precursor pools of PEP and E4P

(Lütke-Eversloh et al., 2007; Yi et al., 2003). In a recent study, Juminaga et al. (2012) expressed

all the genes encoding formation of tyrosine from PEP and E4P on two plasmids and

transformed them into E. coli cells. This effort was reported to remove the bottlenecks in

the aromatic amino acid pathway (Juminaga et al., 2012).

In S. cerevisiae, two reactions in the aromatic amino acid pathway are known to be subject to

feedback inhibition (Fig. 3.3). The first of these reactions is the formation of DAHP, and this

reaction is catalyzed by two isoenzymes (Aro3p, Aro4p) which are feedback regulated by

phenylalanine (Aro3p) and tyrosine (Aro4p) (Kunzler et al., 1992). The second reaction is the

conversion of chorismate to prephenate. This reaction is catalyzed by the enzyme chorismate

mutase (Aro7p), whose activity is inhibited by tyrosine and activated by tryptophan (Brown et

al., 1990). Although considerable knowledge is available on the functioning of the aromatic amino

acid pathway in yeast, very little work has yet been done on using S. cerevisiae as a host to

produce tyrosine. To date, only one report provides a glimpse into the possibility of developing

S. cerevisiae into a tyrosine over-producer (Luttik et al., 2008). This work involved using

tyrosine feedback-resistant versions of both the Aro4p (Hartmann et al., 2003) and Aro7p enzymes

(Krappmann et al., 2000). The feedback-resistant Aro4p contains a single lysine-to-leucine

substitution at position 229, and the feedback-resistant Aro7p has a serine-to-glycine substitution at

position 141. In this strain, the production of tyrosine increased three-fold compared to the

wild type. This work also showed that DAHP synthase exerts a stronger degree of control on the

synthesis of tyrosine than chorismate mutase in S. cerevisiae.

Figure 3.3 A schematic of allosteric regulation in aromatic amino acid pathway of S. cerevisiae.

3.2 Steady-state modeling

In this work, we used bi-level optimization methods based on FBA (Orth et al., 2010), namely Optknock

(Burgard et al., 2003) and GDLS (Lun et al., 2009), for strain design. The advantage of

steady-state modeling methods is that they can be applied to genome-scale models and do not

require information on kinetic parameters. It has been shown previously that bi-level

optimization methods are useful for strain design in yeast:

Bro et al. (2006) used in silico simulations with the iFF708 model to increase ethanol

production while decreasing the amount of glycerol produced by the cell under

anaerobic growth conditions. The engineered strain showed a 40% decrease in glycerol production

and a 3% increase in ethanol yield without any effect on the maximum specific growth rate.

Asadollahi et al. (2009) investigated strategies for improving the yield of sesquiterpene

production in S. cerevisiae by enhancing the precursor pools. They used a bi-level optimization

framework, Optgene, with the iFF708 model for their predictions. The strategy that they obtained

led to an approximately 85% increase in the final cubebol titer.

Brochado et al. (2010) studied a recombinant yeast strain producing vanillin. This strain was

modeled in silico to find deletion targets that could improve the vanillin yield on glucose. The

iFF708 model was used to suggest two different deletion targets. When these deletions were

implemented in vivo, a 5-fold increase in vanillin yield was observed compared to previously

reported vanillin production in S. cerevisiae.

From the above examples, it is clear that in silico modeling using genome-scale metabolic

models can accelerate the process of metabolic engineering by suggesting rational targets for

over-expression or deletion for improved production of a certain metabolite.

3.3 Ensemble Modeling

Ensemble modeling (Tran et al., 2008) is a novel dynamic modeling approach that estimates

the kinetic behaviour of cells using phenotypic data from various enzyme tuning experiments

reported in the literature. Ensemble modeling predicts the metabolic behaviour of

cells in greater detail than steady-state modeling techniques, but cannot be used

to perform extensive strain design as Optknock and GDLS can. Ensemble modeling has been used

previously to improve the yield of lysine in engineered E. coli (Contador et al., 2009), to

understand fatty acid metabolism in hepatic cells (Dean et al., 2010) and, more recently, to predict

new drug targets for cancer (Khazaei, 2011). There is no reported work on its use in S.

cerevisiae.

A more detailed explanation of these modeling methods is provided in the following chapter.

Chapter 4

Methods and Methodology

As stated before, the objective of the current work was to use computational methods to design a strain of S. cerevisiae

with increased intracellular levels of tyrosine. Using mathematical

models allows us to have a more holistic perspective of the microbial system while performing

the strain design. Mathematical modeling approaches can broadly be classified into two major

categories: constraint-based steady-state modeling and mechanism-based dynamic modeling.

Each of these modeling approaches has its own advantages and limitations (Fig. 4.1).

Figure 4.1 Trade-off between steady-state and dynamic modeling methods.

4.1 Steady-state modeling

Steady-state modeling analyzes the metabolic networks based on reaction stoichiometries and

enzyme reversibilities in addition to network topology. With the advances in whole-genome

sequencing, these characteristics are readily available for several organisms in the form of

reconstructed metabolic networks (Feist et al., 2008; Herrgard et al., 2008). A major feature of

steady-state modeling approach is that it assumes no change in the concentration of intracellular

metabolites. This assumption of constant metabolite concentrations is used to calculate the fluxes

by performing a mass balance on each intracellular metabolite, thereby circumventing the

need for information on kinetic parameters to characterize the flux distributions (Bailey, 2001). In

this work, strain design was performed using bi-level optimization based methods, which are

extensions of a widely used fundamental modeling methodology called Flux Balance Analysis

(FBA) (Orth et al., 2010).

4.1.1 Flux Balance Analysis (FBA)

FBA is one of the most widely used steady-state modeling techniques and allows for detailed

simulations of metabolic systems. In FBA, the reaction network of a cell is represented as a set

of under-determined mass balance constraints, as shown in Fig. 4.2.

Figure 4.2 Schematic of conversion of cell network into under-determined mass balance constraints at steady-state.

Mass balance equations are represented in mathematical form using the stoichiometric matrix

(the S matrix) and the flux distribution vector V. The S matrix reflects the stoichiometry of the

reactions involved in the network: every row corresponds to one

metabolite and every column corresponds to one reaction. Because the network typically contains

more reactions than metabolites, the system of equations is under-determined and more than one

flux distribution can satisfy the mass balance constraints. Among these possible flux distributions, FBA predicts the

optimal flux distribution by assuming that the metabolic network has been optimized during

evolution with respect to a particular objective function. Objective functions commonly used are

maximization of ATP production (Ra et al., 1990; Ramakrishna et al., 2001), maximization of

biomass formation (Kauffman et al., 2003; Edward et al., 2000; Price et al., 2003) or

minimization of metabolic adjustment (MoMa) (Segre et al., 2002). So far, growth maximization

has been the most extensively used approach to describe the physiology during growth (Edwards

et al., 2001; Famili et al., 2003).

The functioning of FBA is illustrated in Fig. 4.3. When no constraints are imposed, the flux
distribution of a biological network can take any value. When mass balance and capacity
constraints are introduced, a solution space is defined, and when an objective function is
optimized over this solution space, a point on its edge is identified as the optimal flux
distribution.

Figure 4.3 Schematic of how optimal flux distribution is calculated using FBA (Orth et al., 2010).

Mathematically, FBA is formulated as:

$$\max_{V} \; f'V \tag{4.1}$$

subject to

$$SV = 0 \quad \text{(steady-state condition)} \tag{4.2}$$

$$A \le V \le B \tag{4.3}$$

where S is the stoichiometric matrix, V is the flux vector, f is the objective vector, A is the
vector of lower bounds on V and B is the vector of upper bounds on V.
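The formulation above can be illustrated on a toy network. The sketch below is a minimal illustration under stated assumptions: the five-reaction network, its bounds and the function names are hypothetical and are not taken from iMM904, and an exhaustive scan over integer flux values stands in for the linear-programming solver that FBA would normally use.

```python
# Hypothetical toy network (not taken from iMM904):
#   v0: -> A (uptake), v1: A -> B, v2: A -> C,
#   v3: B -> (biomass drain), v4: C -> (by-product drain)
S = [
    [1, -1, -1,  0,  0],  # mass balance on metabolite A
    [0,  1,  0, -1,  0],  # mass balance on metabolite B
    [0,  0,  1,  0, -1],  # mass balance on metabolite C
]
A = [0, 0, 0, 0, 0]        # lower bounds on V
B = [10, 10, 10, 10, 10]   # upper bounds on V
f = [0, 0, 0, 1, 0]        # objective vector: maximize the biomass drain v3


def satisfies_constraints(V):
    """Return True if SV = 0 and A <= V <= B hold for the flux vector V."""
    balanced = all(sum(s * v for s, v in zip(row, V)) == 0 for row in S)
    bounded = all(lo <= v <= hi for lo, v, hi in zip(A, V, B))
    return balanced and bounded


def toy_fba():
    """Scan integer flux vectors and return the feasible one maximizing f'V."""
    best_V, best_value = None, float("-inf")
    for uptake in range(B[0] + 1):
        for to_B in range(uptake + 1):
            # Candidate flux vector: uptake is split between the two branches.
            V = [uptake, to_B, uptake - to_B, to_B, uptake - to_B]
            if not satisfies_constraints(V):
                continue
            value = sum(c * v for c, v in zip(f, V))
            if value > best_value:
                best_V, best_value = V, value
    return best_V, best_value


V_opt, growth = toy_fba()
print(V_opt, growth)
```

Many flux vectors satisfy the mass balance here (any split of the uptake between the two branches), which is the under-determination described above; the objective vector is what singles out one of them.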

4.1.2 Genome-scale metabolic models

The simplicity of the FBA formulation allows it to be applied to large-scale metabolic
reconstructions, called genome-scale metabolic models. Genome-scale metabolic models are in
silico models that contain information about all the known pathways in an organism; they are
constructed from the annotated genome sequence and known biochemical and physiological data.
Applying FBA to these models has been shown to be very useful in predicting physiological
behaviour, such as growth rate and product secretion rate, of microorganisms under different
environmental and genetic perturbations (Forster et al., 2003; Duarte et al., 2004). Genome-scale
metabolic models have also been found useful in designing cells for improved production of
desired products by suggesting the reactions that need to be targeted (Alper et al., 2005; Fong
et al., 2005; Wang et al., 2006; Bro et al., 2006).

S. cerevisiae is perhaps the most well-studied eukaryotic microbe, and a number of genome-scale
metabolic models of S. cerevisiae have been developed and are available for metabolic modeling:
iFF708 (Forster et al., 2003), iND750 (Duarte et al., 2004), iLL672 (Kuepfer et al., 2005),
iIN800 (Nookaew et al., 2008), iMM904 (Mo et al., 2009) and Yeast 4.0 (Dobson et al., 2010).
Appendix D shows the number of genes, metabolites and reactions for each of the above-mentioned
models and also compares the effectiveness of iIN800, iMM904 and Yeast 4.0 in predicting the in
silico viability of single-deletion strains. These genome-scale metabolic models have been
successfully used in the past for improving the yield of sesquiterpenes (Asadollahi et al.,
2009), vanillin (Brochado et al., 2010) and ethanol (Bro et al., 2006). In this work we used
iMM904, which has 904 metabolic genes, 1228 metabolites and 1577 reactions, to perform strain
design for improved tyrosine yield. The design obtained using this model was then verified on
Yeast 4.0 and on an improved version of iMM904 (Zomorrodi et al., 2010).

4.1.3 Bi-level strain design algorithms

After the success of FBA in predicting the outcome of various genetic manipulation strategies with
minimal knowledge of kinetic parameters, the obvious next step was to extend its framework to
predict metabolic engineering targets for improved production of desired metabolites. This can be
achieved by searching the space of possible genetic manipulations for the strategy that results
in the desired metabolic state of improved production. Many computational tools for identifying
strain modifications leading to targeted overproduction have been described in the literature
(Burgard et al., 2003; Patil et al., 2005; Pharkya et al., 2006; Lun et al., 2009). One of the
earliest efforts was the Optknock procedure (Burgard et al., 2003), which proposes gene knockouts
leading to targeted overproduction. Optknock is a bilevel optimization algorithm: an inner
problem maximizes the cellular objective function (as described in the previous section on flux
balance analysis), while a surrounding outer problem maximizes a bioengineering objective. The
inner optimization of the cellular objective is necessary to prevent the prediction of lethal
deletions.

Optknock formulation (Burgard et al., 2003)

The inputs to the Optknock algorithm are the reaction network of interest and the substrate
uptake rates. The bioengineering and cellular objectives need to be chosen such that they reflect
the strain that we aim to engineer. In our case, we used the genome-scale model iMM904 as the
input, with glucose as the substrate under aerobic minimal media conditions. We defined the
bioengineering objective as maximization of tyrosine production and chose growth maximization as
the cellular objective.

The result obtained from Optknock is a set of reactions that need to be deleted to couple the
growth of the organism to the formation of the product of interest. This process is explained in
Fig. 4.4.

Maximize    Bioengineering objective (through gene knockouts)
Subject to  Number of knockouts <= limit
            Maximize    Cellular objective
            Subject to  Network stoichiometry
                        Reaction bounds
                        Substrate uptake rate
                        Blocked reactions identified by outer problem

Figure 4.4 Schematic of computational prediction of possible flux space for wild-type and Optknock suggested

mutant. Deleting genes suggested by Optknock would obligate the cells to produce the product of our interest after

evolution. (Adapted from Fong et al., 2005).

In Fig. 4.4, region A shows all possible values of the growth and product formation fluxes at
which a wild-type cell can operate for a given substrate uptake rate under the steady-state
condition. We considered maximization of growth rate as the cellular objective because it is
reasonable to assume that cells have evolved over millions of years to optimize their growth.
According to this assumption, wild-type cells would operate at point (1) in region A, where the
growth rate is maximum. Under this condition, the cell will not produce the required product
unless the product is a primary metabolite for the cell. In our case, because wild-type yeast
produces only the amount of tyrosine required for its growth, we will not see any excess
production. After performing the deletions suggested by Optknock, the metabolic network of the
resulting mutant will be different from that of the wild type. This modification changes the
possible solution space for mutant growth and product formation fluxes; the new solution space is
represented by region B. After adaptive evolution, the mutant would operate at point (2), which
is the optimal growth point. Thus, after implementing the deletions suggested by Optknock, the
network is engineered in such a way that if the cell is to grow at its most preferred state
(maximum growth), it has to produce the product of interest.
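The growth-coupling idea described above can be sketched on a small example. The following is a minimal illustration under stated assumptions, not the formulation of Burgard et al. (2003): the network (substrate A converted to biomass precursor B either directly through `route_w`, or through `route_p`, which yields only half as much B but also forms the product), the names and the thresholds are all hypothetical, and exhaustive enumeration on an integer grid replaces the mixed-integer program solved by Optknock. Ties at the growth optimum are resolved in favour of the product, which is an assumption of this sketch.

```python
from itertools import combinations

UPTAKE_LIMIT = 10                    # maximum substrate uptake (arbitrary units)
REACTIONS = ("route_w", "route_p")   # candidate knockouts (hypothetical names)


def feasible_states(knockouts):
    """Yield (growth, product) pairs allowed by mass balance and the knockouts.

    route_w: A -> B          (flux v_w)
    route_p: A -> 0.5 B + P  (flux v_p)
    """
    for v_w in range(UPTAKE_LIMIT + 1):
        for v_p in range(UPTAKE_LIMIT + 1 - v_w):
            if "route_w" in knockouts and v_w > 0:
                continue
            if "route_p" in knockouts and v_p > 0:
                continue
            growth = v_w + 0.5 * v_p   # all B formed drains into biomass
            product = v_p              # all P formed is exported
            yield growth, product


def inner_problem(knockouts):
    """Cellular objective: maximize growth, then report product at that optimum."""
    states = list(feasible_states(knockouts))
    max_growth = max(g for g, _ in states)
    product = max(p for g, p in states if g == max_growth)
    return max_growth, product


def toy_optknock(max_knockouts=1, min_growth=1.0):
    """Outer problem: pick the knockout set with the highest product flux."""
    best = ((), *inner_problem(()))   # start from the wild type
    for k in range(1, max_knockouts + 1):
        for knockouts in combinations(REACTIONS, k):
            growth, product = inner_problem(knockouts)
            if growth >= min_growth and product > best[2]:
                best = (knockouts, growth, product)
    return best


print(inner_problem(()))   # wild type: maximum growth, no product
print(toy_optknock())      # suggested knockout with coupled product formation
```

In this toy case the wild type grows fastest without forming any product, whereas deleting `route_w` leaves the product-forming route as the only way to grow, at the cost of a lower growth rate.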

When Optknock is applied to large networks like iMM904, the algorithm will not converge if we
attempt to design a strain with more than 3-4 knockouts. This limit on the maximum number of
knockouts arises because the runtime scales exponentially with the number of knockouts when the
algorithm is applied to genome-scale models. This can be a limitation in some cases, because
certain metabolites might require more than 3-4 knockouts before appreciable growth-product
coupling is observed. GDLS (Lun et al., 2009) is an algorithm formulated along the same lines as
Optknock, but it uses a local search instead of the global search approach adopted by Optknock.
The advantage of a local search is that a design can be made with a much larger limit on the
number of knockouts when using genome-scale models. However, unlike Optknock, the solution that
GDLS gives might not be globally optimal. In the current work, we tried both Optknock and GDLS to
design tyrosine over-producing S. cerevisiae.

Similar algorithms that include over-expression and repression in addition to deletion of
reactions have been reported, such as Optreg (Pharkya et al., 2006), Optforce (Ranganathan et
al., 2010) and EMILiO (Yang et al., 2011). We have not used them in this project because designs
based on up-regulation and repression of reactions might not be robust, especially when kinetics
is not considered: when the algorithm is allowed to consider over-expression and repression, it
predicts specific enzyme expression levels, which are difficult to implement experimentally.

4.1.4 Limitations of steady-state modeling

The major limitations of steady-state modeling result from its two assumptions: that cells
operate at a steady state, and that a biological objective function exists. The assumption of
steady state is generally valid for microbial metabolic networks because the time-scales for
equilibration of metabolite concentrations are much smaller than those of genetic regulation
(Segre et al., 2002). However, the assumption of a biological objective function appears far less
convincing, especially when performing strain design using Optknock and GDLS. While it is
reasonable to assume that wild-type cells, evolving over millions of years, have optimized their
genetic network to maximize their growth rate, there is no experimental evidence to suggest that
mutants exhibit a particular objective function. Even the steady-state assumption appears to be
more acceptable when predicting the metabolic behaviour of cells using FBA than when designing
strategies using Optknock/GDLS. These limitations make a compelling case for verifying
steady-state modeling predictions using a dynamic model before experimental validation.

4.2 Dynamic modeling

Dynamic modeling provides a more detailed analysis of biological systems and can predict dynamic
cellular behaviour. A detailed kinetic model of an organism, if available, can be very useful for
predicting how the metabolic flux map changes when a genetic manipulation is performed. However,
development of detailed kinetic models (Chassagnole et al., 2002; Lee et al., 2006; Wang et al.,
2004) has been difficult because of the lack of knowledge of kinetic parameters. The time-course
data needed to estimate the kinetic parameters require tedious and expensive experimental
procedures. Ensemble modeling (EM) (Tran et al., 2008) is an approach that can be used to
overcome this drawback, as it uses experimental data reported in the literature to predict the
dynamic behaviour of the cell. However, the large number of parameters involved in EM limits the
size of the cellular network to which it can be applied.

4.2.1 Ensemble modeling concept

In EM, to avoid the difficulty of knowing the kinetic parameters of each reaction in the system,
we construct an ensemble of models that all reach the given steady state in terms of flux
distribution and metabolite concentrations. These models, which span the entire space of kinetics
that is thermodynamically feasible, are then screened using available experimental data to obtain
a smaller subset that can account for dynamic cell behaviour. The data used for screening are
phenotypic data, such as flux changes due to changes in enzyme expression. Even though such data
are measured at steady state, they are the result of the interplay among many kinetic parameters,
and therefore provide a useful screen. EM breaks down each reaction in the network into its
elementary form, which allows us to incorporate any known information on the true mechanism of an
enzymatic reaction, such as regulation, thermodynamics and steady-state metabolite levels, but
does not require such information if it is not available. The major steps involved in EM are
represented in the flow chart in Fig. 4.5.

Figure 4.5 Flow chart depicting the steps involved in Ensemble Modeling (adapted from Contador et al., 2009).

4.2.2 Ensemble modeling framework

As stated before, it is currently impractical to apply the EM framework to large metabolic
networks, so we developed a smaller network of 50 reactions for S. cerevisiae (Fig. 5.6). This
central metabolic network is used as an input to the framework, along with known reference
steady-state flux data, which EM uses to anchor the models. In addition to flux data, the EM
framework requires Gibbs free energy (ΔG) values for all the reactions in the network in order to
calculate the feasible thermodynamic space. We obtained the ΔG values of the reactions from
Jankowski et al. (2008). A table containing all the considered reactions, their steady-state flux
values and ΔG values is shown in Appendix A. The framework then breaks down every enzymatic
reaction into a set of elementary reactions. Elementary reactions are the most fundamental form
of reaction and represent events at the molecular level, which allows us to capture the mechanism
of the reaction. For example, for a one-reactant, one-product reaction

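$$X_i \xrightarrow{\;E_i\;} X_{i+1} \tag{4.4}$$

The scheme of break down into elementary reactions is:

$$X_i + E_i \;\underset{v_{i,2}}{\overset{v_{i,1}}{\rightleftharpoons}}\; X_iE_i \;\underset{v_{i,4}}{\overset{v_{i,3}}{\rightleftharpoons}}\; X_{i+1}E_i \;\underset{v_{i,6}}{\overset{v_{i,5}}{\rightleftharpoons}}\; X_{i+1} + E_i \tag{4.5}$$

The flux through each elementary reaction is given by mass-action kinetics, for example for the
first step:

$$v_{i,1} = k_{i,1}[X_i][E_i] \tag{4.6}$$

where $k_{i,1}$ is the kinetic rate constant of the first elementary reaction, $[X_i]$ is the
concentration of metabolite $X_i$, and $[E_i]$ is the concentration of free enzyme $i$. Similarly,
a standard mechanism of break down into elementary reactions (Appendix B) was followed for other
reactions with different numbers of reactants and products. The EM framework can also take into
account any available information on allosteric regulation in the network by treating the
regulation as an individual reaction.

In order to make Eq. 4.6 dimensionless, and thereby easier and more accurate to simulate
numerically, we normalize the concentrations of metabolites by the corresponding concentrations
at the reference steady state, $X_i^{ss,ref}$. Similarly, the free enzyme and enzyme complexes
are normalized by the total concentration of the corresponding enzyme at the reference state,
$E_{i,total}^{ref}$:

$$v_{i,1} = k_{i,1} E_{i,total}^{ref} X_i^{ss,ref} \cdot \frac{[X_i]}{X_i^{ss,ref}} \cdot \frac{[E_i]}{E_{i,total}^{ref}} = \tilde{K}_{i,1}^{ref}\, \tilde{X}_i\, \tilde{e}_{i,1} \tag{4.7}$$

where $\tilde{K}_{i,1}^{ref} = k_{i,1} E_{i,total}^{ref} X_i^{ss,ref}$ is the lumped kinetic
parameter, $\tilde{X}_i = [X_i]/X_i^{ss,ref}$ is the normalized metabolite concentration and
$\tilde{e}_{i,1} = [E_i]/E_{i,total}^{ref}$ is the enzyme fraction.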

Note that the rate expression has a log-linear form:

$$\ln v_{i,1} = \ln \tilde{K}_{i,1}^{ref} + \ln \tilde{X}_i + \ln \tilde{e}_{i,1} \tag{4.8}$$

At the reference steady state, $\tilde{X}_i = 1$ and the equation becomes:

$$\ln v_{i,1}^{ref} = \ln \tilde{K}_{i,1}^{ref} + \ln \tilde{e}_{i,1}^{ref} \tag{4.9}$$

From Eq. 4.9, it can be seen that the kinetic parameters can be calculated if $v_{i,1}^{ref}$ and
the enzyme fraction $\tilde{e}_{i,1}^{ref}$ are sampled. The enzyme fraction lies between 0 and 1
and can be sampled effectively, but it is not easy to sample the flux values because they can
range anywhere from 0 to infinity. In order to avoid this situation, we sample what is called the
reversibility of the reaction. It is defined as:

$$R_{i,j} = \frac{\min(v_{i,2j-1},\, v_{i,2j})}{\max(v_{i,2j-1},\, v_{i,2j})} \tag{4.10}$$

where $v_{i,2j-1}$ is the forward flux of the $j$-th elementary step of reaction $i$ and
$v_{i,2j}$ is the corresponding reverse flux. From this definition, the value of the
reversibility lies between 0 and 1, which makes it easier to sample effectively. The values of
the reversibilities represent different kinetic states. For example, if within the enzymatic
reaction $i$, $R_{i,j}$ for step $j$ is close to zero while that of the next step is near 1, step
$j$ is determined to be the rate-limiting step (Tran et al., 2008).

At the reference steady state, the forward and reverse fluxes of each step are constrained by the
following equation:

$$v_{i,2j-1}^{ref} - v_{i,2j}^{ref} = V_{i,net}^{ref} \tag{4.11}$$

$V_{i,net}^{ref}$ is the reference steady-state flux of the reaction catalyzed by enzyme $i$.
Using the above two equations (Eq. 4.10 and 4.11), we can calculate the forward and backward
fluxes of each elementary reaction at the reference steady state as:

$$v_{i,2j-1}^{ref} = \frac{V_{i,net}^{ref}}{1 - R_{i,j}^{\,\mathrm{sign}(V_{i,net}^{ref})}} \tag{4.12}$$

$$v_{i,2j}^{ref} = \frac{V_{i,net}^{ref}\, R_{i,j}^{\,\mathrm{sign}(V_{i,net}^{ref})}}{1 - R_{i,j}^{\,\mathrm{sign}(V_{i,net}^{ref})}} \tag{4.13}$$

Reversibilities are also used to apply thermodynamic constraints through the following equation:

$$\left(\frac{\Delta G_i}{RT}\right)_{\text{lower bound}} \le \mathrm{sign}(V_{i,net}^{ref}) \sum_{j} \ln R_{i,j}^{ref} \le \left(\frac{\Delta G_i}{RT}\right)_{\text{upper bound}} \tag{4.14}$$

$\mathrm{sign}(V_{i,net}^{ref})$ indicates the direction of the reaction: a value of +1 is
assigned for forward reactions and -1 for reverse reactions. The upper and lower bound values are
calculated from the standard Gibbs free energies and metabolite concentration ranges. In our
case, because the range of metabolite concentrations was not known, we assumed 0 and 100 as the
lower and upper bounds on metabolite concentrations.

In addition to reversibilities, the enzyme fractions are also sampled. At steady state, the total
enzyme concentration for each reaction is the sum of the free and bound enzyme concentrations.
The distribution of the total enzyme amount over the free and bound forms affects the kinetics of
the system, and for this reason the enzyme fractions are considered. The enzyme fractions are
sampled with the constraint that the total enzyme amount is conserved. In other words, the sum of
the enzyme fractions of the elementary reactions for each enzymatic reaction must equal one
(Contador et al., 2008):

$$\sum_{j=1}^{n_i} \tilde{e}_{i,j}^{ref} = 1 \tag{4.15}$$
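The way a sampled reversibility fixes the forward and reverse elementary fluxes, and the thermodynamic check on the reversibilities, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the bounds passed to the thermodynamic check are arbitrary numbers rather than values from Appendix A, and the split follows from combining the definition of the reversibility with the net-flux constraint (for a positive net flux, the reverse flux equals R times the forward flux, so the forward flux is the net flux divided by 1 - R).

```python
import math


def split_net_flux(v_net, reversibility):
    """Return (forward, reverse) elementary fluxes for one step.

    v_net is the reference net flux of the reaction and reversibility is the
    sampled ratio min(forward, reverse) / max(forward, reverse).
    """
    if v_net == 0:
        raise ValueError("the split is undefined for a zero net flux")
    if not 0.0 < reversibility < 1.0:
        raise ValueError("reversibility must lie strictly between 0 and 1")
    sign = 1 if v_net > 0 else -1
    r_signed = reversibility ** sign
    forward = v_net / (1.0 - r_signed)
    reverse = v_net * r_signed / (1.0 - r_signed)
    return forward, reverse


def within_thermodynamic_bounds(v_net, reversibilities, lower, upper):
    """Check lower <= sign(v_net) * sum(ln R_j) <= upper (bounds are dG/RT)."""
    sign = 1 if v_net > 0 else -1
    value = sign * sum(math.log(r) for r in reversibilities)
    return lower <= value <= upper


forward, reverse = split_net_flux(2.0, 0.5)
print(forward, reverse, forward - reverse)
```

Both fluxes come out positive for either sign of the net flux, and their difference always recovers the net flux.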

Once the reversibilities and enzyme fractions are sampled, the kinetic parameters can be
calculated using Eq. 4.9, Eq. 4.12 and Eq. 4.13. Thus, for each set of enzyme fractions and
reversibilities we get a set of kinetic parameters that defines one possible kinetic model. This
process is repeated thousands of times to generate thousands of sets of kinetic parameters, which
are then used to develop alternative kinetic models that are all anchored to the same reference
steady-state flux values.

$$\mathrm{Model}_k = f\left(R_k^{ref},\, e_k^{ref}\right) \tag{4.16}$$
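The sampling procedure described above can be sketched for a single enzymatic reaction. The following is a minimal illustration under stated assumptions, not the MATLAB implementation used in this work: it assumes a positive net flux and a three-step (binding, conversion, release) mechanism for a one-reactant, one-product reaction, uses hypothetical names such as `SPECIES_OF_STEP` and `build_model`, draws reversibilities uniformly, and omits the thermodynamic screen on the reversibilities. Each lumped kinetic parameter is obtained as the elementary flux divided by the fraction of the enzyme species consumed in that step, since normalized metabolite concentrations equal one at the reference state.

```python
import random

# Enzyme species of the assumed mechanism: 0 = free E, 1 = E.X_i, 2 = E.X_(i+1).
# SPECIES_OF_STEP[k] is the enzyme species consumed by elementary flux v_(k+1).
SPECIES_OF_STEP = [0, 1, 1, 2, 2, 0]


def sample_enzyme_fractions(n_species, rng):
    """Sample positive fractions that sum to one (total enzyme is conserved)."""
    draws = [rng.expovariate(1.0) for _ in range(n_species)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_reversibilities(n_steps, rng):
    """Sample one reversibility per elementary step, away from 0 and 1."""
    return [rng.uniform(0.01, 0.99) for _ in range(n_steps)]


def elementary_fluxes(v_net, reversibilities):
    """Forward and reverse fluxes of every step, assuming v_net > 0."""
    fluxes = []
    for r in reversibilities:
        fluxes.append(v_net / (1.0 - r))       # forward flux of the step
        fluxes.append(v_net * r / (1.0 - r))   # reverse flux of the step
    return fluxes


def build_model(v_net, rng):
    """Build one ensemble member anchored to the reference net flux v_net."""
    R = sample_reversibilities(3, rng)
    e = sample_enzyme_fractions(3, rng)
    v = elementary_fluxes(v_net, R)
    K = [v_k / e[s] for v_k, s in zip(v, SPECIES_OF_STEP)]
    return {"R": R, "e": e, "K": K}


rng = random.Random(0)  # fixed seed so the sketch is reproducible
ensemble = [build_model(1.0, rng) for _ in range(5)]
print(len(ensemble), ensemble[0]["K"])
```

Every member of the ensemble has different kinetic parameters, but each reproduces the same reference net flux by construction.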

After estimating the kinetic parameters of all the elementary reactions, the next step is to
calculate the steady-state flux and metabolite concentration data. The metabolic network for each
model in the ensemble is described by a system of ODEs:

$$\frac{d\tilde{y}_i}{dt} = \frac{1}{y_i^{ss,ref}} \left( \sum v_{generation} - \sum v_{consumption} \right) \tag{4.17}$$

where $\tilde{y}_i$ represents either a metabolite concentration normalised with respect to the
reference steady state or an enzyme fraction, and $y_i^{ss,ref}$ stands for the corresponding
metabolite or total enzyme concentration at the reference state. By solving the ODEs by numerical
integration, the steady-state metabolite and enzyme concentrations can be generated, and then
using Eq. [4.20] the steady-state flux data can be computed. In this work, we used the ode15s
solver in MATLAB to solve the ODEs with an integration time of 500 units and a step size of 25
units.

4.2.3 Screening the ensemble using literature data

The models developed are then screened using reported data from literature. All the models are

perturbed by modifying the enzyme concentration levels to reflect the reported experiments and

the model predicted data is compared with experimental data. Models that are in agreement with

the reported physiological data are retained for further screening using additional data.

The following equation represents how each model is perturbed:

$$v_{i,1} = k_{i,1} E_{i,total}^{ref} X_i^{ss,ref} \cdot \frac{E_{i,total}}{E_{i,total}^{ref}} \cdot \frac{[X_i]}{X_i^{ss,ref}} \cdot \frac{[E_i]}{E_{i,total}} = \tilde{K}_{i,1}^{ref}\, E_{i,r}\, \tilde{X}_i\, \tilde{e}_{i,1} \tag{4.18}$$

In this equation, the new term $E_{i,total}$ represents the modified concentration of enzyme $i$,
and the ratio $E_{i,r}$ between $E_{i,total}$ and $E_{i,total}^{ref}$ represents the fold change
in enzyme concentration. This fold change is modified in order to perturb the system. In this
study, the value of the ratio was taken as 0.01 for a deletion and as 10 for over-expression. We
used data from a succinic acid over-producing strain (Raab et al., 2010) and from a glycerol
over-producing strain (Nevoigt et al., 1996) to screen the models. The screened models obtained
after this process were used to observe the changes in flux distribution upon deletion of the
reactions suggested by steady-state modeling. The reactions that were predicted to have greater
control over the diversion of flux towards the aromatic amino acid pathway were chosen as targets
for experimental validation.
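The effect of an enzyme fold change on the flux distribution can be sketched on a single branch point. The following is a minimal illustration under stated assumptions and is much simpler than the models used in this work: one normalized metabolite receives a constant influx and is consumed by two lumped mass-action reactions (instead of elementary steps), the parameter values and the function name `simulate_branch` are hypothetical, and forward Euler integration stands in for the ode15s solver. The fold changes of 0.01 (deletion) and 10 (over-expression) are the values quoted above.

```python
def simulate_branch(fold_change_1=1.0, fold_change_2=1.0, v_in=1.0,
                    k1=0.5, k2=0.5, dt=0.01, t_end=500.0):
    """Integrate dx/dt = v_in - v1 - v2 to steady state; return (x, v1, v2).

    x is the metabolite concentration normalized to the reference steady
    state, and v1, v2 are the two branch fluxes, each scaled by the fold
    change of its enzyme.
    """
    x = 1.0  # start from the reference steady state
    for _ in range(int(t_end / dt)):
        v1 = k1 * fold_change_1 * x
        v2 = k2 * fold_change_2 * x
        x += dt * (v_in - v1 - v2)   # forward Euler step
    v1 = k1 * fold_change_1 * x
    v2 = k2 * fold_change_2 * x
    return x, v1, v2


print(simulate_branch())                     # reference state: equal split
print(simulate_branch(fold_change_1=0.01))   # "deletion" of enzyme 1
print(simulate_branch(fold_change_1=10.0))   # over-expression of enzyme 1
```

With the first enzyme at 1% of its reference level, the metabolite pool rises and almost all of the influx is diverted through the second branch, which is the kind of flux redistribution used above to rank candidate deletions.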

4.2.4 Limitations of Ensemble modeling

The major limitation of ensemble modeling is its computationally intensive nature, which prevents
its application to genome-scale metabolic models. Also, EM in its present form can only estimate
the metabolic behaviour of a cell after a mutation, as FBA does, and is not capable of proposing
strain designs for over-production, as Optknock/GDLS do. Additionally, ensemble modeling can only
make a qualitative prediction of the dynamic behaviour using steady-state flux data from the
literature. In order to make a more accurate quantitative estimate of the kinetic parameters, EM
requires knowledge of the actual mechanism of every reaction in the network, information about
the allosteric regulation that operates inside the cell, and metabolite and enzyme
concentrations.

4.3 Methodology

The steps followed in the computational design of the strain for tyrosine over-production in
S. cerevisiae are outlined below:

1. Obtain the genome-scale model iMM904 in SBML format for steady-state strain design.

2. Formulate Optknock/GDLS such that the bioengineering objective is tyrosine production and

cellular objective is growth maximization.

3. Run Optknock/GDLS algorithm, using MATLAB, to obtain a set of reactions that need to be

knocked out to achieve our objective.

4. Run FBA simulation using COBRA toolbox after removing the above mentioned reactions to

verify the result from Optknock/GDLS and obtain FBA predictions of growth rate and tyrosine

production fluxes.

5. Develop a small model of S. cerevisiae metabolism taking into account the major metabolic

pathways and the reactions suggested by Optknock/GDLS.

6. Obtain reference steady-state flux values and ΔGs of all the reactions in the small-scale model

from literature.

7. Build an ensemble of 2500 models that are all anchored to the reference steady-state flux data

but with different kinetic parameters.

8. Screen these models using the data reported in the literature on succinic acid and glycerol
over-producing strains.

9. Use the final set of models to get a better understanding of the S. cerevisiae metabolic flux
distribution and to prioritize, among the reactions suggested by steady-state modeling, those to
be taken forward for experimental validation.

Chapter 5

Results and Discussion

5.1 Steady-state modeling results

The strain design in this work was performed using bi-level optimization methods, over the

genome-scale model iMM904 with 1577 reactions and 1228 metabolites. Our initial attempt at

strain design was performed using Optknock in which, maximizing tyrosine production was

considered as the outer optimization objective and growth rate maximization was considered as

the inner optimization objective. The simulation was carried out using glucose as the substrate

under aerobic conditions (Fig 5.1).

Figure 5.1 Steady-state strain design approach adopted in this study.

Optknock did not converge when applied to iMM904. This was possibly because of the limit on the
maximum number of knockouts that could be allowed, which in our case was four. Therefore, in
order to circumvent this problem, we used an alternative formulation of Optknock called GDLS.
GDLS uses a local search formulation instead of the global search approach adopted by Optknock
and allows the limit on the maximum number of knockouts to be increased. When GDLS was applied
with tyrosine production as the objective of the outer problem, it did not yield any solution.
However, as GDLS is a local search algorithm, we hypothesised that the path followed by the
algorithm while searching for a strategy might be critical. Hence, in order to direct the
algorithm path, we first designed a strain that maximized chorismate production. Chorismate is an
intermediate in the aromatic amino acid pathway and a precursor to all three aromatic amino
acids. We then used the chorismate strain, instead of wild-type iMM904, as the starting point for
tyrosine strain design. With this approach, we obtained a strategy that showed excellent growth
coupling. Fig. 5.2 shows the FBA-predicted feasible flux space of wild-type iMM904 and the
mutant.

Figure 5.2 Map of computationally predicted solution space for wild type iMM904 and the strategy designed from

GDLS. A and B represent the optimal points for wild type and mutant strains

According to our hypothesis, cells should grow at the state where the growth rate is maximum.
From Fig. 5.2 it is clear that for wild-type iMM904 this state is at point A. At this state of
maximum growth rate there is, as expected, no production of tyrosine. In the mutant, however, the
cells are forced to produce tyrosine. After adaptive evolution, the mutant is expected to grow at
point B, where it is predicted to produce tyrosine at 60% of the maximum mathematically possible
yield.

5.1.1 Predicted strategy

Figure 5.3 Schematic of the deletions suggested by GDLS.

As shown in the figure (Fig. 5.3) above, the predicted strategy is complex and targets deletion of

8 different enzymes.

PDC, pyruvate decarboxylase, exists in three isoforms, PDC1, PDC5 and PDC6, in S. cerevisiae
(http://www.yeastgenome.org/). This enzyme catalyzes the conversion of pyruvate to acetaldehyde
and plays a major role in respiro-fermentative metabolism. We hypothesize that deleting this
reaction would divert flux from ethanol production into the aromatic amino acid pathway.

PYC, pyruvate carboxylase, exists in two isoforms PYC1 and PYC2 and converts cytoplasmic

pyruvate to cytoplasmic oxaloacetate. MDH, malate dehydrogenase, is a mitochondrial enzyme

that converts malate to oxaloacetate. MTM, malate transporter, helps in export of malate across

mitochondrial membrane (http://www.yeastgenome.org/). We hypothesize that, PYC, MDH and

MTM deletions are required to prevent wastage of carbon flux by producing excess

mitochondrial ATP through respiratory metabolism of TCA cycle.

ZWF, glucose-6-phosphate dehydrogenase, catalyses the first step of the pentose phosphate pathway
and is the major source of cytoplasmic NADPH. It has been reported that deletion of ZWF results
in methionine auxotrophy due to the depletion of NADPH pools in the cytoplasm. We hypothesize
that ZWF deletion increases tyrosine production through cofactor coupling, as production of
tyrosine from prephenate (Fig. 3.2) is one of the few ways in which cells can generate
cytoplasmic NADPH (http://www.yeastgenome.org/).

SER, 3-phosphoglycerate dehydrogenase, catalyzes the first step in serine and glycine

biosynthesis. Deleting this gene does not deplete the cells of serine and glycine, because they can

be produced by an alternate route from alanine. SER deletion is suggested probably to prevent

diversion of flux from glycolysis. DAK, dihydroxy acetone kinase, is a gene involved in glycerol

production branch of cell metabolism, and the deletion was suggested probably to prevent

regeneration of depleted NADPH pools using this reaction, instead of tyrosine production.

ARO10 is a decarboxylase that plays a role in the degradation of all three aromatic amino acids
(http://www.yeastgenome.org/). This was the only reaction suggested by GDLS inside the aromatic
amino acid pathway, and we have experimental evidence (Fig. 5.4) that deletion of the ARO10 gene
results in an increase in intracellular tyrosine pools.

5.1.2 Experimental validation for ARO10 deletion

Figure 5.4 Schematic of experimental modifications made while testing ARO10 deletion. We compared ARO10

deletion mutant with tyrosine feedback resistant mutant reported earlier.

Experimental validation of the ARO10 deletion was carried out by Professor Vince Martin’s group,
our collaborators at Concordia University. We compared the ARO10 deletion mutant with the only
reported work on increased tyrosine levels in S. cerevisiae (Luttik et al., 2008). In that study,
Luttik et al. observed increased tyrosine pools when the tyrosine-sensitive feedback enzymes were
replaced by their feedback-insensitive versions. Fig. 5.4 shows the reactions targeted for
experimental validation. Fig. 5.5 shows the accumulation rates of 4HPP (A) and tyrosine (B) for
four different strains: wild-type, mutant with feedback-insensitive genes, ARO10 deletion mutant,
and ARO10 deletion mutant combined with removal of feedback inhibition. In this work, coumarate
was used as a sink for tyrosine.

Figure 5.5 Graphs showing accumulation of 4HPP and tyrosine in the four strains that we compared: wild-type,

tyrosine feedback insensitive mutant, ARO10 deletion mutant and mutant with both ARO10 deletion and feedback

insensitive enzymes. (Data from Vince Martin’s Group)

4HPP is the immediate precursor of tyrosine and is converted to tyrosine by an equilibrium
reaction catalyzed by aromatic aminotransferase. Because it exists in equilibrium with tyrosine,
the intracellular 4HPP pool can also be drawn towards our products, naringenin or xanthohumol
(Fig. 1.1). The accumulation rates of both 4HPP and coumarate in the ΔARO10 mutant were found to
be comparable to those of the feedback-resistant mutant reported earlier. However, an important
observation was that when removal of feedback inhibition was combined with ARO10 gene deletion,
the intracellular levels of both coumarate and 4HPP were much higher than in either of the above
cases. This suggests that, in the feedback-resistant mutant, most of the flux coming into the
aromatic amino acid pathway was diverted towards production of tyrosol, indole-3-ethanol and
phenylethanol instead of increasing the tyrosine pools.

5.1.3 Need for Ensemble modeling

As shown before, steady-state modeling predicted a complex strategy. According to this strategy,
there was no appreciable growth coupling of tyrosine production unless all the predicted
deletions were made simultaneously. This seemingly incorrect prediction arises because
steady-state models do not account for kinetic parameters and treat every reaction in the network
as equally feasible. However, the fact that ARO10 deletion alone resulted in an increased
tyrosine pool suggests that not all of the above deletions are needed to observe a further
increase in tyrosine levels. Additionally, it is experimentally tedious to make a mutant with
multiple deletions, so it is important to select targets that are experimentally verifiable. In
order to make this selection, we need to determine which of the deletions suggested by the
steady-state model play major roles in re-routing the carbon flux towards tyrosine. Determining
this subset requires information on the kinetics of the reactions. However, the time-course data
required to estimate kinetic parameters are difficult to obtain. In order to circumvent this
difficulty, we used ensemble modeling to estimate the dynamic behaviour of the cells using
steady-state flux data (section 4.2).

5.2 Ensemble modeling results

As stated in chapter 4, although EM is a novel approach that can predict the dynamic behaviour of cells using steady-state reference data available in the literature, the large number of parameters involved makes it currently impractical to apply EM to a large-scale network. For this reason, we prepared a small central model of S. cerevisiae for EM application.

5.2.1 S. cerevisiae central model reconstruction

While constructing the model for EM, we included reactions from all the major metabolic pathways, such as glycolysis, glucose fermentation, the pentose phosphate pathway and the TCA cycle. In this small-scale model, we accounted for compartmentalisation by separating the metabolites in the cytoplasm and the mitochondria. Metabolites present in both compartments were connected through exchange reactions. However, we assumed that cofactors such as ATP, NADPH and NADH could be freely transported between the two compartments. The EM framework requires reference steady-state flux data to anchor the models. Because C13 flux data were not available for our strain, we used the yeast C13 flux data of Blank et al. (2005). However, the data they reported cover only core central metabolism and do not include the aromatic amino acid pathway. Hence, for this project, we did not consider the reactions of the aromatic amino acid pathway.

The central model contained all the other reactions suggested by steady-state modeling, except those of the aromatic amino acid pathway. However, one of our hypotheses was that NADPH pool depletion might play a role in coupling growth rate with tyrosine production. To test this hypothesis, we modified the first step of the aromatic amino acid pathway, the condensation of PEP and E4P to DAHP, into the following form: PEP + E4P + NADP ---> DAHP + NADPH. This made it a reaction that could regenerate NADPH, although the actual reaction does not regenerate NADPH. The reference steady-state flux through this reaction was taken as zero because, under steady-state conditions, there should be no accumulation of any of the aromatic amino acids in the wild type. Since the aromatic amino acid pathway was not in our model, we could not prioritize targets for tyrosine production directly. Instead, we looked for targets that would increase the intracellular pools of the precursors PEP, E4P and DAHP.

In addition, Blank et al. (2005) did not report flux values for all of the reactions in our network. To calculate the unknown flux values, and to ensure that the resulting flux distribution was at steady state, we formulated the following optimization problem, which minimizes the difference between the calculated and the reported flux data:

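$$\min_{V_{cal}} \; \sum_i \left(V_{cal,i} - V_{m,i}\right)^2 \qquad (5.1)$$

$$\text{subject to} \quad S \, V_{cal} = 0 \qquad (5.2)$$

$$V_{lb} \le V_{cal} \le V_{ub} \quad \text{(reaction bounds)} \qquad (5.3)$$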

In the above formulation, Vcal is the vector of calculated fluxes for all the reactions in our network and Vm contains the flux values reported by Blank et al. (2005). S is the stoichiometric matrix. The reaction bounds were chosen as 0 and 1000 for irreversible reactions, and -1000 and 1000 for reversible reactions. The following figure (Fig. 5.6) shows the reactions we considered, along with the calculated reference steady-state flux values:

Figure 5.6 Schematic of reconstructed S. cerevisiae network used in this work along with calculated flux data
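The flux-fitting problem in (5.1) to (5.3) can be illustrated on a toy network. The sketch below is a minimal example under stated assumptions, not the code used in this work: the five-reaction network, the reported values and the names `S`, `measured` and `fit_fluxes` are hypothetical. It solves the equality-constrained least-squares problem through its KKT system and only checks the bounds afterwards; a quadratic programming solver would be needed if a bound were active.

```python
import numpy as np

# Hypothetical toy network (not the thesis model):
#   v0: -> A,  v1: A -> B,  v2: A -> C,  v3: B ->,  v4: C ->
S = np.array([
    [1, -1, -1,  0,  0],   # metabolite A
    [0,  1,  0, -1,  0],   # metabolite B
    [0,  0,  1,  0, -1],   # metabolite C
], dtype=float)

# Reported fluxes (reaction index -> value); v1 and v2 are unreported.
# The values are deliberately inconsistent: 10.0 != 6.2 + 3.9.
measured = {0: 10.0, 3: 6.2, 4: 3.9}


def fit_fluxes(S, measured, lower, upper):
    """Least-squares fit of the reported fluxes subject to S v = 0."""
    n_met, n_rxn = S.shape
    idx = sorted(measured)
    M = np.zeros((len(idx), n_rxn))          # selects the reported fluxes
    for row, j in enumerate(idx):
        M[row, j] = 1.0
    m = np.array([measured[j] for j in idx])

    # KKT system of: min ||M v - m||^2  subject to  S v = 0
    kkt = np.block([[2.0 * M.T @ M, S.T],
                    [S, np.zeros((n_met, n_met))]])
    rhs = np.concatenate([2.0 * M.T @ m, np.zeros(n_met)])
    v = np.linalg.solve(kkt, rhs)[:n_rxn]

    # Bounds are verified, not enforced (assumption of this sketch).
    if np.any(v < lower - 1e-9) or np.any(v > upper + 1e-9):
        raise ValueError("a bound is active; use a QP solver instead")
    return v


lower = np.zeros(5)            # all toy reactions irreversible: 0 to 1000
upper = np.full(5, 1000.0)
v_cal = fit_fluxes(S, measured, lower, upper)
print(np.round(v_cal, 4))
```

The fitted vector fills in the two unreported fluxes and slightly adjusts the reported ones so that every metabolite balance closes exactly.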

We obtained the biomass equation for our reconstructed model from Heer et al. (2009); biomass formation was assumed to follow:

1.911 G6P[c] + 0.351 R5P[c] + 0.363 GAP[c] + 1.332 OAA[c] + 0.584 SER[c] + 1.397

ACCOA[c] + 0.155 ACCOA[m] + 0.997 OXOGLUTARATE[m] + 2.147 PYR[m] + 0.239

PYR[c] + 0.579 PEP[c] + 0.289 E4P[c] + 11.352 ATP + 11.249 NADPH + 0.118 NAD --->

Biomass + 11.249 NADP + 0.118 NADH + 11.352 ADP

EM can also account for allosteric regulation by treating each regulatory interaction as a separate reaction. In our network, we considered three reported allosteric regulations: activation and inhibition of the conversion of Fructose-6-Phosphate (F6P) to Fructose-1,6-bis-Phosphate (FbP) by AMP and ATP, respectively (Simonis et al., 2004), and activation of the conversion of PEP to pyruvate by FbP (Boles et al., 1997).

5.2.2 Development of an ensemble of models

The reconstructed network, together with the reference steady-state flux values and the ΔG values of all the reactions in the network, was given as input to the EM algorithm, and a set of 2500 models was constructed. Each of these models had a different set of kinetic parameter values, but all of them were anchored to the same steady-state flux data.
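The idea of "different kinetic parameters, same steady state" can be pictured with the toy sketch below. It is a deliberate simplification and an assumption of this example, not the EM algorithm as applied in this work: each reaction is treated as a single reversible step, a reversibility R = v_rev / v_fwd is sampled at random, and the forward and reverse rates are then fixed so that their difference equals the reference net flux. The function name `sample_model` and the sampling range are hypothetical; the three flux values are those listed for hex, pyk and pdc in Appendix A.

```python
import random

# Reference net fluxes for three reactions, values as listed in Appendix A.
reference_flux = {"hex": 10.0, "pyk": 17.14, "pdc": 15.04}


def sample_model(reference_flux, rng):
    """Draw one toy 'model': forward/reverse rates with a fixed net flux."""
    model = {}
    for name, v_net in reference_flux.items():
        # Hypothetical sampling range; kept below 1 to avoid huge rates.
        reversibility = rng.uniform(0.0, 0.99)      # R = v_rev / v_fwd
        v_fwd = v_net / (1.0 - reversibility)
        v_rev = reversibility * v_fwd
        model[name] = (v_fwd, v_rev)
    return model


rng = random.Random(0)                               # fixed seed for repeatability
ensemble = [sample_model(reference_flux, rng) for _ in range(2500)]

# Every model reproduces the same reference steady state.
worst = max(abs((f - r) - reference_flux[name])
            for model in ensemble for name, (f, r) in model.items())
print(len(ensemble), worst < 1e-9)
```

Because each model differs in its forward and reverse rates, the models respond differently to a change in enzyme level even though they agree at the reference state, which is what makes screening against perturbation data informative.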

5.2.3 Screening the ensemble using data from literature

The advantage of using EM is that we can predict the dynamic behaviour of cells using reported steady-state flux data from enzyme tuning experiments. In this work, to screen the models, we used data from glycerol (Nevoigt et al., 1996) and succinic acid (Raab et al., 2010) over-producing strains reported in the literature. We chose these data sets because all the modifications in those studies were made to reactions that are included in our central model.

In the glycerol over-producing strain, pyruvate decarboxylase (PDC) and glycerol-3-phosphate dehydrogenase (GPD) were the targets. When the PDC gene was repressed to approximately 20% activity, the glycerol yield increased 4.5-fold. When GPD was over-expressed so that its activity increased 20-fold, a six-fold increase in the glycerol yield was observed, and when the two manipulations were performed simultaneously, the glycerol yield increased around 8-fold. In each of these glycerol over-production cases, the ethanol production rate decreased.

In the case of the succinate over-producing strain, succinate dehydrogenase (SDH) and isocitrate dehydrogenase (IDH) were the targeted reactions. When these two genes were deleted, an increase in succinate production along with a 25% reduction in the growth rate of the strain was observed.

The schematic below (Fig. 5.7) shows how the models were screened using data from each of the above experimental findings. For model screening, we perturbed each of the 2500 models computationally by modifying the enzyme concentration to reflect the genetic manipulation performed experimentally. In the case of glycerol production, PDC repression was simulated by decreasing the enzyme concentration in all 2500 models to 0.2 times the original concentration, and GPD over-expression by increasing the enzyme concentration 20 times. In the case of the succinic acid strain, SDH and IDH deletions were simulated by decreasing the corresponding enzyme concentrations in each model to 0.01 times the original value. After these perturbations, we found that 42 of the 2500 models gave results in agreement with the experimental observations. These models were then used to determine the experimental targets.

The EM framework applied in this work is qualitative and cannot be expected to predict accurately the flux data observed experimentally after an enzyme tuning experiment. Therefore, while screening these models, we allowed a 25% error range in the predicted flux values. The actual ranges used for screening each perturbation are shown in the table below:

Perturbation | Range

GPD over-expression | Glycerol yield between 4.5 and 7.5 times the wild-type yield

PDC repression | Glycerol yield between 3.2 and 5.7 times the wild-type yield

GPD over-expression + PDC repression | Glycerol yield more than 6 times the wild-type yield

SDH and IDH deletion | Growth rate between 60% and 90% of the wild-type growth rate
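The screening step can be sketched as a simple filter over the ensemble. In the sketch below, the acceptance ranges are those listed in the table above; everything else is hypothetical: the dictionary keys, the function `passes_screen` and the three example prediction sets are made up for illustration, and obtaining each model's predictions (by simulating the perturbed kinetic model) is outside the scope of the sketch.

```python
# Acceptance ranges as (lower, upper) multiples of the wild-type value.
SCREEN = {
    "glycerol_gpd_oe": (4.5, 7.5),                    # GPD over-expression
    "glycerol_pdc_rep": (3.2, 5.7),                   # PDC repression
    "glycerol_gpd_oe_pdc_rep": (6.0, float("inf")),   # combined: "more than 6 times"
    "growth_sdh_idh_del": (0.60, 0.90),               # SDH and IDH deletion
}


def passes_screen(prediction):
    """True if every predicted fold change lies inside its range."""
    for key, (low, high) in SCREEN.items():
        value = prediction[key]
        if key == "glycerol_gpd_oe_pdc_rep":
            ok = value > low              # strict: "more than 6 times"
        else:
            ok = low <= value <= high
        if not ok:
            return False
    return True


# Hypothetical predictions from three models (fold change vs wild type).
predictions = [
    {"glycerol_gpd_oe": 6.1, "glycerol_pdc_rep": 4.4,
     "glycerol_gpd_oe_pdc_rep": 8.2, "growth_sdh_idh_del": 0.74},
    {"glycerol_gpd_oe": 2.0, "glycerol_pdc_rep": 4.4,
     "glycerol_gpd_oe_pdc_rep": 8.2, "growth_sdh_idh_del": 0.74},
    {"glycerol_gpd_oe": 6.1, "glycerol_pdc_rep": 4.4,
     "glycerol_gpd_oe_pdc_rep": 8.2, "growth_sdh_idh_del": 0.95},
]

kept = [i for i, p in enumerate(predictions) if passes_screen(p)]
print(kept)
```

Only the first example model satisfies all four criteria; the second fails the GPD over-expression range and the third fails the growth-rate range.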

Figure 5.7 Model screening using data from succinic acid and glycerol over producing strains

5.2.4 Prioritizing the targets for experiments using screened models

The 42 models obtained after screening against the reported experimental data were used to prioritize the reactions to be targeted among the deletions suggested by steady-state modeling. To accomplish this, the selected models were perturbed to simulate the flux distribution after each of the deletions. We then observed the effect of each deletion on the growth rate and on the accumulation rates of PEP, E4P and DAHP. The accumulation rate of a metabolite was defined as the difference between the total flux of the reactions in which the metabolite is a product and the total flux of the reactions in which it is a reactant. The plots below (Fig. 5.8) show the average rates of PEP, E4P and DAHP accumulation and of biomass formation predicted by the 42 models. Results for the SER and DAK deletions are not shown because they did not produce any significant increase in the accumulation of PEP, E4P or DAHP.
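The accumulation rate defined above is the stoichiometric row of the metabolite multiplied by the flux vector. The sketch below illustrates this for PEP using four PEP reactions listed in Appendix A (pgk, pepck, pyk and dahps). The flux values for the perturbed case are invented for illustration, the function name `accumulation_rate` is hypothetical, and the biomass drain on PEP is omitted to keep the example short.

```python
# Stoichiometric coefficient of PEP in each reaction (Appendix A):
# positive = PEP is a product, negative = PEP is a reactant.
pep_coeff = {"pgk": 1.0, "pepck": 1.0, "pyk": -1.0, "dahps": -1.0}


def accumulation_rate(coeff, flux):
    """Flux producing the metabolite minus flux consuming it."""
    return sum(c * flux[rxn] for rxn, c in coeff.items())


# Hypothetical fluxes after a perturbation (illustrative values only).
perturbed_flux = {"pgk": 12.0, "pepck": 0.1, "pyk": 10.5, "dahps": 0.4}

rate = accumulation_rate(pep_coeff, perturbed_flux)
print(round(rate, 3))
```

A positive value means that, immediately after the perturbation, PEP is produced faster than it is consumed.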

Although GDLS and FBA predict that deletion of the PDC reaction is not lethal for S. cerevisiae, it has been reported that when all three isoforms of PDC (PDC1, PDC5 and PDC6) are deleted, yeast does not grow with glucose as the only carbon source (Flikweert et al., 1996; Hohmann et al., 1991). Complete deletion of PDC was also not recommended for this project because we need to produce acetate, which is converted to Acetyl CoA and then to Malonyl CoA. Malonyl CoA is required for the conversion of 4-Coumaryl CoA to naringenin, as shown in Fig. 1.1. Therefore, in the EM simulations, we repressed the PDC gene instead of deleting it. It has been reported that PDC repression can be achieved by deleting the PDC1 isoform, which results in a 30% reduction of total pyruvate decarboxylase activity (Flikweert et al., 1996).

Figure 5.8 EM predicted PEP, E4P, DAHP accumulation and biomass formation rates when deletions suggested by

GDLS are implemented.

The models predicted that PDC repression is the most important manipulation. PDC repression resulted in increased intracellular pools of PEP (Fig. 5.8), but no significant increase was observed in either the E4P or the DAHP pool (Fig. 5.8). PEP accumulation was probably observed because repression of the PDC gene decreased the flux from pyruvate to acetaldehyde (a major flux in S. cerevisiae under aerobic conditions), thereby increasing the pools of pyruvate and of its immediate precursor, PEP. Additionally, the predicted drop in growth rate was significant compared to the wild type (Fig. 5.8).

PYC deletion did not add any significant value to our objective. Moreover, when PYC, MDH and MTM were deleted together along with PDC repression, the growth rate decreased significantly (Fig. 5.8) without any increase in the metabolites of interest. Therefore, PYC, MTM and MDH deletions may not be required to observe improved tyrosine levels.

Although ZWF deletion resulted in a significant decrease in growth rate (Fig. 5.8), it improved the intracellular level of DAHP, our proxy metabolite for tyrosine (Fig. 5.8). We also observed a slight increase in the E4P pool with ZWF deletion. This appearance of DAHP inside the cell was probably a compensation for the reduced NADPH pool. In addition, we observed that the flux from acetaldehyde was directed more towards acetate than ethanol, probably because production of acetate generates NADPH. This observation is of interest because acetate is the precursor of Malonyl CoA, and diverting more flux from acetaldehyde towards acetate instead of ethanol would be beneficial for the project. We then searched the literature for reported data supporting this observation. We found that it is indeed supported: over-expressing ALD6, the major isoform of acetaldehyde dehydrogenase, is reported to be the only way in which the growth defect of ZWF mutants can be reversed (Grabowska et al., 2003). To test this, we over-expressed acetaldehyde dehydrogenase activity in the models and did observe a slight increase in the growth rate (Fig. 5.9). Over-expression of the ALD reaction also resulted in a further increase of flux towards acetate. The model-predicted flux distributions for the reactions in the fermentative branch of yeast for the different mutants are discussed in detail in Appendix C.

Figure 5.9 Graph showing the effect of ALD over-expression on growth rate of PDC-- and ΔZWF mutant.

In addition to improving the growth rate of the PDC-repressed, ZWF-deleted mutant, over-expression of ALD6 can compensate for the loss of acetate flux resulting from PDC repression.

5.3 Final strategy to be verified experimentally

The final strategy that we propose to obtain a strain with increased tyrosine levels is shown in the

following schematic:

Figure 5.10 Schematic of the proposed final strategy for tyrosine over-producing strain

The proposed final strategy involves:

- Deletion of three genes: ZWF1 (glucose-6-phosphate dehydrogenase), PDC1 (major isoform of pyruvate decarboxylase) and ARO10

- Over-expression of ALD6, the major isoform of acetaldehyde dehydrogenase

- Removal of tyrosine feedback inhibition by expressing tyrosine-insensitive AROFFBR

Chapter 6

Conclusions and Future Work

6.1 Conclusions

In this work, we proposed a strategy for obtaining a strain of S. cerevisiae with improved tyrosine-producing capability. The proposed strategy involves five genetic manipulations: deletion of the PDC1, ZWF1 and ARO10 genes, over-expression of ALD6, and substitution of the native enzymes with their feedback-insensitive isoforms, which was shown previously to improve tyrosine yields. Initial experimental validation of our strategy revealed that, when ARO10 deletion was combined with incorporation of feedback-insensitive enzymes, the intracellular tyrosine pools were significantly higher than in the previously reported strain. While ARO10 deletion and the introduction of feedback-insensitive enzymes in the aromatic amino acid pathway were already shown to be effective, dynamic modeling predicted that PDC1 and ZWF1 deletions would further improve the flux to tyrosine by increasing the pools of precursors and through cofactor coupling. We also predicted that ZWF1 deletion would divert more flux from acetaldehyde towards acetate instead of ethanol because of NADPH coupling. This would be further enhanced by over-expression of ALD6, which would also help improve the growth rate of the PDC1, ZWF1 deletion mutant. We expect that the final strain containing all the suggested manipulations will show higher pools of both tyrosine and acetate, making it an ideal platform strain for production of plant secondary metabolites such as xanthohumol.

Our work will be the first reported study to investigate genome-wide engineering of S. cerevisiae for improved tyrosine production. Apart from serving as a host for production of plant metabolites, the final strain could be used for large-scale production of tyrosine, which is itself an industrially valuable compound, if the tyrosine yield is found to be sufficiently high. Using S. cerevisiae as the host instead of E. coli is advantageous because yeast is a more robust organism.

The modeling procedure discussed here provides an effective way to arrive at a genome-scale experimental design in the absence of kinetic and regulatory information. Although steady-state strain design algorithms are effective in predicting genetic engineering strategies, the predicted strategies can sometimes be difficult to validate experimentally. In such cases, the procedure followed in this work, prioritizing the deletions by taking kinetics into account, will be very useful. It can also be concluded that, because steady-state modeling methods do not account for kinetics, it is advisable to test steady-state predictions using a dynamic model before experimental validation.

6.2 Future work

We are currently validating the final strategy experimentally, which is the obvious next step. In addition, modeling methods such as ensemble modeling are highly strain specific: if an accurate dynamic model of a strain is desired, data from that strain alone must be used for both construction and screening of the ensemble. We are currently working on obtaining C13 flux data for the strain that we are using for plant secondary metabolite production. Once the C13 data are available, we will expand our central model to include reactions from the aromatic amino acid pathway and the heterologous pathway. This expanded central model will be used to develop a kinetic model specific to our strain, using experimental results from the ZWF1, PDC1 and ARO10 deletions.

References

Alper H, Jin YS, Moxley JF, Stephanopoulos G. Identifying gene targets for the metabolic

engineering of lycopene biosynthesis in Escherichia coli. 2005. Metabolic Engg., 7:155–164.

Asadollahi MA, Maury J, Patil KR, et al: Enhancing sesquiterpene production in Saccharomyces

cerevisiae through in silico driven metabolic engineering. 2009. Metabolic Engg. 11:328-34.

Bailey JE. Complex biology with no parameters. 2001. Nature Biotechnology, 19:503–504.

Berry, A. Improving production of aromatic compounds in Escherichia coli by metabolic

engineering. 1996. Trends in Biotechnol., 14:250–256.

Blank LM, Lars Kuepfer, Uwe Sauer. Large-scale 13C-flux analysis reveals mechanistic

principles of metabolic network robustness to null mutations in yeast. 2005. Genome Biology,

6: R49.

Bongaerts J, Krämer M, Müller U, Raeven L, Wubbolts M. Metabolic engineering for

microbial production of aromatic amino acids and derived compounds. 2001. Metabolic Engg.

3:289–300.

Breuer M, Ditrich K, Habicher T, Hauer B, Keßeler M, Stürmer R, Zelinski T. Industrial

methods for the production of optically active intermediates. 2004. Angewandte Chem Int Ed,

43:788–824.

Bro C, Regenberg B, Förster J, Nielsen J. In silico aided metabolic engineering of

Saccharomyces cerevisiae for improved bioethanol production. 2006. Metabolic Engg.,

8:102–111.

Brochado, Ana Rita, Claudia Matos, Birger L Møller, Jørgen Hansen, Uffe H Mortensen, Kiran Raosaheb Patil. Improving vanillin production in baker’s yeast through in silico design. 2010. Microbial Cell Factories, 9:84.

Brown, I.W. Dawes. Regulation of chorismate mutase in Saccharomyces cerevisiae. 1990.

Molecular Genetics and Genomics, pp. 283–288.

Burgard, A. P., Pharkya, P., Maranas, C. D. OPTKNOCK: a bilevel programming framework for

identifying gene knockout strategies for microbial strain optimization. 2003. Biotechnology and

Bioengineering. 84(6), 647-657.

Chassagnole, C, Noisommit-Rizzi, N., Schmid, J.W., Mauch, K., Reuss, M. Dynamic

modeling of the central carbon metabolism of Escherichia coli. 2002. Biotechnology and

bioengineering 79, 53-73.

Chemler JA, Yan Y, Koffas MA. Biosynthesis of isoprenoids, polyunsaturated fatty acids and

flavonoids in Saccharomyces cerevisiae. 2006. Microbial Cell Fact, 5:20.

Contador, C. A., Rizk, M. L., Asenjo, J. A., Liao, J. C., Ensemble modeling for strain

development of L-lysine-producing Escherichia coli. 2009. Metabolic Engg., 11, 221–233.

Dean, J. T., M. L. Rizk, Y. Tan, K. M. Dipple, and J. C. Liao. Ensemble modeling of hepatic

fatty acid metabolism with a synthetic glyoxylate shunt. 2010. Biophysical Journal, 98 (8) :

1385-95.

Dejong JM, Liu Y, Bollon AP, Long RM, Jennewein S, Williams D, Croteau RB Genetic

engineering of taxol biosynthetic genes in Saccharomyces cerevisiae. 2006. Biotechnol

Bioeng., 93:212–224.

Dobson PD, Smallbone K, Jameson D, Simeonidis E, Lanthaler K, Pir P, Lu C, Swainston N,

Dunn WB, Fisher P. Further developments towards a genome-scale metabolic model of yeast.

2010. BMC Systems Biology, 4:145–151.

Dosselaere F, Vanderleyden J. A metabolic node in action: chorismate-utilizing enzymes in

microorganisms. 2001. Crit Rev Microbiol, 27:75–131.

Duarte NC, Herrgård MJ, Palsson B. Reconstruction and Validation of Saccharomyces

cerevisiae iND750, a Fully Compartmentalized Genome-Scale Metabolic Model.

2004. Genome Res.,14:1298–1309.

Edwards JS, Ibarra RU, Palsson BØ: In silico predictions of Escherichia coli metabolic

capabilities are consistent with experimental data. 2001. Nature Biotechnology, 19:125-30.

Edwards JS, Palsson BO. The Escherichia coli MG1655 in silico metabolic genotype: its

definition, characteristics, and capabilities. 2000. PNAS USA., 97:5528–5533.

Famili I, Förster J, Nielsen J, Palsson BØ: Saccharomyces cerevisiae phenotypes can be

predicted by using constraint-based analysis of a genome-scale reconstructed metabolic

network. 2003. PNAS USA, 100:13134-9.

Flikweert MT, Zanden LVD, Janssen WMTM, et al: Pyruvate decarboxylase: An

indispensable enzyme for growth of Saccharomyces cerevisiae on glucose. 1996. Yeast,

140:1723-257.

Feist AM, Palsson BO The growing scope of applications of genome-scale metabolic

reconstructions using Escherichia coli. 2008. Nature Biotechnology, 26:659–667.

Fong SS, Burgard AP, Herring CD, Knight EM, Blattner FR, Maranas CD, Palsson BO. In

silico design and adaptive evolution of Escherichia coli for production of lactic acid. 2005.

Biotechnol Bioeng., 91:643–648.

Forster J, Famili I, Fu P, Palsson BO, Nielsen J. Genome-scale reconstruction of the

Saccharomyces cerevisiae metabolic network. 2003. Genome Res., 13:244–253.

Frost JW, Draths KM. Biocatalytic synthesis of aromatics from D-glucose: renewable microbial

sources of aromatic compounds. 1995. Annu Rev Microbiol., 49:557–579.

Gosset, G. Production of aromatic compounds in bacteria. 2009. Curr Opin Biotechnol., 20 , pp.

651–658.

Goffeau A: The yeast genome directory. 1997. Nature, 387:5-6.

Grabowska D and Chelstowska A. The ALD6 gene product is indispensable for providing

NADPH in yeast cells lacking glucose-6-phosphate dehydrogenase activity. 2003. J Biol

Chem., 278(16):13984-8.

Hawkins, K.M. and C.D. Smolke, Production of benzylisoquinoline alkaloids in Saccharomyces

cerevisiae. 2008. Nature Chemical Biology, 4: 564-73.

Hartmann, T.R. Schneider, A. Pfeil, G. Heinrich, W.N. Lipscomb, G.H. Braus. Evolution of

feedback-inhibited beta /alpha barrel isoenzymes by gene duplication and a single mutation.

2003. PNAS USA, 100 pp. 862–867.

Heer D, Heine D, Sauer U: Resistance of Saccharomyces cerevisiae to high concentrations of

furfural is based on NADPH-dependent reduction by at least two oxireductases. 2009. Appl

Environ Microbiol, 75:7631.

Herrgard MJ et al. A consensus yeast metabolic network reconstruction obtained from a

community approach to systems biology. 2008. Nature Biotechnology, 26:1155–1160.

Hohmann S: Characterization of PDC6, a third structural gene for pyruvate decarboxylase in

Saccharomyces cerevisiae. 1991. Journal of bacteriology, 173:7963-9.

Ikeda, M. Amino acid production processes. 2003. Advances in biochemical engineering/

biotechnology, 79:1-35.

Ikeda M. Towards bacterial strains overproducing L-tryptophan and other aromatics by

metabolic engineering. 2006. Appl Microbiol Biotechnol., 69:615–626.

Jankowski, M. D., C. S. Henry, L. J. Broadbelt, and V. Hatzimanikatis. Group contribution

method for thermodynamic analysis of complex metabolic networks. 2008. Biophysical

Journal, 95 (3) (Aug): 1487-99.

Jiang H, Wood KV, Morgan JA. Metabolic engineering of the phenylpropanoid pathway in

Saccharomyces cerevisiae. 2005. Appl Environ Microbiol, 71:2962–2969.

Juminaga D, Edward E. K. Baidoo, Alyssa M. Redding-Johanson, Tanveer S. Batth, Helcio Burd, Aindrila Mukhopadhyay, Christopher J. Petzold and Jay D. Keasling. Modular Engineering of Tyrosine Production in Escherichia coli. 2012. Appl. Environ. Microbiol., vol. 78.

Kauffman KJ, Prakash P, Edwards JS. Advances in flux balance analysis. 2003. Curr Opin

Biotechnol., 14:491–496.

Khosla C, Keasling JD, Metabolic engineering for drug discovery and development. 2003. Nat

Rev Drug Discov., 2:1019–1025.

Krappmann, W.N. Lipscomb, G.H. Braus. Coevolution of transcriptional and allosteric

regulation at the chorismate metabolic branch point of Saccharomyces cerevisiae. 2000.

PNAS USA, 97 pp. 13585–13590.

Kuepfer L, Sauer U, Blank LM. Metabolic functions of duplicate genes in Saccharomyces

cerevisiae. 2005. Genome Res., 15:1421–1430.

Kunzler, G. Paravicini, C.M. Egli, S. Irniger, G.H. Braus. Cloning, primary structure and

regulation of the ARO4 gene, encoding the tyrosine-inhibited 3-deoxy-d-arabino-

heptulosonate-7-phosphate synthase from Saccharomyces cerevisiae. 1992. Gene, 113 pp. 67–

74.

Lee, D.Y. et al. WebCell: a web-based environment for kinetic modeling and dynamic

simulation of cellular networks. 2006. Bioinformatics 22, 1150-1151.

Leuchtenberger, K. Huthmacher, K. Drauz Biotechnological production of amino acids and

derivatives: current status and prospects. 2005. Appl Microbiol Biotechnol, 69 pp. 1–8

Lun, D. S., Rockwell, G., Guido, N. J., Baym, M., Kelner, J. A., Berger, B., Galagan, J. E., et

al. Large-scale identification of genetic design strategies using local search. 2009. Molecular

Systems Biology.

Lütke-Eversloh, C.N. Santos, G. Stephanopoulos Perspectives of biotechnological production of

tyrosine and its applications. 2007. Appl Microbiol Biotechnol., 77 pp. 751–762.

Luttik MAH, Vuralhan Z, Suir E, et al: Alleviation of feedback inhibition in Saccharomyces

cerevisiae aromatic amino acid biosynthesis: quantification of metabolic impact. 2008.

Metabolic engineering, 10:141-53.

Maury J, Asadollahi MA, Moller K, Clark A, Nielsen J. Microbial isoprenoid production: an

example of green chemistry through metabolic engineering. 2005. Adv Biochem Eng


Minami, H., J. S. Kim, N. Ikezawa, T. Takemura, T. Katayama, H. Kumagai, and F. Sato.

Microbial production of plant benzylisoquinoline alkaloids. 2008. PNAS USA, 105: 7393-98.

Mo ML, Palsson B, Herrgård MJ. Connecting extracellular metabolomic measurements to

intracellular flux states in yeast. 2009. BMC Sys Biol., 3:37–54.

Nevoigt E, Stahl U. Reduced pyruvate decarboxylase and increased glycerol-3-phosphate

dehydrogenase [NAD+] levels enhance glycerol production in Saccharomyces cerevisiae. 1996.

Yeast, Oct;12(13):1331-7.

Nissen TL, Kielland-Brandt MC, Nielsen J, Villadsen J: Optimization of ethanol production in

Saccharomyces cerevisiae by metabolic engineering of the ammonium assimilation. 2000.

Metabolic Engg. , 2:69-77.

Nookaew I, Jewett MC, Meechai A, Thammarongtham C, Laoteng K, Cheevadhanarak S,

Nielsen J, Bhumiratana S. The genome-scale metabolic model iIN 800 of Saccharomyces

cerevisiae and its validation: a scaffold to query lipid metabolism. 2008. BMC Sys Biol., 2:71.

Olson MM, Templeton LJ, Suh W, Youderian P, Sariaslani FS, Gatenby AA, Van Dyk TK .

Production of tyrosine from sucrose or glucose achieved by rapid genetic changes to

phenylalanine-producing Escherichia coli strains. 2007. Appl Microbiol Biotechnol.,

74:1031–1040.

Osterlund, T., Nookaew, I., Nielsen, J. Fifteen years of large scale metabolic modeling of

yeast: Developments and impacts. 2011. Biotechnology Advances, V. 30 (5).

Orth, J.D., I. Thiele, and B.Ø. Palsson. What is flux balance analysis? 2010. Nature

Biotechnology, 28 (3): 245-8.

Patil,K.R., Rocha,I., Forster,J., and Nielsen,J. Evolutionary programming as a platform for in

silico metabolic engineering. 2005. BMC Bioinformatics.

Pharkya P, Maranas CD. An optimization framework for identifying reaction

activation/inhibition or elimination candidates for overproduction in microbial systems. 2006.

Metabolic Engg., 8:1–13.

Pittard J Biosynthesis of aromatic amino acids. In: Neidhardt FC (ed) Escherichia coli and

Salmonella typhimurium: cellular and molecular biology, vol. 1. 1996. American Society of

Microbiology, Washington, DC, pp 458–484.

Porro D, Sauer M, Branduardi P, Mattanovich D. Recombinant protein production in yeasts.

2005. Mol Biotechnol., 31:245–259.

Price ND, Papin JA, Schilling CH, Palsson BO. Genome-scale microbial in silico models: the

constraints-based approach. 2003. Trends in Biotechnol., 21:162–169.

Primrose SB. The application of genetically engineered micro-organisms in the production of

drugs. 1986. J. Appl. Bacteriol., 61: 99-116.

Majewski RA, Domach MM. Simple constrained optimization view of acetate overflow in E. coli. 1990.

Biotechnol Bioeng., 35:732–738.

Raab AM, Gebhardt G, Bolotina N, Weuster-Botz D, Lang C. Metabolic engineering of

Saccharomyces cerevisiae for the biotechnological production of succinic acid. 2010. Metabolic

Engg., 12, 518-525.

Ramakrishna R, Edwards JS, McCulloch A, Palsson BO. Flux-balance analysis of mitochondrial

energy metabolism: consequences of systemic stoichiometric constraints. 2001. American J

Physiol., 280:695–704.

Ranganathan S, Suthers PF, Maranas CD. OptForce: An Optimization Procedure for

Identifying All Genetic Manipulations Leading to Targeted Overproductions. 2010. PLoS

Comp Biol., 6:1–11.

Ro DK, Douglas CJ. Reconstitution of the entry point of plant phenylpropanoid metabolism in

yeast (Saccharomyces cerevisiae): implications for control of metabolic flux into the

phenylpropanoid pathway. 2004. J Biol Chem., 279:2600–2607.

Ro DK, Paradise EM, Ouellet M et al. Production of the antimalarial drug precursor

artemisinic acid in engineered yeast. 2006. Nature, 440:940–943.

Santos CNS, Mattheos Koffas, Gregory Stephanopoulos. Optimization of a heterologous

pathway for the production of flavonoids from glucose. 2011. Metabolic Engg., 13 392–400.

Segre D, Vitkup D, Church GM. Analysis of optimality in natural and perturbed metabolic

networks. 2002. PNAS USA, 99:15112–15117.

Khazaei, Tahmineh. Ensemble Modeling of Cancer Metabolism. 2011. University of Toronto,

MASc thesis.

Tran, L. M., Rizk, M. L., Liao, J. C., Ensemble modeling of metabolic networks. 2008.

Biophys. J., 95, 5606–5617.

Wang, L., Birol, I. & Hatzimanikatis, V. Metabolic control analysis under uncertainty:

framework development and case studies. 2004. Biophysical journal, 87, 3750-3763.

Wang Q, Chen X, Yang Y, Zhao X. Genome-scale in silico aided metabolic analysis and flux

comparisons of Escherichia coli to improve succinate production. 2006. Appl Microbiol


Yan Y, Huang L, Koffas MA. Biosynthesis of 5-deoxy flavanones in microorganisms. 2007.

Biotechnol. J, 2:1250–1262.

Yang, L., Cluett, W. R., Mahadevan, R. EMILiO: a fast algorithm for genome-scale strain

design. 2011. Metabolic Engg., 13(3), 272-281.

Yi J, Draths KM, Li K, Frost JW. Altered glucose transport and shikimate product yields in

Escherichia coli. 2003. Biotechnol. Prog., 19:1450–1459.

Zabriskie DW, Arcuri EJ. Factors influencing productivity of fermentations employing

recombinant microorganisms. 1986. Enzyme Microb. Technol., 8: 706-717.

Zomorrodi, A.R and Maranas, C.D. Improving the iMM904 S. cerevisiae metabolic model

using essentiality and synthetic lethality data. 2010. BMC Systems Biology, 4:178.

Appendix A

Table showing the reactions that we considered for the central model of S. cerevisiae, along with the ΔG and steady-state flux (Vss) values

Name Reaction ΔG Vss

hex [c] : D-Glucose + ATP ---> ADP + G6P -4.5 10

zwf [c] : G6P + NADP ---> NADPH + 6PGL -2 1.06

sol [c] : 6PGL + NADP ---> R5P + NADPH -4.81 1.06

pgi [c] : G6P <==> F6P -0.9 8.47

pfk [c] : ATP + F6P ---> ADP + FBP -4.5 9.1

fba [c] : FBP ---> 2 GAP 4.3 9.1

tkt1 [c] : 2 R5P <==> GAP + S7P 2.8 0.35

tal [c] : GAP + S7P <==> E4P + F6P -1.75 0.35

tkt2 [c] : E4P + R5P <==> GAP + F6P -1.75 0.28

ser3 [c] : GAP + NAD ---> NADH + SER 3.9 0.12

gpd [c] : GAP + NAD <==> 3pDGP + NADH -0.36 17.24

pgk [c] : 3pDGP + ADP ---> ATP + PEP -2.63 17.24

pyk [c] : PEP + ADP ---> PYR + ATP -4.61 17.14

pepck [c] : OAA + ATP ---> ADP + PEP -0.29 0.03

pyc [c] : PYR + ATP ---> ADP + OAA -1.1 0.89

acoah [c] : ACETATE + ATP ---> ACCOA + AMP -0.95 0.35

ald [c] : ACETALDEHYDE + NADP ---> ACETATE + NADPH -11.99 0.87

adh [c] : ACETALDEHYDE + NADH ---> ETHANOL + NAD -5.78 14.17

pdc [c] : PYR ---> ACETALDEHYDE -3.44 15.04

gdh [c] : G3P + NADH ---> GLYCEROL + NAD 1.44 1.06

glyc1 [c] : GLYCEROL + NADP ---> DHA + NADPH -0.9 0

dak [c] : DHA + ATP ---> ADP + GAP -4.5 0

dahps [c] : PEP + E4P + NADP ---> DAHP + NADPH -12.17 0

pdhm [m] : PYR + NAD ---> ACCOA + NADH -8.14 0.66

csm [m] : OAA + ACCOA ---> CIT -8.93 0.64

icdhm [m] : CIT + NAD ---> NADH + OXOGLUTARATE 4.55 0.64

icl [m] : CIT ---> SUCCINATE + GLYOXYLATE 5.25 0

mas [m] : GLYOXYLATE + mitACCOA ---> MAL 8 0

kgdm [m] : OXOGLUTARATE + NAD ---> SUCCINYLCOA + NADH -9.68 0.4

sucoam [m] : SUCCINYLCOA + 0.5 ADP ---> SUCCINATE + 0.5 ATP 1.06 0.2

sdhm [m] : SUCCINATE + NAD ---> FUMARATE + NADH -2.44 0.13

fumm [m] : FUMARATE ---> MAL -0.61 0.13

mdhm [m] : MAL + NAD ---> NADH + OAA 4.8 0.1

malpyr [m] : MAL + NADP ---> PYR + NADPH 1.32 0.03

nadhatp [c] : 2 NADH + 3 ATP + O2 ---> 2 NAD + 3 ATP -0.1 1.99

58

pyr_t PYR[c] <==> PYR[m] -0.1 1.15

accoa_t ACCOA[c] <==> ACCOA[m] -0.1 0.02

oaa_t OAA[c] <==> OAA[m] -0.1 0.53

atp_r ATP <==> ADP -0.1 17.1

nadp_r NADPH <==> NADP -0.1 0.33

nad_r NADH <==> NAD -0.1 0.01

glc_in [c] : ---> D-Glucose -0.1 10

O2_in [c] : ---> O2 -0.1 1.99

glycerol_out [c] : GLYCEROL ---> -0.1 1.06

ac_out [c] : ACETATE ---> -0.1 0.52

eth_out [c] : ETHANOL ---> -0.1 14.2

succ_out [m] : SUCCINATE ---> -0.1 0.26

dahp_out [c] : DAHP ---> -0.1 0

Biomass_out [c] : Biomass ---> -0.1 0.26

Abbreviations

Metabolites

G6P     glucose-6-phosphate
F6P     fructose-6-phosphate
FBP     fructose-1,6-bisphosphate
DHAP    dihydroxyacetone phosphate
GAP     glyceraldehyde-3-phosphate
DHA     dihydroxyacetone
3PG     3-phosphoglycerate
PEP     phosphoenolpyruvate
PYR     pyruvate
SER     serine
6PGL    6-phosphogluconate
Ru5P    ribulose-5-phosphate
X5P     xylulose-5-phosphate
R5P     ribose-5-phosphate
S7P     sedoheptulose-7-phosphate
E4P     erythrose-4-phosphate
ACCOA   acetyl-CoA
OAA     oxaloacetate
CIT     citrate
ICIT    isocitrate
MAL     malate
ATP     adenosine triphosphate
ADP     adenosine diphosphate
AMP     adenosine monophosphate
NADH    diphosphopyridine nucleotide, reduced
NAD     diphosphopyridine nucleotide
NADP    nicotinamide adenine dinucleotide phosphate
NADPH   nicotinamide adenine dinucleotide phosphate, reduced
DAHP    3-deoxy-D-arabino-heptulosonate-7-phosphate

Enzymes

hex     hexokinase
pgi     phosphoglucose isomerase
pfk     phosphofructokinase
fba     fructose-bisphosphate aldolase
gpd     glyceraldehyde-3-phosphate dehydrogenase
pgk     phosphoglycerate kinase
pyk     pyruvate kinase
pdh     pyruvate dehydrogenase
pepck   phosphoenolpyruvate carboxykinase
gdh     glycerol-3-phosphate dehydrogenase
ser3    serine synthesis
zwf     glucose-6-phosphate-1-dehydrogenase
tkt1    transketolase 1
tkt2    transketolase 2
tal     transaldolase
dahps   DAHP synthase
csm     citrate synthase
icdhm   isocitrate dehydrogenase
sdhm    succinate dehydrogenase
fumm    fumarase
mdhm    malate dehydrogenase
icl     isocitrate lyase
mas     malate synthase
pyc     pyruvate carboxylase
dak     dihydroxyacetone kinase
pdc     pyruvate decarboxylase
acoah   acetyl-CoA hydroxylase
ald     acetaldehyde dehydrogenase
pdhm    pyruvate dehydrogenase
kgdm    oxoglutarate dehydrogenase complex
sucoam  succinate-CoA ligase
adh     alcohol dehydrogenase
sol     enzyme representing the lumped reactions from 6PGL to R5P
pyr_t   pyruvate transfer
atp_r   ATP recycle
O2_in   oxygen inflow
ac_out  acetate outflow
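Because the Vss column is a steady-state flux distribution, the fluxes producing and consuming each internal metabolite should cancel. The short Python sketch below checks this for two metabolites whose reactions are all listed in the Appendix A table. The dictionary layout and the function name `residual` are illustrative choices, not part of the original model files.

```python
# Steady-state fluxes (Vss) copied from the Appendix A table.
VSS = {
    "pdc": 15.04, "adh": 14.17, "ald": 0.87,
    "gdh": 1.06, "glyc1": 0.0, "glycerol_out": 1.06,
}

# Signed stoichiometry of each metabolite in the reactions that touch it
# (+1 produced, -1 consumed), read off the Reaction column of the table.
BALANCES = {
    "ACETALDEHYDE": {"pdc": +1, "adh": -1, "ald": -1},
    "GLYCEROL": {"gdh": +1, "glyc1": -1, "glycerol_out": -1},
}


def residual(metabolite):
    """Net production rate of a metabolite; close to zero at steady state."""
    return sum(coef * VSS[rxn] for rxn, coef in BALANCES[metabolite].items())


residuals = {m: residual(m) for m in BALANCES}
for name, value in residuals.items():
    print(f"{name}: net flux = {value:+.3f}")
```

Both residuals vanish up to floating-point rounding. The same check can be extended to other metabolites, keeping in mind that the tabulated fluxes are rounded to two decimals.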


Appendix B

Standard mechanisms used to break a reaction into its elementary steps (Dean et al., 2010) are

shown below. A reaction with more than two substrates or more than two products should first be

decomposed into two or more reactions that meet this criterion.

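One substrate, One product

$X_i + E_i \leftrightarrow X_i E_i \leftrightarrow X_{i+1} E_i \leftrightarrow X_{i+1} + E_i$

One substrate, Two products

$X_i + E_i \leftrightarrow X_i E_i \leftrightarrow X_{i+1} X_{i+2} E_i \leftrightarrow X_{i+1} + X_{i+2} E_i \leftrightarrow X_{i+2} + E_i$

Two substrates, One product

$X_i + E_i \leftrightarrow X_i E_i + X_{i+1} \leftrightarrow X_i X_{i+1} E_i \leftrightarrow X_{i+2} E_i \leftrightarrow X_{i+2} + E_i$

Two substrates, Two products

$X_i + E_i \leftrightarrow X_i E_i + X_{i+1} \leftrightarrow X_i X_{i+1} E_i \leftrightarrow X_{i+2} X_{i+3} E_i \leftrightarrow X_{i+2} + X_{i+3} E_i \leftrightarrow X_{i+3} + E_i$

Allosteric regulation

$M + E_{\mathrm{free}} \leftrightarrow E_{\mathrm{complex}}$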
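The four catalytic mechanisms follow one pattern: substrates bind in order, the bound complex is converted, and products are released in order. A minimal Python sketch of that pattern is given here. The function name `elementary_steps`, the `A:B:E` naming of enzyme complexes and the fixed binding and release order are illustrative assumptions, not the EM implementation used in this work.

```python
def elementary_steps(substrates, products, enzyme="E"):
    """Return the reversible elementary steps as (left, right) string pairs."""
    if not (1 <= len(substrates) <= 2 and 1 <= len(products) <= 2):
        raise ValueError("decompose the reaction first: 1-2 substrates and 1-2 products")

    def complex_name(species):
        # Free enzyme when nothing is bound, otherwise e.g. "ATP:F6P:E".
        return ":".join(list(species) + [enzyme])

    steps = []
    bound = []
    # Ordered substrate binding.
    for s in substrates:
        left = f"{s} + {complex_name(bound)}"
        bound.append(s)
        steps.append((left, complex_name(bound)))
    # Conversion of bound substrates into bound products.
    steps.append((complex_name(substrates), complex_name(products)))
    # Ordered product release.
    bound = list(products)
    while bound:
        left = complex_name(bound)
        released = bound.pop(0)
        steps.append((left, f"{released} + {complex_name(bound)}"))
    return steps


# pfk from Appendix A: ATP + F6P ---> ADP + FBP
pfk_steps = elementary_steps(["ATP", "F6P"], ["ADP", "FBP"])
for left, right in pfk_steps:
    print(f"{left} <-> {right}")
```

Each mechanism yields one step per substrate, one conversion step and one step per product, so the pfk example produces five reversible steps, matching the "Two substrates, Two products" scheme.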

After breaking down the reactions in our network (shown in Appendix A) into their

elementary form, EM generated a set of 394 elementary reactions. Each of the 2500 models

considered had a different set of kinetic parameters for these 394 elementary reactions. The

Excel file attached to this report contains the kinetic parameters of the forty-two

screened models, together with the stoichiometric matrix for the elementary

reactions. The stoichiometric matrix has 394 reactions and 240 metabolites. Of these 240

metabolites, 49 are actual metabolites and the remaining 191 are enzyme complexes.
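To illustrate how a stoichiometric matrix and a set of kinetic parameters for elementary reactions define a dynamic model, the sketch below evaluates the time derivatives for a toy system. It assumes mass-action kinetics for each elementary reaction, which is the usual choice for elementary steps but is stated here as an assumption. The two-reaction system, the rate constants and the concentrations are made up for illustration and are not taken from the attached Excel file.

```python
# Toy system: reversible binding X + E <-> XE, written as two elementary
# reactions (forward and reverse). All numbers are hypothetical.
SPECIES = ["X", "E", "XE"]

# Stoichiometric matrix S[species][reaction].
S = [
    [-1, +1],  # X
    [-1, +1],  # E
    [+1, -1],  # XE
]
K = [3.0, 4.0]  # hypothetical rate constants, one per elementary reaction


def rates(conc):
    """Mass-action rate of each reaction: k times the reactant concentrations."""
    v = []
    for j, k in enumerate(K):
        rate = k
        for i, c in enumerate(conc):
            if S[i][j] < 0:  # species i is a reactant of reaction j
                rate *= c ** (-S[i][j])
        v.append(rate)
    return v


def dxdt(conc):
    """Time derivative of every species: S multiplied by the rate vector."""
    v = rates(conc)
    return [sum(S[i][j] * v[j] for j in range(len(v))) for i in range(len(conc))]


derivs = dxdt([2.0, 1.0, 0.5])
print(dict(zip(SPECIES, derivs)))
```

In the full model the same structure would apply with 240 species and 394 elementary reactions, and each of the screened models would supply its own vector of rate constants.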


Appendix C

The above plots show the dynamic-model-predicted fluxes through ald (acetaldehyde
dehydrogenase) and adh (alcohol dehydrogenase), which convert acetaldehyde to acetate and
ethanol, respectively. Data are shown for four strains: the wild type; a mutant with PDC
repression; a mutant with PDC repression and ZWF deletion; and a mutant with ALD
over-expression in addition to PDC repression and ZWF deletion. The table shows the ratio of
the fluxes towards acetate and ethanol for the four strains. The plots show that the fluxes
towards ethanol and acetate were lower in all three mutants than in the wild type. This is
because all three mutants carry PDC repression, which decreases the flux towards
acetaldehyde, the common precursor of ethanol and acetate. The table shows that, while the
ratio of the fluxes towards ethanol and acetate is the same for the wild type and the
PDC-repressed mutant, a greater fraction of the flux from acetaldehyde was channelled
towards acetate when ZWF was deleted. This increase in the ratio after deletion of the ZWF
gene compensates for the depleted NADPH pools in the cytoplasm. When this mutant was further
perturbed by over-expressing ALD (10 times), the fraction of flux towards acetate increased
further, as expected. This increased flux towards acetate was the reason for the improved
growth rate (Fig 5.9).
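The quantity discussed here, the share of the acetaldehyde flux that goes to acetate rather than ethanol, can be computed as in the short Python sketch below. The function name `acetate_fraction` is illustrative. The example uses the reference steady-state fluxes of ald and adh from the Appendix A table, not the dynamic-model predictions behind the plots.

```python
def acetate_fraction(v_ald, v_adh):
    """Fraction of the acetaldehyde flux converted to acetate via ald."""
    total = v_ald + v_adh
    if total <= 0:
        raise ValueError("no flux leaves acetaldehyde")
    return v_ald / total


# Reference steady-state fluxes (Vss) of ald and adh from Appendix A.
reference = acetate_fraction(v_ald=0.87, v_adh=14.17)
print(f"fraction of acetaldehyde flux to acetate: {reference:.3f}")
```

A larger value of this fraction for a mutant than for the reference state would correspond to the shift towards acetate described above.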


Appendix D

Table showing the number of genes, metabolites and reactions in each of the genome-scale

models reported to date.

Model Name   No. of Genes Sequenced   No. of Metabolites   No. of Reactions
iFF708       708                      825                  1145
iND750       750                      646                  1149
iLL672       672                      636                  1038
iIN800       800                      1013                 1446
iMM904       904                      1228                 1577
Yeast 4.0    932                      1319                 1865

                     iIN800   iMM904   Yeast 4.0
Number of genes      707      904      924
True positive (%)    69.7     75       74.8
True negative (%)    6.9      5.1      5.3
False positive (%)   10.6     9.3      11.1
False negative (%)   12.7     10.6     8.8

In the above table, positive and negative refer to the ability and inability to grow under

glucose-limited minimal media conditions (Dobson et al., 2010).
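The four percentages per model can be condensed into summary scores. The Python sketch below computes overall accuracy and sensitivity from the values in the table. It assumes that a "true positive" is a correctly predicted growth phenotype and a "false negative" is a growing strain predicted not to grow. The names `RESULTS`, `accuracy` and `sensitivity` are illustrative.

```python
# Percentages copied from the table above (Dobson et al., 2010).
RESULTS = {
    "iIN800": {"TP": 69.7, "TN": 6.9, "FP": 10.6, "FN": 12.7},
    "iMM904": {"TP": 75.0, "TN": 5.1, "FP": 9.3, "FN": 10.6},
    "Yeast 4.0": {"TP": 74.8, "TN": 5.3, "FP": 11.1, "FN": 8.8},
}


def accuracy(c):
    """Share of all predictions that are correct, in percent."""
    return 100.0 * (c["TP"] + c["TN"]) / sum(c.values())


def sensitivity(c):
    """Share of growing strains that are predicted to grow, in percent."""
    return 100.0 * c["TP"] / (c["TP"] + c["FN"])


acc = {name: accuracy(c) for name, c in RESULTS.items()}
sens = {name: sensitivity(c) for name, c in RESULTS.items()}
for name in RESULTS:
    print(f"{name}: accuracy {acc[name]:.1f}%, sensitivity {sens[name]:.1f}%")
```

The accuracy is normalised by the column sum because the tabulated percentages are rounded and need not add up to exactly 100.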
