computational tools for the analysis of microbial ... · metabolic engineering is the analysis and...
TRANSCRIPT
Computational Tools for the Analysis of Microbial Production Systems
E-mail: [email protected]
Web page: fenske.che.psu.edu/faculty/cmaranas
Penn State University
University Park, PA 16802
Costas D. Maranas
Chemical factory on the µµµµm scale
PEP
G6P
F16P
F6P
13P2DG
3PDGL
2PDGL
PEP
PYR
OA
MAL
SUCCOACIT
AKG
GAP
ICIT
DHAP
GLC
PYRD6PGL RL5P
X5P
S7P
E4P F6P
GAP
R5P
D6PGC
FUM
ACCOA
SUCC
ACTP
AC
LAC ETH
PEP
G6P
F16P
F6P
13P2DG
3PDGL
2PDGL
PEP
PYR
OA
MAL
SUCCOACIT
AKG
GAP
ICIT
DHAP
GLC
PYRD6PGL RL5P
X5P
S7P
E4P F6P
GAP
R5P
D6PGC
FUM
ACCOA
SUCC
ACTP
AC
LAC ETH
Escherichia coli Chemical Process Plant
� Metabolism is the totality of chemical reactions that occur in an organism
� Metabolic Engineering is the analysis and modification of metabolic pathways
• Applications include biochemical production, bioremediation, and drug discovery
What is Metabolism?
DNA
SequenceGenes
Proteins
(Enzymes)Metabolic
Pathways
Cellular
Physiology
Increasing Complexity
HT Technologies for Systems BiologyDNA Sequence
Genes
Enzymes
(Gene Products)
Metabolism and
Cellular Physiology
Automated Sequencing
Genomics
DNA Microarrays
Transcriptomics
2-D Protein Arrays
Proteomics
Metabolic Reconstruction Technology
Genome Annotation:
DNA sequence
ORFs identification
Genes
ORFs assignment
Genes Products
Function
Genome
Database
List of reactions
Pathway
Database
Organism’s
metabolism
Metabolic Reconstruction:
ORF = open
reading frame, a
short fragment of
DNA that is
translated into
RNA message
Manual curation
Wet Lab
Literature Review
Organism-Specific Model Construction:
Complexity of Metabolic NetworksBiodegradation of
Xenobiotics
Metabolism of
Complex Carbohydrates
Nucleotide
MetabolismMetabolism of
Complex Lipids
Carbohydrate
Metabolism
Metabolism of
Other Amino Acids
Amino Acid
Metabolism
Lipid
Metabolism
Metabolism of
Cofactors and VitaminsEnergy
Metabolism
Biosynthesis of
Secondary Metabolites
Glucose
Glc-6-P
Fru-6-P
“Small” E. coli Model (<100 reactions)
Central Metabolism in E. coli( http://gcrg.ucsd.edu/downloads/PathwayFBA/default.htm)
Glucose → G6P
Biomass Composition
Amino acids
Fatty Acids
Glycogen
Energy metabolites (i.e., ATP, GTP)
Succinyl CoA, Acetyl CoA
Biomass
Biomass formation modeled as:
∑(x)(metabolite) → Biomass + ∑ (y)(metabolite)
ADP
Phosphate
Pyro-phosphate
Protons
+
Mass Balances in Metabolic Networks
Reaction network:
2B C
A B
v1
v2
Stoichiometric matrix: A
B
C
v1 v2
-1 0
1 -2
0 1
Mass balance:
dt= v1 – 2 v2
d[B]
Steady-state assumption: ∑=j
jijvS0
i = set of metabolites (chemical species)
j = set of reactions
mmol B
gDW⋅⋅⋅⋅hrv ≡≡≡≡
Metabolic Flux:
Biomass Composition
Flux Balance Analysis
4D
3C
2B
(mmol / gDW)
4D
3C
2B
(mmol /
AAxt B C
D
Dxt
biomassbiomass
biomass
bA
bD
v1 v2
v3
vbio vbio
vbio
“Cell boundary”
AAxt B C
D
Dxt
biomassbiomass
biomass
bA
bD
v1 v2
v3
vbio vbio
vbio
“Cell boundary”
Given:
(1) Stoichiometry of the network
(2) Cellular composition information
(3) Substrate uptake rate
bA = mmol
gDW*hr-10
1000
-4100
-3010
-2-1-11
000-1
1000
-100
-010
--1-11
000-1
=
Optimization Model:
Maximize vbio
subject to
0
0
-10
0
0
-10
5 unknowns > 4 equations
vbio
v1
v2v3
bD
v1,v2,v3,vbio,bD > 0
D
3.33310
D
AAxt B C
D
biomassbiomass
biomass
“Cell boundary”
AAxt B C
Dxt
biomassbiomass
biomass
10
0
4.444
“Cell boundary”
Growth rate: = 1.111 hr-1Model Predictions:
vbio
1.111
1.111
1.111
Units of flux: mmol/(gDW*hr)
Presentation Outline� Systems biology and the constraints-based modeling approach
� Pathway discovery and optimization
� Constraining allowable cellular behavior
How can we identify multiple metabolic manipulations for producing a
desired product and also computationally evaluate of the consequences of
potential network modifications ?
� Metabolic network structural and topological analysis
How can we systematically select the appropriate set of pathways/genes
to recombine into existing production systems?
Desired
Phenotype
E. coli metabolic
capabilities
How can we identify gene knockouts that will force biochemical
overproduction by coupling it with cell growth?
E. coli metabolic
capabilities
Expanding the CapabilitiesE. coli Stoichiometric Models:
• Pramanik & Keasling (1997)
(300 reactions, 289 metabolites)
• Edwards & Palsson (2000)
(627 reactions, 438 metabolites)
• Reed, Vo, Schilling & Palsson (2003)
(931 reactions, 625 metabolites)
+
5,700
reactions
from KEGG
and MetaCyc
databases
Multi-organism
Reaction
Network
• Burgard, A.P. and C.D. Maranas (2001), “Probing the Performance Limits of the Escherichia coli Metabolic Network Subject to
Gene Additions or Deletions,” Biotechnology and Bioengineering, 74, 364-375.
Pharkya, P., Burgard, A.P and C.D. Maranas (2004), “OptStrain: A Computational Framewrok for Redesign of Microbial
Production Systems,” Genome Research, 14, 2367-2376.
Desired
Phenotype
E. coli metabolic
capabilities
~ 1000 reactions
KEGG database
~ 6000 reactions
Gene Addition Study
PEP
G6P
F16P
F6P
13P2DG
3PDGL
2PDGL
PEP
PYR
OA
MAL
SUCCOACIT
AKG
GAP
ICIT
DHAP
GLC
PYRD6PGL RL5P
X5P
S7P
E4P F6P
GAP
R5P
D6PGC
FUM
ACCOA
SUCC
ACTP
AC
LAC ETH
PEP
G6P
F16P
F6P
13P2DG
3PDGL
2PDGL
PEP
PYR
OA
MAL
SUCCOACIT
AKG
GAP
ICIT
DHAP
GLC
PYRD6PGL RL5P
X5P
S7P
E4P F6P
GAP
R5P
D6PGC
FUM
ACCOA
SUCC
ACTP
AC
LAC ETH
GL3P
Glycerol
3-HP
1,3 PD
IPP
DMAPPPYR
Non-mevalonate pathway
MEV
Mevalonate
pathway
Useful in Apparels, Upholstries, etc.
Precursors to anticancer
and antimalarial drugs
Mathematical Framework∑
j
jjvc
0=∑j
jijvS
=0
1jy
if reaction flux switched “on”
if reaction flux switched “off”
Maximize
Subject to
jjj yv ⋅≤≤ max0 ν
∑=
reactionsnonEcolij
jy
jjj yv ⋅≤≤ max0 ν
Procedure: 1) Find maximum theoretical yield using all reactions in
multi-organism reaction network
2) Find minimum number of non-E. coli reactions necessary to
achieve maximum yield
0=∑j
jijvS
Yield = maximum theoretical
Minimize
Subject to
1) Pathway Discovery
Example: Glucose to 1,3-Propanediol using KEGG database
Typical Shortest Path Result
D-Glucose
Glycerol
D-Fructose
D-Fructose-1-phosphate
D-glyceraldehyde
1,3-Propanediol
3-Hydroxypropanal
Balanced Shortest Path Result
at Max. Yield
D-Glucose
Glycerol
D-Fructose
D-Fructose-1-phosphate
Dihydroxyacetone phosphate D-glyceraldehyde
1,3-Propanediol
3-Hydroxypropanal
sn-Glycerol-3-phosphate
Max. Yield: 2 mol 13PD
1 mol GLC
Net Cost: 2 NADH
2 NADPH
� Branched pathways
� Energy/cofactor requirements
� Maximum yield information
Key Advantages
ATP
ADP
NADPH
NADP
ATPADP
NADH
NAD
NADPH
NADP
1) Pathway Discovery
Example: Glucose to 1,3-Propanediol using KEGG database
Typical Shortest Path Result
D-Glucose
Glycerol
D-Fructose
D-Fructose-1-phosphate
D-glyceraldehyde
1,3-Propanediol
3-Hydroxypropanal
Alternative Pathway
D-Glucose
Glycerol-3-phosphate
D-Glucose-6-Phosphate
D-Fructose-6-phosphate
Dihydroxyacetone phosphate
1,3-Propanediol
3-Hydroxypropanal
D-Glyceraldehyde-3-phosphate
Max. Yield: 2 mol 13PD
1 mol GLC
Net Cost: 2 ATP
2 NADH
� Branched pathways
� Energy/cofactor requirements
� Maximum yield information
Key Advantages
ATP
ADP
NADH
NAD
NADH
NAD
D-Fructose-1,6-phosphate
Glycerol
ATP
ADP
Presentation Outline� Systems biology and the constraints-based modeling approach
� Pathway discovery and optimization
� Constraining allowable cellular behavior
How can we identify multiple metabolic manipulations for producing a
desired product and also computationally evaluate of the consequences of
potential network modifications ?
� Metabolic network structural and topological analysis
How can we systematically select the appropriate set of pathways/genes
to recombine into existing production systems?
Desired
Phenotype
E. coli metabolic
capabilities
How can we identify gene knockouts that will force biochemical
overproduction by coupling it with cell growth?
Desired
Phenotype
E. coli metabolic
capabilities
E. coli metabolic
capabilities
Motivating Challenge
Maximum
Theoretical Yields
(mol/mol glucose)
1.71
2.00
Acetate
Succinate
Ethanol
2.00Lactate
2.78Formate
2.40
Experimental
Yields
(mol/mol glucose)
0.34
0.84
Acetate
Succinate
Ethanol
0.10Lactate
1.16Formate
0.81
(Stokes, J. Bacteriology, 1949)
Cellular
ObjectiveBioengineering
Objective(e.g.,
Max. biomass production
Min. metabolic adjustment
Max. ATP production, etc.)
(e.g.,
Max. 1,3-propanediol yield
Max. lactate yield, etc.)
Motivating Challenge
Maximum
Theoretical Yields
(mol/mol glucose)
1.71
2.00
Acetate
Succinate
Ethanol
2.00Lactate
2.78Formate
2.40
Experimental
Yields
(mol/mol glucose)
0.34
0.84
Acetate
Succinate
Ethanol
0.10Lactate
1.16Formate
0.81
(Stokes, J. Bacteriology, 1949)
Cellular
Objective
Bioengineering
Objective
??
(e.g.,
Max. biomass production
Min. metabolic adjustment
Max. ATP production, etc.)
(e.g.,
Max. 1,3-propanediol yield
Max. lactate yield, etc.)
OptKnock Bilevel Optimization Framework
Inner Problem:adjust reaction fluxes
� optimize cellular objective
− Max. biomass yield
− Min. metabolic adjustment
− Max. ATP yield
Maximize
s.t.
Biomass Yield
� Fixed substrate uptake rate
� Network connectivity
(over fluxes)
s.t.
MaximizeOuter Problem:adjust knockouts
� optimize bioeng. objective
− Max. 1,3-propanediol yield
− Max. lactate yield
Biochemical Yield
� Blocked reactions identified
by outer problem
�Minimum biomass yield
� # Knockouts ≤≤≤≤ limit
(over gene knockouts)
• Burgard, A.P., Pharkya, P., and C.D. Maranas (2003), “OptKnock: A bilevel programming framework for identifying gene
knockout strategies for microbial strain optimization,” Biotechnology and Bioengineering, 84, 647-657.
• Pharkya, P., Burgard, A.P., and C.D. Maranas (2003), “Exploring the overproduction of amino acids using the bilevel
optimization framework OptKnock,” Biotechnology and Bioengineering, 84, 887-899.
PRIMAL (Inner problem)
subject to
LP Duality Theory
uptakevGLC =
BiomassPRIMAL vZ =Maximize
0=∑j
jijvS
0≥jv
ZDUAL
ZPRIMAL
Optimal solution
if and only if:
ZDUAL = ZPRIMAL
vj
DUAL
guptakeZDUAL ⋅=Minimize
subject to
ui, g
ui
g
multipliers
0, =+∑ gSui
GLCii
1, ≥∑i
BiomassiiSu
0≥∑i
ijiSu
PRIMAL (Inner problem)
subject to
LP Duality Theory
0, =+∑ gSui
GLCii
DUAL
uptakevGLC =
BiomassPRIMAL vZ =Maximize
0=∑j
jijvS
0≥jv
guptakeZDUAL ⋅=Minimize
1, ≥∑i
BiomassiiSu
subject to
ZDUAL
ZPRIMAL
0≥∑i
ijiSu
=⋅ guptake Biomassv
Maximize Productvyj, ui, g, vj
subject to
{ }1,0∈jy
0, =+∑ gSui
GLCii
1, ≥∑i
BiomassiiSu
0≥∑i
ijiSu
uptakevGLC =
0=∑j
jijvS
ui, g
vj
∑ ≤−j
jy )1(
jjj yv ⋅≤≤ max0 ν
Dual
Primal
MILP problem
# of knockouts
Optimal Gene Knockout Identification
Case Studies:
(1) 1,3 Propanediol (2) Lactate
Questions:
� Identify single, double, triple, and quadruple knockout strategies?
� Characterize allowable envelope of biomass vs. biochemical production?
?
?
?
Growth
Biochemical
Production
1,3 PD Overproduction
PEP
G6P
F16P
F6P
13P2DG
3PDGL
2PDGL
PEP
PYR
OA
MAL
SUCCOACIT
AKG
GAP
ICIT
DHAP
GLC
PYRD6PGL RL5P
X5P
S7P
E4P F6P
GAP
R5P
D6PGC
FUM
ACCOA
SUCC
ACTP
AC
LAC ETH
PEP
G6P
F16P
F6P
13P2DG
3PDGL
2PDGL
PEP
PYR
OA
MAL
SUCCOACIT
AKG
GAP
ICIT
DHAP
GLC
PYRD6PGL RL5P
X5P
S7P
E4P F6P
GAP
R5P
D6PGC
FUM
ACCOA
SUCC
ACTP
AC
LAC ETH
0
2
4
6
8
10
12
14
0 0.2 0.4 0.6 0.8 1 1.2
Growth Rate (1/hr)
1,3 PD Production Limits
(mmol/hr)
Complete E. coli network
Basis: 10 mmol/hr glucose, 1 gDW cells
Maximum Growth:
E. coli (wild type)
GL3P
Non-E. colireaction
Glycerol
GPP1& 2
3-HP
dhaB
1,3 PD
dhaT
Non-E. coli genes/enzymes: GPP1&2: glycerol-3-phosphatase
dhaB: glycerol dehydratase
dhaT: 1,3 PD oxidoreductase
Klebsiella pneumoniae
Klebsiella pneumoniae
Saccharomyces cerevisiae
PEP
G6P
F16P
F6P
13P2DG
3PDGL
2PDGL
PEP
PYR
OA
MAL
SUCCOACIT
AKG
GAP
ICIT
DHAP
GLC
PYRD6PGL RL5P
X5P
S7P
E4P F6P
GAP
R5P
D6PGC
FUM
ACCOA
SUCC
ACTP
AC
LAC ETH
PEP
G6P
F16P
F6P
13P2DG
3PDGL
2PDGL
PEP
PYR
OA
MAL
SUCCOACIT
AKG
GAP
ICIT
DHAP
GLC
PYRD6PGL RL5P
X5P
S7P
E4P F6P
GAP
R5P
D6PGC
FUM
ACCOA
SUCC
ACTP
AC
LAC ETH
1,3 PD Overproducing Mutants
GLAL
Basis: 10 mmol/hr glucose, 1 gDW cells
Maximum Biomass :
1,3 PD:1.05 hr-1
0.00 mmol/hr“Wild” type:
Maximum Biomass :
1,3 PD:0.21 hr-1
9.66 mmol/hrMutant A:
Mutant A:
(3) Fructose-1,6-bisphosphatase (fbp) or Fructose-1,6-bisphosphate aldolase (fba)
(2) Phosphoglycerate kinase (pgk) or Glyceraldehyde-3-phosphate dehydrogenase (gapA, gapC1C2)
(1) Aldehyde dehydrogenase (adhC)
GL3P
Non-E. colireaction
Glycerol
2KD6PG
Entner-Doudoroff
3-HP
1,3 PD
PEP
G6P
F16P
F6P
13P2DG
3PDGL
2PDGL
PEP
PYR
OA
MAL
SUCCOACIT
AKG
GAP
ICIT
DHAP
GLC
PYRD6PGL RL5P
X5P
S7P
E4P F6P
GAP
R5P
D6PGC
FUM
ACCOA
SUCC
ACTP
AC
LAC ETH
PEP
G6P
F16P
F6P
13P2DG
3PDGL
2PDGL
PEP
PYR
OA
MAL
SUCCOACIT
AKG
GAP
ICIT
DHAP
GLC
PYRD6PGL RL5P
X5P
S7P
E4P F6P
GAP
R5P
D6PGC
FUM
ACCOA
SUCC
ACTP
AC
LAC ETH
1,3 PD Overproducing Mutants
Non-E. colireaction
Maximum Biomass :
1,3 PD :
0.11 hr-1
9.84 mmol/hrMutant C:
Maximum Biomass :
1,3 PD :
0.14 hr-1
9.78 mmol/hr
Mutant D:
Maximum Biomass :
1,3 PD :
0.16 hr-1
9.75 mmol/hrMutant E:
GLAL
DR5P
ACAL
Maximum Biomass :
1,3 PD :
0.29 hr-1
9.67 mmol/hrMutant B:
Mutant B:(2) Triose phosphate isomerase (tpiA)
(3) Glucose 6-phosphate-1-dehydrogenase (zwf) or 6-Phosphogluconolactonase (pgl)
(1) Aldehyde dehydrogenase (adhC)
(4) Deoxyribose-phosphate aldolase (deoC)
Maximum Biomass :
1,3 PD :0.21 hr-1
9.66 mmol/hrMutant A:
Basis: 10 mmol/hr glucose, 1 gDW cells
Maximum Biomass :
1,3 PD :1.05 hr-1
0.00 mmol/hr“Wild” type:
GL3P
Glycerol
3-HP
1,3 PD
0
2
4
6
8
10
12
14
0 0.2 0.4 0.6 0.8 1 1.2
Growth Rate (1/hr)
1,3 PD Production Limits
(mmol/hr)
Complete E. coli network
1,3 PD Mutant CharacterizationBasis: 10 mmol/hr glucose, 1 gDW cells
Maximum Growth:
E. coli (wild type)
Mutant A
Mutant B
Mutant C
Mutant D
Mutant E
Mutant A
Mutant B
Lactate Mutant Experimentation
60 days ~ 500 generations
Three designs constructed:
Blattner Lab: Strain Construction
Palsson Lab: Adaptive Evolution
1) pta-adhE (5 strains evolved)
2) pta-pfk (3 strains evolved)
3) pta-adhE-pfk-glk (3 strains evolved)
Increased glucose uptake rates or increased lactate yields ?
0
5
10
15
20
25
30
35
40
0 0.1 0.2 0.3
Growth rate (1/hr)
Lactate Secretion R
ate (mmol/g-
DW.hr)
pta-pfk SB1
pta-pfk SB2
pta-pfk SB3
pta-adhE-pfk-glk SB1
pta-adhE-pfk-glk SB2
pta-adhE-pfk-glk SB3
pta-adhe SB1
pta-adhE SB2
pta-adhE SB3
pta-adhE SB4
pta-adhE SB5
Presentation Outline� Systems biology and the constraints-based modeling approach
� Pathway discovery and optimization
� Constraining allowable cellular behavior
How can we identify multiple metabolic manipulations for producing a
desired product and also computationally evaluate of the consequences of
potential network modifications ?
� Metabolic network structural and topological analysis
How can we systematically select the appropriate set of pathways/genes
to recombine into existing production systems?
Desired
Phenotype
E. coli metabolic
capabilities
How can we identify gene knockouts that will force biochemical
overproduction by coupling it with cell growth?
Desired
Phenotype
E. coli metabolic
capabilities
E. coli metabolic
capabilities
Application to genome-scale models:
1. Helicobacter pylori 389 reactions
2. Escherichia coli 627 reactions
3. Saccharomyces cerevisiae 1173 reactions
(Schilling et al., J Bacteriol, 2002)
(Edwards and Palsson, PNAS, 2000)
(Forster et al., Genome Res, 2003)
G6P
F16P
F6P
13P2DG
3PG
2PG
PEP
PYR
GAPDHAP
GLC
v1
v2
Steady-state Flux Coupling Analysis
Identify sets of coupled reactions, equivalent knockouts, and
affected reactions in genome-scale stoichiometric models.
Objective:
How are the two fluxes v1 and v2 related?
1. Does v1 imply v2 ?
2. Do v1 and v2 imply each other?
3. Are v1 and v2 completely uncoupled?
Questions:Coupled Reactions
Equivalent Knockouts
Which reactions could alternatively be deleted to force the flux
through a particular reaction to zero?
Affected Reactions
Which reaction fluxes will be forced to zero if a particular reaction
is removed from the network?
• Burgard, A.P., Nikolaev, E.V., and C.D. Maranas (2004), “Flux coupling analysis of genome-scale metabolic network
reconstructions ,” Genome Research, 14, 301-312.
max v1 / v2 = Rmax
min v1 / v2 = Rmin
Rmin= 0 Rmax= ∞Uncoupled:
Rmin= c1 Rmax= c2
Partially Coupled:
v1 ↔ v2
0∞
Rmin= Rmax= c
Fully Coupled:
v1 ⇔ v2
0∞
Rmin ≤ ≤ Rmaxv1
v2
Identifying Coupled Reactions
Rmin= 0 Rmax= c
Directionally Coupled:
v1 → v2 ∞
Directionally Coupled: v2 → v1
Rmin= c Rmax= ∞0
Potential Flux Ratio Outcomes:
12 =v)
Fractional programming formulation:
To find the presence and directionality of coupling between two fluxes v1 and v2:
transformation
Identifying Coupled Reactions
maximize (or minimize) v1 / v2
01
=∑=
M
j
jijvS
0≥jv
subject to
vuptake ≤ vuptake_max
vj / v2
1v)
01
=∑=
M
j
jijvS)
12 =v)
maximize (or minimize)
subject to
uptakev)
≤ vuptake_max⋅t
0≥jv)
t ≥ 0
Linear formulation: where =jv)
� Network size ↑ % Coupled reactions ↓
� Presence of Large coupled
biomass reaction reaction sets
� Conditions only slightly affect coupling.
19 sets of 2 reactions complex
media
glucose
minimal
complex
media
glucose
minimal
complex
media
glucose
minimal
complex
media
glucose
minimal
complex
media
glucose
minimal
complex
media
glucose
minimal
19 (2) 15 (2) 25 (2) 23 (2) 45 (2) 33 (2) 48 (2) 37 (2) 51 (2) 45 (2) 51 (2) 46 (2)
8 (3) 7 (3) 10 (3) 9 (3) 9 (3) 6 (3) 9 (3) 9 (3) 13 (3) 11 (3) 14 (3) 14 (3)
2 (4) 1 (5) 3 (4) 2 (4) 4 (4) 1 (4) 4 (4) 3 (4) 6 (4) 5 (4) 7 (4) 5 (4)
1 (6) 1 (7) 1 (5) 2 (5) 3 (5) 1 (5) 5 (5) 5 (5) 2 (5) 3 (5) 2 (5) 3 (5)
2 (7) 1 (10) 2 (6) 3 (6) 2 (6) 3 (7) 2 (6) 2 (6) 2 (6) 2 (6) 2 (6) 2 (6)
1 (10) 1 (174) 3 (7) 2 (7) 1 (7) 1 (10) 2 (7) 2 (7) 1 (7) 1 (7) 1 (7) 1 (7)
1 (148) 1 (8) 1 (8) 1 (8) 1 (112) 1 (8) 1 (8) 2 (8) 2 (8) 2 (8) 2 (8)
1 (9) 1 (9) 2 (9) 3 (9) 3 (9) 1 (9) 1 (9) 1 (9) 1 (9)
4 (10) 4 (10) 1 (66) 1 (10) 1 (10) 1 (12) 1 (12) 1 (12) 1 (12)
1 (13) 1 (13) 1 (17) 1 (17) 1 (30) 1 (34) 1 (17) 1 (17)
1 (20) 1 (20)
total reactions
in subsets:248 247 220 213 259 236 252 226 261 248 255 242
total subsets: 34 26 52 49 68 46 76 64 80 72 82 76
H. pylori E. coli S. cerevisiaebiomass reaction no biomass reaction biomass reaction no biomass reaction biomass reaction no biomass reaction
Coupled to
biomass formation
→
Reaction Coupling Statistics
Com
plex (B
io)
Glc (B
io)
Com
plex
(No Bio)
Glc (N
o Bio)
H. pylori
E. coli
S. cerevisiae
0%
10%
20%
30%
40%
50%
60%
70%
% of Reactions in Coupled Sets
H. pylori
E. coli
S. cerevisiae
Directional Coupling
v1
v2
v3
v*
Reactions “affected” by v*
Corresponding flux ratio outcomes,
v1,2, or 3 ≤≤≤≤ c⋅⋅⋅⋅v*
v4
v5
v6
v* ≤≤≤≤ c⋅⋅⋅⋅v4,5, or 6
“Equivalent KO’s” for v*
Equivalent Knockouts vs. Affected Reactions
E. coli Central Metabolic Network Coupling
Growth on glucose minimal media
Steady-state conditions
� The forward and backward directions of the major
metabolic pathways show significant internal coupling.
(1) glycolysis
(2) pentose phosphate pathway
(3) TCA cycle
� Anaplerotic and respiration reactions are uncoupled
from the rest of central metabolism.
� The major pathways above are uncoupled from one
another.
Nodes: Reactions
Arcs: Couplings
1
10
100
1 10 100
k
N(k)
1
10
100
1 10 100
k
N(k)
Helicobacter pylori Escherichia coli Saccharomyces cerevisiae
N(k) = 94.6 k -1.154
R2 = 0.707
N(k) = Number of reactions implying k reactions
k = Number of reactions
N(k) = 167.7 k -1.436
R2 = 0.848
N(k) = 323.7 k -1.624
R2 = 0.864
vj
vj’
vj’’
vj’’’
k = 3 rxns
1
10
100
1000
1 10 100
k
N(k)
Ex.
Scale-free Nature of Directional Coupling
Scale Free Architectures
Metabolite Centered Graphs
Reaction Flux Centered Graphs (Burgard, et. al., Genome Res., 2004)
(Jeong et al., Nature, 2000)(Wagner and Fell, Proc. R. Soc. Lond. B. Biol. Sci., 2001)
“Gaps” in metabolic networks
http://doegenomestolife.org
• Genes not annotated
• Genes not assigned function
• Incomplete curation of models
Bridging of gaps required !
Church et al., 2004, BMC Bioinformatics
Green and Karp, 2004, Bioinformatics
A B
Stoichiometric Metabolic reconstructions are
inherently incompleteChurch et al. BMC Bioinformatics (2004)
Green & Karp, Bioinformatics (2004)
Gaps in Metabolic Pathways
Missing biochemical
reactions
What are Gaps ? Causes for Gaps ?
• Gene not annotated
• Function not assigned
• Incomplete curation
Bridging Gaps in Metabolic pathways
STEP 1: Compile candidate reaction database
MetaCyc
KEGG
(~ 3,900 reactions)
G6P
F16P
F6P
T3P1T3P2
GLCPEP
PYRD6PGL RL5P
X5P
S7P
E4P F6P
T3P1
R5P
D6PGC
pts
pgi
pfkAB fbp
fba
tpiA
tktAB
zwf pgl gnd
rpiABrpe
tktAB
talB
Database Compilation
Database Curation
� Eliminate compounds with no chemical formulae from the compound list and
remove the corresponding reactions e.g., (1-4-beta-D-Glucan)(M+N)
� Link the compounds in individual stoichiometric models with those in the
compiled database and assign unique ids e.g., Urea MetaCyc compound id: 4619 CID:123
� Link the reactions in individual metabolic models with those in the
database and assign unique ids
Development of experimental and computational tools to evaluate metabolic flux
Labeled
Isotopes
GLX
PEP
G6P
F16P
F6P
13P2DG
3PG
2PG
PEP
PYR
OA
MAL
SUCCOACIT
AKG
GAP
ICIT
DHAP
GLC
PYRD6PGL
FUM
ACCOA
SUCC
ACT
AC
RL5P
X5P
S7P
E4P F6P
GAP
R5P
D6PGC
ATP H+ QH2
NADHNADPH FADH
10.0
7.1 2.9 2.9 2.9
8.5
8.5
17.5
17.5
16.5
16.5
12.7
10.9
8.9
8.98
8
8
88
7.2
9.76.7
0.9
0.60.9
� Isotopomer analysis using GC/MS(Christensen & Neilsen, 1999;Fischer & Sauer, 2003)
� Isotopomer analysis using NMR spectra(Schmidt et.al., 1999)
� Computational models for flux elucidation(Zupke at al. , 1994;Wiechert & Graff, 1996; Wiechert et.al. 1996;Mollney et.al. 1999 )
� Optimization algorithms(Ghosh et.al.2004; Riascos et.al.2004; Phalakornkule et.al. 2001)
GC-MS
NMR
PEP
G6P
F16P
F6P
13P2DG
3PDGL
2PDGL
PEP
PYR
OA
MAL
SUCCOACIT
AKG
GAP
ICIT
DHAP
GLC
PYRD6PGL RL5P
X5P
S7P
E4P F6P
GAP
R5P
D6PGC
FUM
ACCOA
SUCC
ACTP
AC
LAC ETH
PEP
G6P
F16P
F6P
13P2DG
3PDGL
2PDGL
PEP
PYR
OA
MAL
SUCCOACIT
AKG
GAP
ICIT
DHAP
GLC
PYRD6PGL RL5P
X5P
S7P
E4P F6P
GAP
R5P
D6PGC
FUM
ACCOA
SUCC
ACTP
AC
LAC ETHAlac-S
Leucine
Valine
Pro-LGlu-L
Gln-L
Acglu
Arginine
Methionine
Proline
Glx
4pasp Asp-L
Asn-LAspsaHom-L
Phom
Threonine
Lysine
3php
Serine
Glycine
Cystenine
Pser-L
Acser
co2
Limitations� Employed metabolic models are of limited scope
(i.e. 30-50 rxns)� For genome-scale models (~1,000 rxns), measurables do not
always elucidate unique flux distributions
2dda7p
pphn
34hpp
Tyrosine
Phpyr
Phenyl
alaninePrpp
Pran
Tyrphtophan
Many relevant pathways
are absent…
PEP
G6P
F16P
F6P
13P2DG
3PDGL
2PDGL
PEP
PYR
OA
MAL
SUCCOACIT
AKG
GAP
ICIT
DHAP
GLC
PYRD6PGL RL5P
X5P
S7P
E4P F6P
GAP
R5P
D6PGC
FUM
ACCOA
SUCC
ACTP
AC
LAC ETH
PEP
G6P
F16P
F6P
13P2DG
3PDGL
2PDGL
PEP
PYR
OA
MAL
SUCCOACIT
AKG
GAP
ICIT
DHAP
GLC
PYRD6PGL RL5P
X5P
S7P
E4P F6P
GAP
R5P
D6PGC
FUM
ACCOA
SUCC
ACTP
AC
LAC ETH
Wet-lab experimental system (Keasling lab)
� E. coli strain that produces amorphadiene
(precursor to anti-malarial drug artemisinin)
� Measure labeling of 14 amino acids
� Average standard deviation of measurements is 3%
� Includes 353 reactions and 184 metabolites
� Yields 15,500 isotopes (Iik)
� Balances cofactors such as ATP, NADH, NADPH
� Weights measurements by their standard deviation
Isotopomer mapping model (Burgard & Van Dien)
-10
-8
-6
-4
-2
0
2
4
6
0 20 40 60 80 100 120 140 160
Calculated labeling matches well with those measured
� Average of 1.4% difference between calculated
labeling and measured labeling patterns
average percent deviation of
measurements (3 %)
observation #
percent difference
average difference between measurements and model predictions
(1.4 %)
Networks…
Signaling NetworksMetabolic Networks
http://doegenomestolife.org
Publications (fenske.che.psu.edu/faculty/cmaranas)Strain design
• Pharkya, P. and C.D. Maranas (2005), “An Optimization Framework for Identifying Reaction
Activation/Inhibition or Elimination Candidates for Overproduction in Microbial Systems”, submitted.
• Pharkya, P., A.P. Burgard and C.D. Maranas (2004), “OptStrain: A Computational Framework for Redesign
of Microbial Production Systems”, Genome Research, 14, 2367-2376.
• Pharkya, P., A.P. Burgard and C.D. Maranas (2003), “Exploring the Overproduction of Amino Acid Using the
Bilevel Optimization Framework OptKnock,” Biotechnology and Bioengineering, 84, 887-899.
• Burgard, A.P., P. Pharkya and C.D. Maranas (2003), “OptKnock: A Bilevel Programming Framework for
Identifying Gene Knockout Strategies for Microbial Strain Optimization,” Biotechnology and Bioengineering,
84, 647-657
Metabolite flux and concentration coupling analysis
• Nikolaev, E.V., A.P. Burgard, and C.D. Maranas (2005), "Elucidation and Structural Analysis of Conserved
Pools for Genome-Scale Metabolic Reconstructions," Biophysical Journal, 88, 37-49.
• Burgard, A.P.†, E.V. Nikolaev†, C.H. Schilling and C.D. Maranas (2004), “Flux Coupling Analysis of Genome-
scale Metabolic Network Reconstructions,” Genome Research, 14, 301-312
Inferring and Testing Metabolic Objective Functions
• Burgard, A.P. and C.D. Maranas (2002), “An Optimization-based Framework for Inferring and Testing
Hypothesized Metabolic Objective Functions,” Biotechnology and Bioengineering, 82, 670-677.
Minimal Reaction Set Identification
• Burgard, A.P., S. Vaidyaraman and C.D. Maranas (2001), “Minimal Reaction Sets for Escherichia coli
Metabolism under Different Growth Requirements and Uptake Environments,” Biotech. Progress,17, 791-797.
Pathway discovery and optimization
•Burgard, A.P. and C.D. Maranas (2001), “Probing the Performance Limits of the Escherichia coli Metabolic Network Subject to Gene Additions or Deletions,” Biotechnology and Bioengineering, 74, 364-375.
Signaling networks
• Dasika, M., Burgard, A.P. and C.D. Maranas (2006), "A computational framework for topological analysis and targeted disruption of signal transduction networks" Biophysical Journal, in press.
Acknowledgements
Graduate Researchers:
Post-Doctoral Researcher:
Collaborators:
Funding Sources:
Tony Burgard
Priti Pharkya
Madhukar Dasika
Vinay Satish Kumar
Evgeni Nikolaev
Patrick Suthers
Bernhard Palsson
Fred Blattner
National Science Foundation and Department of Energy
University of Wisconsin
University of California, San Diego