bmc systems biology biomed central - springer...structured (bigg) 'databases'. we employ...

20
BioMed Central Page 1 of 20 (page number not for citation purposes) BMC Systems Biology Open Access Research article Investigating the metabolic capabilities of Mycobacterium tuberculosis H37Rv using the in silico strain iNJ661 and proposing alternative drug targets Neema Jamshidi and Bernhard Ø Palsson* Address: Department of Bioengineering, University of California, San Diego, La Jolla, CA, 92093-0412, USA. Email: Neema Jamshidi - [email protected]; Bernhard Ø Palsson* - [email protected] * Corresponding author Abstract Background: Mycobacterium tuberculosis continues to be a major pathogen in the third world, killing almost 2 million people a year by the most recent estimates. Even in industrialized countries, the emergence of multi-drug resistant (MDR) strains of tuberculosis hails the need to develop additional medications for treatment. Many of the drugs used for treatment of tuberculosis target metabolic enzymes. Genome-scale models can be used for analysis, discovery, and as hypothesis generating tools, which will hopefully assist the rational drug development process. These models need to be able to assimilate data from large datasets and analyze them. Results: We completed a bottom up reconstruction of the metabolic network of Mycobacterium tuberculosis H37Rv. This functional in silico bacterium, iNJ661, contains 661 genes and 939 reactions and can produce many of the complex compounds characteristic to tuberculosis, such as mycolic acids and mycocerosates. We grew this bacterium in silico on various media, analyzed the model in the context of multiple high-throughput data sets, and finally we analyzed the network in an 'unbiased' manner by calculating the Hard Coupled Reaction (HCR) sets, groups of reactions that are forced to operate in unison due to mass conservation and connectivity constraints. Conclusion: Although we observed growth rates comparable to experimental observations (doubling times ranging from about 12 to 24 hours) in different media, comparisons of gene essentiality with experimental data were less encouraging (generally about 55%). The reasons for the often conflicting results were multi-fold, including gene expression variability under different conditions and lack of complete biological knowledge. Some of the inconsistencies between in vitro and in silico or in vivo and in silico results highlight specific loci that are worth further experimental investigations. Finally, by considering the HCR sets in the context of known drug targets for tuberculosis treatment we proposed new alternative, but equivalent drug targets. Background Tuberculosis continues to be a devastating pathogen throughout the world, particularly in developing nations. In 2001, the World Health Organization (WHO) esti- mated 8.5 million new cases of tuberculosis (based on 3.8 million new reported cases) and an estimated 1.8 million deaths from tuberculosis in 2000 [1]. Within the United States, the number of reported cases of tuberculosis has Published: 8 June 2007 BMC Systems Biology 2007, 1:26 doi:10.1186/1752-0509-1-26 Received: 6 March 2007 Accepted: 8 June 2007 This article is available from: http://www.biomedcentral.com/1752-0509/1/26 © 2007 Jamshidi and Palsson; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Upload: others

Post on 19-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BioMed CentralBMC Systems Biology

ss

Open AcceResearch articleInvestigating the metabolic capabilities of Mycobacterium tuberculosis H37Rv using the in silico strain iNJ661 and proposing alternative drug targetsNeema Jamshidi and Bernhard Ø Palsson*

Address: Department of Bioengineering, University of California, San Diego, La Jolla, CA, 92093-0412, USA.

Email: Neema Jamshidi - [email protected]; Bernhard Ø Palsson* - [email protected]

* Corresponding author

AbstractBackground: Mycobacterium tuberculosis continues to be a major pathogen in the third world,killing almost 2 million people a year by the most recent estimates. Even in industrialized countries,the emergence of multi-drug resistant (MDR) strains of tuberculosis hails the need to developadditional medications for treatment. Many of the drugs used for treatment of tuberculosis targetmetabolic enzymes. Genome-scale models can be used for analysis, discovery, and as hypothesisgenerating tools, which will hopefully assist the rational drug development process. These modelsneed to be able to assimilate data from large datasets and analyze them.

Results: We completed a bottom up reconstruction of the metabolic network of Mycobacteriumtuberculosis H37Rv. This functional in silico bacterium, iNJ661, contains 661 genes and 939 reactionsand can produce many of the complex compounds characteristic to tuberculosis, such as mycolicacids and mycocerosates. We grew this bacterium in silico on various media, analyzed the model inthe context of multiple high-throughput data sets, and finally we analyzed the network in an'unbiased' manner by calculating the Hard Coupled Reaction (HCR) sets, groups of reactions thatare forced to operate in unison due to mass conservation and connectivity constraints.

Conclusion: Although we observed growth rates comparable to experimental observations(doubling times ranging from about 12 to 24 hours) in different media, comparisons of geneessentiality with experimental data were less encouraging (generally about 55%). The reasons forthe often conflicting results were multi-fold, including gene expression variability under differentconditions and lack of complete biological knowledge. Some of the inconsistencies between in vitroand in silico or in vivo and in silico results highlight specific loci that are worth further experimentalinvestigations. Finally, by considering the HCR sets in the context of known drug targets fortuberculosis treatment we proposed new alternative, but equivalent drug targets.

BackgroundTuberculosis continues to be a devastating pathogenthroughout the world, particularly in developing nations.In 2001, the World Health Organization (WHO) esti-

mated 8.5 million new cases of tuberculosis (based on 3.8million new reported cases) and an estimated 1.8 milliondeaths from tuberculosis in 2000 [1]. Within the UnitedStates, the number of reported cases of tuberculosis has

Published: 8 June 2007

BMC Systems Biology 2007, 1:26 doi:10.1186/1752-0509-1-26

Received: 6 March 2007Accepted: 8 June 2007

This article is available from: http://www.biomedcentral.com/1752-0509/1/26

© 2007 Jamshidi and Palsson; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Page 1 of 20(page number not for citation purposes)

Page 2: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

been decreasing with the exception of a period when thetrend reversed in 1986 and peaked in 1992 [2,3]. Thisreversal has been attributed principally to HIV/AIDs,immigration from countries with high prevalence oftuberculosis, poverty, homelessness, and multi-drugresistant (MDR) tuberculosis [1,2]. MDR tuberculosis isgenerally defined as strains that are resistant to treatmentwith isoniazid and rifampin [4], two of the key first lineantituberculosis drugs [1]. MDR strains of tuberculosisemerged in the early 1990s and have now been found allover the world [4].

Many of the unique properties of tuberculosis are attribut-able to its metabolism, particularly the complex fatty acidscharacteristics of the organism. These mycolic acids, phe-nolic glycolipids, and mycoceric acids confer many of theproperties such as its acid-fastness and are believed to con-tribute to the resilience of the organism. Mycobacteriumtuberculosis can survive in a wide range of environments(many different tissues) and fairly extreme pHs [5]. Oneof the most confounding factors with these bacteria istheir ability to survive for long periods of time in a dor-mant stage. The slow doubling time of tuberculosis hasfurther limited the amount of experimental data that canbe generated. Many of the first and second line drugs usedto treat tuberculosis have metabolic targets, so developingsystems level models of metabolism are anticipated to beof great use in the future.

DNA sequencing of the ~4.4 Mbp genome of Mycobacte-rium tuberculosis H37Rv (M. tb) in 1998 [6] enabled theability the pursuit of genome-scale analyses of this micro-organism. The remarkable relevance to world health anddisease control and the need to understand the metabolicfunction of the organism all evoke the need of a genome-scale metabolic model. Long-term anticipated goals andapplications of such models are to understand the growthof mycobacteria under different conditions, identifyingstrategies to improve growth in vitro (for experimental anddiagnostic purposes), and identifying new drug targets fortreatment.

In order to gain understanding about the unique charac-teristics of this important pathogen, we manually recon-structed the metabolic network of M. tb in silico (iNJ661),from which we developed a model to compute performcomputational analyses and interpret experimental data.These bottom-up reconstructions have been described inthe past as, biochemically, genetically and genomicallystructured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGGreconstruction to learn about its normal metabolic func-tion and to infer new potential targets for drugs.

Results and discussionThe reconstruction process has been described previously[7,8], Figure 1 summarizes this process in brief. The net-work statistics for iNJ661, which has 661 genes and 939intra-system reactions, are summarized in Table 1. A bio-mass objective function was defined using available meas-urements of M. tb H37Rv and other mycobacteria strainsif information was lacking. The biomass objective func-tion was defined using the literature for chemical compo-sition studies of M. tb [9-13]. When such information wasnot found for the specific strain, Mycobacterium bovis wasused (for example to approximate the biomass composi-tion of nucleotides and peptidoglycans [14]). The simplefatty acids compositions were based on studies of themycobacterial cell wall [15]. The biomass functionincludes 90 metabolites in addition growth associatedmaintenance ATP costs (see Table 2). [See Additional file1] for more detailed information.

M. tb growth in silicoFlux Balance Analysis (FBA) was used to grow iNJ661 insilico maximizing the biomass function defined in Table 2(see Materials and Methods). iNJ661 was grown in silicoon three different types of media, Middlebrook 7H9 (sup-plemented with glucose and glycerol) [16], Youmans[17], and the chemically defined rich culture media(CAMR) [18]. Uptake fluxes which define the differentmedia conditions are shown in Table 3 and the computedresults under the different conditions are summarized inTable 4. The doubling times are within the rangedescribed in the literature [18,19], and as expected, thegrowth rates are higher with the richer media. The mini-mum doubling time of 14.7/hr described by [18] arewithin the capabilities of maximum growth of iNJ661.

As noted by Primm et al [20], M. tb growth on Middle-brook 7H9 media requires supplementation with glucoseand glycerol. Glycerol is a component of the biomassfunction (Table 2) according to the chemical compositionmeasurements performed by [14] on Mycobacteriumbovis. So in order to grow in silico, glycerol will eitherneed to be in the medium or the cell must have some wayof producing it from other precursors. iNJ661 was able togrow when glucose uptake was abrogated, however, italways required glycerol supplementation in the mediawhich could be due to the biomass requirement or due tothe need for glycerol for the production of a differentessential metabolite. The growth dependence on glyceroland glucose uptake can be seen in the phase plane dia-gram depicted in Figure 2a. In this figure, the biomassobjective function is optimized while the glucose andglycerol uptake fluxes are varied in different combina-tions. The resulting plot shows how the growth capabilityof iNJ661 varies as a function of glucose and glyceroluptake. The solid black lines depict isoclines, along which

Page 2 of 20(page number not for citation purposes)

Page 3: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

growth is constant [21]. Further investigations were car-ried out in order to understand the essentiality of glyceroland non-essentiality of glucose in iNJ661. Since flux dis-tributions calculated by FBA are non-unique, flux variabil-ity analysis (see Methods) can be used to calculate thespan of each flux while still achieving the maximum valuefor the objective function. Reactions with non-zero fluxes,in which the maximum flux equals the minimum flux(and is non-zero), are essential in achieving the particularobjective (no alternative pathways exist). Flux variability[22] at maximum growth, with Middlebrook media withglucose and glycerol supplementation (Table 2) was per-formed in order to identify the constraining flux(es) thatprevents growth on glucose but not glycerol. The onlynon-zero fluxes with no flexibility, directly involving themetabolism of glycerol, were the glycerol transporter(GLYCt) and glycerol kinase (Rv3696c, GlpK, GLYK). Theproduction of glycerol-3-phosphate via glycerol kinase isessential for membrane and fatty acid metabolism involv-ing fatty-acyl glycerol phosphates and interconversionsbetween CDP-diacyl-glycerol and phosphatidyl-glycerolphosphates. These results are consistent and expectedgiven the importance of membrane metabolism in therole of biomass production.

The non-zero fluxes with no flexibility directly involvingglucose were the glucose transporter (GLCabc, SugABC,Rv1236+Rv1237+Rv1238) and glucokinase (PPGKr,PpgK, Rv2702). Glucose-6-phosphate is subsequentlyessential for a number of other reactions involving variousaspects of membrane metabolism, including the synthesisof trehalose phosophate (TRE6PS, OtsA, Rv3490) andinositol-phosphate (MI1PS, Ino1, Rv0046c), in additionto phosphoglucomutase (PGMT, PgmA, Rv3068c). Thesethree reactions are required not only for optimal growth,but they are in fact essential for any growth at all in silico.Only one of these, however, has been suggested to beessential experimentally (Rv3490) [23]. Growing iNJ661in Middlebrook media (with glycerol but without glucose

supplementation) severely retards the growth rate, how-ever it is still viable in silico in contrast to M. tb growth invitro. The discrepancy between in vitro essentiality of glu-cose, but non-essentiality of reactions requiring glucosederivatives and in silico non-essentiality of glucose suggestthat there may be non-metabolic functions related to theneed for glucose, in addition to alternative metabolicpathways for the production and/or use of glucose-6-phosphate.

Figure 2b characterizes the growth dependence on theuptake rates of inorganic phosphate and sulfate. Limitinguptake of either of these molecules will inhibit growth insilico. Although the uptake fluxes are small compared tothe uptake of the carbon sources and oxygen, both arerequired for growth of M. tb. Consequently, the sulfateand phosphate transporters may serve as bacteriostaticand possibly bacteriocidal drug targets. A robustness anal-ysis (see Methods) was carried for biomass production byiNJ661 for phosphate and sulfate [see Additional file 7].This was compared to a robustness analysis performed forthe in silico E. coli strain iJR904 [24] on aerobic glucoseconditions [see Additional file 7]. The sensitivity of therespective biomass production rates to the uptake fluxesare given by the slopes of the plots for each organism. Thesensitivity to the phosphate uptake rate for iNJ661 andiJR904 are both approximately 1. However, the slope forsulfate uptake in iNJ661 is approximately four times thatof iJR904 (about 16 for iNJ661 compared to about 4 foriJR904). This difference reflects the relatively largeramount of sulfur as a percent of total biomass in M. tbcompared to E. coli, which is consistent with knowledge ofM. tb's composition [25]. Collectively, the phase planesand robustness plots suggest that M. tb may be much moresensitive to sulfate depletion than E. coli.

A remarkable property of M. tb is its ability to grow in cul-ture with carbon monoxide (CO) serving as the sole car-bon source [26,27]. Unlike other mycobacteria that cangrow with CO as the sole carbon source, M. tb lacks ribu-lose-bisphosphate carboxylase (RuBisCO), an enzymefound in plants enabling fixation of carbon dioxide.Although it was argued that a reductive citric acid cyclemay enable the fixation of carbon dioxide into by M. tb[27,28], experimentally it was observed that citrate didnot affect M. tb growth on CO [27]. We did not expectiNJ661 to grow on CO, since the only experimentallycharacterized CO associated metabolism in M. tb was COdehydrogenase (CODH). The uptake of glycerol versusthe uptake of CO while optimizing for growth can be seenin Figure 3. CO uptake affects the biomass optimizationonly during minimal uptake of glycerol, over a slightrange. The contribution being made in this region is thedonation of protons (through the oxidation of CO toCO2). Looking further into this issue, we tested the

Table 1: The confidence level for each reaction is based on a scale from 0 to 4, 4 being the highest level of confidence (experimental biochemical evidence supporting the inclusion of a reaction) and 1 being the lowest level of confidence (inclusion of a reaction solely on modeling functionality). Sequence based annotations have a confidence level of 2.

Network properties of iNJ661

Genes 661Proteins 543Reactions (intra-system) 939Reactions (exchange) 88Gene Associated Reactions 77%Metabolites 828Average Confidence level 2.31

Page 3 of 20(page number not for citation purposes)

Page 4: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

hypothesis that RuBisCO would enable growth on CO(even though M. tb is not known to have it). We addedRuBisCO in addition ribose 1-phosphokinase and ribose1,5-bisphosphate isomerase (to complete the pathway)and optimized for growth on a modified Middlebrook7H9 media (glycerol and glucose were removed and COwas added). iNJ661 was unable to grow under these con-ditions, despite conferring the ability to fix CO2 withribose sugars. This observation in conjunction with Figure2a suggests that one possibility for how M. tb is able togrow using CO as the carbon source is that it somehow isable to fix CO into a glycolytic intermediate that does not

appear to occur through a RuBisCO pathway or a reduc-tive citrate cycle.

A context for contentThis BiGG reconstruction can be used as a 'context forcontent' in analyzing large biological datasets. Multipleinvestigators have employed high-throughput data analy-sis methods to characterize M. tb in different environmen-tal conditions, Sassetti et al [23] used transposon sitehybridization (TraSH) to identify the genes needed foroptimal growth of M. tb in vitro (this will be referred to asthe OptGro dataset throughout the rest of the text), Gao et

The Gene Index for Mycobacterium tuberculosis H37Rv was downloaded from The Institute for Genomic Research (TIGR) [45]Figure 1The Gene Index for Mycobacterium tuberculosis H37Rv was downloaded from The Institute for Genomic Research (TIGR) [45]. Reconstruction content was defined based on the sequence annotation, legacy data, the Tuberculist database [31], ancil-lary sources such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), and SEED [47]. Reactions were defined and according to the Tuberculist web and KEGG. Legacy data was also used in the process of building the model from the recon-struction. The manual curation aspects of the reconstruction are outlined above and discussed in further detail in [42]. Debug-ging was started once the first draft of the reconstruction was completed and functional testing (i.e. flux balance analysis calculations, etc.) were begun.

Reconstruction and Model Building of iNJ661 from M. tb H37Rv

iNJ661 Reconstruction

iNJ661 Model

Manual Curation Steps

Evidence for gene

Identify catalytic protein complex/subunits

Reaction definition: primary metabolite conversion

Reaction definition: secondary metabolites/cofactors

Metabolites: formula and charge determination

Reaction definition: catalytic mechanism

Reaction definition: compartmentation

Confirm reaction mass conservation

Confirm reaction charge conservation

Manual Curation Resources

Primary Literature

TextbooksInternet Resource Databases: Tuberculist, KEGG, SEED

Debugging

Identify gaps and carry out directed manual curationEliminate `free energy’ loops

Convert to ModelDefine system boundaries and uptake constraintsTest anabolic and catabolic capabilities

TuberculistKEGG

The SEED

Genome Annotation (TIGR)

Page 4 of 20(page number not for citation purposes)

Page 5: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

Table 2: The list of biomass components for the initial biomass function. [See Additional file 1] for the additional components added for the extended biomass function and [see Additional file 5] for the explicit names and molecular formulas.

Biomass composition for growth of iNJ661

mmol/gDW Abbreviation

0.40596 ala-L

0.02261 cys-L

0.12031 asp-L

0.09001 glu-L

0.04810 phe-L

0.33581 gly

0.04062 his-L

0.08773 ile-L

0.03909 lys-L

0.20471 leu-L amino acids

0.03489 met-L

0.04770 asn-L

0.13803 pro-L

0.05812 gln-L

0.12042 arg-L

0.14340 ser-L

0.13571 thr-L

0.20583 val-L

0.02012 trp-L

0.03176 tyr-L

0.13903 amp

0.25095 cmp

0.24365 gmp

0.13160 ump nucleic acids

0.00349 damp

0.00666 dcmp

0.00666 dgmp

0.00365 dtmp

0.00797 tre

0.00118 tat

0.00125 pat

0.00114 sl1

0.00649 tre6p

0.00647 tres

0.16315 glc-D

0.09507 man sugars

0.02172 rib-D carbohydrates

0.05547 gal peptidoglycans

0.23018 acgam1p

0.00279 uamr

0.00101 uaaAgla

0.00101 uaaGgla

0.00099 uaagmda

0.00098 ugagmda

0.02536 glyc

Page 5 of 20(page number not for citation purposes)

Page 6: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

0.01885 peptido_TB1

0.01885 peptido_TB2

0.00073 arabinanagalfragund

0.00168 Ac1PIM1

0.00149 Ac1PIM2

0.00134 Ac1PIM3

0.00121 Ac1PIM4

0.00127 Ac2PIM2

0.00208 PIM1

0.00179 PIM2

0.00157 PIM3

0.00140 PIM4

0.00127 PIM5

0.00115 PIM6

0.00406 pe160

0.00586 clpn160190 fatty acids

0.01219 ttdca glycolipids

0.23515 hdca phospholipids

0.01094 hdcea

0.03484 ocdca

0.00985 ocdcea

0.09580 mocdca

0.01568 arach

0.00784 mbhn

0.05835 hexc

0.00724 pa160

0.00680 pa160190

0.00641 pa190190

0.00649 pg160

0.00613 pg160190

0.00581 pg190

0.01100 mycolate

0.00334 kmycolate

0.00329 mkmycolate

0.00337 mmmycolate

0.00111 tmha1 mycolic acids

0.00110 tmha2 phenolic glycolipids

0.00110 tmha3 mycocerosates

0.00110 tmha4 mycobactin

0.00645 phdca

0.00118 mfrrppdima

0.00140 mcbts

0.00150 fcmcbtt

0.00027 pdima

0.00026 ppdima

60.00000 atp

60.00000 adp (product) growth associated maintenance

60.00000 pi (product)

Table 2: The list of biomass components for the initial biomass function. [See Additional file 1] for the additional components added for the extended biomass function and [see Additional file 5] for the explicit names and molecular formulas. (Continued)

Page 6 of 20(page number not for citation purposes)

Page 7: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

al [29] identified a set of genes that are consistentlyexpressed under different growth conditions in liquid cul-ture (this will be referred to as the ConstExp datasetthroughout the rest of the text), and Sassetti in 2003 inves-tigated the set of genes needed for M. tb survival in vivoduring infection in mice [30] (will be referred to as theInfect dataset). The degree of overlap between the threedatasets can be seen in Figure 4A (only 2 genes are sharedby all three sets) highlight the variability of gene expres-sion across the different experimental datasets. Althoughsignificant differences between in vivo and in vitro patternsof gene expression may be expected, the OptGro and Con-stExp datasets were both done in vitro. The heterogeneityamong the datasets (all three experimental datasets shareonly two loci) suggests that defining an objective functionbased on in vitro data, may not necessarily translate to use-ful results in vivo (it may constrain the solution space fromthe wrong directions).

We evaluated these three datasets for M. tb and comparedthem to gene deletion analyses for iNJ661, in which eachgene was individually deleted and growth was optimized

on Middlebrook media. Any growth rate that fell short ofthe maximum wild type growth was considered 'sub-opti-mal'. Table 5a summarizes the 114 false negative (iNJ661gene non-essentiality, but in vitro gene essentiality; 50%)results between iNJ661 and the OptGro dataset (for thefull results [see Additional file 4]). The inconsistencieswere considered individually and classified into four cate-gories, Not in Objective Function (NOF), Alternate Route(AR), Alternative Locus (AL), and Not Essential (NE);these classifications are not mutually exclusive and weredefined in order to navigate and interpret with greaterease. A gene was placed in the NOF category if it was iden-tified to be in a biosynthetic pathway (based on networkconnectivity) for a particular metabolite, such as a vitaminor co-factor. Genes would be categorized as AR or AL ifthere was an alternative biochemical route or an alterna-tive gene annotation that could carry out that reaction iniNJ661, respectively, but not in M. tb. Potential AR or ALclassifications reflect one of the potential errors due tosequence based annotations, so further biochemical char-acterization of the gene products would help resolve thesecases. Genes that did not fall in these categories were clas-

Table 4: Summary of iNJ661 biomass production rates (in mmol/hr/g dwt) and doubling times (1/hr) in silico on different media. * from [19], ^ from [18].

iNJ661 Growth

Biomass Accumulation iNJ661 Doubling Time M. tb Doubling Time

Middlebrook 7H9 0.052 13.33Youman's 0.029 23.90 24*

CAMR 0.058 11.95 14.7^

Table 3: *Glucose and glycerol supplementation were added as described in [20].

in silico media composition

Youmans Middlebrook 7H9 CAMR

oxygen ammonium L-alanineL-asparagine calcium L-arginine

citrate chloride L-asparagineglycerol citrate L-aspartic acidwater copper L-glutamic acid

octadecanoate (Tween) ferric iron L-glycinephosphate L-glutamate L-isoleucine

sulfate magnesium L-leucinemagnesium oxygen L-serinepotassium phosphate L-phenylalanine

bicarbonate sodium pyruvate(ferric iron: minimal amount) sulfate glycerol

water octadecanoate (Tween)D-glucose*glycerol*

Page 7 of 20(page number not for citation purposes)

Page 8: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

sified as NE. Many of these loci were associated with fluxesin which the flux variability was uniformly zero (maxi-mum and minimum value).

Since quantitative data was not available for vitamins andcofactors, they were not included in the original defini-tion of the objective function (Table 2). However, after thedeletion analysis, the majority of the false negative predic-tions in the NE category by iNJ661 were reactions associ-ated with the biosynthesis of vitamins and cofactors, suchas dihydrofolate and tetrahydrofolate. 16 such metabo-lites were added [see Additional file 1] to the biomassfunction with coefficients of 0.000001 (small enough tohave negligible effects on quantitative growth, but stillrequired for growth). The results for the genes required foroptimal growth using the initial biomass function and theexpanded compared to OptGrow are displayed in the firstVenn diagram of Figure 4B. Although only 16 metaboliteswere added, the deletion study resulted in 31 more geneloci that matched the OptGro data. This only slightlyincreased the deletion prediction from 54% to 56%. Thereason for the modest increase is that there were also 18additional false positive (iNJ661 gene essentiality, but invitro gene non-essentiality) results (from 46% to 44% forthe original and expanded objective functions, respec-tively).

Genes categorized as AL or AR suggest that there was anerror in the sequence based annotation, there are regula-tory control loops that we are not aware of, or there was afalse positive result from the experimental data (TraSH isoften used as a screening tool). Sassetti et al [23] dicussedone of these cases involving the Rv0505c and Rv3042cloci (serB and serB2 respectively). Both have beenassigned the same function based on annotation, howeverserB2 was found to be essential while serB was not. Theremay be many possibilities why serB cannot rescue muta-tions/deletions of serB2, however the detection of suchcases imply that having duplicate genes many not conferthe degree or robustness that is often assumed to accom-pany an organism. The cases with alternative loci high-light areas where further experimental investigation mayprovide insight into the growth capabilities of M. tb. TheAL cases are enumerated in Table 5b. These loci are worthfurther experimental investigation to confirm the annota-tion and to determine the different conditions which mayinduce or inhibit expression. The false negative predictionfor the sulfate transporter (Rv2397c, Rv2398c, Rv2399c,and Rv2400c) is an interesting case to consider, in partbecause it validates the observations in the phase planediscussed in Figure 2b. Additionally, it identifies an areain which a partially characterized annotation causes erro-neous results. The four aforementioned loci form a pro-tein complex in the model in order to catalyze the

a. Phase plane diagram of glycerol versus glucose uptake while optimizing for growth on Middlebrook mediaFigure 2a. Phase plane diagram of glycerol versus glucose uptake while optimizing for growth on Middlebrook media. iNJ661 can grow with glycerol serving as the sole carbon source, but not glucose. The open dots are the calculated phase points and the solid black lines indicate isoclines. b. Phase plane diagram for phosphate versus sulfate uptake on Middlebrook media. Although the transport fluxes are very small, they are both necessary for the organism to be able to grow.

Page 8 of 20(page number not for citation purposes)

Page 9: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

transport reaction. However, since there was an additionalannotation identifying Rv1739c as a sulfate transporter[31], an association was made between Rv1739c in addi-tion to the existing protein complex (Rv2397c + Rv2398c+ Rv2399c + Rv2400c). So, although the sulfate trans-porter is essential for in silico growth of iNJ661, the lack ofa biochemically characterized gene protein relationshipcan result in false predictions.

The 103 false positive cases likely reflect incompleteknowledge about the network (alternative pathways exist)or an alternative objective function [see Additional file 4].Approximately 20 cases involve amino acid pathways,since amino acids would presumably be required for anykind of growth in lab conditions, these cases likely suggestalternative pathways or enzymes are present in M. tb thatare not in iNJ661. Almost half of these cases (46) involvefatty acid, membrane, or peptidoglycan metabolism, sincethe fatty acid composition in M. tb can change signifi-cantly over time in vitro [9], many of these false predic-

tions are likely to be due to changes in the biomasscomposition (in silico simulations have fixed biomasscompositions). These cases highlight areas in whichexperimental studies documenting the changes in compo-sition and growth rates may yield interesting results. Fiveof the cases involve loci associated with the transport ofphosphate, ammonium, and a sugar transporter. Thesecases likely involve as of yet unidentified alternative trans-porters in M. tb. Another case involves glycerol kinase(Rv3696c, GlpK, GLYK) which was one of the fluxes withno flexibility (see M. tb growth in silico), since M. tb invitro requires glycerol supplementation for growth [20].However, since glycerol kinase does not appear to berequired even for sub-optimal growth as implied by theOptGro dataset, there must be other reactions involvingthe direct metabolism of glycerol that have not yet beenidentified for M. tb. The ability to grow using CO as thesole carbon source and maintain optimal growth evenwithout glycerol kinase activity imply unique and uniden-tified aspects to central metabolism in the M. tb network.

Biomass optimization on Middlebrook media without glucoseFigure 3Biomass optimization on Middlebrook media without glucose. Carbon monoxide uptake can affect the growth at very low glyc-erol uptake rates via Carbon Monoxide Dehydrogenase, which generate protons through the oxidation of carbon dioxide.

Page 9 of 20(page number not for citation purposes)

Page 10: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

Further experimental investigations may provide cluestowards controlling the growth of M. tb (increasing thegrowth rate in culture and inhibiting growth in vivo).

Out of the 194 genes identified in the Infect dataset, only17 were shared with the OptGro dataset and 37 were

shared with the ConstExp dataset (Figure 4A). From thesethree sets of experimental datasets, OptGro and ConstExphad the greatest number of genes in common, 162. How-ever this is still only about 27% and 29% of the genes ineach dataset, respectively. The large variability between allof these datasets highlights the point that further develop-

Summary of experimental datasets and their overlap with iNJ661 and the iNJ661 gene deletion studiesFigure 4Summary of experimental datasets and their overlap with iNJ661 and the iNJ661 gene deletion studies. OptGro refers to the dataset described in [23], ConstExp refers to the dataset described in [29], and Infect describes the dataset in [30]. ^ denotes the subset of genes from the experimental dataset contained in iNJ661. iNJ661 represents the 661 genes in the reconstruction. iNJ661 optimal growth represents the subset of genes required for optimal growth with the original objective function. iNJ661 optimal growth* represents the subset of genes required for optimal growth with the objective function expanded to include vitamins and cofactors. iNJ661 alternative objective represents the set of genes required for optimal growth using the objective function constructed based on the ConstExp dataset. Panel A: The Venn diagram shows the overlap between the different experimental gene expression datasets. The accompanying chart summarizes the total number of genes in each dataset and how many of those genes are found in the iNJ661 reconstruction. Panel B: Series of Venn diagrams summarizing the results for the gene deletion studies carried out of iNJ661 compared to the experimental gene expression datasets.

iNJ661optimalgrowth

OptGro^

31031

31

18

83

123

iNJ661optimalgrowth*

iNJ661optimalgrowth 01901

10

39

55

36

iNJ661optimalgrowth*

ConstExp^

iNJ661optimalgrowth 42142

3

46

20

12

iNJ661optimalgrowth*

Infect^

OptGro ConstExp

Infect

553515

437 363

142

2

160

A

B

OptGro ConstExp InfectGenes in Data Set 614 560 194Genes Shared with iNJ661 237 101 103

Table 5a: Classification of False Negative results by subdivision into four overlapping categories for the gene deletion study in iNJ661 compared to the OptGro dataset. Each row and column lists the number of those genes in the respectively classes. NOF: Not in Objective Function, AR: Alternate Route, AL: Alternative Locus, NE: Not essential.

Classification of False Negative results

AL AR NE NOF

AL 26 0 5 2AR 0 7 6 1NE 5 6 24 0

NOF 2 1 0 43

Page 10 of 20(page number not for citation purposes)

Page 11: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

ment of Flux Balance Analysis based approaches willrequire the definition alternative objective functions tobiomass optimization.

Functional assessment independent of objective functionsBiomass functions can help improve predictions underwell-defined conditions by constraining the steady statesolution space. However, different growth conditions canappreciably alter the composition and the 'objectives' ofbacteria, as reflect by Figure 4A and the discussion above.Significant changes in fatty acid composition have beenobserved even while growing M. tb in culture over thespan of weeks [12]. These changes would be reflected aschanges in the composition and the coefficients of the

biomass function, consequently altering the predictionsmade by the model. Not all COBRA methods require thedefinition of objective functions [32] and identifyinggroups of reactions that operate together can help simplifya network and provide insight into its functionality [33].Correlated reaction sets have been calculated using sam-pling [34] and flux coupling [35] and implications forclassifying diseases and identifying pathway specific drugtargets have been discussed [36]. Applying the same sys-tems view of metabolic networks to pathogens such as M.tb causes one to adopt the view that single enzyme drugtargets actually knock out complete pathways. As a result,terminating the activity of any other enzyme in that path-way should have the same effect. Similar to the correlated

Table 5b: Detailed listing of the AL results for the False Negative results for iNJ661. The first column lists a locus needed for optimal growth according to the OptGro dataset. The second column lists the reactions involved. The third column lists the possible alternative loci in iNJ661 (commas indicate isozymes, addition symbols indicate formation of protein complexes).

Alternative Loci in False Negative results

OptGro Reaction Other loci in iNJ661

Rv0337c ASPTA Rv2565Rv0462 PDHc, PDH, PDHa Rv0843, Rv2497c+Rv2496cRv0808 GLUPRT Rv1602Rv0824c DESAT18 Rv1094Rv1122 GND Rv1844cRv1600 HSTPT Rv3772, Rv2231cRv1602 GLUPRT Rv0808Rv1609 ANS Rv2859cRv2182c AGPAT160190, AGPAT160, AGPAT190 Rv2483cRv2215 PDH, AKGDb Rv2241, Rv0462, Rv0843Rv2231c HSTPT Rv1600, Rv3772Rv2397c SULabc, TSULabc Rv1739cRv2398c SULabc, TSULabc Rv1739cRv2399c SULabc, TSULabc Rv1739cRv2400c SULabc, TSULabc Rv1739cRv2682c DXPS Rv3379cRv2746c PGSA190, PGSA160, PGSA160190 Rv1822Rv2996c PGCD Rv0728cRv3003c ACLS Rv3509c+Rv3002c, Rv1820+Rv3002c,

Rv3509c+Rv3002c, Rv3002c+Rv3470cRv3042c PSP_L Rv0505cRv3257c PMANM, AIRCr Rv3275c+Rv3276cRv3275c AIRCr Rv3257c+Rv3276cRv3281 ACCOACr Rv3280+Rv0904cRv3441c PMANM, ACGAMPM Rv3308, Rv3257cRv3634c UDPG4E Rv0536, Rv0501Rv0112 GMAND Rv1511Rv2247 PPCOAC Rv2502c+Rv0973c, Rv3285+Rv0974c,

Rv0973c+Rv0974c,Rv0973c PPCOAC Rv2502c+Rv0973c, Rv3285+Rv0974c,

Rv0973c+Rv0974c, Rv2501c+Rv2247Rv3285 PPCOAC Rv2502c+Rv0973c, Rv0973c+Rv0974c,

Rv2501c+Rv2247Rv2967c PC Rv2976cRv3464 TDPGDH Rv3468c, Rv3784Rv2754c TMDS Rv2764cRv3713 ADCYRS Rv2848c

Page 11 of 20(page number not for citation purposes)

Page 12: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

reaction sets calculated by sampling or flux coupling, HCRsets are defined by mass balance constraints. The defini-tion of HCRs is stricter in the sense that it is based onmetabolites with one-to-one connectivity. However incontrast to the other approaches, it is independent of theexchange reactions with the environment and any require-ments for demand functions. Consequently, the sets canbe calculated directly from the stoichiometric matrix.There is a significant degree of overlap between HCRs andthe Enzyme Subsets (ES) described by Pfeiffer et al [37].Since ESs are not constrained by 1:1 connectivity, in prin-ciple they may include additional reactions. On the sametoken, when the same reaction is carried out by multipleco-factors and the reactions are reversible, which is not

uncommon in metabolic networks, ESs may add reactionswhich form loops (similar to Type III Extreme Pathways[38]). Additionally, intermediate steps in the calculationof HCRs allow the consideration of inclusion or removalof potentially reversible reactions.

147 HCR sets (1:1 only, additional reversible reactions,0:2/2:0, were not included, see Methods) were calculatedfor the network, the average size of each set was 2.93, withmedian and modes both equal to 2. The largest set, HCR40, consisted of 13 reactions involved in Cofactor Metab-olism and Porphyrin Metabolism. The summary of howthe HCRs are split among the 35 subsystems in the net-work are outlined in Table 6. Using the Gene-Protein-

Table 6: HCR sets mapped to the subsystems in the network along with the size of each subsystem.

HCR mapping to subsystems

Metabolic Subsystem HCR Sets in each Subsystem Number of HCRs

Alanine and Aspartate Metabolism 47,81,99,126 4Arginine and Proline Metabolism 126 1Biotin Metabolism 33 1Citric Acid Cycle 98 1Cofactor Metabolism 7,24,25,26,30,40,56,77,118,123,129 11Cysteine Metabolism 38 1Fatty Acid Metabolism 1,43,57,78,86,90,93,94,95,96,97,103,108,110,11

4,115,116,128,13219

Folate Metabolism 9,22,28,31,44 5Glutamate Metabolism 21,27,29,67,76 5Glycine Serine Threonine Metabolism 19,24,45,111,124 5Glycolysis 41,42 2Glyoxylate Metabolism 104 1Histidine Metabolism 10,70 2Lysine Metabolism 2,8 2Membrane Metabolism 23,29,34,35,36,43,46,61,64,65,66,74,79,97,103,

106,107,108,109,110,112,113,116,117,130,131,138

27

Methionine Metabolism 27,28,45,91,92 5Nucleotide Sugar Metabolism 55,59,63,69,75,83,140 7Other Amino Acid Metabolism 29,48,82,91,92,125,127 7Pantothenate and CoA Metabolism 12,67,68,81 4Pentose Phosphate Pathway 32,121 2Peptidoglycan Metabolism 43,133,134,136,137,100 6Phenylalanine Tyrosine Tryptophan Metabolism 10,11,20,85,105 5Polyprenyl Metabolism 16,60,62,77 4Porphyrin Metabolism 25,30,40,51,52,53 6Purine Metabolism 5,17,27,49,58,63,73,140 8Pyrimidine Metabolism 55,59,83,98 4Pyruvate Metabolism 82 1Redox Metabolism 39,50,71,72,76,80,100,102 8Riboflavin Metabolism 6,7,26,30,56 5Sugar Metabolism 15,42,74,75,87,89,99,119,120,135 10Thiamine Metabolism 15 1Transport 39,48,54,71,82,84,87,88,89,101,102,106,110,12

4,127,139,141,142,143,144,145,146,14723

Ubiquinone Metabolism 13,122,125 3Urea Cycle 27,37 2Valine Leucine Isoleucine Metabolism 3,4,14,18,84,88,139 7

Page 12 of 20(page number not for citation purposes)

Page 13: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

Reaction relationships which form the foundation of thereconstruction, we mapped the HCRs back to the gene loci(Figure 5). The complete HCR set mapped to 124 loci and61 HCRs in the OptGro dataset, to 46 loci and 34 HCRsin the ConstExp dataset, and 21 loci and 18 HCRs in theInfect dataset [see Additional file 2]. The OptGro andConstExp shared 8 HCR sets (HCR 2, 3, 11, 12, 43, 57, 69,and 122). The intersection of these datasets identifies asubset of reactions that are consistently expressed underdifferent conditions and also required for optimal growthin vitro. These HCR sets may reflect underlying core sets ofreactions vital to growth and survival in vitro.

Many of the tuberculosis treatment drugs have metabolictargets [1] and a use of this metabolic based reconstruc-tion can be to assist in identifying new and alternativechemotherapy targets for tuberculosis. Mdluli and Spigel-man recently discussed the drug targets in M. tb reasona-bly comprehensively [39]. We used the list (absent theDNA synthesis and Regulatory Protein targets) andmapped them to the 147 HCR sets as seen in Table 7 [seeAdditional file 3 for the full set]. The final column lists thenumber of different reactions in the HCR. Other reactionsthat are shared within a particular HCR which contains aknown drug target, have the potential to be alternative but

The stoichiometric matrix is created from the metabolic networkFigure 5The stoichiometric matrix is created from the metabolic network. The HCR (Hard Coupled Reaction) sets are calculated directly from the stoichiometric matrix and mapped back to the gene loci. Gene-Protein-Reaction relationships: top box is the gene, the next box is the peptide, the oval represents the functional protein, and the bottom boxes are the reactions catalyzed by the protein.

Calculating Hard Coupling Reaction Sets

and Mapping to Genes

Stoichiometric Matrix

Metabolic Network

Rv0211

pckA

PckA

PEPCK_re PPCKr

Gene-Protein-Reaction

Association

Transcript

Catalyzed Reaction

ProteinComplex

Hard Coupled

Reaction Sets

GeneLocus

Page 13 of 20(page number not for citation purposes)

Page 14: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

Page 14 of 20(page number not for citation purposes)

Table 7: Mapping the HCR sets to the drug targets described by Mdluli and Spigelman [39]. The first column includes the protein names (adopted from Mdluli and Spigelman), the second column lists the corresponding gene locus, the third is the HCR set number and the fourth lists how many reactions are in the HCR set. Only those drug targets mapped to an HCR are shown. [See Additional file 3] for the full set of HCR sets and the individual reactions they are comprised of; the HCR sets mapped to the above drug targets are highlighted in these data.

HCRs mapped to Mtb drug targets

Drug Target Gene Locus HCR HCR size

EmbA Rv3794 43 10EmbB Rv3795 43 10EmbC Rv3793 43 10AftA Rv3792 43 10decaprenyl phosphoryltransferase Rv3806c 62 3udp galatofuraosyltransferase Rv3808c 43 10dTDP-deoxy-hexulose reductase Rv3809c 43 10RmlA Rv0334 69 4RmlB Rv3464 69 4RmlC Rv3465 69 4RmbD Rv3266c 69 4InhA Rv1484 57 3MabA Rv1483 90 5KasA Rv2245 90 5KasB Rv2246 90 5MmaA4 Rv0642c 93 4Pks13 Rv3800c 86 4FadD32 Rv3801c 86 4AccD5 Rv3280 78 2LeuD Rv2987c 14 2TrpD Rv2192c 10 4DapB Rv2773c 2 3AroA Rv3227 20 3AroC Rv2540c 20 3AroE Rv2537c 11 4AroG Rv2178c 11 4AroQ Rv2537c 11 4ilvG (acetolactate synthase) Rv1820 3 3ilvX (acetolactate synthase) Rv3509c 3 3ilvN (acetolactate synthase) Rv3002c 3 3ilvB (acetolactate synthase) Rv3003c 3 3ilvB2 (acetolactate synthase) Rv3470c 3 3branchd chain aminotransferase Rv2210c 84 2dihyropteroate synthase Rv3608c 9 3dihyropteroate synthase Rv1207 9 3PanC Rv3602c 12 4riboflavin synthase Rv1416 56 3riboflavin synthase Rv1412 56 3sulfotransferase Rv1373 131 2MshA Rv0486 80 2MshB Rv1170 80 2MshC Rv2130c 50 2MshD Rv0819 50 2IspD Rv3582c 16 5IspE Rv1011 16 5IspF Rv3581c 16 5MenA Rv0534c 13 2MenB Rv0548c 122 3MenD Rv0555 125 2MenE Rv0542c 122 3

Page 15: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

metabolically equivalent targets. The drug targets mappedto 25 of the HCR sets. These sets are depicted on a reducedmetabolic map of iNJ661 (Figure 6). An image of the HCRsets mapped to the entire metabolic network will be avail-able on our website [40]. These 25 HCR sets contain all ofthe 8 HCR sets identified by the overlap between the Opt-Gro and ConstExp data.

Last year Raman et al [41] constructed and analyzed amodel of the mycolic acid synthesis pathways and pro-posed additional targets in fatty acid metabolism.Accounting for the entire metabolic network furtherbuilds upon this by enabling a more rigorous evaluationof the global growth capabilities of the network and theidentification of potential drug targets. The genome-scalemodel presented here includes the extensive fatty acid

metabolism pathways of M. tb, in addition to rest of themetabolic network. With this full network, we thensought to extend the identification of potential drug tar-gets by taking advantage of the underlying principles ofreconstructing metabolic networks to calculate HCR sets.

Interpreting large, complex networks in a functionallymeaningful manner is enabled by adopting a hierarchicalview of the network, where one identifies groups of reac-tions that respond in unison to changes in media condi-tions. Groups of reactions strictly bound together basedon mass conservation and stoichiometry constraints canlead to the definition of hierarchical sets of reactions[33,34]. This principle was applied to causal SNP associ-ated diseases in the human mitochondria, and it wasfound that reactions in the same co-set had similar disease

A partial metabolic map of iNJ661 with the 25 drug target HCR setsFigure 6A partial metabolic map of iNJ661 with the 25 drug target HCR sets. Each reaction is numbered and color coded according to the HCR which it belongs to. The number of the HCR sets matches those in the Additional files. Only pathways or parts of pathways which included members of the 25 HCRs are depicted.

Folate MetabolismRiboflavin Metabolism

Cofactor Metabolism

Lysine Metabolism

Phenylalanine, Tyrosine, Tryptophan

Metabolism

Glutamate Metabolism

Redox Metabolism

Biotin Metabolism

Thiamine Metabolism

Pantothenate and CoA Metabolism

Fatty Acid Oxidation

Fatty Acid Synthesis

Ubiquinone Metabolism

Valine, Leucine, and Isoleucine

Metabolism

Alanine and Aspartate Metabolism

Mycolic Acid Synthesis

PDIM Synthesis

Trehalose Dimycolate Metabolism

Sulfolipid-1 Synthesis

From Peptidoglycan Metabolism

Peptidoglycan Metabolism

Arabinogalactan and Peptidoglycan Metabolism

Polyprenyl Metabolism

Fatty Acid Synthesis

Mycobactin Synthesis

Membrane Metabolism

h2o

f6p

co2 h2o

h

h

h

acgam1p

nad

coa

uacgam

nad

h2o

pmhexc

nadph

h2o2

gdp

h2o

mmcoa-S

h2o

uacgam

atp

h

h

h2o

nadph

atp

nadp

coa

prpp

accoa

for

pi

argsuc

pgp160190

h

h

h2o

ump

coa

h

ala-L

fadh2

h

adp

adp

amp

h2o

tarab-D

h

sl26da

amp

amp

ump

nadp

galfragund

ala-D

26dap-M

amet

adp

ser-L

tat

atp

pi

ctp

atp

nadh

nh4

h

nh4

h

amp

peptido_EC

arab-D

arachACP

h

nadp

mkmeroacidcyc1AM P

gdp

uacgam

2p4c2me

h

h

nh4

nadph

ugmd

gdp

for

h

nadph

nad

h

thmmp

gln-L

nadp

paps

h

h2o

nadp

pa190190

adp

nadph

indpyr

ctp

glucys

ppi

ppi

coa

h2o

h

nh4

uaAgla

nadp

pi

uacgam

uaaGgla

co2

meroacidcyc2AMP

octa

co2

glu-D

ppi

acser

decda_tb

amp

udcpp

stcoa

ahcys

o2

h

uaGgla

h

pat

4adcho

uama

mmmeroacidACP

h

co2

ACP

h2o

co2

nad

h

h2o

mshfaldox

atp

ppi

h2o

forcoa

2shchc

peptido_TB2

coa

4mhetz

nadph

co2

thmpp

mmcoa-S

coa

udp

h

tmha1

amp

glu-Lpi

ahcys

nad

nadh

arpp

adn

gthox

h

co2

atp

h

nadp

tdcoa

gdp

pi

atp

ppi

h

mmcoa-S

4abz

ppi

h2o

2ahhmd

h

amp

co2

4ahmmp

uAgla

h

thr-L

clpn160190

ttc

ppi

adp

harab-D

mkmycolate

apoACP

atp

h

fad

bmn

mshfald

Ac1PIM4

cys-L

ac

ahcys

udp

pi

pi

uacgam

mlthf

nadph

ACP

mycolate

uaaGgla

h

nadp

h2o

h

pi

h2o

o2

h2o

glu-L

ump

h2o

pant-R

3c2hmp

adp

prephthcoa

2ippm

odecoa

h

nadp

h2o

coa

atp

h2o

amp

h2o

gdp

Ac2PIM2

cdpc16c19g

h

h2o

pi

achms

coa

nh4

cdp

h

2ahbut

mmeroacidcyc2AM P

glu-L

hpmtria

h2o

h

pi

co2

h

decda_tb

2cpr5p

h

h

ad

ACP

h2o

coah2o

atp

h

co2

26dap-M

amp

cdpdhdecg

uacgam

peptido_TB1

h

h

uaAgla

h

mmcoa-S

nadp

ipdp

h

h

h2o

chor

coa

mcbts

co2

nadphcoa

mmmycolate

utp

h

mssg

dhlam

nadp

atp

nadh

no

25drapp

sl1

pi

25dhpp

h

nad

h2o

pi

h2o

h2o

h2o

ocdca

ctp

pg160190

ahcys

nadp

coa

uAgl

gdpfuc

ppi

tdm2

udcpp

h2o

ump

nadp

co2

nadp

malcoa

amp

fmettrna

udcpp

coa

occoa

coa

h

mi1p-D

h2o

h

coa

co2

Ac1PIM3

atp

h2o

uamag

nad

uamr

23dhdp

PIM3

mmcoa-S

adp

h

h

ahcys

h2o

h

h

3ig3p

fad

amp

ppi

mmcoa-S

ragund

arg-L

h2o

h

atp

decda_tb

cmp

h

atp

h

nad

h

co2

unaga

ssaltpp

pi

h

tamhexc

h2o

h

h

nadph

cyst-L

mharab-D

h

h

mstrACP

alaala

succ

grdp

coa

h

hexc

gam1p

akg

pep

akg

gtp

h

h

malACP

adp

h

h

adp

ala-L

h2o

co2

pi

mkmeroacidACP

Ac1PIM1

succ

succ

atp

mi1p-D

tdm3

nh4

h

h

ctp

uGgl

h

ipdp

h

h

co2

cys-L

mmmeroacidcyc1AMP

h

2ahhmp

nadph

decdman1p_tb

galfragund

nh4

h

h2o

cys-L

cmp

h

h

phe-L

ACP

atp

amet

pyr

coa

accoa

h

nh4

h2o

co2

2obut

chor

pi

ala-B

nh4

nadp

glu-L

hexccoa

o2

accoa

adp

uaaAgla

gdpmann

for

ahcys

atp

gdpmann

nadp

nadph

h2o

h2o

mmcoa-S

arabinanagalfragund

h2o

h

gdpmann

pi

h2

pmtcoa

nadph

coa

nadp

h

amp

thf

adp

co2

co2

ctp

3mob

co2

h2o

uacgam

nh4

udpgalfur

coa

lys-L

h

nadph

mfrrppdima

4abut

ipdp

4mop

gln-L

cmp

nadp

ragund

nadph

nadph

h2o

h

mmcoa-S

tdm2

gdpmann

coa

h

fad

2dhp

h

pa160

h2o

nadp

ppi

mmcoa-S

uamag

co2

h

co2

co2

pdima

coa

adp

2mecdp

gcald

h

pep

PIM2

co2

mmeroacidcyc1AC P

nadh

h

prephthACP

co2

atp

fadh2

h

adp

h2o

gthrd

h

ppi

nadph

atp

succoa

pyr

mmcoa-S

udcpdp

atp

h

pi

co2

hdca

coa

amet

h[e]

decdp

amet

succoa

nadph

mmcoa-S

co2

ahcys

atp

coa

dpcoa

atp

coa

nadph

h

malcoa

h

h

cdpc19c19g

gam1p

atp

atp

h2o

ppi

ppi

h2o

5apru

coa

h

o2

ppi

gluala

nadp

2shchc

grdp

co2

pi

nadph

adp

prepphth

nadph

h

h

dxyl5p

nadph

adp

h

asn-L

air

citr-L

malcoa

h2o

h2o

nadph amet

uacgam

cys-L

ipdp

pa190190

ser-L

co2

h

uagmda

h

amet

h

coa

atp

nadp

gdp

mmycolate

nadp

thf

malACP

h2o

nadph

tdm1

h

h

nadph

pi

nadp

h

nadh

nadh

malACP

h

suchms

akg

ACP

h

h2o

mql8

h

nadph

amp

pap

chexccoa

h

h2o

gdp

ser-L

h2o

ACP

ump

cyst-L

pyr

nadp

octdp

atp

nad

malACP

pe160

o2

atp

h

ugma

ACP

mmcoa-S

tamocta

h

h[e]

nadph

mkmeroacidcyc1ACP

amet

h

ACP

tre

26dap-M

kmeroacidcyc1ACP

pep

adp

nadph

h2o

nadph

coa

h

ppi

adp

nadph

coa

coa

nadph

atp

h

atp

gcald

atp

h2o

udcpp

h2o

h2o

co2

co2

alac-S

coa

ACP

h

h

uaagmda

co2

glu-L

h2o

ACP

h

succ

co2

h2o

h

udcpp

h2o

uacgam

hh

coa

akg

gdp

uGgl

nadph

ugmda

kmeroacidACP

26dap-LL

3c4mop

uama

udp

nadh

pi

h2o

h2o

h

glyc

5mthf

h2o

rmyc

gdpmann

q8h2

nad

adp

udcpp

nadph

h

coa

h

ser-D

thm

nadp

hpmdtria

fadh2

nadp

pnto-R

prpp

h

meroacidACP

h

co2

atp

ala-D

coa

atp

h2o

akg

o2

decd_tb

ACP

glucys

nadp

nadp

h2o

h

h2o

coah2o

h

adp

uaaGgtla

malcoa

atp

h

Ac1PIM3

h2o

nadp

g3p

cmp

23dhmp

adp

omtta

sucsal

h2o

h

h2o

uagmdaugggmda

nadp

glu-L

adp

glyc3p

atp

h

pi

dtdp

atp

h2o

4r5au

palmACP

amp

mkmycolate

5aprbu

thr-L

lys-L

adp

amp

h

nadph

udpgalfur

amet

ppi

nadph

malcoa

nadh

ipdp

o2

iasp

h

adp

nadp

h

pi

meroacidcyc2ACP

atp

hdcoa

udp

nad

co2

h

pan4p

amp

uGgla

dtdp

h

glu-L

nadp

triat

ala-L

dmarach

gln-L

pi

h2s

adp

h

pi

nadph

ppi

inost

meoh

pi

hdca

h

h2o

uAgl

pad

mmcoa-S

PIM6

ugmag

h

h

glu-L

pi

h2o

bhb

amp

frdp

mcbtt

co2

akg

adp

pyr

h2o

mmcoa-S

uaccg

coa

db4p

nadph

h

mmcoa-S

3mob

omtta

h2o nad

ACP

pi

tmlgnc

mycolate

g6p

nadp

coa

bmnmsh

cys-L

coa

coa

pyr

arabinan

chexccoa

co2

atp

h

pi

co2

pi

ppi

ocdcea

nadh

ser-L

3psme

ppi

kmycolate

udp

amp

2mahmp

h

amp

h

h

tdm4

tdm3

frdp

nad

ppi

hmtria

hom-L

ppi

co2

ppi

coa

h

h2o2

h2o

ppi

h2o

g3p

h2o

pi

coa

3mop

gdpmann

amp

h

fad

h2o

coa

h

amet

nadph

ACP

ahcys

kmycolate

omdtria

co2

pi

h

amp

h

gdp

atp

fmn

nadph

h2o

ppi

succ

hmocta

for

accoa

h

atp

h2o

6hmhpt

h

hom-L

ahcys

h2o

pmtcoa

h

atp

agalfragund

h2o

h2o

2dmmq8

coa

3dhsk

nadh

occoa

ACP

tres

glu-L

amet

co2

Ac1PIM2

tmhexc

co2

mmcoa-S

h

h

nadp

akg

h

malcoa

ahcys

coa

h2o

h

h

h2o

nadph

glu-D

lys-L

msh

h

iacgam

ptd1ino160

dhnpt

sucsal

nadh

tdm2

h

atp

co2

amet

dca

arabinan

mmmycolate

atp

ppi

h2o

h2o

4ahmmp

uamag

h

ribflv

ugagmda

coa

h2o

h

coa

phpyr

thfglu

atp

coa

nad

nadh

decd_tb

igam

2dda7p

h

dhpmp

gly

amp

nad

ptd1ino160190

tmha2

co2

10fthf

cys-L

gluala

arach

atp

tdm1gdpmann

cmp

pran

h

h2o

pi

inost

gly

nh4

dmbhn

pi

h2o

nadp

adp

g3pe

malcoa

ile-L

ssaltpp

4c2me

2dmmq6

ACP

nadp

ppi

pa160190

arachcoa

cmp

glyc3p

atp

pi

h2o

amp

glu-L

h

g6p

nadh

ppi

nad atp

34hpp

pi

fadh2

unaga

h2o

nad

ppi

h2o

h

h

h2o

kmeroacidcyc2ACP

h

accoa

h

h2o

nadph

4hbz

3c3hmp

h

h2o

nh4

tyr-L

h

h

decdpa_tb

gdpmann

h2o

ppi

co2

ppi

coa

phdcaACP

h

23dhmb

decdp_tb

ACP

nad

kmeroacidcyc2AMP

h

co2

mycolate

mi4p-D

amet

nadph

coa

tmbhn

nh4

nadh

h2o

h2o

dmlgnc

tmha6

ipdp

ctp

pi

coa

h

h2o

h2o

ahcys

nadp

fol

atp

uaGgla

anth

h2o

amet

atp

gtp

h

nadph

h

coa

atp

o2

4pasp

2dmmql8

mettrna

phthdl

atp

h

nadp

adp

cmp

h

o2coa

2agpe160

salc

atp

nh4

h2o

amp

udcpp

tdm1

atp

coa

pi

ppi

26dap-M

asp-L

h

nadph

PIM2

rppdima

chor

nadph

atp

hdca

pi

co2

h

udp

nad

pi

agalfragund

succ

h2o

mycolate

h

co2

h

h2o

pmocta

decda_tb

phthcl

adp

h

mmmeroacidcyc1AC P

h

h2o

h

adp

o2

h2o

o2

alpam

coa

co2

co2

udcpp

adp

nadph

h2o

pyr

ppi

atp

h

6hmhpt

cmcbtt

q8

ppi

pi

udcpdp

amet

nadp

nadp

h2o

mmcoa-S

ttdca

coa

h2o

dtdp

pi

udp

nadp

h

4hbzACP

gthrd

decdr_tb

2me4p

adp

h

adp

adp

udp

dmpp

ppi

fum

tarab-D

h2o

h2o

sl2a6o

nadh

gdp

nadh

h

h2o

ugmd

rmyc

h

akg

h2o

udcpp

nadp

nadph

h

h

atp

ump

h

ACP

glu-L

h2o

pyr

h2o

amet

glyc3p

gdpmann

cgly

coa

h2o

h

udp

nadph

nadp

prepphthACP

udpg

dtdp

6hmhptpp

ppi

mmcoa-S

h

nadh

h2o

ppi

pphthdl

ppi

mqn8

udcpdp

h

PIM4

ile-L[e]

lys-L

coa

h

adp

h

h

coa

gcald

uGgla

for

aspsa

h2mb4p

uggmda

uamr

nadh

h2o

amp

h

acac

h2o

coa

nadph

malcoa

h

co2

malcoa

nadh

coa

pac

h

h

mlthf

pphn

4ppan

ru5p-D

h

nadhpi

g3p

h2o

atp

h

co2

h2o

h

nad

pyr

nad

amp

val-L

co2

h2o

amp

ppi

h

nad

h

mlthf

PIM5

ppi

nadh

mharab-D

uamag

nadp

h

bhb

adp

glu-D

ahcys

pi

h

h2o

co2

utp

m1ocdca

PIM1

h2o

nadp

atp

co2

nadp

ile-L

akg

nad

h

amp

h

co2

akg

ppcoa

cdp

h

octdp

amp

glu-L

ichor

coa

h2o

atp

mmcoa-S

tmlgnc

atp

h

pg190

thmpp

nadph

ahcys

nadph

nadp

coa

for

nadh

ppi

ppi

adp

co2

h

malcoa

ppi

udp

h

glu-L

ocdca

pi

homtta

h

h

mmcoa-S

atp

malcoa

thf

h

o2s

h2o

ACP

h

ACP

adp

nadp

h

decdpr_tb

chexccoa

4mpetz

nadph

coa

dmpp

cys-L

coa

nadh

ppdima

methf

4ampm

atp

malcoa

uggmd

octdp

h

dtdprmn

pgp190

h2o

nadp

2mecdp

no3

mppp9

nadp

atp

h

glu-L

atp

nadp

h2o

leu-L

h

hdcea

ppi

coa

pi

nadp

mmcoa-R

coa ACP

h2o

h

glu-L

h

h

h

h2o

accoa

h

h2o

h

udcpp

mmcoa-S

glu-L

tmha5

h2o

mmcoa-S

h2o

h

glu-L

h2o

mmycolate

fad

h

peamn

decd_tb

pi

hexc

udp

pmtcoa

o2

amp

nadp

tamlgnc

h

coa

coa

nadp

malcoa

nadph

arab-D

coa

nadphh

dmlz

gdp

mbhn

ppi

h2o

mmeroacidACP

mmeroacidcyc2AC P

adp

h2o

h2o

lys-L

coa

h

h2o2

5fthf

fald

rrppdima

ahcys

uaagmda

5oxpro

h2o

uaaAgtla

h2o

nadph

nadph

tmha4

akg

co2

h2o

h

atp

pi

h

ppi

h

malcoa

coa

phthclh2

hdca

co2

nadp

phthiocerol

cmp

dtdprmn

dhf

co2

coa

h2mb4p

amp

h

h2o

adp

amp

h

coa

odecoa

lys-L

h

nadph

acgam

mqn6

udcpdp

pacald

nh4oh

alaala

co2

1msg3p

h2o

pi

indole

uAgla

amp

h2o

mi3p-D

marach

udpgal

h

o2

chexccoa

nadph

pi

co2

amp

co2

nadph

pi

pi

adp

h2o

nadp

hdca

adp

4ppcys

gam6p

akg

ump

nh4

for

pi

h

co2

mycolate

ugmr

gly

nadp

amp

malcoa

ipdp

nadph

skm5p

ichor

atp

h

frrppdima

h[e]

mndn

h2o

trp-L

mmcoa-S

h

ACP

coa

h

ppi

pap

gthox

gln-L

tmha3

h2o

uaagtmda

sucbz

nadph

nadph

h

sbzcoa

h2o

thdp

h2o

h2o

atp

h

ahdt

h

bhb

nadph

2obut

amp

h

ppi

nad

h2o

h2o

amet

pi

h2o

co2

harab-D

h

h

pi

nadph

nadp

ppi

nadp

h

h

meroacidcyc1ACP

o2

dtdprmn

h

h[e]

hpmtria

coa

nadh

atp

h

h

h

atp

atp

skm

aacoa

cmp

3dhq

h2o

nadp

h

ppi

pphthiocerol

adp

h2o

h

hdcoa

atp

dhna

mocdca

odecoa

phom

h

ac

ala-L

uaaAgla

ugmda

h

h2o

ACP

pi

nadp

h

h2o

tre6p

nadph

gdpmann

dtdprmn

co2

nadp

akg

pi

accoa

ddca

h

prephth

nh4

nh4

ump

co2

h

h

h

h

nh4

h

nadp

cigam

fadh2

nadp

glu-L

hexc

h

gdp

fad

succoa

ahcys

decdp

coa

asp-L

ump

ppi

pmtcoa

ahcys

ile-L[e]

e4p

dhpt

FASm2201

PMPK

TRESULT131

PPND

DHFS

AMID2

ALAD_L

DHAD2

FASm280

TMHAS4

FASm2801

PNTK12

GTMLT

GLNSP2

OXGDC2125

FACOAL80

TRIATS

UGLDDS1

SHSL4r

DHFR

HMPK2

GLUDx

DCPDP

MEPCT16

FAS14057

FOLR2

FASm300

MNDNS2

HPPK9

FAS161

KARA1i3

G16MTM4

DPR12

MYCFLUX5

FASm2602

IPPS

SSALy

FACOAE180

DASYN190190

PPNCL2

G16MTM7

PPNCL

MMCD

CDAPPA160190

PAPPT1

HSERTA

OCTDPS

TRE6PS

PHTHCLR1

GGLUCT2

SERAT

MYC1CYC4

GTHOr

LEUTA

NH4OHDs

AACPS3

ILEt2r84

MYCON4

CDPPT160190

FCOAH2

MTHFR2

ALDD19x

MYCON5

UAMRH

CYTBD

UAAGDS

RBFSb56

AFTA43

FASm2001

PREPPACPHr

MYCSacp5090

FTHFCL

UAGCVT

DESAT18

PHTHSOCOAT1r

DHPS2

PREPHACPH

PREPPACPH

DB4PS56

PRDX

HSDx

2AGPEAT160

FASm1601

FASm220

NADH5

PREPTHS

PPTGS_TB2

ALDD1

GLNSP3

DHPPDA2

UDPDPS

UGLDDS2

TDMS4

MTHFD

MYC2CYC386

SHKK20

PHTHDLS

PHETA1

FASm3001

DNMPPA

MECDPS16

PLIPA1E160

FAS80_L

DHQS11

G16MTM1

PRAIi10

FAH4

TRE6PP

PHYCBOXL

SDPTA

G3PAT190

UAGPT1

MME

ABTA

PHTHS2

BMNMSHS

GTPCI

DHAD13

CDPMEK16

ICHORSi

FASm2601

MYCSacp56

ACGAMT43

ARACHTA

PPNDH

DHQD11

UAGPT2

FAS240_L

HPPK2

DCPDPP62

FASPHDCA

MSHOXH

ANPRT10

MYCFLUX2

CDPPT160

UGGPT3

FMNAT

FAO1

GMT243

DXPRIi16

GLYCL

DXPS

UDCPDPS

SHK3Dr11

FAMPL4

CIGAMS50

FAS160

DALAOX

FASm260

MYCSacp58

SDPDS

MYC2CYC190

SUCBZL122

G16MTM10

PPDIMAS

MSHAMID

G16MTM2

ACLS3

DMPPS

PREPHACPHr

FACOAL161

TDMS1

PPTGS

PAPPT3

G16MTM8

HSK

MYC1CYC593

THMDP

GTHS

AMMQT8_2

PPTGS_TB1

TATSO131

DMATT

FAS181

FAMPL2

UGMDDS

FTHFD

RRPPDIMAS

DDPA11

PGSA190

OMCDC

PTPATi

UGLDDS2

AMMQT8

UDPGALM43

NADH9

SHCHCS2125

ARGSL

UAGPT3

UAMAS

PAPPT2

UAGPT3

ANS10

TMN

FMETTRS

PAPPT1

FAO3

FASm2202

PDIMAS

MANAT3

RBFSa56

NPHS122

MYCON2

IACGAMS80

ILETA84

G1PACT

TMHAS3

G16MTM6

ACCC78

RBFK

MYCFLUX4

DPPS

DAPE

AGPAT190

SHCHCS

UGMDDS

MI3PP

DCPDPP2

GF6PTAr

MYC2CYC2

TMPKr

SERD_L

TMHAS1

ALAALAD

LPLIPAL2E160

FHL

SERD_D

CHORS20

SHSL1r

CDAPPA160

MYC1M293

UAMAGS

ARABF43

TAS43

ARABF43

PPNCL3

TMHAS6

ACP1(FMN)

PATS

PHTHDLS2

FACOAL200

FACOAL180

HSDy

SUCBZS122

FACOAE140

UGMDDS2

MDFDH

FASm1801

GTPCII

HMPK4

DCPT62

MANAT2

ASNS1

FAS180

AMMQT613

FRRPPDIMAS

FAO2

FASm2401

PGPP190

AMPTASECG

MCBTS2

PSCVT20

GCC2

DHPPDA

MYC1CYC386

PGPP160190

DHNPA2

G16MTM9

FACOAL160

ARI

VALTA

MSHS50

PAPPT3

KARA2i

CHRPL

IGPS10

DHDPS2

MANAT1

TMPPP

FAMPL593

DAPDC

ACHBS

IPDDI

CYSS

IGAMD80

TYRTA

FAO10

MOHMT12

PHTO

RPPDIMAS

MI4PP

UAMAGS

SSALx

AACPS11

PREPTHS2

MTHFC

MYC1CYC190

GALFT43

UAGPT1

FOMETR

SALCS

ASPK

MYCFLUX3

DHNPA9

OXGDC2125

FTHFL

IPMD

GLNS

FAMPL190

PGPPT3

FASm2002

DESAT16

APRAUR

PEAMNO

PMDPHT

GLUCYS

MCBTS3

UGMAGS

MNDNS1

GLNSP1

FAS12057

MI1PS

ASPO5

TRPS2

TDMS2

MMM2r

IPPMIa14

MYCON3

ALATA_L

DPCOAK

DASYN160190

TRPS1

DCPE62

THDPS2

UDPDPS2

TATS

CDAPPA190

ARGSS

AMID4

UAAGDS

TRPTA

UAAGLS1

IPDPS

GTPCII2

AFTA43

FAMPL386

FAO11

FOLD39

FACOAL140

CHORM

MECDPDH16

FASm180

UAMAS

O16RHAT43

TDMS3

NADH10

HMPK1

DCPDPS

FASm340

UDCPDP

ASNN

HMPK3

MCBTS

PGAMT

AHMMPS

TMHAS2

PGLS

IPPMIb14

MCOATA

ACPS1

MYC1CYC2

PAPPT2

MYCTR

GLUCYS

UAGDP

THRD_Lr

GMT243

NO

GALFT43

DCPT2

DHDPRy2

FACOALPREPH

UAPGR

PANTS12

MYC1M1

UGMAS

CYTBD2

GLUDC

MCD

ILEt2r84

ASAD

TMHAS5

UGAGDS

CYSTGL

ADCL

CLPNS160190

DNTPPA

FACOAL181

FASm320

UAGPT2

AACPS10

HAS43

PREPTTA

HSST

FASm2402

DHNAOT213

PHTHCLS

MYCTR2

G16MTM5

MYCFLUX1

GRTT

UAAGLS2

FORMCOAL

PPCDC

DHNAOT

MYCON190

FASm2802

ACChex78

THRS

TRPS3

ACGAMT43

FASm240

GLUR

GLUSy

MTHFD2

PGSA160190

THFGLUS

UGLDDS1

O16RHAT43

FAS10057

FAS260

ADCS

AFE43

PHTHCLR2

GLUR

GMT1

Page 15 of 20(page number not for citation purposes)

Page 16: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

phenotypes [36]. We applied this same concept to iNJ661,but rather than looking for disease phenotypes, we soughtto identify alternative drug targets.

ConclusionOver time and with iterative improvements, metabolicnetwork reconstructions have achieved the stage ofhypothesis generation and model-driven biological dis-covery in systems biology [42]. We present an in silicostrain of M. tb, iNJ661, with the anticipation that it will bereceived by the community in a similar vein and to assistin the discovery process of this remarkable yet devastatingpathogen.

A large amount of material has been presented in thismanuscript, beginning with the presentation of a genome-scale reconstruction and model of M. tb, iNJ661, followedby experimental validation and use of the model for inte-gration and analysis of multiple experimental datasets.We summarize some of the main points that highlightareas in which further experimental research may be fruit-ful.

Growth studies• iNJ661 can grow at rates consistent with experimentaldata in varying media conditions. Further measurementsof the biomass composition of M. tb in well-defined situ-ations are likely to improve the in silico predictions underthose conditions.

• Analysis of the capabilities under maximal growth con-ditions can help identify critical nutrient targets for killingM. tb (such as the sulfate transporter) or for optimizinggrowth (in the lab).

• The pathway enabling growth of M. tb using CO as thecarbon source is still unclear. The computational studiessuggest that there must be an alternative pathway (toreductive citrate cycle and RuBisCO) which enables entryof CO2 into central metabolism.

• The seemingly paradoxical need for glucose supplemen-tation in Middlebrook media but non-essentiality ofdirectly related reactions involving the metabolism of glu-cose-6-phosphate for in vitro growth and the almost dia-metrically opposed results for in silico growth, suggest thatthere may be additional pathways involving glucosemetabolism and potentially non-metabolic functionsrelated to the glucose requirement.

Comparison with and analysis of large datasets• The definition of the biomass function will largely dic-tate the results of optimal growth/gene deletion experi-ments, consequently, positive results (in agreement withthe experiments) are not as interesting as the false negative

(or false positive results). The false negative results may bedue to an error in the annotation of the gene or an under-lying physiological mechanism that we are not aware of.In either case, investigating the particular genes experi-mentally may help clarify the issue.

• The model can be used as a Context for Biological Con-tent and provides a consistent, coherent framework toanalyze various datasets focusing on metabolism.

'Unbiased' analysisUnbiased analysis methods, methods that calculate prop-erties of the network independent of objective functions,can provide interesting and valuable insights into networkcapabilities [32]. The definition of HCR sets, in a similarvein to correlated reaction sets from sampling and flux-coupling, can confer a hierarchical structure to metabolicnetworks, and may assist in the identification of alterna-tive drug targets. Using these models in conjunction withexperimental data can assist in screening and identifyingnew and alternative drug targets.

Taken together, this study adds another high resolutionreconstruction of microbial metabolism. There now exista growing number of such reconstructions that have beenparticularly useful for basic studies of network propertiesand for bioprocessing applications [42-44]. As thenumber of reconstructions of human pathogens grows,we should be able to expand their uses towards improvedunderstanding pathogenic mechamisms, and design ofinterventions and treatment. Large-scale rigorous experi-mental validation studies should now be performed tofurther these goals.

MethodsReconstruction and model development of the metabolicnetwork has been described previously [7,8]. The recon-struction contents will be made available on our website[40] in Excel [see Additional file 5] and SBML formats.

ReconstructionFigure 1 provides a brief summary of the reconstructionand modeling building process. The sequence basedgenome annotation of Mycobacterium tuberculosisH37Rv was downloaded from TIGR [45] and served as theframework of the model. Charge and elementally bal-anced reactions were added individually based on thisannotation, legacy data when available, and updatesmade in the Tuberculist database [31]. KEGG [46] and theSEED [47] were used as ancillary tools on occasion. Fol-lowing the initial reconstruction, the gaps were evaluatedindividually by searching for direct evidence in the litera-ture for their metabolism. Due to the relative sparsity ofliterature on aspects of central and cofactor metabolism inM. tuberculosis, many of these gaps could not be filled. Leg-

Page 16 of 20(page number not for citation purposes)

Page 17: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

acy data in the form of primary articles, review articles,and textbooks were employed in addition to the databaseresources [see Additional file 6] during the reconstructionand model building phases. A comprehensive map ofiNJ661's metabolic network was visualized by creating amap of the network organized by lumped subsystems ofmetabolism [see Additional files 8, 9, 10, 11, 12, 13, 14,15]. After the debugging process, when the cell could grow(i.e. produce biomass) on different media, the remainingintracellular gaps were evaluated by searching the litera-ture for evidence of metabolic reactions involving thatparticular metabolite. If no evidence of transport or bio-chemical transformations of the metabolite in M. tb wasfound, no additional reactions or transporters wereadded.

Model formulationThe stoichiometric matrix, S, was constructed based onthe reactions described in the reconstruction. The biomassfunction was defined using available legacy data [seeAdditional file 1]. Flux Balance Analysis (FBA) simula-tions were employed during the debugging processtowards developing a functional model.

Flux Balance Analysis (FBA)The metabolic network is represented mathematically bythe stoichiometric matrix, m rows by n columns, wherethere are m metabolites and n reactions in the network.Mass conservation dictates

which simplifies to

S·v = 0

at the steady state. Further constraints can be placed onthe system based on thermodynamics (reaction direction-ality) and limits on substrate uptake rates

α ≤ v ≤ β

The oxygen uptake rate measured from the MycobacteriaBovis BCG strain of 25 μL/hr/mg dry weight (0.98 mmolO2/hr/g dry weight) [48] was used. The small moleculecarbon sources (glucose, glycerol, etc.) were limited to anuptake rate of 1 mmol/hr/g dry weight and all otherallowable substrates were unconstrained. The different insilico culture media are listed in Table 3. Deviations fromthe standard media include: the addition of minuteamounts of ferric iron in Youmans media. The oxygenuptake rate was the growth constraining flux in all of thedifferent media.

Flux Variability Analysis (FVA)Constraining the biomass objective function, c, to theoptimal value calculated in FBA,

max(cT·v)

every flux in the network is then minimized and maxi-mized. The different between the maximum and mini-mum for each flux defines the flux variability for thatreaction.

Robustness AnalysisRobustness plots are performed by varying a particularflux through a pre-defined range and recalculating theobjective function. The slope of the curve describes thesensitivity of the objective function on that particular flux(over the specified range of values).

Hard-Coupled Reaction (HCR) Sets

Denoting the binary form of the stoichiometric matrix, ,the input/reactant part of the stoichiometric matrix, S-,

and the output/product part of the stoichiometric matrix,S+, then it is transparent that,

The input and output connectivities of each metabolite

can be calculated for - and +, respectively. The

input:output ratio for each metabolite can be determinedand all metabolites with 1:1 connectivity will be hard-coupled by the network. Metabolites with 0:2 or 2:0 con-nectivity reflect metabolites that are hard-coupled in theevent that the corresponding reaction is reversible. Metab-olites with 0:1 or 1:0 connectivity are either source/sinktransporters or blocked reactions.

A general algorithm for calculating hard coupled reactionsets.

1. Identify 1:1, 0:2, and 2:0 metabolites

2. Define HCRs

a. Identify the sets of reactions corresponding to themetabolites, denote the entire set as HCR0

b. Join any HCR0 subset that share 1 or more reactions,denote the entire set as HCR1

c. i ← 1

3. while (HCRi-1 ≠ HCRi)

dx

dtS v= •

S

S S S= +− +ˆ ˆ

S S

Page 17 of 20(page number not for citation purposes)

Page 18: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

a. Join any HCRi subset that share 1 or more reactions

b. i ← i+1

Since 0:2 or 2:0 reactions may be irreversible reactions, anintermediate processing step may be needed to verify thatthere is biological evidence supporting the catalysis of thereaction in the reverse direction. If one does not wish toconsider these potential effects and assume that all of thereactions are reversible then step 1 can be simplified by

finding all of the 2 metabolites for . Biomass functionswere removed from the stoichiometric matrix before cal-culating the HCR sets. The HCRs calculated for iNJ661had 94 0:2/2:0 reactions. Of these, 24 overlapped with thecore set of 147 1:1 HCRs. Since the set of 0:2/2:0 reactionsdid not have additional gene associations (that weren'talready in the 1:1 set) and since they had many irreversi-ble reactions throughout, the 1:1 set and the 0:2/2:0 setwere not combined (step 1 in the above algorithm outlinewould involve only identifying the 1:1 metabolites).

The reconstruction process, the metabolic maps, and themajority of the calculations were performed using Sim-pheny™ v1.11 software (Genomatica, Inc.). HCR Sets werecalculated with Mathematica© v4.2 (Wolfram Research,Inc.).

Authors' contributionsNJ carried out the reconstruction, performed the analyses,and drafted the manuscript. BØP conceived the study andhelped revise the manuscript. Both authors approved thecontent of the final manuscript.

Additional material

Additional File 1Molecular composition of biomass functions. A detailed list of the biomass constituents and their respective coefficients in the biomass function in addition to the references from which this information was found.Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-26-S1.xls]

Additional File 2HCRs mapped to experimental data sets. The list of HCRs that mapped to the three experimental data, based on gene loci.Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-26-S2.xls]

S

Additional File 3HCRs mapped to reaction and metabolic subsystems. The complete list of HCRs, highlighting those known to be drug targets in M. tb.Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-26-S3.xls]

Additional File 4Comprehensive list of false negatives between OptGro data set and iNJ661 lethal gene deletions. A catalog of the false negative results pro-duced by iNJ661 when compared to the OptGro data set. Each of false negative results is categorized into at least one of four different categories. Specific loci are listed in cases with suggested alternative loci or alternative paths.Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-26-S4.xls]

Additional File 5Model content for iNJ661. A complete list of the reactions (abbreviations and full names), the associated mass and charge balanced reactions, con-fidence scores, reaction subsystems, and the list of the Gene-Protein-Reac-tion relationships, including logical AND/OR relationships, for all of the reactions in the model. A complete list of metabolites appearing in iNJ661, including abbreviation, full name, formula, and charge at pH 7.2 can be found in the second tab in the file.Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-26-S5.xls]

Additional File 6Explicit list of references used in the reconstruction process. The list of ref-erences used to define particular reactions and GPR relationships.Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-26-S6.pdf]

Additional File 7Robustness plots for phosphate and sulfate for iNJ661 and iJR904. Plot-ting the biomass objective function versus phosphate or sulfate uptake rates for in silico strains of M. tb and E. coli.Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-26-S7.pdf]

Additional File 8Amino acid metabolism. A map of the amino acid pathways in iNJ661.Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-26-S8.svg]

Additional File 9Central metabolism. A map of the central metabolic pathways in iNJ661.Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-26-S9.svg]

Additional File 10Fatty acid metabolism. A map of the fatty acid pathways in iNJ661.Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-26-S10.svg]

Page 18 of 20(page number not for citation purposes)

Page 19: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

AcknowledgementsThis work was supported in part by an NIH Training Grant. We also thank Mr. Scott Becker and Mr. Adam Feist for helpful comments and feedback. The authors do not declare any competing interests.

References1. Kasper D, Braunwald E, Fauci A, Hauser S, Longo D, Jameson J: Har-

rison's Principles of Internal Medicine. 16th edition. McGraw-Hill Professional; 2004.

2. Schneider E, Moore M, Castro KG: Epidemiology of tuberculosisin the United States. Clinics in chest medicine 2005, 26:183-195. v.

3. Small PM, Fujiwara PI: Management of tuberculosis in theUnited States. The New England journal of medicine 2001,345(3):189-200.

4. Sharma SK, Mohan A: Multidrug-resistant tuberculosis: a men-ace that threatens to destabilize tuberculosis control. Chest2006, 130(1):261-272.

5. Wayne LG, Sohaskey CD: Nonreplicating persistence of myco-bacterium tuberculosis. Annual review of microbiology 2001,55:139-163.

6. Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gor-don SV, Eiglmeier K, Gas S, Barry CE 3rd, et al.: Deciphering thebiology of Mycobacterium tuberculosis from the completegenome sequence. Nature 1998, 393(6685):537-544.

7. Becker SA, Palsson BO: Genome-scale reconstruction of themetabolic network in Staphylococcus aureus N315: an initialdraft to the two-dimensional annotation. BMC microbiology[electronic resource] 2005, 5(1):8.

8. Feist AM, Scholten JC, Palsson BO, Brockman FJ, Ideker T: Modelingmethanogenesis with a genome-scale metabolic reconstruc-tion of Methanosarcina barkeri. Molecular systems biology [elec-tronic resource] 2006, 2(2006):0004. 2006 0004

9. Khuller GK, Taneja R, Kaur S, Verma JN: Lipid composition andvirulence of Mycobacterium tuberculosis H37Rv. The Austral-ian journal of experimental biology and medical science 1982, 60(Pt5):541-547.

10. Nandedkar AK: Comparative study of the lipid composition ofparticular pathogenic and nonpathogenic species of Myco-bacterium. Journal of the National Medical Association 1983,75(1):69-74.

11. Watanabe M, Aoyagi Y, Ridell M, Minnikin DE: Separation andcharacterization of individual mycolic acids in representativemycobacteria. Microbiology (Reading, England) 2001, 147(Pt7):1825-1837.

12. Youmans AS, Youmans GP: Ribonucleic acid, deoxyribonucleicacid, and protein content of cells of different ages of Myco-bacterium tuberculosis and the ralationship to immuno-genicity. J Bacteriol 1968, 95(2):272-279.

13. Garcia-Vallve S, Guzman E, Montero MA, Romeu A: HGT-DB: adatabase of putative horizontally transferred genes inprokaryotic complete genomes. Nucleic acids research 2003,31(1):187-189.

14. Beste DJ, Peters J, Hooper T, Avignone-Rossa C, Bushell ME, McFad-den J: Compiling a molecular inventory for Mycobacteriumbovis BCG at two growth rates: evidence for growth rate-mediated regulation of ribosome biosynthesis and lipidmetabolism. J Bacteriol 2005, 187(5):1677-1684.

15. Acharya PV, Goldman DS: Chemical composition of the cell wallof the H37Ra strain of Mycobacterium tuberculosis. J Bacteriol1970, 102(3):733-739.

16. Middlebrook G, Cohn ML: Bacteriology of tuberculosis: labora-tory methods. American journal of public health 1958,48(7):844-853.

17. Youmans , Karlson : 1947.18. James BW, Williams A, Marsh PD: The physiology and patho-

genicity of Mycobacterium tuberculosis grown under con-trolled conditions in a defined medium. Journal of appliedmicrobiology 2000, 88(4):669-677.

19. Cox RA: Quantitative relationships for specific growth ratesand macromolecular compositions of Mycobacterium tuber-culosis, Streptomyces coelicolor A3(2) and Escherichia coli B/r: an integrative theoretical approach. Microbiology (Reading,England) 2004, 150(Pt 5):1413-1426.

20. Primm TP, Andersen SJ, Mizrahi V, Avarbock D, Rubin H, Barry CE3rd: The stringent response of Mycobacterium tuberculosis isrequired for long-term survival. J Bacteriol 2000,182(17):4889-4898.

21. Edwards J, Ramakrishna R, Palsson B: Characterizing the meta-bolic phenotype: a phenotype phase plane analysis. BiotechnolBioeng 2002, 77(1):27-36.

22. Reed J, Palsson B: Genome-scale in silico models of E. coli havemultiple equivalent phenotypic states: assessment of corre-lated reaction subsets that comprise network states. GenomeRes 2004, 14(9):1797-1805.

23. Sassetti CM, Boyd DH, Rubin EJ: Genes required for mycobacte-rial growth defined by high density mutagenesis. Mol Microbiol2003, 48(1):77-84.

24. Reed JL, Vo TD, Schilling CH, Palsson BO: An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR).Genome biology 2003, 4(9):R54.

25. Cole ST: Tuberculosis and the tubercle bacillus. Washington,DC: ASM Press; 2005.

26. King GM: Uptake of carbon monoxide and hydrogen at envi-ronmentally relevant concentrations by mycobacteria. ApplEnviron Microbiol 2003, 69(12):7266-7272.

27. Park SW, Hwang EH, Park H, Kim JA, Heo J, Lee KH, Song T, Kim E,Ro YT, Kim SW, et al.: Growth of mycobacteria on carbonmon-oxide and methanol. J Bacteriol 2003, 185(1):142-147.

28. Srinivasan V, Morowitz HJ: Ancient genes in contemporary per-sistent microbial pathogens. The Biological bulletin 2006,210(1):1-9.

29. Gao Q, Kripke KE, Saldanha AJ, Yan W, Holmes S, Small PM: Geneexpression diversity among Mycobacterium tuberculosis

Additional File 11Membrane metabolism. A map of the membrane metabolism pathways in iNJ661.Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-26-S11.svg]

Additional File 12Nucleotide metabolism. A map of the nucleotide synthesis and degrada-tion pathways in iNJ661.Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-26-S12.svg]

Additional File 13Peptidoglycan metabolism. A map of the peptidoglycan pathways in iNJ661.Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-26-S13.svg]

Additional File 14Transporters. A map of the transport reactions found in iNJ661.Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-26-S14.svg]

Additional File 15Vitamin and cofactor metabolism. A map of the vitamin and cofactor pathways in iNJ661.Click here for file[http://www.biomedcentral.com/content/supplementary/1752-0509-1-26-S15.svg]

Page 19 of 20(page number not for citation purposes)

Page 20: BMC Systems Biology BioMed Central - Springer...structured (BiGG) 'databases'. We employ constraint-based reconstruction and analysis (COBRA) of this BiGG reconstruction to learn about

BMC Systems Biology 2007, 1:26 http://www.biomedcentral.com/1752-0509/1/26

Publish with BioMed Central and every scientist can read your work free of charge

"BioMed Central will be the most significant development for disseminating the results of biomedical research in our lifetime."

Sir Paul Nurse, Cancer Research UK

Your research papers will be:

available free of charge to the entire biomedical community

peer reviewed and published immediately upon acceptance

cited in PubMed and archived on PubMed Central

yours — you keep the copyright

Submit your manuscript here:http://www.biomedcentral.com/info/publishing_adv.asp

BioMedcentral

clinical isolates. Microbiology (Reading, England) 2005, 151(Pt1):5-14.

30. Sassetti CM, Rubin EJ: Genetic requirements for mycobacterialsurvival during infection. Proc Natl Acad Sci USA 2003,100(22):12989-12994.

31. Tuberculist Web Server, Pasteur Institute [http://genolist.pasteur.fr/TubercuList/]

32. Price ND, Reed JL, Palsson BO: Genome-scale models of micro-bial cells: evaluating the consequences of constraints. Nat RevMicrobiol 2004, 2(11):886-897.

33. Papin JA, Reed JL, Palsson BO: Hierarchical thinking in networkbiology: the unbiased modularization of biochemical net-works. Trends Biochem Sci 2004, 29(12):641-647.

34. Thiele I, Price ND, Vo TD, Palsson BO: Candidate metabolic net-work states in human mitochondria. Impact of diabetes,ischemia, and diet. J Biol Chem 2005, 280(12):11683-11695.

35. Burgard AP, Nikolaev EV, Schilling CH, Maranas CD: Flux couplinganalysis of genome-scale metabolic network reconstruc-tions. Genome Res 2004, 14(2):301-312.

36. Jamshidi N, Palsson BO: Systems biology of SNPs. Molecular sys-tems biology [electronic resource] 2006, 2:38.

37. Pfeiffer T, Sanchez-Valdenebro I, Nuno JC, Montero F, Schuster S:METATOOL: for studying metabolic networks. Bioinformatics1999, 15(3):251-257.

38. Palsson BO: Systems Biology: Determining the Capabilities ofReconstructed Networks. Cambridge Univ Pr; 2006.

39. Mdluli K, Spigelman M: Novel targets for tuberculosis drug dis-covery. Current opinion in pharmacology 2006, 6(5):459-467.

40. Systems Biology Research Group, UCSD [http://systemsbiology.ucsd.edu/]

41. Raman K, Rajagopalan P, Chandra N: Flux balance analysis ofmycolic Acid pathway: targets for anti-tubercular drugs.PLoS computational biology 2005, 1(5):e46.

42. Reed JL, Famili I, Thiele I, Palsson BO: Towards multidimensionalgenome annotation. Nat Rev Genet 2006, 7(2):130-141.

43. Hjersted JL, Henson MA, Mahadevan R: Genome-scale analysis ofSaccharomyces cerevisiae metabolism and ethanol produc-tion in fed-batch culture. Biotechnol Bioeng 2007.

44. Van Dien SJ, Lidstrom ME: Stoichiometric model for evaluatingthe metabolic capabilities of the facultative methylotrophMethylobacterium extorquens AM1, with application toreconstruction of C(3) and C(4) metabolism. Biotechnol Bioeng2002, 78(3):296-312.

45. The Institute for Genomic Research [http://www.tigr.org/]46. Kyoto Encyclopedia of Genes and Genomes [http://

www.genome.jp/kegg/]47. The SEED [http://theseed.uchicago.edu/FIG/index.cgi]48. van Hemert PA, Tiesjema RH: Possible use of the oxygen uptake

rate in the evaluation of BCG vaccines. Journal of biological stand-ardization 1977, 5(2):121-129.

Page 20 of 20(page number not for citation purposes)