The Compositional and Evolutionary Logic of Metabolism
Rogier Braakman
D. Eric Smith

SFI WORKING PAPER: 2012-08-011

SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent the views of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or proceedings volumes, but not papers that have already appeared in print. Except for papers by our external faculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, or funded by an SFI grant.

©NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the author(s). It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only with the explicit permission of the copyright holder.

www.santafe.edu

SANTA FE INSTITUTE



The compositional and evolutionary logic of metabolism

Rogier Braakman and Eric Smith
Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA

(Dated: July 10, 2012)

Metabolism is built on a foundation of organic chemistry, and employs structures and interactions at many scales. Despite these sources of complexity, metabolism also displays striking and robust regularities in the forms of modularity and hierarchy, which may be described compactly in terms of relatively few principles of composition. These regularities render metabolic architecture comprehensible as a system, and also suggest the order in which layers of that system came into existence. In addition, metabolism serves as a foundational layer in other hierarchies, up to at least the levels of cellular integration including bioenergetics and molecular replication, and trophic ecology. The recapitulation of patterns first seen in metabolism, in these higher levels, motivates us to interpret metabolism as a source of causation or constraint on many forms of organization in the biosphere. Many of the forms of modularity and hierarchy exhibited by metabolism are readily interpreted as stages in the emergence of catalytic control by living systems over organic chemistry, sometimes recapitulating or incorporating geochemical mechanisms.

We identify as modules either subsets of chemicals and reactions, or subsets of functions, that are re-used in many contexts with a conserved internal structure. At the small-molecule substrate level, module boundaries are often associated with the most complex reaction mechanisms, catalyzed by highly conserved enzymes. Cofactors form a biosynthetically and functionally distinctive control layer over the small-molecule substrate. The most complex members among the cofactors are often associated with the reactions at module boundaries in the substrate networks, while simpler cofactors participate in widely generalized reactions. The highly tuned chemical structures of cofactors (sometimes exploiting distinctive properties of the elements of the periodic table) thereby act as “keys” that incorporate classes of organic reactions within biochemistry.

Module boundaries provide the interfaces where change is concentrated, when we catalogue extant diversity of metabolic phenotypes. The same modules that organize the compositional diversity of metabolism are argued, with many explicit examples, to have governed long-term evolution. Early evolution of core metabolism, and especially of carbon-fixation, appears to have required very few innovations, and to have used few rules of composition of conserved modules, to produce adaptations to simple chemical or energetic differences of environment without diverse solutions and without historical contingency. We demonstrate these features of metabolism at each of several levels of hierarchy, beginning with the small-molecule metabolic substrate and network architecture, continuing with cofactors and key conserved reactions, and culminating in the aggregation of multiple diverse physical and biochemical processes in cells.

I. INTRODUCTION

The chemistry of life is distinguished by being both highly ordered and far from thermodynamic equilibrium [1].¹ This dynamical order is maintained by the non-equilibrium transfer of electrons through the biosphere. Free energy from potential differences between electron donors and acceptors can be derived from a variety of biogeochemical cycles [4], but within cells electron transfer is mediated by a small number of universal electron carriers which drive a limited array of organic reactions [5]. Together these reactions make up metabolism, which governs the chemical dynamics both within organisms and across ecosystems. The universal and apparently conserved metabolic network transcends all known species diversification and evolutionary change [6, 7], and distinguishes the biosphere within the major classes of planetary processes.² We identify metabolism with the quite specific substrate architecture and hierarchical control flow of this network, which provide the most essential characterization of the chemical nature of the living state.

¹ Applying the often-invoked term “far from equilibrium” to biochemistry requires care. When catalysts (including transporters or other molecular machinery) create a separation of timescales between supported reactions and autonomous parasitic reactions, the supported reactions can sometimes be treated as an isolated subsystem with equilibrium approximations [2, 3], though the isolation itself is a cumulative deviation far from equilibrium.

Understanding the structure of metabolism is central to understanding how physics and chemistry constrain life and evolution. The polymerization of monomers into selected functional macromolecules, and the even more complex integration and replication of complete cells, form a well-recognized hierarchy of coordination and information-carrying processes. However, in the sequence of biosynthesis these processes come late, and they involve a much smaller and simpler set of chemical reactions than core metabolism, the network in which all basic monomer components of biomass are created from environmental inputs. Because the core is the origin of all biomass, its flux is perforce higher than that in any secondary process; only membrane electron transport (reviewed in Ref. [4]) has higher energy flux.³ Metabolism is the sub-space of organic chemistry over which life has gained catalytic control. Because in the construction and optimization of biological phenotypes all matter flows through this sub-space, its internal structure imposes a strong filter on evolution.

² Ref. [8] first proposed classifying the biosphere as the fourth “geosphere”, parallel to the lithosphere, hydrosphere, and atmosphere that have provided a classical taxonomy in geology.

In this review we identify a number of organizing principles behind the major universal structures and functions of metabolism. They provide a simple characterization of metabolic architecture, particularly in relation to microbial metabolism, ecology, and phylogeny, and the major (biogeochemical) transitions in evolution. We often find the same patterns of organization recapitulated at multiple scales of time, size, or complexity, and can trace these to specific underlying chemistry, network topology, or robustness mechanisms. Acting as constraints and sources of adaptive variation, they have governed the evolution of metabolism since the earliest cells, and some of them may have governed its emergence. They allow us to make plausible reconstructions of the history of metabolic innovations and also to explain certain strong evolutionary convergences and the long-term persistence of the core components of metabolic architecture.

Many structural motifs in both the substrate and control levels of metabolism may be interpreted as functional modules. By isolating effects of perturbation and error, modularity can both facilitate emergence, and support robust function, of hierarchical complex systems [10, 11]. It may also affect the large-scale structure of evolution by favoring variation in the regulation and linkage between modules, while conserving and thereby minimizing disruption of their internal architecture and stability [12, 13]. This can enhance evolvability through two separate effects. An increased phenotypic (i.e. structural or functional, as opposed to genotypic or sequence) robustness of individual modules gives access to larger genetic neutral spaces and thus a greater number of novel phenotypes at the boundaries of these spaces [14]. At the same time, concentrating change at module interfaces, and allowing combinatorial variation at the module level, can decrease the amount of genetic variation needed to generate heritable changes in aggregate phenotypes [15, 16]. It has been argued that asymmetries in evolutionary constraints can be amplified through direct selection for evolvability, and that this is a central source of modularity and hierarchy within biological systems [15–18].

³ Ref. [9] notes that, over a broad sample of enzymes collected from the literature, those for secondary metabolic reactions have rates ∼ 1/30 the typical rates of enzymes for core reactions.

These functional consequences of modularity lead us to expect that intermediary metabolism will be modular as a reflection of the requirements of emergence and internal stability. Certainly we observe this empirically; many topological analyses of metabolic networks find a modular and hierarchical structure [19–21]. We also expect that, with the numerous and diverse constraints from chemistry and physics in core metabolism, and their large impact on metabolic flux, the evolutionary consequences of chemical modularity will be greatest from the core, and will diminish as chemical mechanisms are simplified, and the impact on flux is reduced, in more peripheral stages of biosynthesis.

To understand the origin and evolutionary consequences of modularity in metabolism, however, we will need system-level representations that go beyond topology, to include sometimes quite particular distinctions of function. Details of substrate chemistry, enzyme grouping and conservation, and phylogenies of metabolic modules, in particular, are rich sources of functional information and context. These will enable us to reconstruct steps in metabolic evolution and identify their environmental drivers.

A. Hierarchy in metabolism, and the role of individuals and ecosystems

While most metabolic conversions are performed within cells,⁴ the structure of metabolism spans many levels of biological organization. The causes and roles of evolutionary changes, even though they arise within cellular lineages, may be only partly explained by organization at the cellular or species level. Other levels that must also be considered include the meta-metabolome of trophic ecosystems [22–24], and the links to geochemistry [25–32]. The great biochemical cycles – of carbon, nitrogen, phosphorus, or many metals – combine physiological, ecological, and even geochemical links such as mantle convection or continental weathering. The deepest universal features of metabolism are reliably seen at the ecosystem level [7, 33] but not necessarily within organisms [34].

These observations could be summarized as showing that individuality is a derived characteristic of living systems within a larger framework of metabolic regularity, a perspective that fits well with the modern understanding that individuality takes many forms which must be explained within their contexts [35]. Alternatively, in more conventional genetic descriptions of evolution [36, 37], metabolic completeness, trophic as well as physiological flux balance, and network-level response to fluctuations are explicit features contributing to an organism’s fitness within a co-evolving or constructed environment [38].

⁴ Exceptions include siderophores and secreted enzymes, most often used at the cell-population level.

We can to a considerable extent disentangle the inherent chemical hierarchy of metabolism from the evolutionary hierarchy of species by studying variations in the anabolic (biosynthetic) versus catabolic (degradative) pathways within organisms, along with the relations of autotrophy (self-feeding) versus heterotrophy (feeding from others) in the ecological roles of species. We can argue for the existence of a universal anabolic, autotrophic network [39, 40] that comprises the chemistry essential to life. We can then separate the structural requirements and evolutionary history of the universal network from secondary complexities, which we will argue originate in the diversification of species and the concurrent processes of assembly of ecological communities.

Within the universal (and apparently essential) network we may identify further layers, with distinct functions and plausibly distinct origins. A functioning metabolism is both a network of fluxes through substrate molecules, and a set of hierarchical relations in which some of the more complex structures control the kinetics of flows within the network. Within the substrate network, distinguishable subnetworks include the core network to synthesize CHO backbones, networks radiating from the core that incorporate N, S, P, or metals, higher-order networks that assemble complex organics from “building blocks”, and still others that synthesize all forms of polymers from small organic monomers. Within the control hierarchy, the layers of cofactors, oligomer catalysts, and integrated cellular energetic and biosynthetic subsystems are qualitatively distinct.

The foundation of autotrophy – and more generally the anchor that embeds the biosphere within geochemistry – is carbon-fixation, the transformation of CO2 into small organic molecules (see Fig. 1). A recent study [41] combining evidence from phylogeny and metabolic network reconstruction⁵ showed that all carbon fixation phenotypes may be related by an evolutionary tree with very high (nearly perfect) parsimony, and a novel but sensible phenotype at the root. The branches representing innovations in carbon fixation were found to trace the standard deep divergences of bacteria and archaea. More striking, likely environmental drivers could be identified for most divergences, suggesting that deep evolution reflects first incursions into novel geochemical environments. The tight coupling of the reconstructed phylogeny to geochemical variety suggests that constraints from chemistry and energetics drove early evolution in predictable ways, leaving little need to invoke historical contingency.

⁵ We refer to this approach as “phylometabolic” reconstruction.

[Figure 1 schematic. Labels: CO2, Carbon-fixation, Anabolism, Biomass, Catabolism, Respiration; interface molecules ACA, PYR, OXA, AKG, GLY.]

FIG. 1: The metabolic structure of the biosphere. The biosphere as a whole can be described as implementing a global biological carbon cycle based on CO2, with carbon-fixation as the metabolic foundation. The small organic molecules produced during fixation of CO2 are subsequently transformed and built up into the full diversity of known biomolecules through the process of anabolism, before ultimately being broken back down through catabolism and re-released as CO2 through respiration. The striking modularity of metabolism is expressed in the fact that the interface between carbon-fixation and anabolism consists of a very small number of small organic molecules (shown schematically at right-center). The key observation that in addition to intermediates in the citric acid cycle – from which nearly all anabolic pathways emanate [7] (see Fig. 6) – glycine (red) should be included in this set allows a complete reconstruction of the evolutionary history of carbon-fixation [41] (see also Figures 10 and 11). Abbreviations: Acetyl-CoA (ACA); Pyruvate (PYR); Oxaloacetate (OXA); α-Ketoglutarate (AKG); Glycine (GLY).

B. Catalytic control and origins of modularity in metabolism

While carbon-fixation draws on all levels of biological organization (requiring integration and control of many cellular components), evolution in the network of its small-molecule substrate has consisted only of changes in the use of a small number of clearly defined reaction sequences. The disruption, disconnection, or reversal of these modules accounts for the full diversity of modern carbon-fixation. As we will show below, the module structure is further defined by a distinction between two types of chemistry. Within modules, the reactions are mainly (de-)hydration or (de-)hydrogenation reactions, catalyzed by enzymes from common and highly-diversified families. Module interfaces are created (and distinguished) by key carboxylation reactions, catalyzed by highly conserved enzymes, often involving special metal centers and/or complex organic cofactors. The congruence of phylogenetic branching with topological and chemical module boundaries suggests that a very small number of catalytic innovations were the key bottlenecks to evolutionary diversification, against a background of facile and readily re-used organic chemistry.
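The distinction between within-module chemistry and module-interface chemistry can be made concrete with a minimal sketch. It assumes a linear reaction sequence in which each step has already been labeled by reaction type; the metabolite names `A` through `H`, the list `PATHWAY`, and the function `split_at_boundaries` are all hypothetical and do not correspond to any real pathway or to the authors' method.

```python
# Hypothetical linear pathway: (step label, reaction type).
# Metabolites A..H are placeholders, not real compounds.
PATHWAY = [
    ("A->B", "carboxylation"),
    ("B->C", "hydrogenation"),
    ("C->D", "dehydration"),
    ("D->E", "hydrogenation"),
    ("E->F", "carboxylation"),
    ("F->G", "hydration"),
    ("G->H", "dehydrogenation"),
]


def split_at_boundaries(pathway, boundary_type="carboxylation"):
    """Split a reaction sequence into modules at steps of `boundary_type`.

    Returns (modules, boundaries): `modules` is a list of lists of step
    labels lying between boundary steps, and `boundaries` lists the
    boundary steps themselves.
    """
    modules, boundaries, current = [], [], []
    for label, reaction_type in pathway:
        if reaction_type == boundary_type:
            boundaries.append(label)
            if current:  # close the module that preceded this interface
                modules.append(current)
                current = []
        else:
            current.append(label)
    if current:  # close the trailing module
        modules.append(current)
    return modules, boundaries


modules, boundaries = split_at_boundaries(PATHWAY)
print("module interfaces:", boundaries)
print("modules:", modules)
```

In this toy sequence the two carboxylation steps act as interfaces, and the runs of hydration and hydrogenation steps between them form the module interiors.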

Topological modularity in the small-molecule substrate network is often associated with functional divisions in the more complex molecules that control metabolism, particularly the cofactors, showing that their metabolic role is also an evolutionary role. As carriers of electrons or essential functional groups, cofactors regulate kinetic bottlenecks in metabolic networks. The appearance and diversification of families of biosynthetically related cofactors introduced functions which served as “keys” to domains in organic chemistry, incorporating these within biochemistry. Often we may map biosynthetic pathway diversification of cofactors onto particular lineage divergences in the tree of life. Cofactor biosynthetic networks are themselves modular, with multiple biosynthetic pathways in a family using closely related enzymes that enable structures characteristic of the cofactor class.

The quite sharply defined roles of many modules enable us to understand strong evolutionary convergences that have occurred within fundamental biochemistry, and in some cases we can relate the functioning of an entire class of substrate or control molecules to specific chemical properties of elements or small chemical groups. Several important module boundaries are aligned at the same points in their substrate networks and their control layers. This suggests to us that lower-level substrate-reaction networks introduced constraints on the accessible or robust forms of catalysis and aggregation that it was later possible to build up over them. From repeated motifs within the substructure of modules, and from patterns of re-use or convergence, we may identify chemical constraints on major transitions in metabolic evolution, and we may separate the early functions of promiscuous catalysts as enablers of chemistry, from later restrictions of reactants as catalysts were made more specific. The remarkable fact that such low-level chemical distinctions (in elements, reactions, or small-molecule networks) should have created constraints on innovation well into the Darwinian era of modern cells suggests these as relevant constraints also in the pre-cellular era.

C. Manuscript outline

Our main message is twofold: 1) that the structure of biosynthetic networks and their observed variation, even though the networks are elaborate, has a compact representation in terms of a small collection of rules for composition, and 2) that the same rules we abstract from composition have a natural interpretation as constraints on evolutionary dynamics, which as a generating process has produced the observed variants. We intend the expression “logic of metabolism” to refer to the collection of architectural motifs and functions that have apparently been necessary for persistence of the biosphere, that have led to modularity in the physics and chemistry of life, and that have determined its major evolutionary contingencies and convergences.

After a short description of the important global features of metabolism in Sec. II, we will construct these at ascending levels in the hierarchy, beginning in Sec. III with the networks of core carbon fixation and the lowest levels of intermediary metabolism. We will then, in Sec. IV, consider cofactors as the intermediate level of structure and the first level of explicit control in biochemistry, illustrating how key cofactor classes govern the fixation and transfer of elementary carbon units, and introduce control over reductants and redox state. Both in the metabolic substrate and in the cofactor domain, it will be possible to suggest a specific historical order for many major innovations. For the substrate network this will capture conditional dependencies in the innovation of carbon fixation strategies. For cofactors it will allow us to approximately place the emergence of specific cofactor functionalities within the expansion of metabolic networks from inorganic inputs.

In Sec. V we consider the processes by which innovation occurs, specifically the interplay of the introduction of general reaction mechanisms versus selectivity over substrates. The modular substructure and evolutionary sequence of many of our reconstructed innovations favor an early role for non-specific catalysts, with substrate selectivity appearing later. Finally, in Sec. VI we list candidates for the major organizing constraints on integration of metabolism within cells. These include the role of compartments in linking energy systems, as well as the coupling of physiological and genetic individuality, which permit species differentiation, and complementary specialization within ecological assemblies.

II. AN OVERVIEW OF THE ARCHITECTURE OF METABOLISM

A. Anabolism and catabolism in individuals and ecosystems

Metabolic networks within organisms are commonly characterized [42, 43] as having three classes of pathways: 1) catabolic pathways that break down organic food to provide chemical “building blocks” or energy; 2) core pathways through which nearly all small metabolites pass during primary synthesis or ultimate breakdown, and 3) anabolic pathways that build up all complex chemicals from components originating in the core. This qualitative characterization (which may be complicated by salvage pathways and other cross-linkages) is supported by a strong statistical observation that most minimal pathways connecting pairs of metabolites consist of a catabolic and an anabolic segment connected through the core [44]. Thus, relatively speaking, the catabolic and anabolic pathways are less densely crosslinked than pathways within the core, from which they radiate. Catabolic pathways in a cell may be fed through physiological or trophic links to other cells or organisms, or they may break down food produced previously by the same cell and then stored. Fig. 2 illustrates schematically the relation of the three classes. Both catabolic and anabolic pathways may be large and somewhat diversified; the core itself constitutes no more than a few hundred small metabolites [39, 40], most of which have functions that are universal throughout the biosphere.
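The statement that minimal pathways between metabolite pairs tend to run through the core can be illustrated with a small sketch. It assumes an undirected toy graph whose node names (`food_*` for catabolic branches, `core_*` for the core, `prod_*` for anabolic branches) and edges are invented for illustration; it is not the analysis of Ref. [44], only a breadth-first shortest-path search on a hand-built example.

```python
from collections import deque

# Hypothetical toy network (undirected). Sparse branches attach to a
# densely linked core, mimicking the three pathway classes.
EDGES = [
    ("food_a", "food_b"), ("food_b", "core_1"),
    ("food_c", "core_2"),
    ("core_1", "core_2"), ("core_2", "core_3"), ("core_1", "core_3"),
    ("core_3", "prod_a"), ("prod_a", "prod_b"),
    ("core_1", "prod_c"),
]


def build_graph(edges):
    """Build an adjacency map from a list of undirected edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(graph[node]):  # sorted for determinism
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def passes_through_core(path):
    """True if any node on the path belongs to the core."""
    return any(node.startswith("core_") for node in path)


graph = build_graph(EDGES)
path = shortest_path(graph, "food_a", "prod_b")
print(path, passes_through_core(path))
```

Because the branches in this toy graph connect to one another only via the core, every shortest path from a `food_*` node to a `prod_*` node consists of a catabolic segment, a core segment, and an anabolic segment. In a real network the claim is statistical rather than exact.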

[Figure 2 schematic. Panel labels: Net Biosphere; Heterotrophic Individual; Autotrophic Individual; Trophic ecosystem. Component labels: Biomass, Anabolism, Catabolism, Core.]

FIG. 2: Global structure of metabolism. Anabolic pathways (blue) build biomass and catabolic pathways (red), which may be their direct reverses, break it down. Carbon enters the biosphere through the core (black), which is the starting point of anabolism and also the endpoint of respiration. Because the biosphere as a whole is autotrophic, anabolism is functionally prior to catabolism. Both single organisms and assemblies of autotrophs can possess metabolic charts consisting only of anabolic pathways that fan outward from the core (blue and green). By partitioning pathway directions between anabolic and catabolic (joined at the core), organisms can take on the familiar “bowtie” architecture of derived metabolism (red with blue). Their assembly into trophic ecosystems (blue and red radial graph) then both builds and degrades organic compounds actively, cycling carbon between environmental CO2 and biomass (green). In these graphs, concentric (green) shells reflect sequential steps in biosynthesis leading to a hierarchy of increasing molecular complexity.

Whole-organism metabolisms are conventionally divided into two classes – autotrophs and heterotrophs – according to the ways they combine anabolic and catabolic pathways [5]. Autotrophs synthesize all required metabolites from inorganic precursors, and can function without catabolism, using only the core and anabolic pathways radiating from it.⁶ Heterotrophs, in contrast, are organisms that must obtain organic inputs from their environments because they lack essential biosynthetic pathways. As a result of this difference, the two classes of metabolism have fundamentally different ecological roles.

⁶ Establishing this completeness can prove challenging, however [45].

Autotrophs form the lowest trophic level in the biosphere, fixing CO2 into organic matter, while heterotrophs form all subsequent levels, determining the structure of flows of organic compounds in trophic webs [46], and actively cycling carbon from biomass back to environmental CO2. While all biological free energy passes at some stage through redox couples, autotrophs capture a part of this energy by transferring electrons from high energy reductants to CO2 [7]. Heterotrophs may exploit incomplete use of this free energy through internal redox reactions (fermentation), or they may re-oxidize organic matter back to CO2 (respiration).
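The chemical definition of autotrophy used here (every required metabolite is reachable from inorganic inputs) can be sketched as a simple reachability test. The reaction names, metabolite names, and the functions `producible` and `is_autotrophic` below are hypothetical and chosen only for illustration; real reconstructions involve stoichiometry, cofactors, and energetics that this toy ignores.

```python
# Hypothetical toy reactions: name -> (substrates, products).
REACTIONS = {
    "fix":  ({"CO2", "H2"}, {"acetate"}),          # carbon fixation
    "ana1": ({"acetate", "NH3"}, {"amino_acid"}),  # anabolic step
    "ana2": ({"acetate"}, {"lipid"}),              # anabolic step
    "cat1": ({"sugar"}, {"acetate"}),              # catabolic step
}


def producible(reaction_names, seed):
    """Return all metabolites reachable from `seed` using the reactions.

    A reaction fires once all of its substrates are available; the loop
    repeats until no reaction adds anything new.
    """
    available = set(seed)
    changed = True
    while changed:
        changed = False
        for name in reaction_names:
            substrates, products = REACTIONS[name]
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def is_autotrophic(reaction_names, inorganic_seed, biomass):
    """True if all biomass components are reachable from inorganic inputs."""
    return biomass <= producible(reaction_names, inorganic_seed)


INORGANIC = {"CO2", "H2", "NH3"}
BIOMASS = {"amino_acid", "lipid"}

autotroph = ["fix", "ana1", "ana2"]     # complete anabolism
heterotroph = ["cat1", "ana1", "ana2"]  # lacks fixation, needs "sugar"

print(is_autotrophic(autotroph, INORGANIC, BIOMASS))
print(is_autotrophic(heterotroph, INORGANIC, BIOMASS))
print(is_autotrophic(heterotroph, INORGANIC | {"sugar"}, BIOMASS))
```

In this sketch the heterotroph's reaction set reaches biomass only when an organic input is added to the seed, whereas the union of the two reaction sets is again complete on inorganic inputs alone.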

The role of catabolism in most organisms is closely tied to their ecological role as heterotrophs. Heterotrophy provides enormous opportunity for metabolic diversification [34], in the evolution of catabolic pathways and the partitioning of essential anabolic reactions among the constituent species within ecosystems. However, the study of metabolism restricted to particular heterotrophic organisms⁷ can obscure much of its universality: heterotrophs may differ widely, but the aggregate anabolic networks that sustain them at the level of ecosystems are largely invariant. Autotrophs show that much of this diversity is not essential to life, allowing us to conceptually separate the requirements for biosynthesis from complexities that originate in processes of individual specialization and ecological assembly [47].

The motif of three-stage pathways – catabolic, core, anabolic – between typical pairs of metabolites, a motif obtained through the study of heterotrophs, has been abstracted into a paradigm of “bowtie” architecture for metabolism [42–44].⁸ However, in combining universal elements of metabolic dependency with widely variable physiological or ecological specializations, the “bowtie” can be misleading. The core and anabolism are essential (and we argue more ancestral), and the reduction in cross-linking with distance from the core may be seen to reflect an entirely outgoing radial “fan” of anabolism. Biomass is organized in a sequence of concentric shells spanned by the radial pathways, which count the number and complexity of biosynthetic steps [40]. Organisms exist which can function without catabolism, but only the most derived parasites lack anabolism. Moreover, many of the common catabolic pathways are (approximate or exact) reversals of widespread anabolic pathways,⁹ and are explained as consequences of ecological change. Finer diversifications arise as adaptations to specific ecological or geochemical environments. Therefore, in this review we will emphasize core pathways and a subset of anabolic pathways, as they contribute to the universal aspects of autotrophic networks.

⁷ Almost all model organisms have been heterotrophs, because these are accessible and are usually connected to humans as symbionts, pathogens, or cultivars. E. coli (in which operons were discovered) is a phenotypically and trophically very plastic organism due to its complex lifecycle. All multicellular organisms are heterotrophs, including plants, since these fix carbon but rely on soil symbionts to fix nitrogen. The only known autotrophic organisms are bacteria and archaea, and no autotroph is currently well-developed as a model organism.

⁸ The paradigm of the metabolic bowtie is also in part a borrowing from a conventional paradigm in engineering [42], motivated by applications to human physiology and medicine (John Doyle, pers. comm.).

The conceptual difference and asymmetry between autotrophy and heterotrophy becomes clearer when we examine the metabolic structure of ecosystems at increasing scales of aggregation. Entire ecosystems, to the extent that they are approximately closed, function chemically as autotrophs. The biosphere as a whole is not only approximately, but fully autotrophic, as today it does not depend significantly on extraterrestrially, atmospherically, or geologically produced organics. This observation still admits two possibilities for the emergence of aggregate metabolism: Either the biosphere has been autotrophic since its inception, or it was originally heterotrophic and later switched to using CO2 as its sole carbon source.

The Oparin-Haldane conjecture [49, 50] has motivated some consideration of a catabolic origin of life, but we can find no close empirical contact of this conjecture with features of extant life. Therefore we will assume the primacy of the core and anabolic pathways, and will consider the problem of emergence and early evolution of fully autotrophic systems. As long as we do not conflate the chemical condition of autotrophy (complete anabolism) with assumptions about individuality (whether complete anabolisms are contained within the regulatory control of individual organisms) [47], and as long as we recognize the ecosystem as potentially the correct level of aggregation to define autotrophy, we need not assume that the first life consisted of autotrophic individual cells. Our interpretation extends equally to populations of organisms that were physiologically as well as genetically incomplete and functioned cooperatively [51–54].

Once organism-level and species-level organization has been put aside as a separate question, the chemical distinction between heterotrophy and autotrophy is a distinction between metabolic partial-systems with unknown and highly variable boundary conditions, versus whole-systems required to subsist on CO2 and reductant. If we wish to understand the structure of the biosphere and to interpret the sequence of innovations in core carbon fixation, the added constraints of autotrophy provide a framework to do this. Finally, identifying the chemical nature of life with autotrophic metabolism, rather than modeling it on species heterotrophy as in the Oparin-Haldane conjecture, is compatible with a hypothesis of continuity in the emergence of life [47]. Rather than reinventing metabolism as a palimpsest over earlier abiotic sources [55], we suppose that organic life subsumed and controlled organosynthetic processes of geochemical origin [27, 56].

9 An example is glycolysis, which is the reverse of gluconeogenesis [48].

B. Network topology, self-amplification, and levels of structure

Understanding either the emergence of life, or the robust persistence of the biosphere, requires understanding life’s capacity for exponential growth. Exponential growth results from proportional self-amplification of metabolic and other networks that have an “autocatalytic” topology [57–61] (see Fig. 3). Network autocatalysis is a term used to describe a topological (stoichiometric) property of the substrate network of chemical reactions. In a catalytic network, one or more of the network intermediates is needed as a substrate to enable the pathway to connect to its inputs or to convert them to outputs, but the catalytic species is regenerated by the stage at which the pathway completes. Network-catalytic pathways must therefore incorporate feedback and comprise one or more loops with regard to the internally produced molecules. An autocatalytic network is a catalytic network augmented by further reactions that convert outputs to additional copies of the network catalyst, rendering the pathway self-amplifying.10
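The stoichiometric distinction between a catalytic and an autocatalytic network can be illustrated with a minimal sketch. The species names (X, A, XA, XA2, P) and the helper functions are hypothetical and do not come from the text; the check looks only at the net change in a nominated network catalyst, which is a necessary rather than a sufficient criterion.

```python
from collections import Counter

def net_stoichiometry(reactions):
    """Sum (substrates, products) pairs into one net change per species."""
    net = Counter()
    for substrates, products in reactions:
        for species, n in substrates.items():
            net[species] -= n
        for species, n in products.items():
            net[species] += n
    # Drop species whose consumption and production cancel exactly.
    return {s: n for s, n in net.items() if n != 0}

def classify(reactions, catalyst):
    """Label a pathway by the net change in its nominated network catalyst."""
    change = net_stoichiometry(reactions).get(catalyst, 0)
    if change > 0:
        return "autocatalytic"
    if change == 0:
        return "catalytic"
    return "consumes catalyst"

# Hypothetical loop: catalyst X takes up two inputs A, then the product splits.
catalytic_loop = [
    ({"X": 1, "A": 1}, {"XA": 1}),
    ({"XA": 1, "A": 1}, {"XA2": 1}),
    ({"XA2": 1}, {"X": 1, "P": 1}),  # X regenerated, output P released
]

# Further reactions convert the output P into an additional copy of X.
autocatalytic_loop = catalytic_loop + [
    ({"P": 1, "A": 2}, {"X": 1}),
]

print(classify(catalytic_loop, "X"))
print(classify(autocatalytic_loop, "X"))
```

In this toy network the first loop returns exactly the X it consumed, while the augmented loop returns one extra X per turn, which is the self-amplifying case.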

Network autocatalysis is necessary to maintain dynamical ordered states, by re-concentrating inputs into a finite number of intermediates, against the disordering effects of thermodynamic decay and continual external perturbation. Therefore all observed persistent material flows in the biosphere can only be products of autocatalytic networks, though they may require hard-to-recognize feedbacks ranging from the level of cell metabolism to trophic ecology for full regeneration. This ex post observation does not, however, explain why self-amplification was possible in abiotic chemistry to give rise to a biosphere. In addition to topologies enabling feedback, the latter would have required that intermediates in the network be produced at rates higher than those at which they were removed.

The significant observations about autocatalysis in the extant biosphere, which may also contain information about its emergence, concern the complexity, number, and particular form of levels in which autocatalytic feedback can be found. Where the hierarchical modules of metabolic structure or function follow the boundaries required for feedback closure of different autocatalytic sub-networks, it may be possible to order the appearance of those sub-networks in time, and to infer the geochemical supports they required for stability and self-amplification, before those supports were attained through integration into cellular biochemistry.

10 Molecular autocatalysis – the property that intermediates in a pathway serve as conventional molecular catalysts for other reactions in the pathway – may be understood as a restricted form of network autocatalysis in which the reaction to which some species is an essential input is the same reaction that regenerates that species. Some chemists prefer to use the term “network autoamplification” for the general case, restricting “autocatalysis” to apply only when species are traditionally-defined molecular catalysts. We will use “autocatalysis” for the general case, to reflect the property of stoichiometry that a pathway regenerates essential inputs. For us the distinction between autocatalysis at the single molecule versus more general network level mainly affects the kinetics and regulation of pathways.

[Figure 3: curves of substrate mass versus time for prebiotic chemistry and cellular biochemistry. Panels: no feedback, no autocatalysis; feedback, short-loop autocatalysis; feedback through cofactors and enzymes, long-loop autocatalysis; feedback through cofactors and enzymes, short-loop + long-loop autocatalysis. Legend symbol: environmental substrate precursor.]

FIG. 3: Upper limit growth rate curves for chemical reaction networks of different classes. To highlight the role of network topologies, the chemistry is simplified with only C-C bond forming and cleaving reactions shown. Each of the growth curves qualitatively represents the upper limits for mass accumulation within the participating substrates of the pathways. When fully integrated within modern cellular biochemistry, both linear and cyclic pathways are network autocatalytic and capable of exponential growth. This form of network autocatalysis, however, derives from feedback provided by cofactors and enzymes, both of which have elaborate synthesis pathways, and is thus classified as “long-loop” autocatalysis. In an abiotic world in which reaction-level catalysis is limited to external sources, only cyclic pathways with feedback topologies at the substrate level – correspondingly classified as “short-loop” autocatalysis – are capable of exponential growth, while linear pathways are limited to linear growth. We contend that an early presence of short-loop autocatalysis is important because it provides a mechanism to concentrate mass flux within abiotic chemical networks, preventing excessive dilution with increasing size and complexity of organic molecules, and in turn giving easier and more robust access to subsequent long-loop feedback closures.
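The contrast drawn in the Fig. 3 caption between linear and exponential mass accumulation can be sketched numerically. The rate laws below are assumptions chosen for illustration only, not taken from the text: without substrate-level feedback, production is a constant k; with short-loop feedback, production is proportional to the mass m already present. The function name and parameter values are likewise hypothetical.

```python
def simulate(feedback, k=0.5, m0=1.0, dt=0.01, t_end=10.0):
    """Euler-integrate substrate mass m(t) under a toy rate law.

    feedback=False: dm/dt = k      (fixed throughput, no substrate-level loop)
    feedback=True:  dm/dt = k * m  (intermediates regenerate more intermediates)
    """
    m = m0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        rate = k * m if feedback else k
        m += dt * rate
    return m

linear_mass = simulate(feedback=False)       # grows in proportion to elapsed time
exponential_mass = simulate(feedback=True)   # grows in proportion to itself

print(f"no feedback:   m = {linear_mass:.2f}")
print(f"with feedback: m = {exponential_mass:.2f}")
```

With identical rate constants the feedback case outgrows the linear case by more than an order of magnitude over the simulated interval, which is the qualitative gap the figure depicts.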

We wish, in these characterizations, to recognize what we might call “conditional” as well as strict autocatalysis. In extant organisms, where (essentially) all reactions are catalyzed by macromolecules, and most cofactors (reductants, nucleoside-triphosphates, coenzymes) are recharged by cellular processes, strict autocatalysis of any network is only satisfied in the context of the full complement of integrated cellular processes. If, however, inputs provided by cofactors, macromolecules and energy systems in modern cells could have been provided externally in earlier stages of life, for instance by minerals or geochemical processes, then identifying networks in extant biochemistry that, although simple, would be autocatalytic if given these supports, may give information about intermediate stages of emergence (see Fig. 3). The strong modularity of extant metabolism and its congruence with such conditionally autocatalytic topologies suggests that a separation into layers corresponding to stages of emergence may be sensible. In addition to reconstructing historical stages, the mechanisms leading to autocatalysis in different sub-systems may suggest important geochemical contexts or sources of robustness still exploited in modern metabolism.

C. Network-autocatalysis in carbon-fixation pathways

At the chemically simplest level of description – that of the small-molecule metabolic substrates and their reaction-network topologies – carbon fixation pathways form two classes. Five of the six known pathways are autocatalytic loops, while one is a linear reaction sequence.11 The loop pathways condense CO2 or bicarbonate onto their substrate molecules, lengthening them. Each condensation is followed by a reduction, making the average oxidation state of carbon in the pathway substrate lower than that of the input CO2, and resulting in a negative net free energy of formation in a reducing environment [7].12 Each fixation loop contains one reaction where the maximal-length substrate is cleaved to produce two intermediates earlier in the same pathway, resulting in self-amplification of the pathway flux. As long as pathway intermediates are replenished faster than they are drained by parasitic or anabolic side reactions, the loop current remains above the autocatalytic threshold. However, the threshold is fragile, as pathway kinetics provide no inherent barrier against flux falling below threshold and subsequently collapsing.13

11 All uses of autocatalysis in this section refer to conditional autocatalysis, taking as external support the same level of cofactor or enzymatic complexity. Such external factors being equal, the small-molecule substrate pathways of the loops display an additional form of autocatalysis not present in the linear pathway.

12 Reducing power may originate in the geochemical environment, but in modern cells electrons are transferred endergonically to more powerful reductants such as NADH, NADPH, FADH2, or reduced ferredoxin.
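The fragility of the autocatalytic threshold can be illustrated with a toy per-turn model. It assumes, purely for illustration, that a fixed fraction `leak` of the loop intermediate is lost to side reactions on every turn, and that the cleavage step returns `copies_per_turn` copies of the network catalyst (two, following the description of the loop pathways). The function names and the discrete-turn picture are hypothetical and are not the kinetic treatment the paper develops later.

```python
def loop_flux_after(turns, leak, copies_per_turn=2, flux0=1.0):
    """Loop flux after a number of turns, with a fixed fractional drain per turn."""
    flux = flux0
    for _ in range(turns):
        # Surviving intermediate is multiplied by the cleavage step.
        flux *= copies_per_turn * (1.0 - leak)
    return flux

def threshold_leak(copies_per_turn=2):
    """Drain fraction at which replenishment exactly balances loss."""
    return 1.0 - 1.0 / copies_per_turn

for leak in (0.4, 0.5, 0.6):
    print(f"leak={leak:.1f}  flux after 20 turns = {loop_flux_after(20, leak):.4g}")
```

In this sketch a drain slightly below the threshold gives sustained growth and a drain slightly above it gives collapse, with nothing in the loop itself to resist the crossing.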

At the level of network topology, the linear Wood-Ljungdahl (WL) fixation pathway [62–64] is strikingly unlike the five loop pathways. Instead of covalently binding CO2 onto pathway substrates, which then serve as platforms for reduction, the WL reactions directly reduce one-carbon (C1) groups, and then distribute the partly- or fully-reduced intermediates to other anabolic pathways where they are incorporated into metabolites. The linear sequence of reductions has no feedback, and the C1 groups at intermediate oxidation states do not increase in complexity. Instead, these reductions (leading to intermediate C1 states that would be unstable in solution) are carried out on evolutionarily refined folate cofactors [65]. The topology of the WL pathway becomes self-amplifying only if the larger and more complex biosynthetic network for these cofactors is considered together with that of the C1 substrate. We will characterize this distinction between the loop-fixation pathways and WL as a distinction between short-loop and long-loop autocatalysis (see Fig. 3).14 Short loops contain only the small-molecule substrates; long loops incorporate the biosynthetic networks for cofactors as well.

The network catalysts that could be said to “select” the short-loop pathways are the reaction intermediates themselves. The key metabolites that have the corresponding selection role for WL are the folate cofactors produced in a secondary biosynthetic network. Short-loop and long-loop pathways are therefore distinguished both by the number of reactions that must be maintained and regulated, and by the fact that WL spans substrates and cofactors, which we will argue in Sec. IV are naturally interpreted as qualitatively distinct layers within biochemistry.

The appearance of different features suggesting simplicity or primordial robustness, in different fixation pathways, together with aspects of their phylogenetic distribution, has led to diverse proposals about the order of their emergence [66, 67]. WL is the only carbon-fixation pathway found in both bacteria and archaea, and its reactions have been shown to have abiotic mineral analogues [66, 68, 69], suggesting a prebiotic origin. Yet WL is not self-amplifying and so lacks the capacity for chemical “competitive exclusion” (equivalent to the capacity for exponential growth). The cofactors that make it self-amplifying are complex, and the simple pathway structure of C1 reduction does not suggest what would have supported their formation.

13 The autocatalytic threshold and dynamics of growth, saturation, or collapse are considered in Sec. III D, and shown in Fig. 12.

14 In the network context of the long-loop WL fixation mechanism, the folate cofactors have an intermediate role between network catalysts and molecular catalysts, as they are passive carriers, but form stable molecular intermediates rather than mere complexes as are formed by enzymes with their substrates.

In contrast, autocatalysis within the small-molecule substrate networks of the loop pathways suggests the inherent capacity for self-amplification, exponential growth, and chemical competitive exclusion. This is an appealing explanation [7] for the role, particularly of the intermediates in the reductive citric acid cycle [6, 70] (discussed in Sec. III) as precursors of biomass. Arcs within this pathway have also been reproduced experimentally in mineral environments [71], though a self-amplifying system has not yet been demonstrated. However, self-amplification requires complete loops, and even the most compelling candidate for a primordial form (reductive citric acid cycling) is found only in a subset of bacterial clades.

We argue in the next section that a joint fixation pathway incorporating both WL and citric acid cycling resolves many of these ambiguities in a way that no modern fixation pathway can.15 As a phylogenetic root, it defines a template from which the fixation pathways in all modern clades could have diverged, and as a candidate for a primordial metabolic network, it provides both chemical selection of biomass precursors by short-loop autocatalysis, and a form of protection against the fragility of the autocatalytic threshold. We will first describe the biochemistry and phylogenetics of carbon-fixation pathways in the current biosphere, and then show how their patterns of modularity and chemical redundancy provide a framework for historical reconstruction.

III. CORE CARBON METABOLISM

Currently six carbon fixation pathways are known [72, 73]. While they are distinct as complete pathways, they have significant overlaps at the level of individual reactions, and even greater redundancy in local-group chemistry. They are also, as shown in Fig. 5 (below), tightly integrated with the main pathways of core carbon metabolism, including lipid synthesis, gluconeogenesis, and pentose-phosphate synthesis.

An extensive analysis of their chemistry under physiologically relevant conditions has shown that individual fixation pathways contain two groups of thermodynamic bottlenecks: carboxylation reactions, and carboxyl reduction reactions [74]. In isolation these reactions generally require ATP hydrolysis to proceed, and how pathways deal with (or avoid) these costs has been concluded to form an important constraint on their internal structure [74]. We will further show how the elaborate and complex catalytic mechanisms associated with these reactions form essential evolutionary constraints on metabolism.

15 A proposal for WL fixation followed by citric-acid cycling is made in the text of Ref. [69], though the primordial networks proposed in that paper are forms of acetogenesis, and do not emphasize self-amplification and short-loop autocatalysis as essential early requirements.

We will first describe the biochemical and phylogenetic details of the individual pathways, and then diagram their patterns of redundancy, first at the level of modular reaction sequences, and then in local-group chemistry. Finally we will use this decomposition together with evidence from gene distributions to propose their historical relation and identify constraints that could have spanned the Darwinian and pre-cellular eras.

A. Carbon fixation pathways

1. Overview of pathway chemistries, phylogeny and environmental context

Wood-Ljungdahl: The Wood-Ljungdahl (WL) pathway [62–64, 66] consists of a sequence of five reactions that directly reduce one CO2 to a methyl group, a parallel reaction reducing CO2 to CO, and a final reaction combining the methyl and CO groups with each other and with a molecule of Coenzyme-A (CoA) to form the thioester acetyl-CoA. The reactions are shown below in Fig. 4, and discussed in detail in Sec. IV. The five steps reducing CO2 to −CH3 make up the core pathway of folate (vitamin B9) chemistry and its archaeal analog, which we consider at length in Sec. III B. The reduction to CO, and the synthesis of acetyl-CoA, are performed by the bi-functional CO-Dehydrogenase/Acetyl-CoA Synthase (CODH/ACS), a highly conserved enzyme complex with Ni-[Fe4S5] and Ni-Ni-[Fe4S4] centers [75–78]. Methyl-transfer from pterins to the ACS active site is performed by a corrinoid iron-sulfur protein (CFeSP) in which the cobalt-tetrapyrrole cofactor cobalamin (vitamin B12) is part of the active site [79, 80].

Phylogenetically, WL is a widely distributed pathway, found in a variety of both bacteria and archaea, including acetogens, methanogens, sulfate reducers, and possibly anaerobic ammonium oxidizers [72]. The full pathway is found only in strict anaerobes, because the CODH/ACS is one of the most oxygen-sensitive enzymes known [81, 82]16. However, as we have argued in Ref. [41], the folate-mediated reactions form a partly-independent sub-module. This module combines with the equally-distinctive CODH/ACS enzyme to form the complete WL pathway, but can serve independently as partial carbon-fixation pathways even in the absence of the final step to acetyl-CoA (see Fig. 4). In this role it is found almost universally among deep bacterial clades.

16 Recent results (S. Ragsdale, pers. comm.) suggest that the CODH/ACS is also sensitive to sulfides and perhaps other oxidants. We will impute this oxidant sensitivity as the reason the CODH/ACS was lost in the ancestral environments of some clades in Sec. III D2. For later branches the oxidant may have been molecular oxygen, but O2 is not a plausible toxin at the time of the LUCA or earliest phylogenetic separations.

[Figure 4: reaction scheme of the WL pathway drawn as chemical structures. Recoverable labels: CO2, HCOOH, CH3, H4MPT, THF, CoA, Fdred, Fdox + H2O, CoB-SH, CoB-S-S-CoM, CH4, X/XH2, ATP/ADP + Pi, NAD/NADH, NH3, purine, glycine, serine, glyoxylate, R-CH3, gluconeogenesis/glycolysis.]

FIG. 4: The reactions in the Wood-Ljungdahl pathway of direct C1-reduction. The main sequence on pterins is shown, with five outputs for formyl, methylene, or methyl groups. The semi-independent submodule often used to directly synthesize glycine and serine from CO2, even when acetyl-CoA synthesis is absent, is highlighted in red. Alternative pathways to glycine and serine, from 3-phosphoglycerate in gluconeogenesis/glycolysis and glyoxylate, are shown in the upper right quadrant. Finally, the dashed arrows represent a suggested alternative form of formate uptake based on binding at N5 rather than N10 of folate before cyclization to methenyl-THF [41].

All carbon fixation pathways in extant organisms employ some essential and apparently unique enzymes and most also rely in essential ways on certain cofactors.17 Among these, however, the function provided by pterin cofactors18 in WL is arguably the most complex, extending beyond mere reduction. Pterins mediate capture of formate, reduction of carbon bound to one or two nitrogen atoms, and transfer of formyl, methylene, or methyl groups.19 In this sense the simple network topology of direct C1 reduction seems to require a more elaborate dependence on cofactors than is seen in other pathways.

17 For example, the 3-hydroxypropionate pathway relies on biotin for reactions shared with (or homologous to) those in fatty acid synthesis. The reductive citric-acid cycle relies on reduced ferredoxin [83], a simple multi-domain [Fe4S4] enzyme and on thiamine, in its reductive carbonyl-insertion reaction [84], and also on biotin for its β-carboxylation steps [85, 86].

18 Pterin is a name referring to the class of cofactors including folates and the methanopterins, which are both derived from a neopterin precursor.

Reductive citric-acid cycle: The reductive citric-acid (reductive Tricarboxylic Acid, or rTCA) cycle [70, 87] is the reverse of the oxidative Krebs cycle, which was by all evidence derived from it [7, 33, 88] during the rise of oxygen. It is a sequence of eleven intermediates and eleven reactions, highlighted in Fig. 5, which reduce two molecules of CO2, and combine these through a substrate-level phosphorylation with CoA, to form one molecule of acetyl-CoA. In the cycle, one molecule of oxaloacetate grows by condensation with two CO2 and is reduced and activated with CoA. The result, citryl-CoA, undergoes a retro-aldol cleavage to regenerate oxaloacetate and acetyl-CoA.20 A second arc of reactions sometimes termed anaplerotic [5] condenses two further CO2 with acetyl-CoA to produce a second molecule of oxaloacetate, completing the network-autocatalytic topology and making the cycle self-amplifying. The distinctive reaction in the rTCA pathway is a carbonyl insertion at a thioester (acetyl-CoA or succinyl-CoA), performed by a family of conserved ferredoxin-dependent oxidoreductases which are triple-Fe4S4-cluster proteins [84]. The cycle is found in many anaerobic and microaerophilic bacterial lineages, including aquificales, chlorobi, and δ- and ε-proteobacteria.

Enzymes from reductive TCA reactions are very widely distributed among bacteria, where they support fermentative pathways that break cycling and use intermediates such as succinate as terminal electron acceptors [91]. The co-presence of variant enzymes associated with reductive and oxidative cycling [92–94] may provide detailed evidence about the reversal of core metabolism under the rise of oxygen.

Dicarboxylate/4-hydroxybutyrate cycle: The dicarboxylate/4-hydroxybutyrate (DC/4HB) cycle [66, 95], illustrated in Fig. 7 (below) is, like rTCA, a single-loop network-autocatalytic cycle, but has a simpler form of autocatalysis in which acetyl-CoA rather than oxaloacetate is the network catalyst. Only two CO2 molecules are attached in the course of the cycle to form acetoacetyl-CoA, which is then thioesterified at the second acetyl moiety and cleaved to directly regenerate two molecules of acetyl-CoA. An extra copy of the network catalyst is thus directly regenerated (with suitable CoA activation) without the need for anaplerotic reactions. The cycle has so far been found only in anaerobic crenarchaeota, but within this group it is believed to be widely distributed phylogenetically [66, 95]. The first five reactions in the cycle (from acetyl-CoA to succinyl-CoA) are identical to those of rTCA. The second arc of the cycle begins with reactions found also in 4-hydroxybutyrate and γ-aminobutyrate fermenters in the clostridia (a subgroup of Firmicutes within the bacteria), and terminates in the reverse of reactions in the isoprene biosynthesis pathway. The DC/4HB pathway thus uses the same ferredoxin-dependent carbonyl-insertion reaction used in rTCA (though only at acetyl-CoA), along with distinctive reactions associated with 4-hydroxybutyrate fermentation. In particular, the dehydration/isomerization sequence from 4-hydroxybutyryl-CoA to crotonyl-CoA is performed by a flavin-dependent protein containing an [Fe4-S4] cluster, and involves a ketyl-radical intermediate [96, 97].

19 The distinctive role of cofactors continues with the dependence of the acetyl-CoA synthesis on cobalamin, a highly reduced tetrapyrrole capable of two-electron transfer [79].

20 Here we separate the formation of citryl-CoA from its subsequent retro-aldol cleavage, as this is argued to be the original reaction sequence, and the one displaying the closest homology in the substrate-level phosphorylation with that of succinyl-CoA [89, 90].

3-hydroxypropionate bicycle: The 3-hydroxypropionate (3HP) bicycle [82], highlighted in Fig. 9, has the most complex network topology of the fixation pathways, using two linked cycles to regenerate its network catalysts and to fix carbon. The network catalyst in both loops is acetyl-CoA, and the outlet for fixed carbon is pyruvate. The reactions in the cycle begin with the biotin-dependent carboxylation of acetyl-CoA to form malonyl-CoA, from the fatty-acid synthesis pathway, followed by a distinctive thioesterification [98] and a second, homologous carboxylation of propionyl-CoA (to methylmalonyl-CoA) followed by isomerization to form succinyl-CoA. The first cycle then proceeds as the oxidative TCA arc, followed by retro-aldol reactions also found in the glyoxylate shunt (described below). A second cycle is initiated by an aldol condensation of propionyl-CoA with glyoxylate from the first cycle to yield β-methylmalyl-CoA, which follows a sequence of reduction and isomerization through an enoyl intermediate (mesaconate) similar to the second cycle of rTCA or the 4HB pathway. This complex pathway was discovered in the Chloroflexi and is believed to represent an adaptation to alkaline environments in which the CO2/HCO3− (bicarbonate) equilibrium strongly favors bicarbonate. All carbon fixations proceed through activated biotin, thus avoiding the carbonyl insertion of the rTCA and DC/4HB pathways. The complexity of the bicycle network likely reflects the difficulty of replacing both carbonyl insertion reactions from an ancestral rTCA cycle while retaining autocatalysis, but it also suggests the diverse inventory of pathway segments available to draw from at the time of its emergence, which reflect an underlying chemical simplicity and redundancy, as we will show.

3-hydroxypropionate/4-hydroxybutyrate cycle: The 3-hydroxypropionate/4-hydroxybutyrate (3HP/4HB) cycle [99], shown in Fig. 7, is a single-loop pathway in which the first arc is the 3HP pathway, and the second arc is the 4HB pathway. Like DC/4HB, 3HP/4HB uses acetyl-CoA as network catalyst and fixes two CO2 to form acetoacetyl-CoA. The pathway is found in the Sulfolobales (crenarchaeota), where it combines the crenarchaeal 4HB pattern of autotrophic carbon fixation with the bicarbonate adaptation of the 3HP pathway. Like the 3HP bicycle, the 3HP/4HB pathway is thought to be an adaptation to alkalinity, but because the 4HB arc does not fix additional carbon, this adaptation resulted in a simpler pathway structure than the bicycle.

Calvin-Benson-Bassham cycle: The Calvin-Benson-Bassham (CBB) cycle [100, 101] is responsible for most known carbon fixation in the biosphere. In the same way as WL adds only the distinctive CODH/ACS reaction to an otherwise very-widely-distributed folate pathway [41], the CBB cycle adds a single reaction to the otherwise-universal network of aldol reactions among sugar-phosphates that make up the gluconeogenic pathway to fructose 1,6-bisphosphate and the reductive pentose phosphate pathway to ribose and ribulose 1,5-bisphosphate.21 The distinctive CBB reaction that extends reductive pentose-phosphate synthesis to a carbon fixation cycle is a carboxylation performed by the Ribulose 1,5-bisphosphate Carboxylase/Oxygenase (RubisCO), together with cleavage of the original ribulose moiety to produce two molecules of 3-phosphoglycerate. The Calvin cycle resembles the 4HB pathways in regenerating two copies of the network catalyst directly, not requiring separate anaplerotic reactions for autocatalysis. In addition to carboxylation, RubisCO can react with oxygen in a process known as photorespiration [102–104] to produce 2-phosphoglycolate (2PG), a precursor to glyoxylate that is independent of rTCA-cycle reactions. The CBB cycle is widely distributed among cyanobacteria, in chloroplasts in plants, and in some secondary endosymbionts.
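The carbon bookkeeping behind the loop pathways described in this subsection can be checked with a short sketch. The carbon counts are standard values supplied here as an assumption (they are not tabulated in the text) and count only the acyl or sugar carbons, excluding CoA and phosphate groups. Each entry compresses a multi-step arc into a single net conversion, and the dictionary and function names are hypothetical.

```python
# Carbon atoms in the organic moiety only (CoA and phosphate excluded).
# Values are standard chemistry, added here for illustration.
CARBONS = {
    "CO2": 1,
    "acetyl-CoA": 2,
    "3-phosphoglycerate": 3,
    "oxaloacetate": 4,
    "acetoacetyl-CoA": 4,
    "ribulose-1,5-bisphosphate": 5,
}

# Net conversions: (starting substrate, CO2 molecules fixed, products).
ARCS = {
    "rTCA main arc": ("oxaloacetate", 2, ["oxaloacetate", "acetyl-CoA"]),
    "rTCA anaplerotic arc": ("acetyl-CoA", 2, ["oxaloacetate"]),
    "DC/4HB or 3HP/4HB to acetoacetyl-CoA": ("acetyl-CoA", 2, ["acetoacetyl-CoA"]),
    "4HB cleavage": ("acetoacetyl-CoA", 0, ["acetyl-CoA", "acetyl-CoA"]),
    "CBB carboxylation": ("ribulose-1,5-bisphosphate", 1,
                          ["3-phosphoglycerate", "3-phosphoglycerate"]),
}

def carbon_balanced(start, n_co2, products):
    """True if carbon entering an arc equals carbon leaving it."""
    carbon_in = CARBONS[start] + n_co2 * CARBONS["CO2"]
    carbon_out = sum(CARBONS[p] for p in products)
    return carbon_in == carbon_out

for name, (start, n_co2, products) in ARCS.items():
    print(f"{name}: balanced = {carbon_balanced(start, n_co2, products)}")
```

The balance makes the difference in topology visible: the 4HB cleavage returns two copies of its starting acetyl-CoA directly, whereas rTCA needs the separate anaplerotic arc to turn its acetyl-CoA output into a second oxaloacetate.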

The glyoxylate shunt: Although it is not an autotrophic carbon-fixation pathway, the glyoxylate shunt (or glyoxylate bypass) is of interest because it shares intermediates and reactions with many of the above fixation pathways, and because it resembles a fixation pathway in certain topological features. The pathway is shown below in Fig. 9. All aldol reactions that can be performed starting from rTCA intermediates appear in this pathway, either as cleavages or as condensations. In addition to condensation of acetate and oxaloacetate to form citrate, these include cleavage of isocitrate to form glyoxylate and succinate, and condensation of glyoxylate and acetate to form malate. The shunt is a weakly oxidative pathway (generating one H2-equivalent from oxidizing succinate to fumarate), and is otherwise a network of internal redox reactions. It is therefore a very widely-used facultative pathway under conditions where carbon for biosynthesis, more than reductant, is limiting.

21 The universality of this network requires some qualification. We show a canonical version of the network in the figures below, and some variant on this network is present in every organism that synthesizes ribose. However, the (CH2O)n stoichiometry of sugars, together with the wide diversity of possible aldol reactions among sugar-phosphates, make sugar re-arrangement a problem in the number theory of the small integers, with solutions that may depend sensitively on allowed inputs and outputs. Other pathways within the collection of attested pentose-phosphate networks are shown in Ref. [61].

Two of the arcs of the shunt overlap with arcs in the oxidative Krebs cycle, but the entire pathway is a bicycle much like the 3HP-bicycle, sharing many of the same intermediates, but running in the opposite direction. Oxidative pathways such as the Krebs cycle are ordinarily catabolic, and hence not self-maintaining. The glyoxylate shunt may be regarded as a network-autocatalytic pathway for intake of acetate, using malate as the network catalyst and regenerating a second molecule of malate from two acetate molecules. This may be part of the reason that the shunt is up-regulated in the deinococcus-thermus family of bacteria in response to radiation exposure [105], providing additional robustness from network topology under conditions when metabolic control is compromised.

2. Thermodynamic constraints on pathway structure

The central energetic costs of carbon-fixation pathways are associated with carboxylation reactions in which CO2 molecules are added to the growing substrate, and the subsequent reactions in which the carboxyl group is reduced to a carbonyl [74]. In isolation these reactions require ATP hydrolysis, but these costs can be avoided in several ways. In some cases a thioester intermediate is used to effectively couple together a carboxyl reduction and a subsequent carboxylation, allowing the two reactions to be driven by a single ATP hydrolysis. An unfavorable (endergonic) reaction can also be coupled to a highly favorable (exergonic) reaction, allowing the reactions to proceed without ATP hydrolysis.

Individual pathways employ such couplings to varying degrees, resulting in a range of ATP costs associated with carbon fixation. At the low end, WL eliminates nearly all use of ATP through its unique pathway chemistry. Using folates to reduce one-carbon units derived from CO2 before incorporating them into growing substrates avoids the cost of carboxylation, saving an ATP, while the endergonic reduction of CO2 to CO is coupled to the subsequent exergonic synthesis of acetyl-CoA. Finally, the activated thioester bond of acetyl-CoA allows the subsequent carboxylation to pyruvate to also proceed without additional ATP. As a result WL requires only a single ATP, for the attachment (and activation) of formate on THF22, in the synthesis of pyruvate from CO2 [66, 74]. Similarly, rTCA has high energetic efficiency as a result of extensive reaction coupling, requiring only 2 ATP to synthesize pyruvate from CO2 [66, 74]. Two ATP are saved by coupling carboxyl reductions to subsequent carboxylations using thioester intermediates, and an additional ATP is saved by coupling the carboxylation of α-ketoglutarate to the subsequent carbonyl reduction leading to isocitrate.

22 In methanogens this cost has been completely eliminated by modifying the structure of THF to that of H4MPT [41].

At the high end of energetic cost of carbon fixation are pathways that couple unfavorable reactions less effectively, or not at all, or even hydrolyze ATP for reactions other than carboxylation or carboxyl reduction. Both the DC/4HB pathway and the 3HP bicycle decouple one or more of the thioester-mediated carboxyl reduction + carboxylation sequences such as used in rTCA, and neither couples endergonic carboxylations to exergonic reductions. As a result DC/4HB requires 5 ATP and the 3HP bicycle 7 ATP to synthesize pyruvate from CO2 [66, 74]. The 3HP/4HB pathway has the highest cost of any fixation pathway, with 9 ATP required to synthesize pyruvate from CO2. This is partly because it also decouples thioester-mediated carboxyl reduction + carboxylation sequences, and partly because pyruvate is synthesized by diverting and ultimately decarboxylating succinyl-CoA [66, 73, 74]. Finally, CBB is also at the high end in terms of cost, requiring 7 ATP to synthesize pyruvate from CO2. Although this pathway avoids the cost of carboxylation reactions by coupling them to exergonic cleavage reactions, CBB is the only fixation pathway that invests ATP hydrolysis in chemistry other than carboxylations or carboxyl reductions, relatively increasing its cost [74].
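The ATP costs quoted in this subsection can be collected into a small lookup for side-by-side comparison. The numbers are those stated in the text (ATP per pyruvate synthesized from CO2); the names `ATP_PER_PYRUVATE`, `ranked`, and `extra_atp` are hypothetical and introduced only for this sketch.

```python
# ATP hydrolyzed per pyruvate synthesized from CO2, as quoted in the text.
ATP_PER_PYRUVATE = {
    "WL": 1,
    "rTCA": 2,
    "DC/4HB": 5,
    "3HP bicycle": 7,
    "CBB": 7,
    "3HP/4HB": 9,
}

# Pathways ordered from cheapest to most expensive (ties keep insertion order).
ranked = sorted(ATP_PER_PYRUVATE.items(), key=lambda item: item[1])

def extra_atp(pathway, baseline="WL"):
    """ATP spent per pyruvate beyond the chosen baseline pathway."""
    return ATP_PER_PYRUVATE[pathway] - ATP_PER_PYRUVATE[baseline]

for name, cost in ranked:
    print(f"{name:12s} {cost} ATP  (+{extra_atp(name)} relative to WL)")
```

The ninefold spread between WL and 3HP/4HB makes concrete how much the degree of reaction coupling matters for the energetic cost of a pathway.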

3. Centrality and universality of the reactions in the citric-acid cycle, and the pillars of anabolism

The apparent diversity of six known fixation pathways is unified by the role of the citric-acid cycle reactions, and secondarily by that of gluconeogenesis and the pentose-phosphate pathways. Fig. 5 presents the C,H,O stoichiometry23 for a network of reactions that includes all six known pathways. The network contains only 35 organic intermediates,24 because many intermediates and reactions appear in multiple pathways.

23 Here, by stoichiometry we refer to the mole-ratios of reactants and products for each reaction, with molecules represented by their CHO constituents, and attached phosphate or thioester groups omitted. Where phosphorylation or thioesterification mediates a net dehydration, we have represented the dehydration directly in the figure.

24 Hydroxymethyl-glutarate and butyrate are also shown, to indicate points of departure to isoprene and fatty acid synthesis, respectively.

In Fig. 5 the TCA cycle and the gluconeogenic pathway are highlighted. Beyond being mere points of departure for alternative fixation pathways and for diversifications in intermediary metabolism, they are invariants under diversification because they determine carbon flow among the universal precursors of biosynthesis.

Almost all anabolic pathways in extant organisms originate in one of five intermediates in the TCA cycle – acetate (as acetyl-CoA), pyruvate, oxaloacetate, succinate (or succinyl-CoA) or α-ketoglutarate – which have been dubbed the “pillars of anabolism” [40]. Succinyl-CoA can serve as the precursor to pyrroles (metal-coordinating groups in many cofactors) – mainly in α-proteobacteria and mitochondria – but these are more commonly made from α-ketoglutarate via glutamate in what is known as the C5 pathway [106]. A phylogenetic analysis of these pathways confirms that the C5 pathway is the most plausible ancestral form [107]. Thus as few as four TCA intermediates provide the organic inputs to all anabolic pathways. Fig. 6 shows the major molecule classes associated with each intermediate. The only exceptions to this universality, which form a biosynthetic sequence, are glycine, serine, and a few compounds synthesized from them; this sequence can be initiated directly from CO2 outside of the pillars (see Fig. 4), an observation that becomes key in reconstructing the evolutionary history of carbon-fixation (see Sec. III C). The gluconeogenic pathway then forms a similarly unique bridge between the TCA intermediate pyruvate (in the activated form phosphoenolpyruvate) and the network of sugar-phosphate reactions known as the pentose-phosphate pathway.

Carbon-fixation pathways must reach all four (or five) of the universal anabolic starting compounds. They may do this either by producing them as pathway intermediates, or by means of secondary reactions converting pathway intermediates into the essential precursors. The degree to which a pathway passes through all essential biosynthetic precursors may suggest its antiquity. In metabolism-first theories of the origin of life [6], the limited set of compounds selected and made available in high concentration by proto-metabolism determined the opportunities for further biosynthesis, thus establishing themselves as the precursors of anabolism.

Among the five network-autocatalytic fixation pathways, the CBB pathway is unique in not passing through any universal anabolic precursors. When used as a fixation pathway, CBB reactions must thus be connected to the rest of anabolism through the reverse of several reactions in the gluconeogenic pathways connecting 3-phosphoglycerate (3PG) to pyruvate. Pyruvate is then connected to the remaining precursors through partial TCA sequences. Glyoxylate produced from 2-phosphoglycolate during photorespiration may alternatively be converted directly to glycine and serine (see Fig. 4).

FIG. 5: The projection of the complete network for core carbon anabolism onto its CHO components. Phosphorylated intermediates and thioesters with Coenzyme-A are not shown explicitly. The bipartite graph notation used to show reaction stoichiometry is explained in App. A. Arcs of the reductive citric acid cycle and gluconeogenesis are bold, showing that these pathways pass through the universal biosynthetic precursors. The Wood-Ljungdahl (labeled WL) pathway, without its cofactors and reductants shown, is represented by the last reaction of the acetyl-CoA synthase, which is the inverse of a disproportionation. Abbreviations: acetate (ACE); pyruvate (PYR); oxaloacetate (OXA); malate (MAL); fumarate (FUM); succinate (SUC); α-ketoglutarate (AKG); oxalosuccinate (OXS); isocitrate (ISC); cis-aconitate (CAC); citrate (CIT); malonate (MLN); malonate semialdehyde (MSA); 3-hydroxypropionate (3HP); acrylate (ACR); propionate (PRP); methylmalonate (MEM); succinate semialdehyde (SSA); 4-hydroxybutyrate (4HB); crotonate (CRT); 3-hydroxybutyrate (3HB); acetoacetate (AcACE); butyrate (BUT); hydroxymethyl-glutarate (HMG); glyoxylate (GLX); methyl-malate (MML); mesaconate (MSC); citramalate (CTM); glycerate (GLT); glyceraldehyde (GLA); dihydroxyacetone (DHA); fructose (FRC); erythrose (ERY); sedoheptulose (SED); xylulose (XYL); ribulose (RBL); ribose (RIB).

Among the remaining loop-fixation pathways, only rTCA passes through all five anabolic pillars. Through its partial overlap with rTCA, DC/4HB passes through four, excluding α-ketoglutarate. The 3HP-bicycle further bypasses oxaloacetate, while the 3HP/4HB loop and WL include only acetyl-CoA. All of the latter pathways require anaplerotic reactions in the form of incomplete (either oxidative or reductive) TCA arcs; when these combine (in various ways) with WL carbon fixation, they are known collectively as the reductive acetyl-CoA pathways.

The most parsimonious explanation for the universality of the TCA arcs as anaplerotic reactions is lock-in by downstream anabolic pathways, to which metabolism was committed by the time carbon-fixation strategies diverged. This is a direct extension of the metabolism-first assumption that anabolic pathways themselves formed around proto-metabolic selection of the rTCA intermediates.25 (A similar but later form of commitment has been argued to convert basal gene regulatory networks in metazoan development into kernels, which admit no variation and act as constraints on subsequent evolutionary dynamics [108, 109].) If lock-in provides the correct interpretation of TCA universality, then much of the burden of accounting for the inventory of small metabolites is shifted away from Darwinian selection for function in a post-RNA world, and onto constraints of biosynthetic simplicity and network context. We show below that phylogenetic reconstruction is consistent with a selective role for rTCA cycling in the root metabolism of cellular life, though only as part of a larger network than the modern rTCA cycle.

25 Harold Morowitz summarizes this assumption with the phrase metabolism recapitulates biogenesis [6].

FIG. 6: The pillars of anabolism, showing lipids, sugars, amino acids, pyrimidines and purines, and tetrapyrroles from either succinate or AKG. Molecules with homologous local chemistry are at opposite positions on the circle. Oxidation states of internal carbon atoms are indicated by color (red = oxidized, blue = reduced). Product labels in the figure: malonate, lipids; alanine, sugars; aspartate, amino acids, pyrimidines; glutamate, amino acids; pyrroles. Cycle intermediates shown: citrate, acetate, pyruvate, oxaloacetate, malate, fumarate, succinate, α-ketoglutarate, oxalosuccinate, isocitrate, cis-aconitate.

B. Modularity in the internal structure and mutual relationships of the known fixation pathways

Fig. 5 shows that the number of molecules and reactions required to include all carbon fixation pathways is much smaller than might have been expected from their nominal diversity, because many reactions are used in multiple pathways, and all pathways remain close to the universal precursors. We have already noted in the previous section that this re-use goes beyond the requirements of autocatalysis, to the anaplerotic role of rTCA arcs adapting variant fixation pathways to an invariant set of biosynthetic precursors.

The aggregate network also shows many kinds of structure: clusters, concentric rings, and ladders reflecting parallel sequences of the same inputs and outputs in different pathways. We will show in this section that these result from re-use of local-group chemistry in transformations of distinct molecules.

At the end of the section we will describe a third form of re-use not represented in the aggregate graph. The folate-mediated direct C1 reduction sequence of Wood-Ljungdahl, responsible for the methyl group in the WL disproportionation reaction in Fig. 5, is also found as a free-standing fixation pathway across the bacterial tree, often as one component in a disconnected autotrophic network using one of the loop fixation pathways as its other component.

Because of such extensive redundancy, little innovation is required to explain the extant diversity of carbon fixation. All known carbon fixation strategies can be described as assemblies of a small number of strongly-defined modules, which govern not only the function of pathways, but also their evolution.

1. Modularity in carbon fixation loops from re-use of pathway segments

Fig. 7 shows the sub-network from Fig. 5 containing the four loop-autotrophic carbon fixation pathways that pass through some or all universal precursors, together with reactions in the glyoxylate shunt. The four loop pathways are shown in four colors, with the organic pathway-intermediates (but not environmental precursors or reductants) highlighted.

The figure shows that these pathways re-use intermediates by combining entire pathway segments. The combinatorial assembly of these segments is possible because they all pass through acetate (as acetyl-CoA) or succinate (usually as succinyl-CoA), and all except the second loop of the 3HP bicycle pass through both. Thus the conserved reactions among the autocatalytic loop carbon-fixation pathways are shared within strictly preserved sequences, which have key molecules as the boundaries at which segments may be combined.
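The module-boundary role of acetate and succinate can be illustrated by treating each loop pathway as a set of intermediates and intersecting the sets. The sketch below uses the abbreviations of Fig. 5; the assignment of intermediates to pathways is our reading of the standard pathway descriptions in the CHO projection, not a table given in the text, and the 3HP bicycle is treated as a single set covering both of its loops. The names `PATHWAYS`, `shared`, and `shared_by_all` are hypothetical.

```python
# Intermediates of each loop pathway in the CHO projection (abbreviations as in
# Fig. 5). The membership lists are an assumption of this sketch.
PATHWAYS = {
    "rTCA": {"ACE", "PYR", "OXA", "MAL", "FUM", "SUC",
             "AKG", "OXS", "ISC", "CAC", "CIT"},
    "DC/4HB": {"ACE", "PYR", "OXA", "MAL", "FUM", "SUC",
               "SSA", "4HB", "CRT", "3HB", "AcACE"},
    "3HP/4HB": {"ACE", "MLN", "MSA", "3HP", "ACR", "PRP", "MEM", "SUC",
                "SSA", "4HB", "CRT", "3HB", "AcACE"},
    "3HP bicycle": {"ACE", "MLN", "MSA", "3HP", "ACR", "PRP", "MEM", "SUC",
                    "FUM", "MAL", "GLX", "MML", "MSC", "CTM", "PYR"},
}

def shared(first, second):
    """Intermediates that two pathways have in common."""
    return PATHWAYS[first] & PATHWAYS[second]

# Intermediates present in every loop pathway.
shared_by_all = set.intersection(*PATHWAYS.values())

print("shared by all four:", sorted(shared_by_all))
print("rTCA and DC/4HB:", sorted(shared("rTCA", "DC/4HB")))
print("DC/4HB and 3HP/4HB:", sorted(shared("DC/4HB", "3HP/4HB")))
print("3HP/4HB and 3HP bicycle:", sorted(shared("3HP/4HB", "3HP bicycle")))
```

With these membership lists the only intermediates common to all four pathways are acetate and succinate, while each pairwise overlap is a contiguous arc bounded by those two molecules.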

2. Homologous local-group chemistry across pathway segments

In addition to the re-use of complete reactions in pathway segments, variant carbon-fixation pathways have extensively re-used transformations at the level of local functional groups. The network of Fig. 7 is arranged in concentric rings, in which the arcs of the rTCA cycle align with the 3HP or 4HB pathways, or with the mesaconate arc of the 3HP bicycle. The “ladder” structure of inputs and outputs of reductant (H2) or water between these rings shows the similar stoichiometric progression in these alternative pathways. Fig. 8 decomposes the aggregate network into two pairs of short-molecule and long-molecule arcs, and the mesaconate arc, and shows the pathway intermediates in each arc. The figure makes clear that, both within the arcs of the loop pathways, and between alternate pathways, the type, sequence, and position of reactions are highly conserved. In particular, the reduction sequence from α-ketones or semialdehydes, to alcohols, to isomerization through enoyl intermediates, is applied to the same bonds on the same carbon atoms from input acetyl moieties in rTCA, 3HP, and 4HB pathways, and to analogous functional groups in the bicycle. Finally, in the cleavage of both citryl-CoA and citramalyl-CoA, the bond that has been isomerized through the enoyl intermediate is the one cleaved to regenerate the network catalyst.

FIG. 7: The four loop carbon-fixation pathways that pass through some or all of the universal biosynthetic precursors, from the graph of Fig. 5. rTCA is black, DC/4HB is red, 3HP-bicycle is blue, and 3HP/4HB is green. The one aldol reaction from the glyoxylate shunt that is not part of the 3HP-bicycle is shown in fine lines. The module-boundary nature of acetate (ACE) and succinate (SUC) is shown by the intersection of multiple paths in these compounds. Radially aligned reactions are homologous in local-group chemistry; deviations from strict homology in different pathways appear as excursions from concentric circles.

Even the distinctive step to crotonyl-CoA in the 4HB pathway creates an aconitate-type intermediate, and the enzyme responsible has high homology to the acrolyl-CoA synthetase [110, 111], whose output (acrolyl-CoA) follows the standard pattern. Only the position of the double bond breaks the strict pattern in crotonyl-CoA, and the abstraction of the un-activated proton required to produce this bond requires the unique ketyl-radical intermediate [112]. From crotonyl-CoA, the sequence to 3-hydroxybutyrate is then followed by a surprising oxidation and re-hydration, resulting in a 5-step, redox-neutral sequence. The net effect of this sequence is to shift the carbonyl group (of succinate semialdehyde, SSA) by one carbon (in acetoacetate, AcACE). Because the 4HB pathway takes in no new CO2 molecules, this isomerization enables regeneration of the network catalysts in the same way the reduction/aldol-cleavage sequence enables regeneration for rTCA or 3HP.

FIG. 8: Comparison of redundant reactions in the loop carbon fixation pathways. Pathways are divided into “long-molecule” (upper-ranks) and “short-molecule” (lower-ranks) segments; long-molecule segments occupy roughly the upper-right half-plane in Fig. 7, and abbreviations are as in Fig. 5. Molecule forms are shown next to the corresponding tags. Bonds drawn in red are the active acetyl or semialdehyde moieties in the respective segments. Vertical colored bars align homologous carbon states. The yellow block shows retro-aldol cleavages of citrate or citramalate. Two molecules are shown beneath the tag CRT (crotonate): the greyed-out molecule in parentheses would be the homologue to the other aconitase-type reactions; actual crotonate (full saturation) displaces the double bond by one carbon, requiring the abstraction of the α-proton in 4-hydroxybutyrate via the ketyl-radical mechanism that is distinctive of this pathway. Arc labels in the figure: 3HP (long); rTCA (long); 4HB; rTCA (short); 3HP (short).

Duplication of reaction sequences in diverse fixation pathways results from retention of gene sets as organism clades diverged. Duplication of local-group chemistry in diverse reactions has resulted (at least in most cases) from retention of reaction mechanisms as enzyme families diverged. All enoyl intermediates are produced by a widely diversified family of aconitases [113], while biotin-dependent carboxylations are performed by homologous enzymes acting on pyruvate and α-ketoglutarate [86],26 and substrate-level phosphorylation and thioesterification are similarly performed by homologous enzymes on citrate and succinate in rTCA [89, 90].27 The wide coverage of a few reaction types may reflect their early establishment by promiscuous catalysts [116], followed by evolution toward increasing specificity as intermediary metabolic networks expanded and metabolites capable of participating in carbon fixation diversified.

26 Similar to the synthesis of citryl-CoA we separate here the carboxylation of α-ketoglutarate from the subsequent reduction of oxalosuccinate to isocitrate – performed by a single enzyme in most organisms – because it is argued to be the ancestral form [114, 115].

27 However, the thioesterification of propionate is performed by distinct enzymes in bacteria and archaea, an observation that has been interpreted to suggest convergent evolution [82, 99].

A functional identification of modules that seeks to minimize influence from historical effects (such as lock-in) has been carried out by Noor et al. [117], and identifies similar module boundaries. Using as data the first three numbers of the EC classification of enzymes – which distinguish reaction types but coarse-grain over both substrate specificity and enzyme homology – they show that many pathways in core metabolism are the shortest routes possible between inputs and products. Where their analysis overlaps with the pathways shown here, many of their minimal sequences overlap with the modules in Fig. 7, as well as with others in gluconeogenesis which we do not consider here. Thus, returning to metabolism-first premises [6], it may be that historical retention of a few reaction types reflects facility of the substrate-level chemistry, and that this has placed time-independent constraints on evolution.

The functional-group homology shown in Fig. 8 allows us to separate stereotypical sequences of widely diversified reactions from key reactions that distinguish pathways. The stereotypical sequences lie downstream of reactions such as the ferredoxin-dependent carbonyl insertion (rTCA), or biotin-dependent carboxylation (3HP), which are associated with highly conserved enzymes or cofactors. The downstream reactions are also more “elementary”, in the sense that they are common and widely diversified in biochemistry, compared to the pathway-distinguishing reactions.

3. Association of the initiating reactions with transition-metal sulfide mineral stoichiometries and other distinctive metal-ligand complexes

The observation that alternative fixation pathways are not distinguished by their internal reaction sequences, but primarily by their initiating reactions, suggests that these reactions were the crucial bottlenecks in evolution, and perhaps reflect the limiting diversity of chemical mechanisms for carbon bond formation.28 The distinctive use of metals in the (often highly conserved) enzymes and cofactors for these initiating reactions may further suggest a direct link between prebiotic mineral and metal-ligand chemistry [118], and constraints inferable from the long-term structure of later cellular evolution.

28 Mechanisms of organosynthesis in aqueous solution are especially limited by the instability of radical intermediates, which may be stabilized by association with metal centers.

Several enzyme iron-sulfur centers have been recognized [119, 120] to use strained versions of the unit cells found in Fe-S minerals, particularly Mackinawite and Greigite. These are particular instances within a wider use of transition-metal-sulfide chemistry in core-metabolic enzymes.

Pyruvate:ferredoxin oxidoreductase (PFOR), which catalyzes the reversible carboxylation of acetyl-CoA to pyruvate, contains three [Fe4S4] clusters and a thiamin pyrophosphate (TPP) cofactor. The [Fe4S4] clusters and TPP combine to form an electron transfer pathway into the active site, and the TPP also mediates carboxyl transfer in the active site [84].

The bifunctional carbon monoxide dehydrogenase/acetyl-CoA synthase (CODH/ACS) enzyme that catalyzes the final acetyl-CoA synthesis reaction in the WL pathway employs even more elaborate transition-metal chemistry. Like PFOR, this enzyme uses [Fe4S4] clusters for electron transfer, but its active sites contain additional, more unusual metal centers. The CODH active site contains an asymmetric [Ni-Fe4S5] cluster on which CO2 is reduced to CO [75], while the ACS active site contains a Ni-Ni-[Fe4S4] cluster on which CO (from CODH) and a methyl group from folates are joined to form acetyl-CoA [76–78]. It was originally thought that a variant form of the ACS active site contains a Cu-Ni-[Fe4S4] cluster [121, 122], but it was subsequently shown that the Cu-containing cluster represents an inactivated form of ACS [77]. Similarly, it has also been shown that the open form of the Ni-Ni ACS may switch to a closed, inactivated form by exchanging one of the nickel atoms for a zinc atom [76]. Finally, methyl-group transfer from the methyl-pterin to the ACS active site is mediated by the corrinoid iron-sulfur protein (CFeSP) containing the cobalt-tetrapyrrole cofactor cobalamin [79, 80]. This transfer process involves the cycling between oxidation states of both the cobalt and one of the nickel atoms in the Ni-Ni iron-sulfur cluster of ACS. In the first part of the transfer cobalt becomes oxidized from the Co(I) to the Co(III) state. The subsequent donation of the methyl group to the ACS active site simultaneously reduces cobalt back to the Co(I) state and oxidizes the active nickel from the Ni(0) to the Ni(II) state. Finally, in the release of acetyl-CoA from the ACS the nickel is reduced back to the Ni(0) state, allowing the cycle to start over [80].
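The oxidation-state cycling described for the methyl transfer can be followed step by step with a minimal bookkeeping sketch. It tracks only the formal oxidation states of the cobalt in CFeSP and the active nickel in ACS as stated in the text; it is not a mechanistic model, and the function name `methyl_transfer_cycle` and the step labels are hypothetical.

```python
def methyl_transfer_cycle(co=1, ni=0):
    """Return (step, Co state, Ni state) tuples for one catalytic turn.

    Starts from Co(I) and Ni(0) by default and applies the three state
    changes described in the text, in order.
    """
    trace = [("start", co, ni)]

    # Methyl uptake from methyl-pterin: Co(I) -> Co(III).
    co += 2
    trace.append(("methyl taken up by CFeSP", co, ni))

    # Methyl donation to ACS: Co(III) -> Co(I) while Ni(0) -> Ni(II).
    co -= 2
    ni += 2
    trace.append(("methyl donated to ACS", co, ni))

    # Release of acetyl-CoA: Ni(II) -> Ni(0), closing the cycle.
    ni -= 2
    trace.append(("acetyl-CoA released", co, ni))
    return trace

for step, co_state, ni_state in methyl_transfer_cycle():
    print(f"{step:26s} Co(+{co_state})  Ni(+{ni_state})")
```

Both metals return to their starting states at the end of the turn, which is the condition for the transfer to operate as a repeatable catalytic cycle.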

Perhaps not surprisingly, all these examples of metal-cluster enzymes concern catalysis not just of the formation of C-C bonds, but of the incorporation of the small gas-phase molecule CO2. In general, enzymes involved in the processing of small gas-phase molecules (including H2 and N2) are among the most distinctive enzymes in biology – all but one of the known nickel-containing enzymes belong to this group [123] – always containing highly complex metal centers in their active sites [124–129]. This indicates both the difficulty of controlling the catalysis of these reactions, and the importance of understanding their functions in the context of the origin of life [120].

4. Complex network closures: diversity and opportunity created by aldol reactions

The network closures that retain carbon flux and enable autocatalysis in rTCA, DC/4HB, and 3HP/4HB pathways are all topologically rather simple, and are quite similar due to the homology among most of the pathway intermediates. Their module boundaries also are all defined by acetate and succinate, and at least in the case of acetate, were probably facilitated by its multiple pre-existing roles as the redox-drain of the rTCA cycle [33] and the starting point for both isoprenoid and fatty-acid lipid biosynthesis.

In contrast, the topology of the 3HP-bicycle appears complex, and perhaps an improbable solution to the problem of recycling all carbon flux through core pathways.29 If we are to argue that the emergence or evolution of such network closures is facilitated by a form of modularity, it must exist at the level of reaction mechanisms that render their discovery less improbable. For the 3HP bicycle and the related glyoxylate shunt – and to a lesser degree also for rTCA – the mechanism of interest is the aldol reaction.

The aldol reaction is an internal oxidation-reduction reaction, which means that it exploits residual free energy from organosynthesis, and also that it can take place independently of external electron donors or acceptors. Many aldol reactions are also kinetically facile, occurring at appreciable rates without the aid of catalysts. We therefore expect that among compounds capable of participating in them, aldol reactions would have been common in the prebiotic world, providing opportunities for pathway generation. Since their diversity is difficult to suppress except by special mechanisms [130], we expect that potential aldol reactions among metabolites would either have become regulated (perhaps through phosphoryl occupation of hydroxyl groups) or else incorporated into actively-used biochemical pathways.

29 The many parallel connections in networks such as the 3HP-bicycle or the pentose-phosphate network (see Fig. 5) make the problem of metabolite interconversion complex [61] in a different way than arises in the metabolic “bowtie”. Optimal conversion within the bowtie consists of finding common molecular “divisors” of input and output metabolites, and so can be seen even in the number theory of highly abstracted string chemistries [44]. The fact that short paths exist from most metabolites to a small set of building blocks is, in our interpretation, a reflection of the prior role of the core (where the building blocks are first created) in defining the possibilities for later anabolism and thus the metabolites reached by the bowtie.

Aldol reactions are important generators of diversity in organic chemistry, notorious for the very-complex network known as the formose reaction [131–134], initiated from formaldehyde and glycolaldehyde. Many aldol reactions are possible for sugars, and the reductive pentose phosphate pathway is indeed a network of selected aldol condensations and cleavages among sugar-phosphates [61].

FIG. 9: The 3-hydroxypropionate bicycle (blue) and the glyoxylate shunt (orange) compared. Directions of flow are indicated by arrows on the links to acetate (ACE). The common core that enables flux recycling in both pathways is the aldol reaction between glyoxylate (GLX), acetate, and malate (MAL). The four other aldol reactions (labeled by their cleavage direction) are from isocitrate (ISC), methyl-malate (MML), citrate (CIT), and citramalate (CTM). Malate is a recycled network catalyst in both pathways. Carbon is fixed in the 3HP-bicycle as pyruvate (PYR), so the cycle only becomes autocatalytic if pyruvate can be converted to malate through anaplerotic (rTCA) reactions.

Fewer aldol reactions are possible among intermediates of the rTCA cycle and their homologues such as methyl-malate or citramalate in other carbon-fixation pathways, but all possibilities are indeed used either in intermediary metabolism or in carbon fixation. Fig. 9 shows the 3-hydroxypropionate bicycle and the closely-related glyoxylate shunt. In both pathways, the network topologies that regenerate all carbon flux or achieve autocatalysis are created by aldol reactions. The retention of carbon within the shunt appears to be a reason for its widespread distribution and frequent use [105, 135, 136], even when energetically more-efficient pathways such as the Krebs cycle exist as alternatives within organisms.


5. Re-use of the direct C1 reduction pathway and hybrid fixation strategies

A unique form of re-use is found for the sequence of reactions that directly reduce one-carbon (C1) groups on pterin cofactors. We have argued elsewhere [41] that even when a complete, autotrophic WL pathway is not present due to the loss of the oxygen-sensitive CODH/ACS enzyme, the direct C1-reduction sequence on pterins is often still present and being used as a partial fixation pathway. The reaction sequence supplies the diverse methyl-group chemistry mediated by S-adenosyl-methionine, and the direct synthesis of glycine and serine from methylene groups, reductant, and ammonia. These then serve as precursors to cysteine and tryptophan. The pathway may exist in either a complete (8-reaction) or a previously-unrecognized but potentially widespread (7-reaction) form that involves uptake on N5 rather than N10 of THF [41] (see Fig. 4).

The widely distributed and diversified form of direct C1 reduction functions much as auxiliary catabolic pathways function in mixotrophs [5], operating in parallel to an independent “primary” fixation pathway, with the primary and the direct-C1 pathway supplying carbon to different subsets of core metabolites. In many cases where the CODH/ACS is lost, this loss disconnects the primary and direct-C1 pathway segments, creating the novel feature of a disjoint carbon fixation pathway.30 The existence of parallel fixation pathways in autotrophs had previously been recognized only in one (relatively late-branching) γ-proteobacterium, the uncultured endosymbiont of the deep-sea tube worm Riftia pachyptila, which was found to be able to use both the rTCA and CBB cycles [137]. In that case, however, the two pathways are not disjoint, but rather connected through intermediates in the glycolytic/gluconeogenic pathways. In addition, the capacity for using either cycle is thought to reflect an ability to adapt to variation in the availability of environmental energy sources, with an apparent up-regulation of the more efficient rTCA cycle under energy-poor conditions [137]. Our phylogenetic reconstruction [41], however, indicates that parallel disjoint pathways were the majority phenotype in the deep tree of life, in which a reductive C1 sequence to glycine and serine is preserved in combination with rTCA in Aquificales and Nitrospirae, with CBB in Cyanobacteria, with the 3HP bicycle in Chloroflexi (all bacteria), and with DC/4HB in Desulfurococcales and Acidilobales, and the 3HP/4HB cycle in Sulfolobales (all archaea). In contrast, the full WL pathway is found only in a subset of lineages of bacteria (especially acetogenic Firmicutes) and archaea (methanogenic Euryarchaeota).

30 A secondary connection between the two components may exist in the form of oxidative conversion of 3-phosphoglycerate to serine. This connection may lead subsequently to the loss of direct-C1 reduction as a fixation route, as in the proteobacteria, or it may release a constraint leading to change in pterin cofactor chemistry as in methanogens, discussed below.

Apparently as a result of the flexibility enabled by parallel carbon inputs to core metabolites, the direct C1 reduction sequence is more universally distributed than any of the other loop-networks (whether paired with C1 reduction or used as exclusive fixation pathways), or than the complete WL pathway. The status of the pterin-mediated sequence as a module appears more fundamental than its integration into the full WL pathway, and comparable to the arcs identified within rTCA, which may function as parts of fixation pathways or alternatively as anaplerotic extensions to other pathways. The two types of pathways also serve similar functional roles in our phylogenetic reconstruction of a root carbon-fixation phenotype, as the key components enabling and selecting the core anabolic precursors.

The reductive synthesis of glycine furnishes a potent reminder of the importance of taking evolutionary context into account when interpreting results from studies of metabolism. Historically much of our understanding of biochemistry comes from the study of human (or more generally mammalian) physiology, which can introduce biases. We noted above the example of the reductive citric acid cycle, which is sometimes called the “reverse” citric acid cycle even though it was ancestral to the oxidative form. Similarly, the “glycine cleavage system” (GCS) was originally studied in rat and chicken livers [138], before being recognized as phylogenetically widespread. The distribution of this system is now known to be nearly universal across the tree of life (with methanogens being the main systematic exception, for reasons explained elsewhere [41]), suggesting it was present already in the LUCA. The lipoyl-protein based system has long been known to be fully reversible [138–140], and has nearly neutral thermodynamics at physiological conditions [74]. Thus, the LUCA could have used this system either to synthesize or to cleave glycine. From this perspective the former possibility (synthesis) seems a more likely interpretation, even without additional data. Absent the interpretation bias from mammalian physiology, the choice between these alternatives might have become clear much sooner.

C. A coarse-graining of carbon-fixation pathways

We can combine all the previous observations on modularity in carbon-fixation – the sharing of arcs between loop pathways, the re-use of TCA and reductive C1 sequence to complete the set of anabolic pillars – to perform a “coarse-graining” of carbon-fixation. Combining the decomposition of Fig. 7 with the gluconeogenic and WL pathways in Fig. 5, we may list the seven modules from which all known autotrophic carbon-fixation pathways are assembled: 1) direct one-carbon reduction on folates or related compounds, with or without the CODH/ACS terminal reaction of WL; 2) the short-molecule rTCA arc from acetyl-CoA to succinyl-CoA; 3) the long-molecule rTCA arc from succinyl-CoA to citryl-CoA; 4) the gluconeogenic/reductive pentose-phosphate pathway, with or without the RubisCO reaction of CBB; 5) the 3HP arc from acetyl-CoA to succinyl-CoA; 6) the long-molecule 4HB pathway from succinyl-CoA to acetoacetyl-CoA; 7) the glyoxylate-shunt/mesaconate pathway to citramalate, which is the long-molecule loop in the 3HP bicycle. Fig. 10 shows the summary of these modules at the pathway level, as well as their different combinations to form complete autotrophic carbon-fixation pathways.
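As a rough illustration of how the arc modules listed above can recombine, the following Python sketch pairs each short arc (acetyl-CoA to succinyl-CoA) with each long arc (starting from succinyl-CoA). The module numbers and endpoints follow the list in the text. The dictionary names, the function `loop_combinations`, and the assignment of pathway names to pairs are assumptions made for this sketch; they are not a transcription of Fig. 10.

```python
from itertools import product

# Module numbers and endpoints follow the seven-module list in the text.
# Short arcs run from acetyl-CoA to succinyl-CoA.
SHORT_ARCS = {2: "short-molecule rTCA arc", 5: "3HP arc"}
# Long arcs start from succinyl-CoA.
LONG_ARCS = {3: "long-molecule rTCA arc", 6: "long-molecule 4HB pathway"}

# Assumption of this sketch: names attached to arc pairings,
# based on the pathway names used in the text.
NAMED_LOOPS = {(2, 3): "rTCA", (2, 6): "DC/4HB", (5, 6): "3HP/4HB"}


def loop_combinations():
    """Pair every short arc with every long arc and look up a name, if any."""
    pairs = product(sorted(SHORT_ARCS), sorted(LONG_ARCS))
    return [((s, l), NAMED_LOOPS.get((s, l))) for s, l in pairs]


for (short, long_), name in loop_combinations():
    label = name if name is not None else "(no name assigned in this sketch)"
    print(f"modules {short}+{long_}: {label}")
```

A pairing that maps to `None` simply has no name attached here; the sketch makes no claim about whether such a combination occurs in any organism.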

The importance of including glycine in the set of anabolic pillars immediately becomes clear in this coarse-grained view. The general similarity among different carbon-fixation pathways increases significantly, while finer distinction between forms becomes possible. In particular, both of the pathways that have been most commonly discussed in the context of ancestral carbon-fixation and the origin of life, WL and rTCA [7, 27, 73, 88], separate into deep- and late-branching forms. The increased similarity of the deep-branching forms of these pathways suggests an underlying template that combines both WL and rTCA in a fully connected network. WL and rTCA differ from this linked network by single reactions associated either with energy (ATP) economy or oxygen (or perhaps other oxidant) sensitivity. Combining information on the synthesis, structural variation, ecology and phylogenetics of the pterin molecules upon which direct C1 reduction is performed similarly suggests a distinction between the acetogenic (bacterial) and methanogenic (archaeal) forms of WL associated with energy economy [41]. A “proto-tree” of carbon-fixation emerges from the pooling of these different observations, which in turn makes it possible to reconstruct a complete phylometabolic tree of carbon-fixation, as discussed in detail in section III D below.

1. How the inventory of elementary modules has constrained innovation and evolution

The essential invariance across the biosphere of the seven sub-networks listed above allows us to represent all carbon-fixation phenotypes in terms of the presence or absence, connectivity, and direction of these basic modules. In this representation, metabolic innovation at the modular level retains the character of individual discrete events, even if the pathway segments involved incorporate multiple genes. In cases where multiple genes must be acquired to constitute a module, as in the innovation of the 4HB pathway, this innovation may take place at higher levels of metabolism (e.g. fermentative secondary metabolism), after which the incorporation of the module as a fixation pathway appears as a single innovation.

Because the module boundaries are defined by particular (often universal) molecular species (e.g., acetyl-CoA, succinyl-CoA, and ribulose-1,5-bisphosphate) it often remains true that innovation can be traced to the change in single genes. This is true for the loss of the CODH/ACS from acetyl-CoA phenotypes, the innovation of RubisCO in CBB bacteria, or the loss of substrate-level phosphorylation to acetyl-CoA or succinyl-CoA in acetogens. A case with only slightly greater complexity is the apparently repeated, convergent evolution of an oxidative pathway to form serine from 3-phosphoglycerate (3PG), which involves three common and widely diversified reactions: a dehydrogenation, a reductive transamination, and a dephosphorylation.

[Figure 10 graphic omitted. Recoverable labels: columns “Pillars w/o glycine” and “Pillars w/ glycine”; pathways DC/4HB, 3HP/4HB, WL, CBB, rTCA, (3HP)2, and rTCA + WL; markers “ATP dependent”, “O2 sensitive”, “C1 uptake = 0 ATP”, and “C1 uptake = 1 ATP”; domain tags A (Archaea) and B (Bacteria); module legend numbered 1 to 7 with labels WL/reductive glycine, rTCA cycle arcs, pentose phosphate, 3HP pathway, 4HB pathway, and glyoxylate/mesaconate bypass.]

FIG. 10: Coarse-grained summary of carbon-fixation pathways. The left panel shows the six pathways as they are known from extensive laboratory characterization. Including glycine along with the anabolic pillars as the molecules that must be reached in carbon-fixation then adds resolution, allowing finer distinctions among forms and generally increasing their similarity. As a result, underlying evolutionary templates and patterns begin to emerge. The panel on the bottom right shows the modules from which all carbon-fixation pathways are constructed, as outlined in the main text.

At the module level, we may represent changes in carbon fixation pathways between closely-related phenotypes in terms of single connections, disconnections, or overall changes of direction within the subsets of the seven modules which are present. The change of direction within modules is usually complete, even if it is partial or intermediate at the level of whole pathways. An example is the switch from autotrophic rTCA to fermentative TCA using a reductive small-molecule arc and an oxidative large-molecule arc [91]. Such fermentative pathways may alternate with fully oxidative TCA (Krebs) cycling, and they often occur in organisms that carry homologues to genes for both oxidative and reductive pathway directions [92–94].

An important exception to this pattern is the partial reversal of the formyl-to-methylene sequence on folates, between its carbon-fixation role and its role in the catabolic cleavage of glycine. We refer in Ref. [41] to the module formed by combining the GCS with the methylene-serine transferase as the glycine cycle. The combination of the complex free energy landscape provided by the folates [65] with the reversibility and nearly neutral thermodynamics of the glycine cycle [74, 138] permits a high degree of flexibility within this module. Carbon can enter directly through CO2, through serine (from 3PG), or through glycine (from glyoxylate), and from any of these sources may be redirected to all of C1 chemistry. The topology of the main reaction sequence is preserved in all these cases of reversal, though new enzymes or cofactors may be recruited to reverse some reactions.31

D. Reconstructed evolutionary history

1. Phylogeny suggests little historical contingency of deep evolution within the modular constraints

The small number of modules that contribute to carbon fixation, and the even smaller number of “gateway” molecules that serve as interfaces between most of them, permit free recombination into many phenotypes satisfying the constraints of autotrophy. An important consequence of free recombination is that the external constraint (autotrophy) does not lock in dependencies within networks over separations larger than the modules themselves. Homology across intra-modular reaction sequences – especially if it is due to catalytic promiscuity – further weakens any lock-in effect created by selection for metabolic completeness. Through these mechanisms modularity promotes innovation-sharing [141] and rapid and reliable adaptation [18] to environmental conditions, but reduces standing variation among individuals sharing a common environment.

As we reviewed in Sec. III, distinct carbon-fixation pathway modules have very different couplings to the chemical environment. The genome distributions reported in Ref. [41] show that they also have very uneven phylogenetic distribution. (For example, TCA arcs and intermediates, as well as direct C1-reduction, are nearly universally distributed, while the 3HP arcs are restricted to specific bacterial or archaeal clades living in alkaline environments.) Finally, we note that not all module combinations consistent with autotrophy have been observed in extant organisms.

31 An example is the reversal of the complete rTCA cycle to the oxidative Krebs cycle. The electron donor in rTCA, reduced ferredoxin, is replaced by lipoic acid as an electron acceptor in the Krebs cycle, in the TPP-dependent oxidoreductase reaction. The enzymes catalyzing the retro-aldol cleavage of rTCA, which have undergone considerable re-arrangement even within the reductive world [89, 90], were further modified to the oxidative citryl-CoA synthetase.

By combining these observations it is possible to arrange autotrophic phenotypes on a graph according to their degree of similarity, and to assign environmental factors as correlates of phenotypic changes over most links. The graph projects onto a tree with very high parsimony and therefore almost no requirement to invoke either horizontal gene transfer or convergent evolution from distinct lineages. With a natural choice of root motivated by the overlap with bacterial and archaeal phylogeny, links become directed and environmental factors take on the interpretation of evolutionary causes. The lack of reticulation in a tree of innovations in autotrophy – at first surprising when compared to highly-reticulated gene phylogenies [142] covering the same period – becomes sensible as a record of invasion and adaptation to new chemical environments by organisms capable of maintaining little long-standing variation.

2. A parsimony tree for autotrophic metabolism, and causation on links

The tree of autotrophic carbon-fixation phenotypes from Ref. [41] is shown in Fig. 11. All nodes in the tree satisfy the constraint that all five universal anabolic precursors plus glycine can be synthesized directly from CO2. We have defined parsimony by requiring single changes over links at the level of pathway modules, as explained above, rather than at the level of single genes, in cases where the two criteria differ. (This definition separates the evolution of genetic backgrounds, such as 4-hydroxybutyrate fermentation, from the events at which organisms came to rely on complete pathways for autotrophy.) A complete-parsimony tree for the known phenotypes is not possible, so we chose a tree in which the only violations are duplicate innovation of serine synthesis from 3-phosphoglycerate (common reactions and diversified enzyme families), and duplication or transfer of the short-molecule 3HP pathway (common environments).

The nodes in the tree of Fig. 11 are all phenotypes of extant organisms, with one important exception,32 which is the node between the Aquificales branch and the Firmicute/Archaea branch. Aquificales and all phenotypes descending from them lack the CODH/ACS enzyme, while Firmicutes and archaea lack one or more ATP-dependent acyl-CoA (citryl-CoA or succinyl-CoA) synthases. Therefore, if we seek a connected tree of life, two changes – the gain of one enzyme and loss of the other – are required to connect these branches. Since any organism lacking both enzymes could not fix carbon autotrophically, we have chosen the order of gain and loss so that the intermediate node has both the CODH/ACS and the acyl-CoA synthases. It therefore has both a complete WL pathway and an autocatalytic rTCA loop, connected through their shared intermediate acetyl-CoA. Losses (but not re-acquisitions) of either of these enzymes occur at multiple points on the tree, and both have likely explanations in either environmental chemistry or energetics. For this reason and several others given below, although a parsimony tree is (a priori) unrooted, we will regard the joint WL/rTCA phenotype as not only a bridging node but the root of the tree of autotrophs.

32 There is also one unimportant exception, which is the insertion of an acetogenic phenotype with a facultative oxidative pathway to serine at the root of the Euryarchaeota. Since methanogens use this pathway, and since an acetogenic pathway lacking oxidative serine synthesis is the most plausible ancestral form for all archaea as well as for Firmicutes within the bacteria, we infer that such an intermediate state did or does exist. This fixation pathway is consistent with forms observed in extant organisms, and we expect that such a phenotype either will be discovered or will result from reclassification of genes in an existing organism.
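The ordering argument for the bridging node can be checked by brute force. The sketch below is a minimal Python illustration, assuming (as the text states) that a phenotype lacking both enzymes cannot fix carbon autotrophically and (as a simplification of this sketch only) that either enzyme alone suffices. The names `is_autotrophic` and `viable_orderings` are hypothetical and do not come from Ref. [41].

```python
from itertools import permutations

CODH_ACS = "CODH/ACS"
ACYL_COA_SYNTHASE = "acyl-CoA synthase"


def is_autotrophic(enzymes):
    """Sketch-level assumption: at least one of the two enzymes must be present."""
    return len(enzymes) > 0


def viable_orderings(start, end):
    """List orderings of single gain/loss events from start to end
    in which every intermediate phenotype remains autotrophic."""
    changes = sorted(set(start) ^ set(end))
    viable = []
    for order in permutations(changes):
        state = set(start)
        ok = True
        for enzyme in order:
            state ^= {enzyme}  # toggle: gain if absent, lose if present
            if not is_autotrophic(state):
                ok = False
                break
        if ok:
            viable.append(order)
    return viable


aquificales_like = {ACYL_COA_SYNTHASE}  # lacks CODH/ACS
acetogen_like = {CODH_ACS}              # lacks the acyl-CoA synthase
print(viable_orderings(aquificales_like, acetogen_like))
```

Under these assumptions the only surviving ordering is gain of the CODH/ACS followed by loss of the synthase, so the intermediate phenotype carries both enzymes, in line with the choice described in the text.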

[Figure 11 graphic omitted. Recoverable labels: clades Firmicutes incl. Acetogens, Clostridium kluyveri, Methanogens, Heterotrophic Euryarchaeota, Sulfolobales, Desulfurococcales, Acidilobales, Aquificales, Ca. Nitrospira defluvii, δ-proteobacteria (?), ε-proteobacteria, Cyanobacteria, and Chloroflexi; root annotations “Robust network (pre-LUCA?)” and “Bridge forms?”; innovations CBB cycle, 4-HB fermentation, 4-HB cycle, 3HP pathway, oxidative glycine, oxidative/reductive glycine, H4MPT, furans, and 2-P-glycolate; imputed causes Energy, ALK, (Redox), O2, and O2 / S{n−}; domain annotations Archaea (polyisoprene G1P membranes, archaeal DNA systems, treelike phylogenies) and Bacteria (fatty acyl G3P membranes, bacterial DNA systems, reticulated phylogenies); legend entries lost reaction, THF/H4MPT sequence, pentose phosphate, and TCA reactions; metabolite labels 3PG, F6B, RIB, RBL, GAP, GLY, SER, HCO−, −CH2−, ACA, and PYR.]

FIG. 11: A parsimony-based reconstruction of the innovations linking the major carbon-fixation phenotypes, from Ref. [41]. Nodes in the tree are autotrophic phenotypes, following the coarse-grained notation introduced in Fig. 10, and summarized in the legend. Grey links are transitions in the maximum-parsimony phylometabolic reconstruction, and yellow-highlighted regions in the diagrams are innovations following each link. Organism names or clades in which these phenotypes are found are given in black; fixation pathways innovated along each link are shown in blue, and imputed evolutionary causes are shown in red. S{n−} refers to sulfides of different oxidation states. Dashed lines separate regions in which the clades by phylometabolic parsimony follow standard phylogenetic divisions. Abbreviations: formyl (HCO−); methylene (−CH2−); acetyl-CoA (ACA); pyruvate (PYR); serine (SER); 3-phosphoglycerate (3PG); glyceraldehyde-3-phosphate (GAP); fructose-1,6-bisphosphate (F6B); ribose-phosphate (RIB); ribulose-phosphate (RBL); alkalinity (ALK). Arrows indicate reaction directions; dashed lines connecting 3PG to SER indicate intermittent or bi-directional reactions.

In the evolution of carbon fixation from a joint WL/rTCA root, the primary division is between the loss of the CODH/ACS, resulting in rTCA loop-fixation phenotypes, and the loss of the acetyl-CoA or succinyl-CoA synthetases, resulting in acetogenic phenotypes.

Very low levels of oxygen permanently inactivate the CODH/ACS, so its loss is probable even under microaerobic conditions. Although the dominant mineral buffers for oxygen in the Archaean remain a topic of significant uncertainty [143–146], it appears implausible that molecular oxygen was the toxin responsible for loss of the CODH/ACS much before the “Great Oxidation Event” (GOE).33 Therefore the sensitivity of the CODH/ACS to sulfides or perhaps other oxidants (S. Ragsdale, pers. comm.) remains a possibly important factor in the early divergences of carbon fixation.

Alternatively, among strict WL-anaerobes, the loss of citryl-CoA or succinyl-CoA synthetase saves one ATP per carbon fixed, and all acetogenic phenotypes break rTCA cycling only through the loss of one or the other of these enzymes. We therefore interpret the loss of rTCA cycling as a result of selection for energy efficiency. The failure to regain either of these enzymes by acetogens which subsequently also lost the CODH/ACS is perhaps surprising given the inferred homology of the ancestral citryl-CoA and succinyl-CoA synthetases [89, 90], but explains the absence of rTCA cycling in either Firmicutes or any Archaea.

The remaining autotrophic phenotypes are derived from either rTCA cycling or acetogenesis in natural stages due to plausible environmental factors. Oxidative serine synthesis (from 3PG) is associated with the rise of the proteobacteria, whose differentiation in many features tracks the rise of oxygen and the transition to oxidizing rather than reducing environments. RubisCO and subsequently photorespiration arise within the cyanobacteria. The innovation of the 3HP bicycle from the malonate pathway arises within the Chloroflexi. In both Firmicutes (bacteria) and the crenarchaea, 4-hydroxybutyrate (or closely related 4-aminobutyrate) fermentation is more or less developed. Closure of the fermentative arcs to form a ring, again driven by elimination of the CODH/ACS [41], leads to the DC/4HB pathway in Crenarchaeota, which is then specialized in the Sulfolobales to the alkaline 3HP/4HB pathway. The Euryarchaeota are distinguished by the absence of an alternative loop-fixation pathway to rTCA, so that all members are either methanogens or heterotrophs.

Similarly, the innovation of the 3HP pathways, using biotin, emerges as a specialization to invade extreme but relatively rare environments. A particularly interesting case is the modification of folates in archaea, leading from THF in ancestral nodes to tetrahydromethanopterin in the methanogens, which enables initial fixation of formate (formed by hydrogenation of CO2) in an ATP-free system [41, 65]. The root position of rTCA explains the preservation of rTCA arcs both in reductive acetyl-CoA pathways, and in anaplerotic pathways for other fixation pathways, and the root position of direct C1 reduction explains its near-universal distribution.

33 The GOE is usually dated at 2.5 GYA, though arguments exist for low levels of oxygen as much as 50-100 million years earlier [147, 148]. These may be relevant dates to compare to genetically estimated loss events in later branches of the Archaea or possibly in the Clostridia, but they are not plausible as dates for the first branching in the tree of Fig. 11.

3. Parsimony violation and the role of ecological interactions

A tree is by construction a summary statistic for the relations among the phenotypes which are its leaves or internal nodes. It is not inherently a map of species descent, and takes on that interpretation only when common ancestry is shown to explain the conditional independence of branches given their (topological) parent nodes. This caution is especially important for the interpretation of Fig. 11, which shows high parsimony in the deepest branches where horizontal gene transfer is generally believed to have been most intense [51, 52]. We have argued that this behavior is consistent in a tree of successive optimal adaptations to varied environments, by organisms that could maintain little persistent variation. Violations of parsimony that are improbable by evolutionary convergence contain information about contact among historically separated lineages. Under this interpretation the separation is primarily environmental, with the subsequent contact identifying ecological co-habitation. The possible transfer of genes for the 3HP pathway is especially plausible, as the organisms involved may have shared the same extreme (alkaline) environments and been under common selection pressure, which when severe is known to accelerate rates of gene transfer [149, 150].

While our methods in Ref. [41] (flux-balance analysis of core networks) may be interpreted as producing either organism models or meta-metabolome models of consortia, the general agreement with robust phylogenetic signatures from many different genomic phylogenies [142, 151, 152] may still suggest a dominant role for vertical descent among autotrophic organisms (and not merely consortia) in the early evolution of carbon fixation.

4. A non-modern but plausible form of redundancy in the root node

The joint WL/rTCA network was introduced into Fig. 11 to produce a connected tree containing only autotrophic nodes. Our constraints in choosing it led to a kind of redundancy not found in extant fixation pathways. Either WL or rTCA alone is self-maintaining (in a modern organism) so a network that incorporates both is redundantly autocatalytic. While this is an important and speculative departure from all known phenotypes, it can be argued to reduce fragilities in both WL and rTCA under conditions of poor catalysis or unreliable regulation of anabolism. In that respect it is a more plausible phenotype for a universal ancestor than any modern network.

The enhanced robustness of the joint network follows from the interaction of short-loop and long-loop autocatalysis. The threshold for autocatalysis in the rTCA loop, fragile against parasitic side reactions or unconstrained anabolism, is supported and given a recovery mode when fed by an independent supply of acetyl-CoA from WL. In turn, the production of a sufficient concentration of folates to support direct C1 reduction, fragile if the long biosynthetic pathway is unreliable, is augmented by additional carbon fixed in rTCA. These arguments are topological, and do not make specific reference to whether the catalysts for the underlying reactions are enzymes. They may provide context for (perhaps multi-stage) models of transition from primordial mineral catalysis [68, 153] to the eventual support of carbon fixation by biomolecules.

Fig. 12 shows a numerical solution for the current flow through a minimal version of the joint WL/rTCA network, with lumped-parameter representations of parasitic side reactions and the net free energy of formation of acetate. (The exact rate equations used, and their interpretation, are provided in App. A.) In the absence of a WL “feeder” pathway, rTCA has a sharp threshold for the maintenance of flux through the network as a function of the free energy of formation of its output acetate. The existence of such a sharp threshold depending on the rate of parasitism, below which the cycle supports no transport, has been one of the major sources of criticism of network-autocatalytic pathways as models for proto-metabolism [154]. When WL is added as a feeder, however, the threshold disappears, and some nonzero flux passes through the pathway at any positive free energy of formation of the outputs.
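The qualitative behavior described here (a sharp threshold without a feeder, and a smoothed transition with one) can be reproduced by a toy model. The Python sketch below is not Eq. (A44) and is not taken from App. A; it assumes a hypothetical rate law dx/dt = z_rTCA x (1 − x) − x + z_WL (1 − x), in which x is a normalized acetate level, the first term is autocatalytic production saturating at x = 1, the second is a unit-rate drain, and the third is a linear feeder. The function name `steady_state_fraction` and the parameter names are likewise assumptions of this sketch.

```python
import math


def steady_state_fraction(z_rtca, z_wl):
    """Stable steady state x in [0, 1) of the toy rate law
    dx/dt = z_rtca*x*(1 - x) - x + z_wl*(1 - x).

    Setting dx/dt = 0 gives the quadratic
    z_rtca*x**2 - (z_rtca - 1 - z_wl)*x - z_wl = 0,
    and the larger root is returned.
    """
    if z_rtca <= 0 or z_wl < 0:
        raise ValueError("require z_rtca > 0 and z_wl >= 0")
    b = z_rtca - 1.0 - z_wl
    disc = b * b + 4.0 * z_rtca * z_wl
    return (b + math.sqrt(disc)) / (2.0 * z_rtca)


# Without a feeder the toy loop has a sharp threshold at z_rtca = 1.
for z in (0.5, 1.0, 2.0, 10.0):
    print(z, steady_state_fraction(z, 0.0), steady_state_fraction(z, 0.1))
```

With z_wl = 0 the returned value is max(0, 1 − 1/z_rtca), so it vanishes for every z_rtca at or below 1. For any z_wl > 0 the value is strictly positive at every z_rtca > 0, which is the toy analogue of the smoothing described in the text.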

Chemical self-amplification, if it can be demonstrated experimentally, is the most plausible mechanism by which the biosphere can concentrate all energy flows and material cycles through a small, stable set of organic compounds. It supplies the molecules that are within the loop – and secondarily those that are made from loop intermediates – above the concentrations they would have in a Gibbs equilibrium distribution, as a result of flow through the network. The fact that self-amplification is permitted to act in the model of Fig. 12, even below the chemical-potential difference where the rTCA loop alone is self-sustaining, provides a mechanism by which the loop intermediates could have been provided in excess supply in the earliest stages of the emergence of metabolism. We return in Sec. VI to a related form of robustness and selection, which applies as anabolic pathways begin to form from loop intermediates.

E. The rise of oxygen, and changes in the evolutionary dynamics of core metabolism

The limits of the phylometabolic tree we show in Fig. 11 fall on a horizon that coincides with the rise of oxygen. More precisely: we do not show branches that phylogenetically trace lineage divisions later than this horizon, because no known divisions in carbon fixation distinguish such later branches. Many of the late branches contain only heterotrophs, and to the extent that post-oxygen lineage divisions follow divisions in metabolism, they are divisions in forms of heterotrophy. The rise of oxygen seems to have put an end to innovation in carbon fixation, and led to a florescence of innovation in carbon sharing.34

[Figure 12 graphic omitted. Surface plot titled “Fraction of equilibrium acetate from driving rTCA and WL in parallel”; horizontal axes log(z_rTCA) and log(z_WL), each running from −3 to 3; vertical axis x = [ACE] / [ACE]_G, running from 0 to 1.]

FIG. 12: Graph of solutions to Eq. (A44) from App. A, shown versus base-10 logarithms of z_rTCA and z_WL. The quantity x on the z-axis is the fraction of the acetate concentration [ACE] relative to the value it would take in an equilibrium ensemble with carbon dioxide, reductant, and water. The value x = 1 corresponds to an asymptotically zero impedance of the chemical network, compared to the rate of environmental drain. The parameter z_rTCA is a monotone function of the non-equilibrium driving chemical potential to synthesize acetate, and z_WL measures the conductance of the “feeder” WL pathway. At z_WL → 0, the WL pathway contributes nothing, and the rTCA network has a sharp catalytic threshold at z_rTCA = 1. For nonzero z_WL, the transition is smoothed, so some excess population of rTCA intermediates occurs at any driving chemical potential.

On the same horizon, the high parsimony of the tree we have shown ends, and it becomes necessary to explain complex metabolisms as a consequence of transfer of metabolic modules among clades in which they had evolved separately. We no longer expect that it would be possible to explain – and to some extent to predict – these innovations given only constraints of chemistry and invasion of new geochemical environments. Instead, they rely chemically on ecologically determined carbon flows, and genetically on opportunities for transfer of genes or pathway segments. Therefore any explanation will require some explicit model of ecological dynamics, and may require invoking some accidents of historical contingency. This contrast of phylometabolic reconstructions, between later and earlier periods, illustrates our association of parsimony violation with the role of ecosystems and explicit contributions of multilevel dynamics to evolution.

34 By “sharing” we refer to general exchanges in which organic compounds are re-used without de novo synthesis; we do not intend only symbiotic associations. At the level of aggregate-ecosystem net primary production, the exchange of organics with incomplete catabolism may, however, reduce the free energy cost of the de novo synthesis of biomass that supports a given level of phenotypic diversity or specialization, allowing ecologies of complementary specialists to partially displace ecologies of generalist autotrophs.

It is perhaps counterintuitive, but we believe consistent, that the phylometabolic tree is more tree-like in the earlier era of more extensive single-gene lateral transfers, and becomes less tree-like and more reticulated in the era of complex ecosystems enabled by oxygenic metabolisms, which may have come as much as 1.5 billion years later. For reticulation to appear in a tree of reconstructed metabolisms, it is necessary that variants which evolved independently – as we have argued, under distinct selection pressures – be maintained in new environments where they can be brought into both contact and interdependence. The maintenance of standing variation is facilitated both by the evolution of more advanced mechanisms to integrate genomes and limit horizontal transfer, and by the greater power density of oxygenic metabolisms.

The serine cycle used by some methylotrophic proteobacteria, shown in Fig. 13, provides an example of the structure and complex inheritance of a post-oxygen, heterotrophic pathway. Methylotrophs possess both an H4MPT system transferred from methanogenic archaea [155, 156], and a conserved THF system ancestral to the proteobacteria (and we argue, to the universal common ancestor). In methylotrophs, H4MPT is primarily used for the oxidation of formaldehyde to formate, while THF can be used in both the oxidative direction as part of the demethylation of various reduced one-carbon compounds, and in the reduction of formate. C1 compounds are then assimilated either as CO2 in the CBB cycle, as methylene groups and CO2 in the serine cycle, or as formaldehyde in the ribulose monophosphate (RuMP) cycle, in which formaldehyde is attached to ribulose-5-phosphate to produce fructose-6-phosphate [157, 158].

[Figure 13: pathway diagram not reproduced. Panel labels: serine cycle; glyoxylate regeneration cycle; TCA arcs; glycolysis; 3HP arc; reversed 3HP arc; reversed 4HB arc; glycine cycle arc.]

FIG. 13: The serine cycle/glyoxylate-regeneration cycle of methylotrophy. Left panel shows the stoichiometric pathway overlaid on the autotrophic loop pathways from Fig. 7. Right panel gives a projection of the serine cycle and glyoxylate regeneration cycle showing pathway directions; overlaps with the predecessor autotrophic pathways are labeled.

The full substrate network of the most complex assimilatory pathway of methylotrophy is a bicycle in which the serine cycle is coupled to the glyoxylate regeneration cycle. This full network employs segments of all four loop-autotrophic pathways, as well as reactions in glycolysis, and part of the “glycine cycle”. Carbon enters the pathway at several points. Methylene groups enter through the glycine cycle, combining with glycine to form serine. Serine is then deaminated and reduced to pyruvate, which is combined with a CO2 in a carboxylation to enter the core of TCA reactions. TCA arcs are performed reductively from pyruvate to malate, and oxidatively from succinate to malate, following the pattern of the 3HP pathway plus anaplerotic reactions from its output pyruvate. The short-molecule arc of 3HP is run as in the autotrophic carbon-fixation pathway starting from propionate, but part of the long-molecule arc of 3HP is reversed in the glyoxylate regeneration cycle. The 4HB pathway arc, transferred from archaea, is also reversed to feed this glyoxylate cycle, and is followed by a final additional carboxylation unique to this pathway [159, 160].

The serine/glyoxylate cycle of methylotrophy is a remarkable “Frankenstein’s monster” of metabolism, stitched together from parts of all pre-existing pathways, but requiring almost nothing new in its own local chemistry. Notably, the modules in this bacterial pathway which have been inherited from archaea are all reversed from the archaeal direction.

F. Summary: Catalytic control as a central source of modularity in metabolism

Focusing on the metabolic foundation of the biosphere – carbon-fixation and its interface with anabolism – we have seen many examples of how catalytic control is a central organizing principle in metabolism. The most complex and conserved reaction mechanisms in carbon-fixation often have unique (often very elaborate) metal centers and cofactors associated with them, reflecting the difficulty of the catalytic problem being solved. Not surprisingly, these reactions form the boundaries at which the various modules making up carbon-fixation are connected. As a result, these module boundaries form some of the strongest long-term constraints on evolution. They act as “turnstiles” along which the flow of carbon into the biosphere is redirected upon biogeochemical perturbations, resulting in the deepest structure in the tree of life.

The catalytic control of classes of organic reactions also leads to a secondary source of modularity, the locking in of various core pathways in the elaboration of downstream intermediary metabolism. The most striking example of this is that across the modern biosphere all anabolic pathways originate in only a very small number of molecules, mostly within the TCA cycle, even though a variety of different carbon-fixation strategies are used. The suggested interpretation is that much of intermediary metabolism had elaborated prior to the divergences in carbon-fixation. A related, but slightly different form of lock-in is found in the construction of methylotrophic pathways, which circumvents innovations in the catalytic control of difficult chemistry by re-using a wide range of parts from pre-existing carbon-fixation pathways.

IV. COFACTORS, AND THE EMERGENCE AND CENTRALIZATION OF METABOLIC CONTROL

Cofactors form a unique and essential class of components within biochemistry, both as individual molecules and as a distinctive level in the control over metabolism. In synthesis and structure they tend to be among the most complex of the metabolites, and unlike amino acids, nucleotides, sugars and lipids, they are not primary structural elements of the macromolecular components of cells. Instead, cofactors provide a limited but essential inventory of functions, which are used widely and in a variety of macromolecular contexts. As a result they often have the highest connectivity (forming topological “hubs”) within metabolic networks, and are required in conjunction with key inputs or enzymes [161–163] to complete the most elaborate metabolisms.

In this section we will discuss how cofactors determine and regulate the scope of organic reactions in biochemistry, and how as focal points of selection they have been important in the large-scale structure of evolution. In understanding the role of cofactors in the emergence and evolution of metabolism, two consequences of their functional roles are essential to acknowledge. First, as we have discussed, cofactor functions are central in going from the short-loop network autocatalysis that would have been abiotically favored with proper mineral supports, to the long-loop network autocatalysis upon which all life today rests. As we will see, the most structurally complex cofactors are associated with the most catalytically complex functions within carbon-fixation, and thus form the most elaborate long-loop feedback closures at the substrate level. Second, because cofactor functions are associated with kinetic bottlenecks within metabolism, their inventory of functions forms strong long-term constraints on the evolution of new pathways, so innovations in cofactor synthesis can have dramatic effects on the large-scale structure of evolution.

A. Introduction to cofactors as a group, and why they define an essential layer in the control of metabolism

1. Cofactors as a class in extant biochemistry

The biosynthesis of cofactors involves some of the most elaborate and least understood organic chemistry used by organisms. The pathways leading to several major cofactors have only very recently been elucidated or remain to be fully described, and their study continues to lead to the discovery of novel reaction mechanisms and enzymes that are unique to cofactor synthesis [164–166]. While cofactor biosynthetic pathways often branch from core metabolic pathways, their novel reactions may produce special bonds and molecular structures not found elsewhere in metabolism. These novel bonds and structures are generally central in their catalytic functions.

Structurally, many cofactors form a class in transition between the core metabolites and the oligomers. They contain some of the largest directly-assembled organic monomers (pterins, flavins, thiamin, tetrapyrroles), but many also show the beginnings of polymerization of standard amino acids, lipids or ribonucleotides. These may be joined by the same phosphate ester bonds that link RNA oligomers or aminoacyl-tRNA, or they may use distinctive bonds (e.g. 5′-5′ esters) found only in the cofactor class [167].

The polymerization exhibited within cofactors is distinguished from that of oligomers by its heterogeneity. Srinivasan and Morowitz [40] have termed them “chimeromers”, because they often include monomeric components from several molecule classes. Examples are coenzyme-A, which includes several peptide units and an ATP; folates, which join a pterin moiety to para-aminobenzoic acid (PABA); quinones, which join a PABA derivative to an isoprene lipid tail; and a variety of cofactors assembled on phosphoribosyl-pyrophosphate (PRPP) to which RNA “handles” are esterified.

We may understand the border between small and large molecules, where most cofactors are found, as more fundamentally a border between the use of heterogeneous organic chemistry to encode biological information in covalent structures, and the transition to homogeneous phosphate chemistry, with information carried in sequences or higher-order non-covalent structures. The chemistry of the metabolic substrate is mostly the chemistry of organic reactions. Phosphates and thioesters may appear in intermediates, but their role generally is to provide energy for leaving groups, enabling formation of the main structural bonds among C, N, O, and H. One striking feature of scale in metabolism is that its organic reactions, the near-universal mode of construction for molecules of 20 to 30 carbons or fewer, cease to be used in the synthesis of larger molecules.³⁵ Large oligomeric macromolecules are almost entirely synthesized using the dehydration potential of phosphates [170] to link monomers drawn from the inventory [39] of small core metabolites. Many cofactors have structure of both kinds, and they are the smallest molecules that as a class commonly use phosphate esters as permanent elements of structure [171].

Finally, cofactors are distinguished by structure-function relations determined mostly at the single-molecule scale. The monomers that are incorporated into macromolecules are often distinguished by general properties, and only take on more specific functional roles that depend strongly on location and context [172, 173]. In contrast, the functions of cofactors are specific, often finely tuned by evolution [65], and deployable in a wide range of macromolecular contexts. Usually they are carriers or transfer agents of functional groups or reductants in intermediary metabolism [174]. Nearly half of enzymes require cofactors as coenzymes [171, 174]. If we extend this grouping to include chelated metals [175, 176] and clusters, ranging from common iron-sulfur centers to the elaborate metal centers of gas-handling enzymes [80, 120], more than half of enzymes require coenzymes or metals in the active site.

35 Even siderophores, among the most complex of widely-used organic compounds, are often elaborations of functional centers that are small core metabolites, such as citrate [168, 169].

The universal reactions of intermediary metabolism depend on only about 30 cofactors [174] (though this number depends on the specific definition used). Major functional roles include 1) transition-metal-mediated redox reactions (heme, cobalamin, the nickel tetrapyrrole F430, chlorophylls³⁶), 2) transport of one-carbon groups that range in redox state from oxidized (biotin for carboxyl groups, methanofurans for formyl groups) to reduced (lipoic acid for methylene groups, S-adenosyl methionine, coenzyme-M and cobalamin for methyl groups), with some cofactors spanning this range and mediating interconversion of oxidation states (the folate family interconverting formyl to methyl groups), 3) transport of amino groups (pyridoxal phosphate, glutamate, glutamine), 4) reductants (nicotinamide cofactors, flavins, deazaflavins, lipoic acid, and coenzyme-B), 5) membrane electron transport and temporary storage (quinones), 6) transport of more complex units such as acyl and amino-acyl groups (pantetheine in CoA and in the acyl-carrier protein (ACP), lipoic acid, thiamine pyrophosphate), 7) transport of dehydration potential from phosphate esters (nucleoside di- and tri-phosphates), and 8) sources of thioester bonds for substrate-level phosphorylation and other reactions (pantetheine in CoA).
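The eight functional roles listed above can be read as a small lookup table, which makes it easy to see which cofactors recur across roles. The following is a minimal sketch, not part of the original analysis: the dictionary keys are shortened paraphrases of the roles, and the names `ROLES`, `roles_by_cofactor` and `multi_role_cofactors` are hypothetical labels introduced only for illustration.

```python
from collections import defaultdict

# Roles and example cofactors transcribed from the list in the text.
# Role names are shortened paraphrases (an assumption of this sketch).
ROLES = {
    "transition-metal redox": ["heme", "cobalamin", "F430", "chlorophylls"],
    "one-carbon transport": [
        "biotin", "methanofurans", "lipoic acid", "S-adenosyl methionine",
        "coenzyme-M", "cobalamin", "folates",
    ],
    "amino-group transport": ["pyridoxal phosphate", "glutamate", "glutamine"],
    "reductants": [
        "nicotinamide cofactors", "flavins", "deazaflavins",
        "lipoic acid", "coenzyme-B",
    ],
    "membrane electron transport": ["quinones"],
    "acyl and amino-acyl transport": [
        "pantetheine", "lipoic acid", "thiamine pyrophosphate",
    ],
    "dehydration-potential transport": ["nucleoside di- and tri-phosphates"],
    "thioester-bond source": ["pantetheine"],
}


def roles_by_cofactor(roles):
    """Invert the role table: map each cofactor to the roles it appears in."""
    index = defaultdict(list)
    for role, cofactors in roles.items():
        for cofactor in cofactors:
            index[cofactor].append(role)
    return dict(index)


def multi_role_cofactors(roles):
    """Return only the cofactors that are listed under more than one role."""
    return {c: r for c, r in roles_by_cofactor(roles).items() if len(r) > 1}


if __name__ == "__main__":
    for cofactor, its_roles in sorted(multi_role_cofactors(ROLES).items()):
        print(f"{cofactor}: {', '.join(its_roles)}")
```

On this transcription, lipoic acid, cobalamin and pantetheine are the cofactors listed under more than one role, which is one simple way to see how a small cofactor inventory can sit at several junctions of the network.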

2. Roles as controllers, and consequences for the emergence and early evolution of life

Cofactors fill roles in network or molecular catalysis below the level of enzymes, but they share with all catalysts the property that they are not consumed by participating in reactions, and therefore are key loci of control over metabolism. Cofactors as transfer agents are essential to completing many network-catalytic loops. In association with enzymes, they can create channels³⁷ and active sites³⁸, and thus they facilitate molecular catalysis. Through the limits in their own functions or in the functional groups they transport through networks, they may impose constraints on chemical diversity or create bottlenecks to evolutionary innovation. The previous sections have shown that many module boundaries in carbon fixation and core metabolism are defined by idiosyncratic reactions, and we have noted that many of these idiosyncrasies are associated with specific cofactor functions.

36 It is natural in many respects to include Ferredoxins (and related flavodoxins) in this list. Although not cofactors by the criteria of size and biosynthetic complexity, these small, widely-diversified, ancient, and general-purpose Fe2S2, Fe3S4, and Fe4S4-binding polypeptides are unique low-potential (high-energy) electron donors. Reduced Ferredoxins are often generated in reactions involving radical intermediates in iron-sulfur enzymes, described below in connection with electron bifurcation.

37 An example is the role of cobalamin as a C1 transfer agent to the nickel reaction center in the acetyl-CoA synthase from a corrinoid iron-sulfur protein [177–179].

Cofactors, as topological hubs and participants in reactions at high-flux boundaries in core and intermediary metabolism, are focal points of natural selection. The adaptations available to key atoms and bonds include altering charge or pKa, changing energy level spacing through non-local electron transport, or altering orbital geometry through ring strains. Divergences in low-level cofactor chemistry may alter the distribution of functional groups and thereby change the global topology of metabolic networks,³⁹ and some of these changes map onto deep lineage divergences in the tree of life.

Most research on the origin of life has focused either on the metabolic substrate [6, 180] or catalysis by RNA [181], but we believe the priority of cofactors deserves (and is beginning to receive) greater consideration [182, 183]. In the expansion of metabolic substrates from inorganic inputs, the pathways to produce even such complex cofactors as folates et alia are comparable in position and complexity to those for purine RNA, while some for functional groups such as nicotinamide [182] or chorismate are considerably simpler. Therefore, even though it is not known what catalytic support or memory mechanisms enabled the initial elaboration of metabolism, any solutions to this problem should also support the early emergence of at least the major redox and C- and N-transfer cofactors. Conversely, the pervasive dependence of biosynthetic reactions on cofactor intermediates makes the expansion of protometabolic networks most plausible if it was supported by contemporaneous emergence and elaboration of cofactor groups. In this interpretation cofactors occupy an intermediate position in chemistry and complexity, between the small-metabolite and oligomer levels [182]. They were the transitional phase when the reaction mechanisms of core metabolism came under selection and control of organic as opposed to mineral-based chemistry, and they provided the structured foundation from which the oligomer world grew.

We argue next that a few properties of the elements have governed both functional diversification and evolutionary optimization of many cofactors, especially those associated with core carbon-fixation. We focus in particular on heterocycles with conjugated double bonds incorporating nitrogen, and on the groups of functions that exploit special properties of bonds to sulfur atoms.

B. The cofactors derived from purine RNA

Most of the cofactors that use heterocycles for their primary functions have biosynthetic reactions closely related to those for purine RNA. These reactions are performed by a diverse class of cyclohydrolase enzymes, which are responsible for the key ring-formation and ring-rearrangement steps. The cyclohydrolases can split and reform the ribosyl ring in PRPP, jointly with the 5- and 6-membered rings of guanine and adenine. Five biosynthetically related cofactor groups are formed in this way. Four of these – the folates, flavins, deazaflavins and thiamin – are formed from GTP, as shown in Fig. 14.

38 An example is the role of TPP as the reaction center in the pyruvate-ferredoxin oxidoreductase (PFOR), which lies at the end of a long electron-transport channel formed by Fe-S clusters [84].

39 A well-understood example is the repartitioning of C1 flux from methanopterins versus folates [41, 65]. The same adaptation that enables formylation of methanopterins within an exclusively thioester system, where the homologous folate reaction requires ATP, reduces the potential for methylene-group transfer, and necessitates the oxidative formation of serine from 3PG in methanogens, which is not required of acetogens.

Folates: The folates are structurally most similar to GTP, but have undergone the widest range of secondary specializations, particularly in the Archaea. They are primarily responsible for binding C1 groups during reduction from formyl to methylene or methyl oxidation states, and their secondary diversifications are apparently results of selection to tune the free-energy landscape of these oxidation states.

Flavins and deazaflavins: The flavins are tricyclic compounds formed by condensation of two pterin groups, while deazaflavins are synthesized through a modified version of this pathway, in which one pterin group is replaced by a benzene ring derived from chorismate. Flavins are general-purpose reductants, while deazaflavins are specifically associated with methanogenesis.

Thiamin: Thiamin combines a C-N heterocycle common to the GTP-derived cofactors with a thiazole group (so incorporating sulfur), and shares functions with both the purine cofactor group and the alkyl-thiol group reviewed in the next subsection.

Histidine: The last “cofactor” in this group is the amino acid histidine, synthesized from ATP rather than GTP but using similar reactions. Histidine is a general acid-base catalyst with unique pKa, which in many ways functions as a “cofactor in amino acid form” [40].


[Figure 14: chemical structure diagrams not reproduced. Labeled species: PRPP, AIR, IMP, ATP, GTP, histidine, thiamin-PP, THF, riboflavin. Labeled enzymes: 3.5.4.19, 3.5.4.10, 3.5.4.16, 3.5.4.25, ThiC, HisF/HisH.]

FIG. 14: Key molecular re-arrangements in the network leading from AIR to purines and the purine-derived cofactors. The 3.5.4 class of cyclohydrolases (red) convert FAICAR to IMP (precursor to purines), and subsequently convert GTP to folates and flavins by opening the imidazole ring. Acting on the 6-member ring of ATP and on a second attached PRPP, the enzyme 3.5.4.19 initiates the pathway to histidinol. The thiamine pathway, which uses the unclassified enzyme ThiC to hydrolyze imidazole and ribosyl moieties, is the most complex, involving multiple group rearrangements (indicated by colored atoms). This complexity, together with the subsequent attachment of a thiazole group, leads us to place thiamine latest in evolutionary origin among these cofactors.

We will first describe in detail the remarkable role of the folate group in the evolutionary diversification of the Wood-Ljungdahl pathway, and then return to general patterns found among the purine-derived cofactors, and their placement within the elaboration of metabolism and RNA chemistry.

1. Folates and the central superhighway of C1 metabolism

Members of the folate family carry C1 groups bound to either the N5 nitrogen of a heterocycle derived from GTP, an exocyclic N10 nitrogen derived from a para-aminobenzoic acid (PABA), or both. The two most common folates are tetrahydrofolate (THF), ubiquitous in bacteria and common in many archaeal groups, and tetrahydromethanopterin (H4MPT), essential for methanogens and found in a small number of late-branching bacterial clades. Other members of this family are exclusive to the archaeal domain and are structural intermediates between THF and H4MPT. Two kinds of structural variation are found among folates, as shown in Fig. 15. First, only THF retains the carbonyl group of PABA, which shifts electron density away from N10 via the benzene ring, and lowers its pKa relative to N5 of the heterocycle. All other members of the family lack this carbonyl. Second, all folates besides THF incorporate one or two methyl groups that impede rotation between the pteridine and aryl-amine planes, changing the relative entropies of formation among different binding states for the attached C1 [41, 65, 184].

Folates mediate a diverse array of C1 chemistry, various parts of which are essential in the biosynthesis of all organisms [65]. The collection of reactions, summarized in Fig. 4, has been termed the “central superhighway” of one-carbon metabolism. Functional groups supplied by folate chemistry, connected by interconversion of C1-oxidation states along the superhighway, include 1) formyl groups for synthesis of purines, formyl-tRNA, and formylation of methionine (fMet) during translation, 2) methylene groups to form thymidylate, which are also used in many deep-branching organisms to synthesize glycine and serine, forming the ancestral pathway to these amino acids [41], and 3) methyl groups which may be transferred to S-adenosyl-methionine (SAM) as a general methyl donor in anabolism, to the acetyl-CoA synthase to form acetyl-CoA in the Wood-Ljungdahl pathway, or to coenzyme-M where the conversion to methane is the last step in the energy system of methanogenesis.

The variations among folates, shown in Fig. 15, leave the charge, pKa and resulting C-N bond energy at N5 roughly unaffected, while the N10 charge, pKa, and C-N bond energy change significantly across the family. This charge effect, together with entropic effects due to steric hindrance from methyl groups, can sharply vary the functional roles that different folates play in anabolism.

[Figure 15: chemical structure diagrams not reproduced. Panels: Synthesis; Structural variation. Labeled species: GTP, PRPP, p-aminobenzoate, SAM, α-KG, Glu, THF, H4MPT, sulfopterin, sarcinapterin, tatiopterin 0/1, thermopterin (Pyrococcus/Thermococcus). Domain key: A = Archaea, B = Bacteria. Positions N5 and N10 are marked.]

FIG. 15: Structural variants among cofactors in the folate family, shown with the biosynthetic pathways that produce these variations. Pteridine and benzene groups shown in blue, and methyl groups that regulate steric hindrance shown in red.

The biggest difference lies between THF and H4MPT. In THF, the N10 pKa is as much as 6.0 natural-log units lower than that of N5 [185]. The resulting higher-energy C-N bond cannot be formed without hydrolysis of one ATP, either to bind formate to N10 of THF, or to cyclize N5-formyl-THF to form N5,N10-methenyl-THF (see Fig. 4).⁴⁰ After further reduction, the resulting methylene is readily transferred to lipoic acid to form glycine and serine, in what we have termed the “glycine cycle” [41] (the lipoyl-protein based cycle on the right in Fig. 4).

In contrast, in H4MPT the difference in pKa between N10 and N5 is only 2.4 natural-log units. The lower C-N10 bond energy permits spontaneous cyclization of N5-formyl-H4MPT, following (also ATP-independent) transfer of formate from a formyl-methanofuran cofactor. Through this sequence, methanogens fix formate in an ATP-independent system using only redox chemistry.

40 This reaction is the mirror image of the cyclization of N10-formyl-THF. This latter reaction is spontaneous. We will argue below that the alternative cyclization from N5-formyl-THF, previously only recognized as a salvage pathway [186], may reflect an unrecognized function of the cycloligase as an enzyme for ATP-dependent formate incorporation.

The initial free energy to attach formate to methanofuran is provided by the terminal methane released in methanogenesis (the Co-M/Co-B cycle in Fig. 4). The resulting downstream methylene group, however, has too little energy as a leaving group to transfer to an alkyl-thiol cofactor, so methanogens sacrifice the ability to form glycine and serine by direct reduction of formate.
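To give a rough sense of scale for the two pKa gaps quoted above (6.0 natural-log units for THF and 2.4 for H4MPT), the sketch below converts a gap in the natural logarithm of an equilibrium constant into a free-energy difference using ΔG = RT·Δ(ln K). This is only an illustration under stated assumptions: the temperature of 298.15 K is our choice and is not given in the text, the function name `gap_to_kj_per_mol` is hypothetical, and the result describes the protonation equilibria only, not the C-N bond energies themselves.

```python
# Molar gas constant in J mol^-1 K^-1 (CODATA value).
R = 8.314462618
# Assumed temperature in kelvin; the text does not specify one.
T_ASSUMED = 298.15


def gap_to_kj_per_mol(delta_ln_units, temperature=T_ASSUMED):
    """Free-energy equivalent (kJ/mol) of a gap in ln K of `delta_ln_units`."""
    return R * temperature * delta_ln_units / 1000.0


# N10-versus-N5 gaps as quoted in the text, in natural-log units.
GAPS = {"THF": 6.0, "H4MPT": 2.4}
ENERGIES = {name: gap_to_kj_per_mol(gap) for name, gap in GAPS.items()}

if __name__ == "__main__":
    for name, energy in ENERGIES.items():
        print(f"{name}: {energy:.1f} kJ/mol")
```

Under these assumptions the THF gap corresponds to roughly 15 kJ/mol and the H4MPT gap to roughly 6 kJ/mol. Because the conversion is linear, the ratio between the two (2.5) does not depend on the assumed temperature.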

The reconstructed ancestral use of the 7-9 reactions in Fig. 4 is to reduce formate to acetyl-CoA or methane. However, the reversibility of many reactions in the sequence, possibly requiring substitution of reductant/oxidant cofactors, allows folates to accept and donate C1 groups in a variety of oxidation states, from and into many pathways including salvage pathways. Methylotrophic proteobacteria which have obtained H4MPT through horizontal gene transfer [156, 157] may run the full reaction sequence in reverse. They may use either H4MPT to oxidize formaldehyde or THF to oxidize various methylated C1 compounds, in both cases leading to formate, or other intermediary oxidation states (from THF) as inputs to anabolic pathways. In many late-branching bacteria, some archaea, and eukaryotes, the THF based pathway may run in part oxidatively and in part reductively, through connections to either gluconeogenesis/glycolysis or glyoxylate metabolism. In these organisms serine (derived through oxidation, amination and dephosphorylation from 3-phosphoglycerate) or glycine (derived through amination of glyoxylate) become the sources of transferable methyl groups in anabolism. This versatility has preserved the folate pathway as an essential module of biosynthesis in all domains of life, and at the same time has made it a pivot of evolutionary variation.

2. Refinement of folate-C1 chemistry maps onto lineage divergence of methanogens

The structural and functional variation within the folate family illustrates the way that selection, acting on cofactors, can create large-scale re-arrangements in metabolism, enabling adaptations that are reflected in lineage divergences. The free-energy cascade described in the last section, linking ATP hydrolysis, the charge and pKa of the N10 nitrogen, and the leaving-group activity of the resulting bound carbon for transfer to alkyl-thiol cofactors or other anabolic pathways, is a fundamental long-range constraint of folate-C1 chemistry. A comparative analysis of gene profiles in pathways for glycine and serine synthesis, explained in Ref. [41], shows that while the constraint cannot be overcome, its impact on the form of metabolism can vary widely depending on the structure of the mediating folate cofactor.

The annotated role for ATP hydrolysis in WL autotrophs is to attach formate to N10 of THF, initiating the reduction sequence. However, many deep-branching bacteria and archaea show no gene for this reaction, while multiple lines of evidence indicate that THF nonetheless functions as a carbon-fixation cofactor in these organisms [41]. In almost all cases where an ATP-dependent N10-formyl-THF synthase is absent, an ATP-dependent N5-formyl-THF cycloligase [186, 187] is found. This is another case where a broad evolutionary context allows an alternate interpretation. N5-formyl-THF cycloligase was originally discovered in mammalian systems, where its function has been highly uncertain and hypothesized to be a salvage mechanism as part of a futile cycle [186, 187], before being found to be widespread across the tree of life [41]. If we deduce by reconstruction, however, that ancestral folate chemistry operated in the fully reductive direction, and that in H4MPT systems formate is attached at the N5 position, while in THF systems formate is attached at the N10 position, the widespread distribution of the cycloligase takes on a different meaning. It is plausible that the N5-formyl-THF cycloligase allows a formate incorporation pathway that is an evolutionary intermediate between the commonly recognized pathway using THF and its evolutionary derivative using H4MPT (see Fig. 4). The ATP-dependent cycloligase produces N5,N10-methenyl-THF from N5-formyl-THF, which may potentially form spontaneously due to the higher N5-pKa [187]. ATP hydrolysis is thus specifically linked to the N10-carbon bond which is the primary donor for carbon groups from folates. Methanogens, in contrast, escape the dependence on ATP hydrolysis by decarboxylating PABA before it is linked to pteridine to form methanopterin (see Fig. 15), but they sacrifice methyl-group donation from H4MPT to most anabolic pathways, making methanogenesis viable only in clades that evolved the oxidative pathway to serine from 3-phosphoglycerate.
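The three formate-incorporation routes contrasted in this paragraph can be laid side by side as a small table. The sketch below is one way to encode them, assuming our own simplified field names; `FormateRoute`, `ROUTES` and `atp_independent` are hypothetical labels for illustration, and the second entry is the intermediate route that the text proposes as plausible rather than an established pathway.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormateRoute:
    cofactor: str               # folate-family carrier
    attachment_site: str        # nitrogen at which formate is first bound
    atp_step: Optional[str]     # ATP-coupled enzyme step, or None if absent
    status: str                 # how the text characterizes the route


ROUTES = [
    FormateRoute("THF", "N10", "N10-formyl-THF synthase", "commonly recognized"),
    FormateRoute("THF", "N5", "N5-formyl-THF cycloligase", "proposed intermediate"),
    FormateRoute("H4MPT", "N5", None, "evolutionary derivative"),
]


def atp_independent(routes):
    """Select the routes that have no ATP-coupled step."""
    return [route for route in routes if route.atp_step is None]


if __name__ == "__main__":
    for route in ROUTES:
        step = route.atp_step or "none (ATP-independent)"
        print(f"{route.cofactor} at {route.attachment_site}: {step} [{route.status}]")
```

Laid out this way, the proposed intermediate shares its attachment site (N5) with the H4MPT route and its ATP dependence with the recognized THF route, which is the sense in which the text places it between the two.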

We noted in Sec. III D that the elimination of one ATP-dependent acyl-CoA synthase in acetogens reduces the free energy cost of carbon fixation relative to rTCA cycling. The decoupling of the formate-fixation step on methanopterins from ATP hydrolysis is a further significant innovation, lowering the ATP cost for uptake of CO2. This divergence of H4MPT from THF, and a related divergence of deazaflavins from flavins (see Fig. 16), follow phylogenetically (and we believe, were responsible for) the divergence of the methanogens from other euryarchaeota [41].

We regard this example as representative of the way that innovations in cofactor chemistry more generally mediated large-scale rearrangements in metabolism, and corresponding evolutionary (and ecological) divergences of clades. Another similar example comes from the quinones, a diverse family of cofactors mediating membrane electron transport [188]. Ref. [189] found that the synthetic divergence of mena- and ubiquinone follows the pattern of phylogenetic diversification within proteobacteria. δ- and ε-proteobacteria use menaquinone, γ-proteobacteria use both mena- and ubiquinone, and α- and β-proteobacteria use only ubiquinone. Because mena- and ubiquinone have different midpoint potentials, it was suggested that their distribution reflects changes in environmental redox state as the proteobacteria diversified during the rise of oxygen [189, 190]. Such phylogenetic divergences may alternatively be thought of as divergences driven by the closure of more advantageous long-loop feedback cycles.

3. Relation of the organic superhighway to minerals

A very wide range of circumstantial arguments has been made for the emergence of biochemistry from the reduced-mineral/seawater chemistry of hydrothermal vents. These include: detailed accounts of the capacity of a range of geochemical energy systems to support extant life [25, 28]⁴¹; detailed similarities between transition-metal/sulfide mineral unit cells and metallo-enzyme active sites [68, 119, 191]; the widespread use of radical mechanisms in assembly of metal-center enzymes [120]; the more general presence of chelated metals in ubiquitous and conserved cofactors and enzymes (particularly tetrapyrroles and ferredoxins); the richness of vent environments in geometry, surface catalysis [88, 192, 193], and thermal and pH gradients; and the greater similarity of the aqueous redox environment of hydrothermal fluids to biochemistry, than of atmospheric free-radical chemistry or the quenched ion chemistry in the interstellar medium [27, 66, 194, 195]. While these arguments still leave too many circumstantial steps to have created consensus that metabolism emerged through self-organization from geochemistry [154], among the many speculations about what was necessary for the first metabolism, the geochemical hypothesis is grounded in the widest array of relevant empirical evidence. The geochemical hypothesis has also been circumstantially supported by experimental evidence that minerals can catalyze reactions in the citric-acid cycle [71], and an extensive range of reductions [196, 197], including synthesis of acetyl-thioesters [56].

The distinctive features of biochemical C1 reduction are the attachment of formate to tuned heterocyclic or aryl-amine nitrogen atoms for reduction, and the transfer of reduced C1 groups to sulfhydryl groups (of SAM, lipoic acid, or CoM). In the mineral-origin hypothesis for direct reduction, the C1 groups were adsorbed at metals and reduced either through crystal oxidation [193] or by reductant in solution. The transfer of reduced C1 groups to alkyl-thiol cofactors may show continuity with reduction on metal-sulfide minerals. However, the mediation of reduction by

⁴¹ A subset of the entries in Table 1 of Ref. [25], involving Fe2+ reduction or autotrophic methanogenesis, can be applied directly to early-earth environments. Many entries in their table of environments involve sulfates, nitrates, ferric iron, or small amounts of molecular oxygen (the Knallgas reaction) as terminal electron acceptors. The organic conversions detailed in the paper remain a basis for habitability analysis, but plausible pathways in the Hadean will be limited by the alternative terminal electron acceptors.


nitrogens appears a distinctively biochemical innovation.

4. Cyclohydrolases as the central enzymes in the family, and the resulting structural homologies among cofactors

The common reaction mechanism unifying the purine-derived cofactors is an initial hydrolysis of both purine and ribose rings performed by cyclohydrolases assigned EC numbers 3.5.4 (see Fig. 14). These enzymes are responsible for the synthesis of inosine-monophosphate (IMP, precursor to AMP and GMP) from 5-formamidoimidazole-4-carboxamide ribonucleotide (FAICAR), for the first committed steps in the syntheses of both folates and flavins from GTP, and for the initial ring-opening step in the synthesis of histidine from ATP and PRPP. Fig. 14 shows the key steps in the network synthesizing both purines and the pterins, folates, flavins, thiamin, and histidine.

The common function of the 3.5.4 cyclohydrolases is hydrolysis of rings on adjacent nucleobase and ribose groups, or the formation of cycles by ligation of ring fragments. In all cases, the ribosyl moieties come from phosphoribosyl-pyrophosphate (PRPP). In the synthesis of pterins from GTP and of histidinol from ATP, both a nucleobase cycle and a ribose are cleaved. In pterin synthesis, the imidazole of guanine and the purine ribose are cleaved. In histidine synthesis, the six-membered ring of adenine is cleaved (at a different bond than the one synthesized from FAICAR), and the ribose comes from a secondary PRPP.

By far the most complex synthesis in this family is that of thiamin from aminoimidazole ribonucleotide (AIR). This sequence begins with an elaborate molecular rearrangement, performed in a single step by the enzyme ThiC [166]⁴². While this enzyme is unclassified, and its reaction mechanism incompletely understood, it shares apparent characteristics with members of the 3.5.4 cyclohydrolases. As in the first committed steps in the synthesis of folates and flavins from GTP, both a ribose ring and a 5-member heterocycle are cleaved and subsequently (as in folate synthesis) recombined into a 6-member heterocycle. The complexity of this enzymatic mechanism makes a pre-enzymatic homologue to ThiC difficult to imagine, and suggests that thiamin is both of later origin, and more highly derived, than other cofactors in this family. This derived status is supported by the fact that the resulting functional role of thiamin is not performed on the pyrimidine ring itself, but rather on the thiazole ring to which it is attached, and which is likewise created in an elaborate synthetic sequence [166]. The reactions involving TPP do not directly create bonds to the

⁴² Eukaryotes use an entirely different pathway, in which the pyrimidine is synthesized from histidine and pyridoxal-5-phosphate [198].

sulfur atom, but instead use the carbon between it and the positively charged nitrogen. It seems likely, however, that the sulfur indirectly contributes to the properties of that carbon, through some combination of electrostatic, resonance, or possibly ring-straining interactions.

Fig. 16 shows the detailed substrate rearrangement in the sub-network leading from GTP to methanopterins, folates, riboflavin, and the archaeal deazaflavin F420. In the pterin branch, both rings of neopterin are synthesized directly from GTP, and an aryl-amine originating in PABA provides the second essential nitrogen atom. PABA is either used directly (in folates) or decarboxylated with attachment of a PRPP (in methanopterins) to vary the pKa of the amine. In contrast, the flavin branch is characterized by the integration of either ribulose (in riboflavin) or chorismate (in F420) to form the internal rings. Two molecules of 6,7-dimethyl-8-(D-ribityl)lumazine are condensed to form riboflavin, whereas a single GTP with chorismate forms F420.
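The branching described in this paragraph can be summarized as a small lookup structure. The following is a minimal sketch, assuming only the precursors named in the text; the dictionary name `PRECURSORS` and the short labels are illustrative choices, and the listing is not a complete account of the biosynthetic inputs of each cofactor.

```python
# Illustrative sketch: precursors named in the text for each GTP-derived
# cofactor. Labels and the dictionary layout are assumptions made for
# illustration; the sets are not complete biosynthetic inventories.
PRECURSORS = {
    "THF": {"GTP", "PABA"},             # folates: PABA used directly
    "H4MPT": {"GTP", "PABA", "PRPP"},   # methanopterins: decarboxylated PABA plus PRPP
    "riboflavin": {"GTP", "ribulose"},  # flavin branch: ribulose forms the internal rings
    "F420": {"GTP", "chorismate"},      # deazaflavin: chorismate forms the internal rings
}

# Precursors common to every cofactor in the sub-network.
shared = set.intersection(*PRECURSORS.values())

# The pterin branch is identified here by its use of the PABA aryl-amine.
pterin_branch = sorted(name for name, parts in PRECURSORS.items() if "PABA" in parts)
flavin_branch = sorted(name for name in PRECURSORS if name not in pterin_branch)

print("shared precursors:", shared)
print("pterin branch:", pterin_branch)
print("flavin branch:", flavin_branch)
```

In this encoding GTP is the only precursor common to all four cofactors, and the split into pterin and flavin branches follows from whether the PABA-derived aryl-amine is incorporated.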

[Figure 16: chemical structure diagram. Labeled species: GTP, PRPP, chorismate, ribulose-5-P, riboflavin, FAD, THF, H4MPT, F420.]

FIG. 16: The substrate modifications leading from GTP to the four major cofactors H4MPT, THF, riboflavin (in FAD) and the archaeal homologue deazaflavin F420. The branches indicating substrate diversification may also reflect an evolutionary lineage.

The cyclohydrolase reactions are the innovation enabling the biosynthesis of this whole family of cofactors, and importantly, of purine RNA itself. Except for TPP, the distinctions among purine-derived cofactors are minor secondary modifications on a background structured


by PRPP and C-N heterocycles. Chorismate, precursor to PABA and the unique source of single benzene rings in biochemistry, is the only other developed sub-network within metabolism, besides purine synthesis, on which this family draws. Flexibility in the ways that chorismate is modified to control electron density, and the way the benzene ring is combined with other heterocycles, contributes to the combinatorial elaboration within the family.

5. Placing the members of the class within the network expansion of metabolism

The following observations suggest to us that most of the purine-derived cofactors (possibly excepting thiamin) were available contemporaneously with monomer purine RNA.

The current understanding of protein cyclohydrolases does not suggest other, simpler mechanisms by which similar reactions might first have been catalyzed.⁴³

However, at whatever stage catalysts capable of interconverting AIR, AICAR, FAICAR, and IMP first became available, there is no compelling reason to believe that pteridines were not formed contemporaneously. If the chorismate pathway (which begins in the sugar-phosphate network) had also arisen by that time, there is no compelling reason to believe that folates and flavins were not likewise available. Particularly if the early catalysts were primitive, opening reaction mechanisms at the level of the first three EC numbers but not restricting molecular substrates, it would be difficult to argue that molecules generally resembling this cofactor class could have been reliably excluded from a monomer-purine RNA world.⁴⁴

Conversely, the patterns that characterize current metabolism as a recursive network expansion [161, 162] about inorganic inputs are most easily understood as a reflection of the organic-chemical possibilities opened by cofactors. Pterins, as donors of activated formyl groups, support (among other reactions) the synthesis of purines, forming a short autocatalytic loop. Similarly, flavins would have augmented redox reactions. Finally, it has long been recognized that acid/base catalysis is uniquely served by histidine, which has a pKa ≈ 6.5 on the ε-nitrogen, a property not found among any biological ribonucleotides (though possible for some substituted adenine derivatives) [202].
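Why a pKa near neutrality matters for acid/base catalysis can be illustrated with the Henderson-Hasselbalch relation. The sketch below is a minimal illustration, not part of the original argument: the pKa of 6.5 is the value quoted above, while the working pH of 7.0, the comparison pKa of 4.0, and the function name `protonated_fraction` are hypothetical choices made for the example.

```python
# Illustrative sketch: Henderson-Hasselbalch fraction of a titratable group
# that is protonated at a given pH. The pKa of 6.5 is quoted in the text;
# the pH of 7.0 and the comparison pKa of 4.0 are hypothetical values.


def protonated_fraction(ph, pka):
    """Fraction of the group in its protonated (acid) form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


NEUTRAL_PH = 7.0        # assumed working pH
PKA_HISTIDINE = 6.5     # value quoted in the text
PKA_HYPOTHETICAL = 4.0  # hypothetical group with a much lower pKa

frac_his = protonated_fraction(NEUTRAL_PH, PKA_HISTIDINE)
frac_low = protonated_fraction(NEUTRAL_PH, PKA_HYPOTHETICAL)

print(f"pKa 6.5: {frac_his:.3f} protonated, {1 - frac_his:.3f} deprotonated")
print(f"pKa 4.0: {frac_low:.4f} protonated, {1 - frac_low:.4f} deprotonated")
```

Under these assumptions roughly a quarter of the pKa 6.5 groups are protonated at pH 7, so both proton-donating and proton-accepting forms are well populated, whereas the hypothetical pKa 4.0 group is almost entirely deprotonated.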

Within the class of GTP-derived cofactors, a sub-structure may perhaps be suggested: the dimer condensation that forms riboflavin is a hierarchical use of building blocks formed from GTP. Although simple and consisting of a single key reaction, this could reflect a later stage of refinement. It is recognized [203] that flavins are somewhat specialized reductants, both biosynthetically and functionally more specific than the much simpler nicotinamide cofactors, which plausibly preceded them [182].

⁴³ For some reactions, the abstraction of enzyme mechanism is advanced enough to identify small-molecule organocatalysts that could have provided similar functions [199, 200].

⁴⁴ Whether the first RNA was produced in this way, or through structurally very dissimilar stages, is a currently active question [201].

6. Purine-derived cofactors selected before RNA itself, as opposed to having descended from an RNA world defined through base pairing?

The overlap between RNA and cofactor biosynthesis, and the incorporation of AMP in several cofactors (where it serves primarily as a “handle” for docking), has been noticed and given the interpretation that cofactors are a degenerated relic of an oligomer RNA world [171]. The only significant logical motivation to place oligomer RNA prior to small-molecule cofactors (which are of comparable complexity to monomer RNA) is a premise that the elaboration of biosynthesis required selected catalysts, and that RNA base pairing is the least-complex plausible mechanism supporting (specifically, Darwinian) selection and persistence of the required catalysts.

This is a complex premise, as it requires not only organosynthesis of RNA, but also chiral selection and mechanisms to enable base pairing and (presumably template-directed) ligation [204].⁴⁵ In comparison, small-molecule catalysis by either RNA [205] or related cofactors may be considered in any context that supports their synthesis.⁴⁶ If chemical mechanisms are found which support structured organosynthesis and selection – a requirement for any metabolism-first theory of the origin of life – the default premise may favor simplicity: that heterocycles were first selected as cofactors, and that purine RNA, only one among many species maintained by the same generalized reactions, was subsequently selected for chirality, base-pairing, and ligation.

⁴⁵ A particular problem for RNA replication is the steric restriction to 3′-5′ phosphate esters, over the kinetically favored 2′-5′ linkage.

⁴⁶ The relative importance of synthesis and selection depends on whether opening access to a space of reactions, or concentrating flux within a few channels in that space, is the primary limit on the emergence of order at each phase in the elaboration of metabolism. Following our earlier arguments about the need for autocatalysis, selection will be essential in some stages, and this remains an important problem for metabolism-first premises [154]. Chemical selection criteria derived from differential growth rate pose no problem in the domain of small-molecule organocatalysis, but the identification of plausible mechanisms to preserve selected differences remains an important area of work. Most mechanisms that do not derive from RNA base pairing involve separation by spatial geometry or material phases, including porous-medium processes akin to invasion percolation [153], or more general proposals for compositional inheritance [206–208], abstracted from models of coacervate chemistry.


C. The alkyl-thiol cofactors

The major chemicals in this class include the sulfonated alkane-thiols coenzyme-B (CoB) and coenzyme-M (CoM), cysteine and homocysteine including the activated forms S-adenosyl-homocysteine (which under methylation becomes SAM), lipoic acid, and pantetheine or pantothenic acid, including pantetheine-phosphate. The common structure of the alkyl-thiol cofactors is an alkane chain terminated by one or more sulfhydryl (SH) groups. In all cases except lipoic acid, a single SH is bound to the terminal carbon; in lipoic acid two SH groups are bound at sub-adjacent carbons. Differences among the alkyl-thiol cofactors arise from their biosynthetic context, the length of their alkane chains, and perhaps foremost the functional groups that terminate the other ends of the chains. These may be as simple as sulfones (in CoB) or as complex as peptide bonds (in CoA).

Cofactors in this class serve three primary functions, as reductants (cysteine, CoB, pantetheine, and one sulfur on lipoic acid), carriers of methyl groups (CoM, SAM, one sulfur on lipoic acid), and carriers of larger functional groups such as acyl groups (lipoic acid in lipoyl protein, phosphopantetheine in acyl-carrier protein). A highly specialized role in which H is a leaving group is the formation of thioesters at carboxyl groups (pantothenic acid in CoA, lipoic acid in lipoyl protein). This function is essential to substrate-level phosphorylation [209], and appears repeatedly in the deepest and putatively oldest reactions in core metabolism. A final function closely related to reduction is the formation and cleavage of S−S linkages by cysteine in response to redox state, which is a major controller of both committed and plastic tertiary structure in proteins. The sulfur atoms on cysteine often form coordinate bonds to metals in metallo-enzymes, a function that we may associate with protein ligands, in contrast to the more common nitrogen atoms that coordinate metals in pyrrole cofactors.

The properties of the alkyl-thiol cofactors derive largely from the properties of sulfur, which is a “soft” period-3 element [210] that forms relatively unstable (usually termed “high-energy”) bonds with the hard period-2 element carbon. For the alkyl-thiol cofactors in which sulfur plays direct chemical roles, three main bonds dictate their chemistry: S−C, S−S, and S−H. Sulfur can also exist in a wide range of oxidation states, and for this reason often plays an important role in energy metabolism [211], particularly for chemotrophs, and due to its versatility has been suggested to precede oxygen in photosynthesis [212]. The electronic versatility of sulfur and the high-energy C−S bonds combine with the large atomic radius of sulfur to give access to additional geometrical, electronic and ring-straining possibilities not available to CHON chemistry.

Although not alkyl-thiol compounds as categorized above, two additional cofactors that make important indirect use of sulfur are thiamin and biotin. In neither case is sulfur the element to which transferred C1 groups are bound, but its importance to the focal carbon or nitrogen atom is suggested by the complexity of the chemistry and enzymes involved in its incorporation into these two cofactors [166, 213].

1. Biochemical roles and phylogenetic distribution

Transfer of methyl or methylene groups: The S atoms of CoM, lipoic acid, and S-adenosyl-homocysteine accept methyl or methylene groups from the nitrogen atoms of pterins. Considering that transition-metal sulfide minerals are the favored substrates for prebiotic direct-C1 reduction [56, 119, 197], a question of particular interest is how, in mineral scenarios for the emergence of carbon fixation, the distinctive relation between tuned nitrogen atoms in pterins as carbon carriers, and alkyl-thiol compounds as carbon acceptors, would have formed.

Reductants and co-reductants: CoB and CoM act together as methyl carrier and reductant to form methane in methanogenesis.⁴⁷ A similar role as methylene carrier and reductant is performed by the two SH groups in lipoic acid. CoM is specific to methanogenic archaea [215], while lipoic acid and S-adenosyl-homocysteine are found in all three domains [41, 216]. Lipoic acid is formed from octanoyl-CoA, emerging from the biotin-dependent malonate pathway to fatty acid synthesis, and along with fatty acid synthesis [85], may have been present in the universal common ancestor. The universal distribution of the glycine cycle supports this as noted earlier.

Role in the reversal of citric-acid cycling: Lipoic acid becomes the electron acceptor in the oxidative decarboxylation of α-ketoglutarate and pyruvate in the oxidative Krebs cycle, replacing the role taken by reduced ferredoxin in the rTCA cycle. Thus the prior availability of lipoic acid was an enabling precondition for reversal of the cycle in response to the rise of oxygen.

Carriers of acyl groups: Transport of acyl groups in the acyl-carrier protein (ACP) proceeds through thioesterification with pantetheine phosphate, similar to the thioesterification in fixation pathways. In fatty acid

⁴⁷ In this complex transfer [120], the fully-reduced (Ni+) state of the nickel tetrapyrrole F430 forms a dative bond to −CH3 displacing the CoM carrier, effectively re-oxidizing F430 to Ni3+. Reduced F430 is regenerated through two sequential single-electron transfers. The first, from CoM-SH, generates a Ni2+ state that releases methane, while forming a radical CoB·−S−S−CoM intermediate with CoB. The radical then donates the second electron, restoring Ni+. The strongly oxidizing heterodisulfide CoB-S-S-CoM is subsequently reduced with two NADH in a process known as electron bifurcation [214] (described further below), regenerating CoM-SH and CoB-SH while jointly generating the low-potential reductant reduced-ferredoxin. Both the stepwise reduction of F430 and electron bifurcation illustrate the central role of metals as mediators of single-electron transfer processes in metabolism.


biosynthesis acyl groups are further processed while attached to the pantetheine phosphate prosthetic group.

Electron bifurcation: The heterodisulfide bond of CoB-S-S-CoM has a high midpoint potential ($E'_0 = -140\,\mathrm{mV}$), relative to the H+/H2 couple ($E'_0 = -414\,\mathrm{mV}$), and its reduction is the source of free energy for the endergonic production of reduced ferredoxin (Fd2−, $E'_0$ in situ unknown but between $-520\,\mathrm{mV}$ and $-414\,\mathrm{mV}$) [214], which in turn powers the initial uptake of CO2 on H4MPT in methanogens. The remarkable direct coupling of exergonic and endergonic redox reactions through splitting of binding pairs into pairs of radicals, which are then directed to paired high-potential/low-potential acceptors, is known as electron bifurcation [111]. Variant forms of bifurcation are coming to be recognized as a widely-used strategy of metal-center enzymes, either consuming oxidants as energy sources to generate uniquely biotic low-potential reductants such as Fd2− [214, 217–219], or to “titrate” redox potential to minimize dissipation and achieve reversibility of redox reactions involving reductants at diverse potentials, e.g. by combining low-potential (Fd2−, $E'_0 = -420\,\mathrm{mV}$) and high-potential (NADH, $E'_0 = -300\,\mathrm{mV}$) reductants to produce intermediate-potential reductants (NADPH, $E'_0 = -360\,\mathrm{mV}$) [220]. Together with substrate-level phosphorylation (SLP), electron bifurcation may be the principal chemical mechanism (contrasted with membrane-mediated oxidative phosphorylation) for interconverting biological energy currencies, and along with SLP [209], a mechanism of central importance in the origin of metabolism [221]. Small metabolites including such heterodisulfides of cofactors, which can form radical intermediates exchanging single electrons with Fe-S clusters (typically via flavins) are essential sources and repositories of free energy in pathways using bifurcation.
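The energetic bookkeeping behind these midpoint potentials can be sketched with the standard relation ΔG = −nFΔE. The following minimal Python sketch uses only the potentials quoted in the paragraph above; the two-electron stoichiometry of each transfer, the choice of H2 as the donor to the heterodisulfide, and the function name `delta_g_kj_per_mol` are illustrative assumptions rather than values or reactions taken from Refs. [214, 220].

```python
# Illustrative sketch: free energy of electron transfer from midpoint potentials.
# Assumptions (not from the source): two electrons per transfer, standard
# conditions, and H2 as the donor to the CoB-S-S-CoM heterodisulfide.

FARADAY_KJ_PER_V_MOL = 96.485  # Faraday constant, kJ per volt per mole of electrons


def delta_g_kj_per_mol(n_electrons, e_donor_mv, e_acceptor_mv):
    """Return dG = -n * F * (E_acceptor - E_donor) in kJ/mol."""
    delta_e_volts = (e_acceptor_mv - e_donor_mv) / 1000.0
    return -n_electrons * FARADAY_KJ_PER_V_MOL * delta_e_volts


# Midpoint potentials quoted in the text, in mV.
E_HETERODISULFIDE = -140.0
E_H2 = -414.0
E_FD = -420.0
E_NADH = -300.0
E_NADPH = -360.0

# Exergonic half of bifurcation: electrons move to the high-potential heterodisulfide.
dg_heterodisulfide = delta_g_kj_per_mol(2, E_H2, E_HETERODISULFIDE)

# "Titration" of redox potential: the pair from Fd2- releases free energy,
# the pair from NADH consumes it, and both arrive at the NADPH potential.
dg_from_fd = delta_g_kj_per_mol(2, E_FD, E_NADPH)
dg_from_nadh = delta_g_kj_per_mol(2, E_NADH, E_NADPH)
dg_titration_net = dg_from_fd + dg_from_nadh

print(f"H2 -> heterodisulfide: {dg_heterodisulfide:.1f} kJ/mol")
print(f"Fd2- -> NADPH: {dg_from_fd:.1f} kJ/mol")
print(f"NADH -> NADPH: {dg_from_nadh:.1f} kJ/mol")
print(f"net titration: {dg_titration_net:.1f} kJ/mol")
```

Under these assumptions the transfer to the heterodisulfide releases roughly 53 kJ/mol, while the two legs of the titration cancel because −360 mV is the mean of −420 mV and −300 mV, which is consistent with the near-reversibility described in the text.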

2. Participation in carbon fixation pathway modules

The similarity between the glycine cycle and methanogenesis in Fig. 4 emphasizes the convergent roles of alkyl-thiol cofactors. In the glycine cycle, methylene groups are accepted by the terminal sulfur on lipoic acid, and the subadjacent SH serves as reductant when glycine is produced, leaving a disulfide bond in lipoic acid. The disulfide bond is subsequently reduced with NADH. In methanogenesis, a methyl group from H4MPT is transferred to CoM, with the subsequent transfer to F430, and the release from F430 as methane in the methyl-CoM reductase, coupled to formation of CoB−S−S−CoM. The heterodisulfide is again reduced with NADH, but employs a pair of electron bifurcations to retain the excess free energy in the production of Fd2− rather than dissipating it as heat [214]. Methanogenesis is thus associated with 7 distinctive cofactors beyond even the set known to have diversified functions within the archaea [5], again suggesting the derived and highly optimized nature of this Euryarchaeal phenotype. The striking similarity of these two methyl-transfer systems, mediated by independently evolved and structurally quite different cofactors, suggests evolutionary convergence driven specifically by properties of alkyl thiols.

A curious pattern, which we note but do not attempt to interpret, is the association of non-sulfur, nitrogen-heterocycle cofactors with WL carbon fixation, contrasted with the use of sulfur-containing heterocycles in carboxylation reactions of the rTCA cycle. The non-sulfur cofactors THF and H4MPT are used in the reactions of the WL pathway, while the biosynthetically-related but sulfur-containing cofactor thiamin mediates the carbonyl insertion (at a thioester) in rTCA [84, 222]. Biotin – which has been generally associated with malonate synthesis in the fatty-acid pathway (and derivatives such as propionate carboxylation to methyl-malonate in 3HP [85]) – mediates the subsequent β-carboxylation of pyruvate and of α-ketoglutarate [86, 223, 224]. Thus the two cofactors we have identified as using sulfur indirectly to tune properties of carbon or nitrogen C1-bonding atoms mediate the two chemically quite different sequential carboxylations in rTCA.

D. Carboxylation reactions in cofactor synthesis

Carboxylation reactions can be classified as falling into two general categories: those used in core carbon “uptake”, and those used exclusively in the synthesis of specific cofactors. In addition to carboxylation reactions in carbon-fixation pathways, the former category includes the carboxylation of crotonyl-CoA in the glyoxylate regeneration cycle. Although not an autotrophic pathway, this reaction does form a distinct entry point for CO2 into the biosphere. The carboxylation of acetyl-CoA to malonyl-CoA further serves a dual purpose, in being both the starting point for fatty acid synthesis and a key step in the 3HP pathway used in several carbon-fixation pathways. All these carboxylation reactions thus have in common that they are used at least in some organism as the central source for cellular carbon. All other carboxylation reactions, those not used as part of core carbon uptake, are used in the synthesis of the biotin cofactor, and the purine and pyrimidine nucleotides (see Fig. 17).

If we consider the sequences in which CO2 is incorporated in these pathways, they also form a distinct class of chemistry. In all three cases the resulting carboxyl group is immediately aminated, either as part of the carboxylation reaction, or in the following reaction, and the carboxamide group is subsequently maintained into the final heterocyclic structure. In addition we previously saw that IMP becomes the source for the folate and flavin family (through GTP). Carboxylation reactions are thus either a general source for cellular carbon in core metabolism, or a specific source of carboxamide groups in the synthesis of cofactors that are part of the catalytic control of core metabolism.


[Figure 17: chemical structure diagram. Labeled species: AIR, IMP, UMP, biotin, CO2, NH3, ALA, ASP, PRPP.]

FIG. 17: Carboxylation reactions in the synthesis of cofactors. The sequences show the immediate amination of the carboxyl group to a carboxamide group, which is then preserved into the final heterocyclic structure. As the only carboxylations not used in core carbon uptake, these reaction sequences form a distinct class of chemistry. Amination reactions are shown as net additions of ammonia, which may be derived from other sources (such as glutamine, aspartate or S-adenosyl-methionine). Abbreviations: Alanine (ALA); Aspartate (ASP); phosphoribosyl pyrophosphate (PRPP).

E. The chorismate pathway in both amino acid and cofactor synthesis

Chorismate is the sole source of single benzene rings in biochemistry [225]. The non-local π-bond resonance is used in a variety of charge-transfer and electron transfer and storage functions, in functional groups and cofactors derived from chorismate. We have noted the charge-transfer function of PABA in tuning N10 of folates, and its impact on C1 chemistry. The para-oriented carbonyl groups of quinones may be converted to partially- or fully-resonant orbitals in the benzene ring, enabling fully oxidized (quinone), half-reduced (semiquinone), or fully reduced (hydroquinone) states [203]. Finally, the aromatic ring in tryptophan (a second amino acid which behaves in many ways like a cofactor) has at least one function in the active sites of enzymes as a mediator of non-local electron-transfers [226].

V. INNOVATION: PROMISCUOUS CATALYSIS, SERENDIPITOUS PATHWAYS

The previous sections argued for the existence of low-level chemical and cofactor/catalyst constraints on metabolic innovations, and presented evolutionary divergences that either respected these as constraints, or were enabled by the diversification of cofactor and catalytic functions. In this section we consider the dynamics by which innovation occurs, and its main organizing principles. Innovation in modern metabolism occurs principally by duplication and divergence of enzyme function [116, 227, 228]. Often it relies on similarity of functions among paralogous enzymes, but in some cases may exploit more distant or accidental overlap of functions.

Innovation always requires some degree of enzymatic promiscuity [116], which may be the ability to catalyze more than one reaction (catalytic promiscuity) or to admit more than one substrate (substrate ambiguity). Pathway innovation also requires serendipity [229], which refers to the coincidence of new enzymatic function with some avenue for pathway completion that generates an advantageous phenotype from the new reaction. Most modern enzymes are highly specific,⁴⁸ but specific enzymes – whether due to structure or due to evolved regulation – are of necessity diversified in order to cover the broad range of metabolic reactions used in the modern biosphere. Serendipitous pathways assembled from a diversified inventory of specific enzymes will in most cases be strongly historically contingent as they depend on either overlap of narrow affinity domains or on “accidental” enzyme features not under selection from pre-existing functions. Such pathways therefore seem unpredictable from first principles; whether they are rare will depend on the degree to which the diversity of enzyme substrate-affinities compensates for their specificity.

A key question for early metabolic evolution is whether the trade-off between specificity and diversity was different in the deep past than in the present, in either degree or in structure, in ways that affected either the discovery of pathway completions or the likelihood that new metabolites could be retained within existing networks. These structural aspects of promiscuity and serendipity determine the regulatory problem faced by evolution in balancing the elaboration of metabolism with its preservation and selection for function.

1. Creating reaction mechanisms and restricting substrates, while evolving genes

Metabolism is characterized at all levels by a tension between creating reaction mechanisms that introduce new chemical possibilities, and then pruning those possibilities by selectively restricting reaction substrates. Whether this tension creates a difficult or an easy problem for natural selection to solve depends at any time on whether the accessible changes in catalytic function, starting from integrated pathways, readily produce new integrated pathways whose metabolites can be recycled in autocatalytic loops. We argue that the conservation of pathway mechanisms, particularly when these are defined by generic functional groups such as carboxyls, ketones, and enols, with promiscuity coming from substrate ambiguity with respect to molecular properties away from the reacting functional group, favors the kind of orderly pathway duplication that we observe in the extant diversity of core metabolism. Therefore we expect that serendipitous pathway formation was both facile in those instances in the early phases of metabolic evolution where innovations in radical-based mechanisms for carbon incorporation occurred, and structured according to the same local-group chemistry around which the substrate network is organized [67].

48 However, broad substrate-specificity is no longer considered rare, and is even explained as an expected outcome in some cases where costs of refinement are higher than can be supported by natural selection, and in other cases by positive selection for phenotypic plasticity [228].

Modern enzymes both create reaction mechanisms and restrict substrates, but the parts of their sequence and structure that are under selection for these two categories of function may be quite different, so the two functions can evolve to a considerable degree independently. Active-site mechanisms in enzymes for organic reactions will often depend sensitively on a small number of highly conserved catalytic residues in a relatively fixed geometry, while substrate selection can depend on a wide range of properties of enzyme shape or conformation dynamics [228], on local functional-group properties of the substrate that have been termed “chemophores” [230], as well as (in some cases) on detailed relations between the substrate and active-site geometry or residues. An extreme example of the potential for separability between reaction mechanism and substrate selection is found in the polymerases. A stereotypical reaction mechanism of attack on activating phosphoryl groups requires little more than correct positioning of the substrates. In the case of DNA polymerases, at least six known categories (A, B, C, D, X, and Y) with apparently independent sequence origin have converged on a geometry likened to a “right hand” which provides the required orientation [231, 232].

At the same time as evolving enzymes needed to provide solutions to the biosynthetic problem of enabling and regulating metabolic network expansion, they were themselves dependent on the evolving capabilities of genomic and translation systems for maintaining complexity and diversity. Jensen [227] originally argued that high enzymatic specificity was no more plausible in primitive cells than highly diversified functionality,49 and that enzymatic promiscuity was both evolutionarily necessary and consistent with what was known at the time about substrate ambiguity and catalytic promiscuity. Modern reviews [116, 228, 230] of the mechanisms underlying functional diversity, promiscuity, and serendipity confirm that substrate ambiguity is the primary source of promiscuity that has led to the diversification of enzyme families. It is striking that, even in cases where substrate affinity has been the conserved property while alternate reaction mechanisms or even alternate active sites have been exploited, it is often local functional groups on one or more substrates that appear to determine much of this affinity [228].

49 This argument was largely a rebuttal of an earlier proposal by Horowitz [233] for “retrograde evolution” of enzyme functions. The 1940s witnessed the rise of an overly-narrow interpretation of “one gene, one enzyme, one substrate, one reaction” (a rigid codification of what would become Crick’s Central Dogma [234]), which in the context of complex pathway evolution appeared to be incompatible with natural selection for function of intermediate states. The Horowitz solution was to depend on an all-inclusive “primordial soup” [50], in which pathways could grow backward from their final products, propagating selection stepwise downward in the pathway until a pre-existing metabolite or inorganic input was found as a pathway origin.

2. Evidence in our module substructure that early innovation was governed principally by local chemistry

The substructure of modules and the sequence of innovations that we have sketched in Sec. III appear to be dominated by substrate ambiguity in enzymes or enzyme families with conserved reaction mechanisms. The key reactions in carbon fixation are of two types: Crucial reactions typically involve metal centers or cofactors that could have antedated enzymes, and it is primarily reaction sites, not molecular selectivity, that distinguish pathways at the stage of these reactions.50 The shared internal sequence of reductions and isomerizations common to modules (Fig. 8) is very broadly duplicated, and the molecular specificity in their enzymes today is not correlated with significant reaction-sequence changes in the internal structure of pathways. These pathways could plausibly function much as they do today with less-specific hydrogenases and aconitases.

A quantitative reconstruction of early evolutionary dynamics will require merging probability models for networks and metabolic phenotypes with those for sequences and structure of enzyme families. The goal is a consistent model of the temporal sequence of ancestral states of catalyst families, and of the substrate networks on which they acted.

VI. INTEGRATION OF CELLULAR SYSTEMS

The features of metabolism that display a “logic” of composition, which is then reflected in their evolutionary history, are those with few and robust responses to environmental conditions that can be inferred from present diversity. These are the subsystems whose evolution has been simplified and decoupled by modularity. Their relative immunity from historical contingency, resulting in more “thermodynamic” modes of evolution, results from rapid, high-probability convergence in populations that can share innovations [141].

50 Recall that the enzymes that have been argued to be the ancestral forms of both the acetyl- and succinyl-CoA ligases and the pyruvate and α-ketoglutarate biotin-carboxylases show very close sequence homology [86, 89], suggesting shared ancestral enzymes for both.

The larger roles for standing variation and historical contingency that are so often emphasized [235] in accounts of evolutionary dynamics are made possible by longer-range correlations that link modules, creating mutual dependencies and restricting viable changes [108, 109]. The most important source of such linkage in extant life is the unification of metabolic substrates and control processes within cells [236]. Cellular death or reproduction couples fitness contributions from many metabolic-phenotype traits, together with genome replication systems. This enables the accumulation of diversity as genomes capture and exploit gains from metabolic control, complementary specialization [237], and the emergence of ecological assemblies of specialists as significant mediators of contingent aspects of evolutionary innovation (as we illustrated with the example of methylotrophy).

We consider in this section several important ways in which aggregation of metabolic processes within cells follows its own orderly hierarchy and progression. We note that even a single cell does not impose only one type of aggregation, but at least three types, and that these are the bases for different selection pressures and could have arisen at different times. Within cellular subsystems, the coupling of chemical processes is often mediated by coupling of their energy systems, which has probably developed in stages we can identify. Finally, even where molecular replication is coupled to cellular physiology, in the genetic code, strong and perhaps surprising signatures of metabolic modularity are recapitulated.

A. Cells provide at least three functionally distinct forms of compartmentation

Under even the coarsest functional abstraction, the cell provides not one form of compartmentation, but at least three [238, 239]. The geometry and topology of closed spheres or shells, and the capacitance and proton impermeability of lipid bilayers, permit the buildup of pH and voltage differences, and thus the coupling of redox and phosphate energy systems through intermediate proton-motive (or in many cases, sodium-motive) force [240]. The concentration of catalysts with substrates enhances reactions that are second-order in organic species, and the equally important homeostatic control of the cytoplasm regulates metabolic reaction rates and precludes parasitic reactions. Finally, the cell couples genetic variations to internal biochemical and physiological variations51 much more exclusively than they are coupled to shared resources such as biofilms or siderophores, leading to the different evolutionary dynamics of development from niche construction [38].52 Each of these forms of coupling affects the function and evolution of the modules we have discussed.

51 The perspective that this is an active coupling, which defines one of the forms of individuality rather than providing a complete characterization of the nature of the living state, is supported by the complex ecosystems including viral RNA and DNA that are partly autonomous of the physiology of particular cells [241, 242].

52 For an argument that somatic development and niche construction are variants on a common process, distinguished by the genome’s level of control and exploitation of the constructed resources, see Ref. [243].

1. Coupling of redox and phosphate energy systems may have been the first form of compartmentation selected

Biochemical subsystems driven, respectively, by redox potential or phosphoanhydride-bond dehydration potential, cannot usually be directly coupled to one another due to lack of “transducer” reactions that draw on both energy systems.53 The notable exception to this rule is the exchange of phosphate and sulfur groups in substrate-level phosphorylation [203] from thioesters (which may proceed in either direction depending on conditions). Although it provides a less flexible mode of coupling than membrane-mediated oxidative phosphorylation, this crucial reaction type, which occurs in some of the deepest reactions in biochemistry (those employing CoA, including all those in the six carbon fixation pathways), has been proposed as the earliest coupling of redox and phosphate [209], and the original source of phosphoanhydride potential [69] enabling pathways that require both reduction and dehydration reactions.

53 In addition to the ultimate physical constraint of limits to free energy, biochemistry operates under additional proximate constraints not only from availability of free energy but from the chemical and quantum-mechanical substrates in which it is carried.

Phosphate concentration limits growth of many biological systems today, and phosphate concentrations appear to be even lower in vent fluids [244] than on average in the ocean, making it difficult to account for the emergence of many metabolic steps in hydrothermal vent scenarios for the origin of life. Serpentinization and other rock-water interactions that produce copious reductants also scavenge phosphates into mineral form, so it appears doubtful that phosphates were abundant in the environments otherwise most favorable to geochemical organosynthesis. What little phosphate is found in water is primarily orthophosphate, because the phosphoanhydride bond is unstable to hydrolysis. Therefore the retention of orthophosphate, and the continuous regeneration of pyrophosphate and polyphosphates [245–249], may have been essential to the spread of early life beyond very rare geochemical environments.

The membrane-bound ATP synthetase, which couples phosphorylation to a variety of redox reactions [5] through proton or sodium pumping, is therefore essential in nearly all biosynthetic pathways, and must have been among the first functions of the integrated cell. Without a steady source of phosphate esters, none of the three oligomer families could exist. The ATP synthetase itself is homologous in all organisms, providing one strong argument (among many [85, 250]) for a membrane-bound last common ancestor. Proton-mediated phosphorylation (best known through oxidative phosphorylation in the respiratory chain [203]) requires a topologically enclosed space to function as a proton capacitor [240]. However, as shown by gram-negative bacteria [5] and their descendants mitochondria and plastids, which acidify the periplasmic space or thylakoid lumen, the proton capacitor need not be (and generally is not) the same compartment as the cytoplasm containing enzymatic reactions. Because the coupling of energy systems is a different function from regulating reaction rates catalytically, the phosphorylation system should not generally have been subject to the same set of evolutionary pressures and constraints as other cellular compartments, and need not have arisen at the same time. We note that, because it may have lower osmotic pressure than the cytoplasm, the acidified space required for proton-driven phosphorylation may not have required a cell wall, greatly reducing the number of concurrent innovations required for compartmentalization, compared to those for the cytoplasm. Therefore we conjecture that proton-mediated phosphorylation could have been the first function leading to selection for lipid-bilayer compartmentalization, allowing other cellular functions to accrete at later times.

2. Regulation of biosynthetic rates may have been prerequisite for the optimization of loop-autocatalytic cycles

The second function of cellular compartments, and the one most emphasized in vesicle theories of the origin of life [6, 251, 252], is the enhancement of second-order reactions by collocation of catalysts and their substrates. Here we note another role that we have not seen mentioned, which is more closely related to the functions of the cell that inhibit reactions. Organisms employing autocatalytic-loop carbon fixation pathways must reliably limit their anabolic rates to avoid drawing off excess network catalysts into anabolism, which would result in passage below the autocatalytic threshold for self-maintenance, and collapse of carbon fixation and metabolism. Regulating anabolism to maintain viability and growth may have been an early function of cells.

We noted in Sec. III D 4 the fragility of autocatalytic-loop pathways to parasitic side-reactions, and the way the addition of a linear pathway such as WL stabilizes loop autocatalysis in the root node of Fig. 11.54 It may be that the optimizations in either branch of the carbon-fixation tree were not possible until rates of anabolism were sufficiently well-regulated to protect supplies of loop intermediates or essential cofactors. Therefore, while the root node is plausible as a pre-cellular [56] or an early cellular (but non-optimized) form, either branch from it may have required the greater control afforded by quite refined cellular regulation of reaction rates.

54 For proto-metabolism, spontaneous abiotic side-reactions may be hazardous, if catalysts in the main fixation pathway do not sufficiently accelerate their reaction rates to create a separation of timescales relative to the uncatalyzed background. Within the first cells, the same hazard is posed by secondary anabolism, as its reaction rates become enhanced by catalysts similar to those in the core. This fact was clearly noted already in Ref. [56].

B. Coupling of metabolism to molecular replication, and signatures of chemical regularity in the genetic code

Among the subsystems coupled by modern cells, perhaps none is more elaborate than the combined apparatus of amino acid and nucleotide biosynthesis and protein coding. The most remarkable chemical aspect of the protein-coding system is that it is an informational system: a sophisticated machinery of transcription, tRNA formation and aminoacylation, and ribosomal translation separates the chemical properties of DNA and RNA from those of proteins, permitting almost free selection of sequences in both alphabets in response to requirements of heredity and protein function.55 The interface at which this separation occurs is the genetic code. From the informational suppression of chemical details that defines the coding system, the code itself might have been expected to be a random map, but empirically the code is known to contain many very strong regularities related to amino acid biosynthesis and chemical properties, and perhaps to the evolutionary history of the aminoacyl-tRNA synthetases.

55 The observation that enzymes acting on DNA have evolved to actively mitigate chemical differences in the bases, to enable a more nearly neutral combinatorial alphabet, is due to Peter Schuster (pers. comm.).

Many explanations have been advanced for redundancy in the genetic code, as a source of robustness of protein properties against single-point mutations [141, 253–255], but in all of these the source of selection originates in the elaborate and highly evolved function of coding itself. In many cases the redundancy of amino acids at adjacent coding positions reflects chemical or structural similarities, consistent with this robustness-criterion for selection, but in nearly all cases redundancy of bases in the code correlates even more strongly with shared elements of biosynthetic pathways for the amino acids. The co-evolutionary hypothesis of Wong [256] accounts for the correlation of the first base-position with amino-acid backbones as a consequence of duplication and divergence of amino acid biosynthetic enzymes together with aminoacyl-tRNA synthetases (aaRS). The stereochemical hypothesis of Woese [257] addresses a correlation of the second coding position with a measure of hydrophobicity called the polar requirement. The remarkable fact that both correlations are highly significant relative to random assignments, but that they are segregated between first and second codon bases, is not specifically addressed in either of these accounts. Copley et al. [205] address the same regularities as both the Wong and Woese hypotheses, but link them to much more striking redundancies in biosynthetic pathways, which they propose are consequences of small-molecule organo-catalytic roles of dimer RNA in the earliest biosynthesis of amino acids.

We note here a further chemical regularity in the genetic code, which falls outside the scope of the previous explanations: purines in second base-position code for several amino acids that either use related purine-derived cofactors in their biosynthetic pathways, or are directly related to the codon. This association is much more comprehensive for G-second codons than for A-second codons, and it does not suggest the same kinds of mechanistic relations in the two cases. However, it further compresses the description of patterns in the code that were not addressed in Ref. [205], in terms of similar chemical and biosynthetic associations.

A strong correlation, of a single kind, is found between the glycine cycle for amino acid biosynthesis from C1 groups on folate cofactors, and codons XGX, where X is any base and G is guanine. (In what follows we abbreviate wobble-base positions y for pyrimidines U and C, or u for purines A or G.) This group includes glycine (GGX), serine (AGy), cysteine (UGy), and tryptophan (UGu).56 We do not propose a specific mechanism for such an association here, but our earlier argument that folates would have been contemporaneous with GTP suggests that biosynthesis through the glycine cycle was the important source of these amino acids at the time they became incorporated into the code. Some of these amino acids satisfy multiple regularities, as in the correlation of glycine with GXX ↔ reductive transamination, or cysteine with UXX ↔ pyruvate backbone, proposed in Ref. [205].

The position (CAy) of histidine, synthesized from ATP, is the only case we recognize of a related correlation in XAX codons. For this position, the availability of ATP seems to have been associated with the synthesis of histidine directly through the cyclohydrolase function (rather than through secondary cofactor functions), at the time this amino acid became incorporated into the code.
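The codon groupings named above can be checked against the standard genetic code. The following is a minimal Python sketch, assuming the standard (nuclear) code table written in the conventional UCAG ordering; the names `CODE` and `second_base_group` are illustrative and do not come from the analysis in the text. It lists which amino acids occupy XGX codons and confirms the CAy position of histidine.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one-letter amino acids, in UCAG order
# (first base slowest, third base fastest); '*' marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def second_base_group(base):
    """Map each amino acid to its codons that carry `base` in the second position."""
    groups = {}
    for codon, aa in CODE.items():
        if codon[1] == base:
            groups.setdefault(aa, []).append(codon)
    return groups


xgx = second_base_group("G")
for aa, codons in sorted(xgx.items()):
    print(aa, codons)

# Histidine occupies CAy (CAU and CAC) among the XAX codons.
print([c for c, aa in CODE.items() if aa == "H"])
```

In the standard code the XGX set also contains arginine (CGX and AGu) and the UGA stop codon, so the four amino acids listed in the text are a subset of the G-second assignments rather than all of them.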

56 Both purines are used in the mitochondrial code and only UGG in the nuclear code.

Much more than correlation is required to impute causation, and all existing theories of cause for regularities in the genetic code are either highly circumstantial or require additional experimental support. Therefore we limit the aspects of these observations that we regard as significant to the following three points:

The existence of a compression: The idealized adaptive function of coding is to give maximum evolutionary plasticity to aspects of phenotype derived from protein sequence, uncoupled from constraints of underlying biosynthesis. The near-wholesale transition from organic chemistry to polymer chemistry around the C20 scale suggests that this separation has been effectively maintained by evolution. Strong regularities which make the description of the genetic code compressible relative to a random code reflect failures of this separation which have transmitted selection pressure across levels, during either the emergence or maintenance of the code. These include base-substitution errors, whether from mutation or in the transcription and translation processes, but also apparently chemical relations between nucleobases and amino acids.

The segregation of the roles of different base positions and in some cases different bases in terms of their biochemical correlates: The genetic code is like a “rule book” for steps in the biosynthesis of many amino acids,57 but the chemical correlations which are its rules are of many kinds. Beyond the mere existence of those rules, and their collective role as indices of regularity threading the code, we must explain why rules of different kinds are so neatly segregated over different base positions and sometimes over different bases (as in the XGX and XAX codons).

57 The correlations in the code may be understood as rules because the biosynthetic pathways may be placed on a decision tree, with branches labeling alternative reactions at several stages of synthesis, and branching directions indicated by the position-dependent codon bases [205].

A compression that references process rather than property: The role of biosynthetic pathways as correlates of regularities makes this compression of the genetic code a reference to the process and metabolic network context within which amino acids are produced, and not merely to their properties. (Many of the chemical properties recognized as criteria of selection, whether size or hydrophobicity, are shared at least in part because they result from shared substrates or biosynthetic steps.) We think of the function of coding as separating biosynthetic process from phenotype: transcription and translation are “Markovian” in the sense that the only information from the biosynthetic process which survives to affect the translated protein is what is inherent in the structure of the amino acid.58 Thus selection on post-translation phenotypes should only be responsive to the finished amino acids. The existence of regularities in the genetic code which show additional correlation with intermediate steps in the biosynthetic process therefore requires either causes other than selection on the post-coding phenotype (including its robustness), or a history-dependence in the formation of the code that reflects earlier selection on intermediate pathway states. If they reflect causal links to metabolic chemistry, these “failures” of the separation between biosynthetic constraint and selection of polymers for phenotype may have broken down the emergence of molecular replication into a sequence of simpler, more constrained, and therefore more attainable steps.

58 In technical terms, one says that the phenotype is conditionally independent of the biosynthetic pathway, given the amino acid.
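The conditional independence stated in footnote 58 can be written compactly. The symbols below are illustrative shorthand and not notation from the main text: \Phi stands for the post-translation phenotype, a for the finished amino acid, and \pi for the biosynthetic pathway that produced it.

```latex
P(\Phi \mid a, \pi) \;=\; P(\Phi \mid a)
\quad\Longleftrightarrow\quad
\pi \to a \to \Phi \ \text{ is a Markov chain.}
```

Regularities in the code that correlate with \pi beyond what a alone carries are therefore not explained by selection acting on \Phi through this chain, which is the point made in the paragraph above.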

VII. CONCLUSIONS

We have argued that the fundamental problem of electron transfer in aqueous solution leads to a qualitative division between catalytically “hard” and “easy” chemistry, and that this division in one form or another has led to much of the architecture and long-term evolution of the biosphere. Hard chemistry involves electron transfers whose intermediate states would be unstable or energetically inaccessible in water if not mediated by transition-metal centers in metal-ligand complexes. Easy chemistry involves hydrogenations and hydrations, intramolecular redox reactions, and a wide array of acid-base chemistry. Easy chemistry is promiscuously re-used and provides the internal reactions within modules of core metabolism. Hard chemistry defines the module boundaries and the key constraints on evolutionary innovation. These simple ideas underlie a modular decomposition of carbon fixation that accounts for all known diversity, largely in terms of unique adaptations to chemically simple variations in the abiotic environment. On the foundation of core metabolism laid by carbon fixation, the remainder of biosynthesis is arranged as a fan of increasingly independent anabolic pathways. The unifying role of the core permits diverse anabolic pathways to independently reverse and become catabolic, and the combinatorics of possible reversals in communities of organisms determines the space of evolutionary possibilities for heterotrophic ecology.

We have emphasized the role of feedback in biochemistry, which takes different forms at several levels. Network autocatalysis, if we take as a separate question the origin of external catalytic and cofactor functions, is found as a property internal to the small-molecule substrate networks for many core pathways. A qualitatively different form of feedback is achieved through cofactors, which may act either as molecular or as network catalysts. As network catalysts they differ from small metabolites because their internal structure is not changed except at one or two bonds, over the reactions they enable. The cofactors act as “keys” that incorporate domains of organic chemistry within biochemistry, and this has made them both extraordinarily productive and severely limiting. No extant core pathways function without cofactors, and cofactor diversification appears to have been as fundamental as enzyme diversification in some deep evolutionary branches. We have therefore argued for a closely linked co-evolution of cofactor functions with the expansion of the universal metabolic network from inorganic inputs, and attempted to place key cofactor groups within the dependency hierarchy of biosynthetic pathways, particularly in relation to the first ability to synthesize RNA.

The most important message we hope to convey is the remarkable imprint left by very low-level chemical constraints, even up to very high levels of biological organization. Only seven carbon fixation modules, mostly determined by distinctive, metal-dependent carboxylation reactions, cover all known phylogenetic diversity and provide the building blocks for both autotrophic and heterotrophic metabolic innovation. A similar, small collection of organic or organometallic cofactor families have been the gateways that determine metabolic network structure from the earliest cells to the present. The number of these cofactors that we consider distinct may be somewhat further reduced if we recognize biosynthetic relatedness that leads to functional relatedness (as in the purine-derived or chorismate-derived cofactors), or cases of evolutionary convergence dominated by properties of elements (as for lipoic acid and the CoB-CoM system).

We believe that these regularities should be understood as laws of biological organization. In a proper, geochemically-embedded theory of the emergence of metabolism, they should be predictable, either as particular forms, as in the case of metal chemistry or convergent uses of nitrogen and sulfur, or as properties of distributions, as in the use of network modules or the diversity of cofactors. Moreover, this lawfulness should have been expected: the factors that reduce (or encrypt) the role of laws in biology, and lead to unpredictable historical contingencies, arise from long-range correlation. Correlation of multiple variables leads to large spaces of possibility and entangles the histories of different traits, making the space difficult to sample uniformly. But correlation in biology is in large part a constructed property; it has not been equally strong in all eras and its persistence depends on timescales. Long-term evolution permits recombination even in modern integrated cells and genomes. Early life, in contrast, with its less-integrated cells and genomes, and its more loosely-coupled traits, had constructed less long-range correlation. These are the domains where the simpler but invariant constraints of underlying chemistry and physics should show through.

Acknowledgments

This work was completed as part of the NSF FIBR grant nr. 0526747: From geochemistry to the genetic code. DES thanks Insight Venture Partners for support. RB is further supported by an Omidyar Fellowship at the Santa Fe Institute. We are grateful to Harold Morowitz, Shelley Copley, and Charles McHenry for critical conversations about these ideas and essential references.

APPENDIX A: BIPARTITE GRAPH REPRESENTATIONS FOR CHEMICAL REACTION NETWORKS

The stoichiometry of a chemical reaction may be represented by a directed hypergraph [258]. A hypergraph differs from a simple graph in that, where each edge of a simple graph has two points as its boundary, in a hypergraph, a hyper-edge may have a set of points as its boundary. In a directed hypergraph, the input and output sets in the boundary are distinguished. For the application to chemistry presented here, each hyper-edge corresponds to a reaction, and its input and output boundary sets correspond to moles of the reactant and product molecules.
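As a concrete illustration of the definition above, a directed hyper-edge can be held as a pair of multisets. This is a minimal Python sketch under our own naming (`HyperEdge`, `net_stoichiometry` are hypothetical, not part of the formalism in Ref. [258]), using the textbook reaction 2 H2 + O2 → 2 H2O as the example rather than any pathway from the main text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class HyperEdge:
    """One reaction: a directed hyper-edge with multiset input and output boundaries."""
    name: str
    inputs: Counter   # species -> moles consumed
    outputs: Counter  # species -> moles produced

    def net_stoichiometry(self):
        """Net change in moles of each species for one firing of the reaction."""
        species = set(self.inputs) | set(self.outputs)
        return {s: self.outputs.get(s, 0) - self.inputs.get(s, 0) for s in species}


# Illustrative example: 2 H2 + O2 -> 2 H2O
water = HyperEdge(
    name="water_formation",
    inputs=Counter({"H2": 2, "O2": 1}),
    outputs=Counter({"H2O": 2}),
)
print(water.net_stoichiometry())
```

Keeping inputs and outputs as separate multisets, rather than a single signed vector, preserves the distinction between the two boundary sets that makes the hypergraph directed; a species appearing on both sides (a catalyst) is not cancelled away.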

It is possible to display the hypergraphs representing chemical reactions as doubly-bipartite simple graphs, meaning that both nodes and edges exist in two types, and that well-formed graphs permit only certain kinds of connections of nodes to edges. The bipartite graph representation of a reaction has an intuitive similarity to the conventional chemical-reaction notation (shown in Fig. A1 below), but it makes more explicit reference to the chemical mass-action law as well as to the reaction stoichiometry. For appropriately constructed graphs, graph-rewrite rules correspond one-to-one with evaluation steps of mass-action kinetics, permitting simplification of complex reaction networks to isolate key features, while retaining correspondence of the visual and mathematical representations.
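For reference, the mass-action law mentioned above assigns a one-directional reaction a rate equal to a rate constant times the product of reactant concentrations, each raised to the number of moles entering the reaction. The sketch below assumes that elementary form; the function name `mass_action_rate`, the rate constant, and the concentrations are made-up illustrative values.

```python
from math import prod


def mass_action_rate(k, stoich, conc):
    """Forward rate k * prod([X]**m) over reactants X that enter with m moles each."""
    return k * prod(conc[s] ** m for s, m in stoich.items())


# Hypothetical reaction 2 A + B -> products, with arbitrary numbers.
rate = mass_action_rate(k=2.0, stoich={"A": 2, "B": 1}, conc={"A": 3.0, "B": 0.5})
print(rate)
```

The exponent on each concentration is the same integer that counts moles of that species in the input boundary of the reaction, which is why a representation that records stoichiometry explicitly also fixes the form of the rate law.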

We use graph representations of reaction networks in the text where we need to show relations among multiple pathways that may connect the same inputs and outputs (such as acetyl-CoA and succinyl-CoA), and may draw from the same input and output species (such as CO2, reductant, and water). Parallel input and output sequences appear as “ladder” topology in these graphs, and for the particular pathways of biological carbon fixation, this is due to the recurrence of identical functional-group reaction sequences in multiple pathways, as discussed in Sec. III B.

In this appendix we define the graph representation used in the text, introduce graph-reduction procedures and prove that they satisfy the mathematical property of associativity, and provide solutions for the particular simplification of interacting rTCA and Wood-Ljungdahl pathways in a diluting environment.

1. Definition of graphic elements

a. Basic elements and well-formed graphs

The elements in a bipartite-graph representation of a chemical reaction or reaction network are defined as follows:

• Filled dots represent concentrations of chemical species. Each such dot is given a label indicating the species, such as ACE ↔ [ACE], used to refer to acetate in the text.

• Dashed lines represent transition states of reactions. Each is given a label indicating the reaction, as in b.

• Hollow circles indicate inputs or outputs between molecular species and transition states, as in (inline graphic: line stubs labeled ACE, H2, and CO2 meeting a hollow circle on the transition state b). Each circle is associated with the complex of reactants or products to the associated reaction, indicated as labeled line stubs.

• Hollow circles are tied to molecular concentrations with solid lines (inline graphic: a solid line attached to the filled dot ACE); one line per mole of reactant or product participating in the reaction. (That is, if m moles of a species A enter a reaction b, then m lines connect the dot corresponding to [A] to the hollow circle leading into reaction b. This choice uses graph elements to carry information about stoichiometry, as an alternative to labeling input- or output-lines to indicate numbers of moles.)

• Full reactions are defined when two hollow circles are connected by the appropriate transition state, as in (inline graphic: input stubs ACE, H2, CO2 and output stubs PYR, H2O joined through the transition state b), describing the reductive carboxylation of acetate to form pyruvate.

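• The bipartite graph for a fully specified reaction takes the form

(graphic: filled dots ACE, H2, and CO2 joined by mole-lines to the input circle of transition state b, and filled dots PYR and H2O joined by mole-lines to its output circle)

$$
\mathrm{ACE} + \mathrm{H_2} + \mathrm{CO_2} \rightleftharpoons \mathrm{PYR} + \mathrm{H_2O}, \tag{A1}
$$

where labeled stubs are connected to filled circles by mole-lines. The bipartite graph corresponds to the standard chemical notation for the same reaction as shown.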
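The graphic elements listed above can be mirrored in a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `Reaction` and its method `mole_lines` are hypothetical, chosen here only to show how one solid line per mole carries the stoichiometry.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Reaction:
    """One transition state (dashed line) with its two hollow circles.

    Hypothetical container: `reactants` and `products` map a species label
    (a filled dot) to the number of moles entering or leaving the reaction.
    """
    label: str
    reactants: Counter = field(default_factory=Counter)
    products: Counter = field(default_factory=Counter)

    def mole_lines(self):
        """Yield one (species, side) pair per solid line in the graph."""
        for species, moles in self.reactants.items():
            for _ in range(moles):
                yield species, "in"
        for species, moles in self.products.items():
            for _ in range(moles):
                yield species, "out"


# Reductive carboxylation of acetate, the reaction labeled b in Eq. (A1).
b = Reaction(
    "b",
    reactants=Counter({"ACE": 1, "H2": 1, "CO2": 1}),
    products=Counter({"PYR": 1, "H2O": 1}),
)

lines = list(b.mole_lines())
print(len(lines))  # 5: three input lines and two output lines
```

Storing multiplicities rather than repeated edges is only a convenience; expanding them with `mole_lines` recovers the drawing convention in which m moles appear as m parallel lines.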


b. Assignment of graph elements to terms in the mass-action rate equation

The mass-action kinetics for a graph such as the reductive carboxylation of acetate$^{59}$ is given in terms of two half-reaction currents, which we may denote with the reaction label and an arbitrary sign as

$$
\begin{aligned}
j_b^{+} &= k_b\,[\mathrm{ACE}]\,[\mathrm{CO_2}]\,[\mathrm{H_2}] \\
j_b^{-} &= \bar{k}_b\,[\mathrm{PYR}]\,[\mathrm{H_2O}] .
\end{aligned}
\tag{A2}
$$

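$k_b$ and $\bar{k}_b$ denote the forward and reverse half-reaction rate constants. The total reaction current $J_b \equiv j_b^{+} - j_b^{-}$ is related to the contribution of this reaction to the changes in concentration as

$$
\begin{aligned}
\dot{[\mathrm{ACE}]} = \dot{[\mathrm{CO_2}]} = \dot{[\mathrm{H_2}]} &= -J_b \\
\dot{[\mathrm{PYR}]} = \dot{[\mathrm{H_2O}]} &= J_b ,
\end{aligned}
\tag{A3}
$$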

where the overdot denotes the time derivative. Reaction currents on graphs do not have inherent directions, reflecting the microscopic reversibility of reactions. All sources of irreversibility are to be made explicit in the chemical potentials that constitute the boundary conditions for reactions.

Each term in the mass-action rate equation may be identified with a specific graphical element in the bipartite representation. The half-reaction rate constants $k_b$, $\bar{k}_b$ are associated with the hollow circles, and the current $J_b$ (which is the time-derivative of the coordinate giving the “extent of the reaction”) is associated with the transition-state dashed line. Concentrations, as noted, are associated with filled dots, and stoichiometric coefficients are associated with the multiplicities of solid lines.
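As a worked illustration of this assignment, the sketch below evaluates the two half-reaction currents and the resulting concentration changes for the reductive carboxylation of acetate. The function names and all numerical values are assumptions made for the example; they are not taken from the text.

```python
from math import prod


def half_currents(conc, reactants, products, k_fwd, k_rev):
    """Return (j_plus, j_minus) for one reaction under mass action.

    `reactants` and `products` map species to stoichiometric coefficients,
    i.e. to the number of solid lines attached to each hollow circle.
    """
    j_plus = k_fwd * prod(conc[s] ** m for s, m in reactants.items())
    j_minus = k_rev * prod(conc[s] ** m for s, m in products.items())
    return j_plus, j_minus


def rates_of_change(conc, reactants, products, k_fwd, k_rev):
    """Contribution of this reaction to d[species]/dt, as in Eq. (A3).

    Sketch only: a species appearing on both sides (a network catalyst)
    is not handled here.
    """
    j_plus, j_minus = half_currents(conc, reactants, products, k_fwd, k_rev)
    J = j_plus - j_minus
    rates = {s: -m * J for s, m in reactants.items()}
    rates.update({s: m * J for s, m in products.items()})
    return rates


# Hypothetical concentrations and rate constants, in arbitrary units.
conc = {"ACE": 2.0, "CO2": 3.0, "H2": 0.5, "PYR": 1.0, "H2O": 4.0}
reactants = {"ACE": 1, "CO2": 1, "H2": 1}
products = {"PYR": 1, "H2O": 1}

rates = rates_of_change(conc, reactants, products, k_fwd=1.0, k_rev=0.5)
print(rates["ACE"], rates["PYR"])  # -1.0 1.0
```

With these numbers the forward half-current is 3.0 and the reverse half-current is 2.0, so the net current is 1.0 and each reactant is consumed at the rate at which each product is formed.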

2. Graph reduction for reaction networks in steady state

Networks of chemical reactions in steady state satisfy the constraints that the input and output currents to each chemical species (including any external sources or sinks) sum to zero. These constraints are the basis of stoichiometric flux-balance analysis [259–262], but they can also be used to eliminate internal nodes as explicit variables, leading to lumped-parameter expressions for entire sub-networks as “effective” vertices or reactions. With appropriate absorption of externally buffered reagents into rate constants, this network reduction can be done exactly, without loss of information. An example of such a reduction is the Michaelis-Menten representation of multiple substrate binding at enzymes. Systematic methods for network reduction were one motivation behind Sinanoglu’s graphic methods [263, 264]. More sophisticated stochastic approaches have recently been used to include fluctuation properties in effective vertices, generalizing the Michaelis relation beyond mean field [265].

59 All examples in this appendix use the same simplified projection onto the CHO sector that is used in diagrams in the main text. Actual reaction free energies will be driven by coupled energies of hydrolysis of ATP or oxidation of thiols to thioesters. The graph-reduction methods described in the next section may be used to include such effects into lumped-parameter representations of multi-reagent reaction sequences that regenerate energetic intermediates such as ATP or CoA in a network where these are made explicit.

The map we have given of mass-action rate parameters to graphic elements allows us to represent steady-state network reduction in terms of graph reduction. In this approach, rewrite rules for the removal of graph elements are mapped to composition rules for half-reaction rate constants and stoichiometric coefficients. These composition rules can be proved to be associative, leading to an algebra for graph reduction. Here we sketch the rewrite rules relevant to reduction of the citric-acid cycle graph. In the next subsection we will reduce the graph to the form used in the text.

a. The base composition rule for removal of a single internal species

The simplest reduction is removal of an intermediate chemical species that is the sole output of one reaction, and the sole input to another, in a linear chain. Examples in the TCA cycle include malate (MAL) and isocitrate (ISC), produced by reductions and consumed by dehydrations. They also include citrate (CIT) itself, produced by the hydration of aconitate and consumed by retro-aldol cleavage.

For a single linear reaction as shown in Fig. 18, the mass-action law is

$$
[\mathrm{A}]\,k_a - [\mathrm{B}]\,\bar{k}_a = J_a , \tag{A4}
$$

and concentrations change as

$$
\begin{aligned}
\dot{[\mathrm{A}]} &= -J_a \\
\dot{[\mathrm{B}]} &= J_a .
\end{aligned}
\tag{A5}
$$

The equilibrium constant for the reaction A → B is

$$
K_{A \rightarrow B} = \frac{k_a}{\bar{k}_a} . \tag{A6}
$$

FIG. 18: Basic reaction graph. [A] and [B] are concentrations associated with the two colored nodes. Forward and backward rate constants $k_a$ and $\bar{k}_a$ are associated with the two unfilled circles. The associated reaction state current is $J_a$.


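For two such reactions in a chain, as shown in Fig. 19, the mass-action laws are

$$
\begin{aligned}
[\mathrm{A}]\,k_a - [\mathrm{X}]\,\bar{k}_a &= J_a \\
[\mathrm{X}]\,k_b - [\mathrm{B}]\,\bar{k}_b &= J_b ,
\end{aligned}
\tag{A7}
$$

and the conservation laws become

$$
\begin{aligned}
\dot{[\mathrm{A}]} &= -J_a \\
\dot{[\mathrm{X}]} &= J_a - J_b \\
\dot{[\mathrm{B}]} &= J_b .
\end{aligned}
\tag{A8}
$$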

FIG. 19: Removal of an internal species X from a diagram with elementary reactions. Rate constant pairs $(k_a, \bar{k}_a)$, $(k_b, \bar{k}_b)$ are used to define new rate constants $(k_{ab}, \bar{k}_{ab})$ for the effective transition state ab.

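Under the steady-state condition $\dot{[\mathrm{X}]} = 0$, we wish to replace the equations (A7, A8) with a rate equation

$$
[\mathrm{A}]\,k_{ab} - [\mathrm{B}]\,\bar{k}_{ab} = J_{ab} \tag{A9}
$$

and a conservation law expressed in terms of $J_a = J_{ab} = J_b$. The rate constants in Eq. (A9) are to be specified through a composition rule

$$
(k_a, \bar{k}_a) \circ (k_b, \bar{k}_b) = (k_{ab}, \bar{k}_{ab}) \tag{A10}
$$

derived from the graph rewrite. Removing [X] from the mass-action equations using $\dot{[\mathrm{X}]} = 0$, we derive that the rate constants satisfying Eq. (A9) are given by

$$
\begin{aligned}
k_{ab} &= \frac{k_a k_b}{\bar{k}_a + k_b} \\
\bar{k}_{ab} &= \frac{\bar{k}_a \bar{k}_b}{\bar{k}_a + k_b} .
\end{aligned}
\tag{A11}
$$

The associated equilibrium constant correctly satisfies the relation

$$
\frac{k_{ab}}{\bar{k}_{ab}} = \frac{k_a}{\bar{k}_a}\,\frac{k_b}{\bar{k}_b} . \tag{A12}
$$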
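The elementary rule can be checked numerically. The sketch below, with hypothetical rate constants and concentrations, solves the steady-state condition for [X] from Eqs. (A7, A8) and confirms that the composed constants of Eq. (A11) reproduce the same current and the equilibrium relation (A12). The function name `compose` is an assumption of this example.

```python
from fractions import Fraction


def compose(first, second):
    """Elementary composition rule of Eq. (A11).

    Each argument is a pair (k, k_bar) of forward and backward
    half-reaction rate constants; the result is (k_ab, k_bar_ab).
    """
    ka, ka_bar = first
    kb, kb_bar = second
    denom = ka_bar + kb
    return ka * kb / denom, ka_bar * kb_bar / denom


# Hypothetical rate constants; exact rationals avoid rounding error.
a = (Fraction(2), Fraction(3))
b = (Fraction(5), Fraction(7))
kab, kab_bar = compose(a, b)

# Steady state of X: setting J_a = J_b in Eq. (A7) and solving for [X].
A, B = Fraction(11), Fraction(13)
X = (A * a[0] + B * b[1]) / (a[1] + b[0])
Ja = A * a[0] - X * a[1]
Jb = X * b[0] - B * b[1]
Jab = A * kab - B * kab_bar  # reduced rate law, Eq. (A9)

print(Ja == Jb == Jab)  # True
```

Exact rational arithmetic is used so that the comparison tests the algebra itself rather than floating-point tolerance.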

b. Associativity of the elementary composition rule

The composition rule (A12) is associative, meaning that internal nodes may be removed from chains of reactions in any order, as shown in Fig. 20. All composition rules derived in the remainder of this appendix will be variants on the elementary rule (with additional buffered concentration variables added), so we demonstrate associativity for the base case as the foundation for other cases.

FIG. 20: Composition of three reactions a, b, c can proceed by elimination of either X or Y first.

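From Eq. (A12) for $(k_a, \bar{k}_a) \circ (k_b, \bar{k}_b)$, followed by the equivalent expressions for $(k_{ab}, \bar{k}_{ab}) \circ (k_c, \bar{k}_c)$, $(k_a, \bar{k}_a) \circ (k_{bc}, \bar{k}_{bc})$, and $(k_b, \bar{k}_b) \circ (k_c, \bar{k}_c)$, we derive the sequence of reductions

$$
\begin{aligned}
k_{abc} &= \frac{k_{ab} k_c}{\bar{k}_{ab} + k_c} \\
&= \frac{k_a k_b k_c}{\bar{k}_a \bar{k}_b + (\bar{k}_a + k_b)\, k_c} \\
&= \frac{k_a k_b k_c}{\bar{k}_a (\bar{k}_b + k_c) + k_b k_c} \\
&= \frac{k_a k_{bc}}{\bar{k}_a + k_{bc}} ,
\end{aligned}
\tag{A13}
$$

and a similar equation follows for $\bar{k}_{abc}$. Thus we have

$$
\left[ (k_a, \bar{k}_a) \circ (k_b, \bar{k}_b) \right] \circ (k_c, \bar{k}_c) = (k_a, \bar{k}_a) \circ \left[ (k_b, \bar{k}_b) \circ (k_c, \bar{k}_c) \right] . \tag{A14}
$$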
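Associativity can also be spot-checked by brute force. The following sketch applies the elementary rule in both groupings over a small grid of hypothetical rate constants; it is a numerical check over sample values, not a substitute for the algebraic argument above.

```python
from fractions import Fraction
from itertools import product


def compose(first, second):
    """Elementary composition rule: (k_a, k_bar_a) o (k_b, k_bar_b)."""
    ka, ka_bar = first
    kb, kb_bar = second
    denom = ka_bar + kb
    return ka * kb / denom, ka_bar * kb_bar / denom


def is_associative(a, b, c):
    """Compare removal of X first with removal of Y first."""
    return compose(compose(a, b), c) == compose(a, compose(b, c))


# Hypothetical sample values for the half-reaction rate constants.
values = [Fraction(1, 2), Fraction(3), Fraction(7, 5)]
pairs = list(product(values, repeat=2))

all_ok = all(
    is_associative(a, b, c) for a in pairs for b in pairs for c in pairs
)
print(all_ok)  # True
```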

c. Removal of internal nodes that require other inputs or outputs

Next we consider the elimination of an internal node [X] that is produced or consumed together with other products or reactants. Conservation $\dot{[\mathrm{X}]} = 0$ implies relations among the currents of these other species as well. All remaining graph reductions that we will perform for the TCA cycle are of this kind. In some cases both the secondary product and reactant are the solvent (water), as in the aconitase reactions (repeated in TCA, 3HB, 4HB, and bicycle pathways). In other cases they are reductants or inputs such as CO2 that we consider buffered in the environment.

The pair of mass-action equations we wish to reduce are$^{60}$

$$
\begin{aligned}
[\mathrm{A}]\,k_a - [\mathrm{X}]\,[\mathrm{C}]\,\bar{k}_a &= J_a \\
[\mathrm{X}]\,[\mathrm{D}]\,k_b - [\mathrm{B}]\,\bar{k}_b &= J_b ,
\end{aligned}
\tag{A15}
$$

and the desired reduced form is

$$
[\mathrm{A}]\,[\mathrm{D}]\,k_{ab} - [\mathrm{C}]\,[\mathrm{B}]\,\bar{k}_{ab} = J_{ab} . \tag{A16}
$$

We first reduce Eq. (A15) to the base case of the previous section, by absorbing the concentrations not to be removed into a pair of effective rate constants

$$
\begin{aligned}
[\mathrm{A}]\,k_a - [\mathrm{X}] \left( [\mathrm{C}]\,\bar{k}_a \right) &= J_a \\
[\mathrm{X}] \left( [\mathrm{D}]\,k_b \right) - [\mathrm{B}]\,\bar{k}_b &= J_b .
\end{aligned}
\tag{A17}
$$

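From these we derive a composition equation

$$
[\mathrm{A}]\,\hat{k}_{ab} - [\mathrm{B}]\,\hat{\bar{k}}_{ab} = J_{ab} , \tag{A18}
$$

corresponding to the graph representation in Fig. 21. We may then define $\hat{k}_{ab}$ and $\hat{\bar{k}}_{ab}$ by the elementary composition rule (A10)

$$
\left( k_a, [\mathrm{C}]\,\bar{k}_a \right) \circ \left( [\mathrm{D}]\,k_b, \bar{k}_b \right) = \left( \hat{k}_{ab}, \hat{\bar{k}}_{ab} \right) , \tag{A19}
$$

giving the transformation$^{61}$

$$
\begin{aligned}
\hat{k}_{ab} &= \frac{k_a\,[\mathrm{D}]\,k_b}{[\mathrm{C}]\,\bar{k}_a + [\mathrm{D}]\,k_b} \\
\hat{\bar{k}}_{ab} &= \frac{[\mathrm{C}]\,\bar{k}_a \bar{k}_b}{[\mathrm{C}]\,\bar{k}_a + [\mathrm{D}]\,k_b} .
\end{aligned}
\tag{A20}
$$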

FIG. 21: Representation of a composite graph with internal connections other than those to X as an effective elementary graph. Highlights denote the absorption of other species into modifications of effective rate constants coupled to X at a and b. These are then used to define the elementary-form rate constants $\hat{k}_{ab}$ and $\hat{\bar{k}}_{ab}$ in the reduced graph.

60 In this and the following examples, we consider single additional species [C] and [D]. These may readily be generalized to a variety of cases in which the additional reagents are $\prod_{k=1}^{p} [\mathrm{C}_k]$ and $\prod_{l=1}^{q} [\mathrm{D}_l]$.

61 Note that if [C] and [D] are the same species these cancel in the numerator and denominator of Eq. (A20), and the same applies to common factors in products $\prod_{k=1}^{p} [\mathrm{C}_k]$ and $\prod_{l=1}^{q} [\mathrm{D}_l]$. Therefore these factors may simply be removed before the graph reduction if desired, because they encode redundant constraints with the conservation law already implied by $\dot{[\mathrm{X}]} = 0$. The irrelevance of redundant species in the graph reduction for removal of [X] is radically different from the graphically similar-looking role of a network catalyst which is both an input and an output of the same reaction. Network catalysts are essential to the determination of reaction rates.

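Now removing the factors of [C] and [D] used to define the hatted rate constants,

$$
\begin{aligned}
\hat{k}_{ab} &= [\mathrm{D}]\,k_{ab} \\
\hat{\bar{k}}_{ab} &= [\mathrm{C}]\,\bar{k}_{ab} ,
\end{aligned}
\tag{A21}
$$

we obtain a direct expression for the composition rule in Eq. (A18), of

$$
\begin{aligned}
k_{ab} &= \frac{k_a k_b}{[\mathrm{C}]\,\bar{k}_a + [\mathrm{D}]\,k_b} \\
\bar{k}_{ab} &= \frac{\bar{k}_a \bar{k}_b}{[\mathrm{C}]\,\bar{k}_a + [\mathrm{D}]\,k_b} ,
\end{aligned}
\tag{A22}
$$

which is the interpretation of the graph reduction shown in Fig. 22.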

FIG. 22: The composite graph corresponding to the reduction from Eq. (A15) to Eq. (A16).
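The two routes to the composite rule can be compared directly. In the sketch below, route 1 absorbs [C] and [D] into effective constants, applies the elementary rule, and then strips the factors (Eqs. A19 to A21), while route 2 applies Eq. (A22) in one step. The function names and numerical values are assumptions made for this illustration.

```python
from fractions import Fraction


def compose(first, second):
    """Elementary composition rule of Eq. (A11)."""
    ka, ka_bar = first
    kb, kb_bar = second
    denom = ka_bar + kb
    return ka * kb / denom, ka_bar * kb_bar / denom


def compose_buffered(first, second, C, D):
    """Direct rule of Eq. (A22) for A -> X + C followed by X + D -> B."""
    ka, ka_bar = first
    kb, kb_bar = second
    denom = C * ka_bar + D * kb
    return ka * kb / denom, ka_bar * kb_bar / denom


# Hypothetical rate constants and buffered concentrations.
a = (Fraction(2), Fraction(3))
b = (Fraction(5), Fraction(7))
C, D = Fraction(1, 2), Fraction(4)

# Route 1: hatted constants from Eq. (A19), then remove factors via Eq. (A21).
hat_k, hat_k_bar = compose((a[0], C * a[1]), (D * b[0], b[1]))
via_hats = (hat_k / D, hat_k_bar / C)

# Route 2: the direct expression of Eq. (A22).
direct = compose_buffered(a, b, C, D)

print(via_hats == direct)  # True
```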

d. Associativity for composite graphs

Associativity for composite graphs follows from the associativity of the elementary composition rule (A14), via the grouping (A19). To show how this works, we demonstrate associativity for the minimal case shown in Fig. 23. The important features are that the graph “re-wiring” follows from composition of the rule demonstrated in Fig. 22, and the composition rule for rate constants permits consistent removal of the necessary factors of reagent concentrations.

The application of the elementary reduction to remove X, corresponding to the second line in Fig. 23, yields Eqs. (A19, A20). An equivalent removal of Y first (the third line of Fig. 23) gives

$$
\left( k_b, [\mathrm{E}]\,\bar{k}_b \right) \circ \left( [\mathrm{F}]\,k_c, \bar{k}_c \right) = \left( \hat{k}_{bc}, \hat{\bar{k}}_{bc} \right) , \tag{A23}
$$

FIG. 23: A two-step reduction with other internal connections, which may be performed by removing either X or Y first.

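with rule

$$
\begin{aligned}
\hat{k}_{bc} &= \frac{k_b\,[\mathrm{F}]\,k_c}{[\mathrm{E}]\,\bar{k}_b + [\mathrm{F}]\,k_c} \\
\hat{\bar{k}}_{bc} &= \frac{[\mathrm{E}]\,\bar{k}_b \bar{k}_c}{[\mathrm{E}]\,\bar{k}_b + [\mathrm{F}]\,k_c} .
\end{aligned}
\tag{A24}
$$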

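The two equivalent rules for removing whichever internal node was not removed in the first reduction are

$$
\begin{aligned}
\left( \hat{k}_{ab}, [\mathrm{E}]\,\hat{\bar{k}}_{ab} \right) \circ \left( [\mathrm{F}]\,k_c, \bar{k}_c \right) &= \left( \hat{k}_{abc}, \hat{\bar{k}}_{abc} \right) , \\
\left( k_a, [\mathrm{C}]\,\bar{k}_a \right) \circ \left( [\mathrm{D}]\,\hat{k}_{bc}, \hat{\bar{k}}_{bc} \right) &= \left( \hat{k}_{abc}, \hat{\bar{k}}_{abc} \right) .
\end{aligned}
\tag{A25}
$$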

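Composing these rules for intermediate rate constants, we may check that

$$
\begin{aligned}
\hat{k}_{abc} &= \frac{\hat{k}_{ab}\,[\mathrm{F}]\,k_c}{[\mathrm{E}]\,\hat{\bar{k}}_{ab} + [\mathrm{F}]\,k_c} \\
&= \frac{\left( k_a\,[\mathrm{D}]\,k_b \right) [\mathrm{F}]\,k_c}{[\mathrm{C}]\,\bar{k}_a\,[\mathrm{E}]\,\bar{k}_b + \left( [\mathrm{C}]\,\bar{k}_a + [\mathrm{D}]\,k_b \right) [\mathrm{F}]\,k_c} \\
&= \frac{k_a\,[\mathrm{D}] \left( k_b\,[\mathrm{F}]\,k_c \right)}{[\mathrm{C}]\,\bar{k}_a \left( [\mathrm{E}]\,\bar{k}_b + [\mathrm{F}]\,k_c \right) + [\mathrm{D}]\,k_b\,[\mathrm{F}]\,k_c} \\
&= \frac{k_a\,[\mathrm{D}]\,\hat{k}_{bc}}{[\mathrm{C}]\,\bar{k}_a + [\mathrm{D}]\,\hat{k}_{bc}} ,
\end{aligned}
\tag{A26}
$$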

and a similar equation follows for $\hat{\bar{k}}_{abc}$. Converting the hatted forms to the normal reaction form produces the rate equation

$$
[\mathrm{A}]\,[\mathrm{D}]\,[\mathrm{F}]\,k_{abc} - [\mathrm{C}]\,[\mathrm{E}]\,[\mathrm{B}]\,\bar{k}_{abc} = J_{abc} . \tag{A27}
$$

We may directly obtain the rate constants $k_{abc}$, $\bar{k}_{abc}$ with the composition rule

$$
\left( k_{abc}, \bar{k}_{abc} \right) = \left( k_{ab}, \bar{k}_{ab} \right) \circ \left( k_c, \bar{k}_c \right) = \left( k_a, \bar{k}_a \right) \circ \left( k_{bc}, \bar{k}_{bc} \right) , \tag{A28}
$$

using the appropriate version of the graph-dependent evaluation rule (A22) in each step. The resulting composition (A28) is automatically associative, because it satisfies the conversion

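$$
\begin{aligned}
\hat{k}_{abc} &= [\mathrm{D}]\,[\mathrm{F}]\,k_{abc} \\
\hat{\bar{k}}_{abc} &= [\mathrm{C}]\,[\mathrm{E}]\,\bar{k}_{abc}
\end{aligned}
\tag{A29}
$$

with Eq. (A26), which is associative. As a final check, the equilibrium constants in the normal reaction form satisfy the necessary chain rule

$$
\frac{k_{abc}}{\bar{k}_{abc}} = \frac{k_a}{\bar{k}_a}\,\frac{k_b}{\bar{k}_b}\,\frac{k_c}{\bar{k}_c} . \tag{A30}
$$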

Intermediate (hatted) rate constants have been used here to show how associativity is inherited from the base case. The examples below work directly with the actual (unhatted) rate constants, which keep the network in its literal form at each reduction.
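Working directly with unhatted constants, the two removal orders for the chain of Fig. 23 can be compared numerically. This sketch assumes the reactions A → X + C, X + D → Y + E, and Y + F → B, and it generalizes the arguments of Eq. (A22) to products of buffered concentrations in the manner of footnote 60. The function name, argument names, and values are hypothetical.

```python
from fractions import Fraction


def compose_buffered(first, second, coproducts, coreactants):
    """Composite rule in the form of Eq. (A22).

    `coproducts` is the product of buffered concentrations released with the
    removed species by the first reaction; `coreactants` is the product of
    buffered concentrations consumed with it by the second reaction.
    """
    k1, k1_bar = first
    k2, k2_bar = second
    denom = coproducts * k1_bar + coreactants * k2
    return k1 * k2 / denom, k1_bar * k2_bar / denom


# Hypothetical rate constants for a, b, c and buffered concentrations.
a = (Fraction(2), Fraction(3))
b = (Fraction(5), Fraction(7))
c = (Fraction(11), Fraction(13))
C, D, E, F = Fraction(1, 2), Fraction(4), Fraction(3, 5), Fraction(6)

# Remove X first (ab: A + D -> Y + C + E), then Y.
ab = compose_buffered(a, b, C, D)
x_first = compose_buffered(ab, c, C * E, F)

# Remove Y first (bc: X + D + F -> B + E), then X.
bc = compose_buffered(b, c, E, F)
y_first = compose_buffered(a, bc, C, D * F)

print(x_first == y_first)  # True
```

Both orders give the constants of the rate law (A27), and their ratio reproduces the chain rule (A30).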

3. Application to the citric-acid cycle reactions

Using this graph representation and the associated graph reductions, we may express the qualitative kinetics associated with network autocatalysis in the rTCA cycle. We use a minimal model network in which only the cycle intermediates are represented explicitly, and only the CHO stoichiometry is retained.$^{62}$ External sources or sinks are used to buffer only four compounds in the network, which are CO2, H2, H2O, and a pool of reduced carbon which we take to be acetate (ACE, or CH3COOH) because it has the lowest free energy of formation of cycle intermediates under reducing conditions (following Ref. [266]) and is the natural drain compound [7].

The purpose of network reduction in such a model is to produce a graph in which each element corresponds to a specific control parameter for the interaction of conservation laws with non-equilibrium boundary conditions. CO2, H2, and H2O provide sources of carbon and reductant, and an output for reduced oxygen atoms. Because they comprise different ratios of three elements, any set of concentrations is consistent with a Gibbs equilibrium, and the chemical potentials corresponding to the elements are preserved by the conservation laws of arbitrary reactions. A fourth boundary condition for acetate cannot be linearly independent in equilibrium, and drives the steady-state reaction flux.

Such a model is limited in many ways. The replacement of explicit (and unknown) parasitic side reactions, from all cycle intermediates, by a single loss rate for acetate may fail to capture concentration-dependent losses, in a way that cannot simply be absorbed into lumped rate constants. Moreover, the rate constants themselves depend on catalysts, and reasonable values for these in a prebiotic or early-cellular context are unknown. Therefore all critical properties of the model are expressed relative to these rate constants. The reduction remains meaningful, however, because the lumped-parameter rate constants are controlled by the three buffered environmental compounds CO2, H2, and H2O, leaving the network flux to be controlled by the disequilibrium concentration of acetate.

62 As noted above, phosphorylated intermediates and thioesters, including the energetically important substrate-level phosphorylation of citrate and succinate, are not represented.

a. The graph reduction sequence

FIG. 24: The projection of the TCA cycle onto CHO compounds. Phosphates and thioesters are omitted, and the stoichiometry of all acids refers to the protonated forms, so that H2 stands for general two-electron reductants. Omission of explicit representations of substrate-level phosphorylation to form citryl-CoA and succinyl-CoA causes water elimination to accompany carboxylation of acetate and succinate in this graph, where in the actual cycle it would occur outside the graph, in the formation of pyrophosphates. Highlighted species are sole outputs and sole inputs of their associated reactions, and can be removed with the elementary composition rule (A11) of Sec. A 2 a. Legend: acetate (ACE), pyruvate (PYR), oxaloacetate (OXA), malate (MAL), fumarate (FUM), succinate (SUC), α-ketoglutarate (AKG), oxalosuccinate (OXS), cis-aconitate (cAC), isocitrate (ISC), citrate (CIT).

The bipartite graph for the minimal rTCA network in CHO compounds is shown in Fig. 24. All networks in the text are generated by equivalent methods. Highlighted nodes are those that can be removed by the base reduction in Sec. A 2 a. Reactions are labeled with lowercase Roman letters, and relative to the elementary half-reaction rate constants, the lumped-parameter rate constants are given by$^{63}$

$$
\begin{aligned}
k_{de} &= \frac{k_d k_e}{\bar{k}_d + k_e} & \bar{k}_{de} &= \frac{\bar{k}_d \bar{k}_e}{\bar{k}_d + k_e} \\
k_{ij} &= \frac{k_i k_j}{\bar{k}_i + k_j} & \bar{k}_{ij} &= \frac{\bar{k}_i \bar{k}_j}{\bar{k}_i + k_j} \\
k_{ka} &= \frac{k_k k_a}{\bar{k}_k + k_a} & \bar{k}_{ka} &= \frac{\bar{k}_k \bar{k}_a}{\bar{k}_k + k_a} ,
\end{aligned}
\tag{A31}
$$

with equivalent expressions for the ks. These define the elementary reactions in the reduced graph of Fig. 25.

FIG. 25: Graph of Fig. 24 with its highlighted species removed. Cis-aconitate (cAC, highlighted) has common factors of [H2O], and is the next internal node to be removed, by the rewrite rules of Sec. A 2 c, but with the simplifying feature that common factors cancel, so they resemble the base case.

One further reduction that follows the elementary rule in Fig. 25 is removal of cis-aconitate (cAC), which involves a common factor of the solvent [H2O]. The resulting lumped-parameter rate constants are given by

$$
k_{ijka} = \frac{k_{ij} k_{ka}}{\bar{k}_{ij} + k_{ka}} \qquad \bar{k}_{ijka} = \frac{\bar{k}_{ij} \bar{k}_{ka}}{\bar{k}_{ij} + k_{ka}} . \tag{A32}
$$

These lead to the graph of Fig. 26.

FIG. 26: Graph of Fig. 25 with cAC and its parallel links to water removed. For all remaining species except acetate (ACE), neither sources nor sinks are assumed, and these may be removed with non-trivial instances of the composition rule of Sec. A 2 c. Each of these removals changes the degree of the remaining reactions, and thus changes the topology of the graph.

63 Here and below, we give formulae only for the forward half-reaction rate constants $k$. Formulae for the backward half-reaction rate constants $\bar{k}$ have corresponding forms as shown in the preceding sections.

All further graph reductions require the composition rules of Sec. A 2 c, and result in changes of the input or output stoichiometries of the unreduced nodes. All highlighted compounds in Fig. 26 may be removed, and the resulting lumped-parameter rate constants are given by

$$
\begin{aligned}
k_{bc} &= \frac{k_b k_c}{[\mathrm{H_2O}]\,\bar{k}_b + [\mathrm{CO_2}]\,k_c} \\
k_{def} &= \frac{k_{de} k_f}{[\mathrm{H_2O}]\,\bar{k}_{de} + [\mathrm{H_2}]\,k_f} \\
k_{defg} &= \frac{k_{def} k_g}{[\mathrm{H_2O}]\,\bar{k}_{def} + [\mathrm{H_2}]\,[\mathrm{CO_2}]\,k_g} \\
k_{defgh} &= \frac{k_{defg} k_h}{[\mathrm{H_2O}]^2\,\bar{k}_{defg} + [\mathrm{CO_2}]\,k_h} \\
k_{defghijka} &= \frac{k_{defgh} k_{ijka}}{[\mathrm{H_2O}]^2\,\bar{k}_{defgh} + [\mathrm{H_2}]\,k_{ijka}} .
\end{aligned}
\tag{A33}
$$

These define the maximal reduction of the original rTCA graph, to the graph shown in Fig. 27.
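The whole reduction sequence can be traced in a few lines. The sketch below follows the order of Eqs. (A31) to (A33) with hypothetical elementary rate constants and buffered concentrations, and checks that the lumped equilibrium constants equal the products of the elementary ones. The helper `compose` and every numerical value are assumptions of this example; no measured rate constants are implied.

```python
from fractions import Fraction


def compose(first, second, coproducts=1, coreactants=1):
    """Remove an internal species shared by two reactions.

    With both concentration factors equal to 1 this is the elementary rule
    (A11); otherwise it has the form of the composite rule (A22).
    """
    k1, k1_bar = first
    k2, k2_bar = second
    denom = coproducts * k1_bar + coreactants * k2
    return k1 * k2 / denom, k1_bar * k2_bar / denom


def equilibrium_constant(pair):
    forward, backward = pair
    return forward / backward


# Hypothetical elementary constants (k, k_bar) for reactions a..k.
k = {r: (Fraction(i + 2), Fraction(2 * i + 3)) for i, r in enumerate("abcdefghijk")}
# Hypothetical buffered concentrations.
H2, CO2, H2O = Fraction(3), Fraction(2), Fraction(5)

# Eq. (A31): malate, isocitrate, and citrate are sole outputs and sole inputs.
de = compose(k["d"], k["e"])
ij = compose(k["i"], k["j"])
ka = compose(k["k"], k["a"])
# Eq. (A32): cis-aconitate, where the common factor [H2O] cancels.
ijka = compose(ij, ka)
# Eq. (A33): removals that change the stoichiometry of the reduced reactions.
bc = compose(k["b"], k["c"], H2O, CO2)
def_ = compose(de, k["f"], H2O, H2)
defg = compose(def_, k["g"], H2O, H2 * CO2)
defgh = compose(defg, k["h"], H2O**2, CO2)
rtca = compose(defgh, ijka, H2O**2, H2)

# Chain rule: product of elementary equilibrium constants around the loop.
expected = Fraction(1)
for r in "defghijka":
    expected *= equilibrium_constant(k[r])

print(equilibrium_constant(rtca) == expected)  # True
```

The buffered concentrations change the individual lumped rate constants but drop out of their ratio, which is why the equilibrium constant of the reduced loop is independent of them.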

FIG. 27: Graph of Fig. 26 with all internal nodes from linear chains removed. [H2O], [H2], [CO2], and [ACE] are the four molecular concentrations to which boundary sources are coupled. [OXA] is retained as the last representation of the network catalysis of the loop, indicated by highlighting of the reaction in which OXA is input and output with equal stoichiometry. In steady state, OXA is in equilibrium with ACE, because it is not coupled to external currents.

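The lumped-parameter rate equations for Fig. 27, parametrized by lumped-parameter rate constants, are

$$
\begin{aligned}
J_{bc} &= [\mathrm{ACE}]\,[\mathrm{H_2}]\,[\mathrm{CO_2}]^2\,k_{bc} - [\mathrm{OXA}]\,[\mathrm{H_2O}]\,\bar{k}_{bc} \\
J_{defghijka} &= [\mathrm{OXA}]\,[\mathrm{H_2}]^4\,[\mathrm{CO_2}]^2\,k_{defghijka} - [\mathrm{OXA}]\,[\mathrm{ACE}]\,[\mathrm{H_2O}]^2\,\bar{k}_{defghijka} .
\end{aligned}
\tag{A34}
$$

In steady state $J_{bc} = 0$ and [OXA] may be replaced with the equilibrium function

$$
[\mathrm{OXA}] = \frac{k_{bc}}{\bar{k}_{bc}}\,\frac{[\mathrm{H_2}]\,[\mathrm{CO_2}]^2}{[\mathrm{H_2O}]}\,[\mathrm{ACE}] . \tag{A35}
$$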

b. Network reaction fluxes and their control parameters

For the remainder of the appendix we replace the subscript defghijka with designation rTCA in currents $J$, half-reaction rate constants $k$, $\bar{k}$, and equilibrium constants $K$. Dimensionally, the rate constants require the concentration of OXA in the mass-action law, and so presume that the anaplerotic segment bc has been handled.

Plugging Eq. (A35) into the second rate equation of Eq. (A34), and supposing [OXA] is in equilibrium with [ACE] at a (non-equilibrium) steady state for the network as a whole, we obtain the only independent mass-action rate equation for the reduced network. This is the current producing acetate:

$$
J_{rTCA} = \bar{k}_{rTCA}\,\frac{k_{bc}}{\bar{k}_{bc}}\,[\mathrm{H_2}]\,[\mathrm{CO_2}]^2\,[\mathrm{H_2O}]\,[\mathrm{ACE}] \left( \frac{k_{rTCA}}{\bar{k}_{rTCA}}\,\frac{[\mathrm{H_2}]^4\,[\mathrm{CO_2}]^2}{[\mathrm{H_2O}]^2} - [\mathrm{ACE}] \right) . \tag{A36}
$$

The first term in parentheses in Eq. (A36) is the concentration at which acetate would be in equilibrium with the inorganic inputs, which we denote$^{64}$

$$
[\mathrm{ACE}]_G \equiv \frac{k_{rTCA}}{\bar{k}_{rTCA}}\,\frac{[\mathrm{H_2}]^4\,[\mathrm{CO_2}]^2}{[\mathrm{H_2O}]^2} . \tag{A37}
$$

Therefore the network response is proportional to the offset of [ACE] from its equilibrium value, with a rate constant that depends on the particular contributions of chemical potential from [CO2] and reductant.

64 Although the lumped-parameter rate constant in this relation appears complex, the consistency conditions with single-reaction equilibrium constants ensure that $k_{rTCA}/\bar{k}_{rTCA}$ is independent of synthetic pathway and equal to the exponential of the Gibbs free energy of formation.

4. Interaction of Wood-Ljungdahl with rTCA

We may envision an early Wood-Ljungdahl “feeder” pathway to acetyl-CoA as a reaction with the same stoichiometry as rTCA for the creation of acetate, but fixed half-reaction rate constants that do not depend on the internal concentrations in the network. This may be a pre-pterin mineral pathway [56], in which rate constants are determined by the abiotic environment, or an early pathway using pterin-like cofactors, if the concentrations of these are somehow buffered from the instantaneous flows through the reductive pathway. Labeling this “linear” effective reaction WL, the rate equation becomes

$$
J_{WL} = \bar{k}_{WL}\,[\mathrm{H_2O}]^2 \left( \frac{k_{WL}}{\bar{k}_{WL}}\,\frac{[\mathrm{H_2}]^4\,[\mathrm{CO_2}]^2}{[\mathrm{H_2O}]^2} - [\mathrm{ACE}] \right) . \tag{A38}
$$

Note that $k_{WL}/\bar{k}_{WL} = k_{rTCA}/\bar{k}_{rTCA}$ because both are expressions for the equilibrium constant, which depends only on the free energy of reaction.

To understand the performance of a joint network in the presence of losses, as the simplest case we introduce a reaction labeled Env standing for dilution of acetate to an environment at zero concentration. The dilution current becomes

$$
J_{Env} = k_{Env}\,[\mathrm{ACE}] . \tag{A39}
$$

At a non-equilibrium steady state the total losses must equal the total supply currents, so that

$$
J_{Env} = J_{rTCA} + J_{WL} . \tag{A40}
$$

The un-reduced equation for steady-state currents can be written

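$$
\begin{aligned}
J_{rTCA} + J_{WL} &= [\mathrm{H_2O}]^2 \left\{ \sqrt{k_{rTCA}\,\bar{k}_{rTCA}}\,\frac{K_{bc}}{K_{rTCA}}\,\frac{[\mathrm{CO_2}]}{[\mathrm{H_2}]}\,[\mathrm{ACE}]_G^{1/2}\,[\mathrm{ACE}] + \bar{k}_{WL} \right\} \left( [\mathrm{ACE}]_G - [\mathrm{ACE}] \right) \\
&= J_{Env} = k_{Env}\,[\mathrm{ACE}] .
\end{aligned}
\tag{A41}
$$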

The graph corresponding to this model for rate laws is shown in Fig. 28.

FIG. 28: Hypergraph model for parallel reactions through the rTCA and WL pathways, coupled to a linear drain reaction representing dilution of acetate by the environment.

The variable that characterizes the “impedance” of a chemical reaction network, and displays thresholds for autocatalysis when these exist, is the ratio of the output acetate concentration to the value that would exist in a Gibbs equilibrium with the inputs:

$$
x \equiv \frac{[\mathrm{ACE}]}{[\mathrm{ACE}]_G} . \tag{A42}
$$

For a network with no reaction barriers (either in rate constants or due to limitations of network catalysts), the output x → 1.

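The two control parameters that govern the relative contributions of the rTCA loop and the direct WL feeder are

$$
\begin{aligned}
z_{rTCA} &= \frac{\sqrt{k_{rTCA}\,\bar{k}_{rTCA}}}{k_{Env}}\,\frac{K_{bc}}{K_{rTCA}}\,\frac{[\mathrm{CO_2}]\,[\mathrm{H_2O}]^2}{[\mathrm{H_2}]}\,[\mathrm{ACE}]_G^{3/2} \\
z_{WL} &= \frac{\bar{k}_{WL}\,[\mathrm{H_2O}]^2}{k_{Env}} .
\end{aligned}
\tag{A43}
$$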

Each control parameter is a ratio of lumped half-reaction rates that feed [ACE] to the environment dilution constant $k_{Env}$ through which it drains.

In terms of $z_{WL}$ and $z_{rTCA}$, the normalized concentration $x$ (which is proportional by $k_{Env}$ to the total current through the system) satisfies

$$
x = \frac{1}{2} \left( 1 - \frac{1 + z_{WL}}{z_{rTCA}} \right) + \sqrt{ \frac{z_{WL}}{z_{rTCA}} + \frac{1}{4} \left( 1 - \frac{1 + z_{WL}}{z_{rTCA}} \right)^2 } . \tag{A44}
$$

The solution to Eq. (A44) is shown versus base-10 logarithms of $z_{rTCA}$ and $z_{WL}$ in Fig. 12 in the main text. The critical (unsupported) response of the rTCA loop occurs at $z_{WL} \rightarrow 0$ and $z_{rTCA} = 1$. It is identified with the discontinuity in the derivative $\partial x / \partial z_{rTCA}$ at $z_{rTCA} = 1$ and the exactly zero value of $x$ for $z_{rTCA} < 1$. As $z_{WL}$ increases from zero, the transition becomes smooth, and a nonzero concentration $x$ is maintained against dilution at all values of $z_{rTCA}$.
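The threshold behavior described here can be reproduced by evaluating Eq. (A44) directly. The sketch below also checks the result against the balance condition $(z_{rTCA}\,x + z_{WL})(1 - x) = x$, which is obtained here by dividing Eq. (A41) by $k_{Env}\,[\mathrm{ACE}]_G$; that intermediate form is a step filled in for this example and is not written out in the text. Function names are hypothetical.

```python
from math import sqrt


def normalized_acetate(z_rtca, z_wl):
    """Positive root x of Eq. (A44) for control parameters z_rTCA and z_WL."""
    half = 0.5 * (1.0 - (1.0 + z_wl) / z_rtca)
    return half + sqrt(z_wl / z_rtca + half * half)


def balance_residual(x, z_rtca, z_wl):
    """(z_rTCA x + z_WL)(1 - x) - x, which vanishes at steady state."""
    return (z_rtca * x + z_wl) * (1.0 - x) - x


# Unsupported loop (z_WL = 0): no acetate below the threshold z_rTCA = 1.
print(normalized_acetate(0.5, 0.0))  # 0.0
# Above threshold the loop sustains x = 1 - 1/z_rTCA.
print(normalized_acetate(2.0, 0.0))  # 0.5
# A small feeder contribution keeps x nonzero below threshold.
x_fed = normalized_acetate(0.5, 0.1)
print(x_fed > 0.0, abs(balance_residual(x_fed, 0.5, 0.1)) < 1e-12)  # True True
```

Sweeping `z_rtca` across 1 at several fixed values of `z_wl` shows the sharp onset at zero feeder strength and its progressive smoothing as the feeder grows.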

[1] E. Schrodinger. What is Life?: the physical aspect ofthe living cell. Cambridge U. Press, New York, 1992.

[2] Eric Smith. Thermodynamics of natural se-lection I: Energy and entropy flows throughnon-equilibrium ensembles. J. Theor. Biol.,http://dx.doi.org/10.1016/j.jtbi.2008.02.010, 2008.

[3] Eric Smith. Thermodynamics of natural selec-tion II: Chemical Carnot cycles. J. Theor. Biol.,http://dx.doi.org/10.1016/j.jtbi.2008.02.008, 2008.

[4] Paul G. Falkowski, Tom Fenchel, and Edward F. De-long. The microbial engines that drive earth’s biogeo-chemical cycles. Science, 320:1034–1039, 2008.

[5] Joseph W. Lengeler, Gerhart Drews, and Hans G.Schlegel. Biology of the Prokaryotes. Blackwell Science,New York, 1999.

[6] Harold J. Morowitz. Beginnings of Cellular Life. YaleU. Press, New Haven, CT., 1992.

[7] Eric Smith and Harold J. Morowitz. Universality inintermediary metabolism. Proc. Nat. Acad. Sci. USA,101:13168–13173, 2004. SFI preprint # 04-07-024.

[8] Kalervo Rankama and Thure Georg Sahama. Geochem-istry,. U. Chicago Press, Chicago, Ill., 1950.

[9] Arren Bar-Even, Elad Noor, Yonatan Savir, WolframLiebermeister, Dan Davidi, Dan S. Tawfik, and RonMilo. The moderately efficient enzyme: Evolutionaryand physicochemical trends shaping enzyme parame-ters. Biochem., 50:4402–4410, 2011.

[10] Herbert A. Simon. The architecture of complexity.Proc. Am. Phil. Soc., 106:467–482, 1962.

[11] Herbert A. Simon. The organization of complex sys-tems. In Howard H. Pattee, editor, Hierarchy theory:The challenge of complex systems, pages 3–27, NewYork, 1973. George Braziller.

[12] L. W. Ancel and W. Fontana. Plasticity, evolvabilityand modularity in rna. J. Exp. Zool. (Mol. Dev. Evol.),288:242–283, 2000.

[13] W. Fontana. Modeling ‘evo-devo’ with rna. Bioessays,24:1164–1177, 2002.

[14] Andreas Wagner. Robustness and evolvability: a para-dox resolved. Proceedings of the Royal Society B: Bio-logical Sciences, 275(1630):91–100, 2008.

[15] Gunter P. Wagner and Lee Altenberg. Perspective:Complex adaptations and the evolution of evolvability.Evolution, 50(3):967–976, 1996.

[16] Marc Kirschner and John Gerhart. Evolvability.Proceedings of the National Academy of Sciences,95(15):8420–8427, 1998.

[17] John Gerhart and Marc Kirschner. Cells, embryos, andevolution. Wiley, New York, 1997.

[18] John Gerhart and Marc Kirschner. The theory of facil-itated variation. Proc. Nat. Acad. Sci. USA, 104:8582–

8589, 2007.[19] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai,

and A.-L. Barabsi. Hierarchical organization of modu-larity in metabolic networks. Science, 297(5586):1551–1555, 2002.

[20] Roger Guimera and Luis A. Nunes Amaral. Functionalcartography of complex metabolic networks. Nature,433(7028):895–900, 2005.

[21] M. E. J. Newman. Modularity and community structurein networks. Proceedings of the National Academy ofSciences, 103(23):8577–8582, 2006.

[22] J. J. Elser, R. W. Sterner, E. Gorokhova, W. F. Fa-gan, T. A. Markow, J. B. Cotner, J. F. Harrison, S. E.Hobbie, G. M. Odell, and L. W. Weider. Biological sto-ichiometry from genes to ecosystems. Ecology Letters,3:540–550, 2000.

[23] J. Craig Venter, Karin Remington, John F. Heidelberg,Aaron L. Halpern, Doug Rusch, Jonathan A. Eisen,Dongying Wu, Ian Paulsen, Karen E. Nelson, WilliamNelson, Derrick E. Fouts, Samuel Levy, Anthony H.Knap, Michael W. Lomas, Ken Nealson, Owen White,Jeremy Peterson, Jeff Hoffman, Rachel Parsons, HollyBaden-Tillson, Cynthia Pfannkoch, Yu-Hui Rogers, andHamilton O. Smith. Environmental genome shotgun se-quencing of the sargasso sea. Science, 304:66–74, 2004.

[24] Elhanan Borenstein, Martin Kupiec, Marcus W. Feld-man, and Eytan Ruppin. Large-scale reconstructionand phylogenetic analysis of metabolic environments.Proc. Nat. Acad. Sci. USA, 105:14482–14487, 2008.

[25] Jan P. Amend and Everett L. Shock. Energetics ofoverall metablic reactions of thermophilic and hyper-thermophilic archaea and bacteria. FEMS MicrobiologyReviews, 25:175–243, 2001.

[26] Anna-Louise Reysenbach and Everett Shock. Merging genomes with geochemistry in hydrothermal ecosystems. Science, 296:1077–1082, 2002.

[27] William Martin, John Baross, Deborah Kelley, and Michael J. Russell. Hydrothermal vents and the origin of life. Nature Rev. Microbiol., 6:805–814, 2008.

[28] Everett L. Shock. Minerals as energy sources for microorganisms. Econ. Geol., 104:1235–1248, 2009.

[29] Douglas H. Erwin, Marc Laflamme, Sarah M. Tweedt, Erik A. Sperling, Davide Pisani, and Kevin J. Peterson. The Cambrian conundrum: early divergence and later ecological success in the early history of animals. Science, 334:1091–1097, 2011.

[30] Douglas H. Erwin and Sarah Tweedt. Ecological drivers of the Ediacaran-Cambrian diversification of metazoa. Evol. Ecol., 26:417–433, 2012.

[31] Douglas H. Erwin and James W. Valentine. The Cambrian Radiation. in press, 2012.

[32] Douglas H. Erwin. Macroevolution: dynamics of diversity. Current Biology, 21:R1000–R1001, 2012.

[33] Harold J. Morowitz and Eric Smith. Energy flow and the organization of life. Complexity, 13:51–59, 2007. SFI preprint # 06-08-029.

[34] João F. Matias Rodrigues and Andreas Wagner. Evolutionary plasticity and innovations in complex metabolic reaction networks. PLoS Computational Biology, 5:e1000613:1–11, 2009.

[35] Leo W. Buss. The evolution of individuality. Princeton U. Press, Princeton, N. J., 2007.

[36] R. A. Fisher. The genetical theory of natural selection. Oxford U. Press, London, 2000.

[37] Warren J. Ewens. Mathematical Population Genetics. Springer, Heidelberg, second edition, 2004.

[38] F. John Odling-Smee, Kevin N. Laland, and Marcus W. Feldman. Niche construction: the neglected process in evolution. Princeton U. Press, Princeton, N. J., 2003.

[39] Vijayasarathy Srinivasan and Harold J. Morowitz. The canonical network of autotrophic intermediary metabolism: Minimal metabolome of a reductive chemoautotroph. Biol. Bulletin, 216:126–130, 2009.

[40] Vijayasarathy Srinivasan and Harold J. Morowitz. Analysis of the intermediary metabolism of a reductive chemoautotroph. Biol. Bulletin, 217:222–232, 2009.

[41] Rogier Braakman and Eric Smith. The emergence and early evolution of biological carbon fixation. PLoS Comp. Biol., 8:e1002455, 2012.

[42] Marie Csete and John Doyle. Bow ties, metabolism and disease. Trends Biotechnol., 22:446–450, 2004.

[43] Jing Zhao, Lin Tao, Hong Yu, JianHua Luo, ZhiWei Cao, and YiXue Li. Bow-tie topological features of metabolic networks and the functional significance. Chinese Sci. Bull., 52:1036–1045, 2007.

[44] William J. Riehl, Paul L. Krapivsky, Sidney Redner, and Daniel Segrè. Signatures of arithmetic simplicity in metabolic network architecture. PLoS Computational Biology, 6:e1000725, 2010.

[45] Vijayasarathy Srinivasan, Harold J. Morowitz, and Harald Huber. What is an autotroph? Arch. Microbiol., 194:135–140, 2012.

[46] Robert W. Sterner and James J. Elser. Ecological Stoichiometry: the biology of elements from molecules to the biosphere. Princeton U. Press, Princeton, NJ, 2002.

[47] Eric Smith and Harold J. Morowitz. The autotrophic origins paradigm and small-molecule organocatalysis. Orig. Life Evol. Biosphere, 40:397–402, 2010.

[48] Rafael F. Say and Georg Fuchs. Fructose 1,6-bisphosphate aldolase/phosphatase may be an ancestral gluconeogenic enzyme. Nature, 464:1077–1081, 2010.

[49] Alexander I. Oparin. The origin of life. In Bernal [267], pages 199–234.

[50] J. B. S. Haldane. The origin of life. In Bernal [267], pages 242–249.

[51] Carl R. Woese. The universal ancestor. Proc. Nat. Acad. Sci. USA, 95:6854–6859, 1998.

[52] C. R. Woese. Interpreting the universal phylogenetic tree. Proc. Nat. Acad. Sci. USA, 97:8392–8396, 2000.

[53] C. R. Woese. On the evolution of cells. Proc. Nat. Acad. Sci. USA, 99:8742–8747, 2002.

[54] Nigel Goldenfeld and Carl Woese. Life is physics: evolution as a collective phenomenon far from equilibrium. Ann. Rev. Cond. Matt. Phys., 2010.

[55] Steven A. Benner, Andrew D. Ellington, and Andreas Tauer. Modern metabolism as a palimpsest of the RNA world. Proc. Nat. Acad. Sci. USA, 86:7054–7058, 2006.

[56] Claudia Huber and Günter Wächtershäuser. Activated acetic acid by carbon fixation on (Fe,Ni)S under primordial conditions. Science, 276:245–247, 2000.

[57] Manfred Eigen and Peter Schuster. The hypercycle, part A: The emergence of the hypercycle. Naturwissenschaften, 64:541–565, 1977.

[58] Manfred Eigen and Peter Schuster. The hypercycle, part C: The realistic hypercycle. Naturwissenschaften, 65:341–369, 1978.

[59] Stuart Kauffman. The Origins of Order: Self-Organization and Selection in Evolution. Oxford U. Press, London, 1993.

[60] Wim Hordijk, Stuart A. Kauffman, and Mike Steel. Required levels of catalysis for emergence of autocatalytic sets in models of chemical reaction systems. Int. J. Mol. Sci., 12:3085–3101, 2012.

[61] Jakob L. Andersen, Christoph Flamm, Daniel Merkle, and Peter F. Stadler. Maximizing output and recognizing autocatalysis in chemical reaction networks is NP-complete. J. Sys. Chem., 3:1, 2012.

[62] M. F. Utter and H. G. Wood. Mechanisms of fixation of carbon dioxide by heterotrophs and autotrophs. Adv. Enzymol. Relat. Areas Mol. Biol., 12:41–151, 1951.

[63] L. Ljungdahl, E. Irion, and H. G. Wood. Total synthesis of acetate from CO2. I. Co-methylcobyric acid and Co-(methyl)-5-methoxybenzimidazolylcobamide as intermediates with Clostridium thermoaceticum. Biochemistry, 4:2771–2780, 1965.

[64] L. Ljungdahl and H. G. Wood. Incorporation of C14 from carbon dioxide into sugar phosphates, carboxylic acids, and amino acids by Clostridium thermoaceticum. J. Bacteriol., 89:1055–1064, 1965.

[65] B. Edward H. Maden. Tetrahydrofolate and tetrahydromethanopterin compared: functionally distinct carriers in C1 metabolism. Biochem. J., 350:609–629, 2000.

[66] Ivan A. Berg, Daniel Kockelkorn, W. Hugo Ramos-Vera, Rafael F. Say, Jan Zarzycki, Michael Hügler, Birgit E. Alber, and Georg Fuchs. Autotrophic carbon fixation in archaea. Nature Rev. Microbiol., 8:447–460, 2010.

[67] Juli Peretó. Out of fuzzy chemistry: from prebiotic chemistry to metabolic networks. Chem. Soc. Rev., pages –, 2012.

[68] Michael J. Russell and William Martin. The rocky roots of the acetyl-CoA pathway. Trends Biochem. Sci., 29:358–363, 2004.

[69] William Martin and Michael J. Russell. On the origin of biochemistry at an alkaline hydrothermal vent. Phil. Trans. Roy. Soc. B, 362:1887–1926, 2007.

[70] B. B. Buchanan and D. I. Arnon. A reverse Krebs cycle in photosynthesis: Consensus at last. Photosynth. Res., 24:47–53, 1990.

[71] George D. Cody, Nabil Z. Boctor, Timothy R. Filley, Robert M. Hazen, James H. Scott, Anurag Sharma, and Hatten S. Yoder Jr. Primordial carbonylated iron-sulfur compounds and the synthesis of pyruvate. Science, 289:1337–1340, 2000.

[72] Michael Hügler and S. M. Sievert. Beyond the Calvin cycle: Autotrophic carbon fixation in the ocean. Ann. Rev. Marine Sci., 3:261–289, 2011.

[73] Georg Fuchs. Alternative pathways of carbon dioxide fixation: Insights into the early evolution of life? Ann. Rev. Microbiol., 65(1):631–658, 2011.

[74] Arren Bar-Even, Avi Flamholz, Elad Noor, and Ron Milo. Thermodynamic constraints shape the structure of carbon fixation pathways. Biochimica et Biophysica Acta (BBA) - Bioenergetics, 1817(9):1646–1659, 2012.

[75] Holger Dobbek, Vitali Svetlitchnyi, Lothar Gremer, Robert Huber, and Ortwin Meyer. Crystal structure of a carbon monoxide dehydrogenase reveals a [Ni-4Fe-5S] cluster. Science, 293:1281–1285, 2001.

[76] Claudine Darnault, Anne Volbeda, Eun Jin Kim, Pierre Legrand, Xavier Vernède, Paul A. Lindahl, and Juan C. Fontecilla-Camps. Ni-Zn-[Fe4-S4] and Ni-Ni-[Fe4-S4] clusters in closed and open α subunits of acetyl-CoA synthase/carbon monoxide dehydrogenase. Nat. Struct. Mol. Biol., 10(4):271–279, 2003.

[77] Javier Seravalli, Yuming Xiao, Weiwei Gu, Stephen P. Cramer, William E. Antholine, Vladimir Krymov, Gary J. Gerfen, and Stephen W. Ragsdale. Evidence that NiNi acetyl-CoA synthase is active and that the CuNi enzyme is not. Biochemistry, 43(13):3944–3955, 2004.

[78] Vitali Svetlitchnyi, Holger Dobbek, Wolfram Meyer-Klaucke, Thomas Meins, Bärbel Thiele, Piero Römer, Robert Huber, and Ortwin Meyer. A functional Ni-Ni-[4Fe-4S] cluster in the monomeric acetyl-CoA synthase from Carboxydothermus hydrogenoformans. Proceedings of the National Academy of Sciences of the United States of America, 101(2):446–451, 2004.

[79] Ruma Banerjee and Stephen W. Ragsdale. The many faces of vitamin B12: catalysis by cobalamin-dependent enzymes. Annu. Rev. Biochem., 72:209–247, 2003.

[80] Gunes Bender, Elizabeth Pierce, Jeffrey A. Hill, Joseph E. Darty, and Stephen W. Ragsdale. Metal centers in the anaerobic microbial metabolism of CO and CO2. Metallomics, 3:797–815, 2011.

[81] S. W. Ragsdale, J. E. Clark, L. G. Ljungdahl, L. L. Lundie, and H. L. Drake. Properties of purified carbon monoxide dehydrogenase from Clostridium thermoaceticum, a nickel, iron-sulfur protein. Journal of Biological Chemistry, 258(4):2364–2369, 1983.

[82] Jan Zarzycki, Volker Brecht, Michael Müller, and Georg Fuchs. Identifying the missing steps of the autotrophic 3-hydroxypropionate CO2 fixation cycle in Chloroflexus aurantiacus. Proc. Nat. Acad. Sci. USA, 106:21317–21322, 2009.

[83] J. Dongun Kim, Augustina Rodriguez-Granillo, David A. Case, Vikas Nanda, and Paul G. Falkowski. Energetic selection of topology in ferredoxins. PLoS Comp. Biol., 8:e1002463, 2012.

[84] E. Chabriere, M. H. Charon, A. Volbeda, L. Pieulle, E. C. Hatchikian, and J. C. Fontecilla-Camps. Crystal structures of the key anaerobic enzyme pyruvate:ferredoxin oxidoreductase, free and in complex with pyruvate. Nat. Struct. Biol., 6:182–190, 1999.

[85] Jonathan Lombard and David Moreira. Early evolution of the biotin-dependent carboxylase family. BMC Evol. Biol., 11:232:1–22, 2011.

[86] Miho Aoshima, Masaharu Ishii, and Yasuo Igarashi. A novel biotin protein required for reductive carboxylation of 2-oxoglutarate by isocitrate dehydrogenase in Hydrogenobacter thermophilus TK-6. Mol. Microbiol., 51:791–798, 2004.

[87] M. C. W. Evans, B. B. Buchanan, and D. I. Arnon. A new ferredoxin dependent carbon reduction cycle in photosynthetic bacterium. Proc. Nat. Acad. Sci. USA, 55:928–934, 1966.

[88] Günter Wächtershäuser. Evolution of the first metabolic cycles. Proc. Nat. Acad. Sci. USA, 87:200–204, 1990.

[89] Miho Aoshima, Masaharu Ishii, and Yasuo Igarashi. A novel enzyme, citryl-CoA synthetase, catalysing the first step of the citrate cleavage reaction in Hydrogenobacter thermophilus TK-6. Molecular Microbiology, 52:751–761, 2004.

[90] Miho Aoshima, Masaharu Ishii, and Yasuo Igarashi. A novel enzyme, citryl-CoA lyase, catalysing the second step of the citrate cleavage reaction in Hydrogenobacter thermophilus TK-6. Molecular Microbiology, 52:763–770, 2004.

[91] Shinya Watanabe, Michael Zimmermann, Michael B. Goodwin, Uwe Sauer, Clifton E. Barry 3rd, and Helena I. Boshoff. Fumarate reductase activity maintains an energized membrane in anaerobic Mycobacterium tuberculosis. PLoS Pathogens, 7:e1002287, 2011.

[92] Jing Tian, Ruslana Bryk, Manabu Itoh, Makoto Suematsu, and Carl Nathan. Variant tricarboxylic acid cycle in Mycobacterium tuberculosis: identification of α-ketoglutarate decarboxylase. Proc. Nat. Acad. Sci. USA, 102:10670–10675, 2005.

[93] Anthony D. Baughn, Scott J. Garforth, Catherine Vilchèze, and William R. Jacobs Jr. An anaerobic-type α-ketoglutarate ferredoxin oxidoreductase completes the oxidative tricarboxylic acid cycle of Mycobacterium tuberculosis. PLoS Pathogens, 5:e1000662:1–10, 2009.

[94] Shuyi Zhang and Donald A. Bryant. The tricarboxylic acid cycle in cyanobacteria. Science, 334:1551–1553, 2011.

[95] Harald Huber, Martin Gallenberger, Ulrike Jahn, Eva Eylert, Ivan A. Berg, Daniel Kockelkorn, Wolfgang Eisenreich, and Georg Fuchs. A dicarboxylate/4-hydroxybutyrate autotrophic carbon assimilation cycle in the hyperthermophilic archaeum Ignicoccus hospitalis. Proc. Nat. Acad. Sci. USA, 105:7851–7856, 2008.

[96] Ute Müh, Irfan Çinkaya, Simon P. J. Albracht, and Wolfgang Buckel. 4-Hydroxybutyryl-CoA dehydratase from Clostridium aminobutyricum: Characterization of FAD and iron-sulfur clusters involved in an overall non-redox reaction. Biochemistry, 35(36):11710–11718, 1996.

[97] Berta M. Martins, Holger Dobbek, Irfan Çinkaya, Wolfgang Buckel, and Albrecht Messerschmidt. Crystal structure of 4-hydroxybutyryl-CoA dehydratase: Radical catalysis involving a [4Fe-4S] cluster and flavin. Proceedings of the National Academy of Sciences of the United States of America, 101(44):15645–15649, 2004.

[98] Birgit E. Alber, Johannes W. Kung, and Georg Fuchs. 3-Hydroxypropionyl-coenzyme A synthetase from Metallosphaera sedula, an enzyme involved in autotrophic CO2 fixation. Journal of Bacteriology, 190(4):1383–1389, February 15, 2008.

[99] Robin Teufel, Johannes W. Kung, Daniel Kockelkorn, Birgit E. Alber, and Georg Fuchs. 3-Hydroxypropionyl-Coenzyme A Dehydratase and Acryloyl-Coenzyme A Reductase, Enzymes of the Autotrophic 3-Hydroxypropionate/4-Hydroxybutyrate Cycle in the Sulfolobales. J. Bacteriology, 191:4572–4581, 2009.

[100] J. A. Bassham, A. A. Benson, L. D. Kay, A. Z. Harris, A. T. Wilson, and M. Calvin. The path of carbon in photosynthesis. XXI. The cyclic regeneration of carbon dioxide acceptor. J. Am. Chem. Soc., 76:1760–1770, 1954.

[101] F. A. Tabita. Research on carbon dioxide fixation in photosynthetic microorganisms (1971–present). Photosynth. Res., 80:315–332, 2004.

[102] Marion Eisenhut, Shira Kahlon, Dirk Hasse, Ralph Ewald, Judy Lieman-Hurwitz, Teruo Ogawa, Wolfgang Ruth, Hermann Bauwe, Aaron Kaplan, and Martin Hagemann. The plant-like C2 glycolate cycle and the bacterial-like glycerate pathway cooperate in phosphoglycolate metabolism in cyanobacteria. Plant Physiology, 142:333–342, 2006.

[103] Marion Eisenhut, Wolfgang Ruth, Maya Haimovich, Hermann Bauwe, Aaron Kaplan, and Martin Hagemann. The photorespiratory glycolate metabolism is essential for cyanobacteria and might have been conveyed endosymbiontically to plants. Proc. Nat. Acad. Sci. USA, 105(44):17199–17204, 2008.

[104] Christine H. Foyer, Arnold J. Bloom, Guillaume Queval, and Graham Noctor. Photorespiratory metabolism: Genes, mutants, energetics, and redox signaling. Annu. Rev. Plant Biol., 60:455–484, 2009.

[105] Yongqing Liu, Jizhong Zhou, Marina V. Omelchenko, Alex S. Beliaev, Amudhan Venkateswaran, Julia Stair, Liyou Wu, Dorothea K. Thompson, Dong Xu, Igor B. Rogozin, Elena K. Gaidamakova, Min Zhai, Kira S. Makarova, Eugene V. Koonin, and Michael J. Daly. Transcriptome dynamics of Deinococcus radiodurans recovering from ionizing radiation. Proceedings of the National Academy of Sciences, 100(7):4191–4196, 2003.

[106] D. von Wettstein, S. Gough, and C. G. Kannangara. Chlorophyll biosynthesis. The Plant Cell Online, 7(7):1039–1057, 1995.

[107] Rogier Braakman and Eric Smith. An evolutionary context for the reconstructed metabolism of the hyperthermophilic chemoautotroph Aquifex aeolicus. 2012.

[108] Eric H. Davidson and Douglas H. Erwin. Gene regulatory networks and the evolution of animal body plans. Science, 311:796–800, 2006.

[109] Douglas H. Erwin and Eric H. Davidson. The evolution of hierarchical gene regulatory networks. Nature Rev. Genetics, 10:141–148, 2009.

[110] Marc Hetzel, Matthias Brock, Thorsten Selmer, Antonio J. Pierik, Bernard T. Golding, and Wolfgang Buckel. Acryloyl-CoA reductase from Clostridium propionicum: An enzyme complex of propionyl-CoA dehydrogenase and electron-transferring flavoprotein. Eur. J. Biochem., 270:902–910, 2003.

[111] Gloria Herrmann, Elamparithi Jayamani, Galina Mai, and Wolfgang Buckel. Energy conservation via electron-transferring flavoprotein in anaerobic bacteria. J. Bacteriol., 190:784–791, 2008.

[112] Wolfgang Buckel. Unusual dehydrations in anaerobic bacteria: considering ketyls (radical anions) as reactive intermediates in enzymatic reactions. FEBS Letters, 389(1):20–24, 1996.

[113] Megan J. Gruer, Peter J. Artymiuk, and John R. Guest. The aconitase family: three structural variations on a common theme. Trends Biochem. Sci., 22:3–6, 1997.

[114] Miho Aoshima and Yasuo Igarashi. A novel oxalosuccinate-forming enzyme involved in the reductive carboxylation of 2-oxoglutarate in Hydrogenobacter thermophilus TK-6. Mol. Microbiol., 62:748–759, 2006.

[115] Miho Aoshima and Yasuo Igarashi. Nondecarboxylating and decarboxylating isocitrate dehydrogenases: oxalosuccinate reductase as an ancestral form of isocitrate dehydrogenase. J. Bacteriol., 190:2050–2055, 2008.

[116] Shelley D. Copley. Enzymes with extra talents: moonlighting functions and catalytic promiscuity. Curr. Opin. Chem. Biol., 7:265–272, 2003.

[117] Elad Noor, Eran Eden, Ron Milo, and Uri Alon. Central carbon metabolism as a minimal biochemical walk between precursors for biomass and energy. Mol. Cell, 39:809–820, 2010.

[118] Harold J. Morowitz, Vijayasarathy Srinivasan, and Eric Smith. Ligand field theory and the origin of life as an emergent feature of the periodic table of elements. Biol. Bull., 219:1–6, 2010.

[119] Michael J. Russell and Allan J. Hall. The onset and early evolution of life. Geol. Soc. Am. Memoir, 198:1–32, 2006.

[120] Juan C. Fontecilla-Camps, Patricia Amara, Christine Cavazza, Yvain Nicolet, and Anne Volbeda. Structure-function relationships of anaerobic gas-processing metalloenzymes. Nature, 460:814–822, 2009.

[121] Tzanko I. Doukov, Tina M. Iverson, Javier Seravalli, Stephen W. Ragsdale, and Catherine L. Drennan. A Ni-Fe-Cu center in a bifunctional carbon monoxide dehydrogenase/acetyl-CoA synthase. Science, 298:567–572, 2002.

[122] Javier Seravalli, Weiwei Gu, Annie Tam, Erick Strauss, Tadhg P. Begley, Stephen P. Cramer, and Stephen W. Ragsdale. Functional copper at the acetyl-CoA synthase active site. Proceedings of the National Academy of Sciences, 100(7):3689–3694, 2003.

[123] Stephen W. Ragsdale. Nickel-based enzyme systems. Journal of Biological Chemistry, 284(28):18571–18575, 2009.

[124] Anne Volbeda, Marie-Hélène Charon, Claudine Piras, E. Claude Hatchikian, Michel Frey, and Juan C. Fontecilla-Camps. Crystal structure of the nickel-iron hydrogenase from Desulfovibrio gigas. Nature, 373(6515):580–587, 1995.

[125] John W. Peters, William N. Lanzilotta, Brian J. Lemon, and Lance C. Seefeldt. X-ray crystal structure of the Fe-only hydrogenase (CpI) from Clostridium pasteurianum to 1.8 angstrom resolution. Science, 282(5395):1853–1858, 1998.

[126] M. M. Georgiadis, H. Komiya, P. Chakrabarti, D. Woo, J. J. Kornuc, and D. C. Rees. Crystallographic structure of the nitrogenase iron protein from Azotobacter vinelandii. Science, 257(5077):1653–1659, 1992.

[127] J. Kim and D. C. Rees. Structural models for the metal centers in the nitrogenase molybdenum-iron protein. Science, 257(5077):1677–1682, 1992.

[128] Kyle M. Lancaster, Michael Roemelt, Patrick Ettenhuber, Yilin Hu, Markus W. Ribbe, Frank Neese, Uwe Bergmann, and Serena DeBeer. X-ray emission spectroscopy evidences a central carbon in the nitrogenase iron-molybdenum cofactor. Science, 334(6058):974–977, 2011.

[129] Thomas Spatzal, Müge Aksoyoglu, Limei Zhang, Susana L. A. Andrade, Erik Schleicher, Stefan Weber, Douglas C. Rees, and Oliver Einsle. Evidence for interstitial carbon in nitrogenase FeMo cofactor. Science, 334(6058):940, 2011.

[130] A. Ricardo, M. A. Carrigan, A. N. Olcott, and S. A. Benner. Borate minerals stabilize ribose. Science, 303:196, 2004.

[131] Arthur L. Weber. Sugars as the optimal biosynthetic carbon substrate of aqueous life throughout the universe. Orig. Life Evol. Biosphere, 30:33–43, 2000.

[132] Arthur L. Weber. Sugar model of the origin of life: Catalysis by amines and amino acid products. Orig. Life Evol. Biosphere, 31:71–86, 2001.

[133] Arthur L. Weber. Chemical constraints governing the origin of metabolism: The thermodynamic landscape of carbon group transformations under mild aqueous conditions. Orig. Life Evol. Biosphere, 32:333–357, 2002.

[134] Arthur L. Weber. Kinetics of organic transformations under mild aqueous conditions: Implications for the origin of life and its metabolism. Orig. Life Evol. Biosphere, 34:473–495, 2004.

[135] Eliane Fischer and Uwe Sauer. A novel metabolic cycle catalyzes glucose oxidation and anaplerosis in hungry Escherichia coli. Journal of Biological Chemistry, 278(47):46446–46451, 2003.

[136] Dany J. V. Beste, Bhushan Bonde, Nathaniel Hawkins, Jane L. Ward, Michael H. Beale, Stephan Noack, Katharina Nöh, Nicholas J. Kruger, R. George Ratcliffe, and Johnjoe McFadden. 13C metabolic flux analysis identifies an unusual route for pyruvate dissimilation in mycobacteria which requires isocitrate lyase and carbon dioxide fixation. PLoS Pathog., 7(7):e1002091, 2011.

[137] Stephanie Markert, Cordelia Arndt, Horst Felbeck, Dörte Becher, Stefan M. Sievert, Michael Hügler, Dirk Albrecht, Julie Robidart, Shellie Bench, Robert A. Feldman, Michael Hecker, and Thomas Schweder. Physiological proteomics of the uncultured endosymbiont of Riftia pachyptila. Science, 315(5809):247–250, 2007.

[138] Goro Kikuchi. The glycine cleavage system: Composition, reaction mechanism, and physiological significance. Mol. Cell. Biochem., 1:169–187, 1973.

[139] H. A. Barker and J. V. Beck. The fermentative decomposition of purines by Clostridium acidi-urici and Clostridium cylindrosporum. Journal of Biological Chemistry, 141(1):3–27, 1941.

[140] L. J. Waber and H. G. Wood. Mechanism of acetate synthesis from CO2 by Clostridium acidiurici. Journal of Bacteriology, 140(2):468–478, 1979.

[141] Kalin Vetsigian, Carl Woese, and Nigel Goldenfeld. Collective evolution and the genetic code. Proc. Nat. Acad. Sci. USA, 103:10696–10701, 2006.

[142] Pere Puigbò, Yuri Wolf, and Eugene Koonin. Search for a ’tree of life’ in the thicket of the phylogenetic forest. Journal of Biology, 8, 2009.

[143] J. F. Kasting, D. H. Eggler, and S. P. Raeburn. Mantle redox evolution and the oxidation state of the Archean atmosphere. J. Geol., 101:245–257, 1993.

[144] James F. Kasting. Ups and downs of ancient oxygen. Nature, 443:643–644, 2006.

[145] Robert M. Hazen, Dominic Papineau, Wouter Bleeker, Robert T. Downs, John M. Ferry, Timothy J. McCoy, Dimitri A. Sverjensky, and Heixiong Yang. Mineral evolution. Am. Mineralogist, 93:1693–1720, 2008.

[146] D. Trail, E. B. Watson, and N. D. Tailby. The oxidation state of Hadean magmas and implications for early Earth’s atmosphere. Nature, 480:79–82, 2011.

[147] Roger Buick. When did oxygenic photosynthesis evolve? Philosophical Transactions of the Royal Society B: Biological Sciences, 363(1504):2731–2743, 2008.

[148] Lee R. Kump. The rise of atmospheric oxygen. Nature, 451(7176):277–278, 2008.

[149] William J. Brazelton and John A. Baross. Abundant transposases encoded by the metagenome of a hydrothermal chimney biofilm. ISME Journal, pages 1–5, 2009.

[150] William Joseph Brazelton. Ecology of archaeal and bacterial biofilm communities at the Lost City hydrothermal field. PhD thesis, University of Washington, 2010.

[151] Norman R. Pace. A molecular view of microbial diversity and the biosphere. Science, 276:734–740, 1997.

[152] Francesca D. Ciccarelli, Tobias Doerks, Christian von Mering, Christopher J. Creevey, Berend Snel, and Peer Bork. Toward automatic reconstruction of a highly resolved tree of life. Science, 311:1283–1287, 2006.

[153] William Martin and Michael J. Russell. On the origin of cells: An hypothesis for the evolutionary transitions from abiotic geochemistry to chemoautotrophic prokaryotes, and from prokaryotes to nucleated cells. Philos. Trans. Roy. Soc. London, 358B:27–85, 2003.

[154] Leslie E. Orgel. The implausibility of metabolic cycles on the early Earth. PLoS Biology, 06:0005–0013, 2008.

[155] Ludmila Chistoserdova, Julia A. Vorholt, Rudolf K. Thauer, and Mary E. Lidstrom. C1 transfer enzymes and coenzymes linking methylotrophic bacteria and methanogenic archaea. Science, 281:99–102, 1998.

[156] Julia A. Vorholt, Ludmila Chistoserdova, Sergei M. Stolyar, Rudolf K. Thauer, and Mary E. Lidstrom. Distribution of tetrahydromethanopterin-dependent enzymes in methylotrophic bacteria and phylogeny of methenyl tetrahydromethanopterin cyclohydrolases. J. Bacteriol., 181:5750–5757, 1999.

[157] L. Chistoserdova, M. G. Kalyuzhnaya, and M. E. Lidstrom. The expanding world of methylotrophic metabolism. Annu. Rev. Microbiol., 63:477–499, 2009.

[158] Ludmila Chistoserdova. Modularity of methylotrophy, revisited. Environmental Microbiology, 13(10):2603–2622, 2011.

[159] Tobias J. Erb, Ivan A. Berg, Volker Brecht, Michael Müller, Georg Fuchs, and Birgit E. Alber. Synthesis of C5-dicarboxylic acids from C2-units involving crotonyl-CoA carboxylase/reductase: The ethylmalonyl-CoA pathway. Proceedings of the National Academy of Sciences, 104(25):10631–10636, 2007.

[160] Tobias J. Erb, Volker Brecht, Georg Fuchs, Michael Müller, and Birgit E. Alber. Carboxylation mechanism and stereochemistry of crotonyl-CoA carboxylase/reductase, a carboxylating enoyl-thioester reductase. Proceedings of the National Academy of Sciences, 106(22):8871–8876, 2009.

[161] Thomas Handorf, Oliver Ebenhöh, and Reinhart Heinrich. Expanding metabolic networks: scopes of compounds, robustness, and evolution. J. Mol. Evol., 61:498–512, 2005.

[162] Jason Raymond and Daniel Segrè. The effect of oxygen on biochemical networks and the evolution of complex life. Science, 311:1764–1767, 2006.

[163] Moritz Schütte, Alexander Skupin, Daniel Segrè, and Oliver Ebenhöh. Modeling the complex dynamics of enzyme-pathway coevolution. Chaos, 20:045115, 2010.

[164] David E. Graham and Robert H. White. Elucidation of methanogenic coenzyme biosyntheses: from spectroscopy to genomics. Nat. Prod. Rep., 19:133–147, 2002.

[165] Tadhg P. Begley, Abhishek Chatterjee, Jeremiah W. Hanes, Amrita Hazra, and Steven E. Ealick. Cofactor biosynthesis – still yielding fascinating new biological chemistry. Current Opinion in Chemical Biology, 12(2):118–125, 2008.

[166] Christopher T. Jurgenson, Tadhg P. Begley, and Steven E. Ealick. The structural and biochemical foundations of thiamin biosynthesis. Annual Review of Biochemistry, 78(1):569–603, 2009.

[167] Faqing Huang, Charles Walter Bugg, and Michael Yarus. RNA-Catalyzed CoA, NAD, and FAD Synthesis from Phosphopantetheine, NMN, and FMN. Biochemistry, 39:15548–15555, 2000.

[168] Jorge H. Crosa and Christopher T. Walsh. Genetics and assembly line enzymology of siderophore biosynthesis in bacteria. Microbiology and Molecular Biology Reviews, 66(2):223–249, 2002.

[169] Alison Butler. Marine siderophores and microbial iron mobilization. BioMetals, 18:369–374, 2005.

[170] Frank H. Westheimer. Why nature chose phosphates. Science, 235:1173–1178, 1987.

[171] Harold B. White. Coenzymes as fossils of an earlier metabolic state. J. Mol. Evol., 7:101–104, 1976.

[172] Gregory A. Petsko and Dagmar Ringe. Protein structure and function. New Science Press, London, 2003.

[173] Alex Gutteridge and Janet M. Thornton. Understanding nature’s catalytic toolkit. Trends Biochem. Sci., 30:622–629, 2005.

[174] Julia D. Fischer, Gemma L. Holliday, Syed A. Rahman, and Janet M. Thornton. The structures and physico-chemical properties of organic cofactors in biocatalysis. J. Mol. Biol., 403:803–824, 2010.

[175] Claudia Andreini, Ivano Bertini, Gabriele Cavallaro, Gemma L. Holliday, and Janet M. Thornton. Metal ions in biological catalysis: from enzyme databases to general principles. J. Biol. Inorg. Chem., 13:1205–1218, 2008.

[176] Claudia Andreini, Ivano Bertini, Gabriele Cavallaro, Gemma L. Holliday, and Janet M. Thornton. Metal-MACiE: a database of metals involved in biological catalysis. Bioinformatics App. Note, 25:2088–2089, 2009.

[177] Stephen W. Ragsdale and Harland G. Wood. Enzymology of the acetyl-CoA pathway of CO2 fixation. Critical Reviews in Biochemistry and Molecular Biology, 26:261–300, 1991.

[178] Stephen W. Ragsdale and Manoj Kumar. Nickel-containing carbon monoxide dehydrogenase/acetyl-CoA synthase. Chemical Reviews, 96:2515–2540, 1996.

[179] Stephen W. Ragsdale. Enzymology of the Wood-Ljungdahl pathway of acetogenesis. Ann. N. Y. Acad. Sci., 1125:129–136, 2008.

[180] Iris Fry. The Emergence of Life on Earth: A Historical and Scientific Overview. Rutgers U. Press, New Brunswick, N.J., 2000.

[181] Raymond F. Gesteland, Thomas R. Cech, and John F. Atkins, editors. The RNA World. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, third edition, 2006.

[182] Shelley D. Copley, Eric Smith, and Harold J. Morowitz. The origin of the RNA world: Co-evolution of genes and metabolism. Bioorganic Chemistry, 35:430–443, 2007.

[183] Michael Yarus. Getting past the RNA world: the initial Darwinian ancestor. In John Atkins, Ray Gesteland, and Tom Cech, editors, Cold Spring Harbor Perspectives in Biology, pages 1–8, Cold Spring Harbor, NY, 2011. Cold Spring Harbor Laboratory Press.

[184] Robert E. MacKenzie. Biogenesis and interconversion of substituted tetrahydrofolates. In Raymond L. Blakley and Stephen J. Benkovic, editors, Folates and Pterins, vol. 1: Chemistry and Biochemistry of Folates, pages 255–306. John Wiley & Sons, New York, 1984.

[185] Roland G. Kallen and William P. Jencks. The dissociation constants of tetrahydrofolic acid. J. Biol. Chem., 241:5845–5850, 1966.

[186] Patrick Stover and Verne Schirch. The metabolic role of leucovorin. Trends in Biochemical Sciences, 18(3):102–106, 1993.

[187] Teng Huang and Verne Schirch. Mechanism for the coupling of ATP hydrolysis to the conversion of 5-formyltetrahydrofolate to 5,10-methenyltetrahydrofolate. Journal of Biological Chemistry, 270(38):22296–22300, 1995.

[188] M. D. Collins and D. Jones. Distribution of isoprenoid quinone structural types in bacteria and their taxonomic implication. Microbiological Reviews, 45(2):316–354, 1981.

[189] Barbara Schoepp-Cothenet, Clément Lieutaud, Frauke Baymann, André Verméglio, Thorsten Friedrich, David M. Kramer, and Wolfgang Nitschke. Menaquinone as a pool quinone in a purple bacterium. Proc. Nat. Acad. Sci. USA, 106:8549–8554, 2005.

[190] W. Nitschke, D. M. Kramer, A. Riedel, and U. Liebl. From naphtho- to benzoquinones - (r)evolutionary reorganizations of electron transfer chains. In P. Mathis, editor, Photosynthesis: from Light to the Biosphere, vol. 1, pages 945–950. Kluwer Academic Press, Dordrecht, 1995.

[191] Michael J. Russell and A. J. Hall. The emergence of life from iron monosulphide bubbles at a submarine hydrothermal redox and pH front. J. Geol. Soc. London, 154:377–402, 1997.

[192] Günter Wächtershäuser. Before enzymes and templates: A theory of surface metabolism. Microbiol. Rev., 52:452–484, 1988.

[193] Günter Wächtershäuser. Groundworks for an evolutionary biochemistry: the iron-sulphur world. Prog. Biophys. Molec. Biol., 58:85–201, 1992.

[194] Robert Shapiro. Small molecule interactions were central to the origin of life. Quarterly Review of Biology, 81:105–125, 2006.

[195] Robert Shapiro. A simpler origin for life. Scientific American, Feb 12, 2007. (online).

[196] Wolfgang Heinen and Anne Marie Lauwers. Organic sulfur compounds resulting from the interaction of iron sulfide, hydrogen sulfide and carbon dioxide in an anaerobic aqueous environment. Origins of Life and Evolution of Biospheres, 26:131–150, 1996. 10.1007/BF01809852.

[197] G. D. Cody, N. Z. Boctor, R. M. Hazen, J. A. Brandeis, H. J. Morowitz, and H. S. Yoder, Jr. Geochemical roots of autotrophic carbon fixation: hydrothermal experiments in the system citric acid, H2O-(±FeS)(±NiS). Geochimica et Cosmochimica Acta, 65:3557–3576, 2001.

[198] K. Tazuya, C. Azumi, K. Yamada, and H. Kumaoka. Pyrimidine moiety of thiamin is biosynthesized from pyridoxine and histidine in Saccharomyces cerevisiae. Biochem. Mol. Biol. Int., 36(4):883–888, Jul 1995.

[199] Carlos F. Barbas III. Organocatalysis lost: modern chemistry, ancient chemistry, and an unseen biosynthetic apparatus. Angew. Chemie Int. Ed., 47:42–47, 2008.

[200] David W. C. MacMillan. The advent and development of organocatalysis. Nature, 455:304–308, 2008.

[201] Matthew W. Powner, Beatrice Gerland, and John D. Sutherland. Synthesis of activated pyrimidine ribonucleotides in prebiotically plausible conditions. Nature, 459:239–242, 2009.

[202] Jean-Luc Decout and Marie-Christine Maurel. N6-substituted adenine derivatives and RNA primitive catalysts. Orig. Life Evol. Biosphere, 23:299–306, 1993.

[203] Lubert Stryer. Biochemistry. Freeman, San Francisco, CA, second edition, 1981.

[204] Tracy A. Lincoln and Gerald F. Joyce. Self-sustained replication of an RNA enzyme. Science, 323:1229–1232, 2009.

[205] Shelley D. Copley, Eric Smith, and Harold J. Morowitz. A mechanism for the association of amino acids with their codons and the origin of the genetic code. Proc. Nat. Acad. Sci. USA, 102:4442–4447, 2005.

[206] Daniel Segre and Doron Lancet. A statistical chemistry approach to the origin of life. Chemtracts – Biochemistry and Molecular Biology, 12:382–397, 1999.

[207] Daniel Segre, Dafna Ben-Ali, and Doron Lancet. Compositional genomes: Prebiotic information transfer in mutually catalytic noncovalent assemblies. Proc. Nat. Acad. Sci. USA, 97:4112–4117, 2000.

[208] Daniel Segre, Barak Shenhav, Ron Kafri, and Doron Lancet. The molecular roots of compositional inheritance. J. Theor. Biol., 213:481–491, 2001.

[209] Christian de Duve. Blueprint for a cell. Neil Patterson, Burlington, N. C., 1991.

[210] Harry B. Gray. Chemical bonds: an introduction to atomic and molecular structure. University Science Press, Sausalito, CA, 1994.

[211] George Wald. Life in the second and third periods: or why phosphorus and sulfur for high-energy bonds. In M. Kasha and B. Pullman, editors, Horizons in biochemistry, pages 127–142, New York, 1962. Academic Press.

[212] Martin F. Hohmann-Marriott and Robert E. Blankenship. Evolution of photosynthesis. Annu. Rev. Plant Biol., 62:515–548, 2011.

[213] Frederick Berkovitch, Yvain Nicolet, Jason T. Wan, Joseph T. Jarrett, and Catherine L. Drennan. Crystal structure of biotin synthase, an S-adenosylmethionine-dependent radical enzyme. Science, 303(5654):76–79, 2004.

[214] Anne-Kristin Kaster, Johanna Moll, Kristian Parey, and Rudolf K. Thauer. Coupling of ferredoxin and heterodisulfide reduction via electron bifurcation in hydrogenotrophic methanogenic archaea. Proc. Nat. Acad. Sci. USA, 108:2981–2986, 2011.

[215] W. E. Balch and R. S. Wolfe. Specificity and biological distribution of coenzyme M (2-mercaptoethanesulfonic acid). J. Bacteriol., 137:256–263, 1979.

[216] M. J. Danson. Central metabolism of the archaea. In M. Kates, D. J. Kushner, and A. T. Matheson, editors, The biochemistry of archaea, pages 1–24, Amsterdam, 1993. Elsevier.

[217] Astrid Gerhardt, Irfan Cinkaya, Dietmar Linder, Gjalt Huisman, and Wolfgang Buckel. Fermentation of 4-aminobutyrate by Clostridium aminobutyricum: cloning of two genes involved in the formation and dehydration of 4-hydroxybutyryl-CoA. Arch. Microbiol., 174:189–199, 2000.

[218] Michael Schutz, Barbara Schoepp-Cothenet, Elisabeth Lojou, Mireille Woodstra, Doris Lexa, Pascale Tron, Alain Dolla, Marie-Claire Durand, Karl Otto Stetter, and Frauke Baymann. The naphthoquinol oxidizing cytochrome bc1 complex of the hyperthermophilic knallgasbacterium Aquifex aeolicus: Properties and phylogenetic relationships. Biochemistry, 42:10800–10808, 2003.

[219] Fuli Li, Julia Hinderberger, Henning Seedorf, Jin Zhang, Wolfgang Buckel, and Rudolf K. Thauer. Coupled ferredoxin and crotonyl coenzyme A (CoA) reduction with NADH catalyzed by the butyryl-CoA dehydrogenase/Etf complex from Clostridium kluyveri. J. Bacteriol., 190:843–850, 2008.

[220] Shuning Wang, Haiyan Huang, Johanna Moll, and Rudolf K. Thauer. NADP+ reduction with reduced ferredoxin and NADP+ reduction with NADH are coupled via an electron-bifurcating enzyme complex in Clostridium kluyveri. J. Bacteriol., 192:5115–5123, 2010.

[221] William F. Martin. Hydrogen, metals, bifurcating electrons, and proton gradients: The early evolution of biological energy conservation. FEBS Lett., 586:485–493, 2012.

[222] Kesen Ma, Andrea Hutchins, Shi-Jean S. Sung, and Michael W. W. Adams. Pyruvate ferredoxin oxidoreductase from the hyperthermophilic archaeon, Pyrococcus furiosus, functions as a CoA-dependent pyruvate decarboxylase. Proc. Nat. Acad. Sci. USA, 94:9608–9613, 1997.

[223] John E. Cronan Jr. and Grover L. Waldrop. Multi-subunit acetyl-CoA carboxylases. Progress in Lipid Research, 41(5):407–435, 2002.

[224] Hailong Zhang, Zhiru Yang, Yang Shen, and Liang Tong. Crystal structure of the carboxyltransferase domain of acetyl-coenzyme A carboxylase. Science, 299(5615):2064–2067, 2003.

[225] Ronald Bentley and E. Haslam. The shikimate pathway: a metabolic tree with many branches. Critical Reviews in Biochemistry and Molecular Biology, 25(5):307–384, 1990.

[226] A. Kuki and P. G. Wolynes. Electron tunneling paths in proteins. Science, 236:1647–1652, 1987.

[227] Roy A. Jensen. Enzyme recruitment in evolution of new function. Annu. Rev. Microbiol., 30:409–425, 1976.

[228] Olga Khersonsky and Dan S. Tawfik. Enzyme promiscuity: A mechanistic and evolutionary perspective. Annu. Rev. Biochem., 79:471–505, 2010.

[229] Juhan Kim, Jamie P. Kershner, Yehor Novikov, Richard K. Shoemaker, and Shelley D. Copley. Three serendipitous pathways in E. coli can bypass a block in pyridoxal-5′-phosphate synthesis. Mol. Sys. Biol., 6:436:1–13, 2010.

[230] Olga Khersonsky, Sergey Malitsky, Ilana Rogachev, and Dan S. Tawfik. Role of chemistry versus substrate binding in recruiting promiscuous enzyme functions. Biochem., 50:2683–2690, 2011.

[231] Dan K. Braithwaite and Junetsu Ito. Compilation, alignment, and phylogenetic relationships of DNA polymerases. Nucl. Acids Res., 21:787–802, 1993.

[232] Scott Bailey, Richard A. Wing, and Thomas A. Steitz. The Structure of T. aquaticus DNA Polymerase III Is Distinct from Eukaryotic Replicative DNA Polymerases. Cell, 126:893–904, 2006.

[233] N. H. Horowitz. On the evolution of biochemical synthesis. Proc. Nat. Acad. Sci. USA, 31:153–157, 1945.

[234] Francis Crick. Central dogma of molecular biology. Nature, 227:561–563, 1970.

[235] Stephen Jay Gould. The structure of evolutionary theory. Harvard U. Press, Cambridge, Mass, 2002.

[236] Edmund Beecher Wilson. The cell in development and inheritance. Macmillan, New York, third edition, 1925.

[237] Timothee Poisot, James D. Bever, Adnane Nemri, Peter H. Thrall, and Michael E. Hochberg. A conceptual framework for the evolution of ecological specialization. Ecology Lett., doi:10.1111/j.1461-0248.2011.01645.x, 2011.

[238] David L. Nelson and Michael M. Cox. Lehninger Principles of Biochemistry. W. H. Freeman, fourth edition, 2004.

[239] Bruce Alberts. Molecular biology of the cell. Garland Science, New York, 4th edition, 2002.

[240] Harold J. Morowitz. Foundations of Bioenergetics. Academic Press, New York, 1987.

[241] Jean-Michel Claverie. Viruses take center stage in cellular evolution. Genome Biol., 7:110:1–5, 2006.

[242] Patrick Forterre. Defining life: the virus viewpoint. Orig. Life Evol. Biosphere, 40:151–160, 2010.

[243] Anya K. Bershad, Miguel A. Fuentes, and David C. Krakauer. Developmental autonomy and somatic niche construction promotes robust cell fate decisions. J. Theor. Biol., 254:408–416, 2008.

[244] Deborah S. Kelley, John A. Baross, and John R. Delaney. Volcanoes, fluids, and life at mid-ocean ridge spreading centers. Annu. Rev. Earth Planet. Sci., 30:385–491, 2002.

[245] H. Baltscheffsky, L.-V. von Stedingk, H.-W. Heldt, and M. Klingenberg. Inorganic pyrophosphate: formation in bacterial photophosphorylation. Science, 153:1120–1122, 1966.

[246] Y. Yamagata, H. Watanabe, M. Saitoh, and T. Namba. Volcanic production of polyphosphates and its relevance to prebiotic evolution. Nature, 352:516–519, 1991.

[247] Arthur Kornberg, Narayana N. Rao, and Dana Ault-Riche. Inorganic polyphosphate: A molecule of many functions. Annu. Rev. Biochem., 68:89–125, 1999.

[248] M. Baltscheffsky, A. Schultz, and H. Baltscheffsky. H+-PPases: a tightly membrane-bound family. FEBS Lett., 457:527–533, 1999.

[249] Michael R. W. Brown and Arthur Kornberg. Inorganic polyphosphate in the origin and survival of species. Proc. Nat. Acad. Sci. USA, 101:16085–16087, 2004.

[250] Juli Pereto, Purificacion Lopez-Garcia, and David Moreira. Ancestral lipid biosynthesis and early membrane evolution. Trends Biochem. Sci., 29:469–477, 2004.

[251] M. M. Hanczyk, S. M. Fujikawa, and J. W. Szostak. Experimental models of primitive cellular compartments: Encapsulation, growth, and division. Science, 302:618–622, 2003.

[252] Pier Luigi Luisi. The emergence of life: from chemical origins to synthetic biology. Cambridge U. Press, London, 2006.

[253] Steven J. Freeland and Laurence D. Hurst. The genetic code is one in a million. J. Mol. Evol., 47:238–248, 1998.

[254] Robert D. Knight, Steven J. Freeland, and Laura F. Landweber. Selection, history and chemistry: the three faces of the genetic code. Trends Biochem. Sci., 24:241–247, 1999.

[255] Yi Lu and Stephen Freeland. On the evolution of the standard amino-acid alphabet. Genome Biol., 7:102:1–6, 2006.

[256] Tze-Fei Wong. A co-evolution theory of the genetic code. Proc. Nat. Acad. Sci. USA, 72:1909–1912, 1975.

[257] C. R. Woese, D. H. Dugre, W. C. Saxinger, and S. A. Dugre. The molecular basis for the genetic code. Proc. Nat. Acad. Sci. USA, 55:966–974, 1966.

[258] Claude Berge. Graphs and hypergraphs. North-Holland, Amsterdam, revised edition, 1973.

[259] Nathan D. Price, Iman Famili, Daniel A. Beard, and Bernhard O. Palsson. Extreme pathways and Kirchhoff's second law. Biophys. J., 83:2879–2882, 2002.

[260] Iman Famili and Bernhard O. Palsson. Systemic metabolic reactions are obtained by singular value decomposition of genome-scale stoichiometric matrices. J. Theor. Biol., 224:87–96, 2003.

[261] Bernhard O. Palsson. Systems Biology. Cambridge U. Press, Cambridge, MA, 2006.

[262] Adam M. Feist, Markus J. Herrgard, Ines Thiele, Jennie L. Reed, and Bernhard O. Palsson. Reconstruction of biochemical networks in microorganisms. Nature Reviews Microbiol., 7:129–143, 2009.

[263] Oktay Sinanoglu. Theory of chemical reaction networks. All possible mechanisms or synthetic pathways with given number of reaction steps or species. J. Amer. Chem. Soc., 97:2309–2320, 1975.

[264] Oktay Sinanoglu. On the algebraic construction of chemistry from quantum mechanics. A fundamental valency vector field defined on the Euclidean 3-space and its relation to the Hilbert space. Theoret. Chim. Acta, 65:243–248, 1984.

[265] N. A. Sinitsyn, Nicolas Hengartner, and Ilya Nemenman. Adiabatic coarse-graining and simulations of stochastic biochemical networks. Proc. Nat. Acad. Sci. USA, 106:10546–10551, 2009.

[266] S. L. Miller and D. Smith-Magowan. The thermodynamics of the Krebs cycle and related compounds. J. Phys. Chem. Ref. Data, 19:1049–1073, 1990.

[267] J. D. Bernal, editor. The Origin of Life, London, 1967. Weidenfeld and Nicolson.