Discovering the electronic circuit diagram of life: structural relationships among transition metal binding sites in oxidoreductases

rstb.royalsocietypublishing.org

Research

Cite this article: Kim JD, Senn S, Harel A, Jelen BI, Falkowski PG. 2013 Discovering the electronic circuit diagram of life: structural relationships among transition metal binding sites in oxidoreductases. Phil Trans R Soc B 368: 20120257. http://dx.doi.org/10.1098/rstb.2012.0257

One contribution of 14 to a Discussion Meeting Issue ‘Energy transduction and genome function: an evolutionary synthesis’.

Subject Areas: biochemistry, evolution

Keywords: evolution, oxidoreductase, bioenergetics, metabolic pathways, biogeochemical cycles, protein structure

Author for correspondence: Paul G. Falkowski, e-mail: [email protected]

Electronic supplementary material is available at http://dx.doi.org/10.1098/rstb.2012.0257 or via http://rstb.royalsocietypublishing.org.

© 2013 The Author(s) Published by the Royal Society. All rights reserved.

J. Dongun Kim 1,3, Stefan Senn 3, Arye Harel 3, Benjamin I. Jelen 3,4 and Paul G. Falkowski 1,2,3,4

1 Department of Chemistry and Chemical Biology, and 2 Department of Earth and Planetary Sciences, Rutgers University, Piscataway, NJ 08854, USA
3 Environmental Biophysics and Molecular Ecology Program, Institute of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ 08901, USA
4 Department of Environmental Sciences, Rutgers University, New Brunswick, NJ 08901, USA

Oxidoreductases play a central role in catalysing enzymatic electron-transfer reactions across the tree of life. To first order, the equilibrium thermodynamic properties of these proteins are governed by protein folds associated with specific transition metals and ligands at the active site. A global analysis of holoenzyme structures and functions suggests that there are fewer than approximately 500 fundamental oxidoreductases, which can be further clustered into 35 unique groups. These catalysts evolved in prokaryotes early in the Earth’s history and are largely responsible for the emergence of non-equilibrium biogeochemical cycles on the planet’s surface. Although the evolutionary history of the amino acid sequences in the oxidoreductases is very difficult to reconstruct due to gene duplication and horizontal gene transfer, the evolution of the folds in the catalytic sites can potentially be used to infer the history of these enzymes. Using a novel, yet simple analysis of the secondary structures associated with the ligands in oxidoreductases, we developed a structural phylogeny of these enzymes. The results of this ‘composome’ analysis suggest an early split from a basal set of a small group of proteins dominated by loop structures into two families of oxidoreductases, one dominated by α-helices and the second by β-sheets. The structural evolutionary patterns in both clades trace redox gradients and increased hydrogen bond energy in the active sites. The overall pattern suggests that the evolution of the oxidoreductases led to decreased entropy in the transition metal folds over approximately 2.5 billion years, allowing the enzymes to use increasingly oxidized substrates with high specificity.

1. Introduction

Biologically driven electron-transfer reactions are the primary energy-transduction processes across the tree of life. These reactions depend upon energy sources external to the system. The two external energy sources on the Earth are solar radiation and geothermally derived heat. These create chemical redox gradients, which are coupled to biological redox reactions and ultimately drive the non-equilibrium thermodynamic reactions that make life possible. Indeed, the origin of life almost certainly began with the evolution of a small set of metabolic processes coupled to redox chemistry.

Oxidoreductases (enzyme commission 1, EC1) are a class of enzymes that facilitate these proton-coupled electron-transfer reactions. All of the core metabolic processes mediated by EC1 proteins evolved in prokaryotes and ultimately became coupled on local and planetary scales to facilitate an electron ‘market’ between the major light elements. This electron market ultimately led to a closed cycle between respiratory reactions and their biological oxidative analogues, and photosynthesis and their biological reductive analogues. These reactions, which



are far from thermodynamic equilibrium, allow gas exchanges across the tree of life and transformed the Earth’s biogeochemical cycles (figure 1a) [1].

Although the exact number of core EC1 proteins, including orthologues, paralogues and analogues, is unknown, their functions appear to be encoded by fewer than approximately 500 unique genes (i.e. genes that encode for unique functions including paralogues and analogues; figure 1a). One reason that such a small set of proteins plays such a large role in the Earth’s elemental cycles is that the different electron-transport chains found ubiquitously throughout life share common components. For example, chemoautotrophic, anaerobic and aerobic respiration, as well as anoxygenic and oxygenic photosynthesis, all use similar proton-coupled electron-transport schemes. The basis of the scheme is separation of protons from electrons across a membrane. The electrons are inevitably ferried via carriers across a set of membrane-bound proteins, ultimately arriving at a transient sink that allows a negatively charged carrier to be neutralized by a proton. The protons are initially segregated from the electrons by the membrane, thereby forming an asymmetric distribution of charge. The return flow of the protons (i.e. the proton motive force) is coupled to nanomachines, especially the ‘coupling factor’, ATP synthase, which conserves the electrochemical energy as chemical bond energy.

Although peripheral components of these proton-coupled electron-transport reactions have been selected for specific reaction substrates and products, the basic architecture of all the core pathways shares similar protein structures and ligands, including iron–sulfur clusters, pterins, haems and quinones. These interchangeable structures and ligands have evolved into a metabolic network with overlapping functions across the tree of life (figure 1b). Additionally, many biological energy-transduction systems share a small subset of metabolic pathways such as glycolysis (the Embden–Meyerhof, the Entner–Doudoroff, including Archaean modifications of these pathways) or the reverse TCA cycle [2]. Thus, metabolism, using a variety of electron donors and acceptors, draws catalysts from a core set of similar components and pathways to enable a flow of electrons and protons. This modular approach to metabolism has provided great flexibility on a relatively small number of EC1 genes. Indeed, prokaryotes are often able to regulate major components of metabolism in accordance with environmental conditions [3], often at suboptimal efficiency.

Regardless of efficiency, the flux of electrons through the metabolic network is particularly dependent on, and sensitive to, the availability of specific transition metals, especially iron (figure 2). The bioavailability of transition metals is, in turn, highly dependent on the redox state of the environment. A recent whole-genome analysis of phylogenetically diverse micro-organisms suggests that the earliest proteins incorporated metals, and that metal usage over time evolved in accordance with environmental availability [4]. The metals are invariably coordinated to the protein scaffolds via a small set of specific protein folds [1]. Identifying the members and the evolutionary pattern of this set of folds is critical to understanding the evolution of metabolism across the tree of life, as well as the emergence of biogeochemical cycles, far from equilibrium.

In this paper, we present an analysis of the evolutionary history of metal usage, the structures of the protein folds and redox state of the oxidoreductases across the tree of life, which ultimately formed an electronic circuit on a planetary scale. Our results suggest that the redox processes connecting metabolism across the Earth’s surface underwent a secular trend in evolutionary transitions that led to successively greater complexity and thermodynamic efficiency in these critical enzymes over the first approximately 2.5 billion years of the Earth’s history.

2. Transition metals in oxidoreductases

Oxidoreductases catalyse electron-transfer reactions via prosthetic groups that usually contain transition metals whose ions have incompletely filled d or f orbitals and which can accept electrons from protein side-chains [5,6]. Multiple oxidation states, especially of molybdenum and manganese, allow transition metals to be coupled with a wide range of reduction–oxidation reactions. Further, a specific metal–ligand structure is ‘tuned’, or poised for specific redox reactions, by the protein–metal microenvironment [7].

Among transition metals, specific elements were naturally selected for their physical–chemical properties, abundance, coordination bond strength, atomic radii, solubility and polarizability [8]. Iron is, by far, the most abundant transition metal on the Earth [9]. The abundance of this element, especially as a ferrous ion in the Archaean and early Proterozoic oceans, is reflected in its wide use as a catalysing cofactor for oxidoreductases (figure 2). In Swiss–Prot [10] and the protein data bank (PDB) [11], iron is identified as a catalytic component in more than 67 per cent of the EC1 proteins. Iron-containing oxidoreductases can use the metal in a mineral form of a sulfide, or coordinated to imidazole nitrogens in porphyrins, forming haems. Following iron, oxidoreductases contain metals in the following order of relative abundance: Cu > Mn > Ni > Mo > Co > V > W (figure 2). It should be noted that under anoxic and/or euxinic conditions, Cu and Mo are highly insoluble unless the ions are oxidized.
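The iron share and the ordering of the remaining metals can be checked against the rounded percentages printed in figure 2. The short sketch below simply tallies those chart values; the dictionary and function names are illustrative and are not part of the original analysis. Because the chart values are rounded to whole per cent, Ni and Mo tie at 4 per cent, so the figure alone cannot resolve their relative order.

```python
# Rounded percentages as printed in figure 2 (PDB structures annotated as
# oxidoreductases). The names used here are illustrative only.
FIGURE2_PERCENT = {
    "Fe (haem)": 43, "Fe": 26, "Cu": 14, "Mn": 6,
    "Mo": 4, "Ni": 4, "Co": 2, "V": 1, "W": 0,
}


def iron_share(percent):
    """Sum the haem and non-haem iron entries."""
    return sum(value for metal, value in percent.items() if metal.startswith("Fe"))


def non_iron_ranking(percent):
    """Rank the non-iron metals from most to least abundant.

    The sort is stable, so metals with equal rounded values keep their
    dictionary order.
    """
    others = [(m, v) for m, v in percent.items() if not m.startswith("Fe")]
    return sorted(others, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print("iron share (%):", iron_share(FIGURE2_PERCENT))
    print("ranking:", non_iron_ranking(FIGURE2_PERCENT))
```

With these values the two iron entries add up to 69 per cent, which is consistent with the statement that iron is a catalytic component in more than 67 per cent of the EC1 proteins.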

3. Biopolymer–metal interactions in the ‘ancient ocean’

Assuming that the conversion of energy via dissipation of redox gradients was an early bioinorganic reaction essential for the origin of life, it logically follows that transition metals played a key role. Transition metals can undergo stoichiometric reactions via photochemical processes or in solution phase with other redox couples [12–15]. Transition metals are also capable of carrying out catalytic reactions when surrounded by a biopolymer that provides a specific structural framework that allows reversible population of the metal–ligands with electrons [16].

Lewis basic peptide side-chains, such as thiolates (cysteine), imidazole nitrogens (e.g. histidines) and carboxylates (aspartic and glutamic acids), can form coordination bonds via d-orbitals in the transition metals. Metals bound by multiple side-chains are locked within the peptide/protein matrix. These interactions influence multiple physical properties of the holoprotein, including solvent accessibility, tuned redox potential, optimization of Gibbs free energy and enhanced substrate specificity. Understanding how the earliest biopolymer–metal interactions evolved is critical to understanding the origins of non-equilibrium bioenergetic reactions, and hence the origins of life.

Figure 1. Network of life’s biologically mediated cycles modified from [1], and a Venn diagram of the basic metabolic pathways. (a) The interconnected ‘circuit board’ of life’s electron-transfer reactions (for C, H, O, N, S and Fe) is labelled with pathway groups I to VI, corresponding to metabolic networks. (b) The metabolic pathway groups I to VI (large font) are shown to have redox components of electron-transport chains as well as auxiliary pathways in common (small font). Pathway group I: ‘aerobic sugar metabolism’ has components such as NAD(P)H reductases, as well as membrane-localized cytochromes and quinones, in common with pathway group VI: ‘oxygenic photosynthesis’. Pathway group II: ‘sulfate reduction’ utilizes oxidized sulfur compounds as terminal electron acceptors and carries out carbon metabolism via the reductive acetyl CoA pathway or the TCA cycle. Pathway group III: ‘denitrification’ uses unique oxidoreductases to reduce nitrates and other oxidized nitrogen compounds but still requires the common components of the electron-transport chain as well as glycolysis and the tricarboxylic acid (TCA) cycle as a source of reductant. Pathway group IV: ‘hydrogen oxidation’ uses an electron-transport chain but is not dependent on glycolysis or the TCA cycle for producing reductant. Pathway group V: ‘methanogenesis’ does not use an electron-transport chain and is the only pathway using coenzyme F420. Size of circles is irrelevant.

Figure 2. Transition metals in EC1 proteins. The relative contribution of the major transition metals found in protein data bank (PDB) structures annotated as oxidoreductases. Chart values: Fe (haem) 43%, Fe 26%, Cu 14%, Mn 6%, Mo 4%, Ni 4%, Co 2%, V 1%, W 0%.

In the early Archaean [17], metals in minerals may have played a significant role in adsorbing and concentrating organic molecules and catalysing various chemical reactions implicated in the origin of non-equilibrium redox reactions. Provided with the building blocks of life, metals bound to short peptides could have functioned as protoenzymes, as is proposed by models of early protein evolution [18,19]. An early protoenzyme would have had to originate and evolve under a strict set of rules:

(1) The electronic structure of transition metals must match geometrical requirements for metal–ligand coordination number and geometry. As a result, the emergent entactic states constrain the subsequent evolution of the structures and topologies of the coordinately bound polypeptides or other molecules required for catalysis.

(2) The mildly reducing environment of the ancient oceans [20] required a relatively low midpoint potential of the early redox organocatalysts, especially compared with more recently evolved oxidoreductases. The ligand environment was modulated to tune the midpoint potential to meet the functional requirements.

To meet these constraints, random polypeptides almost certainly evolved in association with the available metals and were continuously selected for sequences and folds that satisfied both structural and functional constraints. Such a combinatorial search to local minima would have persisted until the polypeptide–metal complex acquired locally optimized catalytic functions. The specific folds almost certainly coevolved over geological time with an increasingly larger set of coupled biogeochemical cycles. The ancient folds were spread genetically across the nascent tree of life primarily via horizontal gene transfer, but ultimately diverged into several motifs. The subsequent structural innovations were accelerated by various modes of evolution such as gene insertion, duplication and partial loss. Evolved core protein folds became molecular ‘modules’ from which a variety of biomachines could ultimately be built via a ‘mix and match’ set of motifs.

4. Evolution of sequences and folds

Because of both their modularity and early spread across the tree of life, it is extremely difficult to determine the evolutionary heritage of folds in the oxidoreductases solely based on analysis of organisms, sequences or synteny within highly conserved operons. While the core oxidoreductases in oxygenic photosynthesis are extremely highly conserved, inspection reveals major sequence degeneracy in closely related structures. For example, transmembrane-spanning helices of photosystem I and II have highly divergent sequences, yet their structures are almost identical [21]. This basic phenomenon was noted early on by pioneers in the field of bioinformatics. Indeed, Eck & Dayhoff [22, p. 363] noted ‘the processes of natural selection severely inhibit change to a well-adapted system on which several other essential components depend’. While their comments were based on the highly conserved structure of ferredoxin, they apply to many ancient proteins, including enzymes that do not catalyse redox reactions. For example, ribulose-1,5-bisphosphate carboxylase oxygenase (EC 4.1.1.39) is a carboxy-lyase. The enzyme is responsible for the fixation of CO2 in many photosynthetic and chemoautotrophic organisms. This crucial enzyme cannot easily distinguish between its ‘true’ substrate, CO2, and O2. The result is that at present atmospheric levels of O2, the enzyme is often remarkably inefficient. Moreover, the catalytic turnover of the reaction, even under optimal conditions, is much slower than reactions feeding the substrate (CO2) or removing the product (3-phosphoglycerate). Regardless, the catalytic site of the enzyme is highly conserved and the biological result of this conservation is that organisms often synthesize the enzyme in excess to achieve maximum overall growth efficiency [23]. There are many other, similar examples.

The fundamental physico-chemical properties that govern the major protein fold conformations have remained unchanged. Reinvention of metalloenzyme folds is highly restricted, given that geometrical and energetic selection processes limit structural solutions. Let us examine a novel approach to identifying and ordering the structural solutions found in extant oxidoreductases.

5. The ‘composome’ approach

We hypothesize that the ensemble of secondary structures in the region surrounding the catalytically active metals has been selected to facilitate catalysis of the holoenzyme. We further hypothesize that these secondary structures are the outcome of selection and provide a window into the processes in which protein folds evolved. We assume that the composition of the secondary structural motifs reflects the evolutionary history of the protoenzyme from which the extant motif is descended, and was inherited with modifications through a myriad of organisms to form the observed protein fold. We further assume that the folds must obey the rules set by the d-block metal coordination chemistry [5]. These underlying hypotheses are extensions from our previous work where we proposed that secondary structures around the metal or metal–ligand in the active site would be more conserved than elsewhere in the protein [24]. We call this quantitative analysis of the secondary structure of the folds in active sites the elucidation of a ‘composome’. To the best of our knowledge, this is the first attempt to infer quantitative distances between distinctly different protein folds based solely on secondary structural composition. The resulting ‘phylogeny’ represents a linkage of fold relationships in structural space, and obviously is not intended to imply linear, monophyletic history of the evolution of EC1 proteins.

Figure 3. Ternary vector space used for the secondary structure composition (i.e. the ‘composome’) and representative structures. Representative structures are shown for each of the major folds in the ternary space. Axes: sheet (%), helix (%), loop (%). Legend: Mn, Fe4S4, Mo–Fe–S, diCu, Mo-pterin, Cu, Fe2S2, Fe, Fe-haem. Structures shown: formate dehydrogenase (PDB: 2iv2), cytochrome c oxidase (PDB: 2gsm), oxalate oxidase (PDB: 2et1), toluene 2,3-dioxygenase (PDB: 3en1).

Figure 4. Comparison of the composome approach with a tertiary structural analysis. The structural similarity and composome distance (described in text) are compared in a dot plot of 3240 alignments of 82 metal catalytic sites. The 447 alignments between the same metal cofactors are coloured red. Alignments that are less than 40 per cent structurally similar (grey-shaded area) can be considered inconclusive in terms of structural similarity. The inset shows an alignment between the manganese-binding motif in oxalate oxidase (2et1, EC 1.2.3.4; orange for matched, blue for unmatched) and the iron-binding motif of clavaminate synthase (1ds1, EC 1.14.11.21; red for matched, green for unmatched). The folds are very similar in secondary structure content and overall fold (52.64% structurally similar, r.m.s.d. 1.86 Å). The sequence identity derived from the 50 residues’ alignment is 6%, or three residues, two of which are the metal-coordinating histidines (illustrated in stick representation). This analysis strongly suggests that the ‘composome’ approach based solely on secondary structures can differentiate between closely related folds with a high degree of accuracy. Axes: structural similarity (%), composome distance.

The approach uses PDB files that possess previously determined ‘gold standard’ domains [25]. From this dataset, we extracted a subset of representatives using the best resolution structure for each organism per ‘gold standard’ domain (see the electronic supplementary material). We included all structures of orthologous proteins with high resolution for every gold standard domain. For every domain, the corresponding metal–ligand was treated as the catalytic site. For each catalytic site, we first collected a list of amino acid residues that are within 15 Å from a catalytic metal, based on carbon alpha (Cα) coordinates. Secondary structures of the residues were assigned using the Define Secondary Structure of Proteins (DSSP) database [26]. The overall secondary structure composition (i.e. the ‘composome’) around each metal-bearing catalytic site was determined and further adjusted based on a residue-metal distance by a factor of 1/r². These secondary structural compositions, from each catalytic site, were plotted in a ternary vector space (figure 3). Structures that contained both identical metals and a Euclidean distance in composome space less than 0.02 Å were collapsed into a single representative structure, resulting in 82 final representative structures. Each data point in this ternary space diagram represents an individual oxidoreductase. The data points largely cluster into two groups with helix-rich and sheet-rich clades. For each metal or metal–ligand, data points tend to aggregate, suggesting that the physico-chemical properties of metals constrain the entactic evolution towards specific compositions of secondary structure around the catalytic site, regardless of catalytic function.
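The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ implementation: the reduction of DSSP codes to three classes (H, G, I as helix; E, B as sheet; everything else as loop), the greedy order in which near-duplicate sites are collapsed, and all function names are assumptions introduced here. The 15 Å cutoff, the 1/r² weighting and the 0.02 threshold are taken from the text.

```python
import math

# Assumed three-class reduction of DSSP codes; the text does not state the
# mapping that was actually used.
DSSP_TO_CLASS = {"H": "helix", "G": "helix", "I": "helix",
                 "E": "sheet", "B": "sheet"}
CLASSES = ("helix", "sheet", "loop")


def composome(metal_xyz, residues, cutoff=15.0):
    """Return 1/r^2-weighted (helix, sheet, loop) fractions around a metal.

    residues: iterable of (ca_xyz, dssp_code) pairs, coordinates in angstroms.
    Residues whose C-alpha lies beyond the cutoff are ignored.
    """
    weights = dict.fromkeys(CLASSES, 0.0)
    for ca_xyz, code in residues:
        r = math.dist(metal_xyz, ca_xyz)
        if 0.0 < r <= cutoff:
            weights[DSSP_TO_CLASS.get(code, "loop")] += 1.0 / (r * r)
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("no residues within the cutoff")
    return tuple(weights[c] / total for c in CLASSES)


def collapse(sites, threshold=0.02):
    """Greedily drop sites that duplicate an already kept site.

    sites: list of (metal_label, composome) pairs. A site is dropped when a
    kept site has the same metal label and lies closer than the threshold
    in composome space (Euclidean distance).
    """
    kept = []
    for metal, comp in sites:
        duplicate = any(m == metal and math.dist(comp, c) < threshold
                        for m, c in kept)
        if not duplicate:
            kept.append((metal, comp))
    return kept
```

Each returned triple sums to one, so it can be placed directly in a ternary diagram of the kind shown in figure 3.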

The compositional fingerprinting method described for calculating relations in folds across all known EC1 structures collapses three-dimensional information into a one-dimensional matrix. To assess the effects of this mathematical simplification and compression, the Cα backbone environments for the 82 structurally different catalytic sites were superimposed in a pairwise, all versus all fashion. We used a structural alignment protocol that calculates a novel similarity score, which combines alignment length and spatial deviation into one measurement [27]. The similarity values of highly similar structure pairs (more than 40% structurally similar) correlate with low compositional profile distances (figure 4). This result strongly suggests that, in principle, secondary structures retain sufficient topological information such that a quantitative analysis of the fold can retrieve conformational relationships with sufficient resolution and confidence to derive the structural history of the folds in catalytic sites.

Figure 5. Average hydrogen bond energy per amino acid residue associated with the metal/metal–ligand centre across the composome surface. Hydrogen bond energy is a proxy for fold ‘evolvability’. The evolvability contour plot is generated based on average hydrogen bond energy per residue from 82 data points and overlaid on the ternary plot (figure 3). As protein folds acquire more helix or sheet secondary structures, they become less flexible and their folds are increasingly fixed. The evolution of fold occurred starting from simple disordered state (red) to complex organized state (blue). Plot labels: corners 100% sheet, 100% helix, 100% loop; contours from –2.0 to –4.0 kcal mol–1 per residue; primordial polypeptide (simple fold, low H-bond energy) to complex fold (high H-bond energy); regions marked beta barrel, mixed helix and sheet, iron (haems), and not found in extant folds.

Indeed, for dissimilar structural pairs, secondary structure composition maintains a high degree of predictive performance. We find significantly lower secondary structure composition distances for environments of same ligands compared with different ligands (figure 4), especially for pairs where the calculated Cα backbone structure similarity is less than 40 per cent and therefore inconclusive. Very high composome distances are generally not observed for same ligand pairs, suggesting a limit of the differences between same ligand environment secondary structures. This phenomenon appears to be related to hydrogen bonding patterns that are required by certain electronic structures around metal or metal–ligand catalytic sites, leading to similar secondary structure composition signatures.

We kept orthologous structures with detectable sequence similarity in our dataset for two reasons: (i) to show the structural variability of a motif and capture the full spectrum of secondary structure composition and (ii) to highlight the sensitivity of secondary structure composition analysis. This redundancy also includes phylogenetic information and represents a positive control for our analysis. One example is the molybdenum cluster (figure 3), where related structural domains (SCOP family: molybdenum cofactor-binding domain) are sampled over a wide phylogenetic space as well as across different functions according to EC classifications.

Let us now examine the fold ‘phylogeny’ derived from the composome analysis.

6. Fold phylogeny based on composome

A polypeptide chain is intrinsically dynamic and conformationally variable, but specific structures with relatively rigid folds are often adopted for specific biological functions. Fold evolution is achieved when a protein structure with functional promiscuity has a flexible chain, which is ‘evolvable’ [28].

To check the evolvability of each protein fold, we calculated the average hydrogen bond energy per residue around each metal site. We postulated that hydrogen bond energy is inversely proportional to the evolvability of the protein fold. The analysis (figure 5) suggests that ‘loop-rich’ folds have low hydrogen bond energy per residue, making the fold more ‘evolvable’; loops and coils tend to be more flexible. By contrast, α-helices or β-sheets form more extended hydrogen-bonding networks, making the fold far more rigid. In addition, the extensive hydrogen-bonding network confers a high contact order, which corresponds to complex and slow folding kinetics

[Figure 6: tree of metal-binding protein folds with panels (a)–(d); each leaf is labelled with a cofactor code and the name of the protein (see the electronic supplementary data for the complete list).]

Figure 6. A tree of protein folds based on evolvability of metal binding sites (average hydrogen bond energy per residue, figure 5) and a Euclidean distance matrix based on the composome analysis (i.e. figure 3). Full names of proteins that include metal binding folds are provided and the metal types are labelled with different colours (light blue: Fe2S2 cluster; purple: molybdopterin; pink: Fe4S4 cluster; red: iron; yellow: haem; cyan: copper; green: molybdenum–iron–sulfur). A complete list of examined protein structures is provided in the electronic supplementary data. During early evolution, polypeptide chains gained catalytic ability through random mutations. (a) The earliest protein folds would have resembled extant Rieske or molybdopterin folds, which have high evolvability. (b) Conformational searches yield folds with both α-helix and β-sheet structures, such as ferredoxins. (c) Further conformational searches result in the divergence of protein structure, giving rise to α-helix-rich and (d) β-sheet folds.


[29]. While evolution can obviously still occur in these folds, the resulting structures are highly conserved. Mixed usage of both helices and sheets requires adjoining loop regions. The result is a variable degree of hydrogen bonding across the peptide backbone. Protein folds with both helices and sheets are expected to be relatively highly evolvable. In our analysis, we observe two ‘evolvability hot spots’ (figure 5). One such area is occupied by a ferredoxin fold, which might be one of the ancient extant oxidoreductase structures [22], whereas other obvious hotspots of origins do not exist among folds in our dataset (see the electronic supplementary material).
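As an illustration of the evolvability proxy described above, the following minimal sketch averages hydrogen bond energy over the residues around a metal site. It assumes the Kabsch–Sander electrostatic definition of a backbone hydrogen bond (the approach of ref. [26]); the text here does not state which energy function was actually used, and the function names, the −0.5 kcal mol–1 cut-off and the example energies are illustrative assumptions, not data from this study.

```python
# Kabsch-Sander electrostatic model (assumed here): partial charges in units
# of e and a dimensional factor giving kcal/mol for distances in angstroms.
Q1, Q2, F = 0.42, 0.20, 332.0
HBOND_CUTOFF = -0.5  # kcal/mol; a bond is counted when E is below this value


def kabsch_sander_energy(r_on, r_ch, r_oh, r_cn):
    """Electrostatic H-bond energy (kcal/mol) from four distances in angstroms."""
    return Q1 * Q2 * F * (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn)


def mean_hbond_energy_per_residue(bond_energies, n_residues):
    """Sum the magnitudes of accepted H-bond energies and divide by residue count."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    accepted = [abs(e) for e in bond_energies if e < HBOND_CUTOFF]
    return sum(accepted) / n_residues


# Hypothetical sites of 20 residues each (illustrative values only).
loop_rich_site = [-1.0, -0.3, -2.0]   # few, weak backbone hydrogen bonds
helix_rich_site = [-2.0] * 16         # extended hydrogen-bonding network

loop_score = mean_hbond_energy_per_residue(loop_rich_site, 20)
helix_score = mean_hbond_energy_per_residue(helix_rich_site, 20)
print(loop_score < helix_score)
```

Under the postulate in the text, the site with the lower score would be read as the more ‘evolvable’ one.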

Based on the evolvability, we hypothesize a parsimonious condition (analogous to a Bayesian prior) for fold evolution: that is, folds evolved from states of lower to higher hydrogen bond energy. Hence, loop- and coil-rich protein structures, lacking secondary structure, would be located close to the root of the phylogenetic tree of electron-transfer folds. Using the secondary structure compositional vectors for each protein fold, a matrix of Euclidean distances was generated, and a phylogenetic tree of protein folds (figure 6) was calculated using the Fitch–Margoliash algorithm and global tree optimization, as provided by the PHYLIP package [30]. The Rieske fold was chosen as a root for building a monophyletic tree; however, it is impossible to prove that the actual evolution of protein folds occurred in a monophyletic fashion, and it most likely did not. For example, the higher midpoint potential of Rieske proteins suggests that these structures evolved after ferredoxin. Clearly, simple folds could have been recruited from other functions to become a catalytic part of oxidoreductases. However, the basic topology of the fold tree is potentially informative about the evolutionary history of electron-transfer reactions.
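The distance step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the fold names and (helix, sheet, loop) fractions are hypothetical, and the output layout (taxon count followed by ten-character names and a square matrix) is assumed to be the form read by PHYLIP's distance-based programs. The Fitch–Margoliash tree search itself is left to PHYLIP.

```python
import math

# Hypothetical (helix, sheet, loop) fractions; not values from the paper.
composition = {
    "rieske": (0.05, 0.15, 0.80),
    "ferredoxin": (0.30, 0.30, 0.40),
    "haem_p450": (0.70, 0.05, 0.25),
    "cu_zn_sod": (0.05, 0.65, 0.30),
}


def euclidean(u, v):
    """Euclidean distance between two composition vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(vectors):
    """Return fold names and the full pairwise distance matrix."""
    names = list(vectors)
    matrix = [[euclidean(vectors[a], vectors[b]) for b in names] for a in names]
    return names, matrix


def to_phylip(names, matrix):
    """Format a square matrix: taxon count, then padded names plus distances."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, matrix):
        lines.append(f"{name[:10]:<10} " + " ".join(f"{d:.4f}" for d in row))
    return "\n".join(lines)


names, matrix = distance_matrix(composition)
phylip_text = to_phylip(names, matrix)
print(phylip_text)
```

The printed text could then be saved as the input file for a distance-based tree program, with the loop-rich fold (here the hypothetical `rieske` entry) chosen as the root.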


7. Iron–sulfur proteins

Iron–sulfur-containing ferredoxin folds are located near the centre of the ternary plot, indicating relatively equal amounts of helix, sheet and loop composition. The location of iron–sulfur proteins on the ternary plot coincides with a high-evolvability hotspot, suggesting that ferredoxin folds might be the common ancestor of many extant protein folds, for a number of reasons.

First, iron–sulfur minerals are thought to have been relatively abundant in the early Archaean ocean, and it has been speculated that iron–sulfur clusters played an important role in the evolution of bioenergetic redox transduction systems [14]. These hypotheses postulate that the earliest biologically relevant redox reactions occurred on iron–sulfur mineral deposits associated with hydrothermal vents. Second, ferredoxin is found across the tree of life, either alone or as a domain in larger proteins, many of which are encoded by the core redox genes of life. Third, the sequence and structural symmetry of ferredoxins suggests that they may have evolved from a gene duplication event of a 28–30 amino acid sequence, each copy capable of binding one iron–sulfur cluster. Sequence analyses by Eck and Dayhoff revealed even shorter repeats of four amino acids, suggesting a prebiotic ‘protoferredoxin’ that was potentially composed of a primaeval subset of the 20 amino acids [22]. Fourth, all ferredoxins have a simple, conserved fold that binds two Fe4S4 clusters and is composed of 50–60 amino acids.
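To make the ternary placement discussed in this section concrete, the sketch below maps a (helix, sheet, loop) composition onto two-dimensional ternary coordinates and measures its distance from the equal-mixture centre. The vertex layout, function names and example compositions are assumptions chosen for illustration; they are not taken from figure 3.

```python
import math

SQRT3_2 = math.sqrt(3) / 2


def ternary_to_xy(helix, sheet, loop):
    """Map a (helix, sheet, loop) composition to 2D ternary-plot coordinates.

    Assumed vertex layout (not taken from the paper's figure):
    loop at (0, 0), helix at (1, 0), sheet at (0.5, sqrt(3)/2).
    """
    total = helix + sheet + loop
    if total <= 0:
        raise ValueError("composition must have a positive total")
    h, s = helix / total, sheet / total
    return h + 0.5 * s, SQRT3_2 * s


def distance_from_centre(helix, sheet, loop):
    """Distance of a composition from the equal-mixture point of the triangle."""
    x, y = ternary_to_xy(helix, sheet, loop)
    cx, cy = ternary_to_xy(1, 1, 1)
    return math.hypot(x - cx, y - cy)


# Hypothetical compositions, for illustration only.
mixed_fold = (0.35, 0.30, 0.35)
helix_rich_fold = (0.75, 0.05, 0.20)
print(distance_from_centre(*mixed_fold) < distance_from_centre(*helix_rich_fold))
```

A fold with roughly equal helix, sheet and loop content lands close to the centre, which is where the text places the ferredoxin folds.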

8. Molybdopterin and Rieske proteins

The lack of helix and sheet structure is characteristic of Rieske iron–sulfur-containing proteins and molybdopterin proteins. Although pterins may have been formed very early in the Earth’s history (as they are derived from GTP), it is unclear how these proteins carry out specific biological functions without a fixed helix or sheet secondary structure. Regardless, our composome analysis suggests that these folds have the highest potential to evolve (figure 5). Unlike nitrogenases, molybdopterin proteins, such as dimethyl sulfoxide (DMSO) reductase or nitrate reductase, have a molybdenum atom chelated by pterins and surrounded by a protein environment that lacks both α-helix and β-sheet. In the absence of Mo, tungsten can serve as a replacement owing to its similar chemical properties, and it is possible that tungsten-containing proteins gave rise to molybdenum-containing protein folds in a prebiotic world [31].

9. Molybdenum–iron–sulfur proteins

Reduction of N2 to NH3 is catalysed by nitrogenases, the modern forms of which contain molybdenum–iron–sulfur clusters. Molybdenum-containing nitrogenase folds are located close to ferredoxin folds in the ternary composome space, indicating their similarity in secondary structure composition, with mixed α-helix and β-sheet. Our composome analysis suggests that the Mo–Fe–S-cluster-containing folds may have evolved from Fe–S folds. Both iron–sulfur folds and molybdenum–iron–sulfur folds have α-helix and β-sheet elements with a significant loop content, adding flexibility to the core structure. As a result, these folds may have diversified by exploring different conformations and compositions, mainly diverging into two distinct clades: an α-helix-rich clade and a β-sheet-rich clade.

10. Rubredoxin, Mn and Cu proteins

The Fe atom is found alone or as a ligand complex with a porphyrin ring in most extant folds. Rubredoxin is a single-iron-containing fold with high β-sheet content. Manganese and copper folds also appear at the far edges of the composome space in the β-sheet clade. These folds form extensive hydrogen bond networks across the backbone amide and carbonyl groups, making the fold less evolvable. In return for this loss of evolvability, these folds gained high catalytic specificity and accuracy through their rigid structure.

11. Haem proteins

Haem-containing proteins are a hallmark of many electron-transfer reactions. Unlike rubredoxins, haem folds have an iron atom residing in a porphyrin ring surrounded by α-helices. The iron atom has octahedral coordination geometry, where axial positions are ligated to histidine side-chains. A surrounding porphyrin ring allows iron atoms to be incorporated into a protein scaffold with just one amino acid residue binding the iron directly. The usage of porphyrin rings may have relieved the stringent geometric requirement of iron coordination (as in rubredoxin), allowing an explosive diversification of haem folds. Haems are among the most abundant protein cofactors in oxidoreductases (figure 2) and are widely distributed across the composome ternary space (figure 3).

12. Concluding remarks

Life is dependent on the catalytic function of a relatively small set of core proteins. Within this core set, oxidoreductases play an outsized role, but their evolutionary history is poorly understood. Our brief analysis is consistent with the hypothesis that the evolution of the oxidoreductases is, as for all proteins, a trade-off between enzyme specificity and evolvability. Assuming that the evolutionary trajectory ran from a disordered state to an ordered state, proteins appear to have evolved from loop-rich into either α-helix-rich or β-sheet-rich structures, becoming more specific but less evolvable. The early split between the two major clades of oxidoreductases continued to follow a decrease in the free energy within the active site, a long-term displacement of internal entropy that coincides with an increased contact order of the core structure. The selection pressure that led to the increased hydrogen bonding energy concordant with increased redox potential in oxidoreductases may be an outcome of evolution, or the result of a positive feedback following the initial ignition of a self-sustaining non-equilibrium thermodynamic system. Regardless, the redox space explored by biology over the past two billion years appears to be extremely limited.

We conclude that life has evolved a very small set of key core catalysts, the genes of which are transmitted across vast expanses of geological time by microbes, allowing a fundamental set of electron-transfer reactions to permit energy extraction from an open system. The microbes themselves are temporary, disposable carriers of the core genes. They go extinct but transfer the functions onwards. How these catalysts came to be coupled in specific sequences and with other chemical reactions to obviate non-equilibrium systems remains a major challenge to understanding the origins and continuity of life on the Earth.

This research is funded by the Gordon and Betty Moore Foundation through Grant GBMF2807 to Paul Falkowski. We thank Doron Lancet for suggesting the term ‘composome’. We thank Yana Bromberg, Vikas Nanda, David A. Case and two anonymous reviewers for constructive comments.


References


1. Falkowski PG, Fenchel T, Delong EF. 2008 The microbial engines that drive Earth’s biogeochemical cycles. Science 320, 1034–1039. (doi:10.1126/science.1153213)

2. Braakman R, Smith E. 2012 The emergence and early evolution of biological carbon-fixation. PLoS Comput. Biol. 8, e1002455. (doi:10.1371/journal.pcbi.1002455)

3. Unden G, Bongaerts J. 1997 Alternative respiratory pathways of Escherichia coli: energetics and transcriptional regulation in response to electron acceptors. Biochim. Biophys. Acta 1320, 217–234. (doi:10.1016/S0005-2728(97)00034-0)

4. Dupont CL, Yang S, Palenik B, Bourne PE. 2006 Modern proteomes contain putative imprints of ancient shifts in trace metal geochemistry. Proc. Natl Acad. Sci. USA 103, 17 822–17 827. (doi:10.1073/pnas.0605798103)

5. Holm RH, Kennepohl P, Solomon EI. 1996 Structural and functional aspects of metal sites in biology. Chem. Rev. 96, 2239–2314. (doi:10.1021/cr9500390)

6. Metz S, Thiel W. 2011 Theoretical studies on the reactivity of molybdenum enzymes. Coord. Chem. Rev. 255, 1085–1103. (doi:10.1016/j.ccr.2011.01.027)

7. Dey A, Jenney Jr FE, Adams MWW, Babini E, Takahashi Y, Fukuyama K, Hodgson KO, Hedman B, Solomon EI. 2007 Solvent tuning of electrochemical potentials in the active sites of HiPIP versus ferredoxin. Science 318, 1464–1468. (doi:10.1126/science.1147753)

8. Williams RJP. 1990 Overview of biological electron-transfer. Adv. Chem. Ser. 226, 3–23. (doi:10.1021/ba-1990-0226.ch001)

9. Schlesinger KJ et al. 2012 The metallicity distribution functions of segue g and k dwarfs: constraints for disk chemical evolution and formation. Astrophys. J. 761, 160. (doi:10.1088/0004-637X/761/2/160)

10. The UniProt Consortium. 2012 Reorganizing the protein space at the universal protein resource (UniProt). Nucleic Acids Res. 40, D71–D75. (doi:10.1093/nar/gkr981)

11. Bernstein FC, Koetzle TF, Williams GJ, Meyer Jr EF, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M. 1977 The protein data bank. A computer-based archival file for macromolecular structures. Eur. J. Biochem. 80, 319–324. (doi:10.1111/j.1432-1033.1977.tb11885.x)

12. Mauzerall DC. 1990 The photochemical origins of life and photoreaction of ferrous ion in the archaean oceans. Origins Life Evol. Biosph. 20, 293–302. (doi:10.1007/BF01808111)

13. Braterman PS, Cairns-Smith AG, Sloper RW. 1983 Photo-oxidation of hydrated Fe2+ significance for banded iron formations. Nature 303, 163–164. (doi:10.1038/303163a0)

14. Wachtershauser G. 1988 Pyrite formation, the 1st energy-source for life: a hypothesis. Syst. Appl. Microbiol. 10, 207–210. (doi:10.1016/S0723-2020(88)80001-8)

15. Martin W, Russell MJ. 2003 On the origins of cells: a hypothesis for the evolutionary transitions from abiotic geochemistry to chemoautotrophic prokaryotes, and from prokaryotes to nucleated cells. Phil. Trans. R. Soc. Lond. B 358, 59–85. (doi:10.1098/rstb.2002.1183)

16. Cody GD. 2004 Transition metal sulfides and the origins of metabolism. Annu. Rev. Earth Planet. Sci. 32, 569–599. (doi:10.1146/annurev.earth.32.101802.120225)

17. Oparin AI. 2003 Origin of life. New York, NY: Dover Publications.

18. Hazen RM, Sverjensky DA. 2010 Mineral surfaces, geochemical complexities, and the origins of life. Csh Perspect. Biol. 2, a002162. (doi:10.1101/cshperspect.a002162)

19. Wachtershauser G. 1988 Before enzymes and templates: theory of surface metabolism. Microbiol. Rev. 52, 452–484.

20. Holland HD. 1984 The chemical evolution of the atmosphere and oceans. Princeton, NJ: Princeton University Press.

21. Sadekar S, Raymond J, Blankenship RE. 2006 Conservation of distantly related membrane proteins: photosynthetic reaction centers share a common structural core. Mol. Biol. Evol. 23, 2001–2007. (doi:10.1093/molbev/msl079)

22. Eck RV, Dayhoff MO. 1966 Evolution of the structure of ferredoxin based on living relics of primitive amino acid sequences. Science 152, 363–366. (doi:10.1126/science.152.3720.363)

23. Lorimer GH. 1981 The carboxylation and oxygenation of ribulose 1,5-bisphosphate: the primary events in photosynthesis and photorespiration. Annu. Rev. Plant Physiol. 32, 349–382. (doi:10.1146/annurev.pp.32.060181.002025)

24. Kim JD, Rodriguez-Granillo A, Case DA, Nanda V, Falkowski PG. 2012 Energetic selection of topology in ferredoxins. PLoS Comput. Biol. 8, e1002463. (doi:10.1371/journal.pcbi.1002463)

25. Harel A, Falkowski P, Bromberg Y. 2012 TrAnsFuSE refines the search for protein function: oxidoreductases. Integr. Biol. 4, 765–777. (doi:10.1039/c2ib00131d)

26. Kabsch W, Sander C. 1983 Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637. (doi:10.1002/bip.360221211)

27. Sippl MJ, Wiederstein M. 2012 Detection of spatial correlations in protein structures and molecular complexes. Structure 20, 718–728. (doi:10.1016/j.str.2012.01.024)

28. Tokuriki N, Tawfik DS. 2009 Protein dynamism and evolvability. Science 324, 203–207. (doi:10.1126/science.1169375)

29. Plaxco KW, Simons KT, Baker D. 1998 Contact order, transition state placement and the refolding rates of single domain proteins. J. Mol. Biol. 277, 985–994. (doi:10.1006/jmbi.1998.1645)

30. Felsenstein J. 1985 Phylogenies and the comparative method. Am. Nat. 125, 1–15. (doi:10.1086/284325)

31. Schoepp-Cothenet B, van Lis R, Philippot P, Magalon A, Russell MJ, Nitschke W. 2012 The ineluctable requirement for the trans-iron elements molybdenum and/or tungsten in the origin of life. Sci. Rep. 2, 263. (doi:10.1038/srep00263)