exposing relationships using directed evolution


Post on 30-Dec-2016






    Research Focus

    Exposing relationships using directed evolution

    Oliver J. Miller and Paul A. Dalby

    The Advanced Centre for Biochemical Engineering, Department of Biochemical Engineering, University College London, Torrington Place, London WC1E 7JE, UK

    Functionally related protein structures that have undergone significant mutagenesis and re-arrangement over a large evolutionary time-scale might no longer share enough sequence or structural similarity to be revealed by even the most advanced database searches. Recently, Christ and Winter used directed evolution to obtain functional variants of the RNA-hairpin-binding protein Rop. Using the functional sequences obtained, a structural database search revealed previously unknown similarity to the tRNA-binding region of valyl-tRNA synthetase.

    It is well established that proteins of both similar and unrelated function can have the same overall structural topology but statistically insignificant sequence homology. For example, human hemoglobin and lupine leghemoglobin have very similar tertiary structures but only 15.6% homology at the amino acid sequence level [1]. The extent to which sequences can be altered and yet achieve the same protein fold has been investigated with the directed evolution of a functional Src homology 3 (SH3) domain, using phage-displayed libraries containing a simplified alphabet of just five amino acids [2]. Sequence simplification was achieved at 40 of the 45 randomized non-peptide-binding residues, highlighting that, potentially, a protein could evolve to have a dramatically different sequence while retaining its structure and function. Consequently, we can expect an abundance of distantly related proteins with similar functions that are difficult to identify by comparison of their sequences alone.

    A fundamental aim of protein science is to develop a method to predict ab initio the folded structures of proteins from sequence data alone and, subsequently, to infer their function. The structure of many proteins can be identified by sequence homology with known protein structures, although this is not possible when a protein sequence has little or no significant homology to those in structure databases [3]. The recent report by Christ and Winter [4] demonstrates that directed evolution might bridge the gap to homology modeling for a subset of sequences that are related structurally and functionally but no longer have significant sequence homology.

    Protein engineering by directed evolution

    Over the past decade, directed evolution has become established as the leading method both for obtaining proteins with novel binding affinities and for altering the properties of enzymes [5,6]. It has been used to obtain enzyme activity [7] and to improve many properties of proteins, including binding affinities [8–10], enzyme activity [11,12], stability [13,14], substrate specificity [15,16], enantioselectivity [17] and protein expression [18]. The key to its success has been that it does not require comprehensive knowledge of protein structure and function. Successive rounds of random mutation and carefully designed selection or screening protocols identify improved proteins in a manner that mimics natural evolution processes.

    The screening or selection for new protein variants from a library of random mutants neatly avoids the requirement that currently hampers rational protein design (i.e. understanding the complex relationship between protein structure and function). Mutations that alter or improve protein function are frequently obtained; these would have been difficult to predict by sequence analysis or protein modeling. Interestingly, these unexpected mutations, alongside those rationalized more readily, might play a significant role in understanding better both structure–function relationships and, as Christ and Winter have demonstrated, the evolutionary relationships between proteins [4]. Furthermore, directed evolution often reveals divergence to more than one consensus sequence that results in the same overall protein structure and function [19,10], thus highlighting the potential difficulty in identifying the evolutionary link between two distantly related sequences.

    The ability of directed evolution to identify these changes has led to its increased use as a tool for identifying protein residues or structural elements with functional importance. For example, it has been used to identify residues that affect enzyme regulation [20], to obtain functional consensus sequences compatible with certain structural elements in proteins [19,10] and to identify peptide sequence motifs that interact with target proteins [21–23]. After detecting consensus sequence motifs that bind to a chosen target molecule, computational search tools can then be used to identify potential interaction partners. Using this method, protein interaction networks have been identified for SH3 domains that were then refined using two-hybrid screening [24].
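The motif-based search for interaction partners described above can be illustrated with a minimal Python sketch. It is not the tool used in the cited studies: the function names, the toy sequences and the simplified PxxP-style pattern (where x stands for any residue) are assumptions chosen purely for illustration.

```python
import re


def motif_to_regex(motif):
    """Turn a simple motif such as 'PxxP' into a regular expression.

    Lower-case 'x' is treated as a wildcard for any residue (an
    assumption of this sketch, not a standard enforced by any library).
    """
    return "".join("." if c == "x" else re.escape(c) for c in motif)


def scan_for_motif(sequences, motif):
    """Return {name: [1-based start positions]} for sequences containing the motif.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    hits = {}
    for name, seq in sequences.items():
        starts = [m.start() + 1 for m in pattern.finditer(seq)]
        if starts:
            hits[name] = starts
    return hits


# Hypothetical toy sequences; real searches would run over a sequence database.
toy_sequences = {
    "protA": "MSAPPLPPRNT",
    "protB": "MKTAYIAKQR",
    "protC": "PAAPGGPLLP",
}

hits = scan_for_motif(toy_sequences, "PxxP")
for name, starts in hits.items():
    print(name, starts)
```

Sequences that contain the motif become candidate interaction partners only; as the text notes, such candidates still need experimental refinement, for example by two-hybrid screening.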

    Directed evolution in bioinformatics

    Christ and Winter have extended the use of directed evolution to reveal an evolutionary relationship between two proteins that would have been difficult to identify using alternative current methods [4]. The consensus sequences obtained by directed evolution of the dimeric RNA-binding protein Rop, mapped to the RNA-binding helix structure, have been used to identify a distantly related enzyme, valyl-tRNA-synthetase (ValRS), with previously unknown structural and functional similarity to Rop. The two proteins have no significant sequence homology and only a search with alternative functional Rop sequences revealed the potential link to ValRS.

    Corresponding author: Paul A. Dalby (p.dalby@ucl.ac.uk).

    Update TRENDS in Biotechnology Vol.22 No.5 May 2004

    In their approach, Christ and Winter randomized five residues of Rop corresponding to the putative RNA-binding site within the N-terminal helix. A genetic complementation approach was then used to select active Rop variants. The basis of this system is a derivative of the naturally occurring ColE1 plasmid with the rop gene deleted. This deletion boosts the plasmid copy number and, consequently, increases the metabolic burden on the cell, which results in reduced growth rate. The increased copy number also raises the expression level of the plasmid-borne reporter gene LacZ. Clones from the library that expressed active variants of Rop in trans had their growth rates restored and were, therefore, enriched by growth selection in liquid media. Subsequent blue–white screening of colonies growing on X-Gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside) confirmed clones that expressed active variants of Rop: their lower levels of reporter gene expression colored them white. After three such rounds of selection and screening, the sequences of 28 active Rop variants were compiled and used to search a Protein-Data-Bank-derived database with the SPASM program [25]. All combinations of the obtained sequences were used in the search pattern, excluding positions at which mutations occurred only once. The search pattern included only the mutated residues and allowed a maximum root-mean-square deviation of 1 Å from their spatial arrangement in Rop. Initially, the inclusion of residue 25 returned only Rop as a match but its exclusion enabled six other proteins to be identified, of which ValRS was the only RNA-binding protein. This refinement of the search pattern seems to indicate that, in general, several versions of a search pattern might be required for efficient identification of hits with SPASM.
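The way a set of selected variants is turned into several search patterns can be sketched in Python. This is an illustrative reading of the procedure, not the SPASM input format or the authors' script: the position labels p1 to p5, the toy variant strings and the rule "keep a residue only if it is seen at least twice at a position" are all assumptions of the sketch.

```python
from collections import Counter
from itertools import product


def build_search_patterns(variants, positions, min_count=2, exclude=()):
    """Enumerate residue combinations across the randomized positions.

    variants  : equal-length strings, one letter per randomized position
    positions : labels for those positions (hypothetical labels here)
    min_count : residues seen fewer times than this at a position are
                dropped (an assumed reading of 'occurred only once')
    exclude   : position labels to leave out, to loosen the pattern
    """
    labels, allowed = [], []
    for i, pos in enumerate(positions):
        if pos in exclude:
            continue
        counts = Counter(v[i] for v in variants)
        kept = sorted(r for r, n in counts.items() if n >= min_count)
        if kept:  # a position with no recurring residue is left out
            labels.append(pos)
            allowed.append(kept)
    # One pattern per combination of allowed residues.
    return [dict(zip(labels, combo)) for combo in product(*allowed)]


# Hypothetical selected variants (five randomized positions each).
toy_variants = ["KNRFA", "KNRFG", "RNRFA", "RQRYG", "KQRFA"]
toy_positions = ["p1", "p2", "p3", "p4", "p5"]

full_patterns = build_search_patterns(toy_variants, toy_positions)
loose_patterns = build_search_patterns(toy_variants, toy_positions, exclude=("p5",))

print(len(full_patterns), "patterns with all positions")
print(len(loose_patterns), "patterns with p5 excluded")
```

Each resulting pattern would then be submitted to the structural search together with the spatial arrangement of the corresponding residues. Dropping a position, as was done with residue 25 in the study, yields a smaller and less restrictive set of patterns.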

    Having obtained a match to ValRS, the authors built a model of wild-type Rop bound to RNA, based on the ValRS structure and the synthetic Tar–Tar* RNA hairpin, for which a nuclear magnetic resonance (NMR) structure is available. The binding affinity of Tar–Tar* for Rop is similar to that of ColE1, the natural target of Rop, making it a reasonable RNA structure to use in the model. The model obtained was consistent with previous NMR and biochemical data for Rop. Comparison of the Rop–RNA model with the known structure of Rop in the absence of RNA enabled Christ and Winter to rationalize the RNA binding in terms of a ribose trap, in which a hydrogen bond between Arg-13 and Asn-10 of Rop is broken to form new contacts with the ribose of RNA [4].

    Concluding remarks

    Overall, these results demonstrate that using directed evolution and structure searches is a powerful new approach for identifying potential new evolutionary links between distantly related protein sequences. Furthermore, the identification of structural and functional similarity to a protein for which a liganded structure is available has enabled Christ and Winter to infer the mode of binding for their protein to a similar ligand. The similarities suggest a possible common evolutionary origin for Rop and ValRS, bearing in mind that most other tRNA synthetases (e.g. ArgRS) have different binding modes to RNA.

    The technique used in the SPASM program identifies only proteins containing the search motif and does not require matches outside this region. Looking beyond the RNA contact sites, the authors found that both ValRS and Rop contain a four-helix bundle. However, Rop is an antiparallel bundle formed by a homodimer, whereas ValRS is a monomeric bundle. Also, Rop binds two RNA molecules in a symmetrical manner, whereas ValRS binds only one tRNA molecule. Consequently, it is difficult to tell whether the evolutionary link between Rop and ValRS reflects divergent or convergent evolution. Despite this, many researchers should, surely, be revisiting the results of their directed evolution experiments to see whether they can reveal any further evolutionary links to functionally similar proteins.

    This work has broad implications for the study of protein evolution. Prediction of evolutionary relationships is currently limited to cases in which sequence or structural similaritie