NATURE BIOTECHNOLOGY VOLUME 25 NUMBER 3 MARCH 2007 297

Sequence-activity relationships guide directed evolution

Joelle N Pelletier & Robert Lortie

A new method for in vitro evolution integrates computational analysis and experimental screening.

Directed evolution to improve the properties of proteins faces a problem of numbers: for an average-sized protein, the sequence space to be explored is astronomically large, whereas practical screening capacity is limited to thousands or millions of variants. Virtual screening in silico can analyze a much higher fraction of the total number of variants and has considerable potential, but is not yet widely implemented. In this issue, Fox et al. (ref. 1) present an alternative: a combined computational and experimental method in which sequence space is sifted along multiple paths marked by experimental data points. Their approach, which incorporates a protein sequence-activity relationship (ProSAR) algorithm, promises to become a useful general strategy for evolving proteins with novel properties.
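To put rough numbers on this mismatch, the short Python sketch below counts sequences for a protein of 300 residues. The length of 300 is an illustrative assumption, not a figure from the article; the 20-letter amino acid alphabet is standard.

```python
from math import comb

LENGTH = 300      # assumed protein length (illustrative, not from the article)
ALPHABET = 20     # standard amino acids

# Every possible sequence of this length.
full_space = ALPHABET ** LENGTH

# Variants that differ from the parent at exactly one or exactly two positions.
single_mutants = LENGTH * (ALPHABET - 1)
double_mutants = comb(LENGTH, 2) * (ALPHABET - 1) ** 2

print(f"full sequence space: a number with {len(str(full_space))} digits")
print(f"single mutants: {single_mutants:,}")
print(f"double mutants: {double_mutants:,}")
```

Under this assumption, a screen of thousands to millions of variants can cover the single mutants and part of the double mutants, but nothing close to the full space.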

The authors apply ProSAR to address the challenging synthesis of the (3R,5S)-dihydroxyheptanoate side chain of atorvastatin, the active ingredient in the blockbuster cholesterol-lowering drug Lipitor. Their starting point, a halohydrin dehalogenase from Agrobacterium radiobacter, catalyzes the cyanation of ethyl (S)-4-chloro-3-hydroxybutyrate into the synthetic intermediate ethyl (R)-4-cyano-3-hydroxybutyrate, but with an efficiency under process conditions that is several thousand-fold too low to be economically viable. The authors’ investment in 18 rounds of directed evolution, each providing a modest improvement over the previous one, paid off handsomely: they achieved

Joelle N. Pelletier is at the Département de chimie and Département de biochimie, Université de Montréal, C.P. 6128, Succursale Centre-Ville, Montréal, Québec, Canada, H3C 3J7, and Robert Lortie is at the Biotechnology Research Institute, National Research Council, 6100 Royalmount Avenue, Montréal, Québec, Canada, H4P 2R2, and Département de chimie, Université de Montréal. e-mail: [email protected] [email protected]

[Figure 1 image not reproduced. Panel a: mutated sequences and their measured performance feed a partial least-squares regression, which postulates each mutation's effect on performance (include mutation 1; discard mutations 2 and 4; retest mutation 3); the result is a new backbone to which new mutations are added. Panel b: relative productivity (axis 0 to 4,000) plotted against generations (axis 0 to 15), with the target productivity marked.]

Figure 1 Mathematical analysis bolsters directed evolution. (a) Partial least-squares regression is used to predict the effects of individual mutations on the performance of enzyme variants each of which carries multiple mutations. Deleterious mutations are discarded, and favorable mutations are included for the next round of screening. Whenever the effect of a given mutation is uncertain, it is retested in a subsequent round. (b) The rewards of a small increase in enzyme productivity (~1.5-fold) per round become evident in the later rounds.

NEWS AND VIEWS

© 2007 Nature Publishing Group http://www.nature.com/naturebiotechnology


an impressive 4,000-fold increase in volumetric productivity under process conditions, thereby attaining commercial feasibility.

Enzyme engineering is typically tackled experimentally (ref. 2) or, more recently, computationally (refs. 3–5). What distinguishes the work of Fox et al. is that functional data and computation are combined in each round to predict which mutations to retain in the subsequent round. Here’s how it works. First, accumulate a stockpile of functional data from multiply mutated variants of your favorite enzyme. Feed the data to an algorithm that assigns a likelihood that a given mutation has a beneficial, neutral or detrimental effect on activity. Incorporate the beneficial mutations into a modified protein template. Add new sets of multiple mutations to the improved template, and repeat the process for additional rounds of evolution (Fig. 1a).
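One round of the loop described above can be sketched in a few lines of Python. This is a toy illustration only: the scoring step uses a plain difference of mean activities as a stand-in for the regression used by Fox et al., and the mutation labels, activity values and thresholds (`min_obs`, `margin`) are all hypothetical.

```python
from statistics import mean

def score_mutations(variants):
    """Estimate each mutation's effect as mean activity with it minus mean without it."""
    all_mutations = sorted(set().union(*(muts for muts, _ in variants)))
    scores = {}
    for mutation in all_mutations:
        with_it = [act for muts, act in variants if mutation in muts]
        without = [act for muts, act in variants if mutation not in muts]
        effect = mean(with_it) - mean(without) if without else None
        scores[mutation] = (effect, len(with_it))
    return scores

def triage(scores, min_obs=2, margin=0.5):
    """Sort mutations into include / discard / retest (thresholds are hypothetical)."""
    decisions = {}
    for mutation, (effect, n_obs) in scores.items():
        if effect is None or n_obs < min_obs or abs(effect) < margin:
            decisions[mutation] = "retest"      # too few data or effect too small to call
        elif effect > 0:
            decisions[mutation] = "include"     # carry into the next backbone
        else:
            decisions[mutation] = "discard"
    return decisions

# Hypothetical screening data: (mutations carried, activity relative to parent, log scale).
variants = [
    ({"m1", "m2"}, 0.2),
    ({"m1"}, 1.0),
    ({"m1", "m3"}, 0.9),
    ({"m3", "m4"}, -0.8),
    ({"m2", "m4"}, -1.5),
    ({"m2"}, -0.6),
    ({"m4"}, -0.9),
]

decisions = triage(score_mutations(variants))
new_backbone = sorted(m for m, d in decisions.items() if d == "include")
print(decisions)
print("mutations fixed in the next backbone:", new_backbone)
```

In a real campaign the next step would add fresh mutations on top of `new_backbone`, screen again, and repeat.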

The apparent secret of this method’s success lies in the use of partial least-squares regression (ref. 6). Similar statistical methods have been applied to simpler systems, such as small molecules (quantitative structure-activity relationship processes) and peptides and, to a limited extent, proteins (reviewed in ref. 7). Fox et al. successfully demonstrated the potential of partial least-squares regression for enzyme engineering. The ProSAR algorithm evaluates the contribution toward enzyme performance of individual mutations within multiply mutated sequences, randomizing the influence of these different molecular environments. The advantage is that the number of variables treated (e.g., number of individual mutations) can be greater than the number of data points (e.g., activity of a variant with multiple mutations).
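As a minimal sketch of how such a regression can be fitted when variables outnumber observations, the pure-Python code below fits a textbook single-response partial least-squares model (PLS1, with NIPALS-style deflation) to six hypothetical mutations measured in only four variants. It is not the ProSAR implementation; the data, the labels and the choice of two latent components are assumptions made for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pls1_fit(X, y, n_components):
    """Fit PLS1. X: 0/1 rows (variants x mutations); y: one activity per variant."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]   # centered X
    f = [v - y_mean for v in y]                                 # centered y
    components = []
    for _ in range(n_components):
        # Weights: covariance of each mutation column with the remaining signal.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:                                        # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(E[i], w) for i in range(n)]                    # latent scores
        tt = dot(t, t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = dot(f, t) / tt
        # Deflate X and y by what this component explains.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components

def pls1_predict(model, x):
    """Predict activity for one variant by replaying the deflation steps."""
    x_mean, y_mean, components = model
    r = [xi - m for xi, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        score = dot(r, w)
        y_hat += q * score
        r = [ri - score * li for ri, li in zip(r, load)]
    return y_hat

def mutation_effects(model, n_mutations):
    """Predicted change in activity when each mutation alone is added to the parent."""
    parent = [0] * n_mutations
    base = pls1_predict(model, parent)
    effects = []
    for j in range(n_mutations):
        single = parent.copy()
        single[j] = 1
        effects.append(pls1_predict(model, single) - base)
    return effects

# Hypothetical data: 4 variants (rows), 6 mutations (columns), log-scale activities.
MUTATIONS = ["m1", "m2", "m3", "m4", "m5", "m6"]
X = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 1, 0],
]
y = [2.0, 1.5, -1.0, -0.5]

model = pls1_fit(X, y, n_components=2)
effects = mutation_effects(model, len(MUTATIONS))
for name, effect in zip(MUTATIONS, effects):
    print(f"{name}: {effect:+.2f}")
```

Ordinary least squares has no unique solution for six coefficients from four observations; the partial least-squares fit sidesteps this by regressing on a small number of latent components instead.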

This powerful data analysis algorithm augments the number of times an individual mutation is assessed after activity screening of a manageable number of variants with multiple mutations, which minimizes the possibility of spurious conclusions about the effects of mutations considered on a limited number of occasions. By deconvoluting the effects of multiple mutations into the contributions of individual mutations, the algorithm focuses selective pressure on individual mutations rather than on the ensemble of mutations. The approach allows identification of beneficial mutations even from variants with reduced function, which would be rejected when using a purely experimental framework.

The ProSAR approach is different from, and in fact complementary to, the protein design automation algorithm of Dahiyat and Mayo (ref. 8), which computationally removes proteins predicted to fold poorly independent of any consideration of their activity. Like other directed evolution strategies, ProSAR can be applied generally, and it requires that the high-throughput assay conditions accurately reproduce the desired reaction conditions. Screening conditions that differ from the ultimate process conditions may bias partial least-squares regression and attribute functional effects to mutations incorrectly. The successful results of Fox et al. were dependent on labor- and cost-intensive functional assays performed according to three sequential protocols of increasing complexity and stringency with respect to process conditions.

As with other directed evolution strategies, approaches involving ProSAR will be more successful when prior functional information enables more targeted mutagenesis. The overall number of variants screened by Fox et al. was relatively high (500,000), and a large fraction of the mutations was directed rather than random, requiring substantial resources. The rather modest improvement associated with each round of evolution (~1.5-fold on average) meant that 18 rounds, requiring as many months, were needed to nudge the overall improvement into the economically viable range (Fig. 1b).
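The compounding behind these figures is easy to check. The sketch below uses only numbers quoted in the article (18 rounds, roughly 1.5-fold per round, 4,000-fold overall) and assumes that per-round gains multiply.

```python
ROUNDS = 18
PER_ROUND_GAIN = 1.5      # approximate average quoted in the article
OVERALL_GAIN = 4000       # overall improvement quoted in the article

# Total gain if every round delivered exactly 1.5-fold.
compounded = PER_ROUND_GAIN ** ROUNDS

# Constant per-round gain that would give exactly 4,000-fold in 18 rounds.
implied_per_round = OVERALL_GAIN ** (1 / ROUNDS)

print(f"1.5-fold for 18 rounds gives about {compounded:,.0f}-fold")
print(f"4,000-fold in 18 rounds implies about {implied_per_round:.2f}-fold per round")
```

An exact 1.5-fold gain per round compounds to roughly 1,500-fold, and a 4,000-fold total corresponds to just under 1.6-fold per round, so both are consistent with the article's approximate "~1.5-fold" figure. Either way, most of the absolute gain accrues in the last few rounds, as Fig. 1b shows.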

Nevertheless, although the outcome is proportional to the input of resources, the ability of the ProSAR approach to analyze the effects of multiple mutations in parallel and to rapidly identify beneficial mutations should provide an important advantage over purely experimental methods. In fact, the method could be considered a more efficient, mathematical alternative to genetic backcrossing (ref. 9). Importantly, although no detailed characterization of the improved catalysts was undertaken, in addition to increased activities, other improvements include better thermal resistance and reduced inhibition by the accumulating product, illustrating that the approach allows improvement of multiple traits.

A useful future extension of ProSAR-guided evolution would be one that took into account the effects of interacting mutations. In their study, Fox et al. assumed that the effect of each mutation is independent of any other mutations. This has indeed been demonstrated in some enzymes (ref. 10), but in many cases the effect of a mutation is context dependent (ref. 11). The good news is that interactions can in principle be accounted for by existing theory (ref. 6), although this would require greater computational power. Even in its current form, however, the ProSAR approach represents an important step toward “putting engineering back into protein engineering” (ref. 7) that will likely render directed evolution more rapid and efficient.
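One common way to represent interacting mutations in a regression, shown here as an assumption and not necessarily the formulation in ref. 6, is to add one extra variable per pair of mutations that is 1 only when both are present. The hypothetical helper below builds such a feature vector and shows how quickly the number of variables grows.

```python
from itertools import combinations
from math import comb

def with_pairwise_terms(present, all_mutations):
    """Return main-effect indicators followed by one indicator per mutation pair."""
    main = [1 if m in present else 0 for m in all_mutations]
    pairs = [
        1 if a in present and b in present else 0
        for a, b in combinations(all_mutations, 2)
    ]
    return main + pairs

# Hypothetical labels: a variant carrying m1 and m3 out of four candidate mutations.
all_mutations = ["m1", "m2", "m3", "m4"]
features = with_pairwise_terms({"m1", "m3"}, all_mutations)
print(features)

# Number of variables for a larger (hypothetical) pool of 40 candidate mutations.
pool = 40
print(pool, "main effects plus", comb(pool, 2), "pairwise terms")
```

The pairwise terms grow quadratically with the number of candidate mutations, which is one reason modeling interactions would demand more data and more computation than the independent-effects model.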

COMPETING INTERESTS STATEMENT

The authors declare that they have no competing financial interests.

1. Fox, R.J. et al. Nat. Biotechnol. 25, 338–344 (2007).
2. Hibbert, E.G. et al. Biomol. Eng. 22, 11–19 (2005).
3. Dwyer, M.A., Looger, L.L. & Hellinga, H.W. Science 304, 1967–1971 (2004).
4. Oelschlaeger, P. & Mayo, S.L. J. Mol. Biol. 350, 395–401 (2005).
5. Zanghellini, A. et al. Protein Sci. 15, 2785–2794 (2006).
6. Fox, R. J. Theor. Biol. 234, 187–199 (2005).
7. Gustafsson, C., Govindarajan, S. & Minshull, J. Curr. Opin. Biotechnol. 14, 366–370 (2003).
8. Hayes, R.J. et al. Proc. Natl. Acad. Sci. USA 99, 15926–15931 (2002).
9. Rothman, S.C. & Kirsch, J.F. J. Mol. Biol. 327, 593–608 (2003).
10. Aita, T., Iwakura, M. & Husimi, Y. Protein Eng. 14, 633–638 (2001).
11. Schmitzer, A.R., Lépine, F. & Pelletier, J.N. Protein Eng. Des. Sel. 17, 809–819 (2004).

Towards quantitative analysis of proteome dynamics

Sebastian Kühner & Anne-Claude Gavin

Integrating mass-spectrometry data across multiple samples promises to reveal the dynamics of proteomes.

Sebastian Kühner and Anne-Claude Gavin are at the EMBL, Structural and Computational Biology Unit, Meyerhofstrasse 1, D-69117 Heidelberg, Germany. e-mail: [email protected]

In this issue, Rinner et al.1 report an innovative strategy for the quantitative analysis of protein-protein interactions based on label-free liquid-chromatography mass spectrometry (LC-MS). Called MasterMap, the approach makes possible an unbiased and comprehensive depository of LC-MS features that will inspire future strategies for storing, exchanging, annotating and integrating mass-spectrometry data within and between laboratories. MasterMap also opens up interesting avenues for more systematic and global study of proteome dynamics.

Although the simultaneous profiling and quantification of thousands of different RNAs is now commonplace, the prospects for conducting similar analyses of proteins remain
