putting engineering back into protein engineering: bioinformatic approaches to catalyst design
TRANSCRIPT
Putting engineering back into protein engineering:bioinformatic approaches to catalyst designClaes Gustafsson�, Sridhar Govindarajan and Jeremy Minshull
Complex multivariate engineering problems are commonplace
and not unique to protein engineering. Mathematical and data-
mining tools developed in other fields of engineering have now
been applied to analyze sequence–activity relationships of
peptides and proteins and to assist in the design of proteins
and peptides with specified properties. Decreasing costs of
DNA sequencing in conjunction with methods to quickly
synthesize statistically representative sets of proteins allow
modern heuristic statistics to be applied to protein
engineering. This provides an alternative approach to
expensive assays or unreliable high-throughput surrogate
screens.
AddressesDNA 2.0, Inc., 1455 Adams Drive, Menlo Park, CA 94025, USA�e-mail: [email protected]
Current Opinion in Biotechnology 2003, 14:366–370
This review comes from a themed issue on
Protein technologies and commercial enzymes
Edited by Gjalt Huisman and Stephen Sligar
0958-1669/$ – see front matter
� 2003 Elsevier Ltd. All rights reserved.
DOI 10.1016/S0958-1669(03)00101-0
AbbreviationsNK1 neurokinin 1
NP non-polynomial
PLS partial least squares
IntroductionProtein engineering has classically been approached from
two diametrically opposed directions: rational design and
directed evolution. Rational design, in the tradition of
Descartes and Leibniz, attempts to understand protein
structure and function at a complete mechanistic level so
that any desired change can be effected by calculation
from first principles. Directed evolution, in the tradition
of John Locke and other empiricists, attempts to find
a desired solution by testing many different variants,
typically using various evolutionary based algorithms.
Both rational design and directed evolution in their
many alternative formats have shortcomings and advan-
tages that have been discussed and compared else-
where [1–3].
Modern heuristics applied to protein engineering is a
synthesis of empirical data and a rational analysis of that
information. The very first paper describing chemical
synthesis of a gene proposed that systematic variation
of amino acids would enable an understanding of the
relationships between the sequence of a protein and its
structure, physical behavior and activity [4]. Soon after
that, Svante Wold’s group developed and applied multi-
variate data analysis techniques to peptide design and
suggested that ‘the rapid development of protein engi-
neering may then make it possible to produce designed
sets of mature proteins and enzymes for QSAR studies’
[5,6]. This review will summarize recent publications in
which modern heuristics have been applied to protein
engineering and describes technological advances that are
enabling Wold’s vision.
Protein optimization from an engineeringperspectiveWhen faced with solving a difficult problem it can be
enlightening to see if a similar type of problem has been
solved before. Many disciplines and industries face the
same challenges of high system complexity and abun-
dant variables that confront protein engineering [7]. In
some industries increasing complexity is intentional, as
in the addition of new control parameters for a car’s
combustion engine. Sometimes it is inherent to the
system itself, for example, in clinical drug trials. The
common challenge in car manufacturing, clinical trials
and protein engineering is to account for as much of this
complexity as possible when describing the relationship
between input variables (e.g. piston angle and tempera-
ture for car engines, age and medical history for patients
or amino acid residues available at each position for pro-
tein engineering [8]) and output variables (e.g. exhaust
levels and fuel efficiency for cars, side effects and surv-
ival rate for patients or the desired commercial proper-
ties such as catalytic activity, thermostability, substrate
specificity and immunogenicity for protein engineering).
Measured output variables may in turn result from com-
binations of properties that are not explicitly measured;
for protein engineering, these may include expression
levels and protein solubility [9]. Like small-molecule
quantitative structure–activity relationships (QSAR),
which have enjoyed much success in pharmaceutical
development, heuristic protein engineering aims to
identify the relationship between input and output var-
iables to create biological macromolecules with defined
properties. For reasons described below, more work
has been published optimizing peptides than proteins
using engineering concepts. We therefore use peptide
examples to describe some of the principles before
describing how the same engineering tools are used to
optimize proteins.
366
Current Opinion in Biotechnology 2003, 14:366–370 www.current-opinion.com
Navigating in protein sequence spaceProtein engineering can be divided into two subtasks:
defining the solution space and defining the search
algorithm.
Define the solution space
The total possible number of proteins encoded by a 1 kb
gene is 20333 (20 alternative amino acids at each position
in a string of 333 residues) �10430. This is an unfeasibly
large number of variants to screen. Fortunately, not all
possible sequences need be considered as naturally occur-
ring proteins can usually be relied on to provide a starting
point for engineering efforts. Active point-mutants [10],
phylogenetic substitutions [11��], structural modeling
[12,13] and known immunogenic constraints [14] are
well-explored methods of targeting specific regions of a
protein for change.
Define the search algorithm
Protein engineering is a non-polynomial (NP)-complete
problem [15,16], meaning that the problem scales non-
polynomially with increasing complexity and no known
algorithm can guarantee determining the optimal solution
without evaluating all possible solutions. Empirical pro-
tein engineers have largely limited themselves to ad-
dress the NP-complete problem with exhaustive searches
using ultra-high-throughput phage and ribosome display
screens [17,18] or evolutionary methods [1–3,19]. By
contrast, the wider engineering community has exploited
genetic algorithms as well as regression-based algorithms,
neural nets, clustering, and several other tools as alter-
native techniques to address NP-complete problems [20].
Statistical targeting of amino acid changesComparisons of natural protein and DNA sequences,
particularly those using the powerful technique of prin-
cipal component analysis, can be used to identify residues
that are important for specific functionality within a
protein [21,22�,23�,24,25]. Natural substitution patterns
can also be used to infer which changes are likely to be
acceptable within functional proteins. For example, a
recent study of subtilisin variants found that all 52 of
the amino acid variations found in 15 homologs were
active within the context of at least one backbone; their
incorporation produced proteases with varying catalytic
properties [26�]. In another set of experiments, all of the
active-site residues from one fungal phytase were
replaced with those from another, again the result was
an active protein with altered catalytic properties [11��].By incorporating small numbers of changes identified
from alignments of naturally occurring sequences, it
has also been possible to increase the thermostability
of a fungal phytase by over 308C [27]. Substitution
matrices derived from synonymous and non-synonymous
substitution rates can also be used to choose reasonable
amino acid changes if there is insufficient phylogenetic
data to use sequence alignments [28–30,31�].
Multivariate design of improved polypeptidesFigure 1 shows a procedure for peptide optimization
derived from the one used by Norinder et al. [32] to
design analogs of the neuropeptide substance P with
increased affinity for the neurokinin 1 (NK1) receptor.
These authors used partial least squares (PLS) regression
[33,34] to correlate the sequences of 36 substance P
analogs with their activities. They used this model to
identify the positions and amino acid properties in sub-
stance P that had the largest effects on NK1 binding. The
authors designed, synthesized and tested six new pep-
tides that the model predicted to be improved NK1
binders. All six were shown to be highly active. Their
sequence–activity data was added to the first 36 peptides
to build a second generation PLS model, which was used
to design a further three variants. One of these had an
IC50 of 5 pM, 300-fold better than the wild-type peptide
and 45-fold better than the best of the original 36 variants
[32]. It is striking that extremely small numbers of var-
iants (45) were made and tested to achieve very signif-
icant improvements in the desired function.
The same techniques have also been applied to proteins.
In one particularly informative example, Bucht and col-
leagues optimized a complex protein phenotype: the
activity of acetylcholinesterase expressed on the surface
of human COS-1 cells. Display of acetylcholinesterase on
the cell surface occurs as a result of glycosyl phosphati-
dylinositol modification at the C terminus of the protein.
The authors identified two amino acids in the signal
peptide region of the protein, the identity of which
affected cell-surface localization of the protein. They
synthesized eight variant genes, tested the surface ex-
pression of the eight encoded proteins and used PLS to
Figure 1
Add new data to refine
sequence–activity model
Create initial set of variantsand measure desired phenotype
Build sequence–activity model
Design new variants based on modelpredictions for high performing sequences
Synthesize and test new variants
Current Opinion in Biotechnology
Polypeptide optimization using mathematical models. The process is
that used by Norinder et al. [32] for the optimization of the neuropeptide
substance P.
Bioinformatic approaches to catalyst design Gustafsson, Govindarajan and Minshull 367
www.current-opinion.com Current Opinion in Biotechnology 2003, 14:366–370
model the sequence–activity relationship. The authors
then constructed an additional 27 variants in this same
region of the protein, using them to test and refine the
model, thereby identifying the optimal sequence for cell-
surface expression of acetylcholinesterase [35��]. Mod-
eling sequence–activity relationships to identify optimal
protein variants has not been limited to amino acids
localized to a small region of a protein. Statistical analysis
of mutations distributed throughout several enzymes has
been used to identify the contributions of those changes
to function of the protein [36] and to predict the se-
quence with best function [37]. Mathematical sequence–
activity modeling has thus been validated at many scales
of complexity: from small molecules to peptides to loca-
lized regions of proteins to changes spread throughout
entire proteins.
Although there is a growing body of work in which
sequence–activity relationships are used to design im-
proved peptides [5,6,38,39], application of the same
methods to protein/biocatalyst engineering is still in its
infancy. One reason for this has been the difficulty in
producing large numbers of modified molecules [40�]; in
contrast to peptides, proteins cannot easily be synthesized
directly. As technology improves, the synthesis of indi-
vidually designed genes becomes increasingly cost-
effective [41�,42]. Testing variants taken from libraries
that are even cheaper to produce is also likely to produce
useful sequence–activity relationships [43�].
Experimental design of maximallyinformative datasetsAnother useful statistical tool with its origins in other
engineering disciplines is that of experimental design.
This is a technique by which a variant set is designed to
contain the maximum amount of information for sub-
sequent analysis of sequence–activity data [44]. Using
D-optimal design, Mee et al. [45] designed, synthesized
and tested a training set of 60 analogs of a 15 amino acid
antibacterial peptide. A regression-based model derived
from the sequence–activity correlation of the 60 data-
points was used to design and synthesize 39 new peptides
predicted to have improved activity. The best designed
peptide was twice as potent as the best one in the training
set. In their selection of acetylcholinesterase variants,
Bucht et al. [35��] also used experimental design to choose
the eight gene variants that would best represent the
sequence variation they were exploring.
Accounting for amino acid interactionsIf an amino acid change at one position affects the
functional consequences of changing other amino acids
in a protein, predictive sequence–function models must
account for this. A model that incorporates amino acid
interactions requires more data than one that assumes that
the amino acids act to achieve the same quality of model
[40�,46��]. In studies of antigen–antibody binding [40�]
and ligand–receptor binding [47��], researchers found
that very few interaction terms (and thus very little
additional data) were needed to produce accurate descrip-
tions of the sequence–activity relationship.
Recent work from Husimi’s group suggests that this result
is also true for proteins. Individual amino acid changes
contributing to specific properties of dihydrofolate reduc-
tase [36], thermolysin and prolyl endopeptidase [37] are
approximately independent. Of particular interest is a
recent study in which only two of 14 randomly generated
mutations that increased prolyl endopeptidase thermo-
stability appeared to be interdependent. The authors’
model contained a single interaction term to account for
this residue pair. A gene variant containing the pair pre-
dicted to interact was synthesized and tested; its activity
was shown be as predicted by the model. Only 45 gene
variants were needed to accurately model the activities of
16 384 possible sequence combinations [46��].
Heuristic methods are becoming morewidespreadOther successful examples of heuristic approaches to
analyze and optimize biological systems include the
optimization of peptidase I using neural networks [48],
calculations of individual amino acid contributions to
serine protease inhibitor activity [49��], PLS-based pre-
diction of the determinants of protein localization [50,51],
and protein contact map and interaction site prediction
using neural networks [52]. In work complementing
modeling to assess the contributions of small numbers
of changes at many positions, sequence–activity relation-
ships have been derived using PLS to quantitate the
effects of multiple amino acid substitutions at single
positions in haloalkane dehalogenase, T4 lysozyme, sub-
tilisin and tryptophan synthase. These methods have also
been used to determine the physicochemical properties
required at identified positions to confer specific enzyme
properties [53]. Furthermore, the same tools have been
used to systematically characterize the substrates for a set
of haloalkane dehalogenase variants to determine the
effects of amino acid changes on substrate specificity of
the enzyme [54].
Conclusions: drivers for changeBy casting the protein engineering problem as an opti-
mization problem common to other engineering disci-
plines, we are able to exploit many different problem
solving algorithms. Gone are the technological barriers
to synthesizing statistically representative datasets. As
Wold predicted in 1986, the capture of protein sequence–
activity relationships nowpermits thedesign ofoptimized
proteins.
There are several drivers for applying modern engineer-
ing tools to protein engineering. Firstly, the human gen-
ome project, microarrays and other recent large scientific
368 Protein technologies and commercial enzymes
Current Opinion in Biotechnology 2003, 14:366–370 www.current-opinion.com
endeavours have changed biology from a ‘one variable at a
time’ science to a science engulfed in variables. Secondly,
statistical tools developed and deployed in a variety of
engineering areas can now be operated by non-statisticians
from any desktop computer. Finally, the cost of generating
and sequencing statistically representative sets of genes is
continuously decreasing.
It is striking that by measuring the contribution of amino
acid variations to the function of a protein, sequence–
activity modeling requires orders of magnitude fewer
variants to be tested to design improved sequences than
the numbers screened using widespread directed evolu-
tion techniques. This is important, because methodologies
that rely upon screening large sample sets are vulnerable
to the weakness that high-throughput screens often turn
out to have limited ability to measure the protein proper-
ties that are really important [2,19,40�]. Heuristic meth-
odologies may therefore permit protein engineers to test
fewer variants under conditions that more closely approx-
imate their final intended applications and reduce the time
and resources that are often spent in building and imple-
menting imprecise high-throughput screens.
AcknowledgementsOne of us (CG) began this manuscript while employed at Maxygen Inc.We thank Maxygen for their support.
References and recommended readingPapers of particular interest, published within the annual period ofreview, have been highlighted as:
� of special interest��of outstanding interest
1. Tobin MB, Gustafsson C, Huisman GW: Directed evolution: the‘rational’ basis for ‘irrational’ design. Curr Opin Struct Biol 2000,10:421-427.
2. van Regenmortel MH: Are there two distinct research strategiesfor developing biologically active molecules: rational designand empirical selection? J Mol Recognit 2000, 13:1-4.
3. Ryu DD, Nam DH: Recent progress in biomolecular engineering.Biotechnol Prog 2000, 16:2-16.
4. Nambiar KP, Stackhouse J, Stauffer DM, Kennedy WP, EldredgeJK, Benner SA: Total synthesis and cloning of a gene coding forthe ribonuclease S protein. Science 1984, 223:1299-1301.
5. Hellberg S: A Multivariate Approach to QSAR. PhD thesis. Umea,Sweden: University of Umea: 1986.
6. Hellberg S, Sjostrom M, Skagerberg B, Wold S: Peptidequantitative structure-activity relationships, a multivariateapproach. J Med Chem 1987, 30:1126-1135.
7. Gustafsson C, Govindarajan S, Emig R: Exploration of sequencespace for protein engineering. J Mol Recognit 2001, 14:308-314.
8. Sandberg M, Eriksson L, Jonsson J, Sjostrom M, Wold S: Newchemical descriptors relevant for the design of biologicallyactive peptides. A multivariate characterization of 87 aminoacids. J Med Chem 1998, 41:2481-2491.
9. Lin Z, Thorsen T, Arnold FH: Functional expression ofhorseradish peroxidase in E. coli by directed evolution.Biotechnol Prog 1999, 15:467-471.
10. Glieder A, Farinas ET, Arnold FH: Laboratory evolution of asoluble, self-sufficient, highly active alkane hydroxylase.Nat Biotechnol 2002, 20:1135-1139.
11.��
Lehmann M, Lopez-Ulibarri R, Loch C, Viarouge C, Wyss M,van Loon AP: Exchanging the active site between phytases foraltering the functional properties of the enzyme. Protein Sci2000, 9:1866-1872.
Demonstration that residues identified as functionally important (in thiscase the entire active site) can be moved from one protein backbone toanother, leading to functionally novel catalysts.
12. Looger LL, Dwyer MA, Smith JJ, Hellinga HW: Computationaldesign of receptor and sensor proteins with novel functions.Nature 2003, 423:185-190.
13. Kwasigroch JM, Gilis D, Dehouck Y, Rooman M: PoPMuSiC,rationally designing point mutations in protein structures.Bioinformatics 2002, 18:1701-1702.
14. Tangri S, LiCalsi C, Sidney J, Sette A: Rationally engineeredproteins or antibodies with absent or reduced immunogenicity.Curr Med Chem 2002, 9:2191-2199.
15. Pierce NA, Winfree E: Protein design is NP-hard. Protein Eng2002, 15:779-782.
16. Lathrop RH: The protein threading problem with sequenceamino acid interaction preferences is NP-complete. Protein Eng1994, 7:1059-1068.
17. Hanes J, Pluckthun A: In vitro selection and evolution offunctional proteins by using ribosome display. Proc Natl AcadSci USA 1997, 94:4937-4942.
18. Wells JA, Lowman HB: Rapid evolution of peptide and proteinbinding properties in vitro. Curr Opin Biotechnol 1992,3:355-362.
19. Ness JE, del Cardayre SB, Minshull J, Stemmer WP: Molecularbreeding: the natural approach to protein design. Adv ProteinChem 2000, 55:261-292.
20. Johnson DS, McGeoch LA: The traveling salesman problem: acase study in local optimization. In Local Search in CombinatorialOptimization. Edited by Aarts EHL, Lenstra JK, Aarts EL: John Wiley& Sons Ltd; 1997:215-310.
21. Casari G, Sander C, Valencia A: A method to predict functionalresidues in proteins. Nat Struct Biol 1995, 2:171-178.
22.�
del Sol Mesa A, Pazos F, Valencia A: Automatic methods forpredicting functionally important residues. J Mol Biol 2003,326:1289-1302.
Excellent comparison of methods available to identify residues thatcontribute to protein function.
23.�
Gogos A, Jantz D, Senturker S, Richardson D, Dizdaroglu M,Clarke ND: Assignment of enzyme substrate specificity byprincipal component analysis of aligned protein sequences: anexperimental test using DNA glycosylase homologs.Proteins 2000, 40:98-105.
Principal component analysis of small numbers of proteins used toidentify residues likely to be involved in substrate specificity deter-mination.
24. Suzuki Y, Gojobori T: A method for detecting positiveselection at single amino acid sites. Mol Biol Evol 1999,16:1315-1328.
25. Jonsson J, Norberg T, Carlsson L, Gustafsson C, Wold S:Quantitative sequence-activity models (QSAM) — tools forsequence design. Nucleic Acids Res 1993, 21:733-739.
26.�
Govindarajan S, Ness JE, Kim S, Mundorff EC, Minshull J,Gustafsson C: Systematic variation of amino acid substitutionsfor stringent assessment of pairwise covariation. J Mol Biol2003, 328:1061-1069.
Fifty-two phylogenetically identified substitutions in subtilisins areaccepted into one enzyme backbone, modifying its activity. Most naturalchanges that occur together are shown to be a result of descent from acommon ancestor and not a result of functional constraints.
27. Lehmann M, Loch C, Middendorf A, Studer D, Lassen SF,Pasamontes L, van Loon AP, Wyss M: The consensus concept forthermostability engineering of proteins: further proof ofconcept. Protein Eng 2002, 15:403-411.
28. Benner SA, Cohen MA, Gonnet GH: Amino acid substitutionduring functionally constrained divergent evolution of proteinsequences. Protein Eng 1994, 7:1323-1332.
Bioinformatic approaches to catalyst design Gustafsson, Govindarajan and Minshull 369
www.current-opinion.com Current Opinion in Biotechnology 2003, 14:366–370
29. Wu TD, Brutlag DL: Discovering empirically conserved aminoacid substitution groups in databases of protein families.Proc Int Conf Intell Syst Mol Biol 1996, 4:230-240.
30. Adenot M, Sarrauste de Menthiere C, Chavanieu A, Calas B,Grassy G: Peptides quantitative structure-functionrelationships: an automated mutation strategy to designpeptides and pseudopeptides from substitution matrices. J MolGraph Model 1999, 17:292-309.
31.�
Dimmic MW, Rest JS, Mindell DP, Goldstein RA: rtREV: an aminoacid substitution matrix for inference of retrovirus and reversetranscriptase phylogeny. J Mol Evol 2002, 55:65-73.
A substitution matrix for maximum likelihood phylogenetic analysis isdeveloped that is optimized on a subset of sequences. Substitutionmatrices are unique for each sequence subset.
32. Norinder U, Rivera C, Unden A: A quantitative structure-activityrelationship study of some substance P-related peptides. Amultivariate approach using PLS and variable selection.J Pept Res 1997, 49:155-162.
33. Sandberg M: Deciphering Sequence Data, a Multivariate Approach.PhD thesis. Umea: Umea University: 1997.
34. Geladi P, Kowalski BR: Partial least squares regression: atutorial. Anal Chim Acta 1986, 186:1-17.
35.��
Bucht G, Wikstrom P, Hjalmarsson K: Optimising the signalpeptide for glycosyl phosphatidylinositol modification ofhuman acetylcholinesterase using mutational analysis andpeptide-quantitative structure-activity relationships.Biochim Biophys Acta 1999, 1431:471-482.
PLS and experimental design are used to optimize acetylcholinesterase,increasing its surface expression on cells threefold.
36. Aita T, Iwakura M, Husimi Y: A cross-section of the fitnesslandscape of dihydrofolate reductase. Protein Eng 2001,14:633-638.
37. Aita T, Uchiyama H, Inaoka T, Nakajima M, Kokubo T, Husimi Y:Analysis of a local fitness landscape with a model of the roughMt. Fuji-type landscape: application to prolyl endopeptidaseand thermolysin. Biopolymers 2000, 54:64-79.
38. Strom MB, Haug BE, Rekdal O, Skar ML, Stensen W, Svendsen JS:Important structural features of 15-residue lactoferricinderivatives and methods for improvement of antimicrobialactivity. Biochem Cell Biol 2002, 80:65-74.
39. Eriksson L, Jonsson J, Hellberg S, Lindgren F, Skagerberg B,Sjostrom M, Wold S: Peptide QSAR on substance P analogues,enkephalins and bradykinins containing L- and D-amino acids.Acta Chem Scand A 1990, 44:50-55.
40.�
Choulier L, Andersson K, Hamalainen MD, van Regenmortel MH,Malmqvist M, Altschuh D: QSAR studies applied to the predictionof antigen-antibody interaction kinetics as measured byBIAcore. Protein Eng 2002, 15:373-382.
Multivariate analysis applied to sequence optimization and reactionconditions.
41.�
Hoover DM, Lubkowski J: DNAWorks: an automated method fordesigning oligonucleotides for PCR-based gene synthesis.Nucleic Acids Res 2002, 30:e43.
The shape of things to come. Gene synthesis gets cheaper and easier.
42. Holowachuk EW, Ruhoff MS: Efficient gene synthesis by Klenowassembly/extension-Pfu polymerase amplification (KAPPA) ofoverlapping oligonucleotides. PCR Methods Appl 1995,4:299-302.
43.�
Abecassis V, Pompon D, Truan G: High efficiency family shufflingbased on multi-step PCR and in vivo DNA recombination inyeast: statistical and functional analysis of a combinatoriallibrary between human cytochrome P450 1A1 and 1A2.Nucleic Acids Res 2000, 28:E88.
One of many library synthesis methods. Interesting analysis of variants inwhich hybridization signals instead of known sequence changes are usedas input variables for modeling.
44. Hellberg S, Eriksson L, Jonsson J, Lindgren F, Sjostrom M,Skagerberg B, Wold S, Andrews P: Minimum analogue peptidesets (MAPS) for quantitative structure-activity relationships.Int J Pept Protein Res 1991, 37:414-424.
45. Mee RP, Auton TR, Morgan PJ: Design of active analogues of a15-residue peptide using D-optimal design, QSAR and acombinatorial search algorithm. J Pept Res 1997, 49:89-102.
46.��
Aita T, Hamamatsu N, Nomiya Y, Uchiyama H, Shibanaka Y,Husimi Y: Surveying a local fitness landscape of a protein withepistatic sites for the study of directed evolution. Biopolymers2002, 64:95-105.
A model of only 45 prolyl endopeptidase variants accurately predicts theactivities of combinations of 14 different mutations. Only one interactionterm in required in the model.
47.��
Prusis P, Lundstedt T, Wikberg JE: Proteo-chemometricsanalysis of MSH peptide binding to melanocortin receptors.Protein Eng 2002, 15:305-311.
Statistically representative sets of melanocortin peptide and chimericreceptors were analyzed. Models incorporated linear and interactionterms; predictions were externally validated.
48. Schneider G, Schrodl W, Wallukat G, Muller J, Nissen E,Ronspeck W, Wrede P, Kunze R: Peptide design by artificialneural networks and computer-based evolutionary search.Proc Natl Acad Sci USA 1998, 95:12179-12184.
49.��
Lu SM, Lu W, Qasim MA, Anderson S, Apostol I, Ardelt W, Bigler T,Chiang YW, Cook J, James MN et al.: Predicting the reactivity ofproteins from their sequence alone: Kazal family of proteininhibitors of serine proteinases. Proc Natl Acad Sci USA 2001,98:1410-1415.
The conclusion of an heroic 20 year study. By synthesizing and testing<200 variants, activities of many natural proteinases can be accuratelypredicted.
50. Sjostrom M, Wold S, Wieslander A, Rilfors L: Signal peptide aminoacid sequences in Escherichia coli contain information relatedto final protein localization. A multivariate data analysis.EMBO 1987, 6:823-831.
51. Schein AI, Kissinger JC, Ungar LH: Chloroplast transit peptideprediction: a peek inside the black box. Nucleic Acids Res 2001,29:E82.
52. Fariselli P, Pazos F, Valencia A, Casadio R: Prediction of protein–protein interaction sites in heterocomplexes with neuralnetworks. Eur J Biochem 2002, 269:1356-1361.
53. Damborsky J: Quantitative structure-function and structure-stability relationships of purposely modified proteins.Protein Eng 1998, 11:21-30.
54. Marvanova S, Nagata Y, Wimmerova M, Sykorova J, Hynkova K,Damborsky J: Biochemical characterization of broad-specificity enzymes using multivariate experimental designand a colorimetric microplate assay: characterization of thehaloalkane dehalogenase mutants. J Microbiol Methods 2001,44:149-157.
370 Protein technologies and commercial enzymes
Current Opinion in Biotechnology 2003, 14:366–370 www.current-opinion.com