Int. J. Peptide Protein Res. 24, 1984, 219-296
A potential function for conformational analysis of proteins
G.M. CRIPPEN and V.N. VISWANADHAN
Department of Chemistry, Texas A & M University, College Station, Texas, USA
Received 15 November 1983, accepted for publication 15 March 1984
We have devised a residue-residue potential function for low-resolution protein conformational calculations. The interactions between residues near in sequence maintain correct secondary structure, while the long-range terms in the potential govern the larger packing features and overall globularity. The short-range terms were calculated by comparing the observed distributions of distances between Cα atoms in 35 protein crystal structures to the expected distributions and assigning the discrepancies to a Boltzmann distribution due to an effective potential. Long-range terms were adjusted to ensure that the crystal structure of bovine pancreatic trypsin inhibitor has a lower total energy than perturbed conformations of the same molecule. Thus the empirical potential function implicitly contains solvation and conformational entropy effects along with the usual van der Waals and electrostatic energies. Extensive testing of the potential on trypsin inhibitor and other proteins establishes that it is generally applicable to small proteins; it does not attempt to compress or expand the conformations found by X-ray crystallography; standard secondary structural features are maintained under the potential; and there are so many local minima that local minimization can be trusted to return a perturbed structure to the native conformation only if they differ initially by less than 1 Å.
Key words: amino acid residue; conformation; energy embedding; potential energy; protein
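The short-range derivation summarized in the abstract (observed versus expected distance distributions, with the discrepancy ascribed to a Boltzmann distribution) can be sketched as follows. This is only an illustration under stated assumptions: the function name `effective_potential`, the histogram counts, the binning, and the choice of energy units (`kT = 1`) are all hypothetical and are not taken from the paper.

```python
import math

def effective_potential(observed, expected, kT=1.0):
    """Per-bin energy e = -kT * ln(p_obs / p_exp) from two distance histograms.

    observed, expected: counts per distance bin (same binning assumed).
    Bins with a zero count in either histogram yield None (no estimate).
    """
    n_obs, n_exp = sum(observed), sum(expected)
    energies = []
    for o, e in zip(observed, expected):
        if o == 0 or e == 0:
            energies.append(None)
        else:
            energies.append(-kT * math.log((o / n_obs) / (e / n_exp)))
    return energies

# Hypothetical counts for four distance bins (not data from the paper).
observed_counts = [10, 40, 30, 20]
expected_counts = [25, 25, 25, 25]
energies = effective_potential(observed_counts, expected_counts)
print(energies)
```

Bins that are populated more often than expected receive a negative (favorable) energy, and under-populated bins receive a positive one.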
One of the standard approaches for attempting to calculate the tertiary structure of a globular protein, given only its sequence, is to search for conformations of low calculated energy. Recent advances in the distance geometry approach, called energy embedding (1, 2), enable us to find very rare, low energy conformations, possibly subject to additional distance constraints. For general application of energy embedding to protein conformational calculations, the energy function must have the following characteristics: (i) the molecule must be represented as a collection of points in space, where each point may represent a single atom or some group of atoms; (ii) the energy is calculated as a sum of pairwise interaction terms between all pairs of points; (iii) each term depends isotropically on the distance between the points and on which points they are; and (iv) each term's functional dependence on distance may have only a single minimum (unimodality). The need for each of these requirements is clear from an examination of the energy embedding algorithm (1), and will not be discussed here. In order for the potential to be applied successfully to many different proteins, we must further require that (v) the identity of the interacting points in the third requirement can only be specified on the basis of chemical type (atom type, residue type, etc.) and/or relative position in the amino acid sequence.
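A minimal sketch of an energy function satisfying requirements (i) to (v) is given here for orientation. Everything specific in it is an assumption made for illustration: the quadratic well `unimodal_term`, the parameter table `params`, and the coordinates are hypothetical and are not the functional forms or values derived in this paper.

```python
import math
from itertools import combinations

def unimodal_term(d, d0, k):
    """Hypothetical single-minimum term: a quadratic well centred at d0."""
    return k * (d - d0) ** 2

def total_energy(coords, residue_types, params):
    """Sum isotropic pairwise terms over all pairs of residue points.

    params maps (type_a, type_b, sequence_separation) -> (d0, k), so a term
    is selected only by chemical type and relative sequence position.
    """
    energy = 0.0
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        key = (*sorted((residue_types[i], residue_types[j])), j - i)
        d0, k = params[key]
        energy += unimodal_term(d, d0, k)
    return energy

# Three residue points with made-up types, coordinates, and parameters.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
types = ["A", "G", "A"]
params = {("A", "G", 1): (3.8, 1.0), ("A", "A", 2): (6.0, 0.5)}
print(total_energy(coords, types, params))
```

Because each term depends only on an interpoint distance and on a key built from residue type and sequence separation, the same parameter table can be reused across different proteins.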
The purpose of this paper is to present a generally useful methodology for finding a potential of the above form given crystal structures of a number of proteins, and to examine the suitability of the potential we have produced for protein conformational calculations. A number of potential functions for protein energy calculations have been published, but none fulfills all the design criteria listed above. The one-point-per-atom potentials, such as that of Momany et al. (3), are not strictly sums of pairwise interactions because of the torsional terms they employ. We are also interested in speeding up the calculations by adopting a considerably simplified model of the protein. The function of Pincus & Scheraga (4) takes a step in that direction by maintaining an all-atom representation only for residues near each other in space. Unfortunately, even the one-point-per-residue polyvaline potential of McCammon (5) and its predecessor (6) have torsional terms about the virtual bonds. Our earlier effort along these lines (7) does not foster good secondary structure, and some of the terms are not unimodal functions of the distance between the interacting residues. Thus we have been forced to devise a new potential adhering strictly to the design criteria.
The first requirement concerns the precision of representation of the protein molecule. Since all pairwise interactions must be included in the calculation of the energy for a given conformation, it is obviously much faster to deal with fewer points. Also, as an empirical rule, the fewer the points, the fewer the local minima in the potential. Therefore we have chosen not an all-atom representation, or a sidechain point and a backbone point per residue, but simply one point per amino acid residue, centered at the Cα. The penalty is that fine details of structure will not be represented, and there may be errors on a scale even larger than the size of the atom clusters used. Without some explicit sidechain points, the isotropic interpoint terms depending only on distance (according to the third requirement) cannot distinguish between a right-handed and a left-handed α helix, for example. The potential could, however, show a preference for two right-handed helices packed together over the packing of a right-handed one and a left-handed one. Thus it would be possible to arrive at the native conformation except for an overall mirror inversion of the entire molecule, but the relative chirality of the various parts of the molecule could be correctly deduced, in principle. Since our present interest lies in calculating tertiary structure of proteins, we do not anticipate that lumping all the atoms of an entire residue together will be too great an approximation.
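The mirror-image ambiguity described above follows directly from the fact that reflection leaves every interpoint distance unchanged. The short check below illustrates this; the helix-like coordinates (radius 2.3, 100° per residue, rise 1.5) are hypothetical values chosen only to give four non-coplanar points, and the helper names are assumptions.

```python
import math
from itertools import combinations

def pair_distances(coords):
    """All interpoint distances, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]

def mirror(coords):
    """Reflect every point through the yz-plane (x -> -x)."""
    return [(-x, y, z) for x, y, z in coords]

def signed_volume(a, b, c, d):
    """Triple product (b-a) . ((c-a) x (d-a)); its sign flips on reflection."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    w = [d[k] - a[k] for k in range(3)]
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return sum(u[k] * cross[k] for k in range(3))

# Four consecutive points on a hypothetical helix.
helix = [
    (2.3 * math.cos(math.radians(100 * i)),
     2.3 * math.sin(math.radians(100 * i)),
     1.5 * i)
    for i in range(4)
]
reflected = mirror(helix)
print(pair_distances(helix))
print(pair_distances(reflected))
print(signed_volume(*helix), signed_volume(*reflected))
```

Any energy built solely from these distances therefore assigns the same value to a structure and its mirror image, even though the signed volume (a chirality measure) changes sign.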
The fourth requirement forced us to abandon the Oobatake & Crippen (7) potential, because some terms had as many as three minima with respect to the distance between the interacting Cα's. Energy embedding with this potential nevertheless gave such encouraging results (2) that we felt obliged to devise a more suitable function.
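One simple way to test a tabulated interaction term against the unimodality requirement is to count its interior local minima on a distance grid. The sketch below assumes a uniformly sampled curve; the two test curves are hypothetical and are not the Oobatake & Crippen terms.

```python
import math

def count_local_minima(values):
    """Count strict interior local minima of a sampled 1-D curve.

    Flat-bottomed minima (equal neighbouring samples) are not counted.
    """
    return sum(
        1
        for prev, cur, nxt in zip(values, values[1:], values[2:])
        if cur < prev and cur < nxt
    )

distances = [i / 10 for i in range(0, 91)]  # 0.0 to 9.0 in steps of 0.1

# Hypothetical unimodal term: a single quadratic well at d = 5.
single_well = [(d - 5.0) ** 2 for d in distances]

# Hypothetical multimodal term with wells at d = 1.5, 4.5 and 7.5.
triple_well = [math.cos(2 * math.pi * d / 3) for d in distances]

print(count_local_minima(single_well), count_local_minima(triple_well))
```

A term passes this check only if the count is one; the grid spacing must be fine enough to resolve the narrowest well of interest.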
The second requirement turns out to cause a great deal of trouble. We conjecture that whenever the number of pairwise unimodal interaction terms in the potential exceeds the number of conformational degrees of freedom, there will be many local minima. In our experience, the number of local minima rises very rapidly with the number of points (and hence energy terms). For example, consider points labeled 1 and 2 constrained to lie on the x-axis, and let $d_{ij} = |x_i - x_j|$. If the potential F consists simply of quadratic terms, then
$$F = (1 - d_{12})^2$$
and there is clearly only a single minimum (at $|x_1 - x_2| = 1$), a single interaction term, and one degree of conformational freedom. If we now add a third point and change F to
$$F = (1 - d_{12})^2 + (1 - d_{23})^2 + (2 - d_{13})^2$$
then there are three terms in F and two conformational degrees of freedom. In all there are three minima: F(0, 1, 2) = 0, F(0, 5/3, 4/3) = 4/3, and F(0, -1/3, 4/3) = 4/3. The situation quickly becomes more complicated in three dimensions, but we have observed a high density of local minima for systems of many points with even the simplest of unimodal interaction terms.

sequentially adjacent quartets of residue points) for all i. Physically it is clear that there must be some sort of energy term for residues with greater sequence separation, although such effects could be built into possibly complicated functional forms for the short-range interaction terms, in principle. We instead chose to keep all interaction terms in the energy in spite of there being insufficient conformational degrees of freedom to make a one-to-one assignment between terms and conformational parameters. That left us with considerable ambiguity as to how the total energy should be apportioned among the many terms, and we have used that latitude to keep the functional forms of the terms simple and unimodal.
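The three-point example on the x-axis can be checked numerically. The sketch below assumes the potential has the form F = (1 - d12)^2 + (1 - d23)^2 + (2 - d13)^2 with d_ij = |x_i - x_j|, which reproduces the three quoted minima; the grid-neighbour test `is_local_minimum` and its step size are illustrative choices, and x1 is held at 0 to remove the trivial translation.

```python
def F(x1, x2, x3):
    """Three quadratic, unimodal pair terms for points on a line."""
    d12, d23, d13 = abs(x1 - x2), abs(x2 - x3), abs(x1 - x3)
    return (1 - d12) ** 2 + (1 - d23) ** 2 + (2 - d13) ** 2

def is_local_minimum(x2, x3, h=1e-3):
    """With x1 fixed at 0, compare F against its eight grid neighbours."""
    centre = F(0.0, x2, x3)
    return all(
        F(0.0, x2 + dx, x3 + dy) > centre
        for dx in (-h, 0.0, h)
        for dy in (-h, 0.0, h)
        if (dx, dy) != (0.0, 0.0)
    )

# The three minima quoted in the text, as (x2, x3) with x1 = 0.
candidates = [(1.0, 2.0), (5 / 3, 4 / 3), (-1 / 3, 4 / 3)]
for x2, x3 in candidates:
    print(x2, x3, F(0.0, x2, x3), is_local_minimum(x2, x3))
```

Each individual term has a single minimum in its own distance, yet the sum has several minima because three terms compete for only two degrees of freedom. Mirror images of these arrangements (all coordinates negated) have the same energies and are not listed separately.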
A second difficulty with having more interaction terms than degrees of freedom arises when trying to deduce the energy function causing observed conformations. Suppose the set of conformational parameters, $\{\phi_i\}$, i = 1, ..., N (representing bond lengths, angles, dihedral angles, etc.), is a necessary and sufficient set of parameters to completely describe a molecule's conformation. Further suppose that the $\phi$s are mutually independent in the global sense that any particular $\phi_i$ can assume its full range of values regardless of the values of the other $\phi_{i+k}$. This would not be true, for example, if the $\phi$s represented interpoint distances. Then suppose we can make a large number of observations on the molecule in an equilibrium system. Estimating for all i the probability of $\phi_i$ taking on its various possible values, $p(\phi_i)$, by the observed normalized frequency, we can use standard equilibrium statistical mechanics to conclude
$$p(\text{conformation}) = \prod_{i=1}^{N} p(\phi_i) = \frac{1}{Z} \exp\left[-\sum_{i=1}^{N} e_i(\phi_i)/kT\right]$$
where Z is the partition function. Because of the simple correspondence between statistically independent $p(\phi_i)$s and additive energy terms, $e_i(\phi_i)$s, it is easy to calculate the total probability of any particular conformation, p(conformation), and ascribe its frequency of occurrence to a Boltzmann distribution over the energy function. Given a sufficiently large, unbiased set of observations, each $e_i$ can be deduced for all values of $\phi_i$, for all i. Unfortunately, we are obliged to work with interpoint (interresidue) distances as our conformational variables. Not only are there fundamental geometric interdependencie