
Conformational Analysis of Proteins: Algorithms and Data Structures for Array Processing

C. Pottle Department of Electrical Engineering, Cornell University, Ithaca, New York 14853

M. S. Pottle and R. W. Tuttle Department of Chemistry, Cornell University, Ithaca, New York 14853

R. J. Kinch Department of Electrical Engineering, Cornell University, Ithaca, New York 14853

H. A. Scheraga Department of Chemistry, Cornell University, Ithaca, New York 14853

Received October 11, 1979

Current efforts to determine the nature of the interactions that influence protein folding involve, among other things, minimization of an appropriate empirical conformational energy function (ECEPP, Empirical Conformational Energy Program for Peptides) to obtain the native structure. Because of the prohibitive cost of such a massive computational project, either on a conventional large-scale machine at a self-supporting installation or on a dedicated minicomputer, an alternative computer hardware system has been developed to aid in the conformational analysis of proteins. It consists of a Floating Point Systems AP-120B array processor and a Prime 350 minicomputer host. A version of ECEPP has been adapted to run on the AP-120B. The data structures and algorithms chosen for this version reflect the highly unusual parallel architecture of this machine. Benchmark comparisons with BPTI (Bovine Pancreatic Trypsin Inhibitor), a protein of 58 residues and known structure, have been carried out on this system as well as on an IBM 370/168. They show a significant advantage in speed for the AP-120B/Prime 350 system as well as a substantially lower cost. An energy minimization of BPTI with 154 variable dihedral angles is reported, an effort heretofore prohibited by the computer costs involved.

INTRODUCTION

Extensive work is underway in many laboratories to determine the nature of the interactions that dictate how a polypeptide chain folds into the unique conformation of a native globular protein. The experiments of Anfinsen and co-workers1-3 led to the concept that the native conformation is one of lowest free energy of the system, and computational efforts are being devoted to minimizing an appropriate empirical conformational energy function to obtain the native structure. The ingredients of this empirical function include the interactions that are known to determine the structures of small molecules and are assumed to apply equally well to larger protein molecules. The background of this problem and current approaches to its solution have recently been reviewed elsewhere.

Two major difficulties have made this problem insoluble so far: one an intrinsic difficulty and the other one of methodology. The intrinsic difficulty arises from the existence of many minima in the multidimensional conformational energy surface of the protein, the multiple-minimum problem.5 Its solution will probably require (1) various approximate procedures to reach the correct potential energy well and (2) minimization within this potential well to reach the minimum. Both require extensive computation, which constitutes the second difficulty, that of methodology. Whatever approach is ultimately taken to solve the multiple-minimum problem,5 the computational effort amounts to an evaluation of all pairwise interatomic interaction energies to compute the conformational energy. For a molecule of N atoms this involves a computation of the order of N^2 interaction energies.

Journal of Computational Chemistry, Vol. 1, No. 1, 46-58 (1980) © 1980 by John Wiley & Sons, Inc.

The cost of such a massive computational project, either on a conventional large-scale machine at a self-supporting installation or on a dedicated minicomputer, is prohibitive. In this article we describe the development of a low-cost alternative for this and other problems. Its hardware consists of a Floating Point Systems (FPS) AP-120B array processor and a Prime 350 minicomputer host. The array processor (AP) is a high-speed computing device that requires a conventional computer as a host and can operate more efficiently on organized arrays of data than on scalars. The AP can be applied most successfully to computational problems for which (1) the data structure can take advantage of the hardware architecture, (2) several arithmetic operations can be carried out on each item of data fetched from its memory, and (3) there is minimal need for communication between the AP and the host. An earlier report8 described the application of an AP-120B with a large mainframe host (IBM 370/168) to a problem (a Monte Carlo computation) similar to those considered here.

We demonstrate here the feasibility of the new hardware system for calculating the conformational energy of a large polypeptide or protein. We describe the algorithm for ECEPP used to calculate the conformational energy of a protein, the elements of the AP/host system, and the manner in which this system is used in ECEPP calculations. For illustrative purposes we consider bovine pancreatic trypsin inhibitor (BPTI), a protein of small size (58 residues) and known three-dimensional structure, and compute its total energy and the derivatives of its energy with respect to each of 154 variable dihedral angles. These quantities are required for the energy-minimizing routine MINOP, currently in use in our laboratory. For comparison of the time and cost for one evaluation of the energy and one set of derivatives, the same quantities are calculated on the IBM 370/168 computer.

SUMMARY OF ECEPP

The FORTRAN program for ECEPP has been deposited with the Quantum Chemistry Program Exchange (QCPE) with extensive documentation to aid in its use.* From a survey of the structural literature on amino acids and peptides a set of standard geometries (bond lengths and bond angles) was deduced for all the naturally occurring amino acids. The peptide group is usually fixed in the planar trans conformation, except for peptide bonds preceding proline, in which both trans and cis conformations are considered. The justification for these restrictions on the geometry when the conformational energy is calculated has been presented elsewhere. The program uses matrix multiplication methods to generate the Cartesian coordinates of the atoms for any desired conformation of a polypeptide, defined by the standard geometry and by the dihedral angles for rotation about the single bonds of the backbone (φ, ψ, and ω) and side chains (χ1, χ2, etc.); φ, ψ, and ω refer to rotation about the N-Cα, Cα-C', and C'-N bonds, respectively; ω is fixed in most computations. These dihedral angles are the independent variables in the calculation of the conformational energy.

ECEPP is then used to compute the conformational energy of the specified conformation. This energy consists of torsional energies (in the form of periodic functions) for rotation about the single bonds of most side chains and the interaction energies for all atom pairs whose interatomic distance can vary. (The effect of hydration may also be included as an additive contribution to the conformational energy.13) The parameters for the torsional energies were obtained from experimental torsional barriers of small molecules and the parameters for interatomic interactions from crystal structures of small molecules. Because of the large number of interacting atom pairs in a large polypeptide, the evaluation of their interaction energies requires most of the total computation time in ECEPP. The relations used to compute the interaction energy for atom pair ij are the following:

* The computer program for ECEPP, all associated geometric and energy parameters, and an instructional manual for its use are available on magnetic tape from the Quantum Chemistry Program Exchange. Write to Quantum Chemistry Program Exchange, Chemistry Department, Room 204, Indiana University, Bloomington, Indiana 47401, for standard program request forms and then order No. QCPE 286.

(1) Electrostatic:

Uel = 332.0 qi qj / (D rij)    (1)

where qi and qj are atomic partial charges in electronic charge units, D is the dielectric constant, rij is the internuclear distance in angstrom units, and 332.0 is a factor to convert to energy units of kcal/mole.

(2) Nonbonded (modified Lennard-Jones 12-6 potential):

UNB = εkl [F (r0,kl/rij)^12 - 2 (r0,kl/rij)^6]    (2)

where the coefficients εkl and r0,kl are assigned specific values for each combination of atom types k and l; εkl is the depth of the energy minimum, and r0,kl is the corresponding internuclear distance. The quantity F has a value of 1/2 for 1-4 interactions, namely, those between atoms separated by three single bonds, and a value of 1 for 1-5 interactions, namely, those between atoms separated by four or more single bonds.

(3) Hydrogen-bonded [substituted for eq. (2) when appropriate]:

UHB = εkl [5 (r0,kl/rij)^12 - 6 (r0,kl/rij)^10]    (3)

where εkl and r0,kl are corresponding specific coefficients for the two interacting atoms in the hydrogen bond, namely, the donor H atom and the acceptor O or N atom.

Calculation of the total interaction energy [eqs. (1)-(3)] between one pair of atoms is designated as the “inner loop” of an ECEPP calculation. The summation over all pair interactions in the “inner loop” constitutes the “outer loop” of an ECEPP calculation. Calculations of the torsional energy and the artificial loop-closing potential to form disulfide bonds are carried out separately.

Most of the computation time of ECEPP is spent in identifying atom types and selecting charges, ε's, and r0's, deciding whether the interactions are 1-4 or 1-5 and whether they involve hydrogen bonds, and carrying out the arithmetic required by eqs. (1)-(3). This arithmetic includes the computation of r2 from the Cartesian coordinates of the atoms and the square root of the reciprocal of this quantity for use in eq. (1).

In minimizing the ECEPP energy, the MINOP routine is used. MINOP requires analytical derivatives of the total energy with respect to the variable dihedral angles.

ECEPP has been applied to a variety of computational problems which include terminally blocked single residues and oligopeptides, cyclic peptides, and fibrous and globular proteins. These problems are summarized elsewhere.

ELEMENTS OF THE SYSTEM HARDWARE CONFIGURATION

The exact configuration of the computer system represents a compromise among the types of conformational energy calculation envisaged, the requirement for high-speed execution of ECEPP-type programs on the AP, and the available funding (see Acknowledgments). The following considerations led to the choice of the FPS AP-120B and Prime 350.

AP-120B

Details of the architecture of an AP are discussed elsewhere.14 The AP-120B is a satellite processor supported by a host computer. It consists of a collection of functional units which include a floating-point adder, a floating-point multiplier, two blocks of 32 high-speed registers for storing intermediate results, an arithmetic logic unit for 16-bit integers, and three memory units, one of which is devoted to storage of program instructions. The other two are used for storage of data. A floating-point number is represented by a 10-bit binary exponent and a 28-bit fraction; this representation yields a precision of about eight decimal digits and a range of 10^-150 to 10^+150. The functional units may be connected by the program for communication through several different data paths. Individual bits in a 64-bit instruction word of the program activate the functional units and cause the data paths to be connected from the output of one unit to the inputs of one or more units. One instruction may thus activate several functional units, interconnecting them so that several seemingly unrelated operations are being performed in parallel. This machine, which may legitimately be called a "parallel computer," is programmed by microinstructions coded directly by the user or generated from a FORTRAN program by a compiler. The machine can execute these instructions (each of which can initiate one floating-point addition and one multiplication) at the rate of 6/μsec. The result is a computer that will operate at a maximum rate of six additions and six multiplications per microsecond, or 12 million floating-point operations per second (12 megaflops). This computation rate places the AP-120B in a class with the CDC-7600 and makes it considerably faster than the IBM 370/168. Its chief limitations are a limited data communication rate with the host computer and the difficulty involved in writing optimized machine-language code for it. The cost of the typical AP-120B ranges between $60,000 and $120,000. In many benchmark tests of recent computers on scientific problems the AP-120B had the smallest cost/computation-rate ratio, usually by an order of magnitude.15

To estimate the rate at which ECEPP could be run on an AP-120B we used timings provided by FPS for the execution of some of its optimized library routines for vector operations (e.g., vector addition and vector square root). The inner loop of ECEPP was laid out as a series of vector operations, more than 80% of which were contained in the FPS subroutine library. By estimating the timing of the remaining 20% we concluded that the AP-120B would run ECEPP in this mode at least as fast as the IBM 370/168. We further estimated that it could be speeded up by a factor of 3-5 more if the entire inner loop were carefully coded in assembly language. Data transfer time with the host computer, one of the weak points of the array processor, did not appear to be a problem with ECEPP. On the basis of these estimates the AP-120B was selected. As shown in the Results section, the speedup factor turned out to be at least 6.

The next decision that had to be made was the amount of main data memory required for our computations. Although 64K words of AP-120B main data memory are obtainable without a memory bank-switching option, we selected a 32K main data memory because ECEPP requires only a modest amount of data storage (about nine words per atom). It was estimated that 32K data words would suffice for proteins of about 200 residues, sufficient for our purposes for several years. Main data memory is available with two cycle times: standard speed, which can be accessed every other instruction cycle, and high speed, which can be accessed every cycle. High-speed memory was considerably more expensive than the standard-speed version at the time of purchase. The inner loop of ECEPP involves so much computation that data are requested at a rate below the capability of standard-speed memory. Because high-speed memory offered no increase in computational rate, standard speed was selected.
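The sizing argument can be checked with a few lines of arithmetic. The atoms-per-residue density below is taken from this paper's BPTI figures (886 atoms in 58 residues); the function name is ours, and this is a back-of-envelope sketch, not ECEPP's actual memory map.

```c
#include <stdio.h>

/* Rough data-memory requirement: about nine data words per atom,
   extrapolating BPTI's atom density (886 atoms / 58 residues) to a
   protein of the given number of residues. */
static int words_needed(int residues) {
    double atoms_per_residue = 886.0 / 58.0;          /* ~15.3 */
    int atoms = (int)(residues * atoms_per_residue + 0.5);
    return 9 * atoms;                                 /* ~9 words per atom */
}
```

For 200 residues this gives roughly 27.5K words, comfortably inside the 32K-word main data memory chosen above.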

Programs in the AP execute from the program source memory, which is separate from the main data and table data memories. Program source memory is expensive because of its speed and is limited in any case to 4K 64-bit words. Because we originally intended to hand-code our programs in assembly language, we began with only 512 words of program source memory, which we estimated to be sufficient. With the subsequent availability of a FORTRAN compiler for the AP-120B (which, however, is not so efficient as our hand-generated code and requires considerably more storage) we added 2K additional words of program source memory. If the compiler were used extensively, we would generate enough code to justify expansion to the 4K-word limit.

FPS offers a high-speed auxiliary memory (table data memory) in addition to main data memory. It has the distinct advantage of being accessible simultaneously and in parallel with the main data memory. As usually supplied, the AP-120B contains 2K words of read-only table data memory, which primarily contains tables of lookup constants for the circular functions and a division algorithm. We added 1K words of read/write table data memory. Half of it stores the quantities ε and r0 of the Lennard-Jones function for 16 atom types. The other half contains constants for the piecewise quadratic approximation of 1/r when r2 is given. The presence of this additional 1K of writable table data memory is a key to the efficient operation of the ECEPP algorithm.


Prime 350 Host Computer

The AP-120B is provided with both hardware and software to interface with a variety of large and small host computers. The selection of the host was based on a compromise between available funds, the need to accommodate several users in a time-sharing environment (in addition to one using the AP), and the need to provide the computational services offered by a large machine such as the IBM 370/168, although with longer running time. This last requirement, in particular, eliminated from consideration all standard minicomputers with a maximum program size of 64K bytes. A machine that meets these criteria acceptably is the Prime 350.

This minicomputer is one of the smallest and least expensive supporters of a true virtual-memory, time-sharing environment for several users. Programs of 768K bytes can execute on this machine, although the execution speed depends strongly on the number of users logged onto the system, the amount of high-speed memory present, and the access time of the disk used for holding data from the high-speed memory not currently being used (the paging device). Financial considerations limited us to 192K bytes of high-speed memory and a paging and file storage device that consisted of two 6-megabyte cartridge disks. Seven asynchronous lines serviced four nearby terminals, a dial-up port, a connection to a Prime 400 in another building, and a 120-character/sec hard-copy terminal used primarily as a line printer. This host computer is presently adequate for the protein conformational energy calculations foreseeable in the near future. Its principal shortcomings, which will be rectified in the future, are a lack of disk storage space, too little memory for efficient time sharing, and a lack of convenient access to an offline bulk-storage medium such as magnetic tape. The connection to the nearby Prime 400 (which does have a magnetic tape unit) has enabled us to transfer our earlier FORTRAN programs and protein data bases from magnetic tape to Prime 350 disk files. New programs and data can, of course, be entered and edited at any of the four cathode-ray-tube terminals and the results printed on the line printer.

ADAPTATION OF ECEPP TO THE AP

To recode the ECEPP program to run on the AP/host system it was necessary (1) to identify the inner computational loop that would profit most by assignment to the AP, (2) to develop for this inner loop a computational data structure and algorithm that would be optimized for each other and for the architecture of the AP, (3) to decide how much of the entire program should be executed in the AP, and (4) to identify and isolate those parts of the program (the "setup" phase) that convert user input data (such as amino acid sequence and initial dihedral angles) into the data structure (such as Cartesian coordinates, ε's, r0's, and charges) used by the inner computational loop (in the subsequent "execution" phase). These considerations are now discussed in order.

Inner-Loop Algorithm (Execution Phase)

An inner loop of ECEPP had already been optimized in assembly language for the IBM 370/168. It performs the calculation of r2 between a pair of atoms and then the contribution of this pair to the electrostatic and nonbonded (or, alternatively, hydrogen-bonded) energy [eqs. (1)-(3)].

It was necessary to make a change in this algorithm because the AP-120B hardware does not contain an instruction for division. The AP-120B, however, does possess a useful hardware facility for lookup in tables of 128 or fewer words; we have therefore incorporated a piecewise quadratic approximation in the program to provide 1/r (for calculation of the electrostatic energy) when r2 is given, thus using one sequence of operations to obtain both a reciprocal and a square root. The powers of 1/r used in the calculation of the nonbonded energy are then built up by multiplication.

A change that was not made was the "vectorization" of the inner loop by computing a vector of 1/rij for all j > i, followed by vector element-by-element products and sums, with all the elements finally summed to total the energy. Vectorization of this inner loop would lead to excessive temporary storage in a rather slow memory as well as inefficient use of the floating-point arithmetic units.* Instead, the parallel features of the architecture were used to process the interaction of atom i with five others at once (instead of only one), where each of the five (atoms j to j - 4) is at a different stage of the energy computation; for example, one of the instructions of the inner loop, namely, the fourth, carries out the following operations when it is executed:

(1) Accesses main data memory for the x coordinate of atom j.
(2) Calculates zi - zj for the z coordinate of atom j - 1.
(3) Does nothing for atom j - 2.
(4) Accesses table data memory for ε and r0 for atom j - 3.
(5) Multiplies (r0/r)^4 (which was calculated in an earlier step) by (r0/r)^2, and saves the electrostatic energy updated for atom j - 4 in a high-speed register.

Thus two floating-point operations (2 and 5) and accesses to two memories (1 and 4) are initiated by this single instruction. This one instruction has accomplished part of the calculation (steps 1-5) of the interaction energy between atom i and atoms j to j - 4; that is, by carrying out the computations in parallel, five atom pairs are being treated at once in the AP.

It would have been desirable to have a single set of inner-loop instructions to handle all types of interaction. This objective, however, was not realized because the decision logic required to differentiate between 1-4 and 1-5 interactions added instructions to the inner loop. Instead, the relatively few 1-4 interactions were identified and treated separately from the 1-5 interactions. The result is an inner loop containing 14 instructions that handle the calculation of the electrostatic energy and 1-5 nonbonded energy (both hydrogen-bonded and otherwise). These 14 instructions initiate 13 multiplications and 11 additions for a computational rate of [(13 + 11)/14] × 6 flops/μsec, or 10.3 megaflops, that is, a high percentage (86%) of the theoretical maximum of 12 megaflops. This high efficiency is achievable because the energy computation of ECEPP can be arranged in a data structure particularly suited to the architecture of the AP.

* The feasibility study mentioned earlier did adopt this approach in order to use the timings of FPS vector library routines. The code that was actually written without vectorization of the inner loop is estimated to have speeded up the calculation by a factor of 6.
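The five-stage scheme of steps (1)-(5) can be modeled abstractly. The toy below is emphatically not AP microcode: it only shows the bookkeeping by which five consecutive atom pairs occupy different stages of one simulated "instruction", and that the pipelined accumulation reproduces a plain sequential sum. The function name and the uniform per-stage share of work are our simplifications.

```c
#define STAGES 5

/* Toy model of the five-deep software pipeline: at simulated
   instruction t, pair t enters stage 0 while pairs t-1 .. t-4 occupy
   stages 1 .. 4. Each stage contributes an equal share of a dummy
   per-pair energy e[j], so after the pipeline drains the total equals
   a plain loop's sum. */
static double pipelined_sum(const double *e, int n) {
    double total = 0.0;
    /* n + STAGES - 1 instructions: fill, steady state, drain */
    for (int t = 0; t < n + STAGES - 1; t++) {
        for (int s = 0; s < STAGES; s++) {
            int j = t - s;                  /* pair occupying stage s     */
            if (j < 0 || j >= n) continue;  /* pipeline fill/drain bubble */
            total += e[j] / STAGES;         /* stage s's share of pair j  */
        }
    }
    return total;
}
```

In steady state every instruction completes one pair, which is the source of the [(13 + 11)/14] utilization figure; the fill and drain iterations correspond to the discarded computations discussed under ACAL overhead below.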

Inner-Loop Data Structure

To support a computational rate of 10.3 megaflops, however, the data structure had to be chosen carefully. Because the AP-120B does not require use of its integer arithmetic unit for addressing purposes when accessing main data memory sequentially (and therefore releases it for use in another parallel operation in the same instruction), the Cartesian coordinates and charge of each atom are stored sequentially in this memory. The atom type, a four-bit integer, is packed into the least significant bits of the floating-point number that represents the charge. There it constitutes an insignificant "noise" in the computation of the electrostatic energy. The atom type is easily recovered by a masking operation, which extracts the least significant bits from the number representing the atomic charge. The atom type provides an address offset in a lookup table to obtain its values of ε and r0. Hydrogen bonding is signaled by using a negative value of εkl for the appropriate k,l pair.
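The packing trick can be demonstrated with IEEE-754 single precision as a stand-in for the AP-120B's 38-bit word (the AP's 28-bit fraction makes the "noise" there even smaller than here, where overwriting the 4 low bits of a 23-bit fraction perturbs the charge by at most 15 ulps, a relative error near 2^-19). Function names are ours.

```c
#include <stdint.h>
#include <string.h>

/* Pack a 4-bit atom type into the least significant mantissa bits of
   the float holding the atomic charge; the perturbation is negligible
   noise in the electrostatic energy. */
static float pack_type(float charge, unsigned type) {
    uint32_t bits;
    memcpy(&bits, &charge, sizeof bits);
    bits = (bits & ~0xFu) | (type & 0xFu);  /* overwrite 4 low bits */
    float out;
    memcpy(&out, &bits, sizeof out);
    return out;
}

/* Recover the atom type by masking off the low mantissa bits. */
static unsigned unpack_type(float packed) {
    uint32_t bits;
    memcpy(&bits, &packed, sizeof bits);
    return bits & 0xFu;
}
```

The recovered type then serves as the offset into the ε/r0 lookup table, exactly as described above.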

The remaining problem concerns the encoding of the interaction type (1-2,l-3,l-4, or 1-5) which involves atoms i and j . One way of viewing this problem is to construct a map of interaction types. This map could take the form of a square matrix whose i th row represents atom i and whose j t h column represents atom j . An 0 in position (i,j) represents 1-2 or 1-3 bonding, for which no in- teraction energy is calculated. An X represents a 1-4 interaction and a blank represents a 1-5 in- teraction. This map now appears as a sparse ma- trix, an example of which is shown in Figure 1 for a small, 110-atom section of ribonuclease (which is a small loop containing a disulfide bond). Non- blank entries close to the diagonal represent atoms near the backbone; this feature arises because of the convention used in ECEPP to number the atoms sequentially [first backbone, then side chains, of each residue (see Fig. a)] and has no other significance. The nonblank entries far from

52 Pottle et al.

x x x 0 x x o

Figure 1. Example of a sparse matrix map for the 110 atoms of residues 65-72 of ribonuclease (actually the map is drawn for this amino acid sequence in a complete peptide, with -NH2 and -COOH end groups). The symbols are defined in the text.

H3 H5 o8 ,,I3 ~ 1 0 2 0105 the diagonal correspond to atoms (remote from one another in the amino acid sequence) connected by a disulfide bond. A full representation of either triangle (e.g., the upper) of this matrix would re- quire ( N 2 - N)/2 , entries and consequently a

interaction of one type is most likely to be followed

1101 11 104 109 110

I

C-C-0-H I I I I I I H 1- N2- c4- c L- N 12 c 1.4. . . . . . . . . . . . 9 16 10 I 106 I 103 107 cI6 H-c-n H-C-H

I S"

Figure 2. Illustration of numbers of atoms in ECEPP for prohibitive amount Of memory. Noting that an the example considered in Fig. 1.

Conformational Analysis of Proteins 53

Table I. Atom No.” 5 6 7 8 9 10 11 12 13 14 15 16 .-.-*-. 107 108 109 110

Example of coding for atom 4 of Fig. 1.

Matrixelementb 0 0 0 0 0 0 0 0 X X X

a These atoms are in row 4 of the upper triangle of the sparse matrix in Fig. 1 and correspond to atom j being 5,6,7,

b These are the elements in row 4 (upper triangle) in Fig. 1. etc.

by another of the same type suggests a represen- tation in terms of sequences of the same type. In the discussion that follows the term “interaction array” is used to refer to the much more compact representation of the sparse matrix actually used. Each row of the upper triangle is coded as a series of packed words in the main data memory of the AP. Each word for atom i defines a run of consec- utive atoms j ; it contains the number of atoms j in a 1-2 or 1-3 relationship to atom i, followed by the number of 1-4 relationships and then by the number of 1-5 relationships. Table I provides an example of the coding of one row of this matrix and Table I1 illustrates the code words for this row. Only two code words are required for atom 4 be- cause 1-5 interactions (atom 4 interacting with atoms 15-107) are followed by only one 1-4 in- teraction (the interaction of atom 4 with atom 108; see Fig. 2). The end of this series of code words for atom 4 is signaled by a negative number in the low mantissa. The outer computational loop with this

Table 11. Packed word coding for row 4 of Fig. 1.

High Low Code word Exponenta mantissaa mantissaa

number (10 bits) (12 bits) (16 bits)

4 b 8 2 93 5“ 0 1 -2

a A floating-point number in the AP-120B is represented by its exponent and high and low mantissa, where 10,12, and 16 bits, respectively, are used to represent them. For a machine that might use 2,2, and 3 decimal digits for the representation of, say, the number 0.16752 X lo5 the quantities would be 05, 16, and 752, respectively.

Atoms 1, 2, and 3 required only one code word each, whereas atom 4 requires two code words. The first code word for atom 4 (beginning at atom 5 ) contains a run of 8 O’s, a run of 2 X’s, and 93 blanks; hence this code word contains the entries 8, 2, and 93.

The second code word (beginning at atom 108) contains no O’s, 1 X, and 2 blanks; hence this code word contains the entries 0, 1, -2 . The minus sign indicates that this code word completes the row and that the next code word per- tains to atom 5.

data structure consists of the successive reading and decoding of these code words. Each new code word generates an execution of the inner loop contained in a subroutine named ACAL; for example, when code word 4 (Table II) causes execution of ACAL, it evaluates the energies involved in two 1-4 interactions, followed by the energies of 93 1-5 interactions.

There is an overhead associated with each call to ACAL: in addition to setting up counters and address registers at the beginning of ACAL, the processing of the last 1-5 interaction in the loop generates a considerable amount of computation on the next four atoms beyond it because five atoms are being treated simultaneously. These computations must be discarded, thus contributing to overhead. Therefore to maximize processing speed, that is, to reduce this overhead, it is necessary to minimize the number of calls to ACAL.

Our current version of ECEPP generates 1253 code words for BPTI (886 atoms), or about 1.4 code words per row of the interaction matrix described. An interesting problem that we have not yet investigated is how to number the atoms of a polypeptide chain to minimize the number of code words in the interaction array.
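The decoding of the packed interaction array can be modeled as a simple run-length expansion. The following Python sketch is illustrative only (the real code is AP-120B assembly); each code word is treated as a triple of counts, with a negative final count marking the end of a row, as described above.

```python
# Illustrative Python model of the packed "interaction array".  Each code
# word for atom i holds three run lengths over the consecutive atoms j > i:
# atoms in a 1-2/1-3 relationship (skipped, no energy term), then 1-4
# interactions, then 1-5 interactions.  A negative final count marks the
# last code word of the row.

def decode_row(first_j, code_words):
    """Expand one row of the interaction array into (j, kind) pairs."""
    pairs = []
    j = first_j
    for n_skip, n_14, n_15 in code_words:
        j += n_skip                    # 1-2/1-3 partners: no computation
        for _ in range(n_14):
            pairs.append((j, "1-4"))
            j += 1
        for _ in range(abs(n_15)):     # magnitude only; sign flags end of row
            pairs.append((j, "1-5"))
            j += 1
        if n_15 < 0:
            break
    return pairs

# Row 4 of Table II: atoms 5-12 are 1-2/1-3, atoms 13-14 are 1-4,
# atoms 15-107 are 1-5, atom 108 is 1-4 again.
row4 = decode_row(5, [(8, 2, 93), (0, 1, -2)])
```

Each call to `decode_row` here plays the role of one pass of the outer loop; each tuple corresponds to one execution of ACAL.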

Allocation of Effort to Host and AP

Because all calculations that actually compute the electrostatic, nonbonded, and hydrogen-bonded energy have been put in the inner-loop routine ACAL, the outer loop, which serves only to set up loop counting for ACAL, could run on the host, thereby avoiding the expenditure of considerable programming effort to run it on the AP. This approach, however, would be highly inefficient: because there are 1253 code words for BPTI, the AP would be activated 1253 times to run ACAL. On the other hand, if the outer loop were run on the AP, the AP would be activated only once for the calculation of the total electrostatic, nonbonded, and hydrogen-bonded energy of a particular conformation. Hence the outer loop is also allocated to the AP. The transmission of data to and the retrieval of data from the AP, as well as the activation of the AP and the response to a completion signal from the AP, involve a considerable amount of overhead for the host. This overhead takes time, which is increased considerably in a minicomputer that is time-shared among several users. During the time that the AP is assigned to one user and is waiting for work from the host it must remain idle.

In the present version of ECEPP the entire calculation that supports the order of N² computations of electrostatic, nonbonded, and hydrogen-bonded energy has been hand-coded in assembly language for the AP. So far the designation of the conformation and the calculation of disulfide interaction energy and torsional energy for rotation about single bonds are handled by the host, but they will also eventually be allotted to the AP. Hence many conformations may be specified and their energies computed by the AP without requiring intervention by the host.

For purposes of illustration consider the study of a polypeptide containing a selected set of M bonds about which rotation will be allowed. Assume that this set of M dihedral angles is to be assigned different values in each of several (L) runs to examine the total energies of L conformations. A block diagram for this computation, which shows the various AP subroutines, appears in Figure 3. The host program provides the supervisory procedure for the whole problem. From a disk file prepared by the setup phase (see below) the following fixed polypeptide data are read and transferred once to the main data memory of the AP:

(1) a set of reference Cartesian coordinates for each atom (with all variable dihedral angles set equal to zero),

(2) charges for each atom,

(3) a number (0-15) to designate the type of each atom,

(4) the interaction array of code words, an example of which appears in Table I,

(5) information for each variable dihedral angle which specifies the end atoms of the bond and defines the atoms rotated by a change in that dihedral angle,

(6) a designation to indicate which half-cystines are paired to form disulfide bonds.

Figure 3. Block diagram of host/AP routines to study protein conformation by energy calculations.

The ε and r₀ lookup tables and the 1/r interpolation tables are read from another file and transferred to the table data memory of the AP. These tables also need be transferred only once.

The host program then reads the first set of variable dihedral angles from a user input file to specify the conformation, sends these dihedral angles to the AP, and starts the AP. The AP generates a set of Cartesian coordinates from the reference coordinates and the given dihedral angles (AGEN) and decodes the appropriate code words for every atom in the interaction array (the outer-loop routine ANERG); it calls the subroutine ACAL, which contains the inner loop, to compute the energy once for each code word. The total electrostatic and nonbonded (plus hydrogen-bonded) energies are then returned to the host, where the torsional and disulfide bond energies are added. The host then prints the results and sends the AP another set of dihedral angles. For each energy calculation M independent variables are sent to the AP. The AP returns only EES, the total of the electrostatic energies, and ENB, the total of the nonbonded and hydrogen-bonded energies. This set of operations is repeated L times. This data transfer requirement is exceedingly small for the large computation that results; that is, most of the time is spent on computations in the AP and very little in transferring data.
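The host-side control flow just described can be sketched as follows. This is a hypothetical Python model: `FakeAP`, `load_main_memory`, `load_table_memory`, `send`, and `run` stand in for the actual Prime/AP interface, and the energy callbacks are placeholders for the torsional and disulfide calculations done on the host.

```python
# Hypothetical model of the host supervisory loop: fixed data and tables
# are transferred once; then L sets of M dihedral angles are sent, and the
# AP returns only the two energy totals (EES and ENB) per conformation.

def supervise(fixed_data, tables, angle_sets, torsion_energy, ss_energy, ap):
    ap.load_main_memory(fixed_data)   # coordinates, charges, code words, ...
    ap.load_table_memory(tables)      # lookup/interpolation tables, sent once
    totals = []
    for angles in angle_sets:         # L conformations, M angles each
        ap.send(angles)
        e_es, e_nb = ap.run()         # AGEN -> ANERG -> ACAL on the AP
        totals.append(e_es + e_nb + torsion_energy(angles) + ss_energy(angles))
    return totals

class FakeAP:
    """Stand-in for the AP-120B interface (illustration only)."""
    def load_main_memory(self, data): pass
    def load_table_memory(self, tables): pass
    def send(self, angles): self.angles = angles
    def run(self): return (sum(self.angles), 2.0 * len(self.angles))

totals = supervise(None, None, [[0.0, 1.0, 2.0], [2.0, 2.0, 2.0]],
                   lambda a: 0.0, lambda a: 0.0, FakeAP())  # [9.0, 12.0]
```

The key design point survives the simplification: per conformation, only M numbers go in and two come back, so nearly all time is spent computing rather than transferring.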

Setup Phase

The “setup phase” of any computation is the portion of the overall calculation that converts data presented in user input format to a form more efficient for later computation. Typically it needs to be run only when the description of the object under investigation is changed. The “execution phase” computes values of one or several attributes of the object, given a set of values for the independent variables of the problem. This phase may be executed hundreds of times for a given description of the object with different sets of values for the independent variables. The setup phase is usually not computationally intensive and requires reading and writing of disk files. It should therefore be run as a separate program on the host.

For ECEPP the object is a polypeptide of a particular amino acid sequence. The setup phase (PRECEP) reads user input data that specify the amino acid sequence of the polypeptide chain and identify those dihedral angles that are to be varied in the later execution phase (the energy calculation). The program PRECEP, among other things, generates a particular conformation from a library file of amino acid residues, each in its own local coordinate system, and identifies 1-2, 1-3, 1-4, and 1-5 interactions for the given amino acid sequence. It then constructs an output file whose contents and use have been described in the preceding subsection.

It should be noted especially that the key data structure for the energy calculation, the interaction array of code words, is generated by PRECEP. PRECEP is run only when the subset of dihedral angles used as independent variables in energy minimization or Monte Carlo calculation is changed.
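The interaction classification that PRECEP performs amounts to measuring distances along the covalent bond graph: atoms k bonds apart are in a 1-(k+1) relationship, with everything four or more bonds apart falling into the 1-5 class. A minimal sketch, assuming a simple bond-list input (the function name and interface are hypothetical):

```python
# Sketch of 1-2 ... 1-5 classification by breadth-first search over the
# covalent bond graph.  Pairs more than 3 bonds apart are all class "1-5".
from collections import deque

def classify_pairs(n_atoms, bonds):
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    classes = {}
    for i in range(n_atoms):
        dist = {i: 0}
        q = deque([i])
        while q:
            u = q.popleft()
            if dist[u] >= 3:          # no need to search beyond 3 bonds
                continue
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for j in range(i + 1, n_atoms):
            d = dist.get(j)           # None means > 3 bonds away
            classes[(i, j)] = f"1-{d + 1}" if d else "1-5"
    return classes

# A 5-atom chain 0-1-2-3-4: pair (0,1) is 1-2, (0,3) is 1-4, (0,4) is 1-5.
cls = classify_pairs(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
```

Runs of equal classes in each row of the resulting upper triangle are what the code words of the interaction array compress.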

Conformational Energy Minimization Programs

A variety of host programs in addition to that involved in ECEPP and diagrammed in Figure 3 is used in our laboratory. In one of them a Monte Carlo procedure generates trial sets of variable dihedral angles to locate conformations with a high probability of low energy. The overall block diagram of this program is that in Figure 3, in which the supervisory program (in the host) generates sets of values for dihedral angles instead of reading them from a user file. Because of the speed with which the AP calculates a total energy (see the next section) the Monte Carlo dihedral angle generator becomes a high-priority candidate for moving to the AP to reduce the frequency of AP requests to the host for intervention. In this case the torsional and disulfide bonding-energy calculations would have to move also.
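The text does not specify the laboratory's Monte Carlo procedure, but a Metropolis-style generator of trial dihedral angle sets is one standard choice and illustrates the kind of loop that would move to the AP. Everything below (function name, step size, the toy energy surface) is an assumption for illustration only:

```python
# Toy Metropolis-style generator of trial dihedral angle sets (illustrative;
# not the procedure used in the paper).  One step perturbs a random angle
# and accepts the move if the energy drops, or with Boltzmann probability.
import math
import random

def metropolis_step(angles, energy_fn, step_deg, temperature, rng):
    trial = list(angles)
    k = rng.randrange(len(trial))
    trial[k] += rng.uniform(-step_deg, step_deg)
    d_e = energy_fn(trial) - energy_fn(angles)
    if d_e <= 0.0 or rng.random() < math.exp(-d_e / temperature):
        return trial
    return angles

# Toy run against a quadratic "energy" surface with its minimum at the origin.
rng = random.Random(0)
angles = [30.0, -45.0]
surface = lambda a: sum(x * x for x in a)
for _ in range(500):
    angles = metropolis_step(angles, surface, 5.0, 1.0, rng)
```

In an AP-resident version, `energy_fn` would be the ECEPP energy routine and the perturbation/acceptance logic would run on the AP, so the host is consulted only occasionally.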

A more complex program, computationally, is energy minimization that uses the variable-metric algorithm MINOP.11 This algorithm, running on the host, generates values for variable dihedral angles in response to previous energy calculations in an attempt to locate a local minimum of energy. In contrast to Monte Carlo procedures MINOP requires a gradient vector for its operation. The analytical calculation of the gradient vector is done in the AP (ADRV) because its execution requires considerably more time than an energy calculation. The formula for one element of the gradient vector is given by

∂E/∂θ_m = Σ_i Σ_{j>i} (∂E_ij/∂r_ij)(∂r_ij/∂θ_m),   (4)

where θ_m is the mth variable dihedral angle. Three nested computational loops are implied by eq. (4). To avoid storage of intermediate results the innermost loop generates the M values of ∂r_ij/∂θ_m, an intermediate loop computes the summation over j, and the outermost loop computes the summation over i (as in the energy calculation). The intermediate loop, which calculates ∂E_ij/∂r_ij, is similar to the inner-loop calculation discussed in the preceding sections and is executed on the order of N²/2 times. The inner loop, which calculates M values of ∂r_ij/∂θ_m, is thus executed on the order of N²M/2 times and becomes the code that most requires optimization. ADRV has been hand-coded in assembly language for the AP but has not yet had the attention that was given to the
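The three nested loops implied by eq. (4) can be written out directly. In this illustrative Python version, `dE_dr` and `dr_dtheta` are hypothetical callbacks standing in for the pairwise energy derivative and the distance derivatives computed in ADRV:

```python
# The triple loop of eq. (4): grad[m] = sum over i, over j > i, of
# (dE_ij/dr_ij) * (dr_ij/dtheta_m).  The intermediate loop body runs
# ~N^2/2 times; the innermost loop body runs ~N^2 M/2 times.

def gradient(n_atoms, n_angles, dE_dr, dr_dtheta):
    grad = [0.0] * n_angles
    for i in range(n_atoms):              # outer loop: sum over i
        for j in range(i + 1, n_atoms):   # intermediate loop: sum over j
            de = dE_dr(i, j)              # pairwise energy derivative
            for m in range(n_angles):     # inner loop: M distance derivatives
                grad[m] += de * dr_dtheta(i, j, m)
    return grad
```

The loop ordering matches the text: by computing the M derivatives for one pair (i, j) at a time, no intermediate arrays of derivatives need to be stored.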


coding of ACAL. Their performances are discussed further in the next section.

A block diagram of the host/AP routines used in minimization appears in Figure 4. The routines ADRUN and AVEC are the counterparts of ANERG and AGEN, respectively.

Figure 4. Block diagram of host/AP routines to study protein conformation by energy minimization.

RESULTS

To illustrate the power of the AP/Prime system we present the results of several benchmark computer runs that involve one cycle of minimization of the energy of BPTI on this system and on the IBM 370/168. In these runs the energy routine was called once and the gradient was called once by MINOP. The FORTRAN program used on the IBM machine was compiled under FORTHX (optimization level 2).

The starting conformation was taken from Table III of Swenson et al.12 All backbone dihedral angles φ and ψ (except φ of Pro residues) and all side-chain dihedral angles χ1 for those residues with side chains were allowed to vary. There were, then, 154 variable dihedral angles in the benchmark runs.

The comparison data appear in Table III. From these data it may be seen that the AP is 11.3 times faster than the IBM 370/168 in evaluating the electrostatic and nonbonded (plus hydrogen-bonded) energy of BPTI and 3.35 times faster in evaluating a 154-element gradient vector. The gradient speed factor is low because the code for the gradient routine in the AP has not yet been optimized for performance. We anticipate that

Table III. Comparison of two computer systems for one cycle of minimization of the energy of BPTI.

                       AP-120B/Prime 350    IBM 370/168
Energy evaluation      1.12 sec (AP)        12.7 sec
Gradient evaluation    190 sec (AP)         636.5 sec
Total CPU time         55 sec (Prime)       10 min, 52 sec
Total elapsed time     5 min^a              33 min
Cost of job            N/A                  $118.64

^a This includes 54 sec not accounted for in the preceding three lines and due to overhead at the Prime/AP interface, with no other users logged onto the Prime. If other users had been logged on, the total elapsed time would have been greater.

such optimization can reduce the time required for computation of the gradient, perhaps by a factor of 4-5.

Some data are available to indicate how much of the speedup factor of 11.3 is due to the AP hardware and how much to the inefficiency of the IBM 370/168 FORTRAN code. An earlier version of ECEPP required 15.0 sec to evaluate the energy of BPTI when completely coded in FORTRAN. This time dropped to 6.7 sec when the inner loop was carefully programmed in assembly language. The speedup factor of 2.2 obtained by hand-coding the inner loop would be expected to hold in the newer FORTRAN version used in the IBM 370/168 computation of Table III, leaving a speedup factor of 5.1 attributable to the AP hardware. Because the assembly language version requiring 6.7 sec for execution spent 88% of the time in the inner loop, there was little reason to code any other part of ECEPP in assembly language. No such data are available for the gradient calculation, but we believe that, as indicated above, a similar speedup factor will be obtained after optimization of the AP gradient code.
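The decomposition of the speedup factor is a two-line calculation, reproduced here from the figures in the text:

```python
# Decomposition of the 11.3x speedup into a hand-coding factor and a
# hardware factor, using the timings quoted in the text and Table III.
fortran_sec, asm_sec = 15.0, 6.7      # earlier all-FORTRAN vs. hand-coded inner loop
hand_coding_factor = fortran_sec / asm_sec              # ~2.2
overall_factor = 12.7 / 1.12                            # 370/168 vs. AP, ~11.3
hardware_factor = overall_factor / hand_coding_factor   # ~5.1 from AP hardware
```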

The final benchmark was a series of runs to complete the minimization begun above. The minimum was reached after 513 gradient evaluations and 1090 energy calculations that took 27.4 hr of AP time. Several successive runs of 3-4 hr each went into this effort. This minimization is probably one of the largest ever attempted with unconstrained nonlinear parameter optimization algorithms, both in the amount of computation required for a function evaluation and the number of variable parameters (154). It would have cost in the neighborhood of $50,000 to carry out on the IBM 370/168.

These results may be used with caution to estimate the performance of the AP on other proteins. Because the time for calculation of the energy is roughly proportional to the square of the number of atoms in the protein, this time is 1.12 sec/(886 atoms in BPTI)², or 1.43 μsec per atom squared. The calculation time for the gradient will show a similar variation with the number of atoms and will be approximately linear in the number of variable dihedral angles; for example, lysozyme (hen egg white) is a protein with 129 residues and 1951 atoms. An energy computation should take about 1.43 × 10⁻⁶ × (1951)² = 5.4 sec. The number of variable dihedral angles corresponding to those in our BPTI benchmark is 371. An evaluation of the gradient will thus take about (190/60) × (1951/886)² × (371/154) = 37 min of AP time. It is difficult to use this information to predict how long a complete energy minimization on lysozyme would take, starting from, say, its crystal structure. Little is known about the performance of parameter optimization algorithms with such large numbers of variables. An order-of-magnitude estimate for the total minimization, however, would be two weeks of AP time.
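The scaling argument above can be packaged as a small calculator, assuming (as the text does) that energy time scales as N² and gradient time as N²M:

```python
# Back-of-the-envelope scaling from the BPTI benchmark (Table III),
# assuming energy time ~ N^2 and gradient time ~ N^2 * M.
BPTI_ATOMS, BPTI_ANGLES = 886, 154
ENERGY_SEC, GRADIENT_SEC = 1.12, 190.0   # AP times for BPTI

def energy_estimate(n_atoms):
    """Estimated AP energy-evaluation time in seconds."""
    return ENERGY_SEC * (n_atoms / BPTI_ATOMS) ** 2

def gradient_estimate(n_atoms, n_angles):
    """Estimated AP gradient-evaluation time in seconds."""
    return GRADIENT_SEC * (n_atoms / BPTI_ATOMS) ** 2 * (n_angles / BPTI_ANGLES)

# Lysozyme: 1951 atoms, 371 comparable variable dihedral angles.
e = energy_estimate(1951)          # about 5.4 sec
g = gradient_estimate(1951, 371)   # about 37 min of AP time
```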

DISCUSSION

The AP-120B and other commercially available array processors were developed specifically to meet a need for high-speed signal processing in defense applications and oil exploration.16 The fact that the inner loop of ECEPP can execute at a rate of 10.3 megaflops [86% of the maximum possible rate of six multiplications and six additions per microsecond (12 megaflops)], faster than the fast Fourier transform algorithm for which the AP was designed, suggests that the machine is an excellent one for conformational energy calculations. A single user of the Prime 350 who assigns the AP to himself alone should be able to carry out a Monte Carlo calculation or an ECEPP-like minimization at a rate considerably in excess of that of the IBM 370/168 and at a fraction of the cost. The only reason why the total elapsed time for the computation might not be considerably shorter is the difficulty of reactivating the host program in a time-shared Prime 350 after the AP computation is completed. The results seem to indicate further that two or three days' heavy use of the AP/host system would produce data that would cost the equivalent of a normal one-year computing budget to obtain from an IBM 370/168.

Our experience, as well as that of others,17 indicates that most calculations involving the positions of atoms in a molecule (both static and dynamic) can profit greatly from treatment by an AP. These computations typically do several arithmetic calculations per memory access, have a readily identifiable small inner loop, need less than 64K words of data storage, and require little communication with a host computer. Computational problems in quantum chemistry, however, are distinctly different and usually have none of these characteristics. The primary difference is that huge matrices and vectors are required in these computations. These arrays are so large that they must reside on disks in any system, and transference of these data through the host system and the AP interface is far too slow and inefficient. We have not examined peripheral devices directly attached to the AP because they are not needed in our types of problems.

Our approach has been to study appropriate data structures and algorithms for an array processor and to follow them with careful assembly language coding. Our primary purpose has been to show the speed and cost effectiveness made possible by this hardware. For those with computational problems suited to array processors AP cost effectiveness is still possible without extensive reprogramming by use of an AP FORTRAN compiler. Compilers for the AP-120B, a machine that will interface with a great number of host computers, have recently become available from Floating Point Systems and Cornell University. We have made no benchmark runs to measure the loss in speed that accompanies the use of compiler-generated code instead of hand code but estimate a factor of about 10 for the ECEPP energy calculation. This is still a good computational rate and the cost savings could be considerable. Care must be exercised, however, in the selection of hardware; for example, as indicated earlier, our original specification of 512 words of program source memory for the AP-120B would not be sufficient to support the present FORTRAN compilers, which generate a comparatively large number of machine instructions.

The National Resource for Computation in Chemistry has recently issued an assessment report on the AP in computational chemistry.18 It should be helpful to those interested in its applicability to their problems.

This work was supported by research grants from the National Science Foundation (PCM77-09104) and the National Institute of General Medical Sciences, National Institutes of Health (GM-25138). It was presented before the Division of Computers in Chemistry at the 179th National Meeting of the American Chemical Society, Houston, Texas, March 1980.

References

1. M. Sela, F. H. White, Jr., and C. B. Anfinsen, Science, 125, 691 (1957).

2. F. H. White, Jr. and C. B. Anfinsen, Ann. N.Y. Acad. Sci., 81, 515 (1959).

3. F. H. White, Jr., J. Biol. Chem., 235, 383 (1960).

4. C. B. Anfinsen, E. Haber, M. Sela, and F. H. White, Jr., Proc. Natl. Acad. Sci. USA, 47, 1309 (1961).

5. G. Némethy and H. A. Scheraga, Q. Rev. Biophys., 10, 239 (1977).

6. H. A. Scheraga, in Versatility of Proteins, C. H. Li, Ed., Academic, New York, 1978, p. 119.

7. H. A. Scheraga, in Regensberg Symposium on Protein Folding, R. Jaenicke, Ed., Elsevier, Amsterdam, 1979, in press.

8. G. Chester, R. Gann, R. Gallagher, and A. Grimison, in Computer Modeling of Matter, P. Lykos, Ed., ACS Symp. Ser., American Chemical Society, Washington, D.C., 1978, Vol. 86, p. 111.

9. F. A. Momany, R. F. McGuire, A. W. Burgess, and H. A. Scheraga, J. Phys. Chem., 79, 2361 (1975).

10. J. Deisenhofer and W. Steigemann, Acta Crystallogr. B, 31, 238 (1975).

11. J. E. Dennis and H. H. W. Mei, Technical Rep. No. 75-246 (1975), Department of Computer Science, Cornell University, Ithaca, New York 14853.

12. M. K. Swenson, A. W. Burgess, and H. A. Scheraga, in Frontiers in Physicochemical Biology, B. Pullman, Ed., Academic, New York, 1978, p. 115.

13. Z. I. Hodes, G. Némethy, and H. A. Scheraga, Biopolymers, 18, 1565 (1979).

14. W. R. Wittmayer, Comput. Des., 17(3), 93 (1978).

15. See, for example, R. Podmore, M. Liveright, S. Virmani, N. Peterson, and J. Britton, Proc. 1979 Power Industry Computer Applications Conf., Cleveland, OH, May 1979, p. 325.

16. C. N. Winningstad, Datamation, 24(10), 159 (1978).

17. K. R. Wilson, in Computer Networking and Chemistry, P. Lykos, Ed., ACS Symp. Ser., American Chemical Society, Washington, D.C., 1975, Vol. 19, p. 17.

18. N. S. Ostlund, “Attached Scientific Processors for Chemical Computations: A Report to the Chemistry Community,” National Resource for Computation in Chemistry, to appear.