Prediction of the Structure of GroES and Its Interaction With GroEL


PROTEINS: Structure, Function, and Genetics 22:199-209 (1995)

PREDICTION REPORT

Alfonso Valencia,1 Tim J. Hubbard,2 Arturo Muga,3 Sonia Banuelos,3 Oscar Llorca,1 Jose L. Carrascosa,1 and Jose Maria Valpuesta1

1Centro Nacional de Biotecnologia, C.S.I.C., Universidad Autonoma de Madrid, 28049 Madrid, Spain; 2Centre for Protein Engineering (CPE), MRC Centre, Cambridge, CB2 2QH, United Kingdom; 3Departamento de Bioquimica y Biologia Molecular, Facultad de Ciencias, Universidad del Pais Vasco, 48080 Bilbao, Spain

ABSTRACT The three-dimensional structure of the GroES monomer and its interaction with GroEL have been predicted using a combination of prediction tools and experimental data obtained by biophysical [electron microscopy (EM), Fourier transform infrared (FTIR), and nuclear magnetic resonance (NMR)] and biochemical techniques. The GroES monomer, according to the prediction, is composed of eight β-strands forming a β-barrel with loose ends. In the model, β-strands 5-8 run along the outer surface of GroES, forming an antiparallel β-sheet with β4 loosely bound to one of the edges. β-strands 1-3 would then be parallel and placed in the interior of the molecule. Loops 1-3 would face the internal cavity of the GroEL-GroES complex and, together with conserved residues in loops 5 and 7, would form the active surface interacting with GroEL. © 1995 Wiley-Liss, Inc.

Key words: chaperonins, electron microscopy, FTIR, molecular modeling, structure prediction, contact prediction, active site prediction

INTRODUCTION

Chaperonins are a family of proteins involved in the cellular response to stress and in the proper folding of proteins, both in vivo and in vitro. One of the best characterized chaperonin systems is the one formed by the bacterial GroEL (Hsp 60) and GroES (Hsp 10). They share extensive homology with corresponding members of the family from bacterial, mitochondrial, chloroplast, and eukaryotic cytoplasm origin. Both proteins are needed, together with ATP, Mg2+, and K+, to assist in the proper folding of some proteins, although the exact mechanism of their function is still unknown. GroEL forms an oligomer of 14 identical subunits arranged as a double toroid of around 800 kDa,6,7 while GroES forms a ring of around 70 kDa, built up by 7 identical subunits. Central to the study of the role of the chaperonins in protein folding is analysis of the structure of the complex formed by GroEL and GroES. The three-dimensional structure of the GroEL oligomer at atomic resolution, together with the low-resolution three-dimensional maps of the complex derived from electron microscopy (EM), have shed some light on this subject; however, there are a number of open questions related to the functional role of asymmetric and symmetric GroEL-GroES complexes.13-16 Also, the details of the interaction between GroEL and GroES oligomers are still unknown.

Although it is clear that only X-ray crystallography can provide a high-resolution structure of the complex, this is a good system for trying to combine data from different approaches to reveal some aspects of the structure. In particular, since the three-dimensional structure of GroES at atomic resolution is close to being solved, it is interesting to combine the results from structural prediction techniques with data from spectroscopy and EM to obtain a predicted three-dimensional structure of GroES that can be compared later with the one obtained by X-ray diffraction techniques, to test the feasibility of this approach.

It has recently been established that various new structural prediction methods can be quite useful in fold recognition and ab initio prediction of protein structure, namely hidden Markov models, β-strand pairing prediction, and two-dimensional and "in/out" predictions by neural networks.21,22 Other tools have been tested on many known structures (tree determinants or conserved residues) or across the entire database of known structures (correlated mutations) but have never before been used to predict protein structures. The degree of success of the prediction described here will tell us a lot about future needs.

Received January 27, 1995; accepted February 13, 1995. Address reprint requests to Jose Maria Valpuesta, Centro Nacional de Biotecnologia, C.S.I.C., Universidad Autonoma de Madrid, 28049 Madrid, Spain.

The second approach described here involves the integration of experimental data with the predictions obtained using the methods described above. The use of Fourier transform infrared (FTIR) and/or circular dichroism (CD) data could help to validate two-dimensional structure predictions, since they provide global insight into the overall secondary structure of proteins. In particular, joint analysis of protein secondary structure by FTIR and prediction methods has been successfully applied to several proteins (see ref. 27 and references therein). The application of EM information, even at low resolution, could facilitate the fitting and checking of molecular models within a three-dimensional framework. Here we describe the use of prediction tools in coordination with experimental techniques (FTIR and EM) to predict the three-dimensional structure of GroES and its interaction with GroEL.

METHODS

Prediction Methods

All GroES sequences in the SWISS-PROT database28 were aligned with the program MAXHOM.29 The E. coli sequence30 was used as the guide sequence, and its numbering is used throughout. The full sequence alignment is available from the authors. Secondary structure and solvent accessibility predictions were obtained from the PHD server,31 which implements the algorithms of Rost and Sander.

Ab initio β-strand pair prediction was made according to Hubbard. The results of this prediction were combined with the PHD secondary structure prediction to identify the strands most likely to pair. Because the number of possible pairings is proportional to the square of the number of strands, whereas the number of observed pairs grows only linearly, prediction generally becomes less reliable as the number of strands increases.
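The scaling argument can be made concrete with a short sketch. It assumes, for illustration only, that every unordered pair of strands is a candidate in both the parallel and the antiparallel orientation, and that the strands form one open sheet, so that n strands give n - 1 real pairs. The function names are hypothetical and are not part of the published method.

```python
def candidate_pairings(n_strands: int) -> int:
    """Candidate strand pairings: every unordered pair of strands, counted
    once as parallel and once as antiparallel (illustrative assumption)."""
    return n_strands * (n_strands - 1)  # 2 * C(n, 2)


def observed_pairs_open_sheet(n_strands: int) -> int:
    """Real pairs if all strands form a single open sheet (assumption)."""
    return max(n_strands - 1, 0)


def prior_hit_rate(n_strands: int) -> float:
    """Fraction of candidate pairings that are real, before any prediction."""
    candidates = candidate_pairings(n_strands)
    if candidates == 0:
        return 0.0
    return observed_pairs_open_sheet(n_strands) / candidates


for n in (4, 8, 16):
    print(n, candidate_pairings(n), observed_pairs_open_sheet(n), prior_hit_rate(n))
```

Under these assumptions the prior hit rate is 1/n, so for the eight strands predicted for GroES only one candidate pairing in eight is expected to be real.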

Correlated mutations were calculated using the algorithm of Gobel et al. Correlated mutations reflect the coordinated mutational behavior in different sequences at two different positions in a multiple sequence alignment. They must be considered as an average property of the family of sequences rather than of any particular sequence. In statistical terms, correlated mutations correspond to proximity between the correlated positions. On average, 45% of strong correlations (typically a number approximately equal to 1/5 of the number of amino acids in the sequence: in this case, 20 predicted contacts for a 97-amino-acid sequence) correspond to residue-residue distances smaller than 12 Å (ref. 23; Gobel, Sander, Valencia, unpublished results).
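The idea can be sketched in a few lines. This is a simplified illustration and not the published algorithm: it assumes identity scoring (1 if two sequences share a residue at a position, 0 otherwise) in place of a substitution matrix, applies no sequence weighting, and uses an invented toy alignment. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def column_similarities(alignment, pos):
    """Similarity at `pos` for every pair of sequences.
    Assumption: identity scoring stands in for a substitution matrix."""
    return [1.0 if a[pos] == b[pos] else 0.0 for a, b in combinations(alignment, 2)]


def pearson(x, y):
    """Pearson correlation; 0.0 for constant (fully conserved) columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((v - mx) ** 2 for v in x))
    sy = sqrt(sum((v - my) ** 2 for v in y))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def n_strongest(length, divisor=5):
    """About one fifth of the sequence length, rounded up."""
    return max(1, -(-length // divisor))


def correlated_pairs(alignment):
    """Rank all position pairs by correlation and keep the strongest ones."""
    length = len(alignment[0])
    sims = [column_similarities(alignment, p) for p in range(length)]
    scored = [(pearson(sims[i], sims[j]), i, j)
              for i, j in combinations(range(length), 2)]
    scored.sort(reverse=True)
    return scored[:n_strongest(length)]


toy_alignment = ["AKLDE", "AKLDE", "GRLDQ", "GRLEQ"]  # invented sequences
for r, i, j in correlated_pairs(toy_alignment):
    print(f"positions {i + 1}-{j + 1}: r = {r:.2f}")
```

With the one-fifth rule, `n_strongest(97)` gives the 20 predicted contacts quoted for the 97-residue GroES sequence.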

Conserved residues were determined directly from the multiple sequence alignment. "Tree determinant" residues were determined with the interactive tool "Sequencespace." They correspond to positions that are conserved within groups of sequences but differ from the other sequences at that position in the multiple sequence alignment. "Tree determinants" are those residues that contain maximal information about the split of different groups of sequences in a protein family. In the examples that have been analyzed previously, these residues have been found to form part of binding pockets, e.g., the peptide binding cleft of SH2 domains or GAP-RAF binding sites in ras proteins24 (Casari, Sander, and Valencia, personal communication).
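The verbal definition above can be turned into a small sketch. It is a simplified criterion based only on that definition, not the method implemented in Sequencespace; the subfamily names and sequences are invented, and the grouping of sequences is assumed to be known in advance.

```python
def tree_determinant_positions(groups):
    """Return 0-based alignment positions that are conserved inside every
    group but do not carry the same residue in all groups.
    `groups` maps a (hypothetical) subfamily name to its aligned sequences."""
    length = len(next(iter(groups.values()))[0])
    positions = []
    for pos in range(length):
        residues_per_group = [{seq[pos] for seq in seqs} for seqs in groups.values()]
        conserved_within = all(len(res) == 1 for res in residues_per_group)
        distinct_between = len(set.union(*residues_per_group)) > 1
        if conserved_within and distinct_between:
            positions.append(pos)
    return positions


toy_groups = {
    "group_a": ["MKDLG", "MKDIG"],  # invented sequences
    "group_b": ["MREVG", "MRELG"],
}
print(tree_determinant_positions(toy_groups))  # [1, 2]
```

Positions 1 and 5 of the toy alignment are conserved across the whole family and are therefore ordinary conserved residues, not tree determinants.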

For fold recognition, hidden Markov models (HMM) were constructed from the multiple sequence alignment. Models were then used to search a subset of the sequences (pdb90 database; Hubbard, unpublished data) of structures in the PDB database, none of which has greater than 90% homology with any other. The sequence or alignment was also mailed to the PHD server to obtain a secondary structure prediction. For each alignment a value was calculated measuring the degree of similarity between predicted secondary structure segments and those observed in the known structure (from DSSP33) using an algorithm similar to that of Rost et al.34 By considering the HMM score, the secondary structure overlap score, and the ranking of similar folds in the list (using the fold classification that is incorporated into pdb90), a prediction of fold type was made.
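A minimal sketch of a secondary structure agreement score is given below. It is a plain per-residue comparison and is deliberately simpler than the segment-based measure referred to in the text; the state alphabet (H, E, L), the gap symbol, and the example strings are assumptions made for illustration.

```python
def secondary_structure_agreement(predicted: str, observed: str) -> float:
    """Fraction of aligned positions whose predicted and observed states match.
    Strings use H (helix), E (strand), L (loop); '-' marks an alignment gap
    and is skipped. This is a per-residue score, not a segment-overlap score."""
    if len(predicted) != len(observed):
        raise ValueError("aligned strings must have equal length")
    compared = [(p, o) for p, o in zip(predicted, observed) if p != "-" and o != "-"]
    if not compared:
        return 0.0
    return sum(p == o for p, o in compared) / len(compared)


# Invented example: 12 comparable positions, one mismatch.
score = secondary_structure_agreement("LLEEEELL-EEEL", "LLEEELLLHEEEL")
print(f"{score:.3f}")
```

A score of this kind would be combined with the HMM score and the fold ranking; how the three are weighted is not specified in the text.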

Experimental Techniques

GroEL and GroES purification

GroEL and GroES were obtained from an E. coli strain that harbors the pOF39 plasmid and overexpresses both chaperonins, and were purified according to Llorca et al.13

FTIR

Samples for FTIR spectroscopy were prepared by concentrating the purified protein with a Centricon-10 (Amicon, Beverly, MA) to 16 mg/ml. Parallel experiments were carried out in D2O buffer, since assignment and estimation of the secondary structure require analysis of the infrared spectra in both H2O and D2O. Buffer exchange was achieved by either freeze-drying the protein solution or diluting the H2O sample ten times with deuterated buffer and concentrating it again to approximately the original volume by use of a micropartition system; this procedure was repeated three times to eliminate residual H2O. Both methods gave identical results.
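The effect of the dilution procedure can be estimated with simple arithmetic. The sketch below assumes ideal mixing, reconcentration to exactly the original volume, and three ten-fold cycles in total; it ignores H/D exchange with the protein and uptake of atmospheric moisture. The function name is hypothetical.

```python
def residual_fraction(dilution_factor: float, cycles: int) -> float:
    """Fraction of the original solvent left after repeated
    dilute-and-reconcentrate cycles (ideal mixing assumed)."""
    return (1.0 / dilution_factor) ** cycles


remaining = residual_fraction(dilution_factor=10, cycles=3)
print(f"residual H2O: {remaining:.1%}")  # residual H2O: 0.1%
```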

Infrared spectra were recorded in Nicolet 520 (D2O samples) or Magna 550 (H2O samples) spectrophotometers equipped with MCT detectors, as previously described.37 A total of 200 (for D2O samples) and 1,000 scans (for H2O samples) were accumulated for each spectrum, using a shuttle device. Spectra were transferred to a personal computer where solvent subtraction, Fourier self-deconvolution, and curve-fitting of the original amide I band were performed as previously reported.

Fig. 1. Predicted secondary structure and residue accessibility for the GroES sequence family. Prediction was obtained from the PHD server as described in Methods. e, exposed; b, buried; H, α-helix; E, β-strand; L, loop. Underlined predictions are those where secondary structure reliabilities are better than 7 and inside/outside reliabilities are better than 6. Note the high reliability of the predictions of all β-strands and most of the loop regions. β-strands are labeled from β1 to β8 and loops from L1 to L8, as in the text.

Electron microscopy and image processing

To date, the best way to obtain homogeneous views of GroES oligomers is through the preparation of GroEL-GroES complexes. These were obtained by incubation of both oligomers in a 1:1 molar ratio in 50 mM Tris-HCl (pH 7.7), 10 mM MgCl2, 5 mM KCl, 2 mM ATP for 15 minutes at room temperature. The samples were adhered to carbon-coated grids and negatively stained with 2% uranyl acetate. Electron micrographs were taken at 60,000× magnification in a JEOL 1200 EXII at 100 kV. Micrographs were digitized using an Eikonix IEE488 camera with a pixel size equivalent to 7 Å in the specimen. Single particles (1,200) of the GroEL-GroES complex were processed and averaged using the procedures described.13
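The sampling figures quoted above imply a digitization step on the micrograph and an upper bound on the recoverable resolution. The sketch below derives both; the scan step is not stated in the text and is inferred here under the assumption that the nominal magnification applies, and the function names are hypothetical.

```python
ANGSTROM_PER_MICROMETRE = 1.0e4


def scan_step_on_film_um(pixel_size_angstrom: float, magnification: float) -> float:
    """Sampling step on the micrograph (micrometres) corresponding to a given
    pixel size at the specimen: step = pixel size * magnification."""
    return pixel_size_angstrom * magnification / ANGSTROM_PER_MICROMETRE


def nyquist_limit_angstrom(pixel_size_angstrom: float) -> float:
    """Finest spacing representable at this sampling (two pixels per period)."""
    return 2.0 * pixel_size_angstrom


print(scan_step_on_film_um(7.0, 60_000))  # 42.0
print(nyquist_limit_angstrom(7.0))        # 14.0
```

The 14 Å figure is only the sampling limit; the resolution actually achieved with negatively stained particles is normally lower.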

RESULTS AND DISCUSSION

Linear Prediction

The secondary structure of GroES was predicted from the multiple sequence alignment of 38 sequences from the GroES family. The predicted structure corresponds to an all-β protein (Fig. 1). The prediction is of high reliability, since where the reliability value is greater than 7 (see legend to Fig. 1), the predicted secondary structure is expected to be correct for 91.1% of residues. Eight β-strands were predicted, β8 with a slightly weaker prediction value. The regions connecting the β-strands are predicted to be loops with high reliability, except loop 2 (connecting β-strands 1 and 2; Fig. 1) and loop 8 (connecting β-strands 7 and 8). This prediction is supported by the presence of many Pro and Gly residues, and the only insertions and deletions in the GroES sequences occur in regions predicted to be loops.

[Figure 2: infrared spectra, panels A and B; horizontal axis: wavenumber, cm-1 (1,800-1,500).]

Fig. 2. Original (lower traces) and Fourier self-deconvolved (upper traces) spectra of GroES in H2O buffer (A) and D2O buffer (B). Fourier self-deconvolution was performed using Lorentzian bandwidths of 18 cm-1 and a resolution enhancement factor of 2.2. The buffer used was 50 mM Hepes, 150 mM NaCl, pH 7.4.

The degree of solvent accessibility predicted for some of the amino acid residues is shown in Figure 1. All β-strands have at least one residue that is predicted to be buried with high reliability. It is also interesting to note that almost all the predicted loops contain residues that are strongly predicted as exposed to the solvent. The low average reliability of the "in/out" prediction for GroES may indicate a high degree of accessibility, which is confirmed by the experimental finding that the NH groups of the protein are readily exchangeable with the solvent (D2O).

FTIR spectroscopy was used to check the secondary structure prediction, particularly because this technique is very sensitive to β-sheet conformation. Figure 2 shows the infrared spectra in the 1,800-1,500 cm-1 region of GroES in H2O and D2O, after digital subtraction of the respective buffer spectra. In H2O (Fig. 2A), two main bands are seen in this spectral region: the amide I band, which arises primarily from stretching vibrations of the backbone C=O groups, at 1,640 cm-1, and the amide II band around 1,549 cm-1. Upon deuteration (Fig. 2B), the amide I band shifts to 1,634 cm-1, while there is a dramatic reduction of the intensity of the amide II band, which shifts to approximately 1,446 cm-1 (not shown) as a consequence of isotopic substitution of the exchangeable NH protons by D. The bands persisting between 1,600 cm-1 and 1,500 cm-1 are due to amino acid side-chain absorptions (tyrosine at 1,515 cm-1 and overlapping bands of aspartic acid, glutamic acid, and arginine around 1,581 cm-1).

A more detailed interpretation of these spectra was obtained from the deconvolved amide I bands, shown in Figure 2 (upper traces). Band-narrowing reveals that the amide I contour is composed of several component bands, the frequencies of which are related to the molecular geometry and hydrogen-bonding pattern of the peptide backbone, which, in turn, is determined by the particular secondary structure adopted by the protein,25,26,39,40 and can be summarized as follows. The dominant band observed at 1,635 cm-1 (H2O) and 1,630 cm-1 (D2O) is highly diagnostic of amide groups in a β-sheet conformation. The presence of weaker bands around 1,685 cm-1 (H2O) and 1,680 cm-1 (D2O), in conjunction with a strong band around 1,630 cm-1, is assigned to neighboring peptide C=O groups in an antiparallel β-sheet conformation. The relative integrated intensity of this component is 30 ± 3% of the amide I band area. The assignment of the second major component, which appears at 1,646 cm-1 (H2O) and 1,641 cm-1 (D2O), is more controversial. The position at which this band is located in D2O buffer lies in the spectral region typical of "random" conformation (1,646-1,640 cm-1). However, the fact that in H2O its frequency is well below those characteristic of unordered peptide segments (1,658-1,650 cm-1) argues against this assignment. In addition, bands arising from unordered structures shift as much as 10 cm-1 towards lower wavenumbers upon deuteration, whereas in this case, this component undergoes a downward shift of 5 cm-1, similar to that observed for the α-helical (see below) and β-sheet structures. Moreover, on the basis of FTIR studies on streptokinase, α-lactalbumin, and RNase T1,43 this band has been assigned to extended fully hydrated "open loops," structural regions that would not interact with nearby amide functional groups. The existence of a highly mobile and flexible loop region in the GroES molecule has been recently demonstrated by 1H-NMR.44 We propose that the band at 1,646 cm-1 (H2O) and 1,641 cm-1 (D2O) arises from the vibration of a large domain (27 ± 3% of the amide I band, as determined by curve-fitting analysis of the infrared spectra), composed of extended "open loop" structures, which would contain the "mobile domain" of residues 17-32,44 and which would project into the solvent with little or no interaction with the rest of the protein structure.45

There is strong support in the literature for the assignment of the component band observed at 1,658 cm-1 (H2O) and 1,652 cm-1 (D2O) to an α-helical structure, its intensity being 12 ± 2% of the amide I band area of GroES. There is only one small region around residues 16-21 with a tendency to form an α-helix, although it is only weakly predicted. However, the fact that this region lies in the middle of what has been characterized by NMR44 as a region lacking any stable secondary structure questions this assignment.
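As a rough cross-check, the band areas quoted above can be converted into approximate residue counts. The sketch assumes that band area scales linearly with the number of residues in each conformation (equal absorptivity for all amide groups), which is a crude approximation; the dictionary keys and function name are hypothetical labels.

```python
GROES_LENGTH = 97  # residues in the E. coli sequence, as stated in Methods

# Relative amide I band areas (percent) quoted in the text.
band_areas = {
    "antiparallel beta-sheet": 30.0,
    "extended open loops": 27.0,
    "alpha-helix (tentative)": 12.0,
}


def residues_for_area(percent_area: float, length: int = GROES_LENGTH) -> int:
    """Approximate residue count, assuming band area scales with residue number."""
    return round(length * percent_area / 100.0)


for name, area in band_areas.items():
    print(f"{name}: {area:.0f}% -> about {residues_for_area(area)} residues")

remainder = 100.0 - sum(band_areas.values())
print(f"area not covered by these three components: {remainder:.0f}%")
```

Under this assumption the "open loop" component corresponds to roughly 26 residues, which is larger than the 16-residue mobile domain (residues 17-32) alone.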

At this level, the structure of GroES is predicted to be formed by eight β-strands. The region comprising β-strands 1-4 and loops 1-4 is mobile and loosely hydrogen bonded, as indicated by FTIR and NMR experiments, as well as by prediction methods.

Arrangement of Two-Dimensional Elements

After predicting the distribution of secondary structural elements along the GroES sequence, the next step was to characterize the interactions among them to predict higher structural levels. Two different prediction methods were used: 1) β-strand pair preferences, and 2) correlated mutations.

In Figure 3, the predicted preferences for β-β pairs are displayed in the form of a contact map, indicating the likelihood of any two segments of sequence to interact in either an antiparallel or parallel β-conformation. The contact map is calculated without any secondary structure information being input; however, PHD results are displayed on the map to aid interpretation. In most cases PHD secondary structure prediction and predicted pairings coincide; however, the antiparallel pairings predicted for residues 2-6 suggest that this region may be an additional β-strand, which would most likely form an antiparallel interaction with the adjacent β1. These maps should be considered in the context that 1) each strand can make at most two pairs, and 2) β-strands adjacent in the sequence are the most likely to pair (57% of β-strand interactions are between adjacent β-strands; Hubbard, unpublished results). The map shows that the protein consists of two clear parts: β-strands 1-3 forming mainly parallel interactions and β-strands 4-8 forming mainly antiparallel interactions. There are preferences for antiparallel contacts between β-strands 5-6, 6-7, and 4-5. The preferences for β-strands 1, 2, and 3 point most likely to parallel pairing between β-strands 1-2 and 2-3. There is little evidence of pairing between β4 and β-strands 1, 2, or 3. A more likely interaction seems to be a parallel interaction between β3 and β8.

Figure 4 shows a diagram of the position of the main correlated mutations. A clear pairing between strands 5 and 6 is observed, which reinforces the predicted interaction between the two strands. There are also predicted contacts between residues of loop 3, which may suggest contacts between strands β2 and β3; the same can be said for loop 2 with respect to strands β1 and β2.

Another key feature for assigning some of the pairing between β-strands is the size of the intervening loops. Loops 5-8 are short (around 3-5 residues), which makes parallel pairing impossible between adjacent strands. Given that there is a predicted interaction between β-strands 5 and 6 and that FTIR shows a major band (around 30% of the intensity of the amide I area) assigned to an antiparallel β-sheet structure, we predict that β-strands 4-8 are most likely to be in an antiparallel conformation. Loops 1-4 are too long to impose this restriction, and no correlation constraints on the pairing of neighboring β-strands are predicted. We propose that the region composed of β-strands 1-3 adopts a parallel β-sheet conformation with very loose hydrogen bonds among the strands. β4 may also form part of this region since it is predicted to be 1) loosely bound to the rest of the structure, 2) exposed to the solvent, and 3) an edge β-strand. Together with loops 1-4, β-strands 1-4 would be part of the mobile region described above by FTIR as composed of extended "open loop" structures, responsible for the sharp NMR signals of GroES. β1 may not be strongly attached to the rest of the structure since GroES can be proteolyzed within loop 2, leaving a protein able to form a heptamer but unable to interact with GroEL (unpublished results). This also supports the prediction that β1 is an edge β-strand.

Fig. 3. Prediction of parallel and antiparallel β-strand pair preferences (see Methods). The GroES sequence is shown vertically and horizontally with the PHD prediction. Predicted strands are labelled "beta-1" to "beta-8" horizontally. Raw predicted β-strand pair preferences are shown above the diagonal: lines running NW-SE are parallel predictions, lines running NE-SW are antiparallel predictions. Parallel (P) and antiparallel (A) pairings assumed in the final proposed structure (see Fig. 5) are shown below the diagonal.

Fig. 4. Distribution of conserved, correlated, and tree-determinant residues along predicted secondary structure elements of the GroES sequence. The secondary structure prediction information is taken from Figure 1, but it should be remembered that the prediction of the beginnings and ends of secondary structural elements is never particularly exact. β-strands are labeled from b1 to b8 and loops from L1 to L8, as in the text. Conserved positions are indicated by circles with the conserved residue labeled. Correlated pairs of positions are connected by lines and tree-determinant residues are marked by black diamonds. In some positions more than one diamond is present since there are two groups of sequences indicating a tree-determinant residue at that position. The position of the best 20 correlated mutations is shown by lines connecting the correlated positions. Correlated mutations are proposed to correspond to residues close in the three-dimensional structure of the protein.

Search for a Similar Folding Pattern in Current Databases

Can this set of predicted structural constraints be accommodated in any of the currently known protein folds? The number and variety of all-β folding motifs in proteins is very large, even though they probably do not describe all possible ones.35,46-48 It is difficult, when searching for folding types, to distinguish among them since most of the contacts between β-strands are very similar and the only difference is in β-strand order. A search for folds compatible with the GroES sequence family using HMMs (see Methods) found the most likely folds among all-β proteins to be those of the immunoglobulin superfamily, fatty acid binding protein, and β-propeller family (data not shown). The Greek-key motif (the most abundant motif in nature) present in some of the possible candidates is difficult to fit with the predicted tendencies for antiparallel contact between β-strands 5 and 6 and with the small size of the loops connecting β-strands 4, 5, and 6. The absence of predicted interactions between β-strands 3 and 4 and β-strands 8 and 1 suggests that the GroES structure cannot be that of a classical barrel, but rather that of an "open barrel." Structural differences between fatty acid binding protein or immunoglobulins and GroES are also observed by FTIR, mainly involving the band attributable to the mobile region of GroES, which is either a minor feature (fatty acid binding protein) or virtually absent (immunoglobulins) in the spectra of these proteins.

Fold recognition "hits" in searches of this kind can be matched to structural "fragments" rather than to an entire domain. The alignment to the fatty acid binding protein almost perfectly matches strands 4-8 onto five strands of the fatty acid binding protein all-antiparallel sheet; however, strands 1-3 are misaligned and there are in any case too few strands in GroES to complete a fatty acid binding protein-like barrel (which has ten β-strands). The alignment to the β-propeller family is to two consecutive four-strand antiparallel segments of the propeller, which is made of seven such segments. Again, the match may be primarily due to the antiparallel region of GroES. It is interesting to note the sevenfold similarity between GroES and these propellers: one possible topology for GroES would be where β-strands 4-8 from each GroES form each blade of the propeller and β-strands 1-3 lie beneath it, close to GroEL. However, no propeller has been observed composed of multiple subunits, and such a multimer structure may be unlikely since the segments of the propeller pack so closely together. Preliminary experimental data indicate that folds with such close interactions between subunits are not very likely candidates since the GroES monomer can be obtained free in solution as a stable protein. Since there appears to be no ideal match in the PDB database, GroES may have a novel fold.

At this level, the structure of GroES is predicted to be formed by eight β-strands, with strands 4-8 most likely forming a single antiparallel β-sheet with β4 at one of the edges. Facing this structural region, β-strands 1-3 would be placed in a parallel fashion and (with the corresponding loops) would form a flexible and mobile region generating an extended "open loop" structure. β-strands 3 and 8 are most likely interacting and therefore on the same side of the molecule. This arrangement can be better described as a β-barrel with loosely bound ends in β1 and β4 and a mobile and flexible region composed of β-strands 1-4 and the intervening loops.
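The proposed arrangement can be checked against the rule, stated earlier, that each strand can make at most two pairs. In the sketch below the pairing list is read from the text; the antiparallel 7-8 pair is an assumption implied by strands 4-8 forming a single antiparallel sheet and is not listed explicitly among the predicted preferences. Function and variable names are hypothetical.

```python
from collections import defaultdict

# (strand, strand, orientation): P = parallel, A = antiparallel.
# The 7-8 pair is an assumption (see lead-in).
proposed_pairs = [
    (1, 2, "P"), (2, 3, "P"), (3, 8, "P"),
    (4, 5, "A"), (5, 6, "A"), (6, 7, "A"), (7, 8, "A"),
]


def partner_counts(pairs):
    """Number of pairing partners for every strand that appears in `pairs`."""
    counts = defaultdict(int)
    for a, b, _orientation in pairs:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)


def check_topology(pairs, n_strands=8):
    """Return (valid, edge_strands): valid if no strand has more than two
    partners; edge strands are those with exactly one partner."""
    counts = partner_counts(pairs)
    strands = range(1, n_strands + 1)
    valid = all(counts.get(s, 0) <= 2 for s in strands)
    edges = sorted(s for s in strands if counts.get(s, 0) == 1)
    return valid, edges


print(check_topology(proposed_pairs))  # (True, [1, 4])
```

The two strands with a single partner are β1 and β4, which matches the description of a barrel with loosely bound ends at those strands.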

Analysis of Family Sequences: Conserved and Tree-Determinant Residues

As described earlier, 38 GroES family sequences are currently known (data not shown). The distribution of conserved residues is given in Figure 4. These are mainly glycine residues in loop 4 (44, 46, and 52), loop 5 (62), and loop 6 (72). Glycine is known to be a conserved residue in many β-hairpins, contributing mainly to the tight turn between β-strands. The position of these residues is therefore very significant and supports the antiparallel pairing prediction. Also significant is the strict conservation of three charged residues (Asp, in loop 1, Lys,, in loop 3, and Asp,, in loop 5), and the conservative substitution of Asp,, and Glu,, by Glu and Asp, respectively.
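The kind of conservation analysis described above can be illustrated with a minimal sketch. It assumes a gapped multiple alignment held as equal-length strings; the function names and the toy alignment are hypothetical and are not the 38-sequence GroES family alignment used in this work.

```python
from collections import Counter


def column_conservation(alignment):
    """Return (most common residue, frequency) for every alignment column."""
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        residue, count = counts.most_common(1)[0]
        result.append((residue, count / n_seqs))
    return result


def strictly_conserved(alignment, gap="-"):
    """List (1-based position, residue) for columns identical in all sequences."""
    return [
        (i + 1, residue)
        for i, (residue, freq) in enumerate(column_conservation(alignment))
        if freq == 1.0 and residue != gap
    ]


# Hypothetical toy alignment, for illustration only.
toy_alignment = ["AGDKV", "SGDRV", "AGEKV", "TGDKI"]
print(strictly_conserved(toy_alignment))  # [(2, 'G')]
```

In this toy case only the glycine column is strictly conserved, while the Asp/Glu column would count as a conservative substitution under a looser criterion.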

Tree-determinant residues were determined by “Sequencespace.” The positions of these residues are given in Figure 4. Tree-determinant residues contain most of the information about the differences between groups of sequences found in the family, i.e., different GroES oligomers interact with different GroEL oligomers. In the proteins that have been analyzed previously, these residues have been found to form part of binding pockets. Therefore, the prediction in this case is that the main regions containing these residues (loops 1-4) form part of the same protein surface and that this surface has a role in functional specificity. This also strengthens the argument that β-strands 1-3 are placed in a parallel conformation.
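The idea behind tree-determinant residues can be sketched in a deliberately simplified form: positions that are invariant within each subfamily but differ between subfamilies. This is not the Sequencespace method itself; the grouping, the function name, and the toy sequences below are assumptions made only for illustration.

```python
def tree_determinant_positions(subfamilies):
    """Return 1-based columns conserved within every group but not across groups.

    subfamilies maps a group name to a list of equal-length aligned sequences.
    """
    groups = list(subfamilies.values())
    length = len(groups[0][0])
    positions = []
    for col in range(length):
        # Set of residues seen at this column in each subfamily.
        per_group = [{seq[col] for seq in seqs} for seqs in groups]
        conserved_within = all(len(residues) == 1 for residues in per_group)
        distinct_between = len(set.union(*per_group)) > 1
        if conserved_within and distinct_between:
            positions.append(col + 1)
    return positions


# Hypothetical subfamilies, for illustration only.
toy = {
    "group_a": ["GKDLV", "GKDIV"],
    "group_b": ["GRDLV", "GRDMV"],
}
print(tree_determinant_positions(toy))  # [2]
```

Column 2 separates the two groups (Lys versus Arg), whereas column 1 is conserved across the whole family and column 4 varies even within a group.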

Fitting the GroES Model Within the Low-Resolution Structure of the GroES Monomer and the GroEL-GroES Complex

Comparison of two-dimensional projection images from average side-views of GroEL and GroEL-GroES complexes has allowed GroES to be defined as the extension found in the apical regions of the GroEL-GroES complex. Also, the two-dimensional crystals of GroES have helped to define the GroES morphology as that of a doughnut-shaped aggregate with a truncated-cone profile in the side view. From the average side-view of the asymmetric GroEL-GroES complex obtained by electron microscopy and image processing of single particles (Fig. 5B), each of the seven GroES monomers building the complex could be approximated to a wedge-shaped irregular parallelogram, with the rough dimensions outlined in Figure 5C. The outer surface of the GroES monomers is tilted 40 degrees with respect to the longitudinal axis of the GroEL-GroES complex. The assembly of the GroES monomers on top of the GroEL oligomer (Fig. 5B) would then lead to a dumbbell-shaped structure, with a channel in the center with a diameter of 25 Å in the region interacting with GroEL and 10 Å at the outer surface, and aligned with the central cavity described for GroEL.

The rough dimensions obtained for the GroES monomer (36 Å wide in the outer part of the molecule and 30 Å high; see Fig. 5C) fit with the size of the proposed β-strands. (The 30 Å is compatible with the width of a β-sheet of five strands, assuming an average width of 5 Å between strands plus the protruding side chains, and 30 Å high is enough to accommodate β-strands of 5-6 residues.)
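The size argument above can be checked with a few lines of arithmetic. The 5 Å strand spacing and the monomer dimensions come from the text; the rise of about 3.3 Å per residue along a β-strand is an assumption introduced here (a typical textbook value), not a figure given by the authors.

```python
STRAND_SPACING_A = 5.0      # average spacing between strands (from the text)
RISE_PER_RESIDUE_A = 3.3    # assumed typical rise per residue in a beta-strand
MONOMER_WIDTH_A = 36.0      # outer width of the monomer from the EM data
MONOMER_HEIGHT_A = 30.0     # height of the monomer from the EM data


def sheet_width(n_strands, spacing=STRAND_SPACING_A):
    """Backbone width of a sheet, ignoring protruding side chains."""
    return n_strands * spacing


def strand_length(n_residues, rise=RISE_PER_RESIDUE_A):
    """Approximate end-to-end length of an extended strand."""
    return n_residues * rise


width = sheet_width(5)
longest = strand_length(6)
print(f"five-strand sheet backbone width: {width:.1f} A")
print(f"six-residue strand length: {longest:.1f} A")
print("sheet fits monomer width:", width <= MONOMER_WIDTH_A)
print("strand fits monomer height:", longest <= MONOMER_HEIGHT_A)
```

Under these assumptions the five-strand sheet spans about 25 Å before side chains are added, and strands of 5-6 residues are well under the 30 Å height.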

Additional information can be used to place the proposed model of the GroES monomer within the framework of the low-resolution three-dimensional structure of GroES and the GroEL-GroES complex obtained by EM and image processing of single particles of the GroEL-GroES complex. This information comes mainly from biochemical data about the domain of GroES most likely interacting with GroEL, as well as prediction data about the loops of GroES facing the inside/outside of the molecule and contacting GroEL. As mentioned above, the proposed parallel arrangement of β-strands 1-3 and the presence of tree-determinant residues in loops 1 to 4 make all these loops candidates to form part of the binding surface. Part of this region is described by NMR to form a mobile domain,44 and our FTIR experiments detect a structural region of similar size that does not interact with nearby amide functional groups. This region becomes immobilized upon ATP-dependent formation of the GroEL-GroES complex. It is suggested that this region interacts directly with GroEL, since GroEL-GroES complex formation strongly reduces the transferred nuclear Overhauser effect signals observed for GroES in solution.44 This is confirmed by the use of a synthetic peptide comprising residues 13 to 32 (which includes part of the predicted β1, L2, β2, and L3); this peptide binds to GroEL, directly involving Ile25-Val26-Leu27.

Fig. 5. Proposed topology of GroES and its possible arrangement in the GroEL-GroES complex. a: Cartoon of the predicted topology of GroES. Loop regions L1 to L4 are situated in the first plane, close to the viewer. They are shown as rectangles just to indicate their considerable size. In a second plane are situated β-strands β1-β3 (b1, b2, and b3 in the figure). The third plane, farther from the viewer, contains strands β4-β8 (labeled b4 to b8). Despite the rigid and geometrical aspect of the cartoon, our data indicate that the structure is very exposed to the solvent and most likely very flexible. The N-terminal region, including β-strands 1-4, is probably very flexible and loosely bonded (see text). b: Average image projection of the side view of the GroEL-GroES complex, including gray level curves. The encircled region shows the projection of the area occupied by a GroES monomer. GroEL is responsible for the rest of the density below the GroES structure. The central cavity is clearly visible. c: The dimensions of the GroES oligomer are shown from the side and above, calculated from the EM data shown in b. From these views of the complex and assuming the heptameric conformation, the dimensions of the monomer were calculated. d: Predicted arrangement of the GroES model within the EM density of the monomer. Top: Enlarged view of the top of c. Vertical hatched line, longitudinal axis of the GroEL-GroES complex, same as hatched vertical arrow in top of c; curved horizontal solid line at the bottom of the GroES monomer, GroEL interface; horizontal dot-dash line, approximated cutting plane used for obtaining the section shown in the bottom of d; cross-hatched ellipse labeled "a," loop regions L1-L3 placed in the interface between GroEL and GroES and close to the central cavity; labeled "b," β-strands 1-3 placed in the interior of the molecule; labeled "c," β-strands 4-8 running parallel to β-strands 1-3 and placed by the outer surface of the molecule. Bottom: View of the eight β-strands in the interior of the monomer. The β-sheets are represented as curved to fit the average behavior of β-barrels, even if we are predicting a very loose interaction between the two sets of β-sheets (see text).

Proteolysis experiments have helped to define the N-terminal region as the one interacting with GroEL. Trypsin-controlled digestion of GroES, which cuts between Lys,, and Ser,, (in the predicted loop 2), is impeded when GroES is bound to GroEL. The proteolyzed GroES maintains its oligomerized state but is unable to interact with GroEL (unpublished results). Proteolysis of GroES with proteinase K gives two fragments, the first corresponding to the first 32 residues of the GroES sequence. Again, this cleavage site (between Ala32 and Ala33, in the predicted loop 3) becomes inaccessible to the protease upon formation of the GroEL-GroES complex.51 Taken together, these results clearly point to the mobile region formed by β-strands 1-3 and loops 1-4 as the one involved in the interaction with GroEL, but not taking part (at least the first 20 residues) in the GroES monomer-monomer interaction.

The model that we propose would contain the following features (Fig. 5D): An antiparallel β-sheet formed by β-strands 4-8 would run along the outer surface of the monomer. A second group of parallel β-strands (1-3) would run on the interior of the molecule, almost parallel to the antiparallel β-sheet and having little interaction with it, forming altogether an "open" barrel. Finally, loops 1-4, connecting β-strands 1-3, would face the internal cavity of the GroEL-GroES complex and account for most of the interaction between GroES and GroEL. The conserved Asp,, in loop 5 would also be placed in the region of interaction with GroEL. β-strands 8 and 1 of a GroES monomer would interact with β-strands 4 and 3 of a different monomer, respectively. There are indeed contacts between these strands that are predicted by correlated mutations (Fig. 4).

The region comprising strands 1-4 and loops 1-4 would form a flexible and mobile domain that becomes immobilized upon interaction with GroEL. Interestingly, the region of GroEL interacting with GroES is the one that has been most difficult to elucidate in the three-dimensional map at atomic resolution and has been suggested to be inherently flexible. The surface of GroEL proposed to contact GroES maps to a region that matches well with the one proposed here for GroES (see Fig. 1 in ref. 2). The flexibility of part of the GroES monomer has been described earlier, but this flexibility also applies to the GroES oligomer. EM observations reveal that the GroES molecules possess a certain plasticity15 (our unpublished observations). Such flexibility has been interpreted to be required for the GroES oligomer to bind each subunit of GroES to a subunit of GroEL, which also shows molecular plasticity.15,53

Finally, labeling experiments with an azido derivative of ATP have located in GroES a nucleotide binding site involving Tyr71.51 Given the low affinity of GroEL for ATP and that GroES has no detectable ATPase activity, Martin et al.51 have suggested that the ATP binding by GroES could play a role in increasing the probability that all seven ATP sites of the GroEL toroid interacting with GroES are occupied simultaneously, which would be very important in obtaining highly cooperative ATP hydrolysis to release bound substrates for folding. According to these authors, there is a possibility that GroES donates ATP to GroEL, thereby increasing the cooperativity of ATP binding. According to our model, Tyr71 would be located in loop 6, in the outer region of the GroES monomer, facing the solvent (Fig. 5A,D) and on the same side as the window that is localized at the level of the GroEL intermediate domain. This window or portal has been hypothesized to play a role in the entrance and exit of nucleotides from the ATP binding site, placed in between the apical and intermediate domains and facing the GroEL channel.

CONCLUSIONS

We have carried out a prediction study of the GroES structure using a combination of prediction techniques and experimental results. We think that this is a valid exercise, first because it has allowed us to explore the possibilities of new technologies in the field of structure prediction, and second because it produces quite a detailed model of GroES and its interaction with GroEL. From this model, interesting biological hypotheses can be derived and experimentally verified. Some of these hypotheses are complementary to the resolution of the three-dimensional structure, e.g., the presence of key residues for binding to GroEL.

Prediction of the GroES structure was carried out at four different levels:

1. Linear prediction. Eight β-strands are predicted. All the protein is highly exposed to the solvent, especially loops 1-3. The N-terminal region from loop 1 to β4 is mobile and loosely hydrogen bonded.

2. Topology. The GroES molecule seems to be made up of two β-sheets that most likely form a β-barrel with loose ends. β-strands 4-8 form a single antiparallel β-sheet, with β4 loosely bound as an edge strand. β-strands 1-3 are parallel. β-strands 3 and 8 are on the same side of the molecule.

3. Surface of interaction with GroEL. Loops 1, 2, and 3 and conserved residues in loops 5 and 7 are the active surface interacting with GroEL.

4. Fitting the model with the EM density. β-strands 4-8 run along the outer surface of the monomer. β-strands 1-3 are located in the interior of the molecule. Loops 1-4 connect β-strands and would face the internal cavity of the GroEL-GroES complex and account for most of the interaction between GroES and GroEL. The conserved Asp,, in loop 5 would also be placed in the region of interaction with GroEL. This region of GroES would therefore match the one that in GroEL has been elucidated to interact with GroES.

Regarding the technical aspects of the prediction, we would like to mention five points that we find relevant: 1) accurate secondary structure prediction is essential for protein modeling; ambiguous assignments of secondary structure elements dramatically increase the number of alternative structures; 2) since the number of topologies for all-β proteins is high, identifying the native topology for a given sequence is a particularly difficult problem; the problem also becomes much harder as the number of β-strands increases, since when predicting β-β strand pairs, the number of possible pairs increases with the square of the number of β-strands; 3) prediction of long-range contacts is an important element for structure prediction; a few correctly predicted contacts may be compatible with only a few possible folds, but current techniques are still in their infancy; 4) simple predictions of active site surfaces based on the presence of conserved residues and “tree determinants” are one of the most reliable ways of rejecting wrong topologies; we expect to see more developments in this field in the future; and 5) integration of experimental and prediction data is a very interesting and open field. The use of EM data is an innovative way of setting shape and volume constraints to reduce the number of possible three-dimensional protein models.
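The quadratic growth mentioned in point 2 can be made concrete with a short sketch. It counts unordered strand pairs as n(n-1)/2 and, as an added assumption for illustration, doubles the count when each pair may be either parallel or antiparallel; the function names are hypothetical.

```python
from math import comb


def strand_pair_count(n_strands):
    """Number of distinct unordered strand pairs: n * (n - 1) / 2."""
    return comb(n_strands, 2)


def oriented_pair_count(n_strands):
    """Pairs counted separately for parallel and antiparallel pairing."""
    return 2 * comb(n_strands, 2)


for n in (4, 8, 10):
    print(n, strand_pair_count(n), oriented_pair_count(n))
# 4 6 12
# 8 28 56
# 10 45 90
```

For the eight strands predicted in GroES, this gives 28 candidate pairs, or 56 when orientation is considered, before any experimental or predicted constraint is applied.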

There are now a very large number of protein families in which many sequences are known; considerable experimental information exists, but no X-ray or NMR structures are in sight. We feel that predictions of this type, particularly if they can be made more automated and applied to genomic databases on a large scale, can provide a very useful additional level of structural information for the experimentalist.

ACKNOWLEDGMENTS

We are indebted to our friends in the EMBL Protein Design Group for their critical comments on the model, particularly Liisa Holm for her observations about the handedness of the structure. We thank Burkhard Rost, Reinhard Schneider, and Chris Sander for access to the PHD secondary structure prediction server, Sean Eddy for use of the HMM program suite, Georg Casari for the use of “Sequencespace,” and Ulrike Gobel for the program to calculate “correlated mutations.” This work was supported in part by grant PB91-0109 from the DIGICYT (J.L.C.), a “structure-function of proteins” grant from the C.S.I.C. (J.M.V.), grant BIO94-1067 from the DIGICYT and a “structure-function of proteins” grant from the C.S.I.C. (A.V.), and grant PGV 9212 from the Gobierno Vasco (A.M.). T.H. is grateful to the MRC and ZENECA for financial support. S.B. and O.L. are recipients of predoctoral fellowships from the Gobierno Vasco and Gobierno de Navarra, respectively.

REFERENCES

1. Ellis, R.J., Van der Vies, S.M. Molecular chaperones. Annu. Rev. Biochem. 60:321-347, 1991.

2. Hartl, F.U. Secrets of a double-doughnut. Nature 371:557-559, 1994.

3. Hubbard, T.J., Sander, C. The role of heat-shock and chaperone proteins in protein folding: possible molecular mechanisms. Protein Eng. 4:711-717, 1991.

4. Martin, J., Mayhew, M., Langer, T., Hartl, F.U. The reaction cycle of GroEL and GroES in chaperonin-assisted protein folding. Nature 366:228-233, 1993.

5. Weissman, J.S., Kashi, Y., Fenton, W.A., Horwich, A.L. GroEL-mediated protein folding proceeds by multiple rounds of binding and release of nonnative forms. Cell 78:693-702, 1994.

6. Hohn, T., Hohn, B., Engel, A., Wurtz, M., Smith, P.R. Isolation and characterization of the host protein GroE involved in bacteriophage lambda assembly. J. Mol. Biol. 129:359-373, 1979.

7. Hendrix, R. Purification and properties of GroE, a host protein involved in bacteriophage assembly. J. Mol. Biol. 129:375-392, 1979.

8. Chandrasekhar, G.N., Tilly, K., Woolford, C., Hendrix, R., Georgopoulos, C. Purification and properties of the GroES morphogenetic protein of Escherichia coli. J. Biol. Chem. 261:12414-12419, 1986.

9. Weaver, A.J., Landry, S.J., Deisenhofer, J. Progress in the x-ray structure determination of the E. coli chaperonin GroES. Biophys. J. 64:A350, 1993.

10. Braig, K., Otwinowski, Z., Hedge, R., Boisvert, D.C., Joachimiak, A., Horwich, A.L., Sigler, P.B. The crystal structure of the bacterial chaperonin GroEL at 2.8 Angstrom. Nature 371:578-586, 1994.

11. Langer, T., Pfeifer, G., Martin, J., Baumeister, W., Hartl, F.U. Chaperonin-mediated protein folding: GroES binds to one end of the GroEL cylinder, which accommodates the protein substrate within its central cavity. EMBO J. 11:4757-4765, 1992.

12. Chen, S., Roseman, A.M., Hunter, A.S., Wood, S.P., Burston, S.G., Ranson, N.A., Clarke, A.R., Saibil, H.R. Location of a folding protein and shape changes in GroEL-GroES complexes imaged by cryoelectron microscopy. Nature 371:261-264, 1994.

13. Llorca, O., Marco, S., Carrascosa, J.L., Valpuesta, J.M. The formation of symmetrical GroEL-GroES complexes in the presence of ATP. FEBS Lett. 345:2-3, 1994.

14. Schmidt, M., Rutkat, K., Rachel, R., Pfeifer, G., Jaenicke, R., Viitanen, P., Lorimer, G., Buchner, J. Symmetric complexes of GroE chaperonins as part of the functional cycle. Science 265:656-659, 1994.

15. Harris, J.R., Pluckthun, A., Zahn, R. Transmission electron microscopy of GroEL, GroES, and the symmetrical GroEL/GroES complex. J. Struct. Biol. 112:216-230, 1994.

16. Todd, M.J., Viitanen, P.V., Lorimer, G.H. Dynamics of the chaperonin ATPase cycle: Implications for facilitated protein folding. Science 265:659-666, 1994.

17. Hunt, J.F., Weaver, A.J., Landry, S.J., Gierasch, L., Deisenhofer, J. X-ray crystal structure determination of the GroES chaperonin from E. coli. Meeting of the American Crystallographic Association, P004, 1994.

18. Moult, J. Results of the 1994 Structure Prediction Competition and meeting ‘Critical assessment of techniques for protein structure prediction.’ Proteins (in press) 1995. (Also see: ftp://iris4.carb.nist.gov/).

19. Eddy, S. HMM: Hidden Markov models for protein and nucleic acid sequence analysis. Software and documentation available from ftp://cele.mrc-lmb.cam.ac.uk/pub/sre or http://logi.mrc-lmb.cam.ac.uk/, 1994.

20. Hubbard, T.J. Use of β-strand interaction pseudo-potentials in protein structure prediction and modelling. In: “Proceedings of the Biotechnology Computing Track, Protein Structure Prediction MiniTrack of the 27th HICSS.” Lathrop, R.H., ed. IEEE Computer Society Press, 1994:336-354. (Also see ftp://ind2.mrc-lmb.cam.ac.ac.uk/pub/th/betalbeta.hicss-27.ps.Z)

21. Rost, B., Sander, C. Combining evolutionary information and neural networks to predict protein secondary structure. Proteins 19:55-72, 1994.

22. Rost, B., Sander, C. Conservation and prediction of solvent accessibility in protein families. Proteins 20:216-226, 1994.

23. Gobel, U., Sander, C., Schneider, R., Valencia, A. Correlated mutations and residue contacts in proteins. Proteins 18:309-317, 1994.

24. Casari, G., Sander, C., Valencia, A. Sequencespace: A tool for family analysis. Nature Struct. Biol. 2:171-178, 1995.

25. Arrondo, J.L., Muga, A., Castresana, J., Goni, F.M. Quantitative studies of the structure of proteins in solution by Fourier-transform infrared spectroscopy. Prog. Biophys. Mol. Biol. 59:23-56, 1993.

26. Surewicz, W.K., Mantsch, H.H., Chapman, D. Determination of protein secondary structure by Fourier transform infrared spectroscopy: A critical assessment. Biochemistry 32:389-394, 1993.

27. Perkins, S.J., Smith, K.F., William, S.C., Haris, P.I., Chapman, D., Sim, R.B. The secondary structure of the von Willebrand factor type A domain in factor B of human complement by Fourier transform infrared spectroscopy. Its occurrence in collagen types VI, VII, XII and XIV, the integrins and other proteins by average structure predictions. J. Mol. Biol. 238:104-119, 1994.

28. Bairoch, A., Boeckmann, B. The SWISS-PROT sequence data bank. Nucleic Acids Res. 19:2247-2250, 1991.

29. Sander, C., Schneider, R. Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins 9:56-68, 1991.

30. Hemmingsen, S.M., Woolford, C., van der Vies, S.M., Tilly, K., Dennis, D.T., Georgopoulos, C.P., Hendrix, R.W., Ellis, R.J. Homologous plant and bacterial proteins chaperone oligomeric protein assembly. Nature 333:330-334, 1988.

31. Rost, B., Sander, C., Schneider, R. PHD-an automatic mail server for protein secondary structure prediction. Comput. Appl. Biosci. 10:53-60, 1994.

32. Abola, E.E., Bernstein, F.C., Bryant, S.H., Koetzle, T.F., Weng, J. “Crystallographic Databases-Information Content, Software Systems, Scientific Applications.” Allen, F.H., Bergelhoff, G., Siewers, R., eds. Bonn: Data Commission of the International Union of Crystallography, 1987:107-132.

33. Kabsch, W., Sander, C. How good are predictions of protein secondary structure? FEBS Lett. 155:179-82, 1983.

34. Rost, B., Sander, C., Schneider, R. Redefining the goals of protein secondary structure prediction. J. Mol. Biol. 235:13-26, 1994.

35. Murzin, A., Brenner, S.E., Hubbard, T.J.P., Chothia, C. Scop: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247:536-540, 1995. (Also see: http://scop.mrc-lmb.cam.ac.uk/scop)

36. Fayet, O., Louarn, J.M., Georgopoulos, C. Suppression of the E. coli dnaA46 mutation by amplification of the GroES and GroEL genes. Mol. Gen. Genet. 202:435-445, 1986.

37. Arrondo, J., Muga, A., Castresana, J., Bernabeu, C., Goni, F.M. An infrared spectroscopic study of beta-galactosidase structure in aqueous solutions. FEBS Lett. 252:1-2, 1989.

38. Venyaminov, S.Y., Kalnin, N.N. Quantitative IR spectrophotometry of peptide compounds in water (H2O). I. Spectral parameters of aminoacid residues absorption. Biopolymers 30:1243-1257, 1990.

39. Krimm, S., Bandekar, J. Vibrational spectroscopy and conformation of peptides, polypeptides and proteins. Adv. Protein Chem. 38:181-364, 1986.

40. Surewicz, W.K., Mantsch, H.H. New insight into protein secondary structure from resolution-enhanced infrared spectra. Biochim. Biophys. Acta 952:115-130, 1988.

41. Fabian, H., Naumann, D., Misselwitz, R., Ristan, O., Gerlach, D., Welfle, H. Secondary structure of streptokinase in aqueous solution: A Fourier transform infrared spectroscopy study. Biochemistry 30:10479-10485, 1992.

42. Urbanova, M., Dukor, R.K., Pancoska, P., Gupta, V.P., Keiderling, T.A. Comparison of alpha-lactalbumin and lysozyme using vibrational circular dichroism. Evidence for a difference in crystal and solution structures. Biochemistry 30:10479-10485, 1991.

43. Fabian, H., Schultz, C., Naumann, D., Landt, O., Hahn, U., Saenger, W. Secondary structure and temperature-induced unfolding and refolding of ribonuclease T1 in aqueous solution. A Fourier transform infrared spectroscopic study. J. Mol. Biol. 232:967-981, 1993.

44. Landry, S.J., Zeilstra-Ryalls, J., Fayet, O., Georgopoulos, C., Gierasch, L.M. Characterization of a functionally important mobile domain of GroES. Nature 364:255-258, 1993.

45. Dobson, C.M. Flexible friends. Curr. Biol. 3:530-532, 1993.

46. Chothia, C., Finkelstein, A.V. The classification and origins of protein folding patterns. Annu. Rev. Biochem. 59:1007-1039, 1990.

47. Holm, L., Ouzounis, C., Sander, C., Tuparev, G., Vriend, G. A database of protein structure families with common folding motifs. Protein Sci. 1:1691-1698, 1992.

48. Orengo, C. Classification of protein folds. Curr. Opin. Struct. Biol. 4:429-440, 1994.

49. Perkins, S.J., Nealis, A.S., Haris, P.I., Chapman, D., Goundis, D., Reid, K.B.M. Secondary structure in properdin of the complement cascade and related proteins: A study by Fourier transform infrared spectroscopy. Biochemistry 28:7176-7182, 1989.

50. Muga, A., Cistola, D.P., Mantsch, H.H. A comparative study of the conformational properties of E. coli-derived rat intestinal and liver fatty acid binding proteins. Biochim. Biophys. Acta 1162:291-296, 1993.

51. Martin, J., Geromanos, S., Tempst, P., Hartl, F.U. Identification of nucleotide-binding regions in the chaperonin proteins GroEL and GroES. Nature 366:279-282, 1993.

52. Fenton, W.A., Kashi, Y., Furtak, K., Horwich, A.L. Residues in chaperonin GroEL required for polypeptide binding and release. Nature 371:614-619, 1994.

53. Zahn, R., Harris, J.R., Pfeifer, G., Pluckthun, A., Baumeister, W. Two-dimensional crystals of the molecular chaperone GroEL reveal structural plasticity. J. Mol. Biol. 229:579-584, 1993.