

Folding studies of immunoglobulin-like β-sandwich proteins suggest that they share a common folding pathway

Jane Clarke*, Ernesto Cota, Susan B Fowler and Stefan J Hamill

Address: Department of Chemistry, Centre for Protein Engineering, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK.

*Corresponding author. E-mail: [email protected]

Key words: evolution, fibronectin type III, immunoglobulin, protein folding, sequence alignment

Received: 19 April 1999
Revisions requested: 18 May 1999
Revisions received: 21 May 1999
Accepted: 27 May 1999
Published: 31 August 1999

Structure September 1999, 7:1145–1153
http://biomednet.com/elecref/0969212600701145

0969-2126/99/$ – see front matter
© 1999 Elsevier Science Ltd. All rights reserved.

Research Article

Background: Are folding pathways conserved in protein families? To test this explicitly and ask to what extent structure specifies folding pathways requires comparison of proteins with a common fold. Our strategy is to choose members of a highly diverse protein family with no conservation of function and little or no sequence identity, but with structures that are essentially the same. The immunoglobulin-like fold is one of the most common structural families, and is subdivided into superfamilies with no detectable evolutionary or functional relationship.

Results: We compared the folding of a number of immunoglobulin-like proteins that have a common structural core and found a strong correlation between folding rate and stability. The results suggest that the folding pathways of these immunoglobulin-like proteins share common features.

Conclusions: This study is the first to compare the folding of structurally related proteins that are members of different superfamilies. The most likely explanation for the results is that interactions that are important in defining the structure of immunoglobulin-like proteins are also used to guide folding.

Introduction

How much information can be transferred from folding studies of one protein to other members of the same family? Will folding patterns be similar, or does each protein have a unique folding mechanism dictated by its individual amino-acid sequence? Such questions can only be addressed by detailed folding studies of a number of related proteins. A major problem in the analysis of protein families is distinguishing between residues conserved for folding and residues conserved for function. Our strategy, therefore, is to study a highly diverse protein family with no conservation of function and little or no sequence identity, but with structures that are essentially the same. The immunoglobulin-like (Ig-like) β sandwich [1,2] is one of the most common structural motifs [3]. Ig-like domains are found widely in proteins of diverse function: proteins of the extracellular matrix, muscle proteins, proteins of the immune system, cell-surface receptors and enzymes. Much structural data is available so it is possible to choose proteins that are relatively easy to study (having a tryptophan for spectroscopic observation and no disulphide cross-links, for example) and there is a wealth of statistical information on sequence variability.

Theoretical studies suggest that folding pathways are conserved in protein families: specifically, that interactions constituting the folding nucleus are conserved, although the sidechains involved in these interactions may not themselves be invariant [4,5]. To test this hypothesis and discern to what extent structure specifies folding pathway, we compare the folding of five different Ig-like proteins from two different superfamilies: three Ig domains and two fibronectin type III (fnIII) domains. Importantly, they have no common function and no discernible sequence homology so that any specific features of folding behaviour they share must be related to the only common feature, their structure. We propose, from a comparison of the folding kinetics of the wild-type proteins, that the structural constraints on proteins with this Ig-like β-sandwich fold impose a common folding mechanism.

Results and discussion

Comparison of the proteins

The Ig-like fold

Members of the Ig-like structural family, or fold, are divided into superfamilies mainly on the basis of sequence similarity [2]. Different superfamilies have no significant sequence identity, that is, they have no detectable evolutionary relationship. The immunoglobulin, fibronectin type III and cadherin superfamilies have 3495, 2103 and 609 members, respectively, in the current Pfam database [6]. All the Ig-like proteins have two antiparallel β sheets packed against each other and a similar Greek key strand topology. They differ in the number and the position of peripheral strands, but share a basic strand arrangement and hydrogen-bonding pattern (Figure 1). The strands are given letters from A to G according to a standard pattern. Thus all proteins will have a sheet containing the B and E strands, but this sheet may also include an A and/or a D strand. The second sheet consists of C, F and G strands, but may also include an A (or A′) strand (in parallel with G), and/or an extra C′ and even C′′ strands. There are connecting loops between the sheets. All proteins have inter-sheet loops from B to C and from E to F strands; connections for other strands vary with the number and arrangement of strands at the edge of the two β sheets (Figure 1).

The proteins

Five different proteins are used in this study, representing two different superfamilies. The structure and strand topology of these proteins are shown in Figure 1. The two fnIII domains are isolated modules of two human extracellular matrix proteins: the third fnIII domain from human tenascin (TNfn3) and the tenth fnIII domain of human fibronectin (FNfn10). These have significant sequence identity (~23%) (Table 1) and a larger number of residues that are structurally equivalent (~70%); only the A–B, C–C′, C′–D and F–G loops are different in structure (Figure 2). The conserved core tryptophan is found in the B strand. FNfn10 has eight proline residues, four of which are conserved in TNfn3, which has five prolines. All the prolines are found in the loop regions, or at the start or end of strands.

The three Ig domains are representative of two different Ig structural subsets, and are more diverse in sequence than the fnIII domains (Table 1). Two domains are isolated modules from giant muscle proteins, from human titin (TI I27) and from Caenorhabditis elegans twitchin (TWIg18′). These are members of the intermediate, or I-set. They are low in sequence identity (~12%), but have a larger number of structurally equivalent residues (~55%). CD2 domain 1 (CD2d1) from rat is a member of the V-set, named after the antibody variable domains. It has low sequence identity to TWIg18′ and TI I27 (~10%) but has a significant structural similarity (~50% equivalent residues). All three Ig domains have a number of prolines but TWIg18′ is the only domain with a cis proline (in the B–C loop). In the Ig domains the conserved tryptophan residue is found in the C strand, and CD2d1 has a second, solvent-exposed tryptophan in the A′ strand.

Figure 1

Structure of the Ig-like β-sandwich proteins described in this study. (a–c) Ig I-set domains TWIg18′ (b) and TI I27 (c). (d,e) Ig V-set domain CD2d1. (f–h) fnIII domains TNfn3 (g) and FNfn10 (h). The proteins differ in the number and arrangement of peripheral β strands. The cartoons (a,d,f) show the sandwich ‘opened up’ to illustrate the arrangement of the strands into two sheets: all proteins have strands B, C, E, F and G, shown in black. The diagrams were constructed using MolScript [32], with the following coordinate files: 1wit.brk (TWIg18′ [27]), 1tit.brk (TI I27 [33]), 1hng.brk (CD2d1 [34]), 1fnf.brk (FNfn10 [35]) and 1ten.brk (TNfn3 [36]).

A common structural core

Figure 2 shows the sequence alignment based on the structural superposition of all five proteins. There is a common structural core composed of residues from the B, C, E, F and G strands where the backbone atoms can be aligned in all five structures, with a pairwise backbone root mean square deviation (rmsd) ranging from 0.7 to 1.3 Å (mean = 1.0 Å) (Figure 2). Examination of these structurally equivalent residues in the three-dimensional structures shows that they form an interacting network of ten residues in the hydrophobic core of the proteins. Figure 3 shows the common core residues of TWIg18′ and TNfn3. Although they are apparently unrelated in evolution, being members of different superfamilies, TWIg18′ and TNfn3 have 24 residues (10 buried, 14 surface) that are structurally equivalent (Table 1). There is no significant residue identity in the common core of these proteins, but the pattern of inter-residue interactions is similar. These common interactions are centred around the buried residues in the B and F strands (yellow and orange in Figure 3) on opposite sheets of the sandwich. Common core residues from strands C, E and G (green, blue and red, respectively, in Figure 3) pack onto these sidechains from the B and F strands. All five proteins in our study have a pattern of hydrophobic core interactions that is the same in this region.

Table 1

Similarity of the β-sandwich proteins described.

Protein              TI I27      TWIg18′     CD2d1       TNfn3       FNfn10
Family               Ig I-type   Ig I-type   Ig V-type   fnIII       fnIII
Number of residues   89          93          98          92          96

TI I27               –           9           10          8           4
TWIg18′              54 (1.7)    –           14          6           7
CD2d1                47 (1.5)    49 (1.2)    –           6           10
TNfn3                22 (1.3)    24 (1.1)    25 (0.9)    –           22
FNfn10               22 (1.4)    24 (1.2)    25 (1.0)    70 (1.2)    –

Above the diagonal the table gives the pairwise sequence identity (see Figure 2). Below the diagonal the table shows the number of residues where the backbone can be superimposed, and in parentheses the backbone rmsd (in Å) for those residues.

Figure 2

Structure-based sequence alignment. The structures were aligned in a pairwise fashion using the program Insight and compared by inspection. An alignment of all five proteins was made, and regions where the structures superimposed were determined. Regions of alignment are shown as follows: red, all five (or four) proteins align; green, Ig proteins align; blue, fnIII proteins align; orange, three proteins align. See also Table 1. Residue numbers are indicated.

Strands A, A′, B, C, C′:
TI I27:   1-----lievekplygVEVFVGETAHFEIEls-----epdvhGQWKlkgqpltasp--45
TWIg18′:  1----lkpkiltasRKIKIKAGFTHNLEVDfi----gapdpTATWTvgdsgaalap--47
CD2d1:    1----------rdsGTVWGALGHGINLNi--pnfqmtddidEVRWErg--stlvaEF-42
TNfn3:    1-rlDAPSQIEVKDvtd-------TTALITWFKPLAEIDGIELTYGIkdvpgdRTTI-48
FNfn10:   1-vsDVPRDLEVVAatp-------SSLLISWDAPAVTVRYYRITYGEtggnspVQEF-48

Strands C′′, D, E, F, G:
TI I27:   46------------DCEIIEDGKKHILILHNCQLGMTGEVSFQAa-------------naKSAANLKVkel-89
TWIg18′:  48------------elLVDAKSSTTSIFFPSAKRADSGNYKLKVKnel---------gedEAIFEVIVq---93
CD2d1:    43-krkmkpflksgAFEIla---ngDLKIKNLTRDDSGTYNVTVYSt-------ngtrilNKALDLRIl---98
TNfn3:    49-DLte-------------dENQYSIGN----LKPDTEYEVSLISRrg----dmssNPAKETFTT------90
FNfn10:   49-TVpg-------------sKSTATISG----LKPGVDYTITVYAVtgrgdspassKPISINYRT------94

Stability and folding kinetics vary considerably

All the proteins have a tryptophan residue buried in the hydrophobic core; thus folding can be monitored by changes in intrinsic fluorescence. They have no disulphide bonds, but all have a number of proline residues; only refolding phases that are independent of proline isomerization are considered here. The stability of the proteins at 0 M denaturant, ∆G^H2O_D–N (where D is the denatured state and N the native state), varies significantly, from 4.0 to 9.4 kcal mol–1 (Table 2). All proteins fit a two-state equilibrium unfolding model, that is, at equilibrium only two states, the native and denatured states, are populated. There is a considerable range in both refolding and unfolding rates. Refolding rates vary from 2 to 240 s–1 and unfolding rates from 2 × 10–4 to 2 × 10–3 s–1.

The more stable proteins have marginally stable folding intermediates

To detect the presence of folding intermediates the observed folding kinetics were compared with the kinetics expected for a two-state folding mechanism. In a two-state system, the equilibrium and kinetic data are equivalent. The observed folding rate constant reflects the free-energy difference between the denatured state, D, and the transition state for folding, ‡. If a folding intermediate, I, is populated during folding, then the folding rate will reflect the free-energy difference between I and ‡. As I is more stable than D, the observed folding rate will be lower than predicted from the equilibrium ∆G and ku. The data show evidence for the presence of a low-stability folding intermediate populated at low concentrations of denaturant in three of the proteins, FNfn10, CD2d1 and TI I27; there is a ‘roll over’ in the refolding kinetics at low denaturant concentrations (Figure 4). The deviation from two-state behaviour is not large, meaning that the intermediates have a comparatively low stability, relative to D. Kinetic data alone cannot distinguish on- and off-pathway intermediates. More work will be needed to characterize fully the nature and stability of these intermediates.
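The comparison between observed and two-state kinetics can be sketched numerically. For a two-state system ∆GD–N = RT ln(kf/ku), so the folding rate expected from equilibrium data is kf = ku exp(∆GD–N/RT), and the two-state line in a plot of observed rate against denaturant concentration is the sum of a folding arm and an unfolding arm. The Python below is only an illustration under those assumptions, not the fitting procedure used in the study: the function names and the kinetic m-values passed to `two_state_kobs` are hypothetical, and the only measured numbers are the FNfn10 values from Table 2.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def predicted_kf(dg_dn, ku, temp_k=298.15):
    """Folding rate (s^-1) expected if folding were strictly two-state.

    Rearranges dG(D-N) = RT ln(kf/ku) for kf.
    dg_dn is in kcal/mol, ku in s^-1, temp_k in kelvin.
    """
    return ku * math.exp(dg_dn / (R_KCAL * temp_k))


def two_state_kobs(denaturant, kf_water, ku_water, m_kf, m_ku):
    """Observed rate for a two-state chevron at a given denaturant molarity.

    m_kf and m_ku (M^-1) are hypothetical slopes of ln k against denaturant
    concentration; both are taken as positive numbers here.
    """
    folding_arm = kf_water * math.exp(-m_kf * denaturant)
    unfolding_arm = ku_water * math.exp(m_ku * denaturant)
    return folding_arm + unfolding_arm


# FNfn10 values from Table 2: dG = 9.4 kcal/mol, ku = 2.3e-4 s^-1, kf = 240 s^-1
kf_expected = predicted_kf(9.4, 2.3e-4)
kf_observed = 240.0
print(f"two-state prediction {kf_expected:.0f} s^-1, observed {kf_observed:.0f} s^-1")
```

An observed rate that falls below the two-state prediction is the behaviour the text attributes to a populated intermediate; the size of the gap depends on the temperature and on the extrapolation to 0 M denaturant, so the printed numbers should be read as indicative only.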

Folding rates correlate with stability

As we know that significant changes in stability and folding kinetics can result from single-residue changes in a single protein, it is important that we compare proteins with a wide range of stabilities and folding/unfolding rates, so that general trends, rather than small differences, are observable. The proteins here fit these general criteria. To extend the range of our analysis we include in our comparisons data from Spitzfaden et al. [7] on the ninth fnIII domain from human fibronectin (FNfn9), a significantly less stable Ig-like domain (∆GD–N ≈ 1.0 kcal mol–1); this increases the range of both kf and ku to more than two orders of magnitude.

The logarithms of the folding rate constants vary directly with protein stability (Figure 5a). This is a unique observation. Two recent surveys of protein folding have shown that there is no general correlation between folding rates and protein stability (Figure 5b) [8,9]. Thus, this is not a general observation; neither is it the case for three other protein families where folding rates have been determined (Figure 5b). The correlation between log kf and stability is a feature of these Ig-like proteins.
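The strength of this relationship can be checked directly from the tabulated values. The sketch below computes a Pearson correlation coefficient between ∆GD–N and log10 kf using the six entries of Table 2 (the five proteins studied here plus FNfn9 from [7]). It is a reader's check rather than the analysis behind Figure 5a; the variable and function names are illustrative, and the data were collected under slightly different conditions, as noted in the table.

```python
import math

# Values transcribed from Table 2: (protein, dG(D-N) in kcal/mol, kf in s^-1)
TABLE_2 = [
    ("TI I27", 7.5, 32.0),
    ("TWIg18'", 4.0, 1.5),
    ("CD2d1", 6.8, 18.0),
    ("TNfn3", 5.3, 2.9),
    ("FNfn10", 9.4, 240.0),
    ("FNfn9", 1.0, 0.3),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


stabilities = [dg for _, dg, _ in TABLE_2]
log_kf = [math.log10(kf) for _, _, kf in TABLE_2]
r_folding = pearson_r(stabilities, log_kf)
print(f"r(log kf, dG) = {r_folding:.2f}")
```

With only six points the coefficient is sensitive to the extreme values (FNfn9 and FNfn10), so it supports the trend rather than quantifying it precisely.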

There is no such correlation between unfolding rate constants and stability for these Ig-like proteins (Figure 5c). This is in contrast to the weak general correlation between unfolding rates and stability observed when all proteins are considered (Figure 5d) [8,9].

Figure 3

The common core residues of TWIg18′ and TNfn3 form a common network of interactions. The Cα trace is shown with the buried sidechains of the residues of the common core displayed. Two views are shown: in (a) the A–B–E sheet is facing the viewer, in (b) the C–F–G sheet is frontmost. The colour coding is as follows: B strand, yellow; C strand, green; E strand, blue; F strand, orange; G strand, red. Residues are numbered according to Figure 2.

β_T and contact order are not correlated with folding rates

Are there other folding patterns that can be discerned in this analysis? The position of the transition state for folding on the folding coordinate, β_T^folding, can be assessed by the value of m for folding (mD–‡) relative to mD–N. The m-value is a measure of the change in solvent exposure going from one state to another. If a transition state (‡) is very native-like then β_T^folding = mD–‡/mD–N will be close to 1. These Ig-like proteins have β_T^folding ~0.7 (0.5–0.9) (Table 2). There is no correlation between β_T^folding for these proteins and kf or ku; thus the position of the transition state on the folding coordinate does not determine the relative stability of the transition state. The similarity of β_T^folding values might imply similar transition-state structures in these proteins; this should be taken with caution, however, as many all-β proteins, with very different structures, have similar β_T^folding values [8,9].
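The two ways of obtaining β_T^folding mentioned in this article (from the folding m-value here, and from the unfolding m-value in the footnote to Table 2) are equivalent whenever mD–‡ + m‡–N = mD–N. A minimal sketch follows; the function names and the m-values in the example are hypothetical and chosen only to give the typical value of 0.7.

```python
def beta_t_from_folding(m_d_ts, m_d_n):
    """beta_T(folding) = m(D-TS) / m(D-N)."""
    return m_d_ts / m_d_n


def beta_t_from_unfolding(m_ts_n, m_d_n):
    """beta_T(folding) = 1 - m(TS-N) / m(D-N), the form used for Table 2."""
    return 1.0 - m_ts_n / m_d_n


# Hypothetical m-values in kcal mol^-1 M^-1 (not taken from the paper)
m_d_n = 1.0    # equilibrium m-value
m_ts_n = 0.3   # unfolding (transition state to native) m-value
m_d_ts = m_d_n - m_ts_n

print(beta_t_from_folding(m_d_ts, m_d_n), beta_t_from_unfolding(m_ts_n, m_d_n))
```

The unfolding form has the practical advantage noted in the table footnote: it does not rely on the refolding arm, which is curved when an intermediate is populated.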

In a survey of two-state proteins Plaxco et al. [9] observed that folding rates are correlated with ‘contact order’, a measure of the number of local contacts in the final structure. There is no correlation, within this set of proteins, of kf or ku with contact order. As these proteins have a similar structure, they all have a similar contact order (Table 2), yet they have folding and unfolding rates that vary by more than two orders of magnitude.
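For readers unfamiliar with the quantity, relative contact order is commonly defined as the mean sequence separation between contacting residues divided by the chain length. The sketch below implements that definition on a percentage scale. It assumes, without confirmation from the text, that the values in Table 2 are on this scale; the contact list in the example is invented for illustration, and how contacts are identified from a structure (distance cut-off, atom selection) is deliberately left out.

```python
def relative_contact_order(contacts, n_residues):
    """Relative contact order as a percentage.

    contacts: iterable of (i, j) residue-number pairs that are in contact.
    n_residues: total number of residues in the protein.
    """
    pairs = list(contacts)
    if not pairs:
        raise ValueError("at least one contact is required")
    total_separation = sum(abs(j - i) for i, j in pairs)
    return 100.0 * total_separation / (len(pairs) * n_residues)


# Invented example: two contacts in a 100-residue chain
example_contacts = [(1, 11), (5, 25)]
print(relative_contact_order(example_contacts, 100))
```

A structure dominated by contacts between residues close in sequence gives a low value; a β sandwich, with many contacts between strands far apart in sequence, gives a comparatively high one.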

Common folding behaviour suggests a common folding mechanism for Ig-like β-sandwich proteins

By comparison of the folding of six different Ig-like proteins with a broad range of stability (1.0–9.4 kcal mol–1), folding and unfolding rates (0.3–240 s–1 and 5 × 10–2 to 2 × 10–4 s–1, respectively), it is possible to distinguish broad patterns of folding behaviour. The significant correlation between folding rates and protein stability strongly suggests that the folding processes of these structurally similar proteins share common features.

Stability of the transition state for folding is proportional to the stability of the native state

Kinetic experiments measure the difference in free energy between the lowest-energy denatured state and the transition state for folding, ∆GD–‡. Correlation between log kf and ∆GD–N implies a correlation between the free energy of the transition state, ∆GD–‡, and ∆GD–N. That is, the interactions that confer intrinsic stability on N also stabilize the folding transition state ‡. Note that this means that unfolding rates, which depend on the energy difference between N and ‡, will not correlate with stability. This is in agreement with the experimental data.
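One way to make the proportionality concrete is to fit a straight line to log10 kf against ∆GD–N and convert the slope into energy units. The sketch below does this with the Table 2 values. It assumes that the pre-exponential factor is the same for every protein, so that one log10 unit in kf corresponds to 2.303RT in ∆GD–‡, and it uses a single temperature (25°C) although some data were collected at 20°C. It is an illustrative calculation, not an analysis reported in the paper, and the names are hypothetical.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal mol^-1 K^-1
TEMP_K = 298.15     # assumed common temperature

# (dG(D-N) in kcal/mol, kf in s^-1) transcribed from Table 2
POINTS = [(7.5, 32.0), (4.0, 1.5), (6.8, 18.0), (5.3, 2.9), (9.4, 240.0), (1.0, 0.3)]


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


dg = [x for x, _ in POINTS]
log_kf = [math.log10(kf) for _, kf in POINTS]

slope = least_squares_slope(dg, log_kf)                 # log10 units per kcal/mol
fraction = slope * R_KCAL * TEMP_K * math.log(10)       # dimensionless
print(f"slope = {slope:.2f} per kcal/mol, energy fraction = {fraction:.2f}")
```

Under these assumptions, `fraction` estimates how much of a difference in native-state stability is already expressed in the transition state; a value between 0 and 1 is what the argument in the text requires.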

Stability of the folding intermediate is related to the stability of the native state

The presence of a folding intermediate in some but not all of the proteins does not imply a different folding mechanism. The three most stable proteins (FNfn10, TI I27 and CD2d1) all have marginally stable folding intermediates; the least stable proteins (TNfn3 and TWIg18′) do not. If the stability of I is related to the stability of the native state (N), then in the less stable proteins I becomes less stable than the denatured state (D) and folding becomes, apparently, two-state. Proteins that fold by three-state kinetics have been shown to move to a two-state folding mechanism as the folding intermediate is destabilized, relative to the denatured state, by denaturants, temperature or mutation. In the case of barnase it has been demonstrated that the folding pathway remains essentially the same under these circumstances, but I can no longer be observed in a kinetic experiment as it is not populated in the destabilizing conditions [10,11].

Table 2

Thermodynamic and kinetic data.

Protein     Conditions* (denaturant; pH, temperature)   ∆G^H2O_D–N (kcal mol–1)   k_f^H2O (s–1)   k_u^H2O (s–1)   β_T^folding†   Contact order
TI I27‡     GdmCl; pH 7.4, 25°C                         7.5                       32              4.9 × 10–4      0.9            17.7
TWIg18′§    Urea; pH 5.0, 20°C                          4.0                       1.5             2.8 × 10–4      0.7            19.7
CD2d1       Urea; pH 5.0, 25°C                          6.8                       18              1.7 × 10–3      0.7            17.5
TNfn3#      Urea; pH 5.0, 20°C                          5.3                       2.9             4.6 × 10–4      0.7            17.1
FNfn10      GdmSCN; pH 5.0, 25°C                        9.4                       240             2.3 × 10–4      0.5            17.0
FNfn9¶      GdmCl; pH 4.8, 25°C                         ~1.0                      0.3             5 × 10–2        0.7            18.3

*Proteins were examined at a pH optimal for individual stability. Where 20°C was used, the stability, folding and unfolding rates are altered by < 10%, compared with 25°C (SJH and JC, unpublished results). †Calculated using β_T^folding = 1 – (m‡–N/mD–N) from unfolding data. Thus β_T^folding can be determined accurately for both two-state and non two-state systems. ‡Data taken from [28]. §Equilibrium and unfolding data taken from [27]. #Data taken from [30]. ¶Calculated from kinetic data taken from [7]. GdmCl, guanidinium chloride; GdmSCN, guanidinium thiocyanate.

Is the folding pathway determined by structural constraints?

Defining the folding pathway of a protein involves mapping how structure formation progresses during folding. Defining the structure of the transition state for folding gives information about the folding pathway. Structures with similarly structured transition states are likely to have similar folding pathways.

A number of situations could apply in investigations into the folding of structurally related proteins. If the proteins have different folding pathways, then the transition states would not be similar; that is, they would involve different regions of the structure in different proteins. In that case we should expect to see no correlation between folding rate and stability. If the proteins have similar folding pathways, that is, similar transition states, there are two possible cases. In the first, the folding nucleus involves interactions in a part of a protein that is not involved in determining the structure/stability of the fold. In this case there would be no relationship between folding rate and stability. A lack of correlation between folding rate and stability need not imply that proteins fold by different pathways. The alternative is that the proteins fold by the same pathway and the interactions important for nucleating folding and stabilizing the transition state are the same as those that define and stabilize the fold. In this case we would expect to see a correlation between folding rate and stability. This is what we observe for these Ig-like proteins. Thus the most likely explanation is that the main determinants of the folding pathway are to be found in the common elements of structure that define the fold.

As we have shown (Figures 2,3), a common structural coreis made up of a few residues from strands B, C, E, F andG. Buried residues from these strands, centred around

1150 Structure 1999, Vol 7 No 9

Figure 4

10–3

10–2

10–1

100

101

0 2 4 6 8 10

k obs

k obs

k obs

k obs

k obs

[urea] (M)0 2 4 6 8 10

[urea] (M)

TWIg18′

10

10

10

10

10

10CD2d1

10

10

10

10

10

10

10

10

0 1 2 3 4 5 6 7 8[GdmCl] (M)

TI I27

10

10

10

10

0 2 4 6 8 10

TNfn3

10

10

10

10

10

10

0.0 0.5 1.0 1.5 2.0 2.5 3.0

FNfn10

(a) (b)

(d) (e)

(c)

–2

–1

0

1

2

3

–4

–3

–2

–1

0

1

2

3

–2

–1

0

1

–1

0

1

2

3

4

[urea] (M) [GdmSCN] (D) Structure

The dependence of the refolding and unfolding rates (in s–1) on theconcentration of denaturant. The line shows the kinetics expected for atwo-state kinetic process. At low concentrations of denaturant log kfbecomes less dependent on [denaturant], and deviates from theexpected curve in CD2d1, TI I27, and FNfn10. This non-linearity(which is independent of protein concentration) indicates the presence

of a folding intermediate (see text). Some of the data have beendescribed in detail elsewhere: TWIg18′, (unfolding kinetics only) [27];TI I27 [28]; TNfn3 [30]. Note that for FNfn10, [GdmSCN] has beenadjusted to account for non-linear dependence of ∆G on denaturantconcentration (see the Materials and methods section).

Page 7: Folding studies of immunoglobulin-like β-sandwich proteins suggest that they share a common folding pathway

strands B and F, make up a common network of interac-tions in the hydrophobic core of the proteins, which definesthe common fold. Our results suggest that the proteins usestabilizing interactions in the common core to stabilize thefolding transition state. We are now able to present ahypothesis that can be tested explicitly: that residues inthis common core, centred around the B and F strands, onopposite sheets (yellow and orange residues in Figure 3),

constitute a folding nucleus in these β-sandwich proteins.The folding nucleus would thus largely consist of tertiaryinteractions. Whether a structurally identical foldingnucleus is conserved cannot be predicted. The relativeimportance of specific interactions in the common core maybe different in the different proteins. Therefore proteinengineering analysis is needed to build up a detailedpicture of the individual folding nuclei to determine

Research Article Folding of immunoglobulin-like ββ-sandwich domains Clarke et al. 1151

Figure 5

10–1

100

101

102

103

0 2 4 6 8 10∆G D–N (kcal mol–1)

k f s

–1k u

s–1

0 2 4 6 8 10∆G D–N (kcal mol–1)

0 2 4 6 8 10∆G D–N (kcal mol–1)

0 2 4 6 8 10∆G D–N (kcal mol–1)

10–1

100

101

102

103

104

105

10–4

10–3

10–2

10–1

10

10

10

10

10

10

10

10

k u s

–1k f

s–1

–5

–4

–3

–2

–1

0

1

2

(a) (b)

(c) (d)

Structure

Relationship between rate constants for folding and unfolding (in s–1)with stability. (a) Correlation between the logarithm of the rateconstant for folding and stability of Ig-like proteins. Data on FNfn9 fromSpitzfaden et al. [7] have been included. FNfn9 is homologous withFNfn10 and TNfn3. Filled circles indicate fnIII domains and opencircles Ig domains. (b) Relationship between the logarithm of thefolding rate constants and stability for a large number of two-stateproteins. Data are taken from Jackson [8] and data therein. Nocorrelation between log kf and ∆GD–N is observed. Data for CspBdomains (red diamonds), SH3 domains (black squares) and ACBP

proteins (blue circles) are highlighted. (c) There is no correlationbetween the logarithm of the unfolding rate of Ig-like proteins andstability. Filled circles indicate fnIII domains and open circles Igdomains. (d) Relationship between the logarithm of the unfolding rateconstants and stability for a large number of two-state proteins. Dataare taken from Jackson [8] and data therein. A weak correlationbetween log ku and ∆GD–N is observed. Data for CspB domains (reddiamonds), SH3 domains (black squares) and ACBP proteins (bluecircles) are highlighted. There is a correlation between log ku and∆GD–N for the CspB and ACBP proteins.


whether the common features include a subset of interactions making up a specific folding nucleus.

Comparison with theoretical and experimental results

Shakhnovich and coworkers have proposed that a specific nucleation site for the Ig-like β-sandwich proteins will be located in the hydrophobic core, involving interactions with the residues corresponding to A18, I20, W22, L34, V70, A84 and F88 in TNfn3, that is, interactions between strands B, C, E, F and G [5]. Our results are in general agreement with this proposition, although our common core does not extend to include residues A84 and W22 (Figure 3). A detailed Φ-value analysis of the proteins in our study will be necessary to test further the theoretical model as proposed, as only structural detail can distinguish the relative importance of specific interactions within the folding nucleus.

The limited experimental data available support our model. Our hypothesis suggests that peripheral interactions will have a minor role, if any, in nucleation of folding of Ig-like proteins. We have shown that TNfn3 can be stabilized by the addition of two residues at the C terminus, and that this stabilization is reflected only by a decrease in the unfolding rate; that is, the C terminus is not involved in the formation of the folding nucleus [12]. A partially structured folding intermediate, stabilized by the addition of Na2SO4, has been characterised in CD2d1 [13–16]. The data are consistent with the presence of structure in this common core region in the transition state for folding. Initial φ-value analysis of TNfn3 shows that most residues with high φ-values correspond to residues in the putative folding nucleus described here (SJH and JC, unpublished results).

Comparison with other protein families

We have argued that the common interactions important in defining the structure of these Ig-like proteins are also used to guide folding. To what extent is this a common feature of proteins? Although the study presented here is the first comparison of proteins that share the same fold but are in different superfamilies, a few individual protein families have been studied. However, in all cases they have a conserved function and/or closely related sequences. One family of small all-β proteins for which there are data is the SH3 domains [17–20]. These show no correlation between folding or unfolding rates and stability (Figure 5). Two of these proteins, the SH3 domain from α-spectrin and the Src SH3 domain, have been studied in detail [17,18,21–23]. The proteins have a relatively compact folding nucleus located at one end of the structure in a loop/turn region. The interactions in the transition state are conserved, although the identity of the residues that take part in the interactions is not [24]. Clearly, the constraints on folding imposed by the SH3 structure are different from those in the Ig-like proteins. Protein families may have conserved folding pathways,

but if the interactions that serve to nucleate folding are not the same as those that define the structure and stability, then no correlation between folding rates and stability will be observed. This serves to emphasise that whereas a correlation between speed of folding and stability is likely to imply a common folding nucleus, the opposite may not be true. In acyl-coenzyme A-binding proteins, which have an all-α fold, there is no apparent relationship between refolding rates and stability [25] (Figure 5b) but unfolding rates and stability are related (Figure 5d). The data set is small, but again it suggests that the all-α fold of these proteins places different structural constraints on the folding pathway.

If two Ig-like proteins had the same core structure, but different loops, then we would propose that they would have the same folding rates, as the folding nucleus would be maintained intact. Any differences in stability would be largely attributable to changes in the rate of unfolding. This is exactly the behaviour observed in another all-β family, the cold-shock proteins (Csp). Three CspB proteins with stabilities that vary from 2.7 to 6.3 kcal mol–1 have approximately the same folding rate and all fold in a two-state manner [26], so that the differences in stability are related to differences in unfolding rates (Figure 5b,d). In these proteins the sequence identity is high, and the core structure is absolutely conserved; all the differences are peripheral, in loop regions of the structure. Possibly, the CspB proteins would have a pattern of folding similar to the one we observe for Ig-like proteins. However, E Shakhnovich has proposed (unpublished results) that proteins with this fold have a conserved folding nucleus that is separate from the structural core. More data are needed to explore the possibility further, but this serves to emphasize the importance of studying a diverse set of proteins if one is to discern folding patterns.
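The link between stability and the two rate constants in a two-state system can be made concrete with the standard relation ∆GD–N = RT ln(kf/ku). The following is a minimal sketch only: the function names, the folding rate of 1000 s–1 and the temperature of 298 K are illustrative assumptions and are not values from this study; only the stabilities of 2.7 and 6.3 kcal mol–1 are taken from the text.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def stability_from_rates(kf, ku, temp_k=298.0):
    """Two-state stability (kcal/mol) from folding and unfolding rate constants."""
    return R_KCAL_PER_MOL_K * temp_k * math.log(kf / ku)


def unfolding_rate(kf, dg_kcal, temp_k=298.0):
    """Unfolding rate constant implied by a folding rate and a two-state stability."""
    return kf * math.exp(-dg_kcal / (R_KCAL_PER_MOL_K * temp_k))


if __name__ == "__main__":
    kf = 1000.0  # hypothetical folding rate (s^-1), assumed identical for both proteins
    for dg in (2.7, 6.3):  # stability range quoted for the CspB proteins
        print(f"dG = {dg} kcal/mol -> ku = {unfolding_rate(kf, dg):.3g} s^-1")
```

With the folding rate held fixed, the whole stability difference appears in ku, which is the pattern described for the CspB proteins.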

Biological implications

We have shown that there are common features in the folding of six different Ig-like β-sandwich proteins, with a strong correlation between folding rates and stability. The most likely explanation is that these proteins share a common folding pathway, with a folding nucleus composed of interactions between strands on both sheets, and at some considerable distance in the amino-acid sequence. This gives us a specific hypothesis to examine further, through detailed protein-engineering analysis of the structure of the transition states (and folding intermediates, where they exist), and by analysis of other proteins from the same superfamilies and from the related cadherin superfamily. It is worth stressing again that the proteins in this study are from two different superfamilies; that is, there is no evidence that they are related in evolutionary terms. Thus the ‘conservation’ of folding pathways may be a thermodynamic feature of the fold rather than an evolutionary constraint.

1152 Structure 1999, Vol 7 No 9


Materials and methods

Materials

The production of recombinant proteins has been described previously (TWIg18′ [27]; TI I27 [28]; CD2d1 [29]; TNfn3 [30]; FNfn10 [31]).

Stability

The stability of each protein was determined by equilibrium denaturation using urea, guanidinium chloride (GdmCl) or guanidinium thiocyanate (GdmSCN) as a denaturant, depending on stability, as described in [27]. For FNfn10, GdmSCN was used as a denaturant. The relationship between ∆G and [GdmSCN] is strongly non-linear (AR Clarke, personal communication; EC and JC, unpublished results) and so the denaturant concentration [D] was adjusted to account for this non-linearity: [D] = [M] × 6.47/(6.47 + [M]), where [M] is the molar concentration of GdmSCN. The value 6.47 was taken from unpublished data generously provided by M Pandya and AR Clarke (Bristol).
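The GdmSCN correction can be written as a small helper. This is a minimal sketch: the constant 6.47 is the value quoted in the text, while the function name and the assumption that concentrations are in mol/l are illustrative.

```python
C_GDMSCN = 6.47  # constant quoted in the text; assumed to be in mol/l


def effective_denaturant(molar_conc, c=C_GDMSCN):
    """Return the adjusted concentration [D] = [M] * c / (c + [M])."""
    if molar_conc < 0:
        raise ValueError("concentration must be non-negative")
    return molar_conc * c / (c + molar_conc)


if __name__ == "__main__":
    for m in (0.0, 1.0, 2.0, 4.0):
        print(f"[M] = {m:.1f} M -> [D] = {effective_denaturant(m):.3f} M")
```

The adjusted value is always below the nominal concentration, and the gap widens as [M] increases, which is how the expression compensates for the non-linear dependence of ∆G on [GdmSCN].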

Folding kinetics

Folding and unfolding kinetics were examined using stopped-flow fluorimetry in a range where both rate constants and amplitudes were independent of protein concentration (0.5–2 µM). Unfolding was initiated by a jump into high concentrations of denaturant. Measured folding rates varied significantly, with traces collected for between 10 s and 75 min. (The very slow unfolding reactions were followed by standard fluorimetry, following manual mixing.) The rate constant for unfolding in water (ku^H2O) was determined by extrapolation to 0 M denaturant. For most data points refolding was initiated by mixing a solution of protein unfolded in denaturant into buffer. TI I27 and TNfn3 could be unfolded by alkali (pH 12.4) and only TWIg18′ could be unfolded by low pH (pH 1–1.5), so that for these proteins refolding rates in 0 M denaturant (kf^H2O) could be determined directly by pH jump. For CD2d1 and FNfn10, kf^H2O values were determined by extrapolation of refolding rates at low denaturant concentrations.

Acknowledgements

We would like to thank Mark Bycroft, Cyrus Chothia and Eugene Shakhnovich for helpful discussions, and Maya Pandya, Tony Clarke, Eugene Shakhnovich and Leonid Mirny for sharing unpublished data. Clones for FNfn10 and CD2d1 were generously provided by Kevin Plaxco and Paul Driscoll. We acknowledge funding from the Wellcome Trust (JC, SF), the Consejo Nacional de Ciencia y Tecnologia, Mexico (EC) and the MRC (SJH).

References

1. Williams, A.F. & Barclay, A.N. (1988). The immunoglobulin superfamily — domains for cell surface recognition. Annu. Rev. Immunol. 6, 381-405.

2. Bork, P., Holm, L. & Sander, C. (1994). The immunoglobulin fold. Structural classification, sequence patterns and common core. J. Mol. Biol. 242, 309-320.

3. Gerstein, M. & Levitt, M. (1997). A structural census of the current population of protein sequences. Proc. Natl Acad. Sci. USA 94, 11911-11916.

4. Shakhnovich, E., Abkevich, V. & Ptitsyn, O. (1996). Conserved residues and the mechanism of protein folding. Nature 379, 96-98.

5. Mirny, L.A., Abkevich, V.I. & Shakhnovich, E.I. (1998). How evolution makes proteins fold quickly. Proc. Natl Acad. Sci. USA 95, 4976-4981.

6. Bateman, A., Birney, E., Durbin, R., Eddy, S.R., Finn, R.D. & Sonnhammer, E.L. (1999). Pfam 3.1: 1313 multiple alignments match the majority of proteins. Nucleic Acids Res. 27, 260-262.

7. Spitzfaden, C., Grant, R.P., Mardon, H.J. & Campbell, I.D. (1997). Module–module interactions in the cell binding region of fibronectin: stability, flexibility and specificity. J. Mol. Biol. 265, 567-579.

8. Jackson, S.E. (1998). How do small single-domain proteins fold? Folding & Design 3, R81-R91.

9. Plaxco, K.W., Simons, K.T. & Baker, D. (1998). Contact order, transition state placement and the refolding rates of single domain proteins. J. Mol. Biol. 277, 985-994.

10. Dalby, P.A., Oliveberg, M. & Fersht, A.R. (1998). Movement of the intermediate and rate determining transition state of barnase on the energy landscape with changing temperature. Biochemistry 37, 4674-4679.

11. Dalby, P.A., Oliveberg, M. & Fersht, A.R. (1998). Folding intermediates of wild-type and mutants of barnase. I. Use of phi-value analysis and m-values to probe the cooperative nature of the folding pre-equilibrium. J. Mol. Biol. 276, 625-646.

12. Hamill, S.J., Meekhof, A.E. & Clarke, J. (1998). The effect of boundary selection on the stability and folding of the third fibronectin type III domain from human tenascin. Biochemistry 37, 8071-8079.

13. Parker, M.J. & Clarke, A.R. (1997). Amide backbone and water-related H/D isotope effects on the dynamics of a protein folding reaction. Biochemistry 36, 5786-5794.

14. Parker, M.J., Dempsey, C.E., Lorch, M. & Clarke, A.R. (1997). Acquisition of native beta-strand topology during the rapid collapse phase of protein folding. Biochemistry 36, 13396-13405.

15. Parker, M.J., Dempsey, C.E., Hosszu, L.L.P., Waltho, J.P. & Clarke, A.R. (1998). Topology, sequence evolution and folding dynamics of an immunoglobulin domain. Nat. Struct. Biol. 5, 194-198.

16. Lorch, M., Mason, J.M., Clarke, A.R. & Parker, M.J. (1999). Effects of core mutations on the folding of a beta-sheet protein: implications for backbone organization in the I-state. Biochemistry 38, 1377-1385.

17. Viguera, A.R., Martinez, J.C., Filimonov, V.V., Mateo, P.L. & Serrano, L. (1994). Thermodynamic and kinetic analysis of the SH3 domain of spectrin shows a two-state folding transition. Biochemistry 33, 2142-2150.

18. Grantcharova, V.P. & Baker, D. (1997). Folding dynamics of the src SH3 domain. Biochemistry 36, 15685-15692.

19. Guijarro, J.I., Morton, C.J., Plaxco, K.W., Campbell, I.D. & Dobson, C.M. (1998). Folding kinetics of the SH3 domain of PI3 kinase by real-time NMR combined with optical spectroscopy. J. Mol. Biol. 276, 657-667.

20. Plaxco, K.W., Guijarro, J.I., Morton, C.J., Pitkeathly, M., Campbell, I.D. & Dobson, C.M. (1998). The folding kinetics and thermodynamics of the Fyn-SH3 domain. Biochemistry 37, 2529-2537.

21. Viguera, A.R., Serrano, L. & Wilmanns, M. (1996). Different folding transition states may result in the same native structure. Nat. Struct. Biol. 3, 874-880.

22. Grantcharova, V.P., Riddle, D.S., Santiago, J.V. & Baker, D. (1998). Important role of hydrogen bonds in the structurally polarized transition state for folding of the src SH3 domain. Nat. Struct. Biol. 5, 714-720.

23. Martinez, J.C., Pisabarro, M.T. & Serrano, L. (1998). Obligatory steps in protein folding and the conformational diversity of the transition state. Nat. Struct. Biol. 5, 721-729.

24. Gruebele, M. & Wolynes, P.G. (1998). Satisfying turns in folding transitions. Nat. Struct. Biol. 5, 662-665.

25. Kragelund, B.B., et al., & Poulsen, F.M. (1996). Fast and one-step folding of closely and distantly related homologous proteins of a four-helix bundle family. J. Mol. Biol. 256, 187-200.

26. Perl, D., et al., & Schmid, F.X. (1998). Conservation of rapid two-state folding in mesophilic, thermophilic and hyperthermophilic cold shock proteins. Nat. Struct. Biol. 5, 229-235.

27. Fong, S., et al., & Clarke, J. (1996). Structure and stability of an immunoglobulin superfamily domain from twitchin, a muscle protein of the nematode Caenorhabditis elegans. J. Mol. Biol. 264, 624-639.

28. Carrion-Vazquez, M., et al., & Fernandez, J.M. (1999). Mechanical and chemical unfolding of a single protein: a comparison. Proc. Natl Acad. Sci. USA 96, 3694-3699.

29. Murray, A.J., Lewis, S.J., Barclay, A.N. & Brady, R.L. (1995). One sequence, two folds: a metastable structure of CD2. Proc. Natl Acad. Sci. USA 92, 7337-7341.

30. Clarke, J., Hamill, S.J. & Johnson, C.M. (1997). Folding and stability of a fibronectin type III domain from human tenascin. J. Mol. Biol. 270, 771-778.

31. Plaxco, K.W., Spitzfaden, C., Campbell, I.D. & Dobson, C.M. (1996). Rapid refolding of a proline rich all β-sheet fibronectin type III module. Proc. Natl Acad. Sci. USA 93, 10703-10706.

32. Kraulis, P. (1991). MolScript, a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr. 24, 946-950.

33. Improta, S., Politou, A.S. & Pastore, A. (1996). Immunoglobulin-like modules from titin I-band: Extensible components of muscle elasticity. Structure 4, 323-337.

34. Jones, E.Y., Davis, S.J., Williams, A.F., Harlos, K. & Stuart, D.I. (1992). Crystal structure at 2.8 Å resolution of a soluble form of the cell adhesion molecule CD2. Nature 360, 232-239.

35. Leahy, D.J., Aukhil, I. & Erickson, H.P. (1996). 2.0 angstrom crystal structure of a four-domain segment of human fibronectin encompassing the RGD loop and synergy region. Cell 84, 155-164.

36. Leahy, D.J., Hendrickson, W.A., Aukhil, I. & Erickson, H.P. (1992). Structure of a fibronectin type III domain from tenascin phased by MAD analysis of the selenomethionyl protein. Science 258, 987-991.
