The present view of the mechanism of protein folding



© 2003 Nature Publishing Group

PERSPECTIVES

growing interest in diseases that result from protein unfolding, misfolding and aggregation, which range from Creutzfeldt–Jakob disease to cancer5,6.

To circumvent the Levinthal paradox, it was envisaged that proteins could fold by defined pathways and mechanisms that removed the need to search all possible conformations. At one extreme, it was thought that the stepwise formation of structure — the rapid formation of local secondary structure, which functions as a scaffold, followed by the acquisition of tertiary structure — would simplify the task; this is the framework model7. Two mechanisms were proposed: diffusion–collision, which generally involves the formation of secondary structures, followed by their diffusion, collision and coalescence to form tertiary structure8; and nucleation, in which a nucleus forms slowly and structure then propagates rapidly from it9. At the other extreme, the hydrophobic-collapse mechanism proposes that folding begins with hydrophobic collapse; secondary structure and the correct packing interactions then form within the confined volume of the collapsed chain10.
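The scale of the problem that these models were meant to solve can be made concrete with a back-of-the-envelope Levinthal estimate. The sketch below is illustrative only: the chain length (100 residues), the three conformations per residue and the sampling rate of 10^13 conformations per second are common textbook-style assumptions, not figures taken from this article, and the function name is hypothetical.

```python
import math

# Illustrative assumptions (not values from the article):
N_RESIDUES = 100               # length of a small protein
STATES_PER_RESIDUE = 3         # coarse conformations available to each residue
SAMPLING_RATE = 1e13           # conformations tested per second (~0.1 ps each)
AGE_OF_UNIVERSE_S = 13.8e9 * 3.156e7  # ~13.8 billion years, in seconds


def levinthal_search_time(n_residues: int, states: int, rate: float) -> float:
    """Seconds needed to visit every conformation once by exhaustive search."""
    # Work in log10 space so that states ** n_residues cannot overflow a float.
    log10_conformations = n_residues * math.log10(states)
    return 10 ** (log10_conformations - math.log10(rate))


t = levinthal_search_time(N_RESIDUES, STATES_PER_RESIDUE, SAMPLING_RATE)
print(f"exhaustive search: {t:.1e} s, "
      f"about {t / AGE_OF_UNIVERSE_S:.0e} times the age of the universe")
```

Under these assumptions the exhaustive search takes of the order of 10^34 s, roughly 10^17 times the age of the universe; changing the assumed numbers shifts the exponent but not the conclusion.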

The field of protein folding has advanced tremendously over the past 15 years, owing in large part to technological progress and to communication between theoreticians and experimentalists. The underlying technological breakthroughs have been: protein engineering to probe specific portions of the protein; the use of NMR to characterize the partially unfolded and denatured states of proteins; fast spectroscopic methods; and improvements in molecular dynamics (MD) procedures, coupled with the advent of very fast, inexpensive computers that can simulate protein unfolding, and limited refolding, events at the atomic level. There is a synergy between these various disciplines: experimental studies need theory so


NATURE REVIEWS | MOLECULAR CELL BIOLOGY VOLUME 4 | JUNE 2003 | 497

The present view of the mechanism of protein folding
Valerie Daggett and Alan Fersht

OPINION

We can track the positions and movements of all the atoms in small proteins as they fold and unfold by combining experimental studies with atomic-resolution molecular dynamics simulations. General principles as to how such complex architectures form so rapidly are now emerging from in-depth studies of a few proteins.

Protein folding is one of the most perplexing problems in molecular biology. The problem has two parts: predicting the three-dimensional, biologically active, native structure of a protein from its sequence, and understanding how the protein reaches this native structure from its denatured state. The intellectual problem of the pathway of folding was highlighted more than thirty years ago by Levinthal1, who pointed out that if a small protein folded by randomly checking all possible conformations of its unfolded state, the process would take longer than the age of the universe. Although progress is being made, the de novo prediction of high-resolution structures from sequences remains unreliable2, but we are beginning to understand how small proteins fold and unfold, which, in turn, should lead to better prediction algorithms. Both protein folding and unfolding are important in the cell3,4. In particular, the folding field has been invigorated by the

“…experimental studies need theory so that detailed structural models can be used to interpret and exploit the experimental data, and theory in the absence of experimental verification is worthless.”


pressure elevated to ~26 atm to maintain water in the liquid state), unfolding takes place in a fraction of a microsecond, which is the time range that is accessible to direct simulation at present. The folding pathways are then identified by considering the unfolding pathway in reverse3. A recent study of the unfolding pathway of CI2 as a function of temperature, from 75 to 225°C, found that the overall unfolding process in silico is independent of temperature — raising the temperature merely increases the rate of unfolding without changing the pathway15. Furthermore, simulations show that the overall unfolding pathway is similar in 8 M urea16.

Further unfolding of CI2 leads to the denatured state, which is an expanded, poorly structured state (FIG. 1). This state is almost a random coil, with some native, residual helical structure as well as dynamic hydrophobic clusters in the centre of the sequence. The MD-generated denatured-state ensemble is fully supported by the results of NMR studies17.
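Residual helical structure of this kind is often quantified per residue over a simulated ensemble. A minimal sketch, assuming each snapshot is summarized by backbone (φ, ψ) dihedrals and that a residue counts as helical when both angles lie within ±30° of (−60°, −45°). The window, the function names and the toy ensemble are illustrative assumptions, not the analysis used in the article; published analyses more often use hydrogen-bond-based assignments such as DSSP.

```python
from typing import List, Sequence, Tuple

# Assumed alpha-helical basin: centre (phi, psi) and half-width, in degrees.
HELIX_PHI, HELIX_PSI, HALF_WIDTH = -60.0, -45.0, 30.0


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def is_helical(phi: float, psi: float) -> bool:
    """True if (phi, psi) falls inside the assumed helical window."""
    return (angular_difference(phi, HELIX_PHI) <= HALF_WIDTH
            and angular_difference(psi, HELIX_PSI) <= HALF_WIDTH)


def helical_fraction(ensemble: Sequence[Sequence[Tuple[float, float]]]) -> List[float]:
    """ensemble[frame][residue] = (phi, psi); returns, for each residue,
    the fraction of frames in which it is helical."""
    n_frames = len(ensemble)
    counts = [0] * len(ensemble[0])
    for frame in ensemble:
        for i, (phi, psi) in enumerate(frame):
            if is_helical(phi, psi):
                counts[i] += 1
    return [c / n_frames for c in counts]


# Toy ensemble: four snapshots of a three-residue segment (hypothetical angles).
frames = [
    [(-62.0, -41.0), (-120.0, 130.0), (-58.0, -47.0)],
    [(-65.0, -40.0), (-70.0, 140.0), (60.0, 40.0)],
    [(-150.0, 150.0), (-63.0, -42.0), (-55.0, -50.0)],
    [(-60.0, -45.0), (-80.0, 150.0), (-61.0, -44.0)],
]
print(helical_fraction(frames))  # [0.75, 0.25, 0.75]
```

Averaging such per-residue fractions over many snapshots is what lets a "fluctuating" helix show up as partial, rather than all-or-none, helicity.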

The folding mechanism of CI2 does not fit any of the classical models. Secondary and tertiary structure are consolidated concurrently as most of the protein collapses, or condenses, around an extended nucleus that is composed of portions of the helix and β-sheet. This nucleation–condensation, or nucleation–collapse, mechanism seems to be quite common11.

Three-state folding: barnase

Similar studies were conducted on the unfolding/folding of barnase (FIG. 2), which is a multidomain protein that folds in a more complicated manner than CI2, through an experimentally detectable intermediate. Its unfolding/folding pathway has also been mapped in detail by combining experimental studies with simulation. Barnase contains considerable residual structure in its denatured state, as probed by MD simulations, NMR and other experimental techniques18. This residual structure helps to set up a loose, native topology in the denatured state (FIG. 2). Even in the denatured state, its two domains — the α1,

state (TS) and denatured state (D) using a combination of traditional kinetic and thermodynamic experiments12. The ratios of the resulting free energy changes are referred to as Φ-values ($\Phi = \Delta\Delta G_{\mathrm{TS-D}} / \Delta\Delta G_{\mathrm{N-D}}$), and the magnitude of these values, which tend to fall between 0 and 1, reflects the degree of structure at the mutation site in the transition state: a value near 1 indicates that the site is as structured as in the native state, and a value near 0 that it is as unstructured as in the denatured state. So, structure is inferred from energetics, but detailed molecular structures are not obtained using this approach. MD simulations, on the other hand, can provide detailed structural information. This information comes from denaturation simulations and characterization of the transition and intermediate states using a conformational clustering approach13.
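The Φ-value ratio, Φ = ΔΔG(TS–D)/ΔΔG(N–D), can be computed from measured folding rate constants and stabilities of wild-type and mutant proteins. A minimal sketch, assuming transition-state theory with a pre-exponential factor that the mutation leaves unchanged, so that ΔΔG(TS–D) = RT ln(k_f,wt/k_f,mut); the function names, the 2.5 kJ/mol reliability cutoff and the example numbers are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def ddg_ts_d(kf_wt: float, kf_mut: float, temp_k: float = 298.15) -> float:
    """Change in activation free energy for folding, ddG(TS-D), in kJ/mol.

    Assumes the mutation leaves the pre-exponential factor unchanged,
    so ddG = RT ln(kf_wt / kf_mut).
    """
    return R_KJ * temp_k * math.log(kf_wt / kf_mut)


def phi_value(kf_wt: float, kf_mut: float,
              dg_wt: float, dg_mut: float,
              temp_k: float = 298.15, min_ddg: float = 2.5) -> float:
    """Phi = ddG(TS-D) / ddG(N-D).

    dg_wt and dg_mut are unfolding free energies (kJ/mol, positive = stable).
    min_ddg is an illustrative cutoff (an assumption): Phi is poorly
    determined when the mutation barely destabilizes the native state.
    """
    ddg_n_d = dg_wt - dg_mut
    if abs(ddg_n_d) < min_ddg:
        raise ValueError("ddG(N-D) too small for a reliable Phi-value")
    return ddg_ts_d(kf_wt, kf_mut, temp_k) / ddg_n_d


# Hypothetical mutant: folds 10x more slowly and is 10 kJ/mol less stable.
print(round(phi_value(kf_wt=50.0, kf_mut=5.0, dg_wt=30.0, dg_mut=20.0), 2))  # 0.57
```

A mutant that folds exactly as fast as the wild type gives Φ = 0, and one whose folding barrier rises by the full loss in stability gives Φ = 1, matching the two limiting interpretations.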

Combining theory and experiment yields a self-consistent view of the unfolding/folding pathway of CI2. There is a single, rate-determining transition-state ensemble for folding and unfolding, and it is native-like, with considerable secondary structure and disrupted packing of the side chains (FIG. 1). Experimentally derived Φ-values and the corresponding values that describe the extent of local structure in the MD-generated models are in excellent agreement13, and simulations using a different computer program give consistent results14. The MD simulations are generally carried out at high temperatures for the unfolding reaction, because the rate of unfolding increases rapidly with increasing temperature until, at 200–225°C (with the

that detailed structural models can be used to interpret and exploit the experimental data, and theory in the absence of experimental verification is worthless. Accordingly, theory and experiment are finally becoming truly integrated, building on their strengths to yield a much richer view of the protein-folding process. We highlight this synergy below by focusing on our collaborative work. We have chosen three small proteins that seem to be representative of the different extremes of behaviour, and the details of the behaviour of these proteins have allowed us to draw some general conclusions about folding in molecular and structural terms. Importantly, small single-domain proteins probably fold the same way in vivo and in vitro, and are also representative of the individual domains of larger proteins11. Folding in vivo becomes complicated for large multidomain proteins, because of their sequential biosynthesis and the involvement of chaperone systems. However, their individual domains seem to fold by the same basic principles as the small proteins that are described below.

Two-state folding: CI2

Our first forays into combining theory and experiment involved studies of the transition state of chymotrypsin inhibitor 2 (CI2), a small, 64-residue, single-domain, two-state folding protein (FIG. 1). In two-state folding, only the denatured and native states occupy free energy minima (starting materials, intermediates and products sit at the bottom of energy wells, whereas transition states are at the tops of saddle points in energy landscapes). CI2 was the second protein, after barnase (discussed below), to be thoroughly investigated using protein engineering and Φ-value analysis. This approach involves introducing mutations throughout the protein and measuring the effects on the native state (N), transition

Figure 1 | The folding pathway of chymotrypsin inhibitor 2. The figure shows snapshots of chymotrypsin inhibitor 2 (omitting the solvent) from a 100°C unfolding simulation in reverse time15. The structures are labelled D, I, TS and N for denatured, intermediate, transition and native51 states, respectively. Two of the many states in the denatured ensemble are shown. The portions of the protein that form β-strands in the native state are highlighted in green, and the α-helix is shown in red. The intermediate is not detected experimentally, but is indicated by some simulations. The time points for the structures are given in parentheses.

Figure 2 | The folding pathway of barnase. The figure shows snapshots of barnase taken from a high-temperature (225°C) simulation in water, which are presented in reverse time18–21. The structures are labelled D, I, TS and N for denatured, intermediate, transition and native52 states, respectively. The sequences that are highlighted form β-strands 1, 2 and 5 (blue), the β3–β4 hairpin (green) and the α1 and α2 helices (red) in the native state52. The time points for the structures are given in parentheses.

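[Figure 1 panel labels: D (94 ns), D (70 ns), I (30 ns), TS (0.26 ns), N (0 ns); secondary-structure labels α, β1, β2, β3]

[Figure 2 panel labels: D (4 ns), D (2 ns), I (0.7 ns), TS (0.15 ns), N (0 ns); secondary-structure labels α1, α2, β1–β5]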


same temperature range, at 75 and 100°C, and therefore required only minor extrapolation compared with past studies. The time that is taken to reach the transition state in these simulations is in good agreement with the unfolding times determined experimentally (FIG. 3b).
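The article does not state how the En-HD kinetics were extrapolated from 65°C to 100°C. As a minimal sketch, assume simple Arrhenius behaviour, fit ln k against 1/T by least squares and evaluate the fit at a higher temperature; the rate data below are synthetic and the function names are hypothetical. Protein folding rates often deviate from Arrhenius behaviour, so long extrapolations of this kind should be treated with caution. The half-life relation t1/2 = ln 2/k, the quantity on the right-hand axis of FIG. 3b, is standard for first-order kinetics.

```python
import math
from typing import Sequence, Tuple

R = 8.314462618  # gas constant, J mol^-1 K^-1


def fit_arrhenius(temps_c: Sequence[float],
                  rates: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of ln k = ln A - Ea / (R T).

    Returns (A in s^-1, Ea in J/mol). Assumes simple Arrhenius behaviour.
    """
    xs = [1.0 / (t + 273.15) for t in temps_c]
    ys = [math.log(k) for k in rates]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope * R


def arrhenius_rate(a: float, ea: float, temp_c: float) -> float:
    """Rate constant (s^-1) at temp_c (degrees Celsius) from A and Ea."""
    return a * math.exp(-ea / (R * (temp_c + 273.15)))


def half_life(k: float) -> float:
    """First-order half-life, t1/2 = ln 2 / k."""
    return math.log(2) / k


# Synthetic "measurements" from 25 to 65 degrees C (not data from the article).
temps = [25.0, 35.0, 45.0, 55.0, 65.0]
rates = [arrhenius_rate(1e13, 60e3, t) for t in temps]
a, ea = fit_arrhenius(temps, rates)
k100 = arrhenius_rate(a, ea, 100.0)
print(f"k(100 C) ~ {k100:.2e} s^-1, t1/2 ~ {half_life(k100):.2e} s")
```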

From the intermediate state, reorientation of the helices and their loose docking then lead to the transition state. The transition state contains native-like secondary structure and a partially packed hydrophobic core, which is consistent with a framework mechanism. The calculated and experimental Φ-values for the transition state are in good agreement (correlation coefficient = 0.86 for the 100°C simulations). The simulated unfolding process is independent of temperature, and essentially the same transition states are obtained at 75, 100 and 225°C22. The final steps in folding involve the docking of the helices, the fine-tuning of the side chains and the expulsion of water.
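A correlation coefficient such as the 0.86 quoted above compares, site by site, experimental Φ-values with the corresponding structure values computed from the simulated transition state. The article does not say which coefficient was used; the minimal sketch below assumes Pearson's r, computed over two equal-length lists indexed by mutation site, with hypothetical numbers. On Python 3.10 or later, `statistics.correlation` returns the same quantity.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five mutation sites (not data from the article).
phi_experimental = [0.05, 0.20, 0.45, 0.60, 0.90]
structure_simulated = [0.10, 0.15, 0.50, 0.70, 0.80]
print(round(pearson_r(phi_experimental, structure_simulated), 2))
```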

We have begun to analyse the transition-state structures and unfolding pathways of proteins with the same fold as En-HD, but

β1–5 domain and the α2 domain — remain semi-independent.

In the main hydrophobic core of barnase (the α1, β1–5 domain; FIG. 2), folding is nucleated by residual structure in the form of fluctuating native helical structure and hydrophobic clusters that are centred on the β3–β4 hairpin. Interestingly, this hairpin participates in the folding of the α1 helix by actively aiding the formation of secondary structure, using side-chain interactions to help pull the helix through dihedral-angle transitions. We have called this ‘contact-assisted secondary-structure formation’19. This mechanism is important in providing a link between secondary- and tertiary-structure formation, particularly for proteins that do not fold by merely docking pre-formed secondary structural units11.

In barnase, the β3–β4 hairpin and the α1 helix then form a scaffold on which the remaining β-strands pack (FIG. 2). The resulting structure is loose and represents the main intermediate during the refolding of barnase. The MD-generated intermediate that is shown in FIG. 2 is in quantitative agreement with the experimentally derived Φ-values20. This structure then collapses and consolidates further, which brings the residues that are involved in the folding nucleus into close proximity in the transition state. As with the intermediate state, the calculated and experimental Φ-values are in good agreement for this transition state21. For the second helical domain, the α2 helix contains considerable residual structure in the denatured state. The segregation of this region in the denatured and intermediate states provides the correct overall topology, so that tertiary interactions are primed for the later collapse and refinement of packing interactions, which occur in the transition state, and for the consolidation of the interface between the two domains.
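Collapse and consolidation of the kind described here are commonly tracked in simulations with the fraction of native contacts, Q. A minimal sketch, assuming C-alpha coordinates for each snapshot, a 6.5 Å distance cutoff and a minimum sequence separation of three residues; these parameters, the function names and the toy coordinates are illustrative assumptions, not the analysis used in the article.

```python
import math
from itertools import combinations
from typing import Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def contact_set(coords: Sequence[Coord], cutoff: float = 6.5,
                min_separation: int = 3) -> Set[Tuple[int, int]]:
    """Residue pairs (i, j) with j - i >= min_separation whose C-alpha atoms
    lie within `cutoff` angstroms of each other."""
    pairs = set()
    for i, j in combinations(range(len(coords)), 2):
        if j - i >= min_separation and math.dist(coords[i], coords[j]) <= cutoff:
            pairs.add((i, j))
    return pairs


def fraction_native_contacts(native: Sequence[Coord], frame: Sequence[Coord],
                             cutoff: float = 6.5, min_separation: int = 3) -> float:
    """Fraction of the native contacts that are also formed in `frame`."""
    native_pairs = contact_set(native, cutoff, min_separation)
    if not native_pairs:
        raise ValueError("native structure has no contacts at this cutoff")
    formed = native_pairs & contact_set(frame, cutoff, min_separation)
    return len(formed) / len(native_pairs)


# Toy five-residue "protein" (coordinates in angstroms, purely illustrative).
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0),
          (0.0, 3.8, 0.0), (0.0, 0.0, 3.8)]
extended = [(3.8 * i, 0.0, 0.0) for i in range(5)]
partly_packed = native[:4] + [(20.0, 20.0, 20.0)]
print(fraction_native_contacts(native, partly_packed))
```

Q rises from near 0 in an expanded denatured state towards 1 in the native state, so plotting it along a trajectory gives a simple picture of when tertiary packing is consolidated.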

Ultrafast folding: En-HD

Despite the good agreement between the simulations and experiments for the systems described above, there is a large difference between the timescale of the experimental studies (typically milliseconds or longer) and that of the simulations (generally nanoseconds to microseconds). At the time of Levinthal’s speculations, proteins were thought to fold on timescales of fractions of a second to minutes. Remarkably, small, ultrafast-folding proteins have been discovered recently that fold in microseconds at physiological temperatures and unfold in tens of nanoseconds or less at 65–100°C, thereby bridging the gap between the timescales22. In particular, we have focused on the engrailed homeodomain from Drosophila melanogaster (En-HD; FIG. 3), as well as some of its homologues.

The fully unfolded state of En-HD contains little residual secondary structure, and is expanded and very dynamic (FIG. 3a). This state is not populated appreciably under conditions that favour folding (‘physiological’ conditions)22. Instead, the denatured state under physiological conditions is, in fact, a folding intermediate, which has a high helical content and few tertiary contacts (FIG. 3a). The denatured states of many proteins under physiological conditions are best described as folding intermediates, and highly unfolded denatured states are rarely observed. This is because the fully unfolded conformations are very unstable and tend to collapse to states that have some native-like features or topology.

In temperature-jump experiments, En-HD folds from this intermediate state to its native state in ≤1.5 µs. The kinetics were followed to 65°C and extrapolated to 100°C (FIG. 3b). Simulations were carried out in the


Figure 3 | The folding pathway of the engrailed homeodomain. a | The figure shows snapshots of the engrailed homeodomain from a 225°C simulation, which are shown in reverse time22. The structures are labelled D, I, TS and N for denatured, intermediate, transition and native53 states, respectively. The positions of the helices in the native state53 are coloured in red, green and blue for helices I, II and III, respectively. The time points for the structures are given in parentheses. b | The kinetics of folding (blue) and unfolding (green) of the engrailed homeodomain. The figure shows a plot of the rate constants (k) for folding and unfolding of the intermediate as a function of temperature, and also the time taken to reach the transition state in the unfolding simulations (grey; right-hand axis). The very fast rate constants (red) might be those for the formation of the intermediate. Modified with permission from REF. 22 Nature © Macmillan Magazines Ltd (2003).

[Figure 3 | a, panel labels: D (55 ns), D (40 ns), I (10 ns), TS (0.26 ns), N (0 ns); helices I, II and III. b, axes: temperature (20–100°C) against k (s⁻¹; 10²–10⁹) and t1/2 (s; 10⁻⁹–10⁻², right-hand axis)]


propagation that is carried through the folding pathway. Nucleation might be relatively weak, but nonetheless effective, as for CI2, so that folding is directed but the structure is not strong enough to lead to the formation of a well-populated and experimentally observable intermediate. By contrast, nucleation sites can be strong and well populated, which leads to the formation of intermediates and, in some cases, to very rapid folding. En-HD provides a marked example of this. Its denatured state, under physiological conditions, is highly helical and is a productive folding intermediate, and, from this state, the protein folds extremely rapidly by the docking of essentially pre-formed helices.

Intermediate states along the way. There are numerous controversies surrounding intermediates29–30. It is our opinion that, at the molecular level, intermediates are always present. In other words, true two-state folding with only the denatured and native states occupying free energy minima is implausible. However, intermediates can be completely silent, occupying high-energy minima, and they can interconvert rapidly without any change of rate-determining steps, so that, to all intents and purposes, two-state folding is adequate to describe the folding process experimentally (as is found for CI2). In other examples, such as during the folding of barnase, an intermediate is observed by both experiment and simulation. This intermediate contains a semi-structured core with disruptions to the turns and loops. Although CI2 folds in a two-state manner, as assessed by various experimental techniques, there is a transient intermediate during unfolding in the MD simulations (FIG. 1). The same is true for the c-Myb transforming protein, a relative of the engrailed homeodomain. In this case, an intermediate is populated in the simulations but is not observed experimentally (Gianni et al., manuscript in preparation); however, a minor mutation stabilizes the intermediate so that it can be seen experimentally. In either case, simulations indicate that the structure in intermediates is carried through the pathway and facilitates folding, even when this structure is effectively invisible experimentally.
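Why a silent intermediate leaves folding looking two-state can be illustrated with the linear scheme D ⇌ I ⇌ N. The sketch below uses assumed, hypothetical rate constants (not measurements for CI2, barnase or c-Myb), and the class and method names are illustrative. It computes the two exact relaxation rates of the scheme and compares the slow one with the apparent two-state rate k_f + k_u obtained by treating I as a steady-state species; when I is high in energy and converts rapidly to D or N, the two agree closely and I is barely populated.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ThreeStateScheme:
    """Linear D <-> I <-> N kinetics; all rate constants in s^-1."""
    k_di: float  # D -> I
    k_id: float  # I -> D
    k_in: float  # I -> N
    k_ni: float  # N -> I

    def relaxation_rates(self) -> Tuple[float, float]:
        """Exact non-zero relaxation rates (slow, fast) of the scheme."""
        # The rate matrix has eigenvalues 0, -lam_slow, -lam_fast, where
        # lam_slow + lam_fast = s and lam_slow * lam_fast = p.
        s = self.k_di + self.k_id + self.k_in + self.k_ni
        p = self.k_di * self.k_in + self.k_di * self.k_ni + self.k_id * self.k_ni
        fast = (s + math.sqrt(s * s - 4.0 * p)) / 2.0
        return p / fast, fast  # p / fast avoids cancellation in the slow root

    def apparent_two_state(self) -> Tuple[float, float]:
        """Apparent (k_f, k_u) when I is treated as a steady-state species."""
        split = self.k_id + self.k_in
        return self.k_di * self.k_in / split, self.k_ni * self.k_id / split

    def equilibrium_fractions(self) -> Tuple[float, float, float]:
        """Equilibrium fractions of (D, I, N) from detailed balance."""
        d = 1.0
        i = d * self.k_di / self.k_id
        n = i * self.k_in / self.k_ni
        z = d + i + n
        return d / z, i / z, n / z


# Hypothetical high-energy intermediate that converts rapidly to D or N.
scheme = ThreeStateScheme(k_di=10.0, k_id=1e4, k_in=1e4, k_ni=1.0)
slow, fast = scheme.relaxation_rates()
k_f, k_u = scheme.apparent_two_state()
print(f"exact slow rate {slow:.3f}/s vs two-state k_f + k_u = {k_f + k_u:.3f}/s; "
      f"equilibrium fraction of I = {scheme.equilibrium_fractions()[1]:.1e}")
```

In this toy scheme, lowering k_id (stabilizing I) moves the system out of the steady-state regime, which is one simple way to picture how a small change can make an intermediate experimentally visible.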

Almost native: the transition state. The transition state for unfolding/folding is, almost without exception, highly structured. It is an ensemble of related structures that have some or much of the secondary structure intact and disrupted packing interactions. In some cases, the sites that are important for the initial nucleation in the denatured state

particularly persistent tertiary interactions can help to direct this search by starting the process from denatured states with residual structure24–26. Therefore, instead of folding beginning from a random coil or extended state, it is ‘kick-started’ by native-like topology and intramolecular interactions. Recent experimental studies indicate that changes in the topology of the denatured state can affect the folding pathway27–28.

The degree of residual structure, whether we are considering secondary or tertiary structure, depends on the sequence and the environment, and ranges from very little in CI2 to substantial amounts in barnase. Some members of the En-HD family have highly helical denatured states whereas others do not. The best-studied random coil — the denatured state of CI2 — shows that, although low levels of residual structure can be effectively averaged out, there can still be conformational preferences. So, not all conformations are sampled in the denatured state and, instead, the loose retention of native topology, fluctuating secondary structure and dynamic side-chain interactions direct proteins to fold using a biased search that leads to folding pathways.
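The difference between a random and a biased search can be illustrated with a toy model in the spirit of the well-known argument by Zwanzig, Szabo and Bagchi. This is an illustrative assumption, not a model used in the article: each of n residues has `states` conformations, one of which (state 0) is native; the random search redraws every residue at each step, whereas the biased search changes one residue per step and lets residues that are already native persist. The chain length, number of states, revert_prob and function names are all hypothetical.

```python
import random
from statistics import mean


def random_search_steps(n: int, states: int, rng: random.Random,
                        max_steps: int = 10**7) -> int:
    """Levinthal-style search: each step draws a completely new conformation.
    State 0 is taken to be the native state of every residue."""
    for step in range(1, max_steps + 1):
        # Stops drawing at the first non-native residue; residues are
        # independent, so this does not change the probability of success.
        if all(rng.randrange(states) == 0 for _ in range(n)):
            return step
    raise RuntimeError("native conformation not found within max_steps")


def biased_search_steps(n: int, states: int, rng: random.Random,
                        revert_prob: float = 0.01,
                        max_steps: int = 10**7) -> int:
    """Biased search: one residue changes per step, and a residue that is
    already native leaves the native state only with probability revert_prob.
    Requires states >= 2."""
    conf = [rng.randrange(1, states) for _ in range(n)]  # start fully non-native
    n_native = 0
    for step in range(1, max_steps + 1):
        i = rng.randrange(n)
        if conf[i] == 0:
            if rng.random() < revert_prob:
                conf[i] = rng.randrange(1, states)
                n_native -= 1
        else:
            conf[i] = rng.randrange(states)
            if conf[i] == 0:
                n_native += 1
        if n_native == n:
            return step
    raise RuntimeError("native conformation not found within max_steps")


rng = random.Random(0)
N, STATES, TRIALS = 8, 3, 20
random_mean = mean(random_search_steps(N, STATES, rng) for _ in range(TRIALS))
biased_mean = mean(biased_search_steps(N, STATES, rng) for _ in range(TRIALS))
print(f"random search: ~{random_mean:.0f} steps (expected {STATES ** N}); "
      f"biased search: ~{biased_mean:.0f} steps")
```

Even with only 8 residues the random search needs thousands of steps on average, whereas the biased search needs tens; the gap grows exponentially with chain length.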

The structure in the denatured state, even if minimal, can aid in the nucleation and

with little sequence identity, to look for commonalities in folding mechanisms and their sequence determinants (Gianni et al., manuscript in preparation). Preliminary indications from both experiment and simulation are that there is a subtle movement from the framework to the nucleation–condensation mechanism with decreasing stability of secondary structure. We have presented in detail elsewhere the evidence that nucleation–condensation might be a unifying mechanism of protein folding, and that the framework and hydrophobic-collapse models might be extreme manifestations of nucleation–condensation when either the secondary structure or the tertiary interactions, respectively, become overstabilized 11.

So, how do proteins fold?

The starting point: the denatured state. To begin, we note that the term denatured state is an operational definition, merely denoting that the protein is inactive, in contrast to the biologically active native state. Denatured states can be quite sensitive to their environment, such that, in terms of their ‘unfoldedness’, they can span a continuum of conformational states, ranging from structured intermediates (for example, under physiological conditions or in mild denaturants) to random coils (for example, in 6 M guanidinium hydrochloride) 23. From the various studies that are outlined above and others, we have learned that proteins are programmed for efficient folding. The Levinthal paradox 1 is circumvented through a more directed and efficient search of conformational space; that is, proteins do not sample all possible conformations, nor do they sample conformational space randomly. The intrinsic conformational propensities of the secondary structure and

Figure 4 | An overlay of transition-state structures from independent unfolding simulations of chymotrypsin inhibitor 2 at different temperatures. The crystal structure 51 is shown in red; four independent transition-state structures that were determined at 225°C are highlighted in green, and transition-state structures from 100, 125, 150, 200 and 225°C are displayed in cyan 15.


“…if a small protein folded by randomly checking all possible conformations of its unfolded state, the process would take longer than the age of the universe.”


slides towards the framework model with increased stability of the secondary structure — are consistent with a growing body of information regarding the folding pathways of other small proteins that have been analysed in depth 36–40.

There is still much to do. We need more combined studies on the newly discovered ultrafast folding proteins to bridge the gap between the timescales of experiments and MD simulations at atomic resolution. The fast unfolding proteins are now within the timeframe of MD simulations at accessible temperatures, and the slower folding proteins can be studied at elevated temperatures in silico. But even the fastest folding proteins are only tantalizingly close, and normal proteins are still many orders of magnitude too slow for the presently accessible microsecond time regime of simulation 41. Pande and colleagues have recently reported the apparent success of distributed computing (tens of thousands of short simulations on screensavers around the world) in reproducing the rate constant for the folding of a 23-residue designed peptide 42. However, for several reasons 43, we consider that these results are too preliminary to be judged meaningfully. In particular, evidence is required that simulation and experiment are measuring the rate constants for the same processes and that the overall folding reaction is not multistep, as was found for En-HD. For example, investigators initially assumed that a wild-type titin domain (TI I27) followed the same unfolding pathway in solution as when force was applied, because the rate constants were similar 44. However, more in-depth studies of the pathways (by simulation 45,46, mutation 47 and atomic-force microscopy of mutants 48,49) indicate that the titin domain unfolds by different pathways under the different conditions. So, calculating rates that are in agreement with experiment is not enough; care must be taken to validate the simulation studies thoroughly and to ensure that the two approaches are, in fact, monitoring the same pathways.
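The point about multistep reactions can be made concrete. A common way to turn many short trajectories into a rate constant is to assume single-exponential kinetics, so that the fraction f of runs that have folded by time t gives k = −ln(1 − f)/t. The sketch below applies that estimator to a hypothetical two-step reaction, D → I → N. It is not the analysis used in ref. 42, and the run count, run length and rate constants are illustrative assumptions.

```python
import math


def rate_from_short_runs(n_folded, n_runs, t_run):
    """Estimate k from the fraction of runs that fold within t_run.

    Assumes single-exponential (two-state) kinetics: f = 1 - exp(-k * t).
    """
    f = n_folded / n_runs
    return -math.log(1.0 - f) / t_run


def fraction_folded_two_step(k1, k2, t):
    """Fraction of molecules in N at time t for D -> I -> N (requires k1 != k2)."""
    return 1.0 - (k2 * math.exp(-k1 * t) - k1 * math.exp(-k2 * t)) / (k2 - k1)


# Hypothetical setup: 10,000 runs of 20 ns each; the true mechanism has two
# sequential steps with k1 = 1e6 s^-1 and k2 = 2e6 s^-1.
n_runs, t_run = 10_000, 20e-9
k1, k2 = 1e6, 2e6
expected_folded = n_runs * fraction_folded_two_step(k1, k2, t_run)
k_est = rate_from_short_runs(expected_folded, n_runs, t_run)
print(f"apparent k = {k_est:.2g} s^-1 (slower elementary step: {k1:.0e} s^-1)")
```

Because the second step introduces a lag, only a handful of the 20 ns runs reach N, and the single-exponential estimator returns about 2 × 10^4 s^-1. That is roughly fifty-fold slower than the slower elementary step. Agreement or disagreement between such a number and an experimental rate therefore says little unless the kinetic scheme has been established independently.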

In addition, we need more experimental methods for the detection and characterization of dynamic structures during folding and unfolding, as well as more powerful computers. With improvements in experimental

are carried along the folding pathway and become stronger as the protein approaches the native state, or they function as scaffolds for other parts of the chain to build on, as is the case for barnase. By contrast, some initiation sites can be important at some stage of the pathway, but are not carried through and reflected in the native state; that is, there are crucial, but transient, non-native interactions, such as the hydrophobic clusters that narrow the conformational search early in the folding of CI2. Alternatively, interactions that are important to folding might not form until the transition state, as with the nucleation patches that form between the α-helix and β-sheet in CI2.

Although we prefer to carry out simulations and experiments in parallel and then to compare our results, experimental Φ-values can also be used as restraints to build plausible transition-state models 31. Recently, such combined efforts have produced intriguing transition-state models for acylphosphatase 32 and a fibronectin type III domain 33. The models can then be subjected to further experimental verification by mutating residues that were not used to build the models. Therefore, they also have predictive power.
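In protein-engineering analysis, Φ is the ratio of the effect of a mutation on the activation energy of folding to its effect on equilibrium stability, Φ = ΔΔG(TS−D)/ΔΔG(D−N). The sketch below computes it from wild-type and mutant folding rate constants. It assumes two-state kinetics and an unchanged pre-exponential factor; the function name and the example numbers are hypothetical.

```python
import math

R_KCAL = 1.98720e-3  # gas constant, kcal mol^-1 K^-1


def phi_value(k_wt, k_mut, ddg_eq, temp_k=298.15):
    """Phi-value of a mutation from folding rate constants and stability.

    k_wt, k_mut: folding rate constants of wild type and mutant (same units)
    ddg_eq:      destabilization ΔΔG(D-N) in kcal/mol (positive if destabilizing)
    Assumes two-state folding and an unchanged pre-exponential factor.
    """
    ddg_ts = R_KCAL * temp_k * math.log(k_wt / k_mut)  # ΔΔG(TS-D) from the rates
    return ddg_ts / ddg_eq


# Hypothetical mutant: folds 10-fold more slowly and is 2.0 kcal/mol less stable.
print(f"Phi = {phi_value(k_wt=50.0, k_mut=5.0, ddg_eq=2.0):.2f}")
```

A Φ near 1 indicates that the mutated site is as structured in the transition state as in the native state. A Φ near 0 indicates that it is as unstructured as in the denatured state. Per-residue values of this kind are what can serve as restraints when building transition-state models.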

It must also be kept in mind that the transition state, like any other thermodynamically or kinetically populated state, comprises an ensemble of conformations. This ensemble is

more heterogeneous than that of the native state, but it is constrained relative to those of intermediate and denatured states. That is, even if there are only a few key contacts in the nucleus of the transition state, the rest of the protein is not random or widely divergent. FIG. 4 illustrates the heterogeneous nature of the transition-state ensemble and the relative insensitivity of the transition-state ensemble of CI2 to large changes in temperature. From this moderately diverse ensemble, the final steps in folding involve the expulsion of water molecules from the interior and exposed residues, and the fine-tuning of the side chains, which then leads to the much tighter native-state ensemble.
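One common way to put a number on "heterogeneous but constrained" is the mean pairwise RMSD within an ensemble after optimal superposition (the Kabsch algorithm). The sketch below uses random stand-in coordinates, not the simulations behind FIG. 4. It assumes NumPy and (N, 3) coordinate arrays listing the same atoms in the same order; the function names are illustrative.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate arrays after optimal superposition.

    Both arrays must list the same atoms in the same order.
    """
    p = p - p.mean(axis=0)                       # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q                                  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))       # -1 would indicate a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T    # proper rotation taking p onto q
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def mean_pairwise_rmsd(ensemble):
    """Average Kabsch RMSD over all distinct pairs of structures in an ensemble."""
    n = len(ensemble)
    values = [kabsch_rmsd(ensemble[i], ensemble[j])
              for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(values))


# Synthetic stand-in coordinates (not real CI2 data): 64 "C-alpha" positions in Å.
rng = np.random.default_rng(0)
native = rng.normal(scale=10.0, size=(64, 3))
tight = [native + rng.normal(scale=0.5, size=native.shape) for _ in range(5)]
broad = [native + rng.normal(scale=3.0, size=native.shape) for _ in range(5)]
print(f"tight ensemble: {mean_pairwise_rmsd(tight):.2f} Å")
print(f"broad ensemble: {mean_pairwise_rmsd(broad):.2f} Å")
```

Here the tight ensemble stands in for a native-like state and the broad one for a denatured-like state. On this measure, a transition-state ensemble as described above would be expected to fall between the two.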

General determinants of folding rates? It is an appealing quest to produce quantifiable theories of folding rates based on simple parameters. One proposal is a linear relationship between the logarithms of folding rate constants (log k) and contact order, which is a measure of the proportion of long-range tertiary contacts that connect elements of secondary structure 34. But, for proteins that fold by nucleation–condensation, such as CI2, log k depends on the stability of the protein, and there is a fairly good relationship between log k and the change in free energy for the folding of its mutants (this is known as a Brønsted relationship). The value of log k spans 3.5 units for different mutants of CI2 that have constant contact-order values (FIG. 5). There is, however, a good relationship between log k and contact order for mutants in which the loop sizes have been systematically increased without undue changes in stability (FIG. 5). Therefore, it seems that both contact order and stability are important determinants of folding rates 35. The trouble with general theories is that protein stability and folding kinetics can depend crucially on specific interactions. Because, with proteins, the devil lies in the details, atomic-level simulation is required to predict their properties precisely.
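Below is a minimal sketch of the commonly used relative contact order: the average sequence separation of residues in native contact, divided by chain length. Contacts are supplied here as hypothetical residue pairs. Published definitions usually count heavy-atom pairs within a distance cutoff, which this residue-level simplification does not do. The second helper mimics the CI2 loop-insertion mutants: inserting residues into a loop lengthens the sequence separation of every contact that spans the loop.

```python
def relative_contact_order(contacts, length):
    """Mean sequence separation of native contacts divided by chain length.

    contacts: (i, j) residue-number pairs in contact in the native structure
    length:   number of residues in the chain
    """
    separations = [abs(j - i) for i, j in contacts]
    if not separations:
        raise ValueError("at least one contact is required")
    return sum(separations) / (len(separations) * length)


def insert_loop_residues(contacts, length, after_residue, n_inserted):
    """Renumber contacts as if n_inserted residues were added after after_residue."""
    def shift(k):
        return k + n_inserted if k > after_residue else k
    return [(shift(i), shift(j)) for i, j in contacts], length + n_inserted


# Hypothetical 20-residue protein with three native contacts.
contacts, length = [(1, 5), (2, 10), (3, 20)], 20
print(f"wild type:      {relative_contact_order(contacts, length):.3f}")
longer, new_length = insert_loop_residues(contacts, length, after_residue=6, n_inserted=4)
print(f"loop insertion: {relative_contact_order(longer, new_length):.3f}")
```

The contacts themselves are unchanged in this toy example; only their sequence separation grows. This is the sense in which loop insertions probe contact order while leaving the specific interactions, and ideally the stability, largely intact.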

Concluding remarks

Although we have focused on only three proteins here, their folding properties have been characterized in detail. For example, barnase and CI2 were the first two proteins to have their transition states subjected to intense atomic-level analysis by experiment and simulation, and En-HD is the most extensively characterized of the ultrafast folding proteins. We believe that the principles outlined above for these proteins — such as native-like topology in the denatured state, native-like transition states and the nucleation–condensation mechanism, which

NATURE REVIEWS | MOLECULAR CELL BIOLOGY VOLUME 4 | JUNE 2003 | 501

Figure 5 | A plot of the logarithm of folding rate constants for various proteins against contact order. The graph shows the logarithm of folding rate constants (log k) for various proteins against contact order (purple circles). This plot was made using data from REF. 35 and the complete published data set for mutants of chymotrypsin inhibitor 2 (CI2). Note how mutants of CI2 (yellow circles) that do not alter contact order, just stability, span more than 3.5 orders of magnitude of folding rate constants, which indicates that specific interactions can greatly affect folding rates. Contact-order mutants of CI2, which were made by inserting residues into a loop, show that contact order, as well as stability, is an important determinant of folding rate constants.

[Figure 5 plot: log k (y axis, 0 to 5) against contact order (x axis); series: CI2 stability mutants, CI2 loop mutants.]
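A Brønsted analysis of the kind described for the CI2 stability mutants reduces to a straight-line fit. The sketch below assumes stabilities ΔG(D−N) in kcal/mol (positive for a stable protein), log10 values of the folding rate constants and a plain least-squares fit. It reports the slope in dimensionless form, β = ln(10)·RT·d(log10 k_f)/dΔG(D−N). The function name and the synthetic data are hypothetical.

```python
import math

R_KCAL = 1.98720e-3  # gas constant, kcal mol^-1 K^-1


def bronsted_beta(dg_values, log10_k_values, temp_k=298.15):
    """Dimensionless Brønsted slope from a least-squares fit of log10(k_f) on ΔG(D-N).

    dg_values:      stabilities ΔG(D-N) of the mutants in kcal/mol (positive = stable)
    log10_k_values: log10 of the corresponding folding rate constants
    """
    n = len(dg_values)
    mean_x = sum(dg_values) / n
    mean_y = sum(log10_k_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dg_values, log10_k_values))
    sxx = sum((x - mean_x) ** 2 for x in dg_values)
    slope = sxy / sxx  # d(log10 k_f) / dΔG, in mol/kcal
    return math.log(10) * R_KCAL * temp_k * slope


# Synthetic, noise-free data generated with beta = 0.3 (hypothetical mutants).
rt = R_KCAL * 298.15
dg = [3.0, 4.0, 5.0, 6.0, 7.5]
log_k = [1.0 + 0.3 * g / (math.log(10) * rt) for g in dg]
print(f"beta = {bronsted_beta(dg, log_k):.2f}")
```

Applied to mutants that share the same contact order, a clearly non-zero slope is the signature that stability, and not topology alone, controls the folding rate.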


39. Kortemme, T., Kelly, M. J. S., Kay, L. E., Forman-Kay, J. & Serrano, L. Similarities between the spectrin SH3 domain denatured state and its folding transition state. J. Mol. Biol. 297, 1217–1229 (2000).

40. Teilum, K., Maki, K., Kragelund, B. B., Poulsen, F. M. & Roder, H. Early kinetic intermediate in the folding of acyl-CoA binding protein detected by fluorescence labeling and ultrarapid mixing. Proc. Natl Acad. Sci. USA 99, 9807–9812 (2002).

41. Duan, Y. & Kollman, P. A. Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science 282, 740–744 (1998).

42. Snow, C. D., Nguyen, H., Pande, V. S. & Gruebele, M. Absolute comparison of simulated and experimental protein-folding dynamics. Nature 420, 102–106 (2002).

43. Fersht, A. R. On the simulation of protein folding by short time scale molecular dynamics and distributed computing. Proc. Natl Acad. Sci. USA 99, 14122–14125 (2002).

44. Carrion-Vazquez, M. et al. Mechanical and chemical unfolding of a single protein: a comparison. Proc. Natl Acad. Sci. USA 96, 3694–3699 (1999).

45. Lu, H. & Schulten, K. The key event in force-induced unfolding of titin’s immunoglobulin domains. Biophys. J. 79, 51–65 (2000).

46. Paci, E. & Karplus, M. Unfolding proteins by external forces and temperature: the importance of topology and energetics. Proc. Natl Acad. Sci. USA 97, 6521–6524 (2000).

47. Fowler, S. B. & Clarke, J. Mapping the folding pathway of an immunoglobulin domain: structural detail from phi value analysis and movement of the transition state. Structure 9, 355–366 (2001).

48. Brockwell, D. J. et al. The effect of core destabilization on the mechanical resistance of I27. Biophys. J. 83, 458–472 (2002).

49. Fowler, S. B. et al. Mechanical unfolding of a titin Ig domain: structure of unfolding intermediate revealed by combining AFM, molecular dynamics simulations, NMR and protein engineering. J. Mol. Biol. 322, 841–849 (2002).

50. Ferrin, T. E. et al. The MIDAS display system. J. Mol. Graph. 6, 13–27 (1988).

51. Harpaz, Y. et al. Direct observation of a better hydration at the N terminus of an α-helix with glycine rather than alanine as the N-cap residue. Proc. Natl Acad. Sci. USA 91, 3–15 (1994).

52. Bycroft, M. et al. Determination of the three-dimensional structure of barnase using nuclear magnetic resonance spectroscopy. Biochemistry 30, 8697–8701 (1991).

53. Clarke, N. D. et al. Structural studies of the engrailed homeodomain. Prot. Sci. 3, 1779–1787 (1994).

Acknowledgements

V.D. is grateful for financial support from the National Institutes of Health. A.R.F. is grateful for long-term support from the Medical Research Council. UCSF MidasPlus was used to render the protein structures shown in the figures 50.

Online links

DATABASES
The following terms in this article are linked online to:
InterPro: http://www.ebi.ac.uk/interpro/
engrailed homeodomain | fibronectin type III domain
LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink/
acylphosphatase | titin
Swiss-Prot: http://www.expasy.ch/
barnase | CI2 | c-Myb

FURTHER INFORMATION
Valerie Daggett’s laboratory: http://faculty.washington.edu/~daggett
Alan Fersht’s laboratory: http://www.ch.cam.ac.uk/CUCL/staff/arf.html

17. Kazmirski, S. L. et al. Protein folding from a highly disordered denatured state: the folding pathway of chymotrypsin inhibitor 2 at atomic resolution. Proc. Natl Acad. Sci. USA 98, 4349–4354 (2001).

18. Wong, K. B. et al. Towards a complete description of the structural and dynamic properties of the denatured state of barnase and the role of residual structure in folding. J. Mol. Biol. 296, 1257–1282 (2002).

19. Bond, C. J. et al. Characterization of residual structure in the thermally denatured state of barnase by simulation and experiment: description of the folding pathway. Proc. Natl Acad. Sci. USA 94, 13409–13413 (1997).

20. Li, A. & Daggett, V. The unfolding of barnase: characterization of the major intermediate. J. Mol. Biol. 275, 677–694 (1998).

21. Daggett, V., Li, A. & Fersht, A. R. A combined molecular dynamics and Φ-value analysis of structure-reactivity relationships in the transition state and unfolding pathway of barnase: the structural basis of Hammond and anti-Hammond effects. J. Am. Chem. Soc. 120, 12740–12754 (1998).

22. Mayor, U. et al. The complete folding pathway of a protein from nanoseconds to microseconds. Nature 421, 863–867 (2003).

23. Kazmirski, S. L. & Daggett, V. Simulations of the structural and dynamical properties of denatured proteins: the ‘molten coil’ state of bovine pancreatic trypsin inhibitor. J. Mol. Biol. 277, 487–506 (1998).

24. Dobson, C. M. Unfolded proteins, compact states and molten globules. Curr. Opin. Struct. Biol. 2, 6–12 (1992).

25. Blanco, F. J., Serrano, L. & Forman-Kay, J. D. High populations of non-native structures in the denatured state are compatible with the formation of the native folded state. J. Mol. Biol. 284, 1153–1164 (1998).

26. Baldwin, R. A new perspective on unfolded proteins. Adv. Prot. Chem. 62, 361–367 (2002).

27. Shortle, D. The expanded denatured state: an ensemble of conformations trapped in a locally encoded topological space. Adv. Prot. Chem. 62, 1–23 (2002).

28. Smith, C. R., Mateljevic, N. & Bowler, B. E. Effects of topology and excluded volume on protein denatured state conformational properties. Biochemistry 41, 10173–10181 (2002).

29. Krantz, B. A. et al. Fast and slow intermediate accumulation and the initial barrier mechanism in protein folding. J. Mol. Biol. 324, 359–371 (2002).

30. Sanchez, I. E. & Kiefhaber, T. Evidence for sequential barriers and obligatory intermediates in apparent two-state protein folding. J. Mol. Biol. 325, 367–376 (2003).

31. Daggett, V. et al. Structure of the transition state for folding of a protein derived from experiment and simulation. J. Mol. Biol. 257, 430–440 (1996).

32. Paci, E., Vendruscolo, M., Dobson, C. M. & Karplus, M. Determination of a transition state at atomic resolution from protein engineering data. J. Mol. Biol. 324, 151–163 (2002).

33. Paci, E., Clarke, J., Steward, A., Vendruscolo, M. & Karplus, M. Self-consistent determination of the transition state for protein folding: application to a fibronectin type III domain. Proc. Natl Acad. Sci. USA 100, 394–399 (2003).

34. Makarov, D. E., Keller, C. A., Plaxco, K. W. & Metiu, H. How the folding rate constant of simple, single-domain proteins depends on the number of native contacts. Proc. Natl Acad. Sci. USA 99, 3535–3539 (2002).

35. Fersht, A. R. Transition state structure as a unifying basis in protein folding mechanisms: contact order, chain topology, stability and the extended nucleus mechanism. Proc. Natl Acad. Sci. USA 97, 1525–1529 (2000).

36. Grantcharova, V. P., Riddle, D. S. & Baker, D. Long-range order in the src SH3 folding transition state. Proc. Natl Acad. Sci. USA 97, 7084–7089 (2000).

37. Garcia, P., Serrano, L., Durand, D., Rico, M. & Bruix, M. NMR and SAXS characterization of the denatured state of the chemotactic protein CheY: implications for protein folding initiation. Protein Sci. 10, 1100–1112 (2001).

38. Hollien, J. & Marqusee, S. Comparison of the folding processes of T. thermophilus and E. coli ribonucleases H. J. Mol. Biol. 316, 327–340 (2002).

resolution and computer power, we will be able to extend the combined analysis of transition states, intermediates and denatured states to more complex functional proteins, as well as to even faster timescales for early events. We wait with excitement to see how the principles that have been deduced for the folding of small proteins and domains apply to the folding and assembly of large proteins.

Valerie Daggett is at the Department of Medicinal Chemistry, Box 357610, University of Washington, Seattle, Washington 98195–7610, USA.

Alan Fersht is at the Department of Chemistry, Lensfield Road, Cambridge CB2 1EW, UK, and the MRC Centre for Protein Engineering, MRC Centre, Hills Road, Cambridge CB2 2QH, UK.

e-mails: [email protected];[email protected]

doi:10.1038/nrm1126

1. Levinthal, C. in Mossbauer Spectroscopy in Biological Systems (eds Monticello, I. L., Debrunner, P., Tsibris, J. C. M. & Munck, E.) 22–24 (Univ. of Illinois Press, Urbana, Illinois, 1969).

2. Tramontano, A. Of men and machines. Nature Struct. Biol. 10, 87–90 (2003).

3. Fersht, A. R. & Daggett, V. Protein folding and unfolding at atomic resolution. Cell 108, 573–582 (2002).

4. Matouschek, A. Protein unfolding — an important process in vivo? Curr. Opin. Struct. Biol. 13, 98–109 (2003).

5. Dobson, C. M. Protein-misfolding diseases: getting out of shape. Nature 418, 729–730 (2002).

6. Kelly, J. W. Towards an understanding of amyloidogenesis. Nature Struct. Biol. 9, 323–325 (2002).

7. Kim, P. S. & Baldwin, R. L. Specific intermediates in the folding reactions of small proteins and the mechanism of protein folding. Annu. Rev. Biochem. 51, 459–489 (1982).

8. Karplus, M. & Weaver, D. L. Protein folding dynamics. Nature 260, 404–406 (1976).

9. Wetlaufer, D. B. Nucleation, rapid folding, and globular intrachain regions in proteins. Proc. Natl Acad. Sci. USA 70, 697–701 (1973).

10. Ptitsyn, O. B. Protein folding: hypotheses and experiments. J. Prot. Chem. 6, 273–293 (1987).

11. Daggett, V. & Fersht, A. R. Is there a unifying mechanism for protein folding? Trends Biochem. Sci. 28, 19–26 (2003).

12. Fersht, A. R. et al. The folding of an enzyme I. Theory of protein engineering analysis of stability and pathway of protein folding. J. Mol. Biol. 224, 771–782 (1992).

13. Li, A. & Daggett, V. Characterization of the transition state of protein unfolding by use of molecular dynamics: chymotrypsin inhibitor 2. Proc. Natl Acad. Sci. USA 91, 10430–10434 (1994).

14. Lazaridis, T. & Karplus, M. ‘New view’ of protein folding reconciled with the old through multiple unfolding simulations. Science 278, 1928–1931 (1997).

15. Day, R. et al. Increasing temperature accelerates protein unfolding without changing the pathway of unfolding. J. Mol. Biol. 322, 189–203 (2002).

16. Bennion, B. & Daggett, V. The molecular basis for the chemical denaturation of proteins by urea. Proc. Natl Acad. Sci. USA 100, 5142–5147 (2003).