1 introduction - university of groningen · simulations of proteins in atomic detail, including...

1 Introduction

Proteins

Proteins are bio-polymers that play an essential role in most processes in living organisms. They consist of fifty to several hundred units, which are called amino-acid residues. Each residue is built up of a backbone of one nitrogen, two carbons, an oxygen and two hydrogens, plus a side chain. The side chain can be one of twenty different types. The specific sequence of these side chains is called the primary structure of a protein. The N-H group in the backbone of a residue can form a hydrogen bond with a C=O group in the backbone of another residue. This gives rise to the secondary structure of a protein. There are two main types of secondary structure elements. One is the α-helix, where the hydrogen bonds sequentially connect residues i and i+4. The other is the parallel or anti-parallel β-sheet (see Figure 1.1), where the hydrogen bonds connect two (or more) extended backbones. In a given environment the sequence of amino-acid residues determines the 3-dimensional conformation in which the protein can be found most of the time. This conformation is called the native state.
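The hydrogen-bond patterns described above can be made concrete with a toy classifier. The sketch below is not a secondary-structure assignment method such as DSSP. It only labels a backbone hydrogen bond by the sequence separation between the residue donating the N-H and the residue accepting at the C=O. The function names, and the rule that any separation larger than four counts as a possible sheet contact, are illustrative assumptions.

```python
from typing import Dict, Iterable, Tuple


def classify_hbond(donor: int, acceptor: int) -> str:
    """Label a backbone hydrogen bond by sequence separation (toy rule).

    donor: index of the residue whose N-H donates the hydrogen bond
    acceptor: index of the residue whose C=O accepts it
    """
    separation = donor - acceptor
    if separation == 4:
        # C=O of residue i bonded to N-H of residue i+4: alpha-helical pattern
        return "alpha-helix"
    if abs(separation) > 4:
        # long-range partner, as between extended strands in a beta-sheet
        return "possible beta-sheet"
    return "other"


def count_patterns(hbonds: Iterable[Tuple[int, int]]) -> Dict[str, int]:
    """Tally the labels for a list of (donor, acceptor) residue pairs."""
    counts: Dict[str, int] = {}
    for donor, acceptor in hbonds:
        label = classify_hbond(donor, acceptor)
        counts[label] = counts.get(label, 0) + 1
    return counts


# Usage with hypothetical residue indices: three helical bonds and two strand contacts
print(count_patterns([(4, 0), (5, 1), (6, 2), (30, 12), (14, 28)]))
# {'alpha-helix': 3, 'possible beta-sheet': 2}
```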

Several aspects of a protein can be involved in its function. When a protein is used for building tissue, its global shape is important. When used in muscles, a protein must be able to transform chemical energy into a mechanical force. The largest diversity of proteins can be found in signal transduction pathways. In almost any process in a cell, proteins are used to transmit signals. The protein's shape or surface structure can be used by other proteins to recognize a signal. Yet another class of proteins are enzymes, which catalyze chemical reactions. Here quantum chemistry plays an important role. This topic will not be covered in this thesis; quantum effects only play a role in chapter 4, which deals with water.

Figure 1.1: A part of an anti-parallel β-strand and β-turn in the protein HPr. The left panel shows all atoms; the right panel shows only the backbone atoms, with the hydrogen bonds displayed as dashed cylinders. This protein is simulated in chapters 6, 7 and 8.

Although the structures of proteins can be resolved to atomic detail with X-ray diffraction and nuclear magnetic resonance (NMR), there is no experimental method that can probe the detailed dynamics. Only simulation techniques, such as molecular dynamics (MD), can produce trajectories of all the atoms of a protein. In MD the protein is described as a collection of classical particles that interact mainly via pairwise potentials. The atoms move according to Newton's equations of motion. From such a simulation one obtains the coordinates of all the atoms of the protein, and of the solvent when it is included, as a function of time. This wealth of detailed information can give useful insights into biological processes. However, simulation times are currently limited to hundreds of nanoseconds, while biological processes often occur on time scales of microseconds to seconds. This means that only the fastest processes can be studied directly. Slow processes that follow only one or a small number of pathways can still be studied when there is a clear separation in time scales.
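Integrating Newton's equations of motion numerically is the core of every MD step. The sketch below is a minimal illustration that assumes the widely used velocity Verlet scheme; Gromacs itself defaults to the closely related leap-frog integrator, and neither is claimed here to be the exact scheme used in this thesis. It is applied to a one-dimensional harmonic oscillator in reduced units, and the names are illustrative.

```python
from typing import Callable, List, Tuple

Force = Callable[[List[float]], List[float]]


def velocity_verlet(x: List[float], v: List[float], masses: List[float],
                    force: Force, dt: float,
                    n_steps: int) -> Tuple[List[float], List[float]]:
    """Integrate m * d2x/dt2 = F(x) with the velocity Verlet scheme."""
    x, v = list(x), list(v)
    f = force(x)
    for _ in range(n_steps):
        # first half-step velocity update with the current forces
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, masses)]
        # full-step position update
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # forces at the new positions, then the second velocity half-step
        f = force(x)
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, masses)]
    return x, v


# Usage: unit mass on a unit spring, started at rest at x = 1, run for about one period
def spring(x: List[float]) -> List[float]:
    return [-xi for xi in x]


x_end, v_end = velocity_verlet([1.0], [0.0], [1.0], spring, dt=0.01, n_steps=628)
energy = 0.5 * v_end[0] ** 2 + 0.5 * x_end[0] ** 2  # stays close to the initial 0.5
```

The scheme is time-reversible and symplectic, so the total energy fluctuates around its initial value rather than drifting. This property is the usual reason Verlet-type integrators are chosen for long MD runs.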

Protein folding

A biologically very important process that involves a large number of pathways is protein folding. When the protein can be crystallized, its 3-dimensional structure can be resolved using X-ray diffraction. Structures of relatively small proteins can also be determined with NMR. But in principle it should be possible to determine the structure of a protein from its sequence alone. When a sequence is very similar to the sequence of a protein whose structure is known, the structure can be modeled on a template of the known structure; this is called homology modeling. This approach fails, of course, for proteins that lack similarity to proteins with a known fold. A large number of methods have been developed to predict the structure using only the sequence of the protein. None of these methods use explicit solvent molecules; instead they rely on some implicit description of solvation. This usually includes a term that favors the interaction between apolar (hydrophobic) groups of atoms. These interactions are important for the initial collapse of a completely extended chain to a compact structure in which most hydrophobic groups are no longer in contact with the solvent. Such a state is often referred to as a molten globule, as it does not remain in a unique conformation. Going from a molten-globule state to the native conformation requires an extremely accurate model, because the stability of a protein is only 10-20 kT, while local interactions, such as a hydrogen bond, are also of this order.
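Reading the stability range above as multiples of the thermal energy kT, a short conversion shows what it means in molar units. The sketch rests on two assumptions that are not stated in the text: a temperature of 300 K, and the CODATA value of the gas constant (R = N_A k_B).

```python
# Gas constant R = N_A * k_B in kJ mol^-1 K^-1 (CODATA value)
R_KJ_PER_MOL_K = 0.008314462618


def kt_kj_per_mol(temperature_k: float) -> float:
    """Thermal energy kT per mole, in kJ/mol."""
    return R_KJ_PER_MOL_K * temperature_k


kt = kt_kj_per_mol(300.0)              # about 2.49 kJ/mol at the assumed 300 K
stability_range = (10 * kt, 20 * kt)   # about 25 to 50 kJ/mol
```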

Using an implicit solvent model might work reasonably well for simulating a protein in the native state, since this is usually a compact, nearly spherical object. However, these models will probably fail to discriminate between the native state and other compact states, because this requires a very detailed balance of a large number of interactions. Several of these interactions can only be modeled with explicit solvent, and it is even questionable whether the current non-polarizable force fields are accurate enough to reproduce the energy differences to the required accuracy.

Simulations of proteins in atomic detail, including explicit solvent in a periodic unit cell, are limited in length by the speed of the computer. All simulations presented in this work were performed with the Gromacs [1, 2, 3] package, which is probably the fastest available MD software. On current desktop computers, a Gromacs simulation of a protein of reasonable size (200 residues) in water runs at about a nanosecond per day. This is far too slow to observe (partial) unfolding and refolding, and also too slow to obtain reasonable statistics on collective motions that take place in the native state.
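The gap between simulated and biological time scales is easy to quantify. The back-of-the-envelope sketch below takes the throughput of about one nanosecond per day quoted above. It also assumes an integration time step of 2 fs, which the text does not state.

```python
def wall_clock_days(simulated_ns: float, ns_per_day: float = 1.0) -> float:
    """Computing time in days, assuming constant throughput."""
    return simulated_ns / ns_per_day


def integration_steps(simulated_ns: float, dt_fs: float = 2.0) -> int:
    """Number of MD steps needed; 1 ns = 1e6 fs (the 2 fs default is an assumption)."""
    return round(simulated_ns * 1e6 / dt_fs)


print(wall_clock_days(1_000))               # one microsecond: 1000.0 days
print(wall_clock_days(1_000_000) / 365.25)  # one millisecond: roughly 2700 years
print(integration_steps(1_000))             # 500000000 steps for one microsecond
```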

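The large computational effort is due to the large number of pair interactions that need to be evaluated at each integration step. A commonly used potential form looks like this:

$$ V = V_C + V_{LJ} + V_b + V_a + V_d \tag{1.1} $$

$$ V_C = \sum_{i<j} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}} \tag{1.2} $$

$$ V_{LJ} = \sum_{i<j} \left( \frac{C^{(12)}_{ij}}{r_{ij}^{12}} - \frac{C^{(6)}_{ij}}{r_{ij}^{6}} \right) \tag{1.3} $$

$$ V_b = \sum_{\text{bonds}} \tfrac{1}{2} k^b_{ij} \left( r_{ij} - b^0_{ij} \right)^2 \tag{1.4} $$

$$ V_a = \sum_{\text{angles}} \tfrac{1}{2} k^\theta_{ijk} \left( \theta_{ijk} - \theta^0_{ijk} \right)^2 \tag{1.5} $$

$$ V_d = \sum_{\text{dihedrals}} k^\phi_{ijkl} \left[ 1 + \cos\left( n\,\phi_{ijkl} - \phi^0_{ijkl} \right) \right] \tag{1.6} $$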

where $r_{ij}$ is the distance between particles $i$ and $j$. The potential can be separated into so-called non-bonded and bonded terms. The first two terms are non-bonded: $V_C$, the electrostatic interactions, and $V_{LJ}$, the Van der Waals interactions, which are represented by a Lennard-Jones potential. The last three terms are bonded interactions, which act between particles separated by at most three chemical bonds. Only the bonded terms involve three- or four-body interactions; all the non-bonded interactions are represented by pair potentials. Apart from the electrostatics, all interactions are short-ranged. To avoid calculating electrostatic interactions between all pairs of particles, whose number grows as $N^2$ for $N$ particles, one can apply a cut-off, neglecting all interactions beyond a certain distance. This can lead to severe artifacts in non-homogeneous systems, and even some properties of a pure liquid can be affected significantly, as we will see in chapter 5. A much more accurate method for treating electrostatics, which scales as $N \log N$, is the particle mesh Ewald method [4].
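The cost of the non-bonded terms (1.2) and (1.3) comes from the double loop over particle pairs, and a plain cut-off simply drops the pairs beyond the cut-off distance $r_c$. The sketch below evaluates both sums directly in a periodic cubic box using the minimum-image convention, with an optional plain cut-off.

It makes several simplifying assumptions:

- It uses Gromacs-style units (nm, kJ/mol, elementary charges), in which $1/(4\pi\varepsilon_0) \approx 138.935$ kJ mol⁻¹ nm e⁻².
- It ignores the exclusions of bonded neighbours that a real force field applies.
- It does not implement particle mesh Ewald.
- Its function names are illustrative.

For a homogeneous system, roughly a fraction $(4/3)\pi r_c^3 / L^3$ of all pairs lies inside the cut-off. The brute-force loop still visits every pair, however; production codes use neighbour lists or cell grids so that distant pairs are never examined.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# Assumed Gromacs-style units: nm, kJ/mol, elementary charges.
ONE_4PI_EPS0 = 138.935458  # 1/(4 pi eps0) in kJ mol^-1 nm e^-2

Vec3 = Tuple[float, float, float]


def min_image_distance(a: Vec3, b: Vec3, box: float) -> float:
    """Distance to the nearest periodic image in a cubic box of edge `box`."""
    d2 = 0.0
    for ai, bi in zip(a, b):
        d = ai - bi
        d -= box * round(d / box)
        d2 += d * d
    return math.sqrt(d2)


def nonbonded_energy(pos: Sequence[Vec3], q: Sequence[float],
                     c6: Sequence[Sequence[float]], c12: Sequence[Sequence[float]],
                     box: float, r_cut: Optional[float] = None) -> Tuple[float, float, int]:
    """Sum V_C (eq. 1.2) and V_LJ (eq. 1.3) over all pairs i < j.

    With r_cut set, pairs at r >= r_cut are simply neglected (plain cut-off);
    r_cut must not exceed box / 2. Bonded exclusions are not handled.
    Returns (V_C, V_LJ, number of pairs included).
    """
    v_c = v_lj = 0.0
    included = 0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):  # all n(n-1)/2 pairs are visited
            r = min_image_distance(pos[i], pos[j], box)
            if r_cut is not None and r >= r_cut:
                continue  # plain cut-off: this interaction is neglected
            included += 1
            inv_r6 = 1.0 / r ** 6
            v_c += ONE_4PI_EPS0 * q[i] * q[j] / r
            v_lj += c12[i][j] * inv_r6 * inv_r6 - c6[i][j] * inv_r6
    return v_c, v_lj, included


# Usage: 300 randomly placed dummy particles, counting pairs inside the cut-off
rng = random.Random(1)
n, box, r_cut = 300, 5.0, 1.0
pos: List[Vec3] = [(rng.uniform(0, box), rng.uniform(0, box), rng.uniform(0, box))
                   for _ in range(n)]
zeros = [[0.0] * n for _ in range(n)]
_, _, inside = nonbonded_energy(pos, [0.0] * n, zeros, zeros, box, r_cut)
fraction = inside / (n * (n - 1) // 2)
expected = 4.0 / 3.0 * math.pi * r_cut ** 3 / box ** 3  # about 0.034
```

Doubling the number of particles roughly quadruples the work of the double loop. This is why the non-bonded terms dominate the cost of a condensed-phase simulation.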

Speeding up simulations

For condensed systems about 90% of the CPU time goes into calculating the non-bonded interactions. This effort can be reduced in two ways.

The first method is increasing the time step. The time step should be small enough to follow the changes in the forces accurately. In a condensed system this means that the time step should be an order of magnitude smaller than the period of the fastest vibration. When the fastest degrees of freedom are almost uncoupled from the rest of the system, they can be removed, which allows for an increase of the time step. Degrees of freedom can be removed by constraining distances or by removing particles from the dynamic equations. Both methods are treated in chapter 3. When the small changes in bond length are important, flexible constraints can be used, as will be shown in chapter 4.
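The rule of thumb above, a time step about ten times shorter than the fastest vibrational period, can be turned into numbers. The sketch below assumes two typical textbook wavenumbers that are not taken from the text: about 3400 cm⁻¹ for an O-H bond stretch and about 1600 cm⁻¹ for the H-O-H angle bend of water.

```python
SPEED_OF_LIGHT_CM_PER_S = 2.99792458e10


def vibration_period_fs(wavenumber_per_cm: float) -> float:
    """Period in femtoseconds of a vibration with the given wavenumber (cm^-1)."""
    frequency_hz = SPEED_OF_LIGHT_CM_PER_S * wavenumber_per_cm
    return 1e15 / frequency_hz


def max_time_step_fs(wavenumber_per_cm: float, safety_factor: float = 10.0) -> float:
    """Time step an order of magnitude below the vibrational period."""
    return vibration_period_fs(wavenumber_per_cm) / safety_factor


print(max_time_step_fs(3400.0))  # bond stretch present: about 1 fs
print(max_time_step_fs(1600.0))  # bonds constrained, angle bend fastest: about 2 fs
```

In this estimate, removing the bond-stretch vibrations roughly doubles the usable time step.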

The second method is decreasing the number of interactions that need to be calculated per step. An elegant way of doing this is a multiple time-step algorithm [5]. In such an algorithm, rapidly changing short-range forces are updated more often than slowly fluctuating long-range forces. Unfortunately, not much can be gained here once the fastest degrees of freedom have been removed.

An obvious way of decreasing the number of interactions is to decrease the number of particles that can interact. For very homogeneous systems, such as industrial polymers, this is a relatively easy task. For proteins, which have 20 different types of residues and specific interactions with the solvent, it is much harder. The degrees of freedom that are removed exert three types of forces on the remaining degrees of freedom:

- forces that depend only on the positions of the particles; these are conservative forces derived from a so-called potential of mean force;
- forces that depend on the velocities; these are dissipative, or frictional, forces;
- uncorrelated, random forces.

The last two are coupled by the fluctuation-dissipation theorem [6]. The equations of motion for the reduced system are described in chapter 2. The main problem in this approach lies in determining the potential of mean force. Although the two other contributions might be very complicated, they only influence the dynamics, not the ensemble of sampled conformations or the thermodynamics. Determining a potential of mean force is so complicated because its shape can be completely different from the shape of the potential for the full system. Formally it will almost always contain many-body terms, even when the original potential consisted of only pair interactions.
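The multiple time-step idea mentioned above can be sketched as a two-level splitting in the style of the reversible RESPA scheme. The slow force gives half kicks at the outer time step, and the fast force is integrated with several velocity Verlet sub-steps in between. This is not necessarily the algorithm of reference [5]. The one-dimensional test system, a stiff spring plus a weak spring standing in for fast short-range and slow long-range forces, is an assumption made for illustration.

```python
from typing import Callable, Tuple

Force = Callable[[float], float]


def respa(x: float, v: float, m: float, f_fast: Force, f_slow: Force,
          dt_outer: float, n_inner: int, n_outer: int) -> Tuple[float, float]:
    """Two-level multiple time-step integration of m * d2x/dt2 = f_fast + f_slow."""
    dt_inner = dt_outer / n_inner
    fs = f_slow(x)  # slow force: one new evaluation per outer step
    ff = f_fast(x)  # fast force: one new evaluation per inner step
    for _ in range(n_outer):
        v += 0.5 * dt_outer * fs / m      # slow half kick
        for _ in range(n_inner):          # velocity Verlet with the fast force only
            v += 0.5 * dt_inner * ff / m
            x += dt_inner * v
            ff = f_fast(x)
            v += 0.5 * dt_inner * ff / m
        fs = f_slow(x)
        v += 0.5 * dt_outer * fs / m      # slow half kick at the new position
    return x, v


# Usage: stiff spring (k = 100) plus weak spring (k = 1), unit mass, started at rest at x = 1
k_fast, k_slow = 100.0, 1.0
x_end, v_end = respa(1.0, 0.0, 1.0, lambda x: -k_fast * x, lambda x: -k_slow * x,
                     dt_outer=0.05, n_inner=10, n_outer=1000)
energy = 0.5 * v_end ** 2 + 0.5 * (k_fast + k_slow) * x_end ** 2  # initial value 50.5
```

Here the slow force is evaluated ten times less often than the fast one, while the total energy stays close to its initial value. The outer step must remain well below half the period of the fast motion to avoid resonance artifacts.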


A first attempt at constructing a reduced description for protein dynamics is presented in chapter 8. The model uses the techniques introduced in chapter 3.
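Removed degrees of freedom exert three kinds of forces: a mean force, friction and random forces. All three appear together in the simplest reduced equation of motion, the Langevin equation with a constant friction coefficient γ and no memory. The sketch below integrates it with the BAOAB splitting of Leimkuhler and Matthews.

Both the memoryless form and this integrator are assumptions for illustration. They are not the equations derived in chapter 2 or the model of chapter 8. The amplitude of the random force is fixed by γ and kT, which is the discrete counterpart of the fluctuation-dissipation coupling.

```python
import math
import random
from typing import Callable, Tuple


def langevin_baoab(force: Callable[[float], float], x: float, v: float, m: float,
                   gamma: float, kT: float, dt: float, n_steps: int,
                   rng: random.Random) -> Tuple[float, float]:
    """Langevin dynamics (BAOAB splitting); returns time averages of x^2 and v^2."""
    c1 = math.exp(-gamma * dt)
    # noise amplitude tied to the friction: discrete fluctuation-dissipation relation
    c2 = math.sqrt((1.0 - c1 * c1) * kT / m)
    f = force(x)
    sum_x2 = sum_v2 = 0.0
    for _ in range(n_steps):
        v += 0.5 * dt * f / m                  # B: mean-force half kick
        x += 0.5 * dt * v                      # A: half drift
        v = c1 * v + c2 * rng.gauss(0.0, 1.0)  # O: friction plus random force
        x += 0.5 * dt * v                      # A: half drift
        f = force(x)
        v += 0.5 * dt * f / m                  # B: mean-force half kick
        sum_x2 += x * x
        sum_v2 += v * v
    return sum_x2 / n_steps, sum_v2 / n_steps


# Usage: harmonic potential of mean force with k = 1 and kT = m = 1,
# so both <x^2> and <v^2> should approach 1
mean_x2, mean_v2 = langevin_baoab(lambda x: -x, 0.0, 0.0, 1.0, gamma=1.0, kT=1.0,
                                  dt=0.1, n_steps=100_000, rng=random.Random(42))
```

Changing γ alters how fast the particle explores the potential but, as stated in the text, not the sampled distribution.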

Analyzing dynamics

Before one can construct a simplified model, one needs to know what the relevant slow degrees of freedom are and how they behave. The collective behavior of two different proteins is studied in chapter 7. Since molecular dynamics simulations are limited in time and are far from converged, even for sampling around one conformation, one needs to know how reliable the sampling is. This is analyzed in chapters 6 and 7.

The method usually applied to isolate the slow, collective motions is based on a statistical analysis of atomic displacements. This method requires special attention, since it produces results resembling correlated behavior even for multidimensional random diffusion without a potential. In chapter 6 a full mathematical analysis of this problem is given.
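A statistical analysis of atomic displacements of the kind mentioned above is commonly carried out as a principal component analysis: the covariance matrix of the coordinates is diagonalized and the largest eigenvalues are inspected. The sketch below compares a multidimensional random walk (diffusion without any potential) with uncorrelated noise of the same dimension. It uses plain Python with a power iteration in place of a full eigensolver, and all names and sizes are illustrative assumptions.

For the random walk, the first component typically captures a large share of the total variance. This looks like a dominant collective motion, even though every coordinate moves independently.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def random_walk(n_frames: int, n_dim: int, rng: random.Random) -> Matrix:
    """Free diffusion: every coordinate performs an independent Gaussian random walk."""
    pos = [0.0] * n_dim
    traj = []
    for _ in range(n_frames):
        pos = [p + rng.gauss(0.0, 1.0) for p in pos]
        traj.append(pos)
    return traj


def white_noise(n_frames: int, n_dim: int, rng: random.Random) -> Matrix:
    """Uncorrelated Gaussian frames, for comparison."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_dim)] for _ in range(n_frames)]


def covariance(traj: Matrix) -> Matrix:
    n, d = len(traj), len(traj[0])
    mean = [sum(frame[k] for frame in traj) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for frame in traj:
        dev = [frame[k] - mean[k] for k in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += dev[a] * dev[b] / n
    return cov


def largest_eigenvalue(mat: Matrix, n_iter: int = 300) -> float:
    """Power iteration for a symmetric positive semi-definite matrix."""
    d = len(mat)
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(mat[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        vec = [x / norm for x in w]
    w = [sum(mat[a][b] * vec[b] for b in range(d)) for a in range(d)]
    return sum(vi * wi for vi, wi in zip(vec, w))  # Rayleigh quotient


def first_pc_fraction(traj: Matrix) -> float:
    """Share of the total variance carried by the first principal component."""
    cov = covariance(traj)
    return largest_eigenvalue(cov) / sum(cov[k][k] for k in range(len(cov)))


rng = random.Random(7)
frac_walk = first_pc_fraction(random_walk(1000, 10, rng))
frac_noise = first_pc_fraction(white_noise(1000, 10, rng))
```

For isotropic, uncorrelated noise every direction carries about 1/10 of the variance here. Any much larger share for the random walk is an artifact of diffusion, not evidence of a real collective mode.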

Another point of attention is the viscosity of the liquid in which the protein is solvated. The polar residues on the surface of the protein extend into the liquid, so the rate of conformational changes is influenced by the viscosity of the solvent. The viscosity of most of the water models used in molecular dynamics simulations is only about half that of real water. In chapter 5 several methods are discussed for calculating viscosities of model liquids.
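One standard route to the shear viscosity of a model liquid is the Green-Kubo relation, $\eta = \frac{V}{k_B T}\int_0^\infty \langle P_{xy}(0)\,P_{xy}(t)\rangle\,dt$, which integrates the autocorrelation function of an off-diagonal pressure-tensor element over time. The sketch below assumes a $P_{xy}$ time series sampled at interval dt is already available from a simulation. It illustrates one possible method, not necessarily one of those compared in chapter 5. In practice the integral is truncated at a finite lag, the three off-diagonal components are averaged, and all quantities must be in consistent units.

```python
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Time-averaged <x(t) x(t + lag)> for lag = 0 .. max_lag (max_lag < len(x))."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / (n - lag)
            for lag in range(max_lag + 1)]


def trapezoid(y: Sequence[float], dx: float) -> float:
    """Trapezoidal rule for equally spaced samples."""
    return dx * (sum(y) - 0.5 * (y[0] + y[-1]))


def green_kubo_viscosity(p_xy: Sequence[float], dt: float, volume: float,
                         kT: float, max_lag: int) -> float:
    """eta = V / kT * integral of the P_xy autocorrelation, truncated at max_lag * dt."""
    return volume / kT * trapezoid(autocorrelation(p_xy, max_lag), dt)
```

With real simulation data, max_lag should be chosen near the point where the autocorrelation has decayed into noise. Integrating further only adds statistical error.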
