conformational sampling problem: how to find all of the possible conformations for a flexible...


Upload: charleen-jenkins

Post on 12-Jan-2016


Page 1: Conformational Sampling Problem: How to find all of the possible conformations for a flexible molecule (protein, nucleic acid, polysaccharide, ligand,

Conformational Sampling

Problem: How to find all of the possible conformations for a flexible molecule (protein, nucleic acid, polysaccharide, ligand, drug)

The selected approach will depend on several things, including:

1) The size of the molecule, and particularly the number of expected conformational states

2) The ability to define the states in obvious internal coordinates, such as torsion angles


Conformational States

How to find the stable states (conformations) of a molecule?

What defines the state (or conformation) as “stable”?


Using Grid Searching to Find Conformational States

If the states are related by simple internal coordinates, such as dihedral angles, the states can be found by searching all of the dihedral angle space, i.e., by varying each dihedral angle and looking for low-energy structures. This is known as Grid Searching.

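The number of conformations = $\prod_{i=1}^{N} \frac{360}{\theta_i}$

where $N$ is the number of rotatable bonds and $\theta_i$ is the step size (angle increment, in degrees) used for bond $i$.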
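The grid search described above can be sketched in a few lines. This is only an illustration: `torsion_energy` is a hypothetical threefold torsion potential (not taken from the slides), and the total energy is assumed to be a simple sum over independent torsions.

```python
import itertools
import math

def torsion_energy(angle_deg):
    """Hypothetical threefold torsion potential in kcal/mol (an assumption).

    It has minima at 60, 180 and 300 degrees and 2.5 kcal/mol barriers.
    """
    return 1.25 * (1.0 + math.cos(math.radians(3.0 * angle_deg)))

def grid_search(n_bonds, step_deg):
    """Evaluate every combination of torsion angles on a regular grid."""
    angles = range(0, 360, step_deg)
    results = []
    for combo in itertools.product(angles, repeat=n_bonds):
        # Assumption: torsions are independent, so energies simply add.
        energy = sum(torsion_energy(a) for a in combo)
        results.append((energy, combo))
    results.sort()  # lowest-energy structures first
    return results

results = grid_search(n_bonds=2, step_deg=30)
n_conformations = len(results)          # (360/30)**2 grid points
lowest_energy, lowest_combo = results[0]
print(n_conformations, lowest_energy, lowest_combo)
```

For a real molecule the energy would come from a force field or a quantum chemistry calculation, and the torsions would not be independent.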


Grid Searching and Combinatorial Explosions

The number of conformations = $\prod_{i=1}^{N} \frac{360}{\theta_i}$

| Number of rotatable bonds | Step size (angle increment, degrees) | Number of conformations to generate | Total number of conformations |
|---|---|---|---|
| 1 | 10 | 360/10 = 36 | 36 |
| 1 | 30 | 360/30 = 12 | 12 |
| 2 | 30 | 12*12 | 144 |
| 3 | 30 | 12*12*12 | 1,728 |
| 4 | 30 | 12*12*12*12 | 20,736 |
| 5 | 30 | 12*12*12*12*12 | 248,832 |
| 6 | 30 | 12^6 | 2,985,984 |

If it takes 1 second to compute the energy of each conformation, how many days will it take to perform a Grid Search of 6 bonds?

The principal problem with Grid Search methods is that the number of structures to be evaluated increases rapidly with the number of rotatable bonds. This is the “Combinatorial Explosion” problem.
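The growth in grid size, and the cost of the 6-bond search at 1 second per conformation, can be checked with a short calculation. The function names are illustrative only. A single step size is assumed for all bonds.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def grid_size(n_bonds, step_deg):
    """Number of grid points when every bond uses the same step size."""
    return (360 // step_deg) ** n_bonds

def search_time_days(n_bonds, step_deg, seconds_per_conformation=1.0):
    """Wall-clock time for the full grid, in days."""
    return grid_size(n_bonds, step_deg) * seconds_per_conformation / SECONDS_PER_DAY

for n in range(1, 7):
    print(n, grid_size(n, 30), round(search_time_days(n, 30), 2))

days_for_six_bonds = search_time_days(6, 30)  # about 34.56 days
```

Each additional rotatable bond multiplies the cost by 12 at a 30 degree step, so a seventh bond would already take more than a year at the same rate.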


Stochastic Conformational Sampling

An alternative to Grid Searching is to generate structures by randomly changing the atomic positions, either in Cartesian space or in torsion space. Random methods are also known as Stochastic Sampling methods.

The initial structures are usually energy minimized and then filtered with an energy cut-off, i.e., only low-energy conformations are kept and all others are rejected. The choice of what counts as “low-energy” is arbitrary; often it is 10 to 20 kcal/mol above the minimum.

[Figure: plot with a horizontal axis running from 0 to 360 and a vertical axis running from 0.0 to 2.5]

The user decides how many random structures to generate.

For this reason, Stochastic Sampling can be much more efficient than grid searching, since it can avoid the Combinatorial Explosion problem.
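A minimal sketch of the stochastic procedure in torsion space (random start, energy minimization, energy cut-off) might look as follows. Everything specific here is an assumption made for illustration: the toy potential, the crude downhill minimizer, the 1 kcal/mol cut-off (chosen to suit the small toy energies), and the 30 degree rounding used to label minima.

```python
import math
import random

def torsion_energy(angle_deg):
    """Hypothetical torsion potential (kcal/mol): deepest minimum at 180 degrees,
    shallower minima near 60 and 300 degrees."""
    phi = math.radians(angle_deg)
    return 1.25 * (1.0 + math.cos(3.0 * phi)) + 0.5 * (1.0 + math.cos(phi))

def total_energy(angles):
    return sum(torsion_energy(a) for a in angles)

def minimize(angles, step=1.0):
    """Crude downhill search: shift one torsion by +/- step while the energy drops."""
    angles = list(angles)
    improved = True
    while improved:
        improved = False
        for i in range(len(angles)):
            for delta in (-step, step):
                trial = list(angles)
                trial[i] = (trial[i] + delta) % 360.0
                if total_energy(trial) < total_energy(angles) - 1e-12:
                    angles = trial
                    improved = True
    return angles

def stochastic_search(n_bonds, n_samples, cutoff, seed=0):
    """Generate random torsions, minimize, and keep minima within the cut-off."""
    rng = random.Random(seed)
    found = {}
    for _ in range(n_samples):
        start = [rng.uniform(0.0, 360.0) for _ in range(n_bonds)]
        minimized = minimize(start)
        energy = total_energy(minimized)
        # Label each minimum by its torsions rounded to the nearest 30 degrees.
        key = tuple(int(round(a / 30.0)) * 30 % 360 for a in minimized)
        if key not in found or energy < found[key]:
            found[key] = energy
    lowest = min(found.values())
    return {k: e for k, e in found.items() if e - lowest <= cutoff}

kept = stochastic_search(n_bonds=2, n_samples=200, cutoff=1.0)
for key, energy in sorted(kept.items(), key=lambda item: item[1]):
    print(key, round(energy, 3))
```

The cost is set by `n_samples`, which the user chooses, rather than by the size of the torsion grid. The trade-off is that there is no guarantee every minimum has been found.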


Stochastic Conformational Sampling

Both Stochastic and Systematic Searching work “OK” for small molecules


Levinthal’s Paradox – Why Nature can’t use Grid Searching to Fold a Protein

In 1969, Cyrus Levinthal noted that, because of the very large number of degrees of freedom in an unfolded polypeptide chain, the molecule has an astronomical number of possible conformations [1].

For example, a polypeptide of 100 residues will have 99 peptide bonds, and therefore 198 different phi and psi torsion angles. If each of these angles can be in one of three stable staggered conformations, the protein may adopt a maximum of 3^198 different conformations.

If a protein were to attain its correctly folded configuration by sequentially sampling all the possible conformations (i.e. by Grid Searching), it would require a time longer than the age of the universe to arrive at its correct native conformation.

This is true even if conformations are sampled at rapid (nanosecond or picosecond) rates. The "paradox" is that most small proteins fold spontaneously on a millisecond or even microsecond time scale.

The fact that many naturally-occurring proteins fold reliably and quickly to their native state despite the astronomical number of possible configurations has come to be known as Levinthal's Paradox.
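The size of the numbers in Levinthal's argument can be checked directly. The sketch below assumes one conformation is sampled per picosecond and takes the age of the universe as roughly 13.8 billion years. Both values are assumptions chosen for illustration.

```python
n_angles = 198                       # phi and psi angles in a 100-residue chain
n_conformations = 3 ** n_angles      # three states per angle, roughly 3 x 10^94

seconds_per_conformation = 1e-12     # assumption: one conformation per picosecond
age_of_universe_s = 13.8e9 * 365.25 * 24 * 3600  # assumption: ~13.8 billion years

total_time_s = n_conformations * seconds_per_conformation
ratio_to_age_of_universe = total_time_s / age_of_universe_s

print(f"conformations: {float(n_conformations):.2e}")
print(f"search time:   {total_time_s:.2e} s")
print(f"ratio to age of universe: {ratio_to_age_of_universe:.2e}")
```

Even at picosecond sampling rates, the exhaustive search exceeds the age of the universe by many orders of magnitude.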

[1] Levinthal, Cyrus (1969). "How to Fold Graciously". Mössbauer Spectroscopy in Biological Systems: Proceedings of a meeting held at Allerton House, Monticello, Illinois: 22–24.


Conformational States

What defines the state (or conformation) as “stable”?

At a given temperature, which states are likely to be populated?

[Figure: Energy (kcal/mol, 0.0 to 2.5) versus Torsion Angle (degrees, 0 to 360)]


Conformational States Depend on Temperature

[Figure: Energy (kcal/mol, 0.0 to 2.5) versus Torsion Angle (degrees, 0 to 360)]

Average Kinetic Energy = $\frac{3}{2} k_B T$

$k_B$ = Boltzmann’s constant = 0.001987 kcal/mol/K

At 300K how much kinetic (thermal) energy is available to a molecule?
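The question can be answered by substituting into the expression for the average kinetic energy. The short calculation below uses the value of Boltzmann's constant quoted on the slide. The function name is illustrative only.

```python
K_B = 0.001987  # Boltzmann's constant in kcal/mol/K, as quoted above

def average_kinetic_energy(temperature_k):
    """Average kinetic energy (3/2) k_B T in kcal/mol."""
    return 1.5 * K_B * temperature_k

energy_300k = average_kinetic_energy(300.0)
print(f"{energy_300k:.3f} kcal/mol")  # about 0.894 kcal/mol
```

This is well below the top of the 0.0 to 2.5 kcal/mol energy scale used in the torsion plots, which suggests that barrier crossings are infrequent at room temperature.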


Which Conformational States Are Relevant?

[Figure: Energy (kcal/mol, 0.0 to 2.5) versus Torsion Angle (degrees), shown together with the torsion angle (0 to 360 degrees) as a function of Simulation Time]

Not all possible states will be populated (observed) at room temperature

For this reason room-temperature MD is inefficient at finding conformational states


Which Conformational States Are Relevant?

[Figure: Energy (kcal/mol, 0.0 to 20.0) versus Torsion Angle (degrees), shown together with the torsion angle (0 to 360 degrees) as a function of Simulation Time]

By raising the temperature it is possible to find other states.

This approach can be employed in either MD simulations or MC sampling.
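The effect of temperature on sampling can be illustrated with a small Metropolis Monte Carlo run on a single torsion. This is a sketch under stated assumptions, not the method used to produce the slides: the potential (15 kcal/mol barriers), the step size, the run length, and the two temperatures are all hypothetical choices.

```python
import math
import random

K_B = 0.001987  # kcal/mol/K

def torsion_energy(angle_deg):
    """Hypothetical threefold potential with 15 kcal/mol barriers (an assumption)."""
    return 7.5 * (1.0 + math.cos(math.radians(3.0 * angle_deg)))

def metropolis_basins(temperature, n_steps=20000, max_step=20.0, seed=0):
    """Run Metropolis MC from 180 degrees and return the set of basins visited.

    Basins are numbered 0, 1, 2 for the wells centred on 60, 180 and 300 degrees.
    """
    rng = random.Random(seed)
    phi = 180.0
    energy = torsion_energy(phi)
    visited = set()
    for _ in range(n_steps):
        trial = (phi + rng.uniform(-max_step, max_step)) % 360.0
        trial_energy = torsion_energy(trial)
        delta = trial_energy - energy
        # Metropolis criterion: always accept downhill moves, accept uphill
        # moves with probability exp(-delta / kT).
        if delta <= 0.0 or rng.random() < math.exp(-delta / (K_B * temperature)):
            phi, energy = trial, trial_energy
        visited.add(int(phi // 120.0) % 3)
    return visited

low_T_basins = metropolis_basins(temperature=300.0)
high_T_basins = metropolis_basins(temperature=3000.0)
print("300 K: ", sorted(low_T_basins))
print("3000 K:", sorted(high_T_basins))
```

With these settings the low-temperature walk is expected to stay in its starting well, while the high-temperature walk is expected to cross the barriers and visit all three.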


Increasing Temperature Increases Sampling

Increasing the temperature will enable more states to be detected during the simulation – this is known as Simulated Annealing

But for how long should the simulation be run? To what temperature should the system be heated?

[Figure: State versus Simulation Time]


Lowering Internal Barriers Increases Sampling

An alternative to raising the energy is to lower the barriers.

But how do you know what barriers to lower?

Must be able to identify simple internal coordinates that are related to the states, such as torsion angles

[Figure: two panels of Energy (kcal/mol, 0.0 to 2.5) versus Torsion Angle (degrees)]


Conformational Sampling with Reduced Barriers


Choice of Conformational Sampling Method

Thus the problem of conformational sampling is different for a small molecule (with few rotatable bonds) than for a macromolecule, such as a protein

Small molecule – can use Grid or Stochastic Searching to generate an ensemble of structures

Macromolecule – use Simulated Annealing, or Monte Carlo (MC) Sampling, or long MD simulations

In the limit (that is, once all of the stable states have been identified and their populations weighted by their relative energies), each method should give the same answer. This is related to the “Ergodic Hypothesis”.

Ergodic Hypothesis: the time average property (from MD) is the same as the ensemble average property (from MC)
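Weighting populations by relative energy is commonly done with Boltzmann factors. The slides do not name the weighting scheme, so the Boltzmann form below is an assumption, as are the example energies.

```python
import math

K_B = 0.001987  # kcal/mol/K

def boltzmann_populations(energies, temperature_k):
    """Fractional populations of states from their energies (kcal/mol)."""
    kt = K_B * temperature_k
    e_min = min(energies)
    # Energies are shifted by the minimum for numerical stability;
    # the populations are unaffected by the shift.
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical relative energies of three stable states, in kcal/mol.
state_energies = [0.0, 0.7, 1.5]
populations = boltzmann_populations(state_energies, 300.0)
for energy, population in zip(state_energies, populations):
    print(f"{energy:4.1f} kcal/mol  ->  {population:.3f}")
```

An ensemble average of any property is then the population-weighted sum over states. Under the ergodic hypothesis, a sufficiently long MD trajectory should give the same value as a time average.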