
Bioinformatics Data Analysis & Tools

Molecular simulations & sampling techniques

17 Jan 2006


Molecular Simulations: Brief History

1936 Gelatine balls (Morell and Hildebrand)

1953 MC simulations (Metropolis et al.)

1957 MC of Lennard-Jones spheres (Wood and Parker)

1964 MD of liquid argon 10 ps (Rahman)

1970’s Non-equilibrium methods

1970’s Stochastic dynamics methods

1974 MD of liquid water (Stillinger and Rahman)

1977 MD of protein in vacuo 20 ps (McCammon et al.)

1980’s Quantum-mechanical effects

1983 MD of protein in water 20 ps (van Gunsteren et al.)

1998 MD of peptide folding 100 ns (Daura et al.)

1998 MD of protein folding 1 μs (Duan and Kollman)

Today Large proteins or complexes in water or membrane; up to microseconds

(10-100 CPU days; ~10^14 slower than nature; computer speed x10 every 6 years)

2029 Protein folding 1 ms

2034 E-coli, 10^11 atoms 1 ns

2056 Cell, 10^15 atoms 1 ns

2080 Protein folding as fast as in nature


Protein flexibility

• Even a correctly folded protein is dynamic

– The crystal structure yields the average positions of the atoms

– Overall ‘breathing’ motion is possible


B-factors

• The average motion of an atom around its average position

[Figure labels: alpha helices, beta-sheet]
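The definition above can be turned into a number. The sketch below assumes isotropic motion, for which the B-factor is commonly written as B = (8π²/3)·⟨|Δr|²⟩, with ⟨|Δr|²⟩ the mean-square displacement of the atom from its average position. The function name `b_factor` and the sample snapshots are hypothetical and only serve as illustration.

```python
import math


def b_factor(positions):
    """Isotropic B-factor of one atom from a list of (x, y, z) snapshots.

    Assumes isotropic motion: B = (8 * pi^2 / 3) * <|r - <r>|^2>.
    The result has the squared unit of the input coordinates.
    """
    n = len(positions)
    mean = [sum(p[k] for p in positions) / n for k in range(3)]
    msd = sum(
        sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in positions
    ) / n
    return 8.0 * math.pi ** 2 / 3.0 * msd


if __name__ == "__main__":
    # Hypothetical snapshots (nm) of a single atom.
    snapshots = [(0.10, 0.00, 0.00), (-0.10, 0.00, 0.00)]
    print(b_factor(snapshots))
```

An atom that moves more around its mean position gets a larger value, which is why flexible loops typically show higher B-factors than the core of a helix or sheet.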


Peptide folding from simulation

• A small (beta-)peptide forms a helical structure according to NMR

• Computer simulations of the atomic motions: molecular dynamics


Folding and un-folding in 200 ns

[Figure: RMSD [nm] (0 to 0.4) versus time t [ns] (0 to 200), with folded and unfolded regions marked]

Unfolded structures: all different? How different? 3^21 ≈ 10^10 possibilities!

Folded structures: all the same


Temperature dependence

Folding equilibrium depends on temperature

[Figure: folded and unfolded states at 298 K, 320 K, 340 K, 350 K and 360 K]


Pressure dependence

Folding equilibrium depends on pressure

[Figure: folded and unfolded states at 1 atm, 1000 atm and 2000 atm]


Surprising result

• The number of relevant non-folded structures is very much smaller than the number of possible non-folded structures

• If the number of relevant non-folded structures increases proportionally with the folding time, only 10^9 protein structures need to be simulated instead of about 10^95 structures

• Folding mechanism perhaps simpler after all…

            Number of      Folding time       Number of structures
            amino acids    (exp/sim) [s]      possible            relevant (observed)

peptide     10             10^-8              3^20 ≈ 10^9         10^3

protein     100            10^-2              3^200 ≈ 10^95       10^9
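The "possible structures" column is plain counting: three states per backbone torsion, raised to the number of torsions. The sketch below only evaluates those powers on a log scale; the function name `log10_states` is hypothetical, and the torsion counts (20, 200, and 21 for the beta-peptide) are taken from the exponents quoted on the slides. Evaluated exactly, 3^200 comes out near 10^95.

```python
import math


def log10_states(n_torsions, states_per_torsion=3):
    """Return log10 of states_per_torsion ** n_torsions."""
    return n_torsions * math.log10(states_per_torsion)


if __name__ == "__main__":
    for label, n_torsions in [("beta-peptide", 21), ("peptide", 20), ("protein", 200)]:
        print(f"{label}: 3^{n_torsions} is about 10^{log10_states(n_torsions):.1f}")
```

Working in log10 avoids printing hundred-digit integers and makes the gap between possible and observed structures easy to read off.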


Phase Space

• Defines state of classical system of N particles:

– coordinates q = (x1, y1, z1, x2, … , zN)

– momenta p = (px1, py1, pz1, px2, … , pzN)

• One conformation (+ momenta) is one point (p,q) in phase space

• Motion is a curved line in phase space

– trajectory: (p(t),q(t))


Molecular Motions: Time & Length-scales


Newton Dynamics

[Figure: Sir Isaac Newton; a system advancing from time t to t + Δt]


Classical (Newton) Mechanics

• A system has coordinates q and momenta p (= mv):

p = ( p1, p2, … , pN )

q = ( q1, q2, … , qN )

• Together, p and q span the phase space (q alone spans the configuration space).

• The total energy can be split into two components:

– kinetic energy (K): $K(p) = \tfrac{1}{2} m v^2 = \frac{p^2}{2m}$

– potential energy (V): V(q) depends on the interaction(s)

• The potential energy is described by

– bonded interactions (e.g. bond stretching, angle bending)

– non-bonded interactions (e.g. van der Waals, electrostatic)

• Non-bonded interactions determine the conformational variation that we observe, for example, in protein motions.


The Hamilton Function

• The Hamiltonian function represents the total energy:

$H(p,q) = K(p) + V(q)$

• It is the generalised expression of classical mechanics

• In two differential expressions:

$\dot{q}_k = \frac{\partial q_k}{\partial t} = \frac{\partial H}{\partial p_k}$

$\dot{p}_k = \frac{\partial p_k}{\partial t} = -\frac{\partial H}{\partial q_k}$

• These are Newton's equations of motion, but in a very elegant form

• Use 'generalised coordinates' (p and q):

– can use any coordinate system

• e.g., Cartesian coordinates or Euler angles
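As a worked check that Hamilton's equations reproduce Newton's, take one particle of mass m in Cartesian coordinates with kinetic energy p²/(2m). The force F(q) is shorthand introduced here for the negative gradient of V.

```latex
\begin{align}
H(p,q) &= \frac{p^2}{2m} + V(q) \\
\dot{q} &= \frac{\partial H}{\partial p} = \frac{p}{m} \\
\dot{p} &= -\frac{\partial H}{\partial q} = -\frac{\partial V}{\partial q} \equiv F(q) \\
\Rightarrow \quad m\,\ddot{q} &= \dot{p} = F(q)
\end{align}
```

The first equation is the definition of momentum and the second is Newton's second law, so the two first-order equations carry the same information as the single second-order one.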


Hamilton's Principle

• The variation of the time integral over ( p q̇ − H(p,q) ) is zero:

$\delta \int_{t_1}^{t_2} \left( p\,\dot{q} - H(p,q) \right) \mathrm{d}t = 0$

• Hamilton's principle is most fundamental

– Newton's equations of motion are only one set of equations that can be derived from Hamilton's principle.

• The integral is called the 'action', meaning:

– If we integrate along the trajectory of an object in the phase space given by positions q and momenta p between time points (integration limits) t1 and t2, then the value of the integral (= the 'action') of a 'real' trajectory is a minimum (more precisely an extremum) compared to all other trajectories.

• Example: Why does a thrown stone follow a parabolic trajectory?

– If you vary the trajectory and calculate the action, the parabolic trajectory will yield the smallest 'action'.
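The thrown-stone example can be tried numerically. The sketch below is a simplified illustration, not a general variational solver: it assumes a purely vertical throw under constant gravity, writes the action integrand as K − V (which equals p q̇ − H for this system), and discretises the path into straight segments. The function names and the sine-shaped perturbation are hypothetical choices.

```python
import math


def action(path, t_total, m=1.0, g=9.81):
    """Discretised action: sum over segments of (K - V) * dt for heights y_0..y_n."""
    dt = t_total / (len(path) - 1)
    s = 0.0
    for y0, y1 in zip(path, path[1:]):
        v = (y1 - y0) / dt               # segment velocity
        y_mid = 0.5 * (y0 + y1)          # segment midpoint height
        s += (0.5 * m * v * v - m * g * y_mid) * dt
    return s


def parabola(n, t_total, g=9.81):
    """Heights of a vertical throw that starts and ends at y = 0."""
    dt = t_total / n
    return [0.5 * g * (i * dt) * (t_total - i * dt) for i in range(n + 1)]


def perturbed(path, amplitude):
    """Same end points, but bent by a half sine wave of the given amplitude."""
    n = len(path) - 1
    return [y + amplitude * math.sin(math.pi * i / n) for i, y in enumerate(path)]


if __name__ == "__main__":
    true_path = parabola(100, 2.0)
    for a in (-0.3, 0.0, 0.3):
        print(a, action(perturbed(true_path, a), 2.0))
```

Bending the path up or down while keeping the end points fixed raises the action in both cases, so the parabola sits at the minimum.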


Harmonic oscillator:

• 1-dimensional motion

• 2 dimensions in phase-space:

– position (1-dimensional)

– momentum (1-dimensional)

• analytical solution for integration:

– $q(t) = b \cos\left(\sqrt{k/m}\; t\right)$

– $p(t) = -b \sqrt{mk}\, \sin\left(\sqrt{k/m}\; t\right)$

[Figure: trajectory in the (q(t), p(t)) plane]


Calculating Averages

• Integration of phase space:

– 1 particle, 2 values per coordinate (e.g. up, down):

• 1*6 degrees of freedom (dof); 2^6 = 64 points

• 2 particles: 2*6 dof; 2^12 = 4,096 points

• 3 particles: 3*6 dof; 2^18 = 262,144 points

• 4 particles: 4*6 dof; 2^24 = 16,777,216 points

• Need whole of phase space?

– only low energy states are relevant


Solving Complex systems

• No analytical solutions

• Numerical integration:

– by time (Molecular Dynamics)

– by ensemble (Monte-Carlo)

• Molecular Dynamics: Numerical integration in time

– Euler’s approximation:

• $q(t + \Delta t) = q(t) + \frac{p(t)}{m}\,\Delta t$

• $p(t + \Delta t) = p(t) + m\,a(t)\,\Delta t$

– Verlet / Leap-frog
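Euler's approximation is short enough to write out directly. The sketch below applies it to a 1-dimensional harmonic oscillator with force −k·q; the function names and the parameter values (m = k = 1, Δt = 0.01) are illustrative assumptions. For this system each Euler step multiplies the total energy by (1 + (k/m)·Δt²), so the energy grows steadily, which is one reason Verlet-type schemes are preferred in practice.

```python
def euler_step(q, p, force, m, dt):
    """One step of Euler's approximation for position q and momentum p."""
    a = force(q) / m
    q_new = q + p / m * dt
    p_new = p + m * a * dt
    return q_new, p_new


def total_energy(q, p, m, k):
    """Harmonic oscillator energy H = p^2 / (2m) + k q^2 / 2."""
    return p * p / (2.0 * m) + 0.5 * k * q * q


def run_euler(n_steps, dt, m=1.0, k=1.0, q0=1.0, p0=0.0):
    """Integrate the oscillator and return the final (q, p)."""
    q, p = q0, p0
    for _ in range(n_steps):
        q, p = euler_step(q, p, lambda x: -k * x, m, dt)
    return q, p


if __name__ == "__main__":
    q_end, p_end = run_euler(1000, 0.01)
    print("start energy:", total_energy(1.0, 0.0, 1.0, 1.0))
    print("end energy:  ", total_energy(q_end, p_end, 1.0, 1.0))
```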


Features of Newton Dynamics

• Newton’s equations:

– Energy conservative

– Time reversible

– Deterministic

• Numeric integration by Verlet algorithm: ‘Simulation’

$r(t + \Delta t) \approx 2\,r(t) - r(t - \Delta t) + \frac{F(t)}{m}\,\Delta t^2 \;\; [\, + O(\Delta t^4)\,]$

• In ‘real’ simulation: rounding errors (cumulative): not fully reversible, no full energy conservation

• Coupling to thermal bath (re-scaling): not fully deterministic

• ‘Lyapunov’ instability: trajectories diverge


Derivation: Verlet

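• Taylor expansion:

– $q(t+\Delta t) = q(t) + q'(t)\,\Delta t + \frac{1}{2!}\, q''(t)\,\Delta t^2 + \frac{1}{3!}\, q'''(t)\,\Delta t^3 + \dots$

• where: $q'(t) = v(t)$ (1st derivative, velocity)

• and: $q''(t) = a(t)$ (2nd derivative, acceleration)

– Add the forward and backward expansions:

$q(t+\Delta t) = q(t) + q'(t)\,\Delta t + \frac{1}{2!}\, q''(t)\,\Delta t^2 + \frac{1}{3!}\, q'''(t)\,\Delta t^3 + \dots$

$q(t-\Delta t) = q(t) - q'(t)\,\Delta t + \frac{1}{2!}\, q''(t)\,\Delta t^2 - \frac{1}{3!}\, q'''(t)\,\Delta t^3 + \dots$

$q(t+\Delta t) + q(t-\Delta t) = 2\,q(t) + 2 \cdot \frac{1}{2!}\, q''(t)\,\Delta t^2 + O(\Delta t^4)$

– Rearrange:

$q(t+\Delta t) = 2\,q(t) - q(t-\Delta t) + a(t)\,\Delta t^2$

• Uses derivatives up to 2nd order only, but is accurate to 3rd order: the $\Delta t^3$ terms cancel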
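The rearranged formula is already an algorithm. The sketch below applies it to a harmonic oscillator with m = k = b = 1, so that the result can be compared with the analytical solution cos(t); the function name `verlet_trajectory` and the choice to seed the previous position from the analytical solution are assumptions made for this illustration.

```python
import math


def verlet_trajectory(q0, q_prev, accel, dt, n_steps):
    """Positions q(0), q(dt), ..., q(n_steps * dt) from the Verlet recursion.

    q_prev is the position at time -dt; accel(q) returns the acceleration.
    """
    qs = [q0]
    q_old, q = q_prev, q0
    for _ in range(n_steps):
        # q(t + dt) = 2 q(t) - q(t - dt) + a(t) dt^2
        q_new = 2.0 * q - q_old + accel(q) * dt * dt
        q_old, q = q, q_new
        qs.append(q)
    return qs


if __name__ == "__main__":
    dt = 0.01
    qs = verlet_trajectory(1.0, math.cos(-dt), lambda q: -q, dt, 1000)
    worst = max(abs(q - math.cos(i * dt)) for i, q in enumerate(qs))
    print("largest deviation from cos(t):", worst)
```

Note that no velocities appear in the recursion; if they are needed, for example for the kinetic energy, they have to be estimated from the positions.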


What do we obtain?

• Trajectory:

q(t) and p(t)

• Probability of occurrence:

$P(p,q) = \frac{1}{Z}\, e^{-H(p,q)/kT}$

• Averages along trajectory:

$\langle A(p,q) \rangle_T = \frac{1}{T} \int_0^T A(p(t),q(t))\, \mathrm{d}t$ (where T denotes total time, not temperature)
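A trajectory average is just a numerical integral over time. The sketch below evaluates it with the midpoint rule for the analytical harmonic-oscillator trajectory with m = k = b = 1 (an assumption made so that the answer is known): over one full period the average kinetic energy should be half the total energy of 0.5. The function name `time_average` is hypothetical.

```python
import math


def time_average(observable, p_of_t, q_of_t, total_time, n_steps):
    """Midpoint-rule estimate of (1/T) * integral_0^T A(p(t), q(t)) dt."""
    dt = total_time / n_steps
    total = 0.0
    for i in range(n_steps):
        t = (i + 0.5) * dt
        total += observable(p_of_t(t), q_of_t(t)) * dt
    return total / total_time


def kinetic(p, q):
    """Kinetic energy for m = 1; q is unused but kept for a uniform signature."""
    return 0.5 * p * p


if __name__ == "__main__":
    avg_k = time_average(kinetic, lambda t: -math.sin(t), math.cos, 2.0 * math.pi, 1000)
    print("average kinetic energy over one period:", avg_k)
```

In a real simulation p(t) and q(t) come from the stored trajectory frames rather than from a formula, and the sum runs over those frames.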


Convergence

• Amount of phase-space covered

– “Sampling”

• Impossible to prove: you cannot know what you don’t know

• Energy “landscape” in phase-space

– there might be a “next valley”


Example: Convergence (1)


Example: Convergence (2)


Example: Convergence (3)

• Apparent convergence on all timescales: 100 ps – 10 ns!


Efficiency

• Time step limited by vibrational frequencies

– heavy-atom–hydrogen bond vibration: 10^-14 s (10 fs)

– 10-20 integration steps per vibrational period:

• 0.5 fs time step; 2,000,000 steps for 1 ns

• Removal of fast vibrations (constraining):

– hydrogen atom bond and angle motion

– heavy-atom bond motion

– out-of-plane motions (e.g. aromatic groups)

• In practice: 1-2 fs time step

– 5-7 fs maximum
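The step counts follow from simple unit arithmetic (1 ns = 1,000,000 fs). The sketch below reproduces the numbers quoted above; the function names are hypothetical.

```python
def time_step_fs(period_fs, steps_per_period):
    """Time step that resolves one vibrational period with the given number of steps."""
    return period_fs / steps_per_period


def steps_needed(sim_time_ns, dt_fs):
    """Number of integration steps to cover sim_time_ns with a time step of dt_fs."""
    return round(sim_time_ns * 1_000_000 / dt_fs)


if __name__ == "__main__":
    dt = time_step_fs(10.0, 20)            # 10 fs period, 20 steps per period
    print(dt, "fs ->", steps_needed(1.0, dt), "steps per ns")
    for dt_fs in (1.0, 2.0, 5.0):
        print(dt_fs, "fs ->", steps_needed(1.0, dt_fs), "steps per ns")
```

Going from 0.5 fs to 2 fs by constraining the fastest vibrations cuts the number of force evaluations per nanosecond by a factor of four.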


Constraining

• to remove degrees of freedom, e.g.:

– bond i-j vibrations → keep distance i-j constant

– angle i-j-k vibrations → keep distance i-k constant

• Constraint Algorithms

– SHAKE

• iterative adjustment of Lagrange multipliers

– LINCS

• Taylor expansion of matrix inversion

• non-iterative (more stable)

• no highly connected constraints

– SETTLE

• Analytical solution

• for symmetric 3-atom molecules (like water)
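To make "iterative adjustment of Lagrange multipliers" concrete, the sketch below applies a SHAKE-style correction to a single bond constraint. It is a minimal illustration under stated assumptions, not the code of any simulation package: only one constraint is handled, the correction is applied along the bond vector from before the unconstrained step, and the function name, tolerance and test coordinates are hypothetical.

```python
import math


def shake_bond(r_i_old, r_j_old, r_i_new, r_j_new, m_i, m_j, d,
               tol=1e-10, max_iter=100):
    """Move two atoms along their old bond vector until |r_i - r_j| = d."""
    ref = [a - b for a, b in zip(r_i_old, r_j_old)]   # bond vector before the step
    r_i, r_j = list(r_i_new), list(r_j_new)
    for _ in range(max_iter):
        s = [a - b for a, b in zip(r_i, r_j)]
        diff = sum(x * x for x in s) - d * d          # violation of the constraint
        if abs(diff) < tol * d * d:
            break
        # Multiplier-like factor from linearising the constraint
        g = diff / (2.0 * (1.0 / m_i + 1.0 / m_j)
                    * sum(a * b for a, b in zip(s, ref)))
        r_i = [x - g / m_i * u for x, u in zip(r_i, ref)]
        r_j = [x + g / m_j * u for x, u in zip(r_j, ref)]
    return r_i, r_j


if __name__ == "__main__":
    # Hypothetical heavy atom (mass 12) and hydrogen (mass 1), bond length 0.1 nm
    r_i, r_j = shake_bond((0.0, 0.0, 0.0), (0.1, 0.0, 0.0),
                          (0.0, 0.01, 0.0), (0.12, -0.005, 0.0),
                          12.0, 1.0, 0.1)
    print(math.dist(r_i, r_j))
```

The corrections are weighted by inverse mass, so the light hydrogen moves most and the centre of mass of the pair is left unchanged.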


Improving Performance

• Pairwise potential: $F_{ij} = -F_{ji}$

• Potential E(r) ~ 0 at large r: cut-off

– Coulomb: ~ 1/r

– Lennard-Jones: ~ 1/r^6

• Atoms move little in one step: pair-list

– Evaluating r is expensive (square root): $r = \sqrt{|r_j - r_i|^2}$

• Large distances change less: twin-range

– short-range each step; long-range less often

• Multiple time-step methods

• Many processor/compiler/language-specific optimizations:

– use of Fortran vs. C

– optimize cache performance

• arrays of positions, velocities, forces, parameters are very large

– compiler optimizations
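Three of these ideas fit in a few lines: a cut-off, a pair list, and using F_ij = −F_ji so that each pair is evaluated once. The sketch below is an assumption-laden illustration: it uses a 12-6 Lennard-Jones force in reduced units, compares squared distances so that no square root is needed, and builds the pair list with a simple double loop. The function names are hypothetical, and a real code would rebuild the list only every few steps.

```python
def build_pair_list(positions, cutoff):
    """All pairs (i, j), i < j, closer than the cut-off; compares r^2 with cutoff^2."""
    cutoff2 = cutoff * cutoff
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            if d2 < cutoff2:
                pairs.append((i, j))
    return pairs


def lj_forces(positions, pairs, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones forces over the listed pairs only."""
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for i, j in pairs:
        d = [a - b for a, b in zip(positions[i], positions[j])]
        r2 = sum(x * x for x in d)
        s6 = (sigma * sigma / r2) ** 3                   # (sigma / r)^6 without a sqrt
        scale = 24.0 * epsilon * (2.0 * s6 * s6 - s6) / r2
        for k in range(3):
            forces[i][k] += scale * d[k]                 # force on i
            forces[j][k] -= scale * d[k]                 # F_ji = -F_ij
    return forces


if __name__ == "__main__":
    pos = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (5.0, 0.0, 0.0)]
    pairs = build_pair_list(pos, cutoff=2.5)
    print(pairs)
    print(lj_forces(pos, pairs))
```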


Ignoring Degrees of Freedom

• Internal:

– bonds, angles → Constraint algorithm

• larger time steps

• External:

– “Solvent” → Langevin dynamics

• fewer (explicit) particles

– Inertia & “solvent” → Brownian dynamics

• larger time steps


Trajectory on Energy Surface


Sampling in Conformational Space

• Most of the computational time is spent on calculating (local, harmonic) vibrations.

[Figure axes: Energy, Entropy; labels: E >> kT, vibration]


Barriers

• Kitao et al. (1998) Proteins 33, 496-517.


Psychology of Theorists

[Figure: optimist scale from 100% to 0%]

“In theory, there should be no difference between theory and practice. In practice, however, there is always a difference...” (Witten and Frank)

“For every complex question there is a simple and wrong solution.” (Albert Einstein)

“All models are wrong, but some are useful.” (George Box)


Monte Carlo Sampling

• Ergodic hypothesis:

– Sampling over time (Molecular Dynamics approach); and

– Ensemble averaging (Monte Carlo approach)

• Yield the same result:

$\overline{\rho_i(r)} = \langle \rho_i(r) \rangle_{NVE}$

• Detailed balance condition (o = old state, n = new state):

$p(o)\,\pi(o \to n) = p(n)\,\pi(n \to o)$


Metropolis Selection Scheme

• Metropolis acceptance rule that satisfies detailed balance:

$acc(o \to n) = p(n)/p(o) = e^{-\Delta E/kT}$ if $p(n) < p(o)$

$acc(o \to n) = 1$ if $p(n) \geq p(o)$

Metropolis Monte Carlo

• Ergodic probability density for configurations around $r^N$:

$p(r^N) = \frac{e^{-E(r^N)/kT}}{\int e^{-E(r^N)/kT}\, \mathrm{d}r^N}$
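The acceptance rule translates almost line by line into code. The sketch below samples a 1-dimensional harmonic potential E(x) = x²/2 at kT = 1, a toy system chosen here because its Boltzmann average ⟨x²⟩ = kT/k = 1 is known. The function name, the uniform trial move and the step size are illustrative assumptions.

```python
import math
import random


def metropolis(energy, x0, kT, max_step, n_steps, seed=0):
    """Metropolis Monte Carlo in one dimension; returns samples and acceptance ratio."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    accepted = 0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-max_step, max_step)   # symmetric trial move
        e_new = energy(x_new)
        delta = e_new - e
        # Downhill moves are always accepted, uphill with probability exp(-dE/kT)
        if delta <= 0.0 or rng.random() < math.exp(-delta / kT):
            x, e = x_new, e_new
            accepted += 1
        samples.append(x)                              # a rejected move recounts the old state
    return samples, accepted / n_steps


if __name__ == "__main__":
    samples, ratio = metropolis(lambda x: 0.5 * x * x, 0.0, 1.0, 1.5, 100_000, seed=1)
    mean_x2 = sum(x * x for x in samples) / len(samples)
    print("acceptance ratio:", ratio)
    print("<x^2>:", mean_x2)
```

Counting the old state again after a rejection is essential; dropping rejected steps would bias the sampled distribution away from the Boltzmann weights.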


Search Strategies


Leaps


Computational Scheme

• Reduction of the leaps will lead to classical dynamics

• Control parameter:

– RMSD

– Angle deviation


Computational Load: Solvation

• Most computational time (>95%) spent on calculating (bulk) water-water interactions


Implicit Solvation


POPS

• Solvent accessible area

– fast and accurate area calculation

– resolution:

• POPS-A (per atom)

• POPS-R (per residue)

– parametrised on 120000 atoms and 12000 residues

– differentiable → usable in MD

• Free energy of solvation

$\Delta G^{solv}_i = \mathrm{area}_i \cdot \sigma_i$

• POPS is implemented in GROMOS96

• parameters 'sigma' from simulations in water:

– amino acids in helix, sheet and extended conformation

– peptides in helix and sheet conformation
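The solvation term is a product of an accessible area and a 'sigma' parameter. The sketch below evaluates it per atom and, as an assumption not stated on the slide, adds the contributions to a total. The function name and all numerical values are made up for illustration; they are not POPS parameters and this is not the POPS or GROMOS96 interface.

```python
def solvation_free_energy(areas, sigmas):
    """Per-atom contributions area_i * sigma_i and their sum."""
    if len(areas) != len(sigmas):
        raise ValueError("need one sigma parameter per area")
    per_atom = [a * s for a, s in zip(areas, sigmas)]
    return per_atom, sum(per_atom)


if __name__ == "__main__":
    # Hypothetical accessible areas and sigma values for two atoms
    areas = [10.0, 20.0]
    sigmas = [0.5, -0.25]
    print(solvation_free_energy(areas, sigmas))
```

Because the area is a smooth function of the coordinates, the gradient of this energy gives a force on each atom, which is what makes the model usable inside an MD integrator.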


POPS server


Test molecules: alanine dipeptide


Test molecules: BPTI / Y35G-BPTI

[Figure panels: Classical MD, Leap-dynamics, Essential dynamics]


Calmodulin domains

• Apparent unfolding temperatures (CD)

– C-domain: 315 K (42 °C)

– N-domain: 328 K (55 °C)

• LD simulations:

– 3 ns

– 4 trajectories

• 290 K

• 325 K

• 360 K


Snapshots


Trajectories


Example: Protein & Ligand Dynamics


Example: Essential Dynamics Analysis

Cyt-P450BM3: 7 x 10 ns “free” MD simulations


CD


Comparison CD / simulation


Example: Minima


Example: Conformations


Levinthal’s paradox

• Protein folding problem:

– Predict the 3D structure from the sequence

– Understand the folding process


Folding energy

• Each protein conformation has a certain energy and a certain flexibility (entropy)

• Corresponds to a point on a multidimensional free energy surface

[Figure: energy E(x) versus coordinate x; one conformation may have higher energy but lower free energy than another]

• Three coordinates per atom: 3N-6 dimensions possible

• G = H – TS


Folded state

• Native state = lowest point on the free energy landscape

• Many possible routes

• Many possible local minima (misfolded structures)


Molten globule

• First step: hydrophobic collapse

• Molten globule: globular structure, not yet correctly folded

• Local minimum on the free energy surface


Force Field

“the collection of all forces that we consider to occur in a mechanical atomic system”

• A generalised description:

$E_{total} = E_{bonded} + E_{non\text{-}bonded} + E_{crossterm}$

• Crossterms:

– non-bonded interactions influence the bonded interactions (and vice versa).

– Some force fields neglect those terms.

• Note that force fields are (mostly) designed for pairwise atom interactions.

– Higher-order interactions are implicitly included in the pairwise interaction parameters.


Force Field Components: Bonded Interactions


Force Field Components: Non-Bonded Interactions


All Together…


Reduced Units

• Generalise description of (atomic) systems

– express all quantities in basic units derived from the system's dimensions

• For example, a Lennard-Jones interaction:

$V_{LJ} = \epsilon\, f(r/\sigma)$

– $\epsilon$ is the characteristic interaction energy; $\sigma$ is the characteristic distance (for the 12-6 form, the potential minimum lies at $2^{1/6}\sigma$)

• Choose basic units:

– unit of length, $\sigma$

– unit of energy, $\epsilon$

– unit of mass, m (mass of the atoms in the system)

• all other units can be derived from these, e.g.:

– time: $\sigma\sqrt{m/\epsilon}$

– temperature: $\epsilon/k_B$

(from: Frenkel and Smit, 'Understanding Molecular Simulation', Academic Press.)

• Other choices, e.g., ‘MD’ units:

– length nm (10^-9 m), mass u, time ps (10^-12 s), charge e, temperature K

– energy kJ mol^-1, velocity nm ps^-1, pressure kJ mol^-1 nm^-3
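To see what the derived units amount to in practice, the sketch below converts a set of Lennard-Jones parameters into the reduced units of time and temperature. The argon values used (σ = 0.3405 nm, ε/k_B = 119.8 K, m = 39.948 u) are commonly quoted textbook parameters and are an assumption of this example, as is the function name `reduced_units`.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant, J/K
AMU = 1.66053906660e-27    # atomic mass unit, kg


def reduced_units(sigma_m, epsilon_j, mass_kg):
    """Basic and derived reduced units for a Lennard-Jones system (SI values)."""
    return {
        "length_m": sigma_m,
        "energy_J": epsilon_j,
        "mass_kg": mass_kg,
        "time_s": sigma_m * math.sqrt(mass_kg / epsilon_j),   # sigma * sqrt(m / epsilon)
        "temperature_K": epsilon_j / K_B,                     # epsilon / k_B
    }


if __name__ == "__main__":
    # Assumed Lennard-Jones parameters for argon
    argon = reduced_units(0.3405e-9, 119.8 * K_B, 39.948 * AMU)
    for name, value in argon.items():
        print(name, value)
```

With these parameters one reduced time unit is about 2 ps, so a reduced time step of 0.005 corresponds to roughly 10 fs for argon.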


Main points
