general notes on computational biophysics
8/13/2019 General notes on computational biophysics
Introduction to computational biophysics (CS 428) (3 credit points)
Instructor: Ron Elber ([email protected]) 5-7146
Pre-requisites: CS 100, MATH 293, 294, Physics 112, 213, CHEM 211 or equivalent. BioBM 330 recommended.
Tuesday and Thursday, lecture: 1:25-2:15, Olin Hall 245
Thursday, section: 2:30-3:20, Hollister Hall 306
Water:
1. Atomistic simulations. Fixed charge models. TIP3P and TIP4P. Energy minimization (steepest descent, conjugate gradient, Newton Raphson) and the geometry of the water dimer.
2. Water entropy and free energy. Calculation of partition functions. (Stochastic sampling, randomized algorithms, Metropolis algorithm, and Markov chains).
3. Hydrophobic effects and solvation of apolar molecules. (Enhanced sampling, multi-tempering, and multi-ensemble approaches).

Protein Folding:
1. Reduced representation of polymers and simulations of polymer collapse. (Lattice and continuous Monte Carlo simulations).
2. Simulation of kinetics and equilibrium Brownian dynamics.
3. Global optimization techniques, randomized algorithms, protein design.

Molecular dynamics:
1. Solving initial value problems. Extracting kinetic and thermodynamic properties.
2. Molecular dynamics with holonomic constraints (SHAKE).
3. Solvent and solutes, periodic boundary conditions, pressure and temperature controls.
4. Computing long-range forces (Ewald sum).
5. Correlation functions and experiments.
6. Transition state theory in the condensed phases.

Statistics:
1. Estimators: mean, standard deviation
2. Maximum likelihood
3. Confidence interval
4. chi-2 statistics
5. Regression
6. Goodness of fit
The students in the class must follow the code of academic integrity: http://www.cs.cornell.edu/degreeprogs/ugrad/CSMajor/index.htm#ai
Water
A very simple molecule that consists of 3 atoms: one oxygen and two hydrogen atoms.
Some remarkable properties:
Very high melting and vaporization points for a liquid with so light a molecular mass. For example, CCl4 melts at 250K, while water melts at 273K.
Anomalous density behavior at freezing (expansion on freezing: ice is lighter than liquid water; ice 0.92 g/mL, liquid water 1.0 g/mL).
Very strong electrical forces (special orientation forces?). High dielectric constant.
Can we explain the macroscopic behavior with a simple microscopic model? We should also explain microscopic data, since macroscopic observations are too few.
Structures. Correlation functions in space and time. Spatial correlation functions (first and second peak) as a determinant of interaction strengths.
Time correlations measure (for example) the dielectric response.
A water molecule is neutral but it carries a large dipole moment. The oxygen is highly negative and the hydrogens are positive. The dipole moment of the molecule (it is not linear) is 1.855 Debye (1 Debye = 3.33564×10^-30 C m). An electron charge and a distance of one angstrom correspond to 4.803 Debye.
The geometry of a single water molecule is determined by the following parameters (using the TIP3P model potential; in quantum mechanics the position of an atom as light as hydrogen is subject to considerable uncertainty): r(OH) = 0.9572 Å, a(HOH) = 104.52°. Since the hydrogen atoms are symmetric, the dipole moment is an experimental indication that the molecule is not linear. The individual molecules are set to be rigid (no violations of the above internal coordinates are allowed). We shall use harmonic restraints to fix the geometry (at least until we learn how to handle holonomic constraints). For the internal water potential, we write

$$U_{\text{internal}} = k\,(r_{oh_1} - 0.9572)^2 + k\,(r_{oh_2} - 0.9572)^2 + k'\,(r_{h_1h_2} - 1.5139)^2$$

where $k$ and $k'$ are constants chosen to minimize the changes in the distances compared to the ideal values; the force constant is chosen empirically to be 600 kcal/(mol Å^2). The ideal H-H distance of 1.5139 Å follows from the O-H distance and the H-O-H angle given above.
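The harmonic restraint energy can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and using 600 kcal/(mol Å^2) for both $k$ and $k'$ is an assumption, since the notes quote a single value.

```python
import math

R_OH = 0.9572   # ideal O-H distance in angstrom (from the notes)
R_HH = 1.5139   # ideal H-H distance in angstrom (from the notes)

def distance(a, b):
    """Euclidean distance between two points given as (x, y, z)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def internal_energy(o, h1, h2, k=600.0, k_prime=600.0):
    """Harmonic restraint energy of one water molecule.

    k and k_prime default to 600 kcal/(mol A^2); applying the same
    value to both is an assumption made for this sketch.
    """
    return (k * (distance(o, h1) - R_OH) ** 2
            + k * (distance(o, h2) - R_OH) ** 2
            + k_prime * (distance(h1, h2) - R_HH) ** 2)
```

The energy is zero at the ideal geometry and grows quadratically with any distortion of the three distances.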
The potential between $n$ water molecules using the TIP3P model is

$$U_{\text{between molecules}} = \sum_{i>j}\left(\frac{A}{r(oo)_{ij}^{12}} - \frac{B}{r(oo)_{ij}^{6}}\right) + k\sum_{i>j}\sum_{k,l}\frac{c_{ik}\,c_{jl}}{r_{ik,jl}}$$
The indices $i, j$ run over water molecules and the indices $k, l$ run over the atoms of a single molecule; the prefactor $k$ of the second term is a constant, not the atom index. The first sum is over the oxygen atoms only and includes the hard-core repulsion between different molecules. This type of potential is also called Lennard Jones. The hydrogen atoms (deprived of electrons) are so small that their influence on the hard-core shape of the molecule is ignored. The hard core is set to be spherical with the oxygen atom at the center. Hydrogen atoms (and oxygen atoms too) come into play in the second term, the electrostatic interactions.
The potential parameters are as follows
$$A = 629{,}400\ \text{kcal}\,\mathrm{\AA}^{12}/\text{mol}, \qquad B = 625.5\ \text{kcal}\,\mathrm{\AA}^{6}/\text{mol}$$

$$c_O = -0.834, \qquad c_H = 0.417, \qquad k = 332.1$$
To begin with we will be interested in the water dimer. The interaction between the two molecules can be written more explicitly as

$$U_{\text{between molecules}} = \frac{A}{r_{o_1o_2}^{12}} - \frac{B}{r_{o_1o_2}^{6}} + k\left[\frac{c_o c_o}{r_{o_1o_2}} + \frac{c_o c_h}{r_{o_1h_{21}}} + \frac{c_o c_h}{r_{o_1h_{22}}} + \frac{c_o c_h}{r_{o_2h_{11}}} + \frac{c_o c_h}{r_{o_2h_{12}}} + \frac{c_h c_h}{r_{h_{11}h_{21}}} + \frac{c_h c_h}{r_{h_{11}h_{22}}} + \frac{c_h c_h}{r_{h_{12}h_{21}}} + \frac{c_h c_h}{r_{h_{12}h_{22}}}\right]$$
For the first monomer we use $o_1$, $h_{11}$, and $h_{12}$ to denote the corresponding three atoms. For the second monomer we use instead $o_2$, $h_{21}$, and $h_{22}$. Of course the modeling of the interaction between molecules must come together with the internal potential keeping the geometry fixed.
The water dimer has an optimal geometry (a minimum energy configuration) that we now wish to determine. This is a question of how to arrange negatively charged atoms close to positively charged atoms and keep similar charges away from each other. Together with the fixed geometry of the individual molecules this is a frustrated system (you cannot make everybody happy; a classic example is that of three spins). It also has implications for the docking problem and drug design (determining the orientation of two interacting molecules) and for the range of complementing interactions and shape matching.
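The dimer interaction energy can be sketched as follows. The parameter values are the ones quoted in these notes; the function names and the layout of a molecule as a list [O, H1, H2] of (x, y, z) tuples are assumptions made for illustration.

```python
import math

A = 629400.0              # kcal A^12 / mol (from the notes)
B = 625.5                 # kcal A^6 / mol (from the notes)
C_O, C_H = -0.834, 0.417  # partial charges (from the notes)
K = 332.1                 # electrostatic prefactor (from the notes)

def dist(a, b):
    """Euclidean distance between two points given as (x, y, z)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def dimer_energy(mol1, mol2):
    """Interaction energy of two waters, each given as [O, H1, H2]."""
    charges = (C_O, C_H, C_H)
    r_oo = dist(mol1[0], mol2[0])
    # Lennard Jones term between the two oxygen atoms only
    u = A / r_oo ** 12 - B / r_oo ** 6
    # Electrostatic term over all nine inter-molecular atom pairs
    for c1, atom1 in zip(charges, mol1):
        for c2, atom2 in zip(charges, mol2):
            u += K * c1 * c2 / dist(atom1, atom2)
    return u
```

Because each molecule is neutral, the energy decays quickly with separation, which gives a simple sanity check for an implementation.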
What is the dimensionality of the problem? We have three atoms in each of the water molecules, making a total of eighteen degrees of freedom (2 molecules × 3 atoms × 3 Cartesian coordinates). The actual number relevant for docking is much smaller. Of the eighteen, six are translations and rotations of the whole system (not changing the relative orientation) and another six are removed by the internal constraints on the structure of the molecules (three per molecule). Hence, only six degrees of freedom remain when we attempt to match the two (rigid) water molecules together.
Working directly with these six degrees of freedom is possible but requires considerable work for implementation and makes it necessary to write the energy in considerably less convenient coordinates (the most convenient are just the Cartesian coordinates of the individual atoms). So rather than being clever at the very first step, we will use the simplest approach. In the simplest approach we attempt to optimize the energy of the water dimer as a function of all degrees of freedom. This requires more work from the computer, but less work from us. Minimizing the energy (and finding the best structure) in this large 18-dimensional space is something we cannot do by intuition, and we must design an algorithm to do it.
The algorithm of steepest descent is one way of doing it, and this is the first approach that we are going to study. Consider a guess coordinate vector for the water dimer that we denote by $R$ (capitals denote arrays or vectors, lower case denotes elements of a vector). It is a vector of length 18 and includes all the x, y, z coordinates of the six atoms. What is the best way of organizing the coordinates? In some programs the vector of coordinates is decomposed into three Cartesian vectors, $R = (X, Y, Z) = (x_1, x_2, \ldots, x_6, y_1, \ldots, y_6, z_1, \ldots, z_6)$. This setup is however less efficient from the viewpoint of memory architecture. When we compute distances between atoms (the main computation required for the energy calculations) we require the (x, y, z) coordinates of the two atoms. By aggregating all the Cartesian components together (e.g. all the x-s coming first) we may create cache misses. Finding the Cartesian components of one atom makes it necessary to walk quite far along the array: every time we need the next Cartesian component of an atom we must step over $n$ numbers, where $n$ is the number of atoms. This is not a serious problem for 6 atoms, but for hundreds of thousands or millions of atoms (as in some simulations) it might be.
A more efficient structure that we shall adopt is therefore
$$R = (x_1, y_1, z_1, \ldots, x_6, y_6, z_6)$$
where the coordinates that belong to a single particle are kept together, making it less likely to have cache misses.
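The two layouts can be compared with a small sketch. The helper names below are hypothetical; the point is only that in the interleaved array the three components of an atom sit at consecutive positions.

```python
def interleave(xs, ys, zs):
    """Convert separate X, Y, Z lists into (x1, y1, z1, x2, y2, z2, ...)."""
    r = []
    for x, y, z in zip(xs, ys, zs):
        r.extend((x, y, z))
    return r

def atom_xyz(r, i):
    """Return (x, y, z) of atom i (0-based) from the interleaved array.

    The three components are adjacent, at indices 3i, 3i+1, 3i+2.
    """
    return r[3 * i], r[3 * i + 1], r[3 * i + 2]
```

In the (X, Y, Z) layout the same three numbers would sit at indices i, n + i and 2n + i, that is, n elements apart.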
Starting from an initial configuration $R_0$ we want to find a lower energy configuration. The argument is that low energy configurations are more likely to be relevant. This is true in general but is not always sufficient. We shall come back to this point when discussing entropy, temperature, and how to simulate thermal fluctuations.
For the moment assume that we are going to make a small step, a displacement $\delta R = (R - R_0)$ of norm $|\delta R| = \sqrt{\sum_i (q_i - q_i^0)^2}$. We have used the so-called norm-2 distance measure, which is very common in computational biophysics. We have also used another variable, $q_i$, which is any of the Cartesian coordinates of the system. For a system with $N$ atoms, there are $3N$ $q_i$-s. It is useful to know that a more general formulation would be $\left(\sum_i |q_i - q_i^0|^n\right)^{1/n}$. These different distances emphasize alternative aspects of the
$$\delta R = -\gamma\,\frac{\nabla U}{|\nabla U|}$$
We did not discuss so far how to choose the step length $\gamma$, except to argue that it should be small. We expect that if the gradient is very small then we are near a minimum (actually a stationary point where $\nabla U = 0$). In that case only a small step should be used. On the other hand, if the gradient is large, a somewhat larger step could be taken since we anticipate a significant change in the function (for the better). Of course the step still cannot be too large, since our arguments are based on a linear expansion of the function. Based on the above arguments, it is suggestive to use the following (simpler) expression for the displacement, which is directly proportional to the length of the gradient vector:
$$\delta R = -\gamma\,\nabla U$$
We can imagine now a numerical minimization process that goes as follows:
0. Init: $R = R_0$
1. Compute $\nabla U(R)$
2. Compute a step $\delta R = -\gamma\,\nabla U$ and a new coordinate set $R = R + \delta R$
3. Check for convergence (small $|\nabla U|$). If converged, stop. Otherwise return to 1.
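The loop above can be sketched in Python. The function name, the fixed step size, the tolerance, and the quadratic test function are all assumptions chosen for illustration; they are not part of the notes.

```python
def steepest_descent(grad, r0, step=0.1, tol=1e-8, max_iter=10000):
    """Minimize by repeated steps along the negative gradient.

    grad: function returning the gradient (a list) at a point.
    r0:   initial coordinate list.
    """
    r = list(r0)                                 # 0. init
    for _ in range(max_iter):
        g = grad(r)                              # 1. gradient
        gnorm = sum(gi * gi for gi in g) ** 0.5
        if gnorm < tol:                          # 3. convergence check
            break
        r = [ri - step * gi for ri, gi in zip(r, g)]  # 2. step
    return r

def grad_bowl(r):
    """Gradient of the test function U = (x - 1)^2 + 2 (y + 3)^2."""
    x, y = r
    return [2.0 * (x - 1.0), 4.0 * (y + 3.0)]

minimum = steepest_descent(grad_bowl, [5.0, 5.0])
```

For this test function the iteration contracts toward the minimum at (1, -3); a step size that is too large would make the same loop diverge, which is the reason the notes insist on small steps.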
This procedure finds for us a local minimum, a minimum to which we can slide down directly from an initial guess. It will not provide a solution to the global optimization problem, finding the minimum that is the lowest of them all.
We can take the above expression one step further and write it down as a differential equation as a function of a progress variable $\tau$. This is not the most efficient way of minimizing the structure under consideration but it will help us understand the process better. We write
$$\frac{dR}{d\tau} = -\nabla U$$
where the (dummy) variable $\tau$ varies from zero to $\infty$. As $\tau \to \infty$ we approach the nearby stationary point. Note that the potential $U(R(\tau))$ is a monotonically non-increasing function of $\tau$. This is easy to show as follows. Multiplying both sides of the equation by $\left(\frac{dR}{d\tau}\right)^t$, we have
$$\left(\frac{dR}{d\tau}\right)^t \frac{dR}{d\tau} = -\left(\frac{dR}{d\tau}\right)^t \nabla U = -\sum_i \frac{dq_i}{d\tau}\,\frac{dU}{dq_i}$$
Assuming that the potential does not have an explicit dependence on $\tau$ (an explicit dependence would make $\tau$ more than a dummy variable, which contradicts our initial assumptions) we can write the final expression in a more compact and illuminating form
$$0 \le \left(\frac{dR}{d\tau}\right)^t \frac{dR}{d\tau} = -\frac{dU}{d\tau} \qquad\Rightarrow\qquad \frac{dU}{d\tau} \le 0$$
Hence as we progress along the solution of the differential equation the potential energy is decreasing in the best case and is not increasing in the worst case. There can be different variants of how to choose the norm of the step; we already mentioned two of them. Another important variant is to make the step size $\gamma$ a parameter and to optimize it for a given search direction defined by $\nabla U$. One approach would be to perform a search for a minimum along the line defined by $\nabla U$, i.e. we seek a $\gamma$ such that
$$\frac{d}{d\gamma}\,U(R + \gamma\,\nabla U) = 0$$
as a function of the single scalar variable $\gamma$ (with this sign convention the minimizing $\gamma$ is negative). This one-dimensional minimization makes sense if the calculation of the gradient can be avoided in the one-dimensional minimization, and search steps in the one-dimensional minimization can be computed more efficiently than the determination of the direction. For example, in the one-dimensional optimization only calculations of the energy $U(R + \gamma\,\nabla U)$ can be used, together with the approach of interval halving. We guess a given $\gamma_0$, and if the energy is going up we try half of the previous size, $\gamma_0/2$; if it is going down we double it. We continue to evaluate the progress of halving an interval that contains a minimum until we hit a minimum with the desired properties (halving on the left or halving on the right results in an increasing energy). This scheme assumes that the calculation of the potential energy is a lot more efficient than the calculation of the gradient. This is however not the case here, and for the task at hand the line search option of the steepest descent algorithm is not efficient.
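One possible reading of the halving and doubling search is sketched below. It uses energy evaluations only. The function name, the probing of both sides of the current point, and the stopping tolerance are assumptions made for this sketch rather than the scheme of the notes.

```python
def line_search(energy, r, direction, step0=1.0, tol=1e-8, max_iter=200):
    """Energy-only search for the minimum of U(r + s * direction) over s.

    The trial step is doubled after an improvement and halved after a
    failure; both sides of the current best point are probed.
    """
    def along(s):
        return energy([ri + s * di for ri, di in zip(r, direction)])

    best_s, best_u = 0.0, along(0.0)
    step = step0
    for _ in range(max_iter):
        moved = False
        for trial in (best_s + step, best_s - step):
            u = along(trial)
            if u < best_u:           # energy went down: accept
                best_s, best_u = trial, u
                moved = True
                break
        step = 2.0 * step if moved else 0.5 * step
        if step < tol:               # both sides go up even for tiny steps
            break
    return best_s
```

The search stops when stepping to the left or to the right both increase the energy even for a very small step, which mirrors the stopping criterion described in the text.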
We note that compared to other optimization algorithms (such as conjugate gradient, which we shall not discuss, and the Newton-Raphson algorithm, which we will) the steepest descent algorithm is considerably slower. However, it is a lot more stable than (for example) the Newton Raphson approach. It is common practice in molecular simulations to start with a crude minimizer like the steepest descent algorithm described above (if the initial structure is pretty bad), and then to refine the coordinates to perfection using something like the Newton Raphson algorithm, which is on our agenda.
A few words about computing potential derivatives, which is of prime importance for minimization algorithms in high dimensions. Effective minimizers in high dimensions without derivatives are unheard of.
Overall, the potential derivatives are dominated by calculations of distances. It is therefore useful to consider the derivative of a distance between two particles as a function of the Cartesian coordinates.
$$r_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}$$

$$\frac{dr_{ij}}{dw_i} = \frac{w_i - w_j}{\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}} = \frac{w_i - w_j}{r_{ij}}, \qquad w = x, y, z$$
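It is therefore straightforward to do all kinds of derivatives with extensive use of the chain rule. For example, the Lennard Jones term

$$\frac{d}{dw_k}\sum_{i>j}\left(\frac{A}{r_{ij}^{12}} - \frac{B}{r_{ij}^{6}}\right) = \sum_{j \ne k}\left(-\frac{12A}{r_{kj}^{13}} + \frac{6B}{r_{kj}^{7}}\right)\frac{dr_{kj}}{dw_k} = \sum_{j \ne k}\left(-\frac{12A}{r_{kj}^{13}} + \frac{6B}{r_{kj}^{7}}\right)\frac{w_k - w_j}{r_{kj}}$$

$$= \sum_{j \ne k}\left(-\frac{12A}{r_{kj}^{14}} + \frac{6B}{r_{kj}^{8}}\right)(w_k - w_j)$$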
Note that the expression above depends only on even powers of the distance (14 and 8), which is good news, meaning that no square roots are required. Square roots are the curse of simulations and are much more expensive to compute compared to additions and multiplications. Unfortunately, electrostatic interactions are more expensive to compute: the potential and its derivatives require square root calculations. Here is the expression for a set of independent atoms (for convenience we forget about molecules here)

$$\frac{d}{dw_k}\sum_{i>j}\frac{c_i c_j}{r_{ij}} = -\sum_{i \ne k}\frac{c_i c_k}{r_{ik}^{2}}\,\frac{w_k - w_i}{r_{ik}} = -\sum_{i \ne k}\frac{c_i c_k\,(w_k - w_i)}{r_{ik}^{3}}$$
Here is a slightly tricky question. To compute the energy (which is a single scalar) for $N$ atoms we need to calculate about $N^2/2$ terms (assuming all unique distances contribute to the energy), and then add them up. How many terms do we need to compute for the gradient of the potential?
Calculations of potential gradients are a major source of errors in programming molecular modeling code. It is EXTREMELY useful to check the analytical derivatives against numerical derivatives computed by finite differences, where the expected agreement should be at least a few digits. For example

$$\frac{dU}{dq_k} \cong \frac{U(q_1, \ldots, q_k + \Delta/2, \ldots, q_N) - U(q_1, \ldots, q_k - \Delta/2, \ldots, q_N)}{\Delta}, \qquad \Delta \to 0$$

For the water system the step $\Delta$ can be $10^{-6}$ (using double precision) and the expected agreement is at least 3-4 digits. A larger discrepancy suggests an error.
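The check can be sketched for the Lennard Jones term alone. The parameters are the ones quoted in these notes; the function names and the interleaved coordinate layout are assumptions made for illustration.

```python
A, B = 629400.0, 625.5   # Lennard Jones parameters from the notes

def lj_energy(q):
    """Lennard Jones energy of atoms stored as (x1, y1, z1, x2, ...)."""
    n = len(q) // 3
    u = 0.0
    for i in range(n):
        for j in range(i):
            r2 = sum((q[3 * i + c] - q[3 * j + c]) ** 2 for c in range(3))
            r6 = r2 ** 3
            u += A / (r6 * r6) - B / r6
    return u

def lj_gradient(q):
    """Analytical gradient; only even powers of r (14 and 8) appear."""
    n = len(q) // 3
    g = [0.0] * len(q)
    for i in range(n):
        for j in range(i):
            d = [q[3 * i + c] - q[3 * j + c] for c in range(3)]
            r2 = sum(x * x for x in d)
            f = -12.0 * A / r2 ** 7 + 6.0 * B / r2 ** 4
            for c in range(3):
                g[3 * i + c] += f * d[c]
                g[3 * j + c] -= f * d[c]
    return g

def numerical_gradient(energy, q, delta=1e-6):
    """Central finite difference with total step delta."""
    g = []
    for k in range(len(q)):
        qp, qm = list(q), list(q)
        qp[k] += delta / 2.0
        qm[k] -= delta / 2.0
        g.append((energy(qp) - energy(qm)) / delta)
    return g
```

Comparing `lj_gradient(q)` with `numerical_gradient(lj_energy, q)` component by component for a few arbitrary configurations is usually enough to catch a sign or index mistake.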
This concludes (in principle) our discussion of the steepest descent minimization algorithm. From the above discussion it is quite clear that minimization in the neighborhood of a stationary point of the potential (like a minimum) is difficult.
Near the minimum the gradient of the potential is close to zero and subject to numerical error, which supports the use of only extremely small steps that are harder to converge numerically. In that sense the algorithm we discuss next is complementary to the steepest descent approach. It works well in the neighborhood of a minimum. It does not work so well if the starting structure is very far from a minimum, since the large step taken by this algorithm (Newton Raphson) relies on the correctness of the quadratic expansion of the potential energy surface near a minimum.
We start by considering a linear expansion of the potential derivatives at the current position $R$ in the neighborhood of the desired minimum $R_m$. At the minimum, the gradient is (of course) zero. We have

$$0 = [\nabla U(R_m)]_j \cong [\nabla U(R)]_j + \sum_i \frac{d^2U}{dq_j\,dq_i}\,(q_{im} - q_i)$$
The entity that we wish to determine from the above equation is $R_m$, the position of the minimum. The square bracket with a subscript, $[\ldots]_j$, denotes the $j$-th vector element. The last term is a multiplication of a matrix by a vector. From now onwards we shall denote the second derivative matrix by $U''$. We are attempting to find $R_m$ by expanding the force linearly in the neighborhood of the current point. This is one step up in expansion compared to the steepest descent minimization; however, it is not sufficient in general. In the simplest version of the Newton Raphson (NR) approach very large steps are allowed. Large steps can clearly lead to problems if the linear expansion is not valid. Nevertheless the expansion is expected to be valid if we are close to a minimum, since any function in the neighborhood of a minimum can be expanded (accurately) up to a second order term

$$U(R) \cong U(R_m) + \frac{1}{2}\,(R - R_m)^t\,U''\,(R - R_m)$$

Note that we did not write down the first order derivatives (gradient) since they are zero at the minimum. This is one clear advantage of NR with respect to the steepest descent minimizer (SDM). SDM relies on the first derivatives only, derivatives that vanish in the neighborhood of a minimum. It is therefore difficult for SDM to make progress in these circumstances, while NR can do it in one single shot, as we see below.
We write again the equation for the gradient in a matrix form
$$0 = \nabla U(R_m) \cong \nabla U(R) + U''(R)\,(R_m - R)$$

which we can formally solve (for $R_m$) as

$$R_m = R - \left(U''(R)\right)^{-1}\,\nabla U(R)$$
The matrix $(U'')^{-1}$ is the inverse of the matrix $U''$, namely $(U'')^{-1}\,U'' = I$, where $I$ is the identity matrix. In principle there is nothing in the above equation that determines the norm of the step $R_m - R$ that we should take. If the system is close to quadratic, the second derivative matrix is roughly a constant and a reasonably large step toward the minimum can be taken without significantly violating the validity of the above equation. The ability to take a large step in a quadratic-like system is a clear advantage of NR compared to SDM. However, for systems that are not quadratic the matrix is not a constant and only small steps (artificially enforced) should be used. In our water system NR should be used with care and only sufficiently close to a minimum. There are a few technical points that should specifically concern us in the water dimer optimization problem. We will be concerned with the following:
1. Does the matrix $U''$ have an inverse, and what can we do if it does not?
2. How do we find the inverse (or solve the above linear equations)?
The bad news is that the matrix $U''$ for molecular systems (such as the water dimer) does not in general have an inverse. The problem is the six degrees of freedom that we mentioned earlier and that do not affect the potential energy: three overall translations and three overall rotations have zero eigenvalues, making the matrix singular. This is easy to see as follows.
Let the eigenvectors of $U''$ be $e_i$ and the eigenvalues $\lambda_i$. It is possible to write the matrix $U''$ as the following sum

$$U'' = \sum_i \lambda_i\, e_i e_i^t$$

where the outer product
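$$e_i e_i^t = \begin{pmatrix} e_{i1} \\ e_{i2} \\ \vdots \\ e_{iN} \end{pmatrix}\begin{pmatrix} e_{i1} & e_{i2} & \cdots & e_{iN} \end{pmatrix} = \begin{pmatrix} e_{i1}e_{i1} & e_{i1}e_{i2} & \cdots & e_{i1}e_{iN} \\ e_{i2}e_{i1} & e_{i2}e_{i2} & \cdots & e_{i2}e_{iN} \\ \vdots & \vdots & \ddots & \vdots \\ e_{iN}e_{i1} & e_{iN}e_{i2} & \cdots & e_{iN}e_{iN} \end{pmatrix}$$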
generates a symmetric $N \times N$ matrix. Since the vectors $e_i$ are orthonormal ($e_i^t e_j = \delta_{ij}$), it is trivial to write down the inverse in this case

$$(U'')^{-1} = \sum_i \frac{e_i e_i^t}{\lambda_i}$$

However, this expression is true only if all $\lambda_i$ are different from zero. The eigenvectors represent directions of motion and the eigenvalues are associated with the cost in energy for moving in the direction determined by the corresponding eigenvector. However, moving along the direction of global translation (or global rotation) does not change the energy; therefore the corresponding eigenvalues must be equal to zero, and the inverse is impossible to get.
One way of getting around this problem is by shifting the eigenvalues of the offending eigenvectors. If we know in advance the six offending eigenvectors we can raise their eigenvalues to very high values (instead of zero) by adding to the matrix outer products of these vectors multiplied by a very high value (see below). The contribution of eigenvectors with a very high eigenvalue will diminish when we compute the inverse, since the inverse is obtained by dividing by the corresponding eigenvalues. We do not have to find all the eigenvectors and eigenvalues as written above; it is sufficient to affect the few specific eigenvectors and define a new matrix $\tilde{U}''$ to obtain a well behaved $(\tilde{U}'')^{-1}$. The question that remains (of course) is how to find these six eigenvectors. The most straightforward (and inefficient) way to do it is to actually compute the eigenvectors and eigenvalues by a matrix diagonalization procedure. Luckily we do not have to do that, since the translation and rotation eigenvectors are known from the Eckart conditions. We have for translation
$$\sum_i m_i\,(r_i - r_i^0) = 0, \qquad r_i = (x_i, y_i, z_i)$$

and for rotation

$$\sum_i m_i\,(r_i - r_i^0) \times r_i^0 = 0$$

[Note: mention the possibility of using Lagrange multipliers here.]

The last multiplication is a vector product, and the difference $r_i - r_i^0$ is assumed to be small. The coordinate vectors $r_i^0$ are reference vectors used to define the coordinate system and are constants.
For the record, we write the vector product explicitly

$$r_i \times r_i^0 = \begin{vmatrix} e_x & e_y & e_z \\ x_i & y_i & z_i \\ x_i^0 & y_i^0 & z_i^0 \end{vmatrix} = e_x\,(y_i z_i^0 - z_i y_i^0) - e_y\,(x_i z_i^0 - z_i x_i^0) + e_z\,(x_i y_i^0 - y_i x_i^0)$$
The two equations define six constraints, since they are vector equations and each of the vectors has three components x, y, z. To obtain the eigenvectors associated with these constraints we need to compute the gradients of the above constraints. We have three vectors for translations that we denote by $e_{tx}, e_{ty}, e_{tz}$, and three vectors for the rotations, $e_{rx}, e_{ry}, e_{rz}$.
Here are the translation vectors
$$e_{tx} = (m_1, 0, 0, m_2, 0, 0, \ldots, m_N, 0, 0)$$
$$e_{ty} = (0, m_1, 0, 0, m_2, 0, \ldots, 0, m_N, 0)$$
$$e_{tz} = (0, 0, m_1, 0, 0, m_2, \ldots, 0, 0, m_N)$$
And here are the rotations.
$$e_{rx} = (0,\ m_1 z_1^0,\ -m_1 y_1^0,\ 0,\ m_2 z_2^0,\ -m_2 y_2^0,\ \ldots,\ 0,\ m_N z_N^0,\ -m_N y_N^0)$$
$$e_{ry} = (-m_1 z_1^0,\ 0,\ m_1 x_1^0,\ -m_2 z_2^0,\ 0,\ m_2 x_2^0,\ \ldots,\ -m_N z_N^0,\ 0,\ m_N x_N^0)$$
$$e_{rz} = (m_1 y_1^0,\ -m_1 x_1^0,\ 0,\ m_2 y_2^0,\ -m_2 x_2^0,\ 0,\ \ldots,\ m_N y_N^0,\ -m_N x_N^0,\ 0)$$
close to a minimum such that all the non-zero eigenvalues are positive. In the case that the quadratic expansion is accurate (and in sharp contrast to SDM) the minimization will converge in one step.
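A toy system shows both the singular second derivative matrix and the one-step convergence. The sketch below assumes two particles on a line connected by a harmonic spring, $U = \frac{1}{2}k\,(x_2 - x_1 - d_0)^2$; this model, the function name, and the size of the shift are illustrative assumptions, not the water dimer itself.

```python
def newton_step_two_particles(x1, x2, k=1.0, d0=1.0, shift=1.0e3):
    """One Newton Raphson step for U = 0.5 * k * (x2 - x1 - d0)^2.

    The second derivative matrix [[k, -k], [-k, k]] is singular because
    a uniform translation costs no energy. The zero eigenvalue is raised
    by adding shift * e_t e_t^t with e_t = (1, 1) / sqrt(2).
    """
    s = x2 - x1 - d0
    g = [-k * s, k * s]                        # gradient
    h = [[k + shift / 2.0, -k + shift / 2.0],  # shifted second derivatives
         [-k + shift / 2.0, k + shift / 2.0]]
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    # Solve h * delta = -g with Cramer's rule
    dx1 = (-g[0] * h[1][1] + g[1] * h[0][1]) / det
    dx2 = (-g[1] * h[0][0] + g[0] * h[1][0]) / det
    return x1 + dx1, x2 + dx2

new_x1, new_x2 = newton_step_two_particles(0.0, 3.0, k=2.0, d0=1.0)
```

Because the potential is exactly quadratic, a single step brings the separation to its ideal value, and the shifted translation mode leaves the center of the pair where it was.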
What will happen to our solution if the eigenvalues are negative?

So far we made an important step forward, establishing that the inverse of the adjusted second derivative matrix is likely to exist. There remains the problem of how to determine the inverse efficiently.

In fact we do not need to compute an inverse explicitly, since all we need is to solve a linear equation of the type $Ax = b$, where $A$ and $b$ are a known matrix and vector, and $x$ is the unknown vector that we seek. We start with a subset of problems that is easy to understand and to solve (triangular problems) and then work our way up to the full Gaussian elimination.
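Triangular problems

Example
$$\begin{pmatrix} a_{11} & 0 & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & a_{33} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}$$
which is rather easy to solve. We can immediately write $x_1 = b_1 / a_{11}$. Using the (now) known value of $x_1$ we can write for $x_2$, $x_2 = (b_2 - a_{21}x_1)/a_{22}$. Similarly we can write for $x_3$, $x_3 = (b_3 - a_{31}x_1 - a_{32}x_2)/a_{33}$.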
For the general case we can write an implicit solution (in terms of the earlier $x_j$, $j = 1, \ldots, i-1$)

$$x_i = \left(b_i - \sum_{j=1}^{i-1} a_{ij}\,x_j\right)\Big/\,a_{ii}$$
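The general formula translates directly into a short loop. The function name and the list-of-rows matrix layout are assumptions for this sketch, which also assumes non-zero diagonal elements.

```python
def forward_substitution(a, b):
    """Solve a x = b for a lower triangular matrix a (list of rows).

    Assumes every diagonal element a[i][i] is non-zero.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # Sum over the already known x_j with j < i
        known = sum(a[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - known) / a[i][i]
    return x

solution = forward_substitution([[2.0, 0.0, 0.0],
                                 [1.0, 3.0, 0.0],
                                 [4.0, 5.0, 6.0]],
                                [2.0, 7.0, 32.0])
```

The upper triangular case works the same way with the loop running from the last row to the first.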
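Note that a similar procedure applies to the upper triangular matrix
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}$$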
To solve a general linear problem we search for a way of transforming the matrix to a triangular form (which we already know how to solve). Formally, we seek the so-called LU decomposition in which the general matrix $A$ is decomposed into a lower triangular matrix $L$ and an upper triangular matrix $U$ ($A = LU$). Note that if such a decomposition is known, we can solve the linear problem in two steps
Step 1. Find $y$ using the lower triangular matrix $L$:
$$Ax = b \;\Rightarrow\; LUx = b \;\Rightarrow\; L(Ux) = b \;\Rightarrow\; Ly = b$$

Step 2. Find $x$ using the upper triangular matrix $U$:
$$Ux = y$$
A way of implementing the above idea in practice is Gaussian elimination. Gaussian elimination is a procedure that leads to the LU decomposition discussed above, even if the analogy is not obvious. In this course we will not prove the equivalence.
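Gaussian elimination
Consider the following system of linear equations
$$\begin{pmatrix} x & x & x & x & x \\ x & x & x & x & x \\ x & x & x & x & x \\ x & x & x & x & x \\ x & x & x & x & x \end{pmatrix}\begin{pmatrix} x \\ x \\ x \\ x \\ x \end{pmatrix} = \begin{pmatrix} x \\ x \\ x \\ x \\ x \end{pmatrix}$$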
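where x denotes any number different from zero.
We can eliminate the unknown $x_1$ from rows 2 to $n$ (the general matrix is of size $n \times n$). We multiply the first row by $a_{i1}/a_{11}$ and subtract the result from row $i$. By repeating the process $n-1$ times we obtain the following (adjusted) set of linear equations that has the same solution
$$\begin{pmatrix} x & x & x & x & x \\ 0 & x & x & x & x \\ 0 & x & x & x & x \\ 0 & x & x & x & x \\ 0 & x & x & x & x \end{pmatrix}\begin{pmatrix} x \\ x \\ x \\ x \\ x \end{pmatrix} = \begin{pmatrix} x \\ x \\ x \\ x \\ x \end{pmatrix}$$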
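We can work on the newly obtained matrix in a similar way to eliminate $x_2$ from rows 3 to $n$. This we do by multiplying the second row by $a_{i2}/a_{22}$ and subtracting the results from rows 3 to $n$. The (yet another) new matrix and linear equations will be of the form
$$\begin{pmatrix} x & x & x & x & x \\ 0 & x & x & x & x \\ 0 & 0 & x & x & x \\ 0 & 0 & x & x & x \\ 0 & 0 & x & x & x \end{pmatrix}\begin{pmatrix} x \\ x \\ x \\ x \\ x \end{pmatrix} = \begin{pmatrix} x \\ x \\ x \\ x \\ x \end{pmatrix}$$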
It should be obvious how to proceed with the elimination and create an upper triangular matrix, which we know by now how to solve.
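The whole procedure, elimination followed by back substitution, can be sketched as below. The function name is hypothetical, and the sketch assumes that no zero pivot is encountered (no row exchanges are performed), in line with the assumption in the text that the entries are different from zero.

```python
def gaussian_elimination(a, b):
    """Solve a x = b by elimination and back substitution.

    a is a list of rows, b a list. Copies are made so the inputs are
    left unchanged. No pivoting: every pivot a[p][p] must be non-zero.
    """
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for p in range(n - 1):               # eliminate unknown p
        for i in range(p + 1, n):        # from all rows below row p
            factor = a[i][p] / a[p][p]
            for j in range(p, n):
                a[i][j] -= factor * a[p][j]
            b[i] -= factor * b[p]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):       # upper triangular solve
        known = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - known) / a[i][i]
    return x

x = gaussian_elimination([[2.0, 1.0, 1.0],
                          [4.0, 3.0, 3.0],
                          [8.0, 7.0, 9.0]],
                         [7.0, 19.0, 49.0])
```

Production codes add row exchanges (pivoting) to avoid dividing by small or zero pivots; that refinement is left out here to keep the correspondence with the text.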
Random numbers
Clearly a number that is produced on the computer in a deterministic way cannot be truly random. So a valid question is: what do we mean by a random number, and how can we test it?

We consider pseudo-random numbers that share some properties with random numbers but obviously are reproducible on computers and therefore are not truly random.

A useful definition of true random numbers is lack of correlations. If we consider the product of two random numbers $r_1$ and $r_2$, $r_1 r_2$, and we average over possible values of $r_1$ and $r_2$, we should have $\langle r_1 r_2 \rangle = \langle r_1 \rangle \langle r_2 \rangle$.
So a test for a random number generator would be

$$\langle r_1 \cdots r_N \rangle \overset{?}{=} \prod_{i=1}^{N} \langle r_i \rangle$$
Essentially all the existing random number generators fail eventually on this kind of test. The common generators are cyclic in nature. There is a large integer $L$ such that $r_{i+L} = r_i$; hence the number of random numbers that can be generated is finite.
Widely used random number generators are based on the following simple (and fast) operations:

$$I_{k+1} = \alpha I_k + \beta \pmod{m}$$

The integers $I_k$ are between zero and $m-1$. Dividing by $m$ provides a floating point number between 0 and 1. If all is well, the resulting sequence is uniformly distributed on the interval [0,1].

Example: $\alpha = 2{,}147{,}437{,}301$, $\beta = 453{,}816{,}981$, $m = 2^{32}$
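The generator with the example constants can be sketched as follows. The names `ALPHA`, `BETA`, `M` and `lcg` are hypothetical labels chosen for this sketch.

```python
ALPHA = 2147437301   # multiplier from the example
BETA = 453816981     # increment from the example
M = 2 ** 32          # modulus from the example

def lcg(seed, n):
    """Return n pseudo-random floats in [0, 1) starting from an integer seed."""
    values = []
    state = seed
    for _ in range(n):
        state = (ALPHA * state + BETA) % M   # integer between 0 and M - 1
        values.append(state / M)             # scale to a float below 1
    return values

sample = lcg(seed=12345, n=1000)
```

The same seed always reproduces the same sequence, which is exactly the sense in which these numbers are pseudo-random.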
Using random numbers suggests a procedure to estimate
To improve the quality and the randomness of numbers generated by the above procedure it is useful to have a long vector of random numbers and to shuffle them (randomly).
In MATLAB, rand(n,m) provides an n x m matrix of random numbers.
The above procedure provides random numbers generated from a uniform distribution.Can we generate random numbers from other probability distribution (e.g. normal)?
A general procedure for doing it is based on the probability function. Let $p(x)\,dx$ be the probability of finding a value between $x$ and $x + dx$. Suppose that we want to generate a series of points $x$ and then compute a function of these points, $y(x)$. What will be the distribution of the $y$-s? It will be connected to the probability function of the $x$-s.

$$p(x)\,dx = p(y)\,dy$$

$$p(y) = p(x)\left|\frac{dx}{dy}\right|$$
Example: suppose $y(x) = -\log_e(x)$, with $x$ uniform on [0,1]

$$p(y)\,dy = \left|\frac{dx}{dy}\right| dy = e^{-y}\,dy$$
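The exponential example can be checked numerically with a short sketch. It uses Python's standard `random` module as the uniform source; the function name and the seed are arbitrary choices made for illustration.

```python
import math
import random

def sample_exponential(n, seed=0):
    """Draw n samples with density exp(-y) by transforming uniform numbers."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined
    return [-math.log(1.0 - rng.random()) for _ in range(n)]

ys = sample_exponential(100000)
mean_y = sum(ys) / len(ys)
```

For the density $e^{-y}$ the mean is 1, so the sample mean should be close to 1 for a large sample.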
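Another example: Gaussian

We want
$$p(y)\,dy = \frac{1}{\sqrt{2\pi}}\exp\left(-y^2/2\right)dy$$

Select
$$y_1 = \sqrt{-2\log x_1}\,\cos(2\pi x_2)$$
$$y_2 = \sqrt{-2\log x_1}\,\sin(2\pi x_2)$$

so that
$$x_1 = \exp\left(-\frac{y_1^2 + y_2^2}{2}\right), \qquad x_2 = \frac{1}{2\pi}\arctan\left(\frac{y_2}{y_1}\right)$$

and the Jacobian of the transformation is
$$\left|\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)}\right| = \left[\frac{1}{\sqrt{2\pi}}\exp\left(-y_1^2/2\right)\right]\left[\frac{1}{\sqrt{2\pi}}\exp\left(-y_2^2/2\right)\right]$$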
Note that there is one-to-one correspondence between x and y
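The Gaussian transformation can be sketched in the same way. The function name and the seed are arbitrary; the uniform numbers come from Python's standard `random` module, which is an assumption of this sketch rather than the generator discussed in the notes.

```python
import math
import random

def gaussian_pairs(n_pairs, seed=0):
    """Generate 2 * n_pairs normal deviates from pairs of uniform numbers."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n_pairs):
        x1 = 1.0 - rng.random()      # in (0, 1], keeps log(x1) finite
        x2 = rng.random()
        radius = math.sqrt(-2.0 * math.log(x1))
        ys.append(radius * math.cos(2.0 * math.pi * x2))
        ys.append(radius * math.sin(2.0 * math.pi * x2))
    return ys

ys = gaussian_pairs(50000)
mean = sum(ys) / len(ys)
variance = sum((y - mean) ** 2 for y in ys) / len(ys)
```

The sample mean and variance should be close to 0 and 1, the values expected for the target density.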
CS 428: Homework I ([*] = 15 points bonus)
Due Thursday Sept 14 at 2:30 PM.

1. Write a Matlab function that computes the interaction energy of N water molecules and the forces that they exert on each other. It is useful to separate the calculation into different functions: one function that computes the internal energy of the water molecules, and one function that computes the Lennard Jones and electrostatic interactions. The input should be the coordinate vector and the number of water molecules, and the output the potential value and the gradient of the potential. Check the analytical gradient with a finite difference formula.
2. Construct five trial configurations for a water dimer (your choice of a minimum energy configuration) and evaluate their energies. Did you hit a minimum?
3. Write a steepest descent algorithm to minimize the energy of the water dimer and minimize it. Report a plot of the relative conformation of the molecules. Prepare a Matlab movie that follows the minimization path.
4. (*) Write a code that computes the second derivative of the energy of the water dimer.
5. (*) Refine the optimal structure obtained from the steepest descent minimization using Newton Raphson minimization. Report the changes in structure/energy.

In your submission include code, input, and output and a brief explanation. Both electronic and hard copies are required (send your electronic copy to [email protected]).
HW4

Write a code that computes the structure and the energy of a two dimensional H/P polymer on a two dimensional lattice.

Consider a chain of length 14 (a 14-mer). Enumerate all possible configurations of the chain on the lattice. How many independent (not related by mirror-image symmetry) conformations did you find?

Consider the following three sequences: 14H, PPHHHHHHHHPPPP (8H, 6P), and PPPPHHHHHHPPPP (6H, 8P). Find the global energy minimum for each of these sequences. Which of the sequences is more stable?
Statistics
Sampled from Morris H. DeGroot & Mark J. Schervish, Probability and Statistics, 3rd Edition, Addison Wesley
Probability: Continuous distribution and variables
- Continuous distributions
- Random variables
- Probability density function
- Uniform, normal and exponential distributions
- Expectations and variance
- Law of large numbers
- Central limit theorem
- Probability density functions of more than one variable
- Rejection and transformation methods for sampling distributions (section)
Statistics
- Estimators: mean, standard deviation
- Maximum likelihood
- Confidence intervals
- $\chi^2$ statistics
- Regression
- Goodness of fit
Continuous random variables
A random variable X is a real-valued function defined on a sample space S. X is a continuous random variable if a non-negative function f, defined on the real line, exists such that its integral over a domain A is the probability that X takes a value in A. (A is, for example, the interval [a,b].)
$$\Pr(a < X < b) = \int_a^b f(x)\,dx$$
Probability density function
f is called the probability density function (p.d.f.). Note that the units of the p.d.f. below are 1/length; only after multiplication by a length element do we get a probability.
For every p.d.f. we have
$$f(x) \ge 0, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1.$$
Examples of p.d.f.s
- A car is driving in a circle at a constant speed. What is the probability that it will be found in the interval between 1 and 2 radians?
- A computer is generating, with equal probability density, random numbers between 0 and 1. What is the probability of obtaining 0.75?
- A protein folds at a constant rate (the probability that a protein will fold in the time interval [t,t+dt] is a constant times dt). If we have at time zero $N_0$ protein molecules, what is the probability that all protein molecules will fold after time t?
Uniform distribution on an interval
Consider an experiment in which a point X is selected from an interval $S = \{x : a \le x \le b\}$ in such a way that the probability of finding X in a given sub-interval is proportional to the sub-interval length (hence the p.d.f. is a constant). This distribution is called the uniform distribution. We must have for this distribution
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_a^b f(x)\,dx = 1$$
Uniform distribution (continued)
$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
[Figure: f(x) versus x, constant at 1/(b-a) between a and b]
Exponential distribution
$$f(x) = a \exp(-a x), \qquad 0 < x < \infty$$
[Figure: f(x) versus x]
Normal distribution
$$f(x) = \left(\frac{a}{\pi}\right)^{1/2} \exp\left(-a (x - x_0)^2\right), \qquad -\infty < x < \infty$$
[Figure: f(x) versus x, centered at $x_0$, with the width marked in terms of a]
Continuous distribution functions
The distribution function is defined as $F(x) = \Pr(X \le x)$ for $-\infty < x < \infty$.
F(x) is a monotonic non-decreasing function of x (can you show it?) that can be written in terms of its corresponding p.d.f.
$$F(x) = \Pr(X \le x) = \int_{-\infty}^{x} f(x')\,dx' \qquad \text{or} \qquad \frac{dF}{dx} = f(x)$$
Distribution function: Example
$$f(x) = \begin{cases} a\exp(-ax) & \text{for } 0 < x < \infty \\ 0 & \text{otherwise}\end{cases}$$
$$F(x) = \int_0^x f(x')\,dx' = \int_0^x a \exp(-a x')\,dx' = \left[-\exp(-a x')\right]_0^x = 1 - \exp(-a x)$$
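The distribution function above can be inverted in closed form, which is the basis of the transformation method of sampling mentioned in the outline. The following is a minimal Python sketch, not part of the original notes: the function name `sample_exponential`, the seed and the value a = 2 are illustrative assumptions.

```python
import math
import random

def sample_exponential(a, rng):
    """Draw one sample from f(x) = a*exp(-a*x) by inverting F(x) = 1 - exp(-a*x)."""
    u = rng.random()                  # uniform random number on [0, 1)
    return -math.log(1.0 - u) / a     # solve u = F(x) for x

rng = random.Random(0)                # fixed seed (assumption) for reproducibility
a = 2.0                               # illustrative rate parameter
samples = [sample_exponential(a, rng) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)

# The exact expectation of this distribution is 1/a.
print(sample_mean, 1.0 / a)
```

The sample mean should approach 1/a as the number of samples grows, which gives a quick sanity check of the sampler.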
Expectation
For a random variable X with a p.d.f. f(x) the expectation E(X) is defined as
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$
The expectation exists if and only if the integral is absolutely convergent:
$$\int_{-\infty}^{\infty} |x| f(x)\,dx < \infty$$
Expectation (example)
$$f(x) = \begin{cases} 2x & \text{for } 0 < x < 1 \\ 0 & \text{otherwise}\end{cases}$$
$$E(X) = \int_0^1 x \cdot 2x\,dx = \left[\frac{2x^3}{3}\right]_0^1 = \frac{2}{3}$$
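The Cauchy p.d.f.
$$f(x) = \frac{1}{\pi (1 + x^2)}, \qquad -\infty < x < \infty, \qquad f(x) \ge 0$$
$$F(x) = \int_{-\infty}^{x} \frac{dx'}{\pi(1+x'^2)} = \frac{1}{\pi}\left[\arctan(x) - \arctan(-\infty)\right] = \frac{1}{\pi}\left[\arctan(x) + \frac{\pi}{2}\right]$$
$$F(\infty) = \int_{-\infty}^{\infty} \frac{dx}{\pi(1+x^2)} = \frac{1}{\pi}\left[\frac{\pi}{2} + \frac{\pi}{2}\right] = 1$$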
Cauchy distribution: Expectation
Test for existence of the expectation:
$$E(|X|) = \int_{-\infty}^{\infty} |x| f(x)\,dx = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{|x|}{1+x^2}\,dx = \infty$$
The expectation does not exist for the Cauchy distribution.
Some properties of expectations
Expectation is linear:
$$E(aX + bY) = a E(X) + b E(Y)$$
If the random variables X and Y are independent, i.e. $f(x,y) = f(x) f(y)$, then
$$E(XY) = E(X) E(Y)$$
Expectation of a function
Is essentially the same as the expectation of a variable:
$$E(r(X)) = \int r\, g(r)\,dr = \int r(x) f(x)\,dx$$
where g(r) is the p.d.f. of r. Of special interest are the expectation values of moments, for example the variance
$$\mathrm{var}(X) = E\left[(X - E(X))^2\right] = \int x^2 f(x)\,dx - \left(\int x f(x)\,dx\right)^2$$
Can you show that the variance is always non-negative?
Functions of several random variables
We consider a p.d.f. $f(x_1,\dots,x_n)$ of several random variables $X_1,\dots,X_n$.
The p.d.f. satisfies (of course)
$$f(x_1,\dots,x_n) \ge 0$$
$$\int_{a_1}^{b_1}\int_{a_2}^{b_2}\cdots\int_{a_n}^{b_n} f(x_1,\dots,x_n)\,dx_1\cdots dx_n = 1$$
Expectation of a function of several variables
Similarly to the one-variable case, expectations of functions of several variables are computed as
$$E(Y) = E\left(r(X_1,\dots,X_n)\right) = \int\cdots\int r(x_1,\dots,x_n)\, f(x_1,\dots,x_n)\,dx_1\cdots dx_n$$
Example: expectation of more than one variable
$$f(x,y) = \begin{cases} 1 & \text{for } (x,y) \in S \\ 0 & \text{otherwise}\end{cases}$$
S is a square: $0 < x < 1$, $0 < y < 1$.
$$E(X^2 + Y^2) = \int_0^1\int_0^1 (x^2 + y^2)\, f(x,y)\,dx\,dy = \int_0^1\int_0^1 (x^2 + y^2)\,dx\,dy = \frac{2}{3}$$
Markov inequality
X is a random variable such that $\Pr(X \ge 0) = 1$. Then for every $t > 0$
$$\Pr(X \ge t) \le \frac{E(X)}{t}$$
Prove it. Why is the case E(X) > t not interesting?
Chebyshev inequality
is a special case of the Markov inequality.
X is a random variable for which the variance exists. For t>0
$$\Pr\left(|X - E(X)| \ge t\right) \le \frac{\mathrm{var}(X)}{t^2}$$
To obtain it, substitute $Y = (X - E(X))^2$, for which $E(Y) = \mathrm{var}(X)$, and replace t by $t^2$ in the Markov inequality.
The law of large numbers I
Consider a set of n i.i.d. random variables $X_1,\dots,X_n$. Each of the random variables has mean (expectation value) $\mu$ and variance $\sigma^2$.
The arithmetic average of n samples is defined as
$$\bar{X}_n = \frac{1}{n}(X_1 + \dots + X_n)$$
It defines a new random variable that we call the sample mean.
The expectation value of the sample mean is
$$E(\bar X_n) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\, n\mu = \mu$$
The Law of Large Numbers II
The variance of $\bar X_n$:
$$\mathrm{var}(\bar X_n) = E(\bar X_n^2) - \left(E(\bar X_n)\right)^2 = \frac{1}{n^2}\sum_{i,j} E(X_i X_j) - \frac{1}{n^2}\Big(\sum_i E(X_i)\Big)^2$$
Since $X_i$ and $X_j$ are independent for $i \ne j$, $E(X_i X_j) = E(X_i)E(X_j)$, and
$$\mathrm{var}(\bar X_n) = \frac{1}{n^2}\left[n E(X^2) + (n^2 - n) E(X)^2 - n^2 E(X)^2\right] = \frac{1}{n}\left[E(X^2) - E(X)^2\right] = \frac{\mathrm{var}(X)}{n}$$
This means that the variance of the sample mean decreases in inverse proportion to the number of sampled points.
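The 1/n behaviour of the variance of the sample mean can be checked numerically. The sketch below is not from the original notes; it assumes uniform random numbers on (0,1), whose variance is 1/12, and the function name `variance_of_sample_mean`, the seed and the number of trials are illustrative choices.

```python
import random
import statistics

def variance_of_sample_mean(n, trials, rng):
    """Estimate the variance of the mean of n uniform(0,1) numbers from repeated trials."""
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]
    return statistics.pvariance(means)

rng = random.Random(1)     # fixed seed (assumption) for reproducibility
var_x = 1.0 / 12.0         # exact variance of the uniform distribution on (0, 1)
for n in (1, 4, 16, 64):
    # second column: measured variance, third column: var(X)/n
    print(n, variance_of_sample_mean(n, 5_000, rng), var_x / n)
```

The two printed columns should agree to within the statistical noise of the finite number of trials.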
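Law of Large Numbers III
Chebyshev inequality:
$$\Pr\left(|\bar X_n - \mu| \ge \epsilon\right) \le \frac{\mathrm{var}(\bar X_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}$$
$$\Pr\left(|\bar X_n - \mu| < \epsilon\right) \ge 1 - \frac{\sigma^2}{n \epsilon^2} \qquad \text{for } \epsilon > 0$$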
Central Limit Theorem
Statement without proof: Given a set of independent random variables $X_1,\dots,X_n$ with means $\mu_i$ and variances $\sigma_i^2$, we define a new random variable
$$Y = \frac{\sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \mu_i}{\left(\sum_{i=1}^{n} \sigma_i^2\right)^{1/2}}$$
For very large n, the distribution of $\sum_{i=1}^{n} X_i$ is normal with mean $\sum_{i=1}^{n}\mu_i$ and variance $\sum_{i=1}^{n}\sigma_i^2$.
Statistical Inference
- Data are generated from an unknown probability distribution, and statements on the unknown distribution are warranted.
- Determine parameters (e.g. the rate parameter for the exponential distribution, the mean and the variance for the normal distribution).
- Prediction of new experiments
Estimation of parameters
Notation: $f(x|\theta)$ is the probability density of sampling x given (conditioned on) parameters $\theta$.
For a set of n independent and identically distributed samples the probability density is:
$$f(x_1,\dots,x_n|\theta) = \prod_{i=1}^{n} f(x_i|\theta) = f_n(\mathbf{x}|\theta)$$
However, what we want to determine now are the parameters $\theta$. For example, assuming the distribution is normal, we seek the mean $\mu$ and the variance $\sigma^2$:
$$f(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
Bayesian arguments
What we want is the function $\xi(\theta|\mathbf{x})$: given a set of observations $\mathbf{x}$, what is the probability that the set of parameters is $\theta$?
Bayesian statistics: Think of the parameters like other random variables, with probability $\xi(\theta)$.
The joint probability $f(\mathbf{x},\theta) = f(\mathbf{x}|\theta)\,\xi(\theta)$ is also $f(\mathbf{x},\theta) = \xi(\theta|\mathbf{x})\,g(\mathbf{x})$.
The likelihood function
We can formally write
$$\xi(\theta|\mathbf{x}) = \frac{f(\mathbf{x}|\theta)\,\xi(\theta)}{g(\mathbf{x})}$$
which is the probability of having a particular set of parameters $\theta$ for the p.d.f., given a set of observations (what we wanted). Note that our prime interest here is in the parameter set $\theta$; the sample $\mathbf{x}$ is given. Since $g(\mathbf{x})$ is independent of $\theta$ we can write the likelihood function
$$\xi(\theta|\mathbf{x}) \propto f(\mathbf{x}|\theta)\,\xi(\theta)$$
8/13/2019 General notes on computational biophysics
52/123
33
Example: Likelihood function I
Consider the exponential distribution
And assume the p.d.f. of the parameter is aGaussian with a mean and variance of 1.
( ) [ ]
( ) 1,...,
exp for 0|0 otherwise
exp for 0|
0 otherwise
ni
i n
x x f x
x x f
=
>= > =
x
( )21 exp
22
=
-
8/13/2019 General notes on computational biophysics
53/123
34
Example: Likelihood function II
( )( )
2
1/ 21,...,
1| exp exp22
ni
i n
x
=
=
x
Maximum Likelihood
We look for a maximum of the function $L(\theta) = \log f_n(\mathbf{x}|\theta)$ as a function of the parameters $\theta$.
As a concrete example we consider the normal distribution
$$L(\mu,\sigma^2) = \log f_n(\mathbf{x}|\mu,\sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$$
To find the most likely set of parameters we determine the maximum of $L(\theta)$.
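Maximum of $L(\theta)$ for the normal distribution
$$\frac{dL}{d\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = \frac{1}{\sigma^2}\Big(\sum_{i=1}^{n} x_i - n\mu\Big) = 0 \quad\Rightarrow\quad \hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$
$$\frac{dL}{d\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2 = 0 \quad\Rightarrow\quad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat\mu)^2$$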
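As a numerical illustration of the maximum likelihood estimates for the normal distribution, the Python sketch below evaluates the log-likelihood and the closed-form estimates on a small made-up data set. The function names and the data are assumptions introduced only for this example.

```python
import math

def normal_log_likelihood(xs, mu, sigma2):
    """L(mu, sigma^2) = log f_n(x | mu, sigma^2) for i.i.d. normal samples."""
    n = len(xs)
    return (-0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * n * math.log(sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2.0 * sigma2))

def normal_mle(xs):
    """Closed-form maximum likelihood estimates of the mean and the variance."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n   # note the 1/n factor
    return mu_hat, sigma2_hat

data = [4.9, 5.1, 5.3, 4.7, 5.0]      # made-up sample, for illustration only
mu_hat, sigma2_hat = normal_mle(data)
best = normal_log_likelihood(data, mu_hat, sigma2_hat)

# Nearby parameter values should not give a larger log-likelihood.
for dmu, ds2 in ((0.05, 0.0), (-0.05, 0.0), (0.0, 0.01), (0.0, -0.01)):
    assert normal_log_likelihood(data, mu_hat + dmu, sigma2_hat + ds2) <= best
print(mu_hat, sigma2_hat, best)
```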
Determine a most likely parameter for the uniform distribution
$$f(x|\theta) = \begin{cases} 1/\theta & \text{for } 0 \le x \le \theta \\ 0 & \text{otherwise}\end{cases}$$
$$f_n(\mathbf{x}|\theta) = \begin{cases} 1/\theta^n & \text{for } 0 \le x_i \le \theta,\ i = 1,\dots,n \\ 0 & \text{otherwise}\end{cases}$$
It is clear that $\theta$ must be at least as large as all the $x_i$ and at the same time must maximize the monotonically decreasing function $1/\theta^n$, hence
$$\hat\theta = \max[x_1,\dots,x_n]$$
Potential problems in maximum likelihood procedure
- The value of $\theta$ is underestimated (note that $\theta$ should be larger than all x, not only the ones we have sampled so far).
- There is no guarantee that a solution exists: for the distribution below $\theta$ must be larger than any x but at the same time equal to the maximal x. This is not possible and hence there is no solution.
$$f(x|\theta) = \begin{cases} 1/\theta & \text{for } 0 < x < \theta \\ 0 & \text{otherwise}\end{cases}$$
- The solution is not necessarily unique.
The $\chi^2$ distribution with n degrees of freedom
$$f(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{n/2-1}\exp(-x/2), \qquad x > 0$$
$$E(X) = n, \qquad \mathrm{var}(X) = 2n$$
where the gamma function is
$$\Gamma(n) = \int_0^\infty t^{n-1} e^{-t}\,dt, \qquad n > 0$$
There is a useful relation between the $\chi^2$ and the normal distributions.
Theorem connecting $\chi^2$ and normal distributions
If the random variables $X_1,\dots,X_n$ are i.i.d. and if each of these variables has a standard normal distribution, then the sum of the squares
$$Y = X_1^2 + \dots + X_n^2$$
has a $\chi^2$ distribution with n degrees of freedom.
For a single variable ($Y = X^2$), with $\Phi$ and $\phi$ the distribution function and the p.d.f. of the standard normal distribution, the distribution functions are related by
$$F(y) = \Pr(Y \le y) = \Pr(X^2 \le y) = \Pr\left(-y^{1/2} \le X \le y^{1/2}\right) = \Phi\left(y^{1/2}\right) - \Phi\left(-y^{1/2}\right)$$
The p.d.f. is obtained by differentiating both sides, $f(y) = F'(y)$ and $\phi(x) = \Phi'(x)$. Note that $\phi\left(y^{1/2}\right) = \phi\left(-y^{1/2}\right) = (2\pi)^{-1/2}\exp(-y/2)$. We have
$$f(y) = \phi\left(y^{1/2}\right)\tfrac{1}{2}y^{-1/2} + \phi\left(-y^{1/2}\right)\tfrac{1}{2}y^{-1/2} = (2\pi)^{-1/2}\, y^{-1/2}\exp(-y/2)$$
which is the $\chi^2$ distribution with one degree of freedom.
Normal distribution: Parameters
Let $X_1,\dots,X_n$ be a random sample from a normal distribution having mean $\mu$ and variance $\sigma^2$. Then the sample mean (hat denotes M.L.E.)
$$\hat\mu = \bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$
and the sample variance
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar X_n\right)^2$$
are independent random variables.
$\bar X_n$ has a normal distribution with mean $\mu$ and variance $\sigma^2/n$.
$n\hat\sigma^2/\sigma^2$ has a chi-square distribution with n-1 degrees of freedom. Why n-1? (next slide)
Parameters of the normal distribution: Note 1
Let $x_1,\dots,x_n$ be a vector of random numbers of length n sampled from the normal distribution.
Let $y_1,\dots,y_n$ be another vector of n random numbers, related to the previous vector by a linear transformation A ($AA^t = I$):
$$\mathbf{y} = A\mathbf{x}$$
Consider now the calculation of the variance (next slide).
Variance is not changing upon linear transformations
Consider the expression
$$\sum_{i=1}^{n}\left(Y_i - \bar Y_n\right)^2 = \left(AX - A\bar X_n\right)^t\left(AX - A\bar X_n\right) = \left(X - \bar X_n\right)^t A^t A \left(X - \bar X_n\right) = \sum_{i=1}^{n}\left(X_i - \bar X_n\right)^2$$
Here, inside the vector expressions, $\bar X_n$ stands for the vector whose components are all equal to the sample mean, and $\bar Y_n = A\bar X_n$.
The analysis is based on the unitarity of A. Hence, a linear transformation does not change the variance of the distribution. This makes it possible to exploit the difference between the sample variance and the sample mean $\bar X_n$.
The n-1 (versus n) factor
Since A is arbitrary (as long as it is unitary), we can choose one of the transformation vectors $\mathbf{a}$ to be $(1,\dots,1)/n^{1/2}$.
The scalar product
$$\mathbf{a}^t X - \mathbf{a}^t \bar X_n = 0$$
is identically zero (remember how we compute the mean?).
Hence, since we computed the average from the same sample from which we computed the variance, the variance lost one degree of freedom.
The n-1 factor II
Note that the n-1 makes sense. Consider only a single sample point, which is of course very poor and leaves a high degree of uncertainty regarding the values of the parameters. If we use n then the estimated variance becomes zero, while if we use n-1 we obtain infinity, which is more appropriate to the problem at hand, for which we have no information to determine the variance.
The t distribution (in preparation for confidence intervals)
Consider two independent random variables Y and Z, such that Y has a chi-square distribution with n degrees of freedom and Z has a standard normal distribution. The variable X is defined by
$$X = \frac{Z}{(Y/n)^{1/2}}$$
Then the distribution of X is the t distribution with n degrees of freedom.
The t distribution
The function is tabulated and can be written in terms of the $\Gamma$ function:
$$t_n(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{(n\pi)^{1/2}\,\Gamma\left(\frac{n}{2}\right)}\left(1 + \frac{x^2}{n}\right)^{-(n+1)/2} \qquad \text{for } -\infty < x < \infty$$
$$\Gamma(\alpha) = \int_0^\infty x^{\alpha-1}\exp(-x)\,dx$$
The t distribution approaches the normal distribution as $n \to \infty$. It has the same mean but longer tails.
Confidence Interval
Confidence intervals provide an alternative to using an estimator in place of the actual value of an unknown parameter. We can find an interval (A,B) that we think has a high probability of containing the desired parameter. The length of the interval gives us an idea of how well we can estimate the parameter value.
Confidence interval: Example
The sampling distribution of a statistic S is normal with mean $\mu_S$ and standard deviation $\sigma_S$. We expect to find a sample value of S in the intervals
$$\mu_S \pm \sigma_S; \qquad \mu_S \pm 2\sigma_S; \qquad \mu_S \pm 3\sigma_S$$
about 68.27%, 95.45% and 99.73% of the time, respectively.
Confidence interval for means
If the statistic S is the sample mean $\bar X$, then the 95% and 99% confidence limits for estimation of the population mean are given by $\bar X \pm 1.96\,\sigma_{\bar X}$ and $\bar X \pm 2.58\,\sigma_{\bar X}$ respectively.
For large samples ($n \ge 30$) we can write (depending on the level of confidence we are interested in)
$$\bar X \pm z_c \frac{\sigma}{\sqrt{n}}$$
For small samples we need the t distribution.
8/13/2019 General notes on computational biophysics
71/123
52
Confidence interval for the mean ofthe normal distribution
Let for a random sample from a normaldistribution with unknown mean and unknown variance.Let t n-1 (x) denote the p.d.f of the t distribution with n-1degrees of freedom, and let c be a constant such that
For every value of n, the value of c can be found fromthe table of the t distribution to fit the confidence(probability)
1,...,
n X X
( )1c
nc
t x dx
=
Confidence interval for means: small sample (n
Water
A very simple molecule that consists of 3 atoms: one oxygen and two hydrogen atoms. Some remarkable properties. Very high melting and vaporization points for a liquid with so light a molecular mass. For example, CCl4 melts at 250K, while water melts at 273K. Anomalous density behavior at freezing (expansion on freezing: ice is lighter than liquid water, ice 0.92 g/mL, liquid water 1.0 g/mL). Very strong electrical forces (special orientation forces?). High dielectric constant.
Can we explain macroscopic behavior with a simple microscopic model? Explain also microscopic data, since macroscopic observations are too few.
Structures
Correlation functions: spatial & time.
Spatial correlation functions (first and second peak) as determinant of interaction strengths.
Time correlation measures (for example, dielectric response).
A water molecule is neutral but it carries a large dipole moment. The oxygen is highly negative and the hydrogens positive. The dipole moment of the molecule (it is not linear) is 1.855 Debye units (1 Debye = $3.33564 \times 10^{-30}$ C m). An electron charge at a distance of one angstrom corresponds to 4.803 Debye.
The geometry of a single water molecule is determined by the following parameters (using a model potential, TIP3P; the position of an atom as light as hydrogen in quantum mechanics is subject to considerable uncertainty): r(OH) = 0.9572 Å, a(HOH) = 104.52°. Since the hydrogen atoms are symmetric, the dipole moment is an experimental indication that the molecule is not linear. The individual molecules are set to be rigid (no violations of the above internal coordinates are allowed). We shall use harmonic restraints to fix the geometry (at least until we learn how to handle holonomic constraints). For the internal water potential, we write
$$U_{internal} = k\left(r_{o h_1} - 0.9572\right)^2 + k\left(r_{o h_2} - 0.9572\right)^2 + k'\left(r_{h_1 h_2} - 1.5139\right)^2$$
where k and k' are constants chosen to minimize the changes in the distances compared to the ideal values, and are chosen empirically to be 600 kcal/(mol Å$^2$).
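The internal potential can be written as a short function. The notes ask for Matlab in the homework; the sketch below uses Python purely for illustration. The function names, and the choice of the same value 600 for both k and k', are assumptions based on the text above.

```python
import math

R_OH = 0.9572     # ideal O-H distance in angstrom (from the text)
R_HH = 1.5139     # ideal H-H distance in angstrom (from the text)
K_BOND = 600.0    # kcal/(mol angstrom^2); the same value is assumed for k and k'

def distance(p, q):
    """Euclidean distance between two points given as (x, y, z) tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def internal_energy(o, h1, h2, k=K_BOND, k_prime=K_BOND):
    """Harmonic restraints on the two O-H distances and the H-H distance."""
    return (k * (distance(o, h1) - R_OH) ** 2
            + k * (distance(o, h2) - R_OH) ** 2
            + k_prime * (distance(h1, h2) - R_HH) ** 2)

# Ideal geometry: oxygen at the origin, hydrogens placed symmetrically about the y axis.
half_hh = R_HH / 2.0
y = math.sqrt(R_OH ** 2 - half_hh ** 2)
o, h1, h2 = (0.0, 0.0, 0.0), (half_hh, y, 0.0), (-half_hh, y, 0.0)

hoh_angle = 2.0 * math.degrees(math.asin(half_hh / R_OH))   # angle implied by the two distances
print(internal_energy(o, h1, h2), hoh_angle)
```

At the ideal geometry the restraint energy vanishes, and any distortion of the three distances is penalized quadratically.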
The potential between n water molecules using the TIP3P model is
$$U_{between\ molecules} = \sum_{i>j}\left(\frac{A}{r_{oo,ij}^{12}} - \frac{B}{r_{oo,ij}^{6}}\right) + \sum_{i>j}\sum_{k,l} K\,\frac{c_{ik}\,c_{jl}}{r_{ik,jl}}$$
The indices i,j are for water molecules, the indices k,l are for the atoms of a single molecule. Note the newly inserted constant K that is used for unit consistency. In our case it is 332.0716. The first sum is over the oxygen atoms only and includes repulsion
between the different terms. This type of potential is also called Lennard-Jones. The hydrogen atoms (deprived of electrons) are so small that their influence on the hard-core shape of the molecule is ignored. The hard core is set to be spherical with the oxygen atom at the center. Hydrogen atoms (and oxygen atoms too) come into play in the second term, the electrostatic interactions.
The potential parameters are as follows
$$A = 629{,}400\ \text{kcal Å}^{12}/\text{mol}, \qquad B = 625.5\ \text{kcal Å}^{6}/\text{mol}, \qquad c_O = -0.834, \qquad c_H = 0.417$$
To begin with we will be interested in the water dimer. The interaction between the two molecules can be written more explicitly as
$$\begin{aligned} U_{between\ molecules} = {} & \frac{A}{r_{o_1 o_2}^{12}} - \frac{B}{r_{o_1 o_2}^{6}} + K\left[\frac{c_{o_1}c_{o_2}}{r_{o_1 o_2}} + \frac{c_{o_1}c_{h_{21}}}{r_{o_1 h_{21}}} + \frac{c_{o_1}c_{h_{22}}}{r_{o_1 h_{22}}} + \frac{c_{o_2}c_{h_{11}}}{r_{o_2 h_{11}}} + \frac{c_{o_2}c_{h_{12}}}{r_{o_2 h_{12}}}\right. \\ & \left. {} + \frac{c_{h_{11}}c_{h_{21}}}{r_{h_{11} h_{21}}} + \frac{c_{h_{11}}c_{h_{22}}}{r_{h_{11} h_{22}}} + \frac{c_{h_{12}}c_{h_{21}}}{r_{h_{12} h_{21}}} + \frac{c_{h_{12}}c_{h_{22}}}{r_{h_{12} h_{22}}}\right] \end{aligned}$$
For the first monomer we use o1, h11 & h12 to denote the corresponding three atoms. For the second monomer we use instead o2, h21 & h22. Of course the modeling of the interaction between molecules must come together with the internal potential keeping the geometry fixed. Note that we do not compute electrostatic and Lennard-Jones interactions of atoms within the same molecule.
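The dimer interaction energy can be evaluated with a double loop over the atoms of the two molecules. The Python sketch below is an illustration under stated assumptions: the parameter values are the ones listed in the text, the function names are hypothetical, and the trial placement of the second molecule is arbitrary (it is not the optimal dimer geometry).

```python
import math

A = 629400.0                        # kcal angstrom^12 / mol (value from the text)
B = 625.5                           # kcal angstrom^6 / mol (value from the text)
K = 332.0716                        # unit conversion constant (value from the text)
CHARGES = (-0.834, 0.417, 0.417)    # atom order assumed here: O, H1, H2

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def dimer_interaction(mol1, mol2):
    """Lennard-Jones term between the oxygens plus all nine charge-charge terms."""
    r_oo = distance(mol1[0], mol2[0])
    energy = A / r_oo ** 12 - B / r_oo ** 6
    for c1, p1 in zip(CHARGES, mol1):
        for c2, p2 in zip(CHARGES, mol2):
            energy += K * c1 * c2 / distance(p1, p2)
    return energy

def ideal_water(shift=(0.0, 0.0, 0.0)):
    """Rigid water with r(OH) = 0.9572 and r(HH) = 1.5139, translated by `shift`."""
    half_hh = 1.5139 / 2.0
    y = math.sqrt(0.9572 ** 2 - half_hh ** 2)
    atoms = [(0.0, 0.0, 0.0), (half_hh, y, 0.0), (-half_hh, y, 0.0)]
    return [tuple(a + s for a, s in zip(atom, shift)) for atom in atoms]

mol1 = ideal_water()
mol2 = ideal_water(shift=(0.0, -3.0, 0.0))   # arbitrary trial placement
print(dimer_interaction(mol1, mol2))
```

Atoms within the same molecule are never paired, in line with the note above; the internal geometry is handled separately by the restraint potential.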
The water dimer has an optimal geometry (a minimum energy configuration) that we now wish to determine. This is a question of how to arrange negatively charged atoms close to positively charged atoms and keep similar charges away from each other. Together with the fixed geometry of the individual molecules this is a frustrated system (you cannot make everybody happy; a classic example is that of three spins) that also has implications for the docking problem and drug design (determining the orientation of two interacting molecules) and for the range of complementing interactions and shape matching.
What is the dimensionality of the problem? We have three atoms in each of the water molecules, making a total of eighteen degrees of freedom. The actual number relevant for the docking is much smaller. Of the eighteen, six are translations and rotations of the whole system (not changing the relative orientation) and another six are removed by the internal constraints on the structure of the molecules. Hence, only six degrees of freedom remain when we attempt to match the two (rigid) water molecules together.
This is possible but requires considerable work for implementation and makes it necessary to write the energy in considerably less convenient coordinates (the most convenient are just the Cartesian coordinates of the individual atoms). So rather than being clever at the very first step, we will use the simplest approach. In the simplest approach we attempt to optimize the energy of the water dimer as a function of all degrees of freedom. This requires more work from the computer, but less work from us. Minimizing the energy (and finding the best structure) in this large 18-dimensional space is something we cannot do by intuition and we must design an algorithm to do it.
The algorithm of steepest descent is one way of doing it, and this is the first approach that we are going to study. Consider a guess coordinate vector for the water dimer that we denote by R (capitals denote arrays or vectors, lower case denotes elements of a vector). It is a vector of length 18 and includes all the x,y,z coordinates. What is the best way of organizing the coordinates? In some programs the vector of coordinates is decomposed into three Cartesian vectors, $R = (X, Y, Z) = (x_1, x_2, \dots, x_6, y_1, \dots, y_6, z_1, \dots, z_6)$. This setup is however less efficient from the viewpoint of memory architecture. When we compute distances between atoms (the main computation required for the energy calculations) we require the (x,y,z) coordinates of the two atoms. By aggregating all the Cartesian components together (e.g. all the x-s coming first) we may create cache misses. Finding the Cartesian components of one atom makes it necessary to walk quite far along the array: every time we need one of the Cartesian components we need to step through n numbers to pick the next Cartesian component. This is not a serious problem for 6 atoms, but for hundreds of thousands or millions of atoms (as in some simulations) it might be. A more efficient structure that we shall adopt is therefore:
$$R = (x_1, y_1, z_1, \dots, x_6, y_6, z_6)$$
where the coordinates that belong to a single particle are kept together, making it less likely to have cache misses.
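The two coordinate layouts can be made concrete with a small Python sketch. The helper names `to_interleaved` and `atom_coordinates` are hypothetical, and a plain Python list only illustrates the indexing, not the cache behaviour of a compiled code.

```python
def to_interleaved(xs, ys, zs):
    """Convert the block layout (all x, then all y, then all z) to (x1, y1, z1, x2, ...)."""
    r = []
    for x, y, z in zip(xs, ys, zs):
        r.extend((x, y, z))
    return r

def atom_coordinates(r, i):
    """Return (x, y, z) of atom i (0-based) from the per-atom layout."""
    return r[3 * i], r[3 * i + 1], r[3 * i + 2]

# Two atoms, block layout first
xs, ys, zs = [0.0, 1.0], [0.1, 1.1], [0.2, 1.2]
r = to_interleaved(xs, ys, zs)
print(r)                        # the three components of each atom are now adjacent
print(atom_coordinates(r, 1))   # coordinates of the second atom
```

In the per-atom layout the three components of atom i occupy consecutive positions 3i, 3i+1 and 3i+2, so a distance calculation touches two short contiguous stretches of memory.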
Starting from an initial configuration $R_0$ we want to find a lower energy configuration. The argument is that low energy configurations are more likely to be relevant. This is true in general but is not always sufficient. We shall come back to this when discussing entropy, temperature and how to simulate thermal fluctuations.
For the moment assume that we are going to make a small step, a displacement $\delta R = (R - R_0)$ of norm $|\delta R|_2 = \sqrt{\sum_i \left(q_i - q_i^0\right)^2}$. We have used the so-called norm-2 distance measure, which is very common in computational biophysics. We have also used another variable $q_i$, which is any of the Cartesian coordinates of the system. A system with N atoms has 3N $q_i$. It is useful to know that a more general formulation would be
$$|\delta R|_n = \left(\sum_i \left|q_i - q_i^0\right|^n\right)^{1/n}$$
These different distances emphasize alternative aspects of the system. For example, when $n \to \infty$ the norm approaches the largest displacement along a given Cartesian direction in the system. Another widely used distance is for $n = 1$, which is the Manhattan distance.
The displacement $\delta R$ is made in a large space of 18 degrees of freedom, so while we fixed the size of the displacement to be small there is still a lot to be done to find an optimal displacement that will reduce the energy as much as possible (for a given size of the displacement). How are we going to choose the displacement (given that we are choosing the norm of the displacement to be small)? Here is a place where the Taylor series can come to the rescue. We expand the potential $U(R)$ in the neighborhood of the current coordinate set $R_0$ to first order in $\delta R$ (higher order terms are considered small and are neglected).
$$U(R_0 + \delta R) \cong U(R_0) + \sum_i \left.\frac{dU}{dq_i}\right|_{q_i = q_i^0}\left(q_i - q_i^0\right) = U(R_0) + (\nabla U)^t \cdot \delta R$$
The expression $(\nabla U)^t$ means a transposed vector of length 3N, which is also called the gradient of the potential
$$(\nabla U)^t = \left(\frac{dU}{dq_1}, \frac{dU}{dq_2}, \dots, \frac{dU}{dq_{3N}}\right)$$
Similarly we have for $\delta R$
$$\delta R = \begin{pmatrix} q_1 - q_1^0 \\ q_2 - q_2^0 \\ \vdots \\ q_{3N} - q_{3N}^0 \end{pmatrix}$$
The expression for the potential difference is a scalar (or inner) product between two vectors. Hence we can also write
$$U(R_0 + \delta R) - U(R_0) = |\nabla U|_2\,|\delta R|_2 \cos\theta = |\nabla U|\,|\delta R| \cos\theta$$
We have omitted the 2 from the expression of the vector norm $|A|_2$ on the right hand side. We will always use the norm 2, unless specifically suggested otherwise, and therefore carrying the 2 around is not necessary. Note that the gradient of the potential is something that we compute at the current point, $R_0$, and it is not something that we can change. Similarly we fixed the norm of the displacement $|\delta R|$. So the only variable in town is the direction of the displacement with respect to the gradient of the potential ($\theta$). To minimize the difference in energies (making $U(R_0 + \delta R)$ as low as possible) we should make $\cos\theta$ as small as possible. The best we could do is to make $\cos\theta = -1$ and choose the step in the opposite direction to the gradient vector. Hence, the steepest descent algorithm for energy minimization is
$$\delta R = -|\delta R|\,\frac{\nabla U}{|\nabla U|}$$
We did not discuss so far how to choose the step length $|\delta R|$, except to argue that it should be small. We expect that if the gradient is very small then we are near a minimum (actually a stationary point where $\nabla U = 0$). In that case only a small step should be used. On the other hand if the gradient is large, a somewhat larger step could be taken since we anticipate a significant change in the function (for the better). Of course the step still cannot be too large since our arguments are based on a linear expansion of the function. Based on the above arguments, it is suggestive to use the following (simpler) expression for the displacement, which is directly proportional to the length of the gradient vector:
$$\delta R = -\lambda \nabla U$$
where $\lambda$ is a small positive constant. We can imagine now a numerical minimization process that goes as follows:
0. Init: $R = R_0$
1. Compute $\nabla U(R)$
2. Compute a step $\delta R = -\lambda \nabla U$ and a new coordinate set $R = R + \delta R$
3. Check for convergence ($|\nabla U|$ below a chosen small threshold). If converged, stop. Otherwise return to 1.
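The four steps above translate almost line by line into code. The following Python sketch applies them to a simple two-dimensional test function rather than to the water dimer; the function names, the fixed step size 0.1 and the convergence threshold are illustrative assumptions.

```python
import math

def steepest_descent(grad, r0, step=0.1, tol=1e-8, max_iter=10_000):
    """Minimize by repeated steps R <- R - step * grad(R) until |grad| < tol."""
    r = list(r0)                                          # 0. init
    for iteration in range(max_iter):
        g = grad(r)                                       # 1. compute the gradient
        if math.sqrt(sum(gi * gi for gi in g)) < tol:     # 3. convergence check
            return r, iteration
        r = [ri - step * gi for ri, gi in zip(r, g)]      # 2. take the step
    return r, max_iter

# Test function U(x, y) = (x - 1)^2 + 2 (y + 3)^2 with its minimum at (1, -3)
def grad_u(r):
    x, y = r
    return [2.0 * (x - 1.0), 4.0 * (y + 3.0)]

r_min, n_steps = steepest_descent(grad_u, [5.0, 5.0])
print(r_min, n_steps)
```

For the water dimer the same loop would be driven by the 18-component gradient of the full potential, and the step size would have to be much smaller because of the stiff restraint terms.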
This procedure finds for us a local minimum, a minimum to which we can slide down directly from an initial guess. It will not provide a solution to the global optimization problem, finding the minimum that is the lowest of them all.
We can take the above expression one step further and write it down as a differential equation with the coordinates a function of a progress variable $\tau$. This is not the most efficient way of minimizing the structure under consideration but it will help us understand the process better. We write
$$\frac{dR}{d\tau} = -\nabla U$$
where the (dummy) variable $\tau$ varies from zero to $\infty$. As $\tau \to \infty$ we approach the nearby stationary point. Note that the potential $U(R(\tau))$ is a monotonically non-increasing function of $\tau$. This is easy to show as follows. Multiplying both sides of the equation by $\left(\frac{dR}{d\tau}\right)^t$, we have
$$\left(\frac{dR}{d\tau}\right)^t \frac{dR}{d\tau} = -\left(\frac{dR}{d\tau}\right)^t \nabla U = -\sum_i \frac{dq_i}{d\tau}\frac{dU}{dq_i}$$
Assuming that the potential does not have an explicit dependence on $\tau$ (an explicit dependence would make $\tau$ more than a dummy variable, which contradicts our initial set-up) we can write the final expression in a more compact and illuminating form
$$\left(\frac{dR}{d\tau}\right)^t \frac{dR}{d\tau} = -\frac{dU}{d\tau} \ge 0 \quad\Rightarrow\quad \frac{dU}{d\tau} \le 0$$
Hence as we progress the solution of the differential equation the potential energy is decreasing in the best case and is not increasing in the worst case. There can be different variants of how to choose the norm of the step $|\delta R|$; we already mentioned two of them.
Another important variant of the steepest descent minimization is to make the step size a parameter and to optimize it for a given search direction defined by $\nabla U$. One approach would be to perform a search for a minimum along the line defined by $\nabla U$, i.e. we seek a $\lambda$ such that $\nabla U(R + \lambda\nabla U)^t \cdot \nabla U(R) = 0$ as a function of the single scalar variable $\lambda$ (negative $\lambda$ corresponds to moving downhill). This one-dimensional minimization makes sense if the calculation of the gradient can be avoided in the one-dimensional minimization, and search steps in the one-dimensional minimization can be computed more efficiently than the determination of the direction. For example, in the one-dimensional optimization only calculations of the energy $U(R + \lambda\nabla U)$ can be used, in conjunction with the approach of interval halving. We guess a given $\lambda_0$ and if the energy is going up we try half of the previous size, $\lambda_0/2$; if it is going down we double it. We continue halving an interval that contains a minimum until we hit a minimum with the desired properties (halving on the left or halving on the right results in an increasing energy). This scheme assumes that the calculation of the potential energy is more efficient than the calculation of the gradient. This is however not the case here, and for the task at hand the line search option of the steepest descent algorithm is not efficient.
We note that compared to other optimization algorithms (such as conjugate gradient, which we shall not discuss, and the Newton-Raphson algorithm, which we will) the steepest descent algorithm is considerably slower. However, it is a lot more stable than (for example) the Newton-Raphson approach. It is a common practice in molecular simulations to start with a crude minimizer like the steepest descent algorithm described above (if the initial structure is pretty bad), and then to refine the coordinates using something like the Newton-Raphson algorithm, which is on our agenda.
A few words about computing potential derivatives, which is of prime importance for minimization algorithms in high dimensions. It is unheard of to have effective minimizers in high dimensions without derivatives, since we must have a sense of direction where to go. That sense of direction is given by the potential gradient.
Overall, the potential derivatives are dominated by calculations of distances. It is therefore useful to consider the derivative of a distance between two particles as a function of the Cartesian coordinates.
$$r_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}$$
$$\frac{dr_{ij}}{dw_i} = \frac{w_i - w_j}{\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}} = \frac{w_i - w_j}{r_{ij}}, \qquad w = x, y, z$$
With the above formulas at hand it is straightforward to do all kind of derivatives withextensive use of the chain rule. For example the Lennard Jones term
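$$\frac{d}{dw_k}\sum_{i>j}\left(\frac{A}{r_{ij}^{12}} - \frac{B}{r_{ij}^{6}}\right) = \sum_{j \neq k}\left(-\frac{12A}{r_{kj}^{13}} + \frac{6B}{r_{kj}^{7}}\right)\frac{dr_{kj}}{dw_k} = \sum_{j \neq k}\left(-\frac{12A}{r_{kj}^{13}} + \frac{6B}{r_{kj}^{7}}\right)\frac{w_k - w_j}{r_{kj}}$$

$$= \sum_{j \neq k}\left(-\frac{12A}{r_{kj}^{14}} + \frac{6B}{r_{kj}^{8}}\right)(w_k - w_j)$$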
Note that the expression above depends only on even powers of the distance (14 and 8), which is good news: no square roots are required. Square roots are the curse of simulations and are much more expensive to compute than additions, multiplications, etc. Unfortunately, electrostatic interactions (energy and derivatives) yield odd powers and are more expensive to compute: the potential and its derivatives require square root calculations. Here is the expression for a set of independent atoms (for convenience we forget about molecules here)
$$\frac{d}{dw_k}\sum_{i>j}\frac{c_i c_j}{r_{ij}} = -\sum_{i \neq k}\frac{c_i c_k}{r_{ik}^{2}}\,\frac{w_k - w_i}{r_{ik}} = -\sum_{i \neq k}\frac{c_i c_k\,(w_k - w_i)}{r_{ik}^{3}}$$
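The two gradient formulas can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the course code: the function names are hypothetical, a single pair of Lennard-Jones constants `A` and `B` is used for all pairs (as in the formula above), and the coordinates are stored as an array of shape (N, 3). The Lennard-Jones gradient works with the squared distance only, while the electrostatic gradient needs one square root per pair.

```python
import numpy as np

def lj_energy(R, A, B):
    """Sum over unique pairs of A/r^12 - B/r^6."""
    R = np.asarray(R, dtype=float)
    U = 0.0
    for i in range(len(R)):
        for j in range(i):
            r2 = np.sum((R[i] - R[j]) ** 2)
            inv6 = 1.0 / r2 ** 3
            U += A * inv6 * inv6 - B * inv6
    return U

def lj_gradient(R, A, B):
    """dU/dw_k from even powers of r only (r^14 and r^8): no square root."""
    R = np.asarray(R, dtype=float)
    G = np.zeros_like(R)
    for k in range(len(R)):
        for j in range(len(R)):
            if j == k:
                continue
            d = R[k] - R[j]
            r2 = np.dot(d, d)
            inv8 = 1.0 / r2 ** 4
            inv14 = inv8 / r2 ** 3
            G[k] += (-12.0 * A * inv14 + 6.0 * B * inv8) * d
    return G

def coulomb_gradient(R, c):
    """dU/dw_k for U = sum_{i>j} c_i c_j / r_ij; r^3 needs a square root."""
    R = np.asarray(R, dtype=float)
    G = np.zeros_like(R)
    for k in range(len(R)):
        for i in range(len(R)):
            if i == k:
                continue
            d = R[k] - R[i]
            r2 = np.dot(d, d)
            G[k] -= c[i] * c[k] * d / (r2 * np.sqrt(r2))
    return G
```

The double loop visits every ordered pair, which keeps the sketch close to the formulas; a production code would loop over unique pairs and accumulate the contribution on both atoms.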
Here is a little tricky question. To compute the energy (which is a single scalar) for $N$ atoms we need to calculate $N^2/2$ terms (assuming all unique distances contribute to the energy), and then add them up. How many terms do we need to compute for the gradient of the potential?
Here is another trick question going back to our data structure. Suppose that I am adding a strong electric field, $E$, pointing in a specific direction along the Z axis, and the corresponding (added) energy takes the form $U_{E\,\mathrm{field}} = \sum_i c_i E z_i$. Is the data structure that we proposed (keeping the Cartesian components of one particle vector together) still ok?
Calculations of potential gradients are a major source of errors in programming molecular modeling code. It is EXTREMELY useful to check the analytical derivatives against numerical derivatives computed by finite difference; the two are expected to agree to at least a few digits. For example
$$\frac{dU}{dq_k} \cong \frac{U(q_1, \dots, q_k + \Delta/2, \dots, q_N) - U(q_1, \dots, q_k - \Delta/2, \dots, q_N)}{\Delta}$$

For the water system the step $\Delta$ can be $10^{-6}$ (using double precision) and the expected agreement is at least 3-4 digits. A larger discrepancy suggests an error.
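A gradient check of this kind can be sketched as below. The helper names `numerical_gradient` and `max_discrepancy` are hypothetical, the coordinates are assumed to be stored in a flat one-dimensional array, and the relative measure used to compare the two gradients is one possible choice among several.

```python
import numpy as np

def numerical_gradient(U, q, step=1e-6):
    """Central finite-difference gradient of the scalar function U at q."""
    q = np.asarray(q, dtype=float)
    g = np.zeros_like(q)
    for k in range(q.size):
        q_plus, q_minus = q.copy(), q.copy()
        q_plus[k] += step / 2.0
        q_minus[k] -= step / 2.0
        g[k] = (U(q_plus) - U(q_minus)) / step
    return g

def max_discrepancy(U, grad, q, step=1e-6):
    """Largest difference between analytical and numerical gradients,
    relative to the largest numerical component (or to 1 if that is smaller)."""
    num = numerical_gradient(U, q, step)
    ana = np.asarray(grad(q), dtype=float)
    scale = max(1.0, float(np.max(np.abs(num))))
    return float(np.max(np.abs(num - ana))) / scale
```

A value of `max_discrepancy` well above roughly 1e-4 would, in the spirit of the 3-4 digit rule above, point to a bug in the analytical gradient.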
This concludes (in principle) our discussion of the steepest descent minimization algorithm. From the above discussion it is quite clear that minimization in the neighborhood of a stationary point of the potential (like a minimum) is difficult. Near the minimum the gradient of the potential is close to zero and subject to numerical error, which supports only extremely small steps that are harder to converge numerically. In that sense the algorithm we discuss next is complementary to the steepest descent approach. It works well in the neighborhood of a minimum. It does not work so well if the starting structure is very far from a minimum, since the large step taken by this algorithm (Newton-Raphson) relies on the correctness of the quadratic expansion of the potential energy surface near a minimum.
We start by considering a linear expansion of the potential derivatives at the current position $R$ in the neighborhood of the desired minimum $R_m$. At the minimum, the gradient is (of course) zero. We have
$$0 = \left[\nabla U(R_m)\right]_j \cong \left[\nabla U(R)\right]_j + \sum_i \frac{d^2 U}{dq_j\, dq_i}\,(q_{im} - q_i)$$
The entity that we wish to determine from the above equation is $R_m$, the position of the minimum. The square bracket with a subscript, $[\dots]_j$, denotes the $j$-th vector element. The last term is a multiplication of a matrix by a vector. From now onwards we shall denote the second derivative matrix by $\underline{\underline{U}}$. We are attempting to find $R_m$ by expanding the force linearly in the neighborhood of the current point. This is one step up in expansion compared to the steepest descent minimization; however, it is not sufficient in general. In the simplest version of the Newton-Raphson (NR) approach very large steps are allowed. Large steps can clearly lead to problems if the linear expansion is not valid. Nevertheless, the expansion is expected to be valid if we are close to a minimum, since any smooth function in the neighborhood of a minimum can be expanded (accurately) up to a second order term

$$U(R) \cong U(R_m) + \frac{1}{2}\,(R - R_m)^t\; \underline{\underline{U}}\; (R - R_m)$$
Note that we did not write down the first order derivatives (gradient) since they are zero at the minimum. This is one clear advantage of NR with respect to the steepest descent minimizer (SDM). SDM relies on the first derivatives only, derivatives that vanish in the neighborhood of a minimum. It is therefore difficult for SDM to make progress in these circumstances while NR can do it in one single shot, as we see below. We write again the equation for the gradient in matrix form (with $\underline{\underline{U}}$ the second derivative matrix)

$$0 = \nabla U(R_m) \cong \nabla U(R) + \underline{\underline{U}}(R)\,(R_m - R)$$
which we can formally solve (for $R_m$) as

$$R_m = R - \left(\underline{\underline{U}}(R)\right)^{-1} \nabla U(R)$$
The matrix $(\underline{\underline{U}})^{-1}$ is the inverse of the second derivative matrix $\underline{\underline{U}}$, namely $(\underline{\underline{U}})^{-1}\, \underline{\underline{U}} = I$ where $I$ is the identity matrix. In principle there is nothing in the above equation that determines the norm of the step $R_m - R$ that we should take. If the system is close to quadratic, the second derivative matrix is roughly constant and a reasonably large step toward the minimum can be taken without significantly violating the validity of the above equation. The ability to take a large step in a quadratic-like system is a clear advantage of NR compared to SDM. However, for systems that are not quadratic the matrix is not constant and only small steps (artificially enforced) should be used. In our water system NR should be used with care and only sufficiently close to a minimum. There are a few technical points that should specifically concern us in the water dimer optimization problem. We will be concerned with the following:
1. Does the matrix $\underline{\underline{U}}$ have an inverse, and what can we do if it does not?
2. How do we find the inverse (or solve the above linear equations)?
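Regarding the second question, a single NR step can be sketched as follows. The function name `newton_raphson_step` and the callables `grad` and `hess` are hypothetical, and the step is computed by solving the linear system with `numpy.linalg.solve` instead of forming the inverse explicitly, which is one common choice and not necessarily the one used in the course. The sketch assumes the second derivative matrix is invertible, which is exactly what the first question is about.

```python
import numpy as np

def newton_raphson_step(grad, hess, R):
    """Return R - hess(R)^{-1} grad(R) by solving hess(R) x = grad(R)."""
    R = np.asarray(R, dtype=float)
    return R - np.linalg.solve(hess(R), grad(R))

# Purely quadratic toy surface U = 0.5 (R - Rm)^t H (R - Rm) with an
# invertible, positive definite H: one step lands on the minimum Rm.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
Rm = np.array([1.0, -2.0])
R_start = np.array([10.0, 7.0])
R_one_step = newton_raphson_step(lambda R: H @ (R - Rm), lambda R: H, R_start)
```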
The bad news is that the second derivative matrix $\underline{\underline{U}}$ for molecular systems (such as the water dimer) does not in general have an inverse. The problem is the six degrees of freedom that we mentioned earlier and that do not affect the potential energy: three overall translations and three overall rotations have zero eigenvalues, making the matrix singular. This is easy to see as follows.
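Let the eigenvectors of the second derivative matrix $\underline{\underline{U}}$ be $e_i$ and the eigenvalues $\lambda_i$. It is possible to write the matrix $\underline{\underline{U}}$ as the following sum

$$\underline{\underline{U}} = \sum_i \lambda_i\, e_i e_i^t$$

where the outer product

$$e_i e_i^t = \begin{pmatrix} e_{i1} \\ e_{i2} \\ \vdots \\ e_{iN} \end{pmatrix} \begin{pmatrix} e_{i1} & e_{i2} & \dots & e_{iN} \end{pmatrix} = \begin{pmatrix} e_{i1}e_{i1} & e_{i1}e_{i2} & \dots & e_{i1}e_{iN} \\ e_{i2}e_{i1} & e_{i2}e_{i2} & \dots & e_{i2}e_{iN} \\ \vdots & \vdots & \ddots & \vdots \\ e_{iN}e_{i1} & e_{iN}e_{i2} & \dots & e_{iN}e_{iN} \end{pmatrix}$$

generates a symmetric NxN matrix. Since the vectors $e_i$ are orthogonal to each other and normalized ($e_i^t e_j = \delta_{ij}$), it is trivial to write down the inverse in this case

$$\left(\underline{\underline{U}}\right)^{-1} = \sum_i \frac{e_i e_i^t}{\lambda_i}$$

However, this expression is true only if all $\lambda_i$ are different from zero. The eigenvectors represent directions of motion and the eigenvalues are associated with the cost in energy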
for moving in the direction determined by the corresponding eigenvector. However, moving along the direction of a global translation (or global rotation) does not change the energy; therefore the corresponding eigenvalues must be equal to zero, and the inverse is impossible to get.
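The eigenvector expansion and its failure for a zero eigenvalue can be illustrated on a deliberately tiny toy system, assumed here only for illustration: two particles on a line connected by a harmonic spring with force constant `k`. Its second derivative matrix has one zero eigenvalue, which belongs to the uniform translation of both particles. The helper name `spectral_inverse` is hypothetical.

```python
import numpy as np

def spectral_inverse(M, tol=1e-10):
    """Inverse of a symmetric matrix as sum_i e_i e_i^t / lambda_i."""
    lam, E = np.linalg.eigh(M)  # the columns of E are the eigenvectors
    if np.any(np.abs(lam) < tol):
        raise ValueError("zero eigenvalue: the matrix has no inverse")
    return sum(np.outer(E[:, i], E[:, i]) / lam[i] for i in range(len(lam)))

# Second derivative matrix of U = 0.5 * k * (x1 - x2)**2
k = 3.0
H = k * np.array([[1.0, -1.0], [-1.0, 1.0]])
lam, E = np.linalg.eigh(H)

# Rebuild the matrix from its eigenvalues and outer products of eigenvectors
H_rebuilt = sum(lam[i] * np.outer(E[:, i], E[:, i]) for i in range(len(lam)))
```

Here `lam` holds a value that is zero to machine precision and the value `2 * k`; calling `spectral_inverse(H)` raises the error, while the same function works for any symmetric matrix without zero eigenvalues.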
One way of getting around this problem is by shifting the eigenvalues of the offending eigenvectors. If we know in advance the six offending eigenvectors we can raise their eigenvalues to very high values (instead of zero) by adding to the matrix outer products of these vectors multiplied by a very high value (see below). The contribution of eigenvectors with very high eigenvalues will diminish when we compute the inverse, since the inverse is obtained by dividing by the corresponding eigenvalues. We do not have to find all the eigenvectors and the eigenvalues as written above; it is sufficient to affect the few specific eigenvectors and define a new matrix $\underline{\underline{U}}'$ to obtain a well behaved $(\underline{\underline{U}}')^{-1}$. The question remains (of course) how to find these six eigenvectors. The most straightforward (and inefficient) way to do it is to actually compute the eigenvectors and eigenvalues by a matrix diagonalization procedure. Luckily we do not have to do that, since the translation and rotation eigenvectors are known from the Eckart conditions. We have for translation
$$\sum_i m_i\,(r_i - r_i^0) = 0, \qquad r_i = (x_i, y_i, z_i)$$

and for rotation

$$\sum_i m_i\; r_i^0 \times (r_i - r_i^0) = 0$$

(Mention the possibility of using Lagrange multipliers here.)

The last multiplication is a vector product, and the difference $r_i - r_i^0$ is assumed to be small. The coordinate vectors $r_i^0$ are reference vectors used to define the coordinate system and are constants.
For the record, we write the vector product explicitly
$$r_i^0 \times r_i = \begin{vmatrix} e_x & e_y & e_z \\ x_i^0 & y_i^0 & z_i^0 \\ x_i & y_i & z_i \end{vmatrix} = e_x\,(y_i^0 z_i - z_i^0 y_i) - e_y\,(x_i^0 z_i - z_i^0 x_i) + e_z\,(x_i^0 y_i - y_i^0 x_i)$$
The two equations define six constraints, since they are vector equations and each vector has three components ($x$, $y$, $z$). To obtain the eigenvectors associated with these constraints we need to compute the gradients of the above constraints. We have three vectors for translations that we denote by $e_{tx}, e_{ty}, e_{tz}$, and three vectors for the rotations, $e_{rx}, e_{ry}, e_{rz}$.
Here are the translation vectors

$$\begin{aligned} e_{tx} &= (m_1, 0, 0,\; m_2, 0, 0,\; \dots,\; m_N, 0, 0) \\ e_{ty} &= (0, m_1, 0,\; 0, m_2, 0,\; \dots,\; 0, m_N, 0) \\ e_{tz} &= (0, 0, m_1,\; 0, 0, m_2,\; \dots,\; 0, 0, m_N) \end{aligned}$$
And here are the rotations.
$$\begin{aligned} e_{rx} &= (0,\, -m_1 z_1^0,\, m_1 y_1^0,\; 0,\, -m_2 z_2^0,\, m_2 y_2^0,\; \dots,\; 0,\, -m_N z_N^0,\, m_N y_N^0) \\ e_{ry} &= (m_1 z_1^0,\, 0,\, -m_1 x_1^0,\; m_2 z_2^0,\, 0,\, -m_2 x_2^0,\; \dots,\; m_N z_N^0,\, 0,\, -m_N x_N^0) \\ e_{rz} &= (-m_1 y_1^0,\, m_1 x_1^0,\, 0,\; -m_2 y_2^0,\, m_2 x_2^0,\, 0,\; \dots,\; -m_N y_N^0,\, m_N x_N^0,\, 0) \end{aligned}$$
A vector that is orthogonal to the above six does not include an overall rotation or translation component. So our goal is to work in the reduced space that does not include the above six. Note that the Eckart conditions are linear, so the constraint gradients are constant (independent of the current coordinates). This makes the manipulation of these vectors straightforward, and it needs to be done only once at the beginning of the calculation. One approach is to modify input vectors (project out from them the offending part). This is, however, quite expensive and requires further work for every incoming vector. A much simpler procedure is to modify the matrix, which is what we shall do.
*** IMPORTANT CORRECTION: For the application of the Eckart conditions to the optimization problem we must set all the masses to one. The masses were not used in the calculation of the potential, and they should not be used in the construction of the constraints either. So we have (for minimization)

$$\sum_i (r_i - r_i^0) = 0, \qquad r_i = (x_i, y_i, z_i)$$

and for rotation

$$\sum_i r_i^0 \times (r_i - r_i^0) = 0$$

*** End of correction
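With all masses set to one, the six constraint gradients can be assembled as in the sketch below. The function name `rigid_body_vectors` is hypothetical, and the layout assumed here keeps the x, y, z components of each atom together, so that every vector has length 3N. Each rotation row is, atom by atom, the cross product of a Cartesian axis with the reference position.

```python
import numpy as np

def rigid_body_vectors(R0):
    """Rows: e_tx, e_ty, e_tz, e_rx, e_ry, e_rz for unit masses.

    R0 holds the reference coordinates, shape (N, 3). The result has
    shape (6, 3N), with components ordered x1, y1, z1, x2, y2, z2, ...
    """
    R0 = np.asarray(R0, dtype=float)
    N = len(R0)
    V = np.zeros((6, N, 3))
    for a in range(3):
        V[a, :, a] = 1.0                 # overall translation along axis a
    x, y, z = R0[:, 0], R0[:, 1], R0[:, 2]
    V[3, :, 1], V[3, :, 2] = -z, y       # rotation about x: (0, -z0, y0)
    V[4, :, 0], V[4, :, 2] = z, -x       # rotation about y: (z0, 0, -x0)
    V[5, :, 0], V[5, :, 1] = -y, x       # rotation about z: (-y0, x0, 0)
    return V.reshape(6, 3 * N)
```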
Note that the six vectors so produced are not orthogonal. They span the complete six-fold space of eigenvectors with eigenvalues that are equal to zero. However, using them within our procedure requires them to be orthonormal. We achieve this by performing the Gram-Schmidt process on the space of the constraint derivatives.
What we do is basically the following. We have a set of $N$ linearly independent vectors. We pick one of them at random (call it for convenience $e_1$) and normalize it

$$e_1' = \frac{e_1}{|e_1|}$$

Let $e_2$ be the second vector that we pick from the set. We make it orthogonal to $e_1'$ and normalize it.

$$e_2' = \frac{e_2 - (e_2^t e_1')\, e_1'}{\left| e_2 - (e_2^t e_1')\, e_1' \right|}$$
The next vector on the agenda we make orthogonal to the previously constructed vectors $e_1'$ and $e_2'$ and normalize it. The same process is used for the rest of the base vectors.
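The procedure just described can be sketched in Python as follows. The function name `gram_schmidt` is hypothetical, the vectors are processed in the order given instead of being picked at random, and the small tolerance `tol` that skips vectors with no remaining independent component is an added safeguard that is not part of the description above.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-10):
    """Orthonormalize the rows of `vectors`; returns an array of unit rows."""
    basis = []
    for v in np.asarray(vectors, dtype=float):
        w = v.copy()
        for b in basis:
            w -= np.dot(v, b) * b    # remove the component of v along b
        norm = np.linalg.norm(w)
        if norm > tol:               # skip (numerically) dependent vectors
            basis.append(w / norm)
    return np.array(basis)
```

For a result `Q`, the product `Q @ Q.T` should be the identity matrix to within round-off, which is a convenient check before the vectors are used further.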
With an orthonormal representation of the constraint space, $\{e_i'\}_{i=1,\dots,6}$, we can redefine the second derivative matrix using a shifting procedure to MUCH higher values for all the overall body motions. We have

$$\underline{\underline{U}}' = \underline{\underline{U}} + \lambda_{up} \sum_{i=1,\dots,6} e_i'\, e_i'^{\,t}$$

where $\lambda_{up}$ is a very large number, in accord with the idea that we promoted earlier: once the inverse is computed, $1/\lambda_{up}$ will be significantly lower than anything else (say $\lambda_{up} = 10^{8}$ for the water dimer).
The modified second derivative matrix is now ready for prime-time NR optimization, since it has a straightforward inverse. (We must be a little careful though: it is possible that a point on the energy surface will be found for which an eigenvalue of the second derivative matrix is zero even if it is not a global motion.) Here we ignore this possibility, assuming that we are sufficiently close to a minimum such that all the non-zero eigenvalues are positive. In the case that the quadratic expansion is accurate (and in sharp contrast to SDM) the minimization will converge in one step.
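The effect of the shift can be illustrated on a toy system that is assumed here only for illustration: two particles on a line joined by a spring of rest length `d`. Its second derivative matrix is singular because a uniform translation costs no energy; after the shift a single NR step reaches the minimum and leaves the center of the pair where it was. The function name `shifted_hessian` and all parameter values are hypothetical.

```python
import numpy as np

def shifted_hessian(H, modes, lam_up=1e8):
    """Return H + lam_up * sum_i e_i e_i^t for orthonormal rows e_i of `modes`."""
    Hs = np.array(H, dtype=float)
    for e in np.atleast_2d(modes):
        Hs = Hs + lam_up * np.outer(e, e)
    return Hs

# U = 0.5 * k * (x1 - x2 - d)**2, a purely quadratic surface
k, d = 3.0, 1.5
grad = lambda R: k * (R[0] - R[1] - d) * np.array([1.0, -1.0])
H = k * np.array([[1.0, -1.0], [-1.0, 1.0]])     # singular: eigenvalues 0 and 2k
e_trans = np.array([1.0, 1.0]) / np.sqrt(2.0)    # normalized translation vector

R = np.array([4.0, 0.0])
R_new = R - np.linalg.solve(shifted_hessian(H, e_trans), grad(R))
```

Because the gradient has no component along the translation vector, the large shifted eigenvalue does not enter the step, and `R_new` satisfies `x1 - x2 = d` up to round-off.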
What will happen to our solution if the eigenvalu