TRANSCRIPT
3. Optimization Methods for Molecular Modeling
by Barak Raveh
Outline
• Introduction
• Local Minimization Methods (derivative-based)
  – Gradient (first-order) methods
  – Newton (second-order) methods
• Monte-Carlo Sampling (MC)
  – Introduction to MC methods
  – Markov-chain MC methods (MCMC)
  – Escaping local minima
Prerequisites for Tracing the Minimal Energy Conformation
I. The energy function: the in-silico energy function should correlate with the (intractable) physical free energy. In particular, they should share the same global energy minimum.
II. The sampling strategy: our sampling strategy should efficiently scan the (enormous) space of protein conformations.
The Problem: Find Global Minimum on a Rough One-Dimensional Surface
rough = has a multitude of local minima at a multitude of scales.
*Adapted from slides by Chen Keasar, Ben-Gurion University
The Problem: Find Global Minimum on a Rough Two-Dimensional Surface
The landscape is rough because both small pits and the Sea of Galilee are local minima.
*Adapted from slides by Chen Keasar, Ben-Gurion University
The Problem: Find Global Minimum on a Rough Multi-Dimensional Surface
• A protein conformation is defined by the set of Cartesian atom coordinates (x, y, z) or by internal coordinates (φ/ψ/χ torsion angles; bond angles; bond lengths)
• The conformation space of a protein with 100 residues has ≈ 3000 dimensions
• The X-ray structure of a protein is a point in this space
• A 3000-dimensional space cannot be systematically sampled, visualized or comprehended
*Adapted from slides by Chen Keasar, Ben-Gurion University
Characteristics of the Protein Energetic Landscape
[Figure: energy as a function of the space of conformations – smooth? rugged? Images by Ken Dill]
Outline
• Introduction
• Local Minimization Methods (derivative-based)
  – Gradient (first-order) methods
  – Newton (second-order) methods
• Monte-Carlo Sampling (MC)
  – Introduction to MC methods
  – Markov-chain MC methods (MCMC)
  – Escaping local minima
Local Minimization Allows the Correction of Minor Local Errors in Structural Models
Example: removing clashes from X-ray models
*Adapted from slides by Chen Keasar, Ben-Gurion University
What kind of minima do we want?
The path to the closest local minimum = local minimization
*Adapted from slides by Chen Keasar, Ben-Gurion University
A Little Math – Gradients and Hessians
Gradients and Hessians generalize the first and second derivatives (respectively) of multi-variate scalar functions ( = functions from vectors to scalars).

Energy = f(x1, y1, z1, … , xn, yn, zn)

Gradient – the vector of first partial derivatives with respect to every atom coordinate:

∇E = (∂E/∂x1, ∂E/∂y1, ∂E/∂z1, …, ∂E/∂xn, ∂E/∂yn, ∂E/∂zn)

Hessian – the 3n × 3n matrix of second partial derivatives; its (i, j) block for atoms i and j is:

H_ij = [ ∂²E/∂xi∂xj   ∂²E/∂xi∂yj   ∂²E/∂xi∂zj
         ∂²E/∂yi∂xj   ∂²E/∂yi∂yj   ∂²E/∂yi∂zj
         ∂²E/∂zi∂xj   ∂²E/∂zi∂yj   ∂²E/∂zi∂zj ]
Analytical Energy Gradient (i) Cartesian Coordinates
Energy, work and force: recall that energy ( = work) is defined as force integrated over distance. The energy gradient in Cartesian coordinates therefore gives (minus) the vector of forces that act upon the atoms (but this is not exactly so for statistical energy functions, which aim at the free energy ΔG).

E = f(x1, y1, z1, … , xn, yn, zn)

∇E = (∂E/∂x1, ∂E/∂y1, ∂E/∂z1, …, ∂E/∂xn, ∂E/∂yn, ∂E/∂zn)

Example: Van der Waals energy between pairs of atoms – O(n²) pairs:

E_VdW = Σ_{i,j} ( A/R_ij¹² − B/R_ij⁶ )

∂E_VdW/∂R_ij = −12·A/R_ij¹³ + 6·B/R_ij⁷

R_ij = √( (x_i − x_j)² + (y_i − y_j)² + (z_i − z_j)² )
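The pair term and its derivative can be sketched in Python; A and B here are arbitrary illustration values, not parameters of any real force-field:

```python
import math

# Van der Waals energy for one atom pair, as on the slide:
#   E(R) = A/R^12 - B/R^6,  dE/dR = -12*A/R^13 + 6*B/R^7
# A and B are placeholder constants for illustration only.
A, B = 1.0, 1.0

def pair_distance(p, q):
    # R_ij = sqrt((xi - xj)^2 + (yi - yj)^2 + (zi - zj)^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def vdw_energy(r):
    return A / r ** 12 - B / r ** 6

def vdw_dEdR(r):
    return -12.0 * A / r ** 13 + 6.0 * B / r ** 7
```

With A = B = 1 the derivative vanishes at R = 2^(1/6), the familiar Lennard-Jones-style minimum, and it can be sanity-checked against a finite difference.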
Analytical Energy Gradient (ii) Internal Coordinates (torsions, etc.)
Enrichment: transforming a gradient between Cartesian and internal coordinates (see Abe, Braun, Noguti and Gō, 1984; Wedemeyer and Baker, 2003)

E = f(φ1, ψ1, ω1, χ11, χ12, …)

∇E = (∂E/∂φ1, ∂E/∂ψ1, ∂E/∂ω1, ∂E/∂χ11, ∂E/∂χ12, …)

Note: for simplicity, bond lengths and bond angles are often ignored.

Consider an infinitesimal rotation of a vector r around a unit vector n. From physical mechanics, it can be shown that the resulting change in r is along n × r (cross product – right-hand rule).

Using the fold-tree (previous lesson), we can recursively propagate changes in internal coordinates to the whole structure (see Wedemeyer and Baker 2003).

adapted from image by Sunil Singh http://cnx.org/content/m14014/1.9/
Gradient Calculations – Cartesian vs. Internal Coordinates
For some terms, gradient computation is simpler and more natural in Cartesian coordinates; for others it is harder:
• Distance / Cartesian dependent: Van der Waals term; electrostatics; solvation
• Internal-coordinates dependent: bond length and angle; Ramachandran and Dunbrack terms (in Rosetta)
• Combination: hydrogen bonds (in some force-fields)
Reminder: internal coordinates provide a natural distinction between soft constraints (flexibility of φ/ψ torsion angles) and hard constraints with steep gradients (fixed length of covalent bonds). The energy landscape in Cartesian coordinates is more rugged.
Analytical vs. Numerical Gradient Calculations
• Analytical solutions require a closed-form algebraic formulation of the energy score
• Numerical solutions try to approximate the gradient (or Hessian)
  – Simple example: f'(x) ≈ [f(x + h) − f(x)] / h for a small h
  – Another example: the Secant method (soon)
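As a minimal sketch of the numerical approach, a central difference usually approximates the derivative much better than the one-sided f(x + h) − f(x):

```python
def numerical_derivative(f, x, h=1e-6):
    # Central difference: error is O(h^2), versus O(h) for the
    # one-sided difference (f(x + h) - f(x)) / h.
    return (f(x + h) - f(x - h)) / (2.0 * h)
```

For example, for f(x) = x² the estimate at x = 3 is very close to the true derivative 6.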
Outline
• Introduction
• Local Minimization Methods (derivative-based)
  – Gradient (first-order) methods
  – Newton (second-order) methods
• Monte-Carlo Sampling (MC)
  – Introduction to MC methods
  – Markov-chain MC methods (MCMC)
  – Escaping local minima
Gradient Descent Minimization Algorithm: Sliding Down an Energy Gradient
[Figure by Ken Dill: an energy landscape with a local minimum and a good ( = global) minimum]
Gradient Descent – System Description
1. Coordinates vector (Cartesian or internal coordinates): X = (x1, x2, …, xn)
2. Differentiable energy function: E(X)
3. Gradient vector: ∇E(X) = (∂E/∂x1, ∂E/∂x2, …, ∂E/∂xn)
*Adapted from slides by Chen Keasar, Ben-Gurion University
Gradient Descent Minimization Algorithm:
Parameters: λ = step size; ε = convergence threshold
• x = random starting point
• While ‖∇E(x)‖ > ε:
  – Compute ∇E(x)
  – x_new = x − λ·∇E(x)
• Line search: find the best step size λ in order to minimize E(x_new) (discussion later)
Note on the convergence condition: at a local minimum the gradient must be zero (but not always the other way around).
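The loop above can be sketched in Python on a toy quadratic energy; the fixed step size λ stands in for the line search, and the energy function is a placeholder for illustration:

```python
def grad_E(p):
    # Gradient of the toy energy E(x, y) = (x - 1)^2 + 2*(y + 2)^2
    x, y = p
    return (2.0 * (x - 1.0), 4.0 * (y + 2.0))

def gradient_descent(p, lam=0.1, eps=1e-8, max_iter=10000):
    for _ in range(max_iter):
        g = grad_E(p)
        if (g[0] ** 2 + g[1] ** 2) ** 0.5 <= eps:   # ||grad E(x)|| <= eps
            break
        p = (p[0] - lam * g[0], p[1] - lam * g[1])  # x_new = x - lam * grad
    return p
```

Starting anywhere, the iterates contract toward the minimum at (1, −2).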
Line Search Methods – Solving argmin_λ E[x − λ·∇E(x)]:
(1) This is also an optimization problem, but in one dimension…
(2) Inexact solutions are probably sufficient
• Interval bracketing (e.g., golden section, parabolic interpolation, Brent's search):
  – Bracketing the local minimum by intervals of decreasing length
  – Always finds a local minimum
• Backtracking (e.g., with the Armijo / Wolfe conditions):
  – Multiply the step-size λ by c < 1, until some condition is met
  – Variations: λ can also increase
• 1-D Newton and Secant methods: we will talk about these soon…
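The backtracking variant can be sketched in one dimension with the Armijo condition; the constants c = 1e-4 and shrink = 0.5 are typical illustrative choices, not prescribed by the slide:

```python
def armijo_step(E, dE, x, lam0=1.0, c=1e-4, shrink=0.5, max_halvings=50):
    # Shrink lam until the Armijo sufficient-decrease condition holds:
    #   E(x + lam*d) <= E(x) + c * lam * E'(x) * d,
    # where d = -E'(x) is the (steepest) descent direction.
    g = dE(x)
    d = -g
    lam = lam0
    for _ in range(max_halvings):
        if E(x + lam * d) <= E(x) + c * lam * g * d:
            return lam
        lam *= shrink
    return lam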
![Page 27: 3. Optimization Methods for Molecular Modeling by Barak Raveh](https://reader034.vdocuments.net/reader034/viewer/2022052603/56649e3f5503460f94b2fbed/html5/thumbnails/27.jpg)
The (very common) problem: a narrow, winding “valley” in the energy landscape The narrow valley results in miniscule, zigzag steps
The (very common) problem: a narrow, winding “valley” in the energy landscape The narrow valley results in miniscule, zigzag steps
2-D Rosenbrock’s Function: a Banana Shaped ValleyPathologically Slow Convergence for Gradient Descent
100 iterations
1000 iterations
0 iterations
10 iterations
![Page 28: 3. Optimization Methods for Molecular Modeling by Barak Raveh](https://reader034.vdocuments.net/reader034/viewer/2022052603/56649e3f5503460f94b2fbed/html5/thumbnails/28.jpg)
(One) Solution: Conjugate Gradient Descent• Use a (smart) linear combination of gradients from previous
iterations to prevent zigzag motion
Parameters: λ = step size ; = convergence threshold
• x0 = random starting point• Λ0 = (x0)• While Λi >
– Λi+1 = (xi) + βi∙Λi– choice of βi is important
– Xi+1 = xi + λ ∙ Λi • Line search: adjust step size λ to
minimize E(Xi+1)
gradient descent
Conjugated gradient descent
• The new gradient is “A-orthogonal” to all previous search direction, for exact line search• Works best when the surface is approximately quadratic near the minimum (convergence in N iterations),
otherwise need to reset the search every N steps (N = dimension of space)
• The new gradient is “A-orthogonal” to all previous search direction, for exact line search• Works best when the surface is approximately quadratic near the minimum (convergence in N iterations),
otherwise need to reset the search every N steps (N = dimension of space)
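On a quadratic surface E(x) = ½xᵀAx − bᵀx the exact line search has a closed form, and conjugate gradient converges in N iterations, as the slide notes. A minimal numpy sketch (Fletcher–Reeves choice of βᵢ):

```python
import numpy as np

def conjugate_gradient(A, b, x0, eps=1e-10, max_iter=None):
    # Minimizes E(x) = 0.5 x^T A x - b^T x (A symmetric positive definite),
    # i.e. solves A x = b; converges in len(b) iterations in exact arithmetic.
    x = x0.astype(float)
    g = A @ x - b                        # gradient of E
    d = -g                               # first direction = steepest descent
    n = len(b) if max_iter is None else max_iter
    for _ in range(n):
        if np.linalg.norm(g) <= eps:
            break
        Ad = A @ d
        alpha = (g @ g) / (d @ Ad)       # exact line search on a quadratic
        x = x + alpha * d
        g_new = g + alpha * Ad
        beta = (g_new @ g_new) / (g @ g) # Fletcher-Reeves beta_i
        d = -g_new + beta * d
        g = g_new
    return x
```

For general (non-quadratic) energies the same recursion is used with a numerical line search and a periodic restart.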
Outline
• Introduction
• Local Minimization Methods (derivative-based)
  – Gradient (first-order) methods
  – Newton (second-order) methods
• Monte-Carlo Sampling (MC)
  – Introduction to MC methods
  – Markov-chain MC methods (MCMC)
  – Escaping local minima
Root Finding – when is f(x) = 0?
Taylor's Series
First order approximation: f(x) ≈ f(a) + f'(a)·(x − a)
Second order approximation: f(x) ≈ f(a) + f'(a)·(x − a) + ½·f''(a)·(x − a)²
The full series: f(x) = Σ_{n=0..∞} f⁽ⁿ⁾(a)/n! · (x − a)ⁿ
Example (a = 0): eˣ = Σ_{n=0..∞} xⁿ/n! = 1 + x + x²/2! + x³/3! + …
Taylor's Approximation: f(x) = eˣ
Taylor's Approximation of f(x) = sin(x)^(2x) at x = 1.5
[Plot: sin(x)^(2x) on x ∈ [−3, 3], y ∈ [−2, 2]]
Taylor's Approximation of f(x) = sin(x)^(2x) at x = 1.5
[Plot: sin(x)^(2x) with its 1st-order Taylor approximation]
Taylor's Approximation of f(x) = sin(x)^(2x) at x = 1.5
[Plot: sin(x)^(2x) with its 1st- and 2nd-order Taylor approximations]
Taylor's Approximation of f(x) = sin(x)^(2x) at x = 1.5
[Plot: sin(x)^(2x) with its 1st-, 2nd- and 3rd-order Taylor approximations]
From Taylor's Series to Root Finding (one dimension)
First order approximation: f(x) ≈ f(a) + f'(a)·(x − a)
Root finding by Taylor's approximation: set 0 = f(a) + f'(a)·(x − a), which gives
x = a − f(a) / f'(a)
Newton-Raphson Method for Root Finding (one dimension)
1. Start from a random x0
2. While not converged, update x with Taylor's series:
   x_{n+1} = x_n − f(x_n) / f'(x_n)
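The update rule transcribes directly into Python; as an illustration, finding √2 as a root of f(x) = x² − 2:

```python
def newton_raphson(f, df, x0, eps=1e-12, max_iter=100):
    # x_{n+1} = x_n - f(x_n) / f'(x_n)
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) <= eps:
            break
        x = x - fx / df(x)
    return x
```

From x0 = 1 the iterates 1.5, 1.4167, 1.41422, … approach √2 in a handful of rounds, illustrating the quadratic convergence discussed next.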
Newton-Raphson: Quadratic Convergence Rate
THEOREM: Let x_root be a "nice" root of f(x). There exists a neighborhood of some size Δ around x_root in which Newton's method converges towards x_root quadratically ( = the error decreases quadratically in each round).
Image from http://www.codecogs.com/d-ox/maths/rootfinding/newton.php
The Secant Method (one dimension)
• Just like Newton-Raphson, but approximate the derivative by drawing a secant line through the two previous points:
  f'(x1) ≈ [f(x1) − f(x0)] / (x1 − x0)
Secant algorithm:
1. Start from two random points: x0, x1
2. While not converged:
   x_{n+1} = x_n − f(x_n)·(x_n − x_{n−1}) / [f(x_n) − f(x_{n−1})]
• Theoretical convergence rate: the golden ratio (~1.62)
• Often faster in practice: no gradient calculations
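A sketch of the same root-finding problem with the secant update; the loop only keeps the last two iterates and never evaluates a derivative:

```python
def secant(f, x0, x1, eps=1e-12, max_iter=100):
    # x_{n+1} = x_n - f(x_n) * (x_n - x_{n-1}) / (f(x_n) - f(x_{n-1}))
    f0 = f(x0)
    for _ in range(max_iter):
        f1 = f(x1)
        if abs(f1) <= eps:
            break
        x0, x1, f0 = x1, x1 - f1 * (x1 - x0) / (f1 - f0), f1
    return x1
```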
Newton's Method: from Root Finding to Minimization
Second order approximation of f(x):
f(x) ≈ f(a) + f'(a)·(x − a) + ½·f''(a)·(x − a)²
Take the derivative (by x); the minimum of the approximation is reached when this derivative is zero:
0 = f'(a) + f''(a)·(x − a)
x = a − f'(a) / f''(a)
• So… this is just root finding over the derivative (which makes sense, since at a local minimum the gradient is zero)
Newton's Method for Minimization:
1. Start from a random vector x = x0
2. While not converged, update x with Taylor's series:
   x_new = x − f'(x) / f''(x)
Notes:
• If f''(x) > 0 at the converged point, then x is a local minimum point
• We can choose a step size other than one
Newton's Method for Minimization: Higher Dimensions
1. Start from a random vector x = x0
2. While not converged, update x with Taylor's series:
   x_new = x − H⁻¹·∇E(x)
Notes:
• H is the Hessian matrix (the generalization of the second derivative to high dimensions)
• We can choose a different step size using line search (see previous slides)
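A numpy sketch of the high-dimensional update; solving H·dx = ∇E is cheaper and more stable than explicitly inverting H. The toy quadratic energy below is a placeholder for illustration — on a quadratic, one Newton step lands exactly on the minimum:

```python
import numpy as np

def newton_minimize(grad, hess, x0, eps=1e-10, max_iter=50):
    # x_new = x - H^{-1} grad E(x), with H dx = grad solved directly
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        x = x - np.linalg.solve(hess(x), g)
    return x

# Toy quadratic energy E(x, y) = (x - 1)^2 + 2*(y + 2)^2 + x*y
grad = lambda p: np.array([2.0 * (p[0] - 1.0) + p[1],
                           4.0 * (p[1] + 2.0) + p[0]])
hess = lambda p: np.array([[2.0, 1.0], [1.0, 4.0]])
```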
Generalizing the Secant Method to High Dimensions: Quasi-Newton Methods
• Calculating the Hessian (2nd derivative) analytically is expensive, so quasi-Newton methods build a numerical approximation of it (or of its inverse) from successive gradients
• Popular methods:
  – DFP (Davidon – Fletcher – Powell)
  – BFGS (Broyden – Fletcher – Goldfarb – Shanno)
  – Combinations
Timeline: Newton-Raphson (17th century) → Secant method → DFP (1959, 1963) → Broyden method for roots (1965) → BFGS (1970)
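A compact numpy sketch of the idea behind BFGS: maintain an inverse-Hessian estimate and update it from gradient differences, with a simple Armijo backtracking line search. This is an illustration of the update formula, not a production implementation:

```python
import numpy as np

def bfgs_minimize(f, grad, x0, eps=1e-8, max_iter=200):
    n = len(x0)
    x = np.array(x0, dtype=float)
    Binv = np.eye(n)                 # running estimate of H^{-1}
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:
            break
        d = -Binv @ g                # quasi-Newton direction
        lam = 1.0                    # Armijo backtracking line search
        while f(x + lam * d) > f(x) + 1e-4 * lam * (g @ d) and lam > 1e-12:
            lam *= 0.5
        s = lam * d                  # step taken
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                # change in gradient
        sy = s @ y
        if sy > 1e-12:               # curvature condition; BFGS update
            rho = 1.0 / sy
            I = np.eye(n)
            Binv = ((I - rho * np.outer(s, y)) @ Binv @
                    (I - rho * np.outer(y, s)) + rho * np.outer(s, s))
        x, g = x_new, g_new
    return x
```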
Some More Resources on Gradient and Newton Methods
• Conjugate gradient descent: http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf
• Quasi-Newton methods: http://www.srl.gatech.edu/education/ME6103/Quasi-Newton.ppt
• HUJI course on non-linear optimization by Benjamin Yakir: http://pluto.huji.ac.il/~msby/opt-files/optimization.html
• Line search:
  – http://pluto.huji.ac.il/~msby/opt-files/opt04.pdf
  – http://www.physiol.ox.ac.uk/Computing/Online_Documentation/Matlab/toolbox/nnet/backpr59.html
• Wikipedia…
Outline
• Introduction
• Local Minimization Methods (derivative-based)
  – Gradient (first-order) methods
  – Newton (second-order) methods
• Monte-Carlo Sampling (MC)
  – Introduction to MC methods
  – Markov-chain MC methods (MCMC)
  – Escaping local minima
Harder Goal: Move from an Arbitrary Model to a Correct One
Example: predict a protein structure from its amino-acid sequence.
Arbitrary starting point
*Adapted from slides by Chen Keasar, Ben-Gurion University
[Slides 48–60: snapshots of the same minimization run after 10, 100, 200, 400, 800, 1000, 1200, 1400, 1600, 1800, 2000, 4000 and 7000 iterations. This time it succeeded; in many cases it does not.]
*Adapted from slides by Chen Keasar, Ben-Gurion University
What kind of paths do we want?
The path to the global minimum
*Adapted from slides by Chen Keasar, Ben-Gurion University
Monte-Carlo Methods (a.k.a. MC simulations, MC sampling or MC search)
• Monte-Carlo methods ("casino" methods) are a very general term for estimations that are based on a series of random samples
  – Samples can be dependent or independent
  – MC physical simulations are most famous for their role in the Manhattan Project (the uncle of the Polish mathematician Stanisław Marcin Ulam was said to be a heavy gambler)
Example: Estimating π by Independent Monte-Carlo Samples (I)
Suppose we throw darts randomly (and uniformly) at the square:
Algorithm:
For i = 1..ntrials:
    x = (random # in [0..r])
    y = (random # in [0..r])
    distance = sqrt(x² + y²)
    if distance ≤ r: hits++
End
Output: π ≈ 4 · hits / ntrials
Adapted from course slides by Craig Douglas
http://www.chem.unl.edu/zeng/joy/mclab/mcintro.html
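The dart-throwing algorithm above in Python; the seed is fixed only to make the run reproducible, and comparing squared distances avoids the square root:

```python
import random

def estimate_pi(ntrials, r=1.0, seed=42):
    rng = random.Random(seed)
    hits = 0
    for _ in range(ntrials):
        x = rng.uniform(0.0, r)
        y = rng.uniform(0.0, r)
        if x * x + y * y <= r * r:      # dart landed inside the quarter circle
            hits += 1
    return 4.0 * hits / ntrials         # area ratio: (pi r^2 / 4) / r^2
```

With 100,000 trials the estimate typically lands within a few thousandths of π; the error shrinks like 1/√ntrials.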
Outline
• Introduction
• Local Minimization Methods (derivative-based)
  – Gradient (first-order) methods
  – Newton (second-order) methods
• Monte-Carlo Sampling (MC)
  – Introduction to MC methods
  – Markov-chain MC methods (MCMC)
  – Escaping local minima
![Page 79: 3. Optimization Methods for Molecular Modeling by Barak Raveh](https://reader034.vdocuments.net/reader034/viewer/2022052603/56649e3f5503460f94b2fbed/html5/thumbnails/79.jpg)
Drunk Sailor’s Random Walk http://www.chem.uoa.gr/applets/AppletSailor/Appl_Sailor2.html
What is the probability that the sailor will leave through each exit?
(figure: at each intersection the sailor steps in each of the four directions with probability 0.25)
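The applet's street map is not reproduced here, but the idea can be simulated on an assumed one-dimensional street with an exit at each end (a minimal sketch; the geometry and names are mine):

```python
import random

def left_exit_probability(start, length, trials=20_000):
    """Drunk sailor on a street of `length` blocks: from each position
    step left or right with probability 0.5 until reaching an exit
    (position 0 or `length`); estimate the chance of the left exit."""
    left_exits = 0
    for _ in range(trials):
        pos = start
        while 0 < pos < length:
            pos += random.choice((-1, 1))
        if pos == 0:
            left_exits += 1
    return left_exits / trials

random.seed(1)
# Starting mid-street, both exits are equally likely (probability ~0.5).
print(left_exit_probability(start=5, length=10))
```

From an off-center start the exit probabilities differ (the classic gambler's-ruin result); the symmetric 0.25 per exit in the figure follows from the sailor starting at the center.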
![Page 80: 3. Optimization Methods for Molecular Modeling by Barak Raveh](https://reader034.vdocuments.net/reader034/viewer/2022052603/56649e3f5503460f94b2fbed/html5/thumbnails/80.jpg)
Markov-Chain Monte Carlo (MCMC)
• Markov-Chain: future state depends only on present state
• Markov-Chain Monte-Carlo on Graphs: we randomly walk from node to node with a certain probability that depends only on our current location.
(figure: example transition graph with edge probabilities 0.5, 0.5, 0.25, 0.75)
![Page 81: 3. Optimization Methods for Molecular Modeling by Barak Raveh](https://reader034.vdocuments.net/reader034/viewer/2022052603/56649e3f5503460f94b2fbed/html5/thumbnails/81.jpg)
Analysis of a Two-Node Walk
(figure: two nodes A and B; transition probabilities: A→A 0.75, A→B 0.25, B→A 0.5, B→B 0.5)
After n rounds, what is the probability of being in node A?
Assume Pr_{n+1}(A) ≈ Pr_n(A) for a large n:
Pr_{n+1}(A) = Pr_n(A) × 0.75 + Pr_n(B) × 0.5
0.25 × Pr_n(A) = Pr_n(B) × 0.5
Pr_n(A) = 2 × Pr_n(B)
So: Pr_∞(A) = ⅔, Pr_∞(B) = ⅓
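The stationary distribution derived above can be checked numerically by iterating the update rule (a sketch; the transition probabilities are those of the A/B walk above):

```python
def iterate_chain(n_rounds, pr_a=1.0, pr_b=0.0):
    """Propagate the two-node chain with transitions
    A->A 0.75, A->B 0.25, B->A 0.5, B->B 0.5."""
    for _ in range(n_rounds):
        # Pr_{n+1}(A) = Pr_n(A)*0.75 + Pr_n(B)*0.5, and likewise for B
        pr_a, pr_b = pr_a * 0.75 + pr_b * 0.5, pr_a * 0.25 + pr_b * 0.5
    return pr_a, pr_b

print(iterate_chain(100))  # converges to (2/3, 1/3) regardless of the start
```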
![Page 82: 3. Optimization Methods for Molecular Modeling by Barak Raveh](https://reader034.vdocuments.net/reader034/viewer/2022052603/56649e3f5503460f94b2fbed/html5/thumbnails/82.jpg)
After a long run, we want to find low-energy conformations, with high probability
Sampling Protein Conformations with MCMC
Protein image taken from Chemical Biology, 2006
Markov-Chain Monte-Carlo (MCMC) with “proposals”:
1. Perturb structure to create a “proposal”
2. Accept or reject the new conformation with a “certain” probability
But how?
A (physically) natural* choice is the Boltzmann distribution:

Pr(i) = e^(−E_i / k_B T) / Z

E_i = energy of state i
k_B = Boltzmann constant
T = temperature
Z = “Partition Function” constant

* In theory, the Boltzmann distribution is a bit problematic in non-gas phase, but never mind that for now…
![Page 83: 3. Optimization Methods for Molecular Modeling by Barak Raveh](https://reader034.vdocuments.net/reader034/viewer/2022052603/56649e3f5503460f94b2fbed/html5/thumbnails/83.jpg)
The Metropolis-Hastings Criterion
• Boltzmann Distribution: Pr(i) = e^(−E_i / k_B T) / Z
• The energy score and temperature are computed (quite) easily
• The “only” problem is calculating Z (the “partition function”) – this requires summing over all states.
• Metropolis showed that MCMC will converge to the true Boltzmann distribution if we accept a new proposal with probability min(1, e^(−ΔE / k_B T)), where ΔE is the energy difference between the proposed and current states – note that Z cancels out of this ratio.
"Equations of State Calculations by Fast Computing Machines" – Metropolis, N. et al., Journal of Chemical Physics (1953)
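The criterion can be sketched on a toy discrete system; note the sampler needs only energy differences, never Z (the three-state landscape and all names are my illustrative choices):

```python
import math
import random

KT = 1.0  # k_B * T in energy units

def metropolis_accept(e_old, e_new, kT=KT):
    """Always accept downhill moves; accept uphill moves with
    probability exp(-dE / kT). Z cancels out of the ratio."""
    d_e = e_new - e_old
    return d_e <= 0 or random.random() < math.exp(-d_e / kT)

# Three states with energies 0, 1, 2; symmetric uniform proposals.
energies = [0.0, 1.0, 2.0]
counts = [0, 0, 0]
state = 0
random.seed(0)
for _ in range(200_000):
    proposal = random.randrange(3)
    if metropolis_accept(energies[state], energies[proposal]):
        state = proposal
    counts[state] += 1

z = sum(math.exp(-e / KT) for e in energies)  # tractable for 3 states only!
print([round(c / sum(counts), 3) for c in counts])          # sampled frequencies
print([round(math.exp(-e / KT) / z, 3) for e in energies])  # Boltzmann weights
```

The sampled visit frequencies match the Boltzmann weights even though the sampler itself never evaluated Z.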
![Page 84: 3. Optimization Methods for Molecular Modeling by Barak Raveh](https://reader034.vdocuments.net/reader034/viewer/2022052603/56649e3f5503460f94b2fbed/html5/thumbnails/84.jpg)
If we run till infinity, with good perturbations, we will visit every conformation according to the Boltzmann distribution
Sampling Protein Conformations with Metropolis-Hastings MCMC
Protein image taken from Chemical Biology, 2006
Markov-Chain Monte-Carlo (MCMC) with “proposals”:
1. Perturb structure to create a “proposal”
2. Accept or reject the new conformation by the Metropolis criterion
3. Repeat for many iterations
But we just want to find the energy minimum. If we do our perturbations in a smart manner, we can still cover the relevant (realistic, low-energy) parts of the search space
![Page 85: 3. Optimization Methods for Molecular Modeling by Barak Raveh](https://reader034.vdocuments.net/reader034/viewer/2022052603/56649e3f5503460f94b2fbed/html5/thumbnails/85.jpg)
Outline
• Introduction
• Local Minimization Methods (derivative-based)
– Gradient (first order) methods
– Newton (second order) methods
• Monte-Carlo Sampling (MC)
– Introduction to MC methods
– Markov-chain MC methods (MCMC)
– Escaping local-minima
![Page 86: 3. Optimization Methods for Molecular Modeling by Barak Raveh](https://reader034.vdocuments.net/reader034/viewer/2022052603/56649e3f5503460f94b2fbed/html5/thumbnails/86.jpg)
*Adapted from slides by Chen Keasar, Ben-Gurion University
Getting stuck in a local minimum
![Page 92: 3. Optimization Methods for Molecular Modeling by Barak Raveh](https://reader034.vdocuments.net/reader034/viewer/2022052603/56649e3f5503460f94b2fbed/html5/thumbnails/92.jpg)
Trick 1: Simulated Annealing
The Boltzmann distribution depends on the in-silico temperature T:
• At low temperatures, we get stuck in local minima (the acceptance probability is nearly zero if the energy rises even slightly)
• At high temperatures, the acceptance probability is always close to 1 (we jump wildly between conformations)
In simulated annealing, we gradually decrease (“cool down”) the virtual temperature factor, until we converge to a minimum point
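A minimal sketch of the cooling idea on a rough 1-D landscape (the geometric schedule, all parameters, and the test function are illustrative choices, not a prescribed protocol):

```python
import math
import random

def simulated_annealing(energy, x0, t_start=5.0, t_end=0.01,
                        n_steps=20_000, step=0.5):
    """Metropolis sampling while geometrically cooling T from
    t_start to t_end; track the best conformation seen."""
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    cooling = (t_end / t_start) ** (1.0 / n_steps)
    t = t_start
    for _ in range(n_steps):
        x_new = x + random.uniform(-step, step)
        e_new = energy(x_new)
        # Metropolis criterion at the current temperature t
        if e_new <= e or random.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling
    return best_x, best_e

# Rough landscape: a parabola plus ripples; global minimum near x = 2.15.
rough = lambda x: (x - 2.0) ** 2 + math.sin(8.0 * x)
random.seed(0)
x, e = simulated_annealing(rough, x0=-5.0)
print(x, e)
```

The early high-temperature phase crosses barriers freely; the late low-temperature phase behaves like local minimization.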
![Page 93: 3. Optimization Methods for Molecular Modeling by Barak Raveh](https://reader034.vdocuments.net/reader034/viewer/2022052603/56649e3f5503460f94b2fbed/html5/thumbnails/93.jpg)
Trick 2: Monte-Carlo with Energy Minimization (MCM)Scheraga et al., 1987
• Derivative-based methods (Gradient Descent, Newton’s method, DFP) are excellent at finding nearby local minima
• In Rosetta, Monte-Carlo is used for bigger jumps between nearby local minima
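A 1-D sketch of the perturb-minimize-accept cycle (an illustration of the idea, not Rosetta's or Scheraga's actual implementation; the crude gradient-descent minimizer and all parameters are mine):

```python
import math
import random

def minimize(energy_grad, x, lr=0.01, n=200):
    """Crude gradient descent to the nearby local minimum."""
    for _ in range(n):
        x -= lr * energy_grad(x)
    return x

def mcm(energy, energy_grad, x0, kT=0.5, n_cycles=50, step=1.0):
    """Monte-Carlo with Minimization: big random jump, slide into the
    nearby local minimum, then Metropolis accept/reject the minimum."""
    x = minimize(energy_grad, x0)
    e = energy(x)
    for _ in range(n_cycles):
        x_new = minimize(energy_grad, x + random.uniform(-step, step))
        e_new = energy(x_new)
        if e_new <= e or random.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
    return x, e

rough = lambda x: (x - 2.0) ** 2 + math.sin(8.0 * x)
rough_grad = lambda x: 2.0 * (x - 2.0) + 8.0 * math.cos(8.0 * x)
random.seed(0)
x, e = mcm(rough, rough_grad, x0=-2.0)
print(x, e)
```

Because every Metropolis decision compares two local minima, the chain hops between basins instead of diffusing inside one.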
![Page 94: 3. Optimization Methods for Molecular Modeling by Barak Raveh](https://reader034.vdocuments.net/reader034/viewer/2022052603/56649e3f5503460f94b2fbed/html5/thumbnails/94.jpg)
Trick 3: Switching between Low-Resolution (smooth) and High-Resolution (rugged) energy functions
• In Rosetta, the Centroid energy function is used to quickly sample large perturbations
• The Full-Atom energy function is used for fine tuning
(figure: energy vs. conformations, starting from START; a smooth low-resolution landscape overlaid on the rugged high-resolution one)
![Page 95: 3. Optimization Methods for Molecular Modeling by Barak Raveh](https://reader034.vdocuments.net/reader034/viewer/2022052603/56649e3f5503460f94b2fbed/html5/thumbnails/95.jpg)
Trick 4: Repulsive Energy Ramping
• The repulsive VdW energy term is the main reason for getting stuck
• Start simulations with a lowered repulsive energy term, and gradually ramp it up during the simulation
• Similar rationale to Simulated Annealing

Trick 5: Modulating Perturbation Step Size
• Too small a perturbation step can lead to a very slow simulation → we remain stuck in the local minimum
• Too large a perturbation step can lead to clashes and a very high rejection rate → we remain stuck in the same local minimum
• We can increase or decrease the step size until a fixed rejection rate (for example, 50%) is achieved
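The step-size rule in Trick 5 can be sketched as a tiny feedback controller (a hypothetical helper; the 50% target and the 1.1 adjustment factor are example values):

```python
def adapt_step(step, n_accepted, n_proposed, target=0.5, factor=1.1):
    """Nudge the perturbation step size toward a target acceptance rate:
    shrink the step when too many proposals are rejected (clashes),
    grow it when almost everything is accepted (steps too timid)."""
    rate = n_accepted / n_proposed
    return step / factor if rate < target else step * factor

step = 1.0
step = adapt_step(step, n_accepted=10, n_proposed=50)  # rate 0.2 -> shrink
step = adapt_step(step, n_accepted=45, n_proposed=50)  # rate 0.9 -> grow
print(step)  # back near 1.0 after one shrink and one grow
```

Calling this every few hundred MC steps keeps the acceptance rate hovering near the target.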
![Page 96: 3. Optimization Methods for Molecular Modeling by Barak Raveh](https://reader034.vdocuments.net/reader034/viewer/2022052603/56649e3f5503460f94b2fbed/html5/thumbnails/96.jpg)
Monte-Carlo in Rosetta
• In Rosetta, it is common to use any of the above tricks, MCM in particular
• In general, a single simulation is pretty short (no more than a few minutes), but is repeated k independent times – getting k sampled “decoys”
– We use energy scoring to decide which is the best decoy structure – hopefully this is the near-native solution
– Low-resolution sampling is often used to create a very large number of initial decoys, and only the best ones are moved to high-resolution minimization
![Page 97: 3. Optimization Methods for Molecular Modeling by Barak Raveh](https://reader034.vdocuments.net/reader034/viewer/2022052603/56649e3f5503460f94b2fbed/html5/thumbnails/97.jpg)
Summary
• Derivative-based methods can effectively reach nearby energy minima
• Metropolis-Hastings MCMC can recover the Boltzmann distribution in some applications, but for protein folding we cannot hope to cover the huge conformational space or recover the Boltzmann distribution.
• Still, useful tricks help us find good low-energy, near-native conformations (Simulated Annealing, Monte-Carlo with Minimization, Centroid mode, Ramping, Step-size modulation, and other smart sampling steps).
• We didn’t cover some very popular non-linear optimization methods:
– Linear and Convex Programming; Expectation Maximization algorithm; Branch and Bound algorithms; Dead-End Elimination (Lesson 4); Mean Field approach; and more…