Optimization methods
Morten Nielsen
Department of Systems Biology, DTU
*Adapted from slides by Chen Keasar, Ben-Gurion University
Minimization
The path to the closest local minimum = local minimization
Minimization
The path to the global minimum
Outline
• Optimization procedures
  – Gradient descent
  – Monte Carlo
• Overfitting
  – cross-validation
• Method evaluation
Linear methods. Error estimate
[Figure: linear function with inputs I1 and I2, weights w1 and w2, and output o]
Gradient descent (from Wikipedia)
Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if
$b = a - \varepsilon \nabla F(a)$
for $\varepsilon > 0$ a small enough number, then F(b) < F(a) (unless the gradient at a is zero)
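A minimal sketch of this observation in Python. The function F(x) = (x − 3)², the starting point and the step size `epsilon` are assumptions chosen for illustration; none of them come from the slides. Each step moves against the gradient, so the function value should fall at every step.

```python
def F(x):
    """Hypothetical example function with its minimum at x = 3."""
    return (x - 3.0) ** 2


def grad_F(x):
    """Analytical derivative of the example function F."""
    return 2.0 * (x - 3.0)


def gradient_descent(a, epsilon=0.1, n_steps=20):
    """Repeatedly step against the gradient and return all visited points."""
    path = [a]
    for _ in range(n_steps):
        a = a - epsilon * grad_F(a)  # b = a - epsilon * grad F(a)
        path.append(a)
    return path


path = gradient_descent(a=0.0)
values = [F(x) for x in path]
print(f"start F = {values[0]:.3f}, end F = {values[-1]:.5f}")
```

With a step size that is too large (for this F, anything above 1.0) the iterates overshoot and the function value grows instead, which is why the step has to be "small enough".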
Gradient descent (example)
Gradient descent
Weights are changed in the opposite direction of the gradient of the error
Gradient descent (Linear function)
Weights are changed in the opposite direction of the gradient of the error
[Figure: linear function with inputs I1 and I2, weights w1 and w2, and output o]
Gradient descent. Example
Weights are changed in the opposite direction of the gradient of the error
[Figure: linear function with inputs I1 and I2, weights w1 and w2, and output o]
Gradient descent. Doing it yourself
Weights are changed in the opposite direction of the gradient of the error
[Figure: linear function with inputs I1 = 1 and I2 = 0, weights W1 = 0.1 and W2 = 0.1, and output o]
What are the weights after 2 forward (calculate predictions) and backward (update weights) iterations with the given input, and has the error decreased (use ε = 0.1 and t = 1)?
Fill out the table
itr   W1     W2    O
0     0.1    0.1
1
2
What are the weights after 2 forward/backward iterations with the given input, and has the error decreased (use ε = 0.1, t = 1)?
[Figure: linear function with inputs I1 = 1 and I2 = 0, weights W1 = 0.1 and W2 = 0.1, and output o]
Fill out the table
itr   W1     W2    O
0     0.1    0.1   0.1
1     0.19   0.1   0.19
2     0.27   0.1   0.27
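The table can be reproduced with a short script. This is a sketch under an assumption the transcript does not spell out: the output is o = W1·I1 + W2·I2 and the error is E = ½(o − t)², so each weight changes by −ε(o − t)·I_i. That assumption matches the numbers in the table. The function name `train_linear` is made up for illustration.

```python
def train_linear(weights, inputs, target, epsilon, n_iter):
    """Gradient descent on E = 0.5 * (o - t)**2 for a single linear unit.

    Returns one (iteration, weights, output, error) row per iteration,
    starting with the initial weights at iteration 0.
    """
    rows = []
    w = list(weights)
    for itr in range(n_iter + 1):
        # Forward pass: compute the prediction with the current weights.
        o = sum(wi * xi for wi, xi in zip(w, inputs))
        error = 0.5 * (o - target) ** 2
        rows.append((itr, tuple(w), o, error))
        # Backward pass: move each weight against dE/dw_i = (o - t) * I_i.
        w = [wi - epsilon * (o - target) * xi for wi, xi in zip(w, inputs)]
    return rows


rows = train_linear(weights=[0.1, 0.1], inputs=[1, 0], target=1,
                    epsilon=0.1, n_iter=2)
for itr, w, o, error in rows:
    print(f"{itr}  W1={w[0]:.2f}  W2={w[1]:.2f}  O={o:.2f}  E={error:.3f}")
```

W2 never changes because its input I2 is 0, so its gradient is zero. The error falls at every iteration (roughly 0.405, 0.328, 0.266 under the assumed error function), so the answer to the second part of the question is yes.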
Monte Carlo
Because of their reliance on repeated computation of random or pseudo-random numbers, Monte Carlo methods are most suited to calculation by a computer. Monte Carlo methods tend to be used when it is infeasible or impossible to compute an exact result with a deterministic algorithm.
Or when you are too stupid to do the math yourself?
Example: Estimating π by Independent Monte-Carlo Samples
Suppose we throw darts randomly (and uniformly) at the square:
Algorithm:
  hits = 0
  For i = [1..ntrials]
    x = (random# in [0..r])
    y = (random# in [0..r])
    distance = sqrt(x^2 + y^2)
    if distance ≤ r
      hits++
  End
Output: π ≈ 4 · hits / ntrials
Adapted from course slides by Craig Douglas
http://www.chem.unl.edu/zeng/joy/mclab/mcintro.html
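The dart-throwing algorithm translates almost line for line into Python. A minimal sketch: the function name `estimate_pi` and the `seed` argument are illustrative additions, and the factor 4 comes from the quarter circle of radius r covering π/4 of the r × r square.

```python
import random


def estimate_pi(ntrials, r=1.0, seed=None):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(ntrials):
        x = rng.uniform(0.0, r)
        y = rng.uniform(0.0, r)
        # Comparing squared distances avoids the square root.
        if x * x + y * y <= r * r:
            hits += 1
    # (quarter-circle area) / (square area) = pi / 4
    return 4.0 * hits / ntrials


estimate = estimate_pi(ntrials=200_000, seed=1)
print(f"pi is approximately {estimate:.4f}")
```

The statistical error shrinks only as 1/sqrt(ntrials), so each extra digit of precision costs roughly 100 times more samples.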
Estimating π
Sampling Protein Conformations with MCMC (Markov Chain Monte Carlo)
After a long run, we want to find low-energy conformations, with high probability
Protein image taken from Chemical Biology, 2006
Markov-Chain Monte-Carlo (MCMC) with “proposals”:
1. Perturb structure to create a “proposal”
2. Accept or reject new conformation with a “certain” probability
But how?
A (physically) natural* choice is the Boltzmann distribution, proportional to:
$\frac{e^{-E_i / (k_B T)}}{Z}$
E_i = energy of state i
k_B = Boltzmann constant
T = temperature
Z = “Partition Function”
* In theory, the Boltzmann distribution is a bit problematic in non-gas phase, but never mind that for now…
Slides adapted from Barak Raveh
The Metropolis-Hastings Criterion
• Boltzmann distribution: $\frac{e^{-E_i / (k_B T)}}{Z}$
• The energy score and temperature are computed (quite) easily
• The “only” problem is calculating Z (the “partition function”) – this requires summing over all states.
• Metropolis showed that MCMC will converge to the true Boltzmann distribution, if we accept a new proposal with probability $P_{accept} = \min\left(1, e^{-\Delta E / (k_B T)}\right)$, where $\Delta E = E_{new} - E_{old}$
"Equation of State Calculations by Fast Computing Machines" – Metropolis, N. et al. Journal of Chemical Physics (1953)
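The acceptance rule only needs the ratio of two Boltzmann weights, so the partition function Z cancels and never has to be computed. A minimal sketch of the standard Metropolis rule for minimizing an energy: the function name `metropolis_accept` and the single parameter `kT` (the product k_B·T) are illustrative choices, not taken from the slides.

```python
import math
import random


def metropolis_accept(e_old, e_new, kT, rng=random):
    """Return True if a proposal is accepted under the Metropolis criterion.

    Downhill moves (e_new <= e_old) are always accepted; uphill moves are
    accepted with probability exp(-(e_new - e_old) / kT).
    """
    delta_e = e_new - e_old
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


# Usage: an uphill move of 1 energy unit at kT = 1 should be accepted
# in roughly exp(-1), i.e. about 37%, of the attempts.
rng = random.Random(0)
n = 20_000
accepted = sum(metropolis_accept(0.0, 1.0, 1.0, rng) for _ in range(n))
print(f"accepted fraction: {accepted / n:.3f}")
```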
Sampling Protein Conformations with Metropolis-Hastings MCMC
If we run till infinity, with good perturbations, we will visit every conformation according to the Boltzmann distribution
Protein image taken from Chemical Biology, 2006
Markov-Chain Monte-Carlo (MCMC) with “proposals”:
1. Perturb structure to create a “proposal”
2. Accept or reject new conformation by the Metropolis criterion
3. Repeat for many iterations
But we just want to find the energy minimum. If we do our perturbations in a smart manner, we can still cover relevant (realistic, low-energy) parts of the search space
Monte Carlo (Minimization)
[Figure: energy landscape with moves labelled dE<0 and dE>0]
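A sketch of Monte Carlo minimization on a toy problem. The one-dimensional `energy` function, the step size and the temperature are all assumptions made up for illustration; a real application would perturb a protein conformation or a tour instead of a single number. Moves with dE < 0 are always accepted, while moves with dE > 0 are accepted with probability exp(−dE/T), which lets the search climb out of local minima.

```python
import math
import random


def energy(x):
    """Hypothetical rugged landscape: a parabola with superimposed ripples."""
    return 0.1 * x * x + math.sin(3.0 * x)


def mc_minimize(x0, temperature, n_steps, step_size=0.5, seed=0):
    """Metropolis Monte Carlo search; returns the best point seen and its energy."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        # Perturb the current state to create a proposal.
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        d_e = e_new - e
        # dE < 0: always accept. dE > 0: accept with probability exp(-dE/T).
        if d_e < 0 or rng.random() < math.exp(-d_e / temperature):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = mc_minimize(x0=4.0, temperature=0.5, n_steps=5000)
print(f"best x = {best_x:.3f}, best energy = {best_e:.3f}")
```

Keeping track of the best state seen is a design choice for minimization: the walker itself keeps fluctuating at any nonzero temperature, so its final position is not necessarily the lowest one it visited.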
The Traveling Salesman
Adapted from www.mpp.mpg.de/~caldwell/ss11/ExtraTS.pdf
Gibbs sampler. Monte Carlo simulations
RFFGGDRGAPKRGYLDPLIRGLLARPAKLQVKPGQPPRLLIYDASNRATGIPA GSLFVYNITTNKYKAFLDKQ SALLSSDITASVNCAK GFKGEQGPKGEPDVFKELKVHHANENI SRYWAIRTRSGGITYSTNEIDLQLSQEDGQTIE
RFFGGDRGAPKRGYLDPLIRGLLARPAKLQVKPGQPPRLLIYDASNRATGIPAGSLFVYNITTNKYKAFLDKQ SALLSSDITASVNCAK GFKGEQGPKGEPDVFKELKVHHANENI SRYWAIRTRSGGITYSTNEIDLQLSQEDGQTIE
E1 = 5.4
E2 = 5.7: dE > 0; Paccept = 1
E2 = 5.2: dE < 0; 0 < Paccept < 1
Note the sign. Maximization
Monte Carlo Temperature
• What is the Monte Carlo temperature?
• Say dE=-0.2, T=1
• T=0.001
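The effect of the temperature can be checked numerically. This sketch assumes the usual Metropolis form for a maximization problem, Paccept = min(1, exp(dE/T)), which fits the sign convention used for the Gibbs sampler but is not written out in the transcript. The function name is illustrative.

```python
import math


def p_accept_maximization(d_e, temperature):
    """Acceptance probability when the score is being maximized.

    Improvements (d_e >= 0) are always accepted; a worse score (d_e < 0)
    is accepted with probability exp(d_e / T).
    """
    if d_e >= 0:
        return 1.0
    return math.exp(d_e / temperature)


for temperature in (1.0, 0.001):
    p = p_accept_maximization(-0.2, temperature)
    print(f"dE=-0.2, T={temperature}: Paccept={p:.3g}")
```

At T = 1 the unfavorable move is still accepted about 82% of the time, while at T = 0.001 the probability is practically zero. High temperature makes the search explore; low temperature makes it greedy.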
MC minimization
Monte Carlo - Examples
• Why a temperature?
Local minima