Optimization methods Morten Nielsen Department of Systems biology, DTU


Post on 22-Feb-2016


TRANSCRIPT

Page 1: Optimization methods Morten Nielsen Department of Systems  biology ,  DTU

Optimization methods

Morten Nielsen

Department of Systems Biology, DTU

Page 2

Minimization

The path to the closest local minimum = local minimization

*Adapted from slides by Chen Keasar, Ben-Gurion University

Page 4

Minimization

The path to the global minimum

*Adapted from slides by Chen Keasar, Ben-Gurion University

Page 5

Outline

• Optimization procedures
  – Gradient descent
  – Monte Carlo

• Overfitting
  – Cross-validation

• Method evaluation

Page 6

Linear methods. Error estimate

[Figure: linear function with inputs I1, I2, weights w1, w2, and output o]

Page 7

Gradient descent (from Wikipedia)

Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if

$b = a - \gamma \nabla F(a)$

for $\gamma > 0$ a small enough number, then F(b) < F(a)
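A minimal sketch of the update rule above, repeated many times. The example function F(x) = (x - 3)^2, the names `gradient_descent`, `F` and `grad_F`, and the step size of 0.1 are all illustrative assumptions, not taken from the slides.

```python
def gradient_descent(grad, start, step_size=0.1, n_steps=100):
    """Repeatedly step against the gradient: b = a - step_size * grad(a)."""
    a = start
    path = [a]
    for _ in range(n_steps):
        a = a - step_size * grad(a)
        path.append(a)
    return path


# Hypothetical example function with a single minimum at x = 3.
def F(x):
    return (x - 3.0) ** 2


def grad_F(x):
    return 2.0 * (x - 3.0)


path = gradient_descent(grad_F, start=0.0, step_size=0.1, n_steps=50)
print(path[-1], F(path[-1]))
```

With a small enough step size every step lowers F, and the path settles at the nearest minimum. On a function with several minima the same procedure would only find the local one closest to the start.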

Page 8

Gradient descent (example)

Page 9

Gradient descent

Page 10

Gradient descent

Weights are changed in the opposite direction of the gradient of the error

Page 11

Gradient descent (Linear function)

Weights are changed in the opposite direction of the gradient of the error

[Figure: linear function with inputs I1, I2, weights w1, w2, and output o]
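The equations on these slides did not survive extraction. A sketch of the standard derivation follows, assuming a squared-error estimate E and a step size written here as $\gamma$. Both the form of E and the symbol are assumptions, although they reproduce the numbers in the worked exercise later in the deck.

```latex
\begin{aligned}
o &= \sum_i w_i I_i, \qquad E = \tfrac{1}{2}\,(o - t)^2 \\
\frac{\partial E}{\partial w_i}
  &= \frac{\partial E}{\partial o}\,\frac{\partial o}{\partial w_i}
   = (o - t)\, I_i \\
w_i' &= w_i - \gamma\,\frac{\partial E}{\partial w_i}
      = w_i - \gamma\,(o - t)\, I_i
\end{aligned}
```

For example, with I = (1, 0), w = (0.1, 0.1), t = 1 and $\gamma$ = 0.1, the output is o = 0.1 and the update gives w1' = 0.1 - 0.1 · (0.1 - 1) · 1 = 0.19, while w2 is unchanged because I2 = 0.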

Page 12

Gradient descent

Weights are changed in the opposite direction of the gradient of the error

[Figure: linear function with inputs I1, I2, weights w1, w2, and output o]

Page 13

Gradient descent. Example

Weights are changed in the opposite direction of the gradient of the error

[Figure: linear function with inputs I1, I2, weights w1, w2, and output o]

Page 15

Gradient descent. Doing it yourself

Weights are changed in the opposite direction of the gradient of the error

[Figure: linear function with inputs I1 = 1, I2 = 0, weights W1 = 0.1, W2 = 0.1, and output o]

What are the weights after 2 forward (calculate predictions) and backward (update weights) iterations with the given input, and has the error decreased (use a step size of 0.1 and t = 1)?

Page 16

Fill out the table

itr   W1     W2     O
0     0.1    0.1
1
2

What are the weights after 2 forward/backward iterations with the given input, and has the error decreased (use a step size of 0.1 and t = 1)?

[Figure: linear function with inputs I1 = 1, I2 = 0, weights W1 = 0.1, W2 = 0.1, and output o]

Page 17

Fill out the table

itr   W1     W2     O
0     0.1    0.1    0.1
1     0.19   0.1    0.19
2     0.27   0.1    0.27

What are the weights after 2 forward/backward iterations with the given input, and has the error decreased (use a step size of 0.1 and t = 1)?

[Figure: linear function with inputs I1 = 1, I2 = 0, weights W1 = 0.1, W2 = 0.1, and output o]
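The table can be reproduced with a few lines of code. This is a sketch that assumes the error is E = 0.5 * (o - t)^2, which is not stated in the surviving text but matches the tabulated weights. The function names `forward` and `backward` are illustrative.

```python
def forward(weights, inputs):
    """Linear function: o = sum_i w_i * I_i."""
    return sum(w * x for w, x in zip(weights, inputs))


def backward(weights, inputs, output, target, step_size):
    """One gradient descent update, assuming E = 0.5 * (o - t)^2."""
    return [w - step_size * (output - target) * x
            for w, x in zip(weights, inputs)]


inputs = [1.0, 0.0]    # I1, I2 from the slide
weights = [0.1, 0.1]   # W1, W2 at iteration 0
target = 1.0           # t
step_size = 0.1

rows = []  # one (itr, W1, W2, O, E) tuple per iteration
for itr in range(3):
    output = forward(weights, inputs)
    error = 0.5 * (output - target) ** 2
    rows.append((itr, weights[0], weights[1], output, error))
    weights = backward(weights, inputs, output, target, step_size)

for itr, w1, w2, o, e in rows:
    print(f"{itr}  W1={w1:.3f}  W2={w2:.3f}  O={o:.3f}  E={e:.3f}")
```

Under these assumptions the error falls from about 0.405 to 0.328 to 0.266, so it does decrease. W1 after two updates is 0.271, which the table rounds to 0.27, and W2 never moves because its input is 0.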

Page 18

Monte Carlo

Because of their reliance on repeated computation of random or pseudo-random numbers, Monte Carlo methods are most suited to calculation by a computer. Monte Carlo methods tend to be used when it is infeasible or impossible to compute an exact result with a deterministic algorithm.

Or when you are too stupid to do the math yourself?

Page 19

Example: Estimating π by Independent Monte-Carlo Samples

Suppose we throw darts randomly (and uniformly) at the square:

Algorithm:
    For i = [1..ntrials]
        x = (random# in [0..r])
        y = (random# in [0..r])
        distance = sqrt(x^2 + y^2)
        if distance ≤ r
            hits++
    End
Output: π ≈ 4 · hits / ntrials

Adapted from course slides by Craig Douglas

http://www.chem.unl.edu/zeng/joy/mclab/mcintro.html
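A runnable sketch of the dart-throwing algorithm. Because x and y are drawn from [0, r], the darts land in one quadrant, so the hit fraction approximates π/4. The function name `estimate_pi` and the `seed` parameter are illustrative additions.

```python
import math
import random


def estimate_pi(ntrials, r=1.0, seed=None):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(ntrials):
        x = rng.uniform(0.0, r)
        y = rng.uniform(0.0, r)
        if math.hypot(x, y) <= r:   # distance from the origin
            hits += 1
    return 4.0 * hits / ntrials


estimate = estimate_pi(100_000, seed=1)
print(estimate)
```

The estimate improves slowly: the statistical error shrinks roughly as one over the square root of the number of trials.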

Page 20

Estimating π

Page 21

Sampling Protein Conformations with MCMC (Markov Chain Monte Carlo)

Markov-Chain Monte-Carlo (MCMC) with “proposals”:
1. Perturb structure to create a “proposal”
2. Accept or reject new conformation with a “certain” probability

After a long run, we want to find low-energy conformations, with high probability

But how?

A (physically) natural* choice is the Boltzmann distribution, proportional to:

$\frac{e^{-E_i / k_B T}}{Z}$

E_i = energy of state i
k_B = Boltzmann constant
T = temperature
Z = “Partition Function”

* In theory, the Boltzmann distribution is a bit problematic in non-gas phase, but never mind that for now…

Protein image taken from Chemical Biology, 2006

Slides adapted from Barak Raveh

Page 22

The Metropolis-Hastings Criterion

• Boltzmann distribution: $\frac{e^{-E_i / k_B T}}{Z}$

• The energy score and temperature are computed (quite) easily
• The “only” problem is calculating Z (the “partition function”): this requires summing over all states.
• Metropolis showed that MCMC will converge to the true Boltzmann distribution, if we accept a new proposal with probability

$P = \min\left(1,\ e^{-\Delta E / k_B T}\right)$, where $\Delta E$ is the energy change of the proposal

“Equation of State Calculations by Fast Computing Machines”, Metropolis, N. et al., Journal of Chemical Physics (1953)

Slides adapted from Barak Raveh
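The criterion needs only the energy difference between the old and the new state: in the ratio of two Boltzmann weights, Z cancels. A minimal sketch, assuming minimization (lower energy is better) and a single parameter `kT` standing for the product of k_B and T. The function name and the optional `rng` argument are illustrative.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng=random):
    """Accept a proposal with probability min(1, exp(-delta_e / kT)).

    delta_e is E_new - E_old. Downhill moves are always accepted,
    uphill moves only sometimes.
    """
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


# Usage: fraction of uphill proposals (delta_e = 1) accepted at kT = 1.
rng = random.Random(0)
n = 20000
accepted = sum(metropolis_accept(1.0, 1.0, rng) for _ in range(n))
print(accepted / n, math.exp(-1.0))
```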

Page 23

Sampling Protein Conformations with Metropolis-Hastings MCMC

Markov-Chain Monte-Carlo (MCMC) with “proposals”:
1. Perturb structure to create a “proposal”
2. Accept or reject new conformation by the Metropolis criterion
3. Repeat for many iterations

If we run till infinity, with good perturbations, we will visit every conformation according to the Boltzmann distribution

But we just want to find the energy minimum. If we do our perturbations in a smart manner, we can still cover relevant (realistic, low-energy) parts of the search space

Protein image taken from Chemical Biology, 2006

Slides adapted from Barak Raveh
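The perturb / accept / repeat loop can be sketched on a toy problem. Here a hypothetical one-dimensional "energy" with a shallow minimum near x = 1.1 and a deeper one near x = -1.3 stands in for a protein energy function. The function names, the uniform perturbation and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def energy(x):
    """Hypothetical energy: local minimum near x = 1.1, global near x = -1.3."""
    return x ** 4 - 3.0 * x ** 2 + x


def mc_minimize(energy, x0, kT=0.5, step=0.5, n_iter=20000, seed=0):
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_iter):
        x_new = x + rng.uniform(-step, step)      # 1. perturb
        e_new = energy(x_new)
        d_e = e_new - e
        # 2. accept or reject by the Metropolis criterion
        if d_e <= 0 or rng.random() < math.exp(-d_e / kT):
            x, e = x_new, e_new
            if e < best_e:                        # remember the best state seen
                best_x, best_e = x, e
    return best_x, best_e                         # 3. after many iterations


best_x, best_e = mc_minimize(energy, x0=1.1)
print(best_x, best_e)
```

The run starts in the shallow minimum. Because uphill moves are sometimes accepted, the walk can cross the barrier and find the deeper minimum, which a purely downhill method started at the same point would miss.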

Page 24

Monte Carlo (Minimization)

[Figure labels: dE < 0, dE > 0]

Page 25

The Traveling Salesman

Adapted from www.mpp.mpg.de/~caldwell/ss11/ExtraTS.pdf
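The slides for this example are images only. As a rough sketch of how Monte Carlo minimization could be applied to the traveling salesman problem, the code below treats the tour length as the energy and proposes a new tour by reversing a random segment. The move type, the fixed temperature `kT`, the random cities and all names are assumptions, not taken from the cited material.

```python
import math
import random


def tour_length(cities, order):
    """Length of the closed tour that visits the cities in the given order."""
    n = len(order)
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % n]])
               for i in range(n))


def mc_tsp(cities, kT=0.05, n_iter=20000, seed=0):
    rng = random.Random(seed)
    order = list(range(len(cities)))
    length = tour_length(cities, order)
    best_order, best_length = order[:], length
    for _ in range(n_iter):
        # Perturb: reverse a randomly chosen segment of the tour.
        i, j = sorted(rng.sample(range(len(order)), 2))
        proposal = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        new_length = tour_length(cities, proposal)
        d_e = new_length - length
        # Metropolis criterion with tour length as the energy.
        if d_e <= 0 or rng.random() < math.exp(-d_e / kT):
            order, length = proposal, new_length
            if length < best_length:
                best_order, best_length = order[:], length
    return best_order, best_length


# Hypothetical input: 20 random cities in the unit square.
city_rng = random.Random(42)
cities = [(city_rng.random(), city_rng.random()) for _ in range(20)]
initial_length = tour_length(cities, list(range(len(cities))))
best_order, best_length = mc_tsp(cities)
print(initial_length, best_length)
```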

Page 32

Gibbs sampler. Monte Carlo simulations

RFFGGDRGAPKRGYLDPLIRGLLARPAKLQVKPGQPPRLLIYDASNRATGIPA GSLFVYNITTNKYKAFLDKQ SALLSSDITASVNCAK GFKGEQGPKGEPDVFKELKVHHANENI SRYWAIRTRSGGITYSTNEIDLQLSQEDGQTIE

RFFGGDRGAPKRGYLDPLIRGLLARPAKLQVKPGQPPRLLIYDASNRATGIPAGSLFVYNITTNKYKAFLDKQ SALLSSDITASVNCAK GFKGEQGPKGEPDVFKELKVHHANENI SRYWAIRTRSGGITYSTNEIDLQLSQEDGQTIE

E1 = 5.4, E2 = 5.7: dE > 0; Paccept = 1

E1 = 5.4, E2 = 5.2: dE < 0; 0 < Paccept < 1

Note the sign: this is maximization.

Page 33

Monte Carlo Temperature

• What is the Monte Carlo temperature?

• Say dE=-0.2, T=1

• T=0.001
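The effect of the temperature can be checked numerically. This sketch assumes the maximization sign convention from the Gibbs sampler slide (moves with dE > 0 are always accepted) and an acceptance probability of exp(dE / T) for unfavorable moves. The function name is illustrative.

```python
import math


def accept_probability(d_e, temperature):
    """Acceptance probability when maximizing: favorable moves are always taken."""
    if d_e >= 0:
        return 1.0
    return math.exp(d_e / temperature)


p_warm = accept_probability(-0.2, 1.0)     # dE = -0.2, T = 1
p_cold = accept_probability(-0.2, 0.001)   # dE = -0.2, T = 0.001
print(p_warm, p_cold)
```

At T = 1 the unfavorable move is accepted about 82% of the time. At T = 0.001 it is essentially never accepted, so the search only ever moves toward better scores and can get stuck in a local optimum.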

Page 34

MC minimization

Page 35

Monte Carlo - Examples

• Why a temperature?

Page 36

Local minima