
Optimization methods

Morten Nielsen, Department of Systems Biology, DTU

Outline

• Optimization procedures
  – Gradient descent
  – Monte Carlo

• Overfitting – cross-validation

• Method evaluation

Linear methods. Error estimate

[Diagram: linear unit with inputs I1 and I2, weights w1 and w2, and output o]
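For a linear unit, the standard forms of the output and the squared-error estimate (assumed here; they match the update rule used in the rest of the lecture) are:

    o = \sum_i w_i I_i = w_1 I_1 + w_2 I_2

    E = \frac{1}{2} (o - t)^2

where t is the target value.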

Gradient descent (from Wikipedia)

Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if

    b = a - \gamma \nabla F(a)

for \gamma > 0 a small enough number, then F(b) < F(a).

Gradient descent (example)

Gradient descent

Weights are changed in the opposite direction of the gradient of the error

Gradient descent (Linear function)

Weights are changed in the opposite direction of the gradient of the error

[Diagram: linear unit with inputs I1 and I2, weights w1 and w2, and output o]
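Differentiating the error estimate above with respect to each weight gives the update rule (standard gradient-descent form, assumed):

    \frac{\partial E}{\partial w_i} = (o - t) I_i

    \Delta w_i = -\epsilon \frac{\partial E}{\partial w_i} = -\epsilon (o - t) I_i

where \epsilon is the learning rate; the minus sign moves the weights opposite to the gradient of the error.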


Gradient descent. Example

Weights are changed in the opposite direction of the gradient of the error

[Diagram: linear unit with inputs I1 and I2, weights w1 and w2, and output o]

Gradient descent. Doing it yourself

Weights are changed in the opposite direction of the gradient of the error

[Diagram: linear unit with inputs I1 = 1 and I2 = 0, weights W1 = 0.1 and W2 = 0.1, and output o]

What are the weights after 2 forward (calculate predictions) and backward (update weights) iterations with the given input, and has the error decreased (use ε = 0.1 and t = 1)?

Fill out the table

itr   W1     W2     O
 0    0.1    0.1
 1
 2

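A minimal Python sketch of this exercise, assuming E = ½(o - t)² and the update Δw_i = -ε(o - t)I_i from above (the prints reproduce the columns of the table):

    # Two forward/backward gradient-descent iterations on a single example.
    # Assumes E = 0.5*(o - t)**2 and the update dw_i = -eps*(o - t)*I_i.
    I = [1.0, 0.0]   # inputs I1, I2
    w = [0.1, 0.1]   # initial weights W1, W2
    t = 1.0          # target value
    eps = 0.1        # learning rate (epsilon)

    for itr in (1, 2):
        o = sum(wi * Ii for wi, Ii in zip(w, I))               # forward: prediction
        w = [wi - eps * (o - t) * Ii for wi, Ii in zip(w, I)]  # backward: update
        print(itr, round(w[0], 3), round(w[1], 3), round(o, 3))

Since I2 = 0, W2 never changes; W1 grows toward the target (0.1, 0.19, 0.271), the output o rises (0.1, then 0.19), and the error decreases at each iteration.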

Monte Carlo

Because of their reliance on repeated computation of random or pseudo-random numbers, Monte Carlo methods are most suited to calculation by a computer. Monte Carlo methods tend to be used when it is infeasible or impossible to compute an exact result with a deterministic algorithm. Or when you are too stupid to do the math yourself?

Monte Carlo (Minimization)

dE < 0: always accept the move; dE > 0: accept with probability P = e^{-dE/T}
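A minimal sketch of Metropolis-style Monte Carlo minimization (the slide's move set is not specified; a random perturbation of one weight is assumed):

    import math, random

    def mc_minimize(error, w, steps=1000, T=0.01, step_size=0.05):
        # Metropolis Monte Carlo minimization of error(w).
        E = error(w)
        for _ in range(steps):
            trial = list(w)
            i = random.randrange(len(w))
            trial[i] += random.uniform(-step_size, step_size)  # propose a random move
            dE = error(trial) - E
            # Downhill moves (dE < 0) are always accepted;
            # uphill moves with probability exp(-dE/T).
            if dE < 0 or random.random() < math.exp(-dE / T):
                w, E = trial, E + dE
        return w, E

For maximization (as in the Gibbs sampler below) the sign of dE flips: uphill moves are always accepted, and downhill moves are accepted with probability e^{dE/T}.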

Gibbs sampler. Monte Carlo simulations

[Two alignment configurations of the same sequence set, differing in the placement of the sampled cores]

RFFGGDRGAPKRGYLDPLIRGLLARPAKLQVKPGQPPRLLIYDASNRATGIPA GSLFVYNITTNKYKAFLDKQ SALLSSDITASVNCAK GFKGEQGPKGEPDVFKELKVHHANENI SRYWAIRTRSGGITYSTNEIDLQLSQEDGQTIE

RFFGGDRGAPKRGYLDPLIRGLLARPAKLQVKPGQPPRLLIYDASNRATGIPAGSLFVYNITTNKYKAFLDKQ SALLSSDITASVNCAK GFKGEQGPKGEPDVFKELKVHHANENI SRYWAIRTRSGGITYSTNEIDLQLSQEDGQTIE

E1 = 5.4 -> E2 = 5.7: dE > 0; Paccept = 1
E1 = 5.4 -> E2 = 5.2: dE < 0; 0 < Paccept < 1

Note the sign: this is maximization.

Monte Carlo Temperature

• What is the Monte Carlo temperature?
• Say dE = -0.2 and T = 1
• What about T = 0.001?
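With these numbers, the acceptance probability P_accept = min(1, e^{dE/T}) (the standard Metropolis form for maximization is assumed) gives:

    T = 1:      P = e^{-0.2} \approx 0.82   (the unfavorable move is usually accepted)
    T = 0.001:  P = e^{-200} \approx 0      (the unfavorable move is essentially never accepted)

The temperature therefore controls how freely the search crosses barriers in the error landscape.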

MC minimization

Monte Carlo - Examples

• Why a temperature? To escape local minima

• A prediction method contains a very large set of parameters

– A matrix for predicting binding for 9meric peptides has 9x20=180 weights

• Overfitting is a problem

Data driven method training

[Plot: temperature as a function of years]

Binders:
ALAKAAAAM ALAKAAAAN ALAKAAAAR ALAKAAAAT ALAKAAAAV
GMNERPILT GILGFVFTM TLNAWVKVV KLNEPVLLL AVVPFIVSV

Non-binders:
MRSGRVHAV VRFNIDETP ANYIGQDGL AELCGDPGD QTRAVADGK
GRPVPAAHP MTAQWWLDA FARGVVHVI LQRELTRLQ AVAEEMTKS

Evaluation of predictive performance

• Train PSSM on raw data
  – No pseudo counts, no sequence weighting
  – Fit 9*20 parameters to 9*10 data points
• Evaluate on training data
  – PCC = 0.97
  – AUC = 1.0
• Close to a perfect prediction method


Permuted (random) training data:

Binders (permuted): AAAMAAKLAAAAKNLAAAAAKALAAAARAAAAKLATAALAKAVAAAIPELMRTNGFIMGVFTGLNVTKVVAWLLEPLNLVLKVAVIVSVPF
Non-binders: MRSGRVHAVVRFNIDETPANYIGQDGLAELCGDPGDQTRAVADGKGRPVPAAHPMTAQWWLDAFARGVVHVILQRELTRLQAVAEEMTKS

Evaluation of predictive performance

• Train PSSM on permuted (random) data
  – No pseudo counts, no sequence weighting
  – Fit 9*20 parameters to 9*10 data points
• Evaluate on training data
  – PCC = 0.97
  – AUC = 1.0
• Close to a perfect prediction method AND
• Same performance as on the original data


Repeat on large training data (229 ligands)

Cross validation

Train on 4/5 of the data, test/evaluate on the remaining 1/5 => produces 5 different methods, each with a different prediction focus
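A minimal sketch of the 5-fold partitioning (round-robin assignment is an assumption; any fixed split into 5 subsets works):

    def five_fold(data):
        # Assign examples to 5 subsets, then train on 4/5 and test on 1/5.
        folds = [data[i::5] for i in range(5)]
        for n in range(5):
            test = folds[n]
            train = [x for i, fold in enumerate(folds) if i != n for x in fold]
            yield train, test  # 5 partitions => 5 trained methods

Each of the 5 methods is trained on a different 4/5 of the data, which is what gives each one a different prediction focus.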

Model over-fitting

Trained on 2000 MHC:peptide binding data points: PCC = 0.99
Evaluated on 600 MHC:peptide binding data points: PCC = 0.80

Model over-fitting (early stopping)

Evaluated on 600 MHC:peptide binding data points: PCC = 0.89

Stop training before the test-set performance starts to drop
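A minimal sketch of early stopping; train_one_epoch and pcc are hypothetical placeholders for the training step and the test-set correlation:

    def train_with_early_stopping(model, train_data, test_data,
                                  max_epochs=500, patience=10):
        # Keep the model from the epoch with the best test-set PCC and
        # stop once the PCC has not improved for `patience` epochs.
        best_pcc, best_model, waited = -1.0, model, 0
        for _ in range(max_epochs):
            model = train_one_epoch(model, train_data)  # hypothetical training step
            p = pcc(model, test_data)                   # hypothetical evaluation
            if p > best_pcc:
                best_pcc, best_model, waited = p, model, 0
            else:
                waited += 1
                if waited >= patience:
                    break  # test performance stopped improving: stop training
        return best_model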

What is going on?

[Plot: temperature as a function of years]

5-fold training

Which method to choose?

Method evaluation

• Use cross-validation
• Evaluate on the concatenated data, not as an average over the per-partition performances
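A sketch of the difference, with a self-contained Pearson helper; folds is assumed to hold one (predictions, targets) pair per cross-validation partition:

    import statistics

    def pearson(x, y):
        mx, my = statistics.mean(x), statistics.mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    def evaluate(folds):
        # Recommended: one PCC over the concatenated predictions of all partitions
        pred = [p for ps, _ in folds for p in ps]
        targ = [t for _, ts in folds for t in ts]
        concatenated = pearson(pred, targ)
        # Not recommended: the average of the per-partition PCCs
        averaged = statistics.mean(pearson(ps, ts) for ps, ts in folds)
        return concatenated, averaged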

Method evaluation

Which prediction to use?

SMM - Stabilization matrix method

[Diagram: linear unit with inputs I1 and I2, weights w1 and w2, and output o]

Per target: E = \frac{1}{2}(o - t)^2 + \lambda \sum_i w_i^2

Global: E = \sum_{\text{data}} \frac{1}{2}(o - t)^2 + \lambda \sum_i w_i^2

(the global error is a sum over the data points; the regularization term is a sum over the weights)

SMM - Stabilization matrix method

[Diagram: linear unit with inputs I1 and I2, weights w1 and w2, and output o]

Per-target error with the regularization term λ (see below)
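With the regularization term, the per-target gradient picks up an extra contribution that pulls each weight toward zero (standard form, assumed):

    \frac{\partial E}{\partial w_i} = (o - t) I_i + 2 \lambda w_i

    \Delta w_i = -\epsilon \left( (o - t) I_i + 2 \lambda w_i \right)

Larger values of \lambda shrink the weights harder and so reduce over-fitting, at the cost of a poorer fit to the training data.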


SMM training

Evaluated on 600 MHC:peptide binding data points:
λ = 0: PCC = 0.70
λ = 0.1: PCC = 0.78

SMM - Stabilization matrix method. Monte Carlo

[Diagram: linear unit with inputs I1 and I2, weights w1 and w2, and output o]

Global error: E = \sum_{\text{data}} \frac{1}{2}(o - t)^2 + \lambda \sum_i w_i^2

• Make a random change to the weights
• Calculate the change in the "global" error
• Update the weights if the MC move is accepted

Note the difference between MC and GD in the use of the "global" versus the "per target" error.
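A minimal sketch of the MC variant; global_error sums the squared error over all data points plus the weight penalty (the representation of data as (inputs, target) pairs and the move set are assumptions):

    import math, random

    def global_error(w, data, lam):
        # Sum over data points + lambda * sum over weights
        sq = sum(0.5 * (sum(wi * Ii for wi, Ii in zip(w, I)) - t) ** 2
                 for I, t in data)
        return sq + lam * sum(wi ** 2 for wi in w)

    def mc_train(w, data, lam, steps=5000, T=0.01, step_size=0.05):
        E = global_error(w, data, lam)
        for _ in range(steps):
            trial = list(w)
            i = random.randrange(len(w))
            trial[i] += random.uniform(-step_size, step_size)  # random change to weights
            dE = global_error(trial, data, lam) - E            # change in "global" error
            if dE < 0 or random.random() < math.exp(-dE / T):  # accept the MC move?
                w, E = trial, E + dE
        return w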

Training/evaluation procedure

• Define method
• Select data
• Deal with data redundancy
  – In method (sequence weighting)
  – In data (Hobohm)
• Deal with over-fitting, either
  – in method (SMM regularization term), or
  – in training (stop fitting on test-set performance)
• Evaluate method using cross-validation

A small doit script
/usr/opt/www/pub/CBS/courses/27623.algo/exercises/code/SMM/doit_ex

#!/bin/tcsh
# Loop over all alleles listed in allelefile
foreach a ( `cat allelefile` )
    mkdir -p $a
    cd $a
    # Loop over the regularization (lambda) values
    foreach l ( 0 1 2.5 5 10 20 30 )
        mkdir -p l.$l
        cd l.$l
        # 5-fold cross-validation: train a matrix and score the evaluation set
        foreach n ( 0 1 2 3 4 )
            smm -nc 500 -l $l train.$n > mat.$n
            pep2score -mat mat.$n eval.$n > eval.$n.pred
        end
        # Concatenate the cross-validation predictions and compute the correlation
        echo $a $l `cat eval.?.pred | grep -v "#" | gawk '{print $2,$3}' | xycorr`
        cd ..
    end
    cd ..
end
