Artificial Neural Networks 2 Morten Nielsen Department of Systems Biology, DTU


Page 1: Artificial Neural Networks 2 Morten Nielsen Department of Systems Biology, DTU

Artificial Neural Networks 2

Morten Nielsen
Department of Systems Biology, DTU

Page 2:

Outline

• Optimization procedures
– Gradient descent (this you already know)

• Network training
– back propagation
– cross-validation
– over-fitting
– examples

Page 3:

Neural network. Error estimate

[Diagram: inputs I1 and I2 connect through weights w1 and w2 to a linear output neuron o]

Page 4:

Neural networks

Page 5:

Gradient descent (from Wikipedia)

Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if

b = a − γ∇F(a)

for γ > 0 a small enough number, then F(b) ≤ F(a).
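As a rough illustration, the iteration b = a − γ∇F(a) can be sketched in a few lines of Python (function and parameter names are my own):

```python
def gradient_descent(grad_f, a, gamma=0.1, steps=100):
    """Minimize F by repeatedly stepping against its gradient.
    grad_f: function returning the gradient of F at a point."""
    for _ in range(steps):
        a = a - gamma * grad_f(a)  # b = a - gamma * grad F(a)
    return a

# Example: F(x) = (x - 3)^2 has gradient 2(x - 3) and minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), a=0.0)
```

With a small enough step size γ, each iteration is guaranteed not to increase F, which is exactly the observation quoted above.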

Page 6:

Gradient descent (example)

Page 7:

Gradient descent (example)

Page 8:

Gradient descent. Example

Weights are changed in the opposite direction of the gradient of the error

[Diagram: inputs I1 and I2 connect through weights w1 and w2 to a linear output neuron o]
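For this linear network with squared error E = ½(o − t)², the gradient rule gives the update wᵢ ← wᵢ − ε(o − t)Iᵢ. A minimal sketch (names are my own; ε is the learning rate):

```python
def linear_update(I, w, t, eps=0.5):
    """One gradient-descent step for a linear neuron o = sum(w_i * I_i)
    with squared error E = 0.5 * (o - t)**2."""
    o = sum(wi * xi for wi, xi in zip(w, I))
    delta = o - t  # dE/do
    # w_i <- w_i - eps * dE/dw_i, where dE/dw_i = delta * I_i
    return [wi - eps * delta * xi for wi, xi in zip(w, I)]
```

For example, with I = [1, 1], w = [1, 1], and target t = 0, the output is o = 2 and one step with ε = 0.5 moves both weights to 0, reducing the error to 0.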

Page 9:

What about the hidden layer?

Page 10:

Hidden to output layer

Page 11:

Hidden to output layer

Page 12:

Hidden to output layer

Page 13:

Input to hidden layer

Page 14:

Input to hidden layer

Page 15:

Input to hidden layer

Page 16:

Summary

Page 17:

Or

Page 18:

Or

Ik=X[0][k]

Hj=X[1][j]

Oi=X[2][i]
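Using this X[layer][index] array layout (X[0] = inputs, X[1] = hidden, X[2] = output), a forward pass might look as follows. This is a sketch assuming sigmoid activations in the hidden and output layers; function names are my own:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, v, w):
    """Forward pass storing activations as X[layer][index].
    v[j][k]: weight from input k to hidden neuron j; w[j]: hidden j to output."""
    X = [list(inputs), [], []]
    for vj in v:  # input -> hidden
        h = sum(vjk * Ik for vjk, Ik in zip(vj, X[0]))
        X[1].append(sigmoid(h))
    o = sum(wj * Hj for wj, Hj in zip(w, X[1]))  # hidden -> output
    X[2].append(sigmoid(o))
    return X
```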

Page 19:

Can you do it yourself?

[Network: inputs I1=1, I2=1; input-to-hidden weights v11=1, v21=-1, v12=1, v22=1; hidden neurons h1/H1, h2/H2; hidden-to-output weights w1=-1, w2=1; output o/O]

What is the output (O) from the network? What are the wj and vjk values after one update if the target value is 0 and ε=0.5?
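To check your hand calculation, one backpropagation step for this 2-2-1 network can be sketched as follows (assuming sigmoid activations and squared error E = ½(O − t)²; names are my own):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(I, v, w, t, eps=0.5):
    """One backpropagation step for a 2-2-1 network with sigmoid units.
    v[j][k]: weight from input k to hidden j; w[j]: hidden j to output.
    Returns updated (v, w) and the output O before the update."""
    H = [sigmoid(sum(vjk * Ik for vjk, Ik in zip(vj, I))) for vj in v]
    O = sigmoid(sum(wj * Hj for wj, Hj in zip(w, H)))
    d = (O - t) * O * (1 - O)  # dE/do, o = input to the output sigmoid
    w_new = [wj - eps * d * Hj for wj, Hj in zip(w, H)]
    v_new = [[vjk - eps * d * w[j] * H[j] * (1 - H[j]) * Ik
              for vjk, Ik in zip(v[j], I)]
             for j in range(len(v))]
    return v_new, w_new, O
```

Running a second forward pass with the updated weights lets you verify that the error has indeed decreased.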

Page 20:

Can you do it yourself (ε=0.5)? Has the error decreased?

Before:
I1=1, I2=1; v11=1, v21=-1, v12=1, v22=1; w1=-1, w2=1
h1= , H1= , h2= , H2= , o= , O=

After:
v11= , v21= , v12= , v22= ; w1= , w2=
h1= , H1= , h2= , H2= , o= , O=

Page 21:

Sequence encoding

• Change in weight is linearly dependent on input value

• “True” sparse encoding is therefore highly inefficient

• Sparse is most often encoded as
– +1/-1 or 0.9/0.05
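A sketch of such a sparse (one-hot) encoding using the 0.9/0.05 variant (the alphabet ordering is my own choice):

```python
# Sparse (one-hot) encoding of amino acids, using 0.9 for the matching
# residue and 0.05 elsewhere instead of 1/0, as suggested on the slide.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def sparse_encode(peptide, on=0.9, off=0.05):
    return [[on if aa == letter else off for letter in ALPHABET]
            for aa in peptide]

enc = sparse_encode("FMIDWILDA")  # a 9-mer -> 9 x 20 input values
```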

Page 22:

Training and error reduction

Page 23:

Training and error reduction

Page 24:

Training and error reduction

Size matters

Page 25:

• A network contains a very large set of parameters
– A network with 5 hidden neurons predicting binding for 9-meric peptides has more than 9x20x5=900 weights

• Over-fitting is a problem
• Stop training when test performance is optimal
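"Stop training when test performance is optimal" is early stopping; a sketch of the idea (train_epoch and test_error are hypothetical callables standing in for the real training loop):

```python
def train_with_early_stopping(train_epoch, test_error, max_epochs=1000):
    """Track the epoch where test-set error is lowest (early stopping).
    train_epoch(): runs one training epoch; test_error(): current test error.
    Both are assumed to exist; this is a sketch, not a full implementation."""
    best_err, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_epoch()
        err = test_error()
        if err < best_err:
            best_err, best_epoch = err, epoch  # remember/save the best model here
    return best_epoch, best_err
```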

Neural network training

[Plot: temperature vs. years]

Page 26:

What is going on?

[Plot: temperature vs. years]

Page 27:

Examples

Train on 500 A0201 and 60 A0101 binding data
Evaluate on 1266 A0201 peptides

NH=1: PCC = 0.77
NH=5: PCC = 0.72
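The PCC quoted here is the Pearson correlation coefficient between predicted and measured values; for reference, it can be computed as (sketch, names my own):

```python
import math

def pcc(x, y):
    """Pearson correlation coefficient between predictions x and targets y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```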

Page 28:

Neural network training. Cross validation

Cross validation

Train on 4/5 of data
Test on 1/5
=> Produces 5 different neural networks, each with a different prediction focus
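A sketch of the 4/5-train, 1/5-test partitioning (a simple round-robin split; the actual partitioning scheme used in the exercise may differ):

```python
def five_fold_splits(data, nfold=5):
    """Partition data into nfold folds; each network trains on 4/5 of the
    data and tests on the remaining 1/5."""
    folds = [data[i::nfold] for i in range(nfold)]
    for i in range(nfold):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, test
```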

Page 29:

Neural network training curve

Maximum test set performance
Most capable of generalizing

Page 30:

5 fold training

Which network to choose?

Page 31:

5 fold training

Page 32:

How many folds?

• Cross-validation is always good, but how many folds?
– Few folds -> small training data sets
– Many folds -> small test data sets

• Example from Tuesday's exercise
– 560 peptides for training

• 50-fold (10 peptides per test set, few data to stop training)

• 2-fold (280 peptides per test set, few data to train)

• 5-fold (110 peptides per test set, 450 per training set)

Page 33:

Problems with 5-fold cross-validation

• Use test set to stop training, and test set performance to evaluate training
– Over-fitting?
• If the test set is small: yes
• If the test set is large: no

• Confirm using "true" 5-fold cross-validation
– 1/5 for evaluation
– 4/5 for 4-fold cross-validation
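A sketch of this "true" cross-validation layout, where 1/5 is held out for evaluation only and the remaining 4/5 undergoes 4-fold cross-validation (the partitioning scheme is illustrative):

```python
def true_cross_validation(data, nfold=5):
    """'True' cross-validation: hold out 1/5 for evaluation only, and run
    (nfold-1)-fold cross-validation within the remaining 4/5."""
    outer = [data[i::nfold] for i in range(nfold)]
    for i in range(nfold):
        evaluation = outer[i]  # never used for training or stopping
        rest = [x for j, f in enumerate(outer) if j != i for x in f]
        inner = [rest[k::nfold - 1] for k in range(nfold - 1)]
        for k in range(nfold - 1):
            stop_set = inner[k]  # used only to stop training
            train_set = [x for m, f in enumerate(inner) if m != k for x in f]
            yield train_set, stop_set, evaluation
```

Note that each outer evaluation fold is scored by an ensemble of four networks, which is the "ensemble aspect" mentioned later in the deck.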

Page 34:

Conventional 5 fold cross validation

Page 35:

“True” 5 fold cross validation

Page 36:

When to be careful

• When data is scarce, the difference in performance obtained using "conventional" versus "true" cross-validation can be very large

• When data is abundant the difference is small, and "true" cross-validation might even score higher than "conventional" cross-validation due to the ensemble aspect of the "true" cross-validation approach

Page 37:

Do hidden neurons matter?

• The environment matters

NetMHCpan

Page 38:

Context matters

• FMIDWILDA YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.89 A0201
• FMIDWILDA YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.08 A0101
• DSDGSFFLY YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.08 A0201
• DSDGSFFLY YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.85 A0101

Page 39:

Example

Page 40:

Summary

• Gradient descent is used to determine the updates for the synapses in the neural network
• Some relatively simple math defines the gradients
– Networks without hidden layers can be solved on the back of an envelope (SMM exercise)
– Hidden layers are a bit more complex, but still OK
• Always train networks using a test set to stop training
– Be careful when reporting predictive performance
• Use "true" cross-validation for small data sets
• And hidden neurons do matter (sometimes)

Page 41:

And some more stuff for the long cold winter nights

• Can it be made differently?

Page 42:

Predicting accuracy

• Can it be made differently?

Reliability

Page 43:

• Identification of position-specific receptor-ligand interactions by use of artificial neural network decomposition. An investigation of interactions in the MHC:peptide system

Master thesis by Frederik Otzen Bagger

Making sense of ANN weights

Page 44:

Making sense of ANN weights

Page 45:

Making sense of ANN weights

Page 46:

Making sense of ANN weights

Page 47:

Making sense of ANN weights

Page 48:

Making sense of ANN weights

Page 49:

Making sense of ANN weights