Artificial Neural Networks 2
Morten Nielsen, Department of Systems Biology, DTU


Page 1

Artificial Neural Networks 2

Morten Nielsen
Department of Systems Biology, DTU

Page 2

Outline

• Optimization procedures
– Gradient descent (this you already know)
• Network training
– back-propagation
– cross-validation
– over-fitting
– examples
– deep learning

Page 3

Neural network. Error estimate

[Figure: inputs I1 and I2 with weights w1 and w2 feeding a single output neuron o with a linear activation function]

Page 4

Neural networks

Page 5

Gradient descent (from Wikipedia)

Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if

b = a - γ∇F(a)

for γ > 0 a small enough number, then F(b) < F(a).
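As a minimal sketch (not from the slides), the update rule in Python; F(x) = x² and the step size γ = 0.1 are arbitrary illustrative choices:

def gradient_descent(grad_f, a, gamma=0.1, steps=50):
    # repeatedly step against the gradient: b = a - gamma * grad_f(a)
    for _ in range(steps):
        a = a - gamma * grad_f(a)
    return a

# F(x) = x**2 has gradient 2*x; the iterates approach the minimizer x = 0
print(gradient_descent(lambda x: 2 * x, a=5.0))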

Page 6

Gradient descent (example)

Page 7

Gradient descent (example)

Page 8

Gradient descent. Example

Weights are changed in the opposite direction of the gradient of the error: Δwi = -ε · ∂E/∂wi

[Figure: inputs I1 and I2 with weights w1 and w2 feeding a single output neuron o with a linear activation function]

Page 9

Network architecture

[Figure: input layer Ik, hidden layer (summed input hj, activation Hj), output layer (summed input o, activation O); weights vjk connect input to hidden, weights wj connect hidden to output]
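A minimal numpy sketch (an illustration, not the course code) of the forward pass through this architecture, assuming sigmoid activations as in the worked examples below:

import numpy as np

def g(x):
    # sigmoid activation function
    return 1.0 / (1.0 + np.exp(-x))

def forward(I, v, w):
    # I: inputs, shape (K,); v: input-to-hidden weights (J, K); w: hidden-to-output weights (J,)
    h = v @ I   # summed input to each hidden neuron: hj = sum_k vjk * Ik
    H = g(h)    # hidden-layer activations
    o = w @ H   # summed input to the output neuron: o = sum_j wj * Hj
    O = g(o)    # network output
    return h, H, o, O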

Page 10

What about the hidden layer?

Page 11

Hidden to output layer

Page 12

Hidden to output layer

Page 13

Hidden to output layer

Page 14

Hidden to output layer
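With squared error E = ½(O - t)² and sigmoid g (assumptions made here, but consistent with the worked numbers on the later slides), the standard gradient for the hidden-to-output weights is

∂E/∂wj = (O - t) · g'(o) · Hj, so Δwj = -ε · (O - t) · g'(o) · Hj, with g'(o) = O(1 - O) for the sigmoid.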

Page 15

Input to hidden layer
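The input-to-hidden update follows from one more application of the chain rule. A numpy sketch of both updates, under the same assumptions and using forward() from above:

def backward(I, v, w, t, eps=0.5):
    # one gradient-descent update of all weights for the 1-hidden-layer network
    h, H, o, O = forward(I, v, w)
    delta_o = (O - t) * O * (1 - O)         # (O - t) * g'(o)
    delta_h = delta_o * w * H * (1 - H)     # delta back-propagated to each hidden neuron
    w_new = w - eps * delta_o * H           # hidden-to-output update: dE/dwj = delta_o * Hj
    v_new = v - eps * np.outer(delta_h, I)  # input-to-hidden update: dE/dvjk = delta_h[j] * Ik
    return v_new, w_new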

Page 16

Summary

Page 17

Or

Page 18

Or

Ik = X[0][k]
Hj = X[1][j]
Oi = X[2][i]

(i.e., the activations of all layers stored in a single array X indexed by layer)

Page 19

Can you do it yourself?

v11 = 1, v12 = 1, v21 = -1, v22 = 1
w1 = -1, w2 = 1
I1 = 1, I2 = 1
(hidden neurons h1/H1 and h2/H2, output neuron o/O)

What is the output (O) from the network?
What are the wj and vjk values after the update if the target value is 0 and ε = 0.5?

Page 20

Can you do it yourself (ε = 0.5)? Has the error decreased?

Before:
v11 = 1, v12 = 1, v21 = -1, v22 = 1
w1 = -1, w2 = 1
I1 = 1, I2 = 1
h1 = __, H1 = __; h2 = __, H2 = __
o = __, O = __

After:
v11 = __, v12 = __, v21 = __, v22 = __
w1 = __, w2 = __
h1 = __, H1 = __; h2 = __, H2 = __
o = __, O = __

Page 21

Can you do it yourself (ε = 0.5)? Has the error decreased?

Before:
v11 = 1, v12 = 1, v21 = -1, v22 = 1
w1 = -1, w2 = 1
I1 = 1, I2 = 1
h1 = 2, H1 = 0.88; h2 = 0, H2 = 0.5
o = -0.38, O = 0.41

After: (fill in)
v11 = __, v12 = __, v21 = __, v22 = __
w1 = __, w2 = __
h1 = __, H1 = __; h2 = __, H2 = __
o = __, O = __

Page 22

Can you do it yourself?

v11 = 1, v12 = 1, v21 = -1, v22 = 1
w1 = -1, w2 = 1
I1 = 1, I2 = 1
h1 = 2, H1 = 0.88; h2 = 0, H2 = 0.5
o = -0.38, O = 0.41

• What is the output (O) from the network?
• What are the wj and vjk values if the target value is 0?

Page 23

Can you do it yourself (ε = 0.5)? Has the error decreased?

Before:
v11 = 1, v12 = 1, v21 = -1, v22 = 1
w1 = -1, w2 = 1
I1 = 1, I2 = 1
h1 = 2, H1 = 0.88; h2 = 0, H2 = 0.5
o = -0.38, O = 0.41

After:
v11 = 1.005, v12 = 1.005, v21 = -1.01, v22 = 0.99
w1 = -1.043, w2 = 0.975
I1 = 1, I2 = 1
h1 = 2.01, H1 = 0.882; h2 = -0.02, H2 = 0.495
o = -0.44, O = 0.39

Yes: with target 0, the output has moved from 0.41 to 0.39, so the error has decreased.
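These numbers can be reproduced with the forward() and backward() sketches from the earlier slides:

import numpy as np
I = np.array([1.0, 1.0])
v = np.array([[1.0, 1.0],    # v11, v12
              [-1.0, 1.0]])  # v21, v22
w = np.array([-1.0, 1.0])    # w1, w2

h, H, o, O = forward(I, v, w)                     # h=[2, 0], H=[0.88, 0.5], o=-0.38, O=0.41
v_new, w_new = backward(I, v, w, t=0.0, eps=0.5)
# v_new ~ [[1.005, 1.005], [-1.01, 0.99]], w_new ~ [-1.043, 0.975]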

Page 24

Sequence encoding

• Change in weight is linearly dependent on input value

• “True” sparse encoding (i.e., 1/0) is therefore highly inefficient

• Sparse input is therefore most often encoded as +1/-1 or 0.9/0.05
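As an illustrative sketch of such an encoding (the alphabet ordering is an arbitrary choice here):

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sparse_encode(peptide, on=0.9, off=0.05):
    # one 20-vector per residue: 'on' at the residue's own position, 'off' everywhere else
    encoding = []
    for aa in peptide:
        vec = [off] * len(AMINO_ACIDS)
        vec[AMINO_ACIDS.index(aa)] = on
        encoding.extend(vec)
    return encoding

x = sparse_encode("FMIDWILDA")  # a 9-mer gives 9 * 20 = 180 input values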

Page 25

Sequence encoding - rescaling

• Rescaling the input values: if the input to a neuron (o or h) is too large or too small, g' is zero and the weights are not changed. Optimal performance is achieved when o and h are close to 0.5.

Page 26

Training and error reduction

Page 27

Training and error reduction

Page 28

Training and error reduction

Size matters

Page 29

Example

Page 30

Neural network training. Cross validation

Cross validation

Train on 4/5 of the data
Test on the remaining 1/5
=> Produces 5 different neural networks, each with a different prediction focus
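A minimal sketch of the partitioning (the interleaved split is an arbitrary illustrative choice):

def five_fold(data, n=5):
    # yield (train, test) pairs; each fifth of the data serves as the test set exactly once
    folds = [data[i::n] for i in range(n)]
    for i in range(n):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test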

Page 31

Neural network training curve

Maximum test set performance = most capable of generalizing

Page 32

5-fold training

Which network to choose?

Page 33

5-fold training

Page 34

Conventional 5-fold cross validation

Page 35

“Nested” 5-fold cross validation

Page 36

When to be careful

• When data is scarce, the difference in performance obtained using “conventional” versus “nested” cross validation can be very large

• When data is abundant the difference is small, and “nested” cross validation performance might even be higher than “conventional” cross validation due to the ensemble aspect of the “nested” approach
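A sketch of the nested setup, with train_network as a hypothetical routine that uses its second argument only to stop training; the point is that the outer test fold never influences training or model selection:

def nested_cv(data, train_network, n_outer=5, n_inner=4):
    folds = [data[i::n_outer] for i in range(n_outer)]
    for i in range(n_outer):
        outer_test = folds[i]
        rest = [x for j, fold in enumerate(folds) if j != i for x in fold]
        models = []
        for k in range(n_inner):  # inner cross-validation on the remaining 4/5 of the data
            inner_test = rest[k::n_inner]
            inner_train = [x for idx, x in enumerate(rest) if idx % n_inner != k]
            models.append(train_network(inner_train, inner_test))
        yield models, outer_test  # evaluate the ensemble average on the untouched outer fold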

Page 37

Do hidden neurons matter?

• The environment matters

NetMHCpan

Page 38

Context matters

FMIDWILDA  YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY  0.89  A0201
FMIDWILDA  YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY  0.08  A0101
DSDGSFFLY  YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY  0.08  A0201
DSDGSFFLY  YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY  0.85  A0101

Page 39

Example

Page 40

Summary

• Gradient descent is used to determine the updates for the synapses in the neural network
• Some relatively simple math defines the gradients
– Networks without hidden layers can be solved on the back of an envelope (SMM exercise)
– Hidden layers are a bit more complex, but still OK
• Always train networks using a test set to stop training
– Be careful when reporting predictive performance
• Use “nested” cross-validation for small data sets
• And hidden neurons do matter (sometimes)

Page 41

And some more stuff for the long cold winter nights

• Can it be made differently?

Page 42

Predicting accuracy

• Can it be made differently?

Reliability

Page 43

Making sense of ANN weights

• Identification of position-specific receptor-ligand interactions by use of artificial neural network decomposition: an investigation of interactions in the MHC:peptide system

Master's thesis work by Frederik Otzen Bagger and Piotr Chmura

Page 44

Making sense of ANN weights

Page 45

Making sense of ANN weights

Page 46

Making sense of ANN weights

Page 47

Making sense of ANN weights

Page 48

Making sense of ANN weights

Page 49

Making sense of ANN weights

Page 50

Deep learning

http://www.slideshare.net/hammawan/deep-neural-networks

Page 51

Deep(er) Network architecture

Page 52

Deeper network architecture

[Figure: input layer (Il, index l); 1st hidden layer (h1k, H1k, index k); 2nd hidden layer (h2j, H2j, index j); output layer (h3, H3). Weights: ukl (input to 1st hidden), vjk (1st to 2nd hidden), wj (2nd hidden to output)]

Page 53

Network architecture (hidden to hidden)

Page 54

Network architecture (input to hidden)

Page 55

Network architecture (input to hidden)

Page 56

Speed. Use deltas

[Figure: three consecutive layers with indices l, k, j; the deltas dj of one layer are propagated back through the weights vjk to give dk, and onwards through ukl]

Bishop, Christopher (1995). Neural Networks for Pattern Recognition. Oxford: Clarendon Press. ISBN 0-19-853864-2.
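In this notation the recursion is the standard back-propagation delta rule (Bishop, 1995). A one-function numpy sketch, assuming sigmoid activations:

def hidden_deltas(d_next, V, H):
    # dk = g'(hk) * sum_j dj * vjk, with g'(hk) = Hk * (1 - Hk) for the sigmoid
    return (V.T @ d_next) * H * (1 - H)

Each layer then costs a single matrix-vector product, which is why CPU time scales linearly with the number of weights (next slide and Conclusions).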

Page 57

Use deltas

Page 58

Deep learning – time is not an issue

[Plots: CPU time (u) as a function of the number of weights (17,000 to 20,000), and the number of weights as a function of the number of hidden layers (1 to 5)]

Page 59

Deep learning

http://www.slideshare.net/hammawan/deep-neural-networks

Page 60

Deep learning

http://www.slideshare.net/hammawan/deep-neural-networks

Page 61

Deep learning

http://www.slideshare.net/hammawan/deep-neural-networks

Page 62

Deep learning

http://www.slideshare.net/hammawan/deep-neural-networks

Page 63

Deep learning

http://www.slideshare.net/hammawan/deep-neural-networks

Auto-encoder
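A minimal sketch of the auto-encoder idea behind such pre-training: each layer is trained to reconstruct its own input, then the layers are stacked. Tied weights and sigmoid activations (g() from the earlier sketch) are simplifying assumptions made here:

def autoencoder_step(x, W, b, c, eps=0.1):
    # encode: H = g(W x + b); decode with tied weights: x_rec = g(W.T H + c)
    H = g(W @ x + b)
    x_rec = g(W.T @ H + c)
    d_rec = (x_rec - x) * x_rec * (1 - x_rec)  # delta at the reconstruction layer
    d_hid = (W @ d_rec) * H * (1 - H)          # delta back-propagated to the hidden layer
    W -= eps * (np.outer(d_hid, x) + np.outer(H, d_rec))  # encoder + decoder (tied) gradients
    b -= eps * d_hid
    c -= eps * d_rec
    return W, b, c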

Page 64

Deep learning

http://www.slideshare.net/hammawan/deep-neural-networks

Page 65

Deep learning

http://www.slideshare.net/hammawan/deep-neural-networks

Page 66

Deep learning

http://www.slideshare.net/hammawan/deep-neural-networks

Page 67

Deep learning

http://www.slideshare.net/hammawan/deep-neural-networks

Page 68

Deep learning

http://www.slideshare.net/hammawan/deep-neural-networks

Page 69

Pan-specific prediction methods

NetMHC NetMHCpan

Page 70

Example

Peptide    Amino acids of HLA pockets  HLA    Aff
VVLQQHSIA  YFAVLTWYGEKVHTHVDTLVRYHY    A0201  0.131751
SQVSFQQPL  YFAVLTWYGEKVHTHVDTLVRYHY    A0201  0.487500
SQCQAIHNV  YFAVLTWYGEKVHTHVDTLVRYHY    A0201  0.364186
LQQSTYQLV  YFAVLTWYGEKVHTHVDTLVRYHY    A0201  0.582749
LQPFLQPQL  YFAVLTWYGEKVHTHVDTLVRYHY    A0201  0.206700
VLAGLLGNV  YFAVLTWYGEKVHTHVDTLVRYHY    A0201  0.727865
VLAGLLGNV  YFAVWTWYGEKVHTHVDTLLRYHY    A0202  0.706274
VLAGLLGNV  YFAEWTWYGEKVHTHVDTLVRYHY    A0203  1.000000
VLAGLLGNV  YYAVLTWYGEKVHTHVDTLVRYHY    A0206  0.682619
VLAGLLGNV  YYAVWTWYRNNVQTDVDTLIRYHY    A6802  0.407855
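In a pan-specific setup like this, the peptide and the HLA pocket pseudo-sequence are presented to the network as one concatenated input. A sketch using the sparse_encode example from the sequence-encoding slide:

peptide = "VLAGLLGNV"
pockets = "YFAVLTWYGEKVHTHVDTLVRYHY"  # A0201 pocket pseudo-sequence from the table above
x = sparse_encode(peptide) + sparse_encode(pockets)  # (9 + 24) * 20 = 660 input values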

Page 71

Going Deep – One hidden layer

[Plots: train and test MSE, and test PCC, as a function of the number of training iterations (0 to 500), for a network with one hidden layer of 20 neurons]

Page 72

Going Deep – 3 hidden layers

[Plots: train and test MSE, and test PCC, as a function of the number of training iterations (0 to 500), for architectures with 20, 20+20, and 20+20+20 hidden neurons]

Page 73

Going Deep – more than 3 hidden layers

[Plots: train and test MSE, and test PCC, as a function of the number of training iterations (0 to 500), for architectures 20, 20+20, 20+20+20, 20+20+20+20, and 20+20+20+20+20]

Page 74

Going Deep – Using Auto-encoders

[Plots: train and test MSE, and test PCC, as a function of the number of training iterations (0 to 500), for architectures 20, 20+20, 20+20+20, 20+20+20+20, 20+20+20+20+20, and 20+20+20+20+Auto (auto-encoder pre-trained)]

Page 75

Deep learning

http://www.slideshare.net/hammawan/deep-neural-networks

Page 76

Conclusions - 2

• Implementing deep networks using deltas[1] makes CPU time scale linearly with respect to the number of weights
– So going deep is not more CPU intensive than going wide
• Back-propagation is an efficient method for NN training for shallow networks with up to 3 hidden layers
• For deeper networks, pre-training is required, using for instance auto-encoders

[1] Bishop, Christopher (1995). Neural Networks for Pattern Recognition. Oxford: Clarendon Press. ISBN 0-19-853864-2.