
Artificial Neural Networks 1

Morten Nielsen, Department of Systems Biology, DTU

Biological Neural network

Biological neuron structure

Artificial neural networks. Background

Higher order sequence correlations

•  Neural networks can learn higher order correlations. What does this mean?

  S S => 0
  L S => 1
  S L => 1
  L L => 0

Say that the peptide needs one and only one large (L) amino acid in positions P3 and P4 to fill the binding cleft. How would you formulate this to test whether a peptide can bind?

=> The XOR function

Neural networks

•  Neural networks can learn higher order correlations.

XOR function:
  0 0 => 0
  1 0 => 1
  0 1 => 1
  1 1 => 0

No linear function can separate the points (0,0) and (1,1) from (1,0) and (0,1); in contrast, OR and AND are linearly separable.

[Figure: the four corner points (0,0), (1,0), (0,1), (1,1) labeled for OR, AND, and XOR]

Error estimates

XOR:

  Input   Target   Predict   Error
  0 0     0        0         0
  1 0     1        1         0
  0 1     1        1         0
  1 1     0        1         1

Mean error: 1/4

Linear methods and the XOR function

[Figure: a linear unit with inputs $x_1$, $x_2$ and weights $v_1$, $v_2$]

Linear function: $y = x_1 \cdot v_1 + x_2 \cdot v_2$
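To make the non-separability concrete before introducing the hidden layer, here is a minimal sketch (my own illustration, not from the slides) that brute-forces a grid of weights $v_1$, $v_2$ and a threshold and finds no linear function that reproduces the XOR outputs:

```python
# Brute-force search: no thresholded linear function y = x1*v1 + x2*v2 separates XOR.
import itertools

xor_data = [((0, 0), 0), ((1, 0), 1), ((0, 1), 1), ((1, 1), 0)]

def classifies_all(v1, v2, theta):
    """True if sign(x1*v1 + x2*v2 - theta) reproduces the XOR targets."""
    return all((x1 * v1 + x2 * v2 > theta) == bool(t) for (x1, x2), t in xor_data)

grid = [i / 10 for i in range(-50, 51)]   # weights and threshold in [-5, 5]
found = any(classifies_all(v1, v2, theta)
            for v1, v2, theta in itertools.product(grid, repeat=3))
print("Linear separator found:", found)   # prints: Linear separator found: False
```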

Neural networks with a hidden layer

[Figure: network with two inputs, a bias input fixed at 1, input-to-hidden weights $w_{11}, w_{12}, w_{21}, w_{22}$ with bias weights $w_{t1}, w_{t2}$, and hidden-to-output weights $v_1, v_2$ with bias weight $v_t$]

$$O = \frac{1}{1 + \exp(-o)}, \qquad o = \sum_{i=1}^{N} x_i \cdot w_i + t = \sum_{i=1}^{N+1} x_i \cdot w_i, \qquad x_{N+1} = 1 \text{ (bias input)}$$
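As a small sketch of the unit defined above (an assumed implementation, not code from the course), the bias weight $t$ is folded in by appending an extra input fixed at 1:

```python
import math

def sigmoid_unit(x, w):
    """x: N inputs; w: N weights plus one bias weight. Returns O = g(o)."""
    x = list(x) + [1.0]                       # append the bias input x_{N+1} = 1
    o = sum(xi * wi for xi, wi in zip(x, w))  # o = sum_i x_i * w_i (bias included)
    return 1.0 / (1.0 + math.exp(-o))         # O = 1 / (1 + exp(-o))

print(sigmoid_unit([0, 0], [4.0, 4.0, -6.0]))  # ~0.0025: the unit is "off" for input (0, 0)
```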

How does it work? Example: input is (0, 0)

[Figure: the XOR network with numeric weights 4, 6, 4, 6 (input to hidden), hidden bias weights -6 and -2, hidden-to-output weights -9 and 9, and output bias weight -4.5. For input (0, 0): $o_1 = -6$, $O_1 = 0$; $o_2 = -2$, $O_2 = 0$; $y_1 = -4.5$, $Y_1 = 0$]

$$o = \sum_i x_i \cdot w_i$$
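The sketch below reproduces the worked example. The exact placement of the figure's weights is my reading of the numbers (an assumption), chosen so that $o_1 = -6$, $o_2 = -2$ and $y_1 = -4.5$ come out for input (0, 0). The slide rounds the sigmoid outputs to 0/1; with exact sigmoids the outputs are close to, but not exactly, 0 and 1:

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def xor_net(x1, x2):
    o1 = 4 * x1 + 4 * x2 - 6      # hidden unit 1: "AND"-like (bias weight -6)
    o2 = 6 * x1 + 6 * x2 - 2      # hidden unit 2: "OR"-like (bias weight -2)
    O1, O2 = g(o1), g(o2)
    y1 = -9 * O1 + 9 * O2 - 4.5   # output: "OR but not AND" (bias weight -4.5)
    return g(y1)

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print((x1, x2), round(xor_net(x1, x2), 2))  # ≈ 0.03, 0.96, 0.96, 0.03 -> XOR
```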

Gradient descent (from Wikipedia)

Gradient descent is based on the observation that if a real-valued function $F(x)$ is defined and differentiable in a neighborhood of a point $a$, then $F(x)$ decreases fastest if one goes from $a$ in the direction of the negative gradient of $F$ at $a$. It follows that if

$$b = a - \varepsilon \cdot \nabla F(a)$$

for $\varepsilon > 0$ a small enough number, then $F(b) < F(a)$.
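A minimal sketch of this update on a toy function $F(x) = (x - 3)^2$ (my own example, not from the slides):

```python
# Repeatedly apply b = a - eps * grad F(a); F decreases at every step.
def grad_F(x):
    return 2.0 * (x - 3.0)        # gradient of F(x) = (x - 3)^2

a, eps = 0.0, 0.1
for _ in range(50):
    a = a - eps * grad_F(a)
print(round(a, 4))                # ≈ 3.0, the minimum of F
```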

Gradient descent. Example

Weights are changed in the opposite direction of the gradient of the error:

$$w_i' = w_i + \Delta w_i$$

$$E = \frac{1}{2} \cdot (O - t)^2, \qquad O = \sum_i w_i \cdot I_i$$

$$\Delta w_i = -\varepsilon \cdot \frac{\partial E}{\partial w_i} = -\varepsilon \cdot \frac{\partial E}{\partial O} \cdot \frac{\partial O}{\partial w_i} = -\varepsilon \cdot (O - t) \cdot I_i$$

[Figure: a linear unit with inputs $I_1$, $I_2$, weights $w_1$, $w_2$ and output $O$]

Linear function: $O = I_1 \cdot w_1 + I_2 \cdot w_2$
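A sketch of the resulting training loop for the linear unit (an assumed implementation with hypothetical sample data, not the course code):

```python
# Online gradient descent for the linear unit O = I1*w1 + I2*w2,
# using the update Δw_i = -eps * (O - t) * I_i.
def train_linear(samples, eps=0.1, epochs=100):
    w = [0.0, 0.0]
    for _ in range(epochs):
        for I, t in samples:
            O = I[0] * w[0] + I[1] * w[1]    # forward: O = sum_i w_i * I_i
            w = [wi - eps * (O - t) * Ii     # Δw_i = -eps * (O - t) * I_i
                 for wi, Ii in zip(w, I)]
    return w

# The target function t = 2*I1 - I2 is recovered from four noise-free samples.
samples = [([1, 0], 2), ([0, 1], -1), ([1, 1], 1), ([2, 1], 3)]
print([round(wi, 3) for wi in train_linear(samples)])  # ≈ [2.0, -1.0]
```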

ANNs - Hidden to output layer

$$\frac{\partial E(O(o(w_j)))}{\partial w_j} = \frac{\partial E}{\partial w_j} = \frac{\partial E}{\partial O} \cdot \frac{\partial O}{\partial o} \cdot \frac{\partial o}{\partial w_j}$$

$$E = \frac{1}{2} \cdot (O - t)^2, \qquad O = g(o), \qquad o = \sum_j w_j \cdot H_j$$

Hidden to output layer

∂E∂wj

=∂E∂O

⋅∂O∂o

⋅∂o∂wj

= (O− t) ⋅ g '(o) ⋅H j

∂E∂O

= (O− t)

∂O∂o

=∂g∂o

= g '(o)

∂o∂wj

=1∂wj

wj ⋅l∑ H j = H j

o = wii∑ ⋅H j

What about the hidden layer?

$$\Delta v_{jk} = -\varepsilon \cdot \frac{\partial E}{\partial v_{jk}}$$

$$o = \sum_j w_j \cdot H_j, \qquad h_j = \sum_k v_{jk} \cdot I_k$$

$$E = \frac{1}{2} \cdot (O - t)^2, \qquad O = g(o), \quad H = g(h), \qquad g(x) = \frac{1}{1 + e^{-x}}$$

Input to hidden layer

∂E∂vjk

=∂E(O(o(H j (hj (vjk ))))

∂vjk

=∂E∂O

⋅∂O∂o

⋅∂o∂H j

⋅∂H j

∂hj⋅∂hj∂vjk

= (O− t) ⋅ g '(o) ⋅wj ⋅ g '(hj ) ⋅ Ik

Summary

$$\frac{\partial E}{\partial w_j} = (O - t) \cdot g'(o) \cdot H_j, \qquad \frac{\partial E}{\partial v_{jk}} = (O - t) \cdot g'(o) \cdot w_j \cdot g'(h_j) \cdot I_k$$

Or, with $\delta = (O - t) \cdot g'(o)$:

$$\frac{\partial E}{\partial w_j} = \delta \cdot H_j, \qquad \frac{\partial E}{\partial v_{jk}} = \delta \cdot w_j \cdot g'(h_j) \cdot I_k$$
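A sketch of online training with these two update rules on the XOR problem (my own implementation of the summary formulas, not the course code; biases are folded in as extra inputs/hidden units fixed at 1):

```python
import math, random

def g(x):  return 1.0 / (1.0 + math.exp(-x))
def gp(x): return g(x) * (1.0 - g(x))                # g'(x) for the logistic function

NH, eps = 2, 0.5                                     # hidden units, learning rate
v = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(NH)]   # v[j][k]; k = 2 is the bias input
w = [random.uniform(-1, 1) for _ in range(NH + 1)]                   # w[j];    j = NH is the bias unit

def forward(x1, x2):
    I = [x1, x2, 1.0]                                # inputs plus bias input
    h = [sum(vj[k] * I[k] for k in range(3)) for vj in v]
    H = [g(hj) for hj in h] + [1.0]                  # hidden outputs plus bias unit
    o = sum(w[j] * H[j] for j in range(NH + 1))
    return I, h, H, o, g(o)

data = [((0, 0), 0), ((1, 0), 1), ((0, 1), 1), ((1, 1), 0)]
for _ in range(20000):
    (x1, x2), t = random.choice(data)
    I, h, H, o, O = forward(x1, x2)
    delta = (O - t) * gp(o)                          # delta = (O - t) * g'(o)
    for j in range(NH):                              # dE/dv_jk = delta * w_j * g'(h_j) * I_k
        for k in range(3):
            v[j][k] -= eps * delta * w[j] * gp(h[j]) * I[k]
    for j in range(NH + 1):                          # dE/dw_j = delta * H_j
        w[j] -= eps * delta * H[j]

# Typically prints values close to [0, 1, 1, 0]; with only two hidden units an
# occasional random initialization can stall in a local minimum.
print([round(forward(x1, x2)[-1], 2) for (x1, x2), _ in data])
```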

Neural networks and the XOR function

Deep(er) Network architecture

$$E = \frac{1}{2} \cdot (O - t)^2, \qquad O = g(o), \quad H = g(h), \qquad g(x) = \frac{1}{1 + e^{-x}}$$

$$o = \sum_j w_j \cdot H_j^2, \qquad h_j^2 = \sum_k v_{jk} \cdot H_k^1, \qquad h_k^1 = \sum_l u_{kl} \cdot I_l$$

$$\Delta w_i = -\varepsilon \cdot \frac{\partial E}{\partial w_i}$$

Deeper Network architecture

[Figure: input layer $I_l$; first hidden layer $h_k^1$, $H_k^1$ reached through weights $u_{kl}$; second hidden layer $h_j^2$, $H_j^2$ reached through weights $v_{jk}$; output layer $h^3$, $H^3$ reached through weights $w_j$]

$$\frac{\partial E}{\partial w_j} = \frac{\partial E(H^3(h^3(w_j)))}{\partial w_j} = \frac{\partial E}{\partial H^3} \cdot \frac{\partial H^3}{\partial h^3} \cdot \frac{\partial h^3}{\partial w_j} = (H^3 - t) \cdot g'(h^3) \cdot H_j^2$$

Network architecture (hidden to hidden)

$$\frac{\partial E}{\partial v_{jk}} = \frac{\partial E}{\partial H^3} \cdot \frac{\partial H^3}{\partial h^3} \cdot \frac{\partial h^3}{\partial H_j^2} \cdot \frac{\partial H_j^2}{\partial h_j^2} \cdot \frac{\partial h_j^2}{\partial v_{jk}} = (H^3 - t) \cdot g'(h^3) \cdot w_j \cdot g'(h_j^2) \cdot H_k^1$$

Network architecture (input to hidden)

$$\frac{\partial E}{\partial u_{kl}} = \sum_j \frac{\partial E}{\partial H^3} \cdot \frac{\partial H^3}{\partial h^3} \cdot \frac{\partial h^3}{\partial H_j^2} \cdot \frac{\partial H_j^2}{\partial h_j^2} \cdot \frac{\partial h_j^2}{\partial H_k^1} \cdot \frac{\partial H_k^1}{\partial h_k^1} \cdot \frac{\partial h_k^1}{\partial u_{kl}} = (H^3 - t) \cdot g'(h^3) \cdot \sum_j w_j \cdot g'(h_j^2) \cdot v_{jk} \cdot g'(h_k^1) \cdot I_l$$
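Since the chain of factors is easy to get wrong, here is a sketch (my own check, not from the slides) that compares the derived expression for $\partial E / \partial u_{kl}$ with a finite-difference estimate on a small random network (biases omitted, as in the equations above):

```python
import math, random

def g(x):  return 1.0 / (1.0 + math.exp(-x))
def gp(x): return g(x) * (1.0 - g(x))

random.seed(0)
L, K, J = 3, 4, 4                                    # input / hidden-1 / hidden-2 sizes
I = [random.uniform(-1, 1) for _ in range(L)]
u = [[random.uniform(-1, 1) for _ in range(L)] for _ in range(K)]
v = [[random.uniform(-1, 1) for _ in range(K)] for _ in range(J)]
w = [random.uniform(-1, 1) for _ in range(J)]
t = 1.0

def E():
    """Forward pass; returns the error and all intermediate values."""
    h1 = [sum(u[k][l] * I[l] for l in range(L)) for k in range(K)]
    H1 = [g(x) for x in h1]
    h2 = [sum(v[j][k] * H1[k] for k in range(K)) for j in range(J)]
    H2 = [g(x) for x in h2]
    h3 = sum(w[j] * H2[j] for j in range(J))
    return 0.5 * (g(h3) - t) ** 2, h1, H1, h2, H2, h3

_, h1, H1, h2, H2, h3 = E()
k, l = 1, 2                                          # check one particular weight u_kl
analytic = ((g(h3) - t) * gp(h3)
            * sum(w[j] * gp(h2[j]) * v[j][k] for j in range(J))
            * gp(h1[k]) * I[l])

d = 1e-6                                             # central finite difference on u_kl
u[k][l] += d;     Ep = E()[0]
u[k][l] -= 2 * d; Em = E()[0]
print(analytic, (Ep - Em) / (2 * d))                 # the two numbers agree
```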


Use deltas

$$h_j^q = \sum_i w_{ji}^q \cdot H_i^{q-1}, \qquad H_i^{q-1} = g(h_i^{q-1})$$

[Figure: layers $l$, $k$, $j$ connected by weights $u_{kl}$ and $v_{jk}$, with the corresponding deltas $\delta_k$ and $\delta_j$]

Bishop, Christopher (1995). Neural Networks for Pattern Recognition. Oxford: Clarendon Press. ISBN 0-19-853864-2.

Use deltas

$$\frac{\partial E}{\partial w_{ji}^q} = \frac{\partial E}{\partial h_j^q} \cdot \frac{\partial h_j^q}{\partial w_{ji}^q} = \delta_j^q \cdot H_i^{q-1}, \qquad \delta_j^q \equiv \frac{\partial E}{\partial h_j^q}$$

$$\delta^3 = \frac{\partial E}{\partial h^3} = \frac{\partial E}{\partial H^3} \cdot \frac{\partial H^3}{\partial h^3} = (H^3 - t) \cdot g'(h^3)$$

$$\delta_j^2 = \frac{\partial E}{\partial h_j^2} = \frac{\partial E}{\partial h^3} \cdot \frac{\partial h^3}{\partial h_j^2} = \frac{\partial E}{\partial h^3} \cdot \frac{\partial h^3}{\partial H_j^2} \cdot \frac{\partial H_j^2}{\partial h_j^2} = g'(h_j^2) \cdot \delta^3 \cdot w_j$$

$$\delta_k^1 = \frac{\partial E}{\partial h_k^1} = \sum_j \frac{\partial E}{\partial h_j^2} \cdot \frac{\partial h_j^2}{\partial h_k^1} = \sum_j \frac{\partial E}{\partial h_j^2} \cdot \frac{\partial h_j^2}{\partial H_k^1} \cdot \frac{\partial H_k^1}{\partial h_k^1} = g'(h_k^1) \cdot \sum_j \delta_j^2 \cdot v_{jk}$$

$$h_j = \sum_i w_{ji} \cdot H_i, \qquad H_i = g(h_i)$$
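A sketch of the generic delta recursion for an arbitrary number of layers (an assumed implementation, not the course code; biases omitted). At the output, delta = (H - t) * g'(h); for a lower layer, each delta is g'(h) times the weighted sum of the deltas one layer up; every weight gradient is then delta times the activation feeding it:

```python
import math

def g(x):  return 1.0 / (1.0 + math.exp(-x))
def gp(x): return g(x) * (1.0 - g(x))

def forward(W, x):
    """W: list of weight matrices, one per layer. Returns pre-activations hs and activations Hs."""
    hs, Hs = [], [x]
    for Wq in W:
        h = [sum(wji * Hi for wji, Hi in zip(row, Hs[-1])) for row in Wq]
        hs.append(h)
        Hs.append([g(hj) for hj in h])
    return hs, Hs

def backprop(W, x, t):
    """Return dE/dW for E = 0.5*(H_out - t)^2, using the delta recursion (single output)."""
    hs, Hs = forward(W, x)
    delta = [(Hs[-1][0] - t) * gp(hs[-1][0])]                  # output-layer delta
    grads = [None] * len(W)
    for q in range(len(W) - 1, -1, -1):
        grads[q] = [[dj * Hi for Hi in Hs[q]] for dj in delta]  # dE/dW^q[j][i] = delta_j^q * H_i^{q-1}
        if q > 0:                                               # delta for the layer below
            delta = [gp(hs[q - 1][j]) * sum(delta[i] * W[q][i][j] for i in range(len(delta)))
                     for j in range(len(hs[q - 1]))]
    return grads

# Tiny usage example: a 2-3-3-1 network with all weights set to 0.1.
W = [[[0.1] * 2 for _ in range(3)], [[0.1] * 3 for _ in range(3)], [[0.1] * 3]]
print(backprop(W, [1.0, 0.0], 1.0)[0][0])   # gradient of the first input-to-hidden weight row
```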

Deep learning

http://www.slideshare.net/hammawan/deep-neural-networks

Deep learning – time is not an issue

[Figure: CPU time (user seconds) versus the number of weights, and the number of weights versus the number of hidden layers; CPU time grows linearly with the number of weights]

Deep learning

http://www.slideshare.net/hammawan/deep-neural-networks


Auto encoder


Pan-specific prediction methods

NetMHC NetMHCpan

Example:

Peptide     Amino acids of HLA pockets   HLA     Aff
VVLQQHSIA   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.131751
SQVSFQQPL   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.487500
SQCQAIHNV   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.364186
LQQSTYQLV   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.582749
LQPFLQPQL   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.206700
VLAGLLGNV   YFAVLTWYGEKVHTHVDTLVRYHY     A0201   0.727865
VLAGLLGNV   YFAVWTWYGEKVHTHVDTLLRYHY     A0202   0.706274
VLAGLLGNV   YFAEWTWYGEKVHTHVDTLVRYHY     A0203   1.000000
VLAGLLGNV   YYAVLTWYGEKVHTHVDTLVRYHY     A0206   0.682619
VLAGLLGNV   YYAVWTWYRNNVQTDVDTLIRYHY     A6802   0.407855
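The table suggests the pan-specific input representation: the peptide and the HLA pocket pseudo-sequence are presented to the network together, and the target is the affinity value (0-1 scale). A sketch of one possible encoding (my own illustration, not NetMHCpan code):

```python
# Concatenate peptide and pocket pseudo-sequence, then sparse (one-hot) encode each residue.
AA = "ACDEFGHIKLMNPQRSTVWY"

def sparse_encode(seq):
    """One-hot encode an amino-acid sequence as a flat list of 20 values per residue."""
    return [1.0 if aa == a else 0.0 for aa in seq for a in AA]

peptide = "VVLQQHSIA"
pseudo  = "YFAVLTWYGEKVHTHVDTLVRYHY"   # HLA A0201 pocket residues from the table
target  = 0.131751                     # affinity value from the table

x = sparse_encode(peptide + pseudo)    # network input: (9 + 24) * 20 = 660 values
print(len(x), target)
```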

Going Deep – One hidden layer

[Figure: training MSE, test MSE, and test PCC versus the number of training iterations for a single hidden layer of 20 neurons]

Going Deep – 3 hidden layers

[Figure: training MSE, test MSE, and test PCC versus the number of training iterations for architectures with 20, 20+20, and 20+20+20 hidden neurons]

Going Deep – more than 3 hidden layers

[Figure: training MSE, test MSE, and test PCC versus the number of training iterations for architectures with 20, 20+20, 20+20+20, 20+20+20+20, and 20+20+20+20+20 hidden neurons]

Going Deep – Using Auto-encoders

[Figure: training MSE, test MSE, and test PCC versus the number of training iterations for the architectures above plus an auto-encoder pre-trained variant (20+20+20+20+Auto)]

Conclusions

•  Implementing deep networks using deltas¹ makes CPU time scale linearly with the number of weights, so going deep is not more CPU intensive than going wide.

•  Back-propagation is an efficient method for NN training for shallow networks with up to 3 hidden layers.

•  For deeper networks, pre-training is required, using for instance auto-encoders.

¹ Bishop, Christopher (1995). Neural Networks for Pattern Recognition. Oxford: Clarendon Press. ISBN 0-19-853864-2.
