Artificial Neural Networks 1
Morten Nielsen, Department of Systems Biology, DTU
Biological Neural network
Biological neuron structure
Artificial neural networks. Background
Higher order sequence correlations
• Neural networks can learn higher order correlations!
– What does this mean?

Say that a peptide needs one and only one large amino acid at positions P3 and P4 to fill the binding cleft. How would you formulate a rule to test whether a peptide can bind?

S S => 0
L S => 1
S L => 1
L L => 0

=> This is the XOR function.
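The mapping from binding rule to XOR can be checked in a few lines. A minimal Python sketch (illustrative, not from the slides), encoding S = small and L = large:

# Illustrative check: "exactly one large residue at P3/P4" is the XOR function.
for p3 in ("S", "L"):                       # S = small, L = large
    for p4 in ("S", "L"):
        binds = (p3 == "L") != (p4 == "L")  # one and only one large residue
        print(p3, p4, "=>", int(binds))     # reproduces the S/L table above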
Neural networks
• Neural networks can learn higher order correlations
XOR function:
0 0 => 0
1 0 => 1
0 1 => 1
1 1 => 0

[Figure: the four points (0,0), (0,1), (1,0), (1,1) in the plane. OR and AND can each be separated by a straight line; for XOR, no linear function can separate the 0-points from the 1-points.]
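The separability claim can be made concrete with a brute-force search. The Python sketch below (the weight grid is a hypothetical choice, not from the slides) tries linear threshold units sign(w1·x1 + w2·x2 + b) and finds separating lines for OR and AND, but none for XOR:

import itertools
import numpy as np

points  = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = {"OR": [0, 1, 1, 1], "AND": [0, 0, 0, 1], "XOR": [0, 1, 1, 0]}

def separable(labels, grid=np.linspace(-2, 2, 21)):
    """Search a coarse grid for a line w1*x1 + w2*x2 + b > 0 matching labels."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        if [int(w1 * x1 + w2 * x2 + b > 0) for x1, x2 in points] == labels:
            return True
    return False

for name, labels in targets.items():
    print(name, "separable:", separable(labels))   # OR/AND: True, XOR: False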
Error estimates
The best linear function predicts the XOR table as follows:

Input   Target   Predicted   Error
0 0     0        0           0
1 0     1        1           0
0 1     1        1           0
1 1     0        1           1

Mean error: 1/4

[Figure: the four XOR points in the plane with the linear decision boundary that misclassifies (1,1)]
Linear methods and the XOR function
[Figure: a linear unit with inputs x1, x2 and weights v1, v2]

Linear function:
y = x1·v1 + x2·v2
Neural networks with a hidden layer
[Figure: feed-forward network with one hidden layer. Input-to-hidden weights w11, w12, w21, w22 with bias weights wt1, wt2; hidden-to-output weights v1, v2 with bias weight vt; each layer has a constant bias input of 1.]

O = 1 / (1 + exp(−o))

o = Σ_{i=1..N} x_i·w_i + t = Σ_{i=1..N+1} x_i·w_i,  where x_{N+1} = 1 is the bias input (with w_{N+1} = t)
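A minimal sketch of this neuron computation in Python (the numeric values are hypothetical), including the equivalent bias-as-extra-input form x_{N+1} = 1:

import numpy as np

def neuron_output(x, w, t):
    """o = sum_i x_i w_i + t, squashed: O = 1 / (1 + exp(-o))."""
    o = np.dot(x, w) + t
    return 1.0 / (1.0 + np.exp(-o))

# Equivalent form: append x_{N+1} = 1 with weight w_{N+1} = t.
x, w, t = np.array([0.2, 0.7]), np.array([1.5, -0.8]), 0.1   # hypothetical values
x_ext, w_ext = np.append(x, 1.0), np.append(w, t)
assert np.isclose(neuron_output(x, w, t),
                  1.0 / (1.0 + np.exp(-np.dot(x_ext, w_ext))))
print(neuron_output(x, w, t))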
How does it work? Example: input is (0, 0)

[Figure: the XOR network with numeric weights (6, −9, 4, 6, 9, −2, −6, 4, −4.5) attached to its connections and bias inputs]

With input (0, 0) only the bias weights contribute to o = Σ_i x_i·w_i:

o1 = −6  =>  O1 ≈ 0
o2 = −2  =>  O2 ≈ 0
y1 = −4.5  =>  Y1 ≈ 0
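The transcript does not preserve which connection each number belongs to, so the Python sketch below uses one assignment consistent with the figure's values (hidden biases −6 and −2, output bias −4.5; the rest is an assumption). Under it, hidden unit 1 acts roughly as AND, hidden unit 2 as OR, and the output computes "OR and not AND", i.e. XOR. The slide rounds O1, O2, Y1 to 0/1; the smooth sigmoid gives values near, but not exactly, 0 and 1.

import numpy as np

g = lambda x: 1.0 / (1.0 + np.exp(-x))   # logistic activation

# Assumed assignment of the figure's numbers: hidden 1 ~ AND (4, 4, bias -6),
# hidden 2 ~ OR (6, 6, bias -2), output = OR and not AND (-9, +9, bias -4.5).
V  = np.array([[4.0, 4.0], [6.0, 6.0]])  # input -> hidden weights
vb = np.array([-6.0, -2.0])              # hidden bias weights
w  = np.array([-9.0, 9.0])               # hidden -> output weights
wb = -4.5                                # output bias weight

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    H = g(V @ np.array(x, dtype=float) + vb)   # hidden activations O1, O2
    Y = g(w @ H + wb)                          # output activation Y1
    print(x, "->", round(float(Y)))            # prints the XOR truth table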
Gradient descent (from Wikipedia)

Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that if

b = a − ε·∇F(a)

for ε > 0 small enough, then F(b) ≤ F(a).
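A minimal Python illustration of this update (the function F(x) = (x − 3)², with gradient 2(x − 3), is a hypothetical example):

def F(x):     return (x - 3.0) ** 2     # hypothetical example function
def gradF(x): return 2.0 * (x - 3.0)    # its gradient

a, eps = 0.0, 0.1
for step in range(25):
    b = a - eps * gradF(a)              # b = a - eps * grad F(a)
    assert F(b) <= F(a)                 # F never increases along the step
    a = b
print(a)                                # approaches the minimum at x = 3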
Gradient descent. Example

Weights are changed in the opposite direction of the gradient of the error:

w_i' = w_i + Δw_i

E = ½·(O − t)²
O = Σ_i w_i·I_i

Δw_i = −ε·∂E/∂w_i = −ε·(∂E/∂O)·(∂O/∂w_i) = −ε·(O − t)·I_i
[Figure: linear unit with inputs I1, I2, weights w1, w2, and output O]

Linear function:
O = I1·w1 + I2·w2
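Putting the update rule to work: a Python sketch (the dataset is hypothetical, generated from known target weights) that trains the linear unit with Δw_i = −ε·(O − t)·I_i:

import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])          # hypothetical target weights
I = rng.normal(size=(100, 2))           # inputs (I1, I2) per example
t = I @ true_w                          # targets t

w, eps = np.zeros(2), 0.05
for epoch in range(50):
    for Ii, ti in zip(I, t):
        O = Ii @ w                      # O = I1*w1 + I2*w2
        w += -eps * (O - ti) * Ii       # delta rule: dE/dw_i = (O - t) I_i
print(w)                                # converges towards [2.0, -1.0]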
ANNs - Hidden to output layer

∂E/∂w_j = ∂E(O(o(w_j)))/∂w_j = (∂E/∂O)·(∂O/∂o)·(∂o/∂w_j)

with

E = ½·(O − t)²
O = g(o)
o = Σ_j w_j·H_j
Hidden to output layer

∂E/∂w_j = (∂E/∂O)·(∂O/∂o)·(∂o/∂w_j) = (O − t)·g'(o)·H_j

where

∂E/∂O = (O − t)
∂O/∂o = ∂g/∂o = g'(o)
∂o/∂w_j = ∂/∂w_j (Σ_l w_l·H_l) = H_j,  with o = Σ_j w_j·H_j
What about the hidden layer?
Δv_jk = −ε·∂E/∂v_jk

o = Σ_j w_j·H_j
h_j = Σ_k v_jk·I_k

E = ½·(O − t)²
O = g(o),  H_j = g(h_j)
g(x) = 1 / (1 + e^(−x))
Input to hidden layer
∂E/∂v_jk = ∂E(O(o(H_j(h_j(v_jk)))))/∂v_jk
         = (∂E/∂O)·(∂O/∂o)·(∂o/∂H_j)·(∂H_j/∂h_j)·(∂h_j/∂v_jk)
         = (O − t)·g'(o)·w_j·g'(h_j)·I_k
Summary
∂E/∂w_j = (O − t)·g'(o)·H_j
∂E/∂v_jk = (O − t)·g'(o)·w_j·g'(h_j)·I_k

Or, defining δ = (O − t)·g'(o):

∂E/∂w_j = δ·H_j
∂E/∂v_jk = δ·w_j·g'(h_j)·I_k
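These two formulas are all that is needed to train a one-hidden-layer network. A Python sketch follows (the 2-hidden-unit architecture, learning rate, and seed are assumptions; biases are handled via the slides' constant-input convention, and depending on the seed, training can land in a local minimum):

import numpy as np

g  = lambda x: 1.0 / (1.0 + np.exp(-x))     # g(x) = 1/(1+e^-x)
gp = lambda x: g(x) * (1.0 - g(x))          # g'(x) = g(x)(1 - g(x))

# XOR data; a constant 1 is appended so biases become ordinary weights,
# following the slides' x_{N+1} = 1 convention.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 0.0])

rng = np.random.default_rng(2)
v = rng.normal(size=(2, 3))                 # input -> hidden weights v_jk
w = rng.normal(size=3)                      # hidden(+bias) -> output weights w_j

eps = 0.5
for epoch in range(10000):
    for I, t in zip(X, T):
        h = v @ I                           # h_j = sum_k v_jk I_k
        H = np.append(g(h), 1.0)            # H_j = g(h_j), plus a bias unit
        o = w @ H                           # o = sum_j w_j H_j
        O = g(o)
        delta  = (O - t) * gp(o)            # delta = (O - t) g'(o)
        grad_w = delta * H                  # dE/dw_j  = delta H_j
        grad_v = np.outer(delta * w[:2] * gp(h), I)  # dE/dv_jk = delta w_j g'(h_j) I_k
        w -= eps * grad_w
        v -= eps * grad_v

for I in X:                                 # should print ~0, 1, 1, 0
    print(I[:2], "->", round(float(g(w @ np.append(g(v @ I), 1.0))), 2))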
Neural networks and the XOR function
Deep(er) Network architecture
E = ½·(O − t)²
O = g(o),  H = g(h)
g(x) = 1 / (1 + e^(−x))

o = Σ_j w_j·H_j^2
h_j^2 = Σ_k v_jk·H_k^1
h_k^1 = Σ_l u_kl·I_l

Δw_i = −ε·∂E/∂w_i
Deeper Network architecture

[Figure: three-layer network. Input layer I_l; first hidden layer k (pre-activations h_k^1, activations H_k^1); second hidden layer j (h_j^2, H_j^2); output (h^3, H^3). Weights: u_kl (input to first hidden), v_jk (first to second hidden), w_j (second hidden to output).]

∂E/∂w_j = ∂E(H^3(h^3(w_j)))/∂w_j
        = (∂E/∂H^3)·(∂H^3/∂h^3)·(∂h^3/∂w_j)
        = (H^3 − t)·g'(h^3)·H_j^2
Network architecture (hidden to hidden)
∂E/∂v_jk = (∂E/∂H^3)·(∂H^3/∂h^3)·(∂h^3/∂H_j^2)·(∂H_j^2/∂h_j^2)·(∂h_j^2/∂v_jk)
         = (H^3 − t)·g'(h^3)·w_j·g'(h_j^2)·H_k^1
Network architecture (input to hidden)
∂E/∂u_kl = Σ_j [ (∂E/∂H^3)·(∂H^3/∂h^3)·(∂h^3/∂H_j^2)·(∂H_j^2/∂h_j^2)·(∂h_j^2/∂H_k^1) ] · (∂H_k^1/∂h_k^1)·(∂h_k^1/∂u_kl)
         = (H^3 − t)·g'(h^3)·[ Σ_j w_j·g'(h_j^2)·v_jk ]·g'(h_k^1)·I_l
Use deltas

h_j^q = Σ_i w_ji^q·H_i^(q−1)
H_i^(q−1) = g(h_i^(q−1))

[Figure: layers l, k, j with weights u_kl and v_jk; the deltas δ_k and δ_j are attached to layers k and j]

Bishop, Christopher (1995). Neural Networks for Pattern Recognition. Oxford: Clarendon Press. ISBN 0-19-853864-2.
Use deltas

∂E/∂w_ji^q = (∂E/∂h_j^q)·(∂h_j^q/∂w_ji^q) = δ_j^q·H_i^(q−1)

δ_j^q = ∂E/∂h_j^q

δ^3 = ∂E/∂h^3 = (∂E/∂H^3)·(∂H^3/∂h^3) = (H^3 − t)·g'(h^3)

δ_j^2 = ∂E/∂h_j^2 = (∂E/∂h^3)·(∂h^3/∂h_j^2) = (∂E/∂h^3)·(∂h^3/∂H_j^2)·(∂H_j^2/∂h_j^2) = g'(h_j^2)·δ^3·w_j

δ_k^1 = ∂E/∂h_k^1 = Σ_j (∂E/∂h_j^2)·(∂h_j^2/∂h_k^1) = Σ_j (∂E/∂h_j^2)·(∂h_j^2/∂H_k^1)·(∂H_k^1/∂h_k^1) = g'(h_k^1)·Σ_j δ_j^2·v_jk

with h_j = Σ_i w_ji·H_i and H_i = g(h_i).
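The delta recursion translates directly into a generic back-propagation routine for any number of layers. A Python sketch (the layer sizes, learning rate, and single training example are hypothetical; biases omitted for brevity):

import numpy as np

g  = lambda x: 1.0 / (1.0 + np.exp(-x))
gp = lambda x: g(x) * (1.0 - g(x))

def forward(Ws, x):
    """Forward pass: h^q = W^q H^(q-1), H^q = g(h^q), with H^0 = x."""
    hs, Hs = [], [x]
    for W in Ws:
        hs.append(W @ Hs[-1])
        Hs.append(g(hs[-1]))
    return hs, Hs

def backward(Ws, hs, Hs, t):
    """Deltas top-down; returns the gradients of E = 1/2 (H^L - t)^2 per layer."""
    delta = (Hs[-1] - t) * gp(hs[-1])              # top delta: (H^L - t) g'(h^L)
    grads = [None] * len(Ws)
    for q in range(len(Ws) - 1, -1, -1):
        grads[q] = np.outer(delta, Hs[q])          # dE/dW^q = delta^q (H^(q-1))^T
        if q > 0:                                  # propagate one layer down:
            delta = gp(hs[q - 1]) * (Ws[q].T @ delta)   # delta^(q-1)
    return grads

# Usage on hypothetical layer sizes 4 -> 5 -> 3 -> 1:
rng = np.random.default_rng(0)
sizes = [4, 5, 3, 1]
Ws = [rng.normal(scale=0.5, size=(m, n)) for n, m in zip(sizes, sizes[1:])]
x, t, eps = rng.normal(size=4), np.array([0.5]), 0.5
for step in range(500):
    hs, Hs = forward(Ws, x)
    for W, G in zip(Ws, backward(Ws, hs, Hs, t)):
        W -= eps * G                               # gradient-descent step
print(forward(Ws, x)[1][-1])                       # output approaches t = 0.5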
Deep learning
http://www.slideshare.net/hammawan/deep-neural-networks
Deep learning – time is not an issue
[Figure: left, CPU time (user seconds, 0-250) grows linearly with the number of weights (17,000 to 20,000); right, the number of weights as a function of the number of hidden layers (0 to 6).]
Auto-encoder
Pan-specific prediction methods
NetMHC NetMHCpan
Example:

Peptide     Amino acids of HLA pockets    HLA     Aff
VVLQQHSIA   YFAVLTWYGEKVHTHVDTLVRYHY      A0201   0.131751
SQVSFQQPL   YFAVLTWYGEKVHTHVDTLVRYHY      A0201   0.487500
SQCQAIHNV   YFAVLTWYGEKVHTHVDTLVRYHY      A0201   0.364186
LQQSTYQLV   YFAVLTWYGEKVHTHVDTLVRYHY      A0201   0.582749
LQPFLQPQL   YFAVLTWYGEKVHTHVDTLVRYHY      A0201   0.206700
VLAGLLGNV   YFAVLTWYGEKVHTHVDTLVRYHY      A0201   0.727865
VLAGLLGNV   YFAVWTWYGEKVHTHVDTLLRYHY      A0202   0.706274
VLAGLLGNV   YFAEWTWYGEKVHTHVDTLVRYHY      A0203   1.000000
VLAGLLGNV   YYAVLTWYGEKVHTHVDTLVRYHY      A0206   0.682619
VLAGLLGNV   YYAVWTWYRNNVQTDVDTLIRYHY      A6802   0.407855
Going Deep – One hidden layer
[Figure: training a network with one hidden layer of 20 neurons. Left: train MSE vs. iterations (0-500); middle: test MSE vs. iterations; right: test PCC vs. iterations.]
Going Deep – 3 hidden layers
[Figure: as above for networks with up to 3 hidden layers (20, 20+20, 20+20+20 neurons): train MSE, test MSE, and test PCC vs. iterations (0-500).]
Going Deep – more than 3 hidden layers
[Figure: as above, adding deeper networks (20+20+20+20 and 20+20+20+20+20): train MSE, test MSE, and test PCC vs. iterations (0-500).]
Going Deep – Using Auto-encoders
[Figure: as above, adding a network pre-trained with an auto-encoder (20+20+20+20+Auto): train MSE, test MSE, and test PCC vs. iterations (0-500).]
Conclusions
• Implementing deep networks using deltas¹ makes CPU time scale linearly with the number of weights, so going deep is not more CPU intensive than going wide.
• Back-propagation is an efficient method for NN training of shallow networks with up to 3 hidden layers.
• For deeper networks, pre-training is required, using for instance auto-encoders.

¹Bishop, Christopher (1995). Neural Networks for Pattern Recognition. Oxford: Clarendon Press. ISBN 0-19-853864-2.