A note about gradient descent:
Consider the function

$$f(x) = (x - x_0)^2$$

Its derivative is:

$$\frac{df(x)}{dx} = 2\,(x - x_0)$$

By gradient descent:

$$\frac{dx}{dt} = \dot{x} = -\eta\,\frac{df(x)}{dx} = -2\eta\,(x - x_0)$$

[Sketch: the parabola f(x) with its minimum at x0; the derivative is negative to the left of x0 and positive to the right.]
Solving the differential equation:

$$\dot{x} = -\eta\,\frac{df(x)}{dx} = -2\eta\,(x - x_0) = 2\eta\,x_0 - 2\eta\,x$$

or in the general form:

$$\dot{x} = \alpha - kx$$

What is the solution of this type of equation? Try $x(t) = A\exp(Bt) + C$:

$$x(t) = \bigl(x(0) - \alpha/k\bigr)\exp(-kt) + \alpha/k$$
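A minimal MATLAB sketch of this descent on f(x) = (x - x0)^2 (the values of x0, eta, and the starting point are arbitrary illustrative choices):

```matlab
% Discrete gradient descent on f(x) = (x - x0)^2.
x0  = 2;        % location of the minimum (arbitrary choice)
eta = 0.05;     % learning rate (arbitrary choice)
x   = -1;       % starting point x(0) (arbitrary choice)
for n = 1:200
    x = x - eta * 2*(x - x0);   % x <- x - eta * df/dx
end
% For small eta the iterate follows (x(0)-x0)*(1-2*eta)^n + x0,
% i.e. approximately the continuous solution x0 + (x(0)-x0)*exp(-2*eta*n).
disp(x)         % close to x0 = 2
```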
THE PERCEPTRON (classification)
Threshold unit:

$$O = \Theta\Bigl(\sum_i w_i x_i + w_0\Bigr), \qquad \Theta(x) = \begin{cases} 1 & x > 0 \\ 0 & x \le 0 \end{cases}$$

where $o$ is the output for input pattern $x$, $w_i$ are the synaptic weights, and $y$ is the desired output.

[Diagram: a single threshold unit $o$ with inputs $x_1,\dots,x_5$ and weights $w_1,\dots,w_5$.]
AND:

| x1 | x2 | y |
|----|----|---|
| 1  | 1  | 1 |
| 1  | 0  | 0 |
| 0  | 1  | 0 |
| 0  | 0  | 0 |
[Diagram: a two-input threshold unit solving AND, with weights $w_1 = w_2 = 1$ and bias $w_0 = -1.5$; in the $(x_1, x_2)$ plane the line below separates the pattern $(1,1)$ from the other three.]

$$x_1 + x_2 - 1.5 = 0$$

Linearly separable.
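A quick MATLAB check that these weights reproduce the AND truth table:

```matlab
% Threshold unit with w = [1 1] and bias w0 = -1.5 (the values on the slide).
X  = [1 1; 1 0; 0 1; 0 0];    % input patterns, one per row
w  = [1; 1];  w0 = -1.5;
O  = (X*w + w0) > 0;          % Theta(sum_i w_i x_i + w0)
disp([X O])                   % last column: 1 0 0 0 (AND)
```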
OR:

| x1 | x2 | y |
|----|----|---|
| 1  | 1  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 0  | 0  | 0 |

[Diagram: the same two-input threshold unit, now with weights $w_1 = w_2 = 1$ and bias $w_0 = -0.5$, solves OR; the line below separates $(0,0)$ from the other three patterns.]

$$x_1 + x_2 - 0.5 = 0$$

Linearly separable.
Perceptron learning rule:
[Diagram: a threshold unit $o$ with inputs $x_1,\dots,x_5$ and weights $w_1,\dots,w_5$.]

$$\Delta w_i = \eta\,(y - o)\,x_i$$

Convergence proof: Hertz, Krogh, Palmer (HKP). Did you receive the email?
Assignment 3a: Program in MATLAB a perceptron with the perceptron learning rule and solve the OR, AND, and XOR problems. (Due before Feb 27)
Show Demo
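As a starting point for the demo and assignment 3a, a minimal MATLAB sketch of the rule on the AND problem (eta, the number of epochs, and the zero initialization are arbitrary choices; extending it to OR and XOR is left for the assignment):

```matlab
% Perceptron learning rule dw_i = eta*(y - o)*x_i, trained on AND.
X   = [1 1; 1 0; 0 1; 0 0];       % input patterns
y   = [1; 0; 0; 0];               % desired outputs (AND)
w   = zeros(2,1);  w0 = 0;        % weights and bias
eta = 0.1;
for epoch = 1:50
    for mu = 1:size(X,1)
        o  = (X(mu,:)*w + w0) > 0;            % threshold unit output
        w  = w  + eta*(y(mu) - o)*X(mu,:)';   % perceptron rule
        w0 = w0 + eta*(y(mu) - o);            % bias as weight on a constant input 1
    end
end
disp(((X*w + w0) > 0)')                       % 1 0 0 0 after convergence
```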
Summary – what can perceptrons do and how?
Linear single-layer network (approximation, curve fitting)
Linear unit:

$$O = \sum_i w_i x_i + w_0$$

where $o$ is the output for input pattern $x$, $w_i$ are the synaptic weights, and $y$ is the desired output.

[Diagram: a single linear unit $o$ with inputs $x_1,\dots,x_5$ and weights $w_1,\dots,w_5$.]
Minimize the mean square error:

$$E = \sum_{\mu=1}^{P}\bigl(y^{\mu} - o^{\mu}\bigr)^2, \qquad o^{\mu} = x^{\mu}\cdot w \;\text{ (in vector form)}$$
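A minimal MATLAB sketch of the linear unit and this error (the data here are random placeholders; P = 20 patterns with N = 3 inputs are arbitrary choices):

```matlab
% Linear unit output and mean square error over P patterns.
P = 20;  N = 3;
X = rand(P, N);           % input patterns x^mu, one per row (placeholder data)
y = rand(P, 1);           % desired outputs y^mu (placeholder data)
w = rand(N, 1);  w0 = rand;
o = X*w + w0;             % o^mu = sum_i w_i x_i^mu + w0
E = sum((y - o).^2);      % E = sum_mu (y^mu - o^mu)^2
```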
The best solution is obtained when E is minimal.
For linear neurons there is an exact solution for this called the pseudo-inverse (see HKP).
Looking for a solution by gradient descent:
$$E = \sum_{\mu=1}^{P}\bigl(y^{\mu} - o^{\mu}\bigr)^2$$

[Sketch: $E$ as a function of $w$; the weight moves against the gradient, downhill toward the minimum.]

$$\frac{dw_i}{dt} = -\eta\,\frac{\partial E}{\partial w_i} = -\eta\,\frac{\partial E}{\partial O}\,\frac{\partial O}{\partial w_i} \qquad \text{(chain rule)}$$
Error:

$$E = \sum_{\mu=1}^{P}\bigl(y^{\mu} - o^{\mu}\bigr)^2$$

Since:

$$O = \sum_i w_i x_i + w_0 \qquad\text{and}\qquad \frac{\partial O}{\partial w_i} = \frac{\partial}{\partial w_i}\Bigl(\sum_j w_j x_j\Bigr) = x_i$$

$$\frac{dw_i}{dt} = -\eta\,\frac{\partial E}{\partial O}\,\frac{\partial O}{\partial w_i} = 2\eta\,(y - o)\,x_i$$

Therefore:

$$\Delta w_i = \eta\,(y - o)\,x_i$$
Which types of problems can a linear network solve?
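For instance, it can fit a function that is linear in the inputs. A minimal MATLAB sketch of the delta rule above, fitting y = 3x - 1 (the target function, eta, and the number of epochs are arbitrary choices; the factor of 2 is absorbed into eta):

```matlab
% Training a linear unit with the delta rule dw_i = eta*(y - o)*x_i.
x   = linspace(0, 1, 50)';        % 1D inputs
y   = 3*x - 1;                    % target function (arbitrary choice)
w   = 0;  w0 = 0;  eta = 0.05;
for epoch = 1:500
    for mu = 1:numel(x)
        o  = w*x(mu) + w0;                 % linear unit
        w  = w  + eta*(y(mu) - o)*x(mu);   % delta rule
        w0 = w0 + eta*(y(mu) - o);         % bias as weight on a constant input 1
    end
end
disp([w w0])                      % approaches [3 -1]
```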
Sigmoidal neurons:

$$O = \sigma\Bigl(\sum_i w_i x_i + w_0\Bigr)$$

for example:

$$\sigma(x, \beta) = \frac{\exp(\beta x)}{1 + \exp(\beta x)}$$

Gradient descent now gives:

$$\frac{dw_i}{dt} = 2\eta\,(y - o)\,\sigma'\,x_i$$

Which types of problems can a sigmoidal network solve?

Assignment 3b: Implement a one-layer linear network and a one-layer sigmoidal network; fit a 1D linear, a sigmoid, and a quadratic function with both networks.
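Related to assignment 3b, a hedged sketch of a single gradient step for a sigmoidal unit, assuming beta = 1 so that sigma'(h) = sigma(h)(1 - sigma(h)); the data, initial weights, and eta are arbitrary illustrative values:

```matlab
% One gradient step for o = sigma(w'*x + w0).
sig  = @(h) 1 ./ (1 + exp(-h));       % logistic sigmoid (beta = 1 assumed)
x    = [0.2; 0.7];  y = 1;            % one input pattern and its target (arbitrary)
w    = [0.1; -0.3]; w0 = 0;  eta = 0.5;
h    = w'*x + w0;
o    = sig(h);
g    = 2*(y - o) * o*(1 - o);         % 2*(y - o)*sigma'(h)
w    = w  + eta * g .* x;             % dw_i = 2*eta*(y - o)*sigma'(h)*x_i
w0   = w0 + eta * g;                  % bias as weight on a constant input 1
```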
Multi-layer networks:

$$o = \sum_k w_k^{2}\,\sigma\Bigl(\sum_j w_{k,j}^{1}\,x_j\Bigr)$$

[Diagram: input layer $x_1,\dots,x_5$, hidden layer units $o_{1,1}$ and $o_{2,1}$, output layer unit $o$.]

• Can solve non-linearly separable classification problems.
• Can approximate any arbitrary function, given 'enough' units in the hidden layer.
$$o = \sum_k w_k^{2}\,\sigma\Bigl(\sum_j w_{k,j}^{1}\,x_j\Bigr)$$

[Diagram: inputs $x_1,\dots,x_i,\dots,x_N$ feed the hidden units $o_{1,1}$ and $o_{2,1}$ through the weights $w_{1,1}^{1},\dots,w_{N,2}^{1}$; the hidden units feed the output $o$ through $w_1^{2}$ and $w_2^{2}$.]

Note: $w_{i,j}^{1}$ is not a vector but a matrix.
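A short MATLAB sketch of this forward pass, making explicit that the input-to-hidden weights form a matrix (the layer sizes and random values are arbitrary placeholders):

```matlab
% Forward pass of o = sum_k w2_k * sigma(sum_j w1_{k,j} x_j).
sig = @(h) 1 ./ (1 + exp(-h));
N = 4;  K = 2;                    % N inputs, K hidden units (arbitrary sizes)
x  = rand(N, 1);                  % input pattern (placeholder data)
w1 = rand(K, N);                  % input-to-hidden weights: a K-by-N matrix
w2 = rand(K, 1);                  % hidden-to-output weights: a vector
h  = w1 * x;                      % h_k = sum_j w1_{k,j} x_j
o  = w2' * sig(h);                % o = sum_k w2_k sigma(h_k)
```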
Solving linearly inseparable problems

$$o = \sum_k w_k^{2}\,\sigma\Bigl(\sum_j w_{k,j}^{1}\,x_j\Bigr)$$

[Diagram: a two-layer network with inputs $x_1, x_2$, hidden units $o_{1,1}$ and $o_{2,1}$, and one output.]

XOR:

| x1 | x2 | y |
|----|----|---|
| 1  | 1  | 0 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 0  | 0  | 0 |
Hint: XOR = (x1 OR x2) AND NOT (x1 AND x2).
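Following the hint, the hidden units can implement OR and AND with the threshold units used earlier, and the output unit combines them. A minimal MATLAB check (these particular weights are one possible choice, not the only one):

```matlab
% XOR built from threshold units as (OR) AND NOT (AND).
X    = [1 1; 1 0; 0 1; 0 0];
hOR  = (X*[1;1] - 0.5) > 0;       % hidden unit 1: x1 OR x2
hAND = (X*[1;1] - 1.5) > 0;       % hidden unit 2: x1 AND x2
o    = (hOR - hAND - 0.5) > 0;    % output: OR and not AND
disp([X o])                       % last column: 0 1 1 0 (XOR)
```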
How do we learn a multi-layer network? The credit assignment problem!

$$o = \sum_k w_k^{2}\,\sigma\Bigl(\sum_j w_{k,j}^{1}\,x_j\Bigr)$$

[Diagram: a two-layer network of threshold units solving XOR, with the XOR truth table as above; one hidden unit computes $x_1$ OR $x_2$, the other computes $x_1$ AND $x_2$, and the output unit combines them with weights of opposite sign.]
Gradient descent / back-propagation, the solution to the credit assignment problem:

$$o = \sum_k w_k^{2}\,\sigma\Bigl(\sum_j w_{k,j}^{1}\,x_j\Bigr), \qquad E = \sum_{\mu=1}^{P}\bigl(y^{\mu} - o^{\mu}\bigr)^2$$

From hidden-layer-to-output weights:

$$\Delta w_k^{2} = -\eta\,\frac{\partial E}{\partial w_k^{2}} = \eta\,\delta\,o_{k,1}$$

where $\delta = 2\,(y - o)$ and $o_{k,1} = \sigma(h_k)$, $h_k = \sum_j w_{k,j}^{1}\,x_j$.
$$o = \sum_k w_k^{2}\,\sigma(h_k), \qquad E = \sum_{\mu=1}^{P}\bigl(y^{\mu} - o^{\mu}\bigr)^2$$

For the input-to-hidden layer weights:

$$\Delta w_{i,j}^{1} = -\eta\,\frac{\partial E}{\partial w_{i,j}^{1}} = -\eta\,\frac{\partial E}{\partial o}\,\frac{\partial o}{\partial w_{i,j}^{1}}$$

Since:

$$\frac{\partial E}{\partial o} = -2\,(y - o), \qquad \frac{\partial o}{\partial w_{i,j}^{1}} = w_i^{2}\,\sigma'(h_i)\,x_j \qquad\text{and}\qquad h_k = \sum_j w_{k,j}^{1}\,x_j$$

Therefore:

$$\Delta w_{i,j}^{1} = 2\eta\,(y - o)\,w_i^{2}\,\sigma'(h_i)\,x_j = \eta\,\delta_i\,x_j, \qquad \text{where } \delta_i = \delta\,\sigma'(h_i)\,w_i^{2}$$
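A minimal MATLAB sketch applying these update rules to the XOR problem (the hidden-layer size, eta, epoch count, and random initialization are arbitrary choices, and a given run may occasionally stall in a local minimum):

```matlab
% Back-propagation sketch: dw2_k = eta*delta*o_{k,1}, dw1_{i,j} = eta*delta_i*x_j.
sig  = @(h) 1 ./ (1 + exp(-h));
X    = [1 1; 1 0; 0 1; 0 0];  Y = [0; 1; 1; 0];   % XOR patterns and targets
K    = 2;  eta = 0.5;
w1   = randn(K, 2);  b1 = randn(K, 1);            % input-to-hidden weights and biases
w2   = randn(K, 1);  b2 = randn;                  % hidden-to-output weights and bias
for epoch = 1:5000
    for mu = 1:4
        x   = X(mu,:)';  y = Y(mu);
        h   = w1*x + b1;  ok1 = sig(h);           % hidden unit outputs o_{k,1}
        o   = w2'*ok1 + b2;                       % linear output unit
        d   = 2*(y - o);                          % delta
        di  = d * w2 .* ok1 .* (1 - ok1);         % delta_i = delta*w2_i*sigma'(h_i)
        w2  = w2 + eta*d*ok1;    b2 = b2 + eta*d;
        w1  = w1 + eta*di*x';    b1 = b1 + eta*di;
    end
end
disp((w2'*sig(w1*X' + b1*ones(1,4)) + b2)')       % approximately 0 1 1 0
```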
For the input-to-hidden layer weights:

$$\Delta w_{i,j}^{1} = \eta\,\delta_i\,x_j, \qquad \delta_i = \delta\,\sigma'(h_i)\,w_i^{2}$$

[Diagram: the two-layer network with inputs $x_1,\dots,x_N$, hidden units $o_{1,1}$, $o_{2,1}$, and output $o$, labelled with the weights $w_{i,j}^{1}$ and $w_k^{2}$.]
Assignment 3c: Program a two-layer network in MATLAB and solve the XOR problem. Fit the curve x(x-1) between 0 and 1; how many hidden units did you need?
Formal neural networks can accomplish many tasks, for example:
• Perform complex classification
• Learn arbitrary functions
• Account for associative memory

Some applications: robotics, character recognition, speech recognition, medical diagnostics.
This is not neuroscience, but it is motivated loosely by neuroscience and carries important information for neuroscience as well.
For example: Memory, learning and some aspects of development are assumed to be based on synaptic plasticity.
What did we learn today?
Is BackProp biologically realistic?