A note about gradient descent:
Consider the function f(x) = (x - x0)^2
Its derivative is:
df(x)/dx = 2(x - x0)
By gradient descent, x moves against the gradient:
dx/dt = -η df(x)/dx = -2η(x - x0)
(the derivative is positive to the right of x0 and negative to its left, so the flow pushes x toward x0 from both sides).
Solving the differential equation:
dx/dt = -2ηx + 2ηx0
or in the general form:
dx/dt = -kx + c
What is the solution of this type of equation? Try:
x(t) = A exp(Bt) + C
which gives:
x(t) = (x(0) - c/k) exp(-kt) + c/k
so x decays exponentially to c/k = x0, the minimum of f.
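Discretizing this flow gives the familiar gradient-descent update. A minimal numeric sketch (the values x0 = 3, η = 0.1, and starting point x = 0 are illustrative assumptions):

```python
# Numerical sketch of the flow above (assumed values: x0 = 3, eta = 0.1, x(0) = 0).
# Discretizing dx/dt = -eta * df/dx gives the update x <- x - eta * 2 * (x - x0),
# so the distance to x0 shrinks by a constant factor (1 - 2*eta) per step:
# exponential decay, matching the solution x(t) derived above.
x0 = 3.0
eta = 0.1
x = 0.0
errors = []
for _ in range(50):
    x = x - eta * 2 * (x - x0)   # gradient step on f(x) = (x - x0)^2
    errors.append(abs(x - x0))   # each step multiplies this by (1 - 2*eta) = 0.8
```

After 50 steps the iterate is within about 4e-5 of x0, mirroring the exp(-kt) solution.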
THE PERCEPTRON (classification)

Threshold unit:
O^μ = Θ(Σ_i w_i x_i^μ - w_0), where Θ(x) = 1 if x ≥ 0 and 0 if x < 0,
and o^μ is the output for input pattern x^μ, w_i are the synaptic weights and y^μ is the desired output.

[Diagram: inputs x1…x5 feed the output unit o through weights w1…w5.]

AND:
x1 x2 | y
 1  1 | 1
 1  0 | 0
 0  1 | 0
 0  0 | 0
AND:
x1 x2 | y
 1  1 | 1
 1  0 | 0
 0  1 | 0
 0  0 | 0

[Diagram: a threshold unit with weights 1, 1 and threshold 1.5 computes AND; in the (x1, x2) plane the decision line x1 + x2 - 1.5 = 0 separates the point (1,1) from the other three.]
Linearly separable.
OR:
x1 x2 | y
 1  1 | 1
 1  0 | 1
 0  1 | 1
 0  0 | 0

[Diagram: a threshold unit with weights 1, 1 and threshold 0.5 computes OR; the decision line x1 + x2 - 0.5 = 0 separates the point (0,0) from the other three.]
Linearly separable.
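Both cases use the same unit with different thresholds. A minimal sketch of the threshold unit, with the weights and thresholds taken from the slides above (Python here is illustrative; the assignments use Matlab):

```python
# Threshold unit O = Theta(sum_i w_i x_i - w0), with Theta(s) = 1 if s >= 0 else 0.
# Weights and thresholds are the ones on the slides: AND uses (1, 1) vs 1.5,
# OR uses (1, 1) vs 0.5.
def threshold_unit(x, w, w0):
    s = sum(wi * xi for wi, xi in zip(w, x)) - w0
    return 1 if s >= 0 else 0

AND = lambda x1, x2: threshold_unit((x1, x2), (1, 1), 1.5)
OR = lambda x1, x2: threshold_unit((x1, x2), (1, 1), 0.5)
```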
Perceptron learning rule:
[Diagram: inputs x1…x5 feed the output unit o through weights w1…w5.]

ΔW_i = η (y^μ - o^μ) x_i^μ

Convergence proof: Hertz, Krogh, Palmer (HKP) - did you receive the email?
Assignment 3a: program a perceptron in Matlab with the perceptron learning rule and solve the OR, AND and XOR problems. (Due before Feb 27)
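A minimal sketch of the learning rule ΔW_i = η(y - o)x_i on the OR data from the earlier slide (assumptions: η = 0.1, 100 epochs, and the threshold learned as an extra weight driven by a constant input -1):

```python
# Perceptron learning rule sketch (assumed: eta = 0.1, 100 epochs, threshold w0
# learned alongside the weights via a constant input of -1).
def train_perceptron(samples, eta=0.1, epochs=100):
    w = [0.0, 0.0, 0.0]                      # w1, w2, w0 (threshold)
    for _ in range(epochs):
        for (x1, x2), y in samples:
            o = 1 if w[0] * x1 + w[1] * x2 - w[2] >= 0 else 0
            w[0] += eta * (y - o) * x1       # Delta W_i = eta (y - o) x_i
            w[1] += eta * (y - o) * x2
            w[2] += eta * (y - o) * (-1)     # threshold input is the constant -1
    return w

OR_DATA = [((1, 1), 1), ((1, 0), 1), ((0, 1), 1), ((0, 0), 0)]
w = train_perceptron(OR_DATA)
predict = lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 - w[2] >= 0 else 0
```

The rule only moves the weights on misclassified patterns, which is why convergence on linearly separable problems can be proven (HKP).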
Show Demo
Summary – what can perceptrons do and how?
Linear single layer network (approximation, curve fitting):

Linear unit: O^μ = Σ_i w_i x_i^μ - w_0
where o^μ is the output for input pattern x^μ, w_i are the synaptic weights and y^μ is the desired output.

[Diagram: inputs x1…x5 feed the linear output unit o through weights w1…w5.]

Minimize the mean square error:
E = (1/2) Σ_{μ=1}^{P} (y^μ - o^μ)²   (in vector notation, o = w·x)
Linear single layer network (approximation, curve fitting):
Minimize the mean square error: E = (1/2) Σ_{μ=1}^{P} (y^μ - o^μ)²
The best solution is obtained when E is minimal.
For linear neurons there is an exact solution for this called the pseudo-inverse (see HKP).
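A sketch of that exact solution via the pseudo-inverse (the small synthetic dataset and the noise-free linear targets are assumptions for illustration; `np.linalg.pinv` computes the Moore-Penrose pseudo-inverse):

```python
# Exact least-squares solution for a linear unit via the pseudo-inverse.
# Assumed setup: 20 random input patterns, targets generated by a known
# linear rule with no noise, so the recovered weights match exactly.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))        # P = 20 patterns, 3 inputs
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w                      # desired outputs y^mu

w = np.linalg.pinv(X) @ y           # minimizes E = (1/2) sum_mu (y^mu - o^mu)^2
```

With noisy targets the same line returns the least-squares weights rather than an exact fit.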
Looking for a solution by gradient descent:
E = (1/2) Σ_{μ=1}^{P} (y^μ - o^μ)²

[Diagram: E plotted as a bowl over the weights w; the -gradient points downhill toward the minimum.]

Chain rule:
dW_i/dt = -η ∂E/∂W_i = -η (∂E/∂O)(∂O/∂W_i)

Since:
E = (1/2) Σ_{μ=1}^{P} (y^μ - o^μ)²   and   O = Σ_i W_i x_i - W_0

Error: ∂E/∂O = -(y - o)   and   ∂O/∂W_i = ∂(Σ_i W_i x_i)/∂W_i = x_i

Therefore: ΔW_i = η (y - o) x_i
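A minimal sketch of this delta rule fitting a linear target (everything numeric is an assumption: target y = 2·x1 - x2 + 1, η = 0.05, 2000 online steps; the bias is learned as a weight on a constant input +1 rather than the slide's -W_0 convention):

```python
# Delta rule Delta W_i = eta (y - o) x_i for a linear unit, online updates.
# Assumed: target y = 2*x1 - x2 + 1, eta = 0.05, 2000 random training points;
# the third "input" is the constant 1, so w[2] plays the role of the bias.
import random
random.seed(0)

w = [0.0, 0.0, 0.0]                          # w1, w2, bias
eta = 0.05
for _ in range(2000):
    x1, x2 = random.uniform(-1, 1), random.uniform(-1, 1)
    y = 2 * x1 - x2 + 1
    o = w[0] * x1 + w[1] * x2 + w[2]         # linear unit: no threshold function
    for i, xi in enumerate((x1, x2, 1.0)):
        w[i] += eta * (y - o) * xi           # gradient step on the squared error
```

Because the targets are exactly linear, the weights converge to (2, -1, 1).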
Which types of problems can a linear network solve?
Sigmoidal neurons: O = σ(Σ_i W_i x_i - W_0), and the same gradient descent gives:
dW_i/dt = η (y - o) σ'(h) x_i,  where h = Σ_i W_i x_i - W_0
Which types of problems can a sigmoidal network solve?
Assignment 3b: Implement a one-layer linear network and a one-layer sigmoidal network; fit a 1D linear, a sigmoid and a quadratic function with both networks.
For example:
σ(x) = exp(βx) / (1 + exp(βx))
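A small sketch of this sigmoid and the derivative the gradient rule needs (β = 1 is an assumed default; the identity σ'(x) = β σ(x)(1 - σ(x)) follows directly from the definition above):

```python
# Sigmoid sigma(x) = exp(beta*x) / (1 + exp(beta*x)) and its derivative.
# The closed form sigma'(x) = beta * sigma(x) * (1 - sigma(x)) is what makes
# the gradient rule cheap: the derivative reuses the forward-pass value.
import math

def sigma(x, beta=1.0):
    return math.exp(beta * x) / (1.0 + math.exp(beta * x))

def sigma_prime(x, beta=1.0):
    s = sigma(x, beta)
    return beta * s * (1.0 - s)
```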
Multi-layer networks:
o = σ(Σ_k w²_k o_{k,1}),  with hidden-layer outputs  o_{k,1} = σ(Σ_j w¹_{k,j} x_j)

• Can solve non-linearly separable classification problems.
• Can approximate any arbitrary function, given 'enough' units in the hidden layer.

[Diagram: input layer x1…xN, hidden layer units o_{1,1} and o_{2,1} connected by weights w¹_{i,j}, and an output unit o connected by weights w²_1, w²_2.]

Note: w¹_{i,j} is not a vector but a matrix.
Solving linearly inseparable problems
o = σ(Σ_k w²_k o_{k,1}),  o_{k,1} = σ(Σ_j w¹_{k,j} x_j)

XOR:
x1 x2 | y
 1  1 | 0
 1  0 | 1
 0  1 | 1
 0  0 | 0

Hint: XOR = (x1 or x2) and not (x1 and x2)
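The hint can be checked directly by composing the threshold units from the earlier slides (the OR and AND thresholds 0.5 and 1.5 come from those slides; the output unit's weights (1, -1) and threshold 0.5 are one choice that implements "or and not and"):

```python
# XOR built from two hidden threshold units (OR and AND) plus one output unit.
# Hidden thresholds 0.5 and 1.5 are from the earlier slides; the output
# weights (1, -1) with threshold 0.5 compute "h_or AND NOT h_and".
def theta(s):
    return 1 if s >= 0 else 0

def xor(x1, x2):
    h_or = theta(x1 + x2 - 0.5)          # x1 OR x2
    h_and = theta(x1 + x2 - 1.5)         # x1 AND x2
    return theta(h_or - h_and - 0.5)     # (x1 OR x2) AND NOT (x1 AND x2)
```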
How do we learn a multi-layer network? The credit assignment problem!
o = σ(Σ_k w²_k o_{k,1}),  o_{k,1} = σ(Σ_j w¹_{k,j} x_j)

XOR:
x1 x2 | y
 1  1 | 0
 1  0 | 1
 0  1 | 1
 0  0 | 0

[Diagram: a two-layer network solving XOR; the slide annotates the connections with hidden-unit weights and thresholds (values 0.5, -0.5, 1, -1, 1, 0.5, 0) so that the output unit separates the XOR classes.]
Gradient descent / back-propagation: the solution to the credit assignment problem.
E = (1/2) Σ_{μ=1}^{P} (y^μ - o^μ)²

From hidden layer to output weights:
Δw²_k = -η ∂E/∂w²_k = η δ o_{k,1}
where: δ = (y - o) σ'(h),  o = σ(h),  h = Σ_k w²_k o_{k,1}

For input to hidden layer:
Δw¹_{i,j} = -η ∂E/∂w¹_{i,j} = -η (∂E/∂o_{i,1})(∂o_{i,1}/∂w¹_{i,j})
Since ∂E/∂o = -(y - o), o_{i,1} = σ(h_i) and h_i = Σ_j w¹_{i,j} x_j:
Δw¹_{i,j} = η (y - o) σ'(h) w²_i σ'(h_i) x_j
i.e. Δw¹_{i,j} = η δ_i x_j,  where δ_i = δ w²_i σ'(h_i)
[Diagram: the two-layer network again, with inputs x1…xN, hidden weights w¹_{i,j} and output weights w²_i, showing which weights the deltas δ and δ_i update.]
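The two update rules can be sketched end to end on the XOR data. Everything numeric below is an assumption: 2 hidden sigmoid units, η = 0.5, per-pattern updates, and initial weights deliberately seeded near an OR/NAND decomposition, since plain gradient descent from random starts can get stuck in local minima on XOR:

```python
# Two-layer backprop on XOR using the rules derived above:
#   output delta:  delta   = (y - o) * sigma'(h)
#   hidden delta:  delta_i = delta * w2_i * sigma'(h_i)
# Assumed: eta = 0.5, 2000 epochs, initial weights near an OR/NAND split.
import math

def sigma(x):
    return 1.0 / (1.0 + math.exp(-x))

DATA = [((1, 1), 0), ((1, 0), 1), ((0, 1), 1), ((0, 0), 0)]
w1 = [[2.0, 2.0, -1.0], [-2.0, -2.0, 3.0]]   # hidden rows: [w_i1, w_i2, bias]
w2 = [2.0, 2.0, -3.0]                        # output: [w2_1, w2_2, bias]
eta = 0.5

def forward(x1, x2):
    oh = [sigma(w[0] * x1 + w[1] * x2 + w[2]) for w in w1]
    return oh, sigma(w2[0] * oh[0] + w2[1] * oh[1] + w2[2])

errors = []
for epoch in range(2000):
    E = 0.0
    for (x1, x2), y in DATA:
        oh, o = forward(x1, x2)
        E += 0.5 * (y - o) ** 2
        delta = (y - o) * o * (1 - o)                  # (y-o) * sigma'(h)
        for i in range(2):
            d_i = delta * w2[i] * oh[i] * (1 - oh[i])  # delta * w2_i * sigma'(h_i)
            for j, xj in enumerate((x1, x2, 1.0)):
                w1[i][j] += eta * d_i * xj             # input-to-hidden update
        for i, ohi in enumerate((oh[0], oh[1], 1.0)):
            w2[i] += eta * delta * ohi                 # hidden-to-output update
    errors.append(E)

preds = [1 if forward(x1, x2)[1] > 0.5 else 0 for (x1, x2), _ in DATA]
```

After training, thresholding the output at 0.5 reproduces the XOR truth table and the summed error has dropped well below its starting value.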
Assignment 3c: Program a two-layer network in Matlab and solve the XOR problem. Fit the curve x(x-1) between 0 and 1; how many hidden units did you need?
Formal neural networks can accomplish many tasks, for example:
• Perform complex classification
• Learn arbitrary functions
• Account for associative memory
Some applications: robotics, character recognition, speech recognition, medical diagnostics.
This is not neuroscience, but it is motivated loosely by neuroscience and carries important information for neuroscience as well.
For example: Memory, learning and some aspects of development are assumed to be based on synaptic plasticity.
What did we learn today?
Is BackProp biologically realistic?