20.5 Neural Networks
20.5 Neural Networks
Thanks: Professors Frank Hoffmann and Jiawei Han, and Russell and Norvig
Biological Neural Systems
Neuron switching time: $> 10^{-3}$ secs
Number of neurons in the human brain: $\sim 10^{10}$
Connections (synapses) per neuron: $\sim 10^{4}$–$10^{5}$
Face recognition: 0.1 secs
High degree of distributed and parallel computation
Highly fault tolerant
Highly efficient
Learning is key
Excerpt from Russell and Norvig
A Neuron
Computation: input signals → input function (linear) → activation function (nonlinear) → output signal.

[Figure: unit $j$ receives the outputs $a_k$ of preceding units as inputs $I_k$ over links weighted $W_{kj}$, and sends its own output $a_j$ over its output links.]

$in_j = \sum_k W_{kj} \, I_k$ (input function)

$a_j = \text{output}(in_j)$ (activation function applied to $in_j$)
Part 1. Perceptrons: Simple NN
[Figure: inputs $x_1, x_2, \ldots, x_n$, weighted by $w_1, w_2, \ldots, w_n$, feed a summation unit; the activation $a$ passes through a hard threshold to give the output $y$.]

$a = \sum_{i=1}^{n} w_i x_i$, with each $x_i$ in the range $[0, 1]$

$y = \begin{cases} 1 & \text{if } a \ge \theta \\ 0 & \text{if } a < \theta \end{cases}$
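In code, this unit is just a dot product followed by a hard threshold. A minimal sketch in Python (the function name and example values are illustrative, not from the slides):

```python
def perceptron_output(weights, inputs, theta):
    """Perceptron: weighted sum of the inputs, then a hard threshold at theta."""
    a = sum(w * x for w, x in zip(weights, inputs))  # a = sum_i w_i * x_i
    return 1 if a >= theta else 0                    # y = 1 if a >= theta, else 0

# The logical-AND perceptron from the next slides: w1 = w2 = 1, theta = 1.5
print(perceptron_output([1, 1], [1, 1], 1.5))  # -> 1
print(perceptron_output([1, 1], [0, 1], 1.5))  # -> 0
```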
Decision Surface of a Perceptron
[Figure: training points labeled 0 and 1 in the $(x_1, x_2)$ plane, separated by the decision line $w_1 x_1 + w_2 x_2 = \theta$.]
Linear Separability
Logical AND ($w_1 = 1$, $w_2 = 1$, $\theta = 1.5$):

| x1 | x2 | a | y |
|----|----|---|---|
| 0  | 0  | 0 | 0 |
| 0  | 1  | 1 | 0 |
| 1  | 0  | 1 | 0 |
| 1  | 1  | 2 | 1 |

Logical XOR ($w_1 = ?$, $w_2 = ?$, $\theta = ?$):

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 0 |
| 0  | 1  | 1 |
| 1  | 0  | 1 |
| 1  | 1  | 0 |

[Figure: AND's points can be separated by a single line; XOR's cannot, so no choice of $w_1$, $w_2$, $\theta$ works. See the brute-force check below.]
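To make the separability claim concrete, a small brute-force search over candidate weights and thresholds (the grid and function name are assumed for illustration) finds a separating line for AND and none for XOR:

```python
import itertools

def find_weights(truth_table, grid):
    """Brute-force search for (w1, w2, theta) realizing a 2-input truth table."""
    for w1, w2, theta in itertools.product(grid, repeat=3):
        if all((1 if w1 * x1 + w2 * x2 >= theta else 0) == y
               for (x1, x2), y in truth_table.items()):
            return (w1, w2, theta)
    return None  # no separating line found in the searched grid

grid = [k / 2 for k in range(-6, 7)]  # candidate values from -3.0 to 3.0
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
print(find_weights(AND, grid))  # finds some separating line, e.g. near (1, 1, 1.5)
print(find_weights(XOR, grid))  # None -- XOR is not linearly separable
```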
Threshold as Weight: $w_0$

[Figure: the same perceptron with an extra fixed input $x_0 = -1$ whose weight is $w_0 = \theta$.]

Treat the threshold as a weight by introducing a constant input $x_0 = -1$ with weight $w_0 = \theta$:

$a = \sum_{i=0}^{n} w_i x_i$

$y = \begin{cases} 1 & \text{if } a \ge 0 \\ 0 & \text{if } a < 0 \end{cases}$
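A minimal sketch of the same trick, assuming NumPy; the weight values are the AND perceptron's from the earlier slide:

```python
import numpy as np

w = np.array([1.5, 1.0, 1.0])        # w0 = theta = 1.5, then w1, w2
x = np.array([1.0, 1.0])             # original inputs x1, x2

x_aug = np.concatenate(([-1.0], x))  # prepend the fixed input x0 = -1
y = 1 if w @ x_aug >= 0 else 0       # the threshold is now at 0
print(y)                             # -> 1 for the AND perceptron on input (1, 1)
```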
Training the Perceptron (p. 742)

Training set $S$ of examples $(x, t)$, where $x$ is an input vector and $t$ the desired target output. Example (logical AND): $S = \{((0,0),0), ((0,1),0), ((1,0),0), ((1,1),1)\}$.

Iterative process: present a training example $x$, compute the network output $y$, compare $y$ with the target $t$, and adjust the weights and thresholds.

Learning rule: specifies how to change the weights $w$ and thresholds of the network as a function of the inputs $x$, output $y$, and target $t$.
Perceptron Learning Rule
$w' = w + \alpha\,(t - y)\,x$

$w_i := w_i + \Delta w_i = w_i + \alpha\,(t - y)\,x_i \quad (i = 1, \ldots, n)$

The parameter $\alpha$ is called the learning rate (in Han's book it is a lowercase $l$). It determines the magnitude of the weight updates $\Delta w_i$.

If the output is correct ($t = y$), the weights are not changed ($\Delta w_i = 0$). If the output is incorrect ($t \ne y$), the weights $w_i$ are changed so that the new weight vector $w'$ moves toward the input $x$ when $t > y$ and away from it when $t < y$, bringing the output closer to the target.
Perceptron Training Algorithm
Repeat
    for each training vector pair (x, t)
        evaluate the output y when x is the input
        if y ≠ t then
            form a new weight vector w' according to w' = w + α(t − y)x
        else
            do nothing
        end if
    end for
Until y = t for all training vector pairs or # iterations > k
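A runnable sketch of this loop in Python, using the threshold-as-weight trick from the earlier slide (the learning rate 0.1, the zero initialization, and the iteration cap k = 100 are illustrative choices, not from the slides):

```python
def train_perceptron(data, n, alpha=0.1, max_iters=100):
    """Perceptron training: data is a list of (x, t) pairs, x an n-vector."""
    w = [0.0] * (n + 1)                       # w[0] is the threshold weight w0
    for _ in range(max_iters):
        all_correct = True
        for x, t in data:
            xa = [-1.0] + list(x)             # augment with x0 = -1
            y = 1 if sum(wi * xi for wi, xi in zip(w, xa)) >= 0 else 0
            if y != t:                        # w' = w + alpha * (t - y) * x
                w = [wi + alpha * (t - y) * xi for wi, xi in zip(w, xa)]
                all_correct = False
        if all_correct:                       # y = t for all training pairs
            break
    return w

# Logical AND from the linear-separability slide
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(AND, n=2))             # converges to a separating weight vector
```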
Perceptron Convergence Theorem
The algorithm converges to a correct classification if the training data are linearly separable and the learning rate is sufficiently small.

If two classes of vectors $X_1$ and $X_2$ are linearly separable, the perceptron training algorithm will eventually produce a weight vector $w_0$ such that $w_0$ defines a perceptron whose decision hyperplane separates $X_1$ and $X_2$ (Rosenblatt, 1962).

The solution $w_0$ is not unique, since if $w_0 \cdot x = 0$ defines a separating hyperplane, so does $w'_0 = k\,w_0$ for any $k > 0$.
Experiments
Perceptron Learning from Patterns
[Figure: an input pattern feeds a layer of association units; their outputs $x_1, \ldots, x_n$ are combined with trained weights $w_1, \ldots, w_n$ by a summation unit with a fixed threshold.]

Association units (A-units) can be assigned arbitrary Boolean functions of the input pattern. The weights between the A-units and the summation unit are trained; the threshold is fixed.
Part 2. Multi-Layer Networks

[Figure: a feed-forward network mapping an input vector through input nodes, hidden nodes, and output nodes to an output vector.]
Gradient Descent Learning Rule
Consider a linear unit without a threshold and with continuous output $o$ (not just $-1, 1$):

$o_j = -w_0 + w_1 x_1 + \cdots + w_n x_n$

Train the $w_i$ so that they minimize the squared error

$E[w_1, \ldots, w_n] = \tfrac{1}{2} \sum_{j \in D} (T_j - o_j)^2$

where $D$ is the set of training examples and $T_j$ the target for example $j$.
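A minimal sketch of batch gradient descent for such a linear unit, assuming NumPy; the dataset, "teacher" weights, learning rate, and iteration count are all illustrative. The gradient $\partial E/\partial w_i = -\sum_j (T_j - o_j)\,x_{ij}$ used below follows from differentiating the error:

```python
import numpy as np

# Illustrative data: 4 examples with a leading -1 input (the x0 = -1 trick)
X = np.array([[-1.0, 0.0, 0.0], [-1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0], [-1.0, 1.0, 1.0]])
w_true = np.array([0.5, 1.0, -1.0])     # assumed "teacher" weights
T = X @ w_true                          # targets T_j

w = np.zeros(3)                         # weights to be learned
alpha = 0.1                             # learning rate (assumed value)
for _ in range(500):
    o = X @ w                           # linear outputs o_j
    grad = -(T - o) @ X                 # dE/dw_i = -sum_j (T_j - o_j) x_ij
    w -= alpha * grad                   # gradient descent step on E
print(np.round(w, 3))                   # approaches w_true as E -> 0
```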
Neuron with Sigmoid Function

[Figure: inputs $x_1, \ldots, x_n$ with weights $w_1, \ldots, w_n$ feed a summation unit whose activation passes through a sigmoid to give the output $o$.]

$a = \sum_{i=1}^{n} w_i x_i$

$o = \sigma(a) = \dfrac{1}{1 + e^{-a}}$
Sigmoid Unit
[Figure: the same sigmoid unit with the extra input $x_0 = -1$ and threshold weight $w_0$.]

$a = \sum_{i=0}^{n} w_i x_i, \qquad o = \sigma(a) = \dfrac{1}{1 + e^{-a}}$

$\sigma(x) = 1/(1 + e^{-x})$ is the sigmoid function; its derivative is $\dfrac{d\sigma(x)}{dx} = \sigma(x)\,(1 - \sigma(x))$.

Deriving the gradient descent rule to train one sigmoid unit gives

$\dfrac{\partial E}{\partial w_i} = -\sum_j (T_j - o_j)\, o_j\,(1 - o_j)\, x_{ij}$

(derivation on the next page; the derivative identity is checked numerically below).
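A quick numerical sanity check of the identity $\sigma'(x) = \sigma(x)(1 - \sigma(x))$, in Python (the probe points and step size are arbitrary):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# d(sigma)/dx should equal sigma(x) * (1 - sigma(x)) at every x
for x in (-2.0, 0.0, 1.5):
    h = 1e-6
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # central difference
    analytic = sigmoid(x) * (1 - sigmoid(x))
    print(f"x={x:+.1f}  numeric={numeric:.6f}  analytic={analytic:.6f}")
```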
Explanation: Gradient Descent Learning Rule

$\Delta w_{ji} = \eta \; o_j^p\,(1 - o_j^p)\,(T_j^p - o_j^p)\; x_i^p$

Here $x_i^p$ is the activation of the pre-synaptic neuron, $(T_j^p - o_j^p)$ is the error $\delta_j$ of the post-synaptic neuron, $o_j^p(1 - o_j^p)$ is the derivative of the activation function, and $\eta$ is the learning rate.
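Putting the pieces together, a minimal online-update sketch for a single sigmoid unit trained on logical AND (the learning rate, epoch count, and zero initialization are assumed for illustration):

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def delta_rule_step(w, x, t, eta=0.5):
    """One online update: w_i += eta * o(1-o) * (t - o) * x_i."""
    o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    delta = o * (1 - o) * (t - o)               # derivative times error
    return [wi + eta * delta * xi for wi, xi in zip(w, x)]

# Learn logical AND with a sigmoid unit (x[0] = -1 is the threshold input)
data = [([-1, 0, 0], 0), ([-1, 0, 1], 0), ([-1, 1, 0], 0), ([-1, 1, 1], 1)]
w = [0.0, 0.0, 0.0]
for _ in range(5000):
    for x, t in data:
        w = delta_rule_step(w, x, t)
print([round(wi, 2) for wi in w])               # weights realizing a "soft" AND
```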
Gradient Descent: Graphical
$D = \{\langle (1,1), 1 \rangle, \langle (-1,-1), 1 \rangle, \langle (1,-1), -1 \rangle, \langle (-1,1), -1 \rangle\}$

[Figure: the error surface over the $(w_1, w_2)$ weight plane, with a gradient step from $(w_1, w_2)$ to $(w_1 + \Delta w_1,\; w_2 + \Delta w_2)$.]
Perceptron vs. Gradient Descent Rule
Perceptron rule: $w'_i = w_i + \alpha\,(t - y)\,x_i$, derived from manipulation of the decision surface.

Gradient descent rule: $w'_i = w_i + \alpha\, y\,(1 - y)\,(t - y)\,x_i$, derived from minimization of the error function

$E[w_1, \ldots, w_n] = \tfrac{1}{2} \sum_p (t^p - y^p)^2$

by means of gradient descent.
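To see the difference on one misclassified example, a small sketch with illustrative values; the perceptron step depends only on $(t - y)$ with a hard $y$, while the gradient step is additionally scaled by $y(1 - y)$:

```python
# One misclassified example: input x, target t, learning rate alpha (all assumed)
x, t, alpha = [1.0, 0.5], 1.0, 0.1

# Perceptron rule: hard output y in {0, 1}; here y = 0 (wrong side of the line)
y_hard = 0.0
perceptron_update = [alpha * (t - y_hard) * xi for xi in x]    # fixed-size step

# Gradient descent rule: soft output y in (0, 1); here y = 0.3 (assumed)
y_soft = 0.3
gradient_update = [alpha * y_soft * (1 - y_soft) * (t - y_soft) * xi for xi in x]

print(perceptron_update)  # [0.1, 0.05]      -- magnitude set by (t - y) alone
print(gradient_update)    # [0.0147, 0.00735] -- scaled by y(1-y)(t-y)
```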