Last lecture summary
Multilayer perceptron
• MLP, the most famous type of neural network
(figure: input layer → hidden layer → output layer)
Processing by one neuron

output = f(w0·1.0 + w1·x1 + w2·x2 + … + wn·xn) = f(Σ_{j=0..n} wj·xj)

(x: inputs, w: weights, w0: bias applied to a constant input 1.0, f: activation function)
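The weighted sum above can be sketched in Python (a minimal illustration; the helper name and the default choice of tanh as activation are mine, not from the slides):

```python
import math

def neuron_output(x, w, w0, activation=math.tanh):
    # weighted sum of inputs plus bias, passed through the activation function
    s = w0 + sum(wj * xj for wj, xj in zip(w, x))
    return activation(s)

# with an identity activation the neuron reduces to a dot product plus bias
y = neuron_output([1.0, 2.0], [0.5, -0.25], 0.0, activation=lambda s: s)
```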
Linear activation functions
linear: f(w∙x) = w∙x
threshold: f(w∙x) = 1 if w∙x > 0, 0 if w∙x ≤ 0
Nonlinear activation functions

logistic (sigmoid, unipolar): f(x) = 1 / (1 + e^(−x))
tanh (bipolar): tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
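The two nonlinear activations can be written directly from their formulas (a small sketch; function names are mine):

```python
import math

def logistic(x):
    # unipolar: output in (0, 1), logistic(0) = 0.5
    return 1.0 / (1.0 + math.exp(-x))

def tanh_bipolar(x):
    # bipolar: output in (-1, 1), matches math.tanh
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))
```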
Backpropagation training algorithm
• MLP is trained by backpropagation.
• forward pass
– present a training sample to the neural network
– calculate the error (MSE) in each output neuron
• backward pass
– first calculate the gradient for hidden-to-output weights
– then calculate the gradient for input-to-hidden weights
• the knowledge of grad_hidden-output is necessary to calculate grad_input-hidden
– update the weights in the network: w^m = w^(m−1) + ∆w^m, where ∆w^m = −β·d^m (β: learning rate, d^m: error gradient at step m)
input signal propagates forward
error propagates backward
Momentum

• Online learning vs. batch learning
– Batch learning improves the stability by averaging.
• Another averaging approach providing stability is using the momentum (μ).
– μ (between 0 and 1) indicates the relative importance of the past weight change ∆w^(m−1) on the new weight increment ∆w^m:

∆w^m = μ·∆w^(m−1) − (1 − μ)·β·d^m
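The momentum update for a single weight can be sketched as follows (one common convention with a (1 − μ) weighting on the gradient step; function and parameter names are mine):

```python
def momentum_update(w, grad, prev_delta, beta=0.1, mu=0.9):
    # delta_m = mu * delta_{m-1} - (1 - mu) * beta * d_m
    # blends the previous weight change with the current gradient step
    delta = mu * prev_delta - (1.0 - mu) * beta * grad
    return w + delta, delta
```

With μ = 0 this reduces to plain gradient descent; with μ close to 1 the past step dominates and oscillations are smoothed out.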
Other improvements
• Delta-Bar-Delta (Turboprop)
– Each weight has its own learning rate β.
• Second-order methods
– Hessian matrix (how quickly the rate of increase of the function changes in a small neighborhood, i.e. curvature)
– QuickProp, Gauss-Newton, Levenberg-Marquardt
– fewer epochs, but computationally expensive (Hessian inverse, storage)
Improving generalization of MLP
• Flexibility comes from hidden neurons.
• Choose such a # of hidden neurons that neither underfitting nor overfitting occurs.
• Three most common approaches:
– exhaustive search
• stop training after MSE < small_threshold (e.g. 0.001)
– early stopping
• large number of hidden neurons
– regularization
• weight decay

W = MSE + λ·Σ_{j=1..m} wj²  (λ: regularization coefficient)
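The weight-decay objective is a one-liner (a toy sketch; the helper name and the λ default are mine):

```python
def weight_decay_objective(mse, weights, lam=0.01):
    # W = MSE + lam * sum(w_j^2): large weights are penalized,
    # pushing the network toward smoother, better-generalizing fits
    return mse + lam * sum(wj * wj for wj in weights)
```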
(figure: generalization performance as a function of the number of neurons)
Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006
Network pruning
• Keep only essential weights/neurons.
• Optimal Brain Damage (OBD)
– If the saliency s_i of a weight is small, remove the weight.
– Train a flexible network (e.g. early stopping), then remove weights, retrain the network, etc.
s_i = H_ii·w_i² / 2
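The saliency rule above can be sketched as a toy helper (names are mine; H_ii is the i-th diagonal entry of the Hessian):

```python
def obd_saliency(h_ii, w_i):
    # s_i = H_ii * w_i^2 / 2: second-order estimate of the error
    # increase if weight w_i is set to zero (removed)
    return h_ii * w_i ** 2 / 2.0
```

Weights with the smallest saliency are pruned first, then the network is retrained.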
Radial Basis Function Networks (new stuff)
Radial Basis Function (RBF) Network
• Becoming an increasingly popular neural network.
• Probably the main rival to the MLP.
• Takes a completely different approach, viewing the design of a neural network as an approximation problem in high-dimensional space.
• Uses radial functions as activation functions.
Gaussian RBF

• The typical radial function is the Gaussian RBF (it monotonically decreases with distance from the center).
• Its response decreases with distance from a central point.
• Parameters:
– center c
– width (radius r)

h(x) = exp(−(x − c)² / (2r²))

(figure: Gaussian bump, with the center c and the radius r marked)
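A minimal implementation of the Gaussian RBF, using the 2r² convention that matches the XOR example later in the lecture (the function name and the vector-valued input are my choices):

```python
import math

def gaussian_rbf(x, c, r):
    # h(x) = exp(-||x - c||^2 / (2 r^2)):
    # equals 1 exactly at the center and decays with distance
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-d2 / (2.0 * r * r))
```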
Local vs. global units
• Local
– they cover just a certain part of the space
– i.e. they are nonzero just in a certain part of the space
• Global
– sigmoid, linear
• Local
– Gaussian
(figure: comparison of the regions covered by MLP and RBF units)
Pavel Kordík, Data Mining lecture, FEL, ČVUT, 2009
RBFN architecture

(figure: inputs x1 … xn fan out to RBF hidden units h1 … hm, whose outputs reach the output f(x) through weights W1 … Wm; the input-to-hidden connections carry no weights)

Each of the n components of the input vector x feeds forward to m basis functions whose outputs are linearly combined with weights w (i.e. dot product x∙w) into the network output f(x).
Pavel Kordík, Data Mining lecture, FEL, ČVUT, 2009
(figure: RBFN schematic with summation output units)
Pavel Kordík, Data Mining lecture, FEL, ČVUT, 2009
• The basic architecture for an RBF is a 3-layer network.
• The input layer is simply a fan-out layer and does no processing.
• The hidden layer performs a non-linear mapping from the input space into a (usually) higher-dimensional space in which the patterns become linearly separable.
• The output layer performs a simple weighted sum (i.e. w∙x).
– If the RBFN is used for regression, then this output is fine.
– However, if pattern classification is required, then a hard-limiter or sigmoid function could be placed on the output neurons to give 0/1 output values.
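The forward pass just described (Gaussian hidden layer, plain weighted-sum output) can be sketched as follows; the function and parameter names are mine, and the Gaussian uses the 2r² convention:

```python
import math

def rbfn_output(x, centers, radii, weights, bias=0.0):
    # hidden layer: one Gaussian response per basis function
    h = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * r * r))
         for c, r in zip(centers, radii)]
    # output layer: simple weighted sum w.h (fine for regression;
    # a hard-limiter or sigmoid would be added for classification)
    return bias + sum(wj * hj for wj, hj in zip(weights, h))
```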
Clustering
• The unique feature of the RBF network is the process performed in the hidden layer.
• The idea is that the patterns in the input space form clusters.
• If the centres of these clusters are known, then the distance from the cluster centre can be measured.
• Furthermore, this distance measure is made non-linear, so that if a pattern is in an area that is close to a cluster centre it gives a value close to 1.
• Beyond this area, the value drops dramatically.
• The notion is that this area is radially symmetrical around the cluster centre, so the non-linear function becomes known as the radial-basis function.

(figure: non-linearly transformed distance vs. distance from the center of the cluster)
RBFN for classification

(figure: an RBFN with two summation output units, one per category, separating Category 1 from Category 2)
RBFN for regression
http://diwww.epfl.ch/mantra/tutorial/english/rbf/html/
XOR problem
(figure: the four XOR points on the unit square)
XOR problem
• 2 inputs x1, x2, 2 hidden units (with outputs φ1, φ2), one output
• The parameters of the two hidden units are set as
– c1 = <0,0>, c2 = <1,1>
– the value of radius r is chosen such that 2r² = 1

| x1 | x2 | φ1  | φ2  |
|----|----|-----|-----|
| 0  | 0  | 1   | 0.1 |
| 0  | 1  | 0.4 | 0.4 |
| 1  | 0  | 0.4 | 0.4 |
| 1  | 1  | 0.1 | 1   |

(figure: network with inputs x1, x2 feeding hidden units h1, h2, which produce φ1, φ2)
(figure: left, the four XOR points <0,0>, <0,1>, <1,0>, <1,1> in the input space; right, the same points mapped into the feature space <h1, h2>)
When mapped into the feature space <h1, h2>, the two classes become linearly separable. So a linear classifier with h1(x) and h2(x) as inputs can be used to solve the XOR problem. The linear classifier is represented by the output layer.
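The φ values in the table can be verified directly: with 2r² = 1 the Gaussian response reduces to exp(−‖x − c‖²). A quick check (helper name is mine):

```python
import math

def phi(x, c):
    # with 2r^2 = 1 the hidden response is exp(-||x - c||^2)
    return math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)))

c1, c2 = (0, 0), (1, 1)
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    # rounded to one decimal, this reproduces the table row by row
    print(x, round(phi(x, c1), 1), round(phi(x, c2), 1))
```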
RBF Learning

• Design decision
– number of hidden neurons
• max # of neurons = number of input patterns
• min # of neurons = has to be determined
• more neurons: more complex network, smaller tolerance
• Parameters to be learnt
– centers
– radii
• A hidden neuron is more sensitive to data points near its center. This sensitivity may be tuned by adjusting the radius.
• a smaller radius fits the training data better (risk of overfitting)
• a larger radius means less sensitivity, less overfitting, a network of smaller size, and faster execution
– weights between hidden and output layers
• Learning can be divided into two independent tasks:
1. Center and radii determination
2. Learning of output layer weights
• Learning strategies for RBF parameters
– Sample center positions randomly from the training data
– Self-organized selection of centers
– Both layers are learnt using supervised learning
Select centers at random
• Choose centers randomly from the training set.
• Radius r is calculated as

r = (maximum distance between any 2 centers) / √(2 · number of centers)

• Weights are found by means of a numerical linear algebra approach.
• Requires a large training set for a satisfactory level of performance.
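A sketch of the shared-radius rule (the √(2·m) denominator follows the common heuristic for this center-selection strategy; the function name is mine):

```python
import math

def fixed_radius(centers):
    # r = d_max / sqrt(2 * m), where d_max is the largest distance
    # between any two centers and m is the number of centers
    m = len(centers)
    d_max = max(math.dist(a, b) for a in centers for b in centers)
    return d_max / math.sqrt(2 * m)
```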
Self-organized selection of centers

• Centers are selected using the k-means clustering algorithm.
• Radii are usually found using k-NN:
– find the k nearest centers
– The root-mean-squared distance between the current cluster centre and its k (typically 2) nearest neighbours is calculated, and this is the value chosen for r:

r_k = √( (1/k) · Σ_{i=1..k} (c_k − c_i)² )

• The output layer is learnt using a gradient descent technique.
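The k-NN radius rule above can be sketched as follows (function and parameter names are mine):

```python
import math

def knn_radius(center, other_centers, k=2):
    # root-mean-squared distance from this center to its k nearest
    # neighbouring centers: r = sqrt((1/k) * sum of squared distances)
    d2 = sorted(sum((a - b) ** 2 for a, b in zip(center, c)) for c in other_centers)
    return math.sqrt(sum(d2[:k]) / k)
```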
Supervised learning
• Supervised learning of all parameters (centers, radii, weights) using gradient descent.
• There are mathematical formulas for updating all of these parameters. They are not shown here; no need to scare you on such a “nice” day.
• A learning rate is used.
Advantages/disadvantages
• An RBFN trains faster than an MLP.
• Although the RBFN is quick to train, once training is finished and it is being used, it is slower than an MLP.
• RBFNs are essentially well-tried statistical techniques presented as neural networks. Learning mechanisms in statistical neural networks are not biologically plausible.
• An RBFN can give an “I don’t know” answer.
• RBFNs construct local approximations to a non-linear I/O mapping; MLPs construct global approximations to a non-linear I/O mapping.