1
Least-squares-based multilayer perceptron training with weighted adaptation: software simulation project
EE 690 Design of Embodied Intelligence
2
Outline
Multilayer Perceptron
Least-squares based Learning Algorithm
Weighted Adaptation in training
Signal-to-Noise Ratio Figure and Overfitting
Software simulation project
3
Multilayer perceptron (MLP)
Feedforward network (no recurrent connections) with units arranged in layers
Inputs x, outputs z
Hidden layer: $y_1 = \sum_i w_{1i} x_i$, $z_1 = f(y_1)$
Output layer: $y_2 = \sum_i w_{2i} z_{1i}$, $z_2 = f(y_2)$
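The layer equations above can be traced in a few lines of code. This is a minimal sketch, assuming a tanh transfer function and no bias terms; the function name `forward` and the example weights are illustrative and not taken from the slides.

```python
import math

def forward(x, W1, W2, f=math.tanh):
    """Forward pass of a two-layer MLP without bias terms.

    x  : list of input values
    W1 : hidden-layer weights, one row per hidden neuron
    W2 : output-layer weights, one row per output neuron
    f  : transfer function (tanh is an assumption of this sketch)
    """
    y1 = [sum(w * xi for w, xi in zip(row, x)) for row in W1]    # y1 = W1 x
    z1 = [f(v) for v in y1]                                       # z1 = f(y1)
    y2 = [sum(w * zi for w, zi in zip(row, z1)) for row in W2]   # y2 = W2 z1
    z2 = [f(v) for v in y2]                                       # z2 = f(y2)
    return y1, z1, y2, z2

# Two inputs, three hidden neurons, one output (arbitrary example values).
x = [0.5, -1.0]
W1 = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]]
W2 = [[0.6, -0.2, 0.1]]
y1, z1, y2, z2 = forward(x, W1, W2)
```

The intermediate signals y1, z1, y2 are returned along with the output z2 because the least-squares algorithms later in the deck operate on exactly these layer signals.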
4
Multilayer perceptron (MLP)
Efficient mapping from inputs to outputs
Powerful universal function approximator
The number of inputs and outputs is determined by the data
The number of hidden neurons and the number of hidden layers are chosen by the user
[Figure: inputs → MLP → outputs]
5
Multilayer Perceptron Learning
[Figure: MLP with input layer, hidden layer, and output layer]
Back-propagation (BP) training algorithm: determines how much each weight is responsible for the error signal
BP has two phases:
• Forward pass phase: feedforward propagation of the input signals through the network
• Backward pass phase: propagation of the error backwards through the network
6
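Multilayer Perceptron Learning
Backward pass
We want to know how to modify the weights in order to decrease the error E.
Use gradient descent:
$$E(t) = \frac{1}{2} \sum_{k} \bigl(d_k(t) - y_k(t)\bigr)^2$$
$$\Delta w_{ij} = w_{ij}(t+1) - w_{ij}(t) = -\eta \, \frac{\partial E(t)}{\partial w_{ij}(t)}$$
where $\eta$ is the step size.
1. Gradient-based adjustment can get trapped in local minima
2. Training is time-consuming: it takes a large number of learning steps, and the step size needs to be configured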
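The gradient-descent update can be illustrated on the smallest possible case. This is a sketch under simplifying assumptions: a single linear neuron y = w*x + b instead of a full MLP, updates applied one sample at a time, and hypothetical names (`train_linear_neuron`, `sse`) and data chosen only for illustration.

```python
def train_linear_neuron(samples, eta=0.1, epochs=200):
    """Gradient descent on E = 1/2 * (d - y)^2 for y = w*x + b."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, d in samples:
            y = w * x + b
            err = d - y
            # dE/dw = -(d - y) * x and dE/db = -(d - y);
            # each weight moves against its gradient, scaled by the step size eta
            w += eta * err * x
            b += eta * err
    return w, b

def sse(samples, w, b):
    """Half the sum of squared errors over all samples."""
    return 0.5 * sum((d - (w * x + b)) ** 2 for x, d in samples)

samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # generated by d = 2x + 1
w, b = train_linear_neuron(samples)
```

Even in this toy case several hundred small steps are needed, and the result depends on the choice of eta, which is the cost the least-squares approach in the following slides tries to avoid.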
7
Least-squares based Learning Algorithm
Least-squares fit (LSF): obtains the minimum sum of squared errors (SSE)
For an overdetermined problem, LSF finds the solution with the minimum SSE
For an underdetermined problem, the pseudo-inverse finds the solution with the minimum norm
Can be applied to optimize either the weights or the signals on the layers
$$W X = Y$$
Optimized weights: $W = Y X^{+}$
Optimized signals: $X = W^{+} Y$
where $^{+}$ denotes the pseudo-inverse.
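The two uses of the least-squares fit can be shown on scalar-sized problems where the pseudo-inverse has a closed form. This is a minimal sketch, assuming one weight plus a bias for the weight case and a single equation for the signal case; the function names `lsf_line` and `min_norm_signal` are hypothetical.

```python
def lsf_line(xs, ys):
    """Overdetermined case (more samples than unknowns):
    minimize sum (w*x + b - y)^2 over the weight w and bias b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = my - w * mx
    return w, b

def min_norm_signal(w_row, y):
    """Underdetermined case (one equation w_row . z = y, several unknowns z):
    the pseudo-inverse solution z = w_row * y / (w_row . w_row) has minimum norm."""
    ww = sum(w * w for w in w_row)
    return [w * y / ww for w in w_row]

# Optimizing weights from four noisy samples of roughly y = 2x + 1.
w, b = lsf_line([0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8])

# Optimizing three hidden signals that must produce y = 9 through one output neuron.
z = min_norm_signal([1.0, 2.0, 2.0], 9.0)
```

The first call has no exact solution and returns the best fit; the second has infinitely many exact solutions and returns the shortest one.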
8
Least-squares based Learning Algorithm (I)
I. Start with back-propagation of the desired output signal
1. Propagation of the desired outputs back through the layers (signal optimization)
2. Optimization of the weights between layers
(1). $y_2 = f^{-1}(z_2)$; scale y2 to (-1, 1).
(2). Based on W2, b2: solve $W_2 z_1 = y_2 - b_2$ for z1.
(3). $y_1 = f^{-1}(z_1)$; scale y1 to (-1, 1).
(4). Optimize W1, b1 to satisfy $W_1 x + b_1 = y_1$.
(5). Evaluate z1, y1 using the new W1 and bias b1.
(6). Optimize W2, b2 to satisfy $W_2 z_1 + b_2 = y_2$.
(7). Evaluate z2, y2 using the new W2 and bias b2.
(8). Evaluate the MSE.
[Figure: signal flow x → (W1, b1) → y1 → z1 → (W2, b2) → y2 → z2, with z2 compared against the desired output d]
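The backward propagation of the desired output in steps (1) to (3) can be sketched for a single output neuron. The assumptions here are loud ones: a tanh transfer function (so its inverse is atanh), a desired output strictly inside (-1, 1), a minimum-norm solution for z1, and a hypothetical `margin` parameter that stands in for the scaling into (-1, 1); none of these details are specified on the slides.

```python
import math

def backprop_desired(d, W2_row, b2, margin=0.9):
    """Propagate one desired output d back to the hidden layer.

    margin is a hypothetical bound used to keep z1 inside (-1, 1).
    """
    # (1) invert the output transfer function: y2 = f^-1(d)
    y2 = math.atanh(d)
    # (2) minimum-norm solution of W2 . z1 = y2 - b2
    ww = sum(w * w for w in W2_row)
    z1 = [w * (y2 - b2) / ww for w in W2_row]
    # shrink z1 if needed so that f^-1 can be applied to it
    peak = max(abs(v) for v in z1)
    scale = min(1.0, margin / peak) if peak > 0 else 1.0
    z1 = [v * scale for v in z1]
    # (3) invert the hidden transfer function: y1 = f^-1(z1)
    y1 = [math.atanh(v) for v in z1]
    return y2, z1, y1

W2_row, b2 = [0.5, -0.5], 0.1
y2, z1, y1 = backprop_desired(0.5, W2_row, b2)
```

The resulting y1 values are the targets that step (4) then fits with W1 and b1. When the scaling is active the equation in step (2) is no longer met exactly, which is one reason the weights are re-optimized in the later steps.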
9
Least-squares based Learning Algorithm (I)
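Weights optimization with weighted LSF
The location of x on the transfer function determines its effect on the output signal of this layer
Weighting term in LSF: the slope dy/dx of the transfer function
Optimize W1, b1 to satisfy $W_1 x = y_1 - b_1$
Weighted LSF: the equations of $W X = Y$ are weighted by the diagonal matrix
$$\mathrm{diag}\left(\frac{\Delta y_1}{\Delta x_1}, \frac{\Delta y_2}{\Delta x_2}, \ldots\right)$$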
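The effect of the slope weighting can be seen on a one-weight fit. This is a sketch, assuming a tanh transfer function, a cost of the form sum (g * residual)^2 with g the local slope, and made-up target values; the names `weighted_lsf_line` and `tanh_slope` are hypothetical.

```python
import math

def weighted_lsf_line(xs, ys, gs):
    """Minimize sum (g * (w*x + b - y))^2 over w and b."""
    q = [g * g for g in gs]                           # squared weights
    sq = sum(q)
    mx = sum(qi * x for qi, x in zip(q, xs)) / sq     # weighted means
    my = sum(qi * y for qi, y in zip(q, ys)) / sq
    sxx = sum(qi * (x - mx) ** 2 for qi, x in zip(q, xs))
    sxy = sum(qi * (x - mx) * (y - my) for qi, x, y in zip(q, xs, ys))
    w = sxy / sxx
    return w, my - w * mx

def tanh_slope(y):
    """Slope of z = tanh(y) at y, used as the weighting term."""
    return 1.0 - math.tanh(y) ** 2

xs = [0.0, 1.0, 2.0, 3.0]
y1_targets = [0.0, 0.5, 1.0, 4.0]       # the last target lies deep in saturation
gs = [tanh_slope(y) for y in y1_targets]

w_plain, b_plain = weighted_lsf_line(xs, y1_targets, [1.0] * 4)   # unweighted fit
w_wt, b_wt = weighted_lsf_line(xs, y1_targets, gs)                # slope-weighted fit
```

The saturated target barely changes the layer output, so the weighted fit nearly ignores it and follows the three targets in the steep part of the transfer function.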
10
Least-squares based Learning Algorithm (II)
II. Weights optimization with iterative fitting
W1 can be further adjusted based on the output error
[Figure: network with input x, weights W1 and W2, and desired output d]
$$e_{out} = d - W_2 f(W_1 x)$$
$$e_{out} = d - \bigl[ W_{2,j_1} f(W_{1,j_1} x) + W_{2,j_2} f(W_{1,j_2} x) + \cdots + W_{2,j_n} f(W_{1,j_n} x) \bigr]$$
Each hidden neuron acts as a basis function
Start with the first hidden neuron and continue with the other neurons as long as the output error $e_{out}$ remains
11
Least-squares based Learning Algorithm (III)
III. Start with input feedforward weights optimization
1. Propagation of the inputs forward through the layers
2. Optimization of the weights between layers and of the signals on the layers
(1). Evaluate z1, y1 using the initial W1 and bias b1.
(2). $y_2 = f^{-1}(d)$.
(3). Optimize W2, b2 to satisfy $W_2 z_1 + b_2 = y_2$.
(4). Based on W2, b2, optimize z1 to satisfy $W_2 z_1 + b_2 = y_2$.
(5). $y_1 = f^{-1}(z_1)$.
(6). Optimize W1, b1 to satisfy $W_1 x + b_1 = y_1$.
(7). Evaluate y1, z1, y2, z2 using the new W1, W2 and biases b1, b2.
(8). Evaluate the MSE.
[Figure: signal flow x → (W1, b1) → y1 → z1 → (W2, b2) → y2 → z2, with z2 compared against the desired output d]
12
Least-squares based Learning Algorithm (III)
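Signal optimization with weighted adaptation
The location of x on the transfer function determines how much the signal can be changed
$$e = y_2 - b_2 - W_2 z_1$$
$$W_2 (z_1 + \Delta z_1) = y_2 - b_2$$
$$W_2 \, \Delta z_1 = e$$
Weighted adaptation: the change in z1 is weighted by the slope of the transfer function, using the diagonal matrix $\mathrm{diag}\bigl(f'(y_1)\bigr)$ when solving $W_2 \, \Delta z_1 = e$
[Figure: transfer function with the range of z1 between z1min and z1max]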
13
Overfitting problem
A learning algorithm can adapt the MLP to fit the training data.
For noisy training data, how closely should the MLP fit the data?
Overfitting: the number of hidden neurons and the number of layers affect the training accuracy; both are determined by the user, so the choice is critical
• Optimized Approximation Algorithm: SNRF criterion
[Figure: training data and function approximation]
14
Signal-to-noise ratio figure (SNRF)
• Sampled data: function value + noise
• Error signal: approximation error component + noise component
• Approximation error component: useful signal, should be reduced
• Noise component: should not be learned
• Assumption: continuous function, with white Gaussian noise (WGN) as the noise
• Signal-to-noise ratio figure (SNRF): signal energy / noise energy
• Compare $SNRF_e$ and $SNRF_{WGN}$
Should learning stop? Not if there is useful signal left unlearned; yes if noise dominates in the error signal
15
Signal-to-noise ratio figure (SNRF)
[Figure, first panel: training data and approximating function (quadratic fitting). Second panel: error signal]
$$e_i = s_i + n_i \quad (i = 1, 2, \ldots, N)$$
where $s_i$ is the approximation error component and $n_i$ is the noise component.
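The slides define SNRF as signal energy over noise energy without stating how the two energies are estimated. The sketch below assumes a one-dimensional error signal ordered along the input axis, and assumes that the correlation between neighbouring samples estimates the signal energy, since a continuous signal changes little between neighbours while white noise is uncorrelated. The function name `snrf_1d` and this estimator are illustrative assumptions.

```python
import random

def snrf_1d(e):
    """Estimate signal energy / noise energy for a 1-D error signal e.

    Assumption of this sketch: the lag-1 correlation of e approximates
    the energy of the smooth signal component.
    """
    total = sum(v * v for v in e)                          # signal + noise energy
    signal = sum(a * b for a, b in zip(e[1:], e[:-1]))     # lag-1 correlation
    noise = total - signal
    return signal / noise

# A smooth (here constant) error signal: almost all energy is signal.
snrf_smooth = snrf_1d([1.0] * 100)

# White Gaussian noise: the estimate stays close to zero.
random.seed(0)
snrf_noise = snrf_1d([random.gauss(0.0, 1.0) for _ in range(2000)])
```

A structured error signal yields a large value and pure noise yields a value near zero, which is the contrast that the comparison between $SNRF_e$ and $SNRF_{WGN}$ relies on.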
16
Optimization using SNRF
When $SNRF_e$ falls below the threshold set by $SNRF_{WGN}$, noise dominates in the error signal, little information is left unlearned, and learning should stop
• Start with a small network (small number of neurons or layers)
• Train the MLP to obtain the training error $e_{train}$
• Compare $SNRF_e$ and $SNRF_{WGN}$
• Add hidden neurons
Stopping criterion: $SNRF_e$ < threshold set by $SNRF_{WGN}$
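The loop on this slide can be written as a short search over network sizes. This is a sketch with placeholders: `train_and_score` is a hypothetical callable that trains an MLP of the given size and returns the SNRF of its training error, and the toy scoring function and threshold value below are invented purely to exercise the loop.

```python
def grow_network(train_and_score, threshold, sizes):
    """Try sizes in increasing order and return the first one whose
    training error looks like noise (SNRF_e < threshold)."""
    for size in sizes:
        snrf_e = train_and_score(size)   # train, then measure SNRF of e_train
        if snrf_e < threshold:
            return size                  # stopping criterion met
    return None                          # no tested size met the criterion

def toy_score(size):
    """Invented stand-in: SNRF_e shrinks as hidden neurons are added."""
    return 5.0 / size ** 2

best = grow_network(toy_score, threshold=0.5, sizes=range(1, 11))
too_small = grow_network(toy_score, threshold=0.5, sizes=range(1, 4))
```

Returning None when no size passes keeps the caller from mistaking the largest tested network for one that met the criterion.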
17
Optimization using SNRF
• Set the structure of the MLP
• Train the MLP with back-propagation iterations to obtain the training error $e_{train}$
• Compare $SNRF_e$ and $SNRF_{WGN}$
• Keep training with more iterations
Applied to optimize the number of iterations in back-propagation training, to avoid overfitting (overtraining)
18
Software simulation project
Prepare the data
• Data samples along the rows: N samples (one column per sample)
• Features along the columns: M features (one row per feature)
• Desired outputs in a row vector: N values
• Save “features” and “values” in a training MAT file
M x N matrix: “Features”
1 x N vector: “Values”
How to call the function
• Run “main_MLP_LS.m”
• Specify the MAT file path and name and the MLP parameters in the command window
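The data layout can be checked before saving with a small helper. This sketch is in Python rather than MATLAB and only illustrates the M x N and 1 x N shapes and a split into training and testing samples; the function name `split_columns` and the generated data are hypothetical, and writing the MAT file itself is not shown.

```python
def split_columns(features, values, n_train):
    """Split column-wise samples into training and testing sets.

    features : M rows (one per feature), each with N entries (one per sample)
    values   : N desired outputs
    n_train  : number of leading samples used for training
    """
    n = len(values)
    if any(len(row) != n for row in features):
        raise ValueError("every feature row must have one entry per sample")
    if not 0 < n_train < n:
        raise ValueError("n_train must lie strictly between 0 and N")
    train = ([row[:n_train] for row in features], values[:n_train])
    test = ([row[n_train:] for row in features], values[n_train:])
    return train, test

# M = 2 features, N = 732 samples of made-up data.
features = [[float(i) for i in range(732)], [2.0 * i for i in range(732)]]
values = [0.5 * i for i in range(732)]
(train_x, train_d), (test_x, test_d) = split_columns(features, values, 500)
```

With 732 samples and 500 used for training, 232 samples remain for testing.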
19
Software simulation project
Sample session in the command window:
Input the path where data file can be found (C:*): E:\Research\MLP_LSInitial_desired\MLP_LS_package\
Input the name of data file (*.mat): mackey_glass_data.mat
There are overall 732 samples. How do you like to divide them into training and testing set?
Number of training samples: 500
Number of testing samples: 232
How many layers does MLP have? 3:2:7
How many neurons there are on each hidden layer ? 3:1:10
What kind of tranfer function you like to have on hidden neurons?
0. Linear tranfer function
1. Tangent sigmoid
2. Logrithmic sigmoid
2
20
Software simulation project
[Figure: signal flow x → (W1, b1) → y1 → z1 → (W2, b2) → y2 → z2, with z2 compared against the desired output d]
There are 4 types of training algorithms you can choose from. Which type you like to use?
1. Least-squared based training (I)
2. Least-squared based training with iterative neuron fitting (II)
3. Least-squared based training with weighted signal adaptation (III)
4. Back-propagation training (BP)
1
How many iterations you would like to have in the training ? 3
How many Monte-Carlo runs you would like to have for the training? 2
21
Software simulation project
Results:
J_train (num_layer, num_neuron)
J_test (num_layer, num_neuron)
SNRF (num_layer, num_neuron)
Present the training and testing errors for the various configurations of the MLP
Present the optimum configuration found by SNRF
Present a comparison of the results, including errors and network structure
22
Software simulation project
Typical databases and literature survey
Function approximation and classification datasets:
“IEEE Neural Networks Council Standards Committee Working Group on Data modeling Benchmarks”: http://neural.cs.nthu.edu.tw/jang/benchmark/#MG
“Neural Network Databases and Learning Data”: http://www.neoxi.com/NNR/Neural_Network_Databases.php
“UCI Machine Learning Repository”: http://www.ics.uci.edu/~mlearn/MLRepository.html
Data are normalized
Multiple inputs with a single output. For multiple-output data, use separate MLPs.
Compare the results with those from literature that uses the same dataset (*)