1

Least-squares-based multilayer perceptron training with weighted adaptation: software simulation project

EE 690 Design of Embodied Intelligence

http://www.ohio.edu/

2

Outline

• Multilayer perceptron
• Least-squares based learning algorithm
• Weighted adaptation in training
• Signal-to-noise ratio figure and overfitting
• Software simulation project

3

$y_1 = \sum_i w_{1,i}\, x_i, \quad z_1 = f(y_1)$

$y_2 = \sum_i w_{2,i}\, z_{1,i}, \quad z_2 = f(y_2)$

Inputs x, outputs z

Feedforward (no recurrent connections) network with units arranged in layers

Multilayer perceptron (MLP)
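The forward computation of this slide can be sketched in a few lines. This is a minimal illustration, not the project code: it assumes tanh as the transfer function f and omits biases, and the function name `forward` and the list-of-rows weight layout are choices made here for the example.

```python
import math

def forward(x, W1, W2, f=math.tanh):
    """Forward pass of a two-layer MLP without biases.

    x  : list of inputs
    W1 : hidden-layer weights, one row per hidden neuron
    W2 : output-layer weights, one row per output neuron
    f  : transfer function (tanh is an assumption of this sketch)
    """
    # Hidden layer: weighted sum y1, then transfer function z1 = f(y1)
    y1 = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    z1 = [f(v) for v in y1]
    # Output layer: weighted sum y2 of hidden outputs, then z2 = f(y2)
    y2 = [sum(w * zi for w, zi in zip(row, z1)) for row in W2]
    z2 = [f(v) for v in y2]
    return y1, z1, y2, z2
```

Returning the intermediate signals y1, z1, y2 alongside z2 is convenient here because the least-squares algorithms later in the slides operate on these layer signals directly.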

4

Efficient mapping from inputs to outputs

Powerful universal function approximation

Number of inputs and outputs determined by the data

Number of hidden neurons and number of hidden layers

[Figure: inputs → MLP → outputs]

Multilayer perceptron (MLP)

5

Multilayer Perceptron Learning

[Figure: MLP with input layer, hidden layer, and output layer]

Back-propagation (BP) training algorithm: determines how much each weight is responsible for the error signal

BP has two phases:
• Forward pass phase: feedforward propagation of the input signals through the network
• Backward pass phase: propagates the error backwards through the network

6

Backward Pass

We want to know how to modify weights in order to decrease E.

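Use gradient descent:

$E(t) = \frac{1}{2} \sum_{k} \big(d_k(t) - y_k(t)\big)^2$

$\Delta w_{ij} = w_{ij}(t+1) - w_{ij}(t) = -\eta \, \frac{\partial E(t)}{\partial w_{ij}(t)}$

where $\eta$ is the step size.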

Multilayer Perceptron Learning

1. Gradient-based adjustment can get stuck in local minima

2. Training is time-consuming because of the large number of learning steps, and the step size needs to be configured

http://en.wikipedia.org/wiki/Image:Gradient_ascent_%28surface%29.png
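The sensitivity to the step size can be seen even on the smallest possible case. The sketch below applies gradient descent to a single linear weight with the squared-error cost; the function name `gradient_descent`, the parameter `eta`, and the toy data are assumptions made for this illustration only.

```python
def gradient_descent(xs, ds, w=0.0, eta=0.05, steps=100):
    """Minimize E = 1/2 * sum_k (d_k - w * x_k)^2 over a single weight w."""
    history = []
    for _ in range(steps):
        # dE/dw = -sum_k (d_k - y_k) * x_k, with y_k = w * x_k
        grad = -sum((d - w * x) * x for x, d in zip(xs, ds))
        # Gradient descent update: w(t+1) = w(t) - eta * dE/dw
        w = w - eta * grad
        history.append(0.5 * sum((d - w * x) ** 2 for x, d in zip(xs, ds)))
    return w, history

xs = [1.0, 2.0, 3.0]
ds = [2.0, 4.0, 6.0]           # generated by d = 2 * x
w_good, _ = gradient_descent(xs, ds, eta=0.05)
w_bad, _ = gradient_descent(xs, ds, eta=0.2, steps=20)
print(w_good, w_bad)           # the larger step size diverges on this data
```

On this data the update contracts only when eta is below 2 / sum(x^2), so a step size that looks modest already diverges. A least-squares solution avoids both the iteration count and this tuning.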

7

Least-squares based Learning Algorithm

Least-squares fit (LSF): obtains the minimum sum of squared errors (SSE)

For an overdetermined problem, LSF finds the solution with the minimum SSE

For an underdetermined problem, the pseudo-inverse finds the solution with the minimum norm

Can be applied to optimize either the weights or the signals on the layers

$W X = Y$

Optimized weights: $W = Y X^{-1}$

Optimized signals: $X = W^{-1} Y$

Here $X^{-1}$ and $W^{-1}$ denote the pseudo-inverse when the matrix is not square.
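As a minimal sketch of the weight optimization, the code below solves W X = Y for a single output row in the overdetermined, full-rank case through the normal equations. It is not the project implementation: the helper names `solve_linear` and `least_squares_weights` are hypothetical, and a pseudo-inverse routine would be needed for rank-deficient or underdetermined problems.

```python
def solve_linear(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting.

    Assumes A is square and non-singular.
    """
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        # Move the largest remaining entry of this column onto the diagonal
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    w = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w

def least_squares_weights(X, y):
    """Minimum-SSE weights w for w X = y, with X given as M rows of N samples."""
    m = len(X)
    # Normal equations: (X X^T) w = X y
    A = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(m)] for i in range(m)]
    rhs = [sum(a * b for a, b in zip(X[i], y)) for i in range(m)]
    return solve_linear(A, rhs)

# A row of ones appended to X lets the last weight act as the bias.
X = [[1.0, 2.0, 3.0, 4.0],
     [1.0, 1.0, 1.0, 1.0]]
y = [3.0, 5.0, 7.0, 9.0]
print(least_squares_weights(X, y))
```

Appending a constant row to X is one common way to fold the bias into the same solve; whether the project code does this is not stated in the slides.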

8

I. Start with back-propagation of the desired output signal

1. Propagation of the desired outputs back through the layers (signal optimization)

2. Optimization of the weights between layers

(1). y2 = f^-1(z2), with z2 set to the desired output d; scale y2 to (-1, 1).
(2). Based on W2, b2: solve W2.z1 = y2 - b2 for z1.
(3). y1 = f^-1(z1), scale y1 to (-1, 1).
(4). Optimize W1, b1 to satisfy W1.x + b1 = y1.
(5). Evaluate y1, z1 using the new W1 and bias b1.
(6). Optimize W2, b2 to satisfy W2.z1 + b2 = y2.
(7). Evaluate y2, z2 using the new W2 and bias b2.
(8). Evaluate the MSE.

Least-squares based Learning Algorithm (I)

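[Figure: two-layer MLP signal flow from input x through W1, b1 (signals y1, z1) and W2, b2 (signals y2, z2), with z2 compared to the desired output d]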
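One pass of steps (1) to (8) can be sketched for the smallest case, a network with one input, one hidden neuron, and one output. Several things here are assumptions of the sketch rather than details from the slides: tanh is used as f, the "scale to (-1, 1)" operation is replaced by simple clipping, and the names `line_fit`, `clip`, and `ls_training_step` are hypothetical.

```python
import math

def line_fit(xs, ys):
    """Least-squares w, b for w * x + b = y in the scalar case."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx            # zero only if all xs are equal
    w = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return w, b

def clip(v, limit=0.999):
    """Keep a value strictly inside (-1, 1) so that atanh is defined."""
    return max(-limit, min(limit, v))

def ls_training_step(xs, ds, w2, b2):
    """One pass of steps (1)-(8) for a 1-1-1 tanh network."""
    y2 = [math.atanh(clip(d)) for d in ds]                       # (1) invert output layer
    z1 = [clip((y - b2) / w2) for y in y2]                       # (2) solve w2 * z1 = y2 - b2
    y1 = [math.atanh(z) for z in z1]                             # (3) invert hidden layer
    w1, b1 = line_fit(xs, y1)                                    # (4) fit w1 * x + b1 = y1
    z1 = [math.tanh(w1 * x + b1) for x in xs]                    # (5) re-evaluate hidden signals
    w2, b2 = line_fit(z1, y2)                                    # (6) fit w2 * z1 + b2 = y2
    z2 = [math.tanh(w2 * z + b2) for z in z1]                    # (7) re-evaluate outputs
    mse = sum((d - z) ** 2 for d, z in zip(ds, z2)) / len(ds)    # (8) mean squared error
    return (w1, b1, w2, b2), mse
```

In the matrix case each `line_fit` call becomes a least-squares solve for a whole weight matrix and bias vector, but the order of operations is the same.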

9

Least-squares based Learning Algorithm (I)

Weights optimization with weighted LSF

The location of x on the transfer function determines its effect on output signal of this layer

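dy/dx is used as the weighting term in LSF

Optimize W1, b1 to satisfy W1.x = y1 - b1

Weighted LSF:

$W X = Y \;\rightarrow\; W X K = Y K, \quad K = \mathrm{diag}(k_1, k_2, \ldots), \quad k_i = \Delta y / \Delta x$

[Figure: transfer function on which the same Δx produces a different Δy depending on the location of x]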
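The effect of the weighting can be illustrated in the scalar case. The sketch below fits w and b while weighting each sample by the slope of the transfer function at its target value, so that samples in the saturated region count for little. The choice of tanh, the use of the slope at the target as weight, and the names `weighted_line_fit` and `tanh_slope` are assumptions of this example.

```python
import math

def weighted_line_fit(xs, ys, ks):
    """Minimize sum_i k_i * (w * x_i + b - y_i)^2 over w and b."""
    sk = sum(ks)
    sx = sum(k * x for k, x in zip(ks, xs))
    sy = sum(k * y for k, y in zip(ks, ys))
    sxx = sum(k * x * x for k, x in zip(ks, xs))
    sxy = sum(k * x * y for k, x, y in zip(ks, xs, ys))
    # Weighted normal equations: w*sxx + b*sx = sxy and w*sx + b*sk = sy
    det = sk * sxx - sx * sx
    w = (sk * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return w, b

def tanh_slope(y):
    """Slope of tanh at y, used here as the weighting term."""
    return 1.0 - math.tanh(y) ** 2

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 10.0]      # last target lies deep in the saturated region
ks = [tanh_slope(y) for y in ys]
print(weighted_line_fit(xs, ys, ks))          # dominated by the first three samples
print(weighted_line_fit(xs, ys, [1.0] * 4))   # unweighted fit, pulled by the last sample
```

The last sample barely changes the layer output after the transfer function, so the weighted fit spends its accuracy on the samples where an error in y actually shows up in z.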

10

Least-squares based Learning Algorithm (II)

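II. Weights optimization with iterative fitting

W1 can be further adjusted based on the output error

[Figure: network from input x to desired output d, with the input and output weights of the hidden neurons labeled]

$e_{out} = d - W_2 f(W_1 x)$

$W_2 f(W_1 x) = W_{2,1} f(W_{1,1} x) + W_{2,2} f(W_{1,2} x) + \dots + W_{2,n} f(W_{1,n} x)$

where $W_{1,j}$ and $W_{2,j}$ are the input and output weights of hidden neuron j.

Each hidden neuron acts as a basis function

Start with the first hidden neuron, and continue to the other neurons as long as the output error e_out remains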

11

III. Start with input feedforward weights optimization

1. Propagation of the inputs forward through the layers

2. Optimization of the weights between layers and the signals on layers

(1). Evaluate y1, z1 using the initial W1 and bias b1.
(2). y2 = f^-1(d).
(3). Optimize W2, b2 to satisfy W2.z1 + b2 = y2.
(4). Based on W2, b2, optimize z1 to satisfy W2.z1 + b2 = y2.
(5). y1 = f^-1(z1).
(6). Optimize W1, b1 to satisfy W1.x + b1 = y1.
(7). Evaluate y1, z1, y2, z2 using the new W1, W2 and biases b1, b2.
(8). Evaluate the MSE.

Least-squares based Learning Algorithm (III)

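[Figure: two-layer MLP signal flow from input x through W1, b1 (signals y1, z1) and W2, b2 (signals y2, z2), with z2 compared to the desired output d]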

12

Least-squares based Learning Algorithm (III)

Signal optimization with weighted adaptation

The location of x on the transfer function determines how much the signal can be changed

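$e = y_2 - b_2 - W_2 z_1$

$W_2 (z_1 + \Delta z_1) = y_2 - b_2$

$W_2 \, \Delta z_1 = e$

In the weighted form, the change $\Delta z_1$ of each hidden signal is scaled by the slope $f'(y_1)$ through a diagonal matrix $\mathrm{diag}\big(f'(y_1), \ldots, f'(y_1)\big)$, and $z_1$ stays between $z_{1\min}$ and $z_{1\max}$.

[Figure: transfer function with the limits z1min and z1max marked on the output axis]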

13

Overfitting problem

The learning algorithm can adapt the MLP to fit the training data.

For noisy training data, how closely should we fit the data?

Overfitting: the number of hidden neurons and the number of layers affect the training accuracy. Both are determined by the user, and the choice is critical.

• Optimized Approximation Algorithm: SNRF criterion

[Figure: training data and function approximation]

14

• Sampled data: function value + noise
• Error signal: approximation error component + noise component

Signal-to-noise ratio figure (SNRF)

• Approximation error component: useful signal, should be reduced
• Noise component: should not be learned

• Assumption: continuous function, with white Gaussian noise (WGN) as the noise
• Signal-to-noise ratio figure (SNRF): signal energy / noise energy
• Compare SNRF_e and SNRF_WGN

When should learning stop?
• If there is useful signal left unlearned: keep learning
• If noise dominates in the error signal: stop

15

Signal-to-noise ratio figure (SNRF)

[Figure, left panel: training data and approximating function (quadratic fitting). Right panel: error signal]

$e_i = s_i + n_i \quad (i = 1, 2, \ldots, N)$

where $s_i$ is the approximation error component and $n_i$ is the noise component.
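The slides define the SNRF only as signal energy over noise energy. One way to estimate it for a one-dimensional error signal, assumed here for illustration, uses the stated assumptions directly: a continuous function changes little between neighbouring samples, while white noise is uncorrelated between them. The correlation of the error with a one-sample shift of itself then approximates the signal energy. The function name `snrf` and this particular estimator are assumptions of the sketch.

```python
import math

def snrf(errors):
    """Estimate signal energy / noise energy of a 1-D error signal.

    Assumes samples are ordered along the input axis and that the
    error signal is not identically zero.
    """
    # Energy of the whole error signal: signal part plus noise part
    total = sum(e * e for e in errors)
    # Correlation with a one-sample shift: the noise part averages out,
    # leaving an estimate of the signal energy
    signal = sum(a * b for a, b in zip(errors[1:], errors[:-1]))
    noise = total - signal
    return signal / noise

smooth = [math.sin(0.05 * i) for i in range(200)]   # slowly varying, mostly "signal"
rough = [1.0, -1.0] * 50                            # no correlation left between neighbours
print(snrf(smooth), snrf(rough))
```

A slowly varying error gives a large ratio, which indicates structure that is still unlearned; an error with no neighbour-to-neighbour correlation gives a ratio near or below zero.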

16

Optimization using SNRF

Noise dominates in the error signal and little information is left unlearned: learning should stop

• SNRF_e < threshold SNRF_WGN

Procedure:
• Start with a small network (small number of neurons or layers)
• Train the MLP to obtain e_train
• Compare SNRF_e and SNRF_WGN
• Add hidden neurons and repeat

Stopping criterion: SNRF_e < threshold SNRF_WGN
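The growing procedure can be written as a short loop. In this sketch `train` and `snrf` are hypothetical callables supplied by the caller (a training routine that returns the error signal for a given network size, and an SNRF estimator), and `threshold` stands for the threshold derived from SNRF_WGN; none of these names come from the project code.

```python
def grow_until_noise(train, snrf, threshold, sizes):
    """Return the first network size whose training error is noise-dominated.

    train(size) -> error signal e_train of a network of that size
    snrf(e)     -> signal-to-noise ratio figure of an error signal
    threshold   -> threshold derived from SNRF_WGN
    sizes       -> candidate sizes, smallest first
    """
    for size in sizes:
        e_train = train(size)
        # Stopping criterion: SNRF_e < threshold SNRF_WGN
        if snrf(e_train) < threshold:
            return size
    return None   # no candidate met the criterion
```

The same loop applies to the number of back-propagation iterations: replace the candidate sizes with iteration counts and let `train` continue training up to the given count.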

17

Optimization using SNRF

• Set the structure of the MLP
• Train the MLP with back-propagation iterations to obtain e_train
• Compare SNRF_e and SNRF_WGN
• Keep training with more iterations

Applied to optimize the number of iterations in back-propagation training to avoid overfitting (overtraining)

18

Prepare the data
• Data samples along the row: N samples
• Features along the column: M features
• Desired output in a row vector: N values
• Save “features” and “values” in a training MAT file

How to call the function
• Run “main_MLP_LS.m”
• Specify the MAT file path and name and the MLP parameters in the command window

M x N matrix: “Features”

1 x N vector: “Values”

Software simulation project

19

Input the path where data file can be found (C:*): E:\Research\MLP_LSInitial_desired\MLP_LS_package\
Input the name of data file (*.mat): mackey_glass_data.mat
There are overall 732 samples. How do you like to divide them into training and testing set?
Number of training samples: 500
Number of testing samples: 232
How many layers does MLP have? 3:2:7
How many neurons there are on each hidden layer ? 3:1:10
What kind of tranfer function you like to have on hidden neurons?
0. Linear tranfer function
1. Tangent sigmoid
2. Logrithmic sigmoid
2


20

[Figure: two-layer MLP signal flow from input x through W1, b1 (signals y1, z1) and W2, b2 (signals y2, z2), with z2 compared to the desired output d]

There are 4 types of training algorithms you can choose from. Which type you like to use?
1. Least-squared based training (I)
2. Least-squared based training with iterative neuron fitting (II)
3. Least-squared based training with weighted signal adaptation (III)
4. Back-propagation training (BP)
1
How many iterations you would like to have in the training ? 3
How many Monte-Carlo runs you would like to have for the training? 2


21

Results:
• J_train (num_layer, num_neuron)
• J_test (num_layer, num_neuron)
• SNRF (num_layer, num_neuron)

Present training and testing errors for various configurations of the MLP

Present the optimum configuration found by SNRF

Present the comparison of the results, including errors, network structure


22

Typical database and literature survey

Function approximation & classification datasets:
• “IEEE Neural Networks Council Standards Committee Working Group on Data modeling Benchmarks”: http://neural.cs.nthu.edu.tw/jang/benchmark/#MG
• “Neural Network Databases and Learning Data”: http://www.neoxi.com/NNR/Neural_Network_Databases.php
• “UCI Machine Learning Repository”: http://www.ics.uci.edu/~mlearn/MLRepository.html

• Data are normalized
• Multiple inputs, with a single output. For multiple-output data, use separate MLPs.
• Compare results with literature that uses the same dataset (*)

