Setting Artificial Neural Networks Parameters


TRANSCRIPT

Page 1: Setting Artificial Neural Networks parameters

SETTING ARTIFICIAL NEURAL NETWORKS PARAMETERS

Page 2: Setting Artificial Neural Networks parameters

NEED FOR SETTING PARAMETER VALUES

1. LOCAL MINIMA
w1 – global minimum; w2, w3 – local minima

[Figure: Erms plotted against the weight value, with the global minimum at w1 and local minima at w2 and w3]

Page 3: Setting Artificial Neural Networks parameters

NEED FOR SETTING PARAMETER VALUES

2. LEARNING RATE
Small learning rate – slow and lengthy learning.
Large learning rate – the output may saturate or swing back and forth across the desired value; training may take too long.

3. Learning improves and network training converges better if the inputs and outputs are numeric.

Page 4: Setting Artificial Neural Networks parameters

TYPES OF TRAINING

Supervised Training
• Supplies the neural network with inputs and the desired outputs
• The response of the network to the inputs is measured
• The weights are modified to reduce the difference between the actual and desired outputs

Unsupervised Training
• Only supplies inputs
• The neural network adjusts its own weights so that similar inputs cause similar outputs
• The network identifies the patterns and differences in the inputs without any external assistance

Page 5: Setting Artificial Neural Networks parameters

I. INITIALISATION OF WEIGHTS

Large initial weights will drive the output of layer 1 into saturation, and the network will then require a longer training time to emerge from saturation.

Weights are therefore chosen as small values, either between -1 and 1 or between -0.5 and 0.5.
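As an illustration, a minimal sketch of this kind of small uniform initialisation for a one-hidden-layer network (the layer sizes and the ±0.5 limit are example choices, not values from the slides):

```python
import numpy as np

def init_small_weights(n_in, n_hidden, n_out, limit=0.5, seed=0):
    """Initialise both weight layers uniformly in [-limit, limit]."""
    rng = np.random.default_rng(seed)
    v = rng.uniform(-limit, limit, size=(n_hidden, n_in))   # input -> hidden
    w = rng.uniform(-limit, limit, size=(n_out, n_hidden))  # hidden -> output
    return v, w

v, w = init_small_weights(n_in=4, n_hidden=3, n_out=1)
```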

Page 6: Setting Artificial Neural Networks parameters

INITIALISATION OF WEIGHTS

PROBLEM WITH THIS CHOICE:

If some of the input parameters are very large, they will dominate the output.
e.g. x = [ 10 2 0.2 1 ]

SOLUTION:

Initialise the weights inversely proportional to the inputs. The output then does not depend on any individual parameter, but on the total input as a whole.

Page 7: Setting Artificial Neural Networks parameters

RULE FOR INITIALISATION OF WEIGHTS

Weight between the input and the 1st layer:
v_ij = (1/2P) ∑_{p=1..P} 1/|x_j^p|
where P is the total number of input patterns.

Weight between the 1st layer and the output layer:
w_ij = (1/2P) ∑_{p=1..P} 1/ f(∑_j v_ij x_j^p)
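A minimal NumPy sketch of this initialisation rule, following my reading of the formulas above; the logistic sigmoid f and the 4-pattern data set are assumptions made for the example:

```python
import numpy as np

def f(a):
    """Assumed activation function (logistic sigmoid)."""
    return 1.0 / (1.0 + np.exp(-a))

def init_from_inputs(X, n_hidden, n_out):
    """Initialise weights inversely proportional to the input magnitudes.

    X has shape (P, n_in), where P is the number of input patterns.
    """
    P, n_in = X.shape
    # v_ij = (1/2P) * sum_p 1/|x_j^p|   (same value for every hidden node i)
    col = (1.0 / (2 * P)) * np.sum(1.0 / np.abs(X), axis=0)    # shape (n_in,)
    v = np.tile(col, (n_hidden, 1))                            # shape (n_hidden, n_in)
    # w_ij = (1/2P) * sum_p 1 / f(sum_j v_ij x_j^p)
    hidden = f(X @ v.T)                                        # shape (P, n_hidden)
    row = (1.0 / (2 * P)) * np.sum(1.0 / hidden, axis=0)       # shape (n_hidden,)
    w = np.tile(row, (n_out, 1))                               # shape (n_out, n_hidden)
    return v, w

X = np.array([[10, 2, 0.2, 1],
              [8, 1, 0.5, 2],
              [12, 3, 0.1, 1],
              [9, 2, 0.3, 1.5]], dtype=float)
v, w = init_from_inputs(X, n_hidden=3, n_out=1)
```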

Page 8: Setting Artificial Neural Networks parameters

II. FREQUENCY OF WEIGHT UPDATES

Per-pattern training: weights change after every input pattern is applied. The input set is repeated if the network is not yet trained.

Per-epoch training: an epoch is one iteration through the process of providing the network with every input and updating the network's weights. Many epochs are required to train the neural network. The weight changes suggested by every input are accumulated into a single change applied at the end of each epoch, i.e. at the end of the set of patterns; the weights do not change after each individual input. Also called BATCH MODE training (both schedules are sketched below).
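A minimal sketch contrasting the two update schedules; the gradient routine for a single linear unit and the data handling are placeholders, not part of the slides:

```python
import numpy as np

def grad(w, x, d):
    """Gradient of squared error for a single linear unit (illustrative only)."""
    y = w @ x
    return (y - d) * x

def per_pattern_epoch(w, X, D, eta):
    """Per-pattern training: update the weights after every input pattern."""
    for x, d in zip(X, D):
        w = w - eta * grad(w, x, d)
    return w

def batch_epoch(w, X, D, eta):
    """Batch mode: accumulate all pattern gradients, apply one change per epoch."""
    total = np.zeros_like(w)
    for x, d in zip(X, D):
        total += grad(w, x, d)
    return w - eta * total
```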

Page 9: Setting Artificial Neural Networks parameters

FREQUENCY OF WEIGHT UPDATES

Advantages / Disadvantages
• Batch mode training is not possible for on-line training.
• For large applications with long training times, parallel processing may reduce the time needed for batch mode training.
• Per-pattern training is more expensive, since the weights change more often.
• Per-pattern training is suitable for small networks and small data sets.

Page 10: Setting Artificial Neural Networks parameters

III. LEARNING RATE FOR PERCEPTRON TRAINING ALGORITHM

Too small η – very slow learning.
Too large η – the output may saturate in one direction.

η = 0 – no weight change
η = 1 – common choice

Page 11: Setting Artificial Neural Networks parameters

PROBLEM WITH η = 1

If η = 1, then ∆w = ±x, so the new output is (w + ∆w)ᵀx = wᵀx ± xᵀx.
If |wᵀx| > |xᵀx|, the output will always stay positive and grow in one direction only.
What is needed is |wᵀx| < |∆wᵀx|. With ∆w = ±ηx this requires η|xᵀx| > |wᵀx|, i.e.
η > |wᵀx| / |xᵀx|.
η is normally between 0 and 1.
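For context, a minimal perceptron-style weight update with a learning rate η (a generic sketch, not the exact algorithm from the slides):

```python
import numpy as np

def perceptron_update(w, x, target, eta=0.5):
    """Move w by eta * x toward the desired sign.

    target is +1 or -1; no change is made if the sample is already
    classified correctly.
    """
    y = np.sign(w @ x)
    if y != target:
        w = w + eta * target * x
    return w
```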

Page 12: Setting Artificial Neural Networks parameters

III. LEARNING RATE FOR BACK PROPAGATION ALGORITHM

• Use a large η in early iterations and steadily decrease it as the network converges.
• Increase η at every iteration that improves performance by a significant amount, and vice versa (a sketch of this rule follows).
• Steadily double η until the error value worsens.
• If the second derivative of E, ∇²E, is constant and low, η can be large.
• If the second derivative of E, ∇²E, is large, η should be small.
• The second-derivative rules require more computation.
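A minimal sketch of the "increase on improvement, decrease otherwise" heuristic; the scaling factors 1.1 and 0.5 are my own illustrative choices:

```python
def adapt_learning_rate(eta, prev_error, new_error, up=1.1, down=0.5):
    """Raise eta when the error improved, lower it when it worsened."""
    if new_error < prev_error:
        return eta * up    # performance improved: be more aggressive
    return eta * down      # performance worsened: back off
```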

Page 13: Setting Artificial Neural Networks parameters

MOMENTUM

• Training is done to reduce the error.
• Training may stop at a local minimum instead of the global minimum.

Page 14: Setting Artificial Neural Networks parameters

MOMENTUM

This can be prevented if the weight changes depend on the average gradient of the error rather than on the gradient at a single point.

Averaging ∂E/∂w over a small neighbourhood leads the network in the general direction of decreasing MSE without getting stuck at local minima.

However, this may become complex.

Page 15: Setting Artificial Neural Networks parameters

MOMENTUM

Shortcut method: the weight change at the i-th iteration of the back propagation algorithm also depends on the immediately preceding weight change.

This has an averaging effect and diminishes drastic fluctuations in the weight changes over consecutive iterations.

It is achieved by adding a momentum term to the weight update rule.

Page 16: Setting Artificial Neural Networks parameters

MOMENTUM

∆w_kj(t+1) = η δ_k x_j + α ∆w_kj(t)

∆w_kj(t) is the weight change required at time t. α is a constant, with α ≤ 1.

Disadvantage: the past training trend can strongly bias the current training.
α depends on the application.
α = 0 – no effect of the past value.
α = 1 – no effect of the current value.
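A minimal sketch of the momentum-augmented update for one weight layer (NumPy; the variable names and default values of η and α are chosen for the example):

```python
import numpy as np

def momentum_update(w, delta_w_prev, delta_k, x, eta=0.1, alpha=0.9):
    """Weight update with momentum: dw(t+1) = eta * delta_k * x + alpha * dw(t)."""
    delta_w = eta * np.outer(delta_k, x) + alpha * delta_w_prev
    return w + delta_w, delta_w   # return new weights and the change for next step
```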

Page 17: Setting Artificial Neural Networks parameters

What constitutes a “good” training set?

• Samples must represent the general population.
• Samples must contain members of each class.
• Samples in each class must contain a wide range of variations or noise effects.

Page 18: Setting Artificial Neural Networks parameters

GENERALIZABILITY

Overtraining occurs more often in large networks with few inputs. The inputs are repeated during training until the error reduces, which leads to the network memorising the input samples. Such a trained network may behave correctly on the training data but fail on any unknown data. This is also called over training.

Page 19: Setting Artificial Neural Networks parameters

GENERALIZABILITY – SOLUTION

The set of all known samples is broken into two orthogonal (independent) sets:

• Training set – a group of samples used to train the neural network.
• Testing set – a group of samples used to test the performance of the neural network; used to estimate the error rate.

Training continues as long as the error on the test data gradually reduces, and terminates as soon as the error on the test data increases (see the sketch below).
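A minimal early-stopping sketch built around this rule; train_one_epoch and error are placeholders standing in for whatever network and error measure are being used:

```python
def train_with_early_stopping(net, train_set, test_set,
                              train_one_epoch, error, max_epochs=1000):
    """Stop training as soon as the test-set error starts to increase."""
    best_err = float("inf")
    for epoch in range(max_epochs):
        train_one_epoch(net, train_set)     # weights change on training data only
        test_err = error(net, test_set)     # weights do NOT change on test data
        if test_err > best_err:
            break                           # error on test data increased: stop
        best_err = test_err
    return net
```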

Page 20: Setting Artificial Neural Networks parameters

GENERALIZABILITY

[Figure: error E versus training time; the error on the training data keeps decreasing, while the error on the test data starts to increase at some point in time, which is when training should stop]

• Performance on the test data is monitored over several iterations, not just one iteration.

Page 21: Setting Artificial Neural Networks parameters

GENERALIZABILITY

• Weights do NOT change on the test data.
• Overtraining can be avoided by using a small number of parameters (hidden nodes and weights).
• If the size of the training set is small, multiple sets can be created by adding small, randomly generated noise or displacements:
  X = { x1, x2, x3, …, xn }  then  X' = { x1+β1, x2+β2, x3+β3, …, xn+βn }
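A minimal sketch of this noise-based augmentation; the noise scale 0.01 and the number of copies are assumed values for the example:

```python
import numpy as np

def augment_with_noise(X, copies=5, scale=0.01, seed=0):
    """Create extra training samples x + beta with small random noise beta."""
    rng = np.random.default_rng(seed)
    noisy = [X + rng.normal(0.0, scale, size=X.shape) for _ in range(copies)]
    return np.vstack([X] + noisy)
```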

Page 22: Setting Artificial Neural Networks parameters

NO. OF HIDDEN LAYERS AND NODES

The number is mostly obtained by trial and error.
• Too few nodes – the network may not be efficient.
• Too many nodes – computation is tedious and expensive, and the network may memorise the inputs and perform poorly on test data.

A network is called well trained if it performs well on data not used for training. Hence the network should be capable of generalising from the inputs rather than memorising them.

Page 23: Setting Artificial Neural Networks parameters

NO. OF HIDDEN LAYERS AND NODES

Methods: adaptive algorithm –
◦ Choose a large number of nodes and train.
◦ Gradually discard nodes one by one during training.
◦ Continue until performance drops below an acceptable level.
◦ The network must be retrained at each change in the number of nodes.
◦ Or, vice versa: choose a small number of nodes and add nodes until performance is satisfactory.

Page 24: Setting Artificial Neural Networks parameters

Let’s see how NN size advances:

Linear classification: a single decision line L1 separates the plane into the regions ax1 + bx2 + c > 0 and ax1 + bx2 + c < 0.

[Figure: the decision line L1 dividing the input plane into two half-planes]

Page 25: Setting Artificial Neural Networks parameters

Let’s see how NN size advances:

Two class problem – nonlinear.

[Figure: two decision lines L1 and L2, combined by a second-layer node L11]

Page 26: Setting Artificial Neural Networks parameters

Let’s see how NN size advances:

Two class problem – nonlinear.

[Figure: four decision lines L1–L4, a combining node L11, and the enclosed region P]

Page 27: Setting Artificial Neural Networks parameters

Let’s see how NN size advances:

Two class problem – nonlinear.

[Figure: four regions P1–P4 combined at a node L22 to form the region P]

Page 28: Setting Artificial Neural Networks parameters

[Figure: network diagram with four L11-type nodes, one per region P1–P4, feeding a node L22]

Page 29: Setting Artificial Neural Networks parameters

NUMBER OF INPUT SAMPLES

As a rule of thumb: use 5 to 10 times as many samples as the number of weights to be trained.

Baum and Haussler suggest:
◦ P > |W| / (1 - a)
◦ P is the number of samples,
◦ |W| is the number of weights to be trained,
◦ a is the expected accuracy on the test set.
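A small worked example of this bound; the network size and accuracy figures are made up for illustration:

```python
def min_samples(num_weights, accuracy):
    """Baum-Haussler style lower bound: P > |W| / (1 - a)."""
    return num_weights / (1.0 - accuracy)

# e.g. a network with 60 trainable weights and a target accuracy of 0.9
print(min_samples(60, 0.9))   # 600.0 -> need more than 600 samples
```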

Page 30: Setting Artificial Neural Networks parameters

Non-numeric inputs

Non-numeric inputs such as colours have no inherent order, so they cannot be placed on an axis, e.g. red–blue–green–yellow. The colour would become position sensitive, which results in erroneous training.

Hence each colour is assigned a binary vector with one component per colour, e.g.

red – 1 0 0 0    blue – 0 1 0 0    green – 0 0 1 0    yellow – 0 0 0 1

But the input dimension increases drastically.
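A minimal one-hot encoding sketch for this colour example:

```python
import numpy as np

COLOURS = ["red", "blue", "green", "yellow"]

def one_hot(colour):
    """Binary vector with a single 1 in the position of the given colour."""
    vec = np.zeros(len(COLOURS))
    vec[COLOURS.index(colour)] = 1.0
    return vec

print(one_hot("green"))   # [0. 0. 1. 0.]
```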

Page 31: Setting Artificial Neural Networks parameters

Termination criteria: “Halt when the goal is achieved.”

Perceptron training of linearly separable patterns –
◦ Goal: correct classification of all samples.
◦ Termination is assured if η is sufficiently small.
◦ The program may run indefinitely if η is not appropriate.
◦ Different choices of η may yield different classifications.

Back propagation algorithm using the delta rule –
◦ Termination can never be achieved with the above criterion, as the output can never be exactly +1 or -1.
◦ We therefore fix Emin, the minimum acceptable error, and terminate as soon as the error goes below Emin (see the sketch below).
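A minimal sketch of an Emin-based stopping rule; train_one_epoch and error are placeholders for the actual back propagation step and error measure:

```python
def train_until_emin(net, data, train_one_epoch, error,
                     e_min=1e-3, max_epochs=10000):
    """Run back propagation epochs until the error falls below e_min."""
    for epoch in range(max_epochs):
        train_one_epoch(net, data)
        if error(net, data) < e_min:
            break            # goal reached: terminate training
    return net
```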

Page 32: Setting Artificial Neural Networks parameters

Termination criteria

Perceptron training of linearly non-separable patterns –
◦ The above criterion would allow the procedure to run indefinitely.
◦ Instead, compare the amount of progress made in the recent past.
◦ If the number of misclassifications has not changed over a large number of steps, the samples are probably not linearly separable.
◦ A limit can be fixed on the minimum percentage of correct classifications required for termination.