
Integrating Neural Network and Genetic Algorithm to Solve Function Approximation Combined with Optimization Problem

Term presentation for CSC7333 Machine Learning
Xiaoxi Xu

May 3, 2006

Outline

Problem description & analysis
Neural Network
Genetic Algorithm
Implementations & experiments
Results
Remarks
Conclusion

Problem Description

We have plentiful data gathered over time.

We are not aware of the underlying relationship between the data inputs (some of which are human-controllable) and the output.

We expect to minimize or maximize the output in the future.

We hope to know what kind of input would generate a minimum or maximum output, so that we could adjust the input to achieve our goal.

Problem Analysis

The characteristics of this problem are:

a. Unknown exact nature of the relationship between input and output, likely non-linear
b. Inputs are likely N-dimensional (N > 10)

In addition....
d. The global optimum is expected to be obtained

Problem Breakdown

1. Function Approximation
2. Optimization Problem

Solution for Function Approximation

This solution should meet the following requirements:

a. Have a parallel structure to handle N-dimensional variables
b. Be able to model the nonlinear relation between variables and their responses
c. Preferably be fault-tolerant (noisy data can appear when the data set is large)

Solution for Function Approximation (Cont'd)

A Neural Network could be one. Why? (Any function can be approximated to arbitrary accuracy by a three-layer network with a linear transfer function in the output layer and sigmoid functions in the hidden layer.)

We want to train a NN as such (see the sketch below):
Topology: multi-layer network
Connection type: feed-forward (from a mathematical point of view, a feed-forward neural network is a function: it takes an input and produces an output)
Transfer functions: log-sigmoid, linear
Training algorithm: back-propagation
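As a rough illustration of the network just described (not the author's original code), the following NumPy sketch computes the forward pass of a one-hidden-layer feed-forward network with a log-sigmoid hidden layer and a linear output layer; the weight and bias names (W_hi, b_hi, W_oh, b_oh) are illustrative placeholders.

import numpy as np

def logsig(z):
    # log-sigmoid transfer function used in the hidden layer
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_hi, b_hi, W_oh, b_oh):
    """Feed-forward pass: log-sigmoid hidden layer followed by a linear output layer."""
    h = logsig(W_hi @ x + b_hi)   # hidden-layer activations
    return W_oh @ h + b_oh        # linear (purelin-style) output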

Problem Breakdown

1. Function Approximation
2. Optimization Problem

Solution for Optimization Problem

To solve it mathematically?

In mathematics, classical methods can only guarantee LOCAL optima unless the function has good properties such as convexity or unimodality. These line-search methods include conjugate gradient descent, quasi-Newton methods, and so on.

This solution should meet the following requirements:

a. Be able to recognize the expression of the objective function
b. Be able to solve the function
c. Should have a better chance of finding a global optimum

Solution for Optimization Problem (Cont'd)

A Genetic Algorithm could be one. Why? (GAs have mostly been applied to optimization problems.)

We can use a GA as such:
a. Representation (any real number can be represented by a string of digits in base 10; see the encoding sketch below)
b. Fitness function (the neural network)
c. Genetic operators (crossover, selection, mutation)
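The slides do not spell out the exact base-10 encoding, so the following is only one plausible scheme, stated here as an assumption: the first gene carries the sign, the remaining genes are decimal digits with an implied decimal point after the first digit, and the decoded value is clipped to the search range.

import random

CHROM_LEN = 6   # chromosome length used in the 2D experiment

def random_chromosome():
    # a chromosome is simply a string of base-10 digits
    return [random.randint(0, 9) for _ in range(CHROM_LEN)]

def decode(chrom, lo=-4.0, hi=4.0):
    """Map a digit string to a real number (assumed scheme: sign gene, then digits)."""
    sign = 1.0 if chrom[0] % 2 == 0 else -1.0          # even sign gene = positive
    digits = chrom[1:]
    value = digits[0] + sum(d * 10.0 ** -(i + 1) for i, d in enumerate(digits[1:]))
    return max(lo, min(hi, sign * value))               # clip to the search range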

Implementation & Experiment of NN for 2D Function Approximation

Initialization with randomly selected weights
Multi-layer network with one hidden layer of 20 hidden nodes
Transfer functions: sigmoid (hidden layer), purelin (output layer)
Back-propagation algorithm
Learning rate: 0.05
Momentum: 0.1
Stop criteria: 1. MSE below 0.01%  2. Exceeds 100 epochs (training iterations)
Test function: tan(sin(x)) - sin(tan(x))
Training data: [-4, 4], sampled at -4, -3.6, -3.2, -2.8, ... (step 0.4)

A minimal training sketch follows.
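The original experiments were presumably run with a NN toolbox; the following self-contained NumPy sketch only mirrors the stated settings (20 hidden nodes, sigmoid/purelin layers, learning rate 0.05, momentum 0.1, stop at MSE below 0.01% or 100 epochs, training data on [-4, 4] in steps of 0.4). It is an assumption-laden re-implementation for illustration, not the author's code.

import numpy as np

def logsig(z):
    return 1.0 / (1.0 + np.exp(-z))

# training data: x in [-4, 4] step 0.4, target tan(sin(x)) - sin(tan(x))
x = np.arange(-4.0, 4.0 + 1e-9, 0.4).reshape(1, -1)
t = np.tan(np.sin(x)) - np.sin(np.tan(x))

rng = np.random.default_rng(0)
H = 20                                         # hidden nodes
W1, b1 = rng.normal(0, 0.5, (H, 1)), np.zeros((H, 1))
W2, b2 = rng.normal(0, 0.5, (1, H)), np.zeros((1, 1))
vW1, vb1 = np.zeros_like(W1), np.zeros_like(b1)
vW2, vb2 = np.zeros_like(W2), np.zeros_like(b2)
lr, mom = 0.05, 0.1

for epoch in range(100):                       # stop criterion 2: 100 epochs
    h = logsig(W1 @ x + b1)                    # sigmoid hidden layer
    y = W2 @ h + b2                            # purelin output layer
    err = y - t
    mse = np.mean(err ** 2)
    if mse < 1e-4:                             # stop criterion 1: MSE below 0.01%
        break
    # back-propagation (batch gradients)
    n = x.shape[1]
    dW2 = err @ h.T / n
    db2 = err.mean(axis=1, keepdims=True)
    dh = (W2.T @ err) * h * (1 - h)
    dW1 = dh @ x.T / n
    db1 = dh.mean(axis=1, keepdims=True)
    # gradient descent with momentum
    vW2 = mom * vW2 - lr * dW2; W2 += vW2
    vb2 = mom * vb2 - lr * db2; b2 += vb2
    vW1 = mom * vW1 - lr * dW1; W1 += vW1
    vb1 = mom * vb1 - lr * db1; b1 += vb1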

Implementation & Experiment of Genetic Algorithm for 2D Function Optimization

Representation: a string of digits in base 10 to represent a real number and its sign
Random initialization; range: [-4, 4]
Population size: 25
Chromosome length: 6
One-point crossover pr: 0.7
Mutation pr: 0.3
Roulette wheel selection: preference for best-fit individuals
Fitness function: the neural network (represented by its inputs, weights and biases as follows): weight_oh*sigmoid(weight_hi*input + bias_hi*1) + bias_oh*1
Elitism: the best-fit individual goes to the next generation
Stop criteria: 1. The value of the fitness function changes by less than 0.01 over 10 consecutive generations  2. Maximum of 30 generations

A sketch of this GA loop follows.
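Below is a hedged sketch of a GA loop matching the stated settings (population 25, chromosome length 6, one-point crossover pr 0.7, mutation pr 0.3, roulette-wheel selection, elitism, and the two stop criteria). The fitness function here is a stand-in toy objective so the sketch runs on its own; in the actual experiment it would be the trained neural network. The decode function repeats the assumed base-10 scheme shown earlier.

import random

POP, CHROM_LEN, P_CROSS, P_MUT = 25, 6, 0.7, 0.3

def decode(chrom):
    # assumed base-10 decoding (sign gene + digits), clipped to [-4, 4]
    sign = 1.0 if chrom[0] % 2 == 0 else -1.0
    value = chrom[1] + sum(d * 10.0 ** -(i + 1) for i, d in enumerate(chrom[2:]))
    return max(-4.0, min(4.0, sign * value))

def fitness(chrom):
    # stand-in for the trained NN output (toy objective to be maximized)
    xv = decode(chrom)
    return -(xv - 1.0) ** 2

def roulette(pop, fits):
    # roulette-wheel selection: probability proportional to (shifted) fitness
    lo = min(fits)
    weights = [f - lo + 1e-9 for f in fits]
    return random.choices(pop, weights=weights, k=1)[0]

pop = [[random.randint(0, 9) for _ in range(CHROM_LEN)] for _ in range(POP)]
best, best_f, stall = None, float("-inf"), 0
for gen in range(30):                              # stop criterion 2: 30 generations
    fits = [fitness(c) for c in pop]
    gen_best = max(fits)
    stall = stall + 1 if abs(gen_best - best_f) < 0.01 else 0
    if gen_best > best_f:
        best_f, best = gen_best, pop[fits.index(gen_best)][:]
    if stall >= 10:                                # stop criterion 1: little change for 10 generations
        break
    children = [best[:]]                           # elitism: best individual carried over
    while len(children) < POP:
        p1, p2 = roulette(pop, fits), roulette(pop, fits)
        if random.random() < P_CROSS:              # one-point crossover
            cut = random.randint(1, CHROM_LEN - 1)
            p1 = p1[:cut] + p2[cut:]
        child = [random.randint(0, 9) if random.random() < P_MUT else g for g in p1]
        children.append(child)
    pop = children

print("best input:", decode(best), "best fitness:", best_f)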

Experiment Result - 2D

Real function: red solid line
Approximation by NN: blue dashed line
Optimum found by GA on the approximated function: magenta star

Implementation of NN for 3D Function Approximation

One hidden layer of 30 hidden nodes
Learning rate: 0.06
Momentum: 0.5
Stop criteria: 1. MSE below 0.01%  2. Exceeds 250 epochs (training iterations)
Test function: 1.85*sin(x)*exp(-3*(y-1.5)^2) + 0.7*x*exp(-4*(x-1.2)^2) - 1.4*cos(x+y)*exp(-5*(y+1.3)^2) - 1.9*exp(-8*(x+0.5)^2)
Training data: [-1, 3] (step between samples is 0.17)

A data-generation sketch follows.
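A small sketch, under the assumption that the training set is a regular grid, showing how the 3D training data above could be generated from the stated test function and step size.

import numpy as np

def target(x, y):
    # stated 3D test function
    return (1.85 * np.sin(x) * np.exp(-3 * (y - 1.5) ** 2)
            + 0.7 * x * np.exp(-4 * (x - 1.2) ** 2)
            - 1.4 * np.cos(x + y) * np.exp(-5 * (y + 1.3) ** 2)
            - 1.9 * np.exp(-8 * (x + 0.5) ** 2))

grid = np.arange(-1.0, 3.0 + 1e-9, 0.17)        # sample points in each dimension
X, Y = np.meshgrid(grid, grid)
inputs = np.stack([X.ravel(), Y.ravel()])        # 2 x N input matrix for the NN
targets = target(X, Y).ravel()                   # corresponding target outputs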

Implementation & Experiment of Genetic Algorithm for 3D Function Optimization

Random initialization; range: [-1, 3]
Population size: 40
Chromosome length: 12
One-point crossover pr: 0.8
Mutation pr: 0.6
Stop criteria: 1. The fitness function changes by less than 0.01 over 10 consecutive generations  2. Maximum of 100 generations

Experiment Result - Mesh

Experiment Result - Contour Map

Optimum: black circle

One More Experiment

sin(0.007*x^5) / cos(exp(0.0009*y^7))

Remarks

How to adjust some parameters for the NN?

It’s really application-dependent.

With regard to my experiments:

Learning rate:
  too small: no apparent decrease of the MSE over a long while
  too large: the MSE jumped between decreasing and increasing

Momentum:
  compared to a smaller value, a larger value could model better
  but keep it within a proper range

Hidden nodes:
  keep the number within a proper range, otherwise the network will overfit
  more nodes give better performance,
  but our computer went down before we could see how good the performance would get

Epochs:
  more epochs give better performance, but avoid overfitting
  trade-off between accuracy and training time

Remarks (Cont'd)

How to adjust some parameters for the GA?

Population size: bigger is better, but keep it within a proper range, otherwise the model will overfit
Crossover probability: typically [0.6, 0.9]; that range worked in my experiments
Mutation probability: typically [1/pop_size, 1/chromosome_length]; a larger value was used in my experiments
Number of generations: more generations give better performance, but avoid overfitting; trade-off between accuracy and time

Random initialization influences performance success

We used randomly selected weights when training the NN and randomly selected individuals for the first generation of the GA. We found that the random initialization values sometimes determine whether a run succeeds.

Room for Improvements

Randomly sampled data with added noise would be used to train the neural network

More complex examples would be tested to evaluate the performance of the NN

Building up more knowledge on adjusting NN & GA parameters

The error surface would be shown

The time complexity would be analyzed

Conclusion

Integrating GA & NN by using the NN as the fitness function of the GA can find a global optimum quite close to the one found by a GA using the real function

The GA performs well in searching for the global optimum regardless of what the fitness function is

In practice, multi-layer NNs with one hidden layer have overall good performance in function approximation, but sometimes they still have difficulty

The random initialization values can sometimes determine the success of the performance

Thanks!

Questions & Comments ??