
    A two-pass hybrid training algorithm for RBF networks

Ali Ekber ÖZDEMİR 1, İlyas EMİNOĞLU 2

1 Ünye Meslek Yüksek Okulu, Ordu University, TR

2 Electrical & Electronic Engineering Dept., Ondokuz Mayıs University, TR

[email protected], [email protected]

    Abstract

This paper presents a systematic construction of a linearly-weighted Gaussian radial basis function (RBF) neural network. The proposed method is a computationally two-stage hybrid training algorithm. The first stage of the hybrid algorithm is a pre-processing unit which generates a coarsely-tuned RBF network. The second stage is a fine-tuning phase: the coarsely-tuned RBF network is optimized by a two-pass training algorithm. In the forward pass, the output weights of the RBF network are calculated by the Levenberg-Marquardt (LM) algorithm while the rest of the parameters remain fixed. Similarly, in the backward pass, the free parameters of the basis functions (the center and width of each node) are adjusted by the gradient descent (GD) algorithm while the output weights remain fixed. The effectiveness of the proposed method for an RBF network is demonstrated with simulations.

    1.  Introduction

The simple structure of the RBF network enables learning in stages and reduces the training time, which has led to the application of such networks to many practical problems. The learning strategies used in the literature for the design of RBF networks differ from each other mainly in the determination of the centers. These can be categorized into the following groups [1]:

1. Fixed Centers Assigned Randomly Among Input Samples: In this method, which is the simplest one, the centers are chosen randomly from the set of input training samples.

2. Orthogonalization of Regressors: The most commonly used algorithm is orthogonal least squares (OLS) [2], which selects a suitable set of centers (regressors) from among the input training samples, but this set might not be optimal, as demonstrated in [3].

3. Supervised Selection of Centers: In this method, the centers, together with all other parameters of the RBF network (linear weights, variances), are updated using a back-propagation type of learning.

4. Input Clustering (IC): The locations of the centers are determined by a clustering algorithm applied to the input training sample vectors.

5. Input-Output Clustering (IOC): The IC method in (4) is based on the distribution of the training inputs alone. When the variation of the output within a cluster is high, the centers are instead selected based on both input and output data (joint input-output data), as in [1].

6. Evolutionary Algorithms: All RBF parameters are optimized by genetic algorithms according to a defined (single- or multi-objective) cost function, but this approach can be computationally expensive [4].

Several heuristic hybrid learning methods, which apply a clustering algorithm for locating the centers and subsequently a linear least squares method for the linear weights, have previously been suggested with considerable success for many applications; a few such hybrid learning methods are [5], [9], [10], [17] and [19]. A sketch of this classic scheme is given below.
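As an illustration only (not the authors' method, which uses OLS plus GK clustering followed by the two-pass fine-tuning described later), the following minimal sketch fits an RBF network in the classic hybrid way: plain k-means picks the centers, then a single linear least-squares solve gives the output weights. All names and the shared width are our assumptions.

```python
import numpy as np

def fit_rbf_hybrid(X, d, M, width, n_iter=20, seed=0):
    """Classic hybrid RBF fit: k-means locates the M centers,
    then linear least squares solves for the output weights.
    Illustrative sketch only, with a single shared width."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), M, replace=False)]  # init from samples
    for _ in range(n_iter):                            # plain k-means
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(M):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    # Gaussian hidden outputs: phi(x) = exp(-||x - c||^2 / (2 width^2))
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    Phi = np.exp(-dist**2 / (2.0 * width**2))
    w, *_ = np.linalg.lstsq(Phi, d, rcond=None)        # linear weights
    return centers, w
```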

The general framework of the proposed two-stage hybrid structure is shown in Fig. 1.

Figure 1: General framework of the proposed two-stage hybrid training algorithm.

Each stage in Fig. 1 has a unique operational target and contributes to model construction in a sequential manner. These two stages - i) the pre-processing unit and ii) the two-pass hybrid training unit - are summarized below.

The first stage (the pre-processing unit) is the coarse-tuning stage. It determines a coarsely-tuned RBF network which has the final structure (in terms of node numbers) and a roughly initialized set of free parameters. The pre-processing unit behaves like a structural and parametric initialization unit. The number and locations of the M initial centers of the RBF network are determined by using an orthogonal least squares (OLS) algorithm. Afterward, a coarse tuning of all


free parameters (centers, widths and weights) is achieved by using the Gustafson-Kessel (GK) clustering procedure.

The partition validation algorithm embedded into the GK clustering algorithm may further reduce the number of centers, since M (found by the OLS algorithm) may not be optimal. The resulting RBF network is passed into the next stage for further processing and tuning. In the literature, the use of this kind of pre-processing unit to construct an initial model is not uncommon. A pre-processing unit was first proposed in [5] by Linkens & Chen to construct a normalized fuzzy system for model construction. A modified counter-propagation network (CPN) is exploited as a preprocessor to extract a number of clusters, which can be viewed as an initial fuzzy model, from the raw data [5],[6]. The fine-tuning step is achieved by using a back-propagation type of learning.

The pre-processing unit (OLS+GK) adopted to construct the initial RBF model in this paper is one of the four methods proposed in [7] and [8].

The second stage (the two-pass hybrid training unit) is the fine-tuning stage, which this paper presents in detail. The coarsely-tuned RBF network is optimized by using a two-pass training algorithm. In the forward pass of the computation, the output weights of the RBF network are adjusted by the Levenberg-Marquardt (LM) algorithm while the rest of the parameters remain fixed. Similarly, in the backward pass of the computation, the free parameters of the basis functions (the center and width of each node) are adjusted by the gradient descent (GD) algorithm while the output weights remain fixed. The final form of the RBF network is efficiently constructed through this computationally two-pass hybrid training algorithm.

    2.  Two-pass hybrid training unit

As can be seen from Table 1, in the forward pass the output weights of the RBF network are adjusted by the Levenberg-Marquardt (LM) algorithm while the rest of the parameters remain fixed. Initially, the output of the hidden units (the node outputs, or φ) is treated as the input vector and $e_i = (d_i - y_i)$ is treated as the error vector. The weights in the output layer are then updated by the LM algorithm. In the backward pass, the free parameters of the basis functions (the center and width of each node) are adjusted by the gradient descent (GD) algorithm while the output weights (updated in the last forward pass) remain fixed. The final form of the RBF network is efficiently constructed through this computationally two-pass algorithm, which is more efficient than the GD-only method presented in [7] and [8]: it requires a smaller total number of iterations than the GD-only algorithm employed there. A sketch of one training epoch is given after Table 1.

    Table 1: Two-pass hybrid training procedure for linearly-weighted RBF networks. 
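To make the procedure concrete, the sketch below (our illustration, with our own names) runs one epoch of the two passes. It simplifies the paper's linearly-weighted form to a single scalar output weight per node; for a purely linear output layer the LM Jacobian is simply the negative of the node-output matrix.

```python
import numpy as np

def phi(X, C, S):
    """Gaussian node outputs: phi[i, j] = exp(-||x_i - c_j||^2 / (2 s_j^2))."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * S**2))

def two_pass_epoch(X, d, C, S, w, lam, mu_c, mu_s):
    """One assumed reading of Table 1: the forward pass updates the
    output weights by an LM step with the basis parameters frozen; the
    backward pass updates centers and widths by one GD step with the
    weights frozen."""
    # --- forward pass (LM on weights; J = -Phi since y = Phi w) ---
    P = phi(X, C, S)
    e = d - P @ w
    H = P.T @ P + lam * np.eye(len(w))      # Hessian approx J^T J + lam*I
    w = w + np.linalg.solve(H, P.T @ e)     # LM update of the weights
    # --- backward pass (GD on centers C and widths S, weights fixed) ---
    P = phi(X, C, S)
    e = d - P @ w                           # refreshed error, e_i = d_i - y_i
    G = e[:, None] * P * w[None, :]         # common factor e_i * w_j * phi_ij
    diff = X[:, None, :] - C[None, :, :]
    dE_dC = -(G[:, :, None] * diff / S[None, :, None] ** 2).sum(axis=0)
    dE_dS = -(G * (diff**2).sum(axis=2) / S[None, :] ** 3).sum(axis=0)
    return C - mu_c * dE_dC, S - mu_s * dE_dS, w
```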

2.1.  Levenberg-Marquardt (LM) Algorithm

A mathematical description of the LM neural network training algorithm has been presented by Hagan and Menhaj [12]. The LM algorithm is an intermediate optimization algorithm between the Gauss-Newton (GN) method and the gradient descent (GD) algorithm, and it addresses the limitations of each of those techniques. By combining the positive attributes of the GN and GD algorithms, the LM algorithm constructs a hybrid optimization technique which is suitable for many real-world applications. A detailed treatment of the LM method can be found in [12], [13], [14] and [15].

$e = [e_1 \;\; e_2 \;\; \cdots \;\; e_L]$ : error vector

$W = [a_{10} \;\; a_{11} \;\; \cdots \;\; a_{1D} \;\; \cdots \;\; a_{M0} \;\; a_{M1} \;\; \cdots \;\; a_{MD}]$ : parameter vector

The Jacobian matrix can be computed as follows:

$$J = \begin{bmatrix}
\frac{\partial e_1}{\partial a_{10}} & \frac{\partial e_1}{\partial a_{11}} & \cdots & \frac{\partial e_1}{\partial a_{1D}} & \cdots & \frac{\partial e_1}{\partial a_{M0}} & \cdots & \frac{\partial e_1}{\partial a_{MD}} \\
\vdots & \vdots & & \vdots & & \vdots & & \vdots \\
\frac{\partial e_L}{\partial a_{10}} & \frac{\partial e_L}{\partial a_{11}} & \cdots & \frac{\partial e_L}{\partial a_{1D}} & \cdots & \frac{\partial e_L}{\partial a_{M0}} & \cdots & \frac{\partial e_L}{\partial a_{MD}}
\end{bmatrix}$$

$H = J^T J + \lambda_{LM} I$ : Hessian matrix ($\lambda_{LM}$: Marquardt parameter; $I$: unit matrix)

$g = J^T e$ : gradient vector

$W(t+1) = W(t) - H^{-1} g$ : updating law of the free parameters

Thus, $\lambda_{LM}$ is decreased after each successful step (a reduction in the cost function) and is increased only when a tentative step would increase the cost function. In this way, the cost function can always be reduced at each iteration of the algorithm.
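The following minimal sketch shows this acceptance rule: a tentative step $W - H^{-1}g$ is kept (and $\lambda_{LM}$ decreased) only if it lowers the cost, otherwise $\lambda_{LM}$ is increased. The callables `residual` and `jacobian` (returning $e(W)$ and $J(W)$) and all other names are our assumptions for illustration.

```python
import numpy as np

def lm_iterate(residual, jacobian, W, lam=0.01, n_iter=50,
               lam_down=0.1, lam_up=10.0):
    """Sketch of the LM loop with the Marquardt-parameter adaptation
    described above; `residual(W)` returns e, `jacobian(W)` returns
    J = de/dW."""
    cost = 0.5 * np.sum(residual(W) ** 2)
    for _ in range(n_iter):
        e, J = residual(W), jacobian(W)
        H = J.T @ J + lam * np.eye(len(W))      # H = J^T J + lam_LM * I
        g = J.T @ e                             # g = J^T e
        W_try = W - np.linalg.solve(H, g)       # W(t+1) = W(t) - H^{-1} g
        cost_try = 0.5 * np.sum(residual(W_try) ** 2)
        if cost_try < cost:                     # successful step
            W, cost, lam = W_try, cost_try, lam * lam_down
        else:                                   # rejected: raise lam_LM
            lam = lam * lam_up
    return W
```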

2.2.  Gradient Descent (GD) Algorithm

The GD algorithm utilizes the cost function given in equation (1); a detailed treatment of the GD method can be found in [16]. The desired output of the RBF network is represented by $d_i$, the actual output by $y_i$, and $L$ is the total number of input data. The input-output data set is applied $N_{GD}$ times during training (the number of iterations), and the main goal is to minimize the total cost function given in equation (2).

$$E = \frac{1}{2} \sum_{i=1}^{L} (d_i - y_i)^2 \qquad (1)$$

$$T = \min\!\left( \sum_{i=1}^{N_{GD}} E_i \right) \qquad (2)$$


The free parameters of the RBF network (widths and centers) can be updated with the GD algorithm using equation (3):

$$\phi_{i+1} = \phi_i - \mu_\phi \frac{\partial E}{\partial \phi_i} \qquad (3)$$

In equation (3), $\phi_{i+1}$ is the current (updated) value of a free parameter, $\phi_i$ is its previous value, and $\mu_\phi$ denotes the learning rate for this parameter. Using equation (3), all free parameters of the basis functions can be updated in such a way that the total cost function is iteratively minimized; a one-line reading is sketched below.
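In code, equation (3) is a single line per parameter; the function name and arguments below are ours, for illustration only.

```python
# One-line reading of equation (3): each free parameter phi (a center
# coordinate or a width) moves against its own gradient of the cost E.
def gd_update(phi, dE_dphi, mu_phi):
    return phi - mu_phi * dE_dphi   # phi_{i+1} = phi_i - mu_phi * dE/dphi_i
```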

3.  Experimental Results

Example 1: Box and Jenkins's gas furnace is a famous example of system identification [18]. The data consist of 296 I/O measurements of a gas furnace system: the input measurement u(t) is the gas flow rate into the furnace, and the output measurement y(t) is the CO2 concentration in the outlet gas. For modeling, u(t-6), u(t-1), y(t-6) and y(t-1) are chosen as the input variables of the RBF network, and y(t) is chosen as the output; a sketch of this regressor construction follows below. The outcomes of the simulation are presented graphically in Fig. 2.
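For concreteness, the lagged input matrix described above can be built as follows (variable and function names are ours; only the lag structure comes from the paper).

```python
import numpy as np

def gas_furnace_regressors(u, y):
    """Build the input matrix [u(t-6), u(t-1), y(t-6), y(t-1)] with
    target y(t) from the 296-sample Box-Jenkins series (u = gas flow
    rate, y = CO2 concentration). Rows start at t = 6 so all lags exist."""
    t = np.arange(6, len(y))
    X = np.column_stack([u[t - 6], u[t - 1], y[t - 6], y[t - 1]])
    return X, y[t]
```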

Example 2: The last application of the proposed method is the prediction of a complex time series [10], a special function approximation problem that arises in such real-world problems as detecting arrhythmia in heartbeats. The chaotic Mackey-Glass series is generated from the delay differential equation (4) with τ = 17; the first 500 data points (x(t-3), x(t-2), x(t-1), x(t) and x(t+1)) are obtained and normalized to the range [-1, 1]. A generation sketch follows equation (4).

$$\frac{dx(t)}{dt} = \frac{0.2\, x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\, x(t) \qquad (4)$$
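A minimal way to generate the series is a coarse Euler integration of equation (4); the step size, initial value and zero pre-history below are our own assumptions for illustration (a finer integrator such as RK4 with a small step is common in practice).

```python
import numpy as np

def mackey_glass(n_steps, tau=17, dt=1.0, x0=1.2):
    """Euler integration of equation (4),
    dx/dt = 0.2 x(t-tau) / (1 + x(t-tau)^10) - 0.1 x(t), with tau = 17."""
    delay = int(round(tau / dt))              # delay measured in steps
    x = np.zeros(n_steps)
    x[0] = x0
    for t in range(n_steps - 1):
        x_tau = x[t - delay] if t >= delay else 0.0
        x[t + 1] = x[t] + dt * (0.2 * x_tau / (1.0 + x_tau ** 10)
                                - 0.1 * x[t])
    return x

# First 500 points, normalized to [-1, 1] as in the paper:
x = mackey_glass(500 + 4)
x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
```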

After completion of the training, the outcomes are presented graphically in Fig. 3. Comparative results are given in Table 2, whose parameters are defined as follows:

$\varepsilon_{OLS}$ : termination parameter for the OLS algorithm
$\mu_C$, $\mu_\sigma$ : learning rates for the centers and widths, respectively
$\lambda_{LM}$ : Marquardt parameter
$N_{Epoch}$ : total epoch number

    Table 2: The outcomes of two simulated examples. 

4.  Conclusion

The systematic construction of a linearly-weighted Gaussian radial basis function (RBF) neural network with a two-stage hybrid training method has been presented. The first stage of the hybrid algorithm is a pre-processing unit which generates a coarsely-tuned RBF network. The second stage is a fine-tuning phase which employs a computationally two-pass algorithm. The proposed method is compared with the ANFIS structure over two non-linear benchmarks (the Box-Jenkins gas furnace and the Mackey-Glass chaotic time series) in terms of MSE errors. As can be seen from Table 2, the proposed method attains a similar level of MSE error, with fewer rules, compared to the ANFIS structure: ANFIS gives slightly better results, but it employs more rules than the proposed method. When the GD-only algorithm is employed in the second stage, as presented in [7] and [8], the obtained MSE results are, as expected, poor compared to both the proposed method and ANFIS.


5.  References

[1] Uykan Z., Güzeliş C., Çelebi M. E., and Koivo H. N., "Analysis of input-output clustering for determining centers of RBFN", IEEE Transactions on Neural Networks, 11:851-858, 2000.

[2] Chen S., Cowan C. F. N., and Grant P. M., "Orthogonal least squares learning algorithm for radial basis function networks", IEEE Transactions on Neural Networks, 2:302-309, March 1991.

[3] Sherstinsky A. and Picard R. W., "On the efficiency of the orthogonal least squares training method for radial basis function networks", IEEE Transactions on Neural Networks, 7:195-200, 1996.

[4] Buchtala O., Klimek M., and Sick B., "Evolutionary optimization of radial basis function classifiers for data mining applications", IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 35:928-947, 2005.

[5] Chen M.-Y. and Linkens D. A., "A systematic neuro-fuzzy modeling framework with application to material property prediction", IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 31:781-790, 2001.

[6] Linkens D. A. and Chen M.-Y., "Input selection and partition validation for fuzzy modelling using neural network", Fuzzy Sets and Systems, 107:299-308, 1999.

[7] Kayhan G., Özdemir A. E., and Eminoğlu İ., "Designing Pre-Processing Units for RBF Networks, Part 1: Initial Structure Identification", to appear in International Symposium on Innovations in Intelligent Systems and Applications (INISTA'09), Trabzon, Türkiye, 2009.

[8] Özdemir A. E., Kayhan G., and Eminoğlu İ., "Designing Pre-Processing Units for RBF Networks, Part 2: Final Structure Identification and Coarse Tuning of Parameters", to appear in International Symposium on Innovations in Intelligent Systems and Applications (INISTA'09), Trabzon, Türkiye, 2009.

[9] Jang J.-S. R., "ANFIS: Adaptive-network-based fuzzy inference system", IEEE Transactions on Systems, Man, and Cybernetics, 23:665-685, 1993.

[10] Ouyang C.-S., Lee W.-J., and Lee S.-J., "A TSK-type neurofuzzy network approach to system modeling problems", IEEE Transactions on Systems, Man, and Cybernetics, Part B, 35(4):751-767, Aug. 2005.

[11] Lee S.-J. and Ouyang C.-S., "A neuro-fuzzy system modeling with self-constructing rule generation and hybrid SVD-based learning", IEEE Transactions on Fuzzy Systems, 11(3):341-353, June 2003.

[12] Hagan M. T. and Menhaj M. B., "Training feedforward networks with the Marquardt algorithm", IEEE Transactions on Neural Networks, 5(6):989-993, Nov. 1994.

[13] Wilamowski B. M., Chen Y., and Malinowski A., "Efficient algorithm for training neural networks with one hidden layer", in International Joint Conference on Neural Networks (IJCNN '99), 1999.

[14] Kermani B. G., Schiffman S. S., and Nagle H. T., "Performance of the Levenberg-Marquardt neural network training method in electronic nose applications", Sensors and Actuators B, 110:13-22, 2005.

[15] İçer S., Kara S., and Güven A., "Comparison of multilayer perceptron training algorithms for portal venous Doppler signals in the cirrhosis disease", Expert Systems with Applications, 31:406-413, 2006.

[16] Jenison R. L. and Fissell K., "A comparison of the von Mises and Gaussian basis functions for approximating spherical acoustic scatter", IEEE Transactions on Neural Networks, 6(5):1284-1287, Sept. 1995.

[17] Staiano A., Tagliaferri R., and Pedrycz W., "Improving RBF networks performance in regression tasks by means of a supervised fuzzy clustering", Neurocomputing, 69(13-15):1570-1581, Aug. 2006.

[18] Kukolj D. and Levi E., "Identification of complex systems based on neural and Takagi-Sugeno fuzzy model", IEEE Transactions on Systems, Man, and Cybernetics, Part B, 34(1):272-282, Feb. 2004.

[19] Emami M. R., Turksen I. B., and Goldenberg A. A., "An Improved Fuzzy Modeling Algorithm, Part I: Inference Mechanism, Part II: System Identification", NAFIPS, 1996.

Figure 2: The original data, the modelled system and the input-output errors for the gas furnace example.

Figure 3: The original data, the modelled system and the input-output errors for the Mackey-Glass time series example.
