
    A two-pass hybrid training algorithm for RBF networks

Ali Ekber ÖZDEMİR 1, İlyas EMİNOĞLU 2

1 Ünye Meslek Yüksek Okulu, Ordu University, TR

2 Electrical & Electronic Engineering Dept., Ondokuz Mayıs University, TR

[email protected], [email protected]

    Abstract

This paper presents a systematic construction of a linearly-weighted Gaussian radial basis function (RBF) neural network. The proposed method is a computationally two-stage hybrid training algorithm. The first stage of the hybrid algorithm is a pre-processing unit which generates a coarsely-tuned RBF network. The second stage is a fine-tuning phase: the coarsely-tuned RBF network is optimized by a two-pass training algorithm. In the forward pass, the output weights of the RBF network are calculated by the Levenberg-Marquardt (LM) algorithm while the rest of the parameters remain fixed. Similarly, in the backward pass, the free parameters of the basis functions (the center and width of each node) are adjusted by the gradient descent (GD) algorithm while the output weights remain fixed. The effectiveness of the proposed method for an RBF network is demonstrated with simulations.

    1.  Introduction

The simple structure of the RBF network enables learning in stages and reduces the training time, which has led to the application of such networks to many practical problems. The learning strategies used in the literature for the design of RBF networks differ from each other mainly in the determination of the centers. These can be categorized into the following groups [1]:

1. Fixed Centers Assigned Randomly Among Input Samples: In this method, which is the simplest one, the centers are chosen randomly from the set of input training samples.

2. Orthogonalization of Regressors: The most commonly used algorithm is orthogonal least squares (OLS) [2], which selects a suitable set of centers (regressors) from among the input training samples, but this set might not be optimal, as demonstrated in [3].

3. Supervised Selection of Centers: In this method, the centers, together with all other parameters of the RBF network (linear weights, variances), are updated using a back-propagation type of learning.

4. Input Clustering (IC): The locations of the centers are determined by a clustering algorithm applied to the input training sample vectors.

5. Input-Output Clustering (IOC): The IC method in (4) is based on the distribution of the training inputs alone. When the variation of the output within a cluster is high, the centers are instead selected based on both input and output data (joint input-output data), as in [1].

6. Evolutionary Algorithms: All RBF parameters are optimized by genetic algorithms according to a defined (single- or multi-objective) cost function, but this approach can be computationally expensive [4].

Several heuristic hybrid learning methods, which apply a clustering algorithm for locating the centers and subsequently a linear least squares method for the linear weights, have previously been suggested with considerable success for many applications; a few such hybrid learning methods are [5], [9], [10], [17] and [19]. A sketch of this classic scheme is given below.
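As an illustration only (not the authors' method, which uses OLS plus GK clustering followed by the two-pass fine-tuning described later), the following minimal sketch fits an RBF network in the classic hybrid way: plain k-means picks the centers, then a single linear least-squares solve gives the output weights. All names and the shared width are our assumptions.

```python
import numpy as np

def fit_rbf_hybrid(X, d, M, width, n_iter=20, seed=0):
    """Classic hybrid RBF fit: k-means locates the M centers,
    then linear least squares solves for the output weights.
    Illustrative sketch only, with a single shared width."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), M, replace=False)]  # init from samples
    for _ in range(n_iter):                            # plain k-means
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(M):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    # Gaussian hidden outputs: phi(x) = exp(-||x - c||^2 / (2 width^2))
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    Phi = np.exp(-dist**2 / (2.0 * width**2))
    w, *_ = np.linalg.lstsq(Phi, d, rcond=None)        # linear weights
    return centers, w
```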

The general framework of the proposed two-stage hybrid structure is shown in Fig. 1.

Figure 1: General framework of the proposed two-stage hybrid training algorithm.

Each stage in Fig. 1 has a unique operational target and contributes to model construction in a sequential manner. These two stages - i) the pre-processing unit and ii) the two-pass hybrid training unit - are summarized below.

The first stage (the pre-processing unit) is the coarse-tuning stage. It determines a coarsely-tuned RBF network which has the final structure (in terms of node numbers) and a roughly initialized set of free parameters. The pre-processing unit behaves like a structural and parametric initialization unit. The number and locations of the M initial centers of the RBF network are determined by using an orthogonal least squares (OLS) algorithm. Afterward, a coarse tuning of all


free parameters (centers, widths and weights) is achieved by using the Gustafson-Kessel (GK) clustering procedure.

The partition validation algorithm embedded into the GK clustering algorithm may further reduce the number of centers, since M (found by the OLS algorithm) may not be optimal. The resulting RBF network is passed into the next stage for further processing and tuning. In the literature, the use of this kind of pre-processing unit to construct an initial model is not uncommon. A pre-processing unit was first proposed in [5] by Linkens & Chen to construct a normalized fuzzy system for model construction. A modified counter-propagation network (CPN) is exploited as a preprocessor to extract a number of clusters, which can be viewed as an initial fuzzy model, from the raw data [5],[6]. The fine-tuning step is achieved by using a back-propagation type of learning.

The pre-processing unit (OLS+GK) adopted to construct the initial RBF model in this paper is one of the four methods proposed in [7] and [8].

The second stage (the two-pass hybrid training unit) is the fine-tuning stage, which this paper presents in detail. The coarsely-tuned RBF network is optimized by using a two-pass training algorithm. In the forward pass of the computation, the output weights of the RBF network are adjusted by the Levenberg-Marquardt (LM) algorithm while the rest of the parameters remain fixed. Similarly, in the backward pass of the computation, the free parameters of the basis functions (the center and width of each node) are adjusted by the gradient descent (GD) algorithm while the output weights remain fixed. The final form of the RBF network is efficiently constructed through this computationally two-pass hybrid training algorithm.

    2.  Two-pass hybrid training unit

As can be seen from Table 1, in the forward pass the output weights of the RBF network are adjusted by the Levenberg-Marquardt (LM) algorithm while the rest of the parameters remain fixed. Initially, the output of the hidden units (the node outputs, or φ) is treated as the input vector and $e_i = (d_i - y_i)$ is treated as the error vector. The weights in the output layer are then updated by the LM algorithm. In the backward pass, the free parameters of the basis functions (the center and width of each node) are adjusted by the gradient descent (GD) algorithm while the output weights (updated in the last forward pass) remain fixed. The final form of the RBF network is efficiently constructed through this computationally two-pass algorithm, which is more efficient than the GD-only method presented in [7] and [8]: it requires a smaller total number of iterations than the GD-only algorithm employed there. A sketch of one training epoch is given after Table 1.

    Table 1: Two-pass hybrid training procedure for linearly-weighted RBF networks. 
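To make the procedure concrete, the sketch below (our illustration, with our own names) runs one epoch of the two passes. It simplifies the paper's linearly-weighted form to a single scalar output weight per node; for a purely linear output layer the LM Jacobian is simply the negative of the node-output matrix.

```python
import numpy as np

def phi(X, C, S):
    """Gaussian node outputs: phi[i, j] = exp(-||x_i - c_j||^2 / (2 s_j^2))."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * S**2))

def two_pass_epoch(X, d, C, S, w, lam, mu_c, mu_s):
    """One assumed reading of Table 1: the forward pass updates the
    output weights by an LM step with the basis parameters frozen; the
    backward pass updates centers and widths by one GD step with the
    weights frozen."""
    # --- forward pass (LM on weights; J = -Phi since y = Phi w) ---
    P = phi(X, C, S)
    e = d - P @ w
    H = P.T @ P + lam * np.eye(len(w))      # Hessian approx J^T J + lam*I
    w = w + np.linalg.solve(H, P.T @ e)     # LM update of the weights
    # --- backward pass (GD on centers C and widths S, weights fixed) ---
    P = phi(X, C, S)
    e = d - P @ w                           # refreshed error, e_i = d_i - y_i
    G = e[:, None] * P * w[None, :]         # common factor e_i * w_j * phi_ij
    diff = X[:, None, :] - C[None, :, :]
    dE_dC = -(G[:, :, None] * diff / S[None, :, None] ** 2).sum(axis=0)
    dE_dS = -(G * (diff**2).sum(axis=2) / S[None, :] ** 3).sum(axis=0)
    return C - mu_c * dE_dC, S - mu_s * dE_dS, w
```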

2.1.  Levenberg-Marquardt (LM) Algorithm

A mathematical description of the LM neural network training algorithm has been presented by Hagan and Menhaj [12]. The LM algorithm is an intermediate optimization algorithm between the Gauss-Newton (GN) method and the gradient descent (GD) algorithm, and it addresses the limitations of each of those techniques. By combining the positive attributes of the GN and GD algorithms, the LM algorithm constructs a hybrid optimization technique which is suitable for many real-world applications. A detailed treatment of the LM method can be found in [12], [13], [14] and [15].

$e = [e_1 \;\; e_2 \;\; \cdots \;\; e_L]$ : error vector

$W = [a_{10} \;\; a_{11} \;\; \cdots \;\; a_{1D} \;\; \cdots \;\; a_{M0} \;\; a_{M1} \;\; \cdots \;\; a_{MD}]$ : parameter vector

The Jacobian matrix can be computed as follows:

$$J = \begin{bmatrix}
\frac{\partial e_1}{\partial a_{10}} & \frac{\partial e_1}{\partial a_{11}} & \cdots & \frac{\partial e_1}{\partial a_{1D}} & \cdots & \frac{\partial e_1}{\partial a_{M0}} & \cdots & \frac{\partial e_1}{\partial a_{MD}} \\
\vdots & \vdots & & \vdots & & \vdots & & \vdots \\
\frac{\partial e_L}{\partial a_{10}} & \frac{\partial e_L}{\partial a_{11}} & \cdots & \frac{\partial e_L}{\partial a_{1D}} & \cdots & \frac{\partial e_L}{\partial a_{M0}} & \cdots & \frac{\partial e_L}{\partial a_{MD}}
\end{bmatrix}$$

$H = J^T J + \lambda_{LM} I$ : Hessian matrix ($\lambda_{LM}$: Marquardt parameter; $I$: unit matrix)

$g = J^T e$ : gradient vector

$W(t+1) = W(t) - H^{-1} g$ : updating law of the free parameters

Thus, $\lambda_{LM}$ is decreased after each successful step (a reduction in the cost function) and is increased only when a tentative step would increase the cost function. In this way, the cost function can always be reduced at each iteration of the algorithm.
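The following minimal sketch shows this acceptance rule: a tentative step $W - H^{-1}g$ is kept (and $\lambda_{LM}$ decreased) only if it lowers the cost, otherwise $\lambda_{LM}$ is increased. The callables `residual` and `jacobian` (returning $e(W)$ and $J(W)$) and all other names are our assumptions for illustration.

```python
import numpy as np

def lm_iterate(residual, jacobian, W, lam=0.01, n_iter=50,
               lam_down=0.1, lam_up=10.0):
    """Sketch of the LM loop with the Marquardt-parameter adaptation
    described above; `residual(W)` returns e, `jacobian(W)` returns
    J = de/dW."""
    cost = 0.5 * np.sum(residual(W) ** 2)
    for _ in range(n_iter):
        e, J = residual(W), jacobian(W)
        H = J.T @ J + lam * np.eye(len(W))      # H = J^T J + lam_LM * I
        g = J.T @ e                             # g = J^T e
        W_try = W - np.linalg.solve(H, g)       # W(t+1) = W(t) - H^{-1} g
        cost_try = 0.5 * np.sum(residual(W_try) ** 2)
        if cost_try < cost:                     # successful step
            W, cost, lam = W_try, cost_try, lam * lam_down
        else:                                   # rejected: raise lam_LM
            lam = lam * lam_up
    return W
```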

2.2.  Gradient Descent (GD) Algorithm

The GD algorithm utilizes the cost function given in equation (1); a detailed treatment of the GD method can be found in [16]. The desired output of the RBF network is represented by $d_i$, the actual output by $y_i$, and $L$ is the total number of input data. The input-output data set is applied $N_{GD}$ times during training (the number of iterations), and the main goal is to minimize the total cost function given in equation (2).

$$E = \frac{1}{2} \sum_{i=1}^{L} (d_i - y_i)^2 \qquad (1)$$

$$T = \min\!\left( \sum_{i=1}^{N_{GD}} E_i \right) \qquad (2)$$


The free parameters of the RBF network (widths and centers) can be updated with the GD algorithm using equation (3):

$$\phi_{i+1} = \phi_i - \mu_\phi \frac{\partial E}{\partial \phi_i} \qquad (3)$$

In equation (3), $\phi_{i+1}$ is the current (updated) value of a free parameter, $\phi_i$ is its previous value, and $\mu_\phi$ denotes the learning rate for this parameter. Using equation (3), all free parameters of the basis functions can be updated in such a way that the total cost function is iteratively minimized; a one-line reading is sketched below.
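In code, equation (3) is a single line per parameter; the function name and arguments below are ours, for illustration only.

```python
# One-line reading of equation (3): each free parameter phi (a center
# coordinate or a width) moves against its own gradient of the cost E.
def gd_update(phi, dE_dphi, mu_phi):
    return phi - mu_phi * dE_dphi   # phi_{i+1} = phi_i - mu_phi * dE/dphi_i
```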

3.  Experimental Results

Example 1: Box and Jenkins's gas furnace is a famous example of system identification [18]. The data consist of 296 I/O measurements of a gas furnace system: the input measurement u(t) is the gas flow rate into the furnace, and the output measurement y(t) is the CO2 concentration in the outlet gas. For modeling, u(t-6), u(t-1), y(t-6) and y(t-1) are chosen as the input variables of the RBF network, and y(t) is chosen as the output; a sketch of this regressor construction follows below. The outcomes of the simulation are presented graphically in Fig. 2.
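For concreteness, the lagged input matrix described above can be built as follows (variable and function names are ours; only the lag structure comes from the paper).

```python
import numpy as np

def gas_furnace_regressors(u, y):
    """Build the input matrix [u(t-6), u(t-1), y(t-6), y(t-1)] with
    target y(t) from the 296-sample Box-Jenkins series (u = gas flow
    rate, y = CO2 concentration). Rows start at t = 6 so all lags exist."""
    t = np.arange(6, len(y))
    X = np.column_stack([u[t - 6], u[t - 1], y[t - 6], y[t - 1]])
    return X, y[t]
```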

Example 2: The last application of the proposed method is the prediction of a complex time series [10], a special function approximation problem that arises in such real-world problems as detecting arrhythmia in heartbeats. The chaotic Mackey-Glass series is generated from the delay differential equation (4) with τ = 17; the first 500 data points (x(t-3), x(t-2), x(t-1), x(t) and x(t+1)) are obtained and normalized to the range [-1, 1]. A generation sketch follows equation (4).

$$\frac{dx(t)}{dt} = \frac{0.2\, x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\, x(t) \qquad (4)$$
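A minimal way to generate the series is a coarse Euler integration of equation (4); the step size, initial value and zero pre-history below are our own assumptions for illustration (a finer integrator such as RK4 with a small step is common in practice).

```python
import numpy as np

def mackey_glass(n_steps, tau=17, dt=1.0, x0=1.2):
    """Euler integration of equation (4),
    dx/dt = 0.2 x(t-tau) / (1 + x(t-tau)^10) - 0.1 x(t), with tau = 17."""
    delay = int(round(tau / dt))              # delay measured in steps
    x = np.zeros(n_steps)
    x[0] = x0
    for t in range(n_steps - 1):
        x_tau = x[t - delay] if t >= delay else 0.0
        x[t + 1] = x[t] + dt * (0.2 * x_tau / (1.0 + x_tau ** 10)
                                - 0.1 * x[t])
    return x

# First 500 points, normalized to [-1, 1] as in the paper:
x = mackey_glass(500 + 4)
x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
```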

After completion of the training, the outcomes are presented graphically in Fig. 3. Comparative results are given in Table 2, whose parameters are defined as follows:

$\varepsilon_{OLS}$ : termination parameter for the OLS algorithm
$\mu_C$, $\mu_\sigma$ : learning rates for the centers and widths, respectively
$\lambda_{LM}$ : Marquardt parameter
$N_{Epoch}$ : total epoch number

    Table 2: The outcomes of two simulated examples. 

4.  Conclusion

The systematic construction of a linearly-weighted Gaussian radial basis function (RBF) neural network with a two-stage hybrid training method has been presented. The first stage of the hybrid algorithm is a pre-processing unit which generates a coarsely-tuned RBF network. The second stage is a fine-tuning phase which employs a computationally two-pass algorithm. The proposed method is compared with the ANFIS structure over two non-linear benchmarks (the Box-Jenkins gas furnace and the Mackey-Glass chaotic time series) in terms of MSE errors. As can be seen from Table 2, the proposed method attains a similar level of MSE error, with fewer rules, compared to the ANFIS structure: ANFIS gives slightly better results, but it employs more rules than the proposed method. When the GD-only algorithm is employed in the second stage, as presented in [7] and [8], the obtained MSE results are, as expected, poor compared to both the proposed method and ANFIS.


5.  References

[1] Uykan Z., Güzeliş C., Çelebi M. E., and Koivo H. N., "Analysis of input-output clustering for determining centers of RBFN", IEEE Transactions on Neural Networks, 11:851-858, 2000.

[2] Chen S., Cowan C. F. N., and Grant P. M., "Orthogonal least squares learning algorithm for radial basis function networks", IEEE Transactions on Neural Networks, 2:302-309, March 1991.

[3] Sherstinsky A. and Picard R. W., "On the efficiency of the orthogonal least squares training method for radial basis function networks", IEEE Transactions on Neural Networks, 7:195-200, 1996.

[4] Buchtala O., Klimek M., and Sick B., "Evolutionary optimization of radial basis function classifiers for data mining applications", IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 35:928-947, 2005.

[5] Chen M.-Y. and Linkens D. A., "A systematic neuro-fuzzy modeling framework with application to material property prediction", IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 31:781-790, 2001.

[6] Linkens D. A. and Chen M.-Y., "Input selection and partition validation for fuzzy modelling using neural network", Fuzzy Sets and Systems, 107:299-308, 1999.

[7] Kayhan G., Özdemir A. E., and Eminoğlu İ., "Designing Pre-Processing Units for RBF Networks, Part 1: Initial Structure Identification", to appear in International Symposium on Innovations in Intelligent Systems and Applications (INISTA'09), Trabzon, Türkiye, 2009.

[8] Özdemir A. E., Kayhan G., and Eminoğlu İ., "Designing Pre-Processing Units for RBF Networks, Part 2: Final Structure Identification and Coarse Tuning of Parameters", to appear in International Symposium on Innovations in Intelligent Systems and Applications (INISTA'09), Trabzon, Türkiye, 2009.

[9] Jang J.-S. R., "ANFIS: Adaptive-network-based fuzzy inference system", IEEE Transactions on Systems, Man, and Cybernetics, 23:665-685, 1993.

[10] Ouyang C.-S., Lee W.-J., and Lee S.-J., "A TSK-type neurofuzzy network approach to system modeling problems", IEEE Transactions on Systems, Man, and Cybernetics, Part B, 35(4):751-767, Aug. 2005.

[11] Lee S.-J. and Ouyang C.-S., "A neuro-fuzzy system modeling with self-constructing rule generation and hybrid SVD-based learning", IEEE Transactions on Fuzzy Systems, 11(3):341-353, June 2003.

[12] Hagan M. T. and Menhaj M. B., "Training feedforward networks with the Marquardt algorithm", IEEE Transactions on Neural Networks, 5(6):989-993, Nov. 1994.

[13] Wilamowski B. M., Chen Y., and Malinowski A., "Efficient algorithm for training neural networks with one hidden layer", in International Joint Conference on Neural Networks (IJCNN '99), 1999.

[14] Kermani B. G., Schiffman S. S., and Nagle H. T., "Performance of the Levenberg-Marquardt neural network training method in electronic nose applications", Sensors and Actuators B, 110:13-22, 2005.

[15] İçer S., Kara S., and Güven A., "Comparison of multilayer perceptron training algorithms for portal venous Doppler signals in the cirrhosis disease", Expert Systems with Applications, 31:406-413, 2006.

[16] Jenison R. L. and Fissell K., "A comparison of the von Mises and Gaussian basis functions for approximating spherical acoustic scatter", IEEE Transactions on Neural Networks, 6(5):1284-1287, Sept. 1995.

[17] Staiano A., Tagliaferri R., and Pedrycz W., "Improving RBF networks performance in regression tasks by means of a supervised fuzzy clustering", Neurocomputing, 69(13-15):1570-1581, Aug. 2006.

[18] Kukolj D. and Levi E., "Identification of complex systems based on neural and Takagi-Sugeno fuzzy model", IEEE Transactions on Systems, Man, and Cybernetics, Part B, 34(1):272-282, Feb. 2004.

[19] Emami M. R., Turksen I. B., and Goldenberg A. A., "An Improved Fuzzy Modeling Algorithm, Part I: Inference Mechanism, Part II: System Identification", NAFIPS, 1996.

Figure 2: The original data, the modelled system and the input-output errors for the gas furnace example.

Figure 3: The original data, the modelled system and the input-output errors for the Mackey-Glass time series example.
