Training ANFIS System with DE Algorithm

Allahyar Z. Zangeneh, Mohammad Mansouri, Mohammad Teshnehlab, and Ali K. Sedigh

Fourth International Workshop on Advanced Computational Intelligence (IWACI 2011), Wuhan, Hubei, China; October 19-21, 2011


Abstract— In this study, a new way of training the adaptive network-based fuzzy inference system (ANFIS) is presented by applying different branches of Differential Evolution. The TSK-type consequent part is a linear model of exogenous inputs. The consequent part parameters are learned by a gradient descent algorithm. The antecedent fuzzy sets are learned by basic differential evolution (DE/rand/1/bin) and then with some modifications of it. This method is applied to identification of a nonlinear dynamic system, prediction of a chaotic signal under both noise-free and noisy conditions, and simulation of the two-dimensional sinc function. Instead of DE/rand/1/bin, this paper suggests the complex type (DE/current-to-best/1+1/bin & DE/rand/1/bin) for predicting the Mackey-Glass time series and identifying a nonlinear dynamic system, revealing the efficiency of the proposed structure. Finally, the method is compared with pure ANFIS to show its efficiency.

I. INTRODUCTION

The topologies of recurrent networks which are used to store past information consist of feedback loops. In contrast to pure feed-forward architectures, which exhibit static input-output behavior, recurrent networks are able to memorize information from the past (e.g., last system states), and are thus more appropriate for the analysis of dynamic systems. Some recurrent fuzzy neural networks (FNNs) have already been proposed [1][2] to deal with temporal characteristic problems, and have been shown to outperform feed-forward FNNs and recurrent neural networks. One branch of recurrent FNNs uses feedback loops from the network output(s) as a recurrence structure [2]. The Takagi-Sugeno-Kang (TSK) type [3][4] is a fuzzy system with crisp functions in the consequent, which is perceived as proper for complex applications [5]. Instead of working with linguistic rules of the kind introduced in Mamdani-type fuzzy rule-based systems, Takagi, Sugeno, and Kang proposed a new model based on rules where the antecedent was composed of linguistic variables and the consequent was represented by a function of the input variables.

Manuscript received July 15, 2011. This work was supported in part by the Faculty of Engineering, Department of Computer Engineering, Science and Research Branch, Islamic Azad University of Tehran.

A. Z. Zangeneh is with the Computer Engineering Department, Science and Research Branch, Islamic Azad University of Tehran, Iran (e-mail: [email protected]).

M. Mansouri and M. Teshnehlab are with the Intelligent System laboratory (ISLAB), Control Department, K. N. Toosi University of Technology, Tehran, Iran (e-mail: {mohammad.mansouri, teshnehlab}@ee.kntu.ac.ir).

A. K. Sedigh is with the Advanced Process Automation & Control laboratory (APAC), Control Department, K. N. Toosi University of Technology, Tehran, Iran (e-mail: [email protected]).

It has been proved that, with a sufficient number of rules, a TSK system can approximate any plant [6]. The TSK recurrent fuzzy network (TRFN) [7] uses a global feedback structure, where the firing strengths of each rule are summed and fed back as internal network inputs. TSK systems are widely used in the form of a neural-fuzzy system called the Adaptive Network-based Fuzzy Inference System (ANFIS) [8]. ANFIS is a class of adaptive networks that are functionally equivalent to fuzzy inference systems. The ANFIS architecture stands for adaptive network-based fuzzy inference system or, semantically equivalently, adaptive neural fuzzy inference system [9]. This adaptive network has good ability and performance in system identification, prediction, and control, and has been applied in many different systems. ANFIS has the advantage of good applicability, as it can be interpreted as local linearization modeling, and conventional linear techniques for state estimation and control are directly applicable.

The training method of the ANFIS parameters is the main problem. Most methods are gradient-based, and calculating the gradient is not easy in some steps: the chain rule must be used, and the search may get stuck in a local minimum. Here, we propose a hybrid method which can update the premise parameters more easily and faster than the gradient method. In the gradient method, convergence of the parameters is very slow and depends on their initial values, and as a result finding the best learning rate is very difficult. With Differential Evolution (DE), by contrast, a high number of epochs is not needed, so the parameters converge quickly.

DE is a stochastic, population-based search strategy developed by Storn and Price [10] in 1995. While DE shares similarities with other evolutionary algorithms (EAs), it differs significantly in that distance and direction information from the current population is used to guide the search process. Furthermore, the original DE strategies were developed to be applied to continuous-valued landscapes.

The rest of the paper is organized as follows. Section II reviews ANFIS. Section III discusses the hybrid method. An overview of the proposed method and its application to nonlinear identification are presented in Section IV. Finally, Section V presents our conclusions.

II. THE CONCEPT OF ANFIS

A. ANFIS Structure

Jang presents an algorithm called ANFIS that defines the composition of the database. A fuzzy inference system is implemented in a neural network that uses a hybrid method to adjust the parameters in its nodes. Both neural networks and fuzzy inference systems [11] are model-free estimators



and share the common ability to deal with uncertainty and noise. Both encode information in a parallel and distributed architecture in a numerical framework. Hence, it is possible to convert a fuzzy inference system architecture into a neural network and vice versa. The proposed ANFIS can construct an input-output mapping based on both expert knowledge (in linguistic form) and specified input-output data pairs.

An adaptive network is a multilayer feed-forward network in which each node performs a node function on the incoming signals, using a set of parameters pertaining to that node on which its output depends. These parameters can be fixed or variable, and it is through the change of the variable ones that the network is tuned. ANFIS has nodes with variable parameters, called square nodes, which represent the membership functions of the antecedents and the linear functions of the TSK-type consequent. The nodes in the intermediate layers connect the antecedent with the consequent; their parameters are fixed and they are called circular nodes. Moreover, the network obtained this way does not remain a black box, since it retains fuzzy inference system capabilities that can be interpreted in terms of linguistic variables [12].

The ANFIS structure is demonstrated in five layers. It can be described as a multi-layered neural network as shown in Fig. 1.

Layer 1: The first layer executes a fuzzification process. The parameters in this layer are referred to as premise parameters. In fact, any differentiable function, such as bell-shaped and triangular membership functions (MFs), is valid for the nodes in this layer. Every node i in this layer is a square node with a node function. Usually the MFs are Gaussian, with maximum equal to 1 and minimum equal to 0, such as:

$o_i^1 = \mu_{A_i}(x_1) = \exp\left\{-\left(\frac{x_1 - c_i}{\sigma_i}\right)^2\right\}, \quad i = 1, 2$ (1)

$o_i^1 = \mu_{B_i}(x_2) = \exp\left\{-\left(\frac{x_2 - c_i}{\sigma_i}\right)^2\right\}, \quad i = 3, 4$ (2)

where $\{c_i, \sigma_i\}$ are the parameters of the MFs, which determine their shape.

Layer 2: Each node represents the firing strength of a rule i through a conjunction operator; the function considered is the fuzzy AND. They are circular nodes with the node function:

$R_1 = o_1 o_3, \quad R_2 = o_1 o_4, \quad R_3 = o_2 o_3, \quad R_4 = o_2 o_4$ (3)

Layer 3: It calculates the ratio of the $i$-th rule's firing strength to the sum of all the rules' firing strengths:

$\bar{R}_i = \frac{R_i}{R_1 + R_2 + R_3 + R_4}, \quad i = 1, \dots, 4$ (4)

Layer 4: Every node i in this layer is a square node with a node function:

$f_i(x_1, x_2) = \alpha_0^i + \alpha_1^i x_1 + \alpha_2^i x_2$ (5)

$g_i = \frac{f_i(x_1, x_2)\, R_i}{R_1 + R_2 + R_3 + R_4}$ (6)

Layer 5: The single node in this layer computes the overall output as the sum of all incoming signals.

$O(x_1, x_2) = \sum_{i=1}^{4} \frac{f_i(x_1, x_2)\, R_i}{R_1 + R_2 + R_3 + R_4} = g_1 + g_2 + g_3 + g_4$ (7)

$O(x_1, x_2) = \frac{\sum_{j=1}^{4} f_j \prod_{i=1}^{2} \mu_{A_i^j}(x_i)}{\sum_{j=1}^{4} \prod_{i=1}^{2} \mu_{A_i^j}(x_i)}$ (8)
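As a concrete illustration of how equations (1)-(8) compose, the following minimal sketch (not the authors' code; all names are illustrative) evaluates the forward pass of the two-input, four-rule network of Fig. 1 in Python with NumPy.

```python
import numpy as np

def anfis_forward(x1, x2, c, sigma, alpha):
    """Forward pass of the 2-input, 4-rule TSK ANFIS of Fig. 1.

    c, sigma : arrays of shape (4,), premise parameters {c_i, sigma_i};
               MFs 1-2 act on x1, MFs 3-4 act on x2 (eqs. 1-2).
    alpha    : array of shape (4, 3), consequent parameters
               {alpha_0^i, alpha_1^i, alpha_2^i} of eq. (5).
    """
    # Layer 1: Gaussian fuzzification (eqs. 1-2)
    o = np.empty(4)
    o[0:2] = np.exp(-((x1 - c[0:2]) / sigma[0:2]) ** 2)
    o[2:4] = np.exp(-((x2 - c[2:4]) / sigma[2:4]) ** 2)

    # Layer 2: rule firing strengths via product AND (eq. 3)
    R = np.array([o[0] * o[2], o[0] * o[3], o[1] * o[2], o[1] * o[3]])

    # Layer 3: normalized firing strengths (eq. 4)
    R_bar = R / R.sum()

    # Layer 4: rule-wise linear consequents weighted by R_bar (eqs. 5-6)
    f = alpha[:, 0] + alpha[:, 1] * x1 + alpha[:, 2] * x2
    g = R_bar * f

    # Layer 5: overall output as the sum of incoming signals (eq. 7)
    return g.sum()

# example call with arbitrary parameter values
print(anfis_forward(0.5, -0.2, c=np.zeros(4), sigma=np.ones(4),
                    alpha=np.ones((4, 3))))
```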

In order to model complex nonlinear systems, the ANFIS model carries out input space partitioning that splits the input space into many local regions, in each of which simple local models (linear functions or even adjustable coefficients) are employed. ANFIS uses fuzzy MFs to split each input dimension; the input space is covered by overlapping MFs, which means several local regions can be activated simultaneously by a single input. As simple local models are adopted in the ANFIS model, its approximation ability depends on the resolution of the input space partitioning, which is determined by the number of MFs and the number of layers.

B. Learning Algorithms

Subsequent to the development of the ANFIS approach, a number of methods have been proposed for learning the parameters and for obtaining an optimal number of MFs. Four methods to update the parameters of the ANFIS structure are introduced by Jang [8], as listed below:

- All parameters are updated by gradient descent.
- After the network parameters are set to their initial values, the consequent part parameters are adjusted through LSE, applied only once at the very beginning; then gradient descent updates all parameters.
- An extended Kalman filter is used to update all parameters.
- Hybrid learning combining GD (gradient descent) and LSE.

In this paper we introduce a hybrid method which has less complexity and fast convergence.

III. HYBRID METHOD

The ANFIS network is organized in two parts, like fuzzy systems: the first part is the antecedent part and the second is the consequent part, and they are connected to each other by rules in network form. These two parts can be adapted by different optimization methods, one of which is the hybrid learning procedure combining GD and DE. In a conventional fuzzy inference system, the number of rules is determined by

Fig. 1. The TSK neural fuzzy network with 2 inputs and 2 MFs per input.


an expert who is familiar with the target system to be modelled. In our simulation, however, no expert is available, and the number of MFs assigned to each input variable is chosen empirically, that is, by plotting the data sets and examining them visually, or simply by trial and error. For data sets with more than three inputs, visualization techniques are not very effective, and most of the time we have to rely on trial and error. This situation is similar to that of neural networks: there is just no simple way to determine in advance the minimal number of hidden units needed to achieve a desired performance level.

A. Gradient Descent

Gradient-based algorithms are the most common and important nonlinear local optimization techniques [13]. Back propagation is a gradient-based technique that applies to neural network systems [17]. It is possible to decrease the difference between the actual output of the ANFIS structure and the desired output using gradient-based methods. Consider an error function as follows:

$E_j = \frac{1}{2}\left(y_j - \hat{y}_j\right)^2, \quad j = 1, \dots, N$ (9)

where $y_j$ is the output of the ANFIS structure, $\hat{y}_j$ is the desired output, and $N$ is the number of training data. We can optimize $E_j$ by using the partial derivatives in the differentiation chain rule [18]. After the partial derivatives are computed, the following equations can be used to update the consequent parameters from the $q$-th iteration to the $(q+1)$-th iteration:

$\Delta\alpha_0^i = \frac{\eta \cdot e \cdot R_i}{\sum_{i=1}^{n} R_i}$ (10)

$\Delta\alpha_1^i = \frac{\eta \cdot e \cdot R_i}{\sum_{i=1}^{n} R_i} \cdot x_{1j}$ (11)

$\Delta\alpha_2^i = \frac{\eta \cdot e \cdot R_i}{\sum_{i=1}^{n} R_i} \cdot x_{2j}$ (12)

where $\eta$ is the learning rate, $e$ is the output error, and $n$ is the number of rules.
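A minimal sketch of one such consequent update per equations (10)-(12) follows (names illustrative; $\eta$ is the learning rate and $e$ the output error for the current sample, as in the reconstruction above).

```python
import numpy as np

def gd_consequent_update(alpha, R, x1, x2, e, eta=0.01):
    """One gradient-descent step on the consequent parameters.

    alpha : (4, 3) array of {alpha_0^i, alpha_1^i, alpha_2^i}
    R     : (4,) rule firing strengths from layer 2
    e     : desired output minus actual output for this sample
    """
    R_bar = R / R.sum()                  # normalized strengths (eq. 4)
    alpha = alpha.copy()
    alpha[:, 0] += eta * e * R_bar       # eq. (10)
    alpha[:, 1] += eta * e * R_bar * x1  # eq. (11)
    alpha[:, 2] += eta * e * R_bar * x2  # eq. (12)
    return alpha

# example: one update with arbitrary values
alpha = gd_consequent_update(np.zeros((4, 3)),
                             R=np.array([0.2, 0.3, 0.1, 0.4]),
                             x1=0.5, x2=-0.2, e=0.1)
print(alpha)
```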

B. Basic Differential Evolution

In other evolutionary algorithms (EAs), variation from one generation to the next is achieved by applying crossover and/or mutation operators. If both operators are used, crossover is usually applied first, after which the generated offspring are mutated; mutation step sizes are sampled from some probability distribution function. DE differs from these evolutionary algorithms in that: 1) mutation is applied first, to generate a trial vector which is then used within the crossover operator to produce one offspring; and 2) mutation step sizes are not sampled from an a priori known probability distribution function.

In DE, mutation step sizes are influenced by differences between individuals of the current population [10]. The positions of individuals provide valuable information about the fitness landscape. The initial values of the premise parameters are set in such a way that the centres of the MFs are equally spaced along the range of each input variable. For example, in the two-dimensional sinc function, the range of each input variable is $(-a \le x_i \le a)$, and one of the ANFIS models used here contains $4^2 = 16$ rules, with four membership functions assigned to each input variable. The total number of fitting parameters is 64, including 16 premise (nonlinear) parameters (8 centres of MFs and 8 standard deviations) and 48 consequent (linear) parameters. (We also tried an ANFIS model with 64 rules, because the first model is too simple to describe the highly nonlinear sinc function.) To train the premise parameters, we used DE and propose two methods for constructing the initial population.

Method 1: a uniform random initialization is used to construct the initial population.

Method 2: for each input variable we have four MFs and, hence, four MF centres that are equally distributed along the range $(\min(x_i) \le x_i \le \max(x_i))$. Slip denotes the distance between two MF centres belonging to two neighbouring individuals of the population, and in this paper slip is calculated as below:

$\text{slip} = \frac{(\max(\text{input}) - \min(\text{input})) \times (1 - \text{number of MFs})}{(\text{number of MFs})^2 \times \text{population size}}$ (13)

Then, for the settings used here,

$\text{slip} = -0.0147$ (14)

Therefore, the initial values of the MF centres in the population are distributed in an interval around them. The initial values for the standard deviations are calculated as below:

$\frac{\max(\text{input}) - \min(\text{input})}{\text{number of MFs} - 1} \times (1 - \text{shift}) + \text{random} \times \text{shift}$ (15)

Therefore, the initial values of the standard deviations of the MFs in the population are distributed in an interval around a fixed value, and the magnitude of shift determines the width of this interval.
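The following sketch shows one way to build such an initial population per equations (13)-(15); the input range, number of MFs, population size, and shift value are illustrative placeholders, and offsetting the centres of successive individuals by multiples of slip is an assumption about a detail the text leaves implicit.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_premise_population(x_min, x_max, n_mf, pop_size, shift=0.1):
    """Initial DE population of premise parameters for one input (Method 2)."""
    # slip: centre spacing between neighbouring individuals (eq. 13)
    slip = (x_max - x_min) * (1 - n_mf) / (n_mf**2 * pop_size)
    base_centres = np.linspace(x_min, x_max, n_mf)  # equally spaced centres
    base_sigma = (x_max - x_min) / (n_mf - 1)       # fixed base width
    population = []
    for k in range(pop_size):
        centres = base_centres + k * slip           # drift centres by slip
        # standard deviations spread around the fixed value (eq. 15)
        sigmas = base_sigma * (1 - shift) + rng.random(n_mf) * shift
        population.append(np.concatenate([centres, sigmas]))
    return np.array(population)

pop = init_premise_population(x_min=-10, x_max=10, n_mf=4, pop_size=40)
print(pop.shape)  # (40, 8): 4 centres + 4 standard deviations per individual
```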

C. Difference Vectors

Distances between individuals are very good indications of the diversity of the current population, and of the order of magnitude of the step sizes that should be taken in order to contract the population to one point. If there are large distances between individuals, it stands to reason that the individuals should take large steps in order to explore as much of the search space as possible. On the other hand, if the distances between individuals are small, step sizes should be small, to exploit local areas. The number of difference vectors used is denoted $n_v$.

D. Trial Vector

The DE mutation operator produces a trial vector for each individual of the current population by mutating a target vector with a weighted differential. This trial vector is then used by the crossover operator to produce offspring:

$u_i(t) = x_{\text{target}}(t) + \beta \sum_{k=1}^{n_v} \left(x_{r_{1,k}}(t) - x_{r_{2,k}}(t)\right)$ (16)

where $u_i(t)$ refers to the trial vector for the $i$-th parent, $\beta$ is the scale factor, and the way the target vector is selected depends on the DE strategy.


E. General Notation

A general notation was adopted in the DE literature, namely DE/x/y/z [10]. Using this notation, x refers to the method of selecting the target vector, y indicates the number of difference vectors used, and z indicates the crossover method used.
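To make the notation concrete, here is a generic sketch of one DE/rand/1/bin generation (random target vector, one difference vector, binomial crossover); the control parameters F and CR and the sphere test function are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def de_rand_1_bin(pop, fitness, F=0.5, CR=0.9):
    """One generation of DE/rand/1/bin over a (Ns, Ng) population."""
    Ns, Ng = pop.shape
    new_pop = pop.copy()
    for i in range(Ns):
        # x = rand: random target vector; y = 1 difference vector (eq. 16)
        r1, r2, r3 = rng.choice([j for j in range(Ns) if j != i], 3,
                                replace=False)
        trial = pop[r1] + F * (pop[r2] - pop[r3])
        # z = bin: binomial crossover, keeping at least one trial gene
        mask = rng.random(Ng) < CR
        mask[rng.integers(Ng)] = True
        offspring = np.where(mask, trial, pop[i])
        # greedy selection: offspring replaces parent only if it is better
        if fitness(offspring) < fitness(pop[i]):
            new_pop[i] = offspring
    return new_pop

# toy run on a 10-gene sphere function, Ns = 10 * Ng per the guideline below
pop = rng.uniform(-5, 5, size=(100, 10))
for _ in range(50):
    pop = de_rand_1_bin(pop, lambda v: np.sum(v**2))
print(min(np.sum(pop**2, axis=1)))
```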

IV. SIMULATION RESULTS

In this section, the way DE is employed to update the ANFIS antecedent part parameters is shown. The antecedent part of ANFIS has two sets of parameters which need training: the means and the standard deviations (STDEV). The membership functions are assumed Gaussian, as in equations (1) and (2). The parameters of the consequent part are trained by gradient descent. Comparisons with pure ANFIS validate the performance of the DE approach.

A. How to Apply DE for Training ANFIS Parameters

There are two sets of trainable parameters in the antecedent part, $\{c_i, \sigma_i\}$, and each of these parameter sets has NMF genes, where NMF represents the number of MFs. The consequent part parameters $\{\alpha_0^i, \alpha_1^i, \alpha_2^i\}$ are also trained during the optimization algorithm.

We used a number of variations of the basic DE in our simulations. Different numbers of membership functions with different numbers of epochs are used for the different DE strategies.

The size of the population has a direct influence on the exploration ability of DE algorithms. The more individuals there are in the population, the more differential vectors are available, and the more directions can be explored. However, it should be kept in mind that the computational complexity per generation increases with the size of the population. Empirical studies provide the guideline $N_s \approx 10 N_g$, where $N_s$ is the population size and $N_g$ is the number of genes per individual.

Initial mean parameters are distributed sequentially over the domain of identification. Standard deviation parameters are determined according to the number of MFs and the domain intervals.
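As a sketch of this encoding (names hypothetical), each DE individual can be a flat vector holding the NMF centres and NMF standard deviations for every input; a natural fitness choice, assumed here, is the training RMSE of the resulting ANFIS.

```python
import numpy as np

def decode(individual, n_inputs=2, n_mf=4):
    """Split a flat DE individual into per-input centre and sigma arrays."""
    genes = individual.reshape(n_inputs, 2 * n_mf)
    return genes[:, :n_mf], genes[:, n_mf:]  # centres, sigmas

def rmse_fitness(individual, anfis_predict, X, y):
    """Fitness of one individual: RMSE over the training set.

    anfis_predict(x, centres, sigmas) should run the forward pass of
    Section II with the consequent parameters currently fitted by GD.
    """
    centres, sigmas = decode(individual)
    preds = np.array([anfis_predict(x, centres, sigmas) for x in X])
    return np.sqrt(np.mean((preds - y) ** 2))
```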

B. Nonlinear Function Modeling

Example 1: Predicting chaotic dynamics. This example is to predict future values of a chaotic time series, which is generated by

$\dot{x}(t) = \frac{0.2\, x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\, x(t)$ (17)

Equation (17) is known as the chaotic Mackey-Glass differential delay equation. The initial conditions for $x(0)$ and $\tau$ are 1.2 and 17, respectively. The ANFIS structure has 4 inputs and one output. We use 840/360 data as training/test [14].
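For reference, the series of equation (17) can be generated numerically, e.g. with a simple Euler scheme and a constant initial history, as in the sketch below (the step size and series length are illustrative; the paper does not state its integration details).

```python
import numpy as np

def mackey_glass(n, tau=17, x0=1.2, dt=1.0):
    """Euler integration of dx/dt = 0.2 x(t-tau)/(1 + x(t-tau)^10) - 0.1 x(t)."""
    history = int(tau / dt)
    x = np.empty(n + history)
    x[:history] = x0  # constant initial history x(t) = 1.2 for t <= 0
    for t in range(history, n + history - 1):
        x_tau = x[t - history]
        x[t + 1] = x[t] + dt * (0.2 * x_tau / (1 + x_tau**10) - 0.1 * x[t])
    return x[history:]

series = mackey_glass(1200)  # enough samples for an 840/360 train/test split
print(series[:5])
```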

The results are illustrated in Table 1, which also shows the results of the gradient descent method and of the complex (DE & GD) method so that the results can be compared. Fig. 2 depicts the prediction of the Mackey-Glass time series.

Example 2: Identification of a nonlinear dynamic system. In this example, the nonlinear system model with multiple time delays is described as [16]

Table 1. Results of simulating Mackey-Glass series prediction

Antecedent training | Consequent training | MFs (each input) / Epochs | Test Error | Train Error
--- | --- | --- | --- | ---
GD | GD | 2 / 500 | 5.2051e-005 | 7.2074e-005
GD | GD | 4 / 500 | 5.2158e-005 | 7.9852e-005
GD | GD | 4 / 250 | 1.1140e-004 | 1.4670e-004
GD | GD | 6 / 167 | 1.0586e-004 | 1.5202e-004
GD | GD | 8 / 250 | 5.8830e-005 | 8.4107e-005
GD | GD | 8 / 125 | 1.2090e-004 | 1.5564e-004
DE/rand/1/bin | GD | 2 / 30 | 4.4669e-004 | 4.1855e-004
DE/rand/1/bin | GD | 3 / 30 | 8.9599e-005 | 2.0459e-005
DE/rand/1/bin | GD | 4 / 30 | 1.6140e-005 | 4.1669e-006
DE/rand/1/bin | GD | 4 / 20 | 2.4909e-005 | 7.3149e-006
DE/rand/1/bin | GD | 4 / 10 | 5.8407e-005 | 7.8806e-006
DE/rand/1/bin | GD | 4 / 250 | 8.7885e-006 | 1.0085e-006
DE/rand/1/expo | GD | 2 / 30 | 3.8198e-004 | 3.3717e-004
DE/rand/1/expo | GD | 3 / 30 | 6.9059e-005 | 2.2811e-005
DE/rand/1/expo | GD | 4 / 30 | 1.2441e-005 | 1.0180e-005
DE/rand/1/expo | GD | 4 / 20 | 1.8841e-005 | 4.6117e-006
DE/rand/1/expo | GD | 4 / 10 | 1.1910e-005 | 1.0931e-005
DE/rand/1/expo | GD | 4 / 250 | 6.2660e-006 | 2.6258e-006
DE/best/1/bin | GD | 2 / 30 | 7.8053e-004 | 7.2889e-004
DE/best/1/bin | GD | 3 / 30 | 5.7960e-005 | 1.6248e-005
DE/best/1/bin | GD | 4 / 30 | 1.5628e-005 | 3.7171e-006
DE/best/1/bin | GD | 4 / 20 | 3.8646e-005 | 6.2848e-006
DE/best/1/bin | GD | 4 / 10 | 4.9271e-005 | 2.9141e-005
DE/best/1/bin | GD | 4 / 250 | 1.2390e-005 | 2.6935e-006
DE/best/1/expo | GD | 2 / 30 | 3.3349e-004 | 6.5970e-005
DE/best/1/expo | GD | 3 / 30 | 3.9061e-005 | 1.4706e-005
DE/best/1/expo | GD | 4 / 30 | 4.1868e-005 | 1.5389e-005
DE/best/1/expo | GD | 4 / 20 | 1.0251e-005 | 1.4105e-005
DE/best/1/expo | GD | 4 / 10 | 3.3630e-005 | 2.2836e-005
DE/best/1/expo | GD | 4 / 250 | 5.5436e-006 | 5.4491e-006
DE/rand/3/bin | GD | 2 / 30 | 7.2959e-005 | 5.6971e-005
DE/rand/3/bin | GD | 4 / 10 | 2.4233e-005 | 8.7561e-007
DE/rand/3/bin | GD | 4 / 100 | 8.9657e-006 | 1.8663e-006
DE/rand/3/expo | GD | 2 / 30 | 8.6586e-005 | 6.1899e-005
DE/rand/3/expo | GD | 4 / 10 | 1.1240e-005 | 2.7921e-005
DE/rand/3/expo | GD | 4 / 100 | 1.0796e-005 | 3.4517e-006
DE/best/3/bin | GD | 2 / 30 | 1.1379e-004 | 2.6270e-006
DE/best/3/bin | GD | 4 / 3 | 2.9242e-005 | 4.2949e-006
DE/best/3/bin | GD | 4 / 10 | 3.5245e-005 | 3.6919e-006
DE/best/3/expo | GD | 2 / 30 | 3.3180e-004 | 6.3421e-006
DE/best/3/expo | GD | 4 / 3 | 2.5915e-005 | 4.2255e-006
DE/best/3/expo | GD | 4 / 10 | 4.0498e-005 | 4.5050e-006
DE/current-to-best/1+1/bin | GD | 2 / 30 | 6.4933e-004 | 1.2009e-004
DE/current-to-best/1+1/bin | GD | 4 / 3 | 9.8025e-005 | 7.3219e-006
DE/current-to-best/1+1/bin | GD | 4 / 10 | 4.2559e-004 | 3.5356e-004
DE/current-to-best/1+1/bin & DE/rand/1/bin | GD | 2 / 30 | 4.5198e-004 | 2.0101e-004
DE/current-to-best/1+1/bin & DE/rand/1/bin | GD | 4 / 10 | 6.0014e-006 | 2.7051e-006
DE/current-to-best/1+1/bin & DE/rand/1/bin | GD | 4 / 30 | 3.5323e-005 | 1.6468e-006


$y_p(k+1) = f\left(y_p(k),\, y_p(k-1),\, y_p(k-2),\, u(k),\, u(k-1)\right)$ (18)

where

$f(x_1, x_2, x_3, x_4, x_5) = \frac{x_1 x_2 x_3 x_5 (x_3 - 1) + x_4}{1 + x_2^2 + x_3^2}$ (19)

Here, the current output of the plant depends on three previous outputs and two previous inputs. An ANFIS structure with five input nodes, fed with the appropriate past values of $y_p$ and $u$, was used. The system input signal $u(k)$ is given by the following equation [16]:

$u(k) = \begin{cases} \sin\left(\frac{\pi k}{25}\right), & 0 < k < 250 \\ +1.0, & 250 \le k < 500 \\ -1.0, & 500 \le k < 750 \\ 0.3 \sin\left(\frac{\pi k}{25}\right) + 0.1 \sin\left(\frac{\pi k}{32}\right) + 0.6 \sin\left(\frac{\pi k}{10}\right), & 750 \le k < 1000 \end{cases}$ (20)

The trigonometric network used here contains two hidden layers with four and two neurons (sine and cosine, with or without frequency and phase) in the hidden layers, five inputs, and one output.
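A minimal sketch that generates the identification data of equations (18)-(20) follows, assuming zero initial conditions for the plant output (the paper does not state them).

```python
import numpy as np

def f(x1, x2, x3, x4, x5):
    # eq. (19): nonlinear plant function
    return (x1 * x2 * x3 * x5 * (x3 - 1) + x4) / (1 + x2**2 + x3**2)

def u_signal(k):
    # eq. (20): piecewise test input
    if k < 250:
        return np.sin(np.pi * k / 25)
    if k < 500:
        return 1.0
    if k < 750:
        return -1.0
    return (0.3 * np.sin(np.pi * k / 25) + 0.1 * np.sin(np.pi * k / 32)
            + 0.6 * np.sin(np.pi * k / 10))

y = np.zeros(1001)
for k in range(2, 1000):
    # eq. (18): three past outputs and two past inputs (zero initial state)
    y[k + 1] = f(y[k], y[k - 1], y[k - 2], u_signal(k), u_signal(k - 1))
print(y[:5])
```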

Fig. 2: Mackey-Glass prediction. (a) Using DE to train the antecedent part parameters in the ANFIS structure. (b) Using GD to train the antecedent and the consequent part parameters in the ANFIS structure.

Table 2. Results of simulating nonlinear dynamic system prediction

Antecedent training | Consequent training | MFs (each input) / Epochs | Test Error | Train Error
--- | --- | --- | --- | ---
GD | GD | 2 / 500 | 0.0036 | 8.5324e-005
GD | GD | 4 / 250 | 0.0017 | 7.7851e-005
GD | GD | 4 / 30 | 0.0019 | 2.5545e-004
GD | GD | 8 / 125 | 7.4678e-004 | 2.8731e-005
GD | GD | 8 / 10 | 0.0025 | 2.3893e-004
DE/rand/1/bin | GD | 4 / 10 | 0.0057 | 0.0283
DE/rand/1/bin | GD | 4 / 30 | 0.0050 | 0.0139
DE/rand/1/bin | GD | 8 / 10 | 6.6945e-004 | 0.0048
DE/rand/1/expo | GD | 4 / 10 | 0.0163 | 0.0365
DE/rand/1/expo | GD | 4 / 30 | 0.0224 | 0.0291
DE/rand/1/expo | GD | 8 / 10 | 2.9706e-004 | 0.0084
DE/best/1/bin | GD | 4 / 10 | 0.0125 | 0.0207
DE/best/1/bin | GD | 4 / 30 | 0.0067 | 0.0129
DE/best/1/bin | GD | 8 / 10 | 0.0034 | 0.0036
DE/best/1/expo | GD | 4 / 10 | 0.0010 | 1.8155e-004
DE/best/1/expo | GD | 4 / 30 | 0.0090 | 0.0133
DE/best/1/expo | GD | 8 / 10 | 0.0012 | 0.0071
DE/rand/3/bin | GD | 4 / 10 | 0.005 | 0.0290
DE/rand/3/bin | GD | 4 / 30 | 0.0041 | 0.0119
DE/rand/3/bin | GD | 8 / 10 | 3.1786e-004 | 0.0072
DE/rand/3/expo | GD | 4 / 10 | 0.0066 | 0.0276
DE/rand/3/expo | GD | 4 / 30 | 0.0054 | 0.0144
DE/rand/3/expo | GD | 8 / 10 | 3.5513e-004 | 0.0061
DE/best/3/bin | GD | 4 / 10 | 0.0049 | 0.0196
DE/best/3/bin | GD | 4 / 30 | 0.0061 | 0.0126
DE/best/3/bin | GD | 8 / 10 | 3.3399e-004 | 0.0100
DE/best/3/expo | GD | 4 / 10 | 0.0064 | 0.0213
DE/best/3/expo | GD | 4 / 30 | 0.0050 | 0.0119
DE/best/3/expo | GD | 8 / 10 | 5.5242e-004 | 0.0123
DE/current-to-best/1+1/bin | GD | 4 / 10 | 0.0062 | 0.0307
DE/current-to-best/1+1/bin | GD | 4 / 30 | 0.0053 | 0.0178
DE/current-to-best/1+1/bin | GD | 8 / 10 | 2.8258e-004 | 0.0052
DE/current-to-best/1+1/bin & DE/rand/1/bin | GD | 4 / 10 | 0.0350 | 0.0527
DE/current-to-best/1+1/bin & DE/rand/1/bin | GD | 4 / 30 | 0.0302 | 0.0384
DE/current-to-best/1+1/bin & DE/rand/1/bin | GD | 8 / 10 | 2.6240e-004 | 0.0027


The ANFIS structure applied here contains five inputs and different numbers of membership functions per input. We use 597/1000 data as training/test.

The results are illustrated in Table 2, which shows the results of the different methods so that they can be compared. Fig. 3 depicts the identification of the mentioned nonlinear system.

As the results suggest, the training error achieved with the ANFIS structure when DE trains the antecedent part parameters is better than when all parameters are trained with GD.

Fig. 3: Nonlinear dynamic system prediction. (a) Using DE to train the antecedent part parameters in the ANFIS structure. (b) Using GD to train the antecedent and the consequent part parameters in the ANFIS structure.

V. CONCLUSIONS

In this paper, a population-based optimization algorithm, the Differential Evolution algorithm, is proposed in order to train the antecedent part parameters of the ANFIS structure. In our method, we used a number of variations of the basic DE to update the antecedent part parameters. The simulation results indicate that, for complex nonlinear systems, the new approach gives better results than training all parameters with GD alone. Other algorithms, preferably those that have roots in nature, may also be employed in the ANFIS structure to help it reach the globally optimal solution. Since these algorithms are derivative-free, and derivatives are very difficult to calculate, their complexity for training the antecedent part parameters is lower than that of gradient-based training algorithms such as GD. Moreover, the amount of computation required by each algorithm shows that DE needs less computation than back propagation to achieve the same error goal. Using DE also avoids the local-minimum problem of the GD algorithm. The effectiveness of the proposed DE method was demonstrated by applying it to the identification of nonlinear systems.

VI. REFERENCES


[1] J. Zhang and A. J. Morris, "Recurrent neuro-fuzzy networks for nonlinear process modeling," IEEE Trans. Neural Networks, vol. 10, no. 2, pp. 313-326, Feb. 1999.

[2] C. H. Lee and C. C. Teng, "Identification and control of dynamic systems using recurrent fuzzy neural networks," IEEE Trans. Fuzzy Systems, vol. 8, no. 4, pp. 349-366, Aug. 2000.

[3] M. Sugeno and G. T. Kang, "Structure identification of fuzzy model," Fuzzy Sets and Systems, pp. 15-33, 1988.

[4] T. Takagi and M. Sugeno, "Fuzzy identification of systems and its application to modeling and control," IEEE Trans. Sys., Man & Cybernetics, pp. 116-132, 1985.

[5] R. Alcalá, J. Casillas, O. Cordón and F. Herrera, Learning TSK Rule-Based System from Approximate Ones by Means of MOGUL Methodology. University of Granada, Spain, Oct. 2000.

[6] M. Männle, FTSM: Fast Takagi-Sugeno Fuzzy Modeling. University of Karlsruhe, 1999.

[7] C. F. Juang, "A TSK-type recurrent fuzzy network for dynamic systems processing by neural network and genetic algorithms," IEEE Trans. Fuzzy Systems, vol. 10, no. 2, pp. 155-170, Apr. 2002.

[8] J.-S. R. Jang, "ANFIS: Adaptive network-based fuzzy inference system," IEEE Trans. Sys., Man & Cybernetics, vol. 23, no. 3, May-June 1993.

[9] J.-S. R. Jang, C.-T. Sun and E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice-Hall, Inc., 1997.

[10] A. P. Engelbrecht, Computational Intelligence: An Introduction. Second Edition, John Wiley & Sons, Ltd, 2007.

[11] R. R. Yager and L. A. Zadeh, Fuzzy Sets, Neural Networks, and Soft Computing. Van Nostrand Reinhold, 1994.

[12] M. Kumar and D. P. Garg, "Intelligent learning of fuzzy logic controllers via neural network and genetic algorithm," Proceedings of JUSFA 2004 Japan-USA Symposium on Flexible Automation, Denver, Colorado, 2004.

[13] O. Nelles, Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models. Springer, 2000.

[14] M. A. Shoorehdeli, M. Teshnehlab and A. K. Sedigh, "Training ANFIS as an identifier with intelligent hybrid stable learning algorithm based on particle swarm optimization and extended Kalman filter," Fuzzy Sets and Systems, vol. 160, pp. 922-948, 2009.

[15] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Trans. Neural Networks, vol. 1, pp. 4-27, Jan. 1990.

[16] C. J. Lin and Y. J. Xu, "A self-adaptive neural fuzzy network with group-based symbiotic evolution and its prediction applications," Fuzzy Sets and Systems, vol. 157, pp. 1036-1056, 2006.


[17] M. M. Gupta, L. Jin, and N. Homma, Static and Dynamic Neural Networks: From Fundamentals to Advanced Theory. John Wiley & Sons, Inc., 2003.

[18] R. Alcalá, J. Casillas, O. Cordón, F. Herrera, and S. J. I. Zwir, Techniques for Learning and Tuning Fuzzy Rule-Based Systems for Linguistic Modeling and their Application. E.T.S. de Ingeniería Informática, University of Granada, 18071 Granada, Spain, 1999.
