Constructing Dense Fuzzy Systems by Adaptive Scheduling of Optimization Algorithms

Krisztián Balázs
Department of Telecommunications and Media Informatics
Budapest University of Technology and Economics
Magyar tudósok körútja 2., Budapest, H-1117, Hungary
Email: [email protected]

László T. Kóczy
Department of Automation
Széchenyi István University
Egyetem tér 1., Győr, H-9026, Hungary
and
Department of Telecommunications and Media Informatics
Budapest University of Technology and Economics
Magyar tudósok körútja 2., Budapest, H-1117, Hungary
Email: [email protected]

Abstract—In this paper dense fuzzy rule based systems are constructed for solving machine learning problems. During the knowledge extraction process a scheduling approach is applied, which adaptively switches between the different optimization algorithms based on their convergence speed in the phases of the learning process, i.e. according to their respective local efficiency. The scheduled optimization techniques are evolutionary algorithms that have shown efficiency in the construction of dense fuzzy rule based systems in previous investigations. Simulation runs are executed on standard benchmark data sets in order to evaluate the established fuzzy rule based learning system and to compare it to fuzzy systems built up using the same optimization methods without the scheduling approach.

Index Terms—Adaptive scheduling of optimization algorithms, Fuzzy rule based knowledge extraction, Dense fuzzy systems

I. INTRODUCTION

Because of their advantages, the scope of intelligent technical applications based on soft-computing methods is continuously expanding in those fields where sub-optimal solutions may be accepted. As a consequence, the use of fuzzy rule based learning and inference systems as intelligent components of complex systems is also growing. Nevertheless, both theory and application still contain many unsolved questions. Thus, researching the theory and applicability of such systems is both an important and timely task.

When performing knowledge extraction (or machine learning) [1], the resulting model is usually selected from a pool of possible models by finding the optimal one from a multitude of candidates. This means that the parameters of a parameterizable modeling architecture are adjusted to the values for which the architecture imitates the characteristics of the involved data set most closely. This way the learning problem can be formulated as an optimization task, and can be solved by optimization methods.

In this paper a representative dense fuzzy architecture, together with one of the most widely applied fuzzy inference techniques, the Mamdani-inference [2], was chosen as the model. For optimization, the efficient Bacterial Evolutionary Algorithm (BEA) [3] and its memetic extension were applied.

Our recent work [4] proposed an approach for combining algorithms by adaptively scheduling them during an optimization process in order to improve the convergence to the optimum.

The aim of this paper is to propose a new, improved scheduling method for the adaptive approach and to apply it in the knowledge extraction processes performed by the fuzzy system. The efficiency of the newly proposed scheduler with respect to the fuzzy learning system will be evaluated experimentally, based on simulation runs carried out on standard benchmark problem sets.

The next section gives a brief overview of the algorithms and techniques used. After that, a new scheduling method is proposed in Section III. Then, the machine learning data sets applied in the simulations will be enumerated, followed by the discussion of the simulation results and the observed behavior in Section IV. Finally, in the last section we summarize our presented research and draw some conclusions.

II. OVERVIEW OF THE TECHNIQUES AND ALGORITHMS

In order to carry out this investigation, it is necessary to be familiar with some basic notions of fuzzy rule based inference systems, numerical optimization and machine learning.

The following subsections aim to give a brief overview of some important points of these theoretical aspects, which will be referred to repeatedly later in the paper.

A. Supervised machine learning

Supervised machine learning is a way of extracting knowledge. A system (or phenomenon) is given, characterized by a data set (samples), together with a modeling architecture with adjustable parameters. The samples consist of input-output pairs $(x_i, d_i)_{i=1}^{n}$ corresponding to the system.


The task of supervised machine learning is to adjust the parameters of the modeling architecture to make it similar in ‘behavior’ to the system. The ‘behavior’ of the model is characterized by input-output pairs, where the inputs $\{x_i\}_{i=1}^{n}$ are the same as those of the system, and the outputs $\{y_i\}_{i=1}^{n}$ are the corresponding responses of the modeling architecture. The degree of similarity between the behaviors is measured by a predefined error function, which is a function of the parameters of the model. For example, Mean Squared Error is a commonly applied error definition:

$$\frac{1}{n}\sum_{i=1}^{n}(y_i - d_i)^2 \qquad (1)$$

The higher the degree of similarity, the lower the value of the error function. Hence the task of supervised machine learning is actually to find the parameter values for which the error is minimal. This way supervised machine learning problems can be reduced to optimization problems.
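
To make this reduction concrete, the following minimal Python sketch (not from the paper; the linear model and all names are hypothetical) evaluates the Mean Squared Error of Eq. (1) for a parameterized model, so that learning becomes the minimization of a function of the parameters:

```python
import numpy as np

def mse(params, model, xs, ds):
    """Mean Squared Error of Eq. (1): the average of (y_i - d_i)^2."""
    ys = np.array([model(params, x) for x in xs])
    return float(np.mean((ys - ds) ** 2))

# Hypothetical modeling architecture: y = a*x + b with params = [a, b].
def linear_model(params, x):
    a, b = params
    return a * x + b

xs = np.array([0.0, 1.0, 2.0, 3.0])
ds = 2.0 * xs + 1.0                # samples generated by y = 2x + 1
print(mse(np.array([2.0, 1.0]), linear_model, xs, ds))  # 0.0: perfect fit
```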

Usually, the samples are divided into at least two parts. If they are separated into two sets, then the first set (training samples) is used during the learning process and involved in the adjustment of the parameters of the model through the error function. The second set (test samples) is applied after the learning in the evaluation of the accuracy of the obtained model.

B. Fuzzy rule based inference systems

The structure of a fuzzy rule based inference system and the forms of fuzzy rules are well known from the literature. The idea was first proposed in [5] and then adopted in a more easily computable form in [2].

The fuzzy inference method considered in this paper is Mamdani-inference [2].
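
Since the rule bases constructed later in this paper (Section II-E) use trapezoidal membership functions, a minimal sketch of such a function may be helpful; this is an illustration, not the authors' code, and it assumes strictly ordered breakpoints a < b <= c < d:

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in a trapezoidal fuzzy set with
    breakpoints a < b <= c < d (support [a, d], core [b, c])."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (d - x) / (d - c)       # falling edge

print(trapezoid(1.5, 0.0, 1.0, 2.0, 3.0))  # 1.0: inside the core
print(trapezoid(0.5, 0.0, 1.0, 2.0, 3.0))  # 0.5: on the rising edge
```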

C. Numerical optimization

Numerical optimization [6] is a process where the (global) optimum of an objective function $f_{obj}(\mathbf{p})$ is being searched for by choosing the proper variable (or parameter) vector $\mathbf{p}$. The optimum can be the maximum or the minimum of the objective function, depending on the formulation of the problem.

There are several deterministic techniques as well as stochastic algorithms for optimization. Some of them will be presented below; these are the ones that were investigated in our work.

1) Gradient based methods: Gradient based methods form a family of iterative deterministic techniques. These methods, like the Steepest Descent algorithm [6] and the Levenberg-Marquardt technique [7], [8], calculate the gradient of the objective function at the current point and use it to step towards better values (greater if the maximum, smaller if the minimum is being searched) by modifying $\mathbf{p}$. In case of advanced algorithms, additional information about the objective function may also be applied during the iterations.

One of the most frequently used methods of this type is the steepest descent algorithm (SD) [6]. Each of its iterations contains the following steps: the gradient vector is computed, it is multiplied by a so-called bravery factor, and finally it is added to (in case of searching for the maximum) or subtracted from (in case of searching for the minimum) the current position to obtain the new position. If the gradient vector function is given, the vector can be obtained by calling this function, otherwise by a pseudo-gradient computation.
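
A minimal Python sketch of such an SD iteration follows (an illustration, not the authors' implementation; the pseudo-gradient here is a simple central finite difference):

```python
import numpy as np

def pseudo_gradient(f, p, eps=1e-6):
    """Central finite-difference gradient, used when no gradient is given."""
    g = np.zeros_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = eps
        g[i] = (f(p + step) - f(p - step)) / (2 * eps)
    return g

def steepest_descent(f, p0, bravery=0.1, iters=100, grad=None):
    """Minimize f: subtract bravery * gradient in every iteration."""
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        g = grad(p) if grad is not None else pseudo_gradient(f, p)
        p = p - bravery * g        # '+' here would search for a maximum
    return p

# Quadratic bowl with minimum at (3, 3): SD converges close to it.
print(steepest_descent(lambda p: np.sum((p - 3.0) ** 2), [0.0, 0.0]))
```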

A more advanced and effective technique is the Levenberg-Marquardt algorithm (LM) [7], [8]. The steps applied during the iterations are based on a Jacobian matrix. Each row of this matrix contains the gradient vector of one of the so-called residual functions, whose sum of squares is being minimized. If the Jacobian matrix computing function is given, the matrix can be obtained by calling this function, otherwise by a pseudo-Jacobian computation.
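
One LM iteration can be sketched as follows (a hedged illustration with a fixed damping factor lam; practical implementations adapt the damping between iterations):

```python
import numpy as np

def lm_step(residuals, jacobian, p, lam=1e-2):
    """One Levenberg-Marquardt step minimizing the sum of squared
    residuals: p_new = p - (J^T J + lam*I)^(-1) J^T r."""
    r = residuals(p)               # vector of residual values at p
    J = jacobian(p)                # one row per residual: its gradient
    A = J.T @ J + lam * np.eye(len(p))
    return p - np.linalg.solve(A, J.T @ r)

# Hypothetical example: fit y = a*x to the points (1, 2) and (2, 4).
res = lambda p: np.array([p[0] * 1.0 - 2.0, p[0] * 2.0 - 4.0])
jac = lambda p: np.array([[1.0], [2.0]])
print(lm_step(res, jac, np.array([0.0])))  # one step moves a close to 2
```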

After a proper number of iterations, as a result of the gradient steps, these algorithms find the nearest local optimum quite accurately. However, they are very sensitive to the location of the starting point. In order to find the global optimum, the starting point must be located close enough to it, in the sense that no local optima separate these two points.

2) Evolutionary computation methods: Evolutionary algorithms form a family of iterative stochastic techniques. These methods, like the Genetic Algorithm (GA) [9] or the Bacterial Evolutionary Algorithm (BEA) [3], imitate the abstract model of evolution observed in nature. Their aim is to change the individuals in the population by the evolutionary operators (selection, crossover, mutation and substitution in case of GA; bacterial mutation and gene transfer in case of BEA) to obtain better and better ones. The goodness of an individual can be measured by its fitness. If an individual represents a solution for a given problem, the algorithms try to find the optimal solution for the problem. Thus, in numerical optimization the individuals are potentially optimal parameter vectors and the fitness function is a transformation of the objective function. If an evolutionary algorithm uses an elitist strategy, the best individual so far will always survive and appear in the next generation. As a result, when the algorithm stops, the best individual will hold the (quasi-)optimal values for $\mathbf{p}$, i.e. the best individual will represent the (quasi-)optimal parameter vector.
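
As an illustration, the bacterial mutation operator of BEA can be sketched roughly as below (a common gene-by-gene variant; the clone count, mutation distribution and segment selection are assumptions, not necessarily the exact operator of [3]):

```python
import numpy as np

rng = np.random.default_rng(0)

def bacterial_mutation(individual, fitness, n_clones=5, sigma=0.1):
    """Bacterial mutation: for each gene, perturb it in several clones
    and keep the best variant; the unmutated original also competes,
    so the result never gets worse (an elitist improvement)."""
    best = individual.copy()
    for gene in range(len(best)):
        clones = np.tile(best, (n_clones, 1))
        clones[:, gene] += rng.normal(0.0, sigma, n_clones)
        candidates = np.vstack([best, clones])
        best = max(candidates, key=fitness)
    return best

fit = lambda p: -float(np.sum(p ** 2))   # toy fitness, optimum at 0
print(bacterial_mutation(np.array([1.0, -2.0]), fit))
```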

3) Memetic algorithms: Evolutionary computation techniques explore the whole objective function, because of their characteristics, so they find the global optimum but approach it slowly, while gradient based algorithms find only the nearest local optimum but converge to it faster.

To avoid the disadvantages of the two different technique types, evolutionary algorithms and gradient based methods may be combined (e.g. [10] and [11]), for example by applying some gradient steps to each individual in each iteration. Expectedly, this way the advantages of both gradient and evolutionary techniques can be exploited: the local optima can be found quite accurately over the whole objective function, i.e. the global optimum can be approximated well.
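
A skeleton of one such memetic generation might look as follows (an illustrative sketch with placeholder operators; in BMA the evolutionary step would be the bacterial operators described above):

```python
import numpy as np

def memetic_generation(population, fitness, evolve, local_step, k=4):
    """One memetic generation: apply the evolutionary operators for
    global exploration, then refine each individual with k
    gradient-based local steps (local exploitation)."""
    population = evolve(population)
    refined = []
    for ind in population:
        for _ in range(k):
            ind = local_step(ind)
        refined.append(ind)
    return sorted(refined, key=fitness, reverse=True)

pop = [np.array([3.0]), np.array([-2.0])]
fit = lambda p: -float(p[0] ** 2)        # toy fitness, optimum at 0
evolve = lambda population: population   # placeholder for the EA operators
step = lambda p: p - 0.2 * (2.0 * p)     # one gradient step on f(p) = p^2
print(memetic_generation(pop, fit, evolve, step))
```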

There are several results in the literature confirming this expectation in the following respect. Usually, the more sophisticated the applied gradient step is, the higher the convergence speed of the algorithm in terms of the number of generations. It must be emphasized that most often these results discuss the convergence speed in terms of the number of generations. However, the more sophisticated an algorithm is, the greater its computational demand, i.e. each iteration takes longer.

Therefore the question arises: how does the speed of the convergence change in terms of time if the gradient based technique applied in the method is changed?

This is clearly a very important question of applicability, because in real world applications time is a very important and expensive resource, whereas the number of generations the algorithm executes does not really matter.

This is the reason why efficiency in terms of time was chosen to be investigated in this paper.

D. Adaptive scheduling of optimization algorithms

The problem of adaptive scheduling of optimization algorithms was introduced recently in [4]. Its motivation and goal are the following.

As a rule of thumb it can be said that simpler algorithms are suitable for simpler optimization tasks, and more complex optimization techniques for more difficult ones. An illustrative, exaggerated example is the minimization of a simple one dimensional quadratic function versus that of a thousand dimensional, stochastically strongly perturbed, multimodal discontinuous surface that does not have a closed analytic form even without the stochastic perturbation. Whereas in the first case the exact minimum can be found analytically in a few steps, well known from calculus, in the second case the exact minimum is certainly unreachable, and efficient global optimization techniques like EAs are viable only together with long execution times.

However, a similar dichotomy can be observed when a sufficiently difficult optimization problem is solved. Namely, at the beginning of the optimization process simpler methods converge faster than more compound ones, because the former techniques have lower computational demands and, since the initial points within the search space usually have low quality, it is easy to reach better and better candidate solutions iteration by iteration. However, in the longer term the higher computational demand of more compound algorithms becomes a small drawback compared to the advantage coming from their higher improvement power. Thus, it heuristically follows that for more difficult problems, in the longer term, the best more complex optimization approaches can outperform the best simpler ones. Indeed, there are many simulation results in the literature confirming this heuristic (e.g. [12], [13] and [14]).

For example, Fig. 1 shows the characteristics, i.e. the convergence speeds in terms of fitness level, of BEA and the Bacterial Memetic Algorithm (BMA) during a fuzzy rule based learning process [4].

Fig. 1. Characteristics of BEA and BMA during an example optimization process.

After this observation, the idea arises to use a simpler method in the early parts of an optimization process and, when a more complex technique becomes more efficient, to switch between the algorithms and continue with the execution of the more compound one. Moreover, this idea can be generalized to the approach of always (i.e. in each iteration during the optimization) applying the currently most efficient technique, that is, to adaptively schedule the optimization algorithms during the whole optimization process.

For this scheduling problem two scheduling strategies (schedulers) were defined [4].

1) Greedy Scheduler: The Greedy scheduler executes all the algorithms simultaneously, and after each iteration (or time slot) the currently best candidate solution is selected according to its objective function value. In case of population based techniques (like BEA or BMA) the best population is selected based on some quality measure, e.g. the fitness value of the best individual. In the subsequent iteration (or time slot) every algorithm is initialized to the selected candidate solution or population and executed simultaneously. Then another comparison is performed, and so forth.
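
In pseudocode-like Python (the parallel execution is simulated serially here; the `algorithms` are functions advancing a candidate solution or population by one time slot, and all names are hypothetical):

```python
import copy

def greedy_scheduler(algorithms, state, quality, total_slots):
    """Greedy scheduler: in every time slot run all algorithms from
    the same starting state and keep the result of best quality."""
    for _ in range(total_slots):
        candidates = [alg(copy.deepcopy(state)) for alg in algorithms]
        state = max(candidates, key=quality)   # winner of this slot
    return state

# Two toy 'algorithms' shrinking a scalar state towards 0.
fast = lambda s: s * 0.5
slow = lambda s: s * 0.9
print(greedy_scheduler([fast, slow], 8.0, lambda s: -abs(s), 5))  # 0.25
```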

The seemingly hopeless drawback of this simple scheduler is its huge overhead, originating from the parallel execution of all algorithms during the whole optimization process.

2) Fast Greedy Scheduler: Sometimes the optimization problem has favorable properties and particular optimization algorithms can perform best during long periods. In this case the number of switches between the algorithms in an optimal schedule may be low.

In order to exploit this advantage of such problems, another version of the above discussed Greedy scheduler was proposed. If the scheduling problem has the mentioned favorable property, this scheduling method is faster than the previous one.

The Fast greedy scheduler does not compare the optimization algorithms in each step, i.e. it does not execute them simultaneously all the time, but only after a predefined ‘blind running’ time, during which it applies only the most recently best algorithm.

This way the mentioned drawback of the previous scheduler can be eased.
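
The same toy setting as above illustrates the idea (again a serial simulation; phase lengths are measured here in abstract slots rather than seconds):

```python
import copy

def fast_greedy_scheduler(algorithms, state, quality,
                          blind_slots=4, rounds=3):
    """Fast greedy scheduler: a short comparing phase picks the locally
    best algorithm, which then runs alone for a blind running phase."""
    for _ in range(rounds):
        # Comparing phase: one slot of every algorithm, same start state.
        candidates = [(alg, alg(copy.deepcopy(state))) for alg in algorithms]
        best_alg, state = max(candidates, key=lambda c: quality(c[1]))
        # Blind running phase: only the winner is executed.
        for _ in range(blind_slots):
            state = best_alg(state)
    return state

fast = lambda s: s * 0.5
slow = lambda s: s * 0.9
print(fast_greedy_scheduler([fast, slow], 8.0, lambda s: -abs(s)))
```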


E. Establishing fuzzy systems involving the scheduling approach

This section first explains the application of the described bacterial optimization techniques during the construction of fuzzy knowledge bases within a supervised machine learning process. Then, the recently proposed scheduler is introduced into the process.

1) Applying the bacterial techniques in fuzzy systems: As explained in the previous section, BEA represents the candidate solutions of an optimization problem in the form of numerical vectors. In case of machine learning tasks the components of these vectors correspond to the parameters of the modeling architecture. In fuzzy systems these parameters characterize the rule bases. In the present work the rule bases always contain trapezoidal membership functions in every rule, and the number of fuzzy rules in the knowledge bases is predefined. Therefore, it is natural to characterize the rule bases by the breakpoints (vertices) of the trapezoids, i.e. to assign these points to the components of the vectors represented by the individuals of BEA [11]. Thus, the numerical vectors simply collect all the breakpoints of every trapezoid from each fuzzy rule.
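
A possible flat encoding is sketched below (the exact layout, here one consequent trapezoid per rule next to the antecedent ones, is an assumption consistent with this description, not necessarily the authors' ordering):

```python
import numpy as np

def decode_rule_base(vector, n_rules, n_dims):
    """Unpack a flat BEA individual into trapezoid breakpoints:
    each rule has n_dims antecedent trapezoids plus 1 consequent
    trapezoid, and each trapezoid has 4 breakpoints (a, b, c, d)."""
    return np.asarray(vector).reshape(n_rules, n_dims + 1, 4)

# Hypothetical sizes: 4 rules, 9-dimensional input (the Stock data set).
vec = np.arange(4 * (9 + 1) * 4, dtype=float)
print(decode_rule_base(vec, 4, 9).shape)   # (4, 10, 4)
```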

2) Adaptively scheduling the bacterial methods: After the two bacterial techniques are adapted, it is straightforward to introduce adaptive scheduling into the optimization process. Since the scheduler is a meta-optimizer, it has no direct connection to the underlying fuzzy system. The scheduler controls from the top; it only monitors the bacterial methods, compares them and switches between them.

III. IMPROVEMENT FOR FAST GREEDY SCHEDULER

In this section an improvement is proposed for the Fast Greedy Scheduler.

If it can be assumed that the characteristics, i.e. the convergence speeds in terms of fitness level, are monotonically decreasing functions, then the following straightforward improvement can be applied to the Fast Greedy Scheduler, resulting in the approach called Monotonic Fast Greedy Scheduler (MFGS).

After each blind running phase the convergence speed of the active algorithm is compared to the last calculated convergence speeds of the inactive ones, which are the values calculated at the end of the last performed comparing phase. If the convergence speed of the active method is still greater than or equal to all the other values, then at the current fitness level the active algorithm must still be the most efficient one, due to the assumed monotonicity of the characteristics. In this case the subsequent comparing phase is postponed and the blind running continues. During the remaining part of this lengthened phase the convergence speed is continuously compared to the last calculated convergence speeds of the inactive methods. When the efficiency of the active algorithm decreases below the highest last measured efficiency level of the inactive methods, the postponed comparing phase takes place, and the algorithms are again executed simultaneously for a short period.
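
The postponement rule itself reduces to a single comparison, sketched here in Python (an illustration of the decision logic only; in practice the speeds would be measured as fitness improvement per second):

```python
def mfgs_should_compare(active_speed, cached_inactive_speeds):
    """MFGS rule: keep blind running while the active algorithm's
    current convergence speed is at least the best speed last
    measured for any inactive algorithm (characteristics assumed
    to be monotonically decreasing in the fitness level)."""
    return active_speed < max(cached_inactive_speeds)

# Active still improving at 0.8 fitness/s; inactive peaked at 0.5.
print(mfgs_should_compare(0.8, [0.5, 0.3]))  # False: postpone comparison
print(mfgs_should_compare(0.4, [0.5, 0.3]))  # True: start comparing phase
```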

With this improvement, MFGS clearly reduces the comparing overhead of the schedulers further.

IV. EXPERIMENTAL ANALYSIS

Since the main capabilities of the established systems, namely the speed of the learning process they perform and the accuracy of the resulting knowledge base, are too difficult to determine analytically, they were evaluated based on the results of the simulation runs discussed in this section.

After defining the involved machine learning problems and enumerating the parameters of the optimization processes, the results of the experiments and their explanation will be presented.

A. Involved data sets

The following two machine learning data sets from the well-known KEEL machine learning data set repository [15] were involved in the experiments.

1) Stock prices data set: In case of the nine dimensional Stock prices data set (Stock) the goal is to predict the daily stock prices of an aerospace company based on the stock prices of nine concurrent aerospace companies.

The data set contains 950 instances (from January 1988 through October 1991), of which 237 are used as training samples and 713 as test samples in the simulations.

2) Treasury data set: The fifteen dimensional Treasury data set (Treasury) aims at predicting the so-called one month certificate of deposit rates based on fifteen economic features.

There are 1049 instances in the data set (corresponding to the weeks from April 1, 1980 through April 2, 2000), of which 349 are used as training samples and 700 as test samples in the simulations.

B. Technical details of the optimization processes

In the simulations the parameters had the same values as in our previous works (see [13] and [14]), because in those investigations, after a number of preliminary test runs, these values had seemed to be the most suitable.

The number of individuals in a generation was 8 in both the BEA and BMA algorithms. The maximum number of rules in the rule base was 4. The number of clones was 5, and 4 gene transfers were carried out in each generation. In the memetic technique the local search methods executed 4 iterations in each generation.

MFGS compared the efficiency of the different algorithms during 125 second long comparing phases in order to have a meaningful comparison, since the fitness values may not increase during a long period. The length of the blind running phases was set to 500 seconds.

The optimization processes stopped after reaching the time limit, which was uniformly 2000 seconds.

At the end of each optimization process the accuracy of the resulting rule bases was measured using the Mean Squared Error (MSE).

The number of generations executed during the learning process was also observed.


In case of all algorithms, for each parameterization 8 runs were carried out for every learning problem, and then the mean of the obtained values was taken.

During the runs the fitness values of the best individuals were monitored in terms of time. These fitness values were calculated based on the MSE values measured on the training samples (in contrast with the final errors, which were measured on the test samples) as follows:

$$F = \frac{10}{MSE + 1} = \frac{10m}{\sum_{i=1}^{m}(d_i - y_i)^2 + m}. \qquad (2)$$

The purpose of this definition is to obtain a fitness function that is strictly monotonically decreasing in the error, takes its values from the interval [0, 10], and differentiates more strongly among candidate solutions giving lower error values.
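
For completeness, Eq. (2) in code (a direct transcription; m is the number of training samples):

```python
def fitness(sq_error_sum, m):
    """Fitness of Eq. (2): F = 10 / (MSE + 1) = 10m / (sum + m),
    where sq_error_sum is the sum of squared training errors."""
    return 10.0 * m / (sq_error_sum + m)

print(fitness(0.0, 237))    # 10.0: zero training error (Stock data set)
print(fitness(237.0, 237))  # 5.0: a training MSE of exactly 1
```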

The means of the fitness values of the best individuals during the runs are presented in figures (Fig. 2 and Fig. 3) to give a better overview. The horizontal axes show the elapsed computation time in seconds, whereas the vertical axes show the fitness values of the best individuals at the current time.

In the figures the dashed lines show the results of BEA, the dotted lines present the performance of BMA, whereas the solid lines show the fitness values given by the scheduling approach.

C. Simulation results

The results for the learning problems are shown in Fig. 2 and Fig. 3.

Fig. 2. Fitness values for the Stock prices data set. (9 dimensions)

Observing the results given by the different fuzzy systems, perhaps the most obvious fact is that MFGS was outperformed in the nine dimensional case; however, for the fifteen dimensional problem MFGS was the most efficient. The explanation of these results is the following.

Fig. 3. Fitness values for the Treasury data set. (15 dimensions)

In case of the simpler problem it is not worth using the scheduling technique, because its overhead deteriorates the advantage gained by the possibility of adaptively switching to the currently best performing optimization algorithm.

However, in case of the more difficult problem the advantage of adaptively switching is higher than the disadvantage of the overhead originating from the comparing requirements, and thus the fuzzy system involving the scheduling approach outperforms the other ones.

Looking at the results, one might presume that BMA was overwhelmed, in contrast with the conclusions of previous comparative works (cf. [13] and [14]). However, after a careful study it can be noticed that at the end of the optimization processes the fitness curve of BMA is the steepest, foreshadowing an intersection with the curve of BEA. Indeed, in previous works BMA showed a “slowly but surely” behavior, and if BMA had enough time to run, it outperformed the other techniques.

V. CONCLUSIONS

The goal of this paper was to propose a new, improved scheduling method for the Adaptive Scheduling of Optimization Algorithms approach and to apply it in fuzzy rule based knowledge extraction processes in order to solve supervised machine learning problems. The scheduling was performed between the Bacterial Evolutionary Algorithm and its memetic variant, the Bacterial Memetic Algorithm, since according to previous studies (e.g. [13] and [14]) these are efficient techniques for constructing fuzzy systems.

Simulation runs were carried out on two data sets in order to compare the performance achieved by the application of the scheduling approach to the abilities of the bacterial methods.

The experiments showed that in case of the simpler problem it was not worth using the scheduling technique, because its overhead deteriorated the advantage gained by the possibility of adaptively switching to the currently best performing optimization algorithm.


However, in case of the more difficult problem the advantage of adaptive switching was higher than the disadvantage of the overhead originating from the comparing requirements, and thus the fuzzy system involving the scheduling approach outperformed the other ones.

The obtained simulation results are similar to the ones obtained when the adaptive scheduling was applied to interpolative fuzzy systems [16].

Further research aims at improving the scheduling algorithm and at applying the approach to other architectures, such as hierarchical and hierarchical-interpolative fuzzy systems.

ACKNOWLEDGMENT

This paper was supported by the National Scientific Research Fund Grants OTKA K75711 and OTKA K105529, a Széchenyi István University Main Research Direction Grant, and the Social Renewal Operation Programmes TÁMOP-4.2.2 08/1-2008-0021 and 421 B.

REFERENCES

[1] E. Alpaydin, “Introduction to Machine Learning” (The MIT Press, 2004) 445 p.

[2] E. H. Mamdani, “Application of fuzzy algorithms for control of simple dynamic plant”, IEEE Proc., 121, no. 12 (1974) pp. 1585–1588.

[3] N. E. Nawa and T. Furuhashi, “Fuzzy system parameters discovery by bacterial evolutionary algorithm”, IEEE Transactions on Fuzzy Systems, 7, no. 5 (1999) pp. 608–616.

[4] K. Balázs and L. T. Kóczy, “A Remark on Adaptive Scheduling of Optimization Algorithms”, International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU 2010, Dortmund, Germany, 2010, pp. 719–728.

[5] L. A. Zadeh, “Outline of a new approach to the analysis of complex systems and decision processes”, IEEE Tr. Systems, Man and Cybernetics, SMC-3 (1973) pp. 28–44.

[6] J. A. Snyman, “Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms” (Springer, New York, 2005).

[7] K. Levenberg, “A method for the solution of certain non-linear problems in least squares”, Quart. Appl. Math., 2, no. 2 (1944) pp. 164–168.

[8] D. Marquardt, “An algorithm for least-squares estimation of nonlinear parameters”, J. Soc. Indust. Appl. Math., 11, no. 2 (1963) pp. 431–441.

[9] J. H. Holland, “Adaption in Natural and Artificial Systems”, The MIT Press, Cambridge, Massachusetts, 1992.

[10] P. Moscato, “On evolution, search, optimization, genetic algorithms and martial arts: Towards memetic algorithms”, Technical Report, Caltech Concurrent Computation Program, Report 826, California Institute of Technology, Pasadena, USA, 1989.

[11] J. Botzheim, C. Cabrita, L. T. Kóczy and A. E. Ruano, “Fuzzy rule extraction by bacterial memetic algorithms”, Proceedings of the 11th World Congress of the International Fuzzy Systems Association, IFSA 2005, Beijing, China, 2005, pp. 1563–1568.

[12] K. Balázs, J. Botzheim and L. T. Kóczy, “Comparison of Various Evolutionary and Memetic Algorithms”, Proceedings of the International Symposium on Integrated Uncertainty Management and Applications, IUM 2010, Ishikawa, Japan, 2010, pp. 431–442.

[13] K. Balázs, J. Botzheim and L. T. Kóczy, “Comparative Analysis of Interpolative and Non-interpolative Fuzzy Rule Based Machine Learning Systems Applying Various Numerical Optimization Methods”, World Congress on Computational Intelligence, WCCI 2010, Barcelona, Spain, 2010, pp. 875–982.

[14] K. Balázs and L. T. Kóczy, “Constructing Dense, Sparse and Hierarchical Fuzzy Systems by Applying Evolutionary Optimization Techniques”, Applied and Computational Mathematics, 11, no. 1 (2012) pp. 81–101.

[15] J. Alcalá-Fdez, A. Fernández, J. Luengo, J. Derrac, S. García, L. Sánchez and F. Herrera, “KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework”, Journal of Multiple-Valued Logic and Soft Computing, 17, no. 2–3 (2011) pp. 255–287.

[16] K. Balázs and L. T. Kóczy, “Adaptive Scheduling of Optimization Algorithms in the Construction of Interpolative Fuzzy Systems”, IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2013), 2013 (submitted paper).
