[ieee 2007 ieee international fuzzy systems conference - london, uk (2007.07.23-2007.07.26)] 2007...

Increasing Diagnostic Accuracy by Meta Optimization of Fuzzy Rule Bases

Mario Drobics, Janos Botzheim and Laszlo T. Koczy

Abstract- In medicine, the decision on which test to choose for a given decision problem is a delicate one. On the one hand, a positive test should be a reliable indicator of the presence of a disease, while on the other hand a negative test is required to be an indicator of the absence of a disease. Of course, these two goals are conflicting, and a balanced decision according to the current situation is required.

Inductive learning methods for (fuzzy) rule bases are, however, typically not capable of optimizing such complex and problem-dependent goal functions. We therefore present a meta-learning algorithm which selects a subset from a previously generated set of fuzzy rules using bacterial evolutionary algorithms. We also present a study where the proposed method is used to generate a model for predicting the presence/absence of hepatitis, based on laboratory results.

I. INTRODUCTION

When creating models for medical decision problems, special care has to be taken to fulfill certain domain-specific quality criteria. Typically, the quality of a medical test is described in terms of specificity and sensitivity. While a high specificity indicates a low rate of type-I errors (false positives), a sensitive test will have fewer type-II errors (false negatives). Depending on the risks of a false diagnosis, the final test will be designed to favor specificity or sensitivity.

As fuzzy rule bases are capable of fulfilling requirements regarding interpretability and accuracy, they are often used in medical applications where expert knowledge is not explicitly available, but knowledge of the resulting model is essential. Most methods for learning fuzzy rule bases (or decision trees), however, choose a stepwise approach to construct the rule base. Typically, decisions are based on local criteria like entropy gain or confidence and support. When combining different rules into a rule base, however, it has to be considered how these rules interact. Furthermore, it is usually not possible to define the desired goal function freely. Currently, only a few approaches like genetic programming [1] are capable of dealing with different optimization criteria. These approaches, however, are usually very complex and time consuming, as the search space is likely to become very large.

Mario Drobics is with the Section on Medical Expert and Knowledge-Based Systems, Core Unit for Medical Statistics and Informatics, Medical University Vienna, A-1090 Vienna, Austria (email: mario.drobics@meduniwien.ac.at).

Janos Botzheim is with the Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, H-1117 Budapest, Hungary (email: [email protected]).

Laszlo T. Koczy is with the Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, H-1117 Budapest, Hungary & Institute of Information Technology and Electrical Engineering, Szechenyi Istvan University, H-9026 Gyor, Hungary (email: [email protected]).

We overcome these limitations by splitting the construction of the rules and the construction of the final rule base. Namely, we first construct a large set of rules, where all rules fulfill only minimal requirements in terms of confidence (i.e. predictive accuracy) and support (i.e. coverage). Then we select a much smaller subset of these rules using a bacterial evolutionary algorithm (BEA). As the goal function can be defined freely in the BEA, we finally obtain a rule set which fits the stated requirements. Comparable approaches using genetic algorithms have been presented in [2] and [3]. The main disadvantage of these approaches is their complexity, caused by the use of GAs and a binary coding.

Bacterial evolutionary algorithms are simpler than genetic algorithms, and it is possible to reach lower error levels within a short time. They comprise two operations inspired by the microbial evolution phenomenon: the bacterial mutation operation, which optimizes the chromosome of one bacterium, and the gene transfer operation, which transfers information between different bacteria within the population. BEAs have already been successfully applied to rule learning [4] and feature selection [5].

In this paper we will first discuss the general problem of rule selection in section II. Then, in section III, the bacterial evolutionary algorithm is introduced and we show how this method can be applied to the problem of rule selection. In section IV a case study on the diagnosis of hepatitis is presented to illustrate the potential of this new approach.

II. PROBLEM STATEMENT

Especially in medicine, it is indispensable to use highly accurate decision models. Typically, the requirements on such a model are formulated in terms of sensitivity, specificity, or positive and negative predictive values. These values can be easily obtained from the prediction matrix, where for each case the actual condition and the predicted diagnosis are compared. As null predictions (cases for which no prediction is made) might occur for rule-based models, we have to take them into account, too.

                     Condition
                     pos           neg
Test      pos        truePos       falsePos      posVals
outcome   neg        falseNeg      trueNeg       negVals
          Null       nullPos       nullNeg
                     sensitivity   specificity

1-4244-1210-2/07/$25.00 © 2007 IEEE.

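$$\operatorname{sens}(f, S^*) = \frac{\text{truePos}}{\text{truePos} + \text{falseNeg} + \text{nullPos}},$$

$$\operatorname{spec}(f, S^*) = \frac{\text{trueNeg}}{\text{trueNeg} + \text{falsePos} + \text{nullNeg}},$$

$$\operatorname{posVals}(f, S^*) = \frac{\text{truePos}}{\text{truePos} + \text{falsePos}},$$

$$\operatorname{negVals}(f, S^*) = \frac{\text{trueNeg}}{\text{trueNeg} + \text{falseNeg}}.$$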
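The four measures can be computed directly from the counts of the prediction matrix. The following is a minimal sketch; the class name `Confusion`, the function names, and the convention of returning 0.0 for an empty denominator are assumptions made for illustration and are not part of the original method.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from the prediction matrix, including null predictions."""
    true_pos: int
    false_pos: int
    false_neg: int
    true_neg: int
    null_pos: int = 0  # positive cases for which no prediction was made
    null_neg: int = 0  # negative cases for which no prediction was made


def _ratio(num: float, den: float) -> float:
    # Assumption: an undefined ratio (empty denominator) is reported as 0.0.
    return num / den if den else 0.0


def sensitivity(c: Confusion) -> float:
    # Null predictions on positive cases count against sensitivity.
    return _ratio(c.true_pos, c.true_pos + c.false_neg + c.null_pos)


def specificity(c: Confusion) -> float:
    # Null predictions on negative cases count against specificity.
    return _ratio(c.true_neg, c.true_neg + c.false_pos + c.null_neg)


def pos_vals(c: Confusion) -> float:
    return _ratio(c.true_pos, c.true_pos + c.false_pos)


def neg_vals(c: Confusion) -> float:
    return _ratio(c.true_neg, c.true_neg + c.false_neg)
```

Note that null predictions only enter the denominators of sensitivity and specificity, so a rule base that leaves many cases unclassified is penalized in exactly these two measures.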

These parameters can now be used to formulate minimal requirements on the decision model, as well as an overall quality criterion which might be used to identify the optimal model.

When learning fuzzy rule bases from data, the problem arises that, at the time an individual rule is created, we do not know how the overall rule base will behave. Therefore, local quality measures like confidence and support are used to decide which rules are created. The support of a rule $A \to C$ is, in the fuzzy case, defined as the sum over all samples of the degree to which both the antecedent $A$ and the consequent $C$ are fulfilled, according to:

$$\operatorname{supp}(A \to C) = \sum_{x \in X} T(A(x), C(x)),$$

where $X$ is the set of samples and $T$ is an arbitrary t-norm. The confidence is then defined as the fraction of samples fulfilling $A$ which also fulfill $C$, according to:

$$\operatorname{conf}(A \to C) = \frac{\operatorname{supp}(A \to C)}{\operatorname{supp}(A)}.$$
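As an illustration, support and confidence can be computed from the membership degrees of each sample in $A$ and $C$. This is a sketch under stated assumptions: the function names are hypothetical, the support of the antecedent alone is taken to be the sum of its membership degrees, and the support is left as a plain sum as in the definition above (dividing by the number of samples to obtain a relative support would be a straightforward variant).

```python
from typing import Callable, Sequence

TNorm = Callable[[float, float], float]


def t_min(a: float, b: float) -> float:
    """Minimum t-norm."""
    return min(a, b)


def t_prod(a: float, b: float) -> float:
    """Product t-norm."""
    return a * b


def support(antecedent: Sequence[float], consequent: Sequence[float],
            t_norm: TNorm = t_min) -> float:
    """Sum of T(A(x), C(x)) over all samples x."""
    return sum(t_norm(a, c) for a, c in zip(antecedent, consequent))


def confidence(antecedent: Sequence[float], consequent: Sequence[float],
               t_norm: TNorm = t_min) -> float:
    """supp(A -> C) divided by the support of the antecedent alone."""
    # Assumption: supp(A) is the sum of the antecedent's membership degrees.
    total_a = sum(antecedent)
    if total_a == 0:
        return 0.0  # Assumption: a rule that never fires has confidence 0.
    return support(antecedent, consequent, t_norm) / total_a
```

For crisp membership degrees (only 0 and 1), both t-norms coincide and the two measures reduce to the usual support count and confidence of association rules.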

Although it is usually possible to obtain satisfactory results by optimizing the parameter settings of the learning algorithms, this approach is based more on trial and error than on a structured optimization of the goal function. Approaches based on constrained optimization techniques or genetic algorithms/programming, which would offer a possible solution to this problem, are usually hard to apply to real-world problems, as they are computationally expensive and often result in uninterpretable rules.

To overcome these limitations, we first use FS-Miner [6], an inductive rule learning method, to find all rules fulfilling only minimal requirements in terms of confidence and support. We put special emphasis on obtaining interpretable rules, as these rule bases typically have to be checked by physicians before they are applied in real-world environments. This is done by defining an underlying set of fuzzy predicates using Comp-FS [7], involving ordering-based predicates. This set of predicates ensures that we are able to generate compact but expressive rules. Of course, other learning methods might be used as well to generate the initial set of rules (e.g. association rule miners [8], [9]).

As we might, however, have a large number of rules, the search space is still very large. For example, when selecting 20 rules out of 1000, there are more than $10^{40}$ possible combinations. Therefore, it is indispensable to use an intelligent search strategy to find a subset minimizing/maximizing the goal function. We use bacterial evolutionary algorithms, a kind of evolutionary approach, to do so. Details on how this is done are described in the next section.
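The size of this search space can be checked directly with a binomial coefficient. The short sketch below uses only the Python standard library; the function name is chosen for illustration.

```python
import math


def count_rule_subsets(n_rules: int, subset_size: int) -> int:
    """Number of ways to choose `subset_size` rules out of `n_rules`."""
    return math.comb(n_rules, subset_size)


n_subsets = count_rule_subsets(1000, 20)
# Print the order of magnitude in scientific notation.
print(f"{n_subsets:.2e}")
```

Even at a rate of millions of evaluations per second, enumerating this many subsets is out of reach, which is why a heuristic search is used.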

III. BACTERIAL EVOLUTIONARY ALGORITHM

There are several optimization algorithms which were inspired by the processes of evolution. These processes can be easily applied to optimization problems where one individual corresponds to one solution of the problem. An individual can be represented by a sequence of numbers, which may also be bits. This sequence is called the chromosome and represents the individual itself. Bacterial evolutionary algorithms are a recent variant of genetic algorithms based on bacterial rather than eukaryotic evolution. Bacteria share chunks of their genes rather than performing a neat crossover of chromosomes, which means bacterial genomes can grow or shrink. This mechanism is used both in the bacterial mutation and in the gene transfer operations. The latter replaces the crossover operation of genetic algorithms, so information can be transferred between different individuals [10]. As many operations in this approach can be performed in parallel, it can be adapted to a parallel computing environment in a straightforward manner.

A. Generating the initial rule set

In our approach we use FS-Miner [6], a method which finds all rules fulfilling minimal requirements in terms of confidence and support. Although it might be possible to remove rules covering the same range of the data space using a partial ordering structure, we do not use this mechanism, as we want to obtain the most comprehensive set of rules. Other approaches like association rule miners [8], [9] might be used as well for generating the initial rules. The underlying set of predicates was defined using CompFS [11]. For each attribute, a partition into five fuzzy sets was created automatically. Furthermore, ordering-based predicates were defined, too [12].

B. The encoding method

In bacterial evolutionary algorithms, one bacterium $\xi_i$, $i \in I$, corresponds to one solution of the problem under investigation. For the task of selecting $m_i$ rules from a set of $n$ rules ($m_i < n$), the bacterium consists of a vector of rule indices $\xi_i = (\xi_{i1}, \ldots, \xi_{i m_i})$, $1 \le \xi_{ik} \le n$, with $\xi_{ik}$ being the index of the $k$-th rule and $\xi_{ik} \ne \xi_{il}$ for $k \ne l$.

This encoding method, although more complex than a simple binary coding, has strong benefits. First of all, this encoding supports the implicit definition of subgroups of non-consecutive rules. When using a binary coding, subgroups can only evolve amongst neighboring rules. Having subgroups of rules is, however, very important, as these subgroups may contain interacting rules with a good overall performance. Furthermore, the evolutionary operations perform block operations which preserve these subgroups. Secondly, we have total control over the number of rules in the rule base. By specifying the length of the elements inserted into or deleted from the bacterium, we determine the overall number of rules involved. When using a binary coding, the number of rules is equivalent to the number of 1's, making it much harder to control the overall number of rules in a single step.
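The difference between the two codings can be made concrete with a small sketch. The helper names below are hypothetical; the bacterium is modeled as a plain list of distinct rule indices in the range 1 to n, and the binary mask is shown only for comparison.

```python
import random
from typing import List, Sequence


def random_bacterium(n_rules: int, length: int, rng: random.Random) -> List[int]:
    """Draw `length` distinct rule indices from 1..n_rules."""
    return rng.sample(range(1, n_rules + 1), length)


def is_valid(bacterium: Sequence[int], n_rules: int) -> bool:
    """Check that all indices are in range and no rule occurs twice."""
    in_range = all(1 <= k <= n_rules for k in bacterium)
    return in_range and len(set(bacterium)) == len(bacterium)


def to_binary_mask(bacterium: Sequence[int], n_rules: int) -> List[int]:
    """Equivalent binary coding: position k-1 is 1 if rule k is selected."""
    selected = set(bacterium)
    return [1 if k in selected else 0 for k in range(1, n_rules + 1)]
```

In the index coding the number of selected rules is simply the length of the list, whereas in the binary coding it has to be recovered by counting the 1's in a vector of length n.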

C. The evaluation function

Similar to genetic algorithms, the fitness of a bacterium $\xi_i$ is evaluated using an evaluation function $\varphi(\xi_i)$. This evaluation function is then minimized/maximized by the optimization process, depending on the actual learning problem.

D. The evolutionary process

The basic algorithm consists of three steps [4], [10]. First, an initial population is created randomly. Then, bacterial mutation and gene transfer are applied until a stopping criterion is fulfilled. The evolution cycle is summarized below:

    Bacterial Evolutionary Algorithm
    create initial population
    do {
        apply bacterial mutation
        apply gene transfer
    } while stopping condition not fulfilled
    return best bacterium

E. Generating the initial population

First, an initial population of $N_{ind}$ bacteria $\{\xi_i, i \in I\}$ is created randomly ($I = \{1, \ldots, N_{ind}\}$).

F. Bacterial mutation

To find a global optimum, it is necessary to explore new regions of the search space not yet covered by the current population. This is achieved by adding new, randomly generated information to the bacteria using bacterial mutation.

Bacterial mutation is applied to all bacteria $\xi_i$, $i \in I$. First, $N_{clones}$ copies (clones) of the bacterium are created. Then, a random segment of length $l$ is mutated in each clone. After mutating the same segment in all clones, all the clones and the original bacterium are evaluated using the evaluation function $\varphi$. The bacterium with the best evaluation result transfers the mutated segment to the other individuals. This step is repeated until each segment of the bacterium has been mutated once. The mutation may change not only the content, but also the length. The length of the new elements is chosen randomly as $l \pm l^*$, where $l^*$ is a parameter specifying the maximal change in length. When changing a segment of a bacterium, we must take care that the new segment is unique within the selected bacterium. At the end, the best bacterium is kept and the clones are discarded.
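The mutation step can be sketched as follows. This is a simplified illustration, not the authors' implementation: segments have a fixed length (the length change $l \pm l^*$ is omitted), the evaluation function is assumed to be an error that is minimized, and "transferring the best segment to the other individuals" is realized by letting all clones for the next segment start from the current best bacterium. All names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Bacterium = List[int]


def bacterial_mutation(
    bacterium: Sequence[int],
    n_rules: int,
    evaluate: Callable[[Sequence[int]], float],
    n_clones: int,
    seg_len: int,
    rng: random.Random,
) -> Bacterium:
    """Return an improved copy of `bacterium`; `evaluate` is minimized."""
    best = list(bacterium)
    best_score = evaluate(best)
    starts = list(range(0, len(best), seg_len))
    rng.shuffle(starts)  # visit every segment exactly once, in random order
    for start in starts:
        stop = min(start + seg_len, len(best))
        base = list(best)  # all clones start from the current best bacterium
        kept = set(base[:start]) | set(base[stop:])
        # Candidate indices that keep the rule indices unique within a clone.
        free = [k for k in range(1, n_rules + 1) if k not in kept]
        for _ in range(n_clones):
            clone = list(base)
            clone[start:stop] = rng.sample(free, stop - start)
            score = evaluate(clone)
            if score < best_score:
                best, best_score = clone, score
    return best
```

Because the original bacterium takes part in every comparison, the returned bacterium is never worse than the input with respect to `evaluate`.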

G. Gene transfer

The bacterial mutation operator optimizes the bacteria in the population individually. To ensure that information from effective bacteria spreads over the whole population, gene transfer is applied.

First, the population must be sorted and divided into two halves according to their evaluation results. The bacteria with a higher evaluation are called the superior half, the bacteria with a lower evaluation are referred to as the inferior half. Then, one bacterium is randomly chosen from the superior half and another from the inferior half. These two bacteria are called the source bacterium and the destination bacterium, respectively. A segment from the source bacterium is randomly chosen, and this segment is used to overwrite a random segment of the destination bacterium, if the source segment is not already in the destination bacterium. These two segments may vary in size up to a given length. Together with the variable length in the bacterial mutation step, this ensures that the bacteria are automatically adjusted to the optimal length. Gene transfer is repeated $N_{inf}$ times, where $N_{inf}$ is the number of "infections" per generation.
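A minimal sketch of the gene transfer step is given below. It rests on several assumptions that are not fixed by the description above: the evaluation function is an error to be minimized (so the superior half holds the bacteria with the lowest error), the population is re-sorted before every infection, source and destination segments have the same fixed length, and an infection is skipped whenever it would duplicate a rule index in the destination. All names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Bacterium = List[int]


def gene_transfer(
    population: Sequence[Bacterium],
    evaluate: Callable[[Sequence[int]], float],
    n_inf: int,
    seg_len: int,
    rng: random.Random,
) -> List[Bacterium]:
    """Apply `n_inf` infections and return the new population."""
    pop = [list(b) for b in population]  # work on copies
    for _ in range(n_inf):
        pop.sort(key=evaluate)  # best (lowest error) first
        half = len(pop) // 2
        source = pop[rng.randrange(half)]          # from the superior half
        dest_idx = rng.randrange(half, len(pop))   # from the inferior half
        dest = pop[dest_idx]
        s = rng.randrange(len(source) - seg_len + 1)
        segment = source[s:s + seg_len]
        d = rng.randrange(len(dest) - seg_len + 1)
        untouched = set(dest[:d]) | set(dest[d + seg_len:])
        # Skip the infection if it would duplicate a rule index.
        if untouched.isdisjoint(segment):
            pop[dest_idx] = dest[:d] + segment + dest[d + seg_len:]
    return pop
```

Since the destination is always taken from the inferior half, the best bacterium of the population is never overwritten by this operation.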

H. Stopping condition

If all individuals in the population are equal or the maximum number of generations $N_{gen}$ is reached, the algorithm ends; otherwise it returns to the bacterial mutation step. Typically, a small number of generations (below 10) already leads to good results. If a target value for the evaluation function exists, a threshold value might be defined as an alternative stopping condition.
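Putting the pieces together, the evolution cycle with its stopping conditions might be organized as in the following sketch. The operators are passed in as callables, so the driver makes no assumption about how mutation and gene transfer are implemented; the evaluation function is assumed to be an error that is minimized, and the function and parameter names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

Bacterium = List[int]


def run_bea(
    population: Sequence[Bacterium],
    mutate: Callable[[Bacterium], Bacterium],
    transfer: Callable[[List[Bacterium]], List[Bacterium]],
    evaluate: Callable[[Bacterium], float],
    n_gen: int,
    target: Optional[float] = None,
) -> Bacterium:
    """Run the evolution cycle and return the best bacterium (lowest error)."""
    pop = [list(b) for b in population]
    for _ in range(n_gen):  # at most n_gen generations
        pop = [mutate(b) for b in pop]  # bacterial mutation on every bacterium
        pop = transfer(pop)             # gene transfer within the population
        converged = all(b == pop[0] for b in pop)
        reached = target is not None and min(map(evaluate, pop)) <= target
        if converged or reached:
            break
    return min(pop, key=evaluate)
```

The optional `target` argument corresponds to the threshold on the evaluation function; leaving it at `None` reproduces the two default stopping conditions.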

IV. SIMULATION RESULTS

A. Hepatitis Analysis

For this study we selected 562 case records of adult hepatitis patients and 231 case records of non-infected patients. The patients received inpatient treatment at the Vienna General Hospital (AKH-Wien) between 1976 and 1986. Each patient's clinical diagnosis was serologically verified and is thus considered to be a gold standard.

For each patient a detailed laboratory analysis was carried out, including the following parameters:

Albumin
Alkaline Phosphatase
Alpha 1 Globulin
Alpha 2 Globulin
Beta Globulin
Gamma Globulin
Gamma-Glutamyltranspeptidase
Aspartate Aminotransferase (GOT)
Alanine Aminotransferase (GPT)
Lactate Dehydrogenase
Bilirubin
Age
Gender

We did not distinguish between the different types of hepatitis, but only between infected (positive) and non-infected (negative) cases, to obtain a binary decision problem. In such a setting, one tries to optimize the performance of the predictor with respect to specificity and sensitivity under the constraint that the number of rules is limited.

Formally speaking, we compute the error estimate of a rule set $\xi_i$ for a given data set $S^* \subseteq S$ from the harmonic mean of sensitivity and specificity. To ensure that we do not obtain infinitely large rule sets, we define a fuzzy predicate "$s$ is not greater than $m$", $\mathrm{LE}(s, m)$, according to:

$$\mathrm{LE}(s, m) = \begin{cases} 1 & \text{if } s \le m, \\ e^{-\frac{(s - m)^2}{2 m^2}} & \text{otherwise,} \end{cases}$$

with $s$ being the actual model size and $m$ the desired maximum number of rules. The overall error measure $F(f, S^*)$ is then computed as:

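$$F(f, S^*) = 1 - \mathrm{LE}(\mathrm{MS}(f), m) \cdot \frac{2 \cdot \operatorname{sens}(f, S^*) \cdot \operatorname{spec}(f, S^*)}{\operatorname{sens}(f, S^*) + \operatorname{spec}(f, S^*)},$$

where $\mathrm{MS}(f)$ denotes the model size, i.e. the number of rules.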
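The error measure can be sketched in a few lines. Two points are assumptions rather than facts taken from the text: the exact shape of the decay of LE for oversized models (a Gaussian-shaped decay is used here purely for illustration), and the convention that the error is 1 when both sensitivity and specificity are zero. The function names are hypothetical.

```python
import math


def le(size: float, max_size: float) -> float:
    """Fuzzy predicate 'size is not greater than max_size'."""
    if size <= max_size:
        return 1.0
    # Assumption: Gaussian-shaped decay for oversized models (illustrative).
    return math.exp(-((size - max_size) ** 2) / (2.0 * max_size ** 2))


def f_measure(sens: float, spec: float, size: float, max_size: float) -> float:
    """Error in [0, 1]: one minus the size-penalized harmonic mean."""
    if sens + spec == 0:
        return 1.0  # Assumption: harmonic mean of (0, 0) is taken as 0.
    harmonic = 2.0 * sens * spec / (sens + spec)
    return 1.0 - le(size, max_size) * harmonic
```

Because the harmonic mean is dominated by the smaller of its two arguments, a rule base that predicts only one class (specificity 0 or sensitivity 0) receives the maximal error of 1, regardless of how well it performs on the other class.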

Other error measures can be used as well, depending on the problem under investigation. For the remainder of this paper we will, however, stick to this rather simple error measure to ease interpretation.

We created a rule base of 466 rules having a minimum support of 0.01 and a minimum confidence of 0.6 using FS-Miner [6], where we did not remove overlapping rules. We then used the BEA to select the optimal subset of these rules using the following parameter set:

no. of generations       5
no. of individuals       4
no. of clones            5
mutation length          3 ± 2
no. of gene transfers    20
gene transfer length     2 ± 2
max. length              20

We also tried out other parameter settings (higher numbers of generations, individuals, and clones), but with no significant increase in the quality of the obtained rule sets. A resulting rule base is shown in Fig. 1. In each iteration, the evaluation function was evaluated approximately 200 times.

We compared the results to those of three different learning algorithms which involve the same set of predicates. This enables us to eliminate influences caused by different definitions of the underlying fuzzy predicates. While FS-ID3 is a fuzzy decision tree learning method, FS-FOIL and FS-Miner are rule learning algorithms. We compared the results of the three algorithms using 5-fold cross validation with similar splits for all algorithms. The means of the corresponding quality measures are shown in Fig. 2.

We can see that FS-Miner failed to produce reasonable results, as only positive cases have been predicted. This results in a high sensitivity, but no specificity of the resulting rule base. FS-ID3 has a slightly better performance than the BEA with respect to sensitivity, but a significantly lower performance regarding specificity. FS-FOIL, finally, is comparable to the BEA in terms of sensitivity, but fails in terms of specificity. The measures PosVals, NegVals, and FractionCorrect showed comparable results for all three approaches. Only the $\chi^2$ statistic and the mutual information measure, computed on the predicted and the original output values, are significantly better for the BEA. The main problem of FS-FOIL is its high ratio of negative null predictions (nullNeg). Altogether, this leads to a significantly better result of the bacterial evolutionary algorithm with respect to the F measure, the actual goal function of this optimization process.

V. CONCLUSIONS

In this paper we have presented a novel approach for extracting classification rules optimizing a freely definable goal function using bacterial evolutionary algorithms. The evaluation based on 793 case records showed that the method is able to find better solutions than traditional top-down approaches, although the same set of elementary predicates was used.

Future work will focus on the investigation of other error measures. Furthermore, we want to implement a parallelized version of the BEA to obtain a substantial decrease in computation time.

ACKNOWLEDGMENT

This paper was supported by the Szechenyi University Main Research Direction Grant 2005, and a National Scientific Research Fund Grant OTKA T048832.

REFERENCES

[1] M. Setnes and H. Roubos, "GA-fuzzy modeling and classification: Complexity and performance," IEEE Trans. Fuzzy Systems, vol. 8, no. 5, pp. 509-522, October 2000.

[2] H. Ishibuchi and T. Yamamoto, "Fuzzy rule selection by data mining criteria and genetic algorithms," in Proc. of Genetic and Evolutionary Computation Conference, 2002, pp. 399-406.

[3] Y. Yi and E. Hüllermeier, "Learning complexity-bounded rule-based classifiers by combining association analysis and genetic algorithms," in Proc. Joint 4th Conf. of the European Society for Fuzzy Logic and Technology and 11th Rencontres Francophones sur la Logique Floue et ses Applications, Barcelona, September 2005.

[4] J. Botzheim, B. Hamori, L. T. Koczy, and A. E. Ruano, "Bacterial algorithm applied for fuzzy rule extraction," in Proc. Int. Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Annecy, FR, 2002, pp. 1021-1026.

[5] J. Botzheim, M. Drobics, and L. T. Koczy, "Feature selection using bacterial optimization," in Proc. 10th Int. Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Perugia, July 2004, pp. 797-804.

[6] M. Drobics, Data Analysis Using Fuzzy Expressions, ser. Schriftenreihe der Johannes-Kepler-Universität Linz. Universitätsverlag Rudolf Trauner, 2005, vol. C 48.

[7] M. Drobics, "Data analysis using fuzzy expressions: creating comprehensible computational models from data," Ph.D. dissertation, Johannes Kepler Universität Linz, September 2005.

[8] R. Agrawal and R. Srikant, "Fast algorithms for mining association rules," in Proc. 20th Int. Conf. on Very Large Data Bases, J. B. Bocca, M. Jarke, and C. Zaniolo, Eds. Morgan Kaufmann, 1994, pp. 487-499.

[9] T.-P. Hong, K.-Y. Lin, and S.-L. Wang, "Fuzzy data mining for interesting generalized association rules," Fuzzy Sets and Systems, vol. 138, no. 2, pp. 255-269, 2003.

[10] N. E. Nawa and T. Furuhashi, "Fuzzy system parameters discovery by bacterial evolutionary algorithm," IEEE Trans. Fuzzy Syst., vol. 7, pp. 608-616, 1999.

[11] M. Drobics, "Choosing the best predicates for data-driven fuzzy modeling," in Proc. 13th IEEE Int. Conf. on Fuzzy Systems, Budapest, July 2004, pp. 245-249.

[12] U. Bodenhofer, "Representations and constructions of similarity-based fuzzy orderings," Fuzzy Sets and Systems, vol. 137, no. 1, pp. 113-136, 2003.

Class         Condition

HEP2_Is_NEG   ALKPHOS_IsAtLeast_M && GPT_IsAtMost_L && GAMMA_GT_IsAtMost_L && GAMMA_IsAtMost_L && LDH_IsAtLeast_L
              SERBILI_Is_VL && ALKPHOS_Is_VL && BETA_Is_M
              ALKPHOS_IsAtMost_L && LDH_Is_VL && ALBUMIN_IsAtLeast_M
              ALKPHOS_IsAtMost_L && LDH_Is_VL && SEX_Is_W
              ALKPHOS_IsAtLeast_M && GAMMA_GT_Is_VL && GPT_Is_VL && SEX_Is_W && GAMMA_IsAtMost_M
              GPT_Is_VL && ALKPHOS_Is_VL && SERBILI_IsAtMost_L

HEP2_Is_POS   ALKPHOS_IsAtLeast_M
              LDH_IsAtLeast_M
              GOT_IsAtLeast_M
              ALKPHOS_IsAtLeast_H
              ALKPHOS_Is_VL && AGE_IsAtLeast_L && LDH_IsAtLeast_L && ALPHA2_IsAtMost_L
              ALKPHOS_Is_VL && GAMMA_GT_IsAtMost_H && SEX_Is_M && AGE_IsAtLeast_L && ALPHA1_IsAtMost_H
              GAMMA_GT_IsAtLeast_L && ALBUMIN_IsAtMost_H
              ALKPHOS_Is_VL && GAMMA_GT_IsAtMost_H && AGE_IsAtMost_L && ALPHA2_IsAtMost_L && ALPHA1_IsAtMost_H
              SERBILI_IsAtLeast_L && LDH_IsAtLeast_L
              ALKPHOS_Is_VL && GAMMA_GT_IsAtMost_H && SEX_Is_M && ALPHA2_IsAtMost_L && ALBUMIN_IsAtMost_H
              ALKPHOS_Is_VL && GAMMA_GT_IsAtMost_H && SEX_Is_M && AGE_IsAtLeast_L && ALPHA2_IsAtMost_M && GPT_IsAtMost_H
              ALKPHOS_Is_VL && GAMMA_GT_IsAtMost_H && SEX_Is_M && AGE_IsAtLeast_L && GPT_IsAtMost_H

Fig. 1. Rule set obtained using Bacterial Evolutionary Algorithm

5x Cross.Val.                 BEA      FS-ID3   FS-FOIL  FS-MINER
F                             0.234    0.28     0.473    1
Sensitivity                   0.759    0.791    0.755    0.861
Specificity                   0.781    0.695    0.424    0
PosVals                       0.927    0.87     0.915    0.891
NegVals                       0.606    0.59     0.651    0
FractionCorrect               0.798    0.766    0.844    0.891
ChiSquared                    49.305   34.349   39.533   0
PLevel                        0        0        0        0
MutualInformation             0.259    0.167    0.237    0
NormalizedMutualInformation   0.297    0.189    0.304    0
ModelSize                     18.2     19.8     7.2      36
RatioOfNullPredictions        0.038    0        0.219    0.32

Fig. 2. Comparison of results
