
Classification Rule Mining Based on Particle Swarm Optimization

Ziqiang Wang, Xia Sun, and Dexian Zhang

School of Information Science and Engineering, Henan University of Technology, Zheng Zhou 450052, China

[email protected], [email protected], [email protected]

Abstract. The Particle Swarm Optimization (PSO) algorithm is a robust stochastic evolutionary algorithm based on the movement and intelligence of swarms. In this paper, a PSO-based algorithm for classification rule mining is presented. Compared with Ant-Miner and ESIA on public-domain data sets, the proposed method achieves higher predictive accuracy and discovers a much smaller rule list.

Keywords: Data mining, classification rule, particle swarm optimization.

1 Introduction

In recent years information collection has become easier, but the effort required to retrieve relevant information from large-scale databases has grown significantly. With the rapid growth in the amount of information stored in databases, the development of efficient and effective tools for revealing the valuable knowledge hidden in these databases becomes ever more critical for enterprise decision making. One possible approach to this problem is data mining, or knowledge discovery in databases (KDD) [1]. Through data mining, interesting knowledge can be extracted and then applied in the corresponding field to increase working efficiency and to improve the quality of decision making.

Classification rule mining is one of the important problems in the emerging field of data mining; it aims at finding a small set of rules from the training data set with predetermined targets [2]. The classification problem becomes very hard when the number of possible combinations of parameters is so high that algorithms based on exhaustive search of the parameter space quickly become computationally infeasible. The self-adaptability of population-based evolutionary algorithms is therefore extremely appealing for data mining tasks. In particular, there have been numerous attempts to apply genetic algorithms (GAs) to classification tasks in data mining [3]. In addition, the particle swarm optimization (PSO) algorithm [4], which has recently emerged as a new meta-heuristic derived from nature, has attracted many researchers' interest [5,6]. The algorithm has been successfully applied to several minimization problems and to neural network training.


Nevertheless, the use of the PSO algorithm for mining classification rules in the context of data mining is still a research area that few have explored. In this paper, the objective is to investigate the capability of the PSO algorithm to discover classification rules with higher predictive accuracy and a much smaller rule list.

2 Overview of the PSO

PSO is a relatively new population-based evolutionary computation technique [4]. In contrast to genetic algorithms (GAs), which exploit the competitive characteristics of biological evolution, PSO exploits cooperative and social behavior, such as fish schooling, bird flocking, and insect swarming. In the past several years, PSO has been successfully applied in many different application areas owing to its robustness and simplicity. In comparison with other stochastic optimization techniques such as GAs, PSO has fewer complicated operations and fewer defining parameters, and can be coded in just a few lines. Because of these advantages, PSO has received increasing attention in the data mining community in recent years.

The PSO is defined as follows. Let s denote the swarm size. Each particle i (1 ≤ i ≤ s) has the following properties: a current position x_i in the search space, a current velocity v_i, a personal best position p_i in the search space, and the global best position p_gb among all the p_i. During each iteration, each particle in the swarm is updated using the following equations:

v_i(t+1) = k [ w v_i(t) + c_1 r_1 (p_i − x_i(t)) + c_2 r_2 (p_gb − x_i(t)) ],   (1)

x_i(t+1) = x_i(t) + v_i(t+1),   (2)

where c_1 and c_2 denote the acceleration coefficients, and r_1 and r_2 are random numbers uniformly distributed within [0, 1].

The value of each dimension of every velocity vector v_i can be clamped to the range [−v_max, v_max] to reduce the likelihood of particles leaving the search space. The value of v_max is usually chosen to be k × x_max (where 0.1 ≤ k ≤ 1). Note that this does not restrict the values of x_i to the range [−v_max, v_max]; rather, it merely limits the maximum distance that a particle can move in one step.

The acceleration coefficients c_1 and c_2 control how far a particle moves in a single iteration. Typically, both are set to 2.0, although assigning different values to c_1 and c_2 sometimes leads to improved performance. The inertia weight w in Equation (1) is also used to control the convergence behavior of the PSO. Typical implementations adapt the value of w by decreasing it linearly from about 1.0 to near 0 over the course of the run. In general, the inertia weight w is set according to the following equation [5]:

w = w_max − ((w_max − w_min) / iter_max) · iter,   (3)

where iter_max is the maximum number of iterations and iter is the current iteration number.


In order to guarantee the convergence of the PSO algorithm, the constriction factor k is defined as follows:

k = 2 / |2 − ϕ − √(ϕ² − 4ϕ)|,   (4)

where ϕ = c_1 + c_2 and ϕ > 4.

The PSO algorithm applies the update operations of Equations (1) and (2) repeatedly until a specified number of iterations has been exceeded or the velocity updates are close to zero. The quality of the particles is measured by a fitness function that reflects the optimality of a particular solution. Attractive features of the PSO include its ease of implementation and the fact that only primitive mathematical operators and very few algorithm parameters need to be tuned. It can be used to solve a wide array of optimization problems; example applications include neural network training and function minimization. However, the use of the PSO algorithm for mining classification rules in the context of data mining is still a research area that few have explored. In this paper, a PSO-based classification rule mining algorithm is proposed in the following section.
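For concreteness, the following sketch (not part of the original paper) shows one way Equations (1)-(4) could be implemented in Python/NumPy: the constriction factor k, the linearly decreasing inertia weight w, the velocity update with clamping to [−v_max, v_max], and the position update. The default coefficients c_1 = c_2 = 2.05 are assumed here only so that ϕ > 4 holds in Equation (4); the experiments in Section 4 use c_1 = c_2 = 2.

```python
import numpy as np

def constriction_factor(c1=2.05, c2=2.05):
    """Constriction factor k from Equation (4); requires phi = c1 + c2 > 4."""
    phi = c1 + c2
    return 2.0 / abs(2.0 - phi - np.sqrt(phi ** 2 - 4.0 * phi))

def inertia_weight(it, iter_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight from Equation (3)."""
    return w_max - (w_max - w_min) / iter_max * it

def pso_step(x, v, p_best, p_gb, it, iter_max, v_max, c1=2.05, c2=2.05):
    """One velocity/position update per Equations (1) and (2), with velocity clamping."""
    k = constriction_factor(c1, c2)
    w = inertia_weight(it, iter_max)
    r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)
    v_new = k * (w * v + c1 * r1 * (p_best - x) + c2 * r2 * (p_gb - x))
    v_new = np.clip(v_new, -v_max, v_max)   # keep each component in [-v_max, v_max]
    return x + v_new, v_new
```

Here x, v, p_best, and p_gb are NumPy arrays of equal length, one entry per dimension of the search space.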

3 The PSO-Based Classification Rule Mining Algorithm

The steps of the PSO-based classification rule mining algorithm are described as follows.

Step 1: Initialization and Structure of Individuals. In the initialization process, a set of individuals (i.e., particles) is created at random. The structure of an individual for the classification problem is composed of a set of attribute values. Therefore, individual i's position at iteration 0 can be represented as the vector X_i^0 = (x_{i1}^0, ..., x_{in}^0), where n is the number of attributes in the attribute table. The velocity of individual i (i.e., V_i^0 = (v_{i1}^0, ..., v_{in}^0)) corresponds to the attribute update quantities covering all attribute values; the velocity of each individual is also created at random. The elements of position and velocity have the same dimension.
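As an illustration of this encoding (a sketch under assumptions the paper leaves open, e.g. that every attribute value is normalized to the interval [0, x_max]), a swarm of s particles over n attributes can be initialized as two s × n arrays of random positions and velocities:

```python
import numpy as np

def init_swarm(s, n, x_max=1.0, v_ratio=0.5, seed=None):
    """Step 1: random positions X (s particles x n attributes) and velocities V.
    v_ratio plays the role of the factor k in v_max = k * x_max (0.1 <= k <= 1)."""
    rng = np.random.default_rng(seed)
    v_max = v_ratio * x_max
    X = rng.uniform(0.0, x_max, size=(s, n))       # one attribute value per dimension
    V = rng.uniform(-v_max, v_max, size=(s, n))    # random attribute update quantities
    return X, V, v_max
```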

Step 2: Evaluation Function Definition. The evaluation function of the PSO algorithm provides the interface between the physical problem and the optimization algorithm. The evaluation function used in this study is defined as follows:

F = [N / (N + FP)] · [TP / (TP + FN)] · [TN / (TN + FP)],   (5)

where N is the total number of instances in the training set, TP (true positives) denotes the number of cases covered by the rule that have the class predicted by the rule, FP (false positives) denotes the number of cases covered by the rule that have a class different from the class predicted by the rule, FN (false negatives) denotes the number of cases that are not covered by the rule but that have the class predicted by the rule, and TN (true negatives) denotes the number of cases that are not covered by the rule and do not have the class predicted by the rule. Therefore, the value of F lies within the range [0, 1], and the larger the value of F, the higher the quality of the rule.
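A direct translation of Equation (5), as reconstructed above, might look as follows; the counts TP, FP, FN, and TN are assumed to have been computed for a candidate rule against the training set (how a particle's position is decoded into a rule is not detailed in the paper):

```python
def rule_fitness(tp, fp, fn, tn):
    """Rule quality F from Equation (5): a product of three ratios, each in [0, 1]."""
    n = tp + fp + fn + tn                  # total number of training instances
    if tp + fn == 0 or tn + fp == 0 or n == 0:
        return 0.0                         # degenerate rule: avoid division by zero
    return (n / (n + fp)) * (tp / (tp + fn)) * (tn / (tn + fp))
```

For example, a rule with TP = 40, FP = 5, FN = 10, TN = 100 obtains rule_fitness(40, 5, 10, 100) ≈ 0.74.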

Step 3: Personal and Global Best Position Computation. Each particle i memorizes its own F value and keeps the best position found so far as its personal best position p_i^t. The particle with the best F value among all p_i^t is taken as the global best position p_gb^t, where t is the iteration number. Note that in the first iteration each particle i's position is taken directly as p_i^0, and the position of the particle with the best F value among the p_i^0 is taken as p_gb^0.

Step 4: Modify the velocity of each particle according to Equation (1). If v_i^(t+1) > V_i^max, then v_i^(t+1) = V_i^max; if v_i^(t+1) < V_i^min, then v_i^(t+1) = V_i^min.

Step 5: Modify the position of each particle according to Equation (2).

Step 6: If the best evaluation value p_gb is no longer noticeably improving, or the iteration number t reaches the given maximum, go to Step 7; otherwise, go to Step 2.

Step 7: The particle that yields the best evaluation value F is output as the discovered classification rule.
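Putting Steps 1-7 together, a compact sketch of the whole mining loop could look like the following. The function evaluate_rule, which maps a particle's position to a rule and returns its fitness F (e.g. via Equation (5)), is a hypothetical placeholder, since the decoding of positions into rules is not specified in the paper; the constriction value 0.729 corresponds to c_1 = c_2 = 2.05 in Equation (4) and is used here only as a typical choice.

```python
import numpy as np

def mine_rule(evaluate_rule, n_attrs, swarm_size=20, iter_max=100, seed=0):
    """PSO-based classification rule mining, Steps 1-7 (a sketch)."""
    rng = np.random.default_rng(seed)
    x_max, v_max = 1.0, 0.5
    # Step 1: random positions (attribute values) and velocities.
    X = rng.uniform(0.0, x_max, size=(swarm_size, n_attrs))
    V = rng.uniform(-v_max, v_max, size=(swarm_size, n_attrs))
    # Steps 2-3: initial fitness, personal bests, and global best.
    P = X.copy()
    P_fit = np.array([evaluate_rule(x) for x in X])
    g = int(np.argmax(P_fit))
    for it in range(iter_max):
        w = 0.9 - (0.9 - 0.4) * it / iter_max                  # Equation (3)
        r1, r2 = rng.random(X.shape), rng.random(X.shape)
        # Step 4: velocity update (Equation (1)) with clamping.
        V = 0.729 * (w * V + 2.0 * r1 * (P - X) + 2.0 * r2 * (P[g] - X))
        V = np.clip(V, -v_max, v_max)
        X = X + V                                              # Step 5: Equation (2)
        fit = np.array([evaluate_rule(x) for x in X])          # Step 2: re-evaluate
        better = fit > P_fit                                   # Step 3: update personal bests
        P[better], P_fit[better] = X[better], fit[better]
        g = int(np.argmax(P_fit))                              # ... and the global best
        # Step 6: a stagnation test on P_fit[g] could also terminate the loop early.
    return P[g], P_fit[g]                                      # Step 7: best rule found
```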

4 Experimental Results

To thoroughly investigate the performance of the proposed PSO algorithm, we conducted experiments on a number of data sets taken from the UCI repository [7]. Table 1 summarizes the selected data sets in terms of the number of instances and the number of classes. These data sets have been widely used in other comparative studies. All results of the comparison were obtained on a Pentium 4 PC (CPU 2.2 GHz, RAM 256 MB).

In all our experiments, the PSO algorithm uses the following parameter values. The inertia weight w is set by Equation (3) with w_max = 0.9 and w_min = 0.4. The acceleration constants are c_1 = c_2 = 2. The population size was fixed at 20 particles in order to keep the computational requirements low. Each run was repeated 50 times, and average results are reported.

We evaluated the performance of PSO by comparing it with Ant-Miner [6] and ESIA (a well-known genetic classifier algorithm) [8]. The first experiment was carried out to compare the predictive accuracy of the discovered rule lists using the well-known ten-fold cross-validation procedure [9].

Table 1. Data Sets Used in the Experiments

Data Set                  Instances  Classes
Ljubljana Breast Cancer   282        2
Wisconsin Breast Cancer   683        2
Tic-Tac-Toe               958        2
Dermatology               366        6
Hepatitis                 155        2
Cleveland Heart Disease   303        5


Table 2 shows the results comparing the predictive accuracies of PSO, Ant-Miner, and ESIA, where the symbol "±" denotes the standard deviation of the corresponding predictive accuracy. It can be seen that the predictive accuracy of PSO is higher than that of Ant-Miner and ESIA.

Table 2. Predictive Accuracy Comparison

Data Set                  PSO (%)      Ant-Miner (%)  ESIA (%)
Ljubljana Breast Cancer   77.58±0.27   75.28±2.24     75.69±0.16
Wisconsin Breast Cancer   97.95±0.68   96.04±0.93     94.71±0.04
Tic-Tac-Toe               98.84±0.24   73.04±2.53     71.23±0.13
Dermatology               97.72±0.74   94.29±1.20     91.58±0.24
Hepatitis                 95.38±0.35   90.00±3.11     90.36±0.21
Cleveland Heart Disease   78.68±0.52   57.48±1.78     76.23±0.25

In addition, we compared the simplicity of the discovered rule lists by the number of rules discovered. The results comparing the simplicity of the rule lists discovered by PSO, Ant-Miner, and ESIA are shown in Table 3. Taking the number of discovered rules into account, PSO mined rule lists that are much simpler (smaller) than those mined by Ant-Miner and ESIA.

Table 3. Comparison of the Number of Rules Discovered

Data Set                  PSO          Ant-Miner    ESIA
Ljubljana Breast Cancer   6.13±0.25    7.10±0.31    26.63±0.25
Wisconsin Breast Cancer   4.37±0.53    6.20±0.25    23.90±0.32
Tic-Tac-Toe               6.68±0.47    8.50±0.62    37.43±0.15
Dermatology               6.59±0.65    7.30±0.47    24.82±0.42
Hepatitis                 3.05±0.21    3.40±0.16    18.56±0.23
Cleveland Heart Disease   7.27±0.36    9.50±0.71    29.37±0.35

In summary, the PSO algorithm requires very few parameters to be tuned. Taking into account both the predictive accuracy and the rule-list simplicity criteria, the proposed PSO-based classification rule mining algorithm has shown promising results.

5 Conclusions

The PSO algorithm, new to the data mining community, is a robust stochastic evolutionary algorithm based on the movement and intelligence of swarms. In this paper, a PSO-based algorithm for classification rule mining has been presented. Compared with Ant-Miner and ESIA on public-domain data sets, the proposed method achieved higher predictive accuracy and produced much smaller rule lists.


Acknowledgement

This work was supported partially by the National Natural Science Foundation of China under Grant No. 90412013-3, the Natural Science Foundation of Henan Province under Grant No. 0511011700, and the Natural Science Foundation of the Henan Province Education Department under Grant No. 200510463007.

References

1. Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery: an overview. In: Fayyad, U. M., et al. (Eds.), Advances in Knowledge Discovery & Data Mining. AAAI/MIT Press, Cambridge, MA (1996) 1-34.

2. Quinlan, J. R.: Induction of decision trees. Machine Learning 1 (1986) 81-106.

3. Freitas, A. A.: Data Mining and Knowledge Discovery with Evolutionary Algorithms. Springer-Verlag, Berlin (2002).

4. Eberhart, R. C., Kennedy, J.: A new optimizer using particle swarm theory. In: Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan (1995) 39-43.

5. Kennedy, J.: The particle swarm: social adaptation of knowledge. In: Proceedings of the 1997 IEEE International Conference on Evolutionary Computation, Indianapolis (1997) 303-308.

6. Parpinelli, R. S., Lopes, H. S., Freitas, A. A.: Data mining with an ant colony optimization algorithm. IEEE Transactions on Evolutionary Computation 6 (2002) 321-332.

7. Hettich, S., Bay, S. D.: The UCI KDD Archive (1999). http://kdd.ics.uci.edu/

8. Liu, J. J., Kwok, J. T.: An extended genetic rule induction algorithm. In: Proceedings of the 2000 Congress on Evolutionary Computation, San Diego (2000) 458-463.

9. Weiss, S. M., Kulikowski, C. A. (Eds.): Computer Systems that Learn. Morgan Kaufmann, San Mateo (1991).