
CHAPTER 4

FEATURE SELECTION USING GENETIC ALGORITHM

In this research work, the Genetic Algorithm (GA) method is used for feature selection. The following sections explain how a Genetic Algorithm works and how it is used for feature selection.

4.1 Genetic Algorithm

A genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover [56-57].

4.1.1 Methodology

In a genetic algorithm, a population of strings (called chromosomes or the genotype of the genome), which encode candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem, evolves toward better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually starts from a population of randomly generated individuals and proceeds in generations. In each generation, the fitness of every individual in the population is evaluated; multiple individuals are then stochastically selected from the current population (based on their fitness) and modified (recombined and possibly randomly mutated) to form a new population. The new population is used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached for the population. If the algorithm terminates because the maximum number of generations has been reached, a satisfactory solution may or may not have been found.

Genetic algorithms find application in bioinformatics, computational science, engineering, economics, chemistry, manufacturing, mathematics, physics and other fields.

A typical genetic algorithm requires:

• a genetic representation of the solution domain,
• a fitness function to evaluate the solution domain.

A standard representation of the solution is as an array of bits. Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that their parts are easily aligned due to their fixed size, which facilitates simple crossover operations. Variable-length representations may also be used, but crossover implementation is more complex in this case. Tree-like representations are explored in genetic programming and graph-form representations are explored in evolutionary programming.

The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem dependent. For instance, in the knapsack problem one wants to maximize the total value of objects that can be put in a knapsack of some fixed capacity. A representation of a solution might be an array of bits, where each bit represents a different object, and the value of the bit (0 or 1) represents whether or not the object is in the knapsack. Not every such representation is valid, as the total size of the objects may exceed the capacity of the knapsack. The fitness of the solution is the sum of the values of all objects in the knapsack if the representation is valid, or 0 otherwise. In some problems, it is hard or even impossible to define the fitness expression; in these cases, interactive genetic algorithms are used.
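To make the knapsack encoding concrete, here is a minimal Python sketch of that fitness function. The helper name knapsack_fitness and the example values, weights and capacity are illustrative assumptions, not data from this work.

```python
from typing import Sequence

def knapsack_fitness(bits: Sequence[int], values: Sequence[float],
                     weights: Sequence[float], capacity: float) -> float:
    """Fitness of a bit-string knapsack solution.

    bits[i] == 1 means object i is in the knapsack. An invalid solution
    (total weight above the capacity) gets fitness 0, as described above.
    """
    total_weight = sum(w for b, w in zip(bits, weights) if b)
    if total_weight > capacity:
        return 0.0
    return float(sum(v for b, v in zip(bits, values) if b))


values = [10, 40, 30, 50]   # hypothetical object values
weights = [5, 4, 6, 3]      # hypothetical object weights
print(knapsack_fitness([0, 1, 0, 1], values, weights, capacity=10))  # 90.0
print(knapsack_fitness([1, 1, 1, 1], values, weights, capacity=10))  # 0.0 (overweight)
```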

Once the genetic representation and the fitness function are defined, a GA proceeds to initialize a population of solutions randomly and then improves it through repeated application of the mutation, crossover, inversion and selection operators.

4.1.2 Initialization

Initially, many individual solutions are randomly generated to form an initial population. The population size depends on the nature of the problem, but typically contains several hundreds or thousands of possible solutions. Traditionally, the population is generated randomly, covering the entire range of possible solutions (the search space). Occasionally, the solutions may be "seeded" in areas where optimal solutions are likely to be found.
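A minimal Python sketch of random initialization for bit-string chromosomes follows. The function name init_population and the fixed seed are illustrative assumptions; each bit is drawn independently as 0 or 1 with equal probability, so the whole search space is covered rather than seeded regions.

```python
import random
from typing import List

def init_population(pop_size: int, n_bits: int, seed: int = 0) -> List[List[int]]:
    """Create pop_size random bit strings of length n_bits (each bit 0 or 1, p = 0.5)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]


population = init_population(pop_size=6, n_bits=10)
print(len(population), len(population[0]))  # 6 10
```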

4.1.3 Selection

During each successive generation, a proportion of the existing population is selected to breed a new generation. Individual solutions are selected through a fitness-based process, where fitter solutions (as measured by the fitness function) are typically more likely to be selected. Certain selection methods rate the fitness of each solution and preferentially select the best solutions. Other methods rate only a random sample of the population, because rating every solution may be very time-consuming.
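Fitness-proportionate ("roulette-wheel") selection is one common fitness-based scheme; the text does not name a specific method, so the sketch below is an illustrative assumption. It relies on Python's random.choices, which draws items with probability proportional to the supplied weights, so fitness values must be non-negative with a positive sum.

```python
import random
from typing import List, Sequence

def roulette_select(population: Sequence[List[int]], fitness: Sequence[float],
                    k: int, rng: random.Random) -> List[List[int]]:
    """Draw k parents (with replacement), each chosen with probability
    proportional to its fitness."""
    return rng.choices(population, weights=fitness, k=k)


rng = random.Random(1)
pop = [[0, 0], [0, 1], [1, 1]]
parents = roulette_select(pop, [0.0, 1.0, 3.0], k=1000, rng=rng)
# [1, 1] (fitness 3) should appear about three times as often as [0, 1];
# [0, 0] has fitness 0 and is never chosen.
```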

4.1.4 Reproduction

The next step is to generate a second-generation population of solutions from those selected, through the genetic operators crossover (also called recombination) and/or mutation.

For each new solution to be produced, a pair of "parent" solutions is selected for breeding from the pool selected previously. By producing a "child" solution using crossover and mutation, a new solution is created which typically shares many of the characteristics of its "parents". New parents are selected for each new child, and the process continues until a new population of solutions of appropriate size is generated. Although reproduction methods based on two parents are more "biology inspired", some research suggests that using more than two "parents" can produce better-quality chromosomes.
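The two operators can be sketched in Python as below. The text does not fix a particular crossover or mutation; one-point crossover and independent bit-flip mutation are common textbook choices and are assumptions here, as are the function names.

```python
import random
from typing import List, Tuple

def one_point_crossover(p1: List[int], p2: List[int],
                        rng: random.Random) -> Tuple[List[int], List[int]]:
    """Swap the tails of two equal-length parents after a random cut point."""
    cut = rng.randint(1, len(p1) - 1)  # cut strictly inside the string
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def bit_flip_mutation(chrom: List[int], pm: float, rng: random.Random) -> List[int]:
    """Return a copy in which each bit is flipped independently with probability pm."""
    return [1 - b if rng.random() < pm else b for b in chrom]


rng = random.Random(42)
c1, c2 = one_point_crossover([0] * 8, [1] * 8, rng)
child = bit_flip_mutation(c1, pm=0.05, rng=rng)
```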

These processes ultimately result in a next-generation population of chromosomes that is different from the initial generation. Generally, the average fitness of the population will have increased by this procedure, since mainly the best organisms from the first generation are selected for breeding, along with a small proportion of less fit solutions that help maintain diversity.

Although crossover and mutation are known as the main genetic operators, it is possible to use other operators such as regrouping, colonization-extinction, or migration in genetic algorithms.

4.1.5 Termination

This generational process is repeated until a termination condition has been reached. Common terminating conditions are:

• a solution is found that satisfies minimum criteria;
• a fixed number of generations is reached;
• the allocated budget (computation time/money) is reached;
• the highest-ranking solution's fitness is reaching or has reached a plateau, such that successive iterations no longer produce better results;
• manual inspection;
• combinations of the above.

A simple generational genetic algorithm procedure is given below.

1. Choose the initial population of individuals.
2. Evaluate the fitness of each individual in that population.
3. Repeat on this generation until termination (time limit, sufficient fitness achieved, etc.):
   a. Select the best-fit individuals for reproduction.
   b. Breed new individuals through crossover and mutation operations to give birth to offspring.
   c. Evaluate the individual fitness of the new individuals.
   d. Replace the least-fit population with the new individuals.
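The procedure above can be sketched in Python as follows. This is a hypothetical illustration, not the implementation used in this work: binary tournament selection, one-point crossover, bit-flip mutation, the parameter defaults and the OneMax test problem (maximise the number of 1s) are all assumptions. Step 3d is realised by merging parents and offspring and keeping the fittest pop_size individuals, so the least-fit are replaced; the loop stops on a generation limit, a target fitness, or a plateau of stall_limit generations without improvement.

```python
import random
from typing import Callable, List, Optional

Chromosome = List[int]

def simple_ga(fitness: Callable[[Chromosome], float], n_bits: int,
              pop_size: int = 30, generations: int = 100, pc: float = 0.8,
              pm: float = 0.02, stall_limit: int = 20,
              target: Optional[float] = None, seed: int = 0) -> Chromosome:
    rng = random.Random(seed)

    def select(pop: List[Chromosome]) -> Chromosome:
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    # Steps 1-2: random initial population, sorted best-first by fitness.
    pop = sorted(([rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)),
                 key=fitness, reverse=True)
    best_fit, stall = fitness(pop[0]), 0
    for _ in range(generations):                      # step 3: repeat
        children: List[Chromosome] = []
        while len(children) < pop_size:
            p1, p2 = select(pop), select(pop)          # 3a: select parents
            if rng.random() < pc:                      # 3b: one-point crossover
                cut = rng.randint(1, n_bits - 1)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):                     # 3b: bit-flip mutation
                children.append([1 - g if rng.random() < pm else g for g in child])
        # 3c-3d: evaluate children; the least-fit individuals are dropped.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
        if fitness(pop[0]) > best_fit:
            best_fit, stall = fitness(pop[0]), 0
        else:
            stall += 1
        # Termination: satisfactory fitness reached, or a fitness plateau.
        if (target is not None and best_fit >= target) or stall >= stall_limit:
            break
    return pop[0]


best = simple_ga(fitness=sum, n_bits=20, target=20)  # OneMax
print(sum(best))
```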


4.1.6 Variants of Genetic Algorithm

The simplest algorithm represents each chromosome as a bit string. Typically, numeric parameters can be represented by integers, though it is possible to use floating-point representations. The floating-point representation is natural to evolution strategies and evolutionary programming. The basic algorithm performs crossover and mutation at the bit level. Other variants treat the chromosome as a list of numbers which are indexes into an instruction table, nodes in a linked list, hashes, objects, or any other imaginable data structure. Crossover and mutation are performed so as to respect data element boundaries. For most data types, specific variation operators can be designed. Different chromosomal data types seem to work better or worse for different specific problem domains.

A very successful variant of the general process of constructing a new population is to allow some of the better organisms from the current generation to carry over to the next, unaltered. This strategy is known as elitist selection.
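Elitist selection can be sketched as a replacement step that copies the elite_count fittest individuals unaltered and fills the remaining slots with offspring. The function name and the convention that higher fitness is better are assumptions made for illustration.

```python
from typing import Callable, List, Sequence

def next_generation_with_elites(population: Sequence[List[int]],
                                offspring: Sequence[List[int]],
                                fitness: Callable[[List[int]], float],
                                elite_count: int) -> List[List[int]]:
    """Carry the elite_count fittest individuals over unaltered and fill the
    remaining slots with offspring, keeping the population size constant."""
    elites = sorted(population, key=fitness, reverse=True)[:elite_count]
    return [list(e) for e in elites] + list(offspring[:len(population) - elite_count])


new_pop = next_generation_with_elites([[0, 0], [1, 1], [1, 0]],
                                      [[0, 1], [0, 0], [1, 1]],
                                      fitness=sum, elite_count=1)
# new_pop == [[1, 1], [0, 1], [0, 0]]
```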

Parallel implementations of genetic algorithms come in two flavours. Coarse-grained parallel genetic algorithms assume a population on each of the computer nodes and migration of individuals among the nodes. Fine-grained parallel genetic algorithms assume an individual on each processor node, which interacts with neighboring individuals for selection and reproduction.

Other variants, like genetic algorithms for online optimization problems, introduce time-dependence or noise in the fitness function.

Genetic algorithms with adaptive parameters (adaptive genetic algorithms, AGAs) are another significant and promising variant of genetic algorithms. The probabilities of crossover (pc) and mutation (pm) greatly determine the degree of solution accuracy and the convergence speed that genetic algorithms can obtain. Instead of using fixed values of pc and pm, AGAs use the population information in each generation and adaptively adjust pc and pm in order to maintain population diversity while sustaining convergence capacity. In AGA (adaptive genetic algorithm), the adjustment of pc and pm depends on the fitness values of the solutions. In CAGA (clustering-based adaptive genetic algorithm), clustering analysis is used to judge the optimization state of the population, and the adjustment of pc and pm depends on that state.
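The text does not give the AGA adaptation rule. As one illustration, the sketch below follows a widely cited fitness-dependent scheme (due to Srinivas and Patnaik) for a maximisation problem: above-average solutions receive a probability that falls linearly to 0 at the best fitness, protecting good solutions, while below-average solutions receive a constant. The function name, the fallback when all fitnesses are equal, and the constants in the usage lines are assumptions.

```python
def adaptive_probability(f: float, f_max: float, f_avg: float,
                         k_scaled: float, k_low: float) -> float:
    """Fitness-dependent crossover or mutation probability (maximisation).

    For crossover, f is the larger fitness of the two parents; for mutation,
    f is the fitness of the individual being mutated.
    """
    if f < f_avg:
        return k_low                       # below average: constant, high disruption
    spread = f_max - f_avg
    if spread <= 0:                        # all solutions equally fit: rule undefined
        return k_low
    return k_scaled * (f_max - f) / spread  # 0 for the best, k_scaled at the average


# Illustrative constants: pc uses (1.0, 1.0), pm uses (0.5, 0.5).
pc = adaptive_probability(7.5, f_max=10.0, f_avg=5.0, k_scaled=1.0, k_low=1.0)  # 0.5
pm = adaptive_probability(10.0, f_max=10.0, f_avg=5.0, k_scaled=0.5, k_low=0.5)  # 0.0
```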

It can be quite effective to combine a GA with other optimization methods. A GA tends to be quite good at finding generally good global solutions, but quite inefficient at finding the last few mutations needed to reach the absolute optimum.

4.2 Using Genetic Algorithm for feature selection

This heuristic approach has been chosen because the number of features to consider is large. The objective is first to isolate the most relevant associations of features, and then to classify individuals that share the considered similarities according to these associations.

4.2.1 Introduction

The first phase of this algorithm deals with isolating the very few relevant features from the large set. This is not exactly the classical feature selection problem known in data mining: here, less than 5% of the features are expected to be selected. Nevertheless, the problem is close to the classical feature selection problem, and we use a genetic algorithm since, as noted earlier, genetic algorithms are well adapted to problems with a large number of features. The genetic algorithm considered here has several phases and proceeds for a fixed number of generations. A chromosome is a string of bits whose length equals the number of features. A 0 or 1 at position i indicates whether feature i is not selected (0) or selected (1).

The Genetic Operators

These operators allow GAs to explore the search space. However, operators typically have destructive as well as constructive effects, so they must be adapted to the problem.

We use a Subset Size-Oriented Common Feature Crossover Operator (SSOCF), which keeps useful informative blocks and produces offspring that have the same distribution as the parents. Offspring are kept only if they are fitter than the least fit individual of the population. Features shared by the two parents are kept by the offspring, and the non-shared features are inherited by the offspring corresponding to the i-th parent with probability (ni − nc)/nu, where ni is the number of selected features of the i-th parent, nc is the number of commonly selected features across both mating partners and nu is the number of non-shared selected features.

Figure 4.1 The SSOCF Crossover Operator
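A Python sketch of SSOCF as described above, assuming equal-length bit-string chromosomes. The function name ssocf is hypothetical, and the acceptance test (offspring kept only if fitter than the least fit individual) is left to the caller. Because the two inheritance probabilities sum to 1, child i keeps parent i's subset size on average.

```python
import random
from typing import List, Tuple

def ssocf(p1: List[int], p2: List[int],
          rng: random.Random) -> Tuple[List[int], List[int]]:
    """Subset Size-Oriented Common Feature crossover (sketch)."""
    n1, n2 = sum(p1), sum(p2)
    n_c = sum(a & b for a, b in zip(p1, p2))   # features selected by both parents
    n_u = (n1 - n_c) + (n2 - n_c)              # selected by exactly one parent
    children = []
    for n_i in (n1, n2):
        prob = (n_i - n_c) / n_u if n_u else 0.0
        child = []
        for a, b in zip(p1, p2):
            if a == b:                         # shared 1s and shared 0s are inherited
                child.append(a)
            else:                              # non-shared: on with probability prob
                child.append(1 if rng.random() < prob else 0)
        children.append(child)
    return children[0], children[1]


rng = random.Random(0)
c1, c2 = ssocf([1, 1, 0, 0, 1, 0], [1, 0, 1, 0, 0, 0], rng)
```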

Mutation is an operator that maintains diversity. During the mutation stage, a chromosome mutates with probability pmut. If a chromosome is selected for mutation, a number n of bits to flip is chosen at random, and then n randomly chosen bits are flipped.
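The mutation step can be sketched as follows. The text does not bound n, so drawing it uniformly from 1..max_flips (max_flips = 3 by default) is an assumption, as is the function name.

```python
import random
from typing import List

def n_bit_flip_mutation(chrom: List[int], p_mut: float, rng: random.Random,
                        max_flips: int = 3) -> List[int]:
    """With probability p_mut, flip n distinct randomly chosen bits, where
    n is drawn uniformly from 1..max_flips. Returns a new list."""
    child = list(chrom)
    if rng.random() < p_mut:
        n = rng.randint(1, min(max_flips, len(child)))
        for i in rng.sample(range(len(child)), n):
            child[i] = 1 - child[i]
    return child


rng = random.Random(0)
mutant = n_bit_flip_mutation([0, 0, 0, 0, 0], p_mut=1.0, rng=rng)
```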

A probabilistic binary tournament selection is used. Tournament selection holds n tournaments to choose n individuals. Each tournament consists of sampling 2 elements of the population and choosing the better one with a probability p ∈ [0.5, 1].
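A short sketch of the probabilistic binary tournament, assuming higher fitness is better; the function name is hypothetical. With p = 1 the better individual always wins, and with p = 0.5 selection is uniform.

```python
import random
from typing import Callable, List, Sequence

def binary_tournament(population: Sequence[List[int]],
                      fitness: Callable[[List[int]], float],
                      n: int, p: float, rng: random.Random) -> List[List[int]]:
    """Hold n tournaments; each samples 2 individuals and returns the better
    one with probability p, otherwise the worse one."""
    if not 0.5 <= p <= 1.0:
        raise ValueError("p must lie in [0.5, 1]")
    winners = []
    for _ in range(n):
        a, b = rng.sample(list(population), 2)
        better, worse = (a, b) if fitness(a) >= fitness(b) else (b, a)
        winners.append(better if rng.random() < p else worse)
    return winners


rng = random.Random(0)
winners = binary_tournament([[0, 0], [1, 1]], fitness=sum, n=5, p=1.0, rng=rng)
```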

The Chromosomal Distance

A specific distance is defined, a kind of bit-to-bit distance in which not a single bit i is considered, but the whole window (i, i+ …) of the two individuals is compared. If one and only one individual has a selected feature in this window, the distance is increased by one.
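The window width is not recoverable from the text, so in the sketch below it is an explicit parameter w (hypothetical). It is also assumed that windows slide one position at a time and that only full-width windows are compared. The example shows the intended effect: two subsets selecting neighbouring features are closer than under plain Hamming distance.

```python
from typing import List

def window_distance(x: List[int], y: List[int], w: int) -> int:
    """For each start position i, compare the windows x[i:i+w] and y[i:i+w];
    add 1 when exactly one of the two individuals selects a feature inside."""
    d = 0
    for i in range(len(x) - w + 1):
        if any(x[i:i + w]) != any(y[i:i + w]):
            d += 1
    return d


print(window_distance([1, 0, 0, 0], [0, 1, 0, 0], w=1))  # 2 (plain Hamming distance)
print(window_distance([1, 0, 0, 0], [0, 1, 0, 0], w=2))  # 1
```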

Sharing

To avoid premature convergence and to discover different good solutions (different relevant associations of features), we use a niching mechanism. Both crowding and sharing give good results, and we choose to implement fitness sharing. The objective is to boost the selection chance of individuals that lie in less crowded areas of the search space. We use a niche count that measures how crowded the neighborhood of a solution is. The fitness of individuals located in highly concentrated regions of the search space is degraded, and the new fitness value is used in place of the initial fitness value for selection.
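The text does not specify the sharing function, so the sketch uses the commonly used triangular form sh(d) = 1 - (d / sigma_share)^alpha for d < sigma_share (0 otherwise), with sigma_share and alpha as assumed parameters. Raw fitness is assumed positive and to be maximised; any distance with d(x, x) = 0 can be plugged in, for example Hamming distance or the window distance described above.

```python
from typing import Callable, List, Sequence

def shared_fitness(population: Sequence[List[int]], raw: Sequence[float],
                   distance: Callable[[List[int], List[int]], float],
                   sigma_share: float, alpha: float = 1.0) -> List[float]:
    """Divide each raw fitness by its niche count m_i = sum_j sh(d_ij)."""
    shared = []
    for i, xi in enumerate(population):
        niche = 0.0
        for xj in population:
            d = distance(xi, xj)
            if d < sigma_share:
                niche += 1.0 - (d / sigma_share) ** alpha
        shared.append(raw[i] / niche)  # niche >= 1 because d(x_i, x_i) = 0
    return shared


def hamming(a: List[int], b: List[int]) -> int:
    return sum(u != v for u, v in zip(a, b))


print(shared_fitness([[1, 0], [1, 0], [0, 1]], [2.0, 2.0, 2.0], hamming, sigma_share=1.0))
# [1.0, 1.0, 2.0]: the two identical individuals share their niche
```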

Random Immigrant

Random immigrant is a method that helps to maintain diversity in the population and should also help to avoid premature convergence. It is used as follows: if the best individual remains the same for N generations, each individual of the population whose fitness is below the mean is replaced by a new randomly generated individual.
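Both parts of the random immigrant rule, the trigger and the replacement, can be sketched as below. The function names and the way the history of best individuals is recorded are assumptions for illustration.

```python
import random
from typing import List

def should_immigrate(best_history: List[List[int]], n_generations: int) -> bool:
    """True when the best individual has been identical for the last n generations."""
    recent = best_history[-n_generations:]
    return len(recent) == n_generations and all(b == recent[0] for b in recent)

def random_immigrants(population: List[List[int]], fitness: List[float],
                      rng: random.Random) -> List[List[int]]:
    """Replace every individual whose fitness is below the population mean
    with a new random bit string of the same length."""
    mean = sum(fitness) / len(fitness)
    n_bits = len(population[0])
    return [[rng.randint(0, 1) for _ in range(n_bits)] if f < mean else list(ind)
            for ind, f in zip(population, fitness)]


rng = random.Random(0)
new_pop = random_immigrants([[1, 1], [0, 0], [1, 0]], [2.0, 0.0, 1.0], rng)
```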

4.2.2 Filter Approach

The filter approach uses metrics such as Information Gain, Similarity and Relief to assign a fitness value to the individual being evaluated. This approach gives a weight to each of the selected features individually, and the overall fitness value is obtained by combining the individual weights suitably [58-60].

The following two filter-based approaches have been implemented for feature selection using MATLAB:

4.2.3 Relief Algorithm based feature selection

The key idea of the Relief algorithm is to evaluate features according to their ability to distinguish between close samples. Relief's core concept is that a "good" feature should keep samples of the same category close together and samples of different categories far apart. In the Relief algorithm, a sample R is first selected at random; then R's nearest neighbor H in the same category (the NearestHit) and its nearest neighbor M in a different category (the NearestMiss) are found. For a given feature x, if the distance between R and H is shorter than the distance between R and M, i.e., Diff(x, R, M) > Diff(x, R, H), feature x is considered good for differentiation and its weight is increased; conversely, if Diff(x, R, M) < Diff(x, R, H), its weight is reduced. The above procedure is repeated m times, finally giving the average weight of each feature. The larger the weight, the better the feature.

The pseudo-code of Relief is given below:

Input: training set D, number of iterations m
Output: weight vector W[A] over the N attributes

Set all weights W[A] = 0
for i = 1 to m do
    Select a sample R randomly
    Find its NearestHit H and NearestMiss M
    for A = 1 to N do
        W[A] = W[A] - diff(A, R, H)/m + diff(A, R, M)/m
    end
end
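A runnable Python version of the Relief pseudo-code is sketched below for a two-class dataset. It is not the MATLAB implementation used in this work; the range-normalised diff, the Manhattan neighbour distance and the requirement that every class has at least two samples are assumptions.

```python
import random
from typing import List, Sequence

def relief(X: Sequence[Sequence[float]], y: Sequence[int], m: int,
           seed: int = 0) -> List[float]:
    """Relief weights: diff(A, R, S) = |R[A] - S[A]| / range(A); neighbours are
    found with the sum of these diffs (Manhattan distance on scaled values)."""
    rng = random.Random(seed)
    n_attr = len(X[0])
    lo = [min(row[a] for row in X) for a in range(n_attr)]
    hi = [max(row[a] for row in X) for a in range(n_attr)]

    def diff(a: int, r: Sequence[float], s: Sequence[float]) -> float:
        span = hi[a] - lo[a]
        return abs(r[a] - s[a]) / span if span else 0.0

    def dist(r: Sequence[float], s: Sequence[float]) -> float:
        return sum(diff(a, r, s) for a in range(n_attr))

    w = [0.0] * n_attr
    for _ in range(m):
        i = rng.randrange(len(X))                   # select sample R randomly
        r = X[i]
        hits = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        misses = [j for j in range(len(X)) if y[j] != y[i]]
        h = X[min(hits, key=lambda j: dist(r, X[j]))]    # NearestHit H
        mm = X[min(misses, key=lambda j: dist(r, X[j]))]  # NearestMiss M
        for a in range(n_attr):
            w[a] += (diff(a, r, mm) - diff(a, r, h)) / m
    return w


X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = relief(X, y, m=20)  # feature 0 separates the classes, feature 1 does not
```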

The advantages of Relief-family algorithms are high efficiency, no restriction on the data type, and insensitivity to the relationships among features. Their main drawback is that they cannot remove redundant features: they give higher weights to all features that are highly correlated with the class, regardless of whether a feature is redundant with respect to the remaining features.

4.2.4 Information Gain and Similarity

In this method, fitness is evaluated based on the Information Gain and Similarity of the attributes. A good subset should contain attributes with high information gain; the similarity of each individual attribute with the class should be high, and the similarity of the attributes with one another should be low.

The Information Gain of an attribute x with respect to class c is given by

$$IG(c, x) = H(c) - H(c \mid x) \qquad (4.1)$$

where H(c) is the entropy of c and H(c | x) is the conditional entropy of c when the value of feature x is known.
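Equation (4.1) can be computed directly for discrete features, as in the Python sketch below. It uses base-2 logarithms (entropy in bits), which is an assumption since the text does not fix the base; continuous attributes would first need to be discretised.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(c: Sequence[Hashable], x: Sequence[Hashable]) -> float:
    """IG(c, x) = H(c) - H(c | x), with H(c | x) = sum_v P(x = v) H(c | x = v)."""
    n = len(c)
    h_cond = 0.0
    for value, count in Counter(x).items():
        subset = [ci for ci, xi in zip(c, x) if xi == value]
        h_cond += (count / n) * entropy(subset)
    return entropy(c) - h_cond


print(information_gain([0, 0, 1, 1], ["a", "a", "b", "b"]))  # 1.0: x determines c
```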

The similarity between features x and y is computed, and the value range of Sim(x, y) is [0, 1]. Sim(x, y) = 0 means that x and y are completely unrelated, and Sim(x, y) = 1 means that x and y are completely related. When Sim(x, y) is greater than a chosen threshold, the features x and y are considered redundant.

(4.2)

The overall benefit of a selected feature subset X = {x'_1, …, x'_k} is given by the equation:

$$E(X) = \sum_{i=1}^{k} IG(c, x'_i) + \sum_{i=1}^{k} Sim(c, x'_i) - \frac{\sum_{1 \le i < j \le k} Sim(x'_i, x'_j)}{num\_pairs} \qquad (4.3)$$

where num_pairs = k(k − 1)/2 is the number of distinct pairs of selected features.

4.3 Implementation of Genetic Algorithm for feature selection

The feature selection algorithm has been implemented using MATLAB.

The fitness function is the objective function to be minimized. It is specified as a function handle of the form @distance_fitness_function, where distance_fitness_function.m is an M-file that returns a scalar. The Relief algorithm is implemented in the distance_fitness_function.m file.

The distance_fitness_function computes the fitness of a set of attributes based on the Relief algorithm. At the beginning of the function, the training set of the clinical dataset is read. The total number of attributes and the total number of instances are stored in variables. The position of the class column, i.e., the total number of attributes plus one, is also stored, and the attribute details are loaded. Then the number of random samples to be chosen is specified; this is the number of iterations that the fitness function performs for a particular set of attributes. The weight variable is initially set to zero.

The MATLAB function rand() returns a uniformly distributed random number in the open interval (0, 1). This number is multiplied by ten to the power of the number of digits in the total number of instances, to scale it to the order of magnitude of the number of instances, and is then rounded to an integer that serves as the index of a randomly chosen instance. Variables for the nearest hit, nearest miss, hit value and miss value are then defined and initialized to 0, 0, infinity and infinity, respectively.

A loop is then run in which an index variable varies from one to the number of instances in the dataset. Whenever the index is not equal to the generated random number, the distance between the instance at that index in the training set and the randomly chosen instance is computed. Here, the distance function performs an exclusive OR (XOR) operation between the selected attributes of the two instances and returns the total number of 1s in the result as the distance. The class value (the element at the class position) of the randomly chosen instance is then compared with that of the instance at the current index. If they are equal and the distance is smaller than the current hit value, the distance is stored as the hit value and the index is stored as the nearest hit. If they are not equal and the distance is smaller than the current miss value, the distance is stored as the miss value and the index is stored as the nearest miss.

Then the input attribute set is loaded and, for each selected attribute A, the weight is updated as

weight = weight - |R[A] - H[A]| / m + |R[A] - M[A]| / m

where R is the randomly chosen instance, H is its nearest hit, M is its nearest miss and m is the number of samples to be chosen, in line with the Relief pseudo-code in Section 4.2.3. Finally, the return value of the fitness function is calculated as the negative of the weight divided by the number of 1s in the input attribute set.
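The fitness function described above can be re-expressed in Python as a sketch. Several assumptions are made: attributes are binary-coded (so the XOR count is meaningful), both classes contain at least two records, the random record is drawn uniformly with randrange rather than by scaling rand(), and an empty subset receives an infinitely bad score. The function name relief_subset_fitness is hypothetical. Lower values are better, matching the GA's minimisation.

```python
import random
from typing import Sequence

def relief_subset_fitness(mask: Sequence[int], X: Sequence[Sequence[int]],
                          y: Sequence[int], m: int, seed: int = 0) -> float:
    """Return -(accumulated Relief weight) / (number of selected features)."""
    selected = [a for a, bit in enumerate(mask) if bit]
    if not selected:
        return float("inf")                  # assumption: empty subset is worst
    rng = random.Random(seed)

    def dist(r: Sequence[int], s: Sequence[int]) -> int:
        return sum(r[a] ^ s[a] for a in selected)   # XOR, then count the 1s

    weight = 0.0
    for _ in range(m):
        i = rng.randrange(len(X))                   # randomly chosen record R
        hit, miss = None, None
        hit_d, miss_d = float("inf"), float("inf")
        for j in range(len(X)):
            if j == i:
                continue
            d = dist(X[i], X[j])
            if y[j] == y[i] and d < hit_d:
                hit, hit_d = j, d
            elif y[j] != y[i] and d < miss_d:
                miss, miss_d = j, d
        for a in selected:
            weight += (-abs(X[i][a] - X[hit][a]) + abs(X[i][a] - X[miss][a])) / m
    return -weight / len(selected)


X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]   # attribute 0 equals the class, attribute 1 is noise
f_good = relief_subset_fitness([1, 0], X, y, m=10)
f_bad = relief_subset_fitness([0, 1], X, y, m=10)
```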

Number of variables is the number of independent variables for the fitness function. Here, the number of variables is the number of attributes in the experimental dataset.

Plot Functions

Plot functions enable us to plot various aspects of the genetic algorithm as it is executing. Each one draws in a separate set of axes in the display window, and the Stop button in the window can be used to interrupt a running process. Best individual is chosen as the plot function in this experiment.

Best individual plots the vector entries of the individual with the best fitness function value in each generation.

4.3.1 Population Options

Population options specify options for the population of the genetic algorithm.

Population type specifies the type of the input to the fitness function. Bit string has been chosen as the Population type in this experiment.

Population size specifies how many individuals there are in each generation. In this experiment, Population size is set to a vector of length 20, so the algorithm creates multiple subpopulations; each entry of the vector specifies the size of a subpopulation.

Creation function specifies the function that creates the initial population. The default creation function, Uniform, which creates a random initial population with a uniform distribution, is used in our experiment.

Initial population enables us to specify an initial population for the genetic algorithm. Since an initial population is not specified, the algorithm creates one using the Creation function.

Initial scores enable us to specify scores for the initial population. Since initial scores are not specified, the algorithm computes the scores using the fitness function.

Initial range specifies lower and upper bounds for the entries of the vectors in the initial population. We have specified Initial range as a matrix with 2 rows and Initial length columns. The first row contains lower bounds for the entries of the vectors in the initial population, while the second row contains upper bounds.

4.3.2 Fitness Scaling Options

The scaling function converts raw fitness scores returned by the fitness function to values in a range that is suitable for the selection function.

Scaling function specifies the function that performs the scaling. Rank scaling is chosen as the scaling function.

Rank scales the raw scores based on the rank of each individual, rather than its score. The rank of an individual is its position in the sorted scores: the rank of the fittest individual is 1, the next fittest is 2, and so on. Rank fitness scaling removes the effect of the spread of the raw scores.
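MATLAB's documentation describes rank scaling as giving an individual of rank n a scaled value proportional to 1/√n, with the scaled values summing to the number of parents needed for the next generation. The Python sketch below follows that description for a minimisation problem (lowest raw score gets rank 1); it is not the toolbox source, and breaking ties by position is an assumption.

```python
import math
from typing import List, Sequence

def rank_scaling(raw_scores: Sequence[float], n_parents: int) -> List[float]:
    """Scaled value of rank n is proportional to 1/sqrt(n); values sum to n_parents."""
    order = sorted(range(len(raw_scores)), key=lambda i: raw_scores[i])
    scaled = [0.0] * len(raw_scores)
    for rank, i in enumerate(order, start=1):
        scaled[i] = 1.0 / math.sqrt(rank)
    total = sum(scaled)
    return [s * n_parents / total for s in scaled]


# Only the order of the raw scores matters, not their spread:
print(rank_scaling([1, 2, 3], 4) == rank_scaling([1, 20, 3000], 4))  # True
```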

4.3.3 Selection Options

The selection function chooses parents for the next generation based on their scaled values from the fitness scaling function. The Stochastic uniform function performs the selection.

Stochastic uniform lays out a line in which each parent corresponds to a section of the line of length proportional to its expectation. The algorithm moves along the line in steps of equal size, one step for each parent. At each step, the algorithm allocates a parent from the section it lands on. The first step is a uniform random number less than the step size.
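The stochastic uniform procedure (often called stochastic universal sampling) can be sketched in Python as follows; the function name and the index-based return value are assumptions. With expectations [2, 1, 1] and 4 parents the step size is 1, so individual 0 is always chosen twice and the others once each, whatever the random offset.

```python
import random
from typing import List, Sequence

def stochastic_uniform(expectation: Sequence[float], n_parents: int,
                       rng: random.Random) -> List[int]:
    """Return indices of selected parents using equally spaced pointers."""
    step = sum(expectation) / n_parents
    pointer = rng.random() * step          # first step: uniform in [0, step)
    chosen: List[int] = []
    i, cumulative = 0, expectation[0]
    for _ in range(n_parents):
        # Advance to the section under the pointer (guard against rounding).
        while pointer >= cumulative and i < len(expectation) - 1:
            i += 1
            cumulative += expectation[i]
        chosen.append(i)
        pointer += step
    return chosen


print(stochastic_uniform([2.0, 1.0, 1.0], 4, random.Random(0)))  # [0, 0, 1, 2]
```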

4.3.4 Reproduction Options

Reproduction options determine how the genetic algorithm creates children at each new generation.

Elite count specifies the number of individuals that are guaranteed to survive to the next generation. Elite count is set to 2; it must be less than or equal to Population size.

Crossover fraction specifies the fraction of the next generation, other than elite individuals, that is produced by crossover. The remaining individuals, other than elite individuals, in the next generation are produced by mutation. Crossover fraction is set to 0.8.
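As a worked example of these two settings: to our understanding, MATLAB's ga produces round(Crossover fraction × (population size − Elite count)) crossover children and fills the rest of the generation with mutation children. Assuming that rule and a hypothetical population of 20 individuals, the split is computed below.

```python
def children_per_generation(pop_size: int, elite_count: int,
                            crossover_fraction: float) -> dict:
    """Split one generation into elite, crossover and mutation children."""
    if not 0 <= elite_count <= pop_size:
        raise ValueError("elite_count must lie between 0 and pop_size")
    non_elite = pop_size - elite_count
    n_crossover = round(crossover_fraction * non_elite)
    return {"elite": elite_count, "crossover": n_crossover,
            "mutation": non_elite - n_crossover}


print(children_per_generation(pop_size=20, elite_count=2, crossover_fraction=0.8))
# {'elite': 2, 'crossover': 14, 'mutation': 4}
```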

4.3.5 Mutation Options

Mutation functions make small random changes in the individuals in the population, which provide genetic diversity and enable the GA to search a broader space. The Gaussian function performs the mutation.

Gaussian adds a random number to each vector entry of an individual. This random number is taken from a Gaussian distribution centered on zero. The variance of this distribution can be controlled with two parameters. The Scale parameter determines the variance at the first generation. The Shrink parameter controls how the variance shrinks as generations go by. The Shrink parameter is set to 1, so the variance shrinks linearly to 0 as the last generation is reached.

4.3.6 Crossover Options

Crossover combines two individuals, or parents, to form a new individual, or child, for the next generation. The Scattered function performs the crossover.

Scattered creates a random binary vector. It then selects the genes where the vector is a 1 from the first parent and the genes where the vector is a 0 from the second parent, and combines the genes to form the child. For example,

p1 = [a b c d e f g h]

p2 = [1 2 3 4 5 6 7 8]

random crossover vector = [1 1 0 0 1 0 0 0]

child = [a b 3 4 e 6 7 8]
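The scattered crossover example can be reproduced with a short Python sketch. The function name is hypothetical; passing an explicit mask reproduces the worked example, while omitting it draws a random binary mask as the MATLAB option does.

```python
import random
from typing import List, Optional, Sequence

def scattered_crossover(p1: Sequence, p2: Sequence,
                        mask: Optional[Sequence[int]] = None,
                        rng: Optional[random.Random] = None) -> List:
    """Take gene k from p1 where mask[k] == 1 and from p2 where mask[k] == 0."""
    if mask is None:
        rng = rng or random.Random()
        mask = [rng.randint(0, 1) for _ in p1]
    return [a if bit else b for a, b, bit in zip(p1, p2, mask)]


p1 = list("abcdefgh")
p2 = [1, 2, 3, 4, 5, 6, 7, 8]
print(scattered_crossover(p1, p2, mask=[1, 1, 0, 0, 1, 0, 0, 0]))
# ['a', 'b', 3, 4, 'e', 6, 7, 8]
```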


4.3.7 Migration Options

Migration is the movement of individuals between subpopulations, which the algorithm creates if Population size is set to a vector of length greater than 1. Every so often, the best individuals from one subpopulation replace the worst individuals in another subpopulation. Migration is controlled by the following three parameters.

Direction - Migration can take place in one direction or two. Direction is set to Forward, so migration takes place toward the last subpopulation; that is, the n-th subpopulation migrates into the (n+1)-th subpopulation.

Fraction controls how many individuals move between subpopulations. Fraction is the fraction of the smaller of the two subpopulations that moves; it is set to 0.2 in our experiment. Individuals that migrate from one subpopulation to another are copied; they are not removed from the source subpopulation.

Interval controls how many generations pass between migrations. Interval is set to 20, so migration between subpopulations takes place every 20 generations.
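Forward migration can be sketched as below, with scores minimised as in the MATLAB setting. The function name, computing all migrants from the subpopulations before any replacement, truncating fraction × size to an integer, and the absence of wrap-around from the last subpopulation to the first are assumptions. In a full GA loop it would be called once every Interval generations.

```python
from typing import Callable, List

Individual = List[int]

def migrate_forward(subpops: List[List[Individual]],
                    score: Callable[[Individual], float],
                    fraction: float) -> None:
    """Copies of the best (lowest-score) individuals of subpopulation k replace
    the worst individuals of subpopulation k+1; subpops is modified in place."""
    moves = []
    for k in range(len(subpops) - 1):
        src, dst = subpops[k], subpops[k + 1]
        n = int(fraction * min(len(src), len(dst)))
        best = sorted(src, key=score)[:n]                   # lowest scores first
        moves.append((k + 1, [list(ind) for ind in best]))  # copies; src unchanged
    for target, migrants in moves:
        if migrants:
            dst = subpops[target]
            dst.sort(key=score, reverse=True)               # worst (highest) first
            dst[:len(migrants)] = migrants


subpops = [[[1, 1], [1, 0], [0, 0], [0, 0], [0, 0]],
           [[0, 0], [0, 1], [1, 1], [0, 0], [1, 0]]]
migrate_forward(subpops, score=lambda c: -sum(c), fraction=0.2)  # moves 1 individual
```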

4.3.8 Hybrid Function Options

Hybrid function enables us to specify another minimization function that runs after the genetic algorithm terminates. In our experiment, the Hybrid function option is set to none.

4.3.9 Stopping Criteria Options

Stopping criteria determine what causes the algorithm to terminate.

Generations specifies the maximum number of iterations the genetic algorithm performs. In this experiment, Generations is set to 100.

Time limit specifies the maximum time in seconds the genetic algorithm runs before stopping. In this experiment, Time limit is set to Infinity.

Fitness limit - If the best fitness value is less than or equal to the value of Fitness limit, the algorithm stops. In this experiment, Fitness limit is set to Infinity.

Stall generations - If there is no improvement in the best fitness value for the number of generations specified by Stall generations, the algorithm stops. In this experiment, Stall generations is set to 50.

Stall time limit - If there is no improvement in the best fitness value for an interval of time in seconds specified by Stall time limit, the algorithm stops. In this experiment, Stall time limit is set to 50 seconds.

4.3.10 Display to Command Window Options

Level of display specifies the amount of information displayed in the MATLAB command window when the genetic algorithm runs. We have chosen the option off, and only the final answer is displayed.

Vectorize Option

The vectorize option specifies whether the computation of the fitness function is vectorized. Vectorize is set to off, indicating that the fitness function evaluates one individual at a time and returns a scalar.

4.4 Experimental datasets

Five standard clinical datasets of varying sizes and characteristics obtained from the UCI Machine Learning Repository, and one dataset from BHEL Hospital, are used in this experiment. The details of the datasets are as follows:

We have two datasets for appendicitis. The first, the standard appendicitis dataset [61] from the UCI Machine Learning Repository, is used to discriminate healthy people from those with appendicitis, according to a class attribute set to 0 for healthy and 1 for appendicitis. This dataset contains 9 numeric-valued attributes, 1 binary-valued class variable and 106 records. The second dataset is used to diagnose the severity of appendicitis in patients presenting with right iliac fossa (RIF) pain. It is based on statistics about the presence of appendicitis collected from a patient dataset of around 2230 records from BHEL Hospital, Tiruchirappalli, India. The second dataset is used to discriminate patients into different classes of appendicitis, namely mild, moderate and severe.

Parkinson's Dataset [62] is composed of a range of biomedical voice measurements from 31 people, 23 of whom have Parkinson's disease. The main aim of the data is to discriminate healthy people from those with Parkinson's disease, according to a class attribute set to 0 for healthy and 1 for Parkinson's disease.

ARCENE's [63] task is to distinguish cancer versus normal patterns from mass-spectrometric data. This is a two-class classification problem with continuous input variables. ARCENE was obtained by merging three mass-spectrometry datasets to obtain enough training and test data for a benchmark.

SPECT Heart Dataset [64] describes the diagnosis of cardiac Single Photon Emission Computed Tomography (SPECT) images. Each patient is classified into two categories: normal and abnormal.

Cardiotocography Dataset [63] contains the processed information of 2126 fetal cardiotocograms (CTGs) and the respective diagnostic features measured. The CTGs were also classified by three expert obstetricians, and a consensus classification label was assigned to each of them. They classified the fetal state as Normal or Abnormal.

4.5 Experimental Results

The classification accuracies of the Genetic Algorithm with the Decision Tree, Naïve Bayesian and k-Nearest Neighbor classifiers on the appendicitis dataset are 88.68%, 88.68% and 85.85%, respectively. The corresponding accuracies are 83.02%, 83.96% and 81.13% for Information Gain; 83.02%, 83.96% and 81.13% for the Chi-Square algorithm; 85.85%, 82.08% and 80.19% for the BLogReg algorithm; and 85.85%, 83.02% and 83.02% for the FCBF algorithm. The classification accuracies of the Genetic Algorithm and the other feature selection techniques on the remaining clinical datasets are given in detail in the Chapter Experimental Results.

Table 4.1 Classification accuracy of different feature selection techniques on the Appendicitis dataset

Feature selection    Attributes    Attributes   Accuracy:        Accuracy:        Accuracy:
algorithm            in dataset    selected     Decision Tree    Naïve Bayesian   k-Nearest Neighbor
Genetic Algorithm    8             4            88.68%           88.68%           85.85%
Information Gain     8             4            83.02%           83.96%           81.13%
Chi square           8             4            83.02%           83.96%           81.13%
BLogReg              8             1            85.85%           82.08%           80.19%
FCBF                 8             2            85.85%           83.02%           83.02%
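Assuming each accuracy in Table 4.1 is computed over all 106 records of the appendicitis dataset (an assumption, since the evaluation protocol is not stated here), every percentage corresponds to a whole number of correctly classified records. The sketch below checks this and converts the percentages to counts; for example, the gap between the Genetic Algorithm and Information Gain with the decision tree is 94 versus 88 records.

```python
# Accuracies (%) from Table 4.1, ordered: Decision Tree, Naive Bayes, k-NN.
table_4_1 = {
    "Genetic Algorithm": (88.68, 88.68, 85.85),
    "Information Gain": (83.02, 83.96, 81.13),
    "Chi square": (83.02, 83.96, 81.13),
    "BLogReg": (85.85, 82.08, 80.19),
    "FCBF": (85.85, 83.02, 83.02),
}
N_RECORDS = 106  # appendicitis records reported in Section 4.4 (assumed evaluation set)

for name, accs in table_4_1.items():
    correct = [round(a * N_RECORDS / 100) for a in accs]
    # Each count must reproduce the reported percentage to two decimals.
    assert all(round(100 * c / N_RECORDS, 2) == a for c, a in zip(correct, accs))
    print(f"{name:18s} correctly classified: {correct}")
```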

4.6 Chapter Conclusions

It is observed that the proposed Relief-based feature selection implemented with the Genetic Algorithm performs better than the other feature selection algorithms across the different classification techniques. The Genetic Algorithm is the best feature selection algorithm for the Appendicitis, Parkinson's and ARCENE datasets, all of whose attributes are real-valued. For high-dimensional datasets, the results indicate that the Genetic Algorithm in combination with a decision tree is the best feature selection strategy.