AN INNOVATIVE APPROACH FOR FEATURE SELECTION BASED ON CHICKEN SWARM OPTIMIZATION
By
*Ahmed Hafez and Aboul Ella Hassanien
Bio-inspiring and evolutionary computation: Trends, applications and open issues workshop, 7 Nov. 2015, Faculty of Computers and Information, Cairo University
www.egyptscience.net
*Faculty of Computers and Information, Minia University and SRGE Member
Agenda
1. Introduction.
2. Motivation.
3. System Overview and Methods.
4. Experimental Results.
5. Conclusions.
Introduction
A feature selection algorithm explores the data to eliminate noisy, irrelevant, and redundant data while simultaneously optimizing classification performance.
Feature selection is one of the most important stages in data mining, multimedia information retrieval, pattern classification, and machine learning applications, and it can influence the classification accuracy rate.
In real-world problems, feature selection is a must because of the abundance of noisy, misleading, or irrelevant features.
Introduction: Feature Selection in ML
Naive theoretical view: more features => more information => more discrimination power.
In practice, there are many reasons why this is not the case.
Introduction: Feature Selection in ML
Many explored domains have hundreds to tens of thousands of variables/features, many of them irrelevant or redundant.
In domains with many features, the underlying probability distribution can be very complex and very hard to estimate (e.g. because of dependencies between variables).
Irrelevant and redundant features can “confuse” learners (classifiers).
Training data is limited.
Computational resources are limited.
Motivation
Chicken Swarm Optimization is a new evolutionary computation technique that mimics the hierarchical order of a chicken swarm and the behaviors of its individual chickens in nature.
The objectives of feature selection:
1. Improving the prediction performance of the predictors (classifiers).
2. Providing faster and more cost-effective predictors.
3. Providing a better understanding of the underlying process that generated the data.
Motivation
In this paper, a fitness function based on classification accuracy is proposed and used with Chicken Swarm Optimization to find an optimal feature subset.
The aim of Chicken Swarm Optimization is to find optimal regions of the complex search space through the interaction of individuals in the population.
The approach is compared with particle swarm optimization (PSO) and genetic algorithms (GA) on a set of datasets from the UCI machine learning repository.
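The slides do not give the fitness formula itself. A later slide says the fitness targets both classification accuracy and reduction size, and the parameter table lists a fitness function constant of 0.9999. The sketch below is one plausible reading, not the authors' confirmed definition: it assumes the constant is a weight `alpha` on the classification error, with the remainder weighting the fraction of selected features. The 1-nearest-neighbour classifier and the names `nearest_neighbor_error` and `fitness` are hypothetical choices made only for illustration.

```python
def nearest_neighbor_error(train, valid, mask):
    """Error rate of a 1-nearest-neighbour classifier using only the selected features.

    train and valid are lists of (feature_vector, label) pairs; mask is a list of 0/1 flags.
    The classifier is an assumption: the slides do not name the one that was used.
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    errors = 0
    for x, y in valid:
        # Closest training sample by squared Euclidean distance over the selected columns.
        nearest = min(train, key=lambda s: sum((x[j] - s[0][j]) ** 2 for j in cols))
        errors += nearest[1] != y
    return errors / len(valid)


def fitness(mask, train, valid, alpha=0.9999):
    """Assumed fitness (lower is better): alpha * error + (1 - alpha) * fraction selected."""
    if not any(mask):
        return 1.0  # assumption: an empty subset receives the worst possible score
    error = nearest_neighbor_error(train, valid, mask)
    selected_ratio = sum(mask) / len(mask)
    return alpha * error + (1.0 - alpha) * selected_ratio
```

Under this reading the error term dominates, and the subset size mainly breaks ties between subsets that classify equally well.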
Bio-inspired optimization: Introduction
Bio-inspired optimization draws on the natural social behavior, dynamic movements, and communication of insects, birds, animals, and fish.
The term swarm (shoaling, swarming, or flocking) is applied to fish, insects, and birds, and describes the behavior of an aggregation of animals of similar size and body orientation, generally cruising in the same direction.
Bio-inspired optimization: Chicken Swarm Optimization (CSO)
A chicken swarm is divided into several groups; each group consists of one rooster and many hens and chicks.
Each type of chicken follows different laws of motion.
A hierarchical order plays a significant role in the social lives of chickens.
The superior chickens in a flock dominate the weak ones.
More dominant hens remain near the head rooster.
Bio-inspired optimization: Chicken Swarm Optimization (CSO)
Mathematical model assumptions of CSO:
The chicken swarm is divided into several groups. In each group there is a dominant rooster, followed by some hens and chicks.
Hens follow their group-mate roosters to search for food.
Chicks move around their mother to search for food.
Fitness values determine the swarm hierarchy:
individuals with the best fitness values -> roosters (group leaders)
individuals with the worst fitness values -> chicks
others -> hens
The swarm hierarchy remains unchanged between updates and is only updated every several (G) time steps.
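The ranking rule above can be sketched in a few lines. This is a minimal illustration, assuming lower fitness is better and that roosters and hens are taken as fixed fractions of the population (the defaults 0.15 and 0.7 mirror the CSO values in the parameter table). The function names and the rounding of group sizes are assumptions, not details taken from the slides.

```python
def assign_roles(fitness_values, rooster_ratio=0.15, hen_ratio=0.7):
    """Label each individual as rooster, hen, or chick from its fitness rank.

    Assumes lower fitness is better. Returns a dict mapping individual index to role.
    """
    n = len(fitness_values)
    order = sorted(range(n), key=lambda i: fitness_values[i])  # best individual first
    n_roosters = max(1, round(n * rooster_ratio))
    n_hens = min(n - n_roosters, round(n * hen_ratio))
    roles = {}
    for rank, idx in enumerate(order):
        if rank < n_roosters:
            roles[idx] = "rooster"
        elif rank < n_roosters + n_hens:
            roles[idx] = "hen"
        else:
            roles[idx] = "chick"
    return roles


def hierarchy_due(step, G):
    """True on the time steps where the hierarchy is rebuilt (every G steps)."""
    return step % G == 0
```

Between rebuilds, the roles stay fixed even if an individual's fitness changes, which matches the last assumption in the list.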
Bio-inspired optimization: Chicken Swarm Optimization (CSO)
Rooster movements:
Hen movements:
Chick movements:
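The movement equations themselves are not present in this transcript. The sketch below follows the update rules as they are commonly stated for chicken swarm optimization in the literature, so it is an assumption about what the slide showed, not a copy of it. All function and parameter names are hypothetical, lower fitness is taken to be better, and `EPS` is a small constant added only to avoid division by zero.

```python
import math
import random

EPS = 1e-12  # small constant to avoid division by zero


def move_rooster(x, f_self, f_rival, rng):
    """Rooster: scale each coordinate by Gaussian noise around 1.

    The variance is 1 unless a randomly chosen rival rooster has better (lower)
    fitness, in which case the variance shrinks and the rooster searches more narrowly.
    """
    if f_self <= f_rival:
        sigma2 = 1.0
    else:
        sigma2 = math.exp((f_rival - f_self) / (abs(f_self) + EPS))
    return [xj * (1.0 + rng.gauss(0.0, math.sqrt(sigma2))) for xj in x]


def move_hen(x, f_self, x_rooster, f_rooster, x_other, f_other, rng):
    """Hen: move toward its group rooster and toward another randomly chosen chicken."""
    s1 = math.exp((f_self - f_rooster) / (abs(f_self) + EPS))
    s2 = math.exp(f_other - f_self)
    return [
        xj + s1 * rng.random() * (rj - xj) + s2 * rng.random() * (oj - xj)
        for xj, rj, oj in zip(x, x_rooster, x_other)
    ]


def move_chick(x, x_mother, rng, fl_low=0.0, fl_high=2.0):
    """Chick: follow its mother with a random follow factor FL drawn from [fl_low, fl_high]."""
    fl = rng.uniform(fl_low, fl_high)
    return [xj + fl * (mj - xj) for xj, mj in zip(x, x_mother)]
```

For feature selection, the continuous positions still have to be mapped to a subset of features; how the authors did that is not stated in the slides.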
Experiment results
We used 18 data sets for the experiments, all drawn from the UCI data repository.
Each data set is divided into 3 equal parts: one for training, one for validation, and one for testing.
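The three-way split can be sketched as follows. This is a minimal illustration that assumes the samples are shuffled before splitting and that any remainder goes to the test part; the slides state neither detail, and the function name and `seed` parameter are hypothetical.

```python
import random


def three_way_split(samples, seed=0):
    """Shuffle the samples and cut them into training, validation, and test parts.

    The parts are equal in size, except that the test part absorbs any remainder.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    third = len(shuffled) // 3
    train = shuffled[:third]
    valid = shuffled[third:2 * third]
    test = shuffled[2 * third:]
    return train, valid, test
```

In a wrapper setup like this one, the training and validation parts would typically drive the search, with the test part held back for the final error estimate.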
The CSO optimizer is evaluated against two other algorithms: genetic algorithms (GA) and particle swarm optimization (PSO).
The classification performance of each optimizer is measured and compared with the performance obtained using all features in each dataset, without a feature selection algorithm.
Data set        No. of attributes   No. of instances
Lymphography    18                  148
WineEW          13                  178
BreastEW        30                  569
Breastcancer    9                   699
CongressEW      16                  435
Exactly         13                  1000
Exactly2        13                  1000
HeartEW         13                  270
IonosphereEW    34                  351
KrvskpEW        36                  3196
M-of-n          13                  1000
PenglungEW      325                 73
SonarEW         60                  208
SpectEW         22                  267
Tic-tac-toe     9                   958
Vote            16                  300
WaveformEW      40                  5000
Zoo             16                  101
Experiment results
Global parameter values:
Parameter                                                       Value
Fitness function constant                                       0.9999
Number of iterations for optimization                           70
Number of search agents used in the optimization                10
Number of times the stochastic optimization is repeated         20

Optimizer-specific parameters:
Algorithm   Parameter                                   Value
CSO         r - the population size of roosters         0.15
CSO         h - the population size of hens             0.7
CSO         m - the population size of mother hens      0.5
PSO         w - value of the inertia factor             0.1
PSO         c - individual-best acceleration factor     0.1
GA          Crossover fraction                          0.8
GA          Migration fraction                          0.2
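Because each stochastic optimizer is run 20 times, the results slides report best, worst, mean, and standard deviation of the fitness. A minimal sketch of that aggregation is shown below. The function name is hypothetical, lower fitness is assumed to be better, and the use of the population standard deviation is an assumption, since the slides do not say which form was used.

```python
import statistics


def summarize_runs(final_fitness_per_run):
    """Aggregate the final fitness of repeated runs (lower fitness is better)."""
    values = list(final_fitness_per_run)
    return {
        "best": min(values),
        "worst": max(values),
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),  # assumption: population standard deviation
    }
```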
Experiment results
[Figure: Mean classification error on test data for each dataset, compared with the data using all features. Series: GA, PSO, CSO, All Features.]
Experiment results
[Figure: Best fitness values for each dataset obtained by the different optimizers. Series: GA, PSO, CSO, All Features.]
Experiment results
[Figure: Worst fitness values for each dataset obtained by the different optimizers. Series: GA, PSO, CSO, All Features.]
Experiment results
[Figure: Mean and std of the fitness values for each dataset obtained by the different optimizers. Series: GA, PSO, CSO, All Features.]
Experiment results
[Figure: Feature reduction on each dataset using the different optimizers. Series: GA, PSO, CSO.]
Experiment results
[Figure: Average best and worst fitness values over all datasets for each optimizer. Series: GA, PSO, CSO, All Features.]
[Figure: Average mean fitness values over all datasets for each optimizer. Series: GA, PSO, CSO, All Features.]
Experiment results
[Figure: Average feature reduction over all datasets for each optimizer. Series: GA, PSO, CSO.]
[Figure: Average mean classification error over all datasets for each optimizer. Series: GA, PSO, CSO, All Features.]
Conclusions
The objective of this paper was to propose a chicken swarm optimization (CSO) algorithm for feature selection that chooses a minimal number of features while obtaining comparable or even better classification accuracy than using all attributes.
This study shows that CSO is an effective search algorithm for feature selection problems.
The fitness function used targets both classification accuracy and reduction size, which means we can obtain a minimal set of selected features with maximum accuracy.
CSO shows an improvement in both reduction size and classification accuracy over PSO and GA.
Future Work
We will work on the updating mechanisms in CSO for feature selection to further minimize the number of attributes and maximize the classification accuracy.
We will also examine the use of the chicken swarm optimization (CSO) algorithm for feature selection on datasets with a large number of attributes.
Thank you
E-mail: [email protected]