Artificial Neural Network Synthesis by means of
Artificial Bee Colony (ABC) Algorithm
Beatriz A. Garro
Center for Computing Research
National Polytechnic Institute
CIC-IPN
Mexico City, Mexico
Email: [email protected]
Humberto Sossa
Center for Computing Research
National Polytechnic Institute
CIC-IPN
Mexico City, Mexico
Email: [email protected]
Roberto A. Vazquez
Intelligent Systems Group
Faculty of Engineering
La Salle University
Mexico City, Mexico
Email: [email protected]
Abstract—The artificial bee colony (ABC) algorithm has been used in several optimization problems, including the optimization of the synaptic weights of an artificial neural network (ANN). However, this alone is not enough to generate a robust ANN. For that reason, some authors have proposed methodologies based on so-called metaheuristics that automatically allow designing an ANN, taking into account not only the optimization of the synaptic weights but also the ANN's architecture and the transfer function of each neuron. However, those methodologies do not generate a reduced design (synthesis) of the ANN. In this paper, we present an ABC-based methodology that maximizes the accuracy and minimizes the number of connections of an ANN by evolving, at the same time, the synaptic weights, the ANN's architecture and the transfer function of each neuron. The methodology is tested with several pattern recognition problems.
I. INTRODUCTION
Artificial neural networks (ANNs) are very important tools for solving different kinds of problems, such as pattern classification, forecasting and regression. However, their design implies a trial-and-error process that tests different architectures and transfer functions, together with the selection of a training algorithm that adjusts the synaptic weights of the ANN. This design stage is very important because the wrong selection of one of these characteristics can cause the training algorithm to become trapped in a local minimum. Because of this, several metaheuristic-based methods for obtaining a good ANN design have been reported.
In [1], Xin Yao presents a state-of-the-art review in which evolutionary algorithms are used to evolve the synaptic weights and the architecture, in some cases with the help of classic techniques such as the back-propagation algorithm. There are also works like [2], where the authors automatically evolve the design of an ANN using basic PSO, second generation PSO (2GPSO) and a new technique (NMPSO). In [3], the authors design an ANN by means of the DE algorithm and compare it with other bio-inspired techniques. In these last two works, the authors evolve, at the same time, the principal features of an ANN: the synaptic weights, the transfer function of each neuron and the architecture. However, the architectures obtained by these two methods contain many connections.
In [4], the authors train an ANN by means of the ABC algorithm. In [5], the authors apply this algorithm to train a feed-forward network for solving the XOR, 3-bit parity and 4-bit encoder-decoder problems. In the pattern classification area, in works like [6] the ABC algorithm is compared with other evolutionary techniques, while in [7] an ANN is trained for medical pattern classification. Another problem solved by applying the ABC algorithm can be found in [8], where the authors experiment with clustering problems.
In [9], the authors train an RBF neural network using the ABC algorithm. In that work, four characteristics of this kind of ANN are optimized: the weights between the hidden layer and the output layer, the spread parameters of the hidden layer base functions, the center vectors of the hidden layer, and the bias parameters of the output layer neurons. In [10], an ANN is trained to estimate and model the daily reference evapotranspiration of two USA stations.
There are other kinds of algorithms based on bee behavior that have been applied to train an ANN. For example, in [11] the bees algorithm is used to identify wood defects, while in [12] the same algorithm is applied to optimize the synaptic weights of an ANN. In [13], a good review concerning this kind of bio-inspired algorithm and the problems it can solve is given.
The ABC algorithm is said to be a good optimization technique. In this paper we want to verify whether this algorithm performs well in the automatic design of an ANN, covering not only the synaptic weights but also the architecture and the transfer functions of the neurons. As we will see, the architectures obtained are optimal in the sense that the number of connections is minimal without losing accuracy.
The paper is organized as follows: in Section 2 we briefly present the basics of ANNs. In Section 3 we explain the fundamental concepts of the ABC algorithm, while in Section 4 we describe how the ABC algorithm is used to design an ANN and how the ANN's architecture can be optimized. In Section 5 the experimental results using different classification problems are given. Finally, in Section 6 we present the conclusions of the work.
II. ARTIFICIAL NEURAL NETWORKS
An ANN tries to simulate the brain's behavior when it generates, stores or transforms information. An ANN is a system made up of simple processing units that offers an input-output mapping capability [14]. Each processing unit operates in two stages: a weighted summation followed by some type of non-linear function; together, these allow the ANN to carry out a learning stage over the input data representing the problem to be solved. Each value of an input pattern a ∈ IR^N is associated with a synaptic weight value from W ∈ IR^N, which is normally between 0 and 1. Furthermore, the summation function often takes an extra input value θ with weight value 1, representing a threshold or bias for the neuron. The summation is then performed as in Eq. 1.
o = \sum_{i=1}^{N} a_i w_i + \theta \qquad (1)
The sum of the products is then passed to the second stage, which applies the activation function f(o) that generates the output of the neuron and determines the behavior of the neural model. By connecting multiple neurons, the true computing power of the ANN emerges. The most common structure for connecting neurons is by layers. In a multilayer structure, the input nodes, which receive the pattern a ∈ IR^N, pass the information to the units in the first hidden layer; the outputs of the first hidden layer are passed to the next layer, and so on until the output layer is reached, thus producing an approximation of the desired output y ∈ IR^M.
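To make this concrete, here is a minimal sketch in Python of a single processing unit: the weighted summation of Eq. 1 followed by a transfer function. A logistic sigmoid is assumed here purely for illustration; the names are ours, not from the authors' implementation.

```python
import numpy as np

def neuron_output(a, w, theta, f=lambda o: 1.0 / (1.0 + np.exp(-o))):
    """Two-stage processing unit: weighted summation (Eq. 1)
    followed by a transfer function f (logistic sigmoid assumed)."""
    o = np.dot(a, w) + theta   # o = sum_i a_i * w_i + theta
    return f(o)

# Example with a 3-dimensional input pattern
a = np.array([0.2, 0.7, 0.1])
w = np.array([0.5, -1.2, 0.8])
print(neuron_output(a, w, theta=0.3))
```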
Basically, learning is a process by which the free parameters
(i.e., synaptic weights W and bias levels θ) of an ANN are
adapted through a continuous process of stimulation by the
environment in which the network is embedded. The type of
learning is determined by the manner in which the parameter
changes take place. On the other hand, the learning process
may be classified as: supervised or unsupervised. In this paper
we focus on supervised learning that assumes the availability
of a labeled set of training data made up of p input-output
samples (see Eq. 2):
T = \left\{ \left( \mathbf{a}^{\xi} \in \mathbb{R}^N, \mathbf{d}^{\xi} \in \mathbb{R}^M \right) \right\} \quad \forall \xi = 1, \ldots, p \qquad (2)
where a is the input pattern and d the desired response.
Given the training sample T, the requirement is to compute the free parameters of the neural network so that the actual output y^ξ of the ANN due to a^ξ is close enough to d^ξ for all ξ in a statistical sense. To this end, we may use the mean-square error (MSE) given in Eq. 3 as the first objective function to be minimized:
e = \frac{1}{p \cdot M} \sum_{\xi=1}^{p} \sum_{i=1}^{M} \left( d_i^{\xi} - y_i^{\xi} \right)^2 \qquad (3)
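Eq. 3 translates directly into code. A small sketch, assuming the desired and actual outputs are stored as p × M NumPy arrays:

```python
import numpy as np

def mse(d, y):
    """Mean-square error of Eq. 3; d and y are (p, M) arrays of
    desired and actual outputs over the p training samples."""
    p, M = d.shape
    return np.sum((d - y) ** 2) / (p * M)
```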
One of the most widely used ANNs is the feed-forward neural network, trained by means of the back-propagation (BP) algorithm [15], [16]. This algorithm minimizes the objective function described by Eq. 3, constantly adjusting the values of the synaptic weights until the error no longer decreases. However, these classic algorithms can converge to a local minimum instead of the desired global minimum. Furthermore, the architecture and the transfer functions used in the design can influence the ANN's performance; consequently, the learning algorithm can become trapped in a minimum far away from the best solution.
III. ARTIFICIAL BEE COLONY ALGORITHM
The Artificial Bee Colony (ABC) algorithm is based on the metaphor of the bees' foraging behavior. The natural selection that created this beautiful system of communication can also be seen within the system: information about different parts of the environment behaves like species in competition. The fitness of a species is given by the profitability of the food source it describes. Information survives by continuing to circulate within the nest, and it is capable of reproducing itself by recruiting new foragers who become informed of the food source, come back to the nest, and share their information [17].
The ABC algorithm was proposed by Karaboga in 2005 [18] for solving numerical optimization problems. It is based on the model proposed by Tereshko and Loengarov [17]. It consists of a set of possible solutions x_i (the population) represented by the positions of the food sources. In order to find the best solution, three classes of bees are used: employed bees, onlooker bees and scout bees. These bees have different tasks in the colony, i.e., in the search space.
Employed bees: Each employed bee searches for a new food source in the neighborhood of its current one. It then compares the new food source against the old one using Eq. 4 and keeps the better one in its memory.
v_i^j = x_i^j + \phi_i^j \left( x_i^j - x_k^j \right) \qquad (4)
where k ∈ {1, 2, ..., SN} and j ∈ {1, 2, ..., D} are randomly chosen indexes with k ≠ i, and φ_i^j is a random number in [−a, a]. After that, the bee evaluates the quality of each food source based on the amount of nectar (the information), i.e., the fitness function is calculated. Finally, it returns to the dancing area in the hive, where the onlooker bees are.
Onlooker bees: These bees watch the dances of the employed bees so as to learn where a food source can be found, whether its nectar is of high quality, and the size of the food source. An onlooker bee probabilistically chooses a food source depending on the amount of nectar advertised by each employed bee; see Eq. 5.
p_i = \frac{fit_i}{\sum_{n=1}^{SN} fit_n} \qquad (5)
where fit_i is the fitness value of solution i and SN is the number of food sources, which equals the number of employed bees.
Scout bees: These bees help the colony by randomly creating new solutions (Eq. 6) when a food source cannot be improved anymore. This phenomenon is controlled by the "limit", or abandonment criterion.
x_i^j = x_{\min}^j + \mathrm{rand}(0,1) \left( x_{\max}^j - x_{\min}^j \right) \qquad (6)
The pseudo-code of the ABC algorithm is shown next:
1: Initialize the population of solutions x_i, i = 1, ..., SN
2: Evaluate the population x_i, i = 1, ..., SN
3: for cycle = 1 to MCN do
4: Produce new solutions v_i for the employed bees by using Eq. 4 and evaluate them.
5: Apply the greedy selection process.
6: Calculate the probability values p_i for the solutions x_i by Eq. 5.
7: Produce the new solutions v_i for the onlookers from the solutions x_i selected depending on p_i and evaluate them.
8: Apply the greedy selection process.
9: Determine the abandoned solution for the scout, if it exists, and replace it with a new randomly produced solution x_i by Eq. 6.
10: Memorize the best solution achieved so far.
11: cycle = cycle + 1
12: end for
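As an illustration of this loop, the sketch below implements the three bee phases for a generic real-valued cost function to be minimized. It is a reading aid, not the authors' code: the cost-to-fitness mapping fit = 1/(1 + f) (valid for non-negative costs) and all identifiers are our assumptions, while the neighbor move, roulette-wheel selection and scout replacement follow Eqs. 4, 5 and 6.

```python
import numpy as np

def abc_minimize(fitness, dim, sn=25, mcn=1000, limit=100, lo=-4.0, hi=4.0):
    """Minimal ABC sketch: sn food sources of dimension dim, mcn cycles,
    abandonment of a source after `limit` failed improvement attempts."""
    x = np.random.uniform(lo, hi, (sn, dim))       # food source positions
    f = np.array([fitness(xi) for xi in x])        # cost of each source
    trials = np.zeros(sn, dtype=int)
    best, best_f = x[np.argmin(f)].copy(), float(f.min())

    def try_neighbor(i):
        nonlocal best, best_f
        j = np.random.randint(dim)                 # random dimension
        k = np.random.choice([n for n in range(sn) if n != i])
        v = x[i].copy()
        v[j] = x[i, j] + np.random.uniform(-1, 1) * (x[i, j] - x[k, j])  # Eq. 4
        fv = fitness(v)
        if fv < f[i]:                              # greedy selection
            x[i], f[i], trials[i] = v, fv, 0
            if fv < best_f:
                best, best_f = v.copy(), fv
        else:
            trials[i] += 1

    for _ in range(mcn):
        for i in range(sn):                        # employed bee phase
            try_neighbor(i)
        fit = 1.0 / (1.0 + f)                      # cost -> fitness (assumed mapping)
        p = fit / fit.sum()                        # Eq. 5
        for _ in range(sn):                        # onlooker bee phase
            try_neighbor(int(np.random.choice(sn, p=p)))
        i = int(np.argmax(trials))                 # scout bee phase
        if trials[i] > limit:
            x[i] = np.random.uniform(lo, hi, dim)  # Eq. 6
            f[i] = fitness(x[i])
            trials[i] = 0
    return best, best_f
```

For instance, abc_minimize(lambda v: float(np.sum(v ** 2)), dim=10) drives a simple sphere function toward zero.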
IV. METHODOLOGY
The main aim of our methodology is to evolve, at the same time, the synaptic weights, the architecture (or topology), and the transfer function of each neuron so as to obtain a minimum mean square error (MSE) as well as a minimum classification error (CER). At the same time, we seek to optimize the ANN's architecture by reducing the number of neurons and their connections.
The problem to be solved can be defined as follows: given a set of input patterns X = {x^1, ..., x^p}, x ∈ IR^n, and a set of desired patterns D = {d^1, ..., d^p}, d ∈ IR^m, find an ANN represented by a matrix W ∈ IR^{q×(q+2)} such that F(D, X, W) is minimized.
The codification of the ANN's design to be evolved by the ABC algorithm is given in Fig. 1. This figure shows the food source's position representing the solution to the problem. This solution is defined by a matrix W ∈ IR^{q×(q+2)} composed of three main parts: the topology (T), the synaptic weights (SW) and the transfer functions (F), where q is the maximum number of neurons MNN, defined by q = 2(m + n) (remember that n is the dimension of the input pattern vector and m is the dimension of the desired pattern vector).
The three parts of the matrix W take values from three different ranges: for the topology, the range is [1, 2^{MNN} − 1]; for the synaptic weights, it is [−4, 4]; and for the transfer functions, it is [1, nF], where nF is the number of transfer functions.
Fig. 1. Representation of the individual codifying the architecture, synaptic weights and transfer functions.

The ANN's topology is codified based on the binary square matrix representation of a graph x, where each component x_ij = 1 represents a connection between neuron i and neuron j. This information is codified in its decimal base. For example, suppose that the binary code "01101" represents the connections of the ith neuron to five neurons. From this binary code, we can observe that only neurons two, three and five are connected to neuron i. This binary code is transformed into its decimal base value, resulting in "13"; this is the number that we evolve instead of the binary value. This scheme is much faster to manipulate: instead of evolving a string of bits, we evolve a decimal base number.
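A sketch of this decoding (a hypothetical helper of ours, not from the paper) reproduces the example above:

```python
def decode_connections(gene, num_neurons):
    """Decode a decimal topology gene into the list of neurons connected
    to neuron i. Reading the bits from the left, bit b set to 1 means
    neuron b+1 is connected, as in the '01101' -> {2, 3, 5} example."""
    bits = format(int(gene), "0{}b".format(num_neurons))
    return [b + 1 for b, bit in enumerate(bits) if bit == "1"]

print(decode_connections(13, 5))  # -> [2, 3, 5]
```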
The synaptic weights of the ANN are codified again by the square matrix representation of a graph x, where each component x_ij represents the synaptic weight between neuron i and neuron j.
Finally, the transfer function for each neuron is represented by an integer in the range [0, 5], codifying one of the six transfer functions used in this research: logsig, tansig, sin, radbas, purelin and hardlim. These functions were selected because they are among the most popular and useful transfer functions for several kinds of problems.
When the aptitude of an individual is computed by means of the MSE function (Eq. 7), all the values of the matrix W are decoded so as to obtain the desired ANN. Moreover, each solution must be tested in order to evaluate its performance. Because the methodology is tested with several pattern classification problems, it is also necessary to compute the classification error (CER), that is, how many patterns have been correctly classified and how many have been incorrectly classified.
F_1 = \frac{1}{p \cdot M} \sum_{\xi=1}^{p} \sum_{i=1}^{M} \left( d_i^{\xi} - y_i^{\xi} \right)^2 \qquad (7)
For the case of the CER fitness function, the output of the ANN is transformed by means of the winner-take-all technique; this codification is then compared with the set of desired patterns. When the output of the ANN equals the corresponding desired pattern, the pattern has been correctly classified; otherwise, it has been incorrectly classified. Based on the winner-take-all technique we can compute the CER, defined by Eq. 8:
F_2 = 1 - \frac{npwc}{tnp} \qquad (8)
where npwc is the number of correctly classified patterns and tnp is the total number of patterns to be classified.
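Assuming the outputs and desired patterns are stored as p × M arrays in which the active class holds the largest value, the winner-take-all computation of Eq. 8 can be sketched as:

```python
import numpy as np

def cer(d, y):
    """Classification error of Eq. 8 with winner-take-all coding:
    a pattern is well classified when the index of the maximum
    output matches the index of the desired class."""
    npwc = np.sum(np.argmax(y, axis=1) == np.argmax(d, axis=1))
    tnp = d.shape[0]
    return 1.0 - npwc / tnp
```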
When the MSE is used, the output y_i of the ANN is computed as follows:
1: For the first n neurons, the output is o_i = a_i.
2: for ne_i with i = n to MNN do
3: Get its connections by using individual x_{1,i}.
4: for each neuron j < i connected to ne_i do
5: o_i = f(o), where f is the transfer function given by individual x_{m,i} and o is computed using Eq. 1.
6: end for
7: Finally, y_i = o_i, i = MNN − m, ..., MNN.
8: end for
Note that the restriction j < i is used to avoid the generation
of cycles in the ANN.
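A sketch of this forward evaluation, assuming the individual has already been decoded into a boolean connection matrix conn (conn[j, i] true when neuron j feeds neuron i, with j < i only), a weight matrix and a list of per-neuron transfer functions (our names; the bias term is omitted for brevity):

```python
import numpy as np

def forward(a, conn, weights, funcs, n, mnn, m):
    """Evaluate the evolved ANN on input pattern a (length n).
    conn[j, i]: neuron j feeds neuron i (j < i, so no cycles);
    weights[j, i]: synaptic weight; funcs[i]: transfer function.
    The last m neurons provide the output."""
    o = np.zeros(mnn)
    o[:n] = a                                  # input neurons just copy a
    for i in range(n, mnn):
        s = sum(o[j] * weights[j, i] for j in range(i) if conn[j, i])
        o[i] = funcs[i](s)                     # Eq. 1 plus transfer function
    return o[mnn - m:]                         # outputs y
```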
Until now, we have defined two fitness functions that help to maximize the ANN's accuracy by minimizing its error (MSE or CER). Now, we have to propose a function that helps not only to reach maximum accuracy but also to minimize the number of connections of the ANN. The reduction of the architecture can be represented as follows:
F_3 = \frac{NC}{NMaxC} \qquad (9)
where NC is the number of connections in the ANN designed by the proposed methodology and NMaxC is the maximum number of connections that can be generated with MNN neurons. NMaxC is given as:
NMaxC = \sum_{i=n}^{MNN} i \qquad (10)
It is important to notice that if F3 alone is used as the fitness function in the ABC algorithm, the proposed methodology will synthesize the ANN, but the accuracy will not be maximized. For that reason, we propose fitness functions that integrate both objectives: the minimization of the error and the synthesis of the ANN (the reduction of the number of connections). Two such fitness functions, composed by combining functions F1, F2 and F3, are proposed to achieve this goal. The first fitness function (FF1) is given by Eq. 11, while the second (FF2) is given by Eq. 12.
FF_1 = F_1 \cdot F_3 \qquad (11)

FF_2 = F_2 \cdot F_3 \qquad (12)
With these functions, as we will next see, we are able to design ANNs with high accuracy and a very low number of connections.
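Reusing the mse and cer sketches above, the combined fitness functions are then one-liners (again, illustrative names rather than the authors' code):

```python
def f3(nc, nmaxc):
    """Eq. 9: fraction of the maximum possible number of connections used."""
    return nc / nmaxc

def ff1(d, y, nc, nmaxc):
    """Eq. 11: accuracy (MSE) and synthesis objectives combined."""
    return mse(d, y) * f3(nc, nmaxc)

def ff2(d, y, nc, nmaxc):
    """Eq. 12: accuracy (CER) and synthesis objectives combined."""
    return cer(d, y) * f3(nc, nmaxc)
```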
Fig. 2. Evolution of the error for the ten experiments for the object recognition problem. (a) Evolution of FF1 using the MSE function. (b) Evolution of FF2 using the CER function.
V. RESULTS
Several experiments were performed in order to evaluate the accuracy of the ANNs designed by means of the proposed methodology. The accuracy was tested with four pattern classification problems. Three of them were taken from the UCI machine learning benchmark repository [19]: the iris plant, wine and breast cancer datasets. The fourth was a real object recognition problem.
The main features of each dataset are as follows. For the object recognition dataset, the dimension of the input vector is 7 and the number of classes is 5. For the iris plant dataset, the dimension of the input vector is 4 and the number of classes is 3. For the wine dataset, the dimension of the input vector is 13 and the number of classes is 3. Finally, for the breast cancer dataset, the dimension of the input vector is 9 and the number of classes is 2.
The parameters of the ABC algorithm were set to the same values for all the dataset problems: colony size NP = 50, number of food sources NP/2, limit = 100, and maximum number of cycles MCN = 4000. Six transfer functions were used in all experiments: SN = sine function, LS = sigmoid function, HT = hyperbolic tangent function, GS = Gaussian function, LN = linear function and HL = hard limit function.
Twenty experiments were performed using each dataset: ten with fitness function FF1 and ten with fitness function FF2. For each experiment, the dataset was randomly divided into a training set and a testing set, with the aim of proving the robustness and performance of the methodology. The same parameters were used throughout the whole experimentation.
Depending on the problem, the ABC algorithm approaches the minimum error during the evolutionary learning process at different rates. For instance, for the object recognition problem, we observed that by evolving FF1 (the one using the MSE), the error tends to diminish quickly at first, and after a certain number of generations it diminishes slowly (Figure 2(a)). On the other hand, we also observed that in some cases when FF2 is evolved, the error reaches its minimum within a few epochs; nonetheless, in general the error tends to diminish slowly (Figure 2(b)).
Fig. 3. Evolution of the error for the ten experiments for the iris plant problem. (a) Evolution of FF1 using the MSE function. (b) Evolution of FF2 using the CER function.
Fig. 4. Evolution of the error for the ten experiments for the wine problem. (a) Evolution of FF1 using the MSE function. (b) Evolution of FF2 using the CER function.
For the iris plant dataset, we observed that by evolving FF1 (the one using the MSE) or FF2 (the one using the CER), the error tends to diminish quickly at first, and after a certain number of generations it diminishes slowly (Figure 3).

For the wine dataset, we observed that by evolving FF1 or FF2, the error tends to diminish slowly (Figure 4).

Finally, for the breast cancer dataset, we observed that by evolving FF1 or FF2, the error tends to diminish quickly at first, and after a certain number of generations it diminishes slowly (Figure 5).
Figures 6, 7, 8 and 9 show two of the 20 different ANNs automatically generated with the proposed methodology for each dataset.
Fig. 5. Evolution of the error for the ten experiments for the breast cancer problem. (a) Evolution of FF1 using the MSE function. (b) Evolution of FF2 using the CER function.
Fig. 6. Two different ANN designs for the object recognition problem. (a) ANN designed by the ABC algorithm without taking into account the F3 function. (b) ANN designed by the ABC algorithm taking into account the F3 function.
Fig. 7. Two different ANN designs for the iris plant problem. (a) ANN designed by the ABC algorithm without taking into account the F3 function. (b) ANN designed by the ABC algorithm taking into account the F3 function.
It is important to note that the ANNs in Figures 6(a), 7(a), 8(a) and 9(a) were automatically designed by the ABC algorithm, but the fitness functions FF1 and FF2 did not include the synthesis of the ANN; in other words, these fitness functions did not use the F3 function. On the contrary, the ANNs in Figures 6(b), 7(b), 8(b) and 9(b) were automatically designed by the ABC algorithm with fitness functions FF1 and FF2 that include the synthesis of the ANN by means of the F3 function. Furthermore, in some cases the dimensionality of the input pattern is reduced because some features do not contribute to the ANN's output.
Fig. 8. Two different ANN designs for the wine problem. (a) ANN designed by the ABC algorithm without taking into account the F3 function. (b) ANN designed by the ABC algorithm taking into account the F3 function.
Fig. 9. Two different ANN designs for the breast cancer problem. (a) ANN designed by the ABC algorithm without taking into account the F3 function. (b) ANN designed by the ABC algorithm taking into account the F3 function.
Table I shows the average connection number achieved with
the proposed fitness functions (FF1 and FF2). In addition,
we also present the average connection number achieved when
F3 is not taken into account by the proposed fitness functions.
As the reader can appreciate, the number of connections
decreases when function F3 is used.
TABLE I
AVERAGE CONNECTION NUMBER

                 Objective func.       Objective func.
                 without F3            using F3
Dataset          FF1       FF2         FF1       FF2
Object rec.      74        66.3        58.4      65
Iris plant       20.9      19          15.6      12.7
Wine             104.8     94.6        86.3      89.9
Breast cancer    48        41.9        30.7      31.6
Once the ANNs for each problem were generated, we proceeded to test their accuracy. The next figures show the performance of the methodology with the two fitness functions. Figures 10, 11, 12 and 13 present the percentage of classification for all the experiments (executions) during the training and testing phases. Whereas for the object recognition, wine and breast cancer datasets the best percentage of recognition in both phases was achieved using the FF1 fitness function, for the iris plant dataset it was achieved using the FF2 fitness function.
Table II presents the average percentage of recognition over all the experiments using fitness functions FF1 and FF2. In this table, we can observe that the best percentage of recognition for all the databases was achieved during the training phase; the accuracy slightly diminished during the testing phase. However, the results obtained with the proposed methodology were highly acceptable and stable. This tendency is corroborated in Table III, which shows the standard deviation of all the experimental results obtained with each dataset.
Tables IV and V show the maximum and minimum percentages of classification achieved over all the experiments during the training and testing phases using the two fitness functions. In Table IV there are many entries equal to one, which represent the maximum percentage (100%) of recognition that can be achieved by the designed ANN.
Fig. 10. Percentage of recognition for the object recognition problem and the ten experiments during the training and testing stages for each fitness function. (a) Percentage of recognition minimizing the MSE. (b) Percentage of recognition minimizing the CER.
Fig. 11. Percentage of recognition for the iris plant problem and the ten experiments during the training and testing stages for each fitness function. (a) Percentage of recognition minimizing the FF1 function. (b) Percentage of recognition minimizing the FF2 function.
Fig. 12. Percentage of recognition for the wine problem and the ten experiments during the training and testing stages for each fitness function. (a) Percentage of recognition minimizing the FF1 function. (b) Percentage of recognition minimizing the FF2 function.
TABLE II
AVERAGE PERCENTAGE OF RECOGNITION

                 FF1                    FF2
Dataset          Training  Testing      Training  Testing
Object rec.      0.984     0.946        0.938     0.864
Iris plant       0.9667    0.9253       0.9693    0.9387
Wine             0.9337    0.8629       0.8764    0.7944
Breast cancer    0.973     0.9655       0.9739    0.9561
Fig. 13. Percentage of recognition for the breast cancer problem and the ten experiments during the training and testing stages for each fitness function. (a) Percentage of recognition minimizing the FF1 function. (b) Percentage of recognition minimizing the FF2 function.
TABLE III
STANDARD DEVIATION OF RECOGNITION

                 FF1                    FF2
Dataset          Training  Testing      Training  Testing
Object rec.      0.0386    0.0962       0.0371    0.0842
Iris plant       0.0237    0.0378       0.0189    0.0373
Wine             0.0287    0.0575       0.0304    0.1164
Breast cancer    0.0063    0.0102       0.0111    0.0134
This is important because we found at least one configuration that solves a specific problem without misclassified patterns or with a low percentage of error. Table V shows the worst values achieved with the designed ANNs. In particular, the dataset that provided the worst results was the wine problem. Nonetheless, the accuracy achieved was highly acceptable.
TABLE IV
THE BEST PERCENTAGE OF RECOGNITION

                 FF1                    FF2
Dataset          Training  Testing      Training  Testing
Object rec.      1         1            1         0.96
Iris plant       1         0.9733       1         0.9733
Wine             0.9775    0.9551       0.9213    0.9213
Breast cancer    0.9824    0.9766       0.9853    0.9766
TABLE V
THE WORST PERCENTAGE OF RECOGNITION

                 FF1                    FF2
Dataset          Training  Testing      Training  Testing
Object rec.      0.88      0.72         0.9       0.7
Iris plant       0.92      0.8533       0.9333    0.84
Wine             0.8989    0.7865       0.8315    0.5169
Breast cancer    0.9648    0.9444       0.9501    0.9386
From these experiments, we observed that the ABC algorithm was able to find the best configuration for an ANN given a specific set of patterns that define a classification problem. Moreover, integrating the synthesis term into the fitness function causes the ABC algorithm to generate ANNs with a small number of connections and high performance. The design of an ANN consists of providing a good architecture with the best set of transfer functions and synaptic weights. The experimentation shows that all the designs generated by the proposal present an acceptable percentage of recognition in both the training and testing phases with the two fitness functions.
VI. CONCLUSIONS
The design of an ANN is achieved using the proposed methodology. The synaptic weights, the architecture and the transfer functions of an ANN are evolved by means of the ABC algorithm. Furthermore, the connections among the neurons that belong to the ANN are synthesized. This allows generating a reduced design of an ANN with high performance.
In this work we tested the performance of the ABC algorithm. We have also shown that this novel technique is a good optimization algorithm, because it does not easily get trapped in local minima. In the case of the proposed methodology, we have demonstrated its robustness; the random choice of the patterns for each experiment allowed us to obtain, statistically speaking, significant results.
The experiments were performed with two different fitness functions, FF1 and FF2, based on the MSE and the CER, respectively. Additionally, these fitness functions involve the synthesis of the architecture. Through these experiments, we observed that both functions achieved a highly acceptable performance. Moreover, we demonstrated that these fitness functions can considerably reduce the number of connections of an ANN while keeping the MSE and CER errors to a minimum.
On the other hand, in some of the ANN designs generated by the proposed methodology, some neurons belonging to the input layer are not used; they do not present any connections to other neurons. In this particular case, we can say that a reduction of the dimensionality of the input pattern is also obtained.
In general, the results were satisfactory. The proposed methodology allows searching for the best values that permit automatically constructing an ANN that generates a good solution for a classification problem.
ACKNOWLEDGMENT
B. Garro thanks CONACYT for the scholarship provided during her PhD studies. H. Sossa thanks SIP-IPN (grant 20111016), COTEPABE-IPN, DAAD-PROALMEX (grant J000.426/2009), and the European Union and CONACYT (grant FONCICYT 93829) for the economic support. The content of this paper is the exclusive responsibility of CIC-IPN and cannot be considered to reflect the position of the European Union. The authors thank the anonymous reviewers for their comments to improve the paper.
REFERENCES
[1] X. Yao, "Evolving artificial neural networks," Proceedings of the IEEE, vol. 87, 1999.
[2] B. A. Garro, H. Sossa, and R. A. Vazquez, "Design of artificial neural networks using a modified particle swarm optimization algorithm," in Proceedings of the 2009 International Joint Conference on Neural Networks (IJCNN'09). Piscataway, NJ, USA: IEEE Press, 2009, pp. 2363–2370.
[3] ——, "Design of artificial neural networks using differential evolution algorithm," in Proceedings of the 17th International Conference on Neural Information Processing: Models and Applications, Volume Part II (ICONIP'10). Berlin, Heidelberg: Springer-Verlag, 2010, pp. 201–208.
[4] D. Karaboga and B. Akay, "Artificial bee colony (ABC) algorithm on training artificial neural networks," in Signal Processing and Communications Applications (SIU 2007), IEEE 15th, 2007, pp. 1–4.
[5] D. Karaboga, B. Akay, and C. Ozturk, "Artificial bee colony (ABC) optimization algorithm for training feed-forward neural networks," in Proceedings of the 4th International Conference on Modeling Decisions for Artificial Intelligence (MDAI'07). Berlin, Heidelberg: Springer-Verlag, 2007, pp. 318–329.
[6] D. Karaboga and C. Ozturk, "Neural networks training by artificial bee colony algorithm on pattern classification," Neural Network World, vol. 19, no. 10, pp. 279–292, 2009.
[7] D. Karaboga, C. Ozturk, and B. Akay, "Training neural networks with ABC optimization algorithm on medical pattern classification," in International Conference on Multivariate Statistical Modelling and High Dimensional Data Mining, 2008.
[8] C. Ozturk and D. Karaboga, "Classification by neural networks and clustering with artificial bee colony (ABC) algorithm," in International Symposium on Intelligent and Manufacturing Systems: Features, Strategies and Innovation, 2008.
[9] T. Kurban and E. Besdok, "A comparison of RBF neural network training algorithms for inertial sensor based terrain classification," Sensors, vol. 9, pp. 6312–6329, 2009.
[10] C. Ozkan, O. Kisi, and B. Akay, "Neural networks with artificial bee colony algorithm for modeling daily reference evapotranspiration," Irrigation Science, pp. 1–11, 2010. [Online]. Available: http://dx.doi.org/10.1007/s00271-010-0254-0
[11] D. Pham, A. Soroka, A. Ghanbarzadeh, E. Koc, S. Otri, and M. Packianather, "Optimising neural networks for identification of wood defects using the bees algorithm," in 2006 IEEE International Conference on Industrial Informatics, 2006, pp. 1346–1351.
[12] D. Pham, E. Koc, and A. Ghanbarzadeh, "Optimisation of the weights of multi-layered perceptrons using the bees algorithm," in Proceedings of the 5th International Symposium on Intelligent Manufacturing Systems, 2006.
[13] D. Karaboga and B. Akay, "A survey: Algorithms simulating bee swarm intelligence," Artificial Intelligence Review, vol. 31, no. 1, pp. 61–85, Jun. 2009.
[14] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning Internal Representations by Error Propagation. Cambridge, MA, USA: MIT Press, 1986, ch. 8, pp. 318–362.
[15] J. A. Anderson, An Introduction to Neural Networks. The MIT Press, 1995.
[16] P. Werbos, "Backpropagation through time: What it does and how to do it," Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, Oct. 1990.
[17] V. Tereshko and A. Loengarov, "Collective decision making in honey-bee foraging dynamics," Computing and Information Systems Journal, vol. 9, no. 3, pp. 1–7, 2005.
[18] D. Karaboga, "An idea based on honey bee swarm for numerical optimization," Computer Engineering Department, Engineering Faculty, Erciyes University, Tech. Rep., 2005.
[19] P. M. Murphy and D. W. Aha, "UCI repository of machine learning databases," University of California, Department of Information and Computer Science, Irvine, CA, USA, Tech. Rep., 1994.