
Artificial Neural Network Synthesis by means of Artificial Bee Colony (ABC) Algorithm

Beatriz A. Garro, Center for Computing Research, National Polytechnic Institute (CIC-IPN), Mexico City, Mexico. Email: [email protected]

Humberto Sossa, Center for Computing Research, National Polytechnic Institute (CIC-IPN), Mexico City, Mexico. Email: [email protected]

Roberto A. Vazquez, Intelligent Systems Group, Faculty of Engineering, La Salle University, Mexico City, Mexico. Email: [email protected]

Abstract—The artificial bee colony (ABC) algorithm has been used in several optimization problems, including the optimization of the synaptic weights of an Artificial Neural Network (ANN). However, this alone is not enough to generate a robust ANN. For that reason, some authors have proposed metaheuristic-based methodologies that automatically design an ANN, taking into account not only the optimization of the synaptic weights but also the ANN's architecture and the transfer function of each neuron. However, those methodologies do not generate a reduced design (synthesis) of the ANN. In this paper, we present an ABC-based methodology that maximizes the accuracy and minimizes the number of connections of an ANN by evolving, at the same time, the synaptic weights, the ANN's architecture and the transfer functions of each neuron. The methodology is tested with several pattern recognition problems.

I. INTRODUCTION

Artificial neural networks (ANNs) are very important tools for solving different kinds of problems, such as pattern classification, forecasting and regression. However, their design implies a trial-and-error process that tests different architectures and transfer functions, and the selection of a training algorithm that adjusts the synaptic weights of the ANN. This design is very important because the wrong selection of one of these characteristics can cause the training algorithm to be trapped in a local minimum. Because of this, several metaheuristic-based methods for obtaining a good ANN design have been reported.

In [1], Xin Yao presents a state-of-the-art review in which evolutionary algorithms are used to evolve the synaptic weights and the architecture, in some cases with the help of classic techniques such as the back-propagation algorithm. There are also works such as [2], where the authors automatically evolve the design of an ANN using basic PSO, second generation PSO (2GPSO) and a new technique (NMPSO). In [3], the authors design an ANN by means of the DE algorithm and compare it with other bio-inspired techniques. In these last two works, the authors evolve, at the same time, the principal features of an ANN: the synaptic weights, the transfer functions for each neuron and the architecture. However, the architectures obtained by these two methods contain many connections.

In [4] the authors train an ANN by means of the ABC algorithm. In [5] the authors apply this algorithm to train a feed-forward network for solving the XOR, 3-Bit Parity and 4-Bit Encoder-Decoder problems. In the pattern classification area, in works such as [6] the ABC algorithm is compared with other evolutionary techniques, while in [7] an ANN is trained for medical pattern classification. Another problem solved by applying the ABC algorithm can be found in [8], where the authors test it on clustering problems.

In [9], the authors train an RBF neural network using the ABC algorithm. In that work four characteristics of this kind of ANN are optimized: the weights between the hidden layer and the output layer, the spread parameters of the hidden layer basis functions, the center vectors of the hidden layer, and the bias parameters of the neurons of the output layer. In [10] an ANN is trained to estimate and model the daily reference evapotranspiration of two USA stations.

There are other kinds of algorithms based on bee behavior that have been applied to train an ANN. For example, in [11] the bees algorithm is used to identify wood defects, while in [12] the same algorithm is applied to optimize the synaptic weights of an ANN. In [13], a good review of this kind of bio-inspired algorithms applied to different problems is given.

It is said that the ABC algorithm is a good optimization technique. In this paper we verify whether this algorithm performs well in the automatic design of an ANN, including not only the synaptic weights but also the architecture and the transfer functions of the neurons. As we will see, the architectures obtained are optimal in the sense that the number of connections is minimal without losing efficiency.

The paper is organized as follows: in section 2 we briefly present the basics of ANNs. In section 3 we explain the fundamental concepts of the ABC algorithm, while in section 4 we describe how the ABC algorithm is used to design an ANN and how the ANN's architecture can be optimized. In section 5 the experimental results using different classification problems are given. Finally, in section 6 we present the conclusions of the work.

II. ARTIFICIAL NEURAL NETWORKS

An ANN tries to simulate the brain's behavior when it generates, stores or transforms information. An ANN is a system made up of simple processing units. This system offers the input-output mapping property and capability [14]. Each processing unit performs in two stages: a weighted summation and some type of non-linear function; this allows the ANN to carry out a learning stage over the input data representing the problem to be solved. Each value of an input pattern a ∈ IR^N is associated with a synaptic weight value in W ∈ IR^N, which is normally between 0 and 1. Furthermore, the summation function often takes an extra input value θ with weight value 1, representing a threshold or bias for the neuron. The summation is then performed as in Eq. 1.

o = \sum_{i=1}^{N} a_i w_i + \theta    (1)

The sum of the products is then passed to the second stage, which applies the activation function f(o); this generates the output of the neuron and determines the behavior of the neural model. By connecting multiple neurons, the true computing power of the ANN emerges. The most common structure for connecting neurons is by layers. In a multilayer structure, the input nodes, which receive the pattern a ∈ IR^N, pass the information to the units in the first hidden layer, the outputs from this first hidden layer are passed to the next layer, and so on until reaching the output layer, thus producing an approximation of the desired output y ∈ IR^M.
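As a hedged illustration of Eq. 1 and the two-stage processing unit just described, the following Python sketch computes one neuron's output; the logistic transfer function and all numeric values are assumptions chosen only for the example.

```python
import numpy as np

def neuron_output(a, w, theta, f=lambda o: 1.0 / (1.0 + np.exp(-o))):
    """Two-stage unit: weighted summation (Eq. 1) followed by a transfer
    function f (a logistic sigmoid by default)."""
    o = float(np.dot(a, w)) + theta
    return f(o)

# Toy example: a 3-dimensional input pattern and weights in [0, 1].
print(neuron_output(np.array([0.2, 0.7, 0.1]), np.array([0.5, 0.3, 0.9]), theta=0.1))
```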

Basically, learning is a process by which the free parameters (i.e., synaptic weights W and bias levels θ) of an ANN are adapted through a continuous process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place. On the other hand, the learning process may be classified as supervised or unsupervised. In this paper we focus on supervised learning, which assumes the availability of a labeled set of training data made up of p input-output samples (see Eq. 2):

T_\xi = \left\{ \left( \mathbf{a}^\xi \in \mathbb{R}^N, \mathbf{d}^\xi \in \mathbb{R}^M \right) \right\} \quad \forall \xi = 1, \ldots, p    (2)

where a is the input pattern and d the desired response. Given the training sample T_ξ, the requirement is to compute the free parameters of the neural network so that the actual output y^ξ of the ANN due to a^ξ is close enough to d^ξ for all ξ in a statistical sense. In this sense, we may use the mean-square error (MSE) given in Eq. 3 as the first objective function to be minimized:

e = \frac{1}{p \cdot M} \sum_{\xi=1}^{p} \sum_{i=1}^{M} \left( d_i^\xi - y_i^\xi \right)^2    (3)

One of the most used ANNs is the feed-forward neural network, trained by means of the back-propagation (BP) algorithm [15], [16]. This algorithm minimizes the objective function described by Eq. 3. Such algorithms repeatedly adjust the values of the synaptic weights until the value of the error no longer decreases. However, these classic algorithms can converge to a local minimum instead of the desired global minimum. Furthermore, the architecture and the transfer functions used in the design influence the ANN's performance; consequently, the learning algorithm can be trapped in a minimum far away from the best solution.

III. ARTIFICIAL BEE COLONY ALGORITHM

The Artificial Bee Colony (ABC) algorithm is based on the metaphor of the bees' foraging behavior. The natural selection which created this beautiful system of communication can also be seen within the system. Information about different parts of the environment is like species in competition. The fitness of a species is given by the profitability of the food source it describes. Information survives by continuing to circulate within the nest, and is capable of reproducing itself by recruiting new foragers who become informed of the food source, come back to the nest, and share their information [17].

The ABC algorithm was proposed by Karaboga in 2005 [18] for solving numerical optimization problems. This algorithm is based on the model proposed by Tereshko and Loengarov [17]. It consists of a set of possible solutions x_i (the population) that are represented by the positions of the food sources. On the other hand, in order to find the best solution, three classes of bees are used: employed bees, onlooker bees and scout bees. These bees have different tasks in the colony, i.e., in the search space.

Employed bees: Each employed bee searches for a new food source in the neighborhood of the one stored in its memory, generating a candidate with Eq. 4. It then compares the new food source against the old one and keeps the better of the two in its memory.

v_i^j = x_i^j + \phi_i^j \left( x_i^j - x_k^j \right)    (4)

where k ∈ {1, 2, ..., SN} and j ∈ {1, 2, ..., D} are randomly chosen indexes with k ≠ i, and φ_i^j is a random number in [−a, a]. After that, the bee evaluates the quality of each food source based on the amount of nectar (the information), i.e., the fitness function is calculated. Finally, it returns to the dancing area in the hive, where the onlooker bees are.

Onlooker bees: These bees watch the dance of the employed bees in order to learn where a food source can be found, whether its nectar is of high quality, and how large the food source is. An onlooker bee probabilistically chooses a food source depending on the amount of nectar shown by each employed bee, see Eq. 5.

p_i = \frac{fit_i}{\sum_{n=1}^{SN} fit_n}    (5)

where fit_i is the fitness value of solution i and SN is the number of food sources, which is equal to the number of employed bees.

Scout bees: These bees help the colony by randomly creating new solutions when a food source cannot be improved anymore, see Eq. 6. This mechanism is controlled by the "limit" or "abandonment criterion".

x_i^j = x_{min}^j + rand(0, 1) \left( x_{max}^j - x_{min}^j \right)    (6)

The pseudo-code of the ABC algorithm is as follows:

1: Initialize the population of solutions x_i, i = 1, ..., SN
2: Evaluate the population x_i, i = 1, ..., SN
3: for cycle = 1 to MCN do
4:   Produce new solutions v_i for the employed bees by using Eq. 4 and evaluate them.
5:   Apply the greedy selection process.
6:   Calculate the probability values p_i for the solutions x_i by Eq. 5.
7:   Produce the new solutions v_i for the onlookers from the solutions x_i selected depending on p_i and evaluate them.
8:   Apply the greedy selection process.
9:   Determine the abandoned solution for the scout, if it exists, and replace it with a new randomly produced solution x_i by Eq. 6.
10:  Memorize the best solution achieved so far.
11:  cycle = cycle + 1
12: end for
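To make the loop above concrete, here is a minimal NumPy sketch of the ABC algorithm applied to a toy objective (the sphere function). The function, the parameter values and all helper names are illustrative assumptions, not the configuration or code used in this paper.

```python
import numpy as np

def abc_minimize(objective, dim, bounds=(-4.0, 4.0), sn=25, limit=100,
                 mcn=1000, a=1.0, rng=np.random.default_rng(0)):
    lo, hi = bounds
    # Initialization: each food source x_i is a random solution (Eq. 6).
    x = lo + rng.random((sn, dim)) * (hi - lo)
    f = np.array([objective(s) for s in x])
    trials = np.zeros(sn, dtype=int)           # abandonment counters

    def fitness(fv):                           # usual ABC fitness transform
        return 1.0 / (1.0 + fv) if fv >= 0 else 1.0 + abs(fv)

    def neighbor(i):
        # Eq. 4: perturb one randomly chosen dimension toward/away from x_k.
        k = rng.choice([m for m in range(sn) if m != i])
        j = rng.integers(dim)
        v = x[i].copy()
        v[j] = x[i, j] + rng.uniform(-a, a) * (x[i, j] - x[k, j])
        return np.clip(v, lo, hi)

    def greedy(i, v):
        fv = objective(v)
        if fv < f[i]:                          # keep the better food source
            x[i], f[i] = v, fv
            trials[i] = 0
        else:
            trials[i] += 1

    best_x, best_f = x[np.argmin(f)].copy(), f.min()
    for _ in range(mcn):
        for i in range(sn):                    # employed bee phase
            greedy(i, neighbor(i))
        fit = np.array([fitness(fv) for fv in f])
        p = fit / fit.sum()                    # Eq. 5: selection probabilities
        for i in rng.choice(sn, size=sn, p=p): # onlooker bee phase
            greedy(i, neighbor(i))
        worst = int(np.argmax(trials))         # scout bee phase
        if trials[worst] > limit:
            x[worst] = lo + rng.random(dim) * (hi - lo)   # Eq. 6
            f[worst] = objective(x[worst])
            trials[worst] = 0
        if f.min() < best_f:                   # memorize best so far
            best_f, best_x = f.min(), x[np.argmin(f)].copy()
    return best_x, best_f

if __name__ == "__main__":
    print(abc_minimize(lambda s: float(np.sum(s ** 2)), dim=5, mcn=200)[1])
```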

IV. METHODOLOGY

The main aim of our methodology is to evolve, at the same time, the synaptic weights, the architecture (or topology), and the transfer functions of each neuron so as to obtain a minimum mean square error (MSE) as well as a minimum classification error (CER). At the same time, we look to optimize the ANN's architecture by reducing the number of neurons and their connections.

The problem to be solved can be defined as follows: given a set of input patterns X = {x^1, ..., x^p}, x ∈ IR^n, and a set of desired patterns D = {d^1, ..., d^p}, d ∈ IR^m, find the ANN represented by a matrix W ∈ IR^{q×(q+2)} such that the function F(D, X, W) is minimized.

The codification of the ANN's design to be evolved by the ABC algorithm is given in Fig. 1. This figure shows the food source's position representing the solution to the problem. This solution is defined by a matrix W ∈ IR^{q×(q+2)} composed of three main parts: the topology (T), the synaptic weights (SW), and the transfer functions (F), where q is the maximum number of neurons MNN; it is defined by q = 2(m + n) (remember that n is the dimension of the input pattern vector and m is the dimension of the desired pattern vector).

The three parts of the matrix W take values from three different ranges. For the topology, the range is [1, 2^MNN − 1]; for the synaptic weights it is [−4, 4]; and for the transfer functions the range is [1, nF], where nF is the number of transfer functions.
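A hedged sketch of how one individual (food source) with this layout and these ranges could be initialized is given below; the column layout, helper name and random generator are assumptions made for illustration, following our reading of Fig. 1.

```python
import numpy as np

def random_individual(n, m, n_transfer=6, rng=np.random.default_rng()):
    """Build a q x (q + 2) matrix: topology codes | synaptic weights | transfer ids."""
    q = 2 * (n + m)                                                     # maximum number of neurons (MNN)
    topology = rng.integers(1, 2 ** q, size=(q, 1)).astype(float)      # [1, 2^MNN - 1]
    weights = rng.uniform(-4.0, 4.0, size=(q, q))                      # [-4, 4]
    transfer = rng.integers(0, n_transfer, size=(q, 1)).astype(float)  # one of the nF functions
    return np.hstack([topology, weights, transfer])

W = random_individual(n=4, m=3)   # e.g. an iris-plant-sized individual
print(W.shape)                    # (14, 16), i.e. q x (q + 2) with q = 2(4 + 3)
```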

Fig. 1. Representation of the individual codifying the architecture, synaptic weights and transfer functions.

The ANN's topology is codified based on the binary square matrix representation of a graph x, where each component x_ij represents a connection between neuron i and neuron j when x_ij = 1. This information is codified into its decimal base. For example, suppose that the binary code "01101" represents the connections of the i-th neuron to five neurons. From this binary code, we can observe that only neurons two, three and five are connected to neuron i. This binary code is transformed into its decimal base value, resulting in "13"; this is the number that we evolve instead of the binary value. This scheme is much faster to manipulate: instead of evolving a string of bits, we evolve a decimal base number.
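The decimal-to-binary decoding described above can be sketched as follows; the helper name is hypothetical and the snippet only reproduces the "01101" → 13 example from the text.

```python
def decode_connections(code: int, n_neurons: int) -> list[int]:
    """Return the 1-based indices of the neurons connected to neuron i, given
    the decimal code of its n_neurons-bit connection string."""
    bits = format(code, f"0{n_neurons}b")        # e.g. 13 -> "01101"
    return [pos + 1 for pos, b in enumerate(bits) if b == "1"]

# The example from the text: code 13 over five neurons -> neurons 2, 3 and 5.
assert decode_connections(13, 5) == [2, 3, 5]
```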

The synaptic weights of the ANN are codified again by a square matrix representation of a graph x, where each component x_ij represents the synaptic weight between neuron i and neuron j. Finally, the transfer function for each neuron is represented by an integer in the range [0, 5], codifying one of the six transfer functions used in this research: logsig, tansig, sin, radbas, purelin, and hardlim. These functions were selected because they are among the most popular and useful transfer functions for several kinds of problems.

When the aptitude of an individual is computed by means of the MSE function (Eq. 7), all the values of the matrix W are decoded so as to obtain the desired ANN. Moreover, each solution must be tested in order to evaluate its performance. Since the methodology is tested with several pattern classification problems, it is also necessary to compute the classification error (CER), that is, to know how many patterns were correctly classified and how many were incorrectly classified.

F_1 = \frac{1}{p \cdot M} \sum_{\xi=1}^{p} \sum_{i=1}^{M} \left( d_i^\xi - y_i^\xi \right)^2    (7)

For the case of the CER fitness function, the output of the ANN is transformed by means of the winner-take-all technique; this codification is then compared with the set of desired patterns. When the output of the ANN equals the corresponding desired pattern, the pattern has been correctly classified; otherwise it is incorrectly classified. Based on the winner-take-all technique we can compute the CER, defined by Eq. 8:

F_2 = 1 - \frac{npwc}{tnp}    (8)

where npwc is the number of patterns well classified and tnp is the total number of patterns to be classified.

When the MSE is used, the output y_i of the ANN is computed as follows:

1: For the first n neurons, the output o_i = a_i.
2: for ne_i with i = n to MNN do
3:   Get the connections of ne_i by using individual component x_{1,i}.
4:   for each neuron j < i connected to ne_i do
5:     o_i = f(o), where f is the transfer function given by individual component x_{m,i} and o is computed using Eq. 1.
6:   end for
7:   Finally, y_i = o_i for i = MNN − m, ..., MNN.
8: end for

Note that the restriction j < i is used to avoid the generation of cycles in the ANN.
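The evaluation procedure above can be sketched in Python as follows. The matrix layout (first column: decimal topology code, middle block: synaptic weights, last column: transfer-function index) follows our reading of Fig. 1, and every name and detail here is an illustrative assumption rather than the authors' implementation.

```python
import numpy as np

# The six transfer functions listed in the text, indexed 0-5.
TRANSFER = {
    0: lambda o: 1.0 / (1.0 + np.exp(-o)),   # logsig
    1: np.tanh,                              # tansig
    2: np.sin,                               # sin
    3: lambda o: np.exp(-o * o),             # radbas
    4: lambda o: o,                          # purelin
    5: lambda o: float(o >= 0.0),            # hardlim
}

def forward(W, a, n, m, theta=0.0):
    """Evaluate the ANN coded by matrix W on the input pattern a (length n).
    Neurons are visited in index order and may only read from lower-indexed
    neurons (j < i), so no cycles can occur. Returns the last m outputs."""
    mnn = W.shape[0]                          # maximum number of neurons
    o = np.zeros(mnn)
    o[:n] = a                                 # the first n neurons echo the input
    for i in range(n, mnn):
        conns = [j for j in range(i)          # decode the topology code of neuron i
                 if (int(W[i, 0]) >> (mnn - 1 - j)) & 1]
        net = sum(o[j] * W[i, 1 + j] for j in conns) + theta   # Eq. 1
        f = TRANSFER[int(W[i, -1]) % 6]
        o[i] = f(net)
    return o[mnn - m:]
```

Combined with the initialization sketch given earlier, forward(random_individual(4, 3), np.ones(4), 4, 3) would return the three outputs of a randomly generated network.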

Until now, we have defined two fitness functions that help to maximize the ANN's accuracy by minimizing its error (MSE or CER). Now, we have to propose a function that helps not only to achieve maximum accuracy but also to minimize the number of connections of the ANN. The reduction of the architecture can be represented as follows:

F_3 = \frac{NC}{NMaxC}    (9)

where NC is the number of connections in the ANN designed by the proposed methodology and NMaxC is the maximum number of connections generated with MNN neurons. NMaxC is given as:

NMaxC = \sum_{i=n}^{MNN} i    (10)

It is important to notice that if F_3 alone were used as the fitness function in the ABC algorithm, the proposed methodology would synthesize the ANN but the accuracy would not be maximized. For that reason, we have to propose a fitness function that integrates both objectives: the minimization of the error and the synthesis of the ANN (the reduction of the number of connections). Two fitness functions are proposed to achieve this goal using the ABC algorithm. These fitness functions are composed by combining functions F_1, F_2 and F_3. The first fitness function (FF_1) is given by Eq. 11, while the second fitness function (FF_2) is given by Eq. 12.

FF_1 = F_1 \cdot F_3    (11)

FF_2 = F_2 \cdot F_3    (12)

With these functions, as we will next see, we are able to design ANNs with high accuracy and a very low number of connections.
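A hedged sketch of how Eqs. 7-12 could be computed is shown below; the array shapes and helper names are assumptions for illustration only, and NC would be the connection count decoded from the topology part of the individual.

```python
import numpy as np

def f1_mse(d, y):
    """Eq. 7: mean-square error over p patterns and M outputs."""
    p, M = d.shape
    return np.sum((d - y) ** 2) / (p * M)

def f2_cer(d, y):
    """Eq. 8: 1 - fraction of patterns whose winner-take-all output matches."""
    winners_ok = np.argmax(y, axis=1) == np.argmax(d, axis=1)
    return 1.0 - winners_ok.mean()

def f3_connections(nc, n, mnn):
    """Eq. 9 with Eq. 10: NMaxC = sum_{i=n}^{MNN} i."""
    nmaxc = sum(range(n, mnn + 1))
    return nc / nmaxc

def ff1(d, y, nc, n, mnn):
    return f1_mse(d, y) * f3_connections(nc, n, mnn)     # Eq. 11

def ff2(d, y, nc, n, mnn):
    return f2_cer(d, y) * f3_connections(nc, n, mnn)     # Eq. 12
```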

Fig. 2. Evolution of the error for the ten experiments for the object recognition problem. (a) Evolution of FF1 using the MSE function. (b) Evolution of FF2 using the CER function.

V. RESULTS

Several experiments were performed in order to evaluate the accuracy of the ANNs designed by means of the proposal. The accuracy of the ANN was tested with four pattern classification problems. Three of them were taken from the UCI machine learning benchmark repository [19]: the iris plant, wine and breast cancer datasets. The other database was a real object recognition problem.

The main features of each dataset are as follows. For the object recognition dataset, the dimension of the input vector is 7 and the number of classes is 5. For the iris plant dataset, the dimension of the input vector is 4 and the number of classes is 3. For the wine dataset, the dimension of the input vector is 13 and the number of classes is 3. Finally, for the breast cancer dataset, the dimension of the input vector is 9 and the number of classes is 2.

The parameters of the ABC algorithm were set to the same values for all the dataset problems: colony size NP = 50, number of food sources NP/2, limit = 100, and maximum number of cycles MCN = 4000. Six different transfer functions were used in all experiments: SN = sine function, LS = sigmoid function, HT = hyperbolic tangent function, GS = Gaussian function, LN = linear function and HL = hard limit function.
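For reference, these settings can be collected in a small configuration object; the dict and its key names below are purely illustrative, not part of the original implementation.

```python
ABC_CONFIG = {
    "colony_size_NP": 50,
    "food_sources_SN": 25,        # NP / 2
    "limit": 100,
    "max_cycles_MCN": 4000,
    "transfer_functions": ["SN (sine)", "LS (sigmoid)", "HT (hyperbolic tangent)",
                           "GS (Gaussian)", "LN (linear)", "HL (hard limit)"],
}
```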

Twenty experiments were performed for each dataset: ten for the case of fitness function FF1 and ten for the case of fitness function FF2. For each experiment, the dataset was randomly divided into two sets, a training set and a testing set, with the aim of proving the robustness and performance of the methodology. The same parameters were used throughout the whole experimentation.

Depending on the problem, the ABC algorithm approaches the minimum error during the evolutionary learning process at different rates. For instance, for the case of the object recognition problem, we observed that by evolving FF1 (the one using the MSE), the error tends to diminish quickly, and after a certain number of generations it diminishes slowly (Figure 2(a)). On the other hand, we also observed that, in some cases when FF2 is evolved, the error reaches its minimum in a few epochs; nonetheless, the error tends to diminish slowly (Figure 2(b)).

Fig. 3. Evolution of the error for the ten experiments for the Iris plant problem. (a) Evolution of FF1 using the MSE function. (b) Evolution of FF2 using the CER function.

Fig. 4. Evolution of the error for the ten experiments for the Wine problem. (a) Evolution of FF1 using the MSE function. (b) Evolution of FF2 using the CER function.

For the case of the iris plant dataset, we observed that by evolving FF1 (the one using the MSE) or FF2 (the one using the CER), the error tends to diminish quickly, and after a certain number of generations it diminishes slowly (Figure 3). For the case of the wine dataset, we observed that by evolving FF1 or FF2, the error tends to diminish slowly (Figure 4). Finally, for the case of the breast cancer dataset, we observed that by evolving FF1 or FF2, the error tends to diminish quickly, and after a certain number of generations it diminishes slowly (Figure 5).

Figures 6, 7, 8 and 9 show two of the 20 different ANNs automatically generated with the proposed methodology for each dataset.

Fig. 5. Evolution of the error for the ten experiments for the Breast cancer problem. (a) Evolution of FF1 using the MSE function. (b) Evolution of FF2 using the CER function.

Fig. 6. Two different ANN designs for the object recognition problem. (a) ANN designed by the ABC algorithm without taking into account the F3 function. (b) ANN designed by the ABC algorithm taking into account the F3 function.

Fig. 7. Two different ANN designs for the Iris plant problem. (a) ANN designed by the ABC algorithm without taking into account the F3 function. (b) ANN designed by the ABC algorithm taking into account the F3 function.

It is important to note that the ANNs in Figures 6(a), 7(a), 8(a) and 9(a) were automatically designed by the ABC algorithm with fitness functions FF1 and FF2 that did not include the synthesis of the ANN; in other words, these fitness functions did not use the F3 function. On the contrary, the ANNs in Figures 6(b), 7(b), 8(b) and 9(b) were automatically designed by the ABC algorithm with fitness functions FF1 and FF2 that include the synthesis of the ANN through the F3 function. Furthermore, in some cases the dimensionality of the input pattern is reduced because some features do not contribute to the ANN's output.

Fig. 8. Two different ANN designs for the Wine problem. (a) ANN designed by the ABC algorithm without taking into account the F3 function. (b) ANN designed by the ABC algorithm taking into account the F3 function.

Fig. 9. Two different ANN designs for the Breast cancer problem. (a) ANN designed by the ABC algorithm without taking into account the F3 function. (b) ANN designed by the ABC algorithm taking into account the F3 function.

Table I shows the average connection number achieved with the proposed fitness functions (FF1 and FF2). In addition, we also present the average connection number achieved when F3 is not taken into account by the proposed fitness functions. As the reader can appreciate, the number of connections decreases when function F3 is used.

TABLE I
AVERAGE CONNECTION NUMBER.

Dataset         | Without F3        | Using F3
                | FF1       FF2     | FF1       FF2
Object rec.     | 74        66.3    | 58.4      65
Iris plant      | 20.9      19      | 15.6      12.7
Wine            | 104.8     94.6    | 86.3      89.9
Breast cancer   | 48        41.9    | 30.7      31.6

Once the ANN for each problem was generated, we proceeded to test its accuracy. The next figures show the performance of the methodology with the two fitness functions. Figures 10, 11, 12 and 13 present the percentage of classification for all the experiments (executions) during the training and testing phases. Whereas for the object recognition, wine and breast cancer datasets the best percentage of recognition for the two phases was achieved using the FF1 fitness function, the best percentage of recognition for the iris plant dataset was achieved using the FF2 fitness function.

Table II presents the average percentage of recognition for all the experiments using fitness functions FF1 and FF2. In this table, we can observe that the best percentage of recognition for all the databases was achieved during the training phase. The accuracy slightly diminished during the testing phase. However, the results obtained with the proposed methodology were highly acceptable and stable. This tendency can be corroborated in Table III, which shows the standard deviation of all experimental results obtained with each dataset.

Tables IV and V show the maximum and minimum percentages of classification achieved in all experiments during the training and testing phases using the two fitness functions. In Table IV there are many ones, which represent the maximum percentage (100%) of recognition that can be achieved by the designed ANN.

Fig. 10. Percentage of recognition for the object recognition problem and the ten experiments during the training and testing stage for each fitness function. (a) Percentage of recognition minimizing the MSE. (b) Percentage of recognition minimizing the CER.

Fig. 11. Percentage of recognition for the Iris problem and the ten experiments during the training and testing stage for each fitness function. (a) Percentage of recognition minimizing the FF1 function. (b) Percentage of recognition minimizing the FF2 function.

Fig. 12. Percentage of recognition for the Wine problem and the ten experiments during the training and testing stage for each fitness function. (a) Percentage of recognition minimizing the FF1 function. (b) Percentage of recognition minimizing the FF2 function.

TABLE II
AVERAGE PERCENTAGE OF RECOGNITION.

Dataset         | FF1                   | FF2
                | Training   Testing    | Training   Testing
Object rec.     | 0.984      0.946      | 0.938      0.864
Iris plant      | 0.9667     0.9253     | 0.9693     0.9387
Wine            | 0.9337     0.8629     | 0.8764     0.7944
Breast cancer   | 0.973      0.9655     | 0.9739     0.9561


Fig. 13. Percentage of recognition for the Breast cancer problem and the ten experiments during the training and testing stage for each fitness function. (a) Percentage of recognition minimizing the FF1 function. (b) Percentage of recognition minimizing the FF2 function.

TABLE III
STANDARD DEVIATION OF RECOGNITION.

Dataset         | FF1                   | FF2
                | Training   Testing    | Training   Testing
Object rec.     | 0.0386     0.0962     | 0.0371     0.0842
Iris plant      | 0.0237     0.0378     | 0.0189     0.0373
Wine            | 0.0287     0.0575     | 0.0304     0.1164
Breast cancer   | 0.0063     0.0102     | 0.0111     0.0134

This is important because we found at least one configuration that solves a specific problem without misclassified patterns or with a low percentage of error. In Table V the worst values achieved with the ANN are presented. In particular, the dataset that provided the worst results was the wine problem. Nonetheless, the accuracy achieved was highly acceptable.

TABLE IV
THE BEST PERCENTAGE OF RECOGNITION.

Dataset         | FF1                   | FF2
                | Training   Testing    | Training   Testing
Object rec.     | 1          1          | 1          0.96
Iris plant      | 1          0.9733     | 1          0.9733
Wine            | 0.9775     0.9551     | 0.9213     0.9213
Breast cancer   | 0.9824     0.9766     | 0.9853     0.9766

TABLE V
THE WORST PERCENTAGE OF RECOGNITION.

Dataset         | FF1                   | FF2
                | Training   Testing    | Training   Testing
Object rec.     | 0.88       0.72       | 0.9        0.7
Iris plant      | 0.92       0.8533     | 0.9333     0.84
Wine            | 0.8989     0.7865     | 0.8315     0.5169
Breast cancer   | 0.9648     0.9444     | 0.9501     0.9386

From these experiments, we observed that the ABC algorithm was able to find the best configuration for an ANN given a specific set of patterns that defines a classification problem. Moreover, the integration of the synthesis into the fitness function causes the ABC algorithm to generate ANNs with a small number of connections and high performance. The design of the ANNs consists of providing a good architecture with the best set of transfer functions and synaptic weights. The experimentation shows that all the designs generated by the proposal present an acceptable percentage of recognition for both the training and testing phases with the two fitness functions.

VI. CONCLUSIONS

The design of an ANN is achieved using the proposed methodology. The synaptic weights, the architecture and the transfer functions of an ANN are evolved by means of the ABC algorithm. Furthermore, the connections among the neurons that belong to the ANN are synthesized. This allows generating a reduced design of an ANN with high performance.

In this work we tested the performance of the ABC algorithm. We have also shown that this novel technique is a good optimization algorithm, because it is not easily trapped in local minima. In the case of the proposed methodology, we have demonstrated its robustness; the random choice of the patterns for each experiment allowed us to obtain, statistically speaking, good and significant results.

The experiments were performed with two different fitness functions, FF1 and FF2, based on the MSE and the CER, respectively. Additionally, these fitness functions involve the synthesis of the architecture. Through these experiments, we observed that both functions achieved a highly acceptable performance. Moreover, we demonstrated that these fitness functions can considerably reduce the number of connections of an ANN while keeping the MSE and CER errors at a minimum.

On the other hand, in some of the ANN designs generated by the proposed methodology, some neurons belonging to the input layer are not used; they do not present any connections with other neurons. In this particular case, we can say that a reduction of the dimensionality of the input pattern is also obtained.

In general, the results were satisfactory. The proposed methodology allows searching for the best values that permit automatically constructing an ANN that provides a good solution for a classification problem.

ACKNOWLEDGMENT

B. Garro thanks CONACYT for the scholarship provided during her PhD studies. H. Sossa thanks SIP-IPN under grant number 20111016, COTEPABE-IPN, DAAD-PROALMEX under grant J000.426/2009, and the European Union and CONACYT under grant FONCICYT 93829 for the economic support. The content of this paper is the exclusive responsibility of CIC-IPN and cannot be considered to reflect the position of the European Union. The authors thank the anonymous reviewers for their comments to improve the paper.


REFERENCES

[1] X. Yao, "Evolving artificial neural networks," Proceedings of the IEEE, vol. 87, 1999.
[2] B. A. Garro, H. Sossa, and R. A. Vazquez, "Design of artificial neural networks using a modified particle swarm optimization algorithm," in Proceedings of the 2009 International Joint Conference on Neural Networks (IJCNN'09), Piscataway, NJ, USA: IEEE Press, 2009, pp. 2363-2370.
[3] B. A. Garro, H. Sossa, and R. A. Vazquez, "Design of artificial neural networks using differential evolution algorithm," in Proceedings of the 17th International Conference on Neural Information Processing: Models and Applications - Volume Part II (ICONIP'10), Berlin, Heidelberg: Springer-Verlag, 2010, pp. 201-208.
[4] D. Karaboga and B. Akay, "Artificial Bee Colony (ABC) algorithm on training artificial neural networks," in Signal Processing and Communications Applications (SIU 2007), IEEE 15th, 2007, pp. 1-4.
[5] D. Karaboga, B. Akay, and C. Ozturk, "Artificial bee colony (ABC) optimization algorithm for training feed-forward neural networks," in Proceedings of the 4th International Conference on Modeling Decisions for Artificial Intelligence (MDAI '07), Berlin, Heidelberg: Springer-Verlag, 2007, pp. 318-329.
[6] D. Karaboga and C. Ozturk, "Neural networks training by artificial bee colony algorithm on pattern classification," Neural Network World, vol. 19, no. 10, pp. 279-292, 2009.
[7] D. Karaboga, C. Ozturk, and B. Akay, "Training neural networks with ABC optimization algorithm on medical pattern classification," in International Conference on Multivariate Statistical Modelling and High Dimensional Data Mining, 2008.
[8] C. Ozturk and D. Karaboga, "Classification by neural networks and clustering with artificial bee colony (ABC) algorithm," in International Symposium on Intelligent and Manufacturing Systems Features, Strategies and Innovation, 2008.
[9] T. Kurban and E. Besdok, "A comparison of RBF neural network training algorithms for inertial sensor based terrain classification," Sensors, vol. 9, pp. 6312-6329, 2009.
[10] C. Ozkan, O. Kisi, and B. Akay, "Neural networks with artificial bee colony algorithm for modeling daily reference evapotranspiration," Irrigation Science, pp. 1-11, 2010. [Online]. Available: http://dx.doi.org/10.1007/s00271-010-0254-0
[11] D. Pham, A. Soroka, A. Ghanbarzadeh, E. Koc, S. Otri, and M. Packianather, "Optimising neural networks for identification of wood defects using the bees algorithm," in Industrial Informatics, 2006 IEEE International Conference on, 2006, pp. 1346-1351.
[12] D. Pham, E. Koc, A. Ghanbarzadeh, and S. Otri, "Optimisation of the weights of multi-layered perceptrons using the bees algorithm," in Proceedings of the 5th International Symposium on Intelligent Manufacturing Systems, 2006.
[13] D. Karaboga and B. Akay, "A survey: algorithms simulating bee swarm intelligence," Artificial Intelligence Review, vol. 31, no. 1, pp. 61-85, Jun. 2009.
[14] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning Internal Representations by Error Propagation. Cambridge, MA, USA: MIT Press, 1986, ch. 8, pp. 318-362.
[15] J. A. Anderson, An Introduction to Neural Networks. The MIT Press, 1995.
[16] P. Werbos, "Backpropagation through time: what it does and how to do it," Proceedings of the IEEE, vol. 78, no. 10, pp. 1550-1560, Oct. 1990.
[17] V. Tereshko and A. Loengarov, "Collective decision making in honey-bee foraging dynamics," Computing and Information System Journal, vol. 9, no. 3, pp. 1-7, 2005.
[18] D. Karaboga, "An idea based on honey bee swarm for numerical optimization," Computer Engineering Department, Engineering Faculty, Erciyes University, Tech. Rep., 2005.
[19] P. M. Murphy and D. W. Aha, "UCI Repository of machine learning databases," University of California, Department of Information and Computer Science, Irvine, CA, USA, Tech. Rep., 1994.
