Advanced Computational Intelligence Algorithm
Based on Neural and Evolutionary Mechanisms
by
Tao Jiang
A dissertation
submitted to the Faculty of Engineering
in Partial Fulfillment of the Requirements
for the Degree of
Doctor of Engineering
University of Toyama
Gofuku 3190, Toyama-shi, Toyama 930-8555 Japan
2016
(Submitted January 24, 2016)
Abstract
Computational intelligence (CI), a branch of artificial intelligence, is clearly distinct
from traditional artificial intelligence, which is based on mathematical logic.
Computational intelligence uses heuristic algorithms such as fuzzy systems, neural
networks, and evolutionary computation. Computational intelligence also uses tech-
niques such as swarm intelligence, fractals, chaos theory, artificial immune systems,
and wavelets. By making full use of elements such as adaptation, computational
intelligence aims to create intelligent programs. Researchers have learned much
from natural systems, using the knowledge thus obtained to develop
new algorithmic models that solve complex problems. Methods developed for low-level
cognitive functions include supervised and unsupervised learning by adaptive systems,
and they encompass not only neural, fuzzy, and evolutionary approaches but also
probabilistic and statistical approaches, such as Bayesian networks or kernel methods.
These methods are used to solve the same type of problems in various fields such as
pattern recognition, signal processing, classification and regression, and data mining.
In order to more effectively deal with intricate data in the real world, researchers
have studied how to combine these intelligent methods to best address real-
world problems. We have concentrated mainly on artificial neural networks,
artificial immune systems, and evolutionary computation.
Accumulated results of these studies have suggested that synaptic nonlinearities
of dendrites in a single neuron can possess a powerful computational capacity. We
have established an approximate neuronal model that is able to capture the nonlin-
earities among excitatory and inhibitory inputs and thus is able to successfully make
predictions about the morphology of neurons when the model is used for specific
learning tasks. The gradient-based back-propagation (BP) method has been used
to train the dendritic neuron model. Because of its inherent local-optima trapping
problem, the BP method usually cannot find satisfactory solutions. Therefore, we also
propose an artificial immune algorithm to train the dendritic neuron model. The arti-
ficial immune algorithm has the advantage that the training process does not require
gradient information, which enables the dendritic model to utilize non-conventional
transfer/activation functions in the soma. Learning can be accomplished on the basis
of a population of antibodies, which permits potentially parallel computation. It also
greatly improves the probability of escaping local optima during training.
The single neuron model with synaptic nonlinearities in a dendritic tree was also
applied to liver disease diagnosis. Artificial neural networks have provided physicians
with a powerful tool to analyze, compute, and figure out complex data across many
medical applications. The single neuron model (NMSN) simulates the essence of
nonlinear interactions among synaptic inputs in the dendrites. Experimental results
suggested that NMSN was superior to the traditional BPNN with either a similar
computational architecture or the best performance. NMSN has a distinct ability of
pattern extraction through a pruning function, which is a metaphor of the neuronal
morphology. We also focused on the gravitational search algorithm (GSA) in dealing
with complex optimization problems. Because it still has some drawbacks, such as
slow convergence and the tendency to become trapped in local minima, we combined
chaos with GSA to enhance its searching performance. In our work, four other
chaotic maps are utilized to further improve the searching capacity of the hybrid
chaotic gravitational search algorithm (CGSA), and six benchmark instances, which
are widely used for optimization, are chosen from the literature as the test suite. All
five chaotic maps can improve the performance of the original GSA in terms of the
solution quality and convergence speed. The four newly incorporated chaotic maps
exhibit a better influence on improving the performance of GSA than the logistic map,
suggesting that the hybrid searching dynamics of CGSA is significantly affected by
the distribution characteristics of chaotic maps. We also worked on evolutionary
algorithms, differential evolution in particular, which is well known as a stochastic
search method for real-parameter optimization over continuous space. Differential
evolution is still limited in finding uniformly distributed solutions near optimal
Pareto fronts. To alleviate such limitations, we introduced an adaptive mutation
operator to avoid premature convergence by tuning the mutation scale factor F and
adopted an ε-dominance strategy to update the archive that stores the non-dominated
solutions. The effectiveness of our proposed approach was demonstrated with respect
to the quality of solutions in terms of the convergence and diversity of the Pareto
fronts.
Computational intelligence now plays an increasingly important part in our daily
life. The methods that we developed can help people extract important information ef-
fectively from complex data and thus find optimal solutions. We plan to investigate
the user-defined parameter sensitivities of the proposed artificial immune algorithm
and apply the proposed model to more problems. We will also try to adaptively use
multiple chaotic maps simultaneously in the chaotic search to construct a more powerful
CGSA and analyze the search dynamics of the algorithm. The study of computational
intelligence, particularly of the mechanisms and constructions of single neurons and
swarm intelligence, will be continued in the future.
Contents
Abstract ii
1 Introduction 1
1.1 Computational Intelligence Paradigms . . . . . . . . . . . . . . . . . 1
1.2 Short History of Computational Intelligence . . . . . . . . . . . . . . 2
1.3 Applications and Improvements in My Study . . . . . . . . . . . . . 4
2 Traditional Computational Intelligence 8
2.1 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Evolutionary Computation . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Swarm Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.1 Ant colony optimization algorithms . . . . . . . . . . . . . . . 13
2.3.2 Particle swarm optimization algorithm . . . . . . . . . . . . . 14
2.4 Artificial Immune Systems . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 Fuzzy Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3 Dendritic Neural Model: Computation Capacity 18
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2 Proposed single neural model based on dendritic structure . . . . . . 20
3.2.1 Synaptic layer . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2.2 Branch layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.3 Membrane layer . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.4 Soma layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.5 Neuronal-pruning Function . . . . . . . . . . . . . . . . . . . . 23
3.3 Error Back-propagation Learning algorithm . . . . . . . . . . . . . . 24
3.4 Experimental results and discussion . . . . . . . . . . . . . . . . . . . 26
3.4.1 Performance comparison . . . . . . . . . . . . . . . . . . . . . 26
3.4.1.1 Convergence comparison . . . . . . . . . . . . . . . . 26
3.4.1.2 Classification accuracy comparison . . . . . . . . . . 27
3.4.2 The synaptic and dendritic morphology after learning . . . . . 27
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4 Dendritic Neural Model: Immunological Learning Algorithm 29
4.1 Research Background . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2 Single Dendritic Neural Model for Morphology Prediction . . . . . . . 31
4.3 Artificial Immune Training Algorithm . . . . . . . . . . . . . . . . . . 36
4.3.1 Immunological Inspiration . . . . . . . . . . . . . . . . . . . . 36
4.3.2 Training Algorithm based on Immune Mechanisms . . . . . . 39
4.4 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4.1 Experiments Setup . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4.2 Results Analysis and Discussions . . . . . . . . . . . . . . . . 41
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5 Dendritic Neural Model: Classification Ability 46
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2 Backgrounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.2.1 ANN in medical diagnosis . . . . . . . . . . . . . . . . . . . . 48
5.2.2 The discovery of synaptic nonlinearity in single neuron . . . . 49
5.3 Single Dendritic Neural Model for Classification . . . . . . . . . . . . 51
5.4 Learning algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.5 Experimental results and discussion . . . . . . . . . . . . . . . . . . . 57
5.5.1 Experimental environment and evaluation metrics . . . . . . . 57
5.5.2 The liver disease database description . . . . . . . . . . . . . . 58
5.5.3 Experimentation setup and results . . . . . . . . . . . . . . . 59
5.5.3.1 Optimal parameters setting . . . . . . . . . . . . . . 59
5.5.3.2 Performance comparison . . . . . . . . . . . . . . . . 61
5.5.3.3 Convergence properties . . . . . . . . . . . . . . . . . 64
5.5.3.4 ROC analysis . . . . . . . . . . . . . . . . . . . . . . 64
5.5.4 The final synaptic and dendritic morphology . . . . . . . . . . 65
5.6 Conclusions and Remarks . . . . . . . . . . . . . . . . . . . . . . . . 67
6 Evolutionary Model: Chaotic Gravitation Search 71
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
6.2 Overview of GSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.3 Chaotic maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.3.1 Logistic map . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.3.2 Piecewise linear chaotic map . . . . . . . . . . . . . . . . . . . 76
6.3.3 Gauss map . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.3.4 Sinusoidal map . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.3.5 Sinus map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.4 Chaotic gravitational search algorithm . . . . . . . . . . . . . . . . . 78
6.5 Numerical simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.5.1 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . 80
6.5.2 Results and discussions . . . . . . . . . . . . . . . . . . . . . . 81
6.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7 Evolutionary Model: Multi-objective Differential Evolution 89
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.2 Brief Introduction to DE . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.3 Design of multi-objective differential evolution algorithm . . . . . . . 92
7.4 Simulation and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8 Conclusions 100
Bibliography 103
Acknowledgements 123
List of Figures
2.1 A biological neuron. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 McCulloch-Pitts neuron model. . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Graphical representation of multi-layer perceptron. . . . . . . . . . . . 10
3.1 The architecture of the proposed dendritic neuron model. . . . . . . . . 20
3.2 Four connections. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3 Evolution of predicted dendrite structure by neural pruning. . . . . . . 24
3.4 Convergence graphs obtained by the proposed dendritic neuron model
and BPNN. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.5 Predicted dendrite structure by neural pruning obtained by the proposed
model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.1 Schema of a neuron model with dendritic branches. Axons of presynaptic
neurons (input X) connect to branches of dendrites (horizontal blue
lines) by synaptic layers (black triangles); the membrane layer (vertical
blue lines) sums the dendritic activations, and transfers the sum to the
soma body (black sphere). . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2 Four connection states of synaptic layers. The left figure shows the state
before training; each synaptic layer will land on one of the right four
connection states through training, which constitutes the structure of ALMN. 33
4.3 Six function cases of the synaptic layer. The graph’s horizontal x axis
represents the inputs of presynaptic neurons; the vertical y axis shows
the output of the synaptic layer. Because the range of x is [0,1], only
the corresponding part needs to be observed. . . . . . . . . . . . . . . 34
4.4 Evolution of predicted dendrite structure by neural pruning. . . . . . . 37
4.5 Biological immune procedures used as the training algorithm for single
dendritic neural model. . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.6 Mutation operators used in the artificial immune training algorithm. . 40
4.7 Final dendritic morphology of the XOR problem after training. . . . . 43
5.1 The architecture of the proposed dendritic neuron model. . . . . . . . . 51
5.2 Six function cases of the synaptic layer. . . . . . . . . . . . . . . . . . 52
5.3 Evolution of predicted dendrite structure by neural pruning. . . . . . . 54
5.4 Confusion matrix. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.5 Comparison of convergence speed of NMSN and BPNN. . . . . . . . . 65
5.6 The ROC curves of NMSN and BPNNs. . . . . . . . . . . . . . . . . . 66
5.7 The AUC values of NMSN and BPNNs. . . . . . . . . . . . . . . . . . 67
5.8 The evolution of the neuronal morphology. . . . . . . . . . . . . . . . . 70
6.1 The distribution of x under certain system parameters in 20000 itera-
tions when x0 = 0.74 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.2 Statistical values of the final best-so-far solution obtained by the six
algorithms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.3 The average fitness trendlines of the best-so-far solution found by the
six algorithms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.4 The ratio of best-so-far solutions found by the six algorithms. . . . . . 84
7.1 The general flow chart of the proposed adaptive mutation based multi-
objective differential evolution (IDE). . . . . . . . . . . . . . . . . . . . 93
7.2 Pareto fronts obtained by IDE and its competitor algorithm MDE on
ZDT1, ZDT2, ZDT3, ZDT4, and ZDT6 respectively. . . . . . . . . . . 97
List of Tables
3.1 Parameter setting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Exclusive OR problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 Classification accuracy. . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.1 Target XOR training data. . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 The training data set of slope stability classification problem. . . . . . 44
4.3 The test data set of slope stability classification problem . . . . . . . . 45
4.4 Average final least squared error after learning using BP and artificial
immune algorithm for XOR and slope stability. . . . . . . . . . . . . . 45
5.1 Terms used to define sensitivity, specificity and accuracy. . . . . . . . . 58
5.2 Basic features for Liver Disorders. . . . . . . . . . . . . . . . . . . . . . 59
5.3 No. of patterns in the training and testing data set. . . . . . . . . . . . 59
5.4 Parameter levels in NMSN. . . . . . . . . . . . . . . . . . . . . . . . . 60
5.5 L16(4^5) orthogonal array and factor assignment. . . . . . . . . . . . . 61
5.6 Structures of NMSN and BPNN for Liver disorders dataset. . . . . . . 62
5.7 Classification results by NMSN and BPNN. . . . . . . . . . . . . . . . 62
5.8 Comparison of the simulations results between NMSN and BPNN. . . 63
5.9 Classification accuracies for BUPA Liver Disorders problem obtained by
other methods in literature. . . . . . . . . . . . . . . . . . . . . . . . . 69
6.1 The function name, definition, dimension, feasible interval of variants,
and the known global minimum of six benchmark functions. . . . . . . . 86
6.2 Statistical results of different methods for Sphere function (f1). . . . . 87
6.3 Statistical results of different methods for Schwefel function (f2). . . . 87
6.4 Statistical results of different methods for Rosenbrock function (f3). . 87
6.5 Statistical results of different methods for Schwefel 2.26 function (f4). 87
6.6 Statistical results of different methods for Ackley function (f5). . . . . 88
6.7 Statistical results of different methods for Griewank function (f6). . . 88
7.1 Comparison of the convergence metric between IDE and MDE. . . . . 96
7.2 Comparison of the diversity metric between IDE and MDE. . . . . . . 96
7.3 Comparison of the convergence metric among IDE, NSGA-II, SPEA2,
and MOEO. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.4 Comparison of the diversity metric among IDE, NSGA-II, SPEA2, and
MOEO. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Chapter 1
Introduction
1.1 Computational Intelligence Paradigms
The design of algorithmic models has become a major thrust in the development of
methods to solve problems that have become more and more complicated. Great suc-
cesses have been achieved through the modeling of natural and biological intelligence.
Computational intelligence paradigms include artificial neural networks, evolutionary
computation, swarm intelligence, artificial immune systems, and fuzzy systems [1].
They form part of the field of Artificial Intelligence, together with logic, deductive
reasoning, expert systems, case-based reasoning and symbolic machine learning sys-
tems. Computational Intelligence (CI) studies adaptive mechanisms that enable intelligent
behavior in complex and changing environments. These mechanisms exhibit an ability to
generalize, abstract, discover, and make sense of new situations. Every computational
intelligence paradigm has its origins in biological systems. Neural networks model
biological neural systems, evolutionary computation originated from natural evolu-
tion (including behavioral and genetic evolution), swarm intelligence models the social
behavior of organisms living in swarms or colonies, artificial immune systems model
the human immune system, and fuzzy systems originated from studies of the way
organisms interact with their environment [1].
1.2 Short History of Computational Intelligence
The first definition of artificial intelligence was established only in the 1950s by Alan
Turing. Turing studied how machinery could be used to mimic processes of the
human brain, which resulted in one of the first publications of AI, named Intelligent
Machinery [1].
The term artificial intelligence was first coined at the Dartmouth conference in
1956, organized by John McCarthy, who is regarded as the father of artificial
intelligence. From 1956 to 1969 there was much research on modeling biological
neurons, of which the most notable were the work on perceptrons by Rosenblatt,
and the adaline by Widrow and Hoff. In 1969, Minsky and Papert caused a great
setback to artificial neural network research, concluding that the extension of simple
perceptrons to multilayer perceptrons is sterile. Research in neural networks remained
stagnant until the mid-1980s, when it was resurrected by
landmark publications from Hopfield, Hinton, Rumelhart, and McClelland. Research
in neural networks started to explode in the late 1980s, and it is one of the largest
research areas in computer science today [1].
The development of evolutionary computation started in the 1950s, with genetic
algorithms in the study of Fraser, Bremermann and Reed. However, it is John Hol-
land who is generally viewed as the father of evolutionary computation. The works
of evolutionary computation modeled elements of Darwin's theory of evolution al-
gorithmically [2]. Evolution strategies (ES) were developed by Rechenberg in the
1960s, and evolutionary programming was developed independently by Lawrence
Fogel as an approach to develop behavioral models. There are many other important
contributions made by De Jong, Schaffer and other scientists to shape the field of
evolutionary computation.
The history of fuzzy logic is believed by some to start with Gautama Buddha and Bud-
dhism, but the Western community considers Aristotle's study of two-valued
logic to be the birth of fuzzy logic. In 1920 Łukasiewicz published the first deviation
from two-valued logic in his work on three-valued logic, which was later expanded to
an arbitrary number of values. It was Max Black, a quantum philosopher, who first
introduced quasi-fuzzy sets, wherein degrees of membership to sets were assigned to
elements [1]. Lotfi Zadeh, the developer of fuzzy sets, contributed most to the
field of fuzzy logic [3]. Fuzzy systems were an active field until the 1980s, when they
experienced a dark age, but they were revived by Japanese researchers in the
late 1980s. Nowadays fuzzy systems are widely used in many successful applications,
especially in control systems.
Swarm intelligence was first put forward by Eugene N. Marais, a South African
poet, who made great contributions with his works on the social behaviors of apes and
ants, namely The Soul of the White Ant [4] and The Soul of the Ape [5]. Swarm
intelligence was modeled algorithmically in the work of Marco Dorigo on the modeling
of ant colonies in the early 1990s. In 1995, Eberhart and Kennedy [6,7] developed the
particle swarm optimization algorithm, modeling the behaviors of bird flocks. Swarm
intelligence has become a promising research field and has been used to resolve real-
world problems.
The theoretical definition of clonal selection in the natural immune system was
initially made by Burnet [8], describing B-Cells and Killer-T-Cells with antigen-specific
receptors; it was enhanced by the introduction of the concept of a helper T-Cell by
Bretscher and Cohn [9]. Later Lafferty and Cunningham [10] added a co-stimulatory signal
to the helper T-Cell model. Different artificial immune models have been developed
on the basis of a specific theory of immunology or a combination of different
immunology theories. The first model in artificial immune systems was the discrimi-
nation between self and non-self with mature T-Cells introduced by Forrest et al. [11],
using a training technique known as the negative selection of T-Cells [12]. The clonal
selection theory was first applied to optimization problems in the model of
Mori et al. [13]. The network theory of the natural immune system was introduced by
Jerne [14], who proposed that B-Cells are interconnected to form a network of cells [14,15]. The
Jerne theory was further developed by Perelson [15]. The network theory of Jerne
was first modeled mathematically by Farmer et al. [16]. For data mining and data
analysis tasks the network theory has been modeled into artificial immune systems,
of which the earliest artificial immune system research was published by Hunt and
Cooke [17]. The danger theory, based on the co-stimulated model of Lafferty and
Cunningham [10,18,19], was introduced by Matzinger [20,21]. The danger theory
mainly holds that the immune system distinguishes between what is dangerous
and what is non-dangerous in the body. The first work on AISs based on danger theory was
published by Aickelin and Cayzer [22].
1.3 Applications and Improvements in My Study
Our part of the study concentrated on modeling the single neuron and applying
the resulting models to some real-world problems. In the traditional ANN literature,
the prevailing view has been that the brain has strong computational abilities be-
cause of the complex connectivity of neural networks, in which a single neuron could
only perform a linear summation and a nonlinear thresholding operation (all-or-none
response). As a consequence, the contribution of single neurons and their dendrites
has long been overlooked. Recently it has been conjectured by a series of theoreti-
cal studies that individual neurons could act more powerfully as computational units
considering synaptic nonlinearities in a dendritic tree. The various types of synaptic
plasticity and nonlinearity mechanisms allow synapses to play a more important role
in computations. Synaptic inputs from different neuronal sources can be distributed
spatially on the dendritic tree, and neuronal plasticity can result from changes in
synaptic strength or connectivity, as well as in the excitability of the neurons themselves.
Moreover, a slight morphological difference can cause great functional variation,
acting as a filter to determine what signals a single neuron receives and how these
signals are integrated. However, there is no effective model that can capture the non-
linearities among excitatory and inhibitory inputs while predicting the morphology
and its evolution of synapses and dendrites.
We propose a new single neuron model with synaptic nonlinearities in a dendritic
tree. The model has a neuron-pruning function that can reduce
dimensionality by removing useless synapses and dendrites during learning, forming
a precise synaptic and dendritic morphology. The nonlinear interactions in a dendritic tree are
expressed using the Boolean logic AND (conjunction), OR (disjunction) and NOT
(negation). An error back propagation algorithm is used to train the neuron model.
Furthermore, we apply the new model to the Exclusive OR (XOR) problem; it can
solve the problem perfectly with the help of inhibitory synapses, which demonstrates
synaptic nonlinear computation and the neuron's ability to learn.
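The Boolean-logic view of dendritic computation can be sketched as follows: branch multiplication acts as AND, an inhibitory synapse acts as NOT (1 − x), and membrane summation acts as OR. This toy Python sketch is illustrative only, a hand-wired structure rather than the trained model of Chapter 3.

```python
# Illustrative sketch: XOR solved by two dendritic branches.
# Each branch multiplies its synaptic outputs (logical AND); an
# inhibitory synapse inverts its input (logical NOT); the membrane
# sums the branch currents (logical OR); the soma fires all-or-none.

def xor_dendrite(x1, x2):
    branch1 = x1 * (1 - x2)        # x1 AND (NOT x2)
    branch2 = (1 - x1) * x2        # (NOT x1) AND x2
    membrane = branch1 + branch2   # soma sums the branch currents
    return 1 if membrane >= 0.5 else 0  # all-or-none firing

# prints the XOR truth table
for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_dendrite(a, b))
```

Without the inhibitory (NOT) synapses, the product of the raw inputs alone cannot separate the XOR classes, which is exactly the point the text makes.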
The previous works have established an approximate neuronal model which is
able to capture the nonlinearities among excitatory and inhibitory inputs and thus
successfully predict the morphology of neurons when performing specific learning
tasks. The gradient-based back-propagation (BP) method has been used to train the
dendritic neuron model. Due to its inherent local-optima trapping problem, the BP
method usually cannot find satisfactory solutions. In the following work, we proposed
an artificial immune algorithm to train the dendritic neuron model. In comparison to
BP, the artificial immune algorithm has two advantages: the training process does not
require gradient information, which enables the dendritic model to utilize
non-conventional transfer/activation functions in the soma, and the learning can be
accomplished based on a population of antibodies, which permits potentially parallel
computation and greatly improves the probability of jumping out of local
optima during training. Experimental results based on the famous XOR problem and
a geotechnical engineering problem verified the effectiveness of the proposed artificial
immune algorithm.
We also applied the proposed new single neuron model (NMSN) to liver disease diag-
nosis. ANNs have provided a powerful tool for physicians to analyze, compute, and figure
out complex data across many medical applications. The advent of ANN brought the
hope to improve diagnostic accuracy with its ability to capture complex nonlinear
and multidimensional relationships among variables. The single neuron model with
synaptic nonlinearities (NMSN) that we propose simulates the essence of nonlinear
interactions among synaptic inputs in the dendrites. We assume that each branch
receives signals at its synapses and performs a multiplication of these signals, while the
synapses perform a sigmoidal nonlinear operation on their inputs. The branching
point sums up the multiplied inputs, and the current is then transmitted to the cell
body (soma). Once the threshold is exceeded, the cell fires and sends signals down
to other neurons through the axon. The performance of NMSN was verified on
the liver disease diagnostic problems. Experimental results suggested that NMSN
was superior to the traditional BPNN with a similar computational architecture
(denoted as BPNN-15) or with the best performance (namely BPNN-40), in terms
of classification accuracy, convergence properties, and AUC criterion. In addition,
NMSN also produced better or competitive solutions compared with a number of
previously proposed methods, such as SVM, C4.5, classification tree, KNN, neuro-fuzzy
model, etc. NMSN has a distinct ability of pattern extraction through the pruning
function, which is a metaphor of the neuronal morphology.
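The signal flow just described (sigmoidal synapses, multiplicative branches, a summing membrane, a thresholding soma) can be sketched in a few lines. The parameter names `w`, `q`, the sigmoid steepness `k`, and the soma threshold `theta` are illustrative assumptions; the dissertation's exact NMSN formulation appears in Chapters 3 and 5.

```python
import math

def sigmoid(x, k=5.0):
    """Sigmoidal nonlinearity with steepness k (k = 5 is an assumption)."""
    return 1.0 / (1.0 + math.exp(-k * x))

def nmsn_forward(inputs, w, q, k_soma=5.0, theta=0.5):
    """One forward pass of a dendritic-neuron sketch.

    inputs: presynaptic signals in [0, 1]
    w, q:   per-branch lists of synaptic weights and thresholds
    """
    branch_outputs = []
    for wj, qj in zip(w, q):
        # each synapse applies a sigmoidal nonlinearity to its input
        syn = [sigmoid(wi * x - qi) for x, wi, qi in zip(inputs, wj, qj)]
        # the branch multiplies its synaptic outputs (nonlinear interaction)
        prod = 1.0
        for s in syn:
            prod *= s
        branch_outputs.append(prod)
    membrane = sum(branch_outputs)            # membrane sums branch currents
    return sigmoid(membrane - theta, k_soma)  # soma fires past the threshold
```

Pruning, in this picture, amounts to deleting synapses whose output is constant (near 0 or 1 for all inputs) and branches whose product is thereby fixed, which is how the model arrives at a minimal dendritic morphology.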
By learning a larger-than-necessary initial network, and thereafter screening out
useless synapses and unnecessary dendrites, NMSN can finally produce a neuron
with the least necessary dendritic morphology. The resultant neuron not only possesses
significantly higher computational capacity than the traditional McCulloch-Pitts linear
neuron model, which is incapable of solving even the simple 3-bit parity problem, but
also provides a possible information-processing mechanism of neuronal morphology
and plasticity. These findings might also give some insights into the
development of new techniques for understanding the mechanisms and constructions
of single neurons.
The other part of our study focused on the gravitational search algorithm (GSA) in
dealing with complex optimization problems. Because it still has some drawbacks,
such as slow convergence and the tendency to become trapped in local minima, we
used chaos, which is generated by the logistic map and has the properties of ergodicity
and stochasticity, in combination with GSA to enhance its searching performance. In
our work, four other chaotic maps are utilized to further improve the searching
capacity of the hybrid chaotic gravitational search algorithm (CGSA), and six widely
used benchmark optimization instances are chosen from the literature as the test suite.
Simulation results indicate that all five chaotic maps can improve the performance of
the original GSA in terms of the solution quality and convergence speed. Moreover,
the four newly incorporated chaotic maps exhibit better influence on improving the
performance of GSA than the logistic map, suggesting that the hybrid searching dy-
namics of CGSA is significantly affected by the distribution characteristics of chaotic
maps.
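As a minimal illustration of the chaotic sequences involved, the logistic map below generates the ergodic values that a CGSA-style chaotic search could use to perturb candidate solutions. The control parameter μ = 4, the perturbation radius r, and the coupling scheme shown are assumptions for illustration, not the exact update rule of Chapter 6.

```python
def logistic_map(x0, n, mu=4.0):
    """Generate n values of the logistic map x_{t+1} = mu * x_t * (1 - x_t)."""
    xs, x = [], x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        xs.append(x)
    return xs

def chaotic_perturb(solution, chaos_values, r=0.1):
    """Illustrative chaotic local-search step: shift each dimension by a
    chaotic value rescaled from [0, 1] into [-r, r]."""
    return [xi + r * (2.0 * c - 1.0) for xi, c in zip(solution, chaos_values)]
```

The other four maps named in Section 6.3 (piecewise linear, Gauss, sinusoidal, sinus) would slot in as drop-in replacements for `logistic_map`, differing only in the distribution of the generated values, which is precisely the property the comparison in Chapter 6 examines.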
We also worked on evolutionary algorithms, especially differential evolution
(DE), which is well known as a powerful and efficient population-based stochastic
real-parameter optimization algorithm over continuous space. DE has recently been
shown to outperform several well-known stochastic optimization methods in solving
multi-objective problems. Nevertheless, its performance is still limited in finding
uniformly distributed and near-optimal Pareto fronts. To alleviate such limitations,
we introduced an adaptive mutation operator to avoid premature convergence by
adaptively tuning the mutation scale factor F, and adopted an ε-dominance strategy to update the
archive that stores the non-dominated solutions. Experiments based on five wide-
ly used multiple objective functions are conducted. Simulation results demonstrate
the effectiveness of our proposed approach with respect to the quality of solutions in
terms of the convergence and diversity of the Pareto fronts.
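A minimal sketch of the DE building blocks just described; the DE/rand/1 mutation is the standard operator, while the linear decay schedule for F is an illustrative stand-in for adaptive tuning, not the exact rule used in our work:

```python
import random

def de_rand_1(pop, i, f):
    """DE/rand/1 mutation: v = x_r1 + F * (x_r2 - x_r3), with r1, r2, r3
    mutually distinct indices different from the target index i."""
    candidates = [j for j in range(len(pop)) if j != i]
    r1, r2, r3 = random.sample(candidates, 3)
    return [a + f * (b - c) for a, b, c in zip(pop[r1], pop[r2], pop[r3])]

def scale_factor(gen, max_gen, f_max=0.9, f_min=0.3):
    """Illustrative adaptive F: large early (exploration), small late (exploitation)."""
    return f_max - (f_max - f_min) * gen / max_gen
```

A large F early in the run encourages wide exploration of the search space; shrinking F over generations concentrates the mutation steps around the emerging Pareto front.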
Chapter 2
Traditional Computational Intelligence
2.1 Artificial Neural Networks
The brain is composed of approximately 10^11 neurons with more than 10^15 connections
between them. Though variable in size and shape, all neurons are composed of
three parts: the cell body, the axon and the dendrites, as illustrated in Fig. 2.1. Den-
drites receive input from other neurons, including but not limited to direct input from
the sensory system involved, at a connection called a synapse and then transmit the
message to the cell body directly or via dendrites. When the net excitation achieves
a threshold value, the neuron fires and sends signals to other neurons through the
axon. A neuron can either inhibit or excite a signal [23].
The brain is able to perform tasks such as pattern recognition and perception
much faster than a computer. The brain can also learn, memorize and generalize.
Current successes in neural modeling lie in solving specific tasks with small artificial
neural networks. An artificial neural network (NN) is a layered network of artificial
neurons. Within the constraints imposed by modern computing power and storage
space, tasks with a single objective can be solved quite easily by neural networks of
suitable size [23].
An artificial neural network (ANN) is a mathematical representation of the human
neural architecture, reflecting its “learning” and “generalization” abilities. In the
Figure 2.1: A biological neuron.
Figure 2.2: McCulloch-Pitts neuron model.
1940s, Warren McCulloch and Walter Pitts explored the computational abilities of
mathematical models of neural networks made up of simple neurons. These networks
can compute any finite basic Boolean logical function. The McCulloch-Pitts neuron
model, which multiplies the input vector by a weight vector and then passes the
result through a linear threshold gate (see Fig. 2.2), has been widely used as a basic
unit for modern studies of neural networks. Such neurons can learn arbitrary linearly
separable dichotomies of the input space through adjusting the weights and thresholds
of synapses.
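The McCulloch-Pitts unit described above can be sketched in a few lines; the AND and OR weight settings are standard textbook examples:

```python
def mcp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: weighted sum of binary inputs, then a hard threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Logical AND: both inputs must be active to reach the threshold of 2.
and_gate = lambda a, b: mcp_neuron([a, b], [1, 1], 2)
# Logical OR: a single active input already reaches the threshold of 1.
or_gate = lambda a, b: mcp_neuron([a, b], [1, 1], 1)
```

No weight/threshold setting of a single such unit realizes XOR, which is exactly the linear-separability limitation discussed later in this dissertation.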
However, such neural networks were considered too inflexible to be applied
as models of cognition because of their inability to generalize. The development of
ANNs was promoted by Rosenblatt, who proposed a more flexible method based on
the statistical separability of neurons. Rosenblatt developed a class of
networks known as perceptrons. A typical perceptron is made up of three layers of
cells: an input layer, a hidden layer and an output layer (Fig. 2.3). Inputs in one
Figure 2.3: Graphical representation of a multi-layer perceptron.
layer are connected, fully or partially, to the neurons in the middle layer. These
neurons are then connected to the response layer of neurons in a random way. The
response neurons produce the outputs of the network, but also inhibit each other.
The generalization ability of perceptrons is demonstrated when the response cell
receiving the strongest input inhibits the others, so that its response becomes the
output. In addition, perceptrons were also shown to be capable of learning [23].
Artificial neural networks were first put to practical use by Widrow and Hoff,
who developed ADALINE, a simple neuron similar to the perceptron, and networks
of ADALINEs called MADALINE. Widrow and Hoff also developed the least mean
squares rule, a supervised learning procedure regarded as a precursor to the
backpropagation learning algorithm [23].
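The Widrow-Hoff least mean squares rule mentioned above adjusts the weights in proportion to the output error; a minimal sketch:

```python
def lms_update(w, x, d, eta=0.1):
    """One least-mean-squares step: w <- w + eta * (d - y) * x, where y = w·x
    is the linear output and d is the desired response."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    error = d - y
    return [wi + eta * error * xi for wi, xi in zip(w, x)]
```

Repeated over a training set, this drives the linear output toward the desired response, which is why it is seen as a precursor of backpropagation's gradient step.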
One of the most significant developments in neural networks was the discovery
of a learning algorithm known as backpropagation, which adjusts the values of the
weights in a multi-layer feedforward network. Using the backpropagation learning
algorithm, neural networks became more effective at solving nonlinear problems,
leading to wider adoption for practical problems. Though many learning algorithms
are available for artificial neural networks, depending on their type and practical
application, the backpropagation learning algorithm is the one used most frequently.
Neural networks have been successfully applied to many data-intensive applica-
tions. These applications include: classification, prediction, pattern recognition, con-
trol and so on.
2.2 Evolutionary Computation
Although we can trace the origins of evolutionary computation back to the late 1950’s,
evolutionary computation has really drawn public attentions during the last decade.
However, it did not get enough development at that time for some reasons such as
lack of powerful computer platforms and the defects of previous methods [24].
Evolutionary computation should be considered a general, adaptable concept
for solving difficult optimization problems, since evolutionary search not only offers
flexibility and adaptability to the current task but also combines robust
performance with global search characteristics. There are three closely connected but
separately developed approaches currently implemented: genetic algorithms, evolu-
tionary programming, and evolution strategies.
The genetic algorithm (GA), a search heuristic that mimics the process of natural
selection, was originally proposed as a general model of adaptive processes.
The technique is routinely applied to generate useful solutions to optimization and
search problems, using operators inspired by natural evolution such as inheritance,
mutation, selection, and crossover [1]. Evolutionary programming is similar to genetic
programming, but the structure of the program to be optimized is fixed, while its
numerical parameters are allowed to evolve. Evolutionary programming was originally
used as a learning process aiming to generate artificial intelligence. Finite state
machines (FSM) were evolved and used as predictors on the basis of former
observations. The performance of an FSM could be measured by the prediction
capability of the machine. Currently evolutionary programming has no fixed structure
and it is becoming harder to distinguish from evolution strategies [24].
Evolution strategies, an optimization technique based on ideas of adaptation and
evolution, were initially developed to solve difficult discrete and continuous opti-
mization problems. The neo-Darwinian model of bio-evolution is represented by the
structure of the following evolutionary algorithm.
Algorithm 1– General evolution framework based on the neo-Darwinian model.
Begin:
    t := 0
    initialize M(t)
    evaluate M(t)
    While termination conditions not fulfilled do
        M'(t) := variation[M(t)]
        evaluate[M'(t)]
        M(t+1) := select[M'(t) ∪ M(t)]
        t := t + 1
End
In this algorithm, M(t) denotes a population of n individuals at generation t,
which forms the set of individuals considered for selection. An offspring population
M'(t) of size λ is generated through variation operators. The offspring individuals
are then evaluated by calculating the value of the objective function for each of the
solutions represented by individuals in M'(t), and selection based on the fitness values
is applied to retain better solutions. The better an individual performs under these
conditions, the greater the possibility that the individual will live longer and generate
offspring. The uncertain nature of reproduction leads to a permanent production of
novel genetic information, and thus to the creation of diverse offspring [25–27].
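Algorithm 1 can be instantiated directly. The sketch below minimizes the illustrative objective f(x) = x^2, with Gaussian perturbation as the variation operator and truncation of M'(t) ∪ M(t) as the selection step; the population size, mutation strength and generation count are arbitrary choices for this example:

```python
import random

def evolve(evaluate, n=20, max_gen=100, sigma=0.5):
    """Generic loop of Algorithm 1: variation, evaluation, selection."""
    pop = [random.uniform(-10.0, 10.0) for _ in range(n)]        # initialize M(0)
    for _ in range(max_gen):
        offspring = [x + random.gauss(0.0, sigma) for x in pop]  # M'(t) := variation[M(t)]
        union = pop + offspring                                  # M'(t) ∪ M(t)
        union.sort(key=evaluate)                                 # evaluate
        pop = union[:n]                                          # select the best n
    return pop[0]

best = evolve(lambda x: x * x)
```

Because selection here keeps the best of parents and offspring, the best fitness in the population never worsens between generations, mirroring the survival argument in the paragraph above.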
Evolutionary computation is closely related to some other techniques, such as
neural networks and fuzzy logic, which are usually considered as part of artificial in-
telligence. According to Bezdek [28,29], it is their characteristic of numerical knowl-
edge representation that distinguishes them from traditional artificial intelligence.
Moreover, Bezdek proposed that computational intelligence should have the following
characteristics:
1) numerical knowledge representation;
2) fault tolerance;
3) adaptability;
4) error rate optimality;
5) processing speed comparable to human cognition processes.
2.3 Swarm Intelligence
Swarm intelligence (SI) originated from the observation of the social behavior of
organisms, or the study of colonies. Efficient swarm optimization and clustering
algorithms have been derived from the foraging behavior of ants and the choreography
of bird flocks, such as the ant colony optimization (ACO) algorithms and the particle
swarm optimization (PSO) algorithm. The swarm can always find an optimal
pattern [30–32].
Swarm intelligence models are designed to model simple individual behaviors
and local interactions with neighbors and the environment, for the purpose of
understanding more complicated behaviors that are useful for solving complex
problems, mostly optimization problems.
2.3.1 Ant colony optimization algorithms
An ant can be seen as a stimulus-response agent [33–36]. For ants, the pheromone is
the stimulus: each ant perceives the pheromone concentrations in its local environment
and probabilistically selects the direction with the highest pheromone concentration.
Thus an ant can be considered a simple computational agent, and this simple
behavior of real ants can be modeled algorithmically. The artificial ant decision
process is shown in Algorithm 2, which is executed whenever the ant needs to make
a decision.
Algorithm 2– Artificial Ant Decision Process.
Begin:
    Let r ~ U(0, 1)
    For each potential path A do
        Calculate PA
        If r < PA then
            Follow path A
            Break
    End
End
In Algorithm 2, PA represents the probability that the ant chooses path
A. Ant algorithms have been widely applied to real-world problems such as the
TSP [37–40]. However, ACO algorithms can only be applied to optimization problems
that meet certain requirements, such as the existence of an appropriate graph able to
represent all states and transitions in a discrete search space [41–43].
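Algorithm 2 tests r against each path's probability in turn; the common roulette-wheel reading, sketched below, accumulates the probabilities so that exactly one path is always chosen when the PA values sum to 1:

```python
import random

def choose_path(path_probs):
    """Roulette-wheel form of the artificial ant decision process.

    path_probs maps each candidate path to its selection probability PA
    (assumed to sum to 1). A single draw r ~ U(0, 1) selects the path whose
    cumulative probability interval contains r."""
    r = random.random()
    cumulative = 0.0
    for path, p in path_probs.items():
        cumulative += p
        if r < cumulative:
            return path
    return path  # guard against floating-point round-off

next_hop = choose_path({"A": 0.7, "B": 0.2, "C": 0.1})
```

In an ACO implementation, the probabilities would come from the pheromone concentrations (and typically a heuristic term) on the edges leaving the ant's current node.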
2.3.2 Particle swarm optimization algorithm
Particle swarm optimization (PSO) is a stochastic population-based search algorithm,
based on simulation of two simple social behaviors of individual birds within a flock:
each bird (1) moves toward its closest best neighbor, and (2) moves back to its
experienced best state. These two social behaviors lead all birds to converge on their
best environment state [6, 44,45].
Each individual in the swarm represents a candidate solution of the optimization
problem. In a PSO system, each particle flies through the hyper-dimensional search
space and adjusts its position under the influence of other particles in the swarm.
A particle uses the best positions experienced by itself and its neighbors to move
toward a better solution. Although a particle moves toward an optimum, it still
searches a wider area around the current optimum solution. The performance of
each particle is measured by a fitness function appropriate to the problem to be
solved [46–49]. PSO has been applied to problems including optimization of
mechanical structures, function approximation, clustering, and solving systems of
equations.
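One iteration of the position and velocity update behind these two social behaviors can be sketched as follows; the inertia weight w and acceleration coefficients c1, c2 are typical values from the PSO literature, not parameters taken from this dissertation:

```python
import random

def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One PSO iteration: each particle is pulled toward its personal best
    position (pbest) and the best position found by the swarm (gbest)."""
    for i, pos in enumerate(positions):
        for d in range(len(pos)):
            r1, r2 = random.random(), random.random()
            velocities[i][d] = (w * velocities[i][d]
                                + c1 * r1 * (pbest[i][d] - pos[d])   # cognitive pull
                                + c2 * r2 * (gbest[d] - pos[d]))     # social pull
            pos[d] += velocities[i][d]
```

The random factors r1 and r2 are what keep particles searching a wider area around the current best solution rather than collapsing straight onto it.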
2.4 Artificial Immune Systems
The natural immune system has a powerful pattern matching ability to distinguish
between cells belonging to the body (self) and foreign cells entering the body (non-
self). When encountering an antigen, the natural immune system shows its adaptive
nature by memorizing the antigen's structure so that it can respond more quickly to
future encounters [1].
Artificial immune systems have powerful information processing capabilities such
as pattern recognition, feature extraction, learning and memory. However, the
immune system is highly complicated and is still under active research. Current
artificial immune systems primarily adopt three immunological principles: the
immune network theory, the mechanism of negative selection, and the clonal selection
principle [50,51].
The immune network theory, based on Jerne's idiotypic network theory [14,52,53],
hypothesizes that the immune system maintains a network of interconnected
B-cells for antigen recognition. These cells build a stable network by both stimulating
and suppressing each other in certain ways. Two B-cells are connected if the
affinity between them exceeds a certain threshold, and the connection strength is
directly proportional to the affinity [54].
The negative selection algorithm is composed of three phases: defining self, generating
detectors and monitoring anomalies. The negative selection algorithm originated
from the mechanism that trains T-cells to distinguish antigens and prevents them
from recognizing the cells belonging to the body. A set of (binary) detectors
is generated to detect anomalies [55]. The clonal selection principle [56] describes
how the immune system responds to an antigenic stimulus. It suggests that only those
cells that recognize the antigen are selected to proliferate. The main characteristics
of the clonal selection theory are [57,58]:
1) the new cells are copies of their parents, subjected to somatic hypermutation;
2) the newly differentiated lymphocytes carrying self-reactive receptors are
eliminated;
3) mature cells proliferate and differentiate on interaction with antigens.
Algorithm 3 is a proposal of a basic AIS. Each of the algorithm's parts is briefly
explained next.
Algorithm 3– Basic AIS Algorithm.
Begin:
    Initialize a set of ALCs as population C
    Determine the antigen patterns as training set DT
    While some stopping condition(s) not true do
        For each antigen pattern zp ∈ DT do
            Select a subset of ALCs for exposure to zp, as population S ⊆ C
            For each ALC xi ∈ S do
                Calculate the antigen affinity between zp and xi
            End
            Select a subset of ALCs with the highest calculated antigen affinity as population H ⊆ S
            Adapt the ALCs in H with some selection method, based on the calculated antigen affinity and/or the network affinity among ALCs in H
            Update the stimulation level of each ALC in H
        End
    End
End
Artificial immune systems have been successfully applied in many problem
domains, ranging from network intrusion and anomaly detection to pattern
recognition, data classification, virus detection, and data mining. AIS methods based
on genetic algorithms have been applied to structural optimization problems with
two objectives, in which the optimum solutions are defined as antigens and the rest
of the population is defined as a pool of antibodies [59–61].
2.5 Fuzzy Systems
As our observations and reasoning often include a measure of uncertainty, we need
fuzzy sets and fuzzy logic, which can perform approximate reasoning. With fuzzy
sets, the degree of certainty that an element belongs to a set can be measured. Fuzzy
logic allows reasoning with uncertainty to derive new possible facts [1].
Fuzzy sets are an extension of two-valued sets that handles partial truth, which
enables modeling to accommodate uncertainty. Unlike the elements of classical sets,
the elements of a fuzzy set have a degree of membership in that set, which indicates
the certainty (or uncertainty) of membership. Suppose X is the domain, and x ∈ X
is a specific element of the domain X. The fuzzy set A is characterized by a
membership mapping function
µA : X → [0, 1]  (2.1)
Therefore, for all x ∈ X, µA(x) indicates the certainty that element x belongs to
fuzzy set A. In the case of two-valued sets, µA(x) is either 0 or 1.
For a discrete domain X = {x1, x2, ..., x_nx}, the fuzzy set can be expressed as an
nx-dimensional vector,
A = {(µA(xi)/xi) | xi ∈ X, i = 1, ..., nx}  (2.2)
or, using sum notation,
A = µA(x1)/x1 + µA(x2)/x2 + ... + µA(x_nx)/x_nx = Σ_{i=1}^{nx} µA(xi)/xi  (2.3)
A continuous fuzzy set A is denoted as
A = ∫_X µA(x)/x  (2.4)
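As a concrete instance of the membership mapping in Eq. (2.1): the triangular function below is one common choice of µA, and the fuzzy set "warm temperature" with breakpoints 15, 22 and 30 degrees is purely an illustrative example, not one taken from this dissertation:

```python
def triangular(a, b, c):
    """Return a triangular membership function µA that rises linearly from a
    to its peak at b and falls back to zero at c, so that µA : X -> [0, 1]."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

warm = triangular(15.0, 22.0, 30.0)   # hypothetical fuzzy set "warm temperature"
```

A temperature of 22 degrees then belongs to "warm" with certainty 1, while 18 degrees belongs to it only partially, which is exactly the graded membership a two-valued set cannot express.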
The uncertainty in fuzzy systems should not be confused with statistical
uncertainty. Rather than being based on the laws of probability, nonstatistical
uncertainty is based on vagueness. Statistical uncertainty is altered through
observations, while nonstatistical uncertainty is an intrinsic property of a system
that cannot be altered by observations [1, 3, 62,63].
There are many successful applications using fuzzy systems such as control sys-
tems, braking systems in vehicles, controlling traffic signals, and many others.
Chapter 3
Dendritic Neural Model: Computation Capacity
3.1 Introduction
Neurons are the building blocks of the nervous system. The brain has approximately
10^11 neurons and each neuron may be connected to up to 10,000 other neurons,
passing signals to each other through about 1,000 trillion synaptic connections. The
neuron consists of a cell body (or soma) with branching dendrites, a cell membrane
and an axon, which conducts the nerve signal. The first dominant conceptual model
of neural networks was a single neuron model, the binary McCulloch-Pitts neuron,
which was proposed by McCulloch and Pitts in 1943 [64]. It has been criticized as
oversimplified for not considering the nonlinearities in a dendrite tree, and some rather
elementary computations, such as the Exclusive OR problem, cannot be solved by
a single layer of McCulloch-Pitts neurons according to Minsky and Papert [65].
The prevailing view in the traditional artificial neural networks literature has been
that the brain owes its powerful computational abilities to the complex connectivity
of neural networks, in which a single neuron could only perform a linear summation
and a nonlinear thresholding operation [64]. As a consequence, the contribution of
single neurons and dendrites was neglected for a long time. Dendritic processing
is highly nonlinear, and such dendritic nonlinearities have been hypothesized to
enhance the computational capabilities of a single neuron [66–68]. The synaptic
interaction at the turning point of a branch can be implemented by Boolean logical
operations according to the hypothesis by Koch, Poggio and Torre [69]. It suggested
that the dendritic branch point may sum currents from the dendritic branches, such
that its output would be a logical OR of its inputs, while each of the branches would
perform a logical AND on their synaptic inputs. Moreover, a logical NOT operation
can represent the inversion of a signal. However, it is difficult for Koch's model to
distinguish diverse synaptic and dendritic morphologies in solving specific and
complex problems, since a slight difference in morphology can result in great
functional variation [69].
Thus, structural plasticity mechanisms in synapses and dendrites are needed to
help resolve the problem, and the neuronal pruning methodology, which reflects
neuronal plasticity, has arisen [70–72]. It refers to an essential process by which
useless neurons and synaptic connections are deleted in order to improve the efficiency
of the neurological system. These biophysical phenomena motivated the model
proposed in this chapter.
We propose a new single neuron model of four layers with synaptic nonlinearities
in a dendritic tree: a synaptic layer, a branch layer, a membrane layer and a soma
layer. We assume that each branch receives signals at its synapses and performs a
multiplication of these signals, while the synapses perform a sigmoidal nonlinear
operation on their inputs. The branching point sums up the multiplied inputs and
the current is then transmitted to the cell body (soma). When the threshold is
exceeded, the cell fires and sends signals down to other neurons through the axon.
An error back-propagation algorithm is used to train the neuron model and, according
to the pruning function, useless synapses and dendrites are removed during training,
forming a distinct synaptic and dendritic morphology. Moreover, the nonlinear
interactions in a dendrite tree are expressed using the Boolean logic operations AND,
OR and NOT. Thus, the proposed single neuron model can be used as a single
classifier to deal with the classical Exclusive OR problem, and its effectiveness is
proved by experiment.
The remainder of this chapter is organized as follows. Section 3.2 introduces the
proposed single dendritic neuron model in detail. The model's learning algorithm is
Figure 3.1: The architecture of the proposed dendritic neuron model.
described in Section 3.3. Section 3.4 presents the experimental results and discussion.
Finally, Section 3.5 gives the conclusions of this chapter.
3.2 Proposed single neuron model based on dendritic structure
The architecture of the single neuron model is shown in Fig. 3.1. The neuron is
composed of a set of independent branches and a soma.
3.2.1 Synaptic layer
A set of inputs labeled x1, x2, ..., xI is applied to the neuron, corresponding to signals
conveyed by synapses. Synapses can be either excitatory, tending to cause the cell to
fire and produce an output pulse, or inhibitory. There are four connection
states in the synaptic layer: a direct connection (excitatory synapse), a reverse
connection (inhibitory synapse), a constant-1 connection and a constant-0 connection.
We model the connection types with a sigmoid function. The node
function from the i-th (i = 1, 2, ..., I) input to the m-th (m = 1, 2, ..., M) synaptic
layer is given by
Yim = 1 / (1 + e^(−k(wim·xi − θim)))  (3.1)
where xi is the i-th element of the input vector x1, x2, ..., xI, and its range is [0, 1].
The inputs are transformed into digital signals “0” and “1” in the synaptic layer.
wim and θim denote synaptic parameters, and k represents a positive constant.
θim/wim is the threshold of the synaptic layer. There are six cases of different values
of the synaptic parameters. The synaptic function varies as the values of wim and
θim change, thus exhibiting different connection states. Furthermore, the sigmoid
function is differentiable.
State 1: Direct connection (excitatory synapse)
Case (a): 0 < θim < wim, e.g., wim = 1.0 and θim = 0.5. In the direct connection,
if xi > θim/wim, the output Yim will be 1. This can be explained as follows: if the
input potential is high compared to the threshold, an excitatory postsynaptic
potential (EPSP) occurs as the membrane potential rapidly depolarizes. When
xi < θim/wim, the output Yim will be 0; that is, an inhibitory postsynaptic potential
(IPSP) occurs as the membrane is transiently hyperpolarized [3]. In other
words, no matter how the input changes between 0 and 1, the output equals the
input.
State 2: Inverse connection (inhibitory synapse)
Case (b): wim < θim < 0, e.g., wim = −1.0 and θim = −0.5. In the inverse
connection, if xi > θim/wim, the output Yim will be 0, giving rise to an IPSP that
hyperpolarizes the cell. On the other hand, if xi < θim/wim, the output Yim will be
1, as the postsynaptic membrane is depolarized by generating an EPSP. This
connection can thus be illustrated by the logic NOT operation.
State 3: Constant-1 connection
Case (c1): θim < 0 < wim, e.g., wim = 1.0 and θim = −0.5. Case (c2): θim <
wim < 0, e.g., wim = −1.0 and θim = −1.5. In the constant-1 connection, the
output is constant 1 whether or not the input exceeds the threshold. The
signals from the synapse have nearly no impact on the dendritic layers, as this
behaves like an excitatory synapse that triggers an EPSP whenever an input signal
arrives.
State 4: Constant-0 connection
Case (d1): 0 < wim < θim, e.g., wim = 1.0 and θim = 1.5. Case (d2): wim < 0 <
θim, e.g., wim = −1.0 and θim = 0.5. In the constant-0 connection, the output is
always 0. That is, IPSPs always occur and the postsynaptic membrane remains
hyperpolarized.
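The six parameter cases above partition the (wim, θim) plane into the four connection states; a small sketch that classifies a synapse accordingly:

```python
def connection_state(w, theta):
    """Map synaptic parameters (w, theta) to one of the four connection states,
    following cases (a)-(d2) of Section 3.2.1."""
    if 0 < theta < w:
        return "direct"       # case (a): output follows the input
    if w < theta < 0:
        return "inverse"      # case (b): output is the logical NOT of the input
    if theta < 0 < w or theta < w < 0:
        return "constant-1"   # cases (c1) and (c2): output is always 1
    if 0 < w < theta or w < 0 < theta:
        return "constant-0"   # cases (d1) and (d2): output is always 0
    return "boundary"         # degenerate settings, e.g. w = 0 or theta = 0

assert connection_state(1.0, 0.5) == "direct"     # case (a) example values
assert connection_state(-1.0, -0.5) == "inverse"  # case (b) example values
```

This classification is what the pruning mechanisms in Section 3.2.5 exploit: constant-1 synapses can be bypassed and constant-0 synapses doom their whole branch.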
3.2.2 Branch layer
The branch layer receives signals at the synaptic contact points, performs a
multiplicative computation on these signals, and produces the local potential
Zm = ∏_{i=1}^{I} Yim  (3.2)
The multiplication is equivalent to the logic AND operation, since the values of the
inputs and outputs of the dendrites are either 1 or 0.
3.2.3 Membrane layer
The somatic membrane corresponds to the sublinear summation operation at the
branching points. The summation is nearly equivalent to the logic OR operation,
since the inputs and outputs of the membrane are also either 1 or 0:
V = Σ_{m=1}^{M} Zm  (3.3)
3.2.4 Soma layer
The result of the computation in the membrane layer is delivered to the soma.
The neuron fires when the membrane potential exceeds the threshold. The inputs
and outputs can be expressed with values of either 1 or 0, so we use the sigmoid
operator described below. When θsoma and k are set to 0.5 and 5 respectively, the
output of the neuron is driven close to either 1 or 0.
O = 1 / (1 + e^(−k(V − θsoma)))  (3.4)
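Putting Eqs. (3.1)-(3.4) together, a forward pass of the four-layer model can be sketched as follows; the weight and threshold matrices W and Theta are indexed [branch][input]:

```python
import math

def synapse(x, w, theta, k=5.0):
    """Synaptic layer, Eq. (3.1): sigmoid of w*x - theta."""
    return 1.0 / (1.0 + math.exp(-k * (w * x - theta)))

def forward(x, W, Theta, k=5.0, theta_soma=0.5):
    """Branch products, Eq. (3.2); membrane sum, Eq. (3.3); soma sigmoid, Eq. (3.4)."""
    V = 0.0
    for W_m, Theta_m in zip(W, Theta):             # one branch per row
        Z = 1.0
        for x_i, w, theta in zip(x, W_m, Theta_m):
            Z *= synapse(x_i, w, theta, k)         # soft logical AND
        V += Z                                     # soft logical OR
    return 1.0 / (1.0 + math.exp(-k * (V - theta_soma)))
```

With a single direct-connection synapse (w = 1, θ = 0.5), the output approximately follows the input: `forward([1.0], [[1.0]], [[0.5]])` lands above 0.5, while `forward([0.0], [[1.0]], [[0.5]])` lands below it.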
Figure 3.2: Four connections.
3.2.5 Neuronal-pruning Function
Pruning techniques start by learning a larger-than-necessary network and then
remove the nodes and weights considered redundant [73,74]. In the proposed
single neuron model, there are two pruning mechanisms, namely synaptic pruning
and dendritic pruning. An input is connected to a branch by a direct connection,
an inverted connection, a constant-0 connection, or a constant-1 connection,
as shown in Fig. 3.2.
Synaptic pruning: In the constant-1 connection, the output of the synaptic layer is
always 1. It has no impact on the product computed in the dendrite layer, since any
value multiplied by 1 is itself. Thus a synaptic layer with a constant-1 connection
can be bypassed.
Dendritic pruning: The product is always 0 when there is a constant-0
connection in the dendritic layer. The whole dendrite layer can therefore be
eliminated without influence on the result.
The process of pruning is illustrated in Fig. 3.3. The initial structure has four
synaptic layers, two dendritic layers, a membrane layer and a soma, as shown in Fig.
3.3(a). On the Dendrite-1 layer, the connection state of input x2 is constant 1, so
this synaptic layer can be omitted. On the Dendrite-2 layer, the connection state
of input x3 is constant 0, thus the Dendrite-2 layer should be completely removed
since its output will always be 0. The removed synapses and dendrites are drawn
with dotted lines in Fig. 3.3(b). Fig. 3.3(c) shows the final simplified dendritic
morphology of the neuron, in which only the input x1 on the Dendrite-1 layer
can influence the final output of the soma.
Figure 3.3: Evolution of predicted dendrite structure by neural pruning.
3.3 Error back-propagation learning algorithm
The proposed neuron model is a feed-forward multilayer network, and the functions
of its nodes are all differentiable. Therefore, the back-propagation (BP) algorithm is
employed to learn the connection types of the connection layers of the neuron model.
Using this learning rule, we can readily train the neuron model by minimizing the
least squared error between the actual output O and the desired output T, defined as:
E = (1/2)(T − O)^2  (3.5)
According to the gradient descent learning algorithm, the synaptic parameters wim
and θim are modified in the direction that decreases the value of E. The update
equations are:
Δwim(t) = −η ∂E/∂wim  (3.6)
Δθim(t) = −η ∂E/∂θim  (3.7)
where η is a positive constant representing the learning rate. The partial differentials
of E with respect to wim and θim are computed as:
∂E/∂wim = (∂E/∂O) · (∂O/∂V) · (∂V/∂Zm) · (∂Zm/∂Yim) · (∂Yim/∂wim)  (3.8)
∂E/∂θim = (∂E/∂O) · (∂O/∂V) · (∂V/∂Zm) · (∂Zm/∂Yim) · (∂Yim/∂θim)  (3.9)
The components of the above partial differentials are as follows:
∂E/∂O = O − T  (3.10)
∂O/∂V = k e^(−k(V − θsoma)) / (1 + e^(−k(V − θsoma)))^2  (3.11)
∂V/∂Zm = 1  (3.12)
∂Zm/∂Yim = ∏_{L=1, L≠i}^{I} YLm  (3.13)
∂Yim/∂wim = k xi e^(−k(xi wim − θim)) / (1 + e^(−k(xi wim − θim)))^2  (3.14)
∂Yim/∂θim = −k e^(−k(xi wim − θim)) / (1 + e^(−k(xi wim − θim)))^2  (3.15)
The parameters wim and θim are updated according to the following equations:
wim(t + 1) = wim(t) + Δwim(t)  (3.16)
Table 3.1: Parameter setting.

Method              Parameter setting
The proposed model  η = 0.1, m = 10, epoch = 1000, k = 5, θsoma = 0.5
BPNN                η = 0.1, m = 10, epoch = 1000
Table 3.2: Exclusive OR problem.

X2 \ X1   0   1
0         0   1
1         1   0
θim(t + 1) = θim(t) + Δθim(t)  (3.17)
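Eqs. (3.5)-(3.17) combine into a single gradient-descent step; a sketch of such a step, using the algebraic identities ∂O/∂V = kO(1 − O) and ∂Yim/∂wim = k·xi·Yim(1 − Yim), which are equivalent to Eqs. (3.11) and (3.14):

```python
import math

def sig(u, k=5.0):
    return 1.0 / (1.0 + math.exp(-k * u))

def train_step(x, target, W, Theta, k=5.0, theta_soma=0.5, eta=0.1):
    """One gradient-descent update of all wim and θim per Eqs. (3.5)-(3.17).
    Returns the output O computed before the update."""
    Y = [[sig(w * xi - th, k) for xi, w, th in zip(x, Wm, Tm)]
         for Wm, Tm in zip(W, Theta)]                       # Eq. (3.1)
    Z = [math.prod(Ym) for Ym in Y]                         # Eq. (3.2)
    V = sum(Z)                                              # Eq. (3.3)
    O = sig(V - theta_soma, k)                              # Eq. (3.4)
    dE_dO = O - target                                      # Eq. (3.10)
    dO_dV = k * O * (1.0 - O)                               # Eq. (3.11)
    for m, (Wm, Tm) in enumerate(zip(W, Theta)):
        for i, xi in enumerate(x):
            dZ_dY = Z[m] / Y[m][i] if Y[m][i] else 0.0      # Eq. (3.13)
            slope = k * Y[m][i] * (1.0 - Y[m][i])           # sigmoid slope
            grad = dE_dO * dO_dV * dZ_dY * slope
            Wm[i] -= eta * grad * xi                        # Eqs. (3.6), (3.14), (3.16)
            Tm[i] += eta * grad                             # Eqs. (3.7), (3.15), (3.17)
    return O
```

Repeating such steps over the four XOR patterns, with pruning applied afterwards, is the training procedure evaluated in the next section.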
3.4 Experimental results and discussion
The experimental results for the Exclusive OR (XOR) problem, conducted in
MATLAB (R2013b), are explained in this section. Here, the performance of the
proposed model is compared with that of the classical back-propagation neural
network (BPNN) in terms of mean square error (MSE) and accuracy. Table 3.1
shows the parameter settings of the proposed neuron model and BPNN, in which the
same learning rate and hidden-layer size are used.
The classic Exclusive OR problem has two inputs, and its teacher signal is shown
in Table 3.2.
3.4.1 Performance comparison
3.4.1.1 Convergence comparison
Fig. 3.4 compares the convergence speed of the proposed model and
BPNN. As can be seen in Fig. 3.4, the proposed model attains a lower training
error and a faster convergence rate than BPNN.
Table 3.3: Classification accuracy.

Method             Accuracy
Proposed method    4/4 (100%)
BPNN               3/4 (75%)
Figure 3.4: Convergence graphs obtained by the proposed dendritic neuron model and BPNN.
3.4.1.2 Classification accuracy comparison
The comparison of classification accuracy is shown in Table 3.3. An accuracy of
100% means that the learned output of the neuron model matches the teacher
signal for every pattern. The proposed model succeeds on all four patterns of the
Exclusive OR problem, while BPNN succeeds on only three of the four.
3.4.2 The synaptic and dendritic morphology after learning
We simplify the structure of the dendrites according to the pruning mechanisms: a
synaptic layer with a constant-1 connection can be completely omitted, and a dendritic
layer with a constant-0 connection should be removed. After learning, the neuron
produced the morphology shown in Fig. 3.5. It is interesting to note that dendritic
branches 1, 2, 3, 4, 7, 8, 9 and 10 each contain at least one constant-0 synaptic
connection. Because each branch performs a multiplicative operation over all inputs,
branches 1, 2, 3, 4, 7,
Figure 3.5: Predicted dendrite structure by neural pruning obtained by the proposed model.
8, 9 and 10 can be eliminated, corresponding to the degeneration of those dendritic
branches. Therefore, we can rewrite Fig. 3.5(a) as Fig. 3.5(b).
3.5 Conclusion
In this study, we have presented a new model that captures the nonlinear interaction
among excitatory and inhibitory inputs on dendrites with a multiplicative operation.
Each synapse receives its input and passes it through a sigmoidal nonlinear function.
The output of each synapse is conveyed to the dendritic branches, and each branch
performs a simple multiplication of its inputs. This gives each segment of dendrite its
own computational power. We have demonstrated that the single neuron is capable
of solving the classical Exclusive OR problem and achieves the desired accuracy of
100%. This model may offer fundamental new insight into the neuron's function and
help to predict cell morphology and the spatial distribution of synapses.
Chapter 4
Dendritic Neural Model: Immunological Learning Algorithm
4.1 Research Background
Extremely great number of neurons compose the brain, where the fundamental struc-
ture in each single neuron consists of an axon, a dendrite, a cell membrane, and a cell
body. Probably the most striking feature of a neuron is its characteristic morphology:
dendritic and axonal processes sprout as intricate tree structures to enable connec-
tions with other neurons. Through their dendrites, neurons receive signals from other
neurons, and via their axons they transmit signals to other neurons. Historically,
research on neuronal morphologies has focused more strongly on dendrites because
the larger diameters of their branches make them more amenable experimentally and
dendrites cover a more restricted space compared to axons [75]. Dendrites receive the
vast majority of synaptic inputs to a neuron. The spatial distribution of inputs across
the dendrites can be exploited by neurons to increase their computational repertoire.
The role of dendrites in neural computation has recently received more and more
attention. The exploration of the role of dendrites in neural input integration was
pioneered by Wilfrid Rall. This started in the 1950s with experimental work by Eccles
and others that suggested surprisingly brief membrane time constants for certain cat
spinal motoneurons. Those time constant estimates relied on the assumption that
motoneurons could be described as point neurons and, therefore, that voltage transients
followed exponential time courses. In the literature, many works have reported
on the dendritic computation of a single neuron [69, 76–80].
The traditional and dominant computational model of a single neuron is the binary
McCulloch-Pitts neuron, which has been criticized as oversimplified because it does not
consider nonlinearities in a dendritic tree [81]. The powerful computational
capacity of dendritic processing has been taken into consideration when constructing
more plausible neural models. Specifically, the synaptic interaction at the turning
point of a branch can be implemented by Boolean logical operations according to the
hypothesis of Koch, Poggio and Torre [69]. It suggested that the dendritic branch
point may sum currents from the dendritic branches, such that its output would be a
logical OR of its inputs, while each of the branches would perform a logical AND on
its synaptic inputs. Moreover, a logical NOT operation can represent the inversion
of a signal. However, it is difficult for Koch's model to distinguish diverse synaptic
and dendritic morphologies when solving specific and complex problems, because a slight
difference in morphology can result in great functional variation [69]. Most recently,
we proposed a single four-layered neuron model [68] with synaptic nonlinearities in a
dendritic tree, including a synaptic layer, a branch layer, a membrane layer and a soma
layer. We assumed that each branch received signals at its synapses and performed
a multiplication of these signals, while the synapses performed a sigmoidal nonlinear
operation on their inputs. The branching point summed up each multiplied input, and
the current was then transmitted to the cell body (i.e., soma). When the threshold was
exceeded, the cell fired and sent a signal to other neurons through the axon. An error
back-propagation algorithm was used to train the neuron model, and according to the
pruning function, useless synapses and dendrites were removed during training,
forming a distinct synaptic and dendritic morphology. Moreover, the nonlinear inter-
actions in a dendrite tree were expressed using the Boolean logic operations AND, OR and NOT.
Nevertheless, the error BP algorithm used in the original work suffered from the local
optima problem, which limited the learning capacity and computational plausibility
of the dendritic neural model.
This study aims to propose an effective training algorithm for the dendritic neural
model. The training process of a neural model is an important aspect, especially
for a neural model with nonlinear dendrites, and this process is also considered to be
related to neural plasticity and dendritic morphology [82, 83]. In our previously
proposed dendritic neural model, the training determines not only the
mapping between the input signals from other neurons and the output of the current
neuron through the associated dendrites, but also the final formation of the den-
dritic morphology [68]. Similar to the training process in a multi-layered perceptron,
training a dendritic neural model can also be regarded as a difficult global optimiza-
tion problem, despite the fact that local optimizers are usually applied for training.
Investigation of applying global optimizers to training is well-motivated, since local
optimizers have basically limited capabilities for global optimization. A further mo-
tivation comes from the need to apply transfer function or regularization approaches
that do not satisfy the requirements concerning the availability of gradient informa-
tion. Convergence to a locally optimal solution is a fundamental limitation of any
local search based training approach including BP. Based on above considerations,
we propose an artificial immune algorithm which is inspired from biological immune
systems to train the dendritic neural model. A population of antibodies is generated
and manipulated to optimize the weight and threshold parameters in the synapses
through somatic hyper-mutation and receptor editing operators. After learning, the
final dendritic morphology of the neuron which is capable of handling specific tasks
can be obtained. Two distinct experiments based on the famous XOR problem and
a geotechnical engineering problem demonstrated the effectiveness of the proposed
artificial immune algorithm.
4.2 Single Dendritic Neural Model for Morphology Prediction
To fully realize the sense of locality in a single neuron, local interactions within a
fixed dendritic tree should be considered in the realization of the computation, not
[Figure: inputs x1–x5 connect to dendritic branches M = 1, …, 5, which join at the membrane and feed the soma.]
Figure 4.1: Schema of a neuron model with dendritic branches. Axons of presynaptic neurons (input X) connect to branches of dendrites (horizontal blue lines) via synaptic layers (black triangles); the membrane layer (vertical blue lines) sums the dendritic activations and transfers the sum to the soma body (black sphere).
only for better biological plausibility but also for a more powerful computational
capacity. Such a single neuron model with four layers including a synaptic layer, a
dendrite layer, a membrane layer, and a soma layer was proposed in our previous
work [68]. To make the paper self-explanatory, we describe the details of the model
in the following.
The structure of the dendritic neuron model is illustrated in Fig. 4.1. The synaptic
layer represents the synaptic connections to the dendrite of a neuron, which are imple-
mented by receptors that take in a certain specific ion. When an ion enters
the receptor, the potential of the receptor changes and determines whether the synapse is an
excitatory synapse or an inhibitory synapse. A sigmoid function is used to express
connection states. Its node function from the i-th (i = 1, 2, 3, ..., I) synaptic input to
the m-th (m = 1, 2, 3, ...,M) synaptic layer is expressed by the following equation.
Y_{im} = \frac{1}{1 + e^{-k(w_{im} x_i - q_{im})}}    (4.1)
[Figure: a synaptic sigmoid before training settles into one of four states: direct, inverse, constant 1, or constant 0 connection.]
Figure 4.2: Four connection states of synaptic layers. The left part shows the state before training; through training, each synaptic layer lands on one of the four connection states on the right, which constitutes the structure of ALNM.
where xi is the input part of a synapse, referred to as the presynaptic terminal,
and its range is [0, 1]; wim and qim are connection parameters, and k is set to 5. With
different values of wim and qim, six cases correspond to the four connection states: a
constant 0 connection, a constant 1 connection, an inverse connection and a direct
connection, as shown in Fig. 4.2. Using the synaptic layers, we transform the inputs
into digital signals composed of “0” and “1”. θim is the threshold of the
synaptic layer, calculated as θim = qim/wim.
Direct connection
Case (a): 0 < qim < wim, e.g. : wim = 1.0 and qim = 0.5.
In the direct connection, if xi exceeds the threshold θim, the output is set to 1; if
it is less than θim, the output will be 0. This means that if the input potential is high
compared with the threshold, the synapse is an excitatory one and an excitatory signal
occurs. Conversely, a low potential produces an inhibitory synapse, resulting
in an inhibitory signal.
Inverse connection Case (b): wim < qim < 0, e.g. : wim = −1.0 and
qim = −0.5.
In the inverse connection, contrary to the direct connection, if input xi does not reach
the threshold θim, the output is set to be 1, and it evokes an excitatory signal. If the
input is larger than the threshold θim, the output is 0, and an inhibitory signal will
be triggered by the output. This can be expressed by the logic NOT operation.
Constant 1 connection
[Figure: six sigmoid curves of y versus x over [−2, 2], one per case: (a) direct connection, (b) inverse connection, (c1)(c2) constant 1 connection, (d1)(d2) constant 0 connection.]
Figure 4.3: Six function cases of the synaptic layer. The horizontal x axis represents the inputs of presynaptic neurons; the vertical y axis shows the output of the synaptic layer. Because the range of x is [0, 1], only the corresponding part needs to be observed.
There are two states in the Constant 1 connection.
Case (c1): qim < 0 < wim, e.g. : wim = 1.0 and qim = −0.5;
Case (c2): qim < wim < 0, e.g. : wim = −1.0 and qim = −1.5.
In the Constant 1 connection, whether or not the input exceeds the threshold θim,
the output is always 1. In this connection state, the dendrite layer simply receives
a constant 1 digital signal from the synapse. An excitatory synapse is fixed in this
position; once input signals enter, excitatory output signals are exported.
Constant 0 connection There are two states in the Constant 0 connection.
Case (d1): 0 < wim < qim, e.g. : wim = 1.0 and qim = 1.5;
Case (d2): wim < 0 < qim, e.g. : wim = −1.0 and qim = 0.5.
In these states, the output is 0, independent of the input signal. In this connection
state, the synapse always degenerates into an inhibitory one; output signals remain
inhibitory. The functions of all cases are shown in Fig. 4.3.
Dendritic layer The dendrite layer represents the nonlinear interaction between
synaptic signals on each branch. The multiplication operation has been thought
to play an important role in the processing of neural information in the sensory
systems, where a range of visual and auditory processes are believed to be underpinned
by multiplication [84], [85]. Our model adopts the multiplicative operation in the
dendrite layer. Since the inputs and outputs of the dendrite layers are either 1 or
0, the multiplication becomes exactly equal to the logic AND operation. Here, the
dendritic equation is shown as follows.
Z_m = \prod_{i=1}^{I} Y_{im}    (4.2)
Membrane layer The membrane layer accumulates the sublinear summation of
the signals in each dendritic branch. The inputs and outputs of the membrane layers
are also either 1 or 0; because the threshold of the soma body is set to 0.5, the
summation activates the soma body unless all inputs are 0, just as an OR operation
would; thus, the summation can be substituted by the logic OR operation.
The equation is shown as follows.
V = \sum_{m=1}^{M} Z_m    (4.3)
Soma layer The soma layer represents the soma cell body. The neuron fires
depending on whether or not the membrane potential exceeds the threshold. We
express it using a sigmoid operation of the product terms, which can be described
mathematically by Eq. (4.4).
O = \frac{1}{1 + e^{-k(V - \theta_{soma})}}    (4.4)
where θsoma and k are the parameters of the cell body. When θsoma and k are set to
0.5 and 5, respectively, the output of the neuron will be fixed to either 1 or 0.
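Eqs. (4.1)–(4.4) compose into a single forward pass. Below is a minimal NumPy sketch (the function name and array shapes are our own choices, not from the thesis):

```python
import numpy as np

def dendritic_forward(x, W, Q, k=5.0, theta_soma=0.5):
    """Forward pass of the four-layer dendritic neuron.

    x : (I,) inputs in [0, 1]; W, Q : (I, M) synaptic parameters.
    """
    # Synaptic layer, Eq. (4.1): one sigmoid per synapse
    Y = 1.0 / (1.0 + np.exp(-k * (W * x[:, None] - Q)))  # shape (I, M)
    # Dendrite layer, Eq. (4.2): product of synaptic outputs on each branch
    Z = Y.prod(axis=0)                                   # shape (M,)
    # Membrane layer, Eq. (4.3): sum of the branch outputs
    V = Z.sum()
    # Soma layer, Eq. (4.4): thresholded sigmoid
    return 1.0 / (1.0 + np.exp(-k * (V - theta_soma)))
```

With two branches wired as one direct/inverse pair each (hand-picked parameters such as w = ±1.0, q = ±0.5, in the spirit of Fig. 4.7), this forward pass reproduces XOR when the soma output is thresholded at 0.5.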
Neuronal-pruning Function Axon pruning: for the axon inputs in the Constant
1 connection, the output of the synaptic layer is 1. Because of the multiplication
operation, an arbitrary value times 1 yields itself. This means the synaptic input has
no influence on the result of the dendrite layer; hence, we can completely omit this
synaptic layer input.
Dendritic pruning: once the axon inputs are in the Constant 0 connection, the
output of the dendrite layer is 0, since any value multiplied by 0 yields 0. The multiplication
operation makes the output of the entire dendrite layer 0, regardless of any other synaptic signals
in the dendrite layer. Since the dendrite layer has no influence on the membrane layer,
the entire dendrite layer should be deleted.
With the above approaches, the neural network can complete the neural pruning
procedure, which screens out the useless synapses and unnecessary dendrites to sim-
plify the dendrite structure. Illustratively, we use the above approaches to simplify
the structure in Fig. 4.4(a). The original structure contains four synaptic layers, two
dendrite layers, a membrane layer, and a soma body. On the Dendrite-1 layer, the
connection state of input x2 is Constant 1, so this synaptic layer would be ignored.
On the Dendrite-2 layer, the connection state of input x1 is Constant 0, so the output
of Dendrite-2 will remain 0. Therefore, we discard the entire Dendrite-2, shown by
the dotted line in Fig. 4.4(b). Finally, we find that only input x1 on Dendrite-1 can
influence the final result of the soma body, as shown in Fig. 4.4(c). As such, ALNM
simplifies the dendrite morphology of the neurons using the neural pruning function.
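The two pruning rules can be expressed as a mask over the parameter matrices. A sketch under our own naming (not the thesis code), reusing the case conditions of Fig. 4.3:

```python
import numpy as np

def pruning_mask(W, Q):
    """Return (keep, branch_alive): which synapses and branches survive pruning.

    A constant-1 synapse is omitted (multiplying by 1 changes nothing);
    a constant-0 synapse kills its whole branch (multiplying by 0 gives 0).
    """
    I, M = W.shape
    keep = np.ones((I, M), dtype=bool)
    branch_alive = np.ones(M, dtype=bool)
    for m in range(M):
        for i in range(I):
            w, q = W[i, m], Q[i, m]
            if q < 0 < w or q < w < 0:        # constant-1 connection
                keep[i, m] = False
            elif 0 < w < q or w < 0 < q:      # constant-0 connection
                branch_alive[m] = False
    keep[:, ~branch_alive] = False            # drop every synapse on a dead branch
    return keep, branch_alive
```

Applied to the Fig. 4.4 example (x2 constant-1 on Dendrite-1, x1 constant-0 on Dendrite-2), the mask keeps only x1 on Dendrite-1 and marks Dendrite-2 dead.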
4.3 Artificial Immune Training Algorithm
4.3.1 Immunological Inspiration
Plenty of optimization methods, either local or global, have been applied to train
neural networks' weights and other parameters. These algorithms include BP,
modified BP [86], BP using the conjugate gradient approach [87], the Marquardt algo-
rithm [88], evolutionary algorithms [89], differential evolution [90], particle swarm
optimization [91], etc. Naturally, local searches such as [86–88] are fundamentally
limited to local solutions, while global ones [89–91] attempt to avoid this limitation.
[Figure: three panels: (a) original dendrite structure with synapses x1, x2 on Dendrite-1 and Dendrite-2; (b) structure during neural pruning, with Dendrite-2 shown dotted; (c) final dendrite structure with only x1 on Dendrite-1 feeding the membrane and soma.]
Figure 4.4: Evolution of predicted dendrite structure by neural pruning.
The training performance varies depending on the objective function and underly-
ing error surface for a given problem and network configuration. Since the gradient
information of error surface is available for the most widely applied network configu-
ration, the most popular optimization methods have been variants of gradient based
back-propagation algorithms. Of course, this is sometimes the result of an inseparable
combination of network configuration and training algorithm, which limits the
freedom to choose the optimization method [92–95].
In this study, we attempt to utilize an artificial immune algorithm to train the dendritic
neural network. The immune system is a mobile, dynamic, stable,
highly distributed and collaborative system. Many features and principles in the im-
mune system have been discovered and abstracted for artificial systems. The natural
immune system is a complex pattern recognition device with the main goal of protect-
ing our body from malefic external invaders, called antigens. The primary elements
are the antibodies, which bind to antigens for their posterior destruction by other
[Figure: an antigen is processed by an APC and presented as a peptide–MHC complex to a Th cell; the activated Th cell, regulated by Ts cells via IL+/IL− signals, activates a B cell, which becomes a plasma cell (activated B cell) secreting antibodies; stages (I)–(VI).]
Figure 4.5: Biological immune procedures used as the training algorithm for the single dendritic neural model.
cells. The number of antibodies contained in our immune system is known to be
much smaller than the number of possible antigens, making diversity and individual
binding capability the most important properties to be exhibited by the antibody
repertoire. In the immune system, affinity is an important measure to represent the
fitness of antibody to antigen. When there are detected antigens, the immune system
will choose B cells with higher affinity to proliferate, which is called clonal selection
and proliferation. When the antigens are eliminated, the B cells with lower affinity
will be chosen for elimination. These two procedures make the antibody population
stable. Moreover, the proliferation and elimination are specific to antigens, as they
take actions according to the affinity. Therefore, they also contribute to the diversity.
The key principles of clonal selection theory are:
(1) the clonal selection is based on the affinity;
(2) the clonal proliferation is followed by hypermutation and receptor editing;
(3) the B cells with lower affinity are eliminated after the elimination of antigens.
The general biological immune responses are shown in Fig. 4.5.
4.3.2 Training Algorithm based on Immune Mechanisms
The optimization of the dendritic neuron model is regarded as the antigen, while the
parameters of weights and thresholds in synapses as shown in Eq. 4.1 are treated as
the antibody. Initially, a set of N antibodies are randomly generated where wim are
generated from [−1, 1] and qim are generated in the interval of [−1.5, 1.5]. A number
of n (n < N) fittest antibodies is selected from the initial pool based on the least
squared error function between the actual output O and the desired output T, which
is shown as follows.
E = \frac{1}{2}(T - O)^2    (4.5)
The n selected elitist antibodies are separated into n distinct pools in ascending
order. After selection, the resulting antibodies are regarded as the population A(t)
manipulated in the current generation.
According to the clonal selection theory, the elitist antibodies are proliferated:
the cells divide themselves, creating a set of clones identical to the parent
antibodies. The proliferation rate is directly proportional to the affinity level; the
higher an antibody's affinity, the more readily it is selected for cloning and the
larger the number of clones it produces. The number of clones generated is determined
by the following rule:
n_i = \left\lceil \frac{n - i}{n} \times K \right\rceil    (4.6)
where ⌈·⌉ rounds its argument up to the nearest integer, i is the
ordinal number of the elite pool, and K is a multiplying factor which determines the
scope of the proliferation.
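As a quick check of Eq. (4.6) (the function name is ours):

```python
import math

def clone_counts(n, K):
    """Clones per elite pool i = 1..n, Eq. (4.6): n_i = ceil((n - i) / n * K)."""
    return [math.ceil((n - i) / n * K) for i in range(1, n + 1)]
```

For n = 5 pools and K = 10 this yields [8, 6, 4, 2, 0]: the fittest pool gets the most clones, and under a literal reading of the rule the last pool receives none.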
The mutation operator enables the algorithm to find better solutions generation by
generation. It plays a very important role in the solution evolution process.
There are two types of mutation operators in the clonal selection based model: one is the
hyper-mutation, which mainly performs search in a local domain, and the other is the
receptor editing, which acts as a global search and helps the algorithm jump out of
local minima.
[Figure: an antibody joins the synaptic parameters as P = (W, Q), with W = (wim), i = 1, 2, …, I and Q = (qim), m = 1, 2, …, M. (1) Hyper-mutation operator HM: HM(P) = P + g × Gauss(0, 1). (2) Receptor editing operator RE: RE(Pi, Pj, r) = Pi when i ≠ r and m ≠ R; RE(Pi, Pj, r) = Pj when i = r or m = R.]
Figure 4.6: Mutation operators used in the artificial immune training algorithm.
The details of the mutation operators are illustrated in Fig. 4.6, where
the synaptic parameters are joined together as P = (W, Q). The hyper-mutation
operator HM performs a unary mutation on each antibody by adding a perturbation
sampled from the classic Gaussian distribution. A shrinking parameter g
is used to control the mutation influence, and it reduces gradually over the
generations as follows.
g(t + 1) = α × g(t)    (4.7)
where the shrinking factor α is usually set to 0.95 in the experiments.
The receptor editing operator RE is a binary operator on two antibodies and
actually carries out a crossover-like mutation. The randomly generated number r is
used to select the position where the exchange of antibody points takes place.
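The two operators can be sketched as follows (a simplified interpretation under our own naming: receptor editing is treated as a single-point exchange at a random cut point r):

```python
import random

def hyper_mutate(P, g):
    """HM(P) = P + g * Gauss(0, 1): element-wise Gaussian perturbation (local search)."""
    return [p + g * random.gauss(0.0, 1.0) for p in P]

def receptor_edit(P_i, P_j):
    """Crossover-like global mutation: exchange the tails of two antibodies
    after a randomly chosen cut point r."""
    r = random.randrange(1, len(P_i))
    return P_i[:r] + P_j[r:], P_j[:r] + P_i[r:]
```

Between generations, the mutation radius g is shrunk as g ← α g with α = 0.95, per Eq. (4.7).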
After manipulation by the mutation operators, the fittest antibody is selected from each elite
pool to replace its corresponding parent antibody. If the fitness of the selected
child antibody is higher than that of the parent, replacement takes place; otherwise it does
not. This process leads to a fitter antibody population. The above procedures
are iterated until a terminal condition is satisfied. A simple method is used: a maximum
number of generations T is set. When the current generation reaches T,
the training algorithm is terminated and the best weights and thresholds in synapses
are output. Finally, the resultant dendritic morphology is obtained.
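Putting the pieces together, the whole training procedure can be sketched as a self-contained loop (a simplified sketch under our own naming; receptor editing is omitted for brevity, and `evaluate` stands for the squared error of Eq. (4.5) computed over the training set):

```python
import math
import random

def immune_train(evaluate, dim, N=30, n=5, K=10, T=100, g=1.0, alpha=0.95):
    """Immune training sketch: evaluate(P) returns the error E of a parameter
    vector P = (W, Q); lower error means higher affinity."""
    random.seed(1)  # fixed seed for reproducibility of the sketch
    # Initial antibody pool; [-1.5, 1.5] covers both parameter ranges in the text
    pool = [[random.uniform(-1.5, 1.5) for _ in range(dim)] for _ in range(N)]
    elites = sorted(pool, key=evaluate)[:n]            # the n fittest antibodies
    for _ in range(T):
        for i in range(1, n + 1):
            parent = elites[i - 1]
            n_clones = max(math.ceil((n - i) / n * K), 1)   # Eq. (4.6), at least one
            clones = [[p + g * random.gauss(0.0, 1.0) for p in parent]
                      for _ in range(n_clones)]             # hyper-mutation
            best = min(clones, key=evaluate)
            if evaluate(best) < evaluate(parent):           # replace only if fitter
                elites[i - 1] = best
        g *= alpha                                          # shrinking, Eq. (4.7)
    return min(elites, key=evaluate)
```

On a toy error surface such as `evaluate = lambda P: sum(p * p for p in P)`, the returned error never exceeds that of the best initial antibody, since replacement is accepted only on improvement.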
4.4 Simulation Results
4.4.1 Experiments Setup
Two kinds of experiments are implemented to verify the effectiveness of the proposed
artificial immune algorithm. The first one is the famous Exclusive OR (XOR) problem,
which is frequently utilized as a benchmark test problem because the
traditional single-layered perceptron was demonstrated to fail on this
simple but nonlinear problem. The second experiment is based on the slope stability
analysis [96] which is a practical and important geotechnical engineering problem.
All the experiments are conducted in MATLAB (R2013b).
4.4.2 Results Analysis and Discussions
The training data of XOR is shown in Table 4.1, while the training data set and test
data set of the slope stability analysis problem are summarized in Tables 4.2 and 4.3,
respectively, where Γ is unit weight, C is cohesion, ϕ is friction angle of soil, H is
height of slope, β is slope angle and ru is pore pressure parameter. In this study, we
consider the slope stability analysis problem as a classification problem. The slope
failures are complex natural phenomena that constitute a serious natural hazard in
many countries. Many variables are involved in slope stability evaluation, and the
calculation of the factor of safety requires geometrical data, physical data from the
geologic materials and their shear strength parameters (cohesion and angle of internal
friction), information on pore-water pressures, etc. Engineering assessment of earth
slope stability is usually performed using algorithms in determining its susceptibility
to failure in terms of the factor of safety. Depending on whether the factor of safety
Table 4.1: Target XOR training data.
X1   X2   Desired Output
0    0    0
0    1    1
1    0    1
1    1    0
is greater or less than 1, the slope is considered stable or unstable [96,97].
Both the BP and artificial immune algorithm are used to train the dendritic neuron
model when applied to XOR and the slope stability analysis. The classification of
the slope stability is defined in terms of state of the slope, stable or failed slopes; SS
is taken as 1 and 0 for stable and failed slope, respectively. Such type of analysis is
also made for liquefaction potential evaluation of in situ soil using neural networks.
The factor of safety calculated based on limit equilibrium method is used as the
output for the neural network model developed for predicting the factor of safety.
The comparative results are summarized in Table 4.4.
From Table 4.4, it is clear that the proposed artificial immune algorithm can
produce better solutions than BP when training the dendritic neural model, whether
on the simple but nonlinear XOR problem or on the practical engineering problem. In
addition, we also show a corresponding dendritic morphology predicted by the trained
single neuron model on XOR problem in Fig. 4.7, suggesting that the proposed
algorithm is also capable of predicting the morphologies of neurons.
4.5 Conclusion
In this study, we propose an artificial immune algorithm to train the dendritic neuron
model. Benefiting from the parallel computing mechanism of the population and
requiring no gradient information, the artificial immune algorithm, a global
optimization method, has been verified to be superior to the traditional local optimizer
BP in terms of the average final least squared learning error on two tested
Figure 4.7: Final dendritic morphology of the XOR problem after training.
problems. In the future, we plan to investigate the sensitivities of the user-defined
parameters of the proposed artificial immune algorithm and to apply the proposed
model to a wider variety of problems.
Table 4.2: The training data set of slope stability classification problem.
Γ (kN/m³)  C (kPa)  ϕ (°)  β (°)  H (m)  r_u   FC
18.68      26.34    15     35     8.23   0     0
18.84      14.36    25     20     30.5   0     1
18.84      57.46    20     20     30.5   0     1
28.44      29.42    35     35     100    0     1
28.44      39.23    38     35     100    0     1
20.6       16.28    26.5   30     40     0     0
14.8       0        17     20     50     0     0
14         11.97    26     30     88     0     0
25         120      45     53     120    0     1
18.5       25       0      30     6      0     0
18.5       12       0      30     6      0     0
22.4       10       35     30     10     0     1
21.4       10       30.34  30     20     0     1
22         0        36     45     50     0     0
12         0        30     35     4      0     1
12         0        30     45     8      0     0
12         0        30     35     4      0     1
12         0        30     45     8      0     0
23.47      0        32     37     214    0     0
16         70       20     40     115    0     0
20.41      24.9     13     22     10.67  0.35  1
21.82      8.62     32     28     12.8   0.49  0
20.41      33.52    11     16     45.72  0.2   0
18.84      15.32    30     25     10.67  0.38  1
21.43      0        20     20     61     0.5   0
19.06      11.71    28     35     21     0.11  0
18.84      14.36    25     20     30.5   0.45  0
21.51      6.94     30     31     76.81  0.38  0
14         11.97    26     30     88     0.45  0
18         24       30.15  45     20     0.12  0
23         0        20     20     100    0.3   0
22.4       100      45     45     15     0.25  1
Table 4.3: The test data set of slope stability classification problem
Γ (kN/m³)  C (kPa)  ϕ (°)  β (°)  H (m)  r_u   FC
22.4       10       35     45     10     0.4   0
20         20       36     45     50     0.25  0
20         20       36     45     50     0.5   0
20         0        36     45     50     0.25  0
20         0        36     45     50     0.5   0
22         0        40     33     8      0.35  1
20         0        24.5   20     8      0.35  1
18         5        30     20     8      0.3   1
16.5       11.49    0      30     3.66   0     0
26         150.05   45     50     200    0     1
22         20       36     45     50     0     0
19.63      11.97    20     22     12.19  0.41  0
18.84      0        20     20     7.62   0.45  0
24         0        40     33     8      0.3   1
Table 4.4: Average final least squared error after learning using BP and the artificial immune algorithm for XOR and slope stability.
Algorithm                     XOR    Training data  Testing data
BP                            0.25   0.46           0.75
Artificial immune algorithm   0.15   0.21           0.32
Chapter 5
Dendritic Neural Model: Classification Ability
5.1 Introduction
Liver disease is one of the top 10 leading causes of death; it affects 30 million Amer-
icans of all ages, genders, races and life circumstances, and the number keeps growing
rapidly [98]. There should be greater public awareness about liver health and
early treatment. Although the numbers of deaths caused by cancer and other diseases
are still much greater, liver disease kills people at a much younger age, typically
between 25 and 64 [99]. It is therefore all the more important to find methods to
detect liver disease at an early stage. There are many risk factors, from genetic and
autoimmune to environmental and behavioral [98]. Accurate diagnosis of liver disease
has never been an easy task. The information afforded by patients may include
redundant and interrelated symptoms and signs that complicate the diagnosis
of liver disease, delaying a correct diagnosis decision. Thus it is
imperative to find much more effective and advanced diagnosis methods to identify
multidimensional relationships in clinical data of liver disease, as well as to improve
the accuracy of diagnosis.
There are many kinds of methods for the liver disorders classification problem
including decision trees, ensemble learning, linear regression, naive Bayes, k-nearest
neighbors algorithm, artificial neural network (ANN), support vector machine, etc.
The intelligent system that includes an artificial neural network based expert system
for automatic liver disorders diagnosis is becoming popular among researchers
[100–102]. The ability of the system to approximate complex and non-linear prob-
lems without knowing the mathematical representations of the system and the learn-
ing process that mimic the human brain lead to this popularity. The ANN also out-
performs the conventional statistical technique for the prediction and classification
purposes in various fields of applications, as revealed in [103].
On the basis of the latest research on the properties of neurons [104–108], we
propose a more realistic model of single neuron computation with synaptic nonlin-
earities (NMSN) in a dendritic tree for liver disease diagnosis. By modeling synaptic
nonlinearity with a sigmoid function, we show that such a single neuron is capa-
ble of computing linearly non-separable functions and approximating any complex
continuous function. The nonlinear interactions in a dendrite tree are expressed us-
ing the Boolean logic AND (conjunction), OR (disjunction) and NOT (negation),
instead of executing a complex function calculations. The model is equipped with
a neuron-pruning function that can remove useless synapses and dendrites during
learning, forming a distinct synaptic and dendritic morphology without sacrificing
the predictive accuracy. Thus we can use the model to select features for identify-
ing the underlying causes of disorders, to reduce the number of inputs to save the
diagnosis time and to achieve high classification accuracy of liver disease. We also
develop a back-propagation based learning algorithm capable of modifying synapses
adequately for performing the task. The model is not only able to achieve high accuracy,
sensitivity and specificity rate, but also can provide explanation for its predictions,
thus showing promise as an effective pattern classification method in liver disease
diagnostics.
The remainder of the paper is organized as follows. Section 2 presents some char-
acteristics of classic artificial neural network and its application in medical diagnosis,
therein the discovery of synaptic nonlinearity in single neuron is also specially de-
scribed. Section 3 introduces the proposed neuron model NMSN in details. NMSN’s
learning algorithm is described in Section 4. Section 5 presents the experimental re-
sults using the BUPA liver disorders datasets. Finally, Section 6 gives the discussions
and future works to conclude this paper.
5.2 Backgrounds
5.2.1 ANN in medical diagnosis
An artificial neural network (ANN) is a mathematical representation of the human
neural architecture, reflecting its “learning” and “generalization” abilities. The first
dominant conceptual model of neural networks was a single neuron model called the
McCulloch-Pitts neuron [64]. Trained by the back-propagation (BP) algorithm, the
non-linear processing capabilities of ANNs were demonstrated [1]. ANNs have
been intensively applied for classification tasks in medical diagnosis [109]. Clinical
diagnosis was one of the first areas using ANNs [110]. Due to the ability of predic-
tion, parallel operation, and self-adaptivity, ANN has provided a powerful tool for
physicians to analyze, compute and figure out complex data across many medical
applications. The techniques usually aid disease diagnosis by learning the basic
characteristics to use in the decision-making process, solving a quantitative
classification problem instead of performing a qualitative diagnosis, which is more objective. The
application of ANNs in medical diagnosis has been previously described in general
in [111].
There are many studies using ANNs for liver disease diagnosis. Some of the
typical ones are introduced in the following. Jeatrakul and Wong carried out a
comparison of the classification performance on liver disease of five different
types of neural networks: back-propagation neural network (BPNN),
radial basis function neural network (RBFNN), general regression neural network
(GRNN), probabilistic neural network (PNN), and complementary neural network
(CMTNN). Among them, the best classification accuracy of 70.29% was obtained by
CMTNN [100]. Besides, Zhang et al. [101] proposed new types of single-output and
multi-output Chebyshev-polynomial feed-forward neural networks, named SOCPNN
and MOCPNN, to classify real-world datasets; both methods obtained a testing
accuracy of 66.78%. The best classification accuracy without noise on the liver
disorders dataset was acquired by these two methods (i.e., SOCPNN and
MOCPNN) proposed in [101]. In addition, Seera and Lim used a fuzzy Min-Max
neural network to classify the liver disorders and got an accuracy of 67.25% [102].
It is obvious that much research effort has been devoted to liver disease classification using ANNs. However, few studies have considered single neuron models, which are thought to be unable to solve multidimensional and nonlinear problems.
5.2.2 The discovery of synaptic nonlinearity in single neuron
The McCulloch-Pitts neuron model has been widely used as a basic unit in modern studies of neural networks; it multiplies the input vector by a weight vector and then passes the result through a linear threshold gate. Such neurons can learn arbitrary linearly separable dichotomies of the input space by adjusting the weights and thresholds of their synapses [112].
In the traditional ANN literature, the prevailing view has been that the brain derives its strong computational abilities from the complex connectivity of neural networks, in which a single neuron performs only a linear summation and a nonlinear thresholding operation (all-or-none response) [64]. Under this view, a single neuron model cannot be used in medical diagnosis, as the clinical data generally used for diagnosis are multidimensional and nonlinear in nature [113]. As a consequence, the contribution of single neurons and their dendrites has long been overlooked.
Recently it has been conjectured by a series of theoretical studies that individual
neurons could act more powerfully as computational units considering synaptic non-
linearities in a dendritic tree [69, 114–117]. The various types of synaptic plasticity
and nonlinearity mechanisms allow synapses to play a more important role in com-
putations [106]. Synaptic inputs from different neuronal sources can be distributed
spatially on the dendritic tree, and neuronal plasticity can result from changes in synaptic strength or connectivity, as well as from the excitability of the neurons themselves [105]. Moreover, even a slight morphological difference can cause great functional variation, acting as a filter that determines which signals a single neuron receives and how these signals are integrated [118]. Blomfield proposed a pioneering theory showing
that synaptic interactions in each individual neuron could be additive or multiplicative [114]. Schnupp and King suggested that multiplicative operations may play a key role in neuronal computation [119]. Theoreticians proposed that the nonlinearity of synapses could be used to implement a type of multiplication instead of summation [104]. Koch, Poggio and Torre hypothesized that the synaptic interaction and the action at the branching point of a dendrite can be implemented by Boolean logical operations [69]. They suggested that the dendritic branch point may sum currents from the dendritic branches, such that its output would be a logical OR of its inputs, while each branch would perform a logical AND on its synaptic inputs. Moreover, a logical NOT operation can represent the inversion of a signal.
However, the so-called Koch's model [69] still has difficulty accounting for the diverse synaptic and dendritic morphologies needed to solve specific and complex problems [105], such as liver disorders diagnosis. Thus, structural plasticity mechanisms in synapses and dendrites, including the formation and elimination of synapses and dendrites in neural circuits, are needed to resolve the problem and acquire a branch-specific morphology.
In recent years, considerable effort has been directed towards neuron pruning methodology [70–72], which is one way to reflect neuron plasticity. It refers to an essential process by which extra neurons and synaptic connections are removed in order to improve the efficiency of the neurological system. These biophysical phenomena motivate the model proposed in this chapter.
Figure 5.1: The architecture of the proposed dendritic neuron model (inputs x1, x2, ..., xI feed synapses on Dendrites 1 to M; the dendrite outputs converge through the membrane to the soma).
5.3 Single Dendritic Neural Model for Classification
The single neuron model with synaptic nonlinearities (NMSN) proposed in this chapter
simulates the essence of nonlinear interactions among synaptic inputs in the den-
drites. We assume that each branch receives signals at their synapses and performs a
multiplication of these signals, while the synapses perform a sigmoidal nonlinear op-
eration on their inputs. The branching point sums up each multiplied input and then
the current is transmitted to the cell body (soma). Once the threshold is exceeded, the cell fires and sends a signal down to other neurons through the axon. The architecture
of NMSN can be simply expressed by four layers: a synaptic layer, a branch layer,
a membrane layer and a soma layer, as shown in Fig. 5.1, where M dendrites are
associated with a neuron, and each dendrite receives I signals from other neurons.
Arrows in Fig. 5.1 indicate the direction of the information processing. The details
of the model are described in the following.
A synapse refers to the connection between neurons at a terminal bouton of a
dendrite to another dendrite/axon or the soma of another neural cell. The direction
of information flow is feedforward, from the presynaptic neuron to the postsynaptic neuron. The synapse can be either excitatory or inhibitory, depending on the changes in the postsynaptic potential caused by ionotropic receptors [104].

Figure 5.2: Six function cases of the synaptic layer: (a) direct connection, (b) inverse connection, (c1)-(c2) constant-1 connection, (d1)-(d2) constant-0 connection.

There are four connection states in the synaptic layer: a direct connection (excitatory synapse), an inverse connection (inhibitory synapse), a constant-1 connection, and a constant-0 connection. We model each type of connection with a one-input one-output sigmoid function. The node function from the i-th (i = 1, 2, 3, ..., I) input to the m-th (m = 1, 2, 3, ..., M) synaptic layer is given by
Y_{im} = \frac{1}{1 + e^{-k(w_{im} x_i - \theta_{im})}}    (5.1)
where xi is the presynaptic input, one of a set of inputs labeled x1, x2, ..., xI, with range [0, 1]. The inputs are transformed into digital signals “0” and “1” in the synaptic layer. wim denotes a synaptic parameter, k represents a positive constant, and θim/wim is the threshold of the synaptic layer. There are six cases for the different values of the synaptic parameters: as the values of wim and θim change, the synaptic function varies accordingly, thus exhibiting different connection states. Furthermore, the sigmoid function is clearly differentiable. The functions of all six cases are shown in Fig. 5.2.
State 1: Direct connection (Excitatory synapse)
Case (a): 0 < θim < wim, e.g., wim = 1.0 and θim = 0.5. In the direct connection, if xi > θim/wim, the output Yim will be 1. This can be explained as follows: when the input potential is high compared to the threshold, an excitatory postsynaptic potential (EPSP) occurs as the membrane potential rapidly depolarizes. When xi < θim/wim, the output Yim will be 0; that is, an inhibitory postsynaptic potential (IPSP) has occurred as the membrane is transiently hyperpolarized [104]. In other words, no matter how the inputs change between 0 and 1, the output equals the input.
State 2: Inverse connection (Inhibitory synapse)
Case (b): wim < θim < 0, e.g., wim = −1.0 and θim = −0.5. In the inverse connection, if xi > θim/wim, the output Yim will be 0, giving rise to an IPSP that hyperpolarizes the cell. On the other hand, if xi < θim/wim, the output Yim will be 1, as the postsynaptic membrane is depolarized by generating an EPSP. Thus it can be illustrated by the logic NOT operation.
State 3: Constant-1 connection
Case (c1): θim < 0 < wim, e.g., wim = 1.0 and θim = −0.5. Case (c2): θim < wim < 0, e.g., wim = −1.0 and θim = −1.5. In the constant-1 connection, the output is constantly 1 whether or not the input exceeds the threshold. The signals from such a synapse have almost no impact on the dendritic layers, as this excitatory synapse triggers an EPSP whenever an input signal arrives.
State 4: Constant-0 connection
Case (d1): 0 < wim < θim, e.g., wim = 1.0 and θim = 1.5. Case (d2): wim < 0 < θim, e.g., wim = −1.0 and θim = 0.5. In the constant-0 connection, the output is always 0; that is, IPSPs always occur and the postsynaptic membrane stays hyperpolarized.
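The four connection states above can be checked numerically. Below is a minimal Python sketch (illustrative, not part of the original study) of the node function in Eq. (5.1); with a steep slope k the sigmoid approximates a step at x = θim/wim, so the example parameter pairs from cases (a), (b), (c1) and (d1) reproduce the direct, inverse, constant-1 and constant-0 behaviours:

```python
import math

def synapse(x, w, theta, k=5.0):
    """Synaptic node function Y = 1 / (1 + exp(-k*(w*x - theta))), Eq. (5.1)."""
    return 1.0 / (1.0 + math.exp(-k * (w * x - theta)))

K = 50.0  # a steep slope makes the 0/1 behaviour easy to see

# (a) direct connection: 0 < theta < w -> output follows the input
assert synapse(0.9, w=1.0, theta=0.5, k=K) > 0.99   # x > theta/w -> ~1
assert synapse(0.1, w=1.0, theta=0.5, k=K) < 0.01   # x < theta/w -> ~0

# (b) inverse connection: w < theta < 0 -> output is the logical NOT
assert synapse(0.9, w=-1.0, theta=-0.5, k=K) < 0.01
assert synapse(0.1, w=-1.0, theta=-0.5, k=K) > 0.99

# (c1) constant-1 connection: theta < 0 < w -> always ~1 on [0, 1]
assert synapse(0.0, w=1.0, theta=-0.5, k=K) > 0.99

# (d1) constant-0 connection: 0 < w < theta -> always ~0 on [0, 1]
assert synapse(1.0, w=1.0, theta=1.5, k=K) < 0.01
```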
The dendrite layer simply performs a multiplication on various synaptic connec-
tions of each branch. As mentioned before, the nonlinearity of synapses could be
used to implement a type of multiplication instead of summation, thus our model
adopts the multiplicative operation in the dendrite layer. It should be noted that a
soft-minimization operator was utilized in our previous dendritic neuron model [108]
to deal with binary-input classification problems, while the multiplicative operation
adopted in this study can address real-number input problems. The multiplication is essentially equivalent to the logic AND operation, as the inputs and outputs of the dendrites are either 1 or 0.

Figure 5.3: Evolution of predicted dendrite structure by neural pruning: (a) the initial structure, (b) pruned synapses and dendrites shown in dotted lines, (c) the final simplified morphology.

The output equation can be given as follows.
Z_m = \prod_{i=1}^{I} Y_{im}    (5.2)
The membrane layer corresponds to the sublinear summation operation at a branching point. It should also be pointed out that the soft-maximization operator utilized in [108] is replaced by a summation operator in this study. Because the inputs and outputs of the membrane are also either 1 or 0, the summation is nearly the same as the logic OR operation in the binary case. The equation is:
V = \sum_{m=1}^{M} Z_m    (5.3)
The result of computation in the membrane layer will be delivered to the soma.
The neuron fires when the membrane potential exceeds the threshold. We use a
sigmoid operator described as follows.
O = \frac{1}{1 + e^{-k_{soma}(V - \theta_{soma})}}    (5.4)
Pruning techniques start by learning a larger-than-necessary network and then removing the nodes and weights that are considered redundant [73,74]. The objective of the pruning function is to eliminate useless connections and input nodes from the neural dendrites, thus significantly reducing the complexity of the neuron. In the proposed NMSN, there are two pruning mechanisms, namely synaptic pruning and dendritic pruning, which screen out the unnecessary synapses and dendrites to simplify the structure of the dendrites. In general, an input is connected to a branch by a direct connection, an inverse connection, a constant-0 connection, or a constant-1 connection.
Synaptic pruning: In the constant-1 connection, the output of the synaptic layer is always 1. Since the dendritic layer performs a multiplication, any value multiplied by 1 remains itself. That is to say, a synapse with a constant-1 connection has no impact on the product in the dendrite layer; it can therefore be neglected and bypassed.
Dendritic pruning: As long as there is a constant-0 connection on a dendritic branch, the product will always be 0. The entire dendrite can therefore be eliminated, since it has no influence.
The specific pruning process is illustrated in Fig. 5.3. The initial structure has four synaptic layers, two dendritic layers, a membrane layer and a soma, as shown in Fig. 5.3(a). On the Dendrite-1 layer, the connection state of input x2 is constant-1, so this synapse can be omitted. On the Dendrite-2 layer, the connection state of input x3 is constant-0, so the entire Dendrite-2 layer should be removed since its output will always be 0. The removed synapses and dendrites are illustrated with dotted lines in Fig. 5.3(b). Fig. 5.3(c) shows the final simplified dendritic morphology of the neuron, in which only the input x1 on the Dendrite-1 layer
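The two pruning rules can be sketched in a few lines of Python (the state labels and the prune helper are illustrative names, not from this chapter); the example reproduces the situation of Fig. 5.3, where Dendrite-2 is removed entirely and only x1 survives on Dendrite-1:

```python
# Hypothetical sketch of the two pruning rules; assumes each synapse has
# already been classified into one of the four connection states.
DIRECT, INVERSE, CONST1, CONST0 = "direct", "inverse", "const1", "const0"

def prune(dendrites):
    """dendrites: list of branches, each a dict {input_index: state}."""
    kept = []
    for branch in dendrites:
        # Dendritic pruning: a constant-0 synapse forces the branch product
        # to 0, so the whole branch is eliminated.
        if CONST0 in branch.values():
            continue
        # Synaptic pruning: multiplying by a constant 1 changes nothing,
        # so constant-1 synapses are dropped from the branch.
        kept.append({i: s for i, s in branch.items() if s != CONST1})
    return kept

# Structure analogous to Fig. 5.3(a): x2 is constant-1 on Dendrite-1,
# x3 is constant-0 on Dendrite-2.
initial = [{1: DIRECT, 2: CONST1}, {3: CONST0, 4: DIRECT}]
assert prune(initial) == [{1: DIRECT}]   # only x1 on Dendrite-1 survives
```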
5.4 Learning algorithm
NMSN is a feed-forward network with continuous functions. Thus, an error back-propagation-like algorithm is valid for NMSN. Using a gradient-based learning rule, the model parameters are derived by minimizing the squared error between the actual output O and the desired output T, defined as:
E = \frac{1}{2}(T - O)^2    (5.5)
According to the gradient descent learning algorithm, the synaptic parameters wim and θim are modified in the direction that decreases the value of E. The update equations are:
\Delta w_{im}(t) = -\eta \frac{\partial E}{\partial w_{im}}    (5.6)

\Delta \theta_{im}(t) = -\eta \frac{\partial E}{\partial \theta_{im}}    (5.7)
where η is a positive constant representing the learning rate. The partial derivatives of E with respect to wim and θim are computed as:
\frac{\partial E}{\partial w_{im}} = \frac{\partial E}{\partial O} \cdot \frac{\partial O}{\partial V} \cdot \frac{\partial V}{\partial Z_m} \cdot \frac{\partial Z_m}{\partial Y_{im}} \cdot \frac{\partial Y_{im}}{\partial w_{im}}    (5.8)

\frac{\partial E}{\partial \theta_{im}} = \frac{\partial E}{\partial O} \cdot \frac{\partial O}{\partial V} \cdot \frac{\partial V}{\partial Z_m} \cdot \frac{\partial Z_m}{\partial Y_{im}} \cdot \frac{\partial Y_{im}}{\partial \theta_{im}}    (5.9)
The components of the above partial derivatives are as follows.
\frac{\partial E}{\partial O} = O - T    (5.10)

\frac{\partial O}{\partial V} = \frac{k_{soma}\, e^{-k_{soma}(V-\theta_{soma})}}{(1 + e^{-k_{soma}(V-\theta_{soma})})^2}    (5.11)

\frac{\partial V}{\partial Z_m} = 1    (5.12)

\frac{\partial Z_m}{\partial Y_{im}} = \prod_{L=1,\, L \neq i}^{I} Y_{Lm}    (5.13)

\frac{\partial Y_{im}}{\partial w_{im}} = \frac{k x_i\, e^{-k(x_i w_{im} - \theta_{im})}}{(1 + e^{-k(x_i w_{im} - \theta_{im})})^2}    (5.14)

\frac{\partial Y_{im}}{\partial \theta_{im}} = \frac{-k\, e^{-k(x_i w_{im} - \theta_{im})}}{(1 + e^{-k(x_i w_{im} - \theta_{im})})^2}    (5.15)
The parameters wim and θim are updated according to the equations as follows.
w_{im}(t+1) = w_{im}(t) + \Delta w_{im}    (5.16)

\theta_{im}(t+1) = \theta_{im}(t) + \Delta \theta_{im}    (5.17)
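To make the model and its learning rule concrete, the following Python sketch implements Eqs. (5.1)-(5.17) end to end. The function names, the M × I list layout for the parameters, and the default constants are illustrative choices, not the MATLAB implementation used in the experiments below:

```python
import math

def nmsn_forward(x, w, theta, k=3.0, k_soma=10.0, theta_soma=0.5):
    """Forward pass of NMSN, Eqs. (5.1)-(5.4).
    x: list of I inputs in [0, 1]; w, theta: M x I parameter matrices."""
    Y = [[1.0 / (1.0 + math.exp(-k * (w[m][i] * x[i] - theta[m][i])))
          for i in range(len(x))] for m in range(len(w))]     # synapses (5.1)
    Z = [math.prod(row) for row in Y]                         # dendritic product (5.2)
    V = sum(Z)                                                # membrane summation (5.3)
    O = 1.0 / (1.0 + math.exp(-k_soma * (V - theta_soma)))    # soma (5.4)
    return Y, Z, V, O

def nmsn_step(x, T, w, theta, eta=0.005, k=3.0, k_soma=10.0, theta_soma=0.5):
    """One gradient-descent update of w and theta in place, Eqs. (5.5)-(5.17).
    Returns the squared error E before the update."""
    Y, Z, V, O = nmsn_forward(x, w, theta, k, k_soma, theta_soma)
    dE_dO = O - T                                             # (5.10)
    s = math.exp(-k_soma * (V - theta_soma))
    dO_dV = k_soma * s / (1.0 + s) ** 2                       # (5.11); dV/dZm = 1 (5.12)
    for m in range(len(w)):
        for i in range(len(x)):
            dZ_dY = math.prod(Y[m][L] for L in range(len(x)) if L != i)  # (5.13)
            e = math.exp(-k * (x[i] * w[m][i] - theta[m][i]))
            dY_dw = k * x[i] * e / (1.0 + e) ** 2             # (5.14)
            dY_dth = -k * e / (1.0 + e) ** 2                  # (5.15)
            common = dE_dO * dO_dV * dZ_dY                    # shared chain (5.8)-(5.9)
            w[m][i] -= eta * common * dY_dw                   # (5.6), (5.16)
            theta[m][i] -= eta * common * dY_dth              # (5.7), (5.17)
    return 0.5 * (T - O) ** 2                                 # (5.5)

# Deterministic sanity check: one input, one branch, w = 1, theta = 0, x = 0
# gives Y = Z = V = 0.5 and, with theta_soma = 0.5, a soma output O = 0.5.
_, _, V, O = nmsn_forward([0.0], [[1.0]], [[0.0]])
assert V == 0.5 and O == 0.5
```

Each update uses the chain rule of Eqs. (5.8)-(5.9); the factors dE/dO, dO/dV and dZm/dYim are shared between the two parameter updates, so they are computed once per synapse.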
5.5 Experimental results and discussion
The experimental results of liver disease prediction are presented in this section. The performance of the proposed NMSN model is compared with that of the classical back-propagation neural network (BPNN) in terms of sensitivity, specificity and accuracy.
5.5.1 Experimental environment and evaluation metrics
In the experiment, we design and test each neural network type using MATLAB
(R2013b). The BPNN is implemented using the MATLAB R2013b Neural Network
(NN) Toolbox. The performance metrics of mean square error (MSE), accuracy, sensitivity, specificity and area under the ROC curve (AUC) are utilized to compare the results of the proposed NMSN model and BPNN.
The liver disorders dataset used in this study is taken from the UCI machine learning repository and is commonly used in medical classification problems [120]. It is divided into two subsets: the training set and the test set. In the testing phase, the testing dataset is given to the proposed model NMSN and the performance is quantified by its accuracy. However, it is also important to describe the acquired results in terms of sensitivity and specificity, which are metrics particularly important for medical diagnosis [121,122].
Sensitivity and specificity quantify the model's performance with respect to false positives and false negatives, and the association between them is defined by the graphical representation of the ROC curve. This helps to find the optimal model and to determine the best threshold for the diagnostic test [123]. These methods are based on the consideration that a test point always falls into one of the following four categories: true positive (TP), true negative (TN), false negative (FN) and false positive (FP) [122]. The definitions are given in Table 5.1. Fig. 5.4 shows a confusion matrix from which several common metrics can be calculated.
Figure 5.4: Confusion matrix (hypothesized class Y/N versus true class p/n; cells: True Positive, False Positive, False Negative, True Negative; column totals P and N).

Table 5.1: Terms used to define sensitivity, specificity and accuracy.

Outcome of the       Condition as determined by the Standard of Truth
diagnostic test      Positive     Negative     Row total
Positive             TP           FP           TP + FP
Negative             FN           TN           FN + TN
Column total         TP + FN      FP + TN      N = TP + TN + FP + FN
The equations for calculating the sensitivity, specificity, and accuracy are given as follows.

Sensitivity = \frac{TP}{TP + FN}    (5.18)

Specificity = \frac{TN}{TN + FP}    (5.19)

Accuracy = \frac{TP + TN}{TP + FN + TN + FP}    (5.20)
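These three metrics follow directly from the four confusion-matrix counts; the sketch below (with made-up counts, not the experimental results of this chapter) is a direct Python transcription of Eqs. (5.18)-(5.20):

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy per Eqs. (5.18)-(5.20)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy

# Illustrative counts only (not taken from the experiments in this chapter):
sens, spec, acc = diagnostic_metrics(tp=40, tn=30, fp=20, fn=10)
assert (sens, spec, acc) == (0.8, 0.6, 0.7)
```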
5.5.2 The liver disease database description
The liver disorders database, supported by the BUPA Medical Research Company, is obtained from the UCI machine learning repository. The purpose of the BUPA liver disorders dataset is to predict whether a male patient has liver disorders. It includes 345 samples and 2 class labels: healthy and unhealthy (with liver disease). 200 samples of the class-1 category are of healthy persons, while the remaining 145 belong to the unhealthy class-2 category.
There are six features in the database, described in Table 5.2. The first five features are obtained from blood tests, and the last records daily alcohol consumption.

Table 5.2: Basic features for Liver Disorders.

Indices     Feature Description
mcv         Mean corpuscular volume
alkphos     Alkaline phosphatase
sgpt        Alamine aminotransferase
sgot        Aspartate aminotransferase
gammagt     Gamma-glutamyl transpeptidase
drinks      Number of half-pint equivalents of alcoholic beverages drunk per day

Table 5.3: No. of patterns in the training and testing data set.

                        No. for training    No. for testing    Total
BUPA liver disorders    242                 103                345
5.5.3 Experimentation setup and results
In the experiment, 70% of the data are randomly chosen for training while the remaining 30% are for testing, as shown in Table 5.3. Because of the sigmoid function in the synaptic layer, the variables of the input vectors are normalized to the range [0, 1.0]. The use of the sigmoid function in the output neuron results in output values in the range [0, 1]. A value of less than 0.5 is mapped to zero, while a value greater than or equal to 0.5 is mapped to one.
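A minimal sketch of this preprocessing in Python (the helper names are illustrative; the original experiments were carried out in MATLAB):

```python
def min_max_normalize(column):
    """Min-max scaling of one input feature to [0, 1], as assumed for the
    sigmoidal synaptic layer."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def binarize(output):
    """Map a soma output in [0, 1] to a class label at threshold 0.5."""
    return 1 if output >= 0.5 else 0

assert min_max_normalize([85, 90, 95]) == [0.0, 0.5, 1.0]
assert binarize(0.49) == 0 and binarize(0.5) == 1
```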
5.5.3.1 Optimal parameters setting
To determine an optimal set of parameters that meets the accuracy requirements and provides fast convergence during training, Taguchi's method is employed using orthogonal arrays. It tests only part of the possible combinations of factors and levels instead of a full factorial analysis, committing to a minimum of experimental runs while still providing a good estimation of the main factor effects over the process [124].
There are five parameters considered important in NMSN, namely k, ksoma, θsoma, m and η. The meanings of k, ksoma and θsoma were described in the equations above; m is the branch number and η is the learning rate. Within each parameter there are four levels of interest, as shown in Table 5.4.

Table 5.4: Parameter levels in NMSN.

Parameter    Levels
k            1, 3, 5, 10
ksoma        1, 3, 5, 10
θsoma        0, 0.3, 0.5, 0.9
m            5, 10, 15, 20
η            0.005, 0.01, 0.05, 0.1

An orthogonal array L16(4^5) is most suitable for this problem because it has five 4-level columns to match the needs of the matrix experiment. The L16(4^5) orthogonal array for this design problem is shown
in Table 5.5. To obtain a reliable average testing accuracy, each experiment is repeated 30 times. The number of iterations is set to 2000. As shown in Table 5.5, the best classification accuracy is acquired by the parameters of the 8th row, that is, k = 3, ksoma = 10, θsoma = 0.5, m = 10, and η = 0.005. However, supplemental experiments are needed to verify the selection of ksoma and η, whose values are located at the boundary of the considered interval (either the maximum or minimum value). To address this, two additional combinations of parameters are also considered in Table 5.5, and the results suggest that ksoma = 15 or η = 0.001 causes a degradation of performance. Therefore, the combination of parameter values k = 3, ksoma = 10, θsoma = 0.5, m = 10, and η = 0.005 is reasonable for obtaining acceptable performance, revealing to some extent the influence of the parameters on the performance of the neuron model.
To compare with BPNN more fairly, the two models need to be at the same computational scale, with a nearly equal number of weights (including the weights and thresholds of all neurons). The BPNN can be represented as a vector of dimension D containing the network weights. The vector for the MLP is defined as in Eq. (5.21).
D = (Input \times Hidden) + (Hidden \times Output) + Hidden_{bias} + Output_{bias}    (5.21)
where Input, Hidden and Output refer to the number of input, hidden and output
Table 5.5: L16(4^5) orthogonal array and factor assignment.

Expt. No.    k     ksoma    θsoma    m     η        Testing accuracy
1            1     1        0        5     0.005    41.54 ± 6.3
2            1     3        0.3      10    0.01     57.88 ± 7.8
3            1     5        0.5      15    0.05     67.50 ± 6.74
4            1     10       0.9      20    0.1      65.77 ± 8.06
5            3     1        0.3      15    0.1      60.70 ± 8.71
6            3     3        0        20    0.05     43.53 ± 6.11
7            3     5        0.9      5     0.01     69.36 ± 7.19
8            3     10       0.5      10    0.005    72.63 ± 7.24
9            5     1        0.5      20    0.01     60.77 ± 6.86
10           5     3        0.9      15    0.005    69.04 ± 6.38
11           5     5        0        10    0.1      41.60 ± 6.63
12           5     10       0.3      5     0.05     58.72 ± 7.14
13           10    1        0.9      10    0.05     63.33 ± 6.37
14           10    3        0.5      5     0.1      58.40 ± 7.34
15           10    5        0.3      20    0.005    64.49 ± 6.01
16           10    10       0        15    0.01     43.14 ± 5.9

Optimal     3     10       0.5      10    0.005    72.63 ± 7.24

Additional Expt. No.
17           3     15       0.5      10    0.005    71.30 ± 8.24
18           3     10       0.5      10    0.001    68.23 ± 6.29
neurons of BPNN respectively. Hiddenbias and Outputbias are the number of biases
in hidden and output layers [125].
If the BPNN has I inputs and L nodes in the hidden layer, the number of adjustable weights will be I × L + L + L + 1. Meanwhile, if the NMSN model has I inputs and M branches, the total number of parameters wim and θim to modify will be 2M × I. According to this rule of equivalence, we can determine the value of L and the dimension D once I and M are fixed. The values are set as shown in Table 5.6.
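The equivalence rule can be checked mechanically. The sketch below (illustrative helper names) encodes Eq. (5.21) for the BPNN and the 2M × I count for NMSN, reproducing the 120 versus 121 parameter counts of Table 5.6:

```python
def bpnn_dim(n_in, n_hidden, n_out=1):
    """Number of adjustable weights in the BPNN per Eq. (5.21):
    input-to-hidden and hidden-to-output weights plus all biases."""
    return n_in * n_hidden + n_hidden * n_out + n_hidden + n_out

def nmsn_dim(n_in, n_branch):
    """NMSN adjusts w_im and theta_im for every synapse: 2 * M * I."""
    return 2 * n_branch * n_in

# Structures of Table 5.6: 6 inputs, 10 branches versus 15 hidden nodes
assert nmsn_dim(6, 10) == 120
assert bpnn_dim(6, 15) == 121
```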
5.5.3.2 Performance comparison
For an equitable comparison, the hidden layer and output layer of the BPNN both use log-sigmoid transfer functions, and the learning rate is set to 0.005, the same as for NMSN. Three experiments are performed with different maximum iterations (1000, 2000, and 3000).

Table 5.6: Structures of NMSN and BPNN for Liver disorders dataset.

Method    No. of inputs    No. of Hidden/Branch    No. of Output    No. of Weights
NMSN      6                10                      1                120
BPNN      6                15                      1                121

Table 5.7: Classification results by NMSN and BPNN.

Epochs    Model    Testing Accuracy    Training Accuracy    Sensitivity    Specificity
1000      NMSN     65.45 ± 7.60        69.05 ± 3.51         69.0           56.5
          BPNN     52.05 ± 9.34        53.32 ± 4.36         64.5           23.8
2000      NMSN     72.63 ± 7.24        75.12 ± 3.34         92.3           53.8
          BPNN     55.00 ± 6.40        55.62 ± 5.24         54.1           53.3
3000      NMSN     72.69 ± 5.52        76.60 ± 1.65         86.5           66.7
          BPNN     55.96 ± 8.55        55.91 ± 3.56         78.8           31.6

Using the optimal set of parameters, both methods are independently
run 30 times. After the test data are classified, the average classification accuracy over the thirty runs is used to compare the performance of the two neural networks. Table 5.7 shows the classification results obtained by NMSN and BPNN; the sensitivity and specificity values are also presented.
As shown in Table 5.7, the proposed model acquired an average testing accuracy of 72.69% when run for 3000 iterations, which is much higher than the 55.96% accuracy obtained by BPNN. Moreover, NMSN is also superior to BPNN in terms of sensitivity and specificity. The higher sensitivity and specificity values indicate the ability of NMSN to identify patients who do in fact have liver disorders without giving false-positive results.
In addition, for further comparison, we tuned the parameters of BPNN to the level at which it achieved its best performance. The best accuracy of BPNN, 66.92%, was obtained with 40 hidden nodes and a learning rate of 0.1. The best performances of NMSN and BPNN are compared in Table 5.8, in which the average accuracy of NMSN is still higher than that of BPNN under both
Table 5.8: Comparison of the simulation results between NMSN and BPNN.

Method    Branches    L. Rate    Average accuracy    Sensitivity    Specificity    T-test
NMSN      10          0.005      72.69 ± 5.52        86.5           66.7           −
BPNN      15          0.005      55.96 ± 8.55        78.8           31.6           1.38E-08
BPNN      40          0.1        66.92 ± 7.65        82.1           50.0           1.50E-03
of the conditions. Moreover, the sensitivity and specificity values of NMSN are higher in general. To gauge the statistical difference between the results of NMSN and BPNN with the two sets of parameters, we conducted two-tailed t-tests, as shown in Table 5.8. From the two-tailed p-values, we find that the differences in the average solution values between each variant of BPNN and NMSN are significant, rejecting the null hypothesis (p < 0.01).
There are also many other methods for performing liver disease classification. As shown in Table 5.9, NMSN is also compared with other methods from previous research based on the BUPA liver disorders medical database. To facilitate performance comparison, five different experimental train-to-test ratios were adopted, i.e., two single-fold validation methods (40%-60%, 80%-20%) and three multi-fold cross-validation (K×CV) methods, including 4-fold CV (66.7%-33.3%, ×4), 5-fold CV (80%-20%, ×5) and 10-fold CV (90%-10%, ×10). Here the train-to-test ratio denotes
the ratio between the number of samples for training and for testing. With K×CV
(K=4, 5, or 10), the whole data set is randomly divided into K mutually exclu-
sive subsets with approximately equal number of samples. In K×CV, the method is
trained on the training subsets, and the testing error is measured by testing it on the
testing subset. The procedure is repeated for a total of K trials, each time using a
different subset for testing. The performance of the model is assessed by averaging
the squared error under testing over all the trails of the experiment. Compared with
single-fold validation method, the K×CV has advantages that it could minimize bias
associated with random sampling of training samples [126] while has disadvantages
that it may require an excessive amount of computation since the model has to be
64
trained K times [127]. The results of NMSN using the above five kinds of train-to-test
ratios are summarized in Table 5.9, where these results were averaged testing accu-
racies during 30 independent runs based on the same optimal parameters in Table
5.5. From Table 5.9, it is clear that NMSN performs better than the other compared
methods in terms of the accuracy.
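For reference, the K×CV partitioning described above can be sketched as follows in Python (an illustrative implementation, not the code used in the reported experiments); it splits the sample indices into K mutually exclusive folds so that every sample is tested exactly once:

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Randomly partition sample indices into k roughly equal, mutually
    exclusive folds; yield (train_idx, test_idx) pairs, one per trial."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test

# 10-fold CV over the 345 BUPA samples: each sample is tested exactly once
tested = [i for _, test in k_fold_splits(345, 10) for i in test]
assert sorted(tested) == list(range(345))
```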
5.5.3.3 Convergence properties
In this section, we compare the convergence of the two models, NMSN and BPNN, with respect to the number of iterations. Fig. 5.5 illustrates the training error convergence curves on the BUPA liver disorders dataset when the number of dendritic branches of NMSN is 10 and the number of hidden nodes of BPNN is 15 or 40. The training error is computed as the mean square error (MSE) in Eq. (5.22).
MSE = \frac{1}{R} \sum_{a=1}^{R} \left[ \frac{1}{2} \sum_{b=1}^{S} (E_{ab} - O_{ab})^2 \right]    (5.22)
where Eab and Oab are the desired output and the network output, respectively, S is the number of patterns and R is the number of simulation repetitions. Here each pattern denotes one of the data samples of the BUPA liver disorders dataset, and thus S = 242 in the case of Table 5.3. Each simulation is an independent run of the compared methods, and R = 30 is set in the experiment. The results in Fig. 5.5 are obtained with the user-defined parameters set as follows: the optimal parameters shown in Table 5.5 are used for NMSN; the learning rate is set to 0.1 for BPNN-40 and 0.005 for BPNN-15. As observed in Fig. 5.5, NMSN achieves a lower training error and a better convergence rate than the BPNNs under both conditions.
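Eq. (5.22) averages, over the R independent runs, the per-run half sum of squared output errors across the S patterns; a direct Python transcription (with toy numbers, not the experimental data) reads:

```python
def mse(desired, actual):
    """Training error per Eq. (5.22): mean over the R runs of the per-run
    half sum of squared errors across the S training patterns."""
    R = len(desired)
    total = 0.0
    for E_run, O_run in zip(desired, actual):
        total += 0.5 * sum((e - o) ** 2 for e, o in zip(E_run, O_run))
    return total / R

# Two runs, three patterns each (toy numbers, not experimental data);
# each run contributes 0.5 * 1.0, so the mean is 0.5.
desired = [[1.0, 0.0, 1.0], [1.0, 0.0, 1.0]]
actual  = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
assert mse(desired, actual) == 0.5
```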
5.5.3.4 ROC analysis
To compare the classification performance of the proposed model NMSN with that of BPNN, the receiver operating characteristic (ROC) curve is preferred as a graphical plot demonstrating the quality of classifiers. It is a reliable technique to analyze the
Figure 5.5: Comparison of convergence speed of NMSN and BPNN (training error vs. iteration for NMSN, BPNN−40 and BPNN−15).
performance of algorithms in classification problems, showing the true positive rate (sensitivity) against the false positive rate (1 − specificity). The ROC curves of both classifiers are shown in Fig. 5.6. AUC is the area under the ROC curve; because it is a portion of the area of the unit square, its value lies between 0.0 and 1.0 [36]. An AUC value of 1.0 means the classifier has perfect discrimination to classify the liver disorders correctly, whereas a value of 0.5 is equivalent to a random model [134].
AUC is calculated as follows:

AUC(\%) = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right) \times 100    (5.23)
According to this method, the AUC is computed for both classifiers as shown in Fig. 5.7, where the AUC values are 0.7660 for NMSN, and 0.5520 and 0.6605 for BPNN, suggesting that NMSN is superior to BPNN for classifying the liver disorders.
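Eq. (5.23) is the balanced-accuracy form of the AUC, i.e., the mean of sensitivity and specificity expressed in percent; a short Python sketch with made-up counts:

```python
def auc_percent(tp, tn, fp, fn):
    """AUC per Eq. (5.23): mean of sensitivity and specificity, in percent."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp)) * 100.0

# Illustrative counts only: sensitivity 0.8 and specificity 0.6 give 70%
assert abs(auc_percent(tp=40, tn=30, fp=20, fn=10) - 70.0) < 1e-9
```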
5.5.4 The final synaptic and dendritic morphology
As mentioned above, NMSN possesses structural plasticity mechanisms in synapses and dendrites that support classifying the liver disorders dataset. The computation in the neuron is performed as a combination of dimensional reduction and nonlinearity, with synaptic pruning and dendritic pruning mechanisms that can remove useless
Figure 5.6: The ROC curves (true positive rate vs. false positive rate) of (a) NMSN, (b) BPNN-15, and (c) BPNN-40.
synapses and dendrites during learning, forming a distinct synaptic and dendritic
morphology for the purpose of improving the efficiency of the neurological system.
An input is connected to a branch by a direct connection, an inverse connection, a constant-0 connection, or a constant-1 connection. We simplify the
structure of the dendrites according to the pruning mechanisms: a synapse with a constant-1 connection can be completely omitted, and a dendritic layer containing a constant-0 connection should be removed. Through the final synaptic and dendritic morphology, we can identify the underlying causes of the disorder and reduce the number of inputs, saving diagnosis time while achieving high classification accuracy for liver disease.
Fig. 5.8(a) shows the specific dendritic morphology that yields the best performance before learning. Fig. 5.8(b) shows the corresponding morphology after learning.

Figure 5.7: The AUC values of NMSN (0.766), BPNN-15 (0.552) and BPNN-40 (0.6605).

In Fig. 5.8(b), a dedicated symbol marks each branch that can be removed. Fig. 5.8(c) shows all
branches are deleted except branches 1, 3 and 7. The final synaptic and dendritic
structure is obtained in Fig. 5.8(d) with all constant-1 inputs neglected. The final
chosen features are x2, x3, x4, x5 and x6, while x1 can be removed. That is to say, the six inputs are reduced to five, suggesting that the first feature for liver disorders (i.e., mean corpuscular volume) is less important among the underlying causes of the disorder.
Finally, it is worth pointing out that the simplified dendritic morphology can form
an approximate logic circuit, which is suitable for a simple hardware implementation
in practice [135].
5.6 Conclusions and Remarks
In this study, a single neuron model with synaptic nonlinearities (NMSN) in a den-
dritic tree was proposed for carrying out the liver disorders classification. The compu-
tational capacity of the single neuron model NMSN was realized by the combination
of dimensional reduction and nonlinearity. The nonlinearity computation was derived
from the multi-layer architecture of the dendritic neuronal tree, while the dimensional
reduction originated from the specific neuron-pruning function in the synaptic layer.
The performance of NMSN was verified on the liver disease diagnostic problem. Experimental results suggested that NMSN was superior to the traditional BPNN with a similar computational architecture (denoted as BPNN-15) and to the one with the best performance (namely BPNN-400), in terms of classification accuracy, convergence properties, and the AUC criterion. In addition, NMSN produced better or competitive solutions compared with a number of previously proposed methods, such as SVM, C4.5, classification trees, KNN, neuro-fuzzy models, etc.
It is worth emphasizing that NMSN has a distinct ability of pattern extraction through the pruning function, which is a metaphor of neuronal morphology. By learning a larger-than-necessary initial network and thereafter screening out the useless synapses and unnecessary dendrites, NMSN finally produces a neuron with the minimum necessary dendritic morphology. The resultant neuron not only possesses a significantly higher computational capacity than the traditional McCulloch-Pitts linear neuron model, which is incapable of solving even the simple 3-bit parity problem, but also suggests a possible information processing mechanism of neuronal morphology and plasticity. These findings might also give some insights into the development of new techniques for understanding the mechanisms and construction of single neurons.
In the future, we plan to apply the proposed NMSN to other classification problems to further verify its performance. A theoretical convergence analysis of the gradient descent training method used in NMSN will also be carried out. In addition, global training methods such as the differential evolution algorithm or particle swarm optimization will be utilized to improve the training results of NMSN.
Table 5.9: Classification accuracies for the BUPA Liver Disorders problem obtained by other methods in the literature.

Author (year)                     Method (train-to-test ratio)                               Accuracy (%)
Pham et al. (2000) [128]          RULES-4 (40%-60%)                                          55.90
Cheung (2001) [129]               Naive Bayes (5×CV)                                         63.39
Cheung (2001) [129]               C4.5 (5×CV)                                                65.59
Cheung (2001) [129]               BNND (5×CV)                                                61.83
Cheung (2001) [129]               BNNF (5×CV)                                                61.42
Van Gestel et al. (2002) [130]    Support Vector Machine (SVM) with GP (10×CV)               69.70
Yeow (2006) [131]                 Classification tree (10×CV)                                53.90
Yeow (2006) [131]                 KNN (10×CV)                                                50.16
Jeatrakul and Wong (2009) [100]   RBFNN (80%-20%)                                            67.54
Jeatrakul and Wong (2009) [100]   CMTNN (80%-20%)                                            70.29
Bahramirad et al. (2013) [120]    SVM                                                        69.23
Kulkarni and Shinde (2013) [132]  Neuro-fuzzy model (80%-20%)                                58.90
Kulkarni and Shinde (2013) [132]  Neuro-fuzzy with Gaussian membership function (80%-20%)    67.27
Ubaidillah et al. (2014) [133]    SVM                                                        63.11
Zhang et al. (2014) [101]         SOCPNN (4×CV)                                              66.78
Zhang et al. (2014) [101]         MOCPNN (4×CV)                                              66.78
Seera and Lim (2014) [102]        Min-Max neural network (5×CV)                              67.25
Seera and Lim (2014) [102]        Min-Max neural network (10×CV)                             66.13
Our method (2015)                 NMSN (80%-20%)                                             73.15
Our method (2015)                 NMSN (40%-60%)                                             69.47
Our method (2015)                 NMSN (4×CV)                                                71.04
Our method (2015)                 NMSN (5×CV)                                                72.78
Our method (2015)                 NMSN (10×CV)                                               72.43
Figure 5.8: The evolution of the neuronal morphology: (a) the initial morphology with fifteen dendrites and inputs x1-x6; (b) the morphology after learning; (c) the pruned morphology in which only dendrites 1, 3, and 7 remain; (d) the final simplified structure with inputs x2-x6 and all constant-1 connections omitted.
Chapter 6
Evolutionary Model: Chaotic Gravitational Search
6.1 Introduction
Gravitational search algorithm (GSA) [136] is one of the newest heuristic optimization methods based on the Newtonian laws of gravity and motion. It has shown remarkable search abilities in solving optimization problems [137] within high-dimensional search spaces. In GSA, a set of candidate solutions is maintained as a group of objects. At each iteration, the objects update their solutions by moving stochastically. Objects with heavier masses attract other objects more strongly and move more slowly than objects with lighter masses. As the iterations proceed, all other objects tend to move towards the heaviest object, which corresponds to the best solution of the optimization problem. The robustness, adaptability, and simplicity of GSA make it applicable to a wide range of function optimization problems [138]. However, GSA still suffers from the inherent disadvantages of being trapped in local minima and slow convergence rates, which reduce the solution quality.
To resolve the aforementioned problems, chaos, which exhibits randomicity, ergodicity, and regularity, was incorporated into GSA [139]. Chaos is a very common phenomenon in nonlinear systems and has recently attracted much interest. In the field of optimal design, the ergodicity of chaos has been viewed as an optimization mechanism for avoiding being stuck in a local search process. The chaotic state was introduced into the optimization variables, and the search was performed using the chaos variables [140]. Meanwhile, various chaos optimization algorithms for solving complex optimization problems were put forward [141, 142]. A chaos-based search has stronger exploration and exploitation capabilities and can enable an algorithm to effectively jump out of local extrema owing to the inherent ergodicity of chaos. It has been demonstrated that combining GSA with a chaotic system can alleviate the shortcomings of GSA, which highlights the advantages of using chaotic systems [139, 143].
There are two ways to combine GSA with chaos. One uses chaotic maps to generate chaotic sequences that substitute for random sequences, while the other employs chaos as a local search approach. In our previous work, the logistic map was utilized to generate chaotic sequences and perform the local search [139]. In this study, four other chaotic maps, namely the piecewise linear chaotic map, the Gauss map, the sinusoidal map, and the sinus map, are combined with GSA. It is apparent that different chaotic maps possess distinct distribution characteristics. The objective of this work is not only to find out which chaotic map improves the performance of GSA the most, but also to give some insights into the underlying reasons. To this end, six commonly used benchmark optimization functions are chosen from the literature. The experimental results verify that all five incorporated chaotic maps can improve the performance of GSA in terms of solution quality and convergence speed. In addition, the four newly incorporated chaotic maps improve the performance of GSA more than the logistic map does, suggesting that the hybrid searching dynamics of CGSA are significantly affected by the distribution characteristics of the chaotic maps. Furthermore, the simulation results also show that the performance of CGSA is tightly related to the search dynamics resulting from the interaction between the incorporated chaotic map and the landscape of the solved problems.
The rest of this chapter is organized as follows: Section 6.2 presents a brief description of GSA. The five chaotic maps used in the chaotic local search procedure are introduced in Section 6.3. In Section 6.4, the chaotic gravitational search algorithms using the five different maps are proposed. Section 6.5 gives the experimental results of the five variants of CGSA on the six benchmark optimization functions. Finally, some general remarks conclude the chapter.
6.2 Overview of GSA
GSA is a stochastic search algorithm introduced by Rashedi et al. [136]. It is a global search strategy that can efficiently handle arbitrary optimization problems, and it is based on the Newtonian laws of gravity and motion. In GSA, agents are considered as objects whose performance is measured by their masses. All of these objects attract each other by the force of gravity [144, 145], and this force causes a global movement of all objects towards the objects with heavier masses. Hence, the objects cooperate with each other using a direct form of communication through the gravitational force. The heavier masses (which correspond to good solutions) move more slowly than the lighter ones, which guarantees the exploitation ability of the algorithm to find the optima around a good solution. Consider a system with N agents (objects); we define the position of the ith agent by:
X_i = (x_i^1, x_i^2, ..., x_i^d, ..., x_i^n),    i = 1, 2, ..., N        (6.1)

where x_i^d is the position of the ith agent in the dth dimension, and n is the dimension of the search space.
At the tth iteration, the gravitational force acting on the ith object from the jth object is represented as follows [136]:

F_ij^d(t) = G(t) · [M_j(t) M_i(t) / (R_ij(t) + ε)] · (x_j^d(t) − x_i^d(t))        (6.2)

where M_i and M_j are the masses of the agents, G(t) is the gravitational constant at time t, ε is a very small constant, and R_ij(t) indicates the Euclidean distance between the two agents i and j:

R_ij(t) = ||X_i(t), X_j(t)||_2        (6.3)
The gravitational constant G(t) is initialized at the beginning of the iterations and is reduced with time to control the search accuracy. G(t) is given by [146]:

G(t) = G_0 · e^(−α t / iter_max)        (6.4)

where G_0 is the initial value, α is a user-defined parameter, and iter_max is the maximum number of iterations.
The total force acting on the ith agent is given by:

F_i^d(t) = Σ_{j ∈ Kbest, j ≠ i} rand_j · F_ij^d(t)        (6.5)

where Kbest is the set of the first K agents with the best fitness (i.e., the biggest masses). K is a function of time that decreases linearly with the iterations [136]; at the end of the iterations its value becomes 2% of the initial number of agents. rand_j is a random number in the interval [0, 1]. Hence, by the law of motion, the acceleration a_i^d(t) of agent i at time t in the dth dimension is given by:

a_i^d(t) = F_i^d(t) / M_i(t)        (6.6)
where M_i(t) is calculated through the map of fitness defined as follows:

M_i(t) = m_i(t) / Σ_{j=1}^{N} m_j(t)        (6.7)

and

m_i(t) = (fit_i(t) − worst(t)) / (best(t) − worst(t))        (6.8)

where best(t) is the best fitness of all agents, worst(t) is the worst fitness of all agents, and fit_i(t) represents the fitness of agent i obtained by evaluating the objective function.
The new velocity of an agent is a fraction of its current velocity added to its acceleration. Thus, the velocity and position of the ith agent at the tth iteration in the dth dimension are calculated as follows:

v_i^d(t+1) = rand_i × v_i^d(t) + a_i^d(t)        (6.9)

x_i^d(t+1) = x_i^d(t) + v_i^d(t+1)        (6.10)

where rand_i is a uniform random variable in the interval [0, 1], which gives a randomized characteristic to the search. The pseudocode of GSA is given in the following.
Traditional Gravitational Search Algorithm
  for all agents i (i = 1, 2, ..., N) do
      initialize position x_i randomly in the search space
  end-for
  while termination criteria not satisfied do
      for all agents i do
          compute the overall force F_i^d(t) according to Eqs. (6.2)-(6.5)
          compute the acceleration a_i^d(t) according to Eq. (6.6)
          compute the current velocity according to Eq. (6.9)
          compute the current position according to Eq. (6.10)
      end-for
  end-while
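The update step of Eqs. (6.2)-(6.10) can be sketched in a few lines of NumPy. This is a minimal sketch assuming a minimization problem; the parameter defaults (G0 = 100, α = 20, ε = 1e-12) and the small numerical guards against division by zero are illustrative assumptions, not the exact thesis settings.

```python
import numpy as np

def gsa_step(X, V, fitness, t, iter_max, G0=100.0, alpha=20.0, eps=1e-12):
    """One GSA iteration (minimization). X, V: (N, n) arrays; fitness: (N,)."""
    N, n = X.shape
    G = G0 * np.exp(-alpha * t / iter_max)              # Eq. (6.4)
    best, worst = fitness.min(), fitness.max()
    m = (fitness - worst) / (best - worst + 1e-300)     # Eq. (6.8)
    M = m / (m.sum() + 1e-300)                          # Eq. (6.7)
    # Kbest shrinks linearly from N down to 2% of N over the run.
    K = max(1, int(round(N - (N - 0.02 * N) * t / iter_max)))
    kbest = np.argsort(fitness)[:K]                     # indices of best agents
    F = np.zeros_like(X)
    for i in range(N):
        for j in kbest:
            if j == i:
                continue
            R = np.linalg.norm(X[i] - X[j])             # Eq. (6.3)
            # Eqs. (6.2) and (6.5): randomly weighted pairwise forces.
            F[i] += np.random.rand() * G * M[i] * M[j] / (R + eps) * (X[j] - X[i])
    a = F / (M[:, None] + 1e-300)                       # Eq. (6.6)
    V = np.random.rand(N, 1) * V + a                    # Eq. (6.9)
    return X + V, V                                     # Eq. (6.10)
```

In a full run, this step is repeated for iter_max iterations and the fitness array is re-evaluated from the objective function after every position update.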
The main features of GSA are listed as follows:
(1) An object with a heavier mass exerts a stronger attractive force and moves more slowly than a lighter agent.
(2) The gravitational constant decreases with time to give the search better accuracy.
(3) The acceleration of an agent is determined by the total force, which is inversely proportional to the distance between two agents.
(4) The next position of an agent depends only on its current velocity and current position.
(5) GSA is a nearly memory-less algorithm and requires only a small memory capacity from the hardware.
6.3 Chaotic maps
Recently chaos, a kind of dynamic behavior of nonlinear systems, has aroused much interest in different scientific fields such as chaos control, pattern recognition, and optimization theory [147]. In this section, the five chaotic maps are introduced.
6.3.1 Logistic map
The logistic map is a polynomial mapping that is often cited as an archetypal example of how complex, chaotic behavior can arise from very simple nonlinear dynamical equations. The map was popularized in a seminal paper by the biologist Robert May [148], in part as a discrete-time demographic model analogous to the logistic equation. The equation of this map appears in the nonlinear dynamics of biological populations evidencing chaotic behavior. Its mathematical expression is given by Eq. (6.11):

x_{k+1} = a · x_k · (1 − x_k),    k = 1, 2, ..., N        (6.11)

where x_k is the kth chaotic number, k represents the iteration number, and a is usually set to 4. The initial number satisfies x_0 ∈ [0, 1] and x_0 ∉ {0.0, 0.25, 0.5, 0.75, 1.0}.
logistic map is combined with GSA, the hybrid algorithm is labeled as CGSA1.
6.3.2 Piecewise linear chaotic map
The piecewise linear chaotic map (PWLCM) has attracted more and more attention in chaos research recently for its simplicity of representation and its good dynamical behavior. PWLCM is known to be ergodic and to have a uniform invariant density function on its definition interval [149]. The simplest PWLCM is defined in Eq. (6.12):

x_{k+1} = x_k / p,               if x_k ∈ (0, p)
x_{k+1} = (1 − x_k) / (1 − p),   if x_k ∈ [p, 1)        (6.12)
In the experiment, p is set to be 0.7. When PWLCM is combined with GSA, the
hybrid algorithm is labeled as CGSA2.
6.3.3 Gauss map
The Gauss map can be defined for hypersurfaces in R^n as a map from the hypersurface to the unit sphere S^{n−1} ⊂ R^n. Its iteration equation is defined by [143, 150]:

x_{k+1} = 0,                 if x_k = 0
x_{k+1} = (μ / x_k) mod 1,   otherwise        (6.13)
where μ is set to 1. When the Gauss map is combined with GSA, the hybrid algorithm is labeled as CGSA3.
6.3.4 Sinusoidal map
The following equation defines the sinusoidal map [148]:

x_{k+1} = a · x_k^2 · sin(π x_k)        (6.14)

For a = 2.3 and x_0 = 0.7, it has the following simplified form:

x_{k+1} = sin(π x_k)        (6.15)
When the sinusoidal map is combined with GSA, the hybrid algorithm is labeled as
CGSA4.
6.3.5 Sinus map
The sinus map [151, 152] is defined as

x_{k+1} = 2.3 · x_k^2 · sin(π x_k)        (6.16)
When the sinus map is combined with GSA, the hybrid algorithm is labeled as
CGSA5.
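The five maps of Eqs. (6.11)-(6.16) can be sketched directly as iteration functions; the parameter values (a = 4, p = 0.7, μ = 1) follow the text, and the function names are just convenient labels.

```python
import math

def logistic(x, a=4.0):        # Eq. (6.11), used in CGSA1
    return a * x * (1.0 - x)

def pwlcm(x, p=0.7):           # Eq. (6.12), used in CGSA2
    return x / p if x < p else (1.0 - x) / (1.0 - p)

def gauss(x, mu=1.0):          # Eq. (6.13), used in CGSA3
    return 0.0 if x == 0.0 else (mu / x) % 1.0

def sinusoidal(x):             # Eq. (6.15), used in CGSA4 (a = 2.3, x0 = 0.7)
    return math.sin(math.pi * x)

def sinus(x):                  # Eq. (6.16), used in CGSA5
    return 2.3 * x * x * math.sin(math.pi * x)

# Generate a short chaotic sequence starting from x0 = 0.74, as in Fig. 6.1.
x, seq = 0.74, []
for _ in range(5):
    x = logistic(x)
    seq.append(round(x, 4))
print(seq)
```

Iterating any of these functions from a valid starting point yields the chaotic sequence that is substituted for random numbers in the local search.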
To illustrate the details of the chaos, the distributions of x for all five chaotic maps are given in Fig. 6.1. The dynamic ranges of the five chaotic maps are summarized as follows: [0, 1] for the logistic, PWLCM, and Gauss maps; [0, 0.92] for the sinusoidal map; and [0, +∞) for the sinus map. It is worth pointing out that: 1) for all the values of x, we take two digits after the decimal point for convenience of illustration; 2) the distributions of x for PWLCM and the Gauss map are flatter than those of the other three maps, which suggests that the probabilities of x visiting the values in [0, 1] are nearly the same; and 3) only the values in [0, 1] are utilized in the chaotic local search. Although Xiang et al. [153] argued that a flat distribution of x performs better than a rough distribution when applied in chaotic search, they only gave simulation results comparing PWLCM with the logistic map. It is reasonable to expect that the performance of a chaotic search is related not only to the distribution of the chaos but also to the landscape of the optimization function. More evidence can be found in Section 6.5.
6.4 Chaotic gravitational search algorithm
Exploiting properties of chaos such as ergodicity, iteration-based searching algorithms called chaos optimization algorithms (COA) have been presented [141, 142]. It is easier for COA to escape from local minimum points than for traditional stochastic optimization algorithms. Owing to its ergodicity, a chaotic system passes through all the states of the phase space, following its own movement rule from an initial state. Thanks to this property, the search can traverse the neighborhood of the current optimal solution many times. Compared with a random local search, a chaotic local search can alleviate the blindness and randomness of the search process, so that better solutions near the current optimal solutions can be reached more effectively. The general steps of the chaotic local search (CLS) are given as follows:
Figure 6.1: The distribution of x under certain system parameters in 20000 iterations when x_0 = 0.74: (a) logistic map; (b) PWLCM; (c) Gauss map; (d) sinusoidal map; (e) sinus map.
Chaotic Local Search Algorithm
  step 1.  Set the parameters of the chaotic system and the number of chaotic search steps L
  step 2.  According to the chaotic system, generate a chaotic sequence of length N
  step 3.  Choose the best individual v_c in the current population
  step 4.  Set the chaotic search counter t to 0
  step 5.  while (t < L)
  step 6.      Superimpose an item of the chaotic sequence on v_c in any dimension to form a new individual v_n
  step 7.      Calculate the fitness value of the new individual v_n
  step 8.      Compute the current velocity according to Eq. (6.9)
  step 9.      for the optimization function f
  step 10.         if f(v_c) > f(v_n)
  step 11.             v_c ← v_n
  step 12.         end-if
  step 13.     t = t + 1
  step 14. end-while
Note that the search neighborhood of X_g is constructed as a hypercube centered at X_g with radius r, where the radius is shrunk at each iteration by r = ρ × r. The constant ρ is set to 0.978.
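The steps above can be sketched in Python. The mapping of the chaotic value into the symmetric perturbation [−r, r] and the greedy acceptance rule are assumptions based on the description, and `chaotic_map` stands for any of the maps of Section 6.3 (the logistic map is used in the example).

```python
import random

def chaotic_local_search(vc, f, chaotic_map, x0=0.74, L=20, r=1.0):
    """Greedily perturb the best agent vc along one random dimension per step."""
    x = x0
    for _ in range(L):
        x = chaotic_map(x)                    # next chaotic number in [0, 1]
        d = random.randrange(len(vc))         # pick one dimension at random
        vn = list(vc)
        vn[d] = vc[d] + r * (2.0 * x - 1.0)   # superimpose chaos within [-r, r]
        if f(vn) < f(vc):                     # keep the new point only if better
            vc = vn
    return vc

# Example: refine a point on the sphere function with the logistic map.
logistic = lambda x: 4.0 * x * (1.0 - x)
sphere = lambda v: sum(t * t for t in v)
result = chaotic_local_search([1.0, 1.0], sphere, logistic)
print(sphere(result) <= sphere([1.0, 1.0]))  # True: greedy search never worsens
```

Because the acceptance is greedy, the returned point is never worse than the starting point, which matches the role of CLS as a refinement of the global best agent.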
Based on GSA and the chaotic local search, an improved chaotic gravitational search algorithm is proposed here. It should be noticed that the local search is applied only to the current global best agent X_g obtained from GSA. Compared with carrying out the local search on all agents, this scheme is expected not only to save computational time but also to produce competitively good solutions. The procedure of CGSA is described in the following.
Chaotic Gravitational Search Algorithm
  for all agents i (i = 1, 2, ..., N) do
      initialize position x_i randomly in the search space
  end-for
  while termination criteria not satisfied do
      for all agents i do
          compute the overall force F_i^d(t) according to Eqs. (6.2)-(6.5)
          compute the acceleration a_i^d(t) according to Eq. (6.6)
          update the velocity according to Eq. (6.9)
          update the position according to Eq. (6.10)
      end-for
      find the global best agent X_g
      implement the chaotic local search (CLS)
      decrease the chaotic local search radius using r = ρ × r
  end-while
6.5 Numerical simulation
6.5.1 Experimental setup
To evaluate the performance of the proposed algorithms, the six benchmark optimization problems in Table 6.1 are used, where functions f1-f3 are unimodal, while functions f4-f6 are multimodal with plenty of local minima, whose number increases exponentially with the dimension of the function. The population size of all constructed algorithms is set to 50. The maximum iteration number is 1000 in each run. In order to eliminate stochastic discrepancy and to give a statistical analysis, each algorithm is repeated 30 times. The constants ε, α and G0 are set to 1000, 1.0E-100 and 100, respectively. The experiments are conducted in Microsoft Visual Studio 2010 on a personal PC.
6.5.2 Results and discussions
We first compared the performance of GSA, CGSA1, CGSA2, CGSA3, CGSA4, and CGSA5. Tables 6.2 to 6.7 record the minimum, maximum, and average fitness of each algorithm on the benchmark functions, respectively. From these tables, it is clear that all chaotic GSAs perform better than GSA, suggesting that chaotic search, as a local search approach, is able to enhance the global search capacity of the algorithm and prevent the search from sticking to a local solution. Moreover, the average fitness of the best-so-far solutions found by CGSA3 and CGSA4 is better than that of CGSA1 for all six functions, which indicates that the Gauss map and the sinusoidal map possess better searching performance than the logistic map used in [139]. Thus, it is evident that the searching dynamics of GSA are definitely affected by the distribution characteristics of the chaos, and that the famous logistic map might not be the best choice for many optimization problems.
In order to analyze the final best-so-far solutions in detail, a box-and-whisker diagram is given in Fig. 6.2. The vertical axis indicates the fitness values of the final solutions and the horizontal axis represents the six algorithms. From Fig. 6.2, it is apparent that CGSA2, CGSA3, and CGSA4 generate better solutions than CGSA1 in terms of not only the maximum, average, and minimum values but also the lower quartile, median, and upper quartile of the final best-so-far solutions for f2-f4 and f6. CGSA5 outperforms CGSA1 on f2, f3, f4, and f6. In particular, CGSA5 produces significantly better solutions than the other algorithms on f4. The reason appears to be the distinct distribution characteristics of the sinus map, where most of the chaotic values are located around 0.4.

Figure 6.2: Statistical values of the final best-so-far solutions obtained by the six algorithms: (a) f1; (b) f2; (c) f3; (d) f4; (e) f5; (f) f6.

To sum up, it can be concluded that: 1)
hybridization of GSA with chaos is demonstrated to be an essential aspect of the high performance; 2) the four newly incorporated chaotic maps generally exhibit a better influence on improving the performance of GSA than the logistic map; and 3) no specific chaotic map enables GSA to achieve the best solution for all optimization problems, suggesting that the performance of the hybrid CGSAs is related not only to the search capacity of the algorithm but also to the landscape of the solved problems.
To give some insight into how the chaotic local search affects the search dynamics of GSA, the convergence trendlines of functions f2, f3, f4, and f6 obtained by the six algorithms are given in Fig. 6.3. In this figure, the horizontal axis represents the iteration and the vertical axis denotes the average fitness of the best-so-far solutions on a logarithmic scale. The convergence graphs of the last 100 iterations are embedded to show the differences more clearly.

Figure 6.3: The average fitness trendlines of the best-so-far solutions found by the six algorithms: (a) f2; (b) f3; (c) f4; (d) f6.

It is difficult to distinguish the convergence
graphs of the six algorithms on f1, since the algorithms have a quite quick convergence speed that is mainly driven by GSA rather than by the chaotic local search. The search behaviors of the algorithms on the multimodal functions f4 and f6 are quite illuminating. CGSA3, CGSA4, and CGSA5 exhibit a much faster convergence speed than the other algorithms on the multimodal functions, suggesting that the Gauss map, the sinusoidal map, and the sinus map might be more suitable for helping the algorithms jump out of local solutions.
Furthermore, we define the ratio of the best-so-far solutions found by the five chaotic variants to those found by GSA versus the iteration. Let AF_GSA, AF_CGSA1, AF_CGSA2, AF_CGSA3, AF_CGSA4, and AF_CGSA5 represent the average fitness of the best-so-far solutions found by GSA, CGSA1, CGSA2, CGSA3, CGSA4, and CGSA5, respectively. The ratio is defined as follows:

Ra = AF_candidate / AF_GSA        (6.17)
Figure 6.4: The ratio of the best-so-far solutions found by the six algorithms: (a) f2; (b) f3; (c) f4; (d) f6.
where the candidate is one of the CGSAs. Fig. 6.4 depicts the ratios of the algorithms versus the iteration, where the values of the solutions found by GSA are taken as the basis, thus forming a horizontal line in the figure. For Fig. 6.4(a), (b), and (d), values above this line indicate solutions worse than those found by GSA, while values below the line denote better ones. For Fig. 6.4(c), the inverse holds, since the solution values are negative. From Fig. 6.4, it is clear that the chaotic GSAs significantly outperform GSA on f3, f4, and f6. On the other hand, the chaotic GSAs still have the capacity to jump out of local minima in the later search phases, as can be observed from the subfigure of Fig. 6.4(a), although there they only produce slightly better solutions than GSA.
6.6 Conclusion
In this chapter, improved gravitational search algorithms (CGSA) using five different chaotic maps were presented for global optimization. The chaotic maps were utilized to carry out the chaotic local search inserted into GSA, so that the resulting hybrid algorithm alternates between the chaotic search and GSA. Experimental results indicated that the chaotic search can directly improve the current solution found by GSA, leading to a faster convergence speed and a higher probability of jumping out of local optima.

Moreover, the distribution characteristics of the five chaotic maps were also observed. The results suggested that the four chaotic maps newly introduced in this chapter generally exhibit a better influence on improving the performance of GSA than the logistic map. Nevertheless, no specific chaotic map enables GSA to achieve the best solution for all optimization problems, suggesting that the performance of the hybrid CGSAs is related not only to the search capacity of the algorithm but also to the landscape of the solved problems. In the future, we plan to adaptively use multiple chaotic maps simultaneously in the chaotic search to construct a more powerful CGSA and to analyze the search dynamics of the algorithm.
Table 6.1: The function name, definition, dimension, feasible interval of variables, and the known global minimum of the six benchmark functions.

Function name   Definition                                                                   Dim  Interval      Global minimum
Sphere          f1(X) = Σ_{i=1}^{n} x_i^2                                                    30   [-100, 100]   0.0
Schwefel 2.22   f2(X) = Σ_{i=1}^{n} |x_i| + Π_{i=1}^{n} |x_i|                                30   [-10, 10]     0.0
Rosenbrock      f3(X) = Σ_{i=1}^{n-1} [100(x_{i+1} − x_i^2)^2 + (x_i − 1)^2]                 30   [-30, 30]     0.0
Schwefel 2.26   f4(X) = −Σ_{i=1}^{n} x_i sin(√|x_i|)                                         30   [-500, 500]   -418.9829D
Ackley          f5(X) = −20 exp(−0.2 √((1/n) Σ_{i=1}^{n} x_i^2))
                        − exp((1/n) Σ_{i=1}^{n} cos(2π x_i)) + 20 + e                        30   [-32, 32]     0.0
Griewank        f6(X) = (1/4000) Σ_{i=1}^{n} x_i^2 − Π_{i=1}^{n} cos(x_i/√i) + 1             30   [-600, 600]   0.0
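The six benchmark functions of Table 6.1 can be written down directly; the sketch below follows the standard definitions of these functions.

```python
import math

def sphere(x):         # f1
    return sum(v * v for v in x)

def schwefel_222(x):   # f2
    s, p = 0.0, 1.0
    for v in x:
        s += abs(v)
        p *= abs(v)
    return s + p

def rosenbrock(x):     # f3
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2
               for i in range(len(x) - 1))

def schwefel_226(x):   # f4, global minimum -418.9829 * D
    return -sum(v * math.sin(math.sqrt(abs(v))) for v in x)

def ackley(x):         # f5
    n = len(x)
    return (-20.0 * math.exp(-0.2 * math.sqrt(sum(v * v for v in x) / n))
            - math.exp(sum(math.cos(2.0 * math.pi * v) for v in x) / n)
            + 20.0 + math.e)

def griewank(x):       # f6
    s = sum(v * v for v in x) / 4000.0
    p = 1.0
    for i, v in enumerate(x, 1):
        p *= math.cos(v / math.sqrt(i))
    return s - p + 1.0

print(sphere([0.0] * 30), griewank([0.0] * 30))  # both are 0.0 at the global minimum
```

Note that f1-f3 and f5-f6 attain their global minimum 0.0 at the origin (f3 at the all-ones point), while f4 attains its minimum away from the origin, which is why its best-so-far values in the experiments are negative.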
Table 6.2: Statistical results of different methods for the Sphere function (f1).
Method   Minimum fitness   Maximum fitness   Average fitness
GSA      1.21E-17          3.25E-17          2.08E-17
CGSA-1   1.38E-17          3.11E-17          2.11E-17
CGSA-2   8.18E-18          3.50E-17          1.99E-17
CGSA-3   1.11E-17          3.69E-17          2.01E-17
CGSA-4   1.10E-17          4.05E-17          1.98E-17
CGSA-5   1.38E-17          3.65E-17          2.19E-17

Table 6.3: Statistical results of different methods for the Schwefel function (f2).
Method   Minimum fitness   Maximum fitness   Average fitness
GSA      1.44E-8           3.11E-8           2.28E-8
CGSA-1   1.56E-8           3.04E-8           2.21E-8
CGSA-2   1.48E-8           3.30E-8           2.10E-8
CGSA-3   1.40E-8           3.18E-8           2.11E-8
CGSA-4   1.54E-8           3.13E-8           2.06E-8
CGSA-5   1.38E-8           3.05E-8           2.03E-8

Table 6.4: Statistical results of different methods for the Rosenbrock function (f3).
Method   Minimum fitness   Maximum fitness   Average fitness
GSA      25.70             152.14            35.19
CGSA-1   25.44             85.49             27.62
CGSA-2   24.80             136.43            33.29
CGSA-3   25.07             27.06             25.42
CGSA-4   25.17             25.58             25.43
CGSA-5   23.73             82.17             29.75

Table 6.5: Statistical results of different methods for the Schwefel 2.26 function (f4).
Method   Minimum fitness   Maximum fitness   Average fitness
GSA      -3617.23          -2178.52          -2844.65
CGSA-1   -4288.88          -2321.88          -3110.29
CGSA-2   -7693.55          -4158.85          -5250.43
CGSA-3   -7001.99          -3645.29          -5050.60
CGSA-4   -7180.01          -3448.26          -4887.43
CGSA-5   -12561.4          -12123.8          -12383.54

Table 6.6: Statistical results of different methods for the Ackley function (f5).
Method   Minimum fitness   Maximum fitness   Average fitness
GSA      2.64E-9           4.42E-9           3.40E-9
CGSA-1   2.56E-9           4.70E-9           3.49E-9
CGSA-2   2.63E-9           4.91E-9           3.39E-9
CGSA-3   2.52E-9           4.45E-9           3.42E-9
CGSA-4   2.32E-9           4.49E-9           3.42E-9
CGSA-5   2.90E-9           4.73E-9           3.41E-9

Table 6.7: Statistical results of different methods for the Griewank function (f6).
Method   Minimum fitness   Maximum fitness   Average fitness
GSA      1.37              12.52             4.28
CGSA-1   1.25              4.50              2.17
CGSA-2   1.01E-14          4.41E-2           3.60E-3
CGSA-3   1.60E-14          7.31E-2           1.02E-2
CGSA-4   1.02E-14          7.5E-2            7.17E-3
CGSA-5   3.62E-2           0.88              0.38
Chapter 7
Evolutionary Model: Multi-objective Differential Evolution
7.1 Introduction
The differential evolution (DE) algorithm [154] is a technique that was originally devised to solve the Chebyshev polynomial fitting problem. It is a population-based stochastic meta-heuristic for global optimization on continuous domains, related both to simplex methods and to evolutionary algorithms. Due to its simplicity, robustness, and effectiveness, DE has been successfully applied to optimization problems arising in various practical applications [155], such as data clustering, image processing, etc. DE outperforms many other evolutionary algorithms in terms of convergence speed and the accuracy of its solutions. Its performance, however, still depends heavily on the setting of control parameters such as the mutation factor [156] for complex real-world optimization problems, especially those with multiple objectives [157, 158].

In multi-objective problems, several objectives (or criteria) are, not unusually, in conflict with each other, thus requiring a set of non-dominated solutions, i.e., Pareto-optimal solutions, to serve as the candidates for decision making. The general goals are to discover solutions as close to the Pareto-optimal front as possible, and to distribute the solutions as diversely as possible within the obtained non-dominated set. Many works have been reported that aim to satisfy these two goals. Wang
et al. [159] proposed a crowding entropy-based diversity measure to select the elite solutions into the elitist archive. Zhang et al. [160] utilized the direction information provided by archived inferior solutions to guide the differential mutations. Gong et al. [161] introduced ε-dominance and orthogonal design into DE to keep the diversity of the individuals along the trade-off surface. More recently, Chen et al. [162] proposed a cluster degree-based individual selection method to maintain the diversity of non-dominated solutions. A hybrid opposition-based DE algorithm was proposed by combining it with a multi-objective evolutionary gradient search [163]. Although these variants of multi-objective DE have demonstrated that DE is suitable for handling multiple objectives, little work, however, has been carried out to discuss the setting of control parameters, including the mutation factor, in multi-objective DE.
Based on the above considerations, in this work we propose an adaptive mutation
operator for DE to avoid premature convergence of the non-dominated solutions. In
the early search phases, the mutation scale factor F remains large
enough to explore the search space surrounding the non-dominated solutions, thus
maintaining the diversity of the Pareto set. As the evolution proceeds, F is
gradually reduced to exploit the promising search area, aiming to preserve good
information and to avoid destroying the optimal solutions. Furthermore, since
Zitzler et al. [164] observed that elitism helps achieve better convergence in
multi-objective evolutionary algorithms, an elitist scheme is adopted by maintaining
an external archive of the non-dominated solutions obtained during the evolution
process. Moreover, the ε-dominance strategy [165], which provides a good compromise
between convergence toward the Pareto-optimal front and diversity of the Pareto
fronts, is also used in the algorithm. With the elitist scheme and ε-dominance, the
cardinality of the Pareto-optimal region can be reduced, and no two obtained solutions
lie within the same small region. To verify the performance of the proposed
algorithm, five widely used benchmark multi-objective functions are adopted as the
test suite. Experimental results indicate that the proposed adaptive mutation based
multi-objective DE outperforms traditional multi-objective evolutionary algorithms
in terms of the convergence and diversity of the Pareto fronts.
7.2 Brief Introduction to DE
The standard DE is essentially a special kind of genetic algorithm based on real-valued
parameters and a greedy selection strategy for ensuring solution quality. An iteration of the classical DE
algorithm consists of the four basic steps: initialization of a population of search
variable vectors, mutation, crossover or recombination, and finally selection. DE
begins its search with a randomly initiated population for a global optimum point
in a D-dimensional real parameter space. We denote subsequent generations in DE
by G = {0, 1, 2, · · · , Gmax} and the i-th (i = 1, 2, ..., NP ) individual of the current
population is denoted as Xi,G = (x1,i,G, x2,i,G, ..., xj,i,G, ..., xD,i,G). The initial population
is randomly generated by:
xj,i,0 = xj,min + randi,j[0, 1] ∗ (xj,max − xj,min) (7.1)
where randi,j[0, 1] is a uniformly distributed random number in [0, 1], and xj,min and xj,max
represent the boundary values of the search space. For each individual vector Xi,G
(the target vector), DE uses the mutation operator to generate a
new individual Vi,G (the donor vector) according to Eq. (7.2).
Vi,G = Xr1,G + F ∗ (Xr2,G −Xr3,G) (7.2)
where the three individual vectors Xr1,G, Xr2,G, and Xr3,G are randomly selected from
the current population, r1, r2, r3 ∈ {1, 2, · · · , NP} are mutually distinct random
indices, and F is a real constant scale factor in [0, 2] that controls the amplification
of the differential variation (Xr2,G − Xr3,G). To increase the potential diversity of
the perturbed parameter vectors, a crossover operation comes into play after the
donor vector is generated through mutation. The binomial crossover operation is
defined as follows:
uj,i,G = { vj,i,G,  if randi,j[0, 1] ≤ Cr or j = jrand
         { xj,i,G,  otherwise                                        (7.3)
where Cr ∈ [0, 1] is called the crossover rate and randi,j[0, 1] is again a uniform
random number. After DE generates the offspring through the mutation and crossover
operations, the one-to-one greedy selection operator is performed as:

Xi,G+1 = { Ui,G,  if f(Ui,G) ≤ f(Xi,G)
         { Xi,G,  otherwise                                          (7.4)
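As an illustration, the four steps above can be assembled into a minimal single-objective DE/rand/1/bin loop. This sketch is not from the chapter: the function name, parameter defaults, and the sphere objective in the usage note are assumptions for demonstration.

```python
import numpy as np

def de_optimize(f, bounds, NP=30, F=0.5, Cr=0.9, G_max=200, seed=0):
    """Classic DE/rand/1/bin: initialization (7.1), mutation (7.2),
    binomial crossover (7.3), and one-to-one greedy selection (7.4)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T       # per-dimension bounds
    D = lo.size
    X = lo + rng.random((NP, D)) * (hi - lo)         # Eq. (7.1)
    fit = np.array([f(x) for x in X])
    for _ in range(G_max):
        for i in range(NP):
            # three mutually distinct random indices, all different from i
            r1, r2, r3 = rng.choice([j for j in range(NP) if j != i],
                                    size=3, replace=False)
            V = np.clip(X[r1] + F * (X[r2] - X[r3]), lo, hi)   # Eq. (7.2)
            j_rand = rng.integers(D)
            mask = rng.random(D) <= Cr
            mask[j_rand] = True                      # Eq. (7.3)
            U = np.where(mask, V, X[i])
            fU = f(U)
            if fU <= fit[i]:                         # Eq. (7.4)
                X[i], fit[i] = U, fU
    best = int(np.argmin(fit))
    return X[best], fit[best]
```

For example, minimizing the 5-dimensional sphere function f(x) = Σ xj² with these defaults drives the best fitness close to zero within a few hundred generations.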
7.3 Design of multi-objective differential evolution
algorithm
For solving multi-objective problems, the general requirements on the approximation
of the Pareto-optimal set are two-fold: (1) minimize the distance to the true
Pareto-optimal front, and (2) distribute the obtained non-dominated solutions
as diversely as possible [166]. This research aims
to address these two requirements; the procedure of the proposed adaptive
mutation based ε-dominance differential evolution (IDE) is summarized in Fig. 7.1.
To generate initial solutions evenly located over the whole decision space, the
orthogonal experimental design method [167] is adopted in IDE; refer to [168] for
a detailed description of orthogonal experimental design in population-based
evolutionary algorithms. After the orthogonal population (denoted as OP) is generated,
an initial archive is created from the nondominated individuals extracted from OP
through the traditional Pareto dominance method [169]. Then the initial evolutionary
population (EP), which is responsible for finding new non-dominated solutions, is
generated from the initial archive and OP. If the size of the initial archive is larger than NP,
[Figure 7.1 flowchart: Start → generate the initial orthogonal population OP →
generate the initial archive AR with the nondominated solutions from OP →
generate the initial EP from AR and OP → until the termination condition is
satisfied: produce offspring using the improved differential evolution operation
and evaluate the child individuals; update the evolutionary population; update AR
using the ε-dominance technique; G++ → output the final AR → End.]
Figure 7.1: The general flow chart of the proposed adaptive mutation based multi-objective differential evolution (IDE).
we randomly select NP solutions from the initial archive; otherwise, all ar_size
(the size of the initial archive) solutions in the initial archive are inserted into EP,
and the remaining NP − ar_size solutions are randomly selected from OP. In order
to accelerate convergence and let the archived individuals guide
the evolution, we adopt a hybrid selection mechanism when selecting the base
vector Xr1 in Eq. (7.2). In the early phase of the evolution, all of the
parents for mating are randomly selected from EP to generate the offspring. As
the evolution proceeds, elitist selection is used instead: one solution is randomly
chosen from the archive as the base parent, and the other two parents are randomly
selected from the evolutionary population EP.
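The hybrid selection of the mating parents can be sketched as below. The exact switch point between the random phase and the elitist phase is not specified in the text, so the fraction used here is an assumption:

```python
import random

def select_parents(EP, archive, G, G_max, switch=0.5):
    """Hybrid selection of the three parents for the mutation of Eq. (7.2).
    Early phase: all three parents are drawn from EP at random.
    Later phase: the base parent Xr1 comes from the elitist archive.
    The switch point (a fraction of the run) is an assumed detail."""
    r2, r3 = random.sample(EP, 2)                 # two distinct EP members
    if G < switch * G_max or not archive:
        base = random.choice(EP)                  # exploration phase
    else:
        base = random.choice(archive)             # elitist phase
    return base, r2, r3
```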
In previously reported works [159–163], all of these multi-objective DE algorithms
set the scaling factor F as a constant throughout the evolution, which frequently
causes premature convergence. Traditional differential evolution algorithms are
very sensitive to the setting of the scaling factor F. Experimental
work on a variety of DE algorithms has provided strong evidence supporting the view
that the performance of the algorithm depends heavily on the F value [170, 171].
More specifically, if F is too large, the DE algorithm
approximates a random search, so the search efficiency and the accuracy of the
obtained global optimum are quite low. On the contrary, if F is too small,
the population can lose diversity and converge prematurely. To alleviate this
problem, we propose an adaptive mutation operator that determines the mutation
scale adaptively according to the progress of the search, giving the
algorithm a larger mutation scale in the early search stages to maintain the
individuals' diversity and to avoid premature convergence. Later,
the mutation scale is gradually reduced to preserve good information and avoid
destroying the optimal solution, while increasing the probability of
converging to the optimal solutions.
To realize this adaptive setting of F, the rule is designed as in Eqs. (7.5) and (7.6):

t = e^(1 − Gm/(Gm + 1 − G))                                          (7.5)

F = F0 · 2^t                                                         (7.6)
where F0 is the initial mutation factor, Gm denotes the maximum number of fitness
evaluations, and G indicates the current evolution number. In the early search phase
of the algorithm, the adaptive mutation factor takes a relatively large value
within [F0, 2F0] to maintain the individuals' diversity.
As the evolution proceeds, the mutation factor is gradually reduced to
preserve good information, which is expected to balance well the exploration and
exploitation of the search.
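A small sketch of this schedule (the function name is illustrative): with the factor form above, F equals exactly 2·F0 at G = 1 and decays toward F0 as G approaches Gm.

```python
import math

def adaptive_F(G, G_m, F0=0.5):
    """Adaptive scale factor of Eqs. (7.5)-(7.6): F starts at 2*F0
    (t = 1 when G = 1) and decays toward F0 as G approaches G_m."""
    t = math.exp(1.0 - G_m / (G_m + 1.0 - G))   # Eq. (7.5)
    return F0 * 2.0 ** t                         # Eq. (7.6)
```

With Gm = 5000 and F0 = 0.5 this gives F = 1.0 at the first generation and F ≈ 0.5 at the last, matching the [F0, 2F0] range stated above.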
In addition, as noticed by Zitzler et al. [172], elitism helps achieve better
convergence in handling multiple objectives. Therefore, in this paper, the elitist
scheme is adopted by maintaining an external archive AR of the nondominated
solutions found during the evolutionary process. In order to achieve faster
convergence, we adopt the ε-dominance mechanism [173] to update the archive
population. At each generation, each newly generated non-dominated solution is
compared with every member already contained in the archive; the new individual
is saved in the archive only if no archived individual exists within an ε
distance of it. In this way, both the convergence and the diversity of the
Pareto fronts can be ensured within reasonable computational time.
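The archive update can be sketched with the usual box formulation of ε-dominance for minimization [165]. The chapter does not spell out the box comparison, so the details below (one solution per ε-box, same-box ties resolved by Pareto dominance, otherwise replaced) are a simplified assumption:

```python
def dominates(a, b):
    """Pareto dominance for minimization: a dominates b."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_eps_archive(archive, f_new, eps=0.05):
    """epsilon-dominance archive update (minimization): the objective
    space is partitioned into boxes of side eps and at most one solution
    per box survives; box-dominated entries are discarded."""
    box = lambda f: tuple(int(v // eps) for v in f)
    b_new = box(f_new)
    for f in archive:
        b = box(f)
        if dominates(b, b_new) or (b == b_new and dominates(f, f_new)):
            return archive                  # new point rejected
    # drop members whose box is dominated by (or equal to) the new box
    kept = [f for f in archive
            if not (dominates(b_new, box(f)) or box(f) == b_new)]
    return kept + [f_new]
```

This is what guarantees that no two archived solutions lie within the same ε-box, which is the "relatively small regions" property claimed earlier.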
7.4 Simulation and Analysis
Multi-objective optimization problem is also known as multi-criteria optimization
problem [174]. In order to evaluate the effectiveness of the proposed IDE and make
a comparison with other multi-objective evolutionary algorithms, five widely used
benchmark problems [172] involving ZDT1, ZDT2, ZDT3, ZDT4 and ZDT6 are
adopted as the test suit. All problems have two objective functions and all objective
functions are to be minimized. The parameter settings of IDE are as follows: the
maximum number of fitness evaluation Gm = 5000, the initial scaling factor value of
F0=0.5, the crossover probability of CR = 0.3, NP = 100. For each problem, we run
50 times independently with different random seeds, then compared the performance
of IDE with the one of the traditional multi-objective DE variants (MDE) [161]. In
addition, we compared the results of IDE algorithm with NSGA-II [169], SPEA2 [175]
and MOEO [176]. To assess the performance of the compared algorithms, the con-
vergence metric λ and the diversity metric ∆ are used [166]. The first convergence
metric λ measures the distance of the obtained non-dominated sets Q and the true
Pareto front approximation sets P ∗ as in Eq. (7).
λ = ( Σ_{i=1..|Q|} di ) / |Q|                                        (7.7)
Table 7.1: Comparison of the convergence metric between IDE and MDE.

Problem   ZDT1      ZDT2      ZDT3     ZDT4     ZDT6
MDE       0.0028    0.00064   0.0038   0.0026   0.0008
IDE       0.00075   0.00084   0.0030   0.0020   0.00075
Table 7.2: Comparison of the diversity metric between IDE and MDE.

Problem   ZDT1      ZDT2      ZDT3      ZDT4     ZDT6
MDE       0.2536    0.38565   0.40025   0.3850   0.3571
IDE       0.2425    0.2896    0.39575   0.2709   0.2595
where di is the Euclidean distance between solution i ∈ Q and the nearest member
of P*. Clearly, the lower the λ value, the better the convergence of the obtained
solutions, i.e., the obtained non-dominated set is closer to the
true Pareto front.
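The metric of Eq. (7.7) is straightforward to compute; a sketch (function and argument names assumed):

```python
import numpy as np

def convergence_metric(Q, P_star):
    """Eq. (7.7): mean Euclidean distance from each obtained
    non-dominated point in Q to its nearest member of P*."""
    Q = np.asarray(Q, dtype=float)
    P = np.asarray(P_star, dtype=float)
    # pairwise distance matrix of shape (|Q|, |P*|)
    d = np.linalg.norm(Q[:, None, :] - P[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())
```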
The second metric, ∆, measures the extent of the distribution among the obtained
non-dominated set Q, and is defined as in Eq. (7.8):

∆ = ( df + dl + Σ_{i=1..|Q|−1} |di − d̄| ) / ( df + dl + (|Q| − 1) d̄ )    (7.8)
where di is the Euclidean distance between consecutive points in Q, d̄ is the mean
of these distances, and df and dl denote the Euclidean distances between the extreme
points of P* and the boundary solutions of Q, respectively. Obviously, the lower
the ∆ value, the more uniform the distribution of the obtained solutions.
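Eq. (7.8) can be sketched for a bi-objective front; sorting Q along the first objective, so that the consecutive distances di are well defined, is an assumption of this sketch:

```python
import numpy as np

def diversity_metric(Q, extremes):
    """Eq. (7.8): spread metric Delta for a bi-objective front.
    Q is sorted along f1 so the di are consecutive-point distances;
    `extremes` holds the two extreme points of the true Pareto front."""
    Q = np.asarray(Q, dtype=float)
    Q = Q[np.argsort(Q[:, 0])]
    d = np.linalg.norm(np.diff(Q, axis=0), axis=1)   # di, i = 1..|Q|-1
    d_bar = d.mean()
    d_f = np.linalg.norm(Q[0] - np.asarray(extremes[0], dtype=float))
    d_l = np.linalg.norm(Q[-1] - np.asarray(extremes[1], dtype=float))
    return float((d_f + d_l + np.abs(d - d_bar).sum())
                 / (d_f + d_l + (len(Q) - 1) * d_bar))
```

A perfectly uniform front whose boundary points coincide with the true extremes yields ∆ = 0; larger gaps between neighbouring solutions raise the value.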
Table 7.1 records the convergence metric λ obtained by IDE and the earlier
MDE algorithm [161]. The diversity metric ∆ obtained by IDE and MDE is shown
in Table 7.2. Table 7.3 shows the convergence metric obtained by IDE and three
multi-objective evolutionary algorithms, and Table 7.4 presents comparative results
in terms of the diversity metric obtained by IDE and its competitors. From Table 7.1,
we find that IDE performs better with respect to convergence on all
tested instances except ZDT2, which suggests that the incorporated adaptive
[Figure 7.2: ten panels plotting f2 against f1, pairing the true Pareto front with
the front obtained by MDE and by IDE for each of ZDT1, ZDT2, ZDT3, ZDT4, and ZDT6.]
Figure 7.2: Pareto fronts obtained by IDE and its competitor algorithm MDE on ZDT1, ZDT2, ZDT3, ZDT4, and ZDT6, respectively.
Table 7.3: Comparison of the convergence metric among IDE, NSGA-II, SPEA2, and MOEO.

Algorithm   ZDT1       ZDT2       ZDT3       ZDT4       ZDT6
NSGA-II     0.033482   0.072391   0.114500   0.513053   0.296564
SPEA2       0.023285   0.16762    0.018409   4.9271     0.23255
MOEO        0.001277   0.001355   0.004385   0.008145   0.000630
IDE         0.00075    0.00084    0.0030     0.0020     0.00075
Table 7.4: Comparison of the diversity metric among IDE, NSGA-II, SPEA2, and MOEO.

Algorithm   ZDT1       ZDT2       ZDT3       ZDT4       ZDT6
NSGA-II     0.390307   0.430776   0.738540   0.702612   0.668025
SPEA2       0.154723   0.33945    0.4691     0.8239     1.04422
MOEO        0.327140   0.285062   0.965236   0.275567   0.225468
IDE         0.2425     0.2896     0.39575    0.2709     0.2595
mutation strategy indeed helps the search find better solutions. On the other hand,
the comparative results in Table 7.2 show that IDE is capable of finding a better
spread of solutions than MDE on all problems except ZDT6. From Table 7.3, it is
clear that IDE produces solutions significantly closer to the true Pareto fronts than
NSGA-II, SPEA2, and MOEO on all tested functions, with the exception that MOEO
finds slightly better solutions than IDE on ZDT6. With regard to the diversity
of the obtained non-dominated solutions, Table 7.4 shows an overall improvement
for IDE: its non-dominated solutions are located more evenly than those
obtained by its competitor algorithms, verifying that the proposed adaptive mutation
strategy together with ε-dominance clearly improves the performance of DE in
terms of diversity.
Furthermore, to understand the performance of our improved algorithm
more intuitively, Fig. 7.2 draws the Pareto fronts constructed from the
non-dominated solutions obtained by IDE and MDE on all tested functions.
From this figure, it is clear that the Pareto fronts obtained by IDE are
much better than those obtained by MDE. The performance on ZDT6 is quite
illuminating for further elaborating the search characteristics of the
compared algorithms. Almost the
same number of non-dominated solutions are obtained by both algorithms, and the
average distance (measured by λ) to the true Pareto front is also within an acceptable
tolerance (0.0008 vs. 0.00075). Nevertheless, the distribution of the non-dominated
solutions is quite different (0.3571 vs. 0.2595): IDE obtains a significantly more
evenly distributed non-dominated set for ZDT6, implying that IDE is capable
of finding a well-distributed and near-complete set of non-dominated solutions when
handling multiple objectives.
7.5 Conclusion
This paper proposed an adaptive mutation operator for the multi-objective
differential evolution algorithm. At the beginning of the search, the mutation
scale takes a relatively large value to maintain the individuals' diversity and
avoid premature convergence. As the evolution proceeds, the mutation operator
is gradually reduced to preserve good information and avoid destroying
the optimal solution. Together with the ε-dominance strategy, we constructed
an effective IDE for handling multiple objectives. We tested IDE on five standard
multi-objective test functions and compared its performance against MDE, NSGA-II,
SPEA2, and MOEO. It can be concluded that IDE is superior to the other algorithms
on most problems, indicating that our approach is able to obtain
uniformly distributed and near-optimal Pareto sets.
Chapter 8
Conclusions
In this thesis, we proposed several models based on neural and evolutionary mecha-
nisms.
Firstly, we proposed a new single neuron model with synaptic nonlinearities in a
dendritic tree. The neuron's computation has a neuron-pruning function that
reduces dimensionality by removing useless synapses and dendrites during learning,
forming a precise synaptic and dendritic morphology. The nonlinear interactions in
a dendritic tree are expressed using the Boolean logic operations AND (conjunction),
OR (disjunction), and NOT (negation). An error back-propagation algorithm is used
to train the neuron model. Furthermore, we applied the new model to the Exclusive
OR (XOR) problem, which it solves perfectly with the help of inhibitory synapses,
demonstrating synaptic nonlinear computation and the neuron's ability to learn.
Secondly, accumulated research results have suggested that the synaptic
nonlinearities of dendrites in a single neuron can possess powerful computational
capacity. Our previous works established an approximate neuronal model which is
able to capture the nonlinearities among excitatory and inhibitory inputs, and
thus successfully predicts the morphology of neurons performing specific learning
tasks. The gradient-based back-propagation (BP) method has been used to train the
dendritic neuron model, but due to its inherent tendency to become trapped in
local optima, BP usually cannot find satisfactory solutions. Thus, we proposed an
artificial immune algorithm to train the dendritic neuron model. In comparison to
BP, the artificial immune algorithm has the advantages that the training process
does not need gradient information, which enables the dendritic model to utilize
non-conventional transfer/activation functions in the soma, and that the learning
is accomplished with a population of antibodies, which lends itself to parallel
computation and greatly improves the probability of escaping local optima during
training. Experimental results on the well-known XOR problem and a geotechnical
engineering problem verified the effectiveness of the proposed artificial immune
algorithm.
Thirdly, with the number of liver disease deaths steadily increasing in recent
years, early detection and treatment of liver disease has become one of the most
active research areas for computational intelligence techniques. We proposed a
more realistic single neuron model with synaptic nonlinearities in a dendritic
tree for liver disorder diagnosis. The neuron's computation is performed as a
combination of dimensionality reduction and nonlinearity; a neuron-pruning
function removes useless synapses and dendrites during learning, forming a
distinct synaptic and dendritic morphology. The nonlinear interactions in a
dendritic tree are expressed using the Boolean logic operations AND (conjunction),
OR (disjunction), and NOT (negation), which makes the model well suited for
hardware implementation. Furthermore, an error back-propagation algorithm is used
to train the neuron model, and its performance is compared with a traditional
back-propagation neural network in terms of accuracy, sensitivity, and specificity.
We used the BUPA liver disorders dataset obtained from the UCI Machine Learning
Repository to verify the proposed method. Simulation results show promise for the
use of this single neuron model as an effective pattern classification method in
liver disorder diagnostics.
Fourthly, the gravitational search algorithm (GSA) has gained increasing attention
for dealing with complex optimization problems. Nevertheless, it still has
drawbacks such as slow convergence and a tendency to become trapped in local
minima. Chaos generated by the logistic map, with its properties of ergodicity and
stochasticity, has been combined with GSA to enhance its search performance. In
this work, four other chaotic maps were utilized to further improve the search
capacity of the hybrid chaotic gravitational search algorithm (CGSA), and six
widely used benchmark optimization instances were chosen from the literature as
the test suite. Simulation results indicate that all five chaotic maps can improve
the performance of the original GSA in terms of solution quality and convergence
speed. Moreover, the four newly incorporated chaotic maps improve the performance
of GSA more than the logistic map does, suggesting that the hybrid search dynamics
of CGSA are significantly affected by the distribution characteristics of the
chaotic maps.
Fifthly, differential evolution is well known as a powerful and efficient
population-based stochastic real-parameter optimization algorithm over continuous
spaces. DE has recently been shown to outperform several well-known stochastic
optimization methods in solving multi-objective problems. Nevertheless, its
performance is still limited in finding uniformly distributed and near-optimal
Pareto fronts. To alleviate these limitations, this thesis introduced an adaptive
mutation operator that avoids premature convergence by adaptively tuning the
mutation scale factor F, and adopted the ε-dominance strategy to update the
archive that stores the non-dominated solutions. Experiments based on five widely
used multi-objective functions were conducted. Simulation results demonstrate the
effectiveness of our proposed approach with respect to the quality of solutions in
terms of the convergence and diversity of the Pareto fronts.
Bibliography
[1] A. P. Engelbrecht, Computational intelligence: an introduction. John Wiley
& Sons, 2007.
[2] C. Darwin, The Origins of Species by Means of Natural Selection, Or the
Preservation of Favoured Races in the Struggle for Life. Kartindo.com, 1888.
[3] L. A. Zadeh, “Fuzzy sets,” Information and control, vol. 8, no. 3, pp. 338–353,
1965.
[4] E. Marais, The soul of the white ant. the Philovox, 2009.
[5] C. D. Wynne, “The soul of the ape,” American Scientist, vol. 89, no. 2, pp.
120–122, 2001.
[6] R. C. Eberhart and J. Kennedy, “A new optimizer using particle swarm theo-
ry,” in Proceedings of the sixth international symposium on micro machine and
human science, vol. 1. New York, NY, 1995, pp. 39–43.
[7] J. Kennedy, “Particle swarm optimization,” in Encyclopedia of Machine Learn-
ing. Springer, 2010, pp. 760–766.
[8] S. F. M. Burnet et al., The clonal selection theory of acquired immunity. Uni-
versity Press Cambridge, 1959.
[9] P. Bretscher and M. Cohn, “A theory of self-nonself discrimination paralysis and
induction involve the recognition of one and two determinants on an antigen,
respectively,” Science, vol. 169, no. 3950, pp. 1042–1049, 1970.
[10] K. J. Lafferty and A. Cunningham, “A new analysis of allogeneic interactions,”
Immunology and Cell Biology, vol. 53, no. 1, pp. 27–42, 1975.
[11] B. Franklin and M. Bergerman, “Cultural algorithms: Concepts and experi-
ments,” in Evolutionary Computation, 2000. Proceedings of the 2000 Congress
on, vol. 2. IEEE, 2000, pp. 1245–1251.
[12] S. Forrest, A. S. Perelson, L. Allen, and R. Cherukuri, “Self-nonself discrimi-
nation in a computer,” in Proceedings of the IEEE Symposium on Research in
Security and Privacy. IEEE, 1994, p. 202.
[13] K. Mori, M. Tsukiyama, and T. Fukuda, “Immune algorithm with searching
diversity and its application to resource allocation problem,” Transactions-
Institute of Electrical Engineers of Japan C, vol. 113, pp. 872–872, 1993.
[14] N. K. Jerne, “Towards a network theory of the immune system,” in Annales
d’immunologie, vol. 125, no. 1-2, 1974, pp. 373–389.
[15] A. S. Perelson, “Immune network theory,” Immunological reviews, vol. 110,
no. 1, pp. 5–36, 1989.
[16] J. D. Farmer, N. H. Packard, and A. S. Perelson, “The immune system, adapta-
tion, and machine learning,” Physica D: Nonlinear Phenomena, vol. 22, no. 1,
pp. 187–204, 1986.
[17] J. E. Hunt and D. E. Cooke, “Learning using an artificial immune system,”
Journal of network and computer applications, vol. 19, no. 2, pp. 189–212, 1996.
[18] L. Chen and D. B. Flies, “Molecular mechanisms of t cell co-stimulation and
co-inhibition,” Nature Reviews Immunology, vol. 13, no. 4, pp. 227–242, 2013.
[19] C. D. Mills, K. Ley, K. Buchmann, and J. Canton, “Sequential immune re-
sponses: The weapons of immunity,” Journal of innate immunity, vol. 7, no. 5,
2015.
[20] P. Matzinger, “Essay 1: the danger model in its historical context,” Scandina-
vian journal of immunology, vol. 54, no. 1-2, pp. 4–9, 2001.
[21] ——, “The real function of the immune system,” Last accessed on, pp. 06–04,
2004.
[22] U. Aickelin, D. Dasgupta, and F. Gu, “Artificial immune systems,” in Search
Methodologies. Springer, 2014, pp. 187–211.
[23] K. Makisara, O. Simula, J. Kangas, and T. Kohonen, Artificial neural networks.
Elsevier, 2014, vol. 2.
[24] T. Back, U. Hammel, and H.-P. Schwefel, “Evolutionary computation: Com-
ments on the history and current state,” Evolutionary computation, IEEE
Transactions on, vol. 1, no. 1, pp. 3–17, 1997.
[25] H.-G. Beyer, The theory of evolution strategies. Springer Science & Business
Media, 2013.
[26] Y. Hu, K. Liu, X. Zhang, L. Su, E. Ngai, and M. Liu, “Application of evolution-
ary computation for rule discovery in stock algorithmic trading: A literature
review,” Applied Soft Computing, vol. 36, pp. 534–551, 2015.
[27] W. Gong, Z. Cai, and D. Liang, “Adaptive ranking mutation operator based
differential evolution for constrained optimization,” Cybernetics, IEEE Trans-
actions on, vol. 45, no. 4, pp. 716–727, 2015.
[28] J. C. Bezdek, “Ieee fellows-class of 2015 [society briefs],” Computational Intel-
ligence Magazine, IEEE, vol. 10, no. 2, pp. 7–17, 2015.
[29] W. Pedrycz, A. Sillitti, and G. Succi, “Computational intelligence: an intro-
duction,” in Computational Intelligence and Quantitative Software Engineering.
Springer, 2016, pp. 13–31.
[30] G. Beni, “From swarm intelligence to swarm robotics,” in Swarm robotics.
Springer, 2005, pp. 1–9.
[31] J. Halloy, G. Sempo, G. Caprari, C. Rivault, M. Asadpour, F. Tache, I. Said,
V. Durier, S. Canonge, J. M. Ame et al., “Social integration of robots into
groups of cockroaches to control self-organized choices,” Science, vol. 318, no.
5853, pp. 1155–1158, 2007.
[32] R. S. Parpinelli and H. S. Lopes, “New inspirations in swarm intelligence: a
survey,” International Journal of Bio-Inspired Computation, vol. 3, no. 1, pp.
1–16, 2011.
[33] M. Dorigo and C. Blum, “Ant colony optimization theory: A survey,” Theoret-
ical computer science, vol. 344, no. 2, pp. 243–278, 2005.
[34] C. Blum, “Ant colony optimization: Introduction and recent trends,” Physics
of Life reviews, vol. 2, no. 4, pp. 353–373, 2005.
[35] M. Dorigo, M. Birattari, and T. Stutzle, “Ant colony optimization,” Computa-
tional Intelligence Magazine, IEEE, vol. 1, no. 4, pp. 28–39, 2006.
[36] W. Xiang and H. Lee, “Ant colony intelligence in multi-agent dynamic manufac-
turing scheduling,” Engineering Applications of Artificial Intelligence, vol. 21,
no. 1, pp. 73–85, 2008.
[37] T. Blackwell and J. Branke, “Multiswarms, exclusion, and anti-convergence
in dynamic environments,” Evolutionary Computation, IEEE Transactions on,
vol. 10, no. 4, pp. 459–472, 2006.
[38] M. Dorigo, V. Maniezzo, and A. Colorni, “Ant system: optimization by a colony
of cooperating agents,” Systems, Man, and Cybernetics, Part B: Cybernetics,
IEEE Transactions on, vol. 26, no. 1, pp. 29–41, 1996.
[39] T. Stutzle and H. H. Hoos, “Max–min ant system,” Future generation computer
systems, vol. 16, no. 8, pp. 889–914, 2000.
[40] T. Stutzle and H. Hoos, “Max-min ant system and local search for the traveling
salesman problem,” in Evolutionary Computation, 1997., IEEE International
Conference on. IEEE, 1997, pp. 309–314.
[41] C. A. C. Coello, D. A. Van Veldhuizen, and G. B. Lamont, Evolutionary algo-
rithms for solving multi-objective problems. Springer, 2002, vol. 242.
[42] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm intelligence: from natural
to artificial systems. Oxford university press, 1999, no. 1.
[43] J. Kennedy, J. F. Kennedy, R. C. Eberhart, and Y. Shi, Swarm intelligence.
Morgan Kaufmann, 2001.
[44] Y. Shi and R. Eberhart, “A modified particle swarm optimizer,” in Evolution-
ary Computation Proceedings, 1998. IEEE World Congress on Computational
Intelligence., The 1998 IEEE International Conference on. IEEE, 1998, pp.
69–73.
[45] J. J. Liang, A. K. Qin, P. N. Suganthan, and S. Baskar, “Comprehensive learn-
ing particle swarm optimizer for global optimization of multimodal functions,”
Evolutionary Computation, IEEE Transactions on, vol. 10, no. 3, pp. 281–295,
2006.
[46] C. A. C. Coello, G. T. Pulido, and M. S. Lechuga, “Handling multiple ob-
jectives with particle swarm optimization,” Evolutionary Computation, IEEE
Transactions on, vol. 8, no. 3, pp. 256–279, 2004.
[47] J. Robinson and Y. Rahmat-Samii, “Particle swarm optimization in electro-
magnetics,” Antennas and Propagation, IEEE Transactions on, vol. 52, no. 2,
pp. 397–407, 2004.
[48] R. Mendes, J. Kennedy, and J. Neves, “The fully informed particle swarm: sim-
pler, maybe better,” Evolutionary Computation, IEEE Transactions on, vol. 8,
no. 3, pp. 204–210, 2004.
[49] F. Van den Bergh and A. P. Engelbrecht, “A cooperative approach to particle
swarm optimization,” Evolutionary Computation, IEEE Transactions on, vol. 8,
no. 3, pp. 225–239, 2004.
[50] D. Dasgupta, Z. Ji, F. A. Gonzalez et al., “Artificial immune system (ais)
research in the last five years.” in IEEE Congress on Evolutionary Computation
(1), 2003, pp. 123–130.
[51] S. A. Hofmeyr and S. Forrest, “Architecture for an artificial immune system,”
Evolutionary computation, vol. 8, no. 4, pp. 443–473, 2000.
[52] S. Tonegawa, “Somatic generation of antibody diversity,” Nature, vol. 302, no.
5909, pp. 575–581, 1983.
[53] P. Matzinger, “The danger model: a renewed sense of self,” Science, vol. 296,
no. 5566, pp. 301–305, 2002.
[54] J. Timmis, M. Neal, and J. Hunt, “An artificial immune system for data anal-
ysis,” Biosystems, vol. 55, no. 1, pp. 143–150, 2000.
[55] J. Timmis and M. Neal, “A resource limited artificial immune system for data
analysis,” Knowledge-Based Systems, vol. 14, no. 3, pp. 121–130, 2001.
[56] M. J. Shlomchik, A. Marshak-Rothstein, C. B. Wolfowicz, T. L. Rothstein, and
M. G. Weigert, “The role of clonal selection and somatic mutation in autoim-
munity,” Nature, vol. 328, no. 6133, pp. 805–811, 1987.
[57] P. K. Harmer, P. D. Williams, G. H. Gunsch, and G. B. Lamont, “An artificial
immune system architecture for computer security applications,” Evolutionary
computation, IEEE transactions on, vol. 6, no. 3, pp. 252–280, 2002.
[58] L. N. De Castro and F. J. Von Zuben, “Learning and optimization using the
clonal selection principle,” Evolutionary Computation, IEEE Transactions on,
vol. 6, no. 3, pp. 239–251, 2002.
[59] S. Gao, H. Dai, G. Yang, and Z. Tang, “A novel clonal selection algorithm
and its application to traveling salesman problem,” IEICE Transactions on
Fundamentals of Electronics, Communications and Computer Sciences, vol. 90,
no. 10, pp. 2318–2325, 2007.
[60] Y. Yu, L. Cunhua, G. Shangce, and T. Zheng, “Quantum interference crossover-based clonal selection algorithm and its application to traveling salesman problem,” IEICE Transactions on Information and Systems, vol. 92, no. 1, pp. 78–85, 2009.
[61] G. Shangce, T. Zheng, and J. Zhang, “An improved clonal selection algorithm and its application to traveling salesman problems,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. 90, no. 12, pp. 2930–2938, 2007.
[62] K. Tanaka and M. Sugeno, “Stability analysis and design of fuzzy control sys-
tems,” Fuzzy sets and systems, vol. 45, no. 2, pp. 135–156, 1992.
[63] H. Li, S. Yin, Y. Pan, and H.-K. Lam, “Model reduction for interval type-2
takagi–sugeno fuzzy systems,” Automatica, vol. 61, pp. 308–314, 2015.
[64] W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in
nervous activity,” The bulletin of mathematical biophysics, vol. 5, no. 4, pp.
115–133, 1943.
[65] M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: MIT Press, 1969.
[66] B. W. Mel, “Information processing in dendritic trees,” Neural Computation,
vol. 6, no. 6, pp. 1031–1085, 1994.
[67] M. Hausser and B. Mel, “Dendrites: bug or feature?” Current opinion in
neurobiology, vol. 13, no. 3, pp. 372–383, 2003.
[68] Y. Todo, H. Tamura, K. Yamashita, and Z. Tang, “Unsupervised learnable neu-
ron model with nonlinear interaction on dendrites,” Neural Networks, vol. 60,
pp. 96–103, 2014.
[69] C. Koch, T. Poggio, and V. Torre, “Nonlinear interactions in a dendritic tree:
localization, timing, and role in information processing,” Proceedings of the
National Academy of Sciences, vol. 80, no. 9, pp. 2799–2802, 1983.
[70] R. M. García-Gimeno, C. Hervás-Martínez, and M. I. de Silóniz, “Improving artificial neural networks with a pruning methodology and genetic algorithms for their application in microbial growth prediction in food,” International Journal of Food Microbiology, vol. 72, no. 1, pp. 19–30, 2002.
[71] R. C. Paolicelli, G. Bolasco, F. Pagani, L. Maggi, M. Scianni, P. Panzanelli,
M. Giustetto, T. A. Ferreira, E. Guiducci, L. Dumas et al., “Synaptic pruning
by microglia is necessary for normal brain development,” Science, vol. 333, no.
6048, pp. 1456–1458, 2011.
[72] L. K. Low and H.-J. Cheng, “Axon pruning: an essential step underlying the
developmental plasticity of neuronal connections,” Philosophical Transactions
of the Royal Society of London B: Biological Sciences, vol. 361, no. 1473, pp.
1531–1544, 2006.
[73] M. M. Islam, M. Akhand, M. A. Rahman, and K. Murase, “Weight freezing to
reduce training time in designing artificial neural networks,” in Proceedings of
International Conference on Computer and Information Technology, 2002, pp.
132–136.
[74] J. Sietsma and R. J. Dow, “Neural net pruning-why and how,” in IEEE Inter-
national Conference on Neural Networks. IEEE, 1988, pp. 325–333.
[75] H. Cuntz, M. Remme, and B. Torben-Nielsen, The Computing Dendrite: From
Structure to Function. Springer, 2014.
[76] J. C. Magee, “Dendritic integration of excitatory synaptic input,” Nature Re-
views Neuroscience, vol. 1, no. 3, pp. 181–190, 2000.
[77] S. R. Williams and G. J. Stuart, “Role of dendritic synapse location in the
control of action potential output,” Trends in neurosciences, vol. 26, no. 3, pp.
147–154, 2003.
[78] M. London and M. Hausser, “Dendritic computation,” Annu. Rev. Neurosci.,
vol. 28, pp. 503–532, 2005.
[79] A. T. Gulledge, B. M. Kampa, and G. J. Stuart, “Synaptic integration in den-
dritic trees,” Journal of neurobiology, vol. 64, no. 1, pp. 75–90, 2005.
[80] T. Branco and M. Hausser, “The single dendritic branch as a fundamental
functional unit in the nervous system,” Current opinion in neurobiology, vol. 20,
no. 4, pp. 494–502, 2010.
[81] H. Sossa and E. Guevara, “Efficient training for dendrite morphological neural
networks,” Neurocomputing, vol. 131, pp. 132–142, 2014.
[82] P. J. Sjostrom, E. A. Rancz, A. Roth, and M. Hausser, “Dendritic excitability
and synaptic plasticity,” Physiological reviews, vol. 88, no. 2, pp. 769–840, 2008.
[83] X. Chen, U. Leischner, N. L. Rochefort, I. Nelken, and A. Konnerth, “Functional
mapping of single spines in cortical neurons in vivo,” Nature, vol. 475, no. 7357,
pp. 501–505, 2011.
[84] E. Salinas and L. Abbott, “A model of multiplicative neural responses in parietal
cortex,” Proceedings of the national academy of sciences, vol. 93, no. 21, pp.
11 956–11 961, 1996.
[85] F. Gabbiani, H. G. Krapp, C. Koch, and G. Laurent, “Multiplicative compu-
tation in a visual neuron sensitive to looming,” Nature, vol. 420, no. 6913, pp.
320–324, 2002.
[86] M. Liang, S.-X. Wang, and Y.-H. Luo, “Fast learning algorithms for multi-layered feedforward neural network,” in Proceedings of the IEEE 1994 National Aerospace and Electronics Conference (NAECON 1994). IEEE, 1994, pp. 787–790.
[87] C. Charalambous, “Conjugate gradient algorithm for efficient training of artificial neural networks,” in IEE Proceedings G (Circuits, Devices and Systems), vol. 139, no. 3. IET, 1992, pp. 301–310.
[88] M. T. Hagan and M. B. Menhaj, “Training feedforward networks with the Marquardt algorithm,” IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 989–993, 1994.
[89] X. Yao and Y. Liu, “A new evolutionary system for evolving artificial neural networks,” IEEE Transactions on Neural Networks, vol. 8, no. 3, pp. 694–713, 1997.
[90] J. Ilonen, J.-K. Kamarainen, and J. Lampinen, “Differential evolution training
algorithm for feed-forward neural networks,” Neural Processing Letters, vol. 17,
no. 1, pp. 93–105, 2003.
[91] J. Yu, L. Xi, and S. Wang, “An improved particle swarm optimization for evolv-
ing feedforward artificial neural networks,” Neural Processing Letters, vol. 26,
no. 3, pp. 217–231, 2007.
[92] S. Kiranyaz, T. Ince, A. Yildirim, and M. Gabbouj, “Evolutionary artificial neu-
ral networks by multi-dimensional particle swarm optimization,” Neural Net-
works, vol. 22, no. 10, pp. 1448–1462, 2009.
[93] S. Mirjalili, S. Z. M. Hashim, and H. M. Sardroudi, “Training feedforward
neural networks using hybrid particle swarm optimization and gravitational
search algorithm,” Applied Mathematics and Computation, vol. 218, no. 22, pp.
11 125–11 137, 2012.
[94] S. Mirjalili, S. M. Mirjalili, and A. Lewis, “Let a biogeography-based optimizer
train your multi-layer perceptron,” Information Sciences, vol. 269, pp. 188–209,
2014.
[95] N. L. Azad, A. Mozaffari, and J. K. Hedrick, “A hybrid switching predictive
controller based on bi-level kernel-based elm and online trajectory builder for
automotive coldstart emissions reduction,” Neurocomputing, vol. 173, pp. 1124–
1141, 2016.
[96] C. Yang, L. Tham, X.-T. Feng, Y. Wang, and P. Lee, “Two-stepped evolution-
ary algorithm and its application to stability analysis of slopes,” Journal of
Computing in Civil Engineering, vol. 18, no. 2, pp. 145–153, 2004.
[97] S. K. Das, R. K. Biswal, N. Sivakugan, and B. Das, “Classification of slopes
and prediction of factor of safety using differential evolution neural networks,”
Environmental Earth Sciences, vol. 64, no. 1, pp. 201–210, 2011.
[98] http://www.liverfoundation.org/downloads/alf_download_1173.pdf.
[99] http://www.endoflifecare-intelligence.org.uk/resources/publications/deaths_from_liver_disease.
[100] P. Jeatrakul and K. Wong, “Comparing the performance of different neural
networks for binary classification problems,” in Eighth International Symposium
on Natural Language Processing. IEEE, 2009, pp. 111–115.
[101] Y. Zhang, Y. Yin, D. Guo, X. Yu, and L. Xiao, “Cross-validation based weights and structure determination of Chebyshev-polynomial neural networks for pattern classification,” Pattern Recognition, vol. 47, no. 10, pp. 3414–3428, 2014.
[102] M. Seera and C. P. Lim, “A hybrid intelligent system for medical data clas-
sification,” Expert Systems with Applications, vol. 41, no. 5, pp. 2239–2249,
2014.
[103] M. Paliwal and U. A. Kumar, “Neural networks and statistical techniques: A
review of applications,” Expert systems with applications, vol. 36, no. 1, pp.
2–17, 2009.
[104] C. Koch, Biophysics of computation: information processing in single neurons. Oxford University Press, 1998.
[105] A. Destexhe and E. Marder, “Plasticity in single neuron and circuit computa-
tions,” Nature, vol. 431, no. 7010, pp. 789–795, 2004.
[106] L. Abbott and W. G. Regehr, “Synaptic computation,” Nature, vol. 431, no.
7010, pp. 796–803, 2004.
[107] R. A. Silver, “Neuronal arithmetic,” Nature Reviews Neuroscience, vol. 11,
no. 7, pp. 474–489, 2010.
[108] Y. Todo, H. Tamura, K. Yamashita, and Z. Tang, “Unsupervised learnable neu-
ron model with nonlinear interaction on dendrites,” Neural Networks, vol. 60,
pp. 96–103, 2014.
[109] Q. K. Al-Shayea, “Artificial neural networks in medical diagnosis,” Internation-
al Journal of Computer Science Issues, vol. 8, no. 2, pp. 150–154, 2011.
[110] W. G. Baxt, “Application of artificial neural networks to clinical medicine,” The Lancet, vol. 346, no. 8983, pp. 1135–1138, 1995.
[111] E. Alkım, E. Gurbuz, and E. Kılıc, “A fast and adaptive automated disease
diagnosis method with an innovative neural network model,” Neural Networks,
vol. 33, pp. 88–96, 2012.
[112] F. Rosenblatt, Principles of neurodynamics. Spartan Books, 1962.
[113] G. N. Priya and A. Kannan, “An innovative classification model for CAD dataset using SVM based iterative linear discriminant analysis,” in Power Electronics and Renewable Energy Systems. Springer, 2015, pp. 1415–1423.
[114] S. Blomfield, “Arithmetical operations performed by nerve cells,” Brain re-
search, vol. 69, no. 1, pp. 115–124, 1974.
[115] N. Brunel, V. Hakim, and M. J. Richardson, “Single neuron dynamics and
computation,” Current opinion in neurobiology, vol. 25, pp. 149–155, 2014.
[116] W. Rall, R. Burke, T. Smith, P. G. Nelson, and K. Frank, “Dendritic location of
synapses and possible mechanisms for the monosynaptic epsp in motoneurons,”
J. Neurophysiol, vol. 30, no. 5, pp. 884–915, 1967.
[117] V. Torre and T. Poggio, “A synaptic mechanism possibly underlying directional
selectivity to motion,” Proceedings of the Royal Society of London B: Biological
Sciences, vol. 202, no. 1148, pp. 409–416, 1978.
[118] Y.-N. Jan and L. Y. Jan, “Branching out: mechanisms of dendritic arboriza-
tion,” Nature Reviews Neuroscience, vol. 11, no. 5, pp. 316–328, 2010.
[119] J. W. Schnupp and A. J. King, “Neural processing: the logic of multiplication
in single neurons,” Current Biology, vol. 11, no. 16, pp. R640–R642, 2001.
[120] S. Bahramirad, A. Mustapha, and M. Eshraghi, “Classification of liver disease
diagnosis: A comparative study,” in International Conference on Informatics
and Applications. IEEE, 2013, pp. 42–46.
[121] W. Zhu, N. Zeng, N. Wang et al., “Sensitivity, specificity, accuracy, associated confidence interval and ROC analysis with practical SAS® implementations,” NESUG proceedings: health care and life sciences, Baltimore, Maryland, pp. 1–9, 2010.
[122] P. Anooj, “Clinical decision support system: Risk level prediction of heart
disease using weighted fuzzy rules,” Journal of King Saud University-Computer
and Information Sciences, vol. 24, no. 1, pp. 27–40, 2012.
[123] S. Ozsen and S. Gunes, “Attribute weighting via genetic algorithms for attribute weighted artificial immune system (AWAIS) and its application to heart disease and liver disorders problems,” Expert Systems with Applications, vol. 36, no. 1, pp. 386–392, 2009.
[124] J. F. Khaw, B. Lim, and L. E. Lim, “Optimal design of neural networks using
the taguchi method,” Neurocomputing, vol. 7, no. 3, pp. 225–245, 1995.
[125] Z. Beheshti, S. M. H. Shamsuddin, E. Beheshti, and S. S. Yuhaniz, “Enhance-
ment of artificial neural network learning using centripetal accelerated particle
swarm optimization for medical diseases diagnosis,” Soft Computing, vol. 18,
no. 11, pp. 2253–2270, 2014.
[126] D. Delen, G. Walker, and A. Kadam, “Predicting breast cancer survivability: a
comparison of three data mining methods,” Artificial intelligence in medicine,
vol. 34, no. 2, pp. 113–127, 2005.
[127] S. S. Haykin, Neural networks and learning machines. Upper Saddle River, NJ: Pearson Education, 2009, vol. 3.
[128] D. Pham, S. Dimov, and Z. Salem, “Technique for selecting examples in in-
ductive learning,” in European symposium on intelligent techniques, Aachen,
Germany. Citeseer, 2000, pp. 119–127.
[129] N. Cheung, “Machine learning techniques for medical analysis,” BSc thesis, School of Information Technology and Electrical Engineering, University of Queensland, vol. 19, 2001.
[130] T. Van Gestel, J. A. Suykens, G. Lanckriet, A. Lambrechts, B. De Moor, and J. Vandewalle, “Bayesian framework for least-squares support vector machine classifiers, Gaussian processes, and kernel Fisher discriminant analysis,” Neural Computation, vol. 14, no. 5, pp. 1115–1147, 2002.
[131] C. E. Yeow, “Nomograms visualization of naïve Bayes classification on liver disorders data,” School of Computer Engineering, Nanyang Technological University, 2006.
[132] U. V. Kulkarni and S. V. Shinde, “Neuro-fuzzy classifier based on the Gaussian membership function,” in International Conference on Computing, Communications and Networking Technologies (ICCCNT). IEEE, 2013, pp. 1–7.
[133] S. H. S. A. Ubaidillah, R. Sallehuddin, and N. H. Mustaffa, “Classification of
liver cancer using artificial neural network and support vector machine,” in
Proc. Of Int. Conf on Advance in Communication Network, and Computing,
2014, pp. 1–6.
[134] S. H. S. A. Ubaidillah, R. Sallehuddin, and N. A. Ali, “Cancer detection using artificial neural network and support vector machine: A comparative study,” Jurnal Teknologi, vol. 65, no. 1, 2013.
[135] J. Ji, S. Gao, J. Cheng, Z. Tang, and Y. Todo, “An approximate logic neuron
model with a dendritic structure,” Neurocomputing, vol. 173, pp. 1775–1783,
2016.
[136] E. Rashedi, H. Nezamabadi-Pour, and S. Saryazdi, “GSA: a gravitational search algorithm,” Information Sciences, vol. 179, no. 13, pp. 2232–2248, 2009.
[137] P. K. Roy, “Solution of unit commitment problem using gravitational search al-
gorithm,” International Journal of Electrical Power & Energy Systems, vol. 53,
pp. 85–94, 2013.
[138] E. Rashedi, H. Nezamabadi-Pour, and S. Saryazdi, “BGSA: binary gravitational search algorithm,” Natural Computing, vol. 9, no. 3, pp. 727–745, 2010.
[139] S. Gao, C. Vairappan, Y. Wang, Q. Cao, and Z. Tang, “Gravitational search
algorithm combined with chaos for unconstrained numerical optimization,” Ap-
plied Mathematics and Computation, vol. 231, pp. 48–62, 2014.
[140] L. Bing and J. Weisun, “Chaos optimization method and its application,” Con-
trol Theory and Applications, vol. 14, no. 4, pp. 613–615, 1997.
[141] J. Yang, J. Z. Zhou, W. Wu, F. Liu, C. Zhu, and G. Cao, “A chaos algorithm
based on progressive optimality and tabu search algorithm,” in Proceedings of
2005 International Conference on Machine Learning and Cybernetics, vol. 5.
IEEE, 2005, pp. 2977–2981.
[142] H. Xu, Y. Zhu, T. Zhang, and Z. Wang, “Application of mutative scale chaos
optimization algorithm in power plant units economic dispatch,” Journal of
Harbin Institute of Technology, vol. 32, no. 4, pp. 55–58, 2000.
[143] M. Bucolo, R. Caponetto, L. Fortuna, M. Frasca, and A. Rizzo, “Does chaos
work better than noise?” IEEE Circuits and Systems Magazine, vol. 2, no. 3,
pp. 4–19, 2002.
[144] R. Resnick, D. Halliday, and J. Walker, Fundamentals of physics. John Wiley,
1988.
[145] P. Schroeder, “Gravity from the ground up,” Proceedings of the NPA, vol. 7,
pp. 498–503, 2010.
[146] R. Mansouri, F. Nasseri, and M. Khorrami, “Effective time variation of g in
a model universe with variable space dimension,” Physics Letters A, vol. 259,
no. 3, pp. 194–200, 1999.
[147] S. Talatahari, B. Farahmand Azar, R. Sheikholeslami, and A. Gandomi, “Im-
perialist competitive algorithm combined with chaos for global optimization,”
Communications in Nonlinear Science and Numerical Simulation, vol. 17, no. 3,
pp. 1312–1319, 2012.
[148] R. M. May, “Simple mathematical models with very complicated dynamics,”
Nature, vol. 261, no. 5560, pp. 459–467, 1976.
[149] A. Baranovsky and D. Daems, “Design of one-dimensional chaotic maps with
prescribed statistical properties,” International Journal of Bifurcation and
Chaos, vol. 5, no. 06, pp. 1585–1598, 1995.
[150] M. S. Tavazoei and M. Haeri, “Comparison of different one-dimensional maps as
chaotic search pattern in chaos optimization algorithms,” Applied Mathematics
and Computation, vol. 187, no. 2, pp. 1076–1085, 2007.
[151] B. Alatas, “Chaotic bee colony algorithms for global numerical optimization,”
Expert Systems with Applications, vol. 37, no. 8, pp. 5682–5687, 2010.
[152] S. Talatahari, B. F. Azar, R. Sheikholeslami, and A. Gandomi, “Imperialist
competitive algorithm combined with chaos for global optimization,” Commu-
nications in Nonlinear Science and Numerical Simulation, vol. 17, no. 3, pp.
1312–1319, 2012.
[153] T. Xiang, X. Liao, and K. Wong, “An improved particle swarm optimization
algorithm combined with piecewise linear chaotic map,” Applied Mathematics
and Computation, vol. 190, no. 2, pp. 1637–1645, 2007.
[154] K. Price, R. M. Storn, and J. A. Lampinen, Differential evolution: a practical
approach to global optimization. Springer, 2006.
[155] S. Das and P. N. Suganthan, “Differential evolution: A survey of the state-of-
the-art,” IEEE Transactions on Evolutionary Computation, no. 99, pp. 1–28,
2010.
[156] J. Zhang and A. C. Sanderson, “JADE: adaptive differential evolution with optional external archive,” IEEE Transactions on Evolutionary Computation, vol. 13, no. 5, pp. 945–958, 2009.
[157] J. Wang, J. Liao, Y. Zhou, and Y. Cai, “Differential evolution enhanced with multiobjective sorting-based mutation operators,” IEEE Transactions on Cybernetics, vol. 44, no. 12, pp. 2792–2805, 2014.
[158] L. V. Santana-Quintero and C. A. C. Coello, “An algorithm based on differential
evolution for multi-objective problems,” International Journal of Computation-
al Intelligence Research, vol. 1, no. 1, pp. 151–169, 2005.
[159] Y.-N. Wang, L.-H. Wu, and X.-F. Yuan, “Multi-objective self-adaptive differ-
ential evolution with elitist archive and crowding entropy-based diversity mea-
sure,” Soft Computing, vol. 14, no. 3, pp. 193–209, 2010.
[160] J. Zhang and A. C. Sanderson, “Self-adaptive multi-objective differential evo-
lution with direction information provided by archived inferior solutions,” in
IEEE Congress on Evolutionary Computation, 2008, pp. 2801–2810.
[161] W. Gong and Z. Cai, “An improved multiobjective differential evolution based on Pareto-adaptive epsilon-dominance and orthogonal design,” European Journal of Operational Research, vol. 198, no. 2, pp. 576–601, 2009.
[162] B. Chen, Y. Lin, W. Zeng, D. Zhang, and Y.-W. Si, “Modified differential evo-
lution algorithm using a new diversity maintenance strategy for multi-objective
optimization problems,” Applied Intelligence, pp. 1–25, 2015.
[163] J. K. Chong and K. C. Tan, “An opposition-based self-adaptive hybridized differential evolution algorithm for multi-objective optimization (OSADE),” in Proceedings of the 18th Asia Pacific Symposium on Intelligent and Evolutionary Systems. Springer, 2015, pp. 447–461.
[164] E. Zitzler and L. Thiele, “Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach,” IEEE Transactions on Evolutionary Computation, vol. 3, no. 4, pp. 257–271, 1999.
[165] M. Laumanns, L. Thiele, K. Deb, and E. Zitzler, “Combining convergence and
diversity in evolutionary multiobjective optimization,” Evolutionary computa-
tion, vol. 10, no. 3, pp. 263–282, 2002.
[166] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. Da Fonseca, “Per-
formance assessment of multiobjective optimizers: An analysis and review,”
IEEE Transactions on Evolutionary Computation, vol. 7, no. 2, pp. 117–132,
2003.
[167] K. Fang and C. Ma, “Orthogonal and uniform experimental design,” Beijing: Science Press, 2001.
[168] Y. W. Leung and Y. Wang, “An orthogonal genetic algorithm with quantiza-
tion for global numerical optimization,” IEEE Transactions on Evolutionary
Computation, vol. 5, no. 1, pp. 41–53, 2001.
[169] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.
[170] S. Dasgupta, S. Das, A. Biswas, and A. Abraham, “On stability and conver-
gence of the population-dynamics in differential evolution,” AI Communica-
tions, vol. 22, no. 1, pp. 1–20, 2009.
[171] J. Brest, S. Greiner, B. Boskovic, M. Mernik, and V. Zumer, “Self-adapting
control parameters in differential evolution: A comparative study on numeri-
cal benchmark problems,” IEEE Transactions on Evolutionary Computation,
vol. 10, no. 6, pp. 646–657, 2006.
[172] E. Zitzler, K. Deb, and L. Thiele, “Comparison of multiobjective evolutionary
algorithms: Empirical results,” Evolutionary computation, vol. 8, no. 2, pp.
173–195, 2000.
[173] A. Hernández-Díaz, L. Santana-Quintero, C. Coello Coello, and J. Molina, “Pareto-adaptive ε-dominance,” Evolutionary Computation, vol. 15, no. 4, pp. 493–517, 2007.
[174] K. Deb, Multi-objective optimization using evolutionary algorithms. John Wiley & Sons, 2001, vol. 16.
[175] E. Zitzler, M. Laumanns, and L. Thiele, “SPEA2: Improving the strength Pareto evolutionary algorithm,” in Proc. Evolutionary Methods for Design Optimization and Control with Applications to Industrial Problems, 2001, pp. 95–100.
[176] M.-R. Chen and Y.-Z. Lu, “A novel elitist multiobjective optimization algorith-
m: Multiobjective extremal optimization,” European Journal of Operational
Research, vol. 188, no. 3, pp. 637–651, 2008.
Acknowledgements
I would like to deeply thank the various people who, during my study and research, gave me useful and helpful assistance. Without their care and consideration, this thesis would likely not have been completed.
To my supervisor Prof. Zheng Tang at the University of Toyama, who introduced me to the significant and fascinating world of Intelligent Soft Computing, for his support and continuous encouragement. Without his kind guidance and encouragement, I would never have completed this degree. Furthermore, his help and support were not limited to my academic career, but also extended to my daily life in Japan. Numerous stimulating discussions and his constant support have kept me moving forward ever since I came to Japan. Thanks to him, I was able to accomplish this thesis within three years.
I would like to thank my thesis referees, Prof. Hirobayashi, Prof. Yamazaki, and Associate Prof. Gao of the University of Toyama, for reviewing and examining my thesis and for their many valuable comments and suggestions.
To all the members of the Intelligent Information Systems Research Lab at the University of Toyama, for all their help and friendship, which made this time much more enjoyable.
I would like to thank all the members of my family for their unconditional love, support, and encouragement throughout this endeavor and my entire course of study. In particular, I would like to thank my husband, who endured my seemingly endless hours of absorption in this effort without complaint, who gave me his unwavering support, and who took care of many of those “nuisance” items usually referred to as “Real Life” while I was off in my other world, that of completing this endeavor.