Advanced Computational Intelligence Algorithm
Based on Neural and Evolutionary Mechanisms
by
Tao Jiang
A dissertation
submitted to the Faculty of Engineering
in Partial Fulfillment of the Requirements
for the Degree of
Doctor of Engineering
University of Toyama
Gofuku 3190, Toyama-shi, Toyama 930-8555 Japan
2016
(Submitted January 24, 2016)
Abstract
Computational intelligence (CI), a branch of artificial intelligence, is clearly distinct
from traditional artificial intelligence, which is based on mathematical logic.
Computational intelligence uses heuristic algorithms such as fuzzy systems, neural
networks, and evolutionary computation. Computational intelligence also uses tech-
niques such as swarm intelligence, fractals, chaos theory, artificial immune systems,
and wavelets. By making full use of elements such as adaptation, computational
intelligence aims to create intelligent programs. Researchers have learned much
from natural systems, using the knowledge thus obtained to develop
new algorithmic models that solve complex problems. Methods developed for low-level
cognitive functions include supervised and unsupervised learning by adaptive systems,
and they encompass not only neural, fuzzy, and evolutionary approaches but also
probabilistic and statistical approaches, such as Bayesian networks or kernel methods.
These methods are used to solve the same type of problems in various fields such as
pattern recognition, signal processing, classification and regression, and data mining.
In order to more effectively deal with intricate data in the real world, researchers
have studied how to combine these intelligent methods to best address real-
world problems. We have concentrated mainly on artificial neural networks,
artificial immune systems, and evolutionary computation.
Accumulated results of these studies have suggested that synaptic nonlinearities
of dendrites in a single neuron can possess a powerful computational capacity. We
have established an approximate neuronal model that is able to capture the nonlin-
earities among excitatory and inhibitory inputs and thus is able to successfully make
predictions about the morphology of neurons when the model is used for specific
learning tasks. The gradient-based back-propagation (BP) method has been used
to train the dendritic neuron model. Because of its inherent local-optima trapping
problem, the BP method usually cannot find satisfactory solutions. Therefore, we also
propose an artificial immune algorithm to train the dendritic neuron model. The arti-
ficial immune algorithm has the advantage that the training process does not require
gradient information, which enables the dendritic model to utilize non-conventional
transfer/activation functions in the soma. Learning can be accomplished on the basis
of a population of antibodies, which permits potentially parallel computation. It also
greatly improves the probability of escaping local optima during training.
The single neuron model with synaptic nonlinearities in a dendritic tree was also
applied to liver disease diagnosis. Artificial neural networks have provided physicians
with a powerful tool to analyze, compute, and figure out complex data across many
medical applications. The single neuron model (NMSN) simulates the essence of
nonlinear interactions among synaptic inputs in the dendrites. Experimental results
suggested that NMSN was superior to the traditional BPNN with either a similar
computational architecture or the best performance. NMSN has a distinct ability of
pattern extraction through a pruning function, which is a metaphor of the neuronal
morphology. We also focused on the gravitational search algorithm (GSA) in dealing
with complex optimization problems. Because it still has some drawbacks, such as
slow convergence and the tendency to become trapped in local minima, we combined
chaos with GSA to enhance its searching performance. In our work, four other
chaotic maps are utilized to further improve the searching capacity of the hybrid
chaotic gravitational search algorithm (CGSA), and six benchmark instances, which
are widely used for optimization, are chosen from the literature as the test suite. All
five chaotic maps can improve the performance of the original GSA in terms of the
solution quality and convergence speed. The four newly incorporated chaotic maps
exhibit a better influence on improving the performance of GSA than the logistic map,
suggesting that the hybrid searching dynamics of CGSA is significantly affected by
the distribution characteristics of chaotic maps. We also worked on evolutionary
algorithms, differential evolution in particular, which is well known as a stochastic
search method for real-parameter optimization over continuous space. Differential
evolution is still limited in finding uniformly distributed solutions near optimal
Pareto fronts. To alleviate such limitations, we introduced an adaptive mutation
operator to avoid premature convergence by tuning the mutation scale factor F and
adopted an ε-dominance strategy to update the archive that stores the non-dominated
solutions. The effectiveness of our proposed approach was demonstrated with respect
to the quality of solutions in terms of the convergence and diversity of the Pareto
fronts.
Computational intelligence now plays an increasingly important part in our daily
life. The methods that we developed can help people extract important information ef-
fectively from complex data and thus find optimal solutions. We plan to investigate
the user-defined parameter sensitivities of the proposed artificial immune algorithm
and apply the proposed model to more problems. We will also try to adaptively use
multiple chaotic maps simultaneously in the chaotic search to construct a more powerful
CGSA and analyze the search dynamics of the algorithm. The study of computational
intelligence, particularly of the mechanisms and constructions of single neurons and
swarm intelligence, will be continued in the future.
Contents
Abstract ii
1 Introduction 1
1.1 Computational Intelligence Paradigms . . . . . . . . . . . . . . . . . 1
1.2 Short History of Computational Intelligence . . . . . . . . . . . . . . 2
1.3 Applications and Improvements in My Study . . . . . . . . . . . . . 4
2 Traditional Computational Intelligence 8
2.1 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Evolutionary Computation . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Swarm Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.1 Ant colony optimization algorithms . . . . . . . . . . . . . . . 13
2.3.2 Particle swarm optimization algorithm . . . . . . . . . . . . . 14
2.4 Artificial Immune Systems . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 Fuzzy Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3 Dendritic Neural Model: Computation Capacity 18
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2 Proposed single neural model based on dendritic structure . . . . . . 20
3.2.1 Synaptic layer . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2.2 Branch layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.3 Membrane layer . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.4 Soma layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.5 Neuronal-pruning Function . . . . . . . . . . . . . . . . . . . . 23
3.3 Error Back-propagation Learning algorithm . . . . . . . . . . . . . . 24
3.4 Experimental results and discussion . . . . . . . . . . . . . . . . . . . 26
3.4.1 Performance comparison . . . . . . . . . . . . . . . . . . . . . 26
3.4.1.1 Convergence comparison . . . . . . . . . . . . . . . . 26
3.4.1.2 Classification accuracy comparison . . . . . . . . . . 27
3.4.2 The synaptic and dendritic morphology after learning . . . . . 27
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4 Dendritic Neural Model: Immunological Learning Algorithm 29
4.1 Research Background . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2 Single Dendritic Neural Model for Morphology Prediction . . . . . . . 31
4.3 Artificial Immune Training Algorithm . . . . . . . . . . . . . . . . . . 36
4.3.1 Immunological Inspiration . . . . . . . . . . . . . . . . . . . . 36
4.3.2 Training Algorithm based on Immune Mechanisms . . . . . . 39
4.4 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4.1 Experiments Setup . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4.2 Results Analysis and Discussions . . . . . . . . . . . . . . . . 41
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5 Dendritic Neural Model: Classification Ability 46
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2 Backgrounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.2.1 ANN in medical diagnosis . . . . . . . . . . . . . . . . . . . . 48
5.2.2 The discovery of synaptic nonlinearity in single neuron . . . . 49
5.3 Single Dendritic Neural Model for Classification . . . . . . . . . . . . 51
5.4 Learning algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.5 Experimental results and discussion . . . . . . . . . . . . . . . . . . . 57
5.5.1 Experimental environment and evaluation metrics . . . . . . . 57
5.5.2 The liver disease database description . . . . . . . . . . . . . . 58
5.5.3 Experimentation setup and results . . . . . . . . . . . . . . . 59
5.5.3.1 Optimal parameters setting . . . . . . . . . . . . . . 59
5.5.3.2 Performance comparison . . . . . . . . . . . . . . . . 61
5.5.3.3 Convergence properties . . . . . . . . . . . . . . . . . 64
5.5.3.4 ROC analysis . . . . . . . . . . . . . . . . . . . . . . 64
5.5.4 The final synaptic and dendritic morphology . . . . . . . . . . 65
5.6 Conclusions and Remarks . . . . . . . . . . . . . . . . . . . . . . . . 67
6 Evolutionary Model: Chaotic Gravitation Search 71
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
6.2 Overview of GSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.3 Chaotic maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.3.1 Logistic map . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.3.2 Piecewise linear chaotic map . . . . . . . . . . . . . . . . . . . 76
6.3.3 Gauss map . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.3.4 Sinusoidal map . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.3.5 Sinus map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.4 Chaotic gravitational search algorithm . . . . . . . . . . . . . . . . . 78
6.5 Numerical simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.5.1 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . 80
6.5.2 Results and discussions . . . . . . . . . . . . . . . . . . . . . . 81
6.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7 Evolutionary Model: Multi-objective Differential Evolution 89
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.2 Brief Introduction to DE . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.3 Design of multi-objective differential evolution algorithm . . . . . . . 92
7.4 Simulation and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8 Conclusions 100
Bibliography 103
Acknowledgements 123
List of Figures
2.1 A biological neuron. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 McCulloch-Pitts neuron model. . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Graphical representation of multi-layer perceptron. . . . . . . . . . . . 10
3.1 The architecture of the proposed dendritic neuron model. . . . . . . . . 20
3.2 Four connections. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3 Evolution of predicted dendrite structure by neural pruning. . . . . . . 24
3.4 Convergence graphs obtained by the proposed dendritic neuron model
and BPNN. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.5 Predicted dendrite structure by neural pruning obtained by the proposed
model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.1 Schema of a neuron model with dendritic branches. Axons of presynaptic
neurons (input X) connect to branches of dendrites (horizontal blue
lines) by synaptic layers (black triangles); the membrane layer (vertical
blue lines) sums the dendritic activations, and transfers the sum to the
soma body (black sphere). . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2 Four connection states of synaptic layers. The left figure shows the state
before training; each synaptic layer will land on one of the right four
connection states through training, which constitutes the structure of ALMN. 33
4.3 Six function cases of the synaptic layer. The graph’s horizontal x axis
represents the inputs of presynaptic neurons; the vertical y axis shows
the output of the synaptic layer. Because the range of x is [0,1], only
the corresponding part needs to be observed. . . . . . . . . . . . . . . 34
4.4 Evolution of predicted dendrite structure by neural pruning. . . . . . . 37
4.5 Biological immune procedures used as the training algorithm for single
dendritic neural model. . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.6 Mutation operators used in the artificial immune training algorithm. . 40
4.7 Final dendritic morphology of the XOR problem after training. . . . . 43
5.1 The architecture of the proposed dendritic neuron model. . . . . . . . . 51
5.2 Six function cases of the synaptic layer. . . . . . . . . . . . . . . . . . 52
5.3 Evolution of predicted dendrite structure by neural pruning. . . . . . . 54
5.4 Confusion matrix. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.5 Comparison of convergence speed of NMSN and BPNN. . . . . . . . . 65
5.6 The ROC curves of NMSN and BPNNs. . . . . . . . . . . . . . . . . . 66
5.7 The AUC values of NMSN and BPNNs. . . . . . . . . . . . . . . . . . 67
5.8 The evolution of the neuronal morphology. . . . . . . . . . . . . . . . . 70
6.1 The distribution of x under certain system parameters in 20000 itera-
tions when x0 = 0.74 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.2 Statistical values of the final best-so-far solution obtained by the six
algorithms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.3 The average fitness trendlines of the best-so-far solution found by the
six algorithms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.4 The ratio of best-so-far solutions found by the six algorithms. . . . . . 84
7.1 The general flow chart of the proposed adaptive mutation based multi-
objective differential evolution (IDE). . . . . . . . . . . . . . . . . . . . 93
7.2 Pareto fronts obtained by IDE and its competitor algorithm MDE on
ZDT1, ZDT2, ZDT3, ZDT4, and ZDT6 respectively. . . . . . . . . . . 97
List of Tables
3.1 Parameter setting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Exclusive OR problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 Classification accuracy. . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.1 Target XOR training data. . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 The training data set of slope stability classification problem. . . . . . 44
4.3 The test data set of slope stability classification problem . . . . . . . . 45
4.4 Average final least squared error after learning using BP and artificial
immune algorithm for XOR and slope stability. . . . . . . . . . . . . . 45
5.1 Terms used to define sensitivity, specificity and accuracy. . . . . . . . . 58
5.2 Basic features for Liver Disorders. . . . . . . . . . . . . . . . . . . . . . 59
5.3 No. of patterns in the training and testing data set. . . . . . . . . . . . 59
5.4 Parameter levels in NMSN. . . . . . . . . . . . . . . . . . . . . . . . . 60
5.5 L16(4^5) orthogonal array and factor assignment. . . . . . . . . . . . . 61
5.6 Structures of NMSN and BPNN for Liver disorders dataset. . . . . . . 62
5.7 Classification results by NMSN and BPNN. . . . . . . . . . . . . . . . 62
5.8 Comparison of the simulations results between NMSN and BPNN. . . 63
5.9 Classification accuracies for BUPA Liver Disorders problem obtained by
other methods in literature. . . . . . . . . . . . . . . . . . . . . . . . . 69
6.1 The function name, definition, dimension, feasible interval of variants,
and the known global minimum of six benchmark functions. . . . . . . . 86
6.2 Statistical results of different methods for Sphere function (f1). . . . . 87
6.3 Statistical results of different methods for Schwefel function (f2). . . . 87
6.4 Statistical results of different methods for Rosenbrock function (f3). . 87
6.5 Statistical results of different methods for Schwefel 2.26 function (f4). 87
6.6 Statistical results of different methods for Ackley function (f5). . . . . 88
6.7 Statistical results of different methods for Griewank function (f6). . . 88
7.1 Comparison of the convergence metric between IDE and MDE. . . . . 96
7.2 Comparison of the diversity metric between IDE and MDE. . . . . . . 96
7.3 Comparison of the convergence metric among IDE, NSGA-II, SPEA2,
and MOEO. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.4 Comparison of the diversity metric among IDE, NSGA-II, SPEA2, and
MOEO. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Chapter 1
Introduction
1.1 Computational Intelligence Paradigms
The design of algorithmic models has become a major thrust in the development of
methods to solve problems that have become more and more complicated. Great suc-
cesses have been achieved through the modeling of natural and biological intelligence.
Computational intelligence paradigms include artificial neural networks, evolutionary
computation, swarm intelligence, artificial immune systems, and fuzzy systems [1].
They form part of the field of Artificial Intelligence, together with logic, deductive
reasoning, expert systems, case-based reasoning and symbolic machine learning sys-
tems. Computational Intelligence (CI) studies adaptive mechanisms that enable intelligent
behavior in complex and changing environments. These mechanisms exhibit an ability to
generalize, abstract, discover, and make sense of new situations. Every computational
intelligence paradigm has its origins in biological systems. Neural networks model
biological neural systems, evolutionary computation originated from natural evolu-
tion (including behavioral and genetic evolution), swarm intelligence models the social
behavior of organisms living in swarms or colonies, artificial immune systems model
the human immune system, and fuzzy systems originated from studies of the way
organisms interact with their environment [1].
1.2 Short History of Computational Intelligence
The first definition of artificial intelligence was established only in the 1950s by Alan
Turing. Turing studied how machinery could be used to mimic processes of the
human brain, which resulted in one of the first publications of AI, named Intelligent
Machinery [1].
The term artificial intelligence was first coined at the Dartmouth conference in
1956, organized by John McCarthy, who is regarded as the father of artificial
intelligence. From 1956 to 1969 there was much research on modeling biological
neurons, of which the most notable were the work on perceptrons by Rosenblatt,
and the adaline by Widrow and Hoff. In 1969, Minsky and Papert caused a great
setback to artificial neural network research, concluding that the extension of simple
perceptrons to multilayer perceptrons is sterile. Research in neural networks remained
stagnant until the mid-1980s, when it was resurrected by
landmark publications from Hopfield, Hinton, Rumelhart, and McClelland. Research
in neural networks started to explode in the late 1980s, and it is one of the largest
research areas in computer science today [1].
The development of evolutionary computation started in the 1950s, with genetic
algorithms in the study of Fraser, Bremermann and Reed. However, it is John Hol-
land who is generally viewed as the father of evolutionary computation. The works
of evolutionary computation modeled elements of Darwin's theory of evolution al-
gorithmically [2]. Evolution strategies (ES) were developed by Rechenberg in the
1960s, and evolutionary programming was developed independently by Lawrence
Fogel as an approach to develop behavioral models. There are many other important
contributions made by De Jong, Schaffer and other scientists to shape the field of
evolutionary computation.
The history of fuzzy logic is believed by some to start with Gautama Buddha and Bud-
dhism, but the Western community considers Aristotle's study of two-valued
logic to be the birth of fuzzy logic. In 1920 Łukasiewicz published the first deviation
from two-valued logic in his work on three-valued logic, which was later expanded to
an arbitrary number of values. It was Max Black, a quantum philosopher, who first
introduced quasi-fuzzy sets, wherein degrees of membership to sets were assigned to
elements [1]. Lotfi Zadeh, the developer of fuzzy sets, contributed most to the
field of fuzzy logic [3]. Fuzzy systems were an active field until the 1980s, when they
experienced a dark age, but they were revived by Japanese researchers in the
late 1980s. Nowadays fuzzy systems are widely used in many successful applications,
especially in control systems.
Swarm intelligence was first put forward by Eugene N. Marais, a South African
poet, who made great contributions with his works on the social behaviors of apes and
ants, namely The Soul of the White Ant [4] and The Soul of the Ape [5]. Swarm
intelligence was modeled algorithmically in the work of Marco Dorigo on the modeling
of ant colonies in the early 1990s. In 1995, Eberhart and Kennedy [6,7] developed the
particle swarm optimization algorithm, modeling the behaviors of bird flocks. Swarm
intelligence has become a promising research field and has been used to resolve real-
world problems.
The theoretical definition of clonal selection in the natural immune system was
initially made by Burnet [8], describing B-Cells and Killer-T-Cells with antigen-specific
receptors; it was enhanced by the introduction of the concept of a helper T-Cell by
Bretscher and Cohn [9]. Later Lafferty and Cunningham [10] added a co-stimulatory signal
to the helper T-Cell model. Different artificial immune models have been developed
on the basis of a specific theory of immunology or a combination of different
immunology theories. The first model in artificial immune systems was the discrimi-
nation between self and non-self with mature T-Cells introduced by Forrest et al. [11],
using a training technique known as the negative selection of T-Cells [12]. The clonal
selection theory was first applied to optimization problems in the model of
Mori et al. [13]. The network theory of the natural immune system was introduced by
Jerne [14], who proposed that B-Cells are interconnected to form a network of cells [14,15]. The
Jerne theory was further developed by Perelson [15]. The network theory of Jerne
was first modeled mathematically by Farmer et al. [16]. For data mining and data
analysis tasks the network theory has been modeled into artificial immune systems,
of which the earliest artificial immune system research was published by Hunt and
Cooke [17]. The danger theory, based on the co-stimulated model of Lafferty and
Cunningham [10,18,19], was introduced by Matzinger [20,21]. The danger theory
mainly holds that the immune system distinguishes between what is dangerous
and what is non-dangerous in the body. The first work on AISs based on danger theory was
published by Aickelin and Cayzer [22].
1.3 Applications and Improvements in My Study
Our part of the study concentrated on modeling the single neuron and applying
the resulting models to some real-world problems. In the traditional ANN literature,
the prevailing view has been that the brain has strong computational abilities be-
cause of the complex connectivity of neural networks, in which a single neuron could
only perform a linear summation and a nonlinear thresholding operation (all-or-none
response). As a consequence, the contribution of single neurons and their dendrites
has long been overlooked. Recently it has been conjectured by a series of theoreti-
cal studies that individual neurons could act more powerfully as computational units
considering synaptic nonlinearities in a dendritic tree. The various types of synaptic
plasticity and nonlinearity mechanisms allow synapses to play a more important role
in computations. Synaptic inputs from different neuronal sources can be distributed
spatially on the dendritic tree, and neuronal plasticity can result from changes in
synaptic strength or connectivity, as well as in the excitability of the neurons themselves.
Moreover, a slight morphological difference can cause great functional variation,
acting as a filter to determine what signals a single neuron receives and how these
signals are integrated. However, there is no effective model that can capture the non-
linearities among excitatory and inhibitory inputs while predicting the morphology
and its evolution of synapses and dendrites.
We propose a new single neuron model with synaptic nonlinearities in a dendritic
tree. The model has a neuron-pruning function that can reduce
dimensionality by removing useless synapses and dendrites during learning, forming
a precise synaptic and dendritic morphology. The nonlinear interactions in a dendritic tree are
expressed using the Boolean logic AND (conjunction), OR (disjunction) and NOT
(negation). An error back propagation algorithm is used to train the neuron model.
Furthermore, we apply the new model to the Exclusive OR (XOR) problem; it can
solve the problem perfectly with the help of inhibitory synapses, which demonstrates
synaptic nonlinear computation and the neuron's ability to learn.
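The Boolean-logic view of dendritic computation can be sketched as follows: branch multiplication acts as AND, an inhibitory synapse acts as NOT (1 − x), and membrane summation acts as OR. This toy Python sketch is illustrative only, a hand-wired structure rather than the trained model of Chapter 3.

```python
# Illustrative sketch: XOR solved by two dendritic branches.
# Each branch multiplies its synaptic outputs (logical AND); an
# inhibitory synapse inverts its input (logical NOT); the membrane
# sums the branch currents (logical OR); the soma fires all-or-none.

def xor_dendrite(x1, x2):
    branch1 = x1 * (1 - x2)        # x1 AND (NOT x2)
    branch2 = (1 - x1) * x2        # (NOT x1) AND x2
    membrane = branch1 + branch2   # soma sums the branch currents
    return 1 if membrane >= 0.5 else 0  # all-or-none firing

# prints the XOR truth table
for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_dendrite(a, b))
```

Without the inhibitory (NOT) synapses, the product of the raw inputs alone cannot separate the XOR classes, which is exactly the point the text makes.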
The previous works have established an approximate neuronal model which is
able to capture the nonlinearities among excitatory and inhibitory inputs and thus
successfully predict the morphology of neurons when performing specific learning
tasks. The gradient-based back-propagation (BP) method has been used to train the
dendritic neuron model. Due to its inherent local-optima trapping problem, the BP
method usually cannot find satisfactory solutions. In the following work, we proposed
an artificial immune algorithm to train the dendritic neuron model. In comparison to
BP, the artificial immune algorithm has two advantages: the training process does not
require gradient information, which enables the dendritic model to utilize
non-conventional transfer/activation functions in the soma, and the learning can be
accomplished based on a population of antibodies, which permits potentially parallel
computation and greatly improves the probability of jumping out of local
optima during training. Experimental results based on the famous XOR problem and
a geotechnical engineering problem verified the effectiveness of the proposed artificial
immune algorithm.
We also applied the proposed new single neuron model (NMSN) to liver disease diag-
nosis. ANNs have provided a powerful tool for physicians to analyze, compute, and figure
out complex data across many medical applications. The advent of ANN brought the
hope to improve diagnostic accuracy with its ability to capture complex nonlinear
and multidimensional relationships among variables. The single neuron model with
synaptic nonlinearities (NMSN) that we propose simulates the essence of nonlinear
interactions among synaptic inputs in the dendrites. We assume that each branch
receives signals at its synapses and performs a multiplication of these signals, while the
synapses perform a sigmoidal nonlinear operation on their inputs. The branching
point sums up the multiplied inputs, and the current is then transmitted to the cell
body (soma). Once the threshold is exceeded, the cell fires and sends signals down
to other neurons through the axon. The performance of NMSN was verified on
the liver disease diagnostic problems. Experimental results suggested that NMSN
was superior to the traditional BPNN with a similar computational architecture
(denoted as BPNN-15) or with the best performance (namely BPNN-40), in terms
of classification accuracy, convergence properties, and AUC criterion. In addition,
NMSN also produced better or competitive solutions compared with a number of
previously proposed methods, such as SVM, C4.5, classification tree, KNN, neuro-fuzzy
model, etc. NMSN has a distinct ability of pattern extraction through the pruning
function, which is a metaphor of the neuronal morphology.
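The signal flow just described (sigmoidal synapses, multiplicative branches, a summing membrane, a thresholding soma) can be sketched in a few lines. The parameter names `w`, `q`, the sigmoid steepness `k`, and the soma threshold `theta` are illustrative assumptions; the dissertation's exact NMSN formulation appears in Chapters 3 and 5.

```python
import math

def sigmoid(x, k=5.0):
    """Sigmoidal nonlinearity with steepness k (k = 5 is an assumption)."""
    return 1.0 / (1.0 + math.exp(-k * x))

def nmsn_forward(inputs, w, q, k_soma=5.0, theta=0.5):
    """One forward pass of a dendritic-neuron sketch.

    inputs: presynaptic signals in [0, 1]
    w, q:   per-branch lists of synaptic weights and thresholds
    """
    branch_outputs = []
    for wj, qj in zip(w, q):
        # each synapse applies a sigmoidal nonlinearity to its input
        syn = [sigmoid(wi * x - qi) for x, wi, qi in zip(inputs, wj, qj)]
        # the branch multiplies its synaptic outputs (nonlinear interaction)
        prod = 1.0
        for s in syn:
            prod *= s
        branch_outputs.append(prod)
    membrane = sum(branch_outputs)            # membrane sums branch currents
    return sigmoid(membrane - theta, k_soma)  # soma fires past the threshold
```

Pruning, in this picture, amounts to deleting synapses whose output is constant (near 0 or 1 for all inputs) and branches whose product is thereby fixed, which is how the model arrives at a minimal dendritic morphology.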
By learning a larger-than-necessary initial network, and thereafter screening out
useless synapses and unnecessary dendrites, NMSN can finally produce a neuron
with the least necessary dendritic morphology. The resultant neuron not only possesses
significantly higher computational capacity than the traditional McCulloch-Pitts linear
neuron model, which is incapable of solving even the simple 3-bit parity problem, but
also provides a possible information-processing mechanism of neuronal morphology
and plasticity. These findings might also give some insights into the
development of new techniques for understanding the mechanisms and constructions
of single neurons.
The other part of our study focused on the gravitational search algorithm (GSA) in
dealing with complex optimization problems. Because it still has some drawbacks,
such as slow convergence and the tendency to become trapped in local minima, we
used chaos, which is generated by the logistic map and has the properties of ergodicity
and stochasticity, in combination with GSA to enhance its searching performance. In
our work, four other chaotic maps are utilized to further improve the searching
capacity of the hybrid chaotic gravitational search algorithm (CGSA), and six widely
used benchmark optimization instances are chosen from the literature as the test suite.
Simulation results indicate that all five chaotic maps can improve the performance of
the original GSA in terms of the solution quality and convergence speed. Moreover,
the four newly incorporated chaotic maps exhibit better influence on improving the
performance of GSA than the logistic map, suggesting that the hybrid searching dy-
namics of CGSA is significantly affected by the distribution characteristics of chaotic
maps.
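As a minimal illustration of the chaotic sequences involved, the logistic map below generates the ergodic values that a CGSA-style chaotic search could use to perturb candidate solutions. The control parameter μ = 4, the perturbation radius r, and the coupling scheme shown are assumptions for illustration, not the exact update rule of Chapter 6.

```python
def logistic_map(x0, n, mu=4.0):
    """Generate n values of the logistic map x_{t+1} = mu * x_t * (1 - x_t)."""
    xs, x = [], x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        xs.append(x)
    return xs

def chaotic_perturb(solution, chaos_values, r=0.1):
    """Illustrative chaotic local-search step: shift each dimension by a
    chaotic value rescaled from [0, 1] into [-r, r]."""
    return [xi + r * (2.0 * c - 1.0) for xi, c in zip(solution, chaos_values)]
```

The other four maps named in Section 6.3 (piecewise linear, Gauss, sinusoidal, sinus) would slot in as drop-in replacements for `logistic_map`, differing only in the distribution of the generated values, which is precisely the property the comparison in Chapter 6 examines.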
We also worked on evolutionary algorithms, especially differential evolution
(DE), which is well known as a powerful and efficient population-based stochastic
real-parameter optimization algorithm over continuous space. DE has recently been
shown to outperform several well-known stochastic optimization methods in solving
multi-objective problems. Nevertheless, its performance is still limited in finding
uniformly distributed and near-optimal Pareto fronts. To alleviate such limitations,
we introduced an adaptive mutation operator to avoid premature convergence by
adaptively tuning the mutation scale factor F, and adopted an ε-dominance strategy to update the
archive that stores the non-dominated solutions. Experiments based on five wide-
ly used multiple objective functions are conducted. Simulation results demonstrate
the effectiveness of our proposed approach with respect to the quality of solutions in
terms of the convergence and diversity of the Pareto fronts.
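A minimal sketch of the DE building blocks just described; the DE/rand/1 mutation is the standard operator, while the linear decay schedule for F is an illustrative stand-in for adaptive tuning, not the exact rule used in our work:

```python
import random

def de_rand_1(pop, i, f):
    """DE/rand/1 mutation: v = x_r1 + F * (x_r2 - x_r3), with r1, r2, r3
    mutually distinct indices different from the target index i."""
    candidates = [j for j in range(len(pop)) if j != i]
    r1, r2, r3 = random.sample(candidates, 3)
    return [a + f * (b - c) for a, b, c in zip(pop[r1], pop[r2], pop[r3])]

def scale_factor(gen, max_gen, f_max=0.9, f_min=0.3):
    """Illustrative adaptive F: large early (exploration), small late (exploitation)."""
    return f_max - (f_max - f_min) * gen / max_gen
```

A large F early in the run encourages wide exploration of the search space; shrinking F over generations concentrates the mutation steps around the emerging Pareto front.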
Chapter 2
Traditional Computational Intelligence
2.1 Artificial Neural Networks
The brain is composed of approximately 10^11 neurons with more than 10^15 connections
between them. Though variable in size and shape, all neurons are composed of
three parts: the cell body, the axon and the dendrites, as illustrated in Fig. 2.1. Den-
drites receive input from other neurons, including but not limited to direct input from
the sensory system involved, at a connection called a synapse and then transmit the
message to the cell body directly or via dendrites. When the net excitation achieves
a threshold value, the neuron fires and sends signals to other neurons through the
axon. A neuron can either inhibit or excite a signal [23].
The brain is able to perform tasks such as pattern recognition and perception
much faster than a computer. The brain can also learn, memorize and generalize.
Current successes in neural modeling lie in solving specific tasks with small artificial
neural networks. An artificial neural network (NN) is a layered network of artificial
neurons. Within the constraints imposed by modern computing power and storage
space, tasks with a single objective can be solved quite easily by neural networks of
suitable size [23].
An artificial neural network (ANN) is a mathematical representation of the human
neural architecture, reflecting its “learning” and “generalization” abilities. In the
Figure 2.1: A biological neuron.
Figure 2.2: McCulloch-Pitts neuron model.
1940s, Warren McCulloch and Walter Pitts explored the computational abilities of
mathematical models of neural networks made up of simple neurons. These networks
can compute any finite basic Boolean logical function. The McCulloch-Pitts neuron
model, which multiplies the input vector by a weight vector and then passes the
result through a linear threshold gate (see Fig. 2.2), has been widely used as a basic
unit for modern studies of neural networks. Such neurons can learn arbitrary linearly
separable dichotomies of the input space through adjusting the weights and thresholds
of synapses.
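The McCulloch-Pitts unit described above can be sketched in a few lines; the AND and OR weight settings are standard textbook examples:

```python
def mcp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: weighted sum of binary inputs, then a hard threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Logical AND: both inputs must be active to reach the threshold of 2.
and_gate = lambda a, b: mcp_neuron([a, b], [1, 1], 2)
# Logical OR: a single active input already reaches the threshold of 1.
or_gate = lambda a, b: mcp_neuron([a, b], [1, 1], 1)
```

No weight/threshold setting of a single such unit realizes XOR, which is exactly the linear-separability limitation discussed later in this dissertation.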
However, such neural networks were considered too inflexible to be applied
as models of cognition because of their inability to generalize. The development of
ANNs was promoted by Rosenblatt, who proposed a more flexible method based on
the statistical separability of neurons. Rosenblatt developed a class of
networks known as perceptrons. A typical perceptron is made up of three layers of
cells: an input layer, a hidden layer and an output layer (Fig. 2.3). Inputs in one
Figure 2.3: Graphical representation of a multi-layer perceptron.
layer are connected, fully or partially, to the neurons in the middle layer. These
neurons are then connected to the response layer of neurons in a random way. The
response neurons produce the outputs of the network, but also inhibit each other.
The generalization ability of perceptrons is demonstrated when the response cell
receiving the strongest input inhibits the others, so that its response becomes the
output. In addition, perceptrons were also shown to be capable of learning [23].
Artificial neural networks were first put to practical use by Widrow and Hoff,
who developed ADALINE, a simple neuron similar to the perceptron, and networks
of ADALINEs called MADALINE. Widrow and Hoff also developed the least mean
squares rule, a supervised learning procedure regarded as a precursor to the
backpropagation learning algorithm [23].
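The Widrow-Hoff least mean squares rule mentioned above adjusts the weights in proportion to the output error; a minimal sketch:

```python
def lms_update(w, x, d, eta=0.1):
    """One least-mean-squares step: w <- w + eta * (d - y) * x, where y = w·x
    is the linear output and d is the desired response."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    error = d - y
    return [wi + eta * error * xi for wi, xi in zip(w, x)]
```

Repeated over a training set, this drives the linear output toward the desired response, which is why it is seen as a precursor of backpropagation's gradient step.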
One of the most significant developments in neural networks was the discovery
of a learning algorithm known as backpropagation, which adjusts the values of the
weights in a multi-layer feedforward network. Using the backpropagation learning
algorithm, neural networks became more effective at solving nonlinear problems,
leading to wider adoption for practical problems. Though many learning algorithms
are available for artificial neural networks, depending on their type and practical
application, the backpropagation learning algorithm is the one used most frequently.
Neural networks have been successfully applied to many data-intensive applica-
tions. These applications include: classification, prediction, pattern recognition, con-
trol and so on.
2.2 Evolutionary Computation
Although we can trace the origins of evolutionary computation back to the late 1950’s,
evolutionary computation has really drawn public attentions during the last decade.
However, it did not get enough development at that time for some reasons such as
lack of powerful computer platforms and the defects of previous methods [24].
Evolutionary computation should be considered a general, adaptable concept
for solving difficult optimization problems, since evolutionary search not only offers
flexibility and adaptability to the current task but also combines robust
performance with global search characteristics. There are three closely connected but
separately developed approaches currently implemented: genetic algorithms, evolu-
tionary programming, and evolution strategies.
The genetic algorithm (GA), a search heuristic that mimics the process of natural
selection, was originally proposed as a general model of adaptive processes.
The technique is routinely applied to generate useful solutions to optimization and
search problems, using operators inspired by natural evolution such as inheritance,
mutation, selection, and crossover [1]. Evolutionary programming is similar to genetic
programming, but the structure of the program to be optimized is fixed, while its
numerical parameters are allowed to evolve. Evolutionary programming was originally
used as a learning process aiming to generate artificial intelligence. Finite state
machines (FSM) were evolved and used as predictors on the basis of former
observations. The performance of an FSM could be measured by the prediction
capability of the machine. Currently evolutionary programming has no fixed structure
and it is becoming harder to distinguish from evolution strategies [24].
Evolution strategies, an optimization technique based on ideas of adaptation and
evolution, were initially developed to solve difficult discrete and continuous opti-
mization problems. The neo-Darwinian model of bio-evolution is represented by the
structure of the following evolutionary algorithm.
Algorithm 1– General evolution framework based on the neo-Darwinian model.
Begin:
    t := 0
    initialize M(t)
    evaluate M(t)
    While termination conditions not fulfilled do
        M'(t) := variation[M(t)]
        evaluate[M'(t)]
        M(t+1) := select[M'(t) ∪ M(t)]
        t := t + 1
End
In this algorithm, M(t) denotes a population of n individuals at generation t,
which forms the set of individuals considered for selection. An offspring population
M'(t) of size λ is generated through variation operators. The offspring individuals
are then evaluated by calculating the value of the objective function for each of the
solutions represented by individuals in M'(t), and selection based on the fitness values
is applied to retain better solutions. The better an individual performs under these
conditions, the greater the possibility that the individual will live longer and generate
offspring. The uncertain nature of reproduction leads to a permanent production of
novel genetic information, and thus to the creation of diverse offspring [25–27].
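Algorithm 1 can be instantiated directly. The sketch below minimizes the illustrative objective f(x) = x^2, with Gaussian perturbation as the variation operator and truncation of M'(t) ∪ M(t) as the selection step; the population size, mutation strength and generation count are arbitrary choices for this example:

```python
import random

def evolve(evaluate, n=20, max_gen=100, sigma=0.5):
    """Generic loop of Algorithm 1: variation, evaluation, selection."""
    pop = [random.uniform(-10.0, 10.0) for _ in range(n)]        # initialize M(0)
    for _ in range(max_gen):
        offspring = [x + random.gauss(0.0, sigma) for x in pop]  # M'(t) := variation[M(t)]
        union = pop + offspring                                  # M'(t) ∪ M(t)
        union.sort(key=evaluate)                                 # evaluate
        pop = union[:n]                                          # select the best n
    return pop[0]

best = evolve(lambda x: x * x)
```

Because selection here keeps the best of parents and offspring, the best fitness in the population never worsens between generations, mirroring the survival argument in the paragraph above.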
Evolutionary computation is closely related to some other techniques, such as
neural networks and fuzzy logic, which are usually considered as part of artificial in-
telligence. According to Bezdek [28,29], it is their characteristic of numerical knowl-
edge representation that distinguishes them from traditional artificial intelligence.
Moreover, Bezdek proposed that computational intelligence should have the following
characteristics:
1) numerical knowledge representation;
2) fault tolerance;
3) adaptability;
4) error rate optimality;
5) processing speed comparable to human cognition processes.
2.3 Swarm Intelligence
Swarm intelligence (SI) originated from the observation of the social behavior of
organisms, or the study of colonies. Efficient swarm optimization and clustering
algorithms have been derived from the foraging behavior of ants and the choreography
of bird flocks, such as the ant colony optimization (ACO) algorithms and the particle
swarm optimization (PSO) algorithm. The swarm can always find an optimal
pattern [30–32].
Swarm intelligence models are designed to model simple individual behaviors
and local interactions with neighbors and the environment, for the purpose of
understanding more complicated behaviors that are useful for solving complex
problems, mostly optimization problems.
2.3.1 Ant colony optimization algorithms
An ant can be seen as a stimulus-response agent [33–36]. For ants, the pheromone is
the stimulus: each ant perceives the pheromone concentrations in its local environment
and probabilistically selects the direction with the highest pheromone concentration.
Thus an ant can be considered a simple computational agent, and this simple
behavior of real ants can be modeled algorithmically. The artificial ant decision
process is shown in Algorithm 2, which is executed whenever the ant needs to make
a decision.
Algorithm 2– Artificial Ant Decision Process.
Begin:
    Let r ~ U(0, 1)
    For each potential path A do
        Calculate PA
        If r < PA then
            Follow path A
            Break
    End
End
In Algorithm 2, PA represents the probability that the ant chooses path
A. Ant algorithms have been widely applied to real-world problems such as the
TSP [37–40]. However, ACO algorithms can only be applied to optimization problems
that meet certain requirements, such as the existence of an appropriate graph able to
represent all states and transitions in a discrete search space [41–43].
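Algorithm 2 tests r against each path's probability in turn; the common roulette-wheel reading, sketched below, accumulates the probabilities so that exactly one path is always chosen when the PA values sum to 1:

```python
import random

def choose_path(path_probs):
    """Roulette-wheel form of the artificial ant decision process.

    path_probs maps each candidate path to its selection probability PA
    (assumed to sum to 1). A single draw r ~ U(0, 1) selects the path whose
    cumulative probability interval contains r."""
    r = random.random()
    cumulative = 0.0
    for path, p in path_probs.items():
        cumulative += p
        if r < cumulative:
            return path
    return path  # guard against floating-point round-off

next_hop = choose_path({"A": 0.7, "B": 0.2, "C": 0.1})
```

In an ACO implementation, the probabilities would come from the pheromone concentrations (and typically a heuristic term) on the edges leaving the ant's current node.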
2.3.2 Particle swarm optimization algorithm
Particle swarm optimization (PSO) is a stochastic population-based search algorithm,
based on simulation of two simple social behaviors of individual birds within a flock:
each bird (1) moves toward its closest best neighbor, and (2) moves back to its
experienced best state. These two social behaviors lead all birds to converge on their
best environment state [6, 44,45].
Each individual in the swarm represents a candidate solution of the optimization
problem. In a PSO system, each particle flies through the hyper-dimensional search
space and adjusts its position under the influence of other particles in the swarm.
A particle uses the best positions experienced by itself and its neighbors to move
toward a better solution. Although a particle moves toward an optimum, it still
searches a wider area around the current optimum solution. The performance of
each particle is measured by a fitness function appropriate to the problem to be
solved [46–49]. PSO has been applied to problems including optimization of
mechanical structures, function approximation, clustering, and solving systems of
equations.
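One iteration of the position and velocity update behind these two social behaviors can be sketched as follows; the inertia weight w and acceleration coefficients c1, c2 are typical values from the PSO literature, not parameters taken from this dissertation:

```python
import random

def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One PSO iteration: each particle is pulled toward its personal best
    position (pbest) and the best position found by the swarm (gbest)."""
    for i, pos in enumerate(positions):
        for d in range(len(pos)):
            r1, r2 = random.random(), random.random()
            velocities[i][d] = (w * velocities[i][d]
                                + c1 * r1 * (pbest[i][d] - pos[d])   # cognitive pull
                                + c2 * r2 * (gbest[d] - pos[d]))     # social pull
            pos[d] += velocities[i][d]
```

The random factors r1 and r2 are what keep particles searching a wider area around the current best solution rather than collapsing straight onto it.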
2.4 Artificial Immune Systems
The natural immune system has a powerful pattern matching ability to distinguish
between cells belonging to the body (self) and foreign cells entering the body (non-
self). When encountering an antigen, the natural immune system shows its adaptive
nature by memorizing the antigen's structure so that it can respond more quickly to
future encounters [1].
Artificial immune systems have powerful information processing capabilities such
as pattern recognition, feature extraction, learning and memory. However, the
immune system is highly complicated and is still under active research. Current
artificial immune systems primarily adopt three immunological principles: the
immune network theory, the mechanism of negative selection, and the clonal selection
principle [50,51].
The immune network theory, based on Jerne's idiotypic network theory [14,52,53],
hypothesizes that the immune system maintains a network of interconnected
B-cells for antigen recognition. These cells build a stable network by both stimulating
and suppressing each other in certain ways. Two B-cells are connected if the
affinity between them exceeds a certain threshold, and the connection strength is
directly proportional to the affinity [54].
The negative selection algorithm is composed of three phases: defining self, generating
detectors and monitoring anomalies. The negative selection algorithm originated
from the mechanism that trains T-cells to distinguish antigens and prevents them
from recognizing the cells belonging to the body. A set of (binary) detectors
is generated to detect anomalies [55]. The clonal selection principle [56] describes
how the immune system responds to an antigenic stimulus. It suggests that only those
cells that recognize the antigen are selected to proliferate. The main characteristics
of the clonal selection theory are [57,58]:
1) the new cells are copies of their parents, subjected to somatic hypermutation;
2) the newly differentiated lymphocytes carrying self-reactive receptors are
eliminated;
3) mature cells proliferate and differentiate on interaction with antigens.
Algorithm 3 is a proposal of a basic AIS. Each of the algorithm's parts is briefly
explained next.
Algorithm 3– Basic AIS Algorithm.
Begin:
    Initialize a set of ALCs as population C
    Determine the antigen patterns as training set DT
    While some stopping condition(s) not true do
        For each antigen pattern zp ∈ DT do
            Select a subset of ALCs for exposure to zp, as population S ⊆ C
            For each ALC xi ∈ S do
                Calculate the antigen affinity between zp and xi
            End
            Select a subset of ALCs with the highest calculated antigen affinity as population H ⊆ S
            Adapt the ALCs in H with some selection method, based on the calculated antigen affinity and/or the network affinity among ALCs in H
            Update the stimulation level of each ALC in H
        End
    End
End
Artificial immune systems have been successfully applied in many problem
domains, ranging from network intrusion and anomaly detection to pattern
recognition, data classification, virus detection, and data mining. AIS methods based
on genetic algorithms have been applied to structural optimization problems with
two objectives, in which the optimum solutions are defined as antigens and the rest
of the population is defined as a pool of antibodies [59–61].
2.5 Fuzzy Systems
As our observations and reasoning often include a measure of uncertainty, we need
fuzzy sets and fuzzy logic, which can perform approximate reasoning. With fuzzy
sets, the degree of certainty that an element belongs to a set can be measured. Fuzzy
logic allows reasoning with uncertainty to derive new possible facts [1].
Fuzzy sets are an extension of two-valued sets that handles partial truth, which
enables modeling to accommodate uncertainty. Unlike the elements of classical sets,
the elements of a fuzzy set have a degree of membership in that set, which indicates
the certainty (or uncertainty) of membership. Suppose X is the domain, and x ∈ X
is a specific element of the domain X. The fuzzy set A is characterized by a
membership mapping function
µA : X → [0, 1]  (2.1)
Therefore, for all x ∈ X, µA(x) indicates the certainty that element x belongs to
fuzzy set A. In the case of two-valued sets, µA(x) is either 0 or 1.
For a discrete domain X = {x1, x2, ..., x_nx}, the fuzzy set can be expressed as an
nx-dimensional vector,
A = {(µA(xi)/xi) | xi ∈ X, i = 1, ..., nx}  (2.2)
or, using sum notation,
A = µA(x1)/x1 + µA(x2)/x2 + ... + µA(x_nx)/x_nx = Σ_{i=1}^{nx} µA(xi)/xi  (2.3)
A continuous fuzzy set A is denoted as
A = ∫_X µA(x)/x  (2.4)
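As a concrete instance of the membership mapping in Eq. (2.1): the triangular function below is one common choice of µA, and the fuzzy set "warm temperature" with breakpoints 15, 22 and 30 degrees is purely an illustrative example, not one taken from this dissertation:

```python
def triangular(a, b, c):
    """Return a triangular membership function µA that rises linearly from a
    to its peak at b and falls back to zero at c, so that µA : X -> [0, 1]."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

warm = triangular(15.0, 22.0, 30.0)   # hypothetical fuzzy set "warm temperature"
```

A temperature of 22 degrees then belongs to "warm" with certainty 1, while 18 degrees belongs to it only partially, which is exactly the graded membership a two-valued set cannot express.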
The uncertainty in fuzzy systems should not be confused with statistical
uncertainty. Rather than being based on the laws of probability, nonstatistical
uncertainty is based on vagueness. Statistical uncertainty is altered through
observations, while nonstatistical uncertainty is an intrinsic property of a system
that cannot be altered by observations [1, 3, 62,63].
There are many successful applications using fuzzy systems such as control sys-
tems, braking systems in vehicles, controlling traffic signals, and many others.
Chapter 3
Dendritic Neural Model: Computation Capacity
3.1 Introduction
Neurons are the building blocks of the nervous system. The brain has approximately
10^11 neurons and each neuron may be connected to up to 10,000 other neurons,
passing signals to each other through about 1,000 trillion synaptic connections. The
neuron consists of a cell body (or soma) with branching dendrites, a cell membrane
and an axon, which conducts the nerve signal. The first dominant conceptual model
of neural networks was a single neuron model, the binary McCulloch-Pitts neuron,
which was proposed by McCulloch and Pitts in 1943 [64]. It has been criticized as
oversimplified for not considering the nonlinearities in a dendrite tree, and some rather
elementary computations, such as the Exclusive OR problem, cannot be solved by
a single layer of McCulloch-Pitts neurons according to Minsky and Papert [65].
The prevailing view in the traditional artificial neural networks literature has been
that the brain owes its powerful computational abilities to the complex connectivity
of neural networks, in which a single neuron could only perform a linear summation
and a nonlinear thresholding operation [64]. As a consequence, the contribution of
single neurons and dendrites was neglected for a long time. Dendritic processing
is highly nonlinear, and such dendritic nonlinearities have been hypothesized to
enhance the computational capabilities of a single neuron [66–68]. The synaptic
interaction at the turning point of a branch can be implemented by Boolean logical
operations according to the hypothesis by Koch, Poggio and Torre [69]. It suggested
that the dendritic branch point may sum currents from the dendritic branches, such
that its output would be a logical OR of its inputs, while each of the branches would
perform a logical AND on their synaptic inputs. Moreover, a logical NOT operation
can represent the inversion of a signal. However, it is difficult for Koch's model to
distinguish diverse synaptic and dendritic morphologies in solving specific and
complex problems, since a slight difference in morphology can result in great
functional variation [69].
Thus, structural plasticity mechanisms in synapses and dendrites are needed to
help resolve the problem, and the neuronal pruning methodology, which reflects
neuronal plasticity, has arisen [70–72]. It refers to an essential process by which
useless neurons and synaptic connections are deleted in order to improve the efficiency
of the neurological system. These biophysical phenomena motivated the model
proposed in this chapter.
We propose a new single neuron model of four layers with synaptic nonlinearities
in a dendritic tree: a synaptic layer, a branch layer, a membrane layer and a soma
layer. We assume that each branch receives signals at its synapses and performs a
multiplication of these signals, while the synapses perform a sigmoidal nonlinear
operation on their inputs. The branching point sums up the multiplied inputs and
the current is then transmitted to the cell body (soma). When the threshold is
exceeded, the cell fires and sends signals down to other neurons through the axon.
An error back-propagation algorithm is used to train the neuron model and, according
to the pruning function, useless synapses and dendrites are removed during training,
forming a distinct synaptic and dendritic morphology. Moreover, the nonlinear
interactions in a dendrite tree are expressed using the Boolean logic operations AND,
OR and NOT. Thus, the proposed single neuron model can be used as a single
classifier to deal with the classical Exclusive OR problem, and its effectiveness is
proved by experiment.
The remainder of this chapter is organized as follows. Section 3.2 introduces the
proposed single dendritic neuron model in detail. The model's learning algorithm is
Figure 3.1: The architecture of the proposed dendritic neuron model.
described in Section 3.3. Section 3.4 presents the experimental results and discussion.
Finally, Section 3.5 gives the conclusions of this chapter.
3.2 Proposed single neuron model based on dendritic structure
The architecture of the single neuron model is shown in Fig. 3.1. The neuron is
composed of a set of independent branches and a soma.
3.2.1 Synaptic layer
A set of inputs labeled x1, x2, ..., xI is applied to the neuron, corresponding to signals
conveyed by synapses. Synapses can be either excitatory, tending to cause the cell to
fire and produce an output pulse, or inhibitory. There are four connection
states in the synaptic layer: a direct connection (excitatory synapse), a reverse
connection (inhibitory synapse), a constant-1 connection and a constant-0 connection.
We model the connection types with a sigmoid function. The node
function from the i-th (i = 1, 2, ..., I) input to the m-th (m = 1, 2, ..., M) synaptic
layer is given by
Yim = 1 / (1 + e^(−k(wim·xi − θim)))  (3.1)
where xi is the i-th element of the input vector x1, x2, ..., xI, and its range is [0, 1].
The inputs are transformed into digital signals “0” and “1” in the synaptic layer.
wim and θim denote synaptic parameters, and k represents a positive constant.
θim/wim is the threshold of the synaptic layer. There are six cases of different values
of the synaptic parameters. The synaptic function varies as the values of wim and
θim change, thus exhibiting different connection states. Furthermore, the sigmoid
function is differentiable.
State 1: Direct connection (excitatory synapse)
Case (a): 0 < θim < wim, e.g., wim = 1.0 and θim = 0.5. In the direct connection,
if xi > θim/wim, the output Yim will be 1. This can be explained as follows: if the
input potential is high compared to the threshold, an excitatory postsynaptic
potential (EPSP) occurs as the membrane potential rapidly depolarizes. When
xi < θim/wim, the output Yim will be 0; that is, an inhibitory postsynaptic potential
(IPSP) occurs as the membrane is transiently hyperpolarized [3]. In other
words, no matter how the input changes between 0 and 1, the output equals the
input.
State 2: Inverse connection (inhibitory synapse)
Case (b): wim < θim < 0, e.g., wim = −1.0 and θim = −0.5. In the inverse
connection, if xi > θim/wim, the output Yim will be 0, giving rise to an IPSP that
hyperpolarizes the cell. On the other hand, if xi < θim/wim, the output Yim will be
1, as the postsynaptic membrane is depolarized by generating an EPSP. This
connection can thus be illustrated by the logic NOT operation.
State 3: Constant-1 connection
Case (c1): θim < 0 < wim, e.g., wim = 1.0 and θim = −0.5. Case (c2): θim <
wim < 0, e.g., wim = −1.0 and θim = −1.5. In the constant-1 connection, the
output is constant 1 whether or not the input exceeds the threshold. The
signals from the synapse have nearly no impact on the dendritic layers, as this
behaves like an excitatory synapse that triggers an EPSP whenever an input signal
arrives.
State 4: Constant-0 connection
Case (d1): 0 < wim < θim, e.g., wim = 1.0 and θim = 1.5. Case (d2): wim < 0 <
θim, e.g., wim = −1.0 and θim = 0.5. In the constant-0 connection, the output is
always 0. That is, IPSPs always occur and the postsynaptic membrane remains
hyperpolarized.
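The six parameter cases above partition the (wim, θim) plane into the four connection states; a small sketch that classifies a synapse accordingly:

```python
def connection_state(w, theta):
    """Map synaptic parameters (w, theta) to one of the four connection states,
    following cases (a)-(d2) of Section 3.2.1."""
    if 0 < theta < w:
        return "direct"       # case (a): output follows the input
    if w < theta < 0:
        return "inverse"      # case (b): output is the logical NOT of the input
    if theta < 0 < w or theta < w < 0:
        return "constant-1"   # cases (c1) and (c2): output is always 1
    if 0 < w < theta or w < 0 < theta:
        return "constant-0"   # cases (d1) and (d2): output is always 0
    return "boundary"         # degenerate settings, e.g. w = 0 or theta = 0

assert connection_state(1.0, 0.5) == "direct"     # case (a) example values
assert connection_state(-1.0, -0.5) == "inverse"  # case (b) example values
```

This classification is what the pruning mechanisms in Section 3.2.5 exploit: constant-1 synapses can be bypassed and constant-0 synapses doom their whole branch.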
3.2.2 Branch layer
The branch layer receives signals at the synaptic contact points, performs a
multiplicative computation on these signals, and produces the local potential
Zm = ∏_{i=1}^{I} Yim  (3.2)
The multiplication is equivalent to the logic AND operation, since the values of the
inputs and outputs of the dendrites are either 1 or 0.
3.2.3 Membrane layer
The somatic membrane corresponds to the sublinear summation operation at the
branching points. The summation is nearly equivalent to the logic OR operation,
since the inputs and outputs of the membrane are also either 1 or 0:
V = Σ_{m=1}^{M} Zm  (3.3)
3.2.4 Soma layer
The result of the computation in the membrane layer is delivered to the soma.
The neuron fires when the membrane potential exceeds the threshold. The inputs
and outputs can be expressed with values of either 1 or 0, so we use the sigmoid
operator described below. When θsoma and k are set to 0.5 and 5 respectively, the
output of the neuron is driven close to either 1 or 0.
O = 1 / (1 + e^(−k(V − θsoma)))  (3.4)
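Putting Eqs. (3.1)-(3.4) together, a forward pass of the four-layer model can be sketched as follows; the weight and threshold matrices W and Theta are indexed [branch][input]:

```python
import math

def synapse(x, w, theta, k=5.0):
    """Synaptic layer, Eq. (3.1): sigmoid of w*x - theta."""
    return 1.0 / (1.0 + math.exp(-k * (w * x - theta)))

def forward(x, W, Theta, k=5.0, theta_soma=0.5):
    """Branch products, Eq. (3.2); membrane sum, Eq. (3.3); soma sigmoid, Eq. (3.4)."""
    V = 0.0
    for W_m, Theta_m in zip(W, Theta):             # one branch per row
        Z = 1.0
        for x_i, w, theta in zip(x, W_m, Theta_m):
            Z *= synapse(x_i, w, theta, k)         # soft logical AND
        V += Z                                     # soft logical OR
    return 1.0 / (1.0 + math.exp(-k * (V - theta_soma)))
```

With a single direct-connection synapse (w = 1, θ = 0.5), the output approximately follows the input: `forward([1.0], [[1.0]], [[0.5]])` lands above 0.5, while `forward([0.0], [[1.0]], [[0.5]])` lands below it.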
Figure 3.2: Four connections.
3.2.5 Neuronal-pruning Function
Pruning techniques start by learning a larger-than-necessary network and then
remove the nodes and weights considered redundant [73,74]. In the proposed
single neuron model, there are two pruning mechanisms, namely synaptic pruning
and dendritic pruning. An input is connected to a branch by a direct connection,
an inverted connection, a constant-0 connection, or a constant-1 connection,
as shown in Fig. 3.2.
Synaptic pruning: In the constant-1 connection, the output of the synaptic layer is
always 1. It has no impact on the product computed in the dendrite layer, since any
value multiplied by 1 is itself. Thus a synaptic layer with a constant-1 connection
can be bypassed.
Dendritic pruning: The product is always 0 when there is a constant-0
connection in the dendritic layer. The whole dendrite layer can therefore be
eliminated without influence on the result.
The process of pruning is illustrated in Fig. 3.3. The initial structure has four
synaptic layers, two dendritic layers, a membrane layer and a soma, as shown in Fig.
3.3(a). On the Dendrite-1 layer, the connection state of input x2 is constant 1, so
this synaptic layer can be omitted. On the Dendrite-2 layer, the connection state
of input x3 is constant 0, thus the Dendrite-2 layer should be completely removed
since its output will always be 0. The removed synapses and dendrites are drawn
with dotted lines in Fig. 3.3(b). Fig. 3.3(c) shows the final simplified dendritic
morphology of the neuron, in which only the input x1 on the Dendrite-1 layer
can influence the final output of the soma.
Figure 3.3: Evolution of predicted dendrite structure by neural pruning.
3.3 Error back-propagation learning algorithm
The proposed neuron model is a feed-forward multilayer network, and the functions
of its nodes are all differentiable. Therefore, the back-propagation (BP) algorithm is
employed to learn the connection types of the connection layers of the neuron model.
Using this learning rule, we can readily train the neuron model by minimizing the
least squared error between the actual output O and the desired output T, defined as:
E = (1/2)(T − O)^2  (3.5)
According to the gradient descent learning algorithm, the synaptic parameters wim
and θim are modified in the direction that decreases the value of E. The update
equations are:
Δwim(t) = −η ∂E/∂wim  (3.6)
Δθim(t) = −η ∂E/∂θim  (3.7)
where η is a positive constant representing the learning rate. The partial differentials
of E with respect to wim and θim are computed as:
∂E/∂wim = (∂E/∂O) · (∂O/∂V) · (∂V/∂Zm) · (∂Zm/∂Yim) · (∂Yim/∂wim)  (3.8)
∂E/∂θim = (∂E/∂O) · (∂O/∂V) · (∂V/∂Zm) · (∂Zm/∂Yim) · (∂Yim/∂θim)  (3.9)
The components of the above partial differentials are as follows:
∂E/∂O = O − T  (3.10)
∂O/∂V = k e^(−k(V − θsoma)) / (1 + e^(−k(V − θsoma)))^2  (3.11)
∂V/∂Zm = 1  (3.12)
∂Zm/∂Yim = ∏_{L=1, L≠i}^{I} YLm  (3.13)
∂Yim/∂wim = k xi e^(−k(xi wim − θim)) / (1 + e^(−k(xi wim − θim)))^2  (3.14)
∂Yim/∂θim = −k e^(−k(xi wim − θim)) / (1 + e^(−k(xi wim − θim)))^2  (3.15)
The parameters wim and θim are updated according to the following equations:
wim(t + 1) = wim(t) + Δwim(t)  (3.16)
Table 3.1: Parameter setting.

Method              Parameter setting
The proposed model  η = 0.1, m = 10, epoch = 1000, k = 5, θsoma = 0.5
BPNN                η = 0.1, m = 10, epoch = 1000
Table 3.2: Exclusive OR problem.

X2 \ X1   0   1
0         0   1
1         1   0
θim(t + 1) = θim(t) + Δθim(t)  (3.17)
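Eqs. (3.5)-(3.17) combine into a single gradient-descent step; a sketch of such a step, using the algebraic identities ∂O/∂V = kO(1 − O) and ∂Yim/∂wim = k·xi·Yim(1 − Yim), which are equivalent to Eqs. (3.11) and (3.14):

```python
import math

def sig(u, k=5.0):
    return 1.0 / (1.0 + math.exp(-k * u))

def train_step(x, target, W, Theta, k=5.0, theta_soma=0.5, eta=0.1):
    """One gradient-descent update of all wim and θim per Eqs. (3.5)-(3.17).
    Returns the output O computed before the update."""
    Y = [[sig(w * xi - th, k) for xi, w, th in zip(x, Wm, Tm)]
         for Wm, Tm in zip(W, Theta)]                       # Eq. (3.1)
    Z = [math.prod(Ym) for Ym in Y]                         # Eq. (3.2)
    V = sum(Z)                                              # Eq. (3.3)
    O = sig(V - theta_soma, k)                              # Eq. (3.4)
    dE_dO = O - target                                      # Eq. (3.10)
    dO_dV = k * O * (1.0 - O)                               # Eq. (3.11)
    for m, (Wm, Tm) in enumerate(zip(W, Theta)):
        for i, xi in enumerate(x):
            dZ_dY = Z[m] / Y[m][i] if Y[m][i] else 0.0      # Eq. (3.13)
            slope = k * Y[m][i] * (1.0 - Y[m][i])           # sigmoid slope
            grad = dE_dO * dO_dV * dZ_dY * slope
            Wm[i] -= eta * grad * xi                        # Eqs. (3.6), (3.14), (3.16)
            Tm[i] += eta * grad                             # Eqs. (3.7), (3.15), (3.17)
    return O
```

Repeating such steps over the four XOR patterns, with pruning applied afterwards, is the training procedure evaluated in the next section.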
3.4 Experimental results and discussion
The experimental results for the Exclusive OR (XOR) problem, conducted in
MATLAB (R2013b), are explained in this section. Here, the performance of the
proposed model is compared with that of the classical back-propagation neural
network (BPNN) in terms of mean square error (MSE) and accuracy. Table 3.1
shows the parameter settings of the proposed neuron model and BPNN, in which the
same learning rate and hidden-layer size are used.
The classic Exclusive OR problem has two inputs, and its teacher signal is shown
in Table 3.2.
3.4.1 Performance comparison
3.4.1.1 Convergence comparison
Fig. 3.4 compares the convergence speed of the proposed model and
BPNN. As can be seen in Fig. 3.4, the proposed model attains a lower training
error and a faster convergence rate than BPNN.
Table 3.3: Classification accuracy.

Method             Accuracy
Proposed method    4/4 (100%)
BPNN               3/4 (75%)
Figure 3.4: Convergence graphs obtained by the proposed dendritic neuron model and BPNN.
3.4.1.2 Classification accuracy comparison
The comparison of classification accuracy is shown in Table 3.3. An accuracy of
100% means that the learned output of the neuron model matches the teacher
signal for every pattern. The proposed model succeeds on all four patterns of the
Exclusive OR problem, while BPNN succeeds on only three of the four.
3.4.2 The synaptic and dendritic morphology after learning
We simplify the structure of the dendrites according to the pruning mechanisms: a
synaptic layer with a constant-1 connection can be completely omitted, and a dendritic
layer with a constant-0 connection should be removed. After learning, the neuron
produced the morphology shown in Fig. 3.5. It is interesting to note that dendritic
branches 1, 2, 3, 4, 7, 8, 9 and 10 each contain at least one constant-0 synaptic
connection. Because each branch performs a multiplicative operation over all inputs,
branches 1, 2, 3, 4, 7,
Figure 3.5: Predicted dendrite structure by neural pruning obtained by the proposed model.
8, 9 and 10 can be eliminated, corresponding to the degeneration of those dendritic
branches. Therefore, we can rewrite Fig. 3.5(a) as Fig. 3.5(b).
3.5 Conclusion
In this study, we have presented a new model that captures the nonlinear interaction
among excitatory and inhibitory inputs on dendrites with a multiplicative operation.
Each synapse receives its input and passes it through a sigmoidal nonlinear function.
The output of each synapse is conveyed to the dendritic branches, and each branch
performs a simple multiplication of its inputs. This gives each segment of dendrite its
own computational power. We have demonstrated that the single neuron is capable
of solving the classical Exclusive OR problem and achieves the desired accuracy of
100%. This model may offer fundamental new insight into the neuron's function and
help to predict cell morphology and the spatial distribution of synapses.
Chapter 4
Dendritic Neural Model: Immunological Learning Algorithm
4.1 Research Background
Extremely great number of neurons compose the brain, where the fundamental struc-
ture in each single neuron consists of an axon, a dendrite, a cell membrane, and a cell
body. Probably the most striking feature of a neuron is its characteristic morphology:
dendritic and axonal processes sprout as intricate tree structures to enable connec-
tions with other neurons. Through their dendrites, neurons receive signals from other
neurons, and via their axons they transmit signals to other neurons. Historically,
research on neuronal morphologies has focused more strongly on dendrites because
the larger diameters of their branches make them more amenable experimentally and
dendrites cover a more restricted space compared to axons [75]. Dendrites receive the
vast majority of synaptic inputs to a neuron. The spatial distribution of inputs across
the dendrites can be exploited by neurons to increase their computational repertoire.
The role of dendrites in neural computation has recently received more and more
attention. The exploration of the role of dendrites in neural input integration was
pioneered by Wilfrid Rall. This started in the 1950s with experimental work by Eccles
and others that suggested surprisingly brief membrane time constants for certain cat
spinal motoneurons. Those time constant estimates relied on the assumption that
motoneurons could be described as point neurons and, therefore, that voltage transients
followed exponential time courses. In the literature, many works have reported
on the dendritic computation of a single neuron [69, 76–80].
The traditional and dominant computational model of a single neuron is the binary
McCulloch-Pitts neuron, which has been criticized as oversimplified because it does not
consider nonlinearities in a dendritic tree [81]. The powerful computational
capacity of dendritic processing has been taken into consideration when constructing
more plausible neural models. Specifically, the synaptic interaction at the turning
point of a branch can be implemented by Boolean logical operations according to the
hypothesis of Koch, Poggio and Torre [69]. It suggested that the dendritic branch
point may sum currents from the dendritic branches, such that its output would be a
logical OR of its inputs, while each of the branches would perform a logical AND on
its synaptic inputs. Moreover, a logical NOT operation can represent the inversion
of a signal. However, it is difficult for Koch's model to distinguish diverse synaptic
and dendritic morphologies when solving specific and complex problems, because a slight
difference in morphology can result in great functional variation [69]. Most recently,
we proposed a single four-layered neuron model [68] with synaptic nonlinearities in a
dendritic tree, including a synaptic layer, a branch layer, a membrane layer and a soma
layer. We assumed that each branch received signals at its synapses and performed
a multiplication of these signals, while the synapses performed a sigmoidal nonlinear
operation on their inputs. The branching point summed up each multiplied input, and
the current was then transmitted to the cell body (i.e., soma). When the threshold was
exceeded, the cell fired and sent a signal to other neurons through the axon. An error
back-propagation algorithm was used to train the neuron model, and according to the
pruning function, useless synapses and dendrites were removed during training,
forming a distinct synaptic and dendritic morphology. Moreover, the nonlinear inter-
actions in a dendrite tree were expressed using the Boolean logic operations AND, OR and NOT.
Nevertheless, the error BP algorithm used in the original work suffered from the local
optima problem, which limited the learning capacity and computational plausibility
of the dendritic neural model.
This study aims to propose an effective training algorithm for the dendritic neural
model. The training process of a neural model is an important aspect, especially
for a neural model with nonlinear dendrites, and this process is also considered to be
related to neural plasticity and dendritic morphology [82, 83]. In our previously
proposed dendritic neural model, the training determines not only the
mapping between the input signals from other neurons and the output of the current
neuron through the associated dendrites, but also the final formation of the den-
dritic morphology [68]. Similar to the training process in a multi-layered perceptron,
training a dendritic neural model can also be regarded as a difficult global optimiza-
tion problem, despite the fact that local optimizers are usually applied for training.
Investigation of applying global optimizers to training is well-motivated, since local
optimizers have basically limited capabilities for global optimization. A further mo-
tivation comes from the need to apply transfer function or regularization approaches
that do not satisfy the requirements concerning the availability of gradient informa-
tion. Convergence to a locally optimal solution is a fundamental limitation of any
local search based training approach including BP. Based on above considerations,
we propose an artificial immune algorithm which is inspired from biological immune
systems to train the dendritic neural model. A population of antibodies is generated
and manipulated to optimize the weight and threshold parameters in the synapses
through somatic hyper-mutation and receptor editing operators. After learning, the
final dendritic morphology of the neuron which is capable of handling specific tasks
can be obtained. Two distinct experiments based on the famous XOR problem and
a geotechnical engineering problem demonstrated the effectiveness of the proposed
artificial immune algorithm.
4.2 Single Dendritic Neural Model for Morphology Prediction
To fully realize the sense of locality in a single neuron, local interactions within a
fixed dendritic tree should be considered in the realization of the computation, not
[Figure: inputs x1–x5 connect to dendritic branches M = 1, …, 5, which join at the membrane and feed the soma.]
Figure 4.1: Schema of a neuron model with dendritic branches. Axons of presynaptic neurons (input X) connect to branches of dendrites (horizontal blue lines) via synaptic layers (black triangles); the membrane layer (vertical blue lines) sums the dendritic activations and transfers the sum to the soma body (black sphere).
only for better biological plausibility but also for a more powerful computational
capacity. Such a single neuron model with four layers including a synaptic layer, a
dendrite layer, a membrane layer, and a soma layer was proposed in our previous
work [68]. To make the paper self-explanatory, we describe the details of the model
in the following.
The structure of the dendritic neuron model is illustrated in Fig. 4.1. The synaptic
layer represents the synaptic connections to the dendrite of a neuron, which are imple-
mented by receptors that take in a certain specific ion. When an ion enters
the receptor, the potential of the receptor changes and determines whether the synapse is an
excitatory synapse or an inhibitory synapse. A sigmoid function is used to express
connection states. Its node function from the i-th (i = 1, 2, 3, ..., I) synaptic input to
the m-th (m = 1, 2, 3, ...,M) synaptic layer is expressed by the following equation.
Y_{im} = \frac{1}{1 + e^{-k(w_{im} x_i - q_{im})}}    (4.1)
[Figure: a synaptic sigmoid before training settles into one of four states: direct, inverse, constant 1, or constant 0 connection.]
Figure 4.2: Four connection states of synaptic layers. The left part shows the state before training; through training, each synaptic layer lands on one of the four connection states on the right, which constitutes the structure of ALNM.
where xi is the input part of a synapse, referred to as the presynaptic terminal,
and its range is [0, 1]; wim and qim are connection parameters, and k is set to 5. With
different values of wim and qim, six cases correspond to the four connection states: a
constant 0 connection, a constant 1 connection, an inverse connection and a direct
connection, as shown in Fig. 4.2. Using the synaptic layers, we transform the inputs
into digital signals composed of “0” and “1”. θim is the threshold of the
synaptic layer, calculated as θim = qim/wim.
Direct connection
Case (a): 0 < qim < wim, e.g. : wim = 1.0 and qim = 0.5.
In the direct connection, if xi exceeds the threshold θim, the output is set to 1; if
it is less than θim, the output will be 0. This means that if the input potential is high
compared with the threshold, the synapse is an excitatory one and an excitatory signal
occurs. Conversely, a low potential produces an inhibitory synapse, resulting
in an inhibitory signal.
Inverse connection Case (b): wim < qim < 0, e.g. : wim = −1.0 and
qim = −0.5.
In the inverse connection, contrary to the direct connection, if input xi does not reach
the threshold θim, the output is set to be 1, and it evokes an excitatory signal. If the
input is larger than the threshold θim, the output is 0, and an inhibitory signal will
be triggered by the output. This can be expressed by the logic NOT operation.
Constant 1 connection
[Figure: six sigmoid curves of y versus x over [−2, 2], one per case: (a) direct connection, (b) inverse connection, (c1)(c2) constant 1 connection, (d1)(d2) constant 0 connection.]
Figure 4.3: Six function cases of the synaptic layer. The horizontal x axis represents the inputs of presynaptic neurons; the vertical y axis shows the output of the synaptic layer. Because the range of x is [0, 1], only the corresponding part needs to be observed.
There are two states in the Constant 1 connection.
Case (c1): qim < 0 < wim, e.g. : wim = 1.0 and qim = −0.5;
Case (c2): qim < wim < 0, e.g. : wim = −1.0 and qim = −1.5.
In the Constant 1 connection, whether or not the input exceeds the threshold θim,
the output is always 1. In this connection state, the dendrite layer simply receives
a constant 1 digital signal from the synapse. An excitatory synapse is fixed in this
position; once input signals enter, excitatory output signals are exported.
Constant 0 connection There are two states in the Constant 0 connection.
Case (d1): 0 < wim < qim, e.g. : wim = 1.0 and qim = 1.5;
Case (d2): wim < 0 < qim, e.g. : wim = −1.0 and qim = 0.5.
In these states, the output is 0, independent of the input signal. In this connection
state, the synapse always degenerates into an inhibitory one; output signals remain
inhibitory. The functions of all cases are shown in Fig. 4.3.
Dendritic layer The dendrite layer represents the nonlinear interaction between
synaptic signals on each branch. The multiplication operation has been thought
to play an important role in the processing of neural information in the sensory
systems, where a range of visual and auditory processes are believed to be underpinned
by multiplication [84], [85]. Our model adopts the multiplicative operation in the
dendrite layer. Since the inputs and outputs of the dendrite layers are either 1 or
0, the multiplication becomes exactly equal to the logic AND operation. Here, the
dendritic equation is shown as follows.
Z_m = \prod_{i=1}^{I} Y_{im}    (4.2)
Membrane layer The membrane layer accumulates the sublinear summation of
the signals in each dendritic branch. The inputs and outputs of the membrane layers
are also either 1 or 0; because the threshold of the soma body is set to 0.5, the
summation activates the soma body unless all inputs are 0, just as an OR operation
would; thus, the summation can be substituted by the logic OR operation.
The equation is shown as follows.
V = \sum_{m=1}^{M} Z_m    (4.3)
Soma layer The soma layer represents the soma cell body. The neuron fires
depending on whether or not the membrane potential exceeds the threshold. We
express it using a sigmoid operation of the product terms, which can be described
mathematically by Eq. (4.4).
O = \frac{1}{1 + e^{-k(V - \theta_{soma})}}    (4.4)
where θsoma and k are the parameters of the cell body. When θsoma and k are set to
0.5 and 5, respectively, the output of the neuron will be fixed to either 1 or 0.
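Eqs. (4.1)–(4.4) compose into a single forward pass. Below is a minimal NumPy sketch (the function name and array shapes are our own choices, not from the thesis):

```python
import numpy as np

def dendritic_forward(x, W, Q, k=5.0, theta_soma=0.5):
    """Forward pass of the four-layer dendritic neuron.

    x : (I,) inputs in [0, 1]; W, Q : (I, M) synaptic parameters.
    """
    # Synaptic layer, Eq. (4.1): one sigmoid per synapse
    Y = 1.0 / (1.0 + np.exp(-k * (W * x[:, None] - Q)))  # shape (I, M)
    # Dendrite layer, Eq. (4.2): product of synaptic outputs on each branch
    Z = Y.prod(axis=0)                                   # shape (M,)
    # Membrane layer, Eq. (4.3): sum of the branch outputs
    V = Z.sum()
    # Soma layer, Eq. (4.4): thresholded sigmoid
    return 1.0 / (1.0 + np.exp(-k * (V - theta_soma)))
```

With two branches wired as one direct/inverse pair each (hand-picked parameters such as w = ±1.0, q = ±0.5, in the spirit of Fig. 4.7), this forward pass reproduces XOR when the soma output is thresholded at 0.5.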
Neuronal-pruning Function Axon pruning: for the axon inputs in the Constant
1 connection, the output of the synaptic layer is 1. Because of the multiplication
operation, an arbitrary value times 1 yields itself. This means the synaptic input has
no influence on the result of the dendrite layer; hence, we can completely omit this
synaptic layer input.
Dendritic pruning: once the axon inputs are in the Constant 0 connection, the
output of the dendrite layer is 0, since any value multiplied by 0 yields 0. The multiplication
operation makes the output of the entire dendrite layer 0, regardless of any other synaptic signals
in the dendrite layer. Since the dendrite layer has no influence on the membrane layer,
the entire dendrite layer should be deleted.
With the above approaches, the neural network can complete the neural pruning
procedure, which screens out the useless synapses and unnecessary dendrites to sim-
plify the dendrite structure. Illustratively, we use the above approaches to simplify
the structure in Fig. 4.4(a). The original structure contains four synaptic layers, two
dendrite layers, a membrane layer, and a soma body. On the Dendrite-1 layer, the
connection state of input x2 is Constant 1, so this synaptic layer would be ignored.
On the Dendrite-2 layer, the connection state of input x1 is Constant 0, so the output
of Dendrite-2 will remain 0. Therefore, we discard the entire Dendrite-2, shown by
the dotted line in Fig. 4.4(b). Finally, we find that only input x1 on Dendrite-1 can
influence the final result of the soma body, as shown in Fig. 4.4(c). As such, ALNM
simplifies the dendrite morphology of the neurons using the neural pruning function.
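The two pruning rules can be expressed as a mask over the parameter matrices. A sketch under our own naming (not the thesis code), reusing the case conditions of Fig. 4.3:

```python
import numpy as np

def pruning_mask(W, Q):
    """Return (keep, branch_alive): which synapses and branches survive pruning.

    A constant-1 synapse is omitted (multiplying by 1 changes nothing);
    a constant-0 synapse kills its whole branch (multiplying by 0 gives 0).
    """
    I, M = W.shape
    keep = np.ones((I, M), dtype=bool)
    branch_alive = np.ones(M, dtype=bool)
    for m in range(M):
        for i in range(I):
            w, q = W[i, m], Q[i, m]
            if q < 0 < w or q < w < 0:        # constant-1 connection
                keep[i, m] = False
            elif 0 < w < q or w < 0 < q:      # constant-0 connection
                branch_alive[m] = False
    keep[:, ~branch_alive] = False            # drop every synapse on a dead branch
    return keep, branch_alive
```

Applied to the Fig. 4.4 example (x2 constant-1 on Dendrite-1, x1 constant-0 on Dendrite-2), the mask keeps only x1 on Dendrite-1 and marks Dendrite-2 dead.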
4.3 Artificial Immune Training Algorithm
4.3.1 Immunological Inspiration
Plenty of optimization methods, either local or global, have been applied to train
neural networks' weights and other parameters. These algorithms include BP,
modified BP [86], BP using the conjugate gradient approach [87], the Marquardt algo-
rithm [88], evolutionary algorithms [89], differential evolution [90], particle swarm
optimization [91], etc. Naturally, local searches such as [86–88] are fundamentally
limited to local solutions, while global ones [89–91] attempt to avoid this limitation.
[Figure: three panels: (a) original dendrite structure with synapses x1, x2 on Dendrite-1 and Dendrite-2; (b) structure during neural pruning, with Dendrite-2 shown dotted; (c) final dendrite structure with only x1 on Dendrite-1 feeding the membrane and soma.]
Figure 4.4: Evolution of predicted dendrite structure by neural pruning.
The training performance varies depending on the objective function and underly-
ing error surface for a given problem and network configuration. Since the gradient
information of error surface is available for the most widely applied network configu-
ration, the most popular optimization methods have been variants of gradient based
back-propagation algorithms. Of course, this is sometimes the result of an inseparable
combination of network configuration and training algorithm, which limits the
freedom to choose the optimization method [92–95].
In this study, we attempt to utilize an artificial immune algorithm to train the dendritic
neural network. The immune system is a mobile, dynamic, stable,
highly distributed and collaborative system. Many features and principles in the im-
mune system have been discovered and abstracted for artificial systems. The natural
immune system is a complex pattern recognition device with the main goal of protect-
ing our body from malefic external invaders, called antigens. The primary elements
are the antibodies, which bind to antigens for their posterior destruction by other
[Figure: an antigen is processed by an APC and presented as a peptide–MHC complex to a Th cell; the activated Th cell, regulated by Ts cells via IL+/IL− signals, activates a B cell, which becomes a plasma cell (activated B cell) secreting antibodies; stages (I)–(VI).]
Figure 4.5: Biological immune procedures used as the training algorithm for the single dendritic neural model.
cells. The number of antibodies contained in our immune system is known to be
much smaller than the number of possible antigens, making diversity and individual
binding capability the most important properties to be exhibited by the antibody
repertoire. In the immune system, affinity is an important measure to represent the
fitness of antibody to antigen. When there are detected antigens, the immune system
will choose B cells with higher affinity to proliferate, which is called clonal selection
and proliferation. When the antigens are eliminated, the B cells with lower affinity
will be chosen for elimination. These two procedures make the antibody population
stable. Moreover, the proliferation and elimination are specific to antigens, as they
take actions according to the affinity. Therefore, they also contribute to the diversity.
The key principles of clonal selection theory are:
(1) the clonal selection is based on the affinity;
(2) the clonal proliferation is followed by hypermutation and receptor editing;
(3) the B cells with lower affinity are eliminated after the elimination of antigens.
The general biological immune responses are shown in Fig. 4.5.
4.3.2 Training Algorithm based on Immune Mechanisms
The optimization of the dendritic neuron model is regarded as the antigen, while the
parameters of weights and thresholds in synapses as shown in Eq. 4.1 are treated as
the antibody. Initially, a set of N antibodies are randomly generated where wim are
generated from [−1, 1] and qim are generated in the interval of [−1.5, 1.5]. A number
of n (n < N) fittest antibodies is selected from the initial pool based on the least
squared error function between the actual output O and the desired output T, which
is shown as follows.
E = \frac{1}{2}(T - O)^2    (4.5)
The n selected elitist antibodies are separated into n distinct pools in ascending
order. After selection, the resulting antibodies are regarded as the population A(t)
manipulated in the current generation.
According to the clonal selection theory, the elitist antibodies are proliferated:
the cells divide themselves, creating a set of clones identical to the parent
antibodies. The proliferation rate is directly proportional to the affinity level; the
higher an antibody's affinity, the more readily it is selected for cloning and the
larger the number of clones it produces. The number of clones generated is determined
by the following rule:
n_i = \left\lceil \frac{n - i}{n} \times K \right\rceil    (4.6)
where ⌈·⌉ rounds its argument up to the nearest integer, i is the
ordinal number of the elite pool, and K is a multiplying factor which determines the
scope of the proliferation.
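As a quick check of Eq. (4.6) (the function name is ours):

```python
import math

def clone_counts(n, K):
    """Clones per elite pool i = 1..n, Eq. (4.6): n_i = ceil((n - i) / n * K)."""
    return [math.ceil((n - i) / n * K) for i in range(1, n + 1)]
```

For n = 5 pools and K = 10 this yields [8, 6, 4, 2, 0]: the fittest pool gets the most clones, and under a literal reading of the rule the last pool receives none.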
The mutation operator enables the algorithm to find better solutions generation by
generation. It plays a very important role in the solution evolution process.
There are two types of mutation operators in the clonal selection based model: one is the
hyper-mutation, which mainly performs search in a local domain, and the other is the
receptor editing, which acts as a global search and helps the algorithm jump out of
local minima.
[Figure: an antibody joins the synaptic parameters as P = (W, Q), with W = (wim), i = 1, 2, …, I and Q = (qim), m = 1, 2, …, M. (1) Hyper-mutation operator HM: HM(P) = P + g × Gauss(0, 1). (2) Receptor editing operator RE: RE(Pi, Pj, r) = Pi when i ≠ r and m ≠ R; RE(Pi, Pj, r) = Pj when i = r or m = R.]
Figure 4.6: Mutation operators used in the artificial immune training algorithm.
The details of the mutation operators are illustrated in Fig. 4.6, where
the synaptic parameters are joined together as P = (W, Q). The hyper-mutation
operator HM performs a unary mutation on each antibody by adding a perturbation
sampled from the classic Gaussian distribution. A shrinking parameter g
is used to control the mutation influence, and it reduces gradually over the
generations as follows.
g(t + 1) = α × g(t)    (4.7)
where the shrinking factor α is usually set to 0.95 in the experiments.
The receptor editing operator RE is a binary operator on two antibodies and
actually carries out a crossover-like mutation. The randomly generated number r is
used to select the position where the exchange of antibody points takes place.
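The two operators can be sketched as follows (a simplified interpretation under our own naming: receptor editing is treated as a single-point exchange at a random cut point r):

```python
import random

def hyper_mutate(P, g):
    """HM(P) = P + g * Gauss(0, 1): element-wise Gaussian perturbation (local search)."""
    return [p + g * random.gauss(0.0, 1.0) for p in P]

def receptor_edit(P_i, P_j):
    """Crossover-like global mutation: exchange the tails of two antibodies
    after a randomly chosen cut point r."""
    r = random.randrange(1, len(P_i))
    return P_i[:r] + P_j[r:], P_j[:r] + P_i[r:]
```

Between generations, the mutation radius g is shrunk as g ← α g with α = 0.95, per Eq. (4.7).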
After manipulation by the mutation operators, the fittest antibody is selected from each elite
pool to replace its corresponding parent antibody. If the fitness of the selected
child antibody is higher than that of the parent, replacement takes place; otherwise it does
not. This process leads to a fitter antibody population. The above procedures
are iterated until a terminal condition is satisfied. A simple method is used: a maximum
number of generations T is set. When the current generation reaches T,
the training algorithm is terminated and the best weights and thresholds in synapses
are output. Finally, the resultant dendritic morphology is obtained.
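Putting the pieces together, the whole training procedure can be sketched as a self-contained loop (a simplified sketch under our own naming; receptor editing is omitted for brevity, and `evaluate` stands for the squared error of Eq. (4.5) computed over the training set):

```python
import math
import random

def immune_train(evaluate, dim, N=30, n=5, K=10, T=100, g=1.0, alpha=0.95):
    """Immune training sketch: evaluate(P) returns the error E of a parameter
    vector P = (W, Q); lower error means higher affinity."""
    random.seed(1)  # fixed seed for reproducibility of the sketch
    # Initial antibody pool; [-1.5, 1.5] covers both parameter ranges in the text
    pool = [[random.uniform(-1.5, 1.5) for _ in range(dim)] for _ in range(N)]
    elites = sorted(pool, key=evaluate)[:n]            # the n fittest antibodies
    for _ in range(T):
        for i in range(1, n + 1):
            parent = elites[i - 1]
            n_clones = max(math.ceil((n - i) / n * K), 1)   # Eq. (4.6), at least one
            clones = [[p + g * random.gauss(0.0, 1.0) for p in parent]
                      for _ in range(n_clones)]             # hyper-mutation
            best = min(clones, key=evaluate)
            if evaluate(best) < evaluate(parent):           # replace only if fitter
                elites[i - 1] = best
        g *= alpha                                          # shrinking, Eq. (4.7)
    return min(elites, key=evaluate)
```

On a toy error surface such as `evaluate = lambda P: sum(p * p for p in P)`, the returned error never exceeds that of the best initial antibody, since replacement is accepted only on improvement.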
4.4 Simulation Results
4.4.1 Experiments Setup
Two kinds of experiments are implemented to verify the effectiveness of the proposed
artificial immune algorithm. The first one is the famous Exclusive OR (XOR) problem,
which is frequently utilized as a benchmark test problem because the
traditional single-layered perceptron was demonstrated to fail on this
simple but nonlinear problem. The second experiment is based on the slope stability
analysis [96] which is a practical and important geotechnical engineering problem.
All the experiments are conducted in MATLAB (R2013b).
4.4.2 Results Analysis and Discussions
The training data of XOR is shown in Table 4.1, while the training data set and test
data set of the slope stability analysis problem are summarized in Tables 4.2 and 4.3,
respectively, where Γ is unit weight, C is cohesion, ϕ is friction angle of soil, H is
height of slope, β is slope angle and ru is pore pressure parameter. In this study, we
consider the slope stability analysis problem as a classification problem. The slope
failures are complex natural phenomena that constitute a serious natural hazard in
many countries. Many variables are involved in slope stability evaluation, and the
calculation of the factor of safety requires geometrical data, physical data from the
geologic materials and their shear strength parameters (cohesion and angle of internal
friction), information on pore-water pressures, etc. Engineering assessment of earth
slope stability is usually performed using algorithms in determining its susceptibility
to failure in terms of the factor of safety. Depending on whether the factor of safety
Table 4.1: Target XOR training data.
X1   X2   Desired Output
0    0    0
0    1    1
1    0    1
1    1    0
is greater or less than 1, the slope is considered stable or unstable [96,97].
Both the BP and artificial immune algorithm are used to train the dendritic neuron
model when applied to XOR and the slope stability analysis. The classification of
the slope stability is defined in terms of state of the slope, stable or failed slopes; SS
is taken as 1 and 0 for stable and failed slope, respectively. Such type of analysis is
also made for liquefaction potential evaluation of in situ soil using neural networks.
The factor of safety calculated based on limit equilibrium method is used as the
output for the neural network model developed for predicting the factor of safety.
The comparative results are summarized in Table 4.4.
From Table 4.4, it is clear that the proposed artificial immune algorithm can
produce better solutions than BP when training the dendritic neural model, whether
on the simple but nonlinear XOR problem or on the practical engineering problem. In
addition, we also show a corresponding dendritic morphology predicted by the trained
single neuron model on XOR problem in Fig. 4.7, suggesting that the proposed
algorithm is also capable of predicting the morphologies of neurons.
4.5 Conclusion
In this study, we propose an artificial immune algorithm to train the dendritic neuron
model. Benefiting from the parallel computing mechanism of the population and
requiring no gradient information, the artificial immune algorithm, a global
optimization method, has been verified to be superior to the traditional local optimizer
BP in terms of the average final least squared learning error on two tested
Figure 4.7: Final dendritic morphology of the XOR problem after training.
problems. In the future, we plan to investigate the sensitivities of the user-defined
parameters of the proposed artificial immune algorithm and to apply the proposed
model to a wider variety of problems.
Table 4.2: The training data set of slope stability classification problem.
Γ (kN/m³)  C (kPa)  ϕ (°)  β (°)  H (m)  r_u   FC
18.68      26.34    15     35     8.23   0     0
18.84      14.36    25     20     30.5   0     1
18.84      57.46    20     20     30.5   0     1
28.44      29.42    35     35     100    0     1
28.44      39.23    38     35     100    0     1
20.6       16.28    26.5   30     40     0     0
14.8       0        17     20     50     0     0
14         11.97    26     30     88     0     0
25         120      45     53     120    0     1
18.5       25       0      30     6      0     0
18.5       12       0      30     6      0     0
22.4       10       35     30     10     0     1
21.4       10       30.34  30     20     0     1
22         0        36     45     50     0     0
12         0        30     35     4      0     1
12         0        30     45     8      0     0
12         0        30     35     4      0     1
12         0        30     45     8      0     0
23.47      0        32     37     214    0     0
16         70       20     40     115    0     0
20.41      24.9     13     22     10.67  0.35  1
21.82      8.62     32     28     12.8   0.49  0
20.41      33.52    11     16     45.72  0.2   0
18.84      15.32    30     25     10.67  0.38  1
21.43      0        20     20     61     0.5   0
19.06      11.71    28     35     21     0.11  0
18.84      14.36    25     20     30.5   0.45  0
21.51      6.94     30     31     76.81  0.38  0
14         11.97    26     30     88     0.45  0
18         24       30.15  45     20     0.12  0
23         0        20     20     100    0.3   0
22.4       100      45     45     15     0.25  1
Table 4.3: The test data set of slope stability classification problem
Γ (kN/m³)  C (kPa)  ϕ (°)  β (°)  H (m)  r_u   FC
22.4       10       35     45     10     0.4   0
20         20       36     45     50     0.25  0
20         20       36     45     50     0.5   0
20         0        36     45     50     0.25  0
20         0        36     45     50     0.5   0
22         0        40     33     8      0.35  1
20         0        24.5   20     8      0.35  1
18         5        30     20     8      0.3   1
16.5       11.49    0      30     3.66   0     0
26         150.05   45     50     200    0     1
22         20       36     45     50     0     0
19.63      11.97    20     22     12.19  0.41  0
18.84      0        20     20     7.62   0.45  0
24         0        40     33     8      0.3   1
Table 4.4: Average final least squared error after learning using BP and the artificial immune algorithm for XOR and slope stability.
Algorithm                     XOR    Training data  Testing data
BP                            0.25   0.46           0.75
Artificial immune algorithm   0.15   0.21           0.32
Chapter 5
Dendritic Neural Model: Classification Ability
5.1 Introduction
Liver disease is one of the top 10 leading causes of death; it affects 30 million Amer-
icans of all ages, genders, races and life circumstances, and the number keeps growing
rapidly [98]. There should be greater public awareness about liver health and
early treatment. Although the numbers of deaths caused by cancer and other diseases
are still much greater, liver disease kills people at a much younger age, typically
between 25 and 64 [99]. It is therefore all the more important to find methods to
detect liver disease at an early stage. There are many risk factors, from genetic and
autoimmune to environmental and behavioral [98]. Accurate diagnosis of liver disease
has never been an easy task. The information afforded by patients may include
redundant and interrelated symptoms and signs that complicate the diagnosis
of liver disease, delaying a correct diagnosis decision. Thus it is
imperative to find much more effective and advanced diagnosis methods to identify
multidimensional relationships in clinical data of liver disease, as well as to improve
the accuracy of diagnosis.
There are many kinds of methods for the liver disorders classification problem
including decision trees, ensemble learning, linear regression, naive Bayes, k-nearest
neighbors algorithm, artificial neural network (ANN), support vector machine, etc.
The intelligent system that includes an artificial neural network based expert system
for automatic liver disorders diagnosis is becoming popular among researchers
[100–102]. The ability of the system to approximate complex and non-linear prob-
lems without knowing the mathematical representations of the system and the learn-
ing process that mimic the human brain lead to this popularity. The ANN also out-
performs the conventional statistical technique for the prediction and classification
purposes in various fields of applications, as revealed in [103].
On the basis of the latest research on the properties of neurons [104–108], we
propose a more realistic model of single neuron computation with synaptic nonlin-
earities (NMSN) in a dendritic tree for liver disease diagnosis. By modeling synaptic
nonlinearity with a sigmoid function, we show that such a single neuron is capa-
ble of computing linearly non-separable functions and approximating any complex
continuous function. The nonlinear interactions in a dendrite tree are expressed us-
ing the Boolean logic AND (conjunction), OR (disjunction) and NOT (negation),
instead of executing a complex function calculations. The model is equipped with
a neuron-pruning function that can remove useless synapses and dendrites during
learning, forming a distinct synaptic and dendritic morphology without sacrificing
the predictive accuracy. Thus we can use the model to select features for identify-
ing the underlying causes of disorders, to reduce the number of inputs to save the
diagnosis time and to achieve high classification accuracy of liver disease. We also
develop a back-propagation based learning algorithm capable of modifying synapses
adequately for performing the task. The model is not only able to achieve high accuracy,
sensitivity and specificity rate, but also can provide explanation for its predictions,
thus showing promise as an effective pattern classification method in liver disease
diagnostics.
The remainder of the paper is organized as follows. Section 2 presents some char-
acteristics of classic artificial neural network and its application in medical diagnosis,
therein the discovery of synaptic nonlinearity in single neuron is also specially de-
scribed. Section 3 introduces the proposed neuron model NMSN in details. NMSN’s
learning algorithm is described in Section 4. Section 5 presents the experimental re-
sults using the BUPA liver disorders datasets. Finally, Section 6 gives the discussions
and future works to conclude this paper.
5.2 Backgrounds
5.2.1 ANN in medical diagnosis
An artificial neural network (ANN) is a mathematical representation of the human
neural architecture, reflecting its “learning” and “generalization” abilities. The first
dominant conceptual model of neural networks was a single neuron model called the
McCulloch-Pitts neuron [64]. Trained by the back-propagation (BP) algorithm, the
non-linear processing capabilities of ANNs were demonstrated [1]. ANNs have
been intensively applied for classification tasks in medical diagnosis [109]. Clinical
diagnosis was one of the first areas using ANNs [110]. Due to the ability of predic-
tion, parallel operation, and self-adaptivity, ANN has provided a powerful tool for
physicians to analyze, compute and figure out complex data across many medical
applications. The techniques usually aid disease diagnosis by learning the basic
characteristics to use in the decision-making process, solving a quantitative
classification problem instead of performing a qualitative diagnosis, which is more objective. The
application of ANNs in medical diagnosis has been previously described in general
in [111].
There are many studies using ANNs for liver disease diagnosis. Some of the
typical ones are introduced in the following. Jeatrakul and Wong carried out a
comparison of the classification performance on liver disease of five different
types of neural networks: back-propagation neural network (BPNN),
radial basis function neural network (RBFNN), general regression neural network
(GRNN), probabilistic neural network (PNN), and complementary neural network
(CMTNN). Among them, the best classification accuracy of 70.29% was obtained by
CMTNN [100]. Besides, Zhang et al. [101] proposed new types of single-output and
multi-output Chebyshev-polynomial feed-forward neural networks, named SOCPNN
and MOCPNN, to classify real-world datasets; both methods obtained a testing
accuracy of 66.78%. The best classification accuracy without noise on the liver
disorders dataset was acquired by these two methods (i.e., SOCPNN and
MOCPNN) proposed in [101]. In addition, Seera and Lim used a fuzzy Min-Max
neural network to classify the liver disorders and got an accuracy of 67.25% [102].
It is obvious that much research effort has been devoted to liver disease classification using ANNs. However, few studies have considered single neuron models, which are thought to be unable to solve multidimensional and nonlinear problems.
5.2.2 The discovery of synaptic nonlinearity in single neuron
The McCulloch-Pitts neuron model has been widely used as a basic unit in modern studies of neural networks; it multiplies the input vector by a weight vector and then passes the result through a linear threshold gate. Such neurons can learn arbitrary linearly separable dichotomies of the input space by adjusting the weights and thresholds of their synapses [112].
In the traditional ANN literature, the prevailing view has been that the brain derives its strong computational abilities from the complex connectivity of neural networks, in which a single neuron performs only a linear summation and a nonlinear thresholding operation (all-or-none response) [64]. Under this view, a single neuron model cannot be used in medical diagnosis, as the clinical data generally used for diagnosis are multidimensional and nonlinear in nature [113]. As a consequence, the contribution of single neurons and their dendrites has long been overlooked.
Recently it has been conjectured by a series of theoretical studies that individual
neurons could act more powerfully as computational units considering synaptic non-
linearities in a dendritic tree [69, 114–117]. The various types of synaptic plasticity
and nonlinearity mechanisms allow synapses to play a more important role in com-
putations [106]. Synaptic inputs from different neuronal sources can be distributed
spatially on the dendritic tree, and neuronal plasticity can result from changes in synaptic strength or connectivity, as well as from the excitability of the neurons themselves [105]. Moreover, even a slight morphological difference can cause great functional variation, acting as a filter that determines which signals a single neuron receives and how these signals are integrated [118]. Blomfield proposed a pioneering theory showing
that synaptic interactions in each individual neuron could be additive or multiplicative [114]. Schnupp and King suggested that multiplicative operations may play a key role in neuronal computation [119]. Theoreticians proposed that the nonlinearity of synapses could be used to implement a type of multiplication instead of summation [104]. Koch, Poggio and Torre hypothesized that the synaptic interaction and the action at the branching point of a dendrite can be implemented by Boolean logical operations [69]. They suggested that the dendritic branch point may sum currents from the dendritic branches, such that its output would be a logical OR of its inputs, while each branch would perform a logical AND on its synaptic inputs. Moreover, a logical NOT operation can represent the inversion of a signal.
However, the so-called Koch's model [69] still has difficulty accounting for the diverse synaptic and dendritic morphologies needed to solve specific and complex problems [105], such as liver disorders diagnosis. Thus, structural plasticity mechanisms in synapses and dendrites, including the formation and elimination of synapses and dendrites in neural circuits, are needed to resolve the problem and acquire a branch-specific morphology.
In recent years, considerable effort has been directed towards neuron pruning methodology [70–72], which is one way to reflect neuron plasticity. It refers to an essential process by which extra neurons and synaptic connections are removed in order to improve the efficiency of the neurological system. These biophysical phenomena motivate the model proposed in this chapter.
Figure 5.1: The architecture of the proposed dendritic neuron model (inputs x1, x2, ..., xI feed synapses on Dendrites 1 to M; the dendrite outputs converge through the membrane to the soma).
5.3 Single Dendritic Neural Model for Classification
The single neuron model with synaptic nonlinearities (NMSN) proposed in this chapter
simulates the essence of nonlinear interactions among synaptic inputs in the den-
drites. We assume that each branch receives signals at their synapses and performs a
multiplication of these signals, while the synapses perform a sigmoidal nonlinear op-
eration on their inputs. The branching point sums up each multiplied input and then
the current is transmitted to the cell body (soma). Once the threshold is exceeded, the cell fires and sends a signal down to other neurons through the axon. The architecture
of NMSN can be simply expressed by four layers: a synaptic layer, a branch layer,
a membrane layer and a soma layer, as shown in Fig. 5.1, where M dendrites are
associated with a neuron, and each dendrite receives I signals from other neurons.
Arrows in Fig. 5.1 indicate the direction of the information processing. The details
of the model are described in the following.
A synapse refers to the connection between neurons at a terminal bouton of a
dendrite to another dendrite/axon or the soma of another neural cell. The direction
of information flow is feedforward, from the presynaptic neuron to the postsynaptic neuron. The synapse can be either excitatory or inhibitory, depending on the changes in the postsynaptic potential caused by ionotropic receptors [104].

Figure 5.2: Six function cases of the synaptic layer: (a) direct connection, (b) inverse connection, (c1)-(c2) constant-1 connection, (d1)-(d2) constant-0 connection.

There are four connection states in the synaptic layer: a direct connection (excitatory synapse), an inverse connection (inhibitory synapse), a constant-1 connection, and a constant-0 connection. We model each type of connection with a one-input one-output sigmoid function. The node function from the i-th (i = 1, 2, 3, ..., I) input to the m-th (m = 1, 2, 3, ..., M) synaptic layer is given by
Y_{im} = \frac{1}{1 + e^{-k(w_{im} x_i - \theta_{im})}}    (5.1)
where xi is the presynaptic input, one of a set of inputs labeled x1, x2, ..., xI, with range [0, 1]. The inputs are transformed into digital signals “0” and “1” in the synaptic layer. wim denotes a synaptic parameter, k represents a positive constant, and θim/wim is the threshold of the synaptic layer. There are six cases for the different values of the synaptic parameters: as the values of wim and θim change, the synaptic function varies accordingly, thus exhibiting different connection states. Furthermore, the sigmoid function is clearly differentiable. The functions of all six cases are shown in Fig. 5.2.
State 1: Direct connection (Excitatory synapse)
Case (a): 0 < θim < wim, e.g., wim = 1.0 and θim = 0.5. In the direct connection, if xi > θim/wim, the output Yim will be 1. This can be explained as follows: when the input potential is high compared to the threshold, an excitatory postsynaptic potential (EPSP) occurs as the membrane potential rapidly depolarizes. When xi < θim/wim, the output Yim will be 0; that is, an inhibitory postsynaptic potential (IPSP) has occurred as the membrane is transiently hyperpolarized [104]. In other words, no matter how the inputs change between 0 and 1, the output equals the input.
State 2: Inverse connection (Inhibitory synapse)
Case (b): wim < θim < 0, e.g., wim = −1.0 and θim = −0.5. In the inverse connection, if xi > θim/wim, the output Yim will be 0, giving rise to an IPSP that hyperpolarizes the cell. On the other hand, if xi < θim/wim, the output Yim will be 1, as the postsynaptic membrane is depolarized by generating an EPSP. Thus it can be illustrated by the logic NOT operation.
State 3: Constant-1 connection
Case (c1): θim < 0 < wim, e.g., wim = 1.0 and θim = −0.5. Case (c2): θim < wim < 0, e.g., wim = −1.0 and θim = −1.5. In the constant-1 connection, the output is constantly 1 whether or not the input exceeds the threshold. The signals from such a synapse have almost no impact on the dendritic layers, as this excitatory synapse triggers an EPSP whenever an input signal arrives.
State 4: Constant-0 connection
Case (d1): 0 < wim < θim, e.g., wim = 1.0 and θim = 1.5. Case (d2): wim < 0 < θim, e.g., wim = −1.0 and θim = 0.5. In the constant-0 connection, the output is always 0; that is, IPSPs always occur and the postsynaptic membrane stays hyperpolarized.
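The four connection states above can be checked numerically. Below is a minimal Python sketch (illustrative, not part of the original study) of the node function in Eq. (5.1); with a steep slope k the sigmoid approximates a step at x = θim/wim, so the example parameter pairs from cases (a), (b), (c1) and (d1) reproduce the direct, inverse, constant-1 and constant-0 behaviours:

```python
import math

def synapse(x, w, theta, k=5.0):
    """Synaptic node function Y = 1 / (1 + exp(-k*(w*x - theta))), Eq. (5.1)."""
    return 1.0 / (1.0 + math.exp(-k * (w * x - theta)))

K = 50.0  # a steep slope makes the 0/1 behaviour easy to see

# (a) direct connection: 0 < theta < w -> output follows the input
assert synapse(0.9, w=1.0, theta=0.5, k=K) > 0.99   # x > theta/w -> ~1
assert synapse(0.1, w=1.0, theta=0.5, k=K) < 0.01   # x < theta/w -> ~0

# (b) inverse connection: w < theta < 0 -> output is the logical NOT
assert synapse(0.9, w=-1.0, theta=-0.5, k=K) < 0.01
assert synapse(0.1, w=-1.0, theta=-0.5, k=K) > 0.99

# (c1) constant-1 connection: theta < 0 < w -> always ~1 on [0, 1]
assert synapse(0.0, w=1.0, theta=-0.5, k=K) > 0.99

# (d1) constant-0 connection: 0 < w < theta -> always ~0 on [0, 1]
assert synapse(1.0, w=1.0, theta=1.5, k=K) < 0.01
```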
The dendrite layer simply performs a multiplication on various synaptic connec-
tions of each branch. As mentioned before, the nonlinearity of synapses could be
used to implement a type of multiplication instead of summation, thus our model
adopts the multiplicative operation in the dendrite layer. It should be noted that a
soft-minimization operator was utilized in our previous dendritic neuron model [108]
to deal with binary-input classification problems, while the multiplicative operation
adopted in this study can address real-number input problems. The multiplication is essentially equivalent to the logic AND operation, as the inputs and outputs of the dendrites are either 1 or 0.

Figure 5.3: Evolution of predicted dendrite structure by neural pruning: (a) the initial structure, (b) pruned synapses and dendrites shown in dotted lines, (c) the final simplified morphology.

The output equation can be given as follows.
Z_m = \prod_{i=1}^{I} Y_{im}    (5.2)
The membrane layer corresponds to the sublinear summation operation at a branching point. It should also be pointed out that the soft-maximization operator utilized in [108] is replaced by a summation operator in this study. Because the inputs and outputs of the membrane are also either 1 or 0, the summation is nearly the same as the logic OR operation in the binary case. The equation is:
V = \sum_{m=1}^{M} Z_m    (5.3)
The result of computation in the membrane layer will be delivered to the soma.
The neuron fires when the membrane potential exceeds the threshold. We use a
sigmoid operator described as follows.
O = \frac{1}{1 + e^{-k_{soma}(V - \theta_{soma})}}    (5.4)
Pruning techniques start by learning a larger-than-necessary network and then removing the nodes and weights that are considered redundant [73,74]. The objective of the pruning function is to eliminate useless connections and input nodes from the neural dendrites, thus significantly reducing the complexity of the neuron. In the proposed NMSN, there are two pruning mechanisms, namely synaptic pruning and dendritic pruning, which screen out the unnecessary synapses and dendrites to simplify the structure of the dendrites. In general, an input is connected to a branch by a direct connection, an inverse connection, a constant-0 connection, or a constant-1 connection.
Synaptic pruning: In the constant-1 connection, the output of the synaptic layer is always 1. Since the dendritic layer performs a multiplication, any value multiplied by 1 remains itself. That is to say, a synapse with a constant-1 connection has no impact on the product in the dendrite layer; it can therefore be neglected and bypassed.
Dendritic pruning: As long as there is a constant-0 connection on a dendritic branch, the product will always be 0. The entire dendrite can therefore be eliminated, since it has no influence.
The specific pruning process is illustrated in Fig. 5.3. The initial structure has four synaptic layers, two dendritic layers, a membrane layer and a soma, as shown in Fig. 5.3(a). On the Dendrite-1 layer, the connection state of input x2 is constant-1, so this synapse can be omitted. On the Dendrite-2 layer, the connection state of input x3 is constant-0, so the entire Dendrite-2 layer should be removed since its output will always be 0. The removed synapses and dendrites are illustrated with dotted lines in Fig. 5.3(b). Fig. 5.3(c) shows the final simplified dendritic morphology of the neuron, in which only the input x1 on the Dendrite-1 layer
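The two pruning rules can be sketched in a few lines of Python (the state labels and the prune helper are illustrative names, not from this chapter); the example reproduces the situation of Fig. 5.3, where Dendrite-2 is removed entirely and only x1 survives on Dendrite-1:

```python
# Hypothetical sketch of the two pruning rules; assumes each synapse has
# already been classified into one of the four connection states.
DIRECT, INVERSE, CONST1, CONST0 = "direct", "inverse", "const1", "const0"

def prune(dendrites):
    """dendrites: list of branches, each a dict {input_index: state}."""
    kept = []
    for branch in dendrites:
        # Dendritic pruning: a constant-0 synapse forces the branch product
        # to 0, so the whole branch is eliminated.
        if CONST0 in branch.values():
            continue
        # Synaptic pruning: multiplying by a constant 1 changes nothing,
        # so constant-1 synapses are dropped from the branch.
        kept.append({i: s for i, s in branch.items() if s != CONST1})
    return kept

# Structure analogous to Fig. 5.3(a): x2 is constant-1 on Dendrite-1,
# x3 is constant-0 on Dendrite-2.
initial = [{1: DIRECT, 2: CONST1}, {3: CONST0, 4: DIRECT}]
assert prune(initial) == [{1: DIRECT}]   # only x1 on Dendrite-1 survives
```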
5.4 Learning algorithm
NMSN is a feed-forward network with continuous functions. Thus, an error back-propagation-like algorithm is valid for NMSN. Using a gradient-based learning rule, the model parameters are derived by minimizing the squared error between the actual output O and the desired output T, defined as:
E = \frac{1}{2}(T - O)^2    (5.5)
According to the gradient descent learning algorithm, the synaptic parameters wim and θim are modified in the direction that decreases the value of E. The update equations are:
\Delta w_{im}(t) = -\eta \frac{\partial E}{\partial w_{im}}    (5.6)

\Delta \theta_{im}(t) = -\eta \frac{\partial E}{\partial \theta_{im}}    (5.7)
where η is a positive constant representing the learning rate. The partial derivatives of E with respect to wim and θim are computed as:
\frac{\partial E}{\partial w_{im}} = \frac{\partial E}{\partial O} \cdot \frac{\partial O}{\partial V} \cdot \frac{\partial V}{\partial Z_m} \cdot \frac{\partial Z_m}{\partial Y_{im}} \cdot \frac{\partial Y_{im}}{\partial w_{im}}    (5.8)

\frac{\partial E}{\partial \theta_{im}} = \frac{\partial E}{\partial O} \cdot \frac{\partial O}{\partial V} \cdot \frac{\partial V}{\partial Z_m} \cdot \frac{\partial Z_m}{\partial Y_{im}} \cdot \frac{\partial Y_{im}}{\partial \theta_{im}}    (5.9)
The components of the above partial derivatives are as follows.
\frac{\partial E}{\partial O} = O - T    (5.10)

\frac{\partial O}{\partial V} = \frac{k_{soma}\, e^{-k_{soma}(V-\theta_{soma})}}{(1 + e^{-k_{soma}(V-\theta_{soma})})^2}    (5.11)

\frac{\partial V}{\partial Z_m} = 1    (5.12)

\frac{\partial Z_m}{\partial Y_{im}} = \prod_{L=1,\, L \neq i}^{I} Y_{Lm}    (5.13)

\frac{\partial Y_{im}}{\partial w_{im}} = \frac{k x_i\, e^{-k(x_i w_{im} - \theta_{im})}}{(1 + e^{-k(x_i w_{im} - \theta_{im})})^2}    (5.14)

\frac{\partial Y_{im}}{\partial \theta_{im}} = \frac{-k\, e^{-k(x_i w_{im} - \theta_{im})}}{(1 + e^{-k(x_i w_{im} - \theta_{im})})^2}    (5.15)
The parameters wim and θim are updated according to the equations as follows.
w_{im}(t+1) = w_{im}(t) + \Delta w_{im}    (5.16)

\theta_{im}(t+1) = \theta_{im}(t) + \Delta \theta_{im}    (5.17)
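To make the model and its learning rule concrete, the following Python sketch implements Eqs. (5.1)-(5.17) end to end. The function names, the M × I list layout for the parameters, and the default constants are illustrative choices, not the MATLAB implementation used in the experiments below:

```python
import math

def nmsn_forward(x, w, theta, k=3.0, k_soma=10.0, theta_soma=0.5):
    """Forward pass of NMSN, Eqs. (5.1)-(5.4).
    x: list of I inputs in [0, 1]; w, theta: M x I parameter matrices."""
    Y = [[1.0 / (1.0 + math.exp(-k * (w[m][i] * x[i] - theta[m][i])))
          for i in range(len(x))] for m in range(len(w))]     # synapses (5.1)
    Z = [math.prod(row) for row in Y]                         # dendritic product (5.2)
    V = sum(Z)                                                # membrane summation (5.3)
    O = 1.0 / (1.0 + math.exp(-k_soma * (V - theta_soma)))    # soma (5.4)
    return Y, Z, V, O

def nmsn_step(x, T, w, theta, eta=0.005, k=3.0, k_soma=10.0, theta_soma=0.5):
    """One gradient-descent update of w and theta in place, Eqs. (5.5)-(5.17).
    Returns the squared error E before the update."""
    Y, Z, V, O = nmsn_forward(x, w, theta, k, k_soma, theta_soma)
    dE_dO = O - T                                             # (5.10)
    s = math.exp(-k_soma * (V - theta_soma))
    dO_dV = k_soma * s / (1.0 + s) ** 2                       # (5.11); dV/dZm = 1 (5.12)
    for m in range(len(w)):
        for i in range(len(x)):
            dZ_dY = math.prod(Y[m][L] for L in range(len(x)) if L != i)  # (5.13)
            e = math.exp(-k * (x[i] * w[m][i] - theta[m][i]))
            dY_dw = k * x[i] * e / (1.0 + e) ** 2             # (5.14)
            dY_dth = -k * e / (1.0 + e) ** 2                  # (5.15)
            common = dE_dO * dO_dV * dZ_dY                    # shared chain (5.8)-(5.9)
            w[m][i] -= eta * common * dY_dw                   # (5.6), (5.16)
            theta[m][i] -= eta * common * dY_dth              # (5.7), (5.17)
    return 0.5 * (T - O) ** 2                                 # (5.5)

# Deterministic sanity check: one input, one branch, w = 1, theta = 0, x = 0
# gives Y = Z = V = 0.5 and, with theta_soma = 0.5, a soma output O = 0.5.
_, _, V, O = nmsn_forward([0.0], [[1.0]], [[0.0]])
assert V == 0.5 and O == 0.5
```

Each update uses the chain rule of Eqs. (5.8)-(5.9); the factors dE/dO, dO/dV and dZm/dYim are shared between the two parameter updates, so they are computed once per synapse.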
5.5 Experimental results and discussion
The experimental results of liver disease prediction are presented in this section. The performance of the proposed NMSN model is compared with that of the classical back-propagation neural network (BPNN) in terms of sensitivity, specificity and accuracy.
5.5.1 Experimental environment and evaluation metrics
In the experiment, we design and test each neural network type using MATLAB
(R2013b). The BPNN is implemented using the MATLAB R2013b Neural Network
(NN) Toolbox. The performance metrics of mean square error (MSE), accuracy, sensitivity, specificity and area under the ROC curve (AUC) are utilized to compare the results of the proposed NMSN model and BPNN.
The liver disorders dataset used in this study is taken from the UCI machine learning repository and is commonly used in medical classification problems [120]. It is divided into two subsets: the training set and the test set. In the testing phase, the testing dataset is given to the proposed model NMSN and the performance is quantified by its accuracy. However, it is also important to describe the acquired results in terms of sensitivity and specificity, which are metrics particularly important for medical diagnosis [121,122].
Sensitivity and specificity quantify the model's performance with respect to false positives and false negatives, and the association between them is defined by the graphical representation of the ROC curve. This helps to find the optimal model and to determine the best threshold for the diagnostic test [123]. These methods are based on the consideration that a test point always falls into one of the following four categories: true positive (TP), true negative (TN), false negative (FN) and false positive (FP) [122]. The definitions are given in Table 5.1. Fig. 5.4 shows a confusion matrix from which several common metrics can be calculated.
Figure 5.4: Confusion matrix (hypothesized class Y/N versus true class p/n; cells: True Positive, False Positive, False Negative, True Negative; column totals P and N).

Table 5.1: Terms used to define sensitivity, specificity and accuracy.

Outcome of the       Condition as determined by the Standard of Truth
diagnostic test      Positive     Negative     Row total
Positive             TP           FP           TP + FP
Negative             FN           TN           FN + TN
Column total         TP + FN      FP + TN      N = TP + TN + FP + FN
The equations for calculating the sensitivity, specificity, and accuracy are given as follows.

Sensitivity = \frac{TP}{TP + FN}    (5.18)

Specificity = \frac{TN}{TN + FP}    (5.19)

Accuracy = \frac{TP + TN}{TP + FN + TN + FP}    (5.20)
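These three metrics follow directly from the four confusion-matrix counts; the sketch below (with made-up counts, not the experimental results of this chapter) is a direct Python transcription of Eqs. (5.18)-(5.20):

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy per Eqs. (5.18)-(5.20)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy

# Illustrative counts only (not taken from the experiments in this chapter):
sens, spec, acc = diagnostic_metrics(tp=40, tn=30, fp=20, fn=10)
assert (sens, spec, acc) == (0.8, 0.6, 0.7)
```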
5.5.2 The liver disease database description
The liver disorders database, supported by the BUPA Medical Research Company, is obtained from the UCI machine learning repository. The purpose of the BUPA liver disorders dataset is to predict whether a male patient has liver disorders. It includes 345 samples and 2 class labels: healthy and unhealthy (with liver disease). 200 samples of the class-1 category are of healthy persons, while the remaining 145 belong to the unhealthy class-2 category.
There are six features in the database, described in Table 5.2. The first five features are obtained from blood tests, and the last records daily alcohol consumption.

Table 5.2: Basic features for Liver Disorders.

Indices     Feature Description
mcv         Mean corpuscular volume
alkphos     Alkaline phosphatase
sgpt        Alamine aminotransferase
sgot        Aspartate aminotransferase
gammagt     Gamma-glutamyl transpeptidase
drinks      Number of half-pint equivalents of alcoholic beverages drunk per day

Table 5.3: No. of patterns in the training and testing data set.

                        No. for training    No. for testing    Total
BUPA liver disorders    242                 103                345
5.5.3 Experimentation setup and results
In the experiment, 70% of the data are randomly chosen for training while the remaining 30% are for testing, as shown in Table 5.3. Because of the sigmoid function in the synaptic layer, the variables of the input vectors are normalized to the range [0, 1.0]. The use of the sigmoid function in the output neuron results in output values in the range [0, 1]. A value of less than 0.5 is mapped to zero, while a value greater than or equal to 0.5 is mapped to one.
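A minimal sketch of this preprocessing in Python (the helper names are illustrative; the original experiments were carried out in MATLAB):

```python
def min_max_normalize(column):
    """Min-max scaling of one input feature to [0, 1], as assumed for the
    sigmoidal synaptic layer."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def binarize(output):
    """Map a soma output in [0, 1] to a class label at threshold 0.5."""
    return 1 if output >= 0.5 else 0

assert min_max_normalize([85, 90, 95]) == [0.0, 0.5, 1.0]
assert binarize(0.49) == 0 and binarize(0.5) == 1
```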
5.5.3.1 Optimal parameters setting
To determine an optimal set of parameters that meets the accuracy requirements and provides fast convergence during training, Taguchi's method is employed using orthogonal arrays. It tests only part of the possible combinations of factors and levels instead of a full factorial analysis, committing to a minimum of experimental runs while still providing a good estimation of the main factor effects over the process [124].
There are five parameters considered important in NMSN, namely k, ksoma, θsoma, m and η. The meanings of k, ksoma and θsoma were described in the equations above; m is the branch number and η is the learning rate. Within each parameter there are four levels of interest, as shown in Table 5.4.

Table 5.4: Parameter levels in NMSN.

Parameter    Levels
k            1, 3, 5, 10
ksoma        1, 3, 5, 10
θsoma        0, 0.3, 0.5, 0.9
m            5, 10, 15, 20
η            0.005, 0.01, 0.05, 0.1

An orthogonal array L16(4^5) is most suitable for this problem because it has five 4-level columns to match the needs of the matrix experiment. The L16(4^5) orthogonal array for this design problem is shown
in Table 5.5. To obtain a reliable average testing accuracy, each experiment is repeated 30 times. The number of iterations is set to 2000. As shown in Table 5.5, the best classification accuracy is acquired by the parameters of the 8th row, that is, k = 3, ksoma = 10, θsoma = 0.5, m = 10, and η = 0.005. However, supplemental experiments are needed to verify the selection of ksoma and η, whose values are located at the boundary of the considered interval (either the maximum or minimum value). To address this, two additional combinations of parameters are also considered in Table 5.5, and the results suggest that ksoma = 15 or η = 0.001 causes a degradation of performance. Therefore, the combination of parameter values k = 3, ksoma = 10, θsoma = 0.5, m = 10, and η = 0.005 is reasonable for obtaining acceptable performance, revealing to some extent the influence of the parameters on the performance of the neuron model.
To compare with BPNN more fairly, the two models need to be at the same computational scale, with a nearly equal number of weights (including the weights and thresholds of all neurons). The BPNN can be represented as a vector of dimension D containing the network weights. The vector for the MLP is defined as in Eq. (5.21).
D = (Input \times Hidden) + (Hidden \times Output) + Hidden_{bias} + Output_{bias}    (5.21)
where Input, Hidden and Output refer to the number of input, hidden and output
Table 5.5: L16(4^5) orthogonal array and factor assignment.

Expt. No.    k     ksoma    θsoma    m     η        Testing accuracy
1            1     1        0        5     0.005    41.54 ± 6.3
2            1     3        0.3      10    0.01     57.88 ± 7.8
3            1     5        0.5      15    0.05     67.50 ± 6.74
4            1     10       0.9      20    0.1      65.77 ± 8.06
5            3     1        0.3      15    0.1      60.70 ± 8.71
6            3     3        0        20    0.05     43.53 ± 6.11
7            3     5        0.9      5     0.01     69.36 ± 7.19
8            3     10       0.5      10    0.005    72.63 ± 7.24
9            5     1        0.5      20    0.01     60.77 ± 6.86
10           5     3        0.9      15    0.005    69.04 ± 6.38
11           5     5        0        10    0.1      41.60 ± 6.63
12           5     10       0.3      5     0.05     58.72 ± 7.14
13           10    1        0.9      10    0.05     63.33 ± 6.37
14           10    3        0.5      5     0.1      58.40 ± 7.34
15           10    5        0.3      20    0.005    64.49 ± 6.01
16           10    10       0        15    0.01     43.14 ± 5.9

Optimal     3     10       0.5      10    0.005    72.63 ± 7.24

Additional Expt. No.
17           3     15       0.5      10    0.005    71.30 ± 8.24
18           3     10       0.5      10    0.001    68.23 ± 6.29
neurons of BPNN respectively. Hiddenbias and Outputbias are the number of biases
in hidden and output layers [125].
If the BPNN has I inputs and L nodes in the hidden layer, the number of adjustable weights will be I × L + L + L + 1. Meanwhile, if the NMSN model has I inputs and M branches, the total number of parameters wim and θim to modify will be 2M × I. According to this rule of equivalence, we can determine the value of L and the dimension D once I and M are fixed. The values are set as shown in Table 5.6.
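The equivalence rule can be checked mechanically. The sketch below (illustrative helper names) encodes Eq. (5.21) for the BPNN and the 2M × I count for NMSN, reproducing the 120 versus 121 parameter counts of Table 5.6:

```python
def bpnn_dim(n_in, n_hidden, n_out=1):
    """Number of adjustable weights in the BPNN per Eq. (5.21):
    input-to-hidden and hidden-to-output weights plus all biases."""
    return n_in * n_hidden + n_hidden * n_out + n_hidden + n_out

def nmsn_dim(n_in, n_branch):
    """NMSN adjusts w_im and theta_im for every synapse: 2 * M * I."""
    return 2 * n_branch * n_in

# Structures of Table 5.6: 6 inputs, 10 branches versus 15 hidden nodes
assert nmsn_dim(6, 10) == 120
assert bpnn_dim(6, 15) == 121
```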
5.5.3.2 Performance comparison
For an equitable comparison, the hidden layer and output layer of the BPNN both use log-sigmoid transfer functions, and the learning rate is set to 0.005, the same as for NMSN. Three experiments are performed with different maximum iterations (1000, 2000, and 3000).

Table 5.6: Structures of NMSN and BPNN for Liver disorders dataset.

Method    No. of inputs    No. of Hidden/Branch    No. of Output    No. of Weights
NMSN      6                10                      1                120
BPNN      6                15                      1                121

Table 5.7: Classification results by NMSN and BPNN.

Epochs    Model    Testing Accuracy    Training Accuracy    Sensitivity    Specificity
1000      NMSN     65.45 ± 7.60        69.05 ± 3.51         69.0           56.5
          BPNN     52.05 ± 9.34        53.32 ± 4.36         64.5           23.8
2000      NMSN     72.63 ± 7.24        75.12 ± 3.34         92.3           53.8
          BPNN     55.00 ± 6.40        55.62 ± 5.24         54.1           53.3
3000      NMSN     72.69 ± 5.52        76.60 ± 1.65         86.5           66.7
          BPNN     55.96 ± 8.55        55.91 ± 3.56         78.8           31.6

Using the optimal set of parameters, both methods are independently
run 30 times. After the test data are classified, the average classification accuracy over the thirty runs is used to compare the performance of the two neural networks. Table 5.7 shows the classification results obtained by NMSN and BPNN; the sensitivity and specificity values are also presented.
As shown in Table 5.7, the proposed model acquired an average testing accuracy of 72.69% when run for 3000 iterations, which is much higher than the 55.96% accuracy obtained by BPNN. Moreover, NMSN is also superior to BPNN in terms of sensitivity and specificity. The higher sensitivity and specificity values indicate the ability of NMSN to identify patients who do in fact have liver disorders without giving false-positive results.
In addition, for further comparison, we tuned the parameters of BPNN to the level at which it achieved its best performance. The best accuracy of BPNN, 66.92%, was obtained with 40 hidden nodes and a learning rate of 0.1. The best performances of NMSN and BPNN are compared in Table 5.8, in which the average accuracy of NMSN is still higher than that of BPNN under both
Table 5.8: Comparison of the simulation results between NMSN and BPNN.

Method    Branches    L. Rate    Average accuracy    Sensitivity    Specificity    T-test
NMSN      10          0.005      72.69 ± 5.52        86.5           66.7           −
BPNN      15          0.005      55.96 ± 8.55        78.8           31.6           1.38E-08
BPNN      40          0.1        66.92 ± 7.65        82.1           50.0           1.50E-03
of the conditions. Moreover, the sensitivity and specificity values of NMSN are higher in general. To gauge the statistical difference between the results of NMSN and BPNN with the two sets of parameters, we conducted two-tailed t-tests, as shown in Table 5.8. From the two-tailed p-values, we find that the differences in the average solution values between each variant of BPNN and NMSN are significant, rejecting the null hypothesis (p < 0.01).
There are also many other methods for performing liver disease classification. As shown in Table 5.9, NMSN is also compared with other methods from previous research based on the BUPA liver disorders medical database. To facilitate performance comparison, five different experimental train-to-test ratios were adopted, i.e., two single-fold validation methods (40%-60%, 80%-20%) and three multi-fold cross-validation (K×CV) methods, including 4-fold CV (66.7%-33.3%, ×4), 5-fold CV (80%-20%, ×5) and 10-fold CV (90%-10%, ×10). Here the train-to-test ratio denotes
the ratio between the number of samples for training and for testing. With K×CV
(K=4, 5, or 10), the whole data set is randomly divided into K mutually exclu-
sive subsets with approximately equal number of samples. In K×CV, the method is
trained on the training subsets, and the testing error is measured by testing it on the
testing subset. The procedure is repeated for a total of K trials, each time using a
different subset for testing. The performance of the model is assessed by averaging
the squared error under testing over all the trails of the experiment. Compared with
single-fold validation method, the K×CV has advantages that it could minimize bias
associated with random sampling of training samples [126] while has disadvantages
that it may require an excessive amount of computation since the model has to be
64
trained K times [127]. The results of NMSN using the above five kinds of train-to-test
ratios are summarized in Table 5.9, where these results were averaged testing accu-
racies during 30 independent runs based on the same optimal parameters in Table
5.5. From Table 5.9, it is clear that NMSN performs better than the other compared
methods in terms of the accuracy.
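For reference, the K×CV partitioning described above can be sketched as follows in Python (an illustrative implementation, not the code used in the reported experiments); it splits the sample indices into K mutually exclusive folds so that every sample is tested exactly once:

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Randomly partition sample indices into k roughly equal, mutually
    exclusive folds; yield (train_idx, test_idx) pairs, one per trial."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test

# 10-fold CV over the 345 BUPA samples: each sample is tested exactly once
tested = [i for _, test in k_fold_splits(345, 10) for i in test]
assert sorted(tested) == list(range(345))
```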
5.5.3.3 Convergence properties
In this section, we compare the convergence of the two models, NMSN and BPNN, with respect to the number of iterations. Fig. 5.5 illustrates the training error convergence curves on the BUPA liver disorders dataset when the number of dendritic branches of NMSN is 10 and the number of hidden nodes of BPNN is 15 or 40. The training error is computed as the mean square error (MSE) in Eq. (5.22).
MSE = \frac{1}{R} \sum_{a=1}^{R} \left[ \frac{1}{2} \sum_{b=1}^{S} (E_{ab} - O_{ab})^2 \right]    (5.22)
where Eab and Oab are the desired output and the network output, respectively, S is the number of patterns and R is the number of simulation repetitions. Here each pattern denotes one of the data samples of the BUPA liver disorders dataset, and thus S = 242 in the case of Table 5.3. Each simulation is an independent run of the compared methods, and R = 30 is set in the experiment. The results in Fig. 5.5 are obtained with the user-defined parameters set as follows: the optimal parameters shown in Table 5.5 are used for NMSN; the learning rate is set to 0.1 for BPNN-40 and 0.005 for BPNN-15. As observed in Fig. 5.5, NMSN achieves a lower training error and a better convergence rate than the BPNNs under both conditions.
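Eq. (5.22) averages, over the R independent runs, the per-run half sum of squared output errors across the S patterns; a direct Python transcription (with toy numbers, not the experimental data) reads:

```python
def mse(desired, actual):
    """Training error per Eq. (5.22): mean over the R runs of the per-run
    half sum of squared errors across the S training patterns."""
    R = len(desired)
    total = 0.0
    for E_run, O_run in zip(desired, actual):
        total += 0.5 * sum((e - o) ** 2 for e, o in zip(E_run, O_run))
    return total / R

# Two runs, three patterns each (toy numbers, not experimental data);
# each run contributes 0.5 * 1.0, so the mean is 0.5.
desired = [[1.0, 0.0, 1.0], [1.0, 0.0, 1.0]]
actual  = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
assert mse(desired, actual) == 0.5
```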
5.5.3.4 ROC analysis
To compare the classification performance of the proposed model NMSN with that of BPNN, the receiver operating characteristic (ROC) curve is preferred as a graphical plot demonstrating the quality of classifiers. It is a reliable technique to analyze the
Figure 5.5: Comparison of convergence speed of NMSN and BPNN (training error vs. iteration for NMSN, BPNN−40 and BPNN−15).
performance of algorithms in classification problems, showing the true positive rate (sensitivity) against the false positive rate (1 − specificity). The ROC curves of both classifiers are shown in Fig. 5.6. AUC is the area under the ROC curve; because it is a portion of the area of the unit square, its value lies between 0.0 and 1.0 [36]. An AUC value of 1.0 means the classifier has perfect discrimination to classify the liver disorders correctly, whereas a value of 0.5 is equivalent to a random model [134].
AUC is calculated as follows:

AUC(\%) = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right) \times 100    (5.23)
According to this method, the AUC is computed for both classifiers as shown in Fig. 5.7, where the AUC values are 0.7660 for NMSN, and 0.5520 and 0.6605 for BPNN, suggesting that NMSN is superior to BPNN for classifying the liver disorders.
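Eq. (5.23) is the balanced-accuracy form of the AUC, i.e., the mean of sensitivity and specificity expressed in percent; a short Python sketch with made-up counts:

```python
def auc_percent(tp, tn, fp, fn):
    """AUC per Eq. (5.23): mean of sensitivity and specificity, in percent."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp)) * 100.0

# Illustrative counts only: sensitivity 0.8 and specificity 0.6 give 70%
assert abs(auc_percent(tp=40, tn=30, fp=20, fn=10) - 70.0) < 1e-9
```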
5.5.4 The final synaptic and dendritic morphology
As mentioned above, NMSN possesses structural plasticity mechanisms in synapses and dendrites that support classifying the liver disorders dataset. The computation in the neuron is performed as a combination of dimensional reduction and nonlinearity, with synaptic pruning and dendritic pruning mechanisms that can remove useless
Figure 5.6: The ROC curves (true positive rate vs. false positive rate) of (a) NMSN, (b) BPNN-15, and (c) BPNN-40.
synapses and dendrites during learning, forming a distinct synaptic and dendritic
morphology for the purpose of improving the efficiency of the neurological system.
An input is connected to a branch by a direct connection, an inverse connection, a constant-0 connection, or a constant-1 connection. We simplify the
structure of the dendrites according to the pruning mechanisms: a synapse with a constant-1 connection can be completely omitted, and a dendritic layer containing a constant-0 connection should be removed. Through the final synaptic and dendritic morphology, we can identify the underlying causes of the disorder and reduce the number of inputs, saving diagnosis time while achieving high classification accuracy for liver disease.
Fig. 5.8(a) shows the specific dendritic morphology that yields the best performance before learning. Fig. 5.8(b) shows the corresponding morphology after learning.

Figure 5.7: The AUC values of NMSN (0.766), BPNN-15 (0.552) and BPNN-40 (0.6605).

In Fig. 5.8(b), a dedicated symbol marks each branch that can be removed. Fig. 5.8(c) shows all
branches are deleted except branches 1, 3 and 7. The final synaptic and dendritic
structure is obtained in Fig. 5.8(d) with all constant-1 inputs neglected. The final
chosen features are x2, x3, x4, x5 and x6, while x1 can be removed. That is to say, the six inputs are reduced to five, suggesting that the first feature for liver disorders (i.e., mean corpuscular volume) is less important among the underlying causes of the disorder.
Finally, it is worth pointing out that the simplified dendritic morphology can form
an approximate logic circuit, which is suitable for a simple hardware implementation
in practice [135].
5.6 Conclusions and Remarks
In this study, a single neuron model with synaptic nonlinearities (NMSN) in a den-
dritic tree was proposed for carrying out the liver disorders classification. The compu-
tational capacity of the single neuron model NMSN was realized by the combination
of dimensional reduction and nonlinearity. The nonlinearity computation was derived
from the multi-layer architecture of the dendritic neuronal tree, while the dimensional
reduction originated from the specific neuron-pruning function in the synaptic layer.
The performance of NMSN was verified on the liver disease diagnostic problem. Experimental results suggested that NMSN was superior to the traditional BPNN with a similar computational architecture (denoted as BPNN-15) and to the one with the best performance (namely BPNN-400), in terms of classification accuracy, convergence properties, and the AUC criterion. In addition, NMSN produced better or competitive solutions compared with a number of previously proposed methods, such as SVM, C4.5, classification trees, KNN, neuro-fuzzy models, etc.
It is worth emphasizing that NMSN has a distinct ability of pattern extraction through the pruning function, which is a metaphor of neuronal morphology. By learning a larger-than-necessary initial network and thereafter screening out the useless synapses and unnecessary dendrites, NMSN finally produces a neuron with the minimum necessary dendritic morphology. The resultant neuron not only possesses a significantly higher computational capacity than the traditional McCulloch-Pitts linear neuron model, which is incapable of solving even the simple 3-bit parity problem, but also suggests a possible information processing mechanism of neuronal morphology and plasticity. These findings might also give some insights into the development of new techniques for understanding the mechanisms and construction of single neurons.
In the future, we plan to apply the proposed NMSN to other classification problems to further verify its performance. A theoretical convergence analysis of the gradient descent training method used in NMSN will also be carried out. In addition, global training methods such as the differential evolution algorithm or particle swarm optimization will be utilized to improve the training results of NMSN.
Table 5.9: Classification accuracies for the BUPA Liver Disorders problem obtained by other methods in the literature.

Author (year)                     Method (train-to-test ratio)                               Accuracy (%)
Pham et al. (2000) [128]          RULES-4 (40%-60%)                                          55.90
Cheung (2001) [129]               Naive Bayes (5×CV)                                         63.39
Cheung (2001) [129]               C4.5 (5×CV)                                                65.59
Cheung (2001) [129]               BNND (5×CV)                                                61.83
Cheung (2001) [129]               BNNF (5×CV)                                                61.42
Van Gestel et al. (2002) [130]    Support Vector Machine (SVM) with GP (10×CV)               69.70
Yeow (2006) [131]                 Classification tree (10×CV)                                53.90
Yeow (2006) [131]                 KNN (10×CV)                                                50.16
Jeatrakul and Wong (2009) [100]   RBFNN (80%-20%)                                            67.54
Jeatrakul and Wong (2009) [100]   CMTNN (80%-20%)                                            70.29
Bahramirad et al. (2013) [120]    SVM                                                        69.23
Kulkarni and Shinde (2013) [132]  Neuro-fuzzy model (80%-20%)                                58.90
Kulkarni and Shinde (2013) [132]  Neuro-fuzzy with Gaussian membership function (80%-20%)    67.27
Ubaidillah et al. (2014) [133]    SVM                                                        63.11
Zhang et al. (2014) [101]         SOCPNN (4×CV)                                              66.78
Zhang et al. (2014) [101]         MOCPNN (4×CV)                                              66.78
Seera and Lim (2014) [102]        Min-Max neural network (5×CV)                              67.25
Seera and Lim (2014) [102]        Min-Max neural network (10×CV)                             66.13
Our method (2015)                 NMSN (80%-20%)                                             73.15
Our method (2015)                 NMSN (40%-60%)                                             69.47
Our method (2015)                 NMSN (4×CV)                                                71.04
Our method (2015)                 NMSN (5×CV)                                                72.78
Our method (2015)                 NMSN (10×CV)                                               72.43
Figure 5.8: The evolution of the neuronal morphology: (a) the initial morphology with fifteen dendrites and inputs x1-x6; (b) the morphology after learning; (c) the pruned morphology in which only dendrites 1, 3, and 7 remain; (d) the final simplified structure with inputs x2-x6 and all constant-1 connections omitted.
Chapter 6
Evolutionary Model: Chaotic Gravitational Search
6.1 Introduction
Gravitational search algorithm (GSA) [136] is one of the newest heuristic optimization methods based on the Newtonian laws of gravity and motion. It has shown remarkable search abilities in solving optimization problems [137] within high-dimensional search spaces. In GSA, a set of candidate solutions is maintained as a group of objects. At each iteration, the objects update their solutions by moving stochastically. Objects with heavier masses attract other objects more strongly and move more slowly than objects with lighter masses. As the iterations proceed, all other objects tend to move towards the heaviest object, which corresponds to the best solution of the optimization problem. The robustness, adaptability, and simplicity of GSA make it applicable to a wide range of function optimization problems [138]. However, GSA still suffers from the inherent disadvantages of being trapped in local minima and slow convergence rates, which reduce the solution quality.
To resolve the aforementioned problems, chaos, which exhibits randomicity, ergodicity, and regularity, was incorporated into GSA [139]. Chaos is a very common phenomenon in nonlinear systems and has recently attracted much interest. In the field of optimal design, the ergodicity of chaos has been viewed as an optimization mechanism for avoiding being stuck in a local search process. The chaotic state was introduced into the optimization variables, and the search was performed using the chaos variables [140]. Meanwhile, various chaos optimization algorithms for solving complex optimization problems were put forward [141, 142]. A chaos-based search has stronger exploration and exploitation capabilities and can enable an algorithm to effectively jump out of local extrema owing to the inherent ergodicity of chaos. It has been demonstrated that combining GSA with a chaotic system can alleviate the shortcomings of GSA, which highlights the advantages of using chaotic systems [139, 143].
There are two ways to combine GSA with chaos. One uses chaotic maps to generate chaotic sequences that substitute for random sequences, while the other employs chaos as a local search approach. In our previous work, the logistic map was utilized to generate chaotic sequences and perform the local search [139]. In this study, four other chaotic maps, namely the piecewise linear chaotic map, the Gauss map, the sinusoidal map, and the sinus map, are combined with GSA. It is apparent that different chaotic maps possess distinct distribution characteristics. The objective of this work is not only to find out which chaotic map improves the performance of GSA the most, but also to give some insights into the underlying reasons. To this end, six commonly used benchmark optimization functions are chosen from the literature. The experimental results verify that all five incorporated chaotic maps can improve the performance of GSA in terms of solution quality and convergence speed. In addition, the four newly incorporated chaotic maps improve the performance of GSA more than the logistic map does, suggesting that the hybrid searching dynamics of CGSA are significantly affected by the distribution characteristics of the chaotic maps. Furthermore, the simulation results also show that the performance of CGSA is tightly related to the search dynamics resulting from the interaction between the incorporated chaotic map and the landscape of the solved problems.
The rest of this chapter is organized as follows: Section 6.2 presents a brief description of GSA. The five chaotic maps used in the chaotic local search procedure are introduced in Section 6.3. In Section 6.4, the chaotic gravitational search algorithms using the five different maps are proposed. Section 6.5 gives the experimental results of the five variants of CGSA on the six benchmark optimization functions. Finally, some general remarks conclude the chapter.
6.2 Overview of GSA
GSA is a stochastic search algorithm introduced by Rashedi et al. [136]. It is a global search strategy that can efficiently handle arbitrary optimization problems, and it is based on the Newtonian laws of gravity and motion. In GSA, agents are considered as objects whose performance is measured by their masses. All of these objects attract each other by the force of gravity [144, 145], and this force causes a global movement of all objects towards the objects with heavier masses. Hence, the objects cooperate with each other using a direct form of communication through the gravitational force. The heavier masses (which correspond to good solutions) move more slowly than the lighter ones, which guarantees the exploitation ability of the algorithm to find the optima around a good solution. Consider a system with N agents (objects); we define the position of the ith agent by:
X_i = (x_i^1, x_i^2, ..., x_i^d, ..., x_i^n),    i = 1, 2, ..., N        (6.1)

where x_i^d is the position of the ith agent in the dth dimension, and n is the dimension of the search space.
At the tth iteration, the gravitational force acting on the ith object from the jth object is represented as follows [136]:

F_ij^d(t) = G(t) · [M_j(t) M_i(t) / (R_ij(t) + ε)] · (x_j^d(t) − x_i^d(t))        (6.2)

where M_i and M_j are the masses of the agents, G(t) is the gravitational constant at time t, ε is a very small constant, and R_ij(t) indicates the Euclidean distance between the two agents i and j:

R_ij(t) = ||X_i(t), X_j(t)||_2        (6.3)
The gravitational constant G(t) is initialized at the beginning of the iterations and is reduced with time to control the search accuracy. G(t) is given by [146]:

G(t) = G_0 · e^(−α t / iter_max)        (6.4)

where G_0 is the initial value, α is a user-defined parameter, and iter_max is the maximum number of iterations.
The total force acting on the ith agent is given by:

F_i^d(t) = Σ_{j ∈ Kbest, j ≠ i} rand_j · F_ij^d(t)        (6.5)

where Kbest is the set of the first K agents with the best fitness (i.e., the biggest masses). K is a function of time that decreases linearly with the iterations [136]; at the end of the iterations its value becomes 2% of the initial number of agents. rand_j is a random number in the interval [0, 1]. Hence, by the law of motion, the acceleration a_i^d(t) of agent i at time t in the dth dimension is given by:

a_i^d(t) = F_i^d(t) / M_i(t)        (6.6)
where M_i(t) is calculated through the map of fitness defined as follows:

M_i(t) = m_i(t) / Σ_{j=1}^{N} m_j(t)        (6.7)

and

m_i(t) = (fit_i(t) − worst(t)) / (best(t) − worst(t))        (6.8)

where best(t) is the best fitness of all agents, worst(t) is the worst fitness of all agents, and fit_i(t) represents the fitness of agent i obtained by evaluating the objective function.
The new velocity of an agent is a fraction of its current velocity added to its acceleration. Thus, the velocity and position of the ith agent at the tth iteration in the dth dimension are calculated as follows:

v_i^d(t+1) = rand_i × v_i^d(t) + a_i^d(t)        (6.9)

x_i^d(t+1) = x_i^d(t) + v_i^d(t+1)        (6.10)

where rand_i is a uniform random variable in the interval [0, 1], which gives a randomized characteristic to the search. The pseudocode of GSA is given in the following.
Traditional Gravitational Search Algorithm
  for all agents i (i = 1, 2, ..., N) do
      initialize position x_i randomly in the search space
  end-for
  while termination criteria not satisfied do
      for all agents i do
          compute the overall force F_i^d(t) according to Eqs. (6.2)-(6.5)
          compute the acceleration a_i^d(t) according to Eq. (6.6)
          compute the current velocity according to Eq. (6.9)
          compute the current position according to Eq. (6.10)
      end-for
  end-while
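The update step of Eqs. (6.2)-(6.10) can be sketched in a few lines of NumPy. This is a minimal sketch assuming a minimization problem; the parameter defaults (G0 = 100, α = 20, ε = 1e-12) and the small numerical guards against division by zero are illustrative assumptions, not the exact thesis settings.

```python
import numpy as np

def gsa_step(X, V, fitness, t, iter_max, G0=100.0, alpha=20.0, eps=1e-12):
    """One GSA iteration (minimization). X, V: (N, n) arrays; fitness: (N,)."""
    N, n = X.shape
    G = G0 * np.exp(-alpha * t / iter_max)              # Eq. (6.4)
    best, worst = fitness.min(), fitness.max()
    m = (fitness - worst) / (best - worst + 1e-300)     # Eq. (6.8)
    M = m / (m.sum() + 1e-300)                          # Eq. (6.7)
    # Kbest shrinks linearly from N down to 2% of N over the run.
    K = max(1, int(round(N - (N - 0.02 * N) * t / iter_max)))
    kbest = np.argsort(fitness)[:K]                     # indices of best agents
    F = np.zeros_like(X)
    for i in range(N):
        for j in kbest:
            if j == i:
                continue
            R = np.linalg.norm(X[i] - X[j])             # Eq. (6.3)
            # Eqs. (6.2) and (6.5): randomly weighted pairwise forces.
            F[i] += np.random.rand() * G * M[i] * M[j] / (R + eps) * (X[j] - X[i])
    a = F / (M[:, None] + 1e-300)                       # Eq. (6.6)
    V = np.random.rand(N, 1) * V + a                    # Eq. (6.9)
    return X + V, V                                     # Eq. (6.10)
```

In a full run, this step is repeated for iter_max iterations and the fitness array is re-evaluated from the objective function after every position update.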
The main features of GSA are listed as follows:
(1) An object with a heavier mass exerts a stronger attractive force and moves more slowly than a lighter agent.
(2) The gravitational constant decreases with time to give the search better accuracy.
(3) The acceleration of an agent is determined by the total force, which is inversely proportional to the distance between two agents.
(4) The next position of an agent depends only on its current velocity and current position.
(5) GSA is a nearly memory-less algorithm and requires only a small memory capacity from the hardware.
6.3 Chaotic maps
Recently chaos, a kind of dynamic behavior of nonlinear systems, has aroused much interest in different scientific fields such as chaos control, pattern recognition, and optimization theory [147]. In this section, the five chaotic maps are introduced.
6.3.1 Logistic map
The logistic map is a polynomial mapping that is often cited as an archetypal example of how complex, chaotic behavior can arise from very simple nonlinear dynamical equations. The map was popularized in a seminal paper by the biologist Robert May [148], in part as a discrete-time demographic model analogous to the logistic equation. The equation of this map appears in the nonlinear dynamics of biological populations evidencing chaotic behavior. Its mathematical expression is given by Eq. (6.11):

x_{k+1} = a · x_k · (1 − x_k),    k = 1, 2, ..., N        (6.11)

where x_k is the kth chaotic number, k represents the iteration number, and a is usually set to 4. The initial number satisfies x_0 ∈ [0, 1] and x_0 ∉ {0.0, 0.25, 0.5, 0.75, 1.0}.
logistic map is combined with GSA, the hybrid algorithm is labeled as CGSA1.
6.3.2 Piecewise linear chaotic map
The piecewise linear chaotic map (PWLCM) has attracted more and more attention in chaos research recently for its simplicity of representation and its good dynamical behavior. PWLCM is known to be ergodic and to have a uniform invariant density function on its definition interval [149]. The simplest PWLCM is defined in Eq. (6.12):

x_{k+1} = x_k / p,               if x_k ∈ (0, p)
x_{k+1} = (1 − x_k) / (1 − p),   if x_k ∈ [p, 1)        (6.12)
In the experiment, p is set to be 0.7. When PWLCM is combined with GSA, the
hybrid algorithm is labeled as CGSA2.
6.3.3 Gauss map
The Gauss map can be defined for hypersurfaces in R^n as a map from the hypersurface to the unit sphere S^{n−1} ⊂ R^n. Its iteration equation is defined by [143, 150]:

x_{k+1} = 0,                 if x_k = 0
x_{k+1} = (μ / x_k) mod 1,   otherwise        (6.13)
where μ is set to 1. When the Gauss map is combined with GSA, the hybrid algorithm is labeled as CGSA3.
6.3.4 Sinusoidal map
The following equation defines the sinusoidal map [148]:

x_{k+1} = a · x_k^2 · sin(π x_k)        (6.14)

For a = 2.3 and x_0 = 0.7, it has the following simplified form:

x_{k+1} = sin(π x_k)        (6.15)
When the sinusoidal map is combined with GSA, the hybrid algorithm is labeled as
CGSA4.
6.3.5 Sinus map
The sinus map [151, 152] is defined as

x_{k+1} = 2.3 · x_k^2 · sin(π x_k)        (6.16)
When the sinus map is combined with GSA, the hybrid algorithm is labeled as
CGSA5.
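The five maps of Eqs. (6.11)-(6.16) can be sketched directly as iteration functions; the parameter values (a = 4, p = 0.7, μ = 1) follow the text, and the function names are just convenient labels.

```python
import math

def logistic(x, a=4.0):        # Eq. (6.11), used in CGSA1
    return a * x * (1.0 - x)

def pwlcm(x, p=0.7):           # Eq. (6.12), used in CGSA2
    return x / p if x < p else (1.0 - x) / (1.0 - p)

def gauss(x, mu=1.0):          # Eq. (6.13), used in CGSA3
    return 0.0 if x == 0.0 else (mu / x) % 1.0

def sinusoidal(x):             # Eq. (6.15), used in CGSA4 (a = 2.3, x0 = 0.7)
    return math.sin(math.pi * x)

def sinus(x):                  # Eq. (6.16), used in CGSA5
    return 2.3 * x * x * math.sin(math.pi * x)

# Generate a short chaotic sequence starting from x0 = 0.74, as in Fig. 6.1.
x, seq = 0.74, []
for _ in range(5):
    x = logistic(x)
    seq.append(round(x, 4))
print(seq)
```

Iterating any of these functions from a valid starting point yields the chaotic sequence that is substituted for random numbers in the local search.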
To illustrate the details of the chaos, the distributions of x for all five chaotic maps are given in Fig. 6.1. The dynamic ranges of the five chaotic maps are summarized as follows: [0, 1] for the logistic, PWLCM, and Gauss maps; [0, 0.92] for the sinusoidal map; and [0, +∞) for the sinus map. It is worth pointing out that: 1) for all the values of x, we take two digits after the decimal point for convenience of illustration; 2) the distributions of x for PWLCM and the Gauss map are flatter than those of the other three maps, which suggests that the probabilities of x visiting the values in [0, 1] are nearly the same; and 3) only the values in [0, 1] are utilized in the chaotic local search. Although Xiang et al. [153] argued that a flat distribution of x performs better than a rough distribution when applied in chaotic search, they only gave simulation results comparing PWLCM with the logistic map. It is reasonable to expect that the performance of a chaotic search is related not only to the distribution of the chaos but also to the landscape of the optimization function. More evidence can be found in Section 6.5.
6.4 Chaotic gravitational search algorithm
Exploiting properties of chaos such as ergodicity, iteration-based searching algorithms called chaos optimization algorithms (COA) have been presented [141, 142]. It is easier for COA to escape from local minimum points than for traditional stochastic optimization algorithms. Owing to its ergodicity, a chaotic system passes through all the states of the phase space, following its own movement rule from an initial state. Thanks to this property, the search can traverse the neighborhood of the current optimal solution many times. Compared with a random local search, a chaotic local search can alleviate the blindness and randomness of the search process, so that better solutions near the current optimal solutions can be reached more effectively. The general steps of the chaotic local search (CLS) are given as follows:
Figure 6.1: The distribution of x under certain system parameters in 20000 iterations when x_0 = 0.74: (a) logistic map; (b) PWLCM; (c) Gauss map; (d) sinusoidal map; (e) sinus map.
Chaotic Local Search Algorithm
  step 1.  Set the parameters of the chaotic system and the number of chaotic search steps L
  step 2.  According to the chaotic system, generate a chaotic sequence of length N
  step 3.  Choose the best individual v_c in the current population
  step 4.  Set the chaotic search counter t to 0
  step 5.  while (t < L)
  step 6.      Superimpose an item of the chaotic sequence on v_c in any dimension to form a new individual v_n
  step 7.      Calculate the fitness value of the new individual v_n
  step 8.      Compute the current velocity according to Eq. (6.9)
  step 9.      for the optimization function f
  step 10.         if f(v_c) > f(v_n)
  step 11.             v_c ← v_n
  step 12.         end-if
  step 13.     t = t + 1
  step 14. end-while
Note that the search neighborhood of X_g is constructed as a hypercube centered at X_g with radius r, where the radius is shrunk at each iteration by r = ρ × r. The constant ρ is set to 0.978.
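The steps above can be sketched in Python. The mapping of the chaotic value into the symmetric perturbation [−r, r] and the greedy acceptance rule are assumptions based on the description, and `chaotic_map` stands for any of the maps of Section 6.3 (the logistic map is used in the example).

```python
import random

def chaotic_local_search(vc, f, chaotic_map, x0=0.74, L=20, r=1.0):
    """Greedily perturb the best agent vc along one random dimension per step."""
    x = x0
    for _ in range(L):
        x = chaotic_map(x)                    # next chaotic number in [0, 1]
        d = random.randrange(len(vc))         # pick one dimension at random
        vn = list(vc)
        vn[d] = vc[d] + r * (2.0 * x - 1.0)   # superimpose chaos within [-r, r]
        if f(vn) < f(vc):                     # keep the new point only if better
            vc = vn
    return vc

# Example: refine a point on the sphere function with the logistic map.
logistic = lambda x: 4.0 * x * (1.0 - x)
sphere = lambda v: sum(t * t for t in v)
result = chaotic_local_search([1.0, 1.0], sphere, logistic)
print(sphere(result) <= sphere([1.0, 1.0]))  # True: greedy search never worsens
```

Because the acceptance is greedy, the returned point is never worse than the starting point, which matches the role of CLS as a refinement of the global best agent.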
Based on GSA and the chaotic local search, an improved chaotic gravitational search algorithm is proposed here. It should be noticed that the local search is applied only to the current global best agent X_g obtained from GSA. Compared with carrying out the local search on all agents, this scheme is expected not only to save computational time but also to produce competitively good solutions. The procedure of CGSA is described in the following.
Chaotic Gravitational Search Algorithm
  for all agents i (i = 1, 2, ..., N) do
      initialize position x_i randomly in the search space
  end-for
  while termination criteria not satisfied do
      for all agents i do
          compute the overall force F_i^d(t) according to Eqs. (6.2)-(6.5)
          compute the acceleration a_i^d(t) according to Eq. (6.6)
          update the velocity according to Eq. (6.9)
          update the position according to Eq. (6.10)
      end-for
      find the global best agent X_g
      implement the chaotic local search (CLS)
      decrease the chaotic local search radius using r = ρ × r
  end-while
6.5 Numerical simulation
6.5.1 Experimental setup
To evaluate the performance of the proposed algorithms, the six benchmark optimization problems in Table 6.1 are used, where functions f1-f3 are unimodal, while functions f4-f6 are multimodal with plenty of local minima, whose number increases exponentially with the dimension of the function. The population size of all constructed algorithms is set to 50. The maximum iteration number is 1000 in each run. In order to eliminate stochastic discrepancy and to give a statistical analysis, each algorithm is repeated 30 times. The constants ε, α and G0 are set to 1000, 1.0E-100 and 100, respectively. The experiments are conducted in Microsoft Visual Studio 2010 on a personal PC.
6.5.2 Results and discussions
We first compared the performance of GSA, CGSA1, CGSA2, CGSA3, CGSA4, and CGSA5. Tables 6.2 to 6.7 record the minimum, maximum, and average fitness of each algorithm on the benchmark functions, respectively. From these tables, it is clear that all chaotic GSAs perform better than GSA, suggesting that chaotic search, as a local search approach, is able to enhance the global search capacity of the algorithm and prevent the search from sticking to a local solution. Moreover, the average fitness of the best-so-far solutions found by CGSA3 and CGSA4 is better than that of CGSA1 for all six functions, which indicates that the Gauss map and the sinusoidal map possess better searching performance than the logistic map used in [139]. Thus, it is evident that the searching dynamics of GSA are definitely affected by the distribution characteristics of the chaos, and that the famous logistic map might not be the best choice for many optimization problems.
In order to analyze the final best-so-far solutions in detail, a box-and-whisker diagram is given in Fig. 6.2. The vertical axis indicates the fitness values of the final solutions and the horizontal axis represents the six algorithms. From Fig. 6.2, it is apparent that CGSA2, CGSA3, and CGSA4 generate better solutions than CGSA1 in terms of not only the maximum, average, and minimum values but also the lower quartile, median, and upper quartile of the final best-so-far solutions for f2-f4 and f6. CGSA5 outperforms CGSA1 on f2, f3, f4, and f6. In particular, CGSA5 produces significantly better solutions than the other algorithms on f4. The reason appears to be the distinct distribution characteristics of the sinus map, where most of the chaotic values are located around 0.4.

Figure 6.2: Statistical values of the final best-so-far solutions obtained by the six algorithms: (a) f1; (b) f2; (c) f3; (d) f4; (e) f5; (f) f6.

To sum up, it can be concluded that: 1)
hybridization of GSA with chaos is demonstrated to be an essential aspect of the high performance; 2) the four newly incorporated chaotic maps generally exhibit a better influence on improving the performance of GSA than the logistic map; and 3) no specific chaotic map enables GSA to achieve the best solution for all optimization problems, suggesting that the performance of the hybrid CGSAs is related not only to the search capacity of the algorithm but also to the landscape of the solved problems.
To give some insight into how the chaotic local search affects the search dynamics of GSA, the convergence trendlines of functions f2, f3, f4, and f6 obtained by the six algorithms are given in Fig. 6.3. In this figure, the horizontal axis represents the iteration and the vertical axis denotes the average fitness of the best-so-far solutions on a logarithmic scale. The convergence graphs of the last 100 iterations are embedded to show the differences more clearly.

Figure 6.3: The average fitness trendlines of the best-so-far solutions found by the six algorithms: (a) f2; (b) f3; (c) f4; (d) f6.

It is difficult to distinguish the convergence
graphs of the six algorithms on f1, since the algorithms have a quite quick convergence speed that is mainly driven by GSA rather than by the chaotic local search. The search behaviors of the algorithms on the multimodal functions f4 and f6 are quite illuminating. CGSA3, CGSA4, and CGSA5 exhibit a much faster convergence speed than the other algorithms on the multimodal functions, suggesting that the Gauss map, the sinusoidal map, and the sinus map might be more suitable for helping the algorithms jump out of local solutions.
Furthermore, we define the ratio of the best-so-far solutions found by the five chaotic variants to those found by GSA versus the iteration. Let AF_GSA, AF_CGSA1, AF_CGSA2, AF_CGSA3, AF_CGSA4, and AF_CGSA5 represent the average fitness of the best-so-far solutions found by GSA, CGSA1, CGSA2, CGSA3, CGSA4, and CGSA5, respectively. The ratio is defined as follows:

Ra = AF_candidate / AF_GSA        (6.17)
Figure 6.4: The ratio of the best-so-far solutions found by the six algorithms: (a) f2; (b) f3; (c) f4; (d) f6.
where the candidate is one of the CGSAs. Fig. 6.4 depicts the ratios of the algorithms versus the iteration, where the values of the solutions found by GSA are taken as the basis, thus forming a horizontal line in the figure. For Fig. 6.4(a), (b), and (d), values above this line indicate solutions worse than those found by GSA, while values below the line denote better ones. For Fig. 6.4(c), the inverse holds, since the solution values are negative. From Fig. 6.4, it is clear that the chaotic GSAs significantly outperform GSA on f3, f4, and f6. On the other hand, the chaotic GSAs still have the capacity to jump out of local minima in the later search phases, as can be observed from the subfigure of Fig. 6.4(a), although there they only produce slightly better solutions than GSA.
6.6 Conclusion
In this chapter, improved gravitational search algorithms (CGSA) using five different chaotic maps were presented for global optimization. The chaotic maps were utilized to carry out the chaotic local search inserted into GSA, so that the resulting hybrid algorithm alternates between the chaotic search and GSA. Experimental results indicated that the chaotic search can directly improve the current solution found by GSA, leading to a faster convergence speed and a higher probability of jumping out of local optima.

Moreover, the distribution characteristics of the five chaotic maps were also observed. The results suggested that the four chaotic maps newly introduced in this chapter generally exhibit a better influence on improving the performance of GSA than the logistic map. Nevertheless, no specific chaotic map enables GSA to achieve the best solution for all optimization problems, suggesting that the performance of the hybrid CGSAs is related not only to the search capacity of the algorithm but also to the landscape of the solved problems. In the future, we plan to adaptively use multiple chaotic maps simultaneously in the chaotic search to construct a more powerful CGSA and to analyze the search dynamics of the algorithm.
Table 6.1: The function name, definition, dimension, feasible interval of variables, and the known global minimum of the six benchmark functions.

Function name   Definition                                                                   Dim  Interval      Global minimum
Sphere          f1(X) = Σ_{i=1}^{n} x_i^2                                                    30   [-100, 100]   0.0
Schwefel 2.22   f2(X) = Σ_{i=1}^{n} |x_i| + Π_{i=1}^{n} |x_i|                                30   [-10, 10]     0.0
Rosenbrock      f3(X) = Σ_{i=1}^{n-1} [100(x_{i+1} − x_i^2)^2 + (x_i − 1)^2]                 30   [-30, 30]     0.0
Schwefel 2.26   f4(X) = −Σ_{i=1}^{n} x_i sin(√|x_i|)                                         30   [-500, 500]   -418.9829D
Ackley          f5(X) = −20 exp(−0.2 √((1/n) Σ_{i=1}^{n} x_i^2))
                        − exp((1/n) Σ_{i=1}^{n} cos(2π x_i)) + 20 + e                        30   [-32, 32]     0.0
Griewank        f6(X) = (1/4000) Σ_{i=1}^{n} x_i^2 − Π_{i=1}^{n} cos(x_i/√i) + 1             30   [-600, 600]   0.0
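The six benchmark functions of Table 6.1 can be written down directly; the sketch below follows the standard definitions of these functions.

```python
import math

def sphere(x):         # f1
    return sum(v * v for v in x)

def schwefel_222(x):   # f2
    s, p = 0.0, 1.0
    for v in x:
        s += abs(v)
        p *= abs(v)
    return s + p

def rosenbrock(x):     # f3
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2
               for i in range(len(x) - 1))

def schwefel_226(x):   # f4, global minimum -418.9829 * D
    return -sum(v * math.sin(math.sqrt(abs(v))) for v in x)

def ackley(x):         # f5
    n = len(x)
    return (-20.0 * math.exp(-0.2 * math.sqrt(sum(v * v for v in x) / n))
            - math.exp(sum(math.cos(2.0 * math.pi * v) for v in x) / n)
            + 20.0 + math.e)

def griewank(x):       # f6
    s = sum(v * v for v in x) / 4000.0
    p = 1.0
    for i, v in enumerate(x, 1):
        p *= math.cos(v / math.sqrt(i))
    return s - p + 1.0

print(sphere([0.0] * 30), griewank([0.0] * 30))  # both are 0.0 at the global minimum
```

Note that f1-f3 and f5-f6 attain their global minimum 0.0 at the origin (f3 at the all-ones point), while f4 attains its minimum away from the origin, which is why its best-so-far values in the experiments are negative.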
Table 6.2: Statistical results of different methods for the Sphere function (f1).
Method   Minimum fitness   Maximum fitness   Average fitness
GSA      1.21E-17          3.25E-17          2.08E-17
CGSA-1   1.38E-17          3.11E-17          2.11E-17
CGSA-2   8.18E-18          3.50E-17          1.99E-17
CGSA-3   1.11E-17          3.69E-17          2.01E-17
CGSA-4   1.10E-17          4.05E-17          1.98E-17
CGSA-5   1.38E-17          3.65E-17          2.19E-17

Table 6.3: Statistical results of different methods for the Schwefel function (f2).
Method   Minimum fitness   Maximum fitness   Average fitness
GSA      1.44E-8           3.11E-8           2.28E-8
CGSA-1   1.56E-8           3.04E-8           2.21E-8
CGSA-2   1.48E-8           3.30E-8           2.10E-8
CGSA-3   1.40E-8           3.18E-8           2.11E-8
CGSA-4   1.54E-8           3.13E-8           2.06E-8
CGSA-5   1.38E-8           3.05E-8           2.03E-8

Table 6.4: Statistical results of different methods for the Rosenbrock function (f3).
Method   Minimum fitness   Maximum fitness   Average fitness
GSA      25.70             152.14            35.19
CGSA-1   25.44             85.49             27.62
CGSA-2   24.80             136.43            33.29
CGSA-3   25.07             27.06             25.42
CGSA-4   25.17             25.58             25.43
CGSA-5   23.73             82.17             29.75

Table 6.5: Statistical results of different methods for the Schwefel 2.26 function (f4).
Method   Minimum fitness   Maximum fitness   Average fitness
GSA      -3617.23          -2178.52          -2844.65
CGSA-1   -4288.88          -2321.88          -3110.29
CGSA-2   -7693.55          -4158.85          -5250.43
CGSA-3   -7001.99          -3645.29          -5050.60
CGSA-4   -7180.01          -3448.26          -4887.43
CGSA-5   -12561.4          -12123.8          -12383.54

Table 6.6: Statistical results of different methods for the Ackley function (f5).
Method   Minimum fitness   Maximum fitness   Average fitness
GSA      2.64E-9           4.42E-9           3.40E-9
CGSA-1   2.56E-9           4.70E-9           3.49E-9
CGSA-2   2.63E-9           4.91E-9           3.39E-9
CGSA-3   2.52E-9           4.45E-9           3.42E-9
CGSA-4   2.32E-9           4.49E-9           3.42E-9
CGSA-5   2.90E-9           4.73E-9           3.41E-9

Table 6.7: Statistical results of different methods for the Griewank function (f6).
Method   Minimum fitness   Maximum fitness   Average fitness
GSA      1.37              12.52             4.28
CGSA-1   1.25              4.50              2.17
CGSA-2   1.01E-14          4.41E-2           3.60E-3
CGSA-3   1.60E-14          7.31E-2           1.02E-2
CGSA-4   1.02E-14          7.5E-2            7.17E-3
CGSA-5   3.62E-2           0.88              0.38
Chapter 7
Evolutionary Model: Multi-objective Differential Evolution
7.1 Introduction
The differential evolution (DE) algorithm [154] is a technique that was originally devised to solve the Chebyshev polynomial fitting problem. It is a population-based stochastic meta-heuristic for global optimization on continuous domains, related both to simplex methods and to evolutionary algorithms. Due to its simplicity, robustness, and effectiveness, DE has been successfully applied to optimization problems arising in various practical applications [155], such as data clustering, image processing, etc. DE outperforms many other evolutionary algorithms in terms of convergence speed and the accuracy of its solutions. Its performance, however, still depends heavily on the setting of control parameters such as the mutation factor [156] for complex real-world optimization problems, especially those with multiple objectives [157, 158].

In multi-objective problems, several objectives (or criteria) are, not unusually, in conflict with each other, thus requiring a set of non-dominated solutions, i.e., Pareto-optimal solutions, to serve as the candidates for decision making. The general goals are to discover solutions as close to the Pareto-optimal front as possible, and to distribute the solutions as diversely as possible within the obtained non-dominated set. Many works have been reported that aim to satisfy these two goals. Wang
et al. [159] proposed a crowding entropy-based diversity measure to select the elite solutions into the elitist archive. Zhang et al. [160] utilized the direction information provided by archived inferior solutions to guide the differential mutations. Gong et al. [161] introduced ε-dominance and orthogonal design into DE to keep the diversity of the individuals along the trade-off surface. More recently, Chen et al. [162] proposed a cluster degree-based individual selection method to maintain the diversity of non-dominated solutions. A hybrid opposition-based DE algorithm was proposed by combining it with a multi-objective evolutionary gradient search [163]. Although these variants of multi-objective DE have demonstrated that DE is suitable for handling multiple objectives, little work, however, has been carried out to discuss the setting of control parameters, including the mutation factor, in multi-objective DE.
Based on the above considerations, in this work we propose an adaptive mutation
operator for DE to avoid premature convergence of the non-dominated solutions. In
the early search phases, the mutation scale factor F remains large
enough to explore the search space surrounding the non-dominated solutions, thus
maintaining the diversity of the Pareto set. As the evolution proceeds, F is
gradually reduced to exploit the promising search area, aiming to preserve good
information and to avoid destroying the optimal solutions. Furthermore, since
Zitzler et al. [164] observed that elitism helps achieve better convergence in
multi-objective evolutionary algorithms, an elitist scheme is adopted by maintaining
an external archive of the non-dominated solutions obtained during the evolution
process. Moreover, the ε-dominance strategy [165], which provides a good compromise
between convergence toward the Pareto-optimal front and diversity of the Pareto
fronts, is also used in the algorithm. With the elitist scheme and ε-dominance, the
cardinality of the Pareto-optimal region can be reduced, and no two obtained solutions
lie within the same small region. To verify the performance of the proposed
algorithm, five widely used benchmark multi-objective functions are adopted as the
test suite. Experimental results indicate that the proposed adaptive mutation based
multi-objective DE outperforms traditional multi-objective evolutionary algorithms
in terms of the convergence and diversity of the Pareto fronts.
7.2 Brief Introduction to DE
The standard DE is essentially a special kind of genetic algorithm based on real-valued
parameters and a greedy selection strategy for ensuring solution quality. An iteration of the classical DE
algorithm consists of the four basic steps: initialization of a population of search
variable vectors, mutation, crossover or recombination, and finally selection. DE
begins its search with a randomly initiated population for a global optimum point
in a D-dimensional real parameter space. We denote subsequent generations in DE
by G = {0, 1, 2, · · · , Gmax} and the i-th (i = 1, 2, ..., NP ) individual of the current
population is denoted as Xi,G = (x1,i,G, x2,i,G, ..., xj,i,G, ..., xD,i,G). The initial population
is randomly generated by:
xj,i,0 = xj,min + randi,j[0, 1] ∗ (xj,max − xj,min) (7.1)
where randi,j[0, 1] is a uniformly distributed random number in [0, 1], and xj,min and xj,max
represent the boundary values of the search space. For each individual vector Xi,G
(the target vector), DE uses the mutation operator to generate a
new individual Vi,G (the donor vector) according to Eq. (7.2).
Vi,G = Xr1,G + F ∗ (Xr2,G −Xr3,G) (7.2)
where the three individual vectors Xr1,G, Xr2,G, and Xr3,G are randomly selected from
the current population, r1, r2, r3 ∈ {1, 2, · · · , NP} are mutually distinct random
indices, and F is a real constant scale factor in [0, 2] that controls the amplification
of the differential variation (Xr2,G − Xr3,G). To increase the potential diversity of
the perturbed parameter vectors, a crossover operation comes into play after the
donor vector is generated through mutation. The binomial crossover operation is
defined as follows:
uj,i,G = { vj,i,G,  if randi,j[0, 1] ≤ Cr or j = jrand
         { xj,i,G,  otherwise                                        (7.3)
where Cr ∈ [0, 1] is called the crossover rate and randi,j[0, 1] is again a uniform
random number. After DE generates the offspring through the mutation and crossover
operations, the one-to-one greedy selection operator is performed as:

Xi,G+1 = { Ui,G,  if f(Ui,G) ≤ f(Xi,G)
         { Xi,G,  otherwise                                          (7.4)
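As an illustration, the four steps above can be assembled into a minimal single-objective DE/rand/1/bin loop. This sketch is not from the chapter: the function name, parameter defaults, and the sphere objective in the usage note are assumptions for demonstration.

```python
import numpy as np

def de_optimize(f, bounds, NP=30, F=0.5, Cr=0.9, G_max=200, seed=0):
    """Classic DE/rand/1/bin: initialization (7.1), mutation (7.2),
    binomial crossover (7.3), and one-to-one greedy selection (7.4)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T       # per-dimension bounds
    D = lo.size
    X = lo + rng.random((NP, D)) * (hi - lo)         # Eq. (7.1)
    fit = np.array([f(x) for x in X])
    for _ in range(G_max):
        for i in range(NP):
            # three mutually distinct random indices, all different from i
            r1, r2, r3 = rng.choice([j for j in range(NP) if j != i],
                                    size=3, replace=False)
            V = np.clip(X[r1] + F * (X[r2] - X[r3]), lo, hi)   # Eq. (7.2)
            j_rand = rng.integers(D)
            mask = rng.random(D) <= Cr
            mask[j_rand] = True                      # Eq. (7.3)
            U = np.where(mask, V, X[i])
            fU = f(U)
            if fU <= fit[i]:                         # Eq. (7.4)
                X[i], fit[i] = U, fU
    best = int(np.argmin(fit))
    return X[best], fit[best]
```

For example, minimizing the 5-dimensional sphere function f(x) = Σ xj² with these defaults drives the best fitness close to zero within a few hundred generations.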
7.3 Design of multi-objective differential evolution
algorithm
For solving multi-objective problems, the general requirements on the approximation
of the Pareto-optimal set are two-fold: (1) minimize the distance to the true
Pareto-optimal front, and (2) distribute the obtained non-dominated solutions
as diversely as possible [166]. This research aims
to address these two requirements; the procedure of the proposed adaptive
mutation based ε-dominance differential evolution (IDE) is summarized in Fig. 7.1.
To generate initial solutions evenly located over the whole decision space, the
orthogonal experimental design method [167] is adopted in IDE; refer to [168] for
a detailed description of orthogonal experimental design in population-based
evolutionary algorithms. After the orthogonal population (denoted as OP) is generated,
an initial archive is created from the nondominated individuals extracted from OP
through the traditional Pareto dominance method [169]. Then the initial evolutionary
population (EP), which is responsible for finding new non-dominated solutions, is
generated from the initial archive and OP. If the size of the initial archive is larger than NP,
[Figure 7.1 flowchart: Start → generate the initial orthogonal population OP →
generate the initial archive AR with the nondominated solutions from OP →
generate the initial EP from AR and OP → until the termination condition is
satisfied: produce offspring using the improved differential evolution operation
and evaluate the child individuals; update the evolutionary population; update AR
using the ε-dominance technique; G++ → output the final AR → End.]
Figure 7.1: The general flow chart of the proposed adaptive mutation based multi-objective differential evolution (IDE).
we randomly select NP solutions from the initial archive; otherwise, all ar_size
(the size of the initial archive) solutions in the initial archive are inserted into EP,
and the remaining NP − ar_size solutions are randomly selected from OP. In order
to accelerate convergence and let the archived individuals guide
the evolution, we adopt a hybrid selection mechanism when selecting the base
vector Xr1 in Eq. (7.2). In the early phase of the evolution, all of the
parents for mating are randomly selected from EP to generate the offspring. As
the evolution proceeds, elitist selection is used instead: one solution is randomly
chosen from the archive as the base parent, and the other two parents are randomly
selected from the evolutionary population EP.
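The hybrid selection of the mating parents can be sketched as below. The exact switch point between the random phase and the elitist phase is not specified in the text, so the fraction used here is an assumption:

```python
import random

def select_parents(EP, archive, G, G_max, switch=0.5):
    """Hybrid selection of the three parents for the mutation of Eq. (7.2).
    Early phase: all three parents are drawn from EP at random.
    Later phase: the base parent Xr1 comes from the elitist archive.
    The switch point (a fraction of the run) is an assumed detail."""
    r2, r3 = random.sample(EP, 2)                 # two distinct EP members
    if G < switch * G_max or not archive:
        base = random.choice(EP)                  # exploration phase
    else:
        base = random.choice(archive)             # elitist phase
    return base, r2, r3
```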
In previously reported works [159–163], all of these multi-objective DE algorithms
set the scaling factor F as a constant throughout the evolution, which frequently
causes premature convergence. Traditional differential evolution algorithms are
very sensitive to the setting of the scaling factor F. Experimental
work on a variety of DE algorithms has provided strong evidence supporting the view
that the performance of the algorithm depends heavily on the F value [170, 171].
More specifically, if F is too large, the DE algorithm
approximates a random search, so the search efficiency and the accuracy of the
obtained global optimum are quite low. On the contrary, if F is too small,
the population can lose diversity and converge prematurely. To alleviate this
problem, we propose an adaptive mutation operator that determines the mutation
scale adaptively according to the progress of the search, giving the
algorithm a larger mutation scale in the early search stages to maintain the
individuals' diversity and to avoid premature convergence. Later,
the mutation scale is gradually reduced to preserve good information and avoid
destroying the optimal solution, while increasing the probability of
converging to the optimal solutions.
To realize this adaptive setting of F, the rule is designed as in Eqs. (7.5) and (7.6):

t = e^(1 − Gm/(Gm + 1 − G))                                          (7.5)

F = F0 · 2^t                                                         (7.6)
where F0 is the initial mutation factor, Gm denotes the maximum number of fitness
evaluations, and G indicates the current evolution number. In the early search phase
of the algorithm, the adaptive mutation factor takes a relatively large value
within [F0, 2F0] to maintain the individuals' diversity.
As the evolution proceeds, the mutation factor is gradually reduced to
preserve good information, which is expected to balance well the exploration and
exploitation of the search.
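A small sketch of this schedule (the function name is illustrative): with the factor form above, F equals exactly 2·F0 at G = 1 and decays toward F0 as G approaches Gm.

```python
import math

def adaptive_F(G, G_m, F0=0.5):
    """Adaptive scale factor of Eqs. (7.5)-(7.6): F starts at 2*F0
    (t = 1 when G = 1) and decays toward F0 as G approaches G_m."""
    t = math.exp(1.0 - G_m / (G_m + 1.0 - G))   # Eq. (7.5)
    return F0 * 2.0 ** t                         # Eq. (7.6)
```

With Gm = 5000 and F0 = 0.5 this gives F = 1.0 at the first generation and F ≈ 0.5 at the last, matching the [F0, 2F0] range stated above.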
In addition, as noticed by Zitzler et al. [172], elitism helps achieve better
convergence in handling multiple objectives. Therefore, in this paper, the elitist
scheme is adopted by maintaining an external archive AR of the nondominated
solutions found during the evolutionary process. In order to achieve faster
convergence, we adopt the ε-dominance mechanism [173] to update the archive
population. At each generation, each newly generated non-dominated solution is
compared with every member already contained in the archive; the new individual
is saved in the archive only if no archived individual exists within an ε
distance of it. In this way, both the convergence and the diversity of the
Pareto fronts can be ensured within reasonable computational time.
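The archive update can be sketched with the usual box formulation of ε-dominance for minimization [165]. The chapter does not spell out the box comparison, so the details below (one solution per ε-box, same-box ties resolved by Pareto dominance, otherwise replaced) are a simplified assumption:

```python
def dominates(a, b):
    """Pareto dominance for minimization: a dominates b."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_eps_archive(archive, f_new, eps=0.05):
    """epsilon-dominance archive update (minimization): the objective
    space is partitioned into boxes of side eps and at most one solution
    per box survives; box-dominated entries are discarded."""
    box = lambda f: tuple(int(v // eps) for v in f)
    b_new = box(f_new)
    for f in archive:
        b = box(f)
        if dominates(b, b_new) or (b == b_new and dominates(f, f_new)):
            return archive                  # new point rejected
    # drop members whose box is dominated by (or equal to) the new box
    kept = [f for f in archive
            if not (dominates(b_new, box(f)) or box(f) == b_new)]
    return kept + [f_new]
```

This is what guarantees that no two archived solutions lie within the same ε-box, which is the "relatively small regions" property claimed earlier.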
7.4 Simulation and Analysis
Multi-objective optimization problem is also known as multi-criteria optimization
problem [174]. In order to evaluate the effectiveness of the proposed IDE and make
a comparison with other multi-objective evolutionary algorithms, five widely used
benchmark problems [172] involving ZDT1, ZDT2, ZDT3, ZDT4 and ZDT6 are
adopted as the test suit. All problems have two objective functions and all objective
functions are to be minimized. The parameter settings of IDE are as follows: the
maximum number of fitness evaluation Gm = 5000, the initial scaling factor value of
F0=0.5, the crossover probability of CR = 0.3, NP = 100. For each problem, we run
50 times independently with different random seeds, then compared the performance
of IDE with the one of the traditional multi-objective DE variants (MDE) [161]. In
addition, we compared the results of IDE algorithm with NSGA-II [169], SPEA2 [175]
and MOEO [176]. To assess the performance of the compared algorithms, the con-
vergence metric λ and the diversity metric ∆ are used [166]. The first convergence
metric λ measures the distance of the obtained non-dominated sets Q and the true
Pareto front approximation sets P ∗ as in Eq. (7).
λ = ( Σ_{i=1..|Q|} di ) / |Q|                                        (7.7)
Table 7.1: Comparison of the convergence metric between IDE and MDE.

Problem   ZDT1      ZDT2      ZDT3     ZDT4     ZDT6
MDE       0.0028    0.00064   0.0038   0.0026   0.0008
IDE       0.00075   0.00084   0.0030   0.0020   0.00075
Table 7.2: Comparison of the diversity metric between IDE and MDE.

Problem   ZDT1      ZDT2      ZDT3      ZDT4     ZDT6
MDE       0.2536    0.38565   0.40025   0.3850   0.3571
IDE       0.2425    0.2896    0.39575   0.2709   0.2595
where di is the Euclidean distance between solution i ∈ Q and the nearest member
of P*. Clearly, the lower the λ value, the better the convergence of the obtained
solutions, i.e., the obtained non-dominated set is closer to the
true Pareto front.
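The metric of Eq. (7.7) is straightforward to compute; a sketch (function and argument names assumed):

```python
import numpy as np

def convergence_metric(Q, P_star):
    """Eq. (7.7): mean Euclidean distance from each obtained
    non-dominated point in Q to its nearest member of P*."""
    Q = np.asarray(Q, dtype=float)
    P = np.asarray(P_star, dtype=float)
    # pairwise distance matrix of shape (|Q|, |P*|)
    d = np.linalg.norm(Q[:, None, :] - P[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())
```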
The second metric, ∆, measures the extent of the distribution among the obtained
non-dominated set Q, and is defined as in Eq. (7.8):

∆ = ( df + dl + Σ_{i=1..|Q|−1} |di − d̄| ) / ( df + dl + (|Q| − 1) d̄ )    (7.8)
where di is the Euclidean distance between consecutive points in Q, d̄ is the mean
of these distances, and df and dl denote the Euclidean distances between the extreme
points of P* and the boundary solutions of Q, respectively. Obviously, the lower
the ∆ value, the more uniform the distribution of the obtained solutions.
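Eq. (7.8) can be sketched for a bi-objective front; sorting Q along the first objective, so that the consecutive distances di are well defined, is an assumption of this sketch:

```python
import numpy as np

def diversity_metric(Q, extremes):
    """Eq. (7.8): spread metric Delta for a bi-objective front.
    Q is sorted along f1 so the di are consecutive-point distances;
    `extremes` holds the two extreme points of the true Pareto front."""
    Q = np.asarray(Q, dtype=float)
    Q = Q[np.argsort(Q[:, 0])]
    d = np.linalg.norm(np.diff(Q, axis=0), axis=1)   # di, i = 1..|Q|-1
    d_bar = d.mean()
    d_f = np.linalg.norm(Q[0] - np.asarray(extremes[0], dtype=float))
    d_l = np.linalg.norm(Q[-1] - np.asarray(extremes[1], dtype=float))
    return float((d_f + d_l + np.abs(d - d_bar).sum())
                 / (d_f + d_l + (len(Q) - 1) * d_bar))
```

A perfectly uniform front whose boundary points coincide with the true extremes yields ∆ = 0; larger gaps between neighbouring solutions raise the value.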
Table 7.1 records the convergence metric λ obtained by IDE and the earlier
MDE algorithm [161]. The diversity metric ∆ obtained by IDE and MDE is shown
in Table 7.2. Table 7.3 shows the convergence metric obtained by IDE and three
multi-objective evolutionary algorithms, and Table 7.4 presents comparative results
in terms of the diversity metric obtained by IDE and its competitors. From Table 7.1,
we find that IDE performs better with respect to convergence on all
tested instances except ZDT2, which suggests that the incorporated adaptive
[Figure 7.2: ten panels plotting f2 against f1, pairing the true Pareto front with
the front obtained by MDE and by IDE for each of ZDT1, ZDT2, ZDT3, ZDT4, and ZDT6.]
Figure 7.2: Pareto fronts obtained by IDE and its competitor algorithm MDE on ZDT1, ZDT2, ZDT3, ZDT4, and ZDT6, respectively.
Table 7.3: Comparison of the convergence metric among IDE, NSGA-II, SPEA2, and MOEO.

Algorithm   ZDT1       ZDT2       ZDT3       ZDT4       ZDT6
NSGA-II     0.033482   0.072391   0.114500   0.513053   0.296564
SPEA2       0.023285   0.16762    0.018409   4.9271     0.23255
MOEO        0.001277   0.001355   0.004385   0.008145   0.000630
IDE         0.00075    0.00084    0.0030     0.0020     0.00075
Table 7.4: Comparison of the diversity metric among IDE, NSGA-II, SPEA2, and MOEO.

Algorithm   ZDT1       ZDT2       ZDT3       ZDT4       ZDT6
NSGA-II     0.390307   0.430776   0.738540   0.702612   0.668025
SPEA2       0.154723   0.33945    0.4691     0.8239     1.04422
MOEO        0.327140   0.285062   0.965236   0.275567   0.225468
IDE         0.2425     0.2896     0.39575    0.2709     0.2595
mutation strategy indeed helps the search find better solutions. On the other hand,
the comparative results in Table 7.2 show that IDE is capable of finding a better
spread of solutions than MDE on all problems except ZDT6. From Table 7.3, it is
clear that IDE produces solutions significantly closer to the true Pareto fronts than
NSGA-II, SPEA2, and MOEO on all tested functions, with the exception that MOEO
finds slightly better solutions than IDE on ZDT6. With regard to the diversity
of the obtained non-dominated solutions, Table 7.4 shows an overall improvement
for IDE: its non-dominated solutions are located more evenly than those
obtained by its competitor algorithms, verifying that the proposed adaptive mutation
strategy together with ε-dominance clearly improves the performance of DE in
terms of diversity.
Furthermore, to understand the performance of our improved algorithm
more intuitively, Fig. 7.2 draws the Pareto fronts constructed from the
non-dominated solutions obtained by IDE and MDE on all tested functions.
From this figure, it is clear that the Pareto fronts obtained by IDE are
much better than those obtained by MDE. The performance on ZDT6 is quite
illuminating for further elaborating the search characteristics of the
compared algorithms. Almost the
same number of non-dominated solutions are obtained by both algorithms, and the
average distance (measured by λ) to the true Pareto front is also within an acceptable
tolerance (0.0008 vs. 0.00075). Nevertheless, the distribution of the non-dominated
solutions is quite different (0.3571 vs. 0.2595): IDE obtains a significantly more
evenly distributed non-dominated set for ZDT6, implying that IDE is capable
of finding a well-distributed and near-complete set of non-dominated solutions when
handling multiple objectives.
7.5 Conclusion
This paper proposed an adaptive mutation operator for the multi-objective
differential evolution algorithm. At the beginning of the search, the mutation
scale takes a relatively large value to maintain the individuals' diversity and
avoid premature convergence. As the evolution proceeds, the mutation operator
is gradually reduced to preserve good information and avoid destroying
the optimal solution. Together with the ε-dominance strategy, we constructed
an effective IDE for handling multiple objectives. We tested IDE on five standard
multi-objective test functions and compared its performance against MDE, NSGA-II,
SPEA2, and MOEO. It can be concluded that IDE is superior to the other algorithms
on most problems, indicating that our approach is able to obtain
uniformly distributed and near-optimal Pareto sets.
Chapter 8
Conclusions
In this thesis, we proposed several models based on neural and evolutionary mecha-
nisms.
Firstly, we proposed a new single neuron model with synaptic nonlinearities in a
dendritic tree. The neuron's computation has a neuron-pruning function that
reduces dimensionality by removing useless synapses and dendrites during learning,
forming a precise synaptic and dendritic morphology. The nonlinear interactions in
a dendritic tree are expressed using the Boolean logic operations AND (conjunction),
OR (disjunction), and NOT (negation). An error back-propagation algorithm is used
to train the neuron model. Furthermore, we applied the new model to the Exclusive
OR (XOR) problem, which it solves perfectly with the help of inhibitory synapses,
demonstrating synaptic nonlinear computation and the neuron's ability to learn.
Secondly, accumulated research results have suggested that the synaptic
nonlinearities of dendrites in a single neuron can possess powerful computational
capacity. Our previous works established an approximate neuronal model which is
able to capture the nonlinearities among excitatory and inhibitory inputs, and
thus successfully predicts the morphology of neurons performing specific learning
tasks. The gradient-based back-propagation (BP) method has been used to train the
dendritic neuron model, but due to its inherent tendency to become trapped in
local optima, BP usually cannot find satisfactory solutions. Thus, we proposed an
artificial immune algorithm to train the dendritic neuron model. In comparison to
BP, the artificial immune algorithm has the advantages that the training process
does not need gradient information, which enables the dendritic model to utilize
non-conventional transfer/activation functions in the soma, and that the learning
is accomplished with a population of antibodies, which lends itself to parallel
computation and greatly improves the probability of escaping local optima during
training. Experimental results on the well-known XOR problem and a geotechnical
engineering problem verified the effectiveness of the proposed artificial immune
algorithm.
Thirdly, with the number of liver disease deaths steadily increasing in recent
years, early detection and treatment of liver disease has become one of the most
active research areas for computational intelligence techniques. We proposed a
more realistic single neuron model with synaptic nonlinearities in a dendritic
tree for liver disorder diagnosis. The neuron's computation is performed as a
combination of dimensionality reduction and nonlinearity; a neuron-pruning
function removes useless synapses and dendrites during learning, forming a
distinct synaptic and dendritic morphology. The nonlinear interactions in a
dendritic tree are expressed using the Boolean logic operations AND (conjunction),
OR (disjunction), and NOT (negation), which makes the model well suited for
hardware implementation. Furthermore, an error back-propagation algorithm is used
to train the neuron model, and its performance is compared with a traditional
back-propagation neural network in terms of accuracy, sensitivity, and specificity.
We used the BUPA liver disorders dataset obtained from the UCI Machine Learning
Repository to verify the proposed method. Simulation results show promise for the
use of this single neuron model as an effective pattern classification method in
liver disorder diagnostics.
Fourthly, the gravitational search algorithm (GSA) has gained increasing attention
for dealing with complex optimization problems. Nevertheless, it still has
drawbacks such as slow convergence and a tendency to become trapped in local
minima. Chaos generated by the logistic map, with its properties of ergodicity and
stochasticity, has been combined with GSA to enhance its search performance. In
this work, four other chaotic maps were utilized to further improve the search
capacity of the hybrid chaotic gravitational search algorithm (CGSA), and six
widely used benchmark optimization instances were chosen from the literature as
the test suite. Simulation results indicate that all five chaotic maps can improve
the performance of the original GSA in terms of solution quality and convergence
speed. Moreover, the four newly incorporated chaotic maps improve the performance
of GSA more than the logistic map does, suggesting that the hybrid search dynamics
of CGSA are significantly affected by the distribution characteristics of the
chaotic maps.
Fifthly, differential evolution is well known as a powerful and efficient
population-based stochastic real-parameter optimization algorithm over continuous
spaces. DE has recently been shown to outperform several well-known stochastic
optimization methods in solving multi-objective problems. Nevertheless, its
performance is still limited in finding uniformly distributed and near-optimal
Pareto fronts. To alleviate these limitations, this thesis introduced an adaptive
mutation operator that avoids premature convergence by adaptively tuning the
mutation scale factor F, and adopted the ε-dominance strategy to update the
archive that stores the non-dominated solutions. Experiments based on five widely
used multi-objective functions were conducted. Simulation results demonstrate the
effectiveness of our proposed approach with respect to the quality of solutions in
terms of the convergence and diversity of the Pareto fronts.
Bibliography
[1] A. P. Engelbrecht, Computational intelligence: an introduction. John Wiley
& Sons, 2007.
[2] C. Darwin, The Origins of Species by Means of Natural Selection, Or the
Preservation of Favoured Races in the Struggle for Life. Kartindo.com, 1888.
[3] L. A. Zadeh, “Fuzzy sets,” Information and control, vol. 8, no. 3, pp. 338–353,
1965.
[4] E. Marais, The soul of the white ant. the Philovox, 2009.
[5] C. D. Wynne, “The soul of the ape,” American Scientist, vol. 89, no. 2, pp.
120–122, 2001.
[6] R. C. Eberhart and J. Kennedy, “A new optimizer using particle swarm theo-
ry,” in Proceedings of the sixth international symposium on micro machine and
human science, vol. 1. New York, NY, 1995, pp. 39–43.
[7] J. Kennedy, “Particle swarm optimization,” in Encyclopedia of Machine Learn-
ing. Springer, 2010, pp. 760–766.
[8] S. F. M. Burnet et al., The clonal selection theory of acquired immunity. Uni-
versity Press Cambridge, 1959.
[9] P. Bretscher and M. Cohn, “A theory of self-nonself discrimination paralysis and
induction involve the recognition of one and two determinants on an antigen,
respectively,” Science, vol. 169, no. 3950, pp. 1042–1049, 1970.
[10] K. J. Lafferty and A. Cunningham, “A new analysis of allogeneic interactions,”
Immunology and Cell Biology, vol. 53, no. 1, pp. 27–42, 1975.
[11] B. Franklin and M. Bergerman, “Cultural algorithms: Concepts and experi-
ments,” in Evolutionary Computation, 2000. Proceedings of the 2000 Congress
on, vol. 2. IEEE, 2000, pp. 1245–1251.
[12] S. Forrest, A. S. Perelson, L. Allen, and R. Cherukuri, “Self-nonself discrimi-
nation in a computer,” in Proceedings of the IEEE Symposium on Research in
Security and Privacy. IEEE, 1994, p. 202.
[13] K. Mori, M. Tsukiyama, and T. Fukuda, “Immune algorithm with searching
diversity and its application to resource allocation problem,” Transactions-
Institute of Electrical Engineers of Japan C, vol. 113, pp. 872–872, 1993.
[14] N. K. Jerne, “Towards a network theory of the immune system,” in Annales
d’immunologie, vol. 125, no. 1-2, 1974, pp. 373–389.
[15] A. S. Perelson, “Immune network theory,” Immunological reviews, vol. 110,
no. 1, pp. 5–36, 1989.
[16] J. D. Farmer, N. H. Packard, and A. S. Perelson, “The immune system, adapta-
tion, and machine learning,” Physica D: Nonlinear Phenomena, vol. 22, no. 1,
pp. 187–204, 1986.
[17] J. E. Hunt and D. E. Cooke, “Learning using an artificial immune system,”
Journal of network and computer applications, vol. 19, no. 2, pp. 189–212, 1996.
[18] L. Chen and D. B. Flies, “Molecular mechanisms of t cell co-stimulation and
co-inhibition,” Nature Reviews Immunology, vol. 13, no. 4, pp. 227–242, 2013.
[19] C. D. Mills, K. Ley, K. Buchmann, and J. Canton, “Sequential immune re-
sponses: The weapons of immunity,” Journal of innate immunity, vol. 7, no. 5,
2015.
[20] P. Matzinger, “Essay 1: the danger model in its historical context,” Scandina-
vian journal of immunology, vol. 54, no. 1-2, pp. 4–9, 2001.
[21] ——, “The real function of the immune system,” Last accessed on, pp. 06–04,
2004.
[22] U. Aickelin, D. Dasgupta, and F. Gu, “Artificial immune systems,” in Search
Methodologies. Springer, 2014, pp. 187–211.
[23] K. Makisara, O. Simula, J. Kangas, and T. Kohonen, Artificial neural networks.
Elsevier, 2014, vol. 2.
[24] T. Back, U. Hammel, and H.-P. Schwefel, “Evolutionary computation: Com-
ments on the history and current state,” Evolutionary computation, IEEE
Transactions on, vol. 1, no. 1, pp. 3–17, 1997.
[25] H.-G. Beyer, The theory of evolution strategies. Springer Science & Business
Media, 2013.
[26] Y. Hu, K. Liu, X. Zhang, L. Su, E. Ngai, and M. Liu, “Application of evolution-
ary computation for rule discovery in stock algorithmic trading: A literature
review,” Applied Soft Computing, vol. 36, pp. 534–551, 2015.
[27] W. Gong, Z. Cai, and D. Liang, “Adaptive ranking mutation operator based
differential evolution for constrained optimization,” Cybernetics, IEEE Trans-
actions on, vol. 45, no. 4, pp. 716–727, 2015.
[28] J. C. Bezdek, “Ieee fellows-class of 2015 [society briefs],” Computational Intel-
ligence Magazine, IEEE, vol. 10, no. 2, pp. 7–17, 2015.
[29] W. Pedrycz, A. Sillitti, and G. Succi, “Computational intelligence: an intro-
duction,” in Computational Intelligence and Quantitative Software Engineering.
Springer, 2016, pp. 13–31.
[30] G. Beni, “From swarm intelligence to swarm robotics,” in Swarm robotics.
Springer, 2005, pp. 1–9.
[31] J. Halloy, G. Sempo, G. Caprari, C. Rivault, M. Asadpour, F. Tache, I. Said,
V. Durier, S. Canonge, J. M. Ame et al., “Social integration of robots into
groups of cockroaches to control self-organized choices,” Science, vol. 318, no.
5853, pp. 1155–1158, 2007.
[32] R. S. Parpinelli and H. S. Lopes, “New inspirations in swarm intelligence: a
survey,” International Journal of Bio-Inspired Computation, vol. 3, no. 1, pp.
1–16, 2011.
[33] M. Dorigo and C. Blum, “Ant colony optimization theory: A survey,” Theoret-
ical computer science, vol. 344, no. 2, pp. 243–278, 2005.
[34] C. Blum, “Ant colony optimization: Introduction and recent trends,” Physics
of Life reviews, vol. 2, no. 4, pp. 353–373, 2005.
[35] M. Dorigo, M. Birattari, and T. Stutzle, “Ant colony optimization,” Computa-
tional Intelligence Magazine, IEEE, vol. 1, no. 4, pp. 28–39, 2006.
[36] W. Xiang and H. Lee, “Ant colony intelligence in multi-agent dynamic manufac-
turing scheduling,” Engineering Applications of Artificial Intelligence, vol. 21,
no. 1, pp. 73–85, 2008.
[37] T. Blackwell and J. Branke, “Multiswarms, exclusion, and anti-convergence
in dynamic environments,” Evolutionary Computation, IEEE Transactions on,
vol. 10, no. 4, pp. 459–472, 2006.
[38] M. Dorigo, V. Maniezzo, and A. Colorni, “Ant system: optimization by a colony
of cooperating agents,” Systems, Man, and Cybernetics, Part B: Cybernetics,
IEEE Transactions on, vol. 26, no. 1, pp. 29–41, 1996.
[39] T. Stutzle and H. H. Hoos, “Max–min ant system,” Future generation computer
systems, vol. 16, no. 8, pp. 889–914, 2000.
[40] T. Stutzle and H. Hoos, “Max-min ant system and local search for the traveling
salesman problem,” in Evolutionary Computation, 1997., IEEE International
Conference on. IEEE, 1997, pp. 309–314.
[41] C. A. C. Coello, D. A. Van Veldhuizen, and G. B. Lamont, Evolutionary algo-
rithms for solving multi-objective problems. Springer, 2002, vol. 242.
[42] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm intelligence: from natural
to artificial systems. Oxford university press, 1999, no. 1.
[43] J. Kennedy, J. F. Kennedy, R. C. Eberhart, and Y. Shi, Swarm intelligence.
Morgan Kaufmann, 2001.
[44] Y. Shi and R. Eberhart, “A modified particle swarm optimizer,” in Evolution-
ary Computation Proceedings, 1998. IEEE World Congress on Computational
Intelligence., The 1998 IEEE International Conference on. IEEE, 1998, pp.
69–73.
[45] J. J. Liang, A. K. Qin, P. N. Suganthan, and S. Baskar, “Comprehensive learn-
ing particle swarm optimizer for global optimization of multimodal functions,”
Evolutionary Computation, IEEE Transactions on, vol. 10, no. 3, pp. 281–295,
2006.
[46] C. A. C. Coello, G. T. Pulido, and M. S. Lechuga, “Handling multiple ob-
jectives with particle swarm optimization,” Evolutionary Computation, IEEE
Transactions on, vol. 8, no. 3, pp. 256–279, 2004.
[47] J. Robinson and Y. Rahmat-Samii, “Particle swarm optimization in electro-
magnetics,” Antennas and Propagation, IEEE Transactions on, vol. 52, no. 2,
pp. 397–407, 2004.
[48] R. Mendes, J. Kennedy, and J. Neves, “The fully informed particle swarm: sim-
pler, maybe better,” Evolutionary Computation, IEEE Transactions on, vol. 8,
no. 3, pp. 204–210, 2004.
[49] F. Van den Bergh and A. P. Engelbrecht, “A cooperative approach to particle
swarm optimization,” Evolutionary Computation, IEEE Transactions on, vol. 8,
no. 3, pp. 225–239, 2004.
[50] D. Dasgupta, Z. Ji, F. A. Gonzalez et al., “Artificial immune system (ais)
research in the last five years.” in IEEE Congress on Evolutionary Computation
(1), 2003, pp. 123–130.
[51] S. A. Hofmeyr and S. Forrest, “Architecture for an artificial immune system,”
Evolutionary computation, vol. 8, no. 4, pp. 443–473, 2000.
[52] S. Tonegawa, “Somatic generation of antibody diversity,” Nature, vol. 302, no.
5909, pp. 575–581, 1983.
[53] P. Matzinger, “The danger model: a renewed sense of self,” Science, vol. 296,
no. 5566, pp. 301–305, 2002.
[54] J. Timmis, M. Neal, and J. Hunt, “An artificial immune system for data anal-
ysis,” Biosystems, vol. 55, no. 1, pp. 143–150, 2000.
[55] J. Timmis and M. Neal, “A resource limited artificial immune system for data
analysis,” Knowledge-Based Systems, vol. 14, no. 3, pp. 121–130, 2001.
[56] M. J. Shlomchik, A. Marshak-Rothstein, C. B. Wolfowicz, T. L. Rothstein, and
M. G. Weigert, “The role of clonal selection and somatic mutation in autoim-
munity,” Nature, vol. 328, no. 6133, pp. 805–811, 1987.
[57] P. K. Harmer, P. D. Williams, G. H. Gunsch, and G. B. Lamont, “An artificial
immune system architecture for computer security applications,” Evolutionary
computation, IEEE transactions on, vol. 6, no. 3, pp. 252–280, 2002.
[58] L. N. De Castro and F. J. Von Zuben, “Learning and optimization using the
clonal selection principle,” Evolutionary Computation, IEEE Transactions on,
vol. 6, no. 3, pp. 239–251, 2002.
[59] S. Gao, H. Dai, G. Yang, and Z. Tang, “A novel clonal selection algorithm
and its application to traveling salesman problem,” IEICE Transactions on
Fundamentals of Electronics, Communications and Computer Sciences, vol. 90,
no. 10, pp. 2318–2325, 2007.
[60] Y. Yu, L. Cunhua, G. Shangce, and T. Zheng, “Quantum interference crossover-based clonal selection algorithm and its application to traveling salesman problem,” IEICE Transactions on Information and Systems, vol. 92, no. 1, pp. 78–85, 2009.
[61] G. Shangce, T. Zheng, and J. Zhang, “An improved clonal selection algorithm and its application to traveling salesman problems,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. 90, no. 12, pp. 2930–2938, 2007.
[62] K. Tanaka and M. Sugeno, “Stability analysis and design of fuzzy control sys-
tems,” Fuzzy sets and systems, vol. 45, no. 2, pp. 135–156, 1992.
[63] H. Li, S. Yin, Y. Pan, and H.-K. Lam, “Model reduction for interval type-2
takagi–sugeno fuzzy systems,” Automatica, vol. 61, pp. 308–314, 2015.
[64] W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in
nervous activity,” The bulletin of mathematical biophysics, vol. 5, no. 4, pp.
115–133, 1943.
[65] M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: MIT Press, 1969.
[66] B. W. Mel, “Information processing in dendritic trees,” Neural Computation,
vol. 6, no. 6, pp. 1031–1085, 1994.
[67] M. Hausser and B. Mel, “Dendrites: bug or feature?” Current opinion in
neurobiology, vol. 13, no. 3, pp. 372–383, 2003.
[68] Y. Todo, H. Tamura, K. Yamashita, and Z. Tang, “Unsupervised learnable neu-
ron model with nonlinear interaction on dendrites,” Neural Networks, vol. 60,
pp. 96–103, 2014.
[69] C. Koch, T. Poggio, and V. Torre, “Nonlinear interactions in a dendritic tree:
localization, timing, and role in information processing,” Proceedings of the
National Academy of Sciences, vol. 80, no. 9, pp. 2799–2802, 1983.
[70] R. M. García-Gimeno, C. Hervás-Martínez, and M. I. de Silóniz, “Improving artificial neural networks with a pruning methodology and genetic algorithms for their application in microbial growth prediction in food,” International Journal of Food Microbiology, vol. 72, no. 1, pp. 19–30, 2002.
[71] R. C. Paolicelli, G. Bolasco, F. Pagani, L. Maggi, M. Scianni, P. Panzanelli,
M. Giustetto, T. A. Ferreira, E. Guiducci, L. Dumas et al., “Synaptic pruning
by microglia is necessary for normal brain development,” Science, vol. 333, no.
6048, pp. 1456–1458, 2011.
[72] L. K. Low and H.-J. Cheng, “Axon pruning: an essential step underlying the
developmental plasticity of neuronal connections,” Philosophical Transactions
of the Royal Society of London B: Biological Sciences, vol. 361, no. 1473, pp.
1531–1544, 2006.
[73] M. M. Islam, M. Akhand, M. A. Rahman, and K. Murase, “Weight freezing to
reduce training time in designing artificial neural networks,” in Proceedings of
International Conference on Computer and Information Technology, 2002, pp.
132–136.
[74] J. Sietsma and R. J. Dow, “Neural net pruning-why and how,” in IEEE Inter-
national Conference on Neural Networks. IEEE, 1988, pp. 325–333.
[75] H. Cuntz, M. Remme, and B. Torben-Nielsen, The Computing Dendrite: From
Structure to Function. Springer, 2014.
[76] J. C. Magee, “Dendritic integration of excitatory synaptic input,” Nature Re-
views Neuroscience, vol. 1, no. 3, pp. 181–190, 2000.
[77] S. R. Williams and G. J. Stuart, “Role of dendritic synapse location in the
control of action potential output,” Trends in neurosciences, vol. 26, no. 3, pp.
147–154, 2003.
[78] M. London and M. Hausser, “Dendritic computation,” Annu. Rev. Neurosci.,
vol. 28, pp. 503–532, 2005.
[79] A. T. Gulledge, B. M. Kampa, and G. J. Stuart, “Synaptic integration in den-
dritic trees,” Journal of neurobiology, vol. 64, no. 1, pp. 75–90, 2005.
[80] T. Branco and M. Hausser, “The single dendritic branch as a fundamental
functional unit in the nervous system,” Current opinion in neurobiology, vol. 20,
no. 4, pp. 494–502, 2010.
[81] H. Sossa and E. Guevara, “Efficient training for dendrite morphological neural
networks,” Neurocomputing, vol. 131, pp. 132–142, 2014.
[82] P. J. Sjostrom, E. A. Rancz, A. Roth, and M. Hausser, “Dendritic excitability
and synaptic plasticity,” Physiological reviews, vol. 88, no. 2, pp. 769–840, 2008.
[83] X. Chen, U. Leischner, N. L. Rochefort, I. Nelken, and A. Konnerth, “Functional
mapping of single spines in cortical neurons in vivo,” Nature, vol. 475, no. 7357,
pp. 501–505, 2011.
[84] E. Salinas and L. Abbott, “A model of multiplicative neural responses in parietal
cortex,” Proceedings of the national academy of sciences, vol. 93, no. 21, pp.
11 956–11 961, 1996.
[85] F. Gabbiani, H. G. Krapp, C. Koch, and G. Laurent, “Multiplicative compu-
tation in a visual neuron sensitive to looming,” Nature, vol. 420, no. 6913, pp.
320–324, 2002.
[86] M. Liang, S.-X. Wang, and Y.-H. Luo, “Fast learning algorithms for multi-layered feedforward neural network,” in Proceedings of the IEEE 1994 National Aerospace and Electronics Conference (NAECON 1994). IEEE, 1994, pp. 787–790.
[87] C. Charalambous, “Conjugate gradient algorithm for efficient training of artificial neural networks,” in IEE Proceedings G (Circuits, Devices and Systems), vol. 139, no. 3. IET, 1992, pp. 301–310.
[88] M. T. Hagan and M. B. Menhaj, “Training feedforward networks with the Marquardt algorithm,” IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 989–993, 1994.
[89] X. Yao and Y. Liu, “A new evolutionary system for evolving artificial neural networks,” IEEE Transactions on Neural Networks, vol. 8, no. 3, pp. 694–713, 1997.
[90] J. Ilonen, J.-K. Kamarainen, and J. Lampinen, “Differential evolution training
algorithm for feed-forward neural networks,” Neural Processing Letters, vol. 17,
no. 1, pp. 93–105, 2003.
[91] J. Yu, L. Xi, and S. Wang, “An improved particle swarm optimization for evolv-
ing feedforward artificial neural networks,” Neural Processing Letters, vol. 26,
no. 3, pp. 217–231, 2007.
[92] S. Kiranyaz, T. Ince, A. Yildirim, and M. Gabbouj, “Evolutionary artificial neu-
ral networks by multi-dimensional particle swarm optimization,” Neural Net-
works, vol. 22, no. 10, pp. 1448–1462, 2009.
[93] S. Mirjalili, S. Z. M. Hashim, and H. M. Sardroudi, “Training feedforward
neural networks using hybrid particle swarm optimization and gravitational
search algorithm,” Applied Mathematics and Computation, vol. 218, no. 22, pp.
11 125–11 137, 2012.
[94] S. Mirjalili, S. M. Mirjalili, and A. Lewis, “Let a biogeography-based optimizer
train your multi-layer perceptron,” Information Sciences, vol. 269, pp. 188–209,
2014.
[95] N. L. Azad, A. Mozaffari, and J. K. Hedrick, “A hybrid switching predictive
controller based on bi-level kernel-based elm and online trajectory builder for
automotive coldstart emissions reduction,” Neurocomputing, vol. 173, pp. 1124–
1141, 2016.
[96] C. Yang, L. Tham, X.-T. Feng, Y. Wang, and P. Lee, “Two-stepped evolution-
ary algorithm and its application to stability analysis of slopes,” Journal of
Computing in Civil Engineering, vol. 18, no. 2, pp. 145–153, 2004.
[97] S. K. Das, R. K. Biswal, N. Sivakugan, and B. Das, “Classification of slopes
and prediction of factor of safety using differential evolution neural networks,”
Environmental Earth Sciences, vol. 64, no. 1, pp. 201–210, 2011.
[98] http://www.liverfoundation.org/downloads/alf_download_1173.pdf.
[99] http://www.endoflifecare-intelligence.org.uk/resources/publications/deaths_from_liver_disease.
[100] P. Jeatrakul and K. Wong, “Comparing the performance of different neural
networks for binary classification problems,” in Eighth International Symposium
on Natural Language Processing. IEEE, 2009, pp. 111–115.
[101] Y. Zhang, Y. Yin, D. Guo, X. Yu, and L. Xiao, “Cross-validation based weights and structure determination of Chebyshev-polynomial neural networks for pattern classification,” Pattern Recognition, vol. 47, no. 10, pp. 3414–3428, 2014.
[102] M. Seera and C. P. Lim, “A hybrid intelligent system for medical data clas-
sification,” Expert Systems with Applications, vol. 41, no. 5, pp. 2239–2249,
2014.
[103] M. Paliwal and U. A. Kumar, “Neural networks and statistical techniques: A
review of applications,” Expert systems with applications, vol. 36, no. 1, pp.
2–17, 2009.
[104] C. Koch, Biophysics of computation: information processing in single neurons. Oxford University Press, 1998.
[105] A. Destexhe and E. Marder, “Plasticity in single neuron and circuit computa-
tions,” Nature, vol. 431, no. 7010, pp. 789–795, 2004.
[106] L. Abbott and W. G. Regehr, “Synaptic computation,” Nature, vol. 431, no.
7010, pp. 796–803, 2004.
[107] R. A. Silver, “Neuronal arithmetic,” Nature Reviews Neuroscience, vol. 11,
no. 7, pp. 474–489, 2010.
[108] Y. Todo, H. Tamura, K. Yamashita, and Z. Tang, “Unsupervised learnable neu-
ron model with nonlinear interaction on dendrites,” Neural Networks, vol. 60,
pp. 96–103, 2014.
[109] Q. K. Al-Shayea, “Artificial neural networks in medical diagnosis,” Internation-
al Journal of Computer Science Issues, vol. 8, no. 2, pp. 150–154, 2011.
[110] W. G. Baxt, “Application of artificial neural networks to clinical medicine,” The Lancet, vol. 346, no. 8983, pp. 1135–1138, 1995.
[111] E. Alkım, E. Gurbuz, and E. Kılıc, “A fast and adaptive automated disease
diagnosis method with an innovative neural network model,” Neural Networks,
vol. 33, pp. 88–96, 2012.
[112] F. Rosenblatt, Principles of neurodynamics. Spartan Books, 1962.
[113] G. N. Priya and A. Kannan, “An innovative classification model for CAD dataset using SVM based iterative linear discriminant analysis,” in Power Electronics and Renewable Energy Systems. Springer, 2015, pp. 1415–1423.
[114] S. Blomfield, “Arithmetical operations performed by nerve cells,” Brain re-
search, vol. 69, no. 1, pp. 115–124, 1974.
[115] N. Brunel, V. Hakim, and M. J. Richardson, “Single neuron dynamics and
computation,” Current opinion in neurobiology, vol. 25, pp. 149–155, 2014.
[116] W. Rall, R. Burke, T. Smith, P. G. Nelson, and K. Frank, “Dendritic location of
synapses and possible mechanisms for the monosynaptic epsp in motoneurons,”
J. Neurophysiol, vol. 30, no. 5, pp. 884–915, 1967.
[117] V. Torre and T. Poggio, “A synaptic mechanism possibly underlying directional
selectivity to motion,” Proceedings of the Royal Society of London B: Biological
Sciences, vol. 202, no. 1148, pp. 409–416, 1978.
[118] Y.-N. Jan and L. Y. Jan, “Branching out: mechanisms of dendritic arboriza-
tion,” Nature Reviews Neuroscience, vol. 11, no. 5, pp. 316–328, 2010.
[119] J. W. Schnupp and A. J. King, “Neural processing: the logic of multiplication
in single neurons,” Current Biology, vol. 11, no. 16, pp. R640–R642, 2001.
[120] S. Bahramirad, A. Mustapha, and M. Eshraghi, “Classification of liver disease
diagnosis: A comparative study,” in International Conference on Informatics
and Applications. IEEE, 2013, pp. 42–46.
[121] W. Zhu, N. Zeng, N. Wang et al., “Sensitivity, specificity, accuracy, associated confidence interval and ROC analysis with practical SAS® implementations,” NESUG proceedings: health care and life sciences, Baltimore, Maryland, pp. 1–9, 2010.
[122] P. Anooj, “Clinical decision support system: Risk level prediction of heart
disease using weighted fuzzy rules,” Journal of King Saud University-Computer
and Information Sciences, vol. 24, no. 1, pp. 27–40, 2012.
[123] S. Ozsen and S. Gunes, “Attribute weighting via genetic algorithms for attribute weighted artificial immune system (AWAIS) and its application to heart disease and liver disorders problems,” Expert Systems with Applications, vol. 36, no. 1, pp. 386–392, 2009.
[124] J. F. Khaw, B. Lim, and L. E. Lim, “Optimal design of neural networks using
the taguchi method,” Neurocomputing, vol. 7, no. 3, pp. 225–245, 1995.
[125] Z. Beheshti, S. M. H. Shamsuddin, E. Beheshti, and S. S. Yuhaniz, “Enhance-
ment of artificial neural network learning using centripetal accelerated particle
swarm optimization for medical diseases diagnosis,” Soft Computing, vol. 18,
no. 11, pp. 2253–2270, 2014.
[126] D. Delen, G. Walker, and A. Kadam, “Predicting breast cancer survivability: a
comparison of three data mining methods,” Artificial intelligence in medicine,
vol. 34, no. 2, pp. 113–127, 2005.
[127] S. S. Haykin, Neural networks and learning machines. Upper Saddle River, NJ: Pearson Education, 2009, vol. 3.
[128] D. Pham, S. Dimov, and Z. Salem, “Technique for selecting examples in in-
ductive learning,” in European symposium on intelligent techniques, Aachen,
Germany. Citeseer, 2000, pp. 119–127.
[129] N. Cheung, “Machine learning techniques for medical analysis,” BSc thesis, School of Information Technology and Electrical Engineering, University of Queensland, vol. 19, 2001.
[130] T. Van Gestel, J. A. Suykens, G. Lanckriet, A. Lambrechts, B. De Moor, and J. Vandewalle, “Bayesian framework for least-squares support vector machine classifiers, Gaussian processes, and kernel Fisher discriminant analysis,” Neural Computation, vol. 14, no. 5, pp. 1115–1147, 2002.
[131] C. E. Yeow, “Nomograms visualization of naïve Bayes classification on liver disorders data,” School of Computer Engineering, Nanyang Technological University, 2006.
[132] U. V. Kulkarni and S. V. Shinde, “Neuro-fuzzy classifier based on the Gaussian membership function,” in International Conference on Computing, Communications and Networking Technologies (ICCCNT). IEEE, 2013, pp. 1–7.
[133] S. H. S. A. Ubaidillah, R. Sallehuddin, and N. H. Mustaffa, “Classification of
liver cancer using artificial neural network and support vector machine,” in
Proc. Of Int. Conf on Advance in Communication Network, and Computing,
2014, pp. 1–6.
[134] S. H. S. A. Ubaidillah, R. Sallehuddin, and N. A. Ali, “Cancer detection using artificial neural network and support vector machine: A comparative study,” Jurnal Teknologi, vol. 65, no. 1, 2013.
[135] J. Ji, S. Gao, J. Cheng, Z. Tang, and Y. Todo, “An approximate logic neuron
model with a dendritic structure,” Neurocomputing, vol. 173, pp. 1775–1783,
2016.
[136] E. Rashedi, H. Nezamabadi-Pour, and S. Saryazdi, “GSA: a gravitational search algorithm,” Information Sciences, vol. 179, no. 13, pp. 2232–2248, 2009.
[137] P. K. Roy, “Solution of unit commitment problem using gravitational search al-
gorithm,” International Journal of Electrical Power & Energy Systems, vol. 53,
pp. 85–94, 2013.
[138] E. Rashedi, H. Nezamabadi-Pour, and S. Saryazdi, “BGSA: binary gravitational search algorithm,” Natural Computing, vol. 9, no. 3, pp. 727–745, 2010.
[139] S. Gao, C. Vairappan, Y. Wang, Q. Cao, and Z. Tang, “Gravitational search
algorithm combined with chaos for unconstrained numerical optimization,” Ap-
plied Mathematics and Computation, vol. 231, pp. 48–62, 2014.
[140] L. Bing and J. Weisun, “Chaos optimization method and its application,” Con-
trol Theory and Applications, vol. 14, no. 4, pp. 613–615, 1997.
[141] J. Yang, J. Z. Zhou, W. Wu, F. Liu, C. Zhu, and G. Cao, “A chaos algorithm
based on progressive optimality and tabu search algorithm,” in Proceedings of
2005 International Conference on Machine Learning and Cybernetics, vol. 5.
IEEE, 2005, pp. 2977–2981.
[142] H. Xu, Y. Zhu, T. Zhang, and Z. Wang, “Application of mutative scale chaos
optimization algorithm in power plant units economic dispatch,” Journal of
Harbin Institute of Technology, vol. 32, no. 4, pp. 55–58, 2000.
[143] M. Bucolo, R. Caponetto, L. Fortuna, M. Frasca, and A. Rizzo, “Does chaos
work better than noise?” IEEE Circuits and Systems Magazine, vol. 2, no. 3,
pp. 4–19, 2002.
[144] R. Resnick, D. Halliday, and J. Walker, Fundamentals of physics. John Wiley,
1988.
[145] P. Schroeder, “Gravity from the ground up,” Proceedings of the NPA, vol. 7,
pp. 498–503, 2010.
[146] R. Mansouri, F. Nasseri, and M. Khorrami, “Effective time variation of g in
a model universe with variable space dimension,” Physics Letters A, vol. 259,
no. 3, pp. 194–200, 1999.
[147] S. Talatahari, B. Farahmand Azar, R. Sheikholeslami, and A. Gandomi, “Im-
perialist competitive algorithm combined with chaos for global optimization,”
Communications in Nonlinear Science and Numerical Simulation, vol. 17, no. 3,
pp. 1312–1319, 2012.
[148] R. M. May, “Simple mathematical models with very complicated dynamics,”
Nature, vol. 261, no. 5560, pp. 459–467, 1976.
[149] A. Baranovsky and D. Daems, “Design of one-dimensional chaotic maps with
prescribed statistical properties,” International Journal of Bifurcation and
Chaos, vol. 5, no. 06, pp. 1585–1598, 1995.
[150] M. S. Tavazoei and M. Haeri, “Comparison of different one-dimensional maps as
chaotic search pattern in chaos optimization algorithms,” Applied Mathematics
and Computation, vol. 187, no. 2, pp. 1076–1085, 2007.
[151] B. Alatas, “Chaotic bee colony algorithms for global numerical optimization,”
Expert Systems with Applications, vol. 37, no. 8, pp. 5682–5687, 2010.
[152] S. Talatahari, B. F. Azar, R. Sheikholeslami, and A. Gandomi, “Imperialist
competitive algorithm combined with chaos for global optimization,” Commu-
nications in Nonlinear Science and Numerical Simulation, vol. 17, no. 3, pp.
1312–1319, 2012.
[153] T. Xiang, X. Liao, and K. Wong, “An improved particle swarm optimization
algorithm combined with piecewise linear chaotic map,” Applied Mathematics
and Computation, vol. 190, no. 2, pp. 1637–1645, 2007.
[154] K. Price, R. M. Storn, and J. A. Lampinen, Differential evolution: a practical
approach to global optimization. Springer, 2006.
[155] S. Das and P. N. Suganthan, “Differential evolution: A survey of the state-of-
the-art,” IEEE Transactions on Evolutionary Computation, no. 99, pp. 1–28,
2010.
[156] J. Zhang and A. C. Sanderson, “JADE: adaptive differential evolution with optional external archive,” IEEE Transactions on Evolutionary Computation, vol. 13, no. 5, pp. 945–958, 2009.
[157] J. Wang, J. Liao, Y. Zhou, and Y. Cai, “Differential evolution enhanced with multiobjective sorting-based mutation operators,” IEEE Transactions on Cybernetics, vol. 44, no. 12, pp. 2792–2805, 2014.
[158] L. V. Santana-Quintero and C. A. C. Coello, “An algorithm based on differential
evolution for multi-objective problems,” International Journal of Computation-
al Intelligence Research, vol. 1, no. 1, pp. 151–169, 2005.
[159] Y.-N. Wang, L.-H. Wu, and X.-F. Yuan, “Multi-objective self-adaptive differ-
ential evolution with elitist archive and crowding entropy-based diversity mea-
sure,” Soft Computing, vol. 14, no. 3, pp. 193–209, 2010.
[160] J. Zhang and A. C. Sanderson, “Self-adaptive multi-objective differential evo-
lution with direction information provided by archived inferior solutions,” in
IEEE Congress on Evolutionary Computation, 2008, pp. 2801–2810.
[161] W. Gong and Z. Cai, “An improved multiobjective differential evolution based on Pareto-adaptive epsilon-dominance and orthogonal design,” European Journal of Operational Research, vol. 198, no. 2, pp. 576–601, 2009.
[162] B. Chen, Y. Lin, W. Zeng, D. Zhang, and Y.-W. Si, “Modified differential evo-
lution algorithm using a new diversity maintenance strategy for multi-objective
optimization problems,” Applied Intelligence, pp. 1–25, 2015.
[163] J. K. Chong and K. C. Tan, “An opposition-based self-adaptive hybridized differential evolution algorithm for multi-objective optimization (OSADE),” in Proceedings of the 18th Asia Pacific Symposium on Intelligent and Evolutionary Systems. Springer, 2015, pp. 447–461.
[164] E. Zitzler and L. Thiele, “Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach,” IEEE Transactions on Evolutionary Computation, vol. 3, no. 4, pp. 257–271, 1999.
[165] M. Laumanns, L. Thiele, K. Deb, and E. Zitzler, “Combining convergence and
diversity in evolutionary multiobjective optimization,” Evolutionary computa-
tion, vol. 10, no. 3, pp. 263–282, 2002.
[166] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. Da Fonseca, “Per-
formance assessment of multiobjective optimizers: An analysis and review,”
IEEE Transactions on Evolutionary Computation, vol. 7, no. 2, pp. 117–132,
2003.
[167] K. Fang and C. Ma, “Orthogonal and uniform experimental design,” Beijing: Science Press, 2001.
[168] Y. W. Leung and Y. Wang, “An orthogonal genetic algorithm with quantiza-
tion for global numerical optimization,” IEEE Transactions on Evolutionary
Computation, vol. 5, no. 1, pp. 41–53, 2001.
[169] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.
[170] S. Dasgupta, S. Das, A. Biswas, and A. Abraham, “On stability and conver-
gence of the population-dynamics in differential evolution,” AI Communica-
tions, vol. 22, no. 1, pp. 1–20, 2009.
[171] J. Brest, S. Greiner, B. Boskovic, M. Mernik, and V. Zumer, “Self-adapting
control parameters in differential evolution: A comparative study on numeri-
cal benchmark problems,” IEEE Transactions on Evolutionary Computation,
vol. 10, no. 6, pp. 646–657, 2006.
[172] E. Zitzler, K. Deb, and L. Thiele, “Comparison of multiobjective evolutionary
algorithms: Empirical results,” Evolutionary computation, vol. 8, no. 2, pp.
173–195, 2000.
[173] A. Hernández-Díaz, L. Santana-Quintero, C. Coello Coello, and J. Molina, “Pareto-adaptive ε-dominance,” Evolutionary Computation, vol. 15, no. 4, pp. 493–517, 2007.
[174] K. Deb, Multi-objective optimization using evolutionary algorithms. John Wiley & Sons, 2001, vol. 16.
[175] E. Zitzler, M. Laumanns, and L. Thiele, “SPEA2: Improving the strength Pareto evolutionary algorithm,” in Proc. Evolutionary Methods for Design Optimization and Control with Applications to Industrial Problems, 2001, pp. 95–100.
[176] M.-R. Chen and Y.-Z. Lu, “A novel elitist multiobjective optimization algorith-
m: Multiobjective extremal optimization,” European Journal of Operational
Research, vol. 188, no. 3, pp. 637–651, 2008.
Acknowledgements
I would like to deeply thank the various people who, during my study and research, gave me useful and helpful assistance. Without their care and consideration, this thesis would likely not have been completed.
To my supervisor Prof. Zheng Tang at the University of Toyama, who introduced me to the significant and fascinating world of Intelligent Soft Computing, for his support and continuous encouragement. Without his kind guidance and encouragement, I would never have completed this degree. Furthermore, his help and support were not limited to my academic career, but also extended to my daily life in Japan. Numerous stimulating discussions and his constant support have kept me moving forward ever since I came to Japan. Thanks to him, I was able to accomplish this thesis within three years.
I would like to thank my thesis referees, Prof. Hirobayashi, Prof. Yamazaki, and Associate Prof. Gao of the University of Toyama, for reviewing and examining my thesis and for their many valuable comments and suggestions.
To all the members of the Intelligent Information Systems Research Lab at the University of Toyama, for all their help and friendship, which made this time much more enjoyable.
I would like to thank all the members of my family for their unconditional love, support, and encouragement throughout this endeavor and my entire course of study. In particular, I would like to thank my husband, who endured my seemingly endless hours of absorption in this effort without complaint, who gave me his unwavering support, and who took care of many of those “nuisance” items usually referred to as “Real Life” while I was off in my other world, that of completing this endeavor.