adaptive selection in evolutionary algorithm thesis

An Adaptive Selection Scheme for Balancing Exploitation and Exploration By Muhammad Riyad Parvez Student ID: 200605043 Department of Computer Science and Engineering Bangladesh University of Engineering and Technology March 2012

Upload: riyad-parvez

Post on 11-May-2015

Page 1: Adaptive Selection in Evolutionary Algorithm thesis

An Adaptive Selection Scheme for Balancing

Exploitation and Exploration

By

Muhammad Riyad Parvez

Student ID: 200605043

Department of Computer Science and Engineering

Bangladesh University of Engineering and Technology

March 2012

DECLARATION

I hereby declare that the work presented in this thesis is the outcome of the investigation performed by me under the supervision of Dr. Md. Monirul Islam, Associate Professor, Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka. I also declare that neither this thesis nor any part thereof has been or is being submitted elsewhere for the award of any degree or diploma.

Signature

(Muhammad Riyad Parvez)

Candidate

Countersigned

(Dr. Md. Monirul Islam)

Supervisor

Acknowledgements

Firstly, I would like to express my profound gratitude and respect to Dr. Md. Monirul Islam, Associate Professor, Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, for his continuous support, advice and care. His endless patience, scholarly guidance, continual encouragement, constant and energetic supervision, constructive criticism, valuable advice, and reading and correcting of many inferior drafts at all stages have made it possible to complete this thesis.

I would also like to thank all the faculty members and staff of the Department of CSE, BUET, for their support and cooperation.

Finally, I would like to express my deep respect and gratitude to my parents and my family, and to Charles Darwin, from whom evolutionary algorithms get their aesthetic beauty.

Abstract

Evolutionary algorithms (EAs) are successfully applied to problems where typical optimization algorithms and local search methods fail. Like any search algorithm, an EA has to face the conflicting goals of local exploitation and global exploration during the search process. The success of an EA lies in its ability to explore and exploit simultaneously. However, EAs are also prone to losing population diversity in early generations, which leaves the population trapped in local optima and degrades performance. Though numerous schemes have been suggested for various stages of an EA to prevent premature convergence, maintaining a balance between exploration and exploitation has not been attempted at the survivor selection stage. Existing selection strategies focus on either exploitation or exploration.

This thesis introduces the Adaptive Survivor Selection Strategy (ASSS), a new concept that tries to maintain the required amount of diversity at the survivor selection stage. Along with the new selection strategy, this thesis also presents new criteria for measuring the diversity of both an individual and the whole population. The key idea of the strategy is to measure diversity across the population, calculate the amount of diversity needed at that time, and try to reach that diversity level by selecting survivors using the newly introduced diversity measurement technique. ASSS uses both the fitness of an individual and how diverse the individual is with respect to the current population to calculate an adaptive survivor selection fitness function. Using that fitness, we simply weed out individuals that are inferior in terms of fitness and diversity. Because this survivor fitness function is adaptive, it gives control over selection pressure.

The ability of ASSS to maintain the required level of diversity at any point in the algorithm is evaluated on a number of benchmark numerical optimization problems, and the results are compared with several existing selection schemes. Experimental results show that ASSS achieves a significant performance gain in managing diversity for any sort of EA.

Table of Contents

Chapter 1: Introduction
1.1 General Introduction
1.2 Objective of the Thesis
1.3 Thesis Organization

Chapter 2: Background
2.1 When EA is Needed
2.2 Advantages of EA
2.3 Disadvantages of EA
2.4 Canonical Structure of EA
2.5 Representation of Gene
2.6 Major Branches of EA
2.6.1 Genetic Algorithm
2.6.2 Evolutionary Programming
2.6.3 Evolutionary Strategy
2.6.4 Genetic Programming
2.6.5 Memetic Algorithm
2.7 Existing Work
2.7.1 Dynamic Parameter Control
2.7.2 Maintaining Diversity and Multi-population GAs
2.7.3 Memory Based Genetic Algorithm
2.7.4 Mutation Based Work
2.7.5 Survivor Selection Based Work

Chapter 3: Proposed Algorithms
3.1 Dual Population Genetic Algorithm
3.1.1 Advantages
3.1.2 Disadvantages
3.1.3 Recommendation
3.2 Modified DPGA Proposal
3.2.1 Structure of Individual
3.2.2 Initialization
3.2.3 Parent Selection
3.2.4 Generating Parent Individual On The Fly
3.2.5 Mutation
3.2.6 Survivor Selection
3.2.6.1 Exploited Individual
3.2.6.2 Explored Individual
3.2.6.3 Normal Individual
3.2.7 Schedule of T
3.2.8 Advantages
3.3 New Survivor Selection Strategy
3.4 New Mutation Strategy
3.4.1 Laplace Distribution
3.4.2 Slash Distribution
3.4.3 Student’s t-Distribution

Chapter 4: Experimental Study
4.1 Modified DPGA
4.1.1 Pitfalls of Modified DPGA
4.2 Adaptive Survivor Selection Strategy

Chapter 5: Conclusion
5.2 Future Works
5.2.1 Modified DPGA
5.2.2 Adaptive Survivor Selection
5.2.3 New Distribution Based Mutation

References

Appendix

List of Figures

Figure 1.1(a): Change in best fitness (best solution) with number of generations
Figure 1.1(b): Comparison between Random Search, EA and Problem Tailored Search
Figure 2.4: Basic skeleton of an Evolutionary Algorithm
Figure 2.6.4: Individual structure of GP
Figure 2.7.4: Probability Density Function (PDF) of Gaussian distribution
Figure 2.7.5: Truncation Selection
Figure 3.1(a): Offspring Generation of DPGA
Figure 3.1(b): Reserve Population Fitness Function
Figure 3.4: Probability Density Function of Stable Family
Figure 3.4.1(a): Probability Density Function of Laplace Distribution
Figure 3.4.1(b): Comparison of Gaussian and Laplace Distribution
Figure 3.4.2: Probability Density Function of Slash Distribution at different parameters
Figure 3.4.3: Probability Density Function for Student’s t-distribution with different degrees of freedom
Figure 4.2(a): Change in diversity across generations
Figure 4.2(b): Number of buckets searched

List of Symbols

MPGA Multi Population Genetic Algorithm

DPGA Dual Population Genetic Algorithm

GA Genetic Algorithm

StGA Standard Genetic Algorithm

EA Evolutionary Algorithm

EC Evolutionary Computing

EP Evolutionary Programming

MA Memetic Algorithm

FEP Fast Evolutionary Programming

ASSS Adaptive Survivor Selection Strategy

Chapter 1

Introduction

1.1 General Introduction

Evolutionary Algorithms (EAs) are computational systems that draw ideas and inspiration from natural evolution. An EA is a generic, population-based, meta-heuristic optimization algorithm and falls into the category of bio-inspired computing. It uses selection, crossover, and mutation mechanisms borrowed from natural evolution, and the survival-of-the-fittest principle lies at its heart [1] [2]. EAs are often viewed as function optimizers, although the range of problems to which they are applied is quite broad. One of the many advantages of EAs is that they do not require extensive domain knowledge, although domain knowledge can be incorporated when it is available.

The curve of best fitness against generations for an EA is asymptotic in nature. Fitness improves rapidly in the early generations, and the rate of improvement then gradually decreases. After a certain number of generations, the improvement in best fitness becomes negligible; at that point we say the population has converged. The population is expected to converge to a good enough solution, but sometimes it converges to a local optimum, which is not an acceptable result. This phenomenon is called premature convergence.

Figure 1.1(a): Change in best fitness (best solution) with number of generations

EAs perform better than random search because of their exploitative behavior: an EA uses random walk but also tries to exploit good solutions. An EA also outperforms local greedy search. Local greedy searches are exploitative in nature and are often trapped in local maxima. Because an EA has random walk and maintains the required level of diversity, it is less likely to be trapped in local maxima. Problem-tailored searches outperform EAs only on the problem for which the search is tailored, and they rely on deep domain knowledge of that problem. Such deep domain knowledge is not readily available, and incorporating it into a problem-tailored search is difficult.

Figure 1.1(b): Comparison between Random Search, EA and Problem Tailored Search [4]

1.2 Thesis Objective

This thesis mainly focuses on maintaining diversity in single-population algorithms. It is frequently observed that populations lose diversity too early and their individuals become trapped in local optima. For lack of diversity, trapped individuals cannot escape the basin of a local optimum. This phenomenon is called premature convergence. The objective of this thesis is to investigate better schemes that can maintain the diversity of a population and also give control over diversity. The goal is an adaptive diversity-maintaining scheme.

The work is carried out in three focus areas:

1. Modifying the Dual Population Genetic Algorithm (DPGA) so that it can properly manage diversity.
2. Seeking a survivor selection technique that is adaptive and gives more control over diversity at any point in the algorithm.
3. Examining probability distributions, other than those already in use, that can give an appropriate amount of jumps at any stage of evolution.

1.3 Thesis Organization

The rest of the thesis is organized as follows. Chapter 2 introduces the fundamentals of evolutionary algorithms, with their operators and processes. The essential terms related to evolutionary algorithms are explained with examples. The strengths, limitations, and applications of evolutionary algorithms are also mentioned.

In Chapter 3, we introduce new evolutionary strategies, namely Modified DPGA, the Adaptive Survivor Selection Strategy (ASSS), and New Mutation Based on Distributions, to balance the exploitative and explorative features of the standard evolutionary algorithm. The different stages, operators and procedures of Modified DPGA, ASSS, and Mutation Based on Distributions are described in detail. It is also explained how they differ substantially from other existing works.

Chapter 4 evaluates Modified DPGA and ASSS on a number of benchmark numerical optimization problems and makes comparisons with several other existing works. Although Modified DPGA did not perform well, we gained valuable insight into how it can be modified further to improve performance. An in-depth experimentation with the parameters, operators and stages of ASSS, and their effects on population fitness and diversity, is also carried out. Finally, in Chapter 5, we summarize our work and provide directions for future research.

Chapter 2

Background

Evolutionary Algorithms (EAs) consist of several heuristics that are able to solve optimization tasks by imitating some aspects of natural evolution. They may use different levels of abstraction, but they always work on whole populations of possible solutions for a given task. EAs are a proven set of heuristics that are flexible to use and impose only minimal requirements on the optimization task.

2.1 When EA is Needed

The search space is large, complex or poorly understood.
Domain knowledge is scarce, or expert knowledge is difficult to encode to narrow the search space.
Only a target (fitness) function is provided.
No mathematical analysis is available.
Traditional search methods fail.
A good enough solution, rather than the best solution, is needed.
Local search methods cannot give good enough solutions.
Continuous optimization problems.

2.2 Advantages of EA

Applicable to a wide range of problems.

Useful in areas without good problem specific techniques.

No explicit assumptions about the search space necessary.

Easy to implement.

Any-time behavior.

2.3 Disadvantages of EA

Problem representation must be robust.
No general guarantee of an optimum.
No solid theoretical foundations (yet).
Parameter tuning is a trial-and-error process (although self-adaptive variants exist in evolution strategies).
Sometimes high memory requirements.
Implementation: high degree of freedom.

2.4 Canonical Structure of EA

EAs are a family of algorithms, and no single definitive structure exists among them. Most EAs, however, more or less follow the structure below:

1. Initialization: The initial population of candidate solutions is usually generated

randomly across the search space. However, domain specific knowledge or other

knowledge can easily be incorporated.

2. Evaluation: Once the population is initialized or offspring population is created,

the fitness value of the candidate solutions is evaluated.

3. Parent Selection: Selection allocates more copies of those solutions with higher

fitness values and thus imposes the survival-of-the-fittest mechanism on the

candidate solutions. The main idea of selection is to prefer better solutions to

worse ones, and many selection procedures have been proposed to accomplish this

idea, including roulette-wheel selection, stochastic universal selection, ranking

selection and tournament selection, some of which are described in the next

section.

4. Recombination: Recombination combines parts of two or more parental solutions

to create new, possibly better solutions (i.e. offspring). There are many ways of

accomplishing this (some of which are discussed in the next section), and

competent performance depends on a properly designed recombination

mechanism. The offspring under recombination will not be identical to any

particular parent and will instead combine parental traits in a novel manner.

5. Mutation: While recombination operates on two or more parental chromosomes,

mutation locally but randomly modifies a solution. Again, there are many

variations of mutation, but it usually involves one or more changes being made to

an individual’s trait or traits. In other words, mutation performs a random walk in

the vicinity of a candidate solution.

6. Replacement: The offspring population created by selection, recombination, and mutation replaces the original parental population. Many replacement techniques, such as elitist replacement, generation-wise replacement and steady-state replacement, are used in GAs.

7. Repeat steps 2–6 until a terminating condition is met.
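
The steps listed above can be sketched as a minimal generational loop. This is only an illustrative sketch under stated assumptions, not the implementation used in this thesis: the sphere objective, binary tournament parent selection, arithmetic recombination, Gaussian mutation, single-elite replacement, and all parameter values and function names are hypothetical choices made for brevity.

```python
import random

def sphere(x):
    """Hypothetical objective to minimize: sum of squares, optimum at the origin."""
    return sum(v * v for v in x)

def tournament(pop, fits, rng, k=2):
    """Parent selection: return the best of k randomly drawn individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[min(picks, key=lambda i: fits[i])]

def evolve(dim=5, pop_size=30, generations=100, sigma=0.3, seed=0):
    """Run the loop and return the best fitness seen in each generation."""
    rng = random.Random(seed)
    # Step 1: random initialization across the (assumed) search space [-5, 5]^dim.
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):              # Step 7: repeat until the budget is spent.
        fits = [sphere(ind) for ind in pop]   # Step 2: evaluation.
        best = min(range(pop_size), key=lambda i: fits[i])
        history.append(fits[best])
        offspring = [pop[best][:]]            # Step 6: elitist replacement keeps the best.
        while len(offspring) < pop_size:
            a = tournament(pop, fits, rng)    # Step 3: parent selection.
            b = tournament(pop, fits, rng)
            w = rng.random()
            # Step 4: arithmetic recombination of the two parents.
            child = [w * x + (1 - w) * y for x, y in zip(a, b)]
            # Step 5: Gaussian mutation, a random walk near the child.
            child = [v + rng.gauss(0, sigma) for v in child]
            offspring.append(child)
        pop = offspring
    history.append(min(sphere(ind) for ind in pop))
    return history
```

Because the single best individual is copied unchanged into every new generation, the returned best-fitness history never gets worse, which mirrors the asymptotic fitness curve described in Chapter 1.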

Figure (2.4): Basic skeleton of an Evolutionary Algorithm

2.5 Representation of Gene

Individual representations are typically divided into two types:

1. Genotypic Representation: Genes are internal structures that determine the physical characteristics of an individual. They are usually represented by an array of letters, like genes in human DNA; in an EA, they are represented by a bit-string. Genotypic representation is used extensively in Genetic Algorithms, but it has some limitations. Most real-world problems are not naturally expressed in genotypic form, so we have to devise a scheme to encode candidate solutions as bit-strings, and the performance of the algorithm depends on that encoding.
2. Phenotypic Representation: Individuals are represented by real-valued vectors, so there is no need to convert them to any other representation. The algorithm works directly on the real-valued vectors of the problem. This representation is used extensively in Evolutionary Strategies and Evolutionary Programming, and in real-valued function optimization.
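
To make the difference between the two representations concrete, the sketch below decodes a bit-string genotype into a real-valued phenotype. The linear mapping, the interval [-5, 5], the 8-bit length and the function name `decode` are assumptions chosen for illustration; the thesis does not prescribe a particular encoding.

```python
def decode(bits, low, high):
    """Map a bit-string genotype (list of 0/1) to a real value in [low, high]."""
    as_int = int("".join(str(b) for b in bits), 2)
    max_int = 2 ** len(bits) - 1
    return low + (high - low) * as_int / max_int

genotype = [1, 0, 0, 0, 0, 0, 0, 0]      # 8-bit genotype, integer value 128
phenotype = decode(genotype, -5.0, 5.0)  # real value the fitness function would see
```

With a phenotypic representation this decoding step disappears, because the search operators act on the real value directly.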

2.6 Major Branches of EA

EAs are divided into four major branches.

2.6.1 Genetic Algorithm

The Genetic Algorithm (GA) was first formulated by John Holland. Holland’s original GA, called the standard Genetic Algorithm, uses two parents to produce two offspring. It simulates Darwinian evolution. Search operators are applied only to the genotypic representation; hence it is called a Genotypic Algorithm. It emphasizes the role of crossover, with mutation as a background operator. GAs use binary strings extensively as the representation of individuals.

2.6.2 Evolutionary Programming

Evolutionary Programming (EP) was first proposed by Lawrence J. Fogel [2]. It is closer to Lamarckian evolution. It does not use any kind of crossover; mutation alone is used for both exploitation and exploration. Individuals are represented by two parts: object variables and mutation step sizes. Both are essentially real-valued vectors, i.e. phenotypes, so EP is called a Phenotypic Algorithm.

2.6.3 Evolutionary Strategies

Evolutionary Strategies (ES) were first proposed by Ingo Rechenberg. Individuals are represented by real-valued vectors, and ES are good optimizers of real-valued functions. Like EP, they are Phenotypic Algorithms. Mutation plays the main role, although crossover is also used. ES have a special self-adapting mutation step size. ES use the following basic notation:

1. (p,c): The p parents 'produce' c children using mutation. Each of the c children is then assigned a fitness value, depending on its quality in the problem-specific environment. The best (fittest) p children become the next generation's parents. This means the c children are sorted by their fitness value and the first p individuals are selected to be the next generation's parents (c must be greater than or equal to p).
2. (p+c): The p parents 'produce' c children using mutation. Each of the c children is then assigned a fitness value, depending on its quality in the problem-specific environment. The best (fittest) p individuals of both parents and children become the next generation's parents. This means the c children together with the p parents are sorted by their fitness value and the first p individuals are selected to be the next generation's parents.
3. (p/r,c): The p parents 'produce' c children using mutation and recombination. Each of the c children is then assigned a fitness value, depending on its quality in the problem-specific environment. The best (fittest) p children become the next generation's parents. This means the c children are sorted by their fitness value and the first p individuals are selected to be the next generation's parents (c must be greater than or equal to p).
4. (p/r+c): The p parents 'produce' c children using mutation and recombination. Each of the c children is then assigned a fitness value, depending on its quality in the problem-specific environment. The best (fittest) p individuals of both parents and children become the next generation's parents. This means the c children together with the p parents are sorted by their fitness value and the first p individuals are selected to be the next generation's parents.
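
The difference between comma and plus selection in the notation above can be sketched as follows. This is a hedged illustration of the mutation-only variants: the function name `es_generation`, the fixed step size `sigma`, the uniform choice of a parent for each child, and the convention that a lower fitness value is better are all assumptions, not details taken from this thesis.

```python
import random

def es_generation(parents, fitness, c, sigma, rng, plus=False):
    """One ES generation with Gaussian mutation only.

    plus=False gives (p,c) selection: survivors come from the children alone.
    plus=True gives (p+c) selection: parents compete with their children.
    """
    p = len(parents)
    if not plus and c < p:
        raise ValueError("(p,c) selection needs c >= p")
    # Each child is a mutated copy of a randomly chosen parent.
    children = [[v + rng.gauss(0, sigma) for v in rng.choice(parents)]
                for _ in range(c)]
    pool = children + parents if plus else children
    # Sort by fitness and keep the first p individuals (lower value = fitter here).
    return sorted(pool, key=fitness)[:p]
```

With `plus=True` the best individual can never be lost, whereas with `plus=False` every parent is discarded after one generation, which lets the population forget a misleading early optimum.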

2.6.4 Genetic Programming

Genetic Programming (GP) was put forward by John Koza. GP evolves computer programs: it is a specialization of genetic algorithms (GA) in which each individual is a computer program. It is a machine learning technique used to optimize a population of computer programs according to a fitness landscape determined by a program's ability to perform a given computational task. Trees are often used as the data structure for individuals, although non-tree representations have also been suggested and successfully implemented. Trees can be easily evaluated in a recursive manner. Every internal tree node has an operator function and every terminal node has an operand, making mathematical expressions easy to evolve and evaluate. Genetic programming starts with a primordial ooze of thousands of randomly created computer programs. This population of programs is progressively evolved over a series of generations. The evolutionary search uses the Darwinian principle of natural selection (survival of the fittest) and analogs of various naturally occurring operations, including crossover (sexual recombination), mutation, gene duplication, and gene deletion. Genetic programming sometimes also employs developmental processes by which an embryo grows into a fully developed organism. GP uses both mutation and crossover. Although other fields of EA have entered mainstream usage, GP is still in its infancy: because of the representation of programs, the huge search space, and the complex operations needed to generate better individuals, GP is not mainstream yet.

Figure (2.6.4): Individual structure of GP
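
The recursive evaluation of a GP tree mentioned above can be sketched as below. The nested-tuple encoding, the three arithmetic operators and the example expression are assumptions made for illustration and are not the individual structure shown in the figure.

```python
import operator

# Operator functions stored at internal nodes (a deliberately small, assumed set).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(node, env):
    """Recursively evaluate an expression tree.

    Internal nodes are tuples (op, left, right); terminal nodes are either
    numeric constants or variable names looked up in env.
    """
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(node, str):
        return env[node]
    return node

# Tree for the expression (x + 2) * (x - y).
tree = ("*", ("+", "x", 2), ("-", "x", "y"))
value = evaluate(tree, {"x": 3, "y": 1})
```

Crossover in this encoding would amount to swapping sub-tuples between two trees, which is why tree-shaped individuals are convenient for GP.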

2.6.5 Memetic Algorithm

Although Memetic Algorithms do not fall into the EA category, they incorporate other search techniques into EAs. The combination of Evolutionary Algorithms with local search operators that work within the EA loop has been termed “Memetic Algorithms” (MA). Quite often, MAs are also referred to in the literature as Baldwinian EAs, Lamarckian EAs, cultural algorithms or genetic local search. After individuals are generated, local search is performed on them. The frequency and intensity of individual learning directly define the degree of evolution (exploration) against individual learning (exploitation) in the MA search, for a given fixed, limited computational budget. Clearly, more intense individual learning provides a greater chance of convergence to the local optima but limits the amount of evolution that may be expended without incurring excessive computational resources. Therefore, care should be taken when setting these two parameters to balance the available computational budget and achieve maximum search performance. When only a portion of the population undergoes learning, the question of which subset of individuals to improve needs to be considered to maximize the utility of the MA search.

2.7 Existing Works

2.7.1 Dynamic Parameter Control

A variety of previous works have proposed methods of dynamically adjusting the parameters of GAs or other evolutionary algorithms. These methods include deterministic parameter control, adaptive parameter control, and self-adaptive parameter control. The simplest technique is deterministic parameter control, which adjusts parameters according to a predetermined policy. Since it controls the parameters deterministically, it cannot adapt to the changes that occur during the execution of an algorithm.

Adaptive parameter control exploits feedback from the evolution of a population to control the parameters. A notable example is the 1/5 success rule for adaptive Gaussian mutation, widely used in evolution strategy algorithms. According to this method, the mutation step size is increased if more than 20% of the mutations are successful and reduced otherwise. However, this method cannot be applied to algorithms that adopt a representation other than real numbers. Finally, self-adaptive parameter control encodes the parameters into chromosomes and lets them evolve with the other genes. Although elegant, its applicability and effectiveness in a broad range of problems have not yet been shown.
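
The adaptive step-size rule described above can be sketched in a few lines. The function name `adapt_step_size` and the adjustment factor 0.85 are assumptions for illustration; the text only states that the step size grows when more than 20% of mutations succeed and shrinks otherwise.

```python
def adapt_step_size(sigma, successes, trials, factor=0.85):
    """Adjust the mutation step size from the observed success ratio.

    A mutation counts as a success when the child is better than its parent.
    factor is an assumed constant in (0, 1).
    """
    if trials == 0:
        return sigma              # no feedback yet, leave the step unchanged
    if successes / trials > 0.2:
        return sigma / factor     # many successes: take larger steps
    return sigma * factor         # few successes: take smaller steps
```

In use, the success counter would be reset after each adjustment so that the rule reacts to recent feedback rather than to the whole run.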

2.7.2 Maintaining Diversity and Multi-population Genetic Algorithms

Multi-population GAs (MPGAs) maintain diversity by evolving multiple subpopulations which are spatially separated [6]. The island-model GA (IMGA), a typical example of an MPGA, evolves two or more subpopulations and uses periodic migration to exchange information between them. The number and size of the populations of an IMGA are predetermined and kept unchanged during the algorithm’s execution. However, other MPGAs, such as the multinational GA, the forking GA, the bi-objective multi-population algorithm, and the variable island GA, can adjust the number and size of populations dynamically by splitting a population into two smaller ones or combining two similar ones. The performance of an IMGA is sensitive to the migration policy, the migration rate and size, and the particular topology used, because they determine how fast good solutions spread among the subpopulations. A variety of previous works have studied the effect of these migration parameters both theoretically and experimentally.

2.7.3 Memory Based Genetic Algorithm

Diploid GA, GA with unexpressed genes (GAUG), dual GA (dGA), and primal-dual GA (PDGA) have adopted complementary and dominance mechanisms to maintain or provide population diversity. Most organisms in nature have a great number of genes in their chromosomes, and only some of the dominant genes are expressed in a particular environment. The repressed genes are considered a means of storing additional information and providing a latent source of population diversity. Diploid GAs use diploid chromosomes, which differ from natural ones in that the two strands of the diploid chromosomes are not complementary. Only some genes in a diploid chromosome are expressed and used for fitness evaluation, according to predetermined dominance rules. GAUG differs from diploid GA in that it uses haploid chromosomes, but it also incorporates some unexpressed genes into its chromosomes. The unexpressed genes in GAUG are not used for fitness evaluation but for preserving diversity.

dGAs and PDGAs also have haploid chromosomes in the population, but the chromosomes are sometimes interpreted complementarily to provide additional diversity. In dGA, each chromosome carries an additional bit which indicates whether the chromosome should be interpreted as it is or as complemented. In PDGA, some bad-looking chromosomes are interpreted both as complemented and as original, and the original one is replaced by the complemented one if the latter gives a better evaluation. Since the additional diversity provided by memory-based algorithms makes it easier to adapt to extreme environmental changes, these methods are frequently used for dynamic optimization problems.

2.7.4 Mutation Based Work

ES and EP use mutation exclusively, both for maintaining diversity and for exploitation. Mutations can be divided into several categories. Classified by uniformity across generations, they are:

1. Uniform Mutation: When the mutation step size or mutation rate stays the same regardless of generation, it is called uniform mutation. It is not widely used because its behavior is fixed regardless of generation.
2. Non-Uniform Mutation: If the mutation step size or mutation rate varies with the generation, it is called non-uniform mutation. Usually the step size or mutation rate is higher in the initial generations and is decreased gradually as the generations advance. It is used frequently because it gives a way to govern diversity: when diversity is needed it is facilitated by a large step size, and when convergence is needed it is facilitated by a small step size.

In genetic algorithms, random bit-flipping is used for mutation. Random bit changes have some issues. For example, changing a bit in a higher position of the bit-string has more effect than changing a bit in a lower position. Also, for some bit-strings, moving to the immediately next or previous value requires changing all the bits (for example, from 0111 to 1000), so exploitation becomes difficult. This is called the Hamming cliff problem. Using Gray code can mitigate the effect of this problem.
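
A short sketch shows why Gray code helps with the Hamming cliff. The helper names `to_gray` and `hamming` are hypothetical; the conversion itself is the standard reflected binary Gray code.

```python
def to_gray(n):
    """Convert a non-negative integer to its reflected binary Gray code."""
    return n ^ (n >> 1)

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

# 7 -> 8 is a Hamming cliff in plain binary (0111 -> 1000 flips all 4 bits),
# while neighbouring integers differ in exactly one bit under Gray coding.
binary_distance = hamming(7, 8)
gray_distance = hamming(to_gray(7), to_gray(8))
```

Under Gray coding a single bit flip is always enough to reach an adjacent integer, so small exploitative steps remain reachable by mutation.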

For mutation, a random step size is needed to introduce random walk into the search space. For random number generation (RNG), the Gaussian distribution is the most widely used. It has a bell-shaped curve and is defined by two parameters, the position parameter (mean, µ) and the scale parameter (standard deviation, σ); it is conventionally denoted N(µ, σ²). Always µ = 0 and usually σ = 3 is used for random number generation. Mutation using the Gaussian distribution is called Gaussian mutation. Several algorithms use such distribution-based mutation.

Figure (2.7.4): Probability Density Function (PDF) of Gaussian distribution

Xin Yao uses two more distributions for RNG:

1. Cauchy Distribution
2. Levy Distribution

The Gaussian, Cauchy and Levy distributions all have a bell-shaped PDF, and the Cauchy and Levy distributions have the same kind of parameter set as the Gaussian. Both Cauchy and Levy have fatter tails than the Gaussian. That means they produce more long jumps, which can give more diverse individuals that are less prone to getting trapped in local optima. Mutations using the Cauchy and Levy distributions as RNG are called Cauchy mutation and Levy mutation, respectively. Xin Yao uses an adaptive mutation parameter: every individual is represented by a pair consisting of a real-valued vector and an adaptive mutation parameter, which serves as the scale parameter (standard deviation) of the distribution.
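
The effect of the fatter tail can be checked empirically. The sketch below compares how often unit-scale Gaussian and Cauchy steps exceed a threshold; the threshold of 3, the sample size, the seed and the function names are assumptions chosen for illustration, and the Cauchy sample uses the standard inverse-CDF construction.

```python
import math
import random

def gaussian_step(rng, scale=1.0):
    """Mutation step drawn from a zero-mean Gaussian."""
    return rng.gauss(0.0, scale)

def cauchy_step(rng, scale=1.0):
    """Mutation step drawn from a zero-centred Cauchy via the inverse CDF."""
    return scale * math.tan(math.pi * (rng.random() - 0.5))

def long_jump_fraction(step, rng, threshold=3.0, n=20000):
    """Fraction of sampled steps whose magnitude exceeds the threshold."""
    return sum(abs(step(rng)) > threshold for _ in range(n)) / n

rng = random.Random(42)
gauss_frac = long_jump_fraction(gaussian_step, rng)
cauchy_frac = long_jump_fraction(cauchy_step, rng)
```

At unit scale, roughly a fifth of Cauchy steps are larger than 3 in magnitude, against well under 1% of Gaussian steps, which is the source of the long jumps described above.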

2.7.5 Survivor Selection Based Work

Survivor selection is usually deterministic. In this phase of algorithm, selection pressure is

applied to individuals. Several survivor selection schemes exist:

1. Naïve Survivor Selection: Basically follows the survival of the fittest principle. Individuals are selected for the next generation based on their fitness value, and individuals with lower fitness are weeded out. This is sometimes risky, because lower fitness individuals can carry latent genes which can give better individuals in later generations.

2. Elitist Selection: The population maintains spots for the best individuals so that they do not get lost across the generations. A certain portion of the best individuals is transferred directly to the next generation without any modification. This ensures that even if the algorithm cannot produce any solution better than the current ones, the best solution remains and is returned at the end of the algorithm.

3. Truncation Selection: Truncation selection simply retains the fittest x% of the population. These fittest individuals are duplicated into the next generation, so that the population size is maintained. Less fit candidates are culled without being given the opportunity to evolve into something better. This very often results in premature convergence; the only advantage is rapid convergence.

Figure (2.7.5): Truncation Selection

4. Fitness Proportionate Reproduction: Same as the roulette wheel selection scheme. Individuals are transferred directly to the next generation with a probability proportional to their fitness value. Individuals of lower fitness still have some chance to survive, so latent genes can survive through generations even though they have not yet been able to generate good individuals.

5. Niching Methods: Niching methods strive to maintain niches [9] [10]. That means they ensure that individuals of one niche do not have to compete with individuals of other niches. The advantage is that pre-existing diversity is maintained, but convergence becomes harder as selection pressure is lower. Niching methods are divided into two categories:

i) Fitness Sharing: In nature, individuals of the same species compete with each other for fixed resources [13]. Likewise, in fitness sharing, individuals in the same region share the fixed fitness value assigned to that region, so fitness is a shared resource of the population. The population is first divided into niches. A region is defined by the sharing radius, which defines the niche size. This scheme is very sensitive to the value of assigned fitness per region and to the sharing radius. The population does not converge as a whole, but convergence takes place within the niches. Sharing can be done at the genotypic level (Hamming distance) or at the phenotypic level (Euclidean distance). If the sharing radius is too small, it has practically no effect on the process; if it is too large, several individual peaks will ’melt’ into one.

ii) Crowding: Similar individuals in a natural population, often of the same species, compete against each other for limited resources. Dissimilar individuals tend to occupy different niches, so they typically do not compete. Crowding uses individuals newly entering a population to replace similar individuals. A random sample of CF (Crowding Factor) individuals is taken from the population. A larger crowding factor indicates less tolerance for similar solutions; smaller values indicate that similar solutions are more welcome. New members of a particular species replace older members of that species, not members of other species. Crowding does not increase the diversity of the population; rather it strives to maintain the pre-existing diversity. It is not directly influenced by fitness value. Crowding is divided into:

(1) Deterministic Crowding: A new individual always replaces the most similar individual if it has a better fitness value.

(2) Probabilistic Crowding: Primarily a distance-based niching method. The main difference is the use of a probabilistic rather than deterministic acceptance function. Stronger individuals no longer always win over weaker individuals; they win with a probability proportional to their fitness, and thus we get restorative pressure. The two core ideas of probabilistic crowding are to hold tournaments between similar individuals and to let the tournaments be probabilistic.

6. Deterministic Sampling: The average fitness of the population is calculated. The fitness of each individual is divided by the average fitness, but only the integer part of this operation is stored. If the value is equal to or higher than one, the individual is copied to the next generation. The remaining free places in the new population are filled with the individuals with the greatest fractional parts.
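
Deterministic sampling, the last scheme in the list above, can be sketched as follows. The sketch assumes a maximization problem with non-negative fitness values, and it assumes that an individual receives one copy per unit of the integer part; the function name is illustrative.

```python
def deterministic_sampling(fitness):
    """Return the indices of the individuals copied to the next generation.

    Assumes non-negative fitness values with a positive mean (maximization).
    """
    n = len(fitness)
    mean = sum(fitness) / n
    expected = [f / mean for f in fitness]      # fitness relative to the average
    survivors = []
    for i, e in enumerate(expected):
        survivors.extend([i] * int(e))          # integer part: guaranteed copies
    # Fill the remaining places with the largest fractional parts.
    by_fraction = sorted(range(n), key=lambda i: expected[i] - int(expected[i]), reverse=True)
    survivors.extend(by_fraction[: n - len(survivors)])
    return survivors


print(deterministic_sampling([4.0, 3.0, 2.0, 1.0]))  # prints "[0, 1, 2, 0]"
```

Here the relative fitness values are 1.6, 1.2, 0.8 and 0.4, so individuals 0 and 1 are copied once each, and the two free places go to individuals 2 and 0, which have the largest fractional parts.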

Chapter 3

Algorithm Proposal

3.1 Dual Population Genetic Algorithm

Dual Population Genetic Algorithm (DPGA) is a genetic algorithm which uses two populations with two different evolutionary objectives, instead of one, to avoid premature convergence [11] [12] [13]. The main population plays the role of the population of an ordinary genetic algorithm: it evolves to find a good solution of high fitness value. The additional population, called the reserve population, is employed as a reservoir for additional chromosomes which are rather different from the chromosomes of the main population. Two different fitness functions are used. The main population uses the actual fitness function (like a normal GA), and the reserve population uses a fitness function which gives better fitness to chromosomes that are more different from the chromosomes of the main population. Multi-population genetic algorithms use migration of chromosomes from one population to another to exchange information. DPGA does not use migration; instead it uses another novel approach called crossbreeding. Crossbreeding is performed by taking one parent from the main population and another parent from the reserve population and applying crossover between them. The newly born offspring are called crossbred offspring. Crossbred offspring are then evaluated for survival in both the main population and the reserve population. DPGA also employs inbreeding, which takes two parents from the same population and makes offspring by crossover. These inbred offspring compete for survival in their respective parent population.

Figure (3.1a): Offspring Generation of DPGA

Mutation plays a minimal role in DPGA; diversity is mainly provided by the reserve population through crossbreeding. The amount of diversity needed in any step of DPGA is specified by a self-adaptive parameter δ (0 < δ < 1). δ defines the distance between parents from the main population and parents from the reserve population. As δ determines which individuals will participate in crossbreeding, we can say roughly that δ is analogous to the step size. The fitness function of reserve population is

$$f_r(x) = 1 - \left|\delta - d(M, x)\right| \qquad (1)$$

Figure (3.1b): Reserve Population Fitness Function

d(M, x) is the average distance of individual x from the main population. So we have turned our focus to crossbreeding and the fitness function for the reserve population. δ defines how distant the reserve population will be from the main population. δ is set to lower values for exploitation and to higher values for exploration. If δ is kept similar for several generations, the reserve population will start to converge at distance δ from the main population.
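
A minimal sketch of how such a reserve fitness could be evaluated is given below. It assumes the form 1 − |δ − d(M, x)|, which is consistent with the description that individuals at distance δ from the main population score best, and it assumes that distances are scaled to [0, 1] by a maximum distance. The function names and the `max_distance` argument are hypothetical.

```python
import math


def average_distance(main_population, x, max_distance):
    """Mean Euclidean distance from x to the main population, scaled to [0, 1]."""
    total = sum(math.dist(m, x) for m in main_population)
    return total / (len(main_population) * max_distance)


def reserve_fitness(main_population, x, delta, max_distance):
    """Assumed form: the score is highest (1.0) when d(M, x) equals delta."""
    return 1.0 - abs(delta - average_distance(main_population, x, max_distance))


main = [(0.0, 0.0), (0.0, 0.0)]
for candidate in [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]:
    print(candidate, reserve_fitness(main, candidate, delta=0.5, max_distance=10.0))
```

With δ = 0.5 the candidate at normalized distance 0.5 receives the highest score, while candidates that are closer or farther are penalized equally. Each call touches every main-population individual, which is the source of the evaluation overhead of the reserve population.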

There are some pros and cons of DPGA:

3.1.1 Advantages

1. The reserve population preserves genes which are extinct in the main population. As surviving crossbred offspring hold genes inherited from the best individuals of the main population, genes that are lost from the main population in later generations can be recovered from the reserve population.

2. DPGA utilizes information from successful breeding. The value of δ which produces surviving offspring is used later for selecting parents. If crossover is unsuccessful, δ is set to maximum, which influences the selection of future parents.

3.1.2 Disadvantages

1. The reserve population introduces space and computational overhead. Individuals of the main population only need one fitness evaluation when they are created or modified. But every individual of the reserve population needs to be evaluated whenever δ changes value, as well as when it is created or modified. If the number of chromosomes in the main population is n and in the reserve population is m, then the total evaluation of the reserve fitness function is O(nm).

2. When selecting parents, DPGA does not measure the diversity of the reserve population. The reserve population should be diverse enough for exploration of the search space, but the selected parent from the reserve population may not be very dissimilar to the parent from the main population. For crossbreeding, the distance between the parents may not be δ. In the worst case, the distance may be far less or far greater than δ.

3. If a crossbred child survives in both the main and the reserve population, diversity decreases as the same individual is copied to both populations. This gives us another insight: the crossbred child has higher fitness (according to the fitness function for the reserve population) than the inbred child, and the parent from the reserve population may not be at the desired distance from the parents of the main population, i.e. the reserve population may not have individuals who can breed offspring at the desired step size.

4. If crossbred offspring cannot survive in the main population, DPGA degenerates into a single-population algorithm. If this happens for several generations, measures have to be taken to increase the diversity of the reserve population, which incurs overhead.

5. If crossbred offspring manage to survive in the reserve population, the reserve population will contain replicated genes of an individual of the main population, decreasing the diversity of the reserve population further.

6. At the time of converging, DPGA keeps the value of δ low for several generations to facilitate convergence. From equation (1), individuals at distance δ gain more fitness than other individuals, so individuals whose distance d(M, x) is much greater or less than δ are replaced. As δ is set to minimum at the time of convergence, the individuals most distant from the main population begin to diminish (DPGA uses the best n individuals for survivor selection in both populations, with the same fitness function as for parent selection), and individuals similar to the main population individuals begin to take over the reserve population after a few generations. Hence, the reserve population also begins to converge like the main population, but at distance δ from it. When DPGA detects that the main population has converged to a local optimum, it sets δ to maximum to escape from it. Now DPGA picks the individuals most distant from the main population, but the reserve population is already similar to the main population and cannot provide any further diversity.

7. Diversity is also dependent on the success of crossover. Even if the parents for crossbreeding are at the desired distance, they may not produce fit individuals. Crossover is always a big jump to an area somewhere “in between” the two parent areas. Offspring seldom go beyond their parents.

8. In DPGA, the total gene frequency remains constant from the very beginning. As crossover is the only operator used, a new gene is never introduced. Crossbreeding changes the gene frequency in each individual population, but the total frequency remains unchanged. In the worst scenario, if the best gene is missed at the initialization of the populations, DPGA never reaches the optimum.

9. Inbreeding in the reserve population does not introduce new genes. And if the distances of the two parents are δ and –δ (selection is based on their distance, not direction), then the inbred offspring will be more similar to the main population.

3.1.3 Recommendation

One of the biggest drawbacks of DPGA is the convergence of the reserve population along with the main population. As we have seen, the current survivor selection of the reserve population in DPGA leads to convergence of the reserve population. Probabilistic crowding should therefore be used for survivor selection of the reserve population (the fitness function would be the same as before for parent selection). Probabilistic crowding prevents similar genes from taking over the whole population while simultaneously preserving genes from extinction.

3.2 Modified DPGA Proposal

We have seen above that the selected parent from the reserve population may not be different enough from the parent from the main population. We can say this parent from the reserve population is the best of a bad bunch. As a result, crossbred offspring are not very different from their parents. And once δ is set to a lower value, near zero, for several generations, the reserve population also becomes almost identical to the main population. We have no problem if the main population converges to a good enough solution or the terminating criterion is met. But if we detect premature convergence, then we have to increase the diversity of the main population to escape local optima. As the reserve population is identical to the main population, it cannot give diversity to the main population, so the main population remains trapped in local optima.

To address this problem, we propose eliminating the reserve population. Instead, we will generate an individual on the fly which will play the role of the reserve population parent. The individual generated on the fly will be at exactly distance δ from the parent from the main population.

3.2.1 Structure Of Individual

Every individual in the main population will consist of pairs (xi, δi), where xi is the real value in dimension i and δi determines how large the jump, i.e. the distance of the individual generated on the fly, will be in that dimension. δi is called the jump parameter.

Another parameter, the temperature T, is also introduced. This parameter plays a role similar to the one it plays in simulated annealing. We tried to bring in the concept of simulated annealing as a local search for rigorously searching newly found potential search regions. The value of T is larger at the beginning of the algorithm, so that the search region will be larger and more uniform in all dimensions, i.e. the shape of the search region will be an n-dimensional sphere. At the final stage of evolution, the value of T will be scheduled lower to facilitate a more exploitative local search, and the search will exploit more in the dimensions where solutions are getting better. The local search region will then have an elliptical shape, where the major axis of the ellipse points towards the local (maybe global) optimum of that region.

3.2.2 Initialization

xi is initialized in the regular fashion. For δi, we will generate n random numbers and then scale them so that ∑δi = 1.

3.2.3 Parent Selection

Any selection method can be used. But we prefer tournament selection or restricted

tournament selection (RTS).

3.2.4 Generating Parent Individual On The Fly

One parent is selected from main population and another parent is generated based on δ. If

is the real valued vector of generated individual at dimension I, then

as

We will take value of such that

is the maximum possible Euclidean distance between two points in search

space.

√∑

Then we will reset the jump parameters of the main parent. If this parent is selected again and its jump parameters are unchanged, the same individual will be generated again. As a result, the same offspring will be produced and the computation of a generation will be wasted. So we will reset the jump parameters as in initialization.
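
One possible way to generate a partner at an exact Euclidean distance from the selected parent is sketched below. It is only an illustration under stated assumptions: the jump parameters are taken to sum to 1, the squared offset in dimension i is assumed to be proportional to δi, and search-space bounds are ignored. The function names, the random sign choice and the `distance` argument are hypothetical and are not the formulas of this section.

```python
import math
import random


def random_jump_parameters(n, rng):
    """Draw n positive random numbers and scale them so that they sum to 1."""
    raw = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def generate_partner(x, jump, distance, rng):
    """Return a point at Euclidean distance `distance` from x.

    Assumption: the squared offset in dimension i is jump[i] * distance**2,
    so the squared offsets add up to distance**2 when sum(jump) == 1.
    """
    return [xi + rng.choice((-1.0, 1.0)) * distance * math.sqrt(ji)
            for xi, ji in zip(x, jump)]


rng = random.Random(7)
parent = [1.0, -2.0, 0.5]
jump = random_jump_parameters(len(parent), rng)
partner = generate_partner(parent, jump, distance=2.0, rng=rng)
print(math.dist(parent, partner))  # approximately 2.0
```

Redrawing the jump parameters with `random_jump_parameters` after each use corresponds to the reset step described above, so a parent that is selected again yields a different partner.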

3.2.5 Mutation

DPGA uses non-uniform mutation. We will instead use Cauchy mutation when the algorithm is in the exploration stage, as it gives more long jumps to facilitate exploration. When the algorithm is in the exploitation stage, we will use Gaussian mutation, which gives short jumps to facilitate convergence of individuals.

3.2.6 Survivor Selection

The new algorithm will evaluate both the individuals generated on the fly and their inbred offspring. If offspring survive, we will divide them into 3 categories:

1. Exploited Individual: When

2. Normal Individual: When

3. Explored Individual: When

Here distance is the Euclidean distance between the offspring and the main parent. We have divided them into 3 categories so that we can explore and exploit at the very same time. When in the exploitation stage, the algorithm can still explore other potential regions in the search space while exploiting the current region. On the other hand, when the algorithm is in the exploration stage and a potential region is found, we can exploit that region by conducting a local search, as in a memetic algorithm, while still exploring other regions.

3.2.6.1 Exploited Individual

⁄ when is the explored dimension

⁄ when is the exploited dimension

Here,

The rationale is that an exploited individual comes from a region which is already explored or is being explored by another individual, so the surrounding region does not need to be explored twice.

3.2.6.2 Explored Individual

for every dimension

This individual is far away from its parent. It can be assumed that this offspring is in a region where the algorithm has never searched before, so this potential region needs exploration. Exploration is provided because . Even if the algorithm is in exploitation mode, it can still explore a newly found, unsearched, potentially good region.

3.2.6.3 Normal Individual

Randomly select dimensions.

⁄ for randomly selected dimension

⁄ for other dimensions

The individual is not in the distance which can be called exploited or explored. We have

selected dimensions because we want to introduce some variations based on fitness

difference.

3.2.7 Schedule of T

The initial value of T depends on the optimization problem. For a complex, multi-modal, rough search space, T should be larger to facilitate more exploration in the local search; for a simple, unimodal search space, a smaller value of T is better. The value of T is a function of the generation count and the survival of offspring. We propose that T should be decreased with the generation count: if no offspring survives, T should remain the same, and if offspring survive, T should be decreased. This is because surviving offspring mean that we are making progress towards convergence, while no surviving offspring means that we still need to explore more regions.
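
A minimal sketch of one possible schedule follows. It assumes, as in Section 3.2.1, that T starts high and is lowered as the search converges, and that T is kept unchanged when no offspring survives. The geometric cooling factor 0.95 and the lower bound are illustrative assumptions, not values from this thesis.

```python
def update_temperature(t, offspring_survived, cooling=0.95, t_min=1e-3):
    """Return the temperature to use in the next generation."""
    if not offspring_survived:
        return t                      # no progress: keep searching at the same T
    return max(t_min, t * cooling)    # progress: narrow the local search region


t = 10.0
for survived in (True, False, True):
    t = update_temperature(t, survived)
print(t)  # approximately 9.025
```

The initial value (10.0 here) would be chosen per problem, larger for rough multi-modal landscapes and smaller for simple unimodal ones.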

3.2.8 Advantages

1. The extra space for the reserve population is no longer needed. Evaluation of reserve population individuals is also eliminated.

2. The individual generated on the fly is at exactly distance δ from the parent from the main population, so as much diversity as we want can be incorporated.

3. The new proposal introduces δi for each dimension, with ∑δi = 1. δi determines how much exploitation or exploration will take place in each dimension. If we find a dimension in which better individuals can be found, we can continue to explore or exploit in that dimension. On the other hand, if the population is trapped in a deep local optimum, we can experiment with changing the value of δi to escape it.

4. Every individual will have its own δi, so we have granular control over every individual in each dimension.

5. DPGA does not facilitate exploitation of newly found good regions of the fitness landscape, whereas the proposed algorithm fully exploits a newly found region even if the algorithm is globally in exploration mode, giving full local search capability like a memetic algorithm.

6. DPGA does not evaluate reserve inbred offspring and reserve parents for survival in the main population. This algorithm will evaluate both the individuals generated on the fly and the inbred offspring. Since these individuals are already generated as by-products of crossbreeding, evaluating them has very little overhead, and if any of them survives, it can introduce more diversity in the main population and give a new region to search for a potential global maximum.

3.3 New Survivor Selection Strategy

Current schemes of survivor selection fall into two categories:

1. Schemes that focus solely on survival of the fittest, i.e. exploitation. For example, elitist selection, rank selection, fitness proportionate reproduction.

2. Schemes that focus solely on maintaining diversity. For example, niching methods: fitness sharing, deterministic crowding, probabilistic crowding.

The above two categories are at two extreme ends. Schemes that focus on exploitation do not take diversity into account. On the other hand, schemes that focus on diversity do not take exploitation into account. But survivor selection should be based on both diversity and exploitation. So we propose a new survivor selection scheme which takes both diversity and exploitation into account. The fitness function for survivor selection:

Here, real fitness function,

function of gene variation with chromosomes of current generation.

adaptive parameter which determines how much weight will be put on functions

.

Usually value of will be lower for early generations to preserve diversity; will be bigger for

final generations to facilitate convergence. We propose changing value of is function of

generation count, survival of offspring and difference between fitness of the best individual

and desired fitness.

Measuring gene variation can be crucial. One naïve approach is to measure the Euclidean distance from all the individuals of the current population, which is O(n). We can improve this further with some simple modifications. First, we take a point or individual as the reference for measuring distances. Let us take the individual (LowerBound0, LowerBound1, …, LowerBoundn-1) as our reference individual. At the beginning, we measure the distance from the reference individual to every individual of the current population. So normalized distance of any individual is

So mean normalized diversity of current population is

Standard deviation using µ as reference, √

Every individual will include an additional real valued vector called relative diversity.

Relative diversity is a measure of diversity of an individual relative to the rest of the

population. Relative diversity is found by calculating standard deviation of the population

using corresponding individual as reference

So diversity fitness function is

is monotonically increasing function of . A generic fitness function will be of form

where α is scaling factor dependent on optimization problem

Scaling factor α is needed to make diversity fitness function more comparable to real fitness

function.

Usually if , then offspring improves diversity .

Now if the offspring survives and normalized distance of replaced individual is . So

the new relative diversity measurement of individuals is

( )

( )

We will adopt elitist selection. The top 10% of individuals according to the real fitness function and the top 10% of individuals according to gene variation will be reserved, so this scheme emphasizes both exploration and exploitation. The remaining 80% of individuals have to survive through the proposed fitness function.
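
The selection step can be sketched as follows for a maximization problem. The combined score used here, w * real + (1 - w) * alpha * diversity, is an assumed convex combination chosen for illustration; the names `w`, `alpha` and `elite_share`, and the choice of keeping half of the candidates, are hypothetical.

```python
def select_survivors(real, diversity, w, alpha=1.0, elite_share=0.1):
    """Return the indices of the survivors (maximization).

    real[i] and diversity[i] are the two fitness values of candidate i.
    w in [0, 1] is the weight placed on the real fitness (assumed form).
    Half of the candidates survive (illustrative choice).
    """
    n = len(real)
    target = n // 2
    k = max(1, int(elite_share * target))       # size of each elite group

    def combined(i):
        return w * real[i] + (1.0 - w) * alpha * diversity[i]

    by_real = sorted(range(n), key=lambda i: real[i], reverse=True)
    by_diversity = sorted(range(n), key=lambda i: diversity[i], reverse=True)
    chosen = by_real[:k]                                         # best by real fitness
    chosen += [i for i in by_diversity if i not in chosen][:k]   # most diverse
    rest = sorted((i for i in range(n) if i not in chosen), key=combined, reverse=True)
    return chosen + rest[: target - len(chosen)]


real = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
diversity = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
print(select_survivors(real, diversity, w=1.0))  # prints "[0, 9, 1, 2, 3]"
print(select_survivors(real, diversity, w=0.0))  # prints "[0, 9, 8, 7, 6]"
```

The two elite groups are kept regardless of the weight, while the remaining places shift from fit individuals to diverse individuals as the weight on real fitness decreases.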

Careful observation reveals that normalized diversity of individual is in range . So

standard deviation of ( ) will certainly between 0 and 1. We can use as adaptive

parameter . But if two groups of individuals are at maximum distance while group members

are in the same neighborhood, then will be nearly 0.5 high. But the population is not

diverse at all; it just converges in two groups situated far away from each other. We can take

average relative diversity as . But same problem still persists.

First we have to determine the area of neighborhood (β) in terms of normalized diversity .

Optimal size of neighborhood depends on optimization problem. We will introduce n-

dimensional array of buckets. A bucket is a small region specified by neighborhood size

which is essentially an n-dimensional hypercube. Individuals located on buckets region will

fall into that bucket. So there will be buckets. Each value in an element of array

means number of individuals in that bucket. Every element will be initialized to zero. One

can easily find value of bucket array index

For a huge search space this bucket array may require an enormous amount of memory. We can use a sparse matrix as the data structure to address this issue.

When an individual is found in a bucket region, the value of that bucket is incremented. The individual also stores the location of its bucket for easy removal.
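
A sparse bucket structure of this kind can be sketched with a dictionary-based counter, so that only non-empty buckets use memory. The index rule floor((xi − LowerBoundi) / β) applied to raw coordinates is an assumption made for this example, as are the class and method names.

```python
import math
from collections import Counter


class BucketGrid:
    """Sparse count of individuals per hypercube bucket of side `beta`."""

    def __init__(self, lower_bounds, beta):
        self.lower_bounds = lower_bounds
        self.beta = beta
        self.counts = Counter()           # only non-empty buckets are stored

    def index_of(self, x):
        # Assumed index rule: floor((x_i - LowerBound_i) / beta) per dimension.
        return tuple(math.floor((xi - lo) / self.beta)
                     for xi, lo in zip(x, self.lower_bounds))

    def add(self, x):
        key = self.index_of(x)
        self.counts[key] += 1
        return key                        # kept with the individual for easy removal

    def remove(self, key):
        self.counts[key] -= 1
        if self.counts[key] == 0:
            del self.counts[key]          # free the entry of an emptied bucket


grid = BucketGrid(lower_bounds=[-5.0, -5.0], beta=1.0)
a = grid.add([-4.5, 0.2])
b = grid.add([-4.1, 0.9])
c = grid.add([3.3, 3.3])
print(len(grid.counts), grid.counts[a])   # prints "2 2"
grid.remove(c)
print(len(grid.counts))                   # prints "1"
```

Because each individual keeps the key returned by `add`, removing it costs a single dictionary update, and the number of occupied buckets is available as `len(grid.counts)`.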

We can see that . If we have to remove an individual we simply

decrement the value of corresponding bucket. We can use this as parameter .

When value of is higher algorithm puts more weight in real fitness function, because

current population is diverse enough. On the other hand, if value of is lower, algorithm puts

more weight in diversity fitness function as current population is losing diversity.

Even so, this function needs to be modified. It doesn’t take generation number into account.

So when the algorithm is converging, value of this function i.e. is low which will slow the

rate of convergence. A simple approach can be

( (

))

It works because is a monotonically increasing function. As generation increases value of

(

) also increases. And maximum value of (

) can be

1, then value of will be less than 1.

Using buckets, we can also define the regions that are already searched. We will introduce a boolean variable named isSearched (true if the region is already searched, false if not) for each bucket. The criteria for a bucket to count as searched will be defined separately for each optimization problem. For example, we can declare a bucket region searched when it has at least three individuals, each surviving at least 50 generations. The point of marking regions as searched is that, of two individuals with the same relative diversity, the one in an unsearched bucket gets more weight in diversity than the one in a searched bucket. For example, we will

reduce the to

⁄ .

(

)

A rather extreme scheme is to eliminate all individuals of a searched bucket except the best individual in that bucket. These individuals are replaced by new individuals taken from buckets which are not searched and whose individual count is 0. But this can be detrimental for complex optimization problems.

The above adaptive survival fitness function is for function maximization problems, but many real-life problems involve function minimization, e.g. cost minimization. In the case of function minimization, the above approach will not work, because both the real fitness function and the diversity fitness function should decrease in value for better individuals. So we need to adopt a function which assigns a lower diversity fitness value to more diverse individuals and a higher diversity fitness value to less diverse individuals. A simple approach can be

The above equation assigns more diversity fitness to less diverse individuals and less diversity fitness to more diverse individuals. So we can also apply this diversity management technique to function minimization.

Another approach is to apply local search to the individuals, as in a memetic algorithm. Steepest ascent hill climbing is adopted here. If an individual passes through the region of a bucket during this hill climbing process, that bucket is marked as searched. The advantage of this scheme is that we can easily identify the searched regions even if those regions do not meet the criteria for being marked as searched.

In cases of premature convergence, we can override this function and manually set the value

of .

Niching methods such as crowding and fitness sharing maintain diversity, but they lack control over it. They solely try to keep the pre-existing diversity level; they neither increase the diversity level in case of premature convergence nor decrease diversity to facilitate exploitation. On the other hand, the proposed survivor selection scheme gives full control over the diversity level needed at any time of evolution.

3.4 New Mutation Strategy

Mutation uses probability distributions to generate random numbers that introduce random variation in real-valued vectors. Till now, only three distributions have been used successfully in mutation. They are:

1. Gaussian Mutation

2. Cauchy Mutation

3. Levy Mutation

Careful observation reveals that the above three distributions are members of the stable family of distributions. The stable family is a family of distributions where a linear combination of two independent random variables with the same distribution has that same distribution up to location and scale parameters. In fact, the above three distributions are special cases of the stable distribution. All the stable distributions are infinitely divisible. They are absolutely continuous and unimodal. A random variable X is called stable (has a stable distribution) if, for n independent copies Xi of X, there exist constants cn > 0 and dn such that

$$X_1 + X_2 + \dots + X_n \overset{d}{=} c_n X + d_n$$

Figure (3.4): Probability Density Function of Stable Family

So we can try other distributions with a similar bell shape and fat tails to generate random numbers for mutation. Two candidates, which are related to the Gaussian although they are not stable distributions in the strict sense, are:

1. Laplace Distribution

2. Slash Distribution

3.4.1 Laplace Distribution

Like Gaussian distribution, it has two parameters: Location parameter, µ and Scale

parameter, σ. Cauchy distribution is the result of Fourier transformation of Laplace

distribution. The probability density function of the Laplace distribution is also reminiscent

of the Gaussian distribution; however, whereas the Gaussian distribution is expressed in

terms of the squared difference from the mean μ, the Laplace density is expressed in terms of

the absolute difference from the mean. Consequently the Laplace distribution has fatter tails

than the Gaussian distribution.

Figure (3.4.1a): Probability Density Function of Laplace Distribution

Figure (3.4.1b): Comparison of Gaussian and Laplace Distribution

Above is a graph of the Gaussian and Laplace distributions with the same scale and location parameters. It is noticeable that Laplace has a fatter tail and a sharper peak than Gaussian; away from its peak, the Laplace density falls off quickly at first. Laplace mutation is therefore expected to have a higher probability of escaping from a local optimum or moving away from a plateau, especially when “the basin of attraction” of the local optimum or plateau is large relative to the mean step size. On the other hand, Gaussian has greater probability in the mid-range. From observation, we can conclude that the sharp peak of Laplace facilitates exploitation, as it has more probability of producing short jumps, while it can also give more long jumps than Gaussian. For mid-range jumps, however, Gaussian gives better results.
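
The comparison can be checked numerically with the two density functions. The sketch below assumes location 0 and the same scale value 1 for both distributions (σ for Gaussian, b for Laplace); note that with equal scale values the two variances differ, since the Laplace variance is 2b².

```python
import math


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of the Gaussian distribution with mean mu and std sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def laplace_pdf(x, mu=0.0, b=1.0):
    """Density of the Laplace distribution with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2 * b)


# Peak (x = 0), mid-range (x = 1) and tail (x = 4).
for x in (0.0, 1.0, 4.0):
    print(x, gaussian_pdf(x), laplace_pdf(x))
```

At x = 0 and x = 4 the Laplace density is the larger of the two, while at x = 1 the Gaussian density is larger, matching the peak, tail and mid-range behaviour described above.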

3.4.2 Slash Distribution

The Slash distribution is a continuous unbounded distribution developed as a deviation from the Gaussian distribution that allows for fatter tails (higher kurtosis) by altering the κ parameter, as illustrated in the plot below. When κ = 0 the distribution reduces to a Gaussian(μ, σ). If a Gaussian random variable is divided by an independent standard uniform random variable, the resulting distribution is the Slash distribution. It is an example of a ratio distribution.

It has three parameters: like the Gaussian distribution, a location parameter μ and a scale parameter σ, plus an extra parameter κ.

Figure (3.4.2): Probability Density Function of Slash Distribution at different parameters

From the above graph, we see that as the value of κ gets bigger, the tail and peak of the Slash distribution get bigger, the slope gets steeper and the mid-range gets smaller. By controlling the value of κ, we can get an adaptive probability distribution which facilitates the two extreme ends: exploitation and exploration.

The Slash distribution is used to fit data that are approximately Gaussian but have a kurtosis greater than 3, i.e. greater than that of the Gaussian distribution. The Slash distribution can readily be compared to a Gaussian distribution since they share the same mean μ and standard deviation σ parameters.

Another distribution of which Gaussian distribution is a special form of family of

distributions is called Student’s t-distribution.

3.4.3 Student’s t-distribution

Student’s t-distribution (or simply the t-distribution) is a family of continuous probability

distributions that arises when estimating the mean of a normally distributed population in

situations where the sample size is small and population standard deviation is unknown. The

t-distribution is symmetric and bell-shaped, like the normal distribution, but has heavier tails,

meaning that it is more prone to producing values that fall far from its mean.

Figure (3.4.3): Probability Density Function for Student’s t-distribution with different degrees of freedom

The overall shape of the probability density function of the t-distribution resembles the bell
shape of a normally distributed variable with mean 0 and variance 1, except that it is a bit
lower and wider. As the number of degrees of freedom (DOF) grows, the t-distribution approaches
the normal distribution with mean 0 and variance 1, and it converges to the Gaussian as the DOF
tends to infinity. For practical purposes, the t-distribution is already close to the Gaussian at
30 DOF. Careful observation reveals that when the DOF is low, the t-distribution has a much fatter
tail and a lower peak. As the DOF increases, the tails become thinner and the peak becomes higher.
That means that at low DOF this distribution gives more long jumps, and as the DOF increases it
gives shorter jumps.

We can exploit this behavior of the t-distribution. At the beginning of the EA, diversity is
needed, so the DOF of the t-distribution will be low and it will produce long jumps to facilitate
diversity. As the generations progress we will increase the DOF, so that the distribution gives
fewer long jumps and the algorithm becomes more exploitative.
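A minimal sketch of such a mutation operator follows. The linear DOF schedule from 1 to 30, the step size and all function names are assumptions made for illustration; the thesis does not fix an update rule for the DOF. The t variate is generated with the standard construction Z / sqrt(V / ν), where V is chi-square distributed with ν degrees of freedom.

```python
import math
import random

def student_t_sample(rng, dof):
    """Draw one Student's t variate as Z / sqrt(V / dof), with V chi-square(dof)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(dof / 2.0, 2.0)   # chi-square(dof) = Gamma(dof/2, scale 2)
    return z / math.sqrt(v / dof)

def dof_schedule(generation, max_generation, dof_start=1.0, dof_end=30.0):
    """Hypothetical linear schedule: low DOF early, high DOF late."""
    fraction = min(max(generation / max_generation, 0.0), 1.0)
    return dof_start + fraction * (dof_end - dof_start)

def mutate(rng, individual, generation, max_generation, step=1.0):
    """Add an independent t-distributed jump to every gene."""
    dof = dof_schedule(generation, max_generation)
    return [gene + step * student_t_sample(rng, dof) for gene in individual]

def long_jump_fraction(rng, dof, n=20000, threshold=3.0):
    """Empirical fraction of jumps larger than the threshold."""
    return sum(abs(student_t_sample(rng, dof)) > threshold for _ in range(n)) / n

rng = random.Random(1)
early = long_jump_fraction(rng, dof_schedule(0, 1000))
late = long_jump_fraction(rng, dof_schedule(1000, 1000))
print(f"long jumps at start: {early:.3f}, at end: {late:.3f}")
```

In this sketch the share of long jumps drops sharply between the first and the last generation, so the same operator moves from exploration toward exploitation without changing its step size.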

Chapter 4

Experimental Study

4.1 Modified DPGA

We have implemented the following algorithms:

1. Standard GA

2. DPGA

3. Modified DPGA

Parameter settings for the algorithms:

Maximum generation = 1000

Population size = 500

Main population parents = 2

Reserve population parents = 2

Inbred main population offspring = 2

Inbred reserve population offspring = 2

Crossbred offspring (produced by taking one parent from the main population and the other parent from the reserve population) = 2

For crossover blend crossover method was used with parameter .

Uniform Gaussian mutation with is applied for both of them.

Tournament selection was used for parent selection. Naïve survivor selection method was

adopted.
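The operators named above can be sketched as follows. This is an illustrative sketch and not the implementation used in the experiments: the blend crossover parameter `alpha`, the tournament size and the toy fitness function are assumed values.

```python
import random

def blend_crossover(rng, parent_a, parent_b, alpha=0.5):
    """BLX-alpha: sample each child gene uniformly from the parents' interval
    widened by alpha times its length on both sides."""
    child = []
    for a, b in zip(parent_a, parent_b):
        low, high = min(a, b), max(a, b)
        spread = alpha * (high - low)
        child.append(rng.uniform(low - spread, high + spread))
    return child

def tournament_select(rng, population, fitness, size=2):
    """Return the fittest of `size` randomly chosen individuals (maximisation)."""
    contestants = rng.sample(range(len(population)), size)
    best = max(contestants, key=lambda i: fitness[i])
    return population[best]

rng = random.Random(42)
population = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(10)]
fitness = [-sum(x * x for x in ind) for ind in population]  # toy fitness (assumption)
mother = tournament_select(rng, population, fitness)
father = tournament_select(rng, population, fitness)
child = blend_crossover(rng, mother, father)
print(child)
```

With `alpha = 0` the child always lies between its parents; larger values let the crossover step slightly outside the interval spanned by the parents.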

4.1.1 Pitfalls of Modified DPGA

In theory the proposed algorithm should work better than DPGA, but in practice it does not. At
the beginning of the algorithm we set δ = 1, which means that a generated individual should be at
the maximum possible distance. As a result, individuals generated on the fly always go to the edge
of the search space, so they only search its extreme ends. Offspring produced by crossover with a
generated individual as a parent therefore also lie on the boundary of the search space or in its
neighborhood. Mutating these offspring seldom helps, because short or mid-range jumps still keep
an individual near the other individuals, and the long jump needed to introduce diversity is very
unlikely with the current mutation operators. Even if we design a mutation operator that gives
this sort of jump, it risks taking individuals out of the search space and taking sufficiently
diverse individuals into already searched regions. If the search space has n dimensions and every
dimension has the same lower and upper bounds, the search space is an n-dimensional hypercube. In
effect, this algorithm only searches the faces of the hypercube and their neighborhoods, while the
core region of the hypercube remains unsearched.

Another modification would be to start the algorithm with a lower value of δ. This would prevent
generated individuals, and their offspring, from lying on the boundary of the search space. But a
lower value of δ also means that the algorithm is unable to make long jumps.
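The boundary effect can be illustrated with a small sketch. It assumes that "maximum distance" means Euclidean distance from a reference individual inside a hypercube with common bounds; the actual generation procedure of the modified DPGA is defined earlier in the thesis and may differ. The function names are hypothetical.

```python
def farthest_point(individual, lower, upper):
    """Point of the hypercube [lower, upper]^n farthest (Euclidean) from `individual`.

    Each coordinate is maximised independently, so it always lands on a bound.
    """
    return [upper if (upper - x) >= (x - lower) else lower for x in individual]

def on_boundary(point, lower, upper):
    """True if at least one coordinate sits on a bound of the search space."""
    return any(x == lower or x == upper for x in point)

parent = [0.2, 0.9, 0.5]
generated = farthest_point(parent, 0.0, 1.0)
print(generated, on_boundary(generated, 0.0, 1.0))
```

Under this assumption every generated individual is a corner of the hypercube, whatever the reference individual is, which matches the observation that the interior of the search space is left unsearched.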

4.2 Adaptive Survivor Selection Strategy

We have used a standard GA with different survivor selection schemes. The implemented schemes
are:

1. Naïve survivor selection

2. Adaptive survivor selection

Initial setting of parameters of adaptive survivor selection:

Population size = 2000

Maximum generation = 500

Bucket edge length = 1.90734863e-6

Minimum number of generations required to be declared searched = 70

Minimum number of individuals required to be declared searched = 3

Survival adaptive parameter = 0.3

Diversity scaling factor = 50

Penalty factor = 10

Reserved number of best individuals (elite) = 20

Reserved number of most diverse individuals (elite) = 20
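To make the role of the bucket edge length concrete, the sketch below partitions the search space into axis-aligned hypercubic buckets and counts how many individuals fall into each one. This grid layout and the function names are assumptions for illustration only; the rules for declaring a bucket searched (the generation and individual thresholds listed above) are not modelled here.

```python
import math

def bucket_of(individual, lower, edge):
    """Integer grid coordinates of the bucket containing `individual`."""
    return tuple(math.floor((x - lower) / edge) for x in individual)

def occupied_buckets(population, lower, edge):
    """Map each occupied bucket to the number of individuals inside it."""
    counts = {}
    for individual in population:
        key = bucket_of(individual, lower, edge)
        counts[key] = counts.get(key, 0) + 1
    return counts

# Toy population in [0, 1]^2 with a deliberately large edge length of 0.5.
population = [[0.1, 0.1], [0.2, 0.3], [0.7, 0.1], [0.9, 0.9]]
counts = occupied_buckets(population, lower=0.0, edge=0.5)
print(len(counts), counts)
```

The number of occupied buckets gives a simple view of how widely the population is spread: the more distinct buckets are occupied, the more of the search space is currently covered.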

The experimental results on the Ackley function for both adaptive survivor selection and naïve
survivor selection are given below:

Figure (4.2a): Change in diversity across generations

Figure (4.2b): Number of buckets searched

[Chart residue: vertical axis from 0 to 18000; legend: Adaptive Survivor Selection, Naive Survivor Selection]
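For reference, the Ackley benchmark used in this experiment can be written as below. The constants a = 20, b = 0.2 and c = 2π are the commonly used defaults and are assumed here; the dimensionality and search bounds of the experiment are not stated in this section, so the example points are illustrative only.

```python
import math

def ackley(x, a=20.0, b=0.2, c=2.0 * math.pi):
    """Ackley function with the commonly used constants a=20, b=0.2, c=2*pi."""
    n = len(x)
    mean_square = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(c * v) for v in x) / n
    return -a * math.exp(-b * math.sqrt(mean_square)) - math.exp(mean_cos) + a + math.e

print(ackley([0.0, 0.0]))   # close to 0 at the global minimum
print(ackley([1.0, 1.0]))   # a nearby local minimum with a larger value
```

The function has a single global minimum at the origin surrounded by many regularly spaced local minima, which is why it is a common test of whether an algorithm keeps enough diversity to avoid premature convergence.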

Chapter 5

Conclusion

5.1

Adaptive survivor selection is a novel approach that introduces adaptive survivor selection
together with a new diversity measurement. Its main advantage is that the selection method can be
adapted with respect to the generation and the diversity level. It checks the amount of diversity
so that at no point does diversity fall below the lowest permitted value. It also incorporates an
elitist selection scheme not only for the best individuals but also for the most diverse
individuals, so that when individuals are trapped in a deep local optimum, the most diverse
individuals found so far help them to escape.

Experiments show that adaptive survivor selection outperforms the currently most widely used
naïve survivor selection purely in terms of maintaining diversity. Although niching methods can
sometimes maintain pre-existing diversity better than adaptive survivor selection, we can narrow
this performance gap by properly initializing the adaptive diversity parameter and choosing its
update rule. This scheme addresses one of the drawbacks of niching methods: they cannot control
the diversity needed at a given generation. In fact, niching methods and adaptive survivor
selection have different goals. Niching methods mainly focus on growing and maintaining niches of
individuals, whereas our scheme focuses on maintaining a level of diversity that can guide
individuals to the global maximum.

5.2 Future Work

5.2.1 Modified DPGA

It is evident that the value of δ caused the poor performance of this algorithm. If we change the
initialization and update rule of δ, the algorithm can be expected to perform better. One approach
would be to initialize δ to a lower value instead of 1. This would mitigate the risk of
individuals going beyond the search space or residing only on its boundary. But this approach has
a flaw: if we restrict δ to lower values, the algorithm becomes less capable of getting out of
local optima and hence more prone to premature convergence. Alternatively, the value of δ could be
drawn from a probability distribution, so that δ is not prone to being too high or too low. Even
after the initialization problem of δ is solved, the update rule of δ still needs to be revised.

5.2.2 Adaptive Survivor Selection

We have investigated a new diversity measurement technique and, using it, proposed a new survivor
selection strategy that works better than existing survivor selection schemes. A pitfall of the
new diversity measurement is that, in some edge cases, it gives a high diversity value although
the population is not diverse at all. Detecting these edge cases and mitigating the error they
cause is left for future work. We can also adopt fitness sharing by assigning a fitness to each
bucket, which every individual in that bucket will share. The fitness assigned to a bucket will
depend on how diverse that bucket is. That means that instead of measuring the diversity of
individuals, we measure the diversity of the buckets that contain them. Once a bucket is assigned
a fitness, the individuals in that bucket share that fitness among them.
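The sharing step of this proposal could look like the sketch below. It assumes an equal split of the bucket fitness among the members of the bucket, which is only one possible sharing rule. How the bucket fitness is derived from bucket diversity is left open, so the values used here are hypothetical inputs.

```python
def share_bucket_fitness(bucket_fitness, bucket_members):
    """Split each bucket's fitness equally among the individuals it contains.

    bucket_fitness: dict mapping bucket id -> fitness assigned to the bucket
    bucket_members: dict mapping bucket id -> list of individual ids
    """
    shared = {}
    for bucket, members in bucket_members.items():
        if not members:
            continue  # an empty bucket has nobody to share with
        share = bucket_fitness[bucket] / len(members)
        for individual in members:
            shared[individual] = share
    return shared

bucket_fitness = {"A": 6.0, "B": 4.0}                    # hypothetical values
bucket_members = {"A": ["i1", "i2", "i3"], "B": ["i4"]}  # hypothetical membership
shared = share_bucket_fitness(bucket_fitness, bucket_members)
print(shared)
```

With an equal split, individuals in crowded buckets receive a smaller share than individuals in sparsely populated buckets, which is the usual effect sought from fitness sharing.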

5.2.3 New Distribution Based Mutation

We have investigated distributions that have properties similar to those of the currently
deployed distributions or that belong to the same family. These distributions have a bell-shaped
curve similar to the Gaussian and also depend on the same set of parameters as the Gaussian,
Cauchy or Lévy distributions. The three distributions presented earlier have the potential to
replace the current distribution-based mutations. All of them have fatter tails, and the Laplace
and Slash distributions also have higher peaks, so theoretically both of them should perform
better in both exploration and exploitation. Student’s t-distribution has practically converged
to the Gaussian at 30 DOF. So we can experiment with the initial DOF with which we start the
algorithm and with how the DOF should change as the generations increase.
