
Major Defence 2014

Swarm Intelligence: Ants, Birds, Wolves and ... Bees


Particle Swarm Optimisation (PSO)

PSO is an optimisation procedure based on the social behaviour of groups of organisms (for example the flocking of birds and the schooling of fish)

Individual solutions in a population are viewed as “particles” that evolve or change their positions with time

Each particle modifies its position in the search space according to its own experience and that of its neighbouring particles, by remembering the best position visited by itself and by its neighbours (combining local and global search methods)


PSO update equations
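
The equations for this slide did not survive extraction. As a reference point, the standard inertia-weight form of the PSO update is sketched below. The symbols w (inertia weight), c_1 and c_2 (acceleration coefficients), r_1 and r_2 (uniform random numbers in [0,1]), p_i (best position visited by particle i) and g (best position visited by its neighbourhood) are the conventional ones and are an assumption here, not taken from the slide.

```latex
\begin{aligned}
v_i(t+1) &= w\,v_i(t) + c_1 r_1 \bigl(p_i - x_i(t)\bigr) + c_2 r_2 \bigl(g - x_i(t)\bigr) \\
x_i(t+1) &= x_i(t) + v_i(t+1)
\end{aligned}
```

The c_1 term is the particle's own experience and the c_2 term is the experience of its neighbours, which is the local/global combination described above.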

Grey Wolves

Inspiration

Based on the hunting behavior of Canis lupus

Grey wolves are considered as apex predators, meaning that they are at the top of the food chain. Grey wolves mostly prefer to live in a pack. The group size is 5–12 on average. Of particular interest is that they have a very strict social dominant hierarchy.

Concept

● The leaders are a male and a female, called alphas. The alpha is mostly responsible for making decisions about hunting, sleeping place, time to wake, and so on. The alpha’s decisions are dictated to the pack.

● Interestingly, the alpha is not necessarily the strongest member of the pack but the best in terms of managing the pack.

● The second level in the hierarchy of grey wolves is beta. The betas are subordinate wolves that help the alpha in decision-making or other pack activities.

● The beta is probably the best candidate to be the alpha in case one of the alpha wolves passes away or becomes very old (i.e. is no longer following the global optimum). The beta wolf should respect the alpha, but commands the other lower-level wolves as well. It plays the role of an advisor to the alpha and discipliner for the pack. The beta reinforces the alpha’s commands throughout the pack and gives feedback to the alpha.

● Omegas are responsible for keeping the pack structure

● Deltas are random sample points, akin to scout bees

Algorithm

● Initialize the grey wolf population X_i (i = 1, 2, ..., n)

● Initialize a, A, and C

● Calculate the fitness of each search agent

● X_\alpha = the best search agent

● X_\beta = the second best search agent

● X_\delta = the third best search agent

● while (t < Max number of iterations)

    ● for each search agent

        ● Update the position of the current search agent by the above equations

    ● end for

    ● Update a, A, and C

    ● Calculate the fitness of all search agents

    ● Update X_\alpha, X_\beta, and X_\delta

    ● t = t + 1

● end while

● return X_\alpha
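
A minimal Python sketch of the loop above, assuming the standard GWO update (A = 2a·r1 − a, C = 2·r2, new position = equal-weight mean of the three leader-guided estimates), which is not reproduced on this slide. The function name `gwo`, its parameters, the clipping to a box and the `sphere` test function are illustrative assumptions, not the presenter's implementation.

```python
import random


def sphere(x):
    """Illustrative test function: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def gwo(fitness, dim, bounds, n_wolves=20, max_iter=200, seed=0):
    """Minimise `fitness` inside the box `bounds` with a basic GWO loop."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Initialize the grey wolf population X_i.
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]

    def leaders(candidates):
        # Three best agents, copied so later moves cannot alias them.
        ranked = sorted(candidates, key=fitness)
        return [list(w) for w in ranked[:3]]

    alpha, beta, delta = leaders(wolves)
    for t in range(max_iter):
        a = 2.0 * (1.0 - t / max_iter)  # decreases linearly from 2 towards 0
        for w in wolves:
            for d in range(dim):
                estimate = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a  # A lies in [-a, a]
                    C = 2.0 * rng.random()          # C lies in [0, 2]
                    D = abs(C * leader[d] - w[d])
                    estimate += leader[d] - A * D
                # Equal-weight mean of the three estimates, clipped to the box.
                w[d] = min(hi, max(lo, estimate / 3.0))
        # Update X_alpha, X_beta, X_delta, keeping the best agents seen so far.
        alpha, beta, delta = leaders(wolves + [alpha, beta, delta])
    return alpha


if __name__ == "__main__":
    best = gwo(sphere, dim=3, bounds=(-10.0, 10.0))
    print(best, sphere(best))
```

Keeping the previous leaders in the candidate pool is one design choice among several; it makes the returned alpha the best agent seen over the whole run.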

Algorithm

● Circling the prey – local search

● Where components of \vec{a} are linearly decreased from 2 to 0 over the course of iterations and r_1, r_2 are random vectors in [0,1].

● A grey wolf in the position of (X,Y) can update its position according to the position of the prey (X*,Y*). Different places around the best agent can be reached with respect to the current position by adjusting the values of the \vec{A} and \vec{C} vectors. For instance, (X*-X,Y*) can be reached by setting \vec{A}=(1,0) and \vec{C}=(1,1). Note that the random vectors r_1 and r_2 allow wolves to reach any position between the two particular points. So a grey wolf can update its position inside the space around the prey in any random location by the above-mentioned equations.

● The same concept can be extended to a search space with n dimensions, and the grey wolves will move in hyper-cubes (or hyper-spheres) around the best solution obtained so far.
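
The encircling equations that this slide refers to did not survive extraction. In the standard GWO formulation they take the form below; the notation \vec{X}_p for the prey position and \vec{D} for the distance vector is assumed here, and products and absolute values are taken component-wise.

```latex
\begin{aligned}
\vec{D} &= \bigl|\,\vec{C} \cdot \vec{X}_p(t) - \vec{X}(t)\,\bigr| \\
\vec{X}(t+1) &= \vec{X}_p(t) - \vec{A} \cdot \vec{D} \\
\vec{A} &= 2\,\vec{a} \cdot \vec{r}_1 - \vec{a} \\
\vec{C} &= 2\,\vec{r}_2
\end{aligned}
```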

[Figure: 3D position vectors around the prey (X*,Y*,Z*) and the possible next locations of a wolf at (X,Y,Z)]

Algorithm

● On some indication of convergence we start an exploitation mode. Mathematically, we decrease the value of \vec{a}. Note that the fluctuation range of \vec{A} is also decreased by \vec{a}. In other words, since \vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a}, \vec{A} is a random value in the interval [-a,a], where a is decreased from 2 to 0 over the course of iterations. When random values of \vec{A} are in [-1,1], the next position of a search agent can be in any position between its current position and the position of the prey.

● With the operators proposed so far, the GWO algorithm allows its search agents to update their position based on the location of the alpha, beta, and delta, and to attack towards the prey. However, the GWO algorithm is prone to stagnation in local solutions with these operators. Hence the algorithm's performance is suboptimal in areas of high variance. The circling mechanism by itself is not sufficient; however, it is an interesting component of local search.

Algorithm

● Hunting:

● Grey wolves have the ability to recognize the location of prey and encircle them. The hunt is usually guided by the alpha. The beta and delta might also participate in hunting occasionally. However, in an abstract search space we have no idea about the location of the optimum (prey). In order to mathematically simulate the hunting behavior of grey wolves, we suppose that the alpha (best candidate solution), beta, and delta have better knowledge about the potential location of prey. Therefore, we save the first three best solutions obtained so far and oblige the other search agents (including the omegas) to update their positions according to the positions of the best search agents.
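
Written out, the hunting step described above takes the following form in the standard GWO formulation. The subscripted names \vec{D}_\alpha, \vec{X}_1, \vec{A}_1, \vec{C}_1 and so on are assumed notation; each leader gets its own independently drawn \vec{A} and \vec{C}.

```latex
\begin{aligned}
\vec{D}_\alpha &= \bigl|\vec{C}_1 \cdot \vec{X}_\alpha - \vec{X}\bigr|, &
\vec{D}_\beta  &= \bigl|\vec{C}_2 \cdot \vec{X}_\beta  - \vec{X}\bigr|, &
\vec{D}_\delta &= \bigl|\vec{C}_3 \cdot \vec{X}_\delta - \vec{X}\bigr| \\
\vec{X}_1 &= \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, &
\vec{X}_2 &= \vec{X}_\beta  - \vec{A}_2 \cdot \vec{D}_\beta, &
\vec{X}_3 &= \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta \\
\vec{X}(t+1) &= \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}
\end{aligned}
```

Note that the final average gives the alpha, beta and delta estimates the same weight.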

Algorithm

● Alternating divergence and convergence is required to prevent premature convergence. To measure convergence/divergence we use \vec{C} and \vec{A}

● Note that C > 1 emphasises the effect of the prey, and convergence is indicated by |A| < 1

● The C vector can be also considered as the effect of obstacles to approaching prey in nature. Generally speaking, the obstacles in nature appear in the hunting paths of wolves and in fact prevent them from quickly and conveniently approaching prey. This is exactly what the vector C does. Depending on the position of a wolf, it can randomly give the prey a weight and make it harder and farther to reach for wolves, or vice versa.

● To sum up, the search process starts with creating a random population of grey wolves (candidate solutions) in the GWO algorithm. Over the course of iterations, alpha, beta, and delta wolves estimate the probable position of the prey. Each candidate solution updates its distance from the prey. The parameter a is decreased from 2 to 0 in order to emphasize exploration and exploitation, respectively. Candidate solutions tend to diverge from the prey when |\vec{A}|>1 and converge towards the prey when |\vec{A}|<1. Finally, the GWO algorithm is terminated by the satisfaction of an end criterion.
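
The shift from exploration to exploitation can be checked numerically. The sketch below assumes the standard definition A = 2a·r − a with r uniform in [0,1) and a linear schedule for a; the function names `a_schedule` and `exploration_fraction` are illustrative. It estimates how often |A| > 1 (divergence from the prey) at a given value of a.

```python
import random


def a_schedule(t, max_iter):
    """Linear decrease of a from 2 (t = 0) to 0 (t = max_iter)."""
    return 2.0 * (1.0 - t / max_iter)


def exploration_fraction(a, samples=10000, seed=0):
    """Monte Carlo estimate of P(|A| > 1) for A = 2*a*r - a, r uniform in [0, 1)."""
    rng = random.Random(seed)
    hits = sum(abs(2.0 * a * rng.random() - a) > 1.0 for _ in range(samples))
    return hits / samples


if __name__ == "__main__":
    for t in (0, 25, 50, 75, 100):
        a = a_schedule(t, 100)
        print(f"t={t:3d}  a={a:.2f}  P(|A|>1) ~ {exploration_fraction(a):.2f}")
```

Under these assumptions divergent moves are possible only while a > 1, that is, during the first half of the run; afterwards every move lies between the wolf and the estimated prey position.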

Critique

➔ Works, but is not a real hierarchy

➔ The average is equally weighted across all named samples, so nothing separates an alpha from a beta

➔ Highly sensitive to hierarchy depth (problem dependent)


Bees in Nature

1- Bee colonies can span huge distances. They are able to accurately search large spaces.


Bees in Nature

Flower patches with plentiful amounts of nectar or pollen that can be collected with less effort should be visited by more bees, whereas patches with less nectar or pollen should receive fewer bees. This is equivalent to saying areas of high fitness have a high density of updates.


Bees in Nature

2- Scout bees search randomly from food source to food source


Bees in Nature

3- The bees that return to the hive evaluate the different patches against a certain quality threshold (measured as a combination of some elements, such as sugar content)


Bees in Nature

4- After depositing the nectar or pollen, the bees go to the “dance floor” to perform a “waggle dance”


Bees in Nature

5- Bees communicate through this waggle dance which contains the following information:

1. The direction of flower patches (angle between the sun and the patch)

2. The distance from the hive (duration of the dance)

3. The quality rating (fitness) (frequency of the dance)


Bees in Nature

This information helps the colony to send its bees precisely

6- Follower bees go after the dancer bee to the patch to gather food efficiently and quickly


Bees in Nature

7- The same patch will be advertised in the waggle dance again when returning to the hive if it is still good enough as a food source (depending on the food level), and more bees will be recruited to that source

8- More bees visit flower patches with a larger amount of food (higher fitness)


Bees in Nature

Depending on the fitness, a solution or food source may be abandoned or may become popular


Artificial Bee Colony Algorithm (ABC)

Initialization Phase

Repeat

Employed Bees Phase

Onlooker Bees Phase

Scout Bees Phase

Memorize the best solution achieved so far

Until(Cycle=Maximum Cycle Number)


Phases

● Employed bees:

– Perform a local search to improve the existing solution

– When a solution point cannot be improved further after a certain threshold limit of attempts, the bee is reassigned as a scout bee

● Onlooker bees:

– Are probabilistically added to the neighborhoods of existing sample points

– Thus better sources/samples attract more bees in their neighborhood (positive feedback)

– using the expression given in equation :
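
The equation itself did not survive extraction. In the standard ABC formulation the probability that an onlooker selects source i is proportional to its fitness. The symbols fit_i (fitness of source i) and SN (number of food sources) are the conventional ones and are assumed here.

```latex
p_i = \frac{\mathrm{fit}_i}{\sum_{n=1}^{SN} \mathrm{fit}_n}
```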


Phases

● Scout Phase

― Purely random

― Population is fixed, i.e. the number of scouts is fixed

― Most unemployed bees are allocated to the onlooker worker base
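
The three phases can be sketched together in Python as follows. This is a simplified illustration under stated assumptions, not the presenter's code: the names `abc`, `try_neighbour`, `limit` and `sphere` are hypothetical, the cost is assumed non-negative so that 1/(1 + cost) can serve as fitness, and every exhausted source is replaced in the scout phase.

```python
import random


def sphere(x):
    """Illustrative non-negative cost function, minimum 0 at the origin."""
    return sum(v * v for v in x)


def abc(cost, dim, bounds, n_sources=10, limit=20, max_cycles=200, seed=0):
    """Minimise a non-negative `cost` with a basic Artificial Bee Colony loop."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_source():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    # Initialization phase.
    sources = [random_source() for _ in range(n_sources)]
    costs = [cost(s) for s in sources]
    trials = [0] * n_sources

    def try_neighbour(i):
        # Local search: perturb one dimension relative to a random partner source.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = list(sources[i])
        cand[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        cand[d] = min(hi, max(lo, cand[d]))
        c = cost(cand)
        if c < costs[i]:  # greedy selection
            sources[i], costs[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    i = min(range(n_sources), key=costs.__getitem__)
    best, best_cost = list(sources[i]), costs[i]
    for _ in range(max_cycles):
        # Employed bees phase: one local search per source.
        for i in range(n_sources):
            try_neighbour(i)
        # Onlooker bees phase: better sources attract more bees.
        weights = [1.0 / (1.0 + c) for c in costs]
        for i in rng.choices(range(n_sources), weights=weights, k=n_sources):
            try_neighbour(i)
        # Scout bees phase: abandon sources that exceeded the trial limit.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i], trials[i] = cost(sources[i]), 0
        # Memorize the best solution achieved so far.
        i = min(range(n_sources), key=costs.__getitem__)
        if costs[i] < best_cost:
            best, best_cost = list(sources[i]), costs[i]
    return best, best_cost


if __name__ == "__main__":
    print(abc(sphere, dim=3, bounds=(-5.0, 5.0)))
```

The scout step replaces a source with a point drawn independently of where the source used to be, which is the purely random reassignment listed above.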


Swarm Criteria – What makes a smart swarm?

The swarm should be able to do simple space and time computations (the proximity principle).

The swarm should be able to respond to quality factors in the environment (the quality principle).

The swarm should not commit its activities along excessively narrow channels (the principle of diverse response).

The swarm should not change its mode of behavior upon every fluctuation of the environment (the stability principle).

The swarm must be able to change behavior mode when needed (the adaptability principle).


Variation - the cause

● Employed bees that cannot locally improve their sources are re-assigned.

● The reassignment is independent of where the particle used to be.

● This leads to large jumps, and hence abrupt changes in the fitness/error.

● Leads to large improvements very quickly. But...

● Leads to high variation in the performance

● Centralizes the algorithm -> something swarm intelligence tries to prevent


Lifespan

● Allows a population to improve itself quickly (since information can be lost or deleted between generations)

● However,

– No control over information lost

– No information gained


Where Genetic Algorithms Come In

● Framework built around the generational concept

● Allows information transfer between generations:

● Genetic operators like crossover can be modified to favor specific types of information transfer

● e.g. dimensional bias

● This reduces the variation without increasing centralization


Genetic Operators

● Crossover:

– Single point

– Multi point

– Arithmetic

– Heuristic

– One-way

● Crossover probability parameter should be reduced as the algorithm progresses

– Since, after the initial area allocation is done, conserving generational information loses priority

● Mutation
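
A short Python sketch of three of the crossover operators listed above on real-valued vectors, plus one possible decreasing schedule for the crossover probability. The function names, the use of two cut points for the multi-point case, and the linear schedule with its `p_start` and `p_end` values are illustrative assumptions; the slides do not specify them.

```python
import random


def single_point(p1, p2, rng):
    """Child takes p1's genes before a random cut and p2's genes after it."""
    cut = rng.randrange(1, len(p1))  # needs at least 2 genes
    return p1[:cut] + p2[cut:]


def two_point(p1, p2, rng):
    """Multi-point crossover with two cuts: the middle segment comes from p2."""
    i, j = sorted(rng.sample(range(1, len(p1)), 2))  # needs at least 3 genes
    return p1[:i] + p2[i:j] + p1[j:]


def arithmetic(p1, p2, rng):
    """Child is a random convex combination of the two parents."""
    w = rng.random()
    return [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]


def crossover_probability(cycle, max_cycles, p_start=0.9, p_end=0.1):
    """Linearly reduce the crossover probability as the run progresses."""
    return p_start + (p_end - p_start) * cycle / max_cycles


if __name__ == "__main__":
    rng = random.Random(1)
    mum, dad = [0.0] * 5, [1.0] * 5
    print(single_point(mum, dad, rng))
    print(two_point(mum, dad, rng))
    print(arithmetic(mum, dad, rng))
    print(crossover_probability(50, 100))
```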


Support: Experiments and Results


Proposed Algorithmic Improvement

Initialization Phase

Repeat

● Context Sensitive parameters

Employed Bees Phase

● Crossover Phase

Onlooker Bees Phase

Scout Bees Phase

Memorize the best solution achieved so far

● Mutation Phase

Until(Cycle=Maximum Cycle Number)

• The Crossover ensures that information is carried from one generation to the next

• Crossover also dampens the effects of variation

• Mutation allows more effective local searching

• Context sensitive parameters allow a more efficient algorithm

• Why are the operators inserted in this order? - the answer is not trivial.


GA – results of experiments

● Several crossover operators were tried, e.g.

● Single point

● Uniform – best results

● We even made our own operator

● Single-direction crossover – leads to a better general population but hits thrashing problems


Effectiveness

● Clearly leads to a more monotonic decrease in error

● A smoother curve

● Less variation – uniformity and consistency

● Behavior near higher iterations shows definite improvement


Effectiveness contd.

● Offers better performance in higher dimensional problems

● The graph shows the performance of ABC vs GA across 30 runs

● Run parameters:

● Low mutation

● Low crossover

● For rastrigin(100)


PSO type falloff


[Figure panels: Rosenbrock, Griewank]


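Experimental Results

Mean and standard deviation (std) of the minimum error across 30 runs.

| Function | Definition | ABC: mean minimum error | ABC with proposed modifications: mean minimum error | ABC: std | ABC with proposed modifications: std |
|---|---|---|---|---|---|
| Sphere | \sum_{i=1}^{n} x_i^2 | 562.55 | 541.36 | 39.343 | 31.394 |
| Griewank | 1 + \frac{1}{4000} \sum_{i=1}^{n} x_i^2 - \prod_{i=1}^{n} \cos\left(\frac{x_i}{\sqrt{i}}\right) | 0.1497 | 0.1376 | 0.008757 | 0.0086948 |
| Rastrigin | An + \sum_{i=1}^{n} \left[ x_i^2 - A\cos(2\pi x_i) \right] | 1348.84 | 1307.85 | 64.885 | 60.479 |
| Rosenbrock | \sum_{i=1}^{n-1} \left[ (1 - x_i)^2 + 100\,(x_{i+1} - x_i^2)^2 \right] | 767751.64 | 729240.13 | 9.54E+004 | 8.24E+004 |
| Ackley | -a \exp\left(-b \sqrt{\frac{1}{D} \sum_{i=1}^{D} x_i^2}\right) - \exp\left(\frac{1}{D} \sum_{i=1}^{D} \cos(c x_i)\right) + a + e | 9.1524613695 | 9.0952396525 | 0.045036 | 0.03556 |
| Multi-Dimensional Easom | \prod_{i=1}^{D} \cos(x_i) \cdot \exp\left(-\sum_{i=1}^{D} (x_i - \pi)^2\right) | 0 | 0 | 0 | 0 |
| Schwefel | 418.9829\,D - \sum_{i=1}^{D} x_i \sin\left(\sqrt{\lvert x_i \rvert}\right) | 41389.9780380048 | 41371.3919573781 | 195.8 | 156.84 |
| Zakharov | \sum_{i=1}^{D} x_i^2 + \left(\sum_{i=1}^{D} 0.5\, i x_i\right)^2 + \left(\sum_{i=1}^{D} 0.5\, i x_i\right)^4 | 7.85E+002 | 7.91E+002 | 2.51E+003 | 3.95E+003 |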
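
For reference, five of the benchmark functions named in the table can be written in Python as below, using their standard textbook definitions. The constants A = 10 for Rastrigin and a = 20, b = 0.2, c = 2π for Ackley are the conventional defaults and are an assumption here; the slides do not state which values were used in the experiments.

```python
import math


def sphere(x):
    return sum(v * v for v in x)


def griewank(x):
    s = sum(v * v for v in x) / 4000.0
    p = math.prod(math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1))
    return 1.0 + s - p


def rastrigin(x, A=10.0):
    return A * len(x) + sum(v * v - A * math.cos(2.0 * math.pi * v) for v in x)


def rosenbrock(x):
    # Sum over consecutive pairs (x_i, x_{i+1}); global minimum 0 at (1, ..., 1).
    return sum((1.0 - a) ** 2 + 100.0 * (b - a * a) ** 2 for a, b in zip(x, x[1:]))


def ackley(x, a=20.0, b=0.2, c=2.0 * math.pi):
    d = len(x)
    mean_sq = sum(v * v for v in x) / d
    mean_cos = sum(math.cos(c * v) for v in x) / d
    return -a * math.exp(-b * math.sqrt(mean_sq)) - math.exp(mean_cos) + a + math.e


if __name__ == "__main__":
    origin = [0.0] * 3
    print(sphere(origin), griewank(origin), rastrigin(origin), ackley(origin))
    print(rosenbrock([1.0] * 3))
```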


Proposed Modifications to the ABC: Analysis

● The proposed algorithm shows significantly lower variation across runs and a steadier monotonic decrease in error

● The probability of premature convergence is much lower

– Due to the underlying ABC structure

– Control of information flow between generations

● Proposed modifications are local

● Enables a high degree of distribution and parallelizability

― Relatively cheap: O(1) – O(n) [depending on the option chosen]

― Minimizes variance


Models

● No known accurate models for the algorithm

● The original model is very complex and uses tensors; it doesn't give much insight

– The tensor construct grows exponentially and is not computationally feasible

● 2 phase model

– Local phase

– Non-local phase

● Pumping model

– Uses a modified Voronoi diagram (may not use the Euclidean distance metric)

– Scales with the number of extrema


Pumping model

● Create a cover of the space

● Cover must centre around extrema

● Each set has a fixed search time associated with it

● Can be based on saddle points/extrema, contours, or a Voronoi diagram with any metric

● The search time depends on

● Area of the cover

● Magnitude of the extremum (relative depth for minima and relative height for maxima)


Pumping Model 2: update equations


Visualization

Projection into a subspace leads to loss of information; we need to minimize it

Users prefer:

● Linear projection, which is easier to correlate

● Limited or no change in basis

This implies:

● Use linear embedding (eigenvector based, LDA, etc.), not non-linear (Kohonen maps, etc.)

● Cannot use PCA, etc. at each instant of time; instead rotate the axes by a certain amount

● The amount calculated is theta. We have found an equation for it

● How to rotate is still a big question

[Figure axis label: Fraction of population at maximum extrema]

An Intelligent System: from Birds, Bees, Genes and Wolves to...

The swarm

Context sensitive parameters

Intra generation information passing

Inter generation information passing

Hierarchies

global local

Finite lifespan

PSO

GA

ABC

Multiple sample points

GWO

What does this remind you of?

humans


Future work

Context sensitive parameters: we have just scratched the surface

Layered models

– Hierarchy based: GWO is not a true hierarchy

Semi-localized models – control degree of localization

Memetic search – type 1, 2, 3




Thank You
