cpsc 881: machine learning genetic algorithm. 2 copy right notice most slides in this presentation...
TRANSCRIPT
CpSc 881: Machine Learning
Genetic Algorithm
2
Copy Right Notice
Most slides in this presentation are adopted from slides of text book and various sources. The Copyright belong to the original authors. Thanks!
3
Overview of Genetic Algorithm (GA)
GA is a learning method motivated by analogy to biological evolution.
GAs search the hypothesis space by generating successor hypotheses which repeatedly mutate and recombine parts of the best currently known hypotheses.
In Genetic Programming (GP), entire computer programs are evolved to certain fitness criteria.
4
Biological Evolution
Lamarck and others:Species “transmute” over time
Darwin and Wallace:Consistent, heritable variation among individuals in populationNatural selection of the fittest
Mendel and genetics:A mechanism for inheriting traitsGenotype->phenotype mapping
5
Genetic Algorithms
The algorithm operates by iteratively updating a pool of hypotheses, called the population.
On each iteration, all members of the population are evaluated according to the fitness function.
A new population is then generated by probabilistically selecting the most fit individuals from current population
some of these selected individuals are carried forward into the next generation population intact.Others are used as the basis for creating new offspring individuals by applying genetic operations.
6
GA
GA(Fitness, Threshold, p, r, m)Initialize Population: generate p hypotheses at random->P.
Evaluate: for each p, compute fitness(h)
While Maxh Fitness(h) < Threshold do
Select: probabilistically select (1-r)p members of P. Call this new generation Pnew
Crossover: probabilistically select (r*p)/2 pairs of hypotheses from P. For each pair, <h1, h2> produce two offspring by applying the crossover operator. Add all offspring to Pnew.
Mutate: Choose m% of PNew with uniform probability. For each, invert one randomly selected bit in its representation.
Update: P <- Pnew
Evaluate: for each p in P, compute fitness(p)
Return the hypothesis from P that has the highest fitness.
||
1)(
)()Pr(
P
ji
ii
hFitness
hFitnessh
7
Representing Hypotheses
In GAs, hypotheses are often represented by bit strings so that they can be easily manipulated by genetic operators such as mutation and crossover.
Examples:Represent
(Outlook = Overcast v Rain) ^ (Wind = Strong) By Outlook Wind
011 10RepresentIF Wind = Strong THEN PlayTennis = yes By Outlook Wind PlayTennis
111 10 10
8
Genetic Operators
Crossover Techniques: produce two new offspring from two parent strings by copying selecting bits form each parents. The choice of which parent contributes the bit for position i is determined by an additional string called crossover mask.
Single-point Crossover: Mask example: 11111000000
Two-point Crossover. Mask example: 00111110000
Uniform Crossover. Mask example: 10011010011
9
Genetic Operators
Mutation Techniques: produces small random changes to the bit string
Point Mutation
10
Select Most Fit Hypothesis
A simple measure for modeling the probability that a hypothesis will be selected is given by the fitness proportionate selection (or roulette wheel selection):
Pr(hi)= Fitness(hi)/j=1p Fitness(hj)
This simple measure can lead to crowding
Tournament SelectionPick h1, h2 at random with uniform probabilityWith probability p, select the more fit
Rank SelectionSort all hypotheses by fitnessProbability of selection is proportional to rank
In classification tasks, the Fitness function typically has a component that scores the classification accuracy over a set of provided training examples. Other criteria can be added (e.g., complexity or generality of the rule)
11
GABIL (DeJong et al. 1993)
Lean disjunctive set of propositional rules, competitive with C4.5
Fitness: Fitness(h)=(correct(h))2
Representation:
IF a1=T Λ a2=F THEN c=T; IF a2=T THEN c=FRepresented bya1 a2 c a1 a2 c10 01 1 11 10 0
Genetic operators: Standard mutation operatorsExtended two point crossover
Want variable length rule sets
Want only well-formed bitstring hypotheses
12
Crossover with Variable-Length Bit-strings
Start witha1 a2 c a1 a2 c
h1 10 01 1 11 10 0h2 01 11 0 10 01 0
1. choose crossover points for h1, e.g., after bit 1, 8
2. now restrict points in h2 to those that produce bitstrings with well-defined semantics, e.g., <1,3>, <1,8>, <6, 8>
Let d1 and d2 denote the distance from the leftmost and rightmost of two crossover points in h1 to the rule boundary immediately to it left.Then, the crossover points in h2 must have the same d1 and d2 values.
If we choose <1,3> result isa1 a2 c
h3 11 10 0a1 a2 c a1 a2 c a1 a2 c
h4 00 01 1 11 11 0 10 01 0
13
GABIL Extentions
Add new genetic operators, also applied probabilistically
1.AddAlternative: generalize constraint on ai by changing a 0 to 1.2. DropCondition: generalize constraint on by change every 0 to 1.
And add new filed to bitstring to determine whether to allow these
a1 a2 c a1 a2 c AA DC
01 11 0 10 01 0 1 0So now the learning strategy also evolves
14
GABIL Results
Performance of GABIL comparable to symbolic rule/tree learning methods C4.5, ID5R, AQ14
Average performance on a set of 12 synthetic problems:
GABIL with out AA and DC operator: 92.1%GABIL with AA and DC operators: 95.2%Symbolic learning methods ranged from 91.2 to 96.6%.
15
Hypothesis Space Search
GA search can move very abruptly (as compared to Backpropagation, for example), replacing a parent hypothesis by an offspring that may be radically different from the parent.The problem of Crowding: when one individual is more fit than others, this individual and closely related ones will take up a large fraction of the population.Solutions:
Use tournament or rank selection instead of roulette selection.Fitness sharing: the measured fitness of an individual is reduced by the presence of other similar individuals in the population.restrict ion on the kinds of individuals allowed to recombine to form offspring.
16
The Schema Theorem [Holland, 75]
Definition: A schema is any string composed of 0s, 1s and *s where * means ‘don’t care’.
Example: schema 0*10 represents strings 0010 and 0110.
Characterize population by number of instance representing each possible schema
17
Consider Just Selection
f(t)= average fitness of population at time t
m(s, t) = number of instances of schema s in population at time t.
u^(s, t) = average fitness of instances of s at time t
Probability of selecting h in one selection step
Probability of selecting an instance of s in one step
Expected number of instances of s after n selections
)(
)(
)(
)()Pr(
1
tnf
hf
hf
hfh
n
ii
),()(
),(ˆ
)(
)()Pr( tsm
tnf
tsu
tnf
hfsh
tpsh
),()(
),(ˆ)]1_,([ tsm
tf
tsutsmE
18
Schema Theorem
The full schema theorem provides a lower bound on the expected frequency of schema s, as follow:
pc= probability of single point crossover operator
pm=probability of mutation operatorl = length of single bit stringso(s) = number of defined (non “*”) bits in sd(s) = distance between leftmost, rightmost defined bits in s
The Schema Theorem: More fit schemas will tend to grow in influence, especially schemas containing a small number of defined bits (i.e., containing a large number of *s), and especially when these defined bits are near one another within the bit string.
)()1(1
)(1),(
)(
),(ˆ)]1_,([ so
mc pl
sdptsm
tf
tsutsmE
19
Genetic Programming
Genetic programming is a form of evolutionary computation in which the individuals in the evolving population are computer programs rather than bit strings
Population of programs represented by trees
20
Genetic Programming
On each interaction, GP produces a new generation of individuals using selection, crossover and mutation.
The fitness of a give individual program in the population is typically determined by executing the program on a set of training data.
21
Crossover
22
Models of Evolution and Learning I: Lamarckian Evolution [Late 19th C] Proposition: Experiences of a single organism directly affect the genetic makeup of their offsprings.
Assessment: This proposition is wrong: the genetic makeup of an individual is unaffected by the lifetime experience of one’s biological parents.
However: Lamarckian processes can sometimes improve the effectiveness of computerized genetic algorithms.
23
Models of Evolution and Learning II: Baldwin Effect
AssumeIndividual learning has no direct influence on individual DNABut ability to learn reduces need to “hard wire” traits in DNA
ThenAbility of individuals to learn will support more diverse gene pool
Because learning allows individuals with various “hard wired” traits to be successful
More diverse gene pool will support faster evolution of gene pool
Individual learning (indirectly) increases rate of evolution
24
Models of Evolution and Learning II: Baldwin Effect
Plausible example:New predator appears in environmentIndividuals who can learn (to avoid it) will be selectedIncrease in learning individuals will support more diverse gene poolResulting in faster evolutionPossibly resulting in new non-learned traits such as instinctive fear of predator
25
Computer Experiments on Baldwin Effect
Evolve simple neural networksSome network weights fixed during lifetime, other trainableGenetic makeup determines which are fixed and their weight values
ResultsWith no individual learning, population failed to improve over timeWhen individual learning allowed
Early generations: population contained many individuals with many trainable weights
Later generations: higher fitness, while number of trainable weights decreased
26
Summary: Evolutionary programming
Conduct randomized, parallel, hill-climbing search through H
Approach learning as optimization problem (optimize fitness)
Nice feature: evaluation of Fitness can be very indirect
Consider learning rule set for multistep decision makingNo issue of assigning credit/blame to individual steps
27
Parallelizing Genetic Algorithms
GAs are naturally suited to parallel implementation. Different approaches were tried:
Coarse Grain: subdivides the population into distinct groups of individuals (demes) and conducts a GA search in each deme. Transfer between demes occurs (though infrequently) by a migration process in which individuals from one deme are copied or transferred to other demesFine Grain: One processor is assigned per individual in the population and recombination takes place among neighboring individuals.