![Page 1: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/1.jpg)
1
Coevolving Solutions to the Shortest Common
Superstring Problem
Assaf Zaritsky & Moshe Sipper
Ben-Gurion University, Israel
www.cs.bgu.ac.il/~assafza
![Page 2: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/2.jpg)
2
Outline
- The “Shortest Common Superstring” problem.
- DNA sequencing and the input domain.
- Standard and cooperative coevolutionary genetic algorithm (GA).
- The Puzzle approach.
- Conclusions and future work.
- Messy Puzzle.
![Page 3: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/3.jpg)
3
The Shortest Common Superstring Problem (SCS)
- Let S = {s1, …, sn} be a set of strings (blocks) over some alphabet Σ.
- A superstring of S is a string x such that each si in S is a substring of x.
- Problem: find a shortest (common) superstring of S.
- NP-complete (as a decision problem). MAX-SNP hard.
- Motivation: DNA sequencing, data compression.
![Page 4: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/4.jpg)
4
SCS: Example
- S = {ate, half, lethal, alpha, alfalfa}
- A trivial superstring is “atehalflethalalphaalfalfa”, of length 25 (a simple concatenation of all blocks).
- A shortest common superstring is “lethalphalfalfate”, of length 17.
- Note that a “compressed” permutation of the blocks is actually a superstring.
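The "compressed permutation" idea can be checked with a few lines of Python. This is only a sketch: the helper names `overlap` and `compress` are illustrative and not taken from the slides, and the merge rule (overlap each block maximally with the end of the string built so far) is an assumption about what "compressed" means.

```python
def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def compress(blocks):
    """Concatenate blocks in the given order, merging each block with the
    maximal overlap against the end of the string built so far."""
    result = ""
    for block in blocks:
        result += block[overlap(result, block):]
    return result


blocks = ["ate", "half", "lethal", "alpha", "alfalfa"]
trivial = "".join(blocks)                                    # plain concatenation
best = compress(["lethal", "alpha", "half", "alfalfa", "ate"])
print(len(trivial), trivial)   # 25 atehalflethalalphaalfalfa
print(len(best), best)         # 17 lethalphalfalfate
```

The search problem is then to find the permutation whose compressed form is shortest.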
![Page 5: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/5.jpg)
5
Approximation Algorithms
- Several constant-factor (linear) approximation algorithms for SCS have been proposed, most of which rely on greedy approaches.
- GREEDY: the most widely used heuristic in DNA sequencing.
- Conjecture [Blum 1994, Sweedyk 1999]: the superstring produced by GREEDY is at most twice as long as the optimal one.
- We are not aware of any previous evolutionary approach to the SCS problem.
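The slides do not spell out GREEDY. The sketch below follows its usual textbook formulation: repeatedly merge the two strings with the largest overlap until one string remains. The function names and the tie-breaking (first best pair found) are assumptions made for illustration.

```python
def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(blocks):
    """Textbook GREEDY: merge the pair with maximal overlap until one string is left."""
    strings = list(dict.fromkeys(blocks))  # drop exact duplicates, keep order
    # Blocks contained in another block are covered automatically.
    strings = [s for s in strings if not any(s != t and s in t for t in strings)]
    while len(strings) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left string, index of right string)
        for i, a in enumerate(strings):
            for j, b in enumerate(strings):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = strings[i] + strings[j][k:]
        strings = [s for idx, s in enumerate(strings) if idx not in (i, j)] + [merged]
    return strings[0] if strings else ""


blocks = ["ate", "half", "lethal", "alpha", "alfalfa"]
result = greedy_superstring(blocks)
print(len(result), result)
```

The result is always a superstring, but not necessarily a shortest one.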
![Page 6: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/6.jpg)
6
Outline
Current section: DNA sequencing and the input domain.
![Page 7: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/7.jpg)
7
DNA Sequencing
The most common application of the SCS problem.
![Page 8: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/8.jpg)
8
DNA Sequencing (cont’d)
- The problem: “read” a string of DNA.
- Short DNA strands can be read in the laboratory.
- To sequence a long DNA strand (the DNA sequence appears in many copies):
  1. Cut the DNA into short fragments using restriction enzymes.
  2. Sequence each of the resulting fragments.
  3. Order those fragments using an SCS algorithm.
![Page 9: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/9.jpg)
9
The Input Domain
The input strings used in the experiments were inspired by DNA sequencing:
![Page 10: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/10.jpg)
10
Input Generation Setup: Parameters

NB: increasing the number of blocks results in exponential growth of the problem’s complexity.

| Parameter | Value |
| --- | --- |
| Size of random string | 250 bits (~50 blocks); 400 bits (~80 blocks) |
| Minimal block size | 20 bits |
| Maximal block size | 30 bits |
| Number of duplicates created from a random string | 5 |
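A minimal sketch of how such inputs could be generated from these parameters. The slides give only the parameter values, so the cutting procedure here is an assumption: each copy is cut left to right into pieces of uniformly random size, and the last piece of a copy may be shorter than the minimum. The name `generate_blocks` is hypothetical.

```python
import random


def generate_blocks(length=250, copies=5, min_size=20, max_size=30, seed=0):
    """Create a random bit string, duplicate it, and cut every copy into blocks."""
    rng = random.Random(seed)
    source = "".join(rng.choice("01") for _ in range(length))
    blocks = []
    for _ in range(copies):
        pos = 0
        while pos < length:
            size = rng.randint(min_size, max_size)   # inclusive bounds
            blocks.append(source[pos:pos + size])    # last piece may be shorter
            pos += size
    return source, blocks


source, blocks = generate_blocks()
print(len(source), len(blocks))
```

With 250 bits, 5 copies, and an average block size of about 25 bits, this yields roughly 50 blocks, in line with the table. The source string itself is a superstring of all blocks, which gives a reference length when measuring distance from the optimum.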
![Page 11: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/11.jpg)
11
Outline
Current section: Standard and cooperative coevolutionary genetic algorithm (GA).
![Page 12: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/12.jpg)
12
Simple Genetic Algorithm
produce an initial population of individuals
evaluate fitness of all individuals
while termination condition not met do
    select fitter individuals for reproduction
    recombine individuals
    mutate individuals
    evaluate fitness of modified individuals
    generate a new population
end while
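The loop above can be sketched generically in Python. Several choices here are assumptions made to keep the example short: fitness is minimised, selection is a binary tournament (swapped in for simplicity), the termination condition is a fixed number of generations, and the toy problem (minimise the number of 1 bits) is only there to make the sketch runnable.

```python
import random


def simple_ga(init, fitness, recombine, mutate, pop_size=20, generations=30, seed=0):
    """Generic GA loop; lower fitness is treated as better."""
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]      # initial population
    scores = [fitness(ind) for ind in population]          # evaluate
    for _ in range(generations):                           # termination condition
        def select():
            # Binary tournament: the fitter of two random individuals reproduces.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[a] if scores[a] <= scores[b] else population[b]

        offspring = []
        while len(offspring) < pop_size:
            child = recombine(select(), select(), rng)     # recombine
            offspring.append(mutate(child, rng))           # mutate
        population = offspring                             # new population
        scores = [fitness(ind) for ind in population]      # evaluate
    best_index = min(range(pop_size), key=scores.__getitem__)
    return population[best_index], scores[best_index]


# Toy usage: 16-bit genomes, fitness = number of 1 bits.
def init(rng):
    return [rng.randint(0, 1) for _ in range(16)]


def recombine(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(ind, rng):
    return [1 - g if rng.random() < 0.03 else g for g in ind]


best, score = simple_ga(init, sum, recombine, mutate)
print(score, best)
```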
![Page 13: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/13.jpg)
13
Simple GA for the SCS Problem
- Given a set of strings as input, generate an initial population of random candidate solutions.
- The fitness of each individual depends on its length and accuracy.
- The GA uses selection, recombination, and mutation to create the next generation, each individual of which is then evaluated.
- These steps are repeated a predefined number of times or until the solution is deemed satisfactory.
![Page 14: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/14.jpg)
14
Simple GA for the SCS Problem (cont’d)
- Blocks of the input set are atomic components.
- Representation: an individual’s genome is represented as a sequence of blocks.
- An individual may have missing blocks or contain duplicate copies of the same block.
- Permutation representation: good or bad?
![Page 15: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/15.jpg)
15
Simple GA for the SCS Problem (cont’d)
- Evaluation: the fitness of an individual is the length of its compressed genome plus the total length of the blocks that are not covered by the individual (lower is better).
- Genetic operators:
  - Fitness-proportionate selection.
  - Two-point recombination, which allows the genome’s length to grow or shrink.
  - Block-change mutation.
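The evaluation rule can be written down directly. The sketch below assumes that "compressed" means merging consecutive blocks with maximal overlap, and that a block is "covered" when it occurs as a substring of the compressed genome. The function names are illustrative. The two assertions use the values from the worked example on the next slide.

```python
def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def compress(genome):
    """Merge the blocks of a genome in order, using maximal overlaps."""
    result = ""
    for block in genome:
        result += block[overlap(result, block):]
    return result


def fitness(genome, blocks):
    """Length of the compressed genome plus the lengths of uncovered blocks."""
    s = compress(genome)
    uncovered = sum(len(b) for b in blocks if b not in s)
    return len(s) + uncovered


s1, s2, s3, s4 = "0011", "1100", "1001", "111"
S = [s1, s2, s3, s4]
assert fitness([s2, s1], S) == 9          # |110011| + |111|
assert fitness([s4, s2, s1, s4], S) == 8  # |11100111|
```

The penalty term pushes incomplete individuals toward covering every block, while the length term rewards shorter superstrings.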
![Page 16: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/16.jpg)
16
Simple GA for the SCS Problem (example)
- S = {s1, s2, s3, s4}; s1 = 0011, s2 = 1100, s3 = 1001, s4 = 111.
- Fitness(<s2,s1>) = |110011| + |111| = 6 + 3 = 9.
- Fitness(<s4,s2,s1,s4>) = |11100111| = 8.
- Recombination:
  - p1 = <s1,||s2,s3||,s4>
  - p2 = <s4,||s1,s3,s2||>
  - p3 = recombine1(p1,p2) = <s1,s1,s3,s2,s4>
  - p4 = recombine2(p1,p2) = <s4,s2,s3>
- Mutation: mutate(<s1,s2,s2>) = <s1,s4,s2>
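The recombination example can be reproduced with a two-point crossover in which each parent has its own pair of cut points, which is why offspring length can differ from the parents'. The function names are illustrative, and the reading of block-change mutation (replace one randomly chosen position with a randomly chosen input block) is an assumption that is consistent with the example above.

```python
import random


def two_point_recombine(p1, cuts1, p2, cuts2):
    """Swap the segments between the cut points of the two parents."""
    a1, b1 = cuts1
    a2, b2 = cuts2
    child1 = p1[:a1] + p2[a2:b2] + p1[b1:]
    child2 = p2[:a2] + p1[a1:b1] + p2[b2:]
    return child1, child2


def block_change_mutation(genome, blocks, rng):
    """Replace the block at one random position with a random input block."""
    child = list(genome)
    child[rng.randrange(len(child))] = rng.choice(blocks)
    return child


p1 = ["s1", "s2", "s3", "s4"]   # <s1,||s2,s3||,s4>
p2 = ["s4", "s1", "s3", "s2"]   # <s4,||s1,s3,s2||>
p3, p4 = two_point_recombine(p1, (1, 3), p2, (1, 4))
print(p3)  # ['s1', 's1', 's3', 's2', 's4']
print(p4)  # ['s4', 's2', 's3']

mutant = block_change_mutation(["s1", "s2", "s2"], ["s1", "s2", "s3", "s4"], random.Random(0))
print(mutant)
```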
![Page 17: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/17.jpg)
17
Coevolution
Simultaneous evolution of two or more species with coupled fitness.
Coevolving species either compete or cooperate.
Competitive coevolution: Fitness of individual based on direct competition with individuals of other species, which in turn evolve separately in their own populations (“prey-predator”).
![Page 18: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/18.jpg)
18
Cooperative Coevolution
![Page 19: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/19.jpg)
19
Cooperative Coevolution (cont’d)
Cooperative Coevolution involves a number of independently evolving species.
Interaction between species occurs via fitness function only.
The fitness of an individual depends on its ability to collaborate with individuals from other species.
![Page 20: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/20.jpg)
20
Cooperative Coevolution (cont’d)
Source: Potter & DeJong (1997)
![Page 21: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/21.jpg)
21
Cooperative Coevolutionary Algorithm for the SCS Problem
- Two species evolve simultaneously.
- The first species contains prefixes of candidate solutions to the SCS problem at hand.
- The second species contains candidate suffixes.
- The fitness of an individual in each species depends on how well it interacts with representatives from the other species to construct a global solution.
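A sketch of how a prefix or suffix individual could be scored against a representative of the other species. It rests on several assumptions not stated in the slides: the merge is a plain concatenation of the two genomes followed by compression, the fitness is the one from the simple GA, and the representative is simply passed in (for instance, the best individual of the other species). All names are illustrative.

```python
def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def compress(genome):
    result = ""
    for block in genome:
        result += block[overlap(result, block):]
    return result


def fitness(genome, blocks):
    """Compressed length plus the lengths of uncovered blocks (lower is better)."""
    s = compress(genome)
    return len(s) + sum(len(b) for b in blocks if b not in s)


def evaluate_species(population, representative, blocks, is_prefix):
    """Score every individual by merging it with the other species' representative."""
    scores = []
    for individual in population:
        merged = individual + representative if is_prefix else representative + individual
        scores.append(fitness(merged, blocks))
    return scores


s1, s2, s3, s4 = "0011", "1100", "1001", "111"
S = [s1, s2, s3, s4]
prefix_scores = evaluate_species([[s4, s2], [s3]], [s1, s4], S, is_prefix=True)
suffix_scores = evaluate_species([[s1, s4]], [s4, s2], S, is_prefix=False)
print(prefix_scores, suffix_scores)  # [8, 10] [8]
```

The two species never exchange genetic material. They influence each other only through these merged evaluations.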
![Page 22: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/22.jpg)
22
Cooperative Coevolutionary Algorithm for the SCS Problem (evaluation process)
[Diagram: prefixes population and suffixes population; an individual from one population is merged with the representative of the other.]
![Page 23: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/23.jpg)
23
Cooperative Coevolutionary Algorithm for the SCS Problem (evaluation process)
[Diagram: prefixes population and suffixes population; the merged solution is evaluated and the resulting fitness is returned.]
![Page 24: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/24.jpg)
24
Experiments
Compare: GREEDY, Standard GA, Cooperative Coevolution
![Page 25: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/25.jpg)
25
Experimental Setup
Each type of GA was executed twice on each problem instance; the better run of the two was used for statistical purposes.

| Parameter | Value |
| --- | --- |
| Population size | 500 |
| Number of generations | 5000 |
| Recombination rate | 0.8 |
| Mutation rate | 0.03 |
| Problem instances per experiment | 50 |
![Page 26: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/26.jpg)
26
Results: Experiment I (~50 blocks)
![Page 27: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/27.jpg)
27
Results: Experiment II (~80 blocks)
![Page 28: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/28.jpg)
28
Results: Summary
Average of the best superstring lengths (distance from optimum in parentheses):

| Problem size | GREEDY | Genetic | Cooperative |
| --- | --- | --- | --- |
| 50 blocks | 381 (131) | 280 (30) | 275 (25) |
| 80 blocks | 596 (196) | 685 (285) | 547 (147) |
![Page 29: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/29.jpg)
29
Conclusion:
The collaboration between the two populations results in a good decomposition of the problem into two smaller sub-problems, each of which is solved using a standard GA.
![Page 30: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/30.jpg)
30
Outline
Current section: The Puzzle approach.
![Page 31: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/31.jpg)
31
The Puzzle Algorithm
![Page 32: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/32.jpg)
32
The Schema Theorem
“Short, low-order, above-average schemata receive exponentially increasing trials in subsequent generations of a genetic algorithm.”
Holland (1975)
![Page 33: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/33.jpg)
33
Building Blocks Hypothesis
“A genetic algorithm seeks near-optimal performance through the juxtaposition of short, low-order, high-performance schemata, called the building blocks.”
![Page 34: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/34.jpg)
34
Our Interpretation
“The success of GAs stems from their ability to combine quality sub-solutions (building blocks) from separate individuals in order to form better global solutions.”
![Page 35: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/35.jpg)
35
The Main Assumption
Problems in nature have an inherent structural design. Even when the structure is not known explicitly, GAs detect it implicitly and gradually enhance good building blocks.
![Page 36: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/36.jpg)
36
A Problem
Recombination may destroy quality building blocks found by the GA.
![Page 37: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/37.jpg)
37
Example
Brain, Appearance
0010101010101010101000011110100010000
![Page 38: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/38.jpg)
38
Example (cont’d)
Brain, Appearance
0010101010101010101000011110100010000
1. Smart (presumably)
2. Blond
But not very beautiful…
![Page 39: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/39.jpg)
39
The Preservation of Favoured Building Blocks in the Struggle for Fitness: The Puzzle Algorithm
![Page 40: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/40.jpg)
40
Puzzle Algorithm: The Idea
- Improve the recombination operator.
- Preserve good building blocks discovered by the GA by selecting recombination loci that do not destroy them.
- Result: assembly of good building blocks to construct better solutions (as in a puzzle).
![Page 41: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/41.jpg)
41
Puzzle Algorithm (cont’d)
Two populations:
1. Candidate solutions: as in the simple GA.
2. Building blocks: each individual is a sequence of blocks contained in at least one candidate solution.
[Diagram: building blocks population and candidate solutions population.]
![Page 42: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/42.jpg)
42
Puzzle Algorithm (cont’d)
- Interaction between candidate solutions and building blocks is through the fitness function.
- Interaction between building blocks and candidate solutions is through constraints on recombination points.
[Diagram: building blocks population and candidate solutions population, linked by “fitness evaluation” and “crossover location”.]
![Page 43: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/43.jpg)
43
Puzzle Algorithm: Zoom In
[Diagram: building blocks population and candidate solutions population, linked by “fitness evaluation” and “crossover location”. Zoom in on the candidate solutions population: each individual is a sequence of blocks.]
![Page 44: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/44.jpg)
44
Puzzle Algorithm: Zoom In
[Diagram: building blocks population and candidate solutions population, linked by “fitness evaluation” and “crossover location”. Zoom in on the building blocks population: each building block is contained in at least one individual in the solutions population; building blocks may overlap.]
![Page 45: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/45.jpg)
45
The Candidate Solutions Population
Representation, fitness evaluation, selection, and mutation are identical to the simple GA.
Recombination-aid vector aids in selecting the recombination loci.
Recombination-aid vector is updated by building blocks individuals.
![Page 46: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/46.jpg)
46
The Building Blocks Population
- An individual is represented as a sequence of blocks, contained in at least one candidate solution.
- The fitness of an individual is the average fitness of the candidate solutions containing it.
- Fitness-proportionate selection.
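The fitness rule for building blocks can be sketched as follows. The sketch assumes that "containing" means the building block occurs as a contiguous run of blocks in the candidate's genome, and it returns `None` for a building block that no candidate contains, a case the slides do not address. The function names are illustrative.

```python
def contains(genome, building_block):
    """True if the building block occurs as a contiguous run inside the genome."""
    n = len(building_block)
    return any(genome[i:i + n] == building_block for i in range(len(genome) - n + 1))


def building_block_fitness(building_block, solutions, solution_fitness):
    """Average fitness of the candidate solutions that contain the building block."""
    scores = [f for g, f in zip(solutions, solution_fitness) if contains(g, building_block)]
    return sum(scores) / len(scores) if scores else None


solutions = [["s1", "s2", "s3"], ["s2", "s3", "s4"], ["s3", "s1"]]
solution_fitness = [9, 8, 12]
print(building_block_fitness(["s2", "s3"], solutions, solution_fitness))  # 8.5
print(building_block_fitness(["s4", "s1"], solutions, solution_fitness))  # None
```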
![Page 47: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/47.jpg)
47
The Building Blocks Population (cont’d)
- “Unisex” individuals.
- Two modification operators:
  - Expansion: increase its genome by one block. Occurs with high probability.
  - Exploration: “die” and start over as a new 2-block individual. Occurs with low probability.
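A possible reading of the two operators, written as a sketch. The details are assumptions: expansion adds a neighbouring block taken from a candidate solution that already contains the building block (so the result is still contained in some candidate), exploration restarts from a random pair of adjacent blocks in a random candidate, and the two rates are applied as one draw. The default rates are the ones listed later in the building-blocks experimental setup (0.8 and 0.1). All function names are hypothetical.

```python
import random


def occurrences(genome, bb):
    """Start indices at which the building block occurs contiguously in the genome."""
    n = len(bb)
    return [i for i in range(len(genome) - n + 1) if genome[i:i + n] == bb]


def expand(bb, solutions, rng):
    """Grow the building block by one adjacent block from a solution containing it."""
    options = []
    for g in solutions:
        for i in occurrences(g, bb):
            if i > 0:
                options.append([g[i - 1]] + bb)          # extend to the left
            if i + len(bb) < len(g):
                options.append(bb + [g[i + len(bb)]])    # extend to the right
    return rng.choice(options) if options else bb


def explore(solutions, rng):
    """Restart as a new 2-block individual taken from a random candidate solution."""
    g = rng.choice([g for g in solutions if len(g) >= 2])
    i = rng.randrange(len(g) - 1)
    return g[i:i + 2]


def modify(bb, solutions, rng, expansion_rate=0.8, exploration_rate=0.1):
    r = rng.random()
    if r < exploration_rate:
        return explore(solutions, rng)
    if r < exploration_rate + expansion_rate:
        return expand(bb, solutions, rng)
    return bb


solutions = [["s1", "s2", "s3", "s4"], ["s4", "s1", "s3"]]
rng = random.Random(0)
expanded = expand(["s2", "s3"], solutions, rng)
fresh = explore(solutions, rng)
modified = modify(["s2", "s3"], solutions, rng)
print(expanded, fresh, modified)
```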
![Page 48: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/48.jpg)
48
Building Blocks – Candidate Solutions
[Diagram: fitness evaluation between the building blocks population and the candidate solutions population; fitness values f1, f2, f3, f4 are shown.]
![Page 49: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/49.jpg)
49
Building Blocks – Candidate Solutions
[Diagram: fitness evaluation between the building blocks population and the candidate solutions population (fitness values f1 to f4), followed by the step “Update recombination-aid vector”.]
![Page 50: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/50.jpg)
50
Update Recombination-aid vector
Solution’s genome
building block #1 fitness = 0.3
Recombination-aid vector: 0 0 0 0 0 0 0
building block #2 fitness = 0.4
building block #3 fitness = 0.6
![Page 51: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/51.jpg)
51
Update Recombination-aid vector
Solution’s genome
Recombination-aid vector: 0 0.6 0.4 0 0.3 0.3 0
building block #1 fitness = 0.3
building block #2 fitness = 0.4
building block #3 fitness = 0.6
![Page 52: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/52.jpg)
52
Update Recombination-aid vector
Solution’s genome
Recombination-aid vector: 0.6 0.6 0.4 0 0.3 0.3 0.3
building block #1 fitness = 0.3
building block #2 fitness = 0.4
building block #3 fitness = 0.6
![Page 53: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/53.jpg)
53
Recombination-loci selection
Solution’s genome
Recombination-aid vector: 0.6 0.6 0.4 0 0.3 0.3 0.3
* Ties are broken arbitrarily
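The update and selection steps can be sketched as below. The exact layout of the vector on the slides cannot be recovered from the text, so this sketch makes its own assumptions: one entry per gap between adjacent blocks of the genome, each occurrence of a building block writes its fitness (keeping the maximum) into the gaps it spans, a higher value means a stronger reason not to cut there, and the crossover locus is a gap with the lowest value, with ties broken at random. The genome and building blocks are hypothetical; the fitness values 0.3, 0.4, and 0.6 are those shown on the slides.

```python
import random


def update_aid_vector(genome, building_blocks):
    """building_blocks is a list of (sequence of blocks, fitness) pairs.
    Entry i of the result refers to the gap between genome[i] and genome[i + 1]."""
    vector = [0.0] * (len(genome) - 1)
    for bb, fit in building_blocks:
        n = len(bb)
        for start in range(len(genome) - n + 1):
            if genome[start:start + n] == bb:
                for gap in range(start, start + n - 1):
                    vector[gap] = max(vector[gap], fit)
    return vector


def choose_locus(vector, rng):
    """Pick a gap with the lowest value; ties are broken arbitrarily."""
    lowest = min(vector)
    return rng.choice([i for i, v in enumerate(vector) if v == lowest])


genome = list("ABCDEFGH")                       # hypothetical 8-block genome
building_blocks = [
    (["E", "F", "G"], 0.3),                     # building block #1
    (["C", "D"], 0.4),                          # building block #2
    (["B", "C"], 0.6),                          # building block #3 (overlaps #2)
]
vector = update_aid_vector(genome, building_blocks)
print(vector)  # [0.0, 0.6, 0.4, 0.0, 0.3, 0.3, 0.0]
locus = choose_locus(vector, random.Random(0))
print(locus)   # one of the gaps with value 0.0
```

Cutting only at low-valued gaps keeps highly rated building blocks intact during recombination.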
![Page 54: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/54.jpg)
54
Experiments
Compare: GREEDY, Standard GA, Puzzle
![Page 55: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/55.jpg)
55
Building Blocks - Experimental Setup

| Parameter | Value |
| --- | --- |
| Population size | 1000 |
| Expansion rate | 0.8 |
| Exploration rate | 0.1 |
![Page 56: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/56.jpg)
56
Results: Experiment III (~50 blocks)
Cooperative
![Page 57: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/57.jpg)
57
Results: Experiment IV (~80 blocks)
Cooperative
Did we lose to cooperative?
NO!
![Page 58: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/58.jpg)
58
Results: Summary
Average of the best superstring lengths (distance from optimum in parentheses):

| Problem size | GREEDY | Genetic | Puzzle |
| --- | --- | --- | --- |
| 50 blocks | 381 (131) | 280 (30) | 253 (3) |
| 80 blocks | 596 (196) | 685 (285) | 571 (171) |
![Page 59: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/59.jpg)
59
Relations Between The Algorithms
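[Diagram: relations between GA, Puzzle, Cooperative, and Co-Puzzle. Links labelled “puzzle” lead from GA to Puzzle and from Cooperative to Co-Puzzle; links labelled “cooperation” lead from GA to Cooperative and from Puzzle to Co-Puzzle.]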
![Page 60: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/60.jpg)
60
The Co-Puzzle Algorithm
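[Diagram: two Puzzle units. One pairs a possible building blocks population with the candidate prefixes population, the other pairs a possible building blocks population with the candidate suffixes population; within each unit the populations are linked by “fitness eval” and “crossover location”. The prefixes and suffixes populations are linked by “fitness evaluation”.]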
![Page 61: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/61.jpg)
61
Experiments
Compare: GREEDY, Cooperative Coevolution, Co-Puzzle
![Page 62: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/62.jpg)
62
Results: Experiment V (~80 blocks)
![Page 63: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/63.jpg)
63
Results: Experiment VI (~50 blocks)
Puzzle
????
![Page 64: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/64.jpg)
64
Results: Summary
Size of shortest common superstring (distance from optimum in parentheses):

| Problem size | GREEDY | Cooperative | Co-puzzle |
| --- | --- | --- | --- |
| 50 blocks | 381 (131) | 275 (25) | 268 (18) |
| 80 blocks | 596 (196) | 547 (147) | 482 (82) |

42% improvement over cooperative
![Page 65: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/65.jpg)
65
Outline
Current section: Conclusions and future work.
![Page 66: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/66.jpg)
66
Results: Summary
Size of shortest common superstring (distance from optimum in parentheses):

| Problem size | GREEDY | Cooperative | Puzzle | Co-puzzle |
| --- | --- | --- | --- | --- |
| 50 blocks | 381 (131) | 275 (25) | 253 (3) | 268 (18) |
| 80 blocks | 596 (196) | 547 (147) | 571 (171) | 482 (82) |
| 90 blocks | 677 (227) | 673 (223) | 683 (233) | 617 (167) |
| 100 blocks | 768 (268) | 768 (268) | 813 (313) | 732 (232) |

20 problem instances per experiment
Slide annotations: 25% better, 13% better, 83% better, 42% better
![Page 67: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/67.jpg)
67
Larger Problems - Using More Species
Size of shortest common superstring (distance from optimum in parentheses):

| Problem size | GREEDY | Co-puzzle | 3-Co-puzzle |
| --- | --- | --- | --- |
| 110 blocks | 836 (286) | 867 (317) | ? (?) |
| 120 blocks | 906 (306) | 992 (392) | 906 (306) |
![Page 68: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/68.jpg)
68
Conclusions
- Cooperative coevolution might prove deleterious when too many species are used (when close to optimum?).
- When a suitable number of species is used, cooperative coevolution improves performance by decomposing the problem into several easier subproblems.
![Page 69: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/69.jpg)
69
Conclusions (cont’d)
- Evolving a population of building blocks to aid in the selection of recombination loci drastically improves the performance of a standard GA.
- Cooperation between cooperative coevolution and Puzzle ultimately improves global performance.
![Page 70: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/70.jpg)
70
Future Work
- Test the (Co-)Puzzle approach on other problem domains.
- A hybrid GA.
- Tackle larger problems.
- Comparison to greedy-stochastically based local-search algorithms.
![Page 71: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/71.jpg)
71
Outline
The “Shortest Common Superstring” problem.
DNA sequencing and the input domain.
Standard and cooperative coevolutionary genetic algorithm (GA).
The Puzzle approach.
Conclusions and future work.
Messy Puzzle.
![Page 72: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/72.jpg)
72
The Messy Puzzle Algorithm
![Page 73: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/73.jpg)
73
Static Detection of Building Blocks for Addressing the Linkage Problem
Hillel Maoz
Ben-Gurion University, Israel
![Page 74: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/74.jpg)
74
The Linkage Problem
A binary genome of size n = 14.
Genes a and b together encode important information.
Random crossover is applied.
Survival probability = the chance to appear in the offspring
Left genome: 4/15
Right genome: 14/15
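The two fractions can be reproduced with a short calculation. The sketch below is an assumption about how the slide counts, not something the slide states: it takes single-point crossover with the cut drawn uniformly from n + 1 = 15 positions (both ends included), and treats the pair as surviving when the cut does not fall between genes a and b. The locus positions in the usage lines are hypothetical, chosen only because they give the slide's 4/15 and 14/15.

```python
from fractions import Fraction

def survival_probability(locus_a, locus_b, genome_size):
    """Chance that a single-point crossover keeps genes a and b together.

    Assumption: the cut position is uniform over genome_size + 1 positions,
    and exactly |locus_a - locus_b| of those positions separate the two genes.
    """
    distance = abs(locus_a - locus_b)
    cut_points = genome_size + 1
    return Fraction(cut_points - distance, cut_points)

# Hypothetical layouts: genes far apart (left) versus adjacent (right).
print(survival_probability(1, 12, 14))  # 4/15
print(survival_probability(6, 7, 14))   # 14/15
```

Under this reading, the closer two cooperating genes sit in the genome, the less likely crossover is to break them apart, which is the point of the slide.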
![Page 75: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/75.jpg)
75
The Linkage Problem (cont’d)
In many cases it is hard to know the optimal representation.
![Page 76: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/76.jpg)
76
The MaxCut Problem
Input: undirected weighted graph G=(V, E, W).
Output: a partition of V into two disjoint sets (S,V\S).
Goal: maximal sum of edge weights between the sets.
NP-complete.
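As a small illustration of the objective, the sketch below computes the weight of a cut for a given partition. The edge-list format, the function name `cut_weight`, and the 4-vertex graph are all hypothetical and are not taken from the slides.

```python
def cut_weight(edges, side):
    """Sum the weights of edges whose endpoints lie in different sets.

    edges: iterable of (u, v, weight) tuples for an undirected graph.
    side:  sequence where side[v] is 0 or 1, the set that vertex v belongs to.
    """
    return sum(w for u, v, w in edges if side[u] != side[v])

# Hypothetical 4-vertex weighted graph.
edges = [(0, 1, 3), (1, 2, 5), (2, 3, 2), (0, 3, 4), (0, 2, 1)]
print(cut_weight(edges, [0, 1, 0, 1]))  # 3 + 5 + 2 + 4 = 14
print(cut_weight(edges, [0, 0, 1, 1]))  # 5 + 4 + 1 = 10
```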
![Page 77: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/77.jpg)
77
MaxCut - Example
[Figure: example cuts with Cut = 34 and Cut = 47.]
![Page 78: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/78.jpg)
78
Simple GA for MaxCut
Population of candidate solutions
• Give each node a number
• Assign ‘0’ or ‘1’ to indicate which set the node belongs to
Iteration step
• Select any two parents
• Recombine and create an offspring
• Repeat until a new population is generated
Fitness: the weight of the cut
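The steps above can be sketched as a short program. The slide leaves the details open, so several choices here are assumptions: binary tournament selection for picking parents, one-point crossover for recombination, a small bit-flip mutation rate, and all function and parameter names. It is a minimal illustration, not the presenter's implementation.

```python
import random

def cut_weight(edges, genome):
    """Fitness: total weight of edges whose endpoints are in different sets."""
    return sum(w for u, v, w in edges if genome[u] != genome[v])

def tournament(population, edges, rng):
    """Assumed selection scheme: the fitter of two random individuals wins."""
    a, b = rng.sample(population, 2)
    return a if cut_weight(edges, a) >= cut_weight(edges, b) else b

def simple_ga_maxcut(n_vertices, edges, pop_size=30, generations=50,
                     mutation_rate=0.01, seed=0):
    """Evolve 0/1 genomes (one bit per vertex); needs n_vertices >= 2."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_vertices)]
                  for _ in range(pop_size)]
    best = max(population, key=lambda g: cut_weight(edges, g))
    for _ in range(generations):
        new_population = []
        while len(new_population) < pop_size:
            mother = tournament(population, edges, rng)
            father = tournament(population, edges, rng)
            # One-point crossover (assumed recombination operator).
            cut = rng.randint(1, n_vertices - 1)
            child = mother[:cut] + father[cut:]
            # Bit-flip mutation (assumed; not mentioned on the slide).
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            new_population.append(child)
        population = new_population
        champion = max(population, key=lambda g: cut_weight(edges, g))
        if cut_weight(edges, champion) > cut_weight(edges, best):
            best = champion
    return best, cut_weight(edges, best)

# Hypothetical usage on a unit-weight 4-cycle.
cycle = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
best, weight = simple_ga_maxcut(4, cycle)
print(best, weight)
```

Because the genome is a plain bit string indexed by vertex number, the outcome of one-point crossover depends on how the vertices happen to be numbered.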
![Page 79: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/79.jpg)
79
The Representation Problem
“How to define the order of the vertices within the genome?”
![Page 80: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/80.jpg)
80
Messy Genes
The main difficulty: identifying the related vertices.
A messy gene is an ordered pair <allele-locus, allele-value>.
Possible solution:
Use some sort of messy genes to detect related genes.
Use the Puzzle approach to keep them together.
![Page 81: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/81.jpg)
81
The Messy Puzzle Algorithm
A building block’s genome is represented as a sequence of messy genes.
![Page 82: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/82.jpg)
82
Messy Puzzle Algorithm
Two-population setup as in the Puzzle algorithm.
Enhanced recombination operator.
Evolved building-block structure (similar to Puzzle).
[Figure: a sequence of messy genes: <0,0>, <2,0>, <1,1>, <5,0>, <6,1>]
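To make the <allele-locus, allele-value> pairs concrete, the sketch below writes the messy genes listed above into a fixed-length genome. The genome size of 8, the function name `express`, and the rule that the first occurrence of a locus wins are assumptions for illustration; the slides do not say how repeated loci are resolved.

```python
def express(messy_genes, genome_size, default=None):
    """Place each (locus, value) pair at its locus in a fixed-length genome.

    Loci the building block does not mention keep the default value.
    Assumption: if a locus appears more than once, the first occurrence wins.
    """
    genome = [default] * genome_size
    seen = set()
    for locus, value in messy_genes:
        if locus in seen:
            continue
        genome[locus] = value
        seen.add(locus)
    return genome

# The five messy genes from the slide, in the order shown.
block = [(0, 0), (2, 0), (1, 1), (5, 0), (6, 1)]
print(express(block, 8))  # [0, 1, 0, None, None, 0, 1, None]
```

Because each gene carries its own locus, the order of genes inside a building block is independent of the order of vertices in the genome.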
![Page 83: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/83.jpg)
83
Enhanced Recombination
[Figure: recombination steps I) to IV) on two genomes with loci 1 to 8; building blocks labeled 0.8, 0.7, 0.6.]
Add the 1st BB: success
Add the 2nd BB: failure
Add the 3rd BB: success
Simple crossover
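The slide does not spell out what makes adding a building block succeed or fail, so the sketch below is only one possible reading, with every rule an assumption: building blocks are tried in descending order of their value; a block is added when none of its genes contradicts a locus already fixed by an earlier block, and rejected otherwise; the remaining loci are then filled by a simple one-point crossover of the two parents. Loci are 0-based here, and the parents and building blocks in the usage lines are hypothetical.

```python
import random

def enhanced_recombination(parent_a, parent_b, building_blocks, rng):
    """Build one offspring from building blocks plus simple crossover.

    building_blocks: list of (fitness, [(locus, value), ...]) pairs.
    Returns the offspring and a "success"/"failure" label per block tried.
    """
    n = len(parent_a)
    child = [None] * n
    outcomes = []
    ranked = sorted(building_blocks, key=lambda bb: bb[0], reverse=True)
    for _fitness, genes in ranked:
        # Assumed failure rule: the block disagrees with an already fixed locus.
        clashes = any(child[locus] is not None and child[locus] != value
                      for locus, value in genes)
        if clashes:
            outcomes.append("failure")
            continue
        for locus, value in genes:
            child[locus] = value
        outcomes.append("success")
    # Fill loci not covered by any accepted block with one-point crossover.
    cut = rng.randint(0, n)
    crossed = parent_a[:cut] + parent_b[cut:]
    child = [c if c is not None else x for c, x in zip(child, crossed)]
    return child, outcomes

# Hypothetical parents and building blocks on an 8-locus genome.
parent_a, parent_b = [0] * 8, [1] * 8
blocks = [(0.8, [(0, 1), (1, 1)]),
          (0.7, [(1, 0), (2, 0)]),   # contradicts locus 1 of the first block
          (0.6, [(5, 0), (6, 1)])]
child, outcomes = enhanced_recombination(parent_a, parent_b, blocks,
                                         random.Random(0))
print(outcomes)  # ['success', 'failure', 'success']
```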
![Page 84: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/84.jpg)
84
Static Detection of Building Blocks
Building blocks do not truly evolve.
No Expansion and Exploration operators.
Building blocks’ fitness is based on a number of generations.
Purpose: to check and understand the core of the messy puzzle algorithm.
![Page 85: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/85.jpg)
85
Results
[Chart: Max Cut Size - Puzzle vs. GA. x-axis: graph number (1 to 10); y-axis: cut size difference (0.0 to 80.0).]
Graphs:
1 graph_200_0.01_1
2 graph_200_0.05_1
3 graph_200_0.1_1
4 graph_200_0.3_1
5 graph_200_0.5_1
6 graph_300_0.01_1
7 graph_300_0.05_1
8 graph_300_0.1_1
9 graph_300_0.3_1
10 graph_300_0.5_1
Randomly generated graphs.
1000 generations.
10 separate experiments per problem instance.
[Chart: Avg Cut Size - different number of BB (graph_300_0.1_1). x-axis: number of BB (10 to 60); y-axis: distance from GA (0 to 50).]
[Chart: Avg Cut Size - different number of BB (graph_300_0.5_1). x-axis: number of BB (10 to 60); y-axis: distance from GA (-20 to 80).]
[Chart: Max Cut Size - Bi-partite graphs. x-axis: graph number (1 to 6); y-axis: cut size difference (-200 to 1600). Series: Distance to optimum, Puzzle addition.]
![Page 86: Coevolving Solutions to the Shortest Common Superstring Problem](https://reader035.vdocuments.net/reader035/viewer/2022062810/56815c37550346895dca2555/html5/thumbnails/86.jpg)
86
Conclusions and Future Work
Do messy work to solve the linkage problem.
Even a small population of building blocks improves the GA performance.
Messy puzzle is better when inner structures exist.
Applying evolution to the building blocks population.
Comparing to different representation-search techniques.