The Assessment and Application of Lineage Information in Genetic Programs for Producing Better Models
Gary D. Boetticher, Boetticher@uhcl.edu, Univ. of Houston - Clear Lake, Houston, TX, USA
http://nas.cl.uh.edu/boetticher/publications.html The 2006 IEEE International Conference on Information Reuse and Integration
Kim Kaminsky, [email protected], Univ. of Houston - Clear Lake, Houston, TX, USA
About the Author: Gary D. Boetticher
Ph.D. in Machine Learning and Software Engineering
A neural network-based software reuse economic model
Executive member of IEEE Reuse Standard Committees (1990s)
Commercial consultant: U.S. Olympic Committee, LDDS Worldcom, Mellon Mortgage, …
Currently: Associate Professor, Department of Comp. Science/Software Engineering, University of Houston - Clear Lake, Houston, TX, USA
Boetticher@uhcl.edu
Research interests: Data mining, ML, Computational Bioinformatics, and Software metrics
Motivating Questions
Does chromosome lineage information within a Genetic Program (GP) provide any insight into the effectiveness of solving problems?
If so, how could these insights be utilized to make better breeding decisions?
Genetic Program Overview
Given X, Y, and Z, what is RESULT?
X Y Z RESULT
2 4 5 30
5 3 2 16
: : : :
1 3 6 24
1) Create a population of equations
2) Determine the fitness for each (1 / Stand. Error)
Eq# Equation Fitness
1 X+Y 87
2 (Z-X)*Y+X 84
: : :
1000 (X*X)-Z 57
3) Breed Equations
Parents: X + Y and (Z-X) * Y+X
Offspring: (Z-X) + Y and X * Y+X
4) Generate new populations and breed until a solution is found
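The overview steps can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the tuple-based tree encoding, the helper names, and the reading of "Stand. Error" as a root-mean-square error are all assumptions, and the fitness scale will not match the values shown on the slide. The crossover call reproduces the breeding example from step 3.

```python
import math
import operator

# Expression trees as nested tuples: (op, left, right), or a variable name.
# This encoding is an assumption made for the sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# The three data rows visible on the slide: (X, Y, Z, RESULT).
ROWS = [(2, 4, 5, 30), (5, 3, 2, 16), (1, 3, 6, 24)]


def evaluate(tree, env):
    """Evaluate an expression tree against a dict of variable values."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))


def standard_error(tree, rows=ROWS):
    """Root-mean-square prediction error (assumed meaning of 'Stand. Error')."""
    total = 0.0
    for x, y, z, result in rows:
        predicted = evaluate(tree, {"X": x, "Y": y, "Z": z})
        total += (predicted - result) ** 2
    return math.sqrt(total / len(rows))


def fitness(tree):
    """Fitness = 1 / standard error; a perfect fit is reported as infinity."""
    err = standard_error(tree)
    return math.inf if err == 0 else 1.0 / err


def get_subtree(tree, path):
    """Follow a sequence of tuple indices down into the tree."""
    for index in path:
        tree = tree[index]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(parts)


def crossover(a, path_a, b, path_b):
    """Swap the subtree at path_a in a with the subtree at path_b in b."""
    sub_a, sub_b = get_subtree(a, path_a), get_subtree(b, path_b)
    return replace_subtree(a, path_a, sub_b), replace_subtree(b, path_b, sub_a)


eq1 = ("+", "X", "Y")                          # X + Y
eq2 = ("+", ("*", ("-", "Z", "X"), "Y"), "X")  # (Z-X) * Y + X
# Swap "X" in eq1 with "(Z-X)" in eq2, as in the breeding example.
child1, child2 = crossover(eq1, (1,), eq2, (1, 1))
```

Here `child1` encodes (Z-X) + Y and `child2` encodes X * Y + X. In a full GP the crossover points would be chosen at random rather than fixed.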
Genetic Program Overview
Generation N
Equation Fitness
(X+Y) 87
(X - Z) * (Y * Y) 86
ZY 75
: :
Y 22
Y - X 18
Generation N+1
Equation Fitness
(X - Z)
(X + Y) * (Y * Y)
Z + Y
:
X
Y + Y
Why discard legacy information?
Goal: Examine fitness patterns over time
Generation 1, Generation 2, Generation 3 (the slide repeats the same ranked list under each heading):
Equation Fitness
(X+Y) 87
(X - Z) * (Y * Y) 86
ZY 85
(X - Z) * (Y * Y) 84
Y 79
Y - X 75
Z + Y 75
(X - Z) * (Y * Y) 75
Y 73
Y - X 71
(X - Z) * (Y * Y) + W + W 68
Y - X 67
ZY 66
(X - Z) * (Y * Y) 66
Y 65
Y - X 65
(X - Z) * (Y * Y) + W + W 64
Y - X 64
Z - Y 62
(X - Z) * (Y * Y) 59
Y 58
Y - X 55
(X - Z) * (Y * Y) + W + W 44
Localized?
Volatile?
Proof of Concept Experiments - 1
5 experiments using synthetic equations:
Z = W + X + Y
Z = 2 * X + Y – W
Z = X / Y
Z = X^3
Z = W^2 + W * X - Y
Data slightly perturbed to prevent premature convergence
Genetic Program
1000 Chromosomes (Equations)
50 Generations
Breeding based on fitness rank
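One way to picture the "slightly perturbed" synthetic data is the sketch below. The slide does not say how the data were generated, so the input range, the number of rows, the 1% relative noise, and the function names are all assumptions chosen for illustration.

```python
import random

# The five synthetic target equations listed on the slide.
TARGETS = {
    "Z = W + X + Y": lambda w, x, y: w + x + y,
    "Z = 2 * X + Y - W": lambda w, x, y: 2 * x + y - w,
    "Z = X / Y": lambda w, x, y: x / y,
    "Z = X^3": lambda w, x, y: x ** 3,
    "Z = W^2 + W * X - Y": lambda w, x, y: w ** 2 + w * x - y,
}


def make_dataset(target, n_rows=100, noise=0.01, seed=0):
    """Sample inputs, compute Z, and perturb Z by a small relative amount.

    n_rows, noise, and the input range are illustrative assumptions.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        # Inputs drawn from [1, 10] so that X / Y never divides by zero.
        w, x, y = (rng.uniform(1.0, 10.0) for _ in range(3))
        z = target(w, x, y)
        z_perturbed = z * (1.0 + rng.uniform(-noise, noise))
        rows.append((w, x, y, z_perturbed))
    return rows


datasets = {name: make_dataset(fn) for name, fn in TARGETS.items()}
```

With noisy targets no equation reaches zero error, which is one plausible reason a small perturbation keeps the population from converging too early.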
Proof of Concept Experiments - 2
For the 1000 Chromosomes:
Divide into 5 groups of 200 (by fitness)
Focus on the best, middle, and worst groups
See where each group’s offspring occur in the next generation
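The grouping and tracking described above might look like the following sketch. It assumes each offspring is attributed to a single parent and that ranks are 0-based with 0 as the fittest; both are simplifications for illustration, and the ranks in the toy example are made up rather than taken from the experiments.

```python
from collections import Counter

GROUP_SIZE = 200  # 1000 chromosomes / 5 groups, as on the slide


def group_of(rank, group_size=GROUP_SIZE):
    """Map a 0-based fitness rank (0 = fittest) to a group index 0..4."""
    return rank // group_size


def offspring_placement(parent_ranks, child_ranks):
    """Count, per parent group, which group each offspring lands in.

    parent_ranks[i] is the rank of child i's parent in generation N;
    child_ranks[i] is child i's own rank in generation N+1.
    """
    counts = {}
    for p_rank, c_rank in zip(parent_ranks, child_ranks):
        counts.setdefault(group_of(p_rank), Counter())[group_of(c_rank)] += 1
    return counts


# Toy illustration with made-up ranks (not data from the experiments).
parents = [0, 150, 450, 500, 900, 999]
children = [10, 250, 430, 610, 820, 950]
placement = offspring_placement(parents, children)
```

Group 0 is the best group, group 2 the middle, and group 4 the worst. A placement table concentrated on the diagonal would suggest localized offspring; a spread-out table would suggest volatile ones.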
Results for Z = W + X + Y
[Figure: three panels, labeled Best, Middle, and Worst]
Results for Z = 2 * X + Y – W
[Figure: three panels, labeled Best, Middle, and Worst]
Results for Z = X / Y
[Figure: three panels, labeled Best, Middle, and Worst]
Results for Z = X^3
[Figure: three panels, labeled Best, Middle, and Worst]
Results for Z = W^2 + W * X - Y
[Figure: three panels, labeled Best, Middle, and Worst]
Applied Experiments
Best class produces best offspring. Now what?
Compare 2 Genetic Programs (GPs)
1) Use a vanilla-based GP
2) Use a GP that breeds only the top 20% of a population and replicates 5 times.
Genetic Program
1000 Chromosomes (Equations)
50 Generations
20 Trials
Equations to model
Z = Sin(W) + Sin(X) + Sin(Y)
Z = log10(WX) + (Y * Z)
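The second strategy can be read as "keep the fittest 20% and copy each survivor 5 times so the breeding pool is back at full size". The sketch below follows that reading, which is an interpretation of the slide rather than a confirmed description of the authors' GP; the function name and parameters are hypothetical.

```python
def breeding_pool(population, fitness, top_percent=20, copies=5):
    """Rank by fitness, keep the top percentage, and replicate each survivor.

    This is one reading of "breeds only the top 20% ... and replicates
    5 times"; the exact mechanism used in the experiments is not given.
    """
    ranked = sorted(population, key=fitness, reverse=True)
    keep = len(ranked) * top_percent // 100
    survivors = ranked[:keep]
    return [chromosome for chromosome in survivors for _ in range(copies)]


# Toy population: integers stand in for equations, and the fitness of each
# "equation" is simply its own value.
population = list(range(1000))
pool = breeding_pool(population, fitness=lambda c: c)
```

With 1000 chromosomes the pool holds 200 distinct survivors, each present 5 times, so later crossover and mutation draw only on lineages from the best class.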
Results for Z = Sin(W) + Sin(X) + Sin(Y)
Measure: Vanilla-Based GP, Lineage-Based GP
Average Fitness: 591.8, 740.9
Average r^2: 0.8734, 0.9315
Ave. Generations needed to complete: 29.1, 28.5
Results for Z = log10(WX) + (Y * Z)
Measure: Vanilla-Based GP, Lineage-Based GP
Average Fitness: 210.9, 346.5
Average r^2: 0.7244, 0.8069
Ave. Generations needed to complete: 50.0, 48.6
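To put the two results slides side by side, the short sketch below computes the relative change from the vanilla GP to the lineage-based GP for each reported average. The numbers are copied from the slides; the dictionary layout and function name are only illustrative.

```python
# Averages reported on the two results slides: (vanilla GP, lineage-based GP).
RESULTS = {
    "Z = Sin(W) + Sin(X) + Sin(Y)": {
        "fitness": (591.8, 740.9),
        "r2": (0.8734, 0.9315),
        "generations": (29.1, 28.5),
    },
    "Z = log10(WX) + (Y * Z)": {
        "fitness": (210.9, 346.5),
        "r2": (0.7244, 0.8069),
        "generations": (50.0, 48.6),
    },
}


def relative_change(vanilla, lineage):
    """Percentage change going from the vanilla GP to the lineage-based GP."""
    return (lineage - vanilla) / vanilla * 100.0


changes = {
    equation: {name: relative_change(*pair) for name, pair in measures.items()}
    for equation, measures in RESULTS.items()
}
```

Average fitness rises by roughly 25% for the sine equation and roughly 64% for the log equation, while the average number of generations drops slightly in both cases.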
Conclusions
Proof of concept experiments demonstrate the viability of considering lineage in GPs
Applied experiments show that lineage-based GP modeling produces better results faster