TRANSCRIPT
CGP Visits the Santa Fe Trail – Effects of Heuristics on GP
Cezary Z. Janikow
Christopher J. Mann
UMSL
Roadmap
• GP
• GP Search Space
• Local heuristics
• CGP
• Heuristics in Santa Fe Trail
  • Function/Terminal set
  • Structural
  • Combination
  • Generality
  • Probabilistic heuristics
• Summary
GP Search Space
• Representation space mapped onto solution space
• Best mappings
  • One-to-one, onto
• Real life
  • Large function/terminal set
  • Redundancy
  • Many-to-one
• Can domain-specific knowledge improve GP performance?
• Can we learn some domain-specific knowledge from GP?
GP Search Space
• 2-D space
  – Tree structures
    • constrained by size limits and function arity
  – Tree instances of specific structures
    • constrained by domain sizes
(Diagram: representation space of uninstantiated structures under a size limit, with instances such as the tree for (x + sin(a)) / 2.)
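The structure dimension of this space can be made concrete by counting: a minimal sketch, assuming an illustrative arity set {0, 1, 2} (not taken from the slides), of how many distinct uninstantiated tree shapes use exactly n nodes. The counts grow quickly with the size limit, which is what makes pruning attractive.

```python
from functools import lru_cache

ARITIES = [0, 1, 2]  # e.g. terminals, a sin-like unary, a +-like binary (assumed)

@lru_cache(maxsize=None)
def shapes(nodes):
    """Count tree shapes built with exactly `nodes` nodes over the given arities."""
    if nodes <= 0:
        return 0
    total = 0
    for arity in ARITIES:
        if arity == 0:
            total += 1 if nodes == 1 else 0  # a lone terminal
        elif arity == 1:
            total += shapes(nodes - 1)       # one subtree under a unary node
        else:  # arity 2: distribute the remaining nodes over two subtrees
            total += sum(shapes(k) * shapes(nodes - 1 - k)
                         for k in range(1, nodes - 1))
    return total
```

With these arities the sequence for n = 1, 2, 3, ... is 1, 1, 2, 4, 9, ... (the Motzkin numbers), already super-exponential in the size limit.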
Pruning/Constraining GP Search Space
• Tree structures
  • Hard to accomplish directly w/o instantiations
  • Indirect, by adjusting possible instantiations
• Tree instances
  • Strong constraints
    • prohibit some instantiations (labelings)
    • Structure-preserving crossover, STGP, CGP, CFG-GP
  • Weak probabilistic constraints
    • favor some instantiations over others
    • CGP, Probabilistic Tree Grammars
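The weak-constraint idea can be sketched as non-uniform sampling of child labels during tree generation or mutation. This is a minimal sketch; the weight table, label names, and `pick_child_label` are our illustrative assumptions, not CGP's actual data structures.

```python
import random

# Hypothetical first-order weights: for each parent label, the relative
# probability of sampling each child label. Uniform weights recover plain GP.
LABELS = ["move", "left", "right", "if-food-ahead", "progn2", "progn3"]
WEIGHTS = {
    # e.g. under if-food-ahead, favor "move" and forbid a nested test:
    "if-food-ahead": [0.6, 0.1, 0.1, 0.0, 0.1, 0.1],
}
UNIFORM = [1.0] * len(LABELS)

def pick_child_label(parent, rng=random):
    """Sample a child label under the weak (probabilistic) constraints."""
    return rng.choices(LABELS, weights=WEIGHTS.get(parent, UNIFORM), k=1)[0]
```

A zero weight acts like a strong constraint; intermediate weights merely bias the search, which is the distinction the slides draw between the two kinds of constraints.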
GP Design
• GP only explores a well-defined subspace of the potential search space
• Later generations search smaller subspaces
• Initial choice of the root node has significant impact on search and final solution
  – Called the GP Design
  – Daida, Langdon, Hall and Soule
• Heuristics can alter the design and redirect later generations toward specific subspaces
• Conversely, observing the designs tells us about problem-specific heuristics – ACGP
CGP
• Principles
• What heuristics/constraints can be processed
CGP Principles
• Strong input constraints
  – Prune the search space in such a way that valid parent(s) guarantee valid offspring
  – Start with valid initialization
• Weak probabilistic constraints
  – Adjust probabilities of specific mutations/crossovers
  – Only local heuristics
• Both with minimal linear overhead
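A minimal sketch of how strong first-order constraints can be represented and checked (the table entries, tuple encoding, and function names here are illustrative assumptions, not CGP internals). If initialization, mutation, and crossover only ever create edges the table allows, then valid parents guarantee valid offspring, which is the closure property the slide describes.

```python
FUNCTIONS = {"if-food-ahead", "progn2", "progn3"}
TERMINALS = {"move", "left", "right"}
EVERYTHING = FUNCTIONS | TERMINALS

# Strong first-order constraints as a prune table: FORBIDDEN[p] lists labels
# that may not appear directly under parent label p (entries are illustrative).
FORBIDDEN = {
    "progn2": {"progn2"},  # e.g. !P2-style: no progn2 directly under progn2
    "progn3": {"progn3"},  # e.g. !P3-style: no progn3 directly under progn3
}

def allowed_children(parent):
    return EVERYTHING - FORBIDDEN.get(parent, set())

def tree_valid(tree):
    """A tree (label, child, child, ...) is valid iff every edge obeys the table."""
    label, *kids = tree
    return all(k[0] in allowed_children(label) and tree_valid(k) for k in kids)
```

Because the check is per-edge, enforcing it during every operator application costs only linear overhead in tree size, matching the slide's claim.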
GP with Strong and Weak Constraints
(Diagram: population Pi → reproduction → mutation/crossover → Pi+1, sampling from a pruned, non-uniform distribution; cf. Probabilistic Grammars, CGP, EDA.)
CGP Means of Processing
• Strong constraints
  – Explicit structures and by data typing
  – Overloaded functions on types
• Weak constraints
CGP Means of Processing
• Explicit labeling constraints
  – First-order only
    • Parent-child
    • Can be with probability
• Data typing constraints
  – Propagated through overloaded functions
    • This links first-order information
(Diagram: first-order parent-child links between function nodes.)
CGP Mutation
(Diagram: CGP mutation on an example tree, transforming (x + sin(a)) / 2 into a tree over /, +, *, c, x, 2, 3.)
GP Crossover
(Diagram: crossover between example parents (x + sin(a)) / 2 and a tree over +, y, 2, 4, swapping subtrees to produce two offspring.)
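The swap illustrated above can be sketched as standard GP subtree crossover over tuple-encoded trees. This is a minimal sketch; the tuple encoding and helper names are ours, not the paper's representation.

```python
import random

def subtrees(tree, path=()):
    """Enumerate (path, subtree) pairs; a tree is (label, child, child, ...)."""
    yield path, tree
    for i, child in enumerate(tree[1:]):
        yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    kids = list(tree[1:])
    kids[path[0]] = replace(kids[path[0]], path[1:], new)
    return (tree[0], *kids)

def crossover(p1, p2, rng=random):
    """Swap a randomly chosen subtree of p1 with one of p2."""
    path1, sub1 = rng.choice(list(subtrees(p1)))
    path2, sub2 = rng.choice(list(subtrees(p2)))
    return replace(p1, path1, sub2), replace(p2, path2, sub1)
```

CGP's strong variant differs only in restricting which crossover points may be paired, so that both offspring remain valid.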
Santa Fe Experiments
• Problem
• Function set
• Heuristics exploration
• Generality of the heuristics
• Comparing vs. ACGP's probabilistic heuristics (on performance)
Santa Fe Problem
• 32x32 grid
• Food trail, 144 cells long, with 21 turns and 89 pieces of food
• Start in the northwest corner of the grid, facing east
• Fitness is the number of food pieces consumed in up to 400 moves
Santa Fe Functions/Terminals
• Terminals
  – turn left, turn right, move actions
• Functions
  – if-food-ahead
    • tests the position directly ahead for food; if true, performs the first action, otherwise the second
  – progn2, progn3
    • take two and three arguments, respectively, and execute them sequentially
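These primitives can be sketched as operations on an ant state. This is a minimal illustration; the class layout, coordinate convention, and the toroidal wrap-around are our own assumptions, not taken from the slides.

```python
DIRS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # east, south, west, north

class Ant:
    """Sketch of the Santa Fe ant: position, heading, remaining food."""
    def __init__(self, food, size=32):
        self.pos, self.heading = (0, 0), 0   # NW corner, facing east
        self.food = set(food)                # remaining food cells (row, col)
        self.size = size
        self.eaten = self.moves = 0

    def left(self):
        self.heading = (self.heading - 1) % 4; self.moves += 1

    def right(self):
        self.heading = (self.heading + 1) % 4; self.moves += 1

    def ahead(self):
        dr, dc = DIRS[self.heading]
        return ((self.pos[0] + dr) % self.size, (self.pos[1] + dc) % self.size)

    def move(self):
        self.pos = self.ahead(); self.moves += 1
        if self.pos in self.food:
            self.food.remove(self.pos); self.eaten += 1

    def food_ahead(self):
        return self.ahead() in self.food
```

`if-food-ahead` then branches on `food_ahead()`, and `progn2`/`progn3` simply call their argument actions in order; fitness is `eaten` after at most 400 `moves`.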
Experimental Methodology
• Analyze and propose heuristics
  – Reducing the function set
  – Constraining root and local structures
  – Combining the above
• Assess heuristics using 10 independent runs
  – Learning curves – average of best
  – Efficiency – average tree size in populations
Reducing Function Set: Basics, Quality
Reducing Function Set: Basics, Efficiency
Figure 2. Efficiency for reduced sets (individually).
(Plot: average complexity over generations 0–50 for !P3, Base, !P2, !R, !L.)
Reducing Function Set: Combined, Quality
Figure 3. Learning for reduced sets (combined).
(Plot: average fitness over generations 0–50 for !P3, !R!P3, Base, !R.)
Reducing Function Set: Combined, Efficiency
Figure 4. Efficiency for reduced sets (combined).
(Plot: average complexity over generations 0–50 for !P3, !R!P3, Base, !R.)
Constraining Root and Local Structure: Basics, Quality
Figure 5. Learning for basic structural heuristics.
(Plot: average fitness over generations 0–50 for ifrootif0m, if0m, Base, ifroot.)
Constraining Root and Local Structure: Basics, Efficiency
Figure 6. Efficiency for basic structural heuristics.
(Plot: average complexity over generations 0–50 for ifrootif0m, if0m, Base, ifroot.)
Constraining Root and Local Structure: Combined, Quality
Figure 7. Learning for combined structural heuristics.
(Plot: average fitness over generations 0–50 for ifrootif0m, ifrootif0mif1p, Base, if1p.)
Constraining Root and Local Structure: Combined, Efficiency
Figure 8. Efficiency for combined structural heuristics.
(Plot: average complexity over generations 0–50 for ifrootif0m, ifrootif0mif1p, Base, if1p.)
Combined Function Set and Structural Heuristics: Quality
Figure 9. Learning for combinations of heuristics.
(Plot: average fitness over generations 0–50 for !P3ifrootif0m, !P3if0m, !P3, if0m, Base.)
Combined Function Set and Structural Heuristics: Efficiency
Figure 10. Efficiency for combinations of heuristics.
(Plot: average complexity over generations 0–50 for !P3ifrootif0m, !P3if0m, !P3, if0m, Base.)
More Combined Heuristics: Quality
Figure 11. Learning for more combined heuristics.
(Plot: average fitness over generations 0–50 for !R!P3ifrootif0m, !P3ifrootif0m, ifrootif0m, !P3, !R!P3, Base.)
More Combined Heuristics: Efficiency
Figure 12. Efficiency for more combined heuristics.
(Plot: average complexity over generations 0–50 for !R!P3ifrootif0m, !P3ifrootif0m, ifrootif0m, !P3, !R!P3, Base.)
Best Heuristics by Inspection
• Analyze the best trees
  – constrain progn2 and progn3 so that neither can call the other (!P2!P3)
  – constrain the root to always test for food (ifroot)
  – constrain if-food-ahead to always move first if there is food ahead (if0m), while disallowing testing for food again if there is no food ahead (if1!if)
• This combination performs best even though its individual components were not the best
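One way to make this constraint set concrete is a per-argument allowed-label table plus a recursive check. This is a sketch under our own encoding: the table keys, `satisfies_cjm`, and the tuple representation are illustrative, and reading !P2!P3 as mutual exclusion of progn2/progn3 is our interpretation of the slide.

```python
FUNCTIONS = {"if-food-ahead", "progn2", "progn3"}
TERMINALS = {"move", "left", "right"}
EVERYTHING = FUNCTIONS | TERMINALS

# (parent, argument index) -> labels allowed at that position;
# key (parent, None) applies to every argument of that parent.
ALLOWED = {
    ("if-food-ahead", 0): {"move"},                        # if0m
    ("if-food-ahead", 1): EVERYTHING - {"if-food-ahead"},  # if1!if
    ("progn2", None): EVERYTHING - {"progn3"},             # !P2!P3
    ("progn3", None): EVERYTHING - {"progn2"},
}

def ok(tree, parent=None, idx=None):
    """Check every edge of a (label, child, ...) tree against the table."""
    label, *kids = tree
    table = ALLOWED.get((parent, idx), ALLOWED.get((parent, None), EVERYTHING))
    if label not in table:
        return False
    return all(ok(k, label, i) for i, k in enumerate(kids))

def satisfies_cjm(tree):
    return tree[0] == "if-food-ahead" and ok(tree)  # ifroot + edge constraints
```

Under this encoding the best shortest solution shown later in the deck passes the check, which is consistent with the heuristics having been read off the best trees.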
Best Heuristics by Inspection: Quality (vs. components)
Figure 13. Learning with CJM heuristic.
(Plot: average fitness over generations 0–50; series include CJM, !P2!P3, base, if1!if.)
Best Heuristics by Inspection: Efficiency (vs. components)
Figure 14. Efficiency with CJM heuristic.
(Plot: average complexity over generations 0–50; series include CJM, !P2!P3, base, if1!if.)
Best Heuristics Summary: Quality
Figure 15. Learning summaries.
(Plot: average fitness over generations 0–50 for CJM, !R!P3ifrootif0m, ifrootif0m, !P3, Base.)
Best Heuristics Summary: Efficiency
Figure 16. Efficiency summaries.
(Plot: average complexity over generations 0–50 for CJM, !R!P3ifrootif0m, ifrootif0m, !P3, Base.)
Best Shortest Solution
(if-food-ahead move
  (progn3 right
          (if-food-ahead move
                         (progn3 left left (if-food-ahead move right)))
          move))
Testing Slightly Different Trails: Same Basic Primitives
Figure 17. Learning for slightly different trails.
(Plot: average fitness over generations 0–50 for CJM, !R!P3ifrootif0m, ifrootif0m, !P3, Base.)
Testing Different Trails: Similar Basic Primitives
Figure 18. Learning on substantially different trails.
(Plot: average fitness over generations 0–50 for CJM, !R!P3ifrootif0m, ifrootif0m, !P3, Base.)
Learning Probabilistic Heuristics with ACGP
Figure 19. Learning curve in ACGP (off-line mode).
(Plot: average fitness over generations 0–500 for the ACGP learning curve vs. base.)
Comparing Probabilistic vs. Strong Heuristics
Figure 20. Comparing our heuristics against ACGP’s.
(Plot: average fitness over generations 0–50 for CJM, !R!P3ifrootif0m, ifrootif0m, !P3, Base, ACGP.)
Magnification of Top Three
(Plot: magnified view of the top three curves, fitness 87–89 over generations 0–50.)
Summary 1
• Heuristics improve GP search
  • The learning curve improves
  • Learning complexity improves
  • Timing improves because of low overhead
• Complex heuristics may be better even if their components are not very good
• Good components do not guarantee a better combination
Summary 2
• Probabilistic heuristics can easily outperform strong heuristics
  • But may be less comprehensible if information is sought
• Heuristics are specific to a problem
  • They help on similar problems
  • More specific heuristics generalize less
• Conversely, learning heuristics may tell us about domain knowledge