TRANSCRIPT
CGP Visits the Santa Fe Trail – Effects of Heuristics on GP
Cezary Z. Janikow
Christopher J. Mann
UMSL
Roadmap
• GP
• GP Search Space
• Local heuristics
• CGP
• Heuristics in Santa Fe Trail
  • Function/Terminal set
  • Structural
  • Combination
  • Generality
  • Probabilistic heuristics
• Summary
GP Search Space
• Representation space mapped onto solution space
• Best mappings
  • One-to-one, onto
• Real life
  • Large function/terminal set
  • Redundancy
  • Many-to-one
• Can domain-specific knowledge improve GP performance?
• Can we learn some domain-specific knowledge from GP?
GP Search Space
• 2-D space
  – Tree structures
    • constrained by size limits and function arity
  – Tree instances of specific structures
    • constrained by domain sizes
(Diagram: representation space of uninstantiated structures under a size limit, with instances such as the tree for (x + sin(a)) / 2.)
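The structure dimension of this space can be made concrete by counting: a minimal sketch, assuming an illustrative arity set {0, 1, 2} (not taken from the slides), of how many distinct uninstantiated tree shapes use exactly n nodes. The counts grow quickly with the size limit, which is what makes pruning attractive.

```python
from functools import lru_cache

ARITIES = [0, 1, 2]  # e.g. terminals, a sin-like unary, a +-like binary (assumed)

@lru_cache(maxsize=None)
def shapes(nodes):
    """Count tree shapes built with exactly `nodes` nodes over the given arities."""
    if nodes <= 0:
        return 0
    total = 0
    for arity in ARITIES:
        if arity == 0:
            total += 1 if nodes == 1 else 0  # a lone terminal
        elif arity == 1:
            total += shapes(nodes - 1)       # one subtree under a unary node
        else:  # arity 2: distribute the remaining nodes over two subtrees
            total += sum(shapes(k) * shapes(nodes - 1 - k)
                         for k in range(1, nodes - 1))
    return total
```

With these arities the sequence for n = 1, 2, 3, ... is 1, 1, 2, 4, 9, ... (the Motzkin numbers), already super-exponential in the size limit.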
Pruning/Constraining GP Search Space
• Tree structures
  • Hard to accomplish directly w/o instantiations
  • Indirect, by adjusting possible instantiations
• Tree instances
  • Strong constraints
    • prohibit some instantiations (labelings)
    • Structure-preserving crossover, STGP, CGP, CFG-GP
  • Weak probabilistic constraints
    • favor some instantiations over others
    • CGP, Probabilistic Tree Grammars
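The weak-constraint idea can be sketched as non-uniform sampling of child labels during tree generation or mutation. This is a minimal sketch; the weight table, label names, and `pick_child_label` are our illustrative assumptions, not CGP's actual data structures.

```python
import random

# Hypothetical first-order weights: for each parent label, the relative
# probability of sampling each child label. Uniform weights recover plain GP.
LABELS = ["move", "left", "right", "if-food-ahead", "progn2", "progn3"]
WEIGHTS = {
    # e.g. under if-food-ahead, favor "move" and forbid a nested test:
    "if-food-ahead": [0.6, 0.1, 0.1, 0.0, 0.1, 0.1],
}
UNIFORM = [1.0] * len(LABELS)

def pick_child_label(parent, rng=random):
    """Sample a child label under the weak (probabilistic) constraints."""
    return rng.choices(LABELS, weights=WEIGHTS.get(parent, UNIFORM), k=1)[0]
```

A zero weight acts like a strong constraint; intermediate weights merely bias the search, which is the distinction the slides draw between the two kinds of constraints.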
GP Design
• GP only explores a well-defined subspace of the potential search space
• Later generations search smaller subspaces
• Initial choice of the root node has significant impact on search and final solution
  – Called the GP Design
  – Daida, Langdon, Hall and Soule
• Heuristics can alter the design and redirect later generations toward specific subspaces
• Conversely, observing the designs tells us about problem-specific heuristics – ACGP
CGP
• Principles
• What heuristics/constraints can be processed
CGP Principles
• Strong input constraints
  – Prune the search space in such a way that valid parent(s) guarantee valid offspring
  – Start with valid initialization
• Weak probabilistic constraints
  – Adjust probabilities of specific mutations/crossovers
  – Only local heuristics
• Both with minimal linear overhead
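A minimal sketch of how strong first-order constraints can be represented and checked (the table entries, tuple encoding, and function names here are illustrative assumptions, not CGP internals). If initialization, mutation, and crossover only ever create edges the table allows, then valid parents guarantee valid offspring, which is the closure property the slide describes.

```python
FUNCTIONS = {"if-food-ahead", "progn2", "progn3"}
TERMINALS = {"move", "left", "right"}
EVERYTHING = FUNCTIONS | TERMINALS

# Strong first-order constraints as a prune table: FORBIDDEN[p] lists labels
# that may not appear directly under parent label p (entries are illustrative).
FORBIDDEN = {
    "progn2": {"progn2"},  # e.g. !P2-style: no progn2 directly under progn2
    "progn3": {"progn3"},  # e.g. !P3-style: no progn3 directly under progn3
}

def allowed_children(parent):
    return EVERYTHING - FORBIDDEN.get(parent, set())

def tree_valid(tree):
    """A tree (label, child, child, ...) is valid iff every edge obeys the table."""
    label, *kids = tree
    return all(k[0] in allowed_children(label) and tree_valid(k) for k in kids)
```

Because the check is per-edge, enforcing it during every operator application costs only linear overhead in tree size, matching the slide's claim.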
GP with Strong and Weak Constraints
(Diagram: population Pi → reproduction → mutation/crossover → Pi+1, sampling from a pruned, non-uniform distribution; cf. Probabilistic Grammars, CGP, EDA.)
CGP Means of Processing
• Strong constraints
  – Explicit structures and by data typing
  – Overloaded functions on types
• Weak constraints
CGP Means of Processing
• Explicit labeling constraints
  – First-order only
    • Parent-child
    • Can be with probability
• Data typing constraints
  – Propagated through overloaded functions
    • This links first-order information
(Diagram: first-order parent-child links between function nodes.)
CGP Mutation
(Diagram: CGP mutation on an example tree, transforming (x + sin(a)) / 2 into a tree over /, +, *, c, x, 2, 3.)
GP Crossover
(Diagram: crossover between example parents (x + sin(a)) / 2 and a tree over +, y, 2, 4, swapping subtrees to produce two offspring.)
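The swap illustrated above can be sketched as standard GP subtree crossover over tuple-encoded trees. This is a minimal sketch; the tuple encoding and helper names are ours, not the paper's representation.

```python
import random

def subtrees(tree, path=()):
    """Enumerate (path, subtree) pairs; a tree is (label, child, child, ...)."""
    yield path, tree
    for i, child in enumerate(tree[1:]):
        yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    kids = list(tree[1:])
    kids[path[0]] = replace(kids[path[0]], path[1:], new)
    return (tree[0], *kids)

def crossover(p1, p2, rng=random):
    """Swap a randomly chosen subtree of p1 with one of p2."""
    path1, sub1 = rng.choice(list(subtrees(p1)))
    path2, sub2 = rng.choice(list(subtrees(p2)))
    return replace(p1, path1, sub2), replace(p2, path2, sub1)
```

CGP's strong variant differs only in restricting which crossover points may be paired, so that both offspring remain valid.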
Santa Fe Experiments
• Problem
• Function set
• Heuristics exploration
• Generality of the heuristics
• Comparing vs. ACGP's probabilistic heuristics (on performance)
Santa Fe Problem
• 32x32 grid
• Food trail, 144 cells long, with 21 turns and 89 pieces of food
• Start in the northwest corner of the grid, facing east
• Fitness is the number of food pieces consumed in up to 400 moves
Santa Fe Functions/Terminals
• Terminals
  – turn left, turn right, move actions
• Functions
  – if-food-ahead
    • tests the position directly ahead for food; if true, performs the first action, otherwise the second
  – progn2, progn3
    • take two and three arguments, respectively, and execute them sequentially
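These primitives can be sketched as operations on an ant state. This is a minimal illustration; the class layout, coordinate convention, and the toroidal wrap-around are our own assumptions, not taken from the slides.

```python
DIRS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # east, south, west, north

class Ant:
    """Sketch of the Santa Fe ant: position, heading, remaining food."""
    def __init__(self, food, size=32):
        self.pos, self.heading = (0, 0), 0   # NW corner, facing east
        self.food = set(food)                # remaining food cells (row, col)
        self.size = size
        self.eaten = self.moves = 0

    def left(self):
        self.heading = (self.heading - 1) % 4; self.moves += 1

    def right(self):
        self.heading = (self.heading + 1) % 4; self.moves += 1

    def ahead(self):
        dr, dc = DIRS[self.heading]
        return ((self.pos[0] + dr) % self.size, (self.pos[1] + dc) % self.size)

    def move(self):
        self.pos = self.ahead(); self.moves += 1
        if self.pos in self.food:
            self.food.remove(self.pos); self.eaten += 1

    def food_ahead(self):
        return self.ahead() in self.food
```

`if-food-ahead` then branches on `food_ahead()`, and `progn2`/`progn3` simply call their argument actions in order; fitness is `eaten` after at most 400 `moves`.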
Experimental Methodology
• Analyze and propose heuristics
  – Reducing the function set
  – Constraining root and local structures
  – Combining the above
• Assess heuristics using 10 independent runs
  – Learning curves – average of best
  – Efficiency – average tree size in populations
Reducing Function Set: Basics, Quality
Reducing Function Set: Basics, Efficiency
Figure 2. Efficiency for reduced sets (individually).
(Plot: average complexity over generations 0–50 for !P3, Base, !P2, !R, !L.)
Reducing Function Set: Combined, Quality
Figure 3. Learning for reduced sets (combined).
(Plot: average fitness over generations 0–50 for !P3, !R!P3, Base, !R.)
Reducing Function Set: Combined, Efficiency
Figure 4. Efficiency for reduced sets (combined).
(Plot: average complexity over generations 0–50 for !P3, !R!P3, Base, !R.)
Constraining Root and Local Structure: Basics, Quality
Figure 5. Learning for basic structural heuristics.
(Plot: average fitness over generations 0–50 for ifrootif0m, if0m, Base, ifroot.)
Constraining Root and Local Structure: Basics, Efficiency
Figure 6. Efficiency for basic structural heuristics.
(Plot: average complexity over generations 0–50 for ifrootif0m, if0m, Base, ifroot.)
Constraining Root and Local Structure: Combined, Quality
Figure 7. Learning for combined structural heuristics.
(Plot: average fitness over generations 0–50 for ifrootif0m, ifrootif0mif1p, Base, if1p.)
Constraining Root and Local Structure: Combined, Efficiency
Figure 8. Efficiency for combined structural heuristics.
(Plot: average complexity over generations 0–50 for ifrootif0m, ifrootif0mif1p, Base, if1p.)
Combined Function Set and Structural Heuristics: Quality
Figure 9. Learning for combinations of heuristics.
(Plot: average fitness over generations 0–50 for !P3ifrootif0m, !P3if0m, !P3, if0m, Base.)
Combined Function Set and Structural Heuristics: Efficiency
Figure 10. Efficiency for combinations of heuristics.
(Plot: average complexity over generations 0–50 for !P3ifrootif0m, !P3if0m, !P3, if0m, Base.)
More Combined Heuristics: Quality
Figure 11. Learning for more combined heuristics.
(Plot: average fitness over generations 0–50 for !R!P3ifrootif0m, !P3ifrootif0m, ifrootif0m, !P3, !R!P3, Base.)
More Combined Heuristics: Efficiency
Figure 12. Efficiency for more combined heuristics.
(Plot: average complexity over generations 0–50 for !R!P3ifrootif0m, !P3ifrootif0m, ifrootif0m, !P3, !R!P3, Base.)
Best Heuristics by Inspection
• Analyze the best trees
  – constrain progn2 and progn3 so that neither can call the other (!P2!P3)
  – constrain the root to always test for food (ifroot)
  – constrain if-food-ahead to always move first if there is food ahead (if0m), while disallowing testing for food again if there is no food ahead (if1!if)
• This combination performs best even though its individual components were not the best
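One way to make this constraint set concrete is a per-argument allowed-label table plus a recursive check. This is a sketch under our own encoding: the table keys, `satisfies_cjm`, and the tuple representation are illustrative, and reading !P2!P3 as mutual exclusion of progn2/progn3 is our interpretation of the slide.

```python
FUNCTIONS = {"if-food-ahead", "progn2", "progn3"}
TERMINALS = {"move", "left", "right"}
EVERYTHING = FUNCTIONS | TERMINALS

# (parent, argument index) -> labels allowed at that position;
# key (parent, None) applies to every argument of that parent.
ALLOWED = {
    ("if-food-ahead", 0): {"move"},                        # if0m
    ("if-food-ahead", 1): EVERYTHING - {"if-food-ahead"},  # if1!if
    ("progn2", None): EVERYTHING - {"progn3"},             # !P2!P3
    ("progn3", None): EVERYTHING - {"progn2"},
}

def ok(tree, parent=None, idx=None):
    """Check every edge of a (label, child, ...) tree against the table."""
    label, *kids = tree
    table = ALLOWED.get((parent, idx), ALLOWED.get((parent, None), EVERYTHING))
    if label not in table:
        return False
    return all(ok(k, label, i) for i, k in enumerate(kids))

def satisfies_cjm(tree):
    return tree[0] == "if-food-ahead" and ok(tree)  # ifroot + edge constraints
```

Under this encoding the best shortest solution shown later in the deck passes the check, which is consistent with the heuristics having been read off the best trees.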
Best Heuristics by Inspection: Quality (vs. components)
Figure 13. Learning with CJM heuristic.
(Plot: average fitness over generations 0–50; series include CJM, !P2!P3, base, if1!if.)
Best Heuristics by Inspection: Efficiency (vs. components)
Figure 14. Efficiency with CJM heuristic.
(Plot: average complexity over generations 0–50; series include CJM, !P2!P3, base, if1!if.)
Best Heuristics Summary: Quality
Figure 15. Learning summaries.
(Plot: average fitness over generations 0–50 for CJM, !R!P3ifrootif0m, ifrootif0m, !P3, Base.)
Best Heuristics Summary: Efficiency
Figure 16. Efficiency summaries.
(Plot: average complexity over generations 0–50 for CJM, !R!P3ifrootif0m, ifrootif0m, !P3, Base.)
Best Shortest Solution
(if-food-ahead move
  (progn3 right
          (if-food-ahead move
                         (progn3 left left (if-food-ahead move right)))
          move))
Testing Slightly Different Trails: Same Basic Primitives
Figure 17. Learning for slightly different trails.
(Plot: average fitness over generations 0–50 for CJM, !R!P3ifrootif0m, ifrootif0m, !P3, Base.)
Testing Different Trails: Similar Basic Primitives
Figure 18. Learning on substantially different trails.
(Plot: average fitness over generations 0–50 for CJM, !R!P3ifrootif0m, ifrootif0m, !P3, Base.)
Learning Probabilistic Heuristics with ACGP
Figure 19. Learning curve in ACGP (off-line mode).
(Plot: average fitness over generations 0–500 for the ACGP learning curve vs. base.)
Comparing Probabilistic vs. Strong Heuristics
Figure 20. Comparing our heuristics against ACGP’s.
(Plot: average fitness over generations 0–50 for CJM, !R!P3ifrootif0m, ifrootif0m, !P3, Base, ACGP.)
Magnification of Top Three
(Plot: magnified view of the top three curves, fitness 87–89 over generations 0–50.)
Summary 1
• Heuristics improve GP search
  • The learning curve improves
  • Learning complexity improves
  • Timing improves because of low overhead
• Complex heuristics may be better even if their components are not very good
• Good components do not guarantee a better combination
Summary 2
• Probabilistic heuristics can easily outperform strong heuristics
  • But may be less comprehensible if information is sought
• Heuristics are specific to a problem
  • They help on similar problems
  • More specific heuristics generalize less
• Conversely, learning heuristics may tell us about domain knowledge