using previous models to bias structural learning in the hierarchical boa

Motivation Outline hBOA Problems Biasing Experiments Conclusions

Using Previous Models to Bias StructuralLearning in the Hierarchical BOA

M. Hauschild1 M. Pelikan1 K. Sastry2 D.E. Goldberg2

1Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)Department of Mathematics and Computer Science

University of Missouri - St. Louis

2Illinois Genetic Algorithms Laboratory (ILLiGAL)Department of Industrial and Enterprise Systems Engineering

University of Illinois at Urbana-Champaign

Genetic and Evolutionary Computation Conference, 2008

M. Hauschild, M. Pelikan, K. Sastry and D. E. Goldberg Universities of Missouri - St. Louis and Illinois at Urbana-Champaign

Using Previous Models to Bias hBOA


Motivation

In optimization, always looking to solve harder problemshBOA can solve a broad class of problems robustly andfast

Scalability isn’t always enough

Much work has been done in speeding up hBOASporadic Model-BuildingParallelizationOthers




Motivation

Each run of hBOA leaves us with a tremendous amount ofinformation

The algorithm decomposes the problem for usThis information we get “for free”

Use this knowledge to bias our model-building on similarproblems

Exploit this free informationFocus our model-based search to promising regions




Outline

hBOA

Test Problems

Biasing Model-BuildingExperiments

Probability Coincidence Matrix (PCM)Distance-based bias

Conclusions




hierarchical Bayesian Optimization Algorithm (hBOA)

Pelikan, Goldberg, and Cantú-Paz; 2001Uses Bayesian network with local structures to modelsolutions

Acyclic directed GraphString positions are the nodesEdges represent conditional dependenciesWhere there is no edge, implicit independence

Niching to maintain diversity




hBOA

Two ComponentsStructure

Edges determine dependenciesMajority of time spent here

ParametersConditional probabilities depending on parentsExample - p(Accident|Wet Road, Speed)

Network built greedily, one edge at a time

Metric punishes complexity




Test Problems

Test ProblemsConcatenated Trap of Order 52D Ising Spin GlassMAXSAT

ExperimentsBiasing using PCMBiasing using Distance metric




Trap-5

Partition binary string into disjoint groups of 5 bits

trap5(ones) =

{

5 if ones = 54 − ones otherwise

, (1)

Total fitness is sum of single traps

Optimum: String 1111...1




2D Ising Spin Glass

Origin in physics

Spins arranged on a 2D grid

Each spin sj can have two values: +1 or -1

Each connection i , j has a weight Jij . Set of weightsspecifies one instance.




2D Ising Spin Glass

Energy is given by...

E(C) =∑

〈i ,j〉

siJi ,jsj , (2)

Problem is to find the values of the spins so energy isminimizedVery hard for most optimization techniques

Extremely large number of local optimaDecomposition of bounded order is insufficientSolvable in polynomial time by analytical techniques

hBOA solves it in polynomial timeA deterministic hill-climber(DHC) is used to improve thequality of evaluated solutions




MAXSAT

Find the maximum satisfiable clauses in a propositionallogic formulaUsually expressed in k-CNF form

Logical and of clausesLogical or of literalsClauses of at most length k

Example of a 3-CNF with 4 propositions

(X4 ∨ X3) ∧ (X1 ∨ ¬X2) ∧ (¬X4 ∨ X2 ∨ X3) (3)




MAXSAT

Can be mapped to many other problems

NP-Complete when k > 1Not much structure in general case

Combined graph-coloringCombined depending on parameter pRegular ring lattice (1-p edges)Random graph (p edges)

Test under various amounts of structure




Biasing Model Building

Exploit information from previous runs to simplify MBTwo approaches

Bias MB using a Probability Coincidence Matrix (PCM)Directly gathered from previous runs

Bias MB using a distance thresholdPrior knowledge of the problem




Biasing using PCM

When hBOA solves a problem, left with a series of modelsCompute a PCM from these models

Pij is probability of an edge between i and jIf Pij=.25, 25% of models contain an edge between i and j

Do not consider when an edge occursSome runs more strongly represented

Good and bad

We can now use PCM to restrictPij < pmin




Biasing using PCM

Trap-5 10x10 Ising spin glass

node i

node

j

10 20 30 40 50

10

20

30

40

50 0

0.5

1

2 4 6 8 10 12

2468

1012

node i

node

j

20 40 60 80 100

20

40

60

80

100 0

0.2

0.4

0.6

2 4 6 8 10 12

2468

1012




Biasing using distance metric

On other problems, define a distance metricVector Q such that Qi is proportion of dependencies at idistance or shorterQmax = 1Corresponds to strength of dependency between variablesMAXSAT, MVC, IsingCan now use Q to restrict

Qd < qmin




Biasing Model Building

Benefits (of good bias)Model-building becomes faster

Less dependencies to examine

Increase effectiveness of searchLess spurious dependencies

Negatives (of poor bias)Slower convergenceLarger populationFailure to solve the problem




PCM on Ising Spin Glass

Need to learn PCM from sample

Show for some pmin, the effects

100 instances of 4 different sizesCross-validation

PCM built from 90 instances, used to solve remaining 10Repeated 10 times

Varied pmin from 0 to maximum value based oncomputational time





20x20

0 5 100

1

2

3

4

5

Minimum edge percentage allowed

Exe

cutio

n T

ime

Spe

edup

28x28

0 0.5 1 1.5 20

1

2

3

4

5


Exe

cutio

n T

ime

Spe

edup

24x24

0 1 2 30

1

2

3

4

5


Exe

cutio

n T

ime

Spe

edup

32x32

0 0.5 1 1.5 20

1

2

3

4

5


Exe

cutio

n T

ime

Spe

edup





20x20

0 5 100

5

10

15

20

25

30

35

Minimum edge percent allowed

Red

uctio

n F

acto

r

28x28

0 0.5 1 1.5 20

5

10

15

20

25

30

35


Red

uctio

n F

acto

r

24x24

0 1 2 30

5

10

15

20

25

30

35


Red

uctio

n F

acto

r

32x32

0 0.5 1 1.5 20

5

10

15

20

25

30

35


Red

uctio

n F

acto

r




Distance Bias on Ising Spin Glass

100 instances of 4 different sizesRestrict dependencies by maximum distance allowed

From 1 to half the maximumOmitted results for too severe restrictions





16x16

0.2 0.4 0.6 0.8 10

1

2

3

4

5

6

← 1

← 2 ← 3

← 4

← 5

← 6← 7

← 8

16 →

Original Ratio of Total Dependencies

Exe

cutio

n T

ime

Spe

edup

24x24

0.2 0.4 0.6 0.8 10

1

2

3

4

5

6

← 2

← 3

← 4← 5

← 6← 7

← 8

← 9← 10

← 11← 12

24 →


Exe

cutio

n T

ime

Spe

edup

20x20

0.2 0.4 0.6 0.8 10

1

2

3

4

5

6

← 1

← 2

← 3

← 4← 5

← 6

← 7← 8

← 9← 10

20 →


Exe

cutio

n T

ime

Spe

edup

28x28

0.2 0.4 0.6 0.8 10

1

2

3

4

5

6

← 2

← 3

← 4← 5← 6

← 7← 8

← 9← 10← 11

← 12← 13

← 14

28 →


Exe

cutio

n T

ime

Spe

edup





16x16

0.2 0.4 0.6 0.8 10

5

10

15

20

25

30

35

← 1

← 2

← 3

← 4

← 5← 6← 7← 8 ← 16


Red

uctio

n F

acto

r

24x24

0.2 0.4 0.6 0.8 10

5

10

15

20

25

30

35

← 2

← 3

← 4

← 5

← 6← 7

← 8← 9← 10← 11← 12 ← 24


Red

uctio

n F

acto

r

20x20

0.2 0.4 0.6 0.8 10

5

10

15

20

25

30

35

← 1

← 2

← 3

← 4

← 5← 6

← 7← 8← 9← 10 ← 20


Red

uctio

n F

acto

r

28x28

0.2 0.4 0.6 0.8 10

5

10

15

20

25

30

35

← 2

← 3

← 4

← 5

← 6

← 7← 8

← 9← 10← 11← 12← 13← 14← 28


Red

uctio

n F

acto

r




Distance Bias on MAXSAT

Must define a distance metricCreate a graph for each MAXSAT instance

Propositions in same clause are connectedDistance between any other propositions is shortest path inthis graph

Combined graph-coloring, 3 levels of structurep = 1, 2−4, 2−8

100 instances of each type

500 propositions, 3100 clauses

Distances from 1 to the maximum





p=1

0.98 0.99 10

0.5

1

1.5

2

4 →

5 →All →


Exe

cutio

n T

ime

Spe

edup

p=2−4

0.7 0.8 0.9 10

0.5

1

1.5

2

2.5

3

← 3

← 4

← 5

All →


Exe

cutio

n T

ime

Spe

edup

p=2−8

0.4 0.6 0.8 10

0.5

1

1.5

2

2.5

3

← 3← 4

← 5

← 6← 7

All →


Exe

cutio

n T

ime

Spe

edup





p=1

0.98 0.99 10

0.5

1

1.5

24 →

5 →All →


Red

uctio

n F

acto

r

p=2−4

0.7 0.8 0.9 10

2

4

6

8

10 ← 3

← 4

← 5

All →


Red

uctio

n F

acto

r

p=2−8

0.4 0.6 0.8 10

2

4

6

8

10

← 3

← 4

← 5

← 6← 7

All →


Red

uctio

n F

acto

r




Conclusions

We took some first steps in exploiting previous runs tospeed up new runsLeads to speedups from 4-5 on Ising spin glasses to1.5-2.5 on MAXSATCan be extended to many other problemsNext step is to try and automate processEfficiency enhancements work together

Parallelization 50Hybridization 2Learning from past runs 2Evaluation Relaxation 1.1Total 220




Any Questions?



using previous models to bias structural learning in the hierarchical boa

Technology

hierarchical boa