Towards Billion-Bit Optimization via Parallel Estimation of Distribution Algorithms
DESCRIPTION
This paper presents a highly efficient, fully parallelized implementation of the compact genetic algorithm to solve very-large-scale problems with millions to billions of variables. The paper presents principled results demonstrating the scalable solution of a difficult test function on instances of over a billion variables using a parallel implementation of the compact genetic algorithm (cGA). The problem addressed is a noisy, blind problem over a vector of binary decision variables. Noise equaling up to a tenth of the deterministic objective-function variance is added, making it difficult for simple hillclimbers to find the optimal solution. The compact GA, on the other hand, is able to find the optimum in the presence of noise quickly, reliably, and accurately, and the solution scalability follows known convergence theories. These results on the noisy problem, together with other results on problems involving varying modularity, hierarchy, and overlap, foreshadow the routine solution of billion-variable problems across the landscape of search problems.
TRANSCRIPT
Towards Billion Bit Optimization via Efficient Estimation of Distribution Algorithms
Kumara Sastry¹,², David E. Goldberg¹, Xavier Llorà¹,³
¹Illinois Genetic Algorithms Laboratory (IlliGAL), ²Materials Computation Center (MCC),
³National Center for Supercomputing Applications (NCSA)
University of Illinois at Urbana-Champaign, Urbana, IL 61801
[email protected], [email protected], [email protected]
http://www.illigal.uiuc.edu
Supported by AFOSR FA9550-06-1-0096 and NSF DMR 03-25939. Computational results were obtained using CSE’s Turing cluster.
Billion-Bit Optimization?
Strides w/ genetic algorithm (GA) theory/practice.
Solving large, hard problems in a principled way.
Moving to practice in important problem domains.
Still GA boobirds claim: (1) no theory, (2) too slow, and (3) just voodoo.
How to demonstrate the results achieved so far in a dramatic way?
DEG lunch questions: A million? Sure. A billion? Maybe.
Naïve GA approach/implementation goes nowhere (arithmetic sketched below):
~100 terabytes of memory for population storage.
~2^72 random number calls.
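To ground the memory estimate (illustrative arithmetic, not from the slide): population-sizing theory suggests on the order of n ≈ √l · ln l ≈ 6.5 × 10^5 individuals at l = 10^9 bits, so storing n × l ≈ 6.5 × 10^14 population bits takes roughly 8 × 10^13 bytes, the ~100 terabytes quoted above.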
Roadmap
Motivation
Robust, scalable, and efficient GA designs
Toward billion-variable optimization
Theory Keys
Implementation Keys
Efficiency Keys
Results
Why does this matter in practice?
Challenges to using this in the real world.
Summary and Conclusions
Three Os and Million/Billion Decisions
The Os all have many decisions to make:
Nano, Bio, and Info
Modern systems increasingly complex:
~10^5 parts in a modern automobile.
~10^7 parts in a commercial jetliner.
Increased complexity increases appetite for large optimization.
Will be driven toward routine million/billion variable problems.
“We get the warhead and then hold the world ransom for... 1 MILLION dollars!”
Competent and Efficient GAs
Robust, scalable, and efficient GA designs available.
Competence: Solve hard problems quickly, reliably, and accurately (Intractable to tractable).
Efficiency: Develop speedup procedures (tractability to practicality).
Principled design: [Goldberg, 2002]
Relax rigor, emphasize scalability/quality.
Use problem decomposition.
Use facetwise models, and patchquilt integration using dimensional analysis.
Test algorithms on adversarial problems.
Aiming for a Billion
Theory & algorithms in place. Focus on key theory, implementation, & efficiency enhancements.
Theory keys:
Problem difficulty.
Parallelism.
Implementation key: compact GA.
Efficiency keys:
Various speedups.
Memory savings.
Results on a billion-variable noisy OneMax.
Theory Key 1: Master-Slave Linear Speedup
Speed-up: S(n_p) = n·T_f / (n·T_f/n_p + n_p·T_c), where T_f is the time per fitness evaluation and T_c the per-processor communication time.
Max speed-up at n_p* = √(n·T_f / T_c).
Near linear speed-up until n_p approaches n_p* (worked example below).
[Cantú-Paz & Goldberg, 1997; Cantú-Paz, 2000]
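As an illustrative reading of the model (our numbers, not the paper's): with n·T_f/T_c = 10^4, the optimum is n_p* = √(10^4) = 100 processors; there the two denominator terms balance at 100·T_c each, giving a maximum speedup of 10^4/200 = 50, i.e., n_p*/2.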
Theory Key 2: Noise Covers Most Problems
Adversarial problem design [Goldberg, 2002]
Blind noisy OneMax
[Figure: the space of problem difficulty spanned by deception, noise, and scaling, with fluctuation; the blind noisy OneMax sits on the noise axis.]
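To make the test function concrete, here is a minimal noisy OneMax evaluator in Python (an illustrative sketch; the function name and the noise parameterization, a tenth of the deterministic OneMax variance l/4 as described in the abstract, are ours, not the paper's):

    import random

    def noisy_onemax(bits, noise_to_fitness=0.1):
        # OneMax counts the ones; the deterministic fitness variance over
        # random strings is l/4, so the noise variance is a fraction of
        # that, mirroring the "up to a tenth" setting described above.
        l = len(bits)
        sigma_n = (noise_to_fitness * l / 4.0) ** 0.5
        return sum(bits) + random.gauss(0.0, sigma_n)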
Implementation Key: Compact GA
Simplest probabilistic model building GA [Harik, Lobo & Goldberg, 1997; Baluja, 1994; Mühlenbein & Paaß, 1996]
Represent population by a probability vector p, where p_i is the probability that the i-th bit is 1.
Replace recombination with probabilistic sampling
Selectionist scheme
New population evolves through probability updates.
Equivalent to GA with steady-state tournament selection and uniform crossover
Compact Genetic Algorithm (cGA)
Random initialization: Set probabilities to 0.5
Model Sampling: Generate two candidate solutions by sampling the probability vector
Evaluation: Evaluate the fitness of two sampled solutions
Selection: Select the best among the sampled solutions
Probabilistic model update: Increase the proportion of winning alleles by 1/n. (These five steps are sketched in Python below.)
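A minimal sequential sketch of the algorithm (ours, for illustration; the paper's actual implementation is the parallel, vectorized version described next):

    import random

    def cga(l, n, fitness, max_steps=1_000_000):
        # Step 1: random initialization -- all probabilities start at 0.5.
        p = [0.5] * l
        for _ in range(max_steps):
            # Step 2: model sampling -- two candidates from the vector.
            a = [random.random() < pi for pi in p]
            b = [random.random() < pi for pi in p]
            # Steps 3-4: evaluate both and select the better one.
            winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
            # Step 5: shift probabilities toward the winner by 1/n.
            for i in range(l):
                if winner[i] != loser[i]:
                    p[i] += 1.0 / n if winner[i] else -1.0 / n
                    p[i] = min(1.0, max(0.0, p[i]))
            if all(pi <= 0.0 or pi >= 1.0 for pi in p):  # model converged
                break
        return p

Pairing it with the noisy OneMax evaluator sketched earlier, cga(l=1000, n=256, fitness=noisy_onemax) runs the whole loop; n is the simulated population size that sets the 1/n update step.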
Parallel cGA Architecture
Each of the n_p processors owns a slice of the probability vector (an mpi4py sketch of one iteration follows):
Processor #1: sample bits 1 through l/n_p.
Processor #2: sample bits l/n_p + 1 through 2l/n_p.
...
Processor #n_p: sample bits (n_p − 1)·l/n_p + 1 through l.
Collect partial sampled solutions and combine.
Parallel fitness evaluation of sampled solutions.
Broadcast fitness values of sampled solutions.
Each processor then selects the best individual and updates its own slice of the probabilities.
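A sketch of one such iteration using mpi4py (our assumption for illustration; the paper's code is its own message-passing implementation, and for simplicity this sketch evaluates the fitness on one rank and broadcasts it rather than splitting the evaluation itself):

    from mpi4py import MPI
    import random

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    def parallel_cga_step(p_slice, n, fitness):
        # Each processor samples its own slice of bits for two candidates.
        a = [random.random() < pi for pi in p_slice]
        b = [random.random() < pi for pi in p_slice]
        # Collect the partial sampled solutions and combine them.
        full_a = sum(comm.allgather(a), [])
        full_b = sum(comm.allgather(b), [])
        # Evaluate on one rank and broadcast the fitness values, so every
        # rank agrees on the winner even when the fitness is noisy.
        fits = (fitness(full_a), fitness(full_b)) if rank == 0 else None
        fa, fb = comm.bcast(fits, root=0)
        # Select the best and update only the local probability slice.
        winner, loser = (a, b) if fa >= fb else (b, a)
        for i in range(len(p_slice)):
            if winner[i] != loser[i]:
                delta = 1.0 / n if winner[i] else -1.0 / n
                p_slice[i] = min(1.0, max(0.0, p_slice[i] + delta))
        return p_slice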
cGA is Memory Efficient: O(l) vs. O(l^1.5)
Orders of magnitude memory savings via the efficient GA.
Example: ~32 MB per processor on a modest 128 processors for billion-bit optimization.
Simple GA: must store a population of n = O(l^0.5) individuals (up to log factors) of l bits each, i.e., O(l^1.5) memory.
Compact GA: stores only the probability vector, O(l) memory (worked arithmetic below).
Frequencies instead of probabilities (4 bytes each).
Parallelization reduces memory per processor by a factor of n_p.
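To make the example concrete (illustrative arithmetic consistent with the figures above): a 2^30-entry frequency vector at 4 bytes per entry occupies 4 GiB in total; split across n_p = 128 processors, that is 4 GiB / 128 = 32 MiB per processor, matching the ~32 MB quoted above.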
Vectorization Yields Speedup of 4
SIMD instruction set allows vector operations on 128-bit registers.
Equivalent to 4 processors per processor.
Vectorize costly code segments with AltiVec/SSE2
Generate 4 random numbers at a time.
Sample 4 bits at a time.
Update 4 probabilities at a time. (A NumPy stand-in is sketched below.)
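As a stand-in for those AltiVec/SSE2 kernels, the same batched pattern can be sketched with NumPy (illustrative only; the names are ours, not the paper's SSE2 code):

    import numpy as np

    rng = np.random.default_rng()

    def sample(p):
        # One bulk random draw per bit instead of a scalar loop.
        return rng.random(p.size) < p

    def update(p, winner, loser, n):
        # Shift probabilities toward the winner wherever the two
        # candidates differ, for all positions at once.
        mask = winner != loser
        delta = np.where(winner[mask], 1.0 / n, -1.0 / n)
        p[mask] = np.clip(p[mask] + delta, 0.0, 1.0)
        return p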
Other Efficiencies Yield Speedup of 15
Bitwise operations
Limited floating-point operations
Inline functions
Avoid using mod and division operations
Precomputing bit sums and indexing
Parallel, vectorized, and efficient GA: memory scales as Θ(l/n_p); speedup scales as ~60·n_p.
~32 MB memory and ~10^4 speedup with 128 processors.
Solves 65,536-bit noisy OneMax problem in ~45 minutes on a 3GHz PC.
Experimental Procedure
128–256-processor partition of the 1280-processor Apple G5 Xserve cluster.
Population was doubled until the cGA converged with at least l − 1 of the l bits set to their optimal values.
For l > 2^23, the population size was fixed according to theory.
Number of independent runs:
l ≤ 2^18 (262,144): 50
2^18 < l ≤ 2^25 (33,554,432): 10
l > 2^25: 1
Compare cGA performance with (a minimal hillclimber sketch follows):
Sequential hillclimber (sHC)
Random hillclimber (rHC)
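One plausible reading of the random hillclimber (rHC) baseline, as a minimal Python sketch (ours, not the paper's code); note that under noise it happily accepts flips whose noisy fitness merely looks better, which is why such climbers struggle here:

    import random

    def random_hillclimber(l, fitness, max_evals):
        # Start from a random string and flip one random bit at a time,
        # keeping the flip whenever the (noisy) fitness does not drop.
        x = [random.random() < 0.5 for _ in range(l)]
        best = fitness(x)
        for _ in range(max_evals):
            i = random.randrange(l)
            x[i] = not x[i]
            f = fitness(x)
            if f >= best:
                best = f            # accept the move
            else:
                x[i] = not x[i]     # undo the flip
        return x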
Compact GA Population Sizing
Additive Gaussian noise with variance σ_N².
Population sizing [Harik et al., 1997]:
    n = −2^(k−1) · ln(α) · (σ_BB · √(πm) / d) · √(1 + σ_N²/σ_f²)
where:
α = error tolerance
d/σ_BB = signal-to-noise ratio
2^(k−1) = # competing sub-components
m = # components (# BBs)
σ_N²/σ_f² = noise-to-fitness variance ratio
Population sizing scales as O(l^0.5 log l) (worked note below).
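For the noisy OneMax runs here (an illustrative reading of the model, consistent with the slide): k = 1 and m = l, and holding the per-bit error tolerance to α ∝ 1/l contributes the log l factor, giving the stated n = O(l^0.5 log l).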
Compact GA Convergence Time
Convergence time [Miller & Goldberg, 1995; Goldberg, 2002; Sastry & Goldberg, 2002]:
    t_c = (π/(2I)) · √(m·k) · √(1 + σ_N²/σ_f²)
where I is the selection intensity and m·k the problem size.
Convergence time scales as O(m^0.5).
GA scales as O(m log m) (see the combined note below).
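Combining the two keys (an illustrative step, consistent with the scalability slide that follows): n = O(l^0.5 log l) population times t_c = O(l^0.5) convergence generations, with each √(1 + σ_N²/σ_f²) noise correction attached, yields the Θ(l · log l · (1 + σ_N²/σ_f²)) evaluation scaling shown next.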
Scalability on OneMax
GA scales as Θ(l · log l · (1 + σ_N²/σ_f²)).
EDA Solves Billion-Bit Noisy OneMax
Solved 33-million-bit (2^25) problem to optimality.
Solved 1.1-billion-bit (2^30) problem with relaxed, but guaranteed, convergence.
Do Problems Like This Matter?
Yes, for three reasons:
Many GAs are no more sophisticated than the cGA.
Inclusion of noise was important because it covers a key facet of difficulty.
We know how to handle deception and other problems through EDAs like hBOA.
Compact GA-like algorithms can solve tough problems:
Materials science [Sastry et al., 2004; Sastry et al., 2005]*
Chemistry [Sastry et al., 2006]**
Complex versions of these kinds of problems need million/billion-bit optimization.
*Chosen by the AIP editors as a focused article of frontier research in the Virtual Journal of Nanoscale Science & Technology, 12(9), 2005. **Best Paper and Silver “Humies” Award, GECCO 2006.
Challenges to Routine Billion-Bit Optimization
What if you have a large nonlinear solver (PDE, ODE, FEM, KMC, MD, whatever)?
Need efficiency enhancement:
Parallelization: effective use of computational “space”.
Time continuation: effective use of computational “time”.
Hybridization: effective use of global and local searchers.
Evaluation relaxation: effective use of expensive-accurate & cheap-inaccurate evaluations.
Need more powerful solvers
Need highly efficient implementations
Summary and Conclusions
Parallel and efficient implementation of the compact GA.
Memory and computational efficiency enhancements.
Solved 33-million-bit noisy OneMax problem to optimality.
Solved 1.1-billion-bit noisy OneMax problem to relaxed, but guaranteed, convergence.
Big optimization is a frontier today:
Take extant cluster computing.
Mix in robustness, scalability, and efficiency lessons.
Integrate into problems.
Nano, bio, and info systems are increasingly complex:
Call for routine mega/giga-variable optimization.
Need robust, scalable, and efficient methods.