Towards Billion-Bit Optimization via Parallel Estimation of Distribution Algorithms
DESCRIPTION
This paper presents a highly efficient, fully parallelized implementation of the compact genetic algorithm to solve very-large-scale problems with millions to billions of variables. The paper presents principled results demonstrating the scalable solution of a difficult test function on instances of over a billion variables using a parallel implementation of the compact genetic algorithm (cGA). The problem addressed is a noisy, blind problem over a vector of binary decision variables. Noise equaling up to a tenth of the deterministic objective-function variance is added, making it difficult for simple hillclimbers to find the optimal solution. The compact GA, on the other hand, is able to find the optimum in the presence of noise quickly, reliably, and accurately, and the solution scalability follows known convergence theories. These results on the noisy problem, together with other results on problems involving varying modularity, hierarchy, and overlap, foreshadow the routine solution of billion-variable problems across the landscape of search problems.
TRANSCRIPT
Towards Billion Bit Optimization via Efficient Estimation of Distribution Algorithms
Kumara Sastry¹,², David E. Goldberg¹, Xavier Llorà¹,³
¹Illinois Genetic Algorithms Laboratory (IlliGAL), ²Materials Computation Center (MCC),
³National Center for Supercomputing Applications (NCSA)
University of Illinois at Urbana-Champaign, Urbana, IL 61801
[email protected], [email protected], [email protected]
http://www.illigal.uiuc.edu
Supported by AFOSR FA9550-06-1-0096 and NSF DMR 03-25939. Computational results were obtained using CSE’s Turing cluster.
Billion-Bit Optimization?
Strides w/ genetic algorithm (GA) theory/practice.
Solving large, hard problems in a principled way.
Moving to practice in important problem domains.
Still GA boobirds claim: (1) no theory, (2) too slow, and (3) just voodoo.
How to demonstrate the results achieved so far in a dramatic way?
DEG lunch questions: A million? Sure. A billion? Maybe.
Naïve GA approach/implementation goes nowhere (arithmetic sketched below):
~100 terabytes of memory for population storage.
~2^72 random number calls.
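To ground the memory estimate (illustrative arithmetic, not from the slide): population-sizing theory suggests on the order of n ≈ √l · ln l ≈ 6.5 × 10^5 individuals at l = 10^9 bits, so storing n × l ≈ 6.5 × 10^14 population bits takes roughly 8 × 10^13 bytes, the ~100 terabytes quoted above.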
Roadmap
Motivation
Robust, scalable, and efficient GA designs
Toward billion-variable optimization
Theory Keys
Implementation Keys
Efficiency Keys
Results
Why does this matter in practice?
Challenges to using this in the real world.
Summary and Conclusions
Three Os and Million/Billion Decisions
The Os all have many decisions to make:
Nano, Bio, and Info
Modern systems increasingly complex:
~10^5 parts in a modern automobile.
~10^7 parts in a commercial jetliner.
Increased complexity increases appetite for large optimization.
Will be driven toward routine million/billion variable problems.
“We get the warhead and then hold the world ransom for... 1 MILLION dollars!”
Competent and Efficient GAs
Robust, scalable, and efficient GA designs available.
Competence: Solve hard problems quickly, reliably, and accurately (Intractable to tractable).
Efficiency: Develop speedup procedures (tractability to practicality).
Principled design: [Goldberg, 2002]
Relax rigor, emphasize scalability/quality.
Use problem decomposition.
Use facetwise models, and patchquilt integration using dimensional analysis.
Test algorithms on adversarial problems.
Aiming for a Billion
Theory & algorithms in place. Focus on key theory, implementation, & efficiency enhancements.
Theory keys:
Problem difficulty.
Parallelism.
Implementation key: compact GA.
Efficiency keys:
Various speedups.
Memory savings.
Results on a billion-variable noisy OneMax.
Theory Key 1: Master-Slave Linear Speedup
Speed-up: S(n_p) = n·T_f / (n·T_f/n_p + n_p·T_c), where T_f is the time per fitness evaluation and T_c the per-processor communication time.
Max speed-up at n_p* = √(n·T_f / T_c).
Near linear speed-up until n_p approaches n_p* (worked example below).
[Cantú-Paz & Goldberg, 1997; Cantú-Paz, 2000]
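As an illustrative reading of the model (our numbers, not the paper's): with n·T_f/T_c = 10^4, the optimum is n_p* = √(10^4) = 100 processors; there the two denominator terms balance at 100·T_c each, giving a maximum speedup of 10^4/200 = 50, i.e., n_p*/2.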
Theory Key 2: Noise Covers Most Problems
Adversarial problem design [Goldberg, 2002]
Blind noisy OneMax
[Figure: the space of problem difficulty spanned by deception, noise, and scaling, with fluctuation; the blind noisy OneMax sits on the noise axis.]
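To make the test function concrete, here is a minimal noisy OneMax evaluator in Python (an illustrative sketch; the function name and the noise parameterization, a tenth of the deterministic OneMax variance l/4 as described in the abstract, are ours, not the paper's):

    import random

    def noisy_onemax(bits, noise_to_fitness=0.1):
        # OneMax counts the ones; the deterministic fitness variance over
        # random strings is l/4, so the noise variance is a fraction of
        # that, mirroring the "up to a tenth" setting described above.
        l = len(bits)
        sigma_n = (noise_to_fitness * l / 4.0) ** 0.5
        return sum(bits) + random.gauss(0.0, sigma_n)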
Implementation Key: Compact GA
Simplest probabilistic model building GA [Harik, Lobo & Goldberg, 1997; Baluja, 1994; Mühlenbein & Paaß, 1996]
Represent population by a probability vector p, where p_i is the probability that the i-th bit is 1.
Replace recombination with probabilistic sampling
Selectionist scheme
New population evolves through probability updates.
Equivalent to GA with steady-state tournament selection and uniform crossover
Compact Genetic Algorithm (cGA)
Random initialization: Set probabilities to 0.5
Model Sampling: Generate two candidate solutions by sampling the probability vector
Evaluation: Evaluate the fitness of two sampled solutions
Selection: Select the best among the sampled solutions
Probabilistic model update: Increase the proportion of winning alleles by 1/n. (These five steps are sketched in Python below.)
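A minimal sequential sketch of the algorithm (ours, for illustration; the paper's actual implementation is the parallel, vectorized version described next):

    import random

    def cga(l, n, fitness, max_steps=1_000_000):
        # Step 1: random initialization -- all probabilities start at 0.5.
        p = [0.5] * l
        for _ in range(max_steps):
            # Step 2: model sampling -- two candidates from the vector.
            a = [random.random() < pi for pi in p]
            b = [random.random() < pi for pi in p]
            # Steps 3-4: evaluate both and select the better one.
            winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
            # Step 5: shift probabilities toward the winner by 1/n.
            for i in range(l):
                if winner[i] != loser[i]:
                    p[i] += 1.0 / n if winner[i] else -1.0 / n
                    p[i] = min(1.0, max(0.0, p[i]))
            if all(pi <= 0.0 or pi >= 1.0 for pi in p):  # model converged
                break
        return p

Pairing it with the noisy OneMax evaluator sketched earlier, cga(l=1000, n=256, fitness=noisy_onemax) runs the whole loop; n is the simulated population size that sets the 1/n update step.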
Parallel cGA Architecture
Each of the n_p processors owns a slice of the probability vector (an mpi4py sketch of one iteration follows):
Processor #1: sample bits 1 through l/n_p.
Processor #2: sample bits l/n_p + 1 through 2l/n_p.
...
Processor #n_p: sample bits (n_p − 1)·l/n_p + 1 through l.
Collect partial sampled solutions and combine.
Parallel fitness evaluation of sampled solutions.
Broadcast fitness values of sampled solutions.
Each processor then selects the best individual and updates its own slice of the probabilities.
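A sketch of one such iteration using mpi4py (our assumption for illustration; the paper's code is its own message-passing implementation, and for simplicity this sketch evaluates the fitness on one rank and broadcasts it rather than splitting the evaluation itself):

    from mpi4py import MPI
    import random

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    def parallel_cga_step(p_slice, n, fitness):
        # Each processor samples its own slice of bits for two candidates.
        a = [random.random() < pi for pi in p_slice]
        b = [random.random() < pi for pi in p_slice]
        # Collect the partial sampled solutions and combine them.
        full_a = sum(comm.allgather(a), [])
        full_b = sum(comm.allgather(b), [])
        # Evaluate on one rank and broadcast the fitness values, so every
        # rank agrees on the winner even when the fitness is noisy.
        fits = (fitness(full_a), fitness(full_b)) if rank == 0 else None
        fa, fb = comm.bcast(fits, root=0)
        # Select the best and update only the local probability slice.
        winner, loser = (a, b) if fa >= fb else (b, a)
        for i in range(len(p_slice)):
            if winner[i] != loser[i]:
                delta = 1.0 / n if winner[i] else -1.0 / n
                p_slice[i] = min(1.0, max(0.0, p_slice[i] + delta))
        return p_slice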
cGA is Memory Efficient: O(l) vs. O(l^1.5)
Orders of magnitude memory savings via the efficient GA.
Example: ~32 MB per processor on a modest 128 processors for billion-bit optimization.
Simple GA: must store a population of n = O(l^0.5) individuals (up to log factors) of l bits each, i.e., O(l^1.5) memory.
Compact GA: stores only the probability vector, O(l) memory (worked arithmetic below).
Frequencies instead of probabilities (4 bytes each).
Parallelization reduces memory per processor by a factor of n_p.
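To make the example concrete (illustrative arithmetic consistent with the figures above): a 2^30-entry frequency vector at 4 bytes per entry occupies 4 GiB in total; split across n_p = 128 processors, that is 4 GiB / 128 = 32 MiB per processor, matching the ~32 MB quoted above.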
Vectorization Yields Speedup of 4
SIMD instruction set allows vector operations on 128-bit registers.
Equivalent to 4 processors per processor.
Vectorize costly code segments with AltiVec/SSE2
Generate 4 random numbers at a time.
Sample 4 bits at a time.
Update 4 probabilities at a time. (A NumPy stand-in is sketched below.)
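As a stand-in for those AltiVec/SSE2 kernels, the same batched pattern can be sketched with NumPy (illustrative only; the names are ours, not the paper's SSE2 code):

    import numpy as np

    rng = np.random.default_rng()

    def sample(p):
        # One bulk random draw per bit instead of a scalar loop.
        return rng.random(p.size) < p

    def update(p, winner, loser, n):
        # Shift probabilities toward the winner wherever the two
        # candidates differ, for all positions at once.
        mask = winner != loser
        delta = np.where(winner[mask], 1.0 / n, -1.0 / n)
        p[mask] = np.clip(p[mask] + delta, 0.0, 1.0)
        return p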
Other Efficiencies Yield Speedup of 15
Bitwise operations
Limited floating-point operations
Inline functions
Avoid using mod and division operations
Precomputing bit sums and indexing
Parallel, vectorized, and efficient GA: memory scales as Θ(l/n_p); speedup scales as ~60·n_p.
~32 MB memory and ~10^4 speedup with 128 processors.
Solves 65,536-bit noisy OneMax problem in ~45 minutes on a 3GHz PC.
Experimental Procedure
128–256-processor partition of the 1280-processor Apple G5 Xserve cluster.
Population was doubled until the cGA converged with at least l − 1 of the l bits set to their optimal values.
For l > 2^23, the population size was fixed according to theory.
Number of independent runs:
l ≤ 2^18 (262,144): 50
2^18 < l ≤ 2^25 (33,554,432): 10
l > 2^25: 1
Compare cGA performance with (a minimal hillclimber sketch follows):
Sequential hillclimber (sHC)
Random hillclimber (rHC)
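One plausible reading of the random hillclimber (rHC) baseline, as a minimal Python sketch (ours, not the paper's code); note that under noise it happily accepts flips whose noisy fitness merely looks better, which is why such climbers struggle here:

    import random

    def random_hillclimber(l, fitness, max_evals):
        # Start from a random string and flip one random bit at a time,
        # keeping the flip whenever the (noisy) fitness does not drop.
        x = [random.random() < 0.5 for _ in range(l)]
        best = fitness(x)
        for _ in range(max_evals):
            i = random.randrange(l)
            x[i] = not x[i]
            f = fitness(x)
            if f >= best:
                best = f            # accept the move
            else:
                x[i] = not x[i]     # undo the flip
        return x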
Compact GA Population Sizing
Additive Gaussian noise with variance σ_N².
Population sizing [Harik et al., 1997]:
    n = −2^(k−1) · ln(α) · (σ_BB · √(πm) / d) · √(1 + σ_N²/σ_f²)
where:
α = error tolerance
d/σ_BB = signal-to-noise ratio
2^(k−1) = # competing sub-components
m = # components (# BBs)
σ_N²/σ_f² = noise-to-fitness variance ratio
Population sizing scales as O(l^0.5 log l) (worked note below).
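For the noisy OneMax runs here (an illustrative reading of the model, consistent with the slide): k = 1 and m = l, and holding the per-bit error tolerance to α ∝ 1/l contributes the log l factor, giving the stated n = O(l^0.5 log l).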
Compact GA Convergence Time
Convergence time [Miller & Goldberg, 1995; Goldberg, 2002; Sastry & Goldberg, 2002]:
    t_c = (π/(2I)) · √(m·k) · √(1 + σ_N²/σ_f²)
where I is the selection intensity and m·k the problem size.
Convergence time scales as O(m^0.5).
GA scales as O(m log m) (see the combined note below).
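Combining the two keys (an illustrative step, consistent with the scalability slide that follows): n = O(l^0.5 log l) population times t_c = O(l^0.5) convergence generations, with each √(1 + σ_N²/σ_f²) noise correction attached, yields the Θ(l · log l · (1 + σ_N²/σ_f²)) evaluation scaling shown next.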
Scalability on OneMax
GA scales as Θ(l · log l · (1 + σ_N²/σ_f²)).
EDA Solves Billion-Bit Noisy OneMax
Solved 33-million-bit (2^25) problem to optimality.
Solved 1.1-billion-bit (2^30) problem with relaxed, but guaranteed, convergence.
Do Problems Like This Matter?
Yes, for three reasons:
Many GAs are no more sophisticated than the cGA.
Inclusion of noise was important because it covers a key facet of difficulty.
We know how to handle deception and other problems through EDAs like hBOA.
Compact GA-like algorithms can solve tough problems:
Materials science [Sastry et al., 2004; Sastry et al., 2005]*
Chemistry [Sastry et al., 2006]**
Complex versions of these kinds of problems need million/billion-bit optimization.
*Chosen by the AIP editors as a focused article of frontier research in the Virtual Journal of Nanoscale Science & Technology, 12(9), 2005. **Best Paper and Silver “Humies” Award, GECCO 2006.
Challenges to Routine Billion-Bit Optimization
What if you have a large nonlinear solver (PDE, ODE, FEM, KMC, MD, whatever)?
Need efficiency enhancement:
Parallelization: effective use of computational “space”.
Time continuation: effective use of computational “time”.
Hybridization: effective use of global and local searchers.
Evaluation relaxation: effective use of expensive-accurate & cheap-inaccurate evaluations.
Need more powerful solvers
Need highly efficient implementations
Summary and Conclusions
Parallel and efficient implementation of the compact GA.
Memory and computational efficiency enhancements.
Solved 33-million-bit noisy OneMax problem to optimality.
Solved 1.1-billion-bit noisy OneMax problem to relaxed, but guaranteed, convergence.
Big optimization is a frontier today:
Take extant cluster computing.
Mix in robustness, scalability, and efficiency lessons.
Integrate into problems.
Nano, bio, and info systems are increasingly complex:
Call for routine mega/giga-variable optimization.
Need robust, scalable, and efficient methods.