# Investigating the Impact of Adaptation Sampling in Natural Evolution Strategies on Black-box Optimization Testbeds


Tom Schaul
Courant Institute of Mathematical Sciences, New York University
715 Broadway, New York, USA
schaul@cims.nyu.edu

**ABSTRACT**

Natural Evolution Strategies (NES) are a recent member of the class of real-valued optimization algorithms that are based on adapting search distributions. Exponential NES (xNES) are the most common instantiation of NES, and particularly appropriate for the BBOB 2012 benchmarks, given that many are non-separable, and given their relatively small problem dimensions. The technique of adaptation sampling, which adapts learning rates online, further improves the algorithm's performance. This report provides an extensive empirical comparison to study the impact of adaptation sampling in xNES, both on the noise-free and noisy BBOB testbeds.

**Categories and Subject Descriptors**

G.1.6 [Numerical Analysis]: Optimization - global optimization, unconstrained optimization; F.2.1 [Analysis of Algorithms and Problem Complexity]: Numerical Algorithms and Problems

**General Terms**

Algorithms

**Keywords**

Evolution Strategies, Natural Gradient, Benchmarking

## 1. INTRODUCTION

Evolution strategies (ES), in contrast to traditional evolutionary algorithms, aim at repeating the type of mutation that led to good individuals. We can characterize those mutations by an explicitly parameterized search distribution from which new candidate samples are drawn, akin to estimation of distribution algorithms (EDA). Covariance matrix adaptation ES (CMA-ES [10]) innovated the field by introducing a parameterization that includes the full covariance matrix, allowing them to solve highly non-separable problems.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
GECCO'12 Companion, July 7-11, 2012, Philadelphia, PA, USA.
Copyright 2012 ACM 978-1-4503-1178-6/12/07 ...$10.00.

A more recent variant, natural evolution strategies (NES [20, 6, 18, 19]), aims at a higher level of generality, providing a procedure to update the search distribution's parameters for any type of distribution, by ascending the gradient towards higher expected fitness. Further, it has been shown [15, 12] that following the natural gradient to adapt the search distribution is highly beneficial, because it appropriately normalizes the update step with respect to its uncertainty and makes the algorithm scale-invariant.

Exponential NES (xNES), the most common instantiation of NES, uses a search distribution parameterized by a mean vector and a full covariance matrix, and is thus most similar to CMA-ES (in fact, the precise relation is described in [4] and [5]). Given the relatively small problem dimensions of the BBOB benchmarks, and the fact that many are non-separable, it is also among the most appropriate NES variants for the task.

Adaptation sampling (AS) is a technique for the online adaptation of the algorithm's learning rates, designed to speed up convergence. This may be beneficial to algorithms like xNES, because the optimization traverses qualitatively different phases, during which different learning rates may be optimal.

In this report, we retain the original formulation of xNES (including all parameter settings, except for an added stopping criterion), and compare this baseline algorithm with an AS-augmented variant. We describe the empirical performance on all 54 benchmark functions (both noise-free and noisy) of the BBOB 2012 workshop.

## 2. NATURAL EVOLUTION STRATEGIES

Natural evolution strategies (NES) maintain a search distribution and adapt the distribution parameters $\theta$ by following the natural gradient [1] of expected fitness $J$, that is, maximizing

$$J(\theta) = \mathbb{E}_\theta[f(z)] = \int f(z)\, \pi(z \mid \theta)\, dz$$

Just like their close relative CMA-ES [10], NES algorithms are invariant under monotone transformations of the fitness function and linear transformations of the search space. Each iteration the algorithm produces $n$ samples $z_i \sim \pi(z \mid \theta)$, $i \in \{1, \ldots, n\}$, i.i.d. from its search distribution, which is parameterized by $\theta$. The gradient w.r.t. the parameters $\theta$ can be rewritten (see [20]) as

$$\nabla_\theta J(\theta) = \nabla_\theta \int f(z)\, \pi(z \mid \theta)\, dz = \mathbb{E}_\theta\left[f(z)\, \nabla_\theta \log \pi(z \mid \theta)\right]$$


from which we obtain a Monte Carlo estimate

$$\nabla_\theta J(\theta) \approx \frac{1}{n} \sum_{i=1}^{n} f(z_i)\, \nabla_\theta \log \pi(z_i \mid \theta)$$

of the search gradient. The key step then consists in replacing this gradient by the natural gradient, defined as $\mathbf{F}^{-1} \nabla_\theta J(\theta)$, where

$$\mathbf{F} = \mathbb{E}\left[\nabla_\theta \log \pi(z \mid \theta)\, \nabla_\theta \log \pi(z \mid \theta)^\top\right]$$

is the Fisher information matrix. The search distribution is iteratively updated using natural gradient ascent

$$\theta \leftarrow \theta + \eta\, \mathbf{F}^{-1} \nabla_\theta J(\theta)$$

with learning rate parameter $\eta$.
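As a concrete illustration of these updates, the following is a minimal sketch (our own, not the paper's code) of the Monte Carlo gradient estimate and a natural-gradient ascent step, assuming a 1-D Gaussian search distribution; the toy fitness function and all variable names are illustrative:

```python
import numpy as np

# Sketch: Monte Carlo estimate of the search gradient for a 1-D
# Gaussian pi(z | mu, sigma), followed by a natural-gradient step.
rng = np.random.default_rng(0)

def f(z):
    return -(z - 3.0) ** 2  # toy fitness, maximized at z = 3

mu, sigma, eta, n = 0.0, 1.0, 0.05, 500

for _ in range(300):
    s = rng.standard_normal(n)      # s_i ~ N(0, 1)
    z = mu + sigma * s              # z_i ~ pi(z | mu, sigma)
    fit = f(z)
    # score functions: d log pi / d mu        = s / sigma
    #                  d log pi / d log sigma = s^2 - 1
    g_mu = np.mean(fit * s / sigma)
    g_logsigma = np.mean(fit * (s ** 2 - 1.0))
    # Fisher information is 1/sigma^2 for mu and 2 for log sigma, so
    # the natural gradient rescales g_mu by sigma^2 and g_logsigma by 1/2
    mu += eta * sigma ** 2 * g_mu
    sigma *= np.exp(eta * g_logsigma / 2.0)
```

After a few hundred iterations the mean drifts to the optimum at 3 while the step size shrinks; the Fisher rescaling is what makes the update invariant to how the distribution is parameterized.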

### 2.1 Exponential NES

While the NES formulation is applicable to arbitrary parameterizable search distributions [20, 12], the most common variant employs multinormal search distributions. For that case, two helpful techniques were introduced in [6], namely an exponential parameterization of the covariance matrix, which guarantees positive-definiteness, and a novel method for changing the coordinate system into a natural one, which makes the algorithm computationally efficient. The resulting algorithm, NES with a multivariate Gaussian search distribution and using both these techniques, is called xNES, and the pseudocode is given in Algorithm 1.

**Algorithm 1: Exponential NES (xNES)**

```
input: f, mu_init, eta_sigma, eta_B, u_k
initialize  mu <- mu_init,  sigma <- 1,  B <- I
repeat
    for k = 1 ... n do
        draw sample s_k ~ N(0, I)
        z_k <- mu + sigma * B^T s_k
        evaluate the fitness f(z_k)
    end
    sort {(s_k, z_k)} with respect to f(z_k)
        and assign utilities u_k to each sample
    compute gradients:
        grad_delta J <- sum_k u_k * s_k
        grad_M J     <- sum_k u_k * (s_k s_k^T - I)
        grad_sigma J <- tr(grad_M J) / d
        grad_B J     <- grad_M J - grad_sigma J * I
    update parameters:
        mu    <- mu + eta_mu * sigma * B * grad_delta J
        sigma <- sigma * exp(eta_sigma / 2 * grad_sigma J)
        B     <- B * exp(eta_B / 2 * grad_B J)
until stopping criterion is met
```

### 2.2 Adaptation Sampling

First introduced in [12] (chapter 2, section 4.4), adaptation sampling is a new meta-learning technique [17] that can adapt hyper-parameters online, in an economical way that is grounded in a measure of statistical improvement.

Here, we apply it to the learning rate of the global step-size, $\eta_\sigma$. The idea is to consider whether a larger learning rate $\eta'_\sigma = \frac{3}{2}\eta_\sigma$ would have been more likely to generate the good samples in the current batch. For this we determine the (hypothetical) search distribution $\pi(\cdot \mid \theta')$ that would have resulted from such a larger update. Then we compute importance weights

$$w_k = \frac{\pi(z_k \mid \theta')}{\pi(z_k \mid \theta)}$$

for each of the $n$ samples $z_k$ in our current population, generated from the actual search distribution $\pi(\cdot \mid \theta)$. We then conduct a weighted Mann-Whitney test [12] (appendix A) to determine if the set $\{\mathrm{rank}(z_k)\}$ is inferior to its reweighted counterpart $\{w_k \cdot \mathrm{rank}(z_k)\}$ (corresponding to the larger learning rate), with statistical significance $\alpha$. If so, we increase the learning rate by a factor of $1 + c'$, up to at most $\eta_\sigma = 1$ (where $c' = 0.1$). Otherwise it decays to its initial value:

$$\eta_\sigma \leftarrow (1 - c')\, \eta_\sigma + c'\, \eta_{\sigma,\mathrm{init}}$$

The procedure is summarized in Algorithm 2 (for details and derivations, see [12]). The combination of xNES with adaptation sampling is dubbed xNES-as.

One interpretation of why adaptation sampling is helpful is that half-way into the search (after a local attractor has been found, e.g., towards the end of the valley on the Rosenbrock benchmarks f8 or f9), the convergence speed can be boosted by an increased learning rate. For such situations, an online adaptation of hyper-parameters is inherently well-suited.

**Algorithm 2: Adaptation sampling**

```
input : eta_sigma(t), eta_sigma(init), theta_t, theta_{t-1},
        {(z_k, f(z_k))}, c', alpha
output: eta_sigma(t+1)
compute hypothetical theta', given theta_{t-1} and using 3/2 * eta_sigma(t)
for k = 1 ... n do
    w_k = pi(z_k | theta') / pi(z_k | theta)
end
S  <- {rank(z_k)}
S' <- {w_k * rank(z_k)}
if weighted-Mann-Whitney(S, S') < alpha then
    return (1 - c') * eta_sigma(t) + c' * eta_sigma(init)
else
    return min((1 + c') * eta_sigma(t), 1)
end
```

## 3. EXPERIMENTAL SETTINGS

We use identical default hyper-parameter values for all benchmarks (both noisy and noise-free functions), which are taken from [6, 12]. Table 1 summarizes all the hyper-parameters used.

In addition, we make use of the provided target fitness $f_{opt}$ to trigger independent algorithm restarts¹, using a simple ad-hoc procedure: if the log-progress during the past $1000d$ evaluations is too small, i.e., if

$$\left|\log_{10}\frac{f_{opt} - f_t}{f_{opt} - f_{t-1000d}}\right| < \frac{(r+2)^2}{m^{3/2}}\left[\log_{10}|f_{opt} - f_t| + 8\right]$$

where $m$ is the remaining budget of evaluations divided by $1000d$, $f_t$ is the best fitness encountered up to evaluation $t$, and $r$ is the number of restarts so far. The total budget is $10^5 d^{3/2}$ evaluations.

¹ It turns out that this use of $f_{opt}$ is technically not permitted by the BBOB guidelines, so strictly speaking a different restart strategy should be employed, for example the one described in [12].

**Table 1:** Default parameter values for xNES (including the utility function and adaptation sampling) as a function of problem dimension $d$.

| parameter | default value |
| --- | --- |
| $n$ | $4 + \lfloor 3 \log(d) \rfloor$ |
| $\eta_\sigma = \eta_B$ | $\dfrac{3(3 + \log(d))}{5 d \sqrt{d}}$ |
| $u_k$ | $\dfrac{\max\left(0,\ \log(\frac{n}{2} + 1) - \log(k)\right)}{\sum_{j=1}^{n} \max\left(0,\ \log(\frac{n}{2} + 1) - \log(j)\right)} - \dfrac{1}{n}$ |
| $\eta_\mu$ | $1$ |
| $\alpha$ | $\dfrac{1}{2} - \dfrac{1}{3(d+1)}$ |
| $c'$ | $\dfrac{1}{10}$ |
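The dimension-dependent defaults of Table 1 can be computed with a small helper; this is a sketch, and the function name `xnes_defaults` and the dictionary layout are ours:

```python
import math

def xnes_defaults(d):
    """Default xNES hyper-parameters (Table 1) for problem dimension d."""
    n = 4 + int(3 * math.log(d))                            # population size
    eta_sigma = 3.0 * (3.0 + math.log(d)) / (5.0 * d * math.sqrt(d))
    raw = [max(0.0, math.log(n / 2.0 + 1.0) - math.log(k))  # rank utilities
           for k in range(1, n + 1)]
    u = [r / sum(raw) - 1.0 / n for r in raw]
    return {
        "n": n,
        "eta_mu": 1.0,
        "eta_sigma": eta_sigma,
        "eta_B": eta_sigma,
        "u": u,
        "alpha": 0.5 - 1.0 / (3.0 * (d + 1)),  # significance for AS test
        "c_prime": 0.1,
    }

# e.g. in dimension d = 10 the population size comes out to n = 10,
# and the shifted utilities sum to zero
params = xnes_defaults(10)
```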

Implementations of this and other NES algorithm variants are available in Python through the PyBrain machine learning library [16], as well as in other languages at www.idsia.ch/~tom/nes.html.

## 4. RESULTS

Results from experiments according to [7] on the benchmark functions given in [2, 8, 3, 9] are presented in Figures 1, 2 and 3, and in Tables 2, 3 and 4. The expected running time
