How to update the different parameters m, σ, C
TRANSCRIPT
![Page 1: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/1.jpg)
How to update the different parameters m, σ, C?
1. Adapting the mean m
2. Adapting the step-size σ
3. Adapting the covariance matrix C
![Page 2: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/2.jpg)
Why Step-size Adaptation?
Assume a (1+1)-ES algorithm with fixed step-size σ (and C = Id) optimizing the function f(x) = ∑_{i=1}^n x_i² = ∥x∥².

Initialize m, σ
While (stopping criterion not met):
    sample new solution: x ← m + σ𝒩(0, Id)
    if f(x) ≤ f(m): m ← x
What will happen if you look at the convergence
of f(m)?
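To see what this question is pointing at, here is a minimal sketch of the loop above in Python (function and parameter names are my own, not from the slides): with a constant σ, f(m) decreases quickly at first and the decrease eventually becomes negligible.

```python
import numpy as np

def one_plus_one_es_fixed(n=10, sigma=1e-3, iters=2000, seed=0):
    """(1+1)-ES with fixed step-size sigma on f(x) = ||x||^2."""
    rng = np.random.default_rng(seed)
    m = rng.standard_normal(n)                   # initialize the mean
    f = lambda x: float(x @ x)                   # sphere function
    history = []
    for _ in range(iters):
        x = m + sigma * rng.standard_normal(n)   # x <- m + sigma * N(0, Id)
        if f(x) <= f(m):                         # "+" (elitist) selection
            m = x
        history.append(f(m))
    return np.array(history)

hist = one_plus_one_es_fixed()
```

By construction f(m) is non-increasing, but with a fixed σ the progress per iteration eventually stalls once ∥m∥ becomes comparable to σ; this is the stagnation discussed on the next slides.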
![Page 3: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/3.jpg)
Why Step-size Adaptation?
red curve: (1+1)-ES with optimal step-size (see later)
green curve: (1+1)-ES with constant step-size (σ = 10⁻³)
[figure: convergence plot of f(m); phase III marked]
![Page 4: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/4.jpg)
What is going on for the (1+1)-ES with fixed step-size?
[handwritten sketch of the run]
Early on: σ ≪ ∥mt∥. Phase 3: mt is close to the optimum and σ ≫ ∥mt∥, so the probability to sample a better point is small.
![Page 5: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/5.jpg)
Does the (1+1)-ES with fixed step-size converge? I.e., does mt → 0?
Answer after the next slide.
![Page 6: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/6.jpg)
If σ ≫ ∥mt∥, then P(∥mt + σ𝒩(0, Id)∥ < ∥mt∥) is small, i.e. the candidate solution is rarely better.
⇒ with large probability mt+1 = mt (sampled solution not accepted)
↳ stagnation in phase III
[handwritten sketch illustrating P(∥mt + σ𝒩(0, Id)∥ < ∥mt∥)]
![Page 7: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/7.jpg)
red curve: (1+1)-ES with optimal step-size (see later)
green curve: (1+1)-ES with constant step-size (σ = 10⁻³)
Why Step-size Adaptation?
We need step-size adaptation to approach
the optimum fast (converge linearly)
![Page 8: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/8.jpg)
Methods for Step-size Adaptation
- 1/5th success rule, typically applied with “+” selection [Rechenberg, 73] [Schumer and Steiglitz, 78] [Devroye, 72]
- σ-self-adaptation, applied with “,” selection: random variation is applied to the step-size and the better one, according to the objective function value, is selected [Schwefel, 81]
- path-length control or cumulative step-size adaptation (CSA), applied with “,” selection [Ostermeier et al. 94] [Hansen, Ostermeier, 2001]
- two-point adaptation (TPA), applied with “,” selection: test two solutions in the direction of the mean shift, and increase or decrease the step-size accordingly [Hansen 2008]
![Page 9: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/9.jpg)
Step-size control: 1/5th Success Rule
![Page 10: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/10.jpg)
Step-size control: 1/5th Success Rule
![Page 11: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/11.jpg)
Step-size control: 1/5th Success Rule
probability of success per iteration:
ps = #{candidate solutions better than m, i.e. f(x) ≤ f(m)} / #{candidate solutions}
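A sketch of a (1+1)-ES with a 1/5th success rule (the multiplicative constant below is one common choice, an assumption on my part, not prescribed by the slide): increase σ on success, decrease it on failure, with factors balanced so that σ is stationary exactly at a success probability of 1/5.

```python
import numpy as np

def one_plus_one_es_one_fifth(n=10, sigma=1.0, iters=4000, seed=1):
    """(1+1)-ES with 1/5th success rule on f(x) = ||x||^2."""
    rng = np.random.default_rng(seed)
    m = rng.standard_normal(n)
    f = lambda x: float(x @ x)
    c = 0.85 ** (1.0 / n)          # damping constant (an assumed choice)
    for _ in range(iters):
        x = m + sigma * rng.standard_normal(n)
        if f(x) <= f(m):
            m = x
            sigma /= c ** 4        # success: increase sigma
        else:
            sigma *= c             # failure: decrease sigma
    return m, sigma

m, sigma = one_plus_one_es_one_fifth()
```

At success probability ps = 1/5 the expected change of ln σ is (1/5)(−4 ln c) + (4/5) ln c = 0, so σ grows when ps > 1/5 and shrinks when ps < 1/5.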
![Page 12: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/12.jpg)
(1+1)-ES with One-fifth Success Rule - Convergence
![Page 13: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/13.jpg)
• What happens if we display f(mt) instead?
CONVERGENCE: −(1/t) ln ∥mt∥ → c > 0 almost surely
GLOBAL LINEAR CONVERGENCE, i.e. in log-scale we observe a line after an adaptation phase.
Proof: too involved to be presented here.
Also proven for g ∘ f, where f is strongly convex with Lipschitz gradient and g : Im(f) → ℝ is strictly increasing.
![Page 14: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/14.jpg)
Path Length Control - Cumulative Step-size Adaptation (CSA)
step-size adaptation used in the (μ/μw, λ)-ES algorithm framework (in CMA-ES in particular)
Main Idea:
![Page 15: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/15.jpg)
CSA-ES: The Equations
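The equations on this slide are only a thumbnail in the transcript; for reference, the standard CSA update (following Hansen & Ostermeier, 2001), with recombination weights w_i, μw = 1/∑ w_i², cumulation constant c_σ and damping d_σ, reads:

```latex
p_{\sigma,t+1} = (1 - c_\sigma)\, p_{\sigma,t}
  + \sqrt{c_\sigma (2 - c_\sigma)\, \mu_w}\; \frac{m_{t+1} - m_t}{\sigma_t},
\qquad
\sigma_{t+1} = \sigma_t \exp\!\left(\frac{c_\sigma}{d_\sigma}
  \left(\frac{\lVert p_{\sigma,t+1}\rVert}{\mathbb{E}\lVert \mathcal{N}(0, I)\rVert} - 1\right)\right).
```

Note that (m_{t+1} − m_t)/σ_t = ∑_{i=1}^μ w_i y_{i:λ}, which is the form used in the derivation below.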
![Page 16: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/16.jpg)
The update of the step-size in CSA-ES satisfies: if f(x) = rand (random objective function, i.e. f(x) ∼ 𝒩(0, 1) sampled independently at each time) and pσ,0 ∼ 𝒩(0, Id), then
E[log(σt+1) | σt, mt] = log(σt)
Proof: candidate solutions xi = mt + σt yi, where yi ∼ 𝒩(0, Id) i.i.d., i = 1, …, λ.
Define the indices i:λ such that f(x1:λ) ≤ … ≤ f(xλ:λ).
![Page 17: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/17.jpg)
and yi:λ such that xi:λ = mt + σt yi:λ.
Under random selection, the ranking and the samples {yi, i = 1, …, λ} are independent, so {yi:λ, i = 1, …, λ} are distributed as i.i.d. 𝒩(0, Id).
Consequently yw = ∑_{i=1}^μ wi yi:λ ∼ (1/√μw) 𝒩(0, Id), where μw = 1/∑_{i=1}^μ wi².
Therefore pσ,t+1 = (1 − cσ) pσ,t + √(cσ(2 − cσ) μw) ∑_{i=1}^μ wi yi:λ ∼ 𝒩(0, Id) if pσ,t ∼ 𝒩(0, Id).
Hence if f = rand and pσ,0 ∼ 𝒩(0, Id), then pσ,t ∼ 𝒩(0, Id) for all t.
![Page 18: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/18.jpg)
Since σt+1 = σt exp((cσ/dσ)(∥pσ,t+1∥ / E∥𝒩(0, Id)∥ − 1)),
log(σt+1) = log(σt) + (cσ/dσ)(∥pσ,t+1∥ / E∥𝒩(0, Id)∥ − 1)
E[log(σt+1) | σt, mt] = log(σt) + (cσ/dσ)(E∥pσ,t+1∥ / E∥𝒩(0, Id)∥ − 1) = log(σt),
since E∥pσ,t+1∥ = E∥𝒩(0, Id)∥ (as pσ,t+1 ∼ 𝒩(0, Id)).
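This statement can be checked numerically. The sketch below (parameter values cσ, dσ and the weights are illustrative choices, not the CMA-ES defaults) simulates only the (pσ, σ) update under random selection, i.e. the recombined step is ∑ wi yi with i.i.d. yi ∼ 𝒩(0, Id), and verifies that pσ stays 𝒩(0, Id)-distributed and that log σ has no drift.

```python
import math
import numpy as np

def csa_random_selection(n=10, mu=5, steps=200, reps=2000, seed=2):
    """CSA path/step-size update under random selection, reps chains in parallel."""
    rng = np.random.default_rng(seed)
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                                   # positive recombination weights
    mu_w = 1.0 / np.sum(w ** 2)
    c_sigma, d_sigma = 0.3, 1.0                    # illustrative constants
    chi_n = math.sqrt(2) * math.gamma((n + 1) / 2) / math.gamma(n / 2)  # E||N(0,I)||
    p = rng.standard_normal((reps, n))             # p_sigma,0 ~ N(0, Id)
    log_sigma = np.zeros(reps)                     # log(sigma_t / sigma_0)
    for _ in range(steps):
        y = rng.standard_normal((reps, mu, n))     # random selection: plain i.i.d. steps
        y_w = np.einsum('i,rij->rj', w, y)         # y_w ~ N(0, Id) / sqrt(mu_w)
        p = (1 - c_sigma) * p + math.sqrt(c_sigma * (2 - c_sigma) * mu_w) * y_w
        log_sigma += (c_sigma / d_sigma) * (np.linalg.norm(p, axis=1) / chi_n - 1)
    return p, log_sigma

p, log_sigma = csa_random_selection()
```

Across chains, log σ averages to approximately 0 and the components of pσ have unit variance, matching E[log(σt+1) | σt, mt] = log(σt).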
![Page 19: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/19.jpg)
Convergence of (μ/μw, λ)-CSA-ES (2×11 runs)
![Page 20: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/20.jpg)
Convergence of (μ/μw, λ)-CSA-ES
Note: initial step-size taken too small (σ0 = 10⁻²) to illustrate the step-size adaptation
![Page 21: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/21.jpg)
Convergence of (μ/μw, λ)-CSA-ES
![Page 22: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/22.jpg)
Optimal Step-size - Lower-bound for Convergence Rates
In the previous slides we have displayed some runs with “optimal” step-size.
Optimal step-size relates to step-size proportional to the distance to the optimum, σt = σ∥x − x⋆∥, where x⋆ is the optimum of the optimized function (with σ properly chosen).
The associated algorithm is not a real algorithm (as it needs to know the distance to the optimum), but it gives bounds on convergence rates and allows one to compute many important quantities.
The goal for a step-size adaptive algorithm is to achieve convergence rates close to those obtained with the optimal step-size.
![Page 23: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/23.jpg)
We will formalize this in the context of the (1+1)-ES. Similar results can be obtained for other algorithm frameworks.
![Page 24: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/24.jpg)
Optimal Step-size - Bound on Convergence Rate - (1+1)-ES
Consider a (1+1)-ES algorithm with any step-size adaptation mechanism:
Xt+1 = Xt + σt 𝒩t+1 if f(Xt + σt 𝒩t+1) ≤ f(Xt), and Xt+1 = Xt otherwise,
with {𝒩t, t ≥ 1} i.i.d. ∼ 𝒩(0, Id).
Equivalent writing: Xt+1 = Xt + σt 𝒩t+1 1{f(Xt + σt 𝒩t+1) ≤ f(Xt)}
![Page 25: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/25.jpg)
Bound on Convergence Rate - (1+1)-ES
Theorem: For any objective function f : ℝⁿ → ℝ and any y⋆ ∈ ℝⁿ,
E[ln ∥Xt+1 − y⋆∥] ≥ E[ln ∥Xt − y⋆∥] − τ
where τ = max_σ φ(σ) with φ(σ) := −E[ln min{1, ∥e1 + σ𝒩(0, Id)∥}] and e1 = (1, 0, …, 0).
Theorem: The convergence rate lower-bound is reached on spherical functions f(x) = g(∥x − x⋆∥) (with g : ℝ≥0 → ℝ strictly increasing) and step-size proportional to the distance to the optimum, σt = σopt∥x − x⋆∥, with σopt such that φ(σopt) = τ.
[plot: φ(σ), with its maximum τ attained at σopt]
![Page 26: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/26.jpg)
Proof (take y⋆ = 0 w.l.o.g.):
Xt+1 = Xt + σt 𝒩t+1 1{f(Xt + σt 𝒩t+1) ≤ f(Xt)}, with 𝒩t+1 ∼ 𝒩(0, Id).
Xt+1 is either equal to Xt or to Xt + σt 𝒩t+1.
Thus ∥Xt+1∥ ≥ min{∥Xt∥, ∥Xt + σt 𝒩t+1∥} = ∥Xt∥ min{1, ∥Xt/∥Xt∥ + (σt/∥Xt∥) 𝒩t+1∥}
and ln ∥Xt+1∥ ≥ ln ∥Xt∥ + ln min{1, ∥Xt/∥Xt∥ + (σt/∥Xt∥) 𝒩t+1∥}.
![Page 27: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/27.jpg)
Taking conditional expectations:
E[ln ∥Xt+1∥ | σt, Xt] ≥ ln ∥Xt∥ + E[ln min{1, ∥Xt/∥Xt∥ + (σt/∥Xt∥) 𝒩t+1∥} | σt, Xt]
Lemma (isotropy of 𝒩(0, Id)): E[ln min{1, ∥Xt/∥Xt∥ + (σt/∥Xt∥) 𝒩t+1∥} | σt, Xt] = −φ(σt/∥Xt∥), where φ(σ) = −E[ln min{1, ∥e1 + σ𝒩(0, Id)∥}].
Hence E[ln ∥Xt+1∥ | σt, Xt] ≥ ln ∥Xt∥ − φ(σt/∥Xt∥) ≥ ln ∥Xt∥ − max_σ φ(σ),
with φ(σopt) = max_σ φ(σ) for σopt ∈ argmax_σ φ(σ).
![Page 28: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/28.jpg)
[plot: progress rate φ(σ) as a function of σ, maximized at σopt]
−φ(σ) = E[ln min{1, ∥e1 + σ𝒩(0, Id)∥}] (progress rate of the (1+1)-ES)
ψ(σ) = E[ln ∥e1 + σ𝒩(0, Id)∥]
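The quantities above can be estimated by plain Monte Carlo. The sketch below (dimension, sample size and grid are my own choices) estimates φ(σ) = −E[ln min{1, ∥e1 + σ𝒩(0, Id)∥}] and locates its maximizer σopt, which for the sphere sits roughly at a normalized step-size nσ of order 1.

```python
import numpy as np

def phi(sigma, n=10, samples=100_000, seed=3):
    """Monte Carlo estimate of phi(sigma) = -E[ln min(1, ||e1 + sigma*N(0,Id)||)]."""
    rng = np.random.default_rng(seed)            # same seed: common random numbers
    x = sigma * rng.standard_normal((samples, n))
    x[:, 0] += 1.0                               # e1 + sigma * N(0, Id)
    log_norm = 0.5 * np.log(np.einsum('ij,ij->i', x, x))
    return float(-np.mean(np.minimum(log_norm, 0.0)))

sigmas = np.linspace(0.01, 0.5, 25)
values = [phi(s) for s in sigmas]
sigma_opt = float(sigmas[int(np.argmax(values))])  # estimate of the maximizer
```

Reusing the same seed for every σ (common random numbers) makes the estimated curve smooth in σ, so the argmax is stable despite Monte Carlo noise.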
![Page 29: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/29.jpg)
Log-Linear Convergence of the Scale-Invariant Step-size ES
Theorem: The (1+1)-ES with step-size proportional to the distance to the optimum, σt = σ∥Xt∥, converges (log-)linearly on the sphere function f(x) = g(∥x∥) almost surely:
(1/t) ln (∥Xt∥/∥X0∥) → −φ(σ) =: CR(1+1)(σ) as t → ∞
(Again, this is not a real algorithm, as it uses the distance to the optimum.)
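This theorem is easy to observe numerically. The sketch below (dimension, σ and run length are my own choices) runs the scale-invariant step-size (1+1)-ES on the sphere and fits the slope of t ↦ ln ∥Xt∥, which estimates CR(1+1)(σ) = −φ(σ):

```python
import numpy as np

def scale_invariant_es(n=10, sigma=0.12, iters=3000, seed=4):
    """(1+1)-ES with sigma_t = sigma * ||X_t|| on the sphere f(x) = ||x||^2."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    log_norms = []
    for _ in range(iters):
        cand = x + sigma * np.linalg.norm(x) * rng.standard_normal(n)
        if cand @ cand <= x @ x:                 # elitist "+" selection
            x = cand
        log_norms.append(np.log(np.linalg.norm(x)))
    return np.array(log_norms)

log_norms = scale_invariant_es()
slope = float(np.polyfit(np.arange(log_norms.size), log_norms, 1)[0])
```

In log scale the run is a straight line with negative slope ≈ CR(1+1)(σ), i.e. log-linear convergence after essentially no adaptation phase, since σt already tracks ∥Xt∥.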
![Page 30: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/30.jpg)
Asymptotic Results (n → ∞)
σt = (σ/n) ∥Xt∥
![Page 31: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/31.jpg)
Asymptotic Results (n → ∞)