How to update the different parameters m, σ, C
TRANSCRIPT
![Page 1: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/1.jpg)
How to update the different parameters m, σ, C?
1. Adapting the mean m
2. Adapting the step-size σ
3. Adapting the covariance matrix C
![Page 2: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/2.jpg)
Why Step-size Adaptation?
Assume a (1+1)-ES algorithm with fixed step-size σ (and C = Id) optimizing the function f(x) = ∑_{i=1}^n x_i² = ∥x∥².

Initialize m, σ
While (stopping criterion not met):
    sample new solution: x ← m + σ𝒩(0, Id)
    if f(x) ≤ f(m): m ← x
What will happen if you look at the convergence
of f(m)?
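To see what this question is pointing at, here is a minimal sketch of the loop above in Python (function and parameter names are my own, not from the slides): with a constant σ, f(m) decreases quickly at first and the decrease eventually becomes negligible.

```python
import numpy as np

def one_plus_one_es_fixed(n=10, sigma=1e-3, iters=2000, seed=0):
    """(1+1)-ES with fixed step-size sigma on f(x) = ||x||^2."""
    rng = np.random.default_rng(seed)
    m = rng.standard_normal(n)                   # initialize the mean
    f = lambda x: float(x @ x)                   # sphere function
    history = []
    for _ in range(iters):
        x = m + sigma * rng.standard_normal(n)   # x <- m + sigma * N(0, Id)
        if f(x) <= f(m):                         # "+" (elitist) selection
            m = x
        history.append(f(m))
    return np.array(history)

hist = one_plus_one_es_fixed()
```

By construction f(m) is non-increasing, but with a fixed σ the progress per iteration eventually stalls once ∥m∥ becomes comparable to σ; this is the stagnation discussed on the next slides.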
![Page 3: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/3.jpg)
Why Step-size Adaptation?
red curve: (1+1)-ES with optimal step-size (see later)
green curve: (1+1)-ES with constant step-size (σ = 10⁻³)
[figure: convergence plot of f(m); phase III marked]
![Page 4: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/4.jpg)
What is going on for the (1+1)-ES with fixed step-size?
[handwritten sketch of the run]
Early on: σ ≪ ∥mt∥. Phase 3: mt is close to the optimum and σ ≫ ∥mt∥, so the probability to sample a better point is small.
![Page 5: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/5.jpg)
Does the (1+1)-ES with fixed step-size converge? I.e., does mt → 0?
Answer after the next slide.
![Page 6: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/6.jpg)
If σ ≫ ∥mt∥, then P(∥mt + σ𝒩(0, Id)∥ < ∥mt∥) is small, i.e. the candidate solution is rarely better.
⇒ with large probability mt+1 = mt (sampled solution not accepted)
↳ stagnation in phase III
[handwritten sketch illustrating P(∥mt + σ𝒩(0, Id)∥ < ∥mt∥)]
![Page 7: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/7.jpg)
red curve: (1+1)-ES with optimal step-size (see later)
green curve: (1+1)-ES with constant step-size (σ = 10⁻³)
Why Step-size Adaptation?
We need step-size adaptation to approach
the optimum fast (converge linearly)
![Page 8: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/8.jpg)
Methods for Step-size Adaptation
- 1/5th success rule, typically applied with “+” selection [Rechenberg, 73] [Schumer and Steiglitz, 78] [Devroye, 72]
- σ-self-adaptation, applied with “,” selection: random variation is applied to the step-size and the better one, according to the objective function value, is selected [Schwefel, 81]
- path-length control or cumulative step-size adaptation (CSA), applied with “,” selection [Ostermeier et al. 94] [Hansen, Ostermeier, 2001]
- two-point adaptation (TPA), applied with “,” selection: test two solutions in the direction of the mean shift, and increase or decrease the step-size accordingly [Hansen 2008]
![Page 9: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/9.jpg)
Step-size control: 1/5th Success Rule
![Page 10: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/10.jpg)
Step-size control: 1/5th Success Rule
![Page 11: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/11.jpg)
Step-size control: 1/5th Success Rule
probability of success per iteration:
ps = #{candidate solutions better than m, i.e. f(x) ≤ f(m)} / #{candidate solutions}
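A sketch of a (1+1)-ES with a 1/5th success rule (the multiplicative constant below is one common choice, an assumption on my part, not prescribed by the slide): increase σ on success, decrease it on failure, with factors balanced so that σ is stationary exactly at a success probability of 1/5.

```python
import numpy as np

def one_plus_one_es_one_fifth(n=10, sigma=1.0, iters=4000, seed=1):
    """(1+1)-ES with 1/5th success rule on f(x) = ||x||^2."""
    rng = np.random.default_rng(seed)
    m = rng.standard_normal(n)
    f = lambda x: float(x @ x)
    c = 0.85 ** (1.0 / n)          # damping constant (an assumed choice)
    for _ in range(iters):
        x = m + sigma * rng.standard_normal(n)
        if f(x) <= f(m):
            m = x
            sigma /= c ** 4        # success: increase sigma
        else:
            sigma *= c             # failure: decrease sigma
    return m, sigma

m, sigma = one_plus_one_es_one_fifth()
```

At success probability ps = 1/5 the expected change of ln σ is (1/5)(−4 ln c) + (4/5) ln c = 0, so σ grows when ps > 1/5 and shrinks when ps < 1/5.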
![Page 12: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/12.jpg)
(1+1)-ES with One-fifth Success Rule - Convergence
![Page 13: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/13.jpg)
• What happens if we display f(mt) instead?
CONVERGENCE: −(1/t) ln ∥mt∥ → c > 0 almost surely
GLOBAL LINEAR CONVERGENCE, i.e. in log-scale we observe a line after an adaptation phase.
Proof: too involved to be presented here.
Also proven for g ∘ f, where f is strongly convex with Lipschitz gradient and g : Im(f) → ℝ is strictly increasing.
![Page 14: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/14.jpg)
Path Length Control - Cumulative Step-size Adaptation (CSA)
step-size adaptation used in the (μ/μw, λ)-ES algorithm framework (in CMA-ES in particular)
Main Idea:
![Page 15: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/15.jpg)
CSA-ES: The Equations
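The equations on this slide are only a thumbnail in the transcript; for reference, the standard CSA update (following Hansen & Ostermeier, 2001), with recombination weights w_i, μw = 1/∑ w_i², cumulation constant c_σ and damping d_σ, reads:

```latex
p_{\sigma,t+1} = (1 - c_\sigma)\, p_{\sigma,t}
  + \sqrt{c_\sigma (2 - c_\sigma)\, \mu_w}\; \frac{m_{t+1} - m_t}{\sigma_t},
\qquad
\sigma_{t+1} = \sigma_t \exp\!\left(\frac{c_\sigma}{d_\sigma}
  \left(\frac{\lVert p_{\sigma,t+1}\rVert}{\mathbb{E}\lVert \mathcal{N}(0, I)\rVert} - 1\right)\right).
```

Note that (m_{t+1} − m_t)/σ_t = ∑_{i=1}^μ w_i y_{i:λ}, which is the form used in the derivation below.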
![Page 16: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/16.jpg)
The update of the step-size in CSA-ES satisfies: if f(x) = rand (random objective function, i.e. f(x) ∼ 𝒩(0, 1) sampled independently at each time) and pσ,0 ∼ 𝒩(0, Id), then
E[log(σt+1) | σt, mt] = log(σt)
Proof: candidate solutions xi = mt + σt yi, where yi ∼ 𝒩(0, Id) i.i.d., i = 1, …, λ.
Define the indices i:λ such that f(x1:λ) ≤ … ≤ f(xλ:λ).
![Page 17: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/17.jpg)
and yi:λ such that xi:λ = mt + σt yi:λ.
Under random selection, the ranking and the samples {yi, i = 1, …, λ} are independent, so {yi:λ, i = 1, …, λ} are distributed as i.i.d. 𝒩(0, Id).
Consequently yw = ∑_{i=1}^μ wi yi:λ ∼ (1/√μw) 𝒩(0, Id), where μw = 1/∑_{i=1}^μ wi².
Therefore pσ,t+1 = (1 − cσ) pσ,t + √(cσ(2 − cσ) μw) ∑_{i=1}^μ wi yi:λ ∼ 𝒩(0, Id) if pσ,t ∼ 𝒩(0, Id).
Hence if f = rand and pσ,0 ∼ 𝒩(0, Id), then pσ,t ∼ 𝒩(0, Id) for all t.
![Page 18: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/18.jpg)
Since σt+1 = σt exp((cσ/dσ)(∥pσ,t+1∥ / E∥𝒩(0, Id)∥ − 1)),
log(σt+1) = log(σt) + (cσ/dσ)(∥pσ,t+1∥ / E∥𝒩(0, Id)∥ − 1)
E[log(σt+1) | σt, mt] = log(σt) + (cσ/dσ)(E∥pσ,t+1∥ / E∥𝒩(0, Id)∥ − 1) = log(σt),
since E∥pσ,t+1∥ = E∥𝒩(0, Id)∥ (as pσ,t+1 ∼ 𝒩(0, Id)).
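This statement can be checked numerically. The sketch below (parameter values cσ, dσ and the weights are illustrative choices, not the CMA-ES defaults) simulates only the (pσ, σ) update under random selection, i.e. the recombined step is ∑ wi yi with i.i.d. yi ∼ 𝒩(0, Id), and verifies that pσ stays 𝒩(0, Id)-distributed and that log σ has no drift.

```python
import math
import numpy as np

def csa_random_selection(n=10, mu=5, steps=200, reps=2000, seed=2):
    """CSA path/step-size update under random selection, reps chains in parallel."""
    rng = np.random.default_rng(seed)
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                                   # positive recombination weights
    mu_w = 1.0 / np.sum(w ** 2)
    c_sigma, d_sigma = 0.3, 1.0                    # illustrative constants
    chi_n = math.sqrt(2) * math.gamma((n + 1) / 2) / math.gamma(n / 2)  # E||N(0,I)||
    p = rng.standard_normal((reps, n))             # p_sigma,0 ~ N(0, Id)
    log_sigma = np.zeros(reps)                     # log(sigma_t / sigma_0)
    for _ in range(steps):
        y = rng.standard_normal((reps, mu, n))     # random selection: plain i.i.d. steps
        y_w = np.einsum('i,rij->rj', w, y)         # y_w ~ N(0, Id) / sqrt(mu_w)
        p = (1 - c_sigma) * p + math.sqrt(c_sigma * (2 - c_sigma) * mu_w) * y_w
        log_sigma += (c_sigma / d_sigma) * (np.linalg.norm(p, axis=1) / chi_n - 1)
    return p, log_sigma

p, log_sigma = csa_random_selection()
```

Across chains, log σ averages to approximately 0 and the components of pσ have unit variance, matching E[log(σt+1) | σt, mt] = log(σt).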
![Page 19: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/19.jpg)
Convergence of (μ/μw, λ)-CSA-ES (2×11 runs)
![Page 20: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/20.jpg)
Convergence of (μ/μw, λ)-CSA-ES
Note: initial step-size taken too small (σ0 = 10⁻²) to illustrate the step-size adaptation
![Page 21: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/21.jpg)
Convergence of (μ/μw, λ)-CSA-ES
![Page 22: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/22.jpg)
Optimal Step-size - Lower-bound for Convergence Rates
In the previous slides we have displayed some runs with “optimal” step-size.
Optimal step-size relates to step-size proportional to the distance to the optimum, σt = σ∥x − x⋆∥, where x⋆ is the optimum of the optimized function (with σ properly chosen).
The associated algorithm is not a real algorithm (as it needs to know the distance to the optimum), but it gives bounds on convergence rates and allows one to compute many important quantities.
The goal for a step-size adaptive algorithm is to achieve convergence rates close to those obtained with the optimal step-size.
![Page 23: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/23.jpg)
We will formalize this in the context of the (1+1)-ES. Similar results can be obtained for other algorithm frameworks.
![Page 24: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/24.jpg)
Optimal Step-size - Bound on Convergence Rate - (1+1)-ES
Consider a (1+1)-ES algorithm with any step-size adaptation mechanism:
Xt+1 = Xt + σt 𝒩t+1 if f(Xt + σt 𝒩t+1) ≤ f(Xt), and Xt+1 = Xt otherwise,
with {𝒩t, t ≥ 1} i.i.d. ∼ 𝒩(0, Id).
Equivalent writing: Xt+1 = Xt + σt 𝒩t+1 1{f(Xt + σt 𝒩t+1) ≤ f(Xt)}
![Page 25: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/25.jpg)
Bound on Convergence Rate - (1+1)-ES
Theorem: For any objective function f : ℝⁿ → ℝ and any y⋆ ∈ ℝⁿ,
E[ln ∥Xt+1 − y⋆∥] ≥ E[ln ∥Xt − y⋆∥] − τ
where τ = max_σ φ(σ) with φ(σ) := −E[ln min{1, ∥e1 + σ𝒩(0, Id)∥}] and e1 = (1, 0, …, 0).
Theorem: The convergence rate lower-bound is reached on spherical functions f(x) = g(∥x − x⋆∥) (with g : ℝ≥0 → ℝ strictly increasing) and step-size proportional to the distance to the optimum, σt = σopt∥x − x⋆∥, with σopt such that φ(σopt) = τ.
[plot: φ(σ), with its maximum τ attained at σopt]
![Page 26: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/26.jpg)
Proof (take y⋆ = 0 w.l.o.g.):
Xt+1 = Xt + σt 𝒩t+1 1{f(Xt + σt 𝒩t+1) ≤ f(Xt)}, with 𝒩t+1 ∼ 𝒩(0, Id).
Xt+1 is either equal to Xt or to Xt + σt 𝒩t+1.
Thus ∥Xt+1∥ ≥ min{∥Xt∥, ∥Xt + σt 𝒩t+1∥} = ∥Xt∥ min{1, ∥Xt/∥Xt∥ + (σt/∥Xt∥) 𝒩t+1∥}
and ln ∥Xt+1∥ ≥ ln ∥Xt∥ + ln min{1, ∥Xt/∥Xt∥ + (σt/∥Xt∥) 𝒩t+1∥}.
![Page 27: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/27.jpg)
Taking conditional expectations:
E[ln ∥Xt+1∥ | σt, Xt] ≥ ln ∥Xt∥ + E[ln min{1, ∥Xt/∥Xt∥ + (σt/∥Xt∥) 𝒩t+1∥} | σt, Xt]
Lemma (isotropy of 𝒩(0, Id)): E[ln min{1, ∥Xt/∥Xt∥ + (σt/∥Xt∥) 𝒩t+1∥} | σt, Xt] = −φ(σt/∥Xt∥), where φ(σ) = −E[ln min{1, ∥e1 + σ𝒩(0, Id)∥}].
Hence E[ln ∥Xt+1∥ | σt, Xt] ≥ ln ∥Xt∥ − φ(σt/∥Xt∥) ≥ ln ∥Xt∥ − max_σ φ(σ),
with φ(σopt) = max_σ φ(σ) for σopt ∈ argmax_σ φ(σ).
![Page 28: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/28.jpg)
[plot: progress rate φ(σ) as a function of σ, maximized at σopt]
−φ(σ) = E[ln min{1, ∥e1 + σ𝒩(0, Id)∥}] (progress rate of the (1+1)-ES)
ψ(σ) = E[ln ∥e1 + σ𝒩(0, Id)∥]
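The quantities above can be estimated by plain Monte Carlo. The sketch below (dimension, sample size and grid are my own choices) estimates φ(σ) = −E[ln min{1, ∥e1 + σ𝒩(0, Id)∥}] and locates its maximizer σopt, which for the sphere sits roughly at a normalized step-size nσ of order 1.

```python
import numpy as np

def phi(sigma, n=10, samples=100_000, seed=3):
    """Monte Carlo estimate of phi(sigma) = -E[ln min(1, ||e1 + sigma*N(0,Id)||)]."""
    rng = np.random.default_rng(seed)            # same seed: common random numbers
    x = sigma * rng.standard_normal((samples, n))
    x[:, 0] += 1.0                               # e1 + sigma * N(0, Id)
    log_norm = 0.5 * np.log(np.einsum('ij,ij->i', x, x))
    return float(-np.mean(np.minimum(log_norm, 0.0)))

sigmas = np.linspace(0.01, 0.5, 25)
values = [phi(s) for s in sigmas]
sigma_opt = float(sigmas[int(np.argmax(values))])  # estimate of the maximizer
```

Reusing the same seed for every σ (common random numbers) makes the estimated curve smooth in σ, so the argmax is stable despite Monte Carlo noise.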
![Page 29: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/29.jpg)
Log-Linear Convergence of the Scale-Invariant Step-size ES
Theorem: The (1+1)-ES with step-size proportional to the distance to the optimum, σt = σ∥Xt∥, converges (log-)linearly on the sphere function f(x) = g(∥x∥) almost surely:
(1/t) ln (∥Xt∥/∥X0∥) → −φ(σ) =: CR(1+1)(σ) as t → ∞
(Again, this is not a real algorithm, as it uses the distance to the optimum.)
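This theorem is easy to observe numerically. The sketch below (dimension, σ and run length are my own choices) runs the scale-invariant step-size (1+1)-ES on the sphere and fits the slope of t ↦ ln ∥Xt∥, which estimates CR(1+1)(σ) = −φ(σ):

```python
import numpy as np

def scale_invariant_es(n=10, sigma=0.12, iters=3000, seed=4):
    """(1+1)-ES with sigma_t = sigma * ||X_t|| on the sphere f(x) = ||x||^2."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    log_norms = []
    for _ in range(iters):
        cand = x + sigma * np.linalg.norm(x) * rng.standard_normal(n)
        if cand @ cand <= x @ x:                 # elitist "+" selection
            x = cand
        log_norms.append(np.log(np.linalg.norm(x)))
    return np.array(log_norms)

log_norms = scale_invariant_es()
slope = float(np.polyfit(np.arange(log_norms.size), log_norms, 1)[0])
```

In log scale the run is a straight line with negative slope ≈ CR(1+1)(σ), i.e. log-linear convergence after essentially no adaptation phase, since σt already tracks ∥Xt∥.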
![Page 30: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/30.jpg)
Asymptotic Results (n → ∞)
σt = (σ/n) ∥Xt∥
![Page 31: How to update the different parameters m](https://reader034.vdocuments.net/reader034/viewer/2022042320/625d01f6954ad420684fccdb/html5/thumbnails/31.jpg)
Asymptotic Results (n → ∞)