A Surrogate Model based on Walsh Decomposition
for Pseudo-Boolean Functions
Sebastien Verel1, Bilel Derbel2,3, Arnaud Liefooghe2,3, Hernan Aguirre4, and Kiyoshi Tanaka4
Black-box Pseudo-Boolean Functions
x → f(x)
No information on the definition of f
Search space: x ∈ {0,1}^n
Objective function: given by a computation or an (expensive) simulation
Surrogate (meta-model) f̂: approximation of f, fast to compute
Surrogate models
For numerical problems:
• Huge number of works
• Kriging (Gaussian Process): collection of random variables N(m, k); mean: m(x) = E[f(x)]; covariance: k(x, x′) = exp(−θ dist(x, x′)^p)
• Efficient Global Optimization (EGO): maximize the expected improvement (EI)...
For combinatorial problems (using a distance on the combinatorial space):
• Radial Basis Function Networks (RBFN) [Moraglio:2011]
• Kriging, EGO [Zaefferer:2014]
Walsh functions in genetic algorithms
Definition [Bethke:1980]: for any k ∈ [0, 2^n − 1], the Walsh function
ϕ_k : {0,1}^n → {−1, 1}
is defined, for x ∈ {0,1}^n, by ϕ_k(x) = (−1)^(∑_{j=0}^{n−1} k_j x_j)
(ϕ_0, …, ϕ_{2^n−1}) is an orthogonal basis:
x        ϕ_0  ϕ_1  ϕ_2  ϕ_3  ϕ_4  ϕ_5  ϕ_6  ϕ_7
0 = 000    1    1    1    1    1    1    1    1
1 = 001    1   −1    1   −1    1   −1    1   −1
2 = 010    1    1   −1   −1    1    1   −1   −1
3 = 011    1   −1   −1    1    1   −1   −1    1
4 = 100    1    1    1    1   −1   −1   −1   −1
5 = 101    1   −1    1   −1   −1    1   −1    1
6 = 110    1    1   −1   −1   −1   −1    1    1
7 = 111    1   −1   −1    1   −1    1    1   −1
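The definition can be checked numerically. A minimal sketch, encoding k and x as integers whose bits are k_j and x_j (the helper name `walsh` is ours, not from the poster):

```python
def walsh(k, x):
    """phi_k(x) = (-1)^(sum_j k_j * x_j): parity of the bits shared by k and x."""
    return -1 if bin(k & x).count("1") % 2 else 1

n = 3
# Orthogonality of the basis: sum_x phi_k(x) phi_l(x) = 2^n if k == l, else 0
for k in range(2**n):
    for l in range(2**n):
        dot = sum(walsh(k, x) * walsh(l, x) for x in range(2**n))
        assert dot == (2**n if k == l else 0)
print("orthogonality holds for n =", n)
```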
Decomposition of pseudo-bool. func.
∀x ∈ {0,1}^n:  f(x) = ∑_{k=0}^{2^n−1} w_k ϕ_k(x)

∀k ∈ [0, 2^n − 1]:  w_k = (1/2^n) ∑_{x∈{0,1}^n} f(x) ϕ_k(x)
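For small n the coefficients can be computed by full enumeration, and the decomposition reconstructs f exactly. A toy sketch (f here is an arbitrary random lookup table, chosen only for illustration):

```python
import random

n = 4
random.seed(42)
# Arbitrary pseudo-Boolean function, stored as a table indexed by x in [0, 2^n - 1]
f = [random.random() for _ in range(2**n)]

def phi(k, x):
    # phi_k(x) = (-1)^(sum_j k_j x_j)
    return -1 if bin(k & x).count("1") % 2 else 1

# w_k = 2^-n * sum_x f(x) phi_k(x)
w = [sum(f[x] * phi(k, x) for x in range(2**n)) / 2**n for k in range(2**n)]

# f(x) = sum_k w_k phi_k(x): the reconstruction is exact (up to float rounding)
for x in range(2**n):
    assert abs(sum(w[k] * phi(k, x) for k in range(2**n)) - f[x]) < 1e-9
```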
Applications
• Design deceptive functions: the average fitness over a schema of order p is given by the coefficients of order lower than p.
• Grey-box optimization: use the linear decomposition for smart computation (see F. Chicano, D. Whitley, etc.)
Walsh surrogate model
Approximation using the Walsh decomposition truncated at order d:
f_w(x) = ∑_k w_k ϕ_k(x),  with k ∈ {j : ord(ϕ_j) ≤ d}
Linear model with predictors (ϕ_0(x_i), …, ϕ_k(x_i)) and responses y_i = f(x_i).
Estimator based on the mean squared error:
ŵ = argmin_w ∑_{x_i∈X} (f_w(x_i) − f(x_i))²
Estimator types:
• Non-sparse estimator: Conjugate Gradient (CG)
• Sparse estimator: Least-Angle Regression (LARS), a forward stepwise selection regression, related to regularization methods (lasso, ridge, etc.), that reduces the number of non-zero coefficients
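As a sketch of the sparse fit, scikit-learn's `Lars` estimator can be run on a design matrix of Walsh features of order ≤ d. The feature construction and the toy target below are our illustration, not the poster's exact experimental setup:

```python
import numpy as np
from sklearn.linear_model import Lars

def walsh_features(xs, n, d):
    # One column per phi_k with ord(phi_k) <= d (at most d set bits in k)
    ks = [k for k in range(2**n) if bin(k).count("1") <= d]
    F = np.array([[(-1) ** bin(k & x).count("1") for k in ks] for x in xs])
    return F, ks

rng = np.random.default_rng(0)
n, d = 8, 2

def f(x):
    # Toy target: linear part plus one pairwise interaction (order <= 2)
    bits = [(x >> j) & 1 for j in range(n)]
    return sum(bits) + 2.0 * bits[0] * bits[1]

xs = rng.integers(0, 2**n, size=150)
F, ks = walsh_features(xs, n, d)
# LARS with a cap on the number of active (non-zero) coefficients
model = Lars(n_nonzero_coefs=25).fit(F, [f(x) for x in xs])
print(np.count_nonzero(model.coef_), "non-zero coefficients out of", F.shape[1])
```

For n = 8 and d = 2 the design matrix has 1 + 8 + 28 = 37 columns; since the toy target lies in the order-2 span, LARS keeps only a handful of them non-zero.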
Experimental benchmark: nk-landscapes
nk-landscapes [Kauffman:1993]:
f(x) = (1/n) ∑_{i=1}^{n} f_i(x_i, x_{i_1}, …, x_{i_k})
k = 0: linear problem (easy to optimize)
k = 1: quadratic problem (∼ UBQP)
k = 2: cubic problem (∼ max-3-SAT problem)
n ∈ {10,15,20,25}, k ∈ {0,1,2}, 5 instances.
Solutions generated uniformly at random for the training and test sets.
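A toy nk-landscape generator, as a minimal sketch following the definition above (random neighborhoods and uniform random subfunction tables; the helper names are ours):

```python
import random

def make_nk(n, k, seed=0):
    rng = random.Random(seed)
    # Each bit i interacts with k other randomly chosen bits
    links = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    # One random lookup table of size 2^(k+1) per subfunction f_i
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def f(x):  # x: sequence of n bits
        total = 0.0
        for i in range(n):
            idx = x[i]
            for j, l in enumerate(links[i]):
                idx |= x[l] << (j + 1)  # pack the linked bits into a table index
            total += tables[i][idx]
        return total / n  # f(x) = 1/n * sum_i f_i(x_i, x_{i_1}, ..., x_{i_k})

    return f

f = make_nk(n=10, k=2)
print(round(f([0] * 10), 3), round(f([1] * 10), 3))
```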
Maximum number of non-zero coefficients up to order d: n_0 = 1; n_d = n_{d−1} + C(n, d)

ord.  n=10  n=15  n=20  n=25
0        1     1     1     1
1       11    16    21    26
2       56   121   211   326
3      176   576  1351  2626
but the number of non-zero coefficients is much smaller for nk-landscapes: ≤ n(2^{k+1} − 1) + 1
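The recurrence n_d = n_{d−1} + C(n, d) with n_0 = 1 is just a cumulative binomial sum; a one-liner sketch to reproduce the counts:

```python
from math import comb

def n_coeffs(n, d):
    # n_0 = 1, n_d = n_{d-1} + C(n, d)  =>  n_d = sum_{j=0}^{d} C(n, j)
    return sum(comb(n, j) for j in range(d + 1))

for n in (10, 15, 20, 25):
    print(n, [n_coeffs(n, d) for d in range(4)])
```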
Non-sparse CG vs. sparse LARS
Regression error (test set of 10³ solutions) as a function of the training sample size
[Figure: mean absolute error of fitness vs. sample size (0–400), comparing the CG and LARS methods.]
Walsh (with LARS) vs. Kriging
Mean absolute error according to training sample size
[Figure: mean absolute error vs. number of function evaluations (sample size), 12 panels for n ∈ {10, 15, 20, 25} and k ∈ {0, 1, 2}, comparing the kriging and Walsh surrogate models.]
Regression error according to the order of the Walsh functions
R2 of coefficients according to training sample size
[Figure: R² of the Walsh coefficient estimates vs. number of function evaluations (sample size), 12 panels for n ∈ {10, 15, 20, 25} and k ∈ {0, 1, 2}, by Walsh order (1, 2, 3).]
Conclusions and discussion
Surrogate based on Walsh decomposition:
• A relevant orthogonal basis for learning pseudo-Boolean functions
• Efficient when using machine learning techniques
The model is not limited to surrogates:
• From black-box to grey-box: learn a model! (application to cellular automata problems, etc.)
• A way to detect interactions between variables
Perspectives
• Replace LARS by other heuristics: from low to larger orders, or other subsets, a priori knowledge on the model, etc. ⇒ a model selection problem
• Combine the Walsh surrogate with grey-box techniques
• Bayesian estimation of the coefficients to compute the estimation error
•Extend to other combinatorial optimization problems
•Apply on expensive optimization problems
1 Université du Littoral Côte d'Opale, LISIC, France
2 Univ. Lille, CNRS, Centrale Lille, UMR 9189 – CRIStAL, F-59000 Lille, France
3 Inria Lille – Nord Europe, F-59650 Villeneuve d'Ascq, France
4 Shinshu University, Faculty of Engineering, Nagano, Japan