Design of Computer Experiments — (2) with model — Luc Pronzato Université Côte d’Azur, CNRS, I3S, France




Plan I

1 Optimal design for Gaussian process models & kriging1.1 Gaussian processes and kriging1.2 Criteria based on MSE1.3 Maximum Entropy Sampling

2 Optimal design for linear regression2.1 Linear regression2.2 Exact design2.3 Approximate design theory2.4 Tensor-product models2.5 Consequences for space-filling design

3 Optimal design for Bayesian prediction3.1 Karhunen-Loève decomposition3.2 Bayesian prediction3.3 IMSE-optimal design

4 Beyond space filling

5 Conclusions part (2)


Objectives (same as part (1))

Computer experiments: based on simulations

➤ Usually, x ∈ Rd ➠ observation Y(x) (physical experiment)

➤ Here, numerical simulation: Y(x) = f(x), observation = evaluation of an unknown function f(·) (no measurement error)

From pairs (xi, f(xi)), i = 1, 2, . . . , n:

optimization: find x∗ = arg max_{x∈X} f(x)

inversion: construct {x ∈ X : f(x) = T}

estimation of a probability of failure: Prob{f(x) > C} when x ∼ probability density ϕ(·)

sensitivity analysis

approximation/interpolation of f(·) by a predictor ηn(·), to be constructed


1 Optimal design for Gaussian process models & kriging

1.1 Gaussian processes and kriging

Model for f (·): Gaussian process

f(x) = r⊤(x)β + Z(x), with
r(x) a vector of known functions of x (the trend)
Z(x) = realization of a random process (random field), second-order stationary, typically supposed to be Gaussian
E{Z(x)} = 0, E{Z(x)Z(x′)} = σ² C(x − x′; θ)

Computer experiments

Following (Sacks et al., 1989), choose C(δ; θ) continuous at δ = 0, with C(0; θ) = 1
➠ 2 repetitions at the same x yield the same f(x)

(no measurement error)


Objective = interpolation (or extrapolation): build a predictor ηn(x) based on a single realization of Z(·); much different from prediction of other realizations of Z(·) (➠ simply estimate β)

ordinary kriging

(expressions for universal kriging with trend r⊤(x)β, β ∈ Rp, p > 1, are slightly more complicated):
f(x) = β + Z(x) → ηn(x) = ηn[f](x)

BLUP (Best Linear Unbiased Predictor) at x: ηn(x) = v⊤n(x) yn, with yn = (f(x1), . . . , f(xn))⊤

vn(x) minimizes E{(v⊤n yn − [β + Z(x)])²}
with the constraint E{v⊤n yn} = β ∑_{i=1}^{n} {vn}i = E{f(x)} = β, i.e., ∑_{i=1}^{n} {vn}i = 1


Prediction: ηn(x) = βn + c⊤n(x) Cn⁻¹ (yn − βn 1)

MSE (Mean-Squared Error) proportional to

ρn(x) = 1 − [ c⊤n(x) 1 ] [ Cn 1 ; 1⊤ 0 ]⁻¹ [ cn(x) ; 1 ]

[with {Cn}i,j = C(xi − xj; θ), {cn(x)}i = C(xi − x; θ), βn = (1⊤ Cn⁻¹ yn)/(1⊤ Cn⁻¹ 1) (WLS) and 1 = (1, . . . , 1)⊤]


Ex. with d = 1, n = 5 (note that ρn(xi) = 0: no measurement error)
[figure: kriging prediction in dimension 1 through the 5 design points]


Ex. with d = 2, n = 20 (Xn = random Lh)
[figures: true function f(x) and kriging prediction on [0, 1]²]
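A minimal numerical sketch of these formulas (ordinary kriging, d = 1): the Matérn 3/2 correlation, the range parameter theta and the test function f below are illustrative choices, not the settings used in the figures.

    import numpy as np

    def corr(a, b, theta=5.0):
        # Matern 3/2 correlation C(a - b), computed pairwise (illustrative choice)
        d = np.abs(a[:, None] - b[None, :]) * theta
        return (1.0 + np.sqrt(3.0) * d) * np.exp(-np.sqrt(3.0) * d)

    def kriging(xn, yn, xtest):
        n = len(xn)
        Cn = corr(xn, xn)                              # {Cn}ij = C(xi - xj)
        cn = corr(xn, xtest)                           # one column cn(x) per test point
        Cinv = np.linalg.inv(Cn)
        one = np.ones(n)
        beta = one @ Cinv @ yn / (one @ Cinv @ one)    # WLS estimate of the constant trend
        eta = beta + cn.T @ Cinv @ (yn - beta * one)   # BLUP eta_n(x)
        K = np.block([[Cn, one[:, None]], [one[None, :], np.zeros((1, 1))]])
        b = np.vstack([cn, np.ones((1, cn.shape[1]))])
        rho = 1.0 - np.sum(b * np.linalg.solve(K, b), axis=0)   # bordered formula for rho_n(x)
        return eta, rho

    f = lambda x: np.sin(6 * x) + 0.5 * x              # toy "unknown" function
    xn = np.array([0.05, 0.3, 0.5, 0.75, 0.95])        # d = 1, n = 5 design
    eta, rho = kriging(xn, f(xn), np.linspace(0, 1, 201))
    print(rho.min(), rho.max())                        # rho_n is (numerically) 0 at the design points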


1.2 Criteria based on MSE

A natural idea: minimize ρn(x) for all x

In practice:

minimize MMSE(Xn) = max_{x∈X} ρn(x)

minimize IMSE(Xn) = ∫_X ρn(x) dµ(x), with µ(·) some measure of interest on X

Optimal designs are typically space-filling:

Johnson et al. (1990): if C(x − x′) = c(‖x − x′‖) with c(·) decreasing, then X∗n optimal for ΦmM(·) (miniMax optimal) tends to be optimal for MMSE(Xn) with covariance Ca(x − x′) = [C(x − x′)]a when a → ∞

➠ no point xi on the boundary of X


[diagram from part (1): small inter-distance (miniMax, dispersion), large intra-distances (Maximin, energy) and uniformity (entropy, discrepancy), connected by inequalities and bounds; “minimize max. of MSE” is attached to the miniMax corner]


Xn = {x1, . . . , xn}

Calculation of MMSE(Xn):

Compute ρn(x(k)) for a finite Q-point set XQ = {x(1), . . . , x(Q)} (e.g., the first Q points of a LDS in X),
then MMSE(Xn) ≃ max_k ρn(x(k)), to be minimized, for instance by simulated annealing

Calculation of IMSE(Xn) = ∫_X ρn(x) dµ(x) (Gauthier and P., 2014, 2016):

Without trend (r(x) = 0 ∀x) ➠ ρn(x) = 1 − c⊤n(x) Cn⁻¹ cn(x), where {cn(x)}i = C(x − xi), {Cn}ij = C(xi − xj)

IMSE(Xn) = 1 − trace[ Cn⁻¹ ∫_X cn(x) c⊤n(x) dµ(x) ] = 1 − trace[ Cn⁻¹ Σn ]


Calculation for a finite Q-point set XQ = {x(1), . . . , x(Q)}:

IMSE(Xn) ≃ ÎMSE(Xn) = ∑_{k=1}^{Q} wk ρn(x(k)) = 1 − trace[ Cn⁻¹ Σn ]

with ∑_{k=1}^{Q} wk = 1 (wk = 1/Q when µ is uniform) and Σn = ∑_{k=1}^{Q} wk cn(x(k)) c⊤n(x(k))

If, moreover, x1, . . . , xn ∈ XQ, with xi = x(ki), i = 1, . . . , n, then Σn = {QWQ}Jn,Jn
with {Q}kℓ = C(x(k) − x(ℓ)), W = diag{w1, . . . , wQ} and Jn = {k1, . . . , kn}

➠ IMSE(Xn) ≃ 1 − trace[ ({Q}Jn,Jn)⁻¹ {QWQ}Jn,Jn ]

not expensive to compute once Q and QWQ have been calculated

(a bit more complicated with a trend r⊤(x)β)

Minimization not obvious (for instance, by simulated annealing), see § 3.3 for another approach
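A short sketch of this discretized IMSE (no trend, design points taken among the grid points, uniform weights wk = 1/Q); the Gaussian correlation and its range are arbitrary illustrative choices.

    import numpy as np

    def corr(A, B, theta=10.0):
        # Gaussian correlation (illustrative choice)
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-theta * d2)

    XQ = np.stack(np.meshgrid(np.linspace(0, 1, 33), np.linspace(0, 1, 33)), -1).reshape(-1, 2)
    Q = corr(XQ, XQ)                          # {Q}kl = C(x(k) - x(l))
    W = np.full(len(XQ), 1.0 / len(XQ))       # uniform measure mu on the grid
    QWQ = (Q * W) @ Q                         # = Q diag(W) Q, computed once

    def imse(Jn):
        # design = subset of grid points, indexed by Jn
        Cn = Q[np.ix_(Jn, Jn)]
        Sn = QWQ[np.ix_(Jn, Jn)]              # Sigma_n = {QWQ}_{Jn,Jn}
        return 1.0 - np.trace(np.linalg.solve(Cn, Sn))

    Jn = np.random.default_rng(0).choice(len(XQ), size=20, replace=False)
    print(imse(Jn))                           # IMSE of a random 20-point design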


Ex. of IMSE-optimal design (Gauthier & P., 2014, 2016):

X = regular grid with 37² = 1 369 points
C(x − x′) = C1({x}1 − {x′}1) × C2({x}2 − {x′}2), with Ci(x − x′) = (1 + 25/√3 |x − x′|) exp[−25/√3 |x − x′|] (Matérn 3/2)

[figures: grid weights ωk defining the measure of interest µ, and the resulting 33-point optimal design (grid points and design points)]


Sequential construction of an optimal design:

For IMSE: nothing special, at step n + 1, Xn+1 = {Xn, xn+1} with x∗n+1 = arg min_{x∈X} IMSE({Xn, x})

For MMSE: do not choose x∗n+1 = arg min_{x∈X} MMSE({Xn, x})!
➠ take instead x∗n+1 = arg max_{x∈X} ρn(x)

[figures: ρn(x) for n = 5, . . . , 17 as points are added at arg max ρn; the scale of ρn drops from about 0.7 at n = 5 to the order of 10⁻⁴ at n = 17]
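A greedy sketch of the rule x∗n+1 = arg max ρn(x) over a finite candidate set; here ρn is the no-trend expression 1 − c⊤n(x) Cn⁻¹ cn(x) (with a constant trend one would use the bordered formula of § 1.1), and the correlation model is an arbitrary choice.

    import numpy as np

    def corr(A, B, theta=25.0):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-theta * d2)            # Gaussian correlation (illustrative choice)

    cand = np.random.default_rng(1).random((512, 2))    # candidate set in [0, 1]^2
    idx = [0]                                           # arbitrary first point
    for n in range(1, 15):
        Cn = corr(cand[idx], cand[idx])
        cn = corr(cand[idx], cand)                      # columns cn(x) for all candidates
        rho = 1.0 - np.sum(cn * np.linalg.solve(Cn, cn), axis=0)
        rho[idx] = -np.inf                              # never pick an existing point again
        idx.append(int(np.argmax(rho)))                 # x_{n+1} = arg max rho_n(x)
    print(cand[idx])                                    # a space-filling-looking design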


1.3 Maximum Entropy Sampling (Shewry and Wynn, 1987)

Without trend (r(x) = 0 ∀x):
➤ zQ ≜ vector with components Z(x(k)), x(k) ∈ XQ
➤ zn ≜ vector with components Z(xi), i = 1, . . . , n (observations)
➤ H1(z) ≜ −∫ ϕ(z) log[ϕ(z)] dz, the Shannon entropy of ϕ(z) = measure of “dispersion”
H1(z1|z2) ≜ conditional entropy of z1 given z2 = ∫ [ −∫ ϕ(z1|z2) log[ϕ(z1|z2)] dz1 ] ϕ(z2) dz2

We get H1(yQ) = H1(yn) + E{H1(yQ|yn)}, where H1(yQ) is constant and E{H1(yQ|yn)} is to be minimized

Minimize E{H1(yQ|yn)} w.r.t. Xn ⇔ maximize H1(yn)

Z(x) is Gaussian ➠ maximize det[Cn] = intra-distances criterion


Sequential construction of an optimal design:

xn+1 = arg max_{x∈X} det[Cn+1] = arg max_{x∈X} det[ Cn cn(x) ; c⊤n(x) 1 ]

where det[ Cn cn(x) ; c⊤n(x) 1 ] = det[Cn] (1 − c⊤n(x) Cn⁻¹ cn(x)) = det[Cn] ρn(x)

➠ xn+1 = arg max_{x∈X} ρn(x)

The designs obtained are typically space-filling:

Johnson et al. (1990): if C(x − x′) = c(‖x − x′‖) with c(·) decreasing, then X∗n optimal for ΦMm(·) (Maximin optimal) tends to be optimal for det[Cn] with covariance Ca(x − x′) = [C(x − x′)]a when a → ∞

➠ there are design points xi on the boundary of X
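The factorization det[Cn+1] = det[Cn] ρn(x) used above (a Schur-complement identity) is easy to check numerically; the correlation and the points below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.random((6, 2))                                   # 5 design points + 1 new point x
    C = np.exp(-8.0 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    Cn, cn = C[:5, :5], C[:5, 5]
    rho = 1.0 - cn @ np.linalg.solve(Cn, cn)                 # rho_n(x), no trend
    print(np.linalg.det(C), np.linalg.det(Cn) * rho)         # the two numbers coincide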


[same diagram as above, with “Max. Entropy Sampling” attached to the large intra-distances (Maximin) corner and “minimize max. of MSE” to the miniMax corner]


2 Optimal design for linear regression

2.1 Linear regression

Observations yi = y(xi) = r⊤(xi)γ + εi , γ ∈ Rp

with (εi) i.i.d., E{εi} = 0, var{εi} = σ2 ∀i

Estimation of γ by Least-Squares (LS)

γn = (R⊤n Rn)⁻¹ R⊤n yn, with yn = (y1, . . . , yn)⊤ and Rn the n × p matrix with rows r⊤(x1), . . . , r⊤(xn)

E{γn} = γ (unbiased)

Covariance: cov(γn) = σ² (R⊤n Rn)⁻¹ = (σ²/n) Mn⁻¹, with Mn = (1/n) ∑_{i=1}^{n} r(xi) r⊤(xi)


cov(γn) = (σ²/n) Mn⁻¹, with Mn = M(Xn) = (1/n) ∑_{i=1}^{n} r(xi) r⊤(xi) ∈ Rp×p = information matrix (per observation)

Optimal design X∗n: maximizes a scalar function Φ(·) of Mn (with Φ(·) Loewner increasing)

E-optimality: maximize λmin(Mn) (minimize the longest axis of confidence ellipsoids for γ)

A-optimality: maximize −trace[Mn⁻¹] ⇔ maximize 1/trace[Mn⁻¹] (minimize the sum of squared lengths of the axes of confidence ellipsoids for γ)

more generally, L-optimality: maximize −trace[L Mn⁻¹] (we only consider the case L symmetric positive definite)


D-optimality: maximize log det Mn (minimize the volume of confidence ellipsoids for γ)

Very much used:

a D-optimal design is invariant by reparametrization: det M′n(β(γ)) = det Mn(γ) det⁻²(∂β/∂γ⊤)

➠ often leads to repeating the same experimental conditions (replications) (we assumed i.i.d. errors εi ⇒ several observations at the same xi carry information)
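A minimal sketch of the information matrix Mn of an exact design and of the E-, A- and D-criteria above, for the quadratic model r(x) = (1, x, x²)⊤ on X = [0, 2] used as an example in § 2.3.

    import numpy as np

    def r(x):                                 # regressors of the quadratic model (illustrative)
        x = np.asarray(x, dtype=float)
        return np.stack([np.ones_like(x), x, x ** 2], axis=-1)

    def M(design):                            # Mn = (1/n) sum_i r(xi) r(xi)^T
        R = r(design)
        return R.T @ R / len(R)

    def criteria(design):
        Mn = M(design)
        return {"E": np.linalg.eigvalsh(Mn)[0],        # lambda_min(Mn)
                "A": -np.trace(np.linalg.inv(Mn)),     # -trace(Mn^{-1})
                "D": np.linalg.slogdet(Mn)[1]}         # log det Mn

    print(criteria([0.0, 1.0, 2.0]))          # support of the D-optimal measure of section 2.3
    print(criteria([0.0, 0.5, 1.5, 2.0]))     # some other design, for comparison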


2.2 Exact design

n observations at x1, . . . , xn, xi ∈ Rd

Maximize Φ(Mn) w.r.t. Xn = {x1, . . . , xn} ∈ Rn×d, with Mn = M(Xn) = (1/n) ∑_{i=1}^{n} r(xi) r⊤(xi)

➤ If the problem dimension n × d is not too big ➠ “standard” algorithm (but careful with constraints and local optima!)

➤ Otherwise ➠ specific algorithm

Exchange method: at step k, exchange one support point xj with a better one x∗ in X (better for Φ(·)):
X^k_n = (x1, . . . , xj ↔ x∗, . . . , xn)


Fedorov (1972) algorithm:

At each iteration k, consider all n possible exchanges successively, each time starting from X^k_n, and retain the «best» one among these n ➠ X^{k+1}_n

X^k_n = ( x1 ↔ x∗1, . . . , xj ↔ x∗j, . . . , xn ↔ x∗n )

One iteration → n optimizations of dimension d followed by ranking n criterion values
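A sketch of Fedorov’s exchange method for D-optimality; to keep it short, the “optimization of dimension d” is replaced by a search over a finite candidate set (an assumption of this sketch, not part of the original algorithm).

    import numpy as np

    def M(design):                            # information matrix for r(x) = (1, x, x^2)^T
        x = np.asarray(design, dtype=float)
        R = np.stack([np.ones_like(x), x, x ** 2], -1)
        return R.T @ R / len(x)

    def fedorov_exchange(design, candidates, n_iter=50):
        design = list(design)
        for _ in range(n_iter):
            base = np.linalg.slogdet(M(design))[1]
            best_gain, best = 0.0, None
            for j in range(len(design)):                      # the n possible exchanges
                for x in candidates:
                    trial = design[:j] + [x] + design[j + 1:]
                    gain = np.linalg.slogdet(M(trial))[1] - base
                    if gain > best_gain:
                        best_gain, best = gain, (j, x)
            if best is None:                                  # dead end: no improving exchange
                break
            design[best[0]] = best[1]                         # keep the best exchange
        return design

    cand = list(np.linspace(0.0, 2.0, 21))
    print(sorted(fedorov_exchange([0.1, 0.9, 1.7], cand)))    # tends to {0, 1, 2} on this example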


DETMAX algorithm (Mitchell, 1974):

If one additional observation were allowed: optimal choice X^{k+}_{n+1} = (x1, . . . , xj, . . . , xn, x∗n+1)

Then, remove one support point to return to an n-point design:
➔ consider all n + 1 possible cancellations, retain the least penalizing one in the sense of Φ(·)
➔ globally, this exchanges some xj with x∗n+1 [= excursion of length 1, longer excursions are possible. . . ]

One iteration → 1 optimization of dimension d followed by ranking n + 1 criterion values
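A corresponding sketch of one DETMAX excursion of length 1 (add the best extra point, then drop the least penalizing one), with the same finite-candidate-set simplification as above.

    import numpy as np

    def M(design):                            # same quadratic model as above
        x = np.asarray(design, dtype=float)
        R = np.stack([np.ones_like(x), x, x ** 2], -1)
        return R.T @ R / len(x)

    def detmax_step(design, candidates):
        ld = lambda D: np.linalg.slogdet(M(D))[1]
        x_new = max(candidates, key=lambda x: ld(list(design) + [x]))   # best (n+1)-th point
        augmented = list(design) + [x_new]
        drop = max(range(len(augmented)),                               # least penalizing removal
                   key=lambda j: ld(augmented[:j] + augmented[j + 1:]))
        return augmented[:drop] + augmented[drop + 1:]

    design = [0.2, 1.1, 1.8]
    for _ in range(10):
        design = detmax_step(design, np.linspace(0.0, 2.0, 21))
    print(sorted(design))                                               # tends towards {0, 1, 2}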


DETMAX has simpler iterations than Fedorov, but usually requires more iterations

dead ends are possible:
• DETMAX: the point to be removed is xn+1
• Fedorov: no possible improvement when optimizing one xi at a time

▲ both give local optima only ▲

Other methods:

Branch and bound: guaranteed convergence, but complicated [Welch 1982]

Rounding an optimal design measure (support points xi and associated weights w∗i, i = 1, . . . , m, presented next in § 2.3): choose n integers ri (ri = nb. of replications of observations at xi) such that ∑_{i=1}^{m} ri = n and ri/n ≈ w∗i (e.g., maximize min_{i=1,...,m} ri/w∗i = Adams apportionment, see [Pukelsheim & Rieder 1992]; see the sketch below)
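A hedged sketch of this rounding step: the greedy allocation below pushes up min_i ri/w∗i one observation at a time; it illustrates the maximin rule quoted above but is not a checked implementation of the Pukelsheim–Rieder procedure.

    import numpy as np

    def round_design(weights, n):
        # give each of the n observations, one at a time, to the support point
        # with the smallest current ratio r_i / w_i* (greedy illustration)
        w = np.asarray(weights, dtype=float)
        r = np.zeros(len(w), dtype=int)
        for _ in range(n):
            r[np.argmin(r / w)] += 1
        return r

    print(round_design([0.5, 0.3, 0.2], n=11))   # -> [6 3 2]  (n w* = [5.5, 3.3, 2.2])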


2.3 Approximate design theory

(Chernoff, 1953; Kiefer and Wolfowitz, 1960; Fedorov, 1972; Silvey, 1980; Pukelsheim, 1993) . . .

Mn = M(Xn) = (1/n) ∑_{i=1}^{n} r(xi) r⊤(xi)
(the additive form is essential — related to the independence of observations)

If several xi coincide (repetitions), with only m < n different xi:

M(Xn) = ∑_{i=1}^{m} (ri/n) r(xi) r⊤(xi)

ri/n = proportion of observations collected at xi = «percentage of experimental effort» at xi = weight wi of support point xi


M(Xn) = ∑_{i=1}^{m} wi r(xi) r⊤(xi)

➠ design Xn ⇔ { x1 · · · xm ; w1 · · · wm } with ∑_{i=1}^{m} wi = 1
➠ normalized discrete distribution on the xi, with constraints wi = ri/n

➠ Release the constraints: only enforce wi ≥ 0 and ∑_{i=1}^{m} wi = 1
➠ ξ = discrete probability measure on X, support points xi and associated weights wi = “approximate design”

More general expression: ξ = any probability measure on X,

M(ξ) = ∫_X r(x) r⊤(x) ξ(dx) with ∫_X ξ(dx) = 1


M(ξ) ∈ convex closure of M = set of rank-one matrices M(δx) = r(x) r⊤(x)

M(ξ) is symmetric p × p ⇒ it lies in a q-dimensional space, q = p(p+1)/2

ξ = w1 δx1 + w2 δx2 + w3 δx3 (3 points are enough for q = 2)


Caratheodory Theorem:

M(ξ) can be written as a linear combination of at most q + 1 elements of M:

M(ξ) = ∑_{i=1}^{m} wi r(xi) r⊤(xi), with m ≤ p(p+1)/2 + 1

⇒ consider discrete probability measures with at most p(p+1)/2 + 1 support points (true in particular for the optimum design!)
[Even better: for many criteria Φ(·), if ξ∗ is optimal (maximizes Φ[M(ξ)]) then M(ξ∗) is on the boundary of the convex closure of M and p(p+1)/2 support points are enough]

Suppose we have found an optimal ξ∗ = ∑_{i=1}^{m} w∗i δxi
☞ for a given n, choose the ri so that ri/n ≃ w∗i → rounding of an approximate design

Why are design measures interesting? How does this simplify the optimization problem?


➠ Maximize Φ(·), concave w.r.t. M(ξ), over a convex set

Ex: D-optimality: ∀ M1 ≻ O, M2 ⪰ O with M2 not proportional to M1, ∀α, 0 < α < 1,
log det[(1 − α)M1 + αM2] > (1 − α) log det M1 + α log det M2
⇒ log det[·] is (strictly) concave

convex set + concave criterion ⇒ one unique optimum!

ξ∗ is optimal ⇔ directional derivative ≤ 0 in all directions


⇒ “Equivalence Theorem” [Kiefer & Wolfowitz 1960]
Ξ = set of probability measures on X, Φ(·) concave, φ(ξ) = Φ[M(ξ)]

Fφ(ξ; ν) = lim_{α→0+} (φ[(1 − α)ξ + αν] − φ(ξ))/α = directional derivative of φ(·) at ξ in direction ν

Equivalence Theorem: ξ∗ maximizes φ(ξ) ⇔ max_{ν∈Ξ} Fφ(ξ∗; ν) ≤ 0

➔ Takes a simple form when Φ(·) is differentiable:
ξ∗ maximizes φ(ξ) ⇔ max_{x∈X} Fφ(ξ∗; δx) ≤ 0

☞ Check optimality of ξ∗ by plotting Fφ(ξ∗; δx)


Ex: D-optimal design

ξ∗D maximizes log det[M(ξ)] w.r.t. ξ ∈ Ξ
⇔ max_{x∈X} d(ξ∗D, x) ≤ p
⇔ ξ∗D minimizes max_{x∈X} d(ξ, x) w.r.t. ξ ∈ Ξ

where d(ξ, x) = r⊤(x) M⁻¹(ξ) r(x)

Moreover, d(ξ∗D, xi) = p = dim(γ) for any xi = support point of ξ∗D


Ex.: r(x) = (1 x x²)⊤ (p = 3), i.i.d. errors, X = [0, 2] ➠ d(ξ, x) as a function of x

ξ∗D = { 0 1 2 ; 1/3 1/3 1/3 }
[figure: d(ξ∗D, x) on [0, 2], bounded by 3 and equal to 3 at the three support points]

ξ = { 0 1.5 2 ; 1/3 1/2 1/6 }
[figure: d(ξ, x) on [0, 2], exceeding 3 (values up to about 6)]
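The equivalence theorem can be checked numerically on this example: d(ξ∗D, x) stays at or below p = 3 on X = [0, 2], whereas the second measure violates the bound.

    import numpy as np

    def r(x):
        x = np.asarray(x, dtype=float)
        return np.stack([np.ones_like(x), x, x ** 2], -1)

    def d(points, weights, x):                    # d(xi, x) = r(x)^T M(xi)^{-1} r(x)
        R = r(points)
        Mxi = R.T @ (np.asarray(weights)[:, None] * R)
        Rx = r(x)
        return np.sum(Rx * np.linalg.solve(Mxi, Rx.T).T, axis=1)

    xs = np.linspace(0.0, 2.0, 401)
    print(d([0, 1, 2], [1/3, 1/3, 1/3], xs).max())        # ~ 3 = p: xi*_D is D-optimal
    print(d([0, 1.5, 2], [1/3, 1/2, 1/6], xs).max())      # > 3: this xi is not D-optimal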


KW Eq. Th. relates optimality in γ space (parameters)to optimality in y space (observations)

nvar[r⊤(x)γn)] = σ2 r⊤(x)M−1(ξ)r(x) = σ2 d(ξ, x) (i.i.d. errors)

D-optimality ⇔ G-optimality

➠ ξ∗D minimizes the maximum value of prediction variance over X

[figure: η(x, γ̂n) and η(x, γ̂n) ± 2 standard deviations over X = [0, 2]]

➠ put the next observation where d(ξ, x) is large


Construction of an optimal design measure

Central idea (▲ for a differentiable Φ(·) ▲): use the steepest-ascent direction

Fedorov–Wynn algorithm:

1 : Choose ξ1 non-degenerate (det M(ξ1) > 0)

2 : Compute x∗k = arg max_{x∈X} Fφ(ξk; δx)
If Fφ(ξk; δ_{x∗k}) < ε, stop: ξk is ε-optimal

3 : ξk+1 = (1 − αk)ξk + αk δ_{x∗k} (δ_{x∗k} = delta measure at x∗k) [Vertex Direction]

k → k + 1, return to step 2


Step size αk? ➠ αk = arg max_α φ(ξk+1)
= [d(ξk, x∗k) − p] / { p [d(ξk, x∗k) − 1] } for D-optimality (Fedorov, 1972) → monotone convergence

➠ αk > 0, lim_{k→∞} αk = 0, Σ_{k=1}^{∞} αk = ∞ ((Wynn, 1970) for D-optimality)
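A minimal sketch of the vertex-direction iteration with Fedorov's step, on a discretized X (again the quadratic model on [0, 2]; the grid size and tolerance are arbitrary choices, not from the slides):

import numpy as np

def r(x):
    return np.array([1.0, x, x**2])

grid = np.linspace(0, 2, 201)
R = np.array([r(x) for x in grid])        # candidate vectors r(x), x in the grid
w = np.full(len(grid), 1 / len(grid))     # xi_1 = uniform weights (non-degenerate)
p = 3
for k in range(1000):
    M = R.T @ (w[:, None] * R)                             # M(xi_k)
    d = np.einsum('ij,jk,ik->i', R, np.linalg.inv(M), R)   # d(xi_k, x) on the grid
    j = int(np.argmax(d))                                  # x*_k
    if d[j] - p < 1e-6:                                    # directional derivative <= eps: stop
        break
    alpha = (d[j] - p) / (p * (d[j] - 1))                  # Fedorov's optimal step for D-optimality
    w *= (1 - alpha); w[j] += alpha                        # vertex-direction update
print(grid[w > 1e-3], w[w > 1e-3])        # support close to {0, 1, 2}, weights close to 1/3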


Remarks:

Sequential design, one xi at a time enters M(X):
M(Xk+1) = [k/(k + 1)] M(Xk) + [1/(k + 1)] r(xk+1)r⊤(xk+1),
with xk+1 = arg max_{x∈X} Fφ(ξk; δx) ⇔ Wynn's algorithm with αk = 1/(k + 1)

Guaranteed convergence to the optimum

There exist faster methods:

remove support points from ξk (≈ allow αk to be < 0) (Atwood, 1973; Böhning, 1985, 1986)
combine with gradient projection (or a second-order method) (Wu, 1978)
use a multiplicative algorithm (Titterington, 1976; Torsney, 1983, 2009; Yu, 2010) (for A and D optimality, far from the optimum; a small sketch follows this list)
combine different methods (Yu, 2011)
Still an active topic, especially for non-differentiable Φ(·). . .
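For illustration, a sketch of the multiplicative update for D-optimality on a finite candidate set (w_i ← w_i d(ξ, x_i)/p, which keeps the weights normalized; the fixed iteration count is an arbitrary choice):

import numpy as np

def multiplicative_D(R, iters=500):
    # R: (N, p) matrix of candidate regression vectors r(x_i)
    N, p = R.shape
    w = np.full(N, 1.0 / N)
    for _ in range(iters):
        M = R.T @ (w[:, None] * R)
        d = np.einsum('ij,jk,ik->i', R, np.linalg.inv(M), R)
        w *= d / p                     # multiplicative update; sum_i w_i d_i = p, so weights stay normalized
    return w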


2.4 Tensor-product models

D-optimality (also true for A-optimality under some conditions (Schwabe, 1996))

[r^(k)(x)]⊤θ^(k) ≜ Σ_{i=0}^{dk} θ^(k)_i x^i, polynomial of degree dk in the k-th factor, dim(θ^(k)) = pk = 1 + dk

Global model for x = ({x}1, {x}2, . . . , {x}d)⊤:
r⊤(x)γ = Π_{k=1}^{d} [r^(k)(x)]⊤θ^(k), total degree Σ_{k=1}^{d} dk, dim(γ) = Π_{k=1}^{d} pk

Example:
r⊤(x)γ = (θ^(1)_0 + θ^(1)_1 {x}1 + θ^(1)_2 {x}1²) × (θ^(2)_0 + θ^(2)_1 {x}2 + θ^(2)_2 {x}2²)
= γ0 + γ1{x}1 + γ2{x}2 + γ12{x}1{x}2 + γ11{x}1² + γ22{x}2²
+ γ112{x}1²{x}2 + γ122{x}1{x}2² + γ1122{x}1²{x}2²

D-optimal design (approximate theory) = tensor product of d one-dimensional D-optimal designs
(true for any type of model, not only polynomials)


Polynomial of degree k: the D-optimal design is supported on k + 1 points (on [−1, 1], the roots of (1 − t²) P′k(t), with Pk(t) the k-th Legendre polynomial), all with equal weight 1/(k + 1)

[figures: tensor-product D-optimal designs in dimension 2, for d1 = d2 = 2 (9 points, weights 1/9), d1 = d2 = 3 (16 points, weights 1/16), d1 = d2 = 4 (25 points, weights 1/25), d1 = d2 = 5 (36 points, weights 1/36)]
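The one-dimensional support points are easy to compute numerically (a sketch; numpy's Legendre class is used, and the degrees below match the figures):

import numpy as np
from numpy.polynomial.legendre import Legendre

def d_optimal_support(k):
    # roots of (1 - t^2) P'_k(t): the interval endpoints plus the roots of P'_k
    interior = Legendre.basis(k).deriv().roots()
    return np.sort(np.concatenate(([-1.0], interior, [1.0])))

for k in (2, 3, 4, 5):
    print(k, np.round(d_optimal_support(k), 4))   # k + 1 points, each with weight 1/(k + 1)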



Sum of polynomials? r⊤(x)γ = Σ_{k=1}^{d} [r^(k)(x)]⊤θ^(k),
total degree max_{k=1,…,d} dk, dim(γ) = (Σ_{k=1}^{d} pk) − 1 = Σ_{k=1}^{d} dk + d − 1

Example:
r⊤(x)γ = (θ^(1)_0 + θ^(1)_1 {x}1 + θ^(1)_2 {x}1²) + (θ^(2)_0 + θ^(2)_1 {x}2 + θ^(2)_2 {x}2²)
= γ0 + γ1{x}1 + γ2{x}2 + γ11{x}1² + γ22{x}2²
(no interaction term)

Again, D-optimal design (approximate theory) = tensor product of d one-dimensional D-optimal designs (Schwabe, 1996)

Difficult to apply in large dimension: d polynomials of degree k ➠ (k + 1)^d support points!
but a general lesson, and a possible extension towards Gaussian process models and kriging


2.5 Consequences for space-filling design

D-optimality + polynomials ➠ more points close to the boundary as the degree increases
Erdös–Turán theorem: the roots r of orthonormal polynomials on [0, 1] are asymptotically distributed with the arcsine law, with density ϕ0(r) = 1 / [π √(r(1 − r))]

➠ Should we put more points close to the boundary ?


In order to counter the boundary effect, Dette and Pepelyshev (2010) propose:
Take a “standard” space-filling design (e.g., Maximin, Lh Maximin, LDS),
for each j = 1, . . . , d, transform the j-th coordinates {xi}j with T : x ↦ z = T(x) = [1 + cos(π x)] / 2 (x ∼ uniform → z ∼ arcsine),
Use the transformed design Zn = (z1, . . . , zn)
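A minimal sketch of this transformation (assuming the design is already scaled to [0, 1]^d; the input design here is just a random stand-in):

import numpy as np

def arcsine_transform(X):
    # coordinate-wise T(x) = (1 + cos(pi x)) / 2: uniform coordinates become
    # arcsine-distributed, pushing points towards the boundaries of [0, 1]
    return (1.0 + np.cos(np.pi * np.asarray(X))) / 2.0

rng = np.random.default_rng(0)
Xn = rng.uniform(size=(36, 2))      # stand-in for a Maximin / Lh / LDS design
Zn = arcsine_transform(Xn)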


[figures: dimension 2, degree-5 polynomials: the D-optimal design has 36 points with weights 1/36; compared with a transformed (arcsine) Maximin-optimal design, n = 36]


Illustration of the boundary effect: d = 1, n = 11 observations in [0, 1], ordinary kriging with covariance C(t) = exp(−50 t²) ➠ plot of ρn(x)

[figures: ρn(x) over [0, 1] for Xn Maximin-optimal (values up to ≈ 0.014) and for Xn miniMax-optimal (values up to ≈ 0.09), shown separately and overlaid]

Uniform distribution of design points
⇒ prediction near the boundaries relies on fewer points
⇒ precision is worse close to the boundaries
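A small sketch reproducing the flavour of this illustration (simple kriging with zero mean is used here instead of ordinary kriging, an assumption that changes the numbers slightly but not the boundary effect):

import numpy as np

def C(a, b):
    return np.exp(-50.0 * np.subtract.outer(a, b) ** 2)

Xn = np.linspace(0, 1, 11)                        # Maximin-optimal design in d = 1
x = np.linspace(0, 1, 501)
Kinv = np.linalg.inv(C(Xn, Xn) + 1e-10 * np.eye(len(Xn)))
k = C(x, Xn)                                      # cross-covariances k_n(x)
rho = 1.0 - np.einsum('ij,jk,ik->i', k, Kinv, k)  # simple-kriging variance, C(0) = 1
near_boundary = rho[(x > 0) & (x < 0.1)].max()
central = rho[(x > 0.5) & (x < 0.6)].max()
print(near_boundary, central)                     # the variance is larger in the boundary gap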


. . . But the transformation T : x ↦ z = T(x) = [1 + cos(π x)] / 2 may be too “strong”

[figures: ρn(x) for the Maximin design (values up to ≈ 0.014), for the transformed (arcsine) Maximin design (values up to ≈ 0.12), and both overlaid]


Arcsine distribution: maximizes Φ[0](ξ) = exp[ ∫₀¹ ∫₀¹ log‖x − y‖ ξ(dx) ξ(dy) ]
(continuous version of φ[0](X) = exp[ Σ_{i<j} µij log(dij) ], see § I-1.6)

➤ Maximization of
Φ[q](ξ) = [ ∫₀¹ ∫₀¹ ‖x − y‖^{−q} ξ(dx) ξ(dy) ]^{−1/q}, 0 < q < 1
(continuous version of φ[q](X) = [ Σ_{i<j} µij dij^{−q} ]^{−1/q}, see § I-1.6) yields a measure ξ with density
ϕq(x) = x^{(q−1)/2} (1 − x)^{(q−1)/2} / B((q+1)/2, (q+1)/2) (Beta distribution) (Zhigljavsky et al., 2010)
(tends to arcsine when q → 0, to uniform when q → 1)


In order to counter the boundary effect (Dette and Pepelyshev, 2010):
Take a “standard” space-filling design (e.g., Maximin, Lh Maximin, LDS),
for each j = 1, . . . , d, transform the j-th coordinates {xi}j with T : x ↦ z = T(x) such that x = ∫₀ᶻ ϕq(t) dt (x ∼ uniform → z ∼ ϕq),
Use the transformed design Zn = (z1, . . . , zn)

[figures: dimension 2, degree-5 polynomials: the D-optimal design (36 points, weights 1/36) compared with a transformed (arcsine) Maximin-optimal design (n = 36) and with a transformed (Beta, q = 0.4) Maximin-optimal design (n = 36)]
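A sketch of this general ϕq transformation (the map z = T(x) is the quantile function of the Beta((q+1)/2, (q+1)/2) distribution; scipy is assumed available, and the input design is again a random stand-in):

import numpy as np
from scipy.stats import beta

def beta_transform(X, q):
    a = (q + 1.0) / 2.0
    # x = int_0^z phi_q(t) dt  <=>  z = F_q^{-1}(x), the Beta(a, a) quantile function
    return beta.ppf(np.asarray(X), a, a)

rng = np.random.default_rng(1)
Xn = rng.uniform(size=(36, 2))          # stand-in for a Maximin design on [0, 1]^2
Zn = beta_transform(Xn, q=0.4)          # q -> 0: arcsine, q -> 1: uniform (no change)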


. . . For a suitably chosen Beta-transformation (q = 0.84)

[figures: ρn(x) for the Maximin design and for the transformed (Beta, q = 0.84) Maximin design, shown separately and overlaid; both stay below ≈ 0.014]

Choice of a suitable q? ➠ Optimize a precision criterion based on ρn(x) (depends on the covariance C(·))

▲ requires X = hypercube ▲
▲ if d is big, many points are on the boundary (or at the vertices!) of X ▲


3 Optimal design for Bayesian prediction

3.1 Karhunen-Loève decomposition of a Gaussian process


Model without trend: f(x) = Z(x), Gaussian process,
E{Z(x)} = 0, E{Z(x)Z(x′)} = C(x, x′) (= C(x − x′) if stationary)

IMSEµ(Xn) ≜ ∫_X E{ [Z(x) − E{Z(x) | yn}]² } dµ(x)

The integral operator Tµ defined by ∀f ∈ L²(X, µ), ∀x ∈ X, Tµ[f](x) = ∫_X f(x′) K(x, x′) dµ(x′) is diagonalisable:
eigenvalues λi, i = 1, 2, 3, . . . (in decreasing order)
associated eigenfunctions ϕi(·) (extended over X), with ∫_X ϕi(x)ϕj(x) dµ(x) = δij


Z′(x) ≜ P_{Hµ}[Zx] = Σi ζi √λi ϕi(x), with all ζi i.i.d. N(0, 1)
P_{Hµ} = orthogonal projection on the space “which contributes to IMSEµ”

Z′(x) = Σi γi ϕi(x), where the r.v. γi are independent N(0, λi)

For a given truncation level m,
Z′(x) = Σ_{i=1}^{m} γi ϕi(x) + Σ_{i>m} γi ϕi(x)
≃ Z″(x) = Σ_{i=1}^{m} γi ϕi(x) + ε(x),
with E{ε(xi)} = 0, E{ε(xi)ε(xj)} = σ² δij and σ² = Σ_{i>m} λi

Z″(xi) = φ⊤(xi)γ + εi = linear regression model (as in § 2.1)
(with eigenfunctions ϕi(·), i = 1, . . . , m, instead of polynomials)

(Fedorov, 1996) ➠ construct an optimal design for this model


3.2 Bayesian prediction for Z″(xi) = φ⊤(xi)γ + εi

LS estimation:
γ̂n = (Φn⊤Φn)⁻¹ Φn⊤ yn, with yn = (y1, . . . , yn)⊤ and Φn the n × m matrix with rows φ⊤(x1), . . . , φ⊤(xn)

cov(γ̂n) = σ²(Φn⊤Φn)⁻¹ = (σ²/n) Mn⁻¹, where Mn = (1/n) Σ_{i=1}^{n} φ(xi)φ⊤(xi)

Prediction at x: ηn(x) = φ⊤(x) γ̂n

IMSE(Xn) = ∫_X φ⊤(x) cov(γ̂n) φ(x) dµ(x) = (σ²/n) trace[Mn⁻¹] = A-optimality criterion
(requires n ≥ m to have a full-rank Mn)


Bayesian estimation: prior distribution N(0, Λm) for γ, with Λm = diag{λ1, . . . , λm}

γ̂n = [Φn⊤Φn/σ² + Λm⁻¹]⁻¹ [Φn⊤ yn/σ²]

cov(γ̂n) = [Φn⊤Φn/σ² + Λm⁻¹]⁻¹ = (σ²/n) MB⁻¹(Xn), where MB(Xn) = Mn + (σ²/n) Λm⁻¹ and Mn = (1/n) Σ_{i=1}^{n} φ(xi)φ⊤(xi)

Prediction at x: ηn(x) = φ⊤(x) γ̂n

IMSE(Xn) = ∫_X φ⊤(x) cov(γ̂n) φ(x) dµ(x) = (σ²/n) trace[MB⁻¹(Xn)] = A-optimality criterion applied to MB(Xn)
(MB(Xn) has full rank for any m!)


3.3 IMSE-optimal design

All the machinery of optimal design for parametric models is available (Pilz, 1983)

Exact design: Spöck and Pilz (2010) for prediction of spatial random fields (but no guaranteed convergence to the optimum)

Approximate design: minimize Ψ(ξ) = trace[MB⁻¹(ξ)], with ξ a probability measure over X and
MB(ξ) = ∫ φ(x)φ⊤(x) ξ(dx) + (σ²/n) Λm⁻¹

➤ guaranteed convergence towards an optimal ξ∗, with N∗ support points


In practice: eigen-decomposition
➤ use a finite Q-point set XQ = {x(1), . . . , x(Q)}
➤ diagonalize QW, where {Q}kℓ = C(x(k), x(ℓ)), W = diag{w1, . . . , wQ} (wk = 1/Q when µ is uniform)
➔ QW = PΛP⁻¹ with P⊤WP = IQ and Λ = diag{λ1, . . . , λQ}, λ1 ≥ · · · ≥ λQ

MB(ξ) = Σ_{k=1}^{m} pk φ(xk)φ⊤(xk) + [ (Σ_{i=m+1}^{Q} λi) / n ] diag{λ1⁻¹, . . . , λm⁻¹}, m < Q,
where pk = ξ{xk} and {φ(xk)}j = ϕj(xk) = Pkj, k = 1, . . . , m, j = 1, . . . , m

➤ minimization of trace[MB⁻¹(ξ)] ➔ ξ∗
p∗k = 0 for many k, but some p∗k > 0 are very small, there may exist clusters of points, etc.
➔ aggregate support points of ξ∗
➔ remove some points (transfer their weights to others, optimally) (Gauthier & P., 2016)
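A sketch of this spectral recipe on a grid (the Matérn 3/2 kernel and the 33 × 33 grid match the example below; the simple column normalization of the eigenvectors and the fact that the weight vector p ranges over the whole candidate grid are assumptions of this sketch):

import numpy as np

def matern32(A, B, theta=10.0):
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return (1.0 + theta * D) * np.exp(-theta * D)

g = np.linspace(0, 1, 33)
XQ = np.array([(u, v) for u in g for v in g])        # Q = 33 x 33 = 1089 candidate points
w = np.full(len(XQ), 1.0 / len(XQ))                  # W = diag{1/Q}, mu uniform
QW = matern32(XQ, XQ) * w[None, :]                   # matrix QW
lam, P = np.linalg.eig(QW)                           # QW = P Lambda P^{-1}
idx = np.argsort(-lam.real)
lam, P = lam.real[idx], P.real[:, idx]
P /= np.sqrt((w[:, None] * P**2).sum(axis=0))        # scale columns so that P^T W P = I

m, n = 10, 10
sigma2 = lam[m:].sum()                               # sigma^2 = sum_{i>m} lambda_i
Phi = P[:, :m]                                       # phi_j(x_k) = P_kj

def bayesian_imse(p):
    # trace of MB(xi)^{-1}, with xi the design measure of weights p on XQ
    MB = (Phi * p[:, None]).T @ Phi + (sigma2 / n) * np.diag(1.0 / lam[:m])
    return np.trace(np.linalg.inv(MB))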


The number N of points is not totally controlled, but 2 tuning parameters are available: m (truncation level) and n (take m ≈ n ≈ N)

N points ➠ initialization for optimization of the true IMSE(XN) by any standard algorithm (optimal points remain in the convex hull of X)


Example: d = 2, C(x, x′) = (1 + 10‖x − x′‖) exp(−10‖x − x′‖) (Matérn 3/2),
XQ = regular grid with Q = 33 × 33 = 1 089 points, σ² = Σ_{i>m} λi

m = n = 10, aggregation of the support of ξ∗ ➠ N = 12
[figure: the numbered clusters of support points of ξ∗ on the grid and the 12 aggregated design points]

m = 30, n = 10 ➠ N = 36
[figure: the resulting 36-point design, compared with a miniMax-optimal X36]


Can be used for any X if d is not too big: use a finite XQ given by the first Q points of a LDS in X

With trend, f(x) = Z(x) + r⊤(x)β? Same thing (Spöck and Pilz, 2010), with slightly more complicated expressions

More advanced: avoid mixing eigenfunctions ϕi(·) with trend components {r}j(·) (Gauthier & P., 2016)



4 Beyond space filling

Optimal design for kriging: there is a hidden difficulty:
the value of θ in the covariance C(·; θ) is unknown

➠ use the same observations to estimate θ, and then construct ηn(x) with θ̂n estimated (plug-in method)

➠ In particular, we may use θ̂n = Maximum Likelihood Estimator (MLE) (Z(x) is supposed to be Gaussian)

➠ there is a corrective term (Harville and Jeske, 1992; Abt, 1999):
ρn^EK(x; θ) = ρn(x; θ) + trace{ Mθ⁻¹ [∂vn(x; θ)/∂θ] Cn [∂vn(x; θ)/∂θ⊤] } (= Empirical Kriging (EK) variance)
with:
vn(x; θ) such that ηn(x) = vn⊤(x; θ) yn
Mθ = Fisher Information Matrix (FIM) for θ


Example (Zimmerman, 2006):
E{Z(x)Z(x′)} = θ^{‖x−x′‖}, θ = 0.3, X = regular grid 5 × 5, optimal 4-point designs for:

[figures: three designs X4 on the 5 × 5 grid, one per criterion below]

prediction with known θ: X4 minimizes max_{x∈X} ρ4(x)
estimation of θ: X4 maximizes det Mθ
prediction with θ estimated: X4 minimizes MEK = max_{x∈X} ρ4^EK(x; θ)


Example (Müller et al., 2015): NH4 concentration in the North Sea (collaboration with MUMM, Belgium); simulated data, kriging with a Matérn 3/2 kernel

[figures: ρ14^EK(x; θ) for the miniMax-optimal design X∗mM,n=14 and for the design X∗14 minimizing MEK = max_{x∈X} ρ14^EK(x; θ)]


Choosing Xn that minimizes MEK = max_{x∈X} ρn^EK(x; θ) is difficult

➠ Use a compromise criterion between space filling and “aggregation of points”

for instance, take Xn that maximizes γ log det(Mβ) + (1 − γ) log det(Mθ) (Müller et al., 2011, 2015), with
Mβ = FIM for trend parameters β (maximization → space-filling design)
Mθ = FIM for correlation parameters θ (maximization → aggregation)
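A hedged sketch of this compound criterion for a constant trend and an isotropic exponential covariance (the FIM expressions below, R⊤C⁻¹R for β and ½ trace[(C⁻¹ ∂C/∂θ)²] for a scalar θ, are standard Gaussian-process formulas, not taken from the slides):

import numpy as np

def compound_criterion(X, theta=0.7, gamma=0.8):
    # X: (n, d) design; C(h; theta) = exp(-theta * ||h||), constant trend r(x) = 1
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    Cmat = np.exp(-theta * D)
    Cinv = np.linalg.inv(Cmat)
    R = np.ones((len(X), 1))
    M_beta = R.T @ Cinv @ R                          # FIM for the trend parameter beta
    A = Cinv @ (-D * Cmat)                           # C^{-1} dC/dtheta
    M_theta = np.atleast_2d(0.5 * np.trace(A @ A))   # FIM for theta (scalar here)
    return gamma * np.linalg.slogdet(M_beta)[1] + (1 - gamma) * np.linalg.slogdet(M_theta)[1]

rng = np.random.default_rng(2)
print(compound_criterion(rng.uniform(size=(7, 2))))  # one random 7-point design in [0, 1]^2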


Example: n = 7, d = 2, C(x − x′; θ) = exp(−θ‖x − x′‖), θ = 0.7,
1000 Lh designs (999 random + ♦ for a Maximin-optimal design),
MKV = max_{x∈X} ρn(x; θ), Jα = det^α(Mβ) + det^{1−α}(Mθ) (α = 0.8)

[figure: scatter plot of −log(Jα) versus MKV over the 1 000 Lh designs]

However, the effect of the corrective term in ρn^EK(x; θ) = ρn(x; θ) + trace{ Mθ⁻¹ [∂vn(x; θ)/∂θ] Cn [∂vn(x; θ)/∂θ⊤] } quickly vanishes as n increases


5 Conclusions part (2) — with model

➤ Design criteria relying on a Gaussian-process model (entropy, MMSE, IMSE) depend on the chosen covariance (and on θ in C(·, θ))
➠ expectation w.r.t. θ (Joseph et al., 2015) → entropy
➠ worst case w.r.t. θ (Spöck and Pilz, 2010) → IMSE
➠ However, the model is often just a tool to generate a space-filling design: the value of θ is not critical (choose a small enough correlation length to spread the points in X)
➤ Put more points near the boundaries than in the central part of X (but be careful ➠ in high dimension almost all the volume is near the boundaries!)
➤ Put a few points close to each other to help estimation of θ

Test several methods (none is perfect) by comparing values of different criteria


[diagram: relations between geometric and model-based design criteria: small inter-distance (miniMax, dispersion), large intra-distances (Maximin, energy) and uniformity (entropy, discrepancy) are connected, via inequalities and bounds, to Maximum Entropy Sampling and to minimizing the maximum of the MSE or the IMSE (with XQ finite)]


References

Abt, M., 1999. Estimating the prediction mean squared error in Gaussian stochastic processes with exponential correlation structure. Scandinavian Journal of Statistics 26 (4), 563–578.

Atwood, C., 1973. Sequences converging to D-optimal designs of experiments. Annals of Statistics 1 (2),342–352.

Böhning, D., 1985. Numerical estimation of a probability measure. Journal of Statistical Planning and Inference 11, 57–69.

Böhning, D., 1986. A vertex-exchange-method in D-optimal design theory. Metrika 33, 337–347.

Chernoff, H., 1953. Locally optimal designs for estimating parameters. Annals of Math. Stat. 24, 586–602.

Dette, H., Pepelyshev, A., 2010. Generalized Latin hypercube design for computer experiments. Technometrics 52 (4), 421–429.

Fedorov, V., 1972. Theory of Optimal Experiments. Academic Press, New York.

Fedorov, V., 1996. Design of spatial experiments: model fitting and prediction. In: Ghosh, S., Rao, C. (Eds.), Handbook of Statistics, vol. 13. Elsevier, Amsterdam, Ch. 16, pp. 515–553.

Gauthier, B., Pronzato, L., 2014. Spectral approximation of the IMSE criterion for optimal designs in kernel-based interpolation models. SIAM/ASA J. Uncertainty Quantification 2, 805–825, DOI 10.1137/130928534.

Gauthier, B., Pronzato, L., 2016. Approximation of IMSE-optimal designs via quadrature rules and spectraldecomposition. Communications in Statistics – Simulation and Computation 45 (5), 1600–1612.

Gauthier, B., Pronzato, L., 2017. Convex relaxation for IMSE optimal design in random field models. Computational Statistics and Data Analysis 113, 375–394.


Harville, D., Jeske, D., 1992. Mean squared error of estimation or prediction under a general linear model. Journal of the American Statistical Association 87 (419), 724–731.

Johnson, M., Moore, L., Ylvisaker, D., 1990. Minimax and maximin distance designs. Journal of Statistical Planning and Inference 26, 131–148.

Joseph, V., Gul, E., Ba, S., 2015. Maximum projection designs for computer experiments. Biometrika 102 (2), 371–380.

Kiefer, J., Wolfowitz, J., 1960. The equivalence of two extremum problems. Canadian Journal of Mathematics 12, 363–366.

Mitchell, T., 1974. An algorithm for the construction of “D-optimal” experimental designs. Technometrics 16, 203–210.

Müller, W., Pronzato, L., Rendas, J., Waldl, H., 2015. Efficient prediction designs for random fields. Applied Stochastic Models in Business and Industry 31 (2), 178–194 (+ discussion pages 195–203).

Müller, W., Pronzato, L., Waldl, H., 2011. Beyond space-filling: An illustrative case. Procedia Environmental Sciences 7, 14–19, DOI 10.1016/j.proenv.2011.07.004.

Pilz, J., 1983. Bayesian Estimation and Experimental Design in Linear Regression Models. Vol. 55. Teubner-Texte zur Mathematik, Leipzig (also Wiley, New York, 1991).

Pukelsheim, F., 1993. Optimal Experimental Design. Wiley, New York.

Pukelsheim, F., Reider, S., 1992. Efficient rounding of approximate designs. Biometrika 79 (4), 763–770.

Sacks, J., Welch, W., Mitchell, T., Wynn, H., 1989. Design and analysis of computer experiments. Statistical Science 4 (4), 409–435.

Schwabe, R., 1996. Optimum Designs for Multi-Factor Models. Springer, New York.


Shewry, M., Wynn, H., 1987. Maximum entropy sampling. Applied Statistics 14, 165–170.

Silvey, S., 1980. Optimal Design. Chapman & Hall, London.

Spöck, G., Pilz, J., 2010. Spatial sampling design and covariance-robust minimax prediction based on convex design ideas. Stochastic Environmental Research and Risk Assessment 24 (3), 463–482.

Titterington, D., 1976. Algorithms for computing D-optimal designs on a finite design space. In: Proc. of the 1976 Conference on Information Science and Systems. Dept. of Electronic Engineering, Johns Hopkins University, Baltimore, pp. 213–216.

Torsney, B., 1983. A moment inequality and monotonicity of an algorithm. In: Kortanek, K., Fiacco, A. (Eds.), Proc. Int. Symp. on Semi-infinite Programming and Applications. Springer, Heidelberg, pp. 249–260.

Torsney, B., 2009. W-iterations and ripples therefrom. In: Pronzato, L., Zhigljavsky, A. (Eds.), Optimal Design and Related Areas in Optimization and Statistics. Springer, Ch. 1, pp. 1–12.

Welch, W., 1982. Branch-and-bound search for experimental designs based on D-optimality and other criteria. Technometrics 24 (1), 41–48.

Wu, C., 1978. Some algorithmic aspects of the theory of optimal designs. Annals of Statistics 6 (6), 1286–1301.

Wynn, H., 1970. The sequential generation of D-optimum experimental designs. Annals of Math. Stat. 41, 1655–1664.

Yu, Y., 2010. Strict monotonicity and convergence rate of Titterington’s algorithm for computing D-optimal designs. Comput. Statist. Data Anal. 54, 1419–1425.

Yu, Y., 2011. D-optimal designs via a cocktail algorithm. Stat. Comput. 21, 475–481.

Zhigljavsky, A., Dette, H., Pepelyshev, A., 2010. A new approach to optimal design for linear models with correlated observations. Journal of the American Statistical Association 105 (491), 1093–1103.

Zimmerman, D., 2006. Optimal network design for spatial prediction, covariance parameter estimation, and empirical prediction. Environmetrics 17 (6), 635–652.
