
Design of Computer Experiments — (2) with model —

Luc Pronzato

Université Côte d'Azur, CNRS, I3S, France

École ETICS, Porquerolles, 06/10/2017

Plan I

1 Optimal design for Gaussian process models & kriging
1.1 Gaussian processes and kriging
1.2 Criteria based on MSE
1.3 Maximum Entropy Sampling

2 Optimal design for linear regression
2.1 Linear regression
2.2 Exact design
2.3 Approximate design theory
2.4 Tensor-product models
2.5 Consequences for space-filling design

3 Optimal design for Bayesian prediction
3.1 Karhunen-Loève decomposition
3.2 Bayesian prediction
3.3 IMSE-optimal design

4 Beyond space filling

5 Conclusions part (2)

Objectives (same as part (1))

Computer experiments: based on simulations

➤ Usually, x ∈ Rd ➠ observation Y(x) (physical experiment)

➤ Here, numerical simulation: Y(x) = f(x); an observation = one evaluation of an unknown function f(·) (no measurement error)

From pairs (xi, f(xi)), i = 1, 2, …, n:

• optimization: find x∗ = arg max_{x∈X} f(x)
• inversion: construct {x ∈ X : f(x) = T}
• estimation of a probability of failure: Prob{f(x) > C} when x ∼ probability density ϕ(·)
• sensitivity analysis
• approximation/interpolation of f(·) by a predictor ηn(·), to be constructed

1 Optimal design for Gaussian process models & kriging

1.1 Gaussian processes and kriging

Model for f(·): Gaussian process

f(x) = r⊤(x)β + Z(x), with
• r(x) a vector of known functions of x (the trend)
• Z(x) = a realization of a random process (random field), second-order stationary, typically supposed to be Gaussian
• E{Z(x)} = 0, E{Z(x)Z(x′)} = σ² C(x − x′; θ)

Computer experiments

Following Sacks et al. (1989), choose C(δ; θ) continuous at δ = 0, with C(0; θ) = 1
➠ 2 repetitions at the same x yield the same f(x) (no measurement error)

Objective = interpolation (or extrapolation): build a predictor ηn(x) based on a single realization of Z(·). This is much different from predicting other realizations of Z(·) (➠ simply estimate β)

Ordinary kriging (expressions for universal kriging with trend r⊤(x)β, β ∈ Rp, p > 1, are slightly more complicated):

f(x) = β + Z(x) → ηn(x) = ηn[f](x)

BLUP (Best Linear Unbiased Predictor) at x: ηn(x) = vn⊤(x) yn, with yn = (f(x1), …, f(xn))⊤,
where vn(x) minimizes E{(vn⊤yn − [β + Z(x)])²}
subject to the unbiasedness constraint E{vn⊤yn} = E{f(x)} = β: since E{vn⊤yn} = β ∑_{i=1}^n {vn}i, this means ∑_{i=1}^n {vn}i = 1

Prediction: ηn(x) = βn + cn⊤(x) Cn⁻¹ (yn − βn 1)

MSE (Mean-Squared Error) proportional to

$$\rho_n(x) \;=\; 1 \;-\; \begin{bmatrix} c_n^\top(x) & 1 \end{bmatrix} \begin{bmatrix} C_n & \mathbf{1} \\ \mathbf{1}^\top & 0 \end{bmatrix}^{-1} \begin{bmatrix} c_n(x) \\ 1 \end{bmatrix}$$

[with {Cn}i,j = C(xi − xj; θ), {cn(x)}i = C(xi − x; θ), βn = (1⊤Cn⁻¹yn)/(1⊤Cn⁻¹1) (WLS), and 1 = (1, …, 1)⊤]

Ex. with d = 1, n = 5: note that ρn(xi) = 0 at every design point (no measurement error)
[figure: 1-d kriging prediction with its confidence band, interpolating the 5 observations]

Ex. with d = 2, n = 20 (Xn = a random Lh, i.e., Latin hypercube):
[figures: surface and contour plots of f(x) and of the kriging prediction over [0, 1]²]
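The slides give no code, so here is a minimal numpy sketch of the ordinary-kriging formulas above (the prediction ηn(x), βn by WLS, and ρn(x) via the bordered matrix); the squared-exponential correlation and the sine test function are illustrative choices, not from the slides.

```python
import numpy as np

def ordinary_kriging(X, y, x, C):
    """Ordinary kriging at a single point x: returns (eta_n(x), rho_n(x)).
    X: (n, d) design, y: (n,) observations, C: stationary correlation C(delta)."""
    n = len(y)
    Cn = np.array([[C(xi - xj) for xj in X] for xi in X])  # {Cn}_ij = C(xi - xj)
    cn = np.array([C(xi - x) for xi in X])                 # {cn(x)}_i = C(xi - x)
    one = np.ones(n)
    Ci_y = np.linalg.solve(Cn, y)
    Ci_1 = np.linalg.solve(Cn, one)
    beta = (one @ Ci_y) / (one @ Ci_1)                     # beta_n (WLS)
    eta = beta + cn @ (Ci_y - beta * Ci_1)                 # prediction eta_n(x)
    K = np.block([[Cn, one[:, None]],                      # bordered matrix
                  [one[None, :], np.zeros((1, 1))]])       # [Cn 1; 1' 0]
    b = np.concatenate([cn, [1.0]])
    rho = 1.0 - b @ np.linalg.solve(K, b)                  # rho_n(x)
    return eta, rho

# toy check, d = 1, n = 5 (squared-exponential correlation: illustrative)
C = lambda delta: np.exp(-10.0 * float(delta @ delta))
X = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
y = np.sin(6.0 * X[:, 0])
print(ordinary_kriging(X, y, X[2], C))  # eta = y[2], rho ~ 0 at a design point
```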

1.2 Criteria based on MSE

A natural idea: minimize ρn(x) for all x. In practice:

• minimize MMSE(Xn) = max_{x∈X} ρn(x)
• minimize IMSE(Xn) = ∫_X ρn(x) dµ(x), with µ(·) some measure of interest over X

Optimal designs are typically space-filling:

Johnson et al. (1990): if C(x − x′) = c(‖x − x′‖) with c(·) decreasing, then X∗n optimal for ΦmM(·) (miniMax-optimal) tends to be optimal for MMSE(Xn) with covariance Ca(x − x′) = [C(x − x′)]^a when a → ∞

➠ no point xi on the boundary of X

[diagram: map of space-filling criteria (small inter-distance: miniMax, dispersion; large intra-distances: Maximin, energy; uniformity: entropy, discrepancy), related by inequalities and bounds; "Minimize max. of MSE" points to the miniMax corner]

Xn = {x1, …, xn}

Calculation of MMSE(Xn): compute ρn(x(k)) for a finite Q-point set XQ = {x(1), …, x(Q)} (e.g., the first Q points of a LDS in X); then MMSE(Xn) ≃ max_k ρn(x(k)), to be minimized, for instance by simulated annealing

Calculation of IMSE(Xn) = ∫_X ρn(x) dµ(x) (Gauthier and P., 2014, 2016):

Without trend (r(x) = 0 ∀x) ➠ ρn(x) = 1 − cn⊤(x) Cn⁻¹ cn(x), where {cn(x)}i = C(x − xi) and {Cn}ij = C(xi − xj)

$$\mathrm{IMSE}(X_n) = 1 - \operatorname{trace}\Bigl[C_n^{-1}\int_{\mathscr{X}} c_n(x)\,c_n^\top(x)\,\mathrm{d}\mu(x)\Bigr] = 1 - \operatorname{trace}\bigl[C_n^{-1}\,\Sigma_n\bigr]$$

Calculation for a finite Q-point set XQ = {x(1), …, x(Q)}:

$$\mathrm{IMSE}(X_n) \simeq \widehat{\mathrm{IMSE}}(X_n) = \sum_{k=1}^{Q} w_k\,\rho_n(x^{(k)}) = 1 - \operatorname{trace}\bigl[C_n^{-1}\,\Sigma_n\bigr]$$

with ∑_{k=1}^Q wk = 1 (wk = 1/Q when µ is uniform) and Σn = ∑_{k=1}^Q wk cn(x(k)) cn⊤(x(k))

If, moreover, x1, …, xn ∈ XQ, with xi = x(ki), i = 1, …, n, then Σn = {QWQ}JnJn, with {Q}kℓ = C(x(k) − x(ℓ)), W = diag{w1, …, wQ} and Jn = {k1, …, kn}

➠ IMSE(Xn) ≃ 1 − trace[{Q}JnJn⁻¹ {QWQ}JnJn]

not expensive to compute once Q and QWQ have been calculated (a bit more complicated with a trend r⊤(x)β)

Minimization is not obvious (use, for instance, simulated annealing); see § 3.3 for another approach
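A small numpy sketch of the finite-Q approximation just described, in the no-trend case: Q and QWQ are formed once, and each candidate design then only requires extracting and solving an n × n block. The 1-d grid, covariance and uniform µ are illustrative choices.

```python
import numpy as np

# candidate set: Q equispaced points in [0, 1], uniform mu (illustrative)
nq = 101
xq = np.linspace(0.0, 1.0, nq)
Qmat = np.exp(-50.0 * (xq[:, None] - xq[None, :]) ** 2)  # {Q}_kl = C(x(k) - x(l))
W = np.eye(nq) / nq                                      # w_k = 1/Q for uniform mu
QWQ = Qmat @ W @ Qmat                                    # computed once, reused

def imse(J):
    """IMSE(Xn) ~= 1 - trace(Q_JJ^{-1} {QWQ}_JJ) for the design {x(k), k in J}
    (no-trend case)."""
    A = Qmat[np.ix_(J, J)]
    B = QWQ[np.ix_(J, J)]
    return 1.0 - np.trace(np.linalg.solve(A, B))

print(imse([10, 30, 50, 70, 90]))   # a 5-point design on the grid
```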

Ex. of IMSE-optimal design (Gauthier & P., 2014, 2016):

X = a regular grid with 37² = 1 369 points
C(x − x′) = C1({x}1 − {x′}1) × C2({x}2 − {x′}2), with Ci(x − x′) = (1 + (25/√3)|x − x′|) exp[−(25/√3)|x − x′|] (Matérn 3/2)

[figures: the grid weights ωk of the measure of interest µ over the grid, and the resulting 33-point optimal design among the grid points]

Sequential construction of an optimal design:

For IMSE: nothing special; at step n + 1, Xn+1 = {Xn, xn+1} with

x∗n+1 = arg min_{x∈X} IMSE({Xn, x})

For MMSE: do not choose x∗n+1 = arg min_{x∈X} MMSE({Xn, x})!
➠ take instead x∗n+1 = arg max_{x∈X} ρn(x)

[figures: ρn(x) over [0, 1] for n = 5, 6, …, 17, each new point placed at the current maximizer of ρn(·); the maximum of ρn decreases from about 0.7 at n = 5 to below 2 · 10⁻⁴ at n = 17]
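The MMSE-greedy rule above is easy to prototype. The sketch below works on a finite candidate grid and uses the no-trend expression ρn(x) = 1 − cn⊤(x)Cn⁻¹cn(x) (the plots on the slides use ordinary kriging, which behaves similarly); the grid and covariance are illustrative.

```python
import numpy as np

def rho_all(Cq, J):
    """rho_n(x) at every candidate (no-trend case): 1 - cn'(x) Cn^{-1} cn(x)."""
    Cn = Cq[np.ix_(J, J)]          # correlations among the current design points
    Cc = Cq[J, :]                  # cn(x) for all candidates, one column each
    return 1.0 - np.einsum('kq,kq->q', Cc, np.linalg.solve(Cn, Cc))

xq = np.linspace(0.0, 1.0, 201)                        # candidate grid
Cq = np.exp(-50.0 * (xq[:, None] - xq[None, :]) ** 2)  # illustrative covariance
J = [100]                                              # start from the centre
for _ in range(12):                                    # grow the design to n = 13
    J.append(int(np.argmax(rho_all(Cq, J))))           # x_{n+1} = arg max rho_n
print(np.sort(xq[J]))                                  # points spread over [0, 1]
```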

1.3 Maximum Entropy Sampling (Shewry and Wynn, 1987)

Without trend (r(x) = 0 ∀x):
➤ zQ ≜ the vector with components Z(x(k)), x(k) ∈ XQ
➤ zn ≜ the vector with components Z(xi), i = 1, …, n (observations)
➤ H1(z) ≜ −∫ ϕ(z) log[ϕ(z)] dz, the Shannon entropy of ϕ(z) = a measure of "dispersion";
H1(z1|z2) ≜ the conditional entropy of z1 given z2 = ∫ [−∫ ϕ(z1|z2) log[ϕ(z1|z2)] dz1] ϕ(z2) dz2

We get H1(zQ) = H1(zn) + E{H1(zQ|zn)}, where H1(zQ) is constant and E{H1(zQ|zn)} is the term to be minimized

Minimize E{H1(zQ|zn)} w.r.t. Xn ⇔ maximize H1(zn)

Z(x) is Gaussian ➠ maximize det[Cn] = an intra-distances criterion

Sequential construction of an optimal design:

$$x_{n+1} = \arg\max_{x\in\mathscr{X}} \det[C_{n+1}] = \arg\max_{x\in\mathscr{X}} \det\begin{bmatrix} C_n & c_n(x) \\ c_n^\top(x) & 1 \end{bmatrix} = \arg\max_{x\in\mathscr{X}} \det[C_n]\;\underbrace{\bigl(1 - c_n^\top(x)\,C_n^{-1}\,c_n(x)\bigr)}_{=\;\rho_n(x)}$$

➠ xn+1 = arg max_{x∈X} ρn(x)

The designs obtained are typically space-filling:

Johnson et al. (1990): if C(x − x′) = c(‖x − x′‖) with c(·) decreasing, then X∗n optimal for ΦMm(·) (Maximin-optimal) tends to be optimal for det[Cn] with covariance Ca(x − x′) = [C(x − x′)]^a when a → ∞

➠ there are design points xi on the boundary of X
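The block-determinant factorization used above, det[Cn+1] = det[Cn] ρn(x), is the Schur-complement identity; a quick numerical check with arbitrary points:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((6, 2))     # 5 design points plus 1 candidate (arbitrary)
C = np.exp(-5.0 * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
Cn, cn = C[:5, :5], C[:5, 5]
rho = 1.0 - cn @ np.linalg.solve(Cn, cn)           # rho_n(x), no-trend case
print(np.linalg.det(C), np.linalg.det(Cn) * rho)   # equal: Schur complement
```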

[diagram: the same criteria map as before, with "Max. Entropy Sampling" added, pointing to the intra-distances (Maximin, energy) side, next to "Minimize max. of MSE"]

2 Optimal design for linear regression

2.1 Linear regression

Observations yi = y(xi) = r⊤(xi)γ + εi, γ ∈ Rp, with (εi) i.i.d., E{εi} = 0, var{εi} = σ² ∀i

Estimation of γ by Least Squares (LS):

$$\hat\gamma^{\,n} = (R_n^\top R_n)^{-1} R_n^\top y_n, \quad\text{with } y_n = (y_1,\ldots,y_n)^\top \text{ and } R_n = \begin{bmatrix} r^\top(x_1) \\ \vdots \\ r^\top(x_n) \end{bmatrix}$$

E{γ̂n} = γ (unbiased), and

$$\mathrm{cov}(\hat\gamma^{\,n}) = \sigma^2 (R_n^\top R_n)^{-1} = \frac{\sigma^2}{n} \Bigl[\,\underbrace{\frac{1}{n}\sum_{i=1}^{n} r(x_i)\,r^\top(x_i)}_{M_n}\,\Bigr]^{-1}$$

cov(γ̂n) = (σ²/n) Mn⁻¹, with Mn = M(Xn) = (1/n) ∑_{i=1}^n r(xi) r⊤(xi) ∈ Rp×p = the information matrix (per observation)

Optimal design X∗n: maximizes a scalar function Φ(·) of Mn (with Φ(·) Loewner-increasing)

• E-optimality: maximize λmin(Mn) (minimizes the longest axis of the confidence ellipsoids for γ)

• A-optimality: maximize −trace[Mn⁻¹] ⇔ maximize 1/trace[Mn⁻¹] (minimizes the sum of the squared lengths of the axes of the confidence ellipsoids for γ)

• more generally, L-optimality: maximize −trace[L Mn⁻¹] (we only consider the case of L symmetric positive definite)

• D-optimality: maximize log det Mn (minimizes the volume of the confidence ellipsoids for γ). Very much used: a D-optimal design is invariant under reparametrization,

$$\det M'_n(\beta(\gamma)) = \det M_n(\gamma)\;{\det}^{-2}\Bigl(\frac{\partial\beta}{\partial\gamma^\top}\Bigr)$$

➠ often leads to repeating the same experimental conditions (replications): we assumed i.i.d. errors εi, so several observations at the same xi carry information
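A compact sketch of these criteria; the regressor r(x) = (1, x, x²)⊤ anticipates the quadratic example of § 2.3, and the design {0, 1, 2} is just an illustration.

```python
import numpy as np

def info_matrix(X, r):
    """Mn = (1/n) sum_i r(xi) r(xi)' for a design X and regressor map r."""
    R = np.array([r(x) for x in X])     # Rn: one row r'(xi) per design point
    return R.T @ R / len(X)

phi_D = lambda M: np.log(np.linalg.det(M))     # D-optimality (to be maximized)
phi_A = lambda M: -np.trace(np.linalg.inv(M))  # A-optimality
phi_E = lambda M: np.linalg.eigvalsh(M).min()  # E-optimality

r = lambda x: np.array([1.0, x, x * x])        # quadratic model (used in 2.3)
M = info_matrix([0.0, 1.0, 2.0], r)            # illustrative 3-point design
print(phi_D(M), phi_A(M), phi_E(M))
```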

2.2 Exact design

n observations at x1, …, xn, xi ∈ Rd

Maximize Φ(Mn) w.r.t. Xn = {x1, …, xn} ∈ Rn×d, with Mn = M(Xn) = (1/n) ∑_{i=1}^n r(xi) r⊤(xi)

➤ If the problem dimension n × d is not too big ➠ a "standard" algorithm (but careful with constraints and local optima!)

➤ Otherwise ➠ a specific algorithm

Exchange method: at step k, exchange one support point xj of Xkn = (x1, …, xj, …, xn) with a better one x∗ in X (better for Φ(·))

Fedorov (1972) algorithm:

At each iteration k, consider all n possible exchanges xj ↔ x∗j successively, each time starting from Xkn = (x1, …, xj, …, xn); retain the «best» one among these n ➠ Xk+1n

One iteration → n optimizations of dimension d, followed by ranking n criterion values
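A sketch of the one-point exchange idea. To keep it short, the inner d-dimensional optimization over X is replaced by a search over a finite candidate set (a common simplification, not part of Fedorov's original formulation), and Φ is D-optimality.

```python
import numpy as np

def fedorov_exchange(X, cand, r, phi, max_iter=50):
    """Fedorov-style one-point exchange over a finite candidate set: try
    replacing every design point with every candidate, keep the single best
    improving exchange, repeat until no improvement (a local optimum only)."""
    X = list(X)
    M = lambda pts: sum(np.outer(r(x), r(x)) for x in pts) / len(pts)
    for _ in range(max_iter):
        best, cur = None, phi(M(X))
        for j in range(len(X)):
            for x in cand:
                val = phi(M(X[:j] + [x] + X[j + 1:]))
                if val > cur + 1e-12:
                    best, cur = (j, x), val
        if best is None:           # dead end: no improving single exchange
            break
        X[best[0]] = best[1]
    return X

r = lambda x: np.array([1.0, x, x * x])
phi = lambda M: np.log(max(np.linalg.det(M), 1e-300))  # guard singular M
cand = np.linspace(0.0, 2.0, 41)
print(sorted(fedorov_exchange([0.1, 0.9, 1.7], cand, r, phi)))  # -> [0, 1, 2]
```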

DETMAX algorithm (Mitchell, 1974):

If one additional observation were allowed, the optimal choice would be X^{k+}_{n+1} = (x1, …, xj, …, xn, x∗n+1)

Then, remove one support point to return to an n-point design:
➔ consider all n + 1 possible cancellations, retain the least penalizing one in the sense of Φ(·)
➔ globally, this exchanges some xj with x∗n+1
[= an excursion of length 1; longer excursions are possible…]

One iteration → 1 optimization of dimension d, followed by ranking n + 1 criterion values

DETMAX has simpler iterations than Fedorov, but usually requires more iterations

Dead ends are possible:
• DETMAX: the point to be removed is xn+1 itself
• Fedorov: no possible improvement when optimizing one xi at a time

▲ both give local optima only ▲

Other methods:

• Branch and bound: guaranteed convergence, but complicated (Welch, 1982)

• Rounding an optimal design measure (support points xi and associated weights w∗i, i = 1, …, m, presented next in § 2.3): choose n integers ri (ri = number of replications of observations at xi) such that ∑_{i=1}^m ri = n and ri/n ≈ w∗i (e.g., maximize min_{i=1,…,m} ri/w∗i = Adams apportionment, see Pukelsheim & Rieder, 1992)
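The rounding rule can be sketched as a greedy allocation that directly targets the stated objective max min_i ri/w∗i, giving each next observation to the support point with the smallest current ratio; this is a sketch of the idea, not a verbatim transcription of the Pukelsheim–Rieder procedure.

```python
import numpy as np

def adams_round(w, n):
    """Greedy rounding of an approximate design: allocate the n observations
    one at a time to the support point with the smallest current ratio ri/wi,
    keeping min_i ri/wi as large as possible at every step."""
    w = np.asarray(w, dtype=float)
    r = np.zeros(len(w), dtype=int)
    for _ in range(n):
        r[np.argmin(r / w)] += 1    # 0/wi = 0, so every point gets one first
    return r

print(adams_round([1/3, 1/2, 1/6], 10))   # -> [3 5 2], ri/n close to wi
```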

2.3 Approximate design theory

(Chernoff, 1953; Kiefer and Wolfowitz, 1960; Fedorov, 1972; Silvey, 1980; Pukelsheim, 1993) …

Mn = M(Xn) = (1/n) ∑_{i=1}^n r(xi) r⊤(xi)

(the additive form is essential — related to the independence of observations)

If several xi coincide (repetitions), with only m < n different xi:

M(Xn) = ∑_{i=1}^m (ri/n) r(xi) r⊤(xi)

ri/n = the proportion of observations collected at xi = the «percentage of experimental effort» at xi = the weight wi of support point xi

M(Xn) = ∑_{i=1}^m wi r(xi) r⊤(xi)

➠ design Xn ⇔ {x1 … xm; w1 … wm} with ∑_{i=1}^m wi = 1
➠ a normalized discrete distribution on the xi, with the constraints wi = ri/n

➠ Relax the constraints: only enforce wi ≥ 0 and ∑_{i=1}^m wi = 1
➠ ξ = a discrete probability measure on X, with support points xi and associated weights wi = an "approximate design"

More general expression: ξ = any probability measure on X,

$$M(\xi) = \int_{\mathscr{X}} r(x)\,r^\top(x)\,\xi(\mathrm{d}x) \quad\text{with}\quad \int_{\mathscr{X}} \xi(\mathrm{d}x) = 1$$

M(ξ) ∈ the convex closure of M = the set of rank-1 matrices M(δx) = r(x) r⊤(x)

M(ξ) is symmetric p × p ⇒ it lives in a q-dimensional space, with q = p(p+1)/2

[figure: the convex closure of M for q = 2, with ξ = w1 δx1 + w2 δx2 + w3 δx3 (3 points are enough for q = 2)]

Carathéodory theorem:

M(ξ) can be written as a linear combination of at most q + 1 elements of M:

M(ξ) = ∑_{i=1}^m wi r(xi) r⊤(xi), with m ≤ p(p+1)/2 + 1

⇒ it is enough to consider discrete probability measures with at most p(p+1)/2 + 1 support points (true in particular for the optimal design!)

[Even better: for many criteria Φ(·), if ξ∗ is optimal (maximizes Φ[M(ξ)]) then M(ξ∗) is on the boundary of the convex closure of M and p(p+1)/2 support points are enough]

Suppose we have found an optimal ξ∗ = ∑_{i=1}^m w∗i δxi
☞ for a given n, choose the ri so that ri/n ≃ w∗i → rounding of an approximate design

Why are design measures interesting? How do they simplify the optimization problem?

➠ Maximize Φ(·) concave w.r.t. M(ξ) over a convex set

Ex: D-optimality: ∀ M1 ≻ O, M2 ⪰ O with M2 ∝̸ M1, and ∀α, 0 < α < 1,
log det[(1 − α)M1 + αM2] > (1 − α) log det M1 + α log det M2
⇒ log det[·] is (strictly) concave

convex set + concave criterion ⇒ a unique optimum!

ξ∗ is optimal ⇔ the directional derivative of the criterion is ≤ 0 in all directions

⇒ the "Equivalence Theorem" (Kiefer & Wolfowitz, 1960)

Ξ = the set of probability measures on X, Φ(·) concave, φ(ξ) = Φ[M(ξ)]

$$F_\phi(\xi;\nu) = \lim_{\alpha\to0^+} \frac{\phi[(1-\alpha)\xi+\alpha\nu]-\phi(\xi)}{\alpha} = \text{the directional derivative of } \phi(\cdot) \text{ at } \xi \text{ in the direction } \nu$$

Equivalence Theorem: ξ∗ maximizes φ(ξ) ⇔ max_{ν∈Ξ} Fφ(ξ∗; ν) ≤ 0

➔ This takes a simple form when Φ(·) is differentiable:

ξ∗ maximizes φ(ξ) ⇔ max_{x∈X} Fφ(ξ∗; δx) ≤ 0

☞ Check the optimality of ξ∗ by plotting Fφ(ξ∗; δx)

Ex: D-optimal design

ξ∗D maximizes log det[M(ξ)] w.r.t. ξ ∈ Ξ
⇔ max_{x∈X} d(ξ∗D, x) ≤ p
⇔ ξ∗D minimizes max_{x∈X} d(ξ, x) w.r.t. ξ ∈ Ξ

where d(ξ, x) = r⊤(x) M⁻¹(ξ) r(x). Moreover, d(ξ∗D, xi) = p = dim(γ) for any xi = support point of ξ∗D

Ex.: r(x) = (1, x, x²)⊤ (p = 3), i.i.d. errors, X = [0, 2] ➠ plot d(ξ, x) as a function of x

ξ∗D = {0, 1, 2; 1/3, 1/3, 1/3}: d(ξ∗D, x) ≤ 3 = p on X, with equality exactly at the three support points
[figure: d(ξ∗D, x) on [0, 2], touching 3 at x = 0, 1, 2]

ξ = {0, 1.5, 2; 1/3, 1/2, 1/6}: max_{x∈X} d(ξ, x) > p, so ξ is not D-optimal
[figure: d(ξ, x) on [0, 2], exceeding p = 3, with values up to about 6]
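A numerical check of this example through the Equivalence Theorem: compute d(ξ, x) = r⊤(x)M⁻¹(ξ)r(x) on a grid and compare its maximum with p.

```python
import numpy as np

r = lambda x: np.array([1.0, x, x * x])    # quadratic model, p = 3

def d_func(supp, w, xs):
    """d(xi, x) = r(x)' M(xi)^{-1} r(x) for a discrete design measure."""
    M = sum(wi * np.outer(r(xi), r(xi)) for xi, wi in zip(supp, w))
    Minv = np.linalg.inv(M)
    return np.array([r(x) @ Minv @ r(x) for x in xs])

xs = np.linspace(0.0, 2.0, 201)
print(d_func([0.0, 1.0, 2.0], [1/3, 1/3, 1/3], xs).max())   # = 3 = p: optimal
print(d_func([0.0, 1.5, 2.0], [1/3, 1/2, 1/6], xs).max())   # > 3: not optimal
```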

The KW Equivalence Theorem relates optimality in the γ space (parameters) to optimality in the y space (observations):

n var[r⊤(x) γ̂n] = σ² r⊤(x) M⁻¹(ξ) r(x) = σ² d(ξ, x) (i.i.d. errors)

D-optimality ⇔ G-optimality

➠ ξ∗D minimizes the maximum value of the prediction variance over X

➠ put the next observation where d(ξ, x) is large
[figure: η(x, γ̂n) ± 2 standard deviations for the quadratic example on [0, 2]]

Construction of an optimal design measure

Central idea (▲ for a differentiable Φ(·) ▲): use the steepest-ascent direction

Fedorov–Wynn algorithm:

1: Choose ξ1 non-degenerate (det M(ξ1) > 0)

2: Compute x∗k = arg max_{x∈X} Fφ(ξk; δx); if Fφ(ξk; δx∗k) < ε, stop: ξk is ε-optimal

3: ξk+1 = (1 − αk)ξk + αk δx∗k (the delta measure at x∗k) [vertex direction]; k → k + 1, return to step 2

Step size αk?
➠ αk = arg max_α φ(ξk+1) = [d(ξk, x∗k) − p] / {p [d(ξk, x∗k) − 1]} for D-optimality (Fedorov, 1972) → monotone convergence
➠ or αk > 0, lim_{k→∞} αk = 0, ∑_{k=1}^∞ αk = ∞ ((Wynn, 1970) for D-optimality)
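A sketch of the vertex-direction algorithm for D-optimality on a finite candidate set, using Fφ(ξ; δx) = d(ξ, x) − p and Fedorov's optimal step; the candidate grid on [0, 2] reproduces the quadratic example.

```python
import numpy as np

def fedorov_wynn_D(cand, r, iters=500, eps=1e-4):
    """Vertex-direction algorithm for D-optimality on a finite candidate set,
    with Fedorov's step alpha_k = (d - p) / (p (d - 1))."""
    R = np.array([r(x) for x in cand])           # regressors of all candidates
    p = R.shape[1]
    w = np.full(len(cand), 1.0 / len(cand))      # xi_1: uniform, non-degenerate
    for _ in range(iters):
        M = R.T @ (w[:, None] * R)               # M(xi_k)
        d = np.einsum('ij,jk,ik->i', R, np.linalg.inv(M), R)   # d(xi_k, x)
        k = int(np.argmax(d))
        if d[k] - p < eps:                       # Equivalence Th.: eps-optimal
            break
        alpha = (d[k] - p) / (p * (d[k] - 1.0))
        w = (1.0 - alpha) * w                    # move towards delta_{x_k*}
        w[k] += alpha
    return w

cand = np.linspace(0.0, 2.0, 41)
w = fedorov_wynn_D(cand, lambda x: np.array([1.0, x, x * x]))
print([(x, round(wi, 3)) for x, wi in zip(cand, w) if wi > 0.01])
# the mass concentrates at x = 0, 1, 2 with weights close to 1/3
```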

Remarks:

• Sequential design, one xi at a time enters M(X):
M(Xk+1) = [k/(k+1)] M(Xk) + [1/(k+1)] r(xk+1) r⊤(xk+1), with xk+1 = arg max_{x∈X} Fφ(ξk; δx)
⇔ Wynn's algorithm with αk = 1/(k+1)

• Guaranteed convergence to the optimum

• There exist faster methods:
– remove support points from ξk (≈ allow αk to be < 0) (Atwood, 1973; Böhning, 1985, 1986)
– combine with gradient projection (or a second-order method) (Wu, 1978)
– use a multiplicative algorithm (Titterington, 1976; Torsney, 1983, 2009; Yu, 2010) (for A- and D-optimality, far from the optimum)
– combine different methods (Yu, 2011)

Still an active topic — especially for non-differentiable Φ(·)…

2.4 Tensor-product models

D-optimality (also true for A-optimality under some conditions (Schwabe, 1996))

[r(k)(x)]⊤θ(k) ≜ ∑_{i=0}^{dk} θ(k)i {x}k^i, a polynomial of degree dk in {x}k, with dim(θ(k)) = pk = 1 + dk

Global model for x = ({x}1, {x}2, …, {x}d)⊤:

r⊤(x)γ = ∏_{k=1}^d [r(k)(x)]⊤θ(k), of total degree ∑_{k=1}^d dk, with dim(γ) = ∏_{k=1}^d pk

Example:

$$r^\top(x)\gamma = (\theta^{(1)}_0 + \theta^{(1)}_1\{x\}_1 + \theta^{(1)}_2\{x\}_1^2)\times(\theta^{(2)}_0 + \theta^{(2)}_1\{x\}_2 + \theta^{(2)}_2\{x\}_2^2)$$
$$= \gamma_0 + \gamma_1\{x\}_1 + \gamma_2\{x\}_2 + \gamma_{12}\{x\}_1\{x\}_2 + \gamma_{11}\{x\}_1^2 + \gamma_{22}\{x\}_2^2 + \gamma_{112}\{x\}_1^2\{x\}_2 + \gamma_{122}\{x\}_1\{x\}_2^2 + \gamma_{1122}\{x\}_1^2\{x\}_2^2$$

D-optimal design (approximate theory) = the tensor product of d one-dimensional D-optimal designs (true for any type of model, not only polynomials)

Polynomial of degree k: the D-optimal design is supported on k + 1 points (on [−1, 1]: the roots of (1 − t²)P′k(t), with Pk(t) the k-th Legendre polynomial), all with equal weight 1/(k + 1)

[figures: tensor-product D-optimal designs in dimension 2 for d1 = d2 = 2, 3, 4, 5, i.e., 9, 16, 25 and 36 points with equal weights 1/9, 1/16, 1/25, 1/36]
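These supports are easy to generate with numpy's Legendre utilities; a sketch for one factor and its 2-d tensor product:

```python
import numpy as np
from numpy.polynomial import legendre

def d_optimal_1d(k):
    """Support of the D-optimal design for a degree-k polynomial on [-1, 1]:
    the k + 1 roots of (1 - t^2) P_k'(t), each carrying weight 1/(k + 1)."""
    Pk = np.zeros(k + 1); Pk[k] = 1.0                  # P_k in the Legendre basis
    interior = legendre.legroots(legendre.legder(Pk))  # roots of P_k'
    return np.sort(np.concatenate(([-1.0], interior, [1.0])))

t = d_optimal_1d(3)                              # [-1, -1/sqrt(5), 1/sqrt(5), 1]
grid = np.array([(a, b) for a in t for b in t])  # 2-d tensor product: 16 points
print(t, len(grid))                              # weights would all be 1/16
```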

Sum of polynomials?

r⊤(x)γ = ∑_{k=1}^d [r(k)(x)]⊤θ(k), of total degree max_{k=1,…,d} dk, with dim(γ) = (∑_{k=1}^d pk) − (d − 1) = ∑_{k=1}^d dk + 1 (the d constant terms merge into a single one)

Example:

$$r^\top(x)\gamma = (\theta^{(1)}_0 + \theta^{(1)}_1\{x\}_1 + \theta^{(1)}_2\{x\}_1^2) + (\theta^{(2)}_0 + \theta^{(2)}_1\{x\}_2 + \theta^{(2)}_2\{x\}_2^2) = \gamma_0 + \gamma_1\{x\}_1 + \gamma_2\{x\}_2 + \gamma_{11}\{x\}_1^2 + \gamma_{22}\{x\}_2^2$$

(no interaction term)

Again, the D-optimal design (approximate theory) = the tensor product of d one-dimensional D-optimal designs (Schwabe, 1996)

Difficult to apply in high dimension: d polynomials of degree k ➠ (k + 1)^d support points!
But this is a general lesson, with a possible extension towards Gaussian process models and kriging

2.5 Consequences for space-filling design

D-optimality + polynomials ➠ more points close to the boundary as the degree increases

Erdős–Turán theorem: the roots r of orthonormal polynomials on [0, 1] are asymptotically distributed with the arcsine law, with density ϕ0(r) = 1/[π √(r(1 − r))]

➠ Should we put more points close to the boundary?

In order to counter the boundary effect (Dette and Pepelyshev, 2010), proceed as follows (a code sketch follows the list):

• Take a "standard" space-filling design (e.g., Maximin, Lh Maximin, LDS),
• for each j = 1, …, d, transform the j-th coordinates {xi}j with T: x ↦ z = T(x) = [1 + cos(π x)]/2 (x ∼ uniform → z ∼ arcsine),
• use the transformed design Zn = (z1, …, zn)
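The arcsine transformation has the closed form given above, so the sketch is one line per coordinate; the 11-point 1-d design is an illustrative stand-in for a Maximin or LDS design.

```python
import numpy as np

def arcsine_transform(X):
    """Coordinate-wise z = T(x) = (1 + cos(pi x)) / 2: pushes a design on
    [0, 1]^d towards the boundary (uniform -> arcsine)."""
    return (1.0 + np.cos(np.pi * np.asarray(X))) / 2.0

X = (np.arange(11) + 0.5) / 11           # illustrative 1-d space-filling design
print(np.sort(arcsine_transform(X)))     # points now crowd near 0 and 1
```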

[figures, dimension 2: the D-optimal design for degree-5 polynomials (36 points, weights 1/36), and a transformed (arcsine) Maximin-optimal design with n = 36]

Illustration of the boundary effect: d = 1, n = 11 observations in [0, 1], ordinary kriging with covariance C(t) = exp(−50 t²) ➠ plot of ρn(x)

[figures: ρn(x) for Xn Maximin (maximum about 0.014) and for Xn miniMax (maximum about 0.09), shown together]

Uniform distribution of the design points
⇒ prediction near the boundaries relies on fewer points
⇒ precision is worse close to the boundaries

… But the transformation T: x ↦ z = T(x) = [1 + cos(π x)]/2 may be too "strong":

[figures: ρn(x) for the Maximin design (maximum about 0.014) and for the transformed (arcsine) Maximin design, whose maximum grows to about 0.12]

The arcsine distribution maximizes

$$\Phi^{[0]}(\xi) = \exp\Bigl[\int_0^1\!\!\int_0^1 \log\|x-y\|\;\xi(\mathrm{d}x)\,\xi(\mathrm{d}y)\Bigr]$$

(the continuous version of φ[0](X) = exp[∑_{i<j} µij log(dij)], see § I-1.6)

➤ Maximization of

$$\Phi^{[q]}(\xi) = \Bigl[\int_0^1\!\!\int_0^1 \|x-y\|^{-q}\;\xi(\mathrm{d}x)\,\xi(\mathrm{d}y)\Bigr]^{-1/q}, \quad 0 < q < 1$$

(the continuous version of φ[q](X) = [∑_{i<j} µij dij^{−q}]^{−1/q}, see § I-1.6) yields a measure ξ with density

$$\varphi_q(x) = \frac{x^{(q-1)/2}(1-x)^{(q-1)/2}}{B\bigl(\tfrac{q+1}{2},\tfrac{q+1}{2}\bigr)}$$

(a Beta distribution) (Zhigljavsky et al., 2010); it tends to the arcsine law when q → 0, and to the uniform when q → 1

In order to counter the boundary effect (Dette and Pepelyshev, 2010):

Take a “standard” space-filling design (e.g., Maximin, Lh Maximin, LDS),

for each j = 1, . . . , d, transform the j-th coordinates {xi}j with T : x ↦ z = T(x) such that x = ∫₀^z ϕq(t) dt (x ∼ uniform → z ∼ ϕq),

use the transformed design Zn = (z1, . . . , zn) (a code sketch is given after the figure note below)

[Figures (dimension 2): D-optimal design for 5-degree polynomials (36 points, weights = 1/36); transformed (arcsine) Maximin-optimal design with n = 36; transformed (Beta, q = 0.4) Maximin-optimal design with n = 36]
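A minimal sketch of this recipe in Python, assuming scipy is available; `beta_transform` is an illustrative helper name, not something from the slides:

```python
import numpy as np
from scipy.stats import beta, qmc

def beta_transform(X, q):
    """Coordinate-wise inverse-CDF transform: if the coordinates of X are
    uniform on [0, 1], those of the output follow the symmetric Beta
    distribution with parameters ((q+1)/2, (q+1)/2), i.e. the density phi_q
    (arcsine as q -> 0, uniform as q -> 1), pushing points towards the
    boundaries."""
    a = (q + 1.0) / 2.0
    return beta.ppf(X, a, a)

# usage: transform a 36-point Latin-hypercube design in dimension 2
X = qmc.LatinHypercube(d=2, seed=0).random(36)  # "standard" space-filling design
Z = beta_transform(X, q=0.4)                    # transformed design Z_n
```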

. . . For a suitably chosen Beta-transformation (q = 0.84)

[Figures: ρn(x) for the Maximin design (max ≈ 0.014) and for the transformed (Beta) Maximin design (max ≈ 0.009), shown separately and overlaid]

Choice of a suitable q? ➠ Optimize a precision criterion based on ρn(x) (depends on the covariance C(·))

▲ requires X = hypercube ▲
▲ if d is big, many points are on the boundary (or at the vertices!) of X ▲

3 Optimal design for Bayesian prediction

3.1 Karhunen-Loève decomposition of a Gaussian process

Model without trend: f(x) = Z(x), a Gaussian process with E{Z(x)} = 0, E{Z(x)Z(x′)} = C(x, x′) (= C(x − x′) if stationary)

IMSEµ(Xn) ≜ ∫X E{[Z(x) − E{Z(x)|yn}]²} dµ(x)

The integral operator Tµ defined by

∀f ∈ L²(X, µ), ∀x ∈ X, Tµ[f](x) = ∫X f(x′) C(x, x′) dµ(x′)

is diagonalisable:
eigenvalues λi, i = 1, 2, 3 . . . (in decreasing order)
associated eigenfunctions ϕi(·) (extended over X), with ∫X ϕi(x)ϕj(x) dµ(x) = δij

Z′(x) ≜ PHµ[Zx] = ∑i ζi √λi ϕi(x), with all ζi i.i.d. N(0, 1)
PHµ = orthogonal projection onto the space “which contributes to IMSEµ”

Z′(x) = ∑i γi ϕi(x), where the r.v. γi are independent N(0, λi)

For a given truncation level m,

Z′(x) = ∑_{i=1}^m γi ϕi(x) + ∑_{i>m} γi ϕi(x) ≃ Z″(x) = ∑_{i=1}^m γi ϕi(x) + ε(x)

with E{ε(xi)} = 0, E{ε(xi)ε(xj)} = σ² δij and σ² = ∑_{i>m} λi

Z″(xi) = φ⊤(xi)γ + εi, where φ(x) = (ϕ1(x), . . . , ϕm(x))⊤
= linear regression model (as in § 2.1), with the eigenfunctions ϕi(·), i = 1, . . . , m, instead of polynomials

(Fedorov, 1996) ➠ construct an optimal design for this model

3.2 Bayesian prediction for Z″(xi) = φ⊤(xi)γ + εi

LS estimation:

γ̂n = (Φn⊤Φn)⁻¹ Φn⊤ yn, with yn = (y1, . . . , yn)⊤ and Φn the n × m matrix with rows φ⊤(x1), . . . , φ⊤(xn)

cov(γ̂n) = σ² (Φn⊤Φn)⁻¹ = (σ²/n) Mn⁻¹, where Mn = (1/n) ∑_{i=1}^n φ(xi)φ⊤(xi)

Prediction at x: ηn(x) = φ⊤(x) γ̂n

IMSE(Xn) = ∫X φ⊤(x) cov(γ̂n) φ(x) dµ(x) = (σ²/n) trace[Mn⁻¹]
= A-optimality criterion (requires n ≥ m to have a full-rank Mn)

Bayesian estimation: prior distribution N(0, Λm) for γ, with Λm = diag{λ1, . . . , λm}

γ̂n = [Φn⊤Φn/σ² + Λm⁻¹]⁻¹ [Φn⊤ yn/σ²]

cov(γ̂n) = [Φn⊤Φn/σ² + Λm⁻¹]⁻¹ = (σ²/n) MB⁻¹(Xn), where MB(Xn) = Mn + (σ²/n) Λm⁻¹

Prediction at x: ηn(x) = φ⊤(x) γ̂n

IMSE(Xn) = ∫X φ⊤(x) cov(γ̂n) φ(x) dµ(x) = (σ²/n) trace[MB⁻¹(Xn)]
= A-optimality criterion applied to MB(Xn) (MB(Xn) has full rank for any m!)
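As a minimal numerical sketch of the two IMSE expressions above (illustrative function names; Phi_n collects the values of the m eigenfunctions at the n design points):

```python
import numpy as np

def ls_imse(Phi_n, sigma2):
    """(sigma2/n) * trace(M_n^{-1}) with M_n = Phi_n^T Phi_n / n;
    requires n >= m for M_n to be invertible."""
    n = Phi_n.shape[0]
    M_n = Phi_n.T @ Phi_n / n
    return (sigma2 / n) * np.trace(np.linalg.inv(M_n))

def bayes_imse(Phi_n, lam, sigma2):
    """(sigma2/n) * trace(M_B^{-1}) with M_B = M_n + (sigma2/n) Lambda_m^{-1};
    M_B is positive definite whatever n and m (the prior regularizes)."""
    n = Phi_n.shape[0]
    M_B = Phi_n.T @ Phi_n / n + (sigma2 / n) * np.diag(1.0 / np.asarray(lam))
    return (sigma2 / n) * np.trace(np.linalg.inv(M_B))
```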

3.3 IMSE-optimal design

All the machinery of optimal design for parametric models is available (Pilz, 1983)

Exact design: Spöck and Pilz (2010) for the prediction of spatial random fields (but no guaranteed convergence to the optimum)

Approximate design: minimize Ψ(ξ) = trace[MB⁻¹(ξ)], with ξ a probability measure over X and

MB(ξ) = ∫ φ(x)φ⊤(x) ξ(dx) + (σ²/n) Λm⁻¹

➤ guaranteed convergence towards an optimal ξ*, with N* support points

In practice: eigen-decomposition

➤ use a finite Q-point set XQ = {x(1), . . . , x(Q)}
➤ diagonalize QW, where {Q}kℓ = C(x(k), x(ℓ)) and W = diag{w1, . . . , wQ} (wk = 1/Q when µ is uniform)
➔ QW = PΛP⁻¹ with P⊤WP = IQ and Λ = diag{λ1, . . . , λQ}, λ1 ≥ · · · ≥ λQ (a code sketch is given below)

MB(ξ) = ∑_{k=1}^Q pk φ(x(k))φ⊤(x(k)) + [(∑_{i=m+1}^Q λi)/n] diag{λ1⁻¹, . . . , λm⁻¹}, m < Q

where pk = ξ{x(k)} and {φ(x(k))}j = ϕj(x(k)) = Pkj, k = 1, . . . , Q, j = 1, . . . , m

➤ minimization of trace[MB⁻¹(ξ)] ➔ ξ*

p*k = 0 for many k, but some p*k > 0 are very small, and there may exist clusters of points, etc.
➔ aggregate the support points of ξ*
➔ remove some points (transferring their weights optimally onto others) (Gauthier & Pronzato, 2016)
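The diagonalization QW = PΛP⁻¹ with P⊤WP = IQ can be obtained from a standard symmetric eigendecomposition of W^{1/2}QW^{1/2}; a sketch (with the illustrative name `kl_discretized`):

```python
import numpy as np

def kl_discretized(Qmat, w):
    """Solve Q W = P Lam P^{-1} with P^T W P = I_Q, where W = diag(w).
    Reduction: with S = W^{1/2} Q W^{1/2} = U Lam U^T (U orthogonal),
    P = W^{-1/2} U satisfies both conditions."""
    sw = np.sqrt(np.asarray(w))
    S = sw[:, None] * Qmat * sw[None, :]   # symmetric, same eigenvalues as QW
    lam, U = np.linalg.eigh(S)             # eigh returns ascending order
    lam, U = lam[::-1], U[:, ::-1]         # reorder: lambda_1 >= lambda_2 >= ...
    P = U / sw[:, None]                    # P = W^{-1/2} U, so P^T W P = I_Q
    return lam, P
```

The entry P[k, j] then plays the role of ϕj(x(k)) in MB(ξ).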

The number N of points is not totally controlled, but two tuning parameters are available: m (truncation level) and n (take m ≈ n ≈ N)

N points ➠ initialization for the optimization of the true IMSE(XN) by any standard algorithm (the optimal points remain in the convex hull of X)

Example: d = 2, C(x, x′) = (1 + 10‖x − x′‖) exp(−10‖x − x′‖) (Matérn 3/2)
XQ = regular grid with Q = 33 × 33 = 1 089 points, σ² = ∑_{i>m} λi

m = n = 10 ➠ N = 12 [Figure: aggregation of the support of ξ* into N = 12 clusters]

m = 30, n = 10 ➠ N = 36 [Figure: the resulting 36-point design, compared with the miniMax-optimal design X36]
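The setting of this example can be reproduced as follows (a sketch reusing `kl_discretized` above; the kernel and grid are those stated in the example):

```python
import numpy as np

def matern32(X, Y, theta=10.0):
    """Matern 3/2 kernel C(x, x') = (1 + theta*r) exp(-theta*r), r = ||x - x'||."""
    r = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return (1.0 + theta * r) * np.exp(-theta * r)

t = np.linspace(0.0, 1.0, 33)
XQ = np.array([(a, b) for a in t for b in t])        # regular grid, Q = 1089
lam, P = kl_discretized(matern32(XQ, XQ), np.full(1089, 1.0 / 1089))
m = 10
sigma2 = lam[m:].sum()                               # sigma^2 = sum_{i>m} lambda_i
```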

Can be used for any X if d is not too big: use a finite XQ given by the first Q points of a LDS in X

With trend, f(x) = Z(x) + r⊤(x)β? Same thing (Spöck and Pilz, 2010), with slightly more complicated expressions

More advanced: avoid mixing the eigenfunctions ϕi(·) with the trend components {r}j(·) (Gauthier & Pronzato, 2016)

4 Beyond space filling

Optimal design for kriging: there is a hidden difficulty, the value of θ in the covariance C(·; θ) is unknown

➠ use the same observations to estimate θ and then construct ηn(x) with the estimate θ̂n plugged in (plug-in method)

➠ In particular, we may use θ̂n = Maximum Likelihood Estimator (MLE) (Z(x) is supposed to be Gaussian)

➠ there is a corrective term (Harville and Jeske, 1992; Abt, 1999):

ρ̂n(x; θ) = ρn(x; θ) + trace{Mθ⁻¹ [∂vn⊤(x; θ)/∂θ] Cn [∂vn(x; θ)/∂θ⊤]} (= Empirical Kriging (EK) variance)

with:
vn(x; θ) such that ηn(x) = vn⊤(x; θ) yn
Mθ = Fisher Information Matrix (FIM) for θ
Cn = covariance matrix of the n observations
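A sketch of the EK variance for the simplest case: zero trend (simple kriging), unit process variance, a scalar parameter θ, and derivatives by finite differences; the kernel is assumed to follow the matern32-style signature kernel(A, B, theta) used above, and the general expressions are in Harville and Jeske (1992):

```python
import numpy as np

def ek_variance(x, Xn, theta, kernel, h=1e-5):
    """rho_hat_n(x; theta) = rho_n(x; theta) + (dv/dtheta)^T C_n (dv/dtheta) / M_theta
    for simple kriging with process variance 1 and scalar theta."""
    def v_of(th):               # kriging weights v_n(x; th) = C_n(th)^{-1} c_n(x; th)
        return np.linalg.solve(kernel(Xn, Xn, th), kernel(Xn, x[None, :], th)[:, 0])
    Cn = kernel(Xn, Xn, theta)
    cn = kernel(Xn, x[None, :], theta)[:, 0]
    v = np.linalg.solve(Cn, cn)
    rho = 1.0 - cn @ v                                   # plug-in kriging variance
    dC = (kernel(Xn, Xn, theta + h) - kernel(Xn, Xn, theta - h)) / (2 * h)
    A = np.linalg.solve(Cn, dC)
    M_theta = 0.5 * np.trace(A @ A)                      # FIM for theta, zero-mean GP
    dv = (v_of(theta + h) - v_of(theta - h)) / (2 * h)   # d v_n(x; theta) / d theta
    return rho + (dv @ Cn @ dv) / M_theta                # corrective term added
```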

Example (Zimmerman, 2006): E{Z(x)Z(x′)} = θ^‖x−x′‖, θ = 0.3, X = regular 5 × 5 grid; optimal 4-point designs for:

prediction with known θ: X4 minimizes maxx∈X ρ4(x)
estimation of θ: X4 maximizes det Mθ
prediction with θ estimated: X4 minimizes MEK = maxx∈X ρ̂4(x; θ)

[Figures: the three optimal designs on the 5 × 5 grid]

Example (Müller et al., 2015): NH4 concentration in the North Sea (collaboration with MUMM, Belgium): simulated data, kriging with a Matérn 3/2 kernel

[Figures: ρ̂14(x; θ) for the miniMax-optimal design X*mM,n=14, and for the design X*14 minimizing MEK = maxx∈X ρ̂14(x; θ)]

Choosing Xn that minimizes MEK = maxx∈X ρ̂n(x; θ) is difficult

➠ Use a compromise criterion between space filling and “aggregation of points”:

for instance, take Xn that maximizes γ log det(Mβ) + (1 − γ) log det(Mθ) (Müller et al., 2011, 2015), with
Mβ = FIM for the trend parameters β (maximization → space-filling design)
Mθ = FIM for the correlation parameters θ (maximization → aggregation of points)
(a code sketch is given below)
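A sketch of this compound criterion for an ordinary-kriging model with a constant trend (so Mβ is a scalar) and a scalar correlation parameter θ; both "determinants" then reduce to scalars, and the kernel again follows the matern32-style signature assumed above:

```python
import numpy as np

def compound_criterion(Xn, theta, kernel, gamma=0.8, h=1e-5):
    """gamma * log det(M_beta) + (1 - gamma) * log det(M_theta), with
    M_beta = 1^T C_n^{-1} 1 (FIM of a constant trend, up to sigma^2) and
    M_theta = 0.5 * tr[(C_n^{-1} dC_n/dtheta)^2] (FIM of theta)."""
    n = len(Xn)
    Cn = kernel(Xn, Xn, theta)
    M_beta = np.ones(n) @ np.linalg.solve(Cn, np.ones(n))
    dC = (kernel(Xn, Xn, theta + h) - kernel(Xn, Xn, theta - h)) / (2 * h)
    A = np.linalg.solve(Cn, dC)
    M_theta = 0.5 * np.trace(A @ A)
    return gamma * np.log(M_beta) + (1 - gamma) * np.log(M_theta)
```

Maximizing this over a pool of candidate (e.g., Latin-hypercube) designs trades space filling (large Mβ) against the presence of a few close pairs of points (large Mθ).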

Example: n = 7, d = 2, C(x − x′; θ) = exp(−θ‖x − x′‖), θ = 0.7,
1 000 Lh designs (999 random + ♦ for a Maximin-optimal design)
MKV = maxx∈X ρ̂n(x; θ), Jα = det^α(Mβ) + det^{1−α}(Mθ) (α = 0.8)

[Figure: scatter plot of −log(Jα) against MKV for the 1 000 Lh designs]

However, the effect of the corrective term in ρ̂n(x; θ) = ρn(x; θ) + trace{Mθ⁻¹ [∂vn⊤(x; θ)/∂θ] Cn [∂vn(x; θ)/∂θ⊤]} quickly vanishes as n increases

5 Conclusions part (2) — with model

➤ Design criteria relying on a Gaussian-process model (entropy, MMSE, IMSE) depend on the chosen covariance (and on θ in C(·; θ))

➠ expectation w.r.t. θ (Joseph et al., 2015) → entropy
➠ worst case w.r.t. θ (Spöck and Pilz, 2010) → IMSE
➠ However, the model is often just a tool to generate a space-filling design: the value of θ is not critical (choose a small enough correlation length to spread the points over X)

➤ Put more points near the boundaries than in the central part of X (but be careful ➠ in high dimension almost all the volume is near the boundaries!)
➤ Put a few points close to each other to help the estimation of θ

Test several methods (none is perfect) by comparing the values of different criteria

[Diagram: links between space-filling criteria: small inter-distance (miniMax, dispersion), large intra-distances (Maximin, energy) and uniformity (entropy, discrepancy) are connected through inequalities and bounds, and relate to Maximum Entropy Sampling and to the minimization of the maximum of the MSE or of the IMSE (computed over a finite XQ)]

References

Abt, M., 1999. Estimating the prediction mean squared error in Gaussian stochastic processes with exponential correlation structure. Scandinavian Journal of Statistics 26 (4), 563–578.

Atwood, C., 1973. Sequences converging to D-optimal designs of experiments. Annals of Statistics 1 (2), 342–352.

Böhning, D., 1985. Numerical estimation of a probability measure. Journal of Statistical Planning and Inference 11, 57–69.

Böhning, D., 1986. A vertex-exchange-method in D-optimal design theory. Metrika 33, 337–347.

Chernoff, H., 1953. Locally optimal designs for estimating parameters. Annals of Mathematical Statistics 24, 586–602.

Dette, H., Pepelyshev, A., 2010. Generalized Latin hypercube design for computer experiments. Technometrics 52 (4), 421–429.

Fedorov, V., 1972. Theory of Optimal Experiments. Academic Press, New York.

Fedorov, V., 1996. Design of spatial experiments: model fitting and prediction. In: Ghosh, S., Rao, C. (Eds.), Handbook of Statistics, vol. 13. Elsevier, Amsterdam, Ch. 16, pp. 515–553.

Gauthier, B., Pronzato, L., 2014. Spectral approximation of the IMSE criterion for optimal designs in kernel-based interpolation models. SIAM/ASA Journal on Uncertainty Quantification 2, 805–825. DOI 10.1137/130928534.

Gauthier, B., Pronzato, L., 2016. Approximation of IMSE-optimal designs via quadrature rules and spectral decomposition. Communications in Statistics – Simulation and Computation 45 (5), 1600–1612.

Gauthier, B., Pronzato, L., 2017. Convex relaxation for IMSE optimal design in random field models. Computational Statistics and Data Analysis 113, 375–394.

Harville, D., Jeske, D., 1992. Mean squared error of estimation or prediction under a general linear model. Journal of the American Statistical Association 87 (419), 724–731.

Johnson, M., Moore, L., Ylvisaker, D., 1990. Minimax and maximin distance designs. Journal of Statistical Planning and Inference 26, 131–148.

Joseph, V., Gul, E., Ba, S., 2015. Maximum projection designs for computer experiments. Biometrika 102 (2), 371–380.

Kiefer, J., Wolfowitz, J., 1960. The equivalence of two extremum problems. Canadian Journal of Mathematics 12, 363–366.

Mitchell, T., 1974. An algorithm for the construction of “D-optimal” experimental designs. Technometrics 16, 203–210.

Müller, W., Pronzato, L., Rendas, J., Waldl, H., 2015. Efficient prediction designs for random fields. Applied Stochastic Models in Business and Industry 31 (2), 178–194 (with discussion, pp. 195–203).

Müller, W., Pronzato, L., Waldl, H., 2011. Beyond space-filling: an illustrative case. Procedia Environmental Sciences 7, 14–19. DOI 10.1016/j.proenv.2011.07.004.

Pilz, J., 1983. Bayesian Estimation and Experimental Design in Linear Regression Models. Vol. 55, Teubner-Texte zur Mathematik, Leipzig (also Wiley, New York, 1991).

Pukelsheim, F., 1993. Optimal Experimental Design. Wiley, New York.

Pukelsheim, F., Rieder, S., 1992. Efficient rounding of approximate designs. Biometrika 79 (4), 763–770.

Sacks, J., Welch, W., Mitchell, T., Wynn, H., 1989. Design and analysis of computer experiments. Statistical Science 4 (4), 409–435.

Schwabe, R., 1996. Optimum Designs for Multi-Factor Models. Springer, New York.

Shewry, M., Wynn, H., 1987. Maximum entropy sampling. Journal of Applied Statistics 14, 165–170.

Silvey, S., 1980. Optimal Design. Chapman & Hall, London.

Spöck, G., Pilz, J., 2010. Spatial sampling design and covariance-robust minimax prediction based on convex design ideas. Stochastic Environmental Research and Risk Assessment 24 (3), 463–482.

Titterington, D., 1976. Algorithms for computing D-optimal designs on a finite design space. In: Proc. 1976 Conference on Information Science and Systems. Dept. of Electronic Engineering, Johns Hopkins University, Baltimore, pp. 213–216.

Torsney, B., 1983. A moment inequality and monotonicity of an algorithm. In: Kortanek, K., Fiacco, A. (Eds.), Proc. Int. Symp. on Semi-infinite Programming and Applications. Springer, Heidelberg, pp. 249–260.

Torsney, B., 2009. W-iterations and ripples therefrom. In: Pronzato, L., Zhigljavsky, A. (Eds.), Optimal Design and Related Areas in Optimization and Statistics. Springer, Ch. 1, pp. 1–12.

Welch, W., 1982. Branch-and-bound search for experimental designs based on D-optimality and other criteria. Technometrics 24 (1), 41–48.

Wu, C., 1978. Some algorithmic aspects of the theory of optimal designs. Annals of Statistics 6 (6), 1286–1301.

Wynn, H., 1970. The sequential generation of D-optimum experimental designs. Annals of Mathematical Statistics 41, 1655–1664.

Yu, Y., 2010. Strict monotonicity and convergence rate of Titterington’s algorithm for computing D-optimal designs. Computational Statistics and Data Analysis 54, 1419–1425.

Yu, Y., 2011. D-optimal designs via a cocktail algorithm. Statistics and Computing 21, 475–481.

Zhigljavsky, A., Dette, H., Pepelyshev, A., 2010. A new approach to optimal design for linear models with correlated observations. Journal of the American Statistical Association 105 (491), 1093–1103.

Zimmerman, D., 2006. Optimal network design for spatial prediction, covariance parameter estimation, and empirical prediction. Environmetrics 17 (6), 635–652.
