kriging models with gaussian processes - covariance function estimation and...

28
Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling François Bachoc former PhD advisor: Josselin Garnier former CEA advisor: Jean-Marc Martinez Department of Statistics and Operations Research, University of Vienna (Former PhD student at CEA-Saclay, DEN, DM2S, STMF, LGLS, F-91191 Gif-Sur-Yvette, France and LPMA, Université Paris Diderot) JPS 2014 - Forges-les-Eaux - March 2014 François Bachoc Kriging models April 2014 1 / 28

Upload: others

Post on 15-Apr-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Kriging models with Gaussian processes - covariance functionestimation and impact of spatial sampling

François Bachoc

former PhD advisor: Josselin Garnierformer CEA advisor: Jean-Marc Martinez

Department of Statistics and Operations Research, University of Vienna(Former PhD student at CEA-Saclay, DEN, DM2S, STMF, LGLS, F-91191 Gif-Sur-Yvette, France

and LPMA, Université Paris Diderot)

JPS 2014 - Forges-les-Eaux - March 2014

François Bachoc Kriging models April 2014 1 / 28

Page 2: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

1 Kriging models with Gaussian processes

2 Asymptotic analysis of covariance function estimation and of spatial sampling impact

François Bachoc Kriging models April 2014 2 / 28

Page 3: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Kriging model with Gaussian processes

Kriging model

Study of a single realization of a Gaussian process Y (x) on a domain X ∈ Rd

−2 −1 0 1 2

−2−1

01

2

x

y

GoalPredicting the continuous realization function, from a finite number of observation points

François Bachoc Kriging models April 2014 3 / 28

Page 4: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

The Gaussian process

The Gaussian processWe consider that the Gaussian process is centered, ∀x ,E(Y (x)) = 0

The Gaussian process is hence characterized by its covariance function

The covariance functionThe function K : X 2 → R, defined by K (x1, x2) = cov(Y (x1),Y (x2))

In most classical cases :

Stationarity : K (x1, x2) = K (x1 − x2)

Continuity : K (x) is continuous⇒ Gaussian process realizations are continuous

Decrease : K (x) is a decreasing function for x ≥ 0 and limx→+∞ K (x) = 0

François Bachoc Kriging models April 2014 4 / 28

Page 5: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Example of the Matérn 32 covariance function on R

The Matérn 32 covariance function, for a Gaussian

process on R is parameterized by

A variance parameter σ2 > 0

A correlation length parameter ` > 0

It is defined as

Cσ2,`(x1, x2) = σ2(

1 +√

6|x1 − x2|

`

)e−√

6|x1−x2|

`

0 1 2 3 4

0.0

0.2

0.4

0.6

0.8

1.0

x

cov

l=0.5l=1l=2

InterpretationStationarity, continuity, decrease

σ2 corresponds to the order of magnitude of the functions that are realizations of theGaussian process

` corresponds to the speed of variation of the functions that are realizations of the Gaussianprocess

⇒ Natural generalization on Rd

François Bachoc Kriging models April 2014 5 / 28

Page 6: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Covariance function estimation

ParameterizationCovariance function model

{σ2Kθ, σ2 ≥ 0, θ ∈ Θ

}for the Gaussian Process Y .

σ2 is the variance parameter

θ is the multidimensional correlation parameter. Kθ is a stationary correlation function.

ObservationsY is observed at x1, ..., xn ∈ X , yielding the Gaussian vector y = (Y (x1), ...,Y (xn)).

Estimation

Objective : build estimators σ2(y) and θ(y)

François Bachoc Kriging models April 2014 6 / 28

Page 7: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Maximum Likelihood for estimation

Explicit Gaussian likelihood function for the observation vector y

Maximum LikelihoodDefine Rθ as the correlation matrix of y = (Y (x1), ...,Y (xn)) with correlation function Kθ andσ2 = 1.The Maximum Likelihood estimator of (σ2, θ) is

(σ2ML, θML) ∈ argmin

σ2≥0,θ∈Θ

1n

(ln (|σ2Rθ|) +

1σ2

y t R−1θ y

)

⇒ Numerical optimization with O(n3) criterion

François Bachoc Kriging models April 2014 7 / 28

Page 8: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Cross Validation for estimation

yθ,i,−i = Eσ2,θ(Y (xi )|y1, ..., yi−1, yi+1, ..., yn)

σ2c2θ,i,−i = varσ2,θ(Y (xi )|y1, ..., yi−1, yi+1, ..., yn)

Leave-One-Out criteria we study

θCV ∈ argminθ∈Θ

n∑i=1

(yi − yθ,i,−i )2

and1n

n∑i=1

(yi − yθCV ,i,−i )2

σ2CV c2

θCV ,i,−i

= 1⇔ σ2CV =

1n

n∑i=1

(yi − yθCV ,i,−i )2

c2θCV ,i,−i

RobustnessWe show that Cross Validation can be preferable to Maximum Likelihood when the covariancefunction model is misspecified

Bachoc F, Cross Validation and Maximum Likelihood estimations of hyper-parameters ofGaussian processes with model misspecification, Computational Statistics and Data Analysis66 (2013) 55-69

François Bachoc Kriging models April 2014 8 / 28

Page 9: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Virtual Leave One Out formula

Let Rθ be the covariance matrix of y = (y1, ..., yn) with correlation function Kθ and σ2 = 1

Virtual Leave-One-Out

yi − yθ,i,−i =1

(R−1θ )i,i

(R−1θ y

)i

and c2i,−i =

1

(R−1θ )i,i

O. Dubrule, Cross Validation of Kriging in a Unique Neighborhood, Mathematical Geology,1983.

Using the virtual Cross Validation formula :

θCV ∈ argminθ∈Θ

1n

y t R−1θ diag(R−1

θ )−2R−1θ y

andσ2

CV =1n

y t R−1θCV

diag(R−1θCV

)−1R−1θCV

y

⇒ Same computational cost as ML

François Bachoc Kriging models April 2014 9 / 28

Page 10: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Prediction with fixed covariance function

Gaussian process Y observed at x1, ..., xn and predicted at xnewy = (Y (x1), ...,Y (xn))t

Once the covariance function has been estimated and fixedR is the covariance matrix of Y at x1, ..., xn

r is the covariance vector of Y between x1, ..., xn and xnew

Prediction

The prediction is Y (xnew ) := E(Y (xnew )|Y (x1), ...,Y (xn)) = r t R−1y .

Predictive varianceThe predictive variance isvar(Y (xnew )|Y (x1), ...,Y (xn)) = E

[(Y (xnew )− Y (xnew ))2

]= var(Y (xnew ))− r t R−1r .

François Bachoc Kriging models April 2014 10 / 28

Page 11: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Illustration of prediction

Observations

Prediction mean

95% confidence intervals

Conditional realizations

François Bachoc Kriging models April 2014 11 / 28

Page 12: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Application to computer experiments

Computer modelA computer model, computing a given variable of interest, corresponds to a deterministic functionRd → R. Evaluations of this function are time consuming

Examples : Simulation of a nuclear fuel pin, of thermal-hydraulic systems, of components of acar, of a plane...

Kriging model for computer experimentsBasic idea : representing the code function by a realization of a Gaussian process

Bayesian framework on a fixed function

What we obtainmetamodel of the code : the Kriging prediction function approximates the code function, andits evaluation cost is negligible

Error indicator with the predictive variance

Full conditional Gaussian process⇒ possible goal-oriented iterative strategies foroptimization, failure domain estimation, small probability problems, code calibration...

François Bachoc Kriging models April 2014 12 / 28

Page 13: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Conclusion

Kriging models :

The covariance function characterizes the Gaussian process

It is estimated first. Here we consider Maximum Likelihood and Cross Validation estimation

Then we can compute prediction and predictive variances with explicit matrix vector formulas

Widely used for computer experiments

François Bachoc Kriging models April 2014 13 / 28

Page 14: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

1 Kriging models with Gaussian processes

2 Asymptotic analysis of covariance function estimation and of spatial sampling impact

François Bachoc Kriging models April 2014 14 / 28

Page 15: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Framework and objectives

EstimationWe do not make use of the distinction σ2,θ. Hence we use the set {Kθ ,θ ∈ Θ} of stationarycovariance functions for the estimation.

Well-specified modelThe true covariance function K of the Gaussian Process belongs to the set {Kθ ,θ ∈ Θ}. Hence

K = Kθ0 ,θ0 ∈ Θ

ObjectivesStudy the consistency and asymptotic distribution of the Cross Validation estimator

Confirm that, asymptotically, Maximum Likelihood is more efficient

Study the influence of the spatial sampling on the estimation

François Bachoc Kriging models April 2014 15 / 28

Page 16: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Spatial sampling for covariance parameter estimation

Spatial sampling : initial design of experiments for Kriging

It has been shown that irregular spatial sampling is often an advantage for covarianceparameter estimation

Stein M, Interpolation of Spatial Data : Some Theory for Kriging, Springer, New York,1999. Ch.6.9.

Zhu Z, Zhang H, Spatial sampling design under the infill asymptotics framework,Environmetrics 17 (2006) 323-337.

Our question : can we confirm this finding in an asymptotic framework

François Bachoc Kriging models April 2014 16 / 28

Page 17: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Two asymptotic frameworks for covariance parameter estimation

Asymptotics (number of observations n→ +∞) is an active area of research(Maximum-Likelihood estimator)

Two main asymptotic frameworksfixed-domain asymptotics : The observation points are dense in a bounded domain

0 1 2 3 4 5 6 7

01

23

45

67

0 1 2 3 4 5 6 7

01

23

45

67

0 1 2 3 4 5 6 7

01

23

45

67

increasing-domain asymptotics : A minimum spacing exists between the observation points−→ infinite observation domain.

0 1 2 3 4 5 6 7

01

23

45

67

0 1 2 3 4 5 6 7

01

23

45

67

0 1 2 3 4 5 6 7

01

23

45

67

François Bachoc Kriging models April 2014 17 / 28

Page 18: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Choice of the asymptotic framework

Comments on the two asymptotic frameworksfixed-domain asymptoticsFrom 80’-90’ and onwards. Fruitful theory

Stein, M., Interpolation of Spatial Data Some Theory for Kriging, Springer, New York,1999.

However, when convergence in distribution is proved, the asymptotic distribution does notdepend on the spatial sampling −→ Impossible to compare sampling techniques forestimation in this context

increasing-domain asymptotics :Asymptotic normality proved for Maximum-Likelihood (under conditions that are not simple tocheck)

Sweeting, T., Uniform asymptotic normality of the maximum likelihood estimator, Annalsof Statistics 8 (1980) 1375-1381.

Mardia K, Marshall R, Maximum likelihood estimation of models for residual covariancein spatial regression, Biometrika 71 (1984) 135-146.

(no results for CV)

We study increasing-domain asymptotics for ML and CV under irregular sampling

François Bachoc Kriging models April 2014 18 / 28

Page 19: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

The randomly perturbed regular grid that we study

Observation point i :v i + εXi

(v i )i∈N∗ : regular square grid of step one in dimension d(Xi )i∈N∗ : iid with symmetric distribution on [−1, 1]d

ε ∈ (− 12 ,

12 ) is the regularity parameter of the grid.

ε = 0 −→ regular grid.|ε| close to 1

2 −→ irregularity is maximal

Illustration with ε = 0, 18 ,

38

1 2 3 4 5 6 7 8

12

34

56

78

2 4 6 8

24

68

2 4 6 8

24

68

François Bachoc Kriging models April 2014 19 / 28

Page 20: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Consistency and asymptotic normality

Under general summability, regularity and identifiability conditions, we show

Proposition : for MLa.s convergence of the random Fisher information : The random trace1

2n Tr(

R−1θ0

∂Rθ0∂θi

R−1θ0

∂Rθ0∂θj

)converges a.s to the element (IML)i,j of a p × p deterministic

matrix IML as n→ +∞asymptotic normality : With ΣML = I−1

ML

√n(θML − θ0

)→ N (0,ΣML)

Proposition : for CVSame result with more complex expressions for asymptotic covariance matrix ΣCV

François Bachoc Kriging models April 2014 20 / 28

Page 21: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Main ideas for the proof

A central tool : because of the minimum distance between observation points : theeigenvalues of the random matrices involved are uniformly lower and upper bounded

For consistency : bounding from below the difference of M-estimator criteria between θ andθ0 by the integrated square difference between Kθ and Kθ0

For almost-sure convergence of random traces : block-diagonal approximation of the randommatrices involved and Cauchy criterion

For asymptotic normality of criterion gradient : almost-sure (with respect to the randomperturbations) Lindeberg-Feller Central Limit Theorem

Conclude with classical M-estimator method

François Bachoc Kriging models April 2014 21 / 28

Page 22: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Analysis of the asymptotic covariance matrices

The asymptotic covariance matrices ΣML,CV depend only on the regularity parameter ε.−→ in the sequel, we study the functions ε→ ΣML,CV

Matérn model in dimension one

K`,ν(x1, x2) =1

Γ(ν)2ν−1

(2√ν|x1 − x2|

`

)νKν(

2√ν|x1 − x2|

`

),

with Γ the Gamma function and Kν the modified Bessel function of second order

We consider

The estimation of ` when ν0 is known

The estimation of ν when `0 is known

=⇒We study scalar asymptotic variances

François Bachoc Kriging models April 2014 22 / 28

Page 23: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Results for the Matérn model (1/2)

Estimation of ` when ν0 is known.Level plot of

[ΣML,CV (ε = 0)

]/[ΣML,CV (ε = 0.45)

]in `0 × ν0 for ML (left) and CV (right)

0.5 1.0 1.5 2.0 2.5

12

34

5

l0

ν 0

0.75

3

12

48

190

0.5 1.0 1.5 2.0 2.5

12

34

5

l0

ν 0

0.75

3

12

48

190

Perturbations of the regular grid are always beneficial for ML

François Bachoc Kriging models April 2014 23 / 28

Page 24: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Results for the Matérn model (2/2)

Estimation of ν when `0 is known.Level plot of

[ΣML,CV (ε = 0)

]/[ΣML,CV (ε = 0.45)

]in `0 × ν0 for ML (left) and CV (right)

0.5 1.0 1.5 2.0 2.5

12

34

5

l0

ν 0

0.95

3.6

14

56

220

0.5 1.0 1.5 2.0 2.5

12

34

5

l0

ν 0

0.95

3.6

14

56

220

Perturbations of the regular grid are always beneficial for ML and CV

François Bachoc Kriging models April 2014 24 / 28

Page 25: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Conclusion on covariance function estimation and spatial sampling

CV is consistent and has the same rate of convergence as ML

We confirm (not presented here) that ML is more efficientStrong irregularity in the sampling is an advantage for covariance function estimation

With ML, irregular sampling is more often an advantage than with CVWe show that, however, regular sampling is better for prediction with known covariance function =⇒motivation for using space-filling samplings augmented with some clustered observation points

Z. Zhu and H. Zhang, Spatial Sampling Design Under the Infill Asymptotics Framework,Environmetrics 17 (2006) 323-337.

L. Pronzato and W. G. Müller, Design of computer experiments : space filling and beyond,Statistics and Computing 22 (2012) 681-701.

For further details :

F. Bachoc, Asymptotic analysis of the role of spatial sampling for covariance parameterestimation of Gaussian processes, Journal of Multivariate Analysis 125 (2014) 1-35.

François Bachoc Kriging models April 2014 25 / 28

Page 26: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Some perspectives

Ongoing workAsymptotic analysis of the case of a misspecified covariance function model with purelyrandom sampling

Other potential perspectivesDesigning other CV procedures (LOO error weighting, decorrelation and penalty term) toreduce the variance

Start studying the fixed-domain asymptotics of CV, in the particular cases where it is done forML

François Bachoc Kriging models April 2014 26 / 28

Page 27: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

French community

French community

GDR MASCOT-NUM www.gdr-mascotnum.fr

consortium ReDICE www.redice-project.org

François Bachoc Kriging models April 2014 27 / 28

Page 28: Kriging models with Gaussian processes - covariance function estimation and …fbachoc/Bachoc_Forge... · 2015-09-14 · Kriging models with Gaussian processes - covariance function

Thank you for your attention !

François Bachoc Kriging models April 2014 28 / 28