
Random notes on kriging: an introduction to geostatistical interpolation for environmental applications

Luca Bonaventura, Stefano Castruccio

MOX - Laboratorio di Matematica Applicata
Dipartimento di Matematica
Politecnico di Milano
[email protected]

Contents

1 The estimation of spatially distributed and uncertain data

2 Basic definitions on random fields
   2.1 Finite dimensional distributions
   2.2 First and second order moments and variograms of random fields
   2.3 Analysis of random fields
   2.4 Definitions of stationarity of random fields
   2.5 Characterization and representation theorems for variogram functions
   2.6 Measurement error and subgrid scales: the nugget effect
   2.7 Isotropic variogram models

3 Variogram estimation
   3.1 Empirical variogram estimators
   3.2 Least squares variogram fitting procedures

4 Spatial prediction and kriging
   4.1 Ordinary kriging
   4.2 Universal kriging

5 Appendix: Basics of random variables

References

Introduction

The purpose of these notes is to provide a short and self-contained introduction to the literature on geostatistical interpolation of scattered data. The existing literature on this topic has a very broad scope and presents the related issues from a wide range of rather different perspectives, motivated by highly specific applications, e.g. in mining, groundwater flow modelling, oceanography and meteorology. The aim of this introduction is to summarize in a consistent way the basic terminology and the key theoretical concepts underlying the practice of geostatistical interpolation, and to present the derivation of the most widely used kriging estimators.

There is no attempt at a complete presentation of the underlying theories or methods, which is available in a number of well known publications. For a more complete description of the statistical techniques surveyed here, the reader is referred, among many others, to the presentations in [5], [9], [12], [13]. A more advanced presentation of the same material for readers with a good background in mathematical statistics can be found in [4].

There is also no attempt at achieving a high standard of mathematical rigour in the formulation of the definitions and theorems. The reader interested in the complete presentation of the measure theoretic problems associated with probability spaces, random variables and random fields should consult textbooks such as [2]. A basic introduction to probability theory and mathematical statistics can be found e.g. in [10].


Chapter 1

The estimation of spatially distributed and uncertain data

Consider $N$ points $\mathbf{x}_i$, $i = 1, \dots, N$ in the vector space $\mathbb{R}^d$. At these locations, data $z_i$, $i = 1, \dots, N$ are assumed to be known. These data are interpreted as the values of a field $z$, whose value depends on the position in space. In general, the points $\mathbf{x}_i$ will be scattered irregularly in space, rather than aligned on a regular grid. Furthermore, the data are assumed to be affected by some uncertainty, due either to measurement error, or to the fact that the quantity $z$ depends on some unpredictable physical process, or both.

Definition 1 (Geostatistical interpolation) Given the $N$ points $\mathbf{x}_i$, $i = 1, \dots, N$ and the uncertain data $z_i$, $i = 1, \dots, N$, the geostatistical interpolation problem consists of

• predicting the most appropriate value $z_0$ for the quantity $z$ at a point $\mathbf{x}_0$, different from the points associated with the available data;

• estimating the uncertainty of the prediction $z_0$ as a function of the uncertainty on the available data $z_i$, $i = 1, \dots, N$ and of their correlation structure.

The geostatistical interpolation problem is quite different from the classical interpolation problem. In classical interpolation, the data $z_i$ are assumed to be sampled from a function $z(\mathbf{x})$, which is reconstructed from the data under some assumption on the nature of the interpolating function $z$. Typically, for classical Lagrange interpolation one assumes that the function $z$ is a polynomial (see e.g. [11]), while in the case of Radial Basis Function interpolation (which is quite useful for deterministic interpolation of scattered data and has many technical similarities with kriging as far as the formulation of the interpolation problem is concerned, see e.g. [3]) the interpolator is assumed to be a linear combination of shape functions with particular properties. Furthermore, the approximation error depends on the regularity of the underlying function $z$ and of its derivatives. On the other hand, geostatistical interpolators do not depend in general on the regularity of $z$ and do not in general yield regular reconstructions, apart from the fact that, if measurement errors and subgrid effects are disregarded, an exact interpolation condition holds at the points $\mathbf{x}_i$, $i = 1, \dots, N$.

Chapter 2

Basic definitions on random fields

Definition 2 A random field is a function $Z = Z(\omega, \mathbf{x})$ which prescribes a real number $Z$ for each pair $(\omega, \mathbf{x})$, where $\omega$ is an event in a probability space $(\Omega, P)$ and $\mathbf{x} \in \mathbb{R}^d$ (in the following, the dependence on $\omega$ will often be omitted for the sake of simplifying the notation).

Thus, a random field is a function of several real variables which also happens to depend on elements of a probability space. A short review of the basic properties of these mathematical objects will show how they combine the peculiarities of random variables and of scalar fields on $\mathbb{R}^d$. Concepts from both analysis and probability theory are necessary for a proper description of their behaviour.

2.1 Finite dimensional distributions

From a probabilistic viewpoint, the behaviour of a random field is completely determined if it is known how to compute the probabilities
$$ P\big[ Z(\mathbf{x}_1) \in (a_1, b_1), \dots, Z(\mathbf{x}_N) \in (a_N, b_N) \big], \qquad (2.1) $$

where $a_i$, $b_i$ denote the extremes of arbitrary intervals on the real line. For each $N$ and each set of $N$ points $\mathbf{x}_i$, $i = 1, \dots, N$, the probabilities (2.1) uniquely define a set of values $P_{(\mathbf{x}_1, \dots, \mathbf{x}_N)}[(a_1, b_1), \dots, (a_N, b_N)]$ which identifies a probability distribution on $\mathbb{R}^N$. These probability distributions are called the finite dimensional distributions of the random field $Z$. It should be observed that the quantities (2.1) are symmetric with respect to permutations of the set of points $\mathbf{x}_i$, $i = 1, \dots, N$.

In the case of random fields with continuous finite dimensional distributions, to compute (2.1) it is sufficient to prescribe for each $N$ and each set of $N$ points $\mathbf{x}_i$, $i = 1, \dots, N$ a probability density $f_Z(\mathbf{u}) = f_{(Z(\mathbf{x}_1), \dots, Z(\mathbf{x}_N))}(u_1, \dots, u_N)$.


Theorem 1 (Kolmogorov) A set of probability distributions on $\mathbb{R}^N$, defined as $P_{(\mathbf{x}_1, \dots, \mathbf{x}_N)}([a_1, b_1], \dots, [a_N, b_N])$ for $N \geq 1$ and symmetric with respect to permutations of the set of points $\mathbf{x}_i$, $i = 1, \dots, N$, determines uniquely the probability of any event associated with the random field if one assumes
$$ P_{(\mathbf{x}_1, \dots, \mathbf{x}_N)}([a_1, b_1], \dots, [a_N, b_N]) = P\big[ Z(\mathbf{x}_1) \in [a_1, b_1], \dots, Z(\mathbf{x}_N) \in [a_N, b_N] \big]. $$

An important example is that of Gaussian random fields, for which the finite dimensional distributions are defined by multidimensional Gaussian distributions, whose densities are given for a generic set of points $\mathbf{x}_i$, $i = 1, \dots, N$ by
$$ f_Z(\mathbf{u}) = \frac{1}{\sqrt{(2\pi)^N \det(A)}} \exp\left( -\frac{(\mathbf{u} - \mathbf{m})^T A^{-1} (\mathbf{u} - \mathbf{m})}{2} \right), \qquad (2.2) $$
where $\mathbf{m} = (m(\mathbf{x}_1), \dots, m(\mathbf{x}_N))$ is a vector of space dependent quantities and $A = A_{\mathbf{x}_1, \dots, \mathbf{x}_N}$ is a symmetric, positive definite matrix.
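Since the finite dimensional distributions (2.2) are fully specified by the mean vector and the covariance matrix, a realization of a Gaussian random field at a finite set of scattered points can be simulated directly. The following is a minimal Python sketch of this idea; the exponential covariance used here is an illustrative assumption of ours, anticipating the models of section 2.7, and all variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(42)

# N scattered sample locations in R^2
N = 200
x = rng.uniform(0.0, 10.0, size=(N, 2))

# Covariance matrix A built from an isotropic exponential covariance
# C(h) = sigma2 * exp(-h / rho); this model choice is illustrative only
sigma2, rho = 1.0, 2.0
h = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)  # pairwise distances
A = sigma2 * np.exp(-h / rho)

# Constant mean m(x) = mu; draw one realization via the Cholesky factor of A
mu = 5.0
L = np.linalg.cholesky(A + 1e-10 * np.eye(N))  # small jitter for stability
z = mu + L @ rng.standard_normal(N)            # values Z(x_1), ..., Z(x_N)
```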

2.2 First and second order moments and variograms of random fields

The average and the variance of a random field are defined as usual for random variables:
$$ m(\mathbf{x}) = E\big[ Z(\mathbf{x}) \big] = \int_{-\infty}^{+\infty} u\, f_{Z(\mathbf{x})}(u)\, du, \qquad (2.3) $$
$$ \mathrm{Var}\big[ Z(\mathbf{x}) \big] = \sigma^2_Z(\mathbf{x}) = E\Big[ \big( Z(\mathbf{x}) - m(\mathbf{x}) \big)^2 \Big] = \int_{-\infty}^{+\infty} (u - m(\mathbf{x}))^2 f_{Z(\mathbf{x})}(u)\, du. \qquad (2.4) $$
The computation of mean and variance only involves the one dimensional distributions. Other quantities, such as the covariance, require instead the two dimensional finite distributions:
$$ \mathrm{Cov}\big[ Z(\mathbf{x}), Z(\mathbf{y}) \big] = E\Big[ \big( Z(\mathbf{x}) - m(\mathbf{x}) \big)\big( Z(\mathbf{y}) - m(\mathbf{y}) \big) \Big]. \qquad (2.5) $$

The covariance is defined if the first and second order moments of the random field exist. In the case of Gaussian random fields whose finite dimensional distributions are described by equation (2.2), the vector $\mathbf{m} = (m(\mathbf{x}_1), \dots, m(\mathbf{x}_N))$ has indeed as components the mean values of the field at locations $\mathbf{x}_1, \dots, \mathbf{x}_N$, while the matrix $A$ is such that $a_{i,j} = \mathrm{Cov}[Z(\mathbf{x}_i), Z(\mathbf{x}_j)]$.

A very important quantity, which plays a key role in the development of statistical interpolators, is the variogram.

Definition 3 (Variogram) The variogram of a random field $Z(\mathbf{x})$ is defined as
$$ \mathrm{Var}\big[ Z(\mathbf{x}) - Z(\mathbf{y}) \big]. $$
The quantity
$$ \gamma(\mathbf{x}, \mathbf{y}) = \frac{1}{2} \mathrm{Var}\big[ Z(\mathbf{x}) - Z(\mathbf{y}) \big] $$
is called the semivariogram of $Z$. If $Z$ has constant mean, the semivariogram is defined equivalently as
$$ \gamma(\mathbf{x}, \mathbf{y}) = \frac{1}{2} E\Big[ \big( Z(\mathbf{x}) - Z(\mathbf{y}) \big)^2 \Big]. $$

If a random field has second order moments, both variogram and covariance exist and there is a simple relationship between them:
$$ \mathrm{Var}\big[ Z(\mathbf{x}) - Z(\mathbf{y}) \big] = \mathrm{Var}\big[ Z(\mathbf{x}) \big] + \mathrm{Var}\big[ Z(\mathbf{y}) \big] - 2\,\mathrm{Cov}\big[ Z(\mathbf{x}), Z(\mathbf{y}) \big]. \qquad (2.6) $$
Higher order moments can also be defined as done for standard random variables. However, in practice they are quite difficult to estimate from the data, and in many applications estimation and inference are only feasible for first and second order moments.

2.3 Analysis of random fields

If a random field $Z(\omega, \mathbf{x})$ is considered as a function of the spatial variable, a number of the usual analysis concepts (limit, continuity, derivative) can be introduced. This, however, can be done in different ways, depending on how the dependency on the probability space is dealt with. Various concepts of limit are given here for the (spatially) pointwise convergence of a sequence of random fields $Z_n(\omega, \mathbf{x})$, $n = 1, \dots, \infty$. The same definitions can be extended to different types of convergence in the spatial variable. Furthermore, based on these limit concepts, the continuity and differentiability of random fields can also be defined accordingly.


Definition 4 (Pointwise convergence in probability) The sequence $Z_n(\omega, \mathbf{x})$, $n = 1, \dots, \infty$ converges pointwise in probability to $Z(\omega, \mathbf{x})$ if for any $\varepsilon > 0$ and for any $\mathbf{x} \in \mathbb{R}^d$ one has
$$ \lim_{n \to \infty} P\big[ |Z_n(\omega, \mathbf{x}) - Z(\omega, \mathbf{x})| > \varepsilon \big] = 0. \qquad (2.7) $$

Definition 5 (Convergence with probability one) The sequence $Z_n(\omega, \mathbf{x})$, $n = 1, \dots, \infty$ converges pointwise with probability one to $Z(\omega, \mathbf{x})$ if for any $\mathbf{x} \in \mathbb{R}^d$ one has
$$ P\Big[ \lim_{n \to \infty} |Z_n(\omega, \mathbf{x}) - Z(\omega, \mathbf{x})| = 0 \Big] = 1. \qquad (2.8) $$

Definition 6 (Convergence in mean square sense) The sequence $Z_n(\omega, \mathbf{x})$, $n = 1, \dots, \infty$ converges pointwise in mean square sense to $Z(\omega, \mathbf{x})$ if for any $\mathbf{x} \in \mathbb{R}^d$ one has
$$ \lim_{n \to \infty} E\big[ |Z_n(\omega, \mathbf{x}) - Z(\omega, \mathbf{x})|^2 \big] = 0. \qquad (2.9) $$

These convergence concepts are not independent of each other: for example, both convergence in mean square sense and convergence with probability one imply convergence in probability.

An important result relating the continuity of a random field to the properties of its second order moments is the following:

Theorem 2 (Continuity of random fields) If there is a $\beta > 0$ such that
$$ E\Big[ \big( Z(\mathbf{x}) - Z(\mathbf{y}) \big)^2 \Big] \leq C \|\mathbf{x} - \mathbf{y}\|^{2d + \beta}, $$
the random field $Z(\mathbf{x})$ is continuous with probability one.

Proof: See e.g. [1].

This theorem implies that the specific features of the variogram function have a relevant impact on the regularity of the field as a function of the spatial variables.

2.4 Definitions of stationarity of random fields

Geostatistical interpolation, as will be seen later, can in general be introduced independently of any hypothesis on the nature of the random field. However, in order to achieve an acceptable estimate of the semivariogram without requiring an amount of data much larger than what is usually available (especially in underground flow or mining applications), some restrictions on the nature of the allowed random fields are necessary. Similar restrictions are also introduced, for either conceptual or practical reasons, in other areas in which random fields are applied.


Definition 7 (Stationary random fields) A random field is called stationary if for any vector $\mathbf{h} \in \mathbb{R}^d$ and for any set of points $\mathbf{x}_i$, $i = 1, \dots, N$ one has
$$ P\big[ Z(\mathbf{x}_1 + \mathbf{h}) \in [a_1, b_1], \dots, Z(\mathbf{x}_N + \mathbf{h}) \in [a_N, b_N] \big] = P\big[ Z(\mathbf{x}_1) \in [a_1, b_1], \dots, Z(\mathbf{x}_N) \in [a_N, b_N] \big]. \qquad (2.10) $$

The stationarity property can also be summarized by saying that the finite dimensional distributions of a stationary field are translation invariant. As a consequence, all the single site moments $E[Z(\mathbf{x})^k]$, $k \geq 1$ are constants. If they exist, both the covariance and the semivariogram depend only on the difference between the two locations at which $Z$ is evaluated.

Definition 8 (Intrinsically stationary random fields) A random field is called intrinsically stationary if the field semivariogram is only a function of the difference between the two positions at which the increment is computed, that is, if there exists a real scalar field $\gamma$ on $\mathbb{R}^d$ such that
$$ \gamma(\mathbf{x}, \mathbf{y}) = \gamma(\mathbf{x} - \mathbf{y}). \qquad (2.11) $$

In general, the class of intrinsically stationary random fields is much larger than that of stationary random fields. Furthermore, a stationary field is also intrinsically stationary.

Definition 9 (Second order stationary random fields) A random field is called second order stationary if the field covariance exists and is only a function of the difference between the two positions at which it is evaluated, that is, if there exists a real scalar field $C$ on $\mathbb{R}^d$ such that
$$ C(\mathbf{x}, \mathbf{y}) = C(\mathbf{x} - \mathbf{y}). \qquad (2.12) $$

If a field $Z$ has finite second order moments that are constant in space, conditions (2.11) and (2.12) are equivalent, since one can use equation (2.6) to obtain
$$ \gamma(\mathbf{x}, \mathbf{y}) = \mathrm{Var}\big[ Z(\mathbf{0}) \big] - \mathrm{Cov}\big[ Z(\mathbf{x}), Z(\mathbf{y}) \big]. \qquad (2.13) $$

Definition 10 (Increment stationary random fields) A random field is called increment stationary if the field of increments $Z(\mathbf{x}) - Z(\mathbf{0})$ is stationary.

Increment stationary random fields are also intrinsically stationary.


Definition 11 (Isotropic random fields) An intrinsically (second order) stationary random field is called isotropic if, for any $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$, the semivariogram (covariance) only depends on the Euclidean norm of the difference between the two points, that is $\gamma(\mathbf{x}, \mathbf{y}) = \gamma(\|\mathbf{x} - \mathbf{y}\|)$.

Some special cases of anisotropy can be handled more easily, as in the case of

Definition 12 (Geometrically anisotropic random fields) An intrinsically stationary random field is geometrically anisotropic if its semivariogram is given by
$$ \gamma(\mathbf{x}, \mathbf{y}) = \gamma_0(\|A(\mathbf{x} - \mathbf{y})\|), $$
where $A$ is a $d \times d$ matrix.

In the case of Gaussian random fields, stationarity and second order stationarity coincide, since the finite dimensional distributions of the field are entirely determined by the mean and the covariance function.

2.5 Characterization and representation theorems for variogram functions

In geostatistical interpolation, variograms in general have to be estimated from the data. In order to reconstruct their functional form, however, it is necessary to take into account that variograms belong to a special class of functions, which will now be defined. If this fact is disregarded, some serious inconsistencies may arise when using estimated variograms which do not belong to this class, such as for example negative values for positive quantities such as the kriging variance.

Definition 13 (Conditionally negative definite functions) A function $\phi(\mathbf{x}, \mathbf{y})$ is called conditionally negative definite if for any $N \geq 2$, given $\mathbf{x}_i \in \mathbb{R}^d$, $i = 1, \dots, N$ and any set of real numbers $\alpha_i$, $i = 1, \dots, N$ such that
$$ \sum_{i=1}^N \alpha_i = 0, $$
one has
$$ \sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j \phi(\mathbf{x}_i, \mathbf{x}_j) \leq 0. $$

Theorem 3 (Conditional negative definiteness of variograms) The semivariogram of an intrinsically stationary random field is a conditionally negative definite function.


Proof: Let $\alpha_i$, $i = 1, \dots, N$ be such that $\sum_{i=1}^N \alpha_i = 0$ and assume that $Z$ is an intrinsically stationary random field. Given $\mathbf{x}_i \in \mathbb{R}^d$, $i = 1, \dots, N$, one has
$$ \left( \sum_{i=1}^N \alpha_i Z(\mathbf{x}_i) \right)^2 = -\frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j \big( Z(\mathbf{x}_i) - Z(\mathbf{x}_j) \big)^2, \qquad (2.14) $$
since $\sum_{i=1}^N \alpha_i = 0$. Taking the expected value one obtains
$$ \sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j\, 2\gamma(\mathbf{x}_i - \mathbf{x}_j) = -2\,\mathrm{Var}\left( \sum_{i=1}^N \alpha_i Z(\mathbf{x}_i) \right) \leq 0. \qquad (2.15) $$
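As a quick numerical illustration of definition 13 and theorem 3, one can check that the quadratic form is nonpositive for random weights summing to zero. The sketch below is ours and assumes, purely for illustration, the exponential semivariogram of section 2.7 with parameters $c_0 = 0$, $c_1 = 1$, $c_2 = 2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential semivariogram (section 2.7), illustrative parameters
gamma = lambda h: 1.0 - np.exp(-h / 2.0)

N = 50
x = rng.uniform(0.0, 10.0, size=(N, 2))
h = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
G = gamma(h)  # matrix of semivariogram values gamma(x_i - x_j)

for _ in range(1000):
    a = rng.standard_normal(N)
    a -= a.mean()              # enforce the constraint sum(alpha_i) = 0
    assert a @ G @ a <= 1e-9   # conditional negative definiteness (theorem 3)
```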

Conditionally negative definite functions can be characterised as follows:

Theorem 4 Let $\gamma(\cdot)$ be a continuous function on $\mathbb{R}^d$ such that $\gamma(\mathbf{0}) = 0$. The following statements are equivalent:

• $\gamma(\cdot)$ is conditionally negative definite;

• for any $a > 0$, $\exp(-a\gamma(\cdot))$ is positive definite;

• there exist a quadratic form $Q(\cdot) \geq 0$ and a positive measure $G(\cdot)$, symmetric, continuous at the origin and satisfying $\int_{-\infty}^{+\infty} \dots \int_{-\infty}^{+\infty} (1 + \|\omega\|^2)^{-1} G(d\omega) < +\infty$, such that
$$ \gamma(\mathbf{h}) = Q(\mathbf{h}) + \int_{-\infty}^{+\infty} \dots \int_{-\infty}^{+\infty} \frac{1 - \cos(\omega' \mathbf{h})}{\|\omega\|^2}\, G(d\omega). \qquad (2.16) $$

As a result, one obtains the following representation theorem

Theorem 5 (Schoenberg-Yaglom) A continuous function $\phi(\mathbf{x}, \mathbf{y})$ that is conditionally negative definite and such that $\phi(\mathbf{x}, \mathbf{x}) = 0$ is the variogram of an intrinsically stationary random field.

Proof: Define the random field
$$ Z(\mathbf{s}) = \int_{-\infty}^{+\infty} \dots \int_{-\infty}^{+\infty} \frac{e^{i\omega' \mathbf{s}} - 1}{\|\omega\|}\, W(d\omega), \qquad (2.17) $$
where $W(\mathbf{s})$, $\mathbf{s} \in \mathbb{R}^d$ is a complex valued zero mean random field with independent increments and such that $E(|W(d\omega)|^2) = G(d\omega)/2$. One then has
$$ Z(\mathbf{s} + \mathbf{h}) - Z(\mathbf{s}) = \int_{-\infty}^{+\infty} \dots \int_{-\infty}^{+\infty} e^{i\omega' \mathbf{s}}\, W^*_{\mathbf{h}}(d\omega), \qquad (2.18) $$
where $W^*_{\mathbf{h}}$ is the independent increment field such that
$$ E(|W^*_{\mathbf{h}}(d\omega)|^2) = G^*_{\mathbf{h}}(d\omega) = \int_{-\infty}^{\omega_1} \dots \int_{-\infty}^{\omega_d} \frac{1 - \cos(\nu' \mathbf{h})}{\|\nu\|^2}\, G(d\nu). \qquad (2.19) $$
The random field defined by (2.17) then has semivariogram given by
$$ \gamma(\mathbf{h}) = \frac{1}{2} \int_{-\infty}^{+\infty} \dots \int_{-\infty}^{+\infty} \frac{1 - \cos(\omega' \mathbf{h})}{\|\omega\|^2}\, G(d\omega), \qquad (2.20) $$
which is in the form of equation (2.16) with $Q(\mathbf{h}) = 0$.

A consequence of these representation theorems is that, given any set of semivariograms $\gamma_i$, $i = 1, \dots, m$ and non negative coefficients $\alpha_i$, $i = 1, \dots, m$, the linear combination $\gamma = \sum_{i=1}^m \alpha_i \gamma_i$ is also the semivariogram of an intrinsically stationary process.

Functions that satisfy the hypotheses of theorem 4 are also called admissible or valid variogram functions. Similar representation theorems can also be derived for covariograms, based on the concept of conditionally positive definite function. For second order stationary fields the related representation theorems are entirely equivalent.

2.6 Measurement error and subgrid scales: the nugget effect

It is clear from definition 3 that for a stationary (in any sense) random field one has $\gamma(\mathbf{0}) = 0$. If the variogram is assumed to be continuous at the origin, it will be seen in the following that the geostatistical interpolation procedure yields an exact interpolation of the known data at the points where the field has effectively been sampled. In many cases this is not appropriate, for two reasons. On one hand, it does not allow one to include measurement error among the uncertainties that affect the data: measurement error is in general assumed to be spatially uncorrelated and should not affect the structure of the variogram for values of $h$ different from zero. Another important effect that is not taken into account if the variogram is assumed to be continuous is the so called nugget effect, i.e. the possibility of sudden jumps in field values on spatial scales that have not been completely sampled by the available data. In many applications, it is necessary to allow for the possibility that, even very close to a sampled point, the reconstructed random field can take rather different values in a way that is effectively independent of the sampled value.


Both these effects, although conceptually quite different, can be effectively described by allowing the variogram to be discontinuous at the origin. In particular, if $\lim_{h \to 0} \gamma(h) = c_0$ with $c_0$ different from zero, the variogram is said to display the nugget effect. A complete proof of the formal equivalence of the nugget effect and the inclusion of measurement errors can be found in [4].

2.7 Isotropic variogram models

A number of isotropic variogram models have been widely used in applications. In all these examples, we denote semivariograms by $\gamma_\theta(\cdot)$, where $\theta$ represents the vector of free parameters that fully determine the variogram shape. For the variogram models we consider, it will often be the case that $\theta = (c_0, c_1, c_2)$, where $c_0$ is the nugget parameter, i.e. the non zero limit $\lim_{h \to 0} \gamma(h) = c_0$ in case the variogram model is assumed to be discontinuous at the origin, $c_1$ is the so called sill parameter, which determines the limit value $\lim_{h \to +\infty} \gamma(h) = c_0 + c_1$ attained by the bounded models below, and $c_2$ is the range, i.e. the typical spatial scale associated with significant changes in the variogram function. It is to be remarked that for some authors the range denotes the maximum distance beyond which the correlation between two different field values is zero; we use here a more general definition.

Definition 14 (Power law model) The power law variogram is given by
$$ \gamma_\theta(h) = \begin{cases} 0, & h = 0, \\ c_0 + c_1 |h|^\lambda, & h \neq 0, \end{cases} \qquad (2.21) $$
with $\theta = (c_0, c_1)$ and $c_0, c_1 \geq 0$. The particular case $\lambda = 1$ is also known as the linear variogram model.

In order to satisfy the requirements for admissible variograms described in section 2.5, it must be assumed that $0 < \lambda < 2$. For this variogram model, $\lim_{h \to +\infty} \gamma(h) = +\infty$, so that the variogram does not have a sill, does not define an associated covariogram, and the associated random field does not have a spatial scale on which correlations decay.

Definition 15 (Exponential model) The exponential variogram model is given by
$$ \gamma_\theta(h) = \begin{cases} 0, & h = 0, \\ c_0 + c_1 \big( 1 - \exp(-|h| / c_2) \big), & h \neq 0, \end{cases} \qquad (2.22) $$
where $\theta = (c_0, c_1, c_2)$, $c_i \geq 0$ for $i = 0, 1, 2$.


Definition 16 (Gaussian model) The Gaussian variogram model is defined by
$$ \gamma_\theta(h) = \begin{cases} 0, & h = 0, \\ c_0 + c_1 \left( 1 - \exp\left( -\dfrac{|h|^2}{c_2^2} \right) \right), & h \neq 0, \end{cases} \qquad (2.23) $$
with $\theta = (c_0, c_1, c_2)$, $c_i \geq 0$ for $i = 0, 1, 2$.

It should be remarked that random fields with Gaussian variogram need not be Gaussian random fields. Gaussian variograms imply very smooth random fields, which are often not realistic for many practical applications.

Definition 17 (Spherical model) The spherical model is defined by
$$ \gamma_\theta(h) = \begin{cases} 0, & h = 0, \\ c_0 + c_1 \left( \dfrac{3}{2} \left( \dfrac{|h|}{c_2} \right) - \dfrac{1}{2} \left( \dfrac{|h|}{c_2} \right)^3 \right), & 0 < |h| \leq c_2, \\ c_0 + c_1, & |h| > c_2, \end{cases} \qquad (2.24) $$
with $\theta = (c_0, c_1, c_2)$, $c_i \geq 0$ for $i = 0, 1, 2$.

This formula defines a valid variogram only if $h$ is the absolute value of a vector in $\mathbb{R}^2$ or $\mathbb{R}^3$.
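For later use in variogram fitting and kriging, the isotropic models above translate directly into code. The following minimal Python sketch is ours: the function names are assumptions, and each function implements the corresponding definition with the convention $\gamma_\theta(0) = 0$.

```python
import numpy as np

def exponential_variogram(h, c0, c1, c2):
    """Exponential model (2.22): c0 + c1 * (1 - exp(-|h|/c2)) for h != 0."""
    h = np.abs(np.asarray(h, dtype=float))
    g = c0 + c1 * (1.0 - np.exp(-h / c2))
    return np.where(h == 0.0, 0.0, g)

def gaussian_variogram(h, c0, c1, c2):
    """Gaussian model (2.23): c0 + c1 * (1 - exp(-h^2/c2^2)) for h != 0."""
    h = np.abs(np.asarray(h, dtype=float))
    g = c0 + c1 * (1.0 - np.exp(-(h / c2) ** 2))
    return np.where(h == 0.0, 0.0, g)

def spherical_variogram(h, c0, c1, c2):
    """Spherical model (2.24); constant sill c0 + c1 beyond the range c2."""
    h = np.abs(np.asarray(h, dtype=float))
    inside = c0 + c1 * (1.5 * (h / c2) - 0.5 * (h / c2) ** 3)
    g = np.where(h <= c2, inside, c0 + c1)
    return np.where(h == 0.0, 0.0, g)
```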

Chapter 3

Variogram estimation

In order to estimate the variogram of an intrinsically stationary random field from the available data, several variogram estimators have been introduced. These are used to derive the so called empirical variogram, i.e. a discrete set of values to which an admissible variogram model can then be fitted. For the purposes of this presentation, we will restrict our attention to isotropic random fields, although similar considerations can be carried out in the anisotropic case.

3.1 Empirical variogram estimators

A finite set of positive values $h_k$, $k = 1, \dots, K$ is introduced. These values are assumed to be ordered so that $h_k < h_{k+1}$, and they are interpreted as absolute distances from the origin. We also introduce the positive values $\delta_k$, $k = 1, \dots, K$, so that the intervals $[h_k - \frac{\delta_k}{2}, h_k + \frac{\delta_k}{2}]$ are mutually disjoint and cover completely the interval $[0, h_K + \frac{\delta_K}{2}]$. These values can be used to define the distance classes
$$ \mathcal{N}(h_k) = \left\{ (\mathbf{x}_i, \mathbf{x}_j) : h_k - \frac{\delta_k}{2} \leq \|\mathbf{x}_i - \mathbf{x}_j\| < h_k + \frac{\delta_k}{2} \right\}. \qquad (3.1) $$
Here, $\mathbf{x}_i \in \mathbb{R}^d$, $i = 1, \dots, N$ denotes as in the previous chapters the points at which the data are available, so that class $\mathcal{N}(h_k)$ includes all pairs of measurement points whose mutual distance falls in the interval $[h_k - \frac{\delta_k}{2}, h_k + \frac{\delta_k}{2})$. In the following, $N(h_k) = |\mathcal{N}(h_k)|$ will denote the cardinality of class $\mathcal{N}(h_k)$. In general, it is required that the distance classes are sufficiently populated for the variogram estimation to be significant. For example, [5] suggests that $N(h_k) \geq 30$. In case this condition is not satisfied, new values of $h_k$ should be chosen to guarantee the significance of the variogram estimation.

The classical Matheron estimator is defined for $k = 1, \dots, K$ as
$$ \gamma_M(h_k) = \frac{1}{2 N(h_k)} \sum_{\mathcal{N}(h_k)} \big( Z(\mathbf{x}_i) - Z(\mathbf{x}_j) \big)^2. \qquad (3.2) $$
This is the most straightforward form of a variogram estimator and it has been widely applied, see e.g. [5], [6], [7], [12].
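A minimal Python sketch of the Matheron estimator (3.2) follows; the equally spaced distance classes and the function name are our illustrative choices, and the $N(h_k) \geq 30$ threshold is the one suggested in [5].

```python
import numpy as np

def matheron_variogram(x, z, K=10, h_max=None):
    """Empirical semivariogram (3.2) on K equally spaced distance classes."""
    # all pairwise distances and squared increments
    i, j = np.triu_indices(len(z), k=1)
    d = np.linalg.norm(x[i] - x[j], axis=1)
    sq = (z[i] - z[j]) ** 2
    if h_max is None:
        h_max = d.max() / 2.0            # common heuristic cutoff
    edges = np.linspace(0.0, h_max, K + 1)
    h_k, gamma_k, n_k = [], [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_class = (d >= lo) & (d < hi)  # pairs belonging to class N(h_k)
        if in_class.sum() >= 30:         # significance threshold from [5]
            h_k.append(0.5 * (lo + hi))
            gamma_k.append(sq[in_class].mean() / 2.0)
            n_k.append(in_class.sum())
    return np.array(h_k), np.array(gamma_k), np.array(n_k)
```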

One problem with the Matheron estimator is that it can be very sensitive to the presence of outliers in the data. In [8], a more robust estimator was proposed by Cressie and Hawkins. This is defined for $k = 1, \dots, K$ as

$$ \gamma_C(h_k) = \frac{1}{2 \left( 0.457 + \dfrac{0.494}{N(h_k)} \right)} \left( \frac{1}{N(h_k)} \sum_{\mathcal{N}(h_k)} |Z(\mathbf{x}_i) - Z(\mathbf{x}_j)|^{\frac{1}{2}} \right)^4. \qquad (3.3) $$

This choice can be explained as follows: for Gaussian random fields, $(Z(\mathbf{x}_i) - Z(\mathbf{x}_j))^2$ is a random variable with a $\chi^2$ distribution with one degree of freedom. For this type of variable, it can be seen heuristically that raising to the power $1/4$ is the transformation that yields a distribution most similar to a normal distribution, and it can be proven that the quantities $|Z(\mathbf{x}_i) - Z(\mathbf{x}_j)|^{1/2}$ are less correlated among themselves than the quantities $|Z(\mathbf{x}_i) - Z(\mathbf{x}_j)|^2$.

Another alternative is the estimator
$$ \gamma_{med}(h) = \frac{\Big[ \mathrm{med}\big\{ |Z(\mathbf{x}_i) - Z(\mathbf{x}_j)|^{1/2} : (\mathbf{x}_i, \mathbf{x}_j) \in \mathcal{N}(h) \big\} \Big]^4}{2 B(h)}, \qquad (3.4) $$
where $\mathrm{med}\{\cdot\}$ denotes the median of the values in brackets and $B(h)$ is a bias corrector that tends to the asymptotic value of $0.457$.

3.2 Least squares variogram fitting procedures

Once an empirical variogram has been estimated using the techniques outlined in the previous section, a valid variogram model can be fitted to the estimated values. More precisely, denote by $\gamma^\sharp(h)$ one of the variogram estimators defined in section 3.1 and by $\gamma(h; \theta)$ a valid variogram model, dependent on a parameter vector $\theta$. The simplest fitting procedure, also known as the ordinary least squares method, computes an optimal value of $\theta$ by minimization of the functional
$$ \sum_{k=1}^K \big( \gamma^\sharp(h_k) - \gamma(h_k; \theta) \big)^2. \qquad (3.5) $$


This provides a purely geometrical fitting and does not use any information on the distribution of the specific estimator $\gamma^\sharp(h)$ being used. This is instead taken into account in the so called generalized least squares method, which can be defined as follows. Let $\gamma^\sharp(h_k)$, $k = 1, \dots, K$ be the estimated values of the empirical variogram for an a priori fixed number $K$ of distance classes. Furthermore, assume that the number of data pairs in each distance class is sufficiently large (Cressie suggests to consider only classes for which at least 30 data pairs are present). One can then consider the random vector $2\gamma^\sharp = (2\gamma^\sharp(h_1), \dots, 2\gamma^\sharp(h_K))^T$ and its covariance matrix $V = \mathrm{var}(2\gamma^\sharp)$. The generalized least squares method consists in determining the parameter vector $\theta$ that minimizes the functional
$$ (2\gamma^\sharp - 2\gamma(\theta))^T V(\theta)^{-1} (2\gamma^\sharp - 2\gamma(\theta)), \qquad (3.6) $$
where $2\gamma(\theta) = (2\gamma(h_1; \theta), \dots, 2\gamma(h_K; \theta))^T$ is the theoretical variogram model to be fitted, computed at distances $h_1, \dots, h_K$. The estimator thus obtained is denoted by $\theta^\sharp_V$.

The generalized least squares method uses only the second order moments of the variogram estimator and does not require any assumption on the data distribution. On the other hand, the covariance matrix can be quite complex to derive, and the minimization of the functional (3.6) is not easy. For this reason, a simplified procedure is presented in [5], based on heuristic considerations valid in the case of a Gaussian field $Z$. This derivation shows that the nondiagonal terms of $V$ can be disregarded in a first approximation, and that the diagonal terms can be approximated by
$$ V_{j,j} \approx \frac{2 \big( 2\gamma(h_j; \theta) \big)^2}{N(h_j)}. $$

As a consequence, an estimator of the parameter vector $\theta$ can be obtained by minimization of the functional
$$ \sum_{j=1}^K N(h_j) \left( \frac{\gamma^\sharp(h_j)}{\gamma(h_j; \theta)} - 1 \right)^2. \qquad (3.7) $$

Formula (3.7) yields a criterion that attributes a greater importance to well populated distance classes $h_j$, for which $N(h_j)$ is larger. This approximation can also be considered as the first step of an iterative procedure, in which the minimization of (3.6) is sought via a sequence $\theta_k$, where $\theta_0$ is obtained by minimizing (3.7), and the following $\theta_k$ are obtained by minimization of
$$ (2\gamma^\sharp - 2\gamma(\theta))^T V(\theta_{k-1})^{-1} (2\gamma^\sharp - 2\gamma(\theta)). \qquad (3.8) $$
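A minimal sketch of the weighted least squares criterion (3.7), here fitting the exponential model (2.22) with scipy; the initial guess, bounds and function names are our illustrative assumptions. The inputs can be produced, for example, by the matheron_variogram sketch of section 3.1.

```python
import numpy as np
from scipy.optimize import minimize

def wls_fit_exponential(h_k, gamma_k, n_k):
    """Fit theta = (c0, c1, c2) of the exponential model by minimizing (3.7)."""
    def model(h, c0, c1, c2):
        return c0 + c1 * (1.0 - np.exp(-h / c2))

    def objective(theta):
        c0, c1, c2 = theta
        g = model(h_k, c0, c1, c2)
        # functional (3.7): classes with larger N(h_j) weigh more
        return np.sum(n_k * (gamma_k / g - 1.0) ** 2)

    # crude but serviceable initial guess: no nugget, sample sill and range
    theta0 = np.array([0.0, gamma_k.max(), h_k.max() / 3.0])
    bounds = [(0.0, None), (1e-12, None), (1e-12, None)]  # c_i >= 0
    res = minimize(objective, theta0, bounds=bounds)
    return res.x  # fitted (c0, c1, c2)
```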

Chapter 4

Spatial prediction and kriging

Geostatistical interpolation consists in recovering an optimal prediction of the field value at a location where no data are available, using the known data both to estimate the field variogram (or covariance) and to provide a prediction along with an estimate of the prediction error. It is to be remarked that the stationarity assumptions that will be introduced below are mainly required to make the variogram estimation of chapter 3 feasible, as discussed in section 2.4.

4.1 Ordinary kriging

In ordinary kriging, the uncertain data $z_i$, $i = 1, \dots, N$, assumed to be known at the $N$ points $\mathbf{x}_i$, $i = 1, \dots, N$, are interpreted as a realization of an intrinsically stationary random field $Z(\mathbf{x})$ with constant mean $\mu$. The constant mean is not assumed to be known, while the semivariogram has to be available. The implications of using estimated variograms on the quality of the estimate will be discussed later. This amounts to assuming $Z(\mathbf{x}) = \mu + \delta(\mathbf{x})$, where $\delta$ is a zero mean random field. Considering definition 3, these assumptions imply that
$$ E\Big[ \big( Z(\mathbf{x}) - Z(\mathbf{y}) \big)^2 \Big] = E\Big[ \big( \delta(\mathbf{x}) - \delta(\mathbf{y}) \big)^2 \Big] = 2\gamma(\mathbf{x}, \mathbf{y}). \qquad (4.1) $$

Under these assumptions, one can define ordinary kriging as follows:

Definition 18 (Ordinary kriging) Given a point $\mathbf{x}_0$, the ordinary kriging estimator at $\mathbf{x}_0$ based on the data $Z(\mathbf{x}_i)$, $i = 1, \dots, N$ is defined as the linear unbiased estimator
$$ \hat{Z}(\mathbf{x}_0) = \sum_{i=1}^N \lambda_i Z(\mathbf{x}_i) $$
of $Z(\mathbf{x}_0)$ with minimum mean square prediction error.


It can be remarked that the unbiasedness requirement amounts to requiring $\sum_{i=1}^N \lambda_i = 1$, since
$$ E\big[ \hat{Z}(\mathbf{x}_0) \big] = E\left[ \sum_{i=1}^N \lambda_i Z(\mathbf{x}_i) \right] = \sum_{i=1}^N \lambda_i E\big[ Z(\mathbf{x}_i) \big] = \mu \sum_{i=1}^N \lambda_i, $$
which is equal to $\mu = E\big[ Z(\mathbf{x}_0) \big]$ if and only if the coefficients of the linear combination sum to one.

In order to derive an expression for these coefficients, it is practical to resort to the method of Lagrange multipliers to reduce the problem to an unconstrained minimization. Thus, one introduces the function
$$ \phi(\lambda_1, \dots, \lambda_N, \beta) = E\left[ \left( Z(\mathbf{x}_0) - \sum_{i=1}^N \lambda_i Z(\mathbf{x}_i) \right)^2 \right] - 2\beta \left( \sum_{i=1}^N \lambda_i - 1 \right) $$

and seeks values of $\lambda_1, \dots, \lambda_N, \beta$ such that $\phi$ attains its minimum. Before proceeding to the minimization, the function is rewritten using the fact that, thanks to the constraint $\sum_{i=1}^N \lambda_i = 1$,
$$ \begin{aligned} \left( Z(\mathbf{x}_0) - \sum_{i=1}^N \lambda_i Z(\mathbf{x}_i) \right)^2 &= Z(\mathbf{x}_0)^2 - 2 Z(\mathbf{x}_0) \sum_{i=1}^N \lambda_i Z(\mathbf{x}_i) + \left( \sum_{i=1}^N \lambda_i Z(\mathbf{x}_i) \right)^2 \\ &= \sum_{i=1}^N \lambda_i \big( Z(\mathbf{x}_0)^2 - 2 Z(\mathbf{x}_0) Z(\mathbf{x}_i) + Z(\mathbf{x}_i)^2 \big) \\ &\quad - \frac{1}{2} \left[ \sum_{i=1}^N \lambda_i Z(\mathbf{x}_i)^2 + \sum_{j=1}^N \lambda_j Z(\mathbf{x}_j)^2 - 2 \left( \sum_{i=1}^N \lambda_i Z(\mathbf{x}_i) \right) \left( \sum_{j=1}^N \lambda_j Z(\mathbf{x}_j) \right) \right] \\ &= \sum_{i=1}^N \lambda_i \big( Z(\mathbf{x}_0) - Z(\mathbf{x}_i) \big)^2 - \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N \lambda_i \lambda_j \big( Z(\mathbf{x}_i) - Z(\mathbf{x}_j) \big)^2. \end{aligned} $$


Because of equation (4.1), this implies that
$$ \begin{aligned} \phi(\lambda_1, \dots, \lambda_N, \beta) &= \sum_{i=1}^N \lambda_i \gamma(\mathbf{x}_0, \mathbf{x}_i) + \sum_{i=1}^N \lambda_i \left[ \gamma(\mathbf{x}_0, \mathbf{x}_i) - \sum_{j=1}^N \lambda_j \gamma(\mathbf{x}_i, \mathbf{x}_j) \right] - 2\beta \left( \sum_{i=1}^N \lambda_i - 1 \right) \\ &= -\sum_{i=1}^N \sum_{j=1}^N \lambda_i \lambda_j \gamma(\mathbf{x}_i, \mathbf{x}_j) + 2 \sum_{i=1}^N \lambda_i \gamma(\mathbf{x}_0, \mathbf{x}_i) - 2\beta \left( \sum_{i=1}^N \lambda_i - 1 \right). \end{aligned} \qquad (4.2) $$

Setting the gradient of the function $\phi$ equal to zero leads to the linear system
$$ \Gamma_O \lambda_O = \gamma_O, \qquad (4.3) $$
where the unknown and right hand side are given by, respectively,
$$ \lambda_O = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \\ \beta \end{pmatrix}, \qquad \gamma_O = \begin{pmatrix} \gamma(\mathbf{x}_0, \mathbf{x}_1) \\ \vdots \\ \gamma(\mathbf{x}_0, \mathbf{x}_N) \\ 1 \end{pmatrix}, \qquad (4.4) $$
and the system matrix is defined by
$$ \Gamma_O = \begin{pmatrix} \gamma(\mathbf{x}_1, \mathbf{x}_1) & \gamma(\mathbf{x}_1, \mathbf{x}_2) & \dots & \gamma(\mathbf{x}_1, \mathbf{x}_N) & 1 \\ \gamma(\mathbf{x}_2, \mathbf{x}_1) & \gamma(\mathbf{x}_2, \mathbf{x}_2) & \dots & \gamma(\mathbf{x}_2, \mathbf{x}_N) & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \gamma(\mathbf{x}_N, \mathbf{x}_1) & \gamma(\mathbf{x}_N, \mathbf{x}_2) & \dots & \gamma(\mathbf{x}_N, \mathbf{x}_N) & 1 \\ 1 & 1 & \dots & 1 & 0 \end{pmatrix}. \qquad (4.5) $$

The ordinary kriging coefficients can then be determined by solving the linear system (4.3), so that
$$ \lambda_O = \Gamma_O^{-1} \gamma_O. \qquad (4.6) $$
It is to be remarked that the solution $\lambda_O$ provides two types of information. Along with the values of the coefficients $\lambda_i$, $i = 1, \dots, N$, the solution of the system also provides the value of the Lagrange multiplier $\beta$ that minimizes the mean square prediction error. Substituting the computed values back into the expression of this functional, one can see that the optimal value of the prediction error is given by
$$ \sigma^2_{OK}(\mathbf{x}_0) = \lambda_O^T \gamma_O = \gamma_O^T \Gamma_O^{-1} \gamma_O. \qquad (4.7) $$
This expression is also called the kriging variance and is an estimate of the prediction error associated with the ordinary kriging predictor.
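Assembling (4.3)-(4.7) gives a compact ordinary kriging routine. The following is a minimal Python sketch, assuming an isotropic semivariogram passed in as a callable (for instance one of the model sketches of section 2.7 with parameters fitted as in section 3.2); all names are our own.

```python
import numpy as np

def ordinary_kriging(x, z, x0, gamma):
    """Ordinary kriging predictor and kriging variance at a single point x0.

    x     : (N, d) data locations;  z : (N,) data values
    x0    : (d,) prediction location
    gamma : callable isotropic semivariogram gamma(h), with gamma(0) = 0
    """
    N = len(z)
    # system matrix Gamma_O of (4.5): semivariogram block bordered by ones
    H = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    G = np.ones((N + 1, N + 1))
    G[:N, :N] = gamma(H)
    G[N, N] = 0.0
    # right hand side gamma_O of (4.4)
    g = np.ones(N + 1)
    g[:N] = gamma(np.linalg.norm(x - x0, axis=1))
    # solve (4.3); lam[:N] are the weights, lam[N] the Lagrange multiplier beta
    lam = np.linalg.solve(G, g)
    z_hat = lam[:N] @ z                # predictor of definition 18
    sigma2 = lam @ g                   # kriging variance (4.7)
    return z_hat, sigma2
```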


4.2 Universal kriging

In universal kriging, the uncertain data $z_i$, $i = 1, \dots, N$, assumed to be known at the $N$ points $\mathbf{x}_i$, $i = 1, \dots, N$, are interpreted as a realization of a random field that can be decomposed into the sum of a deterministic component and of an intrinsically stationary random field with zero mean. This amounts to assuming $Z(\mathbf{x}) = \sum_{j=1}^p \beta_j f_j(\mathbf{x}) + \delta(\mathbf{x})$, where $\delta$ is the zero mean random field. The deterministic component is represented using shape functions $f_j$ that are assumed to be known, along with the semivariogram of the random field $\delta$, but the coefficients $\beta_j$ are not needed to formulate the prediction. Under these assumptions, one can define universal kriging as follows:

Definition 19 (Universal kriging) Given a point $\mathbf{x}_0$, the universal kriging estimator at $\mathbf{x}_0$ based on the data $Z(\mathbf{x}_i)$, $i = 1, \dots, N$ is defined as the linear unbiased estimator
$$ \hat{Z}(\mathbf{x}_0) = \sum_{i=1}^N \lambda_i Z(\mathbf{x}_i) $$
of $Z(\mathbf{x}_0)$ with minimum mean squared prediction error.

It is to be remarked that if $p = 1$ and $f_1 = 1$ are chosen, ordinary kriging is recovered exactly. Introducing the matrix
$$ X = \begin{pmatrix} f_1(\mathbf{x}_1) & \dots & f_p(\mathbf{x}_1) \\ f_1(\mathbf{x}_2) & \dots & f_p(\mathbf{x}_2) \\ \vdots & & \vdots \\ f_1(\mathbf{x}_N) & \dots & f_p(\mathbf{x}_N) \end{pmatrix} \qquad (4.8) $$
and the vectors
$$ \beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \qquad \delta = \begin{pmatrix} \delta(\mathbf{x}_1) \\ \vdots \\ \delta(\mathbf{x}_N) \end{pmatrix}, \qquad Z = \begin{pmatrix} Z(\mathbf{x}_1) \\ \vdots \\ Z(\mathbf{x}_N) \end{pmatrix}, $$
the universal kriging data can also be rewritten as
$$ Z = X\beta + \delta, \qquad (4.9) $$

which highlights the formal similarity with the general linear estimation problem. The functional to be minimized can be written in the case of universal kriging as
$$ \phi(\lambda_1, \dots, \lambda_N, m_1, \dots, m_p) = E\left[ \left( Z(\mathbf{x}_0) - \sum_{i=1}^N \lambda_i Z(\mathbf{x}_i) \right)^2 \right] - 2 \sum_{j=1}^p m_j \left( \sum_{i=1}^N \lambda_i f_j(\mathbf{x}_i) - f_j(\mathbf{x}_0) \right), $$
where $m_l$, $l = 1, \dots, p$ are the Lagrange multipliers. Repeating the derivation along the lines of the previous section leads to the linear system
$$ \Gamma_U \lambda_U = \gamma_U, \qquad (4.10) $$
where the unknown and right hand side vector are given by, respectively,
$$ \lambda_U = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \\ m_1 \\ \vdots \\ m_p \end{pmatrix}, \qquad \gamma_U = \begin{pmatrix} \gamma(\mathbf{x}_0, \mathbf{x}_1) \\ \vdots \\ \gamma(\mathbf{x}_0, \mathbf{x}_N) \\ f_1(\mathbf{x}_0) \\ \vdots \\ f_p(\mathbf{x}_0) \end{pmatrix}, \qquad (4.11) $$
and the system matrix is given by
$$ \Gamma_U = \begin{pmatrix} \gamma(\mathbf{x}_1, \mathbf{x}_1) & \dots & \gamma(\mathbf{x}_1, \mathbf{x}_N) & f_1(\mathbf{x}_1) & \dots & f_p(\mathbf{x}_1) \\ \gamma(\mathbf{x}_2, \mathbf{x}_1) & \dots & \gamma(\mathbf{x}_2, \mathbf{x}_N) & f_1(\mathbf{x}_2) & \dots & f_p(\mathbf{x}_2) \\ \vdots & & \vdots & \vdots & & \vdots \\ \gamma(\mathbf{x}_N, \mathbf{x}_1) & \dots & \gamma(\mathbf{x}_N, \mathbf{x}_N) & f_1(\mathbf{x}_N) & \dots & f_p(\mathbf{x}_N) \\ f_1(\mathbf{x}_1) & \dots & f_1(\mathbf{x}_N) & 0 & \dots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ f_p(\mathbf{x}_1) & \dots & f_p(\mathbf{x}_N) & 0 & \dots & 0 \end{pmatrix}. \qquad (4.12) $$

The universal kriging coefficients can then be determined by solving the linear system (4.10), so that
$$ \lambda_U = \Gamma_U^{-1} \gamma_U. \qquad (4.13) $$
Similarly to the ordinary kriging case, along with the prediction the mean squared prediction error can also be computed by the formula
$$ \sigma^2_{UK}(\mathbf{x}_0) = \lambda_U^T \gamma_U = \gamma_U^T \Gamma_U^{-1} \gamma_U, \qquad (4.14) $$
once the universal kriging coefficients and Lagrange multipliers have been determined.
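The universal kriging system (4.10)-(4.12) extends the ordinary kriging sketch above by bordering the semivariogram block with the shape function matrix. Again a minimal sketch with our own names; the shape functions are supplied as callables acting on arrays of locations.

```python
import numpy as np

def universal_kriging(x, z, x0, gamma, shape_funcs):
    """Universal kriging prediction and variance at x0.

    shape_funcs : list of p callables f_j mapping an (n, d) array of
                  locations to an (n,) array of shape function values
    """
    N, p = len(z), len(shape_funcs)
    H = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    F = np.column_stack([f(x) for f in shape_funcs])   # matrix X of (4.8)
    # assemble Gamma_U of (4.12)
    G = np.zeros((N + p, N + p))
    G[:N, :N] = gamma(H)
    G[:N, N:] = F
    G[N:, :N] = F.T
    # right hand side gamma_U of (4.11)
    g = np.concatenate([gamma(np.linalg.norm(x - x0, axis=1)),
                        [f(x0[None, :])[0] for f in shape_funcs]])
    lam = np.linalg.solve(G, g)
    return lam[:N] @ z, lam @ g        # predictor and variance (4.14)
```

For example, choosing shape_funcs = [lambda s: np.ones(len(s))] recovers ordinary kriging ($p = 1$, $f_1 = 1$), while [lambda s: np.ones(len(s)), lambda s: s[:, 0], lambda s: s[:, 1]] models a linear trend in the plane.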

The main difficulty in the practical application of universal kriging lies in the fact that, if the variogram is not known, for any random field with non constant mean the standard variogram estimators described in chapter 3 are no longer unbiased and, indeed, cannot be applied if the coefficients $\beta_j$ are not known. These can in turn be estimated assuming that the field $\delta$ has known covariance. Indeed, if the covariance of the data $Z$ is known and denoted by $\Sigma$, the standard generalized least squares estimator yields the value
$$ \beta_{gls} = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} Z. $$


However, the data covariance, assuming it exists, is in fact related to the variogram. This leads to a circularity in the hypotheses needed for variogram estimation, which can be resolved in a number of ways, none of which is free from criticism and practical problems. The reader is referred to the discussion in [5], [4] for more details.

Chapter 5

Appendix: Basics of random variables

In order to make these notes self contained, some basic definitions and results in probability theory are summarized in this appendix. As in the rest of these notes, there is no attempt at achieving a high standard of mathematical rigour in the formulation of the definitions and theorems. The reader interested in the complete presentation of the measure theoretic problems associated with probability spaces and random variables should consult textbooks such as [2]. A basic introduction to probability theory and mathematical statistics can be found in [10].

Definition 20 (Probability space) A probability space is defined by

• the set Ω of all events that are considered admissible

• the collection $\mathcal{F}$ of all subsets of $\Omega$ for which a probability is defined (which includes $\Omega$ and the empty set $\emptyset$); in order to avoid some paradoxes and to endow $P$ with all the desirable properties defined below, $\mathcal{F}$ cannot coincide with the set of all subsets of $\Omega$ and must satisfy a series of properties which will not be listed here;

• the probability $P$, a function that assigns a value in the interval $[0, 1]$ to each set in $\mathcal{F}$, representing the relative weight of a given event with respect to the set of all admissible events.

The probability $P$ must satisfy the properties:

• P (Ω) = 1;

• $P(A^c) = 1 - P(A)$, for each set $A \in \mathcal{F}$, where $A^c$ denotes the complement of $A$;


• given an arbitrary (possibly infinite) sequence of mutually disjoint sets $A_i$, $i \geq 1$, $A_i \cap A_j = \emptyset$ for $i \neq j$, it holds that
$$ P\left( \bigcup_{i \geq 1} A_i \right) = \sum_{i \geq 1} P(A_i). $$

Definition 21 (Random variable) A random variable is a function $Z = Z(\omega)$ which prescribes a real number $Z$ for each event $\omega$ in a probability space $(\Omega, P)$.

From a probabilistic viewpoint, the behaviour of a scalar random variable $X$ is completely determined if it is known how to compute the probabilities
$$ P\big[ X \in [a, b) \big]. \qquad (5.1) $$

Definition 22 (Probability distribution) The probability distribution of the random variable $X$ is defined for each $x \in \mathbb{R}$ by
$$ F_X(x) = P\big[ X \in (-\infty, x) \big]. \qquad (5.2) $$

Definition 23 (Continuous random variables) The random variable $X$ has a continuous distribution if there is a non negative real function $f_X(u)$ such that for each $x \in \mathbb{R}$
$$ F_X(x) = P\big[ X \in (-\infty, x) \big] = \int_{-\infty}^{x} f_X(u)\, du. \qquad (5.3) $$
$f_X(u)$ is called the probability density function of $X$.

An important example is that of Gaussian random variables, for which the distribution is defined by the density
$$ f_X(u) = \frac{1}{\sqrt{2\pi a}} \exp\left( -\frac{(u - m)^2}{2a} \right), \qquad (5.4) $$
where $m$ is a real number and $a$ is a positive number.

The average and the variance of a random variable are defined as
$$ m_X = E[X] = \int_{-\infty}^{+\infty} u\, f_X(u)\, du, \qquad (5.5) $$
$$ \mathrm{Var}[X] = \sigma^2_X = E\big[ (X - m_X)^2 \big] = \int_{-\infty}^{+\infty} (u - m_X)^2 f_X(u)\, du. \qquad (5.6) $$


The median of a random variable is defined implicitly by the equation
$$ F_X(\mathrm{med}[X]) = \frac{1}{2}. \qquad (5.7) $$

The mean of a random variable is its best approximation by a constant with respect to the $L^2$ norm of the difference, i.e.:

Theorem 6 (Mean as minimum mean square estimator) For any real number $\lambda$, one has
$$ E\big[ (X - m_X)^2 \big] \leq E\big[ (X - \lambda)^2 \big]. $$

The median of a random variable is its best approximation by a constant with respect to the $L^1$ norm of the difference, i.e.:

Theorem 7 (Median as minimum mean absolute error estimator) For any real number $\lambda$, one has
$$ E\big[ |X - \mathrm{med}[X]| \big] \leq E\big[ |X - \lambda| \big]. $$

Other quantities, such as the covariance, require instead the two dimensional finite distributions:
$$ \mathrm{Cov}(X, Y) = E\big[ (X - m_X)(Y - m_Y) \big]. \qquad (5.8) $$
Existence of the covariance is equivalent to the existence of the second order moments of the random variables. Variance and covariance are related by
$$ \mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}(X, Y). \qquad (5.9) $$
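As a quick sanity check of relation (5.9), one can verify it on simulated correlated Gaussian variables; the parameters in the following sketch are arbitrary illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(1)

# two correlated Gaussian random variables (illustrative parameters)
samples = rng.multivariate_normal([0.0, 0.0], [[2.0, 0.8], [0.8, 1.0]],
                                  size=200_000)
X, Y = samples[:, 0], samples[:, 1]

lhs = np.var(X + Y)
rhs = np.var(X) + np.var(Y) + 2.0 * np.cov(X, Y, ddof=0)[0, 1]
assert abs(lhs - rhs) < 1e-8   # Var[X + Y] = Var[X] + Var[Y] + 2 Cov(X, Y)
```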

Bibliography

[1] R.J. Adler. The Geometry of Random Fields. Wiley, 1981.

[2] P. Billingsley. Probability and Measure. Wiley, New York, 1986.

[3] M.D. Buhmann. Radial Basis Functions. Cambridge University Press, Cambridge, 2003.

[4] R. Christensen. Linear Models for Multivariate, Time Series and Spatial Data. Springer Verlag, 1991.

[5] N. Cressie. Statistics for Spatial Data. Wiley, 1991.

[6] M.G. Genton. Highly robust variogram estimation. Mathematical Geology, 30:213–221, 1998.

[7] D.J. Gorsich and M.G. Genton. Variogram model selection via nonparametric derivative estimation. Mathematical Geology, 32:249–270, 2000.

[8] D.M. Hawkins and N. Cressie. Robust kriging - a proposal. Journal of the International Association for Mathematical Geology, 16:3–18, 1984.

[9] G. Kitanidis. Geostatistics. In D.R. Maidment, editor, Handbook of Hydrology, pages 153–165. McGraw Hill, 1993.

[10] S. Ross. Probability and Statistics for the Applied Sciences. ??, Berlin, 1995.

[11] J. Stoer and R. Bulirsch. An Introduction to Numerical Analysis, 2nd edition. Springer Verlag, Berlin, 1990.

[12] H. Wackernagel. Multivariate Geostatistics. Springer Verlag, Berlin, 1995.

[13] A.T. Walden and P. Guttorp. Statistics in the Environmental and Earth Sciences. Arnold, 1992.
