
Tapered Covariance: Bayesian Estimation and

Asymptotics

Benjamin Shaby

SAMSI

Research Triangle Park, NC 27709

David Ruppert

OR&IE

Cornell University

Ithaca, NY 14850

November 15, 2010

Author’s Footnote:

Benjamin Shaby is Postdoctoral Fellow, SAMSI, Research Triangle Park, NC 27709

(E-mail: [email protected]). David Ruppert is Andrew Schultz, Jr. Professor of

Engineering, School of Operations Research and Information Engineering, Cornell

University, 14853 (E-mail: [email protected]). This work was supported by NSF

grant ITS 0612031, NIH grant R37 CA057030, and NSF grant DMS-0805975.

Abstract

The method of maximum tapered likelihood has been proposed as a way to

quickly estimate covariance parameters for stationary Gaussian random

fields. We show that under a useful asymptotic regime, maximum tapered

likelihood estimators are consistent and asymptotically normal for covariance

models in common use. We then formalize the notion of tapered quasi-Bayesian

estimators and show that they too are consistent and asymptotically normal.


We also present asymptotic confidence intervals for both types of estimators

and show via simulation that they accurately reflect sampling variability, even

at modest sample sizes. Proofs, examples, and detailed derivations are found

in the supplemental materials, available online.

Keywords: Covariance estimation, Gaussian process, Consistency, Bayesian

inference

1 Introduction

Covariance tapering was introduced as a way to mitigate the computational

burdens required for calculating statistically-relevant quantities involving large

covariance matrices arising from irregularly-spaced spatial data. These computations typically require $O(n^3)$ operations, where $n$ is the number of spatial observations. The idea behind tapering is to introduce, in a principled way, many zeros into the covariance matrices, enabling the use of sparse matrix algorithms, which have computational complexities that are generally functions

of the number of non-zero elements in the matrix.

Tapering has been studied as a way to speed up computations required

for optimal spatial prediction (Furrer et al. 2006; Furrer and Sain 2009) and

for Kalman filter updates (Furrer and Bengtsson 2007). Kaufman (2006) and

Kaufman et al. (2008) introduced the maximum tapered likelihood estimate as

a way to use tapered covariance matrices to quickly estimate covariance func-

tion parameters. Du et al. (2009) and Zhang and Du (2008) further explicated

the properties of these estimators. In addition, Kaufman (2006) discussed ap-

proximating Bayesian estimation using tapered likelihood functions.

Here, we examine the behavior of maximum tapered likelihood estimators, as well as what we will call tapered quasi-Bayesian estimators (we use the term quasi-Bayesian here despite its previous introduction in Berger (2000) to describe, pejoratively, something completely different).

Tapering is not the only approach that has been proposed to quickly com-

pute approximations to the likelihood function for large spatial datasets. When

the data are sampled on a regular spatial grid, the resulting structure of the

covariance matrix may be exploited to increase computational efficiency (Whit-

tle 1954; Zimmerman 1989). When data locations are not gridded, it is still

possible to use Fourier transform methods for approximate inference, either by integrating the locations onto a latent grid (Fuentes 2007) or by employing a non-standard periodogram formulation (Matsuda and Yajima 2009). Another

approach for non-gridded data is to factor the full likelihood into conditional

likelihoods that ignore dependence on far-away observations (Vecchia 1988;

Stein et al. 2004). Composite likelihood approaches have also been considered

(Heagerty and Lele 1998; Curriero and Lele 1999). Still another approach is

to project the data onto a lower-dimensional space (Cressie and Johannesson

2008; Banerjee et al. 2008; Finley et al. 2009), although theoretical properties

of these techniques are not known.

Here, like Kaufman (2006), Kaufman et al. (2008), Du et al. (2009), and

Zhang and Du (2008), we study the use of tapering for parameter estimation.

Indeed, the present study may be seen as a follow-up to these works, and

makes use of some of the proof techniques contained in Kaufman (2006) and

Kaufman et al. (2008). Unlike the previous works, which introduced the method

and considered the asymptotic behavior of tapering with the popular Matern

covariance function, we do not restrict ourselves to a single covariance model.

We also devote considerably more theoretical attention to the quasi-Bayesian

perspective than Kaufman (2006).

In addition, while the previous asymptotic studies of the maximum tapered

likelihood estimator are as much results on inconsistency as they are on con-

sistency (see Section 2.1), we provide proofs for consistency and asymptotic

normality of both the maximum tapered likelihood estimator and tapered quasi-


Bayesian estimators. The key reason for the stronger results presented here is

that we consider a different type of asymptotic regime.

Asymptotics for random fields, unlike the case of asymptotics for indepen-

dent data, are somewhat ill-defined because the manner in which sample points

are added such that their number increases to infinity is not clear cut. There are

two standard approaches to increasing the number of observations toward infin-

ity (Cressie 1991). The first is called increasing-domain asymptotics, where the

domain expands in spatial extent, while the sampling density stays constant.

The second is called infill, or fixed-domain, asymptotics, where the domain

stays constant, while the sampling density increases to infinity.
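The two regimes can be made concrete with a small sketch. The code below is an illustration only: the function names, the square-lattice layout, and the unit spacing are assumptions introduced here, not part of the original study.

```python
import numpy as np

def increasing_domain_sites(n, spacing=1.0):
    """Square lattice whose side grows with n; sampling density stays fixed."""
    m = int(round(np.sqrt(n)))
    axis = spacing * np.arange(m)
    return np.array([(x, y) for x in axis for y in axis])

def infill_sites(n, side=1.0):
    """Square lattice confined to [0, side]^2; spacing shrinks as n grows."""
    m = int(round(np.sqrt(n)))
    axis = np.linspace(0.0, side, m)
    return np.array([(x, y) for x in axis for y in axis])

for n in (25, 100, 400):
    inc = increasing_domain_sites(n)
    inf = infill_sites(n)
    # extent of each design and the infill spacing between adjacent sites
    print(n, inc.max(), inf.max(), inf[1, 1] - inf[0, 1])
```

Under the first design the extent grows with n while neighbors stay one unit apart; under the second the extent is fixed and neighbors crowd together.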

For spatial prediction, Stein (1999, Chapter 3.3) prefers infill asymptotics,

arguing essentially that because the usual goal for spatial prediction is inter-

polation, and it is reasonable to posit that the denser the sample gets, the

better the interpolation ought to get, we are led towards fixed-domain asymp-

totics. For parameter estimation, however, the situation is somewhat different

in that it is not immediately clear which asymptotics better represents the case

of infinitely-increasing information.

The maximum likelihood estimator for the parameters of popular covariance

models has been studied under both types of asymptotics. Mardia and Marshall

(1984) showed that the maximum likelihood estimate is consistent and asymp-

totically normal for many covariance models under asymptotic sampling that

includes increasing domain asymptotics as a special case. On the other hand,

Zhang (2004) proved that the parameters of the popular Matern covariance

model (4) are not individually consistently estimable under infill asymptotics.

The parameter estimation analogue to Stein (1999)’s argument for prediction,

then, seems to be that because information increases infinitely in the case of

increasing-domain asymptotics, we might prefer this scheme over fixed-domain

asymptotics, where information does not increase to infinity for all parameters

as the number of sample points increases to infinity. In addition, relative to infill, results concerning expanding-domain asymptotics are available for much more general cases.

In terms of how well the limiting distributions under the two asymptotic

regimes approximate their finite-sample correspondents, Zhang and Zimmer-

man (2005) found that infill asymptotics are preferable in some situations.

However, as we will see from the simulations in Section 2.2, as long as the

spatial extent of the sampling region is large compared to the range of depen-

dence of the process, increasing-domain asymptotics provide a very accurate

description of the behavior of the maximum tapered likelihood estimate.

More concretely, let $Z(s)$ be a Gaussian random field, where $s$ is a location index that varies continuously over a domain $D$. Suppose also that we have observed $Z$ at $n$ locations $s_1, \ldots, s_n$. The covariance between measurements at two locations, $\mathrm{Cov}(Z(s_i), Z(s_j)) = C(\theta; s_i, s_j)$, is assumed to be a function of only the locations themselves, and is known up to a parameter $\theta$. We will further assume that $Z(s)$ is second-order stationary and, for simplicity, has mean zero. For convenience, $Z(s)$ may also be assumed to be isotropic, so that $C(\theta; s_i, s_j) = C(\theta; h_{ij})$, where $h_{ij} = \|s_i - s_j\|$, although this assumption may be dropped.

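The log likelihood of $Z_n$, a vector of $n$ observations of $Z(s)$, is

\[
\ell_n(\theta; Z_n) = -\frac{1}{2}\log(|\Sigma_n(\theta)|) - \frac{1}{2} Z_n' \Sigma_n(\theta)^{-1} Z_n, \tag{1}
\]

up to an additive constant (with respect to $\theta$), where $\Sigma_{ij,n}(\theta) = C(\theta; h_{ij})$.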
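A minimal sketch of evaluating (1) numerically follows. It assumes an exponential covariance purely for illustration; the names `exp_cov` and `gaussian_loglik` are hypothetical. The Cholesky factor supplies both the log determinant and the quadratic form, which is where the cubic cost arises.

```python
import numpy as np

def exp_cov(theta, H):
    """Exponential covariance sigma^2 * exp(-alpha * h), used as an example model."""
    sigma2, alpha = theta
    return sigma2 * np.exp(-alpha * H)

def gaussian_loglik(theta, Z, H, cov=exp_cov):
    """Log likelihood (1), dropping the additive constant."""
    Sigma = cov(theta, H)
    L = np.linalg.cholesky(Sigma)              # Sigma = L L'
    logdet = 2.0 * np.sum(np.log(np.diag(L)))  # log|Sigma|
    w = np.linalg.solve(L, Z)                  # w'w = Z' Sigma^{-1} Z
    return -0.5 * logdet - 0.5 * (w @ w)

# Toy usage on five equally spaced sites on a line (made-up data)
s = np.arange(5.0)
H = np.abs(s[:, None] - s[None, :])
Z = np.array([0.3, -1.2, 0.5, 0.8, -0.1])
print(gaussian_loglik((1.5, 0.4), Z, H))
```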

Dependence of $\ell_n(\theta; Z_n)$ and $\Sigma_n(\theta)$ on $\theta$ will from here on be suppressed for convenience. Also, for a matrix $A$, $A'$ will refer to the transpose of $A$. The notation $\dot{f}$ will refer to the vector of first derivatives of the scalar function $f$, and $\ddot{f}$ will refer to the matrix of second derivatives of $f$. All derivatives are with respect to the vector $\theta$.

Let $\rho_t(L; h)$ be a correlation function and $T$ be a “taper matrix” such that $T_{ij} = \rho_t(L; h_{ij})$. All we assume about $\rho_t(L; h)$ is that it does not depend on $\theta$, that it is a valid correlation function, that its support is $[0, L)$ for some appropriately chosen $L > 0$, and that it is greater than zero for $h < L$. The

choice of L is discussed below. One example taken from a class of compactly-

supported polynomials from Wendland (1995, 1998) is

\[
\rho_t(L; h) = (1 - h/L)_+^6 \left(1 + 6h/L + 35h^2/(3L^2)\right). \tag{2}
\]

Other useful examples of such functions may be found in Wendland (1995,

1998).
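The taper in (2) is simple to code. The sketch below is an illustration; the function name `wendland_taper` and the 6 x 6 unit lattice are assumptions made here. It also reports the fraction of exactly zero entries in the resulting taper matrix, which is what sparse matrix routines exploit.

```python
import numpy as np

def wendland_taper(h, L):
    """Taper (2): (1 - h/L)_+^6 * (1 + 6 h/L + 35 h^2 / (3 L^2))."""
    r = np.asarray(h, dtype=float) / L
    return np.clip(1.0 - r, 0.0, None) ** 6 * (1.0 + 6.0 * r + 35.0 * r**2 / 3.0)

# Taper matrix for a 6 x 6 unit lattice with taper range L = 3
g = np.arange(6.0)
S = np.array([(x, y) for x in g for y in g])
H = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
T = wendland_taper(H, 3.0)
print("fraction of zero entries:", np.mean(T == 0.0))
```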

Define the tapered log likelihood as

\[
\ell_{t,n} = -\frac{1}{2}\log(|\Sigma_n \circ T_n|) - \frac{1}{2} Z_n' \left((\Sigma_n \circ T_n)^{-1} \circ T_n\right) Z_n, \tag{3}
\]

where the ◦ notation denotes the element-wise product, sometimes called the

Hadamard or Schur product. Note that (3) does not correspond to the log

density of any random vector. Importantly, $\Sigma_n \circ T_n$ is guaranteed to be positive definite as long as $\Sigma_n$ and $T_n$ are both positive definite (Horn and Johnson 1991, page 458). Equation (3) is referred to in Kaufman (2006) and Kaufman

et al. (2008) as the two-taper approximation.
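The following sketch evaluates the two-taper objective (3) with dense matrices for readability. It is not the authors' implementation: the helper names are hypothetical, the exponential covariance is an assumed example model, and a practical version would use sparse storage and a sparse Cholesky factorization rather than an explicit inverse.

```python
import numpy as np

def exp_cov(theta, H):
    sigma2, alpha = theta
    return sigma2 * np.exp(-alpha * H)

def wendland_taper(H, L):
    r = H / L
    return np.clip(1.0 - r, 0.0, None) ** 6 * (1.0 + 6.0 * r + 35.0 * r**2 / 3.0)

def tapered_loglik(theta, Z, H, L, cov=exp_cov, taper=wendland_taper):
    """Two-taper log likelihood (3), dense version for illustration only."""
    T = taper(H, L)
    St = cov(theta, H) * T               # Sigma o T (element-wise product)
    _, logdet = np.linalg.slogdet(St)
    W = np.linalg.inv(St) * T            # (Sigma o T)^{-1} o T
    return -0.5 * logdet - 0.5 * (Z @ W @ Z)

# Toy usage on a 4 x 4 unit lattice with made-up data
g = np.arange(4.0)
S = np.array([(x, y) for x in g for y in g])
H = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
Z = np.random.default_rng(0).standard_normal(len(S))
print(tapered_loglik((1.0, 0.2), Z, H, L=3.0))
```

If the taper matrix is replaced by a matrix of ones, the function returns the untapered log likelihood (1), which gives a convenient check.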

Their one-taper approximation, which was studied by Du et al. (2009) and

Zhang and Du (2008), does not correspond to an unbiased estimating equation

and can produce significantly biased estimates (Kaufman et al. 2008). We will

not consider it here.

Throughout this study, we favor simplicity of assumptions over slightly more

general results. In many cases, weaker but more elaborate assumptions are

possible, requiring only minimal changes to the proofs.


2 The Maximum Tapered Likelihood Estimator

The maximum tapered likelihood estimate is defined as $\hat{\theta}_{t,n} = \arg\max_{\theta \in \Theta} \ell_{t,n}(\theta)$. In Section 2.1, we consider this estimator from within the framework of extremum estimators and investigate its asymptotic properties. In Section 2.2,

we conduct a simulation experiment to determine, firstly, how quickly and to

what extent asymptotic sampling distributions approximate empirical sampling

distributions, and secondly how the taper range affects sampling variability.

2.1 Asymptotic Behavior of the Maximum Tapered

Likelihood Estimator

Here we study a form of increasing-domain asymptotics. The requirement of

an expanding domain is not stated explicitly, but rather, as in Mardia and

Marshall (1984), is implied by eigenvalue conditions on the covariance matrix

and its derivatives.

In contrast, Kaufman (2006) and Kaufman et al. (2008) study infill asymptotics. Specifically, consider the Matérn covariance model, defined by

\[
C(\sigma^2, \alpha, \nu; h) = \frac{\sigma^2 (\alpha h)^{\nu}}{\Gamma(\nu) 2^{\nu - 1}} K_{\nu}(\alpha h), \qquad \sigma^2, \alpha, \nu > 0, \tag{4}
\]

where $K_{\nu}$ is the modified Bessel function of the second kind of order $\nu$. It is not uncommon to assume that $\nu$ is fixed and known, a practice followed later in this paper. They show that for a known $\nu$ and some fixed $\alpha^*$, $\hat{\sigma}^2 \alpha^{*2\nu} \xrightarrow{a.s.} \sigma_0^2 \alpha_0^{2\nu}$, where $\sigma_0^2$ and $\alpha_0$ are the true parameters and $\hat{\sigma}^2$ is the value that maximizes (3). Unfortunately, under infill asymptotics, $\sigma^2$ and $\alpha$ cannot both be estimated consistently (Zhang 2004).
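As a consistency check on (4), the half-integer cases reduce to closed forms. The worked derivation below uses the standard identities $K_{1/2}(x) = \sqrt{\pi/(2x)}\,e^{-x}$ and $K_{3/2}(x) = \sqrt{\pi/(2x)}\,e^{-x}(1 + 1/x)$, together with $\Gamma(1/2) = \sqrt{\pi}$ and $\Gamma(3/2) = \sqrt{\pi}/2$. The shorthand $x = \alpha h$ is introduced here for convenience and is not part of the original notation.

```latex
\begin{align*}
\nu = \tfrac{1}{2}: \quad
C &= \frac{\sigma^2 x^{1/2}}{\sqrt{\pi}\, 2^{-1/2}} \sqrt{\frac{\pi}{2x}}\, e^{-x}
   = \sigma^2 e^{-x}
   = \sigma^2 \exp\{-\alpha h\}, \\
\nu = \tfrac{3}{2}: \quad
C &= \frac{\sigma^2 x^{3/2}}{(\sqrt{\pi}/2)\, 2^{1/2}} \sqrt{\frac{\pi}{2x}}\, e^{-x}\left(1 + \frac{1}{x}\right)
   = \sigma^2 (1 + x) e^{-x}
   = \sigma^2 (1 + \alpha h) \exp\{-\alpha h\}.
\end{align*}
```

These are the exponential model and the Matérn model with $\nu = 3/2$, respectively.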

Kaufman (2006) and Kaufman et al. (2008) argue heuristically that a good

way to measure sampling variability of the estimator is to study a sandwich


estimator of the sampling covariance matrix (e.g. Durbin 1960; Bhapkar 1972;

Morton 1981; Ferreira 1982; Godambe and Heyde 1987; Kauermann and Carroll

2001), and mention that, under conditions that are not met in the context of

Gaussian random fields, quasi-likelihood theory says that the quasi-likelihood

estimator of θ is asymptotically normal with covariance equal to the sandwich

matrix (Heyde 1997). We will show formally that under the asymptotic regime

we consider, this sandwich matrix is in fact the asymptotic covariance of θt,n.

2.1.1 Consistency

Let $\theta_0$ be the true parameter vector and $P_0$ the probability measure under $\theta_0$. Also, let $E_0$ and $\mathrm{Cov}_0$ denote the expectation and covariance, respectively, under $\theta_0$. Let $\lambda_{\max}\{A\}$ and $\lambda_{\min}\{A\}$ denote, respectively, the largest and smallest absolute eigenvalues of a matrix $A$. We assume throughout that standard measurability conditions hold.

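Theorem 2.1.1 (Consistency). Assume $\theta_0 \in \Theta$, a convex compact subset of $\mathbb{R}^p$.

(A) Assume a continuously differentiable covariance function $C(\theta; h)$ and an increasing sequence of domains $\{D_n\}$ such that

(A1) $\inf_n \lambda_{\min}\{\Sigma_n\} > 0$ and $\sup_n \lambda_{\max}\{\Sigma_n\} < \infty$ for all $\theta \in \Theta$;

(A2) $\sup_n \lambda_{\max}\left\{\frac{\partial \Sigma_n}{\partial \theta_k} \circ T_n\right\} < \infty$ for all $\theta \in \Theta$, $k = 1, \ldots, p$;

(A3) for some $N > 0$ and $\gamma > 0$, the minimum eigenvalue of the matrix $-E_0[n^{-1}\ddot{\ell}_{t,n}(\theta_0)]$ is greater than $\gamma$ for all $n \ge N$.

(B) Assume that $\theta_1 \ne \theta_2 \Rightarrow (\Sigma_n(\theta_1) \circ T_n) \ne (\Sigma_n(\theta_2) \circ T_n)$.

Then $\hat{\theta}_{t,n} = \arg\max_{\theta \in \Theta} \ell_{t,n}(\theta)$ is consistent.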

Proof. See Section S-3 in the supplemental materials.

We note that the dimension of the domain D does not play a direct role

in assumptions (A) and (B). For the covariance models often used in practice,


the identifiability assumption (B) is usually satisfied by making sure the taper

range L is larger than the smallest interpoint distances in the data. Assumption

(A) takes a bit more work to check, but is not prohibitively difficult in most

useful contexts. The following example shows one common case.

Example 2.1.1 Exponential covariance function on a rectangular lattice (I)

A commonly-used covariance function is the exponential covariance function $C(\theta; h) = \sigma^2 \exp\{-\alpha h\}$, where $\theta = (\sigma^2, \alpha)'$. Consider the increasing domain scenario where sampling points $s_1, \ldots, s_n$ are placed on a rectangular lattice $\{D_n\} \subset \Delta\mathbb{Z}^d$, for a fixed $0 < \Delta < L$, and $D_n \subset D_{n+1}$ for all $n$.

Define the distance matrix $H_n$, with $H_{ij,n} = \|s_i - s_j\|$, where $\|\cdot\|$ denotes the Euclidean norm. Let $\Sigma_n = \sigma^2 \Gamma_n$, so that the correlation matrix has entries $\Gamma_{ij,n} = e^{-\alpha H_{ij,n}}$. Taking derivatives, we get $\partial \Sigma_n / \partial \sigma^2 = \Gamma_n$ and $\partial \Sigma_n / \partial \alpha = -\sigma^2 H_n \circ \Gamma_n$.
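The two first derivatives of the exponential covariance matrix in Example 2.1.1 can be sanity-checked with central finite differences. The sketch below is illustrative; the lattice size, parameter values, and step size are arbitrary choices made here.

```python
import numpy as np

def exp_cov(sigma2, alpha, H):
    return sigma2 * np.exp(-alpha * H)

# Distance matrix for a 4 x 4 unit lattice
g = np.arange(4.0)
S = np.array([(x, y) for x in g for y in g])
H = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)

sigma2, alpha, eps = 1.5, 0.2, 1e-6
Gamma = np.exp(-alpha * H)

# Central differences in each parameter
d_sigma2 = (exp_cov(sigma2 + eps, alpha, H) - exp_cov(sigma2 - eps, alpha, H)) / (2 * eps)
d_alpha = (exp_cov(sigma2, alpha + eps, H) - exp_cov(sigma2, alpha - eps, H)) / (2 * eps)

# Compare with the analytic forms Gamma and -sigma^2 * (H o Gamma)
err_sigma2 = np.max(np.abs(d_sigma2 - Gamma))
err_alpha = np.max(np.abs(d_alpha - (-sigma2 * H * Gamma)))
print(err_sigma2, err_alpha)
```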

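For any matrix norm, the spectral radius $\lambda_{\max}\{A\}$ of an $n \times n$ matrix $A$ satisfies $\lambda_{\max}\{A\} \le \|A\|$ (Horn and Johnson 1991, page 297). If we choose the maximum row sum norm (Horn and Johnson 1991, page 295),

\begin{align*}
\lambda_{\max}\{\Gamma_n\} &\le \|\Gamma_n\|_\infty \\
&= \sup_{1 \le i \le n} \sum_{j=1}^{n} e^{-\alpha H_{ij}} \\
&\le \sum_{s \in \Delta\mathbb{Z}^d} e^{-\alpha\|s\|}. \tag{5}
\end{align*}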

One can check that $\int_{s \in \mathbb{R}^d} e^{-\alpha\|s\|}\,ds < \infty$ for $\alpha > 0$, which implies that the last line in (5) is finite, so $\sup_n \lambda_{\max}\{\Gamma_n\}$ is finite. We finish confirming (A1) by noting that all interpoint distances are bounded away from zero, ensuring that $\Gamma_n$ remains strictly positive definite as $n \to \infty$.
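The row-sum bound used for (A1) can be observed numerically. The sketch below is an illustration on two-dimensional unit lattices of increasing size, with a hypothetical helper `lattice_dist`; it prints the largest eigenvalue of the exponential correlation matrix next to its maximum row sum.

```python
import numpy as np

def lattice_dist(m, delta=1.0):
    """Pairwise distances for an m x m lattice with spacing delta."""
    g = delta * np.arange(m)
    S = np.array([(x, y) for x in g for y in g])
    return np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)

alpha = 0.2
for m in (4, 8, 16):
    Gamma = np.exp(-alpha * lattice_dist(m))
    lam_max = np.linalg.eigvalsh(Gamma)[-1]      # eigenvalues come back in ascending order
    row_sum = np.abs(Gamma).sum(axis=1).max()    # maximum row sum norm
    print(m * m, lam_max, row_sum)
```

In every case the largest eigenvalue sits below the maximum row sum, which in turn is bounded by the lattice sum in (5).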


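Next, for a taper function $\rho_t(L; h)$ with support on $[0, L)$,

\begin{align*}
\lambda_{\max}\{H_n \circ \Gamma_n \circ T_n\} &\le \|H_n \circ \Gamma_n \circ T_n\|_\infty \\
&= \sup_{1 \le i \le n} \sum_{j=1}^{n} H_{ij} e^{-\alpha H_{ij}} \rho_t(L; H_{ij}) \\
&\le \sup_{1 \le i \le n} \sum_{j=1}^{n} H_{ij} e^{-\alpha H_{ij}} \mathbf{1}\{H_{ij} < L\} \\
&\le \sum_{s \in \Delta\mathbb{Z}^d} \|s\| e^{-\alpha\|s\|} \mathbf{1}\{\|s\| < L\} \\
&= \sum_{s \in \Delta\mathbb{Z}^d \cap \{s : \|s\| < L\}} \|s\| e^{-\alpha\|s\|}. \tag{6}
\end{align*}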

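The last line in (6) is the sum of a finite number of bounded summands, and is therefore itself finite. Thus, for $\sigma^2 < \infty$, $\sup_n \lambda_{\max}\{H_n \circ \Gamma_n \circ T_n\} < \infty$. Now we can see that $\sup_n \lambda_{\max}\left\{\frac{\partial \Sigma_n}{\partial \theta_k} \circ T_n\right\} < \infty$ for $k = 1, 2$, thus verifying Assumption (A2).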

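For Assumption (A3) it is possible to write down the form of the two eigenvalues analytically. Let $q_{ij}$ denote the $ij$th entry of $-E_0[n^{-1}\ddot{\ell}_{t,n}(\theta_0)]$. Following the computations in (S-9), we see that, suppressing the dependence on $n$ for the moment,

\begin{align*}
q_{11} &= \frac{1}{2\sigma^4}\mathrm{tr}\left\{(\Gamma \circ T)^{-1}(\Gamma \circ T)(\Gamma \circ T)^{-1}(\Gamma \circ T)\right\} = \frac{1}{2\sigma^4}\mathrm{tr}\{I_n\} = \frac{n}{2\sigma^4}, \\
q_{12} = q_{21} &= -\frac{1}{2\sigma^2}\mathrm{tr}\left\{(\Gamma \circ T)^{-1}(H \circ \Gamma \circ T)\right\}, \\
q_{22} &= \frac{1}{2}\mathrm{tr}\left\{(\Gamma \circ T)^{-1}(H \circ \Gamma \circ T)(\Gamma \circ T)^{-1}(H \circ \Gamma \circ T)\right\}.
\end{align*}

The eigenvalues of $-E_0[n^{-1}\ddot{\ell}_{t,n}(\theta_0)]$ are

\[
\frac{1}{2}\left(q_{11} + q_{22} \pm \sqrt{4q_{12}^2 + (q_{11} - q_{22})^2}\right). \tag{7}
\]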


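Noting that $q_{11}$ and $q_{22}$ are both positive, the smaller of the eigenvalues is positive when $(q_{11} + q_{22})^2 > 4q_{12}^2 + (q_{11} - q_{22})^2$, or, after rearranging terms, when the determinant $q_{11}q_{22} - q_{12}^2 > 0$. Letting $A_n = (\Gamma_n \circ T_n)^{-1}(H_n \circ \Gamma_n \circ T_n)$, we can write the determinant as

\begin{align*}
q_{11}q_{22} - q_{12}^2 &= \frac{n}{4\sigma^4}\mathrm{tr}\{A_n^2\} - \frac{1}{4\sigma^4}\mathrm{tr}^2\{A_n\} \\
&= \frac{n}{4\sigma^4}\sum_{i=1}^{n}(\lambda_i\{A_n\})^2 - \frac{1}{4\sigma^4}\left(\sum_{i=1}^{n}\lambda_i\{A_n\}\right)^2. \tag{8}
\end{align*}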

The right hand side of (8) will decay to zero with increasing $n$ only if the eigenvalues of $A_n$ converge to a common value. Because $A_n$ is symmetric, this can only happen if $(\Gamma_n \circ T_n)^{-1}(H_n \circ \Gamma_n \circ T_n) - a_n I_n \to 0$ for some scalar $a_n$, which is clearly not the case. Thus, Assumption (A3) is satisfied.

To show Assumption (B), we write $(\Sigma_n(\theta) \circ T_n)_{ij} = \sigma^2 \exp\{-\alpha h_{ij}\}\rho_t(L; h_{ij})$. Since $\rho_t(L; h) > 0$ when $h < L$, Assumption (B) holds as long as $h_{ij} < L$ for at least two pairs of points $ij$ and $i'j'$ for which $h_{ij} \ne h_{i'j'}$. For the rectangular grid, this is true for $n \ge 3$.

We now see that the exponential covariance function on an expanding grid

satisfies assumptions (A) and (B), so the maximum tapered likelihood estimate

of θ = (σ2, α)′ is consistent.
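The algebra behind (7) and (8) in Example 2.1.1 can be checked numerically on a small lattice. The sketch below is an illustration under assumed values (a 5 x 5 unit lattice, $\sigma^2 = 2$, $\alpha = 0.2$, $L = 3$); it builds $A_n$, forms $q_{11}$, $q_{12}$, $q_{22}$ as written in the example, and compares the closed-form eigenvalues and determinant with direct computation.

```python
import numpy as np

def wendland_taper(H, L):
    r = H / L
    return np.clip(1.0 - r, 0.0, None) ** 6 * (1.0 + 6.0 * r + 35.0 * r**2 / 3.0)

g = np.arange(5.0)
S = np.array([(x, y) for x in g for y in g])
H = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
sigma2, alpha, L = 2.0, 0.2, 3.0
Gamma, T = np.exp(-alpha * H), wendland_taper(H, L)
n = H.shape[0]

# A_n = (Gamma o T)^{-1} (H o Gamma o T)
A = np.linalg.solve(Gamma * T, H * Gamma * T)
q11 = n / (2 * sigma2**2)
q12 = -np.trace(A) / (2 * sigma2)
q22 = 0.5 * np.trace(A @ A)

# Closed-form eigenvalues (7) and the two forms of the determinant in (8)
eigs = 0.5 * (q11 + q22 + np.array([-1.0, 1.0]) * np.sqrt(4 * q12**2 + (q11 - q22) ** 2))
det_direct = q11 * q22 - q12**2
det_eq8 = n / (4 * sigma2**2) * np.trace(A @ A) - np.trace(A) ** 2 / (4 * sigma2**2)
print(eigs, det_direct, det_eq8)
```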

Although covariance tapering is designed for use on irregularly-spaced datasets,

a regular grid was chosen for Example 2.1.1 to make the analysis tractable. We

demonstrate the use of the tapered likelihood on irregularly-spaced datasets in

simulations in Sections 2.2 and 3.2.

The argument in Example 2.1.1 is trivially repeated for other classes of covariance functions. Notably, the Matérn model with $\nu = 3/2$, with $C(\theta; h) = \sigma^2(1 + \alpha h)\exp\{-\alpha h\}$, $\theta = (\sigma^2, \alpha)'$, as well as the squared exponential model, with $C(\theta; h) = \sigma^2 \exp\{-(\alpha h)^2\}$, $\theta = (\sigma^2, \alpha)'$, are easily accommodated.


2.1.2 Asymptotic Normality

Let us first define some important matrices. Let

\begin{align*}
P_n &= E_0[\dot{\ell}_{t,n}\dot{\ell}_{t,n}'] \\
Q_n &= -E_0[\ddot{\ell}_{t,n}] \\
J_n &= Q_n P_n^{-1} Q_n. \tag{9}
\end{align*}

The matrix $J_n^{-1}$ is familiar from generalized estimating equations, quasi-likelihood, and other areas, and is referred to by various names, including the sandwich matrix, the Godambe information criterion, and the robust information criterion (e.g. Durbin 1960; Bhapkar 1972; Morton 1981; Ferreira 1982; Godambe and Heyde 1987; Heyde 1997). We are now ready to state two useful lemmas.

Lemma 2.1.1. Assume conditions (A1)–(A3). Then $\dot{\ell}_{t,n} \xrightarrow{D} N(0, P_n)$, where $P_n = E_0[\dot{\ell}_{t,n}\dot{\ell}_{t,n}']$.

Proof. See Section S-3 in the supplemental materials.

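Lemma 2.1.2. Assume a twice continuously differentiable covariance function $C(\theta; h)$ and an increasing sequence of domains $\{D_n\}$ such that $(\Sigma_n \circ T_n)^{-1}$ exists. Also, assume (A1)–(A3), along with

(A4) $\sup_n \lambda_{\max}\left\{\frac{\partial^2 \Sigma_n}{\partial \theta_j \partial \theta_k} \circ T_n\right\} < \infty$ for all $\theta \in \Theta$, $j, k = 1, \ldots, p$.

Then $n^{-1}\ddot{\ell}_{t,n} - n^{-1}Q_n \xrightarrow{P_0} 0$, where $Q_n = -E_0[\ddot{\ell}_{t,n}]$.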

Proof. See Section S-3 in the supplemental materials.

Introducing some more notation, a $p \times p$ symmetric positive definite matrix $A$ can be written as $A = ODO'$ with $O$ orthogonal and $D$ diagonal. Define $A^{1/2}$ to be the square root $OD^{1/2}O'$.


Theorem 2.1.2 (Asymptotic Normality). Assume the conditions of Theorem 2.1.1 and Lemmas 2.1.1 and 2.1.2. Then

\[
J_n^{1/2}(\hat{\theta}_{t,n} - \theta_0) \xrightarrow{D} N(0, I),
\]

where $J_n$ is defined as in (9).

Proof. See Section S-3 in the supplemental materials.
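Theorem 2.1.2 suggests Wald-type intervals built from the inverse of $J_n$. The sketch below shows only the matrix arithmetic of (9); the function names are hypothetical, and the inputs `P` and `Q` are assumed to have been evaluated elsewhere (for example at the estimate), which this sketch does not attempt.

```python
import numpy as np

def sandwich_information(P, Q):
    """J = Q P^{-1} Q, as in (9)."""
    return Q @ np.linalg.solve(P, Q)

def wald_intervals(theta_hat, P, Q, z=1.96):
    """Approximate 95% intervals theta_hat +/- z * sqrt(diag(J^{-1}))."""
    J = sandwich_information(P, Q)
    se = np.sqrt(np.diag(np.linalg.inv(J)))
    return np.column_stack([theta_hat - z * se, theta_hat + z * se])

# Toy usage with made-up matrices
P = np.array([[4.0, 1.0], [1.0, 3.0]])
Q = np.array([[5.0, 0.5], [0.5, 2.0]])
print(wald_intervals(np.array([1.0, 0.2]), P, Q))
```

When `P` equals `Q`, as for a correctly specified likelihood, `J` reduces to `Q` and the intervals coincide with the usual Fisher-information intervals.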

Example 2.1.2 Exponential covariance function on a rectangular lattice (II)

We consider the same setting as Example 2.1.1. Since we have already shown

that conditions (A1)–(A3) are satisfied, all we have left is (A4).

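The second partial derivatives are

\begin{align*}
\frac{\partial^2 \Sigma_n}{\partial \sigma^2 \partial \alpha} &= -H_n \circ \Gamma_n \\
\frac{\partial^2 \Sigma_n}{\partial (\sigma^2)^2} &= 0 \\
\frac{\partial^2 \Sigma_n}{\partial \alpha^2} &= \sigma^2 H_n \circ H_n \circ \Gamma_n.
\end{align*}

We showed in Example 2.1.1 that $\sup_n \lambda_{\max}\{H_n \circ \Gamma_n \circ T_n\} < \infty$, and obviously $\sup_n \lambda_{\max}\{0\} < \infty$. So all that is left to do to demonstrate (A4) is show that $\sup_n \lambda_{\max}\{H_n \circ H_n \circ \Gamma_n \circ T_n\} < \infty$.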


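As before, for a taper function $\rho_t(L; h)$ with support on $[0, L)$,

\begin{align*}
\lambda_{\max}\{H_n \circ H_n \circ \Gamma_n \circ T_n\} &\le \|H_n \circ H_n \circ \Gamma_n \circ T_n\|_\infty \\
&= \sup_{1 \le i \le n} \sum_{j=1}^{n} H_{ij}^2 e^{-\alpha H_{ij}} \rho_t(L; H_{ij}) \\
&\le \sup_{1 \le i \le n} \sum_{j=1}^{n} H_{ij}^2 e^{-\alpha H_{ij}} \mathbf{1}\{H_{ij} < L\} \\
&\le \sum_{s \in \Delta\mathbb{Z}^d} \|s\|^2 e^{-\alpha\|s\|} \mathbf{1}\{\|s\| < L\} \\
&= \sum_{s \in \Delta\mathbb{Z}^d \cap \{s : \|s\| < L\}} \|s\|^2 e^{-\alpha\|s\|}. \tag{10}
\end{align*}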

Again, as before, the last line in (10) is the sum of a finite number of bounded summands, and is therefore itself bounded. Thus, for $\sigma^2 < \infty$, we see that $\sup_n \lambda_{\max}\{H_n \circ H_n \circ \Gamma_n \circ T_n\} = (1/\sigma^2)\sup_n \lambda_{\max}\left\{\frac{\partial^2 \Sigma_n}{\partial \alpha^2} \circ T_n\right\} < \infty$. So (A4) is satisfied, and the maximum tapered likelihood estimate of $\theta = (\sigma^2, \alpha)'$ is asymptotically normal with asymptotic covariance $J_n^{-1}$.

We note again that the calculations in Example 2.1.2 are trivially modified to accommodate the Matérn ($\nu = 3/2$) and squared exponential models.

2.2 Simulation Example

To explore the sampling characteristics of the maximum tapered likelihood estimator, we simulated datasets with exponential and Matérn ($\nu = 3/2$) covariance functions, a setup following Examples 2.1.1 and 2.1.2. Data locations are sited irregularly on a unit grid perturbed with iid uniform(-1/3, 1/3) deviations, closely following Stein et al. (2004) and Kaufman et al. (2008).

Each dataset consisted of 1000 random samples drawn from a $N(0, \Sigma)$ distribution, where $\Sigma_{ij} = \sigma^2 \exp\{-\alpha h_{ij}\}$ in the exponential case and $\Sigma_{ij} = \sigma^2(1 + \alpha h_{ij})\exp\{-\alpha h_{ij}\}$ in the Matérn ($\nu = 3/2$) case. $\sigma^2$ was set at 1, and the $\alpha$ parameters were set at 0.2 and 0.316, respectively, for the exponential and Matérn ($\nu = 3/2$) models, giving an effective range of 15 for all simulations. We used the re-parametrization $\theta = (\sigma^2, c)'$, where $c = \sigma^2\alpha^{2\nu}$, because this parametrization is more easily identified by the data (Zhang 2004; Kaufman 2006). For the taper function, we chose (2) from the class of compactly-supported polynomial correlation functions introduced by Wendland (1995) and first used for tapering by Furrer et al. (2006).
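The stated parameter values can be checked against the definition of effective range (the distance at which the correlation falls to about 0.05). The short sketch below uses hypothetical helper names and the closed forms of the two correlation functions; it also evaluates the re-parametrized quantity $c = \sigma^2\alpha^{2\nu}$ for each model.

```python
import numpy as np

def exp_corr(h, alpha):
    """Exponential correlation (Matern with nu = 1/2)."""
    return np.exp(-alpha * h)

def matern32_corr(h, alpha):
    """Matern correlation with nu = 3/2."""
    return (1.0 + alpha * h) * np.exp(-alpha * h)

def to_c(sigma2, alpha, nu):
    """Re-parametrization c = sigma^2 * alpha^(2 nu)."""
    return sigma2 * alpha ** (2.0 * nu)

# Correlation at distance 15 for the two simulation settings
print(exp_corr(15.0, 0.2), matern32_corr(15.0, 0.316))
# Corresponding values of c with sigma^2 = 1
print(to_c(1.0, 0.2, 0.5), to_c(1.0, 0.316, 1.5))
```

Both correlations at distance 15 come out close to 0.05, consistent with the effective range quoted for the simulations.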

Sample points were located on the perturbed square grid $(1, \ldots, \sqrt{n}) \times (1, \ldots, \sqrt{n})$, with $n$ = 100, 400, 1600, and 2500. All computations were carried out on a 2.3 GHz Linux machine using the R package spam for handling sparse matrices.
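A rough Python analogue of the simulation design is sketched below. It is not the authors' R code: the function names and the random seed are assumptions, dense linear algebra is used in place of the spam package, and only the exponential model is shown.

```python
import numpy as np

def perturbed_grid(n, rng, jitter=1.0 / 3.0):
    """Unit grid (1..sqrt(n)) x (1..sqrt(n)) with iid uniform(-jitter, jitter) shifts."""
    m = int(round(np.sqrt(n)))
    g = np.arange(1, m + 1, dtype=float)
    S = np.array([(x, y) for x in g for y in g])
    return S + rng.uniform(-jitter, jitter, size=S.shape)

def simulate_field(S, sigma2, alpha, rng):
    """One draw from N(0, Sigma) with exponential covariance at sites S."""
    H = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
    Sigma = sigma2 * np.exp(-alpha * H)
    return np.linalg.cholesky(Sigma) @ rng.standard_normal(len(S))

rng = np.random.default_rng(0)
S = perturbed_grid(100, rng)
Z = simulate_field(S, sigma2=1.0, alpha=0.2, rng=rng)
print(S.shape, Z.shape)
```

Because each coordinate moves by less than 1/3, neighboring sites stay at least 1/3 apart, so the covariance matrix remains well conditioned enough for the Cholesky factorization.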

[Figure 1 about here.]

The covariance function and tapered covariance functions are shown in Fig-

ure 1 for the three different taper ranges used in this simulation experiment.

The severity of the taper at L = 3 is evident. The times required to compute

a single evaluation of the tapered and full log likelihood functions are shown in

Figure S-1 in the supplemental materials.

For each dataset, the maximum tapered likelihood estimate of $\theta = (\sigma^2, c)'$ was calculated using three different taper ranges, $L$ = 15, 5, and 3. In addition, confidence intervals for $\theta$ were computed from $J(\hat{\theta})$ for each $\hat{\theta}$. $L = 15$ was chosen because it is roughly equal to the “effective range,” the distance at which correlations drop below 0.05, of the process. $L = 5$ is equal to $1/\alpha$, the range

parameter of the exponential covariance function in its usual parametrization.

Finally, L = 3 was chosen because it represents an extreme case of being close

to the smallest value for which each sample point is guaranteed to have its eight

immediate neighbors contained within the taper range.

Finally, for each combination of $L$ and $n$, the sample covariance matrix was calculated for the 1000 estimates $\hat{\theta}_t$ and compared to the matrix $J^{-1}$, the asymptotic covariance matrix, calculated using (S-9) and (S-11) in Section S-1 in the supplemental materials.

It is interesting to note that the domain corresponding to the largest sample

size considered here is a square grid 50 units on a side. The domain then is just a few times larger than the 15-unit effective range of the simulated process. This

configuration is important to keep in mind when considering the applicability of

expanding domain asymptotics to parameter estimation with data from small

or moderately-sized domains.

The resulting empirical density estimates from the simulations are shown in

Figures 2, 3, 4, and 5. In all of these figures, as expected, we see the empirical

densities become more symmetric and more sharply peaked as the sample size

increases. Comparing across taper ranges, empirical density plots for $\hat{\sigma}^2_t$ at the three taper ranges look very similar for both covariance models. The densities of $\hat{c}_t$ show small differences for the exponential model and marked differences for the Matérn ($\nu = 3/2$) model, but they are not as large as one might expect when tapering so severely. Kaufman et al. (2008) noted the similarities between the empirical densities computed from maximum likelihood estimates and maximum tapered likelihood estimates, when applied to the exponential model.

[Figure 2 about here.]

[Figure 3 about here.]

[Figure 4 about here.]

[Figure 5 about here.]

Coverage rates for nominal 95% confidence intervals are shown in Tables 1

and 2. The normal approximation for the c parameter is excellent (see QQ-

plots, Figures S-3 and S-5 in the Supplementary Materials), and the coverage

rates are spot-on for both covariance models. Despite notable departures from


normality for the σ2 estimates (see Figures S-2 and S-4 in the Supplementary

Materials), coverage rates are quite reasonable for moderately-sized datasets

and excellent for large datasets.

[Table 1 about here.]

[Table 2 about here.]

For each sample size and taper range, tables comparing sample covariance

matrices of the 1000 estimates of θ to their corresponding asymptotic calcula-

tions using the sandwich matrix (9) can be found in Section S-5. Importantly,

Tables S-1 and S-2 show that asymptotic calculations of sampling variability

become accurate at moderate sample sizes.

3 The Tapered Quasi-Bayesian Estimator

Often one might prefer Bayesian estimation over maximum likelihood-type es-

timation for covariance parameters of Gaussian random fields for all the usual

reasons: the ability to incorporate prior knowledge, the natural inclusion of pa-

rameter shrinkage, the straightforward extension to larger hierarchical models,

and so on.

Here we investigate the properties of tapered quasi-Bayesian estimators, which are analogous to Bayesian estimators, except that the likelihood is replaced by the tapered likelihood. Specifically, we show that tapered quasi-Bayesian

estimators are consistent in that the quasi posterior, defined below in (11),

converges to a point mass at the true parameter θ0. We also show that the

quasi-posterior is asymptotically normal, and hence that samples from the

quasi-posterior (generated by MCMC, e.g.) can be used to construct consistent

confidence intervals for θ.

Following Chernozhukov and Hong (2003), we define the tapered quasi-posterior distribution as

\[
\pi_{t,n}(\theta | Z_n) = \frac{L_{t,n}(\theta; Z_n)\pi(\theta)}{\int_{\Theta} L_{t,n}(\theta; Z_n)\pi(\theta)\,d\theta}, \tag{11}
\]

where $L_{t,n}(\theta; Z_n) = \exp\{\ell_{t,n}(\theta; Z_n)\}$, and $\pi(\theta)$ is a prior density on $\theta$. We will assume, for convenience, that $\pi(\theta)$ is proper. Recall that $L_{t,n}$ is not a density, and thus $\pi_{t,n}(\theta | Z_n)$ is not a true posterior. We are guaranteed, however, that as long as the prior $\pi(\theta)$ is proper, $\pi_{t,n}(\theta | Z_n)$ will be a proper density (Kaufman 2006).
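Since the normalizing integral in (11) is rarely available in closed form, the quasi-posterior is typically explored by MCMC using only the unnormalized log density. The sketch below is a generic random-walk Metropolis sampler, not the sampler used by the authors; the function names, step size, and the toy Gaussian target standing in for a real tapered log likelihood are all assumptions made for illustration.

```python
import numpy as np

def log_quasi_posterior(theta, tapered_loglik, log_prior):
    """Unnormalized log of (11): tapered log likelihood plus log prior."""
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf               # outside the prior support
    return tapered_loglik(theta) + lp

def rw_metropolis(log_target, theta0, n_iter, step, rng):
    """Random-walk Metropolis with Gaussian proposals of standard deviation `step`."""
    theta = np.asarray(theta0, dtype=float)
    lp = log_target(theta)
    draws = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept with prob min(1, ratio)
            theta, lp = prop, lp_prop
        draws[i] = theta
    return draws

# Toy usage: a standard normal stand-in for the tapered log likelihood, flat prior on a box
toy_loglik = lambda th: -0.5 * np.sum(th**2)
box_prior = lambda th: 0.0 if np.all(np.abs(th) < 10.0) else -np.inf
target = lambda th: log_quasi_posterior(th, toy_loglik, box_prior)
draws = rw_metropolis(target, [0.0, 0.0], 2000, 1.0, np.random.default_rng(0))
print(draws.mean(axis=0))
```

In an application, `toy_loglik` would be replaced by a function evaluating (3) at the observed data.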

In Section 3.1, we closely follow Chernozhukov and Hong (2003), who study

quasi-Bayesian estimation in a more general context. However, while their for-

mulation is the same, their outlook is quite different. They view quasi-Bayesian

estimation as a tool to enable the use of MCMC to maximize objective functions

that are not differentiable or are otherwise poorly behaved. That is, they view it

as a way to use Bayesian computational machinery to compute frequentist esti-

mators. Although we study frequentist properties of quasi-Bayesian estimators

here, we consider the quasi-Bayesian framework as a computationally tractable

alternative to exact Bayesian methods.

3.1 Asymptotic Behavior of Tapered Quasi-Bayesian

Estimators

The results in this section are applications of theorems in Chernozhukov and

Hong (2003), which we re-state in the language of tapering. The main difficulty

in applying their theory to the case of the tapered likelihood is showing that

their Assumptions 1–4 are satisfied. In particular, Chernozhukov and Hong

(2003) assume consistency of the extremum estimator and asymptotic normality

of the first derivative of the objective function, which we have shown in Theorem

2.1.1 and Lemma 2.1.2. The remaining assumptions of Chernozhukov and Hong

(2003) follow easily from the conditions for Theorem 2.1.1 and Lemmas 2.1.1


and 2.1.2, so we will assume that they hold throughout this section.

3.1.1 Convergence of the Tapered Quasi-Posterior

First, we define the total variation of moments norm of a real-valued measurable

function f as $\|f\|_{TVM(\omega)} \equiv \int (1 + \|\delta\|^{\omega})\,|f(\delta)|\,d\delta$. Note that the special case of

ω = 0 is the usual total variation norm.

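We now define the parameter δ, the scaled deviation from θ0, centered at the tapered score $\dot{\ell}_{t,n}(\theta_0)$ (scaled by $\ddot{\ell}_{t,n}(\theta_0)$), as

$$\delta = \sqrt{n}\,(\theta - \theta_0) - \sqrt{n}\,\ddot{\ell}_{t,n}(\theta_0)^{-1}\,\dot{\ell}_{t,n}(\theta_0).$$

The tapered quasi-posterior density of δ is then

$$\pi^{*}_{t,n}(\delta \mid Z_n) = \frac{1}{\sqrt{n}}\,\pi_{t,n}\!\left(\frac{\delta}{\sqrt{n}} + \theta_0 + \ddot{\ell}_{t,n}(\theta_0)^{-1}\,\dot{\ell}_{t,n}(\theta_0)\right).$$

We are now ready to state a consistency result about the tapered quasi-posterior distribution.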

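Theorem 3.1.1 (Theorem 1 of C-H). For any 0 ≤ ω < ∞,

$$\left\| \pi^{*}_{t,n}(\delta \mid Z_n) - \pi^{*}_{t,\infty}(\delta \mid Z_n) \right\|_{TVM(\omega)} \xrightarrow{P} 0,$$

where

$$\pi^{*}_{t,\infty}(\delta \mid Z_n) = \left( \frac{|Q_n/n|}{(2\pi)^{p}} \right)^{1/2} \exp\left\{ -\frac{1}{2}\,\delta'(Q_n/n)\,\delta \right\}.$$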

From Theorem 3.1.1 we see that πt,n(θ|Zn) concentrates its mass at θ0 at

a rate of 1/√n, as measured by the total variation of moments norm. Then,

asymptotically, πt,n(θ|Zn) approximates a normal random variable with mean $\theta_0 + Q_n^{-1}\dot{\ell}_{t,n}(\theta_0)$ and covariance matrix $Q_n^{-1}$.

3.1.2 Tapered Quasi-Bayesian Point Estimates

We will construct tapered quasi-Bayesian point estimates in a manner analogous

to the construction of proper Bayes estimators. Let the scalar function κn(u) be

a loss function. For simplicity, we will only consider symmetric loss functions,

with κn(u) = κn(−u), although this restriction is not necessary. Common

symmetric loss functions include the quadratic and absolute loss functions.


We can now define the tapered quasi-posterior risk function as the expected loss (with respect to the quasi-posterior),

$$R_n(\theta) = \int_{\Theta} \kappa_n(\theta - \theta^{*})\,\pi_{t,n}(\theta^{*} \mid Z_n)\,d\theta^{*}.$$

Then for a given choice of loss function κn(u), the tapered quasi-Bayes estimator is the value of θ that minimizes the tapered quasi-posterior risk, $\theta_{QB} = \arg\min_{\theta \in \Theta} R_n(\theta)$. As usual, quadratic and absolute loss functions will lead, respectively, to the tapered quasi-posterior mean and median as the quasi-Bayes estimators.
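The correspondence between loss functions and point estimates can be checked numerically. The sketch below is only an illustration for a scalar parameter: it replaces the integral in the risk by an average over draws, and the skewed gamma sample is a hypothetical stand-in for MCMC output from the quasi-posterior. The helper name `quasi_bayes_estimate` and the grid search are choices made for this example.

```python
import numpy as np

def quasi_bayes_estimate(draws, loss, grid):
    """Minimize the Monte Carlo estimate of the quasi-posterior risk over a grid.

    The risk at a candidate value t is the average of loss(t - draw) over the draws.
    """
    risk = np.array([np.mean(loss(t - draws)) for t in grid])
    return grid[np.argmin(risk)]

rng = np.random.default_rng(1)
# Hypothetical stand-in for quasi-posterior draws of a scalar parameter (skewed).
draws = rng.gamma(shape=3.0, scale=0.5, size=5000)
grid = np.linspace(draws.min(), draws.max(), 2001)

est_quadratic = quasi_bayes_estimate(draws, np.square, grid)  # quadratic loss
est_absolute = quasi_bayes_estimate(draws, np.abs, grid)      # absolute loss

print(est_quadratic, draws.mean())      # agree up to the grid resolution
print(est_absolute, np.median(draws))   # agree up to the grid resolution
```

In practice one would simply take the sample mean or median of the draws; the explicit minimization is shown only to make the definition of the estimator concrete.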

Theorem 3.1.2 (Theorem 2 of C-H). For a symmetric loss function κn(u),

$$J_n^{1/2}(\theta_{QB} - \theta_0) \xrightarrow{D} N(0, I).$$

What we see from Theorem 3.1.2 is that tapered quasi-Bayes estimators

such as the quasi-posterior mean and median are asymptotically normal with

covariance given by the inverse of the sandwich matrix Jn, the same asymptotic distribution

as the maximum tapered likelihood estimator.

3.1.3 Tapered Quasi-Bayesian Confidence Regions

We now turn from the question of constructing point estimates to the question

of constructing confidence regions from the tapered quasi-posterior. Here we

continue to use a frequentist vocabulary to derive frequentist properties, even

though we are studying quasi-Bayesian inference.

In constructing intervals, what we would like, from a practical point of view,

is to directly use the empirical quantiles of a sample (generated from MCMC)

from the tapered quasi-posterior as our confidence region. What we will see,

however, is that this approach does not yield asymptotically valid confidence

intervals. The question of how good such regions are is investigated in Section

3.2.1.

Although the quantiles of πt,n(θ|Zn) do not converge to the quantiles of the

limiting normal distribution of θQB, we can still use a quasi-posterior sample to


construct intervals that are consistent using the delta method.

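Theorem 3.1.3 (Theorem 4 of C-H). Let g(θ) be some scalar function of θ, and let $\hat{J}_n$ be a consistent estimator for $J_n$. That is, $J_n^{-1}\hat{J}_n \xrightarrow{P} I$. Define

$$c_{t,g,n}(\alpha) = g(\theta_{QB}) + z_{\alpha}\sqrt{\nabla g(\theta_{QB})'\,\hat{J}_n^{-1}\,\nabla g(\theta_{QB})},$$

where $z_{\alpha}$ is the αth quantile of the standard normal distribution. Then

$$\lim_{n \to \infty} P\{c_{t,g,n}(\alpha/2) \le g(\theta_0) \le c_{t,g,n}(1 - \alpha/2)\} = 1 - \alpha.$$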

This method, of course, requires one to somehow calculate $J_n^{-1}$, which itself

requires Pn and Qn. Assuming a sample from πt,n(θ|Zn), generated by MCMC,

is available, a simple way of estimating $Q_n^{-1}$ that immediately presents itself is

to compute the sample covariance matrix of the chain. Another possibility is

to plug θQB into (S-9) and (S-11) to estimate Pn and Qn. We compare these

methods in Section 3.2.1.
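A minimal sketch of the interval in Theorem 3.1.3 follows, assuming that an estimate of the inverse sandwich matrix is already in hand. The function name `delta_method_interval`, the numerical values of `theta_qb` and `J_inv_hat`, and the choice g(θ) = c/σ2 (which equals α under the parameterization c = σ2α of the exponential model) are all hypothetical and serve only to make the formula concrete.

```python
import numpy as np
from statistics import NormalDist

def delta_method_interval(g, grad_g, theta_qb, J_inv_hat, alpha=0.05):
    """Interval [c(alpha/2), c(1 - alpha/2)] with
    c(a) = g(theta_qb) + z_a * sqrt(grad' J_inv_hat grad)."""
    z_lo = NormalDist().inv_cdf(alpha / 2)
    z_hi = NormalDist().inv_cdf(1 - alpha / 2)
    grad = np.asarray(grad_g(theta_qb), dtype=float)
    se = np.sqrt(grad @ J_inv_hat @ grad)
    center = g(theta_qb)
    return center + z_lo * se, center + z_hi * se

# Hypothetical inputs: theta = (sigma2, c) and an estimated inverse sandwich matrix.
theta_qb = np.array([1.0, 0.2])
J_inv_hat = np.array([[0.04, 0.002],
                      [0.002, 0.0004]])

def g(theta):
    return theta[1] / theta[0]            # c / sigma2

def grad_g(theta):
    return np.array([-theta[1] / theta[0] ** 2, 1.0 / theta[0]])

lower, upper = delta_method_interval(g, grad_g, theta_qb, J_inv_hat)
print(lower, upper)   # roughly 0.132 and 0.268 for these inputs
```

Taking g to be a coordinate projection, with gradient equal to a unit vector, gives the usual interval for a single parameter.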

3.2 Simulation Example

Simulations to explore the sampling properties of tapered quasi-Bayesian es-

timators were set up in exactly the same way as in Section 2.2, with 1000

samples drawn from a N(0, Σ), where Σ comes from an exponential model and a Matern (ν = 3/2) model, with σ² = 1, and c = σ²α = 0.2 for the exponential and c = σ²α³ ≈ 0.03 for the Matern (ν = 3/2), giving effective ranges of around 15. We

again used the taper function (2) from Wendland (1995). Sample points were

again located on an expanding perturbed square grid, with n = 100, 400, 1600,

and 2500. Both parameters were assigned Cauchy prior distributions with scale

parameter 10, truncated to have non-negative support. This prior is proper but

extremely weakly informative.
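The effective ranges quoted for these parameter values can be checked with a short calculation. The sketch below rests on two assumptions that are not spelled out in this section: that the correlation functions are exp(−αh) for the exponential model and (1 + αh) exp(−αh) for the Matern (ν = 3/2) model, and that the effective range is the distance at which the correlation falls to 0.05. The helper `effective_range` is a hypothetical name.

```python
import math

def effective_range(corr, lo=1e-9, hi=1e3, target=0.05, tol=1e-10):
    """Bisection for the distance h at which corr(h) equals target.

    Assumes corr decreases monotonically from 1 toward 0 on [lo, hi].
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if corr(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sigma2 = 1.0
alpha_exp = 0.2 / sigma2                    # from c = sigma2 * alpha = 0.2
alpha_mat = (0.03 / sigma2) ** (1.0 / 3.0)  # from c = sigma2 * alpha**3 = 0.03

range_exp = effective_range(lambda h: math.exp(-alpha_exp * h))
range_mat = effective_range(lambda h: (1 + alpha_mat * h) * math.exp(-alpha_mat * h))

print(round(range_exp, 2), round(range_mat, 2))  # both close to 15
```

Under these assumptions both models give an effective range close to 15, in line with the value stated above.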

Because of the computational infeasibility of conducting 1000 long MCMC


runs for each model, sample size, and taper range, MCMC was avoided by

defining the estimators θQB as quasi-posterior modes. This enabled θQB to

be computed much faster using numerical optimization routines on the un-

normalized quasi-posteriors.

Empirical density estimates of the resulting estimates are shown in Figures

6, 7, 8, and 9. Not surprisingly, these plots are almost indistinguishable from

those in Section 2.2, and show the same trend of collecting mass at the true

parameter as the sample size increases, while, in the case of σ2 for both mod-

els and c for the exponential model, not dissipating much as the taper range

decreases.

[Figure 6 about here.]

[Figure 7 about here.]

[Figure 8 about here.]

[Figure 9 about here.]

Tables S-3 and S-4 in the supplemental materials also tell a similar story,

that for the models considered here, decreases in statistical efficiency due to

tapering are acceptable when factored against the increases in computational

efficiency, and that the asymptotic variances based on the sandwich matrix

provide good estimates of sampling variability, even at modest sample sizes.

3.2.1 Comparing confidence intervals

An additional simulation was conducted to test the accuracy of confidence in-

tervals constructed from MCMC samples, as described in Section 3.1.3. 1000

datasets were constructed on a perturbed grid, with n = 1600, from the exponential and Matern (ν = 3/2) covariance models. In both cases, the range parameter

c was chosen to give an effective range of 4.75 for the associated process. For

each dataset, 95% confidence intervals were constructed three ways: as the


0.025 and 0.975 quantiles of the MCMC sample, using Theorem 3.1.3 from Sec-

tion 3.1.3 with Q estimated by plugging θQB into (S-9), and using Theorem

3.1.3 with Q estimated as the inverse of the sample covariance of the MCMC

sample. We refer to these three methods, respectively, as the naive, asymptotic,

and MCMC estimators.
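The difference between the naive and MCMC constructions can be sketched as follows. Everything numerical here is hypothetical: the chain is simulated from a normal distribution as a stand-in for MCMC output, `P_hat` is an invented estimate of Pn, and the sandwich is assumed to take the usual form $J_n^{-1} = Q_n^{-1} P_n Q_n^{-1}$. The values were chosen so that the quasi-posterior spread is too small for the first parameter and too large for the second.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)
theta_qb = np.array([1.0, 0.2])                # hypothetical point estimate (sigma2, c)
Q_inv_true = np.array([[0.010, 0.001],
                       [0.001, 0.004]])        # hypothetical quasi-posterior covariance
P_hat = np.array([[400.0, -50.0],
                  [-50.0, 100.0]])             # hypothetical estimate of P_n

# Stand-in for an MCMC sample from the tapered quasi-posterior.
chain = rng.multivariate_normal(theta_qb, Q_inv_true, size=20000)

# Naive intervals: empirical 0.025 and 0.975 quantiles of the chain.
naive = np.quantile(chain, [0.025, 0.975], axis=0)

# MCMC intervals: estimate Q^{-1} by the chain covariance, then form the sandwich.
Q_inv_hat = np.cov(chain, rowvar=False)
J_inv_hat = Q_inv_hat @ P_hat @ Q_inv_hat
z = NormalDist().inv_cdf(0.975)
half_width = z * np.sqrt(np.diag(J_inv_hat))
mcmc = np.vstack([theta_qb - half_width, theta_qb + half_width])

naive_width = naive[1] - naive[0]
mcmc_width = mcmc[1] - mcmc[0]
print(naive_width, mcmc_width)
```

With these inputs the naive interval is narrower than the sandwich-based interval for the first parameter and wider for the second, which is the kind of mismatch that leads to under- and over-coverage when quasi-posterior quantiles are used directly.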

[Table 3 about here.]

From Table 3, we see that the naive intervals substantially under-cover σ2

while catastrophically over-covering c for both covariance models. The asymp-

totic and MCMC intervals both do extremely well, covering at almost exactly

their nominal 95% rate. The MCMC method is easier to compute, which rec-

ommends it slightly over the asymptotic method.

4 Discussion

Covariance tapering provides a way to estimate parameters of stationary Gaus-

sian random fields from very large datasets. The method provides huge gains

in computational efficiency by paying a relatively small price in statistical efficiency, and extends the tractable size of problems manyfold relative to standard

likelihood estimation tools. We have emphasized computational efficiencies in-

duced by using sparse matrix computations, but the sparse representation also

allows for large improvements in storage efficiency, enabling one to work with

datasets for which dense covariance matrices cannot even be stored in memory.

We showed that asymptotic theory for tapering provides a way to construct

sensible confidence intervals. Furthermore, we provided a formal framework for

incorporating tapered likelihoods into an approximate Bayesian inferential and

computational engine.

The theoretical results presented here are only possible in the expanding

domain asymptotic regime, which others have argued is less appropriate than


the infill regime (Zhang 2004; Zhang and Zimmerman 2005). In the context of

parameter estimation, we have suggested the converse in Section 1, and demon-

strated, through simulation, the accuracy of calculations based on expanding

domain asymptotics in a scenario where the domain is not much larger than

the range of the process. In practice, we suspect that it would not be difficult

to find datasets for which the characteristics of parameter estimation are better

approximated by one or the other type of asymptotics.

5 Supplemental Materials

Supplemental Materials: Proofs, detailed derivations, examples, and addi-

tional data analyses (.pdf file)

References

Banerjee, S., Gelfand, A. E., Finley, A. O., and Sang, H. (2008), “Gaussian

Predictive Process Models for Large Spatial Data Sets,” J. R. Stat. Soc. Ser.

B Stat. Methodol., 70, 825–848.

Berger, J. O. (2000), “Bayesian analysis: a look at today and thoughts of

tomorrow,” J. Amer. Statist. Assoc., 95, 1269–1276.

Bhapkar, V. P. (1972), “On a measure of efficiency of an estimating equation,”

Sankhya Ser. A, 34, 467–472.

Chernozhukov, V. and Hong, H. (2003), “An MCMC approach to classical

estimation,” J. Econometrics, 115, 293–346.

Cressie, N. and Johannesson, G. (2008), “Fixed rank kriging for very large

spatial data sets,” J. Roy. Statist. Soc. Ser. B, 70, 209–226.


Cressie, N. A. C. (1991), Statistics for spatial data, Wiley Series in Probability

and Mathematical Statistics: Applied Probability and Statistics, New York:

John Wiley & Sons Inc., a Wiley-Interscience Publication.

Curriero, F. C. and Lele, S. (1999), “A composite likelihood approach to semi-

variogram estimation,” J. Agric. Biol. Environ. Stat., 4, 9–28.

Du, J., Zhang, H., and Mandrekar, V. (2009), “Fixed-domain asymptotic prop-

erties of tapered maximum likelihood estimators,” The Annals of Statistics,

37, 3330–3361.

Durbin, J. (1960), “Estimation of parameters in time-series regression models,”

J. Roy. Statist. Soc. Ser. B, 22, 139–153.

Ferreira, P. E. (1982), “Multiparametric estimating equations,” Ann. Inst.

Statist. Math., 34, 423–431.

Finley, A., Sang, H., Banerjee, S., and Gelfand, A. (2009), “Improving the per-

formance of predictive process modeling for large datasets,” Computational

statistics & data analysis, 53, 2873–2884.

Fuentes, M. (2007), “Approximate likelihood for large irregularly spaced spatial

data,” J. Amer. Statist. Assoc., 102, 321–331.

Furrer, R. and Bengtsson, T. (2007), “Estimation of high-dimensional prior

and posterior covariance matrices in Kalman filter variants,” J. Multivariate

Anal., 98, 227–255.

Furrer, R., Genton, M. G., and Nychka, D. (2006), “Covariance tapering for

interpolation of large spatial datasets,” J. Comput. Graph. Statist., 15, 502–

523.

Furrer, R. and Sain, S. (2009), “Spatial model fitting for large datasets with

applications to climate and microarray problems,” Statistics and Computing,

19, 113–128.


Godambe, V. P. and Heyde, C. C. (1987), “Quasi-likelihood and optimal esti-

mation,” Internat. Statist. Rev., 55, 231–244.

Heagerty, P. J. and Lele, S. R. (1998), “A composite likelihood approach to

binary spatial data,” J. Amer. Statist. Assoc., 93, 1099–1111.

Heyde, C. C. (1997), Quasi-likelihood and its application, Springer Series in

Statistics, New York: Springer-Verlag, a general approach to optimal pa-

rameter estimation.

Horn, R. A. and Johnson, C. R. (1991), Topics in matrix analysis, Cambridge:

Cambridge University Press.

Kauermann, G. and Carroll, R. J. (2001), “A note on the efficiency of sandwich

covariance matrix estimation,” J. Amer. Statist. Assoc., 96, 1387–1396.

Kaufman, C. (2006), “Covariance Tapering for Likelihood Based Estimation

in Large Spatial Datasets,” Ph.D. thesis, Carnegie Mellon University, Pitts-

burgh, Pennsylvania 15213.

Kaufman, C., Schervish, M., and Nychka, D. (2008), “Covariance Tapering for

Likelihood-Based Estimation in Large Spatial Datasets,” J. Amer. Statist.

Assoc., 103, 1545–1569.

Mardia, K. V. and Marshall, R. J. (1984), “Maximum likelihood estimation of

models for residual covariance in spatial regression,” Biometrika, 71, 135–

146.

Matsuda, Y. and Yajima, Y. (2009), “Fourier analysis of irregularly spaced

data on R^d,” J. Roy. Statist. Soc. Ser. B, 71, 191–217.

Morton, R. (1981), “Efficiency of estimating equations and the use of pivots,”

Biometrika, 68, 227–233.


Stein, M. L. (1999), Interpolation of spatial data, Springer Series in Statistics,

New York: Springer-Verlag, some theory for Kriging.

Stein, M. L., Chi, Z., and Welty, L. J. (2004), “Approximating likelihoods for

large spatial data sets,” J. R. Stat. Soc. Ser. B Stat. Methodol., 66, 275–296.

Vecchia, A. V. (1988), “Estimation and model identification for continuous

spatial processes,” J. Roy. Statist. Soc. Ser. B, 50, 297–312.

Wendland, H. (1995), “Piecewise polynomial, positive definite and compactly

supported radial functions of minimal degree,” Adv. Comput. Math., 4, 389–

396.

— (1998), “Error estimates for interpolation by compactly supported radial

basis functions of minimal degree,” J. Approx. Theory, 93, 258–272.

Whittle, P. (1954), “On stationary processes in the plane,” Biometrika, 41,

434–449.

Zhang, H. (2004), “Inconsistent estimation and asymptotically equal interpo-

lations in model-based geostatistics,” J. Amer. Statist. Assoc., 99, 250–261.

Zhang, H. and Du, J. (2008), “Covariance tapering in spatial statistics,” in

Positive definite functions: From Schoenberg to space-time challenges, eds.

Mateu, J. and Porcu, E., Graficas Castan, s.l.

Zhang, H. and Zimmerman, D. L. (2005), “Towards reconciling two asymptotic

frameworks in spatial statistics,” Biometrika, 92, 921–936.

Zimmerman, D. L. (1989), “Computationally exploitable structure of covari-

ance matrices and generalized covariance matrices in spatial models,” J.

Statist. Comput. Simulation, 32, 1–15.


Table 1: Coverage rates for nominal 95% confidence intervals, constructed using the sandwich formula, for the maximum tapered likelihood estimates of the σ² parameter of exponential and Matern (ν = 3/2) covariance models.

Model              L     n = 100   n = 400   n = 1600   n = 2500
Exponential        3     0.81      0.85      0.92       0.91
Exponential        5     0.76      0.87      0.92       0.94
Exponential        15    0.79      0.85      0.89       0.94
Matern (ν = 3/2)   3     0.78      0.88      0.92       0.93
Matern (ν = 3/2)   5     0.79      0.86      0.92       0.92
Matern (ν = 3/2)   15    0.81      0.89      0.92       0.92

Table 2: Coverage rates for nominal 95% confidence intervals, constructed using the sandwich formula, for the maximum tapered likelihood estimates of the c parameter of exponential and Matern (ν = 3/2) covariance models.

Model              L     n = 100   n = 400   n = 1600   n = 2500
Exponential        3     0.94      0.94      0.95       0.94
Exponential        5     0.95      0.94      0.96       0.96
Exponential        15    0.95      0.94      0.95       0.94
Matern (ν = 3/2)   3     0.94      0.94      0.96       0.94
Matern (ν = 3/2)   5     0.95      0.95      0.95       0.96
Matern (ν = 3/2)   15    0.95      0.95      0.93       0.95


Table 3: Coverage rates based on draws from the quasi-posterior for nominal 95% credible intervals for the exponential model, c = σ²α = 1.58, and the Matern (ν = 3/2) model, c = σ²α³ = 1 (both with effective range ≈ 4.75).

Model              type         σ²      c
Exponential        naive        0.697   1.000
Exponential        asymptotic   0.946   0.947
Exponential        MCMC         0.950   0.956
Matern (ν = 3/2)   naive        0.623   1.000
Matern (ν = 3/2)   asymptotic   0.951   0.948
Matern (ν = 3/2)   MCMC         0.954   0.952


List of Figures

1 Covariance and tapered covariance functions for L = 3, 5, and 15.

2 Kernel density estimates for σ²_t for the exponential model.

3 Kernel density estimates for c_t for the exponential model.

4 Kernel density estimates for σ²_t for the Matern (ν = 3/2) model.

5 Kernel density estimates for c_t for the Matern (ν = 3/2) model.

6 Kernel density estimates for σ²_QB for the exponential model.

7 Kernel density estimates for c_QB for the exponential model.

8 Kernel density estimates for σ²_QB for the Matern (ν = 3/2) model.

9 Kernel density estimates for c_QB for the Matern (ν = 3/2) model.


[Figure 1 plot, titled "Tapered covariance functions". Horizontal axis: distance h, from 0 to 20. Vertical axis: from 0 to 1. Curves: exponential cov. fn., tapered exponential cov. fn., Matern (ν = 3/2) cov. fn., tapered Matern (ν = 3/2) cov. fn.]

Figure 1: Covariance and tapered covariance functions for L = 3, 5, and 15.

[Figure 2 plot: grid of panels with rows L = 3, 5, 15 and columns n = 100, 400, 1600, 2500.]

Figure 2: Kernel density estimates for σ²_t for the exponential model.


[Figure 3 plot: grid of panels with rows L = 3, 5, 15 and columns n = 100, 400, 1600, 2500.]

Figure 3: Kernel density estimates for c_t for the exponential model.

[Figure 4 plot: grid of panels with rows L = 3, 5, 15 and columns n = 100, 400, 1600, 2500.]

Figure 4: Kernel density estimates for σ²_t for the Matern (ν = 3/2) model.


[Figure 5 plot: grid of panels with rows L = 3, 5, 15 and columns n = 100, 400, 1600, 2500.]

Figure 5: Kernel density estimates for c_t for the Matern (ν = 3/2) model.

[Figure 6 plot: grid of panels with rows L = 3, 5, 15 and columns n = 100, 400, 1600, 2500.]

Figure 6: Kernel density estimates for σ²_QB for the exponential model.


[Figure 7 plot: grid of panels with rows L = 3, 5, 15 and columns n = 100, 400, 1600, 2500.]

Figure 7: Kernel density estimates for c_QB for the exponential model.

[Figure 8 plot: grid of panels with rows L = 3, 5, 15 and columns n = 100, 400, 1600, 2500.]

Figure 8: Kernel density estimates for σ²_QB for the Matern (ν = 3/2) model.


[Figure 9 plot: grid of panels with rows L = 3, 5, 15 and columns n = 100, 400, 1600, 2500.]

Figure 9: Kernel density estimates for c_QB for the Matern (ν = 3/2) model.
