Tapered Covariance: Bayesian Estimation and
Asymptotics
Benjamin Shaby
SAMSI
Research Triangle Park, NC 27709
David Ruppert
OR&IE
Cornell University
Ithaca, NY 14850
November 15, 2010
Author’s Footnote:
Benjamin Shaby is Postdoctoral Fellow, SAMSI, Research Triangle Park, NC 27709
(E-mail: [email protected]). David Ruppert is Andrew Schultz, Jr. Professor of
Engineering, School of Operations Research and Information Engineering, Cornell
University, 14853 (E-mail: [email protected]). This work was supported by NSF
grant ITS 0612031, NIH grant R37 CA057030, and NSF grant DMS-0805975.
Abstract
The method of maximum tapered likelihood has been proposed as a way
to quickly estimate covariance parameters for stationary Gaussian random
fields. We show that under a useful asymptotic regime, maximum tapered
likelihood estimators are consistent and asymptotically normal for covariance
models in common use. We then formalize the notion of tapered quasi-Bayesian
estimators and show that they too are consistent and asymptotically normal.
We also present asymptotic confidence intervals for both types of estimators
and show via simulation that they accurately reflect sampling variability, even
at modest sample sizes. Proofs, examples, and detailed derivations are found
in the supplemental materials, available online.
Keywords: Covariance estimation, Gaussian process, Consistency, Bayesian
inference
1 Introduction
Covariance tapering was introduced as a way to mitigate the computational
burdens required for calculating statistically-relevant quantities involving large
covariance matrices arising from irregularly-spaced spatial data. These
computations typically require O(n^3) operations, where n is the number of spatial
observations. The idea behind tapering is to introduce, in a principled way,
many zeros into the covariance matrices, enabling the use of sparse matrix al-
gorithms, which have computational complexities that are generally functions
of the number of non-zero elements in the matrix.
Tapering has been studied as a way to speed up computations required
for optimal spatial prediction (Furrer et al. 2006; Furrer and Sain 2009) and
for Kalman filter updates (Furrer and Bengtsson 2007). Kaufman (2006) and
Kaufman et al. (2008) introduced the maximum tapered likelihood estimate as
a way to use tapered covariance matrices to quickly estimate covariance func-
tion parameters. Du et al. (2009) and Zhang and Du (2008) further explicated
the properties of these estimators. In addition, Kaufman (2006) discussed ap-
proximating Bayesian estimation using tapered likelihood functions.
Here, we examine the behavior of both maximum tapered likelihood
estimators and what we will call tapered quasi-Bayesian estimators (we use
the term quasi-Bayesian here despite its previous introduction in Berger (2000)
to describe, pejoratively, something completely different).
Tapering is not the only approach that has been proposed to quickly com-
pute approximations to the likelihood function for large spatial datasets. When
the data are sampled on a regular spatial grid, the resulting structure of the
covariance matrix may be exploited to increase computational efficiency (Whit-
tle 1954; Zimmerman 1989). When data locations are not gridded, it is still
possible to use Fourier transform methods for approximate inference either by
integrating the locations onto a latent grid (Fuentes 2007) or employing a
non-standard periodogram formulation (Matsuda and Yajima 2009). Another
approach for non-gridded data is to factor the full likelihood into conditional
likelihoods that ignore dependence on far-away observations (Vecchia 1988;
Stein et al. 2004). Composite likelihood approaches have also been considered
(Heagerty and Lele 1998; Curriero and Lele 1999). Still another approach is
to project the data onto a lower-dimensional space (Cressie and Johannesson
2008; Banerjee et al. 2008; Finley et al. 2009), although theoretical properties
of these techniques are not known.
Here, like Kaufman (2006), Kaufman et al. (2008), Du et al. (2009), and
Zhang and Du (2008), we study the use of tapering for parameter estimation.
Indeed, the present study may be seen as a follow-up to these works, and
makes use of some of the proof techniques contained in Kaufman (2006) and
Kaufman et al. (2008). Unlike the previous works, which introduced the method
and considered the asymptotic behavior of tapering with the popular Matérn
covariance function, we do not restrict ourselves to a single covariance model.
We also devote considerably more theoretical attention to the quasi-Bayesian
perspective than Kaufman (2006).
In addition, while the previous asymptotic studies of the maximum tapered
likelihood estimator are as much results on inconsistency as they are on con-
sistency (see Section 2.1), we provide proofs for consistency and asymptotic
normality of both the maximum tapered likelihood estimator and tapered quasi-
Bayesian estimators. The key reason for the stronger results presented here is
that we consider a different type of asymptotic regime.
Asymptotics for random fields, unlike the case of asymptotics for indepen-
dent data, are somewhat ill-defined because the manner in which sample points
are added such that their number increases to infinity is not clear cut. There are
two standard approaches to increasing the number of observations toward infin-
ity (Cressie 1991). The first is called increasing-domain asymptotics, where the
domain expands in spatial extent, while the sampling density stays constant.
The second is called infill, or fixed-domain, asymptotics, where the domain
stays constant, while the sampling density increases to infinity.
For spatial prediction, Stein (1999, Chapter 3.3) prefers infill asymptotics,
arguing essentially that because the usual goal for spatial prediction is inter-
polation, and it is reasonable to posit that the denser the sample gets, the
better the interpolation ought to get, we are led towards fixed-domain asymp-
totics. For parameter estimation, however, the situation is somewhat different
in that it is not immediately clear which asymptotics better represents the case
of infinitely-increasing information.
The maximum likelihood estimator for the parameters of popular covariance
models has been studied under both types of asymptotics. Mardia and Marshall
(1984) showed that the maximum likelihood estimate is consistent and asymp-
totically normal for many covariance models under asymptotic sampling that
includes increasing domain asymptotics as a special case. On the other hand,
Zhang (2004) proved that the parameters of the popular Matérn covariance
model (4) are not individually consistently estimable under infill asymptotics.
The parameter estimation analogue to Stein (1999)’s argument for prediction,
then, seems to be that because information increases infinitely in the case of
increasing-domain asymptotics, we might prefer this scheme over fixed-domain
asymptotics, where information does not increase to infinity for all parameters
as the number of sample points increases to infinity. In addition, relative to
infill, results concerning expanding-domain asymptotics are available for
much more general cases.
In terms of how well the limiting distributions under the two asymptotic
regimes approximate their finite-sample correspondents, Zhang and Zimmer-
man (2005) found that infill asymptotics are preferable in some situations.
However, as we will see from the simulations in Section 2.2, as long as the
spatial extent of the sampling region is large compared to the range of depen-
dence of the process, increasing-domain asymptotics provide a very accurate
description of the behavior of the maximum tapered likelihood estimate.
More concretely, let Z(s) be a Gaussian random field, where s is a location
index that varies continuously over a domain D. Suppose also that we have
observed Z at n locations s1, . . . , sn. The covariance between measurements
at two locations Cov (Z(si), Z(sj)) = C(θ; si, sj) is assumed to be a function
of only the locations themselves, and is known up to a parameter θ. We will
further assume that Z(s) is second order stationary and, for simplicity, has
mean zero. For convenience, Z(s) may also be assumed to be isotropic, so that
C(θ; si, sj) = C(θ;hij), where hij = ‖si − sj‖, although this assumption may
be dropped.
The log likelihood of $Z_n$, a vector of n observations of Z(s), is
\[
\ell_n(\theta; Z_n) = -\frac{1}{2}\log|\Sigma_n(\theta)| - \frac{1}{2} Z_n' \Sigma_n(\theta)^{-1} Z_n, \tag{1}
\]
up to an additive constant (with respect to θ), where $\Sigma_{ij,n}(\theta) = C(\theta; h_{ij})$.
Dependence of $\ell_n(\theta; Z_n)$ and $\Sigma_n(\theta)$ on θ will from here on be suppressed
for convenience. Also, for a matrix A, A′ will refer to the transpose of A. The
notation $\dot{f}$ will refer to the vector of first derivatives of the scalar function f,
and $\ddot{f}$ will refer to the matrix of second derivatives of f. All derivatives are
with respect to the vector θ.
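As a rough illustration of the cost that tapering is meant to avoid, the log likelihood (1) can be evaluated with a dense Cholesky factorization. The sketch below is not taken from the paper: the function name `gaussian_loglik`, the random locations, and the exponential covariance with σ² = 1 and α = 0.2 are assumptions chosen only to make the example self-contained.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gaussian_loglik(Sigma, z):
    """Log likelihood (1), up to the additive constant -(n/2) log(2 pi)."""
    chol = cho_factor(Sigma, lower=True)            # dense O(n^3) factorization
    logdet = 2.0 * np.sum(np.log(np.diag(chol[0])))  # log|Sigma| from the Cholesky diagonal
    quad = z @ cho_solve(chol, z)                    # z' Sigma^{-1} z
    return -0.5 * logdet - 0.5 * quad

# Illustrative data (assumed values): exponential covariance at 50 random locations.
rng = np.random.default_rng(0)
s = rng.uniform(0.0, 10.0, size=(50, 2))
H = np.sqrt(((s[:, None, :] - s[None, :, :]) ** 2).sum(-1))  # interpoint distances
Sigma = 1.0 * np.exp(-0.2 * H)
z = np.linalg.cholesky(Sigma) @ rng.standard_normal(50)      # one draw from N(0, Sigma)
print(gaussian_loglik(Sigma, z))
```

The factorization is the step that scales as O(n^3) for a dense covariance matrix.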
Let ρt(L;h) be a correlation function and T be a “taper matrix” such that
Tij = ρt(L;hij). All we assume about ρt(L;h) is that it does not depend on
θ, that it be a valid correlation function, that its support is [0, L) for some
appropriately chosen L > 0, and that it be greater than zero for h < L. The
choice of L is discussed below. One example taken from a class of compactly-
supported polynomials from Wendland (1995, 1998) is
\[
\rho_t(L; h) = (1 - h/L)_+^6 \left(1 + 6h/L + \frac{35 h^2}{3 L^2}\right). \tag{2}
\]
Other useful examples of such functions may be found in Wendland (1995,
1998).
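For concreteness, the taper function (2) might be coded as follows. This is a minimal sketch; the name `wendland_taper` is ours and does not come from the paper or from any package.

```python
import numpy as np

def wendland_taper(h, L):
    """Taper (2): (1 - h/L)_+^6 * (1 + 6 h/L + 35 h^2 / (3 L^2))."""
    r = np.asarray(h, dtype=float) / L
    positive_part = np.clip(1.0 - r, 0.0, None)   # (1 - h/L)_+ is zero for h >= L
    return positive_part ** 6 * (1.0 + 6.0 * r + 35.0 * r ** 2 / 3.0)

h = np.array([0.0, 1.0, 2.9, 3.0, 10.0])
print(wendland_taper(h, L=3.0))   # equals 1 at h = 0 and 0 for h >= L
```

Because the function vanishes exactly for h ≥ L, a taper matrix built from it has nonzero entries only for pairs of points closer than L.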
Define the tapered log likelihood as
\[
\ell_{t,n} = -\frac{1}{2}\log|\Sigma_n \circ T_n| - \frac{1}{2} Z_n' \left[(\Sigma_n \circ T_n)^{-1} \circ T_n\right] Z_n, \tag{3}
\]
where the ◦ notation denotes the element-wise product, sometimes called the
Hadamard or Schur product. Note that (3) does not correspond to the log
density of any random vector. Importantly, Σn◦Tn is guaranteed to be positive
definite as long as Σn and Tn are both positive definite (Horn and Johnson
1991, page 458). Equation (3) is referred to in Kaufman (2006) and Kaufman
et al. (2008) as the two-taper approximation.
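The two-taper form (3) can be sketched directly from its definition. The code below uses dense NumPy arrays purely for readability; a practical implementation would use sparse matrix routines, which is the point of tapering. The names `tapered_loglik` and `wendland_taper`, and the illustrative parameter values, are assumptions.

```python
import numpy as np

def wendland_taper(h, L):
    """Taper (2), zero for h >= L."""
    r = np.asarray(h, dtype=float) / L
    return np.clip(1.0 - r, 0.0, None) ** 6 * (1.0 + 6.0 * r + 35.0 * r ** 2 / 3.0)

def tapered_loglik(Sigma, T, z):
    """Tapered log likelihood (3); '*' is the element-wise (Schur) product."""
    S = Sigma * T                          # Sigma o T
    sign, logdet = np.linalg.slogdet(S)
    W = np.linalg.inv(S) * T               # (Sigma o T)^{-1} o T
    return -0.5 * logdet - 0.5 * z @ W @ z

# Illustrative data (assumed values): exponential covariance, taper range L = 3.
rng = np.random.default_rng(0)
s = rng.uniform(0.0, 10.0, size=(60, 2))
H = np.sqrt(((s[:, None, :] - s[None, :, :]) ** 2).sum(-1))
Sigma = np.exp(-0.2 * H)
z = np.linalg.cholesky(Sigma) @ rng.standard_normal(60)
T = wendland_taper(H, L=3.0)
print(tapered_loglik(Sigma, T, z))
print("fraction of nonzero entries in Sigma o T:", np.mean(T > 0))
```

Setting every entry of T to one recovers the untapered log likelihood (1), which gives a quick sanity check on an implementation.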
Their one-taper approximation, which was studied by Du et al. (2009) and
Zhang and Du (2008), does not correspond to an unbiased estimating equation
and can produce significantly biased estimates (Kaufman et al. 2008). We will
not consider it here.
Throughout this study, we favor simplicity of assumptions over slightly more
general results. In many cases, weaker but more elaborate assumptions are
possible, requiring only minimal changes to the proofs.
2 The Maximum Tapered Likelihood Estimator
The maximum tapered likelihood estimate is defined as $\hat\theta_{t,n} = \arg\max_{\theta\in\Theta} \ell_{t,n}(\theta)$.
In Section 2.1, we consider this estimator from within the framework of ex-
tremum estimators and investigate its asymptotic properties. In Section 2.2,
we conduct a simulation experiment to determine, firstly, how quickly and to
what extent asymptotic sampling distributions approximate empirical sampling
distributions, and secondly how the taper range affects sampling variability.
2.1 Asymptotic Behavior of the Maximum Tapered
Likelihood Estimator
Here we study a form of increasing-domain asymptotics. The requirement of
an expanding domain is not stated explicitly, but rather, as in Mardia and
Marshall (1984), is implied by eigenvalue conditions on the covariance matrix
and its derivatives.
In contrast, Kaufman (2006) and Kaufman et al. (2008) study infill
asymptotics. Specifically, consider the Matérn covariance model, defined by
\[
C(\sigma^2, \alpha, \nu; h) = \frac{\sigma^2 (\alpha h)^{\nu}}{\Gamma(\nu)\, 2^{\nu-1}} K_\nu(\alpha h), \quad \sigma^2, \alpha, \nu > 0, \tag{4}
\]
where $K_\nu$ is the modified Bessel function of the second kind of order ν. It is not
uncommon to assume that ν is fixed and known, a practice followed later in this
paper. They show that for a known ν and some fixed $\alpha^*$,
$\hat\sigma^2 \alpha^{*2\nu} \xrightarrow{a.s.} \sigma_0^2 \alpha_0^{2\nu}$,
where $\sigma_0^2$ and $\alpha_0$ are the true parameters and $\hat\sigma^2$ is the value that maximizes
(3). Unfortunately, under infill asymptotics, σ² and α cannot both be estimated
consistently (Zhang 2004).
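The Matérn model (4) can be evaluated with SciPy's modified Bessel function `scipy.special.kv`. The sketch below is illustrative only (the name `matern_cov` is ours); it handles h = 0 through the limiting value σ², and it checks the familiar closed forms for ν = 1/2 (exponential) and ν = 3/2 that are used later in the paper.

```python
import numpy as np
from scipy.special import gamma, kv

def matern_cov(h, sigma2, alpha, nu):
    """Matern covariance (4); the value at h = 0 is the limit sigma2."""
    h = np.atleast_1d(np.asarray(h, dtype=float))
    x = alpha * h
    out = np.full(h.shape, float(sigma2))
    pos = x > 0
    out[pos] = sigma2 * x[pos] ** nu * kv(nu, x[pos]) / (gamma(nu) * 2.0 ** (nu - 1.0))
    return out

h = np.linspace(0.0, 20.0, 5)
print(matern_cov(h, sigma2=1.0, alpha=0.2, nu=0.5))    # matches exp(-alpha h)
print(matern_cov(h, sigma2=1.0, alpha=0.316, nu=1.5))  # matches (1 + alpha h) exp(-alpha h)
```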
Kaufman (2006) and Kaufman et al. (2008) argue heuristically that a good
way to measure sampling variability of the estimator is to study a sandwich
estimator of the sampling covariance matrix (e.g. Durbin 1960; Bhapkar 1972;
Morton 1981; Ferreira 1982; Godambe and Heyde 1987; Kauermann and Carroll
2001), and mention that, under conditions that are not met in the context of
Gaussian random fields, quasi-likelihood theory says that the quasi-likelihood
estimator of θ is asymptotically normal with covariance equal to the sandwich
matrix (Heyde 1997). We will show formally that under the asymptotic regime
we consider, this sandwich matrix is in fact the asymptotic covariance of $\hat\theta_{t,n}$.
2.1.1 Consistency
Let θ0 be the true parameter vector and P0 the probability measure under θ0.
Also, let E0 and Cov0 denote the expectation and covariance, respectively, un-
der θ0. Let λmax{A} and λmin{A} denote, respectively, the largest and small-
est absolute eigenvalues of a matrix A. We assume throughout that standard
measurability conditions hold.
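Theorem 2.1.1 (Consistency). Assume $\theta_0 \in \Theta$, a convex compact subset of
$\mathbb{R}^p$.
(A) Assume a continuously differentiable covariance function C(θ;h) and an
increasing sequence of domains $\{D_n\}$ such that
(A1) $\inf_n \lambda_{\min}\{\Sigma_n\} > 0$ and $\sup_n \lambda_{\max}\{\Sigma_n\} < \infty$ for all θ ∈ Θ;
(A2) $\sup_n \lambda_{\max}\left\{\frac{\partial \Sigma_n}{\partial \theta_k} \circ T_n\right\} < \infty$ for all θ ∈ Θ, k = 1, . . . , p;
(A3) for some N > 0 and γ > 0, the minimum eigenvalue of the matrix
$-E_0[n^{-1}\ddot\ell_{t,n}(\theta_0)]$ is greater than γ for all n ≥ N.
(B) Assume that $\theta_1 \neq \theta_2 \Rightarrow (\Sigma_n(\theta_1) \circ T_n) \neq (\Sigma_n(\theta_2) \circ T_n)$.
Then $\hat\theta_{t,n} = \arg\max_{\theta\in\Theta} \ell_{t,n}(\theta)$ is consistent.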
Proof. See Section S-3 in the supplemental materials.
We note that the dimension of the domain D does not play a direct role
in assumptions (A) and (B). For the covariance models often used in practice,
the identifiability assumption (B) is usually satisfied by making sure the taper
range L is larger than the smallest interpoint distances in the data. Assumption
(A) takes a bit more work to check, but is not prohibitively difficult in most
useful contexts. The following example shows one common case.
Example 2.1.1 Exponential covariance function on a rectangular lattice (I)
A commonly-used covariance function is the exponential covariance function
C(θ;h) = σ² exp{−αh}, where θ = (σ², α)′. Consider the increasing domain
scenario where sampling points $s_1, \ldots, s_n$ are placed on a rectangular lattice
$D_n \subset \Delta\mathbb{Z}^d$, for a fixed 0 < ∆ < L, and $D_n \subset D_{n+1}$ for all n.
Define the distance matrix $H_n$, with $H_{ij,n} = \|s_i - s_j\|$, where ‖ · ‖ denotes
the Euclidean norm. Let $\Sigma_n = \sigma^2\Gamma_n$, so that the correlation matrix $\Gamma_n$ has entries
$\Gamma_{ij,n} = e^{-\alpha H_{ij,n}}$. Taking derivatives, we get $\partial\Sigma_n/\partial\sigma^2 = \Gamma_n$ and
$\partial\Sigma_n/\partial\alpha = -\sigma^2 H_n \circ \Gamma_n$.
For any matrix norm, the spectral radius λmax{A} of an n × n matrix A
satisfies λmax{A} ≤ ‖A‖ (Horn and Johnson 1991, page 297). If we choose the
maximum row sum norm (Horn and Johnson 1991, page 295),
\[
\begin{aligned}
\lambda_{\max}\{\Gamma_n\} &\le \|\Gamma_n\|_\infty \\
&= \sup_{1 \le i \le n} \sum_{j=1}^{n} e^{-\alpha H_{ij}} \\
&\le \sum_{s \in \Delta\mathbb{Z}^d} e^{-\alpha\|s\|}.
\end{aligned} \tag{5}
\]
One can check that $\int_{s\in\mathbb{R}^d} e^{-\alpha\|s\|}\, ds < \infty$ for α > 0, which implies that the last
line in (5) is finite, so $\sup_n \lambda_{\max}\{\Gamma_n\}$ is finite. We finish confirming (A1) by
noting that all interpoint distances are fixed away from zero, ensuring that $\Gamma_n$
is strictly positive definite as n → ∞.
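The bound in (5) can be checked numerically on a small expanding lattice. The following sketch is only an illustration under assumed values (d = 2, ∆ = 1, α = 1, and grids of 16, 64, and 256 points); the infinite lattice sum is approximated by a truncated sum over a window large enough to contain every interpoint difference used here.

```python
import numpy as np

def grid_points(m, delta=1.0):
    """m x m rectangular lattice with spacing delta."""
    ax = delta * np.arange(m)
    xx, yy = np.meshgrid(ax, ax)
    return np.column_stack([xx.ravel(), yy.ravel()])

def dist_matrix(s):
    diff = s[:, None, :] - s[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

alpha = 1.0
# Truncated version of the lattice sum in the last line of (5).
k = np.arange(-40, 41)
kx, ky = np.meshgrid(k, k)
lattice_sum = np.exp(-alpha * np.hypot(kx, ky)).sum()

results = []
for m in (4, 8, 16):
    Gamma = np.exp(-alpha * dist_matrix(grid_points(m)))
    eig = np.linalg.eigvalsh(Gamma)
    row_sum = Gamma.sum(axis=1).max()        # maximum row sum norm of Gamma_n
    results.append((m * m, eig.min(), eig.max(), row_sum))
    print(m * m, eig.min(), eig.max(), row_sum, lattice_sum)
```

In each row the largest eigenvalue should sit below the maximum row sum, which in turn should sit below the lattice sum, while the smallest eigenvalue stays positive.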
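Next, for a taper function ρt(L;h) with support on [0, L),
\[
\begin{aligned}
\lambda_{\max}\{H_n \circ \Gamma_n \circ T_n\} &\le \|H_n \circ \Gamma_n \circ T_n\|_\infty \\
&= \sup_{1 \le i \le n} \sum_{j=1}^{n} H_{ij} e^{-\alpha H_{ij}} \rho_t(L; H_{ij}) \\
&\le \sup_{1 \le i \le n} \sum_{j=1}^{n} H_{ij} e^{-\alpha H_{ij}} 1_{\{H_{ij} < L\}} \\
&\le \sum_{s \in \Delta\mathbb{Z}^d} \|s\| e^{-\alpha\|s\|} 1_{\{\|s\| < L\}} \\
&= \sum_{s \in \Delta\mathbb{Z}^d \cap \{s : \|s\| < L\}} \|s\| e^{-\alpha\|s\|}.
\end{aligned} \tag{6}
\]
The last line in (6) is the sum of a finite number of bounded summands, and
is therefore itself finite. Thus, for σ² < ∞, $\sup_n \lambda_{\max}\{H_n \circ \Gamma_n \circ T_n\} < \infty$.
Now we can see that $\sup_n \lambda_{\max}\left\{\frac{\partial \Sigma_n}{\partial \theta_k} \circ T_n\right\} < \infty$ for k = 1, 2, thus verifying
Assumption (A2).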
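For Assumption (A3) it is possible to write down the form of the two
eigenvalues analytically. Let $q_{ij}$ denote the ijth entry of $-E_0[n^{-1}\ddot\ell_{t,n}(\theta_0)]$. Following
the computations in (S-9), we see that, suppressing the dependence on n for
the moment,
\[
\begin{aligned}
q_{11} &= \frac{1}{2\sigma^4}\mathrm{tr}\left\{(\Gamma \circ T)^{-1}(\Gamma \circ T)(\Gamma \circ T)^{-1}(\Gamma \circ T)\right\}
 = \frac{1}{2\sigma^4}\mathrm{tr}\{I_n\} = \frac{n}{2\sigma^4}, \\
q_{12} = q_{21} &= -\frac{1}{2\sigma^2}\mathrm{tr}\left\{(\Gamma \circ T)^{-1}(H \circ \Gamma \circ T)\right\}, \\
q_{22} &= \frac{1}{2}\mathrm{tr}\left\{(\Gamma \circ T)^{-1}(H \circ \Gamma \circ T)(\Gamma \circ T)^{-1}(H \circ \Gamma \circ T)\right\}.
\end{aligned}
\]
The eigenvalues of $-E_0[n^{-1}\ddot\ell_{t,n}(\theta_0)]$ are
\[
\frac{1}{2}\left(q_{11} + q_{22} \pm \sqrt{4q_{12}^2 + (q_{11} - q_{22})^2}\right). \tag{7}
\]
Noting that $q_{11}$ and $q_{22}$ are both positive, the smaller of the eigenvalues is
positive when $(q_{11} + q_{22})^2 > 4q_{12}^2 + (q_{11} - q_{22})^2$, or, after rearranging terms,
when the determinant $q_{11}q_{22} - q_{12}^2 > 0$. Letting $A_n = (\Gamma_n \circ T_n)^{-1}(H_n \circ \Gamma_n \circ T_n)$,
we can write the determinant as
\[
\begin{aligned}
q_{11}q_{22} - q_{12}^2 &= \frac{n}{4\sigma^4}\mathrm{tr}\{A_n^2\} - \frac{1}{4\sigma^4}\mathrm{tr}^2\{A_n\} \\
&= \frac{n}{4\sigma^4}\sum_{i=1}^{n}\left(\lambda_i\{A_n\}\right)^2 - \frac{1}{4\sigma^4}\left(\sum_{i=1}^{n}\lambda_i\{A_n\}\right)^2.
\end{aligned} \tag{8}
\]
The right hand side of (8) will decay to zero with increasing n only if the
eigenvalues of $A_n$ converge to a common value. Because $A_n$ is symmetric, this
can only happen if $(\Gamma_n \circ T_n)^{-1}(H_n \circ \Gamma_n \circ T_n) - a_n I_n \to 0$ for some scalar $a_n$,
which is clearly not the case. Thus, Assumption (A3) is satisfied.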
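The quantities in (7) and (8) are easy to compute for a small lattice, which gives a numerical check of the argument above. The sketch below is an illustration, not part of the paper's computations; it assumes an 8 × 8 unit grid, σ² = 1, α = 0.2, and taper range L = 3, and the helper name `wendland_taper` is ours.

```python
import numpy as np

def wendland_taper(h, L):
    """Taper (2), zero for h >= L."""
    r = np.asarray(h, dtype=float) / L
    return np.clip(1.0 - r, 0.0, None) ** 6 * (1.0 + 6.0 * r + 35.0 * r ** 2 / 3.0)

# Assumed illustrative setting: 8 x 8 unit grid, sigma^2 = 1, alpha = 0.2, L = 3.
m, sigma2, alpha, L = 8, 1.0, 0.2, 3.0
ax = np.arange(m, dtype=float)
xx, yy = np.meshgrid(ax, ax)
s = np.column_stack([xx.ravel(), yy.ravel()])
H = np.sqrt(((s[:, None, :] - s[None, :, :]) ** 2).sum(-1))
n = H.shape[0]

GT = np.exp(-alpha * H) * wendland_taper(H, L)   # Gamma o T
A = np.linalg.solve(GT, H * GT)                  # A_n = (Gamma o T)^{-1} (H o Gamma o T)

q11 = n / (2.0 * sigma2 ** 2)
q12 = -np.trace(A) / (2.0 * sigma2)
q22 = 0.5 * np.trace(A @ A)
Q = np.array([[q11, q12], [q12, q22]])

root = np.sqrt(4.0 * q12 ** 2 + (q11 - q22) ** 2)
eig_formula = 0.5 * np.array([q11 + q22 - root, q11 + q22 + root])   # formula (7)
det = q11 * q22 - q12 ** 2                                            # left side of (8)
print("eigenvalues from (7):", eig_formula)
print("determinant:", det)
```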
To show Assumption (B), we write $(\Sigma_n(\theta) \circ T_n)_{ij} = \sigma^2 \exp\{-\alpha h_{ij}\}\rho_t(L; h_{ij})$.
Since ρt(L;h) > 0 when h < L, Assumption (B) holds as long as $h_{ij} < L$ for
at least two pairs of points ij and i′j′ for which $h_{ij} \neq h_{i'j'}$. For the
rectangular grid, this is true for n ≥ 3.
We now see that the exponential covariance function on an expanding grid
satisfies assumptions (A) and (B), so the maximum tapered likelihood estimate
of θ = (σ2, α)′ is consistent.
Although covariance tapering is designed for use on irregularly-spaced datasets,
a regular grid was chosen for Example 2.1.1 to make the analysis tractable. We
demonstrate the use of the tapered likelihood on irregularly-spaced datasets in
simulations in Sections 2.2 and 3.2.
The argument in Example 2.1.1 is trivially repeated for other classes of
covariance functions. Notably, the Matérn model with ν = 3/2, with C(θ;h) = σ²(1 +
αh) exp{−αh}, θ = (σ², α)′, as well as the squared exponential model, with
C(θ;h) = σ² exp{−(αh)²}, θ = (σ², α)′, are easily accommodated.
2.1.2 Asymptotic Normality
Let us first define some important matrices. Let
\[
\begin{aligned}
P_n &= E_0[\dot\ell_{t,n}\dot\ell_{t,n}'] \\
Q_n &= -E_0[\ddot\ell_{t,n}] \\
J_n &= Q_n P_n^{-1} Q_n.
\end{aligned} \tag{9}
\]
The matrix $J_n^{-1}$ is familiar from generalized estimating equations, quasi-likelihood,
and other areas, and is referred to by various names, including the sandwich
matrix, the Godambe information criterion, and the robust information crite-
rion (e.g. Durbin 1960; Bhapkar 1972; Morton 1981; Ferreira 1982; Godambe
and Heyde 1987; Heyde 1997). We are now ready to prove two useful lemmas.
Lemma 2.1.1. Assume conditions (A1)–(A3).
Then $\dot\ell_{t,n} \xrightarrow{D} N(0, P_n)$, where $P_n = E_0[\dot\ell_{t,n}\dot\ell_{t,n}']$.
Proof. See Section S-3 in the supplemental materials.
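Lemma 2.1.2. Assume a twice continuously differentiable covariance function
C(θ;h) and an increasing sequence of domains $\{D_n\}$ such that $(\Sigma_n \circ T_n)^{-1}$
exists. Also, assume (A1)–(A3), along with
(A4) $\sup_n \lambda_{\max}\left\{\frac{\partial^2 \Sigma_n}{\partial\theta_j \partial\theta_k} \circ T_n\right\} < \infty$ for all θ ∈ Θ, j, k = 1, . . . , p.
Then $n^{-1}\ddot\ell_{t,n} + n^{-1}Q_n \xrightarrow{P_0} 0$, where $Q_n = -E_0[\ddot\ell_{t,n}]$.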
Proof. See Section S-3 in the supplemental materials.
Introducing some more notation, a p× p symmetric positive definite matrix
A can be written as A = ODO′ with O orthogonal and D diagonal. Define
$A^{1/2}$ to be the square root $O D^{1/2} O'$.
Theorem 2.1.2 (Asymptotic Normality). Assume the conditions of Theorem
2.1.1 and Lemmas 2.1.1 and 2.1.2. Then
\[
J_n^{1/2}(\hat\theta_{t,n} - \theta_0) \xrightarrow{D} N(0, I),
\]
where $J_n$ is defined as in (9).
Proof. See Section S-3 in the supplemental materials.
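To make the matrices in (9) concrete, the sketch below computes P_n, Q_n, and J_n for the exponential model under the two-taper likelihood (3). The closed forms used are our own working from standard moment identities for Gaussian quadratic forms; they are stated here as assumptions, not copied from (S-9) and (S-11), which are in the supplement. With S = Σ ∘ T, S_j = (∂Σ/∂θ_j) ∘ T, and B_j = (S⁻¹ S_j S⁻¹) ∘ T, we take Q_jk = ½ tr{S⁻¹ S_j S⁻¹ S_k} and P_jk = ½ tr{B_j Σ B_k Σ}. The function name `sandwich_matrices` and the grid settings are hypothetical.

```python
import numpy as np

def wendland_taper(h, L):
    """Taper (2), zero for h >= L."""
    r = np.asarray(h, dtype=float) / L
    return np.clip(1.0 - r, 0.0, None) ** 6 * (1.0 + 6.0 * r + 35.0 * r ** 2 / 3.0)

def sandwich_matrices(H, T, sigma2, alpha):
    """P_n, Q_n, J_n of (9) for the exponential model, theta = (sigma^2, alpha)."""
    Gamma = np.exp(-alpha * H)
    Sigma = sigma2 * Gamma
    dSigma = [Gamma, -sigma2 * H * Gamma]             # derivatives w.r.t. sigma^2 and alpha
    S_inv = np.linalg.inv(Sigma * T)
    M = [S_inv @ (dS * T) for dS in dSigma]           # S^{-1} S_j
    B = [(Mj @ S_inv) * T for Mj in M]                # B_j = (S^{-1} S_j S^{-1}) o T
    p = len(dSigma)
    P = np.empty((p, p))
    Q = np.empty((p, p))
    for j in range(p):
        for k in range(p):
            Q[j, k] = 0.5 * np.trace(M[j] @ M[k])
            P[j, k] = 0.5 * np.trace(B[j] @ Sigma @ B[k] @ Sigma)
    J = Q @ np.linalg.solve(P, Q)                     # J_n = Q_n P_n^{-1} Q_n
    return P, Q, J

# Assumed illustrative setting: 8 x 8 unit grid, sigma^2 = 1, alpha = 0.2, L = 3.
ax = np.arange(8, dtype=float)
xx, yy = np.meshgrid(ax, ax)
s = np.column_stack([xx.ravel(), yy.ravel()])
H = np.sqrt(((s[:, None, :] - s[None, :, :]) ** 2).sum(-1))
P, Q, J = sandwich_matrices(H, wendland_taper(H, 3.0), sigma2=1.0, alpha=0.2)
se = np.sqrt(np.diag(np.linalg.inv(J)))               # asymptotic standard errors
print("asymptotic standard errors for (sigma^2, alpha):", se)
```

When no tapering is applied (every entry of T equal to one), these expressions give P_n = Q_n, so J_n reduces to the usual Fisher information.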
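Example 2.1.2 Exponential covariance function on a rectangular lattice (II)
We consider the same setting as Example 2.1.1. Since we have already shown
that conditions (A1)–(A3) are satisfied, all we have left is (A4).
The second partial derivatives are
\[
\frac{\partial^2 \Sigma_n}{\partial\sigma^2 \partial\alpha} = -H_n \circ \Gamma_n, \qquad
\frac{\partial^2 \Sigma_n}{\partial(\sigma^2)^2} = 0, \qquad
\frac{\partial^2 \Sigma_n}{\partial\alpha^2} = \sigma^2 H_n \circ H_n \circ \Gamma_n.
\]
The sign of the mixed derivative does not matter here, because λmax is defined
through absolute eigenvalues. We showed in Example 2.1.1 that
$\sup_n \lambda_{\max}\{H_n \circ \Gamma_n \circ T_n\} < \infty$, and obviously
$\sup_n \lambda_{\max}\{0\} < \infty$. So all that is left to do to demonstrate (A4) is show that
$\sup_n \lambda_{\max}\{H_n \circ H_n \circ \Gamma_n \circ T_n\} < \infty$.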
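As before, for a taper function ρt(L;h) with support on [0, L),
\[
\begin{aligned}
\lambda_{\max}\{H_n \circ H_n \circ \Gamma_n \circ T_n\} &\le \|H_n \circ H_n \circ \Gamma_n \circ T_n\|_\infty \\
&= \sup_{1 \le i \le n} \sum_{j=1}^{n} H_{ij}^2 e^{-\alpha H_{ij}} \rho_t(L; H_{ij}) \\
&\le \sup_{1 \le i \le n} \sum_{j=1}^{n} H_{ij}^2 e^{-\alpha H_{ij}} 1_{\{H_{ij} < L\}} \\
&\le \sum_{s \in \Delta\mathbb{Z}^d} \|s\|^2 e^{-\alpha\|s\|} 1_{\{\|s\| < L\}} \\
&= \sum_{s \in \Delta\mathbb{Z}^d \cap \{s : \|s\| < L\}} \|s\|^2 e^{-\alpha\|s\|}.
\end{aligned} \tag{10}
\]
Again, as before, the last line in (10) is the sum of a finite number of bounded
summands, and is therefore itself bounded. Thus, for σ² < ∞, we see that
$\sup_n \lambda_{\max}\{H_n \circ H_n \circ \Gamma_n \circ T_n\} = \frac{1}{\sigma^2} \sup_n \lambda_{\max}\left\{\frac{\partial^2 \Sigma_n}{\partial\alpha^2} \circ T_n\right\} < \infty$. So (A4)
is satisfied, and the maximum tapered likelihood estimate of θ = (σ², α)′ is
asymptotically normal with asymptotic variance $J_n^{-1}$.
We note again that the calculations in Example 2.1.2 are trivially modified
to accommodate the Matérn (ν = 3/2) and squared exponential models.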
2.2 Simulation Example
To explore the sampling characteristics of the maximum tapered likelihood
estimator, we simulated datasets with exponential and Matérn (ν = 3/2) covariance
functions, a setup following Examples 2.1.1 and 2.1.2. Data locations are sited
irregularly on a unit grid perturbed with iid uniform(−1/3, 1/3) deviations,
closely following Stein et al. (2004) and Kaufman et al. (2008).
Each dataset consisted of 1000 random samples drawn from a N(0,Σ),
where $\Sigma_{ij} = \sigma^2 \exp\{-\alpha h_{ij}\}$ in the exponential case and
$\Sigma_{ij} = \sigma^2 (1 + \alpha h_{ij}) \exp\{-\alpha h_{ij}\}$
in the Matérn (ν = 3/2) case. σ² was set at 1, and the α parameters were set at
0.2 and 0.316, respectively, for the exponential and Matérn (ν = 3/2) models, giving
an effective range of 15 for all simulations. We used the re-parametrization
θ = (σ², c)′, where $c = \sigma^2\alpha^{2\nu}$, because this parametrization is more easily
identified by the data (Zhang 2004; Kaufman 2006). For the taper function, we
chose (2) from the class of compactly-supported polynomial correlation func-
tions introduced by Wendland (1995) and first used for tapering by Furrer et al.
(2006).
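The parameter values above can be checked with a few lines of arithmetic. The sketch below evaluates the two correlation functions at distance 15 and computes c = σ²α^{2ν} for each model, using only the values stated in the text (σ² = 1, α = 0.2 with ν = 1/2, and α = 0.316 with ν = 3/2); the variable names are ours.

```python
import numpy as np

sigma2 = 1.0

def exp_corr(h, alpha):
    return np.exp(-alpha * h)                       # Matern with nu = 1/2

def matern32_corr(h, alpha):
    return (1.0 + alpha * h) * np.exp(-alpha * h)   # Matern with nu = 3/2

# Correlation at distance 15 should be close to 0.05 for both models.
corr_exp_15 = exp_corr(15.0, 0.2)
corr_mat_15 = matern32_corr(15.0, 0.316)

# Re-parametrization c = sigma^2 * alpha^(2 nu).
c_exp = sigma2 * 0.2 ** (2 * 0.5)
c_mat = sigma2 * 0.316 ** (2 * 1.5)

print(corr_exp_15, corr_mat_15)
print(c_exp, c_mat)
```

Both correlations come out near 0.05, consistent with an effective range of about 15, and the two values of c are roughly 0.2 and 0.03.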
Sample points were located on the perturbed square grid $(1, \ldots, \sqrt{n}) \times (1, \ldots, \sqrt{n})$,
with n = 100, 400, 1600, and 2500. All computations were carried
out on a 2.3 GHz Linux machine using the R package spam for handling sparse
matrices.
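The sampling design can be sketched as follows. The paper's computations used R with the spam package; the Python code here is only an illustration of the design as described, with hypothetical function names `perturbed_grid` and `simulate_exponential_field`, and with a dense Cholesky factor standing in for whatever simulation routine was actually used.

```python
import numpy as np

def perturbed_grid(n, rng):
    """Unit square grid (1..sqrt(n)) x (1..sqrt(n)) with iid uniform(-1/3, 1/3) jitter."""
    m = int(round(np.sqrt(n)))
    ax = np.arange(1, m + 1, dtype=float)
    xx, yy = np.meshgrid(ax, ax)
    s = np.column_stack([xx.ravel(), yy.ravel()])
    return s + rng.uniform(-1.0 / 3.0, 1.0 / 3.0, size=s.shape)

def simulate_exponential_field(s, sigma2, alpha, rng):
    """One draw from N(0, Sigma) with Sigma_ij = sigma2 * exp(-alpha * h_ij)."""
    H = np.sqrt(((s[:, None, :] - s[None, :, :]) ** 2).sum(-1))
    Sigma = sigma2 * np.exp(-alpha * H)
    z = np.linalg.cholesky(Sigma) @ rng.standard_normal(len(s))
    return z, H

rng = np.random.default_rng(2010)
s = perturbed_grid(100, rng)
z, H = simulate_exponential_field(s, sigma2=1.0, alpha=0.2, rng=rng)
print(s.shape, z.shape, H[H > 0].min())
```

Because each coordinate moves by less than 1/3, neighboring points stay at least 1/3 apart, so the covariance matrix remains well defined.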
[Figure 1 about here.]
The covariance function and tapered covariance functions are shown in Fig-
ure 1 for the three different taper ranges used in this simulation experiment.
The severity of the taper at L = 3 is evident. The times required to compute
a single evaluation of the tapered and full log likelihood functions are shown in
Figure S-1 in the supplemental materials.
For each dataset, the maximum tapered likelihood estimate of θ = (σ², c)′
was calculated using three different taper ranges, L = 15, 5, and 3. In addition,
confidence intervals for θ were computed from $J(\hat\theta)$ for each $\hat\theta$. L = 15 was
chosen because it is roughly equal to the “effective range,” the distance at which
correlations drop below .05, of the process. L = 5 is equal to 1/α, the range
parameter of the exponential covariance function in its usual parametrization.
Finally, L = 3 was chosen because it represents an extreme case of being close
to the smallest value for which each sample point is guaranteed to have its eight
immediate neighbors contained within the taper range.
Finally, for each combination of L and n, the sample covariance matrix
was calculated for the 1000 estimates $\hat\theta_t$ and compared to the matrix $J^{-1}$, the
asymptotic covariance matrix, calculated using (S-9) and (S-11) in Section S-1 in the
supplemental materials.
It is interesting to note that the domain corresponding to the largest sample
size considered here is a square grid 50 units on a side. The domain then is just
a few times larger than the 15-unit effective range of the simulated process. This
configuration is important to keep in mind when considering the applicability of
expanding domain asymptotics to parameter estimation with data from small
or moderately-sized domains.
The resulting empirical density estimates from the simulations are shown in
Figures 2, 3, 4, and 5. In all of these figures, as expected, we see the empirical
densities become more symmetric and more sharply peaked as the sample size
increases. Comparing across taper ranges, empirical density plots for $\hat\sigma_t$ at the
three taper ranges look very similar for both covariance models. The densities
of $\hat c_t$ show small differences for the exponential model and marked differences
for the Matérn (ν = 3/2) model, but they are not as large as one might expect
when tapering so severely. Kaufman et al. (2008) noted the similarities
between the empirical densities computed from maximum likelihood estimates
and maximum tapered likelihood estimates, when applied to the exponential
model.
[Figure 2 about here.]
[Figure 3 about here.]
[Figure 4 about here.]
[Figure 5 about here.]
Coverage rates for nominal 95% confidence intervals are shown in Tables 1
and 2. The normal approximation for the c parameter is excellent (see QQ-
plots, Figures S-3 and S-5 in the Supplementary Materials), and the coverage
rates are spot-on for both covariance models. Despite notable departures from
normality for the σ2 estimates (see Figures S-2 and S-4 in the Supplementary
Materials), coverage rates are quite reasonable for moderately-sized datasets
and excellent for large datasets.
[Table 1 about here.]
[Table 2 about here.]
For each sample size and taper range, tables comparing sample covariance
matrices of the 1000 estimates of θ to their corresponding asymptotic calcula-
tions using the sandwich matrix (9) can be found in Section S-5. Importantly,
Tables S-1 and S-2 show that asymptotic calculations of sampling variability
become accurate at moderate sample sizes.
3 The Tapered Quasi-Bayesian Estimator
Often one might prefer Bayesian estimation over maximum likelihood-type es-
timation for covariance parameters of Gaussian random fields for all the usual
reasons: the ability to incorporate prior knowledge, the natural inclusion of pa-
rameter shrinkage, the straightforward extension to larger hierarchical models,
and so on.
Here we investigate the properties of tapered quasi-Bayesian estimators,
which are analogous to Bayesian estimators, except that the likelihood is replaced
by the tapered likelihood. Specifically, we show that tapered quasi-Bayesian
estimators are consistent in that the quasi-posterior, defined below in (11),
converges to a point mass at the true parameter θ0. We also show that the
quasi-posterior is asymptotically normal, and hence that samples from the
quasi-posterior (generated by MCMC, e.g.) can be used to construct consistent
confidence intervals for θ.
Following Chernozhukov and Hong (2003), we define the tapered
quasi-posterior distribution as
\[
\pi_{t,n}(\theta | Z_n) = \frac{L_{t,n}(\theta; Z_n)\,\pi(\theta)}{\int_\Theta L_{t,n}(\theta; Z_n)\,\pi(\theta)\, d\theta}, \tag{11}
\]
where $L_{t,n}(\theta; Z_n) = \exp\{\ell_{t,n}(\theta; Z_n)\}$, and π(θ) is a prior density on θ. We will
assume, for convenience, that π(θ) is proper. Recall that $L_{t,n}$ is not a density,
and thus πt,n(θ|Zn) is not a true posterior. We are guaranteed, however, that
as long as the prior π(θ) is proper, then πt,n(θ|Zn) will be a proper density
(Kaufman 2006).
In Section 3.1, we closely follow Chernozhukov and Hong (2003), who study
quasi-Bayesian estimation in a more general context. However, while their for-
mulation is the same, their outlook is quite different. They view quasi-Bayesian
estimation as a tool to enable the use of MCMC to maximize objective functions
that are not differentiable or are otherwise poorly behaved. That is, they view it
as a way to use Bayesian computational machinery to compute frequentist esti-
mators. Although we study frequentist properties of quasi-Bayesian estimators
here, we consider the quasi-Bayesian framework as a computationally tractable
alternative to exact Bayesian methods.
3.1 Asymptotic Behavior of Tapered Quasi-Bayesian
Estimators
The results in this section are applications of theorems in Chernozhukov and
Hong (2003), which we re-state in the language of tapering. The main difficulty
in applying their theory to the case of the tapered likelihood is showing that
their Assumptions 1–4 are satisfied. In particular, Chernozhukov and Hong
(2003) assume consistency of the extremum estimator and asymptotic normality
of the first derivative of the objective function, which we have shown in Theorem
2.1.1 and Lemma 2.1.1. The remaining assumptions of Chernozhukov and Hong
(2003) follow easily from the conditions for Theorem 2.1.1 and Lemmas 2.1.1
and 2.1.2, so we will assume that they hold throughout this section.
3.1.1 Convergence of the Tapered Quasi-Posterior
First, we define the total variation of moments norm of a real-valued measurable
function f as $\|f\|_{TVM(\omega)} \equiv \int (1 + \|\delta\|^{\omega}) |f(\delta)|\, d\delta$. Note that the special case of
ω = 0 is the usual total variation norm.
We now define the parameter δ, the scaled deviation from θ0, centered at the
tapered score $\dot\ell_{t,n}(\theta_0)$ (scaled by $\ddot\ell_{t,n}(\theta_0)$), as
$\delta = \sqrt{n}(\theta - \theta_0) - \sqrt{n}\,\ddot\ell_{t,n}(\theta_0)^{-1}\dot\ell_{t,n}(\theta_0)$.
The tapered quasi-posterior density of δ is then
$\pi^*_{t,n}(\delta | Z_n) = \frac{1}{\sqrt{n}}\, \pi_{t,n}\left(\frac{\delta}{\sqrt{n}} + \theta_0 + \ddot\ell_{t,n}(\theta_0)^{-1}\dot\ell_{t,n}(\theta_0)\right)$.
We are now ready to state a consistency result about the
tapered quasi-posterior distribution.
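Theorem 3.1.1 (Theorem 1 of C-H). For any 0 ≤ ω < ∞,
\[
\|\pi^*_{t,n}(\delta | Z_n) - \pi^*_{t,\infty}(\delta | Z_n)\|_{TVM(\omega)} \xrightarrow{P} 0,
\]
where
\[
\pi^*_{t,\infty}(\delta | Z_n) = \left(\frac{|Q_n/n|}{(2\pi)^p}\right)^{1/2} \exp\left\{-\frac{1}{2}\delta'(Q_n/n)\delta\right\}.
\]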
From Theorem 3.1.1 we see that πt,n(θ|Zn) concentrates its mass at θ0 at
a rate of 1/√n, as measured by the total variation of moments norm. Then,
asymptotically, πt,n(θ|Zn) approximates a normal random variable with mean
$\theta_0 + Q_n^{-1}\dot\ell_{t,n}(\theta_0)$ and covariance matrix $Q_n^{-1}$.
3.1.2 Tapered Quasi-Bayesian Point Estimates
We will construct tapered quasi-Bayesian point estimates in a manner analogous
to the construction of proper Bayes estimators. Let the scalar function κn(u) be
a loss function. For simplicity, we will only consider symmetric loss functions,
with κn(u) = κn(−u), although this restriction is not necessary. Common
symmetric loss functions include the quadratic and absolute loss functions.
We can now define the tapered quasi-posterior risk function as the expected
loss (with respect to the quasi-posterior),
$R_n(\theta) = \int_\Theta \kappa_n(\theta - \theta^*)\, \pi_{t,n}(\theta^* | Z_n)\, d\theta^*$.
Then for a given choice of loss function κn(u), the tapered quasi-Bayes
estimator is the value of θ that minimizes the tapered quasi-posterior risk,
$\hat\theta_{QB} = \arg\min_{\theta\in\Theta} R_n(\theta)$. As usual, quadratic and absolute loss functions will lead,
respectively, to the tapered quasi-posterior mean and median as the quasi-Bayes
estimators.
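The link between the loss function and the resulting point estimate can be illustrated with a Monte Carlo approximation of the risk. In the sketch below, the draws are a stand-in for a scalar quasi-posterior sample (they are simulated from a gamma distribution purely for illustration, not from any tapered quasi-posterior), and the risk is minimized over a grid of candidate values.

```python
import numpy as np

rng = np.random.default_rng(1)
draws = rng.gamma(shape=3.0, scale=0.5, size=5000)   # stand-in for quasi-posterior draws

def risk(theta, draws, loss):
    """Monte Carlo estimate of the quasi-posterior risk R_n(theta)."""
    return np.mean(loss(theta - draws))

grid = np.linspace(draws.min(), draws.max(), 2001)
quad_risk = np.array([risk(t, draws, np.square) for t in grid])
abs_risk = np.array([risk(t, draws, np.abs) for t in grid])

theta_quad = grid[np.argmin(quad_risk)]   # minimizer under quadratic loss
theta_abs = grid[np.argmin(abs_risk)]     # minimizer under absolute loss
print(theta_quad, draws.mean())
print(theta_abs, np.median(draws))
```

Up to the grid resolution, the two minimizers agree with the sample mean and the sample median of the draws.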
Theorem 3.1.2 (Theorem 2 of C-H). For a symmetric loss function κn(u),
\[
J_n^{1/2}(\hat\theta_{QB} - \theta_0) \xrightarrow{D} N(0, I).
\]
What we see from Theorem 3.1.2 is that tapered quasi-Bayes estimators
such as the quasi-posterior mean and median are asymptotically normal with
covariance equal to the sandwich matrix $J_n^{-1}$, the same asymptotic distribution
as the maximum tapered likelihood estimator.
3.1.3 Tapered Quasi-Bayesian Confidence Regions
We now turn from the question of constructing point estimates to the question
of constructing confidence regions from the tapered quasi-posterior. Here we
continue to use a frequentist vocabulary to derive frequentist properties, even
though we are studying quasi-Bayesian inference.
In constructing intervals, what we would like, from a practical point of view,
is to directly use the empirical quantiles of a sample (generated from MCMC)
from the tapered quasi-posterior as our confidence region. What we will see,
however, is that this approach does not yield asymptotically valid confidence
intervals. The question of how good such regions are is investigated in Section
3.2.1.
Although the quantiles of πt,n(θ|Zn) do not converge to the quantiles of the
limiting normal distribution of $\hat\theta_{QB}$, we can still use a quasi-posterior sample to
construct intervals that are consistent using the delta method.
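Theorem 3.1.3 (Theorem 4 of C-H). Let g(θ) be some scalar function of θ,
and let $\hat J_n$ be a consistent estimator for $J_n$. That is, $J_n^{-1}\hat J_n \xrightarrow{P} I$.
Define
\[
c_{t,g,n}(\alpha) = g(\hat\theta_{QB}) + z_\alpha \sqrt{\dot g(\hat\theta_{QB})'\, \hat J_n^{-1}\, \dot g(\hat\theta_{QB})},
\]
where $z_\alpha$ is the αth quantile of the standard normal distribution. Then
\[
\lim_{n\to\infty} P\{c_{t,g,n}(\alpha/2) \le g(\theta_0) \le c_{t,g,n}(1 - \alpha/2)\} = 1 - \alpha.
\]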
This method, of course, requires one to somehow calculate $J_n^{-1}$, which itself
requires $P_n$ and $Q_n$. Assuming a sample from $\pi_{t,n}(\theta|Z_n)$, generated by MCMC,
is available, a simple way of estimating $Q_n^{-1}$ that immediately presents itself is
to compute the sample covariance matrix of the chain. Another possibility is
to plug $\theta_{QB}$ into (S-9) and (S-11) to estimate $P_n$ and $Q_n$. We compare these
methods in Section 3.2.1.
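A minimal sketch of the chain-based option is given below. It assumes the sandwich has the form Q^{-1} P Q^{-1}, which is one reading of the notation and is not restated in this section, and it assumes an estimate of P_n is supplied separately (for instance from the plug-in formulas). The chain here is synthetic, and the name `sandwich_from_chain` is hypothetical.

```python
import numpy as np

def sandwich_from_chain(chain, P_hat):
    """Estimate the inverse sandwich matrix as Q^{-1} P Q^{-1}.

    Q^{-1} is taken to be the sample covariance of the quasi-posterior
    draws (rows = draws, columns = parameters).
    ASSUMPTION: the Q^{-1} P Q^{-1} form is our reading of the notation.
    """
    chain = np.asarray(chain, dtype=float)
    Q_inv_hat = np.atleast_2d(np.cov(chain, rowvar=False))
    return Q_inv_hat @ np.asarray(P_hat, dtype=float) @ Q_inv_hat

# Synthetic chain with a known covariance, standing in for MCMC output.
rng = np.random.default_rng(1)
true_Q_inv = np.array([[0.02, 0.001],
                       [0.001, 0.0005]])
chain = rng.multivariate_normal([1.0, 0.2], true_Q_inv, size=20000)

# Special case P = Q, in which the sandwich collapses to Q^{-1}.
P_hat = np.linalg.inv(true_Q_inv)
J_inv_hat = sandwich_from_chain(chain, P_hat)
```

In the special case P = Q used here, the result should be close to the covariance that generated the chain, which gives a quick sanity check on the code.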
3.2 Simulation Example
Simulations to explore the sampling properties of tapered quasi-Bayesian es-
timators were set up in exactly the same way as in Section 2.2, with 1000
samples drawn from a $N(0, \Sigma)$, where $\Sigma$ comes from an exponential model and
a Matérn $\nu = 3/2$ model, with $\sigma^2 = 1$, and $c = \sigma^2\alpha = 0.2$ for the exponential and
$c = \sigma^2\alpha^3 \approx 0.03$ for the Matérn $\nu = 3/2$, giving effective ranges of around 15. We
again used the taper function (2) from Wendland (1995). Sample points were
again located on an expanding perturbed square grid, with n = 100, 400, 1600,
and 2500. Both parameters were assigned Cauchy prior distributions with scale
parameter 10, truncated to have non-negative support. This prior is proper but
extremely weakly informative.
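The stated effective ranges can be checked with a short calculation. The sketch below assumes the exponential correlation is exp(-αh), the Matérn ν = 3/2 correlation is (1 + αh) exp(-αh), and the effective range is the distance at which correlation drops to 0.05; none of these conventions is restated in this section, so they are assumptions, as are the function names.

```python
import math

def exponential_corr(h, alpha):
    # ASSUMPTION: exponential correlation exp(-alpha * h).
    return math.exp(-alpha * h)

def matern32_corr(h, alpha):
    # ASSUMPTION: Matern nu = 3/2 correlation (1 + alpha * h) * exp(-alpha * h).
    return (1.0 + alpha * h) * math.exp(-alpha * h)

def effective_range(corr, alpha, level=0.05, upper=1e4, tol=1e-10):
    """Distance at which a decreasing correlation function falls to `level`, by bisection."""
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if corr(mid, alpha) > level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sigma2 = 1.0
alpha_exp = 0.2 / sigma2                     # from c = sigma^2 * alpha = 0.2
alpha_mat = (0.03 / sigma2) ** (1.0 / 3.0)   # from c = sigma^2 * alpha^3, roughly 0.03

range_exp = effective_range(exponential_corr, alpha_exp)
range_mat = effective_range(matern32_corr, alpha_mat)
```

Under these assumptions both `range_exp` and `range_mat` come out close to 15, in line with the values quoted above.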
Because of the computational infeasibility of conducting 1000 long MCMC
runs for each model, sample size, and taper range, MCMC was avoided by
defining the estimators θQB as quasi-posterior modes. This enabled θQB to
be computed much faster using numerical optimization routines on the un-
normalized quasi-posteriors.
Empirical density estimates of the resulting estimates are shown in Figures
6, 7, 8, and 9. Not surprisingly, these plots are almost indistinguishable from
those in Section 2.2, and show the same trend of collecting mass at the true
parameter as the sample size increases, while, in the case of σ2 for both mod-
els and c for the exponential model, not dissipating much as the taper range
decreases.
[Figure 6 about here.]
[Figure 7 about here.]
[Figure 8 about here.]
[Figure 9 about here.]
Tables S-3 and S-4 in the supplemental materials also tell a similar story,
that for the models considered here, decreases in statistical efficiency due to
tapering are acceptable when factored against the increases in computational
efficiency, and that the asymptotic variances based on the sandwich matrix
provide good estimates of sampling variability, even at modest sample sizes.
3.2.1 Comparing confidence intervals
An additional simulation was conducted to test the accuracy of confidence in-
tervals constructed from MCMC samples, as described in Section 3.1.3. 1000
datasets were constructed on a perturbed grid, with n = 1600, from the exponential
and Matérn $\nu = 3/2$ covariance models. In both cases, the range parameter
c was chosen to give an effective range of 4.75 for the associated process. For
each dataset, 95% confidence intervals were constructed three ways: as the
0.025 and 0.975 quantiles of the MCMC sample, using Theorem 3.1.3 from Sec-
tion 3.1.3 with Q estimated by plugging θQB into (S-9), and using Theorem
3.1.3 with Q estimated as the inverse of the sample covariance of the MCMC
sample. We refer to these three methods, respectively, as the naive, asymptotic,
and MCMC estimators.
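To make the naive construction and the notion of coverage concrete, the sketch below computes equal-tailed quantile intervals from a chain and tallies how often they contain the truth. It is a toy Gaussian illustration, not a reproduction of the simulation described here: the "chain" is drawn directly from a normal distribution whose spread either matches or understates the sampling variability of the estimator, and all names and settings are hypothetical.

```python
import numpy as np

def naive_interval(chain, alpha=0.05):
    """Equal-tailed interval from the empirical quantiles of draws of one parameter."""
    lower, upper = np.quantile(chain, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)

def coverage_rate(intervals, truth):
    """Fraction of (lower, upper) intervals that contain the true value."""
    intervals = np.asarray(intervals, dtype=float)
    covered = (intervals[:, 0] <= truth) & (truth <= intervals[:, 1])
    return float(np.mean(covered))

def toy_coverage(chain_sd, sampling_sd=0.1, truth=1.0, reps=500, seed=2):
    """Coverage of naive intervals when the chain spread is chain_sd but the
    estimator's sampling standard deviation is sampling_sd (toy setting)."""
    rng = np.random.default_rng(seed)
    intervals = []
    for _ in range(reps):
        estimate = truth + rng.normal(scale=sampling_sd)
        chain = rng.normal(loc=estimate, scale=chain_sd, size=2000)
        intervals.append(naive_interval(chain))
    return coverage_rate(intervals, truth)

rate_matched = toy_coverage(chain_sd=0.1)      # chain spread equals sampling spread
rate_too_narrow = toy_coverage(chain_sd=0.05)  # chain spread understates it
```

When the spread of the chain matches the sampling variability of the estimator, the quantile intervals cover at roughly the nominal rate; when the chain is too concentrated they under-cover, which is the kind of mismatch the delta-method intervals are designed to correct.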
[Table 3 about here.]
From Table 3, we see that the naive intervals substantially under-cover σ2
while catastrophically over-covering c for both covariance models. The asymp-
totic and MCMC intervals both do extremely well, covering at almost exactly
their nominal 95% rate. The MCMC method is easier to compute, which rec-
ommends it slightly over the asymptotic method.
4 Discussion
Covariance tapering provides a way to estimate parameters of stationary Gaus-
sian random fields from very large datasets. The method provides huge gains
in computational efficiency by paying a relatively small price in statistical efficiency,
and extends the tractable size of problems manyfold relative to standard
likelihood estimation tools. We have emphasized computational efficiencies in-
duced by using sparse matrix computations, but the sparse representation also
allows for large improvements in storage efficiency, enabling one to work with
datasets for which dense covariance matrices cannot even be stored in memory.
We showed that asymptotic theory for tapering provides a way to construct
sensible confidence intervals. Furthermore, we provided a formal framework for
incorporating tapered likelihoods into an approximate Bayesian inferential and
computational engine.
The theoretical results presented here are only possible in the expanding
domain asymptotic regime, which others have argued is less appropriate than
the infill regime (Zhang 2004; Zhang and Zimmerman 2005). In the context of
parameter estimation, we have suggested the converse in Section 1, and demon-
strated, through simulation, the accuracy of calculations based on expanding
domain asymptotics in a scenario where the domain is not much larger than
the range of the process. In practice, we suspect that it would not be difficult
to find datasets for which the characteristics of parameter estimation are better
approximated by one or the other type of asymptotics.
5 Supplemental Materials
Supplemental Materials: Proofs, detailed derivations, examples, and addi-
tional data analyses (.pdf file)
References
Banerjee, S., Gelfand, A. E., Finley, A. O., and Sang, H. (2008), “Gaussian
Predictive Process Models for Large Spatial Data Sets,” J. R. Stat. Soc. Ser.
B Stat. Methodol., 70, 825–848.
Berger, J. O. (2000), “Bayesian analysis: a look at today and thoughts of
tomorrow,” J. Amer. Statist. Assoc., 95, 1269–1276.
Bhapkar, V. P. (1972), “On a measure of efficiency of an estimating equation,”
Sankhya Ser. A, 34, 467–472.
Chernozhukov, V. and Hong, H. (2003), “An MCMC approach to classical
estimation,” J. Econometrics, 115, 293–346.
Cressie, N. and Johannesson, G. (2008), “Fixed rank kriging for very large
spatial data sets,” J. Roy. Statist. Soc. Ser. B, 70, 209–226.
Cressie, N. A. C. (1991), Statistics for spatial data, Wiley Series in Probability
and Mathematical Statistics: Applied Probability and Statistics, New York:
John Wiley & Sons Inc., a Wiley-Interscience Publication.
Curriero, F. C. and Lele, S. (1999), “A composite likelihood approach to semi-
variogram estimation,” J. Agric. Biol. Environ. Stat., 4, 9–28.
Du, J., Zhang, H., and Mandrekar, V. (2009), “Fixed-domain asymptotic prop-
erties of tapered maximum likelihood estimators,” The Annals of Statistics,
37, 3330–3361.
Durbin, J. (1960), “Estimation of parameters in time-series regression models,”
J. Roy. Statist. Soc. Ser. B, 22, 139–153.
Ferreira, P. E. (1982), “Multiparametric estimating equations,” Ann. Inst.
Statist. Math., 34, 423–431.
Finley, A., Sang, H., Banerjee, S., and Gelfand, A. (2009), “Improving the per-
formance of predictive process modeling for large datasets,” Computational
statistics & data analysis, 53, 2873–2884.
Fuentes, M. (2007), “Approximate likelihood for large irregularly spaced spatial
data,” J. Amer. Statist. Assoc., 102, 321–331.
Furrer, R. and Bengtsson, T. (2007), “Estimation of high-dimensional prior
and posterior covariance matrices in Kalman filter variants,” J. Multivariate
Anal., 98, 227–255.
Furrer, R., Genton, M. G., and Nychka, D. (2006), “Covariance tapering for
interpolation of large spatial datasets,” J. Comput. Graph. Statist., 15, 502–
523.
Furrer, R. and Sain, S. (2009), “Spatial model fitting for large datasets with
applications to climate and microarray problems,” Statistics and Computing,
19, 113–128.
Godambe, V. P. and Heyde, C. C. (1987), “Quasi-likelihood and optimal esti-
mation,” Internat. Statist. Rev., 55, 231–244.
Heagerty, P. J. and Lele, S. R. (1998), “A composite likelihood approach to
binary spatial data,” J. Amer. Statist. Assoc., 93, 1099–1111.
Heyde, C. C. (1997), Quasi-likelihood and its application, Springer Series in
Statistics, New York: Springer-Verlag, a general approach to optimal pa-
rameter estimation.
Horn, R. A. and Johnson, C. R. (1991), Topics in matrix analysis, Cambridge:
Cambridge University Press.
Kauermann, G. and Carroll, R. J. (2001), “A note on the efficiency of sandwich
covariance matrix estimation,” J. Amer. Statist. Assoc., 96, 1387–1396.
Kaufman, C. (2006), “Covariance Tapering for Likelihood Based Estimation
in Large Spatial Datasets,” Ph.D. thesis, Carnegie Mellon University, Pitts-
burgh, Pennsylvania 15213.
Kaufman, C., Schervish, M., and Nychka, D. (2008), “Covariance Tapering for
Likelihood-Based Estimation in Large Spatial Datasets,” J. Amer. Statist.
Assoc., 103, 1545–1569.
Mardia, K. V. and Marshall, R. J. (1984), “Maximum likelihood estimation of
models for residual covariance in spatial regression,” Biometrika, 71, 135–
146.
Matsuda, Y. and Yajima, Y. (2009), “Fourier analysis of irregularly spaced
data on $R^d$,” J. Roy. Statist. Soc. Ser. B, 71, 191–217.
Morton, R. (1981), “Efficiency of estimating equations and the use of pivots,”
Biometrika, 68, 227–233.
Stein, M. L. (1999), Interpolation of spatial data, Springer Series in Statistics,
New York: Springer-Verlag, some theory for Kriging.
Stein, M. L., Chi, Z., and Welty, L. J. (2004), “Approximating likelihoods for
large spatial data sets,” J. R. Stat. Soc. Ser. B Stat. Methodol., 66, 275–296.
Vecchia, A. V. (1988), “Estimation and model identification for continuous
spatial processes,” J. Roy. Statist. Soc. Ser. B, 50, 297–312.
Wendland, H. (1995), “Piecewise polynomial, positive definite and compactly
supported radial functions of minimal degree,” Adv. Comput. Math., 4, 389–
396.
Wendland, H. (1998), “Error estimates for interpolation by compactly supported radial
basis functions of minimal degree,” J. Approx. Theory, 93, 258–272.
Whittle, P. (1954), “On stationary processes in the plane,” Biometrika, 41,
434–449.
Zhang, H. (2004), “Inconsistent estimation and asymptotically equal interpo-
lations in model-based geostatistics,” J. Amer. Statist. Assoc., 99, 250–261.
Zhang, H. and Du, J. (2008), “Covariance tapering in spatial statistics,” in
Positive definite functions: From Schoenberg to space-time challenges, eds.
Mateu, J. and Porcu, E., Graficas Castan, s.l.
Zhang, H. and Zimmerman, D. L. (2005), “Towards reconciling two asymptotic
frameworks in spatial statistics,” Biometrika, 92, 921–936.
Zimmerman, D. L. (1989), “Computationally exploitable structure of covari-
ance matrices and generalized covariance matrices in spatial models,” J.
Statist. Comput. Simulation, 32, 1–15.
Table 1: Coverage rates for nominal 95% confidence intervals, constructed using the sandwich formula, for the maximum tapered likelihood estimates of the $\sigma^2$ parameter of exponential and Matérn $\nu = 3/2$ covariance models.
Model              L     n = 100   n = 400   n = 1600   n = 2500
Exponential        3     0.81      0.85      0.92       0.91
Exponential        5     0.76      0.87      0.92       0.94
Exponential        15    0.79      0.85      0.89       0.94
Matérn ν = 3/2     3     0.78      0.88      0.92       0.93
Matérn ν = 3/2     5     0.79      0.86      0.92       0.92
Matérn ν = 3/2     15    0.81      0.89      0.92       0.92
Table 2: Coverage rates for nominal 95% confidence intervals, constructed using the sandwich formula, for the maximum tapered likelihood estimates of the $c$ parameter of exponential and Matérn $\nu = 3/2$ covariance models.
Model              L     n = 100   n = 400   n = 1600   n = 2500
Exponential        3     0.94      0.94      0.95       0.94
Exponential        5     0.95      0.94      0.96       0.96
Exponential        15    0.95      0.94      0.95       0.94
Matérn ν = 3/2     3     0.94      0.94      0.96       0.94
Matérn ν = 3/2     5     0.95      0.95      0.95       0.96
Matérn ν = 3/2     15    0.95      0.95      0.93       0.95
Table 3: Coverage rates based on draws from the quasi-posterior for nominal 95% credible intervals for the exponential model, $c = \sigma^2\alpha = 1.58$, and the Matérn $\nu = 3/2$ model, $c = \sigma^2\alpha^3 = 1$ (both with effective range ≈ 4.75).
Model              type          σ2       c
Exponential        naive         0.697    1.000
Exponential        asymptotic    0.946    0.947
Exponential        MCMC          0.950    0.956
Matérn ν = 3/2     naive         0.623    1.000
Matérn ν = 3/2     asymptotic    0.951    0.948
Matérn ν = 3/2     MCMC          0.954    0.952
List of Figures
1 Covariance and tapered covariance functions for L = 3, 5, and 15.
2 Kernel density estimates for $\sigma^2_t$ for the exponential model.
3 Kernel density estimates for $c_t$ for the exponential model.
4 Kernel density estimates for $\sigma^2_t$ for the Matérn $\nu = 3/2$ model.
5 Kernel density estimates for $c_t$ for the Matérn $\nu = 3/2$ model.
6 Kernel density estimates for $\sigma^2_{QB}$ for the exponential model.
7 Kernel density estimates for $c_{QB}$ for the exponential model.
8 Kernel density estimates for $\sigma^2_{QB}$ for the Matérn $\nu = 3/2$ model.
9 Kernel density estimates for $c_{QB}$ for the Matérn $\nu = 3/2$ model.
[Figure 1 plot, titled "Tapered covariance functions": x-axis distance h, from 0 to 20; y-axis from 0.0 to 1.0; legend: exponential cov. fn., tapered exponential cov. fn., Matérn ν = 3/2 cov. fn., tapered Matérn ν = 3/2 cov. fn.]
Figure 1: Covariance and tapered covariance functions for L = 3, 5, and 15.
[Figure 2 plot: density panels for n = 100, 400, 1600, 2500 and L = 3, 5, 15.]
Figure 2: Kernel density estimates for $\sigma^2_t$ for the exponential model.
[Figure 3 plot: density panels for n = 100, 400, 1600, 2500 and L = 3, 5, 15.]
Figure 3: Kernel density estimates for $c_t$ for the exponential model.
[Figure 4 plot: density panels for n = 100, 400, 1600, 2500 and L = 3, 5, 15.]
Figure 4: Kernel density estimates for $\sigma^2_t$ for the Matérn $\nu = 3/2$ model.
[Figure 5 plot: density panels for n = 100, 400, 1600, 2500 and L = 3, 5, 15.]
Figure 5: Kernel density estimates for $c_t$ for the Matérn $\nu = 3/2$ model.
[Figure 6 plot: density panels for n = 100, 400, 1600, 2500 and L = 3, 5, 15.]
Figure 6: Kernel density estimates for $\sigma^2_{QB}$ for the exponential model.
[Figure 7 plot: density panels for n = 100, 400, 1600, 2500 and L = 3, 5, 15.]
Figure 7: Kernel density estimates for $c_{QB}$ for the exponential model.
[Figure 8 plot: density panels for n = 100, 400, 1600, 2500 and L = 3, 5, 15.]
Figure 8: Kernel density estimates for $\sigma^2_{QB}$ for the Matérn $\nu = 3/2$ model.
[Figure 9 plot: density panels for n = 100, 400, 1600, 2500 and L = 3, 5, 15.]
Figure 9: Kernel density estimates for $c_{QB}$ for the Matérn $\nu = 3/2$ model.