hierarchical modelling for large spatial...

4
Hierarchical Modelling for Large Spatial Datasets Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department of Geography, Michigan State University, Lansing Michigan, U.S.A. October 15, 2012 1 Big N problem Introduction The Big n issue Univariate spatial regression Y = Xβ + w + , Estimation involves (σ 2 R(φ)+ τ 2 I ) -1 , which is n × n. Matrix computations occur in each MCMC iteration. Known as the “Big-N problem” in geostatistics. Approach: Use a model Y = Xβ + Zw * + . But what Z? 2 UNL Department of Statstics Spatio-temporal Workshop Big N problem Predictive Process Models Consider “knots” S * = {s * 1 ,..., s * n * } with n * << n. Let w * = {w(s * i )} n * i=1 Z(θ)= {cov(w(s i ),w(s * j ))} {var(w * )} -1 is n × n * . Predictive process regression model Y = Xβ + Z(θ)w * + , Fitting requires only n * × n * matrix computations (n * << n). Key attraction: The above arises as a process model: ˜ w(s) GP (02 w ˜ ρ(·; φ)) instead of w(s). ˜ ρ(s 1 , s 2 ; φ)= cov(w(s 1 ), w * )var(w * ) -1 cov(w * ,w(s 2 )) 3 UNL Department of Statstics Spatio-temporal Workshop Big N problem Selection of knots Knots: A “Knotty” problem?? Knot selection: Regular grid? More knots near locations we have sampled more? Formal spatial design paradigm: maximize information metrics (Zhu and Stein, 2006; Diggle & Lophaven, 2006) Geometric considerations: space-filling designs (Royle & Nychka, 1998); various clustering algorithms Compare performance of estimation of range and smoothness by varying knot size. Stein (2007, 2008): method may not work for fine-scale spatial data Still a popular choice – seamlessly adapts to multivariate and spatiotemporal settings. 4 UNL Department of Statstics Spatio-temporal Workshop Big N problem Selection of knots 0 50 100 150 200 0 5 10 15 20 25 knots tauˆ2 0 50 100 150 200 5 UNL Department of Statstics Spatio-temporal Workshop Big N problem Comparisons: Unrectified VS Rectified A rectified predictive process is defined as ˜ w ˜ (s)=˜ w(s)+˜ (s), where ˜ (s) indep N (02 w (1 - r(s, φ) R *-1 (φ)r(s, φ))). Maximum likelihood estimates of τ 2 : # of Knots Predictive Process Rectified Predictive Process 25 1.56941 1.00786 36 1.65688 1.15386 64 1.45169 1.08358 100 1.37916 1.09657 225 1.27391 1.08985 400 1.22429 1.09489 625 1.21127 1.09998 exact 1.14414 1.14414 6 UNL Department of Statstics Spatio-temporal Workshop

Upload: phungminh

Post on 27-May-2018

222 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Hierarchical Modelling for Large Spatial Datasetsblue.for.msu.edu/UNL_12/SC/slides/PredictiveProcess-6up.pdf · Hierarchical Modelling for Large Spatial Datasets ... 2 UNL Department

Hierarchical Modelling for Large SpatialDatasets

Sudipto Banerjee1 and Andrew O. Finley2

1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A.

2 Department of Forestry & Department of Geography, Michigan State University, Lansing Michigan, U.S.A.

October 15, 2012

1

Big N problem Introduction

The Big n issue

Univariate spatial regression

Y = Xβ + w + ε,

Estimation involves (σ2R(φ) + τ2I)−1, which is n× n.

Matrix computations occur in each MCMC iteration.

Known as the “Big-N problem” in geostatistics.

Approach: Use a model Y = Xβ + Zw∗ + ε. But what Z?

2 UNL Department of Statstics Spatio-temporal Workshop

Big N problem Predictive Process Models

Consider “knots” S ∗ = {s∗1, . . . ,s∗n∗} with n∗ << n.Let w∗ = {w(s∗i )}n

∗i=1

Z(θ) = {cov(w(si), w(s∗j ))}′{var(w∗)}−1 is n× n∗.

Predictive process regression model

Y = Xβ + Z(θ)w∗ + ε,

Fitting requires only n∗ × n∗ matrix computations(n∗ << n).Key attraction: The above arises as a process model:w̃(s) ∼ GP (0, σ2wρ̃(·;φ)) instead of w(s).ρ̃(s1,s2;φ) = cov(w(s1),w∗)var(w∗)−1cov(w∗, w(s2))

3 UNL Department of Statstics Spatio-temporal Workshop

Big N problem Selection of knots

Knots: A “Knotty” problem??

Knot selection: Regular grid? More knots near locationswe have sampled more?

Formal spatial design paradigm: maximize informationmetrics (Zhu and Stein, 2006; Diggle & Lophaven, 2006)

Geometric considerations: space-filling designs (Royle &Nychka, 1998); various clustering algorithms

Compare performance of estimation of range andsmoothness by varying knot size.

Stein (2007, 2008): method may not work for fine-scalespatial data

Still a popular choice – seamlessly adapts to multivariateand spatiotemporal settings.

4 UNL Department of Statstics Spatio-temporal Workshop

Big N problem Selection of knots

0 50 100 150 200

05

1015

2025

knots

tauˆ

2

0 50 100 150 200

5 UNL Department of Statstics Spatio-temporal Workshop

Big N problem Comparisons: Unrectified VS Rectified

A rectified predictive process is defined as

w̃ε̃(s) = w̃(s) + ε̃(s), where

ε̃(s)indep∼ N(0, σ2w(1− r(s,φ)′R∗−1(φ)r(s,φ))).

Maximum likelihood estimates of τ2:

# of Knots Predictive Process Rectified Predictive Process25 1.56941 1.0078636 1.65688 1.1538664 1.45169 1.08358

100 1.37916 1.09657225 1.27391 1.08985400 1.22429 1.09489625 1.21127 1.09998

exact 1.14414 1.14414

6 UNL Department of Statstics Spatio-temporal Workshop

Page 2: Hierarchical Modelling for Large Spatial Datasetsblue.for.msu.edu/UNL_12/SC/slides/PredictiveProcess-6up.pdf · Hierarchical Modelling for Large Spatial Datasets ... 2 UNL Department

Illustration

Illustration from:

Finley, A.O., S. Banerjee, P. Waldmann, and T. Ericsson. (2008)Hierarchical spatial modeling of additive and dominancegenetic variance for large spatial trial datasets. Biometrics.DOI:10.1111/j.1541-0420.2008.01115.x

7 UNL Department of Statstics Spatio-temporal Workshop

Illustration Univariate random effects models

Univariate random effects models

Modeling genetic variation in Scots pine (Pinus sylvestris L.),long-term progeny study in northern Sweden.

Quantitative genetics: studies the inheritance of polygenictraits, focusing upon estimation of additive genetic variance, σ2a,and the heritability h2 = σ2a/σ

2Tot, where the σ2Tot represents the

total genetic and unexplained variation.

A high heritability, h2, should result in a larger selectionresponse (i.e., a higher probability for genetic gain in futuregenerations).

8 UNL Department of Statstics Spatio-temporal Workshop

Illustration Univariate random effects models

Observed trees

Observed height

Data overview:

established in 1971 (bySkogforsk)

partial diallel design of52 parent trees

8,160 planted randomlyon 2.2m squares

1997 reinventory of4,970 surviving trees,height, DBH, branchangle, etc.

9 UNL Department of Statstics Spatio-temporal Workshop

Illustration Univariate random effects models

Genetic effects model:

Yi = xTi β + ai + di + εi,

a = [ai]ni=1 ∼MVN(0, σ2aA)

d = [di]ni=1 ∼MVN(0, σ2dD)

ε = [εi]ni=1 ∼ N(0, τ2In)

A and D are fixed relationship matrices (See e.g., Henderson,1985; Lynch and Walsh,1998)

Note, genetic variance is further partitioned into additive andthe non-additive dominance component σ2d

10 UNL Department of Statstics Spatio-temporal Workshop

Illustration Univariate random effects models

Genetic effects model:

Yi = xTi β + ai + di + εi,

Common feature is systematic heterogeneity amongobservational units (i.e., violation of ε ∼ N(0, τ2In))

Spatial heterogeneity arises from:soil characteristicsmicro-climateslight availability

Residual correlation among units as a function of distanceand/or direction = erroneous parameter estimates (e.g.,biased h2)

11 UNL Department of Statstics Spatio-temporal Workshop

Illustration Univariate random effects models

Genetic model results

Parameter credible intervals, 50% (2.5%, 97.5%) for the non-spatial modelsScots pine trial.

Non-spatialParameter Add. Add. Dom.

β 72.53 (69.66, 75.08) 72.27 (70.04, 74.57)σ2a 31.94 (18.30, 49.85) 25.23 (14.12, 43.96)σ2d – 22.37 (11.24, 40.11)τ2 133.60 (121.18, 144.70) 116.14 (100.51, 127.76)h2 0.19 (0.12, 0.28) 0.15 (0.09, 0.26)

12 UNL Department of Statstics Spatio-temporal Workshop

Page 3: Hierarchical Modelling for Large Spatial Datasetsblue.for.msu.edu/UNL_12/SC/slides/PredictiveProcess-6up.pdf · Hierarchical Modelling for Large Spatial Datasets ... 2 UNL Department

Illustration Univariate random effects models

Genetic model results, cont’d.

Residual surface Residual semivariogram

So, ε � N(0, τ2In). Consider a spatial model.

13 UNL Department of Statstics Spatio-temporal Workshop

Illustration Univariate random effects models

Previous approaches to accommodating residual spatialdependence:

Manipulate the mean functionconstructing covariates using residuals from neighboringunits (see e.g., Wilkinson et al., 1983; Besag and Kempton,1986; Williams, 1986)

Geostatiticalspatial process formed AR(1)col ⊗AR(1)row (Martin, 1990;Cullis et al., 1998)classical geostatistical method (Zimmerman and Harville,1991)

All are computationally feasible, but ad hoc and/or restrictivefrom a modeling perspective.

14 UNL Department of Statstics Spatio-temporal Workshop

Illustration Univariate random effects models

Spatial model for genetic trials:

Y (si) = xT (si)β + ai + di + w(si) + εi,

a = [ai]ni=1 ∼MVN(0, σ2aA)

d = [di]ni=1 ∼MVN(0, σ2dD)

w = [w(si)]ni=1 ∼MVN(0, σ2wC(θ))

ε = [εi]ni=1 ∼ N(0, τ2In)

Tools used to estimate parameters:Markov chain Monte Carlo (MCMC) - iterative

Gibbs sampler (β, a, d, w)Metropolis-Hastings and Slice samplers (θ)

Here MCMC is computationally infeasible because of Big-N!

15 UNL Department of Statstics Spatio-temporal Workshop

Illustration Univariate random effects models

Trick to sample genetic effects:

Gibbs draw for random effects, e.g., a|· ∼MVN(µa|·,Σa|·),

where calculating Σa|· =[

1σ2aA−1 + In

τ2

]−1is computationally

expensive!

However A and D are known, so use initial spectraldecomposition i.e., A−1 = P TΛ−1P .

Thus, Σa|· = P T(

1σ2aΛ−1 + 1

τ2I)−1

P to achieve computationalbenefits.

16 UNL Department of Statstics Spatio-temporal Workshop

Illustration Univariate random effects models

Unfortunately, this trick does not work for w. Rather, weproposed the knot-based predictive process.

Corresponding predictive process model:

Y (si) = xT (si)β + ai + di + w̃(si) + εi,

w̃(si) = c(si;θ)TC(θ)∗−1(θ)w∗

where, w∗ = [w(s∗i )]

mi=1 ∼MVN(0, C∗(θ)) and C∗(θ) = [C(s∗

i , s∗j ;θ)]

mi,j=1

Projection =⇒

17 UNL Department of Statstics Spatio-temporal Workshop

Illustration Univariate random effects models

w̃ can accomodate complex spatial dependence structures,E.g., anisotropic Matérn correlation function:ρ(si,sj ;θ) =

(1/Γ(ν)2ν−1

) (2√νdij)

νκν(2√νdij

), where

dij = (si − sj)T Σ−1 (si − sj), Σ = G(ψ)Λ2GT (ψ). Thus,θ = (ν, ψ,Λ).

Simulated Predicted

18 UNL Department of Statstics Spatio-temporal Workshop

Page 4: Hierarchical Modelling for Large Spatial Datasetsblue.for.msu.edu/UNL_12/SC/slides/PredictiveProcess-6up.pdf · Hierarchical Modelling for Large Spatial Datasets ... 2 UNL Department

Illustration Univariate random effects models

Genetic + spatial effects models

Candidate spatial models (i.e., specifications of C∗(θ)):1 AR(1)col ⊗AR(1)row2 isotropic Matérn3 anisotropic Matérn

Each model evaluated using 64, 144, and 256 knot grids.

Model choice using Deviance Information Criterion (DIC)(Spiegelhalter et al., 2002)

19 UNL Department of Statstics Spatio-temporal Workshop

Illustration Univariate random effects models

Table: Model comparisons using the DIC criterion for the Scots pinedataset.

Model pD DICNon-spatial

Add. 306.40 15,618.09Add. Dom. 555.92 15,547.85

Spatial Isotropic64 Knots 639.77 14,877.51144 Knots 739.61 14,814.89256 Knots 802.29 14,771.64

Spatial Anisotropic64 Knots 678.82 14,884.13144 Knots 748.89 14,823.90256 Knots 806.46 14,781.53

20 UNL Department of Statstics Spatio-temporal Workshop

Illustration Univariate random effects models

Genetic + spatial effects models results

Parameter credible intervals, 50% (2.5%, 97.5%) for the isotropic Matérn and64 and 256 knots Scots pine trial.

SpatialParameter 64 Knots 256 Knots

β 72.53 (69.00, 76.05) 74.21 (69.66, 79.66)σ2a 26.87 (17.14, 41.82) 33.03 (18.19, 53.69)σ2d 11.69 (6.00, 34.27) 13.96 (7.65, 27.05)

σ2w 41.84 (23.71, 73.34) 50.36 (30.24, 88.10)τ2 89.55 (72.11, 99.65) 80.75 (67.90, 96.16)ν 0.83 ( 0.31, 1.46) 0.47 (0.26, 1.28)φ 0.05 (0.02, 0.09) 0.04 (0.02, 0.09)

Eff. Range 71.00 (44.66, 127.93) 74.59 (45.22, 129.83)h2 0.21 (0.13, 0.31) 0.25 (0.15, 0.39)

Decrease in τ2 due to removal of spatial variation, resultsin increase in h2 (i.e., ∼ 0.25 vs. ∼ 0.15 with confounding).

21 UNL Department of Statstics Spatio-temporal Workshop

Illustration Univariate random effects models

Genetic + spatial effects models results, cont’d.

Genetic model residuals w̃(s), 64 knots w̃(s), 256 knots

Predictive process – balance model richness withcomputational feasibility (e.g., 4,970×4,970 vs. 64×64).

22 UNL Department of Statstics Spatio-temporal Workshop

Summary

Summary

Challenge - to meet modeling needs:

ensure computationally feasiblereduce algorithmic complexity = cheap tricks (e.g., spectraldecomp. of A prior to MCMC)reduce dimensionality = predictive process

maintain richness and flexibilityfocus on the model not how to estimate the parameters =embrace new tools (MCMC) for estimating highly flexiblehierarchical models

truly acknowledge sources of uncertaintypropagate uncertainty through hierarchical structures (e.g.,recognize uncertainty in C(θ))

23 UNL Department of Statstics Spatio-temporal Workshop