
Environmetrics Special Issue Paper

Received: 13 October 2013, Revised: 19 December 2013, Accepted: 27 January 2014, Published online in Wiley Online Library: 18 March 2014

(wileyonlinelibrary.com) DOI: 10.1002/env.2266

Physically motivated scale interaction parameterization in reduced rank quadratic nonlinear dynamic spatio-temporal models†

D. W. Gladish* and C. K. Wikle^a

Many environmental spatio-temporal processes are best characterized by nonlinear dynamical evolution. Recently, it has been shown that general quadratic nonlinear models provide a very flexible class of parametric models for such processes. However, such models have a very large potential parameter space that must be reduced for most practical applications, even when one considers a reduced rank state process. We provide a parameterization for such models, which is motivated by physical arguments of wave mode interactions in which medium scales influence the evolution of large-scale modes. This parameterization has the potential to improve forecasts in addition to reducing the parameter space. The methodology is illustrated on real-world forecasting problems associated with Pacific sea surface temperature anomalies and mid-latitude sea level pressure. Copyright © 2014 John Wiley & Sons, Ltd.

Keywords: Bayesian; empirical orthogonal functions; hierarchical; spatial

1. INTRODUCTION

Spatio-temporal processes in the environmental sciences can be quite complex in terms of the interactions that take place between process components in time and space. These interactions can be nonstationary in time and/or space and include both linear and nonlinear components. Often, these interactions are scale dependent, with energy cascades that transfer information from one scale to another. This complexity is the main reason why these processes are often modeled deterministically through the numerical solutions of differential equations. Several issues complicate the use of such deterministic models; perhaps the most important is that the models necessarily have a certain amount of reducible and irreducible error. This error could be associated with inadequate knowledge of the dynamical system, uncertainty in the parameters, truncations necessary for implementation, inadequate boundary condition specification, and so on. One may account for the reducible component of this error by adding more scientific content to the model with the addition of random processes (either additive or in terms of the parameters) with dependent covariance structure. Here, we consider an approach that accounts for scientifically plausible interaction between spatial scales of variability in a reduced rank nonlinear framework that provides a physically realistic parameter reduction.

Since their initial use in hierarchical spatio-temporal statistical models (Wikle and Hooten, 2010), general quadratic nonlinear (GQN) dynamic models have proven useful for a number of environmental processes (Wikle and Holan, 2011; Majda and Yuan, 2012; Majda and Harlim, 2013; Leeds et al., 2013; and see the overview in Cressie and Wikle, 2011). In their typical usage, these models are either implemented in a mechanistically motivated context, in which the model structures are suggested by the discretization of partial differential equations, or more often, in the rank-reduced “spectral” context due to the curse of dimensionality. Typically, this rank reduction must be fairly dramatic as there are on the order of $p^3$ parameters, where $p$ is the dimension of the rank-reduced process. Such models accommodate the remaining (assumed to be non-dynamical) medium-scale and small-scale spectral components of the model in the residual covariance term. Wikle and Hooten (2010) note that in some cases, this is physically realistic on the basis of a “Reynolds averaging” argument from turbulence theory. That is, the assumption is that the nonlinear interactions occur with the large scales, interactions between large and small scales correspond to random parameters multiplied by the large-scale components (i.e., linearly), and the interactions between the many small scales are averaged together, implying a dependent covariance structure.

In many situations, the removed medium-scale modes correspond to phenomena that might be important in representing the process and, more critically, may be important in their connection to the evolution of the larger scale modes. There are often physical reasons why this may be the case, such as energy cascades in fluid dynamics, where the energy in the medium-scale flow interacts to contribute to evolution of the larger scale modes (e.g., Wiin-Nielsen and Chen, 1993). Thus, it is important to develop flexible and efficient models that can accommodate this behavior. Importantly, by allowing these medium-scale modes to interact with the large-scale modes nonlinearly, but constraining the medium scales to evolve linearly, these models provide the important nonlinear scale interaction while reducing the number of parameters that need to be estimated, which is critical in the GQN statistical modeling framework.

* Correspondence to: D. W. Gladish, Department of Statistics, University of Missouri, 146 Middlebush Hall, Columbia, MO 65211 U.S.A. E-mail: [email protected]

a Department of Statistics, University of Missouri, Columbia, MO, 65211, U.S.A.

† This article is published in Environmetrics as part of a Special Issue on Physical-statistical modelling; guest-edited by Petra Kuhnert, CSIRO Computational Informatics, PMB 2, Glen Osmond, SA 5064 Australia.

Environmetrics 2014; 25: 230–244 Copyright © 2014 John Wiley & Sons, Ltd.

Here, we explore such an extension to the GQN framework that can accommodate the presence of medium-scale spectral modes influencing the larger scale modes either linearly or nonlinearly. The methodology is described in Section 2. We then consider various forms of this model relative to two real-world data sets in Section 3. First, we consider the long-lead forecasting of tropical Pacific sea surface temperature (SST), which has often been used as an illustration of GQN, and then the short-term forecasting of sea level pressure (SLP). This is followed by a discussion in Section 4.

2. MODEL DEVELOPMENT AND METHODOLOGY

This section describes the development of a physically motivated scale interaction parameterization for a reduced rank GQN dynamic spatio-temporal model. After a review of GQN, a hierarchical model is presented that includes a decomposition of the spatio-temporal process into the sum of two reduced rank processes corresponding to large-scale and medium-scale spatial modes, respectively. The critical modeling component then corresponds to the inclusion of a term to represent the interaction of the medium-scale modes to influence the evolution of the large-scale modes. The medium-scale modes are assumed to evolve linearly, and conjugate priors are assigned to parameters and covariance matrices. This is then followed by a discussion of the Markov chain Monte Carlo (MCMC) Bayesian implementation.

2.1. General quadratic nonlinearity

Following the notation in Wikle and Hooten (2010), consider a time-varying spatial process $\mathbf{Y}_t \equiv (Y_t(\mathbf{s}_1), \ldots, Y_t(\mathbf{s}_n))'$ for $t = 1, \ldots, T$ at spatial locations $\mathbf{s}_1, \ldots, \mathbf{s}_n$, where $\mathbf{s}_i \in D_s \subset \mathbb{R}^d$ (where typically, $d = 2$). Then, the GQN univariate formulation is given by

$$Y_t(\mathbf{s}_i) = \sum_{j=1}^{n} a_{ij}\, Y_{t-1}(\mathbf{s}_j) + \sum_{k=1}^{n} \sum_{l=1}^{n} b_{i,kl}\, Y_{t-1}(\mathbf{s}_k)\, g(Y_{t-1}(\mathbf{s}_l); \boldsymbol{\theta}_G) + \eta_t(\mathbf{s}_i) \tag{1}$$

for $i = 1, \ldots, n$, where $a_{ij}$ are linear evolution parameters, $b_{i,kl}$ are nonlinear evolution parameters, $g(\cdot)$ is some function that may transform $Y_{t-1}(\mathbf{s})$, and $\boldsymbol{\theta}_G$ are the parameters associated with this function. The fact that $g(\cdot)$ can accommodate transformations of the process is the origin of the term “general” in this framework (e.g., Wikle and Hooten, 2010). Here, $\{\eta_t(\mathbf{s}_i)\}$ is a stochastic error process that accounts for model uncertainty and/or forcing and is assumed to contain the effects of small-scale interactions. Typically, this error process is assumed to have zero mean and to be uncorrelated with $\{Y_{t-1}(\mathbf{s}_i)\}$. Further, this process is generally assumed to be correlated in space but not time, although this assumption may be relaxed depending on the application.

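The univariate formulation of GQN given in (1) can be written in various matrix forms. One matrix formulation illustrating that this model corresponds to a state-dependent evolution process is given by

$$\mathbf{Y}_t = \mathbf{A}\mathbf{Y}_{t-1} + \left(\mathbf{I}_n \otimes \mathcal{G}(\mathbf{Y}_{t-1}; \boldsymbol{\theta}_G)'\right)\mathbf{B}\mathbf{Y}_{t-1} + \boldsymbol{\eta}_t = \left(\mathbf{A} + \left(\mathbf{I}_n \otimes \mathcal{G}(\mathbf{Y}_{t-1}; \boldsymbol{\theta}_G)'\right)\mathbf{B}\right)\mathbf{Y}_{t-1} + \boldsymbol{\eta}_t$$

where $\otimes$ denotes the Kronecker product, $\mathcal{G}(\mathbf{Y}_{t-1}; \boldsymbol{\theta}_G) \equiv (g(Y_{t-1}(\mathbf{s}_1); \boldsymbol{\theta}_G), \ldots, g(Y_{t-1}(\mathbf{s}_n); \boldsymbol{\theta}_G))'$, $\mathbf{A} \equiv (a_{ij})_{n \times n}$, and $\mathbf{B}$ is an $n^2 \times n$ matrix given by

$$\mathbf{B} \equiv \begin{pmatrix} \mathbf{B}_1 \\ \vdots \\ \mathbf{B}_n \end{pmatrix} \tag{2}$$

where $\mathbf{B}_i \equiv (b_{i,kl})_{n \times n}$. Note that, for almost any realistic spatio-temporal process, $\mathbf{A}$ and $\mathbf{B}$ will have on the order of $n^2$ and $n^3$ parameters, respectively, which makes their efficient estimation impractical. For this reason, one can parameterize these on the basis of some mechanistic model. That is, the mechanistic model “motivates” the structure of these matrices (Wikle and Hooten, 2010). For example, Cressie and Wikle (2011, Section 7.3.3) described several cases where discretizing a nonlinear partial differential equation suggests the forms of $\mathbf{A}$ and $\mathbf{B}$, which are both typically very sparse.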
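The correspondence between the summation form (1) and the Kronecker form can be checked numerically. The following is a minimal sketch, not the authors' code: the dimension `n`, the random coefficients, and the choice `g = tanh` are all illustrative assumptions, and the error term is omitted. One convention is assumed and worth flagging: each block is stacked as the transpose of the array `b[i]`, so that row `i` of the Kronecker product reproduces the double sum in (1) term by term even when `g` is not the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                  # hypothetical number of spatial locations
A = rng.normal(size=(n, n))            # linear evolution parameters a_ij
b = rng.normal(size=(n, n, n))         # b[i, k, l] plays the role of b_{i,kl}
y_prev = rng.normal(size=n)            # Y_{t-1}
g = np.tanh                            # hypothetical transformation g(.)

# Summation form (1), without the error term.
gy = g(y_prev)
y_sum = A @ y_prev + np.einsum("ikl,k,l->i", b, y_prev, gy)

# Matrix form: stack n blocks into an (n^2 x n) matrix B as in (2).
# Assumed convention: block i is b[i].T, so G' B_i Y equals the double sum.
B = np.vstack([b[i].T for i in range(n)])
state_dependent = A + np.kron(np.eye(n), gy[None, :]) @ B   # n x n operator
y_mat = state_dependent @ y_prev

print(np.allclose(y_sum, y_mat))
```

The operator `state_dependent` changes with `y_prev`, which is the sense in which the model is a state-dependent evolution process.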

Alternatively, one can consider the GQN to apply to a reduced rank process (i.e., the coefficients associated with a reduced rank basis expansion, such that the new dimension is $p$, where $p \ll n$). This can be quite effective, but there can still be quite a large number of parameters (order $p^3$), which can be problematic. Wikle and Holan (2011) mitigated this by applying Bayesian stochastic search variable selection to the parameters, which helps to shrink the parameters toward zero, and facilitates estimation and prediction through the implicit model averaging. The focus here is on a simple, yet physically motivated, alternative approach that reduces parameter dimensionality by considering the evolution of the large-scale modes to be impacted by the medium-scale modes but not the other way around.

2.2. The general quadratic nonlinear scale-interaction hierarchical model

For our model development, we utilize the Bayesian hierarchical model framework as outlined by Berliner (1996). First, we define the data vector of length $m_t$ as $\mathbf{Z}_t = (Z_t(\mathbf{r}_1), \ldots, Z_t(\mathbf{r}_{m_t}))'$, $t = 1, \ldots, T$, for $m_t$ spatial locations $\{\mathbf{r}_1, \ldots, \mathbf{r}_{m_t}\} \subset D_s \subset \mathbb{R}^d$, where $D_s$ is the spatial domain of interest and $d$ the dimension of the spatial domain (hereafter assumed to be $d = 2$). Define $\mathbf{H}_t$ as some $m_t \times n$ matrix that relates the data to the underlying process $\mathbf{Y}_t = (Y_t(\mathbf{s}_1), \ldots, Y_t(\mathbf{s}_n))'$, where $\{\mathbf{s}_1, \ldots, \mathbf{s}_n\} \subset D_s$. Thus, our observation locations may not coincide with the process locations (although they do in the examples described in the succeeding text). Then, our data model is given by

$$\mathbf{Z}_t = \mathbf{H}_t \mathbf{Y}_t + \boldsymbol{\epsilon}_t \tag{3}$$

where here we assume that $\boldsymbol{\epsilon}_t \sim \mathrm{Gau}(\mathbf{0}, \sigma^2_\epsilon \mathbf{I})$ and $\sigma^2_\epsilon$ is the associated observation variance that we assume is known. Some applications may involve a more complicated error covariance structure, non-Gaussian data, or variances and covariances that must be modeled or estimated.

The process model is our focus here. Specifically, we consider a dimension reduction approach. Often, in such a rank-reduced context, a dynamic spatio-temporal process will be modeled “spectrally”. That is, the true underlying dynamical process is modeled in terms of a projection onto spatial basis functions, with the dynamic modeling focus then on the random projection coefficients (e.g., see the overview in Cressie and Wikle, 2011, Section 7.2.6). For a spatio-temporal process $\mathbf{Y}_t$, it is often reasonable to consider such a decomposition in terms of two reduced rank processes (Wikle et al., 2001)

$$\mathbf{Y}_t = \boldsymbol{\Phi}^{(1)} \boldsymbol{\alpha}_t + \boldsymbol{\Phi}^{(2)} \boldsymbol{\beta}_t + \boldsymbol{\nu}_t \tag{4}$$

where $\boldsymbol{\Phi}^{(i)}$, $i = 1, 2$, correspond to $n \times p$ and $n \times q$ matrices containing large-scale and medium-scale (and/or small) basis functions, respectively. Then, $\boldsymbol{\alpha}_t$ and $\boldsymbol{\beta}_t$ correspond to vectors of large-scale and medium-scale expansion coefficients of lengths $p$ and $q$, respectively, and $\boldsymbol{\nu}_t$ is an error process assumed to be independent of $\boldsymbol{\alpha}_t$ and $\boldsymbol{\beta}_t$, which accounts for the remaining medium-scale spectral modes. In our case, we assume $\boldsymbol{\nu}_t \sim \mathrm{Gau}(\mathbf{0}, \sigma^2_\nu \mathbf{I})$, where $\sigma^2_\nu$ is some unknown variance. However, this process could include dependence if necessary.

The fact that we have the second basis expansion in (4) is important. In many reduced rank spatial statistics applications, the model contains only one set of basis coefficients, and the truncated components are ignored or assumed to be independent and absorbed into the residual error process. This can lead to serious oversmoothing and is not appropriate for many processes (e.g., see the discussion in Stein, 2013).

As described in Cressie and Wikle (2011, pg. 401), there are many possible choices for the basis functions in (4), including empirical orthogonal functions (EOFs), Fourier bases, wavelets, bisquares, splines, and various smoothing kernels. Typically, such spatial basis functions are associated with a range of spatial scales. Thus, one may consider a single basis type for both $\boldsymbol{\Phi}^{(1)}$ and $\boldsymbol{\Phi}^{(2)}$, corresponding to different spatial scales (e.g., the first $p$ basis functions in an EOF expansion for $\boldsymbol{\Phi}^{(1)}$ and the next $q$ basis functions in that expansion for $\boldsymbol{\Phi}^{(2)}$). Alternatively, one may have different basis types. For example, Wikle et al. (2001) considered the first $p$ normal modes from a theoretical shallow water system for $\boldsymbol{\Phi}^{(1)}$ to account for large-scale tropical waves and then multiresolution basis functions for $\boldsymbol{\Phi}^{(2)}$ to account for turbulent scaling effects in tropical winds. Thus, $\boldsymbol{\Phi}^{(1)}$ and $\boldsymbol{\Phi}^{(2)}$ may or may not be orthogonal, depending on the choice of basis function. In cases where there is overlap between spatial scales, the prior specification for $\boldsymbol{\alpha}_t$ and $\boldsymbol{\beta}_t$ can be critical (e.g., Wikle et al., 2001). Finally, we note that there is no reason why additional basis sets could not be considered, but such a formulation would typically be motivated by the particular dynamics of the application.
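For the single-basis case, where the first $p$ EOFs form the large-scale basis and the next $q$ form the medium-scale basis, the split might be sketched as follows. This is an illustration under stated assumptions rather than the paper's procedure: EOFs are computed by a plain SVD of the time-centered data matrix, the data are synthetic, and the function name `eof_split` is hypothetical.

```python
import numpy as np

def eof_split(Y, p, q):
    """Split an EOF expansion into large-scale and medium-scale parts.

    Y is a (T x n) array of anomalies (rows are times, columns are locations).
    Returns Phi1 (n x p), Phi2 (n x q), alpha (T x p), beta (T x q), and the
    fraction of variance associated with each EOF.
    """
    Yc = Y - Y.mean(axis=0)                        # remove the temporal mean
    _, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    Phi = Vt.T                                     # columns are EOFs
    Phi1, Phi2 = Phi[:, :p], Phi[:, p:p + q]
    alpha, beta = Yc @ Phi1, Yc @ Phi2             # projection coefficients
    var_frac = s**2 / np.sum(s**2)
    return Phi1, Phi2, alpha, beta, var_frac

# Synthetic example: 60 time points on 25 locations (values are arbitrary).
rng = np.random.default_rng(1)
Y = rng.normal(size=(60, 25))
Phi1, Phi2, alpha, beta, var_frac = eof_split(Y, p=3, q=4)
print(var_frac[:7].sum())                          # variance captured by P = p + q modes
```

With EOFs from one SVD, the two bases are orthogonal by construction; that would not hold in general for mixed basis types such as normal modes plus multiresolution functions.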

Although modeling a spatial process in terms of the large-scale basis coefficients alone is often unrealistic, it is much more realistic to model the dynamical evolution of the process in terms of such a reduced rank process because the dynamics often exist on a lower-dimensional manifold. Thus, our primary interest is in the evolution of the large-scale coefficients given by $\boldsymbol{\alpha}_t$. However, in many processes, there is a scale interaction across components of the spatial spectrum, such as that caused by nonlinear advection in atmospheric and ocean dynamics. For example, considering the spectral distribution of kinetic energy in nondivergent atmospheric flow, Wiin-Nielsen and Chen (1993, Chap. 10) showed that the medium scales transfer kinetic energy primarily to the large scales. Although we are not strictly interested in modeling only nondivergent flows, the notion of medium scales contributing to the evolution of large-scale modes is appealing more generally, so this physical relationship suggests a mechanistically motivated parameterization for GQN statistical models that are formulated in terms of spatial basis functions. This provides a novel extension of the mechanistically motivated parameterizations suggested in Wikle and Hooten (2010) for GQN models formulated in physical space. Critically, our parameterization does not require the discretization of a specific differential equation but could be motivated by such a spectral (i.e., Galerkin) discretization.

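Thus, critical to our model development is allowing the propagation of $\boldsymbol{\alpha}_t$ to $\boldsymbol{\alpha}_{t+\tau}$ (where $\tau > 1$) to be influenced by the medium-scale coefficients $\boldsymbol{\beta}_t$ but not allowing $\boldsymbol{\alpha}_t$ to influence $\boldsymbol{\beta}_{t+\tau}$ directly in the dynamical formulation. Indeed, this is the primary novelty of this methodology as it provides a physically realistic way in which to reduce the parameter space in the rank-reduced GQN formulation. Specifically, we consider the following model for the evolution of $\boldsymbol{\alpha}_t$:

$$\boldsymbol{\alpha}_{t+\tau} = \mathbf{M}_\alpha \boldsymbol{\alpha}_t + \left(\mathbf{I}_p \otimes \mathcal{G}(\boldsymbol{\alpha}_t)'\right)\mathbf{M}_{\alpha,Q}\,\boldsymbol{\alpha}_t + \mathbf{M}_{\beta,L}\,\boldsymbol{\beta}_t + \left(\mathbf{I}_p \otimes \mathcal{G}(\boldsymbol{\beta}_t)'\right)\mathbf{M}_{\beta,Q}\,\boldsymbol{\beta}_t + \boldsymbol{\eta}_{t+\tau} \tag{5}$$

for $t = 1, \ldots, T$ and some appropriate time increment $\tau$, where $\boldsymbol{\eta}_t \sim \mathrm{Gau}(\mathbf{0}, \mathbf{Q}_\alpha)$; $\mathbf{M}_\alpha$ corresponds to the linear evolution coefficients for the $\boldsymbol{\alpha}_t$ process, $\mathbf{M}_{\alpha,Q}$ corresponds to the nonlinear evolution coefficients for the $\boldsymbol{\alpha}_t$ process, $\mathbf{M}_{\beta,L}$ corresponds to the linear interactions between $\boldsymbol{\beta}_t$ and $\boldsymbol{\alpha}_{t+\tau}$, $\mathbf{M}_{\beta,Q}$ corresponds to the nonlinear interactions between elements of $\boldsymbol{\beta}_t$ and their impact on $\boldsymbol{\alpha}_{t+\tau}$, and $\mathbf{Q}_\alpha$ is a $p \times p$ covariance matrix. Note that $\mathbf{M}_\alpha$ and $\mathbf{M}_{\beta,L}$ are $p \times p$ and $p \times q$ matrices, while $\mathbf{M}_{\alpha,Q}$ and $\mathbf{M}_{\beta,Q}$ are $p^2 \times p$ and $pq \times q$ matrices that follow the form of (2); the latter dimension is what makes the product $(\mathbf{I}_p \otimes \boldsymbol{\beta}_t')\mathbf{M}_{\beta,Q}\,\boldsymbol{\beta}_t$ conformable. Although we may consider a variety of transformation functions $\mathcal{G}(\cdot)$, for the applications presented here, it is reasonable to specify this function as the identity and hence define $\mathcal{G}(\boldsymbol{\alpha}_t) \equiv \boldsymbol{\alpha}_t$ (similarly for $\boldsymbol{\beta}_t$). A non-identity transformation is most often used for nonlinear reaction–diffusion processes in which there is a nonlinear growth term (e.g., Hooten and Wikle, 2008). For the types of geophysical processes considered here, such a term is not warranted.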
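To make the structure of the evolution equation (5) concrete, the sketch below evaluates its noise-free mean for one step with the identity transformation. It is only an illustration: the dimensions and random coefficient values are arbitrary, the function and variable names are hypothetical, and the medium-scale quadratic matrix is taken as a $pq \times q$ array so that the Kronecker product is conformable.

```python
import numpy as np

def alpha_mean_step(alpha, beta, M_a, M_aQ, M_bL, M_bQ):
    """Noise-free mean of alpha_{t+tau} given alpha_t and beta_t, identity G.

    Shapes: M_a (p x p), M_aQ (p^2 x p), M_bL (p x q), M_bQ (p*q x q).
    """
    p = alpha.size
    I_p = np.eye(p)
    linear = M_a @ alpha + M_bL @ beta
    quad_alpha = np.kron(I_p, alpha[None, :]) @ M_aQ @ alpha   # (I_p ⊗ α') M_{α,Q} α
    quad_beta = np.kron(I_p, beta[None, :]) @ M_bQ @ beta      # (I_p ⊗ β') M_{β,Q} β
    return linear + quad_alpha + quad_beta

# Arbitrary illustrative dimensions and coefficients.
rng = np.random.default_rng(2)
p, q = 2, 3
alpha, beta = rng.normal(size=p), rng.normal(size=q)
M_a, M_bL = rng.normal(size=(p, p)), rng.normal(size=(p, q))
M_aQ, M_bQ = rng.normal(size=(p * p, p)), rng.normal(size=(p * q, q))

step = alpha_mean_step(alpha, beta, M_a, M_aQ, M_bL, M_bQ)

# Element i is a linear part plus two quadratic forms built from the i-th blocks.
check = np.array([
    M_a[i] @ alpha + M_bL[i] @ beta
    + alpha @ M_aQ[i * p:(i + 1) * p] @ alpha
    + beta @ M_bQ[i * q:(i + 1) * q] @ beta
    for i in range(p)
])
print(np.allclose(step, check))
```

The second computation shows that each large-scale coefficient receives its own quadratic form in $\boldsymbol{\alpha}_t$ and in $\boldsymbol{\beta}_t$, while nothing in the update feeds $\boldsymbol{\alpha}_t$ back into the medium-scale coefficients.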

To complete the process model, it remains to include a prior for $\boldsymbol{\beta}_t$. An important assumption with the model presented here is that these medium-scale coefficients account for a relatively small (but important) portion of the variability of $\mathbf{Y}_t$, and although they may in some cases evolve nonlinearly, it is not worth the parameterization cost to model them that way. Rather, their critical importance is in the evolution of the high variance modes, and so we make a plausible assumption that these medium-scale modes evolve linearly. In this case, a reasonable model for linear rank-reduced models is a lagged-$\tau$ linear model (e.g., Cressie and Wikle, 2011, Section 9.1). For computational convenience, we further assume here that these medium-scale modes do not interact and so consider simple univariate models for each mode independently (e.g., see Wikle et al., 2001), although it is simple to include an unstructured transition operator for these modes. Thus, the prior model for $\boldsymbol{\beta}_{t+\tau}$ is given by

$$\boldsymbol{\beta}_{t+\tau} = \mathbf{M}_\beta \boldsymbol{\beta}_t + \boldsymbol{\gamma}_{t+\tau} \tag{6}$$

where $\mathbf{M}_\beta$ is a $q \times q$ diagonal matrix whose elements are given by the vector $\mathbf{m}_\beta \equiv (m_1, \ldots, m_q)'$ and $\boldsymbol{\gamma}_t \sim \mathrm{Gau}(\mathbf{0}, \mathbf{Q}_\beta)$, where $\mathbf{Q}_\beta$ is a $q \times q$ diagonal matrix whose elements contain the variances associated with the mode-specific linear lagged-$\tau$ models.

Lastly, we complete the Bayesian hierarchical model by specifying distributions for the remaining parameters. We choose these distributions so that our full conditionals are conjugate distributions. In particular, we assume the following prior distributions:

$$\begin{aligned}
\boldsymbol{\alpha}_{0,i} &\sim \mathrm{Gau}(\mathbf{a}_0, \boldsymbol{\Sigma}_{a0}), \quad i = 0, \ldots, \tau - 1 \\
\boldsymbol{\beta}_{0,i} &\sim \mathrm{Gau}(\mathbf{b}_0, \boldsymbol{\Sigma}_{b0}), \quad i = 0, \ldots, \tau - 1 \\
\sigma^2_\nu &\sim \mathrm{IG}(q_\nu, r_\nu) \\
\mathrm{vec}(\mathbf{M}_\alpha) \equiv \mathbf{m}_\alpha &\sim \mathrm{Gau}(\tilde{\mathbf{m}}_\alpha, \boldsymbol{\Sigma}_\alpha) \\
\mathrm{vec}(\tilde{\mathbf{M}}_{\alpha,Q}) &\sim \mathrm{Gau}(\tilde{\mathbf{m}}_{\alpha,Q}, \boldsymbol{\Sigma}_{\alpha,Q}) \\
\mathrm{vec}(\mathbf{M}_{\beta,L}) \equiv \mathbf{m}_{\beta,L} &\sim \mathrm{Gau}(\tilde{\mathbf{m}}_{\beta,L}, \boldsymbol{\Sigma}_{\beta,L}) \\
\mathrm{vec}(\tilde{\mathbf{M}}_{\beta,Q}) &\sim \mathrm{Gau}(\tilde{\mathbf{m}}_{\beta,Q}, \boldsymbol{\Sigma}_{\beta,Q}) \\
\mathbf{m}_\beta &\sim \mathrm{Gau}(\tilde{\mathbf{m}}_\beta, \boldsymbol{\Sigma}_\beta) \\
\mathbf{Q}_\alpha^{-1} &\sim \mathrm{Wishart}\left((a_\alpha \mathbf{S}_\alpha)^{-1}, a_\alpha\right) \\
Q_\beta(j, j) &\sim \mathrm{IG}(q_\beta(j), r_\beta(j)), \quad j = 1, \ldots, q
\end{aligned}$$

where we specify the hyperparameters $\mathbf{a}_0$, the $p \times p$ matrix $\boldsymbol{\Sigma}_{a0}$, $\mathbf{b}_0$, the $q \times q$ matrix $\boldsymbol{\Sigma}_{b0}$, $q_\nu$, $r_\nu$, the $p^2$-dimensional vector $\tilde{\mathbf{m}}_\alpha$, the $p^2 \times p^2$ matrix $\boldsymbol{\Sigma}_\alpha$, the $p^2(p+1)/2$-dimensional vector $\tilde{\mathbf{m}}_{\alpha,Q}$, the $p^2(p+1)/2 \times p^2(p+1)/2$ matrix $\boldsymbol{\Sigma}_{\alpha,Q}$, the $pq$-dimensional vector $\tilde{\mathbf{m}}_{\beta,L}$, the $pq \times pq$ matrix $\boldsymbol{\Sigma}_{\beta,L}$, the $pq(q+1)/2$-dimensional vector $\tilde{\mathbf{m}}_{\beta,Q}$, the $pq(q+1)/2 \times pq(q+1)/2$ matrix $\boldsymbol{\Sigma}_{\beta,Q}$, the $q$-dimensional vector $\tilde{\mathbf{m}}_\beta$, the $q \times q$ matrix $\boldsymbol{\Sigma}_\beta$, $a_\alpha$, the $p \times p$ matrix $\mathbf{S}_\alpha$, and $q_\beta(j)$ and $r_\beta(j)$ for $j = 1, \ldots, q$. Note that $\tilde{\mathbf{M}}_{\alpha,Q}$ and $\tilde{\mathbf{M}}_{\beta,Q}$ are reformulations of $\mathbf{M}_{\alpha,Q}$ and $\mathbf{M}_{\beta,Q}$ (see Appendix A for details).
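The dimensions listed for the prior mean vectors give a direct count of the evolution parameters. The sketch below tabulates them and, for comparison, applies the same counting rule to a GQN model in which all $P = p + q$ modes interact quadratically. The function names and the example split $p = 4$, $q = 8$ are hypothetical and chosen only for illustration; they are not the values selected in the paper.

```python
def scale_interaction_dimensions(p, q):
    """Lengths of the evolution-parameter vectors in the scale-interaction model."""
    return {
        "m_alpha": p * p,                      # vec(M_alpha)
        "m_alpha_Q": p * p * (p + 1) // 2,     # reformulated quadratic terms in alpha
        "m_beta_L": p * q,                     # vec(M_beta_L)
        "m_beta_Q": p * q * (q + 1) // 2,      # reformulated quadratic terms in beta
        "m_beta": q,                           # diagonal of M_beta
    }

def full_gqn_dimension(P):
    """Same counting rule applied to a single GQN model on all P modes."""
    return P * P + P * P * (P + 1) // 2

dims = scale_interaction_dimensions(p=4, q=8)   # hypothetical split of P = 12
print(dims, sum(dims.values()), full_gqn_dimension(12))
```

Under this hypothetical split the scale-interaction model has 240 evolution parameters, against 1080 for the full quadratic model on twelve modes, which illustrates the kind of reduction the parameterization is meant to provide.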

2.3. Alternative models

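Our model allows for inherent flexibility depending on whether one wants to include nonlinear effects in the propagation of $\boldsymbol{\alpha}_t$ and whether linear and nonlinear interactions associated with $\boldsymbol{\beta}_t$ are included in the model for $\boldsymbol{\alpha}_t$. Thus, we can consider various sub-models for the evolution of $\boldsymbol{\alpha}_t$ from the general model presented in (5). This allows the flexibility to evaluate the trade-offs between linear and nonlinear components in the model and to gauge the effectiveness of the parameterization proposed here in the context of spatio-temporal prediction. In particular, we consider eight possible models for the evolution of $\boldsymbol{\alpha}_t$, labeled M1 through M8:

$$\begin{aligned}
\text{M1}: \quad \boldsymbol{\alpha}_{t+\tau} &= \mathbf{M}_\alpha \boldsymbol{\alpha}_t + \boldsymbol{\eta}_{t+\tau} \\
\text{M2}: \quad \boldsymbol{\alpha}_{t+\tau} &= \mathbf{M}_\alpha \boldsymbol{\alpha}_t + \mathbf{M}_{\beta,L}\boldsymbol{\beta}_t + \boldsymbol{\eta}_{t+\tau} \\
\text{M3}: \quad \boldsymbol{\alpha}_{t+\tau} &= \mathbf{M}_\alpha \boldsymbol{\alpha}_t + (\mathbf{I}_p \otimes \boldsymbol{\beta}_t')\mathbf{M}_{\beta,Q}\boldsymbol{\beta}_t + \boldsymbol{\eta}_{t+\tau} \\
\text{M4}: \quad \boldsymbol{\alpha}_{t+\tau} &= \mathbf{M}_\alpha \boldsymbol{\alpha}_t + \mathbf{M}_{\beta,L}\boldsymbol{\beta}_t + (\mathbf{I}_p \otimes \boldsymbol{\beta}_t')\mathbf{M}_{\beta,Q}\boldsymbol{\beta}_t + \boldsymbol{\eta}_{t+\tau} \\
\text{M5}: \quad \boldsymbol{\alpha}_{t+\tau} &= \mathbf{M}_\alpha \boldsymbol{\alpha}_t + (\mathbf{I}_p \otimes \boldsymbol{\alpha}_t')\mathbf{M}_{\alpha,Q}\boldsymbol{\alpha}_t + \boldsymbol{\eta}_{t+\tau} \\
\text{M6}: \quad \boldsymbol{\alpha}_{t+\tau} &= \mathbf{M}_\alpha \boldsymbol{\alpha}_t + (\mathbf{I}_p \otimes \boldsymbol{\alpha}_t')\mathbf{M}_{\alpha,Q}\boldsymbol{\alpha}_t + \mathbf{M}_{\beta,L}\boldsymbol{\beta}_t + \boldsymbol{\eta}_{t+\tau} \\
\text{M7}: \quad \boldsymbol{\alpha}_{t+\tau} &= \mathbf{M}_\alpha \boldsymbol{\alpha}_t + (\mathbf{I}_p \otimes \boldsymbol{\alpha}_t')\mathbf{M}_{\alpha,Q}\boldsymbol{\alpha}_t + (\mathbf{I}_p \otimes \boldsymbol{\beta}_t')\mathbf{M}_{\beta,Q}\boldsymbol{\beta}_t + \boldsymbol{\eta}_{t+\tau} \\
\text{M8}: \quad \boldsymbol{\alpha}_{t+\tau} &= \mathbf{M}_\alpha \boldsymbol{\alpha}_t + (\mathbf{I}_p \otimes \boldsymbol{\alpha}_t')\mathbf{M}_{\alpha,Q}\boldsymbol{\alpha}_t + \mathbf{M}_{\beta,L}\boldsymbol{\beta}_t + (\mathbf{I}_p \otimes \boldsymbol{\beta}_t')\mathbf{M}_{\beta,Q}\boldsymbol{\beta}_t + \boldsymbol{\eta}_{t+\tau}
\end{aligned}$$

Models M1 and M5 are the baseline linear and nonlinear models, respectively, for just the large-scale modes. The other models then include the influence of the medium-scale modes on the large-scale coefficients linearly and/or nonlinearly. Note that we do not consider models that are purely nonlinear because the geophysical processes of interest typically include a substantial linear component. However, such models could easily be considered if warranted by the application.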
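The eight sub-models are the $2^3$ on/off combinations of three optional terms added to the always-present linear term in $\boldsymbol{\alpha}_t$. A small sketch of that bookkeeping follows; the class and field names are hypothetical, and the ordering of the flags is chosen only so that enumeration reproduces the labels M1 through M8.

```python
from itertools import product
from typing import NamedTuple

class AlphaModel(NamedTuple):
    alpha_quadratic: bool   # include (I_p ⊗ α') M_{α,Q} α
    beta_quadratic: bool    # include (I_p ⊗ β') M_{β,Q} β
    beta_linear: bool       # include M_{β,L} β

# Enumerating the flags in this order yields M1, ..., M8 as labeled in the text.
MODELS = {
    f"M{i}": AlphaModel(*flags)
    for i, flags in enumerate(product([False, True], repeat=3), start=1)
}

def describe(name):
    """Plain-text right-hand side for the named model."""
    m = MODELS[name]
    terms = ["M_a a_t"]
    if m.alpha_quadratic:
        terms.append("(I_p x a_t') M_aQ a_t")
    if m.beta_linear:
        terms.append("M_bL b_t")
    if m.beta_quadratic:
        terms.append("(I_p x b_t') M_bQ b_t")
    return " + ".join(terms) + " + eta"

for name in MODELS:
    print(name, ":", describe(name))
```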

To compare these eight models, it would be ideal to run a suite of simulation studies. However, as has been discussed by Majda and Yuan (2012) and Majda and Harlim (2013), quadratically nonlinear models are subject to finite-time blow-up (instability) that makes their simulation extremely challenging without significant physical constraints. Thus, instead of simulation studies, we consider two real-world forecasting problems in Section 3, which correspond to different potentially nonlinear physical processes. To our knowledge, this is the first time such a comprehensive suite of linear and nonlinear space-time dynamic statistical models has been applied to real-world environmental forecasting problems. We note that the finite-time blow-up is not typically an issue for one-step ahead prediction, especially when there are data to control the estimation.

2.4. Implementation

By specifying the hierarchical model as given earlier, all parameters are sampled using Gibbs steps in the MCMC unless the quadratic terms in $\boldsymbol{\alpha}_t$ or $\boldsymbol{\beta}_t$ are included. In that case, these process variables are sampled with Metropolis–Hastings within Gibbs steps. For details, the MCMC algorithm for model M8 is given in Appendix A. If we choose to use a reduced model where the quadratic terms of either the $\boldsymbol{\alpha}_t$ or $\boldsymbol{\beta}_t$ coefficients are not included, then the respective parameters may be sampled using a Gibbs step.

A critical issue is how to choose the number of large-scale and medium-scale elements $p$ and $q$, respectively, to use in the models. These may be specified for particular applications, for example, perhaps on the basis of variance explained if EOFs are used for the basis functions. However, in many cases, we do not know a priori how many of each type of mode to consider. Given that these models are very computationally expensive to implement in an MCMC framework, complete cross-validation evaluations to select $p$ and $q$ are not typically practical. Reversible jump procedures could be included in the MCMC, but this is complicated and typically computationally prohibitive for quadratic nonlinear models. As an alternative, we use an ordinary least squares procedure to help identify plausible model orders for these components as described in Appendix B. We make no claim that this procedure is “optimal”, but it does provide a practical decision support tool in the presence of such model complexity.

3. FORECASTING EXAMPLES

To evaluate the importance of the parameterization in which medium-scale modes impact the evolution of large-scale modes, we fit the eight models in Section 2.3 with two real-world data sets. Our interest is forecasting in both applications. The number of large-scale and medium-scale modes for each example was chosen as described in Appendix B.

3.1. Long lead forecasting of tropical Pacific sea surface temperature anomalies

Tropical Pacific SST anomalies exhibit strongly structured variabilities on multiple spatial and temporal scales. In particular, the El Niño/Southern Oscillation phenomenon is evident in this region and is known to be one of the most important sources of climate variability (e.g., Philander, 1990). The El Niño/Southern Oscillation consists of a quasi-periodic warming (El Niño) and cooling (La Niña) in the tropical Pacific on 3- to 5-year time scales. The effects of such warming and cooling influence atmospheric and ecological systems on a global scale, suggesting that it is useful to accurately forecast such effects many months in advance. Over the last couple of decades, models have been developed that have shown promise in producing such “long lead” forecasts (see the overview in Wikle and Holan, 2011). However, statistical models for long-lead SST forecasting typically only include large-scale effects and attribute the medium-scale and small-scale effects to the error term (e.g., Berliner et al., 2000). Thus, this forecasting environment provides an ideal test bed for considering the role of medium-scale effects in linear and nonlinear statistical models, as is our interest here.

We utilize monthly SST data from the Pacific Ocean across 2261 gridded spatial locations at 2° × 2° resolution from January 1970 to March 1998 as described in Wikle and Hooten (2010). Specifically, we use our model to forecast the major 1997 El Niño and the 1998 La Niña events, on the basis of model fits that are out of sample relative to these target predictions. Recent observational models and data assimilation approaches have improved the ability to forecast SSTs with lead times approaching 6–12 months. As such, we choose a lead time of $\tau = 7$ months to forecast the October 1997 El Niño and October 1998 La Niña events given data through March 1997 and March 1998, respectively. This gives $T = 327$ time points (months) to fit the El Niño model and $T = 339$ time points to fit the La Niña model. Figure 1 shows the SST anomalies associated with these El Niño and La Niña events.

For both forecast periods, we consider the eight possible models described in Section 2.3. We use the least squares order selection procedure described in Appendix B to choose p and q from an initial selection of $P = 6$, 8, 10, or 12 EOF components (recall, $P = p + q$), which correspond to 66.6%, 70.9%, 74.1%, and 76.6% of the variance in the SST anomalies, respectively. Specifically, we select the number of components that gives the minimum forecasted prediction error in the Niño 3.4 index. The Niño 3.4 index corresponds to the average of the SST anomalies over the so-called Niño 3.4 region, which is defined as the area bounded by 5°N–5°S and 120°W–170°W. We compare the absolute value of the error between the predicted mean and actual Niño 3.4 index on the basis of various values for p and q. In both the El Niño and La Niña cases, $P = 12$ components were chosen on the basis of this prediction error measure using the linear model as described in Section 2.3.
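The selection criterion above can be illustrated with a small sketch. It assumes anomalies stored as a flattened array with matching latitude and longitude (degrees east) vectors, and it uses a simple unweighted average over the box; the function name `nino34_index`, the toy grid, and the synthetic fields are all hypothetical and are not taken from the paper's data or code.

```python
import numpy as np

def nino34_index(anom, lat, lon_east):
    """Unweighted mean anomaly over the Nino 3.4 box (5S-5N, 170W-120W)."""
    anom = np.asarray(anom, dtype=float)
    lat = np.asarray(lat, dtype=float)
    lon_east = np.asarray(lon_east, dtype=float) % 360.0  # degrees east, 0-360
    # 170W-120W corresponds to 190E-240E on a 0-360 longitude axis.
    in_box = (np.abs(lat) <= 5.0) & (lon_east >= 190.0) & (lon_east <= 240.0)
    if not in_box.any():
        raise ValueError("no grid cells fall inside the Nino 3.4 box")
    return float(anom[in_box].mean())

# Toy 2-degree grid (hypothetical cell centres), flattened to 1-D arrays.
lat_grid, lon_grid = np.meshgrid(np.arange(-29.0, 30.0, 2.0),
                                 np.arange(141.0, 280.0, 2.0), indexing="ij")
lat, lon = lat_grid.ravel(), lon_grid.ravel()

# Synthetic "truth": an anomaly of 2.0 confined to the box, zero elsewhere.
truth = np.where((np.abs(lat) <= 5) & (lon >= 190) & (lon <= 240), 2.0, 0.0)
forecast = 0.5 * truth  # a forecast that under-predicts the event

# Absolute error between forecast and observed index, the quantity compared
# across candidate (p, q) choices.
abs_error = abs(nino34_index(forecast, lat, lon) - nino34_index(truth, lat, lon))
```

An area-weighted mean (for example, weighting by the cosine of latitude) would be a reasonable alternative; within 5° of the equator the difference is small.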

Table 1 shows the number of large-scale (p) and medium-scale (q) components chosen by the prediction criterion in the eight models for the El Niño and La Niña cases. On the basis of these choices for p and q, the full MCMC implementation was run for each model for 20,000 iterations with the first 5000 iterations discarded as burn-in. Convergence was assessed via visual inspection, with no obvious departures from convergence detected. Tables 2 and 3 show the resulting estimated Niño 3.4 index with prediction intervals, along with the truth for the El Niño and La Niña models, respectively, as well as the posterior prediction error for the Niño 3.4 index. We note that all of the El Niño period predictions are biased in the sense that they under predict the strength of the actual event, whereas the La Niña forecasts are biased in the sense that they over predict the strength of the event. This is typical of almost all long-lead SST forecast models when it comes to predicting extreme El Niños (e.g., Barnston et al., 1999).

In the case of forecasting El Niño, model M6 was best in terms of having the smallest Niño 3.4 index prediction error among all eight models considered. This model includes a quadratic nonlinear term for $\alpha_t$ and a linear term for $\beta_t$ in the evolution of the large-scale coefficients. All models that included a quadratic term for $\alpha_t$ (M5–M8) were better than the models without the quadratic term. Critically,

wileyonlinelibrary.com/journal/environmetrics Copyright © 2014 John Wiley & Sons, Ltd. Environmetrics 2014; 25: 230–244


Figure 1. Image plot of the observed Pacific sea surface temperature anomalies (in degrees Celsius) for (a) October 1997 (El Niño) and (b) October 1998 (La Niña). Both panels show latitude against longitude.

Table 1. The number of large-scale (p) and medium-scale (q) components for the El Niño and La Niña forecasting models, as chosen on the basis of minimum prediction error between the forecasted and actual Niño 3.4 index utilizing the least squares selection procedure described in Appendix B

         El Niño      La Niña
Model     p    q       p    q
M1       11    1      11    1
M2        7    5      11    1
M3        9    3      11    1
M4        5    7      11    1
M5        9    3       8    4
M6        9    3       8    4
M7        9    3       8    4
M8        9    3       8    4

along with model M4, these models were able to capture the true value within their prediction intervals, while models M1, M2, and M3 were not. The models that included the quadratic term for $\beta_t$ did not perform as well in predicting the Niño 3.4 index as models without this term. However, model M6 does show that having the medium-scale modes influence the large-scale modes linearly produces the best forecast, although the next best model (M5) is more parsimonious. We present the spatial field ($\hat{Y}_t$) prediction results for both M6 and M8 to illustrate the spatial aspects of the best and the full models, respectively. Specifically, Figure 2 shows the posterior mean and variance of the forecast distribution of SST anomalies for October 1997 given data through March 1997 for model M6, and Figure 3 shows the corresponding results for model M8. Model M6 appears to visually capture the forecasted El Niño effect better than M8 in terms of the overall intensity. This helps explain why the prediction error of the Niño 3.4 index is smaller for the M6 model, because the Niño 3.4 region corresponds to the area with the greatest intensity (red) in these images.

In the case of forecasting La Niña, model M4, the model with both linear and quadratic medium-scale coefficients, performed the best in terms of predicting the Niño 3.4 index as seen in Table 3, although model M2, which is linear in both scales, is almost as good and is more parsimonious. In addition, all eight models were able to capture the true Niño 3.4 index in their respective prediction intervals. Figure 4 shows the forecasted posterior mean and variance of the process for October 1998 given data through March 1998 for model M4, and Figure 5 shows the corresponding results for the full model, M8. Note that the predicted mean for M4 has a narrower El Niño signature in the central


Table 2. Posterior predicted mean Niño 3.4 index, associated 95% prediction interval, and prediction error for the true value of the Niño 3.4 index associated with the eight models for the El Niño forecasts, where the models were implemented via Markov chain Monte Carlo

Model    Niño 3.4    95% PI                 Prediction error
Truth    2.6365
M1       1.0529      (−0.3559, 2.4617)      1.5836
M2       0.9031      (−0.5346, 2.3407)      1.7334
M3       0.3786      (−1.1001, 1.8572)      2.2579
M4       0.7499      (−1.1471, 2.6468)      1.8866
M5       1.6457      (0.0418, 3.2497)       0.9907
M6       1.6997      (0.1507, 3.2488)       0.9367*
M7       1.1989      (−0.3836, 2.7814)      1.4376
M8       1.3082      (−0.2226, 2.8390)      1.3282

An asterisk (*) indicates the lowest Niño 3.4 index prediction error. PI, prediction interval.

Table 3. Posterior predicted mean Niño 3.4 index, associated 95% prediction interval, and prediction error for the true value of the Niño 3.4 index associated with the eight models for the La Niña forecasts, where the models were implemented via Markov chain Monte Carlo

Model    Niño 3.4    95% PI                 Prediction error
Truth    −0.8908
M1       −0.9946     (−2.4913, 0.5020)      0.5020
M2       −1.0840     (−2.6019, 0.4338)      0.4338
M3       −1.0019     (−2.5140, 0.5102)      0.5102
M4       −1.0959     (−2.6158, 0.4239)      0.4239*
M5       −1.1009     (−3.0874, 0.8856)      0.8856
M6       −1.1449     (−3.0047, 0.7149)      0.7149
M7       −1.1611     (−3.0902, 0.7681)      0.7681
M8       −1.3430     (−3.1728, 0.4868)      0.4868

An asterisk (*) indicates the lowest Niño 3.4 index prediction error. PI, prediction interval.

Pacific and is more realistic relative to the truth, which is why this model performed best in terms of the Niño 3.4 index. However, we note that model M8 captures the forecasted warm anomaly just off the coast of South America, which is present in the true anomaly field, but is not captured by M4. This suggests that the full quadratic nonlinearity may be important for predicting certain features in the anomaly pattern that are not typically considered part of the La Niña signature.

3.2. Forecasting sea level pressure

We now apply the eight models to gridded SLP data across the Midwestern United States. The region of interest corresponds to 1200 spatial locations from 35°N to 45°N and 82.6°W to 99.3°W. There are 30 spatial locations along the x-axis and 40 spatial locations along the y-axis. The data on the temporal scale correspond to 3-h intervals starting at 0:00 UTC on 1 April 2012 and ending at 0:00 UTC on 11 May 2012, where the last period is used for out-of-sample forecasting. The data are standardized to have mean zero and a standard deviation of one.

We implement models M1 through M8 for lead times $\tau = 1$, 4, and 8 (corresponding to 3, 12, and 24 h, respectively). As before, we utilize the model order selection procedure described in Appendix B to determine the values of p and q for each of the 24 models considered here. Specifically, we determine p and q on the basis of an out-of-sample forecast, choosing the values that give the minimum root mean squared error (RMSE) averaged across all spatial locations for the forecast time. For $\tau = 1$ and 4, 12 components were chosen, while for $\tau = 8$, six components were selected. Table 4 lists the values of p and q for each model.

For all 24 models, we ran an MCMC for 20,000 iterations, discarding the first 5000 as burn-in, with convergence assessed visually. We evaluate the forecast on the basis of the RMSE between the forecast and observed spatial process, and compare models M1–M8 for each


Figure 2. Results of model M6 for the 1997 El Niño forecast model: (a) the posterior mean of the forecasted Pacific sea surface temperature anomalies ($\hat{Y}_t$) for October 1997 based on data through March 1997 and (b) the associated posterior variance. Both panels show latitude against longitude.

Figure 3. Results of model M8 for the 1997 El Niño forecast model: (a) the posterior mean of the forecasted Pacific sea surface temperature anomalies ($\hat{Y}_t$) for October 1997 based on data through March 1997 and (b) the associated posterior variance. Both panels show latitude against longitude.

$\tau$. The results are given in Table 5. The best model for $\tau = 1$ was M2, the model that utilizes only the linear $\alpha_t$ and $\beta_t$ terms. For the longer lead times, $\tau = 4$ and $\tau = 8$, M8 was the best model, suggesting a need for inclusion of the quadratic terms for the large-scale and medium-scale coefficients as we increase lead time. However, note that model M4 performs almost as well and is significantly more parsimonious.


Figure 4. Results of model M4 for the 1998 La Niña forecast model: (a) the posterior mean of the forecasted Pacific sea surface temperature anomalies ($\hat{Y}_t$) for October 1998 based on data through March 1998 and (b) the associated posterior variance. Both panels show latitude against longitude.

Figure 5. Results of model M8 for the 1998 La Niña forecast model: (a) the posterior mean of the forecasted Pacific sea surface temperature anomalies ($\hat{Y}_t$) for October 1998 based on data through March 1998 and (b) the associated posterior variance. Both panels show latitude against longitude.

The results from this analysis confirm conventional wisdom with regard to short-term meteorological forecasting: fairly short-time forecasts are well approximated by linear processes, but as the lead times become longer, the nonlinear aspects of the evolution become more significant. In this sense, the results support the notion that medium-scale components affect the large-scale evolution at long lead times and that those interactions should be allowed to be nonlinear.


Table 4. The number of large-scale (p) and medium-scale (q) components associated with the prediction of sea level pressure for lead times $\tau = 1$, 4, and 8 based on minimum RMSE of the forecasted and observed sea level pressure values from the least squares procedure described in Appendix B

         τ = 1       τ = 4       τ = 8
Model     p    q      p    q      p    q
M1       10    2      4    8      1    5
M2        6    6      5    7      1    5
M3        4    8      4    8      1    5
M4        6    6      5    7      1    5
M5        8    4      4    8      1    5
M6        8    4      5    7      1    5
M7        8    4      4    8      1    5
M8        8    4      5    7      1    5

Table 5. RMSE for the sea level pressure posterior mean predictions for the eight alternative models, for lead times $\tau = 1$, 4, 8 based on Markov chain Monte Carlo estimation

         RMSE
Model    τ = 1      τ = 4      τ = 8
M1       0.0834     0.2432     0.4372
M2       0.0820*    0.2273     0.4634
M3       0.0949     0.2253     0.8424
M4       0.1006     0.2505     0.4385
M5       0.1424     0.3926     0.4340
M6       0.1608     0.2426     0.5181
M7       0.1349     0.2310     0.8604
M8       0.1456     0.2214*    0.4245*

An asterisk (*) indicates the minimum RMSE for that specified lead time.

4. DISCUSSION

Spatio-temporal processes have been the subject of substantial research within statistics in recent years. The focus in such studies has often been on capturing large-scale structure, and attributing medium-scale and small-scale spatial structure to noise. GQN dynamic models have recently shown promise in being able to accommodate the large-scale structures of environmental processes in a reasonable fashion. In this paper, we developed an extension to the GQN framework that introduces linear and nonlinear influence of medium-scale structure on the large-scale spectral coefficients of the dynamical process. That is, our model allows for flexibility in that it can accommodate dynamic evolution in terms of linear and nonlinear interactions of large-scale modes, as well as allowing medium-scale modes to influence the large-scale modes either linearly or nonlinearly.

Critically, the inclusion of medium-scale modes in this way also performs a dimension reduction in the parameter space relative to a model in which both sets of modes are evolved nonlinearly. For example, a GQN large-scale model with $P = 12$ components has on the order of $P^3$ parameters. However, by splitting the components up into large-scale and medium-scale as suggested here, for instance $p = 6$ and $q = 6$, we reduce the dimensionality of the evolution operator to the order of $p^3 + q^3$, for example, $12^3 = 1728$ versus $2(6^3) = 432$ parameters for $P = 12$, $q = 6$, $p = 6$. This 75% reduction in the number of parameters is substantial. Thus, this methodology provides a physically motivated way to reduce the parameter space in quadratically nonlinear models. In real-world forecasting applications, one can then balance the trade-offs between modest gains in predictive skill and model parsimony.
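The arithmetic in this paragraph can be reproduced with a short sketch. It uses only the order-of-magnitude counts quoted in the text ($P^3$ versus $p^3 + q^3$), not exact parameter counts for any of the models M1–M8; the function names are illustrative.

```python
def gqn_param_order(n_modes):
    """Order-of-magnitude parameter count, P**3, for a quadratic model on n_modes."""
    return n_modes ** 3

def split_param_order(p, q):
    """Order-of-magnitude count, p**3 + q**3, quoted in the text for a (p, q) split."""
    return p ** 3 + q ** 3

P = 12
full = gqn_param_order(P)

# Tabulate the rough count and the relative reduction for every split of P.
for p in range(1, P):
    q = P - p
    split = split_param_order(p, q)
    print(f"p={p:2d} q={q:2d}  order={split:5d}  reduction={1 - split / full:.1%}")

even_split = split_param_order(6, 6)
```

Under this rough count the even split gives the largest reduction, and very uneven splits such as p = 11, q = 1 give the smallest.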

The SST anomaly long-lead forecasting example showed that allowing medium-scale components to influence large-scale components does indeed help improve model forecasts. In that case, the inclusion of a linear medium-scale term as well as linear and nonlinear large-scale terms in the dynamic process for the large-scale components provided the best model among those considered for the El Niño prediction. In addition, the inclusion of both linear and nonlinear medium-scale evolutions in the large-scale process for the La Niña forecast


showed the best forecast among the models considered, although the linear model with both scales was competitive and more parsimonious. This provides evidence that it is important to allow medium-scale components to influence the evolution of the large-scale modes.

The SLP forecast model showed that the medium-scale nonlinear components were not important for very short lead times, where the process is essentially evolving linearly, but they are important when the lead times increase. Further, the forecast was better when these medium scales were allowed to interact nonlinearly to contribute to the evolution of the large-scale modes. This is consistent with conventional wisdom concerning meteorological forecasting and confirms that it is important in many forecasting applications to allow the medium-scale coefficients to influence the large-scale modes nonlinearly.

Clearly, the scale interaction parameterization presented here contains some important assumptions. The most critical assumption is that the interaction from medium scales to large scales only goes one way. Another important, but less fundamental, assumption is that the medium scales do not interact with each other to influence their propagation. As discussed, the first assumption has strong physical support, whereas the second is simply due to convenience and model parsimony (i.e., it is almost certainly the case that medium scales interact at least linearly, but the contribution to forecast skill may be negligible in certain applications). We note that this can essentially be mitigated by allowing p to be larger (i.e., by what we classify as large or medium scales). Alternatively, it is simple to add a transition operator for the medium scales that allows interaction. This suggests that a more principled model selection/testing environment should be considered. For example, if computational cost allows, one could fit such models and compare them via information criteria, Bayes factors, or posterior predictive checks. In addition, stochastic search variable selection (as in Wikle and Holan, 2011) or the Bayesian Lasso (Park and Casella, 2008) could be used to discover which specific model interactions are significant.

Acknowledgements

We thank the two anonymous referees who provided helpful comments that improved this manuscript. Funding for this research was provided by Office of Naval Research grant N00014-10-1-0518 and National Science Foundation grant DMS-1049093.

REFERENCES

Barnston AG, Glantz MH, He Y. 1999. Predictive skill of statistical and dynamical climate models in forecasts of SST during the 1997–98 El Niño episode and the 1998 La Niña onset. Bulletin of the American Meteorological Society 80:217–244.

Berliner LM. 1996. Hierarchical Bayesian time-series models. Fundamental Theories of Physics 79:15–22.

Berliner LM, Wikle CK, Cressie N. 2000. Long-lead prediction of Pacific SSTs via Bayesian dynamic modeling. Journal of Climate 13:3953–3968.

Cressie N, Wikle CK. 2011. Statistics for Spatio-temporal Data. Wiley and Sons, Inc.: New York.

Hooten MB, Wikle CK. 2008. A hierarchical Bayesian nonlinear spatio-temporal model for the spread of invasive species with application to the Eurasian Collared-Dove. Environmental and Ecological Statistics 15:59–70.

Leeds WB, Wikle CK, Fiechter J, Brown J, Milliff RF. 2013. Modeling 3-D spatio-temporal biogeochemical processes with a forest of 1-D statistical emulators. Environmetrics 24(2):1–12.

Majda AJ, Yuan Y. 2012. Fundamental limitations of ad hoc linear and quadratic multi-level regression models for physical systems. Discrete and Continuous Dynamical Systems - Series B 4:1333–1363.

Majda AJ, Harlim J. 2013. Physics constrained nonlinear regression models for time series. Nonlinearity 26:201–217.

Park T, Casella G. 2008. The Bayesian Lasso. Journal of the American Statistical Association 103(482):681–686.

Philander SG. 1990. El Niño, La Niña and the Southern Oscillation. Academic Press: San Diego.

Stein ML. 2013. Limitations on low rank approximations for covariance matrices of spatial data. Spatial Statistics. DOI: 10.1016/j.spasta.2013.06.003.

Wiin-Nielsen AC, Chen TC. 1993. Fundamentals of Atmospheric Energetics. Oxford University Press: New York.

Wikle CK, Holan SH. 2011. Polynomial nonlinear spatio-temporal integro-difference equation models. Journal of Time Series Analysis 32:339–350.

Wikle CK, Hooten MB. 2010. A general science-based framework for dynamical spatio-temporal models. Test 19(3):417–451.

Wikle CK, Milliff RF, Nychka D, Berliner LM. 2001. Spatiotemporal hierarchical Bayesian modeling: tropical ocean surface winds. Journal of the American Statistical Association 96(454):382–397.

Wilks DS. 2011. Statistical Methods in the Atmospheric Sciences, 3rd ed. Academic Press: San Diego.

APPENDIX A. MARKOV CHAIN MONTE CARLO ALGORITHM

The MCMC algorithm changes slightly depending upon the specific model for the expansion coefficients. The algorithm described here assumes a GQN dynamic model framework for both $\alpha_t$ and $\beta_t$ as specified in Section 2 (i.e., model M8). The remaining models are special cases of this model and, thus, are not described specifically here. The full conditionals are of standard form for the majority of the parameters except for $\alpha_t$ and $\beta_t$, $t = 1, \ldots, T$, and thus can be sampled directly with Gibbs steps. In the case of $\alpha_t$ and $\beta_t$, $t = 1, \ldots, T$, the full conditionals require a Metropolis within Gibbs update. The exception to this is when a quadratic term is not included for $\alpha_t$ or $\beta_t$. In these cases, the full conditional distribution of the respective parameter without the quadratic term is of standard form and can be sampled directly as a Gibbs step. For ease of notation, we utilize brackets to indicate distributions. Specifically, $[Y \mid X]$ and $[X]$ denote the conditional distribution of $Y$ given $X$ and the unconditional distribution of $X$, respectively. The following are the full conditional distributions. Recall from Section 2 that $a_0$, $\Sigma_{a_0}$, $b_0$, $\Sigma_{b_0}$, $q_\nu$, $r_\nu$, $\tilde{m}_\alpha$, $\Sigma_\alpha$, $\tilde{m}_{\alpha,Q}$, $\Sigma_{\alpha,Q}$, $\tilde{m}_{\beta,L}$, $\Sigma_{\beta,L}$, $\tilde{m}_{\beta,Q}$, $\Sigma_{\beta,Q}$, $\tilde{m}_\beta$, $\Sigma_\beta$, $a_\alpha$, $S_\alpha$, $q_\beta(j)$, and $r_\beta(j)$ for $j = 1, \ldots, q$ are user-specified hyperparameters. The hyperparameters used for our models are specified in Table A1. Further, we assume the measurement-error variance $\sigma^2_\epsilon$ is known, although it could easily be estimated in this framework.

For $t = 1, \ldots, T$, the full conditional distribution of $Y_t$ is given by

$$
Y_t \mid \cdot \sim \mathrm{Gau}\left( W_{y,t}^{-1} v_{y,t}',\; W_{y,t}^{-1} \right)
$$


Table A1. The hyperparameters specified for the prior distributions used in the analyses presented here

Hyperparameter            Value                                               Hyperparameter            Value
$a_0$                     $\sum_{i=0}^{\tau-1} \alpha^{(0)}_{0,i} / \tau$     $\Sigma_{a_0}$            $100I$
$b_0$                     0                                                   $\Sigma_{b_0}$            $100I$
$q_\nu$                   2.001                                               $r_\nu$                   9.99
$\tilde{m}_\alpha$        0.5                                                 $\Sigma_\alpha$           $100I$
$\tilde{m}_{\alpha,Q}$    0                                                   $\Sigma_{\alpha,Q}$       $100I$
$\tilde{m}_{\beta,L}$     0                                                   $\Sigma_{\beta,L}$        $100I$
$\tilde{m}_{\beta,Q}$     0                                                   $\Sigma_{\beta,Q}$        $100I$
$\tilde{m}_\beta$         0                                                   $\Sigma_\beta$            $10I$
$a_\alpha$                $p + 1$                                             $S_\alpha$                $100I$
$q_\beta(j)$              3                                                   $r_\beta(j)$              0.05

Note that the value $a_0$ is a data-derived hyperparameter chosen on the basis of initial values set from the least squares procedure in Appendix B. The values for $q_\nu$ and $r_\nu$ were chosen such that the mean and variance of the prior are 0.1 and 10, respectively. The values for $q_\beta(j)$ and $r_\beta(j)$ were chosen such that the mean and variance of the prior are 10 and 100, respectively. Finally, the values of $q_\beta(j)$ and $r_\beta(j)$ are the same for $j = 1, \ldots, q$.

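where $W_{y,t} = H_t' H_t / \sigma^2_\epsilon + I / \sigma^2_\nu$, $v_{y,t} = Z_t' H_t / \sigma^2_\epsilon + (\Phi^{(1)} \alpha_t + \Phi^{(2)} \beta_t)' / \sigma^2_\nu$, and $[Y_t \mid \cdot]$ is the full conditional distribution of $Y_t$ given all the other model parameters. The full conditional distribution of $\sigma^2_\nu$ is also straightforward to derive and is given by

$$
\sigma^2_\nu \mid \cdot \sim \mathrm{IG}\left( q_{\nu,0},\; r_{\nu,0} \right)
$$

with

$$
q_{\nu,0} = \frac{nT}{2} + q_\nu
$$

and

$$
r_{\nu,0} = \left\{ \frac{1}{r_\nu} + \frac{1}{2} \sum_{t=1}^{T} \left( Y_t - \Phi^{(1)} \alpha_t - \Phi^{(2)} \beta_t \right)' \left( Y_t - \Phi^{(1)} \alpha_t - \Phi^{(2)} \beta_t \right) \right\}^{-1}
$$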
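As a numerical check on the inverse gamma parameterization implied by this update, the following sketch assumes an IG(q, r) density proportional to $x^{-(q+1)} \exp\{-1/(r x)\}$, which is the form under which the posterior parameters above arise. Under that assumption the hyperparameter values in Table A1 reproduce the prior means and variances stated in the table note. The function names and the synthetic residuals are illustrative only.

```python
import numpy as np

def ig_mean_var(q, r):
    """Mean and variance of IG(q, r) with density proportional to
    x**-(q + 1) * exp(-1 / (r * x)); the variance requires q > 2."""
    mean = 1.0 / (r * (q - 1.0))
    var = mean ** 2 / (q - 2.0)
    return mean, var

def ig_update(q, r, resid):
    """Conjugate update for a variance given residuals of shape (T, n)."""
    resid = np.asarray(resid, dtype=float)
    q_post = q + resid.size / 2.0                       # nT/2 + q
    r_post = 1.0 / (1.0 / r + 0.5 * np.sum(resid ** 2))
    return q_post, r_post

def draw_ig(q, r, rng):
    """If X ~ Gamma(shape=q, scale=r), then 1/X ~ IG(q, r) in this parameterization."""
    return 1.0 / rng.gamma(shape=q, scale=r)

# Prior moments implied by the hyperparameter values in Table A1.
mean_a, var_a = ig_mean_var(2.001, 9.99)   # approximately 0.1 and 10
mean_b, var_b = ig_mean_var(3.0, 0.05)     # approximately 10 and 100

# One Gibbs draw from synthetic residuals (T = 50 times, n = 20 locations).
rng = np.random.default_rng(1)
resid = 0.3 * rng.standard_normal((50, 20))
q_post, r_post = ig_update(2.001, 9.99, resid)
sigma2_draw = draw_ig(q_post, r_post, rng)
```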

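For the dynamical process, we set $G(\cdot)$ as the identity function. Note that we can rewrite the components in (5) as

$$
M_\alpha \alpha_t = (\alpha_t \otimes I_p)' \mathrm{vec}(M_\alpha)
$$
$$
(I_p \otimes \alpha_t') M_{\alpha,Q} \alpha_t = (\tilde{\alpha}_t \otimes I_p)' \mathrm{vec}(\tilde{M}_{\alpha,Q})
$$
$$
M_{\beta,L} \beta_t = (\beta_t \otimes I_p)' \mathrm{vec}(M_{\beta,L})
$$
$$
(I_p \otimes \beta_t') M_{\beta,Q} \beta_t = (\tilde{\beta}_t \otimes I_p)' \mathrm{vec}(\tilde{M}_{\beta,Q})
$$

where

$$
\tilde{\alpha}_t \equiv \left( \alpha_t^2(1), \alpha_t(1)\alpha_t(2), \ldots, \alpha_t(1)\alpha_t(p), \alpha_t^2(2), \ldots, \alpha_t^2(p) \right)'
$$
$$
\tilde{\beta}_t \equiv \left( \beta_t^2(1), \beta_t(1)\beta_t(2), \ldots, \beta_t(1)\beta_t(q), \beta_t^2(2), \ldots, \beta_t^2(q) \right)'
$$

and

$$
\tilde{M}_{\alpha,Q} \equiv \begin{pmatrix} \tilde{m}_{\alpha,1}' \\ \vdots \\ \tilde{m}_{\alpha,p}' \end{pmatrix}, \qquad
\tilde{M}_{\beta,Q} \equiv \begin{pmatrix} \tilde{m}_{\beta,1}' \\ \vdots \\ \tilde{m}_{\beta,p}' \end{pmatrix}
$$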


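with

$$
\tilde{m}_{\alpha,i} \equiv \left( M^{(i)}_{\alpha,Q}(1,1), M^{(i)}_{\alpha,Q}(2,1), \ldots, M^{(i)}_{\alpha,Q}(p,p) \right)'
$$
$$
\tilde{m}_{\beta,i} \equiv \left( M^{(i)}_{\beta,Q}(1,1), M^{(i)}_{\beta,Q}(2,1), \ldots, M^{(i)}_{\beta,Q}(q,q) \right)'
$$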
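The Kronecker-product rewriting of the linear and quadratic terms can be verified numerically. The sketch below uses random stand-in matrices and assumes a column-major vec operator. For the quadratic term it further assumes that row i of $\tilde{M}_{\alpha,Q}$ holds the net coefficient on each unique product $\alpha_t(j)\alpha_t(k)$, $j \le k$; that is one way to collapse a general block $M^{(i)}_{\alpha,Q}$ and is not necessarily the authors' exact convention.

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major vec operator)."""
    return np.asarray(A).reshape(-1, order="F")

def quad_terms(x):
    """Unique pairwise products x(i)x(j), i <= j, ordered as in the text."""
    i, j = np.triu_indices(len(x))
    return x[i] * x[j]

rng = np.random.default_rng(0)
p, q = 3, 2
alpha, beta = rng.standard_normal(p), rng.standard_normal(q)
I_p = np.eye(p)

# Linear terms: M alpha = (alpha kron I_p)' vec(M), and likewise for beta.
M_alpha = rng.standard_normal((p, p))
M_beta_L = rng.standard_normal((p, q))
lin_alpha = np.kron(alpha[:, None], I_p).T @ vec(M_alpha)
lin_beta = np.kron(beta[:, None], I_p).T @ vec(M_beta_L)

# Quadratic term: element i of (I_p kron alpha') M_Q alpha is alpha' M^(i) alpha.
blocks = rng.standard_normal((p, p, p))          # blocks[i] plays the role of M^(i)
quad_direct = np.array([alpha @ B @ alpha for B in blocks])

# Assumption: row i of M-tilde stores B[j, j] for squares and
# B[j, k] + B[k, j] for cross terms (j < k).
j, k = np.triu_indices(p)
M_tilde = np.stack([np.where(j == k, B[j, k], B[j, k] + B[k, j]) for B in blocks])
quad_kron = np.kron(quad_terms(alpha)[:, None], I_p).T @ vec(M_tilde)
```

Both forms give the same p-vector, which is what makes the propagator elements enter the model linearly and hence amenable to Gaussian full conditionals and to least squares.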

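Thus, the full conditional of the matrix of coefficients for the propagation of the linear term of $\alpha_t$ is $\mathrm{vec}(M_\alpha) \mid \cdot \sim \mathrm{Gau}\left( W_{\alpha,L}^{-1} a_{\alpha,L},\; W_{\alpha,L}^{-1} \right)$, where

$$
W_{\alpha,L} = \sum_{t=1}^{T} (\alpha_{t-1} \otimes I_p)' Q_\alpha^{-1} (\alpha_{t-1} \otimes I_p) + \Sigma_{\alpha,L}^{-1}
$$
$$
a_{\alpha,L} = \sum_{t=1}^{T} v_{\alpha,L,t}' Q_\alpha^{-1} (\alpha_{t-1} \otimes I_p) + \Sigma_{\alpha,L}^{-1}
$$
$$
v_{\alpha,L,t} = \alpha_t - (\tilde{\alpha}_{t-1} \otimes I_p)' \mathrm{vec}(\tilde{M}_{\alpha,Q}) - M_{\beta,L} \beta_{t-1} - (\tilde{\beta}_{t-1} \otimes I_p)' \mathrm{vec}(\tilde{M}_{\beta,Q})
$$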

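The full conditional distribution for $\mathrm{vec}(M_{\alpha,Q})$ is $\mathrm{vec}(M_{\alpha,Q}) \mid \cdot \sim \mathrm{Gau}\left( W_{\alpha,Q}^{-1} a_{\alpha,Q},\; W_{\alpha,Q}^{-1} \right)$, where

$$
W_{\alpha,Q} = \sum_{t=1}^{T} (\tilde{\alpha}_{t-1} \otimes I_p)' Q_\alpha^{-1} (\tilde{\alpha}_{t-1} \otimes I_p) + \Sigma_{\alpha,Q}^{-1}
$$
$$
a_{\alpha,Q} = \sum_{t=1}^{T} v_{\alpha,Q,t}' Q_\alpha^{-1} (\tilde{\alpha}_{t-1} \otimes I_p) + \tilde{m}_{\alpha,Q}' \Sigma_{\alpha,Q}^{-1}
$$
$$
v_{\alpha,Q,t} = \alpha_t - M_{\alpha,L} \alpha_{t-1} - M_{\beta,L} \beta_{t-1} - (\tilde{\beta}_{t-1} \otimes I_p)' \mathrm{vec}(\tilde{M}_{\beta,Q})
$$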

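Similarly, for the propagation matrices of $\beta_t$ on the large-scale coefficients $\alpha_t$, the full conditional distribution is $\mathrm{vec}(M_{\beta,L}) \mid \cdot \sim \mathrm{Gau}\left( W_{\beta,L}^{-1} a_{\beta,L},\; W_{\beta,L}^{-1} \right)$, where

$$
W_{\beta,L} = \sum_{t=2}^{T} (\beta_{t-1} \otimes I_p)' Q_\alpha^{-1} (\beta_{t-1} \otimes I_p) + \Sigma_{\beta,L}^{-1}
$$
$$
a_{\beta,L} = \sum_{t=2}^{T} v_{\beta,L,t}' Q_\alpha^{-1} (\beta_{t-1} \otimes I_p) + \Sigma_{\beta,L}^{-1}
$$
$$
v_{\beta,L,t} = \alpha_t - (\tilde{\beta}_{t-1} \otimes I_p)' \mathrm{vec}(\tilde{M}_{\beta,Q}) - M_{\alpha,L} \alpha_{t-1} - (\tilde{\alpha}_{t-1} \otimes I_p)' \mathrm{vec}(\tilde{M}_{\alpha,Q})
$$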

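The full conditional distribution for $\mathrm{vec}(M_{\beta,Q})$ is $\mathrm{vec}(M_{\beta,Q}) \mid \cdot \sim \mathrm{Gau}\left( W_{\beta,Q}^{-1} a_{\beta,Q},\; W_{\beta,Q}^{-1} \right)$, where

$$
W_{\beta,Q} = \sum_{t=2}^{T} (\tilde{\beta}_{t-1} \otimes I_p)' Q_\alpha^{-1} (\tilde{\beta}_{t-1} \otimes I_p) + \Sigma_{\beta,Q}^{-1}
$$
$$
a_{\beta,Q} = \sum_{t=2}^{T} v_{\beta,Q,t}' Q_\alpha^{-1} (\tilde{\beta}_{t-1} \otimes I_p) + \tilde{m}_{\beta,Q}' \Sigma_{\beta,Q}^{-1}
$$
$$
v_{\beta,Q,t} = \alpha_t - M_{\beta,L} \beta_{t-1} - M_{\alpha,L} \alpha_{t-1} - (\tilde{\alpha}_{t-1} \otimes I_p)' \mathrm{vec}(\tilde{M}_{\alpha,Q})
$$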

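The precision matrix $Q_\alpha^{-1}$ has the full conditional distribution $Q_\alpha^{-1} \mid \cdot \sim \mathrm{Wishart}\left( S_{\alpha,0}^{-1},\; T + a_\alpha \right)$, where

$$
S_{\alpha,0} = a_\alpha S_\alpha + \sum_{t=1}^{T} q_{\alpha,t} q_{\alpha,t}'
$$
$$
q_{\alpha,t} = \alpha_t - M_{\alpha,L} \alpha_{t-1} - (\tilde{\alpha}_{t-1} \otimes I_p)' \mathrm{vec}(\tilde{M}_{\alpha,Q}) - M_{\beta,L} \beta_{t-1} - (\tilde{\beta}_{t-1} \otimes I_p)' \mathrm{vec}(\tilde{M}_{\beta,Q})
$$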

The full conditionals for $m_\beta(j)$, $j = 1, \ldots, q$, which are the diagonal elements of the propagation matrix $M_\beta$, are given by $m_\beta(j) \mid \cdot \sim \mathrm{Gau}\left( W_m^{-1} a_m,\; W_m^{-1} \right)$, where


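$$
W_m = \sum_{t=1}^{T} \frac{\beta_{t-1}^2(j)}{Q_\beta(j)} + \frac{1}{\sigma^2_\beta(j)}
$$
$$
a_m = \sum_{t=1}^{T} \frac{\beta_t(j)\, \beta_{t-1}(j)}{Q_\beta(j)} + \frac{\tilde{m}_\beta(j)}{\sigma^2_\beta(j)}
$$

with $\tilde{m}_\beta(j)$ the jth element of prior mean $\tilde{m}_\beta$ and $\sigma^2_\beta(j)$ the jth diagonal element of the prior covariance $\Sigma_\beta$.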

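The full conditionals for $Q_\beta(j)$ are given by $Q_\beta(j) \mid \cdot \sim \mathrm{IG}\left( q_\beta(j) + T/2,\; r_{\beta,0}(j) \right)$, where

$$
r_{\beta,0}(j) = \left[ \frac{1}{r_\beta(j)} + \frac{1}{2} \sum_{t=1}^{T} \left( \beta_t(j) - m_\beta(j)\, \beta_{t-1}(j) \right)^2 \right]^{-1}
$$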

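When the quadratic term for $\alpha_t$ is in the model, the full conditionals for the large-scale coefficients are not of standard form and thus a Metropolis–Hastings within Gibbs step is required. For time $t = -\tau + 1, \ldots, 0$, the target distribution is proportional to

$$
[\alpha_t \mid \cdot] \propto [\alpha_{t+\tau} \mid \alpha_t, \beta_t, M_{\alpha,L}, M_{\alpha,Q}, M_{\beta,L}, M_{\beta,Q}, Q_\alpha]\, [\alpha_t]
$$

For time $t = 1, \ldots, T - \tau$, the target distribution is

$$
[\alpha_t \mid \cdot] \propto [\alpha_{t+\tau} \mid \alpha_t, \beta_t, M_{\alpha,L}, M_{\alpha,Q}, M_{\beta,L}, M_{\beta,Q}, Q_\alpha]\, [\alpha_t \mid \alpha_{t-\tau}, \beta_{t-\tau}, M_{\alpha,L}, M_{\alpha,Q}, M_{\beta,L}, M_{\beta,Q}, Q_\alpha]\, [Y_t \mid \alpha_t, \beta_t]
$$

For time $t = T - \tau + 1, \ldots, T$, the target distribution is

$$
[\alpha_t \mid \cdot] \propto [\alpha_t \mid \alpha_{t-\tau}, \beta_{t-\tau}, M_{\alpha,L}, M_{\alpha,Q}, M_{\beta,L}, M_{\beta,Q}, Q_\alpha]\, [\alpha_t]\, [Y_t \mid \alpha_t, \beta_t]
$$

Similarly for the medium-scale coefficients, $\beta_t$ is sampled using a Metropolis–Hastings within Gibbs step. For time $t = -\tau + 1, \ldots, 0$, the target distribution is proportional to

$$
[\beta_t \mid \cdot] \propto [\alpha_{t+\tau} \mid \alpha_t, \beta_t, M_{\alpha,L}, M_{\alpha,Q}, M_{\beta,L}, M_{\beta,Q}, Q_\alpha]\, [\beta_t]
$$

For time $t = 1, \ldots, T - \tau$, the target distribution is

$$
[\beta_t \mid \cdot] \propto [\alpha_{t+\tau} \mid \alpha_t, \beta_t, M_{\alpha,L}, M_{\alpha,Q}, M_{\beta,L}, M_{\beta,Q}, Q_\alpha]\, [Y_t \mid \alpha_t, \beta_t]\, [\beta_{t+\tau} \mid \beta_t, M_\beta, Q_\beta]\, [\beta_t \mid \beta_{t-\tau}, M_\beta, Q_\beta]
$$

For time $t = T - \tau + 1, \ldots, T$, the target distribution is

$$
[\beta_t \mid \cdot] \propto [\alpha_{t+\tau} \mid \alpha_t, \beta_t, M_{\alpha,L}, M_{\alpha,Q}, M_{\beta,L}, M_{\beta,Q}, Q_\alpha]\, [Y_t \mid \alpha_t, \beta_t]\, [\beta_t \mid \beta_{t-\tau}, M_\beta, Q_\beta]
$$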

Proposal candidates are generated using a random-walk proposal distribution. Specifically, for iteration i, a proposal is generated from $\mathrm{Gau}\left( \alpha_t^{i-1}, \delta \right)$, where $\delta$ is some tuning parameter chosen for suitable mixing of the MCMC sample chains.
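A random-walk Metropolis update of the kind described here can be sketched as follows. This is a generic illustration rather than the authors' sampler: the target is a stand-in Gaussian log density instead of the products of conditionals given above, $\delta$ is treated as the variance of an isotropic Gaussian proposal (an assumption), and the function name `rw_metropolis_step` is hypothetical.

```python
import numpy as np

def rw_metropolis_step(current, log_target, delta, rng):
    """One random-walk Metropolis update; delta is the proposal variance."""
    proposal = current + np.sqrt(delta) * rng.standard_normal(current.shape)
    log_ratio = log_target(proposal) - log_target(current)
    # The proposal is symmetric, so no Hastings correction is needed.
    if np.log(rng.uniform()) < log_ratio:
        return proposal, True
    return current, False

# Stand-in target: an unnormalised Gaussian log density with known mean.
target_mean = np.array([1.0, -2.0])

def log_target(x):
    return -0.5 * np.sum((x - target_mean) ** 2)

rng = np.random.default_rng(42)
state = np.zeros(2)
draws, n_accept = [], 0
n_iter, n_burn = 20000, 5000            # iteration counts used in Section 3
for it in range(n_iter):
    state, accepted = rw_metropolis_step(state, log_target, delta=1.0, rng=rng)
    n_accept += accepted
    if it >= n_burn:                    # discard burn-in draws
        draws.append(state)
draws = np.array(draws)
accept_rate = n_accept / n_iter
post_mean = draws.mean(axis=0)
```

In practice $\delta$ would be tuned by monitoring `accept_rate`: values that are too small give high acceptance but slow exploration, and values that are too large give frequent rejections.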

APPENDIX B. CHOICE OF MODEL ORDER

Initially, $\alpha_t$ and $\beta_t$, for $t = 1, \ldots, T$, are obtained by the orthogonal projection of the data vectors $Z_t$ onto known $\Phi^{(1)}$ and $\Phi^{(2)}$, respectively; this is facilitated in forecasting applications where there are no missing data, as is the case in the examples presented in Section 3. In these examples, the basis function matrices correspond to EOF decompositions of the data (e.g., Cressie and Wikle, 2011, Chap. 5). It is typically the case that the leading EOFs correspond to larger spatial scales of variability, and the spatial scale generally decreases as the EOF order increases. The important assumption is that these expansion coefficients are then treated as observed as opposed to being hidden processes and that they in fact play the role of the data. This is not an unreasonable assumption and, in fact, is the standard in many geophysical analyses (e.g., Wilks, 2011). After obtaining these spectral coefficients, the propagation matrices are estimated via least squares depending upon the chosen model as described in the succeeding text.

The least squares estimation is accomplished by noting that we can rewrite (5) as

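$$
\alpha_{t+\tau} = (\alpha_t \otimes I_p)' \mathrm{vec}(M_\alpha) + (\tilde{\alpha}_t \otimes I_p)' \mathrm{vec}(\tilde{M}_{\alpha,Q}) + (\beta_t \otimes I_p)' \mathrm{vec}(M_{\beta,L}) + (\tilde{\beta}_t \otimes I_p)' \mathrm{vec}(\tilde{M}_{\beta,Q}) + \eta_{t+\tau}
$$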


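where $\tilde{\alpha}_t$ and $\tilde{\beta}_t$ are vectors containing the quadratic interactions for their respective components (see Appendix A for details). We define the design matrix $X_t$ such that

$$
X_t = \left[ (\alpha_t \otimes I_p)' \;\; (\tilde{\alpha}_t \otimes I_p)' \;\; (\beta_t \otimes I_p)' \;\; (\tilde{\beta}_t \otimes I_p)' \right]
$$

and define the vector of dynamic model parameters $\tilde{m} = \left( \mathrm{vec}(M_\alpha)', \mathrm{vec}(\tilde{M}_{\alpha,Q})', \mathrm{vec}(M_{\beta,L})', \mathrm{vec}(\tilde{M}_{\beta,Q})' \right)'$. Then, (5) can be written as

$$
\alpha_{t+\tau} = X_t \tilde{m} + \eta_{t+\tau} \qquad \text{(B.1)}
$$

We then can estimate the propagator parameters by ordinary least squares estimation:

$$
\hat{\tilde{m}} = \left( \sum_{t=1}^{T} X_t' X_t \right)^{-1} \sum_{t=1}^{T} \left( X_t' \alpha_t \right) \qquad \text{(B.2)}
$$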

The estimates for each of the original propagator matrices then are their respective elements in $\hat{\tilde{m}}$. In certain applications, depending on the data volume, the matrix to be inverted in (B.2) is singular (or nearly so), and we may have to add a small ridge constant to its diagonal to ensure nonsingularity (and to shrink the parameter estimates).
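A minimal sketch of this least squares step is given below. It builds the design matrix from Kronecker products, pairs each $X_t$ with the response $\alpha_{t+\tau}$ as in (B.1), and adds an optional ridge constant to the diagonal before solving. The function names, the synthetic noise-free data, and the particular propagator values are assumptions made for illustration only.

```python
import numpy as np

def quad_terms(x):
    """Unique pairwise products x(i)x(j), i <= j."""
    i, j = np.triu_indices(len(x))
    return x[i] * x[j]

def design_matrix(alpha_t, beta_t):
    """X_t = [(a kron I)', (a~ kron I)', (b kron I)', (b~ kron I)'], with p rows."""
    I_p = np.eye(len(alpha_t))
    regressors = (alpha_t, quad_terms(alpha_t), beta_t, quad_terms(beta_t))
    return np.hstack([np.kron(r[:, None], I_p).T for r in regressors])

def fit_propagator(alpha, beta, tau=1, ridge=0.0):
    """Least squares (optionally ridge) estimate of the stacked parameter vector.
    alpha has shape (T, p) and beta has shape (T, q)."""
    T = alpha.shape[0]
    K = design_matrix(alpha[0], beta[0]).shape[1]
    XtX, Xty = np.zeros((K, K)), np.zeros(K)
    for t in range(T - tau):
        X = design_matrix(alpha[t], beta[t])
        XtX += X.T @ X
        Xty += X.T @ alpha[t + tau]      # response is alpha at time t + tau
    return np.linalg.solve(XtX + ridge * np.eye(K), Xty)

# Synthetic check: noise-free linear dynamics with hypothetical propagators.
rng = np.random.default_rng(3)
p, q, T = 2, 2, 200
M_true = np.array([[0.5, 0.1], [-0.2, 0.4]])
M_beta_true = np.array([[0.3, 0.0], [0.1, -0.2]])
beta = rng.standard_normal((T, q))
alpha = np.zeros((T, p))
alpha[0] = rng.standard_normal(p)
for t in range(T - 1):
    alpha[t + 1] = M_true @ alpha[t] + M_beta_true @ beta[t]

m_hat = fit_propagator(alpha, beta, tau=1, ridge=0.0)

# Unpack the linear blocks (column-major, matching the vec operator).
n_quad_a = p * (p * (p + 1) // 2)
M_hat = m_hat[:p * p].reshape(p, p, order="F")
start = p * p + n_quad_a
M_beta_hat = m_hat[start:start + p * q].reshape(p, q, order="F")
```

Setting `ridge` to a small positive value corresponds to the ridge constant mentioned in the text and shrinks the estimates toward zero.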

Choice of p and q may be user specified, or we may apply the least squares approach as a quick way to evaluate models and help select them. First, we must select the total number of components $P \equiv p + q$. This would be relatively easy if we were only considering one model, but given that we would like to consider the range of potential models, we consider a baseline model, $\alpha_{t+\tau} = M \alpha_t + \eta_{t+\tau}$, without the inclusion of $\beta_t$ in (4) or (6), where P is the number of components in $\alpha_t$. In the examples presented in Section 3, we implement the following procedure:

1. Fit the model $\alpha_{t+\tau} = M \alpha_t + \eta_{t+\tau}$ for $P = 6, 8, 10, 12$ components.

2. Choose P using a prespecified out-of-sample procedure. For the long-lead SST prediction example in Section 3, we select the P that gives the minimum prediction error between forecasted and observed Niño 3.4 index. For the SLP model in Section 3, we select the P that gives the minimum RMSE between the forecasted and observed data (averaged across all spatial locations). The idea here is that this gives us a realistic, yet fairly large, number of modes to consider in the GQN framework. Typically, one would need more modes in a linear model than in a nonlinear model to accommodate variability, so the hope is that this is a conservative choice.

3. Fit model M1 for all possible values of p and q using the least squares estimation procedure described earlier, where $P = p + q$ for $p = 1, \ldots, P - 1$, $q = P - 1, \ldots, 1$.

4. Determine the best p and q as specified in step 2.

5. Continue fitting models for M2, ..., M8 for all values of p and q as mentioned in step 3. Then, for each model M2, ..., M8, determine the best p and q in the same fashion as mentioned in step 4.

After p and q are determined from the aforementioned procedure, a full MCMC implementation is then run for all eight possible models in our examples.
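Steps 3 and 4 of the procedure can be sketched for a single model as follows. The sketch assumes a linear two-scale evolution for the large scales and independent lag-$\tau$ autoregressions for the medium scales as a stand-in for one of M1–M8, scores each split by the RMSE of the reconstructed field at one held-out time, and runs on synthetic coefficients; `select_order`, `ridge_ols`, and all data here are hypothetical.

```python
import numpy as np

def ridge_ols(X, Y, ridge=1e-8):
    """Solve (X'X + ridge I) B = X'Y; the small ridge guards against singularity."""
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)

def select_order(coefs, Phi, y_obs, tau):
    """Score every split P = p + q by out-of-sample field RMSE.
    coefs: (T, P) projected coefficients; Phi: (n, P) basis functions;
    y_obs: (n,) observed field tau steps after the last row of coefs."""
    T, P = coefs.shape
    scores = {}
    for p in range(1, P):
        a, b = coefs[:, :p], coefs[:, p:]
        # Large scales: regress alpha_{t+tau} on (alpha_t, beta_t).
        B = ridge_ols(np.hstack([a[:-tau], b[:-tau]]), a[tau:])
        # Medium scales: independent lag-tau autoregressions (diagonal propagator).
        m_beta = (b[:-tau] * b[tau:]).sum(axis=0) / (b[:-tau] ** 2).sum(axis=0)
        a_hat = np.concatenate([a[-1], b[-1]]) @ B
        b_hat = m_beta * b[-1]
        y_hat = Phi[:, :p] @ a_hat + Phi[:, p:] @ b_hat
        scores[(p, P - p)] = float(np.sqrt(np.mean((y_hat - y_obs) ** 2)))
    return min(scores, key=scores.get), scores

# Synthetic stand-in data: autoregressive coefficients and an orthonormal basis.
rng = np.random.default_rng(7)
T, P, n, tau = 120, 6, 40, 2
Phi, _ = np.linalg.qr(rng.standard_normal((n, P)))
coefs = np.zeros((T + tau, P))
coefs[0] = rng.standard_normal(P)
for t in range(T + tau - 1):
    coefs[t + 1] = 0.8 * coefs[t] + 0.5 * rng.standard_normal(P)
y_obs = Phi @ coefs[-1]                  # held-out field at the forecast time
best, scores = select_order(coefs[:T], Phi, y_obs, tau)
```

Replacing the RMSE with the absolute Niño 3.4 index error gives the criterion used for the SST example; repeating the loop for each candidate model corresponds to step 5.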
