
Gibbs Samplers for VARMA and Its Extensions

By

Joshua C.C. Chan

Australian National University

Eric Eisenstat

University of Bucharest

ANU Working Papers in Economics and Econometrics # 604

February 2013

JEL: C11, C32, C53

ISBN: 0 86831 604 0


Gibbs Samplers for VARMA and Its Extensions

Joshua C.C. Chan
Research School of Economics,

Australian National University

Eric Eisenstat
Faculty of Business Administration,

University of Bucharest

January 2013

Abstract

Empirical work in macroeconometrics has mostly been restricted to VARs, even though there are strong theoretical reasons to consider general VARMAs. This is perhaps because estimation of VARMAs is perceived to be challenging. In this article, we develop a Gibbs sampler for the basic VARMA, and demonstrate how it can be extended to models with stochastic volatility and time-varying parameters. We illustrate the methodology through a macroeconomic forecasting exercise. We show that VARMAs produce better density forecasts than VARs, particularly for short forecast horizons.

Keywords: state space, stochastic volatility, factor model, macroeconomic forecasting, density forecast

JEL codes: C11, C32, C53


1 Introduction

Vector autoregressive moving average (VARMA) models have long been considered an appropriate framework for modeling covariance stationary time series. As is well known, by the Wold decomposition theorem, any covariance stationary time series has an infinite moving average representation, which can then be approximated by a suitable finite-order VARMA model. However, in most empirical work, only purely autoregressive models are considered. In fact, since the seminal work of Sims (1980), VARs have become the most prominent approach in empirical macroeconometrics. A wide range of specifications extending the basic VAR have been suggested — including time-varying parameter VARs (Canova, 1993), Markov switching or regime switching VARs (Paap and van Dijk, 2003; Koop and Potter, 2007), VARs with stochastic volatility (Cogley and Sargent, 2005; Primiceri, 2005), and factor augmented VARs (Bernanke, Boivin, and Eliasz, 2005; Korobilis, 2012), among many others (see, e.g., Koop and Korobilis, 2010) — whereas VARMAs remain underdeveloped.

This unbalanced development is perhaps due to the fact that, in contrast to VARs, identification and estimation of VARMAs are more difficult. In particular, for identification a system of nonlinear invertibility conditions has to be imposed on the VMA coefficients, which makes estimation using maximum likelihood or Bayesian methods difficult. On top of that, even simple VMAs are typically high-dimensional. To obtain the maximum likelihood estimates, iterative gradient-based or direct-search optimization routines are required; for Bayesian estimation, special Metropolis-Hastings algorithms are needed as the conditional distribution of the VMA coefficients is not of standard form. This article tackles this issue by developing an efficient Gibbs sampler to estimate the VARMA model. The significance of our contribution lies in the realization that once the basic VARMA can be efficiently estimated via the Gibbs sampler, a wide variety of generalizations, analogous to those mentioned earlier extending the basic VAR, can also be fitted easily using the machinery of Markov chain Monte Carlo (MCMC) techniques. In fact, we will discuss two generalizations of VARMA: models allowing for time-varying parameters and stochastic volatility.

Our approach draws on a few recent developments in the state space literature. First, we make use of a convenient state space representation of VARMA introduced in Metaxoglou and Smith (2007). Specifically, by using the fact that a VMA plus white noise remains a VMA (Peiris, 1988), the authors write a VARMA as a latent factor model, with the unusual feature that lagged factors also enter the current measurement equation. This model is then estimated using the EM algorithm and the Kalman filter. Our point of departure is to develop an efficient Gibbs sampler for this state space representation of VARMA. As discussed, due to the modular nature of MCMC algorithms, further generalizations can be easily fitted once we have an efficient Gibbs sampler for the basic model. Second, instead of using the Kalman filter — which involves redefining the states into a higher dimensional vector that in turn slows down estimation — we make use of the precision-based sampler to simulate the latent factors without the usual filtering and smoothing iterations (Chan and Jeliazkov, 2009; McCausland, Miller, and Pelletier, 2011).

Of course, in principle any invertible VMA can be approximated well by a sufficiently high order VAR. In practice, however, a high order VAR has too many parameters to estimate relative to the number of observations typically available for macroeconomic data. This can lead to imprecise estimation of impulse responses and poor forecast performance. In fact, some recent papers have shown that parsimonious ARMAs — in both univariate and multivariate settings — often forecast better than purely AR models (e.g., Athanasopoulos and Vahid, 2008; Chan, 2012). In our empirical application, we find that VARMAs provide better density forecasts compared to VARs, particularly for short forecast horizons.

In addition, even when a VAR is a suitable model for a particular set of variables overall, a VARMA may be more appropriate for a small subset of these variables; using a VAR in this setting might lead to loss of forecast accuracy. This perhaps explains why large and medium VARs — with the addition of strong and sensible priors — tend to forecast better than small VARs (e.g., Banbura, Giannone, and Reichlin, 2010; Koop, 2011). Since these small VARs are typically used in macroeconomic applications, an easy and quick way to estimate VARMAs would be a valuable addition to the toolkit for modeling and forecasting time series.

The rest of this article is organized as follows. Section 2 first introduces a state space representation of VARMA that facilitates efficient estimation, followed by a detailed discussion of two alternative identification strategies. We develop a Gibbs sampler for the basic VARMA in Section 3, which is extended to models with time-varying parameters and stochastic volatility in Section 4. In Section 5, the methodology is illustrated using a recursive forecasting exercise involving three macroeconomic variables. Lastly, we discuss some future research directions in Section 6.

2 A Convenient Representation of VARMA

Following Metaxoglou and Smith (2007), we consider a convenient state space representation of VARMA that allows for efficient estimation as follows. Recall that a (p, q)-th order VARMA, or VARMA(p, q), is given by

\[ y_t = \mu + \Pi_1 y_{t-1} + \cdots + \Pi_p y_{t-p} + f_t + \Psi_1 f_{t-1} + \cdots + \Psi_q f_{t-q}, \tag{1} \]

where y_t is an n × 1 vector of observations, Π_1, . . . , Π_p are n × n matrices of VAR coefficients, Ψ_1, . . . , Ψ_q are n × n matrices of VMA coefficients, and the innovations f_t are distributed iid N(0, Ω). We make use of the results in Peiris (1988), who shows that white noise added to a VMA(q) remains a VMA(q). Hence, the following is another representation of a VARMA(p, q):

\[ y_t = \mu + \Pi_1 y_{t-1} + \cdots + \Pi_p y_{t-p} + f_t + \Psi_1 f_{t-1} + \cdots + \Psi_q f_{t-q} + \varepsilon_t, \]


where ε_t are iid N(0, Σ) distributed and Σ is a diagonal matrix. Now, we reparameterize and obtain the following representation:

\[ y_t = \mu + \Pi_1 y_{t-1} + \cdots + \Pi_p y_{t-p} + \Psi_0 f_t + \Psi_1 f_{t-1} + \cdots + \Psi_q f_{t-q} + \varepsilon_t, \tag{2} \]

where Ψ_0 is a lower triangular matrix with ones on the diagonal, ε_t ∼ N(0, Σ) and f_t ∼ N(0, Ω) are independent of each other at all leads and lags, and Σ, Ω are diagonal matrices. We further assume that f_0 = f_{−1} = · · · = f_{−q+1} = 0. Of course, one can treat these initial latent factors as parameters if desired, and the estimation procedures discussed in subsequent sections can be modified easily to allow for this possibility. For typical situations where T ≫ q, whether these initial factors are modeled explicitly or not makes little difference in practice.
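To make the representation in (2) concrete, the following is a minimal simulation sketch (in Python) of a VARMA(1, 1) written in this latent-factor form, with Ψ_0 unit lower triangular, Ψ_1 lower triangular, and Σ, Ω diagonal as assumed above. All parameter values are illustrative and are not estimates from the paper.

```python
import numpy as np

# Illustrative simulation from the latent-factor form (2) with p = 1, q = 1, n = 2.
rng = np.random.default_rng(0)
T, n = 200, 2
mu = np.zeros(n)
Pi1 = np.array([[0.5, 0.1],
                [0.0, 0.4]])          # VAR coefficients Pi_1
Psi0 = np.array([[1.0, 0.0],
                 [0.3, 1.0]])         # Psi_0: unit lower triangular
Psi1 = np.array([[0.4, 0.0],
                 [0.2, 0.3]])         # Psi_1: lower triangular (Section 2.1 restriction)
Omega = np.diag([0.8, 0.6])           # Var(f_t), diagonal
Sigma = np.diag([0.2, 0.3])           # Var(eps_t), diagonal

y = np.zeros((T, n))
f_prev = np.zeros(n)                  # f_0 = 0, as assumed in the text
for t in range(1, T):
    f_t = rng.multivariate_normal(np.zeros(n), Omega)
    eps_t = rng.multivariate_normal(np.zeros(n), Sigma)
    y[t] = mu + Pi1 @ y[t - 1] + Psi0 @ f_t + Psi1 @ f_prev + eps_t
    f_prev = f_t
```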

By fixing the covariance matrix Σ, Metaxoglou and Smith (2007) discuss a one-to-one mapping between the two parameterizations in (1) and (2). With this mapping, one can then recover the estimates of the original parameters once the estimates for (2) are obtained. However, since the parameters in the VARMA(p, q) are typically not of primary interest — they are used to produce forecasts or compute impulse responses — we take a different approach and work directly with the parameterization in (2). Now, the representation in (2) may be viewed as a factor-augmented VAR (FAVAR) with the innovations f_t taking the role of unobserved factors. However, there are two unusual features that are worth commenting on. First, the number of factors is equal to the number of variables, and as such, either additional time series (that can be used to estimate the latent factors) or further identifying restrictions are required to identify the system in (2). Second, lagged latent factors f_{t−1}, . . . , f_{t−q} enter the equation for the current observations y_t, which induces serial correlation in y_t.

The key challenge in making this specification operational is to develop an identification strategy such that the system in (2) is identifiable to the extent that an efficient sampling algorithm may be readily constructed, while still allowing for rich dynamics. Essentially, with the reparameterization in (2), the identification issue that remains to be resolved is related to the fact that the system contains n more parameters than can be identified by the available moments. However, arbitrarily imposing restrictions on Ψ_0, . . . , Ψ_q, Ω, Σ is hardly innocuous. For example, consider the variance and autocovariance matrices generated by (2) for p = 0:

\[ \mathrm{Var}(y_t) = \sum_{r=0}^{q} \Psi_r \Omega \Psi_r' + \Sigma, \tag{3} \]

\[ \mathrm{Cov}(y_t, y_{t-s}) = \sum_{r=0}^{q-s} \Psi_{r+s} \Omega \Psi_r', \qquad s = 1, \ldots, q, \tag{4} \]

and Cov(y_t, y_{t−s}) = 0 for s > q. It is evident from this that the strategy of Metaxoglou and Smith (2007) — fixing Σ to be a constant matrix, say, C = (c_{ij}) — immediately implies that Var(y_{it}) > c_{ii}, where y_{it} is the i-th element of y_t. Clearly, imposing arbitrary lower bounds on the process variance runs the risk of detrimentally distorting the estimation of free parameters and subsequent forecasts; yet there is rarely convincing a priori guidance for bounding variances this way.
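As a quick illustration of how (3)-(4) constrain the parameters, the sketch below (a hypothetical helper of our own, not from the paper) computes the model-implied variance and lag-1 autocovariance of y_t for a VMA(1), which is all that is needed to reproduce the moment-counting arguments used in Section 2.1.

```python
import numpy as np

def vma1_moments(Psi0, Psi1, Omega, Sigma):
    """Model-implied moments of y_t for p = 0, q = 1, from (3)-(4):
    Var(y_t)          = Psi_0 Omega Psi_0' + Psi_1 Omega Psi_1' + Sigma,
    Cov(y_t, y_{t-1}) = Psi_1 Omega Psi_0',
    with Cov(y_t, y_{t-s}) = 0 for s > 1."""
    var_y = Psi0 @ Omega @ Psi0.T + Psi1 @ Omega @ Psi1.T + Sigma
    cov_y1 = Psi1 @ Omega @ Psi0.T
    return var_y, cov_y1
```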

In what follows, we consider two general approaches to identifying (2). The first approach is based on imposing restrictions on the elements of Ψ_1, . . . , Ψ_q when exactly n series are related to the n latent factors, as presented thus far. Alternatively, it is possible to identify the system without imposing further restrictions on the VMA coefficient matrices, as long as additional data are available that can be reasonably related to the n underlying factors. The remainder of this section describes both approaches in more detail.

2.1 Identification without Additional Time Series

As previously discussed, (2) can be hypothetically identified by appropriately fixing an additional n parameters in the matrices Ψ_1, . . . , Ψ_q. However, applying exclusion restrictions in this case is not trivial from at least two perspectives. First, arbitrarily imposing exclusion restrictions does not necessarily guarantee identification; this is clear from the simple fact that if we restrict the n^2 elements of Ψ_q to zero, the system remains unidentified with respect to n free parameters. Indeed, only certain combinations of exclusion restrictions in Ψ_1, . . . , Ψ_q lead to identifiability. For example, consider a VMA(1) with n = 4 and the following combinations of restrictions on Ψ_1:

\[
\begin{pmatrix} \cdot & \cdot & 0 & 0 \\ \cdot & \cdot & 0 & 0 \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{pmatrix},
\qquad
\begin{pmatrix} \cdot & \cdot & 0 & 0 \\ \cdot & \cdot & \cdot & 0 \\ \cdot & \cdot & \cdot & 0 \\ \cdot & \cdot & \cdot & \cdot \end{pmatrix},
\qquad
\begin{pmatrix} \cdot & \cdot & 0 & \cdot \\ \cdot & \cdot & 0 & 0 \\ \cdot & \cdot & \cdot & 0 \\ \cdot & \cdot & \cdot & \cdot \end{pmatrix},
\qquad
\begin{pmatrix} \cdot & 0 & 0 & 0 \\ \cdot & \cdot & 0 & 0 \\ \cdot & \cdot & \cdot & 0 \\ \cdot & \cdot & \cdot & \cdot \end{pmatrix},
\]

(A) \qquad (B) \qquad (C) \qquad (D)

where each · represents an unrestricted parameter. It may be tempting to conclude that schemes A–C lead to just-identified systems since they result in exactly 26 model parameters that are matched to 26 moments. Similar reasoning may likewise lead to the conclusion that scheme D results in over-identification.

Neither of the two foregoing conclusions is correct: only scheme C yields a just-identified system, while under schemes A, B, and D, a subset of the parameters is unidentified. Indeed, for scheme D, where the number of parameters is actually less than the number of available moments, Ψ_{1,44}, ω_4^2, and σ_4^2 — the (4, 4)-th elements of Ψ_1, Ω, and Σ, respectively — are not separately identifiable (i.e., only the product ω_4^2 Ψ_{1,44} and the sum ω_4^2 + σ_4^2 are identified), while the remaining parameters are over-identified. In general, any scheme involving exclusion restrictions in this context that resembles a block lower triangular Ψ_1 will be characterized by under-identification of the parameters in the lower right quadrant (along with the corresponding elements in Ω and Σ) and over-identification for the remaining parameters. For the lower triangular case represented in scheme D, the key problem is that while 26 moments are available to identify the parameters, the VMA autocovariance structure is such that Ψ_{1,44}, ω_4^2, and σ_4^2 are related to only two of these moments, namely, Var(y_{4t}) and Cov(y_{4t}, y_{4,t−1}), where y_{4t} is the 4-th element of y_t.

In this light, removing the exclusion restriction on Ψ_{1,44} serves to redistribute the three unidentified parameters across a wider range of moments and thereby provides full identification of the entire system. Alternatively, one could achieve identification with any of the schemes A, B, and D by adding an extra lag (such as to yield a VMA(2)) with an unrestricted Ψ_2. However, either of these solutions relies critically on certain sample moments not being close to 0. For example, if the data support Ψ_{1,44} ≈ 0, then "activating" Ψ_{1,44} for identification purposes is futile: identification in the system is weak and conventional problems related to computation prevail. The same is true when augmenting the VMA specification with additional lags and relying on elements of Ψ_2 to identify Ψ_{1,44}, ω_4^2, and σ_4^2.

Moreover, care must be taken to ensure that reasonable system dynamics remain intact. Continuing with the VMA(1) example, recall the autocovariances given by (4). Let Ψ_{j,kl} denote the (k, l)-th element of Ψ_j and let y_{it} represent the i-th element of y_t. It is clear that if, say, Ψ_{1,11} = 1 is imposed, then the lag-1 autocovariance of y_{1t} is Cov(y_{1t}, y_{1,t−1}) = ω_1^2 > 0. In addition, restrictions of the type Ψ_{1,21} = Ψ_{1,22} = 0 lead to Cov(y_{2t}, y_{2,t−1}) = 0, and so on. This suggests that, in general, one should avoid restricting the diagonal elements as these tend to translate to peculiar restrictions on the process autocovariances, while exclusion restrictions on the off-diagonal elements of the VMA matrices are far less imposing. Nevertheless, as previously discussed, only certain combinations of restrictions involving off-diagonal elements achieve identification.

Clearly, in a general case with an arbitrarily large n, finding the right combination of exclusion restrictions that yields a just-identified system while retaining reasonable system dynamics is a daunting task. On the other hand, in applications where the MA coefficients themselves are not of primary interest (such as when the end-goal is forecasting), exact identification of model parameters is not of central concern; to the extent that one succeeds in constructing an efficient sampling algorithm, the fact that the data are not informative regarding certain (nuisance) parameters is irrelevant. This view has been advocated, for example, by Stock and Watson (2002).

Consequently, one may wish to consider a practical alternative strategy based on partial identification. To be precise, for any VARMA(p, q) we restrict Ψ_1 to be lower triangular, while allowing Ψ_2, . . . , Ψ_q to be full matrices. Then, the system is (at worst) partially identified in the sense that Ψ_{1,nn}, ω_n^2, and σ_n^2 might not be separately identifiable, but the remaining parameters are over-identified. The reasoning follows as an extension of the discussion of the n = 4 case above. Intuitively, for q = 1 each additional data series y_{it} allows for the identification (and over-identification) of the parameters in the existing measurement equations (i.e., those related to y_{1t}, . . . , y_{i−1,t}), but adds a new set of parameters Ψ_{1,ii}, ω_i^2, and σ_i^2 that cannot be individually identified, apart from the quantities ω_i^2 Ψ_{1,ii} and ω_i^2 + σ_i^2. Therefore, regardless of the system size, only three parameters provide cause for possible concern in terms of sampling efficiency, and this can be dealt with by specifying a reasonably informative prior for Ψ_{1,nn}. For q > 1, the identification/computation problem is potentially resolved through the effect of Ψ_2, . . . , Ψ_q on the overall covariance structure, provided that their sample moments are not close to zero.

Aside from the practical simplicity of the partial identification strategy, another advantage of specifying a lower triangular Ψ_1 may be perceived in terms of parsimony. This may be of particular value especially in the context of forecasting. A potential drawback of this scheme is that it restricts the space of autocovariances that can be represented, and it accentuates the importance of the variable ordering. A consequence of the latter, for example, is that forecasting y_{1t} with a VMA(1) and Ψ_1 lower triangular is equivalent to forecasting it with a univariate MA. Regarding the implications on process autocovariances, this scheme yields only mild restrictions. The simplest example of these (albeit not the only one) is

\[ \mathrm{Cov}(y_{1t}, y_{it})^2 \geq 4\, \mathrm{Cov}(y_{1t}, y_{i,t-1})\, \mathrm{Cov}(y_{it}, y_{1,t-1}), \]

for i > 1. However, in many contexts such restrictions will not detract significantly from the flexibility of the model and (2) may still be regarded as a close approximation to the Wold decomposition of the multivariate series. In our empirical application, the merit of this (partial) identification scheme can be assessed by the forecast performance of the VARMAs compared to standard VARs.

2.2 Identification with Additional Time Series

Alternatively, suppose we obtain an additional m time series that can be related to y_t and f_t. Denoting these additional variables as z_t, we can specify an extended version of (2) as

\[
\begin{pmatrix} y_t \\ z_t \end{pmatrix}
=
\begin{pmatrix} \mu \\ \mu_z \end{pmatrix}
+
\begin{pmatrix} \Pi_1 & 0 \\ \Pi_1^{zy} & \Pi_1^{zz} \end{pmatrix}
\begin{pmatrix} y_{t-1} \\ z_{t-1} \end{pmatrix}
+ \cdots +
\begin{pmatrix} \Pi_p & 0 \\ \Pi_p^{zy} & \Pi_p^{zz} \end{pmatrix}
\begin{pmatrix} y_{t-p} \\ z_{t-p} \end{pmatrix}
+
\begin{pmatrix} \Psi_0 \\ \Psi_0^{z} \end{pmatrix} f_t
+
\begin{pmatrix} \Psi_1 \\ \Psi_1^{z} \end{pmatrix} f_{t-1}
+ \cdots +
\begin{pmatrix} \Psi_q \\ \Psi_q^{z} \end{pmatrix} f_{t-q}
+
\begin{pmatrix} \varepsilon_t \\ \varepsilon_t^{z} \end{pmatrix}, \tag{5}
\]

where ε_t^z ∼ N(0, Σ_z) is independent of ε_t. Recall that Σ and Ω are diagonal, and Ψ_0 is lower triangular with ones on the diagonal. Since we have m + n variables but only n factors, it is clear that for any n, p, and q the system in (5) is identified for a sufficiently large m, without the need for further restrictions on the remaining parameters. Nevertheless, introducing an additional m time series in this manner leads to m(mp + n(p + q + 1) + 2) additional parameters to estimate, which raises concerns of parameter proliferation and over-fitting.

In developing this identification strategy, therefore, it may be of interest to design a more parsimonious way in which the additional time series z_t are related to the variables of interest y_t and the factors f_t. This essentially involves setting some of the newly introduced parameters to zero. To that end, one reasonable approach may be to specify

\[ z_t = \mu_z + \Pi_1^{zy} y_{t-1} + \Psi_1^{z} f_{t-1} + \varepsilon_t^{z}, \qquad \varepsilon_t^{z} \sim N(0, \Sigma_z), \tag{6} \]


which leads to the FAVAR specification of Bernanke et al. (2005).¹ Consequently, as long as z_t represents relevant "information time series", such a specification is straightforward to implement. The implication of this is that the FAVAR model embeds an unrestricted VARMA(p, q) process for y_t and a simple static factor process for z_t (i.e., following (4), the covariances Cov(z_t, z_{t−s}) = 0 for all s > 1), and therefore, informational time series are only being used to "estimate" the factors in the simplest possible way. Of course, the challenge is to find a sufficiently large set of additional data z_t that fits the above specification.

¹ To be precise, Bernanke et al. (2005, equation 2.2) relate z_t to y_t and f_t contemporaneously, while (6) appears to match z_t to the previous period outcomes y_{t−1} and f_{t−1}. The two specifications are nevertheless equivalent since, in practice, the data in z_t can be shifted one period forward to achieve the appropriate temporal matching.

Further, we note that the above formulation may be adopted in developing a methodology for estimating medium/large VARMAs. In particular, we may consider starting with (2), for a fairly large n, but instead of adding informational time series to achieve proper identification, we reduce the number of factors to r ≪ n (as proposed in Eisenstat, 2011). Following the same line of reasoning, restricting the upper r × r block of Ψ_0 to be lower triangular (with ones on the diagonal) leads to (over-)identification without the need for additional parameter restrictions. Moreover, because the covariances given in (4) continue to hold for r < n, in general, allowing the remaining parameters to be estimated freely implies a VARMA(p, q) process for the entire set of time series y_t.

The intuition behind this approach is that the reduction in the number of factors simply leads to certain interrelationships between the VMA coefficients in the underlying VARMA process. One may regard it, therefore, as a parsimonious specification of a large VARMA model where reasonable a priori restrictions are imposed on the VMA coefficients, in the same spirit that a typical factor model interrelates the elements of the contemporaneous covariances for a set of related time series. Consequently, to estimate such a specification in practice, it only remains to impose sufficient a priori information on the autoregressive parameters by way of strong, sensible priors. To this end, a number of methods — such as those explored in Litterman (1986), Banbura et al. (2010), and Koop (2011), among others — have been demonstrated to perform well in large VAR settings and can be readily adopted here as well. However, we leave the parsimonious formulation of medium/large VARMA models for future research.

3 Posterior Analysis

In what follows, we assume that we do not have additional "information time series" and pursue the identification strategy discussed in Section 2.1. Specifically, we assume that Ψ_1 is lower triangular whereas Ψ_2, . . . , Ψ_q are unrestricted. Estimation with additional information time series as in (5) can proceed similarly with only minor modifications. First, we introduce the conjugate priors, followed by a discussion of an efficient Gibbs sampler for the VARMA(p, q) in (2). We note that throughout, the analysis is performed conditional on the initial observations y_0, . . . , y_{1−p}. Again, one can extend the posterior sampler to the case where the initial observations are modeled explicitly. To facilitate estimation, we first rewrite (2) as

\[ y_t = X_t \beta + \Psi_0 f_t + \Psi_1 f_{t-1} + \cdots + \Psi_q f_{t-q} + \varepsilon_t, \]

where X_t = I_n ⊗ (1, y_{t−1}′, . . . , y_{t−p}′), β = vec((µ, Π_1, . . . , Π_p)′), ε_t ∼ N(0, Σ), and f_t ∼ N(0, Ω). Now, stacking the observations over t, we have

\[ y = X\beta + \Psi f + \varepsilon, \tag{7} \]

where

\[ X = \begin{pmatrix} X_1 \\ \vdots \\ X_T \end{pmatrix}, \qquad f = \begin{pmatrix} f_1 \\ \vdots \\ f_T \end{pmatrix}, \qquad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_T \end{pmatrix}, \]

and Ψ is a Tn × Tn lower triangular matrix with Ψ_0 on the main diagonal block, Ψ_1 on the first lower diagonal block, Ψ_2 on the second lower diagonal block, and so forth. For example, for q = 2, we have

\[
\Psi =
\begin{pmatrix}
\Psi_0 & 0 & 0 & 0 & \cdots & 0 \\
\Psi_1 & \Psi_0 & 0 & 0 & \cdots & 0 \\
\Psi_2 & \Psi_1 & \Psi_0 & 0 & \cdots & 0 \\
0 & \Psi_2 & \Psi_1 & \Psi_0 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & \Psi_2 & \Psi_1 & \Psi_0
\end{pmatrix}.
\]

It is important to note that in general Ψ is a sparse Tn × Tn matrix that contains at most

\[ n^2 \left( (q+1)T - \frac{q(q+1)}{2} \right) < n^2 (q+1) T \]

non-zero elements, which grows linearly in T and is substantially less than the total (Tn)^2 elements for typical applications where T ≫ q. This special structure can be exploited to speed up computation, e.g., by using block-banded or sparse matrix algorithms (see, e.g., Kroese, Taimre, and Botev, 2011, p. 220).
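As an illustration of how this structure can be exploited, the following sketch (Python with scipy.sparse; the helper name and interface are ours, not the paper's) assembles Ψ as a sparse block-banded matrix directly from the blocks Ψ_0, . . . , Ψ_q, so that storage grows linearly in T.

```python
import numpy as np
import scipy.sparse as sp

def build_Psi(Psi_blocks, T):
    """Assemble the Tn x Tn block-banded matrix Psi from the n x n blocks
    (Psi_0, ..., Psi_q); block Psi_j sits on the j-th lower block diagonal."""
    n = Psi_blocks[0].shape[0]
    rows, cols, vals = [], [], []
    for j, Pj in enumerate(Psi_blocks):            # j = 0, ..., q
        r, c = np.nonzero(Pj)                      # store only non-zero entries
        for t in range(j, T):                      # block row t, block column t - j
            rows.append(t * n + r)
            cols.append((t - j) * n + c)
            vals.append(Pj[r, c])
    return sp.csc_matrix(
        (np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
        shape=(T * n, T * n),
    )
```

With Ψ stored this way, matrix-vector products and products such as Ψ(I_T ⊗ Ω)Ψ′ involve work that grows linearly in T rather than quadratically.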

To complete the model specification, we assume independent priors for β, Ψ_0, Ψ_1, . . . , Ψ_q, Σ, and Ω as follows. For β, we consider the multivariate normal prior N(β_0, V_β). Let ψ_i denote the free parameters in the i-th column of (Ψ_0, Ψ_1, . . . , Ψ_q)′. For example, if we impose the condition that Ψ_1 is lower triangular, we have

\[ \psi_i = (\Psi_{0,i1}, \ldots, \Psi_{0,i,i-1}, \Psi_{1,i1}, \ldots, \Psi_{1,ii}, \Psi_{2,i1}, \ldots, \Psi_{2,in}, \ldots, \Psi_{q,i1}, \ldots, \Psi_{q,in})', \]

where Ψ_{j,kl} is the (k, l)-th element of Ψ_j (details related to a more general set of restrictions are covered in the Appendix). We assume the normal prior N(ψ_{0i}, V_{ψ_i}) for ψ_i. For later reference we further let ψ = (ψ_1′, . . . , ψ_n′)′. Finally, for the diagonal elements of Σ = diag(σ_1^2, . . . , σ_n^2) and Ω = diag(ω_1^2, . . . , ω_n^2), we assume the following independent inverse-gamma priors:

\[ \sigma_i^2 \sim \mathrm{IG}(\nu_{\sigma i}, S_{\sigma i}), \qquad \omega_i^2 \sim \mathrm{IG}(\nu_{\omega i}, S_{\omega i}). \]

The Gibbs sampler proceeds by sequentially drawing from

1. p(β, f | y, ψ, Σ, Ω);

2. p(ψ | y, β, f, Σ, Ω);

3. p(Σ, Ω | y, β, f, ψ) = p(Σ | y, β, f, ψ) p(Ω | y, β, f, ψ).

Steps 2 and 3 are standard, and we leave the details to the Appendix; here we focus on Step 1. Although it might be straightforward to sample β and f separately by simulating (β | y, f, ψ, Σ, Ω) followed by (f | y, β, ψ, Σ, Ω), such an approach would potentially induce high autocorrelation and slow mixing in the constructed Markov chain as β and f enter (7) additively. Instead, we aim to sample β and f jointly — by first drawing (β | y, ψ, Σ, Ω) marginal of f, followed by (f | y, β, ψ, Σ, Ω).

To implement the first step, we note that since f ∼ N(0, I_T ⊗ Ω), the joint density of y marginal of f is

\[ (y \mid \beta, \psi, \Sigma, \Omega) \sim N(X\beta, S_y), \]

where S_y = I_T ⊗ Σ + Ψ(I_T ⊗ Ω)Ψ′. Since both Σ and Ω are diagonal matrices, and Ψ is a lower triangular sparse matrix, the covariance matrix S_y is also sparse. Using standard results from linear regression (see, e.g., Koop, 2003, p. 119-121), we have

\[ (\beta \mid y, \psi, \Sigma, \Omega) \sim N(\widehat{\beta}, D_\beta), \]

where

\[ D_\beta = \left( V_\beta^{-1} + X' S_y^{-1} X \right)^{-1}, \qquad \widehat{\beta} = D_\beta \left( V_\beta^{-1} \beta_0 + X' S_y^{-1} y \right). \]

It is important to realize that in order to compute X′S_y^{-1}X or X′S_y^{-1}y, one need not obtain the Tn × Tn matrix S_y^{-1} — otherwise it would involve O(T^3) operations. Instead, we exploit the fact that the covariance matrix S_y is sparse. For instance, to obtain X′S_y^{-1}y, we can first solve the system S_y z = y for z, which can be computed in O(T) operations. The solution is z = S_y^{-1}y and we return X′z = X′S_y^{-1}y, which is the desired quantity. Similarly, X′S_y^{-1}X can be computed quickly without inverting any big matrices.
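A sketch of this step in Python is given below (our own illustrative helper, not the authors' code); it forms S_y from the sparse Ψ built earlier, uses a sparse LU factorization in place of an explicit inverse, and then draws β from N(β̂, D_β).

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def draw_beta(y, X, Psi, Sigma_diag, Omega_diag, V_beta_inv, beta0, rng):
    """Draw beta | y, psi, Sigma, Omega marginal of f.
    y is the stacked Tn-vector, X the stacked Tn x k regressor matrix,
    Psi the sparse Tn x Tn MA matrix, and Sigma_diag, Omega_diag the
    n-vectors of diagonal elements of Sigma and Omega."""
    T = y.shape[0] // Sigma_diag.shape[0]
    S_y = sp.diags(np.tile(Sigma_diag, T)) + Psi @ sp.diags(np.tile(Omega_diag, T)) @ Psi.T
    lu = splu(S_y.tocsc())                                  # sparse LU; never form S_y^{-1}
    Sy_inv_X = lu.solve(np.asarray(X, dtype=float))         # S_y^{-1} X
    Sy_inv_y = lu.solve(np.asarray(y, dtype=float))         # S_y^{-1} y
    D_beta = np.linalg.inv(V_beta_inv + X.T @ Sy_inv_X)     # posterior covariance
    beta_hat = D_beta @ (V_beta_inv @ beta0 + X.T @ Sy_inv_y)
    return rng.multivariate_normal(beta_hat, D_beta)
```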

Next, we sample all the latent factors f jointly. Note that even though a priori the latent factors are independent, they are no longer independent given y. As such, sampling each f_t sequentially would potentially induce high autocorrelation and slow mixing in the Markov chain. One could sample f using Kalman filter-based algorithms, but they would involve redefining the states so that only the state at time t enters the measurement equation at time t. As such, each (new) state vector would be of much higher dimension, which in turn results in slower algorithms. Instead, we avoid the Kalman filter and implement the precision-based sampler developed in Chan and Jeliazkov (2009) to sample the latent factors jointly. To that end, note that a priori f ∼ N(0, I_T ⊗ Ω). Using (7) and standard linear regression results again, we have

\[ (f \mid y, \beta, \psi, \Sigma, \Omega) \sim N(\widehat{f}, K_f^{-1}), \]

where

\[ K_f = I_T \otimes \Omega^{-1} + \Psi'(I_T \otimes \Sigma^{-1})\Psi, \qquad \widehat{f} = K_f^{-1} \Psi'(I_T \otimes \Sigma^{-1})(y - X\beta). \]

The challenge, of course, is that the covariance matrix K_f^{-1} is a Tn × Tn full matrix, and sampling (f | y, β, ψ, Σ, Ω) using brute force is infeasible. However, the precision matrix K_f is sparse (recall that both Σ^{-1} and Ω^{-1} are diagonal, whereas Ψ is also sparse), which can be exploited to speed up computation. As before, we first obtain the mean f̂ by solving

\[ K_f z = \Psi'(I_T \otimes \Sigma^{-1})(y - X\beta) \]

for z. Next, obtain the Cholesky decomposition of K_f such that C_f C_f′ = K_f. Solve C_f′ z = u for z, where u ∼ N(0, I_{Tn}). Finally, return f = f̂ + z, which follows the desired distribution. We refer the readers to Chan and Jeliazkov (2009) for details.
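The sketch below illustrates the precision-based draw (again our own illustrative code, using dense linear algebra for clarity; the point of the paper is that K_f is banded, so a sparse or banded Cholesky factorization would make the same steps run in O(T) time).

```python
import numpy as np
from scipy.linalg import cho_solve, solve_triangular

def draw_f(y, X, beta, Psi, Sigma_diag, Omega_diag, rng):
    """Joint draw of f | y, beta, psi, Sigma, Omega via its precision matrix.
    Psi is the stacked Tn x Tn MA matrix (a dense array in this sketch);
    Sigma_diag and Omega_diag hold the n diagonal elements of Sigma and Omega."""
    n = Sigma_diag.shape[0]
    T = y.shape[0] // n
    Sig_inv = np.tile(1.0 / Sigma_diag, T)                       # diag of I_T (x) Sigma^{-1}
    Om_inv = np.tile(1.0 / Omega_diag, T)                        # diag of I_T (x) Omega^{-1}
    K_f = np.diag(Om_inv) + Psi.T @ (Sig_inv[:, None] * Psi)     # precision K_f
    rhs = Psi.T @ (Sig_inv * (y - X @ beta))
    C = np.linalg.cholesky(K_f)                                  # K_f = C C'
    f_hat = cho_solve((C, True), rhs)                            # mean: K_f f_hat = rhs
    u = rng.standard_normal(T * n)
    z = solve_triangular(C.T, u, lower=False)                    # C' z = u => z ~ N(0, K_f^{-1})
    return f_hat + z
```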

The Gibbs sampler developed above can be extended in a straightforward manner to estimate various generalizations of the basic VARMA(p, q). This is what we turn to in the next section.

4 Extensions

In this section we discuss two extensions of the basic VARMA(p, q) relevant to empirical macroeconomic applications. First, we relax the assumption of constant variance and incorporate stochastic volatility into the model. Specifically, consider again the VARMA(p, q) specified in (7). Now, we allow the latent factors to have time-varying volatilities f_t ∼ N(0, Ω_t), where Ω_t = diag(e^{h_{1t}}, . . . , e^{h_{nt}}), and each of the log-volatilities follows an independent random walk:

\[ h_{it} = h_{i,t-1} + \zeta_{it}, \tag{8} \]

where ζ_{it} ∼ N(0, σ_{h_i}^2) for i = 1, . . . , n, t = 2, . . . , T. The log-volatilities are initialized with h_{i1} ∼ N(h_{i0}, V_{h_{i0}}), where h_{i0} and V_{h_{i0}} are known constants. For notational convenience, let h_t = (h_{1t}, . . . , h_{nt})′, h = (h_1′, . . . , h_T′)′ and σ_h^2 = (σ_{h1}^2, . . . , σ_{hn}^2)′. Estimation of this VARMA(p, q) with stochastic volatility would only require minor modifications of the basic Gibbs sampler developed above. More specifically, posterior draws can be obtained by sequentially sampling from

1. p(β, f | y, ψ, h, σ_h^2, Σ);

2. p(ψ | y, β, f, h, σ_h^2, Σ);

3. p(h | y, β, f, ψ, σ_h^2, Σ);

4. p(Σ, σ_h^2 | y, β, f, h, ψ) = p(Σ | y, β, f, h, ψ) p(σ_h^2 | y, β, f, h, ψ).

Steps 1 and 2 proceed as before with minor modifications — e.g., the prior covariance matrix for f is now diag(e^{h_{11}}, . . . , e^{h_{n1}}, . . . , e^{h_{1T}}, . . . , e^{h_{nT}}) instead of I_T ⊗ Ω. But since the covariance matrix remains diagonal, all the sparse matrix routines are as fast as before. Step 3 can be carried out, for example, using the auxiliary mixture sampling approach of Kim, Shephard, and Chib (1998). Lastly, Step 4 is also standard if we assume independent inverse-gamma priors for σ_{h1}^2, . . . , σ_{hn}^2.
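To make the modification to Step 1 concrete, the following sketch (with illustrative values that are not taken from the paper) simulates the random-walk log-volatilities in (8) and forms the diagonal of the time-varying prior covariance of f that replaces I_T ⊗ Ω in the precision-based draw.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 200, 3
sigma2_h = np.full(n, 0.1)                 # random-walk innovation variances (illustrative)
h = np.zeros((T, n))
h[0] = rng.normal(0.0, 1.0, n)             # h_{i1} ~ N(h_{i0}, V_{h_i0}); here h_{i0}=0, V=1
for t in range(1, T):
    h[t] = h[t - 1] + rng.normal(0.0, np.sqrt(sigma2_h))
# Diagonal of Var(f) = diag(exp(h_11), ..., exp(h_nT)); this vector replaces
# np.tile(Omega_diag, T) in Step 1 of the basic sampler.
Omega_t_diag = np.exp(h).reshape(-1)
```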

The second extension that can be easily estimated is a time-varying parameter variant. For example, suppose we allow the VMA coefficient matrices to be time-varying:

\[ y_t = X_t \beta + \Psi_{0t} f_t + \Psi_{1t} f_{t-1} + \cdots + \Psi_{qt} f_{t-q} + \varepsilon_t, \tag{9} \]

where X_t and β are defined as before, ε_t ∼ N(0, Σ) and Σ is a diagonal matrix. Let ψ_{it} denote the vector of free parameters in the i-th column of (Ψ_{0t}, Ψ_{1t}, . . . , Ψ_{qt})′. Then, consider the transition equation

\[ \psi_{it} = \psi_{i,t-1} + \eta_{it}, \]

where η_{it} ∼ N(0, Q_i) for i = 1, . . . , n, t = 2, . . . , T. The initial conditions are specified as ψ_{i1} ∼ N(ψ_{i0}, Q_{i0}), where ψ_{i0} and Q_{i0} are known constants.

Since Σ is a diagonal matrix and each (ψ_{i1}′, . . . , ψ_{iT}′)′ is defined by a linear Gaussian state space model, sampling from the relevant conditional distribution can be done quickly by the precision-based sampler. In addition, sampling all the other parameters can be carried out as in the basic sampler with minor modifications. For example, to sample (β, f) jointly, note that we can stack (9) over t and write

\[ y = X\beta + \Psi f + \varepsilon, \]

where

\[ X = \begin{pmatrix} X_1 \\ \vdots \\ X_T \end{pmatrix}, \qquad f = \begin{pmatrix} f_1 \\ \vdots \\ f_T \end{pmatrix}, \qquad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_T \end{pmatrix}, \]

and Ψ is a Tn × Tn lower triangular matrix with Ψ_{01}, . . . , Ψ_{0T} on the main diagonal blocks, Ψ_{12}, . . . , Ψ_{1T} on the first lower diagonal blocks, Ψ_{23}, . . . , Ψ_{2T} on the second lower diagonal blocks, and so forth. The algorithms discussed before can then be applied directly.


5 Empirical Application

In this section we illustrate the proposed approach and estimation methods with a recursive forecasting exercise that involves U.S. GDP growth, inflation, and interest rate. These variables are commonly used in forecasting (e.g., Banbura et al., 2010; Koop, 2011) and small DSGE models (e.g., An and Schorfheide, 2007). We first outline the set of competing models in Section 5.1, followed by a brief description of the data and the priors in Section 5.2. The results of the density forecasting exercise are reported in Section 5.3.

5.1 Competing Models

The main goal of this forecasting exercise is to illustrate the methodology and investigate how VARMAs and the variants with stochastic volatility compare with standard VARs. To that end, we consider three sets of models: VAR(p), VARMA(p, 1), and VARMA(p, 1) with stochastic volatility. We investigate three cases where p = 2, 3, 4. First, consider the following standard VAR(p):

\[ y_t = X_t \beta + \xi_t, \]

where X_t = I_n ⊗ (1, y_{t−1}′, . . . , y_{t−p}′), β = vec((µ, Π_1, . . . , Π_p)′), ξ_t ∼ N(0, Ξ), and Ξ is a full n × n covariance matrix. VARMA(p, 1) is the same as given in (2), and we impose the identification restrictions discussed in Section 2.1. For the purpose of this exercise, we restrict Ψ_1 to be lower triangular. Finally, for the VARMA variant with stochastic volatility, we assume f_t ∼ N(0, Ω_t), where Ω_t = diag(ω_1^2, e^{h_{2t}}, ω_3^2). The log-volatility follows a random walk:

\[ h_{2t} = h_{2,t-1} + \zeta_{2t}, \]

where ζ_{2t} ∼ N(0, σ_{h2}^2) for t = 2, . . . , T, and the transition equation is initialized with h_{21} ∼ N(h_{20}, V_{h20}). In other words, we allow the latent factor corresponding to the inflation equation to have stochastic volatility, but not the other factors. This way we can incorporate time-varying volatility, which is found to be empirically important for forecasting inflation, while avoiding parameter proliferation that might result in overfitting and poor forecast performance.

5.2 Data and Priors

The data consist of U.S. quarterly GDP growth, CPI inflation, and the effective Fed funds rate from 1959:Q1 to 2011:Q4. More specifically, given the quarterly real GDP series w_{1t}, we transform it via y_{1t} = 400 log(w_{1t}/w_{1,t−1}) to obtain the growth rate. We perform a similar transformation to the CPI index to get the inflation rate, whereas the effective Fed funds rate series is kept unchanged. For easy comparison, we choose broadly similar priors across models. For instance, the priors for the VAR coefficients in VARMA specifications are exactly the same as those of the corresponding VAR.
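For reference, the transformation of a level series into an annualized quarter-on-quarter growth rate can be written as a one-line helper (our own, for illustration):

```python
import numpy as np

def growth_rate(w):
    """Annualized quarterly growth rate, y_t = 400 * log(w_t / w_{t-1});
    the first observation is lost to differencing."""
    w = np.asarray(w, dtype=float)
    return 400.0 * np.diff(np.log(w))
```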


As discussed in Section 3, we assume the following independent priors: β ∼ N(β_0, V_β), ψ_i ∼ N(ψ_{0i}, V_{ψ_i}), σ_i^2 ∼ IG(ν_{σi}, S_{σi}), and ω_i^2 ∼ IG(ν_{ωi}, S_{ωi}), i = 1, . . . , n. We set β_0 = 0 and set the prior covariance V_β to be diagonal, where the variances associated with the intercepts are 100 and those corresponding to the VAR coefficients are 1. For ψ_i, we set ψ_{0i} = 0, and set V_{ψ_i} such that the variances associated with the elements in Ψ_0 are 1 and those associated with the elements in Ψ_1, . . . , Ψ_q are all 0.25^2. We choose relatively small values for the degrees of freedom parameters, which imply relatively non-informative priors: ν_{σi} = ν_{ωi} = 5, i = 1, . . . , n. For the scale parameters, we set S_{σi} = S_{ωi} = 4, i = 1, . . . , n. These values imply E(σ_i^2) = E(ω_i^2) = 1, i = 1, . . . , n. For VARMAs with stochastic volatility, we also need to specify a prior for σ_{h2}^2. We assume σ_{h2}^2 ∼ IG(ν_h, S_h), where ν_h = 10 and S_h = 0.9, which implies E(σ_{h2}^2) = 0.1.

5.3 Forecasting Results

To compare the performance of the competing models in producing density forecasts, we consider a recursive out-of-sample forecasting exercise at various horizons as follows. At the t-th iteration, for each of the models we use data up to time t, denoted as y_{1:t}, to estimate the joint predictive density p(y_{t+k} | y_{1:t}) under the model as the k-step-ahead density forecast for y_{t+k}. We then expand the sample using data up to time t + 1, and repeat the whole exercise. We continue this procedure until time T − k. At the end of the iterations, we obtain density forecasts under the competing models for t = t_0, . . . , T − k, where we set t_0 to be 1970:Q1.

We note that the joint predictive density p(y_{t+k} | y_{1:t}) is not available analytically, but it can be estimated using MCMC methods. For VARs and VARMAs, the conditional density of y_{t+k} given the data and the model parameters — denoted as p(y_{t+k} | y_{1:t}, θ) — is Gaussian with known mean vector and covariance matrix. Hence, the predictive density can be estimated by averaging p(y_{t+k} | y_{1:t}, θ) over the MCMC draws of θ. For VARMAs with stochastic volatility, at every MCMC iteration given the model parameters and all the states up to time t, we simulate future log-volatilities from time t + 1 to t + k using the transition equation. Given these draws, y_{t+k} has a Gaussian density. Finally, these Gaussian densities are averaged over the MCMC iterations to obtain the joint predictive density p(y_{t+k} | y_{1:t}).

To evaluate the quality of the joint density forecast, consider the predictive likelihood p(y_{t+k} = y^o_{t+k} | y_{1:t}), i.e., the joint predictive density of y_{t+k} evaluated at the observed value y^o_{t+k}. Intuitively, if the actual outcome y^o_{t+k} is unlikely under the density forecast, the value of the predictive likelihood will be small, and vice versa. We then evaluate the joint density forecasts using the sum of log predictive likelihoods as a metric:

\[ \sum_{t=t_0}^{T-k} \log p(y_{t+k} = y^o_{t+k} \mid y_{1:t}). \]

This measure can also be viewed as an approximation of the log marginal likelihood; see, e.g., Geweke and Amisano (2011) for a more detailed discussion. In addition to assessing the joint density forecasts, we also evaluate the performance of the models in terms of the forecasts of each individual series. For example, to evaluate the performance for forecasting the i-th component of y_{t+k}, we simply replace the joint density p(y_{t+k} = y^o_{t+k} | y_{1:t}) by the marginal density p(y_{i,t+k} = y^o_{i,t+k} | y_{1:t}), and report the corresponding sum.
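A sketch of this computation (our own illustrative helper) averages the Gaussian conditional densities over the retained MCMC draws, using the log-sum-exp trick for numerical stability:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_predictive_likelihood(y_obs, pred_means, pred_covs):
    """Estimate log p(y_{t+k} = y_obs | y_{1:t}) by averaging the conditional
    Gaussian densities p(y_{t+k} | y_{1:t}, theta) over MCMC draws of theta
    (and, for the SV models, over simulated future log-volatilities)."""
    log_dens = np.array([multivariate_normal.logpdf(y_obs, mean=m, cov=C)
                         for m, C in zip(pred_means, pred_covs)])
    m_max = log_dens.max()                       # log-sum-exp for stability
    return m_max + np.log(np.mean(np.exp(log_dens - m_max)))
```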

Table 1 reports the performance of the nine competing models for producing 1-quarter-ahead joint density forecasts, as well as the marginal density forecasts for the three components. The figures are presented relative to those of the VAR(2) benchmark. Hence, any positive values indicate better forecast performance than the benchmark and vice versa. A few broad conclusions can be drawn from the results. Overall, adding a moving average component improves the forecast performance of standard VARs for all lag lengths considered. For example, the difference between the log predictive likelihoods of VAR(3) and VARMA(3,1) is 15.5, which may be interpreted as a Bayes factor of about 5.4 × 10^6 in favor of the VARMA(3,1) model. In addition, adding stochastic volatility to the models further improves their forecast performance substantially. This is in line with the large literature on inflation forecasting that shows the considerable benefits of allowing for stochastic volatility for both point and density forecasts (see, e.g., Stock and Watson, 2007; Clark and Doh, 2011; Chan, Koop, Leon-Gonzalez, and Strachan, 2012).

Table 1: Relative log predictive likelihoods for 1-quarter-ahead density forecasts.

                   GDP    Inflation   Interest    Joint
VAR(2)             0.0       0.0         0.0        0.0
VARMA(2,1)        -0.1       3.8         8.8       12.0
VARMA(2,1)-SV     -1.3      12.4        38.6       33.2
VAR(3)             2.7       5.0         4.6       10.3
VARMA(3,1)         3.3      12.5        13.3       25.8
VARMA(3,1)-SV      3.1      21.5        37.5       45.8
VAR(4)             0.4       4.5         1.8        3.1
VARMA(4,1)         0.9      10.8         9.4       17.2
VARMA(4,1)-SV      0.8      32.9        35.9       56.1

Next, to investigate the source of differences in forecast performance, we look at the log predictive likelihoods for each series. The results suggest that the gain in adding the moving average and stochastic volatility components comes mainly from forecasting the two nominal variables — inflation and interest rate — better. In fact, all the nine models perform very similarly in forecasting GDP growth — only models with p = 3 lags do a little better than the rest.

Tables 2–3 present the results for 2- and 3-quarter-ahead density forecasts, respectively. Overall, VARMAs with stochastic volatility perform best, whereas VARMAs with constant variance do not perform as well as VARs. For example, for the 2-quarter-ahead forecasts, VARMAs are better at forecasting inflation compared to VARs, but they do worse at forecasting GDP growth and interest rate, and overall as well. These results are perhaps not surprising, as the moving average components in VARMAs are of first order, and they are not designed to produce long horizon forecasts. Despite this, VARMAs with stochastic volatility perform better than VARs in all cases, highlighting the importance of allowing for time-varying volatility.

Table 2: Relative log predictive likelihoods for 2-quarter-ahead density forecasts.

                   GDP    Inflation   Interest    Joint
VAR(2)             0.0       0.0         0.0        0.0
VARMA(2,1)        -6.2       8.4       -12.9      -10.1
VARMA(2,1)-SV     -7.2      15.6        20.8       22.9
VAR(3)             1.7      -4.9         7.3        7.1
VARMA(3,1)        -3.0       9.2       -11.6       -0.6
VARMA(3,1)-SV     -2.7      19.9        17.3       32.6
VAR(4)             1.4      -1.3         9.8        5.8
VARMA(4,1)        -2.3      12.1        -5.3        8.2
VARMA(4,1)-SV     -3.4      23.1        12.9       27.1

Table 3: Relative log predictive likelihoods for 3-quarter-ahead density forecasts.

                   GDP    Inflation   Interest    Joint
VAR(2)             0.0       0.0         0.0        0.0
VARMA(2,1)        -3.2     -14.5       -27.2      -35.6
VARMA(2,1)-SV     -4.2      -1.0         7.8        2.0
VAR(3)             1.0     -11.4        12.4        4.5
VARMA(3,1)        -7.8      -6.6       -18.9      -24.6
VARMA(3,1)-SV     -8.0       5.9         8.6       13.4
VAR(4)             0.0      -7.3        13.3        0.5
VARMA(4,1)       -11.1      -5.3       -16.8      -24.1
VARMA(4,1)-SV    -10.5       4.0         1.5        3.6

Figures 1–3 allow us to investigate the forecast performance of the competing models in more detail by plotting the cumulative sums of log predictive likelihoods over the whole evaluation period. For 1-quarter-ahead forecasts, it is clear that overall VARMAs — with or without stochastic volatility — consistently perform better than VARs. It is also clear that allowing for stochastic volatility is important, especially after the early 1990s.


[Figure 1 (two panels): cumulative sums of log predictive likelihoods over 1970–2010; one panel compares VAR(3), VARMA(3,1), and VARMA(3,1)-SV, the other VAR(4), VARMA(4,1), and VARMA(4,1)-SV.]

Figure 1: Cumulative sums of log predictive likelihoods for jointly forecasting GDP growth, inflation, and interest rate relative to VAR(2); one-quarter-ahead forecasts.

Figures 2 and 3 also reveal some interesting patterns over time. For instance, VARMAs perform better than VARs from the beginning of the evaluation period up to around the mid-1980s — the start of the great moderation. After that their performance gradually deteriorates relative to VARs — with the constant-volatility variants having a steeper decline — until about the financial crisis in 2008, when VARMAs evidently compare more favorably against VARs. This observation suggests that VARMAs would be a valuable addition for producing combination density forecasts, especially when the portfolio of forecasts consists of only VARs.

[Figure 2 (two panels): cumulative sums of log predictive likelihoods over 1970–2010; one panel compares VAR(3), VARMA(3,1), and VARMA(3,1)-SV, the other VAR(4), VARMA(4,1), and VARMA(4,1)-SV.]

Figure 2: Cumulative sums of log predictive likelihoods for jointly forecasting GDP growth, inflation, and interest rate relative to VAR(2); two-quarter-ahead forecasts.


[Figure 3 (two panels): cumulative sums of log predictive likelihoods over 1970–2010; one panel compares VAR(3), VARMA(3,1), and VARMA(3,1)-SV, the other VAR(4), VARMA(4,1), and VARMA(4,1)-SV.]

Figure 3: Cumulative sums of log predictive likelihoods for jointly forecasting GDP growth, inflation, and interest rate relative to VAR(2); three-quarter-ahead forecasts.

6 Concluding Remarks and Future Research

We have introduced a latent factor representation of the VARMA model, as well as two alternative approaches to identification in this setting. We have developed a Gibbs sampler for the model and discussed how this algorithm can be extended to models with time-varying parameters and stochastic volatility. The proposed methodology was demonstrated using a density forecasting exercise, in which we also showed that VARMAs with stochastic volatility forecast better than standard VARs.

The methodology developed in this article leads to a few lines of inquiry in the future. First, it would be of interest to develop informative priors for large VARMAs analogous to those proposed in the large VARs literature, and investigate if further forecast improvement can be achieved by introducing the moving average components. In addition, empirical investigation of time-varying parameter VARMAs and regime-switching VARMAs also seems beneficial. Moreover, in the discussion of identification with additional time series, we have recovered the popular FAVAR as a VARMA with a particular set of identification restrictions. It would be interesting to further investigate alternative identification restrictions, and how they compare to the standard ones.


Appendix

In this appendix we discuss the details of Steps 2 and 3 in the basic Gibbs sampler for the VARMA(p, q) model in (2). To sample (ψ | y, β, f, Σ, Ω), note that the innovation ε_t has a diagonal covariance matrix Σ. Hence, we can estimate the MA coefficients in (2) equation by equation. To that end, define y*_t = y_t − µ − Π_1 y_{t−1} − · · · − Π_p y_{t−p}, and let y*_{it} denote the i-th element of y*_t. Let φ_{j,i} denote the i-th column of Ψ_j′, and accordingly, ψ_{j,i} be the free elements in φ_{j,i}, such that

\[ \varphi_{0,i} = R_{0,i}\psi_{0,i} + \iota_i, \qquad \varphi_{j,i} = R_{j,i}\psi_{j,i}, \quad j \geq 1, \]

where ι_i is an n × 1 vector with the i-th element ι_{ii} = 1 and all others set to zero. R_{j,i} in this context is a pre-determined selection matrix of appropriate dimensions, whereas our assumptions regarding Ψ_0 correspond to R_{0,i} = (I_{i−1}, 0)′ for i > 1 (for i = 1, it is clear that φ_{0,1} = ι_1 and there are no free elements). In addition, imposing a lower triangular Ψ_1 (i.e., the identification strategy applied in the text) is equivalent to setting R_{1,i} = (I_i, 0)′ for i ≥ 1. Using this formulation, define R_i = diag(R_{0,i}, R_{1,i}, . . . , R_{q,i}) and ψ_i = (ψ_{0,i}′, ψ_{1,i}′, . . . , ψ_{q,i}′)′, such that φ_i = R_i ψ_i + (ι_i′, 0)′ forms the i-th column of (Ψ_0, Ψ_1, . . . , Ψ_q)′.

Then, the i-th equation in (2) can be written as

\[ y^*_{it} = f_{it} + w_t R_i \psi_i + \varepsilon_{it}, \]

where w_t = (f_t′, f_{t−1}′, . . . , f_{t−q}′), f_{it} is the i-th element of f_t, and ε_{it} is the i-th element of ε_t. Denoting further w_{it} = w_t R_i, standard results dictate

\[ (\psi_i \mid y, \beta, f, \Sigma, \Omega) \sim N(\widehat{\psi}_i, D_{\psi_i}), \]

where

\[ D_{\psi_i} = \left( V_{\psi_i}^{-1} + \frac{1}{\sigma_i^2} \sum_{t=1}^{T} w_{it}' w_{it} \right)^{-1}, \qquad \widehat{\psi}_i = D_{\psi_i} \left( V_{\psi_i}^{-1} \psi_{0i} + \frac{1}{\sigma_i^2} \sum_{t=1}^{T} w_{it}' (y^*_{it} - f_{it}) \right), \]

and σ_i^2 is the i-th diagonal element of Σ.
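For concreteness, a minimal sketch of this draw (with our own helper names; W_i stacks the rows w_{it}) is:

```python
import numpy as np

def draw_psi_i(y_star_i, f_i, W_i, V_psi_inv, psi0_i, sigma2_i, rng):
    """Draw psi_i | y, beta, f, Sigma, Omega for equation i.
    y_star_i, f_i : T-vectors of y*_it and f_it;
    W_i           : T x dim(psi_i) matrix stacking w_it = (f_t', ..., f_{t-q}') R_i."""
    D_inv = V_psi_inv + (W_i.T @ W_i) / sigma2_i          # posterior precision
    D = np.linalg.inv(D_inv)
    psi_hat = D @ (V_psi_inv @ psi0_i + W_i.T @ (y_star_i - f_i) / sigma2_i)
    return rng.multivariate_normal(psi_hat, D)
```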

For Step 3, note that both p(Σ | y, β, f, ψ) and p(Ω | y, β, f, ψ) are products of inverse-gamma densities, and can therefore be sampled using standard methods. In fact, we have

\[ (\sigma_i^2 \mid y, \beta, f, \psi) \sim \mathrm{IG}\!\left( \nu_{\sigma i} + T/2,\; S_{\sigma i} + \tfrac{1}{2}\sum_{t=1}^{T} (y^*_{it} - f_{it} - w_{it}\psi_i)^2 \right), \]

\[ (\omega_i^2 \mid y, \beta, f, \psi) \sim \mathrm{IG}\!\left( \nu_{\omega i} + T/2,\; S_{\omega i} + \tfrac{1}{2}\sum_{t=1}^{T} f_{it}^2 \right). \]
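These draws are standard; as a sketch (our own helper), an IG(a, b) variate can be generated as b divided by a Gamma(a, 1) draw:

```python
import numpy as np

def draw_inverse_gamma(nu, S, sq_resid_sum, T, rng):
    """Draw from IG(nu + T/2, S + sq_resid_sum/2), where sq_resid_sum is
    sum_t (y*_it - f_it - w_it psi_i)^2 for sigma_i^2, or sum_t f_it^2 for omega_i^2."""
    shape = nu + T / 2.0
    scale = S + 0.5 * sq_resid_sum
    return scale / rng.gamma(shape)          # IG(a, b) equals b / Gamma(a, 1)
```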


References

S. An and F. Schorfheide. Bayesian analysis of DSGE models. Econometric Reviews, 26(2-4):113–172, 2007.

G. Athanasopoulos and F. Vahid. VARMA versus VAR for macroeconomic forecasting. Journal of Business and Economic Statistics, 26(2):237–252, 2008.

M. Banbura, D. Giannone, and L. Reichlin. Large Bayesian vector auto regressions. Journal of Applied Econometrics, 25(1):71–92, 2010.

B. Bernanke, J. Boivin, and P. S. Eliasz. Measuring the effects of monetary policy: A factor-augmented vector autoregressive (FAVAR) approach. The Quarterly Journal of Economics, 120(1):387–422, 2005.

F. Canova. Modelling and forecasting exchange rates with a Bayesian time-varying coefficient model. Journal of Economic Dynamics and Control, 17:233–261, 1993.

J. C. C. Chan. Moving average stochastic volatility models with application to inflation forecast. Research School of Economics, Australian National University, 2012.

J. C. C. Chan and I. Jeliazkov. Efficient simulation and integrated likelihood estimation in state space models. International Journal of Mathematical Modelling and Numerical Optimisation, 1:101–120, 2009.

J. C. C. Chan, G. Koop, R. Leon-Gonzalez, and R. Strachan. Time varying dimension models. Journal of Business and Economic Statistics, 30:358–367, 2012.

T. E. Clark and T. Doh. A Bayesian evaluation of alternative models of trend inflation. Working paper, Federal Reserve Bank of Cleveland, 2011.

T. Cogley and T. J. Sargent. Drifts and volatilities: monetary policies and outcomes in the post WWII US. Review of Economic Dynamics, 8(2):262–302, 2005.

E. Eisenstat. A short-term demand forecasting simulation model for information-driven resource planning. In Proceedings of the 17th International Conference on the Knowledge-Based Organization 3: Applied Technical Sciences and Advanced Military Technologies, Sibiu, Romania, November 2011.

J. Geweke and G. Amisano. Hierarchical Markov normal mixture models with applications to financial asset returns. Journal of Applied Econometrics, 26:1–29, 2011.

S. Kim, N. Shephard, and S. Chib. Stochastic volatility: Likelihood inference and comparison with ARCH models. Review of Economic Studies, 65(3):361–393, 1998.

G. Koop. Bayesian Econometrics. Wiley & Sons, New York, 2003.

G. Koop. Forecasting with medium and large Bayesian VARs. Journal of Applied Econometrics, 2011. DOI: 10.1002/jae.1270.

G. Koop and D. Korobilis. Bayesian multivariate time series methods for empirical macroeconomics. Foundations and Trends in Econometrics, 3(4):267–358, 2010.

G. Koop and S. M. Potter. Estimation and forecasting in models with multiple breaks. Review of Economic Studies, 74:763–789, 2007.

D. Korobilis. Assessing the transmission of monetary policy shocks using time-varying parameter dynamic factor models. Oxford Bulletin of Economics and Statistics, 2012. Forthcoming.

D. P. Kroese, T. Taimre, and Z. I. Botev. Handbook of Monte Carlo Methods. John Wiley & Sons, New York, 2011.

R. Litterman. Forecasting with Bayesian vector autoregressions — five years of experience. Journal of Business and Economic Statistics, 4:25–38, 1986.

W. J. McCausland, S. Miller, and D. Pelletier. Simulation smoothing for state-space models: A computational efficiency analysis. Computational Statistics and Data Analysis, 55:199–212, 2011.

K. Metaxoglou and A. Smith. Maximum likelihood estimation of VARMA models using a state-space EM algorithm. Journal of Time Series Analysis, 28(5):666–685, 2007.

R. Paap and H. van Dijk. Bayes estimates of Markov trends in possibly cointegrated series: An application to US consumption and income. Journal of Business and Economic Statistics, 21:547–563, 2003.

M. S. Peiris. On the study of some functions of multivariate ARMA processes. Journal of Time Series Analysis, 25(1):146–151, 1988.

G. E. Primiceri. Time varying structural vector autoregressions and monetary policy. Review of Economic Studies, 72(3):821–852, 2005.

C. A. Sims. Macroeconomics and reality. Econometrica, 48:1–48, 1980.

J. H. Stock and M. W. Watson. Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics, 20:147–162, 2002.

J. H. Stock and M. W. Watson. Why has U.S. inflation become harder to forecast? Journal of Money, Credit and Banking, 39:3–33, 2007.
