TRANSCRIPT
Forecasting with Weakly Identified Linear State-Space Models
Sebastien Blais
Bank of Canada
4 November 2009
Outline
1 Motivation
2 Weakly Identified LSSMs
3 Simulation Results
4 The Identification Principle
5 Conclusion
1 Motivation
Likelihood-based inference
Consider a model with $E[y_{t+1} \mid y_{1:t}, \theta] = f(y_{1:t}, \theta)$ and a prior distribution $p(\theta)$.
The predictive density is
$$p(y_{T+1} \mid y_{1:T}) = \int p(y_{T+1} \mid y_{1:T}, \theta)\, p(\theta \mid y_{1:T})\, d\theta.$$
Minimizing expected squared error yields the point forecast
$$E[y_{T+1} \mid y_{1:T}] = \arg\min_{\delta} \int (y_{T+1} - \delta)^2\, p(y_{T+1} \mid y_{1:T})\, dy_{T+1} \;\neq\; f(y_{1:T}, \theta_{MLE}) \text{ in finite samples.}$$
When does it matter?
Posterior averaging is beneficial when a linear state-space model is weakly identified.
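As a concrete illustration of the integral above, here is a minimal sketch of how the posterior-averaged point forecast is usually computed from MCMC output, compared with the classical plug-in forecast. The names `conditional_mean` and `theta_draws` are hypothetical placeholders (a model-specific mean function and a set of posterior draws), not objects from the paper.

```python
import numpy as np

def posterior_averaged_forecast(conditional_mean, y, theta_draws):
    """Monte Carlo version of E[y_{T+1} | y_{1:T}]: average the conditional
    mean f(y_{1:T}, theta) over draws from p(theta | y_{1:T})."""
    return np.mean([conditional_mean(y, theta) for theta in theta_draws], axis=0)

def plug_in_forecast(conditional_mean, y, theta_mle):
    """Classical plug-in forecast f(y_{1:T}, theta_MLE); in finite samples this
    generally differs from the posterior-averaged forecast above."""
    return conditional_mean(y, theta_mle)
```

The two coincide only in special cases (for example when f is linear in θ and the posterior mean equals the MLE); the simulation results later in the talk measure how much the difference matters.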
When is a LSSM weakly identified?
When the “true” parameter values are “close” to the region where the model is locally unidentified.
Weak identification (empirical underidentification, local almost nonidentification) is a finite-sample problem.
Examples:
Multicollinearity - correlation is “close” to being equal to 1
Weak instruments - correlation is “close” to being equal to 0
ARMA processes - MA and AR roots are “close” to canceling out
Consequences: irregular finite-sample distributions, far from normal (e.g. multimodal)
Parameter point estimators are bad summaries of uncertainty (e.g. strong bias)
Unreliable asymptotics
Confidence regions can be disjoint (challenges for communication)
Related literature
1 Weak identification and “irregular” distributions.
The likelihood function of a finite mixture distribution is invariant with respect to permutations of its component distributions (“Label switching”): Dick and Bowden (1973), Redner and Walker (1984), Stephens (1997, 2000), Celeux, Hurn and Robert (2000), Fruhwirth-Schnatter (2001), Geweke (2007).
The likelihood function of latent factor models is invariant with respect to factor permutations: Jennrich (1978).
The likelihood function of latent factor models is invariant with respect to factor reflections (“Sign switching”): Blais (this paper), Box and Jenkins (1976), Stoffer and Wall (1991), Kleibergen and Hoek (2000), Fruhwirth-Schnatter and Wagner (2008).
No valid bounded confidence interval for a parameter exists if this parameter is not identifiable on a subset of the parameter space: Gleser and Hwang (1987), Dufour (1997).
2 How best to normalize?
Normalization in structural equation models affects the finite-sample distribution of OLS and 2SLS estimators (Hillier, 1990).
Normalization becomes critical when weak identification issues arise (Hamilton, Waggoner and Zha, 2007): “A poor normalization can lead to multimodal distributions, disjoint confidence intervals, and very misleading characterizations of the true statistical uncertainty.” They propose an “identification principle”.
2 Weakly Identified LSSMs
The likelihood function of Gaussian Linear State-Space Models (LSSMs)
$$y_t \;(N \times 1) = B + H'\xi_t + w_t, \qquad w_t \sim N(0, R)$$
$$\xi_{t+1} \;(K \times 1) = F\xi_t + v_t, \qquad v_t \sim N(0, Q)$$
is invariant with respect to linear transformations: for any invertible matrix M,
$$l(B, H, R, F, Q \mid y) = l(B, M^{-1\prime}H, R, MFM^{-1}, MQM' \mid y) \qquad \forall y \in \mathcal{Y},$$
since
$$y_t = B + H'M^{-1}M\xi_t + w_t, \qquad w_t \sim N(0, R)$$
$$M\xi_{t+1} = MFM^{-1}\,M\xi_t + Mv_t, \qquad Mv_t \sim N(0, MQM').$$
Elementary linear transformations:
M = D : diagonal scale matrix
M = O : rotation matrix
M = P : permutation matrix
M = S : diagonal reflection matrix
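As a quick numerical check of this invariance (an illustrative sketch, not code from the paper), the snippet below evaluates the Gaussian prediction-error-decomposition log-likelihood with a textbook Kalman filter, then re-evaluates it after transforming (H, F, Q) by an arbitrary invertible M; the two values agree up to rounding. The filter is initialized at the stationary distribution of ξ, which transforms consistently under M.

```python
import numpy as np

def kalman_loglik(y, B, H, R, F, Q):
    """Log-likelihood of y_t = B + H' xi_t + w_t, xi_{t+1} = F xi_t + v_t,
    via the prediction-error decomposition. y is (T, N); H is (K, N).
    The filter starts from the stationary distribution of xi."""
    T, N = y.shape
    K = F.shape[0]
    # stationary state covariance: vec(P) = (I - F kron F)^{-1} vec(Q)
    P = np.linalg.solve(np.eye(K * K) - np.kron(F, F), Q.reshape(-1)).reshape(K, K)
    a = np.zeros(K)
    ll = 0.0
    for t in range(T):
        v = y[t] - B - H.T @ a                 # one-step-ahead forecast error
        S = H.T @ P @ H + R                    # its covariance
        Sinv = np.linalg.inv(S)
        _, logdet = np.linalg.slogdet(S)
        ll += -0.5 * (N * np.log(2 * np.pi) + logdet + v @ Sinv @ v)
        G = F @ P @ H @ Sinv                   # Kalman gain (prediction form)
        a = F @ a + G @ v
        P = F @ P @ F.T + Q - G @ S @ G.T
    return ll

# simulate a small bivariate LSSM and check invariance under an arbitrary M
rng = np.random.default_rng(0)
N, K, T = 2, 2, 200
B, R = np.zeros(N), np.eye(N)
H = rng.normal(size=(K, N))
F, Q = np.diag([0.9, 0.5]), np.eye(K)
xi, ys = np.zeros(K), []
for t in range(T):
    xi = F @ xi + rng.multivariate_normal(np.zeros(K), Q)
    ys.append(B + H.T @ xi + rng.multivariate_normal(np.zeros(N), R))
y = np.asarray(ys)

M = rng.normal(size=(K, K))                    # any invertible matrix
Minv = np.linalg.inv(M)
print(kalman_loglik(y, B, H, R, F, Q))
print(kalman_loglik(y, B, Minv.T @ H, R, M @ F @ Minv, M @ Q @ M.T))  # same value
```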
Example
LSSMs are invariant with respect to a (finite) set of $2^K$ reflections. With K = 2, these transformations are
$$S \in \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} \right\}.$$
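A one-liner (illustrative, not from the paper) that enumerates this finite reflection group for any K:

```python
import itertools
import numpy as np

K = 2
reflections = [np.diag(signs) for signs in itertools.product([1.0, -1.0], repeat=K)]
for S in reflections:          # the four 2x2 sign matrices listed above
    print(S)
```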
Definition
A normalization is a parameter subspace $\Theta^N \subseteq \Theta$.
Example
$$y_t = B + H'\xi_t + w_t, \qquad w_t \sim N(0, R)$$
$$\xi_{t+1} = F\xi_t + v_t, \qquad v_t \sim N(0, Q)$$
The normalization
$$\Theta^{Q1} = \left\{ \theta \in \Theta \mid Q_{kk} = 1,\; k = 1, \ldots, K \right\}$$
breaks scale invariance.
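As an illustration (a sketch, not the paper's code), imposing $\Theta^{Q1}$ amounts to applying the diagonal transformation $M = D$ with $D_{kk} = Q_{kk}^{-1/2}$ to any parameter point:

```python
import numpy as np

def impose_unit_state_variances(H, F, Q):
    """Map (H, F, Q) to the observationally equivalent point with Q_kk = 1,
    using M = diag(Q_kk^{-1/2}):  H -> M^{-1}' H,  F -> M F M^{-1},  Q -> M Q M'."""
    M = np.diag(1.0 / np.sqrt(np.diag(Q)))
    Minv = np.linalg.inv(M)
    return Minv.T @ H, M @ F @ Minv, M @ Q @ M.T
```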
What can be done?
Taking parameter uncertainty into account will improve forecasts → Simulation results.
Taking parameter uncertainty into account becomes more beneficial the weaker the identification of some parameters.
A normalization ensuring global identification may help communication → An identification principle.
Normalizations satisfying the identification principle are more likely to yield unimodal distributions. Many normalizations satisfy this principle: it may be useful to try several of them.
3 Simulation Results
Gibbs sampler
Every parameter except γ admits a conditionally conjugate prior.
I use a random-walk Metropolis-Hastings step to draw γ with the latent factors as a single block:
$$q(\gamma', \xi' \mid y, \gamma, \Phi, \Sigma_\gamma) = p(\xi' \mid y, \gamma', \Phi)\, \varphi(\gamma' \mid \gamma, \Sigma_\gamma),$$
where $p(\xi' \mid y, \gamma', \Phi)$ is available in closed form and $\Phi \equiv \{B, Q, R, F, \xi_1\}$.
Note: the joint proposal does not depend on ξ.
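The slide does not spell out the acceptance step, but with this proposal the state densities cancel from the Metropolis-Hastings ratio, so only the states-integrated-out (Kalman filter) likelihood of γ is needed. The sketch below shows that logic under the assumption of a symmetric Gaussian random walk for γ; all function names (`marginal_loglik`, `log_prior`, `draw_states`) are placeholders for model-specific routines, not the paper's code.

```python
import numpy as np

def rw_mh_gamma_step(gamma, xi, y, Phi, Sigma_gamma,
                     marginal_loglik, log_prior, draw_states, rng):
    """One block update of (gamma, xi).  The proposal is
    q(gamma', xi' | ...) = p(xi' | y, gamma', Phi) * N(gamma' | gamma, Sigma_gamma);
    because xi' is proposed from its exact conditional and the gamma step is
    symmetric, the acceptance ratio reduces to the ratio of p(y | gamma, Phi) p(gamma)."""
    L = np.linalg.cholesky(Sigma_gamma)
    gamma_prop = gamma + L @ rng.standard_normal(gamma.shape)
    log_ratio = (marginal_loglik(y, gamma_prop, Phi) + log_prior(gamma_prop)
                 - marginal_loglik(y, gamma, Phi) - log_prior(gamma))
    if np.log(rng.uniform()) < log_ratio:
        # draw the latent factors only when the move is accepted
        return gamma_prop, draw_states(y, gamma_prop, Phi, rng), True
    return gamma, xi, False
```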
Posterior averaging is beneficial for weakly identified LSSMs.

H        All             ARMA(1,1)       AR(1)
0.005    0.795 (0.020)   0.817 (0.022)   0.655 (0.048)
         M = 1000        M = 898         M = 102
0.010    0.852 (0.024)   0.880 (0.027)   0.737 (0.050)
         M = 1000        M = 884         M = 116
0.050    0.883 (0.024)   0.919 (0.028)   0.748 (0.051)
         M = 1000        M = 871         M = 129
0.100    0.961 (0.023)   0.968 (0.026)   0.908 (0.059)
         M = 1000        M = 876         M = 124

(H increases from weaker reflection identification, top row, to stronger reflection identification, bottom row.)

The data-generating process (an ARMA(1,1)) is
$$\xi_t = F\xi_{t-1} + v_t, \qquad y_t = B + H'\xi_t + w_t,$$
with B = 0, R = 1, Q = 1, F = 0.95.
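For reference, a minimal simulator of this data-generating process (an illustrative sketch; the sample size, seed, and function name are not from the paper):

```python
import numpy as np

def simulate_arma11_dgp(T, H=0.005, F=0.95, B=0.0, Q=1.0, R=1.0, seed=0):
    """xi_t = F xi_{t-1} + v_t,  y_t = B + H xi_t + w_t.
    A small |H| makes the sign (reflection) of the factor weakly identified."""
    rng = np.random.default_rng(seed)
    xi = rng.normal(0.0, np.sqrt(Q / (1.0 - F ** 2)))   # stationary initial state
    y = np.empty(T)
    for t in range(T):
        xi = F * xi + rng.normal(0.0, np.sqrt(Q))
        y[t] = B + H * xi + rng.normal(0.0, np.sqrt(R))
    return y

y = simulate_arma11_dgp(T=200, H=0.005)
```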
4 The Identification Principle
Definition
Let $\Theta^l$ denote the nonidentification subset. A normalization $\Theta^N \subseteq \Theta$ satisfies the identification principle if it
a) satisfies $\mathrm{int}(\Theta^N) \cap \Theta^l = \emptyset$ and $\Theta^l \subseteq \mathrm{fr}(\Theta^N)$;
b) is connected;
c) provides global identification.
Note: intersections of hyperplanes and half-spaces are connected.
Hamilton, Waggoner and Zha (2007): “Our proposal is that the boundaries of [a normalization set] A should correspond to the loci along which the structure is locally unidentified or the log likelihood is −∞.” “One easy way to check whether a proposed normalization set A conforms to this identification principle is to make sure that the model is locally identified at all interior points of A.”
Example (Harvey, 1989, K = 2)
$$y_t = B + H'\xi_t + w_t, \qquad w_t \sim N(0, R)$$
$$\xi_{t+1} = F\xi_t + v_t, \qquad v_t \sim N(0, Q)$$
The normalization $\{\theta \in \Theta \mid H_{12} = 0\}$ does not provide global identification, because it does not break permutation invariance if $H^0_{22} = 0$ under the data-generating process:
$$H = \begin{bmatrix} H_{11} & 0 \\ H_{21} & H_{22} \end{bmatrix} \;\xrightarrow{\,H_{22} = 0\,}\; \begin{bmatrix} H_{11} & 0 \\ H_{21} & 0 \end{bmatrix} \;\xrightarrow{\,\text{permute factors}\,}\; \begin{bmatrix} H_{21} & 0 \\ H_{11} & 0 \end{bmatrix}.$$
The sampling distribution of $H_{11}$ will be bimodal for sufficiently large samples if $H_{22}$ is close enough to 0.
A normalization of LSSMs that ensures global identification.
I propose
$$\Theta^{HO} = \left\{ \theta \in \Theta \mid HH' = I \right\},$$
which breaks scale and rotation invariance, but preserves permutation and reflection invariance.
I parameterize the $K \times N$ row-orthogonal matrix $H$ as $H' = B_1 B_2 \cdots B_K U$, where
$$B_k = \rho_{k,k+1}\,\rho_{k,k+2} \cdots \rho_{k,N}, \qquad \gamma_{k,n} = \arctan\!\left( \frac{H_{k,n+1}}{\sqrt{\sum_{i=1}^{n} H_{k,i}^2}} \right),$$
$\rho_{i,j}$ is the $N \times N$ Givens rotation that equals the identity except in rows and columns $i$ and $j$, where it contains $\begin{bmatrix} \cos\gamma_{i,j} & -\sin\gamma_{i,j} \\ \sin\gamma_{i,j} & \cos\gamma_{i,j} \end{bmatrix}$, and $U$ $(N \times K) = \begin{bmatrix} I \\ 0 \end{bmatrix}$.
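A small sketch (illustrative; the indexing convention and the exact angle-to-loading mapping may differ from the paper's) that builds $H' = B_1 \cdots B_K U$ from Givens rotations and verifies $HH' = I$:

```python
import numpy as np

def givens(N, i, j, angle):
    """N x N rotation in the (i, j) coordinate plane (0-based indices)."""
    G = np.eye(N)
    c, s = np.cos(angle), np.sin(angle)
    G[i, i], G[j, j] = c, c
    G[i, j], G[j, i] = -s, s
    return G

def row_orthogonal_H(angles, N, K):
    """Build H' = B_1 B_2 ... B_K U with B_k = rho_{k,k+1} ... rho_{k,N}
    and U = [I_K; 0] (N x K); angles[k] holds the N-1-k angles for factor k."""
    Hp = np.vstack([np.eye(K), np.zeros((N - K, K))])   # U
    for k in reversed(range(K)):
        Bk = np.eye(N)
        for idx, j in enumerate(range(k + 1, N)):
            Bk = Bk @ givens(N, k, j, angles[k][idx])
        Hp = Bk @ Hp
    return Hp                                           # N x K, orthonormal columns

# quick check that H H' = I
N, K = 4, 2
rng = np.random.default_rng(0)
angles = [rng.uniform(-np.pi / 2, np.pi / 2, size=N - 1 - k) for k in range(K)]
Hp = row_orthogonal_H(angles, N, K)
print(np.allclose(Hp.T @ Hp, np.eye(K)))   # True
```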
Conclusion
This paper
argues that LSSMs are subject to weak identification issues;
shows that posterior averaging is beneficial when forecasting with weakly identified LSSMs;
offers a normalization which can alleviate communication problems caused by weak identification;
describes a novel Gibbs sampler for Gaussian LSSMs.