
Page 1: Paris2012 session2

State space methods for signal extraction, parameter estimation and forecasting

Siem Jan Koopman
http://personal.vu.nl/s.j.koopman

Department of Econometrics, VU University Amsterdam

Tinbergen Institute, 2012

Page 2: Paris2012 session2

Regression Model in time series

For a time series of a dependent variable y_t and a set of regressors X_t, we can consider the regression model

y_t = µ + X_t β + ε_t,   ε_t ∼ NID(0, σ²),   t = 1, …, n.

For large n, is it reasonable to assume constant coefficients µ, β and σ²?

If we have strong reasons to believe that µ and β are time-varying, the model may be written as

y_t = µ_t + X_t β_t + ε_t,   µ_{t+1} = µ_t + η_t,   β_{t+1} = β_t + ξ_t.

This is an example of a linear state space model.


Page 3: Paris2012 session2

State Space Model

The linear Gaussian state space model is defined in three parts:

→ State equation:

α_{t+1} = T_t α_t + R_t ζ_t,   ζ_t ∼ NID(0, Q_t),

→ Observation equation:

y_t = Z_t α_t + ε_t,   ε_t ∼ NID(0, H_t),

→ Initial state distribution α_1 ∼ N(a_1, P_1).

Notice that

• ζ_t and ε_s are independent for all t, s, and independent of α_1;

• observation y_t can be multivariate;

• state vector α_t is unobserved;

• matrices T_t, Z_t, R_t, Q_t, H_t determine the structure of the model.
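As a concrete sketch of these definitions (added here; not part of the original slides), the five system matrices can be passed around together and the model simulated directly from the two equations. The function and variable names below are illustrative only; the scalar example uses the local level model introduced later (slide 10).

    import numpy as np

    def simulate_state_space(T, Z, R, Q, H, a1, P1, n, seed=0):
        # Draw y_1..y_n and alpha_1..alpha_n from
        #   alpha_{t+1} = T alpha_t + R zeta_t,  zeta_t ~ NID(0, Q),
        #   y_t         = Z alpha_t + eps_t,     eps_t  ~ NID(0, H),
        # with alpha_1 ~ N(a1, P1); time-invariant matrices for simplicity.
        rng = np.random.default_rng(seed)
        alpha = rng.multivariate_normal(a1, P1)
        ys, alphas = [], []
        for _ in range(n):
            alphas.append(alpha)
            ys.append(Z @ alpha + rng.multivariate_normal(np.zeros(Z.shape[0]), H))
            alpha = T @ alpha + R @ rng.multivariate_normal(np.zeros(Q.shape[0]), Q)
        return np.array(ys), np.array(alphas)

    # Scalar example: local level model, T = Z = R = 1.
    y, alpha = simulate_state_space(np.eye(1), np.eye(1), np.eye(1),
                                    np.array([[0.5]]), np.array([[1.0]]),
                                    np.zeros(1), np.eye(1), n=100)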

Page 4: Paris2012 session2

State Space Model

• state space model is linear and Gaussian: therefore properties and results of the multivariate normal distribution apply;

• state vector α_t evolves as a VAR(1) process;

• system matrices usually contain unknown parameters;

• estimation therefore has two aspects:
  • measuring the unobservable state (prediction, filtering and smoothing);
  • estimation of unknown parameters (maximum likelihood estimation);

• state space methods offer a unified approach to a wide range of models and techniques: dynamic regression, ARIMA, UC models, latent variable models, spline-fitting and many ad-hoc filters;

• next, some well-known model specifications in state space form ...


Page 5: Paris2012 session2

Regression with time-varying coefficients

General state space model:

α_{t+1} = T_t α_t + R_t ζ_t,   ζ_t ∼ NID(0, Q_t),

y_t = Z_t α_t + ε_t,   ε_t ∼ NID(0, H_t).

Put the regressors in Z_t (that is, Z_t = X_t) and set

T_t = I,   R_t = I.

The result is the regression model y_t = X_t β_t + ε_t with time-varying coefficient β_t = α_t following a random walk.

In case Q_t = 0, the time-varying regression model reduces to the standard linear regression model y_t = X_t β + ε_t with β = α_1.

Many dynamic specifications for the regression coefficient can be considered, see below.


Page 6: Paris2012 session2

AR(1) process in State Space Form

Consider the AR(1) model y_{t+1} = φ y_t + ζ_t, in state space form:

α_{t+1} = T_t α_t + R_t ζ_t,   ζ_t ∼ NID(0, Q_t),

y_t = Z_t α_t + ε_t,   ε_t ∼ NID(0, H_t),

with state vector α_t and system matrices

Z_t = 1,   H_t = 0,
T_t = φ,   R_t = 1,   Q_t = σ².

We have:

• Z_t = 1 and H_t = 0 imply that α_t = y_t;

• the state equation implies y_{t+1} = φ y_t + ζ_t with ζ_t ∼ NID(0, σ²);

• this is the AR(1) model!


Page 7: Paris2012 session2

AR(2) in State Space Form

Consider the AR(2) model y_{t+1} = φ_1 y_t + φ_2 y_{t-1} + ζ_t, in state space form:

α_{t+1} = T_t α_t + R_t ζ_t,   ζ_t ∼ NID(0, Q_t),

y_t = Z_t α_t + ε_t,   ε_t ∼ NID(0, H_t),

with 2 × 1 state vector α_t and system matrices (writing matrices row by row, rows separated by semicolons):

Z_t = [ 1  0 ],   H_t = 0,

T_t = [ φ_1  1 ; φ_2  0 ],   R_t = [ 1 ; 0 ],   Q_t = σ².

We have:

• Z_t and H_t = 0 imply that α_{1t} = y_t;

• the first state equation implies y_{t+1} = φ_1 y_t + α_{2t} + ζ_t with ζ_t ∼ NID(0, σ²);

• the second state equation implies α_{2,t+1} = φ_2 y_t.


Page 8: Paris2012 session2

MA(1) in State Space Form

Consider the MA(1) model y_{t+1} = ζ_t + θ ζ_{t-1}, in state space form:

α_{t+1} = T_t α_t + R_t ζ_t,   ζ_t ∼ NID(0, Q_t),

y_t = Z_t α_t + ε_t,   ε_t ∼ NID(0, H_t),

with 2 × 1 state vector α_t and system matrices

Z_t = [ 1  0 ],   H_t = 0,

T_t = [ 0  1 ; 0  0 ],   R_t = [ 1 ; θ ],   Q_t = σ².

We have:

• Z_t and H_t = 0 imply that α_{1t} = y_t;

• the first state equation implies y_{t+1} = α_{2t} + ζ_t with ζ_t ∼ NID(0, σ²);

• the second state equation implies α_{2,t+1} = θ ζ_t.


Page 9: Paris2012 session2

General ARMA in State Space Form

Consider the ARMA(2,1) model

y_t = φ_1 y_{t-1} + φ_2 y_{t-2} + ζ_t + θ ζ_{t-1},

in state space form with

α_t = [ y_t ; φ_2 y_{t-1} + θ ζ_t ],

Z_t = [ 1  0 ],   H_t = 0,

T_t = [ φ_1  1 ; φ_2  0 ],   R_t = [ 1 ; θ ],   Q_t = σ².

All ARIMA(p, d, q) models have a (non-unique) state space representation.

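The mapping from ARMA coefficients to these matrices is mechanical, so it can be captured in a small helper. A sketch (the function name is mine); the companion form it builds reproduces the AR(2), MA(1) and ARMA(2,1) matrices shown on the preceding slides:

    import numpy as np

    def arma_to_state_space(phi, theta, sigma2=1.0):
        # Companion form with m = max(p, q+1):
        #   Z = (1, 0, ..., 0), H = 0,
        #   T carries (phi_1, ..., phi_m) in its first column plus a shifted
        #   identity block, R = (1, theta_1, ..., theta_{m-1})', Q = sigma2,
        # where phi_j = 0 for j > p and theta_j = 0 for j > q.
        p, q = len(phi), len(theta)
        m = max(p, q + 1)
        T = np.zeros((m, m))
        T[:p, 0] = phi
        T[:-1, 1:] = np.eye(m - 1)
        R = np.zeros((m, 1))
        R[0, 0] = 1.0
        R[1:q + 1, 0] = theta
        Z = np.zeros((1, m))
        Z[0, 0] = 1.0
        return T, Z, R, np.array([[sigma2]]), np.zeros((1, 1))

    # ARMA(2,1) of this slide: T = [phi_1 1; phi_2 0], R = (1, theta)'.
    T, Z, R, Q, H = arma_to_state_space(phi=[0.5, 0.3], theta=[0.4])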

Page 10: Paris2012 session2

UC models in State Space Form

State space model: α_{t+1} = T_t α_t + R_t ζ_t,   y_t = Z_t α_t + ε_t.

LL model µ_{t+1} = µ_t + η_t and y_t = µ_t + ε_t:

α_t = µ_t,   T_t = 1,   R_t = 1,   Q_t = σ²_η,
Z_t = 1,   H_t = σ²_ε.

LLT model µ_{t+1} = µ_t + β_t + η_t, β_{t+1} = β_t + ξ_t and y_t = µ_t + ε_t:

α_t = [ µ_t ; β_t ],   T_t = [ 1  1 ; 0  1 ],   R_t = [ 1  0 ; 0  1 ],   Q_t = [ σ²_η  0 ; 0  σ²_ξ ],

Z_t = [ 1  0 ],   H_t = σ²_ε.


Page 11: Paris2012 session2

UC models in State Space Form

State space model: α_{t+1} = T_t α_t + R_t ζ_t,   y_t = Z_t α_t + ε_t.

LLT model with seasonal: µ_{t+1} = µ_t + β_t + η_t, β_{t+1} = β_t + ξ_t, S(L) γ_{t+1} = ω_t and y_t = µ_t + γ_t + ε_t:

α_t = [ µ_t  β_t  γ_t  γ_{t-1}  γ_{t-2} ]′,

T_t = [ 1  1  0  0  0 ; 0  1  0  0  0 ; 0  0  −1  −1  −1 ; 0  0  1  0  0 ; 0  0  0  1  0 ],

Q_t = [ σ²_η  0  0 ; 0  σ²_ξ  0 ; 0  0  σ²_ω ],

R_t = [ 1  0  0 ; 0  1  0 ; 0  0  1 ; 0  0  0 ; 0  0  0 ],

Z_t = [ 1  0  1  0  0 ],   H_t = σ²_ε.

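Writing out these matrices by hand is error-prone, so a small constructor helps. A sketch for a general seasonal period s (the slide shows the s = 4 case; names are my own):

    import numpy as np

    def llt_seasonal_matrices(var_eta, var_xi, var_omega, var_eps, s=4):
        # LLT model with a dummy seasonal of period s:
        # state is (mu_t, beta_t, gamma_t, ..., gamma_{t-s+2}), so m = s + 1.
        m = s + 1
        T = np.zeros((m, m))
        T[0, 0] = T[0, 1] = T[1, 1] = 1.0   # level and slope block
        T[2, 2:] = -1.0                     # S(L) gamma_{t+1} = omega_t
        T[3:, 2:-1] = np.eye(s - 2)         # shift the seasonal lags
        R = np.zeros((m, 3))
        R[0, 0] = R[1, 1] = R[2, 2] = 1.0   # eta, xi, omega enter here
        Q = np.diag([var_eta, var_xi, var_omega])
        Z = np.zeros((1, m))
        Z[0, 0] = Z[0, 2] = 1.0             # y_t = mu_t + gamma_t + eps_t
        H = np.array([[var_eps]])
        return T, Z, R, Q, H

For s = 4 this reproduces the 5 × 5 transition matrix and 5 × 3 selection matrix shown above.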

Page 12: Paris2012 session2

Kalman Filter

• The Kalman filter calculates the mean and variance of the unobserved state, given the observations.

• The state is Gaussian: the complete distribution is characterized by the mean and variance.

• The filter is a recursive algorithm; the current best estimate is updated whenever a new observation is obtained.

• To start the recursion, we need a_1 and P_1, which we assumed given.

• There are various ways to initialize when a_1 and P_1 are unknown, which we will not discuss here.


Page 13: Paris2012 session2

Kalman Filter

The unobserved state α_t can be estimated from the observations with the Kalman filter:

v_t = y_t − Z_t a_t,
F_t = Z_t P_t Z_t′ + H_t,
K_t = T_t P_t Z_t′ F_t^{-1},
a_{t+1} = T_t a_t + K_t v_t,
P_{t+1} = T_t P_t T_t′ + R_t Q_t R_t′ − K_t F_t K_t′,

for t = 1, …, n, starting with the given values a_1 and P_1.

• Writing Y_t = {y_1, …, y_t},

a_{t+1} = E(α_{t+1} | Y_t),   P_{t+1} = Var(α_{t+1} | Y_t).

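These five recursions translate line by line into code. A minimal sketch (my own function; time-invariant matrices, known a_1 and P_1, and no attention to numerical refinements such as square-root filtering):

    import numpy as np

    def kalman_filter(y, T, Z, R, Q, H, a1, P1):
        # Forward recursions of this slide; y has shape (n, p).
        n = y.shape[0]
        a, P = a1.copy(), P1.copy()
        out = {"a": [], "P": [], "v": [], "F": [], "K": []}
        for t in range(n):
            v = y[t] - Z @ a                    # prediction error v_t
            F = Z @ P @ Z.T + H                 # its variance F_t
            K = T @ P @ Z.T @ np.linalg.inv(F)  # Kalman gain K_t
            out["a"].append(a); out["P"].append(P)
            out["v"].append(v); out["F"].append(F); out["K"].append(K)
            a = T @ a + K @ v                   # a_{t+1} = E(alpha_{t+1} | Y_t)
            P = T @ P @ T.T + R @ Q @ R.T - K @ F @ K.T
        return {k: np.array(w) for k, w in out.items()}

For the local level model these all collapse to scalar operations; the stored v and F are reused further below for smoothing and for the loglikelihood.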

Page 14: Paris2012 session2

Lemma I : multivariate Normal regression

Suppose x and y are jointly Normally distributed vectors with means µ_x = E(x) and µ_y = E(y), variance matrices Σ_xx and Σ_yy, respectively, and covariance matrix Σ_xy. Then

E(x | y) = µ_x + Σ_xy Σ_yy^{-1} (y − µ_y),
Var(x | y) = Σ_xx − Σ_xy Σ_yy^{-1} Σ_xy′.

Define e = x − E(x | y). Then

Var(e) = Var([x − µ_x] − Σ_xy Σ_yy^{-1} [y − µ_y])
       = Σ_xx − Σ_xy Σ_yy^{-1} Σ_xy′
       = Var(x | y),

and

Cov(e, y) = Cov([x − µ_x] − Σ_xy Σ_yy^{-1} [y − µ_y], y)
          = Σ_xy − Σ_xy = 0.


Page 15: Paris2012 session2

Lemma II : an extension

The proof of the Kalman filter uses a slight extension of the lemma for multivariate Normal regression.

Suppose x, y and z are jointly Normally distributed vectors with E(z) = 0 and Σ_yz = 0. Then

E(x | y, z) = E(x | y) + Σ_xz Σ_zz^{-1} z,
Var(x | y, z) = Var(x | y) − Σ_xz Σ_zz^{-1} Σ_xz′.

Can you give a proof of Lemma II? Hint: apply Lemma I to x and y* = (y′, z′)′. (A sketch follows below.)

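One way to fill in the hint, sketched here for completeness (this working is mine, not the slides'): apply Lemma I to x and y* = (y′, z′)′. Since Σ_yz = 0 and E(z) = 0,

Σ_{x y*} = [ Σ_xy  Σ_xz ],   Σ_{y* y*} = [ Σ_yy  0 ; 0  Σ_zz ],

so Σ_{y* y*}^{-1} is block-diagonal and Lemma I gives

E(x | y, z) = µ_x + Σ_xy Σ_yy^{-1} (y − µ_y) + Σ_xz Σ_zz^{-1} z = E(x | y) + Σ_xz Σ_zz^{-1} z,

Var(x | y, z) = Σ_xx − Σ_xy Σ_yy^{-1} Σ_xy′ − Σ_xz Σ_zz^{-1} Σ_xz′ = Var(x | y) − Σ_xz Σ_zz^{-1} Σ_xz′.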

Page 16: Paris2012 session2

Kalman Filter : derivation

State space model: α_{t+1} = T_t α_t + R_t ζ_t,   y_t = Z_t α_t + ε_t.

• Writing Y_t = {y_1, …, y_t}, define

a_{t+1} = E(α_{t+1} | Y_t),   P_{t+1} = Var(α_{t+1} | Y_t);

• The prediction error is

v_t = y_t − E(y_t | Y_{t-1})
    = y_t − E(Z_t α_t + ε_t | Y_{t-1})
    = y_t − Z_t E(α_t | Y_{t-1})
    = y_t − Z_t a_t;

• It follows that v_t = Z_t(α_t − a_t) + ε_t and E(v_t) = 0;

• The prediction error variance is F_t = Var(v_t) = Z_t P_t Z_t′ + H_t.


Page 17: Paris2012 session2

Kalman Filter : derivation

State space model: α_{t+1} = T_t α_t + R_t ζ_t,   y_t = Z_t α_t + ε_t.

• We have Y_t = {Y_{t-1}, y_t} = {Y_{t-1}, v_t} and E(v_t y_{t-j}′) = 0 for j = 1, …, t − 1;

• Apply the lemma E(x | y, z) = E(x | y) + Σ_xz Σ_zz^{-1} z, taking x = α_{t+1}, y = Y_{t-1} and z = v_t = Z_t(α_t − a_t) + ε_t;

• It follows that E(α_{t+1} | Y_{t-1}) = T_t a_t;

• Further, E(α_{t+1} v_t′) = T_t E(α_t v_t′) + R_t E(ζ_t v_t′) = T_t P_t Z_t′;

• Carrying out the lemma, we obtain the state update

a_{t+1} = E(α_{t+1} | Y_{t-1}, y_t)
        = T_t a_t + T_t P_t Z_t′ F_t^{-1} v_t
        = T_t a_t + K_t v_t,

with K_t = T_t P_t Z_t′ F_t^{-1}.


Page 18: Paris2012 session2

Kalman Filter : derivation

• Our best prediction of y_t is Z_t a_t. When the actual observation arrives, we calculate the prediction error v_t = y_t − Z_t a_t and its variance F_t = Z_t P_t Z_t′ + H_t. The new best estimate of the state mean is based on both the old estimate a_t and the new information v_t:

a_{t+1} = T_t a_t + K_t v_t,

similarly for the variance:

P_{t+1} = T_t P_t T_t′ + R_t Q_t R_t′ − K_t F_t K_t′.

Can you derive the updating equation for P_{t+1}? (A sketch follows below.)

• The Kalman gain

K_t = T_t P_t Z_t′ F_t^{-1}

is the optimal weighting matrix for the new evidence.

• You should be able to replicate the proof of the Kalman filter for the Local Level Model (DK, Chapter 2).

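A sketch of the requested derivation (my working, not the slides'): apply Lemma II with x = α_{t+1}, y = Y_{t-1} and z = v_t, so that

Var(α_{t+1} | Y_t) = Var(α_{t+1} | Y_{t-1}) − Σ_xz Σ_zz^{-1} Σ_xz′.

From the state equation, Var(α_{t+1} | Y_{t-1}) = T_t P_t T_t′ + R_t Q_t R_t′; from the previous slide, Σ_xz = E(α_{t+1} v_t′) = T_t P_t Z_t′, and Σ_zz = F_t. Substituting and using K_t = T_t P_t Z_t′ F_t^{-1} gives

P_{t+1} = T_t P_t T_t′ + R_t Q_t R_t′ − T_t P_t Z_t′ F_t^{-1} Z_t P_t T_t′ = T_t P_t T_t′ + R_t Q_t R_t′ − K_t F_t K_t′.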

Page 19: Paris2012 session2

Kalman Filter Illustration

[Figure: four panels over 1880–1960: observations with the filtered level a_t; state variance P_t; prediction error v_t; prediction error variance F_t.]

Page 20: Paris2012 session2

Prediction and Filtering

This version of the Kalman filter computes the predictions of the states directly:

a_{t+1} = E(α_{t+1} | Y_t),   P_{t+1} = Var(α_{t+1} | Y_t).

The filtered estimates are defined by

a_{t|t} = E(α_t | Y_t),   P_{t|t} = Var(α_t | Y_t),

for t = 1, …, n.

The Kalman filter can be "re-ordered" to compute both:

a_{t|t} = a_t + M_t v_t,   P_{t|t} = P_t − M_t F_t M_t′,

a_{t+1} = T_t a_{t|t},   P_{t+1} = T_t P_{t|t} T_t′ + R_t Q_t R_t′,

with M_t = P_t Z_t′ F_t^{-1} for t = 1, …, n, starting with a_1 and P_1.

Compare M_t with K_t (note K_t = T_t M_t). Verify this version of the KF using the earlier derivations.

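A one-step sketch of this re-ordered filter (my own function name), which makes the update/predict split explicit:

    import numpy as np

    def filter_step(y_t, a, P, T, Z, R, Q, H):
        # Re-ordered Kalman filter step: first update (filtering), then predict.
        v = y_t - Z @ a
        F = Z @ P @ Z.T + H
        M = P @ Z.T @ np.linalg.inv(F)           # M_t = P_t Z_t' F_t^{-1}
        a_filt = a + M @ v                       # a_{t|t}
        P_filt = P - M @ F @ M.T                 # P_{t|t}
        a_next = T @ a_filt                      # a_{t+1} = T_t a_{t|t}
        P_next = T @ P_filt @ T.T + R @ Q @ R.T  # P_{t+1}
        return a_filt, P_filt, a_next, P_next

Since K_t = T_t M_t, the prediction a_next = T a + K v reproduces the earlier recursion exactly.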

Page 21: Paris2012 session2

Smoothing

• The filter calculates the mean and variance conditional on Y_t;

• The Kalman smoother calculates the mean and variance conditional on the full set of observations Y_n;

• After the filtered estimates are calculated, the smoothing recursion starts at the last observation and runs back to the first.

α̂_t = E(α_t | Y_n),   V_t = Var(α_t | Y_n),

r_t = weighted sum of future innovations,   N_t = Var(r_t),

L_t = T_t − K_t Z_t.

Starting with r_n = 0, N_n = 0, the smoothing recursions are given by

r_{t-1} = Z_t′ F_t^{-1} v_t + L_t′ r_t,   N_{t-1} = Z_t′ F_t^{-1} Z_t + L_t′ N_t L_t,

α̂_t = a_t + P_t r_{t-1},   V_t = P_t − P_t N_{t-1} P_t.

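The backward pass also translates directly into code. A sketch that consumes the stored output of the forward Kalman filter shown earlier (time-invariant T and Z assumed; names are mine):

    import numpy as np

    def kalman_smoother(kf, T, Z):
        # Backward recursions of this slide, starting from r_n = 0, N_n = 0;
        # kf holds arrays a, P, v, F, K from a forward filtering pass.
        a, P, v, F, K = kf["a"], kf["P"], kf["v"], kf["F"], kf["K"]
        n, m = a.shape[0], a.shape[1]
        r, N = np.zeros(m), np.zeros((m, m))
        alpha_hat, V = np.zeros_like(a), np.zeros_like(P)
        for t in range(n - 1, -1, -1):
            Finv = np.linalg.inv(F[t])
            L = T - K[t] @ Z                    # L_t = T_t - K_t Z_t
            r = Z.T @ Finv @ v[t] + L.T @ r     # r_{t-1}
            N = Z.T @ Finv @ Z + L.T @ N @ L    # N_{t-1}
            alpha_hat[t] = a[t] + P[t] @ r      # smoothed mean
            V[t] = P[t] - P[t] @ N @ P[t]       # smoothed variance
        return alpha_hat, V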

Page 22: Paris2012 session2

Smoothing Illustration

[Figure: four panels over 1880–1960: observations with the smoothed state; smoothed state variance V_t; r_t; N_t.]

Page 23: Paris2012 session2

Filtering and Smoothing

[Figure: observations with the smoothed level and the filtered level, 1870–1970.]

Page 24: Paris2012 session2

Missing Observations

Missing observations are very easy to handle in Kalman filtering:

• suppose y_j is missing;

• put v_j = 0, K_j = 0 and F_j = ∞ (so that F_j^{-1} = 0) in the algorithm;

• proceed with the further calculations as normal.

The filter algorithm then extrapolates according to the state equation until a new observation arrives. The smoother interpolates between observations. (A code sketch follows below.)

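In code the rule amounts to skipping the update step: with F_j = ∞ the gain is zero and only the prediction step survives. A sketch using NaN as the missing-value marker (my own convention):

    import numpy as np

    def kalman_filter_missing(y, T, Z, R, Q, H, a1, P1):
        # Kalman filter where a NaN entry of y marks a missing observation.
        a, P = a1.copy(), P1.copy()
        out_a, out_P = [], []
        for t in range(y.shape[0]):
            out_a.append(a); out_P.append(P)
            if np.isnan(y[t]).any():
                a = T @ a                       # v_t = 0, K_t = 0:
                P = T @ P @ T.T + R @ Q @ R.T   # prediction step only
            else:
                v = y[t] - Z @ a
                F = Z @ P @ Z.T + H
                K = T @ P @ Z.T @ np.linalg.inv(F)
                a = T @ a + K @ v
                P = T @ P @ T.T + R @ Q @ R.T - K @ F @ K.T
        return np.array(out_a), np.array(out_P)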

Page 25: Paris2012 session2

Missing Observations

[Figure: two panels over 1870–1970: observations with a_t; state variance P_t.]

Page 26: Paris2012 session2

Missing Observations: Filter and Smoother

[Figure: four panels over 1880–1960: filtered state; P_t; smoothed state; V_t.]

Page 27: Paris2012 session2

Forecasting

Forecasting requires no extra theory: just treat future observations as missing:

• put v_j = 0, K_j = 0 and F_j = ∞ for j = n + 1, …, n + k;

• proceed with the further calculations as normal;

• the forecast for y_j is Z_j a_j. (A worked sketch follows below.)

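A small scalar sketch of this idea for the local level model, with illustrative simulated numbers (the variances are arbitrary choices, not estimates):

    import numpy as np

    # Forecast k steps ahead by appending k missing values to the sample.
    rng = np.random.default_rng(1)
    n, k = 100, 10
    q_eta, h_eps = 0.25, 1.0                # illustrative Q and H
    level = np.cumsum(rng.normal(0, 0.5, n))
    y = np.concatenate([level + rng.normal(0, 1.0, n), np.full(k, np.nan)])

    a, P, forecasts = 0.0, 1e7, []          # crude diffuse start
    for t in range(n + k):
        if np.isnan(y[t]):                  # future point: v = 0, K = 0
            forecasts.append(a)             # forecast Z a_j = a_j
            P = P + q_eta                   # prediction step only
        else:
            F = P + h_eps
            K = P / F                       # T = Z = R = 1
            a = a + K * (y[t] - a)
            P = P + q_eta - K * F * K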

Page 28: Paris2012 session2

Forecasting

[Figure: two panels over 1870–2000: observations with a_t, extended beyond the sample; P_t, which grows over the forecast horizon.]

Page 29: Paris2012 session2

Parameter Estimation

The system matrices in a state space model typically depend on a parameter vector ψ. The model is completely Gaussian; we estimate by Maximum Likelihood. The loglikelihood of a time series is

log L = Σ_{t=1}^{n} log p(y_t | Y_{t-1}).

In the state space model, p(y_t | Y_{t-1}) is a Gaussian density with mean Z_t a_t and variance F_t:

log L = −(nN/2) log 2π − (1/2) Σ_{t=1}^{n} ( log |F_t| + v_t′ F_t^{-1} v_t ),

with v_t and F_t from the Kalman filter and N the dimension of y_t. This is called the prediction error decomposition of the likelihood. Estimation proceeds by numerically maximising log L.

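A sketch of this for the local level model, maximising over the two log-variances with a generic optimiser (my own function; the crude fixed initialization stands in for a proper diffuse treatment):

    import numpy as np
    from scipy.optimize import minimize

    def neg_loglik_local_level(params, y):
        # Prediction error decomposition for the local level model,
        # parameters (log sigma2_eps, log sigma2_eta).
        s2_eps, s2_eta = np.exp(params)
        a, P = y[0], 1e7                    # crude diffuse start
        ll = 0.0
        for t in range(1, len(y)):          # drop the diffuse first term
            v = y[t] - a
            F = P + s2_eps
            K = P / F
            ll -= 0.5 * (np.log(2 * np.pi) + np.log(F) + v * v / F)
            a = a + K * v
            P = P + s2_eta - K * F * K
        return -ll

    # With y holding e.g. the Nile series:
    # res = minimize(neg_loglik_local_level, np.log([1e4, 1e3]), args=(y,))
    # q_hat = np.exp(res.x[1] - res.x[0])   # signal-to-noise ratio q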

Page 30: Paris2012 session2

ML Estimate of Nile Data

[Figure: Nile data, 1870–1970, with level estimates for q = 1000, q = 0 and q = 0.0973 (the ML estimate).]

Page 31: Paris2012 session2

Diagnostics

• Null hypothesis: the standardised residuals satisfy

v_t / √F_t ∼ NID(0, 1);

• Apply standard tests for Normality, heteroskedasticity and serial correlation (a sketch follows below);

• A recursive algorithm is available to calculate smoothed disturbances (auxiliary residuals), which can be used to detect breaks and outliers;

• Model comparison and parameter restrictions: use likelihood-based procedures (LR test, AIC, BIC).

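A sketch of the first two checks on the standardised residuals (Jarque–Bera for Normality and a hand-rolled Ljung–Box statistic for serial correlation; my own helper, not the slides' code):

    import numpy as np
    from scipy import stats

    def residual_diagnostics(v, F, lags=10):
        # Univariate case: e_t = v_t / sqrt(F_t) should be NID(0, 1)
        # under the null hypothesis.
        e = np.ravel(v) / np.sqrt(np.ravel(F))
        jb_stat, jb_pval = stats.jarque_bera(e)
        n = len(e)
        e_c = e - e.mean()
        acf = np.array([np.sum(e_c[j:] * e_c[:-j]) for j in range(1, lags + 1)])
        acf /= np.sum(e_c ** 2)
        lb = n * (n + 2) * np.sum(acf ** 2 / (n - np.arange(1, lags + 1)))
        lb_pval = 1 - stats.chi2.cdf(lb, df=lags)
        return {"jarque_bera": (jb_stat, jb_pval), "ljung_box": (lb, lb_pval)}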

Page 32: Paris2012 session2

Nile Data Residuals Diagnostics

[Figure: four panels: standardised residuals, 1880–1960; correlogram of the residuals; density estimate, N(s = 0.996); normal QQ plot.]

Page 33: Paris2012 session2

Assignment 1 - Exercises 1 and 2

1. Formulate the following models in state space form:
   • ARMA(3,1) process;
   • ARMA(1,3) process;
   • ARIMA(3,1,1) process;
   • AR(3) plus noise model;
   • Random Walk plus AR(3) process.

   Please discuss the initial conditions for α_t in all cases.

2. Derive a Kalman filter for the local level model

   y_t = µ_t + ξ_t,   ξ_t ∼ N(0, σ²_ξ),   ∆µ_{t+1} = η_t ∼ N(0, σ²_η),

   with E(ξ_t η_t) = σ_ξη ≠ 0 and E(ξ_t η_s) = 0 for all t, s with t ≠ s. Also discuss the problem of missing observations in this case.


Page 34: Paris2012 session2

Assignment 1 - Exercise 3

3.

Consider a time series y_t that is modelled by a state space model, and consider a time series vector of k exogenous variables X_t. The dependent time series y_t depends on the explanatory variables X_t in a linear way.

• How would you introduce X_t as regression effects in the state space model?

• Can you allow the transition matrix T_t to depend on X_t too?

• Can you allow the transition matrix T_t to depend on y_t?

For the last two questions, please also consider maximum likelihood estimation of coefficients within the system matrices.


Page 35: Paris2012 session2

Assignment 1 - Computing work

• Consider Chapter 2 of the Durbin and Koopman book.

• There are 8 figures in Chapter 2.

• Write computer code that can reproduce all these figures in Chapter 2, except Fig. 2.4.

• Write short documentation for the Ox program.
