
Page 1: Paris2012 session2

State space methods for signal extraction, parameter estimation and forecasting

Siem Jan Koopman
http://personal.vu.nl/s.j.koopman

Department of Econometrics, VU University Amsterdam

Tinbergen Institute, 2012

Page 2: Paris2012 session2

Regression Model in time series

For a time series of a dependent variable y_t and a set of regressors X_t, we can consider the regression model

y_t = µ + X_t β + ε_t,   ε_t ∼ NID(0, σ²),   t = 1, …, n.

For large n, is it reasonable to assume constant coefficients µ, β and σ²?

If we have strong reasons to believe that µ and β are time-varying, the model may be written as

y_t = µ_t + X_t β_t + ε_t,   µ_{t+1} = µ_t + η_t,   β_{t+1} = β_t + ξ_t.

This is an example of a linear state space model.


Page 3: Paris2012 session2

State Space Model

The linear Gaussian state space model is defined in three parts:

→ State equation:

α_{t+1} = T_t α_t + R_t ζ_t,   ζ_t ∼ NID(0, Q_t),

→ Observation equation:

y_t = Z_t α_t + ε_t,   ε_t ∼ NID(0, H_t),

→ Initial state distribution α_1 ∼ N(a_1, P_1).

Notice that

• ζ_t and ε_s are independent for all t, s, and independent of α_1;

• observation y_t can be multivariate;

• state vector α_t is unobserved;

• matrices T_t, Z_t, R_t, Q_t, H_t determine the structure of the model.
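As a concrete sketch of these definitions (added here; not part of the original slides), the five system matrices can be passed around together and the model simulated directly from the two equations. The function and variable names below are illustrative only; the scalar example uses the local level model introduced later (slide 10).

    import numpy as np

    def simulate_state_space(T, Z, R, Q, H, a1, P1, n, seed=0):
        # Draw y_1..y_n and alpha_1..alpha_n from
        #   alpha_{t+1} = T alpha_t + R zeta_t,  zeta_t ~ NID(0, Q),
        #   y_t         = Z alpha_t + eps_t,     eps_t  ~ NID(0, H),
        # with alpha_1 ~ N(a1, P1); time-invariant matrices for simplicity.
        rng = np.random.default_rng(seed)
        alpha = rng.multivariate_normal(a1, P1)
        ys, alphas = [], []
        for _ in range(n):
            alphas.append(alpha)
            ys.append(Z @ alpha + rng.multivariate_normal(np.zeros(Z.shape[0]), H))
            alpha = T @ alpha + R @ rng.multivariate_normal(np.zeros(Q.shape[0]), Q)
        return np.array(ys), np.array(alphas)

    # Scalar example: local level model, T = Z = R = 1.
    y, alpha = simulate_state_space(np.eye(1), np.eye(1), np.eye(1),
                                    np.array([[0.5]]), np.array([[1.0]]),
                                    np.zeros(1), np.eye(1), n=100)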

Page 4: Paris2012 session2

State Space Model

• state space model is linear and Gaussian: therefore properties and results of the multivariate normal distribution apply;

• state vector α_t evolves as a VAR(1) process;

• system matrices usually contain unknown parameters;

• estimation therefore has two aspects:
  • measuring the unobservable state (prediction, filtering and smoothing);
  • estimation of unknown parameters (maximum likelihood estimation);

• state space methods offer a unified approach to a wide range of models and techniques: dynamic regression, ARIMA, UC models, latent variable models, spline-fitting and many ad-hoc filters;

• next, some well-known model specifications in state space form ...


Page 5: Paris2012 session2

Regression with time-varying coefficients

General state space model:

α_{t+1} = T_t α_t + R_t ζ_t,   ζ_t ∼ NID(0, Q_t),

y_t = Z_t α_t + ε_t,   ε_t ∼ NID(0, H_t).

Put the regressors in Z_t (that is, Z_t = X_t) and set

T_t = I,   R_t = I.

The result is the regression model y_t = X_t β_t + ε_t with time-varying coefficient β_t = α_t following a random walk.

In case Q_t = 0, the time-varying regression model reduces to the standard linear regression model y_t = X_t β + ε_t with β = α_1.

Many dynamic specifications for the regression coefficient can be considered, see below.


Page 6: Paris2012 session2

AR(1) process in State Space Form

Consider the AR(1) model y_{t+1} = φ y_t + ζ_t, in state space form:

α_{t+1} = T_t α_t + R_t ζ_t,   ζ_t ∼ NID(0, Q_t),

y_t = Z_t α_t + ε_t,   ε_t ∼ NID(0, H_t),

with state vector α_t and system matrices

Z_t = 1,   H_t = 0,
T_t = φ,   R_t = 1,   Q_t = σ².

We have:

• Z_t = 1 and H_t = 0 imply that α_t = y_t;

• the state equation implies y_{t+1} = φ y_t + ζ_t with ζ_t ∼ NID(0, σ²);

• this is the AR(1) model!


Page 7: Paris2012 session2

AR(2) in State Space Form

Consider the AR(2) model y_{t+1} = φ_1 y_t + φ_2 y_{t-1} + ζ_t, in state space form:

α_{t+1} = T_t α_t + R_t ζ_t,   ζ_t ∼ NID(0, Q_t),

y_t = Z_t α_t + ε_t,   ε_t ∼ NID(0, H_t),

with 2 × 1 state vector α_t and system matrices (writing matrices row by row, rows separated by semicolons):

Z_t = [ 1  0 ],   H_t = 0,

T_t = [ φ_1  1 ; φ_2  0 ],   R_t = [ 1 ; 0 ],   Q_t = σ².

We have:

• Z_t and H_t = 0 imply that α_{1t} = y_t;

• the first state equation implies y_{t+1} = φ_1 y_t + α_{2t} + ζ_t with ζ_t ∼ NID(0, σ²);

• the second state equation implies α_{2,t+1} = φ_2 y_t.


Page 8: Paris2012 session2

MA(1) in State Space Form

Consider the MA(1) model y_{t+1} = ζ_t + θ ζ_{t-1}, in state space form:

α_{t+1} = T_t α_t + R_t ζ_t,   ζ_t ∼ NID(0, Q_t),

y_t = Z_t α_t + ε_t,   ε_t ∼ NID(0, H_t),

with 2 × 1 state vector α_t and system matrices

Z_t = [ 1  0 ],   H_t = 0,

T_t = [ 0  1 ; 0  0 ],   R_t = [ 1 ; θ ],   Q_t = σ².

We have:

• Z_t and H_t = 0 imply that α_{1t} = y_t;

• the first state equation implies y_{t+1} = α_{2t} + ζ_t with ζ_t ∼ NID(0, σ²);

• the second state equation implies α_{2,t+1} = θ ζ_t.


Page 9: Paris2012 session2

General ARMA in State Space Form

Consider the ARMA(2,1) model

y_t = φ_1 y_{t-1} + φ_2 y_{t-2} + ζ_t + θ ζ_{t-1},

in state space form with

α_t = [ y_t ; φ_2 y_{t-1} + θ ζ_t ],

Z_t = [ 1  0 ],   H_t = 0,

T_t = [ φ_1  1 ; φ_2  0 ],   R_t = [ 1 ; θ ],   Q_t = σ².

All ARIMA(p, d, q) models have a (non-unique) state space representation.

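The mapping from ARMA coefficients to these matrices is mechanical, so it can be captured in a small helper. A sketch (the function name is mine); the companion form it builds reproduces the AR(2), MA(1) and ARMA(2,1) matrices shown on the preceding slides:

    import numpy as np

    def arma_to_state_space(phi, theta, sigma2=1.0):
        # Companion form with m = max(p, q+1):
        #   Z = (1, 0, ..., 0), H = 0,
        #   T carries (phi_1, ..., phi_m) in its first column plus a shifted
        #   identity block, R = (1, theta_1, ..., theta_{m-1})', Q = sigma2,
        # where phi_j = 0 for j > p and theta_j = 0 for j > q.
        p, q = len(phi), len(theta)
        m = max(p, q + 1)
        T = np.zeros((m, m))
        T[:p, 0] = phi
        T[:-1, 1:] = np.eye(m - 1)
        R = np.zeros((m, 1))
        R[0, 0] = 1.0
        R[1:q + 1, 0] = theta
        Z = np.zeros((1, m))
        Z[0, 0] = 1.0
        return T, Z, R, np.array([[sigma2]]), np.zeros((1, 1))

    # ARMA(2,1) of this slide: T = [phi_1 1; phi_2 0], R = (1, theta)'.
    T, Z, R, Q, H = arma_to_state_space(phi=[0.5, 0.3], theta=[0.4])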

Page 10: Paris2012 session2

UC models in State Space Form

State space model: α_{t+1} = T_t α_t + R_t ζ_t,   y_t = Z_t α_t + ε_t.

LL model µ_{t+1} = µ_t + η_t and y_t = µ_t + ε_t:

α_t = µ_t,   T_t = 1,   R_t = 1,   Q_t = σ²_η,
Z_t = 1,   H_t = σ²_ε.

LLT model µ_{t+1} = µ_t + β_t + η_t, β_{t+1} = β_t + ξ_t and y_t = µ_t + ε_t:

α_t = [ µ_t ; β_t ],   T_t = [ 1  1 ; 0  1 ],   R_t = [ 1  0 ; 0  1 ],   Q_t = [ σ²_η  0 ; 0  σ²_ξ ],

Z_t = [ 1  0 ],   H_t = σ²_ε.


Page 11: Paris2012 session2

UC models in State Space Form

State space model: α_{t+1} = T_t α_t + R_t ζ_t,   y_t = Z_t α_t + ε_t.

LLT model with seasonal: µ_{t+1} = µ_t + β_t + η_t, β_{t+1} = β_t + ξ_t, S(L) γ_{t+1} = ω_t and y_t = µ_t + γ_t + ε_t:

α_t = [ µ_t  β_t  γ_t  γ_{t-1}  γ_{t-2} ]′,

T_t = [ 1  1  0  0  0 ; 0  1  0  0  0 ; 0  0  −1  −1  −1 ; 0  0  1  0  0 ; 0  0  0  1  0 ],

Q_t = [ σ²_η  0  0 ; 0  σ²_ξ  0 ; 0  0  σ²_ω ],

R_t = [ 1  0  0 ; 0  1  0 ; 0  0  1 ; 0  0  0 ; 0  0  0 ],

Z_t = [ 1  0  1  0  0 ],   H_t = σ²_ε.

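Writing out these matrices by hand is error-prone, so a small constructor helps. A sketch for a general seasonal period s (the slide shows the s = 4 case; names are my own):

    import numpy as np

    def llt_seasonal_matrices(var_eta, var_xi, var_omega, var_eps, s=4):
        # LLT model with a dummy seasonal of period s:
        # state is (mu_t, beta_t, gamma_t, ..., gamma_{t-s+2}), so m = s + 1.
        m = s + 1
        T = np.zeros((m, m))
        T[0, 0] = T[0, 1] = T[1, 1] = 1.0   # level and slope block
        T[2, 2:] = -1.0                     # S(L) gamma_{t+1} = omega_t
        T[3:, 2:-1] = np.eye(s - 2)         # shift the seasonal lags
        R = np.zeros((m, 3))
        R[0, 0] = R[1, 1] = R[2, 2] = 1.0   # eta, xi, omega enter here
        Q = np.diag([var_eta, var_xi, var_omega])
        Z = np.zeros((1, m))
        Z[0, 0] = Z[0, 2] = 1.0             # y_t = mu_t + gamma_t + eps_t
        H = np.array([[var_eps]])
        return T, Z, R, Q, H

For s = 4 this reproduces the 5 × 5 transition matrix and 5 × 3 selection matrix shown above.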

Page 12: Paris2012 session2

Kalman Filter

• The Kalman filter calculates the mean and variance of the unobserved state, given the observations.

• The state is Gaussian: the complete distribution is characterized by the mean and variance.

• The filter is a recursive algorithm; the current best estimate is updated whenever a new observation is obtained.

• To start the recursion, we need a_1 and P_1, which we assumed given.

• There are various ways to initialize when a_1 and P_1 are unknown, which we will not discuss here.


Page 13: Paris2012 session2

Kalman Filter

The unobserved state α_t can be estimated from the observations with the Kalman filter:

v_t = y_t − Z_t a_t,
F_t = Z_t P_t Z_t′ + H_t,
K_t = T_t P_t Z_t′ F_t^{-1},
a_{t+1} = T_t a_t + K_t v_t,
P_{t+1} = T_t P_t T_t′ + R_t Q_t R_t′ − K_t F_t K_t′,

for t = 1, …, n, starting with the given values a_1 and P_1.

• Writing Y_t = {y_1, …, y_t},

a_{t+1} = E(α_{t+1} | Y_t),   P_{t+1} = Var(α_{t+1} | Y_t).

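These five recursions translate line by line into code. A minimal sketch (my own function; time-invariant matrices, known a_1 and P_1, and no attention to numerical refinements such as square-root filtering):

    import numpy as np

    def kalman_filter(y, T, Z, R, Q, H, a1, P1):
        # Forward recursions of this slide; y has shape (n, p).
        n = y.shape[0]
        a, P = a1.copy(), P1.copy()
        out = {"a": [], "P": [], "v": [], "F": [], "K": []}
        for t in range(n):
            v = y[t] - Z @ a                    # prediction error v_t
            F = Z @ P @ Z.T + H                 # its variance F_t
            K = T @ P @ Z.T @ np.linalg.inv(F)  # Kalman gain K_t
            out["a"].append(a); out["P"].append(P)
            out["v"].append(v); out["F"].append(F); out["K"].append(K)
            a = T @ a + K @ v                   # a_{t+1} = E(alpha_{t+1} | Y_t)
            P = T @ P @ T.T + R @ Q @ R.T - K @ F @ K.T
        return {k: np.array(w) for k, w in out.items()}

For the local level model these all collapse to scalar operations; the stored v and F are reused further below for smoothing and for the loglikelihood.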

Page 14: Paris2012 session2

Lemma I : multivariate Normal regression

Suppose x and y are jointly Normally distributed vectors with means µ_x = E(x) and µ_y = E(y), variance matrices Σ_xx and Σ_yy, respectively, and covariance matrix Σ_xy. Then

E(x | y) = µ_x + Σ_xy Σ_yy^{-1} (y − µ_y),
Var(x | y) = Σ_xx − Σ_xy Σ_yy^{-1} Σ_xy′.

Define e = x − E(x | y). Then

Var(e) = Var([x − µ_x] − Σ_xy Σ_yy^{-1} [y − µ_y])
       = Σ_xx − Σ_xy Σ_yy^{-1} Σ_xy′
       = Var(x | y),

and

Cov(e, y) = Cov([x − µ_x] − Σ_xy Σ_yy^{-1} [y − µ_y], y)
          = Σ_xy − Σ_xy = 0.


Page 15: Paris2012 session2

Lemma II : an extension

The proof of the Kalman filter uses a slight extension of the lemma for multivariate Normal regression.

Suppose x, y and z are jointly Normally distributed vectors with E(z) = 0 and Σ_yz = 0. Then

E(x | y, z) = E(x | y) + Σ_xz Σ_zz^{-1} z,
Var(x | y, z) = Var(x | y) − Σ_xz Σ_zz^{-1} Σ_xz′.

Can you give a proof of Lemma II? Hint: apply Lemma I to x and y* = (y′, z′)′. (A sketch follows below.)

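One way to fill in the hint, sketched here for completeness (this working is mine, not the slides'): apply Lemma I to x and y* = (y′, z′)′. Since Σ_yz = 0 and E(z) = 0,

Σ_{x y*} = [ Σ_xy  Σ_xz ],   Σ_{y* y*} = [ Σ_yy  0 ; 0  Σ_zz ],

so Σ_{y* y*}^{-1} is block-diagonal and Lemma I gives

E(x | y, z) = µ_x + Σ_xy Σ_yy^{-1} (y − µ_y) + Σ_xz Σ_zz^{-1} z = E(x | y) + Σ_xz Σ_zz^{-1} z,

Var(x | y, z) = Σ_xx − Σ_xy Σ_yy^{-1} Σ_xy′ − Σ_xz Σ_zz^{-1} Σ_xz′ = Var(x | y) − Σ_xz Σ_zz^{-1} Σ_xz′.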

Page 16: Paris2012 session2

Kalman Filter : derivation

State space model: α_{t+1} = T_t α_t + R_t ζ_t,   y_t = Z_t α_t + ε_t.

• Writing Y_t = {y_1, …, y_t}, define

a_{t+1} = E(α_{t+1} | Y_t),   P_{t+1} = Var(α_{t+1} | Y_t);

• The prediction error is

v_t = y_t − E(y_t | Y_{t-1})
    = y_t − E(Z_t α_t + ε_t | Y_{t-1})
    = y_t − Z_t E(α_t | Y_{t-1})
    = y_t − Z_t a_t;

• It follows that v_t = Z_t(α_t − a_t) + ε_t and E(v_t) = 0;

• The prediction error variance is F_t = Var(v_t) = Z_t P_t Z_t′ + H_t.


Page 17: Paris2012 session2

Kalman Filter : derivation

State space model: α_{t+1} = T_t α_t + R_t ζ_t,   y_t = Z_t α_t + ε_t.

• We have Y_t = {Y_{t-1}, y_t} = {Y_{t-1}, v_t} and E(v_t y_{t-j}′) = 0 for j = 1, …, t − 1;

• Apply the lemma E(x | y, z) = E(x | y) + Σ_xz Σ_zz^{-1} z, taking x = α_{t+1}, y = Y_{t-1} and z = v_t = Z_t(α_t − a_t) + ε_t;

• It follows that E(α_{t+1} | Y_{t-1}) = T_t a_t;

• Further, E(α_{t+1} v_t′) = T_t E(α_t v_t′) + R_t E(ζ_t v_t′) = T_t P_t Z_t′;

• Carrying out the lemma, we obtain the state update

a_{t+1} = E(α_{t+1} | Y_{t-1}, y_t)
        = T_t a_t + T_t P_t Z_t′ F_t^{-1} v_t
        = T_t a_t + K_t v_t,

with K_t = T_t P_t Z_t′ F_t^{-1}.


Page 18: Paris2012 session2

Kalman Filter : derivation

• Our best prediction of y_t is Z_t a_t. When the actual observation arrives, we calculate the prediction error v_t = y_t − Z_t a_t and its variance F_t = Z_t P_t Z_t′ + H_t. The new best estimate of the state mean is based on both the old estimate a_t and the new information v_t:

a_{t+1} = T_t a_t + K_t v_t,

similarly for the variance:

P_{t+1} = T_t P_t T_t′ + R_t Q_t R_t′ − K_t F_t K_t′.

Can you derive the updating equation for P_{t+1}? (A sketch follows below.)

• The Kalman gain

K_t = T_t P_t Z_t′ F_t^{-1}

is the optimal weighting matrix for the new evidence.

• You should be able to replicate the proof of the Kalman filter for the Local Level Model (DK, Chapter 2).

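A sketch of the requested derivation (my working, not the slides'): apply Lemma II with x = α_{t+1}, y = Y_{t-1} and z = v_t, so that

Var(α_{t+1} | Y_t) = Var(α_{t+1} | Y_{t-1}) − Σ_xz Σ_zz^{-1} Σ_xz′.

From the state equation, Var(α_{t+1} | Y_{t-1}) = T_t P_t T_t′ + R_t Q_t R_t′; from the previous slide, Σ_xz = E(α_{t+1} v_t′) = T_t P_t Z_t′, and Σ_zz = F_t. Substituting and using K_t = T_t P_t Z_t′ F_t^{-1} gives

P_{t+1} = T_t P_t T_t′ + R_t Q_t R_t′ − T_t P_t Z_t′ F_t^{-1} Z_t P_t T_t′ = T_t P_t T_t′ + R_t Q_t R_t′ − K_t F_t K_t′.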

Page 19: Paris2012 session2

Kalman Filter Illustration

[Figure: four panels over 1880–1960: observations with the filtered level a_t; state variance P_t; prediction error v_t; prediction error variance F_t.]

Page 20: Paris2012 session2

Prediction and Filtering

This version of the Kalman filter computes the predictions of the states directly:

a_{t+1} = E(α_{t+1} | Y_t),   P_{t+1} = Var(α_{t+1} | Y_t).

The filtered estimates are defined by

a_{t|t} = E(α_t | Y_t),   P_{t|t} = Var(α_t | Y_t),

for t = 1, …, n.

The Kalman filter can be "re-ordered" to compute both:

a_{t|t} = a_t + M_t v_t,   P_{t|t} = P_t − M_t F_t M_t′,

a_{t+1} = T_t a_{t|t},   P_{t+1} = T_t P_{t|t} T_t′ + R_t Q_t R_t′,

with M_t = P_t Z_t′ F_t^{-1} for t = 1, …, n, starting with a_1 and P_1.

Compare M_t with K_t (note K_t = T_t M_t). Verify this version of the KF using the earlier derivations.

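A one-step sketch of this re-ordered filter (my own function name), which makes the update/predict split explicit:

    import numpy as np

    def filter_step(y_t, a, P, T, Z, R, Q, H):
        # Re-ordered Kalman filter step: first update (filtering), then predict.
        v = y_t - Z @ a
        F = Z @ P @ Z.T + H
        M = P @ Z.T @ np.linalg.inv(F)           # M_t = P_t Z_t' F_t^{-1}
        a_filt = a + M @ v                       # a_{t|t}
        P_filt = P - M @ F @ M.T                 # P_{t|t}
        a_next = T @ a_filt                      # a_{t+1} = T_t a_{t|t}
        P_next = T @ P_filt @ T.T + R @ Q @ R.T  # P_{t+1}
        return a_filt, P_filt, a_next, P_next

Since K_t = T_t M_t, the prediction a_next = T a + K v reproduces the earlier recursion exactly.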

Page 21: Paris2012 session2

Smoothing

• The filter calculates the mean and variance conditional on Y_t;

• The Kalman smoother calculates the mean and variance conditional on the full set of observations Y_n;

• After the filtered estimates are calculated, the smoothing recursion starts at the last observation and runs back to the first.

α̂_t = E(α_t | Y_n),   V_t = Var(α_t | Y_n),

r_t = weighted sum of future innovations,   N_t = Var(r_t),

L_t = T_t − K_t Z_t.

Starting with r_n = 0, N_n = 0, the smoothing recursions are given by

r_{t-1} = Z_t′ F_t^{-1} v_t + L_t′ r_t,   N_{t-1} = Z_t′ F_t^{-1} Z_t + L_t′ N_t L_t,

α̂_t = a_t + P_t r_{t-1},   V_t = P_t − P_t N_{t-1} P_t.

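The backward pass also translates directly into code. A sketch that consumes the stored output of the forward Kalman filter shown earlier (time-invariant T and Z assumed; names are mine):

    import numpy as np

    def kalman_smoother(kf, T, Z):
        # Backward recursions of this slide, starting from r_n = 0, N_n = 0;
        # kf holds arrays a, P, v, F, K from a forward filtering pass.
        a, P, v, F, K = kf["a"], kf["P"], kf["v"], kf["F"], kf["K"]
        n, m = a.shape[0], a.shape[1]
        r, N = np.zeros(m), np.zeros((m, m))
        alpha_hat, V = np.zeros_like(a), np.zeros_like(P)
        for t in range(n - 1, -1, -1):
            Finv = np.linalg.inv(F[t])
            L = T - K[t] @ Z                    # L_t = T_t - K_t Z_t
            r = Z.T @ Finv @ v[t] + L.T @ r     # r_{t-1}
            N = Z.T @ Finv @ Z + L.T @ N @ L    # N_{t-1}
            alpha_hat[t] = a[t] + P[t] @ r      # smoothed mean
            V[t] = P[t] - P[t] @ N @ P[t]       # smoothed variance
        return alpha_hat, V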

Page 22: Paris2012 session2

Smoothing Illustration

[Figure: four panels over 1880–1960: observations with the smoothed state; smoothed state variance V_t; r_t; N_t.]

Page 23: Paris2012 session2

Filtering and Smoothing

[Figure: observations with the smoothed level and the filtered level, 1870–1970.]

Page 24: Paris2012 session2

Missing Observations

Missing observations are very easy to handle in Kalman filtering:

• suppose y_j is missing;

• put v_j = 0, K_j = 0 and F_j = ∞ (so that F_j^{-1} = 0) in the algorithm;

• proceed with the further calculations as normal.

The filter algorithm then extrapolates according to the state equation until a new observation arrives. The smoother interpolates between observations. (A code sketch follows below.)

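In code the rule amounts to skipping the update step: with F_j = ∞ the gain is zero and only the prediction step survives. A sketch using NaN as the missing-value marker (my own convention):

    import numpy as np

    def kalman_filter_missing(y, T, Z, R, Q, H, a1, P1):
        # Kalman filter where a NaN entry of y marks a missing observation.
        a, P = a1.copy(), P1.copy()
        out_a, out_P = [], []
        for t in range(y.shape[0]):
            out_a.append(a); out_P.append(P)
            if np.isnan(y[t]).any():
                a = T @ a                       # v_t = 0, K_t = 0:
                P = T @ P @ T.T + R @ Q @ R.T   # prediction step only
            else:
                v = y[t] - Z @ a
                F = Z @ P @ Z.T + H
                K = T @ P @ Z.T @ np.linalg.inv(F)
                a = T @ a + K @ v
                P = T @ P @ T.T + R @ Q @ R.T - K @ F @ K.T
        return np.array(out_a), np.array(out_P)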

Page 25: Paris2012 session2

Missing Observations

[Figure: two panels over 1870–1970: observations with a_t; state variance P_t.]

Page 26: Paris2012 session2

Missing Observations: Filter and Smoother

[Figure: four panels over 1880–1960: filtered state; P_t; smoothed state; V_t.]

Page 27: Paris2012 session2

Forecasting

Forecasting requires no extra theory: just treat future observations as missing:

• put v_j = 0, K_j = 0 and F_j = ∞ for j = n + 1, …, n + k;

• proceed with the further calculations as normal;

• the forecast for y_j is Z_j a_j. (A worked sketch follows below.)

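A small scalar sketch of this idea for the local level model, with illustrative simulated numbers (the variances are arbitrary choices, not estimates):

    import numpy as np

    # Forecast k steps ahead by appending k missing values to the sample.
    rng = np.random.default_rng(1)
    n, k = 100, 10
    q_eta, h_eps = 0.25, 1.0                # illustrative Q and H
    level = np.cumsum(rng.normal(0, 0.5, n))
    y = np.concatenate([level + rng.normal(0, 1.0, n), np.full(k, np.nan)])

    a, P, forecasts = 0.0, 1e7, []          # crude diffuse start
    for t in range(n + k):
        if np.isnan(y[t]):                  # future point: v = 0, K = 0
            forecasts.append(a)             # forecast Z a_j = a_j
            P = P + q_eta                   # prediction step only
        else:
            F = P + h_eps
            K = P / F                       # T = Z = R = 1
            a = a + K * (y[t] - a)
            P = P + q_eta - K * F * K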

Page 28: Paris2012 session2

Forecasting

[Figure: two panels over 1870–2000: observations with a_t, extended beyond the sample; P_t, which grows over the forecast horizon.]

Page 29: Paris2012 session2

Parameter Estimation

The system matrices in a state space model typically depend on a parameter vector ψ. The model is completely Gaussian; we estimate by Maximum Likelihood. The loglikelihood of a time series is

log L = Σ_{t=1}^{n} log p(y_t | Y_{t-1}).

In the state space model, p(y_t | Y_{t-1}) is a Gaussian density with mean Z_t a_t and variance F_t:

log L = −(nN/2) log 2π − (1/2) Σ_{t=1}^{n} ( log |F_t| + v_t′ F_t^{-1} v_t ),

with v_t and F_t from the Kalman filter and N the dimension of y_t. This is called the prediction error decomposition of the likelihood. Estimation proceeds by numerically maximising log L.

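A sketch of this for the local level model, maximising over the two log-variances with a generic optimiser (my own function; the crude fixed initialization stands in for a proper diffuse treatment):

    import numpy as np
    from scipy.optimize import minimize

    def neg_loglik_local_level(params, y):
        # Prediction error decomposition for the local level model,
        # parameters (log sigma2_eps, log sigma2_eta).
        s2_eps, s2_eta = np.exp(params)
        a, P = y[0], 1e7                    # crude diffuse start
        ll = 0.0
        for t in range(1, len(y)):          # drop the diffuse first term
            v = y[t] - a
            F = P + s2_eps
            K = P / F
            ll -= 0.5 * (np.log(2 * np.pi) + np.log(F) + v * v / F)
            a = a + K * v
            P = P + s2_eta - K * F * K
        return -ll

    # With y holding e.g. the Nile series:
    # res = minimize(neg_loglik_local_level, np.log([1e4, 1e3]), args=(y,))
    # q_hat = np.exp(res.x[1] - res.x[0])   # signal-to-noise ratio q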

Page 30: Paris2012 session2

ML Estimate of Nile Data

[Figure: Nile data, 1870–1970, with level estimates for q = 1000, q = 0 and q = 0.0973 (the ML estimate).]

Page 31: Paris2012 session2

Diagnostics

• Null hypothesis: the standardised residuals satisfy

v_t / √F_t ∼ NID(0, 1);

• Apply standard tests for Normality, heteroskedasticity and serial correlation (a sketch follows below);

• A recursive algorithm is available to calculate smoothed disturbances (auxiliary residuals), which can be used to detect breaks and outliers;

• Model comparison and parameter restrictions: use likelihood-based procedures (LR test, AIC, BIC).

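A sketch of the first two checks on the standardised residuals (Jarque–Bera for Normality and a hand-rolled Ljung–Box statistic for serial correlation; my own helper, not the slides' code):

    import numpy as np
    from scipy import stats

    def residual_diagnostics(v, F, lags=10):
        # Univariate case: e_t = v_t / sqrt(F_t) should be NID(0, 1)
        # under the null hypothesis.
        e = np.ravel(v) / np.sqrt(np.ravel(F))
        jb_stat, jb_pval = stats.jarque_bera(e)
        n = len(e)
        e_c = e - e.mean()
        acf = np.array([np.sum(e_c[j:] * e_c[:-j]) for j in range(1, lags + 1)])
        acf /= np.sum(e_c ** 2)
        lb = n * (n + 2) * np.sum(acf ** 2 / (n - np.arange(1, lags + 1)))
        lb_pval = 1 - stats.chi2.cdf(lb, df=lags)
        return {"jarque_bera": (jb_stat, jb_pval), "ljung_box": (lb, lb_pval)}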

Page 32: Paris2012 session2

Nile Data Residuals Diagnostics

[Figure: four panels: standardised residuals, 1880–1960; correlogram of the residuals; density estimate, N(s = 0.996); normal QQ plot.]

Page 33: Paris2012 session2

Assignment 1 - Exercises 1 and 2

1. Formulate the following models in state space form:
   • ARMA(3,1) process;
   • ARMA(1,3) process;
   • ARIMA(3,1,1) process;
   • AR(3) plus noise model;
   • Random Walk plus AR(3) process.

   Please discuss the initial conditions for α_t in all cases.

2. Derive a Kalman filter for the local level model

   y_t = µ_t + ξ_t,   ξ_t ∼ N(0, σ²_ξ),   ∆µ_{t+1} = η_t ∼ N(0, σ²_η),

   with E(ξ_t η_t) = σ_ξη ≠ 0 and E(ξ_t η_s) = 0 for all t, s with t ≠ s. Also discuss the problem of missing observations in this case.


Page 34: Paris2012 session2

Assignment 1 - Exercise 3

3.

Consider a time series y_t that is modelled by a state space model, and consider a time series vector of k exogenous variables X_t. The dependent time series y_t depends on the explanatory variables X_t in a linear way.

• How would you introduce X_t as regression effects in the state space model?

• Can you allow the transition matrix T_t to depend on X_t too?

• Can you allow the transition matrix T_t to depend on y_t?

For the last two questions, please also consider maximum likelihood estimation of coefficients within the system matrices.


Page 35: Paris2012 session2

Assignment 1 - Computing work

• Consider Chapter 2 of the Durbin and Koopman book.

• There are 8 figures in Chapter 2.

• Write computer code that can reproduce all these figures in Chapter 2, except Fig. 2.4.

• Write short documentation for the Ox program.
