14.384 Time Series Analysis, Fall 2007
Professor Anna Mikusheva
Paul Schrimpf, scribe
October 23, 2007; revised October 22, 2009

Lecture 17

Unit Roots

Review from last time

Let $y_t$ be a random walk,
$$ y_t = \rho y_{t-1} + \epsilon_t, \qquad \rho = 1, $$
where $\epsilon_t$ is a martingale difference sequence ($E[\epsilon_t \mid I_{t-1}] = 0$) with $\frac{1}{T}\sum_t E[\epsilon_t^2 \mid I_{t-1}] \to \sigma^2$ a.s. and $E\epsilon_t^4 < K < \infty$. Then $\{\epsilon_t\}$ satisfies a functional central limit theorem,
$$ \xi_T(\tau) = \frac{1}{\sqrt{T}} \sum_{t=1}^{[\tau T]} \epsilon_t \Rightarrow \sigma W(\cdot). $$

We showed last time that
$$ \frac{1}{T}\sum y_{t-1}\epsilon_t \Rightarrow \sigma^2 \int_0^1 W(t)\,dW(t), \qquad \frac{1}{T^{3/2}}\sum y_{t-1} \Rightarrow \sigma \int_0^1 W(t)\,dt, \qquad \frac{1}{T^2}\sum y_{t-1}^2 \Rightarrow \sigma^2 \int_0^1 W(t)^2\,dt, $$
and that the OLS estimator and t-statistic have non-standard limiting distributions:
$$ T(\hat\rho - \rho) \Rightarrow \frac{\int W\,dW}{\int W^2\,dt}, \qquad t \Rightarrow \frac{\int W\,dW}{\sqrt{\int W^2\,dt}}. $$

This is quite different from the case with $|\rho| < 1$. If $x_t = \rho x_{t-1} + \epsilon_t$ with $|\rho| < 1$, then
$$ \frac{1}{\sqrt{T}}\sum x_{t-1}\epsilon_t \Rightarrow N\!\left(0, \frac{\sigma^4}{1-\rho^2}\right) \ \text{(since } E[x_{t-1}\epsilon_t] = 0\text{)}, \qquad \frac{1}{T}\sum x_{t-1}^2 \to E x_t^2 = \frac{\sigma^2}{1-\rho^2}, $$
and the OLS estimate and t-statistic converge to normal distributions:
$$ \sqrt{T}(\hat\rho - \rho) \Rightarrow N(0, 1-\rho^2), \qquad t \Rightarrow N(0,1). $$
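As a quick illustration of this review (not part of the original notes), the following Monte Carlo sketch contrasts the two cases: for a random walk, $T(\hat\rho - 1)$ is skewed and shifted to the left, while for a stationary AR(1), $\sqrt{T}(\hat\rho - \rho)$ is approximately normal. Only numpy is used; the sample size, number of replications, and AR coefficient are arbitrary illustration choices.

    import numpy as np

    def ols_rho(y):
        """OLS slope in a regression of y_t on y_{t-1}, no constant."""
        y_lag, y_cur = y[:-1], y[1:]
        return np.dot(y_lag, y_cur) / np.dot(y_lag, y_lag)

    def simulate(T=200, reps=5000, rho=1.0, seed=0):
        """Normalized OLS estimates across Monte Carlo replications."""
        rng = np.random.default_rng(seed)
        stats = np.empty(reps)
        for r in range(reps):
            eps = rng.standard_normal(T)
            y = np.empty(T)
            y[0] = eps[0]
            for t in range(1, T):
                y[t] = rho * y[t - 1] + eps[t]
            # T-scaling under the unit root, sqrt(T)-scaling when |rho| < 1
            scale = T if rho == 1.0 else np.sqrt(T)
            stats[r] = scale * (ols_rho(y) - rho)
        return stats

    unit_root = simulate(rho=1.0)    # skewed, shifted to the left
    stationary = simulate(rho=0.5)   # approximately N(0, 1 - rho^2)
    print(np.percentile(unit_root, [2.5, 50, 97.5]))
    print(np.percentile(stationary, [2.5, 50, 97.5]))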




    Adding a constant

Suppose we add a constant to our OLS regression of $y_t$ on $y_{t-1}$. This is equivalent to running OLS on demeaned $y_t$. Let $y_t^\mu = y_t - \bar y$. Then
$$ \hat\rho - 1 = \frac{\sum y_{t-1}^\mu \epsilon_t}{\sum (y_{t-1}^\mu)^2}. $$

Consider the numerator:
$$ \frac{1}{T}\sum y_{t-1}^\mu \epsilon_t = \frac{1}{T}\sum (y_{t-1} - \bar y)\epsilon_t = \frac{1}{T}\sum y_{t-1}\epsilon_t - \left(\frac{1}{T^{3/2}}\sum y_{t-1}\right)\left(\frac{1}{\sqrt{T}}\sum \epsilon_t\right). $$

We know that $\frac{1}{T^{3/2}}\sum y_{t-1} \Rightarrow \sigma \int W\,dt$ and $\frac{1}{\sqrt{T}}\sum \epsilon_t \Rightarrow \sigma W(1)$. Also, from before, we know that $\frac{1}{T}\sum y_{t-1}\epsilon_t \Rightarrow \sigma^2 \int W\,dW$. Combining these, we have
$$ \frac{1}{T}\sum y_{t-1}^\mu \epsilon_t \Rightarrow \sigma^2\left(\int_0^1 W(s)\,dW(s) - W(1)\int_0^1 W(t)\,dt\right). $$

We can think of this as the integral of a demeaned Brownian motion,
$$ \int W(s)\,dW(s) - W(1)\int W(t)\,dt = \int\left(W(s) - \int W(t)\,dt\right)dW(s) = \int W^\mu(s)\,dW(s). $$

The limiting distributions of all statistics ($\hat\rho$, $t$, etc.) would change. In most cases, the change consists only of replacing $W$ by $W^\mu$. One can also include a linear trend in the regression; then the limiting distribution would depend on a detrended Brownian motion.

    Limiting Distribution

Let's return to the case without a constant. The limiting distribution of the OLS estimate is
$$ T(\hat\rho - 1) \Rightarrow \frac{\int W\,dW}{\int W^2\,dt}. $$
This distribution is skewed and shifted to the left. If we include a constant, the distribution is even more shifted; if we also include a trend, it shifts yet more. For example, the 2.5th percentile is -10.5 without a constant, -16.9 with a constant, and -25 with a trend; the 97.5th percentiles are 1.6, 0.41, and -1.8, respectively. Thus, estimates of $\rho$ have a bias of order $1/T$, which is quite large in small samples.

The distribution of the t-statistic is also shifted to the left and skewed, but less so than the distribution of the OLS estimate. The 2.5th percentiles are -2.23, -3.12, and -3.66, and the 97.5th percentiles are 1.6, 0.23, and -0.66 for no constant, with a constant, and with a trend, respectively.
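The percentiles quoted above can be approximated by simulation. The sketch below is not from the notes; the helper name df_coef_stat, the sample size, and the replication count are my own choices, and with enough replications the 2.5% and 97.5% quantiles should land near the cited values.

    import numpy as np

    def df_coef_stat(y, detrend="none"):
        """T*(rho_hat - 1) from a regression of y_t on y_{t-1},
        optionally adding a constant ('const') or constant plus linear trend ('trend')."""
        T = len(y) - 1
        x, z = y[:-1], y[1:]
        if detrend == "none":
            regs = x[:, None]
        elif detrend == "const":
            regs = np.column_stack([x, np.ones(T)])
        else:  # constant and trend
            regs = np.column_stack([x, np.ones(T), np.arange(T)])
        beta, *_ = np.linalg.lstsq(regs, z, rcond=None)
        return T * (beta[0] - 1.0)

    rng = np.random.default_rng(1)
    for case in ["none", "const", "trend"]:
        stats = []
        for _ in range(5000):
            y = np.cumsum(rng.standard_normal(250))   # random walk
            stats.append(df_coef_stat(y, case))
        print(case, np.percentile(stats, [2.5, 97.5]))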

    Allowing Auto-correlation

So far, we have assumed that the $\epsilon_t$ are not auto-correlated. This assumption is too strong for empirical work. Let
$$ y_t = \rho y_{t-1} + v_t, $$
where $v_t$ is a zero-mean stationary process with an MA representation,
$$ v_t = \sum_{j=0}^{\infty} c_j \epsilon_{t-j}, $$



where $\epsilon_t$ is white noise. Now we need to look at the limiting distributions of each of the terms we looked at before ($\sum v_t$, $\sum y_{t-1}v_t$, $\sum y_t^2$, etc.). $\sum v_t$ is a term we already encountered in HAC estimation:
$$ \frac{1}{\sqrt{T}}\sum_{t=1}^{T} v_t \Rightarrow N(0, \lambda^2), $$
where $\lambda^2$ is the long-run variance,
$$ \lambda^2 = \sum_{j=-\infty}^{\infty} \gamma_j = \sigma^2 c(1)^2. $$
For example, if $v_t = \epsilon_t + \theta\epsilon_{t-1}$, then $c(1) = 1 + \theta$ and $\lambda^2 = \sigma^2(1+\theta)^2$.

For the linearly dependent process $v_t$, we have the following functional central limit theorem:
$$ \xi_T(\tau) = \frac{1}{\sqrt{T}}\sum_{t=1}^{[\tau T]} v_t \Rightarrow \lambda W(\tau). $$
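A quick numerical check, not in the original notes: for MA(1) errors $v_t = \epsilon_t + \theta\epsilon_{t-1}$ with unit-variance white noise, the long-run variance is $\lambda^2 = (1+\theta)^2$, and the variance of $T^{-1/2}\sum v_t$ should be close to it. The parameter values are illustrative.

    import numpy as np

    # Variance of (1/sqrt(T)) * sum(v_t) versus lambda^2 = (1 + theta)^2
    rng = np.random.default_rng(5)
    theta, T, reps = 0.5, 2000, 5000
    sums = np.empty(reps)
    for r in range(reps):
        eps = rng.standard_normal(T + 1)
        v = eps[1:] + theta * eps[:-1]     # MA(1) process
        sums[r] = v.sum() / np.sqrt(T)
    print(sums.var(), (1 + theta) ** 2)    # the two numbers should be close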

The functional central limit theorem for $v_t$ can be proven using the Beveridge-Nelson decomposition. All the other terms converge to limits similar to the uncorrelated case, except with $\sigma$ replaced by $\lambda$. For example,
$$ \frac{y_{[\tau T]}}{\sqrt{T}} = \xi_T(\tau) \Rightarrow \lambda W(\tau), \qquad \frac{1}{T^{3/2}}\sum y_{t-1} \Rightarrow \lambda \int_0^1 W(s)\,ds. $$

An exception is
$$ \frac{1}{T}\sum y_{t-1}v_t = \frac{1}{2T}y_T^2 - \frac{1}{2T}\sum v_t^2 \Rightarrow \frac{1}{2}\left(\lambda^2 W(1)^2 - \sigma_v^2\right) = \lambda^2\int W\,dW + \frac{\lambda^2 - \sigma_v^2}{2}, $$
where $\sigma_v^2 = E v_t^2$. This leads to an extra constant in the distribution of $\hat\rho$:
$$ T(\hat\rho - 1) \Rightarrow \frac{\lambda^2\int W\,dW + \frac{1}{2}(\lambda^2 - \sigma_v^2)}{\lambda^2\int W^2\,dt}. $$

This additional constant is a nuisance parameter, so the statistic is not pivotal and it is impossible to just look up critical values in a table. Phillips & Perron (1988) suggested corrected statistics, for example
$$ T(\hat\rho - 1) - \frac{\frac{1}{2}\left(\hat\lambda^2 - \hat\sigma_v^2\right)}{\frac{1}{T^2}\sum y_{t-1}^2} \Rightarrow \frac{\int W\,dW}{\int W^2\,dt}, $$
where $\hat\lambda^2$ is an estimate of the long-run variance, say Newey-West, and $\hat\sigma_v^2$ is the variance of the residuals.
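Below is a minimal sketch (not from the notes) of this corrected coefficient statistic, with the long-run variance estimated by a Newey-West (Bartlett kernel) formula. The function names, the fixed bandwidth, and the MA(1) example are my own illustrative choices; a production implementation would also handle deterministic terms and data-driven bandwidth selection.

    import numpy as np

    def long_run_variance(v, bandwidth):
        """Newey-West (Bartlett kernel) estimate of the long-run variance of v,
        using uncentered autocovariances (the regression below has no intercept)."""
        T = len(v)
        lrv = np.dot(v, v) / T
        for j in range(1, bandwidth + 1):
            w = 1.0 - j / (bandwidth + 1.0)
            lrv += 2.0 * w * np.dot(v[j:], v[:-j]) / T
        return lrv

    def phillips_perron_coef(y, bandwidth=4):
        """Phillips-Perron corrected coefficient statistic for a regression of
        y_t on y_{t-1} with no deterministic terms."""
        y_lag, y_cur = y[:-1], y[1:]
        T = len(y_lag)
        rho_hat = np.dot(y_lag, y_cur) / np.dot(y_lag, y_lag)
        v_hat = y_cur - rho_hat * y_lag                 # residuals
        sigma_v2 = np.dot(v_hat, v_hat) / T             # short-run variance
        lam2 = long_run_variance(v_hat, bandwidth)      # long-run variance
        bias = 0.5 * (lam2 - sigma_v2) / (np.dot(y_lag, y_lag) / T**2)
        return T * (rho_hat - 1.0) - bias

    # Example: a unit-root series with MA(1) increments
    rng = np.random.default_rng(2)
    e = rng.standard_normal(501)
    v = e[1:] + 0.5 * e[:-1]
    y = np.cumsum(v)
    print(phillips_perron_coef(y))   # compare with the Dickey-Fuller table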

Augmented Dickey-Fuller

Another approach is the augmented Dickey-Fuller test. Suppose
$$ y_t = \sum_{j=1}^{p} a_j y_{t-j} + \epsilon_t, $$



where $z = 1$ is a root of the autoregressive polynomial,
$$ 1 - \sum_{j=1}^{p} a_j z^j = 0 \ \text{at}\ z = 1, \qquad \text{i.e.} \qquad 1 = \sum_{j=1}^{p} a_j. $$
Suppose we factor $a(L)$ as
$$ 1 - \sum_{j=1}^{p} a_j L^j = (1 - L)\, b(L), $$
with $b(L) = 1 - \sum_{j=1}^{p-1} \beta_j L^j$. We can write the model as
$$ \Delta y_t = \sum_{j=1}^{p-1} \beta_j \Delta y_{t-j} + \epsilon_t. $$
This suggests estimating
$$ y_t = \rho y_{t-1} + \sum_{j=1}^{p-1} \beta_j \Delta y_{t-j} + \epsilon_t $$
and testing whether $\rho = 1$ to see if we have a unit root. This is another way of allowing auto-correlation, because we can write the model as
$$ a(L) y_t = \epsilon_t \ \Leftrightarrow\ (1-L)\, b(L)\, y_t = \epsilon_t \ \Leftrightarrow\ \Delta y_t = b(L)^{-1}\epsilon_t \ \Leftrightarrow\ y_t = y_{t-1} + v_t, $$
where $v_t = b(L)^{-1}\epsilon_t$ is auto-correlated.

The nice thing about the augmented Dickey-Fuller test is that the coefficients on $\Delta y_{t-j}$ each converge to a normal distribution at rate $\sqrt{T}$, and the coefficient on $y_{t-1}$ converges to a non-standard distribution without nuisance parameters. To see this, let $x_t = [y_{t-1}, \Delta y_{t-1}, \ldots, \Delta y_{t-p+1}]'$, write $\tilde x_t = [\Delta y_{t-1}, \ldots, \Delta y_{t-p+1}]'$ for its stationary part, and let $\beta = [\rho, \beta_1, \ldots, \beta_{p-1}]'$.

Consider $\hat\beta - \beta = (X'X)^{-1}(X'\epsilon)$. We need to normalize this so that it converges. Our normalizing matrix will be
$$ Q = \begin{pmatrix} T & 0 & \cdots & 0 \\ 0 & \sqrt{T} & & \\ \vdots & & \ddots & \\ 0 & & & \sqrt{T} \end{pmatrix}. $$
Multiplying by $Q$ gives
$$ Q(\hat\beta - \beta) = \left(Q^{-1}(X'X)Q^{-1}\right)^{-1} Q^{-1}X'\epsilon. $$
The denominator is
$$ Q^{-1}(X'X)Q^{-1} = \begin{pmatrix} \frac{1}{T^2}\sum y_{t-1}^2 & \frac{1}{T^{3/2}}\sum y_{t-1}\tilde x_t' \\ \frac{1}{T^{3/2}}\sum \tilde x_t\, y_{t-1} & \frac{1}{T}\sum \tilde x_t \tilde x_t' \end{pmatrix}. $$



Note that $\frac{1}{T^{3/2}}\sum y_{t-1}\Delta y_{t-j} \stackrel{p}{\to} 0$ (we showed above that $\frac{1}{T}\sum y_{t-1}v_t \Rightarrow \frac{1}{2}(\lambda^2 W(1)^2 - \sigma_v^2)$, so the extra factor of $T^{-1/2}$ drives this term to zero). Also, we know that $\frac{1}{T^2}\sum y_{t-1}^2 \Rightarrow \lambda^2\int W^2\,dt$, so
$$ Q^{-1}(X'X)Q^{-1} \Rightarrow \begin{pmatrix} \lambda^2\int W^2\,dt & 0 \\ 0 & E[\tilde x_t \tilde x_t'] \end{pmatrix}. $$

Similarly,
$$ Q^{-1}X'\epsilon = \begin{pmatrix} \frac{1}{T}\sum y_{t-1}\epsilon_t \\ \frac{1}{\sqrt{T}}\sum \tilde x_t \epsilon_t \end{pmatrix} \Rightarrow \begin{pmatrix} \lambda\sigma\int W\,dW \\ N\!\left(0, \sigma^2 E[\tilde x_t\tilde x_t']\right) \end{pmatrix}. $$

Thus,
$$ T(\hat\rho - 1) \Rightarrow \frac{\sigma}{\lambda}\cdot\frac{\int W\,dW}{\int W^2\,dt}, $$
or, since $\sigma/\lambda = b(1) = 1 - \sum_j \beta_j$, the rescaled statistic
$$ \tilde T(\hat\rho - 1) = \frac{T(\hat\rho - 1)}{1 - \sum_j \hat\beta_j} \Rightarrow \frac{\int W\,dW}{\int W^2\,dt} $$
is free of nuisance parameters.
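A sketch of the ADF regression in the parameterization used above (not from the notes): regress $y_t$ on $y_{t-1}$ and $p-1$ lags of $\Delta y_t$, then rescale $T(\hat\rho-1)$ by $1/(1-\sum\hat\beta_j)$. The lag length, the data-generating process, and the helper name are illustrative; in practice one would also consider deterministic terms and choose the lag length by an information criterion such as BIC, as discussed below.

    import numpy as np

    def adf_coef_stat(y, p=4):
        """Estimate y_t = rho*y_{t-1} + sum_j beta_j*dy_{t-j} + e_t and return
        the rescaled coefficient statistic T*(rho_hat - 1)/(1 - sum(beta_hat))."""
        dy = np.diff(y)
        rows, targets = [], []
        for t in range(p, len(y)):
            # regressors: y_{t-1}, dy_{t-1}, ..., dy_{t-p+1}
            rows.append([y[t - 1]] + [dy[t - 1 - j] for j in range(1, p)])
            targets.append(y[t])
        X, z = np.array(rows), np.array(targets)
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        T_eff = len(z)
        rho_hat, lag_coefs = beta[0], beta[1:]
        return T_eff * (rho_hat - 1.0) / (1.0 - lag_coefs.sum())

    rng = np.random.default_rng(3)
    e = rng.standard_normal(501)
    v = e[1:] + 0.5 * e[:-1]          # MA(1) increments
    y = np.cumsum(v)                   # unit-root process with autocorrelated errors
    print(adf_coef_stat(y, p=4))       # compare with the no-constant Dickey-Fuller table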

    Other Tests

Sargan-Bhargava (1983). For a non-stationary process, the variance of $y_t$ grows with time, so we could look at
$$ \frac{\frac{1}{T^2}\sum_t y_{t-1}^2}{\frac{1}{T}\sum_t (\Delta y_t)^2}. $$
When $\epsilon_t$ is an m.d.s., this converges to a distribution,
$$ \frac{\frac{1}{T^2}\sum_t y_{t-1}^2}{\frac{1}{T}\sum_t (\Delta y_t)^2} \Rightarrow \int_0^1 W^2(t)\,dt. $$
For a stationary process, this converges to 0 in probability.

Range (Mandelbrot and coauthors). You could also look at the range of $y_t$; this should blow up for non-stationary processes. For a random walk, we have
$$ \frac{1}{\sqrt{T}}\left(\max_{t\le T} y_t - \min_{t\le T} y_t\right) \Rightarrow \sigma\left(\sup_{\tau\in[0,1]} W(\tau) - \inf_{\tau\in[0,1]} W(\tau)\right). $$
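Not part of the notes: a small sketch computing the two statistics just described for a simulated random walk and a stationary AR(1). The normalizations mirror the formulas above; sample sizes and parameters are arbitrary illustration choices.

    import numpy as np

    def sargan_bhargava_ratio(y):
        """(1/T^2)*sum(y_{t-1}^2) divided by (1/T)*sum((dy_t)^2):
        O_p(1) for a random walk, converges to 0 for a stationary series."""
        dy = np.diff(y)
        T = len(dy)
        return (np.sum(y[:-1] ** 2) / T**2) / (np.sum(dy**2) / T)

    def scaled_range(y):
        """(max(y) - min(y))/sqrt(T): converges to sigma*(sup W - inf W) for a
        random walk and to zero for a stationary process."""
        return (y.max() - y.min()) / np.sqrt(len(y))

    rng = np.random.default_rng(4)
    walk = np.cumsum(rng.standard_normal(1000))      # unit root
    ar1 = np.zeros(1000)
    for t in range(1, 1000):
        ar1[t] = 0.5 * ar1[t - 1] + rng.standard_normal()
    for name, y in [("random walk", walk), ("stationary AR(1)", ar1)]:
        print(name, sargan_bhargava_ratio(y), scaled_range(y))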

    Test Comparison

The Sargan-Bhargava (1983) test has optimal asymptotic power against local alternatives, but has bad size distortions in small samples, especially if the errors are negatively autocorrelated. The augmented Dickey-Fuller test with the BIC used to choose the lag length has overall good size in finite samples.


MIT OpenCourseWare
http://ocw.mit.edu

14.384 Time Series Analysis
Fall 2013

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
