mit14_384f13_lec17
DESCRIPTION
series de tiempoTRANSCRIPT
-
Review from last time 1
14.384 Time Series Analysis, Fall 2007Professor Anna Mikusheva
Paul Schrimpf, scribeOctober 23, 2007
revised October 22, 2009
Lecture 17
Unit Roots
Review from last time
Let yt be a random walk
yt = yt + t , = 1
where t is a martingale difference sequence (E[t|It] = 0), with 1TE[2t |I 2t 1] a.s. and E4t < K
-
Adding a constant 2
Adding a constant
Suppose we add a constant to our OLS regression of yt. This is equivalent to running OLS on demeaned yt.Let ymt = yt y. Then,
1 = ymt1t
(ymt1)2
Consider the numerator:
1 ym
1t 1 ( t =T
yt1
T y)t
1=
yt
T1t y
y = 1T 3/2
y 1t 1 t. We know that 1 3/2 yt 1 WT T dt, and 1 T t
1 2
W (1). Also, from beforewe know that ytT 1t WdW . Combining this, we have:
1 ymt 1
2t W (s)dW (s) W (1) W (t)dt
T
(
)We can think of this as the integral of a demeaned Brownian motion,
W (s)dW (s)W (1)W (t)dt =
(W (s)
W (t)dt
)dW (s)
=Wm(s)dW (s)
The limiting distributions of all statistics (, t, etc) would change. In most cases, the change is only inreplacing W by Wm. One also can include a linear trend in the regression, then the limiting distributionwould depend on a detrended Brownian motion.
Limiting Distribution
Lets return to the case without a constant. The limiting distribution of the OLS esitmate is T ( 1)WdW
W 2dt
. This distribution is skewed and shifted to the left. If we include a constant, the distribution is evenmore shifted. If we also include a trend, the distribution shifts yet more. For example, the 2.5%-tile withouta constant is -10.5, with a constant is -16.9, and with a trend is -25. The 97.5%-tiles are 1.6, 0.41, and -1.8,respectively. Thus, estimates of have bias of order 1T . This bias is quite large in small samples.
The distribution of the t-statistic is also shifted to the left and skewed, but less so than the distributionof the OLS estimate. The 2.5%-tiles are -2.23, -3.12, -3.66 and of the 97.5%-tiles are 1.6, 0.23, and -0.66 forno constant, with a constant, and with a trend, respectively.
Allowing Auto-correlation
So far, we have assumed that t are not auto-correlated. This assumption is too strong for empirical work.Let,
yt = yt1 + vt
where vt is a zero-mean stationary process with an MA representation,
vt =
cjj
j=0
Cite as: Anna Mikusheva, course materials for 14.384 Time Series Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu),Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].
-
Augmented Dickey Fuller 3
where j is white noise.before (
vt,yt 1vt,
Now we needto look at the limiting distributions of each of the terms we looked aty2t , etc). vt is a term we already encountered in HAC estimation:
1 TT
vt
t=1
N(0, 2)
where 2 is the long-term variance,
2 =
j=
j = 2c(1)2 2
For linearly dependent process, vt, we have the following central limit theorem:
1[T ]
T () = vt W ( )T
t=1
This can be proven by using the Beveridge-Nelson decomposition. All the other terms converge to thingssimilar to the uncorrelated case, except with replaced by . For example,
y = T (t/T ) W (t/T )T
1T 3/2
yt1
W (s)ds
An exception is:
1 1 1yt 1vt = y2
T 2T T
2T
v2t
1 (2W (1)22
2v)2 22 WdW + v
2
This leads to an extra constant in the distribution of :
T ( 1) 2WdW 1 2v 2 2
W 2dt
This additional constant is nuisance parameter, and thus, the statistic is not pivotal. So it is impossibleto just look up critical values in a table. Phillips & Perron (1988) suggested corrected statistics:
12 2 2
T ( 1) + 1T 2
where 2 is an estimate of the long-run variance, say
vy2 2t
WdWW dt
Newey-West, and 2v is the variance of the residuals.
Augmented Dickey Fuller
Another approach is the augmented Dickey-Fuller test. Suppose,
p
yt =
ajytj + tj=1
Cite as: Anna Mikusheva, course materials for 14.384 Time Series Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu),Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].
-
Augmented Dickey Fuller 4
where z = 1 is a root,
p
1
ajzj =0
j=1
1 = aj
Suppose we factor a(L) as
p
1 ajLj = (1 L)b(L)
with pb(L) = 1 1j=1 jLj . We can write the
model as
p1yt =
jyt +j t
j=1
This suggests estimating:
p1yt = yt1 +
jytj + t
j=1
and testing whether = 1 to see if we have a unit root. This is another way of allowing auto-correlationbecause we can write the model as:
a(L)yt =tb(L)yt =t
y 1t =b(L) tyt = yt + v1 t
where vt = b(L)1t is auto-correlated.The nice thing about the augemented Dickey-Fuller test is that the coefficients on ytj each converge
to a normal distribution at rateT , and the coefficient on yt1 converges to a non-standard distribution
without nuisance parameters. To see this, let xt = [yt1,yt1, ...,yt ]p+1 and let = [, 1, ..., p ].1
Consider: = (X X)1(X ). We need to normalize this so that it converges. Our normalizing matrixwill be
T ... 0 0 T 0Q =
.. . .. .
0T
Multiplying
by Q gives:
Q( ) =(Q1(X X)1Q1)1(Q1X )The denominator is: 1 y2 1T 2 t 1 3/2 yt 1y
1 T Q (X X) Q1 = 1 t1 ...11 T 3/2yt1yt X1
T
X..
.
Cite as: Anna Mikusheva, course materials for 14.384 Time Series Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu),Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].
-
Other Tests 5
Note that 13/2 yt 1p
yt j 0 (since we proved that 1 yt 1 )).T T y1 2
t 2 ( W (1)2 2j v Also, we
know that 1 2T 2
y 2
t 1 2
W dt, so
2 W 2dt 0 ...
Q1(X X)1 EX XQ 1 0
..
.
Similarly,
Q1X =( 1
T
yt1t
1 WdW
ytjt
)(N(0, E
[X XT ]
2)
)Thus,
WdWT ( 1)
W 2dt
or
T ( 1) WdWT ( 1) =
1 j
W 2dt
Other Tests
Sargan-Bhargrava (1983) For a non-stationary process, the variance of yt grows with time, so we couldlook at:
1 2T 2 y1
t1T y
2t1
1 2
when is an m.d.s., this converges to a distribution, T2 tty 1
1 2y 1 W 2(dt). For a stationary process, this
converges to 0 in probability.T
t
Range (Mandelbrot and coauthors) You could also look at the range of yt, this should blow up fornon-stationary processes for a random walk, we have:
1 (max min )T yt yt (
1 W
y2 sup ) inf W ()
T t
Test Comparison
The Sargan-Bhargrava (1983) test has optimal asymptotic power against local alternatives, but has bad sizedistortions in small samples, especially if errors are negatively autocorrelated. The augmented Dickey-Fullertest with the BIC used to choose lag-length has overall good size in finite samples.
Cite as: Anna Mikusheva, course materials for 14.384 Time Series Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu),Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].
-
MIT OpenCourseWarehttp://ocw.mit.edu
14.384 Time Series AnalysisFall 2013
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
Review from last timeAdding a constantLimiting Distribution
Allowing Auto-correlationAugmented Dickey Fuller
Other TestsTest Comparison