Source: suhasini/teaching673/chapter6.pdf (2016-11-09)
Chapter 6
Estimation of the mean and
covariance
Prerequisites
• Some idea of what a cumulant is.
Objectives
• To derive the sample autocovariance of a time series, and show that this is a positive definite
sequence.
• To show that the variance of the sample covariance involves fourth order cumulants, which
can be unwieldy to estimate in practice. But under linearity the expression for the variance
greatly simplifies.
• To show that under linearity the correlation does not involve the fourth order cumulant. This
is the Bartlett formula.
• To use the above results to construct a test for uncorrelatedness of a time series (the Portmanteau test), and understand how this test may be useful for testing for independence in
various different settings. Also understand situations where the test may fail.
Here we summarize the central limit theorems we will use in this chapter. The simplest is the
case of iid random variables. The first is the classical central limit theorem. Suppose that $\{X_i\}$ are
iid random variables with mean $\mu$ and variance $\sigma^2 < \infty$. Then
$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(X_i - \mu) \overset{D}{\to} \mathcal{N}(0, \sigma^2).$$
A small variant on the classical CLT is the case that $\{X_i\}$ are independent random variables (but
not identically distributed). Suppose $E[X_i] = \mu_i$, $\mathrm{var}[X_i] = \sigma_i^2 < \infty$ and for every $\varepsilon > 0$
$$\frac{1}{s_n^2}\sum_{i=1}^{n} E\big[(X_i - \mu_i)^2 I(s_n^{-1}|X_i - \mu_i| > \varepsilon)\big] \to 0,$$
where $s_n^2 = \sum_{i=1}^{n}\sigma_i^2$, which is the variance of $\sum_{i=1}^{n}X_i$ (the above condition is called the Lindeberg
condition). Then
$$\frac{1}{\sqrt{\sum_{i=1}^{n}\sigma_i^2}}\sum_{i=1}^{n}(X_i - \mu_i) \overset{D}{\to} \mathcal{N}(0, 1).$$
The Lindeberg condition looks unwieldy; however, by using the Chebyshev and Hölder inequalities it
can be reduced to simple bounds on the moments.
Note that the Lindeberg condition generalizes to the conditional Lindeberg condition when
dealing with martingale differences.
We now state a generalisation of this central limit theorem to triangular arrays. Suppose that $\{X_{t,n}\}$ are independent random variables with mean zero. Let $S_n = \sum_{t=1}^{n}X_{t,n}$ and assume that $\mathrm{var}[S_n] = \sum_{t=1}^{n}\mathrm{var}[X_{t,n}] = 1$. For example, in the case that $\{X_t\}$ are iid random variables with variance $\sigma^2$, $S_n = \frac{1}{\sqrt{n}}\sum_{t=1}^{n}\sigma^{-1}(X_t - \mu) = \sum_{t=1}^{n}X_{t,n}$, where $X_{t,n} = \sigma^{-1}n^{-1/2}(X_t - \mu)$. If for all $\varepsilon > 0$
$$\sum_{t=1}^{n} E\big[X_{t,n}^2\, I(|X_{t,n}| > \varepsilon)\big] \to 0,$$
then $S_n \overset{D}{\to} \mathcal{N}(0, 1)$.
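As a quick numerical aside (our own illustration, not from the notes; the exercises later in the chapter use R, but here we sketch in Python), the triangular-array CLT is easy to check by simulation. Below the summands $X_{t,n} = n^{-1/2}(X_t - 1)$ come from iid exponential variables with mean and variance one, so they are decidedly non-Gaussian; the sample size and replication count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 2000, 5000

# X_t iid exponential with mean 1 and variance 1 (so sigma^{-1} = 1);
# the triangular-array summands are X_{t,n} = n^{-1/2} (X_t - 1).
x = rng.exponential(scale=1.0, size=(reps, n))
s_n = (x - 1.0).sum(axis=1) / np.sqrt(n)   # S_n = sum_t X_{t,n}

# S_n should be approximately N(0, 1): mean near 0, variance near 1,
# and roughly 5% of the |S_n| should exceed 1.96.
print(round(s_n.mean(), 2), round(s_n.var(), 2),
      round((np.abs(s_n) > 1.96).mean(), 2))
```

The third printed number estimates the two-sided 5% tail probability, which is the basis of the error bars used later in the chapter.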
6.1 An estimator of the mean

Suppose we observe $\{Y_t\}_{t=1}^{n}$, where
$$Y_t = \mu + X_t,$$
where $\mu$ is the finite mean and $\{X_t\}$ is a zero mean stationary time series with absolutely summable
covariances ($\sum_k |\mathrm{cov}(X_0, X_k)| < \infty$). Our aim is to estimate the mean $\mu$. The most obvious
estimator is the sample mean $\bar{Y}_n = n^{-1}\sum_{t=1}^{n}Y_t$.
6.1.1 The sampling properties of the sample mean

We recall from Example 1.5.1 that we obtained an expression for the variance of the sample mean. We showed
that
$$\mathrm{var}(\bar{Y}_n) = \frac{1}{n}\mathrm{var}(X_0) + \frac{2}{n}\sum_{k=1}^{n}\Big(\frac{n-k}{n}\Big)c(k).$$
Furthermore, if $\sum_k |c(k)| < \infty$, then in Example 1.5.1 we showed that
$$\mathrm{var}(\bar{Y}_n) = \frac{1}{n}\mathrm{var}(X_0) + \frac{2}{n}\sum_{k=1}^{\infty}c(k) + o\Big(\frac{1}{n}\Big).$$
Thus if the time series has sufficient decay in its correlation structure, a mean squared consistent
estimator of the mean can be achieved. However, one drawback is that the dependency
means that one observation will influence the next, and if the influence is positive (seen by a positive
covariance), the resulting estimator may have a (much) larger variance than in the iid case.
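The variance inflation is easy to see numerically. The following Monte Carlo sketch (ours, in Python; the AR(1) parameter $\phi = 0.7$ and the sample sizes are arbitrary choices) compares $n\,\mathrm{var}(\bar{Y}_n)$ for a positively correlated AR(1) series against both $\mathrm{var}(X_0)$ (the value in the iid case) and the long run variance $\mathrm{var}(X_0) + 2\sum_{k\ge 1}c(k)$ appearing in the formula above.

```python
import numpy as np

rng = np.random.default_rng(1)
phi, n, reps = 0.7, 500, 2000

# Simulate AR(1): X_t = phi*X_{t-1} + e_t with var(e_t) = 1, stationary start.
e = rng.standard_normal((reps, n))
x = np.zeros((reps, n))
x[:, 0] = e[:, 0] / np.sqrt(1 - phi**2)
for t in range(1, n):
    x[:, t] = phi * x[:, t - 1] + e[:, t]

var_x0 = 1 / (1 - phi**2)                  # c(0), the iid benchmark
long_run = var_x0 * (1 + phi) / (1 - phi)  # c(0) + 2*sum_{k>=1} c(k)
mc = n * x.mean(axis=1).var()              # Monte Carlo n * var(sample mean)
print(round(mc, 2), round(long_run, 2), round(var_x0, 2))
```

The Monte Carlo value sits close to the long run variance and is several times larger than $\mathrm{var}(X_0)$, illustrating the drawback described above.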
The above result does not require any more conditions on the process, besides second order
stationarity and summability of its covariance. However, to obtain confidence intervals we require
a stronger result, namely a central limit theorem for the sample mean. The above conditions are
not enough to give a central limit theorem. To obtain a CLT for sums of the form $\sum_{t=1}^{n}X_t$ we
need the following main ingredients:

(i) The variance needs to be finite.

(ii) The dependence between the $X_t$ decreases the further apart in time the observations are. However,
this is more than just the correlation; it really means the dependence.

The above conditions are satisfied by linear time series if the coefficients $\psi_j$ decay sufficiently fast.
However, these conditions can also be verified for nonlinear time series (for example the (G)ARCH
and bilinear models described in Chapter 4).
We now state the asymptotic normality result for linear models.

Theorem 6.1.1 Suppose that $X_t$ is a linear time series of the form $X_t = \sum_{j=-\infty}^{\infty}\psi_j\varepsilon_{t-j}$, where $\varepsilon_t$
are iid random variables with mean zero and variance one, $\sum_{j=-\infty}^{\infty}|\psi_j| < \infty$ and $\sum_{j=-\infty}^{\infty}\psi_j \neq 0$.
Let $Y_t = \mu + X_t$; then we have
$$\sqrt{n}\big(\bar{Y}_n - \mu\big) \overset{D}{\to} \mathcal{N}(0, \sigma^2)$$
where $\sigma^2 = \mathrm{var}(X_0) + 2\sum_{k=1}^{\infty}c(k)$.
PROOF. Later in this course we will give precise details on how to prove asymptotic normality of
several different types of estimators in time series. However, we give a small flavour here by showing
asymptotic normality of $\bar{Y}_n$ in the special case that $\{X_t\}_{t=1}^{n}$ satisfies an MA($q$) model; we then explain
how the argument can be extended to MA($\infty$) processes.

The main idea of the proof is to transform/approximate the average into a quantity that we
know is asymptotically normal. We know that if $\{\varepsilon_t\}_{t=1}^{n}$ are iid random variables with mean $\mu$ and variance
one then
$$\sqrt{n}(\bar{\varepsilon}_n - \mu) \overset{D}{\to} \mathcal{N}(0, 1). \qquad (6.1)$$
We aim to use this result to prove the theorem. Returning to $\bar{Y}_n$, by a change of variables ($s = t - j$)
we can show that
$$\frac{1}{n}\sum_{t=1}^{n}Y_t = \mu + \frac{1}{n}\sum_{t=1}^{n}X_t = \mu + \frac{1}{n}\sum_{t=1}^{n}\sum_{j=0}^{q}\psi_j\varepsilon_{t-j}$$
$$= \mu + \frac{1}{n}\sum_{s=1}^{n-q}\varepsilon_s\Big(\sum_{j=0}^{q}\psi_j\Big) + \frac{1}{n}\sum_{s=-q+1}^{0}\varepsilon_s\Big(\sum_{j=1-s}^{q}\psi_j\Big) + \frac{1}{n}\sum_{s=n-q+1}^{n}\varepsilon_s\Big(\sum_{j=0}^{n-s}\psi_j\Big)$$
$$= \mu + \frac{n-q}{n}\Big(\sum_{j=0}^{q}\psi_j\Big)\frac{1}{n-q}\sum_{s=1}^{n-q}\varepsilon_s + \frac{1}{n}\sum_{s=-q+1}^{0}\varepsilon_s\Big(\sum_{j=1-s}^{q}\psi_j\Big) + \frac{1}{n}\sum_{s=n-q+1}^{n}\varepsilon_s\Big(\sum_{j=0}^{n-s}\psi_j\Big)$$
$$:= \mu + \frac{n-q}{n}\,\psi\,\bar{\varepsilon}_{n-q} + E_1 + E_2, \qquad (6.2)$$
where $\psi = \sum_{j=0}^{q}\psi_j$. It is straightforward to show that $E|E_1| \le Cn^{-1}$ and $E|E_2| \le Cn^{-1}$.
Finally we examine $\frac{n-q}{n}\psi\,\bar{\varepsilon}_{n-q}$. We note that if the assumptions are not satisfied and $\sum_{j=0}^{q}\psi_j = 0$
(for example the process $X_t = \varepsilon_t - \varepsilon_{t-1}$), then
$$\frac{1}{n}\sum_{t=1}^{n}Y_t = \mu + \frac{1}{n}\sum_{s=-q+1}^{0}\varepsilon_s\Big(\sum_{j=1-s}^{q}\psi_j\Big) + \frac{1}{n}\sum_{s=n-q+1}^{n}\varepsilon_s\Big(\sum_{j=0}^{n-s}\psi_j\Big).$$
This is a degenerate case, since $E_1$ and $E_2$ only consist of a finite number of terms, and thus if the $\varepsilon_t$ are
non-Gaussian these terms will never be asymptotically normal. Therefore, in this case we simply
have that $\frac{1}{n}\sum_{t=1}^{n}Y_t = \mu + O_p(\frac{1}{n})$ (this is why in the assumptions it was stated that $\psi \neq 0$).

On the other hand, if $\psi \neq 0$, then the dominating term in $\bar{Y}_n$ is $\psi\,\bar{\varepsilon}_{n-q}$. From (6.1) it is
clear that $\sqrt{n-q}\,\bar{\varepsilon}_{n-q} \overset{D}{\to} \mathcal{N}(0,1)$ as $n \to \infty$. Moreover, for finite $q$, $\sqrt{(n-q)/n} \to 1$, therefore
$\sqrt{n}\,\bar{\varepsilon}_{n-q} \overset{D}{\to} \mathcal{N}(0,1)$. Altogether, substituting $E|E_1| \le Cn^{-1}$ and $E|E_2| \le Cn^{-1}$ into (6.2) gives
$$\sqrt{n}\big(\bar{Y}_n - \mu\big) = \psi\sqrt{n}\,\bar{\varepsilon}_{n-q} + O_p(n^{-1/2}) \overset{D}{\to} \mathcal{N}(0, \psi^2).$$
With a little work, it can be shown that $\psi^2 = \sigma^2$.
Observe that the proof simply approximated the sample mean by an average of iid random variables. In the
case that the process is an MA($\infty$) or linear time series, a similar method is used. More precisely,
we have
$$\sqrt{n}\big(\bar{Y}_n - \mu\big) = \frac{1}{\sqrt{n}}\sum_{t=1}^{n}\sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j} = \frac{1}{\sqrt{n}}\sum_{j=0}^{\infty}\psi_j\sum_{s=1-j}^{n-j}\varepsilon_s = \frac{1}{\sqrt{n}}\Big(\sum_{j=0}^{\infty}\psi_j\Big)\sum_{t=1}^{n}\varepsilon_t + R_n$$
where
$$R_n = \frac{1}{\sqrt{n}}\sum_{j=0}^{\infty}\psi_j\Big(\sum_{s=1-j}^{n-j}\varepsilon_s - \sum_{s=1}^{n}\varepsilon_s\Big) = \frac{1}{\sqrt{n}}\sum_{j=0}^{n}\psi_j\Big(\sum_{s=1-j}^{0}\varepsilon_s - \sum_{s=n-j+1}^{n}\varepsilon_s\Big) + \frac{1}{\sqrt{n}}\sum_{j=n+1}^{\infty}\psi_j\Big(\sum_{s=1-j}^{n-j}\varepsilon_s - \sum_{s=1}^{n}\varepsilon_s\Big)$$
$$:= R_{n,1} + R_{n,2} + R_{n,3} + R_{n,4}.$$
We will show that $E[R_{n,j}^2] = o(1)$ for $1 \le j \le 4$. We start with $R_{n,1}$:
$$E[R_{n,1}^2] = \frac{1}{n}\sum_{j_1,j_2=0}^{n}\psi_{j_1}\psi_{j_2}\,\mathrm{cov}\Big(\sum_{s_1=1-j_1}^{0}\varepsilon_{s_1},\ \sum_{s_2=1-j_2}^{0}\varepsilon_{s_2}\Big) = \frac{1}{n}\sum_{j_1,j_2=0}^{n}\psi_{j_1}\psi_{j_2}\min[j_1, j_2]$$
$$\le \frac{1}{n}\sum_{j=0}^{n}\psi_j^2\, j + \frac{2}{n}\sum_{j_1=0}^{n}|\psi_{j_1}|\sum_{j_2=0}^{j_1-1}|\psi_{j_2}|\, j_2 \le \sum_{j=0}^{n}\frac{j}{n}\psi_j^2 + 2\Big(\sum_{j_1=0}^{\infty}|\psi_{j_1}|\Big)\sum_{j_2=0}^{n}\frac{j_2}{n}|\psi_{j_2}|.$$
Since $\sum_{j=0}^{\infty}|\psi_j| < \infty$ and, thus, $\sum_{j=0}^{\infty}\psi_j^2 < \infty$, by dominated convergence $\sum_{j=0}^{n}[1 - j/n]|\psi_j| \to \sum_{j=0}^{\infty}|\psi_j|$ and $\sum_{j=0}^{n}[1 - j/n]\psi_j^2 \to \sum_{j=0}^{\infty}\psi_j^2$ as $n \to \infty$. This implies that $\sum_{j=0}^{n}(j/n)|\psi_j| \to 0$ and $\sum_{j=0}^{n}(j/n)\psi_j^2 \to 0$. Substituting this into the above bound for $E[R_{n,1}^2]$ we immediately obtain
$E[R_{n,1}^2] = o(1)$. Using the same argument we obtain the same bound for $R_{n,2}$, $R_{n,3}$ and $R_{n,4}$. Thus
$$\sqrt{n}\big(\bar{Y}_n - \mu\big) = \Big(\sum_{j=0}^{\infty}\psi_j\Big)\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\varepsilon_t + o_p(1)$$
and the result then immediately follows. $\Box$
Estimation of the so-called long run variance (given in Theorem 6.1.1) can be difficult. There
are various methods that can be used, such as estimating the spectral density function (which we
define in Chapter 8) at zero. Another approach, proposed in Lobato (2001) and Shao (2010), is to
use the method of so-called self-normalization, which circumvents the need to estimate the long run
variance by pivotalizing the statistic.
6.2 An estimator of the covariance

Suppose we observe $\{Y_t\}_{t=1}^{n}$. To estimate the covariance $c(k) = \mathrm{cov}(Y_0, Y_k)$ from the observations, a plausible estimator is
$$\hat{c}_n(k) = \frac{1}{n}\sum_{t=1}^{n-|k|}(Y_t - \bar{Y}_n)(Y_{t+|k|} - \bar{Y}_n), \qquad (6.3)$$
since $E[(Y_t - \bar{Y}_n)(Y_{t+|k|} - \bar{Y}_n)] \approx c(k)$. Of course if the mean of $Y_t$ is known to be zero ($Y_t = X_t$),
then the covariance estimator is
$$\hat{c}_n(k) = \frac{1}{n}\sum_{t=1}^{n-|k|}X_t X_{t+|k|}. \qquad (6.4)$$
The eagle-eyed amongst you may wonder why we don't use $\frac{1}{n-|k|}\sum_{t=1}^{n-|k|}X_t X_{t+|k|}$, since $\hat{c}_n(k)$ is a
biased estimator, whereas $\frac{1}{n-|k|}\sum_{t=1}^{n-|k|}X_t X_{t+|k|}$ is not. However, $\hat{c}_n(k)$ has some very nice properties,
which we discuss in the lemma below.
Lemma 6.2.1 Suppose we define the empirical covariances
$$\hat{c}_n(k) = \begin{cases} \frac{1}{n}\sum_{t=1}^{n-|k|}X_t X_{t+|k|} & |k| \le n-1 \\ 0 & \text{otherwise;} \end{cases}$$
then $\{\hat{c}_n(k)\}$ is a positive definite sequence. Therefore, using Lemma 1.6.1, there exists a stationary
time series $\{Z_t\}$ which has the covariance $\hat{c}_n(k)$.
PROOF. There are various ways to show that $\{\hat{c}_n(k)\}$ is a positive definite sequence. One method
uses that the spectral density corresponding to this sequence is non-negative; we give this proof in
Section 8.3.1.

Here we give an alternative proof. We recall that a sequence is positive semi-definite if for any vector
$a = (a_1, \ldots, a_n)'$ we have
$$\sum_{k_1,k_2=1}^{n} a_{k_1} a_{k_2}\, \hat{c}_n(k_1 - k_2) = a'\hat{\Sigma}_n a \ge 0$$
where
$$\hat{\Sigma}_n = \begin{pmatrix}
\hat{c}_n(0) & \hat{c}_n(1) & \hat{c}_n(2) & \ldots & \hat{c}_n(n-1) \\
\hat{c}_n(1) & \hat{c}_n(0) & \hat{c}_n(1) & \ldots & \hat{c}_n(n-2) \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\hat{c}_n(n-1) & \hat{c}_n(n-2) & \ldots & \ldots & \hat{c}_n(0)
\end{pmatrix},$$
noting that $\hat{c}_n(k) = \frac{1}{n}\sum_{t=1}^{n-|k|}X_t X_{t+|k|}$. However, $\hat{c}_n(k)$ has a very interesting
construction: it can be shown that the above covariance matrix is $\hat{\Sigma}_n = \frac{1}{n}X_n X_n'$, where $X_n$ is the
$n \times 2n$ matrix
$$X_n = \begin{pmatrix}
0 & 0 & \ldots & 0 & X_1 & X_2 & \ldots & X_{n-1} & X_n \\
0 & 0 & \ldots & X_1 & X_2 & \ldots & X_{n-1} & X_n & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
X_1 & X_2 & \ldots & X_{n-1} & X_n & 0 & \ldots & \ldots & 0
\end{pmatrix}.$$
Using the above we have
$$a'\hat{\Sigma}_n a = \frac{1}{n}a'X_n X_n' a = \frac{1}{n}\|X_n' a\|_2^2 \ge 0.$$
This proves that $\{\hat{c}_n(k)\}$ is a positive definite sequence.

Finally, by using Theorem 1.6.1, there exists a stochastic process with $\{\hat{c}_n(k)\}$ as its autocovariance function. $\Box$
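The positive definiteness in the lemma is easy to verify numerically: build the Toeplitz matrix with entries $\hat{c}_n(i-j)$ from any sample path and check that its smallest eigenvalue is nonnegative. A short Python check of our own (any data will do; the path below is deliberately a nonstationary random walk, since the lemma is an algebraic fact about the estimator, not about the process):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.standard_normal(n).cumsum()   # any sample path, even a nonstationary one

# Biased sample autocovariances c_n(k) = (1/n) sum_t X_t X_{t+|k|}, as in (6.4)
c = np.array([(x[: n - k] * x[k:]).sum() / n for k in range(n)])

# Toeplitz matrix with entries c_n(i - j); by the lemma it is (1/n) X_n X_n',
# hence positive semi-definite.
i, j = np.indices((n, n))
sigma = c[np.abs(i - j)]
eigmin = np.linalg.eigvalsh(sigma).min()
print(eigmin)   # nonnegative up to floating-point rounding
```

Repeating this with the unbiased divisor $n - |k|$ instead of $n$ can produce genuinely negative eigenvalues, which is one reason the biased estimator is preferred.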
6.2.1 Asymptotic properties of the covariance estimator

The main reason we construct an estimator is either for testing or for constructing a confidence interval
for the parameter of interest. To do this we need the variance and distribution of the estimator. It
is impossible to derive the finite sample distribution, thus we look at the asymptotic distribution.

Besides showing asymptotic normality, it is important to derive an expression for the variance.
In an ideal world the variance will be simple and will not involve unknown parameters. Usually
in time series this will not be the case, and the variance will involve several (often an infinite
number of) parameters which are not straightforward to estimate. Later in this section we show
that the variance of the sample covariance can be extremely complicated. However, a substantial
simplification can arise if we consider only the sample correlation (not the covariance) and assume linearity
of the time series. This result is known as Bartlett's formula (you may have come across Maurice
Bartlett before; besides his fundamental contributions in time series he is well known for proposing
the famous Bartlett correction). This example demonstrates how the assumption of linearity can
really simplify problems in time series analysis, and also how we can circumvent certain problems
which arise by making slight modifications of the estimator (such as going from covariance to
correlation).

The following theorem gives the asymptotic sampling properties of the covariance estimator
(6.3). One proof of the result can be found in Brockwell and Davis (1998), Chapter 8, and in Fuller
(1995), but it goes back to Bartlett (indeed it is called Bartlett's formula). We prove the result in
Section 6.2.2.
Theorem 6.2.1 Suppose $\{X_t\}$ is a linear stationary time series where
$$X_t = \mu + \sum_{j=-\infty}^{\infty}\psi_j\varepsilon_{t-j},$$
where $\sum_j |\psi_j| < \infty$ and $\{\varepsilon_t\}$ are iid random variables with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^4) < \infty$. Suppose we
observe $\{X_t : t = 1, \ldots, n\}$ and use (6.3) as an estimator of the covariance $c(k) = \mathrm{cov}(X_0, X_k)$.
Define $\hat{\rho}_n(r) = \hat{c}_n(r)/\hat{c}_n(0)$ as the sample correlation. Then for each $h \in \{1, \ldots, n\}$
$$\sqrt{n}\big(\hat{\rho}_n(h) - \rho(h)\big) \overset{D}{\to} \mathcal{N}(0, W_h) \qquad (6.5)$$
where $\hat{\rho}_n(h) = (\hat{\rho}_n(1), \ldots, \hat{\rho}_n(h))$, $\rho(h) = (\rho(1), \ldots, \rho(h))$ and
$$(W_h)_{ij} = \sum_{k=-\infty}^{\infty}\big\{\rho(k+i)\rho(k+j) + \rho(k-i)\rho(k+j) + 2\rho(i)\rho(j)\rho^2(k) - 2\rho(i)\rho(k)\rho(k+j) - 2\rho(j)\rho(k)\rho(k+i)\big\}. \qquad (6.6)$$
Equation (6.6) is known as Bartlett's formula.
In Section 6.3 we apply this result to check for correlation in a time series. We first show
how the expression for the asymptotic variance is obtained.
6.2.2 Proof of Bartlett's formula

What are cumulants?

We first derive an expression for the variance of $\hat{c}_n(r)$ under the assumption that $\{X_t\}$ is a strictly stationary time
series with finite fourth order moment, $\sum_k |c(k)| < \infty$ and, for all $r_1, r_2 \in \mathbb{Z}$, $\sum_k |\kappa_4(r_1, k, k+r_2)| < \infty$, where
$$\kappa_4(k_1, k_2, k_3) = \mathrm{cum}(X_0, X_{k_1}, X_{k_2}, X_{k_3}).$$
It is reasonable to ask what cumulants are. Cumulants often crop up in time series. To
understand what they are and why they are used, we focus the following discussion on just fourth
order cumulants.

The joint cumulant of $X_t, X_{t+k_1}, X_{t+k_2}, X_{t+k_3}$ (denoted $\mathrm{cum}(X_t, X_{t+k_1}, X_{t+k_2}, X_{t+k_3})$) is the
coefficient of $s_1 s_2 s_3 s_4$ in the power series expansion of
$$\log E\big[e^{is_1 X_t + is_2 X_{t+k_1} + is_3 X_{t+k_2} + is_4 X_{t+k_3}}\big].$$
It looks very similar to the definition of moments. Indeed there is a one to one correspondence
between the moments and the cumulants, which is why they arise in time series. However, there
are important differences:

• If $X_t$ is independent of $X_{t+k_1}, X_{t+k_2}, X_{t+k_3}$ then
$$\mathrm{cum}(X_t, X_{t+k_1}, X_{t+k_2}, X_{t+k_3}) = 0.$$
This is because the log of the corresponding characteristic function is
$$\log E\big[e^{is_1 X_t + is_2 X_{t+k_1} + is_3 X_{t+k_2} + is_4 X_{t+k_3}}\big] = \log E\big[e^{is_1 X_t}\big] + \log E\big[e^{is_2 X_{t+k_1} + is_3 X_{t+k_2} + is_4 X_{t+k_3}}\big].$$
Thus we see that the coefficient of $s_1 s_2 s_3 s_4$ in the above expansion is zero.

• If $(X_t, X_{t+k_1}, X_{t+k_2}, X_{t+k_3})$ is multivariate Gaussian, then all cumulants higher than order 2
are zero.

Neither of the above two properties holds for moments.
We can see from the definition of the characteristic function that, if the time series is strictly stationary, then
$$\log E\big[e^{is_1 X_t + is_2 X_{t+k_1} + is_3 X_{t+k_2} + is_4 X_{t+k_3}}\big] = \log E\big[e^{is_1 X_0 + is_2 X_{k_1} + is_3 X_{k_2} + is_4 X_{k_3}}\big].$$
Thus the cumulants are invariant to shifts:
$$\mathrm{cum}(X_t, X_{t+k_1}, X_{t+k_2}, X_{t+k_3}) = \mathrm{cum}(X_0, X_{k_1}, X_{k_2}, X_{k_3}) = \kappa_4(k_1, k_2, k_3).$$
Thus, like the autocovariance function of a stationary process, it does not depend on $t$.
The cumulant is similar to the covariance in that:

(a) The covariance measures the dependence between $X_t$ and $X_{t+k}$. Note that $\mathrm{cov}[X_t, X_{t+k}] = \mathrm{cov}[X_{t+k}, X_t]$, hence the covariance is invariant to order (if the random variables are real).
The covariance is the second order cumulant. Like the covariance, the joint cumulant $\mathrm{cum}[X_t, X_{t+k_1}, X_{t+k_2}, X_{t+k_3}]$ is also invariant to order.

(b) The cumulant $\mathrm{cum}[X_t, X_{t+k_1}, X_{t+k_2}, X_{t+k_3}]$ measures the dependence between the four variables in "all
directions". For example, suppose $\{X_t\}$ has zero mean; then
$$\mathrm{cum}[X_t, X_{t+k_1}, X_{t+k_2}, X_{t+k_3}] = E[X_t X_{t+k_1} X_{t+k_2} X_{t+k_3}] - E[X_t X_{t+k_1}]E[X_{t+k_2} X_{t+k_3}] - E[X_t X_{t+k_2}]E[X_{t+k_1} X_{t+k_3}] - E[X_t X_{t+k_3}]E[X_{t+k_1} X_{t+k_2}]. \qquad (6.7)$$

(c) In time series we usually assume that the covariance decays over time, i.e. if $k > 0$
$$|\mathrm{cov}[X_t, X_{t+k}]| \le \alpha(k)$$
where $\alpha(k)$ is a positive sequence such that $\alpha(k) \to 0$ as $k \to \infty$. We showed this result was
true for linear time series. The same is true of cumulants, i.e. assuming $k_1 \le k_2 \le k_3$,
$$|\mathrm{cum}[X_t, X_{t+k_1}, X_{t+k_2}, X_{t+k_3}]| \le \alpha(k_1)\alpha(k_2 - k_1)\alpha(k_3 - k_2).$$

(d) Often in proofs we use the assumption $\sum_r |c(r)| < \infty$. An analogous assumption is $\sum_{k_1,k_2,k_3}|\kappa_4(k_1, k_2, k_3)| < \infty$.
Example 6.2.1 For the causal AR(1) model $X_t = \phi X_{t-1} + \varepsilon_t$ (where $\{\varepsilon_t\}$ are iid random variables
with finite fourth order cumulant $\kappa_4$), by using the MA($\infty$) representation (and assuming $0 \le k_1 \le k_2 \le k_3$) we have
$$\mathrm{cum}[X_t, X_{t+k_1}, X_{t+k_2}, X_{t+k_3}] = \kappa_4\sum_{j=0}^{\infty}\phi^j\phi^{j+k_1}\phi^{j+k_2}\phi^{j+k_3} = \kappa_4\,\phi^{k_1+k_2+k_3}\sum_{j=0}^{\infty}\phi^{4j}.$$
Observe that the fourth order dependence decays as the lags increase.
The variance of the sample covariance in the case of strict stationarity

We will consider
$$\mathrm{var}[\hat{c}_n(r)] = \frac{1}{n^2}\sum_{t,\tau=1}^{n-|r|}\mathrm{cov}(X_t X_{t+r}, X_\tau X_{\tau+r}).$$
One approach for the analysis of $\mathrm{cov}(X_t X_{t+r}, X_\tau X_{\tau+r})$ is to expand it in terms of expectations,
$\mathrm{cov}(X_t X_{t+r}, X_\tau X_{\tau+r}) = E(X_t X_{t+r} X_\tau X_{\tau+r}) - E(X_t X_{t+r})E(X_\tau X_{\tau+r})$; however, it is not clear how this
will give $\mathrm{var}[\hat{c}_n(r)] = O(n^{-1})$. Instead we observe that $\mathrm{cov}(X_t X_{t+r}, X_\tau X_{\tau+r})$ is the covariance
of products of random variables. This belongs to the general class of cumulants of products
of random variables, so we can use standard results on cumulants. The most important is that if
$X, Y, U$ and $V$ are mean zero random variables, then
$$\mathrm{cov}[XY, UV] = \mathrm{cov}[X,U]\,\mathrm{cov}[Y,V] + \mathrm{cov}[X,V]\,\mathrm{cov}[Y,U] + \mathrm{cum}(X,Y,U,V)$$
(this result can be seen from (6.7)); it can be generalized to higher order cumulants, see
Brillinger (2001). Using this result we have
$$\mathrm{var}[\hat{c}_n(r)] = \frac{1}{n^2}\sum_{t,\tau=1}^{n-|r|}\Big(\underbrace{\mathrm{cov}(X_t, X_\tau)}_{=c(t-\tau)\text{ by stationarity}}\mathrm{cov}(X_{t+r}, X_{\tau+r}) + \mathrm{cov}(X_t, X_{\tau+r})\,\mathrm{cov}(X_{t+r}, X_\tau) + \mathrm{cum}(X_t, X_{t+r}, X_\tau, X_{\tau+r})\Big)$$
$$= \frac{1}{n^2}\sum_{t,\tau=1}^{n-|r|}\big[c(t-\tau)^2 + c(t-\tau-r)c(t+r-\tau) + \kappa_4(r, \tau-t, \tau+r-t)\big] := I + II + III,$$
where the above is due to strict stationarity of the time series. We analyse the above term by
term, either (i) by changing variables and letting $k = t - \tau$ and changing the limits of the
summand in an appropriate way, or (ii) by observing that $\sum_{t,\tau=1}^{n-|r|}c(t-\tau)^2$ is the sum of the elements
in the Toeplitz matrix
$$\begin{pmatrix}
c(0)^2 & c(1)^2 & \ldots & c(n-|r|-1)^2 \\
c(-1)^2 & c(0)^2 & \ldots & c(n-|r|-2)^2 \\
\vdots & \vdots & \ddots & \vdots \\
c(-(n-|r|-1))^2 & c(-(n-|r|-2))^2 & \ldots & c(0)^2
\end{pmatrix}$$
(noting that $c(-k) = c(k)$), the sum $I$ can be written as
$$I = \frac{1}{n^2}\sum_{t,\tau=1}^{n-|r|}c(t-\tau)^2 = \frac{1}{n^2}\sum_{k=-(n-|r|-1)}^{n-|r|-1}c(k)^2\sum_{t=1}^{n-|r|-|k|}1 = \frac{1}{n}\sum_{k=-(n-|r|)}^{n-|r|}\Big(\frac{n-|r|-|k|}{n}\Big)c(k)^2.$$
For all $k$, $(1-|k|/n)c(k)^2 \to c(k)^2$ and $\big|\sum_{k=-(n-|r|)}^{n-|r|}(1-|k|/n)c(k)^2\big| \le \sum_k c(k)^2$; thus by dominated
convergence (see Appendix A) $\sum_{k=-(n-|r|)}^{n-|r|}(1-|k|/n)c(k)^2 \to \sum_{k=-\infty}^{\infty}c(k)^2$. This gives
$$I = \frac{1}{n}\sum_{k=-\infty}^{\infty}c(k)^2 + o\Big(\frac{1}{n}\Big).$$
Using a similar argument we can show that
$$II = \frac{1}{n}\sum_{k=-\infty}^{\infty}c(k+r)c(k-r) + o\Big(\frac{1}{n}\Big).$$
To derive the limit of $III$, again we use a change of variables to give
$$III = \frac{1}{n}\sum_{k=-(n-|r|)}^{n-|r|}\Big(\frac{n-|r|-|k|}{n}\Big)\kappa_4(r, k, k+r).$$
To bound $III$ we note that for all $k$, $(1-|k|/n)\kappa_4(r,k,k+r) \to \kappa_4(r,k,k+r)$ and $\big|\sum_{k=-(n-|r|)}^{n-|r|}(1-|k|/n)\kappa_4(r,k,k+r)\big| \le \sum_k|\kappa_4(r,k,k+r)|$; thus by dominated convergence we have $\sum_{k=-(n-|r|)}^{n-|r|}(1-|k|/n)\kappa_4(r,k,k+r) \to \sum_{k=-\infty}^{\infty}\kappa_4(r,k,k+r)$. This gives
$$III = \frac{1}{n}\sum_{k=-\infty}^{\infty}\kappa_4(r,k,k+r) + o\Big(\frac{1}{n}\Big).$$
Therefore, altogether we have
$$n\,\mathrm{var}[\hat{c}_n(r)] = \sum_{k=-\infty}^{\infty}c(k)^2 + \sum_{k=-\infty}^{\infty}c(k+r)c(k-r) + \sum_{k=-\infty}^{\infty}\kappa_4(r,k,k+r) + o(1).$$
Using similar arguments we obtain
$$n\,\mathrm{cov}[\hat{c}_n(r_1), \hat{c}_n(r_2)] = \sum_{k=-\infty}^{\infty}c(k)c(k+r_1-r_2) + \sum_{k=-\infty}^{\infty}c(k-r_1)c(k+r_2) + \sum_{k=-\infty}^{\infty}\kappa_4(r_1, k, k+r_2) + o(1).$$
We observe that the covariance of the covariance estimator contains both covariance and cumulant
terms. Thus if we need to estimate them, for example to construct confidence intervals, this can
be difficult. However, there do exist methods for estimating the variance; the most popular is to
use a block type bootstrap, and another method is to exploit properties of Fourier transforms and use
the method of orthogonal samples (see Subba Rao (2016)).
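The variance expansion above can be checked by simulation. In the sketch below (our own check, in Python) the errors are Gaussian, so $\kappa_4 = 0$ and only the two covariance sums survive; the AR(1) parameter $\phi = 0.5$, lag $r = 1$ and sample sizes are arbitrary choices, and $c(k) = \phi^{|k|}/(1-\phi^2)$ is the AR(1) autocovariance with unit innovation variance.

```python
import numpy as np

rng = np.random.default_rng(2)
phi, n, reps, r = 0.5, 400, 4000, 1

# Gaussian AR(1): kappa_4 = 0, so the cumulant sum drops out of the formula.
e = rng.standard_normal((reps, n))
x = np.zeros((reps, n))
x[:, 0] = e[:, 0] / np.sqrt(1 - phi**2)
for t in range(1, n):
    x[:, t] = phi * x[:, t - 1] + e[:, t]

chat = (x[:, : n - r] * x[:, r:]).sum(axis=1) / n   # c_n(r) as in (6.4)
mc = n * chat.var()                                 # Monte Carlo n * var[c_n(r)]

# Theoretical limit: sum_k c(k)^2 + sum_k c(k+r) c(k-r), truncated at |k| <= 200
k = np.arange(-200, 201)
c = phi ** np.abs(k) / (1 - phi**2)
cp = phi ** np.abs(k + r) / (1 - phi**2)
cm = phi ** np.abs(k - r) / (1 - phi**2)
theory = (c**2).sum() + (cp * cm).sum()
print(round(mc, 2), round(theory, 2))
```

The two printed numbers agree up to Monte Carlo error and the $o(1)$ boundary terms.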
Below we show that under linearity the above fourth order cumulant term has a simpler form.
The covariance of the sample covariance under linearity

We recall that
$$n\,\mathrm{cov}[\hat{c}_n(r_1), \hat{c}_n(r_2)] = \sum_{k=-\infty}^{\infty}c(k)c(k+r_1-r_2) + \sum_{k=-\infty}^{\infty}c(k-r_1)c(k+r_2) + \sum_{k=-\infty}^{\infty}\kappa_4(r_1, k, k+r_2) + o(1) = T_1 + T_2 + T_3 + o(1).$$
We now show that under linearity, $T_3$ (the fourth order cumulant term) has a much simpler form. Let
us suppose that the time series is linear,
$$X_t = \sum_{j=-\infty}^{\infty}\psi_j\varepsilon_{t-j},$$
where $\sum_j |\psi_j| < \infty$, $\{\varepsilon_t\}$ are iid, $E(\varepsilon_t) = 0$, $\mathrm{var}(\varepsilon_t) = 1$ and $\kappa_4 = \mathrm{cum}_4(\varepsilon_t)$. Then $T_3$ is
$$T_3 = \sum_{k=-\infty}^{\infty}\mathrm{cum}\Big(\sum_{j_1}\psi_{j_1}\varepsilon_{-j_1},\ \sum_{j_2}\psi_{j_2}\varepsilon_{r_1-j_2},\ \sum_{j_3}\psi_{j_3}\varepsilon_{k-j_3},\ \sum_{j_4}\psi_{j_4}\varepsilon_{k+r_2-j_4}\Big)$$
$$= \sum_{k=-\infty}^{\infty}\sum_{j_1,\ldots,j_4=-\infty}^{\infty}\psi_{j_1}\psi_{j_2}\psi_{j_3}\psi_{j_4}\,\mathrm{cum}(\varepsilon_{-j_1}, \varepsilon_{r_1-j_2}, \varepsilon_{k-j_3}, \varepsilon_{k+r_2-j_4}).$$
Standard results on cumulants state that if any one variable is independent of all the others (see Section 6.2.2), then $\mathrm{cum}[Y_1, Y_2, \ldots, Y_n] = 0$. Applying this result to $\mathrm{cum}(\varepsilon_{-j_1}, \varepsilon_{r_1-j_2}, \varepsilon_{k-j_3}, \varepsilon_{k+r_2-j_4})$, which is nonzero only when all four indices coincide, reduces $T_3$ to
$$T_3 = \kappa_4\sum_{k=-\infty}^{\infty}\sum_{j=-\infty}^{\infty}\psi_j\psi_{j-r_1}\psi_{j-k}\psi_{j-r_2-k}.$$
Using the change of variables $j_1 = j$ and $j_2 = j - k$ we have
$$T_3 = \kappa_4\Big(\sum_{j_1=-\infty}^{\infty}\psi_{j_1}\psi_{j_1-r_1}\Big)\Big(\sum_{j_2=-\infty}^{\infty}\psi_{j_2}\psi_{j_2-r_2}\Big) = \kappa_4\, c(r_1)c(r_2),$$
recalling the covariance of a linear process from Lemma 3.1.1 (and that $\mathrm{var}[\varepsilon_t] = 1$).
Altogether this gives
$$n\,\mathrm{cov}[\hat{c}_n(r_1), \hat{c}_n(r_2)] = \sum_{k=-\infty}^{\infty}c(k)c(k+r_1-r_2) + \sum_{k=-\infty}^{\infty}c(k-r_1)c(k+r_2) + \kappa_4\, c(r_1)c(r_2) + o(1). \qquad (6.9)$$
Thus in the case of linearity our expression for the variance is simpler, and the only difficult
parameter to estimate is $\kappa_4$.
The variance of the sample correlation under linearity

A surprising twist in the story is that (6.9) can be reduced further if we are interested in estimating
the correlation rather than the covariance. We recall the sample correlation is
$$\hat{\rho}_n(r) = \frac{\hat{c}_n(r)}{\hat{c}_n(0)},$$
which is an estimator of $\rho(r) = c(r)/c(0)$.

Lemma 6.2.2 (Bartlett's formula) Suppose $\{X_t\}$ is a linear time series, where $\sum_j |\psi_j| < \infty$.
Then the variance of the limiting distribution of $\sqrt{n}(\hat{\rho}_n(r) - \rho(r))$ is
$$\sum_{k=-\infty}^{\infty}\big\{\rho(k+r)^2 + \rho(k-r)\rho(k+r) + 2\rho(r)^2\rho^2(k) - 4\rho(r)\rho(k)\rho(k+r)\big\}.$$
PROOF. We use that
$$\mathrm{var}[\hat{c}_n(r)] = O(n^{-1}) \quad\text{and}\quad E[\hat{c}_n(r)] = \Big(\frac{n-|r|}{n}\Big)c(r).$$
Thus for fixed $r$
$$|\hat{c}_n(0) - c(0)| = O_p(n^{-1/2}), \quad |\hat{c}_n(r) - c(r)| = O_p(n^{-1/2}) \quad\text{and}\quad |\hat{c}_n(0) - c(0)|\,|\hat{c}_n(r) - c(r)| = O_p(n^{-1}).$$
By making a Taylor expansion of $\hat{c}_n(0)^{-1}$ about $c(0)^{-1}$ we have
$$\hat{\rho}_n(r) - \rho(r) = \frac{\hat{c}_n(r)}{\hat{c}_n(0)} - \frac{c(r)}{c(0)} = \frac{[\hat{c}_n(r) - c(r)]}{c(0)} - [\hat{c}_n(0) - c(0)]\frac{\hat{c}_n(r)}{c(0)^2} + \underbrace{[\hat{c}_n(0) - c(0)]^2\frac{\hat{c}_n(r)}{\bar{c}_n(0)^3}}_{=O_p(n^{-1})} \qquad (6.10)$$
$$= \frac{[\hat{c}_n(r) - c(r)]}{c(0)} - [\hat{c}_n(0) - c(0)]\frac{c(r)}{c(0)^2} + \underbrace{[\hat{c}_n(0) - c(0)]^2\frac{\hat{c}_n(r)}{\bar{c}_n(0)^3} - [\hat{c}_n(0) - c(0)]\frac{[\hat{c}_n(r) - c(r)]}{c(0)^2}}_{O_p(n^{-1})} := A_n + O_p(n^{-1}), \qquad (6.11)$$
where $\bar{c}_n(0)$ lies between $\hat{c}_n(0)$ and $c(0)$ and
$$A_n = \frac{[\hat{c}_n(r) - c(r)]}{c(0)} - [\hat{c}_n(0) - c(0)]\frac{c(r)}{c(0)^2}.$$
Thus the dominating term in $\hat{\rho}_n(r) - \rho(r)$ is $A_n$, which is of order $O_p(n^{-1/2})$ (by (6.9)). Thus the
limiting distribution of $\hat{\rho}_n(r) - \rho(r)$ is determined by $A_n$, and the variance of the limiting distribution
is also determined by $A_n$. It is straightforward to show that
$$n\,\mathrm{var}[A_n] = \frac{n\,\mathrm{var}[\hat{c}_n(r)]}{c(0)^2} - 2n\,\mathrm{cov}[\hat{c}_n(r), \hat{c}_n(0)]\frac{c(r)}{c(0)^3} + n\,\mathrm{var}[\hat{c}_n(0)]\frac{c(r)^2}{c(0)^4}. \qquad (6.12)$$
By using (6.9), with $r_1 = r_2 = r$, $r_1 = r, r_2 = 0$ and $r_1 = r_2 = 0$ in turn, we have
$$n\,\mathrm{var}\begin{pmatrix}\hat{c}_n(r)\\ \hat{c}_n(0)\end{pmatrix} = \begin{pmatrix}
\sum_k c(k)^2 + \sum_k c(k+r)c(k-r) + \kappa_4 c(r)^2 & 2\sum_k c(k)c(k+r) + \kappa_4 c(r)c(0) \\
2\sum_k c(k)c(k+r) + \kappa_4 c(r)c(0) & 2\sum_k c(k)^2 + \kappa_4 c(0)^2
\end{pmatrix} + o(1).$$
Substituting the above into (6.12) gives us
$$n\,\mathrm{var}[A_n] = \Big(\sum_k c(k)^2 + \sum_k c(k+r)c(k-r) + \kappa_4 c(r)^2\Big)\frac{1}{c(0)^2} - 2\Big(2\sum_k c(k)c(k+r) + \kappa_4 c(r)c(0)\Big)\frac{c(r)}{c(0)^3} + \Big(2\sum_k c(k)^2 + \kappa_4 c(0)^2\Big)\frac{c(r)^2}{c(0)^4} + o(1).$$
Focusing on the fourth order cumulant terms, we see that they cancel: $\kappa_4 c(r)^2/c(0)^2 - 2\kappa_4 c(r)^2/c(0)^2 + \kappa_4 c(r)^2/c(0)^2 = 0$. Dividing the remaining covariance terms by powers of $c(0)$ converts them into correlations, which gives the result. $\Box$
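Bartlett's formula is easy to check by simulation. The sketch below (our own, in Python; the parameter $\phi = 0.6$ and sample sizes are arbitrary) evaluates the formula at lag $r = 1$ for an AR(1), where $\rho(k) = \phi^{|k|}$ and the sum collapses to the well-known value $1 - \phi^2$, and compares it with the Monte Carlo variance of $\sqrt{n}\,\hat{\rho}_n(1)$.

```python
import numpy as np

rng = np.random.default_rng(3)
phi, n, reps = 0.6, 1000, 4000

e = rng.standard_normal((reps, n))
x = np.zeros((reps, n))
x[:, 0] = e[:, 0] / np.sqrt(1 - phi**2)
for t in range(1, n):
    x[:, t] = phi * x[:, t - 1] + e[:, t]

# Sample correlation at lag one for each replication
c0 = (x * x).mean(axis=1)
c1 = (x[:, :-1] * x[:, 1:]).sum(axis=1) / n
mc = n * (c1 / c0).var()            # Monte Carlo n * var[rho_n(1)]

# Bartlett's formula at r = 1 with rho(k) = phi^|k|, truncated at |k| <= 300
k = np.arange(-300, 301)
rho = phi ** np.abs(k)
rho_p = phi ** np.abs(k + 1)
rho_m = phi ** np.abs(k - 1)
bartlett = (rho_p**2 + rho_m * rho_p + 2 * phi**2 * rho**2
            - 4 * phi * rho * rho_p).sum()
print(round(bartlett, 3), round(mc, 3))
```

Note that the Bartlett value $1 - \phi^2 = 0.64$ is well below the value $1$ obtained for iid data, so naive $\pm 1.96/\sqrt{n}$ error bars are conservative at lag one for this model.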
To prove Theorem 6.2.1, we simply use Lemma 6.2.2 to obtain an asymptotic expression
for the variance; we then use $A_n$ to show asymptotic normality of $\hat{c}_n(r)$ (under linearity).

Exercise 6.1 Under the assumption that $\{X_t\}$ are iid random variables, show that $\hat{c}_n(1)$ is asymptotically normal.
Hint: Let $m = n/(B+1)$ and partition the sum $\sum_{t=1}^{n-1}X_t X_{t+1}$ as follows:
$$\sum_{t=1}^{n-1}X_t X_{t+1} = \sum_{t=1}^{B}X_t X_{t+1} + X_{B+1}X_{B+2} + \sum_{t=B+2}^{2B+1}X_t X_{t+1} + X_{2B+2}X_{2B+3} + \sum_{t=2B+3}^{3B+2}X_t X_{t+1} + X_{3B+3}X_{3B+4} + \sum_{t=3B+4}^{4B+3}X_t X_{t+1} + \ldots$$
$$= \sum_{j=0}^{m-1}U_{m,j} + \sum_{j=0}^{m-1}X_{(j+1)(B+1)}X_{(j+1)(B+1)+1}$$
where $U_{m,j} = \sum_{t=j(B+1)+1}^{j(B+1)+B}X_t X_{t+1}$. Show that the second term in the above sum is asymptotically negligible, and show that the classical CLT for triangular arrays can be applied to the first term.

Exercise 6.2 Under the assumption that $\{X_t\}$ is an MA(1) process, show that $\hat{c}_n(1)$ is asymptotically normal.
Exercise 6.3 The block bootstrap scheme is a commonly used method for estimating the finite
sample distribution of a statistic (which includes its variance). The aim of this exercise is to see
how well the bootstrap variance approximates the finite sample variance of a statistic.

(i) In R, write a function to calculate the autocovariance $\hat{c}_n(1) = \frac{1}{n}\sum_{t=1}^{n-1}X_t X_{t+1}$.
Remember the function is defined as cov1 = function(x){...}.

(ii) Load the boot library, library("boot"), into R. We will use the block bootstrap, which partitions the data into blocks of length l and then samples from the blocks n/l times to construct
a new bootstrap time series of length n. For each bootstrap time series the covariance is
evaluated, and this is done R times. The variance is calculated based on these R bootstrap
estimates.
You will need to use the function tsboot(tseries,statistic,R=100,l=20,sim="fixed"):
tseries refers to the original data, statistic to the function you wrote in part (i) (which should
only be a function of the data), R= is the number of bootstrap replications and l is the length
of the block.
Note that tsboot(tseries,statistic,R=100,l=20,sim="fixed")$t will be a vector of length
R = 100 which will contain the bootstrap statistics; you can calculate the variance of this
vector.

(iii) Simulate the AR(2) time series arima.sim(list(order = c(2, 0, 0), ar = c(1.5, -0.75)), n =
128) 500 times. For each realisation calculate the sample autocovariance at lag one and also
the bootstrap variance.

(iv) Calculate the mean of the bootstrap variances and also the mean squared error (compared
with the empirical variance). How does the bootstrap perform?

(v) Play around with the bootstrap block length l. Observe how the block length can influence the
result.
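For readers without R, the scheme in the exercise can be sketched in Python (the function names here are our own, and the resampler is only a bare-bones analogue of tsboot with sim="fixed": sample n/l blocks of length l with replacement, glue them together, recompute the statistic):

```python
import numpy as np

rng = np.random.default_rng(4)

def cov1(x):
    """Sample autocovariance at lag one: c_n(1) = (1/n) sum_t X_t X_{t+1}."""
    return (x[:-1] * x[1:]).sum() / len(x)

def block_bootstrap_var(x, l=20, R=100):
    """Bootstrap variance of cov1 from R resamples built out of blocks of length l."""
    n = len(x)
    starts = np.arange(n - l + 1)
    stats = [cov1(np.concatenate([x[s:s + l]
                                  for s in rng.choice(starts, size=n // l)]))
             for _ in range(R)]
    return np.var(stats)

# One realisation of the AR(2) from the exercise, with a burn-in period
m, n = 100, 128
e = rng.standard_normal(m + n)
x = np.zeros(m + n)
for t in range(2, m + n):
    x[t] = 1.5 * x[t - 1] - 0.75 * x[t - 2] + e[t]
x = x[m:]

print(block_bootstrap_var(x))
```

Joining blocks breaks the dependence at the block boundaries, which is exactly why the block length l matters in part (v) of the exercise.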
Remark 6.2.1 The above would appear to be a nice trick, but there are two major factors that
lead to the cancellation of the fourth order cumulant term:

• linearity of the time series;

• the ratio between $\hat{c}_n(r)$ and $\hat{c}_n(0)$.

Indeed this is not a chance result; in fact there is a logical reason why this result is true (and it is
true for many statistics which have a similar form, commonly called ratio statistics). It is easiest
explained in the Fourier domain. If the estimator can be written as
$$\frac{\frac{1}{n}\sum_{k=1}^{n}\phi(\omega_k)I_n(\omega_k)}{\frac{1}{n}\sum_{k=1}^{n}I_n(\omega_k)},$$
where $I_n(\omega)$ is the periodogram and $\{X_t\}$ is a linear time series, then we will show later that the
asymptotic distribution of the above has a variance which is only in terms of the covariances, not
higher order cumulants. We prove this result in Section 9.5.
6.3 Checking for correlation in a time series

Bartlett's formula is commonly used to check by 'eye' whether a time series is uncorrelated (there
are more sensitive tests, but this one is often used to construct confidence intervals for the sample autocorrelations
in several statistical packages). This is an important problem, for many reasons:

• Given a data set, we need to check whether there is dependence; if there is, we need to analyse
it in a different way.

• Suppose we fit a linear regression to time series data. We may want to check whether the residuals
are actually uncorrelated, else the standard errors based on the assumption of uncorrelatedness would be unreliable.

• We need to check whether a time series model is the appropriate model. To do this we fit
the model and estimate the residuals. If the residuals appear to be uncorrelated, it would
seem likely that the model is correct. If they are correlated, then the model is inappropriate.
For example, we may fit an AR(1) to the data and estimate the residuals $\hat{\varepsilon}_t$; if there is still
correlation in the residuals, then the AR(1) was not the correct model, since $X_t - \phi X_{t-1}$ is
still correlated (which it would not be, if it were the correct model).

We now apply Theorem 6.2.1 to the case that the time series consists of iid random variables. Suppose $\{X_t\}$ are iid random variables; then it is clear that this is a trivial example of a (not necessarily Gaussian)
linear process. We use (6.3) as an estimator of the autocovariances.
To derive the asymptotic variance of $\{\hat{c}_n(r)\}$, we recall that if $\{X_t\}$ are iid then $\rho(k) = 0$ for
$k \neq 0$. Thus by making a Taylor expansion (and noting that $c(k) = 0$ for $k \neq 0$) (and/or using (6.6)) we see
that
$$\sqrt{n}\,\hat{\rho}_n = \frac{\sqrt{n}}{c(0)}\hat{c}_n + o_p(1) \overset{D}{\to} \mathcal{N}(0, W_h),$$
where $\hat{\rho}_n = (\hat{\rho}_n(1), \ldots, \hat{\rho}_n(h))$, similarly $\hat{c}_n$, and
$$(W_h)_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j. \end{cases}$$
In other words, $\sqrt{n}\,\hat{\rho}_n \overset{D}{\to} \mathcal{N}(0, I_h)$. Hence the sample autocorrelations at different lags are asymptotically uncorrelated and have variance one. This allows us to easily construct error bars for the
sample autocorrelations under the assumption of independence. If the vast majority of the sample
autocorrelations lie inside the error bars, there is not enough evidence to reject the hypothesis that the data is
a realisation of iid random variables (often called a white noise process). An example of the
empirical ACF and error bars is given in Figure 6.1. We see that the empirical autocorrelations of
the realisation from iid random variables all lie within the error bars. In contrast, in Figure 6.2
we give a plot of the sample ACF of an AR(2). We observe that a large number of the sample
autocorrelations lie outside the error bars.
[Figure 6.1: The sample ACF of an iid sample with error bars (sample size n = 200).]
[Figure 6.2: Top: the sample ACF of the AR(2) process $X_t = 1.5X_{t-1} - 0.75X_{t-2} + \varepsilon_t$ with error bars, n = 200. Bottom: the true ACF.]
Of course, simply checking by eye means that we risk misconstruing a sample coefficient that
lies outside the error bars as meaning that the time series is correlated, whereas this could simply
be a false positive (due to multiple testing). To counter this problem, we construct a test statistic
for testing uncorrelatedness. Since under the null $\sqrt{n}(\hat{\rho}_n(h) - \rho(h)) \overset{D}{\to} \mathcal{N}(0, I)$, one method of
testing is to use the squared correlations
$$S_h = n\sum_{r=1}^{h}|\hat{\rho}_n(r)|^2; \qquad (6.13)$$
under the null it will asymptotically have a $\chi^2$-distribution with $h$ degrees of freedom, while under the
alternative it will be a non-central (generalised) chi-squared. The non-centrality is what makes us
reject the null if the alternative of correlatedness is true. This is known as the Box-Pierce test.

Of course, a big question is how to select $h$. In general, we do not have to use large $h$, since most
correlations will arise when $r$ is small. However, the choice of $h$ will have an influence on power. If
$h$ is too large the test will lose power (since the mean of the chi-squared grows as $h \to \infty$); on
the other hand, choosing $h$ too small may mean that certain correlations at higher lags are missed.
How to select $h$ is discussed in several papers; see for example Escanciano and Lobato (2009).
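The Box-Pierce statistic is a few lines of code. In the sketch below (our own Python illustration; sample size, lag count and the AR(1) alternative are arbitrary choices) we compute $S_h$ for an iid sample and for a correlated AR(1) sample, and compare against the 95% quantile of the $\chi^2_{10}$ distribution.

```python
import numpy as np

rng = np.random.default_rng(5)

def box_pierce(x, h=10):
    """S_h = n * sum_{r=1}^h rho_n(r)^2, as in (6.13)."""
    n = len(x)
    xc = x - x.mean()
    c0 = (xc * xc).sum() / n
    rho = np.array([(xc[: n - r] * xc[r:]).sum() / n / c0
                    for r in range(1, h + 1)])
    return n * (rho**2).sum()

n = 400
chi2_10_95 = 18.31   # 95% quantile of the chi-squared with 10 df

iid = rng.standard_normal(n)
e = rng.standard_normal(n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.5 * ar1[t - 1] + e[t]

print(round(box_pierce(iid), 1), round(box_pierce(ar1), 1))
```

For the AR(1) sample the statistic is far above the critical value (the non-centrality grows with $n$), while for the iid sample it behaves like a $\chi^2_{10}$ draw.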
6.4 Checking for partial correlation
We recall that the partial correlation of a stationary time series at lag t is given by the last coe�cient
of the best linear predictor of Xm+1
given {Xj}mj=1
i.e. �m where bXm+1|m =Pm
j=1
�jXm+1�j . Thus
�m can be estimated using the Yule-Walker estimator or least squares (more of this later) and the
sampling properties of the estimator are determined by the sampling properties of the estimator of
an AR(m) process. We state these now. We assume {Xt} is a AR(p) time series of the form
Xt =pX
j=1
�jXt�j + "t
where {"t} are iid random variables with mean zero and variance �2. Suppose an AR(m) model is
fitted to the data using the Yule-Walker estimator, we denote this estimator as b�m = b⌃�1
m rm. Let
b�m = (b�m1
, . . . , b�mm), the estimator of the partial correlation at lag m is b�mm. Assume m � p.
191
Then by using Theorem 7.2.1 (see also Theorem 8.1.2, Brockwell and Davis (1998)) we have
pn⇣
b�m � �m
⌘
P! N(0,�2⌃�1
m ).
where �m are the true parameters. Ifm > p, then �m = (�1
, . . . ,�p, 0, . . . , 0) and the last coe�cient
has the marginal distribution
pnb�mm
P! N(0,�2⌃mm).
Since m > p, we can obtain a closed for expression for ⌃mm. By using Remark 3.2.2 we have
⌃mm = ��2, thus
pnb�mm
P! N(0, 1).
Therefore, for lags $m > p$ the partial correlations are asymptotically pivotal. The error bars in the partial correlation plot are $[-1.96n^{-1/2}, 1.96n^{-1/2}]$, and these can be used as a guide in determining the order of the autoregressive process (note that there will be dependence between the partial correlations at different lags).
This is quite a surprising result and very different from the behaviour of the sample autocorrelation function of an MA(p) process.
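The procedure just described, fitting an AR($m$) by Yule-Walker for successive orders $m$ and keeping the last coefficient, can be sketched in a few lines of Python (numpy assumed; the helper names are our own, this is not code from the text):

```python
import numpy as np

def sample_acov(x, max_lag):
    """Biased sample autocovariances c_hat(0), ..., c_hat(max_lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.sum(xc[:n - k] * xc[k:]) / n
                     for k in range(max_lag + 1)])

def pacf_yule_walker(x, max_lag):
    """Partial autocorrelations: for each m, solve the Yule-Walker
    system Sigma_m phi = r_m and keep the last coefficient phi_mm."""
    c = sample_acov(x, max_lag)
    pacf = np.empty(max_lag)
    for m in range(1, max_lag + 1):
        Sigma_m = np.array([[c[abs(i - j)] for j in range(m)]
                            for i in range(m)])  # Toeplitz autocovariance matrix
        phi = np.linalg.solve(Sigma_m, c[1:m + 1])
        pacf[m - 1] = phi[-1]
    return pacf
```

For an AR($p$) series, the values at lags $m > p$ should mostly fall inside the $\pm 1.96/\sqrt{n}$ bands, in line with the pivotal limit above.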
Exercise 6.4
(a) Simulate a mean zero invertible MA(1) process (use Gaussian errors). Use a reasonable sample size (say n = 200). Evaluate the sample correlation at lag 2, $\hat\rho_n(2)$; note that the sample correlation at lag two is estimating 0. Do this 500 times.
• Calculate the proportion of sample correlations with $|\hat\rho_n(2)| > 1.96/\sqrt{n}$.
• Make a QQ-plot of $\sqrt{n}\,\hat\rho_n(2)$ against a standard normal distribution. What do you observe?
(b) Simulate a causal, stationary AR(1) process (use Gaussian errors). Use a reasonable sample size (say n = 200). Evaluate the sample partial correlation at lag 2, $\hat\phi_n(2)$; note that the sample partial correlation at lag two is estimating 0. Do this 500 times.
• Calculate the proportion of sample partial correlations with $|\hat\phi_n(2)| > 1.96/\sqrt{n}$.
• Make a QQ-plot of $\sqrt{n}\,\hat\phi_n(2)$ against a standard normal distribution. What do you observe?
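A sketch of part (b) in Python (numpy assumed; the function names are our own). The lag-two sample partial correlation is computed as the last Yule-Walker coefficient of a fitted AR(2):

```python
import numpy as np

rng = np.random.default_rng(1)

def sim_ar1(phi, n, rng, burn=100):
    """Simulate a causal AR(1): X_t = phi * X_{t-1} + eps_t, Gaussian errors."""
    e = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = phi * x[t - 1] + e[t]
    return x[burn:]

def pacf_lag2(x):
    """Sample partial correlation at lag 2: last Yule-Walker coefficient
    of a fitted AR(2)."""
    n = len(x)
    xc = x - x.mean()
    c = [np.sum(xc[:n - k] * xc[k:]) / n for k in range(3)]
    Sigma = np.array([[c[0], c[1]], [c[1], c[0]]])
    phi = np.linalg.solve(Sigma, np.array([c[1], c[2]]))
    return phi[1]

n, reps = 200, 500
vals = np.array([pacf_lag2(sim_ar1(0.5, n, rng)) for _ in range(reps)])
prop = np.mean(np.abs(vals) > 1.96 / np.sqrt(n))
```

Under the theory above, `prop` should be close to 0.05, and a QQ-plot of $\sqrt{n}\,\hat\phi_n(2)$ (not shown) should lie close to the 45-degree line.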
6.5 Checking for Goodness of fit
To check the adequacy of a model, after fitting the model to the data the sample correlation of the estimated residuals is evaluated. If there appears to be no correlation in the estimated residuals (so the residuals are close to uncorrelated), then the model is deemed to adequately fit the data.
Consider the general model
\[
X_t = g(Y_t, \theta) + \varepsilon_t,
\]
where $\{\varepsilon_t\}$ are iid random variables and $\varepsilon_t$ is independent of $Y_t, Y_{t-1}, \ldots$. Note that $Y_t$ can be a vector, for example $Y_t = (X_{t-1}, X_{t-2}, \ldots, X_{t-p})$, and examples of models which satisfy the above include the AR($p$) process. We will assume that $\{X_t, Y_t\}$ is a stationary ergodic process. Further, to simplify the discussion we will assume that $\theta$ is univariate; it is straightforward to generalise the discussion below to the multivariate case.
Let $\hat\theta$ denote the least squares estimator of $\theta$, i.e.
\[
\hat\theta = \arg\min_{\theta}\sum_{t=1}^{n}\big(X_t - g(Y_t,\theta)\big)^2. \qquad (6.14)
\]
Using the “usual” Taylor expansion methods (and assuming all the usual conditions are satisfied, such as $|\hat\theta - \theta| = O_p(n^{-1/2})$), it can be shown that
\[
\sqrt{n}\big(\hat\theta - \theta\big) = I^{-1}\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\varepsilon_t\frac{\partial g(Y_t,\theta)}{\partial\theta} + o_p(1), \qquad \text{where } I = E\left(\frac{\partial g(Y_t,\theta)}{\partial\theta}\right)^2.
\]
The sequence $\{\varepsilon_t\,\partial g(Y_t,\theta)/\partial\theta\}$ is a martingale difference sequence, which is why $\sqrt{n}(\hat\theta-\theta)$ is asymptotically normal, but more of this in the next chapter. Let $L_n(\theta)$ denote the least squares criterion. Note that the above is true because
\[
\frac{\partial L_n(\theta)}{\partial\theta} = -2\sum_{t=1}^{n}\big[X_t - g(Y_t,\theta)\big]\frac{\partial g(Y_t,\theta)}{\partial\theta}
\]
and
\[
\frac{\partial^2 L_n(\theta)}{\partial\theta^2} = -2\sum_{t=1}^{n}\big[X_t - g(Y_t,\theta)\big]\frac{\partial^2 g(Y_t,\theta)}{\partial\theta^2} + 2\sum_{t=1}^{n}\left(\frac{\partial g(Y_t,\theta)}{\partial\theta}\right)^2,
\]
thus at the true parameter $\theta$,
\[
\frac{1}{n}\frac{\partial^2 L_n(\theta)}{\partial\theta^2} \xrightarrow{P} 2I.
\]
Based on (6.14) we estimate the residuals using
\[
\hat\varepsilon_t = X_t - g(Y_t, \hat\theta)
\]
and the sample correlation with $\hat\rho(r) = \hat{c}(r)/\hat{c}(0)$, where
\[
\hat{c}(r) = \frac{1}{n}\sum_{t=1}^{n-|r|}\hat\varepsilon_t\hat\varepsilon_{t+r}.
\]
Often it is (wrongly) assumed that one can simply apply the results in Section 6.3 when checking the adequacy of the model, that is, make an ACF plot of $\hat\rho(r)$ and use $[-1.96n^{-1/2}, 1.96n^{-1/2}]$ as the error bars. However, since the parameters have been estimated, the size of the error bars changes. In particular, under the null that the model is correct we will show that
\[
\sqrt{n}\,\hat\rho(r) \xrightarrow{D} N\Bigg(0,\ \underbrace{1}_{\text{iid part}} - \underbrace{\sigma^{-2}J_r I^{-1}J_r}_{\text{due to parameter estimation}}\Bigg),
\]
where $\sigma^2 = \text{var}(\varepsilon_t)$, $J_r = E\big[\varepsilon_t\,\partial g(Y_{t+r},\theta)/\partial\theta\big]$ and $I = E\big(\partial g(Y_t,\theta)/\partial\theta\big)^2$ (see, for example, Li (1992)). Thus the error bars under the null are
\[
\pm\frac{1.96}{\sqrt{n}}\Big(1 - \sigma^{-2}J_r I^{-1}J_r\Big)^{1/2}.
\]
Estimation of the parameters means the inclusion of the term $\sigma^{-2}J_r I^{-1}J_r$. If the lag $r$ is not too small then $J_r$ will be close to zero and the $\pm 1.96n^{-1/2}$ approximation is fine; but for small $r$, $J_r I^{-1}J_r$ can be large and positive, so the bands $\pm 1.96n^{-1/2}$ are too wide. Thus one needs to be a little cautious when interpreting the $\pm 1.96n^{-1/2}$ error bars. Note that if there is no dependence between $\varepsilon_t$ and $Y_{t+r}$ then using the usual error bars is fine.
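To see the narrowing of the error bars in practice, consider an AR(1) fitted by least squares; for this model the corrected asymptotic variance of the lag-one residual autocorrelation works out to approximately $\theta^2/n$ rather than $1/n$ (a classical observation going back to Box and Pierce (1970)). A Monte Carlo sketch in Python (numpy assumed; the function name is our own):

```python
import numpy as np

rng = np.random.default_rng(2)

def resid_acf1(theta, n, rng, burn=100):
    """Fit AR(1) by least squares and return the lag-1 sample
    autocorrelation of the estimated residuals."""
    e = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = theta * x[t - 1] + e[t]
    x = x[burn:]
    theta_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
    res = x[1:] - theta_hat * x[:-1]          # estimated residuals
    res = res - res.mean()
    return np.sum(res[1:] * res[:-1]) / np.sum(res ** 2)

n, reps = 400, 500
r1 = np.array([resid_acf1(0.5, n, rng) for _ in range(reps)])
prop = np.mean(np.abs(r1) > 1.96 / np.sqrt(n - 1))
# The naive +/-1.96/sqrt(n) bands reject far less often than 5%.
```

With $\theta = 0.5$ the rejection rate against the naive bands is close to zero, illustrating that the usual bands are too wide at small lags.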
Remark 6.5.1 The fact that the error bars get narrower after fitting a model to the data seems a little strange. However, it is far from unusual. One explanation is that the variance of the estimated residuals tends to be less than that of the true residuals (since the estimated residuals contain less information about the process than the true residuals). The simplest example is iid observations $\{X_i\}_{i=1}^{n}$ with mean $\mu$ and variance $\sigma^2$. The variance of the “estimated residual” $X_i - \bar{X}$ is $(n-1)\sigma^2/n$.
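The $(n-1)\sigma^2/n$ claim can be checked exactly: the vector of estimated residuals is $MX$ where $M = I_n - n^{-1}\mathbf{1}\mathbf{1}'$ is the centring matrix, and since $M$ is idempotent the residual covariance matrix is $\sigma^2 MM' = \sigma^2 M$, whose diagonal entries are $\sigma^2(1 - 1/n)$. A quick numerical confirmation (Python, numpy assumed):

```python
import numpy as np

n = 10
sigma2 = 2.0
# Centring matrix M: X_i - Xbar = (M X)_i for iid X with variance sigma2.
M = np.eye(n) - np.ones((n, n)) / n
V = sigma2 * (M @ M.T)      # covariance matrix of the estimated residuals
resid_var = V[0, 0]
# M is idempotent, so V = sigma2 * M and resid_var = (n - 1) * sigma2 / n.
```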
We now derive the above result (using several Taylor expansions). By making a Taylor expansion similar to (6.10) we have
\[
\sqrt{n}\big[\hat\rho_n(r) - \rho(r)\big] = \frac{\sqrt{n}\big[\hat{c}_n(r) - c(r)\big]}{c(0)} - \frac{\sqrt{n}\big[\hat{c}_n(0) - c(0)\big]c(r)}{c(0)^2} + O_p(n^{-1/2}).
\]
However, under the “null” that the correct model was fitted to the data we have $c(r) = 0$ for $|r| > 0$; this gives
\[
\sqrt{n}\,\hat\rho_n(r) = \frac{\sqrt{n}\,\hat{c}_n(r)}{c(0)} + o_p(1),
\]
thus the sampling properties of $\hat\rho_n(r)$ are determined by $\hat{c}_n(r)$, and we focus on this term. It is easy to see that
\[
\sqrt{n}\,\hat{c}_n(r) = \frac{1}{\sqrt{n}}\sum_{t=1}^{n-r}\Big(\varepsilon_t + g(Y_t,\theta) - g(Y_t,\hat\theta)\Big)\Big(\varepsilon_{t+r} + g(Y_{t+r},\theta) - g(Y_{t+r},\hat\theta)\Big).
\]
Heuristically, by expanding the above, we can see that
\[
\sqrt{n}\,\hat{c}_n(r) \approx \frac{1}{\sqrt{n}}\sum_{t=1}^{n-r}\varepsilon_t\varepsilon_{t+r} + \frac{1}{\sqrt{n}}\sum_{t=1}^{n}\varepsilon_{t+r}\Big(g(Y_t,\theta) - g(Y_t,\hat\theta)\Big) + \frac{1}{\sqrt{n}}\sum_{t=1}^{n}\varepsilon_t\Big(g(Y_{t+r},\theta) - g(Y_{t+r},\hat\theta)\Big);
\]
then by making a Taylor expansion of $g(\cdot,\hat\theta)$ about $g(\cdot,\theta)$ (to take $(\hat\theta-\theta)$ out of the sum)
\[
\begin{aligned}
\sqrt{n}\,\hat{c}_n(r) &\approx \frac{1}{\sqrt{n}}\sum_{t=1}^{n-r}\varepsilon_t\varepsilon_{t+r} - \frac{(\hat\theta-\theta)}{\sqrt{n}}\left[\sum_{t=1}^{n}\varepsilon_{t+r}\frac{\partial g(Y_t,\theta)}{\partial\theta} + \varepsilon_t\frac{\partial g(Y_{t+r},\theta)}{\partial\theta}\right] + o_p(1) \\
&= \frac{1}{\sqrt{n}}\sum_{t=1}^{n-r}\varepsilon_t\varepsilon_{t+r} - \sqrt{n}(\hat\theta-\theta)\,\frac{1}{n}\left[\sum_{t=1}^{n}\varepsilon_{t+r}\frac{\partial g(Y_t,\theta)}{\partial\theta} + \varepsilon_t\frac{\partial g(Y_{t+r},\theta)}{\partial\theta}\right] + o_p(1).
\end{aligned}
\]
We make this argument precise below. Making a second order Taylor expansion (with $\bar\theta_t$ lying between $\hat\theta$ and $\theta$) we have
\[
\begin{aligned}
\sqrt{n}\,\hat{c}_n(r) &= \frac{1}{\sqrt{n}}\sum_{t=1}^{n-r}\left(\varepsilon_t - (\hat\theta-\theta)\frac{\partial g(Y_t,\theta)}{\partial\theta} + \frac{(\hat\theta-\theta)^2}{2}\frac{\partial^2 g(Y_t,\bar\theta_t)}{\partial\theta^2}\right)\left(\varepsilon_{t+r} - (\hat\theta-\theta)\frac{\partial g(Y_{t+r},\theta)}{\partial\theta} + \frac{(\hat\theta-\theta)^2}{2}\frac{\partial^2 g(Y_{t+r},\bar\theta_{t+r})}{\partial\theta^2}\right) \\
&= \sqrt{n}\,\tilde{c}_n(r) - \sqrt{n}(\hat\theta-\theta)\,\frac{1}{n}\sum_{t=1}^{n-r}\left(\varepsilon_t\frac{\partial g(Y_{t+r},\theta)}{\partial\theta} + \varepsilon_{t+r}\frac{\partial g(Y_t,\theta)}{\partial\theta}\right) + O_p(n^{-1/2}), \qquad (6.15)
\end{aligned}
\]
where
\[
\tilde{c}_n(r) = \frac{1}{n}\sum_{t=1}^{n-r}\varepsilon_t\varepsilon_{t+r}.
\]
We recall that by ergodicity we have
\[
\frac{1}{n}\sum_{t=1}^{n-r}\left(\varepsilon_t\frac{\partial g(Y_{t+r},\theta)}{\partial\theta} + \varepsilon_{t+r}\frac{\partial g(Y_t,\theta)}{\partial\theta}\right) \xrightarrow{a.s.} E\left(\varepsilon_t\frac{\partial g(Y_{t+r},\theta)}{\partial\theta}\right) = J_r,
\]
where we use that $\varepsilon_{t+r}$ and $\partial g(Y_t,\theta)/\partial\theta$ are independent, so the second term has mean zero. Substituting this into (6.15) gives
\[
\begin{aligned}
\sqrt{n}\,\hat{c}_n(r) &= \sqrt{n}\,\tilde{c}_n(r) - \sqrt{n}(\hat\theta-\theta)J_r + o_p(1) \\
&= \sqrt{n}\,\tilde{c}_n(r) - I^{-1}J_r\underbrace{\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\frac{\partial g(Y_t,\theta)}{\partial\theta}\varepsilon_t}_{=-\frac{1}{2\sqrt{n}}\frac{\partial L_n(\theta)}{\partial\theta}} + o_p(1).
\end{aligned}
\]
Asymptotic normality of $\sqrt{n}\,\hat{c}_n(r)$ can be shown by establishing asymptotic normality of the bivariate vector $\big(\sqrt{n}\,\tilde{c}_n(r), \frac{1}{\sqrt{n}}\frac{\partial L_n(\theta)}{\partial\theta}\big)$. Therefore all that remains is to obtain the asymptotic variance of the above (which will give the desired result):
\[
\begin{aligned}
&\text{var}\left[\sqrt{n}\,\tilde{c}_n(r) + I^{-1}J_r\frac{1}{2\sqrt{n}}\frac{\partial L_n(\theta)}{\partial\theta}\right] \\
&\qquad = \underbrace{\text{var}\big(\sqrt{n}\,\tilde{c}_n(r)\big)}_{\to\,\sigma^4} + 2I^{-1}J_r\,\text{cov}\left(\sqrt{n}\,\tilde{c}_n(r), \frac{1}{2\sqrt{n}}\frac{\partial L_n(\theta)}{\partial\theta}\right) + I^{-2}J_r^2\,\text{var}\left(\frac{1}{2\sqrt{n}}\frac{\partial L_n(\theta)}{\partial\theta}\right). \qquad (6.16)
\end{aligned}
\]
We evaluate the two terms above. First the covariance:
\[
\begin{aligned}
\text{cov}\left(\sqrt{n}\,\tilde{c}_n(r), -\frac{1}{2\sqrt{n}}\frac{\partial L_n(\theta)}{\partial\theta}\right) &= \frac{1}{n}\sum_{t_1=1}^{n-r}\sum_{t_2=1}^{n}\text{cov}\left\{\varepsilon_{t_1}\varepsilon_{t_1+r},\ \varepsilon_{t_2}\frac{\partial g(Y_{t_2},\theta)}{\partial\theta}\right\} \\
&= \frac{1}{n}\sum_{t_1=1}^{n-r}\sum_{t_2=1}^{n}\bigg[\text{cov}\{\varepsilon_{t_1},\varepsilon_{t_2}\}\,\text{cov}\left\{\varepsilon_{t_1+r},\frac{\partial g(Y_{t_2},\theta)}{\partial\theta}\right\} + \text{cov}\{\varepsilon_{t_1+r},\varepsilon_{t_2}\}\,\text{cov}\left\{\varepsilon_{t_1},\frac{\partial g(Y_{t_2},\theta)}{\partial\theta}\right\} \\
&\qquad\qquad + \text{cum}\left\{\varepsilon_{t_1},\varepsilon_{t_1+r},\varepsilon_{t_2},\frac{\partial g(Y_{t_2},\theta)}{\partial\theta}\right\}\bigg] \\
&\to \sigma^2 E\left[\varepsilon_t\frac{\partial g(Y_{t+r},\theta)}{\partial\theta}\right] = \sigma^2 J_r,
\end{aligned}
\]
since the only non-vanishing contribution comes from the pairs $t_2 = t_1 + r$ in the second product term (the cumulant term vanishes because in every configuration one of the $\varepsilon$'s is independent of the remaining arguments). Hence $\text{cov}\big(\sqrt{n}\,\tilde{c}_n(r), \frac{1}{2\sqrt{n}}\frac{\partial L_n(\theta)}{\partial\theta}\big) \to -\sigma^2 J_r$.
Similarly we have
\[
\text{var}\left(\frac{1}{2\sqrt{n}}\frac{\partial L_n(\theta)}{\partial\theta}\right) = \frac{1}{n}\sum_{t_1=1}^{n}\sum_{t_2=1}^{n}\text{cov}\left(\varepsilon_{t_1}\frac{\partial g(Y_{t_1},\theta)}{\partial\theta},\ \varepsilon_{t_2}\frac{\partial g(Y_{t_2},\theta)}{\partial\theta}\right) = \sigma^2 E\left(\frac{\partial g(Y_t,\theta)}{\partial\theta}\right)^2 = \sigma^2 I,
\]
since only the diagonal terms $t_1 = t_2$ contribute.
Substituting the above into (6.16) gives the asymptotic variance of $\sqrt{n}\,\hat{c}_n(r)$ as
\[
\sigma^4 - 2\sigma^2 J_r I^{-1}J_r + \sigma^2 J_r I^{-1}J_r = \sigma^4 - \sigma^2 J_r I^{-1}J_r.
\]
Since under the null $\hat{c}(0) \xrightarrow{P} \sigma^2$, we obtain the required result
\[
\sqrt{n}\,\hat\rho(r) \xrightarrow{D} N\Big(0,\ 1 - \sigma^{-2}J_r I^{-1}J_r\Big).
\]
6.6 Long range dependence (long memory) versus changes
in the mean
A process is said to have long range dependence if its autocovariances are not absolutely summable, i.e. $\sum_k |c(k)| = \infty$. A nice historical background on long memory is given in this paper.
From a practical point of view, data is said to exhibit long range dependence if the autocovariances do not decay very fast to zero as the lag increases. Returning to the Yahoo data considered in Section 4.1.1, we recall that the ACF plot of the absolute log differences, given again in Figure 6.3, appears to exhibit this type of behaviour. However, it has been argued by several authors that the ‘appearance of long memory’ arises because a time-dependent mean has not been corrected for. Could this be the reason we see the ‘memory’ in the log differences?
We now demonstrate that one must be careful when diagnosing long range dependence, because slow or no decay of the autocovariance could also be the result of a time-dependent mean that has not been corrected for.

[Figure 6.3: ACF plot of the absolute of the log differences.]

This was shown in Bhattacharya et al. (1983), and applied to econometric data in Mikosch and Starica (2000) and Mikosch and Starica (2003). A test for distinguishing between long range dependence and change points is proposed in Berkes et al. (2006).
Suppose that $Y_t$ satisfies
\[
Y_t = \mu_t + \varepsilon_t,
\]
where $\{\varepsilon_t\}$ are iid random variables and the mean $\mu_t$ depends on $t$. We observe $\{Y_t\}$ but do not know that the mean is changing. We want to evaluate the autocovariance function, hence we estimate the autocovariance at lag $k$ using
\[
\hat{c}_n(k) = \frac{1}{n}\sum_{t=1}^{n-|k|}(Y_t - \bar{Y}_n)(Y_{t+|k|} - \bar{Y}_n).
\]
Observe that $\bar{Y}_n$ is not really estimating the mean but the average of the means! If we plotted the empirical ACF $\{\hat{c}_n(k)\}$ we would see that the covariances do not decay with the lag. However, the true autocovariance is zero at all lags but zero. The reason the empirical ACF does not decay to zero is because we have not corrected for the time-dependent mean. Indeed it can be shown that
\[
\begin{aligned}
\hat{c}_n(k) &= \frac{1}{n}\sum_{t=1}^{n-|k|}(Y_t - \mu_t + \mu_t - \bar{Y}_n)(Y_{t+|k|} - \mu_{t+k} + \mu_{t+k} - \bar{Y}_n) \\
&\approx \frac{1}{n}\sum_{t=1}^{n-|k|}(Y_t - \mu_t)(Y_{t+|k|} - \mu_{t+k}) + \frac{1}{n}\sum_{t=1}^{n-|k|}(\mu_t - \bar{Y}_n)(\mu_{t+k} - \bar{Y}_n) \\
&\approx \underbrace{c(k)}_{\text{true autocovariance}\,=\,0} + \underbrace{\frac{1}{n}\sum_{t=1}^{n-|k|}(\mu_t - \bar{Y}_n)(\mu_{t+k} - \bar{Y}_n)}_{\text{additional term due to the time-dependent mean}}.
\end{aligned}
\]
Expanding the second term and assuming that $k \ll n$ and $\mu_t \approx \mu(t/n)$ (and is thus smooth) we have
\[
\begin{aligned}
\frac{1}{n}\sum_{t=1}^{n-|k|}(\mu_t - \bar{Y}_n)(\mu_{t+k} - \bar{Y}_n) &\approx \frac{1}{n}\sum_{t=1}^{n}\mu_t^2 - \left(\frac{1}{n}\sum_{t=1}^{n}\mu_t\right)^2 + o_p(1) \\
&= \frac{1}{n^2}\sum_{s=1}^{n}\sum_{t=1}^{n}\mu_t^2 - \left(\frac{1}{n}\sum_{t=1}^{n}\mu_t\right)^2 + o_p(1) \\
&= \frac{1}{n^2}\sum_{s=1}^{n}\sum_{t=1}^{n}\mu_t(\mu_t - \mu_s) + o_p(1).
\end{aligned}
\]
Since swapping the roles of $s$ and $t$ gives $\sum_{s,t}\mu_s(\mu_t - \mu_s) = -\sum_{s,t}\mu_t(\mu_t - \mu_s)$, we can write
\[
\begin{aligned}
\frac{1}{n^2}\sum_{s=1}^{n}\sum_{t=1}^{n}\mu_t(\mu_t - \mu_s) &= \frac{1}{n^2}\sum_{s,t}(\mu_t - \mu_s)^2 + \frac{1}{n^2}\sum_{s,t}\mu_s(\mu_t - \mu_s) \\
&= \frac{1}{n^2}\sum_{s,t}(\mu_t - \mu_s)^2 + \frac{1}{2n^2}\sum_{s,t}\mu_s(\mu_t - \mu_s) - \frac{1}{2n^2}\sum_{s,t}\mu_t(\mu_t - \mu_s) \\
&= \frac{1}{n^2}\sum_{s,t}(\mu_t - \mu_s)^2 + \frac{1}{2n^2}\sum_{s,t}(\mu_s - \mu_t)(\mu_t - \mu_s) = \frac{1}{2n^2}\sum_{s=1}^{n}\sum_{t=1}^{n}(\mu_t - \mu_s)^2.
\end{aligned}
\]
Therefore
\[
\frac{1}{n}\sum_{t=1}^{n-|k|}(\mu_t - \bar{Y}_n)(\mu_{t+k} - \bar{Y}_n) \approx \frac{1}{2n^2}\sum_{s=1}^{n}\sum_{t=1}^{n}(\mu_t - \mu_s)^2.
\]
Thus we observe that the sample covariances are positive and do not tend to zero at large lags. This gives the false impression of long memory.
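This effect is easy to reproduce. The sketch below (Python, numpy assumed; the sinusoidal mean is our illustrative choice) generates iid noise around a smooth mean $\mu(t/n)$; the sample autocovariance at a large lag stays close to the bias term $(2n^2)^{-1}\sum_{s,t}(\mu_t - \mu_s)^2$ even though the true autocovariance is zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
t = np.arange(n)
mu = np.sin(2 * np.pi * t / n)       # smooth, time-dependent mean mu(t/n)
y = mu + rng.standard_normal(n)      # iid noise around the changing mean

def acov(x, k):
    """Sample autocovariance c_hat(k), centring with the overall mean."""
    xc = x - x.mean()
    return np.sum(xc[:len(x) - k] * xc[k:]) / len(x)

# Bias term from the derivation: (1/(2 n^2)) * sum_{s,t} (mu_t - mu_s)^2.
bias = np.sum((mu[:, None] - mu[None, :]) ** 2) / (2 * n ** 2)
c50 = acov(y, 50)
# The true autocovariance at lag 50 is 0, yet c50 sits near `bias`.
```

With this choice of $\mu$ the bias term equals the empirical variance of $\mu_t$ (here $1/2$), so the sample ACF appears to plateau rather than decay, the signature easily mistaken for long memory.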
It should be noted that if you study a realisation of a time series with a large amount of dependence, it is unclear whether what you see is actually a stochastic time series or an underlying trend. This makes disentangling a trend from data with a large amount of correlation extremely difficult.