TRANSCRIPT
Time Series: Given joint ergodic stationarity between Y and X, when we have serially correlated errors, our consistency results do not change. However, our asymptotic distribution results do - we need to generalize CLT’s to obtain asymptotic normality for serially correlated processes with the “right” variance. (The generalization is possible under certain conditions by restricting the degree of serial correlation, and the condition is transparent for linear stochastic processes) Probability Measure, Sigma Algebra (Sigma Algebra is what we define our measures on), Borel Fields Def: The set S of all possible outcomes of a particular experiment is called the sample space of the experiment. Def: A collection of subsets of S, denoted B, is called a sigma field or sigma algebra if it satisfies the following:
1. Empty set: ∅ ∈ B
2. Complements: If A ∈ B, then A^c = S \ A ∈ B
3. Countable unions: If A₁, A₂, … ∈ B, then ∪_{i=1}^∞ A_i ∈ B
Note (2'): B is also closed under countable intersections. Pf: If A₁, A₂, … ∈ B, then each A_i^c ∈ B, so ∪_{i=1}^∞ A_i^c ∈ B, and hence ∩_{i=1}^∞ A_i = (∪_{i=1}^∞ A_i^c)^c ∈ B, by De Morgan's Laws.
Def: P is a probability measure on the pair (S,B) if P satisfies:
1. P(A) ≥ 0 for all A ∈ B
2. P(S) = 1
3. If A₁, A₂, … ∈ B are pairwise disjoint, then P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i)
Def: Let X: (Ω,F) → (R, B) be F-measurable. A Borel field is the smallest σ-field that makes X measurable, given by: σ(X) ≡ {G ⊆ Ω : G = X⁻¹(β) for some β ∈ B}
(Think of these as the only sets in the universe that the random variable gives us information about, since they are the sets that are preimages of all the possible outcomes of the r.v. So the random variable X is informative about members of σ(X), but no more than that!) Probability Space, Random Variables, and Measurability Def: The triple (Ω,F,P) is called a probability space,
where Ω is the “universe” (or the whole set of outcomes, like S), F is the σ-field on Ω (like B), and P is the underlying probability measure that governs all random variables, i.e. a probability measure on (Ω,F)
Def: A random variable is a function from the sample space into the real numbers, or a measurable mapping from (Ω,F) into (R, B). (So, for a random variable X: (Ω,F) → (R, B), the sample space for X is R.) Def: A random variable X: (Ω,F) → (R, B) is F-measurable if the preimage {ω ∈ Ω : X(ω) ∈ β} ∈ F for all β ∈ B (all the events in B can be mapped back to F and be measured there). Note: X(ω), ω ∈ Ω (the universe), is a random variable that induces a probability measure PX on (R, B).
PX is defined from P (a probability measure on (Ω,F)) by PX(β) ≡ Pr{X takes on values in β} = P({ω ∈ Ω : X(ω) ∈ β}) = P(X ∈ β) for β ∈ B
Def: A random variable Y = g(X): (RX, BX) → (RY, BY) induces the probability measure PY on the sample space RY as follows: for A ∈ BY, PY(A) ≡ P(Y ∈ A) = PX({x ∈ RX : Y = g(x) ∈ A})
Prop: Let F and G be 2 σ-fields s.t. G ⊆ F (all the sets in G are also in F). If a random variable X is G-measurable, then X is F-measurable¹.
Conditional Expectations and Law of Iterated Expectations
Def: Let X and Y be real-valued random variables on (Ω,F,P) and let G = σ(X). Suppose E|Y| < ∞. The conditional expected value of Y given X is a random variable (a function of X) that satisfies the following 3 conditions²:
1. E|E(Y|X)| < ∞
2. E(Y|X) is G-measurable, i.e. ∀β ∈ B, {ω ∈ Ω : E(Y|X)(ω) ∈ β} ∈ σ(X) (E(Y|X) is as informative as X, but no more sophisticated than X)
3. For all G ∈ G, ∫_G E(Y|X)(ω) dP(ω) = ∫_G Y(ω) dP(ω)
¹ Pf: ∀β ∈ B, {ω ∈ Ω : X(ω) ∈ β} ∈ G ⊆ F. ² Y always satisfies 1 and 3. But Y will only satisfy 2 if σ(Y) ⊆ σ(X), i.e. if Y is no more informative than X. So it is typically not possible to use Y itself as E(Y|X).
Alternative representation of E(Y|X) and usefulness: E(Y|X) = E(Y|σ(X) ) = E(Y|G)
We do this because when X takes on certain values, it maps to values in the preimage, or equivalently the Borel field. Example: Let E(Y|X) = E(Y|σ(X)) = E(Y|G) and E(Y|X,Z) = E(Y|σ(X,Z)) = E(Y|H). Since G ⊆ H, E(E(Y|X,Z)|X) = E(E(Y|H)|G) = E(Y|G) = E(Y|X). Law of Iterated Expectations: E(Y) = E_X[E(Y|X)]³
Generalized Law of Iterated Expectations: For G ⊆ H (G is a less fine partition than H), E(Y|G) = E[E(Y|H)|G] = E[E(Y|G)|H]⁴
Property of Conditional Expectation: For real-valued random variables Y and X, we have E(YX|X) = E(Y|X)X.
Basic Time Series / Stochastic Process Concepts:
Stochastic process: a sequence of random variables {z_i}.
Realization/Sample Path of a stochastic process: a realization of {z_i}, a sequence of real numbers.
Time Series: If the sequence of rv's is indexed by time, the stochastic process is called a time series. (We will often use "time series" to refer to both the realization and the stochastic process.)
Ensemble Mean: E(z_i) (the "true" mean of each of the rv's in the sequence).
Important Point: We can always think of our data {z_i} as a sample path of an infinite sequence of random variables {…,Z₋₁,Z₀,Z₁,Z₂,…}, where the data is a realization of a particular subsequence of the infinite sequence.
What can we say about the underlying random variables that generate our data/time series?
Strictly Stationary Processes⁵: A stochastic process/sequence of RV's (e.g. a time series) {z_i} (i = 1,2,…) is (strictly) stationary if F(Z_t,…,Z_{t+K}) does not depend on t for all K = 0,1,2,… ⇔ F(Z_t,…,Z_{t+K}) = F(Z_1,…,Z_{1+K}) for all t and K.
In particular, F(Z_t) does not depend on t! (All observations come from the same distribution. Though this says identical, it does not say independent: identically but not necessarily independently distributed.)
Prop: If {z_i} (i = 1,2,…) is (strictly) stationary, then {g(z_i)} (i = 1,2,…) is (strictly) stationary for any continuous function g⁶.
Weakly Stationary Process: A stochastic process/sequence of RV's {z_i} (i = 1,2,…) is weakly (or covariance) stationary if:
i. E(z_i) does not depend on i, and
ii. Cov(z_i, z_{i−j}) exists, is finite, and depends only on j but not on i (e.g. Cov(z₁,z₅) = Cov(z₁₂,z₁₆)).
White Noise Processes⁷: A covariance stationary process {z_i} is a white noise if:
i. E(z_i) = 0, and
ii. Cov(z_i, z_{i−j}) = 0 for j ≠ 0.
Note: This is an important class of weakly stationary processes. It is a process with 0 mean and no serial correlation.
³ By condition 3 in the definition of conditional expectation, since Ω ∈ σ(X) and E(Y|X) is clearly measurable on all of Ω, taking G = Ω gives E(E(Y|X)) = ∫_Ω E(Y|X)(ω) dP(ω) = ∫_Ω Y(ω) dP(ω) = E(Y).
⁴ So the usual law of iterated expectations is a special case where G = {Ω, ∅}, because E(Y|G) = E(Y) in this case. Remember, E(Y) is just taking the expectation over the trivial sigma field.
⁵ Note: So, the joint distribution of (z_{i1}, z_{i2}, …, z_{ir}) depends only on i₁−i, i₂−i, …, i_r−i but not on i (for any given finite integer r and any set of subscripts i₁,…,i_r). (E.g. the joint distribution of (z₁,z₅) is the same as that of (z₁₂,z₁₆): what matters is the relative position in the sequence, not the absolute position!)
⁷ A white noise process that is not strictly stationary: Let w ~ unif(0,2π) and define z_i = cos(iw) for i = 1,2,… Then E(z_i) = 0, Var(z_i) = ½, and Cov(z_i, z_j) = 0 for i ≠ j. So {z_i} is a white noise process, but it is clearly not an independent white noise process (since each z_i depends on w) nor strictly stationary (since the joint distribution depends on i). (WHY????)
Sketch (for a fixed i): since w ~ unif(0,2π), z_i = cos(iw) has the arcsine distribution on (−1,1), with density f(z) = 1/(π√(1−z²)). Hence E(Z_i) = ∫₋₁¹ z f(z) dz = (1/π)[−√(1−z²)]₋₁¹ = 0 (also by symmetry of f), and Var(Z_i) = ∫₋₁¹ z² f(z) dz = ½, for every i.
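A quick Monte Carlo check of this example can be run in Python (a sketch; the sample size and seed are illustrative assumptions, not from the notes):

```python
import numpy as np

# Simulate the footnote-7 process: z_i = cos(i*w) with a single w ~ unif(0, 2*pi).
rng = np.random.default_rng(0)
w = rng.uniform(0.0, 2.0 * np.pi, size=200_000)  # many independent draws of w
z1, z2 = np.cos(w), np.cos(2 * w)

print(np.mean(z1))       # ~ 0   : E(z_i) = 0
print(np.var(z1))        # ~ 0.5 : Var(z_i) = 1/2
print(np.mean(z1 * z2))  # ~ 0   : Cov(z_1, z_2) = 0, so {z_i} is white noise...
print(np.max(np.abs(z2 - (2 * z1**2 - 1))))  # ~ 0: yet z_2 = 2*z_1^2 - 1 (double angle), fully dependent
```

The last line makes the dependence concrete: z₂ is an exact deterministic function of z₁, even though the two are uncorrelated.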
Independent White Noise Process: An independently and identically distributed (iid) sequence with 0 mean and finite variance is a special case of a white noise process. (But generally, the elements of a white noise process may not come from the same distribution and may not be independent of each other. Independence and identical distribution are much stronger assumptions!)
Ergodic Process⁸: A stationary process {z_i} is said to be ergodic if, for any two bounded functions f: ℝ^{k+1} → ℝ and g: ℝ^{l+1} → ℝ,
lim_{n→∞} E[f(z_i,…,z_{i+k})·g(z_{i+n},…,z_{i+n+l})] = E[f(z_i,…,z_{i+k})]·E[g(z_{i+n},…,z_{i+n+l})]
(This requires approximate independence between elements sufficiently far apart, i.e. (z_i,…,z_{i+k}) approximately independent of (z_{i+n},…,z_{i+n+l}) for n large.)
Heuristically, a stationary process is ergodic (i.e. ergodic stationarity) if it is asymptotically independent, i.e. any 2 rv's positioned far apart in the sequence are almost independently distributed. (Intuitively, ergodicity says that the process is not "too persistent".)
Ergodic stationarity is important in developing large sample theory because of the ergodic theorem.
Ergodic Theorem: Let {z_i}_{i=1}^n be a stationary and ergodic process with E(z_i) = μ. Then
z̄_n ≡ (1/n) Σ_{i=1}^n z_i → μ a.s.
Idea: This generalizes Kolmogorov's LLN⁹. The ergodic theorem allows for serial dependence (whereas Kolmogorov's rules it out by the iid assumption), provided that the serial dependence disappears in the "long run" (by stationary ergodicity), i.e. asymptotically in large samples.
Implication: Any moment of a stationary and ergodic process (if it exists and is finite) is consistently estimated by the sample moment¹⁰. (iid is a special case!)
Martingale & Martingale Difference
Martingale Vector Process: A vector process {g_i} is called a martingale if E(g_i | g_{i−1}, …, g_1) = g_{i−1} for i ≥ 2. Note: The conditioning set g_{i−1}, …, g_1 is often called the information set, and {g_i} is called a martingale since its information set is its own past values.
Martingale Difference Sequence: A vector process {g_i} with E(g_i) = 0 is called a martingale difference sequence (m.d.s.) or martingale difference if the expectation conditional on its past values is also 0: E(g_i | g_{i−1}, …, g_1) = 0 for i ≥ 2.
Implication: A martingale difference sequence has no serial correlation: Cov(g_i, g_{i−j}) = 0 for all i and j ≠ 0¹¹.
Ergodic Stationary Martingale Differences CLT: Let {g_i} be a vector martingale difference sequence that is stationary and ergodic with E(g_i g_i') = Σ¹², and let ḡ_n ≡ (1/n) Σ_{i=1}^n g_i. Then
√n ḡ_n = (1/√n) Σ_{i=1}^n g_i →_D N(0, Σ)
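A small simulation illustrates the CLT (a sketch; the particular m.d.s. g_t = ε_t ε_{t−1} and the Monte Carlo sizes are illustrative assumptions, not from the notes):

```python
import numpy as np

# g_t = eps_t * eps_{t-1} with iid N(0,1) eps_t is stationary, ergodic, and an
# m.d.s. (E(g_t | past) = eps_{t-1} * E(eps_t) = 0), but not an independent sequence.
# The CLT predicts sqrt(n)*gbar -> N(0, Sigma) with Sigma = E(g_t^2) = 1 here.
rng = np.random.default_rng(1)
n, n_reps = 1_000, 5_000
eps = rng.normal(size=(n_reps, n + 1))
g = eps[:, 1:] * eps[:, :-1]               # n_reps independent sample paths of g_t
stat = np.sqrt(n) * g.mean(axis=1)         # sqrt(n) * gbar_n for each path
print(round(stat.mean(), 3), round(stat.var(), 3))   # ~ 0 and ~ 1
```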
1c. Autocorrelation, Ergodic Stationarity, and Martingale Difference Sequences
It should be noted that autocorrelation in {Z_i} does not affect ergodicity or stationarity (just check the definitions). However, an m.d.s. implies no autocovariance! This is why the model of Ch. 2 (specifically Assumption 2.5) is changed by the assumption that there is autocorrelation: consistency results do not change, but we need to invoke a different CLT to get asymptotic normality, and the asymptotic variance is different.
Pf: Suppose Z_i is an m.d.s. Then Cov(Z_i, Z_{i−k}) = E(Z_i Z_{i−k}) − E(Z_i)E(Z_{i−k}) = E[Z_{i−k} E(Z_i | Z_{i−1},…)] − E[E(Z_i | Z_{i−1},…)]E(Z_{i−k}) = 0 by the m.d.s. assumption.
⁸ Need for Ergodic Stationarity: The fundamental problem in time-series analysis is that we observe the realization of the process only once. (I.e. we get only 1 sample and 1 observation of the realized {z_i}!) Ideally, we would like to observe history many times over to obtain more samples, but clearly this is not feasible. But if each of the z_i's comes from the same distribution (stationarity), then we can view each realization of {z_i} as n realizations from the same distribution. Furthermore, if the process is not too persistent (ergodicity), then each element of {z_i} will contain some information not available from the other elements. In this case, the time average over the elements of {z_i} will be consistent for the ensemble mean!
⁹ Kolmogorov's Second Strong Law of Large Numbers: Let {z_i}_{i=1}^n be iid with E(z_i) = μ. Then z̄_n ≡ (1/n) Σ_{i=1}^n z_i → μ a.s.
¹⁰ Since if {z_i} is ergodic stationary, then {f(z_i)} is also ergodic stationary for any measurable function f(·).
¹¹ Cov(g_i, g_{i−j}) = E(g_i g_{i−j}') − E(g_i)E(g_{i−j})' = E(g_i g_{i−j}') = E[E(g_i g_{i−j}' | g_{i−1},…,g_1)] = E[E(g_i | g_{i−1},…,g_1) g_{i−j}'] = 0, since E(g_i | g_{i−1},…,g_1) = 0.
¹² Since the process is stationary, the matrix of cross moments E(g_i g_i') does not depend on i. Also, we implicitly assume that all the cross moments exist and are finite.
Linear Processes: An important class of covariance-stationary processes, which are created by taking a moving average of a white noise process¹³.
1. MA(q): A process {y_t} is called the q-th order moving average process (MA(q)) if it can be written as a weighted average of the current and most recent q values of a white noise process:
y_t = μ + θ₀ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q} with θ₀ = 1, or y_t = μ + Σ_{j=0}^q θ_j ε_{t−j} with θ₀ = 1
An MA(q) process is covariance-stationary with mean μ and j-th order autocovariance¹⁴:
γ_j = Cov(y_t, y_{t−j}) = (θ_j θ₀ + θ_{j+1}θ₁ + … + θ_q θ_{q−j})σ² = σ² Σ_{k=0}^{q−j} θ_{j+k}θ_k for j = 0,1,…,q
γ_j = 0 for j > q
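The autocovariance formula can be checked numerically (a sketch; the MA(2) coefficients, sample size, and seed are illustrative assumptions, not from the notes):

```python
import numpy as np

# Population autocovariances of an MA(q): gamma_j = sigma^2 * sum_k theta_{j+k}*theta_k,
# compared against sample autocovariances from one long simulated path.
theta = np.array([1.0, 0.5, -0.3])   # theta_0 = 1, theta_1, theta_2  (q = 2)
sigma2, q = 1.0, len(theta) - 1

def gamma_ma(j):
    return sigma2 * np.sum(theta[j:] * theta[:q - j + 1]) if j <= q else 0.0

rng = np.random.default_rng(2)
n = 500_000
eps = rng.normal(scale=np.sqrt(sigma2), size=n + q)
y = sum(theta[j] * eps[q - j:n + q - j] for j in range(q + 1))  # y_t = sum_j theta_j eps_{t-j}

for j in range(q + 2):
    sample = np.mean((y[j:] - y.mean()) * (y[:n - j] - y.mean()))
    print(j, round(gamma_ma(j), 3), round(sample, 3))   # gamma_3 = 0: dies out after q lags
```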
MA(∞) as a Mean Square Limit: An MA(q) process dies out after q lags (γ_j = 0 for j > q). Though some time series have this property, we also want to learn to model serial correlation that depends on the infinite past:
MA(∞): y_t = μ + Σ_{j=0}^∞ ψ_j ε_{t−j}, where {ψ_j} is a sequence of reals with Σ_{j=0}^∞ |ψ_j| < ∞ (absolutely summable)¹⁵
Prop 1: MA(∞) with absolutely summable coefficients
Let {ε_t} be white noise and {ψ_j} be a sequence of real numbers that is absolutely summable. Then:
(a) For each t, y_t = μ + Σ_{j=0}^∞ ψ_j ε_{t−j} converges in mean square (L² convergence), i.e. μ + Σ_{j=0}^n ψ_j ε_{t−j} → y_t in m.s. as n → ∞, so that y_t is a meaningful limit of the partial sums.
(b) {y_t} is covariance-stationary with E(y_t) = μ and autocovariances given by γ_j = Cov(y_t, y_{t−j}) = (ψ_j ψ₀ + ψ_{j+1}ψ₁ + …)σ² = σ² Σ_{k=0}^∞ ψ_{j+k}ψ_k
(c) The autocovariances are absolutely summable: Σ_{j=0}^∞ |γ_j| < ∞
(d) If {ε_t} is iid white noise, then {y_t} is strictly stationary and ergodic
(Note: this includes MA(q) as a special case since absolute summability is trivially met for q finite)
¹³ A white noise process {ε_t} is a 0-mean covariance stationary process (covariances between terms do not depend on t) with no serial correlation: E(ε_t) = 0, E(ε_t²) = σ² > 0, E(ε_t ε_{t−j}) = Cov(ε_t, ε_{t−j}) = 0 for j ≠ 0.
¹⁴ Derivation: E(y_t) = μ + θ₀E(ε_t) + θ₁E(ε_{t−1}) + … + θ_q E(ε_{t−q}) = μ, which does not depend on t.
For 0 < j ≤ q:
Cov(y_t, y_{t−j}) = Cov(θ₀ε_t + … + θ_q ε_{t−q}, θ₀ε_{t−j} + … + θ_q ε_{t−j−q})
Since white noise has no covariance between distinct ε's, only the overlapping terms survive: the ε's common to both sums are ε_{t−j−k} for k = 0,…,q−j, carrying coefficient θ_{j+k} in the first sum and θ_k in the second. Hence, by stationarity of the white noise,
γ_j = Σ_{k=0}^{q−j} θ_{j+k}θ_k Var(ε_{t−j−k}) = σ² Σ_{k=0}^{q−j} θ_{j+k}θ_k
¹⁵ However, for this to work we need the infinite series to converge to a random variable; absolute summability of {ψ_j} guarantees this.
Prop 2: Filtering covariance stationary processes (this generalizes the above proposition from white noise to covariance-stationary inputs)
Let {x_t} be covariance stationary and {h_j} be a sequence of real numbers that is absolutely summable. Then:
(a) For each t, y_t = Σ_{j=0}^∞ h_j x_{t−j} converges in mean square (L² convergence)¹⁶, i.e. Σ_{j=0}^n h_j x_{t−j} → y_t in m.s. as n → ∞, so that y_t is a meaningful limit, and {y_t} is covariance stationary.
(b) If the autocovariances of {x_t} are absolutely summable, then so are the autocovariances of {y_t}.
¹⁶ Recall the following results about mean square convergence (they are also the key ingredients in the proof of Prop 1):
(i) {z_n} converges in mean square iff E[(z_m − z_n)²] → 0 as m, n → ∞ (Cauchy criterion).
(ii) If x_n → x and z_n → z in m.s., then lim_{n→∞} E(x_n) = E(x) and lim_{n→∞} E(x_n z_n) = E(xz).
2. Filters
Background: The operation of taking a weighted average of (possibly infinitely many) successive values of a process, i.e. y_t = μ + Σ_{j=0}^∞ ψ_j ε_{t−j}, is called filtering, and can be expressed compactly using the lag operator: L^j x_t = x_{t−j}.
Def: For a given arbitrary sequence of reals {α₀, α₁, …}, define a filter by α(L) ≡ α₀ + α₁L + α₂L² + …
Therefore, applying it to a process {x_t}: α(L)x_t = α₀x_t + α₁Lx_t + α₂L²x_t + … = α₀x_t + α₁x_{t−1} + α₂x_{t−2} + … = Σ_{k=0}^∞ α_k x_{t−k}
Note: If x_t = c (a constant), α(L)c = α(1)c = (α₀ + α₁ + α₂ + …)c.
Def: If α_p ≠ 0 and α_j = 0 for j > p, then the filter reduces to a p-th degree lag polynomial:
α(L) = α₀ + α₁L + α₂L² + … + α_p L^p
[Note: We can treat these like polynomials in terms of operations: multiplication, inverting, roots, etc.]
Def: A filter with absolutely summable coefficients is called an absolutely summable filter. It is a mapping from the set of covariance stationary processes to itself.
Product of Filters
Def: Let {α_j}, {β_j} be two arbitrary sequences of reals, with α(L) ≡ α₀ + α₁L + α₂L² + … and β(L) ≡ β₀ + β₁L + β₂L² + …
Then the product of the filters α(L), β(L) is δ(L) = α(L)β(L), where {δ_j} is defined as the convolution of {α_j}, {β_j}:
δ₀ = α₀β₀, δ₁ = α₀β₁ + α₁β₀, δ₂ = α₀β₂ + α₁β₁ + α₂β₀, …, δ_j = α₀β_j + α₁β_{j−1} + … + α_j β₀, …
Example: for α(L) = 1 + α₁L, β(L) = 1 + β₁L ⇒ δ(L) = α(L)β(L) = (1 + α₁L)(1 + β₁L) = 1 + (α₁ + β₁)L + α₁β₁L²
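Since the product filter's coefficients are a convolution of the two coefficient sequences, `np.convolve` reproduces the example directly (a sketch; the coefficient values are illustrative assumptions):

```python
import numpy as np

# delta(L) = alpha(L)*beta(L): the coefficients of the product filter are the
# convolution of the two coefficient sequences.
alpha = [1.0, 0.4]    # alpha(L) = 1 + 0.4 L
beta = [1.0, 0.25]    # beta(L)  = 1 + 0.25 L
delta = np.convolve(alpha, beta)
print(delta)          # [1.   0.65 0.1 ] = 1 + (alpha_1 + beta_1) L + alpha_1*beta_1 L^2
```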
Property: Filter multiplication is commutative: α(L)β(L) = β(L)α(L). (Like ordinary matrix multiplication, commutativity will fail for matrix filters.)
Prop: If α(L), β(L) are absolutely summable and {x_t} is covariance stationary, then by Prop 2, α(L)β(L)x_t is a well-defined random variable, equal to δ(L)x_t: α(L)β(L) = δ(L) ⇒ α(L)β(L)x_t = δ(L)x_t
Inverses of Filters
Def: We say that β(L) is the inverse of α(L) if δ(L) = α(L)β(L) = 1; we denote β(L) = α(L)⁻¹.
(I.e. the inverse α(L)⁻¹ is a filter satisfying α(L)α(L)⁻¹ = 1.)
Note: As long as α₀ ≠ 0, the inverse α(L)⁻¹ of α(L) can be defined for any arbitrary sequence {α_j}, because δ(L) = α(L)β(L) = 1 ⇒ δ₀ = 1 and δ_j = 0 for j > 0. Then we can solve the following successively:
δ₀ = α₀β₀ = 1, δ₁ = α₀β₁ + α₁β₀ = 0, δ₂ = α₀β₂ + α₁β₁ + α₂β₀ = 0, …, δ_j = α₀β_j + α₁β_{j−1} + … + α_j β₀ = 0, …
⇒ β₀ = 1/α₀, β₁ = −(α₁/α₀)β₀, β₂ = −(α₁β₁ + α₂β₀)/α₀, …
Example: Consider the filter 1 − L. Then (1 − L)⁻¹ = 1 + L + L² + L³ + …
Why? (1 − L)x_t = x_t − x_{t−1}, and (1 + L + L² + …)(x_t − x_{t−1}) = (x_t − x_{t−1}) + (x_{t−1} − x_{t−2}) + … = x_t, i.e. (1 + L + L² + …)(1 − L)x_t = x_t.
Prop: Inverses are commutative: α(L)⁻¹α(L) = α(L)α(L)⁻¹ = 1, and, provided α₀ ≠ 0 and β₀ ≠ 0, δ(L) = α(L)β(L) ⇔ β(L) = α(L)⁻¹δ(L) ⇔ α(L) = δ(L)β(L)⁻¹ (this does not apply to matrix filters).
Inverting Lag Polynomials ("finite" filters)
Let φ(L) be a p-th degree lag polynomial, s.t. φ(L) = 1 − φ₁L − φ₂L² − … − φ_p L^p. Its inverse β(L) = φ(L)⁻¹ can be defined as follows. Recall the convolution of φ(L) and β(L): δ(L) = φ(L)β(L) gives
δ₀ = β₀, δ₁ = β₁ − φ₁β₀, δ₂ = β₂ − φ₁β₁ − φ₂β₀, …, δ_j = β_j − φ₁β_{j−1} − φ₂β_{j−2} − … − φ_p β_{j−p}, …
And if β(L) is the inverse of φ(L), then δ(L) = φ(L)β(L) = 1 ⇒ δ₀ = 1 and δ_j = 0 for j > 0.
Thus (since φ₀ = 1): β₀ = 1, β₁ = φ₁, β₂ = φ₁β₁ + φ₂ = φ₁² + φ₂, β₃ = φ₁β₂ + φ₂β₁ + φ₃, …
Takeaway: for any p-th degree lag polynomial φ(L) with φ₀ = 1, its inverse is defined/exists! (though it may not be absolutely summable unless the invertibility condition is satisfied)
Prop 3: Absolutely Summable Inverses of Lag Polynomials
Consider a p-th degree lag polynomial φ(L) = 1 − φ₁L − φ₂L² − … − φ_p L^p and let ψ(L) = φ(L)⁻¹. Suppose the associated p-th degree polynomial equation φ(z) = 0, where φ(z) ≡ 1 − φ₁z − φ₂z² − … − φ_p z^p = (1 − λ₁z)(1 − λ₂z)…(1 − λ_p z), satisfies the following stability condition:
a) all roots z of the p-th degree polynomial equation φ(z) = 0 are such that |z| > 1; equivalently, |λ_i| < 1 for all i.
Then the coefficient sequence {ψ_j} is bounded in absolute value by a geometrically declining sequence and hence is absolutely summable. (This will help us express AR processes as "well-defined" MA processes.)
• Example: Consider a lag polynomial of degree 1: φ(L) = 1 − φL. The root of the associated polynomial equation 1 − φz = 0 is z = 1/φ. The stability condition is |1/φ| > 1, or |φ| < 1. Then the inverse filter
ψ(L) = φ(L)⁻¹ = 1 + φL + φ²L² + … = Σ_{j=0}^∞ φ^j L^j, which evidently
has absolutely summable coefficients.
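The successive-solution recursion for β(L) = α(L)⁻¹ can be coded directly; for α(L) = 1 − φL it recovers the geometric coefficients above (a sketch; the function name and parameter values are illustrative assumptions, not from the notes):

```python
import numpy as np

def inverse_filter(alpha, n_terms):
    """First n_terms coefficients of alpha(L)^{-1}, solved successively from
    delta_0 = 1 and delta_j = alpha_0*beta_j + ... + alpha_j*beta_0 = 0 (j > 0)."""
    alpha = np.asarray(alpha, dtype=float)   # requires alpha[0] != 0
    beta = np.zeros(n_terms)
    beta[0] = 1.0 / alpha[0]
    for j in range(1, n_terms):
        acc = sum(alpha[k] * beta[j - k] for k in range(1, min(j, len(alpha) - 1) + 1))
        beta[j] = -acc / alpha[0]
    return beta

phi = 0.5
b = inverse_filter([1.0, -phi], 6)           # (1 - phi L)^{-1}
print(b)                                     # [1. 0.5 0.25 0.125 0.0625 0.03125]
print(np.convolve([1.0, -phi], b)[:6])       # [1. 0. 0. 0. 0. 0.] : alpha(L)*beta(L) = 1
```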
We will use this to show that AR(1) can be expressed as MA(∞).
Note: Why do we need the stability condition? The stability condition guarantees that the inverse is absolutely summable, and therefore that we can multiply both sides of the equation by the inverse filter. (It wouldn't make sense to multiply both sides by infinity.)
3. ARMA Processes: An important class of linear processes (covariance-stationary processes generated from white noise), which are a parameterization of the coefficients of an MA(∞) process (the idea here is to be able to represent covariance stationary processes with a parsimonious parameterization of coefficients).
Def: A First-Order Autoregressive Process AR(1) satisfies the following stochastic difference equation:
y_t = μ + φy_{t−1} + ε_t ⇔ y_t − φy_{t−1} = μ + ε_t ⇔ (1 − φL)y_t = μ + ε_t, where {ε_t} is white noise
I. MA(∞) representation of AR(1), as long as |φ| < 1 (the stability condition, to guarantee absolute summability)¹⁷:
(1 − φL)y_t = μ + ε_t ⇒ y_t = (1 − φL)⁻¹(μ + ε_t) = (Σ_{j=0}^∞ φ^j L^j)μ + (Σ_{j=0}^∞ φ^j L^j)ε_t = μ/(1 − φ) + Σ_{j=0}^∞ φ^j ε_{t−j}
Note: If y_t has mean μ/(1 − φ), then it is covariance stationary and has the above MA(∞) representation. (Here we have white noise and absolute summability, so all we need is for the expectation to be right.)
Note: Intuitively, this result says that the process, if it started a long time ago, "settles down", provided that |φ| < 1, so that the effect of the past dies out geometrically as time progresses. This is called the stationarity condition.
II. Autocovariances of AR(1) processes: 2 methods of calculating the autocovariances of AR(1) (for the case |φ| < 1):
a) y_t = μ/(1 − φ) + Σ_{j=0}^∞ φ^j ε_{t−j} ⇒ γ_j = σ²(φ^j φ⁰ + φ^{j+1}φ¹ + φ^{j+2}φ² + …) = σ²φ^j(1 + φ² + φ⁴ + …) = σ²φ^j/(1 − φ²)
and ρ_j ≡ γ_j/γ₀ = [σ²φ^j/(1 − φ²)] / [σ²/(1 − φ²)] = φ^j
b) Via the "Yule-Walker" equations (See 271B HW2)
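A simulation confirms the AR(1) autocovariances in a) (a sketch; φ, μ, the sample size, and the seed are illustrative assumptions, not from the notes):

```python
import numpy as np

# Simulate y_t = mu + phi*y_{t-1} + eps_t and compare sample autocovariances
# with gamma_j = sigma^2 * phi^j / (1 - phi^2).
phi, mu, sigma2 = 0.7, 1.0, 1.0
rng = np.random.default_rng(3)
n = 400_000
eps = rng.normal(scale=np.sqrt(sigma2), size=n)
y = np.empty(n)
y[0] = mu / (1 - phi)                        # start at the stationary mean
for t in range(1, n):
    y[t] = mu + phi * y[t - 1] + eps[t]

print(round(y.mean(), 3))                    # ~ mu/(1 - phi) = 3.333
for j in range(4):
    gamma_j = sigma2 * phi**j / (1 - phi**2)
    sample = np.mean((y[j:] - y.mean()) * (y[:n - j] - y.mean()))
    print(j, round(gamma_j, 3), round(sample, 3))
```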
III. AR(p) and its MA(∞) Representation: The above results for AR(1) can be generalized to the AR(p), the p-th order autoregressive process.
Def: A p-th Order Autoregressive Process AR(p) satisfies the following stochastic difference equation:
y_t = μ + φ₁y_{t−1} + φ₂y_{t−2} + … + φ_p y_{t−p} + ε_t
or y_t − φ₁y_{t−1} − φ₂y_{t−2} − … − φ_p y_{t−p} = μ + ε_t
or φ(L)y_t = (1 − φ₁L − … − φ_p L^p)y_t = μ + ε_t, where {ε_t} is white noise
Prop 4: AR(p) as an MA(∞) with Absolutely Summable Coefficients
Suppose the p-th degree polynomial φ(z) satisfies the stationarity/stability condition (so that the inverse is well defined and absolutely summable), i.e. φ(z) = 0 ⇒ |z| > 1. Then:
(a) The unique covariance-stationary solution to the p-th order stochastic difference equation has an MA(∞) representation:
(1 − φ₁L − … − φ_p L^p)y_t = μ + ε_t ⇒ (1 − λ₁L)(1 − λ₂L)…(1 − λ_p L)y_t = μ + ε_t
⇒ y_t = (1 − λ₁L)⁻¹(1 − λ₂L)⁻¹…(1 − λ_p L)⁻¹μ + (1 − λ₁L)⁻¹(1 − λ₂L)⁻¹…(1 − λ_p L)⁻¹ε_t = (Σ_{j=0}^∞ λ₁^j L^j)…(Σ_{j=0}^∞ λ_p^j L^j)μ + (Σ_{j=0}^∞ λ₁^j L^j)…(Σ_{j=0}^∞ λ_p^j L^j)ε_t
⇒ y_t = μ_y + ψ(L)ε_t, where ψ(L) = ψ₀ + ψ₁L + ψ₂L² + ψ₃L³ + … = φ(L)⁻¹ and μ_y = φ(1)⁻¹μ
The coefficients {ψ_j} are bounded in absolute value by a geometrically declining sequence and hence are absolutely summable. (Thus the MA representation is well-defined: the RHS converges.)
Note: The MA(∞) representation comes from the fact that the inverse is represented as a product of infinite sums!
¹⁷ Recall MA processes are well defined only when the lag coefficients are absolutely summable. We can multiply both sides by the inverse only if absolute summability is met, and then the RHS becomes a meaningful/well-defined limit. |φ| < 1 is equivalent to the stability condition, because the root of 1 − φz = 0 is z = 1/φ, and the stability condition is |1/φ| > 1, or |φ| < 1.
(b) The mean of the process is given by: μ_y = φ(1)⁻¹μ
(c) {γ_j} is bounded in absolute value by a sequence that declines geometrically with j. Therefore the autocovariances are absolutely summable.
Note: We refer to "an AR(p) process" as the unique covariance stationary solution to an AR(p) equation that satisfies the stationarity condition. The absolute summability of the inverse in part (a) follows from Prop 3, since the stability conditions are satisfied.
IV. ARMA(p,q): An ARMA(p,q) process combines AR(p) and MA(q):
y_t = μ + φ₁y_{t−1} + φ₂y_{t−2} + … + φ_p y_{t−p} + ε_t + θ₁ε_{t−1} + θ₂ε_{t−2} + … + θ_q ε_{t−q}
or φ(L)y_t = μ + θ(L)ε_t, where φ(L) = 1 − φ₁L − φ₂L² − … − φ_p L^p and θ(L) = 1 + θ₁L + θ₂L² + … + θ_q L^q,
where {ε_t} is white noise.
Prop 5: ARMA(p,q) as an MA(∞) with Absolutely Summable Coefficients
Suppose the p-th degree polynomial φ(z) satisfies the stationarity/stability condition (so that the inverse is well defined), i.e. φ(z) = 0 ⇒ |z| > 1. Then:
(a) The unique covariance-stationary solution to φ(L)y_t = μ + θ(L)ε_t, where φ(L) = 1 − φ₁L − φ₂L² − … − φ_p L^p and θ(L) = 1 + θ₁L + θ₂L² + … + θ_q L^q, has an MA(∞) representation:
(1 − φ₁L − … − φ_p L^p)y_t = μ + θ(L)ε_t ⇒ (1 − λ₁L)(1 − λ₂L)…(1 − λ_p L)y_t = μ + θ(L)ε_t
⇒ y_t = (1 − λ₁L)⁻¹…(1 − λ_p L)⁻¹μ + (1 − λ₁L)⁻¹…(1 − λ_p L)⁻¹θ(L)ε_t = (Σ_{j=0}^∞ λ₁^j L^j)…(Σ_{j=0}^∞ λ_p^j L^j)μ + (Σ_{j=0}^∞ λ₁^j L^j)…(Σ_{j=0}^∞ λ_p^j L^j)θ(L)ε_t
⇒ y_t = μ_y + ψ(L)ε_t, where ψ(L) = ψ₀ + ψ₁L + ψ₂L² + ψ₃L³ + … = φ(L)⁻¹θ(L) and μ_y = φ(1)⁻¹μ
The coefficients {ψ_j} are bounded in absolute value by a geometrically declining sequence and hence are absolutely summable. (Thus the MA representation is well-defined: the RHS converges.) Note: The MA(∞) representation comes from the fact that the inverse is represented as a product of infinite sums!
(b) The mean of the process is given by: μ_y = φ(1)⁻¹μ
(c) {γ_j} is bounded in absolute value by a sequence that declines geometrically with j. Therefore the autocovariances are absolutely summable.
Note: We refer to "an ARMA(p,q) process" as the unique covariance stationary solution to an ARMA equation that satisfies the stationarity condition. The absolute summability of the inverse in part (a) follows from Prop 3, since the stability conditions are satisfied.
Invertibility of ARMA(p,q) and its MA(∞) and AR(∞) Representations
If an ARMA(p,q) process satisfies the stability condition φ(z) = 0 ⇒ |z| > 1, and θ(z) = 0 ⇒ |z| > 1 (the stability condition on θ is called the invertibility condition), and if in addition φ(z) = 0 and θ(z) = 0 have no common roots, then the ARMA process is invertible, and the process has both an AR(∞) and an MA(∞) representation!
If θ(z) = 0 ⇒ |z| > 1, then θ(L)⁻¹ is absolutely summable and we can multiply both sides of the ARMA equation by θ(L)⁻¹ to obtain its AR(∞) representation¹⁸:
φ(L)y_t = μ + θ(L)ε_t ⇒ θ(L)⁻¹φ(L)y_t = θ(L)⁻¹μ + ε_t = θ(1)⁻¹μ + ε_t
∴ θ(L)⁻¹φ(L)y_t = c + ε_t, with c = θ(1)⁻¹μ (an AR(∞))
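As a numerical sketch of the AR(∞) representation (the parameter values are illustrative assumptions, not from the notes): for an ARMA(1,1) with φ(L) = 1 − φL and θ(L) = 1 + θL, the AR(∞) polynomial θ(L)⁻¹φ(L) can be built by convolving θ(L)⁻¹ = Σ_j (−θ)^j L^j with φ(L):

```python
import numpy as np

# AR(inf) coefficients of an invertible ARMA(1,1): theta(L)^{-1} * phi(L).
phi, theta = 0.5, 0.4                        # |theta| < 1: invertibility holds
n_terms = 8
theta_inv = np.array([(-theta) ** j for j in range(n_terms)])  # theta(L)^{-1}
ar_inf = np.convolve(theta_inv, [1.0, -phi])[:n_terms]         # theta(L)^{-1} * phi(L)
print(ar_inf[:4])   # 1, -(phi+theta), theta*(phi+theta), -theta^2*(phi+theta), ...
```

The coefficients decline geometrically in |θ|, consistent with absolute summability under the invertibility condition.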
¹⁸ Recall, θ(L)⁻¹ exists and is defined if θ₀ = 1 ≠ 0, as is the case here. However, it may not be absolutely summable (and therefore we cannot multiply both sides of the equation by it) unless the stability condition θ(z) = 0 ⇒ |z| > 1 (aka the invertibility condition) is satisfied.
4. Estimating Autoregressions: Autoregressive processes (autoregressions) are popular in econometrics because they have a natural interpretation and also because they are easy to estimate.
Prop 7 (Estimation of AR coefficients): Let {y_t} be an AR(p) process that satisfies the stationarity condition. Suppose further that {ε_t} is iid white noise. Then the OLS estimator β̂ of (c, φ₁, φ₂, …, φ_p), obtained by regressing y_t on a constant and p lagged values of y, is consistent and asymptotically normal, with
Avar(β̂) = σ²E(x_t x_t')⁻¹, where x_t = (1, y_{t−1}, …, y_{t−p})'
which is consistently estimated by s²((1/n)Σ_{t=1}^n x_t x_t')⁻¹, where s² = (1/(n − p − 1))Σ_t (y_t − ĉ − φ̂₁y_{t−1} − … − φ̂_p y_{t−p})²
Note: This follows from the fact that y_t can be written as an MA(∞), so if {ε_t} is iid white noise, then y_t is ergodic stationary from Prop 1(d), and it satisfies orthogonality to ε_t by independence. Therefore the assumptions for OLS are met. Though, we must check that E(x_t x_t') is nonsingular, or equivalently that the autocovariance matrix Var(y_t,…,y_{t−p}) is nonsingular. It can be shown that the autocovariance matrix of a covariance-stationary process is nonsingular for any p if γ₀ > 0 and γ_j → 0 as j → ∞ (which is satisfied here, since AR(p) is an MA(∞) and {ε_t} is iid).
Note: An AR(p) process with non-iid white noise can also be estimated via OLS… WHY?!
Estimating ARMA: we can use GMM/IV/2SLS to deal with endogeneity.
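Prop 7's regression can be sketched in a few lines (the sample size, true parameters, and seed are illustrative assumptions, not from the notes):

```python
import numpy as np

# Estimate an AR(2) by OLS: regress y_t on (1, y_{t-1}, y_{t-2}).
phi_true, mu, p, n = np.array([0.5, 0.2]), 1.0, 2, 100_000   # a stationary AR(2)
rng = np.random.default_rng(4)
y = np.zeros(n)
for t in range(p, n):
    y[t] = mu + sum(phi_true[j] * y[t - 1 - j] for j in range(p)) + rng.normal()

# x_t = (1, y_{t-1}, ..., y_{t-p})', stacked over t = p, ..., n-1
X = np.column_stack([np.ones(n - p)] + [y[p - j:n - j] for j in range(1, p + 1)])
beta_hat = np.linalg.lstsq(X, y[p:], rcond=None)[0]
print(beta_hat.round(3))     # ~ (c, phi_1, phi_2) = (1, 0.5, 0.2)
```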
Autocovariance-Generating Function: The AGF is a useful way to summarize the whole profile of autocovariances {γ_j} of a covariance stationary process {y_t}:
g_Y(z) ≡ Σ_{j=−∞}^∞ γ_j z^j = γ₀ + Σ_{j=1}^∞ γ_j(z^j + z^{−j}) (since γ_j = γ_{−j} by covariance stationarity)
There's a question about whether the summation exists, but for z = 1 and for {γ_j} absolutely summable:
g_Y(1) = γ₀ + 2Σ_{j=1}^∞ γ_j
Prop 6: Autocovariance-generating function for MA(∞)
Let {ε_t} be white noise and let ψ(L) = ψ₀ + ψ₁L + ψ₂L² + ψ₃L³ + … be an absolutely summable filter (i.e. with {ψ_j} absolutely summable). Then the autocovariance-generating function of the MA(∞) process {y_t}, where y_t = μ + ψ(L)ε_t, is:
g_Y(z) = σ²ψ(z)ψ(z⁻¹)¹⁹
Therefore, for AR(p) and ARMA(p,q), since they have MA(∞) representations:
AR(p): g_Y(z) ≡ σ²φ(z)⁻¹φ(z⁻¹)⁻¹
ARMA(p,q): g_Y(z) ≡ σ²θ(z)θ(z⁻¹)/(φ(z)φ(z⁻¹))
Note: These are useful because the long-run variance can be expressed as g_Y(1).
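A numeric check of the MA(1) case worked out in footnote 19 (a sketch; θ, σ², and the evaluation points are illustrative assumptions):

```python
# For MA(1): g_Y(z) = sigma^2*(1 + theta*z)*(1 + theta/z) should equal
# gamma_0 + gamma_1*(z + 1/z), with gamma_0 = sigma^2*(1 + theta^2), gamma_1 = sigma^2*theta.
theta, sigma2 = 0.6, 2.0
gamma0, gamma1 = sigma2 * (1 + theta**2), sigma2 * theta

for z in (1.0, 0.5, 2.0):
    via_agf = sigma2 * (1 + theta * z) * (1 + theta / z)
    via_gammas = gamma0 + gamma1 * (z + 1 / z)
    print(z, round(via_agf, 6), round(via_gammas, 6))   # identical at each z

print(gamma0 + 2 * gamma1)   # long-run variance g_Y(1) = sigma^2*(1 + theta)^2 = 5.12
```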
¹⁹ Recall, for an MA(q) process, γ_j = σ²Σ_{k=0}^{q−j}θ_{j+k}θ_k for j = 0,1,…,q, and γ_j = 0 for j > q.
Take MA(1). Then g_Y(z) = Σ_j γ_j z^j = γ₋₁z⁻¹ + γ₀ + γ₁z = σ²(θ₁z⁻¹ + (1 + θ₁²) + θ₁z) = σ²(1 + θ₁z)(1 + θ₁z⁻¹).
Therefore, g_Y(z) = σ²θ(z)θ(z⁻¹).
5. Asymptotics for Sample Means of Serially Correlated Processes
Here we develop an LLN and a CLT for serially correlated processes. Previously, we used the Billingsley (ergodic stationary MDS) CLT, which rules out serial correlation since the process is assumed to be a martingale difference sequence²⁰.
a) LLN for Covariance Stationary Processes with Vanishing Autocovariances²¹: Let {y_t} be covariance-stationary with mean μ and let {γ_j} be the autocovariances of {y_t}. Then:
(a) ȳ ≡ (1/n)Σ_{t=1}^n y_t → μ in m.s./L² if lim_{j→∞} γ_j = 0
(b) lim_{n→∞} Var(√n ȳ) = Σ_{j=−∞}^∞ γ_j < ∞ if {γ_j} is absolutely summable
(Note: we also call the limit in (b) the long-run variance of the covariance stationary process²²; it can be expressed from the AGF as g_Y(1).)
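A Monte Carlo sketch of part (b) for an MA(1) (θ and the simulation sizes are illustrative assumptions, not from the notes):

```python
import numpy as np

# For MA(1) with sigma^2 = 1: long-run variance = sum_j gamma_j = (1 + theta)^2.
# Check that n * Var(ybar) approaches it.
theta, n, n_reps = 0.6, 1_000, 5_000
long_run = (1 + theta) ** 2                  # = gamma_0 + 2*gamma_1 = 2.56

rng = np.random.default_rng(5)
eps = rng.normal(size=(n_reps, n + 1))
y = eps[:, 1:] + theta * eps[:, :-1]         # n_reps independent MA(1) paths
ybar = y.mean(axis=1)
print(round(n * ybar.var(), 3), round(long_run, 3))  # ~ 2.56, up to O(1/n) and MC error
```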
b) LLN for Vector Covariance-Stationary Processes with Vanishing Autocovariances
Let {y_t} be vector covariance-stationary with mean μ and let {Γ_j} be the autocovariance matrices23 of {y_t}. Then,

(a) ȳ_n ≡ (1/n) Σ_{t=1}^{n} y_t → μ (in mean square)   if the diagonal elements of Γ_j → 0 as j → ∞

(b) Var(√n ȳ_n) → Σ_{j=-∞}^{∞} Γ_j < ∞   if {Γ_j} is summable (i.e. each component of Γ_j is summable)

(Note: we also call this the long-run covariance matrix of the vector covariance stationary process24; it can be expressed from the multivariate AGF:

G_Y(1) = Σ_{j=-∞}^{∞} Γ_j = Γ_0 + Σ_{j=1}^{∞} (Γ_j + Γ_j') .)
20 g_t MDS ⇒ E(g_t | g_{t-1}, …) = 0. Therefore, Cov(g_t, g_{t-1}) = E(g_t g_{t-1}) = E( E(g_t | g_{t-1}, …) g_{t-1} ) = 0. 21
Var(√n ȳ_n) = n Var( (1/n) Σ_{i=1}^{n} y_i ) = (1/n) Var( Σ_{i=1}^{n} y_i )
= (1/n) [ n γ_0 + 2 (n−1) γ_1 + 2 (n−2) γ_2 + … + 2 γ_{n-1} ]   (collecting the n² covariance terms by lag, using stationarity)
= γ_0 + 2 Σ_{j=1}^{n-1} (1 − j/n) γ_j

Therefore, since {γ_j} is absolutely summable, the (j/n) γ_j terms vanish in the limit and

lim_{n→∞} Var(√n ȳ_n) = γ_0 + 2 Σ_{j=1}^{∞} γ_j = Σ_{j=-∞}^{∞} γ_j
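The finite-n formula Var(√n ȳ_n) = γ_0 + 2 Σ_{j=1}^{n-1}(1 − j/n) γ_j can be checked exactly (a sketch with arbitrary MA(1) parameters, not from the notes) against the same variance computed from the full n×n covariance matrix:

```python
import numpy as np

# Exact check of Var(sqrt(n)*ybar) = gamma_0 + 2*sum_{j=1}^{n-1} (1 - j/n)*gamma_j
# for an MA(1) (theta, sigma2, n are illustrative values)
theta, sigma2, n = 0.7, 1.0, 50

gamma = np.zeros(n)
gamma[0] = sigma2 * (1 + theta**2)   # gamma_0
gamma[1] = sigma2 * theta            # gamma_1; gamma_j = 0 for j > 1

# Triangular-weight formula from the derivation above
var_formula = gamma[0] + 2 * sum((1 - j / n) * gamma[j] for j in range(1, n))

# Same quantity from the full n x n covariance matrix of (y_1, ..., y_n)
Sigma = np.array([[gamma[abs(i - j)] for j in range(n)] for i in range(n)])
var_matrix = Sigma.sum() / n         # Var(sum_t y_t) / n

print(var_formula, var_matrix)       # identical (about 2.862)
```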
22 We can think of the sample as being generated from an infinite sequence of random variables (which is covariance stationary). So, the "long-run" variance is the sum of covariances from any one element in the sequence to all the other elements. 23 In a vector process, the diagonal elements of {Γ_j} are the autocovariances and the off-diagonal elements are the covariances between the lagged values of the elements of the vector.
For example, let y_t = (x_t, z_t)'. Then

Γ_j = Cov(y_t, y_{t-j}) = E(y_t y_{t-j}') − E(y_t) E(y_{t-j}')
    = [ E(x_t x_{t-j}) − E(x_t)E(x_{t-j}) ,  E(x_t z_{t-j}) − E(x_t)E(z_{t-j}) ;
        E(z_t x_{t-j}) − E(z_t)E(x_{t-j}) ,  E(z_t z_{t-j}) − E(z_t)E(z_{t-j}) ]
    = [ Cov(x_t, x_{t-j}) ,  Cov(x_t, z_{t-j}) ;
        Cov(z_t, x_{t-j}) ,  Cov(z_t, z_{t-j}) ]
2a) CLT for MA(∞) (Billingsley generalizes Lindeberg-Lévy25 to stationary and ergodic MDS; now we generalize to serial correlation)
Let y_t = μ + Σ_{j=0}^{∞} ψ_j ε_{t-j}, where {ε_t} is iid white noise and Σ_{j=0}^{∞} |ψ_j| < ∞. Then,

√n ( ȳ_n − μ ) →^D N( 0, Σ_{j=-∞}^{∞} γ_j )
2b) Multivariate CLT for MA(∞)
Let y_t = μ + Σ_{j=0}^{∞} Ψ_j ε_{t-j}, where {ε_t} is vector iid white noise (i.e. jointly covariance stationary) and Σ_{j=0}^{∞} ||Ψ_j|| < ∞. Then,

√n ( ȳ_n − μ ) →^D N( 0, Σ_{j=-∞}^{∞} Γ_j )
24 We can think of the sample as being generated from an infinite sequence of random variables (which is cov. Stationary). So, the “long-run” variance is the sum of covariances from any 1 element in the sequence to all the other elements.
25 Lindeberg-Lévy CLT: Let {z_i} be iid with E(z_i) = μ and Var(z_i) = Σ. Then

√n ( z̄_n − μ ) = √n ( (1/n) Σ_{i=1}^{n} z_i − μ ) →^D N(0, Σ)

(the standardized sample mean of iid random vectors converges to a normally distributed random vector).
3a) Gordin's CLT for Ergodic + Stationary + Gordin-Condition Processes26 (conditions restricting the degree of autocorrelation)
Suppose {y_t} is stationary and ergodic and suppose Gordin's conditions are satisfied:
a) E(y_t²) < ∞ (this is a restriction on [strictly] stationary processes, which might not have finite 2nd moments)
b) E(y_t | y_{t-j}, y_{t-j-1}, …) →^{m.s.} 0 as j → ∞, or equivalently (since stationary) E(y_0 | y_{-j}, y_{-j-1}, …) →^{m.s.} 0 27
c) Σ_{k=0}^{∞} [ E(r_{t,t-k}²) ]^{1/2} < ∞, where r_{t,t-k} ≡ E(y_t | I_{t-k}) − E(y_t | I_{t-k-1}) and the information set I_{t-k} ≡ (y_{t-k}, y_{t-k-1}, y_{t-k-2}, …)28 (telescoping-sum condition: restricts degree of autocorrelation)
Then,
1. The autocovariances {γ_j} are absolutely summable
2. √n ( ȳ_n − μ ) →^D N( 0, Σ_{j=-∞}^{∞} γ_j )
3b) Gordin's Multivariate CLT for Ergodic + Stationary + Gordin-Condition Processes29 (conditions restricting the degree of autocorrelation)
Suppose {y_t} is stationary and ergodic and suppose Gordin's conditions are satisfied for the vector ergodic stationary process:
a) E(y_t y_t') < ∞ (elementwise; this is a restriction on [strictly] stationary processes, which might not have finite 2nd moments)
b) E(y_t | y_{t-j}, y_{t-j-1}, …) →^{m.s.} 0 as j → ∞, or equivalently (since stationary) E(y_0 | y_{-j}, y_{-j-1}, …) →^{m.s.} 0
c) Σ_{k=0}^{∞} [ E(‖r_{t,t-k}‖²) ]^{1/2} < ∞, where r_{t,t-k} ≡ E(y_t | I_{t-k}) − E(y_t | I_{t-k-1}) and I_{t-k} ≡ (y_{t-k}, y_{t-k-1}, y_{t-k-2}, …) (telescoping-sum condition: restricts degree of autocorrelation)
Then,
1. The covariance matrices {Γ_j} are absolutely summable
2. √n ( ȳ_n − μ ) →^D N( 0, Σ_{j=-∞}^{∞} Γ_j )
Note: This is a generalization of the MDS CLT, since if {y_t} is an MDS sequence, then Gordin's conditions are satisfied.
26 We should be able to check that with these conditions we satisfy Billingsley's general requirements for a CLT to hold (Peter's notes slide 16). 27 This condition implies that the unconditional mean is 0: E(y_t) = 0. So, as j → ∞, that is, as the forecast (the conditional expectation) is based on less and less information, it should approach the forecast based on no information (i.e. the unconditional expectation). (See the White text for proof.) 28 We can write y_t as a telescoping sum as follows:

y_t = ( y_t − E(y_t | I_{t-1}) ) + ( E(y_t | I_{t-1}) − E(y_t | I_{t-2}) ) + … + ( E(y_t | I_{t-j+1}) − E(y_t | I_{t-j}) ) + E(y_t | I_{t-j})
    = r_{t,t} + r_{t,t-1} + r_{t,t-2} + … + r_{t,t-j+1} + E(y_t | I_{t-j})

By (b), E(y_t | I_{t-j}) →^{m.s.} 0 as j → ∞; therefore

y_t = Σ_{k=0}^{∞} r_{t,t-k}

This telescoping sum indicates how the "shocks" represented by (r_{t,t}, r_{t,t-1}, …) influence the current value of y. Condition (c) says, roughly, that shocks that occurred a long time ago do not have a disproportionately large influence. As such, this condition restricts the degree of serial correlation in {y_t}. 29 We should be able to check that with these conditions we satisfy Billingsley's general requirements for a CLT to hold (Peter's notes slide 16).
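As a concrete illustration (not worked in the notes), Gordin's condition (c) is easy to verify by hand for a stationary AR(1):

```latex
% AR(1): y_t = \phi y_{t-1} + \varepsilon_t, |\phi| < 1, \varepsilon_t iid (0, \sigma^2).
% Since E(y_t \mid I_{t-k}) = \phi^k y_{t-k}, the revision terms are
r_{t,t-k} = E(y_t \mid I_{t-k}) - E(y_t \mid I_{t-k-1})
          = \phi^k y_{t-k} - \phi^{k+1} y_{t-k-1}
          = \phi^k \varepsilon_{t-k},
% so [E(r_{t,t-k}^2)]^{1/2} = |\phi|^k \sigma, and the telescoping-sum condition holds:
\sum_{k=0}^{\infty} [E(r_{t,t-k}^2)]^{1/2}
   = \sigma \sum_{k=0}^{\infty} |\phi|^k = \frac{\sigma}{1-|\phi|} < \infty .
```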
4) Examples: Asymptotic Distributions of AR(p), ARMA(p,q)
Let {y_t} be ARMA(1,1). To check whether the LLN applies, we need to check whether γ_j vanishes as j → ∞. ARMA(1,1) has an MA(∞) representation, and from before we know the autocovariances of an MA(q) are

γ_j = Cov(y_t, y_{t-j}) = σ² ( θ_j θ_0 + θ_{j+1} θ_1 + … + θ_q θ_{q-j} ) = σ² Σ_{k=0}^{q-j} θ_{j+k} θ_k   for j = 0, 1, …, q
γ_j = 0   for j > q

For the MA(∞) representation the coefficients decay geometrically, so the autocovariances vanish and are absolutely summable, and the LLN applies. Then, to get the asymptotic variance,

Σ_{j=-∞}^{∞} γ_j = g_Y(1) = σ² θ(1)² / φ(1)² = σ² ( (1 − θ_1) / (1 − φ_1) )²

In general, for ARMA(p,q):

√n ( ȳ_n − μ ) →^D N( 0, Σ_{j=-∞}^{∞} γ_j ) = N( 0, σ² ( (1 − θ_1 − … − θ_q) / (1 − φ_1 − … − φ_p) )² )

by the same reasoning as above.
6. Incorporating Serial Correlation in GMM
Background: Recall our GMM assumptions: from assumption 3.3 (orthogonality condition), E(g_t) = 0, and in 3.5 we assumed {g_t} was an MDS. No serial correlation in g_t is allowed under that assumption, and as such, the matrix S (from which we obtain the asymptotic variance) is defined to be the variance of g_t. Now we can relax assumption 3.5 as follows.
Assumptions
3.1 Linearity: The data we observe come from underlying RVs {y_t (1×1), x_t (d×1)} with
y_t = x_t'δ + ε_t (t = 1, 2, …, n)   (this is the equation we want to estimate)
3.2 Ergodic Stationarity: Let z_t be an M-dimensional vector of instruments, and let w_t be the unique and nonconstant elements of (y_t, x_t, z_t). {w_t} is jointly stationary and ergodic.
3.3 Orthogonality Condition: All M variables in z_t are predetermined in the sense that they are all orthogonal to the current error term:
E(z_{tm} ε_t) = 0 ∀t, ∀m ⇔ E[ z_t · (y_t − x_t'δ) ] = 0 ⇔ E(g_t) = 0, where g_t = z_t · (y_t − x_t'δ) = z_t · ε_t
3.4 Rank Condition for Identification: Guarantees there is a unique solution to the system of equations.
The M×d matrix E(z_t x_t') is of full column rank (or E(x_t z_t') is of full row rank), M ≥ d (at least as many equations as unknowns). We denote this matrix by Σ_zx.
3.5 Gordin's Condition Restricting the Degree of Serial Correlation (assumption for asymptotic normality): The vector process {g_t} = {z_t · ε_t} satisfies the Gordin conditions:
a) E(g_t g_t') = E(ε_t² z_t z_t') < ∞ (this is a restriction on [strictly] stationary processes which might not have finite 2nd moments)
b) E(g_t | g_{t-j}, g_{t-j-1}, …) →^{m.s.} 0 as j → ∞, or equivalently (since stationary) E(g_0 | g_{-j}, g_{-j-1}, …) →^{m.s.} 0
c) Σ_{k=0}^{∞} [ E(‖r_{t,t-k}‖²) ]^{1/2} < ∞, where r_{t,t-k} ≡ E(g_t | I_{t-k}) − E(g_t | I_{t-k-1}) and the information set I_{t-k} ≡ (g_{t-k}, g_{t-k-1}, g_{t-k-2}, …) (telescoping-sum condition: restricts degree of autocorrelation)
Moreover, {g_t} has a nonsingular long-run covariance matrix. Then, by 3.3 and (b),

√n ḡ_n →^D N(0, S),   S = Σ_{j=-∞}^{∞} Γ_j = Γ_0 + Σ_{j=1}^{∞} ( Γ_j + Γ_j' ),

where Γ_j = E(g_t g_{t-j}') (j = 0, ±1, ±2, …) is the j-th autocovariance matrix.
Note: THIS IS THE ONLY DIFFERENCE BETWEEN THE GMM MODEL WITH AND WITHOUT SERIAL CORRELATION! Under the previous assumption 3.5 (martingale difference sequence), S = E(g_t g_t') = Γ_0. All the results developed previously carry over, using the new S (long-run variance) matrix.
Note 2: If we knew the form of the autocorrelation, we could always use a GLS estimator. However, GLS requires us to know Ω = Var(ε | X), and is typically infeasible. Furthermore, when we have to estimate Ω = Var(ε | X), GLS may not dominate OLS! Thus, if we were to run OLS/GMM, then the consistent variance-covariance matrix is given here.
GMM Estimator with Serial Correlation Results:
1. Consistency30: Under assumptions 3.1 – 3.4, b_GMM(W_n) →^p δ.
2. Asymptotic Normality31: Under assumptions 3.1 – 3.5,

√n ( b_GMM(W_n) − δ ) →^D N(0, V),
where V = ( Σ_zx' W Σ_zx )^{-1} Σ_zx' W S W Σ_zx ( Σ_zx' W Σ_zx )^{-1},   W ≡ plim W_n,   S = Σ_{j=-∞}^{∞} E(g_t g_{t-j}')

3. Consistent Estimate of Avar(b_GMM(W_n))32:
Suppose there exists a consistent estimator Ŝ (m×m) of S. Then, under 3.2, Avar(b_GMM(W_n)) is consistently estimated by

Avar-hat( b_GMM(W_n) ) = ( S_zx' W_n S_zx )^{-1} S_zx' W_n Ŝ W_n S_zx ( S_zx' W_n S_zx )^{-1},   where S_zx ≡ (1/n) Σ_t z_t x_t'

4. Consistency of s² (estimation of the variance of the "true" error is consistent)33:
For any consistent estimator b̂_n of δ, define ε̂_t ≡ y_t − x_t' b̂_n. Under 3.1, 3.2, and provided E(x_t x_t') exists and is finite,

(1/n) Σ_{t=1}^{n} ε̂_t² →^p E(ε_t²)

5. Consistent estimation of S (long-run variance matrix): (see next)
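The sandwich formula in result 2 can be illustrated numerically (a sketch with random stand-in matrices for Σ_zx and S, not from the notes). It also shows the standard efficiency result the notes carry over from the MDS case: choosing W = S^{-1} collapses the sandwich to (Σ_zx' S^{-1} Σ_zx)^{-1}, which is no larger than the variance under any other weight:

```python
import numpy as np

rng = np.random.default_rng(1)
M, d = 4, 2                               # instruments, parameters (illustrative)

Szx = rng.normal(size=(M, d))             # stand-in for Sigma_zx (full column rank)
A = rng.normal(size=(M, M))
S = A @ A.T + np.eye(M)                   # stand-in long-run variance S (pos. def.)

def avar(W):
    # (Szx' W Szx)^{-1} Szx' W S W Szx (Szx' W Szx)^{-1}
    B = np.linalg.inv(Szx.T @ W @ Szx)
    return B @ Szx.T @ W @ S @ W @ Szx @ B

V_eff = avar(np.linalg.inv(S))            # efficient choice W = S^{-1}
V_id = avar(np.eye(M))                    # arbitrary weight W = I

# V_eff equals (Szx' S^{-1} Szx)^{-1}, and V_id - V_eff is positive semidefinite
V_closed = np.linalg.inv(Szx.T @ np.linalg.inv(S) @ Szx)
print(np.allclose(V_eff, V_closed))
print(np.linalg.eigvalsh(V_id - V_eff).min() > -1e-8)
```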
30 From above, since (1/n) Σ_{t=1}^{n} z_t ε_t →^p E(z_t ε_t) = E(g_t) = 0 by the ergodic theorem and 3.3, ∴ b_GMM →^p δ.
31 Continuing from above,

b_GMM − δ = ( S_zx' W_n S_zx )^{-1} S_zx' W_n ḡ_n ⇒ √n ( b_GMM − δ ) = ( S_zx' W_n S_zx )^{-1} S_zx' W_n √n ḡ_n

√n ḡ_n = (1/√n) Σ_{t=1}^{n} z_t ε_t →^D N(0, S) by the CLT under 3.5 (Gordin's CLT; in the MDS case this was the ergodic MDS CLT), S_zx →^p Σ_zx by the ergodic theorem, and W_n →^p W by construction.

∴ √n ( b_GMM − δ ) →^D N( 0, (Σ_zx' W Σ_zx)^{-1} Σ_zx' W S W Σ_zx (Σ_zx' W Σ_zx)^{-1} ) by the CMT and Slutsky's theorem.
32 This follows from above. Standard asymptotic tools. 33 This proof is very similar to 3D from previous notes
ε̂_t = y_t − x_t' b̂ = ( x_t' δ + ε_t ) − x_t' b̂ = ε_t − x_t' ( b̂ − δ )
⇒ ε̂_t² = ε_t² − 2 ε_t x_t' ( b̂ − δ ) + ( b̂ − δ )' x_t x_t' ( b̂ − δ )
⇒ (1/n) Σ_t ε̂_t² = (1/n) Σ_t ε_t² − 2 ( (1/n) Σ_t ε_t x_t' ) ( b̂ − δ ) + ( b̂ − δ )' ( (1/n) Σ_t x_t x_t' ) ( b̂ − δ )

Since (1/n) Σ_t ε_t x_t →^p some finite vector, b̂ − δ →^p 0, and (1/n) Σ_t x_t x_t' →^p E(x_t x_t') (finite by assumption), the last two terms →^p 0; and (1/n) Σ_t ε_t² →^p E(ε_t²) by the ergodic theorem. Therefore (1/n) Σ_t ε̂_t² →^p E(ε_t²).
Methods of Estimating S: This is difficult, particularly when autocovariances do not vanish after finite lags.
1. Estimating S when Autocovariances Vanish after Finite Lags
We can estimate the j-th order autocovariance for our vector process {g_t} of moment conditions as follows:34

Γ̂_j = (1/n) Σ_{t=j+1}^{n} ĝ_t ĝ_{t-j}' = (1/n) Σ_{t=j+1}^{n} ε̂_t ε̂_{t-j} z_t z_{t-j}',   where ĝ_t = z_t ε̂_t, ε̂_t = y_t − x_t' δ̂, for δ̂ consistent for δ

If we know a priori that, for a known and finite q, Γ_j = 0 for j > q (i.e. a finite lag after which there is no autocorrelation), then S is consistently estimated by:

Ŝ = Γ̂_0 + Σ_{j=1}^{q} ( Γ̂_j + Γ̂_j' )
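A minimal sketch of this truncated estimator (illustrative; the helper name and the white-noise test data are my own, not from the notes):

```python
import numpy as np

def S_hat_truncated(g, q):
    """Estimate S = sum_j Gamma_j assuming Gamma_j = 0 for j > q.

    g: (n, m) array whose rows are the estimated moments g_hat_t = z_t * eps_hat_t.
    """
    n, m = g.shape
    S = np.zeros((m, m))
    for j in range(q + 1):
        Gj = (g[j:].T @ g[:n - j]) / n      # Gamma_hat_j = (1/n) sum_t g_t g_{t-j}'
        S += Gj if j == 0 else Gj + Gj.T    # add Gamma_hat_j + Gamma_hat_j' for j >= 1
    return S

# White-noise moments with q = 2: S_hat should be close to Var(g_t) = I_2
rng = np.random.default_rng(2)
g = rng.normal(size=(5000, 2))
S_hat = S_hat_truncated(g, q=2)
print(S_hat)
```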
If we do not know q, then the following are methods for estimating the long-run variance matrix:
2. Using Kernels to Estimate S
Kernel-based (or nonparametric) estimators can be expressed as a weighted average of estimated autocovariances:

Ŝ = Σ_{j=-n+1}^{n-1} k( j / q(n) ) · Γ̂_j

where the "kernel" k(·) is a function that gives weights to the autocovariances, and q(n) is the "bandwidth".
Truncated Kernel: The estimator above with q(n) = q is a special kernel-based estimator with the "truncated" kernel

k(x) = 1 for |x| ≤ 1,   k(x) = 0 for |x| > 1

Problems: With unknown q, we can use the truncated kernel with a bandwidth q(n) that increases with sample size. As q(n) → ∞, more and more (and ultimately all) autocovariances can be included in the calculation of Ŝ. However, the truncated-kernel-based Ŝ is not guaranteed to be positive semidefinite in finite samples. This creates a problem because then the estimated asymptotic variance of the GMM estimator (which includes Ŝ) may not be positive semidefinite.
Bartlett Kernel and the Newey-West Estimator: Newey and West found that the kernel-based estimator can be made nonnegative definite in finite samples using the Bartlett kernel

k(x) = 1 − |x| for |x| ≤ 1,   k(x) = 0 for |x| > 1

(a triangular "tent" on [−1, 1] that peaks at 0). The Bartlett-kernel-based estimator of S is called, in econometrics, the Newey-West estimator:

Ŝ = Σ_{ j : |j| < q(n) } ( 1 − |j| / q(n) ) · Γ̂_j = Γ̂_0 + Σ_{j=1}^{q(n)-1} ( 1 − j / q(n) ) ( Γ̂_j + Γ̂_j' )

Example: For q(n) = 3, the kernel-based estimator includes autocovariances up to two (not three) lags:

Ŝ = Σ_{j=-n+1}^{n-1} k( j/3 ) · Γ̂_j = Γ̂_0 + (2/3) ( Γ̂_1 + Γ̂_1' ) + (1/3) ( Γ̂_2 + Γ̂_2' )
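The Bartlett-weighted sum can be sketched directly (an illustrative implementation under the assumptions above; the function name and white-noise test data are my own):

```python
import numpy as np

def newey_west(g, bandwidth):
    """Bartlett-kernel (Newey-West) estimate of the long-run variance S.

    g: (n, m) array of moments; bandwidth: q(n). Lags j = 1, ..., q(n)-1 get
    weight 1 - j/q(n); autocovariances at lag q(n) and beyond get weight 0.
    """
    n, m = g.shape
    S = (g.T @ g) / n                      # Gamma_hat_0
    for j in range(1, bandwidth):
        w = 1.0 - j / bandwidth            # Bartlett weight
        Gj = (g[j:].T @ g[:n - j]) / n     # Gamma_hat_j
        S += w * (Gj + Gj.T)
    return S

# q(n) = 3 includes lags one and two with weights 2/3 and 1/3, as in the example
rng = np.random.default_rng(3)
g = rng.normal(size=(4000, 1))
S3 = newey_west(g, bandwidth=3)
print(S3)   # close to [[1.0]] for white-noise g
```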
3. VARHAC (VAR-based estimation of S): (see next)
34 Note that here ĝ_t is no longer ergodic stationary because we need to estimate the error term. For consistency we need to impose a fourth-moment condition (similar to that in Prop 3.4). See Hansen's notes p. 56.
Background: In time-series regressions, {y_t, x_t} are not always stationary as we have assumed thus far. There are 2 cases in which OLS is still consistent: 1. Time Regressions (Ch 6) and 2. Cointegrating Regressions (Ch 10).
Time Regressions
Setup:
Population Model of Interest:

y_t = α + δ·t + ε_t,   where {ε_t} is independent white noise (iid with E(ε_t) = 0, Var(ε_t) finite)
or y_t = x_t' β + ε_t with x_t = (1, t)', β = (α, δ)'

The random variable x_t = (1, t)' is not stationary, since the second element (t) has mean t, which increases with time. Similarly, y_t is not stationary since the time trend α + δ·t is (trivially) non-stationary. However, y_t is trend stationary.
Trend Stationarity:
Def: A process/sequence of RVs is trend stationary if it can be written as the sum of a time trend and a stationary process (here the stationary component is the independent white noise).
Consistency of OLS estimates in Time Regressions: It turns out the OLS estimates α̂, δ̂ are consistent but have different rates of convergence.
Prop: Consider the time regression where ε_t is independent white noise with E(ε_t²) = σ² and E(ε_t⁴) < ∞, and let α̂, δ̂ be the OLS estimates of α and δ. Then,

( √n (α̂ − α), n^{3/2} (δ̂ − δ) )' →^D N( 0, σ² Q^{-1} ),   Q ≡ [ 1, 1/2 ; 1/2, 1/3 ]
Just as in the stationary case, α̂ is √n-consistent because √n(α̂ − α) converges to a (normal) R.V. On the other hand, δ̂ is n^{3/2}-consistent in that n^{3/2}(δ̂ − δ) converges to a R.V. Therefore, δ̂ is hyperconsistent35.
Proof: See next page.
Hypothesis Testing for Time Regressions:
For H₀: α = α₀:
t_α = (α̂ − α₀) / √( s² [ (Σ_t x_t x_t')^{-1} ]₁₁ ) = (α̂ − α₀) / √( s² [1 0] (Σ_t x_t x_t')^{-1} [1 0]' )
    = √n (α̂ − α₀) / √( s² [1 0] Υ_n (Σ_t x_t x_t')^{-1} Υ_n [1 0]' )   (Υ_n ≡ diag(√n, n^{3/2}))
    = √n (α̂ − α₀) / √( s² [1 0] Q_n^{-1} [1 0]' ),   Q_n ≡ Υ_n^{-1} ( Σ_t x_t x_t' ) Υ_n^{-1}
For H₀: δ = δ₀:

t_δ = (δ̂ − δ₀) / √( s² [ (Σ_t x_t x_t')^{-1} ]₂₂ ) = (δ̂ − δ₀) / √( s² [0 1] (Σ_t x_t x_t')^{-1} [0 1]' )
    = n^{3/2} (δ̂ − δ₀) / √( s² [0 1] Υ_n (Σ_t x_t x_t')^{-1} Υ_n [0 1]' )
    = n^{3/2} (δ̂ − δ₀) / √( s² [0 1] Q_n^{-1} [0 1]' )

Since Q_n → Q and s² →^p σ², both t-ratios are asymptotically N(0,1) by the Prop above, so the usual t-tests remain valid.
35 Estimators that are n-consistent are usually called superconsistent. (Others use the term for estimators that are n^γ-consistent for γ > ½.)
Proof:

b̂ ≡ (α̂, δ̂)' = (X'X)^{-1} X'y = ( Σ_t x_t x_t' )^{-1} Σ_t x_t y_t = ( Σ_t x_t x_t' )^{-1} Σ_t x_t ( x_t' β + ε_t )

⇒ b̂ − β = ( Σ_t x_t x_t' )^{-1} Σ_t x_t ε_t,   with x_t = (1, t)':

Σ_t x_t x_t' = [ n, Σ_t t ; Σ_t t, Σ_t t² ] = [ n, n(n+1)/2 ; n(n+1)/2, n(n+1)(2n+1)/6 ]

To get the consistency result, let Υ_n ≡ diag(√n, n^{3/2}). Multiply both sides by Υ_n to get

Υ_n ( b̂ − β ) = Υ_n ( Σ_t x_t x_t' )^{-1} Σ_t x_t ε_t = ( Υ_n^{-1} Σ_t x_t x_t' Υ_n^{-1} )^{-1} ( Υ_n^{-1} Σ_t x_t ε_t ) ≡ Q_n^{-1} v_n

Now

Q_n = Υ_n^{-1} ( Σ_t x_t x_t' ) Υ_n^{-1} = [ 1, (n+1)/(2n) ; (n+1)/(2n), (n+1)(2n+1)/(6n²) ]

v_n = Υ_n^{-1} Σ_t x_t ε_t = ( (1/√n) Σ_t ε_t , (1/n^{3/2}) Σ_t t ε_t )' = ( (1/√n) Σ_t ε_t , (1/√n) Σ_t (t/n) ε_t )'

Clearly Q_n → Q = [ 1, 1/2 ; 1/2, 1/3 ], so Q_n^{-1} → Q^{-1}. Based on a CLT for nonstationary processes,

v_n →^D N( 0, σ² Q )

To see the variance, compute V = E(v_n v_n'), using E(ε_i ε_j) = E(ε_i)E(ε_j) = 0 for i ≠ j and E(ε_t²) = σ² for all t:

E[ (1/√n) Σ_t ε_t ]² = (1/n) Σ_t E(ε_t²) = σ²
E[ ( (1/√n) Σ_t ε_t ) ( (1/n^{3/2}) Σ_t t ε_t ) ] = (1/n²) Σ_t t E(ε_t²) = σ² (n+1)/(2n)
E[ (1/n^{3/2}) Σ_t t ε_t ]² = (1/n³) Σ_t t² E(ε_t²) = σ² (n+1)(2n+1)/(6n²)

so

E(v_n v_n') = σ² [ 1, (n+1)/(2n) ; (n+1)/(2n), (n+1)(2n+1)/(6n²) ] → σ² [ 1, 1/2 ; 1/2, 1/3 ] = σ² Q

∴ By Slutsky, Υ_n ( b̂ − β ) = Q_n^{-1} v_n →^D N( 0, Q^{-1} (σ² Q) Q^{-1} ) = N( 0, σ² Q^{-1} )   (since Q is symmetric)
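The two convergence rates in the proposition can be checked by simulation (a sketch with arbitrary α, δ, σ; not from the notes): the slope error shrinks at rate n^{3/2} while the intercept error shrinks only at rate √n.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, delta, sigma = 1.0, 0.5, 1.0   # illustrative true parameters

def ols_time_reg(n):
    # y_t = alpha + delta*t + eps_t, OLS on x_t = (1, t)'
    t = np.arange(1, n + 1, dtype=float)
    y = alpha + delta * t + rng.normal(0.0, sigma, n)
    X = np.column_stack([np.ones(n), t])
    return np.linalg.lstsq(X, y, rcond=None)[0]   # (alpha_hat, delta_hat)

for n in (100, 10000):
    a_hat, d_hat = ols_time_reg(n)
    print(n, abs(a_hat - alpha), abs(d_hat - delta))
# delta_hat's error shrinks much faster (rate n^{3/2}) than alpha_hat's (rate n^{1/2})
```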
Cointegration Regressions
This is the other case where we don't have stationarity, but we can still apply OLS to obtain consistent results.
1. Background: Modeling Trends The idea of this chapter is that economic time series can be represented as the sum of…
i. a linear time trend, ii. a stochastic trend, iii. and a stationary process
Def: Deterministic Trend / Linear Time Trend is a trend in the mean.
Def: Stochastic Trend is a Martingale36, a trend where the best predictor of future value is its current value (and thus every change in the trend seems to have a permanent effect on its future value).
36 Zt is a martingale if E(Zt | Zt-1, Zt-2, …) = Zt-1
Unit Root: Let Y_t = Y_{t-1} + ε_t (where ε_t is iid white noise).
1. Step Function (maps discrete time to cumulative shocks, i.e. Y_t):

W_n(u) ≡ ( 1 / (σ_ε √n) ) Σ_{t=1}^{⌊un⌋} ε_t

2. Asymptotic Results for W_n(u)
• CLT37: For any u ∈ [0,1], W_n(u) →^D N(0, u)
• FCLT: If ε_t is iid white noise, then ∀u ∈ [0,1], W_n(u) →^D W(u) (this is convergence of a sequence of functions). Note: This means that our step function converges to a Brownian motion in the limit, when the steps become finer and finer, or equivalently, when the shocks are accumulated in shorter and shorter periods of time until the process becomes "continuous time".
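A simulation sketch of the CLT part (illustrative parameters, not from the notes): across many simulated paths, the variance of W_n(u) is approximately u, consistent with W_n(u) →^D N(0, u).

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, sigma = 1000, 3000, 1.0

# W_n(t/n) = (1/(sigma*sqrt(n))) * sum of the first t shocks, for each path
eps = rng.normal(0.0, sigma, size=(reps, n))
cs = eps.cumsum(axis=1) / (sigma * np.sqrt(n))

for u in (0.25, 0.5, 1.0):
    k = int(np.floor(u * n))
    print(u, cs[:, k - 1].var())   # variance across paths is approximately u
```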
2. Useful Things to Keep in Mind in Deriving Asymptotic Convergence Results to Stochastic Integrals:

(1/n) Σ_{t=1}^{n} ( Σ_{s=1}^{t-1} ε_s ) ε_t →^D σ_ε² ∫₀¹ W(u) dW(u)

a) W_n(u) ≡ ( 1 / (σ_ε √n) ) Σ_{t=1}^{⌊un⌋} ε_t

b) W_n(t/n) = ( 1 / (σ_ε √n) ) Σ_{s=1}^{t} ε_s = Y_t / (σ_ε √n),   or   W_n((t−1)/n) = Y_{t-1} / (σ_ε √n)

c) W_n((t+v)/n) = ( 1 / (σ_ε √n) ) Σ_{s=1}^{t} ε_s = Y_t / (σ_ε √n)   for v ∈ [0, 1)   (W_n is flat between steps)

d) ∫_{t/n}^{(t+1)/n} W_n(u) du = (1/n) W_n(t/n)   (this follows from c)

e) (1/n) Σ_{t=1}^{n} W_n((t−1)/n) = ∫₀¹ W_n(u) du,   or   (1/n) Σ_{t=1}^{n} W_n(t/n) = ∫_{1/n}^{(n+1)/n} W_n(u) du

f) ( 1 / (σ_ε √n) ) (1/n) Σ_{t=1}^{n} Y_{t-1} = (1/n) Σ_{t=1}^{n} W_n((t−1)/n) = ∫₀¹ W_n(u) du
37 W_n(u) ≡ ( 1 / (σ_ε √n) ) Σ_{t=1}^{⌊un⌋} ε_t = √(⌊un⌋/n) · ( 1 / (σ_ε √⌊un⌋) ) Σ_{t=1}^{⌊un⌋} ε_t →^D N(0, u), since ( 1 / (σ_ε √⌊un⌋) ) Σ_{t=1}^{⌊un⌋} ε_t →^D N(0, 1) and ⌊un⌋/n → u.
Applications of Unit Root Tools
1. Unit Root and Regression Coefficient
Suppose we have the following unit root process (that is, let Y₀ = 0):

Y_t = Y_{t-1} + ε_t = Σ_{s=1}^{t} ε_s   (so Y_{t-1} = Σ_{s=1}^{t-1} ε_s)

What is the distribution of the sampling error when we regress Y_t on Y_{t-1}?
φ̂ − 1 = ( Σ_{t=1}^{n} Y_{t-1} ε_t ) / ( Σ_{t=1}^{n} Y_{t-1}² )

⇒ n ( φ̂ − 1 ) = [ (1/n) Σ_{t=1}^{n} Y_{t-1} ε_t ] / [ (1/n²) Σ_{t=1}^{n} Y_{t-1}² ]
             = [ σ_ε² · ( 1 / (σ_ε² n) ) Σ_{t=1}^{n} ( Σ_{s=1}^{t-1} ε_s ) ε_t ] / [ σ_ε² · (1/n) Σ_{t=1}^{n} W_n((t−1)/n)² ]

Since

( 1 / (σ_ε² n) ) Σ_{t=1}^{n} ( Σ_{s=1}^{t-1} ε_s ) ε_t →^D ∫₀¹ W(u) dW(u)   and   (1/n) Σ_{t=1}^{n} W_n((t−1)/n)² = ∫₀¹ W_n(u)² du →^D ∫₀¹ W(u)² du   (by the CMT),

the σ_ε² factors cancel and

n ( φ̂ − 1 ) →^D [ ∫₀¹ W(u) dW(u) ] / [ ∫₀¹ W(u)² du ]

This is the Dickey-Fuller distribution.
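A Monte Carlo sketch of this limit (illustrative sample sizes, not from the notes): the simulated distribution of n(φ̂ − 1) is skewed to the left, with most mass below zero, unlike a centered normal.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 500, 3000

stats = np.empty(reps)
for r in range(reps):
    eps = rng.normal(size=n)
    Y = np.cumsum(eps)                        # Y_t = Y_{t-1} + eps_t, Y_0 = 0
    Ylag, Ycur = Y[:-1], Y[1:]
    phi_hat = (Ylag @ Ycur) / (Ylag @ Ylag)   # OLS of Y_t on Y_{t-1}, no constant
    stats[r] = n * (phi_hat - 1.0)

# The Dickey-Fuller limit (int W dW)/(int W^2 du) is skewed left:
print(stats.mean(), np.mean(stats < 0))       # negative mean, most mass below 0
```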
2. AR(1) with Unit Root and Constant
Suppose we have the following unit root process: Y_t = μ + Y_{t-1} + ε_t (here each period there are 2 terms in the shock). What is the distribution of the sampling error of the OLS estimate from regressing Y_t on (1, Y_{t-1})?
With an intercept, the OLS slope is computed from the demeaned regressor:

φ̂ − 1 = [ Σ_{t=1}^{n} ( Y_{t-1} − Ȳ_{-1} ) ε_t ] / [ Σ_{t=1}^{n} ( Y_{t-1} − Ȳ_{-1} )² ],   Ȳ_{-1} ≡ (1/n) Σ_{t=1}^{n} Y_{t-1}

Now (taking Y₀ = 0)

Y_{t-1} = Σ_{s=1}^{t-1} ( μ + ε_s ) = μ (t−1) + Σ_{s=1}^{t-1} ε_s,   and   (1/n) Σ_{t=1}^{n} (t−1) = (n−1)/2

so the stochastic part satisfies

( 1/√n ) ( Y_{t-1} − μ(t−1) ) = σ_ε W_n( (t−1)/n )
(1/n) Σ_{t=1}^{n} σ_ε W_n( (t−1)/n ) = σ_ε ∫₀¹ W_n(u) du →^D σ_ε ∫₀¹ W(u) du

⇒ ( 1/√n ) ( Ȳ_{-1} − μ(n−1)/2 ) = σ_ε (1/n) Σ_{t=1}^{n} W_n( (t−1)/n )

Therefore the demeaned regressor decomposes into a stochastic and a deterministic-trend component,

Y_{t-1} − Ȳ_{-1} = σ_ε √n [ W_n((t−1)/n) − (1/n) Σ_{s=1}^{n} W_n((s−1)/n) ] + μ [ (t−1) − (n−1)/2 ],

and the sampling error φ̂ − 1 is the ratio of sums built from these two components.
3. Spurious Regression Example
Suppose we're estimating Y_t = β X_t + e_t by OLS, where Y_t and X_t are independent random walks:

Y_t = Y_{t-1} + ε_t = Σ_{s=1}^{t} ε_s,   X_t = X_{t-1} + η_t = Σ_{s=1}^{t} η_s,
with ( 1 / (σ_ε √n) ) Y_{⌊un⌋} = W_n(u),   ( 1 / (σ_η √n) ) X_{⌊un⌋} ≡ V_n(u)

(W and V are independent Brownian motions in the limit). Then

β̂ = ( Σ_t X_t Y_t ) / ( Σ_t X_t² ) = [ (1/n²) Σ_t X_t Y_t ] / [ (1/n²) Σ_t X_t² ]
   = [ σ_ε σ_η (1/n) Σ_t W_n(t/n) V_n(t/n) ] / [ σ_η² (1/n) Σ_t V_n(t/n)² ]
   →^D ( σ_ε / σ_η ) [ ∫₀¹ W(u) V(u) du ] / [ ∫₀¹ V(u)² du ]

So β̂ does not converge in probability to the true value 0; it converges in distribution to a nondegenerate random variable.
What is the distribution of the variance of the errors?

ê_t = Y_t − β̂ X_t ⇒ ê_t² = Y_t² − 2 β̂ X_t Y_t + β̂² X_t²

⇒ (1/n²) Σ_t ê_t² = (1/n²) Σ_t Y_t² − 2 β̂ (1/n²) Σ_t X_t Y_t + β̂² (1/n²) Σ_t X_t²

Using (1/n²) Σ_t Y_t² →^D σ_ε² ∫₀¹ W(u)² du, (1/n²) Σ_t X_t Y_t →^D σ_ε σ_η ∫₀¹ W(u) V(u) du, (1/n²) Σ_t X_t² →^D σ_η² ∫₀¹ V(u)² du, and the limit of β̂ above,

(1/n²) Σ_t ê_t² →^D σ_ε² ∫₀¹ W(u)² du − σ_ε² [ ∫₀¹ W(u) V(u) du ]² / [ ∫₀¹ V(u)² du ]

So the error variance s² = (1/n) Σ_t ê_t² diverges at rate n. Finally, since

Var-hat(β̂) = [ (1/n) Σ_t ê_t² ] / [ Σ_t X_t² ] = (1/n) · [ (1/n²) Σ_t ê_t² ] / [ (1/n²) Σ_t X_t² ] →^p 0

the conventional standard error of β̂ shrinks to 0 even though β̂ itself converges to a nondegenerate random variable, so the usual t-ratio diverges and the regression appears (spuriously) significant.
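A simulation sketch of the spurious-regression phenomenon (illustrative sample sizes and replication counts, not from the notes): regressing one random walk on an independent one produces conventional |t|-ratios that grow with the sample size instead of settling near zero.

```python
import numpy as np

rng = np.random.default_rng(7)

def spurious_abs_t(n):
    Y = np.cumsum(rng.normal(size=n))    # two independent random walks
    X = np.cumsum(rng.normal(size=n))
    beta = (X @ Y) / (X @ X)             # OLS of Y_t on X_t, no constant
    e = Y - beta * X
    se = np.sqrt((e @ e) / n / (X @ X))  # conventional OLS standard error
    return abs(beta / se)

for n in (100, 400, 1600):
    t_bar = np.mean([spurious_abs_t(n) for _ in range(300)])
    print(n, t_bar)   # average |t| grows with n: spurious "significance"
```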