TRANSCRIPT
Time Series: Given joint ergodic stationarity between Y and X, when we have serially correlated errors, our consistency results do not change. However, our asymptotic distribution results do - we need to generalize CLT’s to obtain asymptotic normality for serially correlated processes with the “right” variance. (The generalization is possible under certain conditions by restricting the degree of serial correlation, and the condition is transparent for linear stochastic processes) Probability Measure, Sigma Algebra (Sigma Algebra is what we define our measures on), Borel Fields Def: The set S of all possible outcomes of a particular experiment is called the sample space of the experiment. Def: A collection of subsets of S, denoted B, is called a sigma field or sigma algebra if it satisfies the following:
1. Empty set: ∅ ∈ B
2. Complements: If A ∈ B, then A^c = S \ A ∈ B
3. Countable unions: If A₁, A₂, … ∈ B, then ∪_{i=1}^∞ A_i ∈ B
Note (2'): B is also closed under countable intersections. Pf: If A₁, A₂, … ∈ B, then each A_i^c ∈ B, so ∪_{i=1}^∞ A_i^c ∈ B, and hence ∩_{i=1}^∞ A_i = (∪_{i=1}^∞ A_i^c)^c ∈ B, by De Morgan's Laws.
Def: P is a probability measure on the pair (S,B) if P satisfies:
1. P(A) ≥ 0 for all A ∈ B
2. P(S) = 1
3. If A₁, A₂, … ∈ B are pairwise disjoint, then P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i)
Def: Let X: (Ω,F) → (R, B) be F-measurable. A Borel field is the smallest σ-field that makes X measurable, given by: σ(X) ≡ {G ⊆ Ω : G = X⁻¹(β) for some β ∈ B}
(Think of these as the only sets in the universe that the random variable gives us information about, since they are the sets that are preimages of all the possible outcomes of the r.v. So the random variable X is informative about members of σ(X), but no more than that!) Probability Space, Random Variables, and Measurability Def: The triple (Ω,F,P) is called a probability space,
where Ω is the “universe” (or the whole set of outcomes, like S), F is the σ-field on Ω (like B), and P is the underlying probability measure that governs all random variables, i.e. a probability measure on (Ω,F)
Def: A random variable is a function from the sample space into the real numbers, or a measurable mapping from (Ω,F) into (R, B). (So, for a random variable X: (Ω,F) → (R, B), the sample space for X is R.) Def: A random variable X: (Ω,F) → (R, B) is F-measurable if the preimage {ω ∈ Ω : X(ω) ∈ β} ∈ F for all β ∈ B (all the events in B can be mapped back to F and be measured there). Note: X(ω), ω ∈ Ω (the universe), is a random variable that induces a probability measure PX on (R, B).
PX is defined from P (a probability measure on (Ω,F)) by PX(β) ≡ Pr{X takes on values in β} = P({ω ∈ Ω : X(ω) ∈ β}) = P(X ∈ β) for β ∈ B
Def: A random variable Y = g(X): (RX, BX) → (RY, BY) induces the probability measure PY on the sample space RY as follows: for A ∈ BY, PY(A) ≡ P(Y ∈ A) = PX({x ∈ RX : Y = g(x) ∈ A})
Prop: Let F and G be 2 σ-fields s.t. G ⊆ F (all the sets in G are also in F). If a random variable X is G-measurable, then X is F-measurable¹.
Conditional Expectations and Law of Iterated Expectations
Def: Let X and Y be real-valued random variables on (Ω,F,P) and let G = σ(X). Suppose E|Y| < ∞. The conditional expected value of Y given X is a random variable (a function of X) that satisfies the following 3 conditions²:
1. E|E(Y|X)| < ∞
2. E(Y|X) is G-measurable, i.e. ∀β ∈ B, {ω ∈ Ω : E(Y|X)(ω) ∈ β} ∈ σ(X) (E(Y|X) is as informative as X, but no more sophisticated than X)
3. For all G ∈ G, ∫_G E(Y|X)(ω) dP(ω) = ∫_G Y(ω) dP(ω)
¹ Pf: ∀β ∈ B, {ω ∈ Ω : X(ω) ∈ β} ∈ G ⊆ F. ² Y always satisfies 1 and 3. But Y will only satisfy 2 if σ(Y) ⊆ σ(X), i.e. if Y is no more informative than X. So it is typically not possible to use Y itself as E(Y|X).
Alternative representation of E(Y|X) and usefulness: E(Y|X) = E(Y|σ(X) ) = E(Y|G)
We do this because when X takes on certain values, it maps to values in the preimage, or equivalently the Borel field. Example: Let E(Y|X) = E(Y|σ(X)) = E(Y|G) and E(Y|X,Z) = E(Y|σ(X,Z)) = E(Y|H). Since G ⊆ H, E(E(Y|X,Z)|X) = E(E(Y|H)|G) = E(Y|G) = E(Y|X). Law of Iterated Expectations: E(Y) = E_X[E(Y|X)]³
Generalized Law of Iterated Expectations: For G ⊆ H (G is a less fine partition than H), E(Y|G) = E[E(Y|H)|G] = E[E(Y|G)|H]⁴
Property of Conditional Expectation: For real-valued random variables Y and X, we have E(YX|X) = E(Y|X)X.
Basic Time Series / Stochastic Process Concepts:
Stochastic process: a sequence of random variables {z_i}.
Realization/Sample Path of a stochastic process: a realization of {z_i}, a sequence of real numbers.
Time Series: If the sequence of rv's is indexed by time, the stochastic process is called a time series. (We will often use "time series" to refer to both the realization and the stochastic process.)
Ensemble Mean: E(z_i) (the "true" mean of each of the rv's in the sequence).
Important Point: We can always think of our data {z_i} as a sample path of an infinite sequence of random variables {…,Z₋₁,Z₀,Z₁,Z₂,…}, where the data is a realization of a particular subsequence of the infinite sequence.
What can we say about the underlying random variables that generate our data/time series?
Strictly Stationary Processes⁵: A stochastic process/sequence of RV's (e.g. a time series) {z_i} (i = 1,2,…) is (strictly) stationary if F(Z_t,…,Z_{t+K}) does not depend on t for all K = 0,1,2,… ⇔ F(Z_t,…,Z_{t+K}) = F(Z_1,…,Z_{1+K}) for all t and K.
In particular, F(Z_t) does not depend on t! (All observations come from the same distribution. Though this says identical, it does not say independent: identically but not necessarily independently distributed.)
Prop: If {z_i} (i = 1,2,…) is (strictly) stationary, then {g(z_i)} (i = 1,2,…) is (strictly) stationary for any continuous function g⁶.
Weakly Stationary Process: A stochastic process/sequence of RV's {z_i} (i = 1,2,…) is weakly (or covariance) stationary if:
i. E(z_i) does not depend on i, and
ii. Cov(z_i, z_{i−j}) exists, is finite, and depends only on j but not on i (e.g. Cov(z₁,z₅) = Cov(z₁₂,z₁₆)).
White Noise Processes⁷: A covariance stationary process {z_i} is a white noise if:
i. E(z_i) = 0, and
ii. Cov(z_i, z_{i−j}) = 0 for j ≠ 0.
Note: This is an important class of weakly stationary processes. It is a process with 0 mean and no serial correlation.
³ By condition 3 in the definition of conditional expectation, since Ω ∈ σ(X) and E(Y|X) is clearly measurable on all of Ω, taking G = Ω gives E(E(Y|X)) = ∫_Ω E(Y|X)(ω) dP(ω) = ∫_Ω Y(ω) dP(ω) = E(Y).
⁴ So the usual law of iterated expectations is a special case where G = {Ω, ∅}, because E(Y|G) = E(Y) in this case. Remember, E(Y) is just taking the expectation over the trivial sigma field.
⁵ Note: So, the joint distribution of (z_{i1}, z_{i2}, …, z_{ir}) depends only on i₁−i, i₂−i, …, i_r−i but not on i (for any given finite integer r and any set of subscripts i₁,…,i_r). (E.g. the joint distribution of (z₁,z₅) is the same as that of (z₁₂,z₁₆): what matters is the relative position in the sequence, not the absolute position!)
⁷ A white noise process that is not strictly stationary: Let w ~ unif(0,2π) and define z_i = cos(iw) for i = 1,2,… Then E(z_i) = 0, Var(z_i) = ½, and Cov(z_i, z_j) = 0 for i ≠ j. So {z_i} is a white noise process, but it is clearly not an independent white noise process (since each z_i depends on w) nor strictly stationary (since the joint distribution depends on i). (WHY????)
Sketch (for a fixed i): since w ~ unif(0,2π), z_i = cos(iw) has the arcsine distribution on (−1,1), with density f(z) = 1/(π√(1−z²)). Hence E(Z_i) = ∫₋₁¹ z f(z) dz = (1/π)[−√(1−z²)]₋₁¹ = 0 (also by symmetry of f), and Var(Z_i) = ∫₋₁¹ z² f(z) dz = ½, for every i.
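A quick Monte Carlo check of this example can be run in Python (a sketch; the sample size and seed are illustrative assumptions, not from the notes):

```python
import numpy as np

# Simulate the footnote-7 process: z_i = cos(i*w) with a single w ~ unif(0, 2*pi).
rng = np.random.default_rng(0)
w = rng.uniform(0.0, 2.0 * np.pi, size=200_000)  # many independent draws of w
z1, z2 = np.cos(w), np.cos(2 * w)

print(np.mean(z1))       # ~ 0   : E(z_i) = 0
print(np.var(z1))        # ~ 0.5 : Var(z_i) = 1/2
print(np.mean(z1 * z2))  # ~ 0   : Cov(z_1, z_2) = 0, so {z_i} is white noise...
print(np.max(np.abs(z2 - (2 * z1**2 - 1))))  # ~ 0: yet z_2 = 2*z_1^2 - 1 (double angle), fully dependent
```

The last line makes the dependence concrete: z₂ is an exact deterministic function of z₁, even though the two are uncorrelated.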
Independent White Noise Process: An independently and identically distributed (iid) sequence with 0 mean and finite variance is a special case of a white noise process. (But generally, the elements of a white noise process may not come from the same distribution and may not be independent of each other. Independence and identical distribution are much stronger assumptions!)
Ergodic Process⁸: A stationary process {z_i} is said to be ergodic if, for any two bounded functions f: ℝ^{k+1} → ℝ and g: ℝ^{l+1} → ℝ,
lim_{n→∞} E[f(z_i,…,z_{i+k})·g(z_{i+n},…,z_{i+n+l})] = E[f(z_i,…,z_{i+k})]·E[g(z_{i+n},…,z_{i+n+l})]
(This requires approximate independence between elements sufficiently far apart, i.e. (z_i,…,z_{i+k}) approximately independent of (z_{i+n},…,z_{i+n+l}) for n large.)
Heuristically, a stationary process is ergodic (i.e. ergodic stationarity) if it is asymptotically independent, i.e. any 2 rv's positioned far apart in the sequence are almost independently distributed. (Intuitively, ergodicity says that the process is not "too persistent".)
Ergodic stationarity is important in developing large sample theory because of the ergodic theorem.
Ergodic Theorem: Let {z_i}_{i=1}^n be a stationary and ergodic process with E(z_i) = μ. Then
z̄_n ≡ (1/n) Σ_{i=1}^n z_i → μ a.s.
Idea: This generalizes Kolmogorov's LLN⁹. The ergodic theorem allows for serial dependence (whereas Kolmogorov's rules it out by the iid assumption), provided that the serial dependence disappears in the "long run" (by stationary ergodicity), i.e. asymptotically in large samples.
Implication: Any moment of a stationary and ergodic process (if it exists and is finite) is consistently estimated by the sample moment¹⁰. (iid is a special case!)
Martingale & Martingale Difference
Martingale Vector Process: A vector process {g_i} is called a martingale if E(g_i | g_{i−1}, …, g_1) = g_{i−1} for i ≥ 2. Note: The conditioning set g_{i−1}, …, g_1 is often called the information set, and {g_i} is called a martingale since its information set is its own past values.
Martingale Difference Sequence: A vector process {g_i} with E(g_i) = 0 is called a martingale difference sequence (m.d.s.) or martingale difference if the expectation conditional on its past values is also 0: E(g_i | g_{i−1}, …, g_1) = 0 for i ≥ 2.
Implication: A martingale difference sequence has no serial correlation: Cov(g_i, g_{i−j}) = 0 for all i and j ≠ 0¹¹.
Ergodic Stationary Martingale Differences CLT: Let {g_i} be a vector martingale difference sequence that is stationary and ergodic with E(g_i g_i') = Σ¹², and let ḡ_n ≡ (1/n) Σ_{i=1}^n g_i. Then
√n ḡ_n = (1/√n) Σ_{i=1}^n g_i →_D N(0, Σ)
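A small simulation illustrates the CLT (a sketch; the particular m.d.s. g_t = ε_t ε_{t−1} and the Monte Carlo sizes are illustrative assumptions, not from the notes):

```python
import numpy as np

# g_t = eps_t * eps_{t-1} with iid N(0,1) eps_t is stationary, ergodic, and an
# m.d.s. (E(g_t | past) = eps_{t-1} * E(eps_t) = 0), but not an independent sequence.
# The CLT predicts sqrt(n)*gbar -> N(0, Sigma) with Sigma = E(g_t^2) = 1 here.
rng = np.random.default_rng(1)
n, n_reps = 1_000, 5_000
eps = rng.normal(size=(n_reps, n + 1))
g = eps[:, 1:] * eps[:, :-1]               # n_reps independent sample paths of g_t
stat = np.sqrt(n) * g.mean(axis=1)         # sqrt(n) * gbar_n for each path
print(round(stat.mean(), 3), round(stat.var(), 3))   # ~ 0 and ~ 1
```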
1c. Autocorrelation, Ergodic Stationarity, and Martingale Difference Sequences
It should be noted that autocorrelation in {Z_i} does not affect ergodicity or stationarity (just check the definitions). However, an m.d.s. implies no autocovariance! This is why the model of Ch. 2 (specifically Assumption 2.5) is changed by the assumption that there is autocorrelation: consistency results do not change, but we need to invoke a different CLT to get asymptotic normality, and the asymptotic variance is different.
Pf: Suppose Z_i is an m.d.s. Then Cov(Z_i, Z_{i−k}) = E(Z_i Z_{i−k}) − E(Z_i)E(Z_{i−k}) = E[Z_{i−k} E(Z_i | Z_{i−1},…)] − E[E(Z_i | Z_{i−1},…)]E(Z_{i−k}) = 0 by the m.d.s. assumption.
⁸ Need for Ergodic Stationarity: The fundamental problem in time-series analysis is that we observe the realization of the process only once. (I.e. we get only 1 sample and 1 observation of the realized {z_i}!) Ideally, we would like to observe history many times over to obtain more samples, but clearly this is not feasible. But if each of the z_i's comes from the same distribution (stationarity), then we can view each realization of {z_i} as n realizations from the same distribution. Furthermore, if the process is not too persistent (ergodicity), then each element of {z_i} will contain some information not available from the other elements. In this case, the time average over the elements of {z_i} will be consistent for the ensemble mean!
⁹ Kolmogorov's Second Strong Law of Large Numbers: Let {z_i}_{i=1}^n be iid with E(z_i) = μ. Then z̄_n ≡ (1/n) Σ_{i=1}^n z_i → μ a.s.
¹⁰ Since if {z_i} is ergodic stationary, then {f(z_i)} is also ergodic stationary for any measurable function f(·).
¹¹ Cov(g_i, g_{i−j}) = E(g_i g_{i−j}') − E(g_i)E(g_{i−j})' = E(g_i g_{i−j}') = E[E(g_i g_{i−j}' | g_{i−1},…,g_1)] = E[E(g_i | g_{i−1},…,g_1) g_{i−j}'] = 0, since E(g_i | g_{i−1},…,g_1) = 0.
¹² Since the process is stationary, the matrix of cross moments E(g_i g_i') does not depend on i. Also, we implicitly assume that all the cross moments exist and are finite.
Linear Processes: An important class of covariance-stationary processes, which are created by taking a moving average of a white noise process¹³.
1. MA(q): A process {y_t} is called the q-th order moving average process (MA(q)) if it can be written as a weighted average of the current and most recent q values of a white noise process:
y_t = μ + θ₀ε_t + θ₁ε_{t−1} + … + θ_q ε_{t−q} with θ₀ = 1, or y_t = μ + Σ_{j=0}^q θ_j ε_{t−j} with θ₀ = 1
An MA(q) process is covariance-stationary with mean μ and j-th order autocovariance¹⁴:
γ_j = Cov(y_t, y_{t−j}) = (θ_j θ₀ + θ_{j+1}θ₁ + … + θ_q θ_{q−j})σ² = σ² Σ_{k=0}^{q−j} θ_{j+k}θ_k for j = 0,1,…,q
γ_j = 0 for j > q
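The autocovariance formula can be checked numerically (a sketch; the MA(2) coefficients, sample size, and seed are illustrative assumptions, not from the notes):

```python
import numpy as np

# Population autocovariances of an MA(q): gamma_j = sigma^2 * sum_k theta_{j+k}*theta_k,
# compared against sample autocovariances from one long simulated path.
theta = np.array([1.0, 0.5, -0.3])   # theta_0 = 1, theta_1, theta_2  (q = 2)
sigma2, q = 1.0, len(theta) - 1

def gamma_ma(j):
    return sigma2 * np.sum(theta[j:] * theta[:q - j + 1]) if j <= q else 0.0

rng = np.random.default_rng(2)
n = 500_000
eps = rng.normal(scale=np.sqrt(sigma2), size=n + q)
y = sum(theta[j] * eps[q - j:n + q - j] for j in range(q + 1))  # y_t = sum_j theta_j eps_{t-j}

for j in range(q + 2):
    sample = np.mean((y[j:] - y.mean()) * (y[:n - j] - y.mean()))
    print(j, round(gamma_ma(j), 3), round(sample, 3))   # gamma_3 = 0: dies out after q lags
```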
MA(∞) as a Mean Square Limit: An MA(q) process dies out after q lags (γ_j = 0 for j > q). Though some time series have this property, we also want to learn to model serial correlation that depends on the infinite past:
MA(∞): y_t = μ + Σ_{j=0}^∞ ψ_j ε_{t−j}, where {ψ_j} is a sequence of reals with Σ_{j=0}^∞ |ψ_j| < ∞ (absolutely summable)¹⁵
Prop 1: MA(∞) with absolutely summable coefficients
Let {ε_t} be white noise and {ψ_j} be a sequence of real numbers that is absolutely summable. Then:
(a) For each t, y_t = μ + Σ_{j=0}^∞ ψ_j ε_{t−j} converges in mean square (L² convergence), i.e. μ + Σ_{j=0}^n ψ_j ε_{t−j} → y_t in m.s. as n → ∞, so that y_t is a meaningful limit of the partial sums.
(b) {y_t} is covariance-stationary with E(y_t) = μ and autocovariances given by γ_j = Cov(y_t, y_{t−j}) = (ψ_j ψ₀ + ψ_{j+1}ψ₁ + …)σ² = σ² Σ_{k=0}^∞ ψ_{j+k}ψ_k
(c) The autocovariances are absolutely summable: Σ_{j=0}^∞ |γ_j| < ∞
(d) If {ε_t} is iid white noise, then {y_t} is strictly stationary and ergodic
(Note: this includes MA(q) as a special case since absolute summability is trivially met for q finite)
¹³ A white noise process {ε_t} is a 0-mean covariance stationary process (covariances between terms do not depend on t) with no serial correlation: E(ε_t) = 0, E(ε_t²) = σ² > 0, E(ε_t ε_{t−j}) = Cov(ε_t, ε_{t−j}) = 0 for j ≠ 0.
¹⁴ Derivation: E(y_t) = μ + θ₀E(ε_t) + θ₁E(ε_{t−1}) + … + θ_q E(ε_{t−q}) = μ, which does not depend on t.
For 0 < j ≤ q:
Cov(y_t, y_{t−j}) = Cov(θ₀ε_t + … + θ_q ε_{t−q}, θ₀ε_{t−j} + … + θ_q ε_{t−j−q})
Since white noise has no covariance between distinct ε's, only the overlapping terms survive: the ε's common to both sums are ε_{t−j−k} for k = 0,…,q−j, carrying coefficient θ_{j+k} in the first sum and θ_k in the second. Hence, by stationarity of the white noise,
γ_j = Σ_{k=0}^{q−j} θ_{j+k}θ_k Var(ε_{t−j−k}) = σ² Σ_{k=0}^{q−j} θ_{j+k}θ_k
¹⁵ However, for this to work we need the infinite series to converge to a random variable; absolute summability of {ψ_j} guarantees this.
Prop 2: Filtering covariance stationary processes (this generalizes the above proposition from white noise to covariance-stationary inputs)
Let {x_t} be covariance stationary and {h_j} be a sequence of real numbers that is absolutely summable. Then:
(a) For each t, y_t = Σ_{j=0}^∞ h_j x_{t−j} converges in mean square (L² convergence)¹⁶, i.e. Σ_{j=0}^n h_j x_{t−j} → y_t in m.s. as n → ∞, so that y_t is a meaningful limit, and {y_t} is covariance stationary.
(b) If the autocovariances of {x_t} are absolutely summable, then so are the autocovariances of {y_t}.
¹⁶ Recall the following results about mean square convergence (they are also the key ingredients in the proof of Prop 1):
(i) {z_n} converges in mean square iff E[(z_m − z_n)²] → 0 as m, n → ∞ (Cauchy criterion).
(ii) If x_n → x and z_n → z in m.s., then lim_{n→∞} E(x_n) = E(x) and lim_{n→∞} E(x_n z_n) = E(xz).
2. Filters
Background: The operation of taking a weighted average of (possibly infinitely many) successive values of a process, i.e. y_t = μ + Σ_{j=0}^∞ ψ_j ε_{t−j}, is called filtering, and can be expressed compactly using the lag operator: L^j x_t = x_{t−j}.
Def: For a given arbitrary sequence of reals {α₀, α₁, …}, define a filter by α(L) ≡ α₀ + α₁L + α₂L² + …
Therefore, applying it to a process {x_t}: α(L)x_t = α₀x_t + α₁Lx_t + α₂L²x_t + … = α₀x_t + α₁x_{t−1} + α₂x_{t−2} + … = Σ_{k=0}^∞ α_k x_{t−k}
Note: If x_t = c (a constant), α(L)c = α(1)c = (α₀ + α₁ + α₂ + …)c.
Def: If α_p ≠ 0 and α_j = 0 for j > p, then the filter reduces to a p-th degree lag polynomial:
α(L) = α₀ + α₁L + α₂L² + … + α_p L^p
[Note: We can treat these like polynomials in terms of operations: multiplication, inverting, roots, etc.]
Def: A filter with absolutely summable coefficients is called an absolutely summable filter. It is a mapping from the set of covariance stationary processes to itself.
Product of Filters
Def: Let {α_j}, {β_j} be two arbitrary sequences of reals, with α(L) ≡ α₀ + α₁L + α₂L² + … and β(L) ≡ β₀ + β₁L + β₂L² + …
Then the product of the filters α(L), β(L) is δ(L) = α(L)β(L), where {δ_j} is defined as the convolution of {α_j}, {β_j}:
δ₀ = α₀β₀, δ₁ = α₀β₁ + α₁β₀, δ₂ = α₀β₂ + α₁β₁ + α₂β₀, …, δ_j = α₀β_j + α₁β_{j−1} + … + α_j β₀, …
Example: for α(L) = 1 + α₁L, β(L) = 1 + β₁L ⇒ δ(L) = α(L)β(L) = (1 + α₁L)(1 + β₁L) = 1 + (α₁ + β₁)L + α₁β₁L²
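Since the product filter's coefficients are a convolution of the two coefficient sequences, `np.convolve` reproduces the example directly (a sketch; the coefficient values are illustrative assumptions):

```python
import numpy as np

# delta(L) = alpha(L)*beta(L): the coefficients of the product filter are the
# convolution of the two coefficient sequences.
alpha = [1.0, 0.4]    # alpha(L) = 1 + 0.4 L
beta = [1.0, 0.25]    # beta(L)  = 1 + 0.25 L
delta = np.convolve(alpha, beta)
print(delta)          # [1.   0.65 0.1 ] = 1 + (alpha_1 + beta_1) L + alpha_1*beta_1 L^2
```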
Property: Filter multiplication is commutative: α(L)β(L) = β(L)α(L). (Like ordinary matrix multiplication, commutativity will fail for matrix filters.)
Prop: If α(L), β(L) are absolutely summable and {x_t} is covariance stationary, then by Prop 2, α(L)β(L)x_t is a well-defined random variable, equal to δ(L)x_t: α(L)β(L) = δ(L) ⇒ α(L)β(L)x_t = δ(L)x_t
Inverses of Filters
Def: We say that β(L) is the inverse of α(L) if δ(L) = α(L)β(L) = 1; we denote β(L) = α(L)⁻¹.
(I.e. the inverse α(L)⁻¹ is a filter satisfying α(L)α(L)⁻¹ = 1.)
Note: As long as α₀ ≠ 0, the inverse α(L)⁻¹ of α(L) can be defined for any arbitrary sequence {α_j}, because δ(L) = α(L)β(L) = 1 ⇒ δ₀ = 1 and δ_j = 0 for j > 0. Then we can solve the following successively:
δ₀ = α₀β₀ = 1, δ₁ = α₀β₁ + α₁β₀ = 0, δ₂ = α₀β₂ + α₁β₁ + α₂β₀ = 0, …, δ_j = α₀β_j + α₁β_{j−1} + … + α_j β₀ = 0, …
⇒ β₀ = 1/α₀, β₁ = −(α₁/α₀)β₀, β₂ = −(α₁β₁ + α₂β₀)/α₀, …
Example: Consider the filter 1 − L. Then (1 − L)⁻¹ = 1 + L + L² + L³ + …
Why? (1 − L)x_t = x_t − x_{t−1}, and (1 + L + L² + …)(x_t − x_{t−1}) = (x_t − x_{t−1}) + (x_{t−1} − x_{t−2}) + … = x_t, i.e. (1 + L + L² + …)(1 − L)x_t = x_t.
Prop: Inverses are commutative: α(L)⁻¹α(L) = α(L)α(L)⁻¹ = 1, and, provided α₀ ≠ 0 and β₀ ≠ 0, δ(L) = α(L)β(L) ⇔ β(L) = α(L)⁻¹δ(L) ⇔ α(L) = δ(L)β(L)⁻¹ (this does not apply to matrix filters).
Inverting Lag Polynomials ("finite" filters)
Let φ(L) be a p-th degree lag polynomial, s.t. φ(L) = 1 − φ₁L − φ₂L² − … − φ_p L^p. Its inverse β(L) = φ(L)⁻¹ can be defined as follows. Recall the convolution of φ(L) and β(L): δ(L) = φ(L)β(L) gives
δ₀ = β₀, δ₁ = β₁ − φ₁β₀, δ₂ = β₂ − φ₁β₁ − φ₂β₀, …, δ_j = β_j − φ₁β_{j−1} − φ₂β_{j−2} − … − φ_p β_{j−p}, …
And if β(L) is the inverse of φ(L), then δ(L) = φ(L)β(L) = 1 ⇒ δ₀ = 1 and δ_j = 0 for j > 0.
Thus (since φ₀ = 1): β₀ = 1, β₁ = φ₁, β₂ = φ₁β₁ + φ₂ = φ₁² + φ₂, β₃ = φ₁β₂ + φ₂β₁ + φ₃, …
Takeaway: for any p-th degree lag polynomial φ(L) with φ₀ = 1, its inverse is defined/exists! (though it may not be absolutely summable unless the invertibility condition is satisfied)
Prop 3: Absolutely Summable Inverses of Lag Polynomials
Consider a p-th degree lag polynomial φ(L) = 1 − φ₁L − φ₂L² − … − φ_p L^p and let ψ(L) = φ(L)⁻¹. Suppose the associated p-th degree polynomial equation φ(z) = 0, where φ(z) ≡ 1 − φ₁z − φ₂z² − … − φ_p z^p = (1 − λ₁z)(1 − λ₂z)…(1 − λ_p z), satisfies the following stability condition:
a) all roots z of the p-th degree polynomial equation φ(z) = 0 are such that |z| > 1; equivalently, |λ_i| < 1 for all i.
Then the coefficient sequence {ψ_j} is bounded in absolute value by a geometrically declining sequence and hence is absolutely summable. (This will help us express AR processes as "well-defined" MA processes.)
• Example: Consider a lag polynomial of degree 1: φ(L) = 1 − φL. The root of the associated polynomial equation 1 − φz = 0 is z = 1/φ. The stability condition is |1/φ| > 1, or |φ| < 1. Then the inverse filter
ψ(L) = φ(L)⁻¹ = 1 + φL + φ²L² + … = Σ_{j=0}^∞ φ^j L^j, which evidently
has absolutely summable coefficients.
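The successive-solution recursion for β(L) = α(L)⁻¹ can be coded directly; for α(L) = 1 − φL it recovers the geometric coefficients above (a sketch; the function name and parameter values are illustrative assumptions, not from the notes):

```python
import numpy as np

def inverse_filter(alpha, n_terms):
    """First n_terms coefficients of alpha(L)^{-1}, solved successively from
    delta_0 = 1 and delta_j = alpha_0*beta_j + ... + alpha_j*beta_0 = 0 (j > 0)."""
    alpha = np.asarray(alpha, dtype=float)   # requires alpha[0] != 0
    beta = np.zeros(n_terms)
    beta[0] = 1.0 / alpha[0]
    for j in range(1, n_terms):
        acc = sum(alpha[k] * beta[j - k] for k in range(1, min(j, len(alpha) - 1) + 1))
        beta[j] = -acc / alpha[0]
    return beta

phi = 0.5
b = inverse_filter([1.0, -phi], 6)           # (1 - phi L)^{-1}
print(b)                                     # [1. 0.5 0.25 0.125 0.0625 0.03125]
print(np.convolve([1.0, -phi], b)[:6])       # [1. 0. 0. 0. 0. 0.] : alpha(L)*beta(L) = 1
```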
We will use this to show that AR(1) can be expressed as MA(∞).
Note: Why do we need the stability condition? The stability condition guarantees that the inverse is absolutely summable, and therefore that we can multiply both sides of the equation by the inverse filter. (It wouldn't make sense to multiply both sides by infinity.)
3. ARMA Processes: An important class of linear processes (covariance-stationary processes generated from white noise), which are a parameterization of the coefficients of an MA(∞) process (the idea here is to be able to represent covariance stationary processes with a parsimonious parameterization of coefficients).
Def: A First-Order Autoregressive Process AR(1) satisfies the following stochastic difference equation:
y_t = μ + φy_{t−1} + ε_t ⇔ y_t − φy_{t−1} = μ + ε_t ⇔ (1 − φL)y_t = μ + ε_t, where {ε_t} is white noise
I. MA(∞) representation of AR(1), as long as |φ| < 1 (the stability condition, to guarantee absolute summability)¹⁷:
(1 − φL)y_t = μ + ε_t ⇒ y_t = (1 − φL)⁻¹(μ + ε_t) = (Σ_{j=0}^∞ φ^j L^j)μ + (Σ_{j=0}^∞ φ^j L^j)ε_t = μ/(1 − φ) + Σ_{j=0}^∞ φ^j ε_{t−j}
Note: If y_t has mean μ/(1 − φ), then it is covariance stationary and has the above MA(∞) representation. (Here we have white noise and absolute summability, so all we need is for the expectation to be right.)
Note: Intuitively, this result says that the process, if it started a long time ago, "settles down", provided that |φ| < 1, so that the effect of the past dies out geometrically as time progresses. This is called the stationarity condition.
II. Autocovariances of AR(1) processes: 2 methods of calculating the autocovariances of AR(1) (for the case |φ| < 1):
a) y_t = μ/(1 − φ) + Σ_{j=0}^∞ φ^j ε_{t−j} ⇒ γ_j = σ²(φ^j φ⁰ + φ^{j+1}φ¹ + φ^{j+2}φ² + …) = σ²φ^j(1 + φ² + φ⁴ + …) = σ²φ^j/(1 − φ²)
and ρ_j ≡ γ_j/γ₀ = [σ²φ^j/(1 − φ²)] / [σ²/(1 − φ²)] = φ^j
b) Via the "Yule-Walker" equations (See 271B HW2)
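A simulation confirms the AR(1) autocovariances in a) (a sketch; φ, μ, the sample size, and the seed are illustrative assumptions, not from the notes):

```python
import numpy as np

# Simulate y_t = mu + phi*y_{t-1} + eps_t and compare sample autocovariances
# with gamma_j = sigma^2 * phi^j / (1 - phi^2).
phi, mu, sigma2 = 0.7, 1.0, 1.0
rng = np.random.default_rng(3)
n = 400_000
eps = rng.normal(scale=np.sqrt(sigma2), size=n)
y = np.empty(n)
y[0] = mu / (1 - phi)                        # start at the stationary mean
for t in range(1, n):
    y[t] = mu + phi * y[t - 1] + eps[t]

print(round(y.mean(), 3))                    # ~ mu/(1 - phi) = 3.333
for j in range(4):
    gamma_j = sigma2 * phi**j / (1 - phi**2)
    sample = np.mean((y[j:] - y.mean()) * (y[:n - j] - y.mean()))
    print(j, round(gamma_j, 3), round(sample, 3))
```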
III. AR(p) and its MA(∞) Representation: The above results for AR(1) can be generalized to the AR(p), the p-th order autoregressive process.
Def: A p-th Order Autoregressive Process AR(p) satisfies the following stochastic difference equation:
y_t = μ + φ₁y_{t−1} + φ₂y_{t−2} + … + φ_p y_{t−p} + ε_t
or y_t − φ₁y_{t−1} − φ₂y_{t−2} − … − φ_p y_{t−p} = μ + ε_t
or φ(L)y_t = (1 − φ₁L − … − φ_p L^p)y_t = μ + ε_t, where {ε_t} is white noise
Prop 4: AR(p) as an MA(∞) with Absolutely Summable Coefficients
Suppose the p-th degree polynomial φ(z) satisfies the stationarity/stability condition (so that the inverse is well defined and absolutely summable), i.e. φ(z) = 0 ⇒ |z| > 1. Then:
(a) The unique covariance-stationary solution to the p-th order stochastic difference equation has an MA(∞) representation:
(1 − φ₁L − … − φ_p L^p)y_t = μ + ε_t ⇒ (1 − λ₁L)(1 − λ₂L)…(1 − λ_p L)y_t = μ + ε_t
⇒ y_t = (1 − λ₁L)⁻¹(1 − λ₂L)⁻¹…(1 − λ_p L)⁻¹μ + (1 − λ₁L)⁻¹(1 − λ₂L)⁻¹…(1 − λ_p L)⁻¹ε_t = (Σ_{j=0}^∞ λ₁^j L^j)…(Σ_{j=0}^∞ λ_p^j L^j)μ + (Σ_{j=0}^∞ λ₁^j L^j)…(Σ_{j=0}^∞ λ_p^j L^j)ε_t
⇒ y_t = μ_y + ψ(L)ε_t, where ψ(L) = ψ₀ + ψ₁L + ψ₂L² + ψ₃L³ + … = φ(L)⁻¹ and μ_y = φ(1)⁻¹μ
The coefficients {ψ_j} are bounded in absolute value by a geometrically declining sequence and hence are absolutely summable. (Thus the MA representation is well-defined: the RHS converges.)
Note: The MA(∞) representation comes from the fact that the inverse is represented as a product of infinite sums!
¹⁷ Recall MA processes are well defined only when the lag coefficients are absolutely summable. We can multiply both sides by the inverse only if absolute summability is met, and then the RHS becomes a meaningful/well-defined limit. |φ| < 1 is equivalent to the stability condition, because the root of 1 − φz = 0 is z = 1/φ, and the stability condition is |1/φ| > 1, or |φ| < 1.
(b) The mean of the process is given by: μ_y = φ(1)⁻¹μ
(c) {γ_j} is bounded in absolute value by a sequence that declines geometrically with j. Therefore the autocovariances are absolutely summable.
Note: We refer to "an AR(p) process" as the unique covariance stationary solution to an AR(p) equation that satisfies the stationarity condition. The absolute summability of the inverse in part (a) follows from Prop 3, since the stability conditions are satisfied.
IV. ARMA(p,q): An ARMA(p,q) process combines AR(p) and MA(q):
y_t = μ + φ₁y_{t−1} + φ₂y_{t−2} + … + φ_p y_{t−p} + ε_t + θ₁ε_{t−1} + θ₂ε_{t−2} + … + θ_q ε_{t−q}
or φ(L)y_t = μ + θ(L)ε_t, where φ(L) = 1 − φ₁L − φ₂L² − … − φ_p L^p and θ(L) = 1 + θ₁L + θ₂L² + … + θ_q L^q,
where {ε_t} is white noise.
Prop 5: ARMA(p,q) as an MA(∞) with Absolutely Summable Coefficients
Suppose the p-th degree polynomial φ(z) satisfies the stationarity/stability condition (so that the inverse is well defined), i.e. φ(z) = 0 ⇒ |z| > 1. Then:
(a) The unique covariance-stationary solution to φ(L)y_t = μ + θ(L)ε_t, where φ(L) = 1 − φ₁L − φ₂L² − … − φ_p L^p and θ(L) = 1 + θ₁L + θ₂L² + … + θ_q L^q, has an MA(∞) representation:
(1 − φ₁L − … − φ_p L^p)y_t = μ + θ(L)ε_t ⇒ (1 − λ₁L)(1 − λ₂L)…(1 − λ_p L)y_t = μ + θ(L)ε_t
⇒ y_t = (1 − λ₁L)⁻¹…(1 − λ_p L)⁻¹μ + (1 − λ₁L)⁻¹…(1 − λ_p L)⁻¹θ(L)ε_t = (Σ_{j=0}^∞ λ₁^j L^j)…(Σ_{j=0}^∞ λ_p^j L^j)μ + (Σ_{j=0}^∞ λ₁^j L^j)…(Σ_{j=0}^∞ λ_p^j L^j)θ(L)ε_t
⇒ y_t = μ_y + ψ(L)ε_t, where ψ(L) = ψ₀ + ψ₁L + ψ₂L² + ψ₃L³ + … = φ(L)⁻¹θ(L) and μ_y = φ(1)⁻¹μ
The coefficients {ψ_j} are bounded in absolute value by a geometrically declining sequence and hence are absolutely summable. (Thus the MA representation is well-defined: the RHS converges.) Note: The MA(∞) representation comes from the fact that the inverse is represented as a product of infinite sums!
(b) The mean of the process is given by: μ_y = φ(1)⁻¹μ
(c) {γ_j} is bounded in absolute value by a sequence that declines geometrically with j. Therefore the autocovariances are absolutely summable.
Note: We refer to "an ARMA(p,q) process" as the unique covariance stationary solution to an ARMA equation that satisfies the stationarity condition. The absolute summability of the inverse in part (a) follows from Prop 3, since the stability conditions are satisfied.
Invertibility of ARMA(p,q) and its MA(∞) and AR(∞) Representations
If an ARMA(p,q) process satisfies the stability condition φ(z) = 0 ⇒ |z| > 1, and θ(z) = 0 ⇒ |z| > 1 (the stability condition on θ is called the invertibility condition), and if in addition φ(z) = 0 and θ(z) = 0 have no common roots, then the ARMA process is invertible, and the process has both an AR(∞) and an MA(∞) representation!
If θ(z) = 0 ⇒ |z| > 1, then θ(L)⁻¹ is absolutely summable and we can multiply both sides of the ARMA equation by θ(L)⁻¹ to obtain its AR(∞) representation¹⁸:
φ(L)y_t = μ + θ(L)ε_t ⇒ θ(L)⁻¹φ(L)y_t = θ(L)⁻¹μ + ε_t = θ(1)⁻¹μ + ε_t
∴ θ(L)⁻¹φ(L)y_t = c + ε_t, with c = θ(1)⁻¹μ (an AR(∞))
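As a numerical sketch of the AR(∞) representation (the parameter values are illustrative assumptions, not from the notes): for an ARMA(1,1) with φ(L) = 1 − φL and θ(L) = 1 + θL, the AR(∞) polynomial θ(L)⁻¹φ(L) can be built by convolving θ(L)⁻¹ = Σ_j (−θ)^j L^j with φ(L):

```python
import numpy as np

# AR(inf) coefficients of an invertible ARMA(1,1): theta(L)^{-1} * phi(L).
phi, theta = 0.5, 0.4                        # |theta| < 1: invertibility holds
n_terms = 8
theta_inv = np.array([(-theta) ** j for j in range(n_terms)])  # theta(L)^{-1}
ar_inf = np.convolve(theta_inv, [1.0, -phi])[:n_terms]         # theta(L)^{-1} * phi(L)
print(ar_inf[:4])   # 1, -(phi+theta), theta*(phi+theta), -theta^2*(phi+theta), ...
```

The coefficients decline geometrically in |θ|, consistent with absolute summability under the invertibility condition.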
¹⁸ Recall, θ(L)⁻¹ exists and is defined if θ₀ = 1 ≠ 0, as is the case here. However, it may not be absolutely summable (and therefore we cannot multiply both sides of the equation by it) unless the stability condition θ(z) = 0 ⇒ |z| > 1 (aka the invertibility condition) is satisfied.
4. Estimating Autoregressions: Autoregressive processes (autoregressions) are popular in econometrics because they have a natural interpretation and also because they are easy to estimate.
Prop 7 (Estimation of AR coefficients): Let {y_t} be an AR(p) process that satisfies the stationarity condition. Suppose further that {ε_t} is iid white noise. Then the OLS estimator β̂ of (c, φ₁, φ₂, …, φ_p), obtained by regressing y_t on a constant and p lagged values of y, is consistent and asymptotically normal, with
Avar(β̂) = σ²E(x_t x_t')⁻¹, where x_t = (1, y_{t−1}, …, y_{t−p})'
which is consistently estimated by s²((1/n)Σ_{t=1}^n x_t x_t')⁻¹, where s² = (1/(n − p − 1))Σ_t (y_t − ĉ − φ̂₁y_{t−1} − … − φ̂_p y_{t−p})²
Note: This follows from the fact that y_t can be written as an MA(∞), so if {ε_t} is iid white noise, then y_t is ergodic stationary from Prop 1(d), and it satisfies orthogonality to ε_t by independence. Therefore the assumptions for OLS are met. Though, we must check that E(x_t x_t') is nonsingular, or equivalently that the autocovariance matrix Var(y_t,…,y_{t−p}) is nonsingular. It can be shown that the autocovariance matrix of a covariance-stationary process is nonsingular for any p if γ₀ > 0 and γ_j → 0 as j → ∞ (which is satisfied here, since AR(p) is an MA(∞) and {ε_t} is iid).
Note: An AR(p) process with non-iid white noise can also be estimated via OLS… WHY?!
Estimating ARMA: we can use GMM/IV/2SLS to deal with endogeneity.
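Prop 7's regression can be sketched in a few lines (the sample size, true parameters, and seed are illustrative assumptions, not from the notes):

```python
import numpy as np

# Estimate an AR(2) by OLS: regress y_t on (1, y_{t-1}, y_{t-2}).
phi_true, mu, p, n = np.array([0.5, 0.2]), 1.0, 2, 100_000   # a stationary AR(2)
rng = np.random.default_rng(4)
y = np.zeros(n)
for t in range(p, n):
    y[t] = mu + sum(phi_true[j] * y[t - 1 - j] for j in range(p)) + rng.normal()

# x_t = (1, y_{t-1}, ..., y_{t-p})', stacked over t = p, ..., n-1
X = np.column_stack([np.ones(n - p)] + [y[p - j:n - j] for j in range(1, p + 1)])
beta_hat = np.linalg.lstsq(X, y[p:], rcond=None)[0]
print(beta_hat.round(3))     # ~ (c, phi_1, phi_2) = (1, 0.5, 0.2)
```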
Autocovariance-Generating Function: The AGF is a useful way to summarize the whole profile of autocovariances {γ_j} of a covariance stationary process {y_t}:
g_Y(z) ≡ Σ_{j=−∞}^∞ γ_j z^j = γ₀ + Σ_{j=1}^∞ γ_j(z^j + z^{−j}) (since γ_j = γ_{−j} by covariance stationarity)
There's a question about whether the summation exists, but for z = 1 and for {γ_j} absolutely summable:
g_Y(1) = γ₀ + 2Σ_{j=1}^∞ γ_j
Prop 6: Autocovariance-generating function for MA(∞)
Let {ε_t} be white noise and let ψ(L) = ψ₀ + ψ₁L + ψ₂L² + ψ₃L³ + … be an absolutely summable filter (i.e. with {ψ_j} absolutely summable). Then the autocovariance-generating function of the MA(∞) process {y_t}, where y_t = μ + ψ(L)ε_t, is:
g_Y(z) = σ²ψ(z)ψ(z⁻¹)¹⁹
Therefore, for AR(p) and ARMA(p,q), since they have MA(∞) representations:
AR(p): g_Y(z) ≡ σ²φ(z)⁻¹φ(z⁻¹)⁻¹
ARMA(p,q): g_Y(z) ≡ σ²θ(z)θ(z⁻¹)/(φ(z)φ(z⁻¹))
Note: These are useful because the long-run variance can be expressed as g_Y(1).
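A numeric check of the MA(1) case worked out in footnote 19 (a sketch; θ, σ², and the evaluation points are illustrative assumptions):

```python
# For MA(1): g_Y(z) = sigma^2*(1 + theta*z)*(1 + theta/z) should equal
# gamma_0 + gamma_1*(z + 1/z), with gamma_0 = sigma^2*(1 + theta^2), gamma_1 = sigma^2*theta.
theta, sigma2 = 0.6, 2.0
gamma0, gamma1 = sigma2 * (1 + theta**2), sigma2 * theta

for z in (1.0, 0.5, 2.0):
    via_agf = sigma2 * (1 + theta * z) * (1 + theta / z)
    via_gammas = gamma0 + gamma1 * (z + 1 / z)
    print(z, round(via_agf, 6), round(via_gammas, 6))   # identical at each z

print(gamma0 + 2 * gamma1)   # long-run variance g_Y(1) = sigma^2*(1 + theta)^2 = 5.12
```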
¹⁹ Recall, for an MA(q) process, γ_j = σ²Σ_{k=0}^{q−j}θ_{j+k}θ_k for j = 0,1,…,q, and γ_j = 0 for j > q.
Take MA(1). Then g_Y(z) = Σ_j γ_j z^j = γ₋₁z⁻¹ + γ₀ + γ₁z = σ²(θ₁z⁻¹ + (1 + θ₁²) + θ₁z) = σ²(1 + θ₁z)(1 + θ₁z⁻¹).
Therefore, g_Y(z) = σ²θ(z)θ(z⁻¹).
5. Asymptotics for Sample Means of Serially Correlated Processes
Here we develop an LLN and a CLT for serially correlated processes. Previously, we used the Billingsley (ergodic stationary MDS) CLT, which rules out serial correlation since the process is assumed to be a martingale difference sequence²⁰.
a) LLN for Covariance Stationary Processes with Vanishing Autocovariances²¹: Let {y_t} be covariance-stationary with mean μ and let {γ_j} be the autocovariances of {y_t}. Then:
(a) ȳ ≡ (1/n)Σ_{t=1}^n y_t → μ in m.s./L² if lim_{j→∞} γ_j = 0
(b) lim_{n→∞} Var(√n ȳ) = Σ_{j=−∞}^∞ γ_j < ∞ if {γ_j} is absolutely summable
(Note: we also call the limit in (b) the long-run variance of the covariance stationary process²²; it can be expressed from the AGF as g_Y(1).)
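A Monte Carlo sketch of part (b) for an MA(1) (θ and the simulation sizes are illustrative assumptions, not from the notes):

```python
import numpy as np

# For MA(1) with sigma^2 = 1: long-run variance = sum_j gamma_j = (1 + theta)^2.
# Check that n * Var(ybar) approaches it.
theta, n, n_reps = 0.6, 1_000, 5_000
long_run = (1 + theta) ** 2                  # = gamma_0 + 2*gamma_1 = 2.56

rng = np.random.default_rng(5)
eps = rng.normal(size=(n_reps, n + 1))
y = eps[:, 1:] + theta * eps[:, :-1]         # n_reps independent MA(1) paths
ybar = y.mean(axis=1)
print(round(n * ybar.var(), 3), round(long_run, 3))  # ~ 2.56, up to O(1/n) and MC error
```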
b) LLN for Vector Covariance-Stationary Processes with Vanishing Autocovariances
Let {y_t} be vector covariance-stationary with mean μ and let {Γ_j} be the autocovariance matrices23 of {y_t}. Then,

(a) ȳ_n ≡ (1/n) Σ_{t=1}^{n} y_t → μ (in mean square)   if the diagonal elements of Γ_j → 0 as j → ∞

(b) Var(√n ȳ_n) → Σ_{j=-∞}^{∞} Γ_j < ∞   if {Γ_j} is summable (i.e. each component of Γ_j is summable)

(Note: we also call this the long-run covariance matrix of the vector covariance stationary process24; it can be expressed from the multivariate AGF:

G_Y(1) = Σ_{j=-∞}^{∞} Γ_j = Γ_0 + Σ_{j=1}^{∞} (Γ_j + Γ_j') .)
20 g_t MDS ⇒ E(g_t | g_{t-1}, …) = 0. Therefore, Cov(g_t, g_{t-1}) = E(g_t g_{t-1}) = E( E(g_t | g_{t-1}, …) g_{t-1} ) = 0. 21
Var(√n ȳ_n) = n Var( (1/n) Σ_{i=1}^{n} y_i ) = (1/n) Var( Σ_{i=1}^{n} y_i )
= (1/n) [ n γ_0 + 2 (n−1) γ_1 + 2 (n−2) γ_2 + … + 2 γ_{n-1} ]   (collecting the n² covariance terms by lag, using stationarity)
= γ_0 + 2 Σ_{j=1}^{n-1} (1 − j/n) γ_j

Therefore, since {γ_j} is absolutely summable, the (j/n) γ_j terms vanish in the limit and

lim_{n→∞} Var(√n ȳ_n) = γ_0 + 2 Σ_{j=1}^{∞} γ_j = Σ_{j=-∞}^{∞} γ_j
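The finite-n formula Var(√n ȳ_n) = γ_0 + 2 Σ_{j=1}^{n-1}(1 − j/n) γ_j can be checked exactly (a sketch with arbitrary MA(1) parameters, not from the notes) against the same variance computed from the full n×n covariance matrix:

```python
import numpy as np

# Exact check of Var(sqrt(n)*ybar) = gamma_0 + 2*sum_{j=1}^{n-1} (1 - j/n)*gamma_j
# for an MA(1) (theta, sigma2, n are illustrative values)
theta, sigma2, n = 0.7, 1.0, 50

gamma = np.zeros(n)
gamma[0] = sigma2 * (1 + theta**2)   # gamma_0
gamma[1] = sigma2 * theta            # gamma_1; gamma_j = 0 for j > 1

# Triangular-weight formula from the derivation above
var_formula = gamma[0] + 2 * sum((1 - j / n) * gamma[j] for j in range(1, n))

# Same quantity from the full n x n covariance matrix of (y_1, ..., y_n)
Sigma = np.array([[gamma[abs(i - j)] for j in range(n)] for i in range(n)])
var_matrix = Sigma.sum() / n         # Var(sum_t y_t) / n

print(var_formula, var_matrix)       # identical (about 2.862)
```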
22 We can think of the sample as being generated from an infinite sequence of random variables (which is covariance stationary). So, the "long-run" variance is the sum of covariances from any one element in the sequence to all the other elements. 23 In a vector process, the diagonal elements of {Γ_j} are the autocovariances and the off-diagonal elements are the covariances between the lagged values of the elements of the vector.
For example, let y_t = (x_t, z_t)'. Then

Γ_j = Cov(y_t, y_{t-j}) = E(y_t y_{t-j}') − E(y_t) E(y_{t-j}')
    = [ E(x_t x_{t-j}) − E(x_t)E(x_{t-j}) ,  E(x_t z_{t-j}) − E(x_t)E(z_{t-j}) ;
        E(z_t x_{t-j}) − E(z_t)E(x_{t-j}) ,  E(z_t z_{t-j}) − E(z_t)E(z_{t-j}) ]
    = [ Cov(x_t, x_{t-j}) ,  Cov(x_t, z_{t-j}) ;
        Cov(z_t, x_{t-j}) ,  Cov(z_t, z_{t-j}) ]
2a) CLT for MA(∞) (Billingsley generalizes Lindeberg-Lévy25 to stationary and ergodic MDS; now we generalize to serial correlation)
Let y_t = μ + Σ_{j=0}^{∞} ψ_j ε_{t-j}, where {ε_t} is iid white noise and Σ_{j=0}^{∞} |ψ_j| < ∞. Then,

√n ( ȳ_n − μ ) →^D N( 0, Σ_{j=-∞}^{∞} γ_j )
2b) Multivariate CLT for MA(∞)
Let y_t = μ + Σ_{j=0}^{∞} Ψ_j ε_{t-j}, where {ε_t} is vector iid white noise (i.e. jointly covariance stationary) and Σ_{j=0}^{∞} ||Ψ_j|| < ∞. Then,

√n ( ȳ_n − μ ) →^D N( 0, Σ_{j=-∞}^{∞} Γ_j )
24 We can think of the sample as being generated from an infinite sequence of random variables (which is cov. Stationary). So, the “long-run” variance is the sum of covariances from any 1 element in the sequence to all the other elements.
25 Lindeberg-Lévy CLT: Let {z_i} be iid with E(z_i) = μ and Var(z_i) = Σ. Then

√n ( z̄_n − μ ) = √n ( (1/n) Σ_{i=1}^{n} z_i − μ ) →^D N(0, Σ)

(the standardized sample mean of iid random vectors converges to a normally distributed random vector).
3a) Gordin's CLT for Ergodic + Stationary + Gordin-Condition Processes26 (conditions restricting the degree of autocorrelation)
Suppose {y_t} is stationary and ergodic and suppose Gordin's conditions are satisfied:
a) E(y_t²) < ∞ (this is a restriction on [strictly] stationary processes, which might not have finite 2nd moments)
b) E(y_t | y_{t-j}, y_{t-j-1}, …) →^{m.s.} 0 as j → ∞, or equivalently (since stationary) E(y_0 | y_{-j}, y_{-j-1}, …) →^{m.s.} 0 27
c) Σ_{k=0}^{∞} [ E(r_{t,t-k}²) ]^{1/2} < ∞, where r_{t,t-k} ≡ E(y_t | I_{t-k}) − E(y_t | I_{t-k-1}) and the information set I_{t-k} ≡ (y_{t-k}, y_{t-k-1}, y_{t-k-2}, …)28 (telescoping-sum condition: restricts degree of autocorrelation)
Then,
1. The autocovariances {γ_j} are absolutely summable
2. √n ( ȳ_n − μ ) →^D N( 0, Σ_{j=-∞}^{∞} γ_j )
3b) Gordin's Multivariate CLT for Ergodic + Stationary + Gordin-Condition Processes29 (conditions restricting the degree of autocorrelation)
Suppose {y_t} is stationary and ergodic and suppose Gordin's conditions are satisfied for the vector ergodic stationary process:
a) E(y_t y_t') < ∞ (elementwise; this is a restriction on [strictly] stationary processes, which might not have finite 2nd moments)
b) E(y_t | y_{t-j}, y_{t-j-1}, …) →^{m.s.} 0 as j → ∞, or equivalently (since stationary) E(y_0 | y_{-j}, y_{-j-1}, …) →^{m.s.} 0
c) Σ_{k=0}^{∞} [ E(‖r_{t,t-k}‖²) ]^{1/2} < ∞, where r_{t,t-k} ≡ E(y_t | I_{t-k}) − E(y_t | I_{t-k-1}) and I_{t-k} ≡ (y_{t-k}, y_{t-k-1}, y_{t-k-2}, …) (telescoping-sum condition: restricts degree of autocorrelation)
Then,
1. The covariance matrices {Γ_j} are absolutely summable
2. √n ( ȳ_n − μ ) →^D N( 0, Σ_{j=-∞}^{∞} Γ_j )
Note: This is a generalization of the MDS CLT, since if {y_t} is an MDS sequence, then Gordin's conditions are satisfied.
26 We should be able to check that with these conditions we satisfy Billingsley's general requirements for a CLT to hold (Peter's notes slide 16). 27 This condition implies that the unconditional mean is 0: E(y_t) = 0. So, as j → ∞, that is, as the forecast (the conditional expectation) is based on less and less information, it should approach the forecast based on no information (i.e. the unconditional expectation). (See the White text for proof.) 28 We can write y_t as a telescoping sum as follows:

y_t = ( y_t − E(y_t | I_{t-1}) ) + ( E(y_t | I_{t-1}) − E(y_t | I_{t-2}) ) + … + ( E(y_t | I_{t-j+1}) − E(y_t | I_{t-j}) ) + E(y_t | I_{t-j})
    = r_{t,t} + r_{t,t-1} + r_{t,t-2} + … + r_{t,t-j+1} + E(y_t | I_{t-j})

By (b), E(y_t | I_{t-j}) →^{m.s.} 0 as j → ∞; therefore

y_t = Σ_{k=0}^{∞} r_{t,t-k}

This telescoping sum indicates how the "shocks" represented by (r_{t,t}, r_{t,t-1}, …) influence the current value of y. Condition (c) says, roughly, that shocks that occurred a long time ago do not have a disproportionately large influence. As such, this condition restricts the degree of serial correlation in {y_t}. 29 We should be able to check that with these conditions we satisfy Billingsley's general requirements for a CLT to hold (Peter's notes slide 16).
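As a concrete illustration (not worked in the notes), Gordin's condition (c) is easy to verify by hand for a stationary AR(1):

```latex
% AR(1): y_t = \phi y_{t-1} + \varepsilon_t, |\phi| < 1, \varepsilon_t iid (0, \sigma^2).
% Since E(y_t \mid I_{t-k}) = \phi^k y_{t-k}, the revision terms are
r_{t,t-k} = E(y_t \mid I_{t-k}) - E(y_t \mid I_{t-k-1})
          = \phi^k y_{t-k} - \phi^{k+1} y_{t-k-1}
          = \phi^k \varepsilon_{t-k},
% so [E(r_{t,t-k}^2)]^{1/2} = |\phi|^k \sigma, and the telescoping-sum condition holds:
\sum_{k=0}^{\infty} [E(r_{t,t-k}^2)]^{1/2}
   = \sigma \sum_{k=0}^{\infty} |\phi|^k = \frac{\sigma}{1-|\phi|} < \infty .
```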
4) Examples: Asymptotic Distributions of AR(p), ARMA(p,q)
Let {y_t} be ARMA(1,1). To check whether the LLN applies, we need to check whether γ_j vanishes as j → ∞. ARMA(1,1) has an MA(∞) representation, and from before we know the autocovariances of an MA(q) are

γ_j = Cov(y_t, y_{t-j}) = σ² ( θ_j θ_0 + θ_{j+1} θ_1 + … + θ_q θ_{q-j} ) = σ² Σ_{k=0}^{q-j} θ_{j+k} θ_k   for j = 0, 1, …, q
γ_j = 0   for j > q

For the MA(∞) representation the coefficients decay geometrically, so the autocovariances vanish and are absolutely summable, and the LLN applies. Then, to get the asymptotic variance,

Σ_{j=-∞}^{∞} γ_j = g_Y(1) = σ² θ(1)² / φ(1)² = σ² ( (1 − θ_1) / (1 − φ_1) )²

In general, for ARMA(p,q):

√n ( ȳ_n − μ ) →^D N( 0, Σ_{j=-∞}^{∞} γ_j ) = N( 0, σ² ( (1 − θ_1 − … − θ_q) / (1 − φ_1 − … − φ_p) )² )

by the same reasoning as above.
6. Incorporating Serial Correlation in GMM
Background: Recall our GMM assumptions: from assumption 3.3 (orthogonality condition), E(g_t) = 0, and in 3.5 we assumed {g_t} was an MDS. No serial correlation in g_t is allowed under that assumption, and as such, the matrix S (from which we obtain the asymptotic variance) is defined to be the variance of g_t. Now we can relax assumption 3.5 as follows.
Assumptions
3.1 Linearity: The data we observe come from underlying RVs {y_t (1×1), x_t (d×1)} with
y_t = x_t'δ + ε_t (t = 1, 2, …, n)   (this is the equation we want to estimate)
3.2 Ergodic Stationarity: Let z_t be an M-dimensional vector of instruments, and let w_t be the unique and nonconstant elements of (y_t, x_t, z_t). {w_t} is jointly stationary and ergodic.
3.3 Orthogonality Condition: All M variables in z_t are predetermined in the sense that they are all orthogonal to the current error term:
E(z_{tm} ε_t) = 0 ∀t, ∀m ⇔ E[ z_t · (y_t − x_t'δ) ] = 0 ⇔ E(g_t) = 0, where g_t = z_t · (y_t − x_t'δ) = z_t · ε_t
3.4 Rank Condition for Identification: Guarantees there is a unique solution to the system of equations.
The M×d matrix E(z_t x_t') is of full column rank (or E(x_t z_t') is of full row rank), M ≥ d (at least as many equations as unknowns). We denote this matrix by Σ_zx.
3.5 Gordin's Condition Restricting the Degree of Serial Correlation (assumption for asymptotic normality): The vector process {g_t} = {z_t · ε_t} satisfies the Gordin conditions:
a) E(g_t g_t') = E(ε_t² z_t z_t') < ∞ (this is a restriction on [strictly] stationary processes which might not have finite 2nd moments)
b) E(g_t | g_{t-j}, g_{t-j-1}, …) →^{m.s.} 0 as j → ∞, or equivalently (since stationary) E(g_0 | g_{-j}, g_{-j-1}, …) →^{m.s.} 0
c) Σ_{k=0}^{∞} [ E(‖r_{t,t-k}‖²) ]^{1/2} < ∞, where r_{t,t-k} ≡ E(g_t | I_{t-k}) − E(g_t | I_{t-k-1}) and the information set I_{t-k} ≡ (g_{t-k}, g_{t-k-1}, g_{t-k-2}, …) (telescoping-sum condition: restricts degree of autocorrelation)
Moreover, {g_t} has a nonsingular long-run covariance matrix. Then, by 3.3 and (b),

√n ḡ_n →^D N(0, S),   S = Σ_{j=-∞}^{∞} Γ_j = Γ_0 + Σ_{j=1}^{∞} ( Γ_j + Γ_j' ),

where Γ_j = E(g_t g_{t-j}') (j = 0, ±1, ±2, …) is the j-th autocovariance matrix.
Note: THIS IS THE ONLY DIFFERENCE BETWEEN THE GMM MODEL WITH AND WITHOUT SERIAL CORRELATION! Under the previous assumption 3.5 (martingale difference sequence), S = E(g_t g_t') = Γ_0. All the results developed previously carry over, using the new S (long-run variance) matrix.
Note 2: If we knew the form of the autocorrelation, we could always use a GLS estimator. However, GLS requires us to know Ω = Var(ε | X), and is typically infeasible. Furthermore, when we have to estimate Ω = Var(ε | X), GLS may not dominate OLS! Thus, if we were to run OLS/GMM, then the consistent variance-covariance matrix is given here.
GMM Estimator with Serial Correlation Results:
1. Consistency30: Under assumptions 3.1 – 3.4, b_GMM(W_n) →^p δ.
2. Asymptotic Normality31: Under assumptions 3.1 – 3.5,

√n ( b_GMM(W_n) − δ ) →^D N(0, V),
where V = ( Σ_zx' W Σ_zx )^{-1} Σ_zx' W S W Σ_zx ( Σ_zx' W Σ_zx )^{-1},   W ≡ plim W_n,   S = Σ_{j=-∞}^{∞} E(g_t g_{t-j}')

3. Consistent Estimate of Avar(b_GMM(W_n))32:
Suppose there exists a consistent estimator Ŝ (m×m) of S. Then, under 3.2, Avar(b_GMM(W_n)) is consistently estimated by

Avar-hat( b_GMM(W_n) ) = ( S_zx' W_n S_zx )^{-1} S_zx' W_n Ŝ W_n S_zx ( S_zx' W_n S_zx )^{-1},   where S_zx ≡ (1/n) Σ_t z_t x_t'

4. Consistency of s² (estimation of the variance of the "true" error is consistent)33:
For any consistent estimator b̂_n of δ, define ε̂_t ≡ y_t − x_t' b̂_n. Under 3.1, 3.2, and provided E(x_t x_t') exists and is finite,

(1/n) Σ_{t=1}^{n} ε̂_t² →^p E(ε_t²)

5. Consistent estimation of S (long-run variance matrix): (see next)
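The sandwich formula in result 2 can be illustrated numerically (a sketch with random stand-in matrices for Σ_zx and S, not from the notes). It also shows the standard efficiency result the notes carry over from the MDS case: choosing W = S^{-1} collapses the sandwich to (Σ_zx' S^{-1} Σ_zx)^{-1}, which is no larger than the variance under any other weight:

```python
import numpy as np

rng = np.random.default_rng(1)
M, d = 4, 2                               # instruments, parameters (illustrative)

Szx = rng.normal(size=(M, d))             # stand-in for Sigma_zx (full column rank)
A = rng.normal(size=(M, M))
S = A @ A.T + np.eye(M)                   # stand-in long-run variance S (pos. def.)

def avar(W):
    # (Szx' W Szx)^{-1} Szx' W S W Szx (Szx' W Szx)^{-1}
    B = np.linalg.inv(Szx.T @ W @ Szx)
    return B @ Szx.T @ W @ S @ W @ Szx @ B

V_eff = avar(np.linalg.inv(S))            # efficient choice W = S^{-1}
V_id = avar(np.eye(M))                    # arbitrary weight W = I

# V_eff equals (Szx' S^{-1} Szx)^{-1}, and V_id - V_eff is positive semidefinite
V_closed = np.linalg.inv(Szx.T @ np.linalg.inv(S) @ Szx)
print(np.allclose(V_eff, V_closed))
print(np.linalg.eigvalsh(V_id - V_eff).min() > -1e-8)
```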
30 From above, since (1/n) Σ_{t=1}^{n} z_t ε_t →^p E(z_t ε_t) = E(g_t) = 0 by the ergodic theorem and 3.3, ∴ b_GMM →^p δ.
31 Continuing from above,

b_GMM − δ = ( S_zx' W_n S_zx )^{-1} S_zx' W_n ḡ_n ⇒ √n ( b_GMM − δ ) = ( S_zx' W_n S_zx )^{-1} S_zx' W_n √n ḡ_n

√n ḡ_n = (1/√n) Σ_{t=1}^{n} z_t ε_t →^D N(0, S) by the CLT under 3.5 (Gordin's CLT; in the MDS case this was the ergodic MDS CLT), S_zx →^p Σ_zx by the ergodic theorem, and W_n →^p W by construction.

∴ √n ( b_GMM − δ ) →^D N( 0, (Σ_zx' W Σ_zx)^{-1} Σ_zx' W S W Σ_zx (Σ_zx' W Σ_zx)^{-1} ) by the CMT and Slutsky's theorem.
32 This follows from above. Standard asymptotic tools. 33 This proof is very similar to 3D from previous notes
ε̂_t = y_t − x_t' b̂ = ( x_t' δ + ε_t ) − x_t' b̂ = ε_t − x_t' ( b̂ − δ )
⇒ ε̂_t² = ε_t² − 2 ε_t x_t' ( b̂ − δ ) + ( b̂ − δ )' x_t x_t' ( b̂ − δ )
⇒ (1/n) Σ_t ε̂_t² = (1/n) Σ_t ε_t² − 2 ( (1/n) Σ_t ε_t x_t' ) ( b̂ − δ ) + ( b̂ − δ )' ( (1/n) Σ_t x_t x_t' ) ( b̂ − δ )

Since (1/n) Σ_t ε_t x_t →^p some finite vector, b̂ − δ →^p 0, and (1/n) Σ_t x_t x_t' →^p E(x_t x_t') (finite by assumption), the last two terms →^p 0; and (1/n) Σ_t ε_t² →^p E(ε_t²) by the ergodic theorem. Therefore (1/n) Σ_t ε̂_t² →^p E(ε_t²).
Methods of Estimating S: This is difficult, particularly when autocovariances do not vanish after finite lags.
1. Estimating S when Autocovariances Vanish after Finite Lags
We can estimate the j-th order autocovariance for our vector process {g_t} of moment conditions as follows:34

Γ̂_j = (1/n) Σ_{t=j+1}^{n} ĝ_t ĝ_{t-j}' = (1/n) Σ_{t=j+1}^{n} ε̂_t ε̂_{t-j} z_t z_{t-j}',   where ĝ_t = z_t ε̂_t, ε̂_t = y_t − x_t' δ̂, for δ̂ consistent for δ

If we know a priori that, for a known and finite q, Γ_j = 0 for j > q (i.e. a finite lag after which there is no autocorrelation), then S is consistently estimated by:

Ŝ = Γ̂_0 + Σ_{j=1}^{q} ( Γ̂_j + Γ̂_j' )
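A minimal sketch of this truncated estimator (illustrative; the helper name and the white-noise test data are my own, not from the notes):

```python
import numpy as np

def S_hat_truncated(g, q):
    """Estimate S = sum_j Gamma_j assuming Gamma_j = 0 for j > q.

    g: (n, m) array whose rows are the estimated moments g_hat_t = z_t * eps_hat_t.
    """
    n, m = g.shape
    S = np.zeros((m, m))
    for j in range(q + 1):
        Gj = (g[j:].T @ g[:n - j]) / n      # Gamma_hat_j = (1/n) sum_t g_t g_{t-j}'
        S += Gj if j == 0 else Gj + Gj.T    # add Gamma_hat_j + Gamma_hat_j' for j >= 1
    return S

# White-noise moments with q = 2: S_hat should be close to Var(g_t) = I_2
rng = np.random.default_rng(2)
g = rng.normal(size=(5000, 2))
S_hat = S_hat_truncated(g, q=2)
print(S_hat)
```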
If we do not know q, then the following are methods for estimating the long-run variance matrix:
2. Using Kernels to Estimate S
Kernel-based (or nonparametric) estimators can be expressed as a weighted average of estimated autocovariances:

Ŝ = Σ_{j=-n+1}^{n-1} k( j / q(n) ) · Γ̂_j

where the "kernel" k(·) is a function that gives weights to the autocovariances, and q(n) is the "bandwidth".
Truncated Kernel: The estimator above with q(n) = q is a special kernel-based estimator with the "truncated" kernel

k(x) = 1 for |x| ≤ 1,   k(x) = 0 for |x| > 1

Problems: With unknown q, we can use the truncated kernel with a bandwidth q(n) that increases with sample size. As q(n) → ∞, more and more (and ultimately all) autocovariances can be included in the calculation of Ŝ. However, the truncated-kernel-based Ŝ is not guaranteed to be positive semidefinite in finite samples. This creates a problem because then the estimated asymptotic variance of the GMM estimator (which includes Ŝ) may not be positive semidefinite.
Bartlett Kernel and the Newey-West Estimator: Newey and West found that the kernel-based estimator can be made nonnegative definite in finite samples using the Bartlett kernel

k(x) = 1 − |x| for |x| ≤ 1,   k(x) = 0 for |x| > 1

(a triangular "tent" on [−1, 1] that peaks at 0). The Bartlett-kernel-based estimator of S is called, in econometrics, the Newey-West estimator:

Ŝ = Σ_{ j : |j| < q(n) } ( 1 − |j| / q(n) ) · Γ̂_j = Γ̂_0 + Σ_{j=1}^{q(n)-1} ( 1 − j / q(n) ) ( Γ̂_j + Γ̂_j' )

Example: For q(n) = 3, the kernel-based estimator includes autocovariances up to two (not three) lags:

Ŝ = Σ_{j=-n+1}^{n-1} k( j/3 ) · Γ̂_j = Γ̂_0 + (2/3) ( Γ̂_1 + Γ̂_1' ) + (1/3) ( Γ̂_2 + Γ̂_2' )
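The Bartlett-weighted sum can be sketched directly (an illustrative implementation under the assumptions above; the function name and white-noise test data are my own):

```python
import numpy as np

def newey_west(g, bandwidth):
    """Bartlett-kernel (Newey-West) estimate of the long-run variance S.

    g: (n, m) array of moments; bandwidth: q(n). Lags j = 1, ..., q(n)-1 get
    weight 1 - j/q(n); autocovariances at lag q(n) and beyond get weight 0.
    """
    n, m = g.shape
    S = (g.T @ g) / n                      # Gamma_hat_0
    for j in range(1, bandwidth):
        w = 1.0 - j / bandwidth            # Bartlett weight
        Gj = (g[j:].T @ g[:n - j]) / n     # Gamma_hat_j
        S += w * (Gj + Gj.T)
    return S

# q(n) = 3 includes lags one and two with weights 2/3 and 1/3, as in the example
rng = np.random.default_rng(3)
g = rng.normal(size=(4000, 1))
S3 = newey_west(g, bandwidth=3)
print(S3)   # close to [[1.0]] for white-noise g
```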
3. VARHAC (VAR-based estimation of S): (see next)
34 Note that here ĝ_t is no longer ergodic stationary because we need to estimate the error term. For consistency we need to impose a fourth-moment condition (similar to that in Prop 3.4). See Hansen's notes p. 56.
Background: In time-series regressions, {y_t, x_t} are not always stationary as we have assumed thus far. There are 2 cases in which OLS is still consistent: 1. Time Regressions (Ch 6) and 2. Cointegrating Regressions (Ch 10).
Time Regressions
Setup:
Population Model of Interest:

y_t = α + δ·t + ε_t,   where {ε_t} is independent white noise (iid with E(ε_t) = 0, Var(ε_t) finite)
or y_t = x_t' β + ε_t with x_t = (1, t)', β = (α, δ)'

The random variable x_t = (1, t)' is not stationary, since the second element (t) has mean t, which increases with time. Similarly, y_t is not stationary since the time trend α + δ·t is (trivially) non-stationary. However, y_t is trend stationary.
Trend Stationarity:
Def: A process/sequence of RVs is trend stationary if it can be written as the sum of a time trend and a stationary process (here the stationary component is the independent white noise).
Consistency of OLS estimates in Time Regressions: It turns out the OLS estimates α̂, δ̂ are consistent but have different rates of convergence.
Prop: Consider the time regression where ε_t is independent white noise with E(ε_t²) = σ² and E(ε_t⁴) < ∞, and let α̂, δ̂ be the OLS estimates of α and δ. Then,

( √n (α̂ − α), n^{3/2} (δ̂ − δ) )' →^D N( 0, σ² Q^{-1} ),   Q ≡ [ 1, 1/2 ; 1/2, 1/3 ]
Just as in the stationary case, α̂ is √n-consistent because √n(α̂ − α) converges to a (normal) R.V. On the other hand, δ̂ is n^{3/2}-consistent in that n^{3/2}(δ̂ − δ) converges to a R.V. Therefore, δ̂ is hyperconsistent35.
Proof: See next page.
Hypothesis Testing for Time Regressions:
For H₀: α = α₀:
t_α = (α̂ − α₀) / √( s² [ (Σ_t x_t x_t')^{-1} ]₁₁ ) = (α̂ − α₀) / √( s² [1 0] (Σ_t x_t x_t')^{-1} [1 0]' )
    = √n (α̂ − α₀) / √( s² [1 0] Υ_n (Σ_t x_t x_t')^{-1} Υ_n [1 0]' )   (Υ_n ≡ diag(√n, n^{3/2}))
    = √n (α̂ − α₀) / √( s² [1 0] Q_n^{-1} [1 0]' ),   Q_n ≡ Υ_n^{-1} ( Σ_t x_t x_t' ) Υ_n^{-1}
For H₀: δ = δ₀:

t_δ = (δ̂ − δ₀) / √( s² [ (Σ_t x_t x_t')^{-1} ]₂₂ ) = (δ̂ − δ₀) / √( s² [0 1] (Σ_t x_t x_t')^{-1} [0 1]' )
    = n^{3/2} (δ̂ − δ₀) / √( s² [0 1] Υ_n (Σ_t x_t x_t')^{-1} Υ_n [0 1]' )
    = n^{3/2} (δ̂ − δ₀) / √( s² [0 1] Q_n^{-1} [0 1]' )

Since Q_n → Q and s² →^p σ², both t-ratios are asymptotically N(0,1) by the Prop above, so the usual t-tests remain valid.
35 Estimators that are n-consistent are usually called superconsistent. (Others use the term for estimators that are n^γ-consistent for γ > ½.)
Proof:

b̂ ≡ (α̂, δ̂)' = (X'X)^{-1} X'y = ( Σ_t x_t x_t' )^{-1} Σ_t x_t y_t = ( Σ_t x_t x_t' )^{-1} Σ_t x_t ( x_t' β + ε_t )

⇒ b̂ − β = ( Σ_t x_t x_t' )^{-1} Σ_t x_t ε_t,   with x_t = (1, t)':

Σ_t x_t x_t' = [ n, Σ_t t ; Σ_t t, Σ_t t² ] = [ n, n(n+1)/2 ; n(n+1)/2, n(n+1)(2n+1)/6 ]

To get the consistency result, let Υ_n ≡ diag(√n, n^{3/2}). Multiply both sides by Υ_n to get

Υ_n ( b̂ − β ) = Υ_n ( Σ_t x_t x_t' )^{-1} Σ_t x_t ε_t = ( Υ_n^{-1} Σ_t x_t x_t' Υ_n^{-1} )^{-1} ( Υ_n^{-1} Σ_t x_t ε_t ) ≡ Q_n^{-1} v_n

Now

Q_n = Υ_n^{-1} ( Σ_t x_t x_t' ) Υ_n^{-1} = [ 1, (n+1)/(2n) ; (n+1)/(2n), (n+1)(2n+1)/(6n²) ]

v_n = Υ_n^{-1} Σ_t x_t ε_t = ( (1/√n) Σ_t ε_t , (1/n^{3/2}) Σ_t t ε_t )' = ( (1/√n) Σ_t ε_t , (1/√n) Σ_t (t/n) ε_t )'

Clearly Q_n → Q = [ 1, 1/2 ; 1/2, 1/3 ], so Q_n^{-1} → Q^{-1}. Based on a CLT for nonstationary processes,

v_n →^D N( 0, σ² Q )

To see the variance, compute V = E(v_n v_n'), using E(ε_i ε_j) = E(ε_i)E(ε_j) = 0 for i ≠ j and E(ε_t²) = σ² for all t:

E[ (1/√n) Σ_t ε_t ]² = (1/n) Σ_t E(ε_t²) = σ²
E[ ( (1/√n) Σ_t ε_t ) ( (1/n^{3/2}) Σ_t t ε_t ) ] = (1/n²) Σ_t t E(ε_t²) = σ² (n+1)/(2n)
E[ (1/n^{3/2}) Σ_t t ε_t ]² = (1/n³) Σ_t t² E(ε_t²) = σ² (n+1)(2n+1)/(6n²)

so

E(v_n v_n') = σ² [ 1, (n+1)/(2n) ; (n+1)/(2n), (n+1)(2n+1)/(6n²) ] → σ² [ 1, 1/2 ; 1/2, 1/3 ] = σ² Q

∴ By Slutsky, Υ_n ( b̂ − β ) = Q_n^{-1} v_n →^D N( 0, Q^{-1} (σ² Q) Q^{-1} ) = N( 0, σ² Q^{-1} )   (since Q is symmetric)
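The two convergence rates in the proposition can be checked by simulation (a sketch with arbitrary α, δ, σ; not from the notes): the slope error shrinks at rate n^{3/2} while the intercept error shrinks only at rate √n.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, delta, sigma = 1.0, 0.5, 1.0   # illustrative true parameters

def ols_time_reg(n):
    # y_t = alpha + delta*t + eps_t, OLS on x_t = (1, t)'
    t = np.arange(1, n + 1, dtype=float)
    y = alpha + delta * t + rng.normal(0.0, sigma, n)
    X = np.column_stack([np.ones(n), t])
    return np.linalg.lstsq(X, y, rcond=None)[0]   # (alpha_hat, delta_hat)

for n in (100, 10000):
    a_hat, d_hat = ols_time_reg(n)
    print(n, abs(a_hat - alpha), abs(d_hat - delta))
# delta_hat's error shrinks much faster (rate n^{3/2}) than alpha_hat's (rate n^{1/2})
```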
Cointegration Regressions
This is the other case where we don't have stationarity, but we can still apply OLS to obtain consistent results.
1. Background: Modeling Trends The idea of this chapter is that economic time series can be represented as the sum of…
i. a linear time trend, ii. a stochastic trend, iii. and a stationary process
Def: Deterministic Trend / Linear Time Trend is a trend in the mean.
Def: Stochastic Trend is a Martingale36, a trend where the best predictor of future value is its current value (and thus every change in the trend seems to have a permanent effect on its future value).
36 Zt is a martingale if E(Zt | Zt-1, Zt-2, …) = Zt-1
Unit Root: Let Y_t = Y_{t-1} + ε_t (where ε_t is iid white noise).
1. Step Function (maps discrete time to cumulative shocks, i.e. Y_t):

W_n(u) ≡ ( 1 / (σ_ε √n) ) Σ_{t=1}^{⌊un⌋} ε_t

2. Asymptotic Results for W_n(u)
• CLT37: For any u ∈ [0,1], W_n(u) →^D N(0, u)
• FCLT: If ε_t is iid white noise, then ∀u ∈ [0,1], W_n(u) →^D W(u) (this is convergence of a sequence of functions). Note: This means that our step function converges to a Brownian motion in the limit, when the steps become finer and finer, or equivalently, when the shocks are accumulated in shorter and shorter periods of time until the process becomes "continuous time".
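A simulation sketch of the CLT part (illustrative parameters, not from the notes): across many simulated paths, the variance of W_n(u) is approximately u, consistent with W_n(u) →^D N(0, u).

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, sigma = 1000, 3000, 1.0

# W_n(t/n) = (1/(sigma*sqrt(n))) * sum of the first t shocks, for each path
eps = rng.normal(0.0, sigma, size=(reps, n))
cs = eps.cumsum(axis=1) / (sigma * np.sqrt(n))

for u in (0.25, 0.5, 1.0):
    k = int(np.floor(u * n))
    print(u, cs[:, k - 1].var())   # variance across paths is approximately u
```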
2. Useful Things to Keep in Mind in Deriving Asymptotic Convergence Results to Stochastic Integrals:

(1/n) Σ_{t=1}^{n} ( Σ_{s=1}^{t-1} ε_s ) ε_t →^D σ_ε² ∫₀¹ W(u) dW(u)

a) W_n(u) ≡ ( 1 / (σ_ε √n) ) Σ_{t=1}^{⌊un⌋} ε_t

b) W_n(t/n) = ( 1 / (σ_ε √n) ) Σ_{s=1}^{t} ε_s = Y_t / (σ_ε √n),   or   W_n((t−1)/n) = Y_{t-1} / (σ_ε √n)

c) W_n((t+v)/n) = ( 1 / (σ_ε √n) ) Σ_{s=1}^{t} ε_s = Y_t / (σ_ε √n)   for v ∈ [0, 1)   (W_n is flat between steps)

d) ∫_{t/n}^{(t+1)/n} W_n(u) du = (1/n) W_n(t/n)   (this follows from c)

e) (1/n) Σ_{t=1}^{n} W_n((t−1)/n) = ∫₀¹ W_n(u) du,   or   (1/n) Σ_{t=1}^{n} W_n(t/n) = ∫_{1/n}^{(n+1)/n} W_n(u) du

f) ( 1 / (σ_ε √n) ) (1/n) Σ_{t=1}^{n} Y_{t-1} = (1/n) Σ_{t=1}^{n} W_n((t−1)/n) = ∫₀¹ W_n(u) du
37 W_n(u) ≡ ( 1 / (σ_ε √n) ) Σ_{t=1}^{⌊un⌋} ε_t = √(⌊un⌋/n) · ( 1 / (σ_ε √⌊un⌋) ) Σ_{t=1}^{⌊un⌋} ε_t →^D N(0, u), since ( 1 / (σ_ε √⌊un⌋) ) Σ_{t=1}^{⌊un⌋} ε_t →^D N(0, 1) and ⌊un⌋/n → u.
Applications of Unit Root Tools
1. Unit Root and Regression Coefficient
Suppose we have the following unit root process (that is, let Y₀ = 0):

Y_t = Y_{t-1} + ε_t = Σ_{s=1}^{t} ε_s   (so Y_{t-1} = Σ_{s=1}^{t-1} ε_s)

What is the distribution of the sampling error when we regress Y_t on Y_{t-1}?
φ̂ − 1 = ( Σ_{t=1}^{n} Y_{t-1} ε_t ) / ( Σ_{t=1}^{n} Y_{t-1}² )

⇒ n ( φ̂ − 1 ) = [ (1/n) Σ_{t=1}^{n} Y_{t-1} ε_t ] / [ (1/n²) Σ_{t=1}^{n} Y_{t-1}² ]
             = [ σ_ε² · ( 1 / (σ_ε² n) ) Σ_{t=1}^{n} ( Σ_{s=1}^{t-1} ε_s ) ε_t ] / [ σ_ε² · (1/n) Σ_{t=1}^{n} W_n((t−1)/n)² ]

Since

( 1 / (σ_ε² n) ) Σ_{t=1}^{n} ( Σ_{s=1}^{t-1} ε_s ) ε_t →^D ∫₀¹ W(u) dW(u)   and   (1/n) Σ_{t=1}^{n} W_n((t−1)/n)² = ∫₀¹ W_n(u)² du →^D ∫₀¹ W(u)² du   (by the CMT),

the σ_ε² factors cancel and

n ( φ̂ − 1 ) →^D [ ∫₀¹ W(u) dW(u) ] / [ ∫₀¹ W(u)² du ]

This is the Dickey-Fuller distribution.
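A Monte Carlo sketch of this limit (illustrative sample sizes, not from the notes): the simulated distribution of n(φ̂ − 1) is skewed to the left, with most mass below zero, unlike a centered normal.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 500, 3000

stats = np.empty(reps)
for r in range(reps):
    eps = rng.normal(size=n)
    Y = np.cumsum(eps)                        # Y_t = Y_{t-1} + eps_t, Y_0 = 0
    Ylag, Ycur = Y[:-1], Y[1:]
    phi_hat = (Ylag @ Ycur) / (Ylag @ Ylag)   # OLS of Y_t on Y_{t-1}, no constant
    stats[r] = n * (phi_hat - 1.0)

# The Dickey-Fuller limit (int W dW)/(int W^2 du) is skewed left:
print(stats.mean(), np.mean(stats < 0))       # negative mean, most mass below 0
```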
2. AR(1) with Unit Root and Constant
Suppose we have the following unit root process: Y_t = μ + Y_{t-1} + ε_t (here each period there are 2 terms in the shock). What is the distribution of the sampling error of the OLS estimate from regressing Y_t on (1, Y_{t-1})?
With an intercept, the OLS slope is computed from the demeaned regressor:

φ̂ − 1 = [ Σ_{t=1}^{n} ( Y_{t-1} − Ȳ_{-1} ) ε_t ] / [ Σ_{t=1}^{n} ( Y_{t-1} − Ȳ_{-1} )² ],   Ȳ_{-1} ≡ (1/n) Σ_{t=1}^{n} Y_{t-1}

Now (taking Y₀ = 0)

Y_{t-1} = Σ_{s=1}^{t-1} ( μ + ε_s ) = μ (t−1) + Σ_{s=1}^{t-1} ε_s,   and   (1/n) Σ_{t=1}^{n} (t−1) = (n−1)/2

so the stochastic part satisfies

( 1/√n ) ( Y_{t-1} − μ(t−1) ) = σ_ε W_n( (t−1)/n )
(1/n) Σ_{t=1}^{n} σ_ε W_n( (t−1)/n ) = σ_ε ∫₀¹ W_n(u) du →^D σ_ε ∫₀¹ W(u) du

⇒ ( 1/√n ) ( Ȳ_{-1} − μ(n−1)/2 ) = σ_ε (1/n) Σ_{t=1}^{n} W_n( (t−1)/n )

Therefore the demeaned regressor decomposes into a stochastic and a deterministic-trend component,

Y_{t-1} − Ȳ_{-1} = σ_ε √n [ W_n((t−1)/n) − (1/n) Σ_{s=1}^{n} W_n((s−1)/n) ] + μ [ (t−1) − (n−1)/2 ],

and the sampling error φ̂ − 1 is the ratio of sums built from these two components.
3. Spurious Regression Example
Suppose we're estimating Y_t = β X_t + e_t by OLS, where Y_t and X_t are independent random walks:

Y_t = Y_{t-1} + ε_t = Σ_{s=1}^{t} ε_s,   X_t = X_{t-1} + η_t = Σ_{s=1}^{t} η_s,
with ( 1 / (σ_ε √n) ) Y_{⌊un⌋} = W_n(u),   ( 1 / (σ_η √n) ) X_{⌊un⌋} ≡ V_n(u)

(W and V are independent Brownian motions in the limit). Then

β̂ = ( Σ_t X_t Y_t ) / ( Σ_t X_t² ) = [ (1/n²) Σ_t X_t Y_t ] / [ (1/n²) Σ_t X_t² ]
   = [ σ_ε σ_η (1/n) Σ_t W_n(t/n) V_n(t/n) ] / [ σ_η² (1/n) Σ_t V_n(t/n)² ]
   →^D ( σ_ε / σ_η ) [ ∫₀¹ W(u) V(u) du ] / [ ∫₀¹ V(u)² du ]

So β̂ does not converge in probability to the true value 0; it converges in distribution to a nondegenerate random variable.
What is the distribution of the variance of the errors?

ê_t = Y_t − β̂ X_t ⇒ ê_t² = Y_t² − 2 β̂ X_t Y_t + β̂² X_t²

⇒ (1/n²) Σ_t ê_t² = (1/n²) Σ_t Y_t² − 2 β̂ (1/n²) Σ_t X_t Y_t + β̂² (1/n²) Σ_t X_t²

Using (1/n²) Σ_t Y_t² →^D σ_ε² ∫₀¹ W(u)² du, (1/n²) Σ_t X_t Y_t →^D σ_ε σ_η ∫₀¹ W(u) V(u) du, (1/n²) Σ_t X_t² →^D σ_η² ∫₀¹ V(u)² du, and the limit of β̂ above,

(1/n²) Σ_t ê_t² →^D σ_ε² ∫₀¹ W(u)² du − σ_ε² [ ∫₀¹ W(u) V(u) du ]² / [ ∫₀¹ V(u)² du ]

So the error variance s² = (1/n) Σ_t ê_t² diverges at rate n. Finally, since

Var-hat(β̂) = [ (1/n) Σ_t ê_t² ] / [ Σ_t X_t² ] = (1/n) · [ (1/n²) Σ_t ê_t² ] / [ (1/n²) Σ_t X_t² ] →^p 0

the conventional standard error of β̂ shrinks to 0 even though β̂ itself converges to a nondegenerate random variable, so the usual t-ratio diverges and the regression appears (spuriously) significant.
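A simulation sketch of the spurious-regression phenomenon (illustrative sample sizes and replication counts, not from the notes): regressing one random walk on an independent one produces conventional |t|-ratios that grow with the sample size instead of settling near zero.

```python
import numpy as np

rng = np.random.default_rng(7)

def spurious_abs_t(n):
    Y = np.cumsum(rng.normal(size=n))    # two independent random walks
    X = np.cumsum(rng.normal(size=n))
    beta = (X @ Y) / (X @ X)             # OLS of Y_t on X_t, no constant
    e = Y - beta * X
    se = np.sqrt((e @ e) / n / (X @ X))  # conventional OLS standard error
    return abs(beta / se)

for n in (100, 400, 1600):
    t_bar = np.mean([spurious_abs_t(n) for _ in range(300)])
    print(n, t_bar)   # average |t| grows with n: spurious "significance"
```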