
ECO 6375: Econometrics III

Daniel L. Millimet

Southern Methodist University

Fall 2020

DL Millimet (SMU) ECO 6375 Fall 2020 1 / 164

Panel Data: Introduction

Definition: Panel data refers to data with multiple observations of each basic unit of observation.

Does not necessarily involve a time dimension

Examples
- $y_{it}$ = outcome of observation $i$ at time $t$ ($i = 1, \ldots, N$; $t = 1, \ldots, T$)
  * Exs: countries over time; firms over time; individuals over time
- $y_{ih}$ = outcome of individual $i$ in household $h$ ($i = 1, \ldots, N$; $h = 1, \ldots, H$)
- $y_{im}$ = outcome of employee $i$ in firm $m$ ($i = 1, \ldots, N$; $m = 1, \ldots, M$)
- $y_{is}$ = test score of student $i$ in subject $s$ ($i = 1, \ldots, N$; $s = 1, \ldots, S$)

Preceding examples are two-dimensional panels
- More complex panels involve more dimensions
- Examples
  * $y_{ist}$ = outcome of county $i$ in state $s$ at time $t$
  * $y_{iht}$ = outcome of individual $i$ in household $h$ at time $t$
  * $y_{ijst}$ = outcome of firm $i$ in industry $j$ in state $s$ at time $t$

For exposition, we will stick to repeated observations of the same unit $i$ over time $t$


Types of panel data sets

Definition: A balanced panel data set is one in which each unit of observation $i$ is observed for the same number of time periods, $T$. Thus, the total sample size is $NT$.

Definition: An unbalanced panel data set is one in which the units of observation $i$ are observed for differing numbers of time periods, $T_i$. Thus, the total sample size is $\sum_i T_i$.

Micro vs macro panels
- Micro: Typically large $N$, small $T$
- Macro: Typically small $N$, large $T$
  * Some convergence over time as data become plentiful


Why does panel data require special attention?

Panel data is useful due to
1. Efficiency gains from large samples
2. Potential to analyze dynamic behavior (relative to cross-sectional (CS) data)
3. Potential to address correlation between covariates and the error
  * Panel data methods provide a solution to endogeneity in only certain situations
  * Requires 'new' estimation techniques

Asymptotics with panel data requires more precision
- Large $N$, small $T$
- Small $N$, large $T$
- Large $N$, large $T$ (jointly or sequentially)

Econometric analysis of panel data started with Mundlak (1961, 1978), Hoch (1962), and Balestra & Nerlove (1966)


Panel Data: Model & Terminology

Population regression function given by $E[y \mid x_1, \ldots, x_K, c]$
- Assuming linearity: $E[y \mid x_1, \ldots, x_K, c] = \beta_0 + x\beta + c$
- $x_k$, $k = 1, \ldots, K$, are observable (to the econometrician)
- $c$ is an unobservable (to the econometrician) variable

Error form of the model

$$y = \beta_0 + x\beta + c + \varepsilon$$

where $c$ is the unobserved effect and $\varepsilon$ is the idiosyncratic error

$c$ is also referred to as individual heterogeneity, unobserved component, individual effect, or latent variable


Some history
- $c$ may be thought of as either a random variable, whose distribution may be estimated, or as a fixed parameter to be estimated
  * $c$ is called a random effect in the first view, a fixed effect in the second view (Balestra & Nerlove 1966)
  * This terminology is not the common usage of these terms today
- Terminology has become linked to the assumption concerning correlation between $x$ and $c$
  * Random effect: $\mathrm{Cov}(x, c) = 0$
  * Fixed effect: $\mathrm{Cov}(x, c) \neq 0$
- The 'modern' usage of the terms is now universal


Formally, model is given by

$$y_{it} = \beta_0 + x_{it}\beta + \tilde{\varepsilon}_{it}$$

where $\tilde{\varepsilon}_{it}$ is the composite error and captures all unobserved determinants of $y_{it}$

- The composite error is decomposed into parts
- In a one-way error components model, $\tilde{\varepsilon}_{it} = c_i + \varepsilon_{it}$
  * Unobserved effect $c_i$ captures time invariant unobserved attributes specific to $i$
  * Idiosyncratic error $\varepsilon_{it}$ captures the remainder
- In a two-way error components model, $\tilde{\varepsilon}_{it} = c_i + \lambda_t + \varepsilon_{it}$
  * Unobserved time effect $\lambda_t$ captures individual invariant unobserved attributes specific to $t$
  * Idiosyncratic error $\varepsilon_{it}$ captures the remainder

Problem: Given the presence of $c_i$ and $\lambda_t$ and access to panel data, how can we recover consistent estimates of $\beta$?


Roadmap

1. Estimation techniques
   1. Pooled OLS (POLS)
   2. Random effects (RE)
   3. Least squares dummy variable model (LSDV)
   4. Fixed effects (FE)
   5. First-differencing (FD)
   6. Long differences (LD)
2. Dynamic panel models
3. LDV models revisited
4. Time series concerns revisited


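Panel Data: Pooled OLS

Estimate by OLS

$$y_{it} = \beta_0 + x_{it}\beta + \tilde{\varepsilon}_{it}$$

where $\tilde{\varepsilon}_{it}$ is the composite error

Assumptions

(POLS.i) Linearity
(POLS.ii) (Contemporaneous) Exogeneity: $E[x_{it}'\tilde{\varepsilon}_{it}] = 0$, $t = 1, \ldots, T$

If $E[\tilde{\varepsilon}\tilde{\varepsilon}'] \neq \sigma^2 I_{NT}$, POLS
1. Is inefficient relative to GLS
2. Provides incorrect standard errors
3. Requires large $N$, small $T$ asymptotics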


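Two options
- Estimate parameters using POLS, but report cluster-robust standard errors to allow for arbitrary correlation within $i$
  * See Cameron & Miller (JHR, 2015)
  * More details later
  * Stata: -reg, vce(cluster i)-
- Place structure on $E[\tilde{\varepsilon}\tilde{\varepsilon}'] = \Omega(\theta)$, where $\tilde{\varepsilon}$ is the composite error, and use pooled FGLS (PFGLS)
  * Equicorrelated errors: $\rho_{ts} = \mathrm{Corr}(\tilde{\varepsilon}_{it}, \tilde{\varepsilon}_{is}) = \rho$ $\forall i$, $t \neq s$
  * Autoregressive errors: AR(p) s.t. $\tilde{\varepsilon}_{it} = \sum_{s=1}^{p} \rho_s \tilde{\varepsilon}_{it-s} + u_{it}$
  * Moving average errors: MA(p) s.t. $\tilde{\varepsilon}_{it} = u_{it} - \sum_{s=1}^{p} \theta_s u_{it-s}$
  * Unstructured, except equal across $i$:

$$\hat{\rho}_{ts} = \frac{1}{N} \sum_i \left(\hat{\tilde{\varepsilon}}_{it} - \bar{\hat{\tilde{\varepsilon}}}_{t}\right)\left(\hat{\tilde{\varepsilon}}_{is} - \bar{\hat{\tilde{\varepsilon}}}_{s}\right)$$

  * Stata: -xtreg, pa-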
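The first option can be sketched in a few lines of NumPy. This is a minimal illustration, not the routine any package uses: the function name `pols_cluster` and the simulated data are hypothetical, and no finite-sample correction is applied, so the standard errors can differ slightly from those reported by statistical software.

```python
import numpy as np

def pols_cluster(y, X, ids):
    """Pooled OLS with a cluster-robust covariance matrix (clusters = units i).

    Sandwich form: (X'X)^{-1} [sum_i X_i' e_i e_i' X_i] (X'X)^{-1},
    without any finite-sample correction.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    K = X.shape[1]
    meat = np.zeros((K, K))
    for g in np.unique(ids):
        rows = ids == g
        score = X[rows].T @ resid[rows]   # sum over t of x_it' * residual_it
        meat += np.outer(score, score)
    return beta, XtX_inv @ meat @ XtX_inv

# Hypothetical simulated panel: c_i is uncorrelated with x_it, so POLS is consistent
rng = np.random.default_rng(0)
N, T = 200, 5
ids = np.repeat(np.arange(N), T)
x = rng.normal(size=N * T)
c = np.repeat(rng.normal(size=N), T)
y = 1.0 + 2.0 * x + c + rng.normal(size=N * T)
X = np.column_stack([np.ones(N * T), x])

beta_hat, V_cluster = pols_cluster(y, X, ids)
se_cluster = np.sqrt(np.diag(V_cluster))
```

The loop sums the outer product of each unit's score, which is what allows arbitrary correlation of the composite error within $i$.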


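Panel Data: Random Effects

Estimate by FGLS

$$y_{it} = \beta_0 + x_{it}\beta + \tilde{\varepsilon}_{it}, \qquad \tilde{\varepsilon}_{it} = c_i + \varepsilon_{it}$$

This is identical to POLS except explicit structure is placed on the composite error $\tilde{\varepsilon}_{it}$

Assumptions

(RE.i) Strict exogeneity: $E[\varepsilon_{it} \mid x_i, c_i] = 0$, $t = 1, \ldots, T$ ($x_i$ stacks $x_{i1}, \ldots, x_{iT}$)
(RE.ii) Independence: $E[c_i \mid x_i] = E[c_i] = 0$
(RE.iii) Full rank: $\mathrm{rank}\, E[x_i' \Omega_i^{-1} x_i] = K$, where $\Omega_i = E[\tilde{\varepsilon}_i \tilde{\varepsilon}_i']$ is a $T \times T$ matrix
(RE.iv) Conditional homoskedasticity and zero covariances: $E[\varepsilon_i \varepsilon_i' \mid x_i, c_i] = \sigma_\varepsilon^2 I_T$
(RE.v) Homoskedasticity: $E[c_i^2 \mid x_i] = \sigma_c^2$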


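Estimation by Feasible GLS (FGLS)

Under (RE.i) to (RE.iii), FGLS uses an unrestricted estimate of $\Omega_i$

$$\hat{\beta}_{FGLS} = \left(\sum_i x_i' \hat{\Omega}_i^{-1} x_i\right)^{-1} \left(\sum_i x_i' \hat{\Omega}_i^{-1} y_i\right)$$

Using (RE.iv), (RE.v), $\Omega_i = \Omega(\theta)$ has a convenient form

$$\Omega(\theta) = \begin{bmatrix} \sigma_\varepsilon^2 + \sigma_c^2 & \sigma_c^2 & \cdots & \sigma_c^2 \\ \sigma_c^2 & \ddots & & \vdots \\ \vdots & & \ddots & \sigma_c^2 \\ \sigma_c^2 & \cdots & \sigma_c^2 & \sigma_\varepsilon^2 + \sigma_c^2 \end{bmatrix} \qquad \text{(RE structure)}$$

RE structure ⇒

$$\mathrm{Corr}(\tilde{\varepsilon}_{it}, \tilde{\varepsilon}_{is}) = \frac{\sigma_c^2}{\sigma_c^2 + \sigma_\varepsilon^2} = \frac{\mathrm{Var}(c)}{\mathrm{Var}(\tilde{\varepsilon})} > 0 \quad \forall t \neq s$$

where $\tilde{\varepsilon}_{it} = c_i + \varepsilon_{it}$ is the composite error; the ratio measures the fraction of the variance of the composite error due to the unobserved effect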

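Given estimates of $\sigma_c^2$ and $\sigma_\varepsilon^2$, estimator given by

$$\hat{\beta}_{RE} = \left(\sum_i x_i' \hat{\Omega}^{-1} x_i\right)^{-1} \left(\sum_i x_i' \hat{\Omega}^{-1} y_i\right)$$

where

$$\hat{\Omega} = \hat{\sigma}_\varepsilon^2 I_T + \hat{\sigma}_c^2 j_T j_T'$$

$I_T$ = $T \times T$ identity matrix

$j_T$ = $T \times 1$ vector of 1s

Notes
- Consistency does not require (RE.iv), (RE.v) to hold, just as OLS is consistent even when $\Omega \neq \sigma^2 I_T$
- Consistency of FGLS does require strict exogeneity such that

$$\operatorname{plim} \left(\sum_i x_i' \hat{\Omega}^{-1} x_i\right)^{-1} \left(\sum_i x_i' \hat{\Omega}^{-1} \tilde{\varepsilon}_i\right) = 0$$

  where $\tilde{\varepsilon}_i$ is the $T \times 1$ vector of composite errors for unit $i$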


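To estimate $\sigma_c^2$ and $\sigma_\varepsilon^2$
- Do POLS ⇒ $\hat{\beta}_{POLS}$
- Obtain residuals ⇒ $\hat{\tilde{\varepsilon}}_{it}$ (estimates of the composite error $c_i + \varepsilon_{it}$)
- Obtain estimates

$$\hat{\sigma}_{\tilde{\varepsilon}}^2 = \frac{1}{NT - K} \sum_i \sum_t \hat{\tilde{\varepsilon}}_{it}^2$$

$$\hat{\sigma}_c^2 = \frac{1}{\frac{NT(T-1)}{2} - K} \sum_i \sum_{t=1}^{T-1} \sum_{s=t+1}^{T} \hat{\tilde{\varepsilon}}_{it} \hat{\tilde{\varepsilon}}_{is}$$

$$\hat{\sigma}_\varepsilon^2 = \hat{\sigma}_{\tilde{\varepsilon}}^2 - \hat{\sigma}_c^2$$

If $\varepsilon_{it}$ is heteroskedastic and/or serially correlated, then

$$\hat{\Omega} = \frac{1}{N} \sum_i \hat{\tilde{\varepsilon}}_i \hat{\tilde{\varepsilon}}_i'$$

is more efficient asymptotically, but improved finite sample performance requires $N \gg T$

Note: RE estimator is asymptotically equivalent to PFGLS imposing equicorrelated errors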


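Can show that the RE estimator is equivalent to a quasi-time-demeaned estimator
- Define

$$\check{z}_{it} \equiv z_{it} - \lambda \bar{z}_{i\cdot}$$

  for any variable $z_{it}$ and some value of $\lambda$, where $\bar{z}_{i\cdot}$ is the average of $z$ within $i$
- The RE estimator may be obtained by POLS estimation of

$$\check{y}_{it} = \check{x}_{it}\beta + \check{\varepsilon}_{it}$$

  where $\check{\varepsilon}_{it}$ is the quasi-demeaned composite error, with

$$\lambda = 1 - \left[\frac{1}{1 + T\left(\sigma_c^2 / \sigma_\varepsilon^2\right)}\right]^{1/2}$$

- In contrast, POLS sets $\lambda = 0$
- $\lambda \neq 0$ ⇒ strict exogeneity is needed instead of (contemporaneous) exogeneity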
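The quasi-demeaning route to the RE estimator can be sketched as follows, assuming a balanced panel with rows sorted by unit and then time. The function names `re_lambda` and `random_effects` are hypothetical, the variance components are estimated from POLS residuals, and the truncation of a negative $\hat{\sigma}_c^2$ at zero is an added safeguard rather than part of the slides.

```python
import numpy as np

def re_lambda(sigma2_c, sigma2_e, T):
    """Quasi-demeaning weight: lambda = 1 - [1 / (1 + T * sigma2_c / sigma2_e)]^(1/2)."""
    return 1.0 - np.sqrt(1.0 / (1.0 + T * sigma2_c / sigma2_e))

def random_effects(y, X, N, T):
    """RE by POLS on quasi-demeaned data. y: (N*T,), X: (N*T, K) incl. a constant column."""
    K = X.shape[1]
    b_pols = np.linalg.lstsq(X, y, rcond=None)[0]
    e = (y - X @ b_pols).reshape(N, T)                 # composite-error residuals
    s2_comp = (e ** 2).sum() / (N * T - K)
    # sum over t < s of e_it * e_is equals ((sum_t e_it)^2 - sum_t e_it^2) / 2
    cross = ((e.sum(axis=1) ** 2 - (e ** 2).sum(axis=1)) / 2).sum()
    s2_c = max(cross / (N * T * (T - 1) / 2 - K), 0.0)  # truncation is an assumption
    s2_e = s2_comp - s2_c
    lam = re_lambda(s2_c, s2_e, T)

    def quasi_demean(z):
        z3 = z.reshape(N, T, -1)
        return (z3 - lam * z3.mean(axis=1, keepdims=True)).reshape(N * T, -1)

    b_re = np.linalg.lstsq(quasi_demean(X), quasi_demean(y).ravel(), rcond=None)[0]
    return b_re, lam

# Hypothetical simulated data satisfying the RE assumptions
rng = np.random.default_rng(0)
N, T = 300, 4
c = rng.normal(size=N)
x = rng.normal(size=(N, T))
y = (1.0 + 2.0 * x + c[:, None] + rng.normal(size=(N, T))).ravel()
X = np.column_stack([np.ones(N * T), x.ravel()])
b_re, lam = random_effects(y, X, N, T)
```

Note that the constant column is quasi-demeaned too, becoming $1 - \lambda$; setting `lam = 0` reproduces POLS and `lam = 1` reproduces the within transformation.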


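Specification test: RE vs. POLS

Hypothesis

$$H_0: \sigma_c^2 = 0 \qquad H_a: \sigma_c^2 \neq 0$$

Devising a test is difficult because the null lies on the edge of the feasible parameter space

Breusch-Pagan (1980) test

$$\lambda_{LM} = \frac{NT}{2(T-1)} \left[\frac{\sum_{i=1}^{N} \left(\sum_{t=1}^{T} \hat{\tilde{\varepsilon}}_{it}\right)^2}{\sum_{i=1}^{N} \sum_{t=1}^{T} \hat{\tilde{\varepsilon}}_{it}^2} - 1\right]^2 \sim \chi_1^2$$

where $\hat{\tilde{\varepsilon}}_{it}$ is the residual from POLS and $\sum_{t=1}^{T} \hat{\tilde{\varepsilon}}_{it} = T \bar{\hat{\tilde{\varepsilon}}}_{i\cdot}$, i.e., $T$ times the within-unit mean residual

Rejection of $H_0$ is indicative of serial correlation and not necessarily the RE structure per se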
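As a quick illustration, the LM statistic can be computed directly from a balanced $N \times T$ array of POLS residuals. The function name `breusch_pagan_lm` and the toy residuals are hypothetical; the 5% critical value of $\chi_1^2$ is roughly 3.84.

```python
import numpy as np

def breusch_pagan_lm(resid, N, T):
    """Breusch-Pagan LM statistic for H0: sigma_c^2 = 0 (balanced panel).

    resid: POLS residuals, length N*T, sorted by unit then time.
    """
    e = np.asarray(resid, dtype=float).reshape(N, T)
    ratio = (e.sum(axis=1) ** 2).sum() / (e ** 2).sum()
    return N * T / (2.0 * (T - 1)) * (ratio - 1.0) ** 2

# Residuals that are constant within each unit: strongest evidence of a unit effect
lm_constant = breusch_pagan_lm([1, 1, 1, -1, -1, -1], N=2, T=3)
# Residuals that sum to zero within each unit
lm_zero_sum = breusch_pagan_lm([1, -1, 2, -2], N=2, T=2)
```

When residuals are constant within each unit the ratio equals $T$, so the statistic reaches $NT(T-1)/2$.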


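A more robust test, which is valid for any distribution of the composite error, is based on

$$\frac{\sum_{i=1}^{N} \sum_{t=1}^{T-1} \sum_{s=t+1}^{T} \hat{\tilde{\varepsilon}}_{it} \hat{\tilde{\varepsilon}}_{is}}{\left[\sum_{i=1}^{N} \left(\sum_{t=1}^{T-1} \sum_{s=t+1}^{T} \hat{\tilde{\varepsilon}}_{it} \hat{\tilde{\varepsilon}}_{is}\right)^2\right]^{1/2}} \sim N(0, 1)$$

where $\hat{\tilde{\varepsilon}}_{it}$ is the POLS residual and rejection of $H_0$ again is indicative of serial correlation and not necessarily the RE structure per se

Wood (Biometrika, 2013) develops an alternative test

Baltagi & Li (1995) devise a test of serial correlation of $\varepsilon_{it}$ assuming $c$ and $\varepsilon$ are normally distributed; this is a test of the RE structure as $\varepsilon$ should be serially uncorrelated

Extension to the two-way error component model is most easily handled by adding $T - 1$ time dummies, although there is also a two-way RE error components model estimable via FGLS

Stata: -xtreg, re-, -xttest0-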


Panel Data: Fixed Effects

Model

$$y_{it} = \beta_0 + x_{it}\beta + \tilde{\varepsilon}_{it}, \qquad \tilde{\varepsilon}_{it} = c_i + \varepsilon_{it}$$

Alternative strategies transform the model to eliminate $c_i$
- Several possible transformations
- One commonly used approach is FE, also known as the within estimator or mean-differencing

Assumptions

(FE.i) = (RE.i) Strict exogeneity
(FE.ii) Full rank: $\mathrm{rank}\, E[\ddot{x}_i' \ddot{x}_i] = K$, where $\ddot{x}_i = x_i - \bar{x}_{i\cdot}$
(FE.iii) = (RE.iv) Conditional homoskedasticity and zero covariances


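Estimation

Structural model

$$y_{it} = c_i + x_{it}\beta + \varepsilon_{it}$$

where $\beta_0$ is subsumed into $c_i$

Averaging over $T$ time periods implies

$$\bar{y}_i = c_i + \bar{x}_i \beta + \bar{\varepsilon}_i$$

and

$$\bar{\bar{y}} = \bar{c} + \bar{\bar{x}} \beta + \bar{\bar{\varepsilon}}$$

where bars indicate the average over $T$ obs within group; double bars indicate the average over the entire sample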


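Taking differences yields

$$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$$

$$\ddot{y}_{it} = \ddot{x}_{it}\beta + \ddot{\varepsilon}_{it}$$

- Estimable by POLS without a constant
- Equivalent to our prior notation with $\lambda = 1$

Alternative estimating equation

$$\ddot{y}_{it} + \bar{\bar{y}} = \bar{c} + (\ddot{x}_{it} + \bar{\bar{x}})\beta + (\ddot{\varepsilon}_{it} + \bar{\bar{\varepsilon}})$$

$$\tilde{y}_{it} = \bar{c} + \tilde{x}_{it}\beta + \tilde{\varepsilon}_{it}$$

where, on this slide only, $\tilde{z}_{it} \equiv \ddot{z}_{it} + \bar{\bar{z}}$

- Estimable by POLS with a constant
- Constant is the mean unobserved effect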


Notes
- Although the estimating equation differs from the structural equation, interpret $\hat{\beta}_{FE}$ as the original $\beta$
- Transformation eliminates $c_i$ and any time invariant $x$'s; time invariant $x$'s invalidate (FE.ii)
- Estimates of unobserved effects are recoverable from FE estimation

$$\hat{c}_i = \bar{y}_i - \bar{x}_i \hat{\beta}_{FE}$$

  * The distribution of $\hat{c}$ can be examined to assess the amount of heterogeneity in the population
  * $\hat{c}_i$ is the mean of the composite residual $y_{it} - x_{it}\hat{\beta}_{FE}$ over $T$ for each $i$
  * $\hat{c}_i$ is unbiased, but consistency requires $T \to \infty$
  * Stata: -predict, u-
- Between estimator is obtained from OLS estimation of

$$\bar{y}_i = \bar{x}_i \beta + (c_i + \bar{\varepsilon}_i)$$

  which is only consistent under (RE.i) and (RE.ii)


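Inference

$$\widehat{\mathrm{Var}}\left(\hat{\beta}_{FE}\right) = \hat{\sigma}_{\ddot{\varepsilon}}^2 \left(\sum_i \sum_t \ddot{x}_{it}' \ddot{x}_{it}\right)^{-1}$$

$$\hat{\sigma}_{\ddot{\varepsilon}}^2 = \frac{1}{N(T-1) - K} \sum_i \sum_t \hat{\ddot{\varepsilon}}_{it}^2$$

- Differs from the usual POLS std errors since those would use $NT - K$ dof rather than $N(T-1) - K$
- Difference arises since std errors must account for the fact that $N$ dof are lost due to estimation of means
- Solution: Multiply std errors by the correction factor $\sqrt{\dfrac{NT - K}{N(T-1) - K}}$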

Stata: -xtreg, fe-, -areg-
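A minimal sketch of the within estimator with the degrees-of-freedom adjustment described above, assuming a balanced panel sorted by unit then time. The names `within` and `fixed_effects` and the simulated data (in which $c_i$ is deliberately correlated with $x_{it}$) are hypothetical.

```python
import numpy as np

def within(z, N, T):
    """Subtract unit-specific time means (balanced panel, rows sorted by unit then time)."""
    z3 = z.reshape(N, T, -1)
    return (z3 - z3.mean(axis=1, keepdims=True)).reshape(N * T, -1)

def fixed_effects(y, X, N, T):
    """Within estimator; X must not contain a constant or time invariant columns."""
    K = X.shape[1]
    yd, Xd = within(y, N, T).ravel(), within(X, N, T)
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    beta = XtX_inv @ Xd.T @ yd
    resid = yd - Xd @ beta
    sigma2 = resid @ resid / (N * (T - 1) - K)       # N dof lost to the unit means
    c_hat = y.reshape(N, T).mean(axis=1) - X.reshape(N, T, K).mean(axis=1) @ beta
    return beta, sigma2 * XtX_inv, c_hat

# Hypothetical data with Cov(c_i, x_it) != 0, where POLS/RE would be inconsistent
rng = np.random.default_rng(1)
N, T = 200, 5
c = rng.normal(size=N)
x = c[:, None] + rng.normal(size=(N, T))
y = (c[:, None] + 2.0 * x + rng.normal(size=(N, T))).ravel()
beta_fe, V_fe, c_hat = fixed_effects(y, x.reshape(-1, 1), N, T)
```

Running POLS on the demeaned data with the default $NT - K$ divisor and then rescaling by the correction factor gives the same standard errors as dividing by $N(T-1) - K$ directly, as done here.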


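Extension to the two-way error component model is most easily handled by adding $T - 1$ time dummies

If $T$ is too large, an alternative is to transform the model to eliminate the time effects, $\lambda_t$
- Model given by

$$\dddot{y}_{it} = \dddot{x}_{it}\beta + \dddot{\varepsilon}_{it}$$

  where

$$\dddot{z}_{it} = z_{it} - \bar{z}_i - \bar{z}_t + \bar{\bar{z}}$$

  for $z = y, x, \varepsilon$ (the triple-dot accent marks the two-way within transformation), estimable by POLS with dof correction
- Additional parameters estimated as

$$\hat{\bar{c}} = \bar{\bar{y}} - \bar{\bar{x}}\hat{\beta}$$

$$\hat{c}_i = (\bar{y}_i - \bar{\bar{y}}) - (\bar{x}_i - \bar{\bar{x}})\hat{\beta}$$

$$\hat{\lambda}_t = (\bar{y}_t - \bar{\bar{y}}) - (\bar{x}_t - \bar{\bar{x}})\hat{\beta}$$

- Stata: -felsdvreg-, -reghdfe-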


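Specification tests

$\varepsilon_{it}$ are serially correlated
- Not obvious since FE yields estimates of $\ddot{\varepsilon}_{it}$, not $\varepsilon_{it}$
- Moreover, $\ddot{\varepsilon}_{it}$ are negatively serially correlated under $H_0$: $\varepsilon_{it}$ is serially uncorrelated

$$\mathrm{Corr}(\ddot{\varepsilon}_{it}, \ddot{\varepsilon}_{is}) = -\frac{1}{T-1} \quad \forall s \neq t$$

- Test implemented by estimating via OLS

$$\hat{\ddot{\varepsilon}}_{iT} = \alpha_0 + \alpha_1 \hat{\ddot{\varepsilon}}_{iT-1} + \eta_i, \quad i = 1, \ldots, N$$

  and testing $H_0: \alpha_1 = -1/(T-1)$ using a standard t-test
- Presence of serial correlation invalidates the usual FE standard errors; cluster-robust std errors given by the sandwich form (the middle term is not inverted)

$$\widehat{\mathrm{Var}}\left(\hat{\beta}_{FE}\right) = \left(\ddot{x}'\ddot{x}\right)^{-1} \left(\sum_i \ddot{x}_i' \hat{\ddot{\varepsilon}}_i \hat{\ddot{\varepsilon}}_i' \ddot{x}_i\right) \left(\ddot{x}'\ddot{x}\right)^{-1}$$

  * See Cameron & Miller (JHR, 2015)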


(cont.)
- As an alternative to cluster-robust standard errors, the FE-FGLS estimator may be used
  * AR(1) model setup

$$y_{it} = c_i + \beta x_{it} + \varepsilon_{it}$$

$$\varepsilon_{it} = \rho \varepsilon_{it-1} + u_{it}$$

    where $|\rho| < 1$ and $u_{it} \sim WN(0, \sigma_u^2)$
  * Estimation proceeds by estimating $\rho$ and transforming the model to yield iid errors; the FE or RE estimator is then applied to the transformed model
  * See Baltagi & Li (1991), Baltagi & Wu (1999), Wooldridge (2002, pp. 276-8)
  * Stata: -xtregar-


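$\varepsilon_{it}$ are cross-sectionally correlated (aka cross-sectional dependence)

$$H_0: \mathrm{Cov}(\varepsilon_{it}, \varepsilon_{jt}) = 0 \ \ \forall i \neq j \qquad H_a: \mathrm{Cov}(\varepsilon_{it}, \varepsilon_{jt}) \neq 0 \ \text{for some } i \neq j$$

- $T > N$: Breusch-Pagan (1980) test

$$\lambda_{LM} = T \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \hat{\rho}_{ij}^2 \sim \chi_d^2$$

  where $d = N(N-1)/2$ and

$$\hat{\rho}_{ij} = \hat{\rho}_{ji} = \frac{\sum_{t=1}^{T} \hat{\varepsilon}_{it} \hat{\varepsilon}_{jt}}{\sqrt{\sum_{t=1}^{T} \hat{\varepsilon}_{it}^2} \sqrt{\sum_{t=1}^{T} \hat{\varepsilon}_{jt}^2}}$$

  * Intuition: compute the $N \times N$ correlation matrix; test significance of off-diagonal terms
  * Test does not have good statistical properties when $T < N$, and is likely to do worse as $N \to \infty$
  * See -xttest2- in Stata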


(cont.)
- A scaled version is appropriate asymptotically if $T \to \infty$ and then $N \to \infty$

$$\lambda_{SCLM} = \sqrt{\frac{1}{N(N-1)}} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left(T \hat{\rho}_{ij}^2 - 1\right) \sim N(0, 1)$$

  with an improved version developed in Pesaran et al. (2008)
- $T < N$: Pesaran (2004) test

$$\lambda_{CD} = \sqrt{\frac{2T}{N(N-1)}} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \hat{\rho}_{ij} \sim N(0, 1)$$

  as $N \to \infty$ for sufficiently large $T$
  * Stata: -xtcsd, pes-
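Both cross-sectional dependence statistics are simple functions of the pairwise residual correlations, as the following sketch shows. The function names are hypothetical, and the correlations use the uncentered form written above, which coincides with the usual correlation when each unit's residuals have mean zero.

```python
import numpy as np

def pairwise_rho(resid):
    """resid: (N, T) residual array; returns the N x N matrix of rho_ij (uncentered form)."""
    norms = np.sqrt((resid ** 2).sum(axis=1))
    return (resid @ resid.T) / np.outer(norms, norms)

def bp_csd_lm(resid):
    """Breusch-Pagan LM statistic: T * sum_{i<j} rho_ij^2, chi2 with N(N-1)/2 dof under H0."""
    N, T = resid.shape
    rho = pairwise_rho(resid)
    return T * (rho[np.triu_indices(N, k=1)] ** 2).sum()

def pesaran_cd(resid):
    """Pesaran CD statistic: sqrt(2T / (N(N-1))) * sum_{i<j} rho_ij, N(0,1) under H0."""
    N, T = resid.shape
    rho = pairwise_rho(resid)
    return np.sqrt(2.0 * T / (N * (N - 1))) * rho[np.triu_indices(N, k=1)].sum()

# Two units with identical residual paths: rho_12 = 1
same = np.array([[1.0, -1.0, 2.0, -2.0], [1.0, -1.0, 2.0, -2.0]])
cd_same, lm_same = pesaran_cd(same), bp_csd_lm(same)
# Two units with orthogonal residual paths: rho_12 = 0
cd_orth = pesaran_cd(np.array([[1.0, 0.0], [0.0, 1.0]]))
```

Because CD sums the correlations rather than their squares, positive and negative correlations can offset, which is one reason the two tests can disagree.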


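Groupwise heteroskedasticity
- Errors are homoskedastic within groups, heteroskedastic across groups
- For example, errors for a given individual have the same variance in each period, but each individual has a unique variance
- Structure:

$$\underbrace{\Sigma_i}_{T \times T} = \begin{bmatrix} \sigma_i^2 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \sigma_i^2 \end{bmatrix}; \qquad \underbrace{\Sigma}_{NT \times NT} = \begin{bmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_N \end{bmatrix}$$

- Hypothesis

$$H_0: \sigma_i^2 = \sigma^2 \ \ \forall i \qquad H_a: \sigma_i^2 \neq \sigma^2 \ \text{for some } i$$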


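(cont.)
- Modified Wald test statistic

$$W' = \sum_{i=1}^{N} \frac{\left(\hat{\sigma}_i^2 - \hat{\sigma}^2\right)^2}{V_i} \sim \chi_N^2$$

  where

$$V_i = \frac{1}{T-1} \sum_{t=1}^{T} \left(\hat{\varepsilon}_{it}^2 - \hat{\sigma}_i^2\right)^2, \qquad \hat{\sigma}_i^2 = \frac{1}{T} \sum_{t=1}^{T} \hat{\varepsilon}_{it}^2$$

- Notes
  * Valid in presence of non-normality
  * Lower power in 'large $N$, small $T$' FE models
- Solution: FGLS or robust standard errors
- Stata: -xttest3-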


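Hausman test: FE vs. RE ($H_0: \beta_{FE} = \beta_{RE}$)
- Intuition
  * If $\mathrm{Cov}(c_i, x_{it}) = 0$, then RE and FE are both consistent, but RE is more efficient ⇒ $\hat{\beta}_{RE} \approx \hat{\beta}_{FE}$
  * If $\mathrm{Cov}(c_i, x_{it}) \neq 0$, then RE is inconsistent, but FE is consistent ⇒ $\hat{\beta}_{RE} \neq \hat{\beta}_{FE}$
- Test statistic based on the difference $\hat{\beta}_{FE} - \hat{\beta}_{RE}$

$$H = T\left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)' \left(\hat{\Sigma}_{FE} - \hat{\Sigma}_{RE}\right)^{-1} \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right) \sim \chi_K^2$$

- If the test statistic is too large, then reject $\mathrm{Cov}(c_i, x_{it}) = 0$
- Issues:
  1. Does not perform well in small samples
  2. $\hat{\Sigma}_{FE} - \hat{\Sigma}_{RE}$ need not be positive definite (PD)
- Stata: -hausman-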


Alternative test: FE vs. RE
- Mundlak (1978) approach offers an alternative to FE; referred to as the correlated random effects (CRE) approach since dependence between $c$ and $x$ is directly modeled
- Structural model

$$y_{it} = c_i + x_{it}\beta + \varepsilon_{it}$$

$$E[c_i \mid x_i] = \bar{x}_i \gamma$$

  where we can write $c_i = \bar{x}_i \gamma + a_i$ in error form
  * Substitution implies

$$y_{it} = x_{it}\beta + \bar{x}_i\gamma + c_i - E[c_i \mid x_i] + \varepsilon_{it} = x_{it}\beta + \bar{x}_i\gamma + a_i + \varepsilon_{it}$$

    which is estimable by RE since $E[a_i + \varepsilon_{it} \mid x_i] = 0$
  * Time invariant regressors do not drop out, but the coefficient is $\beta + \gamma$
  * An equivalent Hausman test of FE vs. RE is given by $H_0: \gamma = 0$
- Approach is very useful in panel data LDV models (discussed later)
- Alternative to this alternative: replace $\bar{x}_i$ with $\{x_{is}\}_{s=1}^{T}$ (i.e., $\bar{x}_i\gamma$ becomes $\sum_s x_{is}\gamma_s$) [Chamberlain approach]
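A small numerical sketch of the Mundlak device, on hypothetical simulated data. For simplicity the augmented equation is estimated here by POLS rather than RE (a substitution made for brevity); in a balanced panel the coefficient on $x_{it}$ from this regression coincides with the FE estimate, which the example checks.

```python
import numpy as np

# Hypothetical balanced panel with c_i correlated with x_it
rng = np.random.default_rng(1)
N, T = 50, 4
c = rng.normal(size=N)
x = c[:, None] + rng.normal(size=(N, T))
y = c[:, None] + 2.0 * x + rng.normal(size=(N, T))

# Augment the regression with the unit means of x (one value per row (i, t))
xbar = np.repeat(x.mean(axis=1), T)
X_cre = np.column_stack([np.ones(N * T), x.ravel(), xbar])
b_cre = np.linalg.lstsq(X_cre, y.ravel(), rcond=None)[0]   # [constant, beta, gamma]

# Within (FE) estimator for comparison
xd = (x - x.mean(axis=1, keepdims=True)).ravel()
yd = (y - y.mean(axis=1, keepdims=True)).ravel()
b_fe = (xd @ yd) / (xd @ xd)
```

A test of whether `b_cre[2]` (the estimate of $\gamma$) is zero would need cluster-robust standard errors, since $a_i$ remains in the error.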


Panel Data: LSDV

Rather than eliminate $c_i$, treats $c_i$ as a parameter to be estimated

Not included as part of the error term

Assumptions are identical to (FE.i) to (FE.iii)

Estimation

$$y_{it} = \sum_{j=1}^{N} c_j D_{ji} + x_{it}\beta + \varepsilon_{it}$$

where

$$D_{ji} = \begin{cases} 1 & \text{if } j = i \\ 0 & \text{otherwise} \end{cases}$$

- Amounts to including $N$ dummy vars (and no constant), 1 for each unit of observation
- Estimated by POLS


$\hat{\beta}_{LSDV}$ is consistent even if $\mathrm{Cov}(c_i, x_{it}) \neq 0$ (regressors can always be correlated)
- Time invariant regressors still drop out since they are perfectly collinear with the observation dummies

$\hat{c}_{i,LSDV}$ is unbiased (and $= \hat{c}_{i,FE}$), but consistency requires $T \to \infty$

Only feasible computationally if $N$ is of reasonable size

Stata: -areg- (or use Stata's factor vars syntax)
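The numerical equivalence of LSDV and FE can be checked directly on hypothetical simulated data. The dummy matrix is built with a Kronecker product, which assumes a balanced panel with rows sorted by unit then time.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 30, 4
c = rng.normal(size=N)
x = c[:, None] + rng.normal(size=(N, T))
y = c[:, None] + 2.0 * x + rng.normal(size=(N, T))

# LSDV: N unit dummies (no constant) plus the regressor, estimated by POLS
D = np.kron(np.eye(N), np.ones((T, 1)))          # (N*T, N); row (i, t) has a 1 in column i
coef = np.linalg.lstsq(np.column_stack([D, x.ravel()]), y.ravel(), rcond=None)[0]
c_lsdv, b_lsdv = coef[:N], coef[N]

# FE: within transformation, then recover c_i from the unit means
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_fe = (xd * yd).sum() / (xd ** 2).sum()
c_fe = y.mean(axis=1) - x.mean(axis=1) * b_fe
```

The dummy matrix has $N$ columns, which is why the approach becomes computationally costly as $N$ grows while the within transformation does not.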


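Panel Data: First-Differencing

An alternative transformation to FE for eliminating $c_i$

Assumptions

(FD.i) = (FE.i)
(FD.ii) Full rank: $\mathrm{rank} \sum_{t=2}^{T} E[\Delta x_{it}' \Delta x_{it}] = K$, where $\Delta$ represents the change from the preceding period
(FD.iii) Conditional homoskedasticity and zero covariances:

$$E[\Delta\varepsilon_i \Delta\varepsilon_i' \mid x_{i1}, \ldots, x_{iT}, c_i] = \sigma_{\Delta\varepsilon}^2 I_{T-1}$$

Final assumption holds if $\varepsilon_{it}$ follows a random walk

$$\varepsilon_{it} = \varepsilon_{it-1} + \eta_{it}$$

where $E[\eta_i \eta_i' \mid x_{i1}, \ldots, x_{iT}, c_i] = \sigma_\eta^2 I_{T-1}$, i.e., $\eta_{it}$ is white noise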


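Estimation

Structural model

$$y_{it} = c_i + x_{it}\beta + \varepsilon_{it}$$

Implies

$$\begin{aligned} y_{i1} &= c_i + x_{i1}\beta + \varepsilon_{i1} \\ &\ \ \vdots \\ y_{iT} &= c_i + x_{iT}\beta + \varepsilon_{iT} \end{aligned}$$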


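Taking differences between consecutive periods yields

$$\begin{aligned} y_{i2} - y_{i1} &= (x_{i2} - x_{i1})\beta + (\varepsilon_{i2} - \varepsilon_{i1}) \\ &\ \ \vdots \\ y_{iT} - y_{iT-1} &= (x_{iT} - x_{iT-1})\beta + (\varepsilon_{iT} - \varepsilon_{iT-1}) \end{aligned}$$

or,

$$\begin{aligned} \Delta y_{i2} &= \Delta x_{i2}\beta + \Delta\varepsilon_{i2} \\ &\ \ \vdots \\ \Delta y_{iT} &= \Delta x_{iT}\beta + \Delta\varepsilon_{iT} \end{aligned}$$

Estimating equation

$$\Delta y_{it} = \Delta x_{it}\beta + \Delta\varepsilon_{it}, \quad i = 1, \ldots, N; \ t = 2, \ldots, T$$

which is estimable by POLS

Inference does not require a dof adjustment since estimation is based on only $N(T-1)$ observations

Cluster-robust std errors are obtained in a similar fashion as in FE

Stata: -reg- and the time series operators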

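Notes
- Interpretation of $\hat{\beta}_{FD}$ is the same as the original $\beta$
- Differencing eliminates $c_i$ and any time invariant $x$'s
- Estimates of unobserved effects are recoverable from FD estimation

$$\hat{c}_{i,FD} = \bar{y}_i - \bar{x}_i \hat{\beta}_{FD}$$

  but consistency requires $T \to \infty$

Testing for serial correlation
- Estimate FD model ⇒ $\widehat{\Delta\varepsilon}_{it}$
- Estimate via OLS

$$\widehat{\Delta\varepsilon}_{it} = \rho\, \widehat{\Delta\varepsilon}_{it-1} + \upsilon_{it}, \quad t = 3, \ldots, T; \ i = 1, \ldots, N$$

- If $\varepsilon_{it}$ is serially uncorrelated, then $\Delta\varepsilon_{it}$ will be serially correlated with $\mathrm{Corr}(\Delta\varepsilon_{it}, \Delta\varepsilon_{it-1}) = -0.5$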
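The FD estimator and the residual-based serial correlation check can be sketched as follows for a single regressor. The function name and the simulated data are hypothetical; the idiosyncratic error is white noise in levels, so the estimated $\rho$ should be near $-0.5$.

```python
import numpy as np

def first_difference(y, x):
    """y, x: (N, T) arrays (single regressor). Returns the FD slope and the FD residuals."""
    dy, dx = np.diff(y, axis=1), np.diff(x, axis=1)   # N x (T-1) arrays of changes
    beta = (dx * dy).sum() / (dx ** 2).sum()          # POLS on differences, no constant
    return beta, dy - beta * dx

rng = np.random.default_rng(3)
N, T = 2000, 6
c = rng.normal(size=N)
x = c[:, None] + rng.normal(size=(N, T))
y = c[:, None] + 2.0 * x + rng.normal(size=(N, T))    # white-noise error in levels

b_fd, d_resid = first_difference(y, x)

# Regress the FD residual on its own lag (t = 3, ..., T), no constant
lead, lag = d_resid[:, 1:], d_resid[:, :-1]
rho_hat = (lead * lag).sum() / (lag ** 2).sum()
```

An estimate of $\rho$ close to zero instead would point toward a random walk in $\varepsilon_{it}$, the case in which (FD.iii) holds.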


Panel Data: Comparison of Estimators

POLS vs RE vs FE/FD/LSDV
- POLS/RE require $E[x_{it}' c_i] = 0$ $\forall t$; FE/FD/LSDV do not
- RE/FE/FD/LSDV require $E[x_{is}' \varepsilon_{it}] = 0$ $\forall s$; POLS does not
  ⇒ Little apparent advantage to RE

$T = 2$ ⇒ LSDV, FE, and FD are identical

$T > 2$ ⇒ FD differs, but all unbiased
- Under (FE.i) to (FE.iii), FE is efficient
- Under (FD.i) to (FD.iii), FD is efficient
  * Note: FE.iii assumes errors are iid in levels, FD.iii assumes errors are iid in differences (and hence may be serially correlated in levels) ... very different assumptions

Long differences (LD) used in some instances; less efficient, but may be less susceptible to mis-specification bias

Mis-specification may be indicated if FE, FD, and LD yield large discrepancies (e.g., measurement error (ME), timing issues)
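The $T = 2$ equivalence of FE and FD is easy to confirm numerically. The simulated data below are hypothetical; with two periods the within-demeaned values are just $\pm \Delta z_i / 2$, so the two slope formulas coincide.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 100, 2
c = rng.normal(size=N)
x = c[:, None] + rng.normal(size=(N, T))
y = c[:, None] + 2.0 * x + rng.normal(size=(N, T))

# First differences (one observation per unit when T = 2)
dx, dy = x[:, 1] - x[:, 0], y[:, 1] - y[:, 0]
b_fd = (dx @ dy) / (dx @ dx)

# Within transformation
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_fe = (xd * yd).sum() / (xd ** 2).sum()
```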


Panel Data: Strict Exogeneity

Assumption of strict exogeneity is crucial in FE, FD models

Specification test
- FE model
  * Estimate via FE

$$y_{it} = c_i + x_{it}\beta + z_{it+1}\delta + \varepsilon_{it}$$

    where $z \subseteq x$
  * $H_0$: strict exogeneity ⇔ $H_0: \delta = 0$
- FD model
  * Estimate via POLS

$$\Delta y_{it} = \Delta x_{it}\beta + z_{it}\delta + \Delta\varepsilon_{it}$$

    where $z \subseteq x$
  * $H_0$: strict exogeneity ⇔ $H_0: \delta = 0$


Panel Data: Goodness of Fit

Three measures of goodness of fit are commonly reported

Within $R^2$: $\mathrm{Corr}[(y_{it} - \bar{y}_i), (x_{it}\hat{\beta} - \bar{x}_i\hat{\beta})]^2$

Between $R^2$: $\mathrm{Corr}[\bar{y}_i, \bar{x}_i\hat{\beta}]^2$

Overall $R^2$: $\mathrm{Corr}[y_{it}, x_{it}\hat{\beta}]^2$

Each measure is based on the squared correlation between the actual and fitted values


Panel Data: Long Panels

Long panels entail $T > N$ and asymptotics rely on $T \to \infty$ for fixed $N$

Estimation can easily be handled using the LSDV approach

Thus, focus is on specification of the form of $\Omega$ ⇒ POLS or PFGLS
- Arbitrary serial correlation in errors is not possible ⇒ need to specify the form of $\Omega$
- In addition, one can relax the assumption of independence across $i$

Stata: -xtpcse-, -xtgls-, -xtscc-


Panel Data: Measurement Error

Panel data introduces several complications as it relates to measurement error (ME) in covariates

1. In POLS, ME and omitted variable bias (OVB) may partially offset
2. In FE and FD,
  * bias from ME may be large if covariates are highly persistent
  * bias from ME affects estimators differently
3. ME may be nonclassical in that it is also persistent

Do not ignore ME in panel models


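Model setup

$$y_{it} = c_i + \beta x_{it}^* + \varepsilon_{it}$$

$$x_{it} = x_{it}^* + \upsilon_{it}$$

Assuming classical ME and $x^*$ to be stationary, one can show
- For POLS

$$\operatorname{plim} \hat{\beta} = \beta \underbrace{\left[1 - \frac{\sigma_\upsilon^2}{\sigma_x^2}\right]}_{\text{Attenuation Bias}} + \underbrace{\frac{\sigma_{xc}}{\sigma_x^2}}_{\text{OV Bias}}$$

- If $\mathrm{sgn}(\beta) = \mathrm{sgn}(\sigma_{xc})$, then attenuation and omitted variable bias at least partially offset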


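Assuming classical ME and $x^*$ to be stationary, one can show (McKinnish 2008)
- For differenced estimators

$$\operatorname{plim} \hat{\beta} = \beta \left[\frac{\sigma_{x^*}^2 (1 - \rho_j)}{\sigma_{x^*}^2 (1 - \rho_j) + \sigma_\upsilon^2}\right]$$

  where $\rho_j = \mathrm{Corr}(x_{it}^*, x_{it-j}^*)$
- For FE estimator

$$\operatorname{plim} \hat{\beta} = \beta\, \frac{\sigma_{x^*}^2 - \frac{1}{T^2}\left[T + 2\sum_{j=1}^{T-1} (T - j - 1)\rho_j\right]}{\sigma_{x^*}^2 - \frac{1}{T^2}\left[T + 2\sum_{j=1}^{T-1} (T - j - 1)\rho_j\right] + \left(\frac{T-1}{T}\right)\sigma_\upsilon^2}$$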


Griliches & Hausman (1986) verify that if $x_{it}^*$ has a declining correlogram, then $\rho_1 > \rho_2 > \ldots$ and
- FD has the largest bias
- Longer differences reduce the attenuation bias
- $T - 1$ long difference has the smallest bias
- Relative bias of long differences shorter than $T - 1$ and the FE estimator is ambiguous; it depends on specifics of the correlation structure
- Relative bias of POLS and FE/FD/LD is unknown
- Longer differences result in a loss of dof and thus an efficiency loss


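Serial correlation in ME

$$y_{it} = c_i + \beta x_{it}^* + \varepsilon_{it}$$

$$x_{it} = x_{it}^* + \upsilon_{it}$$

$$x_{it}^* = \rho x_{it-1}^* + \eta_{it}$$

$$\upsilon_{it} = \delta \upsilon_{it-1} + \xi_{it}$$

where $\rho$ is the degree of persistence in $x^*$ and $\delta$ is the degree of persistence in $\upsilon$
- For differenced estimators

$$\operatorname{plim} \hat{\beta} = \beta \left[\frac{\sigma_{x^*}^2 (1 - \rho_j)}{\sigma_{x^*}^2 (1 - \rho_j) + \sigma_\upsilon^2 (1 - \delta_j)}\right]$$

  where $\rho_j = \mathrm{Corr}(x_{it}^*, x_{it-j}^*)$ and $\delta_j = \mathrm{Corr}(\upsilon_{it}, \upsilon_{it-j})$
- If $\delta < \rho$, then the pattern for bias remains the same: longer differences ⇒ smaller bias
- If $\delta > \rho$, then the pattern for bias reverses: FD ⇒ smallest bias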
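To see the two patterns numerically, the attenuation factor $\operatorname{plim}\hat{\beta}/\beta$ can be tabulated for differences of length $j$. The sketch assumes the AR(1) processes above, so that $\rho_j = \rho^j$ and $\delta_j = \delta^j$; the function name and the variance values are hypothetical.

```python
def attenuation_factor(j, rho, delta, var_x=1.0, var_me=0.5):
    """plim(beta_hat) / beta for a j-period difference.

    Assumes AR(1) persistence, so rho_j = rho**j and delta_j = delta**j.
    var_x is the variance of the true regressor, var_me that of the ME.
    """
    signal = var_x * (1.0 - rho ** j)
    noise = var_me * (1.0 - delta ** j)
    return signal / (signal + noise)

# Classical (serially uncorrelated) ME with a persistent regressor: delta < rho
persistent_x = [attenuation_factor(j, rho=0.8, delta=0.0) for j in (1, 2, 3, 4)]
# ME more persistent than the regressor: delta > rho
persistent_me = [attenuation_factor(j, rho=0.5, delta=0.9) for j in (1, 2, 3, 4)]
```

In the first list the factor rises with $j$ (longer differences are less attenuated); in the second it falls, so FD is the least biased.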


Consistent estimation in the presence of ME requires IV
- Assuming
  * No serial correlation in the ME,
  * Strict exogeneity of $x_{it}$ and $x_{it}^*$ conditional on $c_i$ with respect to $\varepsilon_{it}$, and
  * Strict exogeneity of $x_{it}^*$ conditional on $c_i$ with respect to $\upsilon_{it}$,

  we can substitute $x_{it}^* = x_{it} - \upsilon_{it}$ and FD to eliminate $c_i$

$$\Delta y_{it} = \beta \Delta x_{it} + \Delta\varepsilon_{it} - \beta \Delta\upsilon_{it}$$

- Possible instruments for $\Delta x_{it}$ include $x_{it-2}$ or $x_{it+1}$ or external instruments


Panel Data: Contemporaneous Correlation Between Covariates and the Error

Measurement error or omitted variables can lead to contemporaneous correlation between $x$ and $\varepsilon$

If $c$ is also correlated with $x$, begin by first-differencing to eliminate $c$

$$\Delta y_{it} = \Delta x_{it}\beta + \Delta\varepsilon_{it}$$

POLS applied to this equation is biased if $\mathrm{Corr}(\Delta x, \Delta\varepsilon) \neq 0$

IV methods remain consistent; in addition to finding typical instruments, $z_{it}$, from outside the model, other possibilities include
- $x_{it-2}$, $x_{it+1}$
- $y_{it-2}$

While popular, such instruments may be weak, invalid in the presence of serial correlation, or hard to justify (Reed 2015)

Stata: -xtivreg2-


Panel data can lead to other 'tricks' that may generate instruments

Example: Pitt and Rosenzweig (1990) siblings FE model
- Sample restricted to households (hh's) with 1 boy, 1 girl
- Interested in the effect of an endogenous hh level variable, $D_h$
- No outside instruments available
- Solution

$$y_{bh} = x_{1h}\beta_b + x_{2h}\gamma + \tau_b D_h + \varepsilon_{bh}$$

$$y_{gh} = x_{1h}\beta_g + x_{2h}\gamma + \tau_g D_h + \varepsilon_{gh}$$

$$\Rightarrow \Delta y_h = x_{1h}\Delta\beta + \Delta\tau D_h + \Delta\varepsilon_h$$

  and now $x_{2h}$ are available as instruments for $D_h$, but the model only identifies $\Delta\tau$ (the differential impact of $D$ on boys relative to girls)


Panel Data: Time Invariant Regressors

FE, FD estimators preclude estimation of coefficients on time invariant regressors

One solution is to use a two-step estimator
- Model setup

$$y_{it} = c_i + x_{it}\beta + z_i\alpha + \varepsilon_{it}$$

  where $z_i$ is a vector of time invariant regressors
- Estimation
  * Step #1: Estimate via FE or FD

$$y_{it} = c_i + x_{it}\beta + \varepsilon_{it}$$

    yielding a consistent estimate of $\beta$ and an unbiased estimate of $c_i$
  * Step #2: Estimate via OLS (or WLS)

$$\hat{c}_i = \alpha_0 + z_i\alpha + \upsilon_i$$

    (Hornstein & Greene 2012)
- $\hat{\alpha}$ is consistent if $E[\upsilon \mid z] = E[\upsilon]$; strong assumption on $z$


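Alternative solution: Hausman & Taylor (1981)

Re-write above model as

$$y_{it} = c_i + x_{1it}\beta_1 + x_{2it}\beta_2 + z_{1i}\alpha_1 + z_{2i}\alpha_2 + \varepsilon_{it}$$

such that
- $x_{1it}$ = vector of $K_1$ vars uncorrelated with $c_i$
- $x_{2it}$ = vector of $K_2$ vars correlated with $c_i$
- $z_{1i}$ = vector of $L_1$ vars uncorrelated with $c_i$
- $z_{2i}$ = vector of $L_2$ vars correlated with $c_i$

Assumption about the unobservables

$$E[c_i \mid x_{1it}, z_{1i}] = 0; \qquad E[c_i \mid x_{2it}, z_{2i}] \neq 0$$

$$\mathrm{Var}(c_i \mid x_{1it}, z_{1i}, x_{2it}, z_{2i}) = \sigma_c^2$$

$$\mathrm{Cov}(\varepsilon_{it}, c_i \mid x_{1it}, z_{1i}, x_{2it}, z_{2i}) = 0$$

$$\mathrm{Var}(\varepsilon_{it} + c_i \mid x_{1it}, z_{1i}, x_{2it}, z_{2i}) = \sigma^2 = \sigma_\varepsilon^2 + \sigma_c^2$$

$$\mathrm{Corr}(\varepsilon_{it} + c_i, \varepsilon_{is} + c_i \mid x_{1it}, z_{1i}, x_{2it}, z_{2i}) = \rho = \sigma_c^2 / \sigma^2$$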


Estimation
- POLS or GLS is biased and inconsistent
- FE or FD precludes estimation of $\alpha$
- Solution is to use an IV estimator based only on the data at hand
  * Mean-differencing yields

$$\ddot{y}_{it} = \ddot{x}_{1it}\beta_1 + \ddot{x}_{2it}\beta_2 + \ddot{\varepsilon}_{it}$$

    which could be estimated
  * This shows that $\ddot{x}_1$ and $\ddot{x}_2$ are uncorrelated with $c$ and are thus available as instruments for the equation in levels
  * $z_1$ serves as instruments for itself
  * $x_1$ serves as instruments for $z_2$
  * $\ddot{x}$ serve as instruments for $x$
- Identification requires $K_1 \geq L_2$
- IV regression must account for the treatment of $c$ as a RE ⇒ IV-FGLS solution
- Weak IVs may still be a problem

Stata: -xthtaylor-


Panel Data: Dynamic Panel Model (DPD)

Structural model

$$y_{it} = c_i + x_{it}\beta + \gamma y_{it-1} + \varepsilon_{it}, \quad |\gamma| < 1$$

where $i = 1, \ldots, N$; $t = 2, \ldots, T$; and $T > 3$

Assumptions

(DPD.i) = (FD.i) Strict exogeneity
(DPD.ii) = (FD.ii) Full rank
(DPD.iii) = (FD.iii) Conditional homoskedasticity and zero covariances
(DPD.iv) Initial condition: $E[y_{i1}\varepsilon_{it}] = 0$ $\forall t > 1$

Interpretation
- $\beta$ = short-run (one-period) effect of a unit change in $x$
- $\beta/(1 - \gamma)$ = long-run effect of a unit change in $x$
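The long-run effect follows from tracing a permanent unit change in $x$ through the lagged dependent variable. As a short derivation, holding $x$ fixed at its new level and setting the errors to zero, the cumulative effect after $h$ periods is a geometric sum, which converges because $|\gamma| < 1$:

$$\beta\left(1 + \gamma + \gamma^2 + \cdots + \gamma^h\right) = \beta\,\frac{1 - \gamma^{h+1}}{1 - \gamma} \;\longrightarrow\; \frac{\beta}{1 - \gamma} \quad \text{as } h \to \infty$$

Equivalently, solving the steady-state condition $y^* = c + x\beta + \gamma y^*$ gives $y^* = (c + x\beta)/(1 - \gamma)$, so $\partial y^* / \partial x = \beta/(1 - \gamma)$. For illustration, $\beta = 0.2$ and $\gamma = 0.5$ would give a long-run effect of $0.4$.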


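Comments

Even if $\mathrm{Cov}(c_i, x_{it}) = 0$, POLS/RE not applicable since $\mathrm{Cov}(c_i, y_{it-1}) \neq 0$

Even if $\mathrm{Cov}(c_i, x_{it}) = 0$, FE/FD not applicable since $E[y_{it-1}\varepsilon_{it-1}] \neq 0$ ⇒ $y_{it-1}$ is not strictly exogenous
- Fixed Effects
  * Nickell (1981) derived the formula for the bias of $\hat{\gamma}_{FE}$
  * Bias → 0 as $T \to \infty$ since $\mathrm{Cov}(\ddot{y}_{it-1}, \ddot{\varepsilon}_{it}) \to 0$ as $T \to \infty$
- FD
  * Model

$$\Delta y_{it} = \Delta x_{it}\beta + \gamma \Delta y_{it-1} + \Delta\varepsilon_{it}, \quad i = 1, \ldots, N; \ t = 3, \ldots, T$$

    which is not consistently estimable by POLS since $\mathrm{Cov}(\Delta y_{it-1}, \Delta\varepsilon_{it}) \neq 0$, which follows from $\mathrm{Cov}(y_{it-1}, \varepsilon_{it-1}) \neq 0$
  * Bias does not → 0 as $T \to \infty$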


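For both FE/FD, bias does not → 0 as $N \to \infty$ ⇒ FE/FD are inconsistent under large $N$ asymptotics

If $\mathrm{Cov}(x_{it}, c_i) = \mathrm{Cov}(x_{it}, \varepsilon_{it}) = 0$ $\forall x$, then

$$\hat{\gamma}_{POLS} > \gamma > \hat{\gamma}_{FE} > \hat{\gamma}_{FD}$$

since

$$\mathrm{Cov}(y_{it-1}, c_i + \varepsilon_{it}) > 0 > \mathrm{Cov}(\ddot{y}_{it-1}, \ddot{\varepsilon}_{it}) > \mathrm{Cov}(\Delta y_{it-1}, \Delta\varepsilon_{it})$$

- Thus, estimators bound the true value
- Not overly useful though since we usually are interested in $\beta$ as well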


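Solution #1 (Anderson & Hsiao 1981, 1982)

FD, then estimate via instrumental variables (IV), treating $\Delta y_{it-1}$ as endogenous

Instruments: $y_{it-2}$, $x_{it-2}$
- $\mathrm{Cov}(\Delta y_{it-1}, y_{it-2}) = \mathrm{Cov}(y_{it-1}, y_{it-2}) - \mathrm{Cov}(y_{it-2}, y_{it-2}) \neq 0$
- $\mathrm{Cov}(\Delta y_{it-1}, x_{it-2}) = \mathrm{Cov}(y_{it-1}, x_{it-2}) - \mathrm{Cov}(y_{it-2}, x_{it-2}) \neq 0$
- $\mathrm{Cov}(\Delta\varepsilon_{it}, y_{it-2}) = \mathrm{Cov}(\varepsilon_{it}, y_{it-2}) - \mathrm{Cov}(\varepsilon_{it-1}, y_{it-2}) = 0$ if $\varepsilon$ is not serially correlated
- $\mathrm{Cov}(\Delta\varepsilon_{it}, x_{it-2}) = \mathrm{Cov}(\varepsilon_{it}, x_{it-2}) - \mathrm{Cov}(\varepsilon_{it-1}, x_{it-2}) = 0$ if $x$ is predetermined (weaker than strictly exogenous)

Model is overidentified

Anderson & Hsiao (1981) also suggest $\Delta y_{it-3}$ as an instrument

$\hat{\gamma}_{IV}$ is consistent as $N \to \infty$ for fixed $T$ as long as $T > 3$ under (DPD.i) to (DPD.iv)

Note: As $\gamma \to 1$, $y_{it-2}$, $x_{it-2}$ become very weak instruments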
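A small simulation can illustrate both the ordering of the inconsistent estimators and the Anderson-Hsiao fix. This is a sketch under simplifying assumptions: no $x$'s, a single instrument $y_{it-2}$, hypothetical parameter values, and a burn-in period so that the process is approximately stationary.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, burn, gamma = 5000, 6, 50, 0.5
c = 0.5 * rng.normal(size=N)

# Simulate y_it = c_i + gamma * y_it-1 + eps_it and keep the last T periods
y_all = np.empty((N, T + burn))
y_all[:, 0] = c / (1 - gamma) + rng.normal(size=N)
for t in range(1, T + burn):
    y_all[:, t] = c + gamma * y_all[:, t - 1] + rng.normal(size=N)
y = y_all[:, -T:]

dy = np.diff(y, axis=1)                              # dy[:, k] = y_{k+1} - y_k
dep, reg, inst = dy[:, 1:], dy[:, :-1], y[:, :-2]    # dy_t, dy_{t-1}, y_{t-2}

g_ah = (inst * dep).sum() / (inst * reg).sum()       # IV with y_{t-2} as the instrument
g_fd = (reg * dep).sum() / (reg ** 2).sum()          # FD by POLS (inconsistent)

cur, lag = y[:, 1:], y[:, :-1]
cur_d = cur - cur.mean(axis=1, keepdims=True)
lag_d = lag - lag.mean(axis=1, keepdims=True)
g_fe = (lag_d * cur_d).sum() / (lag_d ** 2).sum()    # within estimator (Nickell bias)

lag_c = lag - lag.mean()
g_pols = (lag_c * (cur - cur.mean())).sum() / (lag_c ** 2).sum()   # POLS with a constant
```

Under this design the IV estimate should land near the true $\gamma$, while POLS lies above it and FE and FD lie below it, in that order.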


Solution #2 (Arellano & Bond 1991)

FD, then estimate via IV, treating $\Delta y_{it-1}$ as endogenous as above

However, there are LOTS of potential instruments not considered in the A&H method (if $T > 3$)

What are potential IVs?
- Need vars that are correlated with $\Delta y_{it-1}$, uncorrelated with $\Delta\varepsilon_{it}$
- Suitable candidates
  * $x_{it-2}$ (through $y_{it-2}$)
  * $y_{it-2}$ (through $y_{it-2}$)
  * $y_{it-3}, y_{it-4}, y_{it-5}, \ldots$ (through autoregressive process) ... e.g.,

$$\mathrm{Cov}(\Delta y_{it-1}, y_{it-3}) = \mathrm{Cov}(y_{it-1}, y_{it-3}) - \mathrm{Cov}(y_{it-2}, y_{it-3}) \neq 0$$

- Lots of instruments (beware of weak IVs)
- Not a standard IV/TSLS setup since the # of instruments varies by observation (by $t$)


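Estimation by GMM to utilize more instruments

Efficient given (DPD.ii) to (DPD.iv)

Writing out the model for each period yields

$$\begin{aligned} \Delta y_{i3} &= \Delta x_{i3}\beta + \gamma \Delta y_{i2} + \Delta\varepsilon_{i3} \\ \Delta y_{i4} &= \Delta x_{i4}\beta + \gamma \Delta y_{i3} + \Delta\varepsilon_{i4} \\ &\ \ \vdots \\ \Delta y_{iT} &= \Delta x_{iT}\beta + \gamma \Delta y_{iT-1} + \Delta\varepsilon_{iT} \end{aligned}$$

where IVs for $\Delta y_{i2}$ are $x_{i1}, y_{i1}$; IVs for $\Delta y_{i3}$ are $x_{i2}, y_{i2}, y_{i1}$; ... ; IVs for $\Delta y_{iT-1}$ are $x_{iT-2}, y_{iT-2}, \ldots, y_{i2}, y_{i1}$

GMM allows moment conditions to be derived using as many IVs as desired

Requires $\varepsilon_{it}$ to be serially uncorrelated; or, equivalently, $\Delta\varepsilon_{it}$ should display first-order, but no higher-order, serial correlation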


ASIDE: GMM/MDE details

MLE is efficient among consistent and asymptotically normal (CAN) estimators in the context of the specified parametric model

GMM estimators are robust to some variations in the underlying DGP; less efficient than MLE if parametric assumptions hold

Both are examples of M-estimators

Method of Moments (MM) intuition
- With random sampling, a sample statistic (a moment) converges in probability to a constant
- This constant is a function of unknown population parameters
- Thus, to estimate $K$ parameters ($\theta_k$, $k = 1, \ldots, K$) requires $K$ statistics with known plims
- Equating the $K$ moments to the $K$ plims, these are inverted to solve for $\theta$ as functions of the moments
- Moments are CAN ⇒ estimator of $\theta$ will also be CAN


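Example: $y_i \sim N(\mu, \sigma^2)$
- Given a random sample, two moment conditions given by

$$m_1 = \operatorname{plim} \frac{1}{N} \sum_i y_i = \mu$$

$$m_2 = \operatorname{plim} \frac{1}{N} \sum_i y_i^2 = \sigma^2 + \mu^2$$

- Inversion yields

$$\hat{\mu} = m_1 = \bar{y}$$

$$\hat{\sigma}^2 = m_2 - m_1^2 = \left(\frac{1}{N} \sum_i y_i^2\right) - \left(\frac{1}{N} \sum_i y_i\right)^2 = \frac{1}{N} \sum_i (y_i - \bar{y})^2$$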
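The inversion in this example amounts to two lines of arithmetic, sketched below on a hypothetical four-observation sample.

```python
def normal_mm(sample):
    """Method of moments for N(mu, sigma^2): invert the first two raw sample moments."""
    n = len(sample)
    m1 = sum(sample) / n
    m2 = sum(v * v for v in sample) / n
    return m1, m2 - m1 ** 2          # (mu_hat, sigma2_hat)

sample = [1.0, 2.0, 3.0, 4.0]
mu_hat, sigma2_hat = normal_mm(sample)

# The same variance estimate written as the mean squared deviation
msd = sum((v - mu_hat) ** 2 for v in sample) / len(sample)
```

Note the divisor is $N$, not $N - 1$, so the MM variance estimator matches the MLE rather than the unbiased sample variance.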


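General case for MM estimator
- $K$ moment conditions

$$\begin{aligned} m_1 &= \mu_1(\theta_1, \ldots, \theta_K) \\ &\ \ \vdots \\ m_K &= \mu_K(\theta_1, \ldots, \theta_K) \end{aligned}$$

- Assuming the $K$ equations are functionally independent, inversion yields

$$\begin{aligned} \hat{\theta}_1 &= \hat{\theta}_1(m_1, \ldots, m_K) \\ &\ \ \vdots \\ \hat{\theta}_K &= \hat{\theta}_K(m_1, \ldots, m_K) \end{aligned}$$

- Except for the case of random sampling from the exponential family of distributions, $\hat{\theta}_{MM}$ is not efficient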


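Asymptotic properties

$$\mathrm{Asy.Var}(\hat{\theta}) = \frac{1}{N} \left[G'(\theta)\Phi^{-1}G(\theta)\right]^{-1}$$

where $G(\theta)$ is a $K \times K$ matrix with row $k$ given by

$$G_k(\theta) = \frac{\partial m_k}{\partial \theta'}$$

and $\Phi$ is the $K \times K$ estimated asymptotic covariance matrix of the moment conditions, given by

$$\frac{1}{N}\Phi_{jk} = \frac{1}{N} \cdot \frac{1}{N} \sum_i \left[(m_j(y_i) - \bar{m}_j)(m_k(y_i) - \bar{m}_k)\right], \quad j, k = 1, \ldots, K$$

and the moment conditions are expressed as

$$\bar{m}_k = \frac{1}{N} \sum_i m_k(y_i)$$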


Estimation based on orthogonality conditions
- CLRM given by

$$y_i = x_i\beta + \varepsilon_i$$

  where $E[x_i\varepsilon_i] = 0$
  * Condition implies $E[x_i(y_i - x_i\beta)] = 0$
  * Sample analog given by

$$\frac{1}{N} \sum_i x_i(y_i - x_i\hat{\beta}) = 0$$

    which is a set of $K$ moment conditions as a function of $K$ parameters
  * These moment conditions are identical to the LS normal equations
  * Thus, OLS is a MM estimator
- IV estimator solves

$$\frac{1}{N} \sum_i z_i(y_i - x_i\hat{\beta}) = 0$$

- TSLS estimator solves

$$\frac{1}{N} \sum_i \hat{x}_i(y_i - x_i\hat{\beta}) = 0$$

  where $\hat{x}_i$ is the first-stage fitted value of $x_i$


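ML can also be recast as a MM estimator
- The scaled log-likelihood function is

$$\frac{1}{N} \ln[L(\theta)] = \frac{1}{N} \sum_i \ln[f(y_i \mid x_i, \theta)]$$

- MLE solves

$$\frac{1}{N} \frac{\partial \ln[L(\hat{\theta} \mid y)]}{\partial \theta} = \frac{1}{N} \sum_i \frac{\partial \ln[f(y_i \mid x_i, \hat{\theta})]}{\partial \theta} = 0$$

Many estimators can be viewed as a MM estimator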


Minimum Distance Estimation (MDE) extends the MM estimator to the case of over-identified models
- Setup

$$\begin{aligned} m_1 &= \mu_1(\theta_1, \ldots, \theta_K) \\ &\ \ \vdots \\ m_L &= \mu_L(\theta_1, \ldots, \theta_K) \end{aligned}$$

  where $L \geq K$
- MDE estimator obtained as

$$\hat{\theta}_{MDE} = \arg\min_\theta\, [m - \mu(\theta)]' W [m - \mu(\theta)]$$

  where $m$ is the vector of moment conditions, $\mu(\theta)$ is the vector of plims, and $W$ is a PD weighting matrix


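Example:
- Two consistent estimates of the same parameter, $\theta$

$$m_1 = \operatorname{plim} \hat{\theta}_1 = \theta$$

$$m_2 = \operatorname{plim} \hat{\theta}_2 = \theta$$

- Estimator is

$$\hat{\theta}_{MDE} = \arg\min_\theta \begin{bmatrix} \hat{\theta}_1 - \theta \\ \hat{\theta}_2 - \theta \end{bmatrix}' W \begin{bmatrix} \hat{\theta}_1 - \theta \\ \hat{\theta}_2 - \theta \end{bmatrix}$$

  where $W$ might be the inverse of the covariance matrix of $\hat{\theta}_1, \hat{\theta}_2$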


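Asymptotic properties: CAN with

$$\mathrm{Asy.Var}(\hat{\theta}) = \frac{1}{N} \left[G'(\theta) W G(\theta)\right]^{-1} \left[G'(\theta) W \Phi W G(\theta)\right] \left[G'(\theta) W G(\theta)\right]^{-1}$$

where $G(\theta)$ is an $L \times K$ matrix with row $\ell$ given by

$$G_\ell(\theta) = \operatorname{plim} \frac{\partial m_\ell}{\partial \theta'}$$

and $\Phi$ is the $L \times L$ estimated asymptotic covariance matrix of the moment conditions

Optimal weight matrix minimizes the asymptotic variance: $W = \Phi^{-1}$

$$\mathrm{Asy.Var}(\hat{\theta}) = \frac{1}{N} \left[G'(\theta)\Phi^{-1}G(\theta)\right]^{-1}$$

which is equivalent to the MM estimator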


Generalized Method of Moments (GMM) extends the MDE
- Estimator is based on a set of population orthogonality conditions

$$E[m(y_i, x_i, \theta_0)] = 0$$

  where $\theta_0$ is a $K$-dimensional vector of true values and $m(\cdot)$ is an $L$-dimensional vector ($L > K$)
- Averaging over obs produces the sample moment conditions

$$E[\bar{m}(y_i, x_i, \theta_0)] = 0$$

  where

$$\bar{m}(y_i, x_i, \theta_0) = \frac{1}{N} \sum_i m(y_i, x_i, \theta_0)$$

- GMM estimator obtained as

$$\hat{\theta}_{GMM} = \arg\min_\theta\, \bar{m}(y_i, x_i, \theta)' W \bar{m}(y_i, x_i, \theta)$$

  where $W$ is a PD weighting matrix
- Optimal weight matrix minimizes the asymptotic variance: $W = \Phi^{-1}$

$$\mathrm{Asy.Var}(\hat{\theta}) = \frac{1}{N} \left[G'(\theta)\Phi^{-1}G(\theta)\right]^{-1}$$

  which is equivalent to the MDE estimator


Example:
- IV estimator solves

$$\frac{1}{N} \sum_i z_i(y_i - x_i\hat{\beta}) = 0$$

  which yields $K$ moments as functions of $K$ parameters
- TSLS estimator can be written as

$$\frac{1}{N} \sum_i z_i(y_i - x_i\hat{\beta}) = 0$$

  where now there are $L$ moments as functions of $K$ parameters
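With $L > K$ the sample moments cannot all be set to zero, so the GMM criterion is minimized instead. For linear moments the minimizer has a closed form, and with the weight matrix $W = (Z'Z/N)^{-1}$ it reproduces TSLS. The sketch below checks this on hypothetical simulated data; the function name `linear_gmm` is an assumption for illustration.

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    """Minimise mbar(b)' W mbar(b) with mbar(b) = Z'(y - X b) / N.

    The first-order condition X'Z W Z'(y - X b) = 0 gives a closed form.
    """
    A = X.T @ Z @ W @ Z.T
    return np.linalg.solve(A @ X, A @ y)

# Hypothetical data: one endogenous regressor (K = 1), three instruments (L = 3)
rng = np.random.default_rng(6)
N = 500
Z = rng.normal(size=(N, 3))
u = rng.normal(size=N)
x = Z @ np.array([1.0, 0.5, -0.5]) + 0.8 * u + rng.normal(size=N)
y = 2.0 * x + u
X = x[:, None]

W = np.linalg.inv(Z.T @ Z / N)
b_gmm = linear_gmm(y, X, Z, W)

# TSLS via first-stage fitted values, for comparison
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
b_tsls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
```

This weight matrix is optimal only under homoskedasticity; a two-step version would replace it with the inverse of an estimate of $\Phi$ built from first-step residuals.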


Inference in GMM
- Wald tests can be conducted as in ML
- LR tests can be conducted based on the minimized GMM criteria

$$N(q_R - q) \sim \chi_J^2$$

  where $q$ ($q_R$) is the value of the criterion in the unrestricted (restricted) model and $J$ = # of restrictions
- LM tests can be conducted based on the derivatives of the unrestricted GMM criterion evaluated at the restricted estimator

See -gmm- in Stata


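RETURNING: DPD Estimation Specifics

Recall, the model

$$y_{it} = c_i + x_{it}\beta + \gamma y_{it-1} + \varepsilon_{it}, \quad t = 2, \ldots, T$$

FD to eliminate $c_i$ yields

$$\begin{aligned} \Delta y_{i3} &= \Delta x_{i3}\beta + \gamma \Delta y_{i2} + \Delta\varepsilon_{i3} \\ \Delta y_{i4} &= \Delta x_{i4}\beta + \gamma \Delta y_{i3} + \Delta\varepsilon_{i4} \\ &\ \ \vdots \\ \Delta y_{iT} &= \Delta x_{iT}\beta + \gamma \Delta y_{iT-1} + \Delta\varepsilon_{iT} \end{aligned}$$

where IVs for
- $\Delta y_{i2}$ are $x_{i1}, y_{i1}$
- $\Delta y_{i3}$ are $x_{i2}, y_{i2}, y_{i1}$
- ...
- $\Delta y_{iT-1}$ are $x_{iT-2}, y_{iT-2}, \ldots, y_{i2}, y_{i1}$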


Define matrix of instruments (assuming no x’s)

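$$\underbrace{z_i}_{(T-2)\times p} = \begin{bmatrix}
y_{i1} & 0 & 0 & \cdots & 0 & \cdots & 0 \\
0 & y_{i1} & y_{i2} & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & y_{i1} & \cdots & y_{iT-2}
\end{bmatrix}$$

Moment conditions ($p \times 1$)

$$E[z_i'\Delta\varepsilon_i] = 0$$

$$\Delta\varepsilon_i = [\Delta\varepsilon_{i3}\;\; \Delta\varepsilon_{i4}\;\; \cdots\;\; \Delta\varepsilon_{iT}]'$$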
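
As a rough illustration of the instrument matrix above, the following sketch builds $z_i$ for a single unit when there are no x's. The function name `ab_instruments` is hypothetical; the layout (the row for period t holds $y_{i1}, \ldots, y_{it-2}$ in its own block of columns) follows the matrix shown.

```python
import numpy as np

def ab_instruments(y):
    """Instrument matrix for one unit, no x's.

    y holds y_{i1}, ..., y_{iT}. Row r corresponds to the differenced
    equation for period t = r + 3 and contains y_{i1}, ..., y_{i,t-2}.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    p = (T - 2) * (T - 1) // 2          # total number of instruments
    z = np.zeros((T - 2, p))
    col = 0
    for r in range(T - 2):
        k = r + 1                        # number of lags available in this row
        z[r, col:col + k] = y[:k]
        col += k
    return z

z_example = ab_instruments([1.0, 2.0, 3.0, 4.0])   # T = 4, so z is 2 x 3
```

The column count $p = (T-2)(T-1)/2$ grows quadratically in T, which is one reason instrument proliferation becomes a practical concern in longer panels.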


Estimator

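$$\min_\gamma \left(\frac{1}{N}\sum_i \Delta\varepsilon_i' z_i\right) W_N \left(\frac{1}{N}\sum_i z_i'\Delta\varepsilon_i\right)$$

where

$$W_N = \left[\frac{1}{N}\sum_i z_i'\,\widehat{\Delta\varepsilon}_i\,\widehat{\Delta\varepsilon}_i'\, z_i\right]^{-1}$$

and $\widehat{\Delta\varepsilon}_i$ are estimated residuals from consistent first-step estimates

I Known as two-step differenced GMM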


Consistent first-step obtained using

$$W_{1N} = \left[\frac{1}{N}\sum_i z_i' H z_i\right]^{-1}$$

where

$$\underset{(T-2)\times(T-2)}{H} = \begin{bmatrix}
2 & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & -1 & 2 & -1 \\
0 & \cdots & 0 & -1 & 2
\end{bmatrix}$$

I Known as one-step differenced GMM
I Two-step estimator is asymptotically efficient if $\varepsilon_{it}$ are heteroskedastic; one-step if homoskedastic
I Not much gain in practice to two-step
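
One way to see where H comes from: if $\varepsilon_{it}$ is homoskedastic and serially uncorrelated, the covariance matrix of the first-differenced errors is proportional to $DD'$, where D is the first-difference operator. The sketch below (with a hypothetical T = 6) checks numerically that $DD'$ equals the tridiagonal H.

```python
import numpy as np

T = 6                      # hypothetical number of periods
m = T - 2                  # number of differenced equations (t = 3, ..., T)

# Tridiagonal H: 2 on the diagonal, -1 on the first off-diagonals
H = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)

# D maps (eps_2, ..., eps_T) into (d.eps_3, ..., d.eps_T)
D = np.eye(m, m + 1, k=1) - np.eye(m, m + 1)

DDt = D @ D.T              # Var(d.eps_i) / sigma^2 under iid errors
```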


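Specification tests
1 Overidentification test (Sargan test)

$$\widehat{\Delta\varepsilon}' z\left[\sum_i z_i'\,\widehat{\Delta\varepsilon}_i\,\widehat{\Delta\varepsilon}_i'\, z_i\right]^{-1} z'\widehat{\Delta\varepsilon} \sim \chi^2_{p-1}$$

where p is the # of columns in z
2 No serial correlation in $\varepsilon_{it}$ ⇒ $\Delta\varepsilon_{it}$ exhibits first-order, but no second-order, serial correlation

Adding x's into the model requires assumptions about their correlations with the errors

I Assumptions about x's

F Strict exogeneity: $E[x_{it}\varepsilon_{is}] = 0 \;\forall t, s$
F Predetermined (sequential exogeneity): $E[x_{it}\varepsilon_{is}] = 0 \;\forall t \le s$
F Endogenous: $E[x_{it}\varepsilon_{is}] = 0 \;\forall t < s$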


Incorporating x’s into moment conditions

Moment conditions (p × 1)

E[z ′i∆εi ] = 0, ∆εi = [∆εi3 ∆εi4 · · · ∆εiT ]′

x’s strictly exogenous

$$z_i = \left[\; Z_i \;\;\begin{matrix}
x_{i1} \cdots x_{iT} & 0 & \cdots & 0 \\
0 & x_{i1} \cdots x_{iT} & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & x_{i1} \cdots x_{iT}
\end{matrix}\;\right]$$

where

$$Z_i = \begin{bmatrix}
y_{i1} & 0 & 0 & \cdots & 0 & \cdots & 0 \\
0 & y_{i1} & y_{i2} & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & y_{i1} & \cdots & y_{iT-2}
\end{bmatrix}$$


x’s predetermined

$$z_i = \left[\; Z_i \;\;\begin{matrix}
x_{i1}\; x_{i2} & 0 & \cdots & 0 \\
0 & x_{i1}\; x_{i2}\; x_{i3} & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & x_{i1} \cdots x_{iT-1}
\end{matrix}\;\right]$$


x’s endogenous

$$z_i = \left[\; Z_i \;\;\begin{matrix}
x_{i1} & 0 & \cdots & 0 \\
0 & x_{i1}\; x_{i2} & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & x_{i1} \cdots x_{iT-2}
\end{matrix}\;\right]$$

Notes:

I Ahn & Schmidt (1995) utilize T − 2 additional moment conditions

$$E[\varepsilon_{it}\Delta\varepsilon_{it-1}] = 0, \quad t = 3, \ldots, T$$

I All three versions assume $E[x_{it}c_i] \neq 0$; if $E[x_{it}c_i] = 0$, then there are even more moment conditions

Stata: -xtabond-, -xtdpd-; -xtdpd- allows for greater user-controlled flexibility


Solution #3 (Arellano & Bover 1995; Blundell & Bond 1998)

A&B (1995), B&B (1998) show that as |γ| → 1, the A&H and A&B estimators are downward biased

In MC study, B&B find if |γ| > 0.8, bias is fairly severe

Problem arises due to the fact that the IVs become weak as persistence grows

Solution
I Add additional moment conditions derived from the model in levels

$$y_{it} = x_{it}\beta + \gamma y_{it-1} + c_i + \varepsilon_{it} = x_{it}\beta + \gamma y_{it-1} + \tilde\varepsilon_{it}$$

where $\tilde\varepsilon_{it} \equiv c_i + \varepsilon_{it}$ is the composite error

I What are IVs for $y_{it-1}$?

F $\Delta y_{it-1}$ (independent of $c_i$, $\varepsilon_{it}$)
F $\Delta y_{it-2}, \Delta y_{it-3}, \ldots$ (through autoregressive process)


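Estimation
I Define

$$z_i^{sys} = \begin{bmatrix}
z_i & 0 & 0 & \cdots & 0 \\
0 & \Delta y_{i2} & 0 & \cdots & 0 \\
\vdots & 0 & \Delta y_{i2}\;\; \Delta y_{i3} & \cdots & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 0 & \Delta y_{i2} \cdots \Delta y_{iT-1}
\end{bmatrix}$$

where $z_i$ is the matrix of instruments from A&B

I Complete set of moment conditions

$$E[z_i^{sys\prime}\varepsilon_i^{sys}] = 0$$

$$\varepsilon_i^{sys} = [\Delta\varepsilon_{i3}\;\; \Delta\varepsilon_{i4}\;\; \cdots\;\; \Delta\varepsilon_{iT}\;\; \varepsilon_{i3}\;\; \cdots\;\; \varepsilon_{iT}]'$$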


Known as system GMM

Requires additional assumption

(DPD.v) E[ci∆yi2 ] = 0 ∀i

Stata: -xtdpdsys-


Solution #4 (Hahn et al. 2007)

An alternative approach to circumvent the weak IV issue when |γ| → 1 is to utilize long differences (LD) instead of first-differencing to eliminate $c_i$

Structural model


LD yields

$$y_{iT} - y_{i2} = (x_{iT} - x_{i2})\beta + \gamma(y_{iT-1} - y_{i1}) + \varepsilon_{iT} - \varepsilon_{i2}$$

where $y_{i1}$ is available as an IV for $y_{iT-1} - y_{i1}$

Even if |γ| is close to unity,

$$\text{Cov}(y_{iT-1}, y_{i1}) - \text{Cov}(y_{i1}, y_{i1}) \neq 0$$

if T is sufficiently large


Hahn et al. (2007) also utilize additional IVs
I After initial IV estimation, obtain estimated residuals

$$\hat\varepsilon_{is} \equiv y_{is} - x_{is}\hat\beta - \hat\gamma y_{is-1}, \quad s = 3, \ldots, T-1$$

I Re-estimate the LD model using $y_{i1}$, $\hat\varepsilon_{is}$ as instruments
I Iterate this procedure; typically only a few iterations are needed

Hahn et al. (2007) demonstrate the superior performance of thisestimator when |γ| → 1


Solution #5 (Kiviet 1995, 1999; Bun & Kiviet 2003; Bruno 2005;Hausman & Pinkovskiy 2017)

Revert back to Nickell (1981), who derived the bias of the FE/LSDV estimate of the DPD model

Strategy
I Obtain an estimate of the approximate bias of the LSDV estimator
I Adjust the estimate $\hat\gamma_{LSDV}$ by the estimated bias to obtain the LSDVC (corrected) estimate

I Implies $E[\hat\gamma_{LSDV} + BIAS - \gamma] = 0$

MC evidence indicates good performance for moderate size N

See -xtlsdvc-, -xtbcfe- in Stata


Solution #6 (Everaert 2013)

Estimation by IV, but in levels using orthogonal to backward mean transformation to instrument for $y_{it-1}$


Instrument for $y_{it-1}$ is the residual from the first-stage regression of $y_{it-1}$ on its 'backward mean' defined as

$$\bar y^b_{it-1} = \frac{1}{t-1}\sum_{s=1}^{t-1} y_{is}$$

If $\text{Cov}(x_{it}, c_i) \neq 0$, then
I Instruments for $x_{it}$ are derived based on Hausman & Taylor (1981)
I Specifically, deviations from individual-specific means, $x_{it} - \bar x_i$

Consistency requires T → ∞
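
The backward mean defined above is just a running average of past outcomes. A minimal sketch for one unit follows; the function name is hypothetical, and only the backward mean itself is computed (the first-stage regression that produces the instrument is not shown).

```python
import numpy as np

def backward_mean(y):
    """Backward means for t = 2, ..., T: (1/(t-1)) * sum_{s=1}^{t-1} y_s.

    y holds y_{i1}, ..., y_{iT}; element j of the result pairs with
    the lagged outcome y_{i,j+1}.
    """
    y = np.asarray(y, dtype=float)
    return np.cumsum(y)[:-1] / np.arange(1, len(y))

bm_example = backward_mean([2.0, 4.0, 6.0, 8.0])
```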


Solution #7 (Bhargava and Sargan 1983; Hsiao et al. 2002; Kripfganz2016)

QML estimator for large N, small T panels

Based on ML estimator after first-differencing

Stata: -xtdpdqml-


Final comment

Preceding estimators require $\varepsilon_{it}$ to be serially uncorrelated
I Can relax this assumption and allow ε to follow a low order MA process
I Stata: -xtdpd-

Preceding estimators are no longer consistent if the data are irregularly spaced

I Irregular spacing occurs when the interval between periods in the data does not align with the interval between periods in the DGP

I This can occur even if the data are evenly spaced
I See McKenzie (2001), Millimet & McDonough (2017), Sasaki & Xin (2017)


Panel DataLDV Models: Overview

LDV panel models are complicated due to four main issues
1 Estimation by MLE requires iid obs to compute a tractable likelihood; unlikely with panels

F Solution ⇒ partial maximum likelihood

2 Transformations to eliminate FEs are generally not available since problems are nonlinear

F Solution (in some models) ⇒ conditional maximum likelihood

3 DV approach to FE models is generally inconsistent unless N, T → ∞
F Known as the incidental parameters problem
F Solutions ⇒ correlated random effects, bias-correction

4 Marginal effects generally depend on c, which is only estimable if one places more structure on the model

Research on-going in many areas, many unresolved questions


Panel DataLDV Models: Binary Choice Models

Case of a binary dependent variable with panel data

Often useful to start with a LPM and use the FE or FD estimator
I Same shortcomings as LPM with cross-section data

F Heteroskedasticity
F Predictions outside unit interval

I Plus, implies restrictions on the feasible values of $c_i$
I Nonetheless, very common in applied research


Pooled Models...

Instead, suppose the model is given by

$$\Pr(y_{it} = 1|x_{it}) = G(x_{it}\beta), \quad t = 1, \ldots, T$$

where $G(\cdot) \in (0, 1)$ is known and $x_{it}$ must be exogenous (but not strictly)

I Recall, the likelihood fn gives the total probability of the data conditional on the parameters

F We simplify this if we assume observations are iid
F With panel data, we usually do not want to assume this; rather assume independent panels, but not independence across time within a panel

F So, we want to model the joint $\Pr(y_i|x_i)$
F However, without further assumptions, we cannot derive the dbn of $y_i \equiv (y_{i1}, \ldots, y_{iT})$ given $x_i \equiv (x_{i1}, \ldots, x_{iT})$

Could assume

$$\Pr(y_i = 1|x_i) = G_T(x_i\beta, \Sigma)$$

but then likelihood fn would entail T integrals if, say, $G_T$ is a T-dimensional normal dbn

We can still obtain consistent estimates using the partial likelihood fn

$$\ln[L(\theta)] = \sum_i\sum_t \left\{ y_{it}\ln[G(x_{it}\beta)] + (1 - y_{it})\ln[1 - G(x_{it}\beta)] \right\}$$

which is different from ML in that we do not assume that $f(y_i|x_i) = \prod_t f_t(y_{it}|x_{it})$

I Known as the PMLE of the pooled probit/logit model when G = Φ, Λ (std normal CDF or logistic CDF)

I Under partial likelihood, it is assumed that $G(x_{it}\beta)$ is the correct density of $f(y_{it}|x_i) = f(y_{it}|x_{it})$ for each t, but the joint density of $y_i|x_i$ is not the product of the marginals
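
To make the partial likelihood concrete, the sketch below evaluates the pooled probit objective on a stacked panel (one row per (i, t) pair). The function names are hypothetical; maximizing this function over β would give the PMLE, and the optimizer is deliberately left out.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(v):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(erf)(np.asarray(v, dtype=float) / sqrt(2.0)))

def pooled_probit_loglik(beta, y, x):
    """Partial log-likelihood: sum over all (i, t) rows of
    y*ln G(x b) + (1 - y)*ln[1 - G(x b)] with G = Phi."""
    p = norm_cdf(x @ beta)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Tiny check: with beta = 0 every probability is 0.5
y_demo = np.array([1.0, 0.0, 1.0, 1.0])
x_demo = np.ones((4, 1))
ll_demo = pooled_probit_loglik(np.zeros(1), y_demo, x_demo)
```

Because the objective only uses the marginal densities, inference should rely on standard errors clustered by unit unless dynamic completeness is assumed.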


(cont.)
I Robust std errors should be used unless one assumes dynamic completeness, that is

$$\Pr(y_{it} = 1|x_{it}, y_{it-1}, x_{it-1}, \ldots) = \Pr(y_{it} = 1|x_{it})$$

which implies that lagged x and y do not enter the density for $y_{it}$ given $x_{it}$

I Specification test for dynamic completeness

F Estimate the pooled probit/logit and obtain $\hat\varepsilon_{it} = y_{it} - G(x_{it}\hat\beta)$
F Estimate a new pooled probit/logit where $\hat\varepsilon_{it-1}$ is included as a regressor (t > 1); rejecting that its coeff is zero implies rejecting dynamic completeness


Models With Unobserved Effects...

Model given in latent form is

$$y^*_{it} = x_{it}\beta + c_i + \varepsilon_{it}, \quad \varepsilon_{it} \sim N/L$$

$$y_{it} = I(x_{it}\beta + c_i + \varepsilon_{it} > 0)$$

RE specification assumes

$f(c_i|x_{it})$ is not dependent on $x_{it}$

FE specification leaves $f(c_i|x_{it})$ unrestricted s.t. $c_i$ and $x_{it}$ may be correlated


In general, the likelihood involves high-dimensional integration

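However, the RE structure simplifies estimation by factoring the density into

$$f(\varepsilon_{i1}, \ldots, \varepsilon_{iT}|c_i)f(c_i) = f(\varepsilon_{i1}|c_i)\cdots f(\varepsilon_{iT}|c_i)f(c_i)$$

Log-likelihood becomes

$$\ln[L(\theta)] = \sum_i \ln\left[\int_{-\infty}^{\infty}\left[\prod_{t=1}^{T}\Pr(y_{it}|x_{it}\beta + c_i)\right] f(c_i)\,dc\right]$$

which requires integrating over the dbn of c

Assuming $c_i|x_i \sim N(0, \sigma_c^2)$, the likelihood fn becomes

$$\ln[L(\theta)] = \sum_i \ln\left[\int_{-\infty}^{\infty}\left[\prod_{t=1}^{T_i}\Pr(y_{it}|x_{it}\beta + c_i)\right]\frac{1}{\sigma_c}\phi\left(\frac{c_i}{\sigma_c}\right)dc\right]$$

Butler & Moffitt (1982) show how to approximate the integral
I Known as RE probit estimator
I Estimate of ρ (the share of the composite error variance due to $c_i$) allows testing for presence of unobserved heterogeneity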
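
The integral over c can be approximated with Gauss-Hermite quadrature, in the spirit of Butler & Moffitt (1982). The sketch below computes the log-likelihood contribution of one unit for the RE probit; the function names and the default number of nodes are assumptions for illustration, not the settings used by -xtprobit-.

```python
import numpy as np
from math import erf, sqrt, pi

def norm_cdf(v):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(erf)(np.asarray(v, dtype=float) / sqrt(2.0)))

def re_probit_loglik_i(beta, sigma_c, y_i, x_i, n_nodes=20):
    """ln of the integral of prod_t Phi[q_it (x_it b + c)] against N(0, sigma_c^2).

    Uses the change of variables c = sqrt(2) * sigma_c * u, which turns the
    normal density into the Gauss-Hermite weight exp(-u^2) / sqrt(pi).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    c = sqrt(2.0) * sigma_c * nodes
    q = 2.0 * np.asarray(y_i, dtype=float) - 1.0
    idx = x_i @ beta
    probs = norm_cdf(q[:, None] * (idx[:, None] + c[None, :]))  # T x n_nodes
    return float(np.log(np.sum(weights * probs.prod(axis=0)) / sqrt(pi)))

# Single-period check against the closed form Phi(a / sqrt(1 + sigma_c^2))
ll_one = re_probit_loglik_i(np.array([0.3]), 1.0, np.array([1.0]), np.array([[1.0]]))
ll_closed = float(np.log(norm_cdf(0.3 / sqrt(2.0))))
```

Results can be sensitive to the number of quadrature points, so it is worth re-evaluating the likelihood with more nodes before trusting the estimates.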


Notes...

Maximum Simulated Likelihood (MSL) estimation is an alternative that allows for c to follow some non-normal dbns

RE logit model also exists and assumes c ∼ N

Integrals approximated using various quadrature procedures; results may be sensitive to number of quadrature points

I Stata: -quadchk-


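Marginal effects are more complex since they depend on c

$$\frac{\partial\Pr(y_{it} = 1)}{\partial x_{kit}} = \beta_k\phi(x_{it}\beta + c_i)$$

I Since E[c|x] = 0, the ME at the population average of c is $\beta_k\phi(x_{it}\beta)$

I The relative ME of any two covariates, $x_k$ and $x_{k'}$, is

$$\frac{\partial\Pr(y_{it}=1)/\partial x_{kit}}{\partial\Pr(y_{it}=1)/\partial x_{k'it}} = \frac{\beta_k\phi(x_{it}\beta + c_i)}{\beta_{k'}\phi(x_{it}\beta + c_i)} = \frac{\beta_k}{\beta_{k'}}$$

which is not observation-specific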


(cont.)
I The average marginal effect (AME) is the value of $\beta_k\phi(x_{it}\beta + c_i)$ averaged over the dbn of c, which in the RE probit is

$$\frac{\beta_k}{\sigma_\varepsilon}\phi\left(x_{it}\frac{\beta}{\sigma_\varepsilon}\right)$$

which only requires knowledge of $\beta_c = \beta/\sigma_\varepsilon$, known as the population-averaged parameters

F Interestingly, $\beta_c$ is what is estimated by the pooled probit estimator if c is ignored

F Thus, one may estimate $\beta_c$ via pooled probit with robust std errors (which does not assume iid observations over time), or impose the full error structure and estimate the RE probit


FE Specification

Model re-written as

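$$y_{it} = I(c_i d_{it} + x_{it}\beta + \varepsilon_{it} > 0), \quad \varepsilon_{it} \sim N/L$$

where $d_{it}$ is a dummy variable equal to 1 for obs i, 0 otherwise

Log-likelihood is

$$\ln[L(\theta)] = \sum_i\sum_{t=1}^{T}\ln\Pr(y_{it}|c_i d_{it} + x_{it}\beta)$$

where

$$\Pr(\cdot) = \begin{cases}
\Phi[q_{it}(c_i d_{it} + x_{it}\beta)] & \text{if } \varepsilon_{it} \sim N \\
\Lambda[q_{it}(c_i d_{it} + x_{it}\beta)] & \text{if } \varepsilon_{it} \sim L
\end{cases}$$

and $q_{it} = 2y_{it} - 1$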


The FE estimator is flawed
I As with continuous FE models, $c_i$ estimates are inconsistent under fixed-T asymptotics

F Unlike continuous FE models, the inconsistency also causes β to be inconsistent

F There is also a small sample bias, although the magnitude remains an open question

F Problem known as the incidental parameters problem

I Thus, there is a trade-off between the restrictiveness of the RE assumption vs. the bias/inconsistency due to the incidental parameters problem

I Research continues on bias-corrected estimators


Question: Why does the inconsistency of $c_i$ in the continuous FE model not affect β?

I In the continuous y case, the model is transformed into deviations from obs-specific means

I Formally, while $f(y_{it}|X)$ depends on $c_i$, $f(y_{it}|X, \bar y_i)$ does not
F $\bar y_i$ is a minimal sufficient statistic for $c_i$
F $\bar y_i$ is then used in the estimation of β

I In the probit model, there is no sufficient statistic for $c_i$
I There is in the logit model


Before discussing the FE logit model, there is Chamberlain's correlated RE probit estimator based on Mundlak (1978)

I Assume

$$c_i|x_i \sim N(\xi_0 + \bar x_i\xi_1, \sigma_a^2)$$

where $a_i = c_i - \xi_0 - \bar x_i\xi_1$ and $\sigma_a^2$ does not depend on x
I Under the prior assumptions for the RE probit estimator, apply the RE probit estimator to the model

$$\Pr(y_{it}|x_{it}, c_i) = \Phi(\xi_0 + x_{it}\beta + \bar x_i\xi_1 + a_i)$$

where $a_i|x_i \sim N(0, \sigma_a^2)$ and x does not contain an intercept
F This yields estimates of $\sigma_a^2$, not $\sigma_c^2$
F Testing $H_0: \xi_1 = 0$ ⇔ RE model vs. Chamberlain's RE model

I The AMEs can be obtained by estimating the pooled probit model

$$\Pr(y_{it}|x_{it}) = \Phi(\xi_{a0} + x_{it}\beta_a + \bar x_i\xi_{a1})$$

or after the RE probit model as

$$\hat\beta_{ak}\,\phi(\hat\xi_{a0} + x_{it}\hat\beta_a + \bar x_i\hat\xi_{a1})$$

where $\beta_a = \beta/\sqrt{1 + \sigma_a^2}$ (and similarly for ξ)
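
In practice the correlated RE device amounts to adding unit-level means of the covariates as extra regressors before running the RE or pooled probit. A minimal sketch of that data step follows; the function name and the toy data are hypothetical, and the probit estimation itself is not shown.

```python
import numpy as np

def mundlak_augment(ids, x):
    """Append the unit-specific time averages of x to x itself.

    ids: length-n array of unit identifiers (one per stacked row)
    x:   n x K array of time-varying covariates
    Returns an n x 2K array [x, xbar_i].
    """
    ids = np.asarray(ids)
    x = np.asarray(x, dtype=float)
    _, inv = np.unique(ids, return_inverse=True)
    counts = np.bincount(inv)
    sums = np.zeros((counts.size, x.shape[1]))
    np.add.at(sums, inv, x)                    # accumulate rows by unit
    xbar = (sums / counts[:, None])[inv]       # broadcast means back to rows
    return np.hstack([x, xbar])

aug_demo = mundlak_augment([1, 1, 2, 2], [[1.0], [3.0], [5.0], [9.0]])
```

Because the means are computed per unit, the same function also handles unbalanced panels.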


Now, turn to the FE logit model

$$\Pr(y_{it} = 1|x_{it}) = G_{it} = \frac{\exp\{x_{it}\beta + c_i\}}{1 + \exp\{x_{it}\beta + c_i\}}$$

with the likelihood given as

$$L(\theta) = \prod_i\prod_t (G_{it})^{y_{it}}(1 - G_{it})^{1 - y_{it}}$$

and no assumptions are made about dependence between $x_{it}$ and $c_i$

Trick is to derive the joint dbn of $y_i|x_i, c_i, n_i$, where $n_i \equiv \sum_t y_{it}$ (# of periods obs i has y = 1)

I Turns out that this dbn is independent of $c_i$
I Thus, $n_i$ is a minimal sufficient statistic for $c_i$

Probit model does not give rise to such a sufficient statistic such that the conditional dbn does not depend on $c_i$


Chamberlain (1980) observed that the conditional likelihood does not depend on $c_i$

$$L(\theta) = \prod_i \Pr(y_{i1}, \ldots, y_{iT}|x_i, n_i)$$

where the contribution of obs i to the likelihood is

$$L_i(\theta) = \Pr(y_{i1}, \ldots, y_{iT}|x_i, n_i) = \frac{\Pr(y_{i1}, \ldots, y_{iT}, n_i|x_i)}{\Pr(n_i|x_i)} = \frac{\exp\left\{\sum_t y_{it}(x_{it}\beta)\right\}}{\sum_{a\in R_i}\exp\left\{\sum_t a_t x_{it}\beta\right\}}$$

and $R_i \subset \mathbb{R}^T$ defined as $\{a \in \mathbb{R}^T : a_t \in \{0, 1\} \text{ and } \sum_t a_t = n_i\}$
I $R_i$ represents all sequences of 1s and 0s across the T periods such that $\sum_t a_t = n_i$
I The denominator is summed over the set of all $\binom{T}{n_i}$ unique sequences of $a_t$ that sum to $n_i$

Only obs for which $n_i \notin \{0, T\}$ contribute information about β since $y_i$ is perfectly determined in these two cases
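
The conditional likelihood contribution can be computed by brute-force enumeration of the sequences in $R_i$, which is feasible for small T. The sketch below is illustrative only; the function name and the example numbers are hypothetical.

```python
import numpy as np
from itertools import combinations

def cond_logit_prob(y_i, x_i, beta):
    """Pr(y_i1, ..., y_iT | x_i, n_i) for the FE logit; c_i has dropped out."""
    y_i = np.asarray(y_i, dtype=float)
    T = len(y_i)
    n = int(y_i.sum())
    idx = x_i @ beta
    num = np.exp(y_i @ idx)
    # Denominator: sum over every placement of n ones among T periods
    den = sum(np.exp(idx[np.array(ones, dtype=int)].sum())
              for ones in combinations(range(T), n))
    return float(num / den)

x_demo = np.array([[0.5], [-1.0], [2.0]])
beta_demo = np.array([0.7])
# All three sequences with n_i = 2 when T = 3
probs_n2 = [cond_logit_prob(s, x_demo, beta_demo)
            for s in ([1, 1, 0], [1, 0, 1], [0, 1, 1])]
prob_all_zero = cond_logit_prob([0, 0, 0], x_demo, beta_demo)
```

The all-zero sequence has conditional probability one regardless of β, which is the sense in which such observations carry no information.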


Test of homogeneity given by $H_0: c_i = c \;\forall i$
I Likelihood ratio test is not possible since the FE estimates are based on the conditional likelihood fn

I Hausman test is applicable since both are consistent under the null, but the FE model is inefficient, while only the FE model is consistent under the alternative

$$(\hat\beta_{FE} - \hat\beta)'(\hat\Sigma_{FE} - \hat\Sigma)^{-1}(\hat\beta_{FE} - \hat\beta) \sim \chi^2$$

Interpretation
I β gives the ME of x on the log odds ratio

$$\ln\left[\frac{\Pr(y_{it} = 1)}{\Pr(y_{it} = 0)}\right] = x_{it}\beta + c_i$$

I MEs of x on $\Pr(y_{it} = 1)$, and estimation of the AMEs, are not feasible without structure on the dbn of c, which is what the FE logit model avoids


Final notes...

All RE, FE estimators require strict exogeneity of $x_{it}$ conditional on $c_i$
I Straightforward to test by adding leads, $x_{it+1}$, to the model and testing for significance

I Requires omitting the final period from the estimation

FE logit vs. Chamberlain's RE probit model
I FE logit leaves the dbn of c unspecified, but does not yield estimates of the AMEs

I Correlated RE probit based on the augmented pooled probit allows for serial correlation and estimates the AMEs, but forces one to place some structure on the dbn of c|x

Additional work on FE binary choice models is on-going (e.g., Ai & Gan 2010)

Stata: -xtprobit-, -xtlogit-


Dynamic Models

Extensions to dynamic binary choice models are in progress
I Dynamic binary probit models first considered in Heckman (1981a,b)
I Orme (1997, 2001) and Wooldridge (2005) propose alternatives that are easier to estimate

I Arulampalam and Stewart (2009) propose an easier to estimate version of Heckman's estimator; estimable in Stata in a straightforward manner

Akay (2011) compares Heckman and Wooldridge’s approaches


Model

$$\Pr(y_{it} = 1|x_i, y_{it-1}, y_{it-2}, \ldots, c_i) = G(x_{it}\beta + \gamma y_{it-1} + c_i), \quad t = 2, \ldots, T$$

Initial conditions problem
I General estimation strategy is to model $f(y_1, \ldots, y_T|x, c)$, specify a dbn $f(c|x)$, and then integrate wrt this density

I To figure out $f(y_1, \ldots, y_T|x, c)$, we can factor this as

$$f(y_1, \ldots, y_T|x, c) = f(y_2, \ldots, y_T|x, y_1, c)\,f(y_1|x, c)$$

where $y_{i1}$ is the initial condition
I Thus, maximizing the likelihood fn requires

1 Specification of $f(y_1|x, c)$, or
2 Assumption that $f(y_1|x, c) = f(y_1|x)$; i.e., $y_{i1}$ is independent of $c_i$ conditional on $x_i$


Under (2) the above model can be estimated by conditional MLE using the RE probit or correlated RE probit with only data from periods 2, ..., T and assuming $c_i|x_i \sim N(0, \sigma_c^2)$

$$\ln[L(\theta)] = \sum_i \ln\left[\int_{-\infty}^{\infty}\left[\prod_{t=2}^{T}\Pr(y_{it}|x_{it}\beta + \gamma y_{it-1} + c_i)\right]\frac{1}{\sigma_c}\phi\left(\frac{c_i}{\sigma_c}\right)dc\right]$$

or assuming $c_i|x_i \sim N(\xi_0 + \bar x_i\xi_1, \sigma_a^2)$ and integrating wrt the density of a


However, $y_{i1}$ is likely not to be conditionally independent of $c_i$
I Even if period 1 corresponds to the 'true' initial period, in most economic applications it is difficult to justify the assumption of $y_{i1}$ being unrelated to $c_i$

I Thus, the likelihood fn must account for the initial period

$$\ln[L(\theta)] = \sum_i \ln\left[\int_{-\infty}^{\infty}\left[\Pr(y_{i1}|x_i, c_i)\times\prod_{t=2}^{T_i}\Pr(y_{it}|x_{it}\beta + \gamma y_{it-1} + c_i)\right]\frac{1}{\sigma_c}\phi\left(\frac{c_i}{\sigma_c}\right)dc\right]$$

Different solutions based on different specifications of $\Pr(y_{i1} = 1|x_i, c_i)$


Heckman approach

$$\Pr(y_{i1} = 1|x_i, z_i, c_i) = G(x_{i1}\beta + z_{i1}\pi + \lambda c_i)$$

where z may include additional determinants of the initial period if available and λ allows for differences in the distribution of the composite error in the initial period

I Likelihood fn

$$\ln[L(\theta)] = \sum_i \ln\left[\int_{-\infty}^{\infty}\left[\Pr(y_{i1}|x_{i1}\beta + z_{i1}\pi + \lambda c_i)\times\prod_{t=2}^{T}\Pr(y_{it}|x_{it}\beta + \gamma y_{it-1} + c_i)\right]\frac{1}{\sigma_c}\phi\left(\frac{c_i}{\sigma_c}\right)dc\right]$$

I Stewart (2007) extends this to allow for serial correlation in the idiosyncratic errors

I Correlation between $x_i$ and $c_i$ can be addressed using the Mundlak (1978) approach (i.e., including $\bar x_i$ in $x_{it}$, in which case $c_i$ becomes $a_i$)

I Stata: -redprob-, -redpace- (http://www2.warwick.ac.uk/fac/soc/economics/staff/academic/stewart/stata/)


Wooldridge approach
I Rather than model the initial condition, this approach specifies a dbn for $f(c_i|x_i, y_{i1})$ and integrates wrt this dbn

I Extending the Mundlak (1978) approach, assume

$$c_i|x_i, y_{i1} \sim N(\xi_0 + \bar x_i\xi_1 + \xi_2 y_{i1}, \sigma_a^2)$$

where $a_i = c_i - \xi_0 - \bar x_i\xi_1 - \xi_2 y_{i1}$ and $\sigma_a^2$ does not depend on x or $y_1$
I Now, the conditional likelihood fn is given by

$$\ln[L(\theta)] = \sum_i \ln\left[\int_{-\infty}^{\infty}\left[\prod_{t=2}^{T}\Pr(y_{it}|\xi_0 + x_{it}\beta + \bar x_i\xi_1 + \gamma y_{it-1} + \xi_2 y_{i1} + a_i)\right]\frac{1}{\sigma_a}\phi\left(\frac{a_i}{\sigma_a}\right)da\right]$$

which can be estimated using the usual RE probit model
I Akay (2011) shows this approach performs better than the Heckman approach when T > 5


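Orme approach
I Not as commonly used
I Write the model for the initial period as

$$y_{i1} = I(x_{i1}\beta + z_{i1}\pi + \varepsilon_{i1} > 0)$$

where $\varepsilon_{i1}, c_i \sim N_2(0, 0, \sigma_\varepsilon^2, \sigma_c^2, r)$

I Utilizing properties of the bivariate normal density, we have

$$c_i = r\frac{\sigma_c}{\sigma_\varepsilon}\varepsilon_{i1} + \sigma_c\sqrt{1 - r^2}\,w_i$$

where $w_i \sim N(0, 1)$ and is independent of $\varepsilon_{i1}$
I Substituting in for $c_i$ yields

$$\Pr(y_{it} = 1|x_i, y_{it-1}, c_i) = G\left(x_{it}\beta + \gamma y_{it-1} + r\frac{\sigma_c}{\sigma_\varepsilon}\varepsilon_{i1} + \sigma_c\sqrt{1 - r^2}\,w_i\right), \quad t = 2, \ldots$$

which can be estimated by conditional ML using the usual RE probit model where $w_i$ is the RE and $\varepsilon_{i1}$ is replaced with

$$\hat\varepsilon_{i1} = \frac{(2y_{i1} - 1)\,\phi(x_{i1}\hat\beta + z_{i1}\hat\pi)}{\Phi[(2y_{i1} - 1)(x_{i1}\hat\beta + z_{i1}\hat\pi)]}$$

where $\hat\beta$ and $\hat\pi$ are estimated via probit on data from initial period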


Panel DataLDV Models: Tobit Model

Start with the pooled tobit model

$$y^*_{it} = x_{it}\beta + \varepsilon_{it}, \quad \varepsilon_{it}|x_{it} \sim N(0, \sigma^2)$$

$$y_{it} = \max(0, y^*_{it})$$

which is estimable as a tobit model with NT obs assuming $x_{it}$ are exogenous (but not strictly)

I As with the pooled probit, this is a Partial ML estimator
I Dynamic completeness implies usual inference is correct

F Tested by estimating the following pooled tobit

$$y^*_{it} = x_{it}\beta + \gamma_1 r_{it-1} + \gamma_2(1 - r_{it-1})\hat\varepsilon_{it-1} + \eta_{it}$$

where $r_{it-1} = I(y_{it-1} = 0)$ and $\hat\varepsilon_{it-1} = y_{it-1} - x_{it-1}\hat\beta$ if $y_{it-1} > 0$ and $\hat\varepsilon_{it-1} = 0$ if $y_{it-1} = 0$, using t > 1

I Absent this assumption, robust std errors are needed


Model with unobserved effects is given in latent form as

$$y^*_{it} = x_{it}\beta + c_i + \varepsilon_{it}, \quad \varepsilon_{it}|x_i, c_i \sim N(0, \sigma_\varepsilon^2)$$

$$y_{it} = \max(0, y^*_{it})$$

Skipping a simple RE version, move to a Chamberlain-like model
I Assume

$$c_i|x_i \sim N(\xi_0 + \bar x_i\xi_1, \sigma_a^2)$$

$$a_i|x_i \sim N(0, \sigma_a^2)$$

$$\varepsilon_{it}|x_i, a_i \sim N(0, \sigma_\varepsilon^2)$$

where $a_i = c_i - \xi_0 - \bar x_i\xi_1$
I Assumption of $\varepsilon_{it}|x_i, c_i \sim N(0, \sigma_\varepsilon^2)$ implies $x_{it}$ are strictly exogenous conditional on $c_i$

I Also assume that $\varepsilon_{it}$ are serially uncorrelated

Under this setup, a RE tobit estimator with $x_{it}$ and $\bar x_i$ as covariates provides consistent estimates of the parameters

Testing $H_0: \xi_1 = 0$ ⇔ RE tobit model vs. Chamberlain's RE model


Final notes...

AMEs discussed in Wooldridge (2002)

No firm results on the behavior of MLE including obs-specific dummies

I Prior research suggests β is not biased, but σ is biased down (Greene 2003)

I Even if β is unbiased, inference and computation of marginal effects are difficult given the bias in σ

I Research continues

Extensions to dynamic tobit models are available (e.g., following the Wooldridge (2005) approach)

Stata: -xttobit-


Panel DataLDV Models: Count Models

Start with the pooled Poisson model
I Model expected number of events conditional on x

$$E[y_{it}|x_{it}] = F(x_{it}\beta)$$

where $F(\cdot) > 0$
I Common functional form of $F(\cdot) = \exp\{x_{it}\beta\}$
I Recall, Poisson dbn depends only on the mean given by

$$\lambda_{it} = \exp\{x_{it}\beta\}$$

I Log-likelihood fn given by

$$\ln[L(\theta)] = \sum_i\sum_t\left[-\lambda_{it} + y_{it}\ln\lambda_{it} - \ln(y_{it}!)\right]$$

I Estimator is CAN assuming conditional mean is correctly specified regardless of Poisson dbn

I Further assumptions needed for inference


RE model given by

$$E[y_{it}|x_{it}, c_i] = F(x_{it}\beta + \alpha_i)$$

where $F(\cdot) = \exp\{x_{it}\beta + \alpha_i\}$

Assumptions

1 Functional form arises if one assumes that $c_i = \exp(\alpha_i)$ and unobserved effect enters multiplicatively

2 $x_{it}$ is strictly exogenous
3 $c_i|x_i \sim \Gamma(\delta_0, 1/\delta_0)$ dbn

F Implies $E[c_i] = 1$, $\text{Var}(c_i) = 1/\delta_0 \equiv \eta_0^2$

4 No serial correlation in $y_{it}$ conditional on $x_i, c_i$

Poisson assumption implies equi-dispersion in $y_{it}|x_i, c_i$

However, $\text{Var}(y_{it}|x_i) = E[y_{it}|x_i](1 + \eta_0^2 E[y_{it}|x_i])$ ⇒ over-dispersion conditional on $x_i$ alone


Estimation entails forming the likelihood fn, based on the density $f(y_{i1}, \ldots, y_{iT}|x_i)$, which requires integrating over the dbn of c

I Assumption concerning the gamma dbn allows this density to be tractable

F Assuming normality is also possible

I Under maintained assumptions, MLE is efficient
I Estimates are sensitive to violations

A QMLE RE estimator relaxes some of these assumptions and is equivalent to a pooled NB model

A Mundlak (1978) correlated RE model is feasible


Hausman et al. (1984) develop a FE version, allowing dependence between $x_i$ and $c_i$

Requires same assumptions as RE model above except those concerning the dbn of $c_i$

The conditional likelihood does not depend on $c_i$

$$L(\theta) = \prod_i \Pr(y_{i1}, \ldots, y_{iT}|x_i, n_i, c_i)$$

if we condition on $n_i$, $n_i = \sum_{t=1}^{T} y_{it}$ ($n_i$ is a minimal sufficient statistic)

The contribution of obs i to the conditional log-likelihood is

$$\ln[L_i(\theta)] = \ln\left[\Gamma\left(1 + \sum_t y_{it}\right)\right] - \sum_t \ln[\Gamma(y_{it} + 1)] + \sum_t y_{it}\ln\left[\frac{\exp\{x_{it}\beta\}}{\sum_s \exp\{x_{is}\beta\}}\right]$$

Known as the FE Poisson estimator

I Consistency only requires the conditional mean to be correctly specified
I As such, estimator is applicable to any model where this is the case (e.g., $y_{it}$ could be continuous, contain negative values, binary, etc.)

F Example: gravity models in levels
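
The conditional log-likelihood contribution above is a multinomial log-probability with cell shares $\exp\{x_{it}\beta\}/\sum_s \exp\{x_{is}\beta\}$. A minimal sketch for one unit follows; the function name and example values are hypothetical.

```python
import numpy as np
from math import lgamma, log

def fe_poisson_loglik_i(beta, y_i, x_i):
    """Conditional (on n_i = sum_t y_it) log-likelihood of one unit."""
    y_i = np.asarray(y_i, dtype=float)
    idx = x_i @ beta
    log_share = idx - np.log(np.sum(np.exp(idx)))   # ln of multinomial shares
    return (lgamma(1.0 + y_i.sum())
            - sum(lgamma(v + 1.0) for v in y_i)
            + float(y_i @ log_share))

# With beta = 0 and y = (1, 1): 2!/(1!1!) * (1/2)^2 = 1/2
ll_half = fe_poisson_loglik_i(np.zeros(1), [1, 1], np.array([[0.3], [1.2]]))

# A unit-specific constant in the index cancels out of the shares
x_const = np.array([[1.0, 0.3], [1.0, 1.2], [1.0, -0.4]])
ll_a = fe_poisson_loglik_i(np.array([0.0, 0.5]), [2, 0, 3], x_const)
ll_b = fe_poisson_loglik_i(np.array([5.0, 0.5]), [2, 0, 3], x_const)
```

The second check illustrates why coefficients on time-invariant regressors and an intercept are not identified in the FE Poisson.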


Notes:
I If $n_i = 0$, then obs does not contribute any information
I FE-NB model also feasible, based on

$$\ln[L_i(\theta)] = \ln\left[\Gamma\left(\sum_t \lambda_{it}\right)\right] + \ln\left[\Gamma\left(1 + \sum_t y_{it}\right)\right] - \ln\left[\Gamma\left(\sum_t y_{it} + \sum_t \lambda_{it}\right)\right] + \sum_t\left\{\ln[\Gamma(y_{it} + \lambda_{it})] - \ln[\Gamma(\lambda_{it})] - \ln[\Gamma(1 + y_{it})]\right\}$$

where $\lambda_{it} = \exp\{x_{it}\beta\}$
F FE-NB allows estimation of coeffs on time invariant regressors and an intercept

F FE-Poisson does not

I Difference follows from the fact that the Poisson specifies $E[y_{it}|x_{it}] = F(x_{it}\beta + \alpha_i)$, whereas NB specifies $E[y_{it}|x_{it}] = \theta_i\exp\{x_{it}\beta\}$, where θ is a scaling parameter rather than a shifter of the conditional mean function

I Since the conditional mean is homogeneous in the FE-NB, an intercept and effects of time invariant regressors are estimable

I Dynamic count models considered in Wooldridge (2005)

Stata: -xtpoisson-, -xtpqml-, -xtnbreg-


Panel DataLDV Models: Qualitative Choice Models

Pg. 653-4 in Wooldridge


Panel DataLDV Models: Ordered Models

FE ordered probit
I Suffers from incidental parameters problem, as in the binary probit model

I Evidence suggests that finite sample bias appears to be of similar magnitude

FE ordered logit is consistent, but has limited usefulness
I Model setup

$$y^*_{it} = x_{it}\beta + c_i + \varepsilon_{it}, \quad \varepsilon_{it} \sim L$$

$$y_{it} = j \;\text{ if }\; y^*_{it} \in (\mu_{j-1}, \mu_j)$$

where
F $j = 0, 1, \ldots, J$
F $\mu_{-1} = -\infty$, $\mu_0 = 0$, and $\mu_J = \infty$

I Probabilities given by

$$\Pr(y_{it} = j) = \Lambda(\mu_j - x_{it}\beta - c_i) - \Lambda(\mu_{j-1} - x_{it}\beta - c_i)$$


Define a new sequence of dummy variables

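$$w_{it,j} = \begin{cases} 1 & \text{if } y_{it} > j \\ 0 & \text{otherwise} \end{cases}, \quad j = 0, 1, \ldots, J - 1$$

It follows that

$$\Pr(w_{it,j} = 1|X) = \Lambda(-\mu_j + x_{it}\beta + c_i) = \Lambda(\theta_i + x_{it}\beta)$$

which reduces to a series of J − 1 binary, FE logits estimable as before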


Notes:
I Procedure produces J − 1 consistent estimates of β; these can be combined into a single vector β using MDE

$$\hat\beta_{MDE} = \arg\min_\beta \sum_{j=0}^{J-1}\sum_{m=0}^{J-1}(\hat\beta_j - \beta)'\left[V^{-1}_{jm}\right](\hat\beta_m - \beta)$$

where $V^{-1}_{jm}$ is the j,m block of the inverse of the $(J-1)K \times (J-1)K$ matrix V that contains the $\text{Asy.Cov}(\hat\beta_j, \hat\beta_m)$, but estimates of the off-diagonal blocks are not obvious

I While β is consistent, the threshold parameters are not identified, implying no estimates of marginal effects or predicted probabilities

I RE version available


Panel DataHeterogeneous Time Trends

All estimators to this point are known as homogeneous estimators as they assume constant parameters across i

Allowing for individual-specific time trends yields

$$y_{it} = c_i + \lambda_i t + x_{it}\beta + \varepsilon_{it}$$

Estimation proceeds by FD

$$\Delta y_{it} = \lambda_i + \Delta x_{it}\beta + \Delta\varepsilon_{it}$$

and then estimating this model by RE, FE, FD, or LSDV given the assumptions one wishes to invoke

Note: This requires T ≥ 3


Panel DataHeterogeneous Slopes

Model allowing for heterogeneous slopes more generally is given by

$$y_{it} = c_i + \gamma_i y_{it-1} + x_{it}\beta_i + \varepsilon_{it}$$

$$\gamma_i = \gamma + \eta_{1i}$$

$$\beta_i = \beta + \eta_{2i}$$

$$\theta_i = \frac{\beta_i}{1 - \gamma_i}$$

Perhaps more realistic in many applications as assumption of identical effects of covariates may not be justified

Estimation depends on what one is interested in estimating: population average parameters and/or panel-specific parameters

Even if one is only interested in population average parameters, acknowledgement of panel-specific parameters complicates estimation

Estimation now typically requires N,T → ∞


Swamy (1970) FGLS estimator

Restricted to models where ci = c and γi = 0 ∀i

yit = xitβi + εit

and x includes an intercept

Assumptions:

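(S.i) $E[\varepsilon] = 0$

(S.ii) $E[\varepsilon_i\varepsilon_j'] = \begin{cases} \sigma_i^2 I_T & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$

(S.iii) $\beta_i = \beta + \eta_i$, where $E[\eta] = 0$

(S.iv) $\eta_i$ and $\varepsilon_j$ are independent

(S.v) $E[\eta_i\eta_j'] = \begin{cases} \Omega & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$

where Ω is a K × K matrix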


Substituting for βi implies

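$$y_{it} = x_{it}\beta + (\varepsilon_{it} + x_{it}\eta_i) = x_{it}\beta + \upsilon_{it}$$

where

$$E[\upsilon\upsilon'] = \begin{bmatrix}
x_1\Omega x_1' + \sigma_1^2 I_T & 0 & \cdots & 0 \\
0 & x_2\Omega x_2' + \sigma_2^2 I_T & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & x_N\Omega x_N' + \sigma_N^2 I_T
\end{bmatrix} = \Sigma$$

Thus, there are two sources of heteroskedasticity, as well as the fact that υ is serially correlated within panels

Estimation could ignore panel-specific parameters and instead estimate the model using pooled OLS with clustered standard errors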

Alternative is FGLS


GLS (Aitken) estimator given by

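$$\hat\beta_{GLS} = (x'\Sigma^{-1}x)^{-1}x'\Sigma^{-1}y = \left[\sum_i x_i'(x_i\Omega x_i' + \sigma_i^2 I_T)^{-1}x_i\right]^{-1}\sum_i x_i'(x_i\Omega x_i' + \sigma_i^2 I_T)^{-1}y_i = \sum_i \omega_i\hat\beta_i$$

where

$$\omega_i = \left\{\sum_j\left[\Omega + \sigma_j^2(x_j'x_j)^{-1}\right]^{-1}\right\}^{-1}\left[\Omega + \sigma_i^2(x_i'x_i)^{-1}\right]^{-1}$$

$$\hat\beta_i = (x_i'x_i)^{-1}x_i'y_i$$

which is a weighted average of $\hat\beta_i$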
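
The equivalence between the direct GLS formula and the matrix-weighted average of unit-by-unit OLS estimates can be checked numerically. The sketch below does so on simulated data with Ω and $\sigma_i^2$ treated as known; all values are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 4, 10, 2
Omega = np.array([[0.5, 0.1], [0.1, 0.3]])        # assumed known here
sig2 = rng.uniform(0.5, 2.0, N)                    # assumed known here
xs = [np.column_stack([np.ones(T), rng.normal(size=T)]) for _ in range(N)]
ys = [x @ rng.normal(size=K) + rng.normal(size=T) for x in xs]

# Direct GLS: sum x_i' V_i^{-1} x_i and sum x_i' V_i^{-1} y_i
A = np.zeros((K, K))
b = np.zeros(K)
for x, y, s2 in zip(xs, ys, sig2):
    V_inv = np.linalg.inv(x @ Omega @ x.T + s2 * np.eye(T))
    A += x.T @ V_inv @ x
    b += x.T @ V_inv @ y
beta_gls = np.linalg.solve(A, b)

# Weighted-average form: omega_i applied to unit-specific OLS estimates
P = [np.linalg.inv(Omega + s2 * np.linalg.inv(x.T @ x)) for x, s2 in zip(xs, sig2)]
P_total_inv = np.linalg.inv(sum(P))
omega = [P_total_inv @ P_i for P_i in P]
beta_ols = [np.linalg.solve(x.T @ x, x.T @ y) for x, y in zip(xs, ys)]
beta_wavg = sum(w @ b_i for w, b_i in zip(omega, beta_ols))
```

The weights sum to the identity matrix, so units with noisier OLS estimates (large $\sigma_i^2 (x_i'x_i)^{-1}$) receive less weight.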


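FGLS estimator
I Requires estimates of Ω and $\sigma_i^2 \;\forall i$
I Estimates given by

$$\hat\sigma_i^2 = \frac{y_i'M_iy_i}{T - K} = \frac{\hat\varepsilon_i'\hat\varepsilon_i}{T - K}$$

$$\hat\Omega = \frac{S_b}{N - 1} - \frac{1}{N}\sum_i \hat\sigma_i^2(x_i'x_i)^{-1}$$

where

$$M_i = I - x_i(x_i'x_i)^{-1}x_i'$$

$$S_b = \sum_i \hat\beta_i\hat\beta_i' - \frac{1}{N}\sum_i\hat\beta_i\sum_i\hat\beta_i'$$

and $\hat\beta_i$ and $\hat\varepsilon_i$ are obtained via panel-specific OLS (∴ large T)

Can test $H_0: \beta_1 = \cdots = \beta_N$ after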

Stata: -xtrc-


Pesaran & Smith (1995) estimators: $c_i \neq c$, $\gamma_i \neq \gamma \;\forall i$

Mean Group (MG) Estimator

I Estimate separate time series regressions for each i
I Obtain an average estimate of γ and β across the N regressions

Pooled (P) Estimator
I Pool data, but allow parameters to have a random component

Aggregate (A) Estimator
I Average the data across i (for each t)
I Estimate a single, aggregate time series regression

Cross-Section (CS) Estimator
I Average the data across t (for each i)
I Estimate a single, cross-section regression

⇒ MG provides estimates of individual-specific coeffs, as well as average coeffs; remaining estimators estimate only the average coeffs

See -xtmg- in Stata
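
A bare-bones version of the MG estimator is easy to write down: run OLS unit by unit and average. The sketch below is an illustration under assumptions, not the -xtmg- implementation; in particular, the standard errors use the cross-unit dispersion of the coefficients, which is one common choice.

```python
import numpy as np

def mean_group(panels):
    """panels: list of (y_i, X_i) pairs, one per unit.

    Returns the unit-specific OLS coefficients, their simple average
    (the MG estimate), and standard errors based on the sample
    dispersion of the unit-specific estimates.
    """
    coefs = np.array([np.linalg.lstsq(X, y, rcond=None)[0] for y, X in panels])
    mg = coefs.mean(axis=0)
    se = coefs.std(axis=0, ddof=1) / np.sqrt(len(panels))
    return coefs, mg, se

# Noise-free toy panel (hypothetical slopes) so the output is easy to verify
rng = np.random.default_rng(2)
true_b = np.array([[1.0, 0.5], [2.0, -0.5], [3.0, 1.5]])
panels_demo = []
for b_i in true_b:
    X = np.column_stack([np.ones(8), rng.normal(size=8)])
    panels_demo.append((X @ b_i, X))

coefs_demo, mg_demo, se_demo = mean_group(panels_demo)
```

Each unit-level regression needs enough time periods to be estimated on its own, which is why the MG approach is suited to panels with large T.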


Properties
I In a static model ($\gamma_i = 0 \;\forall i$) where x is strictly exogenous and the $\beta_i$ differ randomly and are distributed independently across i, all four estimators are unbiased and consistent for β

I In a dynamic model ($\gamma_i \neq 0 \;\forall i$), this is not the case
F MG estimator remains consistent as N, T → ∞
F P and A estimators are inconsistent due to serial correlation in the regressors (at least $y_{it-1}$ is serially correlated due to $c_i$) which, when coeff heterogeneity is ignored, induces serial correlation in the error term which leads to inconsistent estimates in models with lagged dependent vars (proofs in paper)

F CS estimator remains consistent for the average long-run coeffs, θ, where $\theta_i = \beta_i/(1 - \gamma_i)$, but is biased and inconsistent if based on short T


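Pooled estimator
I Estimating equation

$$y_{it} = c_i + \gamma y_{it-1} + x_{it}\beta + [\varepsilon_{it} + \eta_{1i}y_{it-1} + x_{it}\eta_{2i}] = c_i + \gamma y_{it-1} + x_{it}\beta + \upsilon_{it}$$

where $E[y_{it-1}\upsilon_{it}] \neq 0$ and $E[x_{it}\upsilon_{it}] \neq 0$ if x is serially correlated; $E[y_{it-1}\upsilon_{it}] \neq 0$ even if x is not serially correlated since $E[y_{it-1}\eta_{1i}] \neq 0$

I IV solution does not exist since any strong IV will also be correlated with $\upsilon_{it}$

Aggregate estimator
I Estimating equation

$$\bar y_t = \bar c + \gamma\bar y_{t-1} + \bar x_t\beta + \left[\bar\varepsilon_t + \frac{1}{N}\sum_i(\eta_{1i}y_{it-1} + x_{it}\eta_{2i})\right] = \bar c + \gamma\bar y_{t-1} + \bar x_t\beta + \bar\upsilon_t$$

I OLS will be biased due to correlation between regressors and υ
I Unlikely that an IV solution exists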


Cross-section estimator
I Estimating equation

$$\bar y_i = c_i + \gamma\bar y_{i,-1} + \bar x_i\beta + [\bar\varepsilon_i + (\eta_{1i}\bar y_{i,-1} + \bar x_i\eta_{2i})] = c_i + \gamma\bar y_{i,-1} + \bar x_i\beta + \bar\upsilon_i$$

which is the between estimator of the dynamic model
I OLS will be biased even if γ and β are homogeneous, but c is heterogeneous (due to correlation with $\bar y_{i,-1}$)


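Alternative cross-section estimator
I Estimating equation obtained by substituting

$$\bar y_{i,-1} = \bar y_i - (1/T)(y_{iT} - y_{i0}) = \bar y_i - \Delta_i$$

since $\bar y_{i,-1} = (1/T)\sum_{t=1}^{T} y_{it-1}$ and solving for $\bar y_i$

$$\bar y_i = \frac{1}{1 - \gamma_i}c_i - \frac{\gamma_i}{1 - \gamma_i}\Delta_i + \bar x_i\theta_i + \frac{1}{1 - \gamma_i}\bar\varepsilon_i$$

I Introduce heterogeneity as

$$\frac{\gamma_i}{1 - \gamma_i} = \frac{\gamma}{1 - \gamma} + \xi_{1i}$$

$$\theta_i = \theta + \xi_{2i}$$

yielding

$$\bar y_i = \frac{1}{1 - \gamma_i}c_i - \frac{\gamma}{1 - \gamma}\Delta_i + \bar x_i\theta + \left[\frac{1}{1 - \gamma_i}\bar\varepsilon_i - \xi_{1i}\Delta_i + \bar x_i\xi_{2i}\right]$$

which can be estimated by OLS omitting $\Delta_i$ since it is asymptotically uncorrelated with $\bar x_i$

I Provides biased but consistent estimates of the average LR coeff, θ, if x is uncorrelated with c; bias vanishes only as T → ∞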


Pesaran, Shin & Smith (1997, 1999) estimator

MG estimator allows all parameters to vary by i

FE estimator allows only the intercept, c , to vary with i

Pooled Mean Group (PMG) estimator is a compromise
I Model written as an ECM
I The long-run coeffs in the cointegrating relationship, θ, are constrained to be equal across i

I The short-run coeffs and speed of adjustment parameters vary by i
I Stata: -xtpmg-

Fomby & Balke (1997): threshold cointegration allows the speed of adjustment parameter to vary between two regimes


Nonstationary Panels

The increased use of panel data by more macro-oriented economists led to study of asymptotics as both N, T → ∞

Allowing T → ∞ led to concerns over nonstationarity, spurious regressions, cointegration

Estimation with nonstationary data is easier in some respects with panel data than with pure time series

I Many test statistics and estimators are asymptotically normal whereas they are not with time series

I Spurious regression gives consistent estimates as N, T → ∞

Brief overview of these topics; see Baltagi for details and additional references


Nonstationary PanelsPanel Unit Roots

Asymptotic properties of test statistics and estimators depend crucially on the way in which N and T go to ∞

Three possibilities

1 Sequential Limit: one, say N, goes to ∞ followed by the other, say T
2 Diagonal Path Limit: N, T → ∞ at same rate, but along a specific, diagonal path s.t. T = T(N)

3 Joint Limit: N, T → ∞ at same rate without placing diagonal path restrictions

Joint limit theory is the most robust, but is more difficult and requires stronger conditions (e.g., existence of higher moments)

Stata: -xtunitroot- (replaces previous, user-written commands)


Model

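Given by

$$y_{it} = \rho_i y_{i,t-1} + z_{it}\beta_i + \varepsilon_{it}$$

where

$$z_{it} = \begin{cases} 0 & \Rightarrow \text{no constant} \\ 1 & \Rightarrow \text{FE model} \\ [1 \;\; t] & \Rightarrow \text{FE model with panel-specific time trend} \end{cases}$$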


Test Procedures

1 Levin, Lin, & Chu (LLC, 2002)
2 Harris & Tzavalis (1999)
3 Breitung (2000)
4 Im, Pesaran, & Shin (IPS, 2003)
5 Fisher-type tests
6 Hadri (2000)
7 Pesaran (2003)

Hlouskova & Wagner (2006) conduct a large-scale MC study of the different tests

Tests differ in terms of
1 Whether ρi = ρ ∀i is assumed
2 Rate at which N,T → ∞

Note: no constant ⇒ z = 0; trend ⇒ z = [1 t]

Before examining the panel tests, recall the ADF tests from a pure time series

I Model

$$y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{s=1}^{p} \gamma_s \Delta y_{t-s} + \varepsilon_t, \quad \text{or}$$

$$\Delta y_t = \alpha + \beta t + \delta y_{t-1} + \sum_{s=1}^{p} \gamma_s \Delta y_{t-s} + \varepsilon_t$$

where δ = γ − 1, and the random walk with and without drift are obtained by imposing the proper restrictions

I Test statistics are

$$DF_\tau = \frac{\hat{\gamma} - 1}{\text{s.e.}(\hat{\gamma})}$$

or the usual t-statistic for δ
I Critical values depend on T and on whether the intercept and/or the time trend are included
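A minimal sketch of the ADF regression in its ∆y form, computing the usual t-statistic for δ by OLS. The function name `adf_tstat` and the simulated series are assumptions for illustration; the sketch returns only the statistic, since the nonstandard critical values must come from the Dickey-Fuller tables (or from software such as Stata).

```python
import numpy as np

def adf_tstat(y, p=1, trend=False):
    """t-statistic for delta in
    dy_t = alpha (+ beta*t) + delta*y_{t-1} + sum_{s=1..p} gamma_s*dy_{t-s} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                       # dy[k] = y[k+1] - y[k]
    n = len(dy) - p                       # usable observations
    cols = [np.ones(n)]
    if trend:
        cols.append(np.arange(n, dtype=float))
    cols.append(y[p:-1])                  # lagged level y_{t-1}
    for s in range(1, p + 1):
        cols.append(dy[p - s:len(dy) - s])  # lagged differences dy_{t-s}
    Z = np.column_stack(cols)
    d = dy[p:]
    coef = np.linalg.lstsq(Z, d, rcond=None)[0]
    resid = d - Z @ coef
    s2 = resid @ resid / (n - Z.shape[1])
    cov = s2 * np.linalg.inv(Z.T @ Z)
    j = 2 if trend else 1                 # position of delta in coef
    return coef[j] / np.sqrt(cov[j, j])

# Simulated examples (assumed): a random walk and a stationary AR(1)
rng = np.random.default_rng(0)
e = rng.standard_normal(500)
rw = np.cumsum(e)
ar = np.zeros(500)
for t in range(1, 500):
    ar[t] = 0.5 * ar[t - 1] + e[t]

t_rw = adf_tstat(rw, p=1)
t_ar = adf_tstat(ar, p=1)
print("random walk:", t_rw, " stationary AR(1):", t_ar)
```

The stationary series should produce a much more negative statistic than the random walk; comparing either value to standard normal critical values would be incorrect.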


Harris & Tzavalis (1999) Test
I Model

$$y_{it} = \rho y_{i,t-1} + z_{it}\beta_i + \varepsilon_{it}$$

where ε is iid and normally distributed
I Hypotheses

Ho : all panels contain a unit root

Ha : no panels contain a unit root

I Test statistic is based on the derivation of the asymptotic dbn of $\hat{\rho}_{OLS}$ under Ho (and each choice of z)

$$\sqrt{N}(\hat{\rho}_{OLS} - \mu) \sim N(0, \sigma^2)$$

where µ and σ² depend on the choice of z
I Asymptotic dbn is valid as N → ∞ (even with fixed T)
I Test requires a balanced panel


Levin, Lin & Chu (LLC, 2002) Test
I Model

$$\Delta y_{it} = \phi y_{i,t-1} + z_{it}\beta_i + \sum_{s=1}^{p} \gamma_{is} \Delta y_{i,t-s} + u_{it}$$

where p is chosen based on some model selection criterion
I Hypotheses

Ho : all panels contain a unit root (φ = 0)

Ha : no panels contain a unit root (φ < 0)

I Test requires
1 εit to be independent across panels (zero cross-sectional dependence); uit to be white noise
2 √N/T → 0 or N/T → 0, depending on choice of z
3 balanced panel
I Test is based on a ‘bias-adjusted’ t-statistic for φ, which has an asymptotic N(0, 1) dbn
I However, procedure starts with panel-specific OLS estimates of the model


Breitung (2000) Test
I Amended version of LLC that performs better with fixed effects
I Hypotheses

Ho : all panels contain a unit root

Ha : no panels contain a unit root

I Test requires
1 a balanced panel
2 ε is iid
I A robust version is available that allows for contemporaneous cross-sectional dependence


Im, Pesaran & Shin (IPS, 2003) Test
I Model

$$\Delta y_{it} = \phi_i y_{i,t-1} + z_{it}\beta_i + \sum_{s=1}^{p} \gamma_{is} \Delta y_{i,t-s} + \varepsilon_{it}$$

where ε is iid normal across panels, but may be groupwise heteroskedastic
I Hypotheses

Ho : all panels contain a unit root (φi = 0 ∀i)

Ha : at least some panels do not contain a unit root

I Formally, Ha entails that $\lim_{N \to \infty} (N_1/N) = \delta$, δ ∈ (0, 1], where N1 is the # of panels without a unit root
I Test statistic based on average of the ADF statistics for φ obtained from each panel separately
I Critical values based on fixed T and either fixed N or N → ∞
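The averaging step behind the IPS statistic can be sketched as follows. This is a simplified illustration assuming z = 1 (constant only) and no lag augmentation (p = 0); the names `df_tstat` and `t_bar` are hypothetical. The sketch stops at the raw average: turning it into a test requires the tabulated critical values or the standardizing moments from IPS, which are not reproduced here.

```python
import numpy as np

def df_tstat(y):
    """DF t-statistic for phi in dy_t = a + phi*y_{t-1} + e_t (constant, no lags)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    Z = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef = np.linalg.lstsq(Z, dy, rcond=None)[0]
    resid = dy - Z @ coef
    s2 = resid @ resid / (len(dy) - 2)
    cov = s2 * np.linalg.inv(Z.T @ Z)
    return coef[1] / np.sqrt(cov[1, 1])

def t_bar(Y):
    """Average of the panel-by-panel DF t-statistics; Y has shape (N, T)."""
    return float(np.mean([df_tstat(yi) for yi in Y]))

# Simulated panels (assumed): all unit roots vs. all stationary AR(1)
rng = np.random.default_rng(1)
N, T = 25, 100
unit_root = np.cumsum(rng.standard_normal((N, T)), axis=1)
shocks = rng.standard_normal((N, T))
stationary = np.zeros((N, T))
for t in range(1, T):
    stationary[:, t] = 0.5 * stationary[:, t - 1] + shocks[:, t]

tbar_ur = t_bar(unit_root)
tbar_st = t_bar(stationary)
print("t-bar, unit-root panels:", tbar_ur)
print("t-bar, stationary panels:", tbar_st)
```

Because each panel is estimated separately, φi is free to differ across i, which is what distinguishes this approach from the pooled LLC and Harris & Tzavalis regressions.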


Combining p-values (Fisher-type) Test
I Developed in Maddala & Wu (1999)
I Hypotheses

Ho : all panels contain a unit root

Ha : at least some panels do not contain a unit root

I Similar to IPS, but rather than averaging test statistics across individual panels, test is based on aggregating p-values from individual time series unit root tests
I P-values obtained using either ADF or Phillips-Perron unit root tests applied to each panel
I Choi (2001) offers some extensions
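The aggregation step can be sketched with the inverse chi-squared combination P = −2 ∑i ln(pi), which is distributed χ²(2N) under Ho when the N panel-level tests are independent. The function name `fisher_combined` and the example p-values are made up for illustration; the chi-squared tail probability uses the closed form available for even degrees of freedom, so no external library is needed.

```python
import math

def fisher_combined(pvalues):
    """Inverse chi-squared (Fisher) combination of independent p-values.

    Returns (P, p_combined), where P = -2 * sum(ln p_i) ~ chi2(2N) under Ho.
    Assumes every p-value lies strictly between 0 and 1.
    """
    n = len(pvalues)
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    # Survival function of chi2 with 2n d.o.f.:
    #   exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!
    half = stat / 2.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half / k
        total += term
    return stat, math.exp(-half) * total

# Hypothetical p-values from panel-by-panel ADF tests
example_p = [0.04, 0.20, 0.11, 0.52, 0.03]
P, p_comb = fisher_combined(example_p)
print("Fisher statistic:", P, " combined p-value:", p_comb)
```

One practical attraction of this approach is that it does not require a balanced panel, since each p-value comes from its own time series.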


Residual-Based LM Test
I Developed in Hadri (2000); flips the null and alternative hypotheses to address bias toward the null in classical statistical tests
I Hypotheses

Ho : no panels contain a unit root

Ha : at least one panel contains a unit root

I Test designed for large T, moderate N panels; asymptotically test requires T → ∞ followed by N → ∞
I Test statistic is based on the OLS residuals
I Test requires a balanced panel


Pesaran (2003) Test
I Extends IPS to allow for cross-sectional dependence; referred to as CIPS
I Test statistic based on average of the cross-sectionally augmented DF (CADF) statistics obtained from each panel separately
I CADF regressions include the lagged cross-section mean, $\bar{y}_{t-1}$, and its FD, $\Delta\bar{y}_t$, in the ADF regressions
I Stata: -pescadf-
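A rough sketch of the CADF regressions and their average, assuming a constant and no lag augmentation. The function name `cips` and the simulated common-factor design are assumptions for illustration; this is not the -pescadf- implementation, and the statistic must be compared with Pesaran's simulated critical values, which are not reproduced here.

```python
import numpy as np

def cips(Y):
    """Average of CADF t-statistics; Y has shape (N, T).

    For each panel i, regress dy_it on a constant, y_{i,t-1}, the lagged
    cross-section mean ybar_{t-1}, and its first difference d(ybar_t),
    and keep the t-statistic on y_{i,t-1}.
    """
    Y = np.asarray(Y, dtype=float)
    ybar = Y.mean(axis=0)                 # cross-section mean in each period
    d_ybar = np.diff(ybar)
    tstats = []
    for yi in Y:
        dy = np.diff(yi)
        Z = np.column_stack([np.ones(len(dy)), yi[:-1], ybar[:-1], d_ybar])
        coef = np.linalg.lstsq(Z, dy, rcond=None)[0]
        resid = dy - Z @ coef
        s2 = resid @ resid / (len(dy) - Z.shape[1])
        cov = s2 * np.linalg.inv(Z.T @ Z)
        tstats.append(coef[1] / np.sqrt(cov[1, 1]))
    return float(np.mean(tstats)), np.array(tstats)

# Simulated stationary panels sharing a common shock f_t (assumed design)
rng = np.random.default_rng(2)
N, T = 20, 100
f = rng.standard_normal(T)
e = rng.standard_normal((N, T))
Y = np.zeros((N, T))
for t in range(1, T):
    Y[:, t] = 0.5 * Y[:, t - 1] + f[t] + e[:, t]

cips_stat, cadf_stats = cips(Y)
print("CIPS:", cips_stat)
```

The cross-section means act as proxies for the unobserved common factor, which is how the regressions absorb the cross-sectional dependence that IPS rules out.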


Nonstationary Panels: Spurious Regression

Entorf (1997)
I FE regressions involving independent random walks with and without drift
I For T → ∞ and N fixed, standard FE results are highly misleading: coefficients appear significant and R² is high

Phillips & Moon (1999)
I Analyzed four cases:
1 Panel spurious regression with no cointegration
2 Heterogeneous panel cointegration (vectors vary across i)
3 Homogeneous panel cointegration
4 Near-homogeneous panel cointegration
I Show that pooled OLS is consistent in each case using both sequential and joint limits
I Intuition: information from independent cross-section data is valuable

Choi (2013)

My own simulations follow ...


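Simulations
I DGP

$$y_{it} = c_i + y_{i,t-1} + \varepsilon_{yt}, \quad t = 2, \dots, T$$

$$x_{it} = c_i + x_{i,t-1} + \varepsilon_{xt}, \quad t = 2, \dots, T$$

$$y_{i1} = c_i, \qquad x_{i1} = c_i$$

$$\varepsilon_{xt}, \varepsilon_{yt} \sim N(0, 1)$$

I Reps = 20,000
I Sample sizes
1 N = 10, T = 101
2 N = 10, T = 1001
3 N = 1000, T = 101
4 N = 1000, T = 1001
I Estimating equation

$$y_{it} = c_i + \beta x_{it} + \varepsilon_{it}$$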
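A minimal sketch of how a simulation along these lines could be set up. It is not the code behind the slides: the distribution of ci is not stated above, so ci ~ N(0, 1) is an assumption, as is drawing the shocks independently for each i and t. The function name `simulate_fe`, the conventional FE standard error, and the reduced number of reps are likewise choices made for illustration.

```python
import numpy as np

def simulate_fe(N, T, reps, seed=0):
    """Within (FE) estimates of beta and conventional t-stats for Ho: beta = 0."""
    rng = np.random.default_rng(seed)
    betas = np.empty(reps)
    tstats = np.empty(reps)
    for r in range(reps):
        c = rng.standard_normal((N, 1))          # ASSUMPTION: c_i ~ N(0, 1)
        inc_y = c + rng.standard_normal((N, T))  # c_i + shock, t = 2..T
        inc_x = c + rng.standard_normal((N, T))
        inc_y[:, 0] = c[:, 0]                    # y_i1 = c_i
        inc_x[:, 0] = c[:, 0]                    # x_i1 = c_i
        y = np.cumsum(inc_y, axis=1)
        x = np.cumsum(inc_x, axis=1)
        yd = y - y.mean(axis=1, keepdims=True)   # within transformation
        xd = x - x.mean(axis=1, keepdims=True)
        sxx = (xd ** 2).sum()
        b = (xd * yd).sum() / sxx
        resid = yd - b * xd
        s2 = (resid ** 2).sum() / (N * T - N - 1)
        betas[r] = b
        tstats[r] = b / np.sqrt(s2 / sxx)
    return betas, tstats

# Smallest design from the list above, with far fewer reps than 20,000
betas, tstats = simulate_fe(N=10, T=101, reps=200)
print("mean beta:", betas.mean(), " sd beta:", betas.std())
print("mean t-statistic:", tstats.mean())
```

Under these assumptions the shared drift ci dominates both series, so the estimates cluster around 1 while the conventional t-statistics are very large.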


[Figure: Spurious Regression, Empirical Distribution of Beta. Density against Beta (x-axis from .7 to 1.2). Note: y=c+L.y+ey; x=c+L.x+ex; ey,ex ~ N(0,1). N = 10; T = 101; Reps = 20000.]

[Figure: Spurious Regression, Empirical Distribution of t-Statistics. Density against t-statistic (x-axis from 0 to 400). Note: y=L.y+ey; x=L.x+ex; ey,ex ~ N(0,1). N = 10; T = 101; Reps = 20000.]

[Figure: Density against Beta (x-axis from .9 to 1.1); title and note not recoverable.]

[Figure: Density against t-statistic (x-axis from 0 to 4000); title and note not recoverable.]

[Figure: Density against Beta (x-axis from .96 to 1); title and note not recoverable.]

[Figure: Density against t-statistic (x-axis from 1400 to 1800); title and note not recoverable.]

[Figure: Density against Beta (x-axis from .99 to 1.005); title and note not recoverable.]

[Figure: Density against t-statistic (x-axis from 14000 to 18000); title and note not recoverable.]

Nonstationary Panels: Cointegration

Model

$$y_{it} = \beta_i x_{it} + z_{it}\gamma_i + \varepsilon_{it}$$

where y and x are I(d), d > 0, variables

$$z_{it} = \begin{cases} 0 & \Rightarrow \text{no constant} \\ 1 & \Rightarrow \text{FE model} \\ [1 \;\; t] & \Rightarrow \text{FE model with panel-specific time trend} \end{cases}$$

and εit is I(d − 1) iff y and x are cointegrated

Homogeneity or poolability refers to whether the cointegrating vector is common across panels, βi = β

Tests differ depending on whether they allow for cross-sectional dependence, structural breaks, etc.

Error correction models allow for estimation of adjustment speed conditional on cointegration


Test procedures

Kao (1999) Tests
I Based on DF- or ADF-type tests of the FE residuals
I Different test statistics depending on assumptions concerning strict exogeneity
I Ho : no cointegration

Residual-Based LM Test
I Developed in McCoskey & Kao (1998)
I Based on residuals from Fully Modified OLS (FMOLS) or dynamic OLS (DOLS)
I Ho : cointegration
I Stata: -xtdolshm-

Pedroni (2000, 2004) Tests
I Ho : no cointegration
I Stata: -xtpedroni-

Westerlund (2007)
I Stata: -xtwest-

