ECO 6375: Econometrics III
Daniel L. Millimet
Southern Methodist University
Fall 2020
DL Millimet (SMU) ECO 6375 Fall 2020 1 / 164
Panel Data: Introduction

Definition: Panel data refers to data with multiple observations of each basic unit of observation.

- Does not necessarily involve a time dimension
- Examples
  - $y_{it}$ = outcome of observation $i$ at time $t$ ($i = 1, \dots, N$; $t = 1, \dots, T$)
    - Exs: countries over time; firms over time; individuals over time
  - $y_{ih}$ = outcome of individual $i$ in household $h$ ($i = 1, \dots, N$; $h = 1, \dots, H$)
  - $y_{im}$ = outcome of employee $i$ in firm $m$ ($i = 1, \dots, N$; $m = 1, \dots, M$)
  - $y_{is}$ = test score of student $i$ in subject $s$ ($i = 1, \dots, N$; $s = 1, \dots, S$)
- Preceding examples are two-dimensional panels
  - More complex panels involve more dimensions
  - Examples
    - $y_{ist}$ = outcome of county $i$ in state $s$ at time $t$
    - $y_{iht}$ = outcome of individual $i$ in household $h$ at time $t$
    - $y_{ijst}$ = outcome of firm $i$ in industry $j$ in state $s$ at time $t$
- For exposition, we will stick to repeated observations of the same unit $i$ over time $t$
Types of panel data sets

Definition: A balanced panel data set refers to the situation where each unit of observation $i$ is observed for the same number of time periods, $T$. Thus, the total sample size is $NT$.

Definition: An unbalanced panel data set refers to the situation where each unit of observation $i$ is observed for an unequal number of time periods, $T_i$. Thus, the total sample size is $\sum_i T_i$.

- Micro vs macro panels
  - Micro: Typically large $N$, small $T$
  - Macro: Typically small $N$, large $T$
    - Some convergence over time as data become plentiful
Why does panel data require special attention?

- Panel data is useful due to
  1. Efficiency gains from large samples
  2. Potential to analyze dynamic behavior (relative to cross-sectional data)
  3. Potential to address correlation between covariates and the error
     - Panel data methods provide a solution to endogeneity in only certain situations
     - Requires 'new' estimation techniques
- Asymptotics with panel data requires more precision
  - Large $N$, small $T$
  - Small $N$, large $T$
  - Large $N$, large $T$ (jointly or sequentially)
- Econometric analysis of panel data started with Mundlak (1961, 1978), Hoch (1962), and Balestra & Nerlove (1966)
Panel Data: Model & Terminology

- Population regression function given by $E[y \mid x_1, \dots, x_K, c]$
  - Assuming linearity: $E[y \mid x_1, \dots, x_K, c] = \beta_0 + x\beta + c$
  - $x_k$, $k = 1, \dots, K$, are observable (to the econometrician)
  - $c$ is an unobservable (to the econometrician) variable
- Error form of the model
$$y = \beta_0 + x\beta + c + \varepsilon$$
  where $c$ is the unobserved effect and $\varepsilon$ is the idiosyncratic error
- $c$ is also referred to as individual heterogeneity, unobserved component, individual effect, or latent variable
- Some history
  - $c$ may be thought of either as a random variable, whose distribution may be estimated, or as a fixed parameter to be estimated
    - $c$ is called a random effect in the first view and a fixed effect in the second view (Balestra & Nerlove 1966)
    - This terminology is not the common usage of these terms today
  - Terminology has become linked to the assumption concerning correlation between $x$ and $c$
    - Random effect: $\text{Cov}(x, c) = 0$
    - Fixed effect: $\text{Cov}(x, c) \neq 0$
  - The 'modern' usage of the terms is now universal
- Formally, the model is given by
$$y_{it} = \beta_0 + x_{it}\beta + \tilde{\varepsilon}_{it}$$
  where $\tilde{\varepsilon}_{it}$ is the composite error and captures all unobserved determinants of $y_{it}$
  - The composite error is decomposed into parts
  - In a one-way error components model, $\tilde{\varepsilon}_{it} = c_i + \varepsilon_{it}$
    - Unobserved effect ($c_i$) captures time invariant unobserved attributes specific to $i$
    - Idiosyncratic error ($\varepsilon_{it}$) captures the remainder
  - In a two-way error components model, $\tilde{\varepsilon}_{it} = c_i + \lambda_t + \varepsilon_{it}$
    - Unobserved time effect ($\lambda_t$) captures individual invariant unobserved attributes specific to $t$
    - Idiosyncratic error captures the remainder
- Problem: Given the presence of $c_i$ and $\lambda_t$ and access to panel data, how can we recover consistent estimates of $\beta$?
Roadmap

1. Estimation techniques
   1. Pooled OLS (POLS)
   2. Random effects (RE)
   3. Least squares dummy variable model (LSDV)
   4. Fixed effects (FE)
   5. First-differencing (FD)
   6. Long differences (LD)
2. Dynamic panel models
3. LDV models revisited
4. Time series concerns revisited
Panel Data: Pooled OLS

- Estimate by OLS
$$y_{it} = \beta_0 + x_{it}\beta + \tilde{\varepsilon}_{it}$$
- Assumptions
  - (POLS.i) Linearity
  - (POLS.ii) (Contemporaneous) Exogeneity: $E[x_{it}'\tilde{\varepsilon}_{it}] = 0$, $t = 1, \dots, T$
- If $E[\tilde{\varepsilon}\tilde{\varepsilon}'] \neq \sigma^2 I_{NT}$, POLS
  1. Is inefficient relative to GLS
  2. Provides incorrect standard errors
  3. Requires large $N$, small $T$ asymptotics
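- Two options
  - Estimate parameters using POLS, but report cluster-robust standard errors to allow for arbitrary correlation within $i$
    - See Cameron & Miller (JHR, 2015)
    - More details later
    - Stata: -reg, vce(cluster i)-
  - Place structure on $E[\tilde{\varepsilon}\tilde{\varepsilon}'] = \Omega(\theta)$ and use pooled FGLS (PFGLS)
    - Equicorrelated errors: $\rho_{ts} = \text{Corr}(\tilde{\varepsilon}_{it}, \tilde{\varepsilon}_{is}) = \rho$ $\forall i$, $t \neq s$
    - Autoregressive errors: AR(p) s.t. $\tilde{\varepsilon}_{it} = \sum_{s=1}^{p} \rho_s \tilde{\varepsilon}_{it-s} + u_{it}$
    - Moving average errors: MA(p) s.t. $\tilde{\varepsilon}_{it} = u_{it} - \sum_{s=1}^{p} \theta_s u_{it-s}$
    - Unstructured, except equal across $i$:
$$\hat{\rho}_{ts} = \frac{1}{N}\sum_i (\hat{\varepsilon}_{it} - \bar{\hat{\varepsilon}}_t)(\hat{\varepsilon}_{is} - \bar{\hat{\varepsilon}}_s)$$
    - Stata: -xtreg, pa-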
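The first option above (POLS point estimates with cluster-robust standard errors) can be sketched in NumPy. This is a minimal illustration, not the Stata implementation: the function name `pols_cluster`, the simulated data, and the absence of any finite-sample correction are assumptions of the sketch.

```python
import numpy as np

def pols_cluster(y, X, groups):
    """Pooled OLS with cluster-robust (sandwich) standard errors.

    y: (n,) outcomes; X: (n, k) regressors including a constant column;
    groups: (n,) cluster labels (here, the unit index i).
    No finite-sample correction is applied (an assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        idx = groups == g
        score = X[idx].T @ resid[idx]   # sum over t of x_it' * e_it within unit i
        meat += np.outer(score, score)
    V = bread @ meat @ bread
    return beta, np.sqrt(np.diag(V))

# Simulated balanced panel with a unit effect uncorrelated with x (hypothetical data)
rng = np.random.default_rng(0)
N, T = 200, 5
ids = np.repeat(np.arange(N), T)
x = rng.normal(size=N * T)
c = np.repeat(rng.normal(size=N), T)
y = 1.0 + 2.0 * x + c + rng.normal(size=N * T)
X = np.column_stack([np.ones(N * T), x])
beta_hat, se_hat = pols_cluster(y, X, ids)
```

Summing the scores within each unit before taking the outer product is what allows arbitrary correlation of the composite error across periods for the same $i$.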
Panel Data: Random Effects

- Estimate by FGLS
$$y_{it} = \beta_0 + x_{it}\beta + \tilde{\varepsilon}_{it}, \quad \tilde{\varepsilon}_{it} = c_i + \varepsilon_{it}$$
- This is identical to POLS except that explicit structure is placed on the composite error
- Assumptions
  - (RE.i) Strict exogeneity: $E[\varepsilon_{it} \mid x_i, c_i] = 0$, $t = 1, \dots, T$ ($x_i$ is a vector)
  - (RE.ii) Independence: $E[c_i \mid x_i] = E[c_i] = 0$
  - (RE.iii) Full rank: $\text{rank}\, E[x_i'\Omega_i^{-1}x_i] = K$, where $\Omega_i = E[\tilde{\varepsilon}_i\tilde{\varepsilon}_i']$ is a $T \times T$ matrix
  - (RE.iv) Conditional homoskedasticity and zero covariances: $E[\varepsilon_i\varepsilon_i' \mid x_i, c_i] = \sigma^2_\varepsilon I_T$
  - (RE.v) Homoskedasticity: $E[c_i^2 \mid x_i] = \sigma^2_c$
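Estimation by Feasible GLS (FGLS)

- Under (RE.i) through (RE.iii), FGLS uses an unrestricted estimate of $\Omega_i$
$$\hat{\beta}_{FGLS} = \left(\sum_i x_i'\hat{\Omega}_i^{-1}x_i\right)^{-1}\left(\sum_i x_i'\hat{\Omega}_i^{-1}y_i\right)$$
- Using (RE.iv), (RE.v), $\Omega_i = \Omega(\theta)$ has a convenient form (the RE structure)
$$\Omega(\theta) = \begin{bmatrix} \sigma^2_\varepsilon + \sigma^2_c & \sigma^2_c & \cdots & \sigma^2_c \\ \sigma^2_c & \ddots & & \vdots \\ \vdots & & \ddots & \sigma^2_c \\ \sigma^2_c & \cdots & \sigma^2_c & \sigma^2_\varepsilon + \sigma^2_c \end{bmatrix}$$
- RE structure $\Rightarrow$
$$\text{corr}(\tilde{\varepsilon}_{it}, \tilde{\varepsilon}_{is}) = \frac{\sigma^2_c}{\sigma^2_c + \sigma^2_\varepsilon} = \frac{\text{Var}(c)}{\text{Var}(\tilde{\varepsilon})} > 0 \quad \forall t \neq s$$
  which measures the fraction of the variance of the composite error due to the unobserved effect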
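- Given estimates of $\sigma^2_c$ and $\sigma^2_\varepsilon$, the estimator is given by
$$\hat{\beta}_{RE} = \left(\sum_i x_i'\hat{\Omega}^{-1}x_i\right)^{-1}\left(\sum_i x_i'\hat{\Omega}^{-1}y_i\right)$$
  where
$$\hat{\Omega} = \hat{\sigma}^2_\varepsilon I_T + \hat{\sigma}^2_c\, j_T j_T'$$
  $I_T$ = $T \times T$ identity matrix, $j_T$ = $T \times 1$ vector of 1s
- Notes
  - Consistency does not require (RE.iv), (RE.v) to hold, just like OLS is consistent even when $\Omega \neq I_T$
  - Consistency of FGLS does require strict exogeneity such that
$$\text{plim}\left(\sum_i x_i'\hat{\Omega}^{-1}x_i\right)^{-1}\left(\sum_i x_i'\hat{\Omega}^{-1}\tilde{\varepsilon}_i\right) = 0$$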
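- To estimate $\sigma^2_c$ and $\sigma^2_\varepsilon$
  - Do POLS $\Rightarrow \hat{\beta}_{POLS}$
  - Obtain residuals $\Rightarrow \hat{\tilde{\varepsilon}}_{it}$
  - Obtain estimates
$$\hat{\sigma}^2_{\tilde{\varepsilon}} = \frac{1}{NT - K}\sum_i\sum_t \hat{\tilde{\varepsilon}}_{it}^2$$
$$\hat{\sigma}^2_c = \frac{1}{\left[\frac{NT(T-1)}{2} - K\right]}\sum_i\sum_{t=1}^{T-1}\sum_{s=t+1}^{T} \hat{\tilde{\varepsilon}}_{it}\hat{\tilde{\varepsilon}}_{is}$$
$$\hat{\sigma}^2_\varepsilon = \hat{\sigma}^2_{\tilde{\varepsilon}} - \hat{\sigma}^2_c$$
- If $\varepsilon_{it}$ is heteroskedastic and/or serially correlated, then
$$\hat{\Omega} = \frac{1}{N}\sum_i \hat{\tilde{\varepsilon}}_i\hat{\tilde{\varepsilon}}_i'$$
  is more efficient asymptotically, but improved finite sample performance requires $N \gg T$
- Note: The RE estimator is asymptotically equivalent to PFGLS imposing equicorrelated errors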
- Can show that the RE estimator is equivalent to a quasi-time demeaned estimator
  - Define
$$\check{z}_{it} \equiv z_{it} - \lambda\bar{z}_{i\cdot}$$
    for any variable $z_{it}$ and some value of $\lambda$, where $\bar{z}_{i\cdot}$ is the average of $z$ within $i$
  - The RE estimator may be obtained by POLS estimation of
$$\check{y}_{it} = \check{x}_{it}\beta + \check{\varepsilon}_{it}$$
    with
$$\lambda = 1 - \left[\frac{1}{1 + T\left(\sigma^2_c/\sigma^2_\varepsilon\right)}\right]^{1/2}$$
  - In contrast, POLS sets $\lambda = 0$
  - $\lambda \neq 0 \Rightarrow$ strict exogeneity is needed instead of (contemporaneous) exogeneity
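The quasi-demeaning equivalence can be checked numerically. The sketch below treats $\sigma^2_c$ and $\sigma^2_\varepsilon$ as known rather than estimated, uses simulated data, and the helper names `re_lambda` and `quasi_demean` are hypothetical. It compares POLS on quasi-demeaned data (constant included, so it becomes $1 - \lambda$ after the transformation) with direct GLS using the RE structure for $\Omega$.

```python
import numpy as np

def re_lambda(sigma2_c, sigma2_e, T):
    """Quasi-demeaning weight: lambda = 1 - [1 / (1 + T * sigma2_c / sigma2_e)]^(1/2)."""
    return 1.0 - np.sqrt(1.0 / (1.0 + T * sigma2_c / sigma2_e))

def quasi_demean(z, lam):
    """z has shape (N, T) or (N, T, K); subtract lam times the within-unit mean."""
    return z - lam * z.mean(axis=1, keepdims=True)

rng = np.random.default_rng(1)
N, T, s2c, s2e = 50, 4, 2.0, 1.0
x = rng.normal(size=(N, T))
c = rng.normal(scale=np.sqrt(s2c), size=(N, 1))
y = 0.5 + 1.5 * x + c + rng.normal(scale=np.sqrt(s2e), size=(N, T))
X = np.stack([np.ones((N, T)), x], axis=2)      # (N, T, 2), constant included

# POLS on quasi-demeaned data
lam = re_lambda(s2c, s2e, T)
Xq = quasi_demean(X, lam).reshape(N * T, 2)
yq = quasi_demean(y, lam).reshape(N * T)
beta_qd = np.linalg.lstsq(Xq, yq, rcond=None)[0]

# Direct GLS with Omega = s2e * I_T + s2c * j_T j_T'
Omega_inv = np.linalg.inv(s2e * np.eye(T) + s2c * np.ones((T, T)))
A = sum(X[i].T @ Omega_inv @ X[i] for i in range(N))
b = sum(X[i].T @ Omega_inv @ y[i] for i in range(N))
beta_gls = np.linalg.solve(A, b)
```

Setting `lam = 0` in the same code reproduces POLS, and `lam = 1` reproduces full time-demeaning.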
Specification test: RE vs. POLS

- Hypothesis
$$H_0: \sigma^2_c = 0 \qquad H_a: \sigma^2_c \neq 0$$
- Devising a test is difficult because the null lies on the edge of the feasible parameter space
- Breusch-Pagan (1980) test
$$\lambda_{LM} = \frac{NT}{2(T-1)}\left[\frac{\sum_{i=1}^{N}\left(\sum_{t=1}^{T}\hat{\varepsilon}_{it}\right)^2}{\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{\varepsilon}_{it}^2} - 1\right]^2 \sim \chi^2_1$$
  where $\hat{\varepsilon}_{it}$ is the residual from POLS and $\sum_{t=1}^{T}\hat{\varepsilon}_{it} = T\bar{\hat{\varepsilon}}_{i\cdot}$
- Rejection of $H_0$ is indicative of serial correlation and not necessarily the RE structure per se
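As a small worked illustration, the Breusch-Pagan statistic above can be computed directly from an $N \times T$ array of POLS residuals. The function name `breusch_pagan_lm` and the toy residual arrays are assumptions of this sketch.

```python
import numpy as np

def breusch_pagan_lm(resid):
    """Breusch-Pagan (1980) LM statistic for H0: sigma_c^2 = 0.

    resid: (N, T) array of POLS residuals from a balanced panel.
    Under H0 the statistic is asymptotically chi-squared with 1 dof.
    """
    resid = np.asarray(resid, dtype=float)
    N, T = resid.shape
    ratio = (resid.sum(axis=1) ** 2).sum() / (resid ** 2).sum()
    return N * T / (2.0 * (T - 1)) * (ratio - 1.0) ** 2

# Residuals that are constant within unit (strong unit effect)
lm_unit_effect = breusch_pagan_lm([[1.0, 1.0], [-1.0, -1.0]])
# Residuals with no within-unit cross products
lm_none = breusch_pagan_lm([[1.0, 0.0], [0.0, 1.0]])
```

The ratio equals 1 exactly when the within-unit cross products $\sum_{t \neq s}\hat{\varepsilon}_{it}\hat{\varepsilon}_{is}$ sum to zero, which is why the statistic picks up any within-unit serial correlation and not only the RE structure.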
- A more robust test, which is valid for any distribution of $\varepsilon_{it}$, is based on
$$\frac{\sum_{i=1}^{N}\sum_{t=1}^{T-1}\sum_{s=t+1}^{T}\hat{\varepsilon}_{it}\hat{\varepsilon}_{is}}{\left[\sum_{i=1}^{N}\left(\sum_{t=1}^{T-1}\sum_{s=t+1}^{T}\hat{\varepsilon}_{it}\hat{\varepsilon}_{is}\right)^2\right]^{1/2}} \sim N(0, 1)$$
  where rejection of $H_0$ again is indicative of serial correlation and not necessarily the RE structure per se
- Wood (Biometrika, 2013) develops an alternative test
- Baltagi & Li (1995) devise a test of serial correlation of $\varepsilon_{it}$ assuming $c$ and $\varepsilon$ are normally distributed; this is a test of the RE structure, as $\varepsilon$ should be serially uncorrelated
- Extension to the two-way error component model is most easily handled by adding $T - 1$ time dummies, although there is also a two-way RE error components model estimable via FGLS
- Stata: -xtreg, re-, -xttest0-
Panel Data: Fixed Effects

- Model
$$y_{it} = \beta_0 + x_{it}\beta + \tilde{\varepsilon}_{it}, \quad \tilde{\varepsilon}_{it} = c_i + \varepsilon_{it}$$
- Alternative strategies transform the model to eliminate $c_i$
  - Several possible transformations
  - One commonly used approach is FE; also known as the within estimator, or mean-differencing
- Assumptions
  - (FE.i) (RE.i) Strict exogeneity
  - (FE.ii) Full rank: $\text{rank}\, E[\ddot{x}_i'\ddot{x}_i] = K$, where $\ddot{x}_i = x_i - \bar{x}_{i\cdot}$
  - (FE.iii) (RE.iv) Conditional homoskedasticity and zero covariances
Estimation

- Structural model
$$y_{it} = c_i + x_{it}\beta + \varepsilon_{it}$$
  where $\beta_0$ is subsumed into $c_i$
- Averaging over $T$ time periods implies
$$\bar{y}_i = c_i + \bar{x}_i\beta + \bar{\varepsilon}_i$$
  and
$$\bar{\bar{y}} = \bar{c} + \bar{\bar{x}}\beta + \bar{\bar{\varepsilon}}$$
  where bars indicate the average over the $T$ obs within a group; double bars indicate the average over the entire sample
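- Taking differences yields
$$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$$
$$\ddot{y}_{it} = \ddot{x}_{it}\beta + \ddot{\varepsilon}_{it}$$
  - Estimable by POLS without a constant
  - Equivalent to our prior notation with $\lambda = 1$
- Alternative estimating equation
$$\ddot{y}_{it} + \bar{\bar{y}} = \bar{c} + (\ddot{x}_{it} + \bar{\bar{x}})\beta + (\ddot{\varepsilon}_{it} + \bar{\bar{\varepsilon}})$$
$$\tilde{y}_{it} = \bar{c} + \tilde{x}_{it}\beta + \tilde{\varepsilon}_{it}$$
  - Estimable by POLS with a constant
  - Constant is the mean unobserved effect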
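The within transformation can be sketched in a few lines of NumPy. The function name `within_estimator` and the simulated data are assumptions; the example deliberately makes $c_i$ correlated with $x_{it}$ and omits the idiosyncratic error so that the recovered coefficients can be compared with the true ones exactly.

```python
import numpy as np

def within_estimator(y, X):
    """FE (within) estimator for a balanced panel.

    y: (N, T) outcomes; X: (N, T, K) time-varying regressors (no constant).
    Returns beta_hat (K,) and the implied unit effects c_hat (N,).
    """
    yd = (y - y.mean(axis=1, keepdims=True)).reshape(-1)
    Xd = (X - X.mean(axis=1, keepdims=True)).reshape(-1, X.shape[2])
    beta = np.linalg.lstsq(Xd, yd, rcond=None)[0]   # POLS without a constant
    c_hat = y.mean(axis=1) - X.mean(axis=1) @ beta  # c_i = ybar_i - xbar_i * beta
    return beta, c_hat

rng = np.random.default_rng(2)
N, T = 40, 5
beta_true = np.array([1.0, -0.5])
X = rng.normal(size=(N, T, 2))
c_true = 3.0 * X.mean(axis=1).sum(axis=1) + rng.normal(size=N)  # correlated with x
y = c_true[:, None] + X @ beta_true                             # no idiosyncratic error
beta_fe, c_fe = within_estimator(y, X)
```

Because demeaning removes anything constant within $i$, the correlation between `c_true` and `X` does not affect `beta_fe`.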
- Notes
  - Although the estimating equation differs from the structural equation, interpret $\hat{\beta}_{FE}$ as the original $\beta$
  - Transformation eliminates $c_i$ and any time invariant $x$'s; time invariant $x$'s invalidate (FE.ii)
  - Estimates of unobserved effects are recoverable from FE estimation
$$\hat{c}_i = \bar{y}_i - \bar{x}_i\hat{\beta}_{FE}$$
    - The distribution of $\hat{c}$ can be examined to assess the amount of heterogeneity in the population
    - $\hat{c}_i$ is the mean over $T$ of the composite residual $y_{it} - x_{it}\hat{\beta}_{FE}$ for each $i$
    - $\hat{c}_i$ is unbiased, but consistency requires $T \to \infty$
    - Stata: -predict, u-
  - Between estimator is obtained from OLS estimation of
$$\bar{y}_i = \bar{x}_i\beta + \bar{\varepsilon}_i$$
    which is only consistent under (RE.i) and (RE.ii)
Inference

$$\widehat{\text{Var}}\left(\hat{\beta}_{FE}\right) = \hat{\sigma}^2_{\ddot{\varepsilon}}\left(\sum_i\sum_t \ddot{x}_{it}'\ddot{x}_{it}\right)^{-1}, \qquad \hat{\sigma}^2_{\ddot{\varepsilon}} = \frac{1}{N(T-1) - K}\sum_i\sum_t \hat{\ddot{\varepsilon}}_{it}^2$$

- Differs from the usual POLS std errors, since those would use $NT - K$ dof rather than $N(T-1) - K$
- Difference arises since std errors must account for the fact that $N$ dof are lost due to estimation of the means
- Solution: Multiply std errors by the correction factor
$$\sqrt{\frac{NT - K}{N(T-1) - K}}$$
- Stata: -xtreg, fe-, -areg-
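- Extension to the two-way error component model is most easily handled by adding $T - 1$ time dummies
- If $T$ is too large, an alternative is to transform the model to eliminate the time effects, $\lambda_t$
  - Model given by
$$\dddot{y}_{it} = \dddot{x}_{it}\beta + \dddot{\varepsilon}_{it}$$
    where
$$\dddot{z}_{it} = z_{it} - \bar{z}_i - \bar{z}_t + \bar{\bar{z}}$$
    for $z = y, x, \varepsilon$; estimable by POLS with a dof correction
  - Additional parameters estimated as
$$\hat{c} = \bar{\bar{y}} - \bar{\bar{x}}\hat{\beta}$$
$$\hat{c}_i = (\bar{y}_i - \bar{\bar{y}}) - (\bar{x}_i - \bar{\bar{x}})\hat{\beta}$$
$$\hat{\lambda}_t = (\bar{y}_t - \bar{\bar{y}}) - (\bar{x}_t - \bar{\bar{x}})\hat{\beta}$$
  - Stata: -felsdvreg-, -reghdfe-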
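Specification tests

- Serial correlation in $\varepsilon_{it}$
  - Not obvious how to test, since FE yields estimates of $\ddot{\varepsilon}_{it}$, not $\varepsilon_{it}$
  - Moreover, $\ddot{\varepsilon}_{it}$ are negatively serially correlated under $H_0$: $\varepsilon_{it}$ is serially uncorrelated
$$\text{Corr}(\ddot{\varepsilon}_{it}, \ddot{\varepsilon}_{is}) = -\frac{1}{T-1} \quad \forall s \neq t$$
  - Test implemented by estimating via OLS
$$\hat{\ddot{\varepsilon}}_{iT} = \alpha_0 + \alpha_1\hat{\ddot{\varepsilon}}_{iT-1} + \eta_i, \quad i = 1, \dots, N$$
    and testing $H_0: \alpha_1 = -1/(T-1)$ using a standard t-test
  - Presence of serial correlation invalidates the usual FE standard errors; cluster-robust std errors are given by
$$\widehat{\text{Var}}\left(\hat{\beta}_{FE}\right) = \left(\ddot{x}'\ddot{x}\right)^{-1}\left(\sum_i \ddot{x}_i'\hat{\ddot{\varepsilon}}_i\hat{\ddot{\varepsilon}}_i'\ddot{x}_i\right)\left(\ddot{x}'\ddot{x}\right)^{-1}$$
    - See Cameron & Miller (JHR, 2015)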
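The $-1/(T-1)$ correlation follows from the demeaning matrix alone. As a short check, assuming $\varepsilon_i$ has covariance $\sigma^2 I_T$, the demeaned errors $\ddot{\varepsilon}_i = M\varepsilon_i$ with $M = I_T - j_T j_T'/T$ have covariance $\sigma^2 M$; the function name below is hypothetical.

```python
import numpy as np

def demeaned_error_corr(T):
    """Correlation matrix of time-demeaned errors when the level errors are iid."""
    M = np.eye(T) - np.ones((T, T)) / T   # within-transformation matrix
    cov = M @ M.T                          # equals M (idempotent), up to sigma^2
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)

corr_T5 = demeaned_error_corr(5)           # off-diagonal entries are -1/(5-1)
```

The diagonal of $M$ is $1 - 1/T$ and the off-diagonal is $-1/T$, so the ratio is $-1/(T-1)$, which is the null value used for $\alpha_1$ in the test.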
- (cont.)
  - As an alternative to cluster-robust standard errors, the FE-FGLS estimator may be used
    - AR(1) model setup
$$y_{it} = c_i + \beta x_{it} + \varepsilon_{it}$$
$$\varepsilon_{it} = \rho\varepsilon_{it-1} + u_{it}$$
      where $|\rho| < 1$ and $u_{it} \sim WN(0, \sigma^2_u)$
    - Estimation proceeds by estimating $\rho$ and transforming the model to yield iid errors; the FE or RE estimator is then applied to the transformed model
    - See Baltagi & Li (1991), Baltagi & Wu (1999), Wooldridge (2002, pp. 276-8)
    - Stata: -xtregar-
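- $\varepsilon_{it}$ are cross-sectionally correlated (aka cross-sectional dependence)
$$H_0: \text{Cov}(\varepsilon_{it}, \varepsilon_{jt}) = 0 \ \forall i \neq j \qquad H_a: \text{Cov}(\varepsilon_{it}, \varepsilon_{jt}) \neq 0 \text{ for some } i \neq j$$
  - $T > N$: Breusch-Pagan (1980) test
$$\lambda_{LM} = T\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\hat{\rho}_{ij}^2 \sim \chi^2_d$$
    where $d = N(N-1)/2$ and
$$\hat{\rho}_{ij} = \hat{\rho}_{ji} = \frac{\sum_{t=1}^{T}\hat{\varepsilon}_{it}\hat{\varepsilon}_{jt}}{\sqrt{\sum_{t=1}^{T}\hat{\varepsilon}_{it}^2}\sqrt{\sum_{t=1}^{T}\hat{\varepsilon}_{jt}^2}}$$
    - Intuition: compute the $N \times N$ correlation matrix; test significance of the off-diagonal terms
    - Test does not have good statistical properties when $T < N$, and is likely to do worse as $N \to \infty$
    - See -xttest2- in Stata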
- (cont.)
  - A scaled version is appropriate asymptotically if $T \to \infty$ and then $N \to \infty$
$$\lambda_{SCLM} = \sqrt{\frac{1}{N(N-1)}}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}(T\hat{\rho}_{ij}^2 - 1) \sim N(0, 1)$$
    with an improved version developed in Pesaran et al. (2008)
  - $T < N$: Pesaran (2004) test
$$\lambda_{CD} = \sqrt{\frac{2T}{N(N-1)}}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\hat{\rho}_{ij} \sim N(0, 1)$$
    as $N \to \infty$ for sufficiently large $T$
    - Stata: -xtcsd, pes-
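Both cross-sectional dependence statistics are simple functions of the pairwise residual correlations. The sketch below follows the formulas above (uncentered correlations, balanced panel); the function names and the toy residuals are assumptions.

```python
import numpy as np

def pairwise_corr(resid):
    """rho_ij = sum_t e_it e_jt / (sqrt(sum_t e_it^2) * sqrt(sum_t e_jt^2)); resid is (N, T)."""
    resid = np.asarray(resid, dtype=float)
    norms = np.sqrt((resid ** 2).sum(axis=1))
    return (resid @ resid.T) / np.outer(norms, norms)

def bp_lm_csd(resid):
    """Breusch-Pagan (1980) LM statistic: T * sum_{i<j} rho_ij^2."""
    resid = np.asarray(resid, dtype=float)
    N, T = resid.shape
    rho = pairwise_corr(resid)
    iu = np.triu_indices(N, k=1)          # pairs with i < j
    return T * (rho[iu] ** 2).sum()

def pesaran_cd(resid):
    """Pesaran (2004) CD statistic: sqrt(2T / (N(N-1))) * sum_{i<j} rho_ij."""
    resid = np.asarray(resid, dtype=float)
    N, T = resid.shape
    rho = pairwise_corr(resid)
    iu = np.triu_indices(N, k=1)
    return np.sqrt(2.0 * T / (N * (N - 1))) * rho[iu].sum()

orthogonal = np.array([[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]])
identical = np.array([[1.0, -1.0, 1.0, -1.0], [1.0, -1.0, 1.0, -1.0]])
cd_orth, cd_same = pesaran_cd(orthogonal), pesaran_cd(identical)
```

Because the CD statistic sums $\hat{\rho}_{ij}$ rather than $\hat{\rho}_{ij}^2$, positive and negative correlations can cancel, which is one reason the two tests can disagree.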
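- Groupwise heteroskedasticity
  - Errors are homoskedastic within groups, heteroskedastic across groups
  - For example, errors for a given individual have the same variance in each period, but each individual has a unique variance
  - Structure:
$$\underbrace{\Sigma_i}_{T \times T} = \begin{bmatrix} \sigma^2_i & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma^2_i \end{bmatrix}; \qquad \underbrace{\Sigma}_{NT \times NT} = \begin{bmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_N \end{bmatrix}$$
  - Hypothesis
$$H_0: \sigma^2_i = \sigma^2 \ \forall i \qquad H_a: \sigma^2_i \neq \sigma^2 \text{ for some } i$$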
- (cont.)
  - Modified Wald test statistic
$$W' = \sum_{i=1}^{N}\frac{\left(\hat{\sigma}^2_i - \hat{\sigma}^2\right)^2}{V_i} \sim \chi^2_N$$
    where
$$V_i = \frac{1}{T-1}\sum_{t=1}^{T}\left(\hat{\varepsilon}_{it}^2 - \hat{\sigma}^2_i\right)^2, \qquad \hat{\sigma}^2_i = \frac{1}{T}\sum_{t=1}^{T}\hat{\varepsilon}_{it}^2$$
  - Notes
    - Valid in the presence of non-normality
    - Lower power in 'large $N$, small $T$' FE models
  - Solution: FGLS or robust standard errors
  - Stata: -xttest3-
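- Hausman test: FE vs. RE ($H_0: \beta_{FE} = \beta_{RE}$)
  - Intuition
    - If $\text{Cov}(c_i, x_{it}) = 0$, then RE and FE are both consistent, but RE is more efficient $\Rightarrow \hat{\beta}_{RE} \approx \hat{\beta}_{FE}$
    - If $\text{Cov}(c_i, x_{it}) \neq 0$, then RE is inconsistent, but FE is consistent $\Rightarrow \hat{\beta}_{RE} \neq \hat{\beta}_{FE}$
  - Test statistic based on the difference $\hat{\beta}_{FE} - \hat{\beta}_{RE}$
$$H = T\left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)'\left(\hat{\Sigma}_{FE} - \hat{\Sigma}_{RE}\right)^{-1}\left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right) \sim \chi^2_K$$
  - If the test statistic is too large, then reject $\text{Cov}(c_i, x_{it}) = 0$
  - Issues:
    1. Does not perform well in small samples
    2. $\hat{\Sigma}_{FE} - \hat{\Sigma}_{RE}$ need not be positive definite
  - Stata: -hausman-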
- Alternative test: FE vs. RE
  - Mundlak (1978) approach offers an alternative to FE; referred to as the correlated random effects (CRE) approach, as dependence between $c$ and $x$ is directly modeled
  - Structural model
$$y_{it} = c_i + x_{it}\beta + \varepsilon_{it}$$
$$E[c_i \mid x_{it}] = \bar{x}_i\gamma$$
    where we can write $c_i = \bar{x}_i\gamma + a_i$ in error form
    - Substitution implies
$$y_{it} = x_{it}\beta + \bar{x}_i\gamma + c_i - E[c_i \mid x_{it}] + \varepsilon_{it} = x_{it}\beta + \bar{x}_i\gamma + a_i + \varepsilon_{it}$$
      which is estimable by RE since $E[a_i + \varepsilon_{it} \mid x_i] = 0$
    - Time invariant regressors do not drop out, but the coefficient is $\beta + \gamma$
    - An equivalent Hausman test of FE vs. RE is given by $H_0: \gamma = 0$
  - Approach is very useful in panel data LDV models (discussed later)
  - Alternative to this alternative: replace $\bar{x}_i$ with $\{x_{is}\}_{s=1}^{T}$ (i.e., $\bar{x}_i\gamma$ becomes $\sum_s x_{is}\gamma_s$) [Chamberlain approach]
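A useful numerical property of the Mundlak device is that, once the unit means $\bar{x}_i$ are included, the coefficient on $x_{it}$ coincides with the within estimate. The sketch below illustrates this with simulated data; as a simplifying assumption it estimates the augmented equation by POLS rather than RE, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 100, 4
x = rng.normal(size=(N, T))
c = 2.0 * x.mean(axis=1) + rng.normal(size=N)        # c_i = xbar_i * gamma + a_i
y = c[:, None] + 1.0 * x + rng.normal(size=(N, T))

# Mundlak regression: y_it on a constant, x_it, and the unit mean xbar_i
xbar = np.repeat(x.mean(axis=1), T)                  # aligned with x.reshape(-1)
Z = np.column_stack([np.ones(N * T), x.reshape(-1), xbar])
coef = np.linalg.lstsq(Z, y.reshape(-1), rcond=None)[0]
beta_cre, gamma_cre = coef[1], coef[2]

# Within (FE) estimate for comparison
xd = (x - x.mean(axis=1, keepdims=True)).reshape(-1)
yd = (y - y.mean(axis=1, keepdims=True)).reshape(-1)
beta_fe = (xd @ yd) / (xd @ xd)
```

A test of $H_0: \gamma = 0$ on `gamma_cre` (with cluster-robust standard errors, not computed here) then plays the role of the Hausman test.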
Panel Data: LSDV

- Rather than eliminate $c_i$, treats $c_i$ as a parameter to be estimated
- Not included as part of the error term
- Assumptions are identical to (FE.i) through (FE.iii)
- Estimation
$$y_{it} = \sum_{j=1}^{N} c_j D_{ji} + x_{it}\beta + \varepsilon_{it}$$
  where
$$D_{ji} = \begin{cases} 1 & \text{if } j = i \\ 0 & \text{otherwise} \end{cases}$$
  - Amounts to including $N$ dummy vars (and no constant), 1 for each unit of observation
  - Estimated by POLS
- $\hat{\beta}_{LSDV}$ is consistent even if $\text{Cov}(c_i, x_{it}) \neq 0$ (the $c_i$ are now coefficients on included regressors, and regressors can always be correlated with one another)
  - Time invariant regressors still drop out since they are perfectly collinear with the observation dummies
- $\hat{c}_{i,LSDV}$ is unbiased (and $= \hat{c}_{i,FE}$), but consistency requires $T \to \infty$
- Only feasible computationally if $N$ is of reasonable size
- Stata: -areg- (or use Stata's factor vars syntax)
Panel Data: First-Differencing

- An alternative transformation to FE that eliminates $c_i$
- Assumptions
  - (FD.i) (FE.i)
  - (FD.ii) Full rank: $\text{rank}\sum_{t=2}^{T} E[\Delta x_{it}'\Delta x_{it}] = K$, where $\Delta$ represents the change from the preceding period
  - (FD.iii) Conditional homoskedasticity and zero covariances:
$$E[\Delta\varepsilon_i\Delta\varepsilon_i' \mid x_{i1}, \dots, x_{iT}, c_i] = \sigma^2_{\Delta\varepsilon} I_{T-1}$$
- Final assumption holds if $\varepsilon_{it}$ follows a random walk
$$\varepsilon_{it} = \varepsilon_{it-1} + \eta_{it}$$
  where $E[\eta_i\eta_i' \mid x_{i1}, \dots, x_{iT}, c_i] = \sigma^2_\eta I_{T-1}$, i.e., $\eta_{it}$ is white noise
Estimation

- Structural model
$$y_{it} = c_i + x_{it}\beta + \varepsilon_{it}$$
- Implies
$$\begin{aligned} y_{i1} &= c_i + x_{i1}\beta + \varepsilon_{i1} \\ &\ \ \vdots \\ y_{iT} &= c_i + x_{iT}\beta + \varepsilon_{iT} \end{aligned}$$
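- Taking differences between consecutive periods yields
$$\begin{aligned} y_{i2} - y_{i1} &= (x_{i2} - x_{i1})\beta + (\varepsilon_{i2} - \varepsilon_{i1}) \\ &\ \ \vdots \\ y_{iT} - y_{iT-1} &= (x_{iT} - x_{iT-1})\beta + (\varepsilon_{iT} - \varepsilon_{iT-1}) \end{aligned}$$
  or,
$$\begin{aligned} \Delta y_{i2} &= \Delta x_{i2}\beta + \Delta\varepsilon_{i2} \\ &\ \ \vdots \\ \Delta y_{iT} &= \Delta x_{iT}\beta + \Delta\varepsilon_{iT} \end{aligned}$$
- Estimating equation
$$\Delta y_{it} = \Delta x_{it}\beta + \Delta\varepsilon_{it}, \quad i = 1, \dots, N; \ t = 2, \dots, T$$
  which is estimable by POLS
- Inference does not require a dof adjustment since estimation is based on only $N(T-1)$ observations
- Cluster-robust std errors are obtained in a similar fashion as in FE
- Stata: -reg- and the time series operators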
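The FD estimator is just POLS on period-to-period changes. The sketch below uses hypothetical function names and simulated data with $T = 2$, a case in which the FD and within (FE) estimates coincide exactly, so the two can be compared numerically.

```python
import numpy as np

def first_difference(y, X):
    """FD estimator for a balanced panel: POLS of dy_it on dX_it, t = 2..T (no constant).

    y: (N, T) outcomes; X: (N, T, K) regressors.
    """
    dy = np.diff(y, axis=1).reshape(-1)
    dX = np.diff(X, axis=1).reshape(-1, X.shape[2])
    return np.linalg.lstsq(dX, dy, rcond=None)[0]

def within(y, X):
    """Within (FE) estimator, used here only for comparison."""
    yd = (y - y.mean(axis=1, keepdims=True)).reshape(-1)
    Xd = (X - X.mean(axis=1, keepdims=True)).reshape(-1, X.shape[2])
    return np.linalg.lstsq(Xd, yd, rcond=None)[0]

rng = np.random.default_rng(4)
N, T = 30, 2
X = rng.normal(size=(N, T, 2))
c = X.mean(axis=1).sum(axis=1) + rng.normal(size=N)   # unit effect correlated with x
y = c[:, None] + X @ np.array([0.7, -1.2]) + rng.normal(size=(N, T))
beta_fd = first_difference(y, X)
beta_fe = within(y, X)
```

With $T = 2$ the demeaned data are $\pm\Delta z_{i2}/2$, so both regressions use the same variation; with $T > 2$ the two estimates generally differ.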
- Notes
  - Interpretation of $\hat{\beta}_{FD}$ is the same as the original $\beta$
  - Differencing eliminates $c_i$ and any time invariant $x$'s
  - Estimates of unobserved effects are recoverable from FD estimation
$$\hat{c}_{i,FD} = \bar{y}_i - \bar{x}_i\hat{\beta}_{FD}$$
    but consistency requires $T \to \infty$
- Testing for serial correlation
  - Estimate FD model $\Rightarrow \widehat{\Delta\varepsilon}_{it}$
  - Estimate via OLS
$$\widehat{\Delta\varepsilon}_{it} = \rho\,\widehat{\Delta\varepsilon}_{it-1} + \upsilon_{it}, \quad t = 3, \dots, T; \ i = 1, \dots, N$$
  - If $\varepsilon_{it}$ is serially uncorrelated, then $\Delta\varepsilon_{it}$ will be serially correlated with $\text{Corr}(\Delta\varepsilon_{it}, \Delta\varepsilon_{it-1}) = -0.5$
Panel Data: Comparison of Estimators

- POLS vs RE vs FE/FD/LSDV
  - POLS/RE requires $E[x_{it}'c_i] = 0$ $\forall t$; FE/FD/LSDV do not
  - RE/FE/FD/LSDV requires $E[x_{is}'\varepsilon_{it}] = 0$ $\forall s$; POLS does not
  - $\Rightarrow$ Little apparent advantage to RE
- $T = 2 \Rightarrow$ LSDV, FE, and FD are identical
- $T > 2 \Rightarrow$ FD differs, but all are unbiased
  - Under (FE.i) through (FE.iii), FE is efficient
  - Under (FD.i) through (FD.iii), FD is efficient
    - Note: FE.iii assumes errors are iid in levels; FD.iii assumes errors are iid in differences (and hence may be serially correlated in levels). These are very different assumptions
- Long differences (LD) are used in some instances; less efficient, but may be less susceptible to mis-specification bias
- Mis-specification may be indicated if FE, FD, and LD yield large discrepancies (measurement error, timing issues)
Panel Data: Strict Exogeneity

- Assumption of strict exogeneity is crucial in FE, FD models
- Specification test
  - FE model
    - Estimate via FE
$$y_{it} = c_i + x_{it}\beta + z_{it+1}\delta + \varepsilon_{it}$$
      where $z \subseteq x$
    - $H_0$: strict exogeneity $\Leftrightarrow H_0: \delta = 0$
  - FD model
    - Estimate via POLS
$$\Delta y_{it} = \Delta x_{it}\beta + z_{it}\delta + \Delta\varepsilon_{it}$$
      where $z \subseteq x$
    - $H_0$: strict exogeneity $\Leftrightarrow H_0: \delta = 0$
Panel Data: Goodness of Fit

- Three measures of goodness of fit are commonly reported
  - Within $R^2$: $\text{Corr}[(y_{it} - \bar{y}_i), (x_{it}\hat{\beta} - \bar{x}_i\hat{\beta})]^2$
  - Between $R^2$: $\text{Corr}[\bar{y}_i, \bar{x}_i\hat{\beta}]^2$
  - Overall $R^2$: $\text{Corr}[y_{it}, x_{it}\hat{\beta}]^2$
- Each measure is based on the squared correlation between the actual and fitted values
Panel Data: Long Panels

- Long panels entail $T > N$, and asymptotics rely on $T \to \infty$ for fixed $N$
- Estimation can easily be handled using the LSDV approach
- Thus, focus is on specification of the form of $\Omega \Rightarrow$ POLS or PFGLS
  - Arbitrary serial correlation in errors is not possible $\Rightarrow$ need to specify the form of $\Omega$
  - In addition, one can relax the assumption of independence across $i$
- Stata: -xtpcse-, -xtgls-, -xtscc-
Panel Data: Measurement Error

- Panel data introduces several complications as it relates to measurement error (ME) in covariates
  1. In POLS, ME and omitted variable bias (OVB) may partially offset
  2. In FE and FD,
     - bias from ME may be large if covariates are highly persistent
     - bias from ME affects estimators differently
  3. ME may be nonclassical in that it is also persistent
- Do not ignore ME in panel models
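- Model setup
$$y_{it} = c_i + \beta x^*_{it} + \varepsilon_{it}$$
$$x_{it} = x^*_{it} + \upsilon_{it}$$
- Assuming classical ME and $x^*$ to be stationary, one can show
  - For POLS
$$\text{plim}\,\hat{\beta} = \beta\underbrace{\left[1 - \frac{\sigma^2_\upsilon}{\sigma^2_x}\right]}_{\text{Attenuation Bias}} + \underbrace{\frac{\sigma_{xc}}{\sigma^2_x}}_{\text{OV Bias}}$$
  - If $\text{sgn}(\beta) = \text{sgn}(\sigma_{xc})$, then attenuation and omitted variable bias at least partially offset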
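- Assuming classical ME and $x^*$ to be stationary, one can show (McKinnish 2008)
  - For differenced estimators
$$\text{plim}\,\hat{\beta} = \beta\left[\frac{\sigma^2_{x^*}(1 - \rho_j)}{\sigma^2_{x^*}(1 - \rho_j) + \sigma^2_\upsilon}\right]$$
    where $\rho_j = \text{Corr}(x^*_{it}, x^*_{it-j})$
  - For FE estimator
$$\text{plim}\,\hat{\beta} = \beta\,\frac{\sigma^2_{x^*} - \frac{1}{T^2}\left[T + 2\sum_{j=1}^{T-1}(T - j - 1)\rho_j\right]}{\sigma^2_{x^*} - \frac{1}{T^2}\left[T + 2\sum_{j=1}^{T-1}(T - j - 1)\rho_j\right] + \left(\frac{T-1}{T}\right)\sigma^2_\upsilon}$$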
- Griliches & Hausman (1986) verify that if $x^*_{it}$ has a declining correlogram, then $\rho_1 > \rho_2 > \dots$ and
  - FD has the largest bias
  - Longer differences reduce the attenuation bias
  - The $T - 1$ long difference has the smallest bias
  - Relative bias of long differences shorter than $T - 1$ and the FE estimator is ambiguous; it depends on specifics of the correlation structure
  - Relative bias of POLS and FE/FD/LD is unknown
  - Longer differences result in a loss of dof and an efficiency loss
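- Serial correlation in ME
$$y_{it} = c_i + \beta x^*_{it} + \varepsilon_{it}$$
$$x_{it} = x^*_{it} + \upsilon_{it}$$
$$x^*_{it} = \rho x^*_{it-1} + \eta_{it}$$
$$\upsilon_{it} = \delta\upsilon_{it-1} + \xi_{it}$$
  where $\rho$ is the degree of persistence in $x^*$ and $\delta$ is the degree of persistence in $\upsilon$
  - For differenced estimators
$$\text{plim}\,\hat{\beta} = \beta\left[\frac{\sigma^2_{x^*}(1 - \rho_j)}{\sigma^2_{x^*}(1 - \rho_j) + \sigma^2_\upsilon(1 - \delta_j)}\right]$$
    where $\rho_j = \text{Corr}(x^*_{it}, x^*_{it-j})$ and $\delta_j = \text{Corr}(\upsilon_{it}, \upsilon_{it-j})$
  - If $\delta < \rho$, then the pattern for bias remains the same: longer differences $\Rightarrow$ smaller bias
  - If $\delta > \rho$, then the pattern for bias reverses: FD $\Rightarrow$ smallest bias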
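The attenuation formula for differenced estimators is easy to tabulate. The sketch below assumes the AR(1) setup above, so that $\rho_j = \rho^j$ and $\delta_j = \delta^j$; the function name and the parameter values are hypothetical. Setting `delta = 0` gives the classical (serially uncorrelated) ME case.

```python
def diff_attenuation(j, rho, delta, s2x, s2v):
    """plim(beta_hat) / beta for a j-period difference estimator.

    Assumes AR(1) correlograms: Corr(x*_t, x*_{t-j}) = rho**j and
    Corr(v_t, v_{t-j}) = delta**j. s2x, s2v are the variances of x* and v.
    """
    signal = s2x * (1.0 - rho ** j)
    noise = s2v * (1.0 - delta ** j)
    return signal / (signal + noise)

# Classical ME (delta = 0): first differences are attenuated the most
classical = [diff_attenuation(j, rho=0.8, delta=0.0, s2x=1.0, s2v=1.0) for j in (1, 2, 3)]

# ME more persistent than x* (delta > rho): the ordering reverses
persistent_me = [diff_attenuation(j, rho=0.2, delta=0.8, s2x=1.0, s2v=1.0) for j in (1, 2, 3)]
```

A factor close to 1 means little attenuation; in the classical case with $\rho = 0.8$ and equal variances, the first-difference factor is only $1/6$.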
- Consistent estimation in the presence of ME requires IV
  - Assuming
    - No serial correlation in the ME,
    - Strict exogeneity of $x_{it}$ and $x^*_{it}$ conditional on $c_i$ with respect to $\varepsilon_{it}$, and
    - Strict exogeneity of $x^*_{it}$ conditional on $c_i$ with respect to $\upsilon_{it}$,
  - we can substitute for $x^*_{it}$ and first-difference to eliminate $c_i$
$$\Delta y_{it} = \Delta x_{it}\beta + \Delta\varepsilon_{it} - \beta\Delta\upsilon_{it}$$
  - Possible instruments for $\Delta x_{it}$ include $x_{it-2}$ or $x_{it+1}$ or external instruments
Panel Data: Contemporaneous Correlation Between Covariates and the Error

- Measurement error or omitted variables can lead to contemporaneous correlation between $x$ and $\varepsilon$
- If $c$ is also correlated with $x$, begin by first-differencing to eliminate $c$
$$\Delta y_{it} = \Delta x_{it}\beta + \Delta\varepsilon_{it}$$
- POLS applied to this equation is biased if $\text{Corr}(\Delta x, \Delta\varepsilon) \neq 0$
- IV methods remain consistent; in addition to finding typical instruments, $z_{it}$, from outside the model, other possibilities include
  - $x_{it-2}$, $x_{it+1}$
  - $y_{it-2}$
- While popular, such instruments may be weak, invalid in the presence of serial correlation, or hard to justify (Reed 2015)
- Stata: -xtivreg2-
- Panel data can lead to other 'tricks' that may generate instruments
- Example: Pitt and Rosenzweig (1990) siblings FE model
  - Sample restricted to households with 1 boy, 1 girl
  - Interested in the effect of an endogenous household-level variable, $D_h$
  - No outside instruments available
  - Solution
$$y_{bh} = x_{1h}\beta_b + x_{2h}\gamma + \tau_b D_h + \varepsilon_{bh}$$
$$y_{gh} = x_{1h}\beta_g + x_{2h}\gamma + \tau_g D_h + \varepsilon_{gh}$$
$$\Rightarrow \Delta y_h = x_{1h}\Delta\beta + \Delta\tau D_h + \Delta\varepsilon_h$$
    and now $x_{2h}$ are available as instruments for $D_h$, but the model only identifies $\Delta\tau$ (the differential impact of $D$ on boys relative to girls)
Panel Data: Time Invariant Regressors

- FE, FD estimators preclude estimation of coefficients on time invariant regressors
- One solution is to use a two-step estimator
  - Model setup
$$y_{it} = c_i + x_{it}\beta + z_i\alpha + \varepsilon_{it}$$
    where $z_i$ is a vector of time invariant regressors
  - Estimation
    - Step #1: Estimate via FE or FD
$$y_{it} = c_i + x_{it}\beta + \varepsilon_{it}$$
      yielding a consistent estimate of $\beta$ and an unbiased estimate of $c_i$
    - Step #2: Estimate via OLS (or WLS)
$$\hat{c}_i = \alpha_0 + z_i\alpha + \upsilon_i$$
      (Hornstein & Greene 2012)
  - $\hat{\alpha}$ is consistent if $E[\upsilon \mid z] = E[\upsilon]$; strong assumption on $z$
- Alternative solution: Hausman & Taylor (1981)
- Re-write the above model as
$$y_{it} = c_i + x_{1it}\beta_1 + x_{2it}\beta_2 + z_{1i}\alpha_1 + z_{2i}\alpha_2 + \varepsilon_{it}$$
  such that
  - $x_{1it}$ = vector of $K_1$ vars uncorrelated with $c_i$
  - $x_{2it}$ = vector of $K_2$ vars correlated with $c_i$
  - $z_{1i}$ = vector of $L_1$ vars uncorrelated with $c_i$
  - $z_{2i}$ = vector of $L_2$ vars correlated with $c_i$
- Assumptions about the unobservables
$$E[c_i \mid x_{1it}, z_{1i}] = 0; \quad E[c_i \mid x_{2it}, z_{2i}] \neq 0$$
$$\text{Var}(c_i \mid x_{1it}, z_{1i}, x_{2it}, z_{2i}) = \sigma^2_c$$
$$\text{Cov}(\varepsilon_{it}, c_i \mid x_{1it}, z_{1i}, x_{2it}, z_{2i}) = 0$$
$$\text{Var}(\varepsilon_{it} + c_i \mid x_{1it}, z_{1i}, x_{2it}, z_{2i}) = \sigma^2 = \sigma^2_\varepsilon + \sigma^2_c$$
$$\text{Corr}(\varepsilon_{it} + c_i, \varepsilon_{is} + c_i \mid x_{1it}, z_{1i}, x_{2it}, z_{2i}) = \rho = \sigma^2_c/\sigma^2$$
- Estimation
  - POLS or GLS is biased and inconsistent
  - FE or FD precludes estimation of $\alpha$
  - Solution is to use an IV estimator based only on the data at hand
    - Mean-differencing yields
$$\ddot{y}_{it} = \ddot{x}_{1it}\beta_1 + \ddot{x}_{2it}\beta_2 + \ddot{\varepsilon}_{it}$$
      which could be estimated
    - This shows that $\ddot{x}_1$ and $\ddot{x}_2$ are uncorrelated with $c$ and are thus available as instruments for the equation in levels
    - $z_1$ serves as instruments for itself
    - $x_1$ serves as instruments for $z_2$
    - $\ddot{x}$ serve as instruments for $x$
  - Identification requires $K_1 \geq L_2$
  - IV regression must account for the treatment of $c$ as a RE $\Rightarrow$ IV-FGLS solution
  - Weak IVs may still be a problem
- Stata: -xthtaylor-
Panel Data: Dynamic Panel Model (DPD)

- Structural model
$$y_{it} = c_i + x_{it}\beta + \gamma y_{it-1} + \varepsilon_{it}, \quad |\gamma| < 1$$
  where $i = 1, \dots, N$; $t = 2, \dots, T$; and $T > 3$
- Assumptions
  - (DPD.i) (FD.i) Strict exogeneity
  - (DPD.ii) (FD.ii) Full rank
  - (DPD.iii) (FD.iii) Conditional homoskedasticity and zero covariances
  - (DPD.iv) Initial condition: $E[y_{i1}\varepsilon_{it}] = 0$ $\forall t > 1$
- Interpretation
  - $\beta$ = short-run (one-period) effect of a unit change in $x$
  - $\beta/(1 - \gamma)$ = long-run effect of a unit change in $x$
Comments

- Even if $\text{Cov}(c_i, x_{it}) = 0$, POLS/RE is not applicable since $\text{Cov}(c_i, y_{it-1}) \neq 0$
- Even if $\text{Cov}(c_i, x_{it}) = 0$, FE/FD is not applicable since $E[y_{it-1}\varepsilon_{it-1}] \neq 0 \Rightarrow y_{it-1}$ is not strictly exogenous
  - Fixed Effects
    - Nickell (1981) derived the formula for the bias of $\hat{\gamma}_{FE}$
    - Bias $\to 0$ as $T \to \infty$ since $\text{Cov}(\ddot{y}_{it-1}, \ddot{\varepsilon}_{it}) \to 0$ as $T \to \infty$
  - FD
    - Model
$$\Delta y_{it} = \Delta x_{it}\beta + \gamma\Delta y_{it-1} + \Delta\varepsilon_{it}, \quad i = 1, \dots, N; \ t = 3, \dots, T$$
      which is not estimable by POLS since $\text{Cov}(\Delta y_{it-1}, \Delta\varepsilon_{it}) \neq 0$ because $\text{Cov}(y_{it-1}, \varepsilon_{it-1}) \neq 0$
    - Bias $\nrightarrow 0$ as $T \to \infty$
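- For both FE/FD, bias $\nrightarrow 0$ as $N \to \infty \Rightarrow$ FE/FD are inconsistent under large $N$ asymptotics
- If $\text{Cov}(x_{it}, c_i) = \text{Cov}(x_{it}, \varepsilon_{it}) = 0$ $\forall x$, then
$$\hat{\gamma}_{POLS} > \gamma > \hat{\gamma}_{FE} > \hat{\gamma}_{FD}$$
  since
$$\text{Cov}(y_{it-1}, c_i + \varepsilon_{it}) > 0 > \text{Cov}(\ddot{y}_{it-1}, \ddot{\varepsilon}_{it}) > \text{Cov}(\Delta y_{it-1}, \Delta\varepsilon_{it})$$
  - Thus, the estimators bound the true value
  - Not overly useful though, since we usually are interested in $\beta$ as well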
Solution #1 (Anderson & Hsiao 1981, 1982)

- FD, then estimate via instrumental variables (IV), treating $\Delta y_{it-1}$ as endogenous
- Instruments: $y_{it-2}$, $x_{it-2}$
  - $\text{Cov}(\Delta y_{it-1}, y_{it-2}) = \text{Cov}(y_{it-1}, y_{it-2}) - \text{Cov}(y_{it-2}, y_{it-2}) \neq 0$
  - $\text{Cov}(\Delta y_{it-1}, x_{it-2}) = \text{Cov}(y_{it-1}, x_{it-2}) - \text{Cov}(y_{it-2}, x_{it-2}) \neq 0$
  - $\text{Cov}(\Delta\varepsilon_{it}, y_{it-2}) = \text{Cov}(\varepsilon_{it}, y_{it-2}) - \text{Cov}(\varepsilon_{it-1}, y_{it-2}) = 0$ if $\varepsilon$ is not serially correlated
  - $\text{Cov}(\Delta\varepsilon_{it}, x_{it-2}) = \text{Cov}(\varepsilon_{it}, x_{it-2}) - \text{Cov}(\varepsilon_{it-1}, x_{it-2}) = 0$ if $x$ is predetermined (weaker than strictly exogenous)
- Model is overidentified
- Anderson & Hsiao (1981) also suggest $\Delta y_{it-3}$ as an instrument
- $\hat{\gamma}_{IV}$ is consistent as $N \to \infty$ for fixed $T$ as long as $T > 3$ under (DPD.i) through (DPD.iv)
- Note: As $\gamma \to 1$, $y_{it-2}$, $x_{it-2}$ become very weak instruments
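A small simulation can illustrate why POLS on the first-differenced dynamic model fails while the Anderson & Hsiao IV estimator does not. The sketch below is a simplified setup under stated assumptions: no covariates $x$, $\gamma = 0.5$, unit-variance errors, a burn-in so that $y$ is approximately stationary, and a single instrument $y_{it-2}$. All names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T, burn, gamma = 5000, 6, 50, 0.5

# Simulate y_it = c_i + gamma * y_it-1 + e_it, discarding a burn-in period
c = rng.normal(size=N)
y = np.zeros((N, burn + T))
y[:, 0] = c / (1.0 - gamma) + rng.normal(size=N)
for t in range(1, burn + T):
    y[:, t] = c + gamma * y[:, t - 1] + rng.normal(size=N)
y = y[:, burn:]                     # columns correspond to t = 1, ..., T

dy = np.diff(y, axis=1)             # columns correspond to dy_t, t = 2, ..., T
dep = dy[:, 1:].reshape(-1)         # dy_it,   t = 3, ..., T
lag = dy[:, :-1].reshape(-1)        # dy_it-1, t = 3, ..., T
inst = y[:, :-2].reshape(-1)        # y_it-2,  t = 3, ..., T

gamma_fd = (lag @ dep) / (lag @ lag)     # POLS on the FD equation (inconsistent)
gamma_ah = (inst @ dep) / (inst @ lag)   # just-identified IV using y_it-2
```

In this design the FD-POLS estimate should be far below the true value (its probability limit here is $(\gamma - 1)/2$), while the IV estimate should be close to $0.5$ up to sampling noise.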
Solution #2 (Arellano & Bond 1991)

- FD, then estimate via IV, treating $\Delta y_{it-1}$ as endogenous as above
- However, there are LOTS of potential instruments not considered in the A&H method (if $T > 3$)
- What are potential IVs?
  - Need vars that are correlated with $\Delta y_{it-1}$, uncorrelated with $\Delta\varepsilon_{it}$
  - Suitable candidates
    - $x_{it-2}$ (through $y_{it-2}$)
    - $y_{it-2}$ (through $y_{it-2}$)
    - $y_{it-3}, y_{it-4}, y_{it-5}, \dots$ (through the autoregressive process), e.g.,
$$\text{Cov}(\Delta y_{it-1}, y_{it-3}) = \text{Cov}(y_{it-1}, y_{it-3}) - \text{Cov}(y_{it-2}, y_{it-3}) \neq 0$$
  - Lots of instruments (beware of weak IVs)
  - Not a standard IV/TSLS setup since the # of instruments varies by observation (by $t$)
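- Estimation by GMM to utilize more instruments
- Efficient given (DPD.ii) through (DPD.iv)
- Writing out the model for each period yields
$$\begin{aligned} \Delta y_{i3} &= \Delta x_{i3}\beta + \gamma\Delta y_{i2} + \Delta\varepsilon_{i3} \\ \Delta y_{i4} &= \Delta x_{i4}\beta + \gamma\Delta y_{i3} + \Delta\varepsilon_{i4} \\ &\ \ \vdots \\ \Delta y_{iT} &= \Delta x_{iT}\beta + \gamma\Delta y_{iT-1} + \Delta\varepsilon_{iT} \end{aligned}$$
  where IVs for $\Delta y_{i2}$ are $x_{i1}, y_{i1}$; IVs for $\Delta y_{i3}$ are $x_{i2}, y_{i2}, y_{i1}$; ... ; IVs for $\Delta y_{iT-1}$ are $x_{iT-2}, y_{iT-2}, \dots, y_{i2}, y_{i1}$
- GMM allows moment conditions to be derived using as many IVs as desired
- Requires $\varepsilon_{it}$ to be serially uncorrelated; or, equivalently, $\Delta\varepsilon_{it}$ should exhibit first-order, but no higher-order, serial correlation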
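To make concrete how the number of instruments varies with $t$, the sketch below builds the block-diagonal instrument matrix for a single unit using only lagged levels of $y$. Leaving out the $x$ instruments is a simplifying assumption, and the function name is hypothetical.

```python
import numpy as np

def ab_instrument_matrix(y_i):
    """Arellano-Bond style instrument matrix for one unit, lagged y only.

    y_i: length-T array (y_i1, ..., y_iT). Row r corresponds to the differenced
    equation at t = r + 3, whose instruments are y_i1, ..., y_i,t-2.
    Returns an array of shape (T - 2, (T - 2)(T - 1) / 2).
    """
    y_i = np.asarray(y_i, dtype=float)
    T = len(y_i)
    blocks = [y_i[: t - 2] for t in range(3, T + 1)]
    n_cols = sum(len(b) for b in blocks)
    Z = np.zeros((T - 2, n_cols))
    col = 0
    for row, b in enumerate(blocks):
        Z[row, col: col + len(b)] = b     # each equation gets its own columns
        col += len(b)
    return Z

Z_example = ab_instrument_matrix([1.0, 2.0, 3.0, 4.0, 5.0])   # T = 5
```

With $T = 5$ the three differenced equations use 1, 2, and 3 instruments respectively, for 6 columns in total; the count grows quadratically in $T$, which is the source of the weak and many-instruments concern.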
ASIDE: GMM/MDE details

- MLE is efficient among CAN estimators in the context of the specified parametric model
- GMM estimators are robust to some variations in the underlying DGP; less efficient than MLE if the parametric assumptions hold
- Both are examples of M-estimators
- Method of Moments (MM) intuition
  - With random sampling, a sample statistic (moment) converges in probability to a constant
  - This constant is a function of unknown population parameters
  - Thus, to estimate $K$ parameters ($\theta_k$, $k = 1, \dots, K$) requires $K$ statistics with known plims
  - Equating the $K$ moments to the $K$ plims, these are inverted to solve for $\theta$ as functions of the moments
  - Moments are CAN $\Rightarrow$ estimator of $\theta$ will also be CAN
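- Example: $y_i \sim N(\mu, \sigma^2)$
  - Given a random sample, two moment conditions given by
$$m_1 = \text{plim}\,\frac{1}{N}\sum_i y_i = \mu$$
$$m_2 = \text{plim}\,\frac{1}{N}\sum_i y_i^2 = \sigma^2 + \mu^2$$
  - Inversion yields
$$\hat{\mu} = \bar{m}_1 = \bar{y}$$
$$\hat{\sigma}^2 = \bar{m}_2 - \bar{m}_1^2 = \left(\frac{1}{N}\sum_i y_i^2\right) - \left(\frac{1}{N}\sum_i y_i\right)^2 = \frac{1}{N}\sum_i (y_i - \bar{y})^2$$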
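The inversion in this example takes two lines of code. The sketch below uses a hypothetical function name and a four-observation toy sample so the arithmetic can be followed by hand: $\bar{m}_1 = 2.5$, $\bar{m}_2 = 7.5$, so $\hat{\sigma}^2 = 7.5 - 6.25 = 1.25$.

```python
import numpy as np

def mm_normal(y):
    """Method of moments estimates (mu_hat, sigma2_hat) for y_i ~ N(mu, sigma^2)."""
    y = np.asarray(y, dtype=float)
    m1 = np.mean(y)          # first sample moment
    m2 = np.mean(y ** 2)     # second (uncentered) sample moment
    return m1, m2 - m1 ** 2  # invert: mu = m1, sigma^2 = m2 - m1^2

mu_hat, sigma2_hat = mm_normal([1.0, 2.0, 3.0, 4.0])
```

Note that the MM variance estimate divides by $N$, not $N - 1$, so it matches `np.var` with its default settings.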
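- General case for MM estimator
  - $K$ moment conditions
$$\begin{aligned} m_1 &= \mu_1(\theta_1, \dots, \theta_K) \\ &\ \ \vdots \\ m_K &= \mu_K(\theta_1, \dots, \theta_K) \end{aligned}$$
  - Assuming the $K$ equations are functionally independent, inversion yields
$$\begin{aligned} \hat{\theta}_1 &= \theta_1(\bar{m}_1, \dots, \bar{m}_K) \\ &\ \ \vdots \\ \hat{\theta}_K &= \theta_K(\bar{m}_1, \dots, \bar{m}_K) \end{aligned}$$
  - Except for the case of random sampling from the exponential family of distributions, $\hat{\theta}_{MM}$ is not efficient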
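- Asymptotic properties
$$\text{Asy.Var}(\hat{\theta}) = \frac{1}{N}\left[G'(\theta)\Phi^{-1}G(\theta)\right]^{-1}$$
  where $G(\theta)$ is a $K \times K$ matrix with row $k$ given by
$$G_k(\theta) = \frac{\partial \bar{m}_k}{\partial\theta'}$$
  and $\Phi$ is the $K \times K$ estimated asymptotic covariance matrix of the moment conditions, given by
$$\frac{1}{N}\hat{\Phi}_{jk} = \frac{1}{N}\cdot\frac{1}{N}\sum_i\left[(m_j(y_i) - \bar{m}_j)(m_k(y_i) - \bar{m}_k)\right], \quad j, k = 1, \dots, K$$
  and the moment conditions are expressed as
$$\bar{m}_k = \frac{1}{N}\sum_i m_k(y_i)$$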
- Estimation based on orthogonality conditions
  - CLRM given by
$$y_i = x_i\beta + \varepsilon_i$$
    where $E[x_i\varepsilon_i] = 0$
    - Condition implies $E[x_i(y_i - x_i\beta)] = 0$
    - Sample analog given by
$$\frac{1}{N}\sum_i x_i(y_i - x_i\hat{\beta}) = 0$$
      which is a set of $K$ moment conditions as a function of $K$ parameters
    - These moment conditions are identical to the LS normal equations
    - Thus, OLS is a MM estimator
  - IV estimator solves
$$\frac{1}{N}\sum_i z_i(y_i - x_i\hat{\beta}) = 0$$
  - TSLS estimator solves
$$\frac{1}{N}\sum_i \hat{x}_i(y_i - x_i\hat{\beta}) = 0$$
- ML can also be recast as a MM estimator
  - The scaled log-likelihood function is
$$\frac{1}{N}\ln[L(\theta)] = \frac{1}{N}\sum_i \ln[f(y_i \mid x_i, \theta)]$$
  - MLE solves
$$\frac{1}{N}\frac{\partial\ln[L(\theta \mid y)]}{\partial\theta} = \frac{1}{N}\sum_i\frac{\partial\ln[f(y_i \mid x_i, \theta)]}{\partial\theta} = 0$$
- Many estimators can be viewed as MM estimators
Minimum Distance Estimation (MDE) extends the MM estimator to the case of over-identified models
- Setup
$$\begin{aligned} m_1 &= \mu_1(\theta_1, \dots, \theta_K) \\ &\;\;\vdots \\ m_L &= \mu_L(\theta_1, \dots, \theta_K) \end{aligned}$$
where $L \geq K$
- MDE estimator obtained as
$$\hat\theta_{MDE} = \arg\min_\theta\, [m - \mu(\theta)]'W[m - \mu(\theta)]$$
where $m$ is the vector of moment conditions, $\mu(\theta)$ is the vector of plims, and $W$ is a PD weighting matrix
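Example:
- Two consistent estimates of the same parameter, $\theta$
$$m_1 = \operatorname{plim}\hat\theta_1 = \theta$$
$$m_2 = \operatorname{plim}\hat\theta_2 = \theta$$
- Estimator is
$$\hat\theta_{MDE} = \arg\min_\theta \begin{bmatrix} \hat\theta_1 - \theta \\ \hat\theta_2 - \theta \end{bmatrix}' W \begin{bmatrix} \hat\theta_1 - \theta \\ \hat\theta_2 - \theta \end{bmatrix}$$
where $W$ might be the inverse of the covariance matrix of $\hat\theta_1, \hat\theta_2$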
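For this two-estimate example the minimization has a closed form, because the first-order condition is linear in $\theta$. The sketch below assumes NumPy; the function name `mde_combine`, the two estimates, and the covariance matrix `V` are hypothetical values chosen for illustration.

```python
import numpy as np

def mde_combine(theta_hats, W):
    """Minimize (theta_hats - theta*1)' W (theta_hats - theta*1) over a scalar theta."""
    theta_hats = np.asarray(theta_hats, dtype=float)
    W = np.asarray(W, dtype=float)
    ones = np.ones_like(theta_hats)
    # First-order condition: 1' W (theta_hats - theta * 1) = 0
    return (ones @ W @ theta_hats) / (ones @ W @ ones)

theta_hats = np.array([1.0, 2.0])        # two hypothetical estimates of theta
V = np.diag([1.0, 4.0])                  # hypothetical covariance matrix of the estimates
theta_mde = mde_combine(theta_hats, np.linalg.inv(V))   # precision-weighted average
theta_eq = mde_combine(theta_hats, np.eye(2))           # identity weights: simple average
```

With the inverse covariance as the weight matrix, the more precise estimate receives more weight.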
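Asymptotic properties: CAN with
$$\text{Asy.Var}(\hat\theta) = \frac{1}{N}\left[G'(\theta)WG(\theta)\right]^{-1}\left[G'(\theta)W\Phi WG(\theta)\right]\left[G'(\theta)WG(\theta)\right]^{-1}$$
where $G(\theta)$ is a $L \times K$ matrix with row $\ell$ given by
$$G_\ell(\theta) = \operatorname{plim}\frac{\partial m_\ell}{\partial\theta'}$$
and $\Phi$ is the $L \times L$ estimated asymptotic covariance matrix of the moment conditions
Optimal weight matrix minimizes the asymptotic variance: $W = \Phi^{-1}$
$$\text{Asy.Var}(\hat\theta) = \frac{1}{N}\left[G'(\theta)\Phi^{-1}G(\theta)\right]^{-1}$$
which is equivalent to the MM estimator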
Generalized Method of Moments (GMM) extends the MDE
- Estimator is based on set of population orthogonality conditions
$$E[m(y_i, x_i, \theta_0)] = 0$$
where $\theta_0$ is a K-dimensional vector of true values and $m(\cdot)$ is a L-dimensional vector ($L > K$)
- Averaging over obs produces the sample moment conditions
$$E[\bar m(y_i, x_i, \theta_0)] = 0$$
where
$$\bar m(y_i, x_i, \theta_0) = \frac{1}{N}\sum_i m(y_i, x_i, \theta_0)$$
- GMM estimator obtained as
$$\hat\theta_{GMM} = \arg\min_\theta\, \bar m(y_i, x_i, \theta)'\,W\,\bar m(y_i, x_i, \theta)$$
where $W$ is a PD weighting matrix
- Optimal weight matrix minimizes the asymptotic variance: $W = \Phi^{-1}$
$$\text{Asy.Var}(\hat\theta) = \frac{1}{N}\left[G'(\theta)\Phi^{-1}G(\theta)\right]^{-1}$$
which is equivalent to the MDE estimator
Example:
- IV estimator solves
$$\frac{1}{N}\sum_i z_i(y_i - x_i\hat\beta) = 0$$
which yields K moments as fns of K parameters
- TSLS estimator can be written as
$$\frac{1}{N}\sum_i z_i(y_i - x_i\hat\beta) = 0$$
where now there are L moments as fns of K parameters
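When the moments are linear in $\beta$, the GMM criterion has a closed-form minimizer. The sketch below assumes NumPy and simulated data; `linear_gmm` is a hypothetical helper, and the default weight matrix $(Z'Z/N)^{-1}$ is the choice that reproduces TSLS, not necessarily the optimal $\Phi^{-1}$.

```python
import numpy as np

def linear_gmm(y, X, Z, W=None):
    """Minimize gbar(b)' W gbar(b), where gbar(b) = Z'(y - X b) / N."""
    N = len(y)
    if W is None:
        W = np.linalg.inv(Z.T @ Z / N)   # TSLS weighting (an assumption of this sketch)
    XZ = X.T @ Z / N                     # K x L
    Zy = Z.T @ y / N                     # L
    # First-order condition: (X'Z W Z'X) b = X'Z W Z'y
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ Zy)

# Simulated over-identified example: L = 3 instruments, K = 2 regressors
rng = np.random.default_rng(0)
N = 200
Z = rng.normal(size=(N, 3))
X = Z[:, :2] + 0.5 * rng.normal(size=(N, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=N)
b_gmm = linear_gmm(y, X, Z)
```

Passing a different `W` (for example, the inverse of an estimated moment covariance matrix) gives other members of the GMM class from the same function.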
Inference in GMM
- Wald tests can be conducted as in ML
- LR tests can be conducted based on the minimized GMM criteria
$$N(q_R - q) \sim \chi^2_J$$
where $q$ ($q_R$) is the value of the criterion in the unrestricted (restricted) model and J = # of restrictions
- LM tests can be conducted based on the derivatives of the unrestricted GMM criterion evaluated at the restricted estimator
See -gmm- in Stata
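RETURNING: DPD Estimation Specifics
Recall, the model
$$y_{it} = c_i + x_{it}\beta + \gamma y_{it-1} + \varepsilon_{it}, \quad t = 2, \dots, T$$
FD to eliminate $c_i$ yields
$$\begin{aligned} \Delta y_{i3} &= \Delta x_{i3}\beta + \gamma\Delta y_{i2} + \Delta\varepsilon_{i3} \\ \Delta y_{i4} &= \Delta x_{i4}\beta + \gamma\Delta y_{i3} + \Delta\varepsilon_{i4} \\ &\;\;\vdots \\ \Delta y_{iT} &= \Delta x_{iT}\beta + \gamma\Delta y_{iT-1} + \Delta\varepsilon_{iT} \end{aligned}$$
where IVs for
- $\Delta y_{i2}$ are $x_{i1}, y_{i1}$
- $\Delta y_{i3}$ are $x_{i2}, y_{i2}, y_{i1}$
- ...
- $\Delta y_{iT-1}$ are $x_{iT-2}, y_{iT-2}, \dots, y_{i2}, y_{i1}$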
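Define matrix of instruments (assuming no x's)
$$\underbrace{z_i}_{(T-2)\times p} = \begin{bmatrix} y_{i1} & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & y_{i1} & y_{i2} & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & y_{i1} & \cdots & y_{iT-2} \end{bmatrix}$$
Moment conditions ($p \times 1$)
$$E[z_i'\Delta\varepsilon_i] = 0$$
$$\Delta\varepsilon_i = [\Delta\varepsilon_{i3}\ \Delta\varepsilon_{i4}\ \cdots\ \Delta\varepsilon_{iT}]'$$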
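Estimator
$$\min_\gamma \left(\frac{1}{N}\sum_i \Delta\varepsilon_i' z_i\right) W_N \left(\frac{1}{N}\sum_i z_i'\Delta\varepsilon_i\right)$$
where
$$W_N = \left[\frac{1}{N}\sum_i z_i'\Delta\hat\varepsilon_i\Delta\hat\varepsilon_i' z_i\right]^{-1}$$
and $\Delta\hat\varepsilon_i$ are estimated residuals from consistent first-step estimates
- Known as two-step differenced GMM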
Consistent first-step obtained using
$$W_{1N} = \left[\frac{1}{N}\sum_i z_i' H z_i\right]^{-1}$$
where
$$\underset{(T-2)\times(T-2)}{H} = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & \ddots & \ddots & \ddots & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{bmatrix}$$
- Known as one-step differenced GMM
- Two-step estimator is asymptotically efficient if $\varepsilon_{it}$ are heteroskedastic; one-step if homoskedastic
- Not much gain in practice to two-step
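The instrument matrix and the one-step weighting matrix are easy to get wrong by an index, so a small construction may help. This is a sketch assuming NumPy and the no-x's case; `ab_instruments` and `fd_weight_matrix` are hypothetical helper names, not routines from any Stata or Python package.

```python
import numpy as np

def ab_instruments(y):
    """Instrument matrix z_i (no x's) for one unit with levels y = (y_1, ..., y_T).

    The row for the differenced equation in period t (t = 3, ..., T) holds
    y_1, ..., y_{t-2} in its own block of columns; all other entries are zero.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    p = (T - 1) * (T - 2) // 2            # 1 + 2 + ... + (T-2) columns
    z = np.zeros((T - 2, p))
    col = 0
    for r in range(T - 2):                # r = 0 is the period-3 equation
        k = r + 1                         # number of lagged levels available
        z[r, col:col + k] = y[:k]
        col += k
    return z

def fd_weight_matrix(T):
    """Tridiagonal H with 2 on the diagonal and -1 on the first off-diagonals."""
    n = T - 2
    return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

z = ab_instruments([10.0, 11.0, 12.0, 13.0])   # hypothetical unit with T = 4
H = fd_weight_matrix(4)
```

For T = 4 this gives p = 3 columns: $y_{i1}$ for the period-3 equation and $(y_{i1}, y_{i2})$ for the period-4 equation.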
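Specification tests
1 Overidentification test (Sargan test)
$$\Delta\hat\varepsilon' z\left[\sum_i z_i'\Delta\hat\varepsilon_i\Delta\hat\varepsilon_i' z_i\right]^{-1} z'\Delta\hat\varepsilon \sim \chi^2_{p-1}$$
where p is the # of columns in z
2 No serial correlation in $\varepsilon_{it}$ ⇔ $\Delta\varepsilon_{it}$ exhibits first-order, but no second-order, serial correlation
Adding x's into the model requires assumptions about their correlations with the errors
- Assumptions about x's
  - Strict exogeneity: $E[x_{it}\varepsilon_{is}] = 0\ \forall t, s$
  - Predetermined (sequential exogeneity): $E[x_{it}\varepsilon_{is}] = 0\ \forall t \leq s$
  - Endogenous: $E[x_{it}\varepsilon_{is}] = 0\ \forall t < s$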
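Incorporating x's into moment conditions
Moment conditions ($p \times 1$)
$$E[z_i'\Delta\varepsilon_i] = 0, \quad \Delta\varepsilon_i = [\Delta\varepsilon_{i3}\ \Delta\varepsilon_{i4}\ \cdots\ \Delta\varepsilon_{iT}]'$$
x's strictly exogenous
$$z_i = \left[\begin{array}{c|cccc} & x_{i1} \cdots x_{iT} & 0 & \cdots & 0 \\ Z_i & 0 & x_{i1} \cdots x_{iT} & \cdots & 0 \\ & \vdots & & \ddots & \vdots \\ & 0 & \cdots & 0 & x_{i1} \cdots x_{iT} \end{array}\right]$$
where
$$Z_i = \begin{bmatrix} y_{i1} & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & y_{i1} & y_{i2} & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & y_{i1} & \cdots & y_{iT-2} \end{bmatrix}$$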
x's predetermined
$$z_i = \left[\begin{array}{c|cccc} & x_{i1}\ x_{i2} & 0 & \cdots & 0 \\ Z_i & 0 & x_{i1}\ x_{i2}\ x_{i3} & \cdots & 0 \\ & \vdots & & \ddots & \vdots \\ & 0 & \cdots & 0 & x_{i1} \cdots x_{iT-1} \end{array}\right]$$
x's endogenous
$$z_i = \left[\begin{array}{c|cccc} & x_{i1} & 0 & \cdots & 0 \\ Z_i & 0 & x_{i1}\ x_{i2} & \cdots & 0 \\ & \vdots & & \ddots & \vdots \\ & 0 & \cdots & 0 & x_{i1} \cdots x_{iT-2} \end{array}\right]$$
Notes:
- Ahn & Schmidt (1995) utilize $T - 2$ additional moment conditions
$$E[\varepsilon_{it}\Delta\varepsilon_{it-1}] = 0, \quad t = 3, \dots, T$$
- All three versions assume $E[x_{it}c_i] \neq 0$; if $E[x_{it}c_i] = 0$, then there are even more moment conditions
Stata: -xtabond-, -xtdpd-; -xtdpd- allows for greater user-controlled flexibility
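Solution #3 (Arellano & Bover 1995; Blundell & Bond 1998)
A&B (1995), B&B (1998) show that as $|\gamma| \to 1$, the A&H and A&B estimators are downward biased
In MC study, B&B find if $|\gamma| > 0.8$, bias is fairly severe
Problem arises due to the fact that the IVs become weak as persistence grows
Solution
- Add additional moment conditions derived from the model in levels
$$\begin{aligned} y_{it} &= x_{it}\beta + \gamma y_{it-1} + c_i + \varepsilon_{it} \\ &= x_{it}\beta + \gamma y_{it-1} + \tilde\varepsilon_{it} \end{aligned}$$
where $\tilde\varepsilon_{it} = c_i + \varepsilon_{it}$
- What are IVs for $y_{it-1}$?
  - $\Delta y_{it-1}$ (independent of $c_i$, $\varepsilon_{it}$)
  - $\Delta y_{it-2}, \Delta y_{it-3}, \dots$ (through autoregressive process)
Estimation
- Define
$$z_i^{sys} = \begin{bmatrix} z_i & 0 & 0 & \cdots & 0 \\ 0 & \Delta y_{i2} & 0 & \cdots & 0 \\ 0 & 0 & \Delta y_{i2}\ \Delta y_{i3} & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \Delta y_{i2} \cdots \Delta y_{iT-1} \end{bmatrix}$$
where $z_i$ is the matrix of instruments from A&B
- Complete set of moment conditions
$$E[z_i^{sys\prime}\varepsilon_i^{sys}] = 0$$
$$\varepsilon_i^{sys} = [\Delta\varepsilon_{i3}\ \Delta\varepsilon_{i4}\ \cdots\ \Delta\varepsilon_{iT}\ \ \tilde\varepsilon_{i3}\ \cdots\ \tilde\varepsilon_{iT}]'$$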
Known as system GMM
Requires additional assumption
(DPD.v) E[ci∆yi2 ] = 0 ∀i
Stata: -xtdpdsys-
Solution #4 (Hahn et al. 2007)
An alternative approach to circumvent the weak IV issue when $|\gamma| \to 1$ is to utilize long differences (LD) instead of first-differencing to eliminate $c_i$
Structural model
$$y_{it} = c_i + x_{it}\beta + \gamma y_{it-1} + \varepsilon_{it}, \quad t = 2, \dots, T$$
LD yields
$$y_{iT} - y_{i2} = (x_{iT} - x_{i2})\beta + \gamma(y_{iT-1} - y_{i1}) + \varepsilon_{iT} - \varepsilon_{i2}$$
where $y_{i1}$ is available as an IV for $y_{iT-1} - y_{i1}$
Even if $|\gamma|$ is close to unity,
$$Cov(y_{iT-1}, y_{i1}) - Cov(y_{i1}, y_{i1}) \neq 0$$
if T is sufficiently large
Hahn et al. (2007) also utilize additional IVs
- After initial IV estimation, obtain estimated residuals
$$\hat\varepsilon_{is} \equiv y_{is} - x_{is}\hat\beta - \hat\gamma y_{is-1}, \quad s = 3, \dots, T - 1$$
- Re-estimate the LD model using $y_{i1}, \hat\varepsilon_{is}$ as instruments
- Iterate this procedure; typically only a few iterations are needed
Hahn et al. (2007) demonstrate the superior performance of this estimator when $|\gamma| \to 1$
Solution #5 (Kiviet 1995, 1999; Bun & Kiviet 2003; Bruno 2005; Hausman & Pinkovskiy 2017)
Revert back to Nickell (1981), who derived the bias of the FE/LSDV estimate of the DPD model
Strategy
- Obtain an estimate of the approximate bias of the LSDV estimator
- Adjust the estimate $\hat\gamma_{LSDV}$ by the estimated bias to obtain the LSDVC (corrected) estimate
- Implies E[γLSDV + BIAS − γ] = 0
MC evidence indicates good performance for moderate size N
See -xtlsdvc-, -xtbcfe- in Stata
Solution #6 (Everaert 2013)
Estimation by IV, but in levels using orthogonal to backward mean transformation to instrument for $y_{it-1}$
$$y_{it} = c_i + x_{it}\beta + \gamma y_{it-1} + \varepsilon_{it}, \quad t = 2, \dots, T$$
Instrument for $y_{it-1}$ is the residual from the first-stage regression of $y_{it-1}$ on its 'backward mean' defined as
$$y^b_{it-1} = \frac{1}{t-1}\sum_{s=1}^{t-1} y_{is}$$
If $Cov(x_{it}, c_i) \neq 0$, then
- Instruments for $x_{it}$ are derived based on Hausman & Taylor (1981)
- Specifically, deviations from individual-specific means, $x_{it} - \bar x_i$
Consistency requires $T \to \infty$
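The backward mean is a running average of the lagged levels, which a cumulative sum delivers directly. This is a small sketch assuming NumPy; `backward_mean` is a hypothetical helper that covers only the backward-mean definition, not the subsequent first-stage regression.

```python
import numpy as np

def backward_mean(y):
    """Backward means of one unit's series: entry s (0-based) is mean(y_1, ..., y_{s+1}).

    Entry t-2 therefore corresponds to y^b_{it-1}, the average of y_i1, ..., y_it-1.
    """
    y = np.asarray(y, dtype=float)
    return np.cumsum(y) / np.arange(1, len(y) + 1)

yb = backward_mean([2.0, 4.0, 6.0])   # hypothetical series y_i1, y_i2, y_i3
```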
Solution #7 (Bhargava and Sargan 1983; Hsiao et al. 2002; Kripfganz2016)
QML estimator for large N, small T panels
Based on ML estimator after first-differencing
Stata: -xtdpdqml-
Final comment
Preceding estimators require $\varepsilon_{it}$ to be serially uncorrelated
- Can relax this assumption and allow $\varepsilon$ to follow a low order MA process
- Stata: -xtdpd-
Preceding estimators are no longer consistent if the data are irregularly spaced
- Irregular spacing occurs when the interval between periods in the data does not align with the interval between periods in the DGP
- This can occur even if the data are evenly spaced
- See McKenzie (2001), Millimet & McDonough (2017), Sasaki & Xin (2017)
Panel Data
LDV Models: Overview
LDV panel models are complicated due to four main issues
1 Estimation by MLE requires iid obs to compute a tractable likelihood; unlikely with panels
  - Solution ⇒ partial maximum likelihood
2 Transformations to eliminate FEs are generally not available since problems are nonlinear
  - Solution (in some models) ⇒ conditional maximum likelihood
3 DV approaches to FE models are generally inconsistent unless $N, T \to \infty$
  - Known as the incidental parameters problem
  - Solutions ⇒ correlated random effects, bias-correction
4 Marginal effects generally depend on c, which is only estimable if one places more structure on the model
Research on-going in many areas, many unresolved questions
Panel Data
LDV Models: Binary Choice Models
Case of a binary dependent variable with panel data
Often useful to start with a LPM and use the FE or FD estimator
- Same shortcomings as LPM with cross-section data
  - Heteroskedasticity
  - Predictions outside unit interval
- Plus, implies restrictions on the feasible values of $c_i$
- Nonetheless, very common in applied research
Pooled Models...
Instead, suppose the model is given by
$$\Pr(y_{it} = 1|x_{it}) = G(x_{it}\beta), \quad t = 1, \dots, T$$
where $G(\cdot) \in (0, 1)$ is known and $x_{it}$ must be exogenous (but not strictly)
- Recall, the likelihood fn gives the total probability of the data conditional on the parameters
  - We simplify this if we assume observations are iid
  - With panel data, we usually do not want to assume this; rather assume independent panels, but not independence across time within a panel
  - So, we want to model the joint $\Pr(y_i|x_i)$
  - However, without further assumptions, we cannot derive the dbn of $y_i \equiv (y_{i1}, \dots, y_{iT})$ given $x_i \equiv (x_{i1}, \dots, x_{iT})$
Could assume
$$\Pr(y_i = 1|x_i) = G_T(x_i\beta, \Sigma)$$
but then likelihood fn would entail T integrals if, say, $G_T$ is a T-dimensional normal dbn
We can still obtain consistent estimates using the partial likelihood fn
$$\ln[L(\theta)] = \sum_i\sum_t \left\{ y_{it}\ln[G(x_{it}\beta)] + (1 - y_{it})\ln[1 - G(x_{it}\beta)] \right\}$$
which is different from ML in that we do not assume that $f(y_i|x_i) = \prod_t f_t(y_{it}|x_{it})$
- Known as the PMLE of the pooled probit/logit model when $G = \Phi, \Lambda$ (std normal CDF or logistic CDF)
- Under partial likelihood, it is assumed that $G(x_{it}\beta)$ is the correct density of $f(y_{it}|x_i) = f(y_{it}|x_{it})$ for each t, but the joint density $G(y_i|x_i)$ is not the product of the marginals
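The partial log-likelihood simply stacks all (i, t) rows and sums the Bernoulli terms. The sketch below evaluates it for the logit case, assuming NumPy; `pooled_logit_loglik` and the small data set are hypothetical, and maximization (plus cluster-robust standard errors) would be handled separately.

```python
import numpy as np

def pooled_logit_loglik(beta, y, X):
    """Partial log-likelihood of the pooled logit, summed over all (i, t) rows.

    Uses ln L(xb) = -log(1 + exp(-xb)) and ln[1 - L(xb)] = -log(1 + exp(xb))
    for the logistic CDF L, which is numerically stable for large |xb|.
    """
    xb = X @ beta
    return float(np.sum(-y * np.logaddexp(0.0, -xb) - (1.0 - y) * np.logaddexp(0.0, xb)))

# Hypothetical stacked panel rows (constant plus one covariate)
y = np.array([1.0, 0.0, 1.0, 1.0])
X = np.array([[1.0, 0.2], [1.0, -1.0], [1.0, 0.5], [1.0, 1.5]])
ll_at_zero = pooled_logit_loglik(np.zeros(2), y, X)   # every row contributes ln(0.5)
```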
(cont.)
- Robust std errors should be used unless one assumes dynamic completeness, that is
$$\Pr(y_{it} = 1|x_{it}, y_{it-1}, x_{it-1}, \dots) = \Pr(y_{it} = 1|x_{it})$$
which implies that lagged x and y do not enter the density for $y_{it}$ given $x_{it}$
- Specification test for dynamic completeness
  - Estimate the pooled probit/logit and obtain $\hat\varepsilon_{it} = y_{it} - G(x_{it}\hat\beta)$
  - Estimate a new pooled probit/logit where $\hat\varepsilon_{it-1}$ is included as a regressor ($t > 1$); rejection of its coeff as zero implies rejection of dynamic completeness
Models With Unobserved Effects...
Model given in latent form is
$$y^*_{it} = x_{it}\beta + c_i + \varepsilon_{it}, \quad \varepsilon_{it} \sim N/L$$
$$y_{it} = I(x_{it}\beta + c_i + \varepsilon_{it} > 0)$$
RE specification assumes
$$f(c_i|x_{it}) \text{ is not dependent on } x_{it}$$
FE specification leaves $f(c_i|x_{it})$ unrestricted s.t. $c_i$ and $x_{it}$ may be correlated
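RE Specification
Assumptions
$$E[\varepsilon_{it}|x] = E[c_i|x] = Cov(\varepsilon_{it}, c_j|x) = 0 \quad \forall i, j, t$$
$$Cov(\varepsilon_{it}, \varepsilon_{js}|x) = Var(\varepsilon_{it}|x) = \begin{cases} 1 & i = j,\ t = s \\ 0 & \text{otherwise} \end{cases}$$
$$Cov(c_i, c_j|x) = Var(c_i|x) = \begin{cases} \sigma_c^2 & i = j \\ 0 & \text{otherwise} \end{cases}$$
where x includes $x_{it}\ \forall i, t$ and assumptions imply that $x_{it}$ is strictly exogenous conditional on $c_i$
Implies
$$E[\tilde\varepsilon_{it}|x] = 0$$
$$Var(\tilde\varepsilon_{it}|x) = \sigma^2_{\tilde\varepsilon} = \sigma^2_\varepsilon + \sigma^2_c = 1 + \sigma^2_c$$
$$Corr(\tilde\varepsilon_{it}, \tilde\varepsilon_{is}|x) = \rho = \frac{\sigma^2_c}{1 + \sigma^2_c}$$
where $\tilde\varepsilon_{it} = c_i + \varepsilon_{it}$ is the composite error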
In general, the likelihood involves high-dimensional integration
However, the RE structure simplifies estimation by factoring the density into $f(\varepsilon_{i1}, \dots, \varepsilon_{iT}|c_i)f(c_i) = f(\varepsilon_{i1}|c_i) \cdots f(\varepsilon_{iT}|c_i)f(c_i)$
Log-likelihood becomes
$$\ln[L(\theta)] = \sum_i \ln\left[\int_{-\infty}^{\infty}\left[\prod_{t=1}^{T}\Pr(y_{it}|x_{it}\beta + c_i)\right]f(c_i)\,dc\right]$$
which requires integrating over the dbn of c
Assuming $c_i|x_i \sim N(0, \sigma_c^2)$, the likelihood fn becomes
$$\ln[L(\theta)] = \sum_i \ln\left[\int_{-\infty}^{\infty}\left[\prod_{t=1}^{T_i}\Pr(y_{it}|x_{it}\beta + c_i)\right]\frac{1}{\sigma_c}\phi\left(\frac{c_i}{\sigma_c}\right)dc\right]$$
Butler & Moffitt (1982) show how to approximate the integral
- Known as RE probit estimator
- Estimate of $\rho$ allows testing for presence of unobserved heterogeneity
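One common way to approximate the one-dimensional integral is Gauss-Hermite quadrature. The sketch below evaluates a single unit's contribution, assuming NumPy and the standard library; `re_probit_loglik_i`, `norm_cdf`, the number of nodes, and the data are all hypothetical, and this is not meant to reproduce any particular package's implementation.

```python
import math
import numpy as np

def norm_cdf(v):
    """Standard normal CDF applied elementwise."""
    v = np.asarray(v, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(v / math.sqrt(2.0)))

def re_probit_loglik_i(beta, sigma_c, y_i, X_i, n_nodes=12):
    """Log-likelihood contribution of unit i with c_i ~ N(0, sigma_c^2) integrated out."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    c = math.sqrt(2.0) * sigma_c * nodes           # change of variables c = sqrt(2) * sigma_c * node
    q = 2.0 * y_i - 1.0                            # +1 if y_it = 1, -1 if y_it = 0
    xb = X_i @ beta
    # probs[k, t] = Pr(y_it | x_it beta + c_k) under the probit link
    probs = norm_cdf(q[None, :] * (xb[None, :] + c[:, None]))
    lik = np.sum(weights * probs.prod(axis=1)) / math.sqrt(math.pi)
    return math.log(lik)

# Hypothetical unit observed for T = 3 periods
beta = np.array([0.1, 0.3])
y_i = np.array([1.0, 0.0, 1.0])
X_i = np.array([[1.0, 0.5], [1.0, -0.2], [1.0, 1.0]])
ll_re = re_probit_loglik_i(beta, 1.0, y_i, X_i)
ll_no_het = re_probit_loglik_i(beta, 0.0, y_i, X_i)   # sigma_c = 0 collapses to pooled probit
```

Re-running with a different `n_nodes` is a quick way to see how sensitive the value is to the number of quadrature points.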
Notes...
Maximum Simulated Likelihood (MSL) estimation is an alternativethat allows for c to follow some non-normal dbns
RE logit model also exists and assumes c ∼N
Integrals approximated using various quadrature procedures; resultsmay be sensitive to number of quadrature points
I Stata: -quadchk-
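Marginal effects are more complex since they depend on c
$$\frac{\partial\Pr(y_{it} = 1)}{\partial x_{kit}} = \beta_k\phi(x_{it}\beta + c_i)$$
- Since $E[c|x] = 0$, the ME evaluated at the population average of c is $\beta_k\phi(x_{it}\beta)$
- The relative ME of any two covariates, $x_k$ and $x_{k'}$, is
$$\frac{\partial\Pr(y_{it} = 1)/\partial x_{kit}}{\partial\Pr(y_{it} = 1)/\partial x_{k'it}} = \frac{\beta_k\phi(x_{it}\beta + c_i)}{\beta_{k'}\phi(x_{it}\beta + c_i)} = \frac{\beta_k}{\beta_{k'}}$$
which is not observation-specific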
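(cont.)
- The average marginal effect (AME) is the value of $\beta_k\phi(x_{it}\beta + c_i)$ averaged over the dbn of c, which in the RE probit is
$$\frac{\beta_k}{\sigma_{\tilde\varepsilon}}\phi\left(x_{it}\frac{\beta}{\sigma_{\tilde\varepsilon}}\right)$$
where $\sigma^2_{\tilde\varepsilon} = 1 + \sigma^2_c$ is the variance of the composite error; this only requires knowledge of $\beta_c = \beta/\sigma_{\tilde\varepsilon}$, known as the population-averaged parameters
  - Interestingly, $\beta_c$ is what is estimated by the pooled probit estimator if c is ignored
  - Thus, one may estimate $\beta_c$ via pooled probit with robust std errors (which does not assume iid observations over time), or impose the full error structure and estimate the RE probit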
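FE Specification
Model re-written as
$$y_{it} = I(c_i d_{it} + x_{it}\beta + \varepsilon_{it} > 0), \quad \varepsilon_{it} \sim N/L$$
where $d_{it}$ is a dummy variable equal to 1 for obs i, 0 otherwise
Log-likelihood is
$$\ln[L(\theta)] = \sum_i\sum_{t=1}^{T}\ln\Pr(y_{it}|c_i d_{it} + x_{it}\beta)$$
where
$$\Pr(\cdot) = \begin{cases} \Phi[q_{it}(c_i d_{it} + x_{it}\beta)] & \text{if } \varepsilon_{it} \sim N \\ \Lambda[q_{it}(c_i d_{it} + x_{it}\beta)] & \text{if } \varepsilon_{it} \sim L \end{cases}$$
and $q_{it} = 2y_{it} - 1$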
The FE estimator is flawed
- As with continuous FE models, $c_i$ estimates are inconsistent under fixed-T asymptotics
  - Unlike continuous FE models, the inconsistency also causes $\hat\beta$ to be inconsistent
  - There is also a small sample bias, although the magnitude remains an open question
  - Problem known as the incidental parameters problem
- Thus, there is a trade-off between the restrictiveness of the RE assumption vs. the bias/inconsistency due to the incidental parameters problem
- Research continues on bias-corrected estimators
Question: Why does the inconsistency of $\hat c_i$ in the continuous FE model not affect $\hat\beta$?
- In the continuous y case, the model is transformed into deviations from obs-specific means
- Formally, while $f(y_{it}|X)$ depends on $c_i$, $f(y_{it}|X, \bar y_i)$ does not
  - $\bar y_i$ is a minimal sufficient statistic for $c_i$
  - $\bar y_i$ is then used in the estimation of $\beta$
- In the probit model, there is no sufficient statistic for $c_i$
- There is in the logit model
Before discussing the FE logit model, there is Chamberlain's correlated RE probit estimator based on Mundlak (1978)
- Assume
$$c_i|x_i \sim N(\xi_0 + \bar x_i\xi_1, \sigma^2_a)$$
where $a_i = c_i - \xi_0 - \bar x_i\xi_1$ and $\sigma^2_a$ does not depend on x
- Under the prior assumptions for the RE probit estimator, apply the RE probit estimator to the model
$$\Pr(y_{it}|x_{it}, c_i) = \Phi(\xi_0 + x_{it}\beta + \bar x_i\xi_1 + a_i)$$
where $a_i|x_i \sim N(0, \sigma^2_a)$ and x does not contain an intercept
  - This yields estimates of $\sigma^2_a$, not $\sigma^2_c$
  - Testing $H_o: \xi_1 = 0$ ⇔ RE model vs. Chamberlain's RE model
- The AMEs can be obtained by estimating the pooled probit model
$$\Pr(y_{it}|x_{it}) = \Phi(\xi_{a0} + x_{it}\beta_a + \bar x_i\xi_{a1})$$
or after the RE probit model as
$$\hat\beta_{ak}\,\phi\left(\hat\xi_{a0} + x_{it}\hat\beta_a + \bar x_i\hat\xi_{a1}\right)$$
where $\beta_a = \beta/\sqrt{1 + \sigma^2_a}$ (and similarly for $\xi$)
Now, turn to the FE logit model
$$\Pr(y_{it} = 1|x_{it}) = G_{it} = \frac{\exp(x_{it}\beta + c_i)}{1 + \exp(x_{it}\beta + c_i)}$$
with the likelihood given as
$$L(\theta) = \prod_i\prod_t (G_{it})^{y_{it}}(1 - G_{it})^{1 - y_{it}}$$
and no assumptions are made about dependence between $x_{it}$ and $c_i$
Trick is to derive the joint dbn of $y_i|x_i, c_i, n_i$, where $n_i \equiv \sum_t y_{it}$ (# of periods obs i has y = 1)
- Turns out that this dbn is independent of $c_i$
- Thus, $n_i$ is a minimal sufficient statistic for $c_i$
Probit model does not give rise to such a sufficient statistic such that the conditional dbn does not depend on $c_i$
Chamberlain (1980) observed that the conditional likelihood does not depend on $c_i$
$$L(\theta) = \prod_i \Pr(y_{i1}, \dots, y_{iT}|x_i, n_i)$$
where the contribution of obs i to the likelihood is
$$L_i(\theta) = \Pr(y_{i1}, \dots, y_{iT}|x_i, n_i) = \frac{\Pr(y_{i1}, \dots, y_{iT}, n_i|x_i)}{\Pr(n_i|x_i)} = \frac{\exp\left(\sum_t y_{it}(x_{it}\beta)\right)}{\sum_{a \in R_i}\exp\left(\sum_t a_t x_{it}\beta\right)}$$
and $R_i \subset \mathbb{R}^T$ defined as $\{a \in \mathbb{R}^T : a_t \in \{0, 1\} \text{ and } \sum_t a_t = n_i\}$
- $R_i$ represents all sequences of 1s and 0s across the T periods such that $\sum_t a_t = n_i$
- The denominator is summed over the set of all $\binom{T}{n_i}$ unique sequences of $a_t$ that sum to $n_i$
Only obs for which $n_i \notin \{0, T\}$ contribute information about $\beta$ since $y_i$ is perfectly determined in these two cases
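For small T the conditional likelihood contribution can be computed by brute-force enumeration of the sequences in $R_i$. The following is a sketch assuming NumPy and the standard library; `cond_logit_loglik_i` and the data are hypothetical, and production routines typically use a recursion rather than enumeration.

```python
import itertools
import math
import numpy as np

def cond_logit_loglik_i(beta, y_i, X_i):
    """Conditional log-likelihood contribution of unit i given n_i = sum_t y_it."""
    xb = X_i @ beta
    T, n_i = len(y_i), int(np.sum(y_i))
    numer = float(y_i @ xb)
    # Denominator: one term per way of placing n_i ones among the T periods
    terms = [sum(xb[t] for t in ones) for ones in itertools.combinations(range(T), n_i)]
    denom = math.log(sum(math.exp(v) for v in terms))
    return numer - denom

# Hypothetical unit with T = 2 and one regressor
beta = np.array([0.5])
X_i = np.array([[1.0], [3.0]])
ll_switch = cond_logit_loglik_i(beta, np.array([1.0, 0.0]), X_i)   # n_i = 1: informative
ll_always = cond_logit_loglik_i(beta, np.array([1.0, 1.0]), X_i)   # n_i = T: contributes 0
```

Units with $n_i = 0$ or $n_i = T$ return a log-likelihood of zero for any $\beta$, which is the sense in which they carry no information.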
Test of homogeneity given by $H_o: c_i = c\ \forall i$
- Likelihood ratio test is not possible since the FE estimates are based on the conditional likelihood fn
- Hausman test is applicable since both are consistent under the null, but the FE model is inefficient, while only the FE model is consistent under the alternative
$$\left(\hat\beta_{FE} - \hat\beta\right)'\left(\hat\Sigma_{FE} - \hat\Sigma\right)^{-1}\left(\hat\beta_{FE} - \hat\beta\right) \sim \chi^2$$
Interpretation
- $\beta$ gives the ME of x on the log odds ratio
$$\ln\left[\frac{\Pr(y_{it} = 1)}{\Pr(y_{it} = 0)}\right] = x_{it}\beta + c_i$$
- MEs of x on $\Pr(y_{it} = 1)$, and estimation of the AMEs, are not feasible without structure on the dbn of c, which is what the FE logit model avoids
Final notes...
All RE, FE estimators require strict exogeneity of $x_{it}$ conditional on $c_i$
- Straightforward to test by adding leads, $x_{it+1}$, to the model and testing for significance
- Requires omitting the final period from the estimation
FE logit vs. Chamberlain's RE probit model
- FE logit leaves the dbn of c unspecified, but does not yield estimates of the AMEs
- Correlated RE probit based on the augmented pooled probit allows for serial correlation and estimates the AMEs, but forces one to place some structure on the dbn of $c|x$
Additional work on FE binary choice models is on-going (e.g., Ai &Gan 2010)
Stata: -xtprobit-, -xtlogit-
Dynamic Models
Extensions to dynamic binary choice models are in progress
- Dynamic binary probit models first considered in Heckman (1981a,b)
- Orme (1997, 2001) and Wooldridge (2005) propose alternatives that are easier to estimate
- Arulampalam and Stewart (2009) propose an easier to estimate version of Heckman's estimator; estimable in Stata in a straightforward manner
Akay (2011) compares Heckman and Wooldridge's approaches
Model
$$\Pr(y_{it} = 1|x_i, y_{it-1}, y_{it-2}, \dots, c_i) = G(x_{it}\beta + \gamma y_{it-1} + c_i), \quad t = 2, \dots, T$$
Initial conditions problem
- General estimation strategy is to model $f(y_1, \dots, y_T|x, c)$, specify a dbn $f(c|x)$, and then integrate wrt this density
- To figure out $f(y_1, \dots, y_T|x, c)$, we can factor this as
$$f(y_1, \dots, y_T|x, c) = f(y_2, \dots, y_T|x, y_1, c)f(y_1|x, c)$$
where $y_{i1}$ is the initial condition
- Thus, maximizing the likelihood fn requires
  1 Specification of $f(y_1|x, c)$, or
  2 Assumption that $f(y_1|x, c) = f(y_1|x)$; i.e., $y_{i1}$ is independent of $c_i$ conditional on $x_i$
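Under (2) the above model can be estimated by conditional MLE using the RE probit or correlated RE probit with only data from periods $2, \dots, T$ and assuming $c_i|x_i \sim N(0, \sigma^2_c)$
$$\ln[L(\theta)] = \sum_i \ln\left[\int_{-\infty}^{\infty}\left[\prod_{t=2}^{T}\Pr(y_{it}|x_{it}\beta + \gamma y_{it-1} + c_i)\right]\frac{1}{\sigma_c}\phi\left(\frac{c_i}{\sigma_c}\right)dc\right]$$
or assuming $c_i|x_i \sim N(\xi_0 + \bar x_i\xi_1, \sigma^2_a)$ and integrating wrt the density of a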
However, $y_{i1}$ is likely not to be conditionally independent of $c_i$
- Even if period 1 corresponds to the 'true' initial period, in most economic applications it is difficult to justify the assumption of $y_{i1}$ being unrelated to $c_i$
- Thus, the likelihood fn must account for the initial period
$$\ln[L(\theta)] = \sum_i \ln\left[\int_{-\infty}^{\infty}\left[\Pr(y_{i1}|x_i, c_i) \times \prod_{t=2}^{T_i}\Pr(y_{it}|x_{it}\beta + \gamma y_{it-1} + c_i)\right]\frac{1}{\sigma_c}\phi\left(\frac{c_i}{\sigma_c}\right)dc\right]$$
Different solutions based on different specifications of $\Pr(y_{i1} = 1|x_i, c_i)$
Heckman approach
$$\Pr(y_{i1} = 1|x_i, z_i, c_i) = G(x_{i1}\beta + z_{i1}\pi + \lambda c_i)$$
where z may include additional determinants of the initial period if available and $\lambda$ allows for differences in the distribution of the composite error in the initial period
- Likelihood fn
$$\ln[L(\theta)] = \sum_i \ln\left[\int_{-\infty}^{\infty}\left[\Pr(y_{i1}|x_{i1}\beta + z_{i1}\pi + \lambda c_i) \times \prod_{t=2}^{T}\Pr(y_{it}|x_{it}\beta + \gamma y_{it-1} + c_i)\right]\frac{1}{\sigma_c}\phi\left(\frac{c_i}{\sigma_c}\right)dc\right]$$
- Stewart (2007) extends this to allow for serial correlation in the idiosyncratic errors
- Correlation between $x_i$ and $c_i$ can be addressed using the Mundlak (1978) approach (i.e., including $\bar x_i$ in $x_{it}$, in which case $c_i$ becomes $a_i$)
- Stata: -redprob-, -redpace- (http://www2.warwick.ac.uk/fac/soc/economics/staff/academic/stewart/stata/)
Wooldridge approach
- Rather than model the initial condition, this approach specifies a dbn for $f(c_i|x_i, y_{i1})$ and integrates wrt this dbn
- Extending the Mundlak (1978) approach, assume
$$c_i|x_i, y_{i1} \sim N(\xi_0 + \bar x_i\xi_1 + \xi_2 y_{i1}, \sigma^2_a)$$
where $a_i = c_i - \xi_0 - \bar x_i\xi_1 - \xi_2 y_{i1}$ and $\sigma^2_a$ does not depend on x or $y_1$
- Now, the conditional likelihood fn is given by
$$\ln[L(\theta)] = \sum_i \ln\left[\int_{-\infty}^{\infty}\left[\prod_{t=2}^{T}\Pr(y_{it}|\xi_0 + x_{it}\beta + \bar x_i\xi_1 + \gamma y_{it-1} + \xi_2 y_{i1} + a_i)\right]\frac{1}{\sigma_a}\phi\left(\frac{a_i}{\sigma_a}\right)da\right]$$
which can be estimated using the usual RE probit model
- Akay (2011) shows this approach performs better than the Heckman approach when $T > 5$
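Orme approach
- Not as commonly used
- Write the model for the initial period as
$$y_{i1} = I(x_{i1}\beta + z_{i1}\pi + \varepsilon_{i1} > 0)$$
where $\varepsilon_{i1}, c_i \sim N_2(0, 0, \sigma^2_\varepsilon, \sigma^2_c, r)$
- Utilizing properties of the bivariate normal density, we have
$$c_i = r\frac{\sigma_c}{\sigma_\varepsilon}\varepsilon_{i1} + \sigma_c\sqrt{1 - r^2}\,w_i$$
where $w_i \sim N(0, 1)$ and is independent of $\varepsilon_{i1}$
- Substituting in for $c_i$ yields
$$\Pr(y_{it} = 1|x_i, y_{it-1}, c_i) = G\left(x_{it}\beta + \gamma y_{it-1} + r\frac{\sigma_c}{\sigma_\varepsilon}\varepsilon_{i1} + \sigma_c\sqrt{1 - r^2}\,w_i\right), \quad t = 2, ..$$
which can be estimated by conditional ML using the usual RE probit model where $w_i$ is the RE and $\varepsilon_{i1}$ is replaced with
$$\hat\varepsilon_{i1} = \frac{(2y_{i1} - 1)\phi(x_{i1}\hat\beta + z_{i1}\hat\pi)}{\Phi[(2y_{i1} - 1)(x_{i1}\hat\beta + z_{i1}\hat\pi)]}$$
where $\hat\beta$ and $\hat\pi$ are estimated via probit on data from initial period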
Panel Data
LDV Models: Tobit Model
Start with the pooled tobit model
$$y^*_{it} = x_{it}\beta + \varepsilon_{it}, \quad \varepsilon_{it}|x_{it} \sim N(0, \sigma^2)$$
$$y_{it} = \max(0, y^*_{it})$$
which is estimable as a tobit model with NT obs assuming $x_{it}$ are exogenous (but not strictly)
- As with the pooled probit, this is a Partial ML estimator
- Dynamic completeness implies usual inference is correct
  - Tested by estimating the following pooled tobit
$$y^*_{it} = x_{it}\beta + \gamma_1 r_{it-1} + \gamma_2(1 - r_{it-1})\hat\varepsilon_{it-1} + \eta_{it}$$
where $r_{it-1} = I(y_{it-1} = 0)$ and $\hat\varepsilon_{it-1} = y_{it-1} - x_{it-1}\hat\beta$ if $y_{it-1} > 0$ and $\hat\varepsilon_{it-1} = 0$ if $y_{it-1} = 0$ using $t > 1$
- Absent this assumption, robust std errors are needed
Model with unobserved effects is given in latent form as
$$y^*_{it} = x_{it}\beta + c_i + \varepsilon_{it}, \quad \varepsilon_{it}|x_i, c_i \sim N(0, \sigma^2_\varepsilon)$$
$$y_{it} = \max(0, y^*_{it})$$
Skipping a simple RE version, move to a Chamberlain-like model
- Assume
$$c_i|x_i \sim N(\xi_0 + \bar x_i\xi_1, \sigma^2_a)$$
$$a_i|x_i \sim N(0, \sigma^2_a)$$
$$\varepsilon_{it}|x_i, a_i \sim N(0, \sigma^2_\varepsilon)$$
where $a_i = c_i - \xi_0 - \bar x_i\xi_1$
- Assumption of $\varepsilon_{it}|x_i, c_i \sim N(0, \sigma^2_\varepsilon)$ implies $x_{it}$ are strictly exogenous conditional on $c_i$
- Also assume that $\varepsilon_{it}$ are serially uncorrelated
Under this setup, a RE tobit estimator with $x_{it}$ and $\bar x_i$ as covariates provides consistent estimates of the parameters
Testing $H_o: \xi_1 = 0$ ⇔ RE tobit model vs. Chamberlain's RE model
Final notes...
AMEs discussed in Wooldridge (2002)
No firm results on the behavior of MLE including obs-specific dummies
- Prior research suggests $\hat\beta$ is not biased, but $\hat\sigma$ is biased down (Greene 2003)
- Even if $\hat\beta$ is unbiased, inference and computation of marginal effects are difficult given the bias in $\hat\sigma$
I Research continues
Extensions to dynamic tobit models are available (e.g., following theWooldridge (2005) approach)
Stata: -xttobit-
Panel Data
LDV Models: Count Models
Start with the pooled Poisson model
- Model expected number of events conditional on x
$$E[y_{it}|x_{it}] = F(x_{it}\beta)$$
where $F(\cdot) > 0$
- Common functional form of $F(\cdot) = \exp(x_{it}\beta)$
- Recall, Poisson dbn depends only on the mean given by
$$\lambda_{it} = \exp(x_{it}\beta)$$
- Log-likelihood fn given by
$$\ln[L(\theta)] = \sum_i\sum_t \left[-\lambda_{it} + y_{it}\ln\lambda_{it} - \ln(y_{it}!)\right]$$
- Estimator is CAN assuming conditional mean is correctly specified regardless of Poisson dbn
- Further assumptions needed for inference
RE model given by
$$E[y_{it}|x_{it}, c_i] = F(x_{it}\beta + \alpha_i)$$
where $F(\cdot) = \exp(x_{it}\beta + \alpha_i)$
Assumptions
1 Functional form arises if one assumes that $c_i = \exp(\alpha_i)$ and unobserved effect enters multiplicatively
2 $x_{it}$ is strictly exogenous
3 $c_i|x_i \sim \Gamma(\delta_0, 1/\delta_0)$ dbn
  - Implies $E[c_i] = 1$, $Var(c_i) = 1/\delta_0 \equiv \eta^2_0$
4 No serial correlation in $y_{it}$ conditional on $x_i, c_i$
Poisson assumption implies equi-dispersion in $y_{it}|x_i, c_i$
However, $Var(y_{it}|x_i) = E[y_{it}|x_i](1 + \eta^2_0 E[y_{it}|x_i])$ ⇒ over-dispersion conditional on $x_i$ alone
Estimation entails forming the likelihood fn, based on the density $f(y_{i1}, \dots, y_{iT}|x_i)$, which requires integrating over the dbn of c
- Assumption concerning the gamma dbn allows this density to be tractable
  - Assuming normality is also possible
- Under maintained assumptions, MLE is efficient
- Estimates are sensitive to violations
A QMLE RE estimator relaxes some of these assumptions and is equivalent to a pooled NB model
A Mundlak (1978) correlated RE model is feasible
Hausman et al. (1984) develop a FE version, allowing dependence between $x_i$ and $c_i$
Requires same assumptions as RE model above except those concerning the dbn of $c_i$
The conditional likelihood does not depend on $c_i$
$$L(\theta) = \prod_i \Pr(y_{i1}, \dots, y_{iT}|x_i, n_i, c_i)$$
if we condition on $n_i$, $n_i = \sum_{t=1}^{T} y_{it}$ ($n_i$ is a minimal sufficient statistic)
The contribution of obs i to the conditional log-likelihood is
$$\ln[L_i(\theta)] = \ln\left[\Gamma\left(1 + \sum_t y_{it}\right)\right] - \sum_t \ln[\Gamma(y_{it} + 1)] + \sum_t y_{it}\ln\left[\frac{\exp(x_{it}\beta)}{\sum_s \exp(x_{is}\beta)}\right]$$
Known as the FE Poisson estimator
- Consistency only requires the conditional mean to be correctly specified
- As such, estimator is applicable to any model where this is the case (e.g., $y_{it}$ could be continuous, contain negative values, binary, etc.)
  - Example: gravity models in levels
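The conditional log-likelihood contribution is a multinomial log-probability with shares $\exp(x_{it}\beta)/\sum_s \exp(x_{is}\beta)$. The sketch below evaluates it for one unit, assuming NumPy and the standard library; `fe_poisson_loglik_i` and the data are hypothetical, and non-negative integer counts are assumed so that the gamma terms are factorials.

```python
import math
import numpy as np

def fe_poisson_loglik_i(beta, y_i, X_i):
    """Conditional (on n_i = sum_t y_it) log-likelihood contribution of unit i."""
    xb = X_i @ beta
    # log of exp(x_it b) / sum_s exp(x_is b), computed with a max shift for stability
    log_share = xb - (xb.max() + math.log(np.sum(np.exp(xb - xb.max()))))
    n_i = float(np.sum(y_i))
    return (math.lgamma(1.0 + n_i)
            - sum(math.lgamma(float(v) + 1.0) for v in y_i)
            + float(np.sum(y_i * log_share)))

# Hypothetical unit: equal shares in both periods, counts (1, 1)
beta = np.array([0.3])
X_i = np.array([[0.0], [0.0]])
ll_11 = fe_poisson_loglik_i(beta, np.array([1, 1]), X_i)   # multinomial prob. 2!/(1!1!) * 0.25
ll_00 = fe_poisson_loglik_i(beta, np.array([0, 0]), X_i)   # n_i = 0 carries no information
```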
Notes:
- If $n_i = 0$, then obs does not contribute any information
- FE-NB model also feasible, based on
$$\ln[L_i(\theta)] = \ln\left[\Gamma\left(\sum_t \lambda_{it}\right)\right] + \ln\left[\Gamma\left(1 + \sum_t y_{it}\right)\right] - \ln\left[\Gamma\left(\sum_t y_{it} + \sum_t \lambda_{it}\right)\right] + \sum_t \left\{\ln[\Gamma(y_{it} + \lambda_{it})] - \ln[\Gamma(\lambda_{it})] - \ln[\Gamma(1 + y_{it})]\right\}$$
where $\lambda_{it} = \exp(x_{it}\beta)$
  - FE-NB allows estimation of coeffs on time invariant regressors and an intercept
  - FE-Poisson does not
- Difference follows from the fact that the Poisson specifies $E[y_{it}|x_{it}] = F(x_{it}\beta + \alpha_i)$, whereas NB specifies $E[y_{it}|x_{it}] = \theta_i\exp(x_{it}\beta)$, where $\theta$ is a scaling parameter rather than a shifter of the conditional mean function
- Since the conditional mean is homogeneous in the FE-NB, an intercept and effects of time invariant regressors are estimable
- Dynamic count models considered in Wooldridge (2005)
Stata: -xtpoisson-, -xtpqml-, -xtnbreg-
Panel Data
LDV Models: Qualitative Choice Models
Pg. 653-4 in Wooldridge
Panel Data
LDV Models: Ordered Models
FE ordered probit
- Suffers from incidental parameters problem, as in the binary probit model
- Evidence suggests that finite sample bias appears to be of similar magnitude
FE ordered logit is consistent, but has limited usefulness
- Model setup
$$y^*_{it} = x_{it}\beta + c_i + \varepsilon_{it}, \quad \varepsilon_{it} \sim L$$
$$y_{it} = j \text{ if } y^*_{it} \in (\mu_{j-1}, \mu_j)$$
where
  - $j = 0, 1, \dots, J$
  - $\mu_{-1} = -\infty$, $\mu_0 = 0$, and $\mu_J = \infty$
- Probabilities given by
$$\Pr(y_{it} = j) = \Lambda(\mu_j - x_{it}\beta - c_i) - \Lambda(\mu_{j-1} - x_{it}\beta - c_i)$$
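Define a new sequence of dummy variables
$$w_{it,j} = \begin{cases} 1 & \text{if } y_{it} > j \\ 0 & \text{otherwise} \end{cases}, \quad j = 0, 1, \dots, J - 1$$
It follows that
$$\begin{aligned} \Pr(w_{it,j} = 1|X) &= \Lambda(-\mu_j + x_{it}\beta + c_i) \\ &= \Lambda(\theta_i + x_{it}\beta) \end{aligned}$$
which reduces to a series of $J - 1$ binary, FE logits estimable as before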
Notes:
- Procedure produces $J - 1$ consistent estimates of $\beta$; these can be combined into a single vector $\hat\beta$ using MDE
$$\hat\beta_{MDE} = \arg\min_\beta \sum_{j=0}^{J-1}\sum_{m=0}^{J-1}(\hat\beta_j - \beta)'\left[V^{-1}_{jm}\right](\hat\beta_m - \beta)$$
where $V^{-1}_{jm}$ is the $j, m$ block of the inverse of the $(J-1)K \times (J-1)K$ matrix $V$ that contains the $\text{Asy.Cov}(\hat\beta_j, \hat\beta_m)$, but estimates of the off-diagonal blocks are not obvious
- While $\hat\beta$ is consistent, the threshold parameters are not identified, implying no estimates of marginal effects or predicted probabilities
- RE version available
Panel Data
Heterogeneous Time Trends
All estimators to this point are known as homogeneous estimators as they assume constant parameters across i
Allowing for individual-specific time trends yields
$$y_{it} = c_i + \lambda_i t + x_{it}\beta + \varepsilon_{it}$$
Estimation proceeds by FD
$$\Delta y_{it} = \lambda_i + \Delta x_{it}\beta + \Delta\varepsilon_{it}$$
and then estimating this model by RE, FE, FD, or LSDV given the assumptions one wishes to invoke
Note: This requires $T \geq 3$
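The two-step logic (difference out $c_i$, then remove $\lambda_i$) can be illustrated with the FE option applied to the differenced data. This is a sketch assuming NumPy, a single regressor, a balanced panel, and simulated noiseless data; `fd_then_within_beta` is a hypothetical helper.

```python
import numpy as np

def fd_then_within_beta(y, x):
    """y, x: arrays of shape (N, T) with one regressor.

    First-differencing removes c_i; demeaning the differenced data within
    each unit then removes lambda_i (the FE option on the FD equation).
    """
    dy, dx = np.diff(y, axis=1), np.diff(x, axis=1)        # shape (N, T-1)
    dy_w = dy - dy.mean(axis=1, keepdims=True)
    dx_w = dx - dx.mean(axis=1, keepdims=True)
    return float(np.sum(dx_w * dy_w) / np.sum(dx_w ** 2))

# Simulated panel with unit-specific intercepts and trends, true beta = 2
rng = np.random.default_rng(1)
N, T = 6, 5
t = np.arange(1, T + 1)
c, lam = rng.normal(size=(N, 1)), rng.normal(size=(N, 1))
x = rng.normal(size=(N, T))
y = c + lam * t + 2.0 * x
beta_hat = fd_then_within_beta(y, x)
```

With T = 2 the differenced data have a single observation per unit, so the within step would leave nothing to estimate from, which is why at least three periods are needed.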
Panel Data
Heterogeneous Slopes
Model allowing for heterogeneous slopes more generally is given by
$$y_{it} = c_i + \gamma_i y_{it-1} + x_{it}\beta_i + \varepsilon_{it}$$
$$\gamma_i = \gamma + \eta_{1i}$$
$$\beta_i = \beta + \eta_{2i}$$
$$\theta_i = \frac{\beta_i}{1 - \gamma_i}$$
Perhaps more realistic in many applications as assumption of identical effects of covariates may not be justified
Estimation depends on what one is interested in estimating: population average parameters and/or panel-specific parameters
Even if one is only interested in population average parameters, acknowledgement of panel-specific parameters complicates estimation
Estimation now typically requires $N, T \to \infty$
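Swamy (1970) FGLS estimator
Restricted to models where $c_i = c$ and $\gamma_i = 0\ \forall i$
$$y_{it} = x_{it}\beta_i + \varepsilon_{it}$$
and x includes an intercept
Assumptions:
(S.i) $E[\varepsilon] = 0$
(S.ii) $E[\varepsilon_i\varepsilon_j'] = \begin{cases} \sigma^2_i I_T & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$
(S.iii) $\beta_i = \beta + \eta_i$, where $E[\eta] = 0$
(S.iv) $\eta_i$ and $\varepsilon_j$ are independent
(S.v) $E[\eta_i\eta_j'] = \begin{cases} \Omega & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$
where $\Omega$ is a $K \times K$ matrix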
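Substituting for $\beta_i$ implies
$$y_{it} = x_{it}\beta + (\varepsilon_{it} + x_{it}\eta_i) = x_{it}\beta + \upsilon_{it}$$
where
$$E[\upsilon\upsilon'] = \begin{bmatrix} x_1\Omega x_1' + \sigma^2_1 I_T & 0 & \cdots & 0 \\ 0 & x_2\Omega x_2' + \sigma^2_2 I_T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_N\Omega x_N' + \sigma^2_N I_T \end{bmatrix} = \Sigma$$
Thus, there are two sources of heteroskedasticity, as well as the fact that $\upsilon$ is serially correlated within panels
Estimation could ignore panel-specific parameters and instead estimate the model using pooled OLS with clustered standard errors
Alternative is FGLS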
GLS (Aitken) estimator given by
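$$\hat\beta_{GLS} = (x'\Sigma^{-1}x)^{-1}x'\Sigma^{-1}y = \Big[\sum_i x_i'(x_i\Omega x_i' + \sigma_i^2 I_T)^{-1}x_i\Big]^{-1}\sum_i x_i'(x_i\Omega x_i' + \sigma_i^2 I_T)^{-1}y_i = \sum_i \omega_i\hat\beta_i$$
where
$$\omega_i = \Big\{\sum_j\big[\Omega + \sigma_j^2(x_j'x_j)^{-1}\big]^{-1}\Big\}^{-1}\big[\Omega + \sigma_i^2(x_i'x_i)^{-1}\big]^{-1}, \qquad \hat\beta_i = (x_i'x_i)^{-1}x_i'y_i$$
which is a (matrix-)weighted average of the panel-specific OLS estimates $\hat\beta_i$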
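FGLS estimator
I Requires estimates of Ω and σ²i ∀i
I Estimates given by
$$\hat\sigma_i^2 = \frac{y_i'M_iy_i}{T-K} = \frac{\hat\varepsilon_i'\hat\varepsilon_i}{T-K}$$
$$\hat\Omega = \frac{S_b}{N-1} - \frac{1}{N}\sum_i \hat\sigma_i^2 (x_i'x_i)^{-1}$$
where
$$M_i = I - x_i(x_i'x_i)^{-1}x_i', \qquad S_b = \sum_i\hat\beta_i\hat\beta_i' - \frac{1}{N}\sum_i\hat\beta_i\sum_i\hat\beta_i'$$
and $\hat\beta_i$ and $\hat\varepsilon_i$ are obtained via panel-specific OLS (∴ requires large T)
Can test Ho : β1 = · · · = βN after estimation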
Stata: -xtrc-
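The FGLS steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not a replacement for -xtrc-: the function name `swamy_fgls` and the simulated data are hypothetical, and the fallback to S_b/(N − 1) when the estimated Ω is not positive definite is a common practical fix that the slides do not discuss.

```python
import numpy as np

def swamy_fgls(y_list, x_list):
    """y_list[i]: (T,) outcomes, x_list[i]: (T, K) regressors incl. intercept."""
    N = len(y_list)
    K = x_list[0].shape[1]
    b, V = [], []                      # beta_i hat and sigma2_i hat * (x_i'x_i)^{-1}
    for y, x in zip(y_list, x_list):
        T = x.shape[0]
        xtx_inv = np.linalg.inv(x.T @ x)
        bi = xtx_inv @ x.T @ y         # panel-specific OLS
        e = y - x @ bi
        s2 = e @ e / (T - K)
        b.append(bi)
        V.append(s2 * xtx_inv)
    b = np.array(b)
    dev = b - b.mean(axis=0)
    S_b = dev.T @ dev                  # equals sum b b' - (1/N) sum b sum b'
    omega = S_b / (N - 1) - sum(V) / N
    # Assumption (not in the slides): fall back if omega_hat is not positive definite
    if np.any(np.linalg.eigvalsh(omega) <= 0):
        omega = S_b / (N - 1)
    W = [np.linalg.inv(omega + Vi) for Vi in V]
    beta = np.linalg.inv(sum(W)) @ sum(Wi @ bi for Wi, bi in zip(W, b))
    return beta, b, omega

# Hypothetical simulated data with random coefficients
rng = np.random.default_rng(0)
N, T = 50, 40
beta_true = np.array([1.0, 2.0])
y_list, x_list = [], []
for i in range(N):
    x = np.column_stack([np.ones(T), rng.normal(size=T)])
    beta_i = beta_true + rng.normal(scale=0.5, size=2)
    sigma_i = rng.uniform(0.5, 1.5)
    y_list.append(x @ beta_i + sigma_i * rng.normal(size=T))
    x_list.append(x)
beta_hat, b_i, omega_hat = swamy_fgls(y_list, x_list)
print(beta_hat)
```

The weights use the form [Ω + σ²i (x′i xi)⁻¹]⁻¹, which avoids inverting the T × T blocks of Σ.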
Pesaran & Smith (1995) estimators: ci ≠ c, γi ≠ γ ∀i
Mean Group (MG) Estimator
I Estimate separate time series regressions for each i
I Obtain an average estimate of γ and β across the N regressions
Pooled (P) Estimator
I Pool data, but allow parameters to have a random component
Aggregate (A) Estimator
I Average the data across i (for each t)
I Estimate a single, aggregate time series regression
Cross-Section (CS) Estimator
I Average the data across t (for each i)
I Estimate a single, cross-section regression
⇒ MG provides estimates of individual-specific coeffs, as well as average coeffs; remaining estimators estimate only the average coeffs
See -xtmg- in Stata
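As a rough sketch of the MG idea (unit-by-unit time series regressions, then a simple average), the following uses a single covariate and a hypothetical simulated panel. The standard error shown is the cross-unit dispersion of the estimates divided by √N, which is the usual nonparametric choice for MG but is stated here as an assumption; names such as `mean_group` are illustrative only.

```python
import numpy as np

def mean_group(y, x):
    """y, x: (N, T) arrays. OLS of y_it on [1, y_it-1, x_it] for each i, then average."""
    N, T = y.shape
    coefs = np.empty((N, 3))
    for i in range(N):
        Z = np.column_stack([np.ones(T - 1), y[i, :-1], x[i, 1:]])
        coefs[i], *_ = np.linalg.lstsq(Z, y[i, 1:], rcond=None)
    mg = coefs.mean(axis=0)                          # average coefficients
    se = coefs.std(axis=0, ddof=1) / np.sqrt(N)      # dispersion-based standard errors
    return mg, se, coefs

# Hypothetical dynamic panel with heterogeneous gamma_i and beta_i
rng = np.random.default_rng(1)
N, T = 200, 200
gam = rng.uniform(0.2, 0.6, N)     # mean 0.4
bet = rng.normal(1.0, 0.2, N)      # mean 1.0
c = rng.normal(size=N)
x = rng.normal(size=(N, T))
y = np.zeros((N, T))
for t in range(1, T):
    y[:, t] = c + gam * y[:, t - 1] + bet * x[:, t] + rng.normal(size=N)
mg, se, coefs = mean_group(y, x)
print(mg, se)
```

Both N and T are large in this design because each unit-level regression carries a small-T bias that averaging across i does not remove.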
Properties
I In a static model (γi = 0 ∀i) with x strictly exogenous and βi differing randomly and distributed independently across i, all four estimators are unbiased and consistent for β
I In a dynamic model (γi ≠ 0 ∀i), this is not the case
F MG estimator remains consistent as N, T → ∞
F P and A estimators are inconsistent: the regressors are serially correlated (at least yit−1, due to ci), so ignoring coeff heterogeneity induces serial correlation in the error term, which leads to inconsistent estimates in models with lagged dependent vars (proofs in paper)
F CS estimator remains consistent for the average long-run coeffs, θ, where θi = βi/(1 − γi), but is biased and inconsistent if based on short T
Pooled estimator
I Estimating equation
$$y_{it} = c_i + \gamma y_{it-1} + x_{it}\beta + [\varepsilon_{it} + \eta_{1i}y_{it-1} + x_{it}\eta_{2i}] = c_i + \gamma y_{it-1} + x_{it}\beta + \upsilon_{it}$$
where E[yit−1υit] ≠ 0 and E[xitυit] ≠ 0 if x is serially correlated; E[yit−1υit] ≠ 0 even if x is not serially correlated since E[yit−1η1i] ≠ 0
I IV solution does not exist since any strong IV will also be correlated with υit
Aggregate estimator
I Estimating equation
$$\bar{y}_t = c + \gamma\bar{y}_{t-1} + \bar{x}_t\beta + \Big[\varepsilon_t + \frac{1}{N}\sum_i(\eta_{1i}y_{it-1} + x_{it}\eta_{2i})\Big] = c + \gamma\bar{y}_{t-1} + \bar{x}_t\beta + \upsilon_t$$
I OLS will be biased due to correlation between regressors and υ
I Unlikely that an IV solution exists
Cross-section estimator
I Estimating equation
$$\bar{y}_i = c_i + \gamma\bar{y}_{i,-1} + \bar{x}_i\beta + [\varepsilon_i + (\eta_{1i}\bar{y}_{i,-1} + \bar{x}_i\eta_{2i})] = c_i + \gamma\bar{y}_{i,-1} + \bar{x}_i\beta + \upsilon_i$$
which is the between estimator of the dynamic model
I OLS will be biased even if γ and β are homogeneous, but c is heterogeneous (due to correlation with $\bar{y}_{i,-1}$)
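Alternative cross-section estimator
I Estimating equation obtained by substituting
$$\bar{y}_{i,-1} = \bar{y}_i - (1/T)(y_{iT} - y_{i0}) = \bar{y}_i - \Delta_i$$
since $\bar{y}_{i,-1} = (1/T)\sum_{t=1}^{T} y_{it-1}$, and solving for $\bar{y}_i$
$$\bar{y}_i = \frac{1}{1-\gamma_i}c_i - \frac{\gamma_i}{1-\gamma_i}\Delta_i + \bar{x}_i\theta_i + \frac{1}{1-\gamma_i}\varepsilon_i$$
I Introduce heterogeneity as
$$\frac{\gamma_i}{1-\gamma_i} = \frac{\gamma}{1-\gamma} + \xi_{1i}, \qquad \theta_i = \theta + \xi_{2i}$$
yielding
$$\bar{y}_i = \frac{1}{1-\gamma_i}c_i - \frac{\gamma}{1-\gamma}\Delta_i + \bar{x}_i\theta + \Big[\frac{1}{1-\gamma_i}\varepsilon_i - \xi_{1i}\Delta_i + \bar{x}_i\xi_{2i}\Big]$$
which can be estimated by OLS omitting ∆i since it is asymptotically uncorrelated with $\bar{x}_i$
I Provides biased but consistent estimates of the average LR coeff, θ, if x is uncorrelated with c; bias vanishes only as T → ∞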
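The algebra behind the alternative cross-section estimating equation can be spelled out step by step. This only restates the substitution above, with bars denoting time averages for unit i and, as in the slides, εi denoting the time-averaged error.

```latex
\begin{align*}
\bar{y}_i &= c_i + \gamma_i \bar{y}_{i,-1} + \bar{x}_i\beta_i + \varepsilon_i
  && \text{(time average of the unit-$i$ model)} \\
          &= c_i + \gamma_i(\bar{y}_i - \Delta_i) + \bar{x}_i\beta_i + \varepsilon_i
  && \text{(substitute } \bar{y}_{i,-1} = \bar{y}_i - \Delta_i\text{)} \\
(1-\gamma_i)\,\bar{y}_i &= c_i - \gamma_i\Delta_i + \bar{x}_i\beta_i + \varepsilon_i
  && \text{(collect } \bar{y}_i\text{)} \\
\bar{y}_i &= \frac{c_i}{1-\gamma_i} - \frac{\gamma_i}{1-\gamma_i}\Delta_i
             + \bar{x}_i\theta_i + \frac{\varepsilon_i}{1-\gamma_i},
  && \theta_i = \frac{\beta_i}{1-\gamma_i}
\end{align*}
```

Because ∆i = (yiT − yi0)/T is divided by T, it shrinks as T grows when y is stationary, which is the sense in which omitting it is harmless only asymptotically.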
Pesaran, Shin & Smith (1997, 1999) estimator
MG estimator allows all parameters to vary by i
FE estimator allows only the intercept, c , to vary with i
Pooled Mean Group (PMG) estimator is a compromise
I Model written as an error correction model (ECM)
I The long-run coeffs in the cointegrating relationship, θ, are constrained to be equal across i
I The short-run coeffs and speed of adjustment parameters vary by i
I Stata: -xtpmg-
Balke & Fomby (1997): threshold cointegration allows the speed of adjustment parameter to vary between two regimes
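To see what "written as an ECM" means for the PMG estimator, the heterogeneous-slopes model from earlier in this section can be rearranged as below. This is a sketch for the simplest case with one lag of y and no lags of x; the published estimator allows a general ARDL lag structure, and the symbol φi for the speed of adjustment is introduced here for illustration.

```latex
\begin{align*}
y_{it} &= c_i + \gamma_i y_{it-1} + x_{it}\beta_i + \varepsilon_{it} \\
\Delta y_{it} &= c_i - (1-\gamma_i)\,y_{it-1} + x_{it}\beta_i + \varepsilon_{it} \\
              &= c_i + \phi_i\,(y_{it-1} - x_{it}\theta_i) + \varepsilon_{it},
  \qquad \phi_i = -(1-\gamma_i), \quad \theta_i = \frac{\beta_i}{1-\gamma_i}
\end{align*}
```

In this notation PMG imposes θi = θ for all i while leaving ci and φi unrestricted; MG leaves all three unrestricted.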
Nonstationary Panels
The increased use of panel data by more macro-oriented economists led to study of asymptotics as both N, T → ∞
Allowing T → ∞ led to concerns over nonstationarity, spurious regressions, cointegration
Estimation with nonstationary data is easier in some respects with panel data than with pure time series
I Many test statistics and estimators are asymptotically normal whereas they are not with time series
I Spurious regression gives consistent estimates as N, T → ∞
Brief overview of these topics; see Baltagi for details and additional references
Nonstationary Panels: Panel Unit Roots
Asymptotic properties of test statistics and estimators depend crucially on the way in which N and T go to ∞
Three possibilities
1 Sequential Limit: one, say N, goes to ∞ followed by the other, say T
2 Diagonal Path Limit: N, T → ∞ simultaneously, but along a specific, diagonal path s.t. T = T(N)
3 Joint Limit: N, T → ∞ simultaneously without placing diagonal path restrictions
Joint limit theory is the most robust, but is more difficult and requires stronger conditions (e.g., existence of higher moments)
Stata: -xtunitroot- (replaces previous, user-written commands)
Model
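Given by
$$y_{it} = \rho_i y_{it-1} + z_{it}\beta_i + \varepsilon_{it}$$
where
$$z_{it} = \begin{cases} 1 & \Rightarrow \text{FE model} \\ 0 & \Rightarrow \text{no constant} \\ [1\ \ t] & \Rightarrow \text{FE model with panel-specific time trend} \end{cases}$$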
Test Procedures
1 Levin, Lin, & Chu (LLC, 2002)
2 Harris & Tzavalis (1999)
3 Breitung (2000)
4 Im, Pesaran, & Shin (IPS, 2003)
5 Fisher-type tests
6 Hadri (2000)
7 Pesaran (2003)
Hlouskova & Wagner (2006) conduct a large-scale MC study of the different tests
Tests differ in terms of
1 Whether ρi = ρ ∀i is assumed
2 Rate at which N, T → ∞
Note: no constant ⇒ z = 0; trend ⇒ z = [1 t]
Before examining the panel tests, recall the ADF tests from a pure time series
I Model
$$y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{s=1}^{p}\gamma_s\Delta y_{t-s} + \varepsilon_t, \text{ or}$$
$$\Delta y_t = \alpha + \beta t + \delta y_{t-1} + \sum_{s=1}^{p}\gamma_s\Delta y_{t-s} + \varepsilon_t$$
where δ = γ − 1, and the random walk with and without drift are obtained by imposing the proper restrictions
I Test statistics are
$$DF_\tau = \frac{\hat\gamma - 1}{s.e.(\hat\gamma)}$$
or the usual t-statistic for δ
I Critical values depend on T and whether or not the intercept and/or the time trend are included
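The equivalence of the two test statistics can be checked numerically. The sketch below fits both forms of the ADF regression (with p = 1, intercept, and trend) on a simulated random walk and compares the statistics; the helper `ols` and the simulated series are illustrative assumptions, and critical values are deliberately not computed here since they come from nonstandard tabulated distributions.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and conventional standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (len(y) - X.shape[1])
    return b, np.sqrt(np.diag(s2 * XtX_inv))

rng = np.random.default_rng(2)
T = 200
y = np.cumsum(rng.normal(size=T))     # random walk without drift
dy = np.diff(y)                       # dy[k] = y[k+1] - y[k]

# Usable observations t = 2, ..., T-1 (0-based) when p = 1
trend = np.arange(2, T)
X = np.column_stack([np.ones(T - 2), trend, y[1:-1], dy[:-1]])  # [1, t, y_{t-1}, dy_{t-1}]

b_lev, se_lev = ols(X, y[2:])         # levels form: coefficient gamma on y_{t-1}
b_dif, se_dif = ols(X, dy[1:])        # differenced form: coefficient delta on y_{t-1}

df_tau = (b_lev[2] - 1.0) / se_lev[2]
t_delta = b_dif[2] / se_dif[2]
print(df_tau, t_delta)                # the two statistics coincide
```

They coincide because the two regressions share the same regressors and residuals, so the estimate of δ equals the estimate of γ minus one and both carry the same standard error.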
Harris & Tzavalis (1999) Test
I Model
$$y_{it} = \rho y_{it-1} + z_{it}\beta_i + \varepsilon_{it}$$
where ε is iid and normally distributed
I Hypotheses
Ho : all panels contain a unit root
Ha : no panels contain a unit root
I Test statistic is based on the derivation of the asymptotic dbn of $\hat\rho_{OLS}$ under Ho (and each choice of z)
$$\sqrt{N}(\hat\rho_{OLS} - \mu) \sim N(0, \sigma^2)$$
where µ and σ² depend on the choice of z
I Asymptotic dbn is valid as N → ∞ (even with fixed T)
I Test requires a balanced panel
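For intuition about the fixed-T normal approximation, consider only the simplest case z = 0 with yi0 = 0 and homoskedastic errors. Under Ho, the pooled OLS numerator Σt yit−1εit has variance σ⁴T(T − 1)/2 and the denominator Σt y²it−1 has mean σ²T(T − 1)/2, which gives µ = 1 and σ² = 2/[T(T − 1)], where T counts periods after the initial observation. The sketch below relies on that derivation and on those assumptions; the values of µ and σ² for z = 1 and z = [1 t] differ and are not reproduced here, and the function name is hypothetical.

```python
import numpy as np

def ht_noconstant(y):
    """y: (N, T+1) balanced panel including the initial observation y_i0.
    Returns pooled OLS rho_hat (no deterministic terms) and the standardized statistic."""
    N, T = y.shape[0], y.shape[1] - 1
    ylag, ycur = y[:, :-1], y[:, 1:]
    rho = (ylag * ycur).sum() / (ylag ** 2).sum()
    # mu = 1, sigma^2 = 2 / (T (T - 1)): valid only for z = 0, y_i0 = 0 (assumption)
    z_stat = np.sqrt(N) * (rho - 1.0) / np.sqrt(2.0 / (T * (T - 1)))
    return rho, z_stat

# Simulated data under Ho: every panel is a random walk starting at zero
rng = np.random.default_rng(3)
N, T = 2000, 10
eps = rng.normal(size=(N, T))
y = np.hstack([np.zeros((N, 1)), np.cumsum(eps, axis=1)])
rho_hat, z_stat = ht_noconstant(y)
print(rho_hat, z_stat)   # z_stat is approximately N(0, 1) under Ho; reject for large negative values
```

Note that T = 10 is small here: the approximation works through N, which is the point of the fixed-T result.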
Levin, Lin & Chu (LLC, 2002) Test
I Model
$$\Delta y_{it} = \phi y_{it-1} + z_{it}\beta_i + \sum_{s=1}^{p}\gamma_{is}\Delta y_{it-s} + u_{it}$$
where p is chosen based on some model selection criterion
I Hypotheses
Ho : all panels contain a unit root
Ha : no panels contain a unit root
I Test requires
1 εit to be independent across panels (zero cross-sectional dependence); uit to be white noise
2 √N/T or N/T → 0 depending on choice of z
3 balanced panel
I Test is based on a 'bias-adjusted' t-statistic for φ which has a N(0, 1) dbn
I However, procedure starts with panel-specific OLS estimates of the model
Breitung (2000) Test
I Amended version of LLC that performs better with fixed effects
I Hypotheses
Ho : all panels contain a unit root
Ha : no panels contain a unit root
I Test requires
1 a balanced panel
2 ε is iid
I A robust version is available that allows for contemporaneous cross-sectional dependence
Im, Pesaran & Shin (IPS, 2003) Test
I Model
$$\Delta y_{it} = \phi_i y_{it-1} + z_{it}\beta_i + \sum_{s=1}^{p}\gamma_{is}\Delta y_{it-s} + \varepsilon_{it}$$
where ε is iid normal across panels, but may be groupwise heteroskedastic
I Hypotheses
Ho : all panels contain a unit root (φi = 0 ∀i)
Ha : at least some panels do not contain a unit root
I Formally, Ha entails that N1/N → δ as N → ∞, with δ ∈ (0, 1], where N1 is the # of panels without a unit root
I Test statistic based on average of the ADF statistics for φ obtained from each panel separately
I Critical values based on fixed T and either fixed N or N → ∞
Combining p-values (Fisher-type) Test
I Developed in Maddala & Wu (1999)
I Hypotheses
Ho : all panels contain a unit root
Ha : at least some panels do not contain a unit root
I Similar to IPS, but rather than averaging test statistics across individual panels, test is based on aggregating p-values from individual time series unit root tests
I P-values obtained using either ADF or Phillips-Perron unit root tests applied to each panel
I Choi (2001) offers some extensions
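The aggregation step can be illustrated with the inverse chi-square form of the Fisher statistic, P = −2 Σi ln pi, which is χ² with 2N degrees of freedom under Ho when the N panel-level p-values are independent. The sketch below takes the p-values as given (computing them requires ADF or Phillips-Perron critical value surfaces, which are not reproduced here); the function name is hypothetical, and the closed-form survival function is used only because the degrees of freedom are even.

```python
import math

def fisher_combined(pvalues):
    """Return (P, combined p-value) for independent panel-level p-values."""
    N = len(pvalues)
    P = -2.0 * sum(math.log(p) for p in pvalues)
    half = P / 2.0
    # Chi-square survival function with even df = 2N:
    # exp(-x/2) * sum_{k=0}^{N-1} (x/2)^k / k!
    term, total = 1.0, 1.0
    for k in range(1, N):
        term *= half / k
        total += term
    return P, math.exp(-half) * total

# Hypothetical p-values from three panel-specific unit root tests
P_stat, p_combined = fisher_combined([0.01, 0.02, 0.5])
print(P_stat, p_combined)
```

A small combined p-value rejects the null that all panels contain a unit root; with a single panel the combined p-value reduces to the original one.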
Residual-Based LM Test
I Developed in Hadri (2000); flips the null and alternative hypotheses to address bias toward the null in classical statistical tests
I Hypotheses
Ho : no panels contain a unit root
Ha : at least one panel contains a unit root
I Test designed for large T, moderate N panels; asymptotically test requires T → ∞ followed by N → ∞
I Test statistic is based on the OLS residuals
I Test requires a balanced panel
Pesaran (2003) Test
I Extends IPS to allow for cross-sectional dependence; referred to as CIPS
I Test statistic based on average of the cross-sectionally augmented DF (CADF) statistics obtained from each panel separately
I CADF regressions include the lagged cross-section mean, $\bar{y}_{t-1}$, and its FD, $\Delta\bar{y}_t$, in the ADF regressions
I Stata: -pescadf-
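Written out for the case without lag augmentation (p = 0), the CADF regression and the resulting statistic look roughly as follows. The coefficient symbols κi and πi and the notation ti for the panel-level t-ratio are introduced here for illustration; with p > 0, lagged differences of both yit and the cross-section mean would also enter.

```latex
\begin{align*}
\Delta y_{it} &= \phi_i y_{it-1} + z_{it}\beta_i + \kappa_i \bar{y}_{t-1} + \pi_i \Delta\bar{y}_t + \varepsilon_{it},
  \qquad \bar{y}_t = \frac{1}{N}\sum_{i=1}^{N} y_{it} \\
\text{CIPS} &= \frac{1}{N}\sum_{i=1}^{N} t_i,
  \qquad t_i = \text{t-ratio on } \hat\phi_i \text{ in the regression for panel } i
\end{align*}
```

The cross-section means act as proxies for an unobserved common factor, which is how the test accommodates cross-sectional dependence.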
Nonstationary Panels: Spurious Regression
Entorf (1997)
I FE regressions involving independent random walks with and without drift
I For T → ∞ and N fixed, standard FE results are highly misleading; significant, high R²
Phillips & Moon (1999)
I Analyzed four cases:
1 Panel spurious regression with no cointegration
2 Heterogeneous panel cointegration (vectors vary across i)
3 Homogeneous panel cointegration
4 Near-homogeneous panel cointegration
I Show that pooled OLS is consistent in each case using both sequential and joint limits
I Intuition: information from independent cross-section data is valuable
Choi (2013)
My own simulations follow ...
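Simulations
I DGP
$$y_{it} = c_i + y_{it-1} + \varepsilon_{yt}, \quad t = 2, \dots, T$$
$$x_{it} = c_i + x_{it-1} + \varepsilon_{xt}, \quad t = 2, \dots, T$$
$$y_{i1} = c_i, \qquad x_{i1} = c_i$$
$$\varepsilon_{xt}, \varepsilon_{yt} \sim N(0, 1)$$
I Reps = 20,000
I Sample sizes
1 N = 10, T = 101
2 N = 10, T = 1001
3 N = 1000, T = 101
4 N = 1000, T = 1001
I Estimating equation
$$y_{it} = c_i + \beta x_{it} + \varepsilon_{it}$$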
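A scaled-down version of this design can be sketched as follows, for the smallest sample size and far fewer replications. Two things are assumptions rather than part of the slides: ci is drawn from N(0, 1), since its distribution is not stated, and the shocks are drawn independently for each unit and period. Only the FE (within) estimate of β is computed; t-statistics are omitted.

```python
import numpy as np

def fe_beta(y, x):
    """Within (FE) estimator of beta in y_it = c_i + beta * x_it + e_it."""
    yw = y - y.mean(axis=1, keepdims=True)
    xw = x - x.mean(axis=1, keepdims=True)
    return (xw * yw).sum() / (xw ** 2).sum()

def simulate(N, T, reps, rng):
    betas = np.empty(reps)
    for r in range(reps):
        c = rng.normal(size=(N, 1))              # assumption: c_i ~ N(0, 1)
        ey = rng.normal(size=(N, T - 1))
        ex = rng.normal(size=(N, T - 1))
        # y_i1 = c_i and y_it = c_i + y_it-1 + e_yt for t = 2, ..., T (same for x)
        y = np.hstack([c, c + np.cumsum(c + ey, axis=1)])
        x = np.hstack([c, c + np.cumsum(c + ex, axis=1)])
        betas[r] = fe_beta(y, x)
    return betas

rng = np.random.default_rng(4)
betas = simulate(N=10, T=101, reps=200, rng=rng)  # the slides use 20,000 reps
print(betas.mean(), betas.std())
```

Because y and x share the drift ci, both trend at the same rate within each unit, which pulls the estimate toward one even though the stochastic parts of the two series are independent.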
[Figure: Spurious Regression, Empirical Distribution of Beta. Density plotted against Beta (x-axis from .7 to 1.2). Note: y=c+L.y+ey; x=c+L.x+ex; ey,ex ~ N(0,1). N = 10; T = 101; Reps = 20000.]
[Figure: Spurious Regression, Empirical Distribution of t-Statistics. Density plotted against the t-statistic (x-axis from 0 to 400). Note: y=L.y+ey; x=L.x+ex; ey,ex ~ N(0,1). N = 10; T = 101; Reps = 20000.]
[Figure: Spurious Regression, Empirical Distribution of Beta. Density plotted against Beta (x-axis from .9 to 1.1). Note: y=c+L.y+ey; x=c+L.x+ex; ey,ex ~ N(0,1). N = 10; T = 1001; Reps = 20000.]
[Figure: Spurious Regression, Empirical Distribution of t-Statistics. Density plotted against the t-statistic (x-axis from 0 to 4000). Note: y=L.y+ey; x=L.x+ex; ey,ex ~ N(0,1). N = 10; T = 1001; Reps = 20000.]
[Figure: Spurious Regression, Empirical Distribution of Beta. Density plotted against Beta (x-axis from .96 to 1). Note: y=c+L.y+ey; x=c+L.x+ex; ey,ex ~ N(0,1). N = 1000; T = 101; Reps = 20000.]
[Figure: Spurious Regression, Empirical Distribution of t-Statistics. Density plotted against the t-statistic (x-axis from 1400 to 1800). Note: y=L.y+ey; x=L.x+ex; ey,ex ~ N(0,1). N = 1000; T = 101; Reps = 20000.]
[Figure: Spurious Regression, Empirical Distribution of Beta. Density plotted against Beta (x-axis from .99 to 1.005). Note: y=c+L.y+ey; x=c+L.x+ex; ey,ex ~ N(0,1). N = 1000; T = 1001; Reps = 20000.]
[Figure: Spurious Regression, Empirical Distribution of t-Statistics. Density plotted against the t-statistic (x-axis from 14000 to 18000). Note: y=L.y+ey; x=L.x+ex; ey,ex ~ N(0,1). N = 1000; T = 1001; Reps = 20000.]
Nonstationary Panels: Cointegration
Model
$$y_{it} = \beta_i x_{it} + z_{it}\gamma_i + \varepsilon_{it}$$
where y and x are I(d), d > 0, variables,
$$z_{it} = \begin{cases} 1 & \Rightarrow \text{FE model} \\ 0 & \Rightarrow \text{no constant} \\ [1\ \ t] & \Rightarrow \text{FE model with panel-specific time trend} \end{cases}$$
and εit is I(d − 1) iff y and x are cointegrated
Homogeneity or poolability refers to whether the cointegrating vector is common across panels, βi = β
Tests differ depending on whether they allow for cross-sectional dependence, structural breaks, etc.
Error correction models allow for estimation of adjustment speed conditional on cointegration
Test procedures
Kao (1999) Tests
I Based on DF- or ADF-type tests of the FE residuals
I Different test statistics depending on assumptions concerning strict exogeneity
I Ho : no cointegration
Residual-Based LM Test
I Developed in McCoskey & Kao (1998)
I Based on residuals from Fully Modified OLS (FMOLS) or dynamic OLS (DOLS)
I Ho : cointegration
I Stata: -xtdolshm-
Pedroni (2000, 2004) Tests
I Ho : no cointegration
I Stata: -xtpedroni-
Westerlund (2007)
I Stata: -xtwest-