A Spatial Panel Data Model with Time Varying Endogenous Weights Matrices and Common Factors (I)
Wei Shi (a,∗), Lung-fei Lee (b)
(a) Institute for Economic and Social Research, Jinan University, Guangzhou, Guangdong 510632, China.
(b) Department of Economics, The Ohio State University, Columbus, Ohio 43210, United States.
Abstract
Many spatial panel data sets exhibit cross sectional and/or intertemporal dependence from spatial interac-
tions or common factors. In an application of a spatial autoregressive model, a spatial weights matrix may
be constructed from variables that may correlate with unobservables in the main equation and therefore is
endogenous. Some common factors may be unobserved and correlate with included regressors in the equa-
tion. This paper presents a unified approach to model spatial panels with endogenous time varying spatial
weights matrices and unobserved common factors. We show that the proposed QML estimator is consistent
and asymptotically normal. As its limiting distribution may have a leading order bias, an analytical bias
correction is proposed. Monte Carlo simulations demonstrate good finite sample properties of the estima-
tors. This model is empirically applied to examine the effects of house price dynamics on reverse mortgage
origination rates in the United States.
Keywords: Spatial panel data, endogenous spatial weighting matrix, multiplicative individual and time
effects, QMLE, reverse mortgages
JEL classification: C13, C23, C51, G21
(I) This version: 9/7/2016. We would like to thank Stephanie Moulton for sharing the HECM loan data, and participants at seminars at the Ohio State University, the 15th International Workshop on Spatial Econometrics and Statistics, the 2016 North American Summer Meeting of the Econometric Society, and the 2016 China Meeting of the Econometric Society for comments. The usual disclaimer applies.
∗ Corresponding author.
Email addresses: [email protected] (Wei Shi), [email protected] (Lung-fei Lee)
1. Introduction
The increasing availability of large panel data brings many opportunities that are not available with only
cross sectional or time series data (see Hsiao (2014) for a good review). Individual heterogeneity can be
better accounted for in a panel. In a two-way fixed effects model, observed regressors can correlate with an
individual and time invariant component (the individual effect), and a common and time variant component
(the time effect) of the error term. Recently the additive convention on individual and time effects has been extended
to allow for additional interactive effects, where unobserved time factors can have heterogeneous impacts on
individuals (Pesaran (2006), Bai (2009), Ahn et al. (2013) and Moon and Weidner (2015)). In those models,
the unobservables exhibit a factor structure and the covariates may correlate with these factors. The factors
can capture omitted economy-wide shocks, etc, and the two-way fixed effects model is a special case of a
general interactive fixed effects model.
Cross sectional dependence can be generated with unobserved time factors as they influence all cross
sectional units. In addition, cross sectional dependence can also arise due to spatial effects. In terms of
econometric modeling, spatial effects can be modeled with a spatially lagged dependent variable (Cliff and
Ord (1973)). A spatial weights matrix provides a structure on spatial dependence, and outcomes of individuals
close by may be interdependent. Estimation methods of the spatial autoregressive (SAR) model have been
well established for an exogenously given spatial weights matrix (see, inter alia, Kelejian and Prucha (2001),
Kelejian and Prucha (2010), Lee (2004) and Yu et al. (2008)). In a SAR model, an explanatory variable may
have both a direct effect on an individual’s outcome and an indirect effect through spillovers on its
neighbors. In addition, spatial correlation may be modeled with spatially dependent errors. For example,
Pesaran and Tosetti (2011) consider cross sectionally dependent errors that have factor and spatial structures,
which are interpreted, respectively, as strong and weak cross section dependence (also see Holly et al. (2010)
and Holly et al. (2011) for empirical applications to house prices).
A spatial weights matrix can be based on geographic distances, or in many applications, economic theory
may inform how individuals are connected with links constructed using economic variables (see the appli-
cations in Conley and Ligon (2002), Kelejian and Piras (2014)). However, when a spatial weights matrix
is constructed from economic variables, the exogeneity assumption may not be satisfied if these included
economic variables in the SAR equation correlate with disturbances of the equation. In a recent paper, Hor-
race et al. (2015) propose a network production model where a manager selects workers into the network
and whose decision may depend on factors that correlate with the network unobserved characteristics in the
production function. They show that the endogenous effect can be accounted for by using a Heckman type
parametric or semi-parametric approach, or by including network fixed effects. Qu and Lee (2015) consider
a cross section SAR model with an endogenously generated spatial weights matrix. Their approach caters to
cases where individuals reside in a space, which is a common feature in many regional science applications.
Using a control function approach, they provide consistent and asymptotically normal IV, QMLE and GMM
estimators. Kelejian and Piras (2014) propose a 2SLS estimator for a panel data model with an endogenous
spatial weights matrix and additive individual and time effects.
Panels with interactive effects also have wide empirical applications. For example, in regional policy
evaluations, interactive effects can better control for time varying cross section heterogeneities (Gobillon
and Magnac (2015)). They have been used to study the effect of political and economic integration of
Hong Kong with mainland China (Hsiao et al. (2012)), the effect of divorce law reforms on divorce rates
(Kim and Oka (2014)), among others. As noted in Blundell et al. (2004), researchers need to weigh the
benefits of using close neighbors as controls which are more likely to be otherwise the same as the treated,
against the potential contaminating spatial spillover effects. In the regression panel with interactive effects,
one might argue that spatial effects are absorbed into the factor loadings. However, this approach will likely
be inadequate if spatial effects are time varying or endogenous. Furthermore, spatial correlation may be due
to agents interacting with each other, but not just through spatial similarity in factor loading responses. In
addition, the direct and indirect spillover effect of a policy may also be of interest.
This paper extends endogeneity of spatial weights in a cross section setting in Qu and Lee (2015) to a
large n,T panel data model with time varying spatial weights matrices and unobserved interactive individual
and time effects. Bai and Li (2014) and Shi and Lee (2015) consider a panel SAR model with interactive
effects and an exogenously given time invariant spatial weights matrix. In this paper, we have time varying
endogenous spatial weights matrices constructed from variables that may correlate with the disturbances in
the main equation. The interactive fixed effects are controlled for in equations and are treated as parameters.
Even though spatial weights matrices are endogenous and time varying, interactive individual effects and
time factors can be concentrated out. The profile objective function (concentrated likelihood function)
includes a sum of certain eigenvalues of a random matrix. For a regression panel model, Moon and Weidner
(2014, 2015) apply the perturbation method in Kato (1995) to derive its approximate gradient vector and
Hessian matrix, so that asymptotic distributions of estimates of the regression coefficients can be derived.
We show that the perturbation method can also be applied beyond regression panels to our model and that the
QML estimator that takes endogenous spatial weights matrices into account is consistent and asymptotically
normal.
We demonstrate the estimator’s good finite sample performance by a Monte Carlo study. We apply the
model to analyze the effect of house price dynamics on reverse mortgage origination rates in the U.S. We
find that in the reverse mortgage market, loan activity in one state influences activities in neighboring states,
but that the spillover effect is smaller for states dominated by large lenders. This suggests that demand
side factors such as common media exposure may be the driving force behind spatial spillovers in reverse
mortgage activities.
This paper is organized as follows. We discuss our spatial panel model in the next section. We present the
QML estimator in Section 3. Under some regularity assumptions, we investigate its asymptotic properties in
Section 4, mainly its consistency and limiting distribution. As the limiting distribution may not be properly
centered, an analytic bias correction is proposed. A Monte Carlo study is reported in Section 5. The
empirical application on reverse mortgage origination rates is in Section 6. Section 7 concludes.
Notation 1. For a finite dimensional vector $\eta$, $\|\eta\|_1 = \sum_k |\eta_k|$ and $\|\eta\|_2 = \sqrt{\sum_k |\eta_k|^2}$. Let $\mu_i(M)$ denote the $i$-th largest eigenvalue of a symmetric matrix $M$ of dimension $n$, with eigenvalues listed in decreasing order such that $\mu_n(M) \le \mu_{n-1}(M) \le \cdots \le \mu_1(M)$. For a real matrix $A$, its spectral norm is $\|A\|_2 = \sqrt{\mu_1(A'A)}$. In addition, $\|A\|_1 = \max_{1\le j\le n} \sum_{i=1}^{n} |A_{ij}|$ is its maximum column sum norm, $\|A\|_\infty = \max_{1\le i\le n} \sum_{j=1}^{n} |A_{ij}|$ is its maximum row sum norm, and $\|A\|_F = \sqrt{\operatorname{tr}(AA')}$ is its Frobenius norm. Define the projection matrices $P_A = A(A'A)^{-1}A'$ and $M_A = I - A(A'A)^{-1}A'$. In general, lower case letters represent vectors and upper case letters represent matrices.
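As an illustration outside the paper's formal development, the norms and projectors of Notation 1 can be computed numerically. The sketch below assumes NumPy; the function names `matrix_norms` and `projectors` and the example matrix are hypothetical.

```python
import numpy as np

def matrix_norms(A):
    """Return the spectral, column-sum, row-sum and Frobenius norms of A."""
    A = np.asarray(A, dtype=float)
    spectral = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())  # sqrt(mu_1(A'A))
    col_sum = np.abs(A).sum(axis=0).max()                  # ||A||_1
    row_sum = np.abs(A).sum(axis=1).max()                  # ||A||_inf
    frobenius = np.sqrt(np.trace(A @ A.T))                 # ||A||_F
    return spectral, col_sum, row_sum, frobenius

def projectors(A):
    """Return P_A = A (A'A)^{-1} A' and M_A = I - P_A for a full column rank A."""
    A = np.asarray(A, dtype=float)
    P = A @ np.linalg.solve(A.T @ A, A.T)
    return P, np.eye(A.shape[0]) - P

# Hypothetical 3 x 2 example matrix.
A = np.array([[1.0, -2.0], [3.0, 4.0], [0.0, 1.0]])
spec, col, row, fro = matrix_norms(A)
P, M = projectors(A)
```

Here $M_A A = 0$ and $P_A$ is idempotent, which is what makes $M_A$ useful for removing a factor component later on.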
2. The model
2.1. Model specification
This paper analyzes panel data where individuals are located in space through time. The data set contains
n individuals indexed by i, and each of them has observations for T time periods indexed by t. Individuals
are located on a possibly unevenly spaced lattice $D \subset \mathbb{R}^{d_0}$ with $d_0 \ge 1$, which is time invariant and can be based on geographic variables. The location $l: \{1,\cdots,n\} \to D_n \subset D$ is a mapping of individual $i$ to its location $l(i) \in D \subset \mathbb{R}^{d_0}$. In addition, let $D_T \subset \mathbb{Z}$ denote the set of time indexes. An individual $i$ at time $t$ has geographic location $l(i)$ and time location $t$. This is an adaptation of the topological structure of spatial processes in Jenish and Prucha (2009, 2012) and its extension by Qu and Lee (2015) to the panel data setting, where the distance between $it$ (individual $i$ at time $t$) and $js$ depends on both individual identities and time indexes. Define the metric $\rho_{it,js} = \rho(it,js) = \max\{\max_{1\le k\le d_0} |l(i)_k - l(j)_k|,\ |t-s|\}$, where $l(i)_k$ is the $k$-th component of the $d_0 \times 1$ vector $l(i)$. For asymptotic analysis, sample expansion is ensured by an unbounded
expansion of the sample region (increasing domain asymptotics) along both the cross section and the time
dimensions. Note that although individuals’ geographic locations are time invariant, the strength of a spatial
link may depend also on economic variables, which can vary over time. Those will be made precise later.
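The space-time metric can be sketched in a few lines. This is a minimal illustration, assuming locations are stored as NumPy arrays; the function name and the example coordinates are hypothetical.

```python
import numpy as np

def space_time_distance(loc_i, t, loc_j, s):
    """rho(it, js) = max( max_k |l(i)_k - l(j)_k| , |t - s| )."""
    loc_i = np.atleast_1d(np.asarray(loc_i, dtype=float))
    loc_j = np.atleast_1d(np.asarray(loc_j, dtype=float))
    spatial = np.max(np.abs(loc_i - loc_j))  # largest coordinate-wise gap
    return max(float(spatial), float(abs(t - s)))

# Hypothetical locations in R^2 (d_0 = 2).
locations = np.array([[0.0, 0.0], [1.0, 2.0], [4.0, 1.0]])
d_same_period = space_time_distance(locations[0], 3, locations[1], 3)
d_across_time = space_time_distance(locations[0], 1, locations[1], 6)
```

Within a period the metric reduces to the largest coordinate gap between two locations; across periods the time gap can dominate.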
Assumption 1. The lattice $D \subset \mathbb{R}^{d_0}$, $d_0 \ge 1$, is infinitely countable. All elements in $D$ are located at distances of at least $\rho_0 > 0$ from each other. Without loss of generality, we assume that $\rho_0 = 1$.
The equation of interest is
$$y_{nt} = \lambda W_{nt} y_{nt} + X_{nty}\beta_y + \tilde\Gamma_{ny}\tilde f_{yt} + v_{nt}, \qquad (1)$$
where $y_{nt}$ is $n\times 1$ and $X_{nty}$ is an $n \times k_y$ matrix of $k_y$ exogenous regressors. $W_{nt}$ is an $n\times n$ spatial weights matrix with elements denoted as $w_{it,jt,nT}$. A spatial weights matrix measures degrees of connections between spatial units and is specified as nonnegative and with zero diagonals. Let $y_{it}$ denote the $i$-th element of $y_{nt}$. In Eq. (1), $\lambda w_{it,jt,nT}$ measures the impact of $y_{jt}$ on $y_{it}$ and $\lambda$ is the spatial interaction coefficient. Economic theories
may suggest variables to be included and how the spatial weights matrix is constructed. For example, in
the study of cross country growth spillovers (Conley and Ligon (2002)), a distance measure between two
countries includes their geographic distance and transportation costs of goods and labor. In the analysis of
cigarette demand (Kelejian and Piras (2014)), weights are based on the relative prices of cigarettes between
two neighboring states. Essentially the spatial weights matrix provides information on how an individual
is relatively affected by its neighbors. Note that individuals’ geographic locations in Assumption 1 may
also limit degrees of spatial interactions among individuals (see Assumption 6). In the following, we drop the subscript $nT$ in $w_{it,jt,nT}$ and simplify it to $w_{it,jt}$ when no confusion arises, while keeping in mind that elements like $w_{it,jt}$ may depend on $n$ and $T$.
Remark 1. Eq. (1) can be interpreted as a best response function where individual $i$'s optimal action (e.g. contribution towards a public good) depends on its neighbors' actions. The spatial interaction coefficient $\lambda$ reflects the indirect effect of neighbors' $X_{nty}$, which can be seen from the reduced form of Eq. (1),
$$y_{nt} = (I_n - \lambda W_{nt})^{-1}\left(X_{nty}\beta_y + \tilde\Gamma_{ny}\tilde f_{yt} + v_{nt}\right) = \sum_{s=0}^{\infty} \lambda^s W_{nt}^s \left(X_{nty}\beta_y + \tilde\Gamma_{ny}\tilde f_{yt} + v_{nt}\right),$$
assuming that $I_n - \lambda W_{nt}$ is invertible and its inverse can be represented by a Neumann series. $W_{nt}^s$ reflects the influence of $s$-th level neighbors. Also note that the model (1) and subsequent analysis can be easily extended to the case with multiple spatial weights matrices.
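The Neumann series representation in Remark 1 can be checked numerically. The following is a sketch under stated assumptions: the weights matrix is a hypothetical row-normalized random matrix, $\lambda = 0.4$ is an arbitrary value with $|\lambda| < 1$, and a single vector stands in for the whole right-hand side of the reduced form.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 6, 0.4

# Hypothetical row-normalized weights with zero diagonal, so ||W||_inf = 1.
W = rng.uniform(size=(n, n))
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)

# Stands in for X_nty beta_y + common factor component + v_nt.
rhs = rng.normal(size=n)

# Exact reduced form (I_n - lam W)^{-1} rhs.
exact = np.linalg.solve(np.eye(n) - lam * W, rhs)

# Truncated Neumann series: sum over s = 0..59 of lam^s W^s rhs.
approx = np.zeros(n)
term = rhs.copy()
for s in range(60):
    approx += term
    term = lam * (W @ term)

max_gap = np.max(np.abs(exact - approx))
```

Each additional term adds the contribution of one further level of neighbors, and the terms shrink geometrically because $|\lambda|\,\|W\|_\infty < 1$.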
In an economic model, it is unlikely that included regressors capture all the information. When omit-
ted variables exist and correlate with included regressors, a standard estimation method generally becomes
invalid. In large panels with long time dimension (many time periods), to account for possible unobserved
effects, we postulate that the error term in a SAR panel equation can be decomposed into a common factor
component and an idiosyncratic component, where common factors can potentially correlate with included
regressors. In Eq. (1), the dependent variable is affected by $\tilde R_y$ unobserved factors, where the vector of time factors, $\tilde f_{yt}$, is $\tilde R_y \times 1$ and its loading matrix, $\tilde\Gamma_{ny}$, is $n \times \tilde R_y$, which allows time factors to have heterogeneous impacts on individuals. We treat factors and loadings as fixed effects to allow for hidden complex structures. $v_{nt} = (v_{1t,nT}, \cdots, v_{nt,nT})'$ is an $n\times 1$ vector of idiosyncratic shocks with zero mean and finite variance $\sigma_v^2$. The factor structure is a flexible yet parsimonious way to model common shocks or a variance-covariance structure of a panel.¹
2.2. Sources of endogeneity
When the degree of spatial interaction is jointly determined with the outcome, Wnt becomes endogenous.
Over time, the spatial weights matrix may also be time varying to reflect changing economic environments
and network links. In this section, we illustrate the types of endogeneity covered in this paper.
The fixed effects formulation allows correlation between $W_{nt}$ and $y_{nt}$ due to the common factors. For example, the spatial weights can be influenced by the same time factors $\tilde f_{yt}$ that impact $y_{nt}$, i.e., $w_{it,jt} = h(\gamma'_{w,ij}\tilde f_{yt} + \varepsilon_{it,jt})$ with conformable factor loading $\gamma_{w,ij}$, even with $\varepsilon_{it,jt}$ being independent of $v_{nt}$, where $h(\cdot)$ is some transformation. This can be covered once time factors are treated as fixed time effects in
estimation. More interestingly, the spatial weights matrix may be constructed by variables that may correlate
with disturbances of the outcomes. Suppose there are $p$ variables, $z_{it1}, \cdots, z_{itp}$, used to construct $W_{nt}$. The equations for $z_{itl}$, $l = 1,\cdots,p$, are
$$z_{itl} = x'_{itzl}\beta_{zl} + \tilde\gamma'_{izl} f_{zt} + \varepsilon_{itl}, \qquad (2)$$
where $x_{itzl}$ are $k_{zl}\times 1$ regressors with corresponding coefficient vector $\beta_{zl}$, and the unobservables have two components: $f_{zt}$ consists of $R_z \times 1$ time factors with loading $\tilde\gamma_{izl}$, and $\varepsilon_{itl}$ is an idiosyncratic error. Stacking Eq. (2) across $i$ and then over $l$ for time period $t$, we have $z_{nt} = (z_{1t1},\cdots,z_{nt1},z_{1t2},\cdots,z_{ntp})'$, an $np\times 1$ vector
which has the structure
$$z_{nt} = X_{ntz}\beta_z + \tilde\Gamma_{nz} f_{zt} + \varepsilon_{nt}, \quad \text{where } X_{ntz} = \begin{pmatrix}
x'_{1tz1} & & \\
\vdots & & 0_{n\times k_{zp}} \\
x'_{ntz1} & & \\
 & \ddots & \\
 & & x'_{1tzp} \\
0_{n\times k_{z1}} & & \vdots \\
 & & x'_{ntzp}
\end{pmatrix} \qquad (3)$$
is $np \times k_z$ with $k_z = k_{z1}+\cdots+k_{zp}$; $\beta_z$ is a column vector of dimension $k_z$, $\beta_z = (\beta'_{z1},\cdots,\beta'_{zp})'$; $\tilde\Gamma_{nz} = (\tilde\gamma_{1z1},\cdots,\tilde\gamma_{nz1},\tilde\gamma_{1z2},\cdots,\tilde\gamma_{nzp})'$ is $np\times R_z$; and $\varepsilon_{nt} = (\varepsilon_{1t1},\cdots,\varepsilon_{nt1},\varepsilon_{1t2},\cdots,\varepsilon_{ntp})'$ is an $np\times 1$ vector. Note that $X_{nty}$ and $X_{ntz}$ may have some common regressors. Let $w_{it,jt} = h_{it,jt}(z_{nt},\rho_{it,jt})$ for $i,j = 1,\cdots,n$, $i\ne j$, where the $h(\cdot)$'s are uniformly bounded functions. $w_{it,jt}$ exhibits limited spatial dependence via distance $\rho_{it,jt}$, which will be made precise in Assumption 6.

¹ In the event that time factors are present but mistakenly ignored, the spatial lags might mistakenly capture the influence of the time factors as if it were generated by spatial interactions in estimation. So it is important to control for time factors in order to properly recover spatial interaction.
The following conditional moment assumption specifies that the disturbances in znt may correlate with
vit of the ynt equation. Indirectly, as Wnt is a function of znt , the spatial weights may be endogenous. For an
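individual $i$, collect the error terms $\varepsilon_{itl}$, $l = 1,\cdots,p$, in its $z_{itl}$ equations and define the $p\times 1$ vector $\varepsilon_{it} = (\varepsilon_{it1},\cdots,\varepsilon_{itp})'$.
Assumption 2. The error terms $(v_{it}, \varepsilon'_{it})'$ are independently and identically distributed across $i$ and over $t$, and have a joint distribution $(v_{it},\varepsilon'_{it})' \sim (0, \Sigma_{v\varepsilon})$, where $\Sigma_{v\varepsilon} = \begin{pmatrix} \sigma_v^2 & \sigma'_{v\varepsilon} \\ \sigma_{v\varepsilon} & \Sigma_\varepsilon \end{pmatrix}$ is positive definite, $\sigma_v^2$ is a scalar variance, the covariance $\sigma_{v\varepsilon} = (\sigma_{v\varepsilon_1}, \cdots, \sigma_{v\varepsilon_p})'$ is a $p\times 1$ vector, and $\Sigma_\varepsilon$ is a $p\times p$ matrix. Furthermore, $\sup_{n,T}\sup_{i,t} E|v_{it}|^{4+\delta_\varepsilon}$ and $\sup_{n,T}\sup_{i,t} E\|\varepsilon_{it}\|^{4+\delta_\varepsilon}$ exist for some $\delta_\varepsilon > 0$. Denote $E(v_{it}|\varepsilon_{it}) = \varepsilon'_{it}\delta$ and define $\xi_{it} = v_{it} - \varepsilon'_{it}\delta$. Assume that $E(\xi^2_{it}|\varepsilon_{it}) = E(\xi^2_{it}) = \sigma_\xi^2$, $E(\xi^3_{it}|\varepsilon_{it}) = E(\xi^3_{it})$ and $E(\xi^4_{it}|\varepsilon_{it}) = E(\xi^4_{it})$.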
Assumption 2 is used in Qu and Lee (2015) to model the endogeneity of the spatial weights matrix.
If $\sigma_{v\varepsilon} \ne 0$, the spatial weights matrix correlates with disturbances in the outcome equation of the same period and estimation methods that rely on exogenous spatial weights matrices might no longer be valid. As $E(\xi_{it}|\varepsilon_{it}) = 0$ and $\operatorname{var}(\xi_{it}|\varepsilon_{it}) = \sigma_\xi^2$, the disturbances in the model can therefore be rewritten in terms of $\varsigma_{it} = (\xi_{it}, \varepsilon'_{it})'$. Note that the conditional moment assumptions do not require $\xi_{it}$ to be independent of $\varepsilon_{it}$, as they are sufficient for our analysis. Denote $E_{nT} = (\varepsilon_{n1},\cdots,\varepsilon_{nT})$ and $\Xi_{nT} = (\vec\xi_{n1},\cdots,\vec\xi_{nT})$ with $\vec\xi_{nt} = (\xi_{1t},\cdots,\xi_{nt})'$, respectively $np\times T$ and $n\times T$ matrices of the disturbances.
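The control function decomposition $v_{it} = \varepsilon'_{it}\delta + \xi_{it}$ can be illustrated by simulation. This is a sketch only: the covariance matrix below is a made-up positive definite example, the draws are jointly normal (which Assumption 2 does not require), and $\delta = \Sigma_\varepsilon^{-1}\sigma_{v\varepsilon}$ is the coefficient implied by the linear conditional mean.

```python
import numpy as np

rng = np.random.default_rng(1)
p, N = 2, 200_000

# Hypothetical joint covariance of (v_it, eps_it')':
# [[sigma_v^2, sigma_ve'], [sigma_ve, Sigma_eps]].
Sigma = np.array([[1.0, 0.3, -0.2],
                  [0.3, 1.0, 0.4],
                  [-0.2, 0.4, 1.5]])
sigma_ve, Sigma_eps = Sigma[1:, 0], Sigma[1:, 1:]

draws = rng.multivariate_normal(np.zeros(p + 1), Sigma, size=N)
v, eps = draws[:, 0], draws[:, 1:]

delta = np.linalg.solve(Sigma_eps, sigma_ve)  # from cov(eps, v) = Sigma_eps delta
xi = v - eps @ delta                          # residual xi_it = v_it - eps_it' delta
sigma_xi2 = Sigma[0, 0] - sigma_ve @ delta    # implied variance of xi_it

# Sample covariance between xi and each component of eps (should be near zero).
sample_cov_xi_eps = (xi[:, None] * eps).mean(axis=0)
```

The residual is uncorrelated with $\varepsilon_{it}$ by construction, which is why including $\varepsilon_{nt}\delta$ as a control removes the endogeneity that enters through $W_{nt}$.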
Claim 1. Under Assumption 2, $\|\Xi_{nT}\|_2 = O_P\left(\sqrt{\max(n,T)}\right)$ and $\|E_{nT}\|_2 = O_P\left(\sqrt{\max(n,T)}\right)$.
The claim is proved in Lemma S.2.1 of Moon and Weidner (2014) using the result of Latala (2005).
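The order in Claim 1 can be illustrated with a small simulation. The sketch below uses i.i.d. standard normal entries and three arbitrary $(n, T)$ pairs, both of which are assumptions made for illustration; it is not a substitute for the cited proof.

```python
import numpy as np

rng = np.random.default_rng(5)

# Ratio of the spectral norm to sqrt(max(n, T)) for a few hypothetical sample sizes.
ratios = []
for n, T in [(50, 50), (100, 200), (400, 100)]:
    Xi = rng.normal(size=(n, T))  # i.i.d. disturbances, n x T
    ratios.append(np.linalg.norm(Xi, 2) / np.sqrt(max(n, T)))
```

The ratios stay bounded as the dimensions grow, whereas the Frobenius norm of the same matrix grows at the faster rate $\sqrt{nT}$.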
3. Estimation method
The parameters of interest of the model consisting of Eqs. (1) and (3) are $\theta = (\beta'_z, \beta'_y, \lambda, \sigma_\xi^2, \alpha', \delta')'$, where $\beta_z$ and $\beta_y$ are respectively the regression coefficients in Eqs. (3) and (1), $\lambda$ is the spatial interaction coefficient, and $\sigma^2_\xi$, $\alpha$ and $\delta$ are from the variance structure of the disturbances, with $\alpha$ being a $J\times 1$ vector of distinct elements in $\Sigma_\varepsilon$. The unobserved time factors are collected in matrices $\tilde F_{Ty} = (\tilde f_{y1},\cdots,\tilde f_{yT})'$ and $F_{Tz} = (f_{z1},\cdots,f_{zT})'$, which have dimensions $T\times\tilde R_y$ and $T\times R_z$ respectively. In view of the unobserved nature of the common factor components ($\tilde\Gamma_{nz}$, $\tilde\Gamma_{ny}$, $F_{Tz}$, $\tilde F_{Ty}$), we shall make minimal assumptions on their structures and allow the $z_{nt}$ and $y_{nt}$ equations to be impacted by possibly different factors. As the common factors and loadings are allowed to correlate with the regressors, they are treated as parameters to be estimated. The sample average log quasi-likelihood function is
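$$\begin{aligned}
&\frac{1}{nT}\log L_{nT}(\theta,\tilde\Gamma_{nz},\tilde\Gamma_{ny},F_{Tz},\tilde F_{Ty}) \\
&= -\frac{p+1}{2}\log(2\pi) - \frac12\log|\Sigma_\varepsilon| - \frac12\log\sigma_\xi^2 + \frac{1}{nT}\sum_{t=1}^T \log|S_{nt}(\lambda)| \\
&\quad - \frac{1}{2nT}\sum_{t=1}^T \left(z_{nt}-X_{ntz}\beta_z-\tilde\Gamma_{nz}f_{zt}\right)'\left(\Sigma_\varepsilon^{-1}\otimes I_n\right)\left(z_{nt}-X_{ntz}\beta_z-\tilde\Gamma_{nz}f_{zt}\right) \\
&\quad - \frac{1}{2\sigma_\xi^2 nT}\sum_{t=1}^T \left(S_{nt}(\lambda)y_{nt}-X_{nty}\beta_y-(\delta'\otimes I_n)\left(z_{nt}-X_{ntz}\beta_z-\tilde\Gamma_{nz}f_{zt}\right)-\tilde\Gamma_{ny}\tilde f_{yt}\right)' \\
&\qquad\quad \times \left(S_{nt}(\lambda)y_{nt}-X_{nty}\beta_y-(\delta'\otimes I_n)\left(z_{nt}-X_{ntz}\beta_z-\tilde\Gamma_{nz}f_{zt}\right)-\tilde\Gamma_{ny}\tilde f_{yt}\right),
\end{aligned}$$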
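where $S_{nt}(\lambda) = I_n - \lambda W_{nt}$ and a tilde marks the original loadings and factors of Eqs. (1) and (3). Let $\Gamma_{nz} = \left(\Sigma_\varepsilon^{-\frac12}\otimes I_n\right)\tilde\Gamma_{nz}$. The common component in the $y_{nt}$ equation is $\sigma_\xi^{-1}\tilde\Gamma_{ny}\tilde f_{yt} - \sigma_\xi^{-1}(\delta'\otimes I_n)\tilde\Gamma_{nz}f_{zt} = \Gamma_{ny}f_{yt}$, with the factor loading $\Gamma_{ny}$ appropriately constructed from $\sigma_\xi^{-1}\tilde\Gamma_{ny}$ and $\sigma_\xi^{-1}(\delta'\otimes I_n)\tilde\Gamma_{nz}$, and $f_{yt}$ collects the factors in $\tilde f_{yt}$ and $f_{zt}$. For example, if $\tilde f_{yt} = f_{zt}$, then $\Gamma_{ny} = \sigma_\xi^{-1}\left[\tilde\Gamma_{ny} - (\delta'\otimes I_n)\tilde\Gamma_{nz}\right]$ and $f_{yt} = \tilde f_{yt}$. The dimension of $f_{yt}$ is $R_y\times 1$, and write $F_{Ty} = (f_{y1},\cdots,f_{yT})'$. Then the above equation can be written as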
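$$\begin{aligned}
&\frac{1}{nT}\log L_{nT}(\theta,\Gamma_{nz},\Gamma_{ny},F_{Tz},F_{Ty}) \\
&= -\frac{p+1}{2}\log(2\pi) - \frac12\log|\Sigma_\varepsilon| - \frac12\log\sigma_\xi^2 + \frac{1}{nT}\sum_{t=1}^T \log|S_{nt}(\lambda)| \\
&\quad - \frac{1}{2nT}\sum_{t=1}^T \left[\left(\Sigma_\varepsilon^{-\frac12}\otimes I_n\right)(z_{nt}-X_{ntz}\beta_z)-\Gamma_{nz}f_{zt}\right]'\left[\left(\Sigma_\varepsilon^{-\frac12}\otimes I_n\right)(z_{nt}-X_{ntz}\beta_z)-\Gamma_{nz}f_{zt}\right] \qquad (4) \\
&\quad - \frac{1}{2nT}\sum_{t=1}^T \left[\sigma_\xi^{-1}\left(S_{nt}(\lambda)y_{nt}-X_{nty}\beta_y-(\delta'\otimes I_n)(z_{nt}-X_{ntz}\beta_z)\right)-\Gamma_{ny}f_{yt}\right]' \\
&\qquad\quad \times \left[\sigma_\xi^{-1}\left(S_{nt}(\lambda)y_{nt}-X_{nty}\beta_y-(\delta'\otimes I_n)(z_{nt}-X_{ntz}\beta_z)\right)-\Gamma_{ny}f_{yt}\right]. \qquad (5)
\end{aligned}$$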
In Eq. (5), $(\delta'\otimes I_n)(z_{nt}-X_{ntz}\beta_z)$ controls for the correlation between $z_{nt}$ and the disturbances in the $y_{nt}$ equation. As the sample expands in $n$ and $T$, the number of parameters in factors and their loadings also increases. Because the parameter of interest is $\theta$, we concentrate out factors and their loadings using the following lemma.
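Lemma 1. Let $n$, $T$, $R$ be positive integers such that $R\le n$ and $R\le T$. Let $Z_{nT}$ be an $n\times T$ matrix, $\Gamma_n$ be an $n\times R$ matrix, and $F_T$ be a $T\times R$ matrix. Then
$$\min_{F_T\in\mathbb{R}^{T\times R},\,\Gamma_n\in\mathbb{R}^{n\times R}} \operatorname{tr}\left((Z_{nT}-\Gamma_nF'_T)'(Z_{nT}-\Gamma_nF'_T)\right) = \min_{\Gamma_n\in\mathbb{R}^{n\times R}} \operatorname{tr}\left(Z'_{nT}M_{\Gamma_n}Z_{nT}\right) = \sum_{r=R+1}^{n} \mu_r\left(Z_{nT}Z'_{nT}\right) = \sum_{r=R+1}^{T} \mu_r\left(Z'_{nT}Z_{nT}\right).$$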
This lemma relates the estimation of factor loadings to the problem of principal components in statistics. The proof can be found, for example, in Lemma A1 of Moon and Weidner (2014). The concentrated log likelihood is²
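$$Q_{nT}(\theta) = -\frac{p+1}{2}\log(2\pi) - \frac12\log|\Sigma_\varepsilon| - \frac12\log\sigma_\xi^2 + \frac{1}{nT}\sum_{t=1}^T \log|S_{nt}(\lambda)| - \frac12 L^z_{nT}(\theta) - \frac12 L^y_{nT}(\theta), \qquad (6)$$
where
$$L^z_{nT}(\theta) = \frac{1}{nT}\sum_{r=R_z+1}^{np} \mu_r\left(\sum_{t=1}^T d_{nt}d'_{nt}\right), \quad \text{with } d_{nt} = \left(\Sigma_\varepsilon^{-\frac12}\otimes I_n\right)(z_{nt}-X_{ntz}\beta_z), \text{ and}$$
$$L^y_{nT}(\theta) = \frac{1}{nT}\sum_{r=R_y+1}^{n} \mu_r\left(\sum_{t=1}^T g_{nt}g'_{nt}\right), \quad \text{with } g_{nt} = \sigma_\xi^{-1}\left(S_{nt}(\lambda)y_{nt}-X_{nty}\beta_y-(\delta'\otimes I_n)(z_{nt}-X_{ntz}\beta_z)\right).$$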
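To make the concentrated objective in Eq. (6) concrete, it can be evaluated at a given parameter value as sketched below. This is an illustrative implementation under stated assumptions, not the authors' code: the function names, the array layout (one slice per period) and the randomly generated example data are all hypothetical, and $z_{nt}$ is assumed to be stacked by variable and then by individual as in Eq. (3).

```python
import numpy as np

def tail_eigsum(D, R):
    """Sum of all but the R largest eigenvalues of D D' (one column of D per period)."""
    mu = np.linalg.eigvalsh(D @ D.T)  # ascending order
    return mu[: mu.size - R].sum()

def concentrated_loglik(lam, beta_y, beta_z, Sigma_eps, sigma_xi2, delta,
                        y, W, Xy, z, Xz, Ry, Rz):
    """Q_nT(theta). Shapes: y (n,T), W (T,n,n), Xy (T,n,ky), z (n*p,T), Xz (T,n*p,kz)."""
    n, T = y.shape
    p = Sigma_eps.shape[0]
    # Symmetric inverse square root of Sigma_eps.
    w, V = np.linalg.eigh(Sigma_eps)
    scale_z = np.kron(V @ np.diag(w ** -0.5) @ V.T, np.eye(n))  # Sigma^{-1/2} kron I_n
    control = np.kron(delta[None, :], np.eye(n))                # delta' kron I_n
    logdet_S, d_cols, g_cols = 0.0, [], []
    for t in range(T):
        S = np.eye(n) - lam * W[t]
        logdet_S += np.linalg.slogdet(S)[1]   # log |det S_nt(lam)|
        e_z = z[:, t] - Xz[t] @ beta_z        # residual of the z equation
        d_cols.append(scale_z @ e_z)
        g_cols.append((S @ y[:, t] - Xy[t] @ beta_y - control @ e_z)
                      / np.sqrt(sigma_xi2))
    Lz = tail_eigsum(np.column_stack(d_cols), Rz) / (n * T)
    Ly = tail_eigsum(np.column_stack(g_cols), Ry) / (n * T)
    return (-(p + 1) / 2 * np.log(2 * np.pi)
            - 0.5 * np.linalg.slogdet(Sigma_eps)[1]
            - 0.5 * np.log(sigma_xi2)
            + logdet_S / (n * T) - 0.5 * Lz - 0.5 * Ly)

# Hypothetical data with p = 1, two y-regressors and one z-regressor.
rng = np.random.default_rng(3)
n, T, p = 5, 6, 1
W = rng.uniform(size=(T, n, n))
for t in range(T):
    np.fill_diagonal(W[t], 0.0)
    W[t] /= W[t].sum(axis=1, keepdims=True)
Xy, Xz = rng.normal(size=(T, n, 2)), rng.normal(size=(T, n * p, 1))
y, z = rng.normal(size=(n, T)), rng.normal(size=(n * p, T))
Q = concentrated_loglik(0.3, np.array([1.0, -0.5]), np.array([0.7]),
                        np.array([[1.2]]), 0.8, np.array([0.25]),
                        y, W, Xy, z, Xz, Ry=1, Rz=1)
```

In practice such a function would be passed to a numerical optimizer over $\theta$; forming the Kronecker products explicitly is only sensible for small $n$ and $p$.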
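The QML estimator is $\hat\theta = \arg\max_{\theta\in\Theta} Q_{nT}(\theta)$. The estimate $\hat\Gamma_{ny}$ ($\hat\Gamma_{nz}$) for $\Gamma_{ny}$ ($\Gamma_{nz}$) can be obtained as the eigenvectors associated with the first $R_y$ ($R_z$) largest eigenvalues of $\sum_{t=1}^T g_{nt}g'_{nt}$ ($\sum_{t=1}^T d_{nt}d'_{nt}$). By switching $n$ and $T$, the estimate $\hat F_{Ty}$ ($\hat F_{Tz}$) for $F_{Ty}$ ($F_{Tz}$) can be similarly obtained. Note that $\Gamma_{ny}$ and $F_{Ty}$ ($\Gamma_{nz}$ and $F_{Tz}$) are not unique, as $\Gamma_{ny}HH^{-1}F'_{Ty}$ is observationally equivalent to $\Gamma_{ny}F'_{Ty}$ for an invertible $R_y\times R_y$ matrix $H$. However, the column spaces of $\Gamma_{ny}$ and $F_{Ty}$ ($\Gamma_{nz}$ and $F_{Tz}$) are invariant to $H$, hence the projectors $M_{\hat\Gamma_{ny}}$, $M_{\hat F_{Ty}}$, $M_{\hat\Gamma_{nz}}$ and $M_{\hat F_{Tz}}$ are uniquely determined.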
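The eigenvector step and the invariance of the projectors can be sketched as follows. The data matrix standing in for $(g_{n1},\cdots,g_{nT})$ is simulated, and the helper names and the rotation matrix $H$ are hypothetical.

```python
import numpy as np

def leading_eigvecs(A, R):
    """Eigenvectors of a symmetric matrix A for its R largest eigenvalues."""
    val, vec = np.linalg.eigh(A)
    return vec[:, np.argsort(val)[::-1][:R]]

def annihilator(A):
    """M_A = I - A (A'A)^{-1} A'."""
    return np.eye(A.shape[0]) - A @ np.linalg.solve(A.T @ A, A.T)

rng = np.random.default_rng(4)
n, T, Ry = 10, 15, 2
# Simulated n x T matrix (g_n1, ..., g_nT): rank-Ry factor part plus small noise.
G = rng.normal(size=(n, Ry)) @ rng.normal(size=(Ry, T)) + 0.1 * rng.normal(size=(n, T))

Gamma_hat = leading_eigvecs(G @ G.T, Ry)  # loadings from sum_t g_nt g_nt'
F_hat = leading_eigvecs(G.T @ G, Ry)      # factors: same step with n and T switched

# Rotating the loadings by any invertible H leaves the projector unchanged.
H = np.array([[2.0, 1.0], [0.5, 3.0]])
gap = np.max(np.abs(annihilator(Gamma_hat) - annihilator(Gamma_hat @ H)))
```

Only the column space of the loadings matters for the projector, so the arbitrary normalization implied by $H$ does not affect quantities built from $M_{\hat\Gamma_{ny}}$.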
Our model also covers two simpler cases of interest.
• Seemingly unrelated regression (SUR) equations panel with factors
By considering the equations of znt alone, one is considering the estimation of a p equation SUR model with
unobserved common factors, which extends the single equation model (e.g., Pesaran (2006) and Bai (2009))
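by allowing cross equation correlations. The likelihood component of $z_{nt}$, $t=1,\cdots,T$, is
$$-\frac{p}{2}\log(2\pi) - \frac12\log|\Sigma_\varepsilon| - \frac{1}{2nT}\sum_{t=1}^T \left[\left(\Sigma_\varepsilon^{-\frac12}\otimes I_n\right)(z_{nt}-X_{ntz}\beta_z)-\Gamma_{nz}f_{zt}\right]'\left[\left(\Sigma_\varepsilon^{-\frac12}\otimes I_n\right)(z_{nt}-X_{ntz}\beta_z)-\Gamma_{nz}f_{zt}\right].$$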
• Exogenous spatial weights matrices
In the event that $v_{it}$ is independent of $\varepsilon_{it}$, $\delta = 0$ and the model becomes a spatial panel with unobserved common factors but exogenous weights matrices. In this case, the outcome equation can be estimated separately from the $z_{nt}$ equation. Imposing $\delta = 0$, the likelihood component of $y_{nt}$, $t=1,\cdots,T$, from (5) becomes
$$-\frac12\log(2\pi) - \frac12\log\sigma_\xi^2 + \frac{1}{nT}\sum_{t=1}^T \log|S_{nt}(\lambda)| - \frac{1}{2nT}\sum_{t=1}^T \left[\sigma_\xi^{-1}\left(S_{nt}(\lambda)y_{nt}-X_{nty}\beta_y\right)-\Gamma_{ny}f_{yt}\right]'\left[\sigma_\xi^{-1}\left(S_{nt}(\lambda)y_{nt}-X_{nty}\beta_y\right)-\Gamma_{ny}f_{yt}\right],$$
with $\Gamma_{ny} = \sigma_\xi^{-1}\tilde\Gamma_{ny}$, where $\tilde\Gamma_{ny}$ is the loading matrix in Eq. (1). If the spatial weights matrix is time invariant, the case is covered in Shi and Lee (2015). Here we allow for the more general case with time varying spatial weights matrices. Spatial panel models with time varying weights matrices have been considered in Lee and Yu (2012); however, those models have only additive individual and time effects but not interactive effects.

² In Shi and Lee (2015), we also concentrate out the variance parameter. Because the correlations between equations are also of interest in this paper, the variance parameters are not concentrated out here. For example, when the disturbances are jointly normal and $p = 1$, the correlation coefficient ($\rho$) between $\varepsilon_{it}$ and $v_{it}$ can be solved from $\delta = \rho\sigma_v\Sigma_\varepsilon^{-\frac12}$ and $\sigma_\xi^2 = (1-\rho^2)\sigma_v^2$.
4. Asymptotic properties of the QML estimator
4.1. Assumptions
Assumption 3. (1) The parameter vector $\theta$ is in a compact set $\Theta$ in the Euclidean space $\mathbb{R}^{k_\theta}$ with $k_\theta = k_z + k_y + 2 + J + p$, where $k_z$ is the dimension of $\beta_z$, $k_y$ is the dimension of $\beta_y$, $J$ is the dimension of $\alpha$, which contains distinct elements in $\Sigma_\varepsilon$, and $p$ is the dimension of $\delta$. The other two parameters are the spatial interaction coefficient $\lambda$ and the variance $\sigma_\xi^2$. The true parameter vector $\theta_0$ is in the interior of $\Theta$.
(2) For any $i$, $j$, $n$ and $t$, the spatial weights satisfy $w_{it,jt} \ge 0$ and $w_{it,it} = 0$. Furthermore, $W_{nt}$ and $S_{nt}^{-1}$ are uniformly bounded in absolute value in both row and column sums, where $S_{nt} = I_n - \lambda_0 W_{nt}$. In addition, let $\sup_{n,t}\|W_{nt}\|_\infty \le c_w$ with probability 1 for some finite constant $c_w$ and $\sup_{\lambda\in\Lambda}|\lambda| c_w < 1$, where $\Lambda$ is the parameter space for $\lambda$.
(3) All elements in $X_{nty}$ and $X_{ntz}$ are deterministic and bounded in absolute value.
(4) There are respectively $R_{z0}$ and $R_{y0}$ factors in the $z_{nt}$ and $y_{nt}$ equations. The factors and their loadings are nonstochastic and uniformly bounded.
(5) $n$ is an increasing function of $T$ and $T$ goes to infinity.
Assumption 3(2) limits the spatial interaction to a manageable degree. The non-stochastic assumptions
in (3) and (4) are used here to simplify the analysis in order to concentrate on the key issue of endogenous
spatial dependence in the presence of common time factors, which generate another form of cross sectional
dependence, but these assumptions can be relaxed by imposing moment conditions, see Moon and Weidner
(2014). The asymptotics of this paper are derived with both n and T jointly increasing, as in (5). The
following assumptions rule out collinearity among the regressors and are critical for parameter identification.
Notation 2. Define the following $n\times T$ dimensional matrices of regressors in the $z$ and $y$ equations. For $k = 1,\cdots,k_z$, $X_{nTz,k} = (X_{n1z,k},\cdots,X_{nTz,k})$ and $X_{ntz,k}$ is the $k$-th column of $X_{ntz}$. For $k = 1,\cdots,k_y$, $X_{nTy,k} = (X_{n1y,k},\cdots,X_{nTy,k})$ and $X_{nty,k}$ is the $k$-th column of $X_{nty}$.
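Assumption 4. There exist positive constants $c_z$, $c_y$ such that
$$\min_{l_z\in B_{k_z}} \sum_{r=2R_z+1}^{np} \mu_r\left(\frac{1}{nT}\left(l_z\cdot Z^z_{nT}\right)\left(l_z\cdot Z^z_{nT}\right)'\right) \ge c_z > 0,$$
$$\min_{l_y\in B_{k_y}} \sum_{r=2R_y+1}^{n} \mu_r\left(\frac{1}{nT}\left(l_y\cdot Z^y_{nT}\right)\left(l_y\cdot Z^y_{nT}\right)'\right) \ge c_y > 0,$$
wpa 1 as $n,T\to\infty$, where $B_{k_z}$ ($B_{k_y}$) is a unit ball in the $k_z$ ($k_y+1+p$)-dimensional Euclidean space; $l_z$ ($l_y$) is a $k_z\times 1$ ($(k_y+1+p)\times 1$) vector with $\|l_z\|_2 = \sqrt{l'_z l_z} = 1$ ($\|l_y\|_2 = 1$); furthermore, $l_z\cdot Z^z_{nT} = \sum_{k=1}^{k_z} l_{z,k}X_{nTz,k}$ and $l_y\cdot Z^y_{nT} = \sum_{k=1}^{k_y+1+p} l_{y,k}Z^y_{k,nT}$, where for $k = 1,\cdots,k_y$, $Z^y_{k,nT} = X_{nTy,k}$;
$$Z^y_{k_y+1,nT} = \left(G_{n1}\left(X_{n1y}\beta_{y0} + \varepsilon_{n1}\delta_0 + \tilde\Gamma_{ny0}\tilde f_{y10}\right),\ \cdots,\ G_{nT}\left(X_{nTy}\beta_{y0} + \varepsilon_{nT}\delta_0 + \tilde\Gamma_{ny0}\tilde f_{yT0}\right)\right)$$
with $G_{nt} = W_{nt}S_{nt}^{-1}$ and $\varepsilon_{nt} = (\varepsilon_{1t},\cdots,\varepsilon_{nt})'$, which is $n\times p$; and for $k = 1,\cdots,p$, $Z^y_{k_y+1+k,nT} = (\varepsilon_{n1,k},\cdots,\varepsilon_{nT,k})$ with $\varepsilon_{nt,k}$ being the $k$-th column of $\varepsilon_{nt}$.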
Note that despite the simultaneity in the model, no exclusion restrictions are necessary, because of the
nonlinear interactions of $W_{nt}$ and $y_{nt}$ and the nonlinearity of $z_{nt}$ in $W_{nt}$. Assumption 4 will not be satisfied if $\operatorname{rank}\left(Z^y_{k_y+1,nT}\right) \le 2R_y$, which can happen if $\beta_{y0} = 0$, $\delta_0 = 0$ and the spatial weights matrices are time invariant. This assumption will also be violated if $G_{nt}\left(X_{nty}\beta_{y0} + \varepsilon_{nt}\delta_0 + \tilde\Gamma_{ny0}\tilde f_{yt0}\right)$ is linearly dependent on $X_{nty}$ and $\varepsilon_{nt,k}$. We provide the following alternative conditions, which do not include the above term with $G_{nt}$ but add a condition on variance structures in terms of the reduced form disturbances. The latter condition is motivated by Lee (2004) and is relevant for the identification of $\lambda$.
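Assumption 5. (1) There exist positive constants $c_z$, $c_y$ such that
$$\min_{l_z\in B_{k_z}} \sum_{r=2R_z+1}^{np} \mu_r\left(\frac{1}{nT}\left(l_z\cdot Z^z_{nT}\right)\left(l_z\cdot Z^z_{nT}\right)'\right) \ge c_z > 0,$$
$$\min_{l_y\in B_{k_y}} \sum_{r=2R_y+1}^{n} \mu_r\left(\frac{1}{nT}\left(l_y\cdot Z^y_{nT}\right)\left(l_y\cdot Z^y_{nT}\right)'\right) \ge c_y > 0,$$
wpa 1 as $n,T\to\infty$, where $B_{k_z}$ ($B_{k_y}$) is a unit ball in the $k_z$ ($k_y+p$)-dimensional Euclidean space; $l_z$ ($l_y$) is a $k_z\times 1$ ($(k_y+p)\times 1$) nonzero vector with $\|l_z\|_2 = \sqrt{l'_z l_z} = 1$ ($\|l_y\|_2 = 1$). Furthermore, $l_z\cdot Z^z_{nT} = \sum_{k=1}^{k_z} l_{z,k}X_{nTz,k}$. In addition, $l_y\cdot Z^y_{nT} = \sum_{k=1}^{k_y+p} l_{y,k}Z^y_{k,nT}$. For $k = 1,\cdots,k_y$, $Z^y_{k,nT} = X_{nTy,k}$; for $k = 1,\cdots,p$, $Z^y_{k_y+k,nT} = (\varepsilon_{n1,k},\cdots,\varepsilon_{nT,k})$ and $\varepsilon_{nt,k}$ is the $k$-th column of $\varepsilon_{nt}$, with $\varepsilon_{nt} = (\varepsilon_{1t},\cdots,\varepsilon_{nt})'$ an $n\times p$ matrix.
(2) For any $\lambda\in\Theta_\lambda$, $\lambda\ne\lambda_0$,
$$\liminf_{n,T\to\infty}\left(\frac{1}{nT}\sum_{t=1}^T \operatorname{tr}\left(S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}S_{nt}^{-1\prime}\right) - \prod_{t=1}^T \left|S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}S_{nt}^{-1\prime}\right|^{\frac{1}{nT}}\right) > 0.$$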
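Assumption 5(2) ensures the identification of $\lambda$. By the arithmetic and geometric means inequality,
$$\frac{1}{nT}\sum_{t=1}^T \operatorname{tr}\left(S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}S_{nt}^{-1\prime}\right) \ge \frac{1}{T}\sum_{t=1}^T \left|S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}S_{nt}^{-1\prime}\right|^{\frac1n} \ge \left(\prod_{t=1}^T \left|S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}S_{nt}^{-1\prime}\right|^{\frac1n}\right)^{\frac1T}.$$
So this assumption essentially requires that the inequality holds strictly in the limit for $\lambda\ne\lambda_0$.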
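The gap between the arithmetic and geometric mean terms in Assumption 5(2) can be computed for a given set of weights matrices. In the sketch below the function name, the randomly generated row-normalized weights and the values $\lambda_0 = 0.3$ and $\lambda = 0.6$ are all assumptions made for illustration.

```python
import numpy as np

def identification_gap(lam, lam0, W):
    """Arithmetic minus geometric mean term of Assumption 5(2); W has shape (T, n, n)."""
    T, n, _ = W.shape
    trace_sum, logdet_sum = 0.0, 0.0
    for t in range(T):
        S_lam = np.eye(n) - lam * W[t]
        S0_inv = np.linalg.inv(np.eye(n) - lam0 * W[t])
        A = S_lam @ S0_inv
        # tr(S'S S0^{-1} S0^{-1}') equals tr(A'A) by cyclic permutation.
        trace_sum += np.trace(A.T @ A)
        # log|S'S S0^{-1} S0^{-1}'| = log det(A'A) = 2 log|det A|.
        logdet_sum += 2.0 * np.linalg.slogdet(A)[1]
    return trace_sum / (n * T) - np.exp(logdet_sum / (n * T))

rng = np.random.default_rng(6)
T, n = 4, 6
W = rng.uniform(size=(T, n, n))
for t in range(T):
    np.fill_diagonal(W[t], 0.0)
    W[t] /= W[t].sum(axis=1, keepdims=True)

gap_at_truth = identification_gap(0.3, 0.3, W)  # both means equal 1 at lam = lam0
gap_away = identification_gap(0.6, 0.3, W)      # strictly positive away from lam0
```

At $\lambda = \lambda_0$ the matrix inside the trace and determinant is the identity, so the gap is zero; the assumption asks that it stay bounded away from zero elsewhere.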
Qu and Lee (2015) provide a useful LLN for some key statistics for a cross sectional spatial model with an endogenous spatial weights matrix, which is derived using properties of spatial NED processes (Jenish and Prucha (2012)). That result will also be useful for our panel model by extending spatial location into the time-space location setting. Write $z_{it} = (z_{it1},\cdots,z_{itp})'$.
Assumption 6. (1) The spatial weights satisfy $w_{it,jt} \ge 0$, $w_{it,it} = 0$, and $w_{it,jt} = 0$ if $\rho_{it,jt} > \rho_c$, i.e., there exists a threshold $\rho_c > 1$ such that the weight is zero if the geographic distance exceeds $\rho_c$. For $i\ne j$, $w_{it,jt} = h_{ij}(z_{it},z_{jt})I(\rho_{it,jt}<\rho_c)$ or the row normalized version $w_{it,jt} = \dfrac{h_{ij}(z_{it},z_{jt})I(\rho_{it,jt}<\rho_c)}{\sum_{k=1}^n h_{ik}(z_{it},z_{kt})I(\rho_{it,kt}<\rho_c)}$, where the $h_{ij}(\cdot)$'s are non-negative, uniformly bounded functions.
(2) The function $h_{ij}(\cdot)$ satisfies the Lipschitz condition $\left|h_{ij}(a_1,b_1)-h_{ij}(a_2,b_2)\right| \le c_0\left(|a_1-a_2|+|b_1-b_2|\right)$ for some finite constant $c_0$.
The underlying topological structure provides a bound on the degree of spatial dependence. Intuitively,
this assumption requires that an individual is connected to other individuals at most ρc distance away and
the functions that construct the spatial weights are uniformly bounded and smooth. Many functions satisfy
those properties. For the LLNs in Lemmas 2 and 4 below, the terms that involve the spatial weights need to
satisfy the NED property, which can be derived from the NED property of z with smooth function h(·)’s, as
in Qu and Lee (2015) and Qu et al. (2015).
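A weights matrix of the form in Assumption 6 can be built as sketched below. The kernel $h(a,b) = 1/(1+|a-b|)$ is a hypothetical choice that is bounded and Lipschitz with $c_0 = 1$; the function name, the scalar $z_{it}$ (so $p = 1$), the locations and the threshold are likewise assumptions made for illustration.

```python
import numpy as np

def build_weights(z_t, locations, rho_c, row_normalize=True):
    """W_nt with w_ij = h(z_i, z_j) * 1{rho_ij < rho_c} for i != j, optionally row normalized."""
    # Distance between locations within a period, so the time gap |t - s| is zero.
    rho = np.max(np.abs(locations[:, None, :] - locations[None, :, :]), axis=2)
    # Assumed kernel: 1 / (1 + |z_i - z_j|), bounded in (0, 1] and Lipschitz.
    h = 1.0 / (1.0 + np.abs(z_t[:, None] - z_t[None, :]))
    W = h * (rho < rho_c)
    np.fill_diagonal(W, 0.0)
    if row_normalize:
        row_sums = W.sum(axis=1, keepdims=True)
        # Rows with no neighbor within rho_c are left as zeros.
        W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    return W

# Hypothetical units on a line, spaced at least 1 apart as in Assumption 1.
locations = np.array([[0.0], [1.0], [2.0], [5.0]])
z_t = np.array([1.0, 1.5, 3.0, 0.0])
W = build_weights(z_t, locations, rho_c=2.5)
```

Because $z_{it}$ enters the kernel, any correlation between $\varepsilon_{it}$ and $v_{it}$ carries over to the weights, which is the source of endogeneity the model addresses. Leaving isolated units with a zero row is one possible convention, not something the assumption prescribes.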
4.2. Consistency
To accommodate time varying spatial effects, time is considered as an additional dimension in an indi-
vidual’s location vector, as Assumption 1 shows. Viewing individuals at different time points as different
individuals, the panel data can be structured as a large collection of nodes with interactions via space-time
locations. Define $G_{nT} = \operatorname{diag}(G_{n1},\cdots,G_{nT})$, which is an $nT\times nT$ block diagonal matrix with each block an $n\times n$ submatrix. Let $\varsigma^*_{it}$ be a real valued function such that $\varsigma^*_{it} = f_{it}(\xi_{it},\varepsilon_{it},X_{nty},X_{ntz},\Gamma_{nz},F_{Tz},\Gamma_{ny},F_{Ty},\theta_0)$, and denote by $\varsigma^*_{nT} = (\varsigma^*_{11},\cdots,\varsigma^*_{n1},\varsigma^*_{12},\cdots,\varsigma^*_{nT})'$ an $nT\times 1$ vector. To prove the consistency of the QML estimator, the LLN and CLT in Qu and Lee (2015), Propositions 1 and 2, are reproduced here with modification for the panel data setting.
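Lemma 2. Under Assumptions 1, 3(2) and 6, suppose $\sup_{n,T}\sup_{i,t} E\|\varsigma^*_{it}\|_2^4 < \infty$, then
$$\frac{1}{nT}\left(\varsigma^{*\prime}_{nT}G'_{nT}\varsigma^*_{nT} - E\left(\varsigma^{*\prime}_{nT}G'_{nT}\varsigma^*_{nT}\right)\right) = O_P\left(\frac{1}{\sqrt{nT}}\right), \text{ and}$$
$$\frac{1}{nT}\left(\varsigma^{*\prime}_{nT}G'_{nT}G_{nT}\varsigma^*_{nT} - E\left(\varsigma^{*\prime}_{nT}G'_{nT}G_{nT}\varsigma^*_{nT}\right)\right) = O_P\left(\frac{1}{\sqrt{nT}}\right).$$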
The above is analogous to the case of cross section data with a spatial weights matrix. Because the asymptotics are derived with both $n$ and $T$ increasing, the preceding result, which originated in Qu and Lee (2015), holds by considering an “extended” cross section of $nT$ individuals with the $nT\times nT$ matrix $G_{nT}$. It is straightforward to check that $G_{nT}$ with Assumptions 3(2) and 6 satisfies Qu and Lee (2015)’s Assumptions 3(1) and 4 for the LLN. The following results on matrix norms are useful.
Claim 2. Under Assumptions 1, 2, 3 and 6, $\|X_{nTz,k}\|_2 = O\left(\sqrt{nT}\right)$ for $k = 1, \cdots, k_z$, $\|Z^y_{k,nT}\|_2 = O_P\left(\sqrt{nT}\right)$ for $k = 1, \cdots, k_y+1$, and $\|Z^y_{k_y+1+k,nT}\|_2 = O_P\left(\max\left(\sqrt{n}, \sqrt{T}\right)\right)$ for $k = 1, \cdots, p$, where $X_{nTz,k}$ and $Z^y_{k,nT}$ are defined in Notation 2 and Assumption 4.
Claim 1 in Subsection 2.2 is for i.i.d. disturbances. The following result, whose proof is in the appendix,
generalizes it to allow spatially dependent errors.
Claim 3. Under Assumptions 2 and 3 (1) and (2), $\|U_{nT}\|_2 = O_P\left(\sqrt{nT}\left(n^{-\frac14} + T^{-\frac14}\right)\right)$, where $U_{nT} = (u_{n1}, \cdots, u_{nT})$ with $u_{nt} = G_{nt}\xi_{nt}$ or $u_{nt} = S_{nt}(\lambda)S_{nt}^{-1}\xi_{nt}$.
The following proposition establishes that the QML estimator is consistent under appropriate regularity
conditions. Notice that the consistency only requires the number of factors specified in the model to be
not fewer than the true number of factors. Lemma 2 is used to show that the sum of squared residuals
in the likelihood is dominated by $\|\theta - \theta_0\|_2^2$. Note that for this purpose the relevant terms do not involve
projectors. When we later show that the variance of the estimator can be consistently estimated, Lemma 2
will be extended to allow for projectors; see Lemma 4. We can also establish a possible rate of convergence.
The convergence rate can be improved to $\frac{1}{\sqrt{nT}}$ if the number of factors is correctly specified, as shown in the
next subsection.
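Proposition 1. Under Assumptions 1-3, 4 or 5, 6, and assuming that the number of factors is not underestimated, i.e., $R_z \ge R_{z0}$ and $R_y \ge R_{y0}$, then $\|\hat\theta - \theta_0\|_2 = o_P(1)$. Furthermore, under Assumptions 1-3, 4, 6, $R_z \ge R_{z0}$ and $R_y \ge R_{y0}$,
$$\|\hat\beta_z - \beta_{z0}\|_2 = O_P\left(c_{nT}^{-\frac12}\right),\quad \|\hat\beta_y - \beta_{y0}\|_2 = O_P\left(c_{nT}^{-\frac12}\right),\quad |\hat\lambda - \lambda_0| = O_P\left(c_{nT}^{-\frac12}\right)\quad\text{and}\quad \|\hat\delta - \delta_0\|_2 = O_P\left(c_{nT}^{-\frac12}\right),$$
where $c_{nT} = \min(n, T)$.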
The proof is in the appendix.
4.3. Asymptotic normality of the QML estimator
To derive the asymptotic distribution of the QML estimator, the mean value theorem or a Taylor expansion
is often used to obtain a linear-quadratic expansion of the log quasi-likelihood at the true
parameter values. Such a technique is difficult to apply to this model, as the log likelihood involves the sum of certain
eigenvalues of the sum of squared residuals. However, because the eigenvalues in the objective function are
0 if $\theta = \theta_0$, $E_{nT} = 0$ and $\Xi_{nT} = 0$, and they will be close to 0 when small perturbations are applied, the
perturbation theory of linear operators is useful (see Kato (1995)). The perturbations are small asymptotically
under assumptions on error disturbances and the consistency of the QML estimator. Moon and Weidner
(2015) use the perturbation method to obtain an approximate gradient vector and an approximate Hessian
matrix for a regression panel with factors. Shi and Lee (2015) apply the method to a QML objective function
of a spatial panel data model. Here we extend our analysis in Shi and Lee (2015) to $L^z_{nT}(\theta)$ and $L^y_{nT}(\theta)$ in
Eq. (6) by taking into account the presence of endogenous and time varying spatial weights matrices. An
additional feature is the presence of the variance matrix $\Sigma_\varepsilon$ in the log likelihood function. In order to utilize the
perturbation theory, we adopt the following reparameterization.
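Assumption 7. Let $\Sigma_\varepsilon^{-\frac12}$ represent the principal square root of $\Sigma_\varepsilon^{-1}$, and suppose it can be linearly parameterized as $\Sigma_\varepsilon^{-\frac12} = \sum_{j=1}^{J}\Omega_j\alpha_j$, where each $\Omega_j$ does not involve unknown parameters, $\Omega_1, \cdots, \Omega_J$ are linearly independent, and the $\alpha$'s are the $J$ unique unknown elements of $\Sigma_\varepsilon^{-\frac12}$.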
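Example 1. Supposing $p = 2$ and no further restrictions are imposed on $\Sigma_\varepsilon^{-\frac12}$, then $\Sigma_\varepsilon^{-\frac12}$ has three distinct elements. So one way to parameterize $\Sigma_\varepsilon^{-\frac12}$ is $\Sigma_\varepsilon^{-\frac12} = \sum_{j=1}^{3}\Omega_j\alpha_j$ with
$$\Omega_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},\quad \Omega_2 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}\quad\text{and}\quad \Omega_3 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$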
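The parameterization in Example 1 can be checked numerically. The following is a minimal Python sketch; the value of `sigma_eps` is hypothetical and the helper name `inv_sqrt_psd` is introduced here for illustration only.

```python
import numpy as np

# Basis matrices of Example 1 (p = 2, J = 3).
Omega = [np.array([[1.0, 0.0], [0.0, 0.0]]),
         np.array([[0.0, 0.0], [0.0, 1.0]]),
         np.array([[0.0, 1.0], [1.0, 0.0]])]

def inv_sqrt_psd(sigma):
    """Principal square root of sigma^{-1} for a symmetric positive definite sigma."""
    vals, vecs = np.linalg.eigh(sigma)
    # V diag(vals^{-1/2}) V'
    return (vecs / np.sqrt(vals)) @ vecs.T

sigma_eps = np.array([[4 / 3, 0.8], [0.8, 4 / 3]])   # hypothetical variance matrix
root = inv_sqrt_psd(sigma_eps)

# The alphas are the distinct elements: two diagonal entries and the off-diagonal one.
alpha = np.array([root[0, 0], root[1, 1], root[0, 1]])
rebuilt = sum(a * O for a, O in zip(alpha, Omega))
```

Here `rebuilt` reproduces `root` exactly, which is the sense in which the three $\alpha$'s are the unique unknown elements of the symmetric matrix.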
Because the common factors and loadings are concentrated out in estimation, in the following, Γnz,
FT z, Γny and FTy refer to their true values in the data generating process, as they are parts of the DGP
for asymptotic analysis. The asymptotic distribution of the QMLE is derived under the asymptotics that
$\frac{n}{T} \to \kappa > 0$ as $n, T \to \infty$.
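Assumption 8. The numbers of factors, $R_{z0}$ and $R_{y0}$, are constants and known. $\Gamma_{nz}$, $F_{Tz}$, $\Gamma_{ny}$ and $F_{Ty}$ as defined in Eqs. (4) and (5) are nonstochastic. $\lim_{n\to\infty}\frac{1}{n}\Gamma'_{ny}\Gamma_{ny} = \Gamma_y$, $\lim_{n\to\infty}\frac{1}{np}\Gamma'_{nz}\Gamma_{nz} = \Gamma_z$, $\lim_{T\to\infty}\frac{1}{T}F'_{Tz}F_{Tz} = F_z$ and $\lim_{T\to\infty}\frac{1}{T}F'_{Ty}F_{Ty} = F_y$ exist and are positive definite.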
The above assumption which has also been used in Bai (2009) and Moon and Weidner (2015), states
that each factor contributes a nontrivial share towards the common factor component. When normalized
by the sample size nT , this assumption ensures that the eigenvalues from the common factor component
are bounded away from 0 while those from the idiosyncratic errors are close to 0, and thus they are well
separated.
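The eigenvalue separation described above can be seen in a small simulation. This is a sketch under assumed settings (two factors, loadings and factors uniform on [-2, 2], standard normal errors, n = 40, T = 50); none of these values or names come from the paper's own code.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, R = 40, 50, 2
Gamma = rng.uniform(-2, 2, size=(n, R))   # factor loadings
F = rng.uniform(-2, 2, size=(T, R))       # time factors
E = rng.normal(size=(n, T))               # idiosyncratic errors

def sorted_eigs(D):
    """Eigenvalues of D D' / (nT), largest first."""
    n_, T_ = D.shape
    return np.linalg.eigvalsh(D @ D.T / (n_ * T_))[::-1]

eig_factor = sorted_eigs(Gamma @ F.T)       # exactly R nonzero eigenvalues
eig_noisy = sorted_eigs(Gamma @ F.T + E)    # R large eigenvalues, the rest individually small

tail_factor = eig_factor[R:].sum()          # zero up to rounding error
gap = eig_noisy[R - 1] / eig_noisy[R]       # ratio of the R-th to the (R+1)-th eigenvalue
```

Without errors, the eigenvalues beyond the first `R` vanish; with errors, each remaining eigenvalue is small relative to the leading `R`, although their sum is not negligible because it accumulates the error variance.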
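Firstly consider $L^z_{nT}(\theta)$ in Eq. (6) for the spatial weights formation. Let $D_{nT} = (d_{n1}, \cdots, d_{nT})$ with $d_{nt} = \left(\Sigma_\varepsilon^{-\frac12} \otimes I_n\right)(z_{nt} - X_{ntz}\beta_z)$. We have
$$D_{nT} = \Gamma_{nz}F'_{Tz} + \sum_{k=1}^{k_z+J+1}\eta^z_k V^z_k + \sum_{k_1,k_2=1}^{k_z+J+1}\eta^z_{k_1}\eta^z_{k_2}V^z_{k_1k_2},$$
where
1. For $k = 1, \cdots, k_z$, $\eta^z_k = \beta_{z0,k} - \beta_{z,k}$, $V^z_k = \left(\Sigma_{\varepsilon0}^{-\frac12} \otimes I_n\right)X_{nTz,k}$.
2. For $k = 1, \cdots, J$, $\eta^z_{k_z+k} = \alpha_{k0} - \alpha_k$ and $V^z_{k_z+k} = -(\Omega_k \otimes I_n)\left(E_{nT} + \Gamma_{nz}F'_{Tz}\right)$.
3. $\eta^z_{k_z+J+1} = \frac{\|E_{nT}\|_2}{\sqrt{nT}}$ and $V^z_{k_z+J+1} = \left(\Sigma_{\varepsilon0}^{-\frac12} \otimes I_n\right)E_{nT}\frac{\sqrt{nT}}{\|E_{nT}\|_2}$.
4. For $k_1 = 1, \cdots, k_z$ and $k_2 = 1, \cdots, J$, $V^z_{k_1,k_z+k_2} = -(\Omega_{k_2} \otimes I_n)X_{nTz,k_1}$, and $V^z_{kk'} = 0$ otherwise.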
The component under analysis here is a SUR panel with p equations and correlations between the equations.
Items (2) and (4) come from the variance. It extends a single regression panel with factors in Moon and
Weidner (2015). Then, we have
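$$
\begin{aligned}
L^z_{nT}(\theta) &= \frac{1}{nT}\sum_{r=R_{z0}+1}^{np}\mu_r\left(D_{nT}D'_{nT}\right)\\
&= \frac{1}{nT}\sum_{r=R_{z0}+1}^{np}\mu_r\Bigg(T^{(0)} + \sum_{k_1=1}^{k_z+J+1}\eta^z_{k_1}T^{(1)}_{k_1} + \sum_{k_1,k_2=1}^{k_z+J+1}\eta^z_{k_1}\eta^z_{k_2}T^{(2)}_{k_1k_2}\\
&\qquad + \sum_{k_1,k_2,k_3=1}^{k_z+J+1}\eta^z_{k_1}\eta^z_{k_2}\eta^z_{k_3}T^{(3)}_{k_1k_2k_3} + \sum_{k_1,k_2,k_3,k_4=1}^{k_z+J+1}\eta^z_{k_1}\eta^z_{k_2}\eta^z_{k_3}\eta^z_{k_4}T^{(4)}_{k_1k_2k_3k_4}\Bigg), \qquad (7)
\end{aligned}
$$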
where $T^{(0)} = \Gamma_{nz}F'_{Tz}F_{Tz}\Gamma'_{nz}$, $T^{(1)}_{k_1} = V^z_{k_1}F_{Tz}\Gamma'_{nz} + \Gamma_{nz}F'_{Tz}V^{z\prime}_{k_1}$, $T^{(2)}_{k_1k_2} = V^z_{k_1}V^{z\prime}_{k_2} + V^z_{k_1k_2}F_{Tz}\Gamma'_{nz} + \Gamma_{nz}F'_{Tz}V^{z\prime}_{k_1k_2}$, $T^{(3)}_{k_1k_2k_3} = V^z_{k_1}V^{z\prime}_{k_2k_3} + V^z_{k_1k_2}V^{z\prime}_{k_3}$ and $T^{(4)}_{k_1k_2k_3k_4} = V^z_{k_1k_2}V^{z\prime}_{k_3k_4}$. Under Assumption 8, $\frac{1}{nT}\sum_{r=R_{z0}+1}^{np}\mu_r\left(T^{(0)}\right) = 0$, because $\frac{1}{nT}T^{(0)}$ has exactly $R_{z0}$ positive eigenvalues and the remaining ones are zeros. For small $\eta^z_k$, $L^z_{nT}(\theta)$ can be approximated by a quadratic polynomial of $\eta^z_k$. Using Lemma D.1 and proper but simplified expressions of Lemma D.2 in Shi and Lee (2015), we have
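$$
\begin{aligned}
L^z_{nT}(\theta) ={}& L^z_{nT}(\theta_0) + 2\sum_{k=1}^{k_z+J}\eta^z_k\frac{1}{nT}\mathrm{tr}\left(M_{\Gamma_{nz}}\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)E_{nT}M_{F_{Tz}}V^{z\prime}_k\right)\\
&- 2\sum_{k=1}^{k_z+J}\eta^z_k\frac{1}{nT}\mathrm{tr}\left(M_{\Gamma_{nz}}V^z_kM_{F_{Tz}}E'_{nT}\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)P_{\Gamma_{nz},F_{Tz}}E'_{nT}\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)\right)\\
&- 2\sum_{k=1}^{k_z+J}\eta^z_k\frac{1}{nT}\mathrm{tr}\left(M_{\Gamma_{nz}}\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)E_{nT}M_{F_{Tz}}V^{z\prime}_kP_{\Gamma_{nz},F_{Tz}}E'_{nT}\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)\right)\\
&- 2\sum_{k=1}^{k_z+J}\eta^z_k\frac{1}{nT}\mathrm{tr}\left(M_{\Gamma_{nz}}\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)E_{nT}M_{F_{Tz}}E'_{nT}\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)P_{\Gamma_{nz},F_{Tz}}V^{z\prime}_k\right)\\
&+ \sum_{k_1,k_2=1}^{k_z+J}\eta^z_{k_1}\eta^z_{k_2}\frac{1}{nT}\mathrm{tr}\left(M_{\Gamma_{nz}}V^z_{k_1}M_{F_{Tz}}V^{z\prime}_{k_2}\right) + L^{rem,z}_{nT}(\theta), \qquad (8)
\end{aligned}
$$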
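where $M_{\Gamma_{nz}}$ and $M_{F_{Tz}}$ are the projection matrices, $P_{\Gamma_{nz},F_{Tz}} = \Gamma_{nz}\left(\Gamma'_{nz}\Gamma_{nz}\right)^{-1}\left(F'_{Tz}F_{Tz}\right)^{-1}F'_{Tz}$, and $L^{rem,z}_{nT}(\theta)$ captures remainder terms that can be shown to be $O_P\left(\|\theta-\theta_0\|_2^3\right) + O_P\left(\|\theta-\theta_0\|_2^2\, n^{-\frac12}\right) + O_P\left(\|\theta-\theta_0\|_2\, n^{-\frac32}\right) + O_P\left(n^{-\frac52}\right)$. Note that the perturbations that are purely due to the idiosyncratic errors ($\eta^z_{k_z+J+1}V^z_{k_z+J+1}$) are collected in $L^z_{nT}(\theta_0)$, as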
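$$
L^z_{nT}(\theta_0) = \frac{1}{nT}\sum_{r=R_{z0}+1}^{np}\mu_r\Big(\Gamma_{nz}F'_{Tz}F_{Tz}\Gamma'_{nz} + \left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)E_{nT}F_{Tz}\Gamma'_{nz} + \Gamma_{nz}F'_{Tz}E'_{nT}\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right) + \left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)E_{nT}E'_{nT}\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)\Big).
$$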
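Next consider $g_{nt} = \sigma_\xi^{-1}\left(S_{nt}(\lambda)y_{nt} - X_{nty}\beta_y - (\delta'\otimes I_n)(z_{nt} - X_{ntz}\beta_z)\right)$ in Eq. (6) for the outcome equation. By reparameterization, let $\alpha_\xi = \sigma_\xi^{-1}$. Define the following terms.
1. For $k = 1, \cdots, k_z$, $\eta^y_k = \beta_{z0,k} - \beta_{z,k}$ and $V^y_k = -\sigma_{\xi0}^{-1}(\delta'_0\otimes I_n)X_{nTz,k}$.
2. For $k = 1, \cdots, k_y$, $\eta^y_{k_z+k} = \beta_{y0,k} - \beta_{y,k}$ and $V^y_{k_z+k} = \sigma_{\xi0}^{-1}X_{nTy,k}$.
3. $\eta^y_{k_z+k_y+1} = \lambda_0 - \lambda$ and $V^y_{k_z+k_y+1} = \sigma_{\xi0}^{-1}X_{nTy,k_y+1}$, with the $t$-th column of $X_{nTy,k_y+1}$ given by $X_{nty,k_y+1} = G_{nt}\left(X_{nty}\beta_{y0} + \Gamma_{ny}f_{yt} + \varepsilon_{nt}\delta_0 + \xi_{nt}\right)$.
4. $\eta^y_{k_z+k_y+2} = \alpha_{\xi0} - \alpha_\xi$ and $V^y_{k_z+k_y+2} = -\Xi_{nT} - \sigma_{\xi0}\Gamma_{ny}F'_{Ty}$.
5. For $k = 1, \cdots, p$, $\eta^y_{k_z+k_y+2+k} = \delta_{0k} - \delta_k$; define $e_k$ as a $p\times1$ vector whose $k$-th element is 1 and all others are 0, and $V^y_{k_z+k_y+2+k} = \sigma_{\xi0}^{-1}\left(e'_k\otimes I_n\right)\left(\sum_{l=1}^{k_z}X_{nTz,l}(\beta_{z0,l} - \beta_{z,l}) + \Gamma_{nz}F'_{Tz} + E_{nT}\right)$.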
Item (3) comes from the spatial interaction and (5) comes from the control function. Similarly, the pertur-
bation theory gives
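$$
\begin{aligned}
L^y_{nT}(\theta) ={}& L^y_{nT}(\theta_0) + \sum_{k=1}^{k_z+k_y+2+p}\eta^y_k\frac{2}{nT\sigma_{\xi0}}\mathrm{tr}\left(M_{\Gamma_{ny}}\Xi_{nT}M_{F_{Ty}}V^{y\prime}_k\right)\\
&- \sum_{k=1}^{k_z+k_y+2+p}\eta^y_k\frac{2}{nT\sigma^2_{\xi0}}\mathrm{tr}\Big(M_{\Gamma_{ny}}V^y_kM_{F_{Ty}}\Xi'_{nT}P_{\Gamma_{ny},F_{Ty}}\Xi'_{nT} + M_{\Gamma_{ny}}\Xi_{nT}M_{F_{Ty}}V^{y\prime}_kP_{\Gamma_{ny},F_{Ty}}\Xi'_{nT}\\
&\qquad\qquad + M_{\Gamma_{ny}}\Xi_{nT}M_{F_{Ty}}\Xi'_{nT}P_{\Gamma_{ny},F_{Ty}}V^{y\prime}_k\Big)\\
&- (\lambda_0-\lambda)(\alpha_{\xi0}-\alpha_\xi)\frac{2}{nT\sigma^2_{\xi0}}\mathrm{vec}(\Xi_{nT})'\,\mathbf{G}_{nT}\,\mathrm{vec}(\Xi_{nT})\\
&+ \sum_{k_1,k_2=1}^{k_z+k_y+2+p}\eta^y_{k_1}\eta^y_{k_2}\frac{1}{nT}\mathrm{tr}\left(M_{\Gamma_{ny}}V^y_{k_1}M_{F_{Ty}}V^{y\prime}_{k_2}\right) + L^{rem,y}_{nT}(\theta), \qquad (9)
\end{aligned}
$$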
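where
$$
L^y_{nT}(\theta_0) = \frac{1}{nT}\sum_{r=R_{y0}+1}^{n}\mu_r\left(\Gamma_{ny}F'_{Ty}F_{Ty}\Gamma'_{ny} + \sigma_{\xi0}^{-1}\Xi_{nT}F_{Ty}\Gamma'_{ny} + \sigma_{\xi0}^{-1}\Gamma_{ny}F'_{Ty}\Xi'_{nT} + \sigma_{\xi0}^{-2}\Xi_{nT}\Xi'_{nT}\right),
$$
and the remainder term satisfies $L^{rem,y}_{nT}(\theta) = O_P\left(\|\theta-\theta_0\|_2^3\right) + O_P\left(\|\theta-\theta_0\|_2^2\, n^{-\frac12}\right) + O_P\left(\|\theta-\theta_0\|_2\, n^{-\frac32}\right) + O_P\left(n^{-\frac52}\right)$. In Eq. (9), the term $(\lambda_0-\lambda)(\alpha_{\xi0}-\alpha_\xi)\frac{2}{nT\sigma^2_{\xi0}}\mathrm{vec}(\Xi_{nT})'\,\mathbf{G}_{nT}\,\mathrm{vec}(\Xi_{nT})$ comes from the interaction between the spatially generated regressor $X_{nTy,k_y+1}$ and the error term, while the interaction between the exogenous regressors and the error term can be absorbed into $O_P\left(\|\theta-\theta_0\|_2^2\, n^{-\frac12}\right)$ in the remainder.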
The total number of parameters in $\theta$ is $k = k_z + k_y + 2 + J + p$. Substituting Eqs. (8) and (9) into the
QML objective function (6), and expanding $\log\left|\Sigma_\varepsilon^{-\frac12}\right|$, $\log\sigma_\xi^{-1}$ and $\log|S_{nt}(\lambda)|$ around their respective true
values, we have
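$$
\begin{aligned}
Q_{nT}(\theta) ={}& -\frac{p+1}{2}\log 2\pi - \frac12\left(\log\sigma^2_{\xi0} - 2\sigma_{\xi0}\left(\alpha_\xi-\alpha_{\xi0}\right) + \sigma^2_{\xi0}\left(\alpha_\xi-\alpha_{\xi0}\right)^2\right)\\
&- \frac12\left(\log|\Sigma_{\varepsilon0}| - 2\sum_{j=1}^{J}\mathrm{tr}\left(\Sigma_{\varepsilon0}^{\frac12}\Omega_j\right)(\alpha_j-\alpha_{j0}) + \sum_{j,k=1}^{J}\mathrm{tr}\left(\Sigma_{\varepsilon0}^{\frac12}\Omega_j\Sigma_{\varepsilon0}^{\frac12}\Omega_k\right)(\alpha_j-\alpha_{j0})(\alpha_k-\alpha_{k0})\right)\\
&+ \frac{1}{nT}\sum_{t=1}^{T}\left(\log|S_{nt}| - \mathrm{tr}(G_{nt})(\lambda-\lambda_0) - \frac12\mathrm{tr}\left(G^2_{nt}\right)(\lambda-\lambda_0)^2\right)\\
&- \frac12L^z_{nT}(\theta_0) - \frac12L^y_{nT}(\theta_0) + \frac{1}{\sqrt{nT}}(\theta-\theta_0)'\tilde{C}^{(1)}_{nT} - \frac12(\theta-\theta_0)'\tilde{C}_{nT}(\theta-\theta_0) + Q^{rem}_{nT}(\theta)\\
={}& Q_{nT}(\theta_0) + \frac{1}{\sqrt{nT}}(\theta-\theta_0)'C^{(1)}_{nT} - \frac12(\theta-\theta_0)'C_{nT}(\theta-\theta_0) + Q^{rem}_{nT}(\theta), \qquad (10)
\end{aligned}
$$
where $\tilde{C}^{(1)}_{nT}$ and $C^{(1)}_{nT}$ are $k\times1$ vectors, $\tilde{C}_{nT}$ and $C_{nT}$ are $k\times k$ matrices, and the remainder term satisfies $Q^{rem}_{nT}(\theta) = O_P\left(\|\theta-\theta_0\|_2^3\right) + O_P\left(\|\theta-\theta_0\|_2^2\, n^{-\frac12}\right) + O_P\left(\|\theta-\theta_0\|_2\, n^{-\frac32}\right) + O_P\left(n^{-\frac52}\right)$. The detailed expressions can be found in the appendix.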
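Define $\gamma_{nT} = \frac{1}{\sqrt{nT}}C^{-1}_{nT}C^{(1)}_{nT}$, $e_{nT} = \mathrm{vec}\left(\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)E_{nT}\right)$, and $\xi_{nT} = \sigma_{\xi0}^{-1}\mathrm{vec}(\Xi_{nT})$. The limiting distribution of the random vector $C^{(1)}_{nT}$ can be derived using the Cramér-Wold device. Let $\nu$ be an arbitrary conformable vector, which gives a linear combination $R^\nu_{nT} = \nu'C^{(1)}_{nT}$ of the elements of $C^{(1)}_{nT}$. $R^\nu_{nT}$ can be expressed in the following general form,
$$
R^\nu_{nT} - \varphi^\nu_{nT} = \frac{1}{\sqrt{nT}}\left(a^{\nu\prime}_{nT}e_{nT} + e'_{nT}A^\nu_{nT}e_{nT} + b^{\nu\prime}_{1nT}\xi_{nT} + \xi'_{nT}B^\nu_{nT}\xi_{nT} + e'_{nT}D^\nu_{nT}\xi_{nT} - \mathrm{tr}(A^\nu_{nT}) - \mathrm{tr}(B^\nu_{nT})\right) + o_P(1),
$$
where $\varphi^\nu_{nT} = \nu'\varphi_{nT}$, with $\varphi_{nT}$ defined in Proposition 2 below, is the leading order bias term. The $b^\nu_{1nT}$ and $B^\nu_{nT}$ are in general functions of $e_{nT}$, while $a^\nu_{nT}$, $A^\nu_{nT}$ and $D^\nu_{nT}$ are constant vectors or matrices. The limiting distribution of $R^\nu_{nT} - \varphi^\nu_{nT}$ has been derived in Qu et al. (2015) using a martingale difference CLT.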
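Lemma 3. Let $\sigma^2_{R^\nu_{nT}}$ denote the variance of $R^\nu_{nT} - \varphi^\nu_{nT}$. Assuming that $\sigma^2_{R^\nu_{nT}}$ is bounded away from 0 and Assumptions 1-3 and 6-8 hold, then $\frac{R^\nu_{nT} - \varphi^\nu_{nT}}{\sigma_{R^\nu_{nT}}} \xrightarrow{d} N(0,1)$.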
To show the probability limit of its variance, appropriate LLNs are needed. In Qu et al. (2015), concentrating out the additive individual and time effects gives rise to statistics of variables transformed by the projectors $I_n - \frac1n\ell_n\ell'_n$ and $I_T - \frac1T\ell_T\ell'_T$, where $\ell_n$ and $\ell_T$ are vectors of 1's. Because individual and/or time observations of spatial units are averaged over space and/or time, the LLN in the preceding Lemma 2 is not applicable, and they establish an additional LLN in their Proposition 1. In this paper, with interactive effects instead of additive effects, the projectors of loadings and factors are more general. However,
$$M_{\Gamma_{ny}} = I_n - \frac1n\Gamma_{ny}\left(\frac1n\Gamma'_{ny}\Gamma_{ny}\right)^{-1}\Gamma'_{ny} = I_n - \frac1n\sum_{r=1}^{R_{y0}}\zeta_{ny,r}\zeta'_{ny,r}\quad\text{and}\quad M_{F_{Ty}} = I_T - \frac1TF_{Ty}\left(\frac1TF'_{Ty}F_{Ty}\right)^{-1}F'_{Ty} = I_T - \frac1T\sum_{r=1}^{R_{y0}}\zeta_{Ty,r}\zeta'_{Ty,r},$$
where $\zeta_{ny,r}$ and $\zeta_{Ty,r}$ are respectively the $r$-th columns of $\Gamma_{ny}\left(\frac1n\Gamma'_{ny}\Gamma_{ny}\right)^{-\frac12}$ and $F_{Ty}\left(\frac1TF'_{Ty}F_{Ty}\right)^{-\frac12}$. With Assumptions 8 and 3(4), elements of $\zeta_{ny,r}$ and $\zeta_{Ty,r}$ are uniformly bounded, and it can be checked that the LLNs in Qu et al. (2015) can be extended to statistics with these general projectors.
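Lemma 4. Under Assumptions 1, 2, 3, 6 and 8, define $\varsigma^*_{nT}$ as in Lemma 5, then
$$\frac{1}{nT}\left(\varsigma^{*\prime}_{nT}\mathbf{G}'_{nT}\left(M_{F_{Ty}}\otimes M_{\Gamma_{ny}}\right)\varsigma^*_{nT} - E\left(\varsigma^{*\prime}_{nT}\mathbf{G}'_{nT}\left(M_{F_{Ty}}\otimes M_{\Gamma_{ny}}\right)\varsigma^*_{nT}\right)\right) = o_P(1), \text{ and}$$
$$\frac{1}{nT}\left(\varsigma^{*\prime}_{nT}\mathbf{G}'_{nT}\left(M_{F_{Ty}}\otimes M_{\Gamma_{ny}}\right)\mathbf{G}_{nT}\varsigma^*_{nT} - E\left(\varsigma^{*\prime}_{nT}\mathbf{G}'_{nT}\left(M_{F_{Ty}}\otimes M_{\Gamma_{ny}}\right)\mathbf{G}_{nT}\varsigma^*_{nT}\right)\right) = o_P(1).$$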
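Lemma 3 implies that $\left\|C^{(1)}_{nT}\right\|_2 = O_P(1)$. Completing the squares on the right hand side of Eq. (10),
$$Q_{nT}(\theta) = Q_{nT}(\theta_0) - \frac12(\theta-\theta_0-\gamma_{nT})'C_{nT}(\theta-\theta_0-\gamma_{nT}) + \frac12\gamma'_{nT}C_{nT}\gamma_{nT} + Q^{rem}_{nT}(\theta). \qquad (11)$$
At $\theta = \theta_0 + \gamma_{nT}$, $Q_{nT}(\theta_0+\gamma_{nT}) = Q_{nT}(\theta_0) + \frac12\gamma'_{nT}C_{nT}\gamma_{nT} + Q^{rem}_{nT}(\theta_0+\gamma_{nT})$. Now let $\hat\theta_{nT}$ be the QML estimator. From Eq. (11),
$$Q_{nT}(\hat\theta_{nT}) = Q_{nT}(\theta_0) - \frac12\left(\hat\theta_{nT}-\theta_0-\gamma_{nT}\right)'C_{nT}\left(\hat\theta_{nT}-\theta_0-\gamma_{nT}\right) + \frac12\gamma'_{nT}C_{nT}\gamma_{nT} + Q^{rem}_{nT}(\hat\theta_{nT}).$$
Because $\hat\theta_{nT}$ maximizes the objective function, $Q_{nT}(\hat\theta_{nT}) \ge Q_{nT}(\theta_0+\gamma_{nT})$, which implies that
$$\left(\hat\theta_{nT}-\theta_0-\gamma_{nT}\right)'C_{nT}\left(\hat\theta_{nT}-\theta_0-\gamma_{nT}\right) \le 2\left(Q^{rem}_{nT}(\hat\theta_{nT}) - Q^{rem}_{nT}(\theta_0+\gamma_{nT})\right). \qquad (12)$$
Assuming that $C_{nT}$ is positive definite, it follows that $\left(\hat\theta_{nT}-\theta_0-\gamma_{nT}\right)'C_{nT}\left(\hat\theta_{nT}-\theta_0-\gamma_{nT}\right) \ge b\left\|\hat\theta_{nT}-\theta_0-\gamma_{nT}\right\|_2^2$ for some $b > 0$, and $\|\gamma_{nT}\|_2 = O_P\left(\frac{1}{\sqrt{nT}}\right)$ from $\left\|C^{(1)}_{nT}\right\|_2 = O_P(1)$. Using an argument similar to that in Shi and Lee (2015), it follows that $b\left\|\hat\theta_{nT}-\theta_0-\gamma_{nT}\right\|_2^2 + O_P\left(\left\|\hat\theta_{nT}-\theta_0-\gamma_{nT}\right\|_2 n^{-\frac32}\right) + O_P\left(n^{-\frac52}\right) \le 0$, which implies that $\sqrt{nT}\left\|\hat\theta_{nT}-\theta_0-\gamma_{nT}\right\|_2 = O_P\left(n^{-\frac14}\right)$, as $n$ and $T$ are proportional in the limit. Therefore,
$$\sqrt{nT}\left(\hat\theta_{nT}-\theta_0\right) = \sqrt{nT}\gamma_{nT} + o_P(1) = C^{-1}_{nT}C^{(1)}_{nT} + o_P(1).$$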
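Recall that the parameter vector is $\theta = \left(\beta'_z, \beta'_y, \lambda, \alpha_\xi, \alpha', \delta'\right)'$. In Proposition 2 below, denote accordingly,
$$
\varphi_{nT} = \begin{pmatrix}
0\\
0\\
-\frac{1}{\sqrt{nT}}\mathrm{tr}\left[\left(P_{F_{Ty}}\otimes I_n\right)\mathbf{G}_{nT} + \left(I_T\otimes P_{\Gamma_{ny}}\right)\mathbf{G}_{nT}\right]\\
\left(\sqrt{\frac{n}{T}} + \sqrt{\frac{T}{n}}\right)R_{y0}\sigma_{\xi0}\\
R_{z0}\sqrt{\frac{n}{T}}\,\mathrm{tr}\left[\Sigma_{\varepsilon0}^{\frac12}\Omega_1\right] + \sqrt{\frac{T}{n}}\,\mathrm{tr}\left[\left(\Sigma_{\varepsilon0}^{\frac12}\Omega_1\otimes I_n\right)P_{\Gamma_{nz}}\right]\\
\vdots\\
R_{z0}\sqrt{\frac{n}{T}}\,\mathrm{tr}\left[\Sigma_{\varepsilon0}^{\frac12}\Omega_J\right] + \sqrt{\frac{T}{n}}\,\mathrm{tr}\left[\left(\Sigma_{\varepsilon0}^{\frac12}\Omega_J\otimes I_n\right)P_{\Gamma_{nz}}\right]\\
0
\end{pmatrix}.
$$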
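Proposition 2. Assuming that $\frac{n}{T} \to \kappa > 0$ as $n, T \to \infty$, $\Sigma = \lim_{n,T\to\infty}\Sigma_{nT}$ is positive definite, $\Pi = \lim_{n,T\to\infty}\Pi_{nT}$, which are given in Eq. (A.30) of the appendix, and Assumptions 1-3, 4 or 5, 6-8 hold, then
$$\sqrt{nT}\left(\hat\theta_{nT} - \theta_0 - \frac{1}{\sqrt{nT}}C^{-1}_{nT}\varphi_{nT}\right) = C^{-1}_{nT}\left(C^{(1)}_{nT} - \varphi_{nT}\right) \xrightarrow{d} N\left(0, \Sigma^{-1}(\Sigma+\Pi)\Sigma^{-1}\right).$$
If the disturbance terms are normally distributed, $\Pi = 0$ and $\sqrt{nT}\left(\hat\theta_{nT} - \theta_0 - \frac{1}{\sqrt{nT}}C^{-1}_{nT}\varphi_{nT}\right) \xrightarrow{d} N\left(0, \Sigma^{-1}\right)$.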
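It can be shown that the variance of $C^{(1)}_{nT} - \varphi_{nT}$ is $\Sigma_{nT} + \Pi_{nT}$. Comparing the terms of $C_{nT}$ with $\Sigma_{nT}$
reveals that $\mathrm{plim}_{n,T\to\infty}\left(C_{nT} - \Sigma_{nT}\right) = 0$, which gives the limiting variance matrix $\Sigma^{-1}(\Sigma+\Pi)\Sigma^{-1}$.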
Proposition 2 shows that the limiting distributions of the QML estimates are not centered at their true values
but have a bias term of the order $O_P\left(\frac{1}{\sqrt{nT}}\right)$. The bias term is due to incidental parameters, namely factor
loadings and time factors, as the estimates for them have a slower convergence rate. As in Shi and Lee (2015),
the bias can be corrected by substituting the QML estimates $\hat\theta_{nT}$ and the estimated projectors of factors
and factor loadings into $\varphi_{nT}$ and $\Sigma_{nT}$. The bias corrected estimator is $\hat\theta^c_{nT} = \hat\theta_{nT} - \frac{1}{\sqrt{nT}}\hat\Sigma^{-1}_{nT}\hat\varphi_{nT}$.
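Corollary 1. Assuming that $\frac{n}{T} \to \kappa > 0$ as $n, T \to \infty$, $\Sigma = \lim_{n,T\to\infty}\Sigma_{nT}$ is positive definite, $\Pi = \lim_{n,T\to\infty}\Pi_{nT}$, and Assumptions 1-3, 4 or 5, 6-8 hold, then $\sqrt{nT}\left(\hat\theta^c_{nT} - \theta_0\right) \xrightarrow{d} N\left(0, \Sigma^{-1}(\Sigma+\Pi)\Sigma^{-1}\right)$. If the disturbance terms are normally distributed, $\sqrt{nT}\left(\hat\theta^c_{nT} - \theta_0\right) \xrightarrow{d} N\left(0, \Sigma^{-1}\right)$.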
4.4. The number of factors
As long as the number of factors specified is not fewer than the true number of factors, the QML es-
timator is consistent. However, the limiting distribution is under the premise that the number of factors is
correctly specified. Although Moon and Weidner (2015) show that under certain conditions, the limiting
distribution of the estimator for a regression panel is invariant to the inclusion of redundant factors, its finite
sample performance may suffer, as Lu and Su (2015) emphasize. If factors are interpreted as omitted vari-
ables, their detection is a first step in trying to measure them. In this section, we demonstrate how the factor
number can be consistently determined.
Several criteria on factor number selection have been proposed in the literature, including Bai and Ng
(2002)’s PC and IC criteria, Onatski (2010)’s Edge Distribution estimator, and Ahn and Horenstein (2013)’s
eigenvalue ratio and growth ratio criteria. We specifically show how Ahn and Horenstein (2013)'s eigenvalue ratio and growth ratio criteria can be extended here. Denote by $\tilde\theta_{nT}$ the QML estimator with $R_z \ge R_{z0}$ and $R_y \ge R_{y0}$, which is a preliminary consistent estimator using a larger number of factors than the true one. For the $z$ equation, let $\tilde D_{nT} = \left(\tilde d_{n1}, \cdots, \tilde d_{nT}\right)$ with $\tilde d_{nt} = \left(\tilde\Sigma_\varepsilon^{-\frac12}\otimes I_n\right)\left(z_{nt} - X_{ntz}\tilde\beta_z\right)$. Using the notation of Eq. (7), we have $\tilde D_{nT} = \Gamma_{nz}F'_{Tz} + \left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)E_{nT} + E_{nT}(\tilde\theta_{nT})$, with $E_{nT}(\tilde\theta_{nT}) = \sum_{k=1}^{k_z+J}\eta^z_kV^z_k + \sum_{k_1,k_2=1}^{k_z+J}\eta^z_{k_1}\eta^z_{k_2}V^z_{k_1k_2}$. Denote $\mu^z_{nT,k} = \mu_k\left(\frac{1}{nT}\tilde D_{nT}\tilde D'_{nT}\right)$ for $k \ge 1$, $V^z(k) = \sum_{j=k+1}^{np}\mu^z_{nT,j}$ for $k \ge 0$, and $\mu^z_{nT,0} = V^z(0)/\log(\min(n,T))$. Define the "eigenvalue ratio" statistic, $ER^z(k) = \frac{\mu^z_{nT,k}}{\mu^z_{nT,k+1}}$, and the "growth ratio" statistic, $GR^z(k) = \log\left(\frac{V^z(k-1)}{V^z(k)}\right)\Big/\log\left(\frac{V^z(k)}{V^z(k+1)}\right)$. The number of factors in the $z$ equation can be selected according to $\hat k^z_{ER} = \arg\max_{0\le k\le k_{max}}ER^z(k)$ or $\hat k^z_{GR} = \arg\max_{0\le k\le k_{max}}GR^z(k)$.
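For the $y$ equation, let $\tilde G_{nT} = (\tilde g_{n1}, \cdots, \tilde g_{nT})$ with $\tilde g_{nt} = \tilde\sigma_\xi^{-1}\left(S_{nt}(\tilde\lambda)y_{nt} - X_{nty}\tilde\beta_y - \left(\tilde\delta'\otimes I_n\right)\left(z_{nt} - X_{ntz}\tilde\beta_z\right)\right)$. From Eq. (9), $\tilde G_{nT} = \Gamma_{ny}F'_{Ty} + \sigma_{\xi0}^{-1}\Xi_{nT} + \Xi_{nT}(\tilde\theta_{nT})$, with $\Xi_{nT}(\tilde\theta_{nT}) = \sum_{k=1}^{k_z+k_y+2+p}\eta^y_kV^y_k + \sum_{k_1,k_2=1}^{k_z+k_y+2+p}\eta^y_{k_1}\eta^y_{k_2}V^y_{k_1k_2}$. $\hat k^y_{ER}$ and $\hat k^y_{GR}$ are defined similarly via $\mu^y_{nT,k} = \mu_k\left(\frac{1}{nT}\tilde G_{nT}\tilde G'_{nT}\right)$.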
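Proposition 3. Assuming that $\frac{n}{T} \to \kappa > 0$ as $n, T \to \infty$, Assumptions 2, 3 and 8 hold, the preliminary estimator satisfies $\left\|\tilde\theta_{nT} - \theta_0\right\|_2 = o_P\left(n^{-\frac12}\right)$, $R_{z0} \ge 0$ and $R_{y0} \ge 0$, then $\lim_{n,T\to\infty}\Pr\left(\hat k^z_{ER} = R_{z0}\right) = \lim_{n,T\to\infty}\Pr\left(\hat k^z_{GR} = R_{z0}\right) = \lim_{n,T\to\infty}\Pr\left(\hat k^y_{ER} = R_{y0}\right) = \lim_{n,T\to\infty}\Pr\left(\hat k^y_{GR} = R_{y0}\right) = 1$.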
Therefore the number of factors can be determined consistently. The proof, which is in the appendix,
checks that the relevant assumptions of Ahn and Horenstein (2013) are satisfied and their result then applies
here. Note that the case with no factors (Rz0 = 0 or Ry0 = 0) is covered.
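The two selection rules of this subsection can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `select_num_factors` and the simulated data are hypothetical, the residual matrix is taken as given rather than computed from a preliminary QML estimate, and the growth ratio is evaluated through the identity $V(k-1) = V(k) + \mu_k$, with the mock eigenvalue $\mu_0$ used at $k = 0$ (an assumption of this sketch).

```python
import numpy as np

def select_num_factors(D, n, kmax):
    """Return (k_ER, k_GR) from the eigenvalues of D D' / (nT).

    D is a residual matrix with T columns; kmax + 1 must be smaller than
    the number of rows of D.
    """
    T = D.shape[1]
    mu = np.linalg.eigvalsh(D @ D.T / (n * T))[::-1]   # mu[j-1] = mu_j, descending
    mu = np.clip(mu, 0.0, None)                        # guard against rounding below zero
    tail = np.cumsum(mu[::-1])[::-1]                   # tail[k] = V(k) = sum_{j>k} mu_j
    mu0 = tail[0] / np.log(min(n, T))                  # mock eigenvalue mu_0
    mu_all = np.concatenate(([mu0], mu))               # mu_all[k] = mu_k for k >= 0
    k = np.arange(kmax + 1)
    er = mu_all[k] / mu_all[k + 1]
    # log(V(k-1)/V(k)) = log(1 + mu_k / V(k))
    star = mu_all[: kmax + 2] / tail[: kmax + 2]
    gr = np.log1p(star[k]) / np.log1p(star[k + 1])
    return int(np.argmax(er)), int(np.argmax(gr))

rng = np.random.default_rng(0)
n, T, R = 50, 50, 2
Gamma = rng.uniform(-2, 2, size=(n, R))
F = rng.uniform(-2, 2, size=(T, R))
D = Gamma @ F.T + rng.normal(size=(n, T))   # two strong factors plus noise
k_er, k_gr = select_num_factors(D, n=n, kmax=8)
```

In this simulated example both criteria should pick two factors. The mock eigenvalue is what allows the rules to return zero when the data contain no factor structure.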
5. Monte Carlo simulations
5.1. Design
We study the finite sample performance of the QML estimator and the accuracy of the factor number
selection in samples of different sizes, different degrees of spatial interaction and endogeneity, and different
ratios between the variances of the idiosyncratic error and the factors. We also check the finite sample
performance of the QML estimator in a data specification that tailors to the empirical application in Section
6.
In the main data specification, the outcome equation is $y_{it} = \lambda\sum_{j=1}^{n}w_{it,jt}y_{jt} + x_{it}\beta_y + \gamma'_{yi}f_{yt} + v_{it}$. There
are two unobserved factors in the $y_{it}$ equation. The factors ($f_{yt}$) and factor loadings ($\gamma_{yi}$) are generated from
independent uniform random variables on $[-2,2]$. The observed regressor ($x_{it}$) is a scalar and correlates
with factors through $x_{it} = \frac13\left(\gamma'_{yi}f_{yt} + \gamma'_{yi}\ell_2 + f'_{yt}\ell_2\right) + \eta_{it}$, where $\eta_{it}$ is a uniform random variable on $[-2,2]$
and $\ell_2$ is a $2\times1$ vector of 1's. The spatial weights are correlated with $v_{it}$ according to the following process.
1. Individuals indexed by 1 to $n$ are located successively on a chessboard of dimension $\sqrt{n}\times\sqrt{n}$. The neighborhood structure follows the rook pattern, i.e., two individuals are neighbors if they share a common border. Therefore individuals in the interior of the chessboard have 4 neighbors, and those on the border and the corner have 3 and 2 neighbors, respectively. Let $w^d_{ij} = 1$ if $i$ and $j$ are neighbors and $w^d_{ij} = 0$ otherwise, and denote the $n\times n$ matrix $W_d = \left[w^d_{ij}\right]$. Note that in this experiment, the geographic locations do not change over time.
2. Let $z_{it} = x_{it}\beta_z + \gamma_{zi}f_{zt} + \varepsilon_{it}$. There is one unobserved factor in the $z_{it}$ equation. The scalar $\gamma_{zi}$ is generated from a uniform random variable on $[-2,2]$, and $f_{zt}$ is the first element of $f_{yt}$. Let $w^e_{it,jt} = w^d_{ij}\times\min\left(\frac{1}{|z_{it}-z_{jt}|}, 2\right)$. The spatial dependence is stronger for neighbors with similar $z$'s. In line with the condition that the functions $h_{ij}(\cdot)$ are uniformly bounded, $w^e_{it,jt}$ is capped at 2 such that individuals whose $z$'s are very similar ($|z_{it}-z_{jt}| < 0.5$) have the same degree of spatial effect. The spatial weights in the outcome equation are row-normalized, $w_{it,jt} = \frac{w^e_{it,jt}}{\sum_{j=1}^{n}w^e_{it,jt}}$. Note that the spatial weights matrix complies with Assumption 6.
3. The idiosyncratic errors $v_{it}$ and $\varepsilon_{it}$ are generated from i.i.d. bivariate normal random variables with mean 0 and variance $\frac43\vartheta\begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}$. The inverse of $\vartheta$ is a measure of the signal to noise ratio. Each common factor and the idiosyncratic error have the same variance when $\vartheta = 1$, and the latter is more variable when $\vartheta > 1$.
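The three steps of the main data specification can be put together in a short simulation sketch. The function names `rook_adjacency` and `simulate_panel`, the argument names and the default values are hypothetical choices for illustration; the outcome is generated by solving the reduced form $y_{nt} = (I_n - \lambda W_{nt})^{-1}(\cdot)$ period by period.

```python
import numpy as np

def rook_adjacency(side):
    """n x n rook-contiguity matrix W_d for a side x side chessboard (n = side**2)."""
    n = side * side
    Wd = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, side)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < side and 0 <= cc < side:
                Wd[i, rr * side + cc] = 1.0
    return Wd

def simulate_panel(side, T, lam, rho, vartheta=1.0, beta_y=1.0, beta_z=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n = side * side
    Wd = rook_adjacency(side)
    gamma_y = rng.uniform(-2, 2, size=(n, 2))        # loadings, two factors in y
    f_y = rng.uniform(-2, 2, size=(T, 2))            # factors in y
    gamma_z = rng.uniform(-2, 2, size=n)             # loading, one factor in z
    common_y = gamma_y @ f_y.T                       # gamma_yi' f_yt, n x T
    x = (common_y + gamma_y.sum(axis=1)[:, None] + f_y.sum(axis=1)[None, :]) / 3.0
    x = x + rng.uniform(-2, 2, size=(n, T))          # regressor correlated with factors
    cov = (4.0 / 3.0) * vartheta * np.array([[1.0, rho], [rho, 1.0]])
    err = rng.multivariate_normal(np.zeros(2), cov, size=(n, T))
    v, eps = err[..., 0], err[..., 1]                # correlated errors create endogeneity
    z = beta_z * x + np.outer(gamma_z, f_y[:, 0]) + eps
    y = np.empty((n, T))
    W = np.empty((T, n, n))
    for t in range(T):
        gap = np.abs(z[:, t][:, None] - z[:, t][None, :])
        with np.errstate(divide="ignore"):           # diagonal gaps are zero
            we = Wd * np.minimum(1.0 / gap, 2.0)     # capped at 2, zero for non-neighbors
        W[t] = we / we.sum(axis=1, keepdims=True)    # row normalization
        rhs = beta_y * x[:, t] + common_y[:, t] + v[:, t]
        y[:, t] = np.linalg.solve(np.eye(n) - lam * W[t], rhs)
    return {"y": y, "z": z, "x": x, "W": W, "Wd": Wd}

panel = simulate_panel(side=5, T=25, lam=0.2, rho=0.2)
```

With `side=5` this corresponds to the n = 25 designs; the alternative specification with a second, time-invariant weights matrix would add a term based on the row-normalized `Wd`.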
Our empirical application considers multiple spatial weights matrices. To provide evidence that our estimator can perform well in that setting, in an alternative data specification, the outcome equation is replaced by $y_{it} = \lambda_1\sum_{j=1}^{n}w_{1,ij}y_{jt} + \lambda_2\sum_{j=1}^{n}w_{it,jt}y_{jt} + x_{it}\beta_y + \gamma'_{yi}f_{yt} + v_{it}$, where $w_{it,jt}$ is the same as in the main data specification, representing time varying spatial relations, and $W_1 = [w_{1,ij}]$ is the row-normalized $W_d$, representing geographic contiguity. Other parts of the data specification are the same.
5.2. Monte Carlo results
1000 Monte Carlo replications are carried out. The numerical maximization routine starts at multiple
values, because the objective function is not concave and multiple local maxima may exist. We vary the
degree of spatial interaction ($\lambda = 0.2$ or $\lambda = 0.6$) and endogeneity ($\rho = 0.2$ or $\rho = 0.6$). $\alpha_\xi$ denotes $\sigma_\xi^{-1}$
and $\alpha$ denotes $\Sigma_\varepsilon^{-\frac12}$. Table 1 reports the Monte Carlo results for the QML estimator. The magnitude of biases
generally decreases as n and T increase. The coverage probability (CP) is calculated using the asymptotic
variance covariance matrix Σ in Eq. (A.30), and the nominal coverage probability is set at 95%. The
estimates for the variance parameters ($\alpha_\xi$ and $\alpha$) have noticeable biases in finite samples, and as a result,
their CPs are well below 95%. The CPs for the other parameter estimates are closer to the nominal level, although
generally still slightly below 95%. Overall, hypothesis tests based on the uncorrected estimator may therefore over-reject. The Monte Carlo results of the bias
corrected estimator are reported in Table 2. The biases for the variance parameters have been significantly
reduced. The CPs are more accurate, so more reliable statistical inference can be based on the bias corrected
estimator.
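The bias and coverage entries reported in the tables can be computed from replication output along the following lines. This is a sketch under the assumption that a two-sided normal interval with critical value 1.96 is used for the 95% nominal level; the function name `bias_and_coverage` and the toy inputs are hypothetical.

```python
import numpy as np

def bias_and_coverage(estimates, std_errors, true_value, z_crit=1.96):
    """Monte Carlo bias and coverage probability for one scalar parameter.

    estimates and std_errors hold one entry per replication.
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    bias = estimates.mean() - true_value
    # A replication "covers" if the true value lies inside estimate +/- z_crit * se.
    covered = np.abs(estimates - true_value) <= z_crit * std_errors
    return bias, covered.mean()

# Toy input: four replications with a common standard error of 0.1.
bias, cp = bias_and_coverage([0.9, 1.0, 1.1, 1.5], [0.1, 0.1, 0.1, 0.1], true_value=1.0)
```

When the bias is large relative to the standard error, as for the variance parameters before correction, most intervals miss the true value and the coverage probability falls far below the nominal level.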
Table 3 shows that estimates suffer from substantial bias if the spatial weights matrix is treated erro-
neously as exogenous in the estimation, where only the outcome equation is estimated. The biases still
remain in large samples. The estimate for the spatial effect is biased upward, and the bias is larger in
samples with a higher degree of endogeneity.
Proposition 1 shows that the QML estimates are consistent even when the number of factors is over-specified.
In Table 4, we report the performance of the bias corrected estimators when the number of factors
in the outcome equation is over-specified by 1, i.e. $R_y = 3$ while the true $R_{y0} = 2$. Although the estimates
are still consistent, in these finite samples the variance parameter ($\alpha_\xi$) has noticeable bias that is not
removed by the bias correction procedure. Tables 9 and 10 in the appendix report additional Monte Carlo
results. As the number of redundant factors increases, the CP deteriorates. Therefore, for valid inference, it
seems important that the correct number of factors is chosen.
We check the accuracy of factor selections given by the eigenvalue ratio (ER) and growth ratio (GR)
criteria. Figure 1 shows the number of incorrect selections in 1000 simulations. The accuracy is almost
100% when the variances of elements of the idiosyncratic errors and the factors are the same ($\vartheta = 1$), even
in a small sample (n = 25, T = 25). We then make the idiosyncratic errors up to 9 times as variable
as the factors, and find that the selection errors increase as a result. However, the selection accuracy
quickly improves as the sample size increases, and 100% accuracy is achieved in the sample with n = 81 and
T = 81. The ER and GR criteria have similar overall performance, and the GR criterion performs slightly
better when the factors are weak (high $\vartheta$).
The empirical application in the next section uses two spatial weights matrices and the estimation results
there reveal different degrees of spatial interactions. Table 5, whose design mirrors features of the empirical
application, confirms that our bias corrected estimator still performs well in the alternative specification with
multiple spatial weights matrices.
Table 1: Performance of the QML Estimator $\hat\theta_{nT}$

      n   T   λ0   ρ0          βz        βy        λ         αξ       α        δ
(1)   25  25  0.2  0.2  Bias   0.00252  -0.00113  -0.00211  0.08269  0.03813  0.00099
                        CP     0.941     0.898     0.903    0.159    0.674    0.928
      49  25  0.2  0.2  Bias   0.00013   0.00142  -0.00143  0.05931  0.02786  0.00086
                        CP     0.942     0.926     0.918    0.119    0.675    0.955
      25  49  0.2  0.2  Bias  -0.00115   0.00001  -0.00190  0.05873  0.02836  0.00128
                        CP     0.938     0.938     0.923    0.140    0.673    0.934
      49  49  0.2  0.2  Bias   0.00028   0.00146  -0.00180  0.03846  0.01868  0.00029
                        CP     0.945     0.946     0.916    0.187    0.696    0.942
(2)   25  25  0.2  0.6  Bias   0.00108  -0.00334   0.00013  0.10114  0.03806  0.00003
                        CP     0.941     0.913     0.906    0.150    0.687    0.912
      49  25  0.2  0.6  Bias   0.00020   0.00063  -0.00104  0.07260  0.02746  0.00051
                        CP     0.936     0.931     0.928    0.114    0.693    0.927
      25  49  0.2  0.6  Bias  -0.00119  -0.00074   0.00076  0.07245  0.02773  0.00063
                        CP     0.935     0.940     0.922    0.122    0.681    0.921
      49  49  0.2  0.6  Bias   0.00046   0.00128  -0.00143  0.04699  0.01856  0.00038
                        CP     0.938     0.943     0.929    0.187    0.706    0.924
(3)   25  25  0.6  0.2  Bias   0.00252  -0.00003  -0.00284  0.08201  0.03813  0.00137
                        CP     0.941     0.901     0.907    0.183    0.674    0.926
      49  25  0.6  0.2  Bias   0.00013   0.00176  -0.00129  0.05904  0.02786  0.00092
                        CP     0.942     0.927     0.930    0.141    0.675    0.956
      25  49  0.6  0.2  Bias  -0.00115   0.00082  -0.00232  0.05822  0.02836  0.00154
                        CP     0.938     0.938     0.917    0.156    0.673    0.933
      49  49  0.6  0.2  Bias   0.00028   0.00176  -0.00144  0.03821  0.01868  0.00034
                        CP     0.945     0.948     0.926    0.198    0.696    0.940
(4)   25  25  0.6  0.6  Bias   0.00108  -0.00236  -0.00139  0.10062  0.03806  0.00072
                        CP     0.941     0.910     0.912    0.171    0.687    0.916
      49  25  0.6  0.6  Bias   0.00020   0.00082  -0.00083  0.07239  0.02746  0.00060
                        CP     0.936     0.931     0.930    0.126    0.693    0.933
      25  49  0.6  0.6  Bias  -0.00119  -0.00010  -0.00063  0.07216  0.02773  0.00107
                        CP     0.935     0.937     0.929    0.144    0.681    0.919
      49  49  0.6  0.6  Bias   0.00046   0.00150  -0.00112  0.04676  0.01856  0.00050
                        CP     0.938     0.941     0.928    0.205    0.706    0.920

True parameter values: $\beta_{z0} = 1$, $\beta_{y0} = 1$, $\sigma_{v0} = \sqrt{4/3}$, $\sigma_{e0} = \sqrt{4/3}$ (hence $\alpha_0 = (4/3)^{-\frac12}$). In the low endogeneity case, $\rho_0 = 0.2$, $\alpha_{\xi0} = 0.88$ and $\delta_0 = 0.2$. In the high endogeneity case, $\rho_0 = 0.6$, $\alpha_{\xi0} = 1.08$ and $\delta_0 = 0.6$. $\hat\theta_{nT}$ is the QML estimator. The coverage probabilities (CP) are calculated using the theoretical standard deviations obtained from the diagonal elements of $\frac{1}{nT}\Sigma^{-1}_{nT}$.
Table 2: Performance of the Bias-Corrected QML Estimator $\hat\theta^c_{nT}$

      n   T   λ0   ρ0          βz        βy        λ         αξ       α        δ
(1)   25  25  0.2  0.2  Bias   0.00252  -0.00145  -0.00096  0.00543  0.00197   0.00123
                        CP     0.952     0.934     0.934    0.928    0.945     0.944
      49  25  0.2  0.2  Bias   0.00013   0.00117  -0.00026  0.00241  0.00086   0.00105
                        CP     0.945     0.948     0.932    0.936    0.943     0.964
      25  49  0.2  0.2  Bias  -0.00115  -0.00034  -0.00059  0.00188  0.00135   0.00153
                        CP     0.943     0.956     0.939    0.915    0.940     0.951
      49  49  0.2  0.2  Bias   0.00028   0.00122  -0.00076  0.00088  0.00063   0.00048
                        CP     0.948     0.954     0.936    0.933    0.945     0.953
(2)   25  25  0.2  0.6  Bias   0.00108  -0.00307  -0.00057  0.00637  0.00189  -0.00017
                        CP     0.948     0.932     0.935    0.931    0.950     0.933
      49  25  0.2  0.6  Bias   0.00020   0.00038  -0.00004  0.00290  0.00047   0.00073
                        CP     0.948     0.950     0.949    0.932    0.949     0.945
      25  49  0.2  0.6  Bias  -0.00119  -0.00059   0.00029  0.00263  0.00073   0.00051
                        CP     0.948     0.954     0.940    0.932    0.938     0.943
      49  49  0.2  0.6  Bias   0.00047   0.00106  -0.00062  0.00096  0.00051   0.00058
                        CP     0.942     0.952     0.942    0.945    0.950     0.935
(3)   25  25  0.6  0.2  Bias   0.00252  -0.00100  -0.00118  0.00520  0.00197   0.00187
                        CP     0.952     0.937     0.935    0.928    0.945     0.947
      49  25  0.6  0.2  Bias   0.00013   0.00141  -0.00055  0.00227  0.00086   0.00110
                        CP     0.945     0.946     0.947    0.936    0.943     0.964
      25  49  0.6  0.2  Bias  -0.00115  -0.00017  -0.00055  0.00180  0.00135   0.00203
                        CP     0.943     0.959     0.940    0.916    0.940     0.950
      49  49  0.6  0.2  Bias   0.00028   0.00143  -0.00080  0.00074  0.00063   0.00051
                        CP     0.948     0.958     0.937    0.937    0.945     0.952
(4)   25  25  0.6  0.6  Bias   0.00108  -0.00267  -0.00090  0.00613  0.00189   0.00094
                        CP     0.949     0.933     0.936    0.931    0.950     0.936
      49  25  0.6  0.6  Bias   0.00020   0.00053  -0.00028  0.00281  0.00047   0.00081
                        CP     0.948     0.947     0.954    0.935    0.949     0.943
      25  49  0.6  0.6  Bias  -0.00119  -0.00048  -0.00001  0.00262  0.00073   0.00133
                        CP     0.948     0.956     0.948    0.927    0.938     0.941
      49  49  0.6  0.6  Bias   0.00047   0.00126  -0.00068  0.00080  0.00051   0.00068
                        CP     0.942     0.952     0.945    0.944    0.950     0.930

True parameter values: $\beta_{z0} = 1$, $\beta_{y0} = 1$, $\sigma_{v0} = \sqrt{4/3}$, $\sigma_{e0} = \sqrt{4/3}$ (hence $\alpha_0 = (4/3)^{-\frac12}$). In the low endogeneity case, $\rho_0 = 0.2$, $\alpha_{\xi0} = 0.88$ and $\delta_0 = 0.2$. In the high endogeneity case, $\rho_0 = 0.6$, $\alpha_{\xi0} = 1.08$ and $\delta_0 = 0.6$. $\hat\theta^c_{nT}$ is the bias-corrected QML estimator. The coverage probabilities (CP) are calculated using the theoretical standard deviations obtained from the diagonal elements of $\frac{1}{nT}\Sigma^{-1}_{nT}$.
Table 3: Estimation under Misspecification: Assuming Exogenous Spatial Weights Matrix

                        ρ0 = 0.2              ρ0 = 0.6
n   T   λ0             βy        λ           βy        λ
25  25  0.2   Bias    -0.00739   0.02007    -0.03103   0.08171
              E-SD     0.04823   0.03597     0.04883   0.03516
              RMSE     0.04877   0.04117     0.05783   0.08895
49  25  0.2   Bias    -0.00263   0.01775    -0.01665   0.06913
              E-SD     0.03099   0.02466     0.03095   0.02386
              RMSE     0.03109   0.03038     0.03513   0.07313
25  49  0.2   Bias    -0.00552   0.01789    -0.02391   0.07239
              E-SD     0.03121   0.02381     0.03150   0.02282
              RMSE     0.03168   0.02977     0.03953   0.07590
49  49  0.2   Bias    -0.00347   0.01879    -0.02013   0.07490
              E-SD     0.02143   0.01679     0.02157   0.01620
              RMSE     0.02170   0.02519     0.02949   0.07664
25  25  0.6   Bias    -0.00709   0.00981    -0.02968   0.04219
              E-SD     0.04893   0.02346     0.04908   0.02195
              RMSE     0.04942   0.02541     0.05734   0.04755
49  25  0.6   Bias    -0.00355   0.01010    -0.01931   0.03910
              E-SD     0.03159   0.01654     0.03136   0.01548
              RMSE     0.03177   0.01937     0.03682   0.04205
25  49  0.6   Bias    -0.00572   0.00895    -0.02432   0.03767
              E-SD     0.03173   0.01579     0.03195   0.01456
              RMSE     0.03222   0.01814     0.04014   0.04038
49  49  0.6   Bias    -0.00456   0.01102    -0.02298   0.04313
              E-SD     0.02174   0.01164     0.02179   0.01075
              RMSE     0.02220   0.01603     0.03166   0.04445

The estimation mistakenly assumes the spatial weights matrix to be exogenous, hence only the outcome equation is estimated. Interactive effects are still included. True parameter values: $\beta_{z0} = 1$, $\beta_{y0} = 1$, $\sigma_{v0} = \sqrt{4/3}$, $\sigma_{e0} = \sqrt{4/3}$ (hence $\alpha_0 = (4/3)^{-\frac12}$). In the low endogeneity case, $\rho_0 = 0.2$, $\alpha_{\xi0} = 0.88$ and $\delta_0 = 0.2$. In the high endogeneity case, $\rho_0 = 0.6$, $\alpha_{\xi0} = 1.08$ and $\delta_0 = 0.6$. E-SD is the empirical standard deviation of the estimates.
Table 4: Performance of the Bias-Corrected QML Estimator $\hat{\theta}^c_{nT}$ when $R_y = R_{y0}+1$.

      n   T   λ0   ρ0          βz        βy        λ         αξ        α         δ
(1)   25  25  0.2  0.2  Bias   0.00252  -0.00201  -0.00158   0.04184   0.00197   0.00055
                        CP     0.952     0.905     0.897     0.633     0.945     0.922
      49  25  0.2  0.2  Bias   0.00013   0.00093  -0.00090   0.02825   0.00086   0.00139
                        CP     0.945     0.921     0.912     0.678     0.943     0.940
      25  49  0.2  0.2  Bias  -0.00115  -0.00044  -0.00088   0.02740   0.00135   0.00158
                        CP     0.943     0.938     0.911     0.677     0.940     0.938
      49  49  0.2  0.2  Bias   0.00028   0.00138  -0.00082   0.01865   0.00063   0.00054
                        CP     0.948     0.941     0.934     0.699     0.945     0.942
(2)   25  25  0.2  0.6  Bias   0.00108  -0.00316  -0.00067   0.05076   0.00189  -0.00002
                        CP     0.948     0.919     0.907     0.642     0.950     0.914
      49  25  0.2  0.6  Bias   0.00020   0.00079  -0.00063   0.03452   0.00047   0.00048
                        CP     0.948     0.932     0.929     0.668     0.949     0.921
      25  49  0.2  0.6  Bias  -0.00119  -0.00033  -0.00015   0.03395   0.00073   0.00091
                        CP     0.947     0.937     0.930     0.672     0.938     0.926
      49  49  0.2  0.6  Bias   0.00047   0.00138  -0.00074   0.02277   0.00051   0.00059
                        CP     0.942     0.940     0.929     0.708     0.950     0.929
(3)   25  25  0.6  0.2  Bias   0.00252  -0.00130  -0.00188   0.04149   0.00197   0.00147
                        CP     0.952     0.907     0.903     0.648     0.945     0.923
      49  25  0.6  0.2  Bias   0.00013   0.00127  -0.00112   0.02801   0.00086   0.00148
                        CP     0.945     0.920     0.930     0.689     0.943     0.941
      25  49  0.6  0.2  Bias  -0.00115  -0.00007  -0.00103   0.02721   0.00135   0.00222
                        CP     0.943     0.940     0.911     0.698     0.940     0.934
      49  49  0.6  0.2  Bias   0.00028   0.00167  -0.00092   0.01849   0.00063   0.00060
                        CP     0.948     0.940     0.931     0.730     0.945     0.940
(4)   25  25  0.6  0.6  Bias   0.00108  -0.00257  -0.00126   0.05043   0.00189   0.00123
                        CP     0.947     0.919     0.911     0.649     0.950     0.917
      49  25  0.6  0.6  Bias   0.00020   0.00102  -0.00075   0.03433   0.00047   0.00062
                        CP     0.948     0.934     0.935     0.682     0.949     0.923
      25  49  0.6  0.6  Bias  -0.00119  -0.00003  -0.00049   0.03383   0.00073   0.00182
                        CP     0.947     0.940     0.926     0.701     0.938     0.925
      49  49  0.6  0.6  Bias   0.00047   0.00160  -0.00077   0.02259   0.00051   0.00070
                        CP     0.942     0.940     0.935     0.722     0.950     0.927
The true number of factors is 1 in the z equation and 2 in the y equation. In the estimation, the number of factors is set at 1 for the z
equation and 3 for the y equation. True parameter values: βz0 = 1, βy0 = 1, σv0 = √(4/3), σε0 = √(4/3) (hence α0 = (4/3)^(−1/2)).
In the low endogeneity case, ρ0 = 0.2, αξ0 = 0.88 and δ0 = 0.2. In the high endogeneity case, ρ0 = 0.6, αξ0 = 1.08 and δ0 = 0.6.
$\hat{\theta}^c_{nT}$ is the bias-corrected QML estimator. The coverage probabilities (CP) are calculated using the theoretical standard
deviations obtained from the diagonal elements of $\frac{1}{nT}\Sigma_{nT}^{-1}$.
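The parameter values quoted in the table notes can be checked against the reparameterization stated in Section 6 (α = σε⁻¹, αξ = ((1 − ρ²)σv²)^(−1/2), δ = ρσv/σε). The following is a minimal sketch of that arithmetic for the scalar case; the function name `reparameterize` is hypothetical and introduced only for illustration.

```python
import math


def reparameterize(sigma_eps, sigma_v, rho):
    """Map (sigma_eps, sigma_v, rho) to (alpha, alpha_xi, delta).

    Uses the scalar-case definitions given in Section 6:
    alpha = 1/sigma_eps, alpha_xi = ((1 - rho^2) sigma_v^2)^(-1/2),
    delta = rho * sigma_v / sigma_eps.
    """
    alpha = 1.0 / sigma_eps
    alpha_xi = ((1.0 - rho ** 2) * sigma_v ** 2) ** -0.5
    delta = rho * sigma_v / sigma_eps
    return alpha, alpha_xi, delta


sigma = math.sqrt(4.0 / 3.0)  # sigma_v0 = sigma_eps0 in the simulation design

low = reparameterize(sigma, sigma, 0.2)   # low endogeneity case
high = reparameterize(sigma, sigma, 0.6)  # high endogeneity case

print("low :", [round(x, 4) for x in low])
print("high:", [round(x, 4) for x in high])
```

Rounded to two decimals, the second entry of each tuple should agree with the αξ0 values reported in the notes, and the third entry equals ρ0 because the two standard deviations coincide.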
Figure 1: Frequencies of Incorrect Estimation
True parameter values: βz0 = 1, βy0 = 1, σv0 = 1, σε0 = 1 (hence α0 = 1), ρ0 = 0.2, αξ0 = 0.88 and δ0 = 0.2. True number of
factors: 1 in the z equation and 2 in the y equation. Initial estimates assume 10 factors in both equations.
Table 5: Performance of the Bias-Corrected QML Estimator $\hat{\theta}^c_{nT}$ for the Alternative Model

      n   T   λ10   λ20   ρ0          βz        βy        λ1        λ2        αξ        α         δ
(1)   25  25  0.2  −0.2   0.2  Bias  -0.00076  -0.00004   0.00359  -0.00211   0.00688   0.00216   0.00065
                               CP     0.964     0.957     0.956     0.954     0.922     0.943     0.943
      49  25  0.2  −0.2   0.2  Bias   0.00026   0.00022   0.00115  -0.00090   0.00313   0.00098   0.00124
                               CP     0.952     0.932     0.938     0.944     0.919     0.941     0.953
      25  49  0.2  −0.2   0.2  Bias   0.00044  -0.00089   0.00175  -0.00065   0.00234   0.00125   0.00109
                               CP     0.957     0.955     0.946     0.946     0.927     0.939     0.951
      49  49  0.2  −0.2   0.2  Bias   0.00077  -0.00012   0.00023  -0.00052   0.00132   0.00063   0.00093
                               CP     0.950     0.943     0.939     0.950     0.936     0.948     0.953
(2)   25  25  0.2  −0.2   0.6  Bias  -0.00083  -0.00029   0.00170  -0.00178   0.00783   0.00195  -0.00059
                               CP     0.963     0.949     0.954     0.950     0.930     0.943     0.936
      49  25  0.2  −0.2   0.6  Bias  -0.00008  -0.00025   0.00083  -0.00085   0.00383   0.00049   0.00091
                               CP     0.952     0.936     0.950     0.945     0.925     0.947     0.941
      25  49  0.2  −0.2   0.6  Bias  -0.00003  -0.00134   0.00198  -0.00044   0.00300   0.00093   0.00052
                               CP     0.955     0.954     0.934     0.941     0.935     0.940     0.947
      49  49  0.2  −0.2   0.6  Bias   0.00048  -0.00014  -0.00070   0.00027   0.00173   0.00034   0.00065
                               CP     0.948     0.951     0.947     0.940     0.942     0.947     0.943
(3)   25  25  0.3  −0.3   0.2  Bias  -0.00076  -0.00007   0.00333  -0.00193   0.00689   0.00216   0.00065
                               CP     0.964     0.956     0.957     0.956     0.920     0.943     0.943
      49  25  0.3  −0.3   0.2  Bias   0.00026   0.00020   0.00104  -0.00077   0.00314   0.00098   0.00122
                               CP     0.952     0.934     0.940     0.946     0.920     0.941     0.954
      25  49  0.3  −0.3   0.2  Bias   0.00044  -0.00091   0.00155  -0.00052   0.00235   0.00125   0.00109
                               CP     0.957     0.953     0.944     0.946     0.926     0.939     0.951
      49  49  0.3  −0.3   0.2  Bias   0.00077  -0.00013   0.00016  -0.00049   0.00133   0.00063   0.00092
                               CP     0.950     0.942     0.939     0.949     0.937     0.948     0.951
(4)   25  25  0.3  −0.3   0.6  Bias  -0.00083  -0.00030   0.00158  -0.00175   0.00784   0.00195  -0.00060
                               CP     0.963     0.948     0.954     0.952     0.930     0.943     0.934
      49  25  0.3  −0.3   0.6  Bias  -0.00008  -0.00026   0.00076  -0.00077   0.00383   0.00049   0.00088
                               CP     0.952     0.936     0.950     0.940     0.925     0.947     0.942
      25  49  0.3  −0.3   0.6  Bias  -0.00003  -0.00135   0.00183  -0.00036   0.00300   0.00093   0.00049
                               CP     0.955     0.955     0.933     0.940     0.934     0.940     0.946
      49  49  0.3  −0.3   0.6  Bias   0.00048  -0.00014  -0.00077   0.00030   0.00173   0.00034   0.00064
                               CP     0.948     0.952     0.948     0.946     0.941     0.947     0.943

The outcome equation has two spatial weights matrices, one time invariant (λ1) and the other time variant (λ2). True parameter
values: βz0 = 1, βy0 = 1, σv0 = √(4/3), σε0 = √(4/3) (hence α0 = (4/3)^(−1/2)). In the low endogeneity case, ρ0 = 0.2, αξ0 = 0.88
and δ0 = 0.2. In the high endogeneity case, ρ0 = 0.6, αξ0 = 1.08 and δ0 = 0.6. $\hat{\theta}^c_{nT}$ is the bias-corrected QML
estimator. The coverage probabilities (CP) are calculated using the theoretical standard deviations obtained from the diagonal
elements of $\frac{1}{nT}\Sigma_{nT}^{-1}$.
Figure 2: Average HECM Origination Rates and Top Lender Shares by US Regions
6. Empirical application: spatial spillovers in mortgage originations
Our empirical application is motivated by Haurin et al. (2014), who analyze the effect of house prices
on state-level origination rates of the Home Equity Conversion Mortgage (HECM). The HECM is the predominant
type of reverse mortgage in the United States; it enables senior homeowners to withdraw their
home equity without a home sale or monthly payments. HECMs are insured and regulated by the federal
government, although the private market originates the loans. The insurance is provided by the Federal Housing
Administration through the mutual mortgage insurance fund, which guarantees that the borrower can
access the loan funds in the future even when the lender is no longer in business, and that the lender is
fully repaid when the loan terminates even if the house value is less than the loan balance. The borrower
pays a mortgage insurance premium both at loan closing and monthly over the lifetime of the loan. Haurin
et al. (2014) find that states with volatile past house prices and current house price levels above long term
norms have higher origination rates. This is consistent with the hypothesis that households use HECMs
to insure against house price declines. The mortgage insurance should therefore take into account this
behavioral response to house price dynamics, as the insurance fund will face higher claim risk when the
insured HECMs concentrate disproportionately in areas that are more likely to see house price declines.
Because the origination rates exhibit spatial clustering, it is of interest to investigate the extent
of spatial spillovers. The HECM activity in a state may be contemporaneously affected by developments
in neighboring states. We are also interested in understanding the factors that may give rise to the spatial
spillovers. We investigate two types of spatial effects: spillovers from neighboring states and spillovers due
to large lenders. We hypothesize that two types of lenders exist. Small lenders concentrate on local markets
while large lenders serve multiple states. Two neighboring states that have high concentrations of large
lenders might be more closely connected because large lenders may be more aware of the developments in
neighboring states. Our data cover 50 states plus Washington, D.C. and 52 quarters from 2001 to 2013.
Table 6 shows that the HECM market is fragmented, with a handful of large lenders and a large number of
small lenders.
Table 6: Largest HECM Lenders by Volume (2001-2013)

Name                                     # Loans    Share of Total # Loans (2001-2013)
Wells Fargo Bank                         157,772    20.35%
Financial Freedom Senior Funding Corp     47,187     6.09%
Bank of America                           25,343     3.27%
One Reverse Mortgage LLC                  23,431     3.02%
MetLife Bank                              20,179     2.60%
Liberty Home Equity Solutions Inc         17,997     2.32%
American Advisors Group                   17,136     2.21%
Seattle Mortgage Company                  13,344     1.72%
Urban Financial of America LLC            10,357     1.34%
Generation Mortgage Company                9,557     1.23%
World Alliance Financial Corp.             9,241     1.19%
Reverse Mortgage USA Inc                   9,112     1.18%
Everbank Reverse Mortgage LLC              8,304     1.07%
Ditech Mortgage Corp                       7,479     0.96%
M and T Bank                               6,480     0.84%
Countrywide Bank FSB                       6,391     0.82%
American Reverse Mortgage Corp             6,231     0.80%
Academy Mortgage LLC                       4,719     0.61%
First Mariner Bank                         4,341     0.56%
Reverse Mortgage Solutions Inc             4,200     0.54%

Figure 3: Average House Price Deviations and Volatility by US Regions

Let $y_{it}$ denote the HECM origination rate, defined as the number of newly originated HECM loans in
state $i$ at quarter $t$ as a percentage of the senior population (age 65 plus) in state $i$ from the 2010 census. There
are two $n \times n$ spatial weights matrices, $W_{1,n} = (w_{1,ij})$ and $W_{2,nt} = (w_{2,it,jt})$. $w_{1,ij} = 1$ if states $i$ and $j$ share
a border and $w_{1,ij} = 0$ otherwise. $W_{2,nt}$ is time varying and captures a possibly different spillover
effect from large lenders: $w_{2,it,jt} = w_{1,ij} \times z_{it} \times z_{jt}$, where $z_{it}$ is the share of the HECM loans originated
by the large lenders (defined as being one of the top 10 largest lenders in Table 6)³ in state $i$ at quarter
$t$. A larger weight is given to a state with more dominant large lenders. House price dynamic variables
are constructed using the Federal Housing Finance Agency's quarterly all-transactions house price indexes
(HPI) deflated by the CPI, which start from 1975. Following Haurin et al. (2014), the house price dynamic
variables include percentage deviations from the previous 9 year averages (hpi_dev),⁴ standard deviations
of house price changes in the previous 9 years (hpi_v), and the interaction between the two. Figures 2 and
3 show the averages of these variables by U.S. regions in our sample period. It is likely that lender activity
and origination rates are affected by some macroeconomic factors, which are captured by the time factors
$f_{zt}$ and $f_{yt}$. It is also likely that macroeconomic factors have different impacts in different states, as captured
by state-specific factor loadings. Note that factor loadings may be spatially correlated, capturing residual
spatial effects not directly modeled. Interactive effects include additive individual and time effects as special
cases, hence time invariant state variables are not included. The model consists of
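$$z_{it} = \mathrm{hpi\_dev}_{it}\,\beta_{z1} + \mathrm{hpi\_v}_{it}\,\beta_{z2} + (\mathrm{hpi\_dev}_{it} \times \mathrm{hpi\_v}_{it})\,\beta_{z3} + \gamma_{zi}' f_{zt} + \varepsilon_{it},$$
and
$$y_{it} = \lambda_1 \sum_{j=1}^{n} w_{1,ij}\, y_{jt} + \lambda_2 \sum_{j=1}^{n} w_{2,it,jt}\, y_{jt} + \mathrm{hpi\_dev}_{it}\,\beta_{y1} + \mathrm{hpi\_v}_{it}\,\beta_{y2} + (\mathrm{hpi\_dev}_{it} \times \mathrm{hpi\_v}_{it})\,\beta_{y3} + \gamma_{yi}' f_{yt} + v_{it},$$
where, respectively, $\varepsilon_{it}$ and $v_{it}$ have variances $\sigma_\varepsilon^2$ and $\sigma_v^2$ with correlation $\rho$. Denote $\alpha = \sigma_\varepsilon^{-1}$,
$\alpha_\xi = \left((1-\rho^2)\sigma_v^2\right)^{-1/2}$ and $\delta = \rho\,\sigma_v\,\sigma_\varepsilon^{-1}$.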
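To make the construction of the two spatial lags in the outcome equation concrete, the following is a minimal sketch for a single quarter. It assumes a toy three-state contiguity matrix and made-up values for the large-lender shares and origination rates; the function names `time_varying_weights` and `spatial_lags` are hypothetical and not from the paper.

```python
import numpy as np


def time_varying_weights(w1, z_t):
    """Return W2 for one period: w2[i, j] = w1[i, j] * z_t[i] * z_t[j]."""
    return w1 * np.outer(z_t, z_t)


def spatial_lags(w1, z_t, y_t):
    """Return the two spatial lags (W1 y_t, W2 y_t) entering the y equation."""
    w2 = time_varying_weights(w1, z_t)
    return w1 @ y_t, w2 @ y_t


# Toy example (illustrative numbers only): three states on a line,
# so state 1 borders states 0 and 2.
w1 = np.array([[0.0, 1.0, 0.0],
               [1.0, 0.0, 1.0],
               [0.0, 1.0, 0.0]])
z_t = np.array([0.5, 0.2, 0.8])  # large-lender shares in quarter t
y_t = np.array([1.0, 2.0, 3.0])  # origination rates in quarter t

lag1, lag2 = spatial_lags(w1, z_t, y_t)
print(lag1)
print(lag2)
```

Because the shares lie between 0 and 1, every entry of W2 is no larger than the corresponding entry of W1, and W2 inherits the zero pattern of the contiguity matrix.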
We assume that the upper bound on the number of factors is 10, so the preliminary estimates use 10 factors.
The eigenvalue ratio criterion (see Proposition 3) indicates that $z_{it}$ and $y_{it}$ each have one unobserved
factor, and the growth ratio criterion gives the same result. The same number of factors is selected if
5 factors are used for the preliminary estimates. Figure 4 shows a heat map of the estimated factor loadings
($\gamma_{yi}$), where darker colors indicate higher values. A higher factor loading indicates that the time factor has
a larger impact. Because the factor loadings of all states are positive, the corresponding time factors can
be interpreted as positively impacting the take-up rates. Figure 5 plots the estimated time factor ($f_{yt}$) with
the quarterly percentage change in the real house price index. The time factors generate increasing HECM
activity prior to 2009 and decreasing HECM activity after 2009, and they appear to move opposite to the
change in house prices with a lag. As the common factor component can account for the impact of an
economy wide shock, we explore possible links between the estimated time factor and some observable
³ Alternatively, we define a large lender as one of the top 20 largest lenders. The results are similar.
⁴ As in Haurin et al. (2014), the choice of 9 years (or 36 quarters) is ad hoc. The time window needs to be long enough to capture
long term norms, while short enough to reflect changing economic circumstances. We have assumed that households continuously
update their beliefs on the long term norms.
Figure 4: Estimated Factor Loadings
economic variables. The credit available from a HECM loan depends on the interest rate, the principal
limit factor set by HUD, the age of the borrower and the property value. Before April 2009, only HECMs with
adjustable interest rates were available. Since April 2009, fixed rate HECMs have been widely available. We
include the average interest rate of adjustable rate HECMs and the average spread between the interest rate
on an adjustable rate HECM and a fixed rate HECM.⁵ There were two reductions to the principal limit factors
in our sample period, corresponding to the two dummy variables “After 2009Q4” and “After 2010Q4”. We
also consider households’ income fluctuations, as measured by the GDP growth rate.
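The eigenvalue ratio criterion mentioned above, in the spirit of Ahn and Horenstein (2013), picks the number of factors at the largest ratio of adjacent ordered eigenvalues. The sketch below applies it to a simulated n × T matrix with one strong factor; it is an illustration only, since the paper applies the criterion to preliminary estimates (see its Proposition 3), and the function name `eigenvalue_ratio_num_factors` and the simulated data are assumptions.

```python
import numpy as np


def eigenvalue_ratio_num_factors(resid, k_max):
    """Select the number of factors by maximizing mu_k / mu_{k+1}, k = 1..k_max.

    `resid` is an n x T matrix; mu_k is the k-th largest eigenvalue of
    resid @ resid.T / (n * T).
    """
    n, T = resid.shape
    eig = np.linalg.eigvalsh(resid @ resid.T / (n * T))[::-1]  # descending order
    ratios = eig[:k_max] / eig[1:k_max + 1]
    return int(np.argmax(ratios)) + 1


# Simulated panel with the sample dimensions of the application (n = 51, T = 52)
# and a single strong factor plus small idiosyncratic noise.
rng = np.random.default_rng(0)
n, T = 51, 52
loadings = rng.normal(size=(n, 1))
factors = rng.normal(size=(T, 1))
resid = loadings @ factors.T + 0.1 * rng.normal(size=(n, T))

k_hat = eigenvalue_ratio_num_factors(resid, k_max=10)
print(k_hat)
```

The upper bound `k_max` plays the role of the assumed maximum number of factors (10 in the text); the criterion is meaningful only when `k_max` is at least the true number of factors.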
Table 7 reports the results of the factor regression. More HECM credit is available with lower interest
rates and higher principal limit factors, and as expected, we find that these are associated with higher HECM
activity. In addition, more HECMs were closed in economic downturns, suggesting that households who
face negative income shocks use HECMs to supplement their income. It is interesting to note that because
the time factor is changing over time and its factor loadings are heterogeneous, the variation that has been
attributed to the common factor component ($\gamma_{yi} f_{yt}$) cannot be explained entirely by additive individual and
time effects.
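The last point can be illustrated numerically: two-way demeaning (the within transformation for additive state and time effects) removes an additive component exactly, but leaves part of an interactive component whenever both the loadings and the factor vary. The following is a minimal sketch with simulated loadings and factor values; all numbers and the helper `two_way_demean` are illustrative assumptions.

```python
import numpy as np


def two_way_demean(a):
    """Subtract row means and column means, then add back the grand mean."""
    return (a - a.mean(axis=1, keepdims=True)
              - a.mean(axis=0, keepdims=True) + a.mean())


rng = np.random.default_rng(1)
n, T = 51, 52
gamma = rng.uniform(0.5, 1.5, size=n)  # heterogeneous, positive loadings
f = rng.normal(size=T)                 # time factor

interactive = np.outer(gamma, f)             # gamma_i * f_t
additive = gamma[:, None] + f[None, :]       # gamma_i + f_t

left_interactive = np.linalg.norm(two_way_demean(interactive))
left_additive = np.linalg.norm(two_way_demean(additive))
print(left_interactive, left_additive)
```

After demeaning, the interactive term reduces to the product of the demeaned loadings and the demeaned factor, which vanishes only if either the loadings or the factor are constant.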
Table 8 reports the main estimation results with one unobserved factor for both definitions of large
lenders, namely, top 10 largest lenders and top 20 largest lenders. As a comparison, we also report the
estimates of panel regressions with additive state fixed effects and time effects in the “FE” column. The
results for the z equation show that small lenders tend to be more active in states with more volatile house
prices and current house prices above long term norms. Such states also have higher origination rates,
consistent with the findings of Haurin et al. (2014). Comparing column FE with the spatial panel regression
reveals that the coefficients of the panel regression are overestimated when spatial effects are ignored. It is
also interesting to note that spatial effects work through different channels. Higher HECM activity in a state
⁵ Because fixed rate HECMs were not available before April 2009, the spread is coded as 0 before 2009Q2.
Figure 5: Estimated Time Factors
Table 7: Regression of Time Factors on Observable Economic Variables

                                                            Coefficient    SD
Average Adjustable Rate HECM Interest Rate                  −0.0852∗∗∗     0.0298
Spread Between Adjustable Rate HECM and Fixed Rate HECM      0.0634        0.0566
After 2009Q4                                                 0.0258        0.0391
After 2010Q4                                                −0.1203∗∗      0.0462
GDP Growth Rate                                             −0.0190∗∗∗     0.0057
R2                                                           0.38
# obs                                                        52

The dependent variable is the estimated time factor. ∗∗∗ p < 0.01, ∗∗ p < 0.05, ∗ p < 0.1.
Table 8: Estimation of State-Level Origination Rates

                           Top 10                   Top 20                   FE
                           Coefficient    SD        Coefficient    SD        Coefficient    SD
Top Lender Share
  House Price Deviation    −0.1779∗∗∗     0.0268    −0.0278        0.0286     0.0227        0.0570
  House Price Volatility   −1.3030∗∗∗     0.1857    −1.4777∗∗∗     0.2091    −1.9331∗∗∗     0.2290
  Deviation × Volatility   −0.6965∗∗      0.2723    −1.4172∗∗∗     0.2910    −2.3791∗∗∗     0.6354
HECM Origination Rate
  λ1                        0.0965∗∗∗     0.0059     0.1033∗∗∗     0.0063
  λ2                       −0.0687∗∗∗     0.0134    −0.0750∗∗∗     0.0132
  House Price Deviation    −0.0283∗∗∗     0.0029    −0.0272∗∗∗     0.0029    −0.0010        0.0070
  House Price Volatility    0.1210∗∗∗     0.0138     0.1236∗∗∗     0.0142     0.2722∗∗∗     0.0281
  Deviation × Volatility    0.8868∗∗∗     0.0375     0.8772∗∗∗     0.0373     0.9433∗∗∗     0.0781
α                           7.5808∗∗∗     0.1041     7.1922∗∗∗     0.0988
αξ                        104.6246∗∗∗     1.4507   104.7460∗∗∗     1.4523
δ                           0.0036∗∗∗     0.0008     0.0035∗∗∗     0.0008

The sample size is n = 51 and T = 52. Bias corrected estimates are reported. ∗∗∗ p < 0.01, ∗∗ p < 0.05, ∗ p < 0.1.
positively influences neighboring states, possibly due to higher awareness from increased media coverage.
However, the spillover effect is smaller for states with more large lenders. This suggests that large lenders
may shift resources towards states with higher demand, resulting in lower activity in neighboring states.
The net spillover effect is still positive, indicating that adverse selection may be slightly mitigated. The
correlation between the share of large lenders and the origination rate is statistically significant, although its
magnitude is small.
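The sign of the net spillover can be read off the point estimates in Table 8. As a rough worked check, which looks only at the direct coefficient on a bordering state's origination rate and ignores feedback through the full spatial multiplier, note that for bordering states $w_{1,ij} = 1$ and the shares satisfy $0 \le z_{it}, z_{jt} \le 1$:

```latex
\lambda_1 w_{1,ij} + \lambda_2 w_{2,it,jt}
  = \lambda_1 + \lambda_2\, z_{it} z_{jt}
  \;\ge\; \lambda_1 + \lambda_2
  \qquad (\lambda_2 < 0,\; 0 \le z_{it} z_{jt} \le 1),
\\[4pt]
\text{Top 10: } 0.0965 - 0.0687 = 0.0278 > 0,
\qquad
\text{Top 20: } 0.1033 - 0.0750 = 0.0283 > 0 .
```

So even when both neighboring states are served entirely by large lenders, the implied direct coefficient stays positive at the reported estimates.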
7. Conclusion
This paper presents a spatial panel data model with endogenous spatial weights matrices and common
factors. The spatial weights matrices can be time varying. Endogeneity arises because the weights are
constructed from variables that may correlate with disturbances in the main equation. By treating time factors
and factor loadings as fixed effects, the proposed QML estimator is robust to the presence of unobserved
time varying heterogeneity, which is allowed to correlate with included regressors. We show that the QML
estimator is consistent and asymptotically normal in panels with large n and T. Because the approximate
score vector is not centered for the spatial interaction parameter and the variance parameters, the limiting
distribution of the estimator has a bias term of order $O_P\left(\frac{1}{\sqrt{nT}}\right)$. An analytical bias correction procedure
is then proposed to remove the leading order bias. Monte Carlo simulations demonstrate good finite sample
performance of the bias corrected estimator. An empirical application of the spatial panel model with time
factors to the effects of house prices on state-level origination rates of the Home Equity Conversion
Mortgage finds positive spillovers from neighboring states that are weaker where large lenders hold a larger share.
References
Ahn, Seung C. and Alex R. Horenstein, “Eigenvalue Ratio Test for the Number of Factors,” Econometrica, May 2013, 81 (3), 1203–1227.
Ahn, Seung C., Young H. Lee, and Peter Schmidt, “Panel Data Models with Multiple Time-Varying Individual Effects,” Journal of Econometrics, 2013, 174, 1–14.
Bai, Jushan, “Panel Data Models with Interactive Fixed Effects,” Econometrica, 2009, 77 (4), 1229–1279.
Bai, Jushan and Kunpeng Li, “Spatial Panel Data Models with Common Shocks,” Manuscript, Columbia University, 2014.
Bai, Jushan and Serena Ng, “Determining the Number of Factors in Approximate Factor Models,” Econometrica, Jan. 2002, 70 (1), 191–221.
Bai, Z.D. and Y.Q. Yin, “Limit of the Smallest Eigenvalue of a Large Dimensional Sample Covariance Matrix,” The Annals of Probability, 1993, 21 (3), 1275–1294.
Blundell, Richard, Costas Meghir, Monica Costa-Dias, and John van Reenen, “Evaluating the Employment Impact of a Mandatory Job Search Program,” Journal of the European Economic Association, June 2004, 2 (4), 569–606.
Cliff, Andrew David and J. K. Ord, Spatial Autocorrelation, London: Pion, 1973.
Conley, Timothy G. and Ethan Ligon, “Economic Distance and Cross-Country Spillovers,” Journal of Economic Growth, 2002, 7, 157–187.
Gobillon, Laurent and Thierry Magnac, “Regional Policy Evaluation: Interactive Fixed Effects and Synthetic Controls,” The Review of Economics and Statistics, Forthcoming, 2015.
Haurin, Donald, Chao Ma, Stephanie Moulton, Maximilian Schmeiser, Jason Seligman, and Wei Shi, “Spatial Variation in Reverse Mortgage Usage: House Price Dynamics and Consumer Selection,” Journal of Real Estate Finance and Economics, Forthcoming, 2014.
Holly, Sean, M. Hashem Pesaran, and Takashi Yamagata, “A Spatio-Temporal Model of House Prices in the USA,” Journal of Econometrics, 2010, 158, 160–173.
Holly, Sean, M. Hashem Pesaran, and Takashi Yamagata, “The Spatial and Temporal Diffusion of House Prices in the UK,” Journal of Urban Economics, 2011, 69, 2–23.
Horrace, William C., Xiaodong Liu, and Eleonora Patacchini, “Endogenous Network Production Functions with Selectivity,” Journal of Econometrics, 2015.
Hsiao, Cheng, Analysis of Panel Data, 3rd ed., New York, NY: Cambridge University Press, 2014.
Hsiao, Cheng, H. Steve Ching, and Shui Ki Wan, “A Panel Data Approach for Program Evaluation: Measuring the Benefits of Political and Economic Integration of Hong Kong with Mainland China,” Journal of Applied Econometrics, 2012, 27, 705–740.
Jenish, Nazgul and Ingmar R. Prucha, “Central Limit Theorems and Uniform Laws of Large Numbers for Arrays of Random Fields,” Journal of Econometrics, 2009, 150, 86–98.
Jenish, Nazgul and Ingmar R. Prucha, “On Spatial Processes and Asymptotic Inference under Near-Epoch Dependence,” Journal of Econometrics, 2012, 170, 178–190.
Kato, Tosio, Perturbation Theory for Linear Operators, Springer-Verlag, 1995.
Kelejian, Harry H. and Gianfranco Piras, “Estimation of Spatial Models with Endogenous Weighting Matrices, and an Application to a Demand Model for Cigarettes,” Regional Science and Urban Economics, 2014, 46, 140–149.
Kelejian, Harry H. and Ingmar R. Prucha, “On the Asymptotic Distribution of the Moran I Test Statistic with Applications,” Journal of Econometrics, 2001, 104, 219–257.
Kelejian, Harry H. and Ingmar R. Prucha, “Specification and Estimation of Spatial Autoregressive Models with Autoregressive and Heteroskedastic Disturbances,” Journal of Econometrics, 2010, 157, 53–67.
Kim, Dukpa and Tatsushi Oka, “Divorce Law Reforms and Divorce Rates in the USA: An Interactive Fixed-Effects Approach,” Journal of Applied Econometrics, 2014, 29, 231–245.
Latala, Rafal, “Some Estimates of Norms of Random Matrices,” Proceedings of the American Mathematical Society, 2005, 133 (5), 1273–1282.
Lee, Lung-Fei, “Asymptotic Distributions of Quasi-Maximum Likelihood Estimator for Spatial Autoregressive Models,” Econometrica, Nov. 2004, 72 (6), 1899–1925.
Lee, Lung-Fei and Jihai Yu, “QML Estimation of Spatial Dynamic Panel Data Models with Time Varying Spatial Weights Matrices,” Spatial Economic Analysis, 2012, 7 (1).
Lu, Xun and Liangjun Su, “Shrinkage Estimation of Dynamic Panel Data Models with Interactive Fixed Effects,” Manuscript, Singapore Management University, 2015.
Moon, Hyungsik Roger and Martin Weidner, “Linear Regression for Panel with Unknown Number of Factors as Interactive Fixed Effects,” Econometrica, July 2015, 83 (4), 1543–1579.
Moon, Hyungsik Roger and Martin Weidner, “Dynamic Linear Panel Regression Models with Interactive Fixed Effects,” Manuscript, University of Southern California, 2014.
Onatski, Alexei, “Determining the Number of Factors from Empirical Distribution of Eigenvalues,” Review of Economics and Statistics, 2010, 92, 1004–1016.
Pesaran, M. Hashem, “Estimation and Inference in Large Heterogeneous Panels with a Multifactor Error Structure,” Econometrica, 2006, 74, 967–1012.
Pesaran, M. Hashem and Elisa Tosetti, “Large Panels with Common Factors and Spatial Correlation,” Journal of Econometrics, 2011, 161, 182–202.
Qu, Xi and Lung-fei Lee, “Estimating a Spatial Autoregressive Model with an Endogenous Spatial Weight Matrix,” Journal of Econometrics, 2015, 184, 209–232.
Qu, Xi, Lung-fei Lee, and Jihai Yu, “QML Estimation of Spatial Dynamic Panel Data Models with Endogenous Time Varying Spatial Weights Matrices,” Manuscript, 2015.
Shi, Wei and Lung-fei Lee, “Spatial Dynamic Panel Data Models with Interactive Fixed Effects,” Manuscript, the Ohio State University, 2015.
Wu, Chien-Fu, “Asymptotic Theory of Nonlinear Least Squares Estimation,” The Annals of Statistics, 1981, 9 (3), 501–513.
Yu, Jihai, Robert De Jong, and Lung-Fei Lee, “Quasi-maximum Likelihood Estimators for Spatial Dynamic Panel Data with Fixed Effects when Both n and T are Large,” Journal of Econometrics, 2008, 146, 118–134.
Zhang, Fuzhen, Matrix Theory: Basic Results and Techniques, Springer, 2011.
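Appendix
A1. Proof of Claim 2
Because elements of $X_{ntz}$ are uniformly bounded, $\|X_{nTz,k}\|_2 \le \|X_{nTz,k}\|_F = O(\sqrt{nT})$. Similarly, $\|Z^y_{k,nT}\|_2 = O(\sqrt{nT})$ for $k = 1, \cdots, k_y$.
Next consider
$$Z^y_{k_y+1,nT} = \sigma_\xi^{-1}\left(G_{n1}(X_{n1y}\beta_{y0} + \varepsilon_{n1}\delta_0 + \Gamma_{ny0} f_{y10}), \cdots, G_{nT}(X_{nTy}\beta_{y0} + \varepsilon_{nT}\delta_0 + \Gamma_{ny0} f_{yT0})\right).$$
For this paragraph, denote $\varsigma^*_{nT} = (\varsigma_{n1}', \cdots, \varsigma_{nT}')'$ with $\varsigma_{nt} = \sigma_\xi^{-1}\left(X_{nty}\beta_{y0} + \varepsilon_{nt}\delta_0 + \Gamma_{ny0} f_{yt0}\right)$. Therefore we have
$$\operatorname{tr}\left(Z^{y\,\prime}_{k_y+1,nT} Z^y_{k_y+1,nT}\right) = \sigma_\xi^{-2} \sum_{t=1}^{T} \left(X_{nty}\beta_{y0} + \varepsilon_{nt}\delta_0 + \Gamma_{ny0} f_{yt0}\right)' G_{nt}' G_{nt} \left(X_{nty}\beta_{y0} + \varepsilon_{nt}\delta_0 + \Gamma_{ny0} f_{yt0}\right) = \varsigma^{*\prime}_{nT} G_{nT}' G_{nT}\, \varsigma^*_{nT}.$$
By Proposition 2, $\operatorname{E} \varsigma^{*\prime}_{nT} G_{nT}' G_{nT}\, \varsigma^*_{nT} = O(nT)$. It follows that $\|Z^y_{k_y+1,nT}\|_2 \le \|Z^y_{k_y+1,nT}\|_F = O_P(\sqrt{nT})$ by Markov's inequality.
For $k = 1, \cdots, p$, $Z^y_{k_y+1+k,nT} = \sigma_\xi^{-1} \varepsilon_{k,nT}$ where $\varepsilon_{k,nT} = (\varepsilon_{n1,k}, \cdots, \varepsilon_{nT,k})$ with $\varepsilon_{nt,k}$ being the $k$-th column of the $n \times p$ matrix $\varepsilon_{nt} = (\varepsilon_{1t}, \cdots, \varepsilon_{nt})'$. Because elements of $\varepsilon_{k,nT}$ are i.i.d. across $i$ and $t$ and have uniformly bounded 4-th moments, by Latala (2005), $\|\varepsilon_{k,nT}\|_2 = O_P(\sqrt{\max(n,T)})$ and therefore $\|Z^y_{k_y+1+k,nT}\|_2 = O_P(\sqrt{\max(n,T)})$.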
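The two orders of magnitude used in this proof, a Frobenius norm of order √(nT) and a spectral norm of order √max(n, T) for a matrix of i.i.d. entries, can be illustrated numerically. The sketch below draws a single Gaussian matrix; the Gaussian draws and the dimensions are assumptions made only for illustration, since the argument above requires only i.i.d. entries with bounded fourth moments.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 200, 300
eps = rng.normal(size=(n, T))  # i.i.d. entries (Gaussian assumed for the demo)

spectral = np.linalg.norm(eps, 2)       # largest singular value
frobenius = np.linalg.norm(eps, "fro")  # square root of the sum of squares

# The spectral norm never exceeds the Frobenius norm.
print(spectral <= frobenius)
# Frobenius norm relative to sqrt(nT): close to 1 for unit-variance entries.
print(frobenius / np.sqrt(n * T))
# Spectral norm relative to sqrt(max(n, T)): stays bounded as n, T grow.
print(spectral / np.sqrt(max(n, T)))
```

For unit-variance entries the spectral norm is close to √n + √T, which is at most 2√max(n, T), so the last ratio stays bounded while the Frobenius norm grows at the faster √(nT) rate.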
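A2. Proof of Claim 3
Claim 1 shows that $\|\Xi_{nT}\|_2 = O_P\left(\sqrt{\max(n,T)}\right)$. Now we show that $\|U_{nT}\|_2 = O_P\left(\sqrt{nT}\left(n^{-1/4} + T^{-1/4}\right)\right)$ with $U_{nT} = (u_{n1}, \cdots, u_{nT})$ and $u_{nt} = G_{nt}\xi_{nt}$. By definition,
$$\|U_{nT}\|_2^4 = \left(\mu_1\left(U_{nT}'U_{nT}\right)\right)^2 = \mu_1\left(U_{nT}'U_{nT}U_{nT}'U_{nT}\right) = \left\|U_{nT}'U_{nT}\right\|_2^2 \le \left\|U_{nT}'U_{nT}\right\|_F^2 = \sum_{t=1}^{T}\sum_{s=1}^{T}\left(\sum_{i=1}^{n}[U_{nT}]_{it}[U_{nT}]_{is}\right)^2.$$
The $(i,t)$ element of $U_{nT}$ is $[U_{nT}]_{it} = \sum_{j=1}^{n} G_{nt,ij}\,\xi_{nt,j}$. Then
$$\|U_{nT}\|_2^4 \le \sum_{t,s=1}^{T}\sum_{i,i'=1}^{n}\sum_{j,j''=1}^{n}\sum_{j',j'''=1}^{n} G_{nt,ij}G_{nt,i'j''}G_{ns,ij'}G_{ns,i'j'''}\,\xi_{nt,j}\xi_{nt,j''}\xi_{ns,j'}\xi_{ns,j'''}.$$
Because $\xi_{nt,i}$ are independently distributed over $t$ and $i$, $\operatorname{E}(\xi_{nt,i}|E_{nT}) = 0$, $\operatorname{E}(\xi_{nt,i}^2|E_{nT}) = \sigma_{\xi 0}^2$, $\operatorname{E}(\xi_{nt,i}^3|E_{nT}) = \operatorname{E}(\xi_{nt,i}^3)$ and $\operatorname{E}(\xi_{nt,i}^4|E_{nT}) = \operatorname{E}(\xi_{nt,i}^4)$,
$$\begin{aligned}
\operatorname{E}\|U_{nT}\|_2^4
\le{}& \sum_{t,s=1,\,t\neq s}^{T}\sum_{i,i'=1}^{n}\sum_{j,j'=1}^{n} \operatorname{E}\left(G_{nt,ij}G_{nt,i'j}G_{ns,ij'}G_{ns,i'j'}\right)\sigma_{\xi 0}^4
 + \sum_{t=1}^{T}\sum_{i,i'=1}^{n}\sum_{j=1}^{n} \operatorname{E}\left(G_{nt,ij}^2G_{nt,i'j}^2\right)\operatorname{E}\left(\xi_{nt,j}^4\right) \\
& + \sum_{t=1}^{T}\sum_{i,i'=1}^{n}\sum_{j,j'=1,\,j\neq j'}^{n} \operatorname{E}\left(G_{nt,ij}^2G_{nt,i'j'}^2\right)\sigma_{\xi 0}^4
 + 2\sum_{t=1}^{T}\sum_{i,i'=1}^{n}\sum_{j,j'=1,\,j\neq j'}^{n} \operatorname{E}\left(G_{nt,ij}G_{nt,i'j}G_{nt,ij'}G_{nt,i'j'}\right)\sigma_{\xi 0}^4 \\
={}& \sum_{t,s=1}^{T}\sum_{i,i'=1}^{n}\sum_{j,j'=1}^{n} \operatorname{E}\left(G_{nt,ij}G_{nt,i'j}G_{ns,ij'}G_{ns,i'j'}\right)\sigma_{\xi 0}^4
 - \sum_{t=1}^{T}\sum_{i,i'=1}^{n}\sum_{j,j'=1}^{n} \operatorname{E}\left(G_{nt,ij}G_{nt,i'j}G_{nt,ij'}G_{nt,i'j'}\right)\sigma_{\xi 0}^4 \\
& + \sum_{t=1}^{T}\sum_{i,i'=1}^{n}\sum_{j=1}^{n} \operatorname{E}\left(G_{nt,ij}^2G_{nt,i'j}^2\right)\operatorname{E}\left(\xi_{nt,j}^4\right)
 + \sum_{t=1}^{T}\sum_{i,i'=1}^{n}\sum_{j,j'=1,\,j\neq j'}^{n} \operatorname{E}\left(G_{nt,ij}^2G_{nt,i'j'}^2\right)\sigma_{\xi 0}^4 \\
& + 2\sum_{t=1}^{T}\sum_{i,i'=1}^{n}\sum_{j,j'=1,\,j\neq j'}^{n} \operatorname{E}\left(G_{nt,ij}G_{nt,i'j}G_{nt,ij'}G_{nt,i'j'}\right)\sigma_{\xi 0}^4 \\
={}& \sum_{t,s=1}^{T}\sum_{i,i'=1}^{n}\sum_{j,j'=1}^{n} \operatorname{E}\left(G_{nt,ij}G_{nt,i'j}G_{ns,ij'}G_{ns,i'j'}\right)\sigma_{\xi 0}^4
 + \sum_{t=1}^{T}\sum_{i,i'=1}^{n}\sum_{j=1}^{n} \operatorname{E}\left(G_{nt,ij}^2G_{nt,i'j}^2\right)\left(\operatorname{E}\left(\xi_{nt,j}^4\right) - \sigma_{\xi 0}^4\right) \\
& + \sum_{t=1}^{T}\sum_{i,i'=1}^{n}\sum_{j,j'=1,\,j\neq j'}^{n} \operatorname{E}\left(G_{nt,ij}^2G_{nt,i'j'}^2\right)\sigma_{\xi 0}^4
 + \sum_{t=1}^{T}\sum_{i,i'=1}^{n}\sum_{j,j'=1,\,j\neq j'}^{n} \operatorname{E}\left(G_{nt,ij}G_{nt,i'j}G_{nt,ij'}G_{nt,i'j'}\right)\sigma_{\xi 0}^4 \\
\le{}& \sigma_{\xi 0}^4 \operatorname{E}\sum_{t,s=1}^{T}\sum_{i,i'=1}^{n} \left[G_{nt}G_{nt}'\right]_{ii'}\left[G_{ns}G_{ns}'\right]_{ii'}
 + \max_j\left(\left|\operatorname{E}\left(\xi_{nt,j}^4\right) - \sigma_{\xi 0}^4\right|, \sigma_{\xi 0}^4\right) \operatorname{E}\sum_{t=1}^{T}\sum_{i,i'=1}^{n}\sum_{j=1}^{n} G_{nt,ij}^2 \sum_{j=1}^{n} G_{nt,i'j}^2 \\
& + \sigma_{\xi 0}^4 \operatorname{E}\sum_{t=1}^{T}\sum_{i,i'=1}^{n} \left[G_{nt}G_{nt}'\right]_{ii'}^2
 - \sigma_{\xi 0}^4 \sum_{t=1}^{T}\sum_{i,i'=1}^{n}\sum_{j=1}^{n} \operatorname{E}\left(G_{nt,ij}^2G_{nt,i'j}^2\right) \\
\le{}& \sigma_{\xi 0}^4 \operatorname{E}\sum_{t,s=1}^{T}\sum_{i,i'=1}^{n} \left[G_{nt}G_{nt}'\right]_{ii'}\left[G_{ns}G_{ns}'\right]_{ii'}
 + \max_j\left(\left|\operatorname{E}\left(\xi_{nt,j}^4\right) - \sigma_{\xi 0}^4\right|, \sigma_{\xi 0}^4\right) \operatorname{E}\sum_{t=1}^{T}\sum_{i,i'=1}^{n} \left[G_{nt}G_{nt}'\right]_{ii}\left[G_{nt}G_{nt}'\right]_{i'i'} \\
& + \sigma_{\xi 0}^4 \operatorname{E}\sum_{t=1}^{T}\sum_{i,i'=1}^{n} \left[G_{nt}G_{nt}'\right]_{ii'}^2 \\
\le{}& \sigma_{\xi 0}^4 \operatorname{E}\sum_{t,s=1}^{T}\sum_{i=1}^{n} \left\|G_{nt}G_{nt}'\right\|_\infty \left\|G_{ns}G_{ns}'\right\|_\infty
 + \max_j\left(\left|\operatorname{E}\left(\xi_{nt,j}^4\right) - \sigma_{\xi 0}^4\right|, \sigma_{\xi 0}^4\right) \operatorname{E}\sum_{t=1}^{T} \left[\operatorname{tr}\left(G_{nt}G_{nt}'\right)\right]^2
 + \sigma_{\xi 0}^4 \operatorname{E}\sum_{t=1}^{T}\sum_{i=1}^{n} \left\|G_{nt}G_{nt}'\right\|_\infty^2 \\
={}& O\left(nT^2\right) + O\left(n^2T\right).
\end{aligned}$$
By Markov's inequality, $\|U_{nT}\|_2 = O_P\left(\sqrt{nT}\left(n^{-1/4} + T^{-1/4}\right)\right)$.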
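The step from the fourth-moment bound to the stated rate can be spelled out as follows. This is a short worked sketch using only Markov's inequality and the subadditivity of $x \mapsto x^{1/4}$; the constant $C$ and the threshold $M$ are generic symbols introduced here for illustration.

```latex
\operatorname{E}\|U_{nT}\|_2^4 \le C\left(nT^2 + n^2T\right)
\;\Longrightarrow\;
P\left(\|U_{nT}\|_2 > M\left(nT^2 + n^2T\right)^{1/4}\right)
 \le \frac{\operatorname{E}\|U_{nT}\|_2^4}{M^4\left(nT^2 + n^2T\right)} \le \frac{C}{M^4},
\\[4pt]
\left(nT^2 + n^2T\right)^{1/4}
 \le \left(nT^2\right)^{1/4} + \left(n^2T\right)^{1/4}
 = n^{1/4}T^{1/2} + n^{1/2}T^{1/4}
 = \sqrt{nT}\left(n^{-1/4} + T^{-1/4}\right).
```

Choosing $M$ large makes the probability bound arbitrarily small uniformly in $n$ and $T$, which is the definition of the $O_P$ rate.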
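Let $\tilde{U}_{nT} = (\tilde{u}_{n1}, \cdots, \tilde{u}_{nT})$ with $\tilde{u}_{nt} = S_{nt}(\lambda)S_{nt}^{-1}\xi_{nt}$. Because $S_{nt}(\lambda)S_{nt}^{-1} = I_n + (\lambda_0 - \lambda)G_{nt}$,
$$\|\tilde{U}_{nT}\|_2 = \|\Xi_{nT} + (\lambda_0 - \lambda)U_{nT}\|_2 \le \|\Xi_{nT}\|_2 + |\lambda_0 - \lambda|\,\|U_{nT}\|_2 = O_P\left(\sqrt{nT}\left(n^{-1/4} + T^{-1/4}\right)\right).$$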
A3. Proof of Proposition 1
Let $R_z$ and $R_y$ denote the number of factors specified in the model, and $R_{z0}$ and $R_{y0}$ the true numbers of
factors. We have assumed that $R_z \ge R_{z0}$ and $R_y \ge R_{y0}$. In order to prove $\|\hat{\theta} - \theta_0\|_1 \overset{p}{\to} 0$, by Lemma 1 of Wu
(1981), it is sufficient to show that, for any $\tau > 0$,
$$\liminf_{n,T\to\infty} P\left(\inf_{\theta\in\Theta,\ \|\theta-\theta_0\|_2 \ge \tau} \left(Q_{nT}(\theta_0) - Q_{nT}(\theta)\right) > 0\right) = 1,$$
where $Q_{nT}(\theta)$ is the concentrated likelihood in Eq. (6). The proof consists of four steps.
1. There exists a lower bound Q˜ nT
(θ0) at θ0, such that QnT (θ0)≥ Q˜ nT
(θ0). Because
(Σ− 1
2ε0 ⊗ In
)(znt −Xntzβz0)−Γnz fzt =
(Σ− 1
2ε0 ⊗ In
)εnt +Γnz0 fzt0−Γnz fzt ,
35
and
σ−1ξ 0
[Sntynt −Xntyβy0−
(δ′0⊗ In
)(znt −Xntzβz0)
]−Γny fyt
=σ−1ξ 0
[(δ′0⊗ In
)εnt + Γny0 fyt0 +ξnt −
(δ′0⊗ In
)(Γnz0 fzt0 + εnt
)]−Γny fyt
=σ−1ξ 0 ξnt +σ
−1ξ 0
(Γny0 fyt0−
(δ′0⊗ In
)Γnz0 fzt0
)−Γny fyt = σ
−1ξ 0 ξnt +Γny0 fyt0−Γny fyt , hence,
logLnT (θ0,Γnz,Γny,FT z,FTy)
=− p+12
log(2π)− 12
log |Σε0|−12
logσ2ξ 0 +
1nT
T
∑t=1
log |Snt |
− 12nT
T
∑t=1
[(Σ− 1
2ε0 ⊗ In
)εnt +Γnz0 fzt0−Γnz fzt
]′[(Σ− 1
2ε0 ⊗ In
)εnt +Γnz0 fzt0−Γnz fzt
]− 1
2nT
T
∑t=1
[σ−1ξ 0 ξnt +Γny0 fyt0−Γny fyt
]′ [σ−1ξ 0 ξnt +Γny0 fyt0−Γny fyt
], and,
logLnT (θ0,Γnz0,Γny0,FT z0,FTy0)
=− p+12
log(2π)− 12
log |Σε0|−12
logσ2ξ 0 +
1nT
T
∑t=1
log |Snt |
− 12nT
T
∑t=1
ε′nt(Σ−1ε0 ⊗ In
)εnt −
12nT σ2
ξ 0
T
∑t=1
ξ′ntξnt
=− p+12
(log(2π)+1)− 12
log |Σε0|−12
logσ2ξ 0 +
1nT
T
∑t=1
log |Snt |+OP
(1√nT
).
Therefore,
QnT (θ0)≥ logLnT (θ0,Γnz0,Γny0,FT z0,FTy0)
=− p+12
(log(2π)+1)− 12
log |Σε0|−12
logσ2ξ 0 +
1nT
T
∑t=1
log |Snt |+OP
(1√nT
). (A.1)
Note that this lower bound may not hold if the number of factors is under-specified, in which case the
variation due to factors may not all be accounted for.
2. There exists a function QnT (·), such that QnT (θ)≤ QnT (θ) for all θ ∈Θ.
QnT (θ)
=− p+12
log(2π)− 12
log |Σε |−12
logσ2ξ+
1nT
T
∑t=1
log |Snt(λ )|
− 12nT
minΓnz∈RnP×Rz ,FT z∈RT×Rz
T
∑t=1
[(Σ− 1
2ε ⊗ In
)(znt −Xntzβz)−Γnz fzt
]′[(Σ− 1
2ε ⊗ In
)(znt −Xntzβz)−Γnz fzt
](A.2)
36
− 12nT
minΓny∈Rn×Ry ,FTy∈RT×Ry
T
∑t=1
[σ−1ξ
(Snt(λ )ynt −Xntyβy−
(δ′⊗ In
)(Znt −Xntzβz)
)−Γny fyt
]′·[σ−1ξ
(Snt(λ )ynt −Xntyβy−
(δ′⊗ In
)(Znt −Xntzβz)
)−Γny fyt
]. (A.3)
Consider first Eq. (A.2).
minΓnz∈Rnp×Rz ,FT z∈RT×Rz
1nT
T
∑t=1
[(Σ− 1
2ε ⊗ In
)(znt −Xntzβz)−Γnz fzt
]′[(Σ− 1
2ε ⊗ In
)(znt −Xntzβz)−Γnz fzt
]= min
Γnz∈Rnp×Rz ,FT z∈RT×Rz
1nT
T
∑t=1
[(Σ− 1
2ε ⊗ In
)(Xntz(βz0−βz)+ εnt)+
(Σ− 1
2ε ⊗ In
)Γnz0 fzt0−Γnz fzt
]′·[(
Σ− 1
2ε ⊗ In
)(Xntz(βz0−βz)+ εnt)+
(Σ− 1
2ε ⊗ In
)Γnz0 fzt0−Γnz fzt
]≥ min
˜Γnz∈Rnp×2Rz , ˜FT z∈RT×2Rz
1nT
T
∑t=1
[(Σ− 1
2ε ⊗ In
)(Xntz(βz0−βz)+ εnt)− ˜
Γnz˜fzt
]′·[(
Σ− 1
2ε ⊗ In
)(Xntz(βz0−βz)+ εnt)− ˜
Γnz˜fzt
]= min
˜Γnz∈Rnp×2Rz
1nT
T
∑t=1
[(Σ− 1
2ε ⊗ In
)(Xntz(βz0−βz)+ εnt)
]′M ˜
Γnz
[(Σ− 1
2ε ⊗ In
)(Xntz(βz0−βz)+ εnt)
]≥ min
˜Γnz∈Rnp×2Rz
1nT
T
∑t=1
[(Σ− 1
2ε ⊗ In
)Xntz(βz0−βz)
]′M ˜
Γnz
[(Σ− 1
2ε ⊗ In
)Xntz(βz0−βz)
]+ min
˜Γnz∈Rnp×2Rz
1nT
T
∑t=1
[(Σ− 1
2ε ⊗ In
)εnt
]′M ˜
Γnz
[(Σ− 1
2ε ⊗ In
)εnt
]+ min
˜Γnz∈Rnp×2Rz
2nT
T
∑t=1
[(Σ− 1
2ε ⊗ In
)Xntz(βz0−βz)
]′M ˜
Γnz
[(Σ− 1
2ε ⊗ In
)εnt
]=
np
∑r=2Rz+1
µr
(1
nT
(Σ− 1
2ε ⊗ In
)(ηz ·Zz
nT )(ηz ·ZznT )′(
Σ− 1
2ε ⊗ In
))(A.4)
+1
nT
T
∑t=1
ε′nt(Σ−1ε ⊗ In
)εnt − max
˜Γnz∈RnP×2Rz
1nT
T
∑t=1
tr[(
Σ− 1
2ε ⊗ In
)εntε
′nt
(Σ− 1
2ε ⊗ In
)P˜
Γnz
]+
2nT
T
∑t=1
[Xntz(βz0−βz)]′ (
Σ−1ε ⊗ In
)εnt
− max˜Γnz∈RnP×2Rz
2nT
T
∑t=1
tr[(
Σ− 1
2ε ⊗ In
)εnt [Xntz(βz0−βz)]
′(
Σ− 1
2ε ⊗ In
)P˜
Γnz
]≥bz ‖ηz‖2
2 (A.5)
+ tr(Σ−1ε Σε0
)− 2Rz
nT
∥∥∥∥Σ− 1
2ε ⊗ In
∥∥∥∥2
2
∥∥EnT E ′nT
∥∥2 +OP
(1√nT
)(A.6)
− 2Rz
nT
∥∥∥∥Σ− 1
2ε ⊗ In
∥∥∥∥2
2‖EnT‖2
kz
∑k=1
∥∥XnT z,k∥∥
2 |βz0,k−βz,k| (A.7)
=bz ‖ηz‖22 + tr
(Σ−1ε Σε0
)+OP
(‖ηz‖2
(n−
12 +T−
12
))+OP
(1√nT
), (A.8)
37
where ηz = βz0−βz. Eq. (A.5) is because
np
∑r=2Rz+1
µr
(1
nT
(Σ− 1
2ε ⊗ In
)(ηz ·Zz
nT )(ηz ·ZznT )′(
Σ− 1
2ε ⊗ In
))=
np
∑r=2Rz+1
µr
(1
nT
(Σ−1ε ⊗ In
)(ηz ·Zz
nT )(ηz ·ZznT )′)
≥µpn(Σ−1ε ⊗ In
) np
∑r=2Rz+1
µr
(1
nT(ηz ·Zz
nT )(ηz ·ZznT )′)= µp
(Σ−1ε
) np
∑r=2Rz+1
µr
(1
nT(ηz ·Zz
nT )(ηz ·ZznT )′)(A.9)
≥bz ‖ηz‖22 ,
where the inequality in Eq. (A.9) can be found in Theorem 8.12 of Zhang (2011); the last inequality is due
to Assumption 4 and $b_z$ is some generic positive constant. In Eq. (A.6), we use the inequality that for a
square matrix $A$, $|\operatorname{tr}(A)| \le \operatorname{rank}(A)\,\|A\|_2$ (see Zhang (2011)) and then $\|E_{nT}\|_2 = O_P\left(\sqrt{\max(n,T)}\right)$ from
Claim 1. In Eq. (A.7), $\|X_{nTz,k}\|_2 = O\left(\sqrt{nT}\right)$ from Claim 2. Next consider Eq. (A.3). Substituting in
ynt = S−1nt(Xntyβy0 +(δ ′0⊗ In)εnt + Γny0 fyt0 +ξnt
)and znt = Xntzβz0 + Γnz0 fzt0 + εnt ,
σ−1ξ
(Snt(λ )ynt −Xntyβy−
(δ′⊗ In
)(znt −Xntzβz)
)−Γny fyt
=σ−1ξ
[Snt(λ )S−1
nt(Xntyβy0 +(δ ′0⊗ In)εnt + Γny0 fyt0 +ξnt
)−Xntyβy
−(δ ′⊗ In)(Xntzβz0 + Γnz0 fzt0 + εnt −Xntzβz
)]−Γny fyt
=σ−1ξ
[Xnty(βy0−βy)+Gnt
(Xntyβy0 +
(δ′0⊗ In
)εnt + Γny0 fyt0
)(λ0−λ )+
((δ′0−δ
′)⊗ In)
εnt
−(δ′⊗ In
)Xntz(βz0−βz)+Snt(λ )S−1
nt ξnt]+σ
−1ξ
(Γny0 fyt0−
(δ′⊗ In
)Γnz0 fzt0
)−Γny fyt .
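Defining $\eta_y = \left(\beta_{y0}'-\beta_y', \lambda_0-\lambda, \delta_0'-\delta'\right)'$, we have
\begin{align}
&\min_{\Gamma_{ny}\in\mathbb{R}^{n\times R_y},\,F_{Ty}\in\mathbb{R}^{T\times R_y}} \frac{1}{nT}\sum_{t=1}^T \left[\sigma_\xi^{-1}\left(S_{nt}(\lambda)y_{nt} - X_{nty}\beta_y - (\delta'\otimes I_n)(z_{nt}-X_{ntz}\beta_z)\right) - \Gamma_{ny}f_{yt}\right]' \notag\\
&\qquad\qquad \times\left[\sigma_\xi^{-1}\left(S_{nt}(\lambda)y_{nt} - X_{nty}\beta_y - (\delta'\otimes I_n)(z_{nt}-X_{ntz}\beta_z)\right) - \Gamma_{ny}f_{yt}\right] \notag\\
&\ge \min_{\tilde\Gamma_{ny}\in\mathbb{R}^{n\times 2R_y},\,\tilde F_{Ty}\in\mathbb{R}^{T\times 2R_y}} \frac{1}{nT\sigma_\xi^2}\sum_{t=1}^T \Big[X_{nty}(\beta_{y0}-\beta_y) + \left((\delta_0'-\delta')\otimes I_n\right)\varepsilon_{nt} \notag\\
&\qquad\qquad + G_{nt}\left(X_{nty}\beta_{y0} + (\delta_0'\otimes I_n)\varepsilon_{nt} + \Gamma_{ny0}f_{yt0}\right)(\lambda_0-\lambda) - (\delta'\otimes I_n)X_{ntz}(\beta_{z0}-\beta_z) + S_{nt}(\lambda)S_{nt}^{-1}\xi_{nt} - \tilde\Gamma_{ny}\tilde f_{yt}\Big]' \notag\\
&\qquad\qquad \times\Big[X_{nty}(\beta_{y0}-\beta_y) + \left((\delta_0'-\delta')\otimes I_n\right)\varepsilon_{nt} + G_{nt}\left(X_{nty}\beta_{y0} + (\delta_0'\otimes I_n)\varepsilon_{nt} + \Gamma_{ny0}f_{yt0}\right)(\lambda_0-\lambda) \notag\\
&\qquad\qquad\quad - (\delta'\otimes I_n)X_{ntz}(\beta_{z0}-\beta_z) + S_{nt}(\lambda)S_{nt}^{-1}\xi_{nt} - \tilde\Gamma_{ny}\tilde f_{yt}\Big] \notag\\
&\ge \min_{\tilde\Gamma_{ny}\in\mathbb{R}^{n\times 2R_y}} \frac{1}{nT}\sum_{t=1}^T \sigma_\xi^{-1}\left[X_{nty}(\beta_{y0}-\beta_y) + G_{nt}\left(X_{nty}\beta_{y0} + \varepsilon_{nt}\delta_0 + \Gamma_{ny0}f_{yt0}\right)(\lambda_0-\lambda) + \varepsilon_{nt}(\delta_0-\delta)\right]' M_{\tilde\Gamma_{ny}} \notag\\
&\qquad\qquad \times\sigma_\xi^{-1}\left[X_{nty}(\beta_{y0}-\beta_y) + G_{nt}\left(X_{nty}\beta_{y0} + \varepsilon_{nt}\delta_0 + \Gamma_{ny0}f_{yt0}\right)(\lambda_0-\lambda) + \varepsilon_{nt}(\delta_0-\delta)\right] \notag\\
&\quad + \min_{\tilde\Gamma_{ny}\in\mathbb{R}^{n\times 2R_y}} \frac{1}{nT}\sum_{t=1}^T \sigma_\xi^{-1}\left((\delta'\otimes I_n)X_{ntz}(\beta_{z0}-\beta_z)\right)' M_{\tilde\Gamma_{ny}}\, \sigma_\xi^{-1}\left((\delta'\otimes I_n)X_{ntz}(\beta_{z0}-\beta_z)\right) \notag\\
&\quad + \min_{\tilde\Gamma_{ny}\in\mathbb{R}^{n\times 2R_y}} -\frac{2}{nT}\sum_{t=1}^T \sigma_\xi^{-1}\left[X_{nty}(\beta_{y0}-\beta_y) + G_{nt}\left(X_{nty}\beta_{y0} + \varepsilon_{nt}\delta_0 + \Gamma_{ny0}f_{yt0}\right)(\lambda_0-\lambda) + \varepsilon_{nt}(\delta_0-\delta)\right]' M_{\tilde\Gamma_{ny}} \notag\\
&\qquad\qquad \times\sigma_\xi^{-1}\left((\delta'\otimes I_n)X_{ntz}(\beta_{z0}-\beta_z)\right) \notag\\
&\quad + \min_{\tilde\Gamma_{ny}\in\mathbb{R}^{n\times 2R_y}} \frac{1}{nT}\sum_{t=1}^T \left[\sigma_\xi^{-1}S_{nt}(\lambda)S_{nt}^{-1}\xi_{nt}\right]' M_{\tilde\Gamma_{ny}} \left[\sigma_\xi^{-1}S_{nt}(\lambda)S_{nt}^{-1}\xi_{nt}\right] \notag\\
&\quad + \min_{\tilde\Gamma_{ny}\in\mathbb{R}^{n\times 2R_y}} \frac{2}{nT}\sum_{t=1}^T \sigma_\xi^{-1}\Big[X_{nty}(\beta_{y0}-\beta_y) + G_{nt}\left(X_{nty}\beta_{y0} + \varepsilon_{nt}\delta_0 + \Gamma_{ny0}f_{yt0}\right)(\lambda_0-\lambda) + \varepsilon_{nt}(\delta_0-\delta) \notag\\
&\qquad\qquad - (\delta'\otimes I_n)X_{ntz}(\beta_{z0}-\beta_z)\Big]' M_{\tilde\Gamma_{ny}}\, \sigma_\xi^{-1}S_{nt}(\lambda)S_{nt}^{-1}\xi_{nt} \notag\\
&= \sigma_\xi^{-2}\sum_{r=2R_y+1}^{n}\mu_r\left(\frac{1}{nT}\left(\eta_y\cdot Z^y_{nT}\right)\left(\eta_y\cdot Z^y_{nT}\right)'\right) + O_P\left(\|\beta_{z0}-\beta_z\|_2^2\right) + O_P\left(\|\beta_{z0}-\beta_z\|_2\|\eta_y\|_2\right) \notag\\
&\quad + \frac{1}{nT}\sum_{t=1}^T \sigma_\xi^{-2}\xi_{nt}'S_{nt}^{-1\prime}S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}\xi_{nt} - \max_{\tilde\Gamma_{ny}\in\mathbb{R}^{n\times 2R_y}} \frac{1}{nT}\sum_{t=1}^T \mathrm{tr}\left(\sigma_\xi^{-2}S_{nt}(\lambda)S_{nt}^{-1}\xi_{nt}\xi_{nt}'S_{nt}^{-1\prime}S_{nt}(\lambda)'P_{\tilde\Gamma_{ny}}\right) \tag{A.10}\\
&\quad + \frac{2}{nT}\sum_{t=1}^T \sigma_\xi^{-2}\Big[X_{nty}(\beta_{y0}-\beta_y) + G_{nt}\left(X_{nty}\beta_{y0} + \varepsilon_{nt}\delta_0 + \Gamma_{ny0}f_{yt0}\right)(\lambda_0-\lambda) \notag\\
&\qquad\qquad + \varepsilon_{nt}(\delta_0-\delta) - (\delta'\otimes I_n)X_{ntz}(\beta_{z0}-\beta_z)\Big]' S_{nt}(\lambda)S_{nt}^{-1}\xi_{nt} \tag{A.11}\\
&\quad - \max_{\tilde\Gamma_{ny}\in\mathbb{R}^{n\times 2R_y}} \frac{2}{nT\sigma_\xi^2}\sum_{t=1}^T \mathrm{tr}\Big(S_{nt}(\lambda)S_{nt}^{-1}\xi_{nt}\Big[X_{nty}(\beta_{y0}-\beta_y) + G_{nt}\left(X_{nty}\beta_{y0} + \varepsilon_{nt}\delta_0 + \Gamma_{ny0}f_{yt0}\right)(\lambda_0-\lambda) \notag\\
&\qquad\qquad + \varepsilon_{nt}(\delta_0-\delta) - (\delta'\otimes I_n)X_{ntz}(\beta_{z0}-\beta_z)\Big]' P_{\tilde\Gamma_{ny}}\Big) \tag{A.12}\\
&\ge b_y\|\eta_y\|_2^2 + O_P\left(\|\beta_{z0}-\beta_z\|_2^2\right) + O_P\left(\|\beta_{z0}-\beta_z\|_2\|\eta_y\|_2\right) + \frac{\sigma_{\xi0}^2}{\sigma_\xi^2}\frac{1}{nT}\sum_{t=1}^T \mathrm{tr}\left(S_{nt}^{-1\prime}S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}\right) \notag\\
&\quad + O_P\left(\|\eta_y\|_2\left(n^{-\frac12}+T^{-\frac12}\right)\right) + O_P\left(\left(\|\eta_y\|_2^2 + \|\eta_y\|_2\|\eta_z\|_2\right)\left(n^{-\frac14}+T^{-\frac14}\right)\right) + O_P\left(\frac{1}{\sqrt{nT}}\right). \tag{A.13}
\end{align}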
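Under Assumption 4, $b_y > 0$. If there is collinearity between $G_{nt}\left(X_{nty}\beta_{y0} + \varepsilon_{nt}\delta_0 + \Gamma_{ny0}f_{yt0}\right)$, $X_{nty}$ and $\varepsilon_{nt}$, then Assumption 4 will not be satisfied. However, with $\lambda$ identified from the alternative Assumption 5(2), we can still have $b_y > 0$ from Assumption 5(1). In Eq. (A.10),
\[
\frac{1}{nT}\sum_{t=1}^T \sigma_\xi^{-2}\xi_{nt}'S_{nt}^{-1\prime}S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}\xi_{nt} = \frac{\sigma_{\xi0}^2}{\sigma_\xi^2}\frac{1}{nT}\sum_{t=1}^T \mathrm{tr}\left(S_{nt}^{-1\prime}S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}\right) + O_P\left(\frac{1}{\sqrt{nT}}\right)
\]
from Lemma 5(1), and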
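\begin{align*}
&\left|\max_{\tilde\Gamma_{ny}\in\mathbb{R}^{n\times 2R_y}} \frac{1}{nT}\sum_{t=1}^T \mathrm{tr}\left(\sigma_\xi^{-2}S_{nt}(\lambda)S_{nt}^{-1}\xi_{nt}\xi_{nt}'S_{nt}^{-1\prime}S_{nt}(\lambda)'P_{\tilde\Gamma_{ny}}\right)\right| \\
&\le \left|\max_{\tilde\Gamma_{ny}\in\mathbb{R}^{n\times 2R_y}} \frac{1}{nT\sigma_\xi^2}\sum_{t=1}^T \mathrm{tr}\left(\xi_{nt}\xi_{nt}'P_{\tilde\Gamma_{ny}}\right)\right| + \left|\max_{\tilde\Gamma_{ny}\in\mathbb{R}^{n\times 2R_y}} \frac{2(\lambda_0-\lambda)}{nT\sigma_\xi^2}\sum_{t=1}^T \mathrm{tr}\left(G_{nt}\xi_{nt}\xi_{nt}'P_{\tilde\Gamma_{ny}}\right)\right| \\
&\quad + \left|\max_{\tilde\Gamma_{ny}\in\mathbb{R}^{n\times 2R_y}} \frac{(\lambda_0-\lambda)^2}{nT\sigma_\xi^2}\sum_{t=1}^T \mathrm{tr}\left(G_{nt}\xi_{nt}\xi_{nt}'G_{nt}'P_{\tilde\Gamma_{ny}}\right)\right| \\
&\le \frac{2R_y}{nT\sigma_\xi^2}\|\Xi_{nT}\|_2^2 + \frac{2R_y|\lambda_0-\lambda|}{nT\sigma_\xi^2}\|U_{nT}\|_2\|\Xi_{nT}\|_2 + \frac{R_y|\lambda_0-\lambda|^2}{nT\sigma_\xi^2}\|U_{nT}\|_2^2 \\
&= O_p\left(\frac1n+\frac1T\right) + O_p\left(|\lambda_0-\lambda|\left(n^{-\frac34}+T^{-\frac34}\right)\right) + O_p\left(|\lambda_0-\lambda|^2\left(n^{-\frac12}+T^{-\frac12}\right)\right),
\end{align*}
where $U_{nT} = (u_{n1},\cdots,u_{nT})$ with $u_{nt} = G_{nt}\xi_{nt}$ is an $n\times T$ matrix, and the last equality is from Claim 3, $\|U_{nT}\|_2 = O_P\left(\sqrt{nT}\left(n^{-\frac14}+T^{-\frac14}\right)\right)$. Eq. (A.11) is $O_P\left(\frac{1}{\sqrt{nT}}\right)$ by Proposition 2. For example, one term can be written as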
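\begin{align*}
\frac{1}{nT}\sum_{t=1}^T \left(G_{nt}X_{nty}\beta_{y0}\right)'S_{nt}(\lambda)S_{nt}^{-1}\xi_{nt} &= \frac{1}{nT}X^{\beta_{y0}\prime}_{nTy}G_{nT}'\left(I_{nT} + (\lambda_0-\lambda)G_{nT}\right)\xi^*_{nT} \\
&= \frac{1}{nT}\mathrm{E}\,X^{\beta_{y0}\prime}_{nTy}G_{nT}'\left(I_{nT} + (\lambda_0-\lambda)G_{nT}\right)\xi^*_{nT} + O_p\left(\frac{1}{\sqrt{nT}}\right) = O_P\left(\frac{1}{\sqrt{nT}}\right),
\end{align*}
with $\xi^*_{nT} = (\xi_{n1}',\cdots,\xi_{nT}')'$, $G_{nT}$ in Proposition 2 and $X^{\beta_{y0}}_{nTy} = \left(\beta_{y0}'X_{n1y}',\cdots,\beta_{y0}'X_{nTy}'\right)'$. The term in (A.12) can be similarly shown to be $O_P\left(\|\eta_y\|_2\left(n^{-\frac12}+T^{-\frac12}\right)\right) + O_P\left(\left(\|\eta_y\|_2^2 + \|\eta_y\|_2\|\eta_z\|_2\right)\left(n^{-\frac14}+T^{-\frac14}\right)\right)$. Combining Eqs. (A.8) and (A.13), we have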
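\begin{align}
Q_{nT}(\theta) \le{}& -\frac{p+1}{2}\log(2\pi) - \frac12\log|\Sigma_\varepsilon| - \frac12\log\sigma_\xi^2 + \frac{1}{nT}\sum_{t=1}^T \log|S_{nt}(\lambda)| - \frac12\frac{\sigma_{\xi0}^2}{\sigma_\xi^2}\frac{1}{nT}\sum_{t=1}^T \mathrm{tr}\left(S_{nt}^{-1\prime}S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}\right) \notag\\
& -\frac12\Big[b_y\|\eta_y\|_2^2 + O_P\left(\|\eta_z\|_2^2\right) + O_P\left(\|\eta_z\|_2\|\eta_y\|_2\right) + \tag{A.14}\\
& \qquad O_p\left(\|\eta_y\|_2\left(n^{-\frac12}+T^{-\frac12}\right)\right) + O_P\left(\left(\|\eta_y\|_2^2 + \|\eta_z\|_2\|\eta_y\|_2\right)\left(n^{-\frac14}+T^{-\frac14}\right)\right)\Big] \tag{A.15}\\
& -\frac12\left[b_z\|\eta_z\|_2^2 + \mathrm{tr}\left(\Sigma_\varepsilon^{-1}\Sigma_{\varepsilon0}\right) + O_P\left(\|\eta_z\|_2\left(n^{-\frac12}+T^{-\frac12}\right)\right)\right] + O_P\left(n^{-1}+T^{-1}\right). \tag{A.16}
\end{align}
Notice that, with a slight abuse of notation, we list (A.14) separately from (A.15) because the former terms come from
\begin{align}
\frac{1}{nT}\sum_{t=1}^T \sigma_\xi^{-2}&\Big[X_{nty}(\beta_{y0}-\beta_y) + G_{nt}\left(X_{nty}\beta_{y0} + \varepsilon_{nt}\delta_0 + \Gamma_{ny0}f_{yt0}\right)(\lambda_0-\lambda) + \varepsilon_{nt}(\delta_0-\delta) - (\delta'\otimes I_n)X_{ntz}(\beta_{z0}-\beta_z)\Big]' M_{\tilde\Gamma_{ny}} \notag\\
\times&\Big[X_{nty}(\beta_{y0}-\beta_y) + G_{nt}\left(X_{nty}\beta_{y0} + \varepsilon_{nt}\delta_0 + \Gamma_{ny0}f_{yt0}\right)(\lambda_0-\lambda) + \varepsilon_{nt}(\delta_0-\delta) - (\delta'\otimes I_n)X_{ntz}(\beta_{z0}-\beta_z)\Big], \tag{A.17}
\end{align}
and $M_{\tilde\Gamma_{ny}}$ is positive semi-definite, so that (A.17) is non-negative, while the latter terms come from (A.10) and (A.12). In the following analysis, we can take $Q_{nT}(\theta)$ to be the right hand side of the preceding inequality for $Q_{nT}(\theta)$.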
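3. $\liminf_{n,T\to\infty} P\left(\inf_{\theta\in\Theta,\,\|\theta-\theta_0\|_2\ge\tau}\left[Q_{nT}(\theta_0) - Q_{nT}(\theta)\right] > 0\right) = 1$ for any $\tau > 0$. From Eq. (A.1) and (A.16),
\begin{align}
Q_{nT}(\theta_0) - Q_{nT}(\theta) \ge{}& \frac12\left[b_z\|\eta_z\|_2^2 + b_y\|\eta_y\|_2^2 + O_P\left(\|\eta_z\|_2^2\right) + O_P\left(\|\eta_z\|_2\|\eta_y\|_2\right)\right] \notag\\
& + \frac12\left[O_P\left(\|\eta_z\|_2\left(n^{-\frac12}+T^{-\frac12}\right)\right) + O_P\left(\|\eta_y\|_2\left(n^{-\frac12}+T^{-\frac12}\right)\right) + O_P\left(\left(\|\eta_y\|_2^2 + \|\eta_z\|_2\|\eta_y\|_2\right)\left(n^{-\frac14}+T^{-\frac14}\right)\right)\right] \notag\\
& + \frac12\left(-\log\left[\frac{\sigma_{\xi0}^2}{\sigma_\xi^2}\left(\prod_{t=1}^T \left|S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}S_{nt}^{-1\prime}\right|^{\frac{1}{nT}}\right)\right] + \frac{\sigma_{\xi0}^2}{\sigma_\xi^2}\frac{1}{nT}\sum_{t=1}^T \mathrm{tr}\left(S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}S_{nt}^{-1\prime}\right) - 1\right) \notag\\
& + \frac p2\left(-\log\left|\Sigma_{\varepsilon0}\Sigma_\varepsilon^{-1}\right|^{\frac1p} + \frac1p\mathrm{tr}\left(\Sigma_{\varepsilon0}\Sigma_\varepsilon^{-1}\right) - 1\right) + O_P\left(n^{-1}+T^{-1}\right). \tag{A.18}
\end{align}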
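By the inequality of arithmetic and geometric means,
\begin{align}
\frac{1}{nT}\sum_{t=1}^T \mathrm{tr}\left(S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}S_{nt}^{-1\prime}\right) &\ge \frac1T\sum_{t=1}^T \left|S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}S_{nt}^{-1\prime}\right|^{\frac1n} \ge \left(\prod_{t=1}^T \left|S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}S_{nt}^{-1\prime}\right|^{\frac1n}\right)^{\frac1T}, \tag{A.19}\\
\frac1p\mathrm{tr}\left(\Sigma_{\varepsilon0}\Sigma_\varepsilon^{-1}\right) &\ge \left|\Sigma_{\varepsilon0}\Sigma_\varepsilon^{-1}\right|^{\frac1p}. \tag{A.20}
\end{align}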
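The first inequality in (A.19) is the arithmetic-geometric mean inequality applied to the eigenvalues of a positive semi-definite matrix: the average eigenvalue (trace over $n$) is at least the geometric mean of the eigenvalues (determinant to the power $1/n$). A minimal numerical sketch follows, assuming the standard definition $S_{nt}(\lambda) = I_n - \lambda W_{nt}$ and an illustrative row-normalised weights matrix; neither is taken from the paper's data.

```python
import numpy as np

def trace_and_det_means(W, lam0, lam):
    """Return (tr(A)/n, |A|^{1/n}) for A = R'R with R = S(lam) S(lam0)^{-1}.

    A has the same trace and determinant as S(lam)'S(lam) S^{-1} S^{-1}'.
    """
    n = W.shape[0]
    I = np.eye(n)
    R = (I - lam * W) @ np.linalg.inv(I - lam0 * W)
    A = R.T @ R
    arithmetic = float(np.trace(A) / n)
    _, logdet = np.linalg.slogdet(A)       # log-determinant for numerical stability
    geometric = float(np.exp(logdet / n))
    return arithmetic, geometric

# Illustrative weights matrix (hypothetical data).
rng = np.random.default_rng(0)
W = rng.random((8, 8))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

am, gm = trace_and_det_means(W, lam0=0.4, lam=0.1)
am0, gm0 = trace_and_det_means(W, lam0=0.4, lam=0.4)
print(am, gm, am0, gm0)
```

At $\lambda = \lambda_0$ the matrix is the identity and both means equal one; away from $\lambda_0$ the arithmetic mean is at least the geometric mean.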
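Furthermore, because $b_y\|\eta_y\|_2^2 + O_P\left(\|\eta_z\|_2^2\right) + O_P\left(\|\eta_z\|_2\|\eta_y\|_2\right) \ge 0$ and $\|\eta_y\|_2$ is bounded,
\begin{align}
Q_{nT}(\theta_0) - Q_{nT}(\theta) \ge{}& \frac12\left(b_z\|\eta_z\|_2^2 + o_P(1)\right) \tag{A.21}\\
& + \frac12\left(-\log\left[\frac{\sigma_{\xi0}^2}{\sigma_\xi^2}\left(\prod_{t=1}^T \left|S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}S_{nt}^{-1\prime}\right|^{\frac{1}{nT}}\right)\right] + \frac{\sigma_{\xi0}^2}{\sigma_\xi^2}\prod_{t=1}^T \left|S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}S_{nt}^{-1\prime}\right|^{\frac{1}{nT}} - 1\right) \notag\\
& + \frac p2\left(-\log\left|\Sigma_{\varepsilon0}\Sigma_\varepsilon^{-1}\right|^{\frac1p} + \left|\Sigma_{\varepsilon0}\Sigma_\varepsilon^{-1}\right|^{\frac1p} - 1\right). \notag
\end{align}
If $\|\eta_z\|_2 = 0$,
\begin{align}
Q_{nT}(\theta_0) - Q_{nT}(\theta) \ge{}& \frac12\left(b_y\|\eta_y\|_2^2 + o_P(1)\right) \tag{A.22}\\
& + \frac12\left(-\log\left[\frac{\sigma_{\xi0}^2}{\sigma_\xi^2}\left(\prod_{t=1}^T \left|S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}S_{nt}^{-1\prime}\right|^{\frac{1}{nT}}\right)\right] + \frac{\sigma_{\xi0}^2}{\sigma_\xi^2}\prod_{t=1}^T \left|S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}S_{nt}^{-1\prime}\right|^{\frac{1}{nT}} - 1\right) \notag\\
& + \frac p2\left(-\log\left|\Sigma_{\varepsilon0}\Sigma_\varepsilon^{-1}\right|^{\frac1p} + \left|\Sigma_{\varepsilon0}\Sigma_\varepsilon^{-1}\right|^{\frac1p} - 1\right). \notag
\end{align}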
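For any real number $x > 0$, $-\log x + x - 1 \ge 0$, with equality only if $x = 1$. Because $\frac1p\mathrm{tr}\left(\Sigma_{\varepsilon0}\Sigma_\varepsilon^{-1}\right) > \left|\Sigma_{\varepsilon0}\Sigma_\varepsilon^{-1}\right|^{\frac1p}$ unless $\Sigma_\varepsilon = c\Sigma_{\varepsilon0}$ for some scalar $c$, and $-\log\left|\Sigma_{\varepsilon0}\Sigma_\varepsilon^{-1}\right|^{\frac1p} + \left|\Sigma_{\varepsilon0}\Sigma_\varepsilon^{-1}\right|^{\frac1p} - 1 > 0$ if $c \ne 1$, we have $Q_{nT}(\theta_0) - Q_{nT}(\theta) > 0$ if $\Sigma_\varepsilon \ne \Sigma_{\varepsilon0}$ with probability approaching 1 (wpa 1). Because the mapping from $\Sigma_\varepsilon$ to $\alpha$ is one to one, $Q_{nT}(\theta_0) - Q_{nT}(\theta) > 0$ if $\alpha \ne \alpha_0$ wpa 1. For the other parameters, consider the case with Assumption 4 first. If any of $\|\beta_{z0}-\beta_z\|_2 \ge \tau$, $\|\beta_{y0}-\beta_y\|_2 \ge \tau$, $|\lambda_0-\lambda| \ge \tau$, $\|\delta_0-\delta\|_2 \ge \tau$ holds for some $\tau > 0$, then Eq. (A.21) and/or Eq. (A.22) is strictly greater than 0 wpa 1, and hence $Q_{nT}(\theta_0) - Q_{nT}(\theta) > 0$ wpa 1. Alternatively, consider the case with Assumption 5. The inequality in Eq. (A.19) holds strictly for any $\lambda \ne \lambda_0$ wpa 1. On the other hand, with $\lambda = \lambda_0$, Eq. (A.21) and/or Eq. (A.22) is strictly positive wpa 1 if, for some $\tau > 0$, either $\|\beta_{z0}-\beta_z\|_2 \ge \tau$, $\|\beta_{y0}-\beta_y\|_2 \ge \tau$, or $\|\delta_0-\delta\|_2 \ge \tau$ holds.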
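4. The rate of convergence. Let $\hat\theta$ denote the QML estimator, $\hat\eta_z = \beta_{z0} - \hat\beta_z$ and $\hat\eta_y = \left(\beta_{y0}' - \hat\beta_y', \lambda_0 - \hat\lambda, \delta_0' - \hat\delta'\right)'$. Let $c_{nT} = \min(n,T)$. By definition, $Q_{nT}(\hat\theta) - Q_{nT}(\theta_0) \ge 0$, which implies from Eq. (A.18),
\begin{align}
& b_y\|\hat\eta_y\|_2^2 + O_P\left(\|\hat\eta_z\|_2^2\right) + O_P\left(\|\hat\eta_z\|_2\|\hat\eta_y\|_2\right) + b_z\|\hat\eta_z\|_2^2 + O_P\left(\|\hat\eta_z\|_2 c_{nT}^{-\frac12}\right) + O_P\left(\|\hat\eta_y\|_2 c_{nT}^{-\frac12}\right) \notag\\
&\quad + O_P\left(\left(\|\hat\eta_y\|_2^2 + \|\hat\eta_z\|_2\|\hat\eta_y\|_2\right)c_{nT}^{-\frac14}\right) + O_P\left(c_{nT}^{-1}\right) \le 0. \tag{A.23}
\end{align}
Completing the squares, we have from (A.23),
\[
\left(b_{y,nT}^{\frac12}\|\hat\eta_y\|_2 + \frac12 a_{y,nT}b_{y,nT}^{-\frac12}\right)^2 + d_{y,nT} - \frac14 a_{y,nT}^2 b_{y,nT}^{-1} \le 0, \tag{A.24}
\]
with $b_{y,nT} = b_y + O_P\left(c_{nT}^{-\frac14}\right)$, $a_{y,nT} = O_P\left(c_{nT}^{-\frac12}\right) + O_P\left(\|\hat\eta_z\|_2\right) + O_P\left(\|\hat\eta_z\|_2 c_{nT}^{-\frac14}\right)$ and $d_{y,nT} = O_P\left(\|\hat\eta_z\|_2^2\right) + b_z\|\hat\eta_z\|_2^2 + O_P\left(\|\hat\eta_z\|_2 c_{nT}^{-\frac12}\right) + O_P\left(c_{nT}^{-1}\right)$. Recall that $b_y\|\hat\eta_y\|_2^2 + O_P\left(\|\hat\eta_z\|_2^2\right) + O_P\left(\|\hat\eta_z\|_2\|\hat\eta_y\|_2\right) \ge 0$ (see (A.17)); then (A.23) implies that
\[
b_z\|\hat\eta_z\|_2^2 + \|\hat\eta_z\|_2\underbrace{O_P\left(c_{nT}^{-\frac12}\right)}_{a_{z,nT}} + \underbrace{O_P\left(\|\hat\eta_y\|_2 c_{nT}^{-\frac12}\right) + O_P\left(\left(\|\hat\eta_y\|_2^2 + \|\hat\eta_z\|_2\|\hat\eta_y\|_2\right)c_{nT}^{-\frac14}\right) + O_P\left(c_{nT}^{-1}\right)}_{d_{z,nT}} \le 0. \tag{A.25}
\]
Similarly, by completing the squares, from (A.25),
\[
\left(b_z^{\frac12}\|\hat\eta_z\|_2 + \frac12 a_{z,nT}b_z^{-\frac12}\right)^2 + d_{z,nT} - \frac14 a_{z,nT}^2 b_z^{-1} \le 0. \tag{A.26}
\]
Because the parameter spaces are bounded, $d_{z,nT} - \frac14 a_{z,nT}^2 b_z^{-1} = O_P\left(c_{nT}^{-\frac14}\right)$ and $b_z^{\frac12}\|\hat\eta_z\|_2 + \frac12 a_{z,nT}b_z^{-\frac12} = O_p\left(c_{nT}^{-\frac18}\right)$. As $a_{z,nT} = O_P\left(c_{nT}^{-\frac12}\right)$, we then have $\|\hat\eta_z\|_2 = O_P\left(c_{nT}^{-\frac18}\right)$.
Now we proceed to show that $\|\hat\eta_z\|_2 = O_P\left(c_{nT}^{-\frac12}\right)$. Assume that for some $\frac18 \le a < \frac12$, $\|\hat\eta_z\|_2 = O_P\left(c_{nT}^{-a}\right)$. Substituting $\|\hat\eta_z\|_2 = O_P\left(c_{nT}^{-a}\right)$ into Eq. (A.24), we have $d_{y,nT} - \frac14 a_{y,nT}^2 b_{y,nT}^{-1} = O_P\left(c_{nT}^{-2a}\right)$ and $a_{y,nT} = O_P\left(c_{nT}^{-a}\right)$. Therefore $\|\hat\eta_y\|_2 = O_P\left(c_{nT}^{-a}\right)$. Substituting $\|\hat\eta_y\|_2 = O_P\left(c_{nT}^{-a}\right)$ into Eq. (A.26), we have $d_{z,nT} = O_P\left(c_{nT}^{-\frac12-a} + c_{nT}^{-\frac14-2a}\right)$, which gives $\|\hat\eta_z\|_2 = O_P\left(\max\left(c_{nT}^{-\frac14-\frac12 a}, c_{nT}^{-\frac18-a}\right)\right)$. Note that for $a < \frac12$, $-a > \max\left(-\frac14-\frac12 a, -\frac18-a\right)$, so we can always improve the rate for $a < \frac12$. Therefore $\|\hat\eta_z\|_2 = O_P\left(c_{nT}^{-\frac12}\right)$. Then $\|\hat\eta_y\|_2 = O_P\left(c_{nT}^{-\frac12}\right)$ follows from Eq. (A.24).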
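The rate-improvement argument can be read as a fixed-point iteration on the exponent $a$: a rate $c_{nT}^{-a}$ for the $z$-equation parameters is upgraded to $c_{nT}^{-a'}$ with $a' = \min\left(\frac14+\frac12 a, \frac18+a\right)$. The sketch below simply iterates this map from the initial exponent $\frac18$; the function names are hypothetical and the code only illustrates the exponent arithmetic, not the probabilistic statements.

```python
def improve_exponent(a):
    """One bootstrapping step: O_P(c^-a) is upgraded to O_P(c^-a_new)."""
    return min(0.25 + 0.5 * a, 0.125 + a)

def exponent_path(a0=0.125, steps=60):
    """Iterate the map starting from the preliminary exponent a0 = 1/8."""
    path = [a0]
    for _ in range(steps):
        path.append(improve_exponent(path[-1]))
    return path

path = exponent_path()
print(path[:5], path[-1])
```

Each step strictly increases the exponent while it is below $\frac12$, and the sequence approaches $\frac12$, which is a fixed point of the map.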
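Lemma 5. Let $\xi_{nT} = \sigma_{\xi0}^{-1}\left(\xi_{n1}',\cdots,\xi_{nT}'\right)'$. Under Assumptions 1-3 and 6,
\[
\frac{1}{nT\sigma_{\xi0}^2}\sum_{t=1}^T \xi_{nt}'S_{nt}^{-1\prime}S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}\xi_{nt} = \frac{1}{nT}\sum_{t=1}^T \mathrm{tr}\left(S_{nt}^{-1\prime}S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}\right) + O_P\left(\frac{1}{\sqrt{nT}}\right).
\]
Proof. We have
\begin{align}
&\frac{1}{nT\sigma_{\xi0}^2}\sum_{t=1}^T \xi_{nt}'S_{nt}^{-1\prime}S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}\xi_{nt} \notag\\
&= \frac{1}{nT}\xi_{nT}'\left(I_{nT} + (\lambda_0-\lambda)\left(G_{nT} + G_{nT}'\right) + (\lambda_0-\lambda)^2 G_{nT}'G_{nT}\right)\xi_{nT} \notag\\
&= \frac{1}{nT}\mathrm{E}\,\mathrm{tr}\left(I_{nT} + (\lambda_0-\lambda)\left(G_{nT} + G_{nT}'\right) + (\lambda_0-\lambda)^2 G_{nT}'G_{nT}\right) + O_P\left(\frac{1}{\sqrt{nT}}\right) \tag{A.27}\\
&= \frac{1}{nT}\mathrm{tr}\left(I_{nT} + (\lambda_0-\lambda)\left(G_{nT} + G_{nT}'\right) + (\lambda_0-\lambda)^2 G_{nT}'G_{nT}\right) + O_P\left(\frac{1}{\sqrt{nT}}\right) \tag{A.28}\\
&= \frac{1}{nT}\sum_{t=1}^T \mathrm{tr}\left(S_{nt}^{-1\prime}S_{nt}(\lambda)'S_{nt}(\lambda)S_{nt}^{-1}\right) + O_P\left(\frac{1}{\sqrt{nT}}\right), \notag
\end{align}
where Eq. (A.27) is from Lemma 2. Let $\iota_{k,nT}$ be an $nT\times 1$ vector with one in its $k$-th entry and zeros in its other entries. Because $\mathrm{tr}\left(G_{nT}\right) = \sum_{k=1}^{nT} g_{k,nT}$ with $g_{k,nT} = \iota_{k,nT}'G_{nT}\iota_{k,nT}$ and $g_{k,nT}$ satisfies the NED properties, Eq. (A.28) follows from Qu and Lee (2015).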
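The statement of Lemma 5 says that a normalised quadratic form in the disturbances concentrates around a normalised trace. The following is a rough Monte Carlo sketch of that idea for a single period with a fixed, exogenous weights matrix and i.i.d. standard normal disturbances. These are simplifying assumptions made only for illustration: in the paper the weights matrices are endogenous and time varying, which is why Lemma 2 and the NED argument are needed. All names and values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, draws = 40, 20000
lam0, lam = 0.4, 0.1

# Illustrative fixed row-normalised weights matrix.
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

I = np.eye(n)
R = (I - lam * W) @ np.linalg.inv(I - lam0 * W)   # S(lam) S^{-1}
A = R.T @ R                                       # S^{-1}' S(lam)' S(lam) S^{-1}

xi = rng.standard_normal((draws, n))              # unit-variance disturbances
quad = np.einsum("ij,jk,ik->i", xi, A, xi) / n    # xi' A xi / n for each draw

sample_mean = float(quad.mean())
target = float(np.trace(A) / n)
print(sample_mean, target)
```

The average of the quadratic forms is close to the normalised trace, in line with the leading term of the lemma under these simplified conditions.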
A4. Components of the Asymptotic Distribution
As Eq. (11) shows, the asymptotic distribution depends on the approximate Hessian matrix $C_{nT}$ and the approximate gradient vector $C^{(1)}_{nT}$, whose components can be collected from Eq. (10) and further simplified using Lemmas 2 and 5(2). Recall that the parameter vector is $\theta = \left(\beta_z', \beta_y', \lambda, \alpha_\xi, \alpha', \delta'\right)'$.
(1) Elements of C(1)nT (the Approximate Score Vector)
The following lemma establishes that some of the second order terms in Eqs. (8) and (9) do not appear
in the leading order term of the score vector.
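Lemma 6. Under Assumptions 2 and 3,
\begin{align*}
\frac{1}{\sqrt{nT}}\mathrm{tr}\left(M_{\Gamma_{nz}}V^z_k M_{F_{Tz}}E_{nT}'\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)P_{\Gamma_{nz},F_{Tz}}E_{nT}'\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)\right) &= o_P(1), \\
\frac{1}{\sqrt{nT}}\mathrm{tr}\left(M_{\Gamma_{nz}}\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)E_{nT}M_{F_{Tz}}V^{z\prime}_k P_{\Gamma_{nz},F_{Tz}}E_{nT}'\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)\right) &= o_P(1), \\
\frac{1}{\sqrt{nT}}\mathrm{tr}\left(M_{\Gamma_{nz}}\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)E_{nT}M_{F_{Tz}}E_{nT}'\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)P_{\Gamma_{nz},F_{Tz}}V^{z\prime}_k\right) &= o_P(1),
\end{align*}
for $k = 1,\cdots,k_z+J$, and $V^z_k$ are from Section 4.3;
\begin{align*}
\frac{1}{\sqrt{nT}}\mathrm{tr}\left(M_{\Gamma_{ny}}V^y_k M_{F_{Ty}}\Xi_{nT}'P_{\Gamma_{ny},F_{Ty}}\Xi_{nT}'\right) = \frac{1}{\sqrt{nT}}\mathrm{tr}\left(M_{\Gamma_{ny}}\Xi_{nT}M_{F_{Ty}}V^{y\prime}_k P_{\Gamma_{ny},F_{Ty}}\Xi_{nT}'\right) &= o_P(1), \\
\frac{1}{\sqrt{nT}}\mathrm{tr}\left(M_{\Gamma_{ny}}\Xi_{nT}M_{F_{Ty}}\Xi_{nT}'P_{\Gamma_{ny},F_{Ty}}V^{y\prime}_k\right) &= o_P(1),
\end{align*}
for $k = 1,\cdots,k_z+k_y+2+p$, and $V^y_k$ are from Section 4.3.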
Proof. From Moon and Weidner (2014) (Lemma B.1), $\left\|P_{\Gamma_{nz},F_{Tz}}\right\|_2 = O_P\left(\frac{1}{\sqrt{nT}}\right)$ and $\left\|P_{\Gamma_{ny},F_{Ty}}\right\|_2 = O_P\left(\frac{1}{\sqrt{nT}}\right)$. They also show that the above terms are $o_P(1)$ if the regressor contains only exogenous components. Their result can be applied here except for the three terms, $-(\Omega_k\otimes I_n)E_{nT}$ in $V^z_{k_z+k}$ for $k = 1,\cdots,J$, $\sigma_{\xi0}^{-1}\left(G_{n1}\xi_{n1},\cdots,G_{nT}\xi_{nT}\right)$ in $V^y_{k_z+k_y+1}$, and $-\Xi_{nT}$ in $V^y_{k_z+k_y+2}$. However, because these terms are $o_P\left(\sqrt{nT}\right)$, the relevant terms are $o_P(1)$ by using the inequality $|\mathrm{tr}(A)| \le \mathrm{rank}(A)\|A\|_2$ for a square matrix $A$.
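The inequality $|\mathrm{tr}(A)| \le \mathrm{rank}(A)\|A\|_2$ is used repeatedly in this appendix to bound traces of low-rank products by spectral norms. The following is a small numerical sketch of the inequality on a randomly generated low-rank matrix; the dimensions and the helper name are illustrative assumptions.

```python
import numpy as np

def trace_and_rank_bound(A):
    """Return (|tr(A)|, rank(A) * ||A||_2) for a square matrix A."""
    abs_trace = float(abs(np.trace(A)))
    rank = int(np.linalg.matrix_rank(A))
    spectral_norm = float(np.linalg.norm(A, 2))   # largest singular value
    return abs_trace, rank * spectral_norm

# A rank-2 matrix of size 8 x 8 built from two thin factors (hypothetical data).
rng = np.random.default_rng(3)
U = rng.standard_normal((8, 2))
V = rng.standard_normal((8, 2))
A = U @ V.T

abs_trace, bound = trace_and_rank_bound(A)
print(abs_trace, bound)
```

The bound is useful precisely when the rank stays fixed (for example at the number of factors) while the matrix dimension grows with $n$ and $T$.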
The following are the components of the score vector. The $o_P(1)$ terms in the first lines are due to Lemma 6.
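• $\beta_z$: for $k = 1,\cdots,k_z$,
\begin{align*}
C^{(1)}_{nT,k} &= \frac{1}{\sqrt{nT}}\mathrm{tr}\left(M_{F_{Tz}}X_{nTz,k}'\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)M_{\Gamma_{nz}}e_{nT}\right) - \frac{1}{\sqrt{nT}\sigma_{\xi0}^2}\mathrm{tr}\left(M_{F_{Ty}}X_{nTz,k}'(\delta_0\otimes I_n)M_{\Gamma_{ny}}\Xi_{nT}\right) + o_P(1) \\
&= \frac{1}{\sqrt{nT}}\left(a^{(k)\prime}_{1nT}e_{nT} + b^{(k)\prime}_{1nT}\xi_{nT}\right) + o_P(1),
\end{align*}
where $a^{(k)}_{1nT} = \mathrm{vec}\left(M_{\Gamma_{nz}}\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)X_{nTz,k}M_{F_{Tz}}\right)$ and $b^{(k)}_{1nT} = -\sigma_{\xi0}^{-1}\mathrm{vec}\left(M_{\Gamma_{ny}}(\delta_0'\otimes I_n)X_{nTz,k}M_{F_{Ty}}\right)$.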
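• $\beta_y$: for $k = 1,\cdots,k_y$,
\[
C^{(1)}_{nT,k_z+k} = \frac{1}{\sqrt{nT}\sigma_{\xi0}^2}\mathrm{tr}\left(M_{F_{Ty}}X_{nTy,k}'M_{\Gamma_{ny}}\Xi_{nT}\right) + o_P(1) = \frac{1}{\sqrt{nT}}b^{(k)\prime}_{2nT}\xi_{nT} + o_P(1),
\]
where $b^{(k)}_{2nT} = \sigma_{\xi0}^{-1}\mathrm{vec}\left(M_{\Gamma_{ny}}X_{nTy,k}M_{F_{Ty}}\right)$.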
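• $\lambda$:
\begin{align}
C^{(1)}_{nT,k_z+k_y+1} ={}& -\frac{1}{\sqrt{nT}}\mathrm{tr}\left(G_{nT}\right) + \frac{1}{\sqrt{nT}\sigma_{\xi0}}\xi_{nT}'\left(M_{F_{Ty}}\otimes M_{\Gamma_{ny}}\right)G_{nT}\left(X_{nTy}\beta_{y0} + (I_T\otimes\Gamma_{ny})F_{Ty} + \tilde e_{nT}\Sigma_{\varepsilon0}^{\frac12}\delta_0\right) \notag\\
& + \frac{1}{\sqrt{nT}}\xi_{nT}'G_{nT}\xi_{nT} - \frac{1}{\sqrt{nT}}\xi_{nT}'\left(P_{F_{Ty}}\otimes I_n\right)G_{nT}\xi_{nT} - \frac{1}{\sqrt{nT}}\xi_{nT}'\left(I_T\otimes P_{\Gamma_{ny}}\right)G_{nT}\xi_{nT} + o_P(1) \notag\\
={}& \frac{1}{\sqrt{nT}}\left(b_{3nT}'\xi_{nT} + \xi_{nT}'B_{1nT}\xi_{nT} - \mathrm{tr}(B_{1nT})\right) - \frac{1}{\sqrt{nT}}\mathrm{tr}\left(\left(P_{F_{Ty}}\otimes I_n + I_T\otimes P_{\Gamma_{ny}}\right)G_{nT}\right) + o_P(1), \tag{A.29}
\end{align}
where $G_{nT} = \mathrm{diag}(G_{n1},\cdots,G_{nT})$ is an $nT\times nT$ matrix, $X_{nTy} = \left(X_{n1y}',\cdots,X_{nTy}'\right)'$ is $nT\times k_y$, $F_{Ty} = \left(f_{y1}',\cdots,f_{yT}'\right)'$ is $TR_{y0}\times 1$,
\[
\tilde e_{nT} = \begin{pmatrix}
e_{111} & e_{112} & \cdots & e_{11p} \\
\vdots & \vdots & & \vdots \\
e_{n11} & e_{n12} & \cdots & e_{n1p} \\
\vdots & \vdots & & \vdots \\
e_{nT1} & e_{nT2} & \cdots & e_{nTp}
\end{pmatrix}
\]
is $nT\times p$,
\[
b_{3nT} = \sigma_{\xi0}^{-1}\left(M_{F_{Ty}}\otimes M_{\Gamma_{ny}}\right)G_{nT}\left(X_{nTy}\beta_{y0} + (I_T\otimes\Gamma_{ny})F_{Ty} + \tilde e_{nT}\Sigma_{\varepsilon0}^{\frac12}\delta_0\right),
\]
and
\[
B_{1nT} = \frac12\left(I_{nT} - P_{F_{Ty}}\otimes I_n - I_T\otimes P_{\Gamma_{ny}}\right)G_{nT} + \frac12 G_{nT}'\left(I_{nT} - P_{F_{Ty}}\otimes I_n - I_T\otimes P_{\Gamma_{ny}}\right).
\]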
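• $\alpha_\xi$:
\begin{align*}
C^{(1)}_{nT,k_z+k_y+2} &= -\frac{\sigma_{\xi0}}{\sqrt{nT}}\xi_{nT}'\left(M_{F_{Ty}}\otimes M_{\Gamma_{ny}}\right)\xi_{nT} + \sqrt{nT}\sigma_{\xi0} + o_P(1) \\
&= \frac{1}{\sqrt{nT}}\left(\xi_{nT}'B_{2nT}\xi_{nT} - \mathrm{tr}(B_{2nT})\right) + \left(\sqrt{\frac nT} + \sqrt{\frac Tn}\right)R_{y0}\sigma_{\xi0} + o_P(1),
\end{align*}
where $B_{2nT} = -\sigma_{\xi0}M_{F_{Ty}}\otimes M_{\Gamma_{ny}}$.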
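• $\alpha$: for $k = 1,\cdots,J$, by using the relation $\mathrm{tr}(B_1'B_2B_3B_4) = \mathrm{vec}(B_1)'(B_4'\otimes B_2)\mathrm{vec}(B_3)$ for any conformable matrices $B_1,\cdots,B_4$,
\begin{align*}
C^{(1)}_{nT,k_z+k_y+2+k} &= \sqrt{nT}\,\mathrm{tr}\left(\Sigma_{\varepsilon0}^{\frac12}\Omega_k\right) - \frac{1}{\sqrt{nT}}\mathrm{tr}\left(e_{nT}'\left(\Sigma_{\varepsilon0}^{\frac12}\Omega_k\otimes I_n\right)M_{\Gamma_{nz}}e_{nT}M_{F_{Tz}}\right) + o_P(1) \\
&= \frac{1}{\sqrt{nT}}\left(e_{nT}'A^{(k)}_{nT}e_{nT} - \mathrm{tr}\left(A^{(k)}_{nT}\right)\right) + \sqrt{\frac Tn}\,\mathrm{tr}\left[\left(\Sigma_{\varepsilon0}^{\frac12}\Omega_k\otimes I_n\right)P_{\Gamma_{nz}}\right] + R_{z0}\sqrt{\frac nT}\,\mathrm{tr}\left[\Sigma_{\varepsilon0}^{\frac12}\Omega_k\right] + o_P(1),
\end{align*}
where $A^{(k)}_{nT} = \frac12\left(\tilde A^{(k)}_{nT} + \tilde A^{(k)\prime}_{nT}\right)$ with $\tilde A^{(k)}_{nT} = -M_{F_{Tz}}\otimes\left(\left(\Sigma_{\varepsilon0}^{\frac12}\Omega_k\otimes I_n\right)M_{\Gamma_{nz}}\right)$. Note that $\Omega_k$ comes from the linear parameterization of the square root of the variance, $\Sigma_\varepsilon^{-\frac12} = \sum_{j=1}^J \Omega_j\alpha_j$.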
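• $\delta$: for $k = 1,\cdots,p$, by using the relation $\mathrm{tr}(B_1'B_2B_3B_4) = \mathrm{vec}(B_1)'(B_4'\otimes B_2)\mathrm{vec}(B_3)$,
\begin{align*}
C^{(1)}_{nT,k_z+k_y+2+J+k} &= \frac{1}{\sqrt{nT}\sigma_{\xi0}^2}\mathrm{tr}\left[M_{\Gamma_{ny}}\Xi_{nT}M_{F_{Ty}}\left(\Gamma_{nz}F_{Tz}' + E_{nT}\right)'(e_k\otimes I_n)\right] + o_P(1) \\
&= \frac{1}{\sqrt{nT}}\left(b^{(k)\prime}_{4nT}\xi_{nT} + e_{nT}'D^{(k)}_{nT}\xi_{nT}\right) + o_P(1),
\end{align*}
where $e_k$ is a $p\times 1$ vector with one in its $k$-th entry and zeros in its other entries, $b^{(k)}_{4nT} = \sigma_{\xi0}^{-1}\mathrm{vec}\left(M_{\Gamma_{ny}}(e_k'\otimes I_n)\Gamma_{nz}F_{Tz}'M_{F_{Ty}}\right)$ and $D^{(k)}_{nT} = \sigma_{\xi0}^{-1}M_{F_{Ty}}\otimes\left(\left(\Sigma_{\varepsilon0}^{\frac12}e_k\otimes I_n\right)M_{\Gamma_{ny}}\right)$.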
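The relation $\mathrm{tr}(B_1'B_2B_3B_4) = \mathrm{vec}(B_1)'(B_4'\otimes B_2)\mathrm{vec}(B_3)$ used for the $\alpha$ and $\delta$ components is what turns traces of matrix products into linear and quadratic forms in the stacked disturbance vectors. The following is a small numerical sketch of the relation with column-stacking $\mathrm{vec}$; the matrix dimensions are arbitrary illustrative choices.

```python
import numpy as np

def vec(M):
    """Column-stacking vectorisation (Fortran order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(4)
p, q, r, s = 3, 4, 5, 2
B1 = rng.standard_normal((p, q))   # B1' is q x p
B2 = rng.standard_normal((p, r))
B3 = rng.standard_normal((r, s))
B4 = rng.standard_normal((s, q))   # so B1' B2 B3 B4 is q x q

lhs = float(np.trace(B1.T @ B2 @ B3 @ B4))
rhs = float(vec(B1) @ np.kron(B4.T, B2) @ vec(B3))
print(lhs, rhs)
```

Note that the identity relies on the column-major convention for $\mathrm{vec}$; with NumPy's default row-major flattening the Kronecker factors would have to be swapped.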
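(2) Variance of $C^{(1)}_{nT} - \varphi_{nT}$
$\varphi_{nT}$ is defined in Proposition 2 and $\mathrm{E}\left(C^{(1)}_{nT} - \varphi_{nT}\right) = 0$. The variance of $C^{(1)}_{nT} - \varphi_{nT}$ can be straightforwardly shown to be $\Sigma_{nT} + \Pi_{nT}$ with terms given below:
\[
\Sigma_{nT} = \begin{pmatrix}
\Sigma^{\beta_z\beta_z}_{nT} & \Sigma^{\beta_z\beta_y}_{nT} & \Sigma^{\beta_z\lambda}_{nT} & 0 & 0 & \Sigma^{\beta_z\delta}_{nT} \\
* & \Sigma^{\beta_y\beta_y}_{nT} & \Sigma^{\beta_y\lambda}_{nT} & 0 & 0 & \Sigma^{\beta_y\delta}_{nT} \\
* & * & \Sigma^{\lambda\lambda}_{nT} & \Sigma^{\lambda\alpha_\xi}_{nT} & 0 & \Sigma^{\lambda\delta}_{nT} \\
* & * & * & \Sigma^{\alpha_\xi\alpha_\xi}_{nT} & 0 & 0 \\
* & * & * & * & \Sigma^{\alpha\alpha}_{nT} & 0 \\
* & * & * & * & * & \Sigma^{\delta\delta}_{nT}
\end{pmatrix},
\quad
\Pi_{nT} = \begin{pmatrix}
0 & 0 & \Pi^{\beta_z\lambda}_{nT} & \Pi^{\beta_z\alpha_\xi}_{nT} & \Pi^{\beta_z\alpha}_{nT} & 0 \\
* & 0 & \Pi^{\beta_y\lambda}_{nT} & \Pi^{\beta_y\alpha_\xi}_{nT} & 0 & 0 \\
* & * & \Pi^{\lambda\lambda}_{nT} & \Pi^{\lambda\alpha_\xi}_{nT} & 0 & \Pi^{\lambda\delta}_{nT} \\
* & * & * & \Pi^{\alpha_\xi\alpha_\xi}_{nT} & 0 & \Pi^{\alpha_\xi\delta}_{nT} \\
* & * & * & * & \Pi^{\alpha\alpha}_{nT} & 0 \\
* & * & * & * & * & 0
\end{pmatrix},
\tag{A.30}
\]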
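with
\begin{align*}
\left[\Sigma^{\beta_z\beta_z}_{nT}\right]_{k_1k_2} &= \frac{1}{nT}\mathrm{tr}\left(M_{F_{Tz}}X_{nTz,k_1}'\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)M_{\Gamma_{nz}}\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)X_{nTz,k_2}\right) + \frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{F_{Ty}}X_{nTz,k_1}'(\delta_0\otimes I_n)M_{\Gamma_{ny}}(\delta_0'\otimes I_n)X_{nTz,k_2}\right) \\
\left[\Sigma^{\beta_z\beta_y}_{nT}\right]_{k_1k_2} &= -\frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{F_{Ty}}X_{nTz,k_1}'(\delta_0\otimes I_n)M_{\Gamma_{ny}}X_{nTy,k_2}\right) \\
\left[\Sigma^{\beta_z\lambda}_{nT}\right]_{k} &= \frac{1}{nT}\mathrm{E}\left(b^{(k)\prime}_{1nT}b_{3nT}\right) \\
\left[\Sigma^{\beta_z\delta}_{nT}\right]_{k_1k_2} &= \frac{1}{nT}b^{(k_1)\prime}_{1nT}b^{(k_2)}_{4nT} = -\frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{F_{Ty}}X_{nTz,k_1}'(\delta_0\otimes I_n)M_{\Gamma_{ny}}(e_{k_2}'\otimes I_n)\Gamma_{nz}F_{Tz}'\right) \\
\left[\Sigma^{\beta_y\beta_y}_{nT}\right]_{k_1k_2} &= \frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{F_{Ty}}X_{nTy,k_1}'M_{\Gamma_{ny}}X_{nTy,k_2}\right) \\
\left[\Sigma^{\beta_y\lambda}_{nT}\right]_{k} &= \frac{1}{nT}\mathrm{E}\left(b^{(k)\prime}_{2nT}b_{3nT}\right) \\
\left[\Sigma^{\beta_y\delta}_{nT}\right]_{k_1k_2} &= \frac{1}{nT}b^{(k_1)\prime}_{2nT}b^{(k_2)}_{4nT} = \frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{F_{Ty}}X_{nTy,k_1}'M_{\Gamma_{ny}}(e_{k_2}'\otimes I_n)\Gamma_{nz}F_{Tz}'\right) \\
\Sigma^{\lambda\lambda}_{nT} &= \frac{1}{nT}\mathrm{E}\left(b_{3nT}'b_{3nT} + 2\mathrm{tr}\left(B_{1nT}^2\right)\right) = \frac{1}{nT}\mathrm{E}\left(b_{3nT}'b_{3nT} + \mathrm{tr}\left(G_{nT}^2 + G_{nT}'G_{nT}\right)\right) + o(1) \\
\Sigma^{\lambda\alpha_\xi}_{nT} &= \frac{2}{nT}\mathrm{E}\left(\mathrm{tr}(B_{1nT}B_{2nT})\right) = -\frac{2\sigma_{\xi0}}{nT}\mathrm{E}\,\mathrm{tr}\left(G_{nT}\right) \\
\left[\Sigma^{\lambda\delta}_{nT}\right]_{k} &= \frac{1}{nT}\mathrm{E}\left(b_{3nT}'\left(b^{(k)}_{4nT} + D^{(k)\prime}_{nT}e_{nT}\right)\right) \\
\Sigma^{\alpha_\xi\alpha_\xi}_{nT} &= \frac{2}{nT}\mathrm{tr}\left(B_{2nT}^2\right) = 2\sigma_{\xi0}^2 + o(1) \\
\left[\Sigma^{\alpha\alpha}_{nT}\right]_{k_1k_2} &= \frac{2}{nT}\mathrm{tr}\left(A^{(k_1)}_{nT}A^{(k_2)}_{nT}\right) = \mathrm{tr}\left(\Sigma_{\varepsilon0}\Omega_{k_1}\Omega_{k_2}\right) + \mathrm{tr}\left(\Sigma_{\varepsilon0}^{\frac12}\Omega_{k_1}\Sigma_{\varepsilon0}^{\frac12}\Omega_{k_2}\right) + o(1) \\
\left[\Sigma^{\delta\delta}_{nT}\right]_{k_1k_2} &= \frac{1}{nT}\left(b^{(k_1)\prime}_{4nT}b^{(k_2)}_{4nT} + \mathrm{tr}\left(D^{(k_1)\prime}_{nT}D^{(k_2)}_{nT}\right)\right) \\
&= \frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{F_{Ty}}F_{Tz}'\Gamma_{nz}(e_{k_1}\otimes I_n)M_{\Gamma_{ny}}(e_{k_2}'\otimes I_n)\Gamma_{nz}F_{Tz}M_{F_{Ty}}\right) + \sigma_{\xi0}^{-2}\mathrm{tr}\left(e_{k_1}'\Sigma_{\varepsilon0}e_{k_2}\right) + o(1),
\end{align*}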
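and
\begin{align*}
\left[\Pi^{\beta_z\lambda}_{nT}\right]_{k} &= \frac{1}{nT}\mathrm{E}\left(\sum_{i=1}^{nT} b^{(k)}_{1nT,i}B_{1nT,ii}\xi^3_{nT,i}\right) \\
\left[\Pi^{\beta_z\alpha_\xi}_{nT}\right]_{k} &= -\frac{\sigma_{\xi0}}{nT}\sum_{i=1}^{nT} b^{(k)}_{1nT,i}\left[M_{F_{Ty}}\otimes M_{\Gamma_{ny}}\right]_{ii}\mathrm{E}\xi^3_{nT,i} \\
\left[\Pi^{\beta_z\alpha}_{nT}\right]_{k_1k_2} &= \frac{1}{nT}\sum_{i=1}^{npT} a^{(k_1)}_{1nT,i}A^{(k_2)}_{nT,ii}\mathrm{E}e^3_{nT,i} \\
\left[\Pi^{\beta_y\lambda}_{nT}\right]_{k} &= \frac{1}{nT}\mathrm{E}\left(\sum_{i=1}^{nT} b^{(k)}_{2nT,i}B_{1nT,ii}\xi^3_{nT,i}\right) \\
\left[\Pi^{\beta_y\alpha_\xi}_{nT}\right]_{k} &= -\frac{\sigma_{\xi0}}{nT}\sum_{i=1}^{nT} b^{(k)}_{2nT,i}\left[M_{F_{Ty}}\otimes M_{\Gamma_{ny}}\right]_{ii}\mathrm{E}\xi^3_{nT,i} \\
\Pi^{\lambda\lambda}_{nT} &= \frac{1}{nT}\mathrm{E}\left(\sum_{i=1}^{nT}\left(B_{1nT,ii}\right)^2\left(\xi^4_{nT,i} - 3\right) + 2\sum_{i=1}^{nT} b_{3nT,i}B_{1nT,ii}\xi^3_{nT,i}\right) \\
\Pi^{\lambda\alpha_\xi}_{nT} &= \frac{1}{nT}\mathrm{E}\left(\sum_{i=1}^{nT} B_{1nT,ii}B_{2nT,ii}\left(\xi^4_{nT,i} - 3\right) + 2\sum_{i=1}^{nT} b_{3nT,i}B_{2nT,ii}\xi^3_{nT,i}\right) \\
\left[\Pi^{\lambda\delta}_{nT}\right]_{k} &= \frac{1}{nT}\mathrm{E}\left(\sum_{i=1}^{nT}\sum_{s=1}^{npT} B_{1nT,ii}\left(b^{(k)}_{4nT,i} + D^{(k)}_{nT,si}e_{nT,s}\right)\xi^3_{nT,i}\right) \\
\Pi^{\alpha_\xi\alpha_\xi}_{nT} &= \frac{1}{nT}\sum_{i=1}^{nT}\left(B_{2nT,ii}\right)^2\left(\mathrm{E}\xi^4_{nT,i} - 3\right) \\
\left[\Pi^{\alpha_\xi\delta}_{nT}\right]_{k} &= \frac{1}{nT}\mathrm{E}\left(\sum_{i=1}^{nT} B_{2nT,ii}b^{(k)}_{4nT,i}\xi^3_{nT,i}\right) \\
\left[\Pi^{\alpha\alpha}_{nT}\right]_{k_1k_2} &= \frac{1}{nT}\sum_{i=1}^{npT} A^{(k_1)}_{nT,ii}A^{(k_2)}_{nT,ii}\left(\mathrm{E}e^4_{nT,i} - 3\right).
\end{align*}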
(3) Elements of CnT (the Approximate Hessian Matrix)
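\[
C_{nT} = \begin{pmatrix}
C^{\beta_z\beta_z}_{nT} & C^{\beta_z\beta_y}_{nT} & C^{\beta_z\lambda}_{nT} & C^{\beta_z\alpha_\xi}_{nT} & C^{\beta_z\alpha}_{nT} & C^{\beta_z\delta}_{nT} \\
* & C^{\beta_y\beta_y}_{nT} & C^{\beta_y\lambda}_{nT} & C^{\beta_y\alpha_\xi}_{nT} & 0 & C^{\beta_y\delta}_{nT} \\
* & * & C^{\lambda\lambda}_{nT} & C^{\lambda\alpha_\xi}_{nT} & 0 & C^{\lambda\delta}_{nT} \\
* & * & * & C^{\alpha_\xi\alpha_\xi}_{nT} & 0 & C^{\alpha_\xi\delta}_{nT} \\
* & * & * & * & C^{\alpha\alpha}_{nT} & 0 \\
* & * & * & * & * & C^{\delta\delta}_{nT}
\end{pmatrix}.
\]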
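\begin{align*}
\left[C^{\beta_z\beta_z}_{nT}\right]_{k_1k_2} ={}& \frac{1}{nT}\mathrm{tr}\left(M_{F_{Tz}}X_{nTz,k_1}'\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)M_{\Gamma_{nz}}\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)X_{nTz,k_2}\right) + \frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{F_{Ty}}X_{nTz,k_1}'(\delta_0\otimes I_n)M_{\Gamma_{ny}}(\delta_0'\otimes I_n)X_{nTz,k_2}\right) \\
\left[C^{\beta_z\beta_y}_{nT}\right]_{k_1k_2} ={}& -\frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{F_{Ty}}X_{nTz,k_1}'(\delta_0\otimes I_n)M_{\Gamma_{ny}}X_{nTy,k_2}\right) \\
\left[C^{\beta_z\lambda}_{nT}\right]_{k} ={}& \frac{1}{nT}b^{(k)\prime}_{1nT}b_{3nT} - \frac{1}{nT\sigma_{\xi0}}\mathrm{vec}\left(M_{\Gamma_{ny}}(\delta_0'\otimes I_n)X_{nTz,k}M_{F_{Ty}}\right)'\left(M_{F_{Ty}}\otimes M_{\Gamma_{ny}}\right)G_{nT}\xi_{nT} \\
={}& \frac{1}{nT}\mathrm{E}\left(b^{(k)\prime}_{1nT}b_{3nT}\right) + o_P(1) \\
\left[C^{\beta_z\alpha_\xi}_{nT}\right]_{k} ={}& \frac{1}{nT\sigma_{\xi0}}\mathrm{tr}\left(M_{F_{Ty}}X_{nTz,k}'(\delta_0\otimes I_n)M_{\Gamma_{ny}}\Xi_{nT}\right) = o_P(1) \\
\left[C^{\beta_z\alpha}_{nT}\right]_{k_1k_2} ={}& -\frac{1}{nT}\mathrm{tr}\left(M_{\Gamma_{nz}}\left(\Sigma_{\varepsilon0}^{-\frac12}\otimes I_n\right)X_{nTz,k_1}M_{F_{Tz}}E_{nT}'(\Omega_{k_2}\otimes I_n)\right) = o_P(1) \\
\left[C^{\beta_z\delta}_{nT}\right]_{k_1k_2} ={}& -\frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{\Gamma_{ny}}(\delta_0'\otimes I_n)X_{nTz,k_1}M_{F_{Ty}}\left(\sum_{k=1}^{k_z}X_{nTz,k}\left(\beta_{z0,k} - \hat\beta_{z,k}\right) + \Gamma_{nz}F_{Tz}' + E_{nT}\right)'(e_{k_2}\otimes I_n)\right) \\
={}& -\frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{F_{Ty}}X_{nTz,k_1}'(\delta_0\otimes I_n)M_{\Gamma_{ny}}(e_{k_2}'\otimes I_n)\Gamma_{nz}F_{Tz}'\right) + o_P(1) \\
\left[C^{\beta_y\beta_y}_{nT}\right]_{k_1k_2} ={}& \frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{\Gamma_{ny}}X_{nTy,k_1}M_{F_{Ty}}X_{nTy,k_2}'\right) \\
\left[C^{\beta_y\lambda}_{nT}\right]_{k} ={}& \frac{1}{nT}b^{(k)\prime}_{2nT}b_{3nT} + \frac{1}{nT\sigma_{\xi0}}\mathrm{vec}\left(M_{\Gamma_{ny}}X_{nTy,k}M_{F_{Ty}}\right)'\left(M_{F_{Ty}}\otimes M_{\Gamma_{ny}}\right)G_{nT}\xi_{nT} = \frac{1}{nT}\mathrm{E}\left(b^{(k)\prime}_{2nT}b_{3nT}\right) + o_P(1) \\
\left[C^{\beta_y\alpha_\xi}_{nT}\right]_{k} ={}& -\frac{1}{nT\sigma_{\xi0}}\mathrm{tr}\left(M_{\Gamma_{ny}}X_{nTy,k}M_{F_{Ty}}\Xi_{nT}\right) = o_P(1) \\
\left[C^{\beta_y\delta}_{nT}\right]_{k_1k_2} ={}& \frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{\Gamma_{ny}}X_{nTy,k_1}M_{F_{Ty}}\left(\sum_{k=1}^{k_z}X_{nTz,k}\left(\beta_{z0,k} - \hat\beta_{z,k}\right) + \Gamma_{nz}F_{Tz}' + E_{nT}\right)'(e_{k_2}\otimes I_n)\right) \\
={}& \frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{F_{Ty}}X_{nTy,k_1}'M_{\Gamma_{ny}}(e_{k_2}'\otimes I_n)\Gamma_{nz}F_{Tz}'\right) + o_P(1) \\
C^{\lambda\lambda}_{nT} ={}& \frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{\Gamma_{ny}}X_{nTy,k_y+1}M_{F_{Ty}}X_{nTy,k_y+1}'\right) + \frac{1}{nT}\mathrm{tr}\left(G_{nT}^2\right) \\
={}& \frac{1}{nT}b_{3nT}'b_{3nT} + \frac{1}{nT}\xi_{nT}'G_{nT}'\left(M_{F_{Ty}}\otimes M_{\Gamma_{ny}}\right)G_{nT}\xi_{nT} + \frac{1}{nT}\mathrm{tr}\left(G_{nT}^2\right) + o_P(1) \\
={}& \frac{1}{nT}\mathrm{E}\left(b_{3nT}'b_{3nT}\right) + \frac{1}{nT}\left(\mathrm{tr}\left(G_{nT}^2\right) + \mathrm{tr}\left(G_{nT}'G_{nT}\right)\right) + o_p(1) \\
C^{\lambda\alpha_\xi}_{nT} ={}& -\frac{\sigma_{\xi0}}{nT}\xi_{nT}'G_{nT}\xi_{nT} - \frac{1}{nT\sigma_{\xi0}}\mathrm{tr}\left(M_{\Gamma_{ny}}X_{nTy,k_y+1}M_{F_{Ty}}\Xi_{nT}'\right) \\
={}& -\frac{\sigma_{\xi0}}{nT}\xi_{nT}'G_{nT}\xi_{nT} - \frac{\sigma_{\xi0}}{nT}\xi_{nT}'\left(M_{F_{Ty}}\otimes M_{\Gamma_{ny}}\right)G_{nT}\xi_{nT} + o_P(1) = -\frac{2\sigma_{\xi0}}{nT}\mathrm{E}\,\mathrm{tr}\left(G_{nT}\right) + o_P(1) \\
\left[C^{\lambda\delta}_{nT}\right]_{k} ={}& \frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{\Gamma_{ny}}X_{nTy,k_y+1}M_{F_{Ty}}\left(\sum_{k=1}^{k_z}X_{nTz,k}\left(\beta_{z0,k} - \hat\beta_{z,k}\right) + \Gamma_{nz}F_{Tz}' + E_{nT}\right)'(e_k\otimes I_n)\right) \\
={}& \frac{1}{nT}\mathrm{E}\left(b_{3nT}'\left(b^{(k)}_{4nT} + D^{(k)\prime}_{nT}e_{nT}\right)\right) + o_P(1) \\
C^{\alpha_\xi\alpha_\xi}_{nT} ={}& \frac{1}{nT}\mathrm{tr}\left(M_{\Gamma_{ny}}\xi_{nT}M_{F_{Ty}}\xi_{nT}'\right) + \sigma_{\xi0}^2 = \frac{\sigma_{\xi0}^2}{nT}\xi_{nT}'\left(M_{F_{Ty}}\otimes M_{\Gamma_{ny}}\right)\xi_{nT} + \sigma_{\xi0}^2 = 2\sigma_{\xi0}^2 + o_P(1) \\
\left[C^{\alpha_\xi\delta}_{nT}\right]_{k} ={}& -\frac{1}{nT\sigma_{\xi0}}\mathrm{tr}\left(M_{\Gamma_{ny}}\Xi_{nT}M_{F_{Ty}}\left(\sum_{k=1}^{k_z}X_{nTz,k}\left(\beta_{z0,k} - \hat\beta_{z,k}\right) + \Gamma_{nz}F_{Tz}' + E_{nT}\right)'(e_k\otimes I_n)\right) = o_P(1) \\
\left[C^{\alpha\alpha}_{nT}\right]_{k_1k_2} ={}& \frac{1}{nT}\mathrm{tr}\left(M_{\Gamma_{nz}}(\Omega_{k_1}\otimes I_n)E_{nT}M_{F_{Tz}}E_{nT}'(\Omega_{k_2}\otimes I_n)\right) + \mathrm{tr}\left(\Sigma_{\varepsilon0}^{\frac12}\Omega_{k_1}\Sigma_{\varepsilon0}^{\frac12}\Omega_{k_2}\right) \\
={}& \frac{1}{nT}e_{nT}'\left(M_{F_{Tz}}\otimes\left(\left(\Sigma_{\varepsilon0}^{\frac12}\Omega_{k_2}\otimes I_n\right)M_{\Gamma_{nz}}\left(\Omega_{k_1}\Sigma_{\varepsilon0}^{\frac12}\otimes I_n\right)\right)\right)e_{nT} + \mathrm{tr}\left(\Sigma_{\varepsilon0}^{\frac12}\Omega_{k_1}\Sigma_{\varepsilon0}^{\frac12}\Omega_{k_2}\right) + o_P(1) \\
={}& \mathrm{tr}\left(\Sigma_{\varepsilon0}\Omega_{k_1}\Omega_{k_2}\right) + \mathrm{tr}\left(\Sigma_{\varepsilon0}^{\frac12}\Omega_{k_1}\Sigma_{\varepsilon0}^{\frac12}\Omega_{k_2}\right) + o_P(1) \\
\left[C^{\delta\delta}_{nT}\right]_{k_1k_2} ={}& \frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\Bigg(M_{\Gamma_{ny}}(e_{k_1}'\otimes I_n)\left(\sum_{k=1}^{k_z}X_{nTz,k}\left(\beta_{z0,k} - \hat\beta_{z,k}\right) + \Gamma_{nz}F_{Tz}' + E_{nT}\right)M_{F_{Ty}} \\
& \qquad\times\left(\sum_{k=1}^{k_z}X_{nTz,k}\left(\beta_{z0,k} - \hat\beta_{z,k}\right) + \Gamma_{nz}F_{Tz}' + E_{nT}\right)'(e_{k_2}\otimes I_n)\Bigg) \\
={}& \frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{\Gamma_{ny}}(e_{k_1}'\otimes I_n)E_{nT}M_{F_{Ty}}E_{nT}'(e_{k_2}\otimes I_n)\right) + \frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{\Gamma_{ny}}(e_{k_1}'\otimes I_n)\Gamma_{nz}F_{Tz}'M_{F_{Ty}}F_{Tz}\Gamma_{nz}'(e_{k_2}\otimes I_n)\right) + o_P(1) \\
={}& \sigma_{\xi0}^{-2}\mathrm{tr}\left(e_{k_1}'\Sigma_{\varepsilon0}e_{k_2}\right) + \frac{1}{nT\sigma_{\xi0}^2}\mathrm{tr}\left(M_{F_{Ty}}F_{Tz}'\Gamma_{nz}(e_{k_1}\otimes I_n)M_{\Gamma_{ny}}(e_{k_2}'\otimes I_n)\Gamma_{nz}F_{Tz}\right) + o_p(1).
\end{align*}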
A5. Proof of Proposition 3
We show that Corollary 1 of Ahn and Horenstein (2013) holds by checking that their Assumptions A-D are satisfied. Assuming that $n$ and $T$ are proportional, and the preliminary estimator satisfies $\|\theta - \theta_0\|_2 = o_p\left(n^{-\frac12}\right)$, we have the following results.
1.∥∥∥∥(Σ
− 12
ε0 ⊗ In
)EnT + EnT (θ)
∥∥∥∥2=Op (
√n), because ‖EnT‖2 =Op (
√n) by Assumption 2 and
∥∥EnT(θ)∥∥
2≤
∑kz+Jk=1
∥∥ηzk
∥∥2
∥∥V zk
∥∥2 +∑
kz+Jk1,k2=1
∥∥∥ηzk1
∥∥∥2
∥∥∥ηzk2
∥∥∥2
∥∥∥V zk1k2
∥∥∥2= op (
√n).
2. µnp
(1n
((Σ− 1
2ε0 ⊗ In
)EnT + EnT (θ)
)((Σ− 1
2ε0 ⊗ In
)EnT + EnT (θ)
)′)≥ c+op(1) for some positive
constant c. This is because
µnp
(1n
((Σ− 1
2ε0 ⊗ In
)EnT + EnT (θ)
)((Σ− 1
2ε0 ⊗ In
)EnT + EnT (θ)
)′)≥µnp
(1n
(Σ−1ε0 ⊗ In
)EnT E ′nT
)+µnp
(1n
EnT(θ)
EnT(θ)′)
+µnp
(1n
(Σ− 1
2ε0 ⊗ In
)EnT EnT
(θ)′+
1n
EnT(θ)
E ′nT
(Σ− 1
2ε0 ⊗ In
))(A.31)
≥µnp
(1n
(Σ−1ε0 ⊗ In
)EnT E ′nT
)− 1
n
∥∥∥EnT(θ)
EnT(θ)′∥∥∥
2
− 1n
∥∥∥∥(Σ− 1
2ε0 ⊗ In
)EnT EnT
(θ)′+ EnT
(θ)
E ′nT
(Σ− 1
2ε0 ⊗ In
)∥∥∥∥2
=µnp
(1n
(Σ−1ε0 ⊗ In
)EnT E ′nT
)+oP(1)≥ c+oP(1),
where the inequality in Eq. (A.31) can be found in Theorem 8.12 of Zhang (2011), and the last
inequality is from Lemma A.1 of Ahn and Horenstein (2013) (due to Bai and Yin (1993)).
3. For any r×q matrix A = (a1, · · · ,aq) such that A′A = T Iq,
1n3
∣∣∣∣tr(A′FT zΓ′nz
((Σ− 1
2ε0 ⊗ In
)EnT + EnT
(θ))
A)∣∣∣∣
≤Rz0
n3
∥∥AA′∥∥
2
∥∥FT zΓ′nz
∥∥2
∥∥∥∥(Σ− 1
2ε0 ⊗ In
)EnT + EnT
(θ)∥∥∥∥
2= OP
(n−
12
).
Similarly, we have
1n3
∣∣∣∣tr(A′((
Σ− 1
2ε0 ⊗ In
)EnT + EnT
(θ))′
Γnz(Γ′nzΓnz
)−1Γ′nz
((Σ− 1
2ε0 ⊗ In
)EnT + EnT
(θ))
A)∣∣∣∣
≤Rz0
n3
∥∥AA′∥∥
2
∥∥∥∥(Σ− 1
2ε0 ⊗ In
)EnT + EnT
(θ)∥∥∥∥2
2
∥∥∥Γnz(Γ′nzΓnz
)−1Γ′nz
∥∥∥2= OP
(n−1) .
In Ahn and Horenstein (2013), Assumptions C and D can be replaced by the conditions in their Eqs. (2)
and (3), which are satisfied by 1 and 2 above. Their Assumption B is used to prove Lemma A.10, which is
satisfied by 3, and their Assumption A is satisfied by our Assumption 8. It is straightforward to check that
GnT also satisfies the relevant conditions and similar methods can be applied to determine the number of
factors in the y equation. In summary, Ahn and Horenstein (2013)’s result applies here.
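For orientation, the eigenvalue ratio idea of Ahn and Horenstein (2013) picks the number of factors as the index at which the ratio of adjacent ordered eigenvalues is largest. The following is a minimal sketch on a simulated pure factor structure. It is an illustration only: in this paper the method is applied to residual matrices evaluated at a preliminary estimator, whereas here the data matrix, sample sizes, `kmax`, and the function name are all hypothetical choices.

```python
import numpy as np

def eigenvalue_ratio_estimate(X, kmax):
    """Eigenvalue-ratio estimate of the number of factors in an n x T matrix X.

    Uses the ordered eigenvalues mu_1 >= mu_2 >= ... of X X' / (nT) and
    returns the k in {1, ..., kmax} maximising mu_k / mu_{k+1}.
    """
    n, T = X.shape
    singular_values = np.linalg.svd(X, compute_uv=False)   # sorted descending
    mu = singular_values ** 2 / (n * T)
    ratios = mu[:kmax] / mu[1:kmax + 1]
    return int(np.argmax(ratios)) + 1, ratios

# Simulated panel with two strong factors plus idiosyncratic noise.
rng = np.random.default_rng(2)
n, T, r = 80, 60, 2
Gamma = rng.standard_normal((n, r))     # loadings
F = rng.standard_normal((T, r))         # factors
X = Gamma @ F.T + rng.standard_normal((n, T))

k_hat, ratios = eigenvalue_ratio_estimate(X, kmax=8)
print(k_hat)
```

With strong factors the ratio at the true number of factors is large because the next eigenvalue is of the order of the noise, while ratios among the remaining eigenvalues stay close to one.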
A6. Additional Monte Carlo Results
Estimation with Redundant Factors
Tables 9 and 10 provide the Monte Carlo results where the number of factors in the outcome equation is
over-specified by 2 and 3.
Table 9: Performance of the Bias-Corrected QML Estimator $\hat\theta^c_{nT}$ when $R_y = R_{y0} + 2$.

       n    T    λ0    ρ0            βz         βy         λ          αξ         α          δ
(1)    25   25   0.2   0.2   Bias    0.00252   -0.00255   -0.00276    0.07660    0.00197    0.00071
                             CP      0.952      0.879      0.858      0.228      0.945      0.903
       49   25   0.2   0.2   Bias    0.00013    0.00165   -0.00099    0.05266    0.00086    0.00175
                             CP      0.945      0.909      0.902      0.226      0.943      0.924
       25   49   0.2   0.2   Bias   -0.00115   -0.00006   -0.00106    0.05187    0.00135    0.00151
                             CP      0.943      0.924      0.891      0.242      0.940      0.917
       49   49   0.2   0.2   Bias    0.00028    0.00171   -0.00078    0.03550    0.00063    0.00040
                             CP      0.948      0.932      0.916      0.242      0.945      0.931
(2)    25   25   0.2   0.6   Bias    0.00108   -0.00353   -0.00086    0.09293    0.00189   -0.00009
                             CP      0.949      0.898      0.867      0.224      0.950      0.894
       49   25   0.2   0.6   Bias    0.00020    0.00097   -0.00077    0.06438    0.00047    0.00032
                             CP      0.948      0.923      0.909      0.224      0.949      0.919
       25   49   0.2   0.6   Bias   -0.00119   -0.00040   -0.00028    0.06389    0.00073    0.00068
                             CP      0.947      0.931      0.896      0.227      0.938      0.916
       49   49   0.2   0.6   Bias    0.00047    0.00144   -0.00110    0.04330    0.00051    0.00092
                             CP      0.942      0.924      0.912      0.249      0.950      0.922
(3)    25   25   0.6   0.2   Bias    0.00252   -0.00148   -0.00313    0.07603    0.00197    0.00166
                             CP      0.952      0.878      0.865      0.244      0.945      0.895
       49   25   0.6   0.2   Bias    0.00013    0.00203   -0.00128    0.05237    0.00086    0.00186
                             CP      0.945      0.906      0.903      0.245      0.943      0.922
       25   49   0.6   0.2   Bias   -0.00115    0.00051   -0.00157    0.05158    0.00135    0.00228
                             CP      0.943      0.927      0.901      0.277      0.940      0.918
       49   49   0.6   0.2   Bias    0.00028    0.00200   -0.00091    0.03533    0.00063    0.00046
                             CP      0.948      0.931      0.918      0.272      0.945      0.930
(4)    25   25   0.6   0.6   Bias    0.00108   -0.00268   -0.00195    0.09242    0.00189    0.00134
                             CP      0.948      0.901      0.866      0.244      0.950      0.899
       49   25   0.6   0.6   Bias    0.00020    0.00127   -0.00096    0.06413    0.00047    0.00055
                             CP      0.948      0.927      0.915      0.239      0.949      0.919
       25   49   0.6   0.6   Bias   -0.00119   -0.00001   -0.00078    0.06368    0.00073    0.00171
                             CP      0.948      0.936      0.901      0.252      0.938      0.918
       49   49   0.6   0.6   Bias    0.00047    0.00170   -0.00100    0.04309    0.00051    0.00106
                             CP      0.942      0.927      0.914      0.266      0.950      0.919

The true number of factors is 1 in the z equation and 2 in the y equation. In the estimation, the number of factors is set at 1 for the z equation and 4 for the y equation. True parameter values: $\beta_{z0} = 1$, $\beta_{y0} = 1$, $\sigma_{v0} = \sqrt{4/3}$, $\sigma_{e0} = \sqrt{4/3}$ (hence $\alpha_0 = (4/3)^{-\frac12}$). In the low endogeneity case, $\rho_0 = 0.2$, $\alpha_{\xi0} = 0.88$ and $\delta_0 = 0.2$. In the high endogeneity case, $\rho_0 = 0.6$, $\alpha_{\xi0} = 1.08$ and $\delta_0 = 0.6$. $\hat\theta^c_{nT}$ is the bias-corrected QML estimator. The coverage probabilities (CP) are calculated using the theoretical standard deviations obtained from the diagonal elements of $\frac{1}{nT}\Sigma_{nT}^{-1}$.
Table 10: Performance of the Bias-Corrected QML Estimator $\hat\theta^c_{nT}$ when $R_y = R_{y0} + 3$.

       n    T    λ0    ρ0            βz         βy         λ          αξ         α          δ
(1)    25   25   0.2   0.2   Bias    0.00252   -0.00269   -0.00357    0.11187    0.00197    0.00152
                             CP      0.952      0.860      0.811      0.040      0.945      0.866
       49   25   0.2   0.2   Bias    0.00013    0.00168   -0.00179    0.07701    0.00086    0.00143
                             CP      0.945      0.905      0.866      0.031      0.943      0.900
       25   49   0.2   0.2   Bias   -0.00115   -0.00049   -0.00181    0.07649    0.00135    0.00150
                             CP      0.944      0.920      0.837      0.033      0.940      0.878
       49   49   0.2   0.2   Bias    0.00028    0.00196   -0.00107    0.05199    0.00063    0.00048
                             CP      0.948      0.922      0.895      0.036      0.945      0.911
(2)    25   25   0.2   0.6   Bias    0.00108   -0.00473   -0.00100    0.13613    0.00189   -0.00024
                             CP      0.947      0.884      0.845      0.033      0.950      0.874
       49   25   0.2   0.6   Bias    0.00020    0.00119   -0.00093    0.09410    0.00047    0.00005
                             CP      0.948      0.922      0.880      0.032      0.949      0.890
       25   49   0.2   0.6   Bias   -0.00119   -0.00035   -0.00034    0.09390    0.00073    0.00050
                             CP      0.947      0.924      0.869      0.037      0.938      0.892
       49   49   0.2   0.6   Bias    0.00047    0.00151   -0.00121    0.06342    0.00051    0.00058
                             CP      0.942      0.925      0.902      0.036      0.950      0.917
(3)    25   25   0.6   0.2   Bias    0.00252   -0.00120   -0.00463    0.11097    0.00197    0.00268
                             CP      0.952      0.861      0.808      0.049      0.945      0.871
       49   25   0.6   0.2   Bias    0.00013    0.00229   -0.00204    0.07658    0.00086    0.00156
                             CP      0.945      0.901      0.869      0.042      0.943      0.903
       25   49   0.6   0.2   Bias   -0.00115    0.00049   -0.00286    0.07594    0.00135    0.00238
                             CP      0.943      0.924      0.847      0.049      0.940      0.887
       49   49   0.6   0.2   Bias    0.00028    0.00227   -0.00113    0.05178    0.00063    0.00056
                             CP      0.948      0.918      0.893      0.044      0.945      0.912
(4)    25   25   0.6   0.6   Bias    0.00108   -0.00365   -0.00247    0.13543    0.00189    0.00134
                             CP      0.947      0.879      0.849      0.044      0.950      0.869
       49   25   0.6   0.6   Bias    0.00020    0.00151   -0.00102    0.09383    0.00047    0.00023
                             CP      0.948      0.920      0.885      0.040      0.949      0.892
       25   49   0.6   0.6   Bias   -0.00119    0.00020   -0.00117    0.09356    0.00073    0.00173
                             CP      0.948      0.927      0.873      0.039      0.938      0.892
       49   49   0.6   0.6   Bias    0.00047    0.00178   -0.00110    0.06319    0.00051    0.00073
                             CP      0.942      0.923      0.894      0.040      0.950      0.918

The true number of factors is 1 in the z equation and 2 in the y equation. In the estimation, the number of factors is set at 1 for the z equation and 5 for the y equation. True parameter values: $\beta_{z0} = 1$, $\beta_{y0} = 1$, $\sigma_{v0} = \sqrt{4/3}$, $\sigma_{e0} = \sqrt{4/3}$ (hence $\alpha_0 = (4/3)^{-\frac12}$). In the low endogeneity case, $\rho_0 = 0.2$, $\alpha_{\xi0} = 0.88$ and $\delta_0 = 0.2$. In the high endogeneity case, $\rho_0 = 0.6$, $\alpha_{\xi0} = 1.08$ and $\delta_0 = 0.6$. $\hat\theta^c_{nT}$ is the bias-corrected QML estimator. The coverage probabilities (CP) are calculated using the theoretical standard deviations obtained from the diagonal elements of $\frac{1}{nT}\Sigma_{nT}^{-1}$.