and common factorsi - wei shi · many spatial panel data sets exhibit cross sectional and/or...

A Spatial Panel Data Model with Time Varying Endogenous Weights Matricesand Common FactorsI

Wei Shia,∗, Lung-fei Leeb

aInstitute for Economic and Social Research, Jinan University, Guangzhou, Guangdong 510632, China.bDepartment of Economics, The Ohio State University, Columbus, Ohio 43210, United States.

Abstract

Many spatial panel data sets exhibit cross sectional and/or intertemporal dependence from spatial interac-

tions or common factors. In an application of a spatial autoregressive model, a spatial weights matrix may

be constructed from variables that may correlate with unobservables in the main equation and therefore is

endogenous. Some common factors may be unobserved and correlate with included regressors in the equa-

tion. This paper presents a unified approach to model spatial panels with endogenous time varying spatial

weights matrices and unobserved common factors. We show that the proposed QML estimator is consistent

and asymptotically normal. As its limiting distribution may have a leading order bias, an analytical bias

correction is proposed. Monte Carlo simulations demonstrate good finite sample properties of the estima-

tors. This model is empirically applied to examine the effects of house price dynamics on reverse mortgage

origination rates in the United States.

Keywords: Spatial panel data, endogenous spatial weighting matrix, multiplicative individual and time

effects, QMLE, reverse mortgages

JEL classification: C13, C23, C51, G21

IThis version: 9/7/2016. We would like to thank Stephanie Moulton for sharing the HECM loan data, and participants atseminars in the Ohio State University, the 15th International Workshop on Spatial Econometrics and Statistics, the 2016 NorthAmerican Summer Meeting of the Econometric Society, and the 2016 China Meeting of the Econometric Society for comments.The usual disclaimer applies.∗Corresponding author.Email addresses: [email protected] (Wei Shi ), [email protected] (Lung-fei Lee)

1. Introduction

The increasing availability of large panel data brings many opportunities that are not available with only

cross sectional or time series data (see Hsiao (2014) for a good review). Individual heterogeneity can be

better accounted for in a panel. In a two-way fixed effects model, observed regressors can correlate with an

individual and time invariant component (the individual effect), and a common and time variant component

(the time effect) of the error term. Recently the additive convention on individual and time effects is extended

to allow for additional interactive effects, where unobserved time factors can have heterogeneous impacts on

individuals (Pesaran (2006), Bai (2009), Ahn et al. (2013) and Moon and Weidner (2015)). In those models,

the unobservables exhibit a factor structure and the covariates may correlate with these factors. The factors

can capture omitted economy-wide shocks, etc, and the two-way fixed effects model is a special case of a

general interactive fixed effects model.

Cross sectional dependence can be generated with unobserved time factors as they influence all cross

sectional units. In addition, cross sectional dependence can also arise due to spatial effects. In terms of

econometric modeling, spatial effects can be modeled with a spatially lagged dependent variable (Cliff and

Ord (1973)). Spatial weights matrix provides a structure on spatial dependence and outcomes of individuals

close by may be interdependent. Estimation methods of the spatial autoregressive (SAR) model have been

well established for an exogenously given spatial weights matrix (see, inter alia, Kelejian and Prucha (2001),

Kelejian and Prucha (2010), Lee (2004) and Yu et al. (2008)). In a SAR model, an explanatory variable may

have both an direct effect on an individual’s outcome, as well as an indirect effect through spillovers on its

neighbors. In addition, spatial correlation may be modeled with spatially dependent errors. For example,

Pesaran and Tosetti (2011) consider cross sectionally dependent errors that have factor and spatial structures,

which are interpreted, respectively, as strong and weak cross section dependence (also see Holly et al. (2010)

and Holly et al. (2011) for empirical applications to house prices).

A spatial weights matrix can be based on geographic distances, or in many applications, economic theory

may inform how individuals are connected with links constructed using economic variables (see the appli-

cations in Conley and Ligon (2002), Kelejian and Piras (2014)). However, when a spatial weights matrix

is constructed from economic variables, the exogeneity assumption may not be satisfied if these included

economic variables in the SAR equation correlate with disturbances of the equation. In a recent paper, Hor-

race et al. (2015) propose a network production model where a manager selects workers into the network

and whose decision may depend on factors that correlate with the network unobserved characteristics in the

production function. They show that the endogenous effect can be accounted for by using a Heckman type

parametric or semi-parametric approach, or by including network fixed effects. Qu and Lee (2015) consider

a cross section SAR model with an endogenously generated spatial weights matrix. Their approach caters to

cases where individuals reside in a space, which is a common feature in many regional science applications.

Using a control function approach, they provide consistent and asymptotically normal IV, QMLE and GMM

estimators. Kelejian and Piras (2014) propose a 2SLS estimator for a panel data model with an endogenous

spatial weights matrix and additive individual and time effects.

Panels with interactive effects also have wide empirical applications. For example, in regional policy

evaluations, interactive effects can better control for time varying cross section heterogeneities (Gobillon

1

and Magnac (2015)). It has been used in the study of the effect of political and economic integration of

Hong Kong with mainland China (Hsiao et al. (2012)), the effect of divorce law reforms on divorce rates

(Kim and Oka (2014)), among others. As noted in Blundell et al. (2004), researchers need to weigh the

benefits of using close neighbors as controls which are more likely to be otherwise the same as the treated,

against the potential contaminating spatial spillover effects. In the regression panel with interactive effects,

one might argue that spatial effects are absorbed into the factor loadings. However, this approach will likely

be inadequate if spatial effects are time varying or endogenous. Furthermore, spatial correlation may be due

to agents interacting with each other, but not just through spatial similarity in factor loading responses. In

addition, the direct and indirect spillover effect of a policy may also be of interest.

This paper extends endogeneity of spatial weights in a cross section setting in Qu and Lee (2015) to a

large n,T panel data model with time varying spatial weights matrices and unobserved interactive individual

and time effects. Bai and Li (2014) and Shi and Lee (2015) consider a panel SAR model with interactive

effects and an exogenously given time invariant spatial weights matrix. In this paper, we have time varying

endogenous spatial weights matrices constructed from variables that may correlate with the disturbances in

the main equation. The interactive fixed effects are controlled for in equations and are treated as parameters.

Even though spatial weights matrices are endogenous and time varying, interactive individual effects and

time factors can be concentrated out. The profile objective function (concentrated likelihood function)

includes a sum of certain eigenvalues of a random matrix. For a regression panel model, Moon and Weidner

(2014, 2015) apply the perturbation method in Kato (1995) to derive its approximate gradient vector and

Hessian matrix, so that asymptotic distributions of estimates of the regression coefficients can be derived.

We show that the perturbation method can also be applied beyond regression panels to our model and that the

QML estimator taking into account of endogenous spatial weights matrices is consistent and asymptotically

normal.

We demonstrate the estimator’s good finite sample performance by a Monte Carlo study. We apply the

model to analyze the effect of house price dynamics on reverse mortgage origination rates in the U.S. We

find that in the reverse mortgage market, loan activity in one state influences activities in neighboring states,

but that the spillover effect is smaller for states dominated by large lenders. This suggests that demand

side factors such as common media exposure may be the driving force behind spatial spillovers in reverse

mortgage activities.

This paper is organized as follows. We discuss our spatial panel model in the next section. We present the

QML estimator in Section 3. Under some regularity assumptions, we investigate its asymptotic properties in

Section 4, mainly its consistency and limiting distribution. As the limiting distribution may not be properly

centered, an analytic bias correction is proposed. A Monte Carlo study is reported in Section 5. The

empirical application on reverse mortgage origination rates is in Section 6. Section 7 concludes.

Notation 1. For a finite dimensional vector η , ‖η‖1 = ∑k |ηk| and ‖η‖2 =

√∑k |ηk|2. Let µi(M) denote

the i-th largest eigenvalue of a symmetric matrix M of dimension n with eigenvalues listed in a decreasingorder such that µn(M) ≤ µn−1(M) ≤ ·· · ≤ µ1(M). For a real matrix A, its spectral norm is ||A||2, i.e.,‖A‖2 =

√µ1 (A′A). In addition, ||A||1 = max1≤ j≤n ∑

ni=1 |Ai j| is its maximum column sum norm, ||A||∞ =

max1≤i≤n ∑nj=1 |Ai j| is its maximum row sum norm, and ‖A‖F =

√tr(AA′) is its Frobenius norm. Define

the projection matrices PA = A(A′A)−1A′ and MA = I−A(A′A)−1A′. In general, lower case letters represent

2

vectors and upper case letters represent matrices.

2. The model

2.1. Model specification

This paper analyzes panel data where individuals are located in space through time. The data set contains

n individuals indexed by i, and each of them has observations for T time periods indexed by t. Individuals

are located on a possibly unevenly spaced lattice D ⊂ Rd0 with d0 ≥ 1, which is time invariant and can

base on geographic variables. The location l : 1, · · · ,n → Dn ⊂ D is a mapping of individual i to its

location l(i) ∈ D ⊂ Rd0 . In addition, let DT ⊂ Z denote the set of time indexes. An individual i at time t

has geographic location l(i) and time location t. This is an adaptation of the topological structure of spatial

processes in Jenish and Prucha (2009, 2012) and its extension by Qu and Lee (2015) to the panel data setting

where the distance between it (individual i at time t) and js depends on both individual identities and time

indexes. Define the metric ρit, js = ρ(it, js) = maxmax1≤k≤d0 |l(i)k− l( j)k| , |t− s| where l(i)k is the kth

component of the d0×1 vector l(i). For asymptotic analysis, sample expansion is ensured by an unbounded

expansion of the sample region (increasing domain asymptotics) along both the cross section and the time

dimensions. Note that although individuals’ geographic locations are time invariant, the strength of a spatial

link may depend also on economic variables, which can vary over time. Those will be made precise later.

Assumption 1. The lattice D⊂Rd0 , d0≥ 1, is infinitely countable. All elements in D are located at distancesof at least ρ0 > 0 from each other. Without loss of generality, we assume that ρ0 = 1.

The equation of interest is

ynt = λWntynt +Xntyβy + Γny fyt + vnt , (1)

where ynt is n×1; Xnty is n×ky matrix of ky exogenous regressors. Wnt is a n×n spatial weights matrix with

elements denoted as wit, jt,nT . A spatial weights matrix measures degrees of connections between spatial units

and is specified as nonnegative and with zero diagonals. Let yit denote the i-th element of ynt . In Eq. (1),

λwit, jt,nT measures the impact of y jt on yit and λ is the spatial interactions coefficient. Economic theories

may suggest variables to be included and how the spatial weights matrix is constructed. For example, in

the study of cross country growth spillovers (Conley and Ligon (2002)), a distance measure between two

countries includes their geographic distance and transportation costs of goods and labor. In the analysis of

cigarette demand (Kelejian and Piras (2014)), weights are based on the relative prices of cigarettes between

two neighboring states. Essentially the spatial weights matrix provides information on how an individual

is relatively affected by its neighbors. Note that individuals’ geographic locations in Assumption 1 may

also limit degrees of spatial interactions among individuals (see Assumption 6). In the following, we drop

the subscript nT in wit, jt.nT and simplify it to wit, jt when no confusion arises, while keeping in mind that

elements like wit, jt may depend on n and T .

Remark 1. Eq. (1) can be interpreted as a best response function where individual i’s optimal action (e.g.contribution towards a public good) depends on its neighbors’ actions. The spatial interaction coefficientλ reflects the indirect effect of neighbors’ Xnty, which can be seen from the reduced form of Eq. (1),

3

ynt = (In−λWnt)−1 (Xntyβy + Γny fyt + vnt

)= ∑

∞s=0 λ sW s

nt(Xntyβy + Γny fyt + vnt

), assuming that In−λWnt is

invertible and its inverse can be represented by a Neumann series. W snt reflects the influence of s-th level

neighbors. Also note that the model (1) and subsequent analysis can be easily extended to the case withmultiple spatial weights matrices.

In an economic model, it is unlikely that included regressors capture all the information. When omit-

ted variables exist and correlate with included regressors, a standard estimation method generally becomes

invalid. In large panels with long time dimension (many time periods), to account for possible unobserved

effects, we postulate that the error term in a SAR panel equation can be decomposed into a common factor

component and an idiosyncratic component, where common factors can potentially correlate with included

regressors. In Eq. (1), the dependent variable is affected by Ry unobserved factors, where the time factors,

fyt , is Ry× 1 and its loading matrix, Γny, is n× Ry which allows time factors to have heterogeneous im-

pacts on individuals. We treat factors and loadings as fixed effects to allow for hidden complex structures.

vnt =(

v1t,nT · · · vnt,nT

)′is a n×1 vector of idiosyncratic shocks with zero mean and finite variance σ2

v .

The factor structure is a flexible yet parsimonious way to model common shocks or a variance-covariance

structure of a panel.1

2.2. Sources of endogeneity

When the degree of spatial interaction is jointly determined with the outcome, Wnt becomes endogenous.

Over time, the spatial weights matrix may also be time varying to reflect changing economic environments

and network links. In this section, we illustrate the types of endogeneity covered in this paper.

The fixed effects formulation allows correlation between Wnt and ynt due to the common factors. For

example, the spatial weights can be influenced by the same time factors fyt that impact ynt , i.e., wit, jt =

h(

γ ′w,i j fyt + εit, jt

)with conformable factor loading γw,i j and even εit, jt being independent from vnt , where

h(·) is some transformation. This can be covered once time factors are treated as fixed time effects in

estimation. More interestingly, the spatial weights matrix may be constructed by variables that may correlate

with disturbances of the outcomes. Suppose there are p variables, zit1, · · · ,zit p used to construct Wnt . The

equations for zitl , l = 1, · · · , p, are

zitl = x′itzlβzl + γ′izl fzt + εitl, (2)

where xitzl are kzl×1 regressors with corresponding coefficient vector βzl , and the unobservables have two

components, fzt consists of Rz×1 time factors with loading γizl and εitl is idiosyncratic error. Stacking Eq.

(2) across i and then over l for time period t, we have znt = (z1t1, · · · ,znt1,z1t2, · · · ,znt p)′ an np× 1 vector

1In the event that time factors are present but mistakenly ignored, the spatial lags might mistakenly capture the influence ofthe time factors as it were generated by spatial interactions in estimation. So it is important to control for time factors in order toproperly recover spatial interaction.

4

which has the structure,

znt = Xntzβz + Γnz fzt + εnt , where Xntz =

x′1tz1... 0n×kzp

x′ntz1...

. . ....

x′1tzp

0n×kz1

...

x′ntzp

(3)

is np× kz with kz = kz1 + · · ·+ kzp; βz is a column vector of dimension kz, βz = (β ′z1, · · · ,β ′zp)′; Γnz =

(γ1z1, · · · , γnz1, γ1z2, · · · , γnzp)′ is np×Rz; and εnt = (ε1t1, · · · ,εnt1,ε1t2, · · · ,εnt p)

′ is an np× 1 vector. Note

that Xnty and Xntz may have some common regressors. Let wit, jt = hit, jt(znt ,ρit, jt) for i, j = 1, · · · ,n, i 6= j,

where h(·)’s are uniformly bounded functions. wit, jt exhibits limited spatial dependence via distance ρit, jt

which will be made precise in Assumption 6.

The following conditional moment assumption specifies that the disturbances in znt may correlate with

vit of the ynt equation. Indirectly, as Wnt is a function of znt , the spatial weights may be endogenous. For an

individual i, collect the error term εit p in its zit p equation and define the p×1 vector εit = (εit1, · · · ,εit p)′.

Assumption 2. The error terms(vit , ε ′it

)′ are independently and identically distributed across i and over

t, and have a joint distribution(vit , ε ′it

)′ ∼ (0,Σvε), where Σvε =

(σ2

v σ ′vε

σvε Σε

)is positive definite, σ2

v is

a scalar variance, the covariance σvε =(σvε1 · · · σvεP

)′ is a p× 1 vector, and Σε is a p× p matrix.Furthermore, supn,T supi,t E |vit |4+δε and supn,T supi,t E‖εit‖4+δε exist for some δε > 0. Denote E(vit |εit) =

ε ′itδ and define ξit = vit−ε ′itδ . Assuming that E(ξ 2

it |εit)=E

(ξ 2

it)=σ2

ξ, E(ξ 3

it |εit)=E

(ξ 3

it

)and E

(ξ 4

it |εit)=

E(ξ 4

it).

Assumption 2 is used in Qu and Lee (2015) to model the endogeneity of the spatial weights matrix.

If σvε 6= 0, the spatial weights matrix correlates with disturbances in the outcome equation of the same

period and estimation methods that rely on exogenous spatial weights matrices might no longer be valid.

As E(ξit |εit) = 0 and var(ξit |εit) = σ2ξ

, the disturbances in the model can therefore be rewritten in terms

of ςit =(

ξit ε ′it

)′. Note that the conditional moment assumptions do not require ξit to be independent

from εit as they are sufficient for our analysis. Denote EnT = (εn1, · · · ,εnT ) and ΞnT =(~ξn1, · · · ,~ξnT

)with

~ξ nt = (ξ1t , · · · ,ξnt)′, respectively np×T and n×T matrices of the disturbances.

Claim 1. Under Assumption 2, ‖ΞnT‖2 = OP

(√max(n,T )

)and ‖EnT‖2 = OP

(√max(n,T )

).

The claim is proved in Lemma S.2.1 of Moon and Weidner (2014) using the result of Latala (2005).

3. Estimation method

The parameters of interest of the model consisting of Eqs. (1) and (3) are θ =(

β ′z ,β′y,λ ,σ

2ξ,α ′,δ ′

)′where βz and βy are respectively the regression coefficients in Eqs. (3) and (1), λ is the spatial interaction

5

coefficient, and σ2ξ

, α and δ are from the variance structure of the disturbances with α being a J×1 vector of

distinct elements in Σε . The unobserved time factors are collected in matrices FTy =(

fy1, · · · , fyT)′ and FT z =

( fz1, · · · , fzT )′, which have dimensions T × Ry and T ×Rz respectively. In view of the unobserved nature of

the common factor components (Γnz, Γny, FT z, FTy), we shall make minimal assumptions on their structures

and allow the znt and ynt equations to be impacted by possibly different factors. As the common factors and

loadings are allowed to correlate with the regressors, they are treated as parameters to be estimated. The

sample average log quasi-likelihood function is

1nT

logLnT (θ , Γnz, Γny,FT z, ΓTy)

=− p+12

log(2π)− 12

log |Σε |−12

logσ2ξ+

1nT

T

∑t=1

log |Snt(λ )|

− 12nT

T

∑t=1

(znt −Xntzβz− Γnz fzt

)′ (Σ−1ε ⊗ In

)(znt −Xntzβz− Γnz fzt

)− 1

2σ2ξ

nT

T

∑t=1

(Snt(λ )ynt −Xntyβy−

(δ′⊗ In


)− Γny fyt

)′×(Snt(λ )ynt −Xntyβy−

(δ′⊗ In


)− Γny fyt

),

where Snt(λ ) = In − λWnt . Let Γnz =

(Σ− 1

2ε ⊗ In

)Γnz. The common component in the ynt equation is

σ−1ξ

Γny fyt−σ−1ξ

(δ ′⊗ In) Γnz fzt = Γny fyt with the factor loading Γny appropriately constructed from σ−1ξ

Γny

and σ−1ξ

(δ ′⊗ In) Γnz, and fyt collects the factors in fyt and fzt . For example, if fyt = fzt ,

Γny = σ−1ξ

[Γny− (δ ′⊗ In) Γnz

]and fyt = fyt . The dimension of fyt is Ry×1, and write FTy = ( fy1, · · · , fyT )

′.

Then the above equation can be written as

1nT

logLnT (θ ,Γnz,Γny,FT z,FTy)

=− p+12

log(2π)− 12

log |Σε |−12

logσ2ξ+

1nT

T

∑t=1

log |Snt(λ )|

− 12nT

T

∑t=1

[(Σ− 1

2ε ⊗ In

)(znt −Xntzβz)−Γnz fzt

]′[(Σ− 1

2ε ⊗ In


](4)

− 12nT

T

∑t=1

[σ−1ξ


(δ′⊗ In

)(znt −Xntzβz)

)−Γny fyt

]′×[σ−1ξ


(δ′⊗ In

)(znt −Xntzβz)

)−Γny fyt

]. (5)

In Eq. (5), (δ ′⊗ In)(znt −Xntzβz) controls for the correlation between znt and the disturbances in the ynt

equation. As sample expands in n and T , the number of parameters in factors and their loadings also

increases. Because the parameter of interest is θ , we concentrate out factors and their loadings using the

following lemma.

Lemma 1. Let n, T , R be positive integers such that R ≤ n and R ≤ T . Let ZnT be an n×T matrix, Γn bean n×R matrix, and FT be an T ×R matrix. Then minFT∈RT×R,Γn∈Rn×R tr

((ZnT −ΓnF ′T )

′ (ZnT −ΓnF ′T ))=

minΓn∈Rn×R tr(Z′nT MΓnZnT ) = ∑nr=R+1 µi (ZnT Z′nT ) = ∑

Tr=R+1 µi (Z′nT ZnT ).

6

This lemma relates the estimation of factor loadings as the problem of principle components in statistics.

The proof can be found, for example, in Lemma A1 of Moon and Weidner (2014). The concentrated log

likelihood is2

QnT (θ) =−p+1

2log(2π)− 1

2log |Σε |−

12

logσ2ξ+

1nT

T

∑t=1

log |Snt(λ )|−12

LznT (θ)−

12

LynT (θ), (6)

where

LznT (θ) =

1nT

np

∑r=Rz+1

µr

(T

∑t=1

dntd′nt

), with dnt =

(Σ− 1

2ε ⊗ In

)(znt −Xntzβz) , and,

LynT (θ) =

1nT

n

∑r=Ry+1

µr

(T

∑t=1

gntg′nt

), with gnt = σ

−1ξ


(δ′⊗ In

)(znt −Xntzβz)

).

The QML estimator is θ = argmaxθ∈Θ QnT (θ). The estimate Γny (Γnz) for Γny (Γnz) can be obtained as the

eigenvectors associated with the first Ry (Rz) largest eigenvalues of ∑Tt=1 gntg′nt (∑T

t=1 dntd′nt). By switching

n and T , the estimate FTy (FT z) for FTy (FT z) can be similarly obtained. Note that Γny and FTy (Γnz and FT z)

are not unique as ΓnyHH−1F ′Ty is observationally equivalent to ΓnyF ′Ty for an invertible Ry×Ry matrix H.

However, the column spaces of Γny and FTy (Γnz and FT z) are invariant to H, hence the projectors MΓny

, MFTy,

MΓnz

and MFT zare uniquely determined.

Our model also covers two simpler cases of interest.

• Seemingly unrelated regression (SUR) equations panel with factors

By considering the equations of znt alone, one is considering the estimation of a p equation SUR model with

unobserved common factors, which extends the single equation model (e.g., Pesaran (2006) and Bai (2009))

by allowing cross equation correlations. The likelihood component of znt , t = 1, · · · ,T , is

− p2

log(2π)− 12

log |Σε |−1

2nT

T

∑t=1

[(Σ− 1

2ε ⊗ In


]′[(Σ− 1

2ε ⊗ In


].

• Exogenous spatial weights matrices

In the event that vit is independent from εit , δ = 0 and the model becomes a spatial panel with unobserved

common factors but weights matrices are exogenous. In this case, the outcome equation can be estimated

separately from the znt equation. Imposing δ = 0, the likelihood component of ynt , t = 1, · · · ,T , from (5)

becomes

− 12

log(2π)− 12

logσ2ξ+

1nT

T

∑t=1

log |Snt(λ )|

2In Shi and Lee (2015), we also concentrate out the variance parameter. Because the correlations between equations are also ofinterest in this paper, the variance parameters are not concentrated out here. For example, when the disturbances are jointly normal

and p = 1, the correlation coefficient (ρ) between εit and vit can be solved from δ = ρσvΣ− 1

2ε and σ2

ξ=(1−ρ2)σ2

v .

7

− 12nT

T

∑t=1

[σ−1ξ

(Snt(λ )ynt −Xntyβy)−Γny fyt

]′ [σ−1ξ

(Snt(λ )ynt −Xntyβy)−Γny fyt

],

with Γny = σ−1ξ

Γny. If the spatial weights matrix is time invariant, the case is covered in Shi and Lee (2015).

Here we allow for the more general case with time varying spatial weights matrices. Spatial panel models

with time varying weights matrices have been considered in Lee and Yu (2012); however, those models have

only additive individual and time effects but not interactive effects.

4. Asymptotic properties of the QML estimator

4.1. AssumptionsAssumption 3. (1) The parameter vector θ is in a compact set Θ in the Euclidean space Rkθ with kθ =kz + ky +2+ J+ p, where kz is the dimension of βz, ky is the dimension of βy, J is the dimension of α whichcontains distinct elements in Σε , and p is the dimension of δ . The other two parameters are the spatialinteraction coefficient λ and the variance σ2

ξ. The true parameter vector θ0 is in the interior of Θ.

(2) For any i, j, n and t, the spatial weights wit, jt ≥ 0, wit,it = 0. Furthermore, Wnt and S−1nt are uni-

formly bounded in absolute value in both row and column sums, where Snt = In− λ0Wnt . Furthermore,let supn,t ‖Wnt‖∞

≤ cw with probability 1 for some finite constant cw and supλ∈Λ |λ |cw < 1, where Λ is theparameter space for λ .

(3) All elements in Xnty and Xntz are deterministic and bounded in absolute value.(4) There are respectively Rz0 and Ry0 factors in the znt and ynt equations. The factors and their loadings

are nonstochastic and uniformly bounded.(5) n is an increasing function of T and T goes to infinity.

Assumption 3(2) limits the spatial interaction to a manageable degree. The non-stochastic assumptions

in (3) and (4) are used here to simplify the analysis in order to concentrate on the key issue of endogenous

spatial dependence in the presence of common time factors, which generate another form of cross sectional

dependence, but these assumptions can be relaxed by imposing moment conditions, see Moon and Weidner

(2014). The asymptotics of this paper are derived with both n and T jointly increasing, as in (5). The

following assumptions rule out collinearity among the regressors and are critical for parameter identification.

Notation 2. Define the following n× T dimensional matrices of regressors in the z and y equations. Fork = 1, · · · ,kz, XnT z,k =

(Xn1z,k · · · XnT z,k

)and Xntz,k is the k-th column of Xntz. For k = 1, · · · ,ky, XnTy,k =(

Xn1y,k · · · XnTy,k)

and Xnty,k is the k-th column of Xnty.

Assumption 4. There exists positive constants cz, cy, such that

minlz∈Bkz

np

∑r=2Rz+1

µr

(1

nT(lz ·Zz

nT )(lz ·ZznT )′)≥ cz > 0,

minly∈Bky

n

∑r=2Ry+1

µr

(1

nT

(ly ·Zy

nT

)(ly ·Zy

nT

)′)≥ cy > 0,

wpa 1 as n,T →∞, where Bkz (Bky) is a unit ball in the kz (ky+1+ p)-dimensional Euclidean space; lz (ly) isa kz×1 ((ky+1+ p)×1) vector with ‖lz‖2 =

√l′zlz = 1 (‖ly‖2 = 1); furthermore, lz ·Zz

nT =∑kzk=1 lz,kXnT z,k and

ly · ZynT = ∑

ky+1+pk=1 ly,kZy

k,nT , where for k = 1, · · · ,ky, Zyk,nT = XnTy,k;

Zyky+1,nT =

(Gn1

(Xn1yβy0 + εn1δ0 + Γny0 fy10

)· · · GnT

(XnTyβy0 + εnT δ0 + Γny0 fyT 0

))with Gnt = WntS−1

nt

8

and εnt =(ε1t · · · εnt

)′ is n× p; and for k = 1, · · · , p, Zyky+1+k,nT =

(εn1,k · · · εnT,k

)with εnt,k being the

k-th column of εnt .

Note that despite the simultaneity in the model, no exclusion restrictions are necessary, because of the

nonlinear interactions of Wnt and ynt and the nonlinearity of znt in Wnt . Assumption 4 will not be satisfied

if rank(

Zyky+1,nT

)≤ 2Ry, which can happen if βy0 = 0, δ0 = 0 and the spatial weights matrices are time

invariant. This assumption will also be violated if Gnt(Xntyβy0 + εntδ0 + Γny0 fyt0

)is linearly dependent on

Xnty and εnt,k. We provide the following alternative conditions, which do not include the above term with

Gnt , but with additional condition on variance structures in terms of the reduced form disturbances. The

later condition is motivated in Lee (2004), which is relevant for the identification of λ .

Assumption 5. (1) There exists positive constants cz, cy, such that

minlz∈Bkz

np

∑r=2Rz+1

µr

(1

nT(lz ·Zz

nT )(lz ·ZznT )′)≥ cz > 0,

minly∈Bky

n

∑r=2Ry+1

µr

(1

nT

(ly ·Zy

nT

)(ly ·Zy

nT

)′)≥ cy > 0,

wpa 1 as n,T → ∞, where Bkz (Bky) is a unit ball in the kz (ky + p)-dimensional Euclidean space; lz (ly)is a kz × 1 ((ky + p)× 1) nonzero vector with ‖lz‖2 =

√l′zlz = 1 (‖ly‖2 = 1). Furthermore, lz · Zz

nT =

∑kzk=1 lz,kXnT z,k. In addition, ly · Zy

nT = ∑ky+pk=1 ly,kZy

k,nT . For k = 1, · · · ,ky, Zyk,nT = XnTy,k; for k = 1, · · · , p,

Zyky+k,nT =

(εn1,k · · · εnT,k

)and εnt,k is the k-th column of εnt , with εnt =

(ε1t · · · εnt

)′ an n× p matrix.(2) For any λ ∈Θλ , λ 6= λ0,

lim infn,T→∞

(1

nT

T

∑t=1

tr(Snt(λ )

′Snt(λ )S−1nt S−1

nt′)− T

∏t=1

∣∣Snt(λ )′Snt(λ )S−1

nt S−1nt′∣∣ 1

nT

)> 0.

Assumption 5(2) ensures the identification of λ . By the arithmetic and geometric means inequality,

1nT ∑

Tt=1 tr

(Snt(λ )


nt′)≥ 1

T ∑Tt=1


nt S−1nt′∣∣ 1

n ≥(

∏Tt=1


nt S−1nt′∣∣ 1

n

) 1T

.

So this assumption essentially requires that the inequality holds strictly in the limit for λ 6= λ0.

Qu and Lee (2015) provides a useful LLN for some key statistics for a cross sectional spatial model with

an endogenous spatial weights matrix, which is derived using properties of spatial NED processes (Jenish

and Prucha (2012)). That result will also be useful for our panel model by extending spacial location into

the time-space location setting. Write zit =(

zit1 · · · zit p

)′.

Assumption 6. (1) The spatial weights satisfy wit. jt ≥ 0, wit,it = 0, and wit, jt = 0 if ρit, jt > ρc, i.e., thereexists a threshold ρc > 1 such that the weight is zero if the geographic distance exceeds ρc. For i 6= j,wit, jt = hi j(zit ,z jt)I(ρit, jt < ρc) or the row normalized version that wit, jt =

hi j(zit ,z jt)I(ρit, jt<ρc)

∑nk=1 hik(zit ,zkt)I(ρit,kt<ρc)

, wherehi j(·)′s are non-negative, uniformly bounded functions. (2) The function hi j(·) satisfies the Lipschitz condi-tion,

∣∣hi j(a1,b1)−hi j(a2,b2)∣∣≤ c0 (|a1−a2|+ |b1−b2|) for some finite constant c0.

The underlying topological structure provides a bound on the degree of spatial dependence. Intuitively,

this assumption requires that an individual is connected to other individuals at most ρc distance away and

the functions that construct the spatial weights are uniformly bounded and smooth. Many functions satisfy

9

those properties. For the LLNs in Lemmas 2 and 4 below, the terms that involve the spatial weights need to

satisfy the NED property, which can be derived from the NED property of z with smooth function h(·)’s, as

in Qu and Lee (2015) and Qu et al. (2015).

4.2. Consistency

To accommodate time varying spatial effects, time is considered as an additional dimension in an indi-

vidual’s location vector, as Assumption 1 shows. Viewing individuals at different time points as different

individuals, the panel data can be structured as a large collection of nodes with interactions via space-time

locations. Define GnT = diag(Gn1, · · · ,GnT ) which is a nT×nT diagonal block matrix with each block n×n

submatrix. Let ς∗it be a real valued function such that ς∗it = fit(ξit ,εit ,Xnty,Xntz,Γnz,FT z,Γny,FTy,θ0), and de-

note ς∗nT = (ς∗11, · · · ,ς∗n1,ς∗12, · · · ,ς∗nT )

′ a nT ×1 vector. To prove the consistency of the QML estimator, the

LLN and CLT in Qu and Lee (2015), Propositions 1 and 2, are reproduced here with modification for the

panel data setting.

Lemma 2. Under Assumptions 1, 3(2) and 6, suppose supn,T supi,t E‖ς∗it‖42 < ∞, then

1nT

(ς∗nT′G′nT ς

∗nT −E

(ς∗nT′G′nT ς

∗nT))

= OP(1√nT

), and

1nT

(ς∗nT′G′nT GnT ς

∗nT −E

(ς∗nT′G′nT GnT ς

∗nT))

= OP(1√nT

).

The above is analogue to a cross section data with a spatial weights matrix. Because the asymptotics are

derived with both n and T increasing, the preceding result originated in Qu and Lee (2015) holds by consid-

ering an “extended” cross section data of nT individuals with the nT ×nT matrix GnT . It is straightforward

to check that GnT with Assumptions 3(2) and 6 satisfy Qu and Lee (2015)’s Assumptions 3(1) and 4 for the

LLN. The following results on matrix norms are useful.

Claim 2. Under Assumptions 1,2,3 and 6,∥∥XnT z,k

∥∥2 = O

(√nT)

for k = 1, · · · ,kz,∥∥∥Zy

k,nT

∥∥∥2= OP(

√nT )

for k = 1, · · · ,ky +1, and∥∥∥Zy

ky+1+k,nT

∥∥∥2= OP(max

(√n,√

T)) for k = 1, · · · , p, where XnT z,k and Zy

k,nT aredefined in Notation 2 and Assumption 4.

Claim 1 in Subsection 2.2 is for i.i.d. disturbances. The following result, whose proof is in the appendix,

generalizes it to allow spatially dependent errors.

Claim 3. Under Assumptions 2 and 3 (1) and (2), ‖UnT‖2 = OP

(√nT(

n−14 +T−

14

)), where UnT =

(un1, · · · ,unT ) with unt = Gntξnt or unt = Snt(λ )S−1nt ξnt .

The following proposition establishes that the QML estimator is consistent under appropriate regularity

conditions. Notice that the consistency only requires the number of factors specified in the model to be

not fewer than the true number of factors. Lemma 2 is used to show that the sum of squared residuals

in the likelihood is dominated by ‖θ −θ0‖22. Note that for this purpose the relevant terms do not involve

projectors. When we later show that the variance of the estimator can be consistently estimated, Lemma 2

will be extended to allow for projectors, see Lemma 4. We can also establish a possible rate of convergence.

The convergence rate can be improved to 1√nT

if the number of factors is correctly specified, as shown in the

next subsection.

10

Proposition 1. Under Assumptions 1-3, 4 or 5, 6, and assuming that the number of factors is not under-estimated, i.e., Rz ≥ Rz0 and Ry ≥ Ry0, then

∥∥θ −θ0∥∥

2 = oP(1). Furthermore, under Assumptions 1-3, 4,

6, Rz ≥ Rz0 and Ry ≥ Ry0,∥∥∥βz−βz0

∥∥∥2= Op

(c−

12

nT

),∥∥∥βy−βy0

∥∥∥2= Op

(c−

12

nT

),∣∣∣λ −λ0

∣∣∣= Op

(c−

12

nT

)and∥∥∥δ −δ0

∥∥∥2= Op

(c−

12

nT

), where cnT = min(n,T ).

The proof is in the appendix.

4.3. Asymptotic normality of the QML estimator

To derive the asymptotic distribution of the QML estimator, the mean value theorem or a Taylor ex-

pansion is often used to derive a proper linear-quadratic expansion of the log quasi-likelihood at their true

parameter values. Such a technique is difficult for this model as the log likelihood involves the sum of certain

eigenvalues of the sum of squared residuals. However, because the eigenvalues in the objective function are

0 if θ = θ0, EnT = 0 and ΞnT = 0, and they will be close to 0 when small perturbations are applied, the per-

turbation theory of linear operator are useful (see Kato (1995)). The perturbations are small asymptotically

under assumptions on error disturbances and the consistency of the QML estimator. Moon and Weidner

(2015) use the perturbation method to obtain an approximate gradient vector and an approximate Hessian

matrix for a regression panel with factors. Shi and Lee (2015) apply the method to a QML objective function

of a spatial panel data model. Here we extend our analysis in Shi and Lee (2015) to LznT (θ) and Ly

nT (θ) in

Eq. (6) by taking into account the presence of endogenous and time varying spatial weights matrices. An

additional feature is the presence of variance matrix Σε of the log likelihood function. In order to utilize the

perturbation theory, we adopt the following reparameterization.

Assumption 7. Let Σ− 1

2ε represent the principal square root of Σ−1

ε and it can be linearly parameterized, as

Σ− 1

2ε = ∑

Jj=1 Ω jα j where each Ω j does not involve unknown parameters, Ω1, · · · ,ΩJ are linearly indepen-

dent, and the α’s are J unique unknown elements of Σ− 1

2ε .

Example 1. Supposing p = 2 and no further restrictions are imposed on Σ− 1

2ε , then Σ

− 12

ε has three distinct

elements. So one way to parameterize Σ− 1

2ε is Σ

− 12

ε = ∑3j=1 Ω jα j with Ω1 =

(1 00 0

), Ω2 =

(0 00 1

)and

Ω3 =

(0 11 0

).

Because the common factors and loadings are concentrated out in estimation, in the following, Γnz,

FT z, Γny and FTy refer to their true values in the data generating process, as they are parts of the DGP

for asymptotic analysis. The asymptotic distribution of the QMLE is derived under the asymptotics that

limn,T→∞nT → κ > 0.

Assumption 8. The numbers of factors, Rz0 and Ry0 are constants and known. Γnz, FT z, Γny and FTy as de-fined in Eqs. (4) and (5) are nonstochastic. limn→∞

1n Γ′nyΓny =Γy, limn→∞

1np Γ′nzΓnz =Γz , limT→∞

1T F ′T zFT z =

Fz and limT→∞1T F ′TyFTy = Fy exist and are positive definite.

The above assumption which has also been used in Bai (2009) and Moon and Weidner (2015), states

that each factor contributes a nontrivial share towards the common factor component. When normalized

11

by the sample size nT , this assumption ensures that the eigenvalues from the common factor component

are bounded away from 0 while those from the idiosyncratic errors are close to 0, and thus they are well

separated.

Firstly consider LznT (θ) in Eq. (6) for the spatial weights formation. Let DnT = (dn1, · · · ,dnT ) with

dnt =

(Σ− 1

2ε ⊗ In

)(znt −Xntzβz). We have DnT = ΓnzF ′T z +∑

kz+J+1k=1 η

zkV z

k +∑kz+J+1k1,k2=1 η

zk1

ηzk2

V zk1k2

where

1. For k = 1, · · · ,kz, ηzk = βz0,k−βz,k, V z

k =

(Σ− 1

2ε0 ⊗ In

)XnT z,k.

2. For k = 1, · · · ,J, ηzkz+k = αk0−αk and V z

kz+k =−(Ωk⊗ In)(EnT + ΓnzF ′T z

).

3. ηzkz+J+1 =

‖EnT ‖2√nT

and V zkz+J+1 =

(Σ− 1

2ε0 ⊗ In

)EnT√

nT‖EnT ‖2

.

4. For k1 = 1, · · · ,kz and k2 = 1, · · · ,J, V zk1,kz+k2

=−(Ωk2⊗ In) XnT z,k1 and V zkk′ = 0 otherwise.

The component under analysis here is a SUR panel with p equations and correlations between the equations.

Items (2) and (4) come from the variance. It extends a single regression panel with factors in Moon and

Weidner (2015). Then, we have

LznT (θ) =

1nT

np

∑r=Rz0+1

µr(DnT D′nT

)=

1nT

np

∑r=Rz0+1

µr

(T (0)+

kz+J+1

∑k1=1

ηzk1

T (1)k1

+kz+J+1

∑k1,k2=1

ηzk1

ηzk2

T (2)k1k2

+kz+J+1

∑k1,k2,k3=1

ηzk1

ηzk2

ηzk3

T (3)k1k2k3

+kz+J+1

∑k1,k2,k3,k4=1

ηzk1

ηzk2

ηzk3

ηzk4

T (4)k1k2k3k4

), (7)

where T (0) = ΓnzF ′T zFT zΓ′nz, T (1)

k1= V z

k1FT zΓ

′nz + ΓnzF ′T zV

zk1′, T (2)

k1k2= V z

k1V z

k2′ +V z

k1k2FT zΓ

′nz + ΓnzF ′T zV

zk1k2′,

T (3)k1k2k3

= V zk1

V zk2k3′+V z

k1k2V z

k3′ and T (4)

k1k2k3k4= V z

k1k2V z

k3k4′. Under Assumption 8, 1

nT ∑npr=Rz0+1 µr

(T (0)

)= 0,

because 1nT T (0) has exactly Rz0 positive eigenvalues and the remaining ones are zeros. For small η

zk , Lz

nT (θ)

can be approximated by a quadratic polynomial of ηzk . Using Lemma D.1 and proper but simplified expres-

sions of Lemma D.2 in Shi and Lee (2015), we have

LznT (θ) =Lz

nT (θ0)+2kz+J

∑k=1

ηzk

1nT

tr(

MΓnz

(Σ− 1

2ε0 ⊗ In

)EnT MFT zV

zk′)

−2kz+J

∑k=1

ηzk

1nT

tr(

MΓnzVzk MFT zE

′nT

(Σ− 1

2ε0 ⊗ In

)PΓnz,FT zE

′nT

(Σ− 1

2ε0 ⊗ In

))−2

kz+J

∑k=1

ηzk

1nT

tr(

MΓnz

(Σ− 1

2ε0 ⊗ In

)EnT MFT zV

zk′PΓnz,FT zE

′nT

(Σ− 1

2ε0 ⊗ In

))−2

kz+J

∑k=1

ηzk

1nT

tr(

MΓnz

(Σ− 1

2ε0 ⊗ In

)EnT MFT zE

′nT

(Σ− 1

2ε0 ⊗ In

)PΓnz,FT zV

zk′)

+kz+J

∑k1,k2=1

ηzk1

ηzk2

1nT

tr(

MΓnzVzk1

MFT zVzk2

′)+Lrem,z

nT (θ), (8)

where MΓnz and MFT z are the projection matrices, PΓnz,FT z =Γnz(Γ′nzΓnz

)−1 (F ′T zFT z)−1 F ′T z, and Lrem,z

nT (θ) cap-

tures remainder terms that can be shown to be OP

(‖θ −θ0‖3

2

)+OP

(‖θ −θ0‖2

2 n−12

)+OP

(‖θ −θ0‖2 n−

32

)+

12

OP

(n−

52

). Note that the perturbations that are purely due to the idiosyncratic errors (ηz

kz+J+1V zkz+J+1) are

collected in LznT (θ0), as

LznT (θ0) =

1nT

np

∑r=Rz0+1

µr

(ΓnzF ′T zFT zΓ

′nz +

(Σ− 1

2ε0 ⊗ In

)EnT FT zΓ

′nz +ΓnzF ′T zE

′nT

(Σ− 1

2ε0 ⊗ In

)+

(Σ− 1

2ε0 ⊗ In

)EnT E ′nT

(Σ− 1

2ε0 ⊗ In

)).

Next consider gnt = σ−1ξ

(Snt(λ )ynt −Xntyβy− (δ ′⊗ In)(znt −Xntzβz)) in Eq. (6) for the outcome equa-

tion. By reparameterization, let αξ = σ−1ξ

. Define the following terms

1. For k = 1, · · · ,kz, ηyk = βz0,k−βz,k and V y

k =−σ−1ξ 0 (δ ′0⊗ In) XnT z,k.

2. For k = 1, · · · ,ky, ηykz+k = βy0,k−βy,k, V y

kz+k = σ−1ξ 0 XnTy,k.

3. ηykz+ky+1 = λ0− λ , V y

kz+ky+1 = σ−1ξ 0 XnTy,ky+1 with the t-th column of XnTy,ky+1 given by Xnty,ky+1 =

Gnt(Xntyβy0 + Γny fyt + εntδ0 +ξnt

).

4. ηykz+ky+2 = αξ 0−αξ and V y

kz+ky+2 =−ΞnT −σξ 0ΓnyF ′Ty.

5. For k = 1, · · · , p, ηykz+ky+2+k = δ0k− δk, define ek as a p× 1 vector whose k-th element is 1 and all

others are 0 and V ykz+ky+2+k = σ

−1ξ 0

(e′k⊗ In

)(∑

kzk=1 XnT z,k (βz0,k−βz,k)+ ΓnzF ′T z +EnT

).

Item (3) comes from the spatial interaction and (5) comes from the control function. Similarly, the pertur-

bation theory gives

LynT (θ)

=LynT (θ0)+

kz+ky+2+p

∑k=1

ηyk

2nT σξ 0

tr(MΓnyΞnT MFTyV

yk′)

−kz+ky+2+p

∑k=1

ηyk

2nT σ2

ξ 0tr(MΓnyV

yk MFTyΞ

′nT PΓny,FTyΞ

′nT +MΓnyΞnT MFTyV

yk′PΓny,FTyΞ

′nT +MΓnyΞnT MFTyΞ

′nT PΓny,FTyV

yk′)

− (λ0−λ )(αξ 0−αξ )2

nT σ2ξ 0

vec(ΞnT )′ GnT vec(ΞnT )+

kz+ky+2+p

∑k1,k2=1

ηyk1

ηyk2

1nT

tr(

MΓnyVyk1

MFTyVyk2

′)+Lrem,y

nT (θ),

(9)

where

LynT (θ0) =

1nT

n

∑r=Ry0+1

µr

(ΓnyF ′TyFTyΓ

′ny +σ

−1ξ 0 ΞnT FTyΓ

′ny +σ

−1ξ 0 ΓnyF ′TyΞ

′nT +σ

−2ξ 0 ΞnT Ξ

′nT

),

and the remainder term satisfies Lrem,ynT (θ)=OP

(‖θ −θ0‖3

2

)+OP

(‖θ −θ0‖2

2 n−12

)+OP

(‖θ −θ0‖2 n−

32

)+

OP

(n−

52

). In Eq.(9), the term (λ0−λ )(αξ 0−αξ )

2nT σ2

ξ 0vec(ΞnT )

′ GnT vec(ΞnT ) comes from the interaction

between the spatially generated regressor XnTy,ky+1 and the error term, while the interaction between the

exogenous regressors and the error term can be absorbed into OP

(‖θ −θ0‖2

2 n−12

)in the remainder.

The total number of parameters in θ is k = kz + ky + 2+ J + p. Substituting Eqs.(8) and (9) into the

QML objective function (6), and expanding log∣∣∣∣Σ− 1

2ε

∣∣∣∣, logσ−1ξ

and log |Snt(λ )| around their respective true

13

values, we have

QnT (θ) =−p+1

2log2π− 1

2

(logσ

2ξ 0−2σξ 0

(αξ −αξ 0

)+σ

2ξ 0

(αξ −αξ 0

)2)

− 12

(log |Σε0|−2

J

∑j=1

tr(

Σ12ε0Ω j

)(α j−α0)+

J

∑j,k=1

tr(

Σ12ε0Ω jΣ

12ε0Ωk

)(α j−α j0)(αk−αk0)

)

+1

nT

T

∑t=1

(log |Snt |− tr(Gnt)(λ −λ0)−

12

tr(G2

nt)(λ −λ0)

2)

− 12

LznT (θ0)−

12

LynT (θ0)+

1√nT

(θ −θ0)′C(1)

nT −12(θ −θ0)

′CnT (θ −θ0)+QremnT (θ)

=QnT (θ0)+1√nT

(θ −θ0)′ C(1)

nT −12(θ −θ0)

′ CnT (θ −θ0)+QremnT (θ), (10)

where C(1)nT and C(1)

nT are k× 1 vectors, CnT and CnT are k× k matrices, and the remainder term satisfies

QremnT (θ) = OP

(‖θ −θ0‖3

2

)+OP

(‖θ −θ0‖2

2 n−12

)+OP

(‖θ −θ0‖2 n−

32

)+OP

(n−

52

). The detailed ex-

pressions can be found in the appendix.

Define γnT = 1√nT

C−1nT C(1)

nT , enT = vec((

Σ− 1

2ε0 ⊗ In

)EnT

), and ξnT = σ

−1ξ 0 vec(ΞnT ). The limiting dis-

tribution of the random vector C(1)nT can be derived using the Cramér-Wold device. Let v be an arbitrary

conformable vector, which gives a linear combination RvnT = ν ′C(1)

nT of columns of C(1)nT . Rv

nT can be ex-

pressed in the following general form,

RvnT−ϕ

vnT =

1√nT

(av

nT′enT + e′nT Av

nT enT +bv1nT′ξnT + ξ

′nT Bv

nT ξnT + e′nT DvnT ξnT − tr(Av

nT )− tr(BvnT ))+oP(1),

where ϕvnT = ν ′ϕnT with ϕnT defined in Proposition 2 below is the leading order bias term. The bν

1nT and

BνnT are in general functions of enT , while av

nT , AvnT and Dv

nT are constant vectors or matrices. The limiting

distribution of RνnT −ϕν

nT has been derived in Qu et al. (2015) using martingale differences CLT.

Lemma 3. Let σ2Rv

nTdenote the variance of Rv

nT −ϕvnT . Assuming that σ2

Rνnt

is bounded away from 0 and

Assumptions 1-3 and 6-8 hold, then RνnT−ϕν

nTσRν

nT

d→ N(0,1).

To show the probability limit of its variance, appropriate LLNs are needed. In Qu et al. (2015), in order

to concentrate out additive individual and time effects, it gives rise to statistics of variables transformed by

the projectors In− 1n`n`

′n and IT − 1

T `T `′T where `n and `T are vectors of 1’s. Because individual and/or time

observations of spatial units are averaged over space and/or time, the LLN in preceding Lemma 2 is not

applicable, and they establish additional LLN in their Proposition 1. In this paper with interactive effects

instead of additive effects, the projectors of loadings and factors are more general. However, MΓny = In−1n Γny

(1n Γ′nyΓny

)−1Γ′ny = In− 1

n ∑Ry0r=1 ζny,rζ

′ny,r and MFTy = IT − 1

T FTy

(1T F ′TyFTy

)−1F ′Ty = In− 1

n ∑R0r=1 ζTy,rζ

′Ty,r,

where ζny,r and ζTy,r are respectively the r-th columns of Γny(1

n Γ′nyΓny)− 1

2 and FTy

(1T F ′TyFTy

)− 12. With

Assumptions 8 and 3(4), elements of ζny,r and ζTy,r are uniformly bounded and it can be checked that the

LLNs in Qu et al. (2015) can be extended to statistics with these general projectors.

14

Lemma 4. Under Assumptions 1,2,3,6 and 8, define ς∗nT as in Lemma 5, then

1nT

(ς∗nT′G′nT

(MFTy⊗MΓny

)ς∗nT −E

(ς∗nT′G′nT

(MFTy⊗MΓny

)ς∗nT))

= oP(1), and

1nT

(ς∗nT′G′nT

(MFTy⊗MΓny

)GnT ς

∗nT −E

(ς∗nT′G′nT

(MFTy⊗MΓny

)GnT ς

∗nT))

= oP(1).

Lemma 3 implies that∥∥∥C(1)

nT

∥∥∥2= OP (1). Completing the squares on the right hand side of Eq. (10),

QnT (θ) = QnT (θ0)−12(θ −θ0− γ)′ CnT (θ −θ0− γ)+

12

γ′nTCnT γnT +Qrem

nT (θ). (11)

At θ = θ0 + γnT , QnT (θ0 + γnT ) = QnT (θ0) +12 γ ′nTCnT γnT + Qrem

nT (θ0 + γnT ). Now let θnT be the QML

estimator. From Eq. (11),

QnT (θnT ) = QnT (θ0)−12(θnT −θ0− γnT

)′CnT

(θnT −θ0− γnT

)+

12

γ′nTCnT γnT +Qrem

nT (θnT ).

Because θnT maximizes the objective function, QnT (θnT )≥ QnT (θ0 + γnT ), which implies that(θnT −θ0− γnT

)′CnT


)≤ 2

(Qrem

nT (θnT )−QremnT (θ0 + γnT )

). (12)

Assuming that CnT is positive definite, it follows that(θnT −θ0− γnT

)′CnT


)≥ b∥∥θnT −θ0− γnT

∥∥22

for some b > 0, and ‖γnT‖2 = OP

(1√nT

)from

∥∥∥C(1)nT

∥∥∥2= OP (1). Using the similar argument in Shi

and Lee (2015), it follows that b∥∥θnT −θ0− γnT

∥∥22 +OP

(∥∥θnT −θ0− γnT∥∥n−

32

)+OP

(n−

52

)≤ 0, which

implies that√

nT∥∥θnT −θ0− γnT

∥∥2 = OP

(n−

14

)as n and T are proportional in the limit. Therefore,

√nT(θnT −θ0

)=√

nT γnT +oP (1) = C−1nT C(1)

nT +oP(1).

Recall that the parameter vector is θ =(β ′z ,β

′y,λ ,αξ ,α

′,δ ′)′. In the Proposition 2 below, denote ac-

cordingly,

ϕnT =

00

− 1√nT

tr[(

PFTy⊗ In)

GnT +(IT ⊗PΓny

)GnT

](√ nT +

√Tn

)Ry0σξ 0

Rz0√ n

T tr[

Σ12ε0Ω1

]+√

Tn tr[(

Σ12ε0Ω1⊗ In

)PΓnz

]...

Rz0√ n

T tr[

Σ12ε0ΩJ

]+√

Tn tr[(

Σ12ε0ΩJ⊗ In

)PΓnz

]0

.

Proposition 2. Assuming that limn,T→∞nT → κ > 0, Σ= limn,T→∞ ΣnT is positive definite, Π= limn,T→∞ ΠnT ,

which are given in Eq. (A.30) of the appendix, and Assumptions 1-3,4 or 5, 6-8 hold, then

√nT(

θnT −θ0−1√nT

C−1nT ϕnT

)= C−1

nT

(C(1)

nT −ϕnT

)d→ N(0,Σ−1 (Σ+Π)Σ

−1).

15

If the disturbance terms are normally distributed, Π = 0 and√

nT(

θnT −θ0− 1√nT

C−1nT ϕnT

)d→ N(0,Σ−1).

It can be shown that the variance of C(1)nT −ϕnT is ΣnT +ΠnT . Comparing the terms of CnT with ΣnT

reveals that p limn,T→∞

(CnT −ΣnT

)= 0, which gives the limiting variance matrix Σ−1 (Σ+Π)Σ−1.

Proposition 2 shows that the limiting distributions of the QML estimates do not center at their true values

but with a bias term of the order OP

(1√nT

). The bias term is due to incidental parameters, namely factor

loadings and time factors, as the estimates for them have slower convergence rate. As in Shi and Lee (2015),

the bias can be corrected by substituting in the QML estimates θnT and the estimated projectors of factors

and factor loadings in ϕnT and ΣnT . The bias corrected estimator is θ cnT = θnT − 1√

nTΣ−1nT ϕnT .

Corollary 1. Assuming that limn,T→∞nT → κ > 0, Σ = limn,T→∞ ΣnT is positive definite, Π = limn,T→∞ ΠnT ,

and Assumptions 1-3,4 or 5, 6-8 hold, then√

nT(θ c

nT −θ0) d→ N(0,Σ−1 (Σ+Π)Σ−1). If the disturbance

terms are normally distributed,√

nT(θ c

nT −θ0) d→ N(0,Σ−1).

4.4. The number of factors

As long as the number of factors specified is not fewer than the true number of factors, the QML es-

timator is consistent. However, the limiting distribution is under the premise that the number of factors is

correctly specified. Although Moon and Weidner (2015) show that under certain conditions, the limiting

distribution of the estimator for a regression panel is invariant to the inclusion of redundant factors, its finite

sample performance may suffer, as Lu and Su (2015) emphasize. If factors are interpreted as omitted vari-

ables, their detection is a first step in trying to measure them. In this section, we demonstrate how the factor

number can be consistently determined.

Several criteria on factor number selection have been proposed in the literature, including Bai and Ng

(2002)’s PC and IC criteria, Onatski (2010)’s Edge Distribution estimator, and Ahn and Horenstein (2013)’s

eigenvalue ratio and growth ratio criteria. We specifically show how Ahn and Horenstein (2013)’s eigen-

value ratio and growth ratio criteria can be extended here. Denote θnT the QML estimator with Rz ≥ Rz0

and Ry ≥ Ry0, which is a preliminary consistent estimator using a larger number of factors than the true

one. For the z equation, let DnT =(dn1, · · · , dnT

)with dnt =

(Σ− 1

2ε ⊗ In

)(znt −Xntzβz

). Using the no-

tation of Eq. (7), we have DnT = ΓnzF ′T z +

(Σ− 1

2ε0 ⊗ In

)EnT + EnT (θnT ), with EnT (θnT ) = ∑

kz+Jk=1 η

zkV z

k +

∑kz+Jk1,k2=1 η

zk1

ηzk2

V zk1k2

. Denote µznT,k = µk

( 1nT DnT D′nT

)for k ≥ 1, V z(k) = ∑

npj=k+1 µ

znT, j for k ≥ 0, and µ

znT,0 =

V z(0)/ log(min(n,T )). Define the “eigenvalue ratio” statistic, ERz(k) =µ

znT,k

µznT,k+1

, and the “growth ratio” statis-

tic, GRz(k) = log(

V z(k−1)V z(k)

)/ log

(V z(k)

V z(k+1)

). The number of factors in the z equation can be selected accord-

ing to kzER = max0≤k≤kmax ERz(k) or kz

GR = max0≤k≤kmax GRz(k).

For the y equation, let GnT =(gn1, · · · , gnT ) with gnt = σ−1ξ

(Snt

(λ

)ynt −Xntyβy−

(δ ⊗ In

)(znt −Xntzβz

)).

From Eq. (9), GnT =ΓnyF ′Ty+σ−1ξ 0 ΞnT +ΞnT (θnT ), with ΞnT (θnT )=∑

kz+ky+2+pk=1 η

ykV y

k +∑kz+ky+2+pk1,k2=1 η

yk1

ηyk2

V yk1k2

.

kyER and ky

GR are defined similarly via µynT,k = µk

( 1nT GnT G′nT

).

Proposition 3. Assuming that limn,T→∞nT → κ > 0, Assumptions 2,3 and 8 hold, the preliminary estima-

tor∥∥θnT −θ0

∥∥2 = op

(n−

12

), Rz0 ≥ 0, Ry0 ≥ 0, then limn,T→∞ Pr

(kz

ER = Rz0)= limn,T→∞ Pr

(kz

GR = Rz0)=

limn,T→∞ Pr(ky

ER = Ry0)= limn,T→∞ Pr

(ky

GR = Ry0)= 1.

16

Therefore the number of factors can be determined consistently. The proof, which is in the appendix,

checks that the relevant assumptions of Ahn and Horenstein (2013) are satisfied and their result then applies

here. Note that the case with no factors (Rz0 = 0 or Ry0 = 0) is covered.

5. Monte Carlo simulations

5.1. Design

We study the finite sample performance of the QML estimator and the accuracy of the factor number

selection in samples of different sizes, different degrees of spatial interaction and endogeneity, and different

ratios between the variances of the idiosyncratic error and the factors. We also check the finite sample

performance of the QML estimator in a data specification that tailors to the empirical application in Section

6.

In the main data specification, the outcome equation is yit = λ ∑nj=1 wit, jty jt + xitβy + γ ′yi fyt + vit . There

are two unobserved factors in the yit equation. The factors ( fyt) and factor loadings (γyi) are generated from

independent uniform random variables on [−2,2]. The observed regressor (xit) is a scalar and correlates

with factors through xit =13

(γ ′yi fyt + γ ′yi`2 + f ′yt`2

)+ηit , where ηit is a uniform random variable on [−2,2]

and `2 is a 2×1 vector of 1’s. The spatial weights are correlated with vit according to the following process.

1. Individuals indexed by 1 to n are located successively on a chessboard of dimension√

n×√

n. The

neighborhood structure follows the rook pattern, i.e. two individuals are neighbors if they share a

common border. Therefore individuals in the interior of the chessboard have 4 neighbors, and those

on the border and the corner have 3 and 2 neighbors, respectively. Let wdi j = 1 if i and j are neighbors

and wdi j = 0 otherwise, and denote the n× n matrix Wd =

[wd

i j

]. Note that in this experiment, the

geographic locations do not change over time.2. Let zit = xitβz + γzi fzt + εit . There is one unobserved factor in the zit equation. The scalar γzi is

generated from uniform random variable on [−2,2], and fzt is the first element of fyt . Let weit, jt =

wdi j ×min

(1

|zit−z jt| ,2)

. The spatial dependence is stronger for neighbors with similar z’s. In line

with the condition that the functions hit, jt(·) are uniformly bounded, weit, jt is capped at 2 such that

individuals whose z’s are very similar (∣∣zit − z jt

∣∣ < 0.5) have the same degree of spatial effect. The

spatial weights in the outcome equation are row-normalized, wit, jt =we

it, jt

∑nj=1 we

it, jt. Note that the spatial

weights matrix complies with Assumption 6.3. The idiosyncratic errors vit and εit are generated from i.i.d. bivariate normal random variables with

mean 0 and variance 43 ϑ

(1 ρ

ρ 1

). The inverse of ϑ is a measure of the signal to noise ratio. Each

common factor and the idiosyncratic error have the same variance when ϑ = 1, and the latter is more

variable when ϑ > 1.

Our empirical application considers multiple spatial weights matrices. To provide evidence that our es-

timator can perform well, in an alternative data specification, the outcome equation is replaced by yit =

λ1 ∑nj=1 w1,i jy jt + λ2 ∑

nj=1 wit, jty jt + xitβy + γ ′yi fyt + vit , where wit, jt is the same as in the main data spec-

ification representing time varying spatial relations, and W1 = [w1,i j] is row-normalized Wd representing

geographic contiguity. Other parts of the data specification are the same.

17

5.2. Monte Carlo results

1000 Monte Carlo replications are carried out. The numerical maximization routine starts at multiple

values, because the objective function is not concave and multiple local maxima may exist. We vary the

degree of spatial interaction (λ = 0.2 or λ = 0.6) and endogeneity (ρ = 0.2 or ρ = 0.6). αξ denotes σ−1ξ

and α denotes Σ− 1

2ε . Table 1 reports the Monte Carlo results for the QML estimator. The magnitude of biases

generally decreases as n and T increase. The coverage probability (CP) is calculated using the asymptotic

variance covariance matrix Σ in Eq. (A.30), and the nominal coverage probability is set at 95%. The

estimates for the variance parameters (αξ and α) have noticeable biases in finite sample, and as a result,

their CPs are well below 95%. The CPs for other parameter estimates are generally slightly below 95% and

are better. Therefore overall, hypothesis tests might be over-rejected. The Monte Carlo results of the bias

corrected estimator are reported in Table 2. The biases for the variance parameters have been significantly

reduced. The CPs are more accurate so a more reliable statistical inference can be based on the bias corrected

estimator.

Table 3 shows that estimates suffer from substantial bias if the spatial weights matrix is treated erro-

neously as exogenous in the estimation, where only the outcome equation is estimated. The biases still

remain in large samples. The estimate for the spatial effect is biased upward, and the bias is larger in

samples with a higher degree of endogeneity.

Proposition 1 shows that the QML estimates are consistent even when the number of factors is over-

specified. In Table 4, we report the performance of the bias corrected estimators when the number of factors

in the outcome equation is over-specified by 1, i.e. Ry = 3 while the true Ry0 = 2. Although estimates

are still consistent, but for these finite samples, the variance parameter (αξ ) has noticeable bias that is not

removed by the bias correction procedure. Tables 9 and 10 in the appendix report additional Monte Carlo

results. As the number of redundant factors increases, the CP deteriorates. Therefore for valid inference, it

seems important that a correct number of factors is chosen.

We check the accuracy of factor selections given by the eigenvalue ratio (ER) and growth ratio (GR)

criterions. Figure 1 shows the number of incorrect selection in 1000 simulations. The accuracy is almost

100% when the variances of elements of the idiosyncratic errors and the factors are the same (ϑ = 1), even

in small sample (n = 25, T = 25). We then make the idiosyncratic errors to be up to 9 times more variable

than the factors, and find that the selection errors increase as a result. However, the selection accuracy

quickly improves as sample size increases, and 100% accuracy is achieved in the sample with n = 81 and

T = 81. The ER and GR criterions have similar overall performances, and GR criterion performs slightly

better when the factors are weak (high ϑ ).

The empirical application in the next section uses two spatial weights matrices and the estimation results

there reveal different degrees of spatial interactions. Table 5 based on the empirical feature of the empirical

application confirms that our bias corrected estimator still performs well in the alternative specification with

multiple spatial weights matrices.

18

Table 1: Performance of the QML Estimator θnT

n T λ0 ρ0 βz βy λ αξ α δ

(1) 25 25 0.2 0.2 Bias 0.00252 -0.00113 -0.00211 0.08269 0.03813 0.00099CP 0.941 0.898 0.903 0.159 0.674 0.928

49 25 0.2 0.2 Bias 0.00013 0.00142 -0.00143 0.05931 0.02786 0.00086CP 0.942 0.926 0.918 0.119 0.675 0.955

25 49 0.2 0.2 Bias -0.00115 0.00001 -0.00190 0.05873 0.02836 0.00128CP 0.938 0.938 0.923 0.140 0.673 0.934

49 49 0.2 0.2 Bias 0.00028 0.00146 -0.00180 0.03846 0.01868 0.00029CP 0.945 0.946 0.916 0.187 0.696 0.942

(2) 25 25 0.2 0.6 Bias 0.00108 -0.00334 0.00013 0.10114 0.03806 0.00003CP 0.941 0.913 0.906 0.150 0.687 0.912

49 25 0.2 0.6 Bias 0.00020 0.00063 -0.00104 0.07260 0.02746 0.00051CP 0.936 0.931 0.928 0.114 0.693 0.927

25 49 0.2 0.6 Bias -0.00119 -0.00074 0.00076 0.07245 0.02773 0.00063CP 0.935 0.940 0.922 0.122 0.681 0.921

49 49 0.2 0.6 Bias 0.00046 0.00128 -0.00143 0.04699 0.01856 0.00038CP 0.938 0.943 0.929 0.187 0.706 0.924

(3) 25 25 0.6 0.2 Bias 0.00252 -0.00003 -0.00284 0.08201 0.03813 0.00137CP 0.941 0.901 0.907 0.183 0.674 0.926

49 25 0.6 0.2 Bias 0.00013 0.00176 -0.00129 0.05904 0.02786 0.00092CP 0.942 0.927 0.930 0.141 0.675 0.956

25 49 0.6 0.2 Bias -0.00115 0.00082 -0.00232 0.05822 0.02836 0.00154CP 0.938 0.938 0.917 0.156 0.673 0.933

49 49 0.6 0.2 Bias 0.00028 0.00176 -0.00144 0.03821 0.01868 0.00034CP 0.945 0.948 0.926 0.198 0.696 0.940

(4) 25 25 0.6 0.6 Bias 0.00108 -0.00236 -0.00139 0.10062 0.03806 0.00072CP 0.941 0.910 0.912 0.171 0.687 0.916

49 25 0.6 0.6 Bias 0.00020 0.00082 -0.00083 0.07239 0.02746 0.00060CP 0.936 0.931 0.930 0.126 0.693 0.933

25 49 0.6 0.6 Bias -0.00119 -0.00010 -0.00063 0.07216 0.02773 0.00107CP 0.935 0.937 0.929 0.144 0.681 0.919

49 49 0.6 0.6 Bias 0.00046 0.00150 -0.00112 0.04676 0.01856 0.00050CP 0.938 0.941 0.928 0.205 0.706 0.920

True parameter values: βz0 = 1, βy0 = 1, σv0 =√

43 , σe0 =

√43 (hence α0 =

( 43)− 1

2 ). In the low endogeneity case, ρ0 = 0.2,

αξ 0 = 0.88 and δ0 = 0.2. In the high endogeneity case, ρ0 = 0.6, αξ 0 = 1.08 and δ0 = 0.6. θnT is the QML estimator. Thecoverage probabilities (CP) are calculated using the theoretical standard deviations obtained from the diagonal elements of 1

nT Σ−1nT .

19

Table 2: Performance of the Bias-Corrected QML Estimator θ cnT


(1) 25 25 0.2 0.2 Bias 0.00252 -0.00145 -0.00096 0.00543 0.00197 0.00123CP 0.952 0.934 0.934 0.928 0.945 0.944

49 25 0.2 0.2 Bias 0.00013 0.00117 -0.00026 0.00241 0.00086 0.00105CP 0.945 0.948 0.932 0.936 0.943 0.964

25 49 0.2 0.2 Bias -0.00115 -0.00034 -0.00059 0.00188 0.00135 0.00153CP 0.943 0.956 0.939 0.915 0.940 0.951

49 49 0.2 0.2 Bias 0.00028 0.00122 -0.00076 0.00088 0.00063 0.00048CP 0.948 0.954 0.936 0.933 0.945 0.953

(2) 25 25 0.2 0.6 Bias 0.00108 -0.00307 -0.00057 0.00637 0.00189 -0.00017CP 0.948 0.932 0.935 0.931 0.950 0.933

49 25 0.2 0.6 Bias 0.00020 0.00038 -0.00004 0.00290 0.00047 0.00073CP 0.948 0.950 0.949 0.932 0.949 0.945

25 49 0.2 0.6 Bias -0.00119 -0.00059 0.00029 0.00263 0.00073 0.00051CP 0.948 0.954 0.940 0.932 0.938 0.943

49 49 0.2 0.6 Bias 0.00047 0.00106 -0.00062 0.00096 0.00051 0.00058CP 0.942 0.952 0.942 0.945 0.950 0.935

(3) 25 25 0.6 0.2 Bias 0.00252 -0.00100 -0.00118 0.00520 0.00197 0.00187CP 0.952 0.937 0.935 0.928 0.945 0.947

49 25 0.6 0.2 Bias 0.00013 0.00141 -0.00055 0.00227 0.00086 0.00110CP 0.945 0.946 0.947 0.936 0.943 0.964

25 49 0.6 0.2 Bias -0.00115 -0.00017 -0.00055 0.00180 0.00135 0.00203CP 0.943 0.959 0.940 0.916 0.940 0.950

49 49 0.6 0.2 Bias 0.00028 0.00143 -0.00080 0.00074 0.00063 0.00051CP 0.948 0.958 0.937 0.937 0.945 0.952

(4) 25 25 0.6 0.6 Bias 0.00108 -0.00267 -0.00090 0.00613 0.00189 0.00094CP 0.949 0.933 0.936 0.931 0.950 0.936

49 25 0.6 0.6 Bias 0.00020 0.00053 -0.00028 0.00281 0.00047 0.00081CP 0.948 0.947 0.954 0.935 0.949 0.943

25 49 0.6 0.6 Bias -0.00119 -0.00048 -0.00001 0.00262 0.00073 0.00133CP 0.948 0.956 0.948 0.927 0.938 0.941

49 49 0.6 0.6 Bias 0.00047 0.00126 -0.00068 0.00080 0.00051 0.00068CP 0.942 0.952 0.945 0.944 0.950 0.930

True parameter values: βz0 = 1, βy0 = 1, σv0 =√

43 , σe0 =

√43 (hence α0 =

( 43)− 1

2 ). In the low endogeneity case, ρ0 = 0.2,

αξ 0 = 0.88 and δ0 = 0.2. In the high endogeneity case, ρ0 = 0.6, αξ 0 = 1.08 and δ0 = 0.6. θ cnT is the bias-corrected QML

estimator. The coverage probabilities (CP) are calculated using the theoretical standard deviations obtained from the diagonalelements of 1

nT Σ−1nT .

20

Table 3: Estimation under Misspecification: Assuming Exogenous Spatial Weights Matrix

n T λ0 ρ0 βy λ ρ0 βy λ

25 25 0.2 0.2 Bias -0.00739 0.02007 0.6 Bias -0.03103 0.08171E-SD 0.04823 0.03597 E-SD 0.04883 0.03516

RMSE 0.04877 0.04117 RMSE 0.05783 0.0889549 25 0.2 0.2 Bias -0.00263 0.01775 0.6 Bias -0.01665 0.06913

E-SD 0.03099 0.02466 E-SD 0.03095 0.02386RMSE 0.03109 0.03038 RMSE 0.03513 0.07313

25 49 0.2 0.2 Bias -0.00552 0.01789 0.6 Bias -0.02391 0.07239E-SD 0.03121 0.02381 E-SD 0.03150 0.02282


E-SD 0.02143 0.01679 E-SD 0.02157 0.01620RMSE 0.02170 0.02519 RMSE 0.02949 0.07664

25 25 0.6 0.2 Bias -0.00709 0.00981 0.6 Bias -0.02968 0.04219E-SD 0.04893 0.02346 E-SD 0.04908 0.02195


E-SD 0.03159 0.01654 E-SD 0.03136 0.01548RMSE 0.03177 0.01937 RMSE 0.03682 0.04205

25 49 0.6 0.2 Bias -0.00572 0.00895 0.6 Bias -0.02432 0.03767E-SD 0.03173 0.01579 E-SD 0.03195 0.01456


E-SD 0.02174 0.01164 E-SD 0.02179 0.01075RMSE 0.02220 0.01603 RMSE 0.03166 0.04445

The estimation mistakenly assumes the spatial weights matrix to be exogenous, hence only the outcome equation is estimated.

Interactive effects are still included. True parameter values: βz0 = 1, βy0 = 1, σv0 =√

43 , σe0 =

√43 (hence α0 =

( 43)− 1

2 ). In thelow endogeneity case, ρ0 = 0.2, αξ 0 = 0.88 and δ0 = 0.2. In the high endogeneity case, ρ0 = 0.6, αξ 0 = 1.08 and δ0 = 0.6. E-SDis the empirical standard deviation of the estimates.

21

Table 4: Performance of the Bias-Corrected QML Estimator θ cnT when Ry = Ry0 +1.


(1) 25 25 0.2 0.2 Bias 0.00252 -0.00201 -0.00158 0.04184 0.00197 0.00055CP 0.952 0.905 0.897 0.633 0.945 0.922

49 25 0.2 0.2 Bias 0.00013 0.00093 -0.00090 0.02825 0.00086 0.00139CP 0.945 0.921 0.912 0.678 0.943 0.940

25 49 0.2 0.2 Bias -0.00115 -0.00044 -0.00088 0.02740 0.00135 0.00158CP 0.943 0.938 0.911 0.677 0.940 0.938

49 49 0.2 0.2 Bias 0.00028 0.00138 -0.00082 0.01865 0.00063 0.00054CP 0.948 0.941 0.934 0.699 0.945 0.942

(2) 25 25 0.2 0.6 Bias 0.00108 -0.00316 -0.00067 0.05076 0.00189 -0.00002CP 0.948 0.919 0.907 0.642 0.950 0.914

49 25 0.2 0.6 Bias 0.00020 0.00079 -0.00063 0.03452 0.00047 0.00048CP 0.948 0.932 0.929 0.668 0.949 0.921

25 49 0.2 0.6 Bias -0.00119 -0.00033 -0.00015 0.03395 0.00073 0.00091CP 0.947 0.937 0.930 0.672 0.938 0.926

49 49 0.2 0.6 Bias 0.00047 0.00138 -0.00074 0.02277 0.00051 0.00059CP 0.942 0.940 0.929 0.708 0.950 0.929

(3) 25 25 0.6 0.2 Bias 0.00252 -0.00130 -0.00188 0.04149 0.00197 0.00147CP 0.952 0.907 0.903 0.648 0.945 0.923

49 25 0.6 0.2 Bias 0.00013 0.00127 -0.00112 0.02801 0.00086 0.00148CP 0.945 0.920 0.930 0.689 0.943 0.941

25 49 0.6 0.2 Bias -0.00115 -0.00007 -0.00103 0.02721 0.00135 0.00222CP 0.943 0.940 0.911 0.698 0.940 0.934

49 49 0.6 0.2 Bias 0.00028 0.00167 -0.00092 0.01849 0.00063 0.00060CP 0.948 0.940 0.931 0.730 0.945 0.940

(4) 25 25 0.6 0.6 Bias 0.00108 -0.00257 -0.00126 0.05043 0.00189 0.00123CP 0.947 0.919 0.911 0.649 0.950 0.917

49 25 0.6 0.6 Bias 0.00020 0.00102 -0.00075 0.03433 0.00047 0.00062CP 0.948 0.934 0.935 0.682 0.949 0.923

25 49 0.6 0.6 Bias -0.00119 -0.00003 -0.00049 0.03383 0.00073 0.00182CP 0.947 0.940 0.926 0.701 0.938 0.925

49 49 0.6 0.6 Bias 0.00047 0.00160 -0.00077 0.02259 0.00051 0.00070CP 0.942 0.940 0.935 0.722 0.950 0.927

The true number of factor is 1 in the z equation and 2 in the y equation. In the estimation, the number of factors is set at 1 for the z

equation and 3 for the y equation. True parameter values: βz0 = 1, βy0 = 1, σv0 =√

43 , σe0 =

√43 (hence α0 =

( 43)− 1

2 ). In the low

endogeneity case, ρ0 = 0.2, αξ 0 = 0.88 and δ0 = 0.2. In the high endogeneity case, ρ0 = 0.6, αξ 0 = 1.08 and δ0 = 0.6. θ cnT is the

bias-corrected QML estimator. The coverage probabilities (CP) are calculated using the theoretical standard deviations obtainedfrom the diagonal elements of 1

nT Σ−1nT .

22

Figure 1: Frequencies of Incorrect Estimation

True parameter values: βz0 = 1, βy0 = 1, σv0 = 1, σe0 = 1 (hence α0 = 1), ρ0 = 0.2, αξ 0 = 0.88 and δ0 = 0.2. True number offactors: 1 in the z equation and 2 in the y equation. Initial estimates assume 10 factors in both equations.

23

Tabl

e5:

Perf

orm

ance

ofth

eB

ias-

Cor

rect

edQ

ML

Est

imat

orθ

c nTfo

rthe

Alte

rnat

ive

Mod

el

nT

λ10

λ20

ρ0

βz

βy

λ1

λ2

αξ

αδ

(1)

2525

0.2−

0.2

0.2

Bia

s-0

.000

76-0

.000

040.

0035

9-0

.002

110.

0068

80.

0021

60.

0006

5C

P0.

964

0.95

70.

956

0.95

40.

922

0.94

30.

943

4925

0.2−

0.2

0.2

Bia

s0.

0002

60.

0002

20.

0011

5-0

.000

900.

0031

30.

0009

80.

0012

4C

P0.

952

0.93

20.

938

0.94

40.

919

0.94

10.

953

2549

0.2−

0.2

0.2

Bia

s0.

0004

4-0

.000

890.

0017

5-0

.000

650.

0023

40.

0012

50.

0010

9C

P0.

957

0.95

50.

946

0.94

60.

927

0.93

90.

951

4949

0.2−

0.2

0.2

Bia

s0.

0007

7-0

.000

120.

0002

3-0

.000

520.

0013

20.

0006

30.

0009

3C

P0.

950

0.94

30.

939

0.95

00.

936

0.94

80.

953

(2)

2525

0.2−

0.2

0.6

Bia

s-0

.000

83-0

.000

290.

0017

0-0

.001

780.

0078

30.

0019

5-0

.000

59C

P0.

963

0.94

90.

954

0.95

00.

930

0.94

30.

936

4925

0.2−

0.2

0.6

Bia

s-0

.000

08-0

.000

250.

0008

3-0

.000

850.

0038

30.

0004

90.

0009

1C

P0.

952

0.93

60.

950

0.94

50.

925

0.94

70.

941

2549

0.2−

0.2

0.6

Bia

s-0

.000

03-0

.001

340.

0019

8-0

.000

440.

0030

00.

0009

30.

0005

2C

P0.

955

0.95

40.

934

0.94

10.

935

0.94

00.

947

4949

0.2−

0.2

0.6

Bia

s0.

0004

8-0

.000

14-0

.000

700.

0002

70.

0017

30.

0003

40.

0006

5C

P0.

948

0.95

10.

947

0.94

00.

942

0.94

70.

943

(3)

2525

0.3−

0.3

0.2

Bia

s-0

.000

76-0

.000

070.

0033

3-0

.001

930.

0068

90.

0021

60.

0006

5C

P0.

964

0.95

60.

957

0.95

60.

920

0.94

30.

943

4925

0.3−

0.3

0.2

Bia

s0.

0002

60.

0002

00.

0010

4-0

.000

770.

0031

40.

0009

80.

0012

2C

P0.

952

0.93

40.

940

0.94

60.

920

0.94

10.

954

2549

0.3−

0.3

0.2

Bia

s0.

0004

4-0

.000

910.

0015

5-0

.000

520.

0023

50.

0012

50.

0010

9C

P0.

957

0.95

30.

944

0.94

60.

926

0.93

90.

951

4949

0.3−

0.3

0.2

Bia

s0.

0007

7-0

.000

130.

0001

6-0

.000

490.

0013

30.

0006

30.

0009

2C

P0.

950

0.94

20.

939

0.94

90.

937

0.94

80.

951

(4)

2525

0.3−

0.3

0.6

Bia

s-0

.000

83-0

.000

300.

0015

8-0

.001

750.

0078

40.

0019

5-0

.000

60C

P0.

963

0.94

80.

954

0.95

20.

930

0.94

30.

934

4925

0.3−

0.3

0.6

Bia

s-0

.000

08-0

.000

260.

0007

6-0

.000

770.

0038

30.

0004

90.

0008

8C

P0.

952

0.93

60.

950

0.94

00.

925

0.94

70.

942

2549

0.3−

0.3

0.6

Bia

s-0

.000

03-0

.001

350.

0018

3-0

.000

360.

0030

00.

0009

30.

0004

9C

P0.

955

0.95

50.

933

0.94

00.

934

0.94

00.

946

4949

0.3−

0.3

0.6

Bia

s0.

0004

8-0

.000

14-0

.000

770.

0003

00.

0017

30.

0003

40.

0006

4C

P0.

948

0.95

20.

948

0.94

60.

941

0.94

70.

943

The

outc

ome

equa

tion

has

two

spat

ialw

eigh

tsm

atri

ces,

one

time

inva

rian

t(λ

1)an

dth

eot

her

istim

eva

rian

t(λ

2).

True

para

met

erva

lues

:β

z0=

1,β

y0=

1,σ

v0=√ 4 3

,σe0

=√ 4 3

(hen

ceα

0=( 4 3) −1 2

).In

the

low

endo

gene

ityca

se,ρ

0=

0.2,

αξ

0=

0.88

and

δ0=

0.2.

Inth

ehi

ghen

doge

neity

case

,ρ0=

0.6,

αξ

0=

1.08

and

δ0=

0.6.

θc nT

isth

ebi

as-c

orre

cted

QM

Les

timat

or.T

heco

vera

gepr

obab

ilitie

s(C

P)ar

eca

lcul

ated

usin

gth

eth

eore

tical

stan

dard

devi

atio

nsob

tain

edfr

omth

edi

agon

alel

emen

tsof

1 nTΣ−

1nT

.

24

Figure 2: Average HECM Origination Rates and Top Lender Shares by US Regions

6. Empirical application: spatial spillovers in mortgage originations

Our empirical application is motivated by Haurin et al. (2014) which analyzes the effect of house prices

on state-level origination rates of the Home Equity Conversion Mortgage (HECM). HECM is the predom-

inant type of reverse mortgages in the United States, which enable senior homeowners to withdraw their

home equity without home sale or monthly payments. HECMs are insured and regulated by the federal gov-

ernment, although the private market originates the loans. The insurance is provided by the Federal Housing

Administration through the mutual mortgage insurance fund, which guarantees that the borrower can have

access to the loan fund in the future even when the lender is no longer in business, and the lender can be

fully repaid when the loan terminates even if the house value is less than the loan balance. The borrower

pays mortgage insurance premium both at loan closing and monthly over the lifetime of the loan. Haurin

et al. (2014) find that states with past volatile house prices and current house price levels above long term

norms have higher origination rates. This is consistent with the hypothesis that households use HECMs

to insure against house price declines and therefore the mortgage insurance should take into account this

behavioral response to house price dynamics, as the insurance fund will face higher claim risk when the

insured HECMs concentrate disproportionately in areas that more likely see house price declines.

Observing that the origination rates exhibit spatial clustering, it is of interest to investigate the extent

of spatial spillovers. The HECM activity in a state may be contemporaneously affected by developments

in neighboring states. We are also interested in understanding the factors that may give rise to the spatial

spillovers. We investigate two types of spatial effects, spillovers from neighboring states and spillovers due

to large lenders. We hypothesize that two types of lenders exist. Small lenders concentrate on local markets

while large lenders serve multiple states. Two neighboring states that have high concentrations of large

lenders might be more closely connected because large lenders may be more aware of the developments in

neighboring states. Our data covers 50 states plus Washington, D.C. and 52 quarters from 2001 to 2013.

Table 6 shows that the HECM market is fragmented with a handful of large lenders and a large number of

small lenders.

Let yit denote the HECM origination rate, defined as the number of newly originated HECM loans in

25

Table 6: Largest HECM Lenders by Volume (2001-2013)Name # Loans Share of Total #

Loans (2001-2013)Wells Fargo Bank 157,772 20.35%Financial Freedom Senior Funding Corp 47,187 6.09%Bank of America 25,343 3.27%One Reverse Mortgage LLC 23,431 3.02%MetLife Bank 20,179 2.60%Liberty Home Equity Solutions Inc 17,997 2.32%American Advisors Group 17,136 2.21%Seattle Mortgage Company 13,344 1.72%Urban Financial of America LLC 10,357 1.34%Generation Mortgage Company 9,557 1.23%World Alliance Financial Corp. 9,241 1.19%Reverse Mortgage USA Inc 9,112 1.18%Everbank Reverse Mortgage LLC 8,304 1.07%Ditech Mortgage Corp 7,479 0.96%M and T Bank 6,480 0.84%Countrywide Bank FSB 6,391 0.82%American Reverse Mortgage Corp 6,231 0.80%Academy Mortgage LLC 4,719 0.61%First Mariner Bank 4,341 0.56%Reverse Mortgage Solutions Inc 4,200 0.54%

Figure 3: Average House Price Deviations and Volatility by US Regions

26

state i at quarter t as a percentage of the senior population (age 65 plus) in state i from the 2010 census. There

are two n×n spatial weights matrices, W1,n = (w1,i j) and W2,nt = (w2,it, jt). w1,i j = 1 if states i and j share

the same border and w1,i j = 0 otherwise. W2,nt is time varying and captures a possibly different spillover

effect from large lenders. w2,it, jt = w1,i j × zit × z jt , where zit is the share of the HECM loans originated

by the large lenders (defined as being one of the top 10 largest lenders in Table 63) in state i at quarter

t. A larger weight is given to a state with more dominant large lenders. House price dynamic variables

are constructed using the Federal Housing Finance Agency’s quarterly all-transactions house price indexes

(HPI) deflated by the CPI, which start from 1975. Following Haurin et al. (2014), the house price dynamic

variables include percentage deviations from the previous 9 year averages (hpi_dev)4, standard deviations

of house price changes in the previous 9 years (hpi_v) and the interaction between the two. Figures 2 and

3 show the averages of these variables by U.S. regions in our sample period. It is likely that lender activity

and origination rates are affected by some macroeconomic factors, which are captured by the time factors

fzt and fyt . It is also likely that macroeconomic factors have different impacts in different states, as captured

by state-specific factor loadings. Note that factor loadings may be spatially correlated, capturing residual

spatial effects not directly modeled. Interactive effects include additive individual and time effects as special

cases, hence state time invariant variables are not included. The model consists of

zit = hpi_devitβz1 +hpi_vitβz2 +(hpi_devit ×hpi_vit)βz3 + γ′zi fzt + εit ,

and

yit = λ1

n

∑j=1

w1,i jy jt +λ2

n

∑j=1

w2,it, jty jt +hpi_devitβy1 +hpi_vitβy2 +(hpi_devit ×hpi_vit)βy3 + γ′yi fyt + vit ,

where, respectively, εit and vit have variances σ2ε and σ2

v with correlation ρ . Denote α = σ−1ε , αξ =((

1−ρ2)

σ2v)− 1

2 and δ = ρσvσ−1ε .

We assume that the upper bound of the number of factors is 10, so preliminary estimates use 10 factors.

According to the eigenvalue ratio criterion (see Proposition 3), it indicates that zit has one unobserved factor

and so is yit , and the growth ratio criterion gives the same result. The same number of factors is selected if

5 factors are used for the preliminary estimates. Figure 4 shows a heat map of the estimated factor loadings

(γyi), where darker colors indicate higher values. A higher factor loading indicates that the time factor has

a larger impact. Because the factor loadings of all states are positive, the corresponding time factors can

be interpreted as positively impacting the take-up rates. Figure 5 plots the estimated time factor ( fyt) with

the quarterly percentage change in the real house price index. The time factors generate increasing HECM

activity prior to 2009 and decreasing HECM activity post 2009, and they appear to move opposite to the

change in house prices with a lag. As the common factor component can account for the impact of an

economy wide shock, we explore possible links between the estimated time factor and some observable

3Alternatively, we define a large lender as one of the top 20 largest lenders. The results are similar.4As in Haurin et al. (2014), the choice of 9 years (or 36 quarters) is ad hoc. The time window needs to be long enough to capture

long term norms, while short enough to reflect changing economic circumstances. We have assumed that households continuouslyupdate their beliefs on the long term norms.

27

Figure 4: Estimated Factor Loadings

economic variables. The credit available from a HECM loan depends on the interest rate, the principal

limit factor set by HUD, age of the borrower and the property value. Before April 2009, only HECMs with

adjustable interest rates were available. Since April 2009, fixed rate HECMs became widely available. We

include the average interest rate of adjustable rate HECMs and the average spread between the interest rate

on an adjustable rate HECM and a fixed rate HECM5. There were two reductions to the principal limit factors

in our sample period, corresponding to the two dummy variables “After 2009Q4” and “After 2010Q4”. We

also consider households’ income fluctuations, as measured by the GDP growth rate.

Table 7 reports the results of the factor regression. More HECM credit is available with lower interest

rates and higher principal limit factors, and as expected, we find that these are associated with higher HECM

activity. In addition, more HECMs were closed in economic downturns, suggesting that households who

face negative income shocks use HECMs to supplement their income. It is interesting to note that because

the time factor is changing over time and its factor loadings are heterogeneous, the variation that has been

attributed to the common factor component (γyi fyt) can not be explained entirely by additive individual and

time effects.

Table 8 reports the main estimation results with one unobserved factor for both definitions of large

lenders, namely, top 10 largest lenders and top 20 largest lenders. As a comparison, we also report the

estimates of panel regressions with additive state fixed effects and time effects in the “FE” column. The

results via the z equation show that small lenders tend to be more active in states with more volatile house

prices and current house prices above long term norms. Such states also have higher origination rates,

consistent with the findings of Haurin et al. (2014). Comparing column FE with the spatial panel regression

reveals that the coefficients of the panel regression are overestimated when spatial effects are ignored. It is

also interesting to note that spatial effects work through different channels. Higher HECM activity in a state

5Because fixed rate HECMs were not available before 2009 April, the spread is coded as 0 before 2009Q2.

28

Figure 5: Estimated Time Factors

Table 7: Regression of Time Factors on Observable Economic VariablesCoefficient SD

Average Adjustable Rate HECM Interest Rate −0.0852∗∗∗ 0.0298Spread Between Adjustable Rate HECM and Fixed Rate HECM 0.0634 0.0566After 2009Q4 0.0258 0.0391After 2010Q4 −0.1203∗∗ 0.0462GDP Growth Rate −0.0190∗∗∗ 0.0057R2 0.38# obs 52

The dependent variable is the estimated time factor. ∗∗∗ p < 0.01, ∗∗ p < 0.05, ∗ p < 0.1.

29

Table 8: Estimation of State-Level Origination RatesTop 10 Top 20 FE

Coefficient SD Coefficient SD Coefficient SDTop Lender Share

House Price Deviation −0.1779∗∗∗ 0.0268 −0.0278 0.0286 0.0227 0.0570House Price Volatility −1.3030∗∗∗ 0.1857 −1.4777∗∗∗ 0.2091 −1.9331∗∗∗ 0.2290Deviation × Volatility −0.6965∗∗ 0.2723 −1.4172∗∗∗ 0.2910 −2.3791∗∗∗ 0.6354

HECM Origination Rateλ1 0.0965∗∗∗ 0.0059 0.1033∗∗∗ 0.0063λ2 −0.0687∗∗∗ 0.0134 −0.0750∗∗∗ 0.0132

House Price Deviation −0.0283∗∗∗ 0.0029 −0.0272∗∗∗ 0.0029 −0.0010 0.0070House Price Volatility 0.1210∗∗∗ 0.0138 0.1236∗∗∗ 0.0142 0.2722∗∗∗ 0.0281Deviation × Volatility 0.8868∗∗∗ 0.0375 0.8772∗∗∗ 0.0373 0.9433∗∗∗ 0.0781

α 7.5808∗∗∗ 0.1041 7.1922∗∗∗ 0.0988αξ 104.6246∗∗∗ 1.4507 104.7460∗∗∗ 1.4523δ 0.0036∗∗∗ 0.0008 0.0035∗∗∗ 0.0008

The sample size is n = 51 and T = 52. Bias corrected estimates are reported. ∗∗∗ p < 0.01, ∗∗ p < 0.05, ∗p < 0.1.

positively influences neighboring states, possibly due to higher awareness from increased media coverage.

However, the spillover effect is smaller for states with more large lenders. This suggests that large lenders

may shift resources towards states with higher demand, resulting in lower activity in neighboring states.

The net spillover effect is still positive, indicating that adverse selection may be slightly mitigated. The

correlation between the share of large lender and the origination rate is statistically significant, although its

magnitude is small.

7. Conclusion

This paper presents a spatial panel data model with endogenous spatial weights matrices and common

factors. The spatial weights matrices can be time varying. Endogeneity arises because the weights are con-

structed from variables that may correlate with disturbances in the main equation. By treating time factors

and factor loadings as fixed effects, the proposed QML estimator is robust to the presence of unobserved

time varying heterogeneity, which may allow to correlate with included regressors. We show that the QML

estimator is consistent and asymptotically normal in panels with large n and T . Because the approximate

score vector is not centered for the spatial interaction parameter and the variances parameters, the limiting

distribution of the estimator has a bias term in the order OP

(1√nT

). An analytical bias correction procedure

is then proposed to remove the leading order bias. Monte Carlo simulations demonstrate good finite sample

performance of the bias corrected estimator. An empirical application of the spatial panel model with time

factors to the study of effects of house prices on state-level origination rates of the Home Equity Conversion

Mortgage reveals some interesting patterns.

30

References

Ahn, Seung C. and Alex R. Horenstein, “Eigenvalue Ratio Test for the Number of Factors,” Econometrica,

May 2013, 81 (3), 1203–1227.

, Young H. Lee, and Peter Schmidt, “Panel Data Models with Multiple Time-Varying Individual Ef-

fects,” Journal of Econometrics, 2013, 174, 1–14.

Bai, Jushan, “Panel Data Models with Interactive Fixed Effects,” Econometrica, 2009, 77 (4), 1229–1279.

and Kunpeng Li, “Spatial Panel Data Models with Common Shocks,” Manuscript, Columbia University,

2014.

and Serena Ng, “Determining the Number of Factors in Approximate Factor Models,” Econometrica,

Jan. 2002, 70 (1), 191–221.

Bai, Z.D. and Y.Q. Yin, “Limit of the Smallest Eigenvalue of a Large Dimensional Sample Covariance

Matrix,” The Annals of Probability, 1993, 21 (3), 1275–1294.

Blundell, Richard, Costas Meghir, Monica Costa-Dias, and John van Reenen, “Evaluating the Employ-

ment Impact of a Mandatory Job Search Program,” Journal of the European Economic Association, June

2004, 2 (4), 569–606.

Cliff, Andrew David and J. K. Ord, Spatial Autocorrelation, London: Pion, 1973.

Conley, Timothy G. and Ethan Ligon, “Economic Distance and Cross-Country Spillovers,” Journal of

Economic Growth, 2002, 7, 157–187.

Gobillon, Laurent and Thierry Magnac, “Regional Policy Evaluation: Interactive Fixed Effects and Syn-

thetic Controls,” The Review of Economics and Statistics, Forthcoming, 2015.

Haurin, Donald, Chao Ma, Stephanie Moulton, Maximilian Schmeiser, Jason Seligman, and Wei Shi,“Spatial Variation in Reverse Mortgage Usage: House Price Dynamics and Consumer Selection,” Journal

of Real Estate Finance and Economics, Forthcoming, 2014.

Holly, Sean, M. Hashem Pesaran, and Takashi Yamagata, “A Spatio-Temporal Model of House Prices

in the USA,” Journal of Econometrics, 2010, 158, 160–173.

, , and , “The Spatial and Temporal Diffusion of House Prices in the UK,” Journal of Urban Eco-

nomics, 2011, 69, 2–23.

Horrace, William C., Xiaodong Liu, and Eleonora Patacchini, “Endogenous Network Production Func-

tions with Selectivity,” Journal of Econometrics, 2015.

Hsiao, Cheng, Analysis of Panel Data, 3rd. ed., New York, NY.: Cambridge University Press, 2014.

31

, H. Steve Ching, and Shui Ki Wan, “A Panel Data Approach for Program Evaluation: Measuring the

Benefits of Political and Economic Integration of Hong Kong with Mainland China,” Journal of Applied

Econometrics, 2012, 27, 705–740.

Jenish, Nazgul and Ingmar R. Prucha, “Central Limit Theorems and Uniform Larws of Large Numbers

for Arrays of Random Fields,” Journal of Econometrics, 2009, 150, 86–98.

and , “On Spatial Processes and Asymptotic Inference under Near-Epoch Dependence,” Journal of

Econometrics, 2012, 170, 178–190.

Kato, Tosio, Perturbation Theory for Linear Operators, Springer-Verlag, 1995.

Kelejian, Harry H. and Gianfranco Piras, “Estimation of Spatial Models with Endogenous Weighting

Matrices, and an Application to a Demand Model for Cigarettes,” Regional Science and Urban Eco-

nomics, 2014, 46, 140–149.

and Ingmar R. Prucha, “On the Asymptotic Distribution of the Moran I Test Statistic with Applica-

tions,” Journal of Econometrics, 2001, 104, 219–257.

and , “Specification and Estimation of Spatial Autoregressive Models with Autoregressive and Het-

eroskedastic Disturbances,” Journal of Econometrics, 2010, 157, 53–67.

Kim, Dukpa and Tatsushi Oka, “Divorce Law Reforms and Divorce Rates in the USA: An Interactive

Fixed-Effects Approach,” Journal of Applied Econometrics, 2014, 29, 231–245.

Latala, Rafal, “Some Estimates of Norms of Random Matrices,” Proceedings of the American Mathemati-

cal Society, 2005, 133 (5), 1273–1282.

Lee, Lung-Fei, “Asymptotic Distributions of Quasi-Maximum Likelihood Estimator for Spatial Autore-

gressive Models,” Econometrica, Nov. 2004, 72 (6), 1899–1925.

and Jihai Yu, “QML Estimation of Spatial Dynamic Panel Data Models with Time Varying Spatial

Weights Matrices,” Spatial Economic Analysis, 2012, 7 (1).

Lu, Xun and Liangjun Su, “Shrinkage Estimation of Dynamic Panel Data Models with Interactive Fixed

Effects,” Manuscript, Singapore Management University, 2015.

Moon, Hyungsik Roger and Martin Weidner, “Linear Regression for Panel with Unknown Number of

Factors as Interactive Fixed Effects,” Econometrica, July 2015, 83 (4), 1543–1579.

and Moon Weidner, “Dynamic Linear Panel Regression Models with Interactive Fixed Effects,”

Manuscript, University of Southern California, 2014.

Onatski, Alexei, “Determining the Number of Factors from Empirical Distribution of Eigenvalues,” Review

of Economic and Statistics, 2010, 92, 1004–1016.

32

Pesaran, M. Hashem, “Estimation and Inference in Large Heterogeneous Panels with a Multifactor Error

Structure,” Econometrica, 2006, 74, 967–1012.

and Elisa Tosetti, “Large Panels with Common Factors and Spatial Correlation,” Journal of Economet-

rics, 2011, 161, 182–202.

Qu, Xi and Lung fei Lee, “Estimating a Spatial Autoregressive Model with an Endogenous Spatial Weight

Matrix,” Journal of Econometrics, 2015, 184, 209–232.

, , and Jihai Yu, “QML Estimation of Spatial Dynamic Panel Data Models with Endogenous Time

Varying Spatial Weights Matrices,” Manuscript, 2015.

Shi, Wei and Lung fei Lee, “Spatial Dynamic Panel Data Models with Interactive Fixed Effects,”

Manuscript, the Ohio State University, 2015.

Wu, Chien-Fu, “Asymptotic Theory of Nonlinear Least Squares Estimation,” The Annals of Statistics, 1981,

9 (3), 501–513.

Yu, Jihai, Robert De Jong, and Lung-Fei Lee, “Quasi-maximum Likelihood Estimators for Spatial Dy-

namic Panel Data with Fixed Effects when Both n and T are Large,” Journal of Econometrics, 2008, 146,

118–134.

Zhang, Fuzhen, Matrix Theory: Basic Results and Techniques, Springer, 2011.

33

AppendixA1. Proof of Claim 2

Because elements of Xntz are uniformly bounded,∥∥XnT z,k

∥∥2≤∥∥XnT z,k

∥∥F =O(

√nT ). Similarly,

∥∥∥Zyk,nT

∥∥∥2=

O(√

nT ) for k = 1, · · · ,ky.

Next consider Zyky+1,nT = σ

−1ξ

(Gn1(Xn1yβy0 + εn1δ0 + Γny0 fy10), · · · ,GnT (XnTyβy0 + εnT δ0 + Γny0 fyT 0)

).

For this paragraph, denote ς∗nT = (ς ′n1, · · · ,ς ′nT )′ with ςnt = σ

−1ξ

(Xntyβy0 + εntδ0 + Γny0 fyt0

). Therefore we

have

tr(

Zyky+1,nT

′Zyky+1,nT

)= σ

−2ξ

T

∑t=1


)′G′ntGnt(Xntyβy0 + εntδ0 + Γny0 fyt0

)= ς

∗nT′G′nT GnT ς

∗nT .

By Proposition 2, Eς∗nT′G′nT GnT ς∗nT = O(nT ). It follows that

∥∥∥Zyky+1,nT

∥∥∥2≤∥∥∥Zy

ky+1,nT

∥∥∥F= OP

(√nT)

by

Markov’s inequality.

For k = 1, · · · , p, Zyky+1+k,nT = σ

−1ξ

εk,nT where εk,nT = (εn1,k, · · · , εnT,k) with εnt,k being the k-th column

of the n× p matrix εnt = (ε1t , · · · ,εnt)′. Because elements of εk,nT are i.i.d. across i and t and have uniformly

bounded 4-th moments, by Latala (2005), ‖εk,nT‖2 = OP(√

max(n,T )) and therefore∥∥∥Zy

ky+1+k,nT

∥∥∥2=

OP

(√max(n,T )

).

A2. Proof of Claim 3

Claim 1 shows that ‖ΞnT‖2 =OP

(√max(n,T )

). Now we show that ‖UnT‖2 =OP

(√nT(

n−14 +T−

14

))with UnT = (un1, · · · ,unT ) and unt = Gntξnt . By definition,

‖UnT‖42 =

(µ1(U ′nTUnT

))2= µ1

(U ′nTUnTU ′nTUnT

)=∥∥U ′nTUnT

∥∥22

≤∥∥U ′nTUnT

∥∥2F =

T

∑t=1

T

∑s=1

(n

∑i=1

[UnT ]it [UnT ]is

)2

.

The (i, t) element of UnT is [UnT ]it = ∑nj=1 Gnt,i jξnt, j. Then

‖UnT‖42 ≤

T

∑t,s=1

n

∑i,i′=1

n

∑j, j′′=1

n

∑j′, j′′′=1

Gnt,i jGnt,i′ j′′Gns,i j′Gns,i′ j′′′ξnt, jξnt, j′′ξns, j′ξns, j′′′ .

Because ξnt,i are independently distributed over t and i, E(ξnt,i|EnT )= 0, E(ξ 2

nt,i|EnT)=σ2

ξ 0, E(

ξ 3nt,i|EnT

)=

E(

ξ 3nt,i

)and E

(ξ 4

nt,i|EnT)= E

(ξ 4

nt,i),

E‖UnT‖42

≤T

∑t,s=1,t 6=s

n

∑i,i′=1

n

∑j, j′=1

E(Gnt,i jGnt,i′ jGns,i j′Gns,i′ j′

)σ

4ξ 0 +

T

∑t=1

n

∑i,i′=1

n

∑j=1

E(G2

nt,i jG2nt,i′ j)E(ξ

4nt, j)

34

+T

∑t=1

n

∑i,i′=1

n

∑j, j′=1, j 6= j′

E(G2

nt,i jG2nt,i′ j′

)σ

4ξ 0 +2

T

∑t=1

n

∑i,i′=1

n

∑j, j′=1, j 6= j′

E(Gnt,i jGnt,i′ jGnt,i j′Gnt,i′ j′

)σ

4ξ 0

=T

∑t,s=1

n

∑i,i′=1

n

∑j, j′=1


)σ

4ξ 0−

T

∑t=1

n

∑i,i′=1

n

∑j, j′=1


)σ

4ξ 0

+T

∑t=1

n

∑i,i′=1

n

∑j=1

E(G2

nt,i jG2nt,i′ j)E(ξ

4nt, j)+

T

∑t=1

n

∑i,i′=1

n

∑j, j′=1, j 6= j′

E(G2


)σ

4ξ 0

+2T

∑t=1

n

∑i,i′=1

n

∑j, j′=1, j 6= j′


)σ

4ξ 0

=T

∑t,s=1

n

∑i,i′=1

n

∑j, j′=1


)σ

4ξ 0 +

T

∑t=1

n

∑i,i′=1

n

∑j=1

E(G2

nt,i jG2nt,i′ j)(

E(ξ

4nt, j)−σ

4ξ 0

)+

T

∑t=1

n

∑i,i′=1

n

∑j, j′=1, j 6= j′

E(G2


)σ

4ξ 0 +

T

∑t=1

n

∑i,i′=1

n

∑j, j′=1, j 6= j′


)σ

4ξ 0

≤σ4ξ 0E

T

∑t,s=1

n

∑i,i′=1

[GntG′nt

]ii′[GnsG′ns

]ii′+max

j

(∣∣∣E(ξ 4nt, j)−σ

4ξ 0

∣∣∣ ,σ4ξ 0

)E

T

∑t=1

n

∑i,i′=1

n

∑j=1

G2nt,i j

n

∑j=1

G2nt,i′ j

+σ4ξ 0E

T

∑t=1

n

∑i,i′=1

[GntG′nt

]2ii′−σ

4ξ 0

T

∑t=1

n

∑i,i′=1

n

∑j=1

E(G2

nt,i jG2nt,i′ j)

≤σ4ξ 0E

T

∑t,s=1

n

∑i,i′=1

[GntG′nt

]ii′[GnsG′ns

]ii′+max

j

(∣∣∣E(ξ 4nt, j)−σ

4ξ 0

∣∣∣ ,σ4ξ 0

)E

T

∑t=1

n

∑i,i′=1

[GntG′nt

]ii

[GntG′nt

]i′i′

+σ4ξ 0E

T

∑t=1

n

∑i,i′=1

[GntG′nt

]2ii′

≤σ4ξ 0E

T

∑t,s=1

n

∑i=1

∥∥GntG′nt

∥∥∞

∥∥GnsG′ns

∥∥∞+max

j

(∣∣∣E(ξ 4nt, j)−σ

4ξ 0

∣∣∣ ,σ4ξ 0

)E

T

∑t=1

tr(GntG′nt

)2+σ

4ξ 0E

T

∑t=1

n

∑i=1

∥∥GntG′nt

∥∥2∞

=O(nT 2)+O

(n2T

).

By Markov’s inequality, ‖UnT‖2 = OP

(√nT(

n−14 +T−

14

)).

Let UnT = (un1, · · · , unT ) with unt = Snt(λ )S−1nt ξnt . Because Snt(λ )S−1

nt = In +(λ0−λ )Gnt ,∥∥UnT

∥∥2 =

‖ΞnT +(λ0−λ )UnT‖2 ≤ ‖ΞnT‖2 + |λ0−λ |‖UnT‖2 = OP

(√nT(

n−14 +T−

14

)).

A3. Proof of Proposition 1

Let Rz and Ry denote the number of factors specified in the model, and Rz0 and Ry0 the true numbers of

factors. We have assumed that Rz ≥ Rz0 and Ry ≥ Ry0. In order to prove∥∥θ −θ0

∥∥1

p→ 0, by Lemma 1 of Wu

(1981), it is sufficient to show that, for any τ > 0, liminfn,T→∞ P(infθ∈Θ,‖θ−θ0‖2≥τ (QnT (θ0)−QnT (θ))> 0

)=

1, where QnT (θ) is the concentrated likelihood in Eq. (6). The proof consists of four steps.

1. There exists a lower bound Q˜ nT

(θ0) at θ0, such that QnT (θ0)≥ Q˜ nT

(θ0). Because

(Σ− 1

2ε0 ⊗ In

)(znt −Xntzβz0)−Γnz fzt =

(Σ− 1

2ε0 ⊗ In

)εnt +Γnz0 fzt0−Γnz fzt ,

35

and

σ−1ξ 0

[Sntynt −Xntyβy0−

(δ′0⊗ In

)(znt −Xntzβz0)

]−Γny fyt

=σ−1ξ 0

[(δ′0⊗ In

)εnt + Γny0 fyt0 +ξnt −

(δ′0⊗ In

)(Γnz0 fzt0 + εnt

)]−Γny fyt

=σ−1ξ 0 ξnt +σ

−1ξ 0

(Γny0 fyt0−

(δ′0⊗ In

)Γnz0 fzt0

)−Γny fyt = σ

−1ξ 0 ξnt +Γny0 fyt0−Γny fyt , hence,

logLnT (θ0,Γnz,Γny,FT z,FTy)

=− p+12

log(2π)− 12

log |Σε0|−12

logσ2ξ 0 +

1nT

T

∑t=1

log |Snt |

− 12nT

T

∑t=1

[(Σ− 1

2ε0 ⊗ In

)εnt +Γnz0 fzt0−Γnz fzt

]′[(Σ− 1

2ε0 ⊗ In

)εnt +Γnz0 fzt0−Γnz fzt

]− 1

2nT

T

∑t=1

[σ−1ξ 0 ξnt +Γny0 fyt0−Γny fyt

]′ [σ−1ξ 0 ξnt +Γny0 fyt0−Γny fyt

], and,

logLnT (θ0,Γnz0,Γny0,FT z0,FTy0)

=− p+12

log(2π)− 12

log |Σε0|−12

logσ2ξ 0 +

1nT

T

∑t=1

log |Snt |

− 12nT

T

∑t=1

ε′nt(Σ−1ε0 ⊗ In

)εnt −

12nT σ2

ξ 0

T

∑t=1

ξ′ntξnt

=− p+12

(log(2π)+1)− 12

log |Σε0|−12

logσ2ξ 0 +

1nT

T

∑t=1

log |Snt |+OP

(1√nT

).

Therefore,

QnT (θ0)≥ logLnT (θ0,Γnz0,Γny0,FT z0,FTy0)

=− p+12

(log(2π)+1)− 12

log |Σε0|−12

logσ2ξ 0 +

1nT

T

∑t=1

log |Snt |+OP

(1√nT

). (A.1)

Note that this lower bound may not hold if the number of factors is under-specified, in which case the

variation due to factors may not all be accounted for.

2. There exists a function QnT (·), such that QnT (θ)≤ QnT (θ) for all θ ∈Θ.

QnT (θ)

=− p+12

log(2π)− 12

log |Σε |−12

logσ2ξ+

1nT

T

∑t=1

log |Snt(λ )|

− 12nT

minΓnz∈RnP×Rz ,FT z∈RT×Rz

T

∑t=1

[(Σ− 1

2ε ⊗ In


]′[(Σ− 1

2ε ⊗ In


](A.2)

36

− 12nT

minΓny∈Rn×Ry ,FTy∈RT×Ry

T

∑t=1

[σ−1ξ


(δ′⊗ In

)(Znt −Xntzβz)

)−Γny fyt

]′·[σ−1ξ


(δ′⊗ In

)(Znt −Xntzβz)

)−Γny fyt

]. (A.3)

Consider first Eq. (A.2).

minΓnz∈Rnp×Rz ,FT z∈RT×Rz

1nT

T

∑t=1

[(Σ− 1

2ε ⊗ In


]′[(Σ− 1

2ε ⊗ In


]= min

Γnz∈Rnp×Rz ,FT z∈RT×Rz

1nT

T

∑t=1

[(Σ− 1

2ε ⊗ In

)(Xntz(βz0−βz)+ εnt)+

(Σ− 1

2ε ⊗ In

)Γnz0 fzt0−Γnz fzt

]′·[(

Σ− 1

2ε ⊗ In

)(Xntz(βz0−βz)+ εnt)+

(Σ− 1

2ε ⊗ In

)Γnz0 fzt0−Γnz fzt

]≥ min

˜Γnz∈Rnp×2Rz , ˜FT z∈RT×2Rz

1nT

T

∑t=1

[(Σ− 1

2ε ⊗ In

)(Xntz(βz0−βz)+ εnt)− ˜

Γnz˜fzt

]′·[(

Σ− 1

2ε ⊗ In

)(Xntz(βz0−βz)+ εnt)− ˜

Γnz˜fzt

]= min

˜Γnz∈Rnp×2Rz

1nT

T

∑t=1

[(Σ− 1

2ε ⊗ In

)(Xntz(βz0−βz)+ εnt)

]′M ˜

Γnz

[(Σ− 1

2ε ⊗ In

)(Xntz(βz0−βz)+ εnt)

]≥ min

˜Γnz∈Rnp×2Rz

1nT

T

∑t=1

[(Σ− 1

2ε ⊗ In

)Xntz(βz0−βz)

]′M ˜

Γnz

[(Σ− 1

2ε ⊗ In

)Xntz(βz0−βz)

]+ min

˜Γnz∈Rnp×2Rz

1nT

T

∑t=1

[(Σ− 1

2ε ⊗ In

)εnt

]′M ˜

Γnz

[(Σ− 1

2ε ⊗ In

)εnt

]+ min

˜Γnz∈Rnp×2Rz

2nT

T

∑t=1

[(Σ− 1

2ε ⊗ In

)Xntz(βz0−βz)

]′M ˜

Γnz

[(Σ− 1

2ε ⊗ In

)εnt

]=

np

∑r=2Rz+1

µr

(1

nT

(Σ− 1

2ε ⊗ In

)(ηz ·Zz

nT )(ηz ·ZznT )′(

Σ− 1

2ε ⊗ In

))(A.4)

+1

nT

T

∑t=1

ε′nt(Σ−1ε ⊗ In

)εnt − max

˜Γnz∈RnP×2Rz

1nT

T

∑t=1

tr[(

Σ− 1

2ε ⊗ In

)εntε

′nt

(Σ− 1

2ε ⊗ In

)P˜

Γnz

]+

2nT

T

∑t=1

[Xntz(βz0−βz)]′ (

Σ−1ε ⊗ In

)εnt

− max˜Γnz∈RnP×2Rz

2nT

T

∑t=1

tr[(

Σ− 1

2ε ⊗ In

)εnt [Xntz(βz0−βz)]

′(

Σ− 1

2ε ⊗ In

)P˜

Γnz

]≥bz ‖ηz‖2

2 (A.5)

+ tr(Σ−1ε Σε0

)− 2Rz

nT

∥∥∥∥Σ− 1

2ε ⊗ In

∥∥∥∥2

2

∥∥EnT E ′nT

∥∥2 +OP

(1√nT

)(A.6)

− 2Rz

nT

∥∥∥∥Σ− 1

2ε ⊗ In

∥∥∥∥2

2‖EnT‖2

kz

∑k=1

∥∥XnT z,k∥∥

2 |βz0,k−βz,k| (A.7)

=bz ‖ηz‖22 + tr

(Σ−1ε Σε0

)+OP

(‖ηz‖2

(n−

12 +T−

12

))+OP

(1√nT

), (A.8)

37

where ηz = βz0−βz. Eq. (A.5) is because

np

∑r=2Rz+1

µr

(1

nT

(Σ− 1

2ε ⊗ In

)(ηz ·Zz

nT )(ηz ·ZznT )′(

Σ− 1

2ε ⊗ In

))=

np

∑r=2Rz+1

µr

(1

nT

(Σ−1ε ⊗ In

)(ηz ·Zz

nT )(ηz ·ZznT )′)

≥µpn(Σ−1ε ⊗ In

) np

∑r=2Rz+1

µr

(1

nT(ηz ·Zz

nT )(ηz ·ZznT )′)= µp

(Σ−1ε

) np

∑r=2Rz+1

µr

(1

nT(ηz ·Zz

nT )(ηz ·ZznT )′)(A.9)

≥bz ‖ηz‖22 ,

where the inequality in Eq. (A.9) can be found in Theorem 8.12 of Zhang (2011); the last inequality is due

to Assumption 4 and bz is some generic positive constant. In Eq. (A.6), we use the inequality that for a

square matrix A, |tr(A)| ≤ rank(A)‖A‖2 (see Zhang (2011)) and then ‖EnT‖2 = OP

(√max(n,T )

)from

Claim 1. In Eq. (A.7),∥∥XnT z,k

∥∥2 = O

(√nT)

from Claim 2. Next consider Eq. (A.3). Substituting in

ynt = S−1nt(Xntyβy0 +(δ ′0⊗ In)εnt + Γny0 fyt0 +ξnt

)and znt = Xntzβz0 + Γnz0 fzt0 + εnt ,

σ−1ξ


(δ′⊗ In

)(znt −Xntzβz)

)−Γny fyt

=σ−1ξ

[Snt(λ )S−1

nt(Xntyβy0 +(δ ′0⊗ In)εnt + Γny0 fyt0 +ξnt

)−Xntyβy

−(δ ′⊗ In)(Xntzβz0 + Γnz0 fzt0 + εnt −Xntzβz

)]−Γny fyt

=σ−1ξ

[Xnty(βy0−βy)+Gnt

(Xntyβy0 +

(δ′0⊗ In

)εnt + Γny0 fyt0

)(λ0−λ )+

((δ′0−δ

′)⊗ In)

εnt

−(δ′⊗ In

)Xntz(βz0−βz)+Snt(λ )S−1

nt ξnt]+σ

−1ξ

(Γny0 fyt0−

(δ′⊗ In

)Γnz0 fzt0

)−Γny fyt .

Define ηy =(

β ′y0−β ′y,λ0−λ ,δ ′0−δ ′)′

, we have

minΓny∈Rn×Ry ,FTy∈RT×Ry

1nT

T

∑t=1

[σ−1ξ


(δ′⊗ In

)(znt −Xntzβz)

)−Γny fyt

]′[σ−1ξ


(δ′⊗ In

)(znt −Xntzβz)

)−Γny fyt

]≥ min

˜Γny∈Rn×2Ry , ˜FTy∈RT×2Ry

1nT σ2

ξ

T

∑t=1

[Xnty(βy0−βy)+

((δ′0−δ

′)⊗ In)+Gnt

(Xntyβy0 +

(δ′0⊗ In

)εnt + Γny0 fyt0

)(λ0−λ )εnt

−(δ′⊗ In


nt ξnt]− ˜

Γny˜fyt

′[Xnty(βy0−βy)+

((δ′0−δ

′)⊗ In)

εnt

+Gnt(Xntyβy0 +

(δ′0⊗ In

)εnt + Γny0 fyt0

)(λ0−λ )−

(δ′⊗ In


nt ξnt]− ˜

Γny˜fyt

≥ min

˜Γny∈Rn×2Ry

1nT

T

∑t=1

σ−1ξ



)(λ0−λ )+ εnt(δ0−δ )

]′M ˜

Γnyσ−1ξ



)(λ0−λ )+ εnt(δ0−δ )

]

38

+ min˜Γny∈Rn×2Ry

1nT

T

∑t=1

σ−1ξ

((δ′⊗ In

)Xntz(βz0−βz)

)′M ˜

Γny

σ−1ξ

((δ′⊗ In

)Xntz(βz0−βz)

)+ min

˜Γny∈Rn×2Ry

− 2nT

T

∑t=1

σ−1ξ



)(λ0−λ )+ εnt (δ0−δ )

]′M ˜

Γnyσ−1ξ

((δ′⊗ In

)Xntz(βz0−βz)

)+ min

˜Γny∈Rn×2Ry

1nT

T

∑t=1

[σ−1ξ

Snt(λ )S−1nt ξnt

]′M ˜

Γny

[σ−1ξ


]+ min

˜Γny∈Rn×2Ry

2nT

T

∑t=1

σ−1ξ



)(λ0−λ )+ εnt(δ0−δ )

−(δ′⊗ In

)Xntz(βz0−βz)

]′M ˜Γny

σ−1ξ


=σ

−2ξ

n

∑r=2Ry+1

µr

(1

nT

(ηy ·Zy

nT

)(ηy ·Zy

nT

)′)+OP

(‖βz0−βz‖2

2

)+OP

(‖βz0−βz‖2 ‖ηy‖2

)+

1nT

T

∑t=1

σ−2ξ

ξ′ntS−1nt′Snt(λ )

′Snt(λ )S−1nt ξnt − max

˜Γny∈Rn×2Ry

1nT

T

∑t=1

tr(

σ−2ξ

Snt(λ )S−1nt ξntξ

′ntS−1nt′Snt(λ )

′P˜Γny

)(A.10)

+2

nT

T

∑t=1

σ−2ξ



)(λ0−λ )

+εnt (δ0−δ )−(δ′⊗ In

)Xntz(βz0−βz)

]′ Snt(λ )S−1nt ξnt (A.11)

− max˜Γny∈Rn×2Ry

2nT σ2

ξ

T

∑t=1

tr(Snt(λ )S−1

nt ξnt[Xnty(βy0−βy)+Gnt


)(λ0−λ )

+εnt (δ0−δ )−(δ′⊗ In

)Xntz(βz0−βz)

]′P˜Γny∈Rn×2Ry

)(A.12)

≥by ‖ηy‖22 +OP

(‖βz0−βz‖2

2

)+OP

(‖βz0−βz‖2 ‖ηy‖2

)+

σ2ξ 0

σ2ξ

1nT

T

∑t=1

tr(S−1

nt′Snt(λ )

′Snt(λ )S−1nt)

+OP

(‖ηy‖2

(n−

12 +T−

12

))+OP

((‖ηy‖2

2 +‖ηy‖2 ‖ηz‖2

)(n−

14 +T−

14

))+OP

(1√nT

). (A.13)

Under Assumption 4, by > 0. If there are collinearity between Gnt(Xntyβ0 + εntδ0 + Γny0 fyt0

), Xnty and εnt ,

then Assumption 4 will not be satisfied. However, with λ identified from the alternative Assumption 5(2),

we can still have by > 0 from Assumption 5(1). In Eq. (A.10), 1nT ∑

Tt=1 σ

−2ξ

ξ ′ntS−1nt′Snt(λ )

′Snt(λ )S−1nt ξnt =

σ2ξ 0

σ2ξ

1nT ∑

Tt=1 tr

(S−1

nt′Snt(λ )

′Snt(λ )S−1nt)+OP(

1√nT) from Lemma 5(1), and

∣∣∣∣∣ max˜Γny∈Rn×2Ry

1nT

T

∑t=1

tr(

σ−2ξ

Snt(λ )S−1nt ξntξ

′ntS−1nt′Snt(λ )

′P˜Γny

)∣∣∣∣∣≤


1nT σ2

ξ

T

∑t=1

tr(

ξntξ′ntP˜

Γny

)∣∣∣∣∣+∣∣∣∣∣ max

˜Γny∈Rn×2Ry

2(λ0−λ )

nT σ2ξ

T

∑t=1

tr(

Gntξntξ′ntP˜

Γny

)∣∣∣∣∣

39

+


(λ0−λ )2

nT σ2ξ

T

∑t=1

tr(

Gntξntξ′ntG′ntP˜

Γny

)∣∣∣∣∣≤

2Ry

nT σ2ξ

‖ΞnT‖22 +

2Ry |λ0−λ |nT σ2

ξ

‖UnT‖2 ‖ΞnT‖2 +Ry |λ0−λ |2

nT σ2ξ

‖UnT‖22

=Op

(1n+

1T

)+Op

(|λ0−λ |

(n−

34 +T−

34

))+Op

(|λ0−λ |2

(n−

12 +T−

12

)),

where UnT = (un1, · · · ,unT ) with unt = Gntξnt is an n× T matrix, and the last equality is from Claim 3,‖UnT‖2 = OP

(√nT(

n−14 +T−

14

)). Eq. (A.11) is OP(

1√nT) by Proposition 2. For example, one term can

be written as

1nT

T

∑t=1

(GntXntyβy0)′ Snt(λ )S−1

nt ξnt =1

nTXβy0

nTy′G′nT

(InT +(λ0−λ ) GnT

)ξ∗nT

=1

nTEXβy0

nTy′G′nT

(InT +(λ0−λ ) GnT

)ξ∗nT +Op

(1√nT

)= OP(

1√nT

),

with ξ ∗nT = (ξ ′n1, · · · ,ξnT ), GnT in Proposition 2 and Xβy0nTy =

(β ′y0X ′n1y, · · · ,β ′y0X ′nTy

)′. The term in (A.12) can

be similarly shown to be OP

(‖ηy‖2

(n−

12 +T−

12

))+OP

((‖ηy‖2

2 +‖ηy‖2 ‖ηz‖2

)(n−

14 +T−

14

)). Com-

bining Eqs. (A.8) and (A.13), we have

QnT (θ)≤−p+1

2log(2π)− 1

2log |Σε |−

12

logσ2ξ+

1nT

T

∑t=1

log |Snt(λ )|−12

σ2ξ 0

σ2ξ

1nT

T

∑t=1

tr(S−1

nt′Snt(λ )

′Snt(λ )S−1nt)

− 12

[by ‖ηy‖2

2 +OP

(‖ηz‖2

2

)+OP

(‖ηz‖2 ‖ηy‖2

)+ (A.14)

Op

(‖ηy‖2

(n−

12 +T−

12

))+OP

((‖ηy‖2

2 +‖ηz‖2 ‖ηy‖2

)(n−

14 +T−

14

))](A.15)

− 12

[bz ‖ηz‖2

2 + tr(Σ−1ε Σε0

)+OP

(‖ηz‖2

(n−

12 +T−

12

))]+OP

(n−1 +T−1) , (A.16)

Notice that with a slight abuse of notation, we list (A.14) separately from (A.15) because the former terms

come from

1nT

T

∑t=1

σ−2ξ



)(λ0−λ )+ εnt(δ0−δ )−

(δ′⊗ In

)Xntz(βz0−βz)

]′M ˜

Γny[Xnty(βy0−βy)+Gnt


)(λ0−λ )+ εnt(δ0−δ )−

(δ′⊗ In

)Xntz(βz0−βz)

], (A.17)

and M ˜Γny

is positive semi-definite, and therefore it is non-negative; while the later terms come from (A.10)

and (A.12). In the following analysis, we can take QnT (θ) from the right hand side of the preceding inequal-

ity for QnT (θ).

3. liminfn,T→∞ P(infθ∈Θ,‖θ−θ0‖2≥τ [QnT (θ0)−QnT (θ)]> 0

)= 1 for any τ > 0. From Eq. (A.1) and

(A.16),

QnT (θ0)−QnT (θ)

40

≥12

[bz ‖ηz‖2

2 +by ‖ηy‖22 +OP

(‖ηz‖2

2

)+OP

(‖ηz‖2 ‖ηy‖2

)]+

12

[OP

(‖ηz‖2

(n−

12 +T−

12

))+OP

(‖ηy‖2

(n−

12 +T−

12

))+OP

((‖ηy‖2

2 +‖ηz‖2 ‖ηy‖2

)(n−

14 +T−

14

))]+

12

(− log

[σ2

ξ 0

σ2ξ

(T

∏t=1


nt S−1nt′∣∣ 1

nT

)]+

σ2ξ 0

σ2ξ

1nT

T

∑t=1

tr(Snt(λ )


nt′)−1

)

+p2

(− log

∣∣Σε0Σ−1ε

∣∣ 1P +

1p

tr(Σε0Σ

−1ε

)−1)+OP(n−1 +T−1). (A.18)

By the inequality of arithmetic and geometric means,

1nT

T

∑t=1

tr(Snt(λ )


nt′)≥ 1

T

T

∑t=1


nt S−1nt′∣∣ 1

n ≥

(T

∏t=1


nt S−1nt′∣∣ 1

n

) 1T

,

(A.19)1p

tr(Σε0Σ

−1ε

)≥∣∣Σε0Σ

−1ε

∣∣ 1p . (A.20)

Furthermore, because by ‖ηy‖22 +OP

(‖ηz‖2

2

)+OP

(‖ηz‖2 ‖ηy‖2

)≥ 0 and ‖ηy‖2 is bounded,


≥12

(bz ‖ηz‖2

2 +oP (1))

(A.21)

+12

(− log

[σ2

ξ 0

σ2ξ

(T

∏t=1


nt S−1nt′∣∣ 1

nT

)]+

σ2ξ 0

σ2ξ

T

∏t=1


nt S−1nt′∣∣ 1

nT −1

)

+p2

(− log

∣∣Σε0Σ−1ε

∣∣ 1P +∣∣Σε0Σ

−1ε

∣∣ 1p −1

),

If ‖ηz‖2 = 0,


≥12

(by ‖ηy‖2

2 +oP (1))

(A.22)

+12

(− log

[σ2

ξ 0

σ2ξ

(T

∏t=1


nt S−1nt′∣∣ 1

nT

)]+

σ2ξ 0

σ2ξ

T

∏t=1


nt S−1nt′∣∣ 1

nT −1

)

+p2

(− log

∣∣Σε0Σ−1ε

∣∣ 1P +∣∣Σε0Σ

−1ε

∣∣ 1p −1

).

For any real number x > 0, − logx+ x− 1 ≥ 0 with equality holds only if x = 1. Because 1p tr(Σε0Σ−1

ε

)>∣∣Σε0Σ−1

ε

∣∣ 1p unless Σε = cΣε0 for some scalar c, and − log

∣∣Σε0Σ−1ε

∣∣ 1p +∣∣Σε0Σ−1

ε

∣∣ 1p −1 > 0 if c 6= 1, we have

QnT (θ0)−QnT (θ) > 0 if Σε 6= Σε0 with probability approaching (wpa) 1. Because the mapping from Σε

to α is one to one, QnT (θ0)−QnT (θ) > 0 if α 6= α0 wpa 1. For the other parameters, consider the case

with Assumption 4 first. If any of the following holds for some τ > 0, ‖βz0−βz‖2 ≥ τ , ‖βy0−βy‖2 ≥ τ ,|λ0−λ | ≥ τ , ‖δ0−δ‖2 ≥ τ , Eq. (A.21) and/or Eq. (A.22) is strictly greater than 0 wpa 1, and hence

41

QnT (θ0)−QnT (θ) > 0 wpa 1. Alternatively, consider the case with Assumption 5. The inequality in Eq.

(A.19) holds strictly for any λ 6= λ0 wpa 1. On the other hand, with λ = λ0, Eq. (A.21) and/or Eq. (A.22) is

strictly positive wpa 1 if for some τ > 0, either ‖βz0−βz‖2 ≥ τ , ‖βy0−βy‖2 ≥ τ , or ‖δ0−δ‖2 ≥ τ holds.

4. The rate of convergence. Let θ denote the QML estimator, ηz = βz0 − βz and

ηy =(

βy0− βy,λ0− λ ,δ ′0− δ ′)′

. Let cnT = min(n,T ). By definition, QnT(θ)−QnT (θ0) ≥ 0, which im-

plies from Eq. (A.18),

by ‖ηy‖22 +OP

(‖ηz‖2

2

)+OP

(‖ηz‖2 ‖ηy‖2

)+bz ‖ηz‖2

2 +OP

(‖ηz‖2 c−

12

nT

)+OP

(‖ηy‖2 c−

12

nT

)+OP

((‖ηy‖2

2 +‖ηz‖2 ‖ηy‖2

)c−

14

nT

)+OP

(c−1

nT

)≤ 0.

(A.23)

Completing the squares, we have from (A.23),(b

12y,nT ‖ηy‖2 +

12

ay,nT b−12

y,nT

)2

+dy,nT −14

a2y,nT b−1

y,nT ≤ 0, (A.24)

with by,nT = by+OP

(c−

14

nT

), ay,nT =OP

(c−

12

nT

)+OP (‖ηz‖2)+OP

(‖ηz‖2 c−

14

nT

)and dy,nT =OP

(‖ηz‖2

2

)+

bz ‖ηz‖22 +OP

(‖ηz‖2 c−

12

nT

)+OP

(c−1

nT

). Recall that by ‖ηy‖2

2 +OP

(‖ηz‖2

2

)+OP

(‖ηz‖2 ‖ηy‖2

)≥ 0 (see

(A.17)), then (A.23) implies that

bz ‖ηz‖22 +‖ηz‖2 OP

(c−

12

nT

)︸︷︷︸

az,nT

+OP

(‖ηy‖2 c−

12

nT

)+OP

((‖ηy‖2

2 +‖ηz‖2 ‖ηy‖2

)c−

14

nT

)+OP

(c−1

nT

)︸︷︷︸

dz,nT

≤ 0.

(A.25)

Similarly, by completing the squares, from (A.25),(b

12z ‖ηz‖2 +

12

az,nT b−12

z

)2

+dz,nT −14

a2z,nT b−1

z ≤ 0. (A.26)

Because the parameter spaces are bounded, dz,nT− 14 a2

z,nT b−1z =OP

(c−

14

nT

), b

12z ‖ηz‖2+

12 az,nT b−

12

z =Op

(c− 1

8nT

).

As az,nT = OP

(c−

12

nT

), we then have ‖ηz‖2 = OP

(c− 1

8nT

).

Now we proceed to show that ‖ηz‖2 = OP

(c−

12

nT

). Assuming that for some 1

8 ≤ a < 12 , ‖ηz‖2 =

OP(c−a

nT

). Substituting ‖ηz‖2 = OP

(c−a

nT

)into Eq. (A.24), we have dy,nT − 1

4 a2y,nT b−1

y,nT = OP(c−2a

nT

), ay,nT =

OP(c−a

nT

). Therefore ‖ηy‖2 = OP

(c−a

nT

). Substituting ‖ηy‖2 = OP

(c−a

nT

)into Eq. (A.26), we have dz,nT =

OP

(c−

12−a

nT + c−14−2a

nT

), which gives ‖ηz‖2 = OP

(max

(c−

14−

12 a

nT ,c− 1

8−anT

)). Note that for a < 1

2 , −a >

max(−1

4 −12 a,−1

8 −a)

and we can always improve the rate for a < 12 . Therefore ‖ηz‖2 = OP

(c−

12

nT

).

42

Then ‖ηy‖2 = OP

(c−

12

nT

)follows from Eq. (A.24).

Lemma 5. Let ξnT = σ−1ξ 0 (ξ ′n1, · · · ,ξ ′nT )

′. Under Assumptions 1-3 and 6,

1nT σ2

ξ 0

T

∑t=1


′Snt(λ )S−1nt ξnt =

1nT

T

∑t=1

tr(S−1

nt′Snt(λ )


1√nT

).

Proof. We have

1nT σ2

ξ 0

T

∑t=1


′Snt(λ )S−1nt ξnt

=1

nTξnT

′ (InT +(λ0−λ )(GnT + G′nT

)+(λ0−λ )2G′nT GnT

)ξnT

=1

nTEtr(InT +(λ0−λ )

(GnT + G′nT


)+OP(

1√nT

) (A.27)

=1

nTtr(InT +(λ0−λ )

(GnT + G′nT


)+OP(

1√nT

) (A.28)

=1

nT

T

∑t=1

tr(S−1

nt′Snt(λ )


1√nT

),

where Eq. (A.27) is from Lemma 2. Let ιk,nT be an nT × 1 vector with one in its k-th entry and zerosin its other entries. Because tr

(GnT

)= ∑

nTk=1 gk,nT with gk,nT = ι ′k,nT GnT ιk,nT and gk,nT satisfies the NED

properties, Eq. (A.28) follows from Qu and Lee (2015).

A4. Components of the Asymptotic Distribution

As Eq. (11) shows, the asymptotic distribution depends on the approximate Hessian matrix CnT and the

approximate gradient vector C(1)nT whose components can be collected from Eq. (10) and further simplified

using Lemmas 2 and 5(2). Recall that the parameter vector is θ =(β ′z ,β

′y,λ ,αξ ,α

′,δ ′)′.

(1) Elements of C(1)nT (the Approximate Score Vector)

The following lemma establishes that some of the second order terms in Eqs. (8) and (9) do not appear

in the leading order term of the score vector.

Lemma 6. Under Assumptions 2 and 3,

1√nT

tr(

MΓnzVzk MFT zE

′nT

(Σ− 1

2ε0 ⊗ In

)PΓnz,FT zE

′nT

(Σ− 1

2ε0 ⊗ In

))= oP(1),

1√nT

tr(

MΓnz

(Σ− 1

2ε0 ⊗ In

)EnT MFT zV

zk′PΓnz,FT zE

′nT

(Σ− 1

2ε0 ⊗ In

))= oP(1),

1√nT

tr(

MΓnz

(Σ− 1

2ε0 ⊗ In

)EnT MFT zE

′nT

(Σ− 1

2ε0 ⊗ In

)PΓnz,FT zV

zk′)= oP(1),

43

for k = 1, · · · ,kz + J, and V zk are from Section 4.3;

1√nT

tr(MΓnyV

yk MFTyΞ

′nT PΓny,FTyΞ

′nT)=

1√nT

tr(MΓnyΞnT MFTyV

yk′PΓny,FTyΞ

′nT)= oP(1),

1√nT

tr(MΓnyΞnT MFTyΞ

′nT PΓny,FTyV

yk′)= oP(1),

for k = 1, · · · ,kz + ky +2+ p, and V yk are from Section 4.3.

Proof. From Moon and Weidner (2014) (Lemma B.1),∥∥PΓnz,FT z

∥∥2 =OP

(1√nT

)and

∥∥PΓny,FTy

∥∥2 =OP

(1√nT

).

They also show that the above terms are oP(1) if the regressor contains only exogenous components. Theirresult can be applied here except for the three terms, −(Ωk⊗ In)EnT in V z

kz+k for k = 1, · · · ,J,σ−1ξ 0 (Gn1ξn1, · · · ,GnT ξnT ) in V y

kz+ky+1, and −ΞnT in V ykz+ky+2. However, because these terms are oP

(√nT),

the relevant terms are oP(1) by using the inequality |tr(A)| ≤ rank(A)‖A‖2 for a square matrix A.

The followings are the components in the score vector. The oP(1) terms in the first lines are due to

Lemma 6.

• βz: for k = 1, · · · ,kz,

C(1)nT,k =

1√nT

tr(

MFT zX′nT z,k

(Σ− 1

2ε0 ⊗ In

)MΓnzenT

)− 1√

nT σ2ξ 0

tr(MFTyX

′nT z,k (δ0⊗ In)MΓnyΞnT

)+oP(1)

=1√nT

(a(k)1nT

′enT +b(k)1nT′ξnT

)+oP(1),

where a(k)1nT = vec(

MΓnz

(Σ− 1

2ε0 ⊗ In

)XnT z,kMFT z

)and b(k)1nT =−σ

−1ξ 0 vec

(MΓny (δ

′0⊗ In) XnT z,kMFTy

).

• βy: for k = 1, · · · ,ky,

C(1)nT,kz+k =

1√nT σ2

ξ 0

tr(MFTyX

′nTy,kMΓnyΞnT

)+oP(1) =

1√nT

b(k)2nT′ξnT +oP(1),

where b(k)2nT = σ−1ξ 0 vec

(MΓnyXnTy,kMFTy

).

• λ :

C(1)nT,kz+ky+1 =−

1√nT

tr(GnT

)+

1√nT σξ 0

ξ′nT(MFTy⊗MΓny

)GnT

(XnTyβy0 +

(IT ⊗ Γny

)FTy + ˜enT Σ

12ε0δ0

)+

1√nT

ξ′nT GnT ξnT −

1√nT

ξ′nT(PFTy⊗ In

)GnT ξnT −

1√nT

ξ′nT(IT ⊗PΓny

)GnT ξnT +oP(1)

=1√nT

(b′3nT ξnT + ξ

′nT B1nT ξnT − tr(B1nT )

)− 1√

nTtr((

PFT ⊗ In + IT ⊗PΓny

)GnT

)+oP(1),

(A.29)

where GnT = diag(Gn1, · · · ,GnT ) is an nT × nT matrix, XnTy =(

X ′n1y, · · · ,X ′nTy

)′is nT × ky, FTy =

44

(f ′y1, · · · , f ′yT

)′is T Ry0×1, ˜enT =

e111 e112 · · · e11p...

......

en11 en12 · · · en1p...

......

enT 1 enT 2 · · · enT p

is nT × p,

b3nT =σ−1ξ 0

(MFTy⊗MΓny

)GnT

(XnTyβy0 +

(IT ⊗ Γny

)FTy + ˜enT Σ

12ε0δ0

), and

B1nT =12(InT −PFTy⊗ In− IT ⊗PΓny

)GnT +

12

G′nT(InT −PFTy⊗ In− IT ⊗PΓny

).

• αξ :

C(1)nT,kz+ky+2 =−

σξ 0√nT

ξ′nT(MFTy⊗MΓny

)ξnT +

√nT σξ 0 +oP(1)

=1√nT

(ξ′nT B2nT ξnT − tr(B2nT )

)+

(√nT+

√Tn

)Ry0σξ 0 +oP(1),

where B2nT =−σξ 0MFTy⊗MΓny .

• α: for k = 1, · · · ,J, by using the relation tr(B′1B2B3B4) = vec(B1)′ (B′4⊗B2)vec(B3) for any con-

formable matrices B1, · · · ,B4,

C(1)nT,kz+ky+2+k =

√nT tr

(Σ

12ε0Ωk

)− 1√

nTtr(

e′nT

(Σ

12ε0Ωk⊗ In

)MΓnzenT MFT z

)+oP(1)

=1√nT

(e′nT A(k)

nT enT − tr(

A(k)nT

))+

√Tn

tr[(

Σ12ε0Ωk⊗ In

)PΓnz

]+Rz0

√nT

tr[

Σ12ε0Ωk

]+oP(1),

where A(k)nT = 1

2

(A(k)

nT + A(k)nT′)

with A(k)nT =−MFT z⊗

((Σ

12ε0Ωk⊗ In

)MΓnz

). Note that Ωk comes from

the linear parameterization of square root of the variance, Σ− 1

2ε = ∑

Jj=1 Ω jα j.

• δ : for k = 1, · · · , p, by using the relation tr(B′1B2B3B4) = vec(B1)′ (B′4⊗B2)vec(B3),

C(1)nT,kz+ky+2+J+k =

1√nT σ2

ξ 0

tr[MΓnyΞnT MFTy

(ΓnzF ′T z +EnT

)′(ek⊗ In)

]+oP(1)

=1√nT

(b(k)4nT

′ξnT + e′nT D(k)

nT ξnT

)+oP(1),

where ek be a p × 1 vector with one in its k-th entry and zeros in its other entries,

b(k)4nT = σ−1ξ 0 vec

(MΓny

(e′k⊗ In

)ΓnzF ′T zMFTy

)and D(k)

nT = σ−1ξ 0 MFTy⊗

((Σ

12ε0ek⊗ In

)MΓny

).

45

(2) Variance of C(1)nT −ϕnT

ϕnT is defined in Proposition 2 and E(

C(1)nT −ϕnT

)= 0. The variance of C(1)

nT −ϕnT can be straightfor-

wardly shown to be ΣnT +ΠnT with terms given below:

ΣnT =

ΣβzβznT Σ

βzβynT Σ

βzλnT 0 0 Σ

βzδnT

ΣβyβynT Σ

βyλ

nT 0 0 Σβyδ

nT

∗ ΣλλnT Σ

λαξ

nT 0 ΣλδnT

∗ ∗ Σαξ αξ

nT 0 0∗ ∗ ∗ Σαα

nT 0∗ ∗ ∗ ∗ Σδδ

nT

, ΠnT =

0 0 ΠβzλnT Π

βzαξ

nT ΠβzαnT 0

0 Πβyλ

nT Πβyαξ

nT 0 0

∗ ΠλλnT Π

λαξ

nT 0 ΠλδnT

∗ ∗ Παξ αξ

nT 0 Παξ δ

nT

∗ ∗ ∗ ΠααnT 0

∗ ∗ ∗ ∗ 0

,

(A.30)

with[Σ

βzβznT

]k1k2

=1

nTtr(

MFT zX′nT z,k1

(Σ− 1

2ε0 ⊗ In

)MΓnz

(Σ− 1

2ε0 ⊗ In

)XnT z,k2

)+

1nT σ2

ξ 0tr(MFTyX

′nT z,k1

(δ0⊗ In)MΓny

(δ′0⊗ In

)XnT z,k2

)[Σ

βzβynT

]k1k2

=− 1nT σ2

ξ 0tr(MFTyX

′nT z,k1

(δ0⊗ In)MΓnyXnTy,k2

)[Σ

βzλnT

]k

=1

nTE(

b(k)1nT′b3nT

)[Σ

βzδnT

]k1k2

=1

nTb(k1)

1nT′b(k2)

4nT =− 1nT σ2

ξ 0tr(MFTyX

′nT z,k1

(δ0⊗ In)MΓny

(e′k2⊗ In

)ΓnzF ′T z

)[Σ

βyβynT

]k1k2

=1

nT σ2ξ 0

tr(MFTyX

′nTy,k1

MΓnzXnTy,k2

)[Σ

βyλ

nT

]k

=1

nTE(

b(k)2nT′b3nT

)[Σ

βyδ

nT

]k1k2

=1

nTb(k1)

2nT′b(k2)

4nT =1

nT σ2ξ 0

tr(MFTyX

′nTy,k1

MΓny

(e′k2⊗ In

)ΓnzF ′T z

)Σ

λλnT =

1nT

E(b′3nT b3nT +2tr

(B2

1nT))

=1

nTE(b′3nT b3nT + tr

(G2

nT + G′nT GnT))

+o(1)

Σλαξ

nT =2

nTE(tr(B1nT B2nT )) =−

2σξ 0

nTEtr(GnT

)[Σ

λδnT

]k

=1

nTE(

b′3nT

(b(k)4nT +D(k)

nT′enT

))Σ

αξ αξ

nT =2

nTtr(B2

2nT)= 2σ

2ξ 0 +o(1)

[ΣααnT ]k1k2

=2

nTtr(

A(k1)nT A(k2)

nT

)= tr(Σε0Ωk1Ωk2)+ tr

(Σ

12ε0Ωk1Σ

12ε0Ωk2

)+o(1)[

ΣδδnT

]k1k2

=1

nT

(b(k1)

4nT′b(k2)

4nT + tr(

D(k1)nT′D(k2)

nT

))

46

=1

nT σ2ξ 0

tr(MFTyF

′T zΓnz (ek1⊗ In)MΓny

(e′k2⊗ In

)ΓnzFT zMFTy

)+σ

−2ξ 0 tr

(e′k1

Σε0ek2

)+o(1),

and[Π

βzλnT

]k

=1

nTE

(nT

∑i=1

b(k)1nT,iB1nT,iiξ3nT,i

)[Π

βzαξ

nT

]k

=−σξ 0

nT

nT

∑i=1

b(k)1nT,i

[MFTy⊗MΓny

]iiEξ

3nT,i[

ΠβzαnT

]k1k2

=1

nT

npT

∑i=1

a(k1)1nT,iA

(k2)nT,iiEe3

nT,i

[Π

βyλ

nT

]k

=1

nTE

(nT

∑i=1

b(k)2nT,iB1nT,iiξ3nT,i

)[Π

βyαξ

nT

]k

=−σξ 0

nT

nT

∑i=1

b(k)2nT,i

[MFTy⊗MΓny

]iiEξ

3nT,i

ΠλλnT =

1nT

E

(nT

∑i=1

(B1nT,ii)2(

ξ4nT,i−3

)+2

nT

∑i=1

b3nT,iB1nT,iiξ3nT,i

)

Πλαξ

nT =1

nTE

(nT

∑i=1

B1nT,iiB2nT,ii

(ξ

4nT,i−3

)+2

nT

∑i=1

b3nT,iB2nT,iiξ3nT,i

)[Π

λδnT

]k

=1

nTE

(nT

∑i=1

npT

∑s=1

B1nT,ii

(b(k)4nT,i +D(k)

nT,sienT,s

)ξ

3nT,i

)

Παξ αξ

nT =1

nT

nT

∑i=1

(B2nT,ii)2(Eξ

4nT,i−3

)[Π

αξ δ

nT

]k

=1

nTE

(nT

∑i=1

B2nT,iib(k)4nT,iξ

3nT,i

)

[ΠααnT ]k1k2

=1

nT

npT

∑i=1

A(k1)nT,iiA

(k2)nT,ii

(Ee4

nT,i−3).

(3) Elements of CnT (the Approximate Hessian Matrix)

CnT =

CβzβznT Cβzβy

nT CβzλnT C

βzαξ

nT CβzαnT Cβzδ

nT

CβyβynT Cβyλ

nT Cβyαξ

nT 0 Cβyδ

nT

∗ CλλnT C

λαξ

nT 0 CλδnT

∗ ∗ Cαξ αξ

nT 0 Cαξ δ

nT

∗ ∗ ∗ CααnT 0

∗ ∗ ∗ ∗ CδδnT

.

[Cβzβz

nT

]k1k2

=1

nTtr(

MFT zX′nT z,k1

(Σ− 1

2ε0 ⊗ In

)MΓnz

(Σ− 1

2ε0 ⊗ In

)XnT z,k2

)47

+1

nT σ2ξ 0

tr(MFTyX

′nT z,k1

(δ0⊗ In)MΓny

(δ′0⊗ In

)XnT z,k2

)[Cβzβy

nT

]k1k2

=− 1nT σ2

ξ 0tr(MFTyX

′nT z,k1

(δ0⊗ In)MΓnyXnTy,k2

)[Cβzλ

nT

]k

=1

nTb(k)1nT

′b3nT −1

nT σξ 0vec(MΓny

(δ′0⊗ In

)XnT z,kMFTy

)′ (MFTy⊗MΓny

)GnT ξnT

=1

nTE(

b(k)1nT′b3nT

)+oP(1)[

Cβzαξ

nT

]k

=1

nT σξ 0tr(MFTyX

′nT z,k1

(δ0⊗ In)MΓnyΞnT)= oP(1)[

CβzαnT

]k1k2

=− 1nT

tr(

MΓnz

(Σ− 1

2ε0 ⊗ In

)XnT z,k1MFT zE

′nT (Ωk⊗ In)

)= oP(1)

[Cβzδ

nT

]k1k2

=− 1nT σ2

ξ 0tr

(MΓny

(δ′0⊗ In

)XnT z,k1MFTy

(kz

∑k=1

XnT z,k

(βz0,k− βz,k

)+ ΓnzF ′T z +EnT

)′(ek2⊗ In)

)

=− 1nT σ2

ξ 0tr(MFTyX

′nT z,k1

(δ0⊗ In)MΓny

(e′k2⊗ In

)ΓnzF ′T z

)+oP(1)[

CβyβynT

]k1k2

=1

nT σ2ξ 0

tr(MΓnyXnTy,k1MFTyX

′nTy,k2

)[Cβyλ

nT

]k

=1

nTb(k)2nT

′b3nT +1

nT σξ 0vec(MΓnyXnTy,kMFTy

)′ (MFTy⊗MΓny

)GnT ξnT =

1nT

E(

b(k)2nT′b3nT

)+oP(1)[

Cβyαξ

nT

]k

=− 1nT σξ 0

tr(MΓnyXnTy,kMFTyΞnT

)= oP(1)

[Cβyδ

nT

]k1k2

=1

nT σ2ξ 0

tr

(MΓnyXnTy,k1MFTy

(kz

∑k=1

XnT z,k

(βz0,k− βz,k


)′(ek2⊗ In)

)

=1

nT σ2ξ 0

tr(MFTyX

′nTy,k1

MΓny

(e′k2⊗ In

)ΓnzF ′T z

)+oP(1)

CλλnT =

1nT σ2

ξ 0tr(

MΓnyXnTy,ky+1MFTyX′nTy,ky+1

)+

1nT

tr(G2

nT)

=1

nTb′3nT b3nT +

1nT

ξ′nT G′nT

(MFTy⊗MΓny

)GnT ξnT +

1nT

tr(G2

nT)+oP(1)

=1

nTE(b′3nT b3nT

)+

1nT

(tr(G2

nT)+ tr

(G′nT GnT

))+op(1)

Cλαξ

nT =−σξ 0

nTξ′nT GnT ξnT −

1nT σξ 0

tr(MΓnyXnTy,ky+1MFTyΞ

′nT)

=−σξ 0

nTξ′nT GnT ξnT −

σξ 0

nTξ′nT(MFTy⊗MΓny

)GnT ξnT +oP(1) =−

2σξ 0

nTEtr(GnT

)+oP(1)[

CλδnT

]k

=1

nT σ2ξ 0

tr

(MΓnyXnTy,ky+1MFTy

(kz

∑k=1

XnT z,k

(βz0,k− βz,k


)′(ek⊗ In)

)

=1

nTE(

b′3nT

(b(k)4nT +D(k)

nT′enT

))+oP(1)

Cαξ αξ

nT =1

nTtr(MΓnyξnT MFTyξ

′nT)+σ

2ξ 0

48

=σ2

ξ 0

nTξ′nT(MFTy⊗MΓny

)ξnT +σ

2ξ 0 = 2σ

2ξ 0 +oP(1)[

Cαξ δ

nT

]k

=− 1nT σξ 0

tr

(MΓnyΞnT MFTy

(kz

∑k=1

XnT z,k

(βz0,k− βz,k


)′(ek⊗ In)

)= oP(1)

[Cαα

nT]

k1k2=

1nT

tr(MΓnz (Ωk1⊗ In)EnT MFT zE

′nT (Ωk2⊗ In)

)+ tr

(Σ

12ε0Ωk1Σ

12ε0Ωk2

)=

1nT

e′nT

(MFT z⊗

((Σ

12ε0Ωk2⊗ In

)MΓnz

(Ωk1Σ

12ε0⊗ In

)))enT + tr

(Σ

12ε0Ωk1Σ

12ε0Ωk2

)+oP(1)

=tr(Σε0Ωk1Ωk2)+ tr(

Σ12ε0Ωk1Σ

12ε0Ωk2

)+oP(1)[

CδδnT

]k1k2

=1

nT σ2ξ 0

tr

(MΓny

(e′k1⊗ In

)( kz

∑k=1

XnT z,k

(βz0,k− βz,k


)

MFTy

(kz

∑k=1

XnT z,k

(βz0,k− βz,k


)′(ek2⊗ In)

)

=1

nT σ2ξ 0

tr(MΓny

(e′k1⊗ In

)EnT MFTyE

′nT (ek2⊗ In)

)+

1nT σ2

ξ 0tr(MΓny

(e′k1⊗ In

)ΓnzF ′T zMFTyFT zΓ

′nz (ek2⊗ In)

)+oP(1)

=σ−2ξ 0 tr

(e′k1

Σε0ek2

)+

1nT σ2

ξ 0tr(MFTyF

′T zΓnz (ek1⊗ In)MΓny

(e′k2⊗ In

)ΓnzFT z

)+op(1).

A5. Proof of Proposition 3

We show that Corollary 1 of Ahn and Horenstein (2013) holds by checking that their Assumption A-D

are satisfied. Assuming that n and T are proportional, and the preliminary estimator satisfies∥∥θ −θ0

∥∥2 =

op

(n−

12

), we have the following results.

1.∥∥∥∥(Σ

− 12

ε0 ⊗ In

)EnT + EnT (θ)

∥∥∥∥2=Op (

√n), because ‖EnT‖2 =Op (

√n) by Assumption 2 and

∥∥EnT(θ)∥∥

2≤

∑kz+Jk=1

∥∥ηzk

∥∥2

∥∥V zk

∥∥2 +∑

kz+Jk1,k2=1

∥∥∥ηzk1

∥∥∥2

∥∥∥ηzk2

∥∥∥2

∥∥∥V zk1k2

∥∥∥2= op (

√n).

2. µnp

(1n

((Σ− 1

2ε0 ⊗ In

)EnT + EnT (θ)

)((Σ− 1

2ε0 ⊗ In

)EnT + EnT (θ)

)′)≥ c+op(1) for some positive

constant c. This is because

µnp

(1n

((Σ− 1

2ε0 ⊗ In

)EnT + EnT (θ)

)((Σ− 1

2ε0 ⊗ In

)EnT + EnT (θ)

)′)≥µnp

(1n

(Σ−1ε0 ⊗ In

)EnT E ′nT

)+µnp

(1n

EnT(θ)

EnT(θ)′)

+µnp

(1n

(Σ− 1

2ε0 ⊗ In

)EnT EnT

(θ)′+

1n

EnT(θ)

E ′nT

(Σ− 1

2ε0 ⊗ In

))(A.31)

≥µnp

(1n

(Σ−1ε0 ⊗ In

)EnT E ′nT

)− 1

n

∥∥∥EnT(θ)

EnT(θ)′∥∥∥

2

49

− 1n

∥∥∥∥(Σ− 1

2ε0 ⊗ In

)EnT EnT

(θ)′+ EnT

(θ)

E ′nT

(Σ− 1

2ε0 ⊗ In

)∥∥∥∥2

=µnp

(1n

(Σ−1ε0 ⊗ In

)EnT E ′nT

)+oP(1)≥ c+oP(1),

where the inequality in Eq. (A.31) can be found in Theorem 8.12 of Zhang (2011), and the last

inequality is from Lemma A.1 of Ahn and Horenstein (2013) (due to Bai and Yin (1993)).

3. For any r×q matrix A = (a1, · · · ,aq) such that A′A = T Iq,

1n3

∣∣∣∣tr(A′FT zΓ′nz

((Σ− 1

2ε0 ⊗ In

)EnT + EnT

(θ))

A)∣∣∣∣

≤Rz0

n3

∥∥AA′∥∥

2

∥∥FT zΓ′nz

∥∥2

∥∥∥∥(Σ− 1

2ε0 ⊗ In

)EnT + EnT

(θ)∥∥∥∥

2= OP

(n−

12

).

Similarly, we have

1n3

∣∣∣∣tr(A′((

Σ− 1

2ε0 ⊗ In

)EnT + EnT

(θ))′

Γnz(Γ′nzΓnz

)−1Γ′nz

((Σ− 1

2ε0 ⊗ In

)EnT + EnT

(θ))

A)∣∣∣∣

≤Rz0

n3

∥∥AA′∥∥

2

∥∥∥∥(Σ− 1

2ε0 ⊗ In

)EnT + EnT

(θ)∥∥∥∥2

2

∥∥∥Γnz(Γ′nzΓnz

)−1Γ′nz

∥∥∥2= OP

(n−1) .

In Ahn and Horenstein (2013), Assumptions C and D can be replaced by the conditions in their Eqs. (2)

and (3), which are satisfied by 1 and 2 above. Their Assumption B is used to prove Lemma A.10, which is

satisfied by 3, and their Assumption A is satisfied by our Assumption 8. It is straightforward to check that

GnT also satisfies the relevant conditions and similar methods can be applied to determine the number of

factors in the y equation. In summary, Ahn and Horenstein (2013)’s result applies here.

A6. Additional Monte Carlo Results

Estimation with Redundant Factors

Tables 9 and 10 provide the Monte Carlo results where the number of factors in the outcome equation is

over-specified by 2 and 3.

50



(1) 25 25 0.2 0.2 Bias 0.00252 -0.00255 -0.00276 0.07660 0.00197 0.00071CP 0.952 0.879 0.858 0.228 0.945 0.903

49 25 0.2 0.2 Bias 0.00013 0.00165 -0.00099 0.05266 0.00086 0.00175CP 0.945 0.909 0.902 0.226 0.943 0.924

25 49 0.2 0.2 Bias -0.00115 -0.00006 -0.00106 0.05187 0.00135 0.00151CP 0.943 0.924 0.891 0.242 0.940 0.917

49 49 0.2 0.2 Bias 0.00028 0.00171 -0.00078 0.03550 0.00063 0.00040CP 0.948 0.932 0.916 0.242 0.945 0.931

(2) 25 25 0.2 0.6 Bias 0.00108 -0.00353 -0.00086 0.09293 0.00189 -0.00009CP 0.949 0.898 0.867 0.224 0.950 0.894

49 25 0.2 0.6 Bias 0.00020 0.00097 -0.00077 0.06438 0.00047 0.00032CP 0.948 0.923 0.909 0.224 0.949 0.919

25 49 0.2 0.6 Bias -0.00119 -0.00040 -0.00028 0.06389 0.00073 0.00068CP 0.947 0.931 0.896 0.227 0.938 0.916

49 49 0.2 0.6 Bias 0.00047 0.00144 -0.00110 0.04330 0.00051 0.00092CP 0.942 0.924 0.912 0.249 0.950 0.922

(3) 25 25 0.6 0.2 Bias 0.00252 -0.00148 -0.00313 0.07603 0.00197 0.00166CP 0.952 0.878 0.865 0.244 0.945 0.895

49 25 0.6 0.2 Bias 0.00013 0.00203 -0.00128 0.05237 0.00086 0.00186CP 0.945 0.906 0.903 0.245 0.943 0.922

25 49 0.6 0.2 Bias -0.00115 0.00051 -0.00157 0.05158 0.00135 0.00228CP 0.943 0.927 0.901 0.277 0.940 0.918

49 49 0.6 0.2 Bias 0.00028 0.00200 -0.00091 0.03533 0.00063 0.00046CP 0.948 0.931 0.918 0.272 0.945 0.930

(4) 25 25 0.6 0.6 Bias 0.00108 -0.00268 -0.00195 0.09242 0.00189 0.00134CP 0.948 0.901 0.866 0.244 0.950 0.899

49 25 0.6 0.6 Bias 0.00020 0.00127 -0.00096 0.06413 0.00047 0.00055CP 0.948 0.927 0.915 0.239 0.949 0.919

25 49 0.6 0.6 Bias -0.00119 -0.00001 -0.00078 0.06368 0.00073 0.00171CP 0.948 0.936 0.901 0.252 0.938 0.918

49 49 0.6 0.6 Bias 0.00047 0.00170 -0.00100 0.04309 0.00051 0.00106CP 0.942 0.927 0.914 0.266 0.950 0.919



43 , σe0 =

√43 (hence α0 =

( 43)− 1

2 ). In the low



nT Σ−1nT .

51



(1) 25 25 0.2 0.2 Bias 0.00252 -0.00269 -0.00357 0.11187 0.00197 0.00152CP 0.952 0.860 0.811 0.040 0.945 0.866

49 25 0.2 0.2 Bias 0.00013 0.00168 -0.00179 0.07701 0.00086 0.00143CP 0.945 0.905 0.866 0.031 0.943 0.900

25 49 0.2 0.2 Bias -0.00115 -0.00049 -0.00181 0.07649 0.00135 0.00150CP 0.944 0.920 0.837 0.033 0.940 0.878

49 49 0.2 0.2 Bias 0.00028 0.00196 -0.00107 0.05199 0.00063 0.00048CP 0.948 0.922 0.895 0.036 0.945 0.911

(2) 25 25 0.2 0.6 Bias 0.00108 -0.00473 -0.00100 0.13613 0.00189 -0.00024CP 0.947 0.884 0.845 0.033 0.950 0.874

49 25 0.2 0.6 Bias 0.00020 0.00119 -0.00093 0.09410 0.00047 0.00005CP 0.948 0.922 0.880 0.032 0.949 0.890

25 49 0.2 0.6 Bias -0.00119 -0.00035 -0.00034 0.09390 0.00073 0.00050CP 0.947 0.924 0.869 0.037 0.938 0.892

49 49 0.2 0.6 Bias 0.00047 0.00151 -0.00121 0.06342 0.00051 0.00058CP 0.942 0.925 0.902 0.036 0.950 0.917

(3) 25 25 0.6 0.2 Bias 0.00252 -0.00120 -0.00463 0.11097 0.00197 0.00268CP 0.952 0.861 0.808 0.049 0.945 0.871

49 25 0.6 0.2 Bias 0.00013 0.00229 -0.00204 0.07658 0.00086 0.00156CP 0.945 0.901 0.869 0.042 0.943 0.903

25 49 0.6 0.2 Bias -0.00115 0.00049 -0.00286 0.07594 0.00135 0.00238CP 0.943 0.924 0.847 0.049 0.940 0.887

49 49 0.6 0.2 Bias 0.00028 0.00227 -0.00113 0.05178 0.00063 0.00056CP 0.948 0.918 0.893 0.044 0.945 0.912

(4) 25 25 0.6 0.6 Bias 0.00108 -0.00365 -0.00247 0.13543 0.00189 0.00134CP 0.947 0.879 0.849 0.044 0.950 0.869

49 25 0.6 0.6 Bias 0.00020 0.00151 -0.00102 0.09383 0.00047 0.00023CP 0.948 0.920 0.885 0.040 0.949 0.892

25 49 0.6 0.6 Bias -0.00119 0.00020 -0.00117 0.09356 0.00073 0.00173CP 0.948 0.927 0.873 0.039 0.938 0.892

49 49 0.6 0.6 Bias 0.00047 0.00178 -0.00110 0.06319 0.00051 0.00073CP 0.942 0.923 0.894 0.040 0.950 0.918



43 , σe0 =

√43 (hence α0 =

( 43)− 1

2 ). In the low



nT Σ−1nT .

52

and common factorsi - wei shi · many spatial panel data sets exhibit cross sectional and/or...

Documents