Journal of Econometrics 157 (2010) 191–204

journal homepage: www.elsevier.com/locate/jeconom

On the asymptotic optimality of the LIML estimator with possibly many instruments☆

T.W. Anderson a,b, Naoto Kunitomo c,∗, Yukitoshi Matsushita c

a Department of Statistics, Stanford University, United States
b Department of Economics, Stanford University, United States
c Graduate School of Economics, University of Tokyo, Bunkyo-ku, Hongo 7-3-1 Tokyo, Japan

Article info

Article history:
Received 21 December 2007
Received in revised form 1 December 2009
Accepted 4 December 2009
Available online 24 December 2009

JEL classification: C13, C30

Keywords: Structural equation; Simultaneous equations system; Many instruments; Many weak instruments; Limited information maximum likelihood; Asymptotic optimality

Abstract

We consider the estimation of the coefficients of a linear structural equation in a simultaneous equation system when there are many instrumental variables. We derive some asymptotic properties of the limited information maximum likelihood (LIML) estimator when the number of instruments is large; some of these results are new as well as old, and we relate them to results in some recent studies. We have found that the variance of the limiting distribution of the LIML estimator and its modifications often attains the asymptotic lower bound when the number of instruments is large and the disturbance terms are not necessarily normally distributed, that is, for the micro-econometric models of some cases recently called many instruments and many weak instruments.

© 2009 Elsevier B.V. All rights reserved.

☆ AKM09-11-23-2. This is a revised first part of Discussion Paper CIRJE-F-321 under the title "A New Light from Old Wisdoms: Alternative Estimation Methods of Simultaneous Equations and Microeconometric Models" (Graduate School of Economics, University of Tokyo, February 2005) which was presented at the Econometric Society World Congress 2005 at London (August 2005). We thank Yoichi Arai, the co-editor and the referees of this journal for some comments to the earlier versions.
∗ Corresponding author. Tel.: +81 3 5841 5614. E-mail address: [email protected] (N. Kunitomo).

0304-4076/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2009.12.001

1. Introduction

Over the past three decades there has been increasing interest and research on the estimation of a single structural equation in a system of simultaneous equations when the number of instruments (the number of exogenous variables excluded from the structural equation), say $K_2$, is large relative to the sample size, say $n$. The relevance of such models is due to the collection of large data sets and the development of computational equipment capable of analysis of such data sets. One empirical example of this kind often cited in econometric literature is Angrist and Krueger (1991); there has been some discussion by Bound et al. (1995) since then. Asymptotic distributions of estimators and test criteria are developed on the basis that both $K_2 \to \infty$ and $n \to \infty$. These asymptotic distributions are used as approximations to the distributions of the estimators and criteria when $K_2$ and $n$ are large.

Bekker (1994) has written "To my knowledge a first mention of such a parameter sequence was made, with respect to the linear functional relationship model, in Anderson (1976, p. 34). This work was extended to simultaneous equations by Kunitomo (1980, 1982) and Morimune (1983), who gave asymptotic expansions for the case of a single explanatory endogenous variable". Following Bekker there have been many studies of the behavior of estimators of the coefficients of a single equation when $K_2$ and $n$ are large.

The main purpose of the present paper is to show that one estimator, the Limited Information Maximum Likelihood (LIML) estimator, has some optimum properties when $K_2$ and $n$ are large. As background we state and derive some asymptotic distributions of the LIML and Two-Stage Least Squares (TSLS) estimators as $K_2 \to \infty$ and $n \to \infty$. Some of these results are improvements on Kunitomo (1981, 1982), Morimune (1983) and Bekker (1994), Chao and Swanson (2005), van Hasselt (2006), Hansen et al. (2008), and they are presented in a unified notation.

In addition to the LIML and TSLS estimators there are other instrumental variables (IV) methods. See Anderson et al. (1982) on the earlier studies of the finite sample properties, for instance.


Several semiparametric estimation methods have been developed, including the generalized method of moments (GMM) estimation and the empirical likelihood (EL) method. (See Hayashi, 2000 for instance.) However, it has been recently recognized that the classical methods have some advantages in microeconometric situations with many instruments.

In this paper we shall give the results on the asymptotic properties of the LIML estimator when the number of instruments is large and we develop the large-$K_2$ asymptotic theory or the many instruments asymptotics including the so-called case of many weak instruments. The TSLS and the GMM estimators are badly biased and they lose even consistency in some of these situations. Our results on the asymptotic properties and optimality of the LIML estimator and its variants give new interpretations of the numerical information of the finite sample properties and some guidance on the use of alternative estimation methods in simultaneous equations and micro-econometric models with many weak instruments. There is a growing amount of literature on the problem of many instruments in econometric models. We shall try to relate our results to some recent studies, including Donald and Newey (2001), Hahn (2002), Stock and Yogo (2005), Chao and Swanson (2005, 2006), van Hasselt (2006), van der Ploeg and Bekker (unpublished), Bekker and van der Ploeg (2005), Chioda and Jansson (unpublished), Hansen et al. (2008), and Anderson et al. (forthcoming).

In Section 2 we state the formulation of a linear structural model and the alternative estimation methods of unknown parameters with possibly many instruments. In Section 3 we develop the large-$K_2$ asymptotics (or many instruments asymptotics) and give some results on the asymptotic normality of the LIML estimator when $n$ and $K_2$ are large. Then we shall present some results on the asymptotic optimality of the LIML estimator in the sense that it attains the lower bound of the asymptotic variance in a class of consistent estimators with many instruments under reasonable assumptions. Also we discuss a more general formulation of the models and the related problems. In Section 4 we show that the asymptotic results in Section 3 agree with the finite sample properties of estimators. Then brief concluding remarks will be given in Section 5. The proof of our theorems will be given in Section 6.

2. Alternative estimation methods in structural equation models with possibly many instruments

We first consider the estimation problem of a structural equation in the classical linear simultaneous equations framework.¹ Let a single linear structural equation in an econometric model be

$$ y_{1i} = \beta_2' y_{2i} + \gamma_1' z_{1i} + u_i \quad (i = 1, \ldots, n), \tag{2.1} $$

where $y_{1i}$ and $y_{2i}$ are a scalar and a vector of $G_2$ endogenous variables, $z_{1i}$ is a vector of $K_1$ (included) exogenous variables in (2.1), $\gamma_1$ and $\beta_2$ are $K_1 \times 1$ and $G_2 \times 1$ vectors of unknown parameters, and $u_1, \ldots, u_n$ are independent disturbance terms with $E(u_i) = 0$ and $E(u_i^2) = \sigma^2$ $(i = 1, \ldots, n)$. We assume that (2.1) is one equation in a system of $1 + G_2$ equations in $1 + G_2$ endogenous variables $y_i = (y_{1i}, y_{2i}')'$. The reduced form of the model is

$$ Y = Z \Pi_n + V, \tag{2.2} $$

where $Y = (y_i')$ is the $n \times (1 + G_2)$ matrix of endogenous variables, $Z = (Z_1, Z_{2n}) = (z_i^{(n)\prime})$ is the $n \times K_n$ matrix of $K_1 + K_{2n}$ instrumental vectors $z_i^{(n)} = (z_{1i}', z_{2i}^{(n)\prime})'$, $V = (v_i')$ is the $n \times (1 + G_2)$ matrix of disturbances,

$$ \Pi_n = \begin{pmatrix} \pi_{11} & \Pi_{12} \\ \pi_{21}^{(n)} & \Pi_{22}^{(n)} \end{pmatrix} $$

is the $(K_1 + K_{2n}) \times (1 + G_2)$ matrix of coefficients, and

$$ E(v_i v_i') = \Omega = \begin{bmatrix} \omega_{11} & \omega_2' \\ \omega_2 & \Omega_{22} \end{bmatrix} $$

is a positive definite matrix. The vector of $K_n$ $(= K_1 + K_{2n},\ n > 2)$ instrumental variables $z_i^{(n)}$ satisfies the orthogonality condition $E[u_i z_i^{(n)}] = 0$ $(i = 1, \ldots, n)$. The relation between (2.1) and (2.2) gives

$$ \begin{pmatrix} \pi_{11} & \Pi_{12} \\ \pi_{21}^{(n)} & \Pi_{22}^{(n)} \end{pmatrix} \begin{pmatrix} 1 \\ -\beta_2 \end{pmatrix} = \begin{pmatrix} \gamma_1 \\ 0 \end{pmatrix}, \tag{2.3} $$

$u_i = (1, -\beta_2') v_i = \beta' v_i$ and $\sigma^2 = \beta' \Omega \beta$ with $\beta' = (1, -\beta_2')$.

¹ We intentionally consider the standard classic situation and state our results mainly because they are clear. Nonetheless a generalization of the formulation and the corresponding results will be discussed in Section 3.3.

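Let $\Pi_{2n} = (\pi_{21}^{(n)}, \Pi_{22}^{(n)})$ be a $K_{2n} \times (1 + G_2)$ matrix of coefficients. Define the $(1 + G_2) \times (1 + G_2)$ matrices by

$$ G = Y' Z_{2.1} A_{22.1}^{-1} Z_{2.1}' Y = P_2' A_{22.1} P_2, \tag{2.4} $$

and

$$ H = Y' \left( I_n - Z (Z'Z)^{-1} Z' \right) Y, \tag{2.5} $$

where $A_{22.1} = Z_{2.1}' Z_{2.1}$, $Z_{2.1} = Z_{2n} - Z_1 A_{11}^{-1} A_{12}$, $P_2 = A_{22.1}^{-1} Z_{2.1}' Y$,

$$ Z_1 = \begin{pmatrix} z_{11}' \\ \vdots \\ z_{1n}' \end{pmatrix}, \qquad Z_{2n} = \begin{pmatrix} z_{21}^{(n)\prime} \\ \vdots \\ z_{2n}^{(n)\prime} \end{pmatrix}, \tag{2.6} $$

and

$$ A = \begin{pmatrix} Z_1' \\ Z_{2n}' \end{pmatrix} (Z_1, Z_{2n}) = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \tag{2.7} $$

is a nonsingular matrix (a.s.). Then the LIML estimator $\hat{\beta}_{LI}$ of $\beta = (1, -\beta_2')'$ is the solution of

$$ \left( \frac{1}{n} G - \frac{1}{q_n} \lambda_n H \right) \hat{\beta}_{LI} = 0, \tag{2.8} $$

where $q_n = n - K_n$ $(n > 2)$ and $\lambda_n$ $(n > 2)$ is the smallest root of

$$ \left| \frac{1}{n} G - l \, \frac{1}{q_n} H \right| = 0. \tag{2.9} $$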

The solution to (2.8) minimizes the variance ratio given in Box I. The TSLS estimator $\hat{\beta}_{TS}$ $(= (1, -\hat{\beta}_{2.TS}')')$ of $\beta = (1, -\beta_2')'$ is given by

$$ Y_2' Z_{2.1} A_{22.1}^{-1} Z_{2.1}' Y \begin{pmatrix} 1 \\ -\hat{\beta}_{2.TS} \end{pmatrix} = 0, \tag{2.10} $$

where $Y_2 = (y_{2i}')$ is an $n \times G_2$ matrix. The TSLS estimator minimizes the numerator of the variance ratio in Box I. The LIML and the TSLS estimators of $\gamma_1$ are

$$ \hat{\gamma}_1 = (Z_1' Z_1)^{-1} Z_1' Y \hat{\beta}, \tag{2.11} $$

where $\hat{\beta}$ is $\hat{\beta}_{LI}$ or $\hat{\beta}_{TS}$, respectively. In this paper we shall discuss the asymptotic properties of $\hat{\beta}_2$ because of its simplicity, although it is straightforward to extend the results to treat $\sqrt{n}\,[\hat{\beta}_2' - \beta_2', \hat{\gamma}_1' - \gamma_1']$ with some additional notation. The LIML and TSLS estimators and their properties in the general case were originally developed by Anderson and Rubin (1949, 1950). See also Anderson (2005).
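The definitions (2.4), (2.5) and (2.8)–(2.10) can be illustrated numerically. The following is a minimal sketch, not the authors' code, under simplifying assumptions that are ours: $G_2 = 1$, no included exogenous variables ($K_1 = 0$, so $Z_{2.1} = Z$), and simulated Gaussian data. The function names (`solve`, `moment_matrices`, `liml_and_tsls`, `variance_ratio`, `simulate`) are hypothetical. With $G_2 = 1$ the determinantal equation (2.9) is a quadratic in $l$, so only the standard library is needed.

```python
import math
import random


def solve(a, b):
    """Solve a x = b (a square, b a matrix) by Gauss-Jordan elimination with pivoting."""
    k = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [[m[r][k + j] / m[r][r] for j in range(len(b[0]))] for r in range(k)]


def moment_matrices(y1, y2, z):
    """Return G = Y'Z(Z'Z)^{-1}Z'Y and H = Y'Y - G for Y = (y1, y2), assuming K1 = 0."""
    n, k = len(z), len(z[0])
    y = list(zip(y1, y2))
    zz = [[sum(z[i][a] * z[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    zy = [[sum(z[i][a] * y[i][j] for i in range(n)) for j in range(2)] for a in range(k)]
    coef = solve(zz, zy)  # (Z'Z)^{-1} Z'Y
    g = [[sum(zy[a][r] * coef[a][s] for a in range(k)) for s in range(2)] for r in range(2)]
    yy = [[sum(y[i][r] * y[i][s] for i in range(n)) for s in range(2)] for r in range(2)]
    h = [[yy[r][s] - g[r][s] for s in range(2)] for r in range(2)]
    return g, h


def liml_and_tsls(y1, y2, z):
    """Return (beta2_LIML, beta2_TSLS, lambda_n) for the scalar case G2 = 1."""
    n, k = len(z), len(z[0])
    g, h = moment_matrices(y1, y2, z)
    a = [[x / n for x in row] for row in g]        # (1/n) G
    b = [[x / (n - k) for x in row] for row in h]  # (1/q_n) H with q_n = n - K_n
    # |A - l B| = 0 written as q2 l^2 + q1 l + q0 = 0; take the smallest root.
    q2 = b[0][0] * b[1][1] - b[0][1] ** 2
    q1 = -(a[0][0] * b[1][1] + a[1][1] * b[0][0] - 2 * a[0][1] * b[0][1])
    q0 = a[0][0] * a[1][1] - a[0][1] ** 2
    lam = (-q1 - math.sqrt(max(q1 * q1 - 4 * q2 * q0, 0.0))) / (2 * q2)
    # Second row of (A - lam B)(1, -beta2)' = 0 gives beta2.
    beta_liml = (a[0][1] - lam * b[0][1]) / (a[1][1] - lam * b[1][1])
    beta_tsls = g[0][1] / g[1][1]  # second row of G (1, -beta2)' = 0
    return beta_liml, beta_tsls, lam


def variance_ratio(g, h, beta2):
    """b'Gb / b'Hb with b = (1, -beta2)'; a monotone transform of the ratio in Box I."""
    num = g[0][0] - 2 * beta2 * g[0][1] + beta2 ** 2 * g[1][1]
    den = h[0][0] - 2 * beta2 * h[0][1] + beta2 ** 2 * h[1][1]
    return num / den


def simulate(n, k, beta2, seed):
    """Hypothetical design: unit reduced-form coefficients, corr(u, v2) = 0.8."""
    rng = random.Random(seed)
    z = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]
    y1, y2 = [], []
    for zi in z:
        v2 = rng.gauss(0, 1)
        u = 0.8 * v2 + 0.6 * rng.gauss(0, 1)
        y2i = sum(zi) + v2
        y2.append(y2i)
        y1.append(beta2 * y2i + u)
    return y1, y2, z
```

A call such as `liml_and_tsls(*simulate(500, 5, 0.5, seed=0))` returns both estimates and the root $\lambda_n$. Because $Y'Y = G + H$ when $K_1 = 0$, minimizing $b'Gb / b'Hb$ is equivalent to minimizing the ratio in Box I, which is why `variance_ratio` is evaluated at the LIML estimate in the check.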


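$$ L_{1n} = \frac{\left[ \sum_{i=1}^{n} z_i^{(n)\prime} \left( y_{1i} - \gamma_1' z_{1i} - \beta_2' y_{2i} \right) \right] \left[ \sum_{i=1}^{n} z_i^{(n)} z_i^{(n)\prime} \right]^{-1} \left[ \sum_{i=1}^{n} z_i^{(n)} \left( y_{1i} - \gamma_1' z_{1i} - \beta_2' y_{2i} \right) \right]}{\sum_{i=1}^{n} \left( y_{1i} - \gamma_1' z_{1i} - \beta_2' y_{2i} \right)^2}. $$

Box I.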

3. On asymptotic optimality of the LIML estimator

3.1. Asymptotic normality of the LIML estimator

We state the limiting distribution of the LIML estimator under a set of alternative assumptions when $K_{2n}$ and $\Pi_{2n}$ can depend on $n$ and $n \to \infty$. We first consider the case when

$$ (\mathrm{I}) \quad \frac{K_{2n}}{n} \longrightarrow c \quad (0 \le c < 1), $$

$$ (\mathrm{II}) \quad \frac{1}{n} \Pi_{22}^{(n)\prime} A_{22.1} \Pi_{22}^{(n)} \xrightarrow{p} \Phi_{22.1}, $$

where $\Phi_{22.1}$ is a nonsingular constant matrix.

Condition (I) implies that the number of coefficient parameters is proportional to the number of observations. Because we want to estimate the covariance matrix of $v_i^{(n)}$ $(i = 1, \ldots, n)$, we want $c < 1$. Then (I) implies $q_n \to \infty$ as $n \to \infty$. Condition (II) controls the noncentrality (or concentration) parameter to be proportional to the sample size. Since $K_{2n}$ grows, it is often called the case of many instruments. These conditions define the rates of growth of the number of incidental parameters. Condition (II) shall be weakened to Condition (II)′ where $K_{2n}$ increases with $n$ but at a smaller rate as stated in Section 3.3, which is sometimes called the case of many weak instruments.²

We first summarize the basic results on the asymptotic distributions as Theorems. Although the present formulation and Theorem 1 are similar to the corresponding results reported in van Hasselt (2006) and Hansen et al. (2008), we shall give the proofs in Section 6 because the method of our proof gives some insights on the underlying assumptions, its extensions and the asymptotic optimality.

To state our results conveniently we transform $v_i$ to

$$ w_{2i} = (0, I_{G_2}) \left[ I_{1+G_2} - \frac{1}{\sigma^2} \Omega \beta \beta' \right] v_i = (0, I_{G_2}) \left[ v_i - \frac{1}{\sigma^2} \mathrm{Cov}(v, u)\, u_i \right] \tag{3.1} $$

and $u_i = \beta' v_i$. Then $E(w_{2i} u_i) = 0$ and

$$ E(w_{2i} w_{2i}') = \frac{1}{\sigma^2} \left[ \Omega \sigma^2 - \Omega \beta \beta' \Omega \right]_{22}, \tag{3.2} $$

where $[\,\cdot\,]_{22}$ is the $G_2 \times G_2$ lower right-hand corner of the matrix.
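The two moment statements following (3.1) can be checked in a few lines. The sketch below uses only $E(v_i v_i') = \Omega$, $u_i = \beta' v_i$ and $\sigma^2 = \beta' \Omega \beta$; the symbol $M$ is our shorthand and does not appear in the paper.

$$
\begin{aligned}
M &= I_{1+G_2} - \frac{1}{\sigma^2} \Omega \beta \beta', \qquad w_{2i} = (0, I_{G_2})\, M v_i, \\
E(M v_i u_i) &= M \Omega \beta = \Omega \beta - \frac{1}{\sigma^2} \Omega \beta (\beta' \Omega \beta) = 0, \\
E(M v_i v_i' M') &= \Omega - \frac{2}{\sigma^2} \Omega \beta \beta' \Omega + \frac{1}{\sigma^4} \Omega \beta (\beta' \Omega \beta) \beta' \Omega
= \frac{1}{\sigma^2} \left[ \Omega \sigma^2 - \Omega \beta \beta' \Omega \right].
\end{aligned}
$$

Selecting the last $G_2$ rows and columns with $(0, I_{G_2})$ gives $E(w_{2i} u_i) = 0$ and the right-hand side of (3.2).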

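² A referee has pointed out that Condition (II) is the case of many strong instruments.

Theorem 1. Let $z_i^{(n)}$, $i = 1, 2, \ldots, n$, be a set of $K_n \times 1$ vectors $(K_n = K_1 + K_{2n},\ n > 2)$. Let $v_i$, $i = 1, 2, \ldots, n$, be a set of $(1 + G_2) \times 1$ independent random vectors independent of $z_1^{(n)}, \ldots, z_n^{(n)}$ such that $E(v_i) = 0$ and $E(v_i v_i') = \Omega$ (a.s.). Suppose that (I) and (II) hold. In addition assume

$$ (\mathrm{III}) \quad \frac{1}{n} \max_{1 \le i \le n} \left\| \Pi_{22}^{(n)\prime} z_{in}^{*} \right\|^2 \xrightarrow{p} 0, $$

where $z_{in}^{*}$ is the $i$-th row vector of $Z_{2.1} = Z_{2n} - Z_1 (Z_1' Z_1)^{-1} Z_1' Z_{2n}$.

(i) For $c = 0$, suppose that $E[\|v_i\|^2]$ are bounded. Then

$$ \sqrt{n}\, (\hat{\beta}_{2.LI} - \beta_2) \xrightarrow{d} N(0, \Psi^{*}), \tag{3.3} $$

where $\Psi^{*} = \sigma^2 \Phi_{22.1}^{-1}$ and $\sigma^2 = \beta' \Omega \beta$.

(ii) For $0 < c < 1$, define $E(u^2 w_2 w_2') - \sigma^2 E(w_2 w_2') = \Gamma_{44.2}$ and assume that $E[\|v_i\|^{4+\epsilon}] < \infty$ for some $\epsilon > 0$ (and $E[\|\Pi_{22}^{(n)\prime} z_{in}^{*}\|^{2+\delta}] < \infty$ for some $\delta > 0$ when $z_i^{(n)}$ are stochastic).³ Suppose also that there exist limits

$$ (\mathrm{IV}) \quad \Xi_{3.2} = \left[ \frac{1}{1 - c} \right] \operatorname*{plim}_{n \to \infty} \frac{1}{n} \Pi_{22}^{(n)\prime} \sum_{i=1}^{n} z_{in}^{*} \left[ p_{ii}^{(n)} - c \right] E(u^2 w_2'), $$

$$ (\mathrm{V}) \quad \eta = \left[ \frac{1}{1 - c} \right]^2 \operatorname*{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \left[ p_{ii}^{(n)} - c \right]^2, $$

where $p_{ii}^{(n)} = (Z_{2.1} A_{22.1}^{-1} Z_{2.1}')_{ii}$. Then

$$ \sqrt{n}\, (\hat{\beta}_{2.LI} - \beta_2) \xrightarrow{d} N(0, \Psi^{**}), \tag{3.4} $$

where

$$ \Psi^{**} = \sigma^2 \Phi_{22.1}^{-1} + \Phi_{22.1}^{-1} \left\{ c_{*} \left[ \Omega \sigma^2 - \Omega \beta \beta' \Omega \right]_{22} + \left[ (\Xi_{3.2} + \Xi_{3.2}') + \eta \Gamma_{44.2} \right] \right\} \Phi_{22.1}^{-1} \tag{3.5} $$

and $c_{*} = c/(1 - c)$. If $G_2 = 1$, then $[\Omega \sigma^2 - \Omega \beta \beta' \Omega]_{22} = \omega_{11} \omega_{22} - \omega_{12}^2 = |\Omega|$.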
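The closing identity of Theorem 1 for $G_2 = 1$ follows by direct expansion. As a worked check, write $\Omega = \begin{pmatrix} \omega_{11} & \omega_{12} \\ \omega_{12} & \omega_{22} \end{pmatrix}$ and $\beta = (1, -\beta_2)'$ with scalar $\beta_2$:

$$
\begin{aligned}
\sigma^2 &= \beta' \Omega \beta = \omega_{11} - 2 \beta_2 \omega_{12} + \beta_2^2 \omega_{22}, \qquad
\Omega \beta = \begin{pmatrix} \omega_{11} - \beta_2 \omega_{12} \\ \omega_{12} - \beta_2 \omega_{22} \end{pmatrix}, \\
\left[ \Omega \sigma^2 - \Omega \beta \beta' \Omega \right]_{22}
&= \sigma^2 \omega_{22} - (\omega_{12} - \beta_2 \omega_{22})^2 \\
&= \omega_{11} \omega_{22} - 2 \beta_2 \omega_{12} \omega_{22} + \beta_2^2 \omega_{22}^2 - \omega_{12}^2 + 2 \beta_2 \omega_{12} \omega_{22} - \beta_2^2 \omega_{22}^2 \\
&= \omega_{11} \omega_{22} - \omega_{12}^2 = |\Omega|.
\end{aligned}
$$

The terms involving $\beta_2$ cancel, so in the scalar case the extra variance component depends on the structural coefficient only through $\Phi_{22.1}$.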

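Corollary 1. When $v_i$ $(= (v_{ji}))$ $(i = 1, \ldots, n;\ j = 1, \ldots, G_2 + 1)$ has an elliptically contoured (EC) distribution in Theorem 1, the fourth order moments $E(v_{ji} v_{ki} v_{li} v_{mi}) = (1 + \kappa/3)(\omega_{jk} \omega_{lm} + \omega_{jl} \omega_{km} + \omega_{jm} \omega_{kl})$ and $E[(\beta' v_i)^2 v_i v_i'] = (1 + \kappa/3)(\sigma^2 \Omega + 2 \Omega \beta \beta' \Omega)$, where $\Omega = (\omega_{jk})$, $E(v_{ji} v_{ki}) = \omega_{jk}$ and $\kappa$ is the kurtosis of $EC(\Omega)$.⁴ Then $\Gamma_{44.2} = (\kappa/3) \left[ \Omega \sigma^2 - \Omega \beta \beta' \Omega \right]_{22}$ and (3.5) is given by

$$ \Psi^{**} = \sigma^2 \Phi_{22.1}^{-1} + \left( c_{*} + \frac{1}{3} \eta \kappa \right) \Phi_{22.1}^{-1} \left[ \Omega \sigma^2 - \Omega \beta \beta' \Omega \right]_{22} \Phi_{22.1}^{-1}. \tag{3.6} $$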
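For $G_2 = 1$ formula (3.6) is a scalar expression, which makes the roles of $c$, $\eta$ and $\kappa$ easy to see. The helper below is a minimal sketch under that assumption; the function name and argument names are ours, and it relies on the scalar identity $[\Omega \sigma^2 - \Omega \beta \beta' \Omega]_{22} = |\Omega|$ stated in Theorem 1.

```python
def liml_asymptotic_variance(omega11, omega12, omega22, beta2, phi, c, eta=0.0, kappa=0.0):
    """Scalar (G2 = 1) version of (3.6): variance of the limit of sqrt(n)(b2_LIML - beta2).

    phi plays the role of Phi_{22.1}; c is the limit of K_2n / n; eta and kappa
    are the leverage dispersion and kurtosis parameters of Corollary 1.
    """
    if not 0.0 <= c < 1.0:
        raise ValueError("c must satisfy 0 <= c < 1")
    if phi <= 0.0:
        raise ValueError("phi must be positive")
    sigma2 = omega11 - 2.0 * beta2 * omega12 + beta2 ** 2 * omega22  # beta' Omega beta
    det_omega = omega11 * omega22 - omega12 ** 2                     # |Omega|
    c_star = c / (1.0 - c)
    return sigma2 / phi + (c_star + eta * kappa / 3.0) * det_omega / phi ** 2
```

Setting `eta=0` or `kappa=0` leaves only the $c_{*}$ term, and `c=0` together with `eta=0` reduces the value to $\sigma^2 / \Phi_{22.1}$, the variance in (3.3).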

³ We thank a referee for suggesting the possible improvements on the moment conditions in the previous version.
⁴ The precise definition of elliptically contoured (EC) distribution has been given by Section 2.7 of Anderson (2003).

Instead of making an assumption on the distribution of disturbance terms except the existence of their moments, alternatively we assume

$$ (\mathrm{VI}) \quad \operatorname*{plim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \left[ p_{ii}^{(n)} - c \right]^2 = 0. $$

A simple example for Condition (VI) is the case when we normalize $(1/n) A_{22.1} \sim I_{K_{2n}}$ and $z_{in}^{*} \sim N(0, I_{K_{2n}})$. Condition (VI) is the same as $\eta = 0$ in Condition (V), which in turn implies $\Xi_{3.2} = O$ in Condition (IV) by the Cauchy–Schwarz inequality. These consequences of Condition (VI) imply the following theorem:

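Theorem 2. For $0 \le c < 1$ assume Conditions (I), (II), (III), (VI) and assume that $E[\|v_i\|^{2+\epsilon}] < \infty$ for some $\epsilon > 0$ (and $E[\|\Pi_{22}^{(n)\prime} z_{in}^{*}\|^{2+\delta}] < \infty$ for some $\delta > 0$ when $z_i^{(n)}$ are stochastic). Then

$$ \sqrt{n}\, (\hat{\beta}_{2.LI} - \beta_2) \xrightarrow{d} N(0, \Psi^{**}), \tag{3.7} $$

where

$$ \Psi^{**} = \sigma^2 \Phi_{22.1}^{-1} + c_{*} \Phi_{22.1}^{-1} \left[ \Omega \sigma^2 - \Omega \beta \beta' \Omega \right]_{22} \Phi_{22.1}^{-1} \tag{3.8} $$

and $c_{*} = c/(1 - c)$.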
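The Cauchy–Schwarz step that takes Condition (VI) to $\Xi_{3.2} = O$ can be spelled out as follows. This is a sketch of the argument, with the shorthand $a_i = \Pi_{22}^{(n)\prime} z_{in}^{*}$ introduced here for convenience:

$$
\left\| \frac{1}{n} \sum_{i=1}^{n} a_i \left[ p_{ii}^{(n)} - c \right] \right\|
\le \left( \frac{1}{n} \sum_{i=1}^{n} \| a_i \|^2 \right)^{1/2}
\left( \frac{1}{n} \sum_{i=1}^{n} \left[ p_{ii}^{(n)} - c \right]^2 \right)^{1/2},
$$

$$
\frac{1}{n} \sum_{i=1}^{n} \| a_i \|^2
= \operatorname{tr} \left( \frac{1}{n} \Pi_{22}^{(n)\prime} Z_{2.1}' Z_{2.1} \Pi_{22}^{(n)} \right)
= \operatorname{tr} \left( \frac{1}{n} \Pi_{22}^{(n)\prime} A_{22.1} \Pi_{22}^{(n)} \right)
\xrightarrow{p} \operatorname{tr}(\Phi_{22.1}) < \infty .
$$

The first factor is bounded in probability by Condition (II) and the second factor tends to zero under Condition (VI), so the probability limit in Condition (IV) is zero provided $E(u^2 w_2')$ is finite. With $\Xi_{3.2} = O$ and $\eta = 0$, (3.5) reduces to (3.8).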

The asymptotic properties of the LIML estimator hold when $K_{2n}$ increases as $n \to \infty$ and $K_{2n}/n \to 0$. In this case the limiting distribution of the LIML estimator can be different from that of the TSLS estimator. (The proof of Theorem 3 will be given in Section 6.)

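Theorem 3. Let $\{v_i, z_i^{(n)};\ i = 1, \ldots, n\}$ be a set of independent random vectors. Assume that (2.1) and (2.2) hold with $E(v_i | z_i) = 0$ (a.s.) and $E(v_i v_i' | z_i^{(n)}) = \Omega_i^{(n)}$ (a.s.) is a function of $z_i^{(n)}$, say, $\Omega_i[n, z_i^{(n)}]$. The further assumptions on $(v_i, z_i^{(n)})$ $(v_i = (v_{ji}))$ are that $E(v_{ji}^4 | z_i^{(n)})$ are bounded, there exists a constant matrix $\Omega$ such that $\sqrt{n}\, \|\Omega_i^{(n)} - \Omega\|$ is bounded and $\sigma^2 = \beta' \Omega \beta > 0$. Suppose

$$ (\mathrm{I}') \quad \frac{K_{2n}}{n^{\eta}} \longrightarrow c \quad (0 \le \eta < 1,\ 0 < c < \infty), $$

$$ (\mathrm{II}) \quad \frac{1}{n} \Pi_{22}^{(n)\prime} A_{22.1} \Pi_{22}^{(n)} \xrightarrow{p} \Phi_{22.1}, $$

$$ (\mathrm{III}) \quad \frac{1}{n} \max_{1 \le i \le n} \left\| \Pi_{22}^{(n)\prime} z_{in}^{*} \right\|^2 \xrightarrow{p} 0, $$

where $\Phi_{22.1}$ is a nonsingular constant matrix.

(i) Then for the LIML estimator when $0 \le \eta < 1$,

$$ \sqrt{n}\, (\hat{\beta}_{2.LI} - \beta_2) \xrightarrow{d} N(0, \sigma^2 \Phi_{22.1}^{-1}), \tag{3.9} $$

where $\sigma^2 = \beta' \Omega \beta$.

(ii) For the TSLS estimator when $1/2 < \eta < 1$,

$$ n^{1-\eta} (\hat{\beta}_{2.TS} - \beta_2) \xrightarrow{p} \Phi_{22.1}^{-1}\, c\, (\omega_{21}, \Omega_{22}) \beta, \tag{3.10} $$

when $\eta = 1/2$,

$$ \sqrt{n}\, (\hat{\beta}_{2.TS} - \beta_2) \xrightarrow{d} N\left[ c\, \Phi_{22.1}^{-1} (\omega_{21}, \Omega_{22}) \beta,\ \sigma^2 \Phi_{22.1}^{-1} \right], \tag{3.11} $$

where $(\omega_{21}, \Omega_{22})$ is the $G_2 \times (1 + G_2)$ lower submatrix of $\Omega$. When $0 \le \eta < 1/2$,

$$ \sqrt{n}\, (\hat{\beta}_{2.TS} - \beta_2) \xrightarrow{d} N(0, \sigma^2 \Phi_{22.1}^{-1}). \tag{3.12} $$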
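The three TSLS regimes in part (ii) of Theorem 3 can be summarized in a small lookup. The sketch below assumes $G_2 = 1$, so that $(\omega_{21}, \Omega_{22}) \beta = \omega_{12} - \beta_2 \omega_{22}$ is a scalar; the function name and the convention of returning `None` for the variance when the limit in (3.10) is a constant are our choices for illustration.

```python
def tsls_limit(omega11, omega12, omega22, beta2, phi, c, eta):
    """Return (rate_exponent, center, variance) for the scalar TSLS limit in Theorem 3 (ii).

    rate_exponent r means the estimator error is scaled by n**r.
    variance is None when the scaled error converges in probability to a constant.
    """
    if not 0.0 <= eta < 1.0:
        raise ValueError("eta must satisfy 0 <= eta < 1")
    sigma2 = omega11 - 2.0 * beta2 * omega12 + beta2 ** 2 * omega22  # beta' Omega beta
    shift = c * (omega12 - beta2 * omega22) / phi  # c Phi^{-1} (omega_21, Omega_22) beta
    if eta > 0.5:
        return (1.0 - eta, shift, None)    # (3.10): bias dominates, slower rate
    if eta == 0.5:
        return (0.5, shift, sigma2 / phi)  # (3.11): normal limit with nonzero center
    return (0.5, 0.0, sigma2 / phi)        # (3.12): same limit as LIML in (3.9)
```

For example, with $\Omega$ having unit variances and $\omega_{12} = 0.5$, $\beta_2 = 1$, $\Phi_{22.1} = 2$ and $c = 1$, the center at $\eta = 1/2$ is $-0.25$, while the LIML limit in (3.9) stays centered at zero for every $0 \le \eta < 1$.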

It is possible to interpret the standard large sample theory as a special case of Theorem 3. The asymptotic properties of the LIML and TSLS estimators for $\gamma_1$ can be derived from Theorem 1. Donald and Newey (2001) (in their Lemma A.6) have investigated the asymptotic properties of the LIML estimator when $K_{2n}/n \to 0$. Also Stock and Yogo (2005) have discussed the asymptotic properties of the GMM estimators in some cases of the large-$K_2$ theory when $0 < \eta < 1/2$. In this case, the asymptotic lower bound of the covariance matrix is the same as in the case of the large sample asymptotic theory, although there are incidental parameters (the number of parameters is growing with the sample size). Chao and Swanson (2005) have considered the consistency of some estimators when $K_{2n}$ is dependent on $n$ and the disturbances are not necessarily normally distributed. Hansen et al. (2008) have obtained the limiting distribution of the LIML estimator for a more general model with non-normal disturbances and different assumptions.

3.2. Asymptotic optimality with many instruments

For the estimation of the vector of structural parameters $\beta$, it seems natural to consider procedures based on the two $(1 + G_2) \times (1 + G_2)$ matrices $G$ and $H$, which are sufficient statistics in the classical standard situation. We shall consider a class of estimators which are functions of these matrices. The typical examples of this class are the OLS estimator, the TSLS estimator, and the LIML estimator. It contains the modified versions and the combined versions of these estimators including the one proposed by Fuller (1977). (It also includes other estimators which are asymptotically equivalent to these estimators.) Then we have a basic result on the asymptotic optimality of the LIML estimator and its (asymptotically equivalent) modifications, which attains the lower bound of the asymptotic covariance under alternative assumptions in most cases. The proof will be given in Section 6.

Theorem 4. Assume that (2.1) and (2.2) hold. Define a class of consistent estimators for $\beta_2$ by

$$ \hat{\beta}_2 = \phi(G, H), \tag{3.13} $$

where $\phi$ is continuously differentiable and its derivatives are bounded at the probability limits of $(1/n) G$ and $(1/q_n) H$ as $K_{2n} \to \infty$ and $n \to \infty$ and $0 \le c < 1$. Then under the assumptions of the case (i) of Theorem 1, Corollary 1, Theorem 2 or Theorem 3,

$$ AE\left[ n (\hat{\beta}_2 - \beta_2)(\hat{\beta}_2 - \beta_2)' \right] \ge \Psi^{*} \ (\text{or } \Psi^{**}), \tag{3.14} $$

and $\Psi^{*}$ (or $\Psi^{**}$) is given by (3.3), (3.6), (3.8) or (3.9), where the left-hand side of (3.14) is the covariance matrix of the limiting distribution of the normalized estimator $\sqrt{n}\, (\hat{\beta}_2 - \beta_2)$ for the class of (3.13).

This is the basic result on the asymptotic optimality of the LIML estimator when there are many instruments. When the distribution of $V$ is normal $N(0, \Omega)$ and $Z$ is exogenous, $P = (Z'Z)^{-1} Z'Y$ and $H = Y'[I_n - Z(Z'Z)^{-1}Z']Y$ are a sufficient set of statistics for $\Pi_n$ and $\Omega$, the parameters of a model. When $K_n$ is fixed, it is known that of all consistent estimators of $\beta_2$, the LIML estimator suitably normalized has the minimum asymptotic variance, and the optimality of $\hat{\beta}_{2.LI}$ extends to the class of all consistent estimators, not only those of the form (3.13). When $K_n$ is dependent on $n$, however, there is a further problem with (many) incidental parameters.

The above theorems are the generalized versions of the earlier results given by Kunitomo (1981, 1982, 1987) because they assumed that the disturbances are normally distributed. Furthermore, Kunitomo (1987) has investigated the higher order efficiency property of the LIML estimator when $G_2 = 1$, $0 \le c < 1$. In the large-$K_2$ asymptotic theory with $0 < c < 1$, the LIML estimator is asymptotically efficient and attains the lower bound of the variance–covariance matrix, which is strictly larger than the information matrix and the asymptotic Cramér–Rao lower bound under a set of assumptions, while both the TSLS and the GMM estimators are inconsistent. This is a non-regular situation because the number of incidental parameters increases as $K_{2n}$ increases in the simultaneous equation models.⁵

There are further comments on recent studies. Bekker (1994) derived the approximate distributions of the TSLS and LIML estimators under the normal disturbances. van der Ploeg and Bekker (unpublished) and Bekker and van der Ploeg (2005) considered the asymptotic distributions and the asymptotic bound when the data have a group structure with a fixed number $m$ of groups and the disturbances are normally distributed. Chioda and Jansson (unpublished) obtained an asymptotic optimality of the LIML estimator under the normal disturbances when $G_2 = 1$.

Furthermore, Hahn (2002) has derived an asymptotic lower bound when $G_2 = 1$, $K_1 = 0$ and the disturbances are normally distributed. It is the same as the asymptotic Cramér–Rao lower bound, which can be smaller than $\Psi^{**}$ under some conditions on the incidental parameters. The trivial case which satisfies such a condition is the case when a $K_{2n} \times G_1$ matrix $\Pi_{22}^{(n)}$ has non-zero finite components. Another example is Condition 1 of Hahn (2002), which corresponds to the case of $0 \le \eta < 1/2$ in our Theorem 3. These cases could be reduced to the case when $c = 0$ in Theorem 4 and there is no contradiction with our results. The main difference in the two approaches comes from the fact that we have pursued the asymptotic bound and optimality over the incidental parameters uniformly in some sense⁶ and we do not have any particular condition on the incidental parameters except (I) and (II). Since we often do not have prior information on the reduced form coefficients in applications, it may be natural to derive the asymptotic bound with some uniformity condition over the space of reduced form coefficients.

⁵ As a non-trivial example, we take the bias-adjusted TSLS estimator by setting $\lambda_n = K_{2n}/n$ in (2.8) and denote it by $\hat{\beta}_{2.BTS}$. Then the asymptotic variance of $\hat{\beta}_{2.BTS}$ is greater than $\Phi^{*}$ in Theorem 3 if $0 < c < 1$ and $[\Omega \beta \beta' \Omega]_{22} \ge 0$.

3.3. General formulations of the asymptotic optimality

We can generalize the asymptotic optimality of LIML in several directions. We consider (2.1) and a nonlinear replacement⁷ for the last $G_2$ columns of the reduced form (2.2). We treat (2.1) and

$$ Y_2 = \Pi_{2Z}^{(n)} + V_2, \tag{3.15} $$

where $\Pi_{2Z}^{(n)} = (\pi_{2i}'(z_i^{(n)}))$ is an $n \times G_2$ matrix, the $i$-th row of which $\pi_{2i}'(z_i^{(n)})$ depends on the $K_n \times 1$ vector $z_i^{(n)}$ $(i = 1, \ldots, n)$, $V_2$ is an $n \times G_2$ matrix, $v_1 = u + V_2 \beta_2$, and $V = (v_1, V_2)$. When the reduced form Eqs. (3.15) are linear, (2.1) and (3.15) have a representation (2.2). In this formulation, Condition (II) is replaced by

$$ (\mathrm{II})' \quad \frac{1}{d_n^2} \Pi_{2Z}^{(n)\prime} Z_{2.1} A_{22.1}^{-1} Z_{2.1}' \Pi_{2Z}^{(n)} \xrightarrow{p} \Phi_{22.1}, $$

where $d_n^2 = \operatorname{tr}(\Pi_{2Z}^{(n)\prime} Z_{2.1} A_{22.1}^{-1} Z_{2.1}' \Pi_{2Z}^{(n)})$, $\Phi_{22.1}$ is a positive definite (constant) matrix and $d_n \xrightarrow{p} \infty$ as $n \to \infty$. We replace (III) by

$$ (\mathrm{III})' \quad \frac{1}{d_n^2} \max_{1 \le i \le n} \left\| \pi_{2i}(z_i^{(n)}) \right\|^2 \xrightarrow{p} 0. $$

A possible additional condition (due to nonlinearity in (3.15)) is

$$ (\mathrm{VII}) \quad \frac{1}{q_n} \Pi_{2Z}^{(n)\prime} \left[ I_n - Z (Z'Z)^{-1} Z' \right] \Pi_{2Z}^{(n)} \xrightarrow{p} O. $$

Condition (VII) is automatically satisfied in the linear case. It is possible to weaken this condition such that the probability limit of (VII) is $\Phi_3$, say. Then we need some additional notations to re-express Theorems 1–4 without changing their essential results. (See Section 3 of Kunitomo, 2008.)

⁶ If we assume the normal disturbances with $\Omega = \omega^2 I$, the dummy instruments and the $j$-th row of $\Pi_n$ $(j = 1, \ldots, K_n)$ as $\pi_j \sim N_{1+G_1}(0, \Omega_{\pi})$, for instance, we have (so-called) a structural relationship model (see Anderson, 1984 for instance). Then the asymptotic bound of estimators is similar to ours. A referee also pointed out the problem of quality of instruments, which is certainly related to the problem of choosing instruments. Since the related discussion on the incidental parameters problem and the related issues, which are important, is far beyond the scope of the present paper, we omit the details.
⁷ This model is very similar to the one studied by Hansen et al. (2008), which has generalized some results of Anderson et al. (2005).

Three cases can be considered. We have already investigated the first case of $d_n = O_p(n^{1/2})$ and $K_{2n} = O(n)$. The asymptotic covariance of the LIML estimator is given by (3.5) in Theorem 1 or (3.8) in Theorem 2 under alternative assumptions with (II)′ instead of (II). The second case is the standard large sample asymptotics, which corresponds to the cases of $d_n = O_p(n^{1/2+\delta})$ $(\delta > 0)$, or $d_n = O_p(n^{1/2})$ and $K_{2n}/n = o(1)$. In this case

$$ d_n (\hat{\beta}_{2.LI} - \beta_2) \xrightarrow{d} N(0, \sigma^2 \Phi_{22.1}^{-1}). \tag{3.16} $$

Theorem 3 is one result in this case, which can be extended directly to the nonlinear model of (2.1) and (3.15).

The third case occurs when $d_n = o_p(n^{1/2})$ and $\sqrt{n}/d_n^2 \to 0$, which may correspond to one case in Hansen et al. (2008) with slightly different normalization and assumptions. The next result is an extension of Theorem 4 and it could be interpreted as an asymptotic optimality of the LIML estimator for many weak instruments. The variance (3.19) below is simpler than (3.5) and (3.8) because the effects of $n$ dominate the first, the third and the fourth terms of (3.5) in Theorem 1.

Theorem 5. Assume Conditions (II)′, (III)′ and (VII) with $d_n = o_p(n^{1/2})$ and $\sqrt{n}/d_n^2 \to 0$ in Theorems 2 and 4 instead of Conditions (II) and (III). Then

$$ \left[ \frac{d_n^2}{\sqrt{n}} \right] (\hat{\beta}_{2.LI} - \beta_2) \xrightarrow{d} N(0, \Psi^{***}) \tag{3.17} $$

and

$$ AE\left[ \left( \frac{d_n^2}{\sqrt{n}} \right)^2 (\hat{\beta}_2 - \beta_2)(\hat{\beta}_2 - \beta_2)' \right] \ge \Psi^{***}, \tag{3.18} $$

where the left-hand side of (3.18) is the covariance matrix of the limiting distribution of the normalized estimator $[d_n^2/\sqrt{n}]\, (\hat{\beta}_2 - \beta_2)$ for the class of (3.13) and

$$ \Psi^{***} = \Phi_{22.1}^{-1} \left\{ c_{*} \left[ \Omega \sigma^2 - \Omega \beta \beta' \Omega \right]_{22} \right\} \Phi_{22.1}^{-1} \tag{3.19} $$

is the covariance matrix of the limiting distribution of the LIML estimator.

3.4. Heteroscedasticity and the asymptotic properties

Recently, there has been some interest in the role of heteroscedasticity with many instruments. Let $\Omega_i = E(v_i v_i' | z_i^{(n)})$ be the conditional covariance matrix and we assume

$$ (\mathrm{VIII}) \quad \frac{1}{n} \sum_{i=1}^{n} \Omega_i \xrightarrow{p} \Omega, $$

where $\Omega$ is a positive definite (constant) matrix. Then in the case when both Conditions (VI) and (VIII) hold, the LIML estimator still has some desirable asymptotic properties.

In the more general cases, the distribution of the LIML estimator could be significantly affected by the presence of the heteroscedasticity of disturbance terms with many instruments. On this issue, however, there are alternative ways to improve the LIML estimation. Chao and Swanson (2004) have investigated the JIVE estimation method, and Hausman et al. (unpublished) have proposed the HLIM estimation method. Kunitomo (2008) has also considered the class of MLIML estimators and investigated the problem of asymptotic optimality under heteroscedasticity conditions.⁸ The details of these issues shall be discussed on another occasion.

⁸ It includes an extension of Sections 3.2 and 3.3 of this paper with the definitions of weak heteroscedasticity and persistent heteroscedasticity.


4. Discussion of asymptotic properties and finite sample properties

It is important to investigate the finite sample properties of estimators partly because they are not necessarily similar to their asymptotic properties. One simple example would be the fact that the exact moments of some estimators do not necessarily exist. (In that case it is meaningless to compare the exact MSEs of alternative estimators and their Monte Carlo analogues.) Although we discuss the asymptotic properties of the LIML estimator, we need to investigate their relevance for practical applications.

There is a notable difference between the results in Theorems 1 and 2, that is, the asymptotic variance depends on the 3rd and 4th order moments of the disturbance terms in the former. The finite sample properties of the LIML estimator have been investigated by Anderson et al. (2005, forthcoming) in a systematic way. As typical examples we present only six figures (Figs. A.1–A.6) in the Appendix when $\alpha = 0.5, 1.0$ and $G_2 = 1$. We have used the numerical estimation of the cumulative distribution function (cdf) of the LIML estimator based on the simulation, and we have enough numerical accuracy in most cases. The key parameters in figures and tables are $K_2$ (or $K_{2n}$), $n - K$ (or $n - K_n$), $\alpha = [\omega_{22}/|\Omega|^{1/2}](\beta_2 - \omega_{12}/\omega_{22})$ ($\Omega = (\omega_{ij})$) and $\delta^2 = \Pi_{22}^{(n)\prime}A_{22.1}\Pi_{22}^{(n)}/\omega_{22}$. In addition to the LIML estimator, we have added the distribution functions of the TSLS and OLS (ordinary least squares) estimators for comparisons in some cases. (See Anderson et al., forthcoming, for the details.) The figures (Figs. A.1–A.6) show the estimated cdf of alternative estimators in the standard form, that is,

$$\frac{\sqrt{n}}{\sigma}\,\Phi_{22.1}^{1/2}\left(\hat{\beta}_2 - \beta_2\right). \tag{4.1}$$

By using (3.3) the limiting distributions of the LIML and TSLS estimators are $N(0, 1)$ in the large sample asymptotics and they are denoted by ‘‘o’’. By using (3.8) the corresponding limiting distributions of the LIML estimators in the large $K_2$ asymptotics are $N(0, a)$ ($a = \Psi^{*-1}\Psi^{**}$, $a \ge 1$), which are denoted by large-K-normal in Figs. A.1–A.6, and they are traced by the dashed curves. In some figures we also have the approximations based on the variance formula (3.5) with the third and fourth order moments of disturbance terms, which are denoted by large-K-nonnormal and traced by ‘‘x’’.

From these figures we have found that the effects of many instruments on the cdfs of the estimators are significant and the approximations based on the large sample asymptotics are often inferior. At the same time we also have found that the effects of non-normality of disturbance terms on the cdf of the LIML estimator are often very small. (The dashed curves and x are almost identical.) The distributions of the TSLS and OLS estimators have a significant bias with many instruments.

One important application of the asymptotic variance is to construct a t-ratio for testing a hypothesis on the coefficients. We can use the asymptotic variance of the LIML estimator given by (3.5) or (3.8) replaced by its estimator. (We have used $P_2$ for $\Pi_{2n}$, $(1/q_n)H$ for $\Omega$ and the sample moments from residuals for $\sigma^2$ and $E(u^2 w_2)$, for instance.) We have investigated this problem, and as a typical example we give Table A.1 on the cdf of the t-ratio

$$t(\hat{\beta}_{2.LI}) = \frac{\sqrt{n}\,(\hat{\beta}_{2.LI} - \beta_2)}{s(\hat{\beta}_{2.LI})}, \tag{4.2}$$

which is constructed by the LIML estimation, where $s^2(\hat{\beta}_{2.LI})$ is the estimator of the variance. The formulas (3.3)–(3.6) and (3.8) are used. (Matsushita, 2006 has investigated the finite sample properties of t-ratios and derived the asymptotic expansions of their distribution functions in a systematic way.) We can see that the effects of many instruments on the cdf of the null distributions of t-ratios are often significant, while the approximations based on the large sample asymptotics are often inferior. At the same time we also have found that the effects of non-normality of disturbance terms on the null distributions of the t-ratios are often small, that is, the differences among the effects of (3.5) in Theorem 1 are not substantial for practical purposes.

Bekker (1994) derived the asymptotic variance formula (3.8) for the LIML estimator under the condition that the disturbance terms are normally distributed. It is identical to the asymptotic covariance matrix of the LIML estimator in the large-$K_2$ asymptotics reported by Kunitomo (1981, 1982). From our investigations it may be advisable to use (3.8) for statistical inferences on the structural coefficients for practical purposes, even in the cases when the disturbances are not normally distributed. These observations agree with the recent studies reported by Kunitomo and Matsushita (2009), Hansen et al. (2008) and Anderson et al. (2005, forthcoming).
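To make the comparison of the LIML and TSLS estimators concrete, the following is a minimal numerical sketch, not the computation used for the figures and tables. It assumes the structural equation $y_1 = Y_2\beta_2 + Z_1\gamma_1 + u$, the matrices $G = Y'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'Y$ and $H = Y'(I_n - Z(Z'Z)^{-1}Z')Y$ with $Y = (y_1, Y_2)$, and the characterization of $\lambda_n$ as the minimum of the variance ratio as in (6.33). The function name `liml_tsls` and the simulated design are hypothetical.

```python
import numpy as np

def liml_tsls(y1, Y2, Z1, Z2):
    """Return LIML and TSLS estimates of beta_2 and the root lambda_n.

    Sketch under assumptions: G = Y'(P_Z - P_Z1)Y, H = Y'(I - P_Z)Y,
    and both estimators solve (0, I)[G - k H] b = 0 with b = (1, -beta_2')',
    where k = 0 for TSLS and k = min eigenvalue of H^{-1} G for LIML.
    """
    n = y1.shape[0]
    Y = np.column_stack([y1, Y2])
    Z = np.column_stack([Z1, Z2])

    def resid(A, X):
        # Residuals of the least squares regression of A on X.
        coef, *_ = np.linalg.lstsq(X, A, rcond=None)
        return A - X @ coef

    Y_m1 = resid(Y, Z1)              # (I - P_Z1) Y
    Y_m = resid(Y, Z)                # (I - P_Z) Y
    H = Y_m.T @ Y_m
    G = Y_m1.T @ Y_m1 - H            # Y'(P_Z - P_Z1)Y

    lam_star = np.linalg.eigvals(np.linalg.solve(H, G)).real.min()

    def k_class(k):
        A = G - k * H
        return np.linalg.solve(A[1:, 1:], A[1:, 0])

    q_n = n - Z.shape[1]
    return {"liml": k_class(lam_star),
            "tsls": k_class(0.0),
            "lambda_n": lam_star * q_n / n}   # root of |G/n - lambda H/q_n| = 0

# Hypothetical design: n = 500, K2 = 10 strong instruments, beta_2 = 0.5.
rng = np.random.default_rng(0)
n = 500
Z1 = np.ones((n, 1))
Z2 = rng.standard_normal((n, 10))
v2 = rng.standard_normal(n)
u = 0.5 * v2 + rng.standard_normal(n)      # endogeneity through v2
Y2 = Z2 @ np.full(10, 0.5) + v2
y1 = 0.5 * Y2 + 1.0 + u
est = liml_tsls(y1, Y2, Z1, Z2)
```

In such a design with strong instruments both estimates should lie close to the true coefficient; the differences discussed in this section appear when $K_2$ is large relative to $n$.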

5. Concluding remarks

In this paper, we have discussed the asymptotic optimality when the number of instruments is large in a structural equation of the simultaneous equations system. Although the limited information maximum likelihood (LIML) estimator and the two-stage least squares (TSLS) estimator are asymptotically equivalent in the standard large sample theory, they are asymptotically quite different in the large-$K_2$ asymptotics with many instruments or many weak instruments. In some recent microeconometric models and models on panel data, it is often a common feature that $K_2$ is fairly large and this asymptotic theory has some practical relevance. (See Hsiao and Tahmiscioglu, unpublished, for instance.) We have shown that the LIML estimator and its variants often have asymptotic optimality with many instruments.

In practical applications it is often not easy to identify whether many instruments are weak or strong, as in Condition (II) or Condition (II)′. The TSLS estimator (and hence the GMM method) crucially depends on whether there are many instruments in any form or not, while the LIML estimator does not. The only additional cost is that we need to use the asymptotic covariance, which can be slightly larger than the standard one. In this sense the LIML estimator has some asymptotic robustness.

The asymptotic optimality results in this paper give some further reasons why we have the finite sample properties of the alternative estimation methods including the classical LIML and TSLS estimators. The LIML estimator is also quite attractive over the semi-parametric estimation methods of the generalized method of moments (GMM) and the empirical likelihood (EL) estimators in the situations with many instruments or many weak instruments.

6. Proof of theorems

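In this section we give the proofs of the theorems and the mathematical derivations in Section 3.

Proof of Theorem 1. Substitution of (2.2) into (2.4) yields

$$\begin{aligned}
G &= (\Pi_n' Z' + V')\,Z_{2.1}A_{22.1}^{-1}Z_{2.1}'\,(Z\Pi_n + V) \\
  &= \Pi_{2n}'A_{22.1}\Pi_{2n} + V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V + \Pi_{2n}'Z_{2.1}'V + V'Z_{2.1}\Pi_{2n}.
\end{aligned}$$

Then

$$G - \left[\Pi_{2n}'A_{22.1}\Pi_{2n} + K_{2n}\Omega\right] = \Pi_{2n}'Z_{2.1}'V + V'Z_{2.1}\Pi_{2n} + \left[V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V - K_{2n}\Omega\right]. \tag{6.1}$$

Condition (II) implies that as $n \to \infty$

$$\frac{1}{n}\Pi_{2n}'Z_{2.1}'V \xrightarrow{p} O, \tag{6.2}$$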


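and

$$\frac{1}{n}\left[V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V - K_{2n}\Omega\right] \xrightarrow{p} O. \tag{6.3}$$

Then as $n \to \infty$,

$$\frac{1}{n}G \xrightarrow{p} G_0 = \begin{bmatrix}\beta_2' \\ I_{G_2}\end{bmatrix}\Phi_{22.1}(\beta_2, I_{G_2}) + c\,\Omega \tag{6.4}$$

and

$$\frac{1}{q_n}H \xrightarrow{p} \Omega. \tag{6.5}$$

Then $\hat{\beta}_{LI} \xrightarrow{p} \beta$ and $\lambda_n \xrightarrow{p} c$ as $n \to \infty$.

Define $G_1$, $H_1$, $\lambda_{1n}$, and $b_1$ by $G_1 = \sqrt{n}\left(\frac{1}{n}G - G_0\right)$, $H_1 = \sqrt{q_n}\left(\frac{1}{q_n}H - \Omega\right)$, $\lambda_{1n} = \sqrt{n}(\lambda_n - c)$, $b_1 = \sqrt{n}(\hat{\beta}_{LI} - \beta)$. From (2.8),

$$[G_0 - c\,\Omega]\beta + \frac{1}{\sqrt{n}}[G_1 - \lambda_{1n}\Omega]\beta + \frac{1}{\sqrt{n}}[G_0 - c\,\Omega]b_1 + \frac{1}{\sqrt{q_n}}[-cH_1]\beta = o_p\!\left(\frac{1}{\sqrt{n}}\right).$$

Since $(G_0 - c\,\Omega)\beta = 0$ and $\hat{\beta}_{LI}' = (1, -\hat{\beta}_{2.LI}')$, (2.8) gives

$$\begin{bmatrix}\beta_2' \\ I_{G_2}\end{bmatrix}\Phi_{22.1}\sqrt{n}(\hat{\beta}_{2.LI} - \beta_2) = (G_1 - \lambda_{1n}\Omega - \sqrt{cc_*}\,H_1)\beta + o_p(1). \tag{6.6}$$

Multiplication of (6.6) on the left by $\beta' = (1, -\beta_2')$ yields

$$\lambda_{1n} = \frac{\beta'(G_1 - \sqrt{cc_*}\,H_1)\beta}{\beta'\Omega\beta} + o_p(1). \tag{6.7}$$

Also multiplication of (6.6) on the left by $(0, I_{G_2})$ and substitution for $\lambda_{1n}$ from (6.7) yields

$$\begin{aligned}
\sqrt{n}(\hat{\beta}_{2.LI} - \beta_2) &= \Phi_{22.1}^{-1}(0, I_{G_2})(G_1 - \lambda_{1n}\Omega - \sqrt{cc_*}\,H_1)\beta + o_p(1) \\
&= \Phi_{22.1}^{-1}(0, I_{G_2})\left[I_{G_2+1} - \frac{\Omega\beta\beta'}{\beta'\Omega\beta}\right](G_1 - \sqrt{cc_*}\,H_1)\beta + o_p(1).
\end{aligned} \tag{6.8}$$

By using the relation $V\beta = u$, we obtain

$$\begin{aligned}
(G_1 - \sqrt{cc_*}\,H_1)\beta ={}& \frac{1}{\sqrt{n}}\Pi_{2n}'Z_{2.1}'u + \sqrt{c}\,\frac{1}{\sqrt{K_{2n}}}\left[V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'u - K_{2n}\Omega\beta\right] \\
&- \sqrt{cc_*}\,\frac{1}{\sqrt{q_n}}\left[V'(I_n - Z(Z'Z)^{-1}Z')u - q_n\Omega\beta\right],
\end{aligned} \tag{6.9}$$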

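where $K_n + q_n = n$. Since we have the conditional expectation given $Z$ as

$$E\left[\Pi_{2n}'Z_{2.1}'V\beta\beta'V'Z_{2.1}\Pi_{2n} \mid Z\right] = \sigma^2\,\Pi_{2n}'A_{22.1}\Pi_{2n},$$

we apply the central limit theorem with the Lindeberg condition to the first term of (6.9). (See Theorem 1 of Anderson and Kunitomo, 1992, for instance.) Conditions (II) and (III) imply that $(1/\sqrt{n})\,\Pi_{22}^{(n)\prime}Z_{2.1}'u$ has a limiting normal distribution with covariance matrix $\sigma^2\Phi_{22.1}$. This proves (i) of Theorem 1.

Next we shall consider (ii) of Theorem 1. We need to prove that the limiting distribution of $T_n = T_{1n} + \sqrt{c}\,T_{2n} - \sqrt{cc_*}\,T_{3n}$ is normal by applying a central limit theorem, where $T_{1n} = a'(1/\sqrt{n})\,\Pi_{22}^{(n)\prime}Z_{2.1}'u$, $T_{2n} = a'(1/\sqrt{K_{2n}})\,W_2'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'u$, $T_{3n} = a'(1/\sqrt{q_n})\,W_2'(I_n - Z(Z'Z)^{-1}Z')u$ for any constant vector $a$ and

$$W_2' = (0, I_{G_2})\left[I_{G_2+1} - \frac{\Omega\beta\beta'}{\beta'\Omega\beta}\right]V'.$$

For the second and third terms on the right-hand side of (6.9), we notice that each row vector of $W_2$, namely $w_{2i} = (0, I_{G_2})\left(v_i - u_i\,\mathrm{Cov}(v_i^{(n)}, u_i)/\sigma^2\right)$, and $u_i$ $(i = 1, \ldots, n)$ are uncorrelated and $E[w_{2i}w_{2i}'] = (1/\sigma^2)\left[\sigma^2\Omega - \Omega\beta\beta'\Omega\right]_{22}$. Thus

$$(0, I_{G_2})\left[I_{G_2+1} - \frac{\Omega\beta\beta'}{\beta'\Omega\beta}\right]\frac{1}{\sqrt{K_{2n}}}\left[V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'u - K_{2n}\Omega\beta\right] = \frac{1}{\sqrt{K_{2n}}}\sum_{i,j=1}^{n} w_{2i}u_j\,p_{ij}^{(n)} \tag{6.10}$$

and

$$(0, I_{G_2})\left[I_{G_2+1} - \frac{\Omega\beta\beta'}{\beta'\Omega\beta}\right]\frac{1}{\sqrt{q_n}}\left[V'(I_n - Z(Z'Z)^{-1}Z')u - q_n\Omega\beta\right] = \frac{1}{\sqrt{q_n}}\sum_{i,j=1}^{n} w_{2i}u_j\left[\delta_i^j - q_{ij}^{(n)}\right], \tag{6.11}$$

where $p_{ij}^{(n)} = z_{in}^{*\prime}\left[\sum_{k=1}^{n} z_{kn}^{*}z_{kn}^{*\prime}\right]^{-1}z_{jn}^{*}$, $q_{ij}^{(n)} = z_i^{(n)\prime}\left[\sum_{k=1}^{n} z_k^{(n)}z_k^{(n)\prime}\right]^{-1}z_j^{(n)}$ and $\delta_i^i = 1$, $\delta_i^j = 0$ $(i \ne j)$. Then the variances of $T_{2n}$ and $T_{3n}$ are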

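$$\begin{aligned}
&\frac{1}{K_{2n}}E\left\{\left[a'\left(\sum_{i=1}^{n} w_{2i}u_i\,p_{ii}^{(n)} + \sum_{i \ne j} w_{2i}u_j\,p_{ij}^{(n)}\right)\right]^2 \,\Bigg|\, Z\right\} \\
&\quad = \frac{1}{K_{2n}}\sum_{i=1}^{n} E\left[u_i^2\,a'w_{2i}w_{2i}'a\right]p_{ii}^{(n)2} + \frac{1}{K_{2n}}\sum_{i \ne j} E(u_j^2)\,E(a'w_{2i}w_{2i}'a)\,p_{ij}^{(n)2},
\end{aligned}$$

and

$$\begin{aligned}
&\frac{1}{q_n}E\left\{\left[a'\left(\sum_{i=1}^{n} w_{2i}u_i\,(1 - q_{ii}^{(n)}) - \sum_{i \ne j} w_{2i}u_j\,q_{ij}^{(n)}\right)\right]^2 \,\Bigg|\, Z\right\} \\
&\quad = \frac{1}{q_n}\sum_{i=1}^{n} E\left[u_i^2\,a'w_{2i}w_{2i}'a\right]\left(1 - 2q_{ii}^{(n)} + q_{ii}^{(n)2}\right) + \frac{1}{q_n}\sum_{i \ne j} E(u_j^2)\,E(a'w_{2i}w_{2i}'a)\,q_{ij}^{(n)2}.
\end{aligned}$$

By using the relations $\sum_{i,j=1}^{n} p_{ij}^{(n)2} = K_{2n}$, $\sum_{i,j=1}^{n} q_{ij}^{(n)2} = K_n$ and $\sum_{i=1}^{n}\left(1 - 2q_{ii}^{(n)} + q_{ii}^{(n)2}\right) + \sum_{i \ne j} q_{ij}^{(n)2} = q_n$, the limiting variances of $T_{2n}$ and $T_{3n}$ are the limits of

$$\frac{1}{K_{2n}}\,a'\left[K_{2n}\sigma^2 E(w_{2i}w_{2i}') + \sum_{i=1}^{n} p_{ii}^{(n)2}\,\Gamma_{44.2}\right]a \tag{6.12}$$

and

$$\frac{1}{q_n}\,a'\left[q_n\sigma^2 E(w_{2i}w_{2i}') + \left(n - 2K_n + \sum_{i=1}^{n} q_{ii}^{(n)2}\right)\Gamma_{44.2}\right]a. \tag{6.13}$$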
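The projection-matrix relations used for (6.12) and (6.13), together with $\sum_{i,j} p_{ij}^{(n)}q_{ji}^{(n)} = K_{2n}$ used further on, follow from the idempotency of the two projections and can be checked numerically. The sketch below is only an illustration on a hypothetical random design; the dimensions and variable names are assumptions, with $P$ the projection onto $Z_{2.1}$ (the part of $Z_2$ orthogonal to $Z_1$) and $Q$ the projection onto $Z = (Z_1, Z_2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K1, K2 = 60, 2, 7                      # hypothetical dimensions
Z1 = rng.standard_normal((n, K1))
Z2 = rng.standard_normal((n, K2))
Z = np.hstack([Z1, Z2])

def proj(X):
    """Orthogonal projection matrix X (X'X)^{-1} X'."""
    return X @ np.linalg.solve(X.T @ X, X.T)

Q = proj(Z)                               # q_ij
Z21 = Z2 - proj(Z1) @ Z2                  # Z_{2.1}: Z2 with Z1 partialled out
P = proj(Z21)                             # p_ij

K = K1 + K2
q_n = n - K
diag_q = np.diag(Q)

sum_p_sq = np.sum(P ** 2)                 # sum_{i,j} p_ij^2, expected K2
sum_q_sq = np.sum(Q ** 2)                 # sum_{i,j} q_ij^2, expected K
off_diag_q_sq = sum_q_sq - np.sum(diag_q ** 2)
third = np.sum(1 - 2 * diag_q + diag_q ** 2) + off_diag_q_sq   # expected q_n
cross = np.sum(P * Q.T)                   # sum_{i,j} p_ij q_ji = tr(PQ), expected K2
```

Each sum is a trace of an idempotent matrix (or of a product of nested projections), which is why the values depend only on the ranks $K_{2n}$, $K_n$ and $q_n$.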

In order to evaluate the covariances of the three terms of $T_n$, we first notice

$$\begin{aligned}
&E\left\{\left[\frac{1}{\sqrt{n}}\Pi_{22}^{(n)\prime}Z_{2.1}'u\right]\left[\frac{1}{\sqrt{n}}W_2'\left(Z_{2.1}A_{22.1}^{-1}Z_{2.1}' - c_*(I_n - Z(Z'Z)^{-1}Z')\right)u\right]' \,\Bigg|\, Z\right\} \\
&\quad = \frac{1}{n}\sum_{i=1}^{n}\Pi_{22}^{(n)\prime}z_{in}^{*}\left(p_{ii}^{(n)} - c_*(1 - q_{ii}^{(n)})\right)E(u^2 w_2') \\
&\quad = \Xi_{3.2}^{(n)} \quad \text{(say)}.
\end{aligned} \tag{6.14}$$

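Second,

$$\begin{aligned}
&E\left\{\left[\frac{1}{\sqrt{n}}W_2'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'u\right]\left[\frac{1}{\sqrt{n}}W_2'(I_n - Z(Z'Z)^{-1}Z')u\right]' \,\Bigg|\, Z\right\} \\
&\quad = \frac{1}{n}\sum_{i=1}^{n} E(u_i^2 w_{2i}w_{2i}')\,p_{ii}^{(n)}\left[1 - q_{ii}^{(n)}\right] + \frac{1}{n}\sum_{i \ne j}\sigma^2 E(w_{2i}w_{2i}')\,p_{ij}^{(n)}\left[\delta_i^j - q_{ij}^{(n)}\right] \\
&\quad = \frac{1}{n}\left[K_{2n} - \sum_{i=1}^{n} p_{ii}^{(n)}q_{ii}^{(n)}\right]\Gamma_{44.2}
\end{aligned}$$

by using the relations that $\sum_{i,j=1}^{n} p_{ij}^{(n)}\delta_i^j = K_{2n}$ and $\sum_{i,j=1}^{n} p_{ij}^{(n)}q_{ji}^{(n)} = K_{2n}$. Hence we have evaluated each term of

$$E(T_n^2) = E(T_{1n}^2) + cE(T_{2n}^2) + cc_*E(T_{3n}^2) + 2\sqrt{c}\,E(T_{1n}T_{2n}) - 2\sqrt{cc_*}\,E(T_{1n}T_{3n}) - 2c\sqrt{c_*}\,E(T_{2n}T_{3n}).$$

Then we use the relation $c(1 + c_*) = c_*$ for the coefficients of the two terms of $E(u_i^2 w_{2i}w_{2i}')$. Also by using the relation $cc_*(1 - c_*) - 2cc_* = -c_*^2$ for the coefficients of $\Gamma_{44.2}$, we find that

$$\lim_{n\to\infty}\left[c\,\frac{n}{K_{2n}}\,\frac{1}{n}\sum_{i=1}^{n} p_{ii}^{(n)2} + cc_*\,\frac{1}{q_n}\left(n - 2K_n + \sum_{i=1}^{n} q_{ii}^{(n)2}\right) - 2c\sqrt{c_*}\sqrt{\frac{n}{K_{2n}}\,\frac{n}{q_n}}\;\frac{1}{n}\left(K_{2n} - \sum_{i=1}^{n} p_{ii}^{(n)}q_{ii}^{(n)}\right)\right] = \lim_{n\to\infty}\eta_n,$$

where $\eta_n = (1/n)\sum_{i=1}^{n}\left[p_{ii}^{(n)} + c_*q_{ii}^{(n)}\right]^2 - c_*^2$. By using (6.14), the limiting covariance matrix of $\sqrt{n}(\hat{\beta}_{2.LI} - \beta_2)$ is (3.5).

Finally, by using the central limit theorem (CLT) in Lemma 3 below for every constant vector $a$, we have the asymptotic normality of (3.4) with the asymptotic covariance matrix $\Psi^{**}$ and it proves (ii) of Theorem 1. □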

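The next two lemmas are the results of straightforward evaluations on projection matrices, and we have omitted their derivations.

Lemma 1. Assume Condition (VI) and $c = \lim_{n\to\infty} K_{2n}/n$. Then

$$\operatorname*{plim}_{n\to\infty}\;\frac{1}{n}\sum_{i=1}^{n}\left[p_{ii}^{(n)} + c_*q_{ii}^{(n)}\right]^2 - c_*^2 = \operatorname*{plim}_{n\to\infty}\;\frac{1}{n}\sum_{i=1}^{n}\left[p_{ii}^{(n)} - c_*(1 - q_{ii}^{(n)})\right]^2 = 0, \tag{6.15}$$

where $c_* = c/(1 - c)$, $p_{ij}^{(n)} = (Z_{2.1}A_{22.1}^{-1}Z_{2.1}')_{ij}$ and $q_{ij}^{(n)} = (ZA^{-1}Z')_{ij}$.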

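Lemma 2. Let an $n \times n$ matrix $P = (p_{ij})$ satisfy $P^2 = P = P'$ and $\operatorname{rank}(P) = r \le n$. Then

$$\sum_{i,j=1}^{n} p_{ii}\,p_{jj}\,p_{ij} \le r. \tag{6.16}$$

Let also

$$B = (b_{ij}) = Z_{2.1}(Z_{2.1}'Z_{2.1})^{-1}Z_{2.1}' - c_*\left[I_n - Z(Z'Z)^{-1}Z'\right]. \tag{6.17}$$

Then

$$\sum_{i,j=1}^{n} b_{ii}\,b_{jj}\,b_{ij} = O(n). \tag{6.18}$$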
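The inequality (6.16) can be illustrated numerically. The following is a small sketch on a hypothetical random projection matrix, not part of the omitted derivation; the function name `lemma2_sum` is an assumption. It uses the observation that, with $d$ the vector of diagonal elements of $P$, the sum in (6.16) is the quadratic form $d'Pd$, which is at most $d'd = \sum_i p_{ii}^2 \le \sum_i p_{ii} = r$ because $0 \le p_{ii} \le 1$ for a projection.

```python
import numpy as np

def lemma2_sum(P):
    """Compute sum_{i,j} p_ii p_jj p_ij as the quadratic form d'Pd, d = diag(P)."""
    d = np.diag(P)
    return float(d @ P @ d)

# Hypothetical example: projection onto r random columns in R^n.
rng = np.random.default_rng(1)
n, r = 40, 6
X = rng.standard_normal((n, r))
P = X @ np.linalg.solve(X.T @ X, X.T)     # symmetric idempotent, rank r

value = lemma2_sum(P)
rank_P = np.trace(P)                      # trace of a projection equals its rank
```

Here `value` should not exceed `rank_P`, in agreement with (6.16).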

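Lemma 3. Let $t_{1i}^{(n)} = a'\Pi_{22}^{(n)\prime}z_{in}^{*}$ and $t_{2i}^{(n)} = a'w_{2i}$ $(i = 1, \ldots, n)$ for any (non-zero) constant vector $a$. As $n \to \infty$,

$$T_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} t_{1i}^{(n)}u_i + \frac{1}{\sqrt{n}}\sum_{i,j=1}^{n} t_{2i}^{(n)}u_j\left[p_{ij}^{(n)} - c_*(\delta_i^j - q_{ij}^{(n)})\right] \xrightarrow{d} N(0, \Delta), \tag{6.19}$$

where $\Delta = a'\Phi_{22.1}\Psi^{*}\Phi_{22.1}a$ or $\Delta = a'\Phi_{22.1}\Psi^{**}\Phi_{22.1}a$ in Theorems 1 and 2.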

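Proof of Lemma 3. We first consider the case when $z_i^{(n)}$ is a sequence of non-stochastic variables.

Set $s_{1i}^{(n)} = (1/\sqrt{n})\,t_{1i}^{(n)}u_i$, $s_{2i}^{(n)} = (1/\sqrt{n})\,t_{2i}^{(n)}u_i b_{ii}$, $s_{3i}^{(n)} = (1/\sqrt{n})\,u_i\sum_{j=1}^{i-1} t_{2j}^{(n)}b_{ji}$, $s_{4i}^{(n)} = (1/\sqrt{n})\,t_{2i}^{(n)}\sum_{j=1}^{i-1} u_j b_{ij}$ and $b_{ij} = p_{ij}^{(n)} - c_*(\delta_i^j - q_{ij}^{(n)})$ $(i, j = 1, \ldots, n)$. Let $\mathcal{F}_{n,i}$ be the $\sigma$-field generated by the random variables $u_j, v_j$ $(j \le i,\ i \le n)$ and $\mathcal{F}_{n,0}$ be the initial $\sigma$-field. Then $T_n = \sum_{i=1}^{n} X_{ni}$ can be decomposed as $X_{ni} = s_{1i}^{(n)} + s_{2i}^{(n)} + s_{3i}^{(n)} + s_{4i}^{(n)}$ and $E[X_{ni} \mid \mathcal{F}_{n,i-1}] = 0$ $(i = 1, \ldots, n)$. Since the terms $X_{ni}$ $(i = 1, \ldots, n)$ form a martingale difference sequence, by direct calculations we find

$$\begin{aligned}
E\left[X_{ni}^2 \mid \mathcal{F}_{n,i-1}\right] ={}& \frac{1}{n}\left[t_{1i}^{(n)}\right]^2\sigma^2 + \frac{1}{n}b_{ii}^2\,E\left[(t_{2i}^{(n)}u_i)^2\right] + \frac{1}{n}\sigma^2\left[\sum_{j=1}^{i-1} b_{ij}t_{2j}\right]^2 + \frac{1}{n}E(t_{2i}^2)\left[\sum_{j=1}^{i-1} b_{ij}u_j\right]^2 \\
&+ \frac{2}{n}t_{1i}^{(n)}b_{ii}\,E\left[t_{2i}^{(n)}u_i^2\right] + \frac{2}{n}\sigma^2 t_{1i}^{(n)}\sum_{j=1}^{i-1} b_{ij}t_{2j} \\
&+ \frac{2}{n}E(u_i^2 t_{2i})\,b_{ii}\sum_{j=1}^{i-1} b_{ij}t_{2j} + \frac{2}{n}E(u_i t_{2i}^2)\,b_{ii}\sum_{j=1}^{i-1} b_{ij}u_j.
\end{aligned}$$

Then we apply a martingale central limit theorem to $T_n = \sum_{i=1}^{n} X_{ni}$. The most important step is to show

$$\frac{1}{n}\sum_{i=2}^{n}\left[\left(\sum_{j=1}^{i-1} b_{ij}t_{2j}\right)^2 - E\left(\sum_{j=1}^{i-1} b_{ij}t_{2j}\right)^2\right] \xrightarrow{p} 0, \tag{6.20}$$

$$\frac{1}{n}\sum_{i=2}^{n}\left[E(t_{2i}^2)\left(\sum_{j=1}^{i-1} b_{ij}u_j\right)^2 - E(t_{2i}^2)\,E\left(\sum_{j=1}^{i-1} b_{ij}u_j\right)^2\right] \xrightarrow{p} 0, \tag{6.21}$$

$$\frac{1}{n}\sum_{i=1}^{n}\left[t_{1i}^{(n)}b_{ii}\,E(t_{2i}^{(n)}u_i^2) - a'\Xi_{3.2}a\right] \xrightarrow{p} 0, \tag{6.22}$$

and

$$\frac{1}{n}\sum_{i=2}^{n}\sigma^2 t_{1i}^{(n)}\sum_{j=1}^{i-1} b_{ij}t_{2j} \xrightarrow{p} 0, \tag{6.23}$$

$$\frac{1}{n}\sum_{i=2}^{n} E(u_i^2 t_{2i})\,b_{ii}\sum_{j=1}^{i-1} b_{ij}t_{2j} \xrightarrow{p} 0, \tag{6.24}$$

$$\frac{1}{n}\sum_{i=2}^{n} E(u_i t_{2i}^2)\,b_{ii}\sum_{j=1}^{i-1} b_{ij}u_j \xrightarrow{p} 0. \tag{6.25}$$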


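Under the assumptions in Theorem 1, it is straightforward but tedious to show these relations by using Lemma 2. We only give an illustration. We evaluate

$$E\left[\frac{1}{n}\sum_{i=2}^{n} t_{1i}\left(\sum_{j=1}^{i-1} b_{ij}t_{2j}\right)\right]^2$$

through

$$\left(\frac{1}{n}\right)^2 E\left[\sum_{i,i'=2}^{n} t_{1i}t_{1i'}\sum_{j=1}^{n} b_{ij}b_{i'j}\right]E(t_{2i}^2) = \left(\frac{1}{n}\right)^2 E\left[\sum_{i,i'=2}^{n} t_{1i}t_{1i'}b_{ii'}\right]E(t_{2i}^2)$$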

because we can utilize $Z_{2.1}'(I_n - Z(Z'Z)^{-1}Z') = O$. Then we have (6.23).

Finally, we apply the martingale CLT as Theorem 3.5 of Hall and Heyde (1980). We set $V_n = \sum_{i=1}^{n} E\left[X_{ni}^2 \mid \mathcal{F}_{n,i-1}\right]$. Then by utilizing that for any $\xi > 0$ and $\nu > 0$, $\sum_{i=1}^{n} E\left[(X_{ni})^2 I(|X_{ni}| \ge \xi)\right] \le (1/\xi)^{\nu}\sum_{i=1}^{n} E\left[|X_{ni}|^{2+\nu}\right]$, we can show their (3.33), (3.34) and (3.36) under the assumptions of Theorem 1. Thus we have the result by using the moment condition $E[\|v_i\|^{4+\varepsilon}] < \infty$. When Condition (VI) holds, we only need $E[\|v_i\|^{2+\varepsilon}] < \infty$ because $\sum_{i=1}^{n} s_{2i}^{(n)} \xrightarrow{p} 0$. When $z_i^{(n)}$ are stochastic, we utilize the additional moment condition to show the conditions of Theorem 3.5 of Hall and Heyde (1980). □

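Proof of Theorem 3. (i) We make use of the fact that $Z(Z'Z)^{-1}Z'$ and $Z_{2.1}(Z_{2.1}'Z_{2.1})^{-1}Z_{2.1}'$ are idempotent of ranks $K_n$ and $K_{2n}$, respectively, and that the boundedness of $E[v_{ji}^4 \mid z_i^{(n)}]$ implies a Lindeberg condition

$$\sup_{1 \le i \le n} E\left[v_i'v_i\,I(v_i'v_i > a) \mid z_1^{(n)}, \ldots, z_n^{(n)}\right] \xrightarrow{p} 0 \quad (a \to \infty).$$

Let

$$G_1^* = \sqrt{n}\left[\frac{1}{n}G - \frac{1}{n}\Pi_{2n}'A_{22.1}\Pi_{2n}\right] = \frac{1}{\sqrt{n}}\Pi_{2n}'Z_{2.1}'V + \frac{1}{\sqrt{n}}V'Z_{2.1}\Pi_{2n} + \frac{1}{\sqrt{n}}V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V. \tag{6.26}$$

Since the matrix $V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V$ is positive definite and $E[v_i^{(n)}v_i^{(n)\prime} \mid z_i^{(n)}]$ is bounded, there is a (constant) $\Omega$ such that

$$E\left[\frac{1}{\sqrt{n}}V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V\right] = E\left[\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Omega_i^{(n)}p_{ii}^{(n)}\right] \le \frac{K_{2n}}{\sqrt{n}}\,\Omega \longrightarrow O \tag{6.27}$$

when $0 < \eta < 1/2$. Then

$$G_1^*\beta - \frac{1}{\sqrt{n}}\Pi_{2n}'Z_{2.1}'V\beta = \frac{1}{\sqrt{n}}V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V\beta \xrightarrow{p} 0. \tag{6.28}$$

For the LIML estimator (2.8) implies

$$(0, I_{G_2})\left[\frac{1}{n}\Pi_{2n}'A_{22.1}\Pi_{2n} + \frac{1}{\sqrt{n}}G_1^* - \lambda_n\frac{1}{q_n}H\right]\begin{pmatrix}1 \\ -\hat{\beta}_{2.LI}\end{pmatrix} = 0. \tag{6.29}$$

By using the facts that $(1/\sqrt{n})\,G_1^* \xrightarrow{p} O$, $\lambda_n \xrightarrow{p} 0$ and $[1/q_n]H \xrightarrow{p} \Omega$, we have

$$\Phi_{22.1}(\beta_2, I_{G_2})\,\operatorname*{plim}_{n\to\infty}\begin{pmatrix}1 \\ -\hat{\beta}_{2.LI}\end{pmatrix} = 0,$$

which implies $\operatorname{plim}_{n\to\infty}\hat{\beta}_{2.LI} = \beta_2$ because $\Phi_{22.1}$ is positive definite. Then again (2.8) implies

$$\sqrt{n}\left[\frac{1}{n}\Pi_{2n}'A_{22.1}\Pi_{2n} + \frac{1}{\sqrt{n}}G_1^* - \lambda_n\frac{1}{q_n}H\right]\left[\beta + (\hat{\beta}_{LI} - \beta)\right] = 0. \tag{6.30}$$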

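Lemma 4. Let $\lambda_n$ $(n > 2)$ be the smallest root of (2.9). (i) For $0 < \nu < 1 - \eta$ and $0 \le \eta < 1$,

$$n^{\nu}\lambda_n \xrightarrow{p} 0 \tag{6.31}$$

as $n \to \infty$. (ii) For $0 \le \eta < 1$,

$$\sqrt{n}\left[\lambda_n - \frac{K_{2n}}{n}\right] \xrightarrow{p} 0 \tag{6.32}$$

as $n \to \infty$.

Proof of Lemma 4. Write

$$\lambda_n = \min_{b}\frac{b'\frac{1}{n}Gb}{b'\frac{1}{q_n}Hb} \le \frac{q_n}{n}\,\frac{\beta'G\beta}{\beta'H\beta} = \frac{q_n}{n}\,\frac{\beta'V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V\beta}{\beta'V'(I_n - Z(Z'Z)^{-1}Z')V\beta}. \tag{6.33}$$

By using the boundedness of the fourth order moments of $v_i$, we have

$$\frac{1}{n}\sum_{i=1}^{n} v_iv_i' \xrightarrow{p} \Omega. \tag{6.34}$$

Also $n^{-(1-\nu)}V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V \xrightarrow{p} O$ by using arguments similar to (6.28). Then

$$n^{\nu}\lambda_n \le \left[\frac{q_n}{n}\right]\frac{n^{-(1-\nu)}\beta'V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V\beta}{n^{-1}\beta'V'(I_n - Z(Z'Z)^{-1}Z')V\beta} \xrightarrow{p} 0 \tag{6.35}$$

as $n \to \infty$. The result (ii) follows from (6.31) and (6.34) and $\sigma^2 = \beta'\Omega\beta > 0$. □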

Due to Lemma 4, $\sqrt{n}\,\lambda_n \xrightarrow{p} 0$ when $0 \le \eta < 1/2$ (and the asymptotic distributions of the LIML and TSLS estimators are equivalent). Then

$$(0, I_{G_2})\frac{1}{n}\Pi_{2n}'A_{22.1}\Pi_{22}^{(n)}\sqrt{n}(\hat{\beta}_{2.LI} - \beta_2) - (0, I_{G_2})G_1^*\beta \xrightarrow{p} 0. \tag{6.36}$$

We notice that

$$\begin{aligned}
&\frac{1}{n}\sum_{i=1}^{n}\Omega_i^{(n)} \otimes \Pi_{22}^{(n)\prime}z_{in}^{*}z_{in}^{*\prime}\Pi_{22}^{(n)} - \Omega \otimes \Phi_{22.1} \\
&\quad = \frac{1}{n}\sum_{i=1}^{n}\left(\Omega_i^{(n)} - \Omega\right) \otimes \Pi_{22}^{(n)\prime}z_{in}^{*}z_{in}^{*\prime}\Pi_{22}^{(n)} + \frac{1}{n}\sum_{i=1}^{n}\Omega \otimes \left[\Pi_{22}^{(n)\prime}z_{in}^{*}z_{in}^{*\prime}\Pi_{22}^{(n)} - \Phi_{22.1}\right] \xrightarrow{p} O
\end{aligned}$$

because of Condition (II)′ and the conditions imposed on $\Omega_i^{(n)}$ $(i = 1, \ldots, n)$.

Then by applying the CLT to $(1/\sqrt{n})\,\Pi_{22}^{(n)\prime}Z_{2.1}'V\beta$, we obtain the limiting normal distribution $N(0, \sigma^2\Phi_{22.1})$. This proves (i) of Theorem 3 for $0 \le \eta < 1/2$.

(ii) We consider the asymptotic distribution of the LIML estimator when $1/2 \le \eta < 1$. By using the argument of (6.29) and the fact that $\lambda_n \xrightarrow{p} 0$, we have $\hat{\beta}_{2.LI} - \beta_2 \xrightarrow{p} 0$. By multiplying (6.30) by $\beta'$ from the left, we have

$$\beta'\left\{\sqrt{n}\left[\frac{K_{2n}}{n} - \lambda_n\right]\Omega + \frac{1}{\sqrt{n}}V'Z_{2.1}\Pi_{2n} + \frac{1}{\sqrt{n}}\left[V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V - K_{2n}\Omega\right] - \lambda_n\sqrt{\frac{n}{q_n}}\,H_1\right\} \times \left[\beta + (\hat{\beta}_{LI} - \beta)\right] = 0.$$

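Multiply (6.30) on the left by $(0, I_{G_2})$ to obtain

$$\begin{aligned}
(0, I_{G_2})\sqrt{n}\Bigg\{&\left[\frac{1}{n}\Pi_{2n}'A_{22.1}\Pi_{2n} + \frac{K_{2n}}{n}\Omega\right] + \frac{1}{\sqrt{n}}\left[\frac{1}{\sqrt{n}}\Pi_{2n}'Z_{2.1}'V + \frac{1}{\sqrt{n}}V'Z_{2.1}\Pi_{2n} + \frac{1}{\sqrt{n}}\left(V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V - K_{2n}\Omega\right)\right] \\
&- \lambda_n\frac{1}{q_n}H\Bigg\} \times \left[\beta + (\hat{\beta}_{LI} - \beta)\right] = 0.
\end{aligned}$$

We consider the asymptotic behavior of the quadratic term

$$\frac{1}{\sqrt{n}}\left[V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V - K_{2n}\Omega\right] = \frac{1}{\sqrt{n}}\left[\sum_{i,j=1}^{n} p_{ij}^{(n)}\left(v_iv_j' - \delta_i^j\,\Omega_i^{(n)}\right)\right] + \frac{1}{\sqrt{n}}\left[\sum_{i=1}^{n} p_{ii}^{(n)}\left(\Omega_i^{(n)} - \Omega\right)\right],$$

where $\delta_i^j$ is the indicator function ($\delta_i^i = 1$ and $\delta_i^j = 0$ $(i \ne j)$). For any constant vectors $a$ and $b$, there exists a positive constant $M_1$ such that

$$\begin{aligned}
&\frac{1}{n}E\left[\sum_{i,j=1}^{n} p_{ij}^{(n)} \times a'\left(v_i^{(n)}v_j^{(n)\prime} - \delta_i^j\,\Omega_i^{(n)}\right)b\right]^2 \\
&\quad = \frac{1}{n}E\left[\sum_{i=1}^{n} p_{ii}^{(n)2}\left[a'(v_iv_i' - \Omega_i^{(n)})b\right]^2 + \sum_{i \ne j} p_{ij}^{(n)2}\left[a'v_iv_j'b\right]^2 + \sum_{i \ne j} p_{ij}^{(n)2}\left[a'v_iv_j'b\;a'v_jv_i'b\right]\right] \\
&\quad \le M_1\frac{K_{2n}}{n} \longrightarrow 0
\end{aligned}$$

because the conditional moments of $v_{ji}^4$ are bounded, $\sum_{i=1}^{n} p_{ii}^{(n)} = K_{2n}$ and $\sum_{i=1}^{n} p_{ii}^{(n)2} \le K_{2n}$. Then we find

$$\frac{1}{\sqrt{n}}\left[V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V - K_{2n}\Omega\right] \xrightarrow{p} O \tag{6.37}$$

when $0 \le \eta < 1$. We can use (6.30) and the fact that

$$\left[\frac{1}{n}\Pi_{2n}'A_{22.1}\Pi_{2n} + \frac{K_{2n}}{n}\Omega - \lambda_n\frac{1}{q_n}H\right]\beta = o_p\!\left(\frac{1}{\sqrt{n}}\right).$$

By multiplying the preceding equation out to separate the terms with the factor $\beta$ and with the factor $\sqrt{n}(\hat{\beta}_{LI} - \beta)$, we have

$$(0, I_{G_2})\left[\frac{1}{n}\Pi_{2n}'A_{22.1}\Pi_{2n}\sqrt{n}(\hat{\beta}_{LI} - \beta) + \frac{1}{\sqrt{n}}\Pi_{2n}'Z_{2.1}'V\beta\right] \xrightarrow{p} 0, \tag{6.38}$$

which is equivalent to $\Phi_{22.1}\sqrt{n}(\hat{\beta}_{2.LI} - \beta_2) - \frac{1}{\sqrt{n}}\Pi_{22}^{(n)\prime}Z_{2.1}'V\beta \xrightarrow{p} 0$.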

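By applying the CLT to the second term, we complete the proof of (i) of Theorem 3 for the LIML estimator of $\beta$ when $1/2 \le \eta < 1$.

(iii) Next, we shall investigate the asymptotic property of the TSLS estimator. If we substitute 0 for $\lambda_n$ in (2.8), we have the TSLS estimator. Then we find that the limiting distribution of the TSLS estimator is the same as that of the LIML estimator when $0 \le \eta < 1/2$. When $\eta = 1/2$, however, we have

$$G_1^*\beta - \left[c\,\Omega\beta + \frac{1}{\sqrt{n}}\Pi_{2n}'Z_{2.1}'V\beta\right] \xrightarrow{p} O. \tag{6.39}$$

We set $\hat{\beta}_{TS}' = (1, -\hat{\beta}_{2.TS}')$, which is the solution of (2.10). By evaluating each term of

$$(0, I_{G_2})\sqrt{n}\left[\frac{1}{n}\Pi_{2n}'A_{22.1}\Pi_{2n} + \frac{1}{\sqrt{n}}G_1^*\right]\left[\beta + (\hat{\beta}_{TS} - \beta)\right] = 0,$$

we have

$$\left[\frac{1}{n}\Pi_{22}^{(n)\prime}A_{22.1}\Pi_{2n}\right]\sqrt{n}(\hat{\beta}_{TS} - \beta) - (0, I_{G_2})G_1^*\beta = o_p(1). \tag{6.40}$$

Then the limiting distribution of $\sqrt{n}(\hat{\beta}_{2.TS} - \beta_2)$ is the same as that of $\Phi_{22.1}^{-1}(0, I_{G_2})G_1^*\beta$. By using $(1/\sqrt{n})\,V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V\beta \xrightarrow{p} c\,\Omega\beta$ and applying the CLT as in (i), we have the result for the TSLS estimator of $\beta$ when $\eta = 1/2$.

When $1/2 < \eta < 1$, we notice

$$n^{1-\eta}\left[\frac{1}{n}G - \frac{1}{n}\Pi_{2n}'A_{22.1}\Pi_{2n}\right]\beta = \frac{K_{2n}}{n^{\eta}}\Omega\beta + \frac{1}{n^{\eta}}\Pi_{2n}'Z_{2.1}'V\beta + \frac{1}{n^{\eta}}\left[V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V - K_{2n}\Omega\right]\beta. \tag{6.41}$$

Because the last two terms of the right-hand side of (6.41), that is, all except the first term, are of the order $o_p(n^{-\eta})$, we have

$$n^{1-\eta}\left[\frac{1}{n}G - \frac{1}{n}\Pi_{2n}'A_{22.1}\Pi_{2n}\right]\beta \xrightarrow{p} c\,\Omega\beta \tag{6.42}$$

as $n \to \infty$. Hence by using arguments similar to (i),

$$(0, I_{G_2})\frac{1}{n}\Pi_{2n}'A_{22.1}\Pi_{22}^{(n)} \times n^{1-\eta}(\hat{\beta}_{2.TS} - \beta_2) - (0, I_{G_2})\,c\,\Omega\beta \xrightarrow{p} 0 \tag{6.43}$$

and we complete the proof of (ii) of Theorem 3 for the TSLS estimator when $1/2 \le \eta < 1$. □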

Proof of Theorem 4. We set the vector of true parameters $\beta' = (1, -\beta_2') = (1, -\beta_2, \ldots, -\beta_{1+G_2})$. We write

$$\hat{\beta}_k = \phi_k\left(\frac{1}{n}G, \frac{1}{q_n}H\right) \quad (k = 2, \ldots, 1 + G_2). \tag{6.44}$$

For the estimator to be consistent we need the conditions

$$\beta_k = \phi_k\left[\begin{pmatrix}\beta_2' \\ I_{G_2}\end{pmatrix}\Phi_{22.1}\left(\beta_2, I_{G_2}\right) + c\,\Omega,\ \Omega\right] \quad (k = 2, \ldots, 1 + G_2) \tag{6.45}$$

as identities in $\beta_2$, $\Phi_{22.1}$, and $\Omega$. Let a $(1 + G_2) \times (1 + G_2)$ matrix

$$T^{(k)} = \left(\frac{\partial\phi_k}{\partial g_{ij}}\right) = (\tau_{ij}^{(k)}) \quad (k = 2, \ldots, 1 + G_2;\ i, j = 1, \ldots, 1 + G_2) \tag{6.46}$$

be evaluated at the probability limits of (6.45). We write a $(1 + G_2) \times (1 + G_2)$ matrix $\Theta$ $(= (\theta_{ij}))$

$$\Theta = \begin{pmatrix}\beta_2' \\ I_{G_2}\end{pmatrix}\Phi_{22.1}\left(\beta_2, I_{G_2}\right) = \begin{bmatrix}\beta_2'\Phi_{22.1}\beta_2 & \beta_2'\Phi_{22.1} \\ \Phi_{22.1}\beta_2 & \Phi_{22.1}\end{bmatrix},$$

where $\Phi_{22.1} = (\rho_{m,l})$ $(m, l = 2, \ldots, 1 + G_2)$, $(\Phi_{22.1}\beta_2)_l = \sum_{j=2}^{1+G_2}\beta_j\rho_{lj}$ $(l = 2, \ldots, 1 + G_2)$, $(\beta_2'\Phi_{22.1})_m = \sum_{i=2}^{1+G_2}\beta_i\rho_{im}$ $(m = 2, \ldots, 1 + G_2)$, and $\beta_2'\Phi_{22.1}\beta_2 = \sum_{i,j=2}^{1+G_2}\rho_{ij}\beta_i\beta_j$.

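By differentiating each component of $\Theta$ with respect to $\beta_j$ $(j = 2, \ldots, 1 + G_2)$, we have

$$\frac{\partial\Theta}{\partial\beta_j} = \left(\frac{\partial\theta_{lm}}{\partial\beta_j}\right), \tag{6.47}$$

where $\frac{\partial\theta_{11}}{\partial\beta_j} = 2\sum_{i=2}^{1+G_2}\rho_{ji}\beta_i$ $(j = 2, \ldots, 1 + G_2)$, $\frac{\partial\theta_{1m}}{\partial\beta_j} = \rho_{jm}$ $(m = 2, \ldots, 1 + G_2)$, $\frac{\partial\theta_{l1}}{\partial\beta_j} = \rho_{lj}$ $(l = 2, \ldots, 1 + G_2)$, and $\frac{\partial\theta_{lm}}{\partial\beta_j} = 0$ $(l, m = 2, \ldots, 1 + G_2)$. Hence

$$\operatorname{tr}\left(T^{(k)}\frac{\partial\Theta}{\partial\beta_j}\right) = 2\tau_{11}^{(k)}\sum_{i=2}^{1+G_2}\rho_{ji}\beta_i + 2\sum_{i=2}^{1+G_2}\rho_{ji}\tau_{1i}^{(k)} = \delta_j^k, \tag{6.48}$$

where we define $\delta_k^k = 1$ and $\delta_j^k = 0$ $(k \ne j)$. Define a $(1 + G_2) \times (1 + G_2)$ partitioned matrix

$$T^{(k)} = \begin{bmatrix}\tau_{11}^{(k)} & \tau_2^{(k)\prime} \\ \tau_2^{(k)} & T_{22}^{(k)}\end{bmatrix}. \tag{6.49}$$

Then (6.48) is represented as

$$2\tau_{11}^{(k)}\Phi_{22.1}\beta_2 + 2\Phi_{22.1}\tau_2^{(k)} = \varepsilon_k, \tag{6.50}$$

where $\varepsilon_k' = (0, \ldots, 0, 1, 0, \ldots, 0)$ with 1 in the $k$-th place and zeros in other elements.

Since $\Phi_{22.1}$ is positive definite, we solve (6.50) as

$$\tau_2^{(k)} = \frac{1}{2}\Phi_{22.1}^{-1}\varepsilon_k - \tau_{11}^{(k)}\beta_2. \tag{6.51}$$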

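Further, by differentiating $\Theta$ with respect to $\rho_{ij}$, we have

$$\frac{\partial\Theta}{\partial\rho_{ii}} = \left(\frac{\partial\theta_{lm}}{\partial\rho_{ii}}\right), \tag{6.52}$$

where $\frac{\partial\theta_{11}}{\partial\rho_{ii}} = \beta_i^2$, $\frac{\partial\theta_{1m}}{\partial\rho_{ii}} = \beta_i$ $(m = i)$, $0$ $(m \ne i)$, $\frac{\partial\theta_{l1}}{\partial\rho_{ii}} = \beta_i$ $(l = i)$, $0$ $(l \ne i)$ and $\frac{\partial\theta_{lm}}{\partial\rho_{ii}} = 1$ $(l = m = i)$, $0$ (otherwise). For $i \ne j$

$$\frac{\partial\Theta}{\partial\rho_{ij}} = \left(\frac{\partial\theta_{lm}}{\partial\rho_{ij}}\right), \tag{6.53}$$

where $\frac{\partial\theta_{11}}{\partial\rho_{ij}} = 2\beta_i\beta_j$, $\frac{\partial\theta_{1m}}{\partial\rho_{ij}} = \beta_j$ $(m = i)$, $\beta_i$ $(m = j)$, $0$ $(m \ne i, j)$, $\frac{\partial\theta_{l1}}{\partial\rho_{ij}} = \beta_j$ $(l = i)$, $\beta_i$ $(l = j)$, $0$ $(l \ne i, j)$, and $\frac{\partial\theta_{lm}}{\partial\rho_{ij}} = 1$ $(l = i, m = j$ or $l = j, m = i)$, $0$ (otherwise) for $2 \le l, m \le 1 + G_2$. Then we have the representation

$$\operatorname{tr}\left(T^{(k)}\frac{\partial\Theta}{\partial\rho_{ij}}\right) = \begin{cases}\beta_i^2\tau_{11}^{(k)} + 2\tau_{1i}^{(k)}\beta_i + \tau_{ii}^{(k)} & (i = j) \\ 2\beta_i\beta_j\tau_{11}^{(k)} + 2\tau_{1j}^{(k)}\beta_i + 2\tau_{1i}^{(k)}\beta_j + 2\tau_{ij}^{(k)} & (i \ne j).\end{cases} \tag{6.54}$$

In the matrix form we have a simple relation as

$$\tau_{11}^{(k)}\beta_2\beta_2' + \tau_2^{(k)}\beta_2' + \beta_2\tau_2^{(k)\prime} + T_{22}^{(k)} = O. \tag{6.55}$$

Then we have the representation

$$\begin{aligned}
T_{22}^{(k)} &= -\tau_{11}^{(k)}\beta_2\beta_2' - \tau_2^{(k)}\beta_2' - \beta_2\tau_2^{(k)\prime} \\
&= \tau_{11}^{(k)}\beta_2\beta_2' - \frac{1}{2}\left[\Phi_{22.1}^{-1}\varepsilon_k\beta_2' + \beta_2\varepsilon_k'\Phi_{22.1}^{-1}\right].
\end{aligned}$$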

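Next we consider the role of the second matrix in (6.44). By differentiating (6.45) with respect to $\omega_{ij}$ $(i, j = 1, \ldots, 1 + G_2)$, we have the condition

$$c\,\frac{\partial\phi_k}{\partial g_{ij}} = -\frac{\partial\phi_k}{\partial h_{ij}} \quad (k = 2, \ldots, 1 + G_2;\ i, j = 1, \ldots, 1 + G_2)$$

evaluated at the probability limits. Let

$$S = G_1 - \sqrt{cc_*}\,H_1 = \begin{bmatrix}s_{11} & s_2' \\ s_2 & S_{22}\end{bmatrix}. \tag{6.56}$$

Since $\phi(\cdot)$ is differentiable and its first derivatives are bounded at the true parameters by assumption, the linearized estimator of $\beta_k$ in the class of our concern can be represented as

$$\begin{aligned}
\sum_{g,h=1}^{1+G_2}\tau_{gh}^{(k)}s_{gh} &= \tau_{11}^{(k)}s_{11} + 2\tau_2^{(k)\prime}s_2 + \operatorname{tr}\left[T_{22}^{(k)}S_{22}\right] \\
&= \tau_{11}^{(k)}s_{11} + \left(\varepsilon_k'\Phi_{22.1}^{-1} - 2\tau_{11}^{(k)}\beta_2'\right)s_2 + \operatorname{tr}\left[\left(\tau_{11}^{(k)}\beta_2\beta_2' - \Phi_{22.1}^{-1}\varepsilon_k\beta_2'\right)S_{22}\right] \\
&= \tau_{11}^{(k)}\left[s_{11} - 2\beta_2's_2 + \beta_2'S_{22}\beta_2\right] + \varepsilon_k'\Phi_{22.1}^{-1}(s_2 - S_{22}\beta_2) \\
&= \tau_{11}^{(k)}\beta'S\beta + \varepsilon_k'\Phi_{22.1}^{-1}(s_2, S_{22})\beta.
\end{aligned}$$

Let

$$\tau_{11} = \begin{pmatrix}\tau_{11}^{(2)} \\ \vdots \\ \tau_{11}^{(1+G_2)}\end{pmatrix} \tag{6.57}$$

and we consider the asymptotic behavior of the normalized estimator $\sqrt{n}(\hat{\beta}_2 - \beta_2)$ as

$$e = \left[\tau_{11}\beta' + (0, \Phi_{22.1}^{-1})\right]S\beta. \tag{6.58}$$

Since the asymptotic variance–covariance matrix of $S\beta$ has been obtained by the proof of Theorem 1, Theorem 2 and Lemma 5 below, we have

$$\begin{aligned}
E[e\,e'] ={}& \left[\left(\tau_{11} + \frac{1}{\sigma^2}(0, \Phi_{22.1}^{-1})\Omega\beta\right)\beta' + (0, \Phi_{22.1}^{-1})\left(I_{G_2+1} - \frac{\Omega\beta\beta'}{\beta'\Omega\beta}\right)\right] \\
&\times E[S\beta\beta'S]\left[\left(\tau_{11} + \frac{1}{\sigma^2}(0, \Phi_{22.1}^{-1})\Omega\beta\right)\beta' + (0, \Phi_{22.1}^{-1})\left(I_{G_2+1} - \frac{\Omega\beta\beta'}{\beta'\Omega\beta}\right)\right]' \\
={}& \Psi^{**} + E\left[(\beta'S\beta)^2\right]\left[\tau_{11} + (0, \Phi_{22.1}^{-1})\frac{1}{\sigma^2}\Omega\beta\right]\left[\tau_{11}' + \frac{1}{\sigma^2}\beta'\Omega\begin{pmatrix}0' \\ \Phi_{22.1}^{-1}\end{pmatrix}\right] + o(1),
\end{aligned}$$

where $\Psi^{**}$ has been given by Theorem 1 or Theorem 2. This covariance matrix is the sum of a positive semi-definite matrix of rank 1 and a positive definite matrix. It has a minimum if

$$\tau_{11} = -\frac{1}{\sigma^2}(0, \Phi_{22.1}^{-1})\Omega\beta. \tag{6.59}$$

Hence we have completed the proof of Theorem 4. □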


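Lemma 5. Under the assumptions of Theorem 2,

$$(0, I_{G_2})\left[I_{G_2+1} - \frac{\Omega\beta\beta'}{\beta'\Omega\beta}\right]E\left[S\beta\beta'S\beta \mid Z\right] = o_p(1). \tag{6.60}$$

Proof of Lemma 5. We need to evaluate each term of

$$\frac{1}{n}E\left\{\left[u'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'u - c_*u'(I_n - Z(Z'Z)^{-1}Z')u\right]\left[\Pi_{22}^{(n)\prime}Z_{2.1}'u + W_2'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'u - c_*W_2'(I_n - Z(Z'Z)^{-1}Z')u\right] \,\Big|\, Z\right\},$$

where $W_2' = V_2' - (0, I_{G_2})\Omega\beta u'/\sigma^2$.

By using calculations similar to (6.12)–(6.14) on the third and fourth order moments, it is equivalent to

$$\frac{1}{n}\sum_{i=1}^{n}\Pi_{22}^{(n)\prime}z_{in}^{*}\left(p_{ii}^{(n)} - c_*(1 - q_{ii}^{(n)})\right)E(u_i^3) + \frac{1}{n}\sum_{i=1}^{n}\left(p_{ii}^{(n)} - c_*(1 - q_{ii}^{(n)})\right)^2 E(u_i^3 w_{2i}).$$

Then by using Lemma 1, we have the desired result. □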

Proof of Theorem 5. We use the arguments in a parallel way to the proof of Theorem 1. In the nonlinear case we set

$$G = \Pi_{2n}^{(z)\prime}Z_{2.1}A_{22.1}^{-1}Z_{2.1}'\Pi_{2n}^{(z)} + V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V + \Pi_{2n}^{(z)\prime}Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V + V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'\Pi_{2n}^{(z)}$$

and

$$H = \Pi_{2n}^{(z)\prime}[I_n - ZA^{-1}Z']\Pi_{2n}^{(z)} + V'[I_n - ZA^{-1}Z']V + \Pi_{2n}^{(z)\prime}[I_n - ZA^{-1}Z']V + V'[I_n - ZA^{-1}Z']\Pi_{2n}^{(z)}, \tag{6.61}$$

where $\Pi_{2n}^{(z)} = \Pi_{2Z}^{(n)}[\beta_2, I_{G_2}]$ and $\Pi_{2Z}^{(n)}$ is given by (3.15).

Because of Condition (VII), $(1/q_n)H - (1/q_n)V'[I_n - ZA^{-1}Z']V = o_p(1)$, and then the essential arguments of the proof of Theorem 1 hold. In the third case, however, we notice that the noncentrality term (i.e. the first term) of $(1/n)G$ is of a smaller order than the second term $(1/n)V'Z_{2.1}A_{22.1}^{-1}Z_{2.1}'V$. Hence in this case because $(1/n)G \xrightarrow{p} c\,\Omega$ and $(1/q_n)H \xrightarrow{p} \Omega$, we find

$$|c\,\Omega - \operatorname{plim}\lambda_n\Omega| = 0 \tag{6.62}$$

and hence $\operatorname{plim}\lambda_n = c$. Then by using (2.8) we consider

$$\frac{n}{d_n^2}\left[\left(\frac{1}{n}G - c\,\Omega\right) - (\lambda_n - c)\Omega - c\left(\frac{1}{q_n}H - \Omega\right)\right]\operatorname{plim}\hat{\beta}_{LI} = o_p(1). \tag{6.63}$$

By evaluating each term as in the proof of Theorem 1,

$$\begin{bmatrix}\beta_2' \\ I_{G_2}\end{bmatrix}\Phi_{22.1}(\beta_2, I_{G_2})\operatorname{plim}\hat{\beta}_{LI} = o_p(1) \tag{6.64}$$

and thus $\hat{\beta}_{LI} \xrightarrow{p} \beta$ as $n \to \infty$ because $\Phi_{22.1}$ is nonsingular.

For the asymptotic normality of the LIML estimator, we use arguments similar to (6.6)–(6.8) in the proof of Theorem 1. In the present case, the equation corresponding to (6.8) becomes

$$(0, I_{G_2})\left[I_{G_2+1} - \frac{1}{\sigma^2}\Omega\beta\beta'\right](G_1 - \sqrt{cc_*}\,H_1)\beta = \Phi_{22.1}\frac{d_n^2}{\sqrt{n}}(\hat{\beta}_{2.LI} - \beta_2) + o_p(1), \tag{6.65}$$

where $G_1$ and $H_1$ are defined in a similar way as in the proof of Theorem 1. Because $d_n^2/n \to 0$, the first term of (6.9) converges to a zero vector and $\Xi_{3.2} = O$ as $n \to \infty$. Then we have the result. The proof of optimality is similar to Theorem 4, which is omitted. □

Fig. A.1. CDFs of Standardized LIML, TSLS and OLS estimators and approximations: $n - K = 30$, $K_2 = 5$, $\alpha = 0.5$, $\delta^2 = 30$, $u_i = N(0, 1)$.

Fig. A.2. CDFs of Standardized LIML, TSLS and OLS estimators and approximations: $n - K = 30$, $K_2 = 30$, $\alpha = 0.5$, $\delta^2 = 50$, $u_i = N(0, 1)$.

Fig. A.3. CDF of Standardized LIML estimator and approximations: $n - K = 100$, $K_2 = 30$, $\alpha = 0.5$, $\delta^2 = 30$, $u_i = t(5)$.


Fig. A.4. CDF of Standardized LIML estimator and approximations: $n - K = 30$, $K_2 = 5$, $\alpha = 1$, $\delta^2 = 30$, $u_i = N(0, 1)$.

Fig. A.5. CDF of Standardized LIML estimator and approximations: $n - K = 30$, $K_2 = 30$, $\alpha = 1$, $\delta^2 = 50$, $u_i = N(0, 1)$.

Fig. A.6. CDF of Standardized LIML estimator and approximations: $n - K = 100$, $K_2 = 30$, $\alpha = 1$, $\delta^2 = 30$, $u_i = (\chi^2(3) - 3)/\sqrt{6}$.

Appendix. Tables and figures

The distribution functions of the LIML estimator are shown in the Figures with the large sample normalization. The limiting distributions for the LIML estimator in the standard large sample asymptotics are $N(0, 1)$ as $n \to \infty$, which are denoted as ‘‘o’’, while the limiting distributions for the LIML estimator in the large $K_2$ asymptotics are $N(0, a)$ $(a \ge 1)$, which are denoted by the dashed curves and ‘‘x’’. For the sake of comparisons, the distribution functions of the OLS and TSLS estimators are normalized in the same way. The parameter $\alpha$ stands for the normalized coefficient of an endogenous variable, and the details of the numerical computation method are given in Anderson et al. (2005, forthcoming). The tables of t-ratios include the 5, 10, 90 and 95 percentiles, one-sided or two-sided, of the null distributions for each case. The details of the computation method are given in Matsushita (2006).

In the Figures and Tables, $u_i = N(0, 1)$ means that the disturbance terms follow $N(0, 1)$. By the same token we use $u_i = (\chi^2(3) - 3)/\sqrt{6}$ and $u_i = t(5)$ (the numbers in the parentheses are the degrees of freedom) and $z_\alpha$ stands for the $\alpha$-percentile of $N(0, 1)$. $t_{large-n}$, $t_{large-K}$, $t^{ellip}_{large-K}$, and $t^{nonnormal}_{large-K}$ stand for the alternative approximations based on different variance formulae in Section 3.

Table A.1
Null distributions of t-ratios: $n - K = 100$, $K_2 = 30$, $\delta^2 = 30$, $\alpha = 1$, $u_i = t(5)$.

                 Normal   t_{large-n}   t_{large-K}   t^{ellip}_{large-K}   t^{nonnormal}_{large-K}
X05              −1.65    −2.60         −2.02         −2.02                 −2.02
X10              −1.28    −1.96         −1.51         −1.51                 −1.51
MEDN             0        0.00          0.00          0.00                  0.00
X90              1.28     1.23          0.95          0.95                  0.95
X95              1.65     1.43          1.13          1.13                  1.13

P(t < z05)       5.0%     13.6%         8.4%          8.4%                  8.5%
P(t > z95)       5.0%     1.8%          0.2%          0.2%                  0.2%
P(|t| > z975)    5.0%     10.2%         5.5%          5.5%                  5.5%
P(|t| > z95)     10.0%    15.4%         8.7%          8.6%                  8.6%

References

Anderson, T.W., 1976. Estimation of linear functional relationships: Approximate distributions and connections to simultaneous equations in econometrics. Journal of the Royal Statistical Society, B 38, 1–36.

Anderson, T.W., 1984. Estimating linear statistical relationships. Annals of Statistics 12, 1–45.

Anderson, T.W., 2003. An Introduction to Multivariate Statistical Analysis, 3rd edition. John-Wiley.

Anderson, T.W., 2005. Origins of the limited information maximum likelihood and two-stage least squares estimators. Journal of Econometrics 127, 1–16.

Anderson, T.W., Kunitomo, N., 1992. Asymptotic distributions of regression and autoregression coefficients with martingale difference disturbances. Journal of Multivariate Analysis 40, 221–243.

Anderson, T.W., Kunitomo, N., Matsushita, Y., 2005. A new light from old wisdoms: Alternative estimation methods of simultaneous equations with possibly many instruments, Discussion Paper CIRJE-F-321, Graduate School of Economics, University of Tokyo (http://www.e.u-tokyo.ac.jp/cirje/research/dp/2005).

Anderson, T.W., Kunitomo, N., Matsushita, Y., 2008. On finite sample properties of alternative estimators of coefficients in a structural equation with many instruments, Discussion Paper CIRJE-F-576, Graduate School of Economics, University of Tokyo, The Journal of Econometrics (forthcoming).

Anderson, T.W., Kunitomo, N., Sawa, T., 1982. Evaluation of the distribution function of the limited information maximum likelihood estimator. Econometrica 50, 1009–1027.

Anderson, T.W., Rubin, H., 1949. Estimation of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics 20, 46–63.

Anderson, T.W., Rubin, H., 1950. The asymptotic properties of estimates of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics 21, 570–582.

Angrist, J.D., Krueger, A., 1991. Does compulsory school attendance affect schooling and earnings? Quarterly Journal of Economics 106, 979–1014.

Bekker, P.A., 1994. Alternative approximations to the distributions of instrumental variables estimators. Econometrica 62, 657–681.

Bekker, P.A., van der Ploeg, J., 2005. Instrumental variables estimation based on group data. Statistica Neerlandica 59 (3), 239–267.

Bound, J., Jaeger, D.A., Baker, R.M., 1995. Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variables is weak. Journal of the American Statistical Association 90, 443–450.

Chao, J., Swanson, N., 2004. Asymptotic distributions of JIVE in a heteroscedastic IV regression with many instruments, Working Paper.

Chao, J., Swanson, N., 2005. Consistent estimation with a large number of weak instruments. Econometrica 73, 1673–1692.

Chao, J., Swanson, N., 2006. Asymptotic normality of single-equation estimators for the case with a large number of instruments. In: Corbae, D., Durlauf, S., Hansen, B. (Eds.), Econometric Theory and Practice. Cambridge University Press.

Chioda, J., Jansson, 2007. Optimal invariant inference when the number of instruments is large, Unpublished Manuscript.

Donald, S., Newey, W., 2001. Choosing the number of instruments. Econometrica 69–5, 1161–1191.

Fuller, W., 1977. Some properties of a modification of the limited information estimator. Econometrica 45, 939–953.

Hahn, J., 2002. Optimal inference with many instruments. Econometric Theory 18, 140–168.

Hall, P., Heyde, C., 1980. Martingale Limit Theory and its Applications. Academic Press.

Hansen, C., Hausman, J., Newey, W.K., 2008. Estimation with many instrumental variables. Journal of Business and Economic Statistics 26–4, 398–422.

Hausman, J., Newey, W., Woutersen, T., Chao, J., Swanson, N., 2007. Instrumental variables estimation with heteroscedasticity and many instruments, Unpublished Manuscript.

Hayashi, F., 2000. Econometrics. Princeton University Press.

Hsiao, C., Tahmiscioglu, A.K., 2008. Estimation of dynamic panel data models with both individual and time specific effects, Unpublished Manuscript.

Kunitomo, N., 1980. Asymptotic expansions of distributions of estimators in a linear functional relationship and simultaneous equations. Journal of the American Statistical Association 75, 693–700.

Kunitomo, N., 1981. Asymptotic optimality of the limited information maximum likelihood estimator in large econometric models. The Economic Studies Quarterly XXXII-3, 247–266.

Kunitomo, N., 1982. Asymptotic efficiency and higher order efficiency of the limited information maximum likelihood estimator in large econometric models, Technical Report No. 365, Institute for Mathematical Studies in the Social Sciences, Stanford University.

Kunitomo, N., 1987. A third order optimum property of the ML estimator in a linear functional relationship model and simultaneous equation system in econometrics. Annals of the Institute of Statistical Mathematics 39, 575–591.

Kunitomo, N., 2008. Improving the LIML estimation with many instruments and persistent heteroscedasticity, Discussion Paper CIRJE-F-576, Graduate School of Economics, University of Tokyo.

Kunitomo, N., Matsushita, Y., 2009. Asymptotic expansions and higher order properties of semi-parametric estimators in a linear simultaneous equations. Journal of Multivariate Analysis 100, 1727–1751.

Matsushita, Y., 2006. t-Tests in a structural equation with many instruments, Discussion Paper CIRJE-F-399, Graduate School of Economics, University of Tokyo.

Morimune, K., 1983. Approximate distributions of k-class estimators when the degree of overidentification is large compared with sample size. Econometrica 51–3, 821–841.

Stock, J., Yogo, M., 2005. Asymptotic distributions of instrumental variables statistics with many instruments. In: Andrews, D., Stock, J. (Eds.), Identification and Inference for Econometric Models. Cambridge University Press.

van der Ploeg, J., Bekker, P.A., 1995. Efficiency bounds for instrumental variables estimators under group-asymptotics, Unpublished Manuscript.

van Hasselt, M., 2006. Many instruments asymptotic approximations under non-normal distributions, University of Western Ontario, Working Paper.