Non-linear Least Squares and Durbin’s Problem
Asymptotic Theory — Part V

James J. Heckman
University of Chicago
Econ 312
This draft, April 18, 2006


This lecture consists of two parts:

1. Non-linear least squares: This part examines non-linear least squares (NLLS) estimation in detail; and

2. Durbin’s problem: This part examines the correction of asymptotic variances in the case of two-stage estimators.


1 Nonlinear Least Squares

In this section, we examine in detail the Non-linear Least Squares estimator. The section is organized as follows:

• Section 1.1: Recap of the analog principle motivation for the NLLS estimator (using the extremum principle);

• Section 1.2: Consistency of the NLLS estimator;

• Section 1.3: Analogy with the OLS estimator;

• Section 1.4: Asymptotic normality of the NLLS estimator;

• Section 1.5: Discussion of asymptotic efficiency;

• Section 1.6: Estimation of $\hat\beta$.


1.1 NLLS estimator as an application of the Extremum principle

Here we recap the derivation of the NLLS estimator as an application of the Extremum principle, from Section 3.2 of the notes Asymptotic Theory II, with slight modification in notation. As noted there, we could also motivate NLLS as a moment estimator (refer to Section 3.2 of Asymptotic Theory II).

1. The model: We assume that in the population the following model holds:

$$y_t = f(x_t;\beta_0) + \varepsilon_t \qquad (1)$$

$$\phantom{y_t} = f(x_t;\beta) + [f(x_t;\beta_0) - f(x_t;\beta)] + \varepsilon_t,$$

where $x_t$ is a vector of exogenous variables. Unlike in the linear regression model, $\beta$ may not necessarily be of the same dimension as $x_t$. Since $f(x_t;\beta)$ is a nonlinear function of $x_t$ and $\beta$, (1) is called the nonlinear regression model. Assume $(x_t, \varepsilon_t)$ i.i.d., with $\varepsilon_t$ independent of $f(x_t;\beta)$. Then we can write out a least squares criterion function as below.

2. Criterion function: We choose the criterion function:

$$Q(\beta) = E\big[(y - f(x;\beta))^2\big] = E\big[f(x;\beta_0) - f(x;\beta)\big]^2 + \sigma^2.$$

Then $Q$ possesses the property that it is minimized at $\beta = \beta_0$ (the true parameter value). If $\beta = \beta_0$ is the only such value, the model is identified (wrt the criterion).


3. Analog in sample: Pick

$$Q_n(\beta) = \frac{1}{n}\sum_{t=1}^{n}\big(y_t - f(x_t;\beta)\big)^2$$

as the analog to $Q$ in the sample. As established in the OLS case in the notes Asymptotic Theory II (Section 3.2), we can show that $\operatorname{plim} Q_n(\beta) = Q(\beta)$.

4. The estimator: We construct the NLLS estimator as:

$$\hat\beta = \operatorname*{argmin}_{\beta}\, Q_n(\beta).$$

Thus we choose $\hat\beta$ to minimize $Q_n(\beta)$.

In the next few sections, we establish consistency and asymptotic normality for the NLLS estimator (under certain conditions), and discuss conditions for asymptotic efficiency. A computational sketch of the estimator follows.
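As a concrete illustration of the analog principle, here is a minimal sketch of NLLS by direct numerical minimization of the sample criterion $Q_n(\beta)$. The model $f(x;\beta) = \beta_1 e^{\beta_2 x}$, the data-generating values, and all variable names are illustrative assumptions, not part of the lecture; the same running example is reused in later sections.

```python
# Minimal NLLS sketch: minimize Q_n(b) = (1/n) * sum_t (y_t - f(x_t; b))^2.
# Model and data below are illustrative assumptions, not from the lecture.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
beta_true = np.array([2.0, 0.5])
x = rng.uniform(0.0, 2.0, n)
y = beta_true[0] * np.exp(beta_true[1] * x) + rng.normal(0.0, 0.3, n)  # i.i.d. errors

def f(x, b):
    """Nonlinear regression function f(x; b) = b1 * exp(b2 * x)."""
    return b[0] * np.exp(b[1] * x)

def Q_n(b):
    """Sample analog of the population criterion Q."""
    return np.mean((y - f(x, b)) ** 2)

beta_hat = minimize(Q_n, x0=np.array([1.0, 0.1]), method="BFGS").x
print(beta_hat)  # should be close to (2.0, 0.5)
```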


1.2 Consistency of NLLS estimator

Assume:

1. $\varepsilon_t$ i.i.d., $E(\varepsilon_t) = 0$, $E(\varepsilon_t^2) = \sigma^2$;

2. $\beta_0$ is a vector of unknown parameters;

3. $\beta_0$ is interior to the parameter space $B$;

4. $\partial f/\partial\beta$ exists and is continuous in a nbd of $\beta_0$;

5. $f(x_t;\beta)$ is continuous in $\beta$ uniformly in $t$ (i.e., for every $\epsilon > 0$ there exists $\delta > 0$ such that $|f(x_t;\beta_1) - f(x_t;\beta_2)| < \epsilon$ for $\beta_1, \beta_2$ closer than $\delta$ (i.e. $\|\beta_1 - \beta_2\| < \delta$), for all $\beta_1, \beta_2$ in a nbd of $\beta_0$ and for all $t$);

6. $\frac{1}{n}\sum_{t=1}^{n} f(x_t;\beta_1)\,f(x_t;\beta_2)$ converges uniformly in $\beta_1, \beta_2$ in a nbd of $\beta_0$;

7. $\lim \frac{1}{n}\sum_t \big(f(x_t;\beta_0) - f(x_t;\beta)\big)^2 \neq 0$ if $\beta \neq \beta_0$.

Then we have that there exists a unique root $\hat\beta$ such that:

$$\hat\beta = \operatorname*{argmin}_{\beta}\sum_t\big(y_t - f(x_t;\beta)\big)^2,$$

and that it is consistent, i.e.:

$$\hat\beta \xrightarrow{p} \beta_0.$$

Proof: Amemiya, p. 129. The proof is an application of the Extremum/Analogy Theorem for the class of estimators defined as $\hat\beta = \operatorname{argmin} Q_n(\beta)$.


1.3 Analogy with OLS estimator

Gallant (1975): Consider the NLLS model from (1) above:

$$y_t = f(x_t;\beta) + \varepsilon_t.$$

Now expand $f$ in a nbd of $\beta^*$ in a Taylor series to get:

$$y_t \simeq f(x_t;\beta^*) + \frac{\partial f(x_t;\beta)}{\partial\beta'}\bigg|_{\beta^*}(\beta - \beta^*) + \varepsilon_t.$$

Rewrite the equation as:

$$y_t - f(x_t;\beta^*) + \frac{\partial f(x_t;\beta)}{\partial\beta'}\bigg|_{\beta^*}\beta^* = \frac{\partial f(x_t;\beta)}{\partial\beta'}\bigg|_{\beta^*}\beta + \varepsilon_t.$$


Now, by analogy with the classical linear regression model, we have:

• $y_t - f(x_t;\beta^*) + \dfrac{\partial f(x_t;\beta)}{\partial\beta'}\Big|_{\beta^*}\beta^*$ is analogous to the dependent variable in OLS.

• $\dfrac{\partial f(x_t;\beta)}{\partial\beta'}\Big|_{\beta^*}$ is analogous to the independent-variables matrix in OLS.


The NLLS estimator is:

$$\hat\beta = \left[\sum_{t=1}^{n}\left(\frac{\partial f_t}{\partial\beta}\right)\left(\frac{\partial f_t}{\partial\beta'}\right)\right]^{-1}\left(\sum_{t=1}^{n}\frac{\partial f_t}{\partial\beta}\,\tilde y_t\right), \qquad (2)$$

so that in comparison to the OLS estimator we have:

• $X'X$ replaced by $\displaystyle\sum_{t=1}^{n}\left(\frac{\partial f_t}{\partial\beta}\right)\left(\frac{\partial f_t}{\partial\beta'}\right)$; and

• $X'y$ replaced by $\displaystyle\sum_{t=1}^{n}\left(\frac{\partial f_t}{\partial\beta}\right)\tilde y_t$,

where $f_t = f(x_t;\beta)$ and $\tilde y_t = y_t - f(x_t;\beta^*) + \dfrac{\partial f_t}{\partial\beta'}\Big|_{\beta^*}\beta^*$ is the constructed dependent variable.


Then the analogy with OLS goes through exactly. Now, as in the OLS case, we can do hypothesis testing, etc., using derivatives in a nbd of the optimum.

Using the analogy, we also obtain the estimator for Asy. var$(\hat\beta)$ as:

$$\widehat{\text{Asy. var}}(\hat\beta) = \hat\sigma^2\,(\tilde G'\tilde G)^{-1}, \qquad \text{where } \tilde G = \frac{\partial f(x;\beta)}{\partial\beta'}\bigg|_{\hat\beta}.$$

A sketch of this computation follows.
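Continuing the running example above, a sketch of this variance estimate: the pseudoregressor matrix (called G here, an illustrative name) stacks the rows $\partial f(x_t;\beta)/\partial\beta'$ evaluated at $\hat\beta$, and the OLS analogy gives $\hat\sigma^2(G'G)^{-1}$.

```python
# Asymptotic variance via the OLS analogy (continues the running example).
G = np.column_stack([
    np.exp(beta_hat[1] * x),                     # df/db1
    beta_hat[0] * x * np.exp(beta_hat[1] * x),   # df/db2
])
resid = y - f(x, beta_hat)
sigma2_hat = resid @ resid / n                   # error-variance estimate
avar_hat = sigma2_hat * np.linalg.inv(G.T @ G)
print(np.sqrt(np.diag(avar_hat)))                # standard errors for beta_hat
```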


1.4 Asymptotic normality

To justify large-sample normality, we need additional conditions on the model. Assuming the conditions for consistency hold, the required conditions for asymptotic normality are the following.

1. $\lim \frac{1}{n}\sum_{t=1}^{n}\dfrac{\partial f_t}{\partial\beta}\Big|_{\beta_0}\dfrac{\partial f_t}{\partial\beta'}\Big|_{\beta_0} = C$, a positive definite matrix;

2. $\frac{1}{n}\sum_{t=1}^{n}\dfrac{\partial f_t}{\partial\beta}\dfrac{\partial f_t}{\partial\beta'}$ converges uniformly to a finite matrix in an open nbd of $\beta_0$;

3. $\dfrac{\partial^2 f_t}{\partial\beta\,\partial\beta'}$ is continuous in $\beta$ in an open nbd of $\beta_0$, uniformly in $t$ ($\therefore$ we need uniform continuity of the first and second partials);

4. $\lim \frac{1}{n^2}\sum_{t=1}^{n}\left[\dfrac{\partial^2 f_t}{\partial\beta\,\partial\beta'}\right]^2 = 0$ for all $\beta$ in an open nbd of $\beta_0$; and

5. $\frac{1}{n}\sum_{t=1}^{n} f(x_t;\beta_1)\dfrac{\partial^2 f_t}{\partial\beta\,\partial\beta'}\Big|_{\beta_2}$ converges to a finite matrix uniformly for all $\beta_1, \beta_2$ in an open nbd of $\beta_0$.

Then:

$$\sqrt n\,\big(\hat\beta - \beta_0\big) \xrightarrow{d} N\big(0,\ \sigma^2 C^{-1}\big),$$

where $\sigma^2 = E(\varepsilon_t^2)$.

Sketch of proof (for a rigorous proof, see Amemiya, pp. 132-4): The intuition for this result is exactly as in Cramér’s Theorem (refer to Section 2 of notes Asymptotic Theory III).


Look at the first-order condition:

$$\frac{\partial Q_n}{\partial\beta} = -\frac{2}{n}\sum_{t=1}^{n}\big(y_t - f(x_t;\beta)\big)\frac{\partial f_t}{\partial\beta}.$$

Then, as in Cramér’s theorem (Theorem 3 in handout III), we get:

$$\sqrt n\,\frac{\partial Q_n}{\partial\beta}\bigg|_{\beta_0} = -\frac{2}{\sqrt n}\sum_{t=1}^{n}\varepsilon_t\,\frac{\partial f_t}{\partial\beta}\bigg|_{\beta_0}.$$


This is asymptotically normal (a sum of i.i.d. r.v.’s) by the Lindeberg-Levy Central Limit Theorem. Then, using equation (2), we obtain:

$$\sqrt n\,\big(\hat\beta - \beta_0\big) = \left[\frac1n\sum_{t=1}^{n}\left(\frac{\partial f_t}{\partial\beta}\right)\left(\frac{\partial f_t}{\partial\beta'}\right)\right]^{-1} \times \frac{1}{\sqrt n}\sum_{t=1}^{n}\left(\frac{\partial f_t}{\partial\beta}\right)\varepsilon_t.$$

We get that this is asymptotically normal in a nbd of $\beta_0$ if $\left[\frac1n\sum\left(\frac{\partial f_t}{\partial\beta}\right)\left(\frac{\partial f_t}{\partial\beta'}\right)\right]$ converges uniformly to a non-singular matrix (which is true by assumption). This completes the analogy with Cramér’s theorem proved in the earlier lecture. (See Amemiya for a rigorous derivation; also see the result in Gallant.)


1.5 Asymptotic efficiency of NLLS estimator

The analogy of the NLLS estimator with MLE is complete if we assume $\varepsilon$ is normal. Then we get the log likelihood function:

$$\ln L = -\frac{n}{2}\ln 2\pi\sigma^2 - \frac{1}{2\sigma^2}\sum_t\big(y_t - f(x_t;\beta)\big)^2,$$

so that here we get $\hat\beta_{\text{MLE}} = \hat\beta_{\text{NLLS}}$ (FOC and asymptotic theory as before).


Thus we obtain the general result that NLLS coincides with MLE in any nonlinear regression model if we have that $\varepsilon$ is normal. Though nonlinear regression is picking another criterion, the estimator is identical to the MLE estimator.

$\therefore$ The NLLS estimator is efficient in the normal case. In general, Greene (pp. 305-8) shows that (unless $\varepsilon$ is normal) NLLS is not necessarily asymptotically efficient.


1.6 Estimation of $\hat\beta$

Now consider the problem of numerical estimation: how do we obtain $\hat\beta$? The two commonly used methods are:

i. Newton-Raphson; and

ii. Gauss-Newton.


1.6.1 Newton-Raphson Method

In the NLLS case, we wish to find a solution to the equation $\dfrac{\partial Q_n(\beta)}{\partial\beta} = 0$. The same is true for many criteria outside of NLLS (all the criteria in Asymptotic Theory handout III).

We expand the criterion function $Q_n(\beta)$ in a nbd of an initial starting value $\hat\beta_1$, by a second-order (quadratic) Taylor series approximation, to get:

$$Q_n(\beta) \simeq Q_n(\hat\beta_1) + \frac{\partial Q_n}{\partial\beta'}\bigg|_{\hat\beta_1}(\beta - \hat\beta_1) + \frac12(\beta - \hat\beta_1)'\,\frac{\partial^2 Q_n}{\partial\beta\,\partial\beta'}\bigg|_{\hat\beta_1}(\beta - \hat\beta_1). \qquad (3)$$

This quadratic problem has a solution if the Hessian matrix $\dfrac{\partial^2 Q_n}{\partial\beta\,\partial\beta'}$ is a definite matrix (pos. def. for a min).


In equation (3), we minimize $Q_n(\beta)$ wrt $\beta$ (by taking the FOC) and obtain the algorithm:

$$\hat\beta_2 = \hat\beta_1 - \left[\frac{\partial^2 Q_n}{\partial\beta\,\partial\beta'}\right]^{-1}_{\hat\beta_1}\frac{\partial Q_n}{\partial\beta}\bigg|_{\hat\beta_1}.$$

We continue the iteration until convergence occurs. The method assumes that we can approximate $Q_n$ with a quadratic; a sketch of the iteration in code is given below. Some of the drawbacks of the method and possible fixes are then discussed.
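A minimal sketch of the iteration for the running example, with analytic gradient and full Hessian of $Q_n$ (both derived from the assumed model $f(x;\beta) = \beta_1 e^{\beta_2 x}$); the starting value is assumed to be reasonably close to the optimum, which matters for the drawbacks discussed next.

```python
# Newton-Raphson for Q_n (continues the running example).
def grad_f(x, b):
    """Rows are df(x_t; b)/db'."""
    e = np.exp(b[1] * x)
    return np.column_stack([e, b[0] * x * e])

def newton_raphson(b, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        r = y - f(x, b)                      # residuals
        G = grad_f(x, b)
        g = -(2.0 / n) * G.T @ r             # gradient of Q_n
        # Full Hessian of Q_n: outer-product term minus residual-weighted
        # second derivatives of f (d2f/db1db1 = 0 for this model).
        e = np.exp(b[1] * x)
        H12 = np.sum(r * x * e)              # sum_t r_t * d2f/db1 db2
        H22 = np.sum(r * b[0] * x**2 * e)    # sum_t r_t * d2f/db2 db2
        H = (2.0 / n) * (G.T @ G - np.array([[0.0, H12], [H12, H22]]))
        step = np.linalg.solve(H, g)
        b = b - step
        if np.max(np.abs(step)) < tol:
            break
    return b

print(newton_raphson(np.array([1.5, 0.4])))  # start near the optimum
```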

(A) Singular Hessian: There is a problem if the Hessian is singular: the method fails, as we are then unable to obtain $\left[\dfrac{\partial^2 Q_n}{\partial\beta\,\partial\beta'}\right]^{-1}_{\hat\beta_1}$.


In case the Hessian is singular, the following correction could be used: use $\alpha$ such that

$$\left[\frac{\partial^2 \ln L}{\partial\beta\,\partial\beta'} - \alpha I\right]$$

is neg. def. (correspondingly, pos. def. with $+\alpha I$ when minimizing $Q_n$). Usually we pick a scalar $\alpha$ (obviously one can pick vectors). One can then fiddle with this to get out of a nbd of local singularity. In applications of the Newton-Raphson method, one could use an idea due to T.W. Anderson (on the reading list) and note that asymptotically:

$$-E\left(\frac{\partial^2\ln L}{\partial\theta\,\partial\theta'}\right) = E\left(\frac{\partial\ln L}{\partial\theta}\,\frac{\partial\ln L}{\partial\theta'}\right)$$

to arrive at an alternative estimator for the Hessian (sometimes called BHHH, but the method is due to Anderson).


(B) Algorithm Overshoots: In this case, one could scale the step back by $\lambda$:

$$\hat\beta_2 = \hat\beta_1 - \lambda\left[\frac{\partial^2 Q_n}{\partial\beta\,\partial\beta'}\right]^{-1}_{\hat\beta_1}\frac{\partial Q_n}{\partial\beta}\bigg|_{\hat\beta_1}.$$

We choose $0 < \lambda < 1$ so that the iteration differences get dampened, reducing the chances of overshooting.


1.6.2 Gauss-Newton Method

The motivation for the Gauss-Newton method mimics exactly the NLLS set-up in Section 1.3, where we drew the analogy with OLS. Expanding $f$ in a nbd of some initial starting value $\hat\beta_1$, we get:

$$y_t - f(x_t;\hat\beta_1) + \frac{\partial f_t}{\partial\beta'}\bigg|_{\hat\beta_1}\hat\beta_1 = \frac{\partial f_t}{\partial\beta'}\bigg|_{\hat\beta_1}\beta_2 + \varepsilon_t.$$

This set-up is analogous to OLS; the LHS and part of the RHS are data once one knows (guesses) the starting value $\hat\beta_1$. Then do OLS to get the next iteration in the algorithm:

$$\hat\beta_2 = \left[\frac1n\sum_{t=1}^{n}\frac{\partial f_t}{\partial\beta}\,\frac{\partial f_t}{\partial\beta'}\right]^{-1}_{\hat\beta_1}\frac1n\sum_{t=1}^{n}\frac{\partial f_t}{\partial\beta}\bigg|_{\hat\beta_1}\left(y_t - f(x_t;\hat\beta_1) + \frac{\partial f_t}{\partial\beta'}\bigg|_{\hat\beta_1}\hat\beta_1\right),$$

so that we get:

$$\hat\beta_2 = \hat\beta_1 + \left[\frac1n\sum_{t=1}^{n}\frac{\partial f_t}{\partial\beta}\,\frac{\partial f_t}{\partial\beta'}\right]^{-1}_{\hat\beta_1}\frac1n\sum_{t=1}^{n}\frac{\partial f_t}{\partial\beta}\bigg|_{\hat\beta_1}\Big[y_t - f(x_t;\hat\beta_1)\Big].$$

Revise, update, and start all over again; a sketch in code follows. This method has the same problems as Newton-Raphson.
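A sketch for the running example: each iteration is an OLS regression of the current residuals on the pseudoregressors, reusing the grad_f helper defined in the Newton-Raphson sketch.

```python
# Gauss-Newton (continues the running example).
def gauss_newton(b, tol=1e-10, max_iter=100):
    for _ in range(max_iter):
        G = grad_f(x, b)                          # pseudoregressors at current b
        r = y - f(x, b)                           # current residuals
        step = np.linalg.solve(G.T @ G, G.T @ r)  # OLS of r on G
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b

print(gauss_newton(np.array([1.0, 0.1])))
```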

(A) Singular Hessian: As in the Newton-Raphson method, to solve for the optimum use

$$\left[\sum_t \frac{\partial f_t}{\partial\beta}\,\frac{\partial f_t}{\partial\beta'} + \alpha I\right], \qquad \alpha \text{ a scalar.}$$

(B) Algorithm Overshoots: To avoid overshooting, use the


Hartley modification. Form:

$$\Delta_1 = \left[\frac1n\sum_{t=1}^{n}\frac{\partial f_t}{\partial\beta}\,\frac{\partial f_t}{\partial\beta'}\right]^{-1}_{\hat\beta_1}\frac1n\sum_{t=1}^{n}\frac{\partial f_t}{\partial\beta}\bigg|_{\hat\beta_1}\Big[y_t - f(x_t;\hat\beta_1)\Big].$$

Then choose $0 < \lambda < 1$ such that:

$$S(\hat\beta_1 + \lambda\Delta_1) < S(\hat\beta_1), \qquad \text{where } S(\beta) = \sum_t\big(y_t - f(x_t;\beta)\big)^2.$$

Update by setting $\hat\beta_2 = \hat\beta_1 + \lambda\Delta_1$; a sketch follows. Then the algorithm converges to a root of the equation. General global convergence is a mess, unresolved.
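A sketch of the Hartley modification for the running example: compute the Gauss-Newton direction $\Delta_1$, then shrink the step until the sum of squares $S(\beta)$ actually falls (simple step-halving stands in for the choice of $\lambda$).

```python
# Hartley-modified Gauss-Newton (continues the running example).
def S(b):
    return np.sum((y - f(x, b)) ** 2)

def gauss_newton_hartley(b, tol=1e-10, max_iter=100):
    for _ in range(max_iter):
        G = grad_f(x, b)
        r = y - f(x, b)
        delta = np.linalg.solve(G.T @ G, G.T @ r)  # Delta_1
        lam = 1.0
        while S(b + lam * delta) >= S(b) and lam > 1e-8:
            lam *= 0.5                             # dampen until S decreases
        b_new = b + lam * delta
        if np.max(np.abs(b_new - b)) < tol:
            return b_new
        b = b_new
    return b

print(gauss_newton_hartley(np.array([1.0, 0.1])))
```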


1.6.3 Efficiency theorems for estimation methods

Theorem 1. One Newton-Raphson step toward an optimum is fully efficient if you start from an initial consistent estimator.

This theorem suggests a strategy for quick convergence: get a cheap (low computational cost) estimator which is consistent but not efficient, then iterate once, which avoids computational cost. (This is true also for Gauss-Newton; a sketch follows.) Note that here one must use unmodified Hessians (without corrections for overshooting or singularity).
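A sketch of the one-step idea for the running example. The pilot estimate below is just a hand-picked stand-in for a cheap consistent estimator (e.g. from a coarse grid search); a single Gauss-Newton step from it is then asymptotically equivalent to the fully iterated estimator.

```python
# One efficient step from a consistent pilot estimate (running example).
b_pilot = np.array([1.8, 0.45])     # stand-in for a cheap consistent estimator
G = grad_f(x, b_pilot)
r = y - f(x, b_pilot)
b_one_step = b_pilot + np.linalg.solve(G.T @ G, G.T @ r)
print(b_one_step)                   # close to the fully iterated NLLS estimate
```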

Proof. Suppose $\hat\beta_1 \xrightarrow{p} \beta_0$ and $\sqrt n\,(\hat\beta_1 - \beta_0) \xrightarrow{d} N(0, \Sigma)$. It is consistent but not necessarily efficient. Now expand the root


of the likelihood equation in a nbd of $\hat\beta_1$ to get:

$$\frac{\partial\ln L}{\partial\beta}\bigg|_{\hat\beta_1} = \frac{\partial\ln L}{\partial\beta}\bigg|_{\beta_0} + \frac{\partial^2\ln L}{\partial\beta\,\partial\beta'}\bigg|_{\beta_1^*}(\hat\beta_1 - \beta_0).$$

$\hat\beta_1$ does not necessarily set the left-hand side to zero; if it did, we would have an efficient estimator. As before, $\beta_1^*$ is an intermediate value.


Now look at the Newton-Raphson criterion:

$$\hat\beta_2 - \beta_0 = (\hat\beta_1 - \beta_0) - \left[\frac{\partial^2\ln L}{\partial\beta\,\partial\beta'}\right]^{-1}_{\hat\beta_1}\left[\frac{\partial\ln L}{\partial\beta}\right]_{\hat\beta_1}$$

$$\phantom{\hat\beta_2 - \beta_0} = (\hat\beta_1 - \beta_0) - \left[\frac{\partial^2\ln L}{\partial\beta\,\partial\beta'}\right]^{-1}_{\hat\beta_1}\left[\frac{\partial\ln L}{\partial\beta}\bigg|_{\beta_0} + \frac{\partial^2\ln L}{\partial\beta\,\partial\beta'}\bigg|_{\beta_1^*}(\hat\beta_1 - \beta_0)\right]$$

Multiplying by $\sqrt n$ and collecting terms, we get:

$$\sqrt n\,\big(\hat\beta_2 - \beta_0\big) = -\left[\frac1n\frac{\partial^2\ln L}{\partial\beta\,\partial\beta'}\bigg|_{\hat\beta_1}\right]^{-1}\frac{1}{\sqrt n}\frac{\partial\ln L}{\partial\beta}\bigg|_{\beta_0} + \left[I - \left[\frac1n\frac{\partial^2\ln L}{\partial\beta\,\partial\beta'}\bigg|_{\hat\beta_1}\right]^{-1}\frac1n\frac{\partial^2\ln L}{\partial\beta\,\partial\beta'}\bigg|_{\beta_1^*}\right]\sqrt n\,\big(\hat\beta_1 - \beta_0\big)$$


$$\xrightarrow{p}\ \mathcal I_{\beta_0\beta_0}^{-1}\,\frac{1}{\sqrt n}\frac{\partial\ln L}{\partial\beta}\bigg|_{\beta_0} + \Big[\big(\mathcal I_{\beta_0\beta_0}^{-1}\,\mathcal I_{\beta_0\beta_0} - I\big)\,\sqrt n\,\big(\hat\beta_1 - \beta_0\big)\Big].$$

The second term vanishes, as $\sqrt n\,(\hat\beta_1 - \beta_0)$ is $O_p(1)$ while the bracketed matrix converges to zero. Therefore, one Newton-Raphson step satisfies the likelihood equation at $\beta_0$ asymptotically, and $\hat\beta_2$ has the efficient limiting distribution.


The same result obviously holds for Gauss-Newton: one G-N step from a consistent estimator is fully efficient (or at least as efficient as NLLS).

Thus starting from a consistent estimator (where possible) saves computer time, avoids problems of nonlinear optimization, and also avoids the local optimization problem (i.e., the possibility of arriving at an inconsistent local optimum).


2 “Durbin Problem”

Durbin’s problem is the question of arriving at the correct variance-covariance matrix for a set of parameters estimated in the second step of a two-step estimation procedure.

For example, let $\theta_0 = (\bar\theta_1, \bar\theta_2)$, where $\bar\theta_1, \bar\theta_2$ are “true values”, as in the case of the composite hypothesis considered in the earlier lecture (Asymptotic Theory IV).


Suppose we use an initial consistent estimator $\tilde\theta_2$ for $\theta_2$. Then, if we treat the likelihood as if $\theta_2$ were known (but it is estimated by $\tilde\theta_2$), we have:

$$\underbrace{\frac{1}{\sqrt n}\frac{\partial\ln L}{\partial\theta_1}\bigg|_{\hat\theta_1,\tilde\theta_2}}_{=0} = \frac{1}{\sqrt n}\frac{\partial\ln L}{\partial\theta_1}\bigg|_{\theta_0} + \frac1n\frac{\partial^2\ln L}{\partial\theta_1\,\partial\theta_1'}\bigg|_{*}\sqrt n\,\big(\hat\theta_1 - \bar\theta_1\big) + \underbrace{\frac1n\frac{\partial^2\ln L}{\partial\theta_1\,\partial\theta_2'}\bigg|_{*}\sqrt n\,\big(\tilde\theta_2 - \bar\theta_2\big)}_{\text{“Durbin Problem”}}$$

We assume the sample sizes are the same in both samples.


which implies

$$\sqrt n\,\big(\hat\theta_1 - \bar\theta_1\big) = \mathcal I_{11}^{-1}\,\frac{1}{\sqrt n}\frac{\partial\ln L}{\partial\theta_1}\bigg|_{\theta_0} - \mathcal I_{11}^{-1}\mathcal I_{12}\,\sqrt n\,\big(\tilde\theta_2 - \bar\theta_2\big),$$

where $\mathcal I_{ij} \equiv -\operatorname{plim}\dfrac1n\dfrac{\partial^2\ln L}{\partial\theta_i\,\partial\theta_j'}$, and $\tilde L$ is the likelihood (with its own sample size) used to produce $\tilde\theta_2$.


Thus, to obtain the right covariance matrix for $\hat\theta_1$, we need the covariance between the two score vectors. We have:

$$\sqrt{\tilde n}\,\big(\tilde\theta_2 - \bar\theta_2\big) = -\left(\frac{1}{\tilde n}\frac{\partial^2\ln\tilde L}{\partial\theta_2\,\partial\theta_2'}\right)^{-1}\frac{1}{\sqrt{\tilde n}}\frac{\partial\ln\tilde L}{\partial\theta_2} = \tilde{\mathcal I}_{22}^{-1}\,\frac{1}{\sqrt{\tilde n}}\frac{\partial\ln\tilde L}{\partial\theta_2},$$

which implies

$$\sqrt n\,\big(\hat\theta_1 - \bar\theta_1\big) = \mathcal I_{11}^{-1}\,\frac{1}{\sqrt n}\frac{\partial\ln L}{\partial\theta_1}\bigg|_{\theta_0} - \mathcal I_{11}^{-1}\mathcal I_{12}\,\tilde{\mathcal I}_{22}^{-1}\,\frac{1}{\sqrt{\tilde n}}\frac{\partial\ln\tilde L}{\partial\theta_2}\bigg|_{\theta_0}.$$


We need to compute this covariance to get the right standard errors. Just form a new covariance matrix:

$$V\big(\hat\theta_1\big) = \mathcal I_{11}^{-1} + \mathcal I_{11}^{-1}\mathcal I_{12}\,\tilde{\mathcal I}_{22}^{-1}\,\mathcal I_{21}\,\mathcal I_{11}^{-1} - \mathcal I_{11}^{-1}\mathcal I_{12}\,\tilde{\mathcal I}_{22}^{-1}\operatorname{Cov}(S_2, S_1)\,\mathcal I_{11}^{-1} - \mathcal I_{11}^{-1}\operatorname{Cov}(S_1, S_2)\,\tilde{\mathcal I}_{22}^{-1}\,\mathcal I_{21}\,\mathcal I_{11}^{-1} \qquad (*)$$

where (now we assume two different sample sizes):


$$S_1 = \frac{1}{\sqrt n}\frac{\partial\ln L}{\partial\theta_1}\bigg|_{\hat\theta_1,\tilde\theta_2},$$

where $n$ is the sample size for the primary sample, and

$$S_2 = \frac{1}{\sqrt{\tilde n}}\frac{\partial\ln\tilde L}{\partial\theta_2}\bigg|_{\tilde\theta_2},$$

where $\tilde n$ is the sample size for the sample used to get $\tilde\theta_2$.

In the independent-sample case, the last two terms in (*) vanish, so that we get:

$$V\big(\hat\theta_1\big) = \mathcal I_{11}^{-1} + \mathcal I_{11}^{-1}\mathcal I_{12}\,\tilde{\mathcal I}_{22}^{-1}\,\mathcal I_{21}\,\mathcal I_{11}^{-1}.$$

A numerical sketch of this correction follows.
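A schematic sketch of the independent-samples correction. The information blocks below are made-up illustrative numbers; the point is only the matrix algebra: the corrected variance adds the sandwich term to the naive $\mathcal I_{11}^{-1}$.

```python
# Durbin correction, independent-samples case (illustrative numbers).
import numpy as np

I11 = np.array([[4.0, 1.0],
                [1.0, 3.0]])            # information block for theta_1
I12 = np.array([[0.5],
                [0.2]])                 # cross block theta_1 x theta_2
I22_tilde = np.array([[2.0]])           # block from the first-step sample

I11_inv = np.linalg.inv(I11)
V_naive = I11_inv                        # wrong: ignores estimation of theta_2
V_corrected = I11_inv + I11_inv @ I12 @ np.linalg.inv(I22_tilde) @ I12.T @ I11_inv

print(np.diag(V_naive))      # naive variances
print(np.diag(V_corrected))  # corrected variances are larger
```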


2.1 Concentrated Likelihood Problem

This problem seems similar to, but is actually different from, the Durbin problem. Here we have a log likelihood function which has two sets of parameters, $\ln L(\theta_1, \theta_2)$.

In the first step here, we solve

$$\frac{\partial\ln L}{\partial\theta_2}(\theta_1, \theta_2) = 0 \quad\text{to get}\quad \theta_2(\theta_1).$$

We then optimize $\ln L(\theta_1, \theta_2(\theta_1))$ with respect to $\theta_1$.

While this looks like the two-step estimator in Durbin’s problem, here we are not using an estimate of $\theta_2$, but rather using the function $\theta_2(\theta_1)$. In fact, here we can show that this is the same as joint maximization of $\ln L$.


Using the Envelope Theorem (i.e., utilizing the fact that $\theta_2(\theta_1)$ is arrived at through an optimization), we get:

$$\frac{d\ln L(\theta_1, \theta_2(\theta_1))}{d\theta_1} = \frac{\partial\ln L(\theta_1, \theta_2(\theta_1))}{\partial\theta_1}$$

$$\frac{d^2\ln L(\theta_1, \theta_2(\theta_1))}{d\theta_1\,d\theta_1'} = \frac{\partial^2\ln L(\theta_1, \theta_2(\theta_1))}{\partial\theta_1\,\partial\theta_1'} + \frac{\partial^2\ln L(\theta_1, \theta_2(\theta_1))}{\partial\theta_1\,\partial\theta_2'}\frac{\partial\theta_2}{\partial\theta_1'}$$

Now, we also have (differentiating the first-step FOC):

$$\frac{d}{d\theta_1'}\left[\frac{\partial\ln L}{\partial\theta_2}\right] = \frac{\partial^2\ln L}{\partial\theta_2\,\partial\theta_1'} + \frac{\partial^2\ln L}{\partial\theta_2\,\partial\theta_2'}\frac{\partial\theta_2}{\partial\theta_1'} = 0$$

$$\Rightarrow\quad \frac{\partial\theta_2}{\partial\theta_1'} = -\left(\frac{\partial^2\ln L}{\partial\theta_2\,\partial\theta_2'}\right)^{-1}\frac{\partial^2\ln L}{\partial\theta_2\,\partial\theta_1'}$$


Substituting into the previous expression, we get:

$$\operatorname*{plim}\,\frac1n\frac{d^2\ln L(\theta_1, \theta_2(\theta_1))}{d\theta_1\,d\theta_1'} = -\big(\mathcal I_{11} - \mathcal I_{12}\,\mathcal I_{22}^{-1}\,\mathcal I_{21}\big)$$

$\Rightarrow$ The asymptotic distribution is the same for $\theta_1$ whether we estimate jointly or through the concentrated likelihood approach. A sketch follows.

(Refer to Asymptotic Theory — Lecture IV, section on the composite hypothesis, for the distribution of a sub-vector of parameters when estimation is done jointly.)
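A sketch of concentration in a case where it can be checked directly: a normal linear regression, concentrating out $\sigma^2$. Solving $\partial\ln L/\partial\sigma^2 = 0$ gives $\sigma^2(\beta) = \text{RSS}(\beta)/n$; maximizing the concentrated likelihood over $\beta$ reproduces the joint MLE (OLS here). All names and data are illustrative assumptions.

```python
# Concentrated vs. joint likelihood (illustrative normal regression).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, -0.5]) + rng.normal(0.0, 0.7, n)

def neg_concentrated_loglik(b):
    sigma2_b = np.mean((y - X @ b) ** 2)   # inner solution sigma2(b) = RSS/n
    return 0.5 * n * np.log(sigma2_b)      # -lnL(b, sigma2(b)) up to constants

b_conc = minimize(neg_concentrated_loglik, x0=np.zeros(2)).x
b_joint = np.linalg.lstsq(X, y, rcond=None)[0]   # joint MLE of b is OLS here
print(b_conc, b_joint)                            # the two agree
```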
