TRANSCRIPT
Part 6: MLE for RE Models [ 1/38]
Econometric Analysis of Panel Data
William Greene
Department of Economics
Stern School of Business
Part 6: MLE for RE Models [ 2/38]
The Random Effects Model

The random effects model:

y_it = x_it'β + c_i + ε_it,  observation for person i at time t

c_i is uncorrelated with x_it for all t;  E[c_i | X_i] = 0
E[ε_it | X_i, c_i] = 0

Stacking the T_i observations for group i,

y_i = X_i β + c_i i + ε_i,   T_i observations in group i
    = X_i β + c_i + ε_i,    note c_i = (c_i, c_i, ..., c_i)'

and stacking the N groups,

y = Xβ + c + ε,   Σ_{i=1}^N T_i observations in the sample
c = (c_1', c_2', ..., c_N')',  a Σ_{i=1}^N T_i by 1 vector
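As a concrete illustration (an addition, not from the slides; all parameter values are arbitrary), the stacked model can be simulated directly. The common effect c_i is drawn once per person and repeated across that person's T_i rows:

```python
import numpy as np

# Simulate y = X beta + c + e with a person-specific effect c_i repeated
# within each group; group sizes T_i differ (unbalanced panel).
rng = np.random.default_rng(0)
N, beta = 500, np.array([1.0, -0.5])
T = rng.integers(2, 6, size=N)               # unbalanced: T_i in {2,...,5}
rows = np.repeat(np.arange(N), T)            # person index for each row
X = rng.normal(size=(T.sum(), 2))
sigma_c, sigma_e = 0.7, 1.0
c = sigma_c * rng.normal(size=N)             # one draw of c_i per person
y = X @ beta + c[rows] + sigma_e * rng.normal(size=T.sum())

# The shared c_i makes observations within a group equicorrelated:
rho = sigma_c ** 2 / (sigma_c ** 2 + sigma_e ** 2)
print(round(rho, 3))   # 0.329
```

The implied within-group correlation ρ is the quantity the GLS/MLE machinery in the following slides exploits.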
Part 6: MLE for RE Models [ 3/38]
Error Components Model

A generalized regression model:

y_it = x_it'β + ε_it + u_i
E[ε_it | X] = 0
E[ε_it² | X] = σ_ε²
E[u_i | X] = 0
E[u_i² | X] = σ_u²

y_i = X_i β + ε_i + u_i i   for the T_i observations in group i

Var[ε_i + u_i i] = σ_ε² I + σ_u² ii'
Part 6: MLE for RE Models [ 4/38]
Notation

Var[ε_i + u_i i] = σ_ε² I_{T_i} + σ_u² ii'
                 = Ω_i

Var[w | X] = | Ω_1  0   ...  0   |
             | 0    Ω_2 ...  0   |
             | ...  ...  ... ... |
             | 0    0   ...  Ω_N |

(Note these Ω_i differ only in the dimension T_i.)
Part 6: MLE for RE Models [ 5/38]
Maximum Likelihood
Assuming normality of ε and u, treat the T_i joint observations on [(ε_i1, ε_i2, ..., ε_iT_i), u_i] as one T_i-variate observation. The mean vector of ε_i + u_i i is zero and the covariance matrix is Ω_i = σ_ε² I + σ_u² ii'.

The joint density for ε_i = y_i − X_iβ is

f(ε_i) = (2π)^(-T_i/2) |Ω_i|^(-1/2) exp[ (-1/2)(y_i − X_iβ)'Ω_i^(-1)(y_i − X_iβ) ]

logL = Σ_{i=1}^N logL_i, where

logL_i(β, Ω_i) = (-1/2)[ T_i log 2π + log|Ω_i| + (y_i − X_iβ)'Ω_i^(-1)(y_i − X_iβ) ]
               = (-1/2)[ T_i log 2π + log|Ω_i| + ε_i'Ω_i^(-1)ε_i ]
Part 6: MLE for RE Models [ 6/38]
MLE Panel Data Algebra (1)
Ω_i = σ_ε² I + σ_u² ii'

Ω_i^(-1) = (1/σ_ε²) [ I − (σ_u²/(σ_ε² + T_iσ_u²)) ii' ]

So,

ε_i'Ω_i^(-1)ε_i = (1/σ_ε²) [ ε_i'ε_i − (σ_u²/(σ_ε² + T_iσ_u²)) (T_i ε̄_i)² ]

where ε̄_i = (1/T_i) Σ_t ε_it.
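The inverse has a simple numeric check (an added illustration, arbitrary values): the claimed inverse and Ω_i multiply to the identity.

```python
import numpy as np

# Omega_i = s_e^2 I + s_u^2 ii'  and its claimed inverse multiply to I.
T, s_e2, s_u2 = 5, 1.3, 0.6
one = np.ones((T, 1))
Omega = s_e2 * np.eye(T) + s_u2 * (one @ one.T)
Omega_inv = (1.0 / s_e2) * (np.eye(T)
             - (s_u2 / (s_e2 + T * s_u2)) * (one @ one.T))
print(np.allclose(Omega @ Omega_inv, np.eye(T)))   # True
```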
Part 6: MLE for RE Models [ 7/38]
MLE Panel Data Algebra (1, cont.)
Ω_i = σ_ε² I + σ_u² ii' = σ_ε² [ I + (σ_u²/σ_ε²) ii' ] = σ_ε² A

|Ω_i| = (σ_ε²)^(T_i) |A|,   λ = a characteristic root of A

Roots are (real, since A is symmetric) solutions to Ac = λc:

Ac = c + (σ_u²/σ_ε²) i(i'c) = λc,   or   (σ_u²/σ_ε²) i(i'c) = (λ − 1)c

Any vector c whose elements sum to zero (i'c = 0) is a characteristic vector that corresponds to root λ = 1. There are T_i − 1 such vectors, so T_i − 1 of the roots are 1.

Suppose i'c ≠ 0. Premultiply by i' to find

(σ_u²/σ_ε²)(i'i)(i'c) = (λ − 1)(i'c),   i.e.,   T_i(σ_u²/σ_ε²)(i'c) = (λ − 1)(i'c).

Since i'c ≠ 0, divide by it to obtain the remaining root λ = 1 + T_iσ_u²/σ_ε².

Therefore, |Ω_i| = (σ_ε²)^(T_i) (1)^(T_i − 1) (1 + T_iσ_u²/σ_ε²) = (σ_ε²)^(T_i) (1 + T_iσ_u²/σ_ε²).
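The determinant result can likewise be verified numerically (an added illustration with arbitrary values):

```python
import numpy as np

# Check: |Omega_i| = (s_e^2)^T * (1 + T s_u^2 / s_e^2).
T, s_e2, s_u2 = 6, 0.8, 0.5
one = np.ones((T, 1))
Omega = s_e2 * np.eye(T) + s_u2 * (one @ one.T)
direct = np.linalg.det(Omega)
formula = s_e2 ** T * (1.0 + T * s_u2 / s_e2)
print(direct, formula)   # the two agree
```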
Part 6: MLE for RE Models [ 8/38]
MLE Panel Data Algebra (1, conc.)
Collecting the results, with ε̄_i = (1/T_i) Σ_t ε_it,

logL_i = (-1/2)[ T_i log 2π + log|Ω_i| + ε_i'Ω_i^(-1)ε_i ]
       = (-1/2)[ T_i log 2π + T_i log σ_ε² + log(1 + T_iσ_u²/σ_ε²)
                 + (1/σ_ε²)( ε_i'ε_i − (σ_u²/(σ_ε² + T_iσ_u²))(T_iε̄_i)² ) ]

logL = Σ_{i=1}^N logL_i

Let ρ = σ_u²/σ_ε². Since (1/σ_ε²)(σ_u²/(σ_ε² + T_iσ_u²))(T_iε̄_i)² = (T_iρ/(1 + T_iρ)) T_iε̄_i²/σ_ε²,

logL_i = (-1/2)[ T_i(log 2π + log σ_ε²) + ε_i'ε_i/σ_ε² − (T_iρ/(1 + T_iρ)) T_iε̄_i²/σ_ε² + log(1 + T_iρ) ]
Part 6: MLE for RE Models [ 9/38]
Maximizing the Log Likelihood

Difficult: "brute force" + some elegant theoretical results. See Baltagi, pp. 22-23. (Back and forth from GLS to σ_ε² and σ_u².)

Somewhat less difficult and more practical: at any iteration, given estimates of σ_ε² and σ_u², the estimator of β is GLS (of course), so we iterate back and forth between these. See Hsiao, pp. 39-40.

0. Begin iterations with, say, FGLS estimates of β, σ_ε², σ_u².
1. Given σ̂²_ε,r and σ̂²_u,r, compute β̂_r+1 by FGLS(σ̂²_ε,r, σ̂²_u,r).
2. Given β̂_r+1, compute σ̂²_ε,r+1 = Σ_{i=1}^N ε̂'_i,r+1 M_D ε̂_i,r+1 / Σ_{i=1}^N (T_i − 1), where M_D is the within-group (deviations from group means) transformation.
3. Given β̂_r+1 and σ̂²_ε,r+1, compute σ̂²_u,r+1 = (1/N) Σ_{i=1}^N ē²_i,r+1 − σ̂²_ε,r+1/T̄, where ē_i,r+1 is the mean residual for group i.
4. Return to step 1 and repeat until β̂_r+1 − β̂_r = 0.
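The iteration can be sketched in code. This is a hedged sketch, not a transcription of the slide: the GLS step uses the standard quasi-demeaning form, and the exact variance-update formulas are one common implementation choice.

```python
import numpy as np

# Back-and-forth iteration: GLS given current variances, then variances
# given current beta, until beta stops changing.
def re_iterate(y, X, groups, tol=1e-8, max_iter=200):
    N = groups.max() + 1
    T = np.bincount(groups).astype(float)              # group sizes T_i
    s_e2, s_u2 = 1.0, 1.0
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        # 1. FGLS via quasi-demeaning: theta_i = 1 - s_e/sqrt(s_e^2 + T_i s_u^2)
        theta = 1.0 - np.sqrt(s_e2 / (s_e2 + T * s_u2))
        ybar = np.bincount(groups, weights=y) / T
        Xbar = np.column_stack([np.bincount(groups, weights=X[:, k]) / T
                                for k in range(X.shape[1])])
        beta_new = np.linalg.lstsq(X - theta[groups, None] * Xbar[groups],
                                   y - theta[groups] * ybar[groups],
                                   rcond=None)[0]
        # 2. s_e^2 from within-group residual variation
        e = y - X @ beta_new
        ebar = np.bincount(groups, weights=e) / T
        s_e2 = np.sum((e - ebar[groups]) ** 2) / (T.sum() - N)
        # 3. s_u^2 from between-group variation (kept nonnegative)
        s_u2 = max(np.mean(ebar ** 2) - s_e2 * np.mean(1.0 / T), 1e-12)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, s_e2, s_u2
        beta = beta_new
    return beta, s_e2, s_u2
```

On simulated error-components data this recovers β and the two variance components to sampling accuracy, typically in a handful of iterations.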
Part 6: MLE for RE Models [ 10/38]
Direct Maximization of LogL
Simpler: take advantage of the invariance of maximum likelihood estimators to transformations of the parameters. Let

δ = 1/σ_ε²,   τ = σ_u²/σ_ε²,   R_i = T_iτ + 1,   Q_i = τ/R_i,

logL = (-1/2) Σ_{i=1}^N [ δ( ε_i'ε_i − Q_i(T_iε̄_i)² ) + log R_i − T_i log δ + T_i log 2π ]

This can be maximized using ordinary optimization methods (not Newton, as suggested by Hsiao). Treat it as a standard nonlinear optimization problem. Solve with iterative, gradient methods.
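A sketch of this direct maximization using the reparameterized logL. Optimizing over log(δ) and log(τ) to keep both positive is my device, not part of the slide:

```python
import numpy as np
from scipy.optimize import minimize

# Negative of the reparameterized logL above; params = [beta, log delta, log tau].
def neg_logL(params, y, X, groups, T):
    K = X.shape[1]
    beta = params[:K]
    delta, tau = np.exp(params[K]), np.exp(params[K + 1])
    e = y - X @ beta
    Se = np.bincount(groups, weights=e)        # T_i * ebar_i
    See = np.bincount(groups, weights=e * e)   # e_i'e_i
    R = 1.0 + T * tau
    Q = tau / R
    li = -0.5 * (delta * (See - Q * Se ** 2) + np.log(R)
                 - T * np.log(delta) + T * np.log(2.0 * np.pi))
    return -np.sum(li)

# Usage: res = minimize(neg_logL, np.zeros(K + 2), args=(y, X, groups, T),
#                       method="BFGS"); then sigma_e^2 = 1/exp(res.x[K]).
```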
Part 6: MLE for RE Models [ 11/38]
Part 6: MLE for RE Models [ 12/38]
Part 6: MLE for RE Models [ 13/38]
Maximum Simulated Likelihood
Assume ε_it and u_i are normally distributed. Write u_i = σ_u v_i where v_i ~ N[0,1]. Then y_it = x_it'β + ε_it + σ_u v_i. If v_i were observed data, all observations would be independent, and

log f(y_it | x_it, v_i) = (-1/2)[ log 2π + log σ_ε² + (y_it − x_it'β − σ_u v_i)²/σ_ε² ]

The log of the joint density for the T_i observations with common v_i is

logL_i(β, σ_ε², σ_u | v_i) = (-1/2) Σ_{t=1}^{T_i} [ log 2π + log σ_ε² + (y_it − x_it'β − σ_u v_i)²/σ_ε² ]

The conditional log likelihood for the sample is then

logL(β, σ_ε², σ_u | v) = (-1/2) Σ_{i=1}^N Σ_{t=1}^{T_i} [ log 2π + log σ_ε² + (y_it − x_it'β − σ_u v_i)²/σ_ε² ]
Part 6: MLE for RE Models [ 14/38]
Likelihood Function for Individual i
The conditional log likelihood for the sample is

logL(β, σ_ε², σ_u | v) = (-1/2) Σ_{i=1}^N Σ_{t=1}^{T_i} [ log 2π + log σ_ε² + (y_it − x_it'β − σ_u v_i)²/σ_ε² ]

The unconditional log likelihood is obtained by integrating v_i out of L_i(β, σ_ε², σ_u | v_i):

L_i(β, σ_ε², σ_u) = ∫ Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − σ_u v_i)²/(2σ_ε²) ) / (σ_ε √(2π)) ] φ(v_i) dv_i
                  = E_{v_i}[ L_i(β, σ_ε², σ_u | v_i) ]

The integral usually does not have a closed form. (For the normal distribution above, actually, it does. We used that earlier. We ignore that for now.)
Part 6: MLE for RE Models [ 15/38]
Log Likelihood Function
The full log likelihood function that needs to be maximized is

logL = Σ_{i=1}^N log L_i(β, σ_ε², σ_u)
     = Σ_{i=1}^N log ∫ Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − σ_u v_i)²/(2σ_ε²) ) / (σ_ε √(2π)) ] φ(v_i) dv_i
     = Σ_{i=1}^N log E_{v_i}[ L_i(β, σ_ε², σ_u | v_i) ]

This is the function to be maximized to obtain the MLE of [β, σ_ε², σ_u].
Part 6: MLE for RE Models [ 16/38]
Computing the Expected LogL
How to compute the integral: first note that φ(v_i) = exp(−v_i²/2)/√(2π), so each term has the form

∫ Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − σ_u v_i)²/(2σ_ε²) ) / (σ_ε √(2π)) ] φ(v_i) dv_i = E_{v_i}[ L_i(β, σ_ε², σ_u | v_i) ]

(1) Numerical (Gauss-Hermite) quadrature for integrals of the form

∫ e^(−v²) g(v) dv ≈ Σ_{h=1}^H w_h g(a_h)

is remarkably accurate.

Example: Hermite quadrature nodes and weights, H = 5.
Nodes:   -2.02018, -0.95857, 0.00000, 0.95857, 2.02018
Weights:  0.019953, 0.393619, 0.945309, 0.393619, 0.019953

Applications usually use many more points, up to 96, and much more accurate (more digits) representations.
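numpy ships the Hermite nodes and weights directly; a short check (added illustration) of the H = 5 values above and of the rule applied to E[v²] = 1 for v ~ N(0,1):

```python
import numpy as np

# numpy provides the Gauss-Hermite nodes a_h and weights w_h.
nodes, weights = np.polynomial.hermite.hermgauss(5)
print(nodes)     # the five nodes listed above
print(weights)   # the five weights listed above

# Check the rule on E[v^2] = 1 for v ~ N(0,1), using v = sqrt(2) a:
approx = (1.0 / np.sqrt(np.pi)) * np.sum(weights * (np.sqrt(2.0) * nodes) ** 2)
print(approx)    # 1.0 up to rounding
```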
Part 6: MLE for RE Models [ 17/38]
Quadrature
A change of variable is needed to get the integral into the right form. Each term then becomes

L_{i,Q} = (1/√π) Σ_{h=1}^H w_h Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − √2 σ_u a_h)²/(2σ_ε²) ) / (σ_ε √(2π)) ]

and the problem is solved by maximizing

logL_Q = Σ_{i=1}^N log L_{i,Q}

with respect to β, σ_ε², σ_u. (Maximization will be continued later in the semester.)
Part 6: MLE for RE Models [ 18/38]
Gauss-Hermite Quadrature
∫ Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − σ_u v_i)²/(2σ_ε²) ) / (σ_ε √(2π)) ] φ(v_i) dv_i,
with φ(v_i) = exp(−v_i²/2)/√(2π).

Make a change of variable to a_i = v_i/√2, so v_i = √2 a_i and dv_i = √2 da_i:

= ∫ (1/√(2π)) exp(−a_i²) √2 Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − √2 σ_u a_i)²/(2σ_ε²) ) / (σ_ε √(2π)) ] da_i

= (1/√π) ∫ exp(−a_i²) Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − √2 σ_u a_i)²/(2σ_ε²) ) / (σ_ε √(2π)) ] da_i

= (1/√π) ∫ exp(−a_i²) g(a_i) da_i ≈ (1/√π) Σ_{h=1}^H w_h g(a_h)
Part 6: MLE for RE Models [ 19/38]
Simulation
The unconditional log likelihood is an expected value:

logL_i(β, σ_ε², σ_u) = log ∫ Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − σ_u v_i)²/(2σ_ε²) ) / (σ_ε √(2π)) ] φ(v_i) dv_i
                     = log E_{v_i}[ L_i(β, σ_ε², σ_u | v_i) ] = log E_v[ g(v_i) ]

An expected value can be 'estimated' by sampling R observations and averaging them:

Ê_v[g(v_i)] = (1/R) Σ_{r=1}^R Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − σ_u v_ir)²/(2σ_ε²) ) / (σ_ε √(2π)) ]

The unconditional log likelihood function is then

logL_S = Σ_{i=1}^N log { (1/R) Σ_{r=1}^R Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − σ_u v_ir)²/(2σ_ε²) ) / (σ_ε √(2π)) ] }

This is a function of (β, σ_ε², σ_u | y_i, X_i, v_{i,1}, ..., v_{i,R}), i = 1,...,N. The random draws on v_i become part of the data, and the function is maximized with respect to the unknown parameters.
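The averaging idea can be demonstrated for a single group (an added illustration with made-up numbers). Because everything here is normal, the exact integral is available for comparison, and the simulated average converges to it as R grows:

```python
import numpy as np

rng = np.random.default_rng(3)
T, s_e, s_u = 4, 1.0, 0.8
e = rng.normal(size=T)          # e_it = y_it - x_it'beta for one group

def cond_L(v):                  # L_i given v_i: product of normal densities
    z = (e[:, None] - s_u * v[None, :]) / s_e
    return np.prod(np.exp(-0.5 * z ** 2) / (s_e * np.sqrt(2 * np.pi)), axis=0)

# Exact value of the integral: e_i ~ N(0, s_e^2 I + s_u^2 ii')
Omega = s_e ** 2 * np.eye(T) + s_u ** 2 * np.ones((T, T))
exact = np.exp(-0.5 * e @ np.linalg.solve(Omega, e)) / np.sqrt(
    (2 * np.pi) ** T * np.linalg.det(Omega))

for R in (100, 10000, 1000000):
    v = rng.normal(size=R)
    print(R, cond_L(v).mean(), exact)   # average approaches exact as R grows
```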
Part 6: MLE for RE Models [ 20/38]
Convergence Results
Target is the expected log likelihood:

Σ_{i=1}^N log E_v[ L_i(β, σ² | v_i) ]

The simulation estimator is based on random sampling from the population of v_i:

LogL_S(β, σ²) = Σ_{i=1}^N log { (1/R) Σ_{r=1}^R Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − σ_u v_ir)²/(2σ_ε²) ) / (σ_ε √(2π)) ] }

The essential result is

plim (R→∞) LogL_S(β, σ²) = Σ_{i=1}^N log E_v[ L_i(β, σ² | v_i) ]

Conditions:
(1) General regularity and smoothness of the log likelihood.
(2) R increases faster than √N. ('Intelligent draws,' e.g., Halton sequences, make this somewhat ambiguous.)

Result: the maximizer of LogL_S(β, σ²) converges to the maximizer of Σ_{i=1}^N log E_v[ L_i(β, σ² | v_i) ].
Part 6: MLE for RE Models [ 21/38]
MSL vs. ML
0.15427² = 0.023799
Part 6: MLE for RE Models [ 22/38]
Two Level Panel Data
Nested by construction. Unbalanced panels.
No real obstacle to estimation, but some inconvenient algebra: in two-step FGLS of the RE model, we need "1/T" to solve for an estimate of σ_u². What to use?

(1/T)-bar = (1/N) Σ_{i=1}^N (1/T_i)   (early NLOGIT)
Q_H = [ (1/N) Σ_{i=1}^N (1/T_i) ]^(-1), the harmonic mean of the T_i   (Stata)
Q = 1/T̄, based on the arithmetic mean   (TSP, current NLOGIT; do not use this.)
Part 6: MLE for RE Models [ 23/38]
Balanced Nested Panel Data
z_ijkt = test score for student t, teacher k, school j, district i
L = 2 school districts, i = 1,…,L
M_i = 3 schools in each district, j = 1,…,M_i
N_ij = 4 teachers in each school, k = 1,…,N_ij
T_ijk = 20 students in each class, t = 1,…,T_ijk
Antweiler, W., “Nested Random Effects Estimation in Unbalanced Panel Data,” Journal of Econometrics, 101, 2001, pp. 295-313.
Part 6: MLE for RE Models [ 24/38]
Nested Effects Model
y_ijkt = x_ijkt'β + u_ijk + v_ij + w_i + ε_ijkt

Strict exogeneity; all parts uncorrelated. (A normality assumption is added later.)

Var[u_ijk + v_ij + w_i + ε_ijkt] = σ_u² + σ_v² + σ_w² + σ_ε²

The overall covariance matrix Ω is block diagonal over i; each diagonal block is block diagonal over j; each of these, in turn, is block diagonal over k; and each lowest-level block has the form of the Ω_i we saw earlier.
Part 6: MLE for RE Models [ 25/38]
GLS with Nested Effects
Define
σ_1² = σ_ε² + Tσ_u²
σ_2² = σ_1² + NTσ_v² = σ_ε² + Tσ_u² + NTσ_v²
σ_3² = σ_2² + MNTσ_w² = σ_ε² + Tσ_u² + NTσ_v² + MNTσ_w²

with θ_1 = 1 − σ_ε/σ_1, θ_2 = σ_ε/σ_1 − σ_ε/σ_2, θ_3 = σ_ε/σ_2 − σ_ε/σ_3. GLS is equivalent to OLS regression of

y_ijkt − θ_1 ȳ_ijk. − θ_2 ȳ_ij.. − θ_3 ȳ_i...

on the same transformation of x_ijkt. FGLS estimates are obtained by "three group-wise between estimators and the within estimator for the innermost group."
Part 6: MLE for RE Models [ 26/38]
Unbalanced Nested Data
With unbalanced panels, all the preceding results fall apart. GLS, FGLS, even fixed effects become analytically intractable. The log likelihood, however, is very tractable. Note a collision of practicality with nonrobustness: normality must be assumed.
Part 6: MLE for RE Models [ 27/38]
Log Likelihood (1)
Define: θ_u = σ_u²/σ_ε², θ_v = σ_v²/σ_ε², θ_w = σ_w²/σ_ε².

Construct:
φ_ijk = 1 + T_ijk θ_u
ψ_ij = 1 + θ_v Σ_{k=1}^{N_ij} (T_ijk/φ_ijk)
ω_i = 1 + θ_w Σ_{j=1}^{M_i} [ Σ_{k=1}^{N_ij} (T_ijk/φ_ijk) ] / ψ_ij

Sums of squares, with e_ijkt = y_ijkt − x_ijkt'β:
A_ijk = Σ_{t=1}^{T_ijk} e_ijkt²
B_ijk = Σ_{t=1}^{T_ijk} e_ijkt,   B_ij = Σ_{k=1}^{N_ij} (B_ijk/φ_ijk),   B_i = Σ_{j=1}^{M_i} (B_ij/ψ_ij)
Part 6: MLE for RE Models [ 28/38]
Log Likelihood (2)
With H = the total number of observations,

logL = (-1/2) [ H log(2πσ_ε²)
       + Σ_{i=1}^L { log ω_i − θ_w B_i²/(σ_ε² ω_i)
         + Σ_{j=1}^{M_i} { log ψ_ij − θ_v B_ij²/(σ_ε² ψ_ij)
           + Σ_{k=1}^{N_ij} { log φ_ijk + A_ijk/σ_ε² − θ_u B_ijk²/(σ_ε² φ_ijk) } } } ]

(For 3 levels instead of 4, set L = 1 and σ_w² = 0.)
Part 6: MLE for RE Models [ 29/38]
Maximizing Log L
Antweiler provides analytic first derivatives for gradient methods of optimization. Ugly to program.
Numerical derivatives:
Let γ be the full vector of K+4 parameters, and let δ_r be a perturbation vector with h_r = max(ε_0, ε_1|γ_r|) in the rth position and zero in the other K+3 positions. Then the central-difference approximation is

∂logL/∂γ_r ≈ [ logL(γ + δ_r) − logL(γ − δ_r) ] / (2 h_r)
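A minimal sketch of this central-difference gradient; the default values for the two tolerance constants ε_0 and ε_1 are my assumption:

```python
import numpy as np

# Central-difference gradient as described above.
def num_gradient(logL, gamma, eps0=1e-5, eps1=1e-5):
    g = np.zeros_like(gamma)
    for r in range(gamma.size):
        h = max(eps0, eps1 * abs(gamma[r]))
        d = np.zeros_like(gamma)
        d[r] = h
        g[r] = (logL(gamma + d) - logL(gamma - d)) / (2.0 * h)
    return g

# Check on a function with a known gradient: grad = (2 g0 + 3 g1, 3 g0)
f = lambda g: g[0] ** 2 + 3.0 * g[0] * g[1]
print(num_gradient(f, np.array([1.0, 2.0])))   # ~ [8. 3.]
```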
Part 6: MLE for RE Models [ 30/38]
Asymptotic Covariance Matrix
"Even with an analytic gradient, however, the Hessian matrix, Ψ, is typically obtained through numeric approximation methods." Read: "the second derivatives are too complicated to derive, much less program." Also, since logL is not a sum of individual terms, the BHHH estimator is not usable. Numerical second derivatives were used.
Part 6: MLE for RE Models [ 31/38]
An Appropriate Asymptotic Covariance Matrix
The expected Hessian is block diagonal, so the block for β can be isolated:

-∂²logL/∂β∂β' = (1/σ_ε²) [ Σ_{i=1}^L Σ_{j=1}^{M_i} Σ_{k=1}^{N_ij} Σ_{t=1}^{T_ijk} x_ijkt x_ijkt'
    − θ_u Σ_{i,j,k} (1/φ_ijk) (Σ_t x_ijkt)(Σ_t x_ijkt)'
    − θ_v Σ_{i,j} (1/ψ_ij) (Σ_k (1/φ_ijk) Σ_t x_ijkt)(Σ_k (1/φ_ijk) Σ_t x_ijkt)'
    − θ_w Σ_i (1/ω_i) (Σ_j (1/ψ_ij) Σ_k (1/φ_ijk) Σ_t x_ijkt)(Σ_j (1/ψ_ij) Σ_k (1/φ_ijk) Σ_t x_ijkt)' ]

The inverse of this, evaluated at the MLEs, provides the appropriate estimated asymptotic covariance matrix for β̂. Standard errors for the variance estimators are not needed.
Part 6: MLE for RE Models [ 32/38]
Some Observations
Assuming the wrong (e.g., nonnested) error structure:
Still consistent (GLS with the wrong weights).
Standard errors (apparently) biased downward (Moulton bias).
Adding "time" effects or other nonnested effects is "very challenging." Perhaps do so with "fixed" effects (dummy variables).
Part 6: MLE for RE Models [ 33/38]
An Application
y_jkt = log of atmospheric sulfur dioxide concentration at observation station k at time t, in country j.
H = 2621; 293 stations, 44 countries; various numbers of observations, not equally spaced.
Three levels here, not 4 as in the article.
x_jkt = 1, log(GDP/km²), log(K/L), log(Income), Suburban, Rural, Communist, log(Oil price), average temperature, time trend.
Part 6: MLE for RE Models [ 34/38]
Estimates (t ratios in parentheses)

Variable  Dimension  Random Effects    Nested Effects
x1        . . .      -10.787 (12.03)   -7.103 (5.613)
x2        C S T        0.445 (7.921)    0.202 (2.531)
x3        C . T        0.255 (1.999)    0.371 (2.345)
x4        C . T       -0.714 (5.005)   -0.477 (2.620)
x5        C S T       -0.627 (3.685)   -0.720 (4.531)
x6        C S T       -0.834 (2.181)   -1.061 (3.439)
x7        C . .        0.471 (2.241)    0.613 (1.443)
x8        . . T       -0.831 (2.267)   -0.089 (2.410)
x9        C S T       -0.045 (4.299)   -0.044 (3.719)
x10       . . T       -0.043 (1.666)   -0.046 (10.927)
σ_ε                    0.330            0.329
σ_u                    1.807            1.017
σ_v                      --             1.347
logL                 -2645.4          -2606.0
Part 6: MLE for RE Models [ 35/38]
Rotating Panel-1

The structure of the sample and selection of individuals in a rotating sampling design are as follows. Let all individuals in the population be numbered consecutively. The sample in period 1 consists of N_1 individuals. In period 2, a fraction, m_e2 (0 < m_e2 < N_1), of the sample in period 1 is replaced by m_i2 new individuals from the population. In period 3, another fraction of the sample in period 2, m_e3 (0 < m_e3 < N_2) individuals, is replaced by m_i3 new individuals, and so on. Thus the sample size in period t is N_t = N_{t-1} − m_{e,t-1} + m_it. The procedure of dropping the m_{e,t-1} individuals selected in period t − 1 and replacing them with m_it individuals from the population in period t is called rotating sampling. In this framework, the total number of observations and the number of individuals observed are Σ_t N_t and N_1 + Σ_{t=2}^T m_it, respectively.
Heshmati, A., "Efficiency measurement in rotating panel data," Applied Economics, 30, 1998, pp. 919-930.
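The accounting identities in the passage above can be checked with a toy sketch (all numbers below are made up for illustration):

```python
# Rotating-sample accounting: N_t = N_{t-1} - (rotated out) + (rotated in);
# total observations = sum_t N_t; individuals = N_1 + sum_{t>=2} m_it.
N = {1: 100}
m_e = {2: 20, 3: 25}   # individuals rotated out before periods 2 and 3
m_i = {2: 15, 3: 30}   # individuals rotated in for periods 2 and 3
for t in (2, 3):
    N[t] = N[t - 1] - m_e[t] + m_i[t]

total_obs = sum(N.values())
total_individuals = N[1] + sum(m_i.values())
print(N, total_obs, total_individuals)   # {1: 100, 2: 95, 3: 100} 295 145
```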
Part 6: MLE for RE Models [ 36/38]
Rotating Panel-2
The outcome of the rotating sample for farms producing dairy products is given in Table 1. Each of the annual samples is composed of four parts or subsamples. For example, in 1980 the sample contains 79, 62, 98, and 74 farms. The first three parts (79, 62, and 98) are those not replaced during the transition from 1979 to 1980. The last subsample contains 74 newly included farms from the population. At the same time, 85 farms are excluded from the sample in 1979. The difference between the excluded part (85) and the included part (74) corresponds to the change in the rotating sample size between these two periods, i.e. 313 − 324 = −11. This difference includes only the part of the sample where each farm is observed consecutively for four years, N_rot. The difference in the non-rotating part, N_non, is due to those farms which are not observed consecutively. The proportion of farms not observed consecutively, N_non, in the total annual sample varies from 11.2 to 22.6% with an average of 18.7 per cent.
Part 6: MLE for RE Models [ 37/38]
Rotating Panels-3
Simply an unbalanced panel. Treat with the familiar techniques. The accounting is complicated.
Time effects may be complicated: Biorn and Jansen (Scand. J. E., 1983) study households where cohort 1 has T = 1976,1977 while cohort 2 has T = 1977,1978.
But "time in sample bias" may require special treatment. The Mexican labor survey has a 3-period rotation; some families appear in 1, 2, or 3 periods.
Part 6: MLE for RE Models [ 38/38]
Pseudo Panels
T different cross sections:

y_{i(t),t} = x_{i(t),t}'β + α_{i(t)} + u_{i(t),t},   i(t) = 1,...,N(t); t = 1,...,T

These are Σ_{t=1}^T N(t) independent observations. Define C cohorts, e.g., those born 1950-1955. Then

ȳ_{c,t} = x̄_{c,t}'β + ᾱ_{c,t} + ū_{c,t},   c = 1,...,C; t = 1,...,T

Cohort sizes are N_c(t). Assume they are large. Then ᾱ_{c,t} ≈ α_c for each cohort. This creates a fixed effects model:

ȳ_{c,t} = x̄_{c,t}'β + α_c + ū_{c,t},   c = 1,...,C; t = 1,...,T.

(See Baltagi 10.3 for issues relating to measurement error.)
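The cohort-averaging step can be sketched with simulated cross sections (an added illustration; parameter values and cell sizes are arbitrary, true β = 2.0). The within (fixed effects) estimator applied to the C × T cell means recovers β:

```python
import numpy as np

# Simulate T independent cross sections, form cohort-time cell means,
# then apply the within estimator to the C x T cells.
rng = np.random.default_rng(5)
C, T, n_ct = 4, 6, 250
alpha = rng.normal(size=C)                     # cohort effects alpha_c
xbar = np.empty((C, T))
ybar = np.empty((C, T))
for c in range(C):
    for t in range(T):
        x = rng.normal(loc=0.3 * t, size=n_ct)        # one cell's observations
        y = 2.0 * x + alpha[c] + rng.normal(size=n_ct)
        xbar[c, t], ybar[c, t] = x.mean(), y.mean()

# Within-cohort demeaning removes alpha_c; OLS on the demeaned cell means
xd = xbar - xbar.mean(axis=1, keepdims=True)
yd = ybar - ybar.mean(axis=1, keepdims=True)
beta_hat = (xd.ravel() @ yd.ravel()) / (xd.ravel() @ xd.ravel())
print(beta_hat)   # close to 2.0
```

With large cells the sampling noise in the means is small; Baltagi 10.3's measurement-error concern is exactly what arises when the N_c(t) are not large.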