TRANSCRIPT
Part 6: MLE for RE Models [ 1/38]
Econometric Analysis of Panel Data
William Greene
Department of Economics
Stern School of Business
Part 6: MLE for RE Models [ 2/38]
The Random Effects Model

The random effects model:

y_it = x_it'β + c_i + ε_it,  observation for person i at time t

c_i is uncorrelated with x_it for all t;  E[c_i | X_i] = 0
E[ε_it | X_i, c_i] = 0

Stacking the T_i observations for group i,

y_i = X_i β + c_i i + ε_i,   T_i observations in group i
    = X_i β + c_i + ε_i,    note c_i = (c_i, c_i, ..., c_i)'

and stacking the N groups,

y = Xβ + c + ε,   Σ_{i=1}^N T_i observations in the sample
c = (c_1', c_2', ..., c_N')',  a Σ_{i=1}^N T_i by 1 vector
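As a concrete illustration (an addition, not from the slides; all parameter values are arbitrary), the stacked model can be simulated directly. The common effect c_i is drawn once per person and repeated across that person's T_i rows:

```python
import numpy as np

# Simulate y = X beta + c + e with a person-specific effect c_i repeated
# within each group; group sizes T_i differ (unbalanced panel).
rng = np.random.default_rng(0)
N, beta = 500, np.array([1.0, -0.5])
T = rng.integers(2, 6, size=N)               # unbalanced: T_i in {2,...,5}
rows = np.repeat(np.arange(N), T)            # person index for each row
X = rng.normal(size=(T.sum(), 2))
sigma_c, sigma_e = 0.7, 1.0
c = sigma_c * rng.normal(size=N)             # one draw of c_i per person
y = X @ beta + c[rows] + sigma_e * rng.normal(size=T.sum())

# The shared c_i makes observations within a group equicorrelated:
rho = sigma_c ** 2 / (sigma_c ** 2 + sigma_e ** 2)
print(round(rho, 3))   # 0.329
```

The implied within-group correlation ρ is the quantity the GLS/MLE machinery in the following slides exploits.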
Part 6: MLE for RE Models [ 3/38]
Error Components Model

A generalized regression model:

y_it = x_it'β + ε_it + u_i
E[ε_it | X] = 0
E[ε_it² | X] = σ_ε²
E[u_i | X] = 0
E[u_i² | X] = σ_u²

y_i = X_i β + ε_i + u_i i   for the T_i observations in group i

Var[ε_i + u_i i] = σ_ε² I + σ_u² ii'
Part 6: MLE for RE Models [ 4/38]
Notation

Var[ε_i + u_i i] = σ_ε² I_{T_i} + σ_u² ii'
                 = Ω_i

Var[w | X] = | Ω_1  0   ...  0   |
             | 0    Ω_2 ...  0   |
             | ...  ...  ... ... |
             | 0    0   ...  Ω_N |

(Note these Ω_i differ only in the dimension T_i.)
Part 6: MLE for RE Models [ 5/38]
Maximum Likelihood
Assuming normality of ε and u, treat the T_i joint observations on [(ε_i1, ε_i2, ..., ε_iT_i), u_i] as one T_i-variate observation. The mean vector of ε_i + u_i i is zero and the covariance matrix is Ω_i = σ_ε² I + σ_u² ii'.

The joint density for ε_i = y_i − X_iβ is

f(ε_i) = (2π)^(-T_i/2) |Ω_i|^(-1/2) exp[ (-1/2)(y_i − X_iβ)'Ω_i^(-1)(y_i − X_iβ) ]

logL = Σ_{i=1}^N logL_i, where

logL_i(β, Ω_i) = (-1/2)[ T_i log 2π + log|Ω_i| + (y_i − X_iβ)'Ω_i^(-1)(y_i − X_iβ) ]
               = (-1/2)[ T_i log 2π + log|Ω_i| + ε_i'Ω_i^(-1)ε_i ]
Part 6: MLE for RE Models [ 6/38]
MLE Panel Data Algebra (1)
Ω_i = σ_ε² I + σ_u² ii'

Ω_i^(-1) = (1/σ_ε²) [ I − (σ_u²/(σ_ε² + T_iσ_u²)) ii' ]

So,

ε_i'Ω_i^(-1)ε_i = (1/σ_ε²) [ ε_i'ε_i − (σ_u²/(σ_ε² + T_iσ_u²)) (T_i ε̄_i)² ]

where ε̄_i = (1/T_i) Σ_t ε_it.
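The inverse has a simple numeric check (an added illustration, arbitrary values): the claimed inverse and Ω_i multiply to the identity.

```python
import numpy as np

# Omega_i = s_e^2 I + s_u^2 ii'  and its claimed inverse multiply to I.
T, s_e2, s_u2 = 5, 1.3, 0.6
one = np.ones((T, 1))
Omega = s_e2 * np.eye(T) + s_u2 * (one @ one.T)
Omega_inv = (1.0 / s_e2) * (np.eye(T)
             - (s_u2 / (s_e2 + T * s_u2)) * (one @ one.T))
print(np.allclose(Omega @ Omega_inv, np.eye(T)))   # True
```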
Part 6: MLE for RE Models [ 7/38]
MLE Panel Data Algebra (1, cont.)
Ω_i = σ_ε² I + σ_u² ii' = σ_ε² [ I + (σ_u²/σ_ε²) ii' ] = σ_ε² A

|Ω_i| = (σ_ε²)^(T_i) |A|,   λ = a characteristic root of A

Roots are (real, since A is symmetric) solutions to Ac = λc:

Ac = c + (σ_u²/σ_ε²) i(i'c) = λc,   or   (σ_u²/σ_ε²) i(i'c) = (λ − 1)c

Any vector c whose elements sum to zero (i'c = 0) is a characteristic vector that corresponds to root λ = 1. There are T_i − 1 such vectors, so T_i − 1 of the roots are 1.

Suppose i'c ≠ 0. Premultiply by i' to find

(σ_u²/σ_ε²)(i'i)(i'c) = (λ − 1)(i'c),   i.e.,   T_i(σ_u²/σ_ε²)(i'c) = (λ − 1)(i'c).

Since i'c ≠ 0, divide by it to obtain the remaining root λ = 1 + T_iσ_u²/σ_ε².

Therefore, |Ω_i| = (σ_ε²)^(T_i) (1)^(T_i − 1) (1 + T_iσ_u²/σ_ε²) = (σ_ε²)^(T_i) (1 + T_iσ_u²/σ_ε²).
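The determinant result can likewise be verified numerically (an added illustration with arbitrary values):

```python
import numpy as np

# Check: |Omega_i| = (s_e^2)^T * (1 + T s_u^2 / s_e^2).
T, s_e2, s_u2 = 6, 0.8, 0.5
one = np.ones((T, 1))
Omega = s_e2 * np.eye(T) + s_u2 * (one @ one.T)
direct = np.linalg.det(Omega)
formula = s_e2 ** T * (1.0 + T * s_u2 / s_e2)
print(direct, formula)   # the two agree
```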
Part 6: MLE for RE Models [ 8/38]
MLE Panel Data Algebra (1, conc.)
Collecting the results, with ε̄_i = (1/T_i) Σ_t ε_it,

logL_i = (-1/2)[ T_i log 2π + log|Ω_i| + ε_i'Ω_i^(-1)ε_i ]
       = (-1/2)[ T_i log 2π + T_i log σ_ε² + log(1 + T_iσ_u²/σ_ε²)
                 + (1/σ_ε²)( ε_i'ε_i − (σ_u²/(σ_ε² + T_iσ_u²))(T_iε̄_i)² ) ]

logL = Σ_{i=1}^N logL_i

Let ρ = σ_u²/σ_ε². Since (1/σ_ε²)(σ_u²/(σ_ε² + T_iσ_u²))(T_iε̄_i)² = (T_iρ/(1 + T_iρ)) T_iε̄_i²/σ_ε²,

logL_i = (-1/2)[ T_i(log 2π + log σ_ε²) + ε_i'ε_i/σ_ε² − (T_iρ/(1 + T_iρ)) T_iε̄_i²/σ_ε² + log(1 + T_iρ) ]
Part 6: MLE for RE Models [ 9/38]
Maximizing the Log Likelihood

Difficult: "brute force" + some elegant theoretical results. See Baltagi, pp. 22-23. (Back and forth from GLS to σ_ε² and σ_u².)

Somewhat less difficult and more practical: at any iteration, given estimates of σ_ε² and σ_u², the estimator of β is GLS (of course), so we iterate back and forth between these. See Hsiao, pp. 39-40.

0. Begin iterations with, say, FGLS estimates of β, σ_ε², σ_u².
1. Given σ̂²_ε,r and σ̂²_u,r, compute β̂_r+1 by FGLS(σ̂²_ε,r, σ̂²_u,r).
2. Given β̂_r+1, compute σ̂²_ε,r+1 = Σ_{i=1}^N ε̂'_i,r+1 M_D ε̂_i,r+1 / Σ_{i=1}^N (T_i − 1), where M_D is the within-group (deviations from group means) transformation.
3. Given β̂_r+1 and σ̂²_ε,r+1, compute σ̂²_u,r+1 = (1/N) Σ_{i=1}^N ē²_i,r+1 − σ̂²_ε,r+1/T̄, where ē_i,r+1 is the mean residual for group i.
4. Return to step 1 and repeat until β̂_r+1 − β̂_r = 0.
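The iteration can be sketched in code. This is a hedged sketch, not a transcription of the slide: the GLS step uses the standard quasi-demeaning form, and the exact variance-update formulas are one common implementation choice.

```python
import numpy as np

# Back-and-forth iteration: GLS given current variances, then variances
# given current beta, until beta stops changing.
def re_iterate(y, X, groups, tol=1e-8, max_iter=200):
    N = groups.max() + 1
    T = np.bincount(groups).astype(float)              # group sizes T_i
    s_e2, s_u2 = 1.0, 1.0
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        # 1. FGLS via quasi-demeaning: theta_i = 1 - s_e/sqrt(s_e^2 + T_i s_u^2)
        theta = 1.0 - np.sqrt(s_e2 / (s_e2 + T * s_u2))
        ybar = np.bincount(groups, weights=y) / T
        Xbar = np.column_stack([np.bincount(groups, weights=X[:, k]) / T
                                for k in range(X.shape[1])])
        beta_new = np.linalg.lstsq(X - theta[groups, None] * Xbar[groups],
                                   y - theta[groups] * ybar[groups],
                                   rcond=None)[0]
        # 2. s_e^2 from within-group residual variation
        e = y - X @ beta_new
        ebar = np.bincount(groups, weights=e) / T
        s_e2 = np.sum((e - ebar[groups]) ** 2) / (T.sum() - N)
        # 3. s_u^2 from between-group variation (kept nonnegative)
        s_u2 = max(np.mean(ebar ** 2) - s_e2 * np.mean(1.0 / T), 1e-12)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, s_e2, s_u2
        beta = beta_new
    return beta, s_e2, s_u2
```

On simulated error-components data this recovers β and the two variance components to sampling accuracy, typically in a handful of iterations.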
Part 6: MLE for RE Models [ 10/38]
Direct Maximization of LogL
Simpler: take advantage of the invariance of maximum likelihood estimators to transformations of the parameters. Let

δ = 1/σ_ε²,   τ = σ_u²/σ_ε²,   R_i = T_iτ + 1,   Q_i = τ/R_i,

logL = (-1/2) Σ_{i=1}^N [ δ( ε_i'ε_i − Q_i(T_iε̄_i)² ) + log R_i − T_i log δ + T_i log 2π ]

This can be maximized using ordinary optimization methods (not Newton, as suggested by Hsiao). Treat it as a standard nonlinear optimization problem. Solve with iterative, gradient methods.
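A sketch of this direct maximization using the reparameterized logL. Optimizing over log(δ) and log(τ) to keep both positive is my device, not part of the slide:

```python
import numpy as np
from scipy.optimize import minimize

# Negative of the reparameterized logL above; params = [beta, log delta, log tau].
def neg_logL(params, y, X, groups, T):
    K = X.shape[1]
    beta = params[:K]
    delta, tau = np.exp(params[K]), np.exp(params[K + 1])
    e = y - X @ beta
    Se = np.bincount(groups, weights=e)        # T_i * ebar_i
    See = np.bincount(groups, weights=e * e)   # e_i'e_i
    R = 1.0 + T * tau
    Q = tau / R
    li = -0.5 * (delta * (See - Q * Se ** 2) + np.log(R)
                 - T * np.log(delta) + T * np.log(2.0 * np.pi))
    return -np.sum(li)

# Usage: res = minimize(neg_logL, np.zeros(K + 2), args=(y, X, groups, T),
#                       method="BFGS"); then sigma_e^2 = 1/exp(res.x[K]).
```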
Part 6: MLE for RE Models [ 11/38]
Part 6: MLE for RE Models [ 12/38]
Part 6: MLE for RE Models [ 13/38]
Maximum Simulated Likelihood
Assume ε_it and u_i are normally distributed. Write u_i = σ_u v_i where v_i ~ N[0,1]. Then y_it = x_it'β + ε_it + σ_u v_i. If v_i were observed data, all observations would be independent, and

log f(y_it | x_it, v_i) = (-1/2)[ log 2π + log σ_ε² + (y_it − x_it'β − σ_u v_i)²/σ_ε² ]

The log of the joint density for the T_i observations with common v_i is

logL_i(β, σ_ε², σ_u | v_i) = (-1/2) Σ_{t=1}^{T_i} [ log 2π + log σ_ε² + (y_it − x_it'β − σ_u v_i)²/σ_ε² ]

The conditional log likelihood for the sample is then

logL(β, σ_ε², σ_u | v) = (-1/2) Σ_{i=1}^N Σ_{t=1}^{T_i} [ log 2π + log σ_ε² + (y_it − x_it'β − σ_u v_i)²/σ_ε² ]
Part 6: MLE for RE Models [ 14/38]
Likelihood Function for Individual i
The conditional log likelihood for the sample is

logL(β, σ_ε², σ_u | v) = (-1/2) Σ_{i=1}^N Σ_{t=1}^{T_i} [ log 2π + log σ_ε² + (y_it − x_it'β − σ_u v_i)²/σ_ε² ]

The unconditional log likelihood is obtained by integrating v_i out of L_i(β, σ_ε², σ_u | v_i):

L_i(β, σ_ε², σ_u) = ∫ Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − σ_u v_i)²/(2σ_ε²) ) / (σ_ε √(2π)) ] φ(v_i) dv_i
                  = E_{v_i}[ L_i(β, σ_ε², σ_u | v_i) ]

The integral usually does not have a closed form. (For the normal distribution above, actually, it does. We used that earlier. We ignore that for now.)
Part 6: MLE for RE Models [ 15/38]
Log Likelihood Function
The full log likelihood function that needs to be maximized is

logL = Σ_{i=1}^N log L_i(β, σ_ε², σ_u)
     = Σ_{i=1}^N log ∫ Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − σ_u v_i)²/(2σ_ε²) ) / (σ_ε √(2π)) ] φ(v_i) dv_i
     = Σ_{i=1}^N log E_{v_i}[ L_i(β, σ_ε², σ_u | v_i) ]

This is the function to be maximized to obtain the MLE of [β, σ_ε², σ_u].
Part 6: MLE for RE Models [ 16/38]
Computing the Expected LogL
How to compute the integral: first note that φ(v_i) = exp(−v_i²/2)/√(2π), so each term has the form

∫ Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − σ_u v_i)²/(2σ_ε²) ) / (σ_ε √(2π)) ] φ(v_i) dv_i = E_{v_i}[ L_i(β, σ_ε², σ_u | v_i) ]

(1) Numerical (Gauss-Hermite) quadrature for integrals of the form

∫ e^(−v²) g(v) dv ≈ Σ_{h=1}^H w_h g(a_h)

is remarkably accurate.

Example: Hermite quadrature nodes and weights, H = 5.
Nodes:   -2.02018, -0.95857, 0.00000, 0.95857, 2.02018
Weights:  0.019953, 0.393619, 0.945309, 0.393619, 0.019953

Applications usually use many more points, up to 96, and much more accurate (more digits) representations.
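numpy ships the Hermite nodes and weights directly; a short check (added illustration) of the H = 5 values above and of the rule applied to E[v²] = 1 for v ~ N(0,1):

```python
import numpy as np

# numpy provides the Gauss-Hermite nodes a_h and weights w_h.
nodes, weights = np.polynomial.hermite.hermgauss(5)
print(nodes)     # the five nodes listed above
print(weights)   # the five weights listed above

# Check the rule on E[v^2] = 1 for v ~ N(0,1), using v = sqrt(2) a:
approx = (1.0 / np.sqrt(np.pi)) * np.sum(weights * (np.sqrt(2.0) * nodes) ** 2)
print(approx)    # 1.0 up to rounding
```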
Part 6: MLE for RE Models [ 17/38]
Quadrature
A change of variable is needed to get the integral into the right form. Each term then becomes

L_{i,Q} = (1/√π) Σ_{h=1}^H w_h Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − √2 σ_u a_h)²/(2σ_ε²) ) / (σ_ε √(2π)) ]

and the problem is solved by maximizing

logL_Q = Σ_{i=1}^N log L_{i,Q}

with respect to β, σ_ε², σ_u. (Maximization will be continued later in the semester.)
Part 6: MLE for RE Models [ 18/38]
Gauss-Hermite Quadrature
∫ Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − σ_u v_i)²/(2σ_ε²) ) / (σ_ε √(2π)) ] φ(v_i) dv_i,
with φ(v_i) = exp(−v_i²/2)/√(2π).

Make a change of variable to a_i = v_i/√2, so v_i = √2 a_i and dv_i = √2 da_i:

= ∫ (1/√(2π)) exp(−a_i²) √2 Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − √2 σ_u a_i)²/(2σ_ε²) ) / (σ_ε √(2π)) ] da_i

= (1/√π) ∫ exp(−a_i²) Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − √2 σ_u a_i)²/(2σ_ε²) ) / (σ_ε √(2π)) ] da_i

= (1/√π) ∫ exp(−a_i²) g(a_i) da_i ≈ (1/√π) Σ_{h=1}^H w_h g(a_h)
Part 6: MLE for RE Models [ 19/38]
Simulation
The unconditional log likelihood is an expected value:

logL_i(β, σ_ε², σ_u) = log ∫ Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − σ_u v_i)²/(2σ_ε²) ) / (σ_ε √(2π)) ] φ(v_i) dv_i
                     = log E_{v_i}[ L_i(β, σ_ε², σ_u | v_i) ] = log E_v[ g(v_i) ]

An expected value can be 'estimated' by sampling R observations and averaging them:

Ê_v[g(v_i)] = (1/R) Σ_{r=1}^R Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − σ_u v_ir)²/(2σ_ε²) ) / (σ_ε √(2π)) ]

The unconditional log likelihood function is then

logL_S = Σ_{i=1}^N log { (1/R) Σ_{r=1}^R Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − σ_u v_ir)²/(2σ_ε²) ) / (σ_ε √(2π)) ] }

This is a function of (β, σ_ε², σ_u | y_i, X_i, v_{i,1}, ..., v_{i,R}), i = 1,...,N. The random draws on v_i become part of the data, and the function is maximized with respect to the unknown parameters.
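The averaging idea can be demonstrated for a single group (an added illustration with made-up numbers). Because everything here is normal, the exact integral is available for comparison, and the simulated average converges to it as R grows:

```python
import numpy as np

rng = np.random.default_rng(3)
T, s_e, s_u = 4, 1.0, 0.8
e = rng.normal(size=T)          # e_it = y_it - x_it'beta for one group

def cond_L(v):                  # L_i given v_i: product of normal densities
    z = (e[:, None] - s_u * v[None, :]) / s_e
    return np.prod(np.exp(-0.5 * z ** 2) / (s_e * np.sqrt(2 * np.pi)), axis=0)

# Exact value of the integral: e_i ~ N(0, s_e^2 I + s_u^2 ii')
Omega = s_e ** 2 * np.eye(T) + s_u ** 2 * np.ones((T, T))
exact = np.exp(-0.5 * e @ np.linalg.solve(Omega, e)) / np.sqrt(
    (2 * np.pi) ** T * np.linalg.det(Omega))

for R in (100, 10000, 1000000):
    v = rng.normal(size=R)
    print(R, cond_L(v).mean(), exact)   # average approaches exact as R grows
```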
Part 6: MLE for RE Models [ 20/38]
Convergence Results
Target is the expected log likelihood:

Σ_{i=1}^N log E_v[ L_i(β, σ² | v_i) ]

The simulation estimator is based on random sampling from the population of v_i:

LogL_S(β, σ²) = Σ_{i=1}^N log { (1/R) Σ_{r=1}^R Π_{t=1}^{T_i} [ exp( −(y_it − x_it'β − σ_u v_ir)²/(2σ_ε²) ) / (σ_ε √(2π)) ] }

The essential result is

plim (R→∞) LogL_S(β, σ²) = Σ_{i=1}^N log E_v[ L_i(β, σ² | v_i) ]

Conditions:
(1) General regularity and smoothness of the log likelihood.
(2) R increases faster than √N. ('Intelligent draws,' e.g., Halton sequences, make this somewhat ambiguous.)

Result: the maximizer of LogL_S(β, σ²) converges to the maximizer of Σ_{i=1}^N log E_v[ L_i(β, σ² | v_i) ].
Part 6: MLE for RE Models [ 21/38]
MSL vs. ML
0.15427² = 0.023799
Part 6: MLE for RE Models [ 22/38]
Two Level Panel Data
Nested by construction. Unbalanced panels.
No real obstacle to estimation, but some inconvenient algebra: in two-step FGLS of the RE model, we need "1/T" to solve for an estimate of σ_u². What to use?

(1/T)-bar = (1/N) Σ_{i=1}^N (1/T_i)   (early NLOGIT)
Q_H = [ (1/N) Σ_{i=1}^N (1/T_i) ]^(-1), the harmonic mean of the T_i   (Stata)
Q = 1/T̄, based on the arithmetic mean   (TSP, current NLOGIT; do not use this.)
Part 6: MLE for RE Models [ 23/38]
Balanced Nested Panel Data
z_ijkt = test score for student t, teacher k, school j, district i
L = 2 school districts, i = 1,…,L
M_i = 3 schools in each district, j = 1,…,M_i
N_ij = 4 teachers in each school, k = 1,…,N_ij
T_ijk = 20 students in each class, t = 1,…,T_ijk
Antweiler, W., “Nested Random Effects Estimation in Unbalanced Panel Data,” Journal of Econometrics, 101, 2001, pp. 295-313.
Part 6: MLE for RE Models [ 24/38]
Nested Effects Model
y_ijkt = x_ijkt'β + u_ijk + v_ij + w_i + ε_ijkt

Strict exogeneity; all parts uncorrelated. (A normality assumption is added later.)

Var[u_ijk + v_ij + w_i + ε_ijkt] = σ_u² + σ_v² + σ_w² + σ_ε²

The overall covariance matrix Ω is block diagonal over i; each diagonal block is block diagonal over j; each of these, in turn, is block diagonal over k; and each lowest-level block has the form of the Ω_i we saw earlier.
Part 6: MLE for RE Models [ 25/38]
GLS with Nested Effects
Define
σ_1² = σ_ε² + Tσ_u²
σ_2² = σ_1² + NTσ_v² = σ_ε² + Tσ_u² + NTσ_v²
σ_3² = σ_2² + MNTσ_w² = σ_ε² + Tσ_u² + NTσ_v² + MNTσ_w²

with θ_1 = 1 − σ_ε/σ_1, θ_2 = σ_ε/σ_1 − σ_ε/σ_2, θ_3 = σ_ε/σ_2 − σ_ε/σ_3. GLS is equivalent to OLS regression of

y_ijkt − θ_1 ȳ_ijk. − θ_2 ȳ_ij.. − θ_3 ȳ_i...

on the same transformation of x_ijkt. FGLS estimates are obtained by "three group-wise between estimators and the within estimator for the innermost group."
Part 6: MLE for RE Models [ 26/38]
Unbalanced Nested Data
With unbalanced panels, all the preceding results fall apart. GLS, FGLS, even fixed effects become analytically intractable. The log likelihood, however, is very tractable. Note a collision of practicality with nonrobustness: normality must be assumed.
Part 6: MLE for RE Models [ 27/38]
Log Likelihood (1)
Define: θ_u = σ_u²/σ_ε², θ_v = σ_v²/σ_ε², θ_w = σ_w²/σ_ε².

Construct:
φ_ijk = 1 + T_ijk θ_u
ψ_ij = 1 + θ_v Σ_{k=1}^{N_ij} (T_ijk/φ_ijk)
ω_i = 1 + θ_w Σ_{j=1}^{M_i} [ Σ_{k=1}^{N_ij} (T_ijk/φ_ijk) ] / ψ_ij

Sums of squares, with e_ijkt = y_ijkt − x_ijkt'β:
A_ijk = Σ_{t=1}^{T_ijk} e_ijkt²
B_ijk = Σ_{t=1}^{T_ijk} e_ijkt,   B_ij = Σ_{k=1}^{N_ij} (B_ijk/φ_ijk),   B_i = Σ_{j=1}^{M_i} (B_ij/ψ_ij)
Part 6: MLE for RE Models [ 28/38]
Log Likelihood (2)
With H = the total number of observations,

logL = (-1/2) [ H log(2πσ_ε²)
       + Σ_{i=1}^L { log ω_i − θ_w B_i²/(σ_ε² ω_i)
         + Σ_{j=1}^{M_i} { log ψ_ij − θ_v B_ij²/(σ_ε² ψ_ij)
           + Σ_{k=1}^{N_ij} { log φ_ijk + A_ijk/σ_ε² − θ_u B_ijk²/(σ_ε² φ_ijk) } } } ]

(For 3 levels instead of 4, set L = 1 and σ_w² = 0.)
Part 6: MLE for RE Models [ 29/38]
Maximizing Log L
Antweiler provides analytic first derivatives for gradient methods of optimization. Ugly to program.
Numerical derivatives:
Let γ be the full vector of K+4 parameters, and let δ_r be a perturbation vector with h_r = max(ε_0, ε_1|γ_r|) in the rth position and zero in the other K+3 positions. Then the central-difference approximation is

∂logL/∂γ_r ≈ [ logL(γ + δ_r) − logL(γ − δ_r) ] / (2 h_r)
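A minimal sketch of this central-difference gradient; the default values for the two tolerance constants ε_0 and ε_1 are my assumption:

```python
import numpy as np

# Central-difference gradient as described above.
def num_gradient(logL, gamma, eps0=1e-5, eps1=1e-5):
    g = np.zeros_like(gamma)
    for r in range(gamma.size):
        h = max(eps0, eps1 * abs(gamma[r]))
        d = np.zeros_like(gamma)
        d[r] = h
        g[r] = (logL(gamma + d) - logL(gamma - d)) / (2.0 * h)
    return g

# Check on a function with a known gradient: grad = (2 g0 + 3 g1, 3 g0)
f = lambda g: g[0] ** 2 + 3.0 * g[0] * g[1]
print(num_gradient(f, np.array([1.0, 2.0])))   # ~ [8. 3.]
```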
Part 6: MLE for RE Models [ 30/38]
Asymptotic Covariance Matrix
"Even with an analytic gradient, however, the Hessian matrix, Ψ, is typically obtained through numeric approximation methods." Read: "the second derivatives are too complicated to derive, much less program." Also, since logL is not a sum of individual terms, the BHHH estimator is not usable. Numerical second derivatives were used.
Part 6: MLE for RE Models [ 31/38]
An Appropriate Asymptotic Covariance Matrix
The expected Hessian is block diagonal, so the block for β can be isolated:

-∂²logL/∂β∂β' = (1/σ_ε²) [ Σ_{i=1}^L Σ_{j=1}^{M_i} Σ_{k=1}^{N_ij} Σ_{t=1}^{T_ijk} x_ijkt x_ijkt'
    − θ_u Σ_{i,j,k} (1/φ_ijk) (Σ_t x_ijkt)(Σ_t x_ijkt)'
    − θ_v Σ_{i,j} (1/ψ_ij) (Σ_k (1/φ_ijk) Σ_t x_ijkt)(Σ_k (1/φ_ijk) Σ_t x_ijkt)'
    − θ_w Σ_i (1/ω_i) (Σ_j (1/ψ_ij) Σ_k (1/φ_ijk) Σ_t x_ijkt)(Σ_j (1/ψ_ij) Σ_k (1/φ_ijk) Σ_t x_ijkt)' ]

The inverse of this, evaluated at the MLEs, provides the appropriate estimated asymptotic covariance matrix for β̂. Standard errors for the variance estimators are not needed.
Part 6: MLE for RE Models [ 32/38]
Some Observations
Assuming the wrong (e.g., nonnested) error structure:
Still consistent (GLS with the wrong weights).
Standard errors (apparently) biased downward (Moulton bias).
Adding "time" effects or other nonnested effects is "very challenging." Perhaps do so with "fixed" effects (dummy variables).
Part 6: MLE for RE Models [ 33/38]
An Application
y_jkt = log of atmospheric sulfur dioxide concentration at observation station k at time t, in country j.
H = 2621; 293 stations, 44 countries; various numbers of observations, not equally spaced.
Three levels here, not 4 as in the article.
x_jkt = 1, log(GDP/km²), log(K/L), log(Income), Suburban, Rural, Communist, log(Oil price), average temperature, time trend.
Part 6: MLE for RE Models [ 34/38]
Estimates (t ratios in parentheses)

Variable  Dimension  Random Effects    Nested Effects
x1        . . .      -10.787 (12.03)   -7.103 (5.613)
x2        C S T        0.445 (7.921)    0.202 (2.531)
x3        C . T        0.255 (1.999)    0.371 (2.345)
x4        C . T       -0.714 (5.005)   -0.477 (2.620)
x5        C S T       -0.627 (3.685)   -0.720 (4.531)
x6        C S T       -0.834 (2.181)   -1.061 (3.439)
x7        C . .        0.471 (2.241)    0.613 (1.443)
x8        . . T       -0.831 (2.267)   -0.089 (2.410)
x9        C S T       -0.045 (4.299)   -0.044 (3.719)
x10       . . T       -0.043 (1.666)   -0.046 (10.927)
σ_ε                    0.330            0.329
σ_u                    1.807            1.017
σ_v                      --             1.347
logL                 -2645.4          -2606.0
Part 6: MLE for RE Models [ 35/38]
Rotating Panel-1

The structure of the sample and selection of individuals in a rotating sampling design are as follows. Let all individuals in the population be numbered consecutively. The sample in period 1 consists of N_1 individuals. In period 2, a fraction, m_e2 (0 < m_e2 < N_1), of the sample in period 1 is replaced by m_i2 new individuals from the population. In period 3, another fraction of the sample in period 2, m_e3 (0 < m_e3 < N_2) individuals, is replaced by m_i3 new individuals, and so on. Thus the sample size in period t is N_t = N_{t-1} − m_{e,t-1} + m_it. The procedure of dropping the m_{e,t-1} individuals selected in period t − 1 and replacing them with m_it individuals from the population in period t is called rotating sampling. In this framework, the total number of observations and the number of individuals observed are Σ_t N_t and N_1 + Σ_{t=2}^T m_it, respectively.
Heshmati, A., "Efficiency measurement in rotating panel data," Applied Economics, 30, 1998, pp. 919-930.
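The accounting identities in the passage above can be checked with a toy sketch (all numbers below are made up for illustration):

```python
# Rotating-sample accounting: N_t = N_{t-1} - (rotated out) + (rotated in);
# total observations = sum_t N_t; individuals = N_1 + sum_{t>=2} m_it.
N = {1: 100}
m_e = {2: 20, 3: 25}   # individuals rotated out before periods 2 and 3
m_i = {2: 15, 3: 30}   # individuals rotated in for periods 2 and 3
for t in (2, 3):
    N[t] = N[t - 1] - m_e[t] + m_i[t]

total_obs = sum(N.values())
total_individuals = N[1] + sum(m_i.values())
print(N, total_obs, total_individuals)   # {1: 100, 2: 95, 3: 100} 295 145
```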
Part 6: MLE for RE Models [ 36/38]
Rotating Panel-2
The outcome of the rotating sample for farms producing dairy products is given in Table 1. Each of the annual samples is composed of four parts or subsamples. For example, in 1980 the sample contains 79, 62, 98, and 74 farms. The first three parts (79, 62, and 98) are those not replaced during the transition from 1979 to 1980. The last subsample contains 74 newly included farms from the population. At the same time, 85 farms are excluded from the sample in 1979. The difference between the excluded part (85) and the included part (74) corresponds to the change in the rotating sample size between these two periods, i.e. 313 − 324 = −11. This difference includes only the part of the sample where each farm is observed consecutively for four years, N_rot. The difference in the non-rotating part, N_non, is due to those farms which are not observed consecutively. The proportion of farms not observed consecutively, N_non, in the total annual sample varies from 11.2 to 22.6% with an average of 18.7 per cent.
Part 6: MLE for RE Models [ 37/38]
Rotating Panels-3
Simply an unbalanced panel. Treat with the familiar techniques. The accounting is complicated.
Time effects may be complicated: Biorn and Jansen (Scand. J. E., 1983) study households where cohort 1 has T = 1976,1977 while cohort 2 has T = 1977,1978.
But "time in sample bias" may require special treatment. The Mexican labor survey has a 3-period rotation; some families appear in 1, 2, or 3 periods.
Part 6: MLE for RE Models [ 38/38]
Pseudo Panels
T different cross sections:

y_{i(t),t} = x_{i(t),t}'β + α_{i(t)} + u_{i(t),t},   i(t) = 1,...,N(t); t = 1,...,T

These are Σ_{t=1}^T N(t) independent observations. Define C cohorts, e.g., those born 1950-1955. Then

ȳ_{c,t} = x̄_{c,t}'β + ᾱ_{c,t} + ū_{c,t},   c = 1,...,C; t = 1,...,T

Cohort sizes are N_c(t). Assume they are large. Then ᾱ_{c,t} ≈ α_c for each cohort. This creates a fixed effects model:

ȳ_{c,t} = x̄_{c,t}'β + α_c + ū_{c,t},   c = 1,...,C; t = 1,...,T.

(See Baltagi 10.3 for issues relating to measurement error.)
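The cohort-averaging step can be sketched with simulated cross sections (an added illustration; parameter values and cell sizes are arbitrary, true β = 2.0). The within (fixed effects) estimator applied to the C × T cell means recovers β:

```python
import numpy as np

# Simulate T independent cross sections, form cohort-time cell means,
# then apply the within estimator to the C x T cells.
rng = np.random.default_rng(5)
C, T, n_ct = 4, 6, 250
alpha = rng.normal(size=C)                     # cohort effects alpha_c
xbar = np.empty((C, T))
ybar = np.empty((C, T))
for c in range(C):
    for t in range(T):
        x = rng.normal(loc=0.3 * t, size=n_ct)        # one cell's observations
        y = 2.0 * x + alpha[c] + rng.normal(size=n_ct)
        xbar[c, t], ybar[c, t] = x.mean(), y.mean()

# Within-cohort demeaning removes alpha_c; OLS on the demeaned cell means
xd = xbar - xbar.mean(axis=1, keepdims=True)
yd = ybar - ybar.mean(axis=1, keepdims=True)
beta_hat = (xd.ravel() @ yd.ravel()) / (xd.ravel() @ xd.ravel())
print(beta_hat)   # close to 2.0
```

With large cells the sampling noise in the means is small; Baltagi 10.3's measurement-error concern is exactly what arises when the N_c(t) are not large.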