1/59: Topic 1.2 – Extensions of the Linear Regression Model
Microeconometric Modeling
William Greene, Stern School of Business, New York University, New York, NY, USA
1.2 Extensions of the Linear Regression Model
Concepts
• Multiple Imputation
• Robust Covariance Matrices
• Bootstrap
• Maximum Likelihood
• Method of Moments
• Estimating Individual Outcomes
Models
• Linear Regression Model
• Quantile Regression
• Stochastic Frontier
Implementation
SAS, Stata: Create full data sets with imputed values inserted. M = 5 is the familiar standard number of imputed data sets. Data are replicated and redistributed.
• SAS: Standard procedure and code distributed.
• Stata: Elaborate imputation equations, M = 5.
NLOGIT: Create an internal map of the missing values and a set of engines for filling missing values. Loop through imputed data sets during estimation. M may be arbitrary; memory usage and data storage are independent of M. Data may be replicated.
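The slides that state how results from the M imputed data sets are pooled are not in this transcript. As a minimal sketch, assuming Rubin's standard combining rules are the ones intended, the loop over imputed data sets would end by pooling each coefficient as below (`combine_imputations` is a hypothetical helper, not an NLOGIT, SAS or Stata command):

```python
def combine_imputations(estimates, variances):
    """Pool one coefficient across M imputed data sets (Rubin's rules, assumed).

    estimates[m] and variances[m] are the estimate and its squared standard
    error from the analysis of imputed data set m.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputed data sets, one variance each")
    q_bar = sum(estimates) / m                                     # pooled estimate
    within = sum(variances) / m                                    # average sampling variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # spread across imputations
    total = within + (1.0 + 1.0 / m) * between                     # total variance
    return q_bar, total


# Hypothetical per-imputation results, as from a loop over M = 5 imputed data sets.
est, var = combine_imputations([1.0, 2.0, 3.0, 4.0, 5.0], [1.0] * 5)
print(est, var)
```

Only the M small sets of results need to be kept, so the pooling step itself does not depend on how the imputed data are stored.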
Regression with Conventional Standard Errors
Robust Covariance Matrices
Robust standard errors, not estimates.
Robust to: heteroscedasticity.
Not robust to (all considered later):
• Correlation across observations
• Individual unobserved heterogeneity
• Incorrect model specification
‘Robust inference’ means hypothesis tests and confidence intervals using robust covariance matrices.
The White Estimator
$\text{Est.Var}[\mathbf{b}] = (\mathbf{X}'\mathbf{X})^{-1}\left(\sum_{i} e_i^2\,\mathbf{x}_i\mathbf{x}_i'\right)(\mathbf{X}'\mathbf{X})^{-1}$
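The White estimator can be computed directly from the regressor matrix and the OLS residuals. A minimal pure-Python sketch, assuming a small number of regressors (the Gauss-Jordan inverse is for illustration only) and using hypothetical helper names:

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("X'X is singular")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def ols(X, y):
    """b = (X'X)^{-1} X'y and residuals e = y - Xb."""
    xtx_inv = inverse(matmul(transpose(X), X))
    xty = matmul(transpose(X), [[v] for v in y])
    b = [row[0] for row in matmul(xtx_inv, xty)]
    e = [yi - sum(bj * xj for bj, xj in zip(b, xi)) for xi, yi in zip(X, y)]
    return b, e


def white_covariance(X, e):
    """Est.Var[b] = (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1}."""
    k = len(X[0])
    xtx_inv = inverse(matmul(transpose(X), X))
    meat = [[sum(ei * ei * xi[r] * xi[c] for xi, ei in zip(X, e)) for c in range(k)]
            for r in range(k)]
    return matmul(matmul(xtx_inv, meat), xtx_inv)


# Made-up data: constant plus one regressor.
X = [[1.0, float(t)] for t in range(6)]
y = [1.0, 3.5, 4.5, 7.5, 8.0, 12.0]
b, e = ols(X, y)
V = white_covariance(X, e)
print(b, [V[k][k] ** 0.5 for k in range(2)])  # coefficients and robust standard errors
```

Only the standard errors change; the coefficients are the ordinary least squares estimates.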
Bootstrap Estimation of the Asymptotic Variance of an Estimator
Known form of asymptotic variance: Compute from known results.
Unknown form, known generalities about properties: Use bootstrapping.
• Root N consistency
• Sampling conditions amenable to central limit theorems
• Compute by resampling mechanism within the sample.
Bootstrapping Algorithm
1. Estimate parameters using the full sample: b.
2. Repeat R times:
   Draw n observations from the n, with replacement.
   Estimate with b(r).
3. Estimate variance with
   $\mathbf{V} = \frac{1}{R}\sum_{r=1}^{R}[\mathbf{b}(r) - \mathbf{b}][\mathbf{b}(r) - \mathbf{b}]'$
(Some use the mean of the replications instead of b. This was advocated (without motivation) by the original designers of the method.)
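The three steps translate almost line for line into code. A minimal sketch, assuming the estimator is any function that maps a list of observations to a list of K estimates (a hypothetical interface); `center_on_mean=True` gives the variant noted in parentheses:

```python
import random


def bootstrap_variance(data, estimator, R=200, seed=None, center_on_mean=False):
    """Bootstrap estimate of Var[b]; returns (V, list of replications b(r))."""
    rng = random.Random(seed)
    n = len(data)
    b = estimator(data)                                             # step 1: full sample
    reps = [estimator(rng.choices(data, k=n)) for _ in range(R)]   # step 2: n draws with replacement
    k = len(b)
    # Centre on b, or optionally on the mean of the replications.
    center = [sum(r[j] for r in reps) / R for j in range(k)] if center_on_mean else b
    # step 3: V = (1/R) sum_r [b(r) - center][b(r) - center]'
    V = [[sum((r[i] - center[i]) * (r[j] - center[j]) for r in reps) / R
          for j in range(k)] for i in range(k)]
    return V, reps


def mean_estimator(sample):
    """Toy estimator: the sample mean, returned as a 1-element parameter list."""
    return [sum(sample) / len(sample)]


V, _ = bootstrap_variance([0.0, 1.0] * 50, mean_estimator, R=2000, seed=1)
print(V[0][0])  # close to the textbook value 0.25 / 100
```

For a regression, `data` would hold (x, y) rows and `estimator` would return the least squares coefficients, so whole observations are resampled together.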
Application: Correlation between Age and Education
Bootstrapped Confidence Intervals
Estimate $\text{Norm}(\boldsymbol{\beta}) = (\beta_1^2 + \beta_2^2 + \beta_3^2 + \beta_4^2)^{1/2}$
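The transcript does not show how the interval for Norm(β) was built. A sketch assuming the percentile method (one common choice, not necessarily the one used on the slide), with a hypothetical column-means estimator standing in for the four regression coefficients:

```python
import math
import random


def norm_of(beta):
    """Norm(beta) = (beta_1^2 + ... + beta_K^2)^(1/2)."""
    return math.sqrt(sum(b * b for b in beta))


def bootstrap_norm_ci(data, estimator, R=1000, alpha=0.05, seed=None):
    """Point estimate and percentile bootstrap interval for Norm(beta)."""
    rng = random.Random(seed)
    n = len(data)
    point = norm_of(estimator(data))
    reps = sorted(norm_of(estimator(rng.choices(data, k=n))) for _ in range(R))
    lower = reps[int(alpha / 2 * R)]            # lower alpha/2 percentile
    upper = reps[int((1 - alpha / 2) * R) - 1]  # upper alpha/2 percentile
    return point, lower, upper


def column_means(rows):
    """Hypothetical stand-in for a 4-coefficient estimator."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


rng = random.Random(42)
rows = [tuple(rng.gauss(mu, 1.0) for mu in (1.0, 2.0, 3.0, 4.0)) for _ in range(100)]
print(bootstrap_norm_ci(rows, column_means, R=1000, seed=7))
```

Because Norm(β) is a nonlinear function of the estimates, its variance has no simple closed form here, which is the situation the bootstrap slide describes.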
Quantile Regression
$Q(y|\mathbf{x},\theta) = \boldsymbol{\beta}_\theta'\mathbf{x}$, θ = quantile
• Estimated by linear programming
• $Q(y|\mathbf{x},.50) = \boldsymbol{\beta}_{.50}'\mathbf{x}$: median regression
• Median regression estimated by LAD (estimates the same parameters as mean regression if the conditional distribution is symmetric)
Why use quantile (median) regression?
• Semiparametric
• Robust to some extensions (heteroscedasticity?)
• Complete characterization of conditional distribution
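Quantile regression minimizes the check function $\rho_\theta(u) = u(\theta - \mathbf{1}[u<0])$; with θ = .5 this is half the LAD criterion. A general LP solver is beyond a short example, but when X has full column rank the LP has an optimal solution that fits K observations exactly. The toy sketch below uses that fact for one regressor plus a constant (K = 2) by enumerating lines through pairs of points; it is O(n³), for illustration only, and the function names are hypothetical:

```python
from itertools import combinations


def check_loss(u, theta):
    """rho_theta(u) = u * (theta - 1[u < 0])."""
    return u * (theta - (1.0 if u < 0 else 0.0))


def quantile_regression_1x(x, y, theta=0.5):
    """Quantile regression of y on (1, x) by enumerating lines through pairs of points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                              # no line y = a + b x through this pair
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yk - intercept - slope * xk, theta) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Four points on y = 1 + 2x plus one large outlier.
print(quantile_regression_1x([0, 1, 2, 3, 4], [1, 3, 5, 7, 100]))  # (1.0, 2.0)
```

The median regression ignores the outlier, which a least squares fit to the same five points would not.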
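Asymptotic Theory Based Estimator of Variance of Q-REG
Model: $y_i = \boldsymbol{\beta}_\theta'\mathbf{x}_i + u_i$, $Q(y_i|\mathbf{x}_i,\theta) = \boldsymbol{\beta}_\theta'\mathbf{x}_i$, $Q[u_i|\mathbf{x}_i,\theta] = 0$
Residuals: $\hat{u}_i = y_i - \hat{\boldsymbol{\beta}}_\theta'\mathbf{x}_i$
Asymptotic Variance: $\frac{1}{N}\mathbf{A}^{-1}\mathbf{C}\mathbf{A}^{-1}$
$\mathbf{A} = E[f_u(0)\,\mathbf{x}\mathbf{x}']$, estimated by $\frac{1}{N}\sum_{i=1}^{N}\frac{1}{B}\,\mathbf{1}\left[|\hat{u}_i| \le \frac{B}{2}\right]\mathbf{x}_i\mathbf{x}_i'$
Bandwidth B can be Silverman's Rule of Thumb: $B = \frac{1.06}{N^{.2}}\,\text{Min}\left(s_u, \frac{\hat{Q}(\hat{u}_i|.75) - \hat{Q}(\hat{u}_i|.25)}{1.349}\right)$
$\mathbf{C} = \theta(1-\theta)E[\mathbf{x}\mathbf{x}']$, estimated by $\theta(1-\theta)\frac{1}{N}\mathbf{X}'\mathbf{X}$
For θ = .5 and normally distributed u, this all simplifies to $\frac{\pi}{2}s_u^2(\mathbf{X}'\mathbf{X})^{-1}$.
But, this is an ideal application for the bootstrap.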
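The pieces of the asymptotic estimator for Q-REG can be assembled directly. A minimal pure-Python sketch, assuming residuals from an already estimated quantile regression are supplied and reading the estimator of A as a uniform kernel of width B; `statistics.quantiles` (default method) is one of several quartile definitions, and the helper names are hypothetical:

```python
import statistics


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("matrix is singular (too few residuals inside the window?)")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def matmul(a, b):
    bt = [list(col) for col in zip(*b)]
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def qreg_asymptotic_cov(X, resid, theta, B=None):
    """(1/N) A^{-1} C A^{-1} for quantile regression at quantile theta."""
    N, K = len(X), len(X[0])
    if B is None:                                    # Silverman's rule of thumb
        q1, _, q3 = statistics.quantiles(resid, n=4)
        B = 1.06 / N ** 0.2 * min(statistics.stdev(resid), (q3 - q1) / 1.349)
    A = [[0.0] * K for _ in range(K)]
    C = [[0.0] * K for _ in range(K)]
    for x, u in zip(X, resid):
        inside = 1.0 if abs(u) <= B / 2 else 0.0     # uniform kernel window
        for r in range(K):
            for c in range(K):
                A[r][c] += inside * x[r] * x[c] / (B * N)
                C[r][c] += theta * (1 - theta) * x[r] * x[c] / N
    A_inv = inverse(A)
    V = matmul(matmul(A_inv, C), A_inv)
    return [[v / N for v in row] for row in V]


# Made-up residuals for an intercept-only median regression.
resid = [(-1) ** i * (i / 10) for i in range(50)]
print(qreg_asymptotic_cov([[1.0]] * 50, resid, 0.5))
```

The bandwidth choice drives the estimate of A, which is one reason the bootstrap is attractive here.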
Estimated Variance for Quantile Regression
Coefficient on MALE dummy variable in quantile regressions
A Production Function Model with Inefficiency: The Stochastic Frontier Model
Cost Inefficiency
$y^* = f(\mathbf{x})$
$C^* = g(y^*, \mathbf{w})$
(Samuelson–Shephard duality results)
Cost inefficiency: If $y < f(\mathbf{x})$, then $C$ must be greater than $g(y, \mathbf{w})$. This implies the idea of a cost frontier.
$\ln C = \ln g(y, \mathbf{w}) + u, \quad u > 0.$
Stochastic Frontier Models
Motivation:
• Factors not under control of the firm
• Measurement error
• Differential rates of adoption of technology
The frontier is randomly placed by the whole collection of stochastic elements which might enter the model outside the control of the firm.
Aigner, Lovell, Schmidt (1977),
Meeusen, van den Broeck (1977),
Battese, Corra (1977)
The Stochastic Frontier Model
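$y_i = f(\mathbf{x}_i)\,TE_i\,e^{v_i}$
$\ln y_i = \alpha + \boldsymbol{\beta}'\mathbf{x}_i + v_i - u_i = \alpha + \boldsymbol{\beta}'\mathbf{x}_i + \varepsilon_i$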
$u_i > 0$, but $v_i$ may take any value. A symmetric distribution, such as the normal distribution, is usually assumed for $v_i$. Thus, the stochastic frontier is
$\alpha + \boldsymbol{\beta}'\mathbf{x}_i + v_i$
and, as before, $u_i$ represents the inefficiency.
Least Squares Estimation
Average inefficiency is embodied in the third moment of the disturbance $\varepsilon_i = v_i - u_i$.
So long as $E[v_i - u_i]$ is constant, the OLS estimates of the slope parameters of the frontier function are unbiased and consistent. (The constant term estimates $\alpha - E[u_i]$.) The average inefficiency present in the distribution is reflected in the asymmetry of the distribution, which can be estimated using the OLS residuals:
$m_3 = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{\varepsilon}_i - \hat{E}[\hat{\varepsilon}_i]\right)^3$
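The skewness check needs only OLS residuals. A minimal sketch for one regressor plus a constant (helper names hypothetical); for a production frontier with ε = v − u, a negative m3 is the sign consistent with inefficiency:

```python
def ols_simple(x, y):
    """OLS of y on (1, x); returns intercept a, slope b and residuals e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b, [yi - a - b * xi for xi, yi in zip(x, y)]


def third_moment(e):
    """m3 = (1/N) sum_i (e_i - mean(e))^3."""
    n = len(e)
    m = sum(e) / n
    return sum((v - m) ** 3 for v in e) / n


# Made-up numbers for illustration (think of x and y as logs of input and output).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.9, 5.1, 6.0, 9.0, 10.9, 12.0]
a, b, e = ols_simple(x, y)
print(a, b, third_moment(e))
```

The mean of OLS residuals is zero when a constant is included, so subtracting it only guards against rounding.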
Application to Spanish Dairy Farms
Variable  Units                                                 Mean      Std. Dev.  Minimum   Maximum
Milk      Milk production (liters)                              131,108   92,539     14,110    727,281
Cows      # of milking cows                                     2.12      11.27      4.5       82.3
Labor     # man-equivalent units                                1.67      0.55       1.0       4.0
Land      Hectares of land devoted to pasture and crops         12.99     6.17       2.0       45.1
Feed      Total amount of feedstuffs fed to dairy cows (tons)   57,941    47,981     3,924.14  376,732
N = 247 farms, T = 6 years (1993-1998)
The Normal-Half Normal Model
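$\ln y_i = \alpha + \boldsymbol{\beta}'\mathbf{x}_i + v_i - u_i$
Normal component: $v_i \sim N[0, \sigma_v^2]$; $f(v_i) = \frac{1}{\sigma_v}\phi\left(\frac{v_i}{\sigma_v}\right)$, $-\infty < v_i < \infty$.
Half normal component: $u_i = |U_i|$, $U_i \sim N[0, \sigma_u^2]$
Underlying normal: $f(U_i) = \frac{1}{\sigma_u}\phi\left(\frac{U_i}{\sigma_u}\right)$, $-\infty < U_i < \infty$
Half normal: $f(u_i) = \frac{(1/\sigma_u)\,\phi(u_i/\sigma_u)}{\Phi(0)}$, $u_i \ge 0$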
Estimation: Least Squares/MoM
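OLS estimator of β is consistent.
$E[u_i] = (2/\pi)^{1/2}\sigma_u$, so the OLS constant estimates $\alpha - (2/\pi)^{1/2}\sigma_u$.
Second and third moments of OLS residuals estimate
$m_2 = \sigma_v^2 + \left[\frac{\pi - 2}{\pi}\right]\sigma_u^2$ and $m_3 = \left(\frac{2}{\pi}\right)^{1/2}\left[1 - \frac{4}{\pi}\right]\sigma_u^3 < 0$
Method of Moments: Use $[a, \mathbf{b}, m_2, m_3]$ to estimate $[\alpha, \boldsymbol{\beta}, \sigma_u, \sigma_v]$.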
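Solving the two moment equations gives σu from m3, then σv from m2, and α by undoing the shift in the OLS constant. A minimal sketch (function name hypothetical); it refuses m3 ≥ 0, the wrong-skewness case in which the m3 equation has no positive solution for σu:

```python
import math

C3 = math.sqrt(2.0 / math.pi) * (1.0 - 4.0 / math.pi)   # m3 = C3 * sigma_u^3, and C3 < 0


def mom_normal_half_normal(a, m2, m3):
    """Method of moments for the normal-half normal frontier.

    a: OLS constant; m2, m3: second and third central moments of OLS residuals.
    Returns (alpha, sigma_u, sigma_v).
    """
    if m3 >= 0:
        raise ValueError("m3 >= 0: wrong skewness, no positive sigma_u solves the m3 equation")
    sigma_u = (m3 / C3) ** (1.0 / 3.0)                      # both negative, so the ratio is > 0
    sigma_v2 = m2 - (math.pi - 2.0) / math.pi * sigma_u ** 2
    if sigma_v2 <= 0:
        raise ValueError("implied sigma_v^2 <= 0")
    alpha = a + math.sqrt(2.0 / math.pi) * sigma_u          # undo the E[u] shift in the constant
    return alpha, sigma_u, math.sqrt(sigma_v2)


# Moments implied by assumed values alpha = 1, sigma_u = 0.5, sigma_v = 0.3:
a = 1.0 - math.sqrt(2.0 / math.pi) * 0.5
m2 = 0.3 ** 2 + (math.pi - 2.0) / math.pi * 0.5 ** 2
m3 = C3 * 0.5 ** 3
print(mom_normal_half_normal(a, m2, m3))   # recovers approximately (1.0, 0.5, 0.3)
```

The slopes b need no adjustment, since only the constant absorbs E[u].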
Standard Form: The Skew Normal Distribution
Log Likelihood Function
Waldman (1982) result on skewness of OLS residuals: If the OLS residuals are positively skewed, rather than negatively skewed, then OLS maximizes the log likelihood, and there is no evidence of inefficiency in the data.
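The slide's log likelihood is not reproduced in this transcript. As a sketch, assuming the standard normal-half normal form with σ² = σu² + σv² and λ = σu/σv, each observation contributes ln(2/σ) + ln φ(εi/σ) + ln Φ(−λεi/σ). At λ = 0 it collapses to the normal log likelihood of OLS, the point Waldman's result refers to (function names hypothetical):

```python
import math


def sigma_lambda(sigma_u, sigma_v):
    """Map (sigma_u, sigma_v) to sigma = sqrt(su^2 + sv^2) and lambda = su / sv."""
    return math.hypot(sigma_u, sigma_v), sigma_u / sigma_v


def loglik_half_normal(eps, sigma, lam):
    """ln L = sum_i [ln(2/sigma) + ln phi(eps_i/sigma) + ln Phi(-lam*eps_i/sigma)]."""
    total = 0.0
    for e in eps:
        z = e / sigma
        log_phi = -0.5 * math.log(2.0 * math.pi) - 0.5 * z * z
        # Phi(-lam*z) via erfc; it underflows to 0 for very large lam*z.
        big_phi = 0.5 * math.erfc(lam * z / math.sqrt(2.0))
        total += math.log(2.0 / sigma) + log_phi + math.log(big_phi)
    return total


sigma, lam = sigma_lambda(0.5, 0.3)
print(loglik_half_normal([-0.8, -0.2, 0.1, 0.3], sigma, lam))  # made-up residuals
```

In practice εi = yi − α − β'xi is recomputed at each trial value of the parameters and the sum is maximized numerically.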
Alternative Models: Half Normal and Exponential
Other Models
• Many other parametric models
• Semiparametric and nonparametric: the recent outer reaches of the theoretical literature
• Other variations, including heterogeneity in the frontier function and in the distribution of inefficiency
Normal-Exponential Likelihood
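$\ln L(\text{data}; \alpha, \boldsymbol{\beta}, \sigma_v, \sigma_u) = \sum_{i=1}^{n}\left[-\ln\sigma_u + \frac{1}{2}\left(\frac{\sigma_v}{\sigma_u}\right)^2 + \frac{\varepsilon_i}{\sigma_u} + \ln\Phi\left(-\frac{\varepsilon_i}{\sigma_v} - \frac{\sigma_v}{\sigma_u}\right)\right]$, $\varepsilon_i = y_i - \alpha - \boldsymbol{\beta}'\mathbf{x}_i$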
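The normal-exponential log likelihood transcribes directly, with εi = yi − α − β'xi computed beforehand. A minimal sketch (function name hypothetical); Φ is evaluated through `math.erfc`, which can underflow for extreme residuals:

```python
import math


def loglik_normal_exponential(eps, sigma_u, sigma_v):
    """ln L = sum_i [-ln su + (1/2)(sv/su)^2 + eps_i/su + ln Phi(-eps_i/sv - sv/su)]."""
    r = sigma_v / sigma_u
    total = 0.0
    for e in eps:
        z = -e / sigma_v - r
        big_phi = 0.5 * math.erfc(-z / math.sqrt(2.0))   # Phi(z)
        total += -math.log(sigma_u) + 0.5 * r * r + e / sigma_u + math.log(big_phi)
    return total


print(loglik_normal_exponential([-0.8, -0.2, 0.1, 0.3], 0.5, 0.3))  # made-up residuals
```

Here σu is the mean of the exponential inefficiency term, so E[εi] = −σu.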
A Test for Inefficiency?
Base test on $\sigma_u = 0 \Leftrightarrow \lambda = 0$.
Standard test procedures:
• Likelihood ratio
• Wald
• Lagrange Multiplier
Nonstandard testing situation:
• Variance = 0 is on the boundary of the parameter space.
• Standard chi squared distribution does not apply.
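The transcript stops at noting that the standard chi squared distribution does not apply. A commonly used remedy for a single parameter on the boundary, assumed here rather than taken from the slide, is to refer the LR statistic to a 50:50 mixture of χ²(0) and χ²(1). A minimal sketch (helper names hypothetical):

```python
import math


def chi2_1_sf(x):
    """P(chi-squared(1) > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(x / 2.0)) if x > 0 else 1.0


def boundary_lr_pvalue(lr):
    """p-value for LR = 2(lnL_frontier - lnL_OLS) under the 50:50 chi2(0)/chi2(1) mixture."""
    return 0.5 * chi2_1_sf(lr) if lr > 0 else 1.0


print(boundary_lr_pvalue(3.0), chi2_1_sf(3.0))  # mixture p-value is half the naive one
```

Under this mixture the 5% critical value is 2.706 (the 90th percentile of χ²(1)) rather than 3.84.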
Estimating ui
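• No direct estimate of $u_i$
• Data permit estimation of $y_i - \boldsymbol{\beta}'\mathbf{x}_i$. Can this be used?
• $\varepsilon_i = y_i - \boldsymbol{\beta}'\mathbf{x}_i = v_i - u_i$
• Indirect estimate of $u_i$, using $E[u_i|v_i - u_i]$. This is $E[u_i|y_i, \mathbf{x}_i]$.
• $v_i - u_i$ is estimable with $e_i = y_i - \mathbf{b}'\mathbf{x}_i$.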
Fundamental Tool - JLMS
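$E[u_i|\varepsilon_i] = \frac{\sigma\lambda}{1+\lambda^2}\left[\frac{\phi(a_i)}{1-\Phi(a_i)} - a_i\right], \quad a_i = \frac{\varepsilon_i\lambda}{\sigma}$
We can insert our maximum likelihood estimates of all parameters.
Note: This estimates $E[u|v_i - u_i]$, not $u_i$.
$\hat{E}[u_i|\hat{\varepsilon}_i] = \frac{\hat{\sigma}\hat{\lambda}}{1+\hat{\lambda}^2}\left[\frac{\phi(\hat{a}_i)}{1-\Phi(\hat{a}_i)} - \hat{a}_i\right], \quad \hat{a}_i = \frac{(y_i - \hat{\boldsymbol{\beta}}'\mathbf{x}_i)\hat{\lambda}}{\hat{\sigma}}$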
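The JLMS formula is a one-liner per observation once σ and λ are estimated. A minimal sketch (function name hypothetical), for the production case εi = vi − ui; the tail probability is computed with `math.erfc`, so very large positive residuals can underflow it to zero:

```python
import math


def jlms(eps, sigma, lam):
    """E[u_i | eps_i] for the normal-half normal production frontier (eps = v - u)."""
    scale = sigma * lam / (1.0 + lam * lam)      # equals sigma_u * sigma_v / sigma
    out = []
    for e in eps:
        a = e * lam / sigma
        pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)   # phi(a)
        upper_tail = 0.5 * math.erfc(a / math.sqrt(2.0))          # 1 - Phi(a)
        out.append(scale * (pdf / upper_tail - a))
    return out


print(jlms([-1.0, 0.0, 1.0], 1.0, 1.0))  # larger (more negative) shortfall -> larger E[u|eps]
```

Applied to residuals from the ML estimates, this gives the estimated conditional mean for each observation, not the realized ui.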
Estimated Translog Production Frontiers
A Semiparametric Approach
$Y = g(\mathbf{x},\mathbf{z}) + v - u$ [Normal-Half Normal]
(1) Locally linear nonparametric regression estimates $g(\mathbf{x},\mathbf{z})$.
(2) Use residuals from the nonparametric regression to estimate the variance parameters using MLE.
(3) Use the estimated variance parameters and residuals to estimate technical efficiency.
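Step (1) can be illustrated with a single regressor. A minimal sketch, assuming a Gaussian kernel and a user-chosen bandwidth h (neither is specified on the slide) and hypothetical function names; steps (2) and (3) would pass these residuals to the MLE and the efficiency estimator:

```python
import math
import random


def local_linear(x, y, x0, h):
    """Local linear estimate of g(x0) with a Gaussian kernel and bandwidth h."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(x, y):
        d = xi - x0
        w = math.exp(-0.5 * (d / h) ** 2)     # Gaussian kernel weight
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    # Weighted LS of y on (1, x - x0); the intercept is the fitted value at x0.
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


def nonparametric_residuals(x, y, h):
    """Step (1): residuals y_i - g_hat(x_i), the input to step (2)."""
    return [yi - local_linear(x, y, xi, h) for xi, yi in zip(x, y)]


rng = random.Random(0)
x = [i / 20 for i in range(41)]
y = [math.log(1.0 + xi) + rng.gauss(0.0, 0.1) - abs(rng.gauss(0.0, 0.2)) for xi in x]  # made-up data
print(nonparametric_residuals(x, y, 0.2)[:5])
```

A local linear fit reproduces any straight line exactly, which makes it less biased near the ends of the data than a local constant fit.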
Methodological Problems with DEA
• Measurement error
• Outliers
• Specification errors
• The overall problem with the deterministic frontier approach
DEA and SFA: Same Answer?
Christensen and Greene data: N = 123 minus 6 tiny firms
• X = capital, labor, fuel
• Y = millions of KWH
Cobb-Douglas Production Function vs. DEA