endogeneity and instrumental variables

Upload: justin-bal

Post on 22-Feb-2018


  • 7/24/2019 Endogeneity and Instrumental Variables

    Endogeneity and Instrumental Variables

    Presented by Justin Balthrop

    September 28, 2015

    What Exactly Are Instrumental Variables?

    Take the simple case of ordinary least squares regression with a single explanatory variable:

    $Y_i = \beta_0 + \beta_1 X_i + u_i$

    A fundamental assumption for estimating $\beta_1$ is that the correlation between X and u is zero.

    If this is not the case (X is endogenous), then an instrumental variable Z can isolate the part of X which is *not* correlated with the error term.

    An Instrument Must Satisfy Relevance and Exclusion

    Loosely speaking, the relevance restriction requires that the instrument Z be non-trivially related to the endogenous regressor X: Cov(Z, X) ≠ 0.

    Exclusion, or exogeneity, requires that Z not be systematically related to the error term u: Cov(Z, u) = 0.

    Why do we need IV?

    Before we worry about external validity and the big-picture implications of our results, we need to satisfy internal validity.

    Three main sources of internal validity issues are:

    Omitted variables bias

    Simultaneity bias

    Errors-in-variables bias

    Appropriately instrumenting for endogenous regressors can eliminate these biases.

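    Univariate IV Estimation: Two-Stage Least Squares

    Stage 1: Identify the portion of X that is uncorrelated with the error u by regressing X on Z:

    $X_i = \pi_0 + \pi_1 Z_i + v_i$

    This gives estimates of $\pi_0$ and $\pi_1$, which are used to get predicted X values:

    $\hat{X}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i$

    Stage 2: Replace the X values with the estimated $\hat{X}$ and estimate by OLS:

    $Y_i = \beta_0 + \beta_1 \hat{X}_i + u_i$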
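The two stages can be sketched in a few lines of NumPy. This is a minimal illustration on simulated data, not the presenter's code: the function names, the data-generating process, and the parameter values are all assumptions chosen so that X is endogenous and Z is a valid instrument.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients of y on X (X must already contain a constant column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(y, x, z):
    """Univariate 2SLS: y on one endogenous x, instrumented by one z."""
    const = np.ones(len(y))
    Z = np.column_stack([const, z])
    # Stage 1: regress x on the instrument and form predicted values.
    pi_hat = ols(x, Z)
    x_hat = Z @ pi_hat
    # Stage 2: regress y on the predicted x.
    beta_hat = ols(y, np.column_stack([const, x_hat]))
    return beta_hat, pi_hat

# Simulated data (assumed values): true slope is 1.5.
rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)
u = rng.normal(size=n)
v = 0.8 * u + rng.normal(size=n)   # v is correlated with u, so x is endogenous
x = 1.0 + 0.5 * z + v
y = 2.0 + 1.5 * x + u

beta_iv, pi_hat = two_stage_least_squares(y, x, z)
beta_ols = ols(y, np.column_stack([np.ones(n), x]))
print("OLS slope: ", beta_ols[1])
print("2SLS slope:", beta_iv[1])
```

In this setup the OLS slope should sit noticeably above the true value, while the 2SLS slope should land close to it. The point estimates from this manual two-step procedure are the 2SLS estimates, but the standard errors from the second-stage OLS would not be correct.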

    Underlying Assumptions of 2SLS

    Instrument validity

    $\pi_0$ and $\pi_1$ are well estimated in the first stage (large samples)

    Why it works:
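One standard way to see this, sketched for the simple model $Y_i = \beta_0 + \beta_1 X_i + u_i$ with a single instrument Z, and assuming both relevance and exclusion hold:

```latex
\begin{aligned}
\mathrm{Cov}(Z, Y) &= \mathrm{Cov}(Z, \beta_0 + \beta_1 X + u) \\
                   &= \beta_1 \mathrm{Cov}(Z, X) + \mathrm{Cov}(Z, u) \\
                   &= \beta_1 \mathrm{Cov}(Z, X)
                      \quad \text{since } \mathrm{Cov}(Z, u) = 0, \\
\beta_1 &= \frac{\mathrm{Cov}(Z, Y)}{\mathrm{Cov}(Z, X)}
          \quad \text{since } \mathrm{Cov}(Z, X) \neq 0 .
\end{aligned}
```

Replacing the population covariances with sample covariances gives an estimator that is consistent in large samples.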

    Careful about Standard Errors

    Second-stage OLS standard errors are not correct

    They need to be adjusted for the fact that the explanatory variables are estimated

    See Wooldridge for the math and Stata for the code: ivreg, robust

    Other considerations:

    Heteroskedasticity

    Appropriate clustering

    Instrument relevance: more relevance means lower estimator variance and a higher R-squared in the first stage
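A minimal sketch of the adjustment under homoskedasticity, on simulated data. All names and parameter values here are illustrative assumptions, and robust or clustered variants are not shown. The coefficients come from the second stage, but the residuals used in the variance formula are computed with the actual X rather than the fitted X.

```python
import numpy as np

# Simulated data (assumed values): x is endogenous, z is a valid instrument.
rng = np.random.default_rng(1)
n = 5_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.5 * z + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

Z = np.column_stack([np.ones(n), z])   # instruments, including the constant
X = np.column_stack([np.ones(n), x])   # regressors, including the constant

# First stage: project every column of X on Z to get the fitted regressors.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
# Second stage: 2SLS coefficients.
beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]

XtX_inv = np.linalg.inv(X_hat.T @ X_hat)
dof = n - X.shape[1]

# Naive residuals use the fitted X (what second-stage OLS output reports).
resid_naive = y - X_hat @ beta
# Correct residuals use the actual X.
resid_correct = y - X @ beta

se_naive = np.sqrt(resid_naive @ resid_naive / dof * np.diag(XtX_inv))
se_correct = np.sqrt(resid_correct @ resid_correct / dof * np.diag(XtX_inv))
print("naive s.e.:  ", se_naive[1])
print("correct s.e.:", se_correct[1])
```

The two standard errors differ only through the residual variance, so the size and direction of the gap depend on the data-generating process.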

    IV Regression in a Multivariate Model

    Aside from messy algebra, estimation generalizes rather easily

    Key identification criterion: at least as many Z as endogenous X

    Don't forget to instrument for interactions between endogenous X

    Underidentified = too few instruments to estimate the coefficient vector $\beta$ (correctly, anyway)

    Exactly identified = equal number of Z and endogenous X

    Overidentified = more Z than endogenous X

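    Testing for Instrument Relevance

    Assume one endogenous X and m instruments

    The first-stage regression is therefore:

    $X = \pi_0 + \pi_1 Z_1 + \dots + \pi_m Z_m + v$

    Relevance requires at least one $\pi_i$, $i = 1, \dots, m$, to be different from zero

    If they are all zero or close to zero, the instruments are weak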

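    Why are weak instruments so bad?

    Back to the simple model:

    $Y_i = \beta_0 + \beta_1 X_i + u_i$

    With estimator:

    $\hat{\beta}_1^{IV} = \dfrac{s_{ZY}}{s_{ZX}}$

    where $s_{ZY}$ and $s_{ZX}$ are the sample covariances of Z with Y and of Z with X.

    Weak instruments lead to a near-zero denominator, and the resulting sampling distribution cannot be accurately approximated by its asymptotic distribution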
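A small Monte Carlo can make the near-zero denominator problem concrete. The sketch below is an illustration under assumed parameter values (sample size, first-stage coefficients, error correlation), using the ratio-of-covariances estimator for one regressor and one instrument.

```python
import numpy as np

def iv_slope(y, x, z):
    """Simple IV estimator: sample Cov(z, y) divided by sample Cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

def simulate(pi1, n=200, reps=1000, beta1=1.0, seed=0):
    """Draw `reps` samples and return the IV slope estimate from each one."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        z = rng.normal(size=n)
        u = rng.normal(size=n)
        x = pi1 * z + 0.8 * u + rng.normal(size=n)  # endogenous regressor
        y = beta1 * x + u
        estimates[r] = iv_slope(y, x, z)
    return estimates

def iqr(a):
    """Interquartile range, a spread measure that is robust to extreme draws."""
    q75, q25 = np.percentile(a, [75, 25])
    return q75 - q25

strong = simulate(pi1=1.0)    # strong first stage
weak = simulate(pi1=0.05)     # weak first stage

print("strong: median", np.median(strong), "IQR", iqr(strong))
print("weak:   median", np.median(weak), "IQR", iqr(weak))
```

With the weak first stage, the estimates should be far more dispersed and their median pulled away from the true slope, which is the finite-sample behavior the asymptotic normal approximation misses.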

    Measuring the Strength of Instruments

    The first-stage F-test

    Tests the hypothesis that the instruments Z_i do not enter the first-stage regression

    A small F-statistic (less than 10) indicates weak instruments

    If the set of instruments is weak, get better instruments

    If that is impossible, consider dropping the weakest to improve the first-stage F

    This is somewhat ad hoc
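The first-stage F-statistic can be computed by comparing the first-stage regression with and without the instruments. The sketch below assumes homoskedastic errors and no exogenous regressors other than a constant; the function name and the simulated coefficients are illustrative assumptions.

```python
import numpy as np

def first_stage_f(x, Z):
    """F-statistic for H0: all coefficients on the instruments Z are zero.

    x : (n,) endogenous regressor; Z : (n, m) matrix of instruments.
    Homoskedasticity-only formula with a constant as the only other regressor.
    """
    n, m = Z.shape
    unrestricted = np.hstack([np.ones((n, 1)), Z])
    coef = np.linalg.lstsq(unrestricted, x, rcond=None)[0]
    resid_u = x - unrestricted @ coef
    resid_r = x - x.mean()            # restricted model: constant only
    rss_u = resid_u @ resid_u
    rss_r = resid_r @ resid_r
    return ((rss_r - rss_u) / m) / (rss_u / (n - m - 1))

# Simulated first stages (assumed values): same instruments, different strength.
rng = np.random.default_rng(2)
n = 1_000
Z = rng.normal(size=(n, 2))
noise = rng.normal(size=n)
x_strong = 0.5 * Z[:, 0] + 0.5 * Z[:, 1] + noise
x_weak = 0.01 * Z[:, 0] + 0.01 * Z[:, 1] + noise

f_strong = first_stage_f(x_strong, Z)
f_weak = first_stage_f(x_weak, Z)
print(f"strong instruments: F = {f_strong:.1f}")
print(f"weak instruments:   F = {f_weak:.1f}")
```

Comparing each value with the rule-of-thumb threshold of 10 gives the diagnostic described above.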

    Too Many Instruments = Tests for Overidentifying Restrictions

    Assume we have multiple valid instruments with a single endogenous regressor.

    Intuition: If we perform 2SLS using each instrument separately and arrive at completely different results, the instruments cannot both be valid.

    Statistics: J-test

    J-Test of Overidentifying Restrictions

    Step 1: Estimate the conditional expectation function using TSLS and both instruments

    Step 2: Compute predicted Y values using the *actual* Xs

    Step 3: Compute residuals

    Step 4: Regress residuals against all instruments Z and exogenous regressors X

    Step 5: Test the hypothesis that all coefficients on Z_i are zero, with J-statistic $J = mF$

    Here, m is the number of instruments and F is the F-stat from testing the coefficients on Z_i

    If some instruments are exogenous and others endogenous, the J-stat will be large, rejecting the null that all instruments are exogenous
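The five steps can be sketched as follows for one endogenous regressor, two instruments, and a constant as the only exogenous regressor. The formula J = mF with a homoskedasticity-only F is the textbook form assumed here; the function names and the simulated design (including the invalid-instrument case) are illustrative assumptions.

```python
import numpy as np

def ols_resid(y, X):
    """Residuals from an OLS regression of y on X."""
    return y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

def j_statistic(y, x, Z):
    """J = m * F for one endogenous regressor x and instruments Z (n x m)."""
    n, m = Z.shape
    const = np.ones((n, 1))
    ZZ = np.hstack([const, Z])
    # Step 1: TSLS using all instruments.
    x_hat = x - ols_resid(x, ZZ)
    beta = np.linalg.lstsq(np.column_stack([const, x_hat]), y, rcond=None)[0]
    # Steps 2-3: predicted values and residuals based on the actual x.
    u_hat = y - (beta[0] + beta[1] * x)
    # Step 4: regress the residuals on the instruments and the constant.
    e_u = ols_resid(u_hat, ZZ)
    e_r = ols_resid(u_hat, const)
    rss_u, rss_r = e_u @ e_u, e_r @ e_r
    # Step 5: F-test that all instrument coefficients are zero, then J = m * F.
    F = ((rss_r - rss_u) / m) / (rss_u / (n - m - 1))
    return m * F

# Simulated data (assumed values).
rng = np.random.default_rng(4)
n = 5_000
Z = rng.normal(size=(n, 2))
u = rng.normal(size=n)
x = 0.5 * Z[:, 0] + 0.5 * Z[:, 1] + 0.8 * u + rng.normal(size=n)
y_valid = 1.0 + 2.0 * x + u
y_invalid = y_valid + 0.5 * Z[:, 1]   # second instrument enters y directly

j_valid = j_statistic(y_valid, x, Z)
j_invalid = j_statistic(y_invalid, x, Z)
print("J (both instruments valid):", j_valid)
print("J (one instrument invalid):", j_invalid)
```

With two instruments and one endogenous regressor there is one overidentifying restriction, so J would be compared with the chi-square critical value with one degree of freedom (3.84 at the 5% level).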

    Sargan Test for Overidentification

    1. Estimate the 2SLS IV regression and extract the residuals

    2. Regress these residuals on all exogenous variables and extract $R^2$

    3. Calculate $nR^2$, which is $\chi^2$ distributed

    4. Compare the value with the critical value in the chi-square table with degrees of freedom equal to # instruments less # endogenous regressors

    If the statistic ($nR^2$) exceeds the critical $\chi^2$ value, conclude the instruments are invalid. They are not uncorrelated with the error term and hence have some explanatory power in the main equation.

    Be very careful: The test assumes that at least one instrument is valid.

    If none of the instruments fulfills Cov(z_i, u_i) = 0, then the test might suggest that the instruments are valid, even when they are not
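A compact sketch of the $nR^2$ computation for one endogenous regressor, with a constant as the only included exogenous variable. The function name and the simulated design are assumptions made for illustration.

```python
import numpy as np

def sargan(y, x, Z):
    """Return (n * R^2, degrees of freedom) for one endogenous regressor x.

    Z : (n, m) matrix of instruments; a constant is added internally.
    """
    n, m = Z.shape
    const = np.ones(n)
    W = np.column_stack([const, Z])          # all exogenous variables
    # 1. 2SLS estimates, then residuals computed with the actual x.
    x_hat = W @ np.linalg.lstsq(W, x, rcond=None)[0]
    beta = np.linalg.lstsq(np.column_stack([const, x_hat]), y, rcond=None)[0]
    u_hat = y - (beta[0] + beta[1] * x)
    # 2. Regress the residuals on all exogenous variables and get R^2.
    fitted = W @ np.linalg.lstsq(W, u_hat, rcond=None)[0]
    rss = np.sum((u_hat - fitted) ** 2)
    tss = np.sum((u_hat - u_hat.mean()) ** 2)
    r2 = 1.0 - rss / tss
    # 3-4. n * R^2, with df = # instruments - # endogenous regressors.
    return n * r2, m - 1

# Simulated data (assumed values).
rng = np.random.default_rng(5)
n = 5_000
Z = rng.normal(size=(n, 2))
u = rng.normal(size=n)
x = 0.5 * Z[:, 0] + 0.5 * Z[:, 1] + 0.8 * u + rng.normal(size=n)
y_valid = 1.0 + 2.0 * x + u
y_invalid = y_valid + 0.5 * Z[:, 1]   # second instrument violates exclusion

stat_valid, df = sargan(y_valid, x, Z)
stat_invalid, _ = sargan(y_invalid, x, Z)
print("valid instruments:  nR^2 =", stat_valid, "df =", df)
print("invalid instrument: nR^2 =", stat_invalid, "df =", df)
```

With df = 1, each statistic would be compared with the 5% chi-square critical value of 3.84.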

    Durbin-Wu-Hausman Test

    Balances the consistency of IV against the efficiency of LS

    H0: IV and LS both consistent, but LS is efficient

    H1: Only IV is consistent

    DWH test for a single endogenous regressor:

    $\mathrm{DWH} = \dfrac{b_{IV} - b_{LS}}{\sqrt{s^2_{b_{IV}} - s^2_{b_{LS}}}} \sim N(0, 1)$

    If |DWH| > 1.96, then X is endogenous and IV is the preferred estimator despite its inefficiency

    A roughly equivalent procedure for DWH:

    1. Estimate the first-stage model

    2. Include the first-stage residual in the structural model along with the endogenous X

    3. Test for significance of the coefficient on the residual

    Note: The coefficient on the endogenous X in this model is $b_{IV}$ (the standard error is smaller, though)

    The first-stage residual is a generated regressor
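The residual-inclusion procedure can be sketched as below on simulated data. Variable names and parameter values are illustrative assumptions, and the t-statistic uses plain homoskedastic OLS standard errors.

```python
import numpy as np

# Simulated data (assumed values): x is endogenous by construction.
rng = np.random.default_rng(3)
n = 5_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.5 * z + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

const = np.ones(n)
Z = np.column_stack([const, z])

# 1. First stage: regress x on the instrument and keep the residual.
v_hat = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# 2. Structural model augmented with the first-stage residual.
A = np.column_stack([const, x, v_hat])
coef = np.linalg.lstsq(A, y, rcond=None)[0]
resid = y - A @ coef
sigma2 = resid @ resid / (n - A.shape[1])
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))

# 3. t-statistic on the residual's coefficient.
t_resid = coef[2] / se[2]

# For comparison: the 2SLS slope from regressing y on the fitted x.
x_hat = x - v_hat
beta_2sls = np.linalg.lstsq(np.column_stack([const, x_hat]), y, rcond=None)[0]

print("t-stat on first-stage residual:", t_resid)
print("slope on x in augmented model: ", coef[1])
print("2SLS slope:                    ", beta_2sls[1])
```

The slope on x in the augmented regression coincides with the 2SLS slope, which matches the note above; a large absolute t-statistic on the residual points to endogeneity of X.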

    The following example is taken from the University of Albany Center for Social and Demographic Analysis presentation on IV Estimation

    Angrist and Krueger (1991), Q.J.E.

    Returns to education (Y = wages)

    Problem of omitted ability bias

    Years of schooling vary by quarter of birth

    Compulsory schooling laws, age-at-entry rules

    Someone born in Q1 is a little older and will be able to drop out sooner than someone born later in the year

    Q.O.B. can be treated as a useful source of exogeneity in schooling

    Angrist and Krueger (1991), Q.J.E.

    People born in Q1 do obtain less schooling

    But pay close attention to the scale of the y-axis

    Mean difference between Q1 and Q4 is only 0.124 years, or 1.5 months

    So a large N is needed, since $R^2_{X,Z}$ will be very small

    A&K had over 300k observations for the 1930-39 cohort

    Source: Angrist and Krueger (1991), Figure I

    Angrist and Krueger (1991), Q.J.E.

    Final 2SLS model interacted QOB with year of birth (30), state of birth (150)

    OLS: b = .0628 (s.e. = .0003)

    2SLS: b = .0811 (s.e. = .0109)

    Least squares estimate does not appear to be badly biased by omitted variables

    But a replication effort identified some pitfalls in this analysis that are instructive

    Bound, Jaeger, and Baker (1995), J.A.S.A.

    Potential problems with QOB as an IV

    Correlation between QOB and schooling is weak

    Small Cov(X,Z) introduces finite-sample bias, which will be exacerbated with the inclusion of many IVs

    QOB may not be completely exogenous

    Even small Cov(Z,e) will cause inconsistency, and this will be exacerbated when Cov(X,Z) is small

    QOB qualifies as a weak instrument that may be correlated with unobserved determinants of wages (e.g., family income)

    Bound, Jaeger, and Baker (1995), J.A.S.A.

    Even if the instrument is good, matters can be made far worse with IV as opposed to LS

    Weak correlation between IV and endogenous regressor can pose severe finite-sample bias

    And really large samples won't help, especially if there is even weak endogeneity between IV and error

    First-stage diagnostics provide a sense of how good an IV is in a given setting

    F-test and partial $R^2$ on IVs

    Lewbel (2012) Method of Identification

    Mostly applicable to models with an unobserved common factor

    Identification is achieved by having regressors that are uncorrelated with the product of heteroskedastic errors

    Consider $Y_1$, $Y_2$ as observed endogenous variables, X a vector of observed exogenous regressors, and $\varepsilon = (\varepsilon_1, \varepsilon_2)$ as unobserved error processes.

    Consider a structural model of the form:

    $Y_1 = X'\beta_1 + Y_2\gamma_1 + \varepsilon_1$ (1)

    $Y_2 = X'\beta_2 + Y_1\gamma_2 + \varepsilon_2$ (2)

    Higher-moment considerations (restricting correlations of $\varepsilon\varepsilon'$ with X):

    In the presence of heteroskedasticity related to at least some elements of X, identification can be achieved.
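For the triangular special case in which $Y_1$ does not enter the second equation, the moment restrictions usually associated with Lewbel (2012) can be sketched as below. The symbol Z, denoting a subset of X (possibly all of it), is not in the slide and is introduced here only for illustration.

```latex
\begin{aligned}
Y_1 &= X'\beta_1 + Y_2\gamma_1 + \varepsilon_1, \qquad
Y_2 = X'\beta_2 + \varepsilon_2, \\
E(X\varepsilon_1) &= 0, \qquad E(X\varepsilon_2) = 0, \\
\mathrm{Cov}(Z, \varepsilon_1\varepsilon_2) &= 0, \qquad
\mathrm{Cov}(Z, \varepsilon_2^2) \neq 0 .
\end{aligned}
```

Under these conditions, $(Z - \bar{Z})\hat{\varepsilon}_2$, where $\hat{\varepsilon}_2$ is the residual from regressing $Y_2$ on X, can serve as a constructed instrument for $Y_2$ in the first equation. The last condition is where heteroskedasticity related to X does the identifying work.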