Regression with Dummy Variables (ECON 420)



    Econometrics

    OLS Regression with Dummy Variables



Interpretation of Regression Coefficients with a Binary Regressor

    Female is a dummy variable, which indicates whether a person is female:

    $Female_i = 1$ if female, $= 0$ if male

    Consider a regression of Wage on a constant and Female:

    $Wage_i = \beta_0 + \delta_0 Female_i + u_i$

    For a male ($Female_i = 0$), the regression model is:

    $Wage_i = \beta_0 + u_i$

    $\beta_0$ is the average wage for male workers.

    For a female ($Female_i = 1$), the regression model is:

    $Wage_i = (\beta_0 + \delta_0) + u_i$

    $\beta_0 + \delta_0$ is the average wage for female workers.

    $\delta_0$ is the difference in average wages between female and male workers (how much more females earn relative to males).
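    As a quick check of this interpretation, here is a minimal runnable sketch using Stata's bundled auto dataset; the dummy foreign stands in for Female and price for Wage (our substitution, not the lecture's data):

    * regression on a single dummy: _cons estimates the mean of the
    * omitted (foreign==0) group; the coefficient on the dummy
    * estimates the difference in group means
    sysuse auto, clear
    regress price foreign, robust
    tabstat price, by(foreign)   // compare against the raw group means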


    Regression for Females

    -> sex = 0

    Linear regression          Number of obs = 12
                               F(1, 10)      = 2.21
                               Prob > F      = 0.1683
                               R-squared     = 0.2868
                               Root MSE      = .38277

                  Robust
    gpa      Coef.   Std. Err.     t    P>|t|   [95% Conf. Interval]
    hsgpa    .869     .585        1.49   0.168    -.435      2.173
    _cons    .244    2.26         0.11   0.916   -4.785      5.273


    Regression for Males

    -> sex = 1

    Linear regression          Number of obs = 22
                               F(1, 20)      = 15.29
                               Prob > F      = 0.0009
                               R-squared     = 0.2236
                               Root MSE      = .35021

                  Robust
    gpa      Coef.   Std. Err.     t    P>|t|   [95% Conf. Interval]
    hsgpa    .710     .182        3.91   0.001     .331      1.089
    _cons    .691     .711        0.97   0.342    -.791      2.173


    GPA Example: Regression with Dummy Variables

    regress gpa sex, robust

    Linear regression          Number of obs = 34
                               F(1, 32)      = 1.41
                               Prob > F      = 0.2440
                               R-squared     = 0.0443
                               Root MSE      = .40364

                  Robust
    gpa      Coef.    Std. Err.     t     P>|t|   [95% Conf. Interval]
    sex     -.1764    .1486       -1.19   0.244   -.4792     .1264
    _cons   3.540     .1231       28.75   0.000   3.2889    3.7904

    $\widehat{GPA} = 3.540 - 0.1764\,SEX$,  $R^2 = 0.04$
              (0.12)  (0.15)
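    Reading the fitted line: with sex = 0 for females and sex = 1 for males (as in the two slides above), the predicted GPAs are

    $\widehat{GPA}_{sex=0} = 3.540$
    $\widehat{GPA}_{sex=1} = 3.540 - 0.1764 = 3.364$

    so the coefficient on sex is exactly the male-female difference in average GPA, and its t-statistic ($-1.19$, p = 0.244) says that difference is not statistically significant.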


Interpretation of Regression Coefficients with a Binary Regressor

    Consider a regression of Wage on a constant, Female, and Educ:

    $Wage_i = \beta_0 + \delta_0 Female_i + \beta_1 Educ_i + u_i$

    For a male, the regression model is:

    $Wage_i = \beta_0 + \beta_1 Educ_i + u_i$

    $\beta_0$ is the intercept for male workers.

    For a female, the regression model is:

    $Wage_i = (\beta_0 + \delta_0) + \beta_1 Educ_i + u_i$

    $\beta_0 + \delta_0$ is the intercept for female workers.

    $\delta_0$ is the shift in the intercept.
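    The intercept shift is easy to see in Stata. A sketch, again using the bundled auto data as a stand-in (foreign for Female, weight for Educ; our substitution):

    * dummy + continuous regressor: one slope, two intercepts
    sysuse auto, clear
    regress price foreign weight, robust
    * _b[_cons] is the intercept for the foreign==0 group;
    * _b[_cons] + _b[foreign] is the intercept for foreign==1;
    * both groups share the slope _b[weight]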



    Example: Wage Discrimination

    Consider a regression model:

    $Wage_i = \beta_0 + \delta_0 Female_i + \beta_1 Educ_i + \beta_2 Exper_i + u_i$

    The estimated equation is:

    $\widehat{Wage}_i = -1.57 - 1.81\,Female_i + 0.572\,Educ_i + 0.025\,Exper_i$
              (0.72)  (0.26)           (0.049)         (0.012)

    $\hat\delta_0 = -1.81$: holding education and experience fixed, women earn 1.81 less than men on average.

    In a simple regression (no controls):

    $\widehat{Wage}_i = 7.10 - 2.51\,Female_i$
              (0.21)  (0.30)


    Using Dummy Variables for Multiple Categories

    4 groups: married men (MM), married women (MF), single men (SM), and single women (SF)

    Regression model (estimated):

    $\widehat{Log(Wage)}_i = 0.321 + 0.213\,MM_i - 0.198\,MF_i - 0.110\,SF_i + 0.079\,Educ_i + 0.027\,Exper_i$
                  (0.100)  (0.055)      (0.058)      (0.056)      (0.007)       (0.005)

    Which one is the base category?
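    In Stata, the group dummies can be built from two underlying dummies. A sketch with hypothetical variables lwage, female, married, educ, and exper (names assumed, not from the lecture's dataset):

    * build three group dummies; single men (SM) are the omitted base category
    gen MM = (1 - female)*married    // married men
    gen MF = female*married          // married women
    gen SF = female*(1 - married)    // single women
    regress lwage MM MF SF educ exper, robust
    * each coefficient is that group's log-wage gap relative to single men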


    Example: Effects of Physical Attractiveness on Wage

    3 groups: below average (BA), above average (AA), and average (A)

    Regression model for men (estimated):

    $\widehat{Log(Wage)}_i = 0.321 - 0.164\,BA_i + 0.016\,AA_i + \text{other factors}$
                  (0.100)  (0.046)      (0.033)

    Which one is the base category?

    Regression model for women (estimated):

    $\widehat{Log(Wage)}_i = 0.200 - 0.124\,BA_i + 0.035\,AA_i + \text{other factors}$
                  (0.100)  (0.066)      (0.049)


    Outline

    Last Time:
    - What is a dummy variable?
    - How do we interpret coefficients in a regression with dummy variable(s)?
    - Can we show the coefficient on a dummy variable on a graph?

    Today: Interaction terms and heteroskedasticity
    - Why do we need interaction terms? 3 types of interaction terms.
    - What are the consequences of, and the solution for, heteroskedasticity?


Interaction Terms Involving Dummy Variables

    Consider a regression model (estimated):

    $\widehat{Log(Wage)}_i = 0.321 - 0.110\,Female_i + 0.213\,Married_i - 0.301\,(Female_i \times Married_i) + \ldots$
                  (0.100)  (0.056)           (0.055)           (0.072)

    Check against the four-category regression above: single men earn $0.321$; married men $0.321 + 0.213$; single women $0.321 - 0.110$; married women $0.321 - 0.110 + 0.213 - 0.301$, which reproduces the MM, SF, and MF effects exactly.
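    A sketch of the same specification in Stata, with the same hypothetical variable names as above:

    * interaction of two dummies
    gen femmar = female*married
    regress lwage female married femmar, robust
    * in Stata 11 or newer, factor-variable notation does the same thing:
    * regress lwage i.female##i.married, robust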


Interactions Between Independent Variables: Test Score Example

    Perhaps the effect of class size reduction is bigger in districts where many students are still learning English; i.e., smaller classes help more if there are many English learners, who need individual attention.

    That is, $\partial TestScore / \partial STR$ might depend on $PctEL$.

    More generally, $\partial Y / \partial X_1$ might depend on $X_2$.

    How do we model such interactions between $X_1$ and $X_2$? We first consider binary $X$'s, then continuous $X$'s.


(a) Interactions Between 2 Binary Variables

    $Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + u_i$

    $D_{1i}$, $D_{2i}$ are binary. $\beta_1$ is the effect of changing $D_1 = 0$ to $D_1 = 1$; in this specification, this effect doesn't depend on the value of $D_2$.

    To allow the effect of changing $D_1$ to depend on $D_2$, include the interaction term $D_{1i} \times D_{2i}$ as a regressor:

    $Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 (D_{1i} \times D_{2i}) + u_i$


Interpreting the Coefficients

    $Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 (D_{1i} \times D_{2i}) + u_i$

    General rule: compare the various cases:

    $E(Y_i \mid D_{1i} = 0) = \beta_0 + \beta_2 D_2$   (1)
    $E(Y_i \mid D_{1i} = 1) = \beta_0 + \beta_1 + \beta_2 D_2 + \beta_3 D_2$   (2)

    Subtract (1) from (2):

    $E(Y_i \mid D_{1i} = 1) - E(Y_i \mid D_{1i} = 0) = \beta_1 + \beta_3 D_2$

    The effect of a change in $D_1$ depends on $D_2$ (what we wanted). $\beta_3$ is the difference in the effect of a change in $D_1$ on $Y$ between those who have $D_2 = 1$ and those who have $D_2 = 0$.


Example: ln(wage) vs. gender and completion of a college degree

    $Y_i = \beta_0 + \beta_F D_{Fi} + \beta_C D_{Ci} + u_i$

    $\beta_F$ is the effect of being female on wages; $\beta_C$ is the effect of a college education on wages. This regression does not allow the effect of obtaining a college degree to depend on gender.

    $Y_i = \beta_0 + \beta_F D_{Fi} + \beta_C D_{Ci} + \beta_{FC} (D_{Fi} \times D_{Ci}) + u_i$

    If $\beta_{FC}$ is statistically different from zero, then the effect of education on earnings is gender specific. $\beta_{FC}$ shows by how much the wage differential between those with a college degree and those without is larger for females relative to males.


Example: TestScore, STR, English learners

    Let $HiSTR = 1$ if $STR \ge 20$, $= 0$ if $STR < 20$, and $HiEL = 1$ if $PctEL \ge 10$, $= 0$ if $PctEL < 10$.

    $\widehat{TestScore} = 664.1 - 18.2\,HiEL - 1.9\,HiSTR - 3.5\,(HiSTR \times HiEL)$
                 (1.4)   (2.3)        (1.9)        (3.1)

    Effect of $HiSTR$ when $HiEL = 0$ is $-1.9$. Effect of $HiSTR$ when $HiEL = 1$ is $-1.9 - 3.5 = -5.4$. Class size reduction is estimated to have a bigger effect when the percent of English learners is large.

    This interaction isn't statistically significant: $t = -3.5/3.1 = -1.13$.
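    A sketch of how these regressors could be built in Stata, assuming the Stock-Watson California school data with variables testscr, str, and el_pct (names assumed):

    * construct the two threshold dummies and their interaction
    gen HiSTR = (str >= 20)
    gen HiEL  = (el_pct >= 10)
    gen HiSTRxHiEL = HiSTR*HiEL
    regress testscr HiEL HiSTR HiSTRxHiEL, robust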


(b) Interaction Between Continuous and Binary Variables

    $Y_i = \beta_0 + \beta_1 D_i + \beta_2 X_i + u_i$

    $D_i$ is binary, $X_i$ is continuous. The effect of $X$ on $Y$ (holding $D$ constant) is $\beta_2$, which does not depend on $D$.

    To allow the effect of $X$ to depend on $D$, include the interaction term $D_i \times X_i$ as a regressor:

    $Y_i = \beta_0 + \beta_1 D_i + \beta_2 X_i + \beta_3 (D_i \times X_i) + u_i$


(b) Interaction Between Continuous and Binary Variables: 2 Regression Lines

    $Y_i = \beta_0 + \beta_1 D_i + \beta_2 X_i + \beta_3 (D_i \times X_i) + u_i$

    For observations with $D_i = 0$ (the $D = 0$ group):

    $Y_i = \beta_0 + \beta_2 X_i + u_i$   (the $D = 0$ regression line)

    For observations with $D_i = 1$ (the $D = 1$ group):

    $Y_i = \beta_0 + \beta_1 + \beta_2 X_i + \beta_3 X_i + u_i = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) X_i + u_i$   (the $D = 1$ regression line)

    The two regression lines have both different intercepts and different slopes.
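    After estimating the interacted model in Stata, the parameters of both lines can be recovered with lincom. A sketch with hypothetical variables y, d, and x (names assumed):

    gen dx = d*x
    regress y d x dx, robust
    lincom _cons + d    // intercept of the D==1 line: b0 + b1
    lincom x + dx       // slope of the D==1 line: b2 + b3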


Interaction Between Continuous and Binary Variables: 2 Regression Lines


Interpreting the Coefficients

    $Y_i = \beta_0 + \beta_1 D_i + \beta_2 X_i + \beta_3 (D_i \times X_i) + u_i$

    General rule: compare the various cases:

    $Y = \beta_0 + \beta_1 D + \beta_2 X + \beta_3 (D \times X)$   (1)

    Now change $X$:

    $Y + \Delta Y = \beta_0 + \beta_1 D + \beta_2 (X + \Delta X) + \beta_3 [D \times (X + \Delta X)]$   (2)

    Subtract (1) from (2):

    $\Delta Y = \beta_2 \Delta X + \beta_3 D \Delta X$, so $\dfrac{\Delta Y}{\Delta X} = \beta_2 + \beta_3 D$

    The effect of $X$ depends on $D$ (what we wanted). $\beta_3$ is the increment to the effect of $X$ when $D = 1$.


Example: TestScore, STR, HiEL ($= 1$ if $PctEL \ge 10$)

    $\widehat{TestScore} = 682.2 - 0.97\,STR + 5.6\,HiEL - 1.28\,(STR \times HiEL)$
                 (11.9)  (0.59)       (19.5)       (0.97)

    When $HiEL = 0$:  $\widehat{TestScore} = 682.2 - 0.97\,STR$

    When $HiEL = 1$:  $\widehat{TestScore} = 682.2 - 0.97\,STR + 5.6 - 1.28\,STR = 687.8 - 2.25\,STR$

    Two regression lines: one for each HiEL group. Class size reduction is estimated to have a larger effect when the percent of English learners is large.


Example: Testing Hypotheses

    $\widehat{TestScore} = 682.2 - 0.97\,STR + 5.6\,HiEL - 1.28\,(STR \times HiEL)$
                 (11.9)  (0.59)       (19.5)       (0.97)

    The two regression lines have the same slope if and only if the coefficient on $STR \times HiEL$ is zero: $t = -1.28/0.97 = -1.32$.

    The two regression lines have the same intercept if and only if the coefficient on $HiEL$ is zero: $t = 5.6/19.5 = 0.29$.

    The two regression lines are the same if and only if the population coefficient on $HiEL$ = 0 and the population coefficient on $STR \times HiEL$ = 0: $F = 89.94$ (p-value < .001)!

    We reject the joint hypothesis but neither individual hypothesis (how can this be?). The regressors are highly correlated, which produces large standard errors on the individual coefficients.
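    The joint hypothesis is exactly what Stata's test command handles after regress. A sketch, continuing with the variable names built above:

    gen STRxHiEL = str*HiEL
    regress testscr str HiEL STRxHiEL, robust
    test HiEL STRxHiEL    // joint F-test that both coefficients are zero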


(c) Interaction Between 2 Continuous Variables

    $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$

    $X_1$, $X_2$ are continuous. As specified, the effect of $X_1$ doesn't depend on $X_2$, and the effect of $X_2$ doesn't depend on $X_1$.

    To allow the effect of $X_1$ to depend on $X_2$, include the interaction term $X_{1i} \times X_{2i}$ as a regressor:

    $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + u_i$


Interpreting the Coefficients

    $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + u_i$

    General rule: compare the various cases:

    $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2)$   (1)

    Now change $X_1$:

    $Y + \Delta Y = \beta_0 + \beta_1 (X_1 + \Delta X_1) + \beta_2 X_2 + \beta_3 [(X_1 + \Delta X_1) \times X_2]$   (2)

    Subtract (1) from (2):

    $\Delta Y = \beta_1 \Delta X_1 + \beta_3 X_2 \Delta X_1$, or $\dfrac{\Delta Y}{\Delta X_1} = \beta_1 + \beta_3 X_2$

    The effect of $X_1$ depends on $X_2$ (what we wanted). $\beta_3$ is the increment to the effect of $X_1$ from a unit change in $X_2$.


Example: TestScore, STR, PctEL

    $\widehat{TestScore} = 686.3 - 1.12\,STR - 0.67\,PctEL + 0.0012\,(STR \times PctEL)$
                 (11.8)  (0.59)      (0.37)         (0.019)

    The estimated effect of class size reduction is nonlinear because the size of the effect itself depends on $PctEL$:

    $\dfrac{\partial \widehat{TestScore}}{\partial STR} = -1.12 + 0.0012\,PctEL$

    PctEL     ∂TestScore/∂STR
    0         -1.12
    20        -1.12 + 0.0012×20 = -1.10
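    Evaluating the marginal effect at chosen values of PctEL can be done directly from the stored coefficients after estimation. A sketch with the assumed variable names str and el_pct as before:

    gen STRxPctEL = str*el_pct
    regress testscr str el_pct STRxPctEL, robust
    display _b[str] + _b[STRxPctEL]*0     // effect of STR when PctEL = 0
    display _b[str] + _b[STRxPctEL]*20    // effect of STR when PctEL = 20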


Example: Hypothesis Tests

    $\widehat{TestScore} = 686.3 - 1.12\,STR - 0.67\,PctEL + 0.0012\,(STR \times PctEL)$
                 (11.8)  (0.59)      (0.37)         (0.019)

    Does the population coefficient on $STR \times PctEL$ equal 0? $t = 0.0012/0.019 = 0.06$, so we can't reject the null at the 5% level.

    Does the population coefficient on $STR$ equal 0? $t = -1.12/0.59 = -1.90$, so we can't reject the null at the 5% level.

    Do the coefficients on both $STR$ and $STR \times PctEL$ equal 0? $F = 3.89$ (p-value = .021), so we reject the null at the 5% level(!). Why? High but imperfect multicollinearity.



Heteroskedasticity and Homoskedasticity

    What are they? What are the consequences of homoskedasticity? What is the implication for computing standard errors?

    Homoskedasticity: if $\mathrm{var}(u \mid X_i)$ is constant (that is, if the variance of the conditional distribution of $u$ given $X$ does not depend on $X$), then $u$ is said to be homoskedastic. Otherwise, $u$ is heteroskedastic.


Example: Earnings of male and female college graduates

    $Earnings_i = \beta_0 + \beta_1 Male_i + u_i$

    Homoskedasticity: $\mathrm{var}(u_i)$ does not depend on $Male_i$.

    For women:  $Earnings_i = \beta_0 + u_i$

    For men:  $Earnings_i = (\beta_0 + \beta_1) + u_i$

    Homoskedasticity means the variance of earnings is the same for men and for women:

    Equal group variances = homoskedasticity
    Unequal group variances = heteroskedasticity
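    One informal way to check this in Stata is to compare the residual spread across the two groups. A sketch with hypothetical variables earnings and male (names assumed):

    regress earnings male
    predict uhat, residuals
    tabstat uhat, by(male) stat(sd n)   // compare residual SDs across groups
    sdtest uhat, by(male)               // formal test of equal variances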


Homoskedasticity in a picture:

    $E(u \mid X = x) = 0$ ($u$ satisfies Least Squares Assumption #1). The variance of $u$ does not depend on $x$: $u$ is homoskedastic.


Heteroskedasticity in a picture:

    $E(u \mid X = x) = 0$ ($u$ satisfies Least Squares Assumption #1). The variance of $u$ does depend on $x$: $u$ is heteroskedastic.


A real-data example from labor economics: average hourly earnings vs. years of education (data source: Current Population Survey)

    Heteroskedastic or homoskedastic?


So far we have (without saying so) assumed that u might be heteroskedastic.

    Heteroskedasticity and homoskedasticity concern $\mathrm{var}(u \mid X = x)$. Because we have not explicitly assumed homoskedastic errors, we have implicitly allowed for heteroskedasticity.

    The OLS estimators remain unbiased, consistent, and asymptotically normal even when the errors are heteroskedastic.


What if the errors are in fact homoskedastic?

    If Assumptions 1-4 hold and the errors are homoskedastic, the OLS estimators are efficient (have the lowest variance) among all linear estimators (the Gauss-Markov theorem).

    The formula for the variance of $\hat\beta_1$ and the OLS standard error simplifies: if $\mathrm{var}(u_i \mid X_i = x) = \sigma_u^2$, then

    $\mathrm{var}(\hat\beta_1) = \dfrac{\sigma_u^2}{\sum_i (X_i - \bar X)^2}$   and   $SE(\hat\beta_1) = \sqrt{\widehat{\mathrm{var}}(\hat\beta_1)}$

    Note: $\mathrm{var}(\hat\beta_1)$ is inversely proportional to the variation in $X$: more spread in $X$ means more information about $\hat\beta_1$.


We now have two formulas for standard errors for $\hat\beta_1$:

    Homoskedasticity-only standard errors are valid only if the errors are homoskedastic.

    Heteroskedasticity-robust standard errors are valid whether or not the errors are heteroskedastic.

    The main advantage of the homoskedasticity-only standard errors is that the formula is simpler. But the disadvantage is that the formula is correct only if the errors are homoskedastic.


Practical implications

    The homoskedasticity-only formula for the standard error of $\hat\beta_1$ and the heteroskedasticity-robust formula differ, so in general you get different standard errors using the different formulas.

    Homoskedasticity-only standard errors are the default setting in regression software, and sometimes the only setting (e.g., Excel). To get the general heteroskedasticity-robust standard errors you must override the default.

    If you don't override the default and there is in fact heteroskedasticity, your standard errors will be wrong (and so will the resulting t-statistics and confidence intervals). Typically, homoskedasticity-only SEs are too small.


Consequences of Heteroskedasticity

    Homoskedasticity-only standard errors (what Stata reports by default) are valid only if the errors are homoskedastic.

    Heteroskedasticity-robust standard errors (what Stata reports when we add the robust option) are valid whether or not the errors are heteroskedastic.


Heteroskedasticity-robust standard errors in Stata

    regress testscr str, robust

    Regression with robust standard errors    Number of obs = 420
                                              F(1, 418)     = 19.26
                                              Prob > F      = 0.0000
                                              R-squared     = 0.0512
                                              Root MSE      = 18.581

             |             Robust
     testscr |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
    ---------+----------------------------------------------------------------
         str | -2.279808  .5194892    -4.39   0.000   -3.300945   -1.258671
       _cons |  698.933   10.36436    67.44   0.000    678.5602    719.3057

    If you use the robust option, Stata computes heteroskedasticity-robust standard errors; otherwise, Stata computes homoskedasticity-only standard errors.


The bottom line:

    Heteroskedasticity-robust standard errors are correct whether the errors are heteroskedastic or homoskedastic.

    If the errors are heteroskedastic and you use the homoskedasticity-only standard errors, your standard errors will be wrong.

    So, you should always use heteroskedasticity-robust standard errors.


Evaluating the Results of Regression Analysis

    Testing for Heteroskedasticity:

    1. Visual Evidence: does uhat exhibit any systematic pattern?

    - regress y x1 x2 x3
    - predict uhat, residuals    (Stata will record the residuals from the estimated model in a variable uhat)
    - gen uhatsq = uhat*uhat
    - scatter uhatsq x1
    - scatter uhatsq x2
    - scatter uhatsq x3
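    Stata also has a built-in shortcut for this kind of visual check: after regress, rvfplot graphs the residuals against the fitted values in one command (a sketch, equivalent in spirit to the scatters above):

    regress y x1 x2 x3
    rvfplot, yline(0)    // residuals vs. fitted values; look for fan or funnel shapes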


Evaluating the Results of Regression Analysis

    2. White Test for Heteroskedasticity

    Regress the squared residuals from the OLS regression on the independent variables in the regression, their squares, and their interaction terms, and calculate the R-squared from this regression. Then

    $n \cdot R^2 \sim \chi^2_q$, where q is the number of regressors in the auxiliary regression (excluding the constant).

    regress y x1 x2 x3
    imtest, white

    Stata provides the p-value for H0 of homoskedasticity (a low p-value provides evidence for rejecting the null hypothesis).
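    For intuition, the test can also be run by hand. A sketch for the one-regressor case, with hypothetical variables y and x (names assumed):

    regress y x
    predict u0, residuals
    gen u0sq = u0*u0
    gen xsq = x*x
    quietly regress u0sq x xsq
    * q = 2 here (x and xsq), so compare n*R2 against chi-squared with 2 df
    display "n*R-squared = " e(N)*e(r2) ", p-value = " chi2tail(2, e(N)*e(r2))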


Evaluating the Results of Regression Analysis

    Testing for Normality of the Error Terms

    1. Visual Evidence:

    - regress y x1 x2 x3
    - predict uhat, residuals
    - histogram uhat, normal    (Stata will build the histogram of the residuals and plot it on the same graph as a normal density function)


Evaluating the Results of Regression Analysis

    [Figure: histogram of the residuals with an overlaid normal density; the residuals range from about -1000 to 2000.]


Evaluating the Results of Regression Analysis

    2. Jarque-Bera Test for Normality

    H0: error terms are normal.

    Test statistic:

    $JB = \dfrac{n}{6}\left( \text{skewness}^2 + \dfrac{(\text{kurtosis} - 3)^2}{4} \right)$

    The 5% critical value is 5.99; if JB > 5.99, reject the null of normality.

    In Stata, there is no command to calculate this test statistic directly:

    summarize uhat, detail

    then calculate the JB test statistic manually.
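    The manual calculation can use the results that summarize leaves behind, so nothing has to be retyped. A sketch:

    quietly summarize uhat, detail
    * r(N), r(skewness), and r(kurtosis) are stored by summarize, detail
    scalar JB = (r(N)/6)*(r(skewness)^2 + ((r(kurtosis)-3)^2)/4)
    display "JB = " JB " (5% critical value: 5.99)"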


    summarize uhat, detail

                               Residuals
          Percentiles      Smallest
     1%    -661.5541      -835.7715
     5%    -507.4577      -799.6147
    10%    -416.4949      -723.7985      Obs            935
    25%    -249.1033      -721.2892      Sum of Wgt.    935

    50%     -42.96934                    Mean          1.09e-07
                           Largest       Std. Dev.     372.7708
    75%      197.4006      1544.31
    90%      459.8176      1788.275      Variance      138958.1
    95%      625.2347      2005.186      Skewness      1.210522
    99%      1168.44       2225.102      Kurtosis      6.533908

    JB = (n/6)*(skewness^2 + ((kurtosis-3)^2)/4) = (935/6)*(1.210522^2 + (6.533908-3)^2/4) = 715 > 5.99, so we reject the null of normality.