Regression Continued: Functional Form
LIR 832
Topics for the Evening
1. Qualitative Variables
2. Non-linear Estimation
Functional Form
Not all relations among variables are linear.
Our basic linear model:
y = β0 + β1*X1 + β2*X2 + … + βk*Xk + e
Functional Form
Q: Given that we are using OLS, can we mimic these non-linear forms?
A: We have a small bag of tricks which we can use with OLS.
Functional Form
A first point about functional form: You must have an intercept.
Consider the following case: We estimate a model and test the intercept to determine if it is significantly different from zero. We are not able to reject the null in a hypothesis test and we decide to re-estimate the model without an intercept. What is really going on?
Return to our basic model:
y = β0 + β1*X1 + β2*X2 + … + βk*Xk + e
What are we doing when we remove the intercept?
y = 0 + β1*X1 + β2*X2 + … + βk*Xk + e
Functional Form
/* Regression without an intercept */
Regression Analysis: weekearn versus years ed

The regression equation is
weekearn = 57.3 years ed

47576 cases used, 7582 cases contain missing values

Predictor   Coef     SE Coef  T       P
Noconstant
years ed    57.3005  0.1541   371.96  0.000

S = 534.450
Functional Form
/* Regression with an intercept */
Regression Analysis: weekearn versus years ed

The regression equation is
weekearn = - 485 + 87.5 years ed

47576 cases used, 7582 cases contain missing values

Predictor  Coef     SE Coef  T       P
Constant   -484.57  18.18    -26.65  0.000
years ed   87.492   1.143    76.54   0.000

S = 530.510   R-Sq = 11.0%   R-Sq(adj) = 11.0%
Functional Form
Consequences of forcing through zero:
Unless the intercept is really zero, we are going to bias both the intercept and the slope coefficients.
Remember that we calculate the intercept so that the line passes through the point of means:
b0 = mean(Y) - b1*mean(X1) - … - bk*mean(Xk)
This assures that Σe = 0.
If we impose 0 as the intercept, the line may not pass through the point of means and the sum of the errors may not equal zero.
This biases the coefficients and leads to incorrect estimates of the standard errors of the βs.
Never suppress the intercept, even if your theory suggests that it is not necessary.
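Functional Form
/* What About Those Residuals? */
Descriptive Statistics: RESI1, RESI2

Variable  N      N*    Mean   SE Mean  StDev   Minimum   Q1       Median   Q3      Maximum
RESI1     47576  7582  -8.67  2.45     534.38  -1180.31  -359.12  -122.21  218.59  2311.61
RESI2     47576  7582   0.00  2.43     530.50  -1329.77  -340.32  -107.62  237.69  2494.26

Judging by the standard deviations, RESI1 holds the residuals from the regression without an intercept (mean -8.67) and RESI2 those from the regression with an intercept (mean 0.00).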
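The same pattern can be reproduced on made-up data. The sketch below is only an illustration: the sample is synthetic (it is not the course data set), and the variable names are hypothetical. It fits a simple regression with and without an intercept and compares the mean residual in each case.

```python
import random

random.seed(0)
# Synthetic data (an assumption for illustration, not the course data).
x = [random.uniform(8, 21) for _ in range(500)]
y = [-485 + 87.5 * xi + random.gauss(0, 100) for xi in x]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# OLS with an intercept: the line passes through the point of means.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x
resid_with = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# OLS forced through the origin: slope = sum(x*y) / sum(x^2).
slope_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
resid_without = [yi - slope_origin * xi for xi, yi in zip(x, y)]

mean_resid_with = sum(resid_with) / n
mean_resid_without = sum(resid_without) / n
print(f"with intercept:    slope={slope:.2f}, mean residual={mean_resid_with:.6f}")
print(f"without intercept: slope={slope_origin:.2f}, mean residual={mean_resid_without:.2f}")
```

With the intercept included the residuals average zero up to rounding error; forcing the line through the origin changes the slope and leaves a non-zero mean residual.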
Functional Form
Returning to the issue of non-linearity…
In our basic model:
β = ΔY/ΔX = change in Y for a one-unit change in X
Consider the effect of education on base salary…
Functional Form
Descriptive Statistics: years ed, Exp

Variable  N      N*  Mean    SE Mean  StDev   Minimum  Q1      Median  Q3      Maximum
years ed  55158  0   15.734  0.00941  2.211   1.000    14.000  16.000  18.000  21.000
Exp       55107  51  21.644  0.0496   11.640  0.0000   13.000  22.000  30.000  76.000

Regression Analysis: weekearn versus years ed

The regression equation is
weekearn = - 485 + 87.5 years ed

47576 cases used, 7582 cases contain missing values

Predictor  Coef     SE Coef  T       P
Constant   -484.57  18.18    -26.65  0.000
years ed   87.492   1.143    76.54   0.000

S = 530.510   R-Sq = 11.0%   R-Sq(adj) = 11.0%
Functional Form
Now create a graph in MINITAB:
Work in a new worksheet:
Create values for years of education 0 - 21
Use the calculator to create the predicted weekly earnings.
Use the scatterplot graphing function:
Functional Form
Every year of education increases earnings by $87.49!
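Outside MINITAB, the same predicted values can be generated in a few lines. This is a minimal sketch that simply plugs the reported coefficients into the fitted equation; the function name `predicted_weekearn` is made up for the example.

```python
# Coefficients copied from the MINITAB output for the linear model.
INTERCEPT = -484.57
SLOPE = 87.492

def predicted_weekearn(years_ed):
    """Predicted weekly earnings from the linear model."""
    return INTERCEPT + SLOPE * years_ed

# Years of education 0 through 21, as in the worksheet steps.
predictions = {ed: predicted_weekearn(ed) for ed in range(0, 22)}
for ed, earn in predictions.items():
    print(f"{ed:2d} years -> ${earn:8.2f}")
```

Each step up in education adds the same $87.49, and the straight line predicts negative earnings at the lowest education levels, which is one reason to look at other functional forms.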
Functional Form
Q: How do we estimate non-linear relations?
A: We can use log transforms of variables to measure relations between variables as percentages rather than units.
What is a log? What is a log transform?
Take any number; let's take 10.
Then calculate b such that 10 = 2.71828^b. Then b is the log of 10. In this case b = 2.302585.
You can do this on your calculator, in a spreadsheet, or in MINITAB.
Functional Form
As your text shows:
ln(100) = 4.605, so 100 = 2.71828^4.605
ln(1000) = 6.908, so 1000 = 2.71828^6.908
ln(10,000) = 9.210, so 10,000 = 2.71828^9.210
ln(1,000,000) = 13.816, so 1,000,000 = 2.71828^13.816
We typically do not write 2.71828; rather, we substitute e, the natural base (there are also base 10 logs). So…
10 = e^2.302585
Some nice properties of log functions:
ln(X*Y) = ln(X) + ln(Y)
ln(X^2) = 2*ln(X)
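These values and properties are easy to check numerically. The sketch below uses Python's standard `math` module; the test values 100 and 1000 are arbitrary choices.

```python
import math

b = math.log(10)          # natural log: the b for which e**b equals 10
print(b)                  # the slides report 2.302585
print(math.e ** b)        # recovers 10, up to rounding

x, y = 100.0, 1000.0
# ln(X*Y) = ln(X) + ln(Y)
product_rule = math.isclose(math.log(x * y), math.log(x) + math.log(y))
# ln(X^2) = 2*ln(X)
power_rule = math.isclose(math.log(x ** 2), 2 * math.log(x))
print(product_rule, power_rule)
```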
Functional Form
This property made it possible to manipulate very large numbers very easily and provides the foundation for slide rules and many modern computer calculations.
Consider: 1,212,345*375,282
A real mess to do by hand.
Now consider the following transformation of this problem:
ln(1,212,345*375,282)
= ln(1,212,345) + ln(375,282)
= 14.008067 + 12.83543
= 26.8435
The product is then e^26.8435 = antilog(26.8435) ≈ 4.55 × 10^11 (roughly 455 billion).
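The arithmetic can be verified directly. This short sketch adds the two logs, takes the antilog, and compares the result with the exact integer product.

```python
import math

a, b = 1_212_345, 375_282
log_sum = math.log(a) + math.log(b)   # ln(a*b) = ln(a) + ln(b)
via_logs = math.exp(log_sum)          # antilog of the sum
exact = a * b                         # exact integer product for comparison

print(f"ln(a) + ln(b) = {log_sum:.4f}")
print(f"antilog       = {via_logs:,.0f}")
print(f"exact product = {exact:,}")
```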
Functional Form
The Shell presentation has an equation associated with an upward curve of:
Earnings = 62988*X^0.2676
Or… y = β0*X^β1
We cannot estimate this in its current form using regression, but think about taking the log of each side:
ln(y) = ln(β0*X^β1)
ln(y) = ln(β0) + ln(X^β1)
ln(y) = ln(β0) + β1*ln(X)
So, if we take the log of each side, we get a linear equation that we can estimate!
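To see the linearization at work, here is a small sketch. It assumes noise-free points generated from the curve itself (purely synthetic, chosen so the answer is known in advance) and fits a straight line to the logged values by ordinary least squares.

```python
import math

# Synthetic points generated from the curve itself (no noise), for illustration only.
B0, B1 = 62988.0, 0.2676
xs = [float(x) for x in range(1, 41)]
ys = [B0 * x ** B1 for x in xs]

# Take logs of both sides, then fit a straight line by least squares.
lx = [math.log(x) for x in xs]
ly = [math.log(y) for y in ys]
n = len(lx)
mx, my = sum(lx) / n, sum(ly) / n
b1_hat = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum(
    (u - mx) ** 2 for u in lx
)
ln_b0_hat = my - b1_hat * mx
b0_hat = math.exp(ln_b0_hat)   # undo the log to recover β0

print(b1_hat, b0_hat)
```

The slope of the log-log line is the exponent β1, and exponentiating the fitted intercept gives back β0.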
Functional Form
Consider the following equation (single log equation):
ln(weekearn) = β0 + β1*YearsEd + e
The interpretation of the coefficient on years of education is now the % change in base salary for a 1 year change in education (100*β1 is the approximate percentage change).
How to do this in MINITAB:
Calculate the log of weekly earnings
Estimate the regression as…
Functional Form
Regression Analysis: ln week earn versus years ed

The regression equation is
ln week earn = 4.87 + 0.109 years ed

47576 cases used, 7582 cases contain missing values

Predictor  Coef      SE Coef   T       P
Constant   4.86646   0.02382   204.33  0.000
years ed   0.108980  0.001497  72.78   0.000

S = 0.694967   R-Sq = 10.0%   R-Sq(adj) = 10.0%

Analysis of Variance

Source          DF     SS       MS      F        P
Regression      1      2558.4   2558.4  5297.03  0.000
Residual Error  47574  22977.3  0.5
Total           47575  25535.6
Functional Form
Now we find that an additional year of education results in roughly a 10.9% increase in salary (the coefficient is 0.108980).
The interpretation is different from the linear model.
r^2 is different between the linear and log models:
Linear: r^2 = 11.0%
Log: r^2 = 10.0%
Does this mean the fit of the log model is worse than the linear model?
No. You cannot compare the two because you have transformed the dependent variable, which fundamentally alters its variance.
Functional Form
Descriptive Statistics: weekearn, ln week earn

Variable      N      N*    Mean    SE Mean  StDev   Minimum  Q1      Median  Q3       Maximum
weekearn      47576  7582  894.53  2.58     562.22  0.01     519.00  769.23  1153.00  2884.61
ln week earn  47576  7582  6.5843  0.00336  0.7326  -4.6052  6.2519  6.6454  7.0501   7.967

What Does the Log Model Look Like? How to create a prediction in MINITAB & graph:
Use regression equation to create estimated log wage from years of education data
Exponentiate the predicted value using the MINITAB calculator
Graph predicted wage against years of education
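The three steps above can be sketched outside MINITAB as follows. The coefficients are taken from the log regression output; the function name and the education levels printed are illustrative choices, not part of the original exercise.

```python
import math

# Coefficients copied from the log regression output.
B0, B1 = 4.86646, 0.108980

def predicted_wage(years_ed):
    """Exponentiate the predicted log wage to get a wage in dollars."""
    return math.exp(B0 + B1 * years_ed)

for ed in (8, 12, 16, 20):
    print(f"{ed:2d} years -> ${predicted_wage(ed):7.2f}")

# Exact proportional change per extra year implied by the coefficient.
pct_per_year = (math.exp(B1) - 1) * 100
print(f"{pct_per_year:.2f}% per year")
```

Every extra year multiplies the predicted wage by the same factor, exp(β1), so the curve bends upward. Reading 100*β1 as the percentage change is an approximation that works best for small coefficients.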
Functional Form
What is the equation underlying this model?
weekearn = exp(β0 + β1*YearsEd + e)
Model of growth (such as compound interest)…
Functional Form
Now let's try another approach, taking the log of both sides (double log equation):
The interpretation of the coefficient on JEP is now the % change in base salary for a 1% change in JEP.
Note that this is an elasticity (which you will discuss in 809 in talking about supply and demand: the elasticity of labor demand with respect to the wage is the % change in the demand for labor for a 1% change in the wage).
Functional Form
Regression Analysis: ln week earn versus ln ed

The regression equation is
ln week earn = 2.13 + 1.62 ln ed

47576 cases used, 7582 cases contain missing values

Predictor  Coef     SE Coef  T      P
Constant   2.12844  0.06203  34.32  0.000
ln ed      1.62142  0.02254  71.93  0.000

S = 0.695775   R-Sq = 9.8%   R-Sq(adj) = 9.8%
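The elasticity reading of the double-log coefficient can be checked with a quick calculation. This sketch uses the coefficients from the output above; the starting point of 16 years of education is an arbitrary choice for illustration.

```python
import math

# Coefficients copied from the double-log regression output.
B0, B1 = 2.12844, 1.62142

def predicted_wage(years_ed):
    """Predicted weekly earnings: exp(B0) * years_ed ** B1."""
    return math.exp(B0 + B1 * math.log(years_ed))

base = predicted_wage(16)
bumped = predicted_wage(16 * 1.01)        # education 1% higher
pct_change = (bumped / base - 1) * 100
print(f"{pct_change:.3f}% change in earnings for a 1% change in education")
```

The result is close to the coefficient itself, and it would be the same at any starting level of education, which is what makes the double-log slope a constant elasticity.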
Functional Form
What is going on graphically? What are we really doing?
Functional Form
Q: How do we choose?
A: Prior work and theory.
Is it sensible to measure as a linear model, or does one of these non-linear forms make better sense?
Example: thinking of the relationship between education and wages:
wage = β0 + β1*Years_of_Education
ln(wage) = β0 + β1*Years_of_Education
ln(wage) = β0 + β1*ln(Years_of_Education)
Functional Form
What does prior work indicate?
We typically use a log wage equation rather than a wage equation because:
It turns out the error term is approximately normally distributed in a log wage equation.
It is more readily compared across models, as it is not dependent on the scaling of the variable.
Comparing the effect of education in percentage terms frees us from the effect of inflation and alternative currencies.
Functional Form
A more general non-linear form (The Polynomial Form)
Problem: Do we really believe that you get an additional 0.723% in weekly earnings for each year you get older? Hardly makes it worth getting older.
Functional Form
Regression Analysis: ln(wkern) versus age, gender, edattain

The regression equation is
ln(wkern) = 2.41 + 0.00723 age - 0.368 gender + 0.105 edattain

47576 cases used, 7582 cases contain missing values

Predictor  Coef       SE Coef    T       P
Constant   2.41075    0.06470    37.26   0.000
age        0.0072344  0.0002669  27.11   0.000
gender     -0.368278  0.006115   -60.22  0.000
edattain   0.105032   0.001491   70.45   0.000

S = 0.6626   R-Sq = 18.2%   R-Sq(adj) = 18.2%

This model remains linear in ln(weekly earnings): each additional year of age raises earnings by about 0.7%.
Functional Form
It would be more reasonable to believe we will get a relationship which looks like: Why?
Functional Form
How do we mimic this? Consider estimating the following linear regression:
ln(wkern) = β0 + β1*age + β2*age^2 + β3*gender + β4*edattain + e
Notice that age enters twice, first as a linear term and then as a square. What does this model look like with real data?
Functional Form
Regression Analysis: ln(wkern) versus age, age2, gender, edattain

The regression equation is
ln(wkern) = 0.927 + 0.104 age - 0.00113 age2 - 0.376 gender + 0.0948 edattain

47576 cases used, 7582 cases contain missing values

Predictor  Coef         SE Coef     T       P
Constant   0.92706      0.06640     13.96   0.000
age        0.103919     0.001547    67.17   0.000
age2       -0.00112565  0.00001776  -63.37  0.000
gender     -0.376012    0.005874    -64.01  0.000
edattain   0.094822     0.001441    65.82   0.000

S = 0.6363   R-Sq = 24.6%   R-Sq(adj) = 24.6%
Functional Form
Note that we now have two coefficients on Age:
Age   0.103919
Age2  -0.00112565
We know that the first term indicates that for each additional year our weekly earnings rise by 10.39%. But how do we chart out the second term, so that we have the full effect of age on earnings?
Functional Form
The effect of an additional year on earnings (formula for a polynomial model):
If our model is: y = β0 + β1*X + β2*X^2 + …
Then ΔY/ΔX = β1 + 2*β2*X
First issue: look at the prediction of ln weekly earnings based on age (leave all other variables at their mean).
Functional Form
What about the 'marginal effect' of age?
What is the effect on income of getting an additional year older?
Obviously it varies with how old you are. Things are pretty good when you are young.
Two ways of obtaining this:
1. Calculate the difference in the total effect of age for any two years.
Age 22  1.741
Age 21  1.686
Diff    0.055 or +5.5%
Functional Form
2. Alternatively, use the polynomial formula:
ΔY/ΔX = β1 + 2*β2*Age
Functional Form
What is the increase in earnings at age 21?
0.103919 - 0.0022513*21 = 0.056642
What about age 25?
0.103919 - 0.0022513*25 = 0.0476365
What about age 50? (Class work)
Note that the effect of an additional year of age is no longer constant; it depends on how old you are.
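Both ways of getting the marginal effect can be written as small functions. This is a sketch built only from the two age coefficients in the polynomial regression output; the function names are invented for the example.

```python
# Coefficients copied from the polynomial regression output.
B_AGE, B_AGE2 = 0.103919, -0.00112565

def total_age_effect(age):
    """Contribution of age to predicted ln(weekly earnings)."""
    return B_AGE * age + B_AGE2 * age ** 2

def marginal_age_effect(age):
    """Slope of the age profile: β1 + 2*β2*age."""
    return B_AGE + 2 * B_AGE2 * age

# Method 2: the polynomial formula at selected ages.
for age in (21, 25, 50):
    print(age, round(marginal_age_effect(age), 6))

# Method 1: difference in the total effect between two adjacent years.
diff_22_21 = total_age_effect(22) - total_age_effect(21)
print(round(diff_22_21, 4))

# Age at which the slope is zero and the profile turns down.
turning_point = -B_AGE / (2 * B_AGE2)
print(round(turning_point, 1))
```

The two methods agree closely for adjacent years, and the turning point shows where an extra year of age stops adding to predicted earnings.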
Functional Form
The gains to aging are greatest when you are youngest:
They decline steadily as you age.
By age fifty your earnings are falling as you get older (oops!).
A couple of points about polynomial and functional forms:
Polynomial forms have the strength of letting the data tell you if the relationship is linear or not. If it is, the coefficient on X^2 will be 0 or very close to it.
You cannot compare r^2 across log and non-log forms because the transformation changes the dependent variable and the sum of squares. You can compare r^2 between linear and polynomial forms of the same dependent variable.
Recap on Functional Form
Not all relationships are linear
Regression allows us to estimate non-linear models and to let the data tell us whether we should be using a non-linear form
Single and double log transforms
Polynomial form
MultiCollinearity
Issue: What happens when two variables contain the same, or almost the same, information?
This condition is called multicollinearity.
Perfect MultiCollinearity Is Not a Problem
Try putting both a Male and Female dummy variable in a wage equation
Base Regression: Earnings=F(age, Education)
Regression Analysis: weekearn versus years ed, age

The regression equation is
weekearn = - 707 + 83.5 years ed + 6.87 age

Predictor  Coef     SE Coef  T       P
Constant   -706.63  19.24    -36.73  0.000
years ed   83.463   1.137    73.38   0.000
age        6.8717   0.2118   32.45   0.000

S = 524.739   R-Sq = 12.9%   R-Sq(adj) = 12.9%
Now Put Male & Female Into Model
Regression Analysis: weekearn versus years ed, age, Male, Female

* Female is highly correlated with other X variables
* Female has been removed from the equation.

The regression equation is
weekearn = - 720 + 76.4 years ed + 6.29 age + 319 Male

Predictor  Coef     SE Coef  T       P
Constant   -720.28  18.35    -39.25  0.000
years ed   76.432   1.089    70.16   0.000
age        6.2874   0.2021   31.11   0.000
Male       318.522  4.625    68.87   0.000

S = 500.391   R-Sq = 20.8%   R-Sq(adj) = 20.8%
Male & Female Contain the Same Information
Correlations: Male, Female

Pearson correlation of Male and Female = -1.000
P-Value = *
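Why the software drops one of the two dummies can be seen on a toy sample. The sketch below uses hypothetical 0/1 indicators (not the course data) and a hand-written correlation helper; it shows that Male + Female reproduces the constant column and that the two are perfectly negatively correlated.

```python
import random

random.seed(1)
# Hypothetical sample of 0/1 indicators (not the course data).
male = [random.randint(0, 1) for _ in range(200)]
female = [1 - m for m in male]

# Male + Female reproduces the constant column exactly.
sums_to_constant = all(m + f == 1 for m, f in zip(male, female))

def pearson(u, v):
    """Pearson correlation computed from deviations about the means."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

print(sums_to_constant, round(pearson(male, female), 3))
```

Because one column is an exact linear combination of the constant and the other dummy, the two coefficients cannot be estimated separately, so one variable has to go.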
What If Several Variables Contain the Same Information
Regression Analysis: weekearn versus age, years ed, Female, NE, MW, S, W

* W is highly correlated with other X variables
* W has been removed from the equation.

The regression equation is
weekearn = - 392 + 6.25 age + 75.9 years ed - 318 Female + 47.7 NE - 18.2 MW - 20.3 S

47576 cases used, 7582 cases contain missing values

Predictor  Coef      SE Coef  T       P
Constant   -392.10   19.21    -20.42  0.000
age        6.2532    0.2019   30.98   0.000
years ed   75.895    1.089    69.67   0.000
Female     -318.406  4.619    -68.93  0.000
NE         47.658    6.768    7.04    0.000
MW         -18.155   6.594    -2.75   0.006
S          -20.323   6.317    -3.22   0.001

S = 499.701   R-Sq = 21.0%   R-Sq(adj) = 21.0%
What Are the Regional Dummies Correlated With?
Descriptive Statistics: NE, MW, S, W

Variable  N      N*  Mean     SE Mean  StDev    Minimum  Q1
NE        55158  0   0.22310  0.00177  0.41633  0.00000  0.00000
MW        55158  0   0.23873  0.00182  0.42631  0.00000  0.00000
S         55158  0   0.29211  0.00194  0.45474  0.00000  0.00000
W         55158  0   0.24606  0.00183  0.43072  0.00000  0.00000

The four means sum to 1.00000: every observation falls in exactly one region, so NE + MW + S + W equals the constant term.
Imperfect MultiCollinearity
Two or more variables contain similar but not identical information
Log Wage Regression

      Source |       SS         df       MS            Number of obs =  156130
-------------+------------------------------           F( 11,156118) = 4227.42
       Model |  11630.4798      11  1057.31635         Prob > F      =  0.0000
    Residual |  39046.5066  156118  .250108934         R-squared     =  0.2295
-------------+------------------------------           Adj R-squared =  0.2294
       Total |  50676.9864  156129  .324584071         Root MSE      =  .50011
------------------------------------------------------------------------------
     lnwage3 |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0712402   .0005528   128.87   0.000     .0701567    .0723237
        age2 |  -.0007535   6.58e-06  -114.54   0.000    -.0007664   -.0007406
      female |  -.1999096   .0025452   -78.54   0.000    -.2048982   -.1949211
     married |   .0947973   .0028481    33.28   0.000      .089215    .1003796
       black |  -.1314511   .0043814   -30.00   0.000    -.1400385   -.1228637
       other |  -.0063689   .0057833    -1.10   0.271    -.0177041    .0049663
          NE |   .0328108   .0038223     8.58   0.000     .0253191    .0403024
     Midwest |    .007487   .0036482     2.05   0.040     .0003367    .0146373
       South |  -.0204817   .0035696    -5.74   0.000     -.027478   -.0134854
    city1mil |   .1440377   .0026054    55.28   0.000     .1389312    .1491443
      union2 |   .1358151   .0037783    35.95   0.000     .1284097    .1432205
       _cons |   .9784856   .0107005    91.44   0.000     .9575129     .999458
------------------------------------------------------------------------------
Switch CBC for Union

      Source |       SS         df       MS            Number of obs =  156130
-------------+------------------------------           F( 11,156118) = 4242.43
       Model |  11662.2696      11  1060.20633         Prob > F      =  0.0000
    Residual |  39014.7168  156118  .249905307         R-squared     =  0.2301
-------------+------------------------------           Adj R-squared =  0.2301
       Total |  50676.9864  156129  .324584071         Root MSE      =  .49991
------------------------------------------------------------------------------
     lnwage3 |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0710808   .0005528   128.59   0.000     .0699974    .0721642
        age2 |   -.000752   6.58e-06  -114.34   0.000    -.0007649   -.0007391
      female |  -.2003086   .0025431   -78.77   0.000     -.205293   -.1953242
     married |   .0946468    .002847    33.24   0.000     .0890668    .1002269
       black |  -.1321203   .0043799   -30.17   0.000    -.1407048   -.1235358
       other |  -.0061873    .005781    -1.07   0.284    -.0175179    .0051434
          NE |    .033546   .0038197     8.78   0.000     .0260595    .0410324
     Midwest |   .0079032   .0036465     2.17   0.030      .000756    .0150503
       South |  -.0200437    .003568    -5.62   0.000    -.0270369   -.0130504
    city1mil |   .1442921   .0026043    55.41   0.000     .1391878    .1493965
        cbc2 |   .1363582   .0036181    37.69   0.000     .1292668    .1434495
       _cons |   .9799436   .0106968    91.61   0.000     .9589782    1.000909
------------------------------------------------------------------------------
Use Union & CBC

      Source |       SS         df       MS            Number of obs =  156130
-------------+------------------------------           F( 12,156117) = 3889.14
       Model |  11662.8996      12  971.908303         Prob > F      =  0.0000
    Residual |  39014.0867  156117  .249902872         R-squared     =  0.2301
-------------+------------------------------           Adj R-squared =  0.2301
       Total |  50676.9864  156129  .324584071         Root MSE      =   .4999
------------------------------------------------------------------------------
     lnwage3 |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0710741   .0005528   128.58   0.000     .0699907    .0721575
        age2 |  -.0007519   6.58e-06  -114.32   0.000    -.0007648    -.000739
      female |  -.2001837   .0025443   -78.68   0.000    -.2051704   -.1951969
     married |   .0946413    .002847    33.24   0.000     .0890612    .1002213
       black |  -.1321795     .00438   -30.18   0.000    -.1407643   -.1235947
       other |  -.0061938    .005781    -1.07   0.284    -.0175244    .0051367
          NE |   .0333811   .0038211     8.74   0.000     .0258919    .0408703
     Midwest |   .0078341   .0036468     2.15   0.032     .0006864    .0149817
       South |  -.0199589   .0035684    -5.59   0.000    -.0269529   -.0129649
    city1mil |   .1442482   .0026044    55.39   0.000     .1391436    .1493528
      union2 |   .0175444   .0110493     1.59   0.112    -.0041121    .0392008
        cbc2 |   .1205632   .0105851    11.39   0.000     .0998166    .1413098
       _cons |   .9800641    .010697    91.62   0.000     .9590982     1.00103
------------------------------------------------------------------------------
Consequences of MultiCollinearity
Estimates remain unbiased
Variances and standard errors increase
Computed t-scores fall
Estimates will be very sensitive to specification
Overall fit of the model (r-square) will be unaffected
Predictions are also unaffected
What Is the Issue
Where there is MultiCollinearity, we need to be careful about interpreting results
Results can be misleading about the effect of variables
Detecting Collinearity
High correlation between variables
Issue: multiple variables can be collectively collinear (region example)
Variance Inflation Factor
Regress each explanatory variable on all other explanatory variables
Calculate
VIF_i = 1 / (1 - R_i^2)
How Do We Calculate the VIF?
Regression Analysis: age versus years ed, Female, NE, MW, S, W

* W is highly correlated with other X variables
* W has been removed from the equation.

The regression equation is
age = 35.8 + 0.480 years ed - 1.59 Female + 0.098 NE - 0.617 MW - 0.204 S

Predictor  Coef      SE Coef  T       P
Constant   35.7977   0.3712   96.43   0.000
years ed   0.47978   0.02241  21.41   0.000
Female     -1.59360  0.09896  -16.10  0.000
NE         0.0979    0.1443   0.68    0.498
MW         -0.6174   0.1416   -4.36   0.000
S          -0.2044   0.1349   -1.52   0.130

S = 11.5764   R-Sq = 1.5%   R-Sq(adj) = 1.5%
It’s a Different Story with Regional Variables
Regression Analysis: NE versus age, years ed, Female, MW, S, W

The regression equation is
NE = 1.00 + 0.000000 age + 0.000000 years ed + 0.000000 Female - 1.00 MW - 1.00 S - 1.00 W

Predictor  Coef        SE Coef     T  P
Constant   1.00000     0.00000     *  *
age        0.00000000  0.00000000  *  *
years ed   0.00000000  0.00000000  *  *
Female     0.00000000  0.00000000  *  *
MW         -1.00000    0.00000     *  *
S          -1.00000    0.00000     *  *
W          -1.00000    0.00000     *  *

S = 0   R-Sq = 100.0%   R-Sq(adj) = 100.0%
CBC Has A High VIF

. reg cbc2 age age2 female married black other NE Midwest South city1mil union2

      Source |       SS         df       MS            Number of obs =  161792
-------------+------------------------------           F( 11,161780) =       .
       Model |  18165.9762      11  1651.45238         Prob > F      =  0.0000
    Residual |  2301.31742  161780  .014224981         R-squared     =  0.8876
-------------+------------------------------           Adj R-squared =  0.8876
       Total |  20467.2936  161791  .126504525         Root MSE      =  .11927
------------------------------------------------------------------------------
        cbc2 |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0013903   .0001288    10.80   0.000     .0011379    .0016426
        age2 |  -.0000133   1.53e-06    -8.72   0.000    -.0000163   -.0000103
      female |   .0025409   .0005963     4.26   0.000     .0013722    .0037096
     married |   .0013089   .0006676     1.96   0.050     4.52e-07    .0026174
       black |   .0063441    .001032     6.15   0.000     .0043214    .0083668
       other |  -.0016395   .0013597    -1.21   0.228    -.0043046    .0010255
          NE |  -.0043777    .000895    -4.89   0.000    -.0061319   -.0026234
     Midwest |  -.0027157   .0008563    -3.17   0.002    -.0043941   -.0010374
       South |  -.0041338   .0008356    -4.95   0.000    -.0057716   -.0024961
    city1mil |  -.0018596   .0006102    -3.05   0.002    -.0030555   -.0006636
      union2 |   .9811512   .0008888  1103.92   0.000     .9794092    .9828932
       _cons |   -.013585   .0025048    -5.42   0.000    -.0184943   -.0086757
------------------------------------------------------------------------------
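Turning the auxiliary R-squared values into variance inflation factors is a one-line calculation. The sketch below applies VIF = 1 / (1 - R^2) to the two R-squared values reported in the auxiliary regressions for age and cbc2; the helper name `vif` is an illustrative choice.

```python
def vif(r_squared):
    """Variance inflation factor from the auxiliary regression's R-squared."""
    if not 0 <= r_squared < 1:
        raise ValueError("R-squared must be in [0, 1); R-squared = 1 is perfect collinearity")
    return 1.0 / (1.0 - r_squared)

# R-squared values copied from the auxiliary regressions.
vif_age = vif(0.015)     # age regressed on the other X variables
vif_cbc = vif(0.8876)    # cbc2 regressed on the other X variables
print(f"age:  VIF = {vif_age:.3f}")
print(f"cbc2: VIF = {vif_cbc:.2f}")
```

A VIF near 1 means the variable shares almost no information with the others, while the much larger value for cbc2 reflects how closely it tracks union2. The regional regression with R-Sq = 100% has no finite VIF at all.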
What To Do About MultiCollinearity
Do Nothing
Get More Data
We had 156,000 observations for the wage regressions
Drop the Redundant Variable
Care needed in interpretation
Compare Specification Issues

                     Omitted                            Extraneous                 MultiCollinearity
Added Variable
  Coefficient        Right signed & large in magnitude  Coefficient close to zero  Right or wrong signed
  Significance       Highly significant                 Non-significant            Weak or n.s.
Other Coef
  Sign               Change sign                        Little change              Possibly change sign
  Significance       Remains significant                Little change              Becomes weak or n.s.
R-square             Increase a lot                     Little change              Little change
New Sample           Little difference                  Little difference          Unstable estimates