Week 2, 2007 Lecture 2a Slide #1
Bivariate Regression Analysis
• Theoretical Models
• Basic Linear Models: Deterministic Version
• Basic Linear Models: Stochastic Version
• Statistical Assumptions
• Estimating Linear Models
• Residuals (and the Pursuit of Truth…)
• An Example
Slide #2
Theoretical Linear Models
• The basis of “causality” in models
  – Time ordering
  – Co-variation
  – Non-spuriousness
• Examples
  – Fire Deaths = f(# of fire trucks at the scene)
  – Job Retention = f(current job satisfaction)
  – Income = f(education)
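The fire-truck example is one convenient way to see why non-spuriousness is on the list. The sketch below is a hypothetical simulation with invented numbers: an assumed confounder, fire size, raises both the number of trucks sent and the number of deaths, while by construction trucks have no effect on deaths. The two still co-vary strongly. Comparing only fires of nearly the same size (a crude way of holding the confounder fixed) makes most of that association disappear.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical confounder: fire size drives BOTH variables.
fire_size = [rng.uniform(0, 10) for _ in range(n)]
trucks = [z + rng.gauss(0, 1) for z in fire_size]  # depends only on fire size
deaths = [z + rng.gauss(0, 1) for z in fire_size]  # depends only on fire size

overall_r = pearson_r(trucks, deaths)

# Crude control for the confounder: keep only fires of nearly the same size.
band = [i for i, z in enumerate(fire_size) if 5.0 <= z < 5.5]
within_r = pearson_r([trucks[i] for i in band], [deaths[i] for i in band])

print(f"overall r = {overall_r:.2f}, within-band r = {within_r:.2f}")
```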
Slide #3
Deterministic Linear Models
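• Theoretical Model: $Y_i = \beta_0 + \beta_1 X_i$
  – $\beta_0$ and $\beta_1$ are constant terms
    • $\beta_0$ is the intercept
    • $\beta_1$ is the slope
  – $X_i$ is a predictor of $Y_i$
[Figure: the line $Y_i = \beta_0 + \beta_1 X_i$, with $X_i$ on the horizontal axis and $Y_i$ on the vertical axis; the slope is annotated as $\beta_1 = a/b$.]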
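A minimal sketch of the deterministic model, using made-up values $\beta_0 = 2$ and $\beta_1 = 0.5$: the intercept is the value of $Y$ at $X = 0$, and every 1-unit step in $X$ changes $Y$ by exactly $\beta_1$, with no scatter around the line.

```python
def deterministic_y(x, beta0, beta1):
    """Deterministic linear model: Y = beta0 + beta1 * X (no error term)."""
    return beta0 + beta1 * x

# Hypothetical parameter values, chosen only for illustration.
beta0, beta1 = 2.0, 0.5

print(deterministic_y(0, beta0, beta1))  # intercept, Y when X = 0 → 2.0
# Slope: the change in Y for a 1-unit increase in X, the same at every X.
print(deterministic_y(7, beta0, beta1) - deterministic_y(6, beta0, beta1))  # → 0.5
```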
Slide #4
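Stochastic Linear Models
• $E[Y_i] = \beta_0 + \beta_1 X_i$
  – $\beta_0 = E[Y]$ when $X = 0$
  – Each 1-unit increase in $X$ increases $E[Y]$ by $\beta_1$
  – Variation in $Y$ is caused by more than $X$: error ($\varepsilon_i$)
• $\varepsilon_i = Y_i - (\beta_0 + \beta_1 X_i) = Y_i - E[Y_i]$
• So: $Y_i = E[Y_i] + \varepsilon_i = \beta_0 + \beta_1 X_i + \varepsilon_i$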
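To make the stochastic version concrete, the sketch below simulates data from $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ using invented parameter values and normal errors (assumptions for illustration only). Because the simulation knows $\beta_0$ and $\beta_1$, it can recover each $\varepsilon_i = Y_i - E[Y_i]$ directly; with real data those parameters are unknown.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical "true" parameters; with real data these are unknown.
beta0, beta1, sigma = 2.0, 0.5, 1.0

xs = [rng.uniform(0, 10) for _ in range(2000)]
expected = [beta0 + beta1 * x for x in xs]          # E[Y_i] = beta0 + beta1 * X_i
ys = [ey + rng.gauss(0, sigma) for ey in expected]  # Y_i = E[Y_i] + eps_i

# Knowing the true parameters, each error is just Y_i - E[Y_i].
errors = [y - ey for y, ey in zip(ys, expected)]

print(f"mean error = {statistics.fmean(errors):.3f}")  # close to 0
print(f"error SD   = {statistics.stdev(errors):.3f}")  # close to sigma
```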
Slide #5
Assumptions Necessary for Estimating Linear Models
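1. Errors have identical distributions
   – Zero mean, same variance, across the range of $X$
2. Errors are independent of $X$ and of the other errors
   – $E[\varepsilon_i] \neq f(X)$ and $E[\varepsilon_i] \neq f(\varepsilon_j),\ j \neq i$ (the expected error does not depend on $X$ or on any other error)
3. Errors are normally distributed
[Figure: errors $\varepsilon_i$ plotted against $X$, centered on $\varepsilon_i = 0$.]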
Slide #6
Normal, Independent & Identical $\varepsilon_i$ Distributions (“Normal iid”)
[Figure: $Y$ plotted against $X$.]
Problem: We don’t know:
a) if the error assumptions are true; b) the values of $\beta_0$ and $\beta_1$
Solution: Estimate ’em!
Slide #7
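Estimating Linear Models
$Y_i = \beta_0 + \beta_1 X_i$ is modeled as:
$\hat{Y}_i = b_0 + b_1 X_i$, where $b_0$ and $b_1$ are sample estimates of $\beta_0$ and $\beta_1$
$\hat{Y}_i$ is the predicted value for $Y_i$
So: $Y_i - \hat{Y}_i = e_i$, or: $e_i = Y_i - b_0 - b_1 X_i$
This is the formula for RESIDUALS, which you will come to know and cherish.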
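The slide does not show how $b_0$ and $b_1$ are chosen. Stata's `regress` uses ordinary least squares, which picks the line minimizing the sum of squared residuals; with one predictor that gives $b_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $b_0 = \bar{Y} - b_1 \bar{X}$. A minimal sketch on a made-up four-point data set, computing the estimates and then the residuals $e_i = Y_i - b_0 - b_1 X_i$:

```python
import statistics

def ols_fit(xs, ys):
    """Ordinary least squares estimates (b0, b1) for Y-hat = b0 + b1 * X."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

def residuals(xs, ys, b0, b1):
    """e_i = Y_i - b0 - b1 * X_i for every observation."""
    return [y - b0 - b1 * x for x, y in zip(xs, ys)]

# Tiny invented data set, for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 4]

b0, b1 = ols_fit(xs, ys)
e = residuals(xs, ys, b0, b1)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # b0 = 2.00, b1 = 0.70
print([round(v, 2) for v in e])         # [-0.7, 0.6, 0.9, -0.8]
# With an intercept in the model, OLS residuals sum to (numerically) zero.
print(abs(sum(e)) < 1e-9)               # True
```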
Slide #8
Residuals: Statistical Forensics
• Residuals measure prediction error:
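  – $e_i > 0$ if $Y_i > \hat{Y}_i$ (the observation lies above the fitted line)
  – $e_i < 0$ if $Y_i < \hat{Y}_i$ (the observation lies below the fitted line)
[Figure: the fitted line $\hat{Y}_i = b_0 + b_1 X_i$, with $Y$ on the vertical axis and $X$ on the horizontal axis.]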
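Reading the sign of a residual follows directly from $e_i = Y_i - \hat{Y}_i$. A minimal sketch with invented observed and predicted values (the function name is hypothetical):

```python
def describe_residual(y_observed, y_predicted):
    """Return e_i = Y_i - Y-hat_i and what its sign says about the prediction."""
    e = y_observed - y_predicted
    if e > 0:
        return e, "under-predicted (observed point lies above the line)"
    if e < 0:
        return e, "over-predicted (observed point lies below the line)"
    return e, "on the line"

# Made-up (observed, predicted) pairs, for illustration only.
for y, y_hat in [(10.0, 8.0), (5.0, 6.5), (3.0, 3.0)]:
    e, label = describe_residual(y, y_hat)
    print(f"Y = {y}, Y-hat = {y_hat}: e = {e:+.1f}, {label}")
```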
Slide #9
Stata and Regression: Predicting Incarceration with Average income
• Stata dataset: Guns.dta
  – From “Data for empirical exercises”
  – What are your expectations? Why?
• Stata command:
  – Regression: “regress incarc_rate avginc”
• Output:

          Source |       SS       df       MS              Number of obs =    1173
    -------------+------------------------------           F(  1,  1171) =  316.82
           Model |  7986385.28     1  7986385.28           Prob > F      =  0.0000
        Residual |  29518728.5  1171  25208.1371           R-squared     =  0.2129
    -------------+------------------------------           Adj R-squared =  0.2123
           Total |  37505113.8  1172  32000.9503           Root MSE      =  158.77

    ------------------------------------------------------------------------------
     incarc_rate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          avginc |   32.31456   1.815488    17.80   0.000     28.75258    35.87653
           _cons |   -216.931   25.34477    -8.56   0.000    -266.6572   -167.2047
    ------------------------------------------------------------------------------
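Most of the summary numbers in this output follow from the ANOVA table and the coefficient row by simple arithmetic. The sketch below recomputes them from the printed values. The only outside input is the t critical value for the confidence interval, which is approximated as 1.962 (an assumption, since Python's standard library has no t distribution).

```python
import math

# Figures copied from the Stata output for "regress incarc_rate avginc".
ss_model, df_model = 7986385.28, 1
ss_resid, df_resid = 29518728.5, 1171
ss_total, df_total = 37505113.8, 1172
b1, se_b1 = 32.31456, 1.815488

ms_model = ss_model / df_model                  # mean square = SS / df
ms_resid = ss_resid / df_resid                  # Stata shows 25208.1371
f_stat = ms_model / ms_resid                    # Stata shows F(1, 1171) = 316.82
r2 = ss_model / ss_total                        # Stata shows R-squared = 0.2129
adj_r2 = 1 - ms_resid / (ss_total / df_total)   # Stata shows Adj R-squared = 0.2123
root_mse = math.sqrt(ms_resid)                  # Stata shows Root MSE = 158.77
t_b1 = b1 / se_b1                               # Stata shows t = 17.80

# Assumption: 1.962 approximates the two-sided 95% t critical value for 1171 df.
t_crit = 1.962
ci_low, ci_high = b1 - t_crit * se_b1, b1 + t_crit * se_b1  # Stata: 28.75258, 35.87653

print(f"MS(resid) = {ms_resid:.4f}, F = {f_stat:.2f}, R2 = {r2:.4f}")
print(f"adj R2 = {adj_r2:.4f}, Root MSE = {root_mse:.2f}, t = {t_b1:.2f}")
print(f"95% CI for avginc: {ci_low:.3f} to {ci_high:.3f}")
# With a single predictor, F equals the square of the slope's t statistic.
print(f"t squared = {t_b1 ** 2:.2f}")
```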
Slide #10
Residual Analysis
$e_i = Y_i - b_0 - b_1 X_i$, or $e_i = Y_i + 216.93 - 32.31 X_i$
In our data, some observed values are larger than would be predicted by average income alone.
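Plugging the estimated coefficients into the residual formula gives a residual for any single observation. In the sketch below the observation (avginc = 15, incarc_rate = 400) is hypothetical, not a row from Guns.dta, and both values are taken in whatever units the dataset uses. A positive result means the observed rate is larger than average income alone predicts.

```python
# Coefficients from the Stata output for "regress incarc_rate avginc".
B0 = -216.931   # _cons
B1 = 32.31456   # avginc

def predicted_incarc_rate(avginc):
    """Fitted value: Y-hat = b0 + b1 * avginc."""
    return B0 + B1 * avginc

def residual(incarc_rate, avginc):
    """e_i = Y_i - b0 - b1 * X_i, i.e. observed minus fitted."""
    return incarc_rate - predicted_incarc_rate(avginc)

# Hypothetical observation, NOT taken from Guns.dta.
e = residual(400, 15)
print(f"fitted = {predicted_incarc_rate(15):.2f}, residual = {e:.2f}")
# fitted = 267.79, residual = 132.21
```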
Slide #11
Normality of Residuals
Slide #12
More on Normality: Q-Normal
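A Q-normal plot pairs each sorted residual with the quantile a normal distribution would place at the same rank; if the residuals are roughly normal, the points fall near a straight line. The sketch below builds those pairs with Python's `statistics.NormalDist` and summarizes straightness with their correlation, a convenience measure not shown on the slide. The plotting-position rule is an assumption, and both samples are simulated stand-ins for residuals.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def normal_quantile_pairs(residuals):
    """Pair each sorted residual with a standard-normal quantile.

    Assumption: plotting positions (i - 0.5) / n, one common convention;
    Stata's qnorm is not guaranteed to use exactly this rule.
    """
    n = len(residuals)
    std_normal = statistics.NormalDist()  # mean 0, SD 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))

def qq_correlation(residuals):
    """Correlation of the Q-normal points; close to 1 means nearly a straight line."""
    pairs = normal_quantile_pairs(residuals)
    return pearson_r([t for t, _ in pairs], [r for _, r in pairs])

rng = random.Random(3)  # fixed seed; both samples are simulated
normal_like = [rng.gauss(0, 1) for _ in range(500)]
skewed = [rng.expovariate(1.0) for _ in range(500)]  # right-skewed, clearly non-normal

r_normal = qq_correlation(normal_like)
r_skewed = qq_correlation(skewed)
print(f"normal-like residuals: r = {r_normal:.3f}")
print(f"skewed residuals:      r = {r_skewed:.3f}")
```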
Slide #13
Distribution of Residuals by X
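Looking at residuals by $X$ is a check on assumption 1: the spread of the residuals should be about the same at low, middle, and high values of $X$. The sketch below uses simulated data only (parameter values invented), fits the line by least squares, and compares residual standard deviations across three equal-sized bins of $X$. The binning is an illustrative choice, not a formal test.

```python
import random
import statistics

def ols_residuals(xs, ys):
    """Fit Y-hat = b0 + b1 * X by least squares; return e_i = Y_i - Y-hat_i."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return [y - b0 - b1 * x for x, y in zip(xs, ys)]

def residual_sd_by_x(xs, residuals, n_bins=3):
    """Residual standard deviation within equal-count bins of X, from low X to high X."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    size = len(order) // n_bins
    return [statistics.stdev([residuals[i] for i in order[k * size:(k + 1) * size]])
            for k in range(n_bins)]

rng = random.Random(11)  # fixed seed; all data below are simulated
xs = [rng.uniform(1, 10) for _ in range(3000)]
datasets = {
    # Error SD is 1 everywhere: consistent with the "same variance" assumption.
    "constant variance": [2 + 0.5 * x + rng.gauss(0, 1) for x in xs],
    # Error SD is 0.3 * X: the spread fans out as X grows.
    "variance grows with X": [2 + 0.5 * x + rng.gauss(0, 0.3 * x) for x in xs],
}

results = {}
for label, ys in datasets.items():
    results[label] = residual_sd_by_x(xs, ols_residuals(xs, ys))
    print(label, [round(s, 2) for s in results[label]])
```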
Slide #14
BREAK TIME