
Week 2, 2007 Lecture 2a Slide #1

Bivariate Regression Analysis

• Theoretical Models

• Basic Linear Models: Deterministic Version

• Basic Linear Models: Stochastic Version

• Statistical Assumptions

• Estimating Linear Models

• Residuals (and the Pursuit of Truth…)

• An Example


Theoretical Linear Models

• The basis of “causality” in models
  – Time ordering
  – Co-variation
  – Non-spuriousness

• Examples
  – Fire Deaths = f(# of fire trucks at the scene)
  – Job Retention = f(current job satisfaction)
  – Income = f(education)


Deterministic Linear Models

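• Theoretical Model: Yi = β0 + β1Xi
  – β0 and β1 are constant terms
    • β0 is the intercept
    • β1 is the slope
  – Xi is a predictor of Yi

[Figure: the line Yi = β0 + β1Xi, with Xi on the horizontal axis and Yi on the vertical axis, marking the intercept and the slope]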
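
A minimal Python sketch of the deterministic model may help fix ideas. The intercept and slope values below are hypothetical, chosen only for illustration; they do not come from the lecture.

```python
# Hypothetical values, not from the lecture.
beta0 = 2.0   # intercept: the value of Y when X = 0
beta1 = 0.5   # slope: the change in Y for a one-unit change in X


def deterministic_y(x):
    """Return Y under the deterministic model Y = beta0 + beta1 * X."""
    return beta0 + beta1 * x


xs = [0, 1, 2, 3]
ys = [deterministic_y(x) for x in xs]
print(ys)  # [2.0, 2.5, 3.0, 3.5]
```

With no error term, every observation falls exactly on the line, so X determines Y completely.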


Stochastic Linear Models

• E[Yi] = β0 + β1Xi
  – β0 is the expected value of Y when X = 0
  – Each 1-unit increase in X increases the expected value of Y by β1
  – Variation in Y is caused by more than X: error (εi)

• εi = Yi − (β0 + β1Xi) = Yi − E[Yi]

• So: Yi = E[Yi] + εi = β0 + β1Xi + εi
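
The decomposition of Yi into an expected value plus an error can be sketched with simulated data. The parameter values, the error standard deviation, and the use of normally distributed errors are assumptions made for this illustration only.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical parameter values, not taken from the lecture.
beta0, beta1, sigma = 2.0, 0.5, 1.0

xs = [float(x) for x in range(10)]
expected = [beta0 + beta1 * x for x in xs]        # E[Yi]
errors = [random.gauss(0.0, sigma) for _ in xs]   # epsilon_i
ys = [ey + eps for ey, eps in zip(expected, errors)]  # Yi = E[Yi] + epsilon_i

# Recover each error from the identity epsilon_i = Yi - (beta0 + beta1 * Xi).
recovered = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]

for eps, rec in zip(errors, recovered):
    print(round(eps, 4), round(rec, 4))
```

In real data the errors cannot be recovered this way, because β0 and β1 are unknown; the simulation only works because the parameters were chosen by hand.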


Assumptions Necessary for Estimating Linear Models

1. Errors have identical distributions

   Zero mean, same variance, across the range of X

2. Errors are independent of X and of other errors εj

   E[εi] ≠ f(X) and E[εi] ≠ f(εj, j ≠ i)

3. Errors are normally distributed

[Figure: errors plotted against X, centered on the line εi = 0]
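
As a rough illustration of what these assumptions imply, the sketch below draws errors that satisfy them by construction and then compares their mean and variance in the lower and upper halves of the range of X. The sample size, the range of X, and the error standard deviation are hypothetical choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

n = 10_000
xs = [random.uniform(0.0, 10.0) for _ in range(n)]
# Errors are drawn without reference to X or to each other: normal, mean 0, sd 1.
errors = [random.gauss(0.0, 1.0) for _ in range(n)]

# Split the errors by whether X falls in the lower or upper half of its range.
low = [e for x, e in zip(xs, errors) if x < 5.0]
high = [e for x, e in zip(xs, errors) if x >= 5.0]

for label, group in (("low X", low), ("high X", high)):
    mean = statistics.mean(group)
    variance = statistics.pvariance(group)
    print(label, round(mean, 3), round(variance, 3))
```

Both groups should show a mean close to 0 and a variance close to 1. A mean or variance that shifted with X would signal a violation of assumption 1 or 2.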


Normal, Independent & Identical εi Distributions (“Normal iid”)

[Figure: error distributions shown on a plot of Y against X]

Problem: We don’t know:

a) if error assumptions are true; b) values for β0 and β1

Solution: Estimate ‘em!


Estimating Linear Models

Yi = β0 + β1Xi is modeled as:

Ŷi = b0 + b1Xi

Ŷi is the predicted value for Yi

So: Yi − Ŷi = ei

or: ei = Yi − b0 − b1Xi

This is the formula for RESIDUALS -- which you will come to know and cherish.
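
The slide does not say how b0 and b1 are obtained. The sketch below assumes ordinary least squares, the estimator behind Stata's regress command, and applies its closed-form bivariate solution to a small hypothetical dataset before computing the residuals.

```python
# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope: sum of cross-deviations over sum of squared X deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)

predicted = [b0 + b1 * x for x in xs]                 # Y-hat_i
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # e_i = Yi - Y-hat_i

print(round(b0, 4), round(b1, 4))       # 2.2 0.6
print([round(e, 4) for e in residuals])
```

With an intercept in the model, least-squares residuals sum to zero (up to floating-point error), so positive and negative prediction errors balance.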


Residuals: Statistical Forensics

• Residuals measure prediction error:

ei > 0 if Yi > Ŷi

ei < 0 if Yi < Ŷi

[Figure: Y plotted against X, with a line labeled Yi = β0 + β1Xi]


Stata and Regression: Predicting Incarceration with Average income

• Stata dataset: Guns.dta
  – From “Data for empirical exercises”
  – What are your expectations? Why?

• Stata command:
  – Regression: “regress incarc_rate avginc”

• Output:

      Source |       SS       df       MS              Number of obs =    1173
-------------+------------------------------           F(  1,  1171) =  316.82
       Model |  7986385.28     1  7986385.28           Prob > F      =  0.0000
    Residual |  29518728.5  1171  25208.1371           R-squared     =  0.2129
-------------+------------------------------           Adj R-squared =  0.2123
       Total |  37505113.8  1172  32000.9503           Root MSE      =  158.77

------------------------------------------------------------------------------
 incarc_rate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   32.31456   1.815488    17.80   0.000     28.75258    35.87653
       _cons |   -216.931   25.34477    -8.56   0.000    -266.6572   -167.2047
------------------------------------------------------------------------------
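
Several summary statistics in this output can be reproduced from the other numbers in the table. The Python sketch below uses only values printed in the output and the standard definitions: R-squared as Model SS over Total SS, F as Model MS over Residual MS, Root MSE as the square root of Residual MS, and t as the coefficient over its standard error.

```python
import math

# Values copied from the Stata output.
ss_model, ss_resid, ss_total = 7986385.28, 29518728.5, 37505113.8
df_model, df_resid = 1, 1171
coef_avginc, se_avginc = 32.31456, 1.815488

ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid

r_squared = ss_model / ss_total      # share of variation in Y explained by X
f_stat = ms_model / ms_resid         # F(1, 1171)
root_mse = math.sqrt(ms_resid)       # typical size of a residual
t_avginc = coef_avginc / se_avginc   # t statistic for the slope

print(round(r_squared, 4))  # 0.2129
print(round(f_stat, 2))     # 316.82
print(round(root_mse, 2))   # 158.77
print(round(t_avginc, 2))   # 17.8
```

In a bivariate regression the F statistic equals the square of the slope's t statistic, which is why 17.80 squared is approximately 316.8.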


Residual Analysis

ei = Yi − b0 − b1Xi, or ei = Yi + 216.93 − 32.31Xi

In our data, some observed values are larger than would be predicted by average income alone
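
To make the residual formula concrete, the sketch below plugs the estimated coefficients into it for a single observation. The observation itself (avginc = 15, incarc_rate = 300) is hypothetical and is not a row from Guns.dta.

```python
b0 = -216.931   # _cons from the Stata output
b1 = 32.31456   # coefficient on avginc from the Stata output


def predicted_incarc_rate(avginc):
    """Predicted incarceration rate from the fitted line b0 + b1 * avginc."""
    return b0 + b1 * avginc


def residual(incarc_rate, avginc):
    """Residual e_i = Yi - b0 - b1 * Xi for one observation."""
    return incarc_rate - predicted_incarc_rate(avginc)


# Hypothetical observation, for illustration only.
x_example, y_example = 15.0, 300.0

print(round(predicted_incarc_rate(x_example), 2))  # 267.79
print(round(residual(y_example, x_example), 2))    # 32.21
```

The positive residual means that this hypothetical observation lies above the fitted line: its incarceration rate is higher than average income alone would predict.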


Normality of Residuals
