PSY 1950 Regression November 10, 2008


Definition
• Simple linear regression
– Models the linear relationship between one predictor variable and one outcome variable
– e.g., predicting income based upon age
• Multiple linear regression
– Models the linear relationship between two or more predictor variables and one outcome variable
– e.g., predicting income based upon age and sex
• Lingo
– Independent/dependent variables, predictor/outcome variables
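The difference between simple and multiple regression can be made concrete with a small fit. This is a minimal sketch, assuming made-up, noise-free data: the ages, the 0/1 coding of sex, and the coefficients 20, 1.5 and 5 are all hypothetical. Simple regression estimates one slope; with two predictors, the sketch solves the two centered normal equations by Cramer's rule.

```python
def mean(v):
    return sum(v) / len(v)

def simple_fit(x, y):
    """Least squares slope b and intercept a for one predictor."""
    mx, my = mean(x), mean(y)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    b = sp / ss_x
    return b, my - b * mx

def two_predictor_fit(x1, x2, y):
    """Least squares intercept and two slopes via the centered normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    det = s11 * s22 - s12 ** 2  # zero only if the two predictors are collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2

age = [25, 30, 35, 40, 45, 50]
sex = [0, 1, 0, 1, 0, 1]  # hypothetical 0/1 coding
income = [20 + 1.5 * a + 5 * s for a, s in zip(age, sex)]  # hypothetical, in thousands

print(simple_fit(age, income))              # income ~ age: slope absorbs part of the sex effect
print(two_predictor_fit(age, sex, income))  # income ~ age + sex: about (20, 1.5, 5)
```

Because age and sex are slightly correlated in this toy data, the simple-regression slope for age (about 1.59) differs from the age slope in the two-predictor model (1.5).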


History
• Astronomical predictions: method of least squares
– Piazzi (1801) spotted Ceres, made 22 observations over 41 days, got sick, lost Ceres
– Gauss: “... for it is now clearly shown that the orbit of a heavenly body may be determined quite nearly from good observations embracing only a few days; and this without any hypothetical assumption.”
• Genetics: regression to the mean
– Galton, F. (1886). Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute, 15, 246–263.



Lines
• Mathematically, a line is defined by its slope and intercept
– Slope is the change in Y per unit change in X
– Intercept is the point at which the line crosses the Y-axis, i.e., Y when X = 0
• Y = bX + a
– b is the slope
– a is the intercept
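As a small illustration of slope and intercept, the sketch below uses the fitted line from the record-sales example in these slides, Y = 0.063X + 131.59. The intercept is Y at X = 0, and the slope is the rise in Y divided by any run in X. The function name `line` is hypothetical, chosen only for this example.

```python
def line(x, b=0.063, a=131.59):
    """Y = bX + a, using the slope b and intercept a from the record-sales example."""
    return b * x + a

intercept = line(0)                          # Y when X = 0
slope = (line(1000) - line(0)) / (1000 - 0)  # change in Y per unit change in X
print(intercept, slope)                      # 131.59 and, up to rounding, 0.063
```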


Which Line Is Best?

[Figure: scatterplot of Record Sales (Thousands) against Advertising Budget (Thousands of $)]


Residuals
• Residuals are
– Errors in prediction
– Differences between observed values (in your dataset) and predicted values (under your model)

Y = 0.063X + 131.59

[Figure: scatterplot of Record Sales (Thousands) against Advertising Budget (Thousands of $) with the fitted line]

[Figure: residuals (Thousands of Record Sales) plotted against Advertising Budget (Thousands of $)]
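A minimal sketch of computing residuals as observed minus predicted, using the fitted line from the slide, Y = 0.063X + 131.59. The three (X, Y) pairs are hypothetical and are not taken from the slide's dataset.

```python
def predict(x):
    # Fitted line from the slide: Y = 0.063X + 131.59 (both in thousands)
    return 0.063 * x + 131.59

budgets = [500, 1500, 3000]  # hypothetical advertising budgets (thousands of $)
sales = [200, 190, 350]      # hypothetical record sales (thousands)

# Residual = observed - predicted
residuals = [y - predict(x) for x, y in zip(budgets, sales)]
print(residuals)  # positive: the model under-predicts; negative: it over-predicts
```

Here the residuals come out to roughly 36.91, -36.09 and 29.41 thousand records.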


Minimizing Residuals
• Can define the best-fitting line as the one that minimizes the sum of the
– Absolute residuals (Method of Least Absolute Deviations)
– Squared residuals (Method of Least Squares)
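Both criteria can be written as functions of a candidate line with slope b and intercept a. This is a minimal sketch; the four data points and the two candidate lines are made up for illustration.

```python
def sum_abs_residuals(x, y, b, a):
    """Least absolute deviations criterion: sum of |Y - (bX + a)|."""
    return sum(abs(yi - (b * xi + a)) for xi, yi in zip(x, y))

def sum_sq_residuals(x, y, b, a):
    """Least squares criterion: sum of (Y - (bX + a))^2."""
    return sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4]  # hypothetical data
y = [1, 3, 2, 4]
for b, a in [(1.0, 0.0), (0.8, 0.5)]:
    print(b, a, sum_abs_residuals(x, y, b, a), sum_sq_residuals(x, y, b, a))
```

The two criteria can rank lines differently. For Y = X, the summed absolute residuals are 2 and the summed squares are 2. For Y = 0.8X + 0.5, which is the least squares line for these points, they are about 2.4 and 1.8.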


Which Is Better?
• Method of Least Squares
– Not robust (sensitive to outliers)
– Stable (line doesn’t “jump” with small changes in X)
– Only one solution (unique line for each dataset)
• Method of Least Absolute Deviations
– Robust (less sensitive to outliers)
– Unstable (line does “jump” with small changes in X)
– Multiple solutions (sometimes)
• http://www.math.wpi.edu/Course_Materials/SAS/lablets/7.3/7.3c/lab73c.html
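The robustness difference can be illustrated with one outlier. This is a minimal sketch using made-up data. Least squares uses the usual closed-form slope and intercept. Least absolute deviations is done by brute force here: some optimal LAD line always passes through at least two data points, so the code tries every pair. This is only a teaching device for small n; practical software typically solves LAD as a linear program.

```python
from itertools import combinations

def ols_fit(x, y):
    """Ordinary least squares slope and intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    b = sp / ss_x
    return b, my - b * mx

def lad_fit(x, y):
    """Least absolute deviations by brute force over lines through pairs of points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # the line through these two points would be vertical
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        sad = sum(abs(yk - (b * xk + a)) for xk, yk in zip(x, y))
        if best is None or sad < best[2]:
            best = (b, a, sad)
    if best is None:
        raise ValueError("need at least two distinct X values")
    return best[0], best[1]

x = [1, 2, 3, 4, 5]
clean = [2, 4, 6, 8, 10]    # exactly Y = 2X (hypothetical)
outlier = [2, 4, 6, 8, 30]  # same data with the last point moved far off the line

print(ols_fit(x, clean), lad_fit(x, clean))      # both give slope 2, intercept 0
print(ols_fit(x, outlier), lad_fit(x, outlier))  # OLS jumps to (6, -8); LAD stays at (2, 0)
```

A single outlier triples the least squares slope, but the LAD line still passes through the four points that lie on Y = 2X.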


Multiple Solutions
• Any line within the “green zone” produces the same sum of absolute residuals, so the method of least absolute deviations cannot choose among them
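The same tie can be reproduced without the slide's picture. This is a minimal sketch with a made-up dataset, not the slide's data: four points at the corners of the unit square. Every line Y = bX + a with 0 ≤ a ≤ 1 and 0 ≤ a + b ≤ 1 has a summed absolute residual of 2, but only Y = 0.5, the least squares line, reaches the minimum summed squared residual of 1.

```python
def sad(x, y, b, a):
    """Sum of absolute residuals for the line Y = bX + a."""
    return sum(abs(yi - (b * xi + a)) for xi, yi in zip(x, y))

def sse(x, y, b, a):
    """Sum of squared residuals for the line Y = bX + a."""
    return sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))

# Corners of the unit square (hypothetical example data)
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]

candidates = {"Y = 0": (0, 0), "Y = 1": (0, 1), "Y = 0.5": (0, 0.5),
              "Y = X": (1, 0), "Y = 1 - X": (-1, 1)}
for name, (b, a) in candidates.items():
    print(f"{name:10s} SAD = {sad(x, y, b, a):.2f}  SSE = {sse(x, y, b, a):.2f}")
```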


Method of (Ordinary) Least Squares

y = 0.063x + 131.59
R² = 0.355

[Figure: scatterplot of Record Sales (Thousands) against Advertising Budget (Thousands of $) with the least squares line]


Regression Coefficients
• Slope
• Intercept
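The slide's own formulas are not recoverable from the transcript. The standard least squares expressions, whose notation may differ from the original slide, are b = SP / SS_X = Σ(X − M_X)(Y − M_Y) / Σ(X − M_X)² for the slope and a = M_Y − b·M_X for the intercept. Below is a minimal sketch with hypothetical data.

```python
def regression_coefficients(x, y):
    """Least squares slope b = SP / SS_X and intercept a = M_Y - b * M_X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # sum of cross-products
    ss_x = sum((xi - mx) ** 2 for xi in x)                    # sum of squares of X
    b = sp / ss_x
    a = my - b * mx
    return b, a

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 3, 5, 4, 6]
b, a = regression_coefficients(x, y)
print(b, a)  # b = 0.9, a = 1.3 (up to floating-point rounding)
```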


Standardized Coefficients



Y = 0.5958X

[Figure: scatterplot of Record Sales (z-score) against Advertising Budget (z-score) with the fitted line]
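In simple regression, the standardized slope equals b·s_X / s_Y, which is Pearson's r. The standardized intercept is 0 because both z-scored variables have mean 0. This matches the record-sales example: 0.5958² ≈ 0.355, the R² reported for the unstandardized fit. The sketch below uses hypothetical data and checks that regressing z-scores on z-scores gives the same slope as b·s_X / s_Y.

```python
from statistics import mean, stdev

def fit(x, y):
    """Least squares slope and intercept."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return b, my - b * mx

def zscores(v):
    """Standardize using the sample mean and sample standard deviation."""
    m, s = mean(v), stdev(v)
    return [(vi - m) / s for vi in v]

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 7]

b, a = fit(x, y)
beta, alpha = fit(zscores(x), zscores(y))    # regression on z-scores
print(beta, b * stdev(x) / stdev(y), alpha)  # first two agree (both equal r); alpha is ~0
```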


Regression Line Passes Through (MX, MY)
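Write M_X and M_Y for the means of X and Y, and assume the standard least squares intercept a = M_Y − bM_X. Substituting X = M_X into the regression equation then returns M_Y:

```latex
\hat{Y}(M_X) = bM_X + a = bM_X + (M_Y - bM_X) = M_Y
```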


Correlation and Regression
• Statistical distinction based on the nature of the variables
– In correlation, both X and Y are random
– In regression, X is fixed and Y is random
• Practical distinction based on the interest of the researcher
– With correlation, the researcher asks: What is the strength (and direction) of the linear relationship between X and Y?
– With regression, the researcher asks the above and/or: How do I predict Y given X?
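One concrete way to see the asymmetry: the correlation between X and Y stays the same if you swap the two variables, but predicting Y from X and predicting X from Y give different slopes. The product of those two slopes is r². This is a minimal sketch with hypothetical data.

```python
def moments(x, y):
    """Sum of cross-products and the two sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    return sp, ss_x, ss_y

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 7]

sp, ss_x, ss_y = moments(x, y)
r = sp / (ss_x * ss_y) ** 0.5  # unchanged if X and Y are swapped
b_y_on_x = sp / ss_x           # slope for predicting Y from X
b_x_on_y = sp / ss_y           # slope for predicting X from Y
print(r, b_y_on_x, b_x_on_y, b_y_on_x * b_x_on_y, r ** 2)
```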


Goodness of Fit
• The regression equation alone does not reveal how well your data fit your model
– e.g., the two datasets below produce the same regression equation

[Figure: two scatterplots (X axis 4 to 10, Y axis 0 to 6)]
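Here is a minimal sketch of the same point, using two made-up datasets rather than the ones plotted on the slide. Dataset B is dataset A plus residuals that sum to zero and are uncorrelated with X. Because of that, least squares returns the same line, Y = 0.5X − 1, for both datasets, even though B fits much worse.

```python
def fit(x, y):
    """Least squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return b, my - b * mx

def sse(x, y, b, a):
    """Sum of squared residuals around Y = bX + a."""
    return sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))

x = [4, 5, 6, 7, 8, 9, 10]
y_a = [0.5 * xi - 1 for xi in x]             # dataset A: every point on the line
noise = [1, -1, 0, 0, 0, -1, 1]              # sums to 0 and is uncorrelated with x
y_b = [ya + e for ya, e in zip(y_a, noise)]  # dataset B

for y in (y_a, y_b):
    b, a = fit(x, y)
    print(b, a, sse(x, y, b, a))  # same b and a; SSE is 0 for A and 4 for B
```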


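Standard Error of Estimate
• The “standard” residual, i.e., the typical size of a prediction error around the regression line
• Why df = n - 2?
– To determine the regression equation (and thus the residuals), we need to estimate two population parameters
• Slope and intercept, OR
• Mean of X and mean of Y
– A regression with n = 2 has no df: the line passes exactly through both points, so every residual is 0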
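The slide's formula is not recoverable from the transcript. The usual definition, assumed here, is s = sqrt(SS_residual / (n - 2)). This is a minimal sketch with hypothetical data, and it refuses n ≤ 2 because the residual df would then be 0 or negative.

```python
def standard_error_of_estimate(x, y):
    """sqrt(SS_residual / (n - 2)) for a simple least squares regression."""
    n = len(x)
    if n <= 2:
        raise ValueError("need n > 2: the residual df (n - 2) must be positive")
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    ss_res = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
    return (ss_res / (n - 2)) ** 0.5  # df = n - 2: slope and intercept were estimated

x = [4, 5, 6, 7, 8, 9, 10]        # hypothetical data
y = [2, 0.5, 2, 2.5, 3, 2.5, 5]
print(standard_error_of_estimate(x, y))  # sqrt(4 / 5), about 0.894
```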


Coefficient of Determination (r²)
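In simple regression, r² is the square of Pearson's correlation. It is the proportion of variance in Y accounted for by X. The record-sales slides are consistent with this: the standardized slope 0.5958, squared, rounds to the reported R² of 0.355. This is a minimal sketch with hypothetical data.

```python
def pearson_r(x, y):
    """Pearson correlation: SP / sqrt(SS_X * SS_Y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    return sp / (ss_x * ss_y) ** 0.5

print(round(0.5958 ** 2, 3))  # 0.355, the R^2 from the record-sales slides

x = [4, 5, 6, 7, 8, 9, 10]    # hypothetical data
y = [2, 0.5, 2, 2.5, 3, 2.5, 5]
print(pearson_r(x, y) ** 2)   # 7/11, about 0.636
```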


Partitioning Sums of Squares

[Figure: three scatterplot panels, both axes 0 to 6]
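The standard partition for a least squares fit is SS_total = SS_regression + SS_residual. Here SS_total = Σ(Y − M_Y)², SS_regression = Σ(Ŷ − M_Y)², and SS_residual = Σ(Y − Ŷ)². It follows that r² = SS_regression / SS_total. The sketch below checks this numerically with hypothetical data.

```python
def sums_of_squares(x, y):
    """Total, regression, and residual sums of squares for a least squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    y_hat = [b * xi + a for xi in x]
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_reg = sum((yh - my) ** 2 for yh in y_hat)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return ss_total, ss_reg, ss_res

x = [4, 5, 6, 7, 8, 9, 10]  # hypothetical data
y = [2, 0.5, 2, 2.5, 3, 2.5, 5]
ss_total, ss_reg, ss_res = sums_of_squares(x, y)
print(ss_total, ss_reg, ss_res)  # about 11, 7 and 4: 11 = 7 + 4
print(ss_reg / ss_total)         # r^2 = 7/11
```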


Testing the Model

• df (model) = # predictors
• df (residual) = n minus # model parameters = n minus (1 + # predictors)
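The two quantities above are the degrees of freedom for the model (# predictors) and for the residuals (n minus the number of model parameters). The usual F statistic divides each sum of squares by its df and takes the ratio. This is a minimal sketch; the sums of squares come from a hypothetical dataset. Turning F into a p-value would need an F distribution (for example SciPy's `scipy.stats.f.sf`), which is left out here to keep the example dependency-free.

```python
def f_statistic(ss_reg, ss_res, n, n_predictors):
    """F = (SS_regression / df_model) / (SS_residual / df_residual)."""
    df_model = n_predictors
    df_resid = n - (1 + n_predictors)  # n minus the number of model parameters
    ms_model = ss_reg / df_model       # mean square for the model
    ms_resid = ss_res / df_resid       # mean square for the residuals
    return ms_model / ms_resid, df_model, df_resid

# Hypothetical sums of squares from a simple regression with n = 7
F, df1, df2 = f_statistic(ss_reg=7.0, ss_res=4.0, n=7, n_predictors=1)
print(F, df1, df2)  # about 8.75 with (1, 5) df
```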


Online Applets
• Explaining variance
– http://www.duxbury.com/authors/mcclellandg/tiein/johnson/reg.htm
• Leverage
– http://www.stat.sc.edu/~west/javahtml/Regression.html
• Distribution of slopes/intercepts
– http://lstat.kuleuven.be/java/version2.0/Applet003.html