Post on 16-Dec-2015


Linear Regression

PSYC 6130, PROF. J. ELDER 2

Correlation vs Regression: What’s the Difference?

• Correlation measures how strongly two variables are related.

• Regression provides a means for predicting the value of one variable based on the value of a related variable.

• The underlying mathematics are the same.

• Here we are dealing only with linear correlation and linear regression.


Optimal Prediction using z Scores

• Consider two variables X and Y that may be related in some way.

– e.g.,

• X = midterm score, Y = final exam score

• X = reaction time, Y = error rate

• Suppose you know X for a particular case (e.g., individual, trial). What is your best guess at Y?

• The answer turns out to be pretty simple:

$\hat{z}_Y = r\,z_X$, where $r$ is the Pearson correlation between X and Y.


Example: 6130A 2005-06 Assignment Marks

Assignment 1 (X)    Assignment 2 (Y)
86.7%               81.8%
81.5%               82.4%
85.0%               84.3%
85.5%               86.8%
90.2%               83.6%
95.4%               87.4%
91.9%               93.1%
93.1%               93.1%
94.8%               91.8%
93.6%               93.7%
94.8%               93.1%
94.2%               94.3%
94.8%               95.6%

Mean: 90.9% (X), 89.3% (Y)
Sample Std. Dev.: 4.66% (X), 5.04% (Y)
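A short Python sketch can check the summary rows of the table and the z-score rule $\hat{z}_Y = r\,z_X$. It uses only the standard library, and the function names are illustrative. The listed marks are rounded to 0.1%. For that reason, the r computed from them comes out slightly below the 0.7998 quoted on the slides.

```python
from math import sqrt

# Assignment marks (%) from the table: X = Assignment 1, Y = Assignment 2
X = [86.7, 81.5, 85.0, 85.5, 90.2, 95.4, 91.9, 93.1, 94.8, 93.6, 94.8, 94.2, 94.8]
Y = [81.8, 82.4, 84.3, 86.8, 83.6, 87.4, 93.1, 93.1, 91.8, 93.7, 93.1, 94.3, 95.6]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation, with N - 1 in the denominator."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(xs, ys):
    """Sum of cross-products divided by the root of the product of the sums of squares."""
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


def predicted_z_y(x, xs, ys):
    """Predicted z score on Y for a raw score x on X: z_Y_hat = r * z_X."""
    z_x = (x - mean(xs)) / sample_sd(xs)
    return pearson_r(xs, ys) * z_x


r = pearson_r(X, Y)
print(f"means: {mean(X):.1f}% and {mean(Y):.1f}%")
print(f"sample SDs: {sample_sd(X):.2f}% and {sample_sd(Y):.2f}%")
print(f"r = {r:.4f}")
# A student one SD above the Assignment 1 mean is predicted to be r SDs above the Assignment 2 mean
print(f"z_Y_hat for z_X = 1: {predicted_z_y(mean(X) + sample_sd(X), X, Y):.4f}")
```

Because $|r| < 1$, the predicted z score on Y is always closer to zero than the z score on X. This is regression toward the mean, expressed in z-score form.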


Graphical Representation

[Figure: scatterplot of Assignment 2 z-score against Assignment 1 z-score (PSYC 6130A 2005-06), with the regression line $\hat{z}_Y = 0.7998\,z_X$]


The Raw-Score Regression Formula

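In terms of population parameters:

$\hat{Y} = a_{YX} + b_{YX}X$ or $\hat{Y} = r\frac{\sigma_Y}{\sigma_X}(X - \mu_X) + \mu_Y$

where $b_{YX} = r\frac{\sigma_Y}{\sigma_X}$ and $a_{YX} = \mu_Y - b_{YX}\,\mu_X$

In terms of sample statistics:

$\hat{Y} = a_{YX} + b_{YX}X$ or $\hat{Y} = r\frac{s_Y}{s_X}(X - \bar{X}) + \bar{Y}$

where $b_{YX} = r\frac{s_Y}{s_X}$ and $a_{YX} = \bar{Y} - b_{YX}\,\bar{X}$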
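Here is a Python sketch of the sample-statistics version, applied to the assignment marks from the example table. It computes $b_{YX}$ and $a_{YX}$ from $r$, the sample standard deviations and the means. It then checks that the intercept-slope form and the deviation-score form give the same prediction. Because the listed marks are rounded, the coefficients come out near, but not exactly at, the graph's $b_{YX} = 0.867$ and $a_{YX} = 10.5\%$. The function names are illustrative.

```python
from math import sqrt

# Assignment marks (%) from the example table: X = Assignment 1, Y = Assignment 2
X = [86.7, 81.5, 85.0, 85.5, 90.2, 95.4, 91.9, 93.1, 94.8, 93.6, 94.8, 94.2, 94.8]
Y = [81.8, 82.4, 84.3, 86.8, 83.6, 87.4, 93.1, 93.1, 91.8, 93.7, 93.1, 94.3, 95.6]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sp / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def fit_y_on_x(xs, ys):
    """Sample coefficients: b_YX = r * s_Y / s_X and a_YX = Y_bar - b_YX * X_bar."""
    b = pearson_r(xs, ys) * sample_sd(ys) / sample_sd(xs)
    return b, mean(ys) - b * mean(xs)


def predict_deviation_form(x, xs, ys):
    """Equivalent form: Y_hat = r * (s_Y / s_X) * (X - X_bar) + Y_bar."""
    slope = pearson_r(xs, ys) * sample_sd(ys) / sample_sd(xs)
    return slope * (x - mean(xs)) + mean(ys)


b_yx, a_yx = fit_y_on_x(X, Y)
print(f"b_YX = {b_yx:.3f}, a_YX = {a_yx:.2f}%")
# Intercept-slope form and deviation form agree, e.g. for an Assignment 1 mark of 90%
print(f"{a_yx + b_yx * 90.0:.3f} vs {predict_deviation_form(90.0, X, Y):.3f}")
```

The deviation form makes it easy to see that the regression line always passes through the point $(\bar{X}, \bar{Y})$.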


Graphical Representation

[Figure: scatterplot of Assignment 2 grade against Assignment 1 grade (PSYC 6130 Section A 2005-2006), with the regression line $\hat{Y} = 0.867X + 10.5\%$, i.e., $b_{YX} = 0.867$ and $a_{YX} = 10.5\%$]


Residuals

• The deviations of the actual Y values from the Y values predicted by the regression line are called residuals.

• The regression line minimizes the sum of squared residuals. It is therefore called a least-squares (minimum mean-squared-error) fit.

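[Figure: scatterplot of Assignment 2 grade against Assignment 1 grade (PSYC 6130 Section A 2005-2006), marking the vertical deviation of a data point $Y$ from its predicted value $\hat{Y}$ on the regression line]

$\text{residual} = Y - \hat{Y}$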
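A minimal sketch of the least-squares property on the assignment data follows. It computes the residuals $Y - \hat{Y}$ and shows that they sum to zero, up to rounding. It also checks that moving the slope or intercept away from the fitted values increases the sum of squared residuals. The slope is computed as $SP/SS_X$, the sum of cross-products over the sum of squares of X. This is algebraically equal to $r\,s_Y/s_X$. The helper names are illustrative.

```python
# Assignment marks (%) from the example table: X = Assignment 1, Y = Assignment 2
X = [86.7, 81.5, 85.0, 85.5, 90.2, 95.4, 91.9, 93.1, 94.8, 93.6, 94.8, 94.2, 94.8]
Y = [81.8, 82.4, 84.3, 86.8, 83.6, 87.4, 93.1, 93.1, 91.8, 93.7, 93.1, 94.3, 95.6]


def mean(values):
    return sum(values) / len(values)


def least_squares_line(xs, ys):
    """Return (b, a); b = SP / SS_X, which equals r * s_Y / s_X."""
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    b = sp / ss_x
    return b, my - b * mx


def residuals(xs, ys, b, a):
    """Residual = Y - Y_hat for each case."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, b, a):
    return sum(e ** 2 for e in residuals(xs, ys, b, a))


b, a = least_squares_line(X, Y)
sse_best = sum_squared_residuals(X, Y, b, a)
print(f"sum of squared residuals on the regression line: {sse_best:.2f}")
print(f"sum of residuals: {sum(residuals(X, Y, b, a)):.1e}")  # zero up to rounding error

# Any other line does worse: nudge the slope or the intercept and compare
nudges = [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.5), (0.0, -0.5)]
for db, da in nudges:
    worse = sum_squared_residuals(X, Y, b + db, a + da) > sse_best
    print(f"slope {db:+.2f}, intercept {da:+.1f}: larger SSE? {worse}")
```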


Variance of the Estimate

• Total prediction error is expressed as the variance of the estimate (or mean-squared error):

In terms of population parameters:

$\sigma_{est\,Y}^2 = \frac{\sum (Y - \hat{Y})^2}{N}$

Note that $\sigma_{est\,Y}^2 \le \sigma_Y^2$. Equality applies only when $r = 0$.

In terms of sample statistics:

$s_{est\,Y}^2 = \frac{\sum (Y - \hat{Y})^2}{N}$

$\sigma_{est\,Y}$ ($s_{est\,Y}$) is called the standard error of the estimate.
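This sketch treats the 13 assignment marks as a complete population, an assumption made for illustration only. It computes $\sigma_{est\,Y}^2$ and the standard error of the estimate. It also checks the identity $\sigma_{est\,Y}^2 = (1 - r^2)\,\sigma_Y^2$, which explains why the two variances coincide only when $r = 0$. The helper names are illustrative.

```python
from math import sqrt

# Assignment marks (%) from the example table: X = Assignment 1, Y = Assignment 2
X = [86.7, 81.5, 85.0, 85.5, 90.2, 95.4, 91.9, 93.1, 94.8, 93.6, 94.8, 94.2, 94.8]
Y = [81.8, 82.4, 84.3, 86.8, 83.6, 87.4, 93.1, 93.1, 91.8, 93.7, 93.1, 94.3, 95.6]


def mean(values):
    return sum(values) / len(values)


def population_variance(values):
    """Variance with N in the denominator."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def fit(xs, ys):
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sp / sum((x - mx) ** 2 for x in xs)
    return b, my - b * mx


def variance_of_estimate(xs, ys):
    """sigma^2_est Y = sum of (Y - Y_hat)^2 divided by N."""
    b, a = fit(xs, ys)
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


mx, my = mean(X), mean(Y)
r = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / sqrt(
    sum((x - mx) ** 2 for x in X) * sum((y - my) ** 2 for y in Y)
)
var_est = variance_of_estimate(X, Y)
var_y = population_variance(Y)
print(f"variance of the estimate: {var_est:.3f}")
print(f"standard error of the estimate: {sqrt(var_est):.3f}")
print(f"variance of Y: {var_y:.3f}")
# The prediction error is the variance of Y shrunk by the factor (1 - r^2)
print(f"(1 - r^2) * variance of Y: {(1 - r ** 2) * var_y:.3f}")
```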


Explained and Unexplained Variance

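Explained Variance: $\sigma_{exp}^2 = \frac{1}{N}\sum (\hat{Y} - \mu_Y)^2$

[Figure: scatterplot of Assignment 2 grade against Assignment 1 grade (PSYC 6130 Section A 2005-2006). For one data point, the deviation of $Y$ from the regression line is labelled "Unexplained", and the deviation of $\hat{Y}$ from the mean is labelled "Explained".]

Unexplained Variance: $\sigma_{est\,Y}^2 = \frac{1}{N}\sum (Y - \hat{Y})^2$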


Summary of Variances

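Population:

Explained Variance: $\sigma_{exp}^2 = \frac{\sum (\hat{Y} - \mu_Y)^2}{N}$

Unexplained Variance: $\sigma_{est\,Y}^2 = \frac{\sum (Y - \hat{Y})^2}{N}$

Total Variance: $\sigma_Y^2 = \frac{\sum (Y - \mu_Y)^2}{N}$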


Summary of Variances

• It can be shown that, for the population,

$\sigma_Y^2 = \sigma_{exp}^2 + \sigma_{est\,Y}^2$

• i.e., the total variance is equal to the sum of the explained and unexplained variances.
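Here is a quick numerical check of the decomposition. As before, the 13 assignment marks are treated as a population for illustration, so $\mu_Y$ is their mean and every variance divides by N.

```python
# Assignment marks (%) from the example table: X = Assignment 1, Y = Assignment 2
X = [86.7, 81.5, 85.0, 85.5, 90.2, 95.4, 91.9, 93.1, 94.8, 93.6, 94.8, 94.2, 94.8]
Y = [81.8, 82.4, 84.3, 86.8, 83.6, 87.4, 93.1, 93.1, 91.8, 93.7, 93.1, 94.3, 95.6]


def mean(values):
    return sum(values) / len(values)


# Least-squares regression of Y on X
mx, my = mean(X), mean(Y)
b = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / sum((x - mx) ** 2 for x in X)
a = my - b * mx
y_hat = [a + b * x for x in X]

n = len(Y)
explained = sum((yh - my) ** 2 for yh in y_hat) / n               # sigma^2_exp
unexplained = sum((y - yh) ** 2 for y, yh in zip(Y, y_hat)) / n   # sigma^2_est Y
total = sum((y - my) ** 2 for y in Y) / n                         # sigma^2_Y

print(f"explained {explained:.3f} + unexplained {unexplained:.3f} = {explained + unexplained:.3f}")
print(f"total variance: {total:.3f}")
```

The identity holds because, for a least-squares line, the residuals $Y - \hat{Y}$ are uncorrelated with the predicted values $\hat{Y}$. The cross-product term therefore vanishes when $\sum (Y - \mu_Y)^2$ is expanded.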


Summary of Variances

Sample:

Explained Variance: $s_{exp}^2 = s_Y^2 - s_{est\,Y}^2$

Unexplained Variance: $s_{est\,Y}^2 = \frac{\sum (Y - \hat{Y})^2}{N}$

Total Variance: $s_Y^2 = \frac{\sum (Y - \bar{Y})^2}{N - 1}$


Coefficient of Determination

• The fraction of the total variance explained by the regression line is called the coefficient of determination.

• It can be shown that this is just the square of the Pearson coefficient r:

• Population:

Coefficient of Determination $= r^2 = \frac{\sigma_{exp}^2}{\sigma_Y^2} = \frac{\sum (\hat{Y} - \mu_Y)^2}{\sum (Y - \mu_Y)^2} = 1 - \frac{\sigma_{est\,Y}^2}{\sigma_Y^2}$

• Sample:

Coefficient of Determination $= r^2 = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2} = 1 - \frac{n}{n-1}\cdot\frac{s_{est\,Y}^2}{s_Y^2}$
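The sample expressions for $r^2$ can be checked against each other on the assignment data. This sketch computes three quantities: the squared Pearson r, the sum-of-squares ratio, and the version written with $s_{est\,Y}^2$ (denominator N) and $s_Y^2$ (denominator N - 1). In the last version, the factor $n/(n-1)$ reconciles the two denominators.

```python
from math import sqrt

# Assignment marks (%) from the example table: X = Assignment 1, Y = Assignment 2
X = [86.7, 81.5, 85.0, 85.5, 90.2, 95.4, 91.9, 93.1, 94.8, 93.6, 94.8, 94.2, 94.8]
Y = [81.8, 82.4, 84.3, 86.8, 83.6, 87.4, 93.1, 93.1, 91.8, 93.7, 93.1, 94.3, 95.6]

n = len(Y)
mx, my = sum(X) / n, sum(Y) / n
sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))
ss_x = sum((x - mx) ** 2 for x in X)
ss_y = sum((y - my) ** 2 for y in Y)

r = sp / sqrt(ss_x * ss_y)
b = sp / ss_x
a = my - b * mx
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))

s2_est = ss_res / n        # s^2_est Y: denominator N
s2_y = ss_y / (n - 1)      # s^2_Y: denominator N - 1

r2_pearson = r ** 2
r2_sums = 1 - ss_res / ss_y
r2_variances = 1 - (n / (n - 1)) * s2_est / s2_y
print(f"r^2 = {r2_pearson:.4f}, {r2_sums:.4f}, {r2_variances:.4f}")
```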


Coefficient of Nondetermination

• The fraction of the total variance that remains unexplained by the regression line is called the coefficient of nondetermination.

• It can be shown that this is just $1 - r^2$:

• Population:

Coefficient of Nondetermination $= 1 - r^2 = \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \mu_Y)^2} = \frac{\sigma_{est\,Y}^2}{\sigma_Y^2}$

• Sample:

Coefficient of Nondetermination $= 1 - r^2 = \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2} = \frac{n}{n-1}\cdot\frac{s_{est\,Y}^2}{s_Y^2}$


Summary of Coefficients

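Population:

Coefficient of Determination: $r^2 = \frac{\sum (\hat{Y} - \mu_Y)^2}{\sum (Y - \mu_Y)^2} = 1 - \frac{\sigma_{est\,Y}^2}{\sigma_Y^2}$

Coefficient of Nondetermination: $1 - r^2 = \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \mu_Y)^2} = \frac{\sigma_{est\,Y}^2}{\sigma_Y^2}$

Sample:

Coefficient of Determination: $r^2 = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2} = 1 - \frac{n}{n-1}\cdot\frac{s_{est\,Y}^2}{s_Y^2}$

Coefficient of Nondetermination: $1 - r^2 = \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2} = \frac{n}{n-1}\cdot\frac{s_{est\,Y}^2}{s_Y^2}$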


Components of Variance: SPSS Output

ANOVA(b)

Model 1      Sum of Squares   df      Mean Square   F          Sig.
Regression   861347.2         1       861347.186    7465.139   .000(a)
Residual     1325861          11491   115.383
Total        2187209          11492

a. Predictors: (Constant), How tall are you without your shoes on (in cm.)
b. Dependent Variable: How much do you weigh (in kilograms)

Explained SS: $\sum (\hat{Y} - \bar{Y})^2$
Unexplained SS: $\sum (Y - \hat{Y})^2$
Total SS: $\sum (Y - \bar{Y})^2$

Unexplained Variance: $s_{est\,Y}^2 = \frac{\sum (Y - \hat{Y})^2}{N}$
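A few lines of arithmetic tie the SPSS table back to the slide's formulas. This sketch takes the sums of squares and degrees of freedom from the table, recovers N from the total df, and computes $r^2$ and F. Note the difference in denominators: SPSS's residual mean square divides the unexplained SS by the residual df (N - 2), whereas $s_{est\,Y}^2$ as defined on the slide divides by N.

```python
from math import sqrt

# Values copied from the SPSS ANOVA table (height in cm predicting weight in kg)
ss_regression, df_regression = 861347.2, 1
ss_residual, df_residual = 1325861.0, 11491
ss_total, df_total = 2187209.0, 11492

n = df_total + 1                          # total df is N - 1
r_squared = ss_regression / ss_total      # explained SS / total SS
ms_residual = ss_residual / df_residual   # SPSS divides the residual SS by N - 2
f_ratio = (ss_regression / df_regression) / ms_residual
s2_est = ss_residual / n                  # the slide's s^2_est Y divides by N

print(f"N = {n}")
print(f"r^2 = {r_squared:.4f}, |r| = {sqrt(r_squared):.4f}")
print(f"F = {f_ratio:.1f}")
print(f"residual mean square = {ms_residual:.3f}, s^2_est Y = {s2_est:.3f}")
```

With N this large, the two ways of scaling the unexplained SS give nearly the same value. The distinction matters mainly for small samples such as the 13 assignment marks.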


Estimating the Variance of the Estimate

• Uncertainty in predictions can be estimated using the assumption of homoscedasticity.

– (Etymology: hom- + Greek skedastikos able to disperse, from skedannynai to disperse)

– Thought question: does this also explain the origin of the verb skedaddle?

– In other words, homogeneity of variance in Y over the range of X.


Confidence Intervals for Predictions

$\hat{Y} \pm t_{crit}\,s_{est\,Y}\sqrt{1 + \frac{1}{N} + \frac{(X - \bar{X})^2}{(N-1)\,s_X^2}}$
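Here is a sketch of the interval for the assignment data, under stated assumptions. It follows the standard textbook prediction interval: $s_{est\,Y}$ is taken to be the unbiased estimate $\sqrt{\sum (Y - \hat{Y})^2/(N-2)}$, and $t_{crit}$ has N - 2 degrees of freedom. The transcript does not show which version of $s_{est\,Y}$ the slide intends. The critical value 2.201 (two-tailed .05, 11 df) is hardcoded from a standard t table. The function name is illustrative.

```python
from math import sqrt

# Assignment marks (%) from the example table: X = Assignment 1, Y = Assignment 2
X = [86.7, 81.5, 85.0, 85.5, 90.2, 95.4, 91.9, 93.1, 94.8, 93.6, 94.8, 94.2, 94.8]
Y = [81.8, 82.4, 84.3, 86.8, 83.6, 87.4, 93.1, 93.1, 91.8, 93.7, 93.1, 94.3, 95.6]

# Two-tailed .05 critical t for N - 2 = 11 degrees of freedom (standard t table)
T_CRIT_DF11 = 2.201


def mean(values):
    return sum(values) / len(values)


def prediction_interval(x_new, xs, ys, t_crit):
    """Return (lower, Y_hat, upper) for a new case with predictor value x_new."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    ss_x = sum((x - mx) ** 2 for x in xs)  # equals (N - 1) * s_X^2
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ss_x
    a = my - b * mx
    # Assumption: s_est Y is the unbiased estimate with N - 2 in the denominator
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_est = sqrt(ss_res / (n - 2))
    y_hat = a + b * x_new
    half_width = t_crit * s_est * sqrt(1 + 1 / n + (x_new - mx) ** 2 / ss_x)
    return y_hat - half_width, y_hat, y_hat + half_width


for x_new in (80.0, 90.0, 95.0):
    lower, y_hat, upper = prediction_interval(x_new, X, Y, T_CRIT_DF11)
    print(f"Assignment 1 = {x_new:.0f}%: predicted {y_hat:.1f}%, 95% interval [{lower:.1f}%, {upper:.1f}%]")
```

The interval is narrowest at $X = \bar{X}$. Because of the $(X - \bar{X})^2$ term, it widens as X moves away from the mean.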


Example: 6130A 2005-06 Assignment marks. These are the same Assignment 1 (X) and Assignment 2 (Y) data tabulated earlier: means 90.9% and 89.3%, sample standard deviations 4.66% and 5.04%.

$r = 0.7998$


Underlying Assumptions

• Independent random sampling

• Linearity

• Normal Distribution

• Homoscedasticity


Regressing X on Y

• Simply reverse the formulae, e.g., in terms of sample statistics:

$\hat{X} = a_{XY} + b_{XY}Y$ or $\hat{X} = r\frac{s_X}{s_Y}(Y - \bar{Y}) + \bar{X}$

where $b_{XY} = r\frac{s_X}{s_Y}$ and $a_{XY} = \bar{X} - b_{XY}\,\bar{Y}$
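Reversing the formulas gives a genuinely different line. It is not the Y-on-X line solved for X. This sketch on the assignment data shows that the product of the two slopes is $r^2$. The X-on-Y slope therefore equals $1/b_{YX}$ only when $|r| = 1$.

```python
from math import sqrt

# Assignment marks (%) from the example table: X = Assignment 1, Y = Assignment 2
X = [86.7, 81.5, 85.0, 85.5, 90.2, 95.4, 91.9, 93.1, 94.8, 93.6, 94.8, 94.2, 94.8]
Y = [81.8, 82.4, 84.3, 86.8, 83.6, 87.4, 93.1, 93.1, 91.8, 93.7, 93.1, 94.3, 95.6]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


mx, my = mean(X), mean(Y)
r = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / sqrt(
    sum((x - mx) ** 2 for x in X) * sum((y - my) ** 2 for y in Y)
)

b_yx = r * sample_sd(Y) / sample_sd(X)   # slope for predicting Y from X
b_xy = r * sample_sd(X) / sample_sd(Y)   # slope for predicting X from Y
a_xy = mx - b_xy * my

print(f"b_YX = {b_yx:.3f}, b_XY = {b_xy:.3f}, a_XY = {a_xy:.2f}%")
# If the X-on-Y line were just the Y-on-X line solved for X, b_XY would be 1 / b_YX
print(f"1 / b_YX = {1 / b_yx:.3f}")
print(f"b_YX * b_XY = {b_yx * b_xy:.4f}, r^2 = {r ** 2:.4f}")
```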


When to Use Linear Regression

• Prediction

• Statistical Control

– Adjust for the effects of a confounding variable.

– Also known as partialing out the effect of the confounding variable.

• Experimental Psychology: modeling the effect of a continuous independent variable on a continuous dependent variable.

– e.g., reaction time vs. set size in visual search.


Statistical Control Example: Mental Health

Women report more bad mental health days than men, t(8176)=-7.1, p<.001, 2-tailed.


Statistical Control Example: Physical Health


Correlation

Pearson’s r = 0.31


After Partialing Out Physical Health


Result of Partialing Out Physical Health

Controlling for physical health, women report more bad mental health days than men, t(8176)=-5.7, p<.001, 2-tailed.
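The slides do not show how physical health was partialed out. The sketch below illustrates one common approach, offered as an assumption rather than as the analysis actually used. First regress the outcome (bad mental health days) on the covariate (physical health). Then run the group comparison on the residuals. A t test on residuals keeps the same df, which would be consistent with both slides reporting t(8176), but that is only an inference. The function names and the toy numbers are hypothetical, and the toy numbers are not the survey data.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def partial_out(outcome, covariate):
    """Residuals of `outcome` after a least-squares regression on `covariate`."""
    mx, my = mean(covariate), mean(outcome)
    sp = sum((x - mx) * (y - my) for x, y in zip(covariate, outcome))
    b = sp / sum((x - mx) ** 2 for x in covariate)
    a = my - b * mx
    return [y - (a + b * x) for x, y in zip(covariate, outcome)]


def pooled_t(group1, group2):
    """Independent-samples t with pooled variance; returns (t, df = n1 + n2 - 2)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    ss = sum((v - m1) ** 2 for v in group1) + sum((v - m2) ** 2 for v in group2)
    df = n1 + n2 - 2
    return (m1 - m2) / sqrt(ss / df * (1 / n1 + 1 / n2)), df


def compare_groups_controlling_for(outcome, covariate, in_group1):
    """t test on the outcome after partialing out the covariate (sign: group 1 minus group 2)."""
    resid = partial_out(outcome, covariate)
    g1 = [e for e, flag in zip(resid, in_group1) if flag]
    g2 = [e for e, flag in zip(resid, in_group1) if not flag]
    return pooled_t(g1, g2)


# Hypothetical toy numbers for illustration only (not the survey data on the slides)
mental_days = [2, 5, 1, 7, 3, 6, 4, 8]
physical_days = [1, 4, 0, 6, 3, 5, 2, 7]
is_man = [True, False, True, False, True, False, True, False]

t_raw, df = pooled_t(
    [m for m, g in zip(mental_days, is_man) if g],
    [m for m, g in zip(mental_days, is_man) if not g],
)
t_ctrl, _ = compare_groups_controlling_for(mental_days, physical_days, is_man)
print(f"uncontrolled: t({df}) = {t_raw:.2f}; controlling for physical health: t({df}) = {t_ctrl:.2f}")
```

By construction, the residuals are uncorrelated with the covariate. Any group difference that remains in them therefore cannot be accounted for by a linear relationship with physical health.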
