Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics

Upload: bruno-collins

Post on 13-Dec-2015


Page 1

Linear correlation and linear regression + summary of tests

Dr. Omar Al Jadaan, Assistant Professor – Computer Science & Mathematics

Page 2

Recall: Covariance

$$\operatorname{cov}(x,y)=\frac{\sum_{i=1}^{n}(x_i-\bar{X})(y_i-\bar{Y})}{n-1}$$
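As a minimal sketch of the covariance formula, the function below uses the same n - 1 denominator. The function name and the data are illustrative assumptions, not taken from the slides.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Made-up data in which y rises with x, so the covariance is positive.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
print(sample_covariance(xs, ys))  # 5.0
```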

Page 3

Interpreting Covariance

cov(X,Y) > 0 → X and Y are positively correlated

cov(X,Y) < 0 → X and Y are inversely correlated

cov(X,Y) = 0 → X and Y are uncorrelated (independent variables have zero covariance, but zero covariance alone does not prove independence)
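Zero covariance only rules out a linear association. The sketch below uses made-up data in which Y is completely determined by X (Y = X squared), yet the sample covariance is exactly zero. The helper name is an assumption for illustration.

```python
def covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# X is symmetric around zero and Y depends on X through a curve, not a line.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
print(covariance(xs, ys))  # 0.0, although Y is a function of X
```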

Page 4

Correlation coefficient

Pearson’s Correlation Coefficient is standardized covariance (unitless):

$$r=\frac{\operatorname{cov}(x,y)}{\sqrt{\operatorname{var}x\,\operatorname{var}y}}$$
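To illustrate what "standardized" and "unitless" mean here, the sketch below computes r from the covariance and the two variances, then rescales X (as if changing its units) to show that r does not move while the covariance does. Names and data are assumptions for illustration.

```python
import math


def covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def pearson_r(xs, ys):
    """Covariance divided by the square root of the product of variances."""
    # var(x) is the covariance of x with itself.
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
xs_rescaled = [1000.0 * x for x in xs]  # same measurements in different units

print(covariance(xs, ys), covariance(xs_rescaled, ys))  # covariance scales by 1000
print(pearson_r(xs, ys), pearson_r(xs_rescaled, ys))    # r stays at about 0.8
```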

Page 5

Correlation

Measures the relative strength of the linear relationship between two variables

Unit-less

Ranges between –1 and 1

The closer to –1, the stronger the negative linear relationship

The closer to 1, the stronger the positive linear relationship

The closer to 0, the weaker any linear relationship

Page 6

Scatter Plots of Data with Various Correlation Coefficients

[Figure: six scatter plots of Y against X, labeled r = -1, r = -.6, r = 0, r = +.3, r = +1, and r = 0]

Page 7

Linear Correlation

[Figure: scatter plots of Y against X contrasting linear relationships with curvilinear relationships]

Page 8

Linear Correlation

[Figure: scatter plots of Y against X contrasting strong relationships with weak relationships]

Page 9

Linear Correlation

[Figure: scatter plots of Y against X showing no relationship]

Page 10

Some calculation formulas…

$$\hat{r}=\frac{\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}}{\sqrt{\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}\sqrt{\dfrac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n-1}}}=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}}=\frac{SS_{xy}}{\sqrt{SS_x\,SS_y}}$$

$$\hat{r}=\frac{SS_{xy}}{\sqrt{SS_x\,SS_y}}$$

Note: Easier computation formulas:

$$SS_{xy}=\sum_i x_i y_i-n\,\bar{x}\,\bar{y}$$

$$SS_x=\sum_i x_i^2-n\,\bar{x}^2$$

$$SS_y=\sum_i y_i^2-n\,\bar{y}^2$$
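The shortcut formulas can be checked against the definition with a short sketch. The function names and the data are illustrative assumptions. The shortcut form subtracts two large numbers, so on real data with a large mean and small spread the deviation form is usually the safer one numerically.

```python
import math


def r_from_deviations(xs, ys):
    """r computed from sums of squared deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_x * ss_y)


def r_from_shortcuts(xs, ys):
    """r computed from raw sums: sum(xy) - n*xbar*ybar, and so on."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    ss_x = sum(x * x for x in xs) - n * x_bar ** 2
    ss_y = sum(y * y for y in ys) - n * y_bar ** 2
    return ss_xy / math.sqrt(ss_x * ss_y)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
print(r_from_deviations(xs, ys), r_from_shortcuts(xs, ys))  # both about 0.8
```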

Page 11

Sampling distribution of correlation coefficient:

$$SE(\hat{r})=\sqrt{\frac{1-r^2}{n-2}}$$

*Note: like a proportion, the variance of the correlation coefficient depends on the correlation coefficient itself; substitute in the estimated r.

Under the null hypothesis of no correlation, the statistic r̂ / SE(r̂) follows a t-distribution with n-2 degrees of freedom (since you have to estimate the standard error).
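A small sketch of the standard error and the resulting test statistic, r divided by its standard error, which is what gets compared against a t-distribution with n - 2 degrees of freedom. The function names and the example values (r = 0.8 from n = 5 points) are assumptions for illustration; looking up the p-value is left to a t-table or a statistics library.

```python
import math


def correlation_se(r, n):
    """Standard error of r: sqrt((1 - r^2) / (n - 2)), with the estimated r plugged in."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    return math.sqrt((1.0 - r * r) / (n - 2))


def correlation_t_statistic(r, n):
    """t statistic for testing zero correlation, on n - 2 degrees of freedom."""
    return r / correlation_se(r, n)


r_hat, n = 0.8, 5
print(correlation_se(r_hat, n))           # about 0.346
print(correlation_t_statistic(r_hat, n))  # about 2.31, on 3 degrees of freedom
```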

Page 12

What is “Linear”?

Remember this: Y=mX+B?

[Figure: a straight line with intercept B and slope m]

Page 13

What’s Slope?

A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.

Page 14

Simple linear regression

The linear regression model:

Love of Math = 5 + .01*math SAT score

intercept: 5

slope: .01 (P=.22; not significant)

Page 15

Prediction

If you know something about X, this knowledge helps you predict something about Y. (Sound familiar? … Sound like conditional probabilities?)

Page 16

EXAMPLE

The distribution of baby weights at Stanford ~ N(3400, 360000), that is, a mean of 3400 grams and a variance of 360000 (a standard deviation of 600 grams).

Your “Best guess” at a random baby’s weight, given no information about the baby, is what?

3400 grams

But, what if you have relevant information? Can you make a better guess?

Page 17

Predictor variable X=gestation time

Assume that babies that gestate for longer are born heavier, all other things being equal.

Pretend (at least for the purposes of this example) that this relationship is linear.

Example: suppose a one-week increase in gestation, on average, leads to a 100-gram increase in birth-weight

Page 18

Y depends on X

[Figure: scatter plot of Y = birth weight (g) against X = gestation time (weeks) with the best-fit line]

The best-fit line is chosen such that the sum of the squared (why squared?) distances of the points (Yi’s) from the line is minimized.

Or mathematically… (remember max and mins from calculus)… set the derivatives of the sum of squares with respect to m and b to zero:

$$\frac{\partial}{\partial m}\sum_{i=1}^{n}\bigl(Y_i-(mX_i+b)\bigr)^2=0,\qquad \frac{\partial}{\partial b}\sum_{i=1}^{n}\bigl(Y_i-(mX_i+b)\bigr)^2=0$$
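Setting those derivatives to zero gives the familiar closed-form solution: slope m = SS_xy / SS_x and intercept b = mean(Y) - m * mean(X). The sketch below applies it to made-up gestation and birth-weight data; the function name and the numbers are assumptions, not values from the slides.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept from the closed-form solution."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    m = ss_xy / ss_x
    b = y_bar - m * x_bar
    return m, b


def sum_squared_errors(xs, ys, m, b):
    """The quantity the best-fit line minimizes."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: gestation time in weeks, birth weight in grams.
weeks = [20.0, 25.0, 30.0, 35.0, 40.0]
grams = [2100.0, 2400.0, 3050.0, 3500.0, 3950.0]

m, b = fit_line(weeks, grams)
print(m, b)  # slope 96 g/week, intercept 120 g

# Any other line does worse on the sum of squared errors.
print(sum_squared_errors(weeks, grams, m, b) < sum_squared_errors(weeks, grams, 100.0, 0.0))
```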

Page 19

Prediction

A new baby is born that had gestated for just 30 weeks. What’s your best guess at the birth-weight?

Are you still best off guessing 3400? NO!

Page 20

At 30 weeks…

[Figure: Y = birth weight (g) against X = gestation time (weeks), with X = 30 weeks marked against Y = 3000 g]

Page 21

At 30 weeks…

[Figure: Y = birth weight (g) against X = gestation time (weeks), with the point (x, y) = (30, 3000) marked on the line]

Page 22

At 30 weeks…

The babies that gestate for 30 weeks appear to center around a weight of 3000 grams.

In Math-Speak… E(Y | X=30 weeks) = 3000 grams

Note the conditional expectation

Page 23

But… Note that not every Y-value (Y_i) sits on the line. There’s variability.

Y_i = 3000 + random error_i

In fact, babies that gestate for 30 weeks have birth-weights that center at 3000 grams, but vary around 3000 with some variance σ²

Approximately what distribution do birth-weights follow? Normal. Y | X=30 weeks ~ N(3000, σ²)
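The conditional distribution at 30 weeks can be simulated as a sketch. The slides do not give a value for σ, so the 500 grams used here is purely an assumption, as are the names and the seed.

```python
import random

MEAN_AT_30_WEEKS = 3000.0  # from the slide: babies at 30 weeks center at 3000 g
SIGMA = 500.0              # assumed value; the slides leave sigma unspecified


def simulate_weights(n, mean=MEAN_AT_30_WEEKS, sigma=SIGMA, seed=0):
    """Draw n birth-weights as mean + normally distributed random error."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sigma) for _ in range(n)]


weights = simulate_weights(10_000)
sample_mean = sum(weights) / len(weights)
# For a normal distribution roughly 68% of values fall within one sigma of the mean.
within_one_sigma = sum(abs(w - MEAN_AT_30_WEEKS) <= SIGMA for w in weights) / len(weights)
print(round(sample_mean), round(within_one_sigma, 2))
```

Individual simulated weights scatter widely, but their average sits close to 3000 grams, which is the sense in which 3000 is the best guess at 30 weeks.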

Page 24

And, if X=20, 30, or 40…

[Figure: Y = birth weight (g) against X = gestation time (weeks), marked at X = 20, 30, and 40]

Page 25

If X=20, 30, or 40…

[Figure: Y = baby weights (g) against X = gestation times (weeks), marked at X = 20, 30, and 40]

Y | X=40 weeks ~ N(4000, σ²)

Y | X=30 weeks ~ N(3000, σ²)

Y | X=20 weeks ~ N(2000, σ²)

Page 26

Mean values fall on the line

E(Y | X=40 weeks) = 4000

E(Y | X=30 weeks) = 3000

E(Y | X=20 weeks) = 2000

E(Y | X) = μ_{Y|X} = 100 grams/week * X weeks

Page 27

Linear Regression Model

Y’s are modeled…

Y_i = 100*X_i + random error_i

100*X_i: fixed, exactly on the line

random error_i: follows a normal distribution
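Putting the two parts of the model together, the sketch below generates data from a fixed part (100 grams per week) plus normal random error and then fits a least-squares line, which should recover a slope near 100. The error standard deviation of 500 grams, the range of gestation times, the seed, and the function names are all assumptions for illustration.

```python
import random


def simulate(n, slope=100.0, sigma=500.0, seed=1):
    """Generate (X, Y) pairs from Y = slope * X + normal random error."""
    rng = random.Random(seed)
    xs = [rng.uniform(20.0, 40.0) for _ in range(n)]       # gestation, weeks
    ys = [slope * x + rng.gauss(0.0, sigma) for x in xs]   # fixed part + error
    return xs, ys


def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    m = ss_xy / ss_x
    return m, y_bar - m * x_bar


xs, ys = simulate(2000)
slope_hat, intercept_hat = fit_line(xs, ys)
print(round(slope_hat, 1), round(intercept_hat, 1))
```

The fitted slope and intercept differ slightly from 100 and 0 because of the random error; with more simulated babies the estimates tighten around the true line.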