Lesson 8: Linear Correlation and Regression


Upload: sumit-prajapati

Post on 17-Jan-2015


Page 1: Lesson 8 Linear Correlation And Regression

Simple Linear Regression and Correlation

Teaching Assistant: Zuo Xiaoyu

Chapter 8

Page 2: Lesson 8 Linear Correlation And Regression

Outline

• Discussion part
  • Steps of performing correlation and regression
  • The distinction and connection between linear correlation and regression
• Experiment part (incorporated in discussion part)
  • Simple linear correlation
  • Simple linear regression
• Exercise part

Page 3: Lesson 8 Linear Correlation And Regression

Discussion part

Page 4: Lesson 8 Linear Correlation And Regression

Case

In a study of the relationship between plasma amphetamine levels and amphetamine-induced psychosis, 10 amphetamine abusers with psychosis underwent psychiatric evaluation and were assigned a psychosis intensity score. At the same time, plasma amphetamine levels in these patients were determined. The results are shown in Table 8-1.

Data file: discussion.sav

Page 5: Lesson 8 Linear Correlation And Regression

Table 8-1 Psychosis intensity scores and plasma amphetamine levels for 10 chronic amphetamine abusers

Patient   Psychosis intensity score (Y)   Plasma amphetamine, mg/ml (X)
1         10                              150
2         30                              300
3         20                              250
4         15                              150
5         45                              450
6         35                              400
7         50                              425
8         15                              200
9         40                              350
10        55                              475

Page 6: Lesson 8 Linear Correlation And Regression

Question 1

Is there an intuitive relationship between plasma amphetamine levels and amphetamine-induced psychosis?

Scatter plot diagram

Both variables are random

Page 7: Lesson 8 Linear Correlation And Regression

Procedure

8.1.2 Data File
Variable Name: x; Variable Label: Plasma amphetamine (mg/ml)
Variable Name: y; Variable Label: Psychosis intensity score

8.1.3 Procedure
(1) Scatter diagram
From the menus, choose: Graphs → Scatter.
In the Scatter plot box, choose “simple” and click on the button.
In the Simple Scatter plot box, move y to the “Y Axis” box and x to the “X Axis” box, then click on the button.

Page 8: Lesson 8 Linear Correlation And Regression

Procedure

Page 9: Lesson 8 Linear Correlation And Regression

Scatter diagram

Page 10: Lesson 8 Linear Correlation And Regression

Different types of relation

Page 11: Lesson 8 Linear Correlation And Regression

Question 2

How to quantify the relationship between plasma amphetamine levels and amphetamine-induced psychosis?

Correlation coefficient
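
As a cross-check outside SPSS, the Pearson coefficient for Table 8-1 can be computed by hand. The sketch below is only an illustration: the function and variable names are assumptions, and it uses the sums of squares l_XX, l_YY and the sum of cross-products l_XY about the means.

```python
import math

# Table 8-1: plasma amphetamine (X) and psychosis intensity score (Y)
x = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]

def pearson_r(xs, ys):
    """Pearson r from sums of squares and cross-products about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    l_xx = sum((a - mean_x) ** 2 for a in xs)
    l_yy = sum((b - mean_y) ** 2 for b in ys)
    l_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    return l_xy / math.sqrt(l_xx * l_yy)

r = pearson_r(x, y)
print(f"r = {r:.3f}")  # r = 0.967
```

The value is close to +1, which agrees with the upward trend expected in the scatter diagram.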

Page 12: Lesson 8 Linear Correlation And Regression

Procedure

(2) From the menus, choose: Analyze → Correlate → Bivariate to open the “Bivariate Correlations” dialog box; move y and x to the “Variables” box; choose “Pearson” for Correlation Coefficients (default), or choose “Spearman” if the variables are not normally distributed; click on the button.

Page 13: Lesson 8 Linear Correlation And Regression
Page 14: Lesson 8 Linear Correlation And Regression

Output and Interpretation

Pearson correlation

Spearman correlation

Page 15: Lesson 8 Linear Correlation And Regression

Correlation Coefficient

Pearson correlation coefficient
• Both X and Y are random
• X and Y follow a bivariate normal distribution

Spearman’s rank correlation coefficient

Page 16: Lesson 8 Linear Correlation And Regression

Spearman’s rank correlation coefficient

It is useful for:
• ranked data
• measurement data that do not follow a normal distribution, whose distribution is uncertain, or that are not precisely measured
• cases where X or Y is an ordinal variable
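
A minimal sketch of the rank-based calculation, assuming tied values receive the average of the ranks they occupy and that Spearman's coefficient is taken as the Pearson coefficient of those ranks. The helper names are illustrative and not part of the SPSS procedure.

```python
import math

# Table 8-1: plasma amphetamine (X) and psychosis intensity score (Y)
x = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]

def average_ranks(values):
    """Rank from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    l_xy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    l_xx = sum((a - mx) ** 2 for a in xs)
    l_yy = sum((b - my) ** 2 for b in ys)
    return l_xy / math.sqrt(l_xx * l_yy)

r_s = pearson_r(average_ranks(x), average_ranks(y))
print(f"r_s = {r_s:.3f}")  # r_s = 0.966
```

Because only the ordering of the observations is used, the result does not depend on the normality of X or Y.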

Page 17: Lesson 8 Linear Correlation And Regression

Understanding r

The direction of correlation?
-- positive or negative

The strength of correlation? High or not?
-- Is the absolute value big enough?

Complete correlation: +1 or -1, $-1 \le r \le 1$

Page 18: Lesson 8 Linear Correlation And Regression

Question 3

Can we conclude that plasma amphetamine levels are correlated with amphetamine-induced psychosis in the population?

What is the actual situation in the population?

Hypothesis testing on r

Interval estimate of ρ

Page 19: Lesson 8 Linear Correlation And Regression

Hypothesis testing and interval estimation

t test (assume normal distribution)
H0: ρ = 0, H1: ρ ≠ 0

$$t_r = \frac{r - 0}{\sqrt{\dfrac{1 - r^2}{n - 2}}}, \qquad \nu = n - 2$$

Interval estimation
Inverse of hyperbolic tangent
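
The test and the interval can be sketched for the Table 8-1 data as follows. Two things here are assumptions not shown on the slide: the interval uses Fisher's transformation z = atanh(r) with standard error 1/sqrt(n - 3), and the critical value 2.306 is the two-sided 5% point of t with 8 degrees of freedom, taken from a t table. Variable names are illustrative.

```python
import math

# Table 8-1: plasma amphetamine (X) and psychosis intensity score (Y)
x = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
l_xx = sum((a - mx) ** 2 for a in x)
l_yy = sum((b - my) ** 2 for b in y)
l_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = l_xy / math.sqrt(l_xx * l_yy)

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t_r = r / math.sqrt((1 - r ** 2) / (n - 2))

# Two-sided 5% critical value for 8 degrees of freedom (from a t table)
T_CRIT_8DF = 2.306
reject_h0 = abs(t_r) > T_CRIT_8DF

# Approximate 95% interval for rho: transform, add the margin, back-transform
z = math.atanh(r)
half_width = 1.96 / math.sqrt(n - 3)
rho_low = math.tanh(z - half_width)
rho_high = math.tanh(z + half_width)

print(f"t = {t_r:.2f}, reject H0: {reject_h0}")
print(f"95% CI for rho: ({rho_low:.3f}, {rho_high:.3f})")
```

The interval is not symmetric about r because the back-transformation tanh compresses values near 1.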

Page 20: Lesson 8 Linear Correlation And Regression

Short summary

• Scatter plot diagram
• Compute correlation index (descriptive)
• Is the index statistically significant? -- hypothesis testing (inference)
• Interpretation of correlation coefficient (application)

Page 21: Lesson 8 Linear Correlation And Regression

Question 4

Could we predict the psychosis intensity score from the plasma amphetamine level?

EX: Could we estimate and predict the psychosis intensity score when the plasma amphetamine level is 440 or 460?

Linear regression

Page 22: Lesson 8 Linear Correlation And Regression

Procedure

From the menus, choose: Analyze → Regression → Linear to open the “Linear Regression” dialog box; move y to the “Dependent” box and x to the “Independent” box; click on the button.

Page 23: Lesson 8 Linear Correlation And Regression

[Figure: plot with axes Y and X]

Page 24: Lesson 8 Linear Correlation And Regression

Output and Interpretation

Intercept and slope
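
For comparison with the SPSS coefficients table, a least-squares sketch on the Table 8-1 data. The slope is l_XY / l_XX; the intercept formula a = mean(Y) - b * mean(X) is the standard least-squares result and is an addition here, as are the function and variable names.

```python
# Table 8-1: plasma amphetamine (X) and psychosis intensity score (Y)
x = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]

def fit_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line Y_hat = a + b X."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    l_xx = sum((v - mean_x) ** 2 for v in xs)
    l_xy = sum((v - mean_x) * (w - mean_y) for v, w in zip(xs, ys))
    b = l_xy / l_xx
    a = mean_y - b * mean_x
    return a, b

a, b = fit_line(x, y)
print(f"Y_hat = {a:.3f} + {b:.4f} X")  # Y_hat = -8.077 + 0.1256 X

# Point predictions for the two plasma levels asked about in Question 4
for level in (440, 460):
    print(level, round(a + b * level, 1))  # 47.2 and 49.7
```

Both plasma levels lie inside the observed range of X (150 to 475), so these are interpolations rather than extrapolations.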

Page 25: Lesson 8 Linear Correlation And Regression

Question 5

Does this regression equation hold in the study population?

Could we use this regression equation to predict the psychosis intensity score when the plasma amphetamine level is 440 and 460, respectively?

Hypothesis testing on the whole equation: ANOVA

Page 26: Lesson 8 Linear Correlation And Regression

Output and Interpretation

ANOVA result
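
The ANOVA table splits the total variation of Y into a regression part and a residual part. A sketch of that decomposition for Table 8-1, with illustrative variable names; the critical value 5.32 for F(1, 8) at the 5% level is taken from an F table and is an assumption added here.

```python
# Table 8-1: plasma amphetamine (X) and psychosis intensity score (Y)
x = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
l_xx = sum((v - mean_x) ** 2 for v in x)
l_xy = sum((v - mean_x) * (w - mean_y) for v, w in zip(x, y))
b = l_xy / l_xx
a = mean_y - b * mean_x
fitted = [a + b * v for v in x]

# Sums of squares: total = regression + residual
ss_total = sum((w - mean_y) ** 2 for w in y)
ss_regression = sum((f - mean_y) ** 2 for f in fitted)
ss_residual = sum((w - f) ** 2 for w, f in zip(y, fitted))

# Degrees of freedom: 1 for the single predictor, n - 2 for the residual
df_regression, df_residual = 1, n - 2
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

F_CRIT_1_8 = 5.32  # 5% critical value of F(1, 8), from an F table
print(f"SS total = {ss_total:.1f}, regression = {ss_regression:.2f}, "
      f"residual = {ss_residual:.2f}")
print(f"F = {f_stat:.1f}, significant at 5%: {f_stat > F_CRIT_1_8}")
```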

Page 27: Lesson 8 Linear Correlation And Regression

Question 6

What proportion of the variation in the psychosis intensity score can be explained by the plasma amphetamine level?

Can we regard the plasma amphetamine level as an influencing factor of amphetamine-induced psychosis?

R square

Hypothesis testing on the regression coefficient: t test

Page 28: Lesson 8 Linear Correlation And Regression

Output and Interpretation

R square

Hypothesis testing on the regression coefficient
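
Both quantities can be checked by hand for Table 8-1. The sketch assumes the usual standard error of the slope, s_b = s_{Y.X} / sqrt(l_XX), where s_{Y.X} is the residual standard deviation with n - 2 degrees of freedom; that formula and the variable names are additions, not taken from the slides.

```python
import math

# Table 8-1: plasma amphetamine (X) and psychosis intensity score (Y)
x = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
l_xx = sum((v - mean_x) ** 2 for v in x)
l_yy = sum((w - mean_y) ** 2 for w in y)
l_xy = sum((v - mean_x) * (w - mean_y) for v, w in zip(x, y))

b = l_xy / l_xx
ss_regression = l_xy ** 2 / l_xx   # variation of Y explained by X
ss_residual = l_yy - ss_regression

# Proportion of the variation in Y explained by the regression
r_squared = ss_regression / l_yy

# t test for H0: beta = 0, with n - 2 degrees of freedom
s_yx = math.sqrt(ss_residual / (n - 2))  # residual standard deviation
se_b = s_yx / math.sqrt(l_xx)            # standard error of the slope
t_b = b / se_b

print(f"R square = {r_squared:.3f}")  # R square = 0.936
print(f"t_b = {t_b:.2f}")             # t_b = 10.80
```

With a single predictor, t_b squared equals the F statistic of the ANOVA table, so the two tests lead to the same conclusion.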

Page 29: Lesson 8 Linear Correlation And Regression

Short summary

• Scatter plot diagram
• Compute the slope and intercept of the sample (descriptive)
• Is the regression equation significant? -- ANOVA (inference)
• Is the regression coefficient significant? -- one-sample t test (inference)
• Interpretation and application of the regression model (application)

Page 30: Lesson 8 Linear Correlation And Regression

Basic assumptions -- LINE

(1) Linear: There exists a linear tendency between the dependent variable and the independent variable.

(2) Independent: The individual observations are independent of each other.

(3) Normal: Given the value of X, the corresponding Y follows a normal distribution.

(4) Equal variances: The variances of Y for different values of X are all equal, denoted by $\sigma^2$.

Page 31: Lesson 8 Linear Correlation And Regression

Pre-requisite for linear regression

(1) Linear: There exists a linear tendency between the dependent variable and the independent variable.

(2) Independent: The individual observations are independent of each other.

(3) Normal: Given the value of X, the corresponding Y follows a normal distribution.

(4) Equal variances: The variances of Y for different values of X are all equal, denoted by $\sigma^2$.

Page 32: Lesson 8 Linear Correlation And Regression

Summary of discussion part

Page 33: Lesson 8 Linear Correlation And Regression

Two types of questions:

Whether there is a linear relationship?
-- Linear correlation

How to predict one variable by another variable?
-- Linear regression

Page 34: Lesson 8 Linear Correlation And Regression

Summary

The distinction and connection between linear correlation and regression?
• Basic concepts
• Basic assumptions for data
• Correlation coefficient and regression coefficient

Page 35: Lesson 8 Linear Correlation And Regression

Summary

Assumptions:
Correlation: both X and Y are random
Regression: (LINE)
  Y must be random
  X could be random or not random

Correlation Coefficient (r)

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{l_{XY}}{\sqrt{l_{XX}\, l_{YY}}}$$

where

$$l_{XX} = \sum (X - \bar{X})^2, \qquad l_{YY} = \sum (Y - \bar{Y})^2, \qquad l_{XY} = \sum (X - \bar{X})(Y - \bar{Y})$$

Page 36: Lesson 8 Linear Correlation And Regression

Summary

Linear Regression Equation, Regression Coefficient (b)

Try to estimate $\alpha$ and $\beta$ of the population regression line $\mu_{Y|X} = \alpha + \beta X$, getting

$$\hat{Y} = a + bX$$

$$b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{l_{XY}}{l_{XX}}$$

Page 37: Lesson 8 Linear Correlation And Regression

Summary

Connection: when both X and Y are random
1) Same sign for the correlation coefficient and the regression coefficient
2) The t tests are equivalent: $t_r = t_b$
3) Determination coefficient: $R^2 = SS_{regression} / SS_{total}$, and $R^2 = r^2$
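
The three connections follow from the definitions of r and b. A short derivation, writing $l_{XX}$, $l_{YY}$, $l_{XY}$ for the sums of squares and cross-products about the means, and assuming the standard results $SS_{regression} = l_{XY}^2 / l_{XX}$ and $s_b = s_{Y \cdot X} / \sqrt{l_{XX}}$ (neither is shown on the slides):

```latex
% 1) Same sign: the square root is positive, so b and r share the sign of l_XY
b = \frac{l_{XY}}{l_{XX}} = r \sqrt{\frac{l_{YY}}{l_{XX}}}

% 3) Determination coefficient equals r squared
R^2 = \frac{SS_{regression}}{SS_{total}}
    = \frac{l_{XY}^2 / l_{XX}}{l_{YY}}
    = \frac{l_{XY}^2}{l_{XX}\, l_{YY}} = r^2

% 2) Equivalent t tests: the residual variance is (1 - r^2) l_YY / (n - 2)
s_{Y \cdot X}^2 = \frac{SS_{total} - SS_{regression}}{n - 2}
                = \frac{(1 - r^2)\, l_{YY}}{n - 2}

t_b = \frac{b}{s_{Y \cdot X} / \sqrt{l_{XX}}}
    = \frac{r \sqrt{l_{YY}}}{\sqrt{(1 - r^2)\, l_{YY} / (n - 2)}}
    = \frac{r}{\sqrt{(1 - r^2) / (n - 2)}} = t_r
```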

Page 38: Lesson 8 Linear Correlation And Regression

Correlation versus regression

Implication
  Correlation: Quantify the relationship between two or more variables.
  Regression: Investigate the dependency relationship between the independent and dependent variables.

Pre-requisite
  Correlation: Bivariate normal distribution.
  Regression: The dependent variable is a normally distributed random variable.

Application
  Correlation: Investigate the quantitative association.
  Regression:
    1. Investigate the quantitative dependency relationship between variables
    2. Prediction
    3. Variable selection

Connection
  1. The correlation coefficient has the same sign as the regression coefficient.
  2. The hypothesis tests for the correlation coefficient and the regression coefficient are equivalent.
  3. For bivariate normally distributed variables, regression can be used to interpret correlation: a high coefficient of determination indicates that X is closely correlated with Y.

Page 39: Lesson 8 Linear Correlation And Regression

Discussion: true or false?

1. Can any two variables be put together for correlation and regression?
(× They must have some relation in subject matter.)

2. Do correlation and regression mean causality?
(× Sometimes the relation may be indirect, or there may be no real relation at all.)

3. Does a big value of r mean a big regression coefficient b? (×)

4. Does rejecting $H_0: \rho = 0$ mean that the correlation is strong?
(× It only means $\rho \ne 0$.)

Page 40: Lesson 8 Linear Correlation And Regression

Discussion: true or false?

5. Does a statistically significant regression equation mean that one can predict Y well from X?
(× Whether the prediction is good depends on the coefficient of determination.)

6. Is the regression equation allowed to be applied beyond the range of the data set?
(×)

Page 41: Lesson 8 Linear Correlation And Regression

Exercise

To explore the correlation between the heights of father and son, 20 male graduates were randomly selected from a name list of graduates of a high school. The heights (cm) of the fathers and sons were measured.

(1) What is the relationship between the heights of father and son?

(2) Can we predict the son’s height if the father’s height is 166 cm?

Page 42: Lesson 8 Linear Correlation And Regression

Heights (cm) of 20 pairs of father and son

No.                  1   2   3   4   5   6   7   8   9   10
Father’s height, X   150 153 155 158 161 164 165 167 168 169
Son’s height, Y      159 157 163 166 169 170 169 167 169 170

No.                  11  12  13  14  15  16  17  18  19  20
Father’s height, X   170 171 172 174 175 177 178 181 183 185
Son’s height, Y      173 170 170 176 178 174 173 178 176 180
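
One way to start on the exercise outside SPSS is sketched below. It reuses the sums-of-squares formulas from the summary slides; the variable names are illustrative, and the scatter plot, significance tests and assumption checks asked for in the discussion part are left to the reader.

```python
import math

# Heights (cm) of the 20 father-son pairs from the table above
father = [150, 153, 155, 158, 161, 164, 165, 167, 168, 169,
          170, 171, 172, 174, 175, 177, 178, 181, 183, 185]
son = [159, 157, 163, 166, 169, 170, 169, 167, 169, 170,
       173, 170, 170, 176, 178, 174, 173, 178, 176, 180]

n = len(father)
mean_x, mean_y = sum(father) / n, sum(son) / n
l_xx = sum((v - mean_x) ** 2 for v in father)
l_yy = sum((w - mean_y) ** 2 for w in son)
l_xy = sum((v - mean_x) * (w - mean_y) for v, w in zip(father, son))

# Question (1): strength and direction of the linear relationship
r = l_xy / math.sqrt(l_xx * l_yy)

# Question (2): regression of son's height on father's height
b = l_xy / l_xx
a = mean_y - b * mean_x
predicted_son = a + b * 166  # 166 cm lies inside the observed range 150-185

print(f"r = {r:.3f}")
print(f"Y_hat = {a:.2f} + {b:.3f} X, prediction at 166 cm: {predicted_son:.1f}")
```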

Page 43: Lesson 8 Linear Correlation And Regression

About Homework

Test for Difference: treated as paired design (McNemar test)

Test for Association: treated as independent design ($\chi^2$ test)

Page 44: Lesson 8 Linear Correlation And Regression

Assignment

P. 129, No. 5

Page 45: Lesson 8 Linear Correlation And Regression

Thank you!!!