lesson 8 linear correlation and regression
Simple Linear Regression and Correlation
Teaching Assistant: Zuo Xiaoyu
Chapter 8
Outline
Discussion part
  Steps of performing correlation and regression
  The distinction and connection between linear correlation and regression
Experiment part (incorporated in discussion part)
  Simple linear correlation
  Simple linear regression
Exercise part
Discussion part
Case
In a study of the relationship between plasma amphetamine levels and amphetamine-induced psychosis, 10 chronic amphetamine abusers underwent psychiatric evaluation and were assigned a psychosis intensity score. At the same time, plasma amphetamine levels in these patients were determined. The results are shown in Table 8-1.
Data file: discussion.sav
Table 8-1 Psychosis intensity scores and plasma amphetamine levels for 10 chronic amphetamine abusers
Patient  Psychosis intensity score (Y)  Plasma amphetamine, mg/ml (X)
1        10                             150
2        30                             300
3        20                             250
4        15                             150
5        45                             450
6        35                             400
7        50                             425
8        15                             200
9        40                             350
10       55                             475
Question 1
Is there an intuitive relationship between plasma amphetamine levels and amphetamine-induced psychosis?
Scatter plot diagram
Both variables are random
Procedure
8.1.2 Data File
Variable Name: x; Variable Label: Plasma amphetamine (mg/ml)
Variable Name: y; Variable Label: Psychosis intensity scores
8.1.3 Procedure
(1) Scatter diagram
From the menus, choose: Analyze > Graphs > Scatter.
In the scatter plot box, choose “simple” and click on the button.
In the simple Scatter plot box, move y to the “Y Axis” box and x to the “X Axis” box, then click on the button.
Procedure
Scatter diagram
Different types of relation
Question 2
How to quantify the relationship between plasma amphetamine levels and amphetamine-induced psychosis?
Correlation coefficient
Procedure
(2) From the menus, choose: Analyze > Correlate > Bivariate to open the “Bivariate Correlations” dialog box; move y and x to the “Variables” box; choose “Pearson” for Correlation Coefficients (default), or choose “Spearman” if the variables are not normally distributed; click on the button.
Output and Interpretation
Pearson correlation
Spearman correlation
Correlation Coefficient
Pearson correlation coefficient
• Both X and Y are random
• X and Y follow a bivariate normal distribution
Spearman’s rank correlation coefficient
It is useful for:
• ranked data
• measurement data that do not follow a normal distribution, whose distribution is uncertain, or that are not precisely measured
• cases where X or Y is an ordinal variable
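As a rough cross-check outside SPSS, both coefficients can be sketched in plain Python on the Table 8-1 data. This is only an illustration, not the SPSS procedure: the helper names `pearson_r` and `average_ranks` are made up for this sketch, and tied values are assumed to receive the average of their ranks before the Spearman coefficient is computed as a Pearson coefficient on the ranks.

```python
import math

# Table 8-1: plasma amphetamine (X) and psychosis intensity score (Y)
x = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]

def pearson_r(u, v):
    """Pearson r = l_uv / sqrt(l_uu * l_vv) (hypothetical helper)."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    l_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    l_uu = sum((a - mean_u) ** 2 for a in u)
    l_vv = sum((b - mean_v) ** 2 for b in v)
    return l_uv / math.sqrt(l_uu * l_vv)

def average_ranks(values):
    """Rank from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

r = pearson_r(x, y)
r_s = pearson_r(average_ranks(x), average_ranks(y))  # Spearman: Pearson on ranks
print(f"Pearson r = {r:.4f}, Spearman r_s = {r_s:.4f}")
```

Both values come out close to 0.97 for these 10 patients, so the two coefficients tell the same story here.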
The direction of correlation?
-- positive or negative
The strength of correlation? High or not?
-- Is the absolute value big enough?
Complete correlation: +1 or -1
$-1 \le r \le 1$
Understanding the r
Question 3
Can we draw a conclusion that plasma amphetamine levels are correlated with amphetamine-induced psychosis in the population?
What is the actual situation in the population?
Hypothesis testing on r
Interval estimate of ρ
Hypothesis testing and interval estimation
t test (assume normal distribution)
$H_0: \rho = 0$, $H_1: \rho \ne 0$
$$t_r = \frac{r - 0}{\sqrt{\dfrac{1 - r^2}{n - 2}}}, \quad \nu = n - 2$$
Interval estimation
Inverse of hyperbolic tangent
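The test and the interval can be sketched numerically for the Table 8-1 data. The slide only names the inverse hyperbolic tangent, so the interval below assumes the usual Fisher z approach with standard error $1/\sqrt{n-3}$; that standard error and the tabulated critical values 2.306 and 1.96 are assumptions of this sketch rather than something stated in the lesson.

```python
import math

# Table 8-1: plasma amphetamine (X) and psychosis intensity score (Y)
x = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
l_xx = sum((a - mean_x) ** 2 for a in x)
l_yy = sum((b - mean_y) ** 2 for b in y)
l_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
r = l_xy / math.sqrt(l_xx * l_yy)

# t test of H0: rho = 0 with nu = n - 2 degrees of freedom
t_r = (r - 0) / math.sqrt((1 - r ** 2) / (n - 2))
T_CRIT = 2.306  # two-sided 5% critical value of t with 8 df (from a t table)
reject_h0 = abs(t_r) > T_CRIT

# Interval estimate of rho via z = atanh(r), the inverse hyperbolic tangent
z = math.atanh(r)
se_z = 1 / math.sqrt(n - 3)  # assumed large-sample standard error of z
Z_CRIT = 1.96                # two-sided 5% critical value, standard normal
rho_low = math.tanh(z - Z_CRIT * se_z)
rho_high = math.tanh(z + Z_CRIT * se_z)

print(f"t_r = {t_r:.2f}, reject H0: {reject_h0}")
print(f"approximate 95% interval for rho: ({rho_low:.3f}, {rho_high:.3f})")
```

With only 10 patients the interval is fairly wide, even though the t statistic is far beyond the critical value.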
Short summary
• Scatter plot diagram
• Compute correlation index (descriptive)
• Is the index statistically significant? -- hypothesis testing (inference)
• Interpretation of correlation coefficient (application)
Question 4
Could we predict the psychosis intensity score from the plasma amphetamine levels?
EX: Could we estimate and predict the psychosis intensity score when the plasma amphetamine level is 440 and 460?
Linear regression
Procedure
From the menus, choose: Analyze > Regression > Linear to open the “Linear Regression” dialog box; move y to the “Dependent” box and move x to the “Independent” box; click on the button.
Output and Interpretation
Intercept and slope
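To see where the intercept and slope in the SPSS output come from, the least-squares estimates can be sketched by hand for Table 8-1. This is a minimal sketch assuming the ordinary least-squares formulas $b = l_{XY}/l_{XX}$ and $a = \bar{Y} - b\bar{X}$; the function name `predict` is hypothetical.

```python
# Table 8-1: plasma amphetamine (X) and psychosis intensity score (Y)
x = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
l_xx = sum((xi - mean_x) ** 2 for xi in x)
l_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = l_xy / l_xx          # slope (regression coefficient)
a = mean_y - b * mean_x  # intercept

def predict(level):
    """Predicted psychosis intensity score at a given plasma level."""
    return a + b * level

print(f"Y-hat = {a:.3f} + {b:.4f} X")
for level in (440, 460):
    print(f"X = {level}: predicted score = {predict(level):.2f}")
```

Both 440 and 460 lie inside the observed range of X (150 to 475), so these predictions do not extrapolate beyond the data.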
Question 5
Could this regression equation be established in the population under study?
Could we use this regression equation to predict the psychosis intensity score when the plasma amphetamine level is 440 and 460, respectively?
Hypothesis testing on the total equation--ANOVA
Output and Interpretation
ANOVA result
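The ANOVA table splits the total sum of squares of Y into a regression part and a residual part. A small sketch of that decomposition for Table 8-1 follows; the critical value 5.32 for F with (1, 8) degrees of freedom at the 5% level is taken from a standard F table and is an assumption of this sketch, not a value shown in the lesson.

```python
# Table 8-1: plasma amphetamine (X) and psychosis intensity score (Y)
x = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
l_xx = sum((xi - mean_x) ** 2 for xi in x)
l_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b = l_xy / l_xx
a = mean_y - b * mean_x
fitted = [a + b * xi for xi in x]

# Sum-of-squares decomposition: SS_total = SS_regression + SS_residual
ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_reg = sum((fi - mean_y) ** 2 for fi in fitted)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

df_reg, df_res = 1, n - 2
f_stat = (ss_reg / df_reg) / (ss_res / df_res)
F_CRIT = 5.32  # 5% critical value of F(1, 8), from an F table

print(f"SS_total = {ss_total:.2f}, SS_reg = {ss_reg:.2f}, SS_res = {ss_res:.2f}")
print(f"F = {f_stat:.2f}, significant at 5%: {f_stat > F_CRIT}")
```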
Question 6
What proportion of the variation in the psychosis intensity score can be explained by the plasma amphetamine levels?
Could we view the plasma amphetamine level as an influencing factor of amphetamine-induced psychosis?
R square
Hypothesis testing on regression coefficient---t-test
Output and Interpretation
R square
Hypothesis testing on the regression coefficient
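Both quantities can be recomputed by hand as a check on the output. The sketch below assumes the usual formulas $R^2 = SS_{regression}/SS_{total}$ and $t_b = b / SE(b)$ with $SE(b) = s_{Y \cdot X}/\sqrt{l_{XX}}$; the variable names are illustrative only.

```python
import math

# Table 8-1: plasma amphetamine (X) and psychosis intensity score (Y)
x = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
l_xx = sum((xi - mean_x) ** 2 for xi in x)
l_yy = sum((yi - mean_y) ** 2 for yi in y)
l_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = l_xy / l_xx
ss_reg = l_xy ** 2 / l_xx  # regression sum of squares
ss_res = l_yy - ss_reg     # residual sum of squares

r_square = ss_reg / l_yy   # share of the variation in Y explained by X

s_yx = math.sqrt(ss_res / (n - 2))  # residual standard deviation
se_b = s_yx / math.sqrt(l_xx)       # standard error of the slope
t_b = b / se_b                      # t statistic for H0: beta = 0, nu = n - 2

print(f"R square = {r_square:.4f}")
print(f"b = {b:.4f}, SE(b) = {se_b:.4f}, t_b = {t_b:.2f}")
```

About 94% of the variation in the psychosis intensity score is explained by the plasma amphetamine level in this sample.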
Short summary
• Scatter plot diagram
• Compute the slope and intercept of the sample (descriptive)
• Is the regression equation significant? -- ANOVA (inference)
• Is the regression coefficient significant? -- one-sample t-test (inference)
• Interpretation and application of the regression model (application)
Basic assumptions -- LINE (pre-requisites for linear regression)
(1) Linear: There exists a linear tendency between the dependent variable and the independent variable.
(2) Independent: The individual observations are independent of each other.
(3) Normal: Given the value of X, the corresponding Y follows a normal distribution.
(4) Equal variances: The variances of Y for different values of X are all equal, denoted by $\sigma^2$.
Summary of discussion part
Two types of questions:
Whether there is a linear relationship?
-- Linear correlation
How to predict one variable by another variable?
-- Linear regression
Summary
The distinction and connection between linear correlation and regression:
• Basic concepts
• Basic assumptions for data
• Correlation coefficient and regression coefficient
Summary
Assumptions:
Correlation: both X and Y are random
Regression (LINE): Y must be random; X could be random or not random
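Correlation Coefficient (r)
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{l_{XY}}{\sqrt{l_{XX}\, l_{YY}}}$$
where $l_{XX} = \sum (X - \bar{X})^2$, $l_{YY} = \sum (Y - \bar{Y})^2$, and $l_{XY} = \sum (X - \bar{X})(Y - \bar{Y})$.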
Summary
Linear Regression Equation, Regression Coefficient (b)
Try to estimate $\alpha$ and $\beta$ in $\mu_{Y|X} = \alpha + \beta X$, getting
$$\hat{Y} = a + bX$$
$$b = \frac{l_{XY}}{l_{XX}} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
Summary
Connection: when both X and Y are random
1) The correlation coefficient and the regression coefficient have the same sign
2) The t tests are equivalent: $t_r = t_b$
3) Coefficient of determination: $R^2 = SS_{regression} / SS_{total}$, and $R^2 = r^2$
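These three connections can be checked numerically on the Table 8-1 data. The sketch below is only a sanity check under the standard formulas for $r$, $b$, $t_r$ and $t_b$; it does not replace the algebraic argument, and the variable names are illustrative.

```python
import math

# Table 8-1: plasma amphetamine (X) and psychosis intensity score (Y)
x = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
l_xx = sum((xi - mean_x) ** 2 for xi in x)
l_yy = sum((yi - mean_y) ** 2 for yi in y)
l_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = l_xy / math.sqrt(l_xx * l_yy)  # correlation coefficient
b = l_xy / l_xx                    # regression coefficient

# 1) Same sign: both share the sign of l_xy
same_sign = (r > 0) == (b > 0)

# 2) Equivalent t tests
t_r = r / math.sqrt((1 - r ** 2) / (n - 2))
ss_res = l_yy - l_xy ** 2 / l_xx
t_b = b / (math.sqrt(ss_res / (n - 2)) / math.sqrt(l_xx))

# 3) Coefficient of determination equals r squared
r_square = (l_xy ** 2 / l_xx) / l_yy

print(f"same sign: {same_sign}")
print(f"t_r = {t_r:.4f}, t_b = {t_b:.4f}")
print(f"R^2 = {r_square:.4f}, r^2 = {r ** 2:.4f}")
```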
Implication
  Correlation: Quantify the relationship between two or more variables.
  Regression: Investigate the dependency relationship between the independent and dependent variables.
Pre-requisite
  Correlation: Bivariate normal distribution.
  Regression: The dependent variable is a normally distributed random variable.
Application
  Correlation: Investigate the quantitative association.
  Regression: 1. Investigate the quantitative dependency relationship between variables. 2. Prediction. 3. Variable selection.
Connection
  1. The correlation coefficient has the same sign as the regression coefficient.
  2. The hypothesis tests for the correlation coefficient and the regression coefficient are equivalent.
  3. For bivariate normally distributed variables, regression can be used to interpret correlation: a high coefficient of determination indicates that X is closely correlated with Y.
Discussion: true or false?
1. Can any two variables be put together for correlation and regression?
(× They must have some relation in subject matter.)
2. Do correlation and regression mean causality?
(× Sometimes the relation may be indirect, or there may be no real relation at all.)
3. Does a big value of r mean a big regression coefficient b?
(×)
4. Does rejecting $H_0: \rho = 0$ mean that the correlation is strong?
(× It only means that $\rho \ne 0$.)
5. Does a statistically significant regression equation mean that Y can be predicted well from X?
(× Whether the prediction is good depends on the coefficient of determination.)
6. Is the regression equation allowed to be applied beyond the range of the data set?
(×)
Exercise
To explore the correlation between the heights of fathers and sons, 20 male graduates were randomly selected from a name list of graduates of a high school. The heights (cm) of the fathers and sons were measured.
(1) What is the relationship between the heights of father and son?
(2) Can we predict the son’s height if the father’s height is 166 cm?

Heights (cm) of 20 pairs of father and son
No.                   1    2    3    4    5    6    7    8    9   10
Father’s height, X  150  153  155  158  161  164  165  167  168  169
Son’s height, Y     159  157  163  166  169  170  169  167  169  170

No.                  11   12   13   14   15   16   17   18   19   20
Father’s height, X  170  171  172  174  175  177  178  181  183  185
Son’s height, Y     173  170  170  176  178  174  173  178  176  180
About Homework
Test for Difference -- treated as paired design (McNemar test)
Test for Association -- treated as independent design ($\chi^2$ test)
Assignment
P 129 N. 5
Thank you!!!