correlation and linear regression peter t. donnan professor of epidemiology and biostatistics...
Post on 13-Dec-2015
Correlation and Linear Regression
Peter T. Donnan
Professor of Epidemiology and Biostatistics
Statistics for Health Research
CONTENTS
• Correlation coefficients
  • meaning
  • values
  • role
  • significance
• Regression
  • line of best fit
  • prediction
  • significance
INTRODUCTION
• Correlation
  • the strength of the linear relationship between two variables
• Regression analysis
  • determines the nature of the relationship
• For example: is there a relationship between the number of units of alcohol consumed and the likelihood of developing cirrhosis of the liver?
PEARSON’S COEFFICIENT OF CORRELATION (r)
• Measures the strength of the linear relationship between one dependent and one independent variable
  • curvilinear relationships need other techniques
• Values lie between +1 and -1
  • perfect positive correlation: r = +1
  • perfect negative correlation: r = -1
  • no linear relationship: r = 0
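As a minimal sketch of how r behaves at these extremes, the snippet below computes Pearson's r from its standard definition (the covariance scaled by both standard deviations). The function name `pearson_r` and the data values are illustrative assumptions, not taken from the slides.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical illustrative values, not data from the slides.
x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # points on a rising line
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # points on a falling line
r_none = pearson_r(x, [2, 1, 0, 1, 2])       # symmetric curve, no linear trend
```

The third case is clearly related to x, yet r is essentially zero, which is why curvilinear relationships need other techniques.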
PEARSON’S COEFFICIENT OF CORRELATION
[Figure: scatter plots illustrating r = +1, r = -1, r = 0.6 and r = 0]
SCATTER PLOT
[Figure: scatter plot of BMD (the dependent variable, which we make inferences about) against calcium intake (the independent variable)]
NON-NORMAL DATA
NORMALISED WITH LOG TRANSFORMATION
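The effect of a log transformation on skewed data can be sketched as follows. The values are hypothetical right-skewed measurements (not the data shown on the slide), and skewness is used here only as a rough indicator of asymmetry, not as a formal normality test.

```python
from math import log


def skewness(values):
    """Moment-based sample skewness: about 0 for symmetric data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical right-skewed values; all must be positive to take logs.
raw = [1, 2, 2, 3, 4, 5, 8, 15, 40, 120]
logged = [log(v) for v in raw]

skew_raw = skewness(raw)     # strongly positive: long right tail
skew_log = skewness(logged)  # much closer to zero after the transformation
```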
SPSS OUTPUT: SCATTER PLOT
SPSS OUTPUT: CORRELATIONS
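Interpreting correlation
• A statistically significant r does not necessarily imply:
  • strong correlation
    • the significance of r, not its size, tends to increase with sample size, so a weak correlation can be highly significant in a large sample
  • cause and effect
    • e.g. a strong correlation between the number of televisions sold and the number of cases of paranoid schizophrenia
    • does watching TV cause paranoid schizophrenia?
    • the association may be due to an indirect relationship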
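One way to see why sample size matters when judging r: the usual test of "no linear relationship" uses the statistic t = r × sqrt((n - 2) / (1 - r²)), referred to a t distribution with n - 2 degrees of freedom. The sketch below holds r fixed at a weak 0.1 and changes only n; the function name and the sample sizes are illustrative assumptions.

```python
from math import sqrt


def t_statistic(r, n):
    """t statistic for testing whether the population correlation is zero."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * sqrt((n - 2) / (1 - r ** 2))


# Same weak correlation, two hypothetical sample sizes.
t_small = t_statistic(0.1, 20)    # about 0.43: far from significant
t_large = t_statistic(0.1, 2000)  # about 4.5: highly significant
```

The two-sided 5% critical value is roughly 2.1 with 18 degrees of freedom and roughly 1.96 for large samples, so the same r = 0.1 is non-significant with 20 observations and clearly significant with 2000, although the relationship is equally weak in both.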
Interpreting correlation
• Variation in dependent variable due to:
  • relationship with independent variable: r²
  • random noise: 1 - r²
• r² is the Coefficient of Determination or Variation explained
• e.g. r = 0.661, so r² = 0.661² ≈ 0.44
  • less than half of the variation (44%) in the dependent variable is due to the independent variable
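The reading of r² as "variation explained" can be checked numerically. The sketch below fits a least-squares line to hypothetical data (not from the slides) and compares r² with the fraction of the total variation in y that the fitted line removes; for simple linear regression the two agree. It also reproduces the arithmetic of the example above.

```python
def r_and_explained_fraction(x, y):
    """Return (r, fraction of variation in y explained by the fitted line)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    # Least-squares slope and intercept.
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Variation left over after fitting the line.
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r = sxy / (sxx * ss_total) ** 0.5
    return r, 1 - ss_resid / ss_total


# Hypothetical illustrative data.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 3.8, 6.1, 5.4]
r, explained = r_and_explained_fraction(x, y)

# The worked example: r = 0.661 gives r squared of about 0.44.
slide_r_squared = round(0.661 ** 2, 2)
```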
Agreement
• Correlation should never be used to determine the level of agreement between repeated measures:
  • measuring devices
  • users
  • techniques
• It measures the degree of linear relationship
• You can have high correlation with poor agreement
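A small sketch of high correlation with poor agreement: two hypothetical devices measure the same five subjects, and the second always reads 10 units higher. The readings and names are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical readings on the same five subjects.
device_a = [100, 110, 120, 130, 140]
device_b = [reading + 10 for reading in device_a]  # constant bias of +10

r = pearson_r(device_a, device_b)
# Average disagreement between the two devices.
mean_difference = sum(b - a for a, b in zip(device_a, device_b)) / len(device_a)
```

Here r is 1, a perfect linear relationship, yet the devices never agree: every pair of readings differs by 10 units.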
Non-parametric correlation
• Make no assumptions about the distribution of the data
• Carried out on ranks
• Spearman’s
  • easy to calculate
• Kendall’s
  • has some advantages over Spearman’s
  • distribution has better statistical properties
  • easier to identify concordant / discordant pairs
• Usually both lead to same conclusions
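Both rank-based coefficients can be sketched in a few lines. Spearman's is computed here as Pearson's r applied to the ranks, and Kendall's as the simple tau-a form (concordant minus discordant pairs, divided by the number of pairs), which makes no correction for ties. The function names and data are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank from 1 to n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's r calculated on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a from counts of concordant and discordant pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: increasing in x except for one swapped pair at the end.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 30, 16]
rho = spearman_rho(x, y)
tau = kendall_tau_a(x, y)
```

For these values one of the ten pairs is discordant, giving tau = 0.8 and rho = 0.9; both coefficients point to the same strongly increasing relationship.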
Role of regression
• Shows how one variable changes with another
• By determining the line of best fit
  • default is linear
  • curvilinear?
Line of best fit
• Simplest case: linear
• Line of best fit between:
  • dependent variable Y (BMD)
  • independent variable X (dietary intake of calcium)
• Y = a + bX
  • a (the intercept): value of Y when X = 0
  • b (the slope): change in Y when X increases by 1
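A minimal sketch of estimating a and b by ordinary least squares, the usual criterion for the line of best fit. The calcium and BMD values are hypothetical and chosen to lie exactly on a line so the result is easy to check by hand; the units in the comments are assumptions as well.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # change in Y per unit increase in X
    a = mean_y - b * mean_x  # value of Y when X = 0
    return a, b


# Hypothetical values, not data from the slides.
calcium = [400, 600, 800, 1000, 1200]  # dietary calcium intake (assumed mg/day)
bmd = [0.80, 0.85, 0.90, 0.95, 1.00]   # bone mineral density (assumed g/cm²)

a, b = fit_line(calcium, bmd)
```

For these values the slope is 0.00025 (BMD rises by 0.05 for every extra 200 units of calcium) and the intercept is 0.70.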
Role of regression
• Used to predict or explore associations
  • predicts the value of the dependent variable when the value of the independent variable(s) is known
  • valid within the range of the known data
  • extrapolation is risky!
    • e.g. the relation between age and bone age
• Does not imply causality
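The point about staying within the range of the known data can be built into a prediction helper. This is only a sketch: the function name, the coefficients and the observed range are hypothetical, and refusing to extrapolate is one possible design choice (a warning would be another).

```python
def predict(a, b, x_new, x_min, x_max):
    """Predict Y = a + b * x_new, refusing to extrapolate beyond the data."""
    if not x_min <= x_new <= x_max:
        raise ValueError(
            "x_new lies outside the observed range; extrapolation is risky"
        )
    return a + b * x_new


# Hypothetical fitted line and observed range of the independent variable.
a, b = 0.70, 0.00025
x_min, x_max = 400, 1200

inside = predict(a, b, 900, x_min, x_max)  # interpolation: allowed

try:
    predict(a, b, 2000, x_min, x_max)      # extrapolation: rejected
    extrapolation_blocked = False
except ValueError:
    extrapolation_blocked = True
```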
SPSS OUTPUT: REGRESSION
Multiple regression
• Covered later: more than one independent variable
• BMD may be dependent on:
  • age
  • gender
  • calorific intake
  • use of bisphosphonates
  • exercise
  • etc.
Summary
• Correlation
  • strength of linear relationship between two variables
  • Pearson’s: parametric
  • Spearman’s / Kendall’s: non-parametric
  • interpret with care!
• Regression
  • line of best fit
  • prediction
  • multiple regression
  • logistic regression