Correlation
01:830:200 Spring 2015
Correlation
Reminder: Student Instructional Rating Surveys
You have until May 7th to fill out the student instructional rating
surveys at https://sakai.rutgers.edu/portal/site/sirs
The survey should be available on any device with a full-featured
web browser. Please take the time to fill it out. Your answers:
• Will be anonymous
• Will help me to improve my teaching strategies and the structure of
the course
• Will help the department in planning and designing future courses
• Will be used by the university in promotion, tenure, and
reappointment decisions
Correlation: Relationships between Variables
• So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means
• However, researchers are often interested in graded relationships between variables, such as how well one variable can predict another
• Examples:
– How well do SAT scores predict a student’s GPA?
– How is the amount of time a student takes to complete an exam related to her grade on that exam?
– How well do IQ scores correlate with income?
– How does a child’s height correlate with his running speed?
– How does class size affect student performance?
Correlation: Relationships between Variables
• Correlation is a statistical technique used to describe the
relationship between two variables.
• Usually the two variables are simply observed as they exist in
the environment (with no experimental manipulation—a
correlational study)
• However, results from experimental studies (in which one of
the variables is systematically manipulated) can also be
analyzed using correlation
Mean Comparison Approach
Height Weight
70 150
67 140
72 180
75 190
68 145
69 150
71.5 164
71 140
72 142
69 136
67 123
68 155
66 140
72 145
73.5 160
73 190
69 155
73 165
72 150
Weights
Short Tall
140 164
140 180
123 142
145 145
155 150
150 190
136 165
155 160
150 190
140
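The mean comparison approach on this slide can be sketched in Python. The slide does not say how the Short and Tall groups were formed; the sketch assumes a split at the median height (71 inches), which reproduces the two weight columns above.

```python
from statistics import mean, median

# Height/weight pairs copied from the table above.
heights = [70, 67, 72, 75, 68, 69, 71.5, 71, 72, 69,
           67, 68, 66, 72, 73.5, 73, 69, 73, 72]
weights = [150, 140, 180, 190, 145, 150, 164, 140, 142, 136,
           123, 155, 140, 145, 160, 190, 155, 165, 150]

# Assumption: "Short" means at or below the median height, "Tall" means above it.
cutoff = median(heights)
short = [w for h, w in zip(heights, weights) if h <= cutoff]
tall = [w for h, w in zip(heights, weights) if h > cutoff]

print(f"cutoff = {cutoff}")
print(f"short: n = {len(short)}, mean weight = {mean(short):.1f}")
print(f"tall:  n = {len(tall)}, mean weight = {mean(tall):.1f}")
```

Splitting a continuous variable into two groups discards the graded information in the heights, which is the motivation for the correlation approach that follows.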
Correlation: Scatter Plots
Height Weight
70 150
67 140
72 180
75 190
68 145
69 150
71.5 164
71 140
72 142
69 136
67 123
68 155
66 140
72 145
73.5 160
73 190
69 155
73 165
72 150
Characteristics of the Correlation
A correlation coefficient is a single number describing the relationship between two variables. This number describes:
• The direction of the relationship
– Variables sharing a positive correlation tend to change in the same direction (e.g., height and weight). As the value of one of the variables (height) increases, the value of the other variable (weight) also tends to increase.
– Variables sharing a negative correlation tend to change in opposite directions (e.g., snowfall and beach visitors). As the value of one of the variables (amount of snowfall) increases, the value of the other variable (number of beach visitors) tends to decrease.
• The strength of the relationship
– Variables that share a strong correlation (close to +1 or -1) strongly predict one another, while variables that share a weak correlation (near 0) do not.
Positive versus Negative Correlations
[Figure: two scatter plots, with panels labeled "Positive Correlation" and "Negative Correlation"]
Strong versus Weak Correlations
Correlation is not Causation
Possible Sources of Correlation
• The relationship is causal.
– Manipulating the predictor variable causes an increase or decrease in the criterion variable.
  • E.g., leg strength and sprinting speed
• The causal relationship is backwards (reverse causality).
– Manipulating the criterion variable causes changes in the predictor variable
• The two variables work together systematically to cause an effect
• The relationship may be due to one or more confounding variables
– Changes in both variables reflect the effect of a confounding variable
  • E.g., intelligence as an explanation for correlated performance on different exams
  • E.g., increasing density in cities increases the number of physicians and the number of crimes
Measuring Correlation: Pearson’s r
• To compute a correlation you need a pair of scores, X and Y,
for each individual in the sample.
• The most commonly used measure of correlation is
Pearson’s product-moment correlation coefficient, or
more simply, Pearson’s r.
• Conceptually, Pearson’s r is a ratio between the degree to
which two variables (X and Y) vary together and the degree to
which they vary separately.
$$r = \frac{\text{co-variability}(X, Y)}{\text{variability}(X) \cdot \text{variability}(Y)}$$
The Covariance
• The term in the numerator of Pearson’s r is the covariance, an unnormalized statistic representing the degree to which two variables (X and Y) vary together.
• Mathematically, it is the average of the product of the deviations of two paired variables
• The covariance depends both on how consistently X and Y tend to vary together and on the individual variability of the variables (X and Y).
$$\text{cov}_{XY} = \frac{\sum (X - M_X)(Y - M_Y)}{n - 1}$$
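The covariance definition above can be sketched in a few lines of Python. The function name `covariance` is chosen for illustration, and the X and Y scores are the five pairs used in the worked example later in these slides.

```python
def covariance(xs, ys):
    """Sample covariance: sum of deviation products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    n = len(xs)
    m_x = sum(xs) / n
    m_y = sum(ys) / n
    return sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys)) / (n - 1)

x = [0, 10, 4, 8, 8]
y = [2, 6, 2, 4, 6]

print(covariance(x, y))                       # 7.0
# Rescaling X changes the covariance even though the pattern is identical.
print(covariance([10 * v for v in x], y))     # 70.0
```

The second call illustrates the last bullet: the covariance grows with the spread of the individual variables, so it cannot be read as a measure of strength on its own.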
The Covariance
Notice that the formula for covariance looks a lot like the formula for variance:
$$\text{cov}_{XY} = \frac{\sum (X - M_X)(Y - M_Y)}{n - 1}$$
$$s_X^2 = \frac{\sum (X - M_X)^2}{n - 1} = \frac{\sum (X - M_X)(X - M_X)}{n - 1}$$
The Covariance
Moreover, they share a similar computational formula:
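$$s_X^2 = \frac{SS_X}{n - 1}; \text{ where } SS_X = \sum X^2 - \frac{\left(\sum X\right)^2}{n}$$
$$\text{cov}_{XY} = \frac{SP_{XY}}{n - 1}; \text{ where } SP_{XY} = \sum XY - \frac{\sum X \sum Y}{n}$$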
Computing Pearson’s r
• Pearson’s r is computed by dividing the covariance by the product of the standard deviations of the two variables
– This removes the effect of the variability of the individual variables
$$r = \frac{\text{cov}_{XY}}{s_X s_Y} = \frac{SP_{XY}}{\sqrt{SS_X SS_Y}}$$
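A minimal Python sketch of this definition, dividing the covariance by the product of the two standard deviations. The function name `pearson_r` and the five score pairs are made up for illustration and do not come from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r as cov_XY / (s_X * s_Y), using n - 1 throughout."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    n = len(xs)
    m_x, m_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - m_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - m_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return cov_xy / (s_x * s_y)

# Hypothetical scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
r_rescaled = pearson_r([100 * v for v in x], y)
print(round(r, 3), round(r_rescaled, 3))
```

Multiplying X by 100 multiplies both the covariance and s_X by 100, so the two printed values agree. This is what "removes the effect of the variability of the individual variables" means in practice.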
Computing Pearson’s r: Example
X Y
0 2
10 6
4 2
8 4
8 6
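Computing Pearson’s r: Example
X Y XY
0 2 0
10 6 60
4 2 8
8 4 32
8 6 48
ΣX = 30, ΣY = 20, ΣXY = 148, ΣX² = 244, ΣY² = 96
Compute SSX, SSY, & SPXY:
$$SS_X = \sum X^2 - \frac{\left(\sum X\right)^2}{N} = 244 - \frac{30^2}{5} = 244 - 180 = 64$$
$$SS_Y = \sum Y^2 - \frac{\left(\sum Y\right)^2}{N} = 96 - \frac{20^2}{5} = 96 - 80 = 16$$
$$SP_{XY} = \sum XY - \frac{\sum X \sum Y}{N} = 148 - 120 = 28$$
Compute r:
$$r = \frac{SP_{XY}}{\sqrt{SS_X SS_Y}} = \frac{28}{\sqrt{64 \cdot 16}} = \frac{28}{32} = 0.875$$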
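The arithmetic of this example can be checked with a short Python sketch that applies the computational formulas for SS and SP directly. The variable names are chosen for illustration.

```python
from math import sqrt

x = [0, 10, 4, 8, 8]
y = [2, 6, 2, 4, 6]
n = len(x)

# Column sums used by the computational formulas.
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

ss_x = sum_x2 - sum_x ** 2 / n       # 244 - 180 = 64.0
ss_y = sum_y2 - sum_y ** 2 / n       # 96 - 80 = 16.0
sp_xy = sum_xy - sum_x * sum_y / n   # 148 - 120 = 28.0

r = sp_xy / sqrt(ss_x * ss_y)
print(ss_x, ss_y, sp_xy, r)          # 64.0 16.0 28.0 0.875
```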
Computing Pearson’s r: Example
Hypothesis testing for r:
The null hypothesis is that the population correlation coefficient ρ = 0
The alternative hypothesis is that ρ ≠ 0
Level of Significance for One-Tailed Test
0.05 0.025 0.01 0.005 0.0005
Level of Significance for Two-Tailed Test
df = n-2 0.1 0.05 0.02 0.01 0.001
1 0.988 0.997 1.000 1.000 1.000
2 0.900 0.950 0.980 0.990 0.999
3 0.805 0.878 0.934 0.959 0.991
4 0.729 0.811 0.882 0.917 0.974
5 0.669 0.754 0.833 0.875 0.951
6 0.621 0.707 0.789 0.834 0.925
7 0.582 0.666 0.750 0.798 0.898
8 0.549 0.632 0.715 0.765 0.872
9 0.521 0.602 0.685 0.735 0.847
10 0.497 0.576 0.658 0.708 0.823
11 0.476 0.553 0.634 0.684 0.801
12 0.458 0.532 0.612 0.661 0.780
13 0.441 0.514 0.592 0.641 0.760
14 0.426 0.497 0.574 0.623 0.742
15 0.412 0.482 0.558 0.606 0.725
16 0.400 0.468 0.543 0.590 0.708
17 0.389 0.456 0.529 0.575 0.693
18 0.378 0.444 0.516 0.561 0.679
19 0.369 0.433 0.503 0.549 0.665
20 0.360 0.423 0.492 0.537 0.652
21 0.352 0.413 0.482 0.526 0.640
22 0.344 0.404 0.472 0.515 0.629
23 0.337 0.396 0.462 0.505 0.618
24 0.330 0.388 0.453 0.496 0.607
25 0.323 0.381 0.445 0.487 0.597
26 0.317 0.374 0.437 0.479 0.588
27 0.311 0.367 0.430 0.471 0.579
28 0.306 0.361 0.423 0.463 0.570
29 0.301 0.355 0.416 0.456 0.562
30 0.296 0.349 0.409 0.449 0.554
40 0.257 0.304 0.358 0.393 0.490
50 0.231 0.273 0.322 0.354 0.443
100 0.164 0.195 0.230 0.254 0.321
Table: Critical values for Pearson’s r
Computing Pearson’s r: Example
Hypothesis testing for r:
The null hypothesis is that the population correlation coefficient ρ = 0
The alternative hypothesis is that ρ ≠ 0
Note that you can also compute rcrit using the t distribution table:
$$r_{crit}(df) = \frac{t_{crit}(df)}{\sqrt{df + t_{crit}(df)^2}}; \quad df = N - 2$$
Level of significance for one-tailed test
0.25 0.2 0.15 0.1 0.05 0.025 0.01 0.005 0.0005
Level of significance for two-tailed test
df 0.5 0.4 0.3 0.2 0.1 0.05 0.02 0.01 0.001
1 1.000 1.376 1.963 3.078 6.314 12.706 31.821 63.657 636.619
2 0.816 1.061 1.386 1.886 2.920 4.303 6.965 9.925 31.599
3 0.765 0.978 1.250 1.638 2.353 3.182 4.541 5.841 12.924
4 0.741 0.941 1.190 1.533 2.132 2.776 3.747 4.604 8.610
5 0.727 0.920 1.156 1.476 2.015 2.571 3.365 4.032 6.869
6 0.718 0.906 1.134 1.440 1.943 2.447 3.143 3.707 5.959
7 0.711 0.896 1.119 1.415 1.895 2.365 2.998 3.499 5.408
8 0.706 0.889 1.108 1.397 1.860 2.306 2.896 3.355 5.041
9 0.703 0.883 1.100 1.383 1.833 2.262 2.821 3.250 4.781
10 0.700 0.879 1.093 1.372 1.812 2.228 2.764 3.169 4.587
11 0.697 0.876 1.088 1.363 1.796 2.201 2.718 3.106 4.437
12 0.695 0.873 1.083 1.356 1.782 2.179 2.681 3.055 4.318
13 0.694 0.870 1.079 1.350 1.771 2.160 2.650 3.012 4.221
14 0.692 0.868 1.076 1.345 1.761 2.145 2.624 2.977 4.140
15 0.691 0.866 1.074 1.341 1.753 2.131 2.602 2.947 4.073
16 0.690 0.865 1.071 1.337 1.746 2.120 2.583 2.921 4.015
17 0.689 0.863 1.069 1.333 1.740 2.110 2.567 2.898 3.965
18 0.688 0.862 1.067 1.330 1.734 2.101 2.552 2.878 3.922
19 0.688 0.861 1.066 1.328 1.729 2.093 2.539 2.861 3.883
20 0.687 0.860 1.064 1.325 1.725 2.086 2.528 2.845 3.850
21 0.686 0.859 1.063 1.323 1.721 2.080 2.518 2.831 3.819
22 0.686 0.858 1.061 1.321 1.717 2.074 2.508 2.819 3.792
23 0.685 0.858 1.060 1.319 1.714 2.069 2.500 2.807 3.768
24 0.685 0.857 1.059 1.318 1.711 2.064 2.492 2.797 3.745
25 0.684 0.856 1.058 1.316 1.708 2.060 2.485 2.787 3.725
26 0.684 0.856 1.058 1.315 1.706 2.056 2.479 2.779 3.707
27 0.684 0.855 1.057 1.314 1.703 2.052 2.473 2.771 3.690
28 0.683 0.855 1.056 1.313 1.701 2.048 2.467 2.763 3.674
29 0.683 0.854 1.055 1.311 1.699 2.045 2.462 2.756 3.659
30 0.683 0.854 1.055 1.310 1.697 2.042 2.457 2.750 3.646
40 0.681 0.851 1.050 1.303 1.684 2.021 2.423 2.704 3.551
50 0.679 0.849 1.047 1.299 1.676 2.009 2.403 2.678 3.496
100 0.677 0.845 1.042 1.290 1.660 1.984 2.364 2.626 3.390
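Table: t-Distribution Table
[Figure: rejection regions of the t distribution. One-tailed test: area α beyond t. Two-tailed test: area α/2 beyond -t and area α/2 beyond t.]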
Computing Pearson’s r: Example
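$$r_{crit}(df) = \frac{t_{crit}(df)}{\sqrt{df + t_{crit}(df)^2}}; \quad df = N - 2 = 5 - 2 = 3$$
$$t_{crit}(3) = 3.182$$
$$r_{crit} = \frac{t_{crit}}{\sqrt{df + t_{crit}^2}} = \frac{3.182}{\sqrt{3 + 3.182^2}} = \frac{3.182}{3.623} = 0.878$$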
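The conversion from a tabled t value to a critical r can be sketched in Python. The function names are hypothetical. `t_from_r` is not on the slides: it is the same formula solved for t, included as an assumption-labeled cross-check. The value 3.182 is read from the t table (df = 3, two-tailed α = 0.05) rather than computed.

```python
from math import sqrt

def r_crit_from_t(t_crit, df):
    """Critical r for a given critical t and df = N - 2."""
    return t_crit / sqrt(df + t_crit ** 2)

def t_from_r(r, df):
    """Inverse of the formula above: the t statistic implied by an observed r."""
    return r * sqrt(df / (1 - r ** 2))

n = 5
df = n - 2
t_crit = 3.182          # from the t table: df = 3, two-tailed alpha = 0.05
r_observed = 0.875      # from the worked example

r_crit = r_crit_from_t(t_crit, df)
print(round(r_crit, 3))                           # 0.878
print(abs(r_observed) > r_crit)                   # False
print(abs(t_from_r(r_observed, df)) > t_crit)     # False
```

Both comparisons give the same decision: with only five pairs, r = 0.875 falls just short of the critical value of 0.878, so the null hypothesis ρ = 0 is not rejected at the two-tailed 0.05 level.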
Linear Correlation: Assumptions
1. Linearity
– Assumes that the relationship between the paired scores is best
described by a straight line
2. Normality
– Assumes that the marginal score distributions, their joint distribution,
and any conditional distributions are normally distributed
3. Homoscedasticity
– Assumes that the variability around the regression line is
homogeneous across different score values
Other Correlation Coefficients
• Spearman’s correlation coefficient (rs) for ranked data
– As the name suggests, Spearman’s correlation is used when the scores for both X and Y consist of (or have been converted to) ordinal ranks
• The point biserial correlation coefficient (rpb)
– This correlation is used when one of the scores is continuous and the other is dichotomous, taking on one of only two possible values
• The phi correlation coefficient (rϕ)
– The phi correlation is used when both scores are dichotomous
All of the above can be computed in the same manner as Pearson’s correlation.
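As a rough illustration of that last point, the sketch below codes each dichotomous variable as 0 or 1 and then applies the ordinary Pearson formula. The function name `pearson_r` and all of the data are hypothetical; the 0/1 coding is one common convention, not something the slides specify.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed as SP_XY / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    m_x, m_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    ss_x = sum((x - m_x) ** 2 for x in xs)
    ss_y = sum((y - m_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# Point biserial: one dichotomous variable (coded 0/1), one continuous.
# Hypothetical data: group membership and a test score.
group = [0, 0, 0, 1, 1, 1]
score = [1, 2, 3, 4, 5, 6]
r_pb = pearson_r(group, score)

# Phi: both variables dichotomous (coded 0/1).
# Hypothetical data: pass/fail on two different items.
item_a = [0, 0, 0, 1, 1, 1]
item_b = [0, 0, 1, 0, 1, 1]
r_phi = pearson_r(item_a, item_b)

print(round(r_pb, 3), round(r_phi, 3))
```

Which category is coded 1 only affects the sign of the result, not its magnitude.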
Converting Data for Spearman’s Correlation
Original Data
Age Height
10 31.4
11 41
12 47.8
13 52.8
14 55.7
15 58.3
16 60.7
17 62.1
18 62.7
19 63.3
20 64.1
21 64.3
22 64.6
23 64.7
24 64.5
25 64.3
r = 0.86
Converting Data for Spearman’s Correlation
Original Data Converted Scores
Age Height Age Rank Height rank
10 31.4 1 1
11 41 2 2
12 47.8 3 3
13 52.8 4 4
14 55.7 5 5
15 58.3 6 6
16 60.7 7 7
17 62.1 8 8
18 62.7 9 9
19 63.3 10 10
20 64.1 11 11
21 64.3 12 12.5
22 64.6 13 15
23 64.7 14 16
24 64.5 15 14
25 64.3 16 12.5
Original data: r = 0.86
Converted scores (ranks): r = 0.97
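The rank conversion in this table can be sketched in Python. Tied scores receive the average of the ranks they occupy, which is how the two heights of 64.3 both get rank 12.5. The function names are chosen for illustration, and the sketch assumes Spearman's coefficient is obtained by applying the Pearson formula to the ranks, as the earlier slide states.

```python
from math import sqrt

def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    m_x, m_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    ss_x = sum((x - m_x) ** 2 for x in xs)
    ss_y = sum((y - m_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

age = list(range(10, 26))
height = [31.4, 41, 47.8, 52.8, 55.7, 58.3, 60.7, 62.1,
          62.7, 63.3, 64.1, 64.3, 64.6, 64.7, 64.5, 64.3]

age_rank = average_ranks(age)
height_rank = average_ranks(height)

r_raw = pearson_r(age, height)              # Pearson's r on the original scores
r_s = pearson_r(age_rank, height_rank)      # Spearman's r_s on the ranks
print(height_rank[-5:])                     # [12.5, 15.0, 16.0, 14.0, 12.5]
print(round(r_s, 2))                        # 0.97
```

Height rises steeply and then levels off with age, so the relationship is consistent in direction but not linear. Ranking removes the curvature, which is why the rank correlation is higher than the correlation on the original scores.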
Converting Data for the Point Biserial Correlation
Converting Data for Phi Correlation