9.1 Using Statistics To Make Inferences 9 Summary Correlation Pearson correlation. Spearmans rank correlation. Point Biserial Correlation. Monday 31 October 2022 03:38 PM

Upload: brett-daniels

Post on 17-Dec-2015


9.1

Using Statistics To Make Inferences 9

Summary

Correlation: Pearson correlation. Spearman's rank correlation. Point biserial correlation.

 

Tuesday 18 April 2023 03:53 PM

9.2

Goals 

To evaluate the correlation, rank correlation and the point biserial correlation and test if they are significant.

Practical 

Perform scatter plots and evaluate correlations.

9.3

Recall

What graph would you use to represent any possible relationship between two variables?

Scatter plot

9.4

Looking for Relationships

Raw data

$$(x_1, y_1),\ (x_2, y_2),\ (x_3, y_3),\ \ldots,\ (x_n, y_n)$$

9.5

Which do we plot horizontally?

Horizontally: the independent variable x, measured accurately.

Vertically: the dependent variable y(x), subject to errors.

y = m x + c or y = a x + b

9.6

Scatterplot

Subjects are scored on verbal and spatial reasoning skills.

Subject  1  2  3  4  5  6  7  8  9  10 11 12
Verbal   50 66 73 84 57 83 76 95 73 78 48 53
Spatial  69 85 88 70 84 78 90 97 79 95 67 60

Plot first

Which variable is “dependent”?

9.7

Scatterplot

[Figure: Scatterplot of Spatial vs Verbal. Horizontal axis: Verbal (50 to 100). Vertical axis: Spatial (60 to 100).]

Is there a relationship?

9.8

Recall: mean and variance

Given the raw data (x)

15 9 4 15 10 13 9

Find the sample mean and variance

The following sums might prove useful: Σx = 75 and Σx² = 897

9.9

Recall

n = 7, Σx = 75 and Σx² = 897

$$\bar{x} = \frac{1}{n}\sum x = \frac{75}{7} = 10.71$$

$$s^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{1}{n}\left(\sum x\right)^2\right) = \frac{1}{7-1}\left(897 - \frac{1}{7}\,75^2\right) = 15.57$$
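The recall exercise can be checked with a few lines of Python. This is only a sketch of the shortcut formula for the sample variance; the variable names are illustrative and not part of the original slides.

```python
# Raw data from the recall exercise.
x = [15, 9, 4, 15, 10, 13, 9]

n = len(x)
sum_x = sum(x)                   # Σx
sum_x2 = sum(v * v for v in x)   # Σx²

mean = sum_x / n
# Sample variance via the shortcut formula: (Σx² - (Σx)²/n) / (n - 1).
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)

print(round(mean, 2), round(variance, 2))
```

The two sums reproduce the values quoted on the slide (75 and 897), and the results round to 10.71 and 15.57.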

9.10

Looking for Relationships

Correlation (r) – the Pearson Correlation

$$S_{xx} = \sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2$$

$$S_{yy} = \sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2$$

$$S_{xy} = \sum x_i y_i - \frac{1}{n}\left(\sum x_i\right)\left(\sum y_i\right)$$

$$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}$$

Compare to the variance

9.11

Notation

Formally

$$\text{Variance}(x) = \frac{1}{n-1}\,S_{xx}$$

$$\text{Variance}(y) = \frac{1}{n-1}\,S_{yy}$$

$$\text{Covariance}(x, y) = \frac{1}{n-1}\,S_{xy}$$

9.12

Significance Test

ν = n - 2

Degrees of freedom

9.13

Interpretation

-1 ≤ r ≤ 1

r > rcrit: a significant positive correlation

Any fitting line has a positive slope

r < -rcrit: a significant negative correlation

Any fitting line has a negative slope

-rcrit < r < rcrit: uncorrelated

Any fitting line is effectively horizontal

From tables the critical value is rcrit
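The three-way decision rule above can be written as a small function. This is a minimal sketch: the function name `interpret` and the numbers in the example calls are illustrative assumptions, and the critical value must still be looked up in tables for the relevant degrees of freedom.

```python
def interpret(r: float, r_crit: float) -> str:
    """Classify a correlation r against a tabulated critical value r_crit (> 0)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    if r > r_crit:
        return "significant positive correlation"
    if r < -r_crit:
        return "significant negative correlation"
    return "uncorrelated"

# Illustrative values only.
print(interpret(0.64, 0.576))
print(interpret(-0.20, 0.576))
```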

9.14

Assumptions

Variables are measured at the interval or ratio level (continuous).

Variables are approximately normally distributed. Essentially neither set of data is independently skewed.

There is a linear relationship between the two variables.

Pearson’s correlation is sensitive to outliers so it is best if outliers are kept to a minimum or there are no outliers.

9.15

Concerns

Don’t forget causality: a correlation between two sets of data may be due to a third influencing factor (firemen cause fires, storks bring babies…).

Variables should be homoscedastic, meaning that there needs to be a consistent scatter pattern over the whole range. Otherwise, you may get a positive correlation over one range of the data that is tainted by an unproven correlation in another area.

9.16

Example

Subjects are scored on verbal and spatial reasoning skills.

Subject    1  2  3  4  5  6  7  8  9  10 11 12
Verbal x   50 66 73 84 57 83 76 95 73 78 48 53
Spatial y  69 85 88 70 84 78 90 97 79 95 67 60

9.17

Scatterplot

[Figure: Scatterplot of Spatial vs Verbal. Horizontal axis: Verbal (50 to 100). Vertical axis: Spatial (60 to 100).]

9.18

Calculation

n = 12

Subject    1  2  3  4  5  6  7  8  9  10 11 12
Verbal x   50 66 73 84 57 83 76 95 73 78 48 53
Spatial y  69 85 88 70 84 78 90 97 79 95 67 60

Σxi = 50+66+…+53 = 836

Σyi = 69+85+…+60 = 962

Σxi² = 50²+66²+…+53² = 60706

Σyi² = 69²+85²+…+60² = 78634

Σxiyi = 50×69+66×85+…+53×60 = 68254

9.19

Calculation Sxx

n = 12, Σxi = 836, Σyi = 962, Σxi² = 60706, Σyi² = 78634, Σxiyi = 68254

$$S_{xx} = \sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2 = 60706 - \frac{836^2}{12} = 2464.67$$

9.20

Calculation Syy

n = 12, Σxi = 836, Σyi = 962, Σxi² = 60706, Σyi² = 78634, Σxiyi = 68254

$$S_{yy} = \sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2 = 78634 - \frac{962^2}{12} = 1513.67$$

9.21

Calculation Sxy

n = 12, Σxi = 836, Σyi = 962, Σxi² = 60706, Σyi² = 78634, Σxiyi = 68254

$$S_{xy} = \sum x_i y_i - \frac{1}{n}\left(\sum x_i\right)\left(\sum y_i\right) = 68254 - \frac{836 \times 962}{12} = 1234.67$$

9.22

Calculation

n = 12, Σxi = 836, Σyi = 962, Σxi² = 60706, Σyi² = 78634, Σxiyi = 68254

$$S_{xx} = \sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2 = 60706 - \frac{836^2}{12} = 2464.67$$

$$S_{yy} = \sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2 = 78634 - \frac{962^2}{12} = 1513.67$$

$$S_{xy} = \sum x_i y_i - \frac{1}{n}\left(\sum x_i\right)\left(\sum y_i\right) = 68254 - \frac{836 \times 962}{12} = 1234.67$$

9.23

Conclusion

Sxx = 2464.67, Syy = 1513.67, Sxy = 1234.67

$$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{1234.67}{\sqrt{2464.67 \times 1513.67}} = 0.64$$

ν = n - 2 = 10

ν   p=0.1  p=0.05  p=0.025  p=0.01  p=0.005  p=0.002
10  0.497  0.576   0.640    0.708   0.750    0.795

The tables give one and two tail values. Since r10(0.025) = 0.58 (one tail, which is the p = 0.05 column above), there appears to be a significant correlation at the 95% confidence level (0.64 > 0.58).
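The whole hand calculation can be reproduced in a short script. This is a sketch only: the function name `pearson_r` is an assumption for illustration, and the critical value is copied from the table rather than computed.

```python
from math import sqrt

verbal = [50, 66, 73, 84, 57, 83, 76, 95, 73, 78, 48, 53]
spatial = [69, 85, 88, 70, 84, 78, 90, 97, 79, 95, 67, 60]

def pearson_r(x, y):
    """Pearson correlation from the sums Sxx, Syy and Sxy."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xx = sum(v * v for v in x) - sum_x ** 2 / n
    s_yy = sum(v * v for v in y) - sum_y ** 2 / n
    s_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    return s_xy / sqrt(s_xx * s_yy)

r = pearson_r(verbal, spatial)
r_crit = 0.576  # tabulated value for ν = 10, taken from the table, not computed
print(round(r, 3), r > r_crit)
```

The script gives r ≈ 0.639, which exceeds the tabulated 0.576, in line with the conclusion on the slide.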

9.24

SPSS

Analyze > Correlate > Bivariate

9.25

SPSS

The correlation (0.64) and p value (p < 0.05) are consistent with our calculation.

Correlations

                              Verbal  Spatial
Verbal   Pearson Correlation  1       .639*
         Sig. (2-tailed)              .025
         N                    12      12
Spatial  Pearson Correlation  .639*   1
         Sig. (2-tailed)      .025
         N                    12      12

*. Correlation is significant at the 0.05 level (2-tailed).

9.26

Aside

From previous experience it is known that in a population, measurements of IQ are approximately normally distributed with standard deviation 16. Tests are carried out on a particular subgroup of 14 individuals from the population. Calculate a 95% confidence interval for the population mean. Is your interval consistent with a population mean of 132?

Observed data

133.06 119.30 109.17 93.88 116.93 130.98 135.25 140.02 118.38 121.86 124.78 142.91 135.52 132.96

What are the key words/information?


9.28

Aside

Which tests would be appropriate for the sample mean?

z or t

9.29

Aside

Which tests would be appropriate for the sample mean? What are the key parameters for these tests?

z: µ, σ, n

t: µ, s, n

9.30

Aside

In this case which test would be appropriate for the sample mean? What are the key parameters for this test?

z: µ, σ, n

Since σ is available, use the z test.
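One possible way to work the aside numerically is sketched below, assuming the usual z interval (sample mean ± z × σ/√n) with the known σ = 16. The variable names are illustrative, and `NormalDist` from the Python standard library supplies the 97.5% point of the standard normal.

```python
from math import sqrt
from statistics import NormalDist, mean

iq = [133.06, 119.30, 109.17, 93.88, 116.93, 130.98, 135.25,
      140.02, 118.38, 121.86, 124.78, 142.91, 135.52, 132.96]
sigma = 16.0  # known population standard deviation, so a z interval applies

n = len(iq)
x_bar = mean(iq)
z = NormalDist().inv_cdf(0.975)   # about 1.96 for a two-sided 95% interval
half_width = z * sigma / sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width

print(round(lower, 2), round(upper, 2), lower <= 132 <= upper)
```

Under these assumptions the interval runs from roughly 117 to 134, so a population mean of 132 is consistent with it.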

9.31

Rank Correlation

What if we cannot assume normality or there are outliers when calculating and assessing a correlation?

If your samples violate the assumption of normality or have outliers then you might need to consider using a non-parametric test such as Spearman's Correlation.

9.32

Spearman’s Rank Correlation Coefficient (rs)

di is the difference between the rankings in each pair of scores; n is the number of pairs of scores.

$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n\left(n^2 - 1\right)}$$

Note that rs only matches the conventional correlation (direct calculation) for the ranked data if there are no ties.

9.33

Example

A researcher has a theory that phonological working memory (memory for speech, the auditory component of the working memory model) is related to children's vocabulary size. The researcher tests this theory by measuring both phonological working memory (A) and vocabulary size (B) in children of 4 and 5 years of age.

9.34

Data

Child A B

1 18 187

2 14 134

3 15 121

4 11 150

5 17 145

6 18 178

7 12 112

8 9 87

First plot the data

9.35

Scatterplot

[Figure: scatterplot of B (vertical axis, 85 to 185) against A (horizontal axis, 8 to 18).]

9.36

Data

Child A B

1 18 187

2 14 134

3 15 121

4 11 150

5 17 145

6 18 178

7 12 112

8 9 87

Replace all observed values (A,B) by their ranks

9.37

Rank A

Child A Rank A

1 18 1

6 18 2

5 17 3

3 15 4

2 14 5

7 12 6

4 11 7

8 9 8

The first two observations are tied!

Is there a problem?


9.38

Rank A

Child A Rank A True Rank

1 18 1 1.5

6 18 2 1.5

5 17 3 3

3 15 4 4

2 14 5 5

7 12 6 6

4 11 7 7

8 9 8 8
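The "true rank" column gives tied observations the mean of the ranks they would otherwise occupy. A minimal sketch of that rule follows; the function name `average_ranks` is an illustrative assumption, and rank 1 is given to the largest value, as in the table.

```python
def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # mean of the positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

a = [18, 14, 15, 11, 17, 18, 12, 9]  # scores A for children 1 to 8
print(average_ranks(a))
```

The two children scoring 18 both receive rank 1.5, and the remaining ranks are unchanged.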

9.39

Rank B

Child B Rank B

1 187 1

6 178 2

4 150 3

5 145 4

2 134 5

3 121 6

7 112 7

8 87 8

No ties in this case

9.40

Rank B

Child B Rank B True Rank

1 187 1 1

6 178 2 2

4 150 3 3

5 145 4 4

2 134 5 5

3 121 6 6

7 112 7 7

8 87 8 8

9.41

Differences

Columns: Child, A, True Rank (A)

1 18 1.5

2 14 5

3 15 4

4 11 7

5 17 3

6 18 1.5

7 12 6

8 9 8

Total

Rearrange the A data against the identifier (child)

9.42

Differences

Columns: Child, A, True Rank (A), B, True Rank (B)

1 18 1.5 187 1

2 14 5 134 5

3 15 4 121 6

4 11 7 150 3

5 17 3 145 4

6 18 1.5 178 2

7 12 6 112 7

8 9 8 87 8

Total

Similarly for the B data

Now find the difference between the true ranks

9.43

Differences

Columns: Child, A, True Rank (A), B, True Rank (B), di

1 18 1.5 187 1 0.5

2 14 5 134 5 0

3 15 4 121 6 -2

4 11 7 150 3 4

5 17 3 145 4 -1

6 18 1.5 178 2 -0.5

7 12 6 112 7 -1

8 9 8 87 8 0

Total

Now square these differences

9.44

Differences

Columns: Child, A, True Rank (A), B, True Rank (B), di, di²

1 18 1.5 187 1 0.5 0.25

2 14 5 134 5 0 0

3 15 4 121 6 -2 4

4 11 7 150 3 4 16

5 17 3 145 4 -1 1

6 18 1.5 178 2 -0.5 0.25

7 12 6 112 7 -1 1

8 9 8 87 8 0 0

Total

And form the total

9.45

Differences

Columns: Child, A, True Rank (A), B, True Rank (B), di, di²

1 18 1.5 187 1 0.5 0.25

2 14 5 134 5 0 0

3 15 4 121 6 -2 4

4 11 7 150 3 4 16

5 17 3 145 4 -1 1

6 18 1.5 178 2 -0.5 0.25

7 12 6 112 7 -1 1

8 9 8 87 8 0 0

Total 22.5

Now calculate the correlation

9.46

Conclusion

n = 8, Σdi² = 22.5

$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n\left(n^2 - 1\right)} = 1 - \frac{6 \times 22.5}{8\left(8^2 - 1\right)} = 1 - \frac{135}{8 \times 63} = 0.732$$

n  p = 0.05  p = 0.01
8  0.738     0.881

The tables give one and two tail values.

Note that SPSS reports an approximate p value based on the Pearson correlation.

9.47

Conclusion

n = 8, rcrit = 0.738, rs = 0.732

The critical value for n = 8 at the p = 0.025 level is 0.738. Since rs = 0.732 is less than 0.738, rs is apparently not significant at the 95% confidence level. Or, more plainly, the data do not show a significant relationship.
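The ranking, differencing and final formula can be combined in one short script. This is a sketch under the same conventions as the slides (rank 1 for the largest value, tied values sharing their mean rank); the function names are illustrative assumptions.

```python
def average_ranks(values):
    """Rank from largest (rank 1) to smallest; ties share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    return [
        # first 1-based position of v, plus last 1-based position of v, halved
        (ordered.index(v) + 1 + len(ordered) - ordered[::-1].index(v)) / 2
        for v in values
    ]

def spearman_from_d(x, y):
    """Return (Σd², r_s) using r_s = 1 - 6Σd² / (n(n² - 1))."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

memory = [18, 14, 15, 11, 17, 18, 12, 9]          # A, children 1 to 8
vocab = [187, 134, 121, 150, 145, 178, 112, 87]   # B, children 1 to 8

sum_d2, r_s = spearman_from_d(memory, vocab)
print(sum_d2, round(r_s, 3))
```

This reproduces Σdi² = 22.5 and rs ≈ 0.732, just below the tabulated 0.738.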

9.48

Example

The hypothesis tested is that prices should decrease with distance from the key area of gentrification surrounding the Contemporary Art Museum (CAM, El Raval, Barcelona). The line followed is Transect 2 in the map, with continuous sampling of the price of a 50cl. bottle of water at every convenience store.

9.49

MACBA Barcelona Contemporary Art Museum

9.50

Map

[Figure: map showing the selected transect and the museum.]

9.51

Data

Columns: Convenience Store, Distance from CAM (m), Price of 50cl. bottle (€)

1 50 1.80

2 175 1.20

3 270 2.00

4 375 1.00

5 425 1.00

6 580 1.20

7 710 0.80

8 790 0.60

9 890 1.00

10 980 0.85

First plot the data

Which variable is dependent?

9.52

Scatterplot

[Figure: scatterplot of Price (Euros, vertical axis, 0.5 to 1.9) against Distance from CAM (horizontal axis, 40 to 940).]

9.53

Data

Columns: Convenience Store, Distance from CAM (m), Price of 50cl. bottle (€)

1 50 1.80

2 175 1.20

3 270 2.00

4 375 1.00

5 425 1.00

6 580 1.20

7 710 0.80

8 790 0.60

9 890 1.00

10 980 0.85

Can we assume normality?

Now rank the data

9.54

Rank Price

Ties have been identified

Columns: Convenience Store, Price of 50cl. bottle (€), Rank

3 2 1

1 1.8 2

2 1.2 3

6 1.2 4

4 1 5

5 1 6

9 1 7

10 0.85 8

7 0.8 9

8 0.6 10

9.55

Rank Price

Columns: Convenience Store, Price of 50cl. bottle (€), Rank, True Rank

3 2 1 1

1 1.8 2 2

2 1.2 3 3.5

6 1.2 4 3.5

4 1 5 6

5 1 6 6

9 1 7 6

10 0.85 8 8

7 0.8 9 9

8 0.6 10 10

Page 56: 9.11 Using Statistics To Make Inferences 9 Summary Correlation – Pearson correlation. Spearmans rank correlation. Point Biserial Correlation. Saturday,

9.5656

Rank Distance

There are no ties

Convenience Store

Distance from CAM (m)

Rank

10 980 1

9 890 2

8 790 3

7 710 4

6 580 5

5 425 6

4 375 7

3 270 8

2 175 9

1 50 10

9.57

Rank Distance

Columns: Convenience Store, Distance from CAM (m), Rank, True Rank

10 980 1 1

9 890 2 2

8 790 3 3

7 710 4 4

6 580 5 5

5 425 6 6

4 375 7 7

3 270 8 8

2 175 9 9

1 50 10 10

9.58

Differences

Columns: Convenience Store, Price of 50cl. bottle (€), True Rank Price

1 1.8 2

2 1.2 3.5

3 2 1

4 1 6

5 1 6

6 1.2 3.5

7 0.8 9

8 0.6 10

9 1 6

10 0.85 8

Arrange true rank of price by store identifier

9.59

Differences

Columns: Convenience Store, Price of 50cl. bottle (€), True Rank Price, Distance from CAM (m), True Rank Distance

1 1.8 2 50 10

2 1.2 3.5 175 9

3 2 1 270 8

4 1 6 375 7

5 1 6 425 6

6 1.2 3.5 580 5

7 0.8 9 710 4

8 0.6 10 790 3

9 1 6 890 2

10 0.85 8 980 1

Arrange true rank of distance by store identifier

9.60

Differences

Columns: Convenience Store, Price of 50cl. bottle (€), True Rank Price, Distance from CAM (m), True Rank Distance, di

1 1.8 2 50 10 8

2 1.2 3.5 175 9 5.5

3 2 1 270 8 7

4 1 6 375 7 1

5 1 6 425 6 0

6 1.2 3.5 580 5 1.5

7 0.8 9 710 4 -5

8 0.6 10 790 3 -7

9 1 6 890 2 -4

10 0.85 8 980 1 -7

Calculate the differences between the true ranks

9.61

Differences

Columns: Convenience Store, Price of 50cl. bottle (€), True Rank Price, Distance from CAM (m), True Rank Distance, di, di²

1 1.8 2 50 10 8 64

2 1.2 3.5 175 9 5.5 30.25

3 2 1 270 8 7 49

4 1 6 375 7 1 1

5 1 6 425 6 0 0

6 1.2 3.5 580 5 1.5 2.25

7 0.8 9 710 4 -5 25

8 0.6 10 790 3 -7 49

9 1 6 890 2 -4 16

10 0.85 8 980 1 -7 49

Calculate the square of the differences

9.62

Differences

Columns: Convenience Store, Price of 50cl. bottle (€), True Rank Price, Distance from CAM (m), True Rank Distance, di, di²

1 1.8 2 50 10 8 64

2 1.2 3.5 175 9 5.5 30.25

3 2 1 270 8 7 49

4 1 6 375 7 1 1

5 1 6 425 6 0 0

6 1.2 3.5 580 5 1.5 2.25

7 0.8 9 710 4 -5 25

8 0.6 10 790 3 -7 49

9 1 6 890 2 -4 16

10 0.85 8 980 1 -7 49

Total 285.5

Form the total of the differences squared

9.63

Conclusion

n = 10, Σdi² = 285.5

$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n\left(n^2 - 1\right)} = 1 - \frac{6 \times 285.5}{10\left(10^2 - 1\right)} = 1 - \frac{1713}{10 \times 99} = -0.730$$

n   p = 0.05  p = 0.01
10  0.648     0.794

The tables give one and two tail values.

9.64

Conclusion

n = 10, rcrit = 0.648, rs = -0.730

The critical value for n = 10 at the p = 0.025 level is 0.648. Since the magnitude of rs (0.73) exceeds this, the two-tail significance level is slightly less than 5%.

Apparently there is a relationship.

9.65

Comparison

Recall that rs only matches the conventional correlation for the ranked data if there are no ties.

The previous calculation is repeated using ranked data and the full correlation formula and then tested with software.

9.66

Calculation

n = 10

Convenience Store                       1  2   3 4 5 6   7 8  9 10
x Distance from CAM (m), true rank      10 9   8 7 6 5   4 3  2 1
y Price of 50cl. bottle (€), true rank  2  3.5 1 6 6 3.5 9 10 6 8

Σxi = 10+9+…+1 = 55

Σyi = 2+3.5+…+8 = 55

Σxi² = 10²+9²+…+1² = 385

Σyi² = 2²+3.5²+…+8² = 382.5

Σxiyi = 10×2+9×3.5+…+1×8 = 241

Agree: Σxi = Σyi = 55.

Disagree because of ties: Σxi² = 385 but Σyi² = 382.5.

9.67

Calculation Sxx

n = 10, Σxi = 55, Σyi = 55, Σxi² = 385, Σyi² = 382.5, Σxiyi = 241

$$S_{xx} = \sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2 = 385 - \frac{55^2}{10} = 82.5$$

9.68

Calculation Syy

n = 10, Σxi = 55, Σyi = 55, Σxi² = 385, Σyi² = 382.5, Σxiyi = 241

$$S_{yy} = \sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2 = 382.5 - \frac{55^2}{10} = 80$$

9.69

Calculation Sxy

n = 10, Σxi = 55, Σyi = 55, Σxi² = 385, Σyi² = 382.5, Σxiyi = 241

$$S_{xy} = \sum x_i y_i - \frac{1}{n}\left(\sum x_i\right)\left(\sum y_i\right) = 241 - \frac{55 \times 55}{10} = -61.5$$

9.70

Calculation

n = 10, Σxi = 55, Σyi = 55, Σxi² = 385, Σyi² = 382.5, Σxiyi = 241

$$S_{xx} = \sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2 = 385 - \frac{55^2}{10} = 82.5$$

$$S_{yy} = \sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2 = 382.5 - \frac{55^2}{10} = 80$$

$$S_{xy} = \sum x_i y_i - \frac{1}{n}\left(\sum x_i\right)\left(\sum y_i\right) = 241 - \frac{55 \times 55}{10} = -61.5$$

9.71

Conclusion

$$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{-61.5}{\sqrt{82.5 \times 80}} = -0.757$$

n = 10, rcrit = 0.648, rs = -0.757

n   p = 0.05  p = 0.01
10  0.648     0.794

The tables give one and two tail values. The critical value for n = 10 at the p = 0.025 level is 0.648. Since the magnitude of rs (0.757) exceeds this, the two-tail significance level is slightly less than 5%. Apparently there is a relationship.

Note the slight difference from the earlier value (-0.730), which is due to the ties.
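The two routes to the rank correlation can be compared side by side. This sketch assumes the same ranking convention as the slides (rank 1 for the largest value, ties sharing their mean rank); the function names are illustrative and not from the original.

```python
from math import sqrt

def average_ranks(values):
    """Rank from largest (rank 1) to smallest; ties share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    return [
        (ordered.index(v) + 1 + len(ordered) - ordered[::-1].index(v)) / 2
        for v in values
    ]

def pearson_r(x, y):
    """Pearson correlation from the sums Sxx, Syy and Sxy."""
    n = len(x)
    s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
    s_yy = sum(v * v for v in y) - sum(y) ** 2 / n
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return s_xy / sqrt(s_xx * s_yy)

distance = [50, 175, 270, 375, 425, 580, 710, 790, 890, 980]
price = [1.80, 1.20, 2.00, 1.00, 1.00, 1.20, 0.80, 0.60, 1.00, 0.85]

rx, ry = average_ranks(distance), average_ranks(price)
n = len(rx)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))

r_shortcut = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))  # d² formula
r_ranks = pearson_r(rx, ry)                       # full formula on the ranks

print(sum_d2, round(r_shortcut, 3), round(r_ranks, 3))
```

The d² formula gives about -0.730 while the full formula applied to the ranks gives about -0.757; the gap arises only because the price ranks contain ties.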

9.72

SPSS

Analyze > Correlate > Bivariate

9.73

SPSS

The correlation (-0.76) and p value (p < 0.05) are consistent with our calculation.

Correlations

Spearman's rho
                                   Distance  Price
Distance  Correlation Coefficient  1.000     -.757*
          Sig. (2-tailed)          .         .011
          N                        10        10
Price     Correlation Coefficient  -.757*    1.000
          Sig. (2-tailed)          .011      .
          N                        10        10

*. Correlation is significant at the 0.05 level (2-tailed).

9.74

Point Biserial Correlation

The point biserial correlation coefficient (rpb) is a correlation coefficient used when one variable is dichotomous; ideally it will be “naturally” dichotomous, such as pass/fail (P/F). The point biserial correlation is mathematically equivalent to the Pearson correlation. This can be shown by assigning two distinct numerical values (usually 0/1) to the dichotomous variable.

9.75

Point Biserial Correlation

To calculate rpb, use the dichotomous variable to divide the data set into two groups. Evaluate

MP, the mean score for group P

MF, the mean score for group F

S, the population standard deviation, evaluated for all entries

p, the proportion of those in group P

f, the proportion of those in group F (f = 1 - p)

$$r_{pb} = \frac{M_P - M_F}{S}\sqrt{p f}$$

There is no version of the formula for a case where you only have sample data.
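The formula translates directly into code. The sketch below uses a small hypothetical data set (six made-up scores with 0/1 flags), chosen only to exercise the function; the function name `point_biserial` is likewise an illustrative assumption. Note that `pstdev` is the population standard deviation, as the formula requires.

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(scores, passed):
    """r_pb = (M_P - M_F) / S * sqrt(p * f), with S the population SD of all scores."""
    group_p = [s for s, flag in zip(scores, passed) if flag == 1]
    group_f = [s for s, flag in zip(scores, passed) if flag == 0]
    p = len(group_p) / len(scores)
    f = 1 - p
    return (mean(group_p) - mean(group_f)) / pstdev(scores) * sqrt(p * f)

# Hypothetical data, not taken from the slides.
scores = [30, 38, 24, 22, 31, 33]
passed = [1, 1, 0, 0, 1, 1]

r_pb = point_biserial(scores, passed)
print(round(r_pb, 3))
```

Treating the 0/1 flags as an ordinary numeric variable and computing the Pearson correlation on the same data should give the same value, which is the equivalence the slides refer to.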

9.76

Point Biserial Correlation - Example

The following data is for 165 students attempting 50 multiple-choice questions. The final mark for the examination (out of 50) was

30 38 24 22 31 33 37 38 33 27 35 31 35 42 24 26 34 33 35 22 25 22 38 36 41 40 26 29 43 30 34 28 16 38 34 33

26 32 39 27 12 32 35 17 39 20 18 30 37 17 26 37 21 19 38 25 38 31 21 29 25 26 27 31 33 37 35 26 35 17 22 26

24 21 34 40 32 22 28 24 38 23 17 22 19 33 13 32 17 33 26 15 39 32 22 23 32 19 41 29 33 29 24 19 20 35 31 31

33 37 24 18 38 22 33 29 26 32 27 24 25 27 26 29 29 34 35 38 27 23 35 35 34 26 19 27 33 38 32 25 37 24 39 30

28 25 32 28 33 26 22 26 29 25 32 30 37 29 33 28 22 33 29 27 29

Individual success on the first question (1 pass, 0 fail) was

1 0 1 1 1 1 1 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1 0 1 1 1 0 0 1 0 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 0 0 1 1 0 1 0 1 1 0 1

1 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0 0 0 1 0 1 0 1 1 0 1 0 0 0 0 0 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 1 0 1

0 1 0 0 1 1 1 1 0 0 1 1 0 0 0 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1 0 1 1 0 1 0 0 0 

9.77

Point Biserial Correlation - Example

MP = 30.87 is the mean score for group P

MF = 25.71 is the mean score for group F

S = 6.63 is the population standard deviation, evaluated for all entries

p = .65 is the proportion of those in group P

f = .35 is the proportion of those in group F (f = 1 - p)

$$r_{pb} = \frac{M_P - M_F}{S}\sqrt{p f} = \frac{30.87 - 25.71}{6.63}\sqrt{.65 \times .35} = .37$$

This may be verified by direct calculation.

Correlations

                            Question_1
Total  Pearson Correlation  .372**
       Sig. (2-tailed)      .000
       N                    165

**. Correlation is significant at the 0.01 level (2-tailed).

9.78

Point Biserial Correlation - Example

Since rcrit is 0.153 (use the Calculator) and rpb is 0.37, the result is significant.

A large positive correlation corresponds to what we would expect: students with high scores on the overall test tend to get the item right, and students with low scores on the overall test tend to get the item wrong.

A large negative correlation corresponds to what we would not expect: students who get the item correct tend to do poorly on the overall test, and students who get the item wrong tend to do well on the test.

9.79

Read

Read Howitt and Cramer pages 59-74

Read Howitt and Cramer (e-text) pages 87-124

Read Russo (e-text) pages 176-201

Read Davis and Smith pages 173-192

9.80

Practical 9

This material is available from the module web page.

http://www.staff.ncl.ac.uk/mike.cox

Module Web Page

9.81

Practical 9

This material for the practical is available.

Instructions for the practical: Practical 9

Material for the practical: Practical 9

9.82

Whoops!

"There's this cluster of interrelated findings", said Richard A. Lippa, a professor of psychology at California State University at Fullerton, who has found evidence that in gay men, the hair on the back of the head is more likely to curl counter-clockwise than in straight men. "These are all biological markers that something must have gone on early in development".

Washington Post

5 February 2008

Source

9.83

However!

Dilbert

9.84

Cause and Effect

Recall - Type I error – false positive – conclude that the variable or coefficient is important, but the true state of nature is that it is not.

Firemen cause damage

Storks bring babies

Conclude two drugs differ, when in fact they do not

Correlation does not imply causation! Spurious Correlations

9.85

Big Data Helps Companies Find Some Surprising Correlations

There's a Link Between Sales and Phases of the Moon, Among Other Things, by Deborah Gage, Wall Street Journal, 23 March 2014

9.86

Whoops!