What is correlation? How to compute? How to interpret? 2

Social Statistics: Correlation

Uploaded by damian-foster on 23-Dec-2015


This week

What is correlation? How to compute? How to interpret?

Correlation Coefficients

Correlation: the relationship between two variables, that is, how the value of one variable changes when the value of another variable changes.

A correlation coefficient is a numerical index that reflects the relationship between two variables.
Range: -1 to +1
Bivariate correlation: a correlation between two variables

Correlation Coefficients

Parametric: Pearson product-moment correlation (named after Karl Pearson)

Non-parametric: Spearman's rank correlation; Kendall tau rank correlation coefficient
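To make the parametric/non-parametric distinction concrete, here is a minimal Python sketch using only the standard library. It computes Pearson's r from the raw values, and Spearman's rho as Pearson's r applied to the ranks (tied values get the average of their ranks). Kendall's tau is not shown. The helper names `pearson`, `ranks`, and `spearman` and the sample data are illustrative assumptions, not taken from the slides.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# y always rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 1000]
print(round(pearson(x, y), 3))  # below 1: the relationship is not linear
print(spearman(x, y))           # 1.0: the rank orders agree perfectly
```

Because Spearman's rho only looks at rank order, it is also the usual choice for ordinal data such as ranks.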

Pearson correlation coefficient

Used for two variables that are continuous in nature, e.g., height, age, test score, income.
Not used for discrete or categorical variables, e.g., race, political affiliation, social class, rank.

r_xy is the correlation between variable X and variable Y.

Types of correlation coefficients

Direct (positive) correlation: both variables change in the same direction.
Indirect (negative) correlation: the two variables change in opposite directions.

Types of correlation coefficients

Below is a correlation report for different currency exchange rates on November 13, 2014 (source: Bloomberg Terminal).

-0.8 and 0.5: which is stronger?

Pearson product-moment correlation coefficient

$$r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

r_xy: the correlation coefficient between X and Y
n: the size of the sample
X: the individual's score on the X variable
Y: the individual's score on the Y variable
XY: the product of each X score times its corresponding Y score
X²: the individual X score, squared
Y²: the individual Y score, squared
Σ: the sum over all individuals in the sample
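The computational formula can be checked with a short Python sketch. It builds the five sums (ΣX, ΣY, ΣXY, ΣX², ΣY²) exactly as defined above and plugs them into the formula; the function name `pearson_r` and the sample data are illustrative assumptions.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r via the raw-score (computational) formula."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of scores")
    n = len(xs)
    sum_x = sum(xs)                               # ΣX
    sum_y = sum(ys)                               # ΣY
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # ΣXY
    sum_x2 = sum(x * x for x in xs)               # ΣX²
    sum_y2 = sum(y * y for y in ys)               # ΣY²
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Y is an exact linear function of X (Y = 2X + 1), so r should be 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
print(pearson_r(xs, ys))  # 1.0
```

With small integer scores these sums are exact. For large real-valued data sets, the mean-deviation form (subtracting the means first) is numerically safer, which is why statistical software usually computes it that way.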

Exercise

Calculate the Pearson correlation coefficient for US school enrollment (in thousands) at several time points over the previous 50 years (source: United States Census Bureau).

1. Select two columns of data: are they correlated?
2. What does this correlation mean?

Year   G9-12 Public   G9-12 Private   College-Public   College-Private
1965   11610          1400            3970             1951
1970   13336          1311            6428             2153
1975   14304          1300            8836             2350
1980   13231          1339            9457             2640
1985   12388          1362            9479             2768
1990   11341          1136            10845            2974
1995   12502          1163            11092            3169
2000   13517          1264            11753            3560
2005   14909          1349            13022            4466

Using Excel to calculate

CORREL function or PEARSON function

[Figure: enrollment (thousands) by year, 1960 to 2010, for G9-12 Public, G9-12 Private, College-Public, and College-Private]

Visualizing a correlation

Scatterplot or scattergram

X   Y
2   3
4   2
5   6
6   5
4   3
7   6
8   5
5   4
6   4
7   5
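A scatterplot places one mark per (X, Y) pair. As a rough, text-only stand-in for the chart, here is a Python sketch (standard library only) that plots the ten pairs above and reports their Pearson r. The function name `text_scatter` is illustrative, and the grid approach assumes small non-negative integer scores like these.

```python
from math import sqrt

def text_scatter(xs, ys):
    """Return rows of a character grid with '*' at each (x, y); larger y is printed higher."""
    width, height = max(xs) + 1, max(ys) + 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        grid[y][x] = "*"
    # Reverse the rows so the largest y ends up at the top.
    return [f"{y:>2} |" + "".join(row) for y, row in reversed(list(enumerate(grid)))]

def pearson(xs, ys):
    """Pearson r in mean-deviation form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

X = [2, 4, 5, 6, 4, 7, 8, 5, 6, 7]
Y = [3, 2, 6, 5, 3, 6, 5, 4, 4, 5]

for line in text_scatter(X, Y):
    print(line)
print(f"r = {pearson(X, Y):.2f}")  # r = 0.69
```

The points drift upward from left to right, and r of about 0.69 puts this data in the "strong" band of the usual interpretation table.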

Visualizing a correlation

Direct (positive) correlation

r = 1: a perfect direct (or positive) correlation. In real-life data, correlations of 0.7 or 0.8 may be the highest you will see.

Indirect (or negative) correlation

Strength and direction are both important.

Excel Scatterplot

Four sets of data with the same correlation of 0.816

Linear correlation

Linear correlation means that the relationship between X and Y follows a straight line.

Curvilinear correlation: the relationship follows a curve, e.g., age and memory.

More than 2 variables?

Income   Education   Attitude   Vote
74190    13          1          1
80931    12          3          2
81314    11          4          2
73089    11          5          2
62023    11          3          2
61217    10          4          2
84526    11          5          1
87251    11          4          1
62659    12          5          2
76450    10          6          2
70512    12          7          2
78858    9           6          1
78628    13          7          1
86212    14          8          2
74962    9           8          2
58828    11          9          4
61471    10          8          5
78621    12          7          5
60071    9           8          4

How to calculate the correlation coefficient?
1. CORREL()
2. Correlation in the Data Analysis toolset

More than 2 variables?

Correlation matrix

            Income   Education   Attitude   Vote
Income      1.00     0.35        -0.19      -0.51
Education            1.00        -0.21      -0.20
Attitude                         1.00       0.55
Vote                                        1.00

Excel

Data Analysis tool: Correlation

Meaning of correlation coefficient

Correlation value: any finite number, negative or positive
Correlation coefficient value: -1.00 to +1.00

|r_xy| value   Interpretation
0.8 to 1.0     Very strong relationship (share most things in common)
0.6 to 0.8     Strong relationship (share many things in common)
0.4 to 0.6     Moderate relationship (share something in common)
0.2 to 0.4     Weak relationship (share a little in common)
0.0 to 0.2     Weak or no relationship (share very little or nothing in common)
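The table reads r by its absolute value, while the sign gives the direction. Here is a small Python sketch of that lookup. The slide's ranges share their endpoints, so treating each lower bound as inclusive (e.g., 0.8 counts as "very strong") is an assumption made here, as is the function name `describe`.

```python
def describe(r):
    """Return (strength, direction) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)
    # Lower bounds treated as inclusive (assumption: the slide's ranges overlap at the edges).
    if size >= 0.8:
        strength = "very strong"
    elif size >= 0.6:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    elif size >= 0.2:
        strength = "weak"
    else:
        strength = "weak or none"
    if r > 0:
        direction = "direct (positive)"
    elif r < 0:
        direction = "indirect (negative)"
    else:
        direction = "none"
    return strength, direction

print(describe(-0.8))  # ('very strong', 'indirect (negative)')
print(describe(0.5))   # ('moderate', 'direct (positive)')
```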

Coefficient of determination

Coefficient of determination: the percentage of variance in one variable that is accounted for by the variance in the other variable. It equals the square of the correlation coefficient, r².

$$r_{\text{Time,GPA}} = 0.70 \qquad r^2_{\text{Time,GPA}} = 0.70^2 = 0.49$$

49% of the variance in GPA can be explained by the variance in study time.
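One way to see why r² is a "share of variance" is to fit a least-squares line of Y on X and compare the variation it explains with the total variation in Y. The study time/GPA data behind the 0.70 are not given, so this Python sketch (standard library only) uses the ten X/Y pairs from the scatterplot slide instead.

```python
X = [2, 4, 5, 6, 4, 7, 8, 5, 6, 7]
Y = [3, 2, 6, 5, 3, 6, 5, 4, 4, 5]

n = len(X)
mx, my = sum(X) / n, sum(Y) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(X, Y))
sxx = sum((x - mx) ** 2 for x in X)
syy = sum((y - my) ** 2 for y in Y)  # total variation in Y

r = sxy / (sxx * syy) ** 0.5

# Least-squares line Y' = a + bX.
b = sxy / sxx
a = my - b * mx
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))  # variation left unexplained
explained_share = 1 - ss_residual / syy                           # share of Y's variation explained

print(f"r^2             = {r ** 2:.4f}")
print(f"explained share = {explained_share:.4f}")  # the two agree
```

Here about 48% of the variation in Y is accounted for by X; the remaining 52% is the coefficient of nondetermination.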

Coefficient of nondetermination

The proportion of variance that is not explained, 1 − r², is called the coefficient of nondetermination. (Its square root, √(1 − r²), is known as the coefficient of alienation.)

Correlation (r)   Determination (r²)   Interpretation
0                 0                    No variance shared
0.5               0.25                 25% of variance shared
0.9               0.81                 81% of variance shared
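The table follows directly from r: square it to get the coefficient of determination, and subtract that from 1 to get the coefficient of nondetermination. A minimal Python sketch that regenerates the rows (the helper name `determination_table` is illustrative):

```python
def determination_table(rs):
    """Return (r, r^2, 1 - r^2) for each correlation coefficient in rs."""
    return [(r, r ** 2, 1 - r ** 2) for r in rs]

# Rows from the table, plus r = 0.70 from the study time/GPA example.
for r, det, nondet in determination_table([0.0, 0.5, 0.7, 0.9]):
    print(f"r = {r:.2f}   r^2 = {det:.2f}   1 - r^2 = {nondet:.2f}")
```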

Ice cream and crime

In a small town in Greece, the local police found a direct correlation between ice cream and crime.

Correlation vs. causality

A correlation represents the association between two or more variables.

Correlation by itself says nothing about causality: two variables can be correlated without either one causing the other. Ice cream and crime are correlated, but ice cream does not cause crime.

Correlation vs. causality: Summer

Summer is when people get together. More specifically, casual drinkers and drug users are more likely to go to bars or parties on weekends and evenings than on, say, a Tuesday morning. These people in the social mix, flooding the city's streets and neighborhood bars, feed the peak times for murder, experts say.