Post on 13-Jan-2016

Correlation

Hal Whitehead

BIOL4062/5062

• The correlation coefficient

• Tests

• Non-parametric correlations

• Partial correlation

• Multiple correlation

• Autocorrelation

• Many correlation coefficients

The correlation coefficient

Linked observations: x1,x2,...,xn    y1,y2,...,yn

Mean: x̄ = Σ xi / n    ȳ = Σ yi / n

Variance: S²(x) = Σ(xi-x̄)²/(n-1)    S²(y) = Σ(yi-ȳ)²/(n-1)

Standard Deviation: S(x) = √S²(x)    S(y) = √S²(y)

Covariance: S²(x,y) = Σ(xi-x̄) ∙ (yi-ȳ) / (n-1)

Correlation coefficient (“Pearson” or “product-moment”):

r = {Σ(xi-x̄) ∙ (yi-ȳ) / (n-1)} / {S(x) ∙ S(y)}

r = S²(x,y) / {S(x) ∙ S(y)}

-1 ≤ r ≤ +1

If no linear relationship: r = 0

r²: proportion of variance accounted for by linear regression
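The definitions above translate directly into a few lines of code. The following is a minimal sketch in plain Python; the function name `pearson_r` and the data are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation, built from the definitions above."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired, with at least 2 observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Variances and covariance all use the n-1 divisor
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    return cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))

# Hypothetical paired observations (not from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
print("r =", round(r, 3), " r^2 =", round(r ** 2, 3))
```

Because the n-1 divisors cancel, the same value results from the raw sums of squares and cross-products.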

[Figure: nine example plots, labelled r = -0.01, 0.38, -0.31, 0.95, 0.04, 0.64, -0.46, 0.99 and -0.0]

Tests on Correlation Coefficients

• Assume:

– Independence

– Bivariate Normality

• Then:

z = Ln [(1+r)/(1-r)]/2 is approximately normally distributed with variance 1/(n-3)

And, if ρ (true population value of r) = 0:

r ∙ √(n-2) / √(1-r²) is distributed as Student's t with n-2 degrees of freedom

We can test:

a) r ≠ 0

b) r > 0 or r < 0

c) r = constant

d) r(x,y) = r(z,w)

Also confidence intervals for r
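The z transformation and the t statistic above can be sketched as follows. This uses only the Python standard library (`statistics.NormalDist` for normal quantiles); the function names, and the values r = 0.60 and n = 30, are hypothetical. The test against a constant uses the normal approximation on the z scale, which is one common choice rather than the only one.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """z = Ln[(1+r)/(1-r)]/2, approximately normal with variance 1/(n-3)."""
    return 0.5 * math.log((1 + r) / (1 - r))

def confidence_interval(r, n, level=0.95):
    """Confidence interval for the population correlation, via the z scale."""
    z = fisher_z(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    # tanh is the inverse of the z transformation
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

def t_statistic(r, n):
    """r * sqrt(n-2) / sqrt(1-r^2); Student's t with n-2 df when rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def p_value_vs_constant(r, n, rho0=0.0):
    """Two-sided P for H0: rho = rho0, using the normal approximation for z."""
    z = (fisher_z(r) - fisher_z(rho0)) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical sample values, for illustration only
r, n = 0.60, 30
lo, hi = confidence_interval(r, n)
print("95% C.I.:", round(lo, 2), "to", round(hi, 2))
print("t =", round(t_statistic(r, n), 2), "with", n - 2, "df")
print("P (rho = 0.3):", round(p_value_vs_constant(r, n, rho0=0.3), 3))
```

A one-sided test (r > 0 or r < 0) halves the two-sided P when the sign of r agrees with the alternative.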

Are Whales Battering Rams? (Carrier et al. J. Exp. Biol. 2002)

[Figure: scatterplot of Relative Melon Area against Sexual Size Dimorphism]

r = 0.75

(SE = 0.15)

(95% C.I. 0.47-0.89)

Tests:

r ≠ 0 : P = 0.0001

r > 0 : P = 0.00005

More sexually dimorphic species have relatively larger melons

Why do Large Animals have Large Brains?

(Schoenemann Brain Behav. Evol. 2004)

• Correlations among mammals

– Log brain size with

• Log muscle mass r=0.984

• Log fat mass r=0.942

• Are these significantly different?

t=5.50; df=36; P<0.01 (Hotelling-Williams test)

• Brain mass is more closely related to muscle than fat

[Figure: Brain mass (g) against Fat/Muscle mass (g), logarithmic axes, with separate series for Muscle and Fat]
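Comparing two correlations that share a variable (here, brain size with muscle and with fat) also needs the correlation between the two non-shared variables, which the slide does not report. The sketch below uses Williams' form of the statistic with n-3 degrees of freedom, as it is usually given in textbooks; whether the source used exactly this form is an assumption. The function name and all input values are hypothetical.

```python
import math

def williams_t(r_xy, r_xz, r_yz, n):
    """t statistic and df for H0: rho(x,y) = rho(x,z), both measured on the
    same n cases and sharing variable x (Williams' form, df = n - 3)."""
    # Determinant of the 3 x 3 correlation matrix
    det = 1 - r_xy**2 - r_xz**2 - r_yz**2 + 2 * r_xy * r_xz * r_yz
    r_bar = (r_xy + r_xz) / 2
    denom = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r_yz) ** 3
    t = (r_xy - r_xz) * math.sqrt((n - 1) * (1 + r_yz) / denom)
    return t, n - 3

# Hypothetical correlations and sample size (not the Schoenemann data)
t, df = williams_t(r_xy=0.60, r_xz=0.40, r_yz=0.50, n=50)
print("t =", round(t, 2), "df =", df)
```

With df = n - 3, the df = 36 quoted above would correspond to n = 39; that is an inference, not a figure given in the source.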

Non-Parametric Correlation

• If one variable normally distributed:

– can test r=0 as before

• If neither normally distributed:

– Spearman's rS rank correlation coefficient (replace values by ranks)

or:

– Kendall's τ correlation coefficient

• Use Spearman's when there is less certainty about the close rankings
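Both rank coefficients can be sketched in plain Python. Spearman's rS is the Pearson correlation of the ranks (ties share their average rank). Kendall's τ is computed here as tau-a, the difference between concordant and discordant pairs divided by the total number of pairs, with no tie correction; statistical packages often report the tie-corrected tau-b instead. All names and data are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks 1..n; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(x, y):
    """Spearman's rS: replace values by ranks, then correlate the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

# Hypothetical data: one adjacent pair of y values is out of order
x = [10, 20, 30, 40]
y = [1, 3, 2, 4]
print(round(spearman_rs(x, y), 3), round(kendall_tau_a(x, y), 3))
```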

Are Whales Battering Rams? (Carrier et al. J. Exp. Biol. 2002)

r = 0.75

rS = 0.62

τ = 0.47

[Figure: scatterplot of Relative Melon Area against Sexual Size Dimorphism]

Partial Correlation

• Correlation between X and Y controlling for Z

r(X,Y|Z) = {r(X,Y) - r(X,Z)∙r(Y,Z)} / √{(1 - r(X,Z)²)∙(1 - r(Y,Z)²)}

• Correlation between X and Y controlling for W,Z

r(X,Y|W,Z) = {r(X,Y|W) - r(X,Z|W)∙r(Y,Z|W)} / √{(1 - r(X,Z|W)²)∙(1 - r(Y,Z|W)²)}

n-2-c degrees of freedom

(c is number of control variables)
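The recursion in the two formulas above can be sketched as follows: the second-order partial is obtained by first partialling W out of each pairwise correlation, then applying the first-order formula again. Function names and the correlation values are hypothetical.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation r(X,Y|Z)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def partial_r_two_controls(r_xy, r_xz, r_yz, r_xw, r_yw, r_zw):
    """Second-order partial r(X,Y|W,Z): partial out W first, then Z."""
    r_xy_w = partial_r(r_xy, r_xw, r_yw)
    r_xz_w = partial_r(r_xz, r_xw, r_zw)
    r_yz_w = partial_r(r_yz, r_yw, r_zw)
    return partial_r(r_xy_w, r_xz_w, r_yz_w)

def degrees_of_freedom(n, c):
    """n-2-c, where c is the number of control variables."""
    return n - 2 - c

# Hypothetical pairwise correlations, for illustration only
first = partial_r(r_xy=0.60, r_xz=0.50, r_yz=0.40)
second = partial_r_two_controls(0.50, 0.30, 0.20, 0.40, 0.10, 0.25)
print(round(first, 3), round(second, 3), degrees_of_freedom(40, 2))
```

The order in which the control variables are removed does not change the result.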

Why do Large Animals have Large Brains?

(Schoenemann Brain Behav. Evol. 2004)

• Correlations among mammals

– Log brain size with

Log muscle mass

Controlling for Log body mass

r=0.466

Log fat mass

Controlling for Log body mass

r=-0.299

• Fatter species have relatively smaller brains and more muscular species relatively larger brains

Semi-partial Correlation Coefficient

• Correlation between X & Y controlling Y for Z

r(X,(Y|Z)) = {r(X,Y) - r(X,Z)∙r(Y,Z)} / √(1 - r(Y,Z)²)
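A short sketch of the semi-partial formula, alongside the partial correlation for comparison. The two share a numerator; the semi-partial omits the (1 - r(X,Z)²) term from the denominator, so its magnitude never exceeds that of the partial. Names and values are hypothetical.

```python
import math

def semipartial_r(r_xy, r_xz, r_yz):
    """Semi-partial correlation r(X,(Y|Z)): Z is removed from Y only."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_yz**2)

def partial_r(r_xy, r_xz, r_yz):
    """Partial correlation r(X,Y|Z): Z is removed from both X and Y."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical pairwise correlations
r_xy, r_xz, r_yz = 0.60, 0.50, 0.40
sp = semipartial_r(r_xy, r_xz, r_yz)
p = partial_r(r_xy, r_xz, r_yz)
print(round(sp, 3), round(p, 3))
```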

Are Whales Battering Rams? (Carrier et al. J. Exp. Biol. 2002)

Correlation

r = 0.75

Partial Correlation

r (SSD,MA|L) = 0.73

Semi-partial Correlations

r (SSD,(MA|L)) = 0.69

r ((SSD |L),MA) = 0.71

[Figure: scatterplot matrix of MELAREA, SSD and LENGTH]

Multiple Correlation

Multiple Correlation Coefficient

• Correlation between one dependent variable and its best estimate from a regression on several independent variables:

r(Y∙X1,X2,X3,...)

• Square of multiple correlation coefficient is:

– proportion of variance accounted for by multiple regression
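For the special case of two independent variables, the multiple correlation can be written in closed form from the three pairwise correlations. The sketch below uses that shortcut instead of fitting the regression itself; it assumes the two predictors are not perfectly correlated. The function name and values are hypothetical.

```python
import math

def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation r(Y.X1,X2) and its square, from pairwise r values.

    r_y1, r_y2: correlations of Y with X1 and with X2
    r_12: correlation between X1 and X2 (must not be +1 or -1)
    """
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared), r_squared

# Hypothetical pairwise correlations
R, R2 = multiple_r_two_predictors(r_y1=0.50, r_y2=0.40, r_12=0.30)
print("R =", round(R, 3), " R^2 =", round(R2, 3))
```

When the predictors are uncorrelated (r_12 = 0), R² reduces to the sum of the two squared correlations.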

Multiple Partial Correlation Coefficient

!

Autocorrelation

• Purposes

– Examine time series

– Look at (serial) independence

Data

(e.g. Feeding rate on consecutive days,

plankton biomass at each station on a transect):

1.5 1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7 3.6

Autocorrelation of lag=1 is correlation between:

1.5 1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7

1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7 3.6

r = 0.508

Autocorrelation of lag=2 is correlation between:

1.5 1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9

4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7 3.6

r = -0.053

…….
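The two values quoted above are reproduced by the usual time-series estimator, which takes deviations from the overall series mean and divides the lagged sum of products by the full sum of squares. A plain Pearson correlation of the two overlapping sub-series gives a somewhat higher lag-1 value on these data (roughly 0.62). That the slide used this estimator is an inference from the numbers, not something the source states; the function name is hypothetical.

```python
def autocorrelation(series, lag):
    """Lag-k autocorrelation: sum of lagged products of deviations from the
    overall mean, divided by the total sum of squared deviations."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    dev = [v - mean for v in series]
    total = sum(d * d for d in dev)
    lagged = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return lagged / total

# Data from the slide
data = [1.5, 1.7, 4.3, 5.4, 5.7, 6.2, 3.9, 4.4, 5.2, 4.8, 3.9, 3.7, 3.6]
for lag in (1, 2):
    print(lag, round(autocorrelation(data, lag), 3))
# lag 1 -> 0.508, lag 2 -> -0.053
```

Evaluating the function for each lag in turn gives the values plotted in a correlogram.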

Autocorrelation Plot (Correlogram)

[Figure: correlogram, Correlation (-1.0 to 1.0) against Lag (0 to 15)]

Many Correlation Coefficients: [Behaviour of Sperm Whale Groups]

          NGR25L  SST     SHITR   LSPEED  APROP   SOCV    SHR2    LFMECS  LAERR
NGR25L    1.00
SST       0.12    1.00
SHITR    -0.21   -0.33*   1.00
LSPEED    0.10   -0.28+   0.06    1.00
APROP    -0.15   -0.34*   0.07    0.18    1.00
SOCV     -0.05    0.08   -0.16   -0.01   -0.33*   1.00
SHR2     -0.18   -0.12    0.01   -0.20    0.19   -0.03    1.00
LFMECS    0.08    0.14   -0.13   -0.12   -0.22    0.29+  -0.18    1.00
LAERR    -0.10    0.03   -0.21   -0.24   -0.02    0.24   -0.08    0.23    1.00

Listwise deletion, n=40; + P<0.10; * P<0.05; uncorrected

Expected no. with P<0.10 = 3.6; with P<0.05 = 1.8

Many Correlation Coefficients: [Behaviour of Sperm Whale Groups]

          NGR25L  SST     SHITR   LSPEED  APROP   SOCV    SHR2    LFMECS  LAERR
NGR25L    1.00
SST       0.12    1.00
SHITR    -0.21   -0.33    1.00
LSPEED    0.10   -0.28    0.06    1.00
APROP    -0.15   -0.34    0.07    0.18    1.00
SOCV     -0.05    0.08   -0.16   -0.01   -0.33    1.00
SHR2     -0.18   -0.12    0.01   -0.20    0.19   -0.03    1.00
LFMECS    0.08    0.14   -0.13   -0.12   -0.22    0.29   -0.18    1.00
LAERR    -0.10    0.03   -0.21   -0.24   -0.02    0.24   -0.08    0.23    1.00

Listwise deletion, n=40; + P<0.10; * P<0.05; Bonferroni corrected

P=1.0 for all coefficients
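The arithmetic behind the expected counts and the Bonferroni correction is short enough to sketch. Nine variables give 9∙8/2 = 36 distinct coefficients, so about 3.6 would be expected to reach P<0.10 and 1.8 to reach P<0.05 by chance alone. The Bonferroni adjustment multiplies each P by the number of tests and caps the result at 1. The function names and the example P-values are hypothetical.

```python
def n_pairs(k):
    """Number of distinct correlation coefficients among k variables."""
    return k * (k - 1) // 2

def expected_by_chance(k, alpha):
    """Expected count of coefficients with P < alpha if all nulls are true."""
    return n_pairs(k) * alpha

def bonferroni(p_values):
    """Bonferroni-adjusted P-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

k = 9  # number of variables in the sperm whale table
print(n_pairs(k), round(expected_by_chance(k, 0.10), 1),
      round(expected_by_chance(k, 0.05), 1))

# Hypothetical uncorrected P-values for 36 tests: even P = 0.03 becomes 1.0
adjusted = bonferroni([0.03] + [0.5] * 35)
print(adjusted[0])
```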

Many Correlation Coefficients: [Behaviour of Sperm Whale Groups]

Listwise deletion, n=40; + P<0.10; * P<0.05; uncorrected

          NGR25L  SST     SHITR   LSPEED  APROP   SOCV    SHR2    LFMECS  LAERR
NGR25L    1.00
SST       0.12    1.00
SHITR    -0.21   -0.33*   1.00
LSPEED    0.10   -0.28+   0.06    1.00
APROP    -0.15   -0.34*   0.07    0.18    1.00
SOCV     -0.05    0.08   -0.16   -0.01   -0.33*   1.00
SHR2     -0.18   -0.12    0.01   -0.20    0.19   -0.03    1.00
LFMECS    0.08    0.14   -0.13   -0.12   -0.22    0.29+  -0.18    1.00
LAERR    -0.10    0.03   -0.21   -0.24   -0.02    0.24   -0.08    0.23    1.00

Pairwise deletion, n=59-118; + P<0.10; * P<0.05; uncorrected

          NGR25L  SST     SHITR   LSPEED  APROP   SOCV    SHR2    LFMECS  LAERR
NGR25L    1.00
SST       0.11    1.00
SHITR    -0.17+  -0.46*   1.00
LSPEED    0.05   -0.17    0.05    1.00
APROP    -0.05   -0.20+   0.04    0.31*   1.00
SOCV     -0.00   -0.05   -0.06   -0.02   -0.25*   1.00
SHR2     -0.15   -0.13    0.07   -0.14    0.05    0.01    1.00
LFMECS    0.01    0.07   -0.02   -0.14   -0.25*   0.43*  -0.26+   1.00
LAERR    -0.06    0.06    0.09   -0.27*  -0.20+   0.06   -0.06    0.21+   1.00

Many Correlation Coefficients

• Missing values:

– Listwise deletion (comparability), or

– Pairwise deletion (power)

• P-values:

– Uncorrected: type 1 errors

– Bonferroni, etc.: type 2 errors
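The difference between the two ways of handling missing values can be sketched as follows. Listwise deletion drops any case with a missing value on any variable, so every coefficient rests on the same cases. Pairwise deletion drops a case only for the coefficients that involve its missing variable, so each coefficient uses more data but a different n. Missing values are represented by `None`; all names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation(rows, i, j, deletion="pairwise"):
    """Correlation between columns i and j, returning (r, n used)."""
    if deletion == "listwise":
        # Keep only cases complete on every variable
        kept = [row for row in rows if all(v is not None for v in row)]
    else:
        # Keep cases complete on the two variables being correlated
        kept = [row for row in rows if row[i] is not None and row[j] is not None]
    xs = [row[i] for row in kept]
    ys = [row[j] for row in kept]
    return pearson_r(xs, ys), len(kept)

# Hypothetical cases with three variables; None marks a missing value
rows = [
    (1.0, 2.0, 5.0),
    (2.0, 1.5, None),
    (3.0, 3.5, 4.0),
    (4.0, None, 3.5),
    (5.0, 4.5, 2.0),
    (6.0, 6.0, None),
]
print(correlation(rows, 0, 1, "listwise"))
print(correlation(rows, 0, 1, "pairwise"))
```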

Beware!

Correlation ≠ Causation

[Figure: series of diagrams relating variables Y1, Y2, ... Y6]
