Lecture 5: ANOVA and Correlation

Ani Manichaikul (amanicha@jhsph.edu)

23 April 2007



Comparing Multiple Groups

Continuous data: comparing means

Analysis of variance

Binary data: comparing proportions

Pearson’s chi-square tests for r × 2 tables: independence, goodness of fit, homogeneity

Categorical data: r × c tables

Pearson chi-square tests

Odds ratio and relative risk


ANOVA: Definition

Statistical technique for comparing means for multiple populations

Partitioning the total variation in a data set into components defined by specific sources

ANOVA = ANalysis Of VAriance


ANOVA: Concepts

Estimate group means

Assess magnitude of variation attributable to specific sources

Extension of 2-sample t-test to multiple groups

Population model

Sample model: estimates, standard errors

Partition of variability


Types of ANOVA

One-way ANOVA

One factor — e.g. smoking status

Two-way ANOVA

Two factors — e.g. gender and smoking status

Three-way ANOVA

Three factors — e.g. gender, smoking and beer


Emphasis

One-way ANOVA is an extension of the t-test to 3 or more samples

Focuses analysis on group differences

Two-way ANOVA (and higher) focuses on the interaction of factors

Does the effect due to one factor change as the level of another factor changes?


ANOVA Rationale I

Variation in all observations = Variation between each observation and its group mean + Variation between each group mean and the overall mean

In other words,

Total sum of squares = Within-group sum of squares + Between-groups sum of squares


ANOVA Rationale II

In shorthand:

SST = SSW + SSB

If the group means are not very different, the variation between them and the overall mean (SSB) will not be much more than the variation between the observations within a group (SSW)
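A minimal Python sketch of the partition SST = SSW + SSB. The three groups and their observations are made up for illustration and do not come from the lecture.

```python
# Hypothetical observations for three groups (not from the lecture).
groups = {
    "current": [4.0, 5.0, 6.0],
    "former": [6.0, 7.0, 8.0],
    "never": [9.0, 10.0, 11.0],
}


def mean(values):
    return sum(values) / len(values)


all_obs = [x for obs in groups.values() for x in obs]
grand_mean = mean(all_obs)

# Total sum of squares: each observation vs. the overall mean.
sst = sum((x - grand_mean) ** 2 for x in all_obs)
# Within-group sum of squares: each observation vs. its own group mean.
ssw = sum((x - mean(obs)) ** 2 for obs in groups.values() for x in obs)
# Between-groups sum of squares: each group mean vs. the overall mean,
# weighted by the group size.
ssb = sum(len(obs) * (mean(obs) - grand_mean) ** 2 for obs in groups.values())

print(sst, ssw, ssb)
```

For these made-up numbers the within-group and between-groups pieces add up to the total, which is the identity the shorthand expresses.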


ANOVA: One-Way


MSW

We can pool the estimates of σ² across groups and use an overall estimate for the population variance:

$$\text{Variation within a group} = \hat{\sigma}^2_W = \frac{SSW}{N - k} = MSW$$

MSW is called the “within groups mean square”


MSB

We can also look at systematic variation among groups

$$\text{Variation between groups} = \hat{\sigma}^2_B = \frac{SSB}{k - 1} = MSB$$


An ANOVA table

Suppose there are k groups (e.g. if smoking status has categories current, former or never, then k = 3)

We calculate our test statistic using the sums of squares as follows:


Hypothesis testing with ANOVA

In performing ANOVA, we may want to ask: is there truly a difference in means across groups?

Formally, we can specify the hypotheses:

H0 : µ1 = µ2 = · · · = µk

Ha : at least one of the µi's is different

The null hypothesis specifies a global relationship

If the result of the test is significant, then perform individual comparisons


Goal of the comparisons

Compare the two variability estimates, MSW and MSB

If $F_{obs} = \frac{MSB}{MSW} = \frac{\hat{\sigma}^2_B}{\hat{\sigma}^2_W}$ is small, then variability between groups is negligible compared to variation within groups

⇒ The grouping does not explain much variation in the data
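The comparison of the two mean squares can be sketched in Python as follows. The function name `one_way_f` and both data sets are hypothetical, chosen only to contrast well-separated group means with identical ones.

```python
def one_way_f(groups):
    """Return (MSB, MSW, F) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (k - 1)        # between-groups mean square
    msw = ssw / (n_total - k)  # within-groups mean square
    return msb, msw, msb / msw


# Hypothetical data (not from the lecture).
well_separated = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
same_means = [[4.0, 7.0, 10.0], [5.0, 7.0, 9.0], [6.0, 7.0, 8.0]]

print(one_way_f(well_separated))
print(one_way_f(same_means))
```

When every group has the same sample mean, SSB is zero and so is F; when the group means are far apart relative to the spread inside each group, F is large.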


The F-statistic

For our observations, we assume X ∼ N(µgp, σ²), where

$$\mu_{gp} = E(X \mid gp) = \beta_0 + \beta_1 \cdot I(\text{group}=2) + \beta_2 \cdot I(\text{group}=3) + \cdots$$

and I(group=i) is an indicator denoting whether or not each individual is in the ith group

Note: we have assumed the same variance σ² for all groups; it is important to check this assumption

Under these assumptions, we know the null distribution of the statistic F = MSB/MSW

The distribution is called an F-distribution


The F-distribution

Remember that a χ² distribution is always specified by its degrees of freedom

An F-distribution is the distribution of the quotient of two independent χ² random variables, each divided by its degrees of freedom

When we specify an F-distribution, we must state two parameters, which correspond to the degrees of freedom for the two χ² distributions

If $X_1 \sim \chi^2_{df_1}$ and $X_2 \sim \chi^2_{df_2}$ are independent, we write:

$$\frac{X_1/df_1}{X_2/df_2} \sim F_{df_1,\,df_2}$$


Back to the hypothesis test . . .

Knowing the null distribution of MSB/MSW, we can define a decision rule to test the hypothesis for ANOVA:

Reject H0 if $F \ge F_{\alpha;\,k-1,\,N-k}$

Fail to reject H0 if $F < F_{\alpha;\,k-1,\,N-k}$


ANOVA: F-tests I


ANOVA: F-tests II


Example: ANOVA for HDL

Study design: Randomized controlled trial

132 men randomized to one of three groups: Diet + exercise, Diet, Control

Follow-up one year later:

119 men remaining in study

Outcome: mean change in plasma levels of HDL cholesterol from baseline to one-year follow-up in the three groups


Model for HDL outcomes

We model the means for each group as follows:

µc = E (HDL|gp = c) = mean change in control group

µd = E (HDL|gp = d) = mean change in diet group

µde = E (HDL|gp = de) = mean change in diet and exercise group

We could also write the model as

E (HDL|gp) = β0 + β1I (gp = d) + β2I (gp = de)

Recall that I(gp = d) and I(gp = de) are 0/1 group indicators


HDL ANOVA Table

We obtain the following results from the HDL experiment:


HDL ANOVA results

F-test

H0 : µc = µd = µde (or H0 : β1 = β2 = 0)

Ha : at least one mean is different from the others

Test statistic

Fobs = 13

df1 = k − 1 = 3 − 1 = 2

df2 = N − k = 119 − 3 = 116


HDL ANOVA Conclusions

Rejection region: $F > F_{0.05;\,2,\,116} = 3.07$

Since Fobs = 13.0 > 3.07, we reject H0

We conclude that at least one of the group means is different from the others
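As a rough check of the critical value, the sketch below uses a closed form for the F-distribution that holds only when the numerator degrees of freedom equal 2, which happens to be the case here: P(F > f) = (1 + 2f/df2)^(−df2/2). The helper names are hypothetical; for other numerator degrees of freedom a statistical table or library would be needed.

```python
def f_upper_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with 2 and df2 degrees of freedom.

    Closed form valid only when the numerator df is 2.
    """
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


def f_critical_df1_2(alpha, df2):
    """Upper-alpha critical value, again only for numerator df = 2."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


n_total, k = 119, 3            # men remaining in the study, number of groups
df1, df2 = k - 1, n_total - k  # 2 and 116
f_obs = 13.0

critical = f_critical_df1_2(0.05, df2)
p_value = f_upper_tail_df1_2(f_obs, df2)

print(df1, df2, round(critical, 2), p_value)
print("reject H0" if f_obs > critical else "fail to reject H0")
```

The computed critical value rounds to 3.07, in line with the rejection region stated above, and the observed statistic of 13.0 falls well inside it.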


Which groups are different?

We might proceed to make individual comparisons

Conduct two-sample t-tests for each pair of groups:

$$t = \frac{\hat{\theta} - \theta_0}{SE(\hat{\theta})} = \frac{\bar{X}_i - \bar{X}_j - 0}{\sqrt{\frac{s_p^2}{n_i} + \frac{s_p^2}{n_j}}}$$


Multiple Comparisons

Performing individual comparisons requires multiple hypothesis tests

If α = 0.05 for each comparison, each comparison has a 5% chance of being falsely called significant when its null hypothesis is true

Overall, the probability of at least one Type I error is elevated above 5%

Question: How can we address this multiple comparisons issue?
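To get a feel for how much the overall error rate can grow, here is a small Python sketch for three comparisons. Treating the tests as independent is an assumption made purely for illustration (pairwise comparisons that share groups are not independent); the bound m·α holds regardless.

```python
alpha = 0.05
m = 3  # number of pairwise comparisons among three groups

# If the m tests were independent (an illustrative assumption), the chance
# of at least one false positive when all null hypotheses are true would be:
fwer_if_independent = 1 - (1 - alpha) ** m

# Upper bound on that chance, valid with or without independence:
fwer_upper_bound = min(1.0, m * alpha)

print(round(fwer_if_independent, 4), round(fwer_upper_bound, 4))
```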


Bonferroni adjustment

A possible correction for multiple comparisons

Test each hypothesis at level α∗ = (α/3) = 0.0167

Adjustment ensures overall Type I error rate does not exceed α = 0.05

However, this adjustment may be too conservative


Multiple comparisons α

Hypothesis                          α∗ = α/3
H0 : µc = µd (or β1 = 0)            0.0167
H0 : µc = µde (or β2 = 0)           0.0167
H0 : µd = µde (or β1 − β2 = 0)      0.0167

Overall α = 0.05


HDL: Pairwise comparisons I

Control and Diet groups

H0 : µc = µd (or β1 = 0)

$$t = \frac{-0.05 - 0.02}{\sqrt{\frac{0.028}{40} + \frac{0.028}{40}}} = -1.87$$

p-value = 0.06


HDL: Pairwise comparisons II

Control and Diet + exercise groups

H0 : µc = µde (or β2 = 0)

$$t = \frac{-0.05 - 0.14}{\sqrt{\frac{0.028}{40} + \frac{0.028}{39}}} = -5.05$$

p-value = 4.4 × 10⁻⁷


HDL: Pairwise comparisons III

Diet and Diet + exercise groups

H0 : µd = µde (or β1 − β2 = 0)

$$t = \frac{0.02 - 0.14}{\sqrt{\frac{0.028}{40} + \frac{0.028}{39}}} = -3.19$$

p-value = 0.0014
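The three pairwise statistics can be recomputed with a short Python sketch. The group means, group sizes and pooled variance are read off the pairwise-comparison slides, with the diet-group mean taken as 0.02 as in the control-versus-diet comparison. Using a standard normal reference distribution for the two-sided p-values is an assumption; it appears to match the reported values, and a t distribution with 116 degrees of freedom would give slightly larger ones.

```python
import math

# Values read off the pairwise-comparison slides.
means = {"c": -0.05, "d": 0.02, "de": 0.14}  # mean HDL change per group
sizes = {"c": 40, "d": 40, "de": 39}         # men remaining per group
pooled_var = 0.028                           # pooled variance s_p^2


def pairwise_t(a, b):
    """t statistic for H0: mu_a = mu_b using the pooled variance."""
    se = math.sqrt(pooled_var / sizes[a] + pooled_var / sizes[b])
    return (means[a] - means[b]) / se


def two_sided_p_normal(t):
    """Two-sided p-value under a standard normal reference (assumption)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


for a, b in [("c", "d"), ("c", "de"), ("d", "de")]:
    t = pairwise_t(a, b)
    print(a, "vs", b, round(t, 2), two_sided_p_normal(t))
```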


Bonferroni corrected p-values

Hypothesis        p-value        adjusted p-value
H0 : µc = µd      0.06           0.18
H0 : µc = µde     4.4 × 10⁻⁷     1.3 × 10⁻⁶
H0 : µd = µde     0.0014         0.0042

Overall α = 0.05

Conclusion: Significant difference in HDL change for DE group compared to other groups


Two-way ANOVA

Uses the same idea as one-way ANOVA by partitioning variability

Allows us to look at interaction of factors

Does the effect due to one factor change as the level of another factor changes?


Example: Public health students’ medical expenditures

Study design: In an observational study, total medical expenditures and various demographic characteristics were recorded for 200 public health students

Goal: determine how gender and smoking status affect total medical expenditures in this population


Example: Set-up

Y = Total medical expenditures

F = Indicator of Female: 1 if Gender = Female, 0 otherwise

S = Indicator of Smoking: 1 if smoked 100 cigarettes or more, 0 otherwise


Interaction model

We assume the model

Y ∼ N(µ, σ²)

where µ = E(Y) = β0 + β1F + β2S + β3F · S

What are the interpretations of β0, β1, β2, and β3?


Two-way ANOVA: Interactions

Mean Model

µ = E (Y ) = β0 + β1F + β2S + β3F · S

Gender     Smoker: No     Smoker: Yes
Male       β0             β0 + β2
Female     β0 + β1        β0 + β1 + β2 + β3


Mean Model

E (Expenditure|Male, non-smoker) = β0 + β1 · 0 + β2 · 0 + β3 · 0

= β0

E (Expenditure|Female, non-smoker) = β0 + β1 · 1 + β2 · 0 + β3 · 0

= β0 + β1

E (Expenditure|Male, Smoker) = β0 + β1 · 0 + β2 · 1 + β3 · 0

= β0 + β2

E (Expenditure|Female, Smoker) = β0 + β1 · 1 + β2 · 1 + β3 · 1

= β0 + β1 + β2 + β3


Medical Expenditures: ANOVA table

Source of variation        Sum of squares    df     Mean square    F        p-value
Model (between groups)     1.7 × 10⁹         3      5.6 × 10⁸      28.11    < 0.001
Error (within groups)      3.9 × 10⁹         196    2.0 × 10⁷
Total                      5.6 × 10⁹         199


Medical Expenditures: Results

Overall model F-test:

H0 : β1 = β2 = β3 = 0

Ha : At least one group is different

Test statistic:

Fobs = 28.11

df1 = k − 1 = 4 − 1 = 3

df2 = N − k = 200 − 4 = 196

p-value < 0.001


Medical Expenditures: Overall Conclusions

The medical expenditures are different in at least one of the groups

Now we can figure out which ones. . .


Medical Expenditures: Two-way ANOVA I

Table of coefficient estimates

Coefficient            Estimate    Standard error
β0 (baseline)          5049        597
β1 (female effect)     1784        765
β2 (smoker effect)     907         1062
β3 (female*smoke)      6239        1422


Medical Expenditures: Two-way ANOVA II

Test statistics and confidence intervals

Coefficient    t       P > |t|    95% confidence interval
β0             8.45    < 0.001    (3870, 6228)
β1             2.33    0.021      (276, 3292)
β2             0.85    0.394      (−1187, 3001)
β3             4.39    < 0.001    (3434, 9043)
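Each t statistic is the estimate divided by its standard error, and each interval is the estimate plus or minus a t quantile times the standard error. The Python sketch below recomputes them from the rounded estimates in the coefficient table. The quantile 1.972 (roughly the 97.5th percentile of a t distribution with 196 degrees of freedom) is an approximation supplied here, not a value from the slides, so small discrepancies from the tabulated results are expected.

```python
# Estimates and standard errors from the coefficient table.
coefficients = {
    "b0": (5049, 597),
    "b1": (1784, 765),
    "b2": (907, 1062),
    "b3": (6239, 1422),
}

# Assumed 97.5th percentile of t with 196 df (approximate value).
T_QUANTILE = 1.972


def t_and_ci(estimate, std_error, quantile=T_QUANTILE):
    """Return (t ratio, lower limit, upper limit) for one coefficient."""
    t_ratio = estimate / std_error
    half_width = quantile * std_error
    return t_ratio, estimate - half_width, estimate + half_width


for name, (estimate, std_error) in coefficients.items():
    t_ratio, lower, upper = t_and_ci(estimate, std_error)
    print(name, round(t_ratio, 2), round(lower), round(upper))
```

An interval that excludes zero corresponds to a coefficient that is significant at the 5% level.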


Medical Expenditures: Group-wise Conclusions

In this population, an average male non-smoker spends about $5000 on medical costs per year

Males who smoked were estimated as having spent about $900 more than non-smokers, but this difference was not found to be statistically significant

Female non-smokers spent about $1800 more than their non-smoking male counterparts

Female smokers spent about $8900 (= β1 + β2 + β3) more than non-smoking males
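The group-wise figures can be reproduced by plugging the coefficient estimates into the mean model. A minimal Python sketch, with a hypothetical function name `expected_expenditure`:

```python
# Coefficient estimates from the two-way ANOVA coefficient table.
b0, b1, b2, b3 = 5049, 1784, 907, 6239


def expected_expenditure(female, smoker):
    """Mean model b0 + b1*F + b2*S + b3*F*S with 0/1 indicators."""
    return b0 + b1 * female + b2 * smoker + b3 * female * smoker


baseline = expected_expenditure(0, 0)  # male non-smokers
for female in (0, 1):
    for smoker in (0, 1):
        mu = expected_expenditure(female, smoker)
        print(f"female={female} smoker={smoker} mean={mu} "
              f"difference from baseline={mu - baseline}")
```

The interaction term b3 only enters for female smokers, which is why their difference from baseline is the sum of all three non-intercept coefficients.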


Association and Correlation I

Association

Express the relationship between two variables

Can be measured in different ways, depending on the nature of the variables

For now, we’ll focus on continuous variables (e.g. height, weight)

Important note: association does not imply causation


Association and Correlation II

Describing the relationship between two continuous variables

Correlation analysis

Measures strength of relationship between two variables

Specifies direction of relationship

Regression analysis

Concerns prediction or estimation of outcome variable, based on value of another variable (or variables)


Correlation analysis

Plot the data (or have a computer do so)

Visually inspect the relationship between two continuous variables

Is there a linear relationship (correlation)?

Are there outliers?

Are the distributions skewed?


Correlation Coefficient I

Measures the strength and direction of the linear relationship between two variables X and Y

Population correlation coefficient,

$$\rho = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X) \cdot \mathrm{var}(Y)}} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sqrt{E[(X - \mu_X)^2] \cdot E[(Y - \mu_Y)^2]}}$$


Correlation Coefficient II

The correlation coefficient, ρ, takes values between -1 and +1

-1: Perfect negative linear relationship

0: No linear relationship

+1: Perfect positive linear relationship


Correlation Coefficient III

Sample correlation coefficient:

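Obtained by plugging sample estimates into the population correlation coefficient

$$r = \frac{\text{sample cov}(X, Y)}{\sqrt{s_X^2 \cdot s_Y^2}} = \frac{\frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}}{\sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \cdot \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}}$$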
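A minimal Python sketch of the plug-in formula, computing the sample covariance and the two sample variances with the same n − 1 divisor. The paired height and weight values are hypothetical and serve only to exercise the function.

```python
import math


def sample_correlation(xs, ys):
    """Pearson sample correlation r from paired observations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired measurements (not from the lecture).
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weights_kg = [52.0, 58.0, 69.0, 77.0, 90.0]

r_metric = sample_correlation(heights_cm, weights_kg)
# Rescaling the units (cm to inches, kg to pounds) leaves r unchanged.
r_imperial = sample_correlation(
    [h / 2.54 for h in heights_cm],
    [w * 2.20462 for w in weights_kg],
)
print(r_metric, r_imperial)
```

Because the n − 1 factors cancel between numerator and denominator, the same value results if all three sums are left undivided.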


Correlation Coefficient IV

Plot standardized Y versus standardized X

Observe an ellipse (elongated circle)

Correlation is the slope of the major axis


Correlation Notes

Other names for r: Pearson correlation coefficient, product-moment correlation coefficient

Characteristics of r:

Measures *linear* association

The value of r is independent of the units used to measure the variables

The value of r is sensitive to outliers

r² tells us what proportion of variation in Y is explained by the linear relationship with X


Several levels of correlation


Examples of the Correlation Coefficient I

Perfect positive correlation, r ≈ 1


Examples of the Correlation Coefficient II

Perfect negative correlation, r ≈ -1


Examples of the Correlation Coefficient III

Imperfect positive correlation, 0 < r < 1

Examples of the Correlation Coefficient IV

Imperfect negative correlation, -1 < r < 0

Examples of the Correlation Coefficient V

No relation, r ≈ 0


Examples of the Correlation Coefficient VI

Some relation but little *linear* relationship, r ≈ 0


Association and Causality

In general, association between two variables means there is some form of relationship between them

The relationship is not necessarily causal

Association does not imply causation, no matter how much we would like it to

Example: Hot days, ice cream, drowning


Sir Bradford Hill’s Criteria for Causality I

Strength: magnitude of association

Consistency of association: repeated observation of the association in different situations

Specificity: uniqueness of the association

Temporality: cause precedes effect


Sir Bradford Hill’s Criteria for Causality II

Biologic gradient: dose-response relationship

Biologic plausibility: known mechanisms

Coherence: makes sense based on other known facts

Experimental evidence: from designed (randomized) experiments

Analogy: with other known associations
