![Page 2: Lecture 5: AN O V A and Co rrelat io nam3xa/BiostatII/slides/lecture5.pdfLecture 5: AN O V A and Co rrelat io n An i Ma nicha ikul amanicha@jhsph.edu 23 Ap ril 2007 ... Su pp os e](https://reader033.vdocuments.net/reader033/viewer/2022042310/5ed82e7d0fa3e705ec0dfcd3/html5/thumbnails/2.jpg)
Comparing Multiple Groups
Continuous data: comparing means
Analysis of variance
Binary data: comparing proportions
Pearson’s Chi-square tests for r × 2 tables
Independence
Goodness of Fit
Homogeneity
Categorical data: r × c tables
Pearson chi-square tests
Odds ratio and relative risk
ANOVA: Definition
Statistical technique for comparing means for multiple populations
Partitioning the total variation in a data set into components defined by specific sources
ANOVA = ANalysis Of VAriance
ANOVA: Concepts
Estimate group means
Assess magnitude of variation attributable to specific sources
Extension of 2-sample t-test to multiple groups
Population model
Sample model: estimates, standard errors
Partition of variability
Types of ANOVA
One-way ANOVA
One factor — e.g. smoking status
Two-way ANOVA
Two factors — e.g. gender and smoking status
Three-way ANOVA
Three factors — e.g. gender, smoking and beer
Emphasis
One-way ANOVA is an extension of the t-test to 3 or more samples
focus analysis on group differences
Two-way ANOVA (and higher) focuses on the interaction of factors
Does the effect due to one factor change as the level of another factor changes?
ANOVA Rationale I
Variation in all observations = Variation between each observation and its group mean + Variation between each group mean and the overall mean
In other words,
Total sum of squares = Within-group sum of squares + Between-groups sum of squares
ANOVA Rationale II
In shorthand:
SST = SSW + SSB
If the group means are not very different, the variation between them and the overall mean (SSB) will not be much more than the variation between the observations within a group (SSW)
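The decomposition can be checked numerically. The following is a minimal sketch on made-up data (the three groups and their values are hypothetical, chosen only so the arithmetic is easy to follow); it computes the three sums of squares directly from their definitions.

```python
# Toy data: three hypothetical groups (values are made up for illustration).
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 8.0, 9.0],
    "C": [1.0, 2.0, 3.0],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)

# SST: squared distance of every observation from the overall mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)

# SSW: squared distance of every observation from its own group mean.
ssw = 0.0
# SSB: squared distance of each group mean from the overall mean,
# weighted by the group size.
ssb = 0.0
for values in groups.values():
    group_mean = sum(values) / len(values)
    ssw += sum((x - group_mean) ** 2 for x in values)
    ssb += len(values) * (group_mean - grand_mean) ** 2

print(f"SST = {sst:.2f}, SSW = {ssw:.2f}, SSB = {ssb:.2f}")
```

Here the group means (5, 8 and 2) are far apart relative to the spread inside each group, so SSB is much larger than SSW.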
ANOVA: One-Way
MSW
We can pool the estimates of σ² across groups and use an overall estimate for the population variance:
$$\text{Variation within a group} = \hat{\sigma}^2_W = \frac{SSW}{N - k} = MSW$$
MSW is called the “within groups mean square”
MSB
We can also look at systematic variation among groups
$$\text{Variation between groups} = \hat{\sigma}^2_B = \frac{SSB}{k - 1} = MSB$$
An ANOVA table
Suppose there are k groups (e.g. if smoking status has categories current, former or never, then k = 3)
We calculate our test statistic using the sum of square values as follows:
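For reference, the standard one-way ANOVA table can be written with the quantities defined so far. This is the usual textbook layout, with N denoting the total sample size; it is a sketch and not necessarily identical to the table shown in the lecture.

```latex
\begin{array}{l|c|c|c|c}
\text{Source of variation} & \text{Sum of squares} & \text{df} & \text{Mean square} & F \\ \hline
\text{Between groups} & SSB & k-1 & MSB = \dfrac{SSB}{k-1} & \dfrac{MSB}{MSW} \\
\text{Within groups}  & SSW & N-k & MSW = \dfrac{SSW}{N-k} & \\
\text{Total}          & SST & N-1 &                        &
\end{array}
```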
Hypothesis testing with ANOVA
In performing ANOVA, we may want to ask: is there truly a difference in means across groups?
Formally, we can specify the hypotheses:
H0 : µ1 = µ2 = · · · = µk
Ha : at least one of the µi’s is different
The null hypothesis specifies a global relationship
If the result of the test is significant, then perform individual comparisons
Goal of the comparisons
Compare the two variability estimates, MSW and MSB
If $F_{obs} = \dfrac{MSB}{MSW} = \dfrac{\hat{\sigma}^2_B}{\hat{\sigma}^2_W}$ is small,
then variability between groups is negligible compared to variation within groups
⇒ The grouping does not explain much variation in the data
The F-statistic
For our observations, we assume X ∼ N(µgp, σ²), where
$$\mu_{gp} = E(X \mid gp) = \beta_0 + \beta_1 \cdot I(\text{group}=2) + \beta_2 \cdot I(\text{group}=3) + \cdots$$
and I(group=i) is an indicator denoting whether or not each individual is in the ith group
Note: we have assumed the same variance σ² for all groups; it is important to check this assumption
Under these assumptions, we know the null distribution of the statistic F = MSB/MSW
The distribution is called an F-distribution
The F-distribution
Remember that a χ² distribution is always specified by its degrees of freedom
An F-distribution is the distribution of the quotient of two independent χ² random variables, each divided by its degrees of freedom
When we specify an F-distribution, we must state two parameters, which correspond to the degrees of freedom of the two χ² distributions
If $X_1 \sim \chi^2_{df_1}$ and $X_2 \sim \chi^2_{df_2}$ are independent, we write:
$$\frac{X_1/df_1}{X_2/df_2} \sim F_{df_1,\,df_2}$$
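The construction can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the degrees of freedom (3 and 20), the seed and the number of draws are arbitrary choices, and each χ² draw is built as a sum of squared standard normal draws. It compares the simulated mean with the known mean of an F-distribution, df2/(df2 − 2).

```python
import random

def chi_square_draw(df, rng):
    """One chi-square draw: the sum of df squared standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

def f_draw(df1, df2, rng):
    """One F draw: ratio of two independent chi-squares, each divided by its df."""
    return (chi_square_draw(df1, rng) / df1) / (chi_square_draw(df2, rng) / df2)

rng = random.Random(2007)           # fixed seed so the run is repeatable
df1, df2 = 3, 20                    # illustrative degrees of freedom
draws = [f_draw(df1, df2, rng) for _ in range(20000)]

simulated_mean = sum(draws) / len(draws)
theoretical_mean = df2 / (df2 - 2)  # mean of F(df1, df2), valid for df2 > 2
print(f"simulated mean = {simulated_mean:.3f}, theory = {theoretical_mean:.3f}")
```

Every draw is positive and the distribution is right-skewed, which is why only the upper tail is used for the ANOVA test.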
Back to the hypothesis test . . .
Knowing the null distribution of MSB/MSW, we can define a decision rule to test the hypothesis for ANOVA:
Reject H0 if $F \ge F_{\alpha;\,k-1,\,N-k}$
Fail to reject H0 if $F < F_{\alpha;\,k-1,\,N-k}$
ANOVA: F-tests I
ANOVA: F-tests II
Example: ANOVA for HDL
Study design: Randomized controlled trial
132 men randomized to one of:
Diet + exercise
Diet
Control
Follow-up one year later:
119 men remaining in study
Outcome: mean change in plasma levels of HDL cholesterol from baseline to one-year follow-up in the three groups
Model for HDL outcomes
We model the means for each group as follows:
µc = E (HDL|gp = c) = mean change in control group
µd = E (HDL|gp = d) = mean change in diet group
µde = E (HDL|gp = de) = mean change in diet and exercise group
We could also write the model as
E (HDL|gp) = β0 + β1I (gp = d) + β2I (gp = de)
Recall that I(gp = d) and I(gp = de) are 0/1 group indicators
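The two ways of writing the model carry the same information: with control as the reference group, β0 = µc, β1 = µd − µc and β2 = µde − µc. A minimal sketch, assuming group means of about −0.05, 0.02 and 0.14 (the values that appear in the pairwise comparisons later in the lecture; the function name is hypothetical):

```python
# Group mean changes in HDL, read off the pairwise-comparison slides.
mean_control = -0.05
mean_diet = 0.02
mean_diet_exercise = 0.14

# With control as the reference group:
beta0 = mean_control                        # E(HDL | gp = c)
beta1 = mean_diet - mean_control            # diet vs. control
beta2 = mean_diet_exercise - mean_control   # diet + exercise vs. control

def expected_hdl_change(group):
    """Evaluate beta0 + beta1*I(gp = d) + beta2*I(gp = de) for one group label."""
    is_diet = 1 if group == "d" else 0
    is_diet_exercise = 1 if group == "de" else 0
    return beta0 + beta1 * is_diet + beta2 * is_diet_exercise

for group in ("c", "d", "de"):
    print(group, round(expected_hdl_change(group), 2))
```

Testing β1 = β2 = 0 is therefore the same as testing that all three group means are equal.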
HDL ANOVA Table
We obtain the following results from the HDL experiment:
HDL ANOVA results
F-test
H0 : µc = µd = µde (or H0 : β1 = β2 = 0)
Ha : at least one mean is different from the others
Test statistic
Fobs = 13
df1 = k − 1 = 3 − 1 = 2
df2 = N − k = 119 − 3 = 116
HDL ANOVA Conclusions
Rejection region: F > F0.05;2,116 = 3.07
Since Fobs = 13.0 > 3.07, we reject H0
We conclude that at least one of the group means is different from the others
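The critical value can be reproduced without tables in this particular case. When the numerator degrees of freedom equal 2, the upper tail of the F-distribution has the closed form P(F ≥ x) = (1 + 2x/df2)^(−df2/2); the sketch below relies on that special case only (the function names are hypothetical, and other values of df1 would need a statistics package or tables).

```python
def f_upper_tail_df1_equals_2(x, df2):
    """P(F >= x) for an F distribution with df1 = 2 and df2 denominator df.

    For df1 = 2 the tail has the closed form (1 + 2x/df2) ** (-df2/2);
    other df1 values need a statistics library or tables.
    """
    return (1.0 + 2.0 * x / df2) ** (-df2 / 2.0)

def f_critical_df1_equals_2(alpha, df2):
    """Value c with P(F >= c) = alpha, obtained by inverting the tail formula."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)

alpha, df2, f_obs = 0.05, 116, 13.0
critical = f_critical_df1_equals_2(alpha, df2)
p_value = f_upper_tail_df1_equals_2(f_obs, df2)
print(f"critical value = {critical:.2f}, p-value = {p_value:.1e}")
print("reject H0" if f_obs >= critical else "fail to reject H0")
```

The computed critical value agrees with the 3.07 quoted above, and the observed statistic of 13 lies far beyond it.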
Which groups are different?
We might proceed to make individual comparisons
Conduct two-sample t-tests for each pair of groups:
$$t = \frac{\hat{\theta} - \theta_0}{SE(\hat{\theta})} = \frac{\bar{X}_i - \bar{X}_j - 0}{\sqrt{\dfrac{s_p^2}{n_i} + \dfrac{s_p^2}{n_j}}}$$
Multiple Comparisons
Performing individual comparisons requires multiple hypothesis tests
If α = 0.05 for each comparison, each comparison has a 5% chance of falsely being called significant when its null hypothesis is true
Overall, the probability of at least one Type I error is therefore elevated above 5%
Question: How can we address this multiple comparisons issue?
Bonferroni adjustment
A possible correction for multiple comparisons
Test each hypothesis at level α∗ = (α/3) = 0.0167
Adjustment ensures overall Type I error rate does not exceed α = 0.05
However, this adjustment may be too conservative
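A short calculation shows both the problem and the effect of the correction. The sketch assumes, purely for illustration, that the three tests are independent and that all three null hypotheses are true; the Bonferroni bound itself does not require independence.

```python
alpha = 0.05          # per-comparison significance level
m = 3                 # number of pairwise comparisons among three groups

# Chance of at least one false positive if all m null hypotheses are true
# and the tests are independent (an assumption made only for illustration).
familywise_unadjusted = 1 - (1 - alpha) ** m

# Bonferroni: test each hypothesis at alpha / m instead.
alpha_star = alpha / m
familywise_bonferroni = 1 - (1 - alpha_star) ** m

print(f"unadjusted: {familywise_unadjusted:.4f}")
print(f"alpha* = {alpha_star:.4f}, adjusted: {familywise_bonferroni:.4f}")
```

Without adjustment the overall error rate is roughly 14% under these assumptions; with α* = α/3 it falls just below 5%.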
Multiple comparisons α
| Hypothesis | α* = α/3 |
| --- | --- |
| H0 : µc = µd (or β1 = 0) | 0.0167 |
| H0 : µc = µde (or β2 = 0) | 0.0167 |
| H0 : µd = µde (or β1 − β2 = 0) | 0.0167 |
| Overall | α = 0.05 |
HDL: Pairwise comparisons I
Control and Diet groups
H0 : µc = µd (or β1 = 0)
$$t = \frac{-0.05 - 0.02}{\sqrt{\dfrac{0.028}{40} + \dfrac{0.028}{40}}} = -1.87$$
p-value = 0.06
HDL: Pairwise comparisons II
Control and Diet + exercise groups
H0 : µc = µde (or β2 = 0)
$$t = \frac{-0.05 - 0.14}{\sqrt{\dfrac{0.028}{40} + \dfrac{0.028}{39}}} = -5.05$$
p-value = $4.4 \times 10^{-7}$
HDL: Pairwise comparisons III
Diet and Diet + exercise groups
H0 : µd = µde (or β1 − β2 = 0)
$$t = \frac{0.02 - 0.14}{\sqrt{\dfrac{0.028}{40} + \dfrac{0.028}{39}}} = -3.19$$
p-value = 0.0014
Bonferroni corrected p-values
| Hypothesis | p-value | Adjusted p-value |
| --- | --- | --- |
| H0 : µc = µd | 0.06 | 0.18 |
| H0 : µc = µde | 4.4 × 10⁻⁷ | 1.3 × 10⁻⁶ |
| H0 : µd = µde | 0.0014 | 0.0042 |

Overall α = 0.05
Conclusion: Significant difference in HDL change for DE group compared to other groups
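The three comparisons can be recomputed from the summary values on the preceding slides. This is a sketch under stated assumptions: the diet group mean is taken as 0.02 (the value used in the first comparison), group sizes are 40, 40 and 39, and p-values come from the standard normal distribution as an approximation to a t-distribution with 116 degrees of freedom, so the last digits may differ slightly from the reported ones.

```python
from math import sqrt
from statistics import NormalDist

# Summary values read off the pairwise-comparison slides.
means = {"control": -0.05, "diet": 0.02, "diet+exercise": 0.14}
sizes = {"control": 40, "diet": 40, "diet+exercise": 39}
pooled_variance = 0.028   # s_p^2, as used in the t statistics above

pairs = [("control", "diet"), ("control", "diet+exercise"), ("diet", "diet+exercise")]
results = {}
for a, b in pairs:
    se = sqrt(pooled_variance / sizes[a] + pooled_variance / sizes[b])
    t = (means[a] - means[b]) / se
    # Two-sided p-value from the standard normal (an approximation to t with 116 df).
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    # Bonferroni adjustment: multiply by the number of comparisons, cap at 1.
    p_adjusted = min(1.0, p * len(pairs))
    results[(a, b)] = (t, p, p_adjusted)
    print(f"{a} vs {b}: t = {t:.2f}, p = {p:.2g}, adjusted p = {p_adjusted:.2g}")
```

Multiplying each p-value by 3 and comparing it with 0.05 is equivalent to comparing the raw p-value with α* = 0.0167.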
Two-way ANOVA
Uses the same idea as one-way ANOVA by partitioning variability
Allows us to look at interaction of factors
Does the effect due to one factor change as the level of another factor changes?
Example: Public health students’ medical expenditures
Study design: In an observational study, total medical expenditures and various demographic characteristics were recorded for 200 public health students
Goal: determine how gender and smoking status affect total medical expenditures in this population
Example: Set-up
Y = Total medical expenditures
F = Indicator of Female (= 1 if Gender = Female, 0 otherwise)
S = Indicator of Smoking (= 1 if smoked 100 cigarettes or more, 0 otherwise)
Interaction model
We assume the model
Y ∼ N(µ,σ2)
where µ = E(Y) = β0 + β1F + β2S + β3F · S
What are the interpretations of β0, β1, β2, and β3?
Two-way ANOVA: Interactions
Mean Model
µ = E (Y ) = β0 + β1F + β2S + β3F · S
| Gender | Smoker: No | Smoker: Yes |
| --- | --- | --- |
| Male | β0 | β0 + β2 |
| Female | β0 + β1 | β0 + β1 + β2 + β3 |
Mean Model
E(Expenditure | Male, non-smoker) = β0 + β1 · 0 + β2 · 0 + β3 · 0 = β0
E(Expenditure | Female, non-smoker) = β0 + β1 · 1 + β2 · 0 + β3 · 0 = β0 + β1
E(Expenditure | Male, smoker) = β0 + β1 · 0 + β2 · 1 + β3 · 0 = β0 + β2
E(Expenditure | Female, smoker) = β0 + β1 · 1 + β2 · 1 + β3 · 1 = β0 + β1 + β2 + β3
Medical Expenditures: ANOVA table
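| Source of variation | Sum of squares | df | Mean square | F | p-value |
| --- | --- | --- | --- | --- | --- |
| Model (between groups) | 1.7 × 10⁹ | 3 | 5.6 × 10⁸ | 28.11 | < 0.001 |
| Error (within groups) | 3.9 × 10⁹ | 196 | 2.0 × 10⁷ | | |
| Total | 5.6 × 10⁹ | 199 | | | |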
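The entries of the table are linked by simple arithmetic, which the following sketch re-derives from the rounded values shown (k = 4 gender-by-smoking groups and N = 200 students, consistent with the degrees of freedom). Because the inputs are rounded to two significant figures, the recomputed F comes out near 28.5 rather than exactly the reported 28.11.

```python
# Rounded entries from the ANOVA table above.
ss_model, df_model = 1.7e9, 3      # between groups: k - 1 = 4 - 1
ss_error, df_error = 3.9e9, 196    # within groups: N - k = 200 - 4

ms_model = ss_model / df_model     # mean square between groups (MSB)
ms_error = ss_error / df_error     # mean square within groups (MSW)
f_ratio = ms_model / ms_error

ss_total = ss_model + ss_error
df_total = df_model + df_error

print(f"MSB = {ms_model:.2e}, MSW = {ms_error:.2e}, F = {f_ratio:.1f}")
print(f"Total SS = {ss_total:.1e} on {df_total} df")
```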
Medical Expenditures: Results
Overall model F-test:
H0 : β1 = β2 = β3 = 0
Ha : At least one group is different
Test statistic:
Fobs = 28.11
df1 = k − 1 = 3
df2 = N − k = 196
p-value < 0.001
Medical Expenditures: Overall Conclusions
The medical expenditures are different in at least one of the groups
Now we can figure out which ones. . .
Medical Expenditures: Two-way ANOVA I
Table of coefficient estimates
| Coefficient | Estimate | Standard error |
| --- | --- | --- |
| β0 (baseline) | 5049 | 597 |
| β1 (female effect) | 1784 | 765 |
| β2 (smoker effect) | 907 | 1062 |
| β3 (female*smoke) | 6239 | 1422 |
Medical Expenditures: Two-way ANOVA II
Test statistics and confidence intervals
| Coefficient | t | P > \|t\| | 95% confidence interval |
| --- | --- | --- | --- |
| β0 | 8.45 | < 0.001 | (3870, 6228) |
| β1 | 2.33 | 0.021 | (276, 3292) |
| β2 | 0.85 | 0.394 | (−1187, 3001) |
| β3 | 4.39 | < 0.001 | (3434, 9043) |
Medical Expenditures: Group-wise Conclusions
In this population, an average male non-smoker spends about $5000 on medical costs per year
Males who smoked were estimated as having spent about $900 more than male non-smokers, but this difference was not found to be statistically significant
Female non-smokers spent about $1800 more than their non-smoking male counterparts
Female smokers spent about $8900 (= β1 + β2 + β3) more than non-smoking males
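These statements follow directly from plugging the coefficient estimates into the mean model. A minimal sketch (the function name is hypothetical; the four estimates are the ones reported in the coefficient table):

```python
# Coefficient estimates from the table of coefficient estimates.
b0, b1, b2, b3 = 5049, 1784, 907, 6239

def mean_expenditure(female, smoker):
    """Fitted mean: b0 + b1*F + b2*S + b3*F*S with 0/1 indicators F and S."""
    return b0 + b1 * female + b2 * smoker + b3 * female * smoker

cells = {
    ("male", "non-smoker"): mean_expenditure(0, 0),
    ("male", "smoker"): mean_expenditure(0, 1),
    ("female", "non-smoker"): mean_expenditure(1, 0),
    ("female", "smoker"): mean_expenditure(1, 1),
}
for (gender, status), value in cells.items():
    print(f"{gender:6s} {status:10s} {value:6d}")

# Smoking effect within each gender; they differ by the interaction b3.
smoking_effect_males = cells[("male", "smoker")] - cells[("male", "non-smoker")]
smoking_effect_females = cells[("female", "smoker")] - cells[("female", "non-smoker")]
print(smoking_effect_males, smoking_effect_females)
```

The estimated smoking effect is 907 among males but 907 + 6239 = 7146 among females, which is exactly what the interaction term β3 captures.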
Association and Correlation I
Association
Express the relationship between two variables
Can be measured in different ways, depending on the nature of the variables
For now, we’ll focus on continuous variables (e.g. height, weight)
Important note: association does not imply causation
Association and Correlation II
Describing the relationship between two continuous variables
Correlation analysis
Measures strength of relationship between two variables
Specifies direction of relationship
Regression analysis
Concerns prediction or estimation of outcome variable, based on value of another variable (or variables)
Correlation analysis
Plot the data (or have a computer do so)
Visually inspect the relationship between two continuous variables
Is there a linear relationship (correlation)?
Are there outliers?
Are the distributions skewed?
Correlation Coefficient I
Measures the strength and direction of the linear relationship between two variables X and Y
Population correlation coefficient,
$$\rho = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X) \cdot \operatorname{var}(Y)}} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sqrt{E[(X - \mu_X)^2] \cdot E[(Y - \mu_Y)^2]}}$$
Correlation Coefficient II
The correlation coefficient, ρ, takes values between -1 and +1
-1: Perfect negative linear relationship
0: No linear relationship
+1: Perfect positive linear relationship
Correlation Coefficient III
Sample correlation coefficient:
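Obtained by plugging sample estimates into the population correlation coefficient
$$r = \frac{\text{sample cov}(X, Y)}{\sqrt{s_X^2 \cdot s_Y^2}} = \frac{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}}{\sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \cdot \dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}}$$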
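The sample correlation can be computed directly from this definition. The sketch below uses made-up height and weight values (hypothetical data for illustration only) and also shows that changing the units of one variable leaves r unchanged.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation: sample covariance over the product of sample SDs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    return cov_xy / sqrt(var_x * var_y)

# Hypothetical heights (cm) and weights (kg), made up for illustration.
heights = [150.0, 160.0, 165.0, 172.0, 180.0]
weights = [52.0, 58.0, 63.0, 70.0, 77.0]

r = pearson_r(heights, weights)
r_inches = pearson_r([h / 2.54 for h in heights], weights)  # change of units
print(f"r = {r:.3f}, r with heights in inches = {r_inches:.3f}")
```

The n − 1 factors cancel between numerator and denominator, so the same value is obtained from the raw sums of squares and cross-products.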
Correlation Coefficient IV
Plot standardized Y versus standardized X
Observe an ellipse (elongated circle)
Correlation is the slope of the least-squares line of standardized Y on standardized X (the major axis of the ellipse itself has slope ±1)
Correlation Notes
Other names for r
Pearson correlation coefficient
Product-moment correlation coefficient
Characteristics of r
Measures *linear* association
The value of r is independent of units used to measure the variables
The value of r is sensitive to outliers
r² tells us what proportion of variation in Y is explained by linear relationship with X
Several levels of correlation
Examples of the Correlation Coefficient I
Perfect positive correlation, r ≈ 1
Examples of the Correlation Coefficient II
Perfect negative correlation, r ≈ -1
Examples of the Correlation Coefficient III
Imperfect positive correlation, 0< r <1
Examples of the Correlation Coefficient IV
Imperfect negative correlation, -1<r <0
Examples of the Correlation Coefficient V
No relation, r ≈ 0
Examples of the Correlation Coefficient VI
Some relation but little *linear* relationship, r ≈ 0
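A simple numerical case makes the point. In this sketch (illustrative values, not data from the lecture) Y is an exact function of X, yet the Pearson correlation is zero because the relationship is symmetric rather than linear.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x ** 2 for x in xs]          # y is completely determined by x

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")              # essentially zero despite the exact relation
```

This is why the data should be plotted before a correlation coefficient is interpreted.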
Association and Causality
In general, association between two variables means there is some form of relationship between them
The relationship is not necessarily causal
Association does not imply causation, no matter how much we would like it to
Example: Hot days, ice cream, drowning
Sir Bradford Hill’s Criteria for Causality I
Strength: magnitude of association
Consistency of association: repeated observation of the association in different situations
Specificity: uniqueness of the association
Temporality: cause precedes effect
Sir Bradford Hill’s Criteria for Causality II
Biologic gradient: dose-response relationship
Biologic plausibility: known mechanisms
Coherence: makes sense based on other known facts
Experimental evidence: from designed (randomized) experiments
Analogy: with other known associations