Analysis of Variance - Stony Brook (zhu/ams394/Lab11.pdf)
ANOVA
Analysis of Variance
One way Analysis of Variance
(ANOVA)
Comparing k Populations
The F test – for comparing k means
Situation
• We have k normal populations.
• Let $\mu_i$ and $\sigma_i^2$ denote the mean and variance of population i, for i = 1, 2, 3, …, k.
• Note: we assume that the variance for each population is unknown but the same:
$$\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2 = \sigma^2$$
We want to test
$$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$$
against
$$H_A: \mu_i \neq \mu_j \text{ for at least one pair } i, j$$
The F statistic
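$$F = \frac{\dfrac{1}{k-1}\sum_{i=1}^{k} n_i\left(\bar{x}_i - \bar{x}\right)^2}{\dfrac{1}{N-k}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij} - \bar{x}_i\right)^2}$$
where $x_{ij}$ = the jth observation in the ith sample, $i = 1, 2, \dots, k$ and $j = 1, 2, \dots, n_i$,
$$\bar{x}_i = \frac{\sum_{j=1}^{n_i} x_{ij}}{n_i} = \text{mean for the } i\text{th sample}, \quad i = 1, 2, \dots, k$$
$$N = \sum_{i=1}^{k} n_i = \text{total sample size}$$
$$\bar{x} = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}}{N} = \text{overall mean}$$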
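The definition of the F statistic can be sketched directly in code. The following is a minimal Python illustration, not part of the lab's SAS workflow; the function name `f_statistic` and the three small samples are made up for the example.

```python
def f_statistic(samples):
    """One-way ANOVA F statistic computed directly from its definition."""
    k = len(samples)                               # number of samples
    N = sum(len(s) for s in samples)               # total sample size
    overall_mean = sum(sum(s) for s in samples) / N
    means = [sum(s) / len(s) for s in samples]     # sample means
    # Between-sample sum of squares: n_i * (mean_i - overall mean)^2
    ss_between = sum(len(s) * (m - overall_mean) ** 2
                     for s, m in zip(samples, means))
    # Within-sample sum of squares: (x_ij - mean_i)^2
    ss_within = sum((x - m) ** 2
                    for s, m in zip(samples, means) for x in s)
    return (ss_between / (k - 1)) / (ss_within / (N - k))


# Hypothetical toy data, chosen only to keep the arithmetic easy to follow.
toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(f_statistic(toy))
```

For the toy data the between sum of squares is 26 on 2 degrees of freedom and the within sum of squares is 6 on 6 degrees of freedom, so F = 13.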
The ANOVA table
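$$SS_B = \sum_{i=1}^{k} n_i\left(\bar{x}_i - \bar{x}\right)^2, \qquad MS_B = \frac{1}{k-1}\sum_{i=1}^{k} n_i\left(\bar{x}_i - \bar{x}\right)^2$$
$$SS_W = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij} - \bar{x}_i\right)^2, \qquad MS_W = \frac{1}{N-k}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij} - \bar{x}_i\right)^2$$
$$F = \frac{MS_B}{MS_W}$$

| Source | S.S. | d.f. | M.S. | F |
|---|---|---|---|---|
| Between | $SS_B$ | $k-1$ | $MS_B$ | $MS_B/MS_W$ |
| Within | $SS_W$ | $N-k$ | $MS_W$ | |

The ANOVA table is a tool for displaying the computations for the F test. It is very important when the Between Sample variability is due to two or more factors.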
Computing Formulae:
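Compute
1) $T_i = \sum_{j=1}^{n_i} x_{ij}$ = total for sample i
2) $G = \sum_{i=1}^{k} T_i = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}$ = grand total
3) $N = \sum_{i=1}^{k} n_i$ = total sample size
4) $\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2$
5) $\sum_{i=1}^{k} \dfrac{T_i^2}{n_i}$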
The data
• Assume we have collected data from each of
k populations
• Let $x_{i1}, x_{i2}, x_{i3}, \dots$ denote the $n_i$ observations from population i, for i = 1, 2, 3, …, k.
Then
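1) $$SS_{Within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k}\frac{T_i^2}{n_i}$$
2) $$SS_{Between} = \sum_{i=1}^{k}\frac{T_i^2}{n_i} - \frac{G^2}{N}$$
3) $$F = \frac{SS_{Between}/(k-1)}{SS_{Within}/(N-k)}$$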
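Anova Table

| Source | d.f. | Sum of Squares | Mean Square | F-ratio |
|---|---|---|---|---|
| Between | k - 1 | $SS_{Between}$ | $MS_{Between}$ | $MS_B/MS_W$ |
| Within | N - k | $SS_{Within}$ | $MS_{Within}$ | |
| Total | N - 1 | $SS_{Total}$ | | |

$$MS = \frac{SS}{df}$$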
Example
In the following example we are comparing weight
gains resulting from the following six diets
1. Diet 1 - High Protein, Beef
2. Diet 2 - High Protein, Cereal
3. Diet 3 - High Protein, Pork
4. Diet 4 - Low Protein, Beef
5. Diet 5 - Low Protein, Cereal
6. Diet 6 - Low Protein, Pork
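Gains in weight (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork)

| Diet | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| | 73 | 98 | 94 | 90 | 107 | 49 |
| | 102 | 74 | 79 | 76 | 95 | 82 |
| | 118 | 56 | 96 | 90 | 97 | 73 |
| | 104 | 111 | 98 | 64 | 80 | 86 |
| | 81 | 95 | 102 | 86 | 98 | 81 |
| | 107 | 88 | 102 | 51 | 74 | 97 |
| | 100 | 82 | 108 | 72 | 74 | 106 |
| | 87 | 77 | 91 | 90 | 67 | 70 |
| | 117 | 86 | 120 | 95 | 89 | 61 |
| | 111 | 92 | 105 | 78 | 58 | 82 |
| Mean | 100.0 | 85.9 | 99.5 | 79.2 | 83.9 | 78.7 |
| Std. Dev. | 15.14 | 15.02 | 10.92 | 13.89 | 15.71 | 16.55 |
| $\sum x$ | 1000 | 859 | 995 | 792 | 839 | 787 |
| $\sum x^2$ | 102062 | 75819 | 100075 | 64462 | 72613 | 64401 |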
Thus
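$$SS_{Within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k}\frac{T_i^2}{n_i} = 479432 - 467846 = 11586$$
$$SS_{Between} = \sum_{i=1}^{k}\frac{T_i^2}{n_i} - \frac{G^2}{N} = 467846 - \frac{5272^2}{60} = 4612.933$$
$$F = \frac{SS_{Between}/(k-1)}{SS_{Within}/(N-k)} = \frac{4612.933/5}{11586/54} = \frac{922.6}{214.56} = 4.3$$
$$F_{0.05} = 2.386 \text{ with } \nu_1 = 5 \text{ and } \nu_2 = 54$$
Thus since F > 2.386 we reject $H_0$.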
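The hand computation above can be cross-checked with the computing formulae. The sketch below is a plain-Python illustration (the lab itself uses SAS); the variable names are chosen for the example, and the data are the weight gains from the diet table.

```python
# Weight gains (grams) for the six diets, copied from the data table.
diets = {
    1: [73, 102, 118, 104, 81, 107, 100, 87, 117, 111],
    2: [98, 74, 56, 111, 95, 88, 82, 77, 86, 92],
    3: [94, 79, 96, 98, 102, 102, 108, 91, 120, 105],
    4: [90, 76, 90, 64, 86, 51, 72, 90, 95, 78],
    5: [107, 95, 97, 80, 98, 74, 74, 67, 89, 58],
    6: [49, 82, 73, 86, 81, 97, 106, 70, 61, 82],
}

k = len(diets)                                    # number of samples
totals = {d: sum(x) for d, x in diets.items()}    # 1) T_i
G = sum(totals.values())                          # 2) grand total
N = sum(len(x) for x in diets.values())           # 3) total sample size
sum_sq = sum(v * v for x in diets.values() for v in x)           # 4)
sum_T2_n = sum(totals[d] ** 2 / len(diets[d]) for d in diets)    # 5)

ss_within = sum_sq - sum_T2_n
ss_between = sum_T2_n - G ** 2 / N
F = (ss_between / (k - 1)) / (ss_within / (N - k))

print(f"SS_Within  = {ss_within:.3f}")
print(f"SS_Between = {ss_between:.3f}")
print(f"F          = {F:.2f}")
```

The printed values should agree with the sums of squares and F-ratio shown in the Anova table for this example.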
Anova Table

| Source | d.f. | Sum of Squares | Mean Square | F-ratio |
|---|---|---|---|---|
| Between | 5 | 4612.933 | 922.587 | 4.3** (p = 0.0023) |
| Within | 54 | 11586.000 | 214.556 | |
| Total | 59 | 16198.933 | | |

\* - Significant at 0.05 (not 0.01)
\*\* - Significant at 0.01
Equivalence of the F-test and the t-test
when k = 2
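the t-test
$$t = \frac{\bar{x} - \bar{y}}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}, \qquad s_{Pooled} = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}}$$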
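the F-test
$$F = \frac{s_{Between}^2}{s_{Pooled}^2} = \frac{\sum_{i=1}^{k} n_i\left(\bar{x}_i - \bar{x}\right)^2/(k-1)}{\sum_{i=1}^{k}(n_i-1)s_i^2 \Big/ \left(\sum_{i=1}^{k} n_i - k\right)}$$
$$= \frac{n_1\left(\bar{x}_1 - \bar{x}\right)^2 + n_2\left(\bar{x}_2 - \bar{x}\right)^2}{\left[(n_1-1)s_1^2 + (n_2-1)s_2^2\right]/(n_1+n_2-2)}$$
$$\text{numerator} = n_1\left(\bar{x}_1 - \bar{x}\right)^2 + n_2\left(\bar{x}_2 - \bar{x}\right)^2$$
$$\text{denominator} = s_{Pooled}^2$$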
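$$n_1\left(\bar{x}_1 - \bar{x}\right)^2 = n_1\left(\bar{x}_1 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1+n_2}\right)^2 = \frac{n_1 n_2^2}{(n_1+n_2)^2}\left(\bar{x}_1 - \bar{x}_2\right)^2$$
$$n_2\left(\bar{x}_2 - \bar{x}\right)^2 = n_2\left(\bar{x}_2 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1+n_2}\right)^2 = \frac{n_1^2 n_2}{(n_1+n_2)^2}\left(\bar{x}_1 - \bar{x}_2\right)^2$$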
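$$n_1\left(\bar{x}_1 - \bar{x}\right)^2 + n_2\left(\bar{x}_2 - \bar{x}\right)^2 = \frac{n_1 n_2^2 + n_1^2 n_2}{(n_1+n_2)^2}\left(\bar{x}_1 - \bar{x}_2\right)^2$$
$$= \frac{n_1 n_2}{n_1+n_2}\left(\bar{x}_1 - \bar{x}_2\right)^2 = \frac{\left(\bar{x}_1 - \bar{x}_2\right)^2}{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$$
Hence
$$F = \frac{\left(\bar{x}_1 - \bar{x}_2\right)^2}{s_{Pooled}^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} = t^2$$
where $\bar{x}_1, \bar{x}_2, n_1, n_2$ play the roles of $\bar{x}, \bar{y}, n, m$ in the t-test.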
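The identity F = t² for k = 2 can also be checked numerically. This is a small Python sketch under the assumption that any two samples will do; diets 1 and 2 from the example are used purely for illustration, and the function names are hypothetical.

```python
import math


def pooled_t(x, y):
    """Two-sample t statistic using the pooled standard deviation."""
    n, m = len(x), len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / m
    s2_x = sum((v - x_bar) ** 2 for v in x) / (n - 1)
    s2_y = sum((v - y_bar) ** 2 for v in y) / (m - 1)
    s2_pooled = ((n - 1) * s2_x + (m - 1) * s2_y) / (n + m - 2)
    return (x_bar - y_bar) / math.sqrt(s2_pooled * (1 / n + 1 / m))


def anova_f(samples):
    """One-way ANOVA F statistic from the between/within sums of squares."""
    k = len(samples)
    N = sum(len(s) for s in samples)
    overall = sum(sum(s) for s in samples) / N
    means = [sum(s) / len(s) for s in samples]
    ss_b = sum(len(s) * (m - overall) ** 2 for s, m in zip(samples, means))
    ss_w = sum((v - m) ** 2 for s, m in zip(samples, means) for v in s)
    return (ss_b / (k - 1)) / (ss_w / (N - k))


diet_1 = [73, 102, 118, 104, 81, 107, 100, 87, 117, 111]
diet_2 = [98, 74, 56, 111, 95, 88, 82, 77, 86, 92]

t = pooled_t(diet_1, diet_2)
F = anova_f([diet_1, diet_2])
print(t ** 2, F)  # the two numbers agree up to floating-point rounding
```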
SAS Code for one-way ANOVA
Data oneway;
Input diet $ weight_gain;
Datalines;
1 73
1 102
1 118
1 104
1 81
1 107
1 100
1 87
1 117
1 111
2 98
2 74
2 56
2 111
2 95
2 88
2 82
2 77
2 86
2 92
3 94
3 79
3 96
3 98
3 102
3 102
3 108
3 91
3 120
3 105
4 90
4 76
4 90
4 64
4 86
4 51
4 72
4 90
4 95
4 78
5 107
5 95
5 97
5 80
5 98
5 74
5 74
5 67
5 89
5 58
6 49
6 82
6 73
6 86
6 81
6 97
6 106
6 70
6 61
6 82
;
Run;
Note: there are easier ways to enter the data. We will come to that later.
SAS Code for one-way ANOVA
To test our hypothesis, we use the following code in SAS:
PROC ANOVA DATA = oneway;
class diet;
model weight_gain = diet;
RUN;
QUIT;
• “class” tells SAS the classification variable. In general, this is going to be the effect that you are studying. In this case, the effect is “diet.”
• “model” tells SAS the dependent variable. The general format is “model Y = X” where Y is the dependent variable, and X is the independent variable. In this case, weight_gain is dependent on diet.
• A “quit” statement is often needed because PROC ANOVA is an interactive procedure: after RUN it stays active until another procedure or DATA step is submitted, or until SAS is told to quit.
SAS Output
The ANOVA Procedure
Class Level Information
Class Levels Values
diet 6 1 2 3 4 5 6
Number of Observations Read 60
Number of Observations Used 60
The ANOVA Procedure
Dependent Variable: weight_gain
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 5 4612.93333 922.58667 4.30 0.0023
Error 54 11586.00000 214.55556
Corrected Total 59 16198.93333
R-Square Coeff Var Root MSE weight_gain Mean
0.284768 16.67039 14.64772 87.86667
Source DF Anova SS Mean Square F Value Pr > F
diet 5 4612.933333 922.586667 4.30 0.0023
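The summary line of the output (R-Square, Coeff Var, Root MSE) follows from the sums of squares in the table. The Python sketch below shows the usual definitions, assuming R-Square = SS(Model)/SS(Total), Root MSE = sqrt(MS(Error)), and Coeff Var = 100 × Root MSE / mean; the numbers are copied from the output above.

```python
import math

# Values copied from the SAS output above.
ss_model = 4612.93333
ss_error = 11586.0
df_error = 54
grand_mean = 5272 / 60          # weight_gain Mean = grand total / N

ss_total = ss_model + ss_error                 # Corrected Total
r_square = ss_model / ss_total                 # share of variation explained
root_mse = math.sqrt(ss_error / df_error)      # square root of MS(Error)
coeff_var = 100 * root_mse / grand_mean        # Root MSE as a % of the mean

print(f"R-Square  = {r_square:.6f}")
print(f"Root MSE  = {root_mse:.5f}")
print(f"Coeff Var = {coeff_var:.5f}")
```

So roughly 28% of the variation in weight gain is accounted for by diet, even though the F test is significant.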
Factorial Experiments
Analysis of Variance
• Dependent variable Y
• k Categorical independent variables A, B,
C, … (the Factors)
• Let
– a = the number of categories of A
– b = the number of categories of B
– c = the number of categories of C
– etc.
The Completely Randomized Design
• We form the set of all treatment combinations
– the set of all combinations of the k factors
• Total number of treatment combinations
– t = abc…
• In the completely randomized design, n experimental units (test animals, test plots, etc.) are randomly assigned to each treatment combination.
– Total number of experimental units N = nt = nabc…
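The construction of a completely randomized design can be sketched as follows. This is an illustrative Python snippet, not part of the lab; the function name, the factor labels (`a1`, `b1`, …), and the fixed random seed are assumptions made for the example.

```python
import itertools
import random


def completely_randomized_design(factors, n, seed=0):
    """Randomly assign n experimental units to every treatment combination.

    factors: dict mapping factor name -> list of its categories.
    Returns a dict mapping each treatment combination to its n unit numbers.
    """
    combos = list(itertools.product(*factors.values()))   # t = a*b*c*...
    units = list(range(1, n * len(combos) + 1))            # N = n*t units
    random.Random(seed).shuffle(units)                     # random assignment
    return {combo: sorted(units[i * n:(i + 1) * n])
            for i, combo in enumerate(combos)}


# Hypothetical two-factor layout with a = 2 and b = 3 categories.
factors = {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"]}
design = completely_randomized_design(factors, n=10)

for combo, assigned in design.items():
    print(combo, assigned)
```

With a = 2, b = 3 and n = 10 this gives t = 6 treatment combinations and N = 60 experimental units, each unit used exactly once.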
The treatment combinations can be thought of as arranged in a k-dimensional rectangular block.
[Figure: for two factors, a grid with rows for levels 1, 2, …, a of A and columns for levels 1, 2, …, b of B.]
[Figure: for three factors, a rectangular block with axes A, B, and C.]
• The Completely Randomized Design with the same number of observations, n, for every treatment combination is called balanced.
• If the number of observations per treatment combination is unequal, the design is called unbalanced (resulting in a mathematically more complex analysis and computations).
• If for some of the treatment combinations there are no observations, the design is called incomplete. (In this case it may happen that some of the parameters - main effects and interactions - cannot be estimated.)
Example: Two-way ANOVA
(two-factor experiment)
In this example we are examining the effect of the level of protein A (High or Low) and the source of protein B (Beef, Cereal, or Pork) on weight gains (grams) in rats.
We have n = 10 test animals randomly assigned to each of the k = 6 diets.
The k = 6 diets are the 6 = 3×2 Level-
Source combinations
1. High - Beef
2. High - Cereal
3. High - Pork
4. Low - Beef
5. Low - Cereal
6. Low - Pork
Treatment combinations

| Level of Protein \ Source of Protein | Beef | Cereal | Pork |
|---|---|---|---|
| High | Diet 1 | Diet 2 | Diet 3 |
| Low | Diet 4 | Diet 5 | Diet 6 |
Summary Table of Means

| Level of Protein \ Source of Protein | Beef | Cereal | Pork | Overall |
|---|---|---|---|---|
| High | 100.00 | 85.90 | 99.50 | 95.13 |
| Low | 79.20 | 83.90 | 78.70 | 80.60 |
| Overall | 89.60 | 84.90 | 89.10 | 87.87 |
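Table: Gains in weight (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork)

| Level of Protein | High Protein | High Protein | High Protein | Low Protein | Low Protein | Low Protein |
|---|---|---|---|---|---|---|
| Source of Protein | Beef | Cereal | Pork | Beef | Cereal | Pork |
| Diet | 1 | 2 | 3 | 4 | 5 | 6 |
| | 73 | 98 | 94 | 90 | 107 | 49 |
| | 102 | 74 | 79 | 76 | 95 | 82 |
| | 118 | 56 | 96 | 90 | 97 | 73 |
| | 104 | 111 | 98 | 64 | 80 | 86 |
| | 81 | 95 | 102 | 86 | 98 | 81 |
| | 107 | 88 | 102 | 51 | 74 | 97 |
| | 100 | 82 | 108 | 72 | 74 | 106 |
| | 87 | 77 | 91 | 90 | 67 | 70 |
| | 117 | 86 | 120 | 95 | 89 | 61 |
| | 111 | 92 | 105 | 78 | 58 | 82 |
| Mean | 100.0 | 85.9 | 99.5 | 79.2 | 83.9 | 78.7 |
| Std. Dev. | 15.14 | 15.02 | 10.92 | 13.89 | 15.71 | 16.55 |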
Data twoway;
Input Protein $ Source $ weight_gain;
Datalines;
High Beef 73
High Beef 102
High Beef 118
High Beef 104
High Beef 81
High Beef 107
High Beef 100
High Beef 87
High Beef 117
High Beef 111
High Cereal 98
High Cereal 74
High Cereal 56
High Cereal 111
High Cereal 95
High Cereal 88
High Cereal 82
High Cereal 77
High Cereal 86
High Cereal 92
High Pork 94
High Pork 79
High Pork 96
High Pork 98
High Pork 102
High Pork 102
High Pork 108
High Pork 91
High Pork 120
High Pork 105
Low Beef 90
Low Beef 76
Low Beef 90
Low Beef 64
Low Beef 86
Low Beef 51
Low Beef 72
Low Beef 90
Low Beef 95
Low Beef 78
Low Cereal 107
Low Cereal 95
Low Cereal 97
Low Cereal 80
Low Cereal 98
Low Cereal 74
Low Cereal 74
Low Cereal 67
Low Cereal 89
Low Cereal 58
Low Pork 49
Low Pork 82
Low Pork 73
Low Pork 86
Low Pork 81
Low Pork 97
Low Pork 106
Low Pork 70
Low Pork 61
Low Pork 82
;
Run;
SAS Code for two-way ANOVA
To test our hypotheses, we use the following code in SAS:
PROC ANOVA DATA = twoway;
class Protein Source;
model weight_gain = Protein Source Protein*Source;
RUN;
QUIT;
• “class” tells SAS the two classification variables, which are generally going to be the effects that you are studying. In this case, the effects are “Protein” and “Source.”
• “model” tells SAS the dependent variable. The general format is “model Y = X1 X2 X1*X2” where Y is the dependent variable, and X1 and X2 are independent variables. X1*X2 means the interaction of X1 and X2.
• As in the one-way case, a “quit” statement is often needed to end the procedure, which otherwise stays active after RUN.
SAS Output
The ANOVA Procedure
Class Level Information
Class Levels Values
Protein 2 High Low
Source 3 Beef Cereal Pork
Number of Observations Read 60
Number of Observations Used 60
The ANOVA Procedure
Dependent Variable: weight_gain
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 5 4612.93333 922.58667 4.30 0.0023
Error 54 11586.00000 214.55556
Corrected Total 59 16198.93333
R-Square Coeff Var Root MSE weight_gain Mean
0.284768 16.67039 14.64772 87.86667
Source DF Anova SS Mean Square F Value Pr > F
Protein 1 3168.266667 3168.266667 14.77 0.0003
Source 2 266.533333 133.266667 0.62 0.5411
Protein*Source 2 1178.133333 589.066667 2.75 0.0732
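The split of the Model sum of squares into Protein, Source, and Protein*Source can be reproduced by hand for this balanced design. The Python sketch below uses the standard balanced two-way formulas (an assumption about how to illustrate the decomposition, not a description of SAS internals); the helper `avg` and the variable names are made up for the example.

```python
# Weight gains keyed by (level of protein, source of protein).
data = {
    ("High", "Beef"):   [73, 102, 118, 104, 81, 107, 100, 87, 117, 111],
    ("High", "Cereal"): [98, 74, 56, 111, 95, 88, 82, 77, 86, 92],
    ("High", "Pork"):   [94, 79, 96, 98, 102, 102, 108, 91, 120, 105],
    ("Low", "Beef"):    [90, 76, 90, 64, 86, 51, 72, 90, 95, 78],
    ("Low", "Cereal"):  [107, 95, 97, 80, 98, 74, 74, 67, 89, 58],
    ("Low", "Pork"):    [49, 82, 73, 86, 81, 97, 106, 70, 61, 82],
}
levels_a = ["High", "Low"]              # factor A: level of protein
levels_b = ["Beef", "Cereal", "Pork"]   # factor B: source of protein
a, b, n = len(levels_a), len(levels_b), 10


def avg(values):
    values = list(values)
    return sum(values) / len(values)


grand = avg(v for cell in data.values() for v in cell)
mean_a = {p: avg(v for (pp, _), cell in data.items() if pp == p for v in cell)
          for p in levels_a}
mean_b = {s: avg(v for (_, ss), cell in data.items() if ss == s for v in cell)
          for s in levels_b}

ss_a = b * n * sum((mean_a[p] - grand) ** 2 for p in levels_a)
ss_b = a * n * sum((mean_b[s] - grand) ** 2 for s in levels_b)
ss_cells = n * sum((avg(cell) - grand) ** 2 for cell in data.values())
ss_ab = ss_cells - ss_a - ss_b                    # interaction
ss_error = sum((v - avg(cell)) ** 2 for cell in data.values() for v in cell)
ms_error = ss_error / (a * b * (n - 1))           # 54 error d.f.

f_a = (ss_a / (a - 1)) / ms_error
f_b = (ss_b / (b - 1)) / ms_error
f_ab = (ss_ab / ((a - 1) * (b - 1))) / ms_error

print(f"Protein        SS = {ss_a:.3f}  F = {f_a:.2f}")
print(f"Source         SS = {ss_b:.3f}  F = {f_b:.2f}")
print(f"Protein*Source SS = {ss_ab:.3f}  F = {f_ab:.2f}")
```

The three sums of squares add up to the Model sum of squares, which is the Between sum of squares from the one-way analysis of the same six diets.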
Profiles of the response relative
to a factor
A graphical representation of the effect of a factor on a response variable (dependent variable)
Profile Y for A
[Figure: the response Y plotted against the levels of A (1, 2, 3, …, a).]
The plotted value of Y could be for an individual case or averaged over a group of cases. The profile could be for a specific level of another factor or averaged over the levels of another factor.
Profiles of Weight Gain for Source and Level of Protein
[Figure: weight gain (axis from 70 to 110) against source of protein (Beef, Cereal, Pork), with one profile each for High Protein, Low Protein, and Overall.]
Profiles of Weight Gain for Source and Level of Protein
[Figure: weight gain (axis from 70 to 110) against level of protein (High Protein, Low Protein), with one profile each for Beef, Cereal, Pork, and Overall.]
Example – Four factor experiment
Four factors are studied for their effect on Y (luster of paint film). The four factors are:
1) Film Thickness (1 or 2 mils)
2) Drying conditions (Regular or Special)
3) Length of wash (10, 30, 40 or 60 Minutes), and
4) Temperature of wash (92 ˚C or 100 ˚C)
Two observations of film luster (Y) are taken for each treatment combination.
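The data is tabulated below:

| Minutes | Regular Dry, 92 ˚C | Regular Dry, 100 ˚C | Special Dry, 92 ˚C | Special Dry, 100 ˚C |
|---|---|---|---|---|
| 1-mil Thickness | | | | |
| 20 | 3.4 | 3.4 | 19.6 | 14.5 |
| | 2.1 | 3.8 | 17.2 | 13.4 |
| 30 | 4.1 | 4.1 | 17.5 | 17.0 |
| | 4.0 | 4.6 | 13.5 | 14.3 |
| 40 | 4.9 | 4.2 | 17.6 | 15.2 |
| | 5.1 | 3.3 | 16.0 | 17.8 |
| 60 | 5.0 | 4.9 | 20.9 | 17.1 |
| | 8.3 | 4.3 | 17.5 | 13.9 |
| 2-mil Thickness | | | | |
| 20 | 5.5 | 3.7 | 26.6 | 29.5 |
| | 4.5 | 4.5 | 25.6 | 22.5 |
| 30 | 5.7 | 6.1 | 31.6 | 30.2 |
| | 5.9 | 5.9 | 29.2 | 29.8 |
| 40 | 5.5 | 5.6 | 30.5 | 30.2 |
| | 5.5 | 5.8 | 32.6 | 27.4 |
| 60 | 7.2 | 6.0 | 31.4 | 29.6 |
| | 8.0 | 9.9 | 33.5 | 29.5 |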
Definition:
A factor is said to not affect the response if the profile of the factor is horizontal for all combinations of levels of the other factors: there is no change in the response when you change the levels of the factor (true for all combinations of levels of the other factors).
Otherwise the factor is said to affect the response.
Profile Y for A – A affects the response
[Figure: profiles of Y against the levels of A (1, 2, 3, …, a), one line for each level of B.]
Profile Y for A – no effect on the response
[Figure: profiles of Y against the levels of A (1, 2, 3, …, a), one line for each level of B.]
Definition:
• Two (or more) factors are said to interact if changes in the response when you change the level of one factor depend on the level(s) of the other factor(s).
• Profiles of the factor for different levels of the other factor(s) are not parallel
• Otherwise the factors are said to be additive.
• Profiles of the factor for different levels of the other factor(s) are parallel.
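The parallel-profile idea can be illustrated with the cell means from the Summary Table of Means. The sketch below is a descriptive Python check only: the function names are hypothetical, and unequal gaps between sample profiles suggest an interaction but do not establish one, which is what the F test for Protein*Source is for.

```python
# Cell means from the Summary Table of Means (level of protein x source).
cell_means = {
    "High": {"Beef": 100.0, "Cereal": 85.9, "Pork": 99.5},
    "Low":  {"Beef": 79.2, "Cereal": 83.9, "Pork": 78.7},
}


def profile_gaps(means, level_1, level_2):
    """Vertical gap between two profiles at each level of the other factor."""
    return {s: round(means[level_1][s] - means[level_2][s], 2)
            for s in means[level_1]}


def looks_additive(gaps, tol=1e-9):
    """Profiles are parallel exactly when all the gaps are equal."""
    values = list(gaps.values())
    return max(values) - min(values) <= tol


gaps = profile_gaps(cell_means, "High", "Low")
print(gaps)
print("parallel sample profiles:", looks_additive(gaps))
```

Here the High minus Low gap is 20.8 grams for Beef and Pork but only 2.0 grams for Cereal, so the sample profiles are not parallel.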
Interacting factors A and B
[Figure: profiles of Y against the levels of A (1, 2, 3, …, a), one line for each level of B.]
Additive factors A and B
[Figure: profiles of Y against the levels of A (1, 2, 3, …, a), one line for each level of B.]
• If two (or more) factors interact, each factor affects the response.
• If two (or more) factors are additive, it still remains to be determined whether the factors affect the response.
• In factorial experiments we are interested in determining
– which factors affect the response and
– which groups of factors interact.
Order of testing in factorial experiments
1. Test the higher order interactions first.
2. If an interaction is present, there is no need to test lower order interactions or main effects involving those factors. All factors in the interaction affect the response and they interact.
3. The testing continues for lower order interactions and main effects for factors which have not yet been determined to affect the response.
More SAS Program: Proc GLM
The ANOVA procedure is one of several procedures available in SAS/STAT software for analysis of variance. The ANOVA procedure is designed to handle balanced data (that is, data with equal numbers of observations for every combination of the classification factors), whereas the GLM procedure can analyze both balanced and unbalanced data. Because PROC ANOVA takes into account the special structure of a balanced design, it is faster and uses less storage than PROC GLM for balanced data.
Proc GLM
PROC GLM DATA = twoway;
class Protein Source;
model weight_gain = Protein Source Protein*Source;
lsmeans Protein Source Protein*Source / out=outmns;
*gives the least squares means and outputs them into another data set called 'outmns';
means Protein Source / cldiff bon;
*asks SAS for confidence limits for the differences of means, using the Bonferroni method of comparison;
output out=resout p=preds rstudent=exstdres;
*outputs the predicted values and the externally studentized residuals to a data set called 'resout';
RUN;
QUIT;
Proc GLM, continued
title 'Profile/Interaction Plots';
symbol i=j;
*tells SAS to draw lines joining the means;
proc gplot data=outmns;
where Protein ne ' ' and Source ne ' ';
*removes the marginal means from the data set since we only wish to plot the joint means;
plot lsmean*Protein=Source;
plot lsmean*Source=Protein;
run; quit;
goptions reset=all; *resets the graphics options;
title 'Residual Plot';
proc gplot data=resout;
plot exstdres*preds;
run; quit;
Mean versus LS Mean (LSM)
Note: for balanced designs, as in our examples, the mean and the LS mean are the same.
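To illustrate why balance matters here, the following sketch compares the raw marginal mean of one factor level with its LS mean, taken here to be the unweighted average of the cell means over the levels of the other factor (which is what a two-way model with interaction gives). The observations are made up for illustration and are not the course data; the function name is hypothetical.

```python
from statistics import mean

def marginal_and_ls_mean(cells):
    """cells: one list of observations per level of the other factor."""
    raw = mean(y for cell in cells for y in cell)   # pools every observation
    lsm = mean(mean(cell) for cell in cells)        # averages the cell means
    return raw, lsm

# Hypothetical observations for one Protein level across three Source levels.
balanced = [[10, 12], [20, 22], [30, 32]]
unbalanced = [[10, 12], [20, 22], [30, 32, 31, 33, 29, 31]]

raw_b, lsm_b = marginal_and_ls_mean(balanced)
raw_u, lsm_u = marginal_and_ls_mean(unbalanced)
print("balanced:  ", raw_b, lsm_b)
print("unbalanced:", raw_u, lsm_u)
```

In the balanced case the two numbers agree; in the unbalanced case the raw mean is pulled toward the cell with more observations, while the LS mean is not.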
Bonferroni Pairwise Mean Comparisons
The GLM Procedure
Bonferroni (Dunn) t Tests for weight_gain
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type
II error rate than Tukey's for all pairwise comparisons.
Alpha 0.05
Error Degrees of Freedom 54
Error Mean Square 214.5556
Critical Value of t 2.00488
Minimum Significant Difference 7.5825
Comparisons significant at the 0.05 level are indicated by ***.
Protein Comparison    Difference Between Means    Simultaneous 95% Confidence Limits
High - Low 14.533 6.951 22.116 ***
Low - High -14.533 -22.116 -6.951 ***
The GLM Procedure
Bonferroni (Dunn) t Tests for weight_gain
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type
II error rate than Tukey's for all pairwise comparisons.
Alpha 0.05
Error Degrees of Freedom 54
Error Mean Square 214.5556
Critical Value of t 2.47085
Minimum Significant Difference 11.445
Comparisons significant at the 0.05 level are indicated by ***.
Source Comparison    Difference Between Means    Simultaneous 95% Confidence Limits
Beef - Pork 0.500 -10.945 11.945
Beef - Cereal 4.700 -6.745 16.145
Pork - Beef -0.500 -11.945 10.945
Pork - Cereal 4.200 -7.245 15.645
Cereal - Beef -4.700 -16.145 6.745
Cereal - Pork -4.200 -15.645 7.245
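The "Minimum Significant Difference" lines in the two Bonferroni listings can be checked by hand. The sketch below assumes the usual equal-sample-size formula, critical t times sqrt(2 × MSE / n), and takes n per level from the N column of the Tukey listings (30 per Protein level, 20 per Source level). The function name is hypothetical.

```python
from math import sqrt

def min_sig_diff_t(t_crit, mse, n_per_level):
    """Minimum significant difference between two means with equal sample sizes."""
    return t_crit * sqrt(2 * mse / n_per_level)

mse = 214.5556                                   # Error Mean Square from the output
msd_protein = min_sig_diff_t(2.00488, mse, 30)   # Protein: 2 levels, 30 observations each
msd_source = min_sig_diff_t(2.47085, mse, 20)    # Source: 3 levels, 20 observations each
print(round(msd_protein, 4), round(msd_source, 3))
```

The results agree with the reported 7.5825 and 11.445. The larger critical value for Source reflects the Bonferroni adjustment for three pairwise comparisons instead of one.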
Tukey pairwise mean comparisons
PROC GLM DATA = twoway;
class Protein Source;
model weight_gain = Protein Source Protein*Source;
means Protein Source /tukey;
RUN;
QUIT;
The GLM Procedure
Tukey's Studentized Range (HSD) Test for weight_gain
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type
II error rate than REGWQ.
Alpha 0.05
Error Degrees of Freedom 54
Error Mean Square 214.5556
Critical Value of Studentized Range 2.83533
Minimum Significant Difference 7.5825
Means with the same letter are not significantly different.
Tukey Grouping Mean N Protein
A 95.133 30 High
B 80.600 30 Low
The GLM Procedure
Tukey's Studentized Range (HSD) Test for weight_gain
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type
II error rate than REGWQ.
Alpha 0.05
Error Degrees of Freedom 54
Error Mean Square 214.5556
Critical Value of Studentized Range 3.40823
Minimum Significant Difference 11.163
Means with the same letter are not significantly different.
Tukey Grouping Mean N Source
A 89.600 20 Beef
A
A 89.100 20 Pork
A
A 84.900 20 Cereal
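The Tukey minimum significant differences can be reproduced in the same spirit. This sketch assumes the standard equal-sample-size HSD formula, studentized-range critical value times sqrt(MSE / n), with the critical values copied from the two listings. The function name is hypothetical.

```python
from math import sqrt

def min_sig_diff_tukey(q_crit, mse, n_per_level):
    """Tukey HSD for equal sample sizes: q * sqrt(MSE / n)."""
    return q_crit * sqrt(mse / n_per_level)

mse = 214.5556                                        # Error Mean Square from the output
hsd_protein = min_sig_diff_tukey(2.83533, mse, 30)    # Protein: 30 observations per level
hsd_source = min_sig_diff_tukey(3.40823, mse, 20)     # Source: 20 observations per level
print(round(hsd_protein, 4), round(hsd_source, 3))
```

For Protein, which has only two levels, the Tukey and Bonferroni values coincide (7.5825) because the studentized-range critical value equals sqrt(2) times the t critical value. For Source, Tukey's 11.163 is slightly smaller than Bonferroni's 11.445.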
Models for Factorial Experiments
Part I. Factor Effects Model
The Single Factor Experiment (One-way ANOVA)
Situation
• We have t = a treatments.
• Let $\mu_i$ and $\sigma^2$ denote the mean and variance of treatment (population) i.
• i = 1, 2, 3, …, a.
• Note: we assume that the variance for each population is unknown but the same:
$$\sigma_1^2 = \sigma_2^2 = \dots = \sigma_a^2 = \sigma^2$$
The data
• Assume we have collected data for each of the a treatments.
• Let $y_{i1}, y_{i2}, y_{i3}, \dots, y_{in}$ denote the n observations for treatment i.
• i = 1, 2, 3, …, a.
The model
Note:
$$\begin{aligned} y_{ij} &= \mu_i + (y_{ij} - \mu_i) = \mu_i + \varepsilon_{ij} \\ &= \mu + (\mu_i - \mu) + \varepsilon_{ij} = \mu + \alpha_i + \varepsilon_{ij} \end{aligned}$$
where $\varepsilon_{ij} = y_{ij} - \mu_i$ has N(0, $\sigma^2$) distribution,
$\mu = \frac{1}{a}\sum_{i=1}^{a} \mu_i$ (overall mean effect),
$\alpha_i = \mu_i - \mu$ (Effect of Factor A).
Note: $\sum_{i=1}^{a} \alpha_i = 0$ by their definition.
Model 1:
$y_{ij}$ (i = 1, …, a; j = 1, …, n) are independent Normal with mean $\mu_i$ and variance $\sigma^2$.
Model 2:
$$y_{ij} = \mu_i + \varepsilon_{ij}$$
where $\varepsilon_{ij}$ (i = 1, …, a; j = 1, …, n) are independent Normal with mean 0 and variance $\sigma^2$.
Model 3:
$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$
where $\varepsilon_{ij}$ (i = 1, …, a; j = 1, …, n) are independent Normal with mean 0 and variance $\sigma^2$, and $\sum_{i=1}^{a} \alpha_i = 0$.
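As a small numerical illustration of the step from treatment means to the factor effects form, the sketch below computes the overall mean and the treatment effects from a set of treatment means and checks that the effects sum to zero. The three means are hypothetical values chosen only for the example.

```python
from statistics import mean

treatment_means = [12.0, 15.0, 21.0]           # hypothetical mu_i for a = 3 treatments
mu = mean(treatment_means)                     # overall mean: average of the mu_i
alpha = [m - mu for m in treatment_means]      # treatment effects: mu_i - mu

print(mu, alpha, sum(alpha))
```

Because each effect is a deviation from the average of the treatment means, the sum-to-zero constraint holds automatically and does not need to be imposed separately.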
The Two Factor Experiment
Situation
• We have t = ab treatment combinations.
• Let $\mu_{ij}$ and $\sigma^2$ denote the mean and variance of observations from the treatment combination when A = i and B = j.
• i = 1, 2, 3, …, a; j = 1, 2, 3, …, b.
The data
• Assume we have collected data (n observations) for each of the t = ab treatment combinations.
• Let $y_{ij1}, y_{ij2}, y_{ij3}, \dots, y_{ijn}$ denote the n observations for the treatment combination A = i, B = j.
• i = 1, 2, 3, …, a; j = 1, 2, 3, …, b.
The model
Note:
$$\begin{aligned} y_{ijk} &= \mu_{ij} + (y_{ijk} - \mu_{ij}) = \mu_{ij} + \varepsilon_{ijk} \\ &= \mu + (\mu_{i\cdot} - \mu) + (\mu_{\cdot j} - \mu) + (\mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu) + \varepsilon_{ijk} \\ &= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \end{aligned}$$
where $\varepsilon_{ijk} = y_{ijk} - \mu_{ij}$ follows N(0, $\sigma^2$) distribution,
$$\mu = \frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b} \mu_{ij}, \quad \mu_{i\cdot} = \frac{1}{b}\sum_{j=1}^{b} \mu_{ij}, \quad \text{and} \quad \mu_{\cdot j} = \frac{1}{a}\sum_{i=1}^{a} \mu_{ij},$$
$$\alpha_i = \mu_{i\cdot} - \mu, \quad \beta_j = \mu_{\cdot j} - \mu, \quad \text{and} \quad (\alpha\beta)_{ij} = \mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu.$$
Note: $\sum_{i=1}^{a} \alpha_i = 0$ by their definition.
Model:
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
Here $\mu$ is the mean, $\alpha_i$ and $\beta_j$ are the main effects, $(\alpha\beta)_{ij}$ is the interaction effect, and $\varepsilon_{ijk}$ is the error,
where $\varepsilon_{ijk}$ (i = 1, …, a; j = 1, …, b; k = 1, …, n) are independent Normal with mean 0 and variance $\sigma^2$, and
$$\sum_{i=1}^{a} \alpha_i = 0, \quad \sum_{j=1}^{b} \beta_j = 0, \quad \text{and} \quad \sum_{i=1}^{a} (\alpha\beta)_{ij} = \sum_{j=1}^{b} (\alpha\beta)_{ij} = 0.$$
Maximum Likelihood Estimates
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
where $\varepsilon_{ijk}$ (i = 1, …, a; j = 1, …, b; k = 1, …, n) are independent Normal with mean 0 and variance $\sigma^2$. The estimates are
$$\hat{\mu} = \bar{y}_{\cdot\cdot\cdot} = \frac{1}{abn}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}$$
$$\hat{\alpha}_i = \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot} = \frac{1}{bn}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk} - \bar{y}_{\cdot\cdot\cdot}$$
$$\hat{\beta}_j = \bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot} = \frac{1}{an}\sum_{i=1}^{a}\sum_{k=1}^{n} y_{ijk} - \bar{y}_{\cdot\cdot\cdot}$$
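$$\widehat{(\alpha\beta)}_{ij} = \bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot} = \frac{1}{n}\sum_{k=1}^{n} y_{ijk} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot}$$
$$\hat{\sigma}^2 = \frac{1}{nab}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} \left(y_{ijk} - \bar{y}_{ij\cdot}\right)^2 = \frac{1}{nab}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} \left(y_{ijk} - \left[\hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j + \widehat{(\alpha\beta)}_{ij}\right]\right)^2$$
This is not an unbiased estimator of $\sigma^2$ (usually the case when estimating variance).
The unbiased estimator results when we divide by ab(n − 1) instead of abn.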
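The estimates can be traced on a tiny balanced data set. The sketch below uses made-up observations with a = 2, b = 2, n = 2 (not the course data) and computes the grand mean, main effects, interaction effects, and both the maximum likelihood and the unbiased variance estimates. All variable names are illustrative.

```python
from statistics import mean

# Hypothetical balanced data: y[i][j] holds the n = 2 observations for A = i, B = j.
y = [[[8, 10], [14, 16]],
     [[11, 13], [25, 27]]]
a, b, n = 2, 2, 2

cell = [[mean(y[i][j]) for j in range(b)] for i in range(a)]        # cell means
mu_hat = mean(v for row in y for obs in row for v in obs)           # grand mean
alpha_hat = [mean(cell[i]) - mu_hat for i in range(a)]              # A effects
beta_hat = [mean(cell[i][j] for i in range(a)) - mu_hat for j in range(b)]  # B effects
ab_hat = [[cell[i][j] - mu_hat - alpha_hat[i] - beta_hat[j] for j in range(b)]
          for i in range(a)]                                        # interaction effects

ss_error = sum((v - cell[i][j]) ** 2
               for i in range(a) for j in range(b) for v in y[i][j])
sigma2_mle = ss_error / (a * b * n)          # divides by abn (biased)
s2_unbiased = ss_error / (a * b * (n - 1))   # divides by ab(n - 1)

print(mu_hat, alpha_hat, beta_hat, ab_hat, sigma2_mle, s2_unbiased)
```

With only two observations per cell, the unbiased estimate is twice the maximum likelihood estimate, which makes the effect of the divisor easy to see.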
The unbiased estimator of $\sigma^2$ is
$$s^2 = \frac{1}{ab(n-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} \left(y_{ijk} - \bar{y}_{ij\cdot}\right)^2 = \frac{1}{ab(n-1)}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} \left(y_{ijk} - \left[\hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j + \widehat{(\alpha\beta)}_{ij}\right]\right)^2 = \frac{1}{ab(n-1)} SS_{Error} = MS_{Error}$$
where
$$SS_{Error} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} \left(y_{ijk} - \bar{y}_{ij\cdot}\right)^2$$
Testing for Interaction:
We want to test:
H0: $(\alpha\beta)_{ij} = 0$ for all i and j, against
HA: $(\alpha\beta)_{ij} \neq 0$ for at least one i and j.
The test statistic
$$F = \frac{MS_{AB}}{MS_{Error}} = \frac{\frac{1}{(a-1)(b-1)} SS_{AB}}{MS_{Error}}$$
where
$$SS_{AB} = n\sum_{i=1}^{a}\sum_{j=1}^{b} \widehat{(\alpha\beta)}_{ij}^{\,2} = n\sum_{i=1}^{a}\sum_{j=1}^{b} \left(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot}\right)^2$$
We reject
H0: $(\alpha\beta)_{ij} = 0$ for all i and j,
if
$$F = \frac{MS_{AB}}{MS_{Error}} > F_{\alpha}\big((a-1)(b-1),\ ab(n-1)\big)$$
Testing for the Main Effect of A:
We want to test:
H0: $\alpha_i = 0$ for all i, against
HA: $\alpha_i \neq 0$ for at least one i.
The test statistic
$$F = \frac{MS_{A}}{MS_{Error}} = \frac{\frac{1}{a-1} SS_{A}}{MS_{Error}}$$
where
$$SS_{A} = nb\sum_{i=1}^{a} \hat{\alpha}_i^2 = nb\sum_{i=1}^{a} \left(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}\right)^2$$
We reject
H0: $\alpha_i = 0$ for all i,
if
$$F = \frac{MS_{A}}{MS_{Error}} > F_{\alpha}\big(a-1,\ ab(n-1)\big)$$
Testing for the Main Effect of B:
We want to test:
H0: $\beta_j = 0$ for all j, against
HA: $\beta_j \neq 0$ for at least one j.
The test statistic
$$F = \frac{MS_{B}}{MS_{Error}} = \frac{\frac{1}{b-1} SS_{B}}{MS_{Error}}$$
where
$$SS_{B} = na\sum_{j=1}^{b} \hat{\beta}_j^2 = na\sum_{j=1}^{b} \left(\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot}\right)^2$$
We reject
H0: $\beta_j = 0$ for all j,
if
$$F = \frac{MS_{B}}{MS_{Error}} > F_{\alpha}\big(b-1,\ ab(n-1)\big)$$
The ANOVA Table
Source   S.S.       d.f.             M.S. = SS/df   F
A        SS_A       a - 1            MS_A           MS_A / MS_Error
B        SS_B       b - 1            MS_B           MS_B / MS_Error
AB       SS_AB      (a - 1)(b - 1)   MS_AB          MS_AB / MS_Error
Error    SS_Error   ab(n - 1)        MS_Error
Total    SS_Total   abn - 1
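The rows of this table can be computed directly for a balanced design. The following sketch uses a small made-up data set and assumes the standard balanced-design sums of squares (including the multipliers nb, na, and n on the squared effect estimates). It shows the mechanics only and is not what SAS does internally; `two_way_anova` is a hypothetical helper.

```python
from statistics import mean

def two_way_anova(y):
    """y[i][j] is the list of n observations for A = i, B = j (balanced design)."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    cell = [[mean(y[i][j]) for j in range(b)] for i in range(a)]
    grand = mean(v for row in y for obs in row for v in obs)
    row_mean = [mean(cell[i]) for i in range(a)]
    col_mean = [mean(cell[i][j] for i in range(a)) for j in range(b)]
    ss = {
        "A": n * b * sum((m - grand) ** 2 for m in row_mean),
        "B": n * a * sum((m - grand) ** 2 for m in col_mean),
        "AB": n * sum((cell[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
                      for i in range(a) for j in range(b)),
        "Error": sum((v - cell[i][j]) ** 2
                     for i in range(a) for j in range(b) for v in y[i][j]),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": a * b * (n - 1)}
    ms = {k: ss[k] / df[k] for k in ss}                      # MS = SS / df
    f = {k: ms[k] / ms["Error"] for k in ("A", "B", "AB")}   # F = MS / MS_Error
    ss_total = sum((v - grand) ** 2 for row in y for obs in row for v in obs)
    return ss, df, ms, f, ss_total

# Hypothetical data with a = 2, b = 2, n = 2.
y = [[[8, 10], [14, 16]],
     [[11, 13], [25, 27]]]
ss, df, ms, f, ss_total = two_way_anova(y)
print(ss, df, f, ss_total)
```

In a balanced design the four sums of squares add up to the total sum of squares, and the degrees of freedom add up to abn − 1.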
The ANOVA Procedure
Dependent Variable: weight_gain
Source DF Sum of Squares Mean Square F Value Pr > F
Model 5 4612.93333 922.58667 4.30 0.0023
Error 54 11586.00000 214.55556
Corrected Total 59 16198.93333
R-Square Coeff Var Root MSE weight_gain Mean
0.284768 16.67039 14.64772 87.86667
Source DF Anova SS Mean Square F Value Pr > F
Protein 1 3168.266667 3168.266667 14.77 0.0003
Source 2 266.533333 133.266667 0.62 0.5411
Protein*Source 2 1178.133333 589.066667 2.75 0.0732
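The printed output is internally consistent with the ANOVA table formulas, which can be confirmed with a few lines of arithmetic. The sketch below only re-derives quantities from the numbers shown in the listing; it does not compute p-values.

```python
# Values copied from the PROC ANOVA listing above.
ss = {"Protein": 3168.266667, "Source": 266.533333, "Protein*Source": 1178.133333}
df = {"Protein": 1, "Source": 2, "Protein*Source": 2}
ss_error, df_error = 11586.0, 54
ss_total = 16198.93333
grand_mean = 87.86667

ms_error = ss_error / df_error                             # Error Mean Square
f_values = {k: (ss[k] / df[k]) / ms_error for k in ss}     # F = MS_effect / MS_Error
ss_model = sum(ss.values())                                # Model SS = sum of effect SS
f_model = (ss_model / sum(df.values())) / ms_error
r_square = ss_model / ss_total
root_mse = ms_error ** 0.5
coeff_var = 100 * root_mse / grand_mean

for name, value in f_values.items():
    print(name, round(value, 2))
print(round(f_model, 2), round(r_square, 6), round(root_mse, 5), round(coeff_var, 4))
```

The three effect sums of squares add up to the Model sum of squares because the design is balanced, and each F value is the effect mean square divided by 214.55556.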
Part II. General Linear Model
One-way ANOVA
ANOVA is a special case of the general linear model (GLM) in which all the predictors are categorical variables.
For one-way ANOVA, we have only one categorical predictor. As shown in the following slides, we can easily translate the ANOVA into a GLM using dummy variables.
Dummy Variables
• Dummy coding: 0s and 1s
– For a categorical predictor with k categories, k − 1 dummy variables go into the regression equation, leaving out one reference category (e.g. control).
• Coefficients are interpreted as the change with respect to the reference category (the one with all zeros)
– In this case group 3
Group D1 D2
1 1 0
2 0 1
3 0 0
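GLM representation and interpretations
• GLM model:
$$Y = \mu + \beta_1 D_1 + \beta_2 D_2 + \varepsilon$$
• Relation to category/group means:
Group 1: $\mu_1 = \mu + \beta_1 D_1 + \beta_2 D_2 = \mu + \beta_1 \cdot 1 + \beta_2 \cdot 0 = \mu + \beta_1$
Group 2: $\mu_2 = \mu + \beta_1 D_1 + \beta_2 D_2 = \mu + \beta_1 \cdot 0 + \beta_2 \cdot 1 = \mu + \beta_2$
Group 3: $\mu_3 = \mu + \beta_1 D_1 + \beta_2 D_2 = \mu + \beta_1 \cdot 0 + \beta_2 \cdot 0 = \mu$
• Therefore the ANOVA hypothesis:
$$H_0: \mu_1 = \mu_2 = \mu_3$$
• Can be expressed as:
$$H_0: \beta_1 = \beta_2 = 0$$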
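The correspondence between group means and regression coefficients can be checked numerically. In the sketch below the three group means are hypothetical; the coefficients are obtained by solving the three group-mean equations with group 3 as the reference, and the fitted means are then rebuilt from the dummy codes.

```python
group_means = {1: 12.0, 2: 15.0, 3: 21.0}      # hypothetical mu_1, mu_2, mu_3
dummies = {1: (1, 0), 2: (0, 1), 3: (0, 0)}    # (D1, D2); group 3 is the reference

mu = group_means[3]              # intercept equals the reference-group mean
beta1 = group_means[1] - mu      # group 1 minus reference
beta2 = group_means[2] - mu      # group 2 minus reference

def fitted_mean(group):
    """Rebuild a group mean from the intercept and dummy-variable coefficients."""
    d1, d2 = dummies[group]
    return mu + beta1 * d1 + beta2 * d2

for g in group_means:
    print(g, fitted_mean(g), group_means[g])
```

Both coefficients are zero exactly when all three group means are equal, which is why the two forms of the null hypothesis are equivalent.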
Two-way ANOVA
We will revisit the two-way ANOVA example on the impact of two factors on weight_gain:
(1) Protein level (denoted as Protein), which has two levels: High/Low
(2) Protein source (denoted as Source), which has three levels: Beef/Cereal/Pork
Dummy Variables
Source D1 D2
Beef 1 0
Cereal 0 1
Pork 0 0
Protein D
High 1
Low 0
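GLM representation and interpretations
• GLM model:
$$Y = \mu + \beta_1 D + \beta_2 D_1 + \beta_3 D_2 + \beta_4 D \times D_1 + \beta_5 D \times D_2 + \varepsilon$$
• Relation to category/group means:
High/Beef: $\mu_1 = \mu + \beta_1 \cdot 1 + \beta_2 \cdot 1 + \beta_3 \cdot 0 + \beta_4 \cdot 1 \times 1 + \beta_5 \cdot 1 \times 0 = \mu + \beta_1 + \beta_2 + \beta_4$
High/Cereal: $\mu_2 = \mu + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \beta_3 \cdot 1 + \beta_4 \cdot 1 \times 0 + \beta_5 \cdot 1 \times 1 = \mu + \beta_1 + \beta_3 + \beta_5$
High/Pork: $\mu_3 = \mu + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \beta_3 \cdot 0 + \beta_4 \cdot 1 \times 0 + \beta_5 \cdot 1 \times 0 = \mu + \beta_1$
Low/Beef: $\mu_4 = \mu + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \beta_3 \cdot 0 + \beta_4 \cdot 0 \times 1 + \beta_5 \cdot 0 \times 0 = \mu + \beta_2$
Low/Cereal: $\mu_5 = \mu + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \beta_3 \cdot 1 + \beta_4 \cdot 0 \times 0 + \beta_5 \cdot 0 \times 1 = \mu + \beta_3$
Low/Pork: $\mu_6 = \mu + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \beta_3 \cdot 0 + \beta_4 \cdot 0 \times 0 + \beta_5 \cdot 0 \times 0 = \mu$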
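The six cell-mean equations can be solved for the coefficients and checked by rebuilding every cell mean from the dummy codes. The sketch below uses the coding from the Dummy Variables slide (D = 1 for High; D1 = 1 for Beef; D2 = 1 for Cereal; Low and Pork as reference levels). The cell means are hypothetical and are not the values from the weight gain data.

```python
# Hypothetical cell means, keyed by (Protein, Source).
cell_means = {
    ("High", "Beef"): 33.0, ("High", "Cereal"): 24.0, ("High", "Pork"): 28.0,
    ("Low", "Beef"): 20.0, ("Low", "Cereal"): 22.0, ("Low", "Pork"): 18.0,
}

mu = cell_means[("Low", "Pork")]                      # reference cell
b1 = cell_means[("High", "Pork")] - mu                # High vs Low within Pork
b2 = cell_means[("Low", "Beef")] - mu                 # Beef vs Pork within Low
b3 = cell_means[("Low", "Cereal")] - mu               # Cereal vs Pork within Low
b4 = cell_means[("High", "Beef")] - mu - b1 - b2      # interaction term for High/Beef
b5 = cell_means[("High", "Cereal")] - mu - b1 - b3    # interaction term for High/Cereal

def model_mean(protein, source):
    """Cell mean implied by the dummy-variable model."""
    d = 1 if protein == "High" else 0
    d1 = 1 if source == "Beef" else 0
    d2 = 1 if source == "Cereal" else 0
    return mu + b1 * d + b2 * d1 + b3 * d2 + b4 * d * d1 + b5 * d * d2

for key, value in cell_means.items():
    print(key, model_mean(*key), value)
```

Under this reference coding, the coefficient on D is the High minus Low difference within Pork (the reference source), and the coefficients on D1 and D2 are the Beef and Cereal differences from Pork within the Low protein level.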
GLM representation and interpretations
• Test for Interaction: $H_0: \beta_4 = \beta_5 = 0$
• Test for Protein (level) main effect: $H_0: \beta_1 = 0$
• Test for (protein) Source main effect: $H_0: \beta_2 = \beta_3 = 0$
Acknowledgement:
• We thank colleagues who posted their lecture notes on the internet!
• Please note that SAS has several procedures for performing ANOVA. These include Proc ANOVA and Proc GLM, plus several other procedures such as Proc Mixed. The ANOVA procedures we have learned so far are just the basic fixed-effects ANOVAs. In the future we will also learn models with random effects and mixed effects. See the following websites for a review and preview:
• http://www.ats.ucla.edu/stat/sas/library/SASAnova_mf.htm
• http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#mixed_toc.htm
• http://www.hawaii.edu/hisug/pdf/AnnMariaprocmixed.pdf