notes_anova_04-13-08
Post on 02-Jun-2018
8/10/2019 Notes_ANOVA_04-13-08
http://slidepdf.com/reader/full/notesanova04-13-08 1/12
One-way Analysis of Variance (available on E-book)
In basic statistics, the F-distribution is used in: (1) making inferences about two population variances—i.e., homogeneity of variance test, and (2) analysis of variance (ANOVA). In this class, we will cover
only the ANOVA test.
E.g., if samples are drawn of size n1=8 from Population 1 and size n2=5 from Population 2, then F has df= 7, 4 (i.e., (8-1), (5-1)).
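For Population 1 ($n_1 = 8$):
\[ s_1^2 = \frac{\sum (x_i - \bar{x}_1)^2}{8 - 1}, \qquad df = 8 - 1 = 7 \]
For Population 2 ($n_2 = 5$):
\[ s_2^2 = \frac{\sum (x_i - \bar{x}_2)^2}{5 - 1}, \qquad df = 5 - 1 = 4 \]
\[ F = \frac{s_1^2}{s_2^2} \text{ has } df = 7 \text{ and } 4 \]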
Note that df for F is always stated as first numerator df and then denominator df .
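As a minimal sketch of the variance ratio and its degrees of freedom, the snippet below uses two small samples that are invented purely for illustration (they do not come from these notes); the helper name `variance_ratio` is likewise hypothetical.

```python
from statistics import variance

# Hypothetical measurements, invented for illustration only
sample_1 = [12.1, 9.8, 11.4, 10.9, 13.2, 10.1, 12.7, 11.0]  # n1 = 8
sample_2 = [10.2, 10.9, 9.7, 10.5, 10.1]                    # n2 = 5

def variance_ratio(first, second):
    """Return F = s1^2 / s2^2 and its (numerator df, denominator df)."""
    s1_sq = variance(first)    # sample variance, divides by n1 - 1
    s2_sq = variance(second)   # sample variance, divides by n2 - 1
    return s1_sq / s2_sq, (len(first) - 1, len(second) - 1)

f_value, df = variance_ratio(sample_1, sample_2)
print(f"F = {f_value:.3f} with df = {df[0]}, {df[1]}")
```

The df are reported numerator first, then denominator, matching the convention stated above.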
Finding critical values of the F-distribution using Table V.
Characteristics of the F-distribution
1. F > 0.
2. The F-distribution is not symmetric; it is skewed to the right.
3. The F-distribution is asymptotic to the horizontal axis on the right-hand side.
4. As df increase, the high point of the F-distribution approaches 1.
5. The shape of the F-distribution depends upon the degrees of freedom in the numerator and denominator (see Figure 10 above). This is similar to Student’s t-distribution, whose shape depends upon the degrees of freedom.
6. The total area under the curve is 1.
Fisher’s F-distribution
If $\sigma_1^2 = \sigma_2^2$ and $s_1^2$ and $s_2^2$ are sample variances from independent simple random samples of size $n_1$ and $n_2$, respectively, drawn from normal populations, then
\[ F = \frac{s_1^2}{s_2^2} \]
follows the F-distribution with $n_1 - 1$ degrees of freedom in the numerator and $n_2 - 1$ degrees of freedom in the denominator.
Find the critical F-value for a right-tailed test with α = 0.05, degrees of freedom in the numerator = 10, and degrees of freedom in the denominator = 6.
The critical value is written F0.05, 10, 6, where the first subscript is the area in the right tail (i.e., α or significance level), the second is the df of the numerator, and the third is the df of the denominator.
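When a printed table is not at hand, a critical value can be approximated numerically. The sketch below is one possible approach, not the method used in these notes: it integrates the F density with Simpson's rule and then bisects for the point whose right-tail area equals α. The function names `f_cdf` and `f_critical` are hypothetical, and the code assumes numerator df of at least 2.

```python
import math

def f_cdf(x, d1, d2, steps=2000):
    """Approximate P(F <= x) by Simpson's rule on the F density (assumes d1 >= 2)."""
    if d1 < 2:
        raise ValueError("this sketch assumes numerator df >= 2")
    if x <= 0:
        return 0.0
    # log of the normalising constant of the F density
    log_const = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2)
                 - math.lgamma(d2 / 2) + (d1 / 2) * math.log(d1 / d2))

    def pdf(t):
        if t == 0:
            return math.exp(log_const) if d1 == 2 else 0.0
        return math.exp(log_const + (d1 / 2 - 1) * math.log(t)
                        - ((d1 + d2) / 2) * math.log(1 + d1 * t / d2))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3

def f_critical(alpha, d1, d2):
    """Find the value with right-tail area alpha by bisection."""
    lo, hi = 0.0, 1.0
    while 1 - f_cdf(hi, d1, d2) > alpha:  # widen until the tail is small enough
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - f_cdf(mid, d1, d2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"F(0.05; 10, 6) is approximately {f_critical(0.05, 10, 6):.2f}")
```

Statistical software or a spreadsheet would normally be used instead; the point of the sketch is only to show what the table lookup represents, namely the value that leaves area α in the right tail.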
Analysis of Variance (ANOVA) is an inferential method that is used to test the equality of three or more population means. ANOVA is an extension of a t-test for independent samples (section 10.2).
H0: µ1 = µ2 = … = µk
H1: not all means are equal
For example, for k = 3 the null and alternative hypotheses are:
H0: µ1 = µ2 = µ3
H1: at least one mean differs, i.e., one of the following holds:
    µ1 = µ2 ≠ µ3
    µ1 ≠ µ2 = µ3
    µ1 = µ3 ≠ µ2
    µ1 ≠ µ2 ≠ µ3 (all three means differ)
Assumptions of a One-Way ANOVA
1. There are k simple random samples from k populations.
2. The k samples are independent of each other; that is, the subjects in one group cannot be related in any way to subjects in a second group.
3. The populations are normally distributed.
4. The populations have the same variance; that is, each treatment group has population variance σ².
[Figure: distributions of Population 1, Population 2, and Population 3]
ANOVA Test using the F-distribution: Hypothesis Test Regarding Three or More Means with σ Unknown
Assumptions:
• k simple random samples from k populations.
• k populations are normally distributed.
• k samples are independent of each other.
• The populations have the same variance, σ².
Step 1: A claim is made regarding the means of three or more populations. The null and alternative hypotheses are written as:
H0: µ1 = µ2 = … = µk
H1: not all means are equal
Step 2: Select a level of significance, α, and find the right-tailed critical value for the F-distribution with df=(k-1),
(n1+n2+…+nk -k). The rejection region (or critical region) is the set of all values of the test statistic to the right of the critical
F-value.
Step 3: Calculate the test statistic or calculated F-value:
a. Calculate the grand mean of the combined data set, $\bar{x}$, by adding up all the observations and dividing by the number of observations.
b. Find the sample mean for each population or treatment ($\bar{x}_1$ = sample mean from population 1; $\bar{x}_2$ = sample mean from population 2; and so on).
c. Find the sample variance for each population ($s_1^2$ = sample variance from population 1; $s_2^2$ = sample variance from population 2; and so on).
d. Calculate the mean square due to treatment. (Another name for mean square is variance, which is equal to the “mean” of the squared deviations about $\bar{x}$.)
\[ MST = \frac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \dots + n_k(\bar{x}_k - \bar{x})^2}{k - 1}, \]
where n1 is the sample size from population 1;
n2 is the sample size from population 2; and so on
k is the number of populations, or treatment levels.
e. Calculate the mean square due to error:
\[ MSE = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_k - 1)s_k^2}{(n_1 + n_2 + \dots + n_k) - k}. \]
f. Calculate the F test statistic:
\[ F = \frac{MST \ (\text{mean square due to treatment})}{MSE \ (\text{mean square due to error})} \]
The calculations in Step 3 are reported in an ANOVA table as shown below.
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square                    F-Statistic
Treatment             SST              k-1                  MST = SST/(k-1)                F = MST/MSE
Error                 SSE              n1+n2+…+nk-k         MSE = SSE/(n1+n2+…+nk-k)
Total                 Total SS         n1+n2+…+nk-1
Step 4: Draw a conclusion:
• Compare the calculated F-value (or F test statistic) to the critical F-value and state whether or not H0 is rejected at the
specified α.
If F > Fα, (k-1),(n1+n2+…+nk-k), reject H0; otherwise do not reject H0.
• Interpret the conclusion in the context of the problem.
[Figure: critical value Fα, k-1, n1+n2+…+nk-k marked on the F-distribution]
Note: The numerator in the computation of MST is called the “sum of squares treatment” or SST.
Note: The numerator in the computation of MSE is called the “sum of squares error” or SSE.
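The two sums of squares in the notes above can be tied to the Total row of the ANOVA table. As a brief sketch, with two symbols introduced here for convenience that are not used elsewhere in these notes ($N = n_1 + n_2 + \dots + n_k$ for the total sample size, and $x_{ij}$ for the j-th observation in sample i):

```latex
\[
\begin{aligned}
SST &= \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2, \\
SSE &= \sum_{i=1}^{k} (n_i - 1)\,s_i^2, \\
\text{Total SS} &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 = SST + SSE, \\
(k - 1) + (N - k) &= N - 1 .
\end{aligned}
\]
```

In other words, both the sums of squares and the degrees of freedom in the Treatment and Error rows add up to the Total row, which gives a quick arithmetic check on a completed table.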
ANOVA—Blood Glucose Levels of Rats.
Problem: Researcher Jelodar Gholamali wanted to determine the effectiveness of various treatments on glucose levels of diabetic rats. He randomly assigned diabetic albino rats into four treatment groups. Group 1 rats served as a control group and were fed a regular diet. Group 2 rats were served a regular diet supplemented with an herb, fenugreek. Group 3 rats were served a regular diet supplemented with garlic. Group 4 rats were served a regular diet supplemented with onion. The basis for the study is that Persian folklore states that diets supplemented with fenugreek, garlic, or onion help to treat diabetes. After 15 days of treatment, the blood glucose was measured in milligrams per deciliter (mg/dL). The results are presented in the table below.
Carry out a test of the relevant null hypothesis to test the claim made by Persian folklore that fenugreek, garlic, and onion help treat diabetes. Use α = 0.05. Show all four steps of the hypothesis test.
Step 1: A claim is made regarding the means of the four populations. The null and alternative hypotheses are written as:
H0: µ1 = µ2 = µ3 = µ4
H1: not all means are equal
Step 2: Select α = 0.05 and find the right-tailed critical value for the F-distribution with df = (k-1), (n1+n2+n3+n4-k), or df = 3, 28.
F0.05, 3, 28 ≈ 2.95
Blood glucose (mg/dL):
Control   Fenugreek   Garlic   Onion
288.1     229.1       177.4    299.7
296.8     240.7       202.2    258.3
267.8     239.4       163.1    286.8
256.7     207.7       184.7    244.0
292.1     225.7       197.9    267.1
282.9     230.8       164.6    297.1
260.3     206.6       193.9    249.9
283.8     213.3       158.1    265.1
Step 3:
a. Calculate the grand mean of the entire data set:
\[ \bar{x} = \frac{288.1 + 296.8 + \dots + 249.9 + 265.1}{32} = 238.54 \]
b. Find the sample mean of each population, where control = Population 1, Fenugreek = Population 2, Garlic =
Population 3 and Onion = Population 4.
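\[ \bar{x}_1 = \frac{288.1 + 296.8 + \dots + 283.8}{8} = 278.56 \]
\[ \bar{x}_2 = \frac{229.1 + 240.7 + \dots + 213.3}{8} = 224.16 \]
\[ \bar{x}_3 = \frac{177.4 + 202.2 + \dots + 158.1}{8} = 180.24 \]
\[ \bar{x}_4 = \frac{299.7 + 258.3 + \dots + 265.1}{8} = 271.00 \]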
c. Find the sample variance for each population.
\[ s_1^2 = \frac{(288.1 - 278.56)^2 + (296.8 - 278.56)^2 + \dots + (283.8 - 278.56)^2}{8 - 1} = 225.77 \]
\[ s_2^2 = \frac{(229.1 - 224.16)^2 + (240.7 - 224.16)^2 + \dots + (213.3 - 224.16)^2}{8 - 1} = 181.99 \]
\[ s_3^2 = \frac{(177.4 - 180.24)^2 + (202.2 - 180.24)^2 + \dots + (158.1 - 180.24)^2}{8 - 1} = 291.03 \]
\[ s_4^2 = \frac{(299.7 - 271)^2 + (258.3 - 271)^2 + \dots + (265.1 - 271)^2}{8 - 1} = 448.58 \]
d. Compute MST:
\[ MST = \frac{8(278.56 - 238.54)^2 + 8(224.16 - 238.54)^2 + 8(180.24 - 238.54)^2 + 8(271 - 238.54)^2}{4 - 1} = \frac{50{,}087.4112}{3} = 16{,}695.8 \]
e. Compute MSE:
\[ MSE = \frac{(8 - 1)225.77 + (8 - 1)181.99 + (8 - 1)291.03 + (8 - 1)448.58}{32 - 4} = \frac{8{,}031.59}{28} = 286.84 \]
f. Compute the F test statistic:
\[ F = \frac{\text{Mean square due to treatment}}{\text{Mean square due to error}} = \frac{MST}{MSE} = \frac{16{,}695.8}{286.84} = 58.21 \]
ANOVA Table:
Source of Variation   Sum of Squares   Degrees of Freedom      Mean Square       F-Test Statistic
Between               50,087.41        k-1 = 4-1 = 3           MST = 16,695.80   calc F = 58.21
Within                8,031.59         n1+n2+n3+n4-k = 28      MSE = 286.84
Total                 58,119.00        n1+n2+n3+n4-1 = 31
Step 4: Conclusion. Because the calculated F-statistic = 58.21 is greater than the critical F-value (about 2.95 for df = 3, 28), reject H0 at the 0.05 significance level. At least one of the population means is different from the others.
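The Step 3 arithmetic can be cross-checked in a few lines of code. This is a minimal sketch that applies the MST and MSE formulas from these notes to the rat data; the function name `one_way_anova` and the dictionary layout are choices made here for illustration.

```python
from statistics import mean, variance

# Blood glucose (mg/dL) for the four treatment groups in the example
groups = {
    "Control":   [288.1, 296.8, 267.8, 256.7, 292.1, 282.9, 260.3, 283.8],
    "Fenugreek": [229.1, 240.7, 239.4, 207.7, 225.7, 230.8, 206.6, 213.3],
    "Garlic":    [177.4, 202.2, 163.1, 184.7, 197.9, 164.6, 193.9, 158.1],
    "Onion":     [299.7, 258.3, 286.8, 244.0, 267.1, 297.1, 249.9, 265.1],
}

def one_way_anova(samples):
    """Return (SST, SSE, MST, MSE, F) for a list of samples."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    # Sum of squares treatment: weighted squared deviations of group means
    sst = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Sum of squares error: (n_i - 1) times each sample variance
    sse = sum((len(s) - 1) * variance(s) for s in samples)
    mst = sst / (k - 1)
    mse = sse / (n_total - k)
    return sst, sse, mst, mse, mst / mse

sst, sse, mst, mse, f_stat = one_way_anova(list(groups.values()))
print(f"SST = {sst:.2f}, SSE = {sse:.2f}")
print(f"MST = {mst:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
```

Because the code carries full precision throughout, its sums of squares can differ from the hand calculation in the last digits, where intermediate values are rounded to two decimals; the F-statistic agrees to two decimals.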
Excel: ANOVA—Single Factor
Step 1: Enter the raw data in columns A, B, C, ... for each sample (or treatment).
Step 2: From the Windows menu bar, select Tools/Data Analysis/ANOVA: Single Factor.
Step 3: With the cursor in the “Input Range:” box, highlight the data. Click OK.
Perform the calculations using Excel.
      A                       B           C          D          E          F          G
 1    Control                 Fenugreek   Garlic     Onion
 2    288.1                   229.1       177.4      299.7
 3    296.8                   240.7       202.2      258.3
 4    267.8                   239.4       163.1      286.8
 5    256.7                   207.7       184.7      244.0
 6    292.1                   225.7       197.9      267.1
 7    282.9                   230.8       164.6      297.1
 8    260.3                   206.6       193.9      249.9
 9    283.8                   213.3       158.1      265.1
10
11
12    Anova: Single Factor
13    SUMMARY
14    Groups                  Count       Sum        Average    Variance
15    Control                 8           2228.5     278.5625   225.7713
16    Fenugreek               8           1793.3     224.1625   181.9884
17    Garlic                  8           1441.9     180.2375   291.0341
18    Onion                   8           2168.0     271        448.58
19
20    ANOVA
21    Source of Variation     SS          df         MS         F          P-value    F crit
22    Between Groups (SST)    50090.69    3          16696.9    58.2091    3.74E-12   2.946685
23    Within Groups (SSE)     8031.616    28         286.8434
24    Total                   58122.31    31
Note: Be sure the Analysis ToolPak is activated. This is done by selecting the Tools menu and choosing Add-Ins. Check the box for the Analysis ToolPak and select OK.
In the Excel output, the F column gives the F-statistic (or calculated F), and the F crit column gives the critical F-value.
Logic of the ANOVA test
If H0 is TRUE, MST will approximately equal MSE and the calculated F will be approximately equal to 1.
If H0 is FALSE, MST will tend to be greater than MSE and the calculated F will tend to be greater than 1.
• If the k samples are taken from populations with different means, then MST will be considerably greater than MSE, owing to the wider dispersion of the sample means ($\bar{x}_i$) about the grand mean ($\bar{x}$); see figure below.
• If MST is so large that, in comparison to MSE, it yields a calculated F-value greater than the critical F-value, we conclude that the sample means are significantly different and that there must be at least one pair of samples whose means differ significantly.
[Figure: two panels, “H0 is TRUE” and “H0 is FALSE”, each showing the sample distributions relative to the grand mean $\bar{x}$]
ANOVA test is always a one-tailed test:
• A significant result occurs only if MST > MSE, i.e., if the calculated F > 1; thus, a right-tailed test is always used in ANOVA.
• Whenever MST < MSE, the result is never considered significant.
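The logic above can be illustrated with a small simulation. The sketch below draws four samples of size 8 from normal populations many times, once with equal population means and once with unequal means, and compares the resulting F-statistics with the F crit value of 2.946685 reported in the Excel output (df = 3, 28). The population means, the standard deviation of 15, the seed, and the function names are all assumptions chosen only for illustration.

```python
import random
from statistics import mean, variance

def f_statistic(samples):
    """F = MST / MSE, following the formulas in Step 3."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    mst = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples) / (k - 1)
    mse = sum((len(s) - 1) * variance(s) for s in samples) / (n_total - k)
    return mst / mse

def simulate(group_means, sd=15.0, n=8, reps=2000, seed=1):
    """Draw `reps` data sets from normal populations and return their F-statistics."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        f_statistic([[rng.gauss(m, sd) for _ in range(n)] for m in group_means])
        for _ in range(reps)
    ]

F_CRIT = 2.946685  # critical value for df = 3, 28 at alpha = 0.05 (Excel output)

f_null = simulate([250, 250, 250, 250])  # H0 true: all population means equal
f_alt = simulate([280, 225, 180, 270])   # H0 false: hypothetical unequal means

share_null = sum(f > F_CRIT for f in f_null) / len(f_null)
share_alt = sum(f > F_CRIT for f in f_alt) / len(f_alt)
print(f"H0 true:  average F = {mean(f_null):.2f}, share rejected = {share_null:.3f}")
print(f"H0 false: average F = {mean(f_alt):.2f}, share rejected = {share_alt:.3f}")
```

When H0 is true, the F-statistics cluster around 1 and the share exceeding the critical value stays close to α; when the population means differ this much, essentially every simulated F lands far in the right tail.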
Tukey’s Test Using the Studentized Range Distribution: Hypothesis Test Comparing Two Means (see Section 13.2, available under Course Compass).
Assumptions:
• k simple random samples from k populations.
• k populations are normally distributed.
• k samples are independent of each other.
• The populations have the same variance, σ².
Step 1: A claim is made regarding the two population means (µi and µj).
Two-Tailed Test
H0: µi = µj
H1: µi ≠ µj (that is, µi < µj or µi > µj)
Step 2: Determine the critical value, qα, (n1+n2+…+nk-k), k, where α = significance level, (n1+n2+…+nk-k) = df for error, and k = number of treatments (population means being compared).
Step 3: (a) Compute the pairwise differences, $\bar{x}_i - \bar{x}_j$, where $\bar{x}_i > \bar{x}_j$.
(b) Compute the test statistic,
\[ q = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{\dfrac{s^2}{2}\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}}. \]
Note that $s^2$ is the mean square due to error, MSE, from the ANOVA table; $n_i$ is the sample size from population i; and $n_j$ is the sample size from population j.
Step 4: Draw a conclusion:
• Compare the calculated q (or q statistic) to the critical value, qα, (n1+n2+…+nk-k), k, and state whether or not H0 is rejected at the specified α.
If q ≥ qα, (n1+n2+…+nk-k), k, reject H0; otherwise do not reject H0.
• Interpret the conclusion in the context of the problem.
Compare all pairwise differences to identify which population means are considered equal.
Tukey’s Test Using the Studentized Range Distribution—Example
          Control   Fenugreek   Garlic   Onion
          288.1     229.1       177.4    299.7
          296.8     240.7       202.2    258.3
          267.8     239.4       163.1    286.8
          256.7     207.7       184.7    244.0
          292.1     225.7       197.9    267.1
          282.9     230.8       164.6    297.1
          260.3     206.6       193.9    249.9
          283.8     213.3       158.1    265.1
Mean      278.56    224.16      180.24   271.00
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F-Statistic
Treatment             50,087.41        3                    16,695.80     58.21
Error                 8,031.59         28                   286.84
Total                 58,119.00        31
Step 1: State the null and alternative hypotheses.
Step 2: Determine the critical value, qα, (n1+n2+n3+n4-k), k, where α = significance level, (n1+n2+n3+n4-k) = df for error, and k = number of treatments (population means being compared).
α = ________
k = ______________
n1+n2+n3+n4-k = __________________________
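Step 3: (a) Compute the pairwise difference, $\bar{x}_i - \bar{x}_j$, where $\bar{x}_i > \bar{x}_j$.
(b) Compute the test statistic,
\[ q = \frac{(\bar{x}_i - \bar{x}_j) - (\mu_i - \mu_j)}{\sqrt{\dfrac{s^2}{2}\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}}. \]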
Step 4—Conclusion. Provide a conclusion and the statistical justification for the conclusion, and
interpret your conclusion in the context of the problem.
Repeat this procedure for all pairwise differences in sample means.
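Repeating the calculation for every pair is mechanical, so it lends itself to a short script. The sketch below computes the q statistic for all six pairs of groups in the rat example, taking µi − µj = 0 under H0 and computing MSE from the raw data. The helper names `pooled_mse` and `tukey_q` are hypothetical, and the critical value is deliberately not computed here: it still has to be read from a studentized range table for the chosen α, error df, and k.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

# Blood glucose (mg/dL) for the four treatment groups in the example
groups = {
    "Control":   [288.1, 296.8, 267.8, 256.7, 292.1, 282.9, 260.3, 283.8],
    "Fenugreek": [229.1, 240.7, 239.4, 207.7, 225.7, 230.8, 206.6, 213.3],
    "Garlic":    [177.4, 202.2, 163.1, 184.7, 197.9, 164.6, 193.9, 158.1],
    "Onion":     [299.7, 258.3, 286.8, 244.0, 267.1, 297.1, 249.9, 265.1],
}

def pooled_mse(samples):
    """Mean square due to error (s^2 in the q formula)."""
    n_total = sum(len(s) for s in samples)
    sse = sum((len(s) - 1) * variance(s) for s in samples)
    return sse / (n_total - len(samples))

def tukey_q(sample_i, sample_j, mse):
    """q statistic for one pair, with mu_i - mu_j = 0 under H0."""
    diff = abs(mean(sample_i) - mean(sample_j))  # larger mean minus smaller mean
    return diff / sqrt((mse / 2) * (1 / len(sample_i) + 1 / len(sample_j)))

mse = pooled_mse(list(groups.values()))
q_values = {
    (a, b): tukey_q(groups[a], groups[b], mse)
    for a, b in combinations(groups, 2)
}

# List the pairs from largest to smallest q for comparison with the table value
for (a, b), q in sorted(q_values.items(), key=lambda item: -item[1]):
    print(f"{a} vs {b}: q = {q:.2f}")
```

Each printed q is then compared with the critical value from Step 2; pairs whose q meets or exceeds it are the ones for which H0: µi = µj is rejected.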
Comparison   Difference, $\bar{x}_i - \bar{x}_j$   H0 and H1   Test Statistic, q   Critical Value   Conclusion
Summary of Tukey’s Test (arrange sample means from highest to lowest and draw a line under means that are