09 analysis of variance- part1
TRANSCRIPT
Breakthrough Management Group (BMG)
Analysis of Variance – Part I
One-way ANOVA – Hypothesis testing for multiple means
[Figure labels: μ1, μ2, μ3, μ4]
Pg 2© Breakthrough Management Group, Inc. Unpublished proprietary work available only under license. All rights reserved.
Module Objectives
To introduce the concepts of:
  Analysis of Variance
  Sum of Squares
  Mean Square Error
To demonstrate and practice calculating the ANOVA table:
  Manually
  Minitab
To practice ANOVA:
  Exercises
  Homework
  Quiz
Analyze Phase Deliverables
1) Describe Project
  Objective statement
  Metrics.xls charts
  Initial validated forecast
2) FMEA
3) ID Variation: Graphical Methods
4) ID Variation: Statistical Methods
  Correlation & Regression
  Means testing
  Sigma testing
  Proportions testing
  Contingency tables
5) Planning for DOE
6) Complete Phase Summary
  Conclusions, Issues, & Next Steps
[Figure: Analyze phase roadmap: Failure Modes & Effects Analysis; ID Variation: Graphical Analysis; ID Variation: Statistical Analysis; Plan for DOE]
Mathematical Tools & Data Types
[Figure: tool-selection matrix. One axis is the Response Variable (Discrete, Continuous); the other is the Independent Variable (Discrete, Continuous). Analysis of Variance is the highlighted tool.]
A Menu of Six Sigma Tools.
Hypothesis Test Reference
Data Description Hypothesis Test Tool examples
Printer Model (X1) vs. Service Level (Y)
1 Discrete X, Continuous Y
Ho: u = Target; 1-Sample t test
2-sample t test; 2-sample paired t-test; ANOVA
Printer cartridge defect rate (p) vs. target
1 Discrete X, Continuous Y
Ho: p = po; 1 Proportion; p-value
Comparison of quality levels of two products
2 Discrete X, Discrete Y
Ho: p1 – p2 = po; 2 Proportions; Difference, confidence interval, p-value
Observed Events vs. Expected Events
3+ Discrete X, Discrete Y
Ho: p1=…=pk; Chi-Square; Analysis of Means
Contingency table
Printer Models (X1, X2) vs. Fuser Life (Y)
2 Discrete X, Continuous Y
Ho: u1 = u2
Time series charts; histogram; Capability analysis; Scatter plots; Dot Plots
Monday-Friday (X1, X2, X3, X4, X5) vs. Sales (Y)
3+ Discrete X, Continuous Y
Ho: u1=…=uk; Box Plots; Main Effects plots; Interaction plots; Pareto charts
Testing Means
Testing for one mean
  Z-test for large samples or when σ is known
  t-test for smaller samples or when σ is unknown
  Ho: μ = μtarget
  Ha: μ < μtarget, μ > μtarget, or μ ≠ μtarget
Testing for two means
  2-sample t-test
  Paired t-test
  Ho: μ1 = μ2
  Ha: μ1 < μ2, μ1 > μ2, or μ1 ≠ μ2
Testing for three or more means
  1-way ANOVA
  Ho: μ1 = μ2 = μ3 = μ4
  Ha: at least one μ is different from another
ANOVA Model
Mathematical Model for ANOVA
$y_{ij} = \mu + \tau_j + \varepsilon_{ij}$
Where:
  $y_{ij}$ = a single response from Treatment j
  $\mu$ = overall mean
  $\tau_j$ = the contribution from Treatment j
  $\varepsilon_{ij}$ = random error
Mathematical Hypothesis:
  $H_0$: all $\tau_j = 0$
  $H_a$: at least one $\tau_j \neq 0$
Conventional Translation:
  $H_0: \mu_1 = \mu_2 = \dots = \mu_j$
  $H_a$: at least one $\mu_j$ is different
ANOVA is a model for discrete inputs and continuous output features
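The model can be made concrete by generating data from it. The sketch below is only an illustration: the function name `simulate_one_way`, the normal error distribution, and all numeric values are assumptions chosen for the example, not values from the course material.

```python
import numpy as np

def simulate_one_way(mu, taus, n_per_group, sigma, seed=0):
    """Draw y_ij = mu + tau_j + eps_ij, with eps_ij ~ Normal(0, sigma).

    Returns an array of shape (n_per_group, number of treatments):
    one column per treatment j, one row per observation i.
    """
    rng = np.random.default_rng(seed)
    taus = np.asarray(taus, dtype=float)
    eps = rng.normal(0.0, sigma, size=(n_per_group, taus.size))
    return mu + taus + eps  # taus broadcasts across the rows

# Hypothetical overall mean, treatment effects, and error spread.
y = simulate_one_way(mu=6.5, taus=[0.0, 0.5, -0.5, 1.0, -1.0],
                     n_per_group=10, sigma=0.1)
print(y.shape)         # (10, 5)
print(y.mean(axis=0))  # column means fall near mu + tau_j
```

Setting every entry of `taus` to zero produces data for which the null hypothesis is true, which is a convenient way to see how often a test rejects by chance.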
ANOVA Introduction
Example: A fertilizer company wants to compare fields treated with four new spring wheat fertilizers to fields with no fertilizer. Ten different fields were sampled for each treatment and the average bushels/acre was computed.
Problem: Do any of the fertilizers differ from each other or from the unfertilized control group?
How can we analyze this data?
Case Studies
Case 1

Trial  TmtA  TmtB  TmtC  TmtD  TmtCntrl1
1      6.6   7.0   6.1   7.5   5.5
2      6.4   7.0   5.9   7.5   5.7
3      6.5   7.0   6.0   7.6   5.5
4      6.6   6.8   6.1   7.4   5.4
5      6.5   7.1   5.9   7.4   5.5
6      6.6   7.0   5.8   7.4   5.5
7      6.5   6.9   6.0   7.6   5.4
8      6.6   6.9   5.9   7.7   5.5
9      6.7   6.9   6.1   7.5   5.5
10     6.5   6.9   6.1   7.6   5.7
Mean   6.6   7.0   6.0   7.5   5.5

Are the treatments different?

Case 2

Trial  Tmt1  Tmt2  Tmt3  Tmt4  TmtCntrl2
1      6.0   5.5   6.4   6.4   7.6
2      4.3   8.3   5.3   7.1   4.9
3      4.7   4.8   6.7   8.8   6.3
4      6.7   6.8   6.9   8.8   5.3
5      5.0   8.2   6.8   6.0   2.2
6      4.9   8.2   5.0   7.4   6.5
7      7.6   6.3   7.2   9.1   5.7
8      7.9   7.8   6.1   5.5   6.0
9      7.2   6.6   6.4   9.4   7.4
10     8.2   10.1  5.6   8.6   3.0
Mean   6.3   7.3   6.2   7.7   5.5

Are they different now?
IntroANOVA.mtw
Case 1 Graphs
Case 1 – Boxplot
[Figure: Boxplot of TmtA, TmtB, TmtC, TmtD, TmtCntrl1; y-axis "Data", 5.5 to 8.0]
Case 1 – Dotplot
[Figure: Individual Value Plot of TmtA, TmtB, TmtC, TmtD, TmtCntrl1; y-axis "Data", 5.5 to 8.0]
Are the treatments different?
Case 2 Graphs
Case 2 – Boxplot
[Figure: Boxplot of Tmt1, Tmt2, Tmt3, Tmt4, TmtCntrl2; y-axis "Data", 2 to 11]
Case 2 – Dotplot
[Figure: Individual Value Plot of Tmt1, Tmt2, Tmt3, Tmt4, TmtCntrl2; y-axis "Data", 2 to 11]
How might one analyze these differences statistically?
Multiple t-tests – Adequate?
An experimenter could run 2-sample t-tests against every combination of means:
  1 vs. 2, 1 vs. 3, 1 vs. 4, 1 vs. 5
  2 vs. 3, 2 vs. 4, 2 vs. 5
  3 vs. 4, 3 vs. 5
  4 vs. 5
Why would this not be a good idea?
  Obviously, it is tedious and cumbersome
What about alpha risk?
  Each t-test has a risk of false rejection (α)
What would be the total α risk?
  With ten tests at α = 0.05 each, treated as independent: $\alpha_{total} = 1 - 0.95^{10} = 0.40$
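The total α risk for all pairwise comparisons can be checked with a few lines of Python. This is a minimal sketch that assumes the individual t-tests are independent; the function name `familywise_alpha` is made up for the illustration.

```python
from math import comb

def familywise_alpha(n_groups, alpha=0.05):
    """Chance of at least one false rejection across all pairwise tests,
    assuming the tests are independent of one another."""
    n_tests = comb(n_groups, 2)  # number of distinct pairs of groups
    return n_tests, 1 - (1 - alpha) ** n_tests

n_tests, alpha_total = familywise_alpha(5)
print(n_tests, round(alpha_total, 2))  # 10 0.4
```

With five groups there are ten pairs, so the chance of at least one false rejection is about 40% even though each test is run at 5%.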
Solution – Analysis of Variance
ANOVA is really an extension (generalization) of the 2-sample t-test
ANOVA is a method of detecting differences between multiple means of samples
Why is it called Analysis of Variance?
  ANOVA compares/analyzes variances:
    Variance within a group
    Variance between groups
ANOVA is the mathematics behind the intuitive evaluation
ANOVA in Minitab – A Quick Demo
In Minitab select Stat>ANOVA>One-way (Unstacked)…
$H_0: \mu_1 = \mu_2 = \dots = \mu_j$
$H_a$: at least one $\mu_j$ is different
Minitab ANOVA Results – Case 1

One-way ANOVA: TmtA, TmtB, TmtC, TmtD, TmtCntrl1

Source  DF        SS       MS       F      P
Factor   4  24.65720  6.16430  643.60  0.000
Error   45   0.43100  0.00958
Total   49  25.08820

S = 0.09787   R-Sq = 98.28%   R-Sq(adj) = 98.13%

Level       N    Mean   StDev
TmtA       10  6.5500  0.0850
TmtB       10  6.9500  0.0850
TmtC       10  5.9900  0.1101
TmtD       10  7.5200  0.1033
TmtCntrl1  10  5.5200  0.1033

Pooled StDev = 0.0979

[Figure: Individual 95% CIs for each mean, based on the pooled StDev; axis from 6.00 to 7.80; each interval is drawn as (*)]
What do you think this means?
Reject Ho or Fail to Reject Ho?
Minitab ANOVA Results – Case 2

One-way ANOVA: Tmt1, Tmt2, Tmt3, Tmt4, TmtCntrl2

Source  DF      SS    MS     F      P
Factor   4   31.51  7.88  3.89  0.009
Error   45   91.25  2.03
Total   49  122.77

S = 1.424   R-Sq = 25.67%   R-Sq(adj) = 19.06%

Level       N   Mean  StDev
Tmt1       10  6.250  1.457
Tmt2       10  7.260  1.561
Tmt3       10  6.240  0.729
Tmt4       10  7.710  1.412
TmtCntrl2  10  5.490  1.748

Pooled StDev = 1.424

[Figure: Individual 95% CIs for each mean, based on the pooled StDev; axis from 4.8 to 8.4]
What do you think this means?
Reject Ho or Fail to Reject Ho?
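For readers without Minitab, the same one-way ANOVA can be sketched in Python. The course uses Minitab; `scipy.stats.f_oneway` is swapped in here as an assumed equivalent for the F statistic and p-value only. The data are typed in from the Case 1 and Case 2 tables.

```python
from scipy import stats

# Case 1 and Case 2 data, one list per treatment column.
case1 = {
    "TmtA":      [6.6, 6.4, 6.5, 6.6, 6.5, 6.6, 6.5, 6.6, 6.7, 6.5],
    "TmtB":      [7.0, 7.0, 7.0, 6.8, 7.1, 7.0, 6.9, 6.9, 6.9, 6.9],
    "TmtC":      [6.1, 5.9, 6.0, 6.1, 5.9, 5.8, 6.0, 5.9, 6.1, 6.1],
    "TmtD":      [7.5, 7.5, 7.6, 7.4, 7.4, 7.4, 7.6, 7.7, 7.5, 7.6],
    "TmtCntrl1": [5.5, 5.7, 5.5, 5.4, 5.5, 5.5, 5.4, 5.5, 5.5, 5.7],
}
case2 = {
    "Tmt1":      [6.0, 4.3, 4.7, 6.7, 5.0, 4.9, 7.6, 7.9, 7.2, 8.2],
    "Tmt2":      [5.5, 8.3, 4.8, 6.8, 8.2, 8.2, 6.3, 7.8, 6.6, 10.1],
    "Tmt3":      [6.4, 5.3, 6.7, 6.9, 6.8, 5.0, 7.2, 6.1, 6.4, 5.6],
    "Tmt4":      [6.4, 7.1, 8.8, 8.8, 6.0, 7.4, 9.1, 5.5, 9.4, 8.6],
    "TmtCntrl2": [7.6, 4.9, 6.3, 5.3, 2.2, 6.5, 5.7, 6.0, 7.4, 3.0],
}

# f_oneway takes one sequence per group and returns (F, p-value).
f1, p1 = stats.f_oneway(*case1.values())
f2, p2 = stats.f_oneway(*case2.values())
print(f"Case 1: F = {f1:.2f}, p = {p1:.3g}")
print(f"Case 2: F = {f2:.2f}, p = {p2:.3g}")
```

Both cases have nearly the same group means, yet the F statistic is far larger in Case 1 because the within-group spread is much smaller there.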
Analysis of Variance – General Recipe
1. State the practical problem
2. State the null hypothesis
3. State the alternate hypothesis
4. Do the model assumptions hold?
5. Construct the Analysis of Variance Table
6. Do the assumptions for the errors hold (residual analysis)?
7. Interpret the p-value for the factor effect (p < α)
8. Calculate %SS for the factor and error terms
9. Translate the conclusion into practical terms
A general recipe for all types of hypothesis tests.
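Step 8 of the recipe can be illustrated with the sums of squares from the two Minitab outputs shown earlier. This is a small sketch; the helper name `percent_ss` is an assumption made for the example.

```python
def percent_ss(ss_factor, ss_error):
    """Share of the total sum of squares attributed to the factor and to error."""
    ss_total = ss_factor + ss_error
    return 100 * ss_factor / ss_total, 100 * ss_error / ss_total

# SS values from the Case 1 and Case 2 ANOVA tables.
case1_factor, case1_error = percent_ss(24.65720, 0.43100)
case2_factor, case2_error = percent_ss(31.51, 91.25)
print(round(case1_factor, 2), round(case2_factor, 2))  # 98.28 25.67
```

The factor percentage matches the R-Sq value that Minitab reports for each case.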
An ANOVA Calculation Example
Peaches & Herbs
Two herbicides were tested to determine if treatment of the surrounding weeds improved the growth of peach tree seedlings. A third group was left untreated as a control.
Eighteen seedlings were selected for the test, six assigned randomly to each of the three groups. At the end of the study period, the height, in cm, was recorded for each seedling.
Use ANOVA and the following data to detect differences among the mean seedling heights of the three groups. Use α = 0.05.

Group        Height (cm)
Control      66  67  74  73  75  64
Herbicide A  85  85  76  82  79  86
Herbicide B  91  93  88  87  90  86
What is the first thing you should do in any analysis?
Peaches.mtw
Graphical Exploration – Minitab
What conclusions can you draw?
[Figure: Boxplot of Height by Group (Cntrl, HerbA, HerbB); y-axis "Height", 60 to 95]
[Figure: Individual Value Plot of Height vs Group (Cntrl, HerbA, HerbB); y-axis "Height", 60 to 95]
ANOVA Visually
[Figure: Individual Value Plot of Height vs Group; y-axis "Height", 60 to 95. Group means marked: Cntrl = 69.8333, HerbA = 82.1667, HerbB = 89.1667. Grand Mean = 80.39]
ANOVA Assumptions
Each sample is an independent, random sample
  Independent: The selection of any sample is not dependent on any other sample being selected or not selected
  Random: All members of the population have an equal chance of being selected
The measurements within each group are normally distributed and have equal variances
  This only applies for the within group variation, not between group variation
  The variances for each group (treatment) are equal
ANOVA Assumptions: Normality and Equal Variances.
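ANOVA Assumptions – Visually
[Figure: Individual Value Plot of Height vs Group; y-axis "Height", 60 to 95. Group means marked: Cntrl = 69.83, HerbA = 82.17, HerbB = 89.17. Each group is annotated with its variance: $\sigma^2_{Cntrl}$, $\sigma^2_{HerbA}$, $\sigma^2_{HerbB}$]
$\sigma^2_{Cntrl} = \sigma^2_{HerbA} = \sigma^2_{HerbB}$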
Normality Testing for ANOVA
How many normality tests should we run?
[Figure: three normal probability plots (y-axis "Percent", 1 to 99), one per group, with these summaries]

Plot                                  Mean   StDev  N  AD     P-Value
Probability Plot of Height_HerbB      89.17  2.639  6  0.178  0.859
Probability Plot of Height_HerbA      82.17  3.971  6  0.364  0.304
Probability Plot of Height_Cntrl      69.83  4.708  6  0.407  0.230
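One normality test per group can also be run outside Minitab. The Minitab plots report the Anderson-Darling statistic; the sketch below swaps in the Shapiro-Wilk test from `scipy.stats.shapiro` because it returns a p-value directly, so its numbers are not expected to match the AD values. The heights are the Peaches & Herbs data.

```python
from scipy import stats

# Seedling heights in cm for each group (Peaches & Herbs data).
heights = {
    "Cntrl": [66, 67, 74, 73, 75, 64],
    "HerbA": [85, 85, 76, 82, 79, 86],
    "HerbB": [91, 93, 88, 87, 90, 86],
}

# One test per group: the assumption is about each group separately.
normality = {}
for group, values in heights.items():
    w_stat, p_value = stats.shapiro(values)
    normality[group] = (w_stat, p_value)
    print(f"{group}: W = {w_stat:.3f}, p = {p_value:.3f}")
```

With only six observations per group, any normality test has little power, so a large p-value means "no evidence against normality" rather than proof of it.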
Equal Variances – Another ANOVA Assumption
Select Stat>ANOVA>Test for Equal Variances…
Have we proven the variances are equal? What if the variances were unequal?
[Figure: Test for Equal Variances for Height; 95% Bonferroni Confidence Intervals for StDevs by Group (Cntrl, HerbA, HerbB); axis 0 to 16]
Bartlett's Test: Test Statistic 1.47, P-Value 0.480
Levene's Test: Test Statistic 2.09, P-Value 0.158
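The two equal-variance tests can be sketched in Python with `scipy.stats.bartlett` and `scipy.stats.levene`. Using the median-centered form of Levene's test (`center="median"`) is an assumption about which variant matches the Minitab output; for these data it gives the same statistic.

```python
from scipy import stats

# Seedling heights in cm for each group (Peaches & Herbs data).
cntrl = [66, 67, 74, 73, 75, 64]
herb_a = [85, 85, 76, 82, 79, 86]
herb_b = [91, 93, 88, 87, 90, 86]

# Bartlett's test assumes normal data; Levene's test is less sensitive to non-normality.
b_stat, b_p = stats.bartlett(cntrl, herb_a, herb_b)
l_stat, l_p = stats.levene(cntrl, herb_a, herb_b, center="median")

print(f"Bartlett: statistic = {b_stat:.2f}, p = {b_p:.3f}")  # 1.47, 0.480
print(f"Levene:   statistic = {l_stat:.2f}, p = {l_p:.3f}")  # 2.09, 0.158
```

Both p-values are above 0.05, so the data give no evidence that the variances differ. That is weaker than proving they are equal.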
ANOVA Visually
Data points can be characterized by distances:
  The distance of the group mean from the grand mean
  The distance of the data point from the group mean
[Figure: Individual Value Plot of Height vs Group; y-axis "Height", 60 to 95. Group means marked: Cntrl = 69.8, HerbA = 82.2, HerbB = 89.2. Grand mean = 80.4]
A Variation Estimate – Sum of Squares
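Notation: $y_{ij}$ is observation i in group j, $\bar{y}_j$ is the mean of group j, and $\bar{y}$ is the grand mean.
The variability of n sample measurements about their mean can be measured using the sum of squared deviations from the grand mean:
  $SS_{Total} = \sum_{j}\sum_{i} (y_{ij} - \bar{y})^2$
Likewise, the sum of squared deviations within each group is:
  $SS_{Within} = \sum_{j}\sum_{i} (y_{ij} - \bar{y}_j)^2$
The variability of each group as compared to the grand mean is:
  $SS_{Between} = \sum_{j}\sum_{i} (\bar{y}_j - \bar{y})^2$
According to the model:
  $SS_{Total} = SS_{Between} + SS_{Within}$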
Variances are Sums of Squares divided by degrees of freedom.
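The Sum of Squares Model
Putting it all together, for k groups with m observations each:
  $\sum_{j=1}^{k}\sum_{i=1}^{m} (y_{ij} - \bar{y})^2 = \sum_{j=1}^{k}\sum_{i=1}^{m} (\bar{y}_j - \bar{y})^2 + \sum_{j=1}^{k}\sum_{i=1}^{m} (y_{ij} - \bar{y}_j)^2$
  $SS_{Total} = SS_{Between} + SS_{Within}$
[Figure: scatter of the data (x-axis 0 to 30, y-axis 60 to 100) with the group means $\bar{Y}_1$, $\bar{Y}_2$, $\bar{Y}_3$ and the grand mean $\bar{Y}$ marked]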
The ANOVA table: Degrees of Freedom
k is the number of groups (levels of the factor); n is the total number of samples.
  d.f. (between) = k-1 = 2
  d.f. (within) = n-k = 15
  d.f. (total) = n-1 = 17

Source               Degrees of Freedom  Sum of Squares  Mean Square                     F-Statistic
Between (or Factor)  k-1                 SS Between      s²Between = SS Between / (k-1)  s²Between / s²Within
Within (or Error)    n-k                 SS Within       s²Within = SS Within / (n-k)
Totals               n-1                 SS Total
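The degrees of freedom follow directly from the group sizes. A minimal sketch, with the helper name `anova_df` invented for the illustration, applied to the three groups of six seedlings:

```python
def anova_df(group_sizes):
    """Degrees of freedom for a one-way ANOVA from the size of each group."""
    k = len(group_sizes)  # number of groups
    n = sum(group_sizes)  # total number of samples
    return {"between": k - 1, "within": n - k, "total": n - 1}

df = anova_df([6, 6, 6])
print(df)  # {'between': 2, 'within': 15, 'total': 17}
```

The between and within degrees of freedom always add up to the total, just as the sums of squares do.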
The ANOVA table: Sum of Squares
Source               Degrees of Freedom  Sum of Squares  Mean Square                     F-Statistic
Between (or Factor)  2                   SS Between      s²Between = SS Between / (k-1)  s²Between / s²Within
Within (or Error)    15                  SS Within       s²Within = SS Within / (n-k)
Totals               17                  SS Total

$SS_{Within} = \sum_{j}\sum_{i} (y_{ij} - \bar{y}_j)^2$
$SS_{Total} = \sum_{j}\sum_{i} (y_{ij} - \bar{y})^2$
$SS_{Between} = \sum_{j}\sum_{i} (\bar{y}_j - \bar{y})^2$
How else can we calculate SSTotal?
Calculating the SSQs – the Grand Mean
Select Stat>Basic Stat>Store Descriptive Statistics…
Calculating the SSQs – the Group Means
Select Stat>Basic Stat>Store Descriptive Statistics…
Calculating the Squared Differences
Select Calc>Calculator…
Calculating the Squared Differences
Results of the Sum of Squares Calculations
Summing the Squared Differences
Select Calc>Column Statistics…
Repeat for SSB and SSW
Minitab Results:
  Sum of SSB = 1149.8
  Sum of SSW = 224.5
  Sum of TSS = 1374.3
The SSTotal equals the sum of SSBetween and SSWithin.
TSS = TOTAL SUMS OF SQUARES
SSB = BETWEEN GROUP SUMS OF SQUARES
SSW = WITHIN GROUP SUMS OF SQUARES
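The same manual steps (grand mean, group means, squared differences, column sums) can be sketched in Python with NumPy. This mirrors the Minitab walk-through rather than reproducing it; the variable names are chosen for the illustration.

```python
import numpy as np

# Seedling heights in cm for each group (Peaches & Herbs data).
heights = {
    "Cntrl": np.array([66, 67, 74, 73, 75, 64], dtype=float),
    "HerbA": np.array([85, 85, 76, 82, 79, 86], dtype=float),
    "HerbB": np.array([91, 93, 88, 87, 90, 86], dtype=float),
}

all_values = np.concatenate(list(heights.values()))
grand_mean = all_values.mean()

# Between: squared distance of each group mean from the grand mean,
# counted once for every observation in that group.
ssb = float(sum(len(v) * (v.mean() - grand_mean) ** 2 for v in heights.values()))
# Within: squared distance of each observation from its own group mean.
ssw = float(sum(((v - v.mean()) ** 2).sum() for v in heights.values()))
# Total: squared distance of each observation from the grand mean.
tss = float(((all_values - grand_mean) ** 2).sum())

print(round(ssb, 1), round(ssw, 1), round(tss, 1))  # 1149.8 224.5 1374.3
```

Computing TSS separately and comparing it with SSB + SSW is a quick check that the squared differences were set up correctly.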
The ANOVA table: SSWithin
Remember:
  $SS_{Total} = SS_{Between} + SS_{Within}$
Notice that:
  $SS_{Total} - SS_{Between} = 1374 - 1150 = 224 = SS_{Within}$

Source               Degrees of Freedom  Sum of Squares  Mean Square                     F-Statistic
Between (or Factor)  2                   1150            s²Between = SS Between / (k-1)  s²Between / s²Within
Within (or Error)    15                  224             s²Within = SS Within / (n-k)
Totals               17                  1374
The ANOVA Table: Mean Square & Fcalc
  $s^2_{Between} = SS_{Between} / df_{Between} = 1150 / 2 = 575$
  $s^2_{Within} = SS_{Within} / df_{Within} = 224 / 15 \approx 15$
  $F_{Calc} = s^2_{Between} / s^2_{Within} = 575 / 15 = 38.33$

Source               Degrees of Freedom  Sum of Squares  Mean Square  F-Statistic
Between (or Factor)  2                   1150            575          38.33
Within (or Error)    15                  224             15
Totals               17                  1374
What is an F test? An F distribution?
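The mean squares and the F statistic are simple ratios, sketched below with a helper named `mean_squares_and_f` (an invented name). The slide rounds the within mean square to 15 before dividing, which gives 38.33; using the unrounded sums of squares reported earlier (1149.8 and 224.5) gives a slightly larger value.

```python
def mean_squares_and_f(ss_between, ss_within, df_between, df_within):
    """Mean squares (variance estimates) and their ratio, the F statistic."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# Unrounded sums of squares from the column statistics step.
msb, msw, f_calc = mean_squares_and_f(1149.8, 224.5, 2, 15)
print(round(msb, 1), round(msw, 2), round(f_calc, 2))  # 574.9 14.97 38.41

# The slide's hand calculation, with both mean squares rounded first.
print(round(575 / 15, 2))  # 38.33
```

Either way the conclusion is the same: the between-group variance estimate is nearly forty times the within-group estimate.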
The ANOVA Table – Complete
Source               Degrees of Freedom  Sum of Squares  Mean Square  F-Statistic
Between (or Factor)  2                   1150            575          38.33
Within (or Error)    15                  224             15
Totals               17                  1374
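Calculating a P-value from an F Statistic
In Minitab select Calc > Probability Distributions > F…
  Numerator degrees of freedom: dfBetween
  Denominator degrees of freedom: dfWithin
  Input constant: F
Minitab Output
  x        P( X <= x )
  38.3300  1.0000
P(F >= f) = 1.0000 – 1.0000, so the p-value = 0.0000 (to four decimal places)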
Reject Ho or Fail to Reject Ho?
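The same p-value lookup can be sketched with SciPy's F distribution. Using `scipy.stats.f` in place of the Minitab dialog is an assumption of this example; `sf` is the survival function, P(F >= f), which avoids the loss of precision in computing 1 minus a cumulative probability that is displayed as 1.0000.

```python
from scipy import stats

f_calc = 38.33               # F statistic from the ANOVA table
df_between, df_within = 2, 15

cdf = stats.f.cdf(f_calc, df_between, df_within)     # P(F <= f), as Minitab reports
p_value = stats.f.sf(f_calc, df_between, df_within)  # P(F >= f)

print(f"P(F <= f) = {cdf:.4f}")    # 1.0000
print(f"p-value   = {p_value:.2e}")
```

The p-value is not exactly zero. It is simply far below α = 0.05, which is why it shows as 0.0000 at four decimal places.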
In Summary
ANOVA is a powerful tool for testing equality of means
  ANOVA compares the variability between group means to the average variability within a group
ANOVA p-values are significant when the group-to-group differences are too large to be explained by the within group variability
The initial assumptions for ANOVA:
  Each subgroup is normally distributed
  The variances of all the subgroups are equal
The ANOVA table is the standard way of reporting ANOVA analysis
The (ω)Rap!
Key Takeaways
  ANOVA is a powerful tool for determining the differences between multiple means
  ANOVA compares the between-group variability to the within-group variability
  The assumptions of ANOVA are subgroup normality and equal variances
  Others: ____________________
How can I use this in my project?
  ____________________
10-QQ! (Choose the best answers)
1) ANOVA is used to test hypotheses of means.
  a) True
  b) False
2) If the group means are different, the p-value will be large.
  a) True
  b) False
3) Excel and Minitab calculate p-values from F-distributions in the same way.
  a) True
  b) False
4) Which statements are not true?
  a) Manual calculations are less accurate than Minitab calculations
  b) ANOVA is just an easier way to conduct multiple t tests
  c) ANOVA is a test for only one input at multiple levels
  d) All of the Above
5) The most important step in ANOVA is:
  a) Checking the initial assumptions
  b) Making practical conclusions
  c) Gathering large amounts of data
  d) Asking a practical question
  e) Letting Minitab run the analysis automatically
6) The null hypothesis of Analysis of Variance is:
  a) All of the means are different
  b) All of the variances are different
  c) At least one of the variances is different from another
  d) At least one of the means is different from another
  e) None of the above
7) ANOVA is a way of analyzing:
  a) Continuous independent variables and continuous dependent variables
  b) Continuous dependent variables and discrete independent variables
  c) Discrete independent variables and continuous independent variables
  d) Discrete dependent variables and discrete dependent variables
8) The initial assumptions of ANOVA are:
  a) Normality of the dataset
  b) All groups have equal variances
  c) All groups are normally distributed
  d) All samples are measured perfectly
  e) All of the groups have the same mean
9) Which statement is not true?
  a) ANOVA is a hypothesis test for means
  b) The father of modern statistics was R. A. Fisher
  c) Pooled standard deviation is the square root of an average variance
  d) ANOVA requires relatively few samples per group
  e) R. A. Fisher did farm research in Iowa
New and Important Terms
ANOVA Table
Continuous Variable
Discrete (Categorical) Variable
Empirical Model
F-test
One-way Analysis of Variance
Glossary
ANOVA Table: The standard way of displaying the results of the calculations from ANOVA
Continuous Variable: Variables (data) whose possible values form a whole interval, range, or continuum (e.g., Temperature)
Discrete (Categorical) Variable: Variables (data) whose possible values are distinct or separate (0, 1, 2, etc.)
Empirical Model: An equation derived from the data that expresses a relationship between the inputs and an output (Y=f(x))
F-test: A hypothesis test for comparing variances
One-way Analysis of Variance: One-way analysis of variance tests the equality of population means when classification is by one variable. The classification variable, or factor, usually has three or more levels.