2 Sample t-Test, One Way ANOVA (Minitab Maestro)
HYPOTHESIS TESTING:
M. A. Sibley Consulting – All Rights Reserved HypothesisTest 1
• 2 Sample t-Test
• One Way ANOVA
Hypothesis Testing
• How do you sift through variables to
separate the “Vital Few” from the “Trivial
Many”?
• There are many tools:
– The Cause & Effect (Fishbone) Diagram
– The Cause & Effect Matrix
– Failure Modes and Effects Analysis (FMEA)
– Prior Knowledge
– Graphical Analysis
– Hypothesis Testing
Hypothesis Testing
• Why would we want to use Hypothesis
Testing?
• Hypothesis Testing takes a practical question like:
“Is the polymer viscosity higher with the new solvent?” and frames it in statistical terms
• It quantifies the risk that your conclusion is wrong,
so you can justify spending more time &
money on experiments, scale ups, etc.
• It reduces subjectivity in decision making
• There are many forms of hypothesis tests which
can answer questions related to differences in
means or variability, proportion of defects, counts
of occurrences, etc.
Hypothesis Testing cont’d
Population → Sampling Scheme → Sample Data → Conclusions about the Population
• We should select a representative sample.
Hypothesis Testing cont’d
• Hypothesis Tests are statements
about Population Parameters based
on Sample Statistics

                       Population Parameter   Sample Statistic
Mean                   μ                      x̄
Standard Deviation     σ                      s
Proportion             π                      p
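A minimal Python sketch of the table above, assuming a small hypothetical sample (the numbers are illustrative, not from any worksheet in this deck): each sample statistic is computed from the data and serves as the estimate of the matching population parameter.

```python
import statistics

# Hypothetical sample of 5 viscosity readings (illustrative values only)
viscosity = [61.0, 62.0, 63.0, 64.0, 65.0]

x_bar = statistics.mean(viscosity)   # sample mean, estimates the population mean (mu)
s = statistics.stdev(viscosity)      # sample standard deviation (n - 1 divisor), estimates sigma

# Hypothetical pass/fail record for 4 batches (True = defective)
defective = [False, True, False, False]
p = sum(defective) / len(defective)  # sample proportion, estimates pi

print(f"x-bar = {x_bar}, s = {s:.4f}, p = {p}")
```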
Hypothesis Testing cont’d
Hypothesis Testing Roadmap
1. Select the X and Y for your test, e.g. Y = Breaking Strength; X = Additive Presence.
2. Find, or better still, run experiments to obtain data.
3. Graph.
4. If the graphical analysis looks promising, continue.
5. Write competing hypotheses, e.g.:
   • Null Hypothesis (Ho): the additive does not affect breaking strength
   • Alternative Hypothesis (Ha): the additive does affect breaking strength
6. Select the appropriate statistical test, e.g. 2 Sample t-test.
7. Verify the data is acceptable for the test, e.g. the data is normally
   distributed at each level (with or without additive).
8. Perform the analysis.
9. Draw conclusions, e.g. “We have statistical evidence that the
   presence of the additive increases Breaking Strength.”
Hypothesis Testing cont’d
• Statistical Significance:
  • Ho is assumed; the burden of proof is on Ha.
  • Look for strong evidence to reject Ho. Then we accept Ha.
  • Typically we test at a 5% significance level (often described as 95% confidence).
  • Ha is sometimes called the “Research Claim”.
• 2 outcomes:
  • Reject Ho and accept Ha: statistically significant.
  • Fail to reject Ho: not statistically significant.
Hypothesis Testing cont’d
• Consider a Canadian Court of Law:

In everyday terms:
                 Conclusion (Verdict)
True State       Innocent              Guilty
Innocent         Correct Decision      Incorrect Decision
Guilty           Incorrect Decision    Correct Decision

In statistical terms:
                 Conclusion (Verdict)
True State       Not Reject Ho         Reject Ho
Ho true          Correct Decision      Type 1 Error (α)
Ho false         Type 2 Error (β)      Correct Decision

Which error are we more tolerant of?
Even if there is a statistically significant
difference, it may not be practically significant!
Hypothesis Testing cont’d
Risks:
• α Risk: concluding there is a difference when there is not.
  • Result: changes and investments that are not needed.
• β Risk: concluding there is no difference when there is one.
  • Result: the status quo is kept; a missed opportunity.
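One way to see what the α risk means is to simulate it. This is a minimal Python sketch, assuming normally distributed data with a known standard deviation so that a simple two-sample z-test (standard library only) can stand in for the t-test. Both samples are drawn from the same population, so Ho is true and every rejection is a Type 1 error; the rejection rate should land close to the chosen α. The mean of 60.0, n = 20 and the seed are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist, fmean

def type1_error_rate(alpha=0.05, n=20, sigma=1.0, reps=10000, seed=1):
    """Fraction of simulated two-sample z-tests that reject a TRUE Ho."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    se = sigma * (2 / n) ** 0.5                     # standard error of mean(a) - mean(b)
    rejections = 0
    for _ in range(reps):
        # Both samples come from the SAME population, so Ho is true
        a = [rng.gauss(60.0, sigma) for _ in range(n)]
        b = [rng.gauss(60.0, sigma) for _ in range(n)]
        z = (fmean(a) - fmean(b)) / se
        if abs(z) > z_crit:
            rejections += 1   # a Type 1 error
    return rejections / reps

for alpha in (0.05, 0.2):
    print(alpha, type1_error_rate(alpha=alpha))
```

Raising α from 0.05 to 0.2 roughly quadruples how often a non-existent difference is "found", which is why α is chosen according to the cost of a false alarm.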
Hypothesis Testing cont’d
• All statistical tests calculate a p-value
(probability value): the probability of getting a result at least as
extreme as the one observed if Ho were true.
• Decision Rule:
  • Reject Ho if the p-value is less than the “critical”
  value you choose beforehand. Ha is supported.
  • If not less, we cannot reject Ho; Ha is not supported.
• Critical values (Pcritical or α risk):
  • Typically use 0.05
  • For a critical safety system we might choose 0.003
  • For a marketing decision we might choose 0.2
“If p is low the Null must go”
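The decision rule is simple enough to write down directly. A minimal sketch (the function name `decide` is ours, not Minitab's), applied to the two p-values from the viscosity example later in this presentation at each of the three critical values listed above:

```python
def decide(p_value, alpha=0.05):
    """Decision rule: reject Ho only if the p-value is below alpha (chosen beforehand)."""
    if p_value < alpha:
        return "Reject Ho: statistically significant"
    return "Fail to reject Ho: not statistically significant"

# P-values from the viscosity example (before and after removing the outlier)
for p in (0.075, 0.022):
    for alpha in (0.003, 0.05, 0.2):
        print(f"p = {p:<6} alpha = {alpha:<6} -> {decide(p, alpha)}")
```

The same data can be "significant" or not depending on α, which is why α must be fixed before looking at the p-value.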
Hypothesis Testing cont’d
Major Hypothesis Tests
Y Type               X Type                 # of X's   # Subgroups   Test
Continuous           Discrete               1          1             1 Sample t-Test
Continuous           Discrete               1          2             2 Sample t-Test
Continuous           Discrete               1          2+            One Way ANOVA (ANOVA = Analysis of Variance)
Continuous           Continuous/Discrete    2+         2+            ANOVA GLM (GLM = General Linear Model)
Continuous           Continuous             1          n/a           (Linear) Regression
Continuous           Continuous             2+         n/a           Multiple Regression
Discrete             Discrete               1          2+            Analysis of Proportions (π)
Discrete or Binary   Continuous             1+         2+            Binary Logistic Regression (π)
Continuous           n/a                                             Test for Equal Variance
The 2 Sample t-Test and the One Way ANOVA are the tests shown in this presentation.
Hypothesis Testing cont’d
• Acceptable data:
  • Representative
  • No outliers, little auto-correlation (where the next value is likely
    to be similar to the previous value), no severe skewing (for most
    tests), no distinct bi-modality, roughly bell shaped distribution
  • Correctly measured and recorded
  • Good measurement capability
    • Total Variance = Process Variance + Measurement Variance
    • If Measurement Variance is too high a proportion of the total
      variance, then seeing the benefit of a process improvement can
      be very difficult. This topic is called “Measurement Systems
      Analysis” (MSA). The premier tool is Gauge R&R.
  • Enough data to have a sensitive enough test
    • “Power & Sample Size” (Stat > Power and Sample Size)
    • More samples reduce the β risk, i.e. increase the likelihood of
      seeing a difference if there really is one.
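For the “enough data” point, a rough sample size can come from the textbook normal-approximation formula $n = 2\,(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2/\delta^2$ per group, where δ is the smallest difference worth detecting. This is a minimal sketch, not Minitab's calculation: Minitab's Power and Sample Size works with the t distribution, so it will usually ask for slightly more samples. The σ = 1.76 and δ = 1.0 inputs are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided 2 sample comparison of means
    (normal approximation; assumes equal group sizes and a common sigma)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # controls the alpha risk
    z_beta = z.inv_cdf(power)            # power = 1 - beta
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)

# Hypothetical: detect a 1.0 unit shift when the standard deviation is about 1.76
print(n_per_group(delta=1.0, sigma=1.76))             # 49
print(n_per_group(delta=1.0, sigma=1.76, power=0.9))  # more samples for less beta risk
```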
Hypothesis Testing cont’d
2 Sample t-Test:
• For a sample of size n drawn from a normal distribution, the
standardized sample mean t = (x̄ − μ)/(s/√n) follows a “Student-t”
distribution with n − 1 degrees of freedom. This is a bell shaped
distribution like the normal distribution but with heavier tails. As the
sample size increases, the corresponding t-distribution becomes
closer to the normal distribution.
• This distribution lets us draw conclusions about the range in
which the population mean is likely to be found.
• There are 2 types of 2 sample t-tests:
  • 2 Sample: e.g. Before / After a process change, Batches on
    Night Shift vs Day Shift
  • Paired: e.g. the same sample analyzed on instrument A vs
    instrument B; hair coverage before and after use of Rogaine
    by a group of individuals. This data could be analyzed by a
    simple 2 sample t-test, but that test is then less “powerful”.
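Why pairing adds power: a minimal sketch with hypothetical instrument data, computing only the t statistics (no p-values) with the Python standard library. The paired test works on the per-sample differences, so the large sample-to-sample variation cancels out; the unpaired (Welch) test has to treat that variation as noise.

```python
import math
import statistics

def paired_t(a, b):
    """Paired t statistic: analyse the per-item differences b - a."""
    d = [y - x for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

def two_sample_t(a, b):
    """Unpaired (Welch) 2 sample t statistic for mean(b) - mean(a)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se

# Hypothetical data: 5 samples, each measured on instrument A and on instrument B.
# The samples differ a lot from each other, but B reads consistently about 0.5 higher.
inst_a = [10.0, 12.0, 14.0, 16.0, 18.0]
inst_b = [10.5, 12.7, 14.4, 16.6, 18.5]

print(f"paired t   = {paired_t(inst_a, inst_b):.2f}")
print(f"unpaired t = {two_sample_t(inst_a, inst_b):.2f}")
```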
Hypothesis Testing cont’d
• Open the worksheet: 2 Sample t Test.MTW
• Perform a 2 Sample t-Test (Stat > Basic Statistics > 2 Sample t…)
Output:
Two-sample T for Viscosity vs New Air Dryer Installation
N Mean StDev SE Mean
After 20 63.04 1.76 0.39
Before 23 62.02 1.93 0.40
Difference = mu (After) - mu (Before)
Estimate for difference: 1.028
95% CI for difference: (-0.109, 2.165)
T-Test of difference = 0 (vs not =): T-Value = 1.83 P-Value = 0.075 DF = 40
Not statistically significant (P-Value = 0.075 is not below 0.05)!
End of story?
Hypothesis Testing cont’d
• We look at the data graphically (which we should
have done beforehand!)
[Figure: Boxplot of Viscosity vs New Air Dryer Installation (Before, After)]
We see an “outlier”. The comment for this point says “Faulty sample valve operation; water in sample” so we are justified in removing this point from the analysis.
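By the usual boxplot convention, which we believe Minitab follows, a point is drawn as an outlier when it lies more than 1.5 × IQR beyond the quartiles. A minimal sketch of that rule, assuming Python's default “exclusive” quartile method is close enough to Minitab's; the readings are hypothetical. The rule only flags a point: as with the faulty sample valve here, a point should be removed only when there is a documented cause.

```python
from statistics import quantiles

def boxplot_outliers(values):
    """Flag points beyond 1.5 x IQR from the quartiles (the usual boxplot rule)."""
    q1, _, q3 = quantiles(values, n=4)   # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical viscosity readings containing one suspicious value
readings = [61.0, 62.0, 62.0, 63.0, 63.0, 64.0, 70.0]
print(boxplot_outliers(readings))   # [70.0]
```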
Hypothesis Testing cont’d
• Output after removal (replacement of data point by an
asterisk – Minitab’s missing data symbol):
Two-sample T for Viscosity vs New Air Dryer Installation
N Mean StDev SE Mean
After 20 63.04 1.76 0.39
Before 22 61.79 1.64 0.35
Difference = mu (After) - mu (Before)
Estimate for difference: 1.254
95% CI for difference: (0.191, 2.317)
T-Test of difference = 0 (vs not =): T-Value = 2.39 P-Value = 0.022 DF = 38
[Figure: Boxplot of Viscosity vs New Air Dryer Installation (Before, After)]
Statistically significant! Reject Ho.
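Both T-Values and P-Values above can be checked from the summary statistics alone. A minimal standard-library Python sketch of the unequal-variance (Welch) 2 sample t-test; the incomplete beta routine is the classic continued-fraction method, written here for illustration rather than taken from any library. We assume Minitab used the Welch form because the outputs show DF = 40 and DF = 38 rather than the pooled 41 and 40. Because the printed means and standard deviations are rounded, expect small differences from Minitab (for example T ≈ 1.81 rather than 1.83).

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies the even step and then the odd step of the fraction
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:   # last factor ~1 means the fraction has converged
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def t_two_sided_p(t, df):
    """P(|T| >= |t|) for Student's t distribution with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))

def welch_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Welch 2 sample t-test (equal variances NOT assumed) from summary statistics."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2          # squared standard errors
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))   # Welch-Satterthwaite
    return t, df, t_two_sided_p(t, df)

after = (20, 63.04, 1.76)
for label, before in (("all data", (23, 62.02, 1.93)),
                      ("outlier removed", (22, 61.79, 1.64))):
    t, df, p = welch_from_summary(*after, *before)
    print(f"{label}: T = {t:.2f}, DF = {int(df)}, P = {p:.3f}")
```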
Hypothesis Testing cont’d
• We notice that the graph shows “After”
before “Before”! We can fix this:
• Right Click on the column:
After we click on OK, we right click on the graph and select Update Graph Now or we run the analysis which generated it again.
Hypothesis Testing cont’d
One Way ANOVA: ANOVA or Analysis of Variance is the hypothesis test equivalent of the
BoxPlot i.e. it is used to judge if a discrete X can explain variability in a Y.
Theory: Consider these 9 observations (Y) (3 per colour)
We can see that if the numbers within a row are similar to the row
average but the row averages are quite different from each other, then
Colour might be a significant factor in explaining overall variability in the
Y. The ANOVA test uses this principle to calculate the p-Value that you
can use to judge if the factor is significant in explaining the variability.
Colour (X)   j    i = 1   i = 2   i = 3   Row Avg ($\bar{x}_j$)
Red          1    11      10      9       10
Blue         2    15      13      14      14
Green        3    16      13      16      15
Grand Avg ($\bar{\bar{x}}$): 13
Hypothesis Testing cont’d
One Way ANOVA: Theory cont’d:
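Here are the formulae (p = number of groups, $n_j$ = number of observations in group j):
• $SS_{total} = \sum_{j=1}^{p} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{\bar{x}}\right)^2$
• $SS_{between} = \sum_{j=1}^{p} n_j \left(\bar{x}_j - \bar{\bar{x}}\right)^2$
• $SS_{within} = \sum_{j=1}^{p} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}_j\right)^2$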
• Here is the same table but now showing *Squared*
Differences from the Grand Average:
Reference Slide
Colour   j    i = 1   i = 2   i = 3
Red      1    4       9       16
Blue     2    4       0       1
Green    3    9       0       9
$SS_{total}$ = the sum of these squared differences = 52
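The three sums of squares for the colour example can be checked with a few lines of Python (standard library only). This is a minimal sketch that follows the formulas term by term; it is not how Minitab computes them internally.

```python
from statistics import mean

# Colour example from the slides: 3 observations per colour
groups = {
    "Red":   [11, 10, 9],
    "Blue":  [15, 13, 14],
    "Green": [16, 13, 16],
}

all_values = [x for obs in groups.values() for x in obs]
grand_avg = mean(all_values)                                      # x double-bar
row_avg = {colour: mean(obs) for colour, obs in groups.items()}   # x-bar_j

ss_total = sum((x - grand_avg) ** 2 for x in all_values)
ss_between = sum(len(obs) * (row_avg[c] - grand_avg) ** 2 for c, obs in groups.items())
ss_within = sum((x - row_avg[c]) ** 2 for c, obs in groups.items() for x in obs)

print(grand_avg, ss_total, ss_between, ss_within)   # 13 52 42 10
```

Note that $SS_{between} + SS_{within} = SS_{total}$ (42 + 10 = 52): the total variation splits exactly into a part explained by colour and a part left within the colours.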
Hypothesis Testing cont’d
One Way ANOVA: Theory cont’d:
• In a similar fashion we calculate $SS_{between}$ and $SS_{within}$.
• Degrees of freedom (df) is the number of values in the
final calculation of a statistic that are free to vary.
• For the total sum of squares, the n deviations from the grand
average must add up to zero, so once n − 1 of them are known
the last one is fixed (i.e. can't vary): df = n − 1.
• Knowing this we calculate Mean Squares (MS):
$MS = \frac{SS}{df} = s^2$
Reference Slide
Source     SS    df          MS
Between    42    3 − 1 = 2   21.0
Within     10    9 − 3 = 6   1.67
Total      52    9 − 1 = 8
Hypothesis Testing cont’d
One Way ANOVA: Theory cont’d:
• Sample variances (suitably scaled) follow a distribution known as
Chi-Square
• The ratio of 2 independent Chi-Square variables, each divided by its
degrees of freedom, follows the F Distribution
• For our ANOVA test, since Mean Squares are
variances, we calculate our test statistic F:
$F = \frac{MS_{between}}{MS_{within}} = \frac{21.0}{1.67} \approx 12.6$
• The probability of an F value at least this large, from the F
distribution with 2 and 6 degrees of freedom, is 0.007. This is the
P-Value for our ANOVA test.
Reference Slide
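Instead of looking up a printed F table, the P-Value can be computed directly: the upper-tail area of the F distribution is a regularized incomplete beta function, $P(F \ge f) = I_{d_2/(d_2 + d_1 f)}(d_2/2,\ d_1/2)$. A minimal standard-library Python sketch; the incomplete beta routine is the classic continued-fraction method, written here for illustration rather than taken from any library.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies the even step and then the odd step of the fraction
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:   # last factor ~1 means the fraction has converged
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_upper_p(f, df1, df2):
    """P(F >= f) for an F distribution with (df1, df2) degrees of freedom."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

def one_way_anova_from_ss(ss_between, ss_within, k, n):
    """MS, F and P-Value for k groups and n observations in total."""
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    f = ms_between / ms_within
    return ms_between, ms_within, f, f_upper_p(f, k - 1, n - k)

ms_b, ms_w, f, p = one_way_anova_from_ss(42, 10, k=3, n=9)
print(f"MS between = {ms_b:.1f}, MS within = {ms_w:.2f}, F = {f:.2f}, P = {p:.3f}")
# MS between = 21.0, MS within = 1.67, F = 12.60, P = 0.007
```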
Hypothesis Testing cont’d
One Way ANOVA: Theory cont’d:
• Here is Minitab’s output for this data:
One-way ANOVA: Value versus Colour
Source DF SS MS F P
Colour 2 42.00 21.00 12.60 0.007
Error 6 10.00 1.67
Total 8 52.00
S = 1.291 R-Sq = 80.77% R-Sq(adj) = 74.36%
Individual 95% CIs For Mean Based on
Pooled StDev
Level N Mean StDev -------+---------+---------+---------+--
Blue 3 14.000 1.000 (------*------)
Green 3 15.000 1.732 (------*------)
Red 3 10.000 1.000 (------*------)
-------+---------+---------+---------+--
10.0 12.5 15.0 17.5
Pooled StDev = 1.291
Reference Slide
If P is low, the null must go.
Statistically significant!
Reject Ho; accept Ha
Hypothesis Testing cont’d
One Way ANOVA: Theory cont’d:
• Here is Minitab’s graphical output for this data:
Reference Slide
[Figure: Individual Value Plot of Value vs Colour (Green, Blue, Red)]
Hypothesis Testing cont’d
1. Open the dataset: HOMECAR.MTW
2. Create a BoxPlot of Consumption vs Month. (You will have to
create the variable Month from the date).
3. Perform a Hypothesis test to see if there is a statistically
significant difference in Fuel Consumption vs Month.
Note: you will need to use the Hypothesis Test equivalent of the
BoxPlot, namely ANOVA (Analysis of Variance).
Stat > ANOVA > One-Way
If time permits, repeat the exercise for Quarter (of the year).
Exercise
5 minutes
Hypothesis Testing cont’d
• We create a new variable “Month” from Date
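Outside Minitab, the derived Month (and, for the follow-up exercise, Quarter) column is one line each. A minimal Python sketch using the standard `datetime` module; the dates are hypothetical, and quarters are assumed to be calendar quarters (January to March = 1, and so on).

```python
from datetime import date

def month_and_quarter(d):
    """Return (month 1-12, calendar quarter 1-4) for a date."""
    return d.month, (d.month - 1) // 3 + 1

# Hypothetical fill-up dates
for d in (date(2011, 1, 15), date(2011, 6, 30), date(2011, 12, 1)):
    print(d, month_and_quarter(d))
# 2011-01-15 (1, 1)
# 2011-06-30 (6, 2)
# 2011-12-01 (12, 4)
```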
Hypothesis Testing cont’d
• We perform a one way ANOVA:
Hypothesis Testing cont’d
[Figure: Boxplot of Consumption(L/100Km) vs Month (1 to 12)]
Boxplot output from the
ANOVA shows a suggestive pattern vs
month but it is still in the
“grey area”.
ANOVA does not differentiate based on the order of the categories – i.e. our eye sees a non random seasonal pattern but ANOVA has the same
output no matter what the ordering of the categories is.
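The point about ordering can be checked directly: the F statistic depends only on which observations share a group, not on the order in which the groups are listed. A minimal sketch using the colour example from the theory slides (the helper `anova_f` is a hypothetical name):

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (lists of numbers)."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

colour_groups = [[11, 10, 9], [15, 13, 14], [16, 13, 16]]   # Red, Blue, Green
print(f"{anova_f(colour_groups):.2f}")                   # 12.60
print(f"{anova_f(list(reversed(colour_groups))):.2f}")   # 12.60 again
```

A seasonal trend across ordered months is therefore invisible to one-way ANOVA as such; it only shows up if the group means differ by more than the within-group scatter.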
Hypothesis Testing cont’d
[Figure: Residual Plots for Consumption(L/100Km): Normal Probability Plot, Versus Fits, Histogram, Versus Order]
We look for problems in
the residuals.
The distribution looks to be
roughly normal
No obvious change in variance or curvature in the residuals.
A random pattern vs order of observations (a good thing!).
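What the residual plots are built from: in a one-way ANOVA each observation's fitted value is its group mean, and its residual is the observation minus that mean. A minimal sketch using the small colour example from the theory slides, since the HOMECAR.MTW data is not reproduced here:

```python
from statistics import mean

groups = {"Red": [11, 10, 9], "Blue": [15, 13, 14], "Green": [16, 13, 16]}

# Fitted value = group mean; residual = observation - fitted value
residuals = {c: [x - mean(obs) for x in obs] for c, obs in groups.items()}
print(residuals)   # {'Red': [1, 0, -1], 'Blue': [1, -1, 0], 'Green': [1, -2, 1]}

# The squared residuals add up to SS within (the "Error" SS in Minitab's table)
print(sum(r ** 2 for res in residuals.values() for r in res))   # 10
```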
Hypothesis Testing cont’d
The ANOVA table in the session window:
One-way ANOVA: Consumption(L/100Km) versus Month
Source DF SS MS F P
Month 11 8.480 0.771 1.38 0.190
Error 126 70.391 0.559
Total 137 78.871
S = 0.7474 R-Sq = 10.75% R-Sq(adj) = 2.96%
P-Value (0.190) is not below 0.05, so we cannot reject Ho, i.e. we have insufficient evidence of a
relationship between month and fuel consumption.
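The derived columns of the ANOVA table (MS, F, S, R-Sq and R-Sq(adj)) follow from SS and DF alone. A minimal Python sketch that rebuilds them for the Month table; `anova_summary` is a hypothetical helper name, and R-Sq(adj) uses the standard definition 1 − MS(Error) / (SS(Total) / DF(Total)).

```python
import math

def anova_summary(ss_factor, df_factor, ss_error, df_error):
    """Rebuild the derived columns of a one-way ANOVA table from SS and DF."""
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    ss_total = ss_factor + ss_error
    df_total = df_factor + df_error
    return {
        "MS": round(ms_factor, 3),
        "F": round(ms_factor / ms_error, 2),
        "S": round(math.sqrt(ms_error), 4),               # pooled standard deviation
        "R-Sq %": round(100 * ss_factor / ss_total, 2),
        "R-Sq(adj) %": round(100 * (1 - ms_error / (ss_total / df_total)), 2),
    }

print(anova_summary(8.480, 11, 70.391, 126))   # Month
```

With 11 degrees of freedom spent on the factor, R-Sq(adj) shrinks from 10.75% to 2.96%: most of the apparent month effect is what 12 categories would pick up by chance.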
Hypothesis Testing cont’d
Individual 95% CIs For Mean Based on
Pooled StDev
Level N Mean StDev -------+---------+---------+---------+--
1 7 8.0276 0.6886 (-----------*----------)
2 7 7.7260 0.5718 (-----------*----------)
3 8 7.7382 0.8514 (----------*---------)
4 9 7.5029 0.5035 (---------*---------)
5 9 7.1770 0.5295 (---------*--------)
6 11 7.2524 0.8357 (--------*--------)
7 20 7.3601 0.9416 (-----*------)
8 21 7.2793 0.8750 (------*-----)
9 10 7.0942 0.5938 (--------*--------)
10 13 7.5054 0.4882 (-------*-------)
11 12 7.7771 0.4345 (--------*-------)
12 11 7.6613 0.9480 (--------*--------)
-------+---------+---------+---------+--
7.00 7.50 8.00 8.50
Pooled StDev = 0.7474
The confidence intervals are overlapping
Hypothesis Testing cont’d
• There was a pattern in the data that
suggested a seasonal effect. If we use
fewer categories, with more data in
each category, we might see a
statistically significant effect.
• We create a new variable “Quarter” in
a similar fashion to how we created
“Month”.
Hypothesis Testing cont’d
[Figure: Boxplot of Consumption(L/100Km) vs Quarter (1 to 4)]
The graph is suggestive of a difference, but it is not completely
clear cut, so we judge by the ANOVA table in the
session window.
Hypothesis Testing cont’d
One-way ANOVA: Consumption(L/100Km) versus Quarter
Source DF SS MS F P
Quarter 3 6.595 2.198 4.08 0.008
Error 134 72.275 0.539
Total 137 78.871
S = 0.7344 R-Sq = 8.36% R-Sq(adj) = 6.31%
Individual 95% CIs For Mean Based on
Pooled StDev
Level N Mean StDev -----+---------+---------+---------+----
1 22 7.8264 0.7002 (---------*---------)
2 29 7.3067 0.6488 (--------*--------)
3 51 7.2747 0.8462 (-----*------)
4 36 7.6436 0.6412 (-------*-------)
-----+---------+---------+---------+----
7.20 7.50 7.80 8.10
The confidence
intervals are not completely
overlapping
P-Value (0.008) is below 0.05, so we reject Ho and accept Ha,
i.e. we have evidence that there is a relationship
between quarter and fuel consumption.
This source of variation explains about 8% (R-Sq = 8.36%) of the variance in fuel consumption.
The max. seasonal difference in fuel consumption (Quarter 1 vs Quarter 3) is about 0.5 L/100Km.
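Both closing figures come straight from the session-window output. A minimal sketch that recomputes them from the printed quarter means and sums of squares (values copied from the Quarter ANOVA table):

```python
# Quarter means and sums of squares as printed in the session window
quarter_means = {1: 7.8264, 2: 7.3067, 3: 7.2747, 4: 7.6436}
ss_quarter, ss_total = 6.595, 78.871

high = max(quarter_means, key=quarter_means.get)   # quarter with the highest mean
low = min(quarter_means, key=quarter_means.get)    # quarter with the lowest mean
max_diff = quarter_means[high] - quarter_means[low]
r_sq = ss_quarter / ss_total                        # share of total variation explained

print(f"Q{high} - Q{low} = {max_diff:.2f} L/100Km; R-Sq = {100 * r_sq:.2f}%")
# Q1 - Q3 = 0.55 L/100Km; R-Sq = 8.36%
```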