2 sample t-test one way anova - minitab maestro | minitab...

HYPOTHESIS TESTING:

M. A. Sibley Consulting – All Rights Reserved HypothesisTest 1

• 2 Sample t-Test

• One Way ANOVA

Hypothesis Testing

• How do you sift through variables to separate the “Vital Few” from the “Trivial Many”?

• There are many tools:

– The Cause & Effect (Fishbone) Diagram

– The Cause & Effect Matrix

– Failure Modes and Effect Analysis (FMEA)

– Prior Knowledge

– Graphical Analysis

– Hypothesis Testing


Hypothesis Testing

• Why would we want to use Hypothesis Testing?

• Hypothesis Testing takes a practical question like: “Is the polymer viscosity higher with the new solvent?” and frames it in statistical terms

• It quantifies the risk that your conclusion is wrong, so you can justify spending more time & money on experiments, scale ups, etc.

• It reduces subjectivity in decision making

• There are many forms of hypothesis tests which can answer questions related to differences in means or variability, proportion of defects, counts of occurrences, etc.

Hypothesis Testing cont’d


Population → Sampling Scheme → Sample Data → Conclusions about the Population

The sampling scheme should select a representative sample.


• Hypothesis Tests are statements about Population Parameters based on Sample Statistics


                      Population Parameter   Sample Statistic
Mean                  μ                      x̄
Standard Deviation    σ                      s
Proportion            π                      p



Hypothesis Testing Roadmap

1. Select the X and Y for your test, e.g. Y = Breaking Strength; X = Additive Presence.

2. Find, or better still, run experiments to obtain data.

3. Graph.

4. If graphical analysis looks promising, continue.

5. Write competing hypotheses, e.g.:

• Null Hypothesis (Ho) = additive does not affect breaking strength

• Alternative Hypothesis (Ha) = additive does affect breaking strength

6. Select the appropriate statistical test, e.g. 2 Sample t-test.

7. Verify the data is acceptable for the test, e.g. data is normally distributed at each level (e.g. with or without additive).

8. Perform analysis.

9. Draw conclusions, e.g. we have statistical evidence that the presence of the Additive increases Breaking Strength.


• Statistical Significance:

• Ho is assumed; the burden of proof is on Ha.

• Look for strong evidence to reject Ho. Then we accept Ha.

• Typically we look for 95% confidence that Ho is false.

• Ha is sometimes called the “Research Claim”.

• 2 outcomes:

• Reject Ho and accept Ha: statistically significant

• Fail to reject Ho: not statistically significant



• Consider a Canadian Court of Law:

In everyday terms:

              Conclusion (Verdict)
True State    Innocent              Guilty
Innocent      Correct Decision      Incorrect Decision
Guilty        Incorrect Decision    Correct Decision

In statistical terms:

              Conclusion (Verdict)
True State    Not Reject Ho         Reject Ho
Ho true       Correct Decision      Type 1 (α) Error
Ho false      Type 2 (β) Error      Correct Decision

Which error are we more tolerant of?

Even if there is a statistically significant difference, it may not be practically significant!


Risks:

• α Risk: Conclude there is a difference when there is not

– Make changes, investments that are not needed

• β Risk: Conclude there is no difference when there is a difference

– Status quo; missed opportunity



• All statistical tests calculate a p-value (Probability-value)

• Decision Rule:

– Reject Ho if the p-value is less than the “critical” value you choose beforehand. Ha is supported.

– If not less, cannot reject Ho; Ha is not supported.

• Critical values (Pcritical or α risk):

– Typically use 0.05

– For a critical safety system might choose 0.003

– For a marketing decision might choose 0.2

“If p is low the Null must go”
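The decision rule can be written down in a few lines. This is only a minimal sketch: the function name `decide` and the illustrative p-value of 0.075 are assumptions made for the example; the three α values are the ones listed on the slide.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject Ho only if p is below the pre-chosen alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "Reject Ho; Ha is supported"
    return "Cannot reject Ho; Ha is not supported"


# The same p-value can lead to different decisions depending on the
# alpha risk that was chosen before looking at the data.
for label, alpha in [("typical", 0.05), ("critical safety", 0.003), ("marketing", 0.2)]:
    print(f"{label:>15} (alpha = {alpha}): {decide(0.075, alpha)}")
```

Choosing α before running the test is what keeps the rule objective; picking it after seeing the p-value would defeat the purpose.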


Major Hypothesis Tests

Y Type              X Type               # of X’s  # Subgroups  Test
Continuous          Discrete             1         1            1 Sample t-Test
Continuous          Discrete             1         2            2 Sample t-Test
Continuous          Discrete             1         2+           1 way ANOVA (ANOVA = Analysis of Variance)
Continuous          Continuous/Discrete  2+        2+           ANOVA GLM (GLM = General Linear Model)
Continuous          Continuous           1         n/a          (Linear) Regression
Continuous          Continuous           2+        n/a          Multiple Regression
Discrete            Discrete             1         2+           π Analysis of Proportions
Discrete or Binary  Continuous           1+        2+           π Binary Logistic Regression
Continuous          n/a                                         Test for Equal Variance

Highlighted tests (2 Sample t-Test and 1 way ANOVA) will be shown in this presentation.


• Acceptable data:

– Representative

– No outliers, little auto-correlation (where the next value is likely to be similar to the previous value), no severe skewing (for most tests), no distinct bi-modality, roughly bell shaped distribution

– Correctly measured and recorded

– Good measurement capability

   • Total Variance = Process Variance + Measurement Variance

   • If Measurement Variance is too high a proportion of the total variance, then seeing the benefit of a process improvement can be very difficult. This topic is called “Measurement Systems Analysis” (MSA). The premier tool is Gauge R & R.

– Enough data to have a sensitive enough test

   • “Power & Sample Size” (Stat > Power and Sample Size)

   • More samples reduce the β risk, i.e. increase the likelihood of seeing a difference if there really is one.
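To make the link between sample size and β risk concrete, here is a rough sketch using the textbook normal approximation for comparing two means. It is not a reproduction of Minitab's Power and Sample Size tool, which may use a more exact calculation and give slightly different numbers. The function name and the example values (a standard deviation of 1.8 and differences of 1.0 and 2.0) are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def approx_n_per_group(delta: float, sigma: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate samples per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z_alpha/2 + z_beta) * sigma / delta)^2,
    where power = 1 - beta.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # quantile for the two-sided alpha risk
    z_beta = z.inv_cdf(power)             # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical process: sigma = 1.8. Smaller differences need far more data.
for delta in (2.0, 1.0):
    print(f"difference {delta}: about {approx_n_per_group(delta, 1.8)} samples per group")
```

Halving the difference to be detected roughly quadruples the required sample size, which is why a test run on too little data can easily miss a real effect.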



2 Sample t-Test:

• When a sample is drawn from a normal distribution, the sample mean (standardized using the sample standard deviation) follows a “Student-t” distribution, which is a bell shaped distribution like the normal distribution but with heavier tails. As the sample size increases, the corresponding t-distribution becomes closer to normal.

• This distribution lets us draw conclusions about the range in which the population mean is likely to be found.

• There are 2 types of 2 sample t-tests:

– 2 Sample: e.g. Before / After a process change; batches on Night Shift vs Day Shift

– Paired: e.g. the same sample analyzed on instrument A vs instrument B; hair coverage before and after use of Rogaine by a group of individuals. Paired data could be analyzed by a simple 2 sample t-test, but that test is then less “powerful” because it ignores the pairing.
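The loss of power from ignoring pairing can be seen in a small sketch. The instrument readings below are hypothetical numbers invented for illustration, not data from this presentation. Sample-to-sample variation is large, but the instrument B reading is consistently a little higher than the instrument A reading on the same sample.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical readings: the same five samples measured on two instruments.
instrument_a = [10.1, 12.3, 9.8, 11.5, 13.0]
instrument_b = [10.4, 12.7, 10.0, 11.9, 13.3]

# Paired analysis: work with the per-sample differences, which removes
# the sample-to-sample variation from the comparison.
diffs = [b - a for a, b in zip(instrument_a, instrument_b)]
paired_t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Unpaired 2 sample analysis: the sample-to-sample variation stays in the
# standard error and swamps the small instrument difference.
se_unpaired = sqrt(variance(instrument_a) / len(instrument_a)
                   + variance(instrument_b) / len(instrument_b))
two_sample_t = (mean(instrument_b) - mean(instrument_a)) / se_unpaired

print(f"paired t = {paired_t:.2f}, 2 sample t = {two_sample_t:.2f}")
```

With these made-up numbers the paired t statistic is many times larger than the unpaired one, even though both use exactly the same readings.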



• Open the worksheet: 2 Sample t Test.MTW

• Perform a 2 Sample t-Test (Stat > Basic Statistics > 2 Sample t…)

Output:


Two-sample T for Viscosity vs New Air Dryer Installation

N Mean StDev SE Mean

After 20 63.04 1.76 0.39

Before 23 62.02 1.93 0.40

Difference = mu (After) - mu (Before)

Estimate for difference: 1.028

95% CI for difference: (-0.109, 2.165)

T-Test of difference = 0 (vs not =): T-Value = 1.83 P-Value = 0.075 DF = 40

Not statistically significant! End of story?
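The T-Value and DF in the session output can be cross-checked from the summary statistics alone. The sketch below assumes the unequal-variance (Welch) form of the 2 sample t-test. The slides do not state which form was used, but the truncated Welch degrees of freedom reproduce the reported DF = 40, whereas a pooled test would give 20 + 23 - 2 = 41. The function name `welch_t` is made up for this example.

```python
from math import sqrt


def welch_t(diff: float, s1: float, n1: int, s2: float, n2: int) -> tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2      # squared standard errors of each mean
    t = diff / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Values copied from the session output above (After: N=20, Before: N=23).
t_value, df = welch_t(diff=1.028, s1=1.76, n1=20, s2=1.93, n2=23)
print(f"T-Value = {t_value:.2f}, DF = {int(df)}")   # T-Value = 1.83, DF = 40
```

Because the inputs are rounded values from the printed output, small differences in the last digit are possible for other data sets.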


• We look at the data graphically (which we should have done beforehand!)


[Figure: Boxplot of Viscosity, Before vs After]

We see an “outlier”. The comment for this point says “Faulty sample valve operation; water in sample” so we are justified in removing this point from the analysis.


• Output after removal (replacement of the data point by an asterisk, Minitab’s missing data symbol):


Two-sample T for Viscosity


N Mean StDev SE Mean

After 20 63.04 1.76 0.39

Before 22 61.79 1.64 0.35

Difference = mu (After) - mu (Before)

Estimate for difference: 1.254

95% CI for difference: (0.191, 2.317)

T-Test of difference = 0 (vs not =): T-Value = 2.39 P-Value = 0.022 DF = 38

[Figure: Boxplot of Viscosity, Before vs After, with the outlier removed]

Statistically significant! Reject Ho


• We notice that the graph shows “After” before “Before”! We can fix this:

• Right click on the column:


After we click on OK, we right click on the graph and select Update Graph Now or we run the analysis which generated it again.


One Way ANOVA:

ANOVA or Analysis of Variance is the hypothesis test equivalent of the BoxPlot, i.e. it is used to judge if a discrete X can explain variability in a Y.

Theory: Consider these 9 observations (Y) (3 per colour)

We can see that if the numbers within a row are similar to the row average but the row averages are quite different from each other, then Colour might be a significant factor in explaining overall variability in the Y. The ANOVA test uses this principle to calculate the p-Value that you can use to judge if the factor is significant in explaining the variability.


Colour (X)   j   i=1   i=2   i=3   Row Avg ($\bar{x}_j$)
Red          1   11    10    9     10
Blue         2   15    13    14    14
Green        3   16    13    16    15

Grand Avg ($\bar{\bar{x}}$): 13


One Way ANOVA: Theory cont’d:

Reference Slide

Here are the formulae:

• $SS_{total} = \sum_{j=1}^{p} \sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2$

• $SS_{between} = \sum_{j=1}^{p} n_j (\bar{x}_j - \bar{\bar{x}})^2$

• $SS_{within} = \sum_{j=1}^{p} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$

• Here is the same table but now showing *Squared* Differences from the Grand Average:

Colour   j   i=1   i=2   i=3
Red      1   4     9     16
Blue     2   4     0     1
Green    3   9     0     9

$SS_{total}$ = 52 (the sum of these squared differences)



• In a similar fashion we calculate $SS_{between}$ and $SS_{within}$.

• Degrees of freedom (df) is the number of values in the final calculation of a statistic that are free to vary.

• So for our total sum of squares, if we know the sum, then once we know n-1 of the values that make up the sum, the last value is known (i.e. can’t vary), so df = n-1

• Knowing this we calculate Mean Squares (MS):

$MS = \frac{SS}{df} = s^2$

Source    SS   df      MS
Between   42   3-1=2   21.0
Within    10   9-3=6   1.67
Total     52   9-1=8



• Sample variances from normally distributed data (suitably scaled) follow a distribution known as Chi-Square

• The ratio of 2 scaled Chi-Squared variables follows the F Distribution

• For our ANOVA test, since Mean Squares are variances, we calculate our test statistic F:

$F = \frac{MS_{between}}{MS_{within}} = \frac{21.0}{1.67} = 12.6$

• Looking up 12.6 in the F distribution with the right degrees of freedom (2 and 6) gives us a value of 0.007, which is the P-Value for our ANOVA test.
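The whole calculation for the 9 colour observations can be reproduced with a short sketch using only the Python standard library. The variable names are made up for the example. One shortcut is an assumption worth flagging: the p-value line uses a closed form of the F distribution's upper tail that is valid only when the numerator degrees of freedom equal 2, as they do here; a general F test would need a statistics library.

```python
# The 9 observations from the theory slide, 3 per colour.
groups = {"Red": [11, 10, 9], "Blue": [15, 13, 14], "Green": [16, 13, 16]}

all_values = [x for values in groups.values() for x in values]
grand_avg = sum(all_values) / len(all_values)

# Sums of squares, following the three formulae on the slide.
ss_total = sum((x - grand_avg) ** 2 for x in all_values)
ss_between = sum(len(v) * (sum(v) / len(v) - grand_avg) ** 2 for v in groups.values())
ss_within = sum((x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v)

# Degrees of freedom and mean squares.
df_between = len(groups) - 1                # 3 - 1 = 2
df_within = len(all_values) - len(groups)   # 9 - 3 = 6
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

# Upper-tail probability of F(2, df_within); this closed form holds ONLY
# because the numerator degrees of freedom are exactly 2.
assert df_between == 2
p_value = (1 + df_between * f_stat / df_within) ** (-df_within / 2)

print(f"SS total={ss_total:.0f}, between={ss_between:.0f}, within={ss_within:.0f}")
print(f"F = {f_stat:.1f}, P = {p_value:.3f}")
```

Note that the between and within sums of squares add up to the total, which is a useful arithmetic check when working an ANOVA by hand.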





• Here is Minitab’s output for this data:

One-way ANOVA: Value versus Colour

Source DF SS MS F P

Colour 2 42.00 21.00 12.60 0.007

Error 6 10.00 1.67

Total 8 52.00

S = 1.291 R-Sq = 80.77% R-Sq(adj) = 74.36%

Individual 95% CIs For Mean Based on

Pooled StDev

Level N Mean StDev -------+---------+---------+---------+--

Blue 3 14.000 1.000 (------*------)

Green 3 15.000 1.732 (------*------)

Red 3 10.000 1.000 (------*------)

-------+---------+---------+---------+--

10.0 12.5 15.0 17.5

Pooled StDev = 1.291



P is low, the null must go.

Statistically significant!

Reject H0; accept Ha
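The summary line of the session output (S, R-Sq, R-Sq(adj)) can be recovered from the ANOVA table. The sketch below uses the standard textbook definitions of these quantities; the slides do not say how Minitab computes them, but the definitions reproduce the printed values for this data.

```python
from math import sqrt

# Values copied from the ANOVA table in the session output.
ss_colour, ss_error, ss_total = 42.0, 10.0, 52.0
df_error, df_total = 6, 8

s = sqrt(ss_error / df_error)                                   # pooled standard deviation
r_sq = ss_colour / ss_total                                     # share of variation explained by Colour
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)    # adjusted for degrees of freedom

print(f"S = {s:.3f}  R-Sq = {r_sq:.2%}  R-Sq(adj) = {r_sq_adj:.2%}")
```

R-Sq(adj) is lower than R-Sq because it penalises the model for the degrees of freedom it uses, which matters most with small data sets like this one.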

• Here is Minitab’s output for this data:

[Figure: Individual Value Plot of Value vs Colour (Red, Blue, Green)]



Exercise (5 minutes)

1. Open the dataset: HOMECAR.MTW

2. Create a BoxPlot of Consumption vs Month. (You will have to create the variable Month from the date).

3. Perform a Hypothesis test to see if there is a statistically significant difference in Fuel Consumption vs Month.

Note: you will need to use the Hypothesis Test equivalent of the BoxPlot, namely ANOVA (Analysis of Variance): Stat > ANOVA > One-Way

If time permits, repeat the exercise for Quarter (of the year).


• We create a new variable “Month” from Date
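Outside Minitab, the same step amounts to extracting the month number from each date and grouping the consumption values by it. The records below are hypothetical placeholders, since the contents of HOMECAR.MTW are not reproduced in these slides.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (date, consumption in L/100Km) records for illustration only.
records = [
    (date(2012, 1, 15), 8.1),
    (date(2012, 1, 29), 7.9),
    (date(2012, 7, 4), 7.2),
]

# Derive the Month variable and collect the consumption values per month,
# which is the grouping that a boxplot or one-way ANOVA works from.
by_month = defaultdict(list)
for fill_date, consumption in records:
    by_month[fill_date.month].append(consumption)

for month in sorted(by_month):
    print(month, by_month[month])
```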



• We perform a one way ANOVA:




[Figure: Boxplot of Consumption(L/100Km) by Month (months 1 to 12)]

The boxplot output from the ANOVA shows a suggestive pattern vs month, but it is still in the “grey area”.

ANOVA does not differentiate based on the order of the categories, i.e. our eye sees a non-random seasonal pattern, but ANOVA gives the same output no matter how the categories are ordered.



[Figure: Residual Plots for Consumption(L/100Km), four panels: Normal Probability Plot (Percent vs Residual), Versus Fits (Residual vs Fitted Value), Histogram (Frequency vs Residual), Versus Order (Residual vs Observation Order)]

We look for problems in the residuals.

• The distribution looks to be roughly normal.

• No obvious change in variance or curvature in the residuals.

• A random pattern vs order of observations (a good thing!).


The ANOVA table in the session window:

One-way ANOVA: Consumption(L/100Km) versus Month

Source DF SS MS F P

Month 11 8.480 0.771 1.38 0.190

Error 126 70.391 0.559

Total 137 78.871

S = 0.7474 R-Sq = 10.75% R-Sq(adj) = 2.96%


P-Value is not below 0.05 so we cannot reject Ho, i.e. we have insufficient evidence that there is a relationship between month and fuel consumption.



Individual 95% CIs For Mean Based on Pooled StDev

Level N Mean StDev -------+---------+---------+---------+--

1 7 8.0276 0.6886 (-----------*----------)

2 7 7.7260 0.5718 (-----------*----------)

3 8 7.7382 0.8514 (----------*---------)

4 9 7.5029 0.5035 (---------*---------)

5 9 7.1770 0.5295 (---------*--------)

6 11 7.2524 0.8357 (--------*--------)

7 20 7.3601 0.9416 (-----*------)

8 21 7.2793 0.8750 (------*-----)

9 10 7.0942 0.5938 (--------*--------)

10 13 7.5054 0.4882 (-------*-------)

11 12 7.7771 0.4345 (--------*-------)

12 11 7.6613 0.9480 (--------*--------)

-------+---------+---------+---------+--

7.00 7.50 8.00 8.50

Pooled StDev = 0.7474


The confidence intervals are overlapping


• There was a pattern in the data that suggested a seasonal effect. If we use fewer categories, but with more data in each category, we might see a statistically significant effect.

• We create a new variable “Quarter” in

a similar fashion to how we created

“Month”.
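A sketch of the Month to Quarter mapping follows, assuming calendar quarters (January to March is quarter 1, and so on). The slides do not state how Quarter was defined, but with this assumption the monthly group sizes from the session output add up to the group sizes reported in the quarterly ANOVA output.

```python
def quarter_of(month: int) -> int:
    """Map a month number (1-12) to its calendar quarter (1-4)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return (month - 1) // 3 + 1


# Number of observations per month (the N column of the monthly output).
n_by_month = {1: 7, 2: 7, 3: 8, 4: 9, 5: 9, 6: 11,
              7: 20, 8: 21, 9: 10, 10: 13, 11: 12, 12: 11}

# Pool the monthly counts into quarters.
n_by_quarter = {}
for month, n in n_by_month.items():
    q = quarter_of(month)
    n_by_quarter[q] = n_by_quarter.get(q, 0) + n

print(n_by_quarter)   # {1: 22, 2: 29, 3: 51, 4: 36}
```

Each quarter now holds between 22 and 51 observations instead of 7 to 21 per month, which is what gives the quarterly test its extra sensitivity.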




[Figure: Boxplot of Consumption(L/100Km) by Quarter (quarters 1 to 4)]

The graph is suggestive of a difference, but it is not completely clear cut, so we judge by the ANOVA table in the session window.


One-way ANOVA: Consumption(L/100Km) versus Quarter

Source DF SS MS F P

Quarter 3 6.595 2.198 4.08 0.008

Error 134 72.275 0.539

Total 137 78.871

S = 0.7344 R-Sq = 8.36% R-Sq(adj) = 6.31%


Individual 95% CIs For Mean Based on Pooled StDev

Level N Mean StDev -----+---------+---------+---------+----

1 22 7.8264 0.7002 (---------*---------)

2 29 7.3067 0.6488 (--------*--------)

3 51 7.2747 0.8462 (-----*------)

4 36 7.6436 0.6412 (-------*-------)

-----+---------+---------+---------+----

7.20 7.50 7.80 8.10


The confidence intervals are not completely overlapping.

P-Value is below 0.05 so we reject Ho and accept Ha, i.e. we have evidence that there is a relationship between quarter and fuel consumption.

This source of variation explains 8% of the variance in fuel consumption.

The max. seasonal difference in fuel consumption is about 0.5 L/100Km
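Both closing figures can be traced back to the quarterly session output. The sketch below recomputes them from the printed values, assuming the standard definition R-Sq = SS(factor) / SS(total); the variable names are made up for the example.

```python
# Values copied from the quarterly ANOVA table and level means.
ss_quarter, ss_total = 6.595, 78.871
ms_quarter, ms_error = 2.198, 0.539
quarter_means = {1: 7.8264, 2: 7.3067, 3: 7.2747, 4: 7.6436}

r_sq = ss_quarter / ss_total        # share of the variance explained by Quarter
f_stat = ms_quarter / ms_error      # test statistic reported in the table

# Largest gap between any two quarterly means (practical significance).
high = max(quarter_means, key=quarter_means.get)
low = min(quarter_means, key=quarter_means.get)
max_diff = quarter_means[high] - quarter_means[low]

print(f"R-Sq = {r_sq:.2%}, F = {f_stat:.2f}")
print(f"Largest difference: quarter {high} vs quarter {low} = {max_diff:.2f} L/100Km")
```

The effect is statistically significant, yet it accounts for only a small share of the variation, which is a reminder to judge practical significance separately from the p-value.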
