l08_ch 08.ppt

Copyright F. Michael Speed

Learning Outcomes

• You will learn:
  – How to identify whether a problem fits an ANOVA model
  – How to set up an ANOVA model
  – How to interpret the terms of the model
  – What the important hypotheses to be tested are
  – What the difference is between “clean” and “dirty” models

A Statistical Test About More Than Two Population Means:

An Analysis of Variance


REGRESSION – ANOVA

Predictors X1, X2, X3, …, Xp:
• All scale (continuous) → Regression
• All factors (categorical) → ANOVA

Poll

• Is the following an ANOVA or a regression?

Creatinine clearance (Y) is an important measure of kidney function, but it is difficult to obtain in a clinical office setting because it requires 24-hour urine collection. To determine whether this measure can be predicted from some data that are easily available, a kidney specialist obtained the data that follow from 33 male subjects. The predictor variable is serum creatinine concentration (X1).


Poll

• Is the following an ANOVA or a regression?

The two most crucial factors that influence the strength of solders used in cementing computer chips into the motherboard of the guidance system of an airplane are identified as the machine used to insert the solder and the operator of the machine. Only three qualified operators were available, and four solder machines were randomly selected from the many solder machines available at the company’s plants. Each operator made two solders on each of the four machines. The resulting strength determinations of the solders are given here.

ANOVA

No X’s
[Figure: population means μ1, μ3, μ2 marked along the Y axis]

New Way to Look at Data
[Figure: Y axis with the population means μ1, μ3, μ2]

FIGURE 8.5 Distributions of four populations that satisfy AOV assumptions

Applet

• http://bcs.whfreeman.com/ips4e/cat_010/applets/anova.html


A Treatment (Factor) with 5 Levels

• A
• B
• C
• D
• E

• How many populations?

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$

Multiple t tests

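Null Hypotheses (one t test for each pair of means):

$H_0: \mu_1 = \mu_2$, $H_0: \mu_1 = \mu_3$, $H_0: \mu_1 = \mu_4$, $H_0: \mu_1 = \mu_5$
$H_0: \mu_2 = \mu_3$, $H_0: \mu_2 = \mu_4$, $H_0: \mu_2 = \mu_5$
$H_0: \mu_3 = \mu_4$, $H_0: \mu_3 = \mu_5$
$H_0: \mu_4 = \mu_5$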
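With t treatment levels there are t(t − 1)/2 pairs of means, so 5 levels already require 10 separate t tests. A minimal sketch that lists them (the function name `pairwise_nulls` is hypothetical, chosen for illustration):

```python
from itertools import combinations


def pairwise_nulls(t):
    """One H0 string 'mu_i = mu_j' for each of the t(t-1)/2 pairs of means."""
    return [f"mu_{i} = mu_{j}" for i, j in combinations(range(1, t + 1), 2)]


hyps = pairwise_nulls(5)
print(len(hyps))  # 10
for h in hyps:
    print("H0:", h)
```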


Analysis of Variance Procedures

1. Each of the five populations has a normal distribution. Use the residuals to check this.

2. The variances of the five populations are equal; that is, $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = \sigma_5^2 = \sigma^2$.

3. The five sets of measurements are independent random samples from their respective populations.
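Condition 1 is checked on the residuals $e_{ij} = y_{ij} - \bar{y}_{i\cdot}$, not on the raw data. A minimal sketch that computes them, assuming the data arrive as one list per treatment (the numbers are hypothetical):

```python
def residuals(groups):
    """Residuals e_ij = y_ij - ybar_i for a one-way layout.

    groups: list of lists, one inner list of measurements per treatment.
    """
    out = []
    for g in groups:
        mean = sum(g) / len(g)
        out.append([y - mean for y in g])
    return out


# Hypothetical data for three treatments (illustration only).
data = [[10.0, 12.0, 11.0], [14.0, 15.0, 13.0], [9.0, 8.0, 10.0]]
res = residuals(data)
print(res)  # [[-1.0, 1.0, 0.0], [0.0, 1.0, -1.0], [0.0, -1.0, 1.0]]
```

The pooled residuals can then go into a normal probability plot or a normality test such as `scipy.stats.shapiro`, if SciPy is available.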


The Null and Alternative Hypotheses:

$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_t$ (i.e., the t population means are equal)

$H_a$: At least one of the t population means differs from the rest.

Table 8.6 An example of an AOV table for a completely randomized design
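The table itself did not survive extraction. The sketch below computes the usual one-way AOV entries (SSB, SSW, MSB = SSB/(t − 1), MSW = SSW/(N − t), F = MSB/MSW) from raw data; the function name and the data are hypothetical, and the layout is not a copy of Table 8.6:

```python
def one_way_aov(groups):
    """One-way AOV table for a completely randomized design.

    groups: list of lists, one list of measurements per treatment.
    Returns {source: (df, SS, MS, F)}; None marks an empty cell.
    """
    all_y = [y for g in groups for y in g]
    n_total, t = len(all_y), len(groups)
    grand_mean = sum(all_y) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between treatments: SSB = sum_i n_i (ybar_i - ybar)^2
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within treatments (error): SSW = sum_i sum_j (y_ij - ybar_i)^2
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_b, df_w = t - 1, n_total - t
    msb, msw = ssb / df_b, ssw / df_w
    return {
        "Treatments": (df_b, ssb, msb, msb / msw),
        "Error": (df_w, ssw, msw, None),
        "Total": (n_total - 1, ssb + ssw, None, None),
    }


# Hypothetical data: three treatments, three observations each.
data = [[10.0, 12.0, 11.0], [14.0, 15.0, 13.0], [9.0, 8.0, 10.0]]
table = one_way_aov(data)
print(f"{'Source':<11}{'df':>4}{'SS':>10}{'MS':>10}{'F':>10}")
for source, (df, ss, ms, f) in table.items():
    ms_txt = "" if ms is None else f"{ms:.3f}"
    f_txt = "" if f is None else f"{f:.3f}"
    print(f"{source:<11}{df:>4}{ss:>10.3f}{ms_txt:>10}{f_txt:>10}")
```

For these numbers SSB = 38, SSW = 6, MSB = 19, MSW = 1, and F = 19 on (2, 6) degrees of freedom. If SciPy is installed, `scipy.stats.f.sf(F, df_b, df_w)` gives the p-value.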


Model

$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$   (Dirty)

or

$y_{ij} = \mu_i + \varepsilon_{ij}$   (Clean)
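One way to see how the two models differ: in the dirty form, adding a constant c to μ and subtracting it from every τ_i leaves every treatment mean μ + τ_i unchanged, so the data cannot distinguish the two parameter sets. In the clean form the parameters are the treatment means μ_i themselves. A minimal sketch with hypothetical numbers and helper names:

```python
def group_means_dirty(mu, taus):
    """Treatment means implied by the 'dirty' model: mu_i = mu + tau_i."""
    return [mu + tau for tau in taus]


def shift(mu, taus, c):
    """Move a constant c from the tau_i's into mu; the implied means do not change."""
    return mu + c, [tau - c for tau in taus]


# Hypothetical parameter values for t = 3 treatments.
mu, taus = 10.0, [1.0, 2.0, 3.0]
mu2, taus2 = shift(mu, taus, 2.0)      # mu2 = 12.0, taus2 = [-1.0, 0.0, 1.0]
print(group_means_dirty(mu, taus))     # [11.0, 12.0, 13.0]
print(group_means_dirty(mu2, taus2))   # [11.0, 12.0, 13.0]
```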


Poll

• In the “dirty” model, the parameters are population parameters

• Yes - No

• In the “clean” model, the parameters are population parameters

• Yes - No


TABLE 8.11 Summary of some of the assumptions for a completely randomized design

Population | Population Mean | Population Variance | Sample Measurements
1          | $\mu_1$         | $\sigma^2$          | $y_{11}, y_{12}, \ldots, y_{1n_1}$
2          | $\mu_2$         | $\sigma^2$          | $y_{21}, y_{22}, \ldots, y_{2n_2}$
⋮          | ⋮               | ⋮                   | ⋮
t          | $\mu_t$         | $\sigma^2$          | $y_{t1}, y_{t2}, \ldots, y_{tn_t}$
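The assumptions in Table 8.11 can also be read as a recipe for generating data: population i has mean μ_i, every population shares the variance σ², and the samples are independent. A minimal simulation sketch, assuming normal errors; the means, σ, and sample sizes below are hypothetical:

```python
import random


def simulate_crd(means, sigma, sizes, seed=None):
    """Draw y_ij = mu_i + e_ij with independent e_ij ~ N(0, sigma^2).

    means: population means mu_1..mu_t; sizes: sample sizes n_1..n_t.
    The same sigma is used for every population (equal-variance assumption).
    """
    rng = random.Random(seed)
    return [[rng.gauss(mu, sigma) for _ in range(n)] for mu, n in zip(means, sizes)]


samples = simulate_crd(means=[20.0, 22.0, 25.0], sigma=2.0, sizes=[4, 5, 6], seed=1)
for i, s in enumerate(samples, start=1):
    print(f"population {i}: n_{i} = {len(s)}, sample mean = {sum(s) / len(s):.2f}")
```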


Checking on the AOV Conditions

• Residuals analysis

• Levene’s test for equality of variances
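Levene’s statistic is an ordinary one-way AOV F computed on the absolute deviations $z_{ij} = |y_{ij} - \bar{y}_{i\cdot}|$; a large value suggests unequal variances and is compared with $F_{\alpha, t-1, N-t}$. A minimal sketch (the function name and data are hypothetical); the `center="median"` option gives the Brown-Forsythe variant, which is the default of `scipy.stats.levene`:

```python
import statistics


def levene_w(groups, center="mean"):
    """Levene's statistic: the one-way AOV F computed on z_ij = |y_ij - c_i|.

    c_i is the group mean (Levene's original form) or, with center="median",
    the group median (the Brown-Forsythe variant).
    """
    center_fn = statistics.mean if center == "mean" else statistics.median
    z = []
    for g in groups:
        c = center_fn(g)
        z.append([abs(y - c) for y in g])
    n_total, t = sum(len(g) for g in z), len(z)
    grand = sum(sum(g) for g in z) / n_total
    means = [sum(g) / len(g) for g in z]
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(z, means)) / (t - 1)
    within = sum((v - m) ** 2 for g, m in zip(z, means) for v in g) / (n_total - t)
    return between / within


# Hypothetical samples: the second group is far more spread out.
w = levene_w([[1, 2, 3], [0, 10, 20]])
print(round(w, 4))  # 3.2079
```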


Reporting Conclusions

1. Statement of objective for study

2. Description of study design and data collection procedures

3. Discussion of why the results from 11 of the 100 patients were not included in the data analysis

4. Numerical and graphical summaries of data sets

5. Description of all inference methodologies:
   – AOV table and F test
   – t-based confidence intervals on means
   – Verification that all necessary conditions for using inference techniques were satisfied

6. Discussion of results and conclusions

7. Interpretation of findings relative to previous studies

8. Recommendations for future studies

9. Listing of data sets


Demonstration

• This demonstration illustrates ...

Exercises

• This exercise reinforces the concepts discussed previously.

Multiple Comparisons: But Which Means Are Different?

Chapter 9

Elementary, Watson


Linear Contrasts - LMATRIX

$l = a_1\mu_1 + a_2\mu_2 + \cdots + a_t\mu_t = \sum_{i=1}^{t} a_i\mu_i$

$l_1 = \mu_1 - \mu_2$

$l_2 = \mu_1 - (\mu_2 + \mu_3)/2$

DEFINITION 9.1

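$\hat{l} = a_1\bar{y}_{1\cdot} + a_2\bar{y}_{2\cdot} + \cdots + a_t\bar{y}_{t\cdot} = \sum_{i=1}^{t} a_i\bar{y}_{i\cdot}$

is called a linear contrast among the sample means and can be used to estimate $l = \sum_{i=1}^{t} a_i\mu_i$. The $a_i$s are constants satisfying the constraint $\sum_{i=1}^{t} a_i = 0$.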
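A minimal sketch of Definition 9.1: check the constraint $\sum a_i = 0$, then form $\hat{l} = \sum a_i \bar{y}_{i\cdot}$. Coefficients are given as `Fraction`s so that values like −1/2 sum to exactly zero; the function name and the sample means are hypothetical:

```python
from fractions import Fraction


def contrast_estimate(coeffs, sample_means):
    """Return l-hat = sum_i a_i * ybar_i after checking that sum_i a_i == 0."""
    if len(coeffs) != len(sample_means):
        raise ValueError("need one coefficient per treatment mean")
    if sum(coeffs) != 0:
        raise ValueError("coefficients must sum to zero to form a contrast")
    return sum(a * m for a, m in zip(coeffs, sample_means))


# l = mu_1 - (mu_2 + mu_3)/2, i.e. a = (1, -1/2, -1/2); hypothetical sample means.
a = [Fraction(1), Fraction(-1, 2), Fraction(-1, 2)]
print(contrast_estimate(a, [11, 14, 9]))  # 11 - (14 + 9)/2 = -1/2
```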


Which Error Rate Is Controlled?

• Individual comparisons

• Experimentwise error rate

• Bonferroni inequality

• Fisher’s protected LSD

• Tukey

• And on and on and on ….


Individual Comparison

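$H_0: l = a_1\mu_1 + a_2\mu_2 + \cdots + a_t\mu_t = 0$

$H_a: l = a_1\mu_1 + a_2\mu_2 + \cdots + a_t\mu_t \neq 0$

T.S.: $F = \frac{SSC}{MS_{Error}}$

The $\alpha$ level is correct for this single comparison.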
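For a single contrast the sum of squares is $SSC = \hat{l}^2 / \sum_i (a_i^2 / n_i)$, and F = SSC/MS_Error has 1 and N − t degrees of freedom. A minimal sketch with hypothetical data and function name; the critical value $F_{\alpha, 1, N-t}$ would come from an F table (or `scipy.stats.f.ppf`, if SciPy is available):

```python
def contrast_f(coeffs, groups):
    """F = SSC / MS_Error for H0: l = sum_i a_i mu_i = 0 (one-way layout).

    Returns (F, (numerator df, denominator df)).
    """
    if len(coeffs) != len(groups):
        raise ValueError("need one coefficient per treatment")
    means = [sum(g) / len(g) for g in groups]
    l_hat = sum(a * m for a, m in zip(coeffs, means))
    # SSC = l_hat^2 / sum_i (a_i^2 / n_i)
    ssc = l_hat ** 2 / sum(a ** 2 / len(g) for a, g in zip(coeffs, groups))
    n_total, t = sum(len(g) for g in groups), len(groups)
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_error = ss_error / (n_total - t)
    return ssc / ms_error, (1, n_total - t)


# Hypothetical data; contrast l = mu_1 - mu_2, i.e. a = (1, -1, 0).
data = [[10.0, 12.0, 11.0], [14.0, 15.0, 13.0], [9.0, 8.0, 10.0]]
F, df = contrast_f([1, -1, 0], data)
print(round(F, 3), df)  # 13.5 (1, 6)
```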


2 Comparisons

Suppose that we want to test 2 comparisons, $L_1$ and $L_2$:

$H_{o1}: L_1 = 0$

$H_{o2}: L_2 = 0$

Let $\alpha = P\{\text{rejecting } H_{oi} \mid H_{oi} \text{ is true}\} = .1$.

If the two tests are independent, the probability of making a Type I error on at least one of the null hypotheses is

$P\{\text{at least 1 error}\} = 1 - (1 - .1)^2 = 1 - .81 = .19$
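The same calculation extends to m independent comparisons: $P\{\text{at least 1 error}\} = 1 - (1 - \alpha)^m$, which grows quickly with m. A minimal sketch (hypothetical function name; it assumes independent tests, as the slide does):

```python
def prob_at_least_one_type_i(alpha, m):
    """P{at least one Type I error} for m independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** m


print(round(prob_at_least_one_type_i(0.1, 2), 4))  # 0.19, as on the slide
for m in (1, 2, 5, 10, 20):
    print(m, round(prob_at_least_one_type_i(0.05, m), 3))
```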


Table 9.4


Experimentwise Error Rate: $\alpha_E$

Bonferroni Inequality

If we want to test m hypotheses, then test each comparison L at level $\alpha = \alpha_E / m$.

This will guarantee that the chance of making at least one Type I error is at most $\alpha_E$.
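Applied to p-values, the Bonferroni rule rejects $H_{0k}$ when $p_k \le \alpha_E / m$. The bound $\sum_k \alpha_E/m = \alpha_E$ holds whether or not the tests are independent. A minimal sketch; the function name and p-values are hypothetical:

```python
def bonferroni_decisions(p_values, alpha_e):
    """Reject H0_k when p_k <= alpha_E / m (Bonferroni, m = number of tests)."""
    m = len(p_values)
    per_test = alpha_e / m
    return per_test, [p <= per_test for p in p_values]


# Hypothetical p-values for m = 4 comparisons.
level, reject = bonferroni_decisions([0.004, 0.020, 0.011, 0.300], alpha_e=0.05)
print(level, reject)  # 0.0125 [True, False, True, False]
```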


Fisher’s Least Significant Difference Procedure

1. Perform an analysis of variance to test $H_0: \mu_1 = \mu_2 = \cdots = \mu_t$ against the alternative hypothesis that at least one of the means differs from the rest.

2. If there is insufficient evidence to reject $H_0$ using $F = MSB/MSW$, proceed no further.

3. If $H_0$ is rejected, define the least significant difference (LSD) to be the observed difference between two sample means necessary to declare the corresponding population means different.
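The LSD for means i and j is usually computed as $LSD_{ij} = t_{\alpha/2}\sqrt{MSW\,(1/n_i + 1/n_j)}$, with the t value on N − t degrees of freedom. A minimal sketch of step 3, assuming the overall F test has already rejected $H_0$; the critical value is passed in (here $t_{.025,6} \approx 2.447$ from a t table), and the function names and means are hypothetical:

```python
import math


def lsd(t_crit, ms_within, n_i, n_j):
    """Fisher's LSD_ij = t_{alpha/2} * sqrt(MSW * (1/n_i + 1/n_j))."""
    return t_crit * math.sqrt(ms_within * (1 / n_i + 1 / n_j))


def fisher_lsd_pairs(means, sizes, ms_within, t_crit):
    """Return (i, j, |ybar_i - ybar_j|, declared different?) for every pair.

    Only meaningful after the overall AOV F test has rejected H0.
    """
    out = []
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            d = abs(means[i] - means[j])
            out.append((i + 1, j + 1, d, d >= lsd(t_crit, ms_within, sizes[i], sizes[j])))
    return out


# Hypothetical summary: 3 treatments, n = 3 each, MSW = 1.0 on 6 df.
pairs = fisher_lsd_pairs([11.0, 14.0, 10.0], [3, 3, 3], ms_within=1.0, t_crit=2.447)
for row in pairs:
    print(row)  # LSD is about 2.0 here, so only pairs (1, 2) and (2, 3) differ
```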


Testing What You Want To Test

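Two separate hypotheses:

$H_o: \mu_1 = \mu_2$ and $H_o: \mu_1 = \mu_3$

One joint hypothesis:

$H_o: \mu_1 = \mu_2 \text{ and } \mu_1 = \mu_3$

Is there a difference?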

Testing What You Want To Test - Continued

$H_O: \mu_1 + 2\mu_2 = 2\mu_3 + \mu_4$

$H_O: \mu_1 - \mu_2 = \mu_3 - \mu_4$

$H_O: \mu_1 = (\mu_3 + \mu_4)/2$

Testing What You Want To Test - Continued

Rewrite as coefficients on $(\mu_1, \mu_2, \mu_3, \mu_4)$:

$L = \begin{pmatrix} 1 & 2 & -2 & -1 \\ 1 & -1 & -1 & 1 \\ 2 & 0 & -1 & -1 \end{pmatrix}$
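Moving every term to one side turns each hypothesis into one row of coefficients. Each row should sum to zero to be a contrast, and the rank of the rows gives the numerator degrees of freedom of a joint test. A minimal sketch with exact rational arithmetic; the row comments reflect one reading of the hypotheses above, and the helper names are hypothetical:

```python
from fractions import Fraction

# Coefficient rows on (mu_1, mu_2, mu_3, mu_4), as on the slide.
L = [
    [1, 2, -2, -1],  # mu_1 + 2 mu_2 - 2 mu_3 - mu_4 = 0
    [1, -1, -1, 1],  # mu_1 - mu_2 - mu_3 + mu_4 = 0
    [2, 0, -1, -1],  # 2 mu_1 - mu_3 - mu_4 = 0, i.e. mu_1 = (mu_3 + mu_4)/2
]


def is_contrast(row):
    """A row defines a contrast when its coefficients sum to zero."""
    return sum(row) == 0


def rank(rows):
    """Row rank via exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


print([is_contrast(row) for row in L])  # [True, True, True]
print(rank(L))  # 3
```

Because the three rows are linearly independent contrasts on t = 4 means (rank 3 = t − 1), testing all three jointly is equivalent to testing $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$.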


Demonstration

• This demonstration illustrates ...

Exercises

• This exercise reinforces the concepts discussed previously.
