Treatment Comparisons


Upload: ally-partin

Post on 30-Mar-2015


TRANSCRIPT

Page 1: Treatment Comparisons ANOVA can determine if there are differences among the treatments, but what is the nature of those differences? Are the treatments

Treatment Comparisons

ANOVA can determine if there are differences among the treatments, but what is the nature of those differences?

Are the treatments measured on a continuous scale?
– Look at response surfaces (linear regression, polynomials)

Is there an underlying structure to the treatments?
– Compare groups of treatments using orthogonal contrasts or a limited number of preplanned mean comparison tests
– Use simultaneous confidence intervals on preplanned comparisons

Are the treatments unstructured?
– Use appropriate multiple comparison tests (today’s topic)

Page 2

Variety Trials

In a breeding program, you need to examine large numbers of selections and then narrow to the best

In the early stages, selection is based on single plants or single rows of related plants. Seed and space are limited, so replication is difficult

When numbers have been reduced and there is sufficient seed, you can conduct replicated yield trials and you want to be able to “pick the winner”

Page 3

Comparison of Means

Pairwise Comparisons
– Least Significant Difference (LSD)

Simultaneous Confidence Intervals
– Tukey’s Honestly Significant Difference (HSD)
– Dunnett Test (making all comparisons to a control)
• May be a one-sided or two-sided test
– Bonferroni Inequality
– Scheffé’s Test – can be used for unplanned comparisons

Other Multiple Comparison Tests - “Data Snooping”
– Fisher’s Protected LSD (FPLSD)
– Student-Newman-Keuls test (SNK)
– Waller and Duncan’s Bayes LSD (BLSD)
– False Discovery Rate Procedure

Often misused - intended to be used only for data from experiments with unstructured treatments

Page 4

Multiple Comparison Tests

Fixed Range Tests – a constant value is used for all comparisons
– Application
• Hypothesis Tests
• Confidence Intervals

Multiple Range Tests – values used for comparison vary across a range of means
– Application
• Hypothesis Tests

Page 5

Type I vs Type II Errors

Type I error - saying something is different when it is really the same (false positive) (Paranoia)
– the rate at which this type of error is made is the significance level

Type II error - saying something is the same when it is really different (false negative) (Sloth)
– the probability of committing this type of error is designated $\beta$
– the probability that a comparison procedure will pick up a real difference is called the power of the test and is equal to $1-\beta$

Type I and Type II error rates are inversely related to each other

For a given Type I error rate, the rate of Type II error depends on
– sample size
– variance
– true differences among means
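The three dependencies listed above can be illustrated with a rough calculation. The sketch below is only an approximation: it replaces the t distribution with the normal distribution, which is an assumption made here for simplicity and is not a method from the slides. The function name `approx_power` and the numeric inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(true_diff, variance, r, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided test of two means.

    Normal approximation (assumption): each mean is based on r
    observations with a common error variance.
    """
    std_normal = NormalDist()
    se_diff = sqrt(2 * variance / r)          # standard error of the difference
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = true_diff / se_diff               # true difference in SE units
    # Probability of landing in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical inputs: power rises with sample size and true difference,
# and falls as the variance grows.
for r in (2, 4, 8):
    print(r, round(approx_power(true_diff=20, variance=120, r=r), 3))
```

When the true difference is zero, the function returns the Type I error rate itself, which is one way to see that power and significance level are tied together.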

Page 6

Nobody likes to be wrong...

Protection against Type I is choosing a significance level

Protection against Type II is a little harder because
– it depends on the true magnitude of the difference which is unknown
– choose a test with sufficiently high power

Reasons for not using LSD to make all possible comparisons
– the chance for a Type I error increases dramatically as the number of treatments increases

Page 7

Pairwise Comparisons

Making all possible pairwise comparisons among t treatments
– # of comparisons: $\binom{t}{2} = \dfrac{t!}{2!\,(t-2)!} = \dfrac{t(t-1)}{2}$

If you have 10 varieties and want to look at all possible pairwise comparisons
– that would be t(t-1)/2 or 10(9)/2 = 45
– that’s quite a few more than t-1 df = 9

Page 8

Comparisonwise vs Experimentwise Error

Comparisonwise error rate ($\alpha = \alpha_C$)
– measures the proportion of all differences that are expected to be declared real when they are not

Experimentwise error rate ($\alpha_E$)
– the risk of making at least one Type I error among the set (family) of comparisons in the experiment
– measures the proportion of experiments in which one or more differences are falsely declared to be significant
– the probability of being wrong increases as the number of means being compared increases
– Also called familywise error rate (FWE)

Page 9

Comparisonwise vs Experimentwise Error

Experimentwise error rate ($\alpha_E$)

Probability of no Type I errors = $(1-\alpha_C)^x$, where x = number of pairwise comparisons

Max x = t(t-1)/2, where t = number of treatments

Probability of at least one Type I error

$$\alpha_E = 1-(1-\alpha_C)^x$$

Comparisonwise error rate

$$\alpha_C = 1-(1-\alpha_E)^{1/x}$$

If t = 10, Max x = 45 and $\alpha_E = 1-(1-0.05)^{45} = 90\%$
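The two conversions on this slide can be checked numerically. This is a minimal sketch of the formulas as written, which treat the x comparisons as independent; the function names are illustrative and not from the slides.

```python
def max_pairwise(t):
    """Number of possible pairwise comparisons among t treatments."""
    return t * (t - 1) // 2


def experimentwise_rate(alpha_c, x):
    """Probability of at least one Type I error in x comparisons."""
    return 1 - (1 - alpha_c) ** x


def comparisonwise_rate(alpha_e, x):
    """Per-comparison rate that gives experimentwise rate alpha_e."""
    return 1 - (1 - alpha_e) ** (1 / x)


x = max_pairwise(10)                             # 45 comparisons for 10 treatments
print(x)
print(round(experimentwise_rate(0.05, x), 2))    # about 0.90, as on the slide
print(comparisonwise_rate(0.05, x))              # per-comparison rate needed for 5% overall
```

The last line shows how small the comparisonwise rate has to be to hold the experimentwise rate at 5% when all 45 comparisons are made.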

Page 10

Least Significant Difference

Calculating a t for testing the difference between two means

$$t_{calc} = (\bar{Y}_1 - \bar{Y}_2) \,/\, s_{\bar{Y}_1 - \bar{Y}_2}$$

– Any difference for which $t_{calc} > t_{\alpha}$ would be declared significant

Further, $t_{\alpha}\, s_{\bar{Y}_1 - \bar{Y}_2}$ is the smallest difference for which significance would be declared

– Therefore

$$LSD = t_{\alpha}\, s_{\bar{Y}_1 - \bar{Y}_2}$$

– For equal replication, where r is the number of observations forming each mean

$$LSD = t_{\alpha} \sqrt{\frac{2 \cdot MSE}{r}}$$

Page 11

Do’s and Don’ts of using LSD

LSD is a valid test when
– Making comparisons planned in advance of seeing the data (this includes the comparison of each treatment with the control)
– Comparing adjacent ranked means

The LSD should not (unless F test for treatments is significant**) be used for
– Making all possible pairwise comparisons
– Making more comparisons than df for treatments

**Some would say that LSD should never be used unless the F test from ANOVA is significant

Page 12

Pick the Winner

A plant breeder wanted to measure resistance to stem rust for six wheat varieties
– planted 5 seeds of each variety in each of four pots
– placed the 24 pots randomly on a greenhouse bench
– inoculated with stem rust
– measured seed yield per pot at maturity

Page 13

Ranked Mean Yields (g/pot)

Variety   Rank   Mean Yield ($\bar{Y}_i$)   Difference ($\bar{Y}_{i-1} - \bar{Y}_i$)
F         1      95.3
D         2      94.0                       1.3
E         3      75.0                       19.0
B         4      69.0                       6.0
A         5      50.3                       18.7
C         6      24.0                       26.3

Page 14

ANOVA

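Source    df   MS         F
Variety   5    2,976.44   24.80
Error     18   120.00

Compute LSD at 5% and 1%

$$LSD = t_{0.05,\,df=18} \sqrt{\frac{2 \cdot MSE}{r}} = 2.101 \sqrt{\frac{2 \cdot 120}{4}} = 16.27$$

$$LSD = t_{0.01,\,df=18} \sqrt{\frac{2 \cdot MSE}{r}} = 2.878 \sqrt{\frac{2 \cdot 120}{4}} = 22.29$$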
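The two LSD values can be reproduced directly from the ANOVA table. This is a small sketch; the critical t values 2.101 and 2.878 are taken from the slide rather than computed, and the function name `lsd` is illustrative.

```python
from math import sqrt


def lsd(t_crit, mse, r):
    """Least significant difference for means with equal replication r."""
    return t_crit * sqrt(2 * mse / r)


MSE = 120.0   # error mean square from the ANOVA table
R = 4         # pots (replicates) per variety

print(round(lsd(2.101, MSE, R), 2))   # 16.27 (5% level, 18 error df)
print(round(lsd(2.878, MSE, R), 2))   # 22.29 (1% level, 18 error df)
```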

Page 15

Back to the data...

Variety   Rank   Mean Yield ($\bar{Y}_i$)   Difference ($\bar{Y}_{i-1} - \bar{Y}_i$)
F         1      95.3
D         2      94.0                       1.3
E         3      75.0                       19.0*
B         4      69.0                       6.0
A         5      50.3                       18.7*
C         6      24.0                       26.3**

$LSD_{\alpha=0.05}$ = 16.27

$LSD_{\alpha=0.01}$ = 22.29
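The starred differences in the table follow from comparing each adjacent difference with the two LSD values. A minimal sketch, using the means and LSD values from the slides (the function name `flag_adjacent` is illustrative):

```python
# Ranked variety means (g/pot), highest first
ranked_means = [("F", 95.3), ("D", 94.0), ("E", 75.0),
                ("B", 69.0), ("A", 50.3), ("C", 24.0)]
LSD_05 = 16.27
LSD_01 = 22.29


def flag_adjacent(ranked, lsd_05, lsd_01):
    """Compare each mean with the one ranked just above it."""
    flags = []
    for (_, upper), (name, lower) in zip(ranked, ranked[1:]):
        diff = upper - lower
        if diff > lsd_01:
            mark = "**"      # significant at the 1% level
        elif diff > lsd_05:
            mark = "*"       # significant at the 5% level
        else:
            mark = ""        # not significant
        flags.append((name, round(diff, 1), mark))
    return flags


for name, diff, mark in flag_adjacent(ranked_means, LSD_05, LSD_01):
    print(f"{name}: {diff}{mark}")
```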

Page 16

Fisher’s protected LSD (FPLSD)

Uses comparisonwise error rate

Computed just like LSD but you don’t use it unless the F for treatments tests significant

$$LSD = t_{\alpha} \sqrt{\frac{2 \cdot MSE}{r}}$$

So in our example data, any difference between means that is greater than 16.27 is declared to be significant

Page 17

Tukey’s Honestly Significant Difference (HSD)

From a table of Studentized range values (see handout), select a value of $Q_{\alpha,p,v}$, which depends on p (the number of means) and v (error df)

Compute:

$$HSD = Q_{\alpha,p,v} \sqrt{\frac{MSE}{r}}$$

For any pair of means, if the difference is greater than HSD, it is significant

Uses an experimentwise error rate

Use the Tukey-Kramer test with unequal sample size

$$HSD = Q_{\alpha,p,v} \sqrt{\frac{MSE}{2}\left(\frac{1}{r_1} + \frac{1}{r_2}\right)}$$
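Both forms of the HSD can be sketched in a few lines. The example applies them to the wheat data (MSE = 120, r = 4) with Q = 4.49, the tabled Studentized range value for p = 6 means and v = 18 error df at the 5% level. The function names are illustrative, and the unequal replication figures in the last line are hypothetical.

```python
from math import sqrt


def hsd(q, mse, r):
    """Tukey's HSD for equal replication r."""
    return q * sqrt(mse / r)


def tukey_kramer(q, mse, r1, r2):
    """Tukey-Kramer critical difference for two means with r1 and r2 observations."""
    return q * sqrt((mse / 2) * (1 / r1 + 1 / r2))


Q_6_18 = 4.49   # tabled value: p = 6, v = 18, alpha = 0.05

print(round(hsd(Q_6_18, 120.0, 4), 2))         # 24.59
print(tukey_kramer(Q_6_18, 120.0, 4, 4))       # same as hsd when r1 == r2
print(tukey_kramer(Q_6_18, 120.0, 4, 3))       # larger when one mean has fewer observations
```

Compared with the LSD of 16.27 for the same data, the HSD of 24.59 requires a much larger difference before it is declared significant, which is the price of experimentwise control.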

Page 18

Student-Newman-Keuls Test (SNK)

Rank the means from high to low

Compute t-1 significant differences, $SNK_j$, using the studentized values for the HSD

$$SNK_j = Q_{\alpha,k,v} \sqrt{\frac{MSE}{r}}$$

where j = 1, 2, ..., t-1; k = 2, 3, ..., t; k = number of means in the range

Compare the highest and lowest
– if less than SNK, no differences are significant
– if greater than SNK, compare next highest mean with next lowest using next SNK

Uses experimentwise $\alpha$ for the extremes

Uses comparisonwise $\alpha$ for adjacent means

Page 19

Using SNK with example data:

Variety   Rank   Mean Yield ($\bar{Y}_i$)
F         1      95.3
D         2      94.0
E         3      75.0
B         4      69.0
A         5      50.3
C         6      24.0

k     2       3       4       5       6
Q     2.97    3.61    4.00    4.28    4.49
SNK   16.27   19.77   21.91   23.44   24.59

5 + 4 + 3 + 2 + 1 = 15 comparisons

18 df for error

SNK = Q * se

$$se = \sqrt{\frac{MSE}{r}} = \sqrt{\frac{120}{4}} = 5.477$$
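The 15 comparisons can be worked through mechanically. The sketch below is one way to code the step-down logic, assuming the usual convention that a pair lying inside a range already found non-significant is not declared significant; the Q values are those in the table above, and the function name `snk` is illustrative.

```python
from math import sqrt

# Ranked variety means (g/pot), highest first
ranked_means = [("F", 95.3), ("D", 94.0), ("E", 75.0),
                ("B", 69.0), ("A", 50.3), ("C", 24.0)]
Q_TABLE = {2: 2.97, 3: 3.61, 4: 4.00, 5: 4.28, 6: 4.49}   # 18 error df, 5% level
MSE, R = 120.0, 4


def snk(ranked, q_table, mse, r):
    """Return {(higher, lower): significant?} for every pair of means."""
    se = sqrt(mse / r)
    n = len(ranked)
    homogeneous = []          # index ranges (i, j) found non-significant
    results = {}
    for k in range(n, 1, -1):             # widest range first, then step down
        for i in range(n - k + 1):
            j = i + k - 1
            inside = any(a <= i and j <= b for a, b in homogeneous)
            diff = ranked[i][1] - ranked[j][1]
            significant = (not inside) and diff > q_table[k] * se
            if not significant:
                homogeneous.append((i, j))
            results[(ranked[i][0], ranked[j][0])] = significant
    return results


results = snk(ranked_means, Q_TABLE, MSE, R)
print(len(results))                                        # 15 comparisons
print([pair for pair, sig in results.items() if not sig])  # pairs not declared different
```

For these data only F vs D and E vs B fail to reach significance.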

Page 20

Duncan’s New Multiple-Range Test

Critical value varies depending on the number of means involved in the test

Alpha                       0.05
Error Degrees of Freedom    6
Error Mean Square           113.0833

Number of Means    2       3       4       5       6
Critical Range     26.02   26.97   27.44   27.67   27.78

Means with the same letter are not significantly different.

Duncan Grouping    Mean     N    variety
     A             95.30    2    6
     A
     A             94.00    2    4
     A
B    A             75.00    2    5
B    A
B    A             69.00    2    2
B
B                  50.30    2    1

     C             22.50    2    3

Page 21

Waller-Duncan Bayes LSD (BLSD)

Do ANOVA and compute F (MST/MSE) with q and f df (corresponds to table nomenclature)

Choose error weight ratio, k
– k=100 corresponds to 5% significance level
– k=500 for a 1% test

Obtain $t_b$ from table (A7 in Petersen)
– depends on k, F, q (treatment df) and f (error df)

Compute

$$BLSD = t_b \sqrt{\frac{2 \cdot MSE}{r}}$$

Any difference greater than BLSD is significant

Does not provide complete control of experimentwise Type I error

Reduces Type II error

Page 22

Bonferroni Inequality

Theory
– $\alpha_E \le X \cdot \alpha_C$, where X = number of pairwise comparisons

To get critical probability value for significance
– $\alpha_C = \alpha_E / X$, where $\alpha_E$ = maximum desired experimentwise error rate

Alternatively, multiply observed probability value by X and compare to $\alpha_E$ (values >1 are set to 1)

Advantages
– simple
– strict control of Type I error

Disadvantage
– very conservative, low power to detect differences
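Both ways of applying the inequality give the same decisions, as this short sketch shows. The P values are hypothetical and only illustrate the arithmetic; the function names are not from the slides.

```python
def bonferroni_alpha(alpha_e, x):
    """Critical comparisonwise probability for x comparisons."""
    return alpha_e / x


def bonferroni_adjust(p_values, x):
    """Multiply each observed P value by x, capping the result at 1."""
    return [min(1.0, p * x) for p in p_values]


ALPHA_E = 0.05
X = 15                                   # all pairs among 6 means
observed = [0.0004, 0.002, 0.01, 0.2]    # hypothetical P values from t tests

critical = bonferroni_alpha(ALPHA_E, X)
adjusted = bonferroni_adjust(observed, X)

# Route 1: compare observed P values with alpha_E / X
print([p <= critical for p in observed])
# Route 2: compare adjusted P values with alpha_E
print([p <= ALPHA_E for p in adjusted])
```

A difference with P = 0.01 would pass an unadjusted 5% test but is not significant here, which illustrates the low power noted above.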

Page 23

False Discovery Rate

[Figure: “False Positive Procedure”. Bar chart of Probability (y-axis, 0.00 to 0.25) against Rank (i) (x-axis, 1 to 21), with a line of critical values and the region marked “Reject H0”.]

Bars show P values for simple t tests among means
– Largest differences have the smallest P values

Line represents critical P values = $(i/X) \cdot \alpha_E$
– ranks i = 1 to X
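The line of critical values, $(i/X) \cdot \alpha_E$, matches the Benjamini-Hochberg step-up rule, and the sketch below assumes that is the procedure intended. Under that rule, H0 is rejected for every comparison ranked at or below the largest rank whose P value falls on or under the line. The P values are hypothetical and the function name `fdr_reject` is illustrative.

```python
def fdr_reject(p_values, alpha_e=0.05):
    """Benjamini-Hochberg step-up decisions, returned in the input order."""
    x = len(p_values)
    # Indices of the P values from smallest to largest
    order = sorted(range(x), key=lambda idx: p_values[idx])
    last_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= (rank / x) * alpha_e:
            last_rank = rank           # largest rank on or under the line so far
    reject = [False] * x
    for idx in order[:last_rank]:      # reject H0 for all ranks up to last_rank
        reject[idx] = True
    return reject


p_values = [0.025, 0.5, 0.004, 0.045, 0.012]   # hypothetical P values
print(fdr_reject(p_values))
```

With five comparisons the critical values are 0.01, 0.02, 0.03, 0.04 and 0.05, so the three smallest P values lead to rejection while 0.045 and 0.5 do not.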

Page 24

Most Popular

FPLSD test is widely used, and widely abused

BLSD is preferred by some because
– It is a single value and therefore easy to use
– Larger when F indicates that the means are homogeneous and small when means appear to be heterogeneous

The False Discovery Rate (FDR) has nice features
– Good experimentwise Type I error control
– Good power (Type II error control)
– May not be as well-known as some other tests

Tukey’s HSD test
– Widely accepted and often recommended by statisticians
– May be too conservative if Type II error has more serious consequences than Type I error